Runflow

Happy Horse Text-to-Video

Generate 1080p video with synchronized native audio from a text prompt. Aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4. Duration: 3–15s.

By Alibaba

Pricing: $0.14 per second at 720p, $0.28 per second at 1080p

Overview

Happy Horse Text-to-Video is Alibaba's flagship 1080p video generator with **synchronized native audio** built in — no separate audio model, no lip-sync rig, no post-production. Send a single prompt; get a fully-scored clip back.

Key capabilities

  • **Native audio**: ambient sound, music, voice, foley — generated in lock-step with the visuals so they actually match (no overlay tricks)
  • **Multilingual**: prompts and any embedded dialogue work across major languages
  • **Five aspect ratios**: 16:9, 9:16, 1:1, 4:3, 3:4 — covers landscape ads, vertical short-form, square social, and portrait formats from one endpoint
  • **3-15 second clips** at 720p or 1080p
  • **Cinematic motion**: handles complex camera moves (dolly, push-in, aerial), shallow DOF, golden-hour lighting prompts well

Family

Part of the Happy Horse family — pair with the variants when you need a different starting modality:

| Variant | Input | Use it for |
|--------|-------|-----------|
| Text-to-Video | text prompt | one-shot clips from a brief |
| Image-to-Video | image + optional prompt | animating a still or hero shot |
| Video Edit | source video + edit prompt | transforming an existing clip (style, scene swap) |
| Reference-to-Video | text + 1-9 references | multi-character scenes, brand-consistent subjects |

Tech specs

  • **Resolutions**: 720p, 1080p
  • **Duration**: 3-15s
  • **Audio**: native, in-sync, prompt-controlled
  • **Latency**: 60-180s typical for a 5s clip; queue depth varies during peak hours
  • **Pricing**: $0.14/s at 720p, $0.28/s at 1080p — simple per-second billing, no minimums
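
The per-second rates above make it easy to estimate a clip's cost before submitting a job. The sketch below is a hypothetical helper (the function name and the rate table are illustrative, not part of any Runflow SDK) that checks a request against the listed duration range and resolutions and returns the expected charge. It assumes the charge is exactly duration times rate with no rounding, and that durations are whole seconds; the page implies the former but states neither.

```python
from decimal import Decimal

# Rates and limits copied from the spec list above; the names are illustrative.
RATE_PER_SECOND = {"720p": Decimal("0.14"), "1080p": Decimal("0.28")}
MIN_SECONDS, MAX_SECONDS = 3, 15


def estimate_cost(duration_s: int, resolution: str = "720p") -> Decimal:
    """Return the estimated charge in USD for one clip."""
    if resolution not in RATE_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if not MIN_SECONDS <= duration_s <= MAX_SECONDS:
        raise ValueError(
            f"duration must be {MIN_SECONDS}-{MAX_SECONDS}s, got {duration_s}"
        )
    # Decimal avoids binary floating-point drift in currency arithmetic.
    return RATE_PER_SECOND[resolution] * duration_s


print(estimate_cost(5, "720p"))    # 0.70
print(estimate_cost(15, "1080p"))  # 4.20
```

The first result matches the $0.70 figure quoted for a 5-second 720p clip; the longest 1080p clip under these rates would come to $4.20.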

Examples

  • Cinematic hummingbird
  • Aerial Japanese village
  • Macro espresso pour
  • Studio dancer

Frequently asked questions

What is Happy Horse Text-to-Video?
Happy Horse Text-to-Video is Alibaba's flagship video generator that turns a single text prompt into a 1080p clip with synchronized native audio. Unlike most text-to-video models, which generate silent clips, Happy Horse produces ambient sound, music, and dialogue locked to the visuals, so no separate audio model or post-production is required.

How much does Happy Horse Text-to-Video cost on Runflow?
$0.14 per second of generated video at 720p, and $0.28 per second at 1080p. Billing is per second, with no minimums and no setup fees. A standard 5-second 720p clip costs $0.70.

How long does a generation take?
Typical latency is 60-180 seconds for a 5-second clip during off-peak hours. During peak demand the queue can be deeper, so waits may be longer. Runflow's multi-region routing helps absorb regional spikes by automatically picking the fastest available endpoint.

Can I use Happy Horse output commercially?
Yes. Output generated through Runflow's API is licensed for commercial use, including ads, social content, product launches, and SaaS products. This is a standard commercial license; no separate model licensing is required.

Do I need to manage GPUs or infrastructure?
No. Runflow handles GPU provisioning, queueing, multi-region failover, and capacity scaling. Send an HTTP request, get a video back. No DevOps, no Kubernetes, no AI engineering team required.

How do I get started?
Sign up at runflow.io, get an API key, and POST to `/v1/models/alibaba/happy-horse/text-to-video/runs` with a `prompt` field. The response includes a run ID; poll it until the run completes to get the final video URL. A full SDK and OpenAPI spec are available, and most teams have their first generation working in under 10 minutes.
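
The submit-then-poll flow described in the last answer can be sketched with Python's standard library. Only the run-creation path and the `prompt` field come from this page. The base URL, the Bearer auth header, the polling path, the status values, and the response field names (`id`, `status`, `video_url`) are assumptions made for illustration, so check the OpenAPI spec for the real ones.

```python
import json
import time
import urllib.request

# From the page: the run-creation path and the `prompt` field.
RUN_PATH = "/v1/models/alibaba/happy-horse/text-to-video/runs"

# ASSUMPTION: the base URL is a guess; the page only names runflow.io.
BASE_URL = "https://api.runflow.io"


def http_json(method, url, api_key, body=None):
    """Send a JSON request with urllib and decode the JSON response."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, method=method, headers={
        "Authorization": f"Bearer {api_key}",  # ASSUMPTION: Bearer auth
        "Content-Type": "application/json",
    })
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def generate_video(prompt, api_key, send=http_json, poll_every=10.0,
                   timeout=600.0, sleep=time.sleep):
    """Create a run, then poll until it finishes or the timeout elapses."""
    run = send("POST", BASE_URL + RUN_PATH, api_key, {"prompt": prompt})
    # ASSUMPTION: a run is polled at <run path>/<id>.
    run_url = f"{BASE_URL}{RUN_PATH}/{run['id']}"
    waited = 0.0
    while waited <= timeout:
        state = send("GET", run_url, api_key)
        # ASSUMPTION: status values and the `video_url` field name.
        if state["status"] == "succeeded":
            return state["video_url"]
        if state["status"] == "failed":
            raise RuntimeError(f"run {run['id']} failed")
        sleep(poll_every)
        waited += poll_every
    raise TimeoutError(f"run {run['id']} not finished after {timeout}s")
```

The `send` and `sleep` parameters exist so the flow can be exercised with a stub instead of real network calls. The 10-second poll interval and 10-minute timeout are arbitrary choices sized against the 60-180 second typical latency quoted above.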

Related models

  • Happy Horse Reference-to-Video: Generate 1080p video with synchronized native audio from a text prompt and references. Aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4. Duration: 3–15s.
  • Happy Horse Video Edit: Advanced video editing through natural language instructions. Supports local or global editing of video elements using up to 5 reference images.
  • Happy Horse Image-to-Video: Alibaba's #1-ranked Happy Horse 1.0. Generate 1080p video with synchronized native audio and multilingual lip-sync from text prompts or images.
