Alibaba's #1-ranked Happy Horse 1.0 — generate 1080p video with synchronized native audio and multilingual lip-sync from text prompts or images.
Pricing
| Criteria | Price | Video for $1 |
|---|---|---|
| Per second of generated video (720p baseline) | $0.14 per second of video | 7s |
| 1080p | $0.28 per second of video | 3s |
Overview
Happy Horse Image-to-Video animates a still image into 1080p video with synchronized native audio and multilingual lip-sync. Drop in a portrait, product shot, or scene; describe the motion (or skip it); get back a fully-scored clip with mouth movement that actually tracks the audio.
Key capabilities
- Native audio + lip-sync: ambient sound, music, and dialogue are all generated in sync, and if a face is in frame, lips track the speech in 30+ languages
- Optional prompt: leave it empty for a natural extension of the image, or steer the motion explicitly
- Aspect-ratio preservation: output keeps the source image's ratio (constrained between 1:2.5 and 2.5:1)
- 3-15 second clips at 720p or 1080p
- Strict input requirements: image must be ≥300px on the short side, ≤10MB, JPEG/PNG/WEBP/BMP
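The input requirements listed above can be checked locally before uploading. The following is a minimal sketch, not part of any official SDK: the function names `check_image` and `ratio_within_output_range` are hypothetical, "10MB" is assumed to mean 10 × 1024 × 1024 bytes, and the caller is assumed to supply the image's dimensions, file size, and format. Because the page does not say whether an image outside the 1:2.5 to 2.5:1 range is rejected or adjusted, the ratio check is kept separate from the strict requirements.

```python
# Hypothetical pre-flight checks based on the limits stated on this page.
ALLOWED_FORMATS = {"JPEG", "PNG", "WEBP", "BMP"}
MIN_SHORT_SIDE_PX = 300
MAX_FILE_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB


def check_image(width: int, height: int, size_bytes: int, fmt: str) -> list[str]:
    """Return a list of violated input requirements (empty if none)."""
    problems = []
    normalized = fmt.upper()
    if normalized == "JPG":  # common alias for JPEG
        normalized = "JPEG"
    if normalized not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file is larger than 10MB")
    if min(width, height) < MIN_SHORT_SIDE_PX:
        problems.append("short side is under 300px")
    return problems


def ratio_within_output_range(width: int, height: int) -> bool:
    """True if width:height lies between 1:2.5 and 2.5:1 inclusive.

    Integer arithmetic avoids floating-point edge cases:
    w/h >= 2/5 is 5w >= 2h, and w/h <= 5/2 is 2w <= 5h.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return 5 * width >= 2 * height and 2 * width <= 5 * height


if __name__ == "__main__":
    print(check_image(1024, 768, 2_000_000, "png"))   # no violations
    print(check_image(200, 800, 11 * 1024 * 1024, "gif"))
    print(ratio_within_output_range(300, 1000))
```

In practice the width, height, and format would come from an image library and the size from the file system; they are passed in directly here to keep the sketch dependency-free.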
Family
Image-to-Video is part of the Happy Horse family. Switch to one of the other variants when you need a different starting modality:
| Variant | Input | Use it for |
|---|---|---|
| Text-to-Video | text prompt | one-shot clips from a brief |
| Image-to-Video | image + optional prompt | animating a still or hero shot |
| Video Edit | source video + edit prompt | transforming an existing clip (style, scene swap) |
| Reference-to-Video | text + 1-9 references | multi-character scenes, brand-consistent subjects |
Tech specs
- Resolutions: 720p, 1080p
- Duration: 3-15s
- Image limits: ≥300px short side, ≤10MB, JPEG/PNG/WEBP/BMP
- Audio: native, lip-synced, multilingual
- Latency: 60-180s typical for a 5s clip
- Pricing: $0.14/s at 720p, $0.28/s at 1080p; simple per-second billing, no minimums
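As a worked example of the per-second pricing above, the sketch below estimates the cost of a clip from its duration and resolution. The names `estimate_cost` and `whole_seconds_per_dollar` are hypothetical helpers, not part of any API. The rates and the 3-15s duration range come from this page; how fractional seconds are billed is not stated, so the sketch simply multiplies and rounds to cents.

```python
# Rates and duration limits as listed on this page.
PRICE_PER_SECOND = {"720p": 0.14, "1080p": 0.28}
MIN_SECONDS, MAX_SECONDS = 3, 15


def estimate_cost(seconds: float, resolution: str = "720p") -> float:
    """Estimated price in USD, rounded to cents.

    Assumption: billing is linear in duration; the page does not say
    how fractional seconds are rounded.
    """
    if resolution not in PRICE_PER_SECOND:
        raise ValueError(f"unknown resolution: {resolution}")
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError("duration must be between 3 and 15 seconds")
    return round(seconds * PRICE_PER_SECOND[resolution], 2)


def whole_seconds_per_dollar(resolution: str) -> int:
    """Whole seconds of video that $1 buys at the given resolution."""
    return int(1 / PRICE_PER_SECOND[resolution])


if __name__ == "__main__":
    print(estimate_cost(5, "1080p"))          # 1.4
    print(estimate_cost(15, "720p"))          # 2.1
    print(whole_seconds_per_dollar("720p"))   # 7
    print(whole_seconds_per_dollar("1080p"))  # 3
```

The last two values match the "7s of video for $1" and "3s of video for $1" figures in the pricing section.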
Related models
Happy Horse Text-to-Video
alibaba/happy-horse/text-to-video
Generate 1080p video with synchronized native audio from a text prompt. Aspect r...
Happy Horse Video Edit
alibaba/happy-horse/video-edit
HappyHorse video editing supports advanced video editing through natural languag...
Happy Horse Reference-to-Video
alibaba/happy-horse/reference-to-video
Generate 1080p video with synchronized native audio from a text prompt and refer...
Wan 2.7 — Image to Video
alibaba/wan/v2.7/image-to-video
Wan 2.7 delivers enhanced motion smoothness, superior scene fidelity, and greate...