Runflow

Happy Horse Image-to-Video

Alibaba's #1-ranked Happy Horse 1.0: generate 1080p video with synchronized native audio and multilingual lip-sync from an image and an optional text prompt.

By Alibaba

Pricing: $0.14 per second at 720p, $0.28 per second at 1080p

Overview

Happy Horse Image-to-Video animates a still image into 720p or 1080p video with **synchronized native audio and multilingual lip-sync**. Drop in a portrait, product shot, or scene; describe the motion (or skip it); and get back a fully scored clip in which mouth movement tracks the audio.

Key capabilities

  • **Native audio + lip-sync**: ambient sound, music, dialogue all generated in sync — and if a face is in frame, lips track the speech in 30+ languages
  • **Optional prompt**: leave it empty for a natural extension of the image, or steer the motion explicitly
  • **Aspect-ratio preservation**: output keeps the source image's ratio (constrained between 1:2.5 and 2.5:1)
  • **3-15 second clips** at 720p or 1080p
  • **Strict input requirements**: image must be ≥300px on the short side, ≤10MB, JPEG/PNG/WEBP/BMP
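
The input limits listed above can be checked on the client before uploading. The following is a minimal sketch, not Runflow's own validation: the function name `check_image` and its parameters are hypothetical, the caller is assumed to already know the image's pixel dimensions, byte size, and format, and "10MB" is assumed to mean 10 × 1024 × 1024 bytes.

```python
# Hypothetical pre-upload check based on the limits stated on this page.
ALLOWED_FORMATS = {"JPEG", "JPG", "PNG", "WEBP", "BMP"}
MAX_BYTES = 10 * 1024 * 1024   # assumption: "10MB" interpreted as 10 MiB
MIN_SHORT_SIDE = 300           # pixels

def check_image(width: int, height: int, size_bytes: int, fmt: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means the image passes."""
    problems = []
    if fmt.upper() not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {fmt}")
    if size_bytes > MAX_BYTES:
        problems.append("file is larger than 10MB")
    short, long_ = min(width, height), max(width, height)
    if short < MIN_SHORT_SIDE:
        problems.append("short side is under 300px")
    # Ratio must stay within 1:2.5 .. 2.5:1, i.e. long/short <= 2.5.
    # Integer form (long*2 <= short*5) avoids float rounding and division by zero.
    if long_ * 2 > short * 5:
        problems.append("aspect ratio is outside 1:2.5 to 2.5:1")
    return problems

print(check_image(1920, 1080, 2_000_000, "png"))  # prints []
```

Returning every problem at once, rather than failing on the first, lets a UI show the user the full set of fixes needed in one pass.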

Family

Part of the Happy Horse family — pair with the variants when you need a different starting modality:

| Variant | Input | Use it for |
|--------|-------|-----------|
| Text-to-Video | text prompt | one-shot clips from a brief |
| Image-to-Video | image + optional prompt | animating a still or hero shot |
| Video Edit | source video + edit prompt | transforming an existing clip (style, scene swap) |
| Reference-to-Video | text + 1-9 references | multi-character scenes, brand-consistent subjects |

Tech specs

  • **Resolutions**: 720p, 1080p
  • **Duration**: 3-15s
  • **Image limits**: ≥300px short side, ≤10MB, JPEG/PNG/WEBP/BMP
  • **Audio**: native, lip-synced, multilingual
  • **Latency**: 60-180s typical for a 5s clip
  • **Pricing**: $0.14/s at 720p, $0.28/s at 1080p — simple per-second billing, no minimums
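
Since billing is per second of output, the cost of a clip can be estimated ahead of time from the rates above. This is a rough sketch under stated assumptions: `estimate_cost` is a hypothetical helper, not part of any Runflow SDK, and it assumes durations are whole seconds within the 3-15s range (how fractional seconds are billed is not stated here).

```python
from decimal import Decimal

# Per-second rates as listed on this page.
RATE_PER_SECOND = {"720p": Decimal("0.14"), "1080p": Decimal("0.28")}
MIN_SECONDS, MAX_SECONDS = 3, 15

def estimate_cost(seconds: int, resolution: str = "720p") -> Decimal:
    """Estimate the price in USD of one clip (hypothetical helper)."""
    if resolution not in RATE_PER_SECOND:
        raise ValueError(f"unknown resolution: {resolution}")
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError("duration must be between 3 and 15 seconds")
    # Decimal keeps the result exact to the cent, unlike binary floats.
    return RATE_PER_SECOND[resolution] * seconds

print(estimate_cost(5, "720p"))    # prints 0.70
print(estimate_cost(15, "1080p"))  # prints 4.20
```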

Examples

  • Subject smiles to camera
  • Head turn

Frequently asked questions

What is Happy Horse Image-to-Video?
Happy Horse Image-to-Video animates a still image into 720p or 1080p video with synchronized native audio and multilingual lip-sync. Drop in a portrait, product shot, or scene; the model fills in motion that's consistent with the source. If the image contains a face, lips track any generated dialogue across 30+ languages.

How much does Happy Horse Image-to-Video cost on Runflow?
$0.14 per second of output at 720p, and $0.28 per second at 1080p. Per-second billing, no minimums. A typical 5-second 720p clip costs $0.70.

What image formats and sizes are supported?
JPEG, JPG, PNG, BMP, and WEBP up to 10 MB. The shortest side must be at least 300 pixels. Aspect ratio must be between 1:2.5 and 2.5:1, and the output preserves the source ratio.

How long does a generation take?
Typical latency is 60-180 seconds for a 5-second clip. Image-to-video is slightly slower than text-to-video because the model encodes the source image first. Runflow routes across regions to keep latency stable during demand spikes.

Can I use Happy Horse output commercially?
Yes. Runflow grants commercial use rights for everything generated via the API, including ads, e-commerce, and SaaS products. No additional licensing.

Do I need to manage GPUs or infrastructure?
No. Runflow handles GPU provisioning, queueing, multi-region failover, and scaling. One HTTP call returns a video URL. No DevOps team needed.

Related models

  • **Happy Horse Reference-to-Video**: Generate 1080p video with synchronized native audio from a text prompt and references. Aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4. Duration: 3-15s.
  • **Happy Horse Video Edit**: Edit video through natural-language instructions. Supports local or global editing of video elements using up to 5 reference images.
  • **Happy Horse Text-to-Video**: Generate 1080p video with synchronized native audio from a text prompt. Aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4. Duration: 3-15s.
