Runflow

Happy Horse Reference-to-Video

Generate 1080p video with synchronized native audio from a text prompt and references. Aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4. Duration: 3–15s.

By Alibaba

Pricing: $0.14 per second at 720p, $0.28 per second at 1080p

Overview

Happy Horse Reference-to-Video composes 1080p multi-character scenes from **1-9 reference images plus a text prompt**. Reference images are addressable by index (`character1`, `character2`, …) so you can stage interactions, brand-consistent product placements, or multi-subject scenes with stable identity across the whole clip.

Key capabilities

  • **Up to 9 references**: each callable as `character1`-`character9` in your prompt — order matches your `image_urls` array
  • **Identity preservation**: subjects keep their reference appearance across motion and scene changes
  • **Native audio + lip-sync**: same audio engine as the rest of the family — multilingual and synchronized to motion
  • **Five aspect ratios**: 16:9, 9:16, 1:1, 4:3, 3:4
  • **3-15 second clips** at 720p or 1080p
  • **Strict reference quality**: each image must be ≥400px on the short side, ≤10MB, JPEG/JPG/PNG/WEBP — 720p+ recommended for clean identity
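The reference-quality limits above can be checked before upload. The following is a minimal sketch, not part of any Runflow SDK: the `ReferenceImage` type and both function names are hypothetical, image dimensions and file size are assumed to be known already, and "10MB" is assumed to mean 10 × 1024 × 1024 bytes, which the page does not specify.

```python
from dataclasses import dataclass

# Limits taken from the capability list above.
ALLOWED_EXTENSIONS = {"jpeg", "jpg", "png", "webp"}
MIN_SHORT_SIDE_PX = 400
MAX_SIZE_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB
MAX_REFERENCES = 9


@dataclass
class ReferenceImage:
    """Hypothetical container for metadata the caller already has."""
    filename: str
    width: int
    height: int
    size_bytes: int


def reference_problems(image):
    """Return a list of human-readable problems; empty means the image passes."""
    problems = []
    ext = image.filename.rsplit(".", 1)[-1].lower() if "." in image.filename else ""
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'none'}")
    if min(image.width, image.height) < MIN_SHORT_SIDE_PX:
        problems.append("short side below 400px")
    if image.size_bytes > MAX_SIZE_BYTES:
        problems.append("file larger than 10MB")
    return problems


def check_reference_set(images):
    """Check the 1-9 count limit, then map each filename to its problems."""
    if not 1 <= len(images) <= MAX_REFERENCES:
        raise ValueError("expected 1-9 reference images")
    return {img.filename: reference_problems(img) for img in images}
```

Returning a list of problems per file, rather than raising on the first failure, lets a caller report every rejected reference at once.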

Family

Part of the Happy Horse family — pair with the variants when you need a different starting modality:

| Variant | Input | Use it for |
|--------|-------|-----------|
| Text-to-Video | text prompt | one-shot clips from a brief |
| Image-to-Video | image + optional prompt | animating a still or hero shot |
| Video Edit | source video + edit prompt | transforming an existing clip (style, scene swap) |
| Reference-to-Video | text + 1-9 references | multi-character scenes, brand-consistent subjects |

Tech specs

  • **Resolutions**: 720p, 1080p
  • **Duration**: 3-15s
  • **References**: 1-9 images, ≥400px short side, ≤10MB each, callable as `character1`-`character9`
  • **Audio**: native, lip-synced, multilingual
  • **Aspect ratios**: 16:9, 9:16, 1:1, 4:3, 3:4
  • **Latency**: 90-240s typical for a 5s clip with multiple references
  • **Pricing**: $0.14/s at 720p, $0.28/s at 1080p — references don't change the price
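The per-second rates above make cost easy to estimate ahead of a run. This is a rough sketch that assumes billing is strictly linear in the requested duration; the page does not say how fractional seconds are handled, and `estimate_cost` is a hypothetical helper, not a Runflow function.

```python
# Rates and duration limits as listed in the tech specs above.
RATE_USD_PER_SECOND = {"720p": 0.14, "1080p": 0.28}
MIN_DURATION_S = 3
MAX_DURATION_S = 15


def estimate_cost(duration_s, resolution):
    """Estimate the price in USD of one clip, assuming linear per-second billing."""
    if resolution not in RATE_USD_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution}")
    if not MIN_DURATION_S <= duration_s <= MAX_DURATION_S:
        raise ValueError("duration must be between 3 and 15 seconds")
    # The number of reference images is not an input: it does not affect price.
    return round(duration_s * RATE_USD_PER_SECOND[resolution], 2)
```

Under these assumptions a 5-second clip comes to about $0.70 at 720p and $1.40 at 1080p, and the longest clip (15 seconds at 1080p) to about $4.20.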

Examples

  • Neon-lit Tokyo street
  • Sunlit meadow
  • Mountaintop sunrise
  • Cafe by window

Frequently asked questions

What is Happy Horse Reference-to-Video?
Happy Horse Reference-to-Video composes multi-subject 1080p video scenes from 1-9 reference images plus a text prompt. Each reference is addressable as `character1`, `character2`, … in the prompt, so you can stage interactions, multi-product shots, or brand-consistent scenes with stable identity across the whole clip.
How much does Happy Horse Reference-to-Video cost on Runflow?
$0.14 per second of output at 720p, and $0.28 per second at 1080p. The number of reference images doesn't change the price — same flat per-second rate from 1 to 9 references.
How do I reference my images in the prompt?
List your images in `image_urls` (1-9 entries). In the prompt, refer to them as `character1`, `character2`, …, in the same order. Example: with two images, prompt `A dance battle between character1 and character2, neon city background, cinematic camera`.
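A small pre-flight check can catch a prompt that names a character with no matching image. This sketch assumes the request carries the prompt under a `prompt` key next to `image_urls`; only `image_urls` is named on this page, and `build_inputs`, `referenced_indices`, and the example URLs in the usage note are hypothetical.

```python
import re

# Matches character1, character2, ... as whole words and captures the index.
CHARACTER_TOKEN = re.compile(r"\bcharacter(\d+)\b")


def referenced_indices(prompt):
    """Return the sorted, de-duplicated character indices used in the prompt."""
    return sorted({int(n) for n in CHARACTER_TOKEN.findall(prompt)})


def build_inputs(prompt, image_urls):
    """Validate prompt references against image_urls and return an input dict."""
    if not 1 <= len(image_urls) <= 9:
        raise ValueError("image_urls must contain 1-9 entries")
    # characterN refers to the Nth entry of image_urls (1-based).
    out_of_range = [
        i for i in referenced_indices(prompt) if not 1 <= i <= len(image_urls)
    ]
    if out_of_range:
        raise ValueError(f"prompt references missing images: {out_of_range}")
    # "prompt" as a field name is an assumption; "image_urls" is from the page.
    return {"prompt": prompt, "image_urls": list(image_urls)}
```

For example, calling `build_inputs` with the dance-battle prompt above and two URLs such as `https://example.com/dancer-a.png` and `https://example.com/dancer-b.png` passes, while a prompt mentioning `character3` with the same two images raises a `ValueError`.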
What reference image quality is required?
JPEG, JPG, PNG, or WEBP up to 10MB each. Shortest side must be at least 400 pixels — 720p or higher is recommended for clean identity preservation. Cluttered or low-resolution refs reduce subject consistency.
Can I use Happy Horse output commercially?
Yes — all output via Runflow is licensed for commercial use, including ads and products with branded subjects.
Do I need to manage GPUs or infrastructure?
No. Runflow handles GPU provisioning, queueing, multi-region routing, and scaling. One HTTP call returns a video URL.

Related models

  • **Happy Horse Video Edit**: edits video through natural-language instructions, applying local or global changes to video elements with up to 5 reference images.
  • **Happy Horse Image-to-Video**: Alibaba's #1-ranked Happy Horse 1.0. Generates 1080p video with synchronized native audio and multilingual lip-sync from text prompts or images.
  • **Happy Horse Text-to-Video**: generates 1080p video with synchronized native audio from a text prompt. Aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4. Duration: 3–15s.
