Happy Horse Video Edit edits an existing video from natural-language instructions. Edits can be local or global, and can draw on up to 5 reference images.
Pricing
| Criteria | Price | What $1 buys |
|---|---|---|
| Per second of edited video (720p baseline) | $0.28 per second of video | 3s of video |
| 1080p | $0.56 per second of video | 1s of video |
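The "for $1" figures follow from the per-second rates listed under Tech specs ($0.28/s at 720p, $0.56/s at 1080p) by rounding down to whole seconds. A minimal sketch of that arithmetic follows. The function names are illustrative and not part of any Happy Horse API, and how billed seconds are counted is left as an input because this page does not spell it out.

```python
# Per-second rates in US cents, taken from the Tech specs list on this page.
RATE_CENTS_PER_SECOND = {"720p": 28, "1080p": 56}


def whole_seconds_per_dollar(resolution: str) -> int:
    """Whole seconds of edited video that one US dollar covers."""
    return 100 // RATE_CENTS_PER_SECOND[resolution]


def cost_in_dollars(billed_seconds: int, resolution: str) -> float:
    """Cost of a given number of billed seconds at the listed rate."""
    return billed_seconds * RATE_CENTS_PER_SECOND[resolution] / 100


print(whole_seconds_per_dollar("720p"))   # 3
print(whole_seconds_per_dollar("1080p"))  # 1
print(cost_in_dollars(10, "720p"))        # 2.8
```

Working in integer cents keeps the division exact; the conversion to dollars happens only at the end.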
Overview
Happy Horse Video Edit transforms an existing clip with natural-language instructions — no masks, no keyframes, no manual rotoscoping. Describe the change you want (recolor a sky, swap a season, restyle the whole scene); attach up to 5 reference images for visual anchoring; get back a re-rendered clip that preserves the source's motion and aspect ratio.
Key capabilities
- Prompt-driven editing: "recolor the sky to a deep purple sunset," "convert to film noir," "add cherry blossoms throughout"; global or local edits in one shot
- Reference image support: include up to 5 reference images and refer to them as `@Image1`-`@Image5` in your prompt for visual fidelity
- Source-faithful: aspect ratio is preserved; output duration matches input (inputs longer than 15s are truncated to the first 15s)
- Audio control: keep, replace, or strip the source audio via the `audio_setting` parameter
- Wide input range: MP4/MOV (H.264 recommended), 3-60s, ≤2160px long side, ≥320px short side, >8 fps, ≤100MB
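To illustrate how the prompt, the `@Image1`-`@Image5` tags, and `audio_setting` fit together, here is a rough sketch of assembling a request body. Everything except the `@ImageN` tag syntax, the 5-image limit, and the `audio_setting` parameter name is an assumption: the field names `video_url`, `prompt`, and `reference_images`, and the helper `build_edit_request` itself, are hypothetical and should be checked against the actual API reference.

```python
import re

MAX_REFERENCE_IMAGES = 5  # limit stated in the capability list


def build_edit_request(video_url, prompt, reference_images=(), audio_setting=None):
    """Assemble a hypothetical request body; field names are illustrative only."""
    if len(reference_images) > MAX_REFERENCE_IMAGES:
        raise ValueError("at most 5 reference images are supported")
    # Every @ImageN tag in the prompt must point at an attached image.
    for match in re.finditer(r"@Image(\d+)", prompt):
        index = int(match.group(1))
        if not 1 <= index <= len(reference_images):
            raise ValueError(f"@Image{index} has no matching reference image")
    body = {
        "video_url": video_url,
        "prompt": prompt,
        "reference_images": list(reference_images),
    }
    if audio_setting is not None:
        # Accepted values are not listed on this page, so none are assumed here.
        body["audio_setting"] = audio_setting
    return body


request = build_edit_request(
    "https://example.com/clip.mp4",
    "restyle the jacket to match @Image1",
    reference_images=["https://example.com/jacket.png"],
)
```

Checking the tags before sending catches a prompt that mentions `@Image3` when only two images are attached, which would otherwise only surface as a failed or off-target edit.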
Family
Part of the Happy Horse family. Pair it with the other variants when you need a different starting modality:
| Variant | Input | Use it for |
|---|---|---|
| Text-to-Video | text prompt | one-shot clips from a brief |
| Image-to-Video | image + optional prompt | animating a still or hero shot |
| Video Edit | source video + edit prompt | transforming an existing clip (style, scene swap) |
| Reference-to-Video | text + 1-9 references | multi-character scenes, brand-consistent subjects |
Tech specs
- Resolutions: 720p, 1080p
- Source video: MP4/MOV, 3-60s, ≤2160px long side, ≥320px short side, >8 fps, ≤100MB
- Output duration: matches input, capped at 15s
- References: up to 5 images, callable as `@Image1`-`@Image5` in the prompt
- Audio: configurable via `audio_setting`
- Latency: 90-240s typical (longer than text-to-video due to source decode + re-render)
- Pricing: $0.28/s at 720p, $0.56/s at 1080p; input and output seconds are billed together under simple per-second billing
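The source-video limits above can be checked locally before uploading. The sketch below encodes them as a pre-flight check and computes the expected output duration. The `SourceVideo` type and both function names are hypothetical, and the metadata is assumed to come from a probe tool of your choice.

```python
from dataclasses import dataclass

MAX_OUTPUT_SECONDS = 15  # output duration cap from the spec list


@dataclass
class SourceVideo:
    container: str     # e.g. "mp4" or "mov"
    duration_s: float
    width: int
    height: int
    fps: float
    size_mb: float


def check_source(video: SourceVideo) -> list[str]:
    """Return spec violations; an empty list means the clip fits the listed limits."""
    problems = []
    if video.container.lower() not in {"mp4", "mov"}:
        problems.append("container must be MP4 or MOV")
    if not 3 <= video.duration_s <= 60:
        problems.append("duration must be 3-60s")
    if max(video.width, video.height) > 2160:
        problems.append("long side must be <= 2160px")
    if min(video.width, video.height) < 320:
        problems.append("short side must be >= 320px")
    if not video.fps > 8:
        problems.append("frame rate must be > 8 fps")
    if video.size_mb > 100:
        problems.append("file must be <= 100MB")
    return problems


def output_seconds(video: SourceVideo) -> float:
    """Output matches the input duration, truncated to the first 15s."""
    return min(video.duration_s, MAX_OUTPUT_SECONDS)


clip = SourceVideo("mp4", 42.0, 1920, 1080, 24.0, 80.0)
print(check_source(clip))    # []
print(output_seconds(clip))  # 15
```

Returning a list of violations rather than raising on the first one lets a caller report every problem with a clip in a single pass.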
Related models
Happy Horse Text-to-Video
alibaba/happy-horse/text-to-video
Generate 1080p video with synchronized native audio from a text prompt. Aspect r...
Happy Horse Image-to-Video
alibaba/happy-horse/image-to-video
Alibaba's #1-ranked Happy Horse 1.0 — generate 1080p video with synchronized nat...
Happy Horse Reference-to-Video
alibaba/happy-horse/reference-to-video
Generate 1080p video with synchronized native audio from a text prompt and refer...