Runflow
alibaba/happy-horse/video-edit

Happy Horse Video Edit edits an existing video from natural-language instructions. Edits can be local or global, and up to 5 reference images can be supplied to guide them.

$0.28/second 2026-04-28

Pricing

| Criteria | Price | Value |
| --- | --- | --- |
| Per second of edited video (720p baseline) | $0.28 per second of video | 3s of video for $1 |
| 1080p | $0.56 per second of video | 1s of video for $1 |
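
The per-second rates above reduce to simple arithmetic. The following is a minimal sketch, not an official calculator: the function names are hypothetical, and how many seconds the service actually bills for a given request is left to the caller, since this page does not spell that out.

```python
from decimal import Decimal, ROUND_DOWN

# Per-second rates in USD, as listed on this page.
RATES = {"720p": Decimal("0.28"), "1080p": Decimal("0.56")}


def estimate_cost(billed_seconds, resolution="720p"):
    """Return the estimated charge in USD for a number of billed seconds.

    Hypothetical helper: the caller decides how many seconds are billed.
    """
    if resolution not in RATES:
        raise ValueError(f"unsupported resolution: {resolution}")
    if billed_seconds < 0:
        raise ValueError("billed_seconds must be non-negative")
    return RATES[resolution] * Decimal(str(billed_seconds))


def whole_seconds_per_dollar(resolution):
    """Return how many whole seconds of video $1 buys at the given rate."""
    ratio = Decimal(1) / RATES[resolution]
    return int(ratio.to_integral_value(rounding=ROUND_DOWN))


print(estimate_cost(10, "720p"))         # 10 s at the 720p rate
print(whole_seconds_per_dollar("720p"))  # whole seconds per dollar at 720p
```

`Decimal` is used instead of `float` so that currency amounts do not pick up binary rounding error.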

Overview

Happy Horse Video Edit transforms an existing clip with natural-language instructions — no masks, no keyframes, no manual rotoscoping. Describe the change you want (recolor a sky, swap a season, restyle the whole scene); attach up to 5 reference images for visual anchoring; get back a re-rendered clip that preserves the source's motion and aspect ratio.

Key capabilities

  • Prompt-driven editing: "recolor the sky to a deep purple sunset," "convert to film noir," "add cherry blossoms throughout" — global or local edits in one shot
  • Reference image support: include up to 5 reference images and refer to them as `@Image1`-`@Image5` in your prompt for visual fidelity
  • Source-faithful: aspect ratio is preserved; output duration matches input (inputs longer than 15s are truncated to the first 15s)
  • Audio control: keep, replace, or strip the source audio via the `audio_setting` parameter
  • Wide input range: MP4/MOV (H.264 recommended), 3-60s, ≤2160px long side, ≥320px short side, >8 fps, ≤100MB
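
The input limits listed above can be checked locally before uploading a clip. This is a rough sketch under stated assumptions: `SourceVideo`, `validate_source`, and `output_duration` are hypothetical names, the metadata is assumed to have been read already (for example with a media probing tool), and "MB" is taken at face value without distinguishing MB from MiB. It does not check the codec, since H.264 is only recommended.

```python
from dataclasses import dataclass

MAX_OUTPUT_SECONDS = 15  # output is capped at the first 15 s of the source


@dataclass
class SourceVideo:
    """Metadata of a source clip (hypothetical container for this sketch)."""
    container: str     # "mp4" or "mov"
    duration_s: float  # clip length in seconds
    width: int         # pixels
    height: int        # pixels
    fps: float         # frames per second
    size_mb: float     # file size in megabytes


def validate_source(v):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if v.container.lower() not in {"mp4", "mov"}:
        problems.append("container must be MP4 or MOV")
    if not 3 <= v.duration_s <= 60:
        problems.append("duration must be between 3 and 60 seconds")
    if max(v.width, v.height) > 2160:
        problems.append("long side must be at most 2160 px")
    if min(v.width, v.height) < 320:
        problems.append("short side must be at least 320 px")
    if v.fps <= 8:
        problems.append("frame rate must be above 8 fps")
    if v.size_mb > 100:
        problems.append("file size must be at most 100 MB")
    return problems


def output_duration(v):
    """Expected output length: same as the input, capped at 15 seconds."""
    return min(v.duration_s, MAX_OUTPUT_SECONDS)


clip = SourceVideo("mp4", 20.0, 1920, 1080, 24.0, 50.0)
print(validate_source(clip), output_duration(clip))
```

Returning every problem at once, rather than raising on the first, lets a caller report all violations in a single pass.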

Family

Part of the Happy Horse family — pair with the variants when you need a different starting modality:

| Variant | Input | Use it for |
| --- | --- | --- |
| Text-to-Video | text prompt | one-shot clips from a brief |
| Image-to-Video | image + optional prompt | animating a still or hero shot |
| Video Edit | source video + edit prompt | transforming an existing clip (style, scene swap) |
| Reference-to-Video | text + 1-9 references | multi-character scenes, brand-consistent subjects |

Tech specs

  • Resolutions: 720p, 1080p
  • Source video: MP4/MOV, 3-60s, ≤2160px long side, ≥320px short side, >8 fps, ≤100MB
  • Output duration: matches input, capped at 15s
  • References: up to 5 images, callable as `@Image1`-`@Image5` in the prompt
  • Audio: configurable via `audio_setting`
  • Latency: 90-240s typical (longer than text-to-video due to source decode + re-render)
  • Pricing: $0.28/s at 720p, $0.56/s at 1080p; simple per-second billing, with input and output seconds billed together
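
Because reference images are addressed as `@Image1`-`@Image5` inside the prompt, it can help to confirm that every token points at an image that was actually attached. The sketch below assumes tokens are numbered from 1 in attachment order and treats a dangling token as a mistake; this page does not say how the service itself handles that case, and `check_reference_tokens` is a hypothetical helper.

```python
import re

MAX_REFERENCES = 5  # this page allows up to 5 reference images
_TOKEN = re.compile(r"@Image(\d+)")


def check_reference_tokens(prompt, num_images):
    """Return the sorted token numbers in `prompt` that have no matching image.

    Assumes (not confirmed by this page) that @ImageN refers to the N-th
    attached reference image, counting from 1.
    """
    if not 0 <= num_images <= MAX_REFERENCES:
        raise ValueError(f"expected 0 to {MAX_REFERENCES} reference images")
    used = sorted({int(n) for n in _TOKEN.findall(prompt)})
    return [n for n in used if not 1 <= n <= num_images]


prompt = "Dress the rider in the jacket from @Image1 and match the sky in @Image3"
print(check_reference_tokens(prompt, num_images=2))
```

An empty list means every token in the prompt maps to an attached image.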