We're hiring · Sentinel

Machine Learning Engineer.

Help build the AI quality scoring layer Runflow runs in front of every image and video we ship. Hands-on ML, in production tomorrow, at scale.

Remote · global

Full-time / contract

Senior IC + equity


sentinel · evaluate · live


01 · router

prompt:

“professional headshot, navy suit, neutral background”

decision:

use_case: headshot
analyzers: 3
criteria: 3

02 · analyzers

Subject · running

Layout · running

Surface · running

03 · criteria → score

Identity preserved

Composition

Surface quality

sentinel score

pass · ship

The role

Sentinel is the AI quality scoring layer Runflow runs in front of every image and video we ship. It decides what's good enough to deliver, what needs to be fixed, and what to throw away. It's how we keep output quality predictable at the volumes our customers run.

We're hiring a Machine Learning Engineer to push that scoring layer forward and extend it across new content types. The work is hands-on ML applied to a real product. The model you train today is in production tomorrow, scoring real customer outputs.

If you've trained, fine-tuned, and quantised small specialised models, built eval pipelines that someone actually relied on, and have strong opinions on when an LLM is the right tool vs. when a sub-100ms specialised model is, this is for you.

Quick facts

Team: Sentinel

Reports to: Head of AI

Location: Remote · global

Type: Full-time or contract

Compensation: Senior IC + equity

Start: ASAP

Representative projects

What you'd ship in your first 6 months.

Concrete, shippable, on the live roadmap. Not hypothetical. We expect every one of these to land.

Project 01

Build the eval harness that catches regressions before they ship

Labelled samples per use case (good + bad), CI integration, regression alerts on every prompt change or model swap. Make Sentinel itself testable.
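
As a rough illustration of what such a harness checks, the sketch below compares a scorer's pass/fail decisions against labelled good and bad samples per use case, then flags any use case whose accuracy falls below a stored baseline. Everything in it is hypothetical: `Sample`, `accuracy_by_use_case`, `find_regressions`, the 0.5 pass threshold, and the tolerance are illustrative names and values, not Sentinel's actual API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Sample:
    use_case: str   # e.g. "headshot" or "product"
    output_id: str  # hypothetical identifier of a generated output
    is_good: bool   # human label: should this output ship?


def accuracy_by_use_case(
    samples: List[Sample],
    scorer: Callable[[Sample], float],
    threshold: float = 0.5,  # assumed pass threshold, for illustration only
) -> Dict[str, float]:
    """Fraction of samples per use case where pass/fail matches the label."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for sample in samples:
        predicted_good = scorer(sample) >= threshold
        total[sample.use_case] += 1
        correct[sample.use_case] += int(predicted_good == sample.is_good)
    return {use_case: correct[use_case] / total[use_case] for use_case in total}


def find_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.01,
) -> List[str]:
    """Use cases where the candidate scores worse than baseline beyond tolerance."""
    return sorted(
        use_case
        for use_case, base_accuracy in baseline.items()
        if candidate.get(use_case, 0.0) < base_accuracy - tolerance
    )
```

In a CI job, a non-empty result from `find_regressions` after a prompt change or model swap would fail the build and trigger the alert.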

Project 02

Find Sentinel's blind spots, then add the analyzers that close them

Audit where Sentinel passes outputs that customers later flag, and where it rejects outputs that were actually fine. Pick the highest-impact gap, design the new analyzer (researching the best embeddings model for human-identity comparisons, for example), train, quantise, ship into the pipeline.

Project 03

Extend Sentinel from images to video, scoring temporal consistency on Seedance and Kling

Define the criteria that matter for video (identity drift, motion artifacts, scene continuity, lip sync where it applies). Build the analyzers. Ship it. Sentinel scores video by quarter-end.

Responsibilities

What you'll be doing.

You'll own the full lifecycle, from prototype to production at scale.

Architect the evaluation pipeline (router, preprocessors, judges) for image and video

Train, fine-tune, and quantise specialised neural networks (face, pose, segmentation, OCR, embeddings, depth)

Design dynamic rubrics that adapt to use case: headshot vs. product vs. fashion vs. video

Spawn LLM judges per criterion. Decide model, prompt, complexity tier

Extend Sentinel from images to video: temporal consistency, motion, identity drift, lip sync

Build the eval harness: labelled samples per use case, CI regression alerts

Push the LLM router toward the cost / quality / latency frontier (Lite / Flash / Pro tiers)

Co-design the auto-fix loop with workflow engineers: generate → evaluate → fix → re-evaluate
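
For a sense of the shape of that last loop, here is a minimal sketch of generate → evaluate → fix → re-evaluate with a bounded number of fix attempts. The function name `auto_fix_loop`, the callables it takes, the 0.8 pass score, and the limit of two fixes are all assumptions made for illustration, not how Sentinel is actually implemented.

```python
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def auto_fix_loop(
    generate: Callable[[], T],
    evaluate: Callable[[T], float],
    fix: Callable[[T], T],
    pass_score: float = 0.8,  # assumed ship threshold
    max_fixes: int = 2,       # assumed cap so the loop always terminates
) -> Tuple[Optional[T], float, int]:
    """Return (output or None if rejected, last score, number of fixes applied)."""
    output = generate()
    score = evaluate(output)
    fixes = 0
    # Keep fixing and re-evaluating until the output passes or the budget runs out.
    while score < pass_score and fixes < max_fixes:
        output = fix(output)
        score = evaluate(output)
        fixes += 1
    if score >= pass_score:
        return output, score, fixes
    return None, score, fixes  # still below threshold: throw it away
```

Capping the number of fixes is one way to keep cost and latency bounded; what happens to a rejected output (retry, fallback, or discard) is a product decision this sketch leaves open.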

Requirements

About you.

You may be a good fit if…

You've trained or fine-tuned a small neural network end-to-end: datasets, loss curves, the works
You've shipped quantised inference (INT4/INT8) and know what trade-offs are acceptable in production
You've designed an evaluation pipeline for any generative task and made it run reliably
You have strong opinions on when to use an LLM vs. a specialised model, and can defend them
You're comfortable shipping in TypeScript even if Python is your home; the system spans both
You read recent ML papers (LLM-as-judge, eval design, video gen) and can summarise them in 3 sentences

Strong candidates also have…

Published an evaluation framework or LLM-as-judge benchmark
Worked on AI video generation or video evaluation specifically
Hands-on experience with Qwen3-VL, SAM 3.1, OpenPose, Gemini Embedding 2, ArcFace, or similar specialised CV / multimodal models
Built a model router or orchestrator for multi-model production systems
Shipped open-source ML infrastructure or contributed to one of the libraries above

What we're not looking for

A specific number of years. A specific degree. A specific stack. We hire on whether you can ship end-to-end and whether you have taste. Everything else is noise.

Tech stack

What we work with.

Backend

Node.js · Python · PostgreSQL · Redis

ML

ComfyUI · PyTorch · diffusers · transformers · ONNX · TensorRT

Models

Qwen3-VL · SAM 3.1 · OpenPose · Gemini Embedding 2 · ArcFace

LLMs

Gemini 3.1 · Opus 4.7 Thinking · Muse Spark · Qwen 3.5 · Kimi k2.6

How we hire

Five steps. Decision speed is part of the offer.

We move fast because senior candidates with multiple offers reward the team that respects their time. Whole loop fits in two weeks.

01/5

Application submitted

Form, ~5 min

We read every word of every application. No silent rejections, ever.

Triaged within 5 business days


02/5

Take-home challenge

2 hours max, your time

A small, real Runflow problem. We score the prompts and decisions you made as much as the output itself.

Reviewed within 3 business days

03/5

30 minutes with the CTO

30 min, live

Quick conversation about your take-home, the team, and how you work day-to-day.

Decision in 48 hours

04/5

Paid challenge, 2 days

2 days, we pay for your time

A bigger problem we pay you to work through. We care about outcomes, not process. AI tools more than welcome.

Decision in 48 hours

05/5

Closing interview

90 min, live

Casual chat with the two founders.

Decision in 24 hours

Show, don't tell.

We value proof over promises. When you apply, include examples. Things that stand out:

A small model you trained, fine-tuned, or quantised: repo + the eval that proves it works

An evaluation rubric you've designed for a real problem, with the failure modes you caught

Any work on video generation, video eval, or temporal modelling

A multi-model system or router you've built: when each model fires, and why

Apply

Ready to ship?

Five minutes. We read every word. Yes / no / not-now within 5 business days, always.

You're applying for Machine Learning Engineer · Sentinel

Name

Location

Optional: city, country, time zone

Links

At least one: LinkedIn, GitHub, portfolio, Loom, anything.

Show us 1–3 things you've built that prove you can ship end-to-end.

Repos, posts, demos, anything. Specifics beat resume bullets.


What's a problem you solved with AI as the execution layer, not just chat?

We care that you reach for AI as a default tool, not occasionally.


Show us a product you find beautiful. Why?

Taste is non-negotiable. We want your real opinions.


Which AI tools are in your daily workflow?

Pick all you actively use, not just tried.

How much is AI helping you these days?

A sentence or two. Skeptical, force multiplier, or somewhere in between: where it shines, where it doesn't.


Share a Claude Code session that shipped a feature end-to-end.

Optional but high-signal. We want to see how you actually work with AI: the prompts, the back-and-forth, the corrections. Whole session, snippet, Loom, gist, anything.


When have you most successfully hacked a non-computer system to your advantage?

Optional. Bureaucracy, supply chain, a conference badge, social dynamics, anything. We collect signal here.


Most recent role + company

Just the title + company name. We don't need a CV.


When could you start?

Salary expectation

Confidential. A range is fine.


Describe an evaluation pipeline you've built or a smaller model you've fine-tuned or quantised. What was the hardest part, and what would you do differently?

Specifics. Datasets, metrics, infra, the bug that took two weeks.


Anything else?

Optional. A short note, a question, a relevant detail we missed.


We commit to a yes / no / not-now within 5 business days. Always. No silent rejections.

Other open roles

Maybe one of these instead.

We hire builders, not roles. If none fit exactly, the open application is at the bottom.

Workflows · Remote · global

ComfyUI Engineer

Own the workflows that power millions of AI images every month. You think in nodes, you debug in graphs, you care obsessively about output quality.

View role

Infrastructure · Remote · global

DevOps Engineer

Build the infrastructure that powers the next generation of AI media. GPU fleet, warm workers, queues, FastAPI services, in production tomorrow.

View role

DevRel · Remote · global

DevRel Engineer

Ship cool things on top of Runflow, bring creators in, drive adoption of the next generation of AI image + video infrastructure.

View role

Open · You define it

Open application

Don't see your role? Tell us what you'd build at Runflow and how it moves the mission forward.

Apply openly
