
ComfyUI API: The Complete Developer's Guide (2026)

Every ComfyUI API endpoint, the /prompt deep dive, image upload flow, WebSocket tracking, and the production patterns - queueing, candidate-pool generation, multi-provider routing - that separate prototypes from products at scale.

Miguel Rasero
CTO & Co-Founder

ComfyUI is usually introduced as a node-based UI for Stable Diffusion. That framing undersells it. Underneath the canvas is a fully-featured HTTP and WebSocket server that can act as the image and video generation backend for any application you want to build.

This guide is written for developers and technical founders who want to treat ComfyUI as infrastructure - not a desktop app. It covers every endpoint, the exact request and response shapes, image uploads, real-time execution tracking, production patterns, and the failure modes that break integrations once you move past localhost.

One idea to carry through the whole piece: the model is not the product. The workflow around the model is. The API is how you get to that workflow. Everything interesting happens in the layers on top.

By the end, you’ll have everything you need to go from a local workflow to a production API that other services can call reliably.

What Is the ComfyUI API?

The ComfyUI API is the HTTP and WebSocket interface exposed by the ComfyUI server that lets external applications submit generation workflows, upload inputs, track execution in real time, and retrieve outputs programmatically - without using the web UI.

When you drag nodes on the ComfyUI canvas and click “Queue Prompt,” the frontend serializes your workflow into JSON and POSTs it to an endpoint called /prompt. That same endpoint is available to any HTTP client. Everything the UI does, the API can do. The UI is just one of many possible clients.

This is the mental model that unlocks everything:

  • ComfyUI is a workflow engine with a server wrapped around it.
  • The canvas is a frontend. So is curl. So is your Python backend. So is a Next.js app calling it from a serverless function.
  • Workflows are just JSON. Anything you can build visually, you can submit programmatically.

Once that clicks, ComfyUI stops being a tool and starts being infrastructure. You can put it behind a queue, scale it horizontally, expose it to customers, or use it as the generation layer inside a larger product.

Why developers use the ComfyUI API over hosted alternatives

Hosted image APIs like Replicate, Fal, and Stability’s cloud are faster to start with but give up control. The ComfyUI API trades setup cost for four things that matter at scale:

  1. Custom workflows. Any combination of models, LoRAs, ControlNets, upscalers, and custom nodes that runs on the canvas runs through the API. You are not limited to what a vendor exposes.
  2. Cost. Self-hosted GPU inference on RunPod, Lambda, or your own hardware is 3–10x cheaper per image than per-call pricing once volume is consistent.
  3. IP and data control. Workflows, models, and inputs stay inside your infrastructure. For regulated use cases (healthcare, legal, regulated creative work), this is often non-negotiable.
  4. Workflows as first-class artifacts. A workflow JSON is something you can version, diff, promote from dev to staging to prod, and roll back. Most hosted APIs treat “the model” as the unit of work. Treating “the workflow” as the unit of work is what actually maps to how teams build.

The tradeoff: you own the uptime, scaling, queueing, auth, and observability. The rest of this guide is about making those tradeoffs deliberate, not accidental.

How the ComfyUI REST API Works

The ComfyUI REST API is an HTTP interface built on aiohttp that runs on port 8188 by default. It exposes endpoints for submitting workflows, managing the queue, uploading files, retrieving results, and inspecting the node catalogue.

The architecture is simple:

Client  →  HTTP POST /prompt  →  PromptServer  →  Execution Queue  →  PromptExecutor  →  Output files
                                       ↓
Client  ←  WebSocket /ws      ←  Status events (real-time)
Client  ←  HTTP GET /history  ←  Completed results (poll)
Client  ←  HTTP GET /view     ←  Raw image bytes

Three things to understand before writing any integration code:

1. ComfyUI uses "workflow" and "prompt" interchangeably in the API. What the UI calls a workflow is submitted as a prompt field. Do not confuse this with the text prompt inside a CLIP node. The API's prompt = the entire node graph as JSON.

2. There are two JSON formats for workflows. The regular workflow JSON (what you see in the UI's save/load) is not what the API accepts. You need the API format, which strips positional/visual data and keeps only the executable graph. Export it via the ComfyUI UI: Settings → Enable Dev Mode → "Save (API Format)" button. This is the single most common reason first-time integrations fail.

3. Execution is asynchronous. /prompt returns immediately with a prompt_id. You do not get the image in the response. You either poll /history/{prompt_id} or subscribe to the WebSocket for completion events. Build around this from day one.

Starting the server in API mode

ComfyUI always runs in "API mode" - there is no separate flag. Start it normally:

python main.py --listen 0.0.0.0 --port 8188

For production, add:

python main.py \
  --listen 0.0.0.0 \
  --port 8188 \
  --enable-cors-header "*" \
  --tls-keyfile /path/to/key.pem \
  --tls-certfile /path/to/cert.pem

Note what's missing: there is no authentication flag. The default ComfyUI server exposes every endpoint publicly. We'll cover how to fix that in the security section below.

Complete List of ComfyUI API Endpoints

Below is the full set of ComfyUI API endpoints developers actually use in production. These are defined in server.py in the main ComfyUI repository and are stable across recent releases.

Workflow execution

Endpoint | Method | Purpose
/prompt | POST | Submit a workflow for execution. Returns prompt_id.
/prompt | GET | Get current queue state (pending + running).
/queue | GET | Detailed queue view (running + pending items).
/queue | POST | Delete items from the queue or clear it entirely.
/interrupt | POST | Cancel the currently executing workflow.
/history | GET | Full execution history.
/history/{prompt_id} | GET | Results and metadata for a specific prompt.
/history | POST | Clear the history.

File operations

Endpoint | Method | Purpose
/upload/image | POST | Upload an image to the input directory.
/upload/mask | POST | Upload a mask associated with an image.
/view | GET | Retrieve an image by filename, subfolder, and type.

System and introspection

Endpoint | Method | Purpose
/object_info | GET | Full node catalogue - every node class, its inputs, outputs, and defaults.
/object_info/{node_class} | GET | Schema for a single node class.
/system_stats | GET | Server info: Python version, CUDA, VRAM, device list.
/embeddings | GET | List installed text embeddings.
/extensions | GET | List loaded custom node extensions.
/models/{type} | GET | List available models of a given type (checkpoints, LoRAs, VAE, etc.).
/free | POST | Free VRAM - unload models and clear cache.

Real-time communication

Endpoint | Protocol | Purpose
/ws | WebSocket | Bidirectional channel for execution status, node progress, and preview images.

Two endpoints do the heavy lifting: /prompt (what you submit) and /history/{prompt_id} (what you get back). /upload/image matters any time your workflow takes an image input. /ws matters when you need real-time feedback. Everything else is supporting infrastructure.

Deep Dive: The /prompt Endpoint

The ComfyUI /prompt endpoint is the primary entry point for executing workflows. It accepts a POST request containing the workflow graph in API format and returns a prompt_id that identifies the execution.

This is the endpoint you'll spend 80% of your integration time against. Getting its request and response shapes right is the difference between a working API and a debugging loop.

Request structure

POST /prompt HTTP/1.1
Host: localhost:8188
Content-Type: application/json

{
  "prompt": { ... workflow API JSON ... },
  "client_id": "optional-uuid-for-websocket-correlation",
  "extra_data": {
    "extra_pnginfo": { ... optional metadata embedded in output PNGs ... }
  },
  "front": false,
  "number": 0
}

Field by field:

  • prompt (required): The workflow in API format. Keys are node IDs (strings like "3", "4"). Values contain class_type and inputs.
  • client_id (optional but recommended): A UUID you generate. Attach the same UUID to your WebSocket connection and ComfyUI will route status events back to you specifically instead of broadcasting.
  • extra_data (optional): Arbitrary metadata. Useful for embedding workflow info into saved PNGs.
  • front (optional): If true, inserts the job at the front of the queue. Useful for priority traffic.
  • number (optional): Execution priority number.

Response structure

On success:

{
  "prompt_id": "a3f9e2b1-4c5d-4e6f-8a9b-0c1d2e3f4a5b",
  "number": 3,
  "node_errors": {}
}
  • prompt_id: Use this to fetch results from /history/{prompt_id}.
  • number: Position in the queue.
  • node_errors: Empty object on success.

On validation failure:

{
  "error": {
    "type": "prompt_outputs_failed_validation",
    "message": "Prompt outputs failed validation",
    "details": "",
    "extra_info": {}
  },
  "node_errors": {
    "4": {
      "errors": [
        {
          "type": "value_not_in_list",
          "message": "Value not in list: ckpt_name: 'nonexistent.safetensors'",
          "details": "...",
          "extra_info": {}
        }
      ],
      "dependent_outputs": ["9"],
      "class_type": "CheckpointLoaderSimple"
    }
  }
}

Validation runs before anything enters the queue. If any node references a missing model, invalid input type, or unknown node class, you get a 400 back with node_errors keyed by node ID. This is good: catch these errors and surface them to clients instead of treating them as generic failures.

The custom-inputs problem nobody warns you about

Here's the part most "ComfyUI API 101" posts skip: mapping your application's inputs to specific nodes in the workflow JSON is the hardest part of a real integration. The founder of a competing ComfyUI deployment tool put it bluntly in an HN thread: "The most annoying portion of the Comfy API is dealing with custom inputs."

The problem: your app wants to say "generate an image with this prompt and this reference photo." The ComfyUI API wants you to know that the prompt lives at workflow["6"]["inputs"]["text"], the reference lives at workflow["10"]["inputs"]["image"], and those node IDs can change any time a designer touches the workflow on the canvas.

Three ways teams handle this:

  1. Convention: name your nodes semantically in the UI and look them up by title.
  2. Template wrappers: keep a thin Python/TypeScript layer that knows which node IDs map to which inputs, versioned alongside the workflow JSON.
  3. Input nodes: use dedicated "workflow input" nodes (the pattern adopted by several ComfyUI deployment platforms, e.g. Runflow uses input/output nodes placed directly on the canvas to generate the API contract automatically). This is the cleanest pattern because the designer editing the workflow controls the API surface, and the developer doesn't have to reverse-engineer graph IDs.

Whichever you pick, do it on day one. Graph-ID-string-matching sprinkled through your app is the integration style that collapses the first time someone renames a node.
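
For the convention approach, a minimal lookup sketch - assuming your API-format export includes the _meta.title field that recent ComfyUI versions attach to each node (the node title "positive_prompt" and the user_prompt variable are hypothetical):

def find_node_by_title(workflow: dict, title: str) -> str:
    # Resolve a node ID by the title the designer set on the canvas,
    # instead of hardcoding graph IDs like "6".
    for node_id, node in workflow.items():
        if node.get("_meta", {}).get("title") == title:
            return node_id
    raise KeyError(f"No node titled {title!r} in workflow")

node_id = find_node_by_title(workflow, "positive_prompt")
workflow[node_id]["inputs"]["text"] = user_prompt  # your application's input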

What a workflow API JSON actually looks like

A minimal text-to-image workflow in API format:

{
  "3": {
    "class_type": "KSampler",
    "inputs": {
      "seed": 42,
      "steps": 20,
      "cfg": 7.5,
      "sampler_name": "euler",
      "scheduler": "normal",
      "denoise": 1.0,
      "model": ["4", 0],
      "positive": ["6", 0],
      "negative": ["7", 0],
      "latent_image": ["5", 0]
    }
  },
  "4": {
    "class_type": "CheckpointLoaderSimple",
    "inputs": { "ckpt_name": "sd_xl_base_1.0.safetensors" }
  },
  "5": {
    "class_type": "EmptyLatentImage",
    "inputs": { "width": 1024, "height": 1024, "batch_size": 1 }
  },
  "6": {
    "class_type": "CLIPTextEncode",
    "inputs": { "text": "a futuristic city at dusk, neon signs, rain", "clip": ["4", 1] }
  },
  "7": {
    "class_type": "CLIPTextEncode",
    "inputs": { "text": "blurry, low quality", "clip": ["4", 1] }
  },
  "8": {
    "class_type": "VAEDecode",
    "inputs": { "samples": ["3", 0], "vae": ["4", 2] }
  },
  "9": {
    "class_type": "SaveImage",
    "inputs": { "filename_prefix": "api_output", "images": ["8", 0] }
  }
}

Two things worth noting:

  • ["4", 0] is a graph reference. It means "take output index 0 from node with ID 4." This is how nodes connect.
  • Node IDs are arbitrary strings. In exports they're often numbers, but you can use any string. Keep them stable within a workflow.
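
One patching detail worth knowing before you submit: ComfyUI caches node outputs, so resubmitting an identical graph can come back entirely from cache (the execution_cached WebSocket event covered later). The usual fix is to randomize the seed per request - a sketch using the node IDs from the example above:

import random

workflow["3"]["inputs"]["seed"] = random.randint(0, 2**32 - 1)  # defeats caching
workflow["6"]["inputs"]["text"] = "a quiet harbor at dawn, fog"  # per-request prompt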

Submitting a prompt with curl

curl -X POST http://localhost:8188/prompt \
  -H "Content-Type: application/json" \
  -d '{"prompt": <workflow_json>, "client_id": "'$(uuidgen)'"}'

Submitting a prompt with Python

import json
import uuid
import urllib.request

SERVER = "http://127.0.0.1:8188"
CLIENT_ID = str(uuid.uuid4())

def queue_prompt(workflow: dict) -> str:
    payload = {"prompt": workflow, "client_id": CLIENT_ID}
    req = urllib.request.Request(
        f"{SERVER}/prompt",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["prompt_id"]

with open("workflow_api.json") as f:
    workflow = json.load(f)

prompt_id = queue_prompt(workflow)
print(f"Queued: {prompt_id}")

Uploading Images via the API

The ComfyUI upload image API endpoint is POST /upload/image. It accepts multipart/form-data and stores the file in ComfyUI's input/ directory, making it available to LoadImage nodes in subsequent workflows.

Any workflow that does img2img, ControlNet conditioning, inpainting, face swapping, style transfer, or video reference needs this endpoint. The flow is always the same:

  1. Upload the image → get back the stored filename.
  2. Update your workflow JSON so the LoadImage node references that filename.
  3. Submit the workflow via /prompt.

A detail to flag before we get into the mechanics: in a production pipeline, every image you upload is ultimately a candidate for something - a final deliverable, a reference for further generation, or an input to a scoring pass. Keep tenant isolation in mind from the first request. Use subfolders (we'll show how below) rather than dumping everyone's uploads into the same directory.

Request structure

POST /upload/image HTTP/1.1
Host: localhost:8188
Content-Type: multipart/form-data; boundary=----BOUNDARY

------BOUNDARY
Content-Disposition: form-data; name="image"; filename="input.png"
Content-Type: image/png

<binary data>
------BOUNDARY
Content-Disposition: form-data; name="type"

input
------BOUNDARY
Content-Disposition: form-data; name="subfolder"

my_project
------BOUNDARY
Content-Disposition: form-data; name="overwrite"

true
------BOUNDARY--

Fields:

  • image (required): The file, as a file upload.
  • type (optional): One of input, temp, output. Defaults to input. Use input for assets your workflow will consume.
  • subfolder (optional): Subdirectory inside the type directory. Useful for multi-tenant isolation.
  • overwrite (optional): "true" or "1" to replace an existing file with the same name.

Response

{
  "name": "input.png",
  "subfolder": "my_project",
  "type": "input"
}

Use name when referencing the file from a LoadImage node in the workflow.

A detail most tutorials miss: ComfyUI does hash-based duplicate detection during upload. If you upload the same bytes twice, it returns the existing filename instead of writing a duplicate. This is efficient, but means you shouldn't rely on the upload timestamp to track "new" images.

Uploading and running a workflow in Python

import json
import urllib.request
from requests_toolbelt.multipart.encoder import MultipartEncoder  # pip install requests-toolbelt

def upload_image(file_path: str, subfolder: str = "") -> dict:
    with open(file_path, "rb") as f:
        encoder = MultipartEncoder(
            fields={
                "image": (file_path.split("/")[-1], f, "image/png"),
                "type": "input",
                "subfolder": subfolder,
                "overwrite": "true",
            }
        )
        req = urllib.request.Request(
            f"{SERVER}/upload/image",
            data=encoder.to_string(),
            headers={"Content-Type": encoder.content_type},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())

# 1. Upload
uploaded = upload_image("./photo.png", subfolder="user_42")

# 2. Patch the workflow to point at the uploaded file
workflow["10"]["inputs"]["image"] = uploaded["name"]
# If you used a subfolder, use: f"{uploaded['subfolder']}/{uploaded['name']}"

# 3. Queue
prompt_id = queue_prompt(workflow)

Uploading masks

Inpainting workflows also need a mask. The endpoint is POST /upload/mask and accepts the same fields as /upload/image, plus an original_ref field that points to the image the mask applies to. The mask is composited with the original image's alpha channel, and the original's PNG metadata is preserved.
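
A sketch of the mask flow, assuming the SERVER constant from the earlier Python example and that original_ref is the JSON-encoded response from the original image's upload (the format the UI sends):

import json
import urllib.request
from requests_toolbelt.multipart.encoder import MultipartEncoder

def upload_mask(mask_path: str, original: dict) -> dict:
    # original = the response dict returned by /upload/image for the base image
    with open(mask_path, "rb") as f:
        enc = MultipartEncoder(fields={
            "image": (mask_path.split("/")[-1], f, "image/png"),
            "type": "input",
            "original_ref": json.dumps(original),
        })
        req = urllib.request.Request(
            f"{SERVER}/upload/mask",
            data=enc.to_string(),
            headers={"Content-Type": enc.content_type},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())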

WebSocket vs REST: Real-Time Execution Tracking

ComfyUI exposes both a WebSocket at /ws and a REST history endpoint. Use both, for different reasons.

Rule of thumb:

  • REST (/history/{prompt_id} polling): Best for serverless, cron-style jobs, simple backends, and anything where the client is not long-lived.
  • WebSocket (/ws): Best for user-facing products where you show progress, UIs that need real-time updates, and long-running workflows where 30+ second latency matters.

The WebSocket protocol

Open a connection to ws://<host>:<port>/ws?clientId=<your-uuid>. The clientId must match the client_id you send with /prompt requests. Once connected, the server pushes messages as the queue progresses:

  • status - queue state changes (items added, items finished).
  • execution_start - a specific prompt_id has started.
  • executing - a specific node is now running. When node is null, execution is complete.
  • progress - per-node progress (useful for samplers: current step / total steps).
  • executed - a node has finished and produced outputs.
  • execution_cached - a node was skipped because its inputs were cached.
  • execution_error - something went wrong.

Python WebSocket client - the canonical pattern

import json
import uuid
import websocket
import urllib.request
import urllib.parse

SERVER = "127.0.0.1:8188"
CLIENT_ID = str(uuid.uuid4())

def connect():
    ws = websocket.WebSocket()
    ws.connect(f"ws://{SERVER}/ws?clientId={CLIENT_ID}")
    return ws

def wait_for_completion(ws, prompt_id: str):
    while True:
        msg = ws.recv()
        if isinstance(msg, str):
            data = json.loads(msg)
            if data["type"] == "executing":
                d = data["data"]
                if d["node"] is None and d["prompt_id"] == prompt_id:
                    return  # done
            elif data["type"] == "execution_error":
                raise RuntimeError(data["data"])
        # Binary messages are preview images — ignore or display them.

def get_history(prompt_id: str) -> dict:
    with urllib.request.urlopen(f"http://{SERVER}/history/{prompt_id}") as resp:
        return json.loads(resp.read())[prompt_id]

def get_image(filename: str, subfolder: str, folder_type: str) -> bytes:
    params = urllib.parse.urlencode({
        "filename": filename,
        "subfolder": subfolder,
        "type": folder_type,
    })
    with urllib.request.urlopen(f"http://{SERVER}/view?{params}") as resp:
        return resp.read()

# Full flow
ws = connect()
prompt_id = queue_prompt(workflow)
wait_for_completion(ws, prompt_id)
history = get_history(prompt_id)

for node_id, node_output in history["outputs"].items():
    for img in node_output.get("images", []):
        data = get_image(img["filename"], img["subfolder"], img["type"])
        with open(f"out_{img['filename']}", "wb") as f:
            f.write(data)

This is the 50-line pattern that powers most real ComfyUI integrations. Everything else is scaling, error handling, and authentication layered on top.

When to use polling instead

If you're running ComfyUI behind serverless (Vercel, Cloud Run, Lambda), you often can't hold a WebSocket. Poll /history/{prompt_id} every 2–3 seconds. The prompt will be present in history once execution finishes:

import time

def wait_by_polling(prompt_id: str, timeout: int = 600) -> dict:
    start = time.time()
    while time.time() - start < timeout:
        with urllib.request.urlopen(f"http://{SERVER}/history/{prompt_id}") as resp:
            history = json.loads(resp.read())
        if prompt_id in history:
            return history[prompt_id]
        time.sleep(2)
    raise TimeoutError(f"Prompt {prompt_id} did not complete in {timeout}s")


Downsides: higher latency (up to polling interval), more load on the server, no preview images. Upside: works anywhere HTTP works.

Building a ComfyUI API Integration

A clean ComfyUI API integration has five responsibilities, in order of increasing complexity:

  1. Loading workflow templates - keep workflow JSON in version control.
  2. Parameterizing them - swap in user inputs before submission.
  3. Submitting and tracking - the /prompt + /ws pattern above.
  4. Returning outputs - serve the final images to your application.
  5. Failing cleanly - surface meaningful errors, don't just 500.

Here's a production-shaped integration in ~80 lines of Python:

import json
import uuid
import urllib.request
import urllib.parse
import websocket
from requests_toolbelt.multipart.encoder import MultipartEncoder

class ComfyUIClient:
    def __init__(self, host: str = "127.0.0.1:8188"):
        self.host = host
        self.client_id = str(uuid.uuid4())
        self.ws = None

    def _ws(self):
        if self.ws is None:
            self.ws = websocket.WebSocket()
            self.ws.connect(f"ws://{self.host}/ws?clientId={self.client_id}")
        return self.ws

    def upload_image(self, path: str, subfolder: str = "") -> str:
        with open(path, "rb") as f:
            enc = MultipartEncoder({
                "image": (path.split("/")[-1], f, "image/png"),
                "type": "input",
                "subfolder": subfolder,
                "overwrite": "true",
            })
            req = urllib.request.Request(
                f"http://{self.host}/upload/image",
                data=enc.to_string(),
                headers={"Content-Type": enc.content_type},
            )
            with urllib.request.urlopen(req) as r:
                data = json.loads(r.read())
                return f"{data['subfolder']}/{data['name']}" if data['subfolder'] else data['name']

    def queue(self, workflow: dict) -> str:
        payload = {"prompt": workflow, "client_id": self.client_id}
        req = urllib.request.Request(
            f"http://{self.host}/prompt",
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as r:
            result = json.loads(r.read())
            if result.get("node_errors"):
                raise ValueError(f"Validation failed: {result['node_errors']}")
            return result["prompt_id"]

    def wait(self, prompt_id: str, timeout: int = 600):
        ws = self._ws()
        ws.settimeout(timeout)
        while True:
            msg = ws.recv()
            if not isinstance(msg, str):
                continue
            data = json.loads(msg)
            if data["type"] == "executing":
                d = data["data"]
                if d["node"] is None and d["prompt_id"] == prompt_id:
                    return
            elif data["type"] == "execution_error":
                raise RuntimeError(data["data"])

    def fetch_outputs(self, prompt_id: str) -> list[bytes]:
        with urllib.request.urlopen(f"http://{self.host}/history/{prompt_id}") as r:
            history = json.loads(r.read())[prompt_id]
        images = []
        for node_output in history["outputs"].values():
            for img in node_output.get("images", []):
                q = urllib.parse.urlencode(img)
                with urllib.request.urlopen(f"http://{self.host}/view?{q}") as r:
                    images.append(r.read())
        return images

    def run(self, workflow: dict) -> list[bytes]:
        prompt_id = self.queue(workflow)
        self.wait(prompt_id)
        return self.fetch_outputs(prompt_id)

Usage:

client = ComfyUIClient("your-gpu-host:8188")

# img2img
uploaded_name = client.upload_image("./reference.jpg", subfolder="users/42")
workflow = json.load(open("templates/style_transfer.json"))
workflow["10"]["inputs"]["image"] = uploaded_name
workflow["6"]["inputs"]["text"] = "watercolor, impressionist"

images = client.run(workflow)
with open("result.png", "wb") as f:
    f.write(images[0])

Node / TypeScript equivalent

For Next.js, Remix, or any Node backend:

import WebSocket from "ws";
import { randomUUID } from "crypto";

export class ComfyUIClient {
  private clientId = randomUUID();

  constructor(private host: string = "127.0.0.1:8188") {}

  async queue(workflow: Record<string, unknown>): Promise<string> {
    const res = await fetch(`http://${this.host}/prompt`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ prompt: workflow, client_id: this.clientId }),
    });
    const data = await res.json();
    if (data.node_errors && Object.keys(data.node_errors).length) {
      throw new Error(`Validation failed: ${JSON.stringify(data.node_errors)}`);
    }
    return data.prompt_id;
  }

  async wait(promptId: string, timeoutMs = 600_000): Promise<void> {
    return new Promise((resolve, reject) => {
      const ws = new WebSocket(`ws://${this.host}/ws?clientId=${this.clientId}`);
      const timer = setTimeout(() => {
        ws.close();
        reject(new Error("timeout"));
      }, timeoutMs);
      ws.on("message", (raw) => {
        try {
          const msg = JSON.parse(raw.toString());
          if (msg.type === "executing" && msg.data.node === null && msg.data.prompt_id === promptId) {
            clearTimeout(timer);
            ws.close();
            resolve();
          } else if (msg.type === "execution_error") {
            clearTimeout(timer);
            ws.close();
            reject(new Error(JSON.stringify(msg.data)));
          }
        } catch { /* binary preview frames */ }
      });
      ws.on("error", (e) => { clearTimeout(timer); reject(e); });
    });
  }

  async outputs(promptId: string): Promise<Buffer[]> {
    const res = await fetch(`http://${this.host}/history/${promptId}`);
    const history = (await res.json())[promptId];
    const out: Buffer[] = [];
    for (const node of Object.values<any>(history.outputs)) {
      for (const img of node.images ?? []) {
        const q = new URLSearchParams(img).toString();
        const r = await fetch(`http://${this.host}/view?${q}`);
        out.push(Buffer.from(await r.arrayBuffer()));
      }
    }
    return out;
  }

  async run(workflow: Record<string, unknown>): Promise<Buffer[]> {
    const id = await this.queue(workflow);
    await this.wait(id);
    return this.outputs(id);
  }
}

Production Patterns: Queueing, Scaling, Error Handling

The ComfyUI server is single-process, single-GPU, and executes one prompt at a time. This is fine for prototypes and fatal for production. Five patterns fix it.

1. Put a queue in front of ComfyUI, not behind it

Do not let your application hit ComfyUI directly. Every production integration I've seen eventually converges on this shape:

HTTP API (your service)
    ↓
Message queue (Redis, SQS, RabbitMQ)
    ↓
Worker pool — each worker owns exactly one ComfyUI instance on one GPU
    ↓
Object storage (S3, R2, GCS) for outputs

Your HTTP API writes a job to the queue and returns a job ID. Workers pull jobs, talk to their local ComfyUI on localhost:8188, upload outputs to object storage, and write status to a DB. Clients poll or subscribe for status. A minimal worker loop is sketched after the list below.

This gives you:

  • Backpressure. Spikes queue up instead of crashing ComfyUI.
  • Horizontal scaling. Add GPU workers; they consume from the same queue.
  • Isolation. A bad workflow on one worker doesn't affect others.
  • Retry semantics. Dead letter queues for workflows that fail repeatedly.
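
Here's what that worker loop can look like, sketched with Redis as the queue and the ComfyUIClient class from the integration section (upload_to_object_storage is a placeholder for your own storage code):

import json
import redis  # pip install redis

r = redis.Redis()
client = ComfyUIClient("127.0.0.1:8188")  # this worker's local instance

while True:
    _, raw = r.blpop("generation_jobs")  # block until a job arrives
    job = json.loads(raw)
    try:
        images = client.run(job["workflow"])
        upload_to_object_storage(job["job_id"], images)  # placeholder
        r.hset(f"job:{job['job_id']}", "status", "done")
    except Exception as exc:
        r.hset(f"job:{job['job_id']}", mapping={"status": "failed", "error": str(exc)})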

2. Generate more candidates than you need, then score

This is the pattern that separates hobby projects from products people pay for. Instead of generating one image per user request and hoping it's good, generate N candidates and deliver the top K.

The math works because GPU time is cheap relative to the cost of shipping a bad image. A team I've seen running AI headshots at scale generates 240 candidates per user and delivers only the top 60. The rejected 180 never reach a human. At their per-image cost, the extra candidates add roughly $0.02 per delivered image; a single refund costs 100x that.

def generate_with_selection(prompt: str, num_candidates: int, deliver_top: int):
    # generate_image, random_seed, and score are placeholders for your own
    # generation call (e.g. ComfyUIClient.run) and scoring function.
    candidates = [
        generate_image(prompt=prompt, seed=random_seed())
        for _ in range(num_candidates)
    ]
    scored = [score(img) for img in candidates]
    scored.sort(key=lambda s: s.total, reverse=True)  # best candidates first
    return scored[:deliver_top]

Which brings us to the scoring function - the thing that makes the candidate-pool pattern actually work. Build it across three tiers:

  1. Generic quality: prompt alignment (CLIP similarity), artifact detection, composition, sharpness.
  2. Use-case specific: for headshots, face fidelity, expression, skin-tone consistency; for garments, fabric accuracy and occlusion; for products, background cleanliness.
  3. Business-specific rules: "no visible logos in the background," "eyes must be open," "skin tone within 2 stops of reference."

Score independently per dimension, pass/fail against configurable thresholds, deliver only what passes. Teams running this pattern consistently see customer-support quality complaints drop from the 30–40% range into low single digits. If you don't want to build this from scratch, there are drop-in services for it (Sentinel, internal tools built on CLIP + specialized vision models); the principle is more important than the vendor.
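
A sketch of per-dimension pass/fail scoring - the dimensions and threshold numbers here are illustrative assumptions, not recommendations:

from dataclasses import dataclass

@dataclass
class Score:
    prompt_alignment: float  # e.g. CLIP similarity, 0-1
    face_fidelity: float     # use-case-specific model output, 0-1
    rules_pass: bool         # business rules are a hard gate

    @property
    def total(self) -> float:
        return (self.prompt_alignment + self.face_fidelity) / 2

THRESHOLDS = {"prompt_alignment": 0.28, "face_fidelity": 0.85}  # tune per workflow

def passes(s: Score) -> bool:
    return (s.rules_pass
            and s.prompt_alignment >= THRESHOLDS["prompt_alignment"]
            and s.face_fidelity >= THRESHOLDS["face_fidelity"])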

3. Multi-provider routing, with fallback

When you're at serious volume, hitting a single provider is leaving money and reliability on the table. Provider pricing varies wildly - not just between providers but within the same provider over time (spot pricing, capacity, cold starts during peak hours).

A routing layer checks multiple providers on every request and sends the job to the cheapest one currently available and fast enough, with a fallback chain on failure. A common three-tier setup:

  • Primary: a high-reliability provider (e.g. fal.ai) for latency-sensitive jobs.
  • Cost layer: a cheaper provider (e.g. together.ai, or self-hosted GPUs) for jobs where latency is flexible.
  • Fallback: a known-reliable third (e.g. Replicate) when the first two are down.

def select_provider(model, requirements):
    # get_healthy_providers and fallback_provider stand in for your own
    # provider registry and health checks.
    available = get_healthy_providers(model)
    capable = [p for p in available if p.supports(requirements)]
    capable.sort(key=lambda p: p.current_cost_per_image(requirements))
    return capable[0] if capable else fallback_provider

Routing by cost alone is a mistake. Factor in availability, current queue depth, and error rate. Kill jobs that exceed a latency budget and retry on the next provider. Track actual cost per delivered image, not quoted cost - the two diverge fast once retries and failed generations enter the picture. Teams who do this cleanly see 50–65% cost reductions versus single-provider setups.

4. Warm pools, dynamic containers, and model preloading

ComfyUI loads models lazily - on the first prompt that references them. Cold model loads can take 15–60 seconds for SDXL, longer for Flux or video models. For user-facing workloads this is unacceptable.

The naive fix is static warmup: preload the top N models at worker startup. This works until your workload is diverse enough that the "top N" isn't enough. The more robust pattern, used by every serious ComfyUI deployment platform, is:

  • Dynamic per-user containers. Each worker image is built once per (user, workflow) combination, baking in the exact custom nodes and Python deps the workflow needs. You pay the build cost once, not per job.
  • Lean base + network-mounted model storage. The container itself stays small; models are pulled from fast network storage on first use and cached on the machine.
  • Pre-warmed workloads based on demand prediction. Keep a rolling set of warm workers for your most common workflows so new jobs hit a warm container instead of spinning one up.
  • Explicit VRAM management. For workflows that exceed GPU memory, call POST /free between executions to clear stale models before loading new ones (a minimal call is sketched below).
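
A minimal /free call, assuming current ComfyUI's unload_models and free_memory request flags:

import json
import urllib.request

# Ask ComfyUI to unload models and clear cached VRAM between jobs.
req = urllib.request.Request(
    "http://127.0.0.1:8188/free",
    data=json.dumps({"unload_models": True, "free_memory": True}).encode(),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)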

Don't build this from scratch unless you have to. RunPod Serverless, Modal, and managed ComfyUI platforms (Comfy.ICU, Runflow, Salad, and others) all offer versions of this. Build-vs-buy decision: under 50K images/month with standard workflows, buy; custom nodes + strict latency SLAs + regulated data usually push toward building or toward a dedicated platform.

5. Timeouts, retries, and environment promotion

Set three timeouts explicitly:

  • HTTP timeout on /prompt: Short (5–10s). This only covers submission, not execution.
  • Execution timeout (your wrapper code): Long (5–20 min depending on workflow). Kill via POST /interrupt if exceeded.
  • WebSocket heartbeat: Send pings every 30s. Reconnect on silence.

Retry policy: do not auto-retry failed prompts unless you've classified the error. Validation errors (node_errors) will fail the same way every time. Retry only on network errors, CUDA OOM (with a warmup/model unload first), and transient WebSocket drops.
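
A sketch of that classification, assuming the ValueError raised on node_errors by the queue() method in the client class above:

def should_retry(error: Exception, attempt: int, max_attempts: int = 3) -> bool:
    # Validation failures are deterministic - never retry them.
    if isinstance(error, ValueError):
        return False
    transient = (
        isinstance(error, (ConnectionError, TimeoutError))
        or "out of memory" in str(error).lower()  # OOM: free VRAM first, then retry
    )
    return transient and attempt < max_attempts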

And a pattern that's under-discussed: promote workflows through environments the same way you promote code. A workflow JSON is an artifact. Pin it by hash. Test the pinned hash in staging before promoting to prod. Keep rollback one click away. Teams who skip this find themselves debugging production on Friday night when a designer changes a node on the canvas and nobody notices.
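
Pinning can be as simple as hashing the canonical JSON - a sketch:

import hashlib
import json

def workflow_hash(workflow: dict) -> str:
    # Sorted keys + compact separators make the hash stable across serializations.
    canonical = json.dumps(workflow, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]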

Output handling

Images saved by SaveImage nodes live in ComfyUI's output/ directory. Don't serve that directory directly - it's shared across all workflows. Instead:

  1. Pull images via /view immediately after completion.
  2. Upload to object storage with a tenant-scoped key.
  3. Return signed URLs to your client (steps 2 and 3 are sketched below).
  4. Have a cleanup job periodically wipe ComfyUI's output/ directory.
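
A sketch of steps 2 and 3 with boto3 against S3-compatible storage (the bucket name and key scheme are assumptions):

import boto3  # pip install boto3

s3 = boto3.client("s3")

def store_and_sign(tenant: str, job_id: str, image_bytes: bytes) -> str:
    key = f"{tenant}/{job_id}/output.png"  # tenant-scoped key
    s3.put_object(Bucket="generations", Key=key, Body=image_bytes,
                  ContentType="image/png")
    # Return a short-lived signed URL instead of exposing the raw file.
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "generations", "Key": key},
        ExpiresIn=3600,
    )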

Security and Authentication Considerations

The default ComfyUI API has no authentication. Every endpoint, including file uploads and the VRAM-freeing endpoint, is exposed publicly on whatever port and interface the server binds to. This is the single biggest source of accidental leaks - a quick Shodan search turns up thousands of exposed ComfyUI instances, most of them running on hobbyist boxes with nothing between the public internet and a command-executing endpoint.

Four rules:

1. Never expose ComfyUI's port directly to the internet. Bind it to 127.0.0.1 or a private network interface. If you need external access, put it behind a reverse proxy. If you want a quick sanity check on whether your current setup is exposed, several free tools in the ComfyUI ecosystem (including community plugins like the Runflow ComfyUI node) will scan your ports anonymously and tell you what's open from the outside.

2. Put authentication in the reverse proxy. Nginx, Caddy, or Traefik with API key auth is the standard pattern:

location / {
    if ($http_authorization != "Bearer your-secret-token") {
        return 401;
    }
    proxy_pass http://127.0.0.1:8188;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}


3. Validate workflows before submission. Untrusted workflow JSON can reference arbitrary node classes, including custom nodes you've installed that may run shell commands or read files (some community nodes do this intentionally). If users submit their own workflows, whitelist allowed class_type values in your API layer before forwarding to /prompt.
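
A minimal whitelist check, run in your API layer before forwarding to /prompt (the allowed set here is just the nodes from the earlier example - yours is whatever your product actually needs):

ALLOWED_NODE_CLASSES = {
    "KSampler", "CheckpointLoaderSimple", "EmptyLatentImage",
    "CLIPTextEncode", "VAEDecode", "SaveImage", "LoadImage",
}

def validate_workflow(workflow: dict) -> None:
    for node_id, node in workflow.items():
        cls = node.get("class_type")
        if cls not in ALLOWED_NODE_CLASSES:
            raise ValueError(f"Node {node_id}: class {cls!r} is not allowed")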

4. Use TLS for WebSockets in production. Once you're behind HTTPS, the WebSocket URL becomes wss://. Most proxies handle the upgrade automatically if you forward the Upgrade and Connection headers.

CORS and origin validation: ComfyUI has built-in origin checks. If you're calling the API from a browser (rare, usually a bad idea for public products), configure --enable-cors-header and set an allowed origins list. Prefer calling from a server-side backend.

ComfyUI API vs Managed Alternatives

The most common question from technical founders: when is the self-hosted ComfyUI API worth it vs. a managed service?

Platform | Best for | Tradeoff
Self-hosted ComfyUI API | Full control of workflows, custom nodes, models; lowest per-image cost at scale | You own uptime, scaling, auth
Replicate | Calling existing published models quickly | Limited to what's published; per-call pricing
Fal.ai | Low-latency hosted diffusion with ComfyUI workflow support | Less flexibility than self-hosted; vendor lock-in risk
RunPod Serverless | Self-hosted ComfyUI with autoscaling and no server management | Cold starts; vendor-specific deployment
Comfy.ICU | Managed ComfyUI with API, good for teams who want the canvas + an API | Less customization than full self-host
Runflow | One-click deploy from ComfyUI with built-in quality scoring, auto-retry, multi-provider routing, and dev/staging/prod promotion | Opinionated about treating workflows (not models) as the unit of deployment
BentoML / comfy-pack | Turning workflows into versioned, deployable services | Additional build step; more infrastructure
Salad / ViewComfy / Mystic | Various managed ComfyUI-as-an-API offerings | Pricing and feature parity vary

Decision heuristic:

  • < 1,000 images/month, standard workflows → Replicate or Fal. Not worth your time to self-host.
  • 1,000–50,000 images/month, custom workflows → A managed ComfyUI platform (RunPod Serverless, Comfy.ICU, Runflow, Salad). You get ComfyUI without running it, and you keep the custom nodes and workflow control you'd lose on a model-only API.
  • > 50,000 images/month, or regulated data, or proprietary custom nodes → Self-hosted ComfyUI API on your own GPU fleet, unless a managed platform explicitly supports your compliance and custom-node requirements. Cost and control justify the operational overhead.

A second axis that matters more than volume for some teams: do you need quality scoring, retries, and environment promotion built in, or are you happy wiring those yourself? If you're building a consumer product where bad outputs become refunds, the managed tiers that include scoring and auto-retry save more than their margin. If you're building an internal tool where a human reviews every output anyway, the raw ComfyUI API is all you need.

Common Pitfalls and Troubleshooting

Pitfalls that eat the most debugging time, ranked by how often I see them:

1. Submitting UI-format workflow JSON instead of API-format. The UI's default save format includes positional data and won't validate. Enable dev mode in ComfyUI settings and use "Save (API Format)".

2. Missing client_id on WebSocket. If you connect to /ws without a clientId query parameter - or use a different one than what you send with /prompt - you'll see every event for every client instead of just yours. Generate one UUID per session, use it everywhere.

3. Forgetting that /prompt validates synchronously. A 400 response means the workflow is malformed. Don't retry. Parse node_errors and surface it to the caller.

4. Assuming /history/{prompt_id} exists immediately. The endpoint returns an empty {} until execution finishes. Check for the prompt_id as a key in the response, not just for a truthy response.

5. Ignoring binary frames on the WebSocket. ComfyUI sends preview images as binary messages in addition to JSON status messages. If your WebSocket code assumes all frames are JSON, it'll crash. Filter on isinstance(msg, str) (Python) or check type in JS.

6. Not handling OOM gracefully. CUDA out-of-memory errors come through as execution_error events, not HTTP errors. Catch them specifically - they're transient if you free VRAM (POST /free) and retry.

7. Hardcoding checkpoint filenames. sd_xl_base_1.0.safetensors on your dev box might be sdxl/base.safetensors on the production server. Parameterize model names in your workflow templates and resolve them per-environment.

8. Running multiple workflows concurrently on one ComfyUI instance. Don't. ComfyUI is single-threaded execution. Two simultaneous /prompt calls get serialized anyway, but you'll lose track of which WebSocket events belong to which prompt if you don't use distinct client_ids. Use a queue and one instance per GPU.

9. Shipping without automated quality scoring. This is the pitfall that doesn't show up until you're at volume. At 10 images/day you eyeball them. At 10,000/day you ship garbage statistically - face distortions, wrong backgrounds, the classic missing finger - and the support tickets arrive before you realize the model regressed. Build a scoring layer (CLIP similarity for prompt alignment, face-fidelity models for portraits, custom rules for business constraints) before you're at scale, not after. The scoring layer also unlocks the candidate-pool pattern in the production section.

10. Treating workflows as disposable. A workflow JSON is a production artifact. Version it, diff it, pin it by hash in prod, and promote it through environments the way you'd promote code. The most common Friday-night outage in ComfyUI shops is "a designer touched the canvas and didn't tell anyone."

FAQ

What is the ComfyUI API? The ComfyUI API is the HTTP and WebSocket interface exposed by the ComfyUI server on port 8188 by default. It lets developers submit workflows, upload inputs, track execution in real time, and retrieve outputs programmatically - everything the ComfyUI canvas does is available as an API call.

Is ComfyUI a REST API? ComfyUI exposes a REST-style HTTP API alongside a WebSocket for real-time events. Endpoints like /prompt, /history, /upload/image, and /view use standard HTTP verbs and JSON payloads. It's not strictly RESTful in the academic sense (no HATEOAS, inconsistent resource modeling), but functionally it behaves like any REST API you'd integrate with.

What is the /prompt endpoint in ComfyUI? The /prompt endpoint is ComfyUI's main execution endpoint. A POST request containing the workflow in API JSON format queues it for execution. The response returns a prompt_id used to track and retrieve results. On validation failure, it returns node_errors describing what's wrong in the workflow.

How do I upload an image to the ComfyUI API? Use POST /upload/image with a multipart/form-data request. Include the image file under the image field, and optionally type (input/temp/output), subfolder, and overwrite. The response returns the stored filename, which you then reference in a LoadImage node inside your workflow.

Can I run ComfyUI workflows without the UI? Yes. The ComfyUI server runs headless by default - the UI is just one client. Any HTTP client (curl, Python, Node, Go) can submit workflows via /prompt and retrieve outputs via /view or /history. The only UI interaction you need is one-time: exporting your workflow in API format so you have the JSON to submit.

How do I authenticate requests to the ComfyUI API? There is no built-in authentication in ComfyUI. For production, put it behind a reverse proxy (Nginx, Caddy, Traefik) that enforces API key or OAuth authentication, and bind the ComfyUI process itself to localhost so it's not directly reachable.

What's the difference between the WebSocket and REST API in ComfyUI? The REST API handles request/response operations: submit a prompt, fetch history, upload an image, download a result. The WebSocket (/ws) provides a persistent connection for real-time execution events - which node is running, sampler progress, completion notifications, and preview images. Use REST for submission and retrieval; use WebSocket when you need live progress tracking.

Why does my ComfyUI API request return a validation error? The most common causes: submitting the UI-format workflow JSON instead of API-format, referencing a checkpoint or LoRA file that isn't installed on the server, or calling a custom node class the server doesn't have loaded. Read the node_errors field in the response - it points to the specific node and field that failed.

Can ComfyUI handle concurrent API requests? A single ComfyUI instance processes one workflow at a time. Concurrent /prompt requests are queued and executed sequentially. For real concurrency, run multiple ComfyUI instances (one per GPU) behind a shared queue and distribute jobs across them.

How do I deploy the ComfyUI API to production? The pattern that works: put a message queue (Redis, SQS) in front of a worker pool, each worker runs its own ComfyUI instance on its own GPU, outputs go to object storage, and a reverse proxy with auth fronts your HTTP API. Platforms like RunPod Serverless, Salad, Comfy.ICU, and Runflow handle this infrastructure for you if you'd rather not build it yourself.

Do I need quality scoring if I'm just calling the ComfyUI API? For prototypes, no. For anything that ships images to paying users, yes. At any real volume, defects become statistical certainties - face distortions, wrong backgrounds, artifacts that pass at thumbnail and fail at full resolution. Automated scoring (CLIP for prompt alignment, face-fidelity models for portraits, custom rules for your use case) is cheaper than manual QA and cheaper than refund tickets.

Where to Go Next

If you're building a real product on the ComfyUI API, the order of operations that works:

  1. Get a single workflow running end-to-end via /prompt + /ws on localhost.
  2. Wrap it in a client class (like the Python example above) and version-control your workflow JSON templates.
  3. Decide between self-hosted GPU infrastructure and a managed ComfyUI platform based on your volume, custom-node needs, and whether you want scoring/retry built-in.
  4. Put a queue in front. Every team that skipped this step has come back to add it.
  5. Add automated quality scoring before volume forces you to. Manual QA doesn't scale past a few dozen images per day.
  6. Instrument it - per-workflow latency, per-node failure rates, GPU utilization, queue depth, pass rate through your scoring layer.

The ComfyUI API is well-designed. Five endpoints do 90% of the work. The architecture translates cleanly from laptop to production. The real work isn't calling the API - it's everything that wraps around it: the queue that absorbs spikes, the scoring layer that catches the bad 3% before customers see it, the routing layer that lets you swap providers without rewriting your app, and the environment promotion that keeps production reproducible.

That's the layer where products get built. The API is how you get there.


Want custom benchmarks for your workload?

We'll run our evaluation pipeline against your production data, for free.

Talk to Founders