Guides Jun 23, 2026 9 min read

ComfyUI Qwen image generation: workflow and API guide 2026

ComfyUI Qwen image generation walkthrough: pick the right quant, render text that stays legible, scale to full HD in under a minute, then call Qwen as an API.

Thibaut Hennau

CMO - building the expert's marketplace

21 seconds for a photoreal burger. On an RTX 4090. With text on the bun that actually reads as text.

That single number is why ComfyUI Qwen image generation jumped to the top of my test list. Qwen-Image is a text-to-image model, and the thing it does that almost no other open model does well is render readable words inside the picture. Logos, button labels, the small print at the bottom of a soda can. Most models smear that into alphabet soup. Qwen mostly gets it right.

We learned the rest the expensive way, because we run image models in production and not in a one-off demo. Qwen-Image is genuinely strong on prompt understanding and text, but it has two real weaknesses you only feel after the hundredth render: it barely varies between seeds, and it happily draws trademarked characters. Both matter the moment you put this in front of users.

So this post does two things. First it walks through the text-to-image workflow you can build in ComfyUI: the four files, the quant choice, the speed LoRA, and how to push past full HD without the run crawling. Then it covers the production side. When "works on my desk" turns into thousands of images a day, you call the model over an API instead of babysitting weights and VRAM.

A note on disclosure. Runflow is our product. The API shapes below are Runflow's. The ComfyUI method is provider-agnostic and works whether or not you ever touch Runflow. I flag clearly where each path fits.

Qwen image generation workflow in ComfyUI (pixaroma, Ep 57)

What you need before the first Qwen render

Qwen image generation in ComfyUI needs four model files plus four custom nodes: the Qwen-Image diffusion model at a quant that fits your card, a clip model, a VAE, and a four-step speed LoRA.

Start with the update. Qwen is new enough that an old ComfyUI build will not recognize its nodes. Run the update batch file in your install folder, restart, and if a node still throws an error, open the Manager, hit update all, and restart again. A red outline around a node means it is missing, so install it from the custom-nodes Manager.

The quant choice is the one beginners get wrong. The Q number is the precision: Q8 is near full quality, Q4 is lighter and faster but rougher. On a 24GB RTX 4090, Q8 is the target and Q6 is the safe fallback, and in my tests the two rendered within a second or two of each other. A friend ran Q4 on a 12GB card and it worked fine. The rule of thumb: if the model file is smaller than your VRAM, try it.
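That rule of thumb fits in a few lines of shell. This is a minimal sketch. It assumes you list the quants you actually downloaded from highest to lowest precision, each with its file size in GB. `pick_quant` is a hypothetical helper, and the sizes in the usage line are placeholders, not published figures for the Qwen files.

```shell
# pick_quant VRAM_GB NAME:SIZE_GB [NAME:SIZE_GB ...]
# Hypothetical helper. Pass quants from highest to lowest precision.
# Prints the first quant whose file size is below the card's VRAM
# ("if the model file is smaller than your VRAM, try it").
pick_quant() {
  vram=$1
  shift
  for entry in "$@"; do
    name=${entry%%:*}   # text before the colon, e.g. Q8
    size=${entry#*:}    # text after the colon, e.g. 22
    # awk does the decimal comparison that sh arithmetic cannot
    if awk -v s="$size" -v v="$vram" 'BEGIN { exit !(s < v) }'; then
      echo "$name"
      return 0
    fi
  done
  echo "none"   # nothing fits: step down further or call an API instead
  return 1
}
```

With placeholder sizes, `pick_quant 24 Q8:22 Q6:17 Q4:11` prints Q8 and `pick_quant 12 Q8:22 Q6:17 Q4:11` prints Q4. The function only encodes the heuristic. If a run still runs out of memory, step down a quant.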

Files go in specific folders. The Qwen diffusion model goes in diffusion_models, the clip model in clip, the VAE in vae, and the four-step LoRA in loras. If you have used Flux before, the upscale model this workflow uses is probably already on disk. If you load a different quant later, reselect it in the loader node, and refresh node definitions from the Edit menu if the model does not show up.
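If you script your setup, the folder mapping above is easy to automate. This sketch assumes the standard ComfyUI layout, with those folders under `ComfyUI/models/`. `place_qwen_files` is a hypothetical helper. It copies rather than moves, so your downloads stay where they are.

```shell
# place_qwen_files COMFY_ROOT MODEL_FILE CLIP_FILE VAE_FILE LORA_FILE
# Hypothetical helper: copies each downloaded file into the folder the
# workflow expects under COMFY_ROOT/models/, creating folders if needed.
place_qwen_files() {
  root=$1/models
  # pair each target folder with its source file (args expand before set --)
  set -- "diffusion_models:$2" "clip:$3" "vae:$4" "loras:$5"
  for pair in "$@"; do
    dir=${pair%%:*}    # folder name, e.g. diffusion_models
    file=${pair#*:}    # source path, may itself contain colons
    mkdir -p "$root/$dir" || return 1
    cp "$file" "$root/$dir/" || return 1
    echo "$dir <- $(basename "$file")"
  done
}
```

A call looks like `place_qwen_files ~/ComfyUI qwen-image-Q8.gguf clip.safetensors vae.safetensors lora-4step.safetensors`. Those file names are placeholders for whatever you downloaded. Refresh node definitions afterward so the loaders pick the files up.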

If you want the full path from a local graph to a hosted endpoint, the ComfyUI API developer guide covers the moving parts.

Why the text rendering is the headline feature

Qwen-Image renders text you can actually read. Prompt it for a game UI, an energy-drink can, or a webpage mockup, and the words inside the image stay legible instead of melting into noise.

This is the part that surprised me most. I prompted a game interface with a logo, buttons, and a HUD. Qwen kept the logo clean and the buttons readable. I asked for a Qwen energy drink and it nailed the small text along the bottom of the can. I described eight separate icons in one prompt and it placed all eight correctly, which is the kind of instruction following I had only seen from closed models before.

The lever is prompt length. Qwen rewards long, detailed prompts. A two-line description gives you a generic result. A paragraph that names the subject, the layout, the text content, and the style gives you the legible, on-brand image. I generate the prompt itself with an LLM, then hand it to Qwen. The model handles the rest.
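One way to keep long prompts consistent is to fill the same four slots every time. This is only a sketch. `build_prompt` is a hypothetical helper, and the sentence template is my assumption rather than a documented Qwen prompt format. That includes putting the exact words to render in quotes. An LLM can fill the slots for you.

```shell
# build_prompt SUBJECT LAYOUT TEXT STYLE
# Hypothetical helper: joins the four parts worth naming (subject, layout,
# exact text to render, style) into one long, detailed prompt.
build_prompt() {
  printf '%s. Layout: %s. The image shows the exact text "%s", crisp and legible. Style: %s.\n' \
    "$1" "$2" "$3" "$4"
}
```

For example: `build_prompt "A vintage energy drink can" "centered product shot, label facing camera" "QWEN" "studio product lighting, bold logo"`. A fixed template keeps every slot filled, which matters more with Qwen than with models that tolerate short prompts.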

It is not flawless on text. A steampunk poem and a clock face came out mostly right but not perfect, and a couple of words dropped. Re-rolling a few seeds usually fixes the stragglers. For illustration, product mockups, and design ideation, though, this is the strongest open model I have run.

Pushing past full HD without the run crawling

Qwen scales to roughly 2.1 megapixels, almost exactly full HD, in a few extra seconds instead of the long wait larger images usually cost.

Here is where the speed claim earns its keep. A 1-megapixel image took 21 seconds. I bumped it to 2.1 megapixels, the near-perfect 1920-wide full HD size, and the run only added a few seconds. With most models, doubling the pixel count is a long, painful wait. Qwen handled it like it was nothing.

A few practical limits. The flux resolution calculator node lets you pick a clean aspect ratio without hand-editing width and height, which I use to lock a landscape ratio for video thumbnails. You can reach a 2K image without the upscaler in under a minute, but once the long side passes 2,000 pixels subjects start repeating, so I keep it under that ceiling. There is also a built-in upscaler you can toggle on. Because the seed is fixed, re-running upscales the same image rather than making a new one. Change the seed by hand if you want a fresh frame first.
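The arithmetic behind picking a resolution is short enough to script. This is not the calculator node's code, only a sketch of the same idea. `calc_res` is a hypothetical helper that takes a megapixel target and an aspect ratio. It snaps both sides to a multiple, 16 by default; that default is an assumption, so use whatever your graph expects. It also flags any image whose long side crosses the 2,000-pixel ceiling.

```shell
# calc_res MEGAPIXELS RATIO_W RATIO_H [MULTIPLE]
# Hypothetical helper: width x height near the target pixel count for the
# given aspect ratio, each side rounded to the nearest MULTIPLE (default 16).
calc_res() {
  awk -v mp="$1" -v rw="$2" -v rh="$3" -v m="${4:-16}" 'BEGIN {
    pixels = mp * 1000000
    h = sqrt(pixels * rh / rw)     # height that hits the pixel budget at this ratio
    w = h * rw / rh
    w = int(w / m + 0.5) * m       # round each side to the nearest multiple
    h = int(h / m + 0.5) * m
    longest = (w > h) ? w : h
    printf "%dx%d", w, h
    if (longest > 2000) printf " (long side %d > 2000: expect repeated subjects)", longest
    printf "\n"
  }'
}
```

`calc_res 2.1 16 9` gives 1936x1088, just over full HD and under the ceiling. `calc_res 2.0736 16 9 8` lands exactly on 1920x1080. `calc_res 4 16 9` trips the warning.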

The image-to-image variant swaps the empty latent for a Load Image node and a denoise control. Low denoise nudges the original (the same cat in the same forest, now covered in snow). High denoise reinvents it. It is a clean way to spin variations of an existing frame.

The seed weakness nobody warns you about

Qwen barely changes between seeds. Run the same long prompt on five different seeds and you get five near-identical compositions, which breaks any workflow that depends on variety from a single prompt.

This is the failure that costs you in production, so I am opening this section with it. I set the seed to randomize on a cartoon-pirate prompt and got the same composition over and over. Flux gives you something genuinely different on every seed. Qwen does not. The longer and more specific the prompt, the less variation you get, because the prompt pins the output that tightly.

There is also the trademark problem. Qwen will cheerfully draw Batman and Superman with the correct logos if you ask. Fun for personal use, a real liability for commercial work, so watch what you prompt when the output is going to ship.

Why does the seed thing matter for a real product? Because most image features want options. A user clicks "generate" and expects four different takes, not the same picture four times. With Qwen you cannot lean on the seed for that. You have to vary the prompt itself, word by word, and even then a single-word change sometimes does nothing. Plan your variation strategy around the prompt, not the seed. (We found this out building a gallery feature, and the fix was prompt templating, not a louder random number.)
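Here is prompt templating at its simplest. This is a sketch that assumes one `{scene}` slot in an otherwise fixed prompt. `prompt_variants` is a hypothetical helper that prints one prompt per option, and each prompt goes out as its own request. A single-word change sometimes does nothing, so make each option change the scene, not just one adjective.

```shell
# prompt_variants TEMPLATE OPTION...
# Hypothetical helper: substitutes each OPTION into the {scene} slot of
# TEMPLATE and prints one prompt per line, so variety comes from the prompt.
prompt_variants() {
  template=$1
  shift
  case $template in
    *"{scene}"*) ;;
    *) echo "template needs a {scene} slot" >&2; return 1 ;;
  esac
  head=${template%%"{scene}"*}   # text before the slot
  tail=${template#*"{scene}"}    # text after the slot
  for option in "$@"; do
    printf '%s%s%s\n' "$head" "$option" "$tail"
  done
}
```

For the pirate example, `prompt_variants "a cartoon pirate {scene}, bold outlines" "on a stormy deck" "in a candlelit tavern"` yields two prompts that differ in composition, not just in seed.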

What changes when you run Qwen at scale

On one desk, Qwen on a 4090 is a great tool. In an app serving real traffic, the workflow becomes a fleet problem: cold starts, VRAM limits, queueing, and a GPU bill that scales with every user.

The local graph is honest about its cost. A real GPU, the right quant, a clip model, a VAE, and a speed LoRA, all loaded before the first pixel. That is fine for you. It does not survive contact with a thousand concurrent users. One 4090 serializes everything behind it, and the math on buying enough of them does not end well for a small team (we have run that math).
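It is easy to put numbers on the serialization point. This is a back-of-envelope sketch. It assumes the 21-second, 1-megapixel render from earlier, one image at a time per GPU, and no time spent loading models. `queue_wait` is a hypothetical helper that reports how long the last request in a burst waits.

```shell
# queue_wait REQUESTS SECONDS_PER_IMAGE GPUS
# Back-of-envelope only: each GPU renders one image at a time, so the last
# request in a burst waits ceil(REQUESTS / GPUS) * SECONDS_PER_IMAGE seconds.
queue_wait() {
  awk -v n="$1" -v s="$2" -v g="$3" 'BEGIN {
    rounds = int((n + g - 1) / g)   # integer ceiling division
    secs = rounds * s
    printf "%d requests on %d GPU(s): last one waits %d s (about %.1f h)\n", n, g, secs, secs / 3600
  }'
}
```

`queue_wait 1000 21 1` reports 21,000 seconds, almost six hours, for the last of a thousand simultaneous requests on one card. Ten cards cut that to 2,100 seconds, which is still a long time to watch a spinner.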

This is the case for calling the model over an API instead. You send a prompt, you get an image, and the GPU pool, the model loading, and the scaling are somebody else's job. Runflow hosts the Qwen family alongside Flux, GPT Image, and the rest of the catalog, with simple fixed per-call pricing so a finance team can actually forecast the line. You can browse the live Qwen model page to see the inputs and the per-call rate, or scan the full model catalog. The pitch is plain: roughly 70% cheaper than standing up the GPU stack in-house, and no AI infra team required to keep it warm.

Here is the real call. Submit a run, then poll for the result:

# Submit a generation run
curl -X POST https://api.runflow.io/v1/models/alibaba/qwen-image-edit-2511/runs \
  -H "Authorization: Bearer $RUNFLOW_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "prompt": "a vintage energy drink can labeled QWEN, bold legible logo, small ingredient text along the bottom, studio product lighting"
    }
  }'

# Response includes a run id:
# { "id": "run_abc123", "status": "queued" }

# Poll until it finishes
curl https://api.runflow.io/v1/runs/run_abc123 \
  -H "Authorization: Bearer $RUNFLOW_API_KEY"
# -> { "status": "succeeded", "output": { "images": [ "https://..." ] } }
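To use this from a script, you can wrap the submit-and-poll pattern in two functions. The endpoint paths and the `queued` and `succeeded` statuses come from the example above. Everything else is my assumption: the `running` status, the two-second poll interval, and the sed-based field extraction, which is there only to avoid a jq dependency. `submit_run`, `wait_run`, and `json_field` are hypothetical helpers, not part of any Runflow SDK.

```shell
# Base URL from the example above; override via the environment if needed.
RUNFLOW_BASE=${RUNFLOW_BASE:-https://api.runflow.io/v1}

# json_field KEY: print the first string value for KEY from JSON on stdin.
# Naive regex extraction for flat fields like "id"; use jq for real parsing.
json_field() {
  sed -n "s/.*\"$1\" *: *\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# submit_run MODEL_PATH PROMPT: POST a run and print its id.
submit_run() {
  # Minimal JSON escaping: backslashes and double quotes only (no newlines).
  prompt=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  curl -s -X POST "$RUNFLOW_BASE/models/$1/runs" \
    -H "Authorization: Bearer $RUNFLOW_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"input\": {\"prompt\": \"$prompt\"}}" | json_field id
}

# wait_run RUN_ID [MAX_POLLS]: poll until the run finishes.
# Prints the final JSON on success; returns nonzero on failure or timeout.
wait_run() {
  max=${2:-60}
  i=0
  while [ "$i" -lt "$max" ]; do
    body=$(curl -s "$RUNFLOW_BASE/runs/$1" \
      -H "Authorization: Bearer $RUNFLOW_API_KEY")
    state=$(printf '%s' "$body" | json_field status)
    case $state in
      succeeded) printf '%s\n' "$body"; return 0 ;;
      queued|running|'') sleep 2 ;;                # "running" is an assumed status
      *) printf '%s\n' "$body" >&2; return 1 ;;    # e.g. a failed run
    esac
    i=$((i + 1))
  done
  echo "gave up after $max polls" >&2
  return 1
}
```

Usage: `run_id=$(submit_run alibaba/qwen-image-edit-2511 'a vintage energy drink can labeled "QWEN"') && wait_run "$run_id"`. Capping the number of polls keeps a stuck run from hanging a batch job forever.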

If you would rather keep your exact ComfyUI graph and just host it, ComfyUI Deploy runs the whole workflow as a serverless endpoint, custom nodes and all. Either path gets you off the single-machine ceiling.

Qwen versus Flux versus the fast models

Qwen wins on text, prompt understanding, and clean design output. Flux wins on photoreal faces and genuine seed variety. The fast Nunchaku-style models win on raw speed.

I compared three in the same session. Qwen is the most precise and the most colorful, with the fewest structural mistakes, which is exactly what you want for illustration, product design, and anything with text in it. Flux holds the edge on realism, because it keeps the small imperfections that make a face read as human. Qwen makes people a touch too smooth and too perfect, and that polish is its tell on realistic portraits.

For speed, the bonus model in the source video was a Flux-based Nunchaku workflow generating images in 3 to 4 seconds. Useful when you want a fast first draft and plan to refine later. If you are weighing the fast end of the spectrum, the Z-Image setup, a 6B model, is worth a look, and if you want to edit images rather than generate from scratch, the Qwen image edit workflows cover the companion model.

The honest scorecard: prompt understanding, best I have tested. Text generation, best I have tested. Image quality, excellent. Realism, not quite there. Speed, good and improving. The gaps right now are a thin LoRA ecosystem and no ControlNet support yet, plus that seed issue. For a free model this good with text, those are easy trades.

Frequently asked questions

What is Qwen-Image in ComfyUI?
Qwen-Image is an open text-to-image model from Alibaba that you run inside ComfyUI as a diffusion model. Its standout trait is rendering legible text and complex layouts inside the generated image, alongside strong prompt understanding.

Which Qwen quant should I use?
On a 24GB card like a 4090, use Q8 for near-full quality with Q6 as the fallback. On 12GB, Q4 works. The rule of thumb: if the model file is smaller than your VRAM, try it, and step down a quant if you run out of memory.

How long does a Qwen image take to generate?
About 21 seconds for a 1-megapixel image on an RTX 4090 with the four-step speed LoRA. Scaling up to roughly 2.1 megapixels (near full HD) only adds a few seconds, which is unusually fast for the resolution.

Why do my Qwen images look the same across different seeds?
Qwen produces very little variation from seed to seed, especially on long prompts. Change the prompt wording to get a different composition rather than relying on the seed. This is a known weakness compared to Flux.

Can Qwen render text inside images?
Yes, and it is the model's headline feature. It handles logos, UI buttons, can labels, and short paragraphs better than most open models. It still misses the occasional word, so re-roll a few seeds for fully clean text.

Is Qwen-Image good for realistic photos?
It is decent but not the leader. Faces come out a little too smooth and perfect, missing the imperfections that make Flux portraits read as real. Qwen is stronger for illustration, product design, and text-heavy images.

Can I use Qwen images commercially?
Be careful. Qwen will draw trademarked characters and logos when prompted, which is a legal risk for commercial work. Keep your prompts clear of protected IP when the output is going to ship.

How do I call Qwen as an API instead of running it locally?
Send a POST to https://api.runflow.io/v1/models/{owner}/{slug}/runs with your prompt, then poll GET /v1/runs/{id} for the result. The GPU pool and model loading are handled for you, with simple fixed per-call pricing.

Does Qwen-Image support ControlNet or many LoRAs?
Not yet. At the time of writing there are only a handful of LoRAs on Civitai and no ControlNet models, so structural control is limited. Expect the ecosystem to fill in over time.

What four files do I need to run the Qwen workflow?
The Qwen-Image diffusion model (your chosen quant), a clip model, a VAE, and a four-step speed LoRA, placed in diffusion_models, clip, vae, and loras respectively, plus the four matching custom nodes.

Where to go next

So, back to that 21-second burger with readable text on the bun. The question is whether you want to keep feeding a single GPU, or hand the scaling problem to someone else.

Update ComfyUI, install the Manager, and confirm the Qwen nodes load without red outlines.
Download the Qwen-Image diffusion model at a quant that fits your card, plus the clip, VAE, and four-step LoRA.
Build the text-to-image graph and render a long, detailed prompt with text in it to feel the model's range.
Push to 2.1 megapixels for a clean full-HD frame, and keep the long side under 2,000 pixels to avoid repeats.
Plan your variation around the prompt, not the seed, and keep trademarked subjects out of commercial work.
When you are ready for real traffic, read the ComfyUI API developer guide and call Qwen over an API instead of buying GPUs.
Or host your exact graph with ComfyUI Deploy and skip the single-machine ceiling entirely.

Start free at runflow.io.


