Benchmarks · Apr 28, 2026 · 15 min read

Nano Banana API: Live Reliability Benchmarks for Top 6 Providers (2026)

Live reliability benchmarks for the top 6 Nano Banana API providers. Real success rates, P50/P95 latency, error breakdowns, and a decision framework. Updated continuously.

Miguel Rasero
CTO & Co-Founder

Access Live Benchmarks Here

The Nano Banana API space is full of comparisons. Almost none of them are useful. Most are affiliate pages dressed as guides, comparing two or three providers, claiming "99.8% uptime" with no evidence, and quietly steering you to the platform paying them. The handful of honest ones are static snapshots from one Tuesday afternoon in January.

This page is different. It's anchored on a continuous, public benchmark that fans out the same prompt to six Nano Banana 2 providers every 10 minutes, 24 hours a day, and publishes the results live at protos.runflow.io/nano-banana. You can verify everything in this article in real time on that page, including the actual generated images.

The goal is to make the right Nano Banana API choice obvious for your use case, based on data that doesn't go stale the day this article ships. We'll cover what each provider actually delivers on reliability, latency, and cost, where each one wins, where each one loses, and how to pick.

We built and run the benchmark. Runflow is one of the six providers tested. Where Runflow leads, we say so. Where it loses, we say so. Where another provider is the better pick for your use case, we say so. The methodology is documented; the data is public; the verification is one click away.

What Is the Nano Banana API?

The Nano Banana API is the developer-facing endpoint for Google's family of Gemini-based image generation models, available either directly through Google's Vertex AI and Gemini API, or through third-party providers (Replicate, Together AI, Prodia, Runware, Runflow, Fal, and others) that resell or route to the same underlying models.

The naming has gotten layered, so it's worth being precise:

  • Nano Banana (Gemini 2.5 Flash Image, August 2025) is the original model. Fast, cheap, viral. Still in production use for high-volume, latency-sensitive workloads.
  • Nano Banana Pro (Gemini 3 Pro Image, November 2025) is the studio-quality tier. Best text rendering, 4K output, multi-image blending, expensive.
  • Nano Banana 2 (Gemini 3.1 Flash Image, March 2026) is the current default. Pro-level intelligence at Flash speed. Native 1K, 2K, and 4K output. This is what most teams should be benchmarking and what we test on protos.runflow.io.

When developers say "Nano Banana API," they usually mean whichever version they're on, accessed through whichever provider gives them the cleanest combination of reliability, speed, and cost. This article is about helping you pick that combination deliberately.

The Real Question: Reliability, Not Just Pricing

Most Nano Banana API comparisons lead with pricing because pricing is easy to put in a table. Reliability is harder to measure honestly, which is why almost no one publishes it. But for production apps, reliability dominates the cost equation, because a failed call costs you the user experience plus the retry plus the support ticket.

Most providers don't bill for failed calls — the per-image dollar math is roughly a wash regardless of reliability. The cost of unreliability lives somewhere else. Every failure is 30 to 60 seconds your user spent staring at a spinner before they got an error, and at 75% reliability that's roughly one in four sessions ending in a wait-then-fail. Every cluster of failures is a wave of support tickets and a dent in the trust users place in your product. Every retry pipeline is engineering time spent monitoring, alerting, and recovering rather than building. A provider that's a few cents cheaper per call but fails one in four times isn't saving you money — it's shifting the cost from your invoice to your user experience, your support queue, and your roadmap.

This is why the affiliate-pitch articles ranking "$0.05 here vs $0.13 there" mislead you. Pricing is one axis; reliability is another; latency is a third; the right provider for your use case sits somewhere in the multi-axis space. We'll cover all of them, but reliability comes first because reliability gates everything else.

How We Test: The Live Reliability Methodology

[Image: Nano Banana 2 reliability benchmark test, 2026]

LIVE BENCHMARKS HERE

The benchmark behind this article is simple by design. Every 10 minutes, 24 hours a day, the same prompt fans out to six providers in parallel. Every call is recorded. Every output is stored. Every failure is categorized. The aggregate is published live.

What we measure

Four signals, each shown on its own module on the protos page:

Throughput (successes vs failures over time). Stacked bars per provider, bucketed by 6 hours. This is where you see whether failures are random or clustered. Random failures suggest provider-level instability; clustered failures (where multiple providers fail at the same time) suggest upstream Google capacity events affecting everyone.

Latency (P50 and P95 over time). Successful runs only. P50 is the typical user's experience; P95 is what your slowest 5% of users feel. Filtering to successful runs is methodologically essential because failed calls have no meaningful latency, and including them inflates the numbers in misleading ways.
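
To make the filtering concrete, here's a minimal sketch of the percentile math over successes only (toy records, not the benchmark's actual aggregation code):

import statistics

# Toy run records: (succeeded, latency_seconds). The failed call carries
# no meaningful latency and must be excluded before computing percentiles.
runs = [(True, 21.4), (True, 23.1), (False, 120.0),
        (True, 36.9), (True, 22.8), (True, 24.5)]

latencies = sorted(lat for ok, lat in runs if ok)

p50 = statistics.median(latencies)
p95 = latencies[max(0, round(0.95 * len(latencies)) - 1)]  # nearest-rank P95
print(f"P50={p50:.1f}s  P95={p95:.1f}s")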

Failure categorization (what's actually breaking). Per-provider counts of timeouts, HTTP errors, and invalid responses. This is the single most useful module for choosing a provider, because "Replicate fails 25% of the time" is one number; "Replicate's failures are almost entirely HTTP errors during peak hours" is the kind of insight that tells you whether you can engineer around the problem.

[Image: failure categorization module, timeouts and what's breaking]

Recent batches by day (live gallery). Every prompt fan-out, with all six provider outputs side by side, newest first, grouped by UTC day. Each image is stamped with its actual generation latency. This is where image quality becomes verifiable; we don't grade quality programmatically (more on why in the limitations section), but readers can see for themselves whether Provider X's text rendering is actually legible.

Methodology choices worth being explicit about

  • One prompt at a time. Every fan-out uses the same prompt across all six providers, simultaneously, so we're comparing apples to apples. We rotate prompts daily to test different model behaviors (text rendering on cake candles, surface texture on LEGO bricks, and so on).
  • Same parameters across providers. 1K resolution default, default safety settings, default sampler.
  • One geography. All tests run from a single location. Geographic latency variance is real but isn't what this benchmark is measuring.
  • No retries. A failed call is recorded as a failure. We're not measuring "eventually succeeded after 3 attempts," we're measuring first-call reliability. (A minimal sketch of the fan-out loop follows this list.)
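
A minimal sketch of that fan-out loop, assuming hypothetical endpoint URLs and a simplified error taxonomy (the real harness and its provider URLs aren't published):

import time
from concurrent.futures import ThreadPoolExecutor

import requests

# Hypothetical stand-ins for the six providers' real endpoints.
PROVIDERS = {
    "vertex": "https://example.invalid/vertex",
    "runflow": "https://example.invalid/runflow",
    # ...and the remaining four providers
}

def run_once(name, url, prompt):
    """One first-call attempt: no retries, a failure is recorded as a failure."""
    start = time.monotonic()
    try:
        resp = requests.post(url, json={"prompt": prompt, "resolution": "1K"},
                             timeout=120)
        resp.raise_for_status()
        status = "success"
    except requests.Timeout:
        status = "timeout"
    except requests.HTTPError:
        status = "http_error"
    except requests.RequestException:
        status = "invalid_response"  # connection/parse problems, loosely bucketed
    return {"provider": name, "status": status,
            "latency_s": time.monotonic() - start}

# Same prompt, same parameters, all providers in parallel.
prompt = "A birthday cake with candles spelling the current UTC timestamp"
with ThreadPoolExecutor(max_workers=len(PROVIDERS)) as pool:
    results = list(pool.map(lambda item: run_once(*item, prompt),
                            PROVIDERS.items()))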

What we noticed in the data

[Image: success vs failure clusters in the throughput chart]

One pattern worth surfacing because it changes how you should think about provider choice: failures cluster in time, not at random. When you look at the throughput chart, the same hours show failures across multiple providers. That points to upstream Google capacity events affecting downstream resellers simultaneously, not random per-provider flakiness.

The implication: a routing layer with automatic failover materially helps during these windows, because while one upstream is degraded, the other paths are usually fine. A direct connection to a single provider has no such buffer. This shows up clearly in the data in the next section.
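
For illustration, the failover idea reduces to something like this sketch. The endpoints are placeholders, and a real router weighs health, latency, and cost rather than walking a fixed list:

import requests

# Hypothetical upstream order; not any provider's actual routing table.
UPSTREAMS = [
    "https://example.invalid/primary",
    "https://example.invalid/secondary",
    "https://example.invalid/tertiary",
]

def generate_with_failover(prompt):
    """Try each upstream in turn; a degraded path falls through to the next."""
    last_error = None
    for url in UPSTREAMS:
        try:
            resp = requests.post(url, json={"prompt": prompt}, timeout=60)
            resp.raise_for_status()
            return resp.content  # assumed to be raw image bytes
        except requests.RequestException as exc:
            last_error = exc  # this path is degraded; move to the next one
    raise RuntimeError("all upstreams failed") from last_error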

Why Fal isn't in the benchmark

Fal.ai is a credible Nano Banana 2 provider and shows up frequently in developer conversations. It's on the shortlist for the next benchmark batch. As of this article's publish date, the live test runs across the six providers documented below; expect Fal to be added in a future update.

Disclosure

We built and run protos.runflow.io. Runflow is one of the six providers tested. The methodology above is the same regardless of where Runflow places. The data below is what the benchmark actually measured during the most recent 7-day window, including the entries where Runflow lost on a given metric.

The Top 6, by Reliability and the Trade-Offs Each Makes

The headline ranking. Numbers below are from a 7-day window snapshot taken on April 28, 2026, with 236 runs per provider. Live data is always at protos.runflow.io/nano-banana.

| Provider | Success rate | P95 latency | Total failures | Dominant failure mode | Best-fit use case |
| --- | --- | --- | --- | --- | --- |
| Runflow | 99.6% | 64.6s | 1 | Timeout | Production apps where reliability matters most |
| Vertex AI Direct | 98.7% | 37.3s | 3 | Invalid response | Conservative pick, balanced on every axis |
| Runware | 95.8% | 43.2s | 10 | HTTP error | Latency-sensitive workloads with budget |
| Prodia | 94.1% | 32.6s | 14 | HTTP error | Low-latency batch processing |
| Together AI | 87.7% | 48.5s | 29 | HTTP error | Cost-sensitive, retry-tolerant pipelines |
| Replicate | 75.4% | 36.3s | 59 | HTTP error | Skip for production; useful for prototyping |

Three things to read out of this table:

  1. There's a clean three-tier split. Runflow and Vertex are the reliability leaders (above 98%). Runware, Prodia, and Together are the middle tier (87–96%). Replicate is the outlier; one in four calls fails.
  2. P95 latency does not track success rate. Runflow has the highest reliability and the highest P95. Vertex has the second-highest reliability and the lowest P95. The other middle-tier providers cluster between 32 and 48 seconds. Reliability and tail latency are different problems with different solutions; the table shows it.
  3. Failure modes matter as much as failure counts. Vertex's 3 failures are mostly invalid responses (parsing issues; transient and recoverable). Replicate's 59 failures are all HTTP errors (server-side rejections; harder to recover from at the client). Same word ("failure"), different operational implications.

The provider-by-provider breakdown below gets concrete on each.

Provider-by-Provider Breakdown

Runflow

Access Runflow Nano Banana 2 API here

Runflow leads on reliability (99.6% success rate, 1 failure across 236 runs over 7 days) and pays for that reliability with the highest P95 latency at 64.6 seconds. It's the right pick when "the call must succeed" matters more than "the call must be fast."

The data tells a specific story. Runflow's reliability advantage is not uniform; it shows up most clearly during the failure clusters visible in the throughput chart. When upstream Google capacity events hit (the Mon 17:30 bucket in our window is a clean example), most providers see correlated failure spikes; Runflow's failure count stays near zero through the same window. The trade-off is the dashed P95 line, which spikes well past 100 seconds during exactly those windows. Extra retries and failover paths cost time, and the data reflects that honestly.

Pricing: $0.080 per image at 1K default resolution. Roughly 78% above Vertex's $0.045 list price, in line with the markup other reliability-focused providers charge.

Integration: REST API with OpenAI-compatible request shape. Single endpoint, single API key, no per-provider configuration on your side.

Where Runflow wins:

  • Reliability during upstream capacity events (when other providers are rate-limited or returning 5xx, Runflow's routing keeps succeeding)
  • Failure count one to two orders of magnitude below the middle-tier providers
  • Single integration covering multiple upstream providers

Where Runflow loses:

  • Highest P95 latency of the six. If your app generates an image in response to a user click and the user is staring at a spinner, this matters
  • Higher per-image cost than going direct to Vertex
  • The reliability advantage is most pronounced during incidents; during quiet hours, the gap to Vertex is small

Best for: Production apps where reliability dominates the cost equation, batch pipelines where occasional slow successes are fine but failures are expensive, multi-tenant SaaS where one user's failed call becomes a support ticket.

Skip if: Your application requires sub-30-second tail latency on every call, you're early enough that you're optimizing for the lowest possible per-image cost, or your traffic is light enough that you can absorb provider failures with manual retries.

Vertex AI Direct

Vertex AI Direct is Google's official endpoint for Nano Banana 2 and the conservative default for any team that wants the model without an intermediary. 98.7% success rate, 37.3 second P95, and the lowest base price of the six.

This is the provider with the cleanest failure profile in the benchmark. Three failures across 236 runs, two of them invalid-response errors that a properly-instrumented client can detect and handle gracefully. The latency is also the most predictable; the P50 line on the latency chart sits flat near 22 seconds with little variance, and the P95 stays well under 50 seconds even during the failure-cluster windows.

Pricing: $0.045 per image at 1K; $0.067 at 2K; $0.151 at 4K (per Google's official pricing as of early 2026, batch mode at 50% off). Cheapest of the six on a per-call basis.

Integration: Standard Google AI / Vertex AI SDK. Works through Gemini API key (lower setup overhead) or through Vertex AI on Google Cloud (better for enterprise compliance and quotas).

Where Vertex AI Direct wins:

  • Lowest base latency (P50 around 22 seconds, P95 around 37 seconds)
  • Lowest per-image price among providers tested
  • Direct access; no third-party in the request path; useful for compliance audits
  • Cleanest failure profile of any provider tested

Where Vertex AI Direct loses:

  • No automatic failover during Google capacity events; if the upstream is degraded, you're stuck
  • Quotas can be tight on free tier; production volume needs explicit quota increases
  • No multi-provider abstraction; if you ever want to A/B test against another model, you're rewriting integration

Best for: Teams who want the official endpoint, mid-volume production traffic with predictable patterns, anyone for whom "as close to Google as possible" is a compliance or auditability requirement.

Skip if: Your app needs to survive Google capacity events without manual intervention, you anticipate scaling beyond regional quota limits, or you want to evaluate multiple image models behind a single integration.

Runware

Runware is the latency-tight middle-tier choice. 95.8% success rate, 43.2 second P95, and a clean failure profile (10 HTTP errors, no timeouts) over the test window.

Runware sits in an interesting spot. Its P50 is essentially tied with the leaders (around 25 seconds), and its P95 stays within shouting distance of Vertex. The reliability gap to Runflow and Vertex is real but small, and the failures it does have are concentrated in HTTP errors rather than timeouts, which is operationally easier to handle (your retry logic can detect a 5xx and react in seconds, where a timeout costs you the full timeout window first).
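
To make the operational difference concrete, here's a hedged sketch of client-side handling. The endpoint is a placeholder and the 90-second timeout is arbitrary:

import time

import requests

def generate(url, prompt, attempts=3):
    for attempt in range(attempts):
        try:
            resp = requests.post(url, json={"prompt": prompt}, timeout=90)
            resp.raise_for_status()
            return resp.content
        except requests.HTTPError:
            # A 5xx comes back in seconds; back off briefly and retry.
            time.sleep(2 ** attempt)
        except requests.Timeout:
            # A timeout has already burned the full 90-second window before
            # the client learns anything -- the expensive failure mode.
            time.sleep(2 ** attempt)
    raise RuntimeError(f"exhausted {attempts} attempts against {url}")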

Pricing: Mid-market. Specific rates vary by usage tier; check their current pricing page for accurate quotes.

Integration: REST API. Documentation is solid; webhooks supported; S3 upload available for outputs.

Where Runware wins:

  • Tight latency distribution (P95 close to direct providers)
  • Clean failure mode (HTTP errors over timeouts, easier to handle)
  • Good middle-ground reliability without the highest-tier price

Where Runware loses:

  • Reliability gap to top tier is real; you'll see failures during capacity events
  • Less name recognition than Replicate or Together if your team values mainstream tooling

Best for: Latency-sensitive batch workloads, production apps that can absorb 4–5% retry rates with proper error handling.

Skip if: You need the absolute lowest failure rate, or your team prefers the more widely-tooled providers.

Prodia

Prodia is the lowest-latency choice in the benchmark on successful runs. 94.1% success rate, 32.6 second P95 (the lowest of the six), and a clean lightweight integration.

The interesting thing about Prodia in the data is how flat its latency line stays. Where Together AI and Runflow show P95 spikes during peak hours, Prodia's latency stays remarkably consistent, even during the windows when its reliability dips. When Prodia succeeds, it succeeds fast.

Pricing: Aggressive. Public per-call pricing on their site.

Integration: REST API. Well-documented. Async with status polling.

Where Prodia wins:

  • Lowest P95 of the third-party providers tested
  • Consistent latency profile across time-of-day variance
  • Light integration footprint

Where Prodia loses:

  • 6% failure rate is meaningful for user-facing flows
  • Failure mode is mostly HTTP errors (handleable, but you have to handle them)

Best for: Latency-critical batch pipelines, prototypes where speed matters more than 100% reliability, async workloads that retry cheaply.

Skip if: You're building synchronous user-facing flows where 6% of users would see a failure on first attempt.

Together AI

Together AI is the volatile middle option. 87.7% success rate, 48.5 second P95, with failure clusters concentrated in specific time windows rather than evenly distributed across the day.

The throughput chart tells the Together story most clearly. Several 6-hour buckets show full 36/36 success counts; others (notably Mon 17:30 in our window) drop hard, with grey failure caps consuming a third or more of the bar. All 29 failures over the test window were HTTP errors. This is consistent with rate-limiting or upstream capacity issues during peak demand rather than random instability.

Pricing: Aggressive on volume; check current pricing.

Integration: OpenAI-compatible API. Familiar surface for any team already wired up to OpenAI's SDK.

Where Together AI wins:

  • OpenAI-compatible API surface; trivial integration if you're already using OpenAI tooling
  • Strong base latency on successful runs (P50 close to the leaders)
  • Cost-competitive at higher volumes

Where Together AI loses:

  • Volatile reliability; the 12% failure rate is concentrated in specific windows, which is harder to reason about than a flat rate
  • P95 drifts past 70 seconds during peak hours

Best for: Cost-sensitive pipelines with proper retry and exponential backoff, async workloads that aren't time-critical.

Skip if: You need consistent reliability hour-to-hour, or your retry budget is tight.
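
On the retry recommendation above: a library like tenacity keeps the exponential backoff declarative. A sketch with a placeholder endpoint, retrying only on HTTP errors since that's Together's observed failure mode in this window:

import requests
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

@retry(
    retry=retry_if_exception_type(requests.HTTPError),
    wait=wait_exponential(multiplier=1, max=30),  # 1s, 2s, 4s... capped at 30s
    stop=stop_after_attempt(5),
)
def generate(prompt):
    resp = requests.post("https://example.invalid/generate",
                         json={"prompt": prompt}, timeout=90)
    resp.raise_for_status()  # raises HTTPError on 4xx/5xx, triggering a retry
    return resp.content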

Replicate

Replicate has the highest failure rate of the six tested at 24.6% (75.4% success). Its 59 failures over the 7-day window are all HTTP errors. We don't recommend it for production Nano Banana 2 traffic at this point.

We want to be specific here because Replicate is a generally-respected platform with a strong reputation in other model categories. But the Nano Banana 2 numbers in the benchmark are what they are: one in four calls fails, and the failures are clustered tightly enough that no realistic retry policy makes the effective success rate competitive with the other providers tested.

The latency story is fine when calls succeed (P95 of 36.3 seconds, comparable to the leaders). The problem is the success rate gating the latency.

Pricing: Per-second compute pricing; competitive when calls succeed.

Integration: Excellent. Replicate has the most polished developer experience of the six, with strong client libraries, model versioning, and webhook support.

Where Replicate wins:

  • Best developer experience and tooling
  • Strong async API with webhook callbacks
  • Useful for prototyping and exploring models broadly

Where Replicate loses:

  • Highest failure rate of the six; not in the same league on reliability
  • Cost-per-success blows past competitors once you account for retries
  • Failure pattern (all HTTP errors, clustered) is consistent with upstream capacity issues, which a third party can't fix

Best for: Prototyping, model exploration across many providers, async pipelines with deep retry budgets and tolerance for variance.

Skip if: Production traffic, user-facing flows, anything with an SLA.

Beyond Reliability: Latency, Cost, and Throughput

The reliability ranking is the headline. The full picture has more axes. Here's how the six providers stack up on each.

Latency (P50 and P95, successful runs only, last 7 days)

| Provider | P50 | P95 | Latency consistency |
| --- | --- | --- | --- |
| Vertex AI Direct | 22.7s | 37.3s | Most consistent |
| Replicate | 23.0s | 36.3s | Consistent (when succeeding) |
| Prodia | 23.0s | 32.6s | Most consistent third-party |
| Together AI | 24.7s | 48.5s | Volatile (peaks > 70s) |
| Runware | 25.1s | 43.2s | Steady |
| Runflow | 32.6s | 64.6s | Highest P95 |

Median latency is tightly clustered. The P95 is where the providers separate, and that's the metric that matters for user-facing apps.

Cost (per image at 1K resolution, list pricing)

Approximate as of early 2026; check provider pricing pages for current rates.

| Provider | Per-image (1K) | Note |
| --- | --- | --- |
| Vertex AI Direct | $0.045 | Cheapest official rate |
| Together AI | Varies, mid-market | Volume discounts |
| Prodia | Mid-market | Aggressive at volume |
| Runware | Mid-market | |
| Replicate | Per-second compute | Variable |
| Runflow | $0.080 | Premium for reliability |

Per-call price alone is misleading. Most providers don't bill for failures, so the dollar gap between a 75% and a 99% reliability provider is small — but the user-experience gap is enormous. The reliability-adjusted cost is paid in spinner-time, support tickets, and engineering overhead, not invoice line items.
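
A quick back-of-envelope using the snapshot numbers above and the 30–60 second wait range from earlier (45-second midpoint assumed):

# Spinner-time cost of unreliability, per 100 first attempts.
# Assumes each failure burns ~45s of user wait before the error.
for name, success_rate in [("replicate", 0.754), ("runflow", 0.996)]:
    failure_rate = 1 - success_rate
    wasted = 100 * failure_rate * 45
    print(f"{name}: ~{failure_rate:.0%} of first attempts fail "
          f"-> ~{wasted:.0f}s of spinner-then-error per 100 sessions")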

Throughput (concurrent requests handled cleanly)

Not directly visible from the benchmark (we run one request per provider every 10 minutes, not concurrent loads), but worth noting:

  • Vertex AI Direct: Subject to Google's per-project quotas. Concurrent capacity scales with quota grants, which require explicit requests at high volumes.
  • Replicate, Together AI, Prodia, Runware: Each handles concurrent requests up to platform limits. Quotas vary by tier.
  • Runflow: Routes across multiple upstreams, so effective throughput is the aggregate of the underlying capacity. Useful when you need to burst.

For high-burst workloads, the routing-layer providers tend to absorb spikes better than direct providers. For steady-state high volume, direct provides better unit economics.
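
A common client-side pattern for burst control is a bounded worker pool. A sketch with a placeholder endpoint and an arbitrary concurrency cap; real limits depend on your provider and tier:

from concurrent.futures import ThreadPoolExecutor

import requests

MAX_IN_FLIGHT = 8  # hypothetical tier limit; check your provider's quota docs

def generate(prompt):
    resp = requests.post("https://example.invalid/generate",
                         json={"prompt": prompt}, timeout=120)
    resp.raise_for_status()
    return resp.content  # assumed raw image bytes

prompts = [f"product shot, variation {i}" for i in range(100)]
# The pool caps in-flight requests so bursts stay inside the quota.
with ThreadPoolExecutor(max_workers=MAX_IN_FLIGHT) as pool:
    images = list(pool.map(generate, prompts))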

Choosing the Right Provider for Your Use Case

The decision tree, condensed:

You're building a user-facing product where every failure is a support ticket.
→ Runflow for reliability, Vertex AI Direct as a strong second if you can absorb the capacity-event risk yourself.

You're running a batch pipeline where failures retry cheaply and latency matters.
→ Prodia or Runware. P95 is tight, integration is light, failure modes are recoverable.

You're cost-sensitive and your traffic is async-friendly.
→ Together AI or Vertex Direct. Lower per-call cost; build proper retry logic for the failures (about 12% on Together, closer to 1% on Vertex).

You're prototyping or exploring models broadly.
→ Replicate. Best developer experience, easy to iterate. Move off it for production.

You're under enterprise compliance constraints (SOC 2, audit, regional data residency).
→ Vertex AI Direct on Google Cloud. Direct relationship with Google, established compliance posture.

You need to survive Google capacity events without ops intervention.
→ Routing-layer providers (Runflow, or building your own multi-provider router). Direct connections give you no buffer when upstream is degraded.

You're early enough that you're not sure.
→ Start on Vertex AI Direct. Cheapest, fastest, most predictable when things work. Add a routing layer later when reliability becomes a measurable cost.

Integration: Calling Each Provider

Brief examples for the top three by reliability. Other providers follow similar REST-based patterns; consult their docs.

Vertex AI Direct (Gemini API)

from google import genai

client = genai.Client(api_key="YOUR_KEY")

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents="A birthday cake with candles spelling APRIL 28, 2026 at 12:50 UTC",
    config={"response_modalities": ["IMAGE"]},  # request image output
)

# Image bytes arrive as inline_data parts on the first candidate.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("output.png", "wb") as f:
            f.write(part.inline_data.data)

Runflow

import requests

response = requests.post(
    "https://api.runflow.io/v1/models/google/nano-banana-2/generate",
    headers={"Authorization": "Bearer YOUR_KEY"},
    json={
        "prompt": "A birthday cake with candles spelling APRIL 28, 2026 at 12:50 UTC",
        "resolution": "1K",
    },
)
response.raise_for_status()  # fail loudly instead of writing an error body to disk

# The response body is the raw image bytes.
with open("output.png", "wb") as f:
    f.write(response.content)

Replicate

import replicate

output = replicate.run(
    "google/nano-banana-2",
    input={
        "prompt": "A birthday cake with candles spelling APRIL 28, 2026 at 12:50 UTC",
    },
)

# output is a URL or list of URLs

The patterns are similar across REST-based providers. The variance is in error handling: Vertex returns structured error responses that map to recoverable vs unrecoverable categories cleanly; some third-party providers flatten everything into HTTP 500s, which makes graceful retry harder. This is a real factor in the operational profile of each provider.
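
A minimal client-side classifier along those lines, assuming generic HTTP semantics rather than any single provider's documented error schema:

import requests

# Transient statuses worth retrying; other 4xx won't improve on retry.
RETRYABLE_STATUS = {408, 429, 500, 502, 503, 504}

def classify(resp):
    """Map a provider response to a retry decision."""
    if resp.ok:
        return "success"
    if resp.status_code in RETRYABLE_STATUS:
        return "retry"  # transient: back off and try again
    return "fail"       # client-side or permanent: retrying won't help

Providers that flatten everything into a bare 500 force the "retry" branch even when the underlying cause is permanent, which is exactly the operational cost described above.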

What This Test Doesn't Tell You

[Image: benchmark image data for Nano Banana 2]

We want to be explicit about the benchmark's limits.

Image quality is not measured programmatically. The benchmark measures whether a provider returned an image and how fast, not whether the image is good. That's a deliberate choice; quality scoring is its own large topic, and any single metric we'd compute would mislead. What the benchmark does instead: every call's output is preserved in the Recent batches by day gallery, with the latencies stamped on each image. Readers can compare the actual outputs across providers for the same prompt at the same moment, and form their own quality judgments. Hundreds of side-by-side comparisons are available going back 30 days; the gallery is more credible than any quality score we could publish.

Geographic coverage is single-origin. All test calls originate from one location. Real production traffic will have geographic latency variance that this benchmark doesn't capture. If your users are concentrated in a region far from Google's hosting points, your numbers will differ.

Snapshot vs long-term trends. This article quotes a 7-day window. The protos page shows up to 30 days, and the live data updates every 10 minutes. If you're making a procurement decision, look at the longer view; a single bad week from a usually-reliable provider is different from a structural reliability problem.

Per-call price vs reliability-adjusted cost. We mention this in the methodology, but worth repeating: most providers don't bill for failed calls, so the dollar cost is largely insensitive to reliability. The real cost shows up in user experience (every failure is a 30–60 second wait followed by an error), support burden, and the engineering time you spend on retry and incident logic. Use the success rate to estimate the real cost — it's not on the invoice.

Provider-specific feature deltas. Some providers expose Nano Banana Pro alongside Nano Banana 2. Some offer 4K output where others cap at 2K. Some have webhook delivery; others don't. The benchmark tests Nano Banana 2 at 1K with default settings; for other model versions or feature requirements, consult provider documentation.

Single model. This benchmark is Nano Banana 2 only. Behavior on Nano Banana Pro or the original Nano Banana may differ. Each model has its own routing and capacity characteristics on every provider.

FAQ

What is the Nano Banana API?

The Nano Banana API refers to the developer-facing endpoint for Google's Gemini-based image generation models, accessible directly through Vertex AI / Gemini API or through third-party providers like Replicate, Together AI, Prodia, Runware, and Runflow that resell or route to the same underlying models. There are three model versions: Nano Banana (Gemini 2.5 Flash Image), Nano Banana Pro (Gemini 3 Pro Image), and Nano Banana 2 (Gemini 3.1 Flash Image, the current default).

Which Nano Banana API provider is most reliable?

Based on the live benchmark at protos.runflow.io, which runs around the clock across six providers, Runflow leads with a 99.6% success rate and Vertex AI Direct is a close second at 98.7%. The middle tier (Prodia, Runware, Together AI) ranges from 87.7% to 95.8%. Replicate has the lowest reliability at 75.4% over the most recent 7-day window.

Which Nano Banana API has the lowest latency?

Vertex AI Direct, Replicate, and Prodia have the lowest median latency on successful runs (around 22 to 23 seconds P50). For P95 latency (the slowest 5% of calls), Prodia is fastest at 32.6 seconds. Runflow has the highest P95 at 64.6 seconds, which is the trade-off for the routing and retry logic that gives it higher reliability.

Is the Vertex AI Nano Banana API the same as the third-party providers?

The underlying model is the same; what differs is the request path. Vertex AI Direct gives you a direct connection to Google's hosted model with no intermediary. Third-party providers route requests through their own infrastructure, which can mean better failover (if they route across multiple upstreams) or worse reliability (if they add their own potential failure points). The benchmark numbers reflect the practical operational difference.

What's the cheapest Nano Banana API?

Vertex AI Direct has the lowest list price at $0.045 per image at 1K resolution. Third-party providers vary; some are cheaper at volume, some are more expensive. But per-call price is misleading because most providers don't bill for failed calls — the dollar gap between a 75% and a 99% reliability provider is small. The real cost of unreliability is paid in user experience (every failure is a 30–60 second wait followed by an error), support tickets, and engineering time spent on retry logic. Use the success rate, not the list price, to estimate the cost that actually hits your product.

Which provider should I use for production apps?

For production user-facing apps, the reliability-first picks are Runflow (99.6% reliability, higher P95 latency) or Vertex AI Direct (98.7% reliability, lower P95, but no failover during Google capacity events). For batch or async production workloads, Prodia and Runware are credible mid-tier choices. Avoid Replicate for production at this time; the 24.6% failure rate is not workable for live traffic.

Why is Replicate's reliability so much lower than the others?

The benchmark data shows 59 failures across 236 runs over 7 days, all of them HTTP errors clustered in specific time windows. This pattern is consistent with upstream capacity issues or routing problems rather than random instability. Replicate is an excellent platform for many use cases (especially developer experience and model exploration), but its current Nano Banana 2 reliability profile makes it a poor choice for production traffic. Replicate may improve; the benchmark will show it when they do.

Does Runflow run the benchmark fairly given that Runflow is one of the providers tested?

Yes, by design. The methodology is documented (every 10 minutes, same prompt, six providers in parallel, no retries, no special treatment of any provider including Runflow), the data is public, and the test is continuous so any manipulation would show up immediately as inconsistencies. The article also reports honestly where Runflow loses (highest P95 latency, premium pricing), because hiding those would be more damaging to credibility than reporting them.

How often is the benchmark data updated?

Every 10 minutes. The live page at protos.runflow.io/nano-banana shows aggregate data over the most recent 1-, 3-, 7-, 14-, or 30-day window, along with the most recent batch outputs in the live gallery.

Can I see the actual generated images from the benchmark?

Yes. The "Recent batches by day" section of the protos page shows every prompt fan-out with all six provider outputs side by side, the actual generated images, and the per-call latency stamped on each. This is the right place to form your own judgments about image quality, since the benchmark itself doesn't grade quality programmatically.

Why is Fal.ai not included in this benchmark?

Fal is a credible Nano Banana 2 provider and is on the shortlist for the next benchmark batch. The current six providers were chosen as the initial cohort based on developer adoption and request patterns; expanding the cohort is on the roadmap.

What's the difference between Nano Banana, Nano Banana Pro, and Nano Banana 2?

Nano Banana (August 2025, Gemini 2.5 Flash Image) is the original; fast and cheap. Nano Banana Pro (November 2025, Gemini 3 Pro Image) is the studio-quality tier with 4K output and best-in-class text rendering. Nano Banana 2 (March 2026, Gemini 3.1 Flash Image) combines Pro-level quality with Flash-level speed and is the current default for most developers. This benchmark tests Nano Banana 2 specifically; behavior on the other model versions may differ.

Where to Go Next

If you're picking a Nano Banana API provider right now:

  1. Check the live data. Numbers in this article are a 7-day snapshot from late April 2026. Live data is at protos.runflow.io/nano-banana, updated every 10 minutes. Use the longer 30-day window for procurement decisions.
  2. Use the decision framework above. Match your use case (user-facing vs batch, latency-critical vs reliability-critical, prototype vs production) to the right tier of provider.
  3. Don't optimize on per-call price alone. Most providers don't bill for failed calls, so the per-call gap is smaller than it looks — but the reliability gap shows up in user experience, support load, and engineering overhead. Use the success rate to estimate the real cost.
  4. Verify image quality visually. The live gallery on protos.runflow.io shows actual outputs from all six providers for the same prompts at the same moments. Make your own quality judgments from real outputs, not vendor marketing claims.
  5. Test from your own infrastructure before production cutover. Geographic latency varies; provider quotas vary. Run a load test from your hosting region with realistic traffic patterns before committing.

For Runflow's Nano Banana 2 endpoint specifically, including pricing details and integration documentation, see runflow.io/models/google/nano-banana-2.

The benchmark will keep running. The next version of this article will cover whatever's changed: new providers added, reliability shifts, the impact of any model updates Google ships. The right Nano Banana API provider for you today may not be the right one in six months, and that's fine; the benchmark will tell you when to switch.

nano banana api · nano banana 2 api · nano banana pro api · benchmarks

Want custom benchmarks for your workload?

We'll run our evaluation pipeline against your production data, for free.

Talk to Founders