# Gemini 3.5 Flash goes GA: what $1.50/$9 per million actually buys

> Gemini 3.5 Flash is GA at $1.50/$9 per million tokens (cached input $0.15): triple the Flash it replaces, 25% cheaper than 3.1 Pro, and ahead of it on coding and agentic benchmarks. What it buys.

- **Pillar:** Models
- **Author:** Aditya Marin Gasga (Founding Editor)
- **Published:** 2026-06-13T13:02:00.000Z
- **Tags:** models, ai-pricing

## TL;DR

Gemini 3.5 Flash is generally available at $1.50 per million input tokens and $9.00 per million output tokens, with cached input at $0.15. That is three times the price of the Gemini 3 Flash it replaces, and 25 percent cheaper than Gemini 3.1 Pro, which it beats on coding and agentic benchmarks.

import ComparisonTable from '~/components/article/ComparisonTable.astro';

Gemini 3.5 Flash went GA on May 19 at [$1.50 per million input tokens and $9.00 per million output, with cached input at $0.15](https://llm-stats.com/blog/research/gemini-3.5-flash-launch). The model is the smaller story. The price is the bigger one: this is the tier Google built to be cheap, and it now costs [three times the Gemini 3 Flash it replaces](https://www.xda-developers.com/google-gemini-3-5-flash-costs-3x-model-replaced-cheap-ai-ending/), which ran at $0.50 and $3.

You're paying frontier-adjacent money for the budget badge. The question is whether the model earns it. Mostly, for one kind of buyer, it does.

## The ledger: what got better, what didn't

On the capability side, the upgrade is real and it's lopsided in an interesting way. Google's published numbers have 3.5 Flash beating its own larger Gemini 3.1 Pro on the agentic and coding suite: [76.2 percent vs 70.3 on Terminal-Bench 2.1, 83.6 vs 78.2 on MCP Atlas, and 57.9 vs 43.0 on Finance Agent v2](https://llm-stats.com/blog/research/gemini-3.5-flash-launch). Independent evaluator Artificial Analysis, given pre-release access, [scored it 55 on its Intelligence Index, nine points above Gemini 3 Flash, and measured around 284 output tokens per second](https://www.techtimes.com/articles/316861/20260519/google-ships-gemini-35-flash-cheap-run-agent-model-that-costs-3x-more-per-token.htm). Sundar Pichai cited 289 on stage; the "300 tokens per second" figure circulating is a slight overstatement.

The other column matters just as much. The model [trails 3.1 Pro on long-context retrieval and on knowledge-heavy tests like Humanity's Last Exam](https://www.techtimes.com/articles/316861/20260519/google-ships-gemini-35-flash-cheap-run-agent-model-that-costs-3x-more-per-token.htm), an expected trade given Google tuned it for [agentic execution](/what-is-agentic-ai) over raw recall. The context window is [1M tokens in, about 64K out](https://openrouter.ai/google/gemini-3.5-flash), so teams who bought Gemini for the [2M-token window stay on 3.1 Pro](https://www.metacto.com/blogs/the-true-cost-of-google-gemini-a-guide-to-api-pricing-and-integration).

Read the two columns together and the model's identity is clear: this is a workhorse for tool-calling loops, not a librarian. It executes faster and more reliably than the Pro model that costs more; it knows less and retrieves worse over very long context.

## The price, in context

Three comparisons frame what $1.50/$9 means in June 2026.

**Against its own lineage.** Gemini 2.5 Flash to 3 Flash to 3.5 Flash traces [$0.30-class input to $0.50 to $1.50](https://www.xda-developers.com/google-gemini-3-5-flash-costs-3x-model-replaced-cheap-ai-ending/): roughly five times the input price of the 2.5-era tier in under a year. The line only goes up.

**Against the Pro tier.** At [$2/$12, Gemini 3.1 Pro is now 25 percent more expensive](https://www.metacto.com/blogs/the-true-cost-of-google-gemini-a-guide-to-api-pricing-and-integration) than the Flash model that outperforms it on coding and agents. Pichai's "frontier capabilities at less than half the price" pitch [refers to competing frontier models, not Google's prior Flash](https://www.techtimes.com/articles/316861/20260519/google-ships-gemini-35-flash-cheap-run-agent-model-that-costs-3x-more-per-token.htm). Both framings are technically true. Only one describes your bill.

**Against the countertrend.** While Western labs walk prices up, DeepSeek made its promotional 75 percent cut permanent in May, pricing [V4-Pro at roughly $0.44/$0.87 and V4-Flash at $0.14/$0.28](https://www.xda-developers.com/google-gemini-3-5-flash-costs-3x-model-replaced-cheap-ai-ending/). The cheap tier still exists. It's just increasingly not American.

The levers that soften the sticker: [batch and flex pricing run 50 percent off, and cache hits cost $0.15, a 90 percent reduction on input](https://evolink.ai/blog/gemini-3-5-flash-pricing-guide). For agent workloads with a fat shared system prompt, caching is the difference between this model being mid-priced and being expensive. Note also that [non-global endpoints price at $1.65/$9.90, with that pricing taking effect July 1](https://cloud.google.com/gemini-enterprise-agent-platform/generative-ai/pricing).

<ComparisonTable
  caption="Price vs published agentic capability, June 2026. Each value traces to the cited source in the text; models without a published Terminal-Bench 2.1 score show price only."
  criteria={[
    { key: "input", label: "Input ($/1M)" },
    { key: "output", label: "Output ($/1M)" },
    { key: "tb21", label: "Terminal-Bench 2.1 (%)" }
  ]}
  items={[
    { name: "Gemini 3.5 Flash", values: { input: "1.50", output: "9.00", tb21: "76.2" } },
    { name: "Gemini 3.1 Pro", values: { input: "2.00", output: "12.00", tb21: "70.3" } },
    { name: "Gemini 3 Flash", values: { input: "0.50", output: "3.00", tb21: "—" } },
    { name: "Gemini 3.1 Flash-Lite", values: { input: "0.25", output: "1.50", tb21: "—" } },
    { name: "DeepSeek V4-Pro", values: { input: "0.44", output: "0.87", tb21: "—" } },
    { name: "DeepSeek V4-Flash", values: { input: "0.14", output: "0.28", tb21: "—" } }
  ]}
/>

## Why the cheap tier got expensive

Zoom out from the per-token line and the industry logic is visible. Agents changed the consumption curve: a single user request now triggers a loop of model calls, tool calls, and retries, which is exactly the workload 3.5 Flash is tuned for. Constellation Research framed the backdrop as enterprise ["sticker shock" as companies see their first real token bills for running agents at scale](https://www.techtimes.com/articles/316861/20260519/google-ships-gemini-35-flash-cheap-run-agent-model-that-costs-3x-more-per-token.htm). Google's bet is that buyers running agents will pay 3x per token for a model that completes the loop in fewer turns, because the bill that matters is per task, not per token.

That bet may even be right. But it quietly retires an assumption the whole budget tier was built on: that next year's cheap model costs what last year's did. Anyone whose unit economics depended on that assumption, and a lot of high-volume products did, just inherited a repricing problem they didn't choose.

## Who should move, who should wait

**Move:** teams running agentic or coding workloads on Gemini 3.1 Pro who don't need the 2M context. You get [better scores on those suites for 25 percent less](https://www.metacto.com/blogs/the-true-cost-of-google-gemini-a-guide-to-api-pricing-and-integration), and caching stacks on top.

**Run the math first:** anyone on Gemini 3 Flash doing simple classification, extraction, or routing. A 3x price jump for capability you don't use is a donation. The [$0.10/$0.40 Flash-Lite class](https://costgoat.com/pricing/gemini-api) exists for exactly those jobs.

**Wait:** long-context-heavy workloads (stay on Pro) and anyone with a hard deadline on June 1 housekeeping, since [Gemini 2.0 Flash shut down June 1](https://www.metacto.com/blogs/the-true-cost-of-google-gemini-a-guide-to-api-pricing-and-integration) and forced migrations are the wrong moment to also change tiers.

One number to carry out of this piece: $1.50 is what Google now charges for the cheap one. Recalibrate every "AI is getting cheaper" assumption in your 2026 plan against it.

## FAQ

**How much does Gemini 3.5 Flash cost?**
Gemini 3.5 Flash costs $1.50 per million input tokens and $9.00 per million output tokens, with cached input at $0.15 per million. Batch and flex processing run at a 50 percent discount, and non-global endpoints price at $1.65/$9.90. It went generally available on May 19, 2026 with the API model ID gemini-3.5-flash.

**Is Gemini 3.5 Flash better than Gemini 3.1 Pro?**
On coding and agentic benchmarks, yes: Google reports 76.2 percent vs 70.3 on Terminal-Bench 2.1 and 83.6 vs 78.2 on MCP Atlas, at a price 25 percent lower. Gemini 3.1 Pro remains stronger on long-context retrieval and knowledge-heavy tests like Humanity's Last Exam, and it offers a 2M-token context window against Flash's 1M. Pick by workload, not by tier name.

**Why is Gemini 3.5 Flash more expensive than Gemini 3 Flash?**
Google priced 3.5 Flash at three times its predecessor ($1.50/$9 vs $0.50/$3) and positioned it as an agent-execution model rather than a pure budget tier. Agentic workloads multiply model calls per task, and Google is betting buyers will pay more per token for fewer turns per completed task. Teams that only need classification or routing should use the cheaper Flash-Lite models instead.