Updated 2 June 2026 • 6 min read

Google Gemini API pricing is token-based and varies sharply by model, from low-cost Flash-Lite to higher-accuracy Pro. This 2026 guide breaks down per-model input and output rates, the tiered pricing on Pro, the cost of thinking tokens, context caching and Search grounding, a vendor comparison, and practical ways to cut your bill.
Quick answer: Gemini API pricing is pay-per-token and depends entirely on the model. Gemini 2.5 Pro is the high-accuracy tier at about $1.25 per million input tokens and $10.00 per million output tokens for standard prompts. Gemini 2.5 Flash is the workhorse at roughly $0.30 input and $2.50 output, and Flash-Lite is the budget option near $0.10 input and $0.40 output. Output is far more expensive than input on every model, and on the 2.5 family that output figure includes the model's thinking tokens. There is a free tier in Google AI Studio with rate limits. Always confirm current rates on Google's official Gemini API pricing page, because models and prices change often.
Google's Gemini API has become one of the most cost-competitive ways to build with a frontier model, but the pricing page is easy to misread. Rates change with each model, output costs several times more than input, and the newer reasoning models charge for the tokens they spend thinking. Simply switching which model you call can swing your bill by more than 10x.
That makes Gemini pricing less of a sticker question and more of a FinOps one. For anyone building a product on the API, the real cost depends on model choice, prompt size, output length, and whether you use features like context caching and Search grounding.
This guide breaks down Gemini API pricing by model for 2026, explains the tiered pricing on Pro, shows what actually drives your bill, compares Gemini against OpenAI and Anthropic, and lays out practical ways to cut costs without losing quality.
Gemini is billed per token, not per request. A token is a chunk of text, roughly four characters or three-quarters of a word in English. Every call has two metered parts: input tokens (the prompt you send) and output tokens (what the model generates).
Output is the expensive side. On most models, output tokens cost several times more than input, so a chatty model that reasons at length can cost far more than the input rate suggests. There are also two billing surfaces for the same models: the Gemini API through Google AI Studio, and Vertex AI for enterprise governance. Per-token rates are broadly similar; Vertex adds enterprise controls and its own commitments.
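As a rough illustration of the per-token model described above, the sketch below estimates a token count from the four-characters-per-token rule of thumb and prices a single call. The function names are hypothetical, the character heuristic is only an approximation for English text, and the rates you pass in should come from Google's pricing page.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count using the ~4 characters per token rule of thumb.

    This is an approximation for English text, not a real tokenizer.
    """
    return math.ceil(len(text) / chars_per_token)


def call_cost(input_tokens: int, output_tokens: int,
              input_rate: float, output_rate: float) -> float:
    """Dollar cost of one call; both rates are in dollars per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
```

For example, `call_cost(2_000, 500, 0.30, 2.50)` prices a call with 2,000 input tokens and 500 output tokens at the Flash rates quoted in the quick answer. For billing-grade numbers, use the token counts the API reports rather than a character estimate.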
Free tier: Google AI Studio offers a free tier with lower rate limits, useful for prototyping. On the free tier, your prompts and responses may be used to improve Google's products. On the paid tier, your data is not used for training. If you handle anything sensitive, build on the paid tier.
Here is the per-model breakdown for the Gemini 2.5 family, the current generation. Rates are per million tokens for standard text prompts. Confirm live numbers on Google's Gemini API pricing page, since Google updates models and rates regularly.
| Model | Input / 1M | Output / 1M | Best for |
|---|---|---|---|
| Gemini 2.5 Pro (<= 200K prompt) | $1.25 | $10.00 | Complex reasoning, long context, hardest tasks |
| Gemini 2.5 Pro (> 200K prompt) | $2.50 | $15.00 | Very large context analysis |
| Gemini 2.5 Flash | $0.30 | $2.50 | High-volume production, balanced cost and quality |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | Classification, routing, cheap high-throughput jobs |
| Gemini 2.0 Flash | $0.10 | $0.40 | Prior-gen workhorse, still widely used |
The pattern to internalize: Pro is for the work that genuinely needs it, Flash handles most production traffic, and Flash-Lite is for cheap, high-volume jobs where a smaller model is good enough. Choosing the right model per task, rather than defaulting everything to Pro, is the single biggest lever on your bill.
Gemini 2.5 Pro uses tiered pricing based on how large your prompt is. Short and medium prompts bill at the lower rate; once a single prompt crosses roughly 200K tokens, both input and output step up to a higher rate.
| Prompt size | Input / 1M | Output / 1M (incl. thinking) |
|---|---|---|
| Up to 200K tokens | $1.25 | $10.00 |
| Above 200K tokens | $2.50 | $15.00 |
Two things surprise teams here. First, the jump is per prompt, so a few very large requests can quietly bill at the higher tier (double the input rate and 1.5x the output rate) while your dashboards still show an average. Second, the output price includes thinking tokens. A hard reasoning task can generate far more internal thinking than visible answer, so a 500-token reply might bill as several thousand output tokens. If you do not need deep reasoning on a given call, a Flash model or a lower thinking budget usually wins on cost.
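To make the two surprises concrete, here is a minimal sketch of the Pro tier logic using the rates from the table above. The function name and the `thinking_tokens` parameter are illustrative assumptions, and the exact threshold is assumed to be 200,000 prompt tokens, which the text describes only as "roughly 200K".

```python
# Assumed threshold; the article describes it as "roughly 200K" prompt tokens.
PRO_TIER_THRESHOLD = 200_000


def pro_call_cost(prompt_tokens: int, visible_output_tokens: int,
                  thinking_tokens: int = 0) -> float:
    """Estimated dollar cost of one Gemini 2.5 Pro call.

    The tier is chosen per prompt, and thinking tokens are billed
    at the output rate alongside the visible answer.
    """
    if prompt_tokens <= PRO_TIER_THRESHOLD:
        input_rate, output_rate = 1.25, 10.00   # dollars per 1M tokens
    else:
        input_rate, output_rate = 2.50, 15.00   # dollars per 1M tokens
    billed_output = visible_output_tokens + thinking_tokens
    return (prompt_tokens * input_rate + billed_output * output_rate) / 1_000_000
```

With these assumptions, a 100K-token prompt that returns a 500-token answer after 3,500 thinking tokens is billed for 4,000 output tokens, not 500. The same reply on a 300K-token prompt moves both input and output to the higher tier.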
Most production traffic does not need the Pro model. The Flash tiers exist precisely so you do not overpay for routine work.
Gemini 2.5 Flash is the balanced choice: strong quality at roughly a quarter of Pro's output rate ($2.50 versus $10.00 per million tokens). It is the right default for chat, summarization, extraction, and most app features. It is still a thinking model, so the same thinking-token caution applies, but you can tune the thinking budget down for simpler calls.
Gemini 2.5 Flash-Lite is the budget tier at around $0.10 input and $0.40 output per million tokens. It is built for high-throughput, latency-sensitive work like classification, routing, tagging, and simple transformations, where a smaller model is good enough and volume is what drives the bill.
A common and effective pattern is a model cascade: send everything to Flash-Lite or Flash first, and only escalate to Pro the small fraction of requests that truly need deeper reasoning. That alone can cut a Gemini bill substantially with little quality loss.
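The cascade pattern can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: the `Model` and `Check` callables are hypothetical stand-ins for your own API wrapper and your own quality check, and nothing here calls the Gemini API directly.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical types: a model takes a prompt and returns an answer;
# a check decides whether that answer is good enough to return.
Model = Callable[[str], str]
Check = Callable[[str, str], bool]


def cascade(prompt: str, tiers: Sequence[Tuple[str, Model]],
            is_good_enough: Check) -> Tuple[str, str]:
    """Try tiers cheapest-first and return (tier_name, answer).

    Stops at the first tier whose answer passes the check. If none
    pass, the answer from the last (most capable) tier is returned.
    """
    if not tiers:
        raise ValueError("need at least one tier")
    name, answer = "", ""
    for name, model in tiers:
        answer = model(prompt)
        if is_good_enough(prompt, answer):
            break
    return name, answer
```

In practice `tiers` would be ordered Flash-Lite, Flash, Pro. Note that an escalated request pays for the cheaper attempts as well, so the pattern saves money only when most requests stop at an early tier.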
Per-token rates are only the starting point. These five factors decide what you actually pay:
Token math feels abstract until you put numbers to it. Here is a simple estimate for a product that handles 500,000 requests per month, where each request sends about 2,000 input tokens and receives about 500 output tokens, including thinking. The same workload costs very differently depending on the model you route it to.
| Model | Input cost | Output cost | Estimated monthly total |
|---|---|---|---|
| Gemini 2.5 Flash-Lite | $0.10/M × 1,000M = $100 | $0.40/M × 250M = $100 | ~$200 |
| Gemini 2.5 Flash | $0.30/M × 1,000M = $300 | $2.50/M × 250M = $625 | ~$925 |
| Gemini 2.5 Pro | $1.25/M × 1,000M = $1,250 | $10.00/M × 250M = $2,500 | ~$3,750 |
The same traffic ranges from roughly $200 to $3,750 a month depending purely on model choice. That nearly 19x spread is the entire argument for routing carefully and measuring cost per feature, not just total spend. It also shows why output dominates: on Flash and Pro, the output line is larger than the input line even though there are far fewer output tokens.
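The estimate in the table can be reproduced with a short script, which makes it easy to plug in your own traffic. The rates are the ones listed in this guide; the function and variable names are illustrative.

```python
# Rates from this guide, in dollars per 1M tokens: (input, output).
RATES = {
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Gemini 2.5 Pro": (1.25, 10.00),
}


def monthly_cost(requests: int, input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Monthly dollar cost for a uniform workload.

    output_tokens should include thinking tokens, as in the table.
    """
    input_millions = requests * input_tokens / 1_000_000
    output_millions = requests * output_tokens / 1_000_000
    return input_millions * input_rate + output_millions * output_rate


def estimate_all(requests: int, input_tokens: int, output_tokens: int) -> dict:
    """Monthly cost per model, rounded to cents, for the same workload."""
    return {
        model: round(monthly_cost(requests, input_tokens, output_tokens, *rates), 2)
        for model, rates in RATES.items()
    }
```

Calling `estimate_all(500_000, 2_000, 500)` gives the per-model totals for the workload described above. Changing the output token count is a quick way to see how strongly reply length and thinking drive the total.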
Most Gemini bills can be reduced meaningfully without hurting quality. In rough order of impact:
Gemini's headline advantage is price, especially on the Flash tiers. Here is how the flagship and workhorse models compare on published list rates. Treat these as approximate, since every provider updates pricing often.
| Model | Input / 1M | Output / 1M | Tier |
|---|---|---|---|
| Gemini 2.5 Flash | $0.30 | $2.50 | Workhorse |
| Gemini 2.5 Pro | $1.25 | $10.00 | Flagship |
| OpenAI GPT-5.5 | $5.00 | $30.00 | Flagship |
| Anthropic Claude Sonnet 4.6 | ~$3.00 | ~$15.00 | Workhorse / mid |
On raw token price, Gemini Flash sits well below the OpenAI and Anthropic models listed here, which is why high-volume products often standardize on it. The flagship comparison is closer once you account for quality differences per task. The practical takeaway is the same one that applies to every provider: the cheapest model that meets your quality bar wins, and what matters most is total AI spend visibility across all the models you run, not any single sticker price.
Gemini is one of the most affordable ways to build with a frontier model, but the bill is driven by model choice, output length, and features like caching and grounding, not the headline rate.
Route work to the cheapest model that clears your quality bar, watch output and thinking tokens, and keep AI spend allocated. In 2026, visibility is the real cost lever.
There is a free tier through Google AI Studio with lower rate limits, which is good for prototyping. On the free tier your data may be used to improve Google's products. Production and sensitive workloads should use the paid tier, where data is not used for training.
Gemini 2.5 Pro costs about $1.25 input and $10.00 output per million tokens for standard prompts, stepping up to roughly $2.50 input and $15.00 output once a single prompt exceeds about 200K tokens. The output figure includes thinking tokens.
Flash-Lite is the lowest-cost tier, around $0.10 input and $0.40 output per million tokens, built for high-volume, simpler tasks. Gemini 2.5 Flash is the next step up and the usual default for production.
Generating tokens is more compute-intensive than reading them, so providers price output higher across the board. On Gemini 2.5 models, the output price also covers the model's internal thinking tokens, which can be several times the visible answer.