Updated 2 Jun 2026 • 6 mins read

Google Gemini API Pricing 2026: Every Model and Cost Explained

Khushi Dubey
Author

Google Gemini API pricing is token-based and varies sharply by model, from low-cost Flash-Lite to higher-accuracy Pro. This 2026 guide breaks down per-model input and output rates, the tiered pricing on Pro, the cost of thinking tokens, context caching and Search grounding, a vendor comparison, and practical ways to cut your bill.

Quick answer: Gemini API pricing is pay-per-token and depends entirely on the model. Within the 2.5 family, Gemini 2.5 Pro is the high-accuracy tier at about $1.25 per million input tokens and $10.00 per million output tokens for standard prompts. Gemini 2.5 Flash is the workhorse at roughly $0.30 input and $2.50 output, and Flash-Lite is the budget option near $0.10 input and $0.40 output. Output is far more expensive than input on every model, and on the 2.5 family that output figure includes the model's thinking tokens. There is a free tier in Google AI Studio with rate limits. Always confirm current rates on Google's official Gemini API pricing page, because models and prices change often.

Google's Gemini API has become one of the most cost-competitive ways to build with a frontier model, but the pricing page is easy to misread. Rates change with each model, output costs several times more than input, and the newer reasoning models charge for the tokens they spend thinking. A small change in which model you call can swing your bill by 10x or more.

That makes Gemini pricing less of a sticker question and more of a FinOps one. For anyone building a product on the API, the real cost depends on model choice, prompt size, output length, and whether you use features like context caching and Search grounding.

This guide breaks down Gemini API pricing by model for 2026, explains the tiered pricing on Pro, shows what actually drives your bill, compares Gemini against OpenAI and Anthropic, and lays out practical ways to cut costs without losing quality.

How Gemini API pricing works

Gemini is billed per token, not per request. A token is a chunk of text, roughly four characters or three-quarters of a word in English. Every call has two metered parts:

Input tokens. Everything you send: the prompt, system instructions, conversation history, and any files or images, counted as tokens.
Output tokens. Everything the model returns. On the Gemini 2.5 reasoning models, this also includes thinking tokens, the internal reasoning the model generates before its final answer.

Output is the expensive side. On most models, output tokens cost several times more than input, so a chatty model that reasons at length can cost far more than the input rate suggests. There are also two billing surfaces for the same models: the Gemini API through Google AI Studio, and Vertex AI for enterprise governance. Per-token rates are broadly similar; Vertex adds enterprise controls and its own commitments.
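The billing rule above can be sketched in a few lines of Python. This is a minimal illustration, not Google's billing logic: the function name `request_cost` and its parameters are made up for this example, and the rates are the Gemini 2.5 Flash list prices quoted in this guide.

```python
def request_cost(input_tokens, output_tokens, thinking_tokens,
                 input_rate_per_m, output_rate_per_m):
    """Return the dollar cost of one call.

    Thinking tokens are added to the visible output because, per this guide,
    they bill at the output rate on the 2.5 reasoning models.
    """
    billed_output = output_tokens + thinking_tokens
    total = input_tokens * input_rate_per_m + billed_output * output_rate_per_m
    return total / 1_000_000  # rates are quoted per million tokens


# Gemini 2.5 Flash list rates from this guide: $0.30 in, $2.50 out per 1M tokens.
cost = request_cost(
    input_tokens=2_000,
    output_tokens=300,
    thinking_tokens=200,
    input_rate_per_m=0.30,
    output_rate_per_m=2.50,
)
print(f"Cost of one call: ${cost:.6f}")
```

Even in this small call, the 500 billed output tokens cost about twice as much as the 2,000 input tokens, which is the asymmetry the rest of this guide keeps returning to.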

Free tier: Google AI Studio offers a free tier with lower rate limits, useful for prototyping. On the free tier, your prompts and responses may be used to improve Google's products. On the paid tier, your data is not used for training. If you handle anything sensitive, build on the paid tier.

Gemini consumer plans

Plan	Price	Notes
Free	$0	Gemini 3.5 Flash default, 100 monthly AI credits, 15 GB storage
Google AI Plus	$7.99/mo	Budget tier — the only sub-$20 plan among major assistants
Google AI Pro	$19.99/mo	Gemini 3.1 Pro, 1M context, Deep Research, Gems, Canvas
Google AI Ultra	$99.99/mo	Cut from $249.99 at I/O 2026; 20 TB storage; ~5× Pro limits

Gemini API pricing by model (2026)

Here is the per-model breakdown, covering the newer Gemini 3.x models and the Gemini 2.5 family, which the table marks as legacy. Rates are per million tokens for standard text prompts. Confirm live numbers on Google's Gemini API pricing page, since Google updates models and rates regularly.

Model	Input / 1M	Output / 1M	Context	Notes
Gemini 3.1 Pro (≤200K)	$2.00	$12.00	2M	Current flagship; paid-only
Gemini 3.1 Pro (>200K)	$4.00	$18.00	2M	Long-context tier
Gemini 3.5 Flash	$1.50	$9.00	1M	New default (May 19, 2026)
Gemini 3 Flash	$0.50	$3.00	1M	Cheap, capable workhorse
Gemini 3.1 Flash-Lite	$0.25	$1.50	1M	Cheapest Tier-1 budget
Gemini 2.5 Pro (≤200K)	$1.25	$10.00	1M	Legacy flagship
Gemini 2.5 Pro (>200K)	$2.50	$15.00	1M	Legacy long-context
Gemini 2.5 Flash	$0.30	$2.50	1M	Legacy mid-tier
Gemini 2.5 Flash-Lite	$0.10	$0.40	1M	Legacy cheapest

The pattern to internalize: Pro is for the work that genuinely needs it, Flash handles most production traffic, and Flash-Lite is for cheap, high-volume jobs where a smaller model is good enough. Choosing the right model per task, rather than defaulting everything to Pro, is the single biggest lever on your bill.

Gemini 2.5 Pro pricing explained

Gemini 2.5 Pro uses tiered pricing based on how large your prompt is. Short and medium prompts bill at the lower rate; once a single prompt crosses roughly 200K tokens, both input and output step up to a higher rate.

Prompt size	Input / 1M	Output / 1M (incl. thinking)
Up to 200K tokens	$1.25	$10.00
Above 200K tokens	$2.50	$15.00

Two things surprise teams here. First, the jump is per prompt, so a few very large requests can quietly bill at double the rate while your dashboards still show an average. Second, the output price includes thinking tokens. A hard reasoning task can generate far more internal thinking than visible answer, so a 500-token reply might bill as several thousand output tokens. If you do not need deep reasoning on a given call, a Flash model or a lower thinking budget usually wins on cost.
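To make both surprises concrete, here is a rough sketch of the tier logic, assuming the 200K threshold and the Gemini 2.5 Pro rates from the table above. The names `PRO_RATES` and `pro_request_cost` are illustrative, and the exact cutoff behavior should be checked against Google's pricing page.

```python
PRO_TIER_THRESHOLD = 200_000  # prompt tokens; this guide says "roughly 200K"

# (input, output) dollars per 1M tokens, from the Gemini 2.5 Pro table above.
PRO_RATES = {
    "standard": (1.25, 10.00),
    "long": (2.50, 15.00),
}


def pro_request_cost(prompt_tokens, visible_output_tokens, thinking_tokens=0):
    """Return (tier, dollars) for a single Gemini 2.5 Pro call.

    The tier is chosen per prompt, and thinking tokens bill as output.
    """
    tier = "long" if prompt_tokens > PRO_TIER_THRESHOLD else "standard"
    in_rate, out_rate = PRO_RATES[tier]
    billed_output = visible_output_tokens + thinking_tokens
    cost = (prompt_tokens * in_rate + billed_output * out_rate) / 1_000_000
    return tier, cost


# Same 500-token visible reply with 3,500 thinking tokens, two prompt sizes.
small_tier, small_cost = pro_request_cost(150_000, 500, thinking_tokens=3_500)
large_tier, large_cost = pro_request_cost(250_000, 500, thinking_tokens=3_500)
print(small_tier, round(small_cost, 4))
print(large_tier, round(large_cost, 4))
```

In this sketch the larger prompt costs about three times as much as the smaller one: it is bigger and it also bills at the higher rate. In both calls the 500-token reply is billed as 4,000 output tokens once thinking is included.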

Gemini 2.5 Flash and Flash-Lite: the cost-efficient tiers

Most production traffic does not need the Pro model. The Flash tiers exist precisely so you do not overpay for routine work.

Gemini 2.5 Flash is the balanced choice: strong quality at a quarter of Pro's output rate ($2.50 versus $10.00 per million tokens). It is the right default for chat, summarization, extraction, and most app features. It is still a thinking model, so the same thinking-token caution applies, but you can tune the thinking budget down for simpler calls.

Gemini 2.5 Flash-Lite is the budget tier at around $0.10 input and $0.40 output per million tokens. It is built for high-throughput, latency-sensitive work like classification, routing, tagging, and simple transformations, where a smaller model is good enough and volume is what drives the bill.

A common and effective pattern is a model cascade: send everything to Flash-Lite or Flash first, and only escalate to Pro the small fraction of requests that truly need deeper reasoning. That alone can cut a Gemini bill substantially with little quality loss.
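A cascade can be sketched as a loop over tiers from cheapest to most expensive. Everything here is hypothetical scaffolding: `call_model` stands in for whatever client code you use to reach the API, `is_good_enough` stands in for your own quality check, and the tier labels are plain strings rather than real model identifiers.

```python
from typing import Callable


def cascade(prompt: str,
            call_model: Callable[[str, str], str],
            is_good_enough: Callable[[str], bool],
            tiers=("flash-lite", "flash", "pro")) -> tuple[str, str]:
    """Try cheaper tiers first; escalate only when the answer fails the check.

    Returns (tier_used, answer). If no tier passes, the last tier's answer
    is returned as a best effort.
    """
    answer = ""
    for tier in tiers:
        answer = call_model(tier, prompt)
        if is_good_enough(answer):
            return tier, answer
    return tiers[-1], answer


# Stand-ins for a real API call and a real quality check (assumptions).
def fake_call_model(tier: str, prompt: str) -> str:
    # Pretend only "pro" can handle prompts marked as hard.
    if "hard" in prompt and tier != "pro":
        return "UNSURE"
    return f"{tier} answer"


def fake_check(answer: str) -> bool:
    return answer != "UNSURE"


easy = cascade("classify this ticket", fake_call_model, fake_check)
hard = cascade("hard multi-step proof", fake_call_model, fake_check)
print(easy)
print(hard)
```

One design caveat: an escalated request pays for every failed attempt on the way up, so a cascade only saves money when most traffic is answered by the first tier. A cheap upfront classifier that picks the tier directly is a common alternative.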

What actually drives your Gemini bill

Per-token rates are only the starting point. These five factors decide what you actually pay:

Output volume and thinking tokens. Output costs several times more than input, and on 2.5 models thinking tokens count as output. Long, verbose, or heavily reasoned responses are where cost concentrates.
Prompt size. Large context windows are powerful but you pay for every token you stuff in. Resending long histories or whole documents on each call adds up fast, and on Pro can tip you into the higher pricing tier.
Context caching. If you reuse the same large context across many calls, caching lets you pay a reduced rate for the cached tokens plus a storage fee per hour. Used well it saves money; left running it becomes a quiet recurring charge.
Search grounding. Grounding responses with Google Search has its own cost: a free daily allowance, then a per-thousand-requests charge. Heavy grounded traffic is a separate line item from token usage.
Multimodal inputs. Images, audio, and video are converted to tokens and billed accordingly, and audio input is priced higher than text on most models. Video and audio workloads cost more than they look.
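The caching and grounding items lend themselves to a quick break-even check. The sketch below assumes a simple billing model (a reduced per-token rate for cached reads, an hourly storage fee per million cached tokens, and a per-thousand charge for grounded requests beyond a free daily allowance). The cached rate, storage rate, free allowance, and grounding price are placeholder values, not Google's published prices, so substitute the live numbers before relying on the output.

```python
def context_cost_without_cache(calls, shared_tokens, input_rate_per_m):
    """Resend the shared context at the normal input rate on every call."""
    return calls * shared_tokens * input_rate_per_m / 1_000_000


def context_cost_with_cache(calls, shared_tokens, cached_rate_per_m,
                            storage_rate_per_m_hour, hours_alive):
    """Pay a reduced rate per call plus hourly storage for the cached tokens."""
    reads = calls * shared_tokens * cached_rate_per_m / 1_000_000
    storage = shared_tokens / 1_000_000 * storage_rate_per_m_hour * hours_alive
    return reads + storage


def grounding_cost(requests_per_day, free_per_day, price_per_thousand):
    """Charge only for grounded requests beyond the free daily allowance."""
    billable = max(0, requests_per_day - free_per_day)
    return billable / 1_000 * price_per_thousand


INPUT_RATE = 1.25    # Gemini 2.5 Pro standard input rate from this guide
CACHED_RATE = 0.30   # placeholder, hypothetical cached-token rate
STORAGE_RATE = 4.00  # placeholder, hypothetical $ per 1M tokens per hour

# A 100K-token shared context kept alive for 24 hours.
busy_plain = context_cost_without_cache(1_000, 100_000, INPUT_RATE)
busy_cached = context_cost_with_cache(1_000, 100_000, CACHED_RATE, STORAGE_RATE, 24)
quiet_plain = context_cost_without_cache(10, 100_000, INPUT_RATE)
quiet_cached = context_cost_with_cache(10, 100_000, CACHED_RATE, STORAGE_RATE, 24)

print(f"1,000 calls/day: plain ${busy_plain:.2f} vs cached ${busy_cached:.2f}")
print(f"   10 calls/day: plain ${quiet_plain:.2f} vs cached ${quiet_cached:.2f}")

# Placeholder allowance and price for grounded requests.
daily_grounding = grounding_cost(5_000, free_per_day=1_500, price_per_thousand=35.0)
print(f"Grounding: ${daily_grounding:.2f} per day")
```

With these placeholder rates, caching pays off clearly at 1,000 calls a day and loses money at 10 calls a day, because the storage fee accrues whether or not the cache is read.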

Worked example: estimating a monthly Gemini bill

Token math feels abstract until you put numbers to it. Here is a simple estimate for a product that handles 500,000 requests per month, where each request sends about 2,000 input tokens and receives about 500 output tokens, including thinking. The same workload costs very differently depending on the model you route it to.

Model	Input cost	Output cost	Estimated monthly total
Gemini 2.5 Flash-Lite	$0.10/M × 1,000M = $100	$0.40/M × 250M = $100	~$200
Gemini 2.5 Flash	$0.30/M × 1,000M = $300	$2.50/M × 250M = $625	~$925
Gemini 2.5 Pro	$1.25/M × 1,000M = $1,250	$10.00/M × 250M = $2,500	~$3,750

The same traffic ranges from roughly $200 to $3,750 a month depending purely on model choice. That nearly 19x spread is the entire argument for routing carefully and measuring cost per feature, not just total spend. It also shows why output dominates: on Flash and Pro, the output line is larger than the input line even though there are far fewer output tokens.
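The table can be reproduced with a short script, which makes it easy to plug in your own traffic. The workload figures and rates are the ones used in this example; the variable and function names are illustrative.

```python
REQUESTS_PER_MONTH = 500_000
INPUT_TOKENS = 2_000   # per request
OUTPUT_TOKENS = 500    # per request, including thinking

# (input, output) dollars per 1M tokens, from the pricing table in this guide.
RATES = {
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Gemini 2.5 Pro": (1.25, 10.00),
}


def monthly_cost(in_rate, out_rate):
    """Return (input_dollars, output_dollars) for the month."""
    input_millions = REQUESTS_PER_MONTH * INPUT_TOKENS / 1_000_000
    output_millions = REQUESTS_PER_MONTH * OUTPUT_TOKENS / 1_000_000
    return input_millions * in_rate, output_millions * out_rate


totals = {}
for model, (in_rate, out_rate) in RATES.items():
    input_cost, output_cost = monthly_cost(in_rate, out_rate)
    totals[model] = input_cost + output_cost
    print(f"{model}: input ${input_cost:,.0f} + output ${output_cost:,.0f} "
          f"= ${totals[model]:,.0f}")

spread = totals["Gemini 2.5 Pro"] / totals["Gemini 2.5 Flash-Lite"]
print(f"Pro costs {spread:.2f}x Flash-Lite for the same traffic")
```

With these inputs the ratio comes out at 18.75. Changing `OUTPUT_TOKENS` is a quick way to see how sensitive the total is to response length and thinking.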

How to cut your Gemini API costs

Most Gemini bills can be reduced meaningfully without hurting quality. In rough order of impact:

Route by task, not by habit. Default to Flash or Flash-Lite and reserve Pro for the requests that genuinely need it. Model routing is the largest single lever in almost every deployment.
Control output length and thinking. Cap output tokens, ask for concise answers, and lower the thinking budget on simple calls. Since output dominates cost, this pays off immediately.
Use Batch mode for offline work. Non-urgent jobs run through the Batch API typically cost about half the interactive rate. For anything that does not need a real-time response, this is close to free money.
Cache reused context. If many calls share the same long instructions or documents, context caching cuts the per-call cost of those repeated tokens. Just remember the storage fee and clear caches you no longer need.
Trim prompts and histories. Send only the context the model needs. Summarize long conversation histories instead of resending them in full, and avoid pasting entire documents when a relevant excerpt will do.
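As one illustration of the last item, here is a minimal sketch of trimming a conversation history to a token budget. It uses the rough four-characters-per-token heuristic mentioned earlier in this guide rather than a real tokenizer, so treat the counts as estimates; `estimate_tokens` and `trim_history` are hypothetical helper names.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: about four characters per token in English."""
    return max(1, len(text) // 4)


def trim_history(messages: list[str], token_budget: int) -> list[str]:
    """Keep the most recent messages that fit within token_budget.

    Walks backwards from the newest message and stops at the first one
    that would exceed the budget, so the kept messages stay contiguous.
    """
    kept = []
    used = 0
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > token_budget:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order


history = ["a" * 400, "b" * 200, "c" * 100]  # about 100, 50, and 25 tokens
trimmed = trim_history(history, token_budget=80)
print(len(trimmed), [estimate_tokens(m) for m in trimmed])
```

Dropping old turns outright loses information, so in practice this is often paired with a short summary of the removed messages, as the tip above suggests.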

Gemini vs OpenAI vs Claude API pricing

Gemini's headline advantage is price, especially on the Flash tiers. Here is how the flagship and workhorse models compare on published list rates. Treat these as approximate, since every provider updates pricing often.

Model	Input / 1M	Output / 1M	Tier
Gemini 3.5 Flash	$1.50	$9.00	Workhorse (new default)
Gemini 3.1 Pro	$2.00	$12.00	Flagship (2M context)
OpenAI GPT-5.5	$5.00	$30.00	Flagship
Claude Opus 4.8	$5.00	$25.00	Flagship
Claude Sonnet 4.6	$3.00	$15.00	Workhorse / mid

On raw token price, Gemini Flash sits well below the OpenAI and Anthropic flagships, which is why high-volume products often standardize on it. The flagship comparison is closer once you account for quality differences per task. The practical takeaway is the same one that applies to every provider: the cheapest model that meets your quality bar wins, and what matters most is total AI spend visibility across all the models you run, not any single sticker price.

5 hidden costs and gotchas to watch

The pricing table rarely tells the whole story. These are the surprises that show up on the invoice:

Thinking tokens inflate output. On 2.5 reasoning models, internal thinking bills as output. A short visible answer can carry a large hidden output cost.
The Pro tier jump is per prompt. A handful of very large prompts can bill at the higher Pro rate while averages look fine.
Cache storage keeps charging. Context caching saves on repeated tokens but adds an hourly storage fee. Forgotten caches become a slow leak.
Search grounding is separate. Grounded requests are billed apart from tokens once you pass the free daily allowance.
No native cost allocation. Billing does not split spend by team, feature, model, or customer. At any real scale, you cannot answer cost per customer or cost per feature without extra tooling.
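The allocation gap is usually closed by tagging each call in your own code and aggregating afterwards. The sketch below assumes you log a record per request with a feature tag, a model label, and the token counts returned by the API; the record fields, the short model labels, and `cost_by` are all made up for illustration, and the rates are the Gemini 2.5 list prices from this guide.

```python
from collections import defaultdict

# (input, output) dollars per 1M tokens, keyed by an illustrative short label.
RATES = {
    "flash-lite": (0.10, 0.40),
    "flash": (0.30, 2.50),
    "pro": (1.25, 10.00),
}

# Hypothetical usage log; in practice these rows come from your own logging.
usage_log = [
    {"feature": "search", "model": "flash-lite",
     "input_tokens": 1_000_000, "output_tokens": 200_000},
    {"feature": "chat", "model": "flash",
     "input_tokens": 2_000_000, "output_tokens": 1_000_000},
    {"feature": "chat", "model": "pro",
     "input_tokens": 400_000, "output_tokens": 100_000},
]


def cost_by(records, key):
    """Sum dollar cost per distinct value of `key` (e.g. feature or model)."""
    totals = defaultdict(float)
    for record in records:
        in_rate, out_rate = RATES[record["model"]]
        dollars = (record["input_tokens"] * in_rate
                   + record["output_tokens"] * out_rate) / 1_000_000
        totals[record[key]] += dollars
    return dict(totals)


by_feature = cost_by(usage_log, "feature")
by_model = cost_by(usage_log, "model")
print(by_feature)
print(by_model)
```

The same function answers cost per customer or per team once those tags are added to each record, which is the visibility the pricing page alone cannot give you.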

Conclusion

Gemini is one of the most affordable ways to build with a frontier model, but the bill is driven by model choice, output length, and features like caching and grounding, not the headline rate.

Route work to the cheapest model that clears your quality bar, watch output and thinking tokens, and keep AI spend allocated. In 2026, visibility is the real cost lever.

FAQs

Is the Gemini API free to use?

There is a free tier through Google AI Studio with lower rate limits, which is good for prototyping. On the free tier your data may be used to improve Google's products. Production and sensitive workloads should use the paid tier, where data is not used for training.

How much does Gemini 2.5 Pro cost per million tokens?

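About $1.25 input and $10.00 output for standard prompts, stepping to $2.50/$15.00 once a single prompt exceeds ~200K tokens, with a 1M-token context window. The newer Gemini 3.1 Pro lists at $2.00/$12.00, stepping to $4.00/$18.00 above 200K tokens, and offers up to a 2M-token context, the largest of the major flagships.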

Which Gemini model is the cheapest?

Gemini 2.5 Flash-Lite remains the lowest-cost tier at $0.10/$0.40. Among the newer generation, Gemini 3.1 Flash-Lite ($0.25/$1.50) is the cheapest option, Gemini 3 Flash ($0.50/$3.00) is the cheapest capable workhorse, and Gemini 3.5 Flash ($1.50/$9.00) is the new default.

Why is Gemini output so much more expensive than input?

Generating tokens is more compute-intensive than reading them, so providers price output higher across the board. On Gemini 2.5 models, the output price also covers the model's internal thinking tokens, which can be several times the visible answer.
