
Google Gemini API Pricing

Google's Gemini lineup pairs a frontier Pro tier with three Flash tiers. All four offer a 1M-token context window, prompt caching and a batch discount, which suits workloads that mix high-volume and reasoning-heavy calls.

Prices verified June 2026 · changes logged in the changelog

Heads up: The frontier Gemini 3.1 Pro tier bills prompts above 200k tokens at a higher rate than shorter prompts. The table shows the standard short-context rate. If your prompts routinely exceed that threshold, model the long-context rate given under Context window instead, or the estimate will run low.

Model	$ input /1M	$ output /1M	$ cached /1M	Batch	≈ $/mo *
Gemini 3.1 Pro Preview (FRONTIER)	$2	$12	$0.20	−50%	$508
Gemini 3.5 Flash (MID)	$1.50	$9	$0.15	−50%	$381
Gemini 3 Flash Preview (MID)	$0.50	$3	$0.05	−50%	$127
Gemini 3.1 Flash-Lite (BUDGET)	$0.25	$1.50	$0.025	−50%	$63.50

* Example workload — chatbot, 100k requests/mo, 2,000 input / 300 output tokens per request, 70% of input cached. Computed by the same engine as the calculator. Batch: the −50% is Google's verified Batch API discount; the ≈ $/mo column is computed without it.
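The footnote's arithmetic can be reproduced from the table rates. The sketch below is a minimal reimplementation, not the calculator's actual engine: the names `RATES` and `monthly_cost` are illustrative, and the rates are copied from the table above.

```python
# USD per 1M tokens, copied from the table: (input, output, cached input).
RATES = {
    "Gemini 3.1 Pro Preview": (2.00, 12.00, 0.20),
    "Gemini 3.5 Flash": (1.50, 9.00, 0.15),
    "Gemini 3 Flash Preview": (0.50, 3.00, 0.05),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50, 0.025),
}


def monthly_cost(model, requests=100_000, input_tokens=2_000,
                 output_tokens=300, cache_share=0.70):
    """Estimate monthly spend in USD, without the batch discount.

    Defaults mirror the example workload in the footnote.
    """
    input_rate, output_rate, cached_rate = RATES[model]
    total_in_m = requests * input_tokens / 1_000_000    # input tokens, in millions
    total_out_m = requests * output_tokens / 1_000_000  # output tokens, in millions
    cached_m = total_in_m * cache_share                  # billed at the cached rate
    uncached_m = total_in_m - cached_m                   # billed at the full input rate
    return (uncached_m * input_rate
            + cached_m * cached_rate
            + total_out_m * output_rate)


if __name__ == "__main__":
    for name in RATES:
        print(f"{name}: ${monthly_cost(name):.2f}")
```

With the default workload this should reproduce the ≈ $/mo column: for Gemini 3.1 Pro Preview, 60M uncached input tokens at $2, 140M cached at $0.20 and 30M output at $12 come to $120 + $28 + $360 = $508.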

Prompt caching

Cached input is billed at 10% of the input rate across all Gemini models we track — a major lever for chatbots and agents where most of the prompt repeats. The calculator models this with your cache share.
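To see how much the cache share moves the bill, it helps to compute the blended input rate, meaning the average price per 1M input tokens once cached and uncached tokens are mixed. This is a sketch only: `blended_input_rate` is a hypothetical helper, and the 10% ratio is the one stated above.

```python
def blended_input_rate(input_rate, cache_share, cached_ratio=0.10):
    """Average USD per 1M input tokens for a given cache share.

    cached_ratio is the cached price as a fraction of the input rate
    (10% per the paragraph above).
    """
    if not 0.0 <= cache_share <= 1.0:
        raise ValueError("cache_share must be between 0 and 1")
    cached_rate = input_rate * cached_ratio
    return (1 - cache_share) * input_rate + cache_share * cached_rate


if __name__ == "__main__":
    # Gemini 3.1 Pro Preview input rate ($2 per 1M) at a few cache shares.
    for share in (0.0, 0.5, 0.7, 0.9):
        print(f"cache share {share:.0%}: ${blended_input_rate(2.00, share):.3f} per 1M")
```

At the example workload's 70% cache share, the $2 input rate blends down to 0.3 × $2 + 0.7 × $0.20 = $0.74 per 1M input tokens.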

Batch / async

The Batch API runs asynchronous jobs at a verified −50% on both input and output across all models we track — flip the Batch toggle in the calculator to model it.
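The effect of the Batch toggle can be sketched as a flat 50% reduction on input and output charges. The helper name `job_cost` is hypothetical. This page does not state how the batch discount combines with cached input, so the sketch deliberately leaves caching out.

```python
BATCH_DISCOUNT = 0.50  # the -50% Batch API discount stated above


def job_cost(input_tokens, output_tokens, input_rate, output_rate, batch=False):
    """USD cost of a job with no cached input; rates are USD per 1M tokens."""
    cost = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    if batch:
        cost *= 1 - BATCH_DISCOUNT  # same discount on input and output
    return cost


if __name__ == "__main__":
    # 200M input / 30M output tokens on Gemini 3.1 Flash-Lite ($0.25 in, $1.50 out).
    args = (200_000_000, 30_000_000, 0.25, 1.50)
    print(f"standard: ${job_cost(*args):.2f}")
    print(f"batch:    ${job_cost(*args, batch=True):.2f}")
```

For that uncached Flash-Lite job, the standard cost is $50 + $45 = $95, and the batch cost is half of that, $47.50.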

Context window

Gemini 3.1 Pro Preview, Gemini 3.5 Flash, Gemini 3 Flash Preview and Gemini 3.1 Flash-Lite all have a verified 1M-token context window. Note the long-context surcharge: Gemini 3.1 Pro Preview bills prompts over 200k tokens at $4 in / $18 out per 1M. The calculator applies these rates automatically once your input crosses 200k tokens.
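The rate switch for Gemini 3.1 Pro Preview can be sketched per request as below. Two assumptions are made here that this page does not spell out: once a prompt is over 200k tokens, the whole request (not only the tokens past the threshold) is billed at the long-context rates, and a prompt of exactly 200k tokens still gets the standard rate. The function name `pro_request_cost` is illustrative, and cached input is ignored.

```python
LONG_CONTEXT_THRESHOLD = 200_000  # prompt tokens
STANDARD_RATES = (2.00, 12.00)    # USD per 1M tokens: (input, output)
LONG_CONTEXT_RATES = (4.00, 18.00)


def pro_request_cost(prompt_tokens, output_tokens):
    """USD cost of one Gemini 3.1 Pro Preview request, no caching or batch."""
    # Assumption: the tier is chosen by prompt size and applies to the whole request.
    if prompt_tokens > LONG_CONTEXT_THRESHOLD:
        input_rate, output_rate = LONG_CONTEXT_RATES
    else:
        input_rate, output_rate = STANDARD_RATES
    return (prompt_tokens * input_rate + output_tokens * output_rate) / 1_000_000


if __name__ == "__main__":
    print(f"150k prompt: ${pro_request_cost(150_000, 1_000):.3f}")
    print(f"300k prompt: ${pro_request_cost(300_000, 1_000):.3f}")
```

Under these assumptions, a 150k-token prompt with 1k output tokens costs $0.312, while a 300k-token prompt with the same output costs $1.218, roughly four times as much for twice the input.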

When Gemini is worth it

Use case	Verdict
High-volume calls where a budget Flash tier is enough	Gemini's Flash tiers fit
You need a large context window across most tiers	Gemini offers it broadly
Prompts regularly exceed the long-context threshold on the Pro tier	Budget for the higher long-context rate

Is Gemini the right price for your workload?

The calculator puts these four models next to the other 24 we track — at your volume, token mix and cache share.


Frequently asked questions

When does Gemini's long-context surcharge apply?

On the frontier Pro tier (Gemini 3.1 Pro Preview), prompts over 200k tokens are billed at $4 in / $18 out per 1M instead of the standard $2 in / $12 out. The table reflects the standard short-context rate; the Context window section on this page gives the long-context rate.

Do Gemini models support caching and batch processing?

Yes. Every Gemini model we track bills cached input at 10% of its input rate and offers a −50% Batch API discount on both input and output. Per-model figures are in the table above.
