
Google Gemini API Pricing

Google's Gemini lineup pairs a frontier Pro tier with three Flash tiers. All four models offer a 1M-token context window, cached input at 10% of the input rate and a 50% Batch API discount, which suits workloads that mix high-volume and reasoning-heavy calls.

Prices verified June 2026 · changes logged in the changelog
Heads up: The frontier Gemini 3.1 Pro tier bills prompts over 200k tokens at a higher rate than shorter prompts. The table shows the standard short-context rate. If your prompts routinely exceed that threshold, model the long-context rate ($4 input / $18 output per 1M tokens) instead, or the estimate will run low.
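
| Model | Tier | $ input /1M | $ output /1M | $ cached /1M | Batch | ≈ $/mo * |
|---|---|---|---|---|---|---|
| Gemini 3.1 Pro Preview | FRONTIER | $2 | $12 | $0.20 | −50% | $508 |
| Gemini 3.5 Flash | MID | $1.50 | $9 | $0.15 | −50% | $381 |
| Gemini 3 Flash Preview | MID | $0.50 | $3 | $0.05 | −50% | $127 |
| Gemini 3.1 Flash-Lite | BUDGET | $0.25 | $1.50 | $0.025 | −50% | $63.5 |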

* Example workload — chatbot, 100k requests/mo, 2,000 input / 300 output tokens per request, 70% of input cached. Computed by the same engine as the calculator. Batch: the −50% is Google's verified Batch API discount; the ≈ $/mo column is computed without it.
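
The arithmetic behind the ≈ $/mo column can be reproduced with a minimal sketch like the one below. It is not the calculator's actual engine: the function name `monthly_cost`, the `RATES` dictionary and the parameter names are illustrative assumptions, while the rates and workload figures come from the table and footnote above.

```python
# Rates in USD per 1M tokens, copied from the pricing table on this page.
RATES = {
    "Gemini 3.1 Pro Preview": {"input": 2.00, "output": 12.00, "cached": 0.20},
    "Gemini 3.5 Flash": {"input": 1.50, "output": 9.00, "cached": 0.15},
    "Gemini 3 Flash Preview": {"input": 0.50, "output": 3.00, "cached": 0.05},
    "Gemini 3.1 Flash-Lite": {"input": 0.25, "output": 1.50, "cached": 0.025},
}


def monthly_cost(rates, requests=100_000, input_tokens=2_000,
                 output_tokens=300, cache_share=0.70):
    """Estimate monthly USD cost at standard (non-batch, short-context) rates.

    Hypothetical helper: defaults mirror the example workload in the footnote.
    """
    total_input = requests * input_tokens
    cached_input = total_input * cache_share      # billed at the cached rate
    uncached_input = total_input - cached_input   # billed at the full input rate
    total_output = requests * output_tokens
    usd = (
        uncached_input * rates["input"]
        + cached_input * rates["cached"]
        + total_output * rates["output"]
    ) / 1_000_000
    return round(usd, 2)


if __name__ == "__main__":
    for model, rates in RATES.items():
        print(f"{model}: ${monthly_cost(rates):,.2f}")
```

With the default workload this reproduces the column values of $508, $381, $127 and $63.5. Output tokens account for most of each total (30M output tokens at $12 per 1M is $360 of the Pro tier's $508).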

Prompt caching

Cached input is billed at 10% of the input rate across all Gemini models we track — a major lever for chatbots and agents where most of the prompt repeats. The calculator models this with your cache share.
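
To see how much the cache share moves the input bill, the sketch below computes a blended input rate. The helper name `blended_input_rate` and its parameters are hypothetical; the only figure taken from this page is the 10% cached-input ratio.

```python
def blended_input_rate(input_rate, cache_share, cached_fraction=0.10):
    """Average USD per 1M input tokens when `cache_share` of input is cached.

    `cached_fraction` is the cached rate as a share of the input rate
    (10% per this page). The helper itself is an illustrative assumption.
    """
    if not 0.0 <= cache_share <= 1.0:
        raise ValueError("cache_share must be between 0 and 1")
    cached_rate = input_rate * cached_fraction
    return (1.0 - cache_share) * input_rate + cache_share * cached_rate


if __name__ == "__main__":
    # Gemini 3.1 Pro Preview input rate with the example workload's 70% cache share.
    print(blended_input_rate(2.00, 0.70))
```

At a 70% cache share, the Pro tier's $2 input rate blends down to about $0.74 per 1M tokens, roughly a 63% reduction on the input side. Output tokens are unaffected by caching.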

Batch / async

The Batch API runs asynchronous jobs at a verified −50% on both input and output across all models we track — flip the Batch toggle in the calculator to model it.
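
A rough sketch of the Batch toggle follows. It deliberately leaves caching out, since this page does not say how the batch discount combines with cached input; the function name `job_cost` and its parameters are assumptions for illustration.

```python
BATCH_DISCOUNT = 0.50  # the -50% Batch API discount stated on this page


def job_cost(input_tokens, output_tokens, input_rate, output_rate, batch=False):
    """USD cost of a job at per-1M-token rates, optionally with the batch discount.

    Assumption: no cached input is modeled; the discount is applied to the
    combined input and output cost.
    """
    standard = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return standard * (1.0 - BATCH_DISCOUNT) if batch else standard


if __name__ == "__main__":
    # Gemini 3 Flash Preview: 200M input and 30M output tokens, no caching.
    print(job_cost(200_000_000, 30_000_000, 0.50, 3.00))              # standard
    print(job_cost(200_000_000, 30_000_000, 0.50, 3.00, batch=True))  # batch
```

For that uncached workload the standard cost is $190 and the batch cost is $95, so batch is worth modeling for any job that tolerates asynchronous turnaround.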

Context window

Gemini 3.1 Pro Preview, Gemini 3.5 Flash, Gemini 3 Flash Preview and Gemini 3.1 Flash-Lite each have a verified 1M-token context window. Note the long-context surcharge: Gemini 3.1 Pro Preview bills prompts over 200k tokens at $4 input / $18 output per 1M tokens, which is double the standard input rate and 1.5 times the standard output rate. The calculator applies these rates automatically once your input crosses 200k tokens.
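
The tier switch can be sketched as below for Gemini 3.1 Pro Preview. Two assumptions are baked in and should be checked against Google's pricing page: the whole request (input and output) is billed at the long-context rate once the prompt exceeds the threshold, and a prompt of exactly 200,000 tokens still counts as short. The names `request_cost`, `STANDARD` and `LONG_CONTEXT` are hypothetical, and caching is not modeled.

```python
LONG_CONTEXT_THRESHOLD = 200_000  # tokens; the exact boundary handling is an assumption

# USD per 1M tokens for Gemini 3.1 Pro Preview, as stated on this page.
STANDARD = {"input": 2.00, "output": 12.00}
LONG_CONTEXT = {"input": 4.00, "output": 18.00}


def request_cost(input_tokens, output_tokens):
    """USD cost of one uncached request, picking the tier from prompt size."""
    over = input_tokens > LONG_CONTEXT_THRESHOLD
    rates = LONG_CONTEXT if over else STANDARD
    return (
        input_tokens * rates["input"] + output_tokens * rates["output"]
    ) / 1_000_000


if __name__ == "__main__":
    print(request_cost(150_000, 1_000))  # short-context tier
    print(request_cost(250_000, 1_000))  # long-context tier
```

Under these assumptions a 150k-token prompt with 1,000 output tokens costs about $0.31, while a 250k-token prompt with the same output costs about $1.02, more than three times as much for a prompt that is only two-thirds longer.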

When Gemini is worth it

| Use case | Verdict |
|---|---|
| High-volume calls where a budget Flash tier is enough | Gemini's Flash tiers fit |
| You need a large context window across most tiers | Gemini offers it broadly |
| Prompts regularly exceed the long-context threshold on the Pro tier | Budget for the higher long-context rate |

Is Gemini the right price for your workload?
The calculator puts these four models next to the other 24 we track — at your volume, token mix and cache share.

Frequently asked questions

On the frontier Pro tier, prompts above a token threshold are billed at a higher rate than shorter prompts. The table reflects the standard short-context rate; the long-context note on this page shows the higher tier.
Across the lineup in our data, the models offer a discounted cached-input rate and a batch (async) discount. Per-model figures are in the table above.