Google's Gemini lineup pairs a frontier Pro tier with several Flash tiers. Every model tracked here offers a 1M-token context window, cached input billed at 10% of the input rate, and a 50% Batch API discount, which suits workloads that mix high-volume and reasoning-heavy calls.
| Model | Input $/1M | Output $/1M | Cached $/1M | Batch | ≈ $/mo * |
|---|---|---|---|---|---|
| Gemini 3.1 Pro Preview (Frontier) | $2.00 | $12.00 | $0.20 | −50% | $508.00 |
| Gemini 3.5 Flash (Mid) | $1.50 | $9.00 | $0.15 | −50% | $381.00 |
| Gemini 3 Flash Preview (Mid) | $0.50 | $3.00 | $0.05 | −50% | $127.00 |
| Gemini 3.1 Flash-Lite (Budget) | $0.25 | $1.50 | $0.025 | −50% | $63.50 |
* Example workload — chatbot, 100k requests/mo, 2,000 input / 300 output tokens per request, 70% of input cached. Computed by the same engine as the calculator. Batch: the −50% is Google's verified Batch API discount; the ≈ $/mo column is computed without it.
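The ≈ $/mo figures can be reproduced by hand from the table rates. The sketch below is a minimal illustration of that arithmetic, not the calculator's actual engine: the function name `monthly_cost` and the `RATES` dictionary are hypothetical, and it assumes cached tokens are billed only at the cached rate with no separate storage fee.

```python
# Rates in USD per 1M tokens, copied from the table above:
# (input, output, cached input)
RATES = {
    "Gemini 3.1 Pro Preview": (2.00, 12.00, 0.20),
    "Gemini 3.5 Flash": (1.50, 9.00, 0.15),
    "Gemini 3 Flash Preview": (0.50, 3.00, 0.05),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50, 0.025),
}


def monthly_cost(model, requests=100_000, input_tokens=2_000,
                 output_tokens=300, cache_share=0.70):
    """Estimate monthly cost in USD, without the Batch discount.

    Assumption: cached tokens pay only the cached rate (no storage fee).
    """
    input_rate, output_rate, cached_rate = RATES[model]
    total_input = requests * input_tokens
    cached = total_input * cache_share      # billed at the cached rate
    uncached = total_input - cached         # billed at the full input rate
    total_output = requests * output_tokens
    return (uncached * input_rate
            + cached * cached_rate
            + total_output * output_rate) / 1_000_000


for name in RATES:
    print(f"{name}: ${monthly_cost(name):,.2f}")
```

With the default arguments (the example workload), this prints $508.00, $381.00, $127.00 and $63.50, matching the table. Output tokens dominate each total, so trimming response length moves the bill more than raising the cache share does.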
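Cached input is billed at 10% of the input rate across all Gemini models we track, a major lever for chatbots and agents where most of the prompt repeats. In the example workload above, a 70% cache share cuts the Gemini 3.1 Pro Preview estimate from $760 to $508 per month. The calculator models this with your cache share.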
The Batch API runs asynchronous jobs at a verified −50% on both input and output across all models we track — flip the Batch toggle in the calculator to model it.
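As a rough illustration of the Batch toggle, the sketch below halves the cost of a job when `batch=True`. The function `job_cost` and the job size are hypothetical, and the example deliberately uses no cached input, because the text does not say how the cached rate and the Batch discount combine.

```python
BATCH_DISCOUNT = 0.50  # the -50% Batch API discount described above


def job_cost(input_tokens, output_tokens, input_rate, output_rate, batch=False):
    """Cost in USD for a job with no cached input; rates are USD per 1M tokens."""
    cost = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    # The discount applies to both the input and the output portion.
    return cost * (1 - BATCH_DISCOUNT) if batch else cost


# Hypothetical job: 10M input and 2M output tokens on Gemini 3 Flash Preview
# ($0.50 input / $3 output per 1M tokens).
standard = job_cost(10_000_000, 2_000_000, 0.50, 3.00)
batched = job_cost(10_000_000, 2_000_000, 0.50, 3.00, batch=True)
print(f"standard: ${standard:.2f}, batch: ${batched:.2f}")
# prints: standard: $11.00, batch: $5.50
```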
Gemini 3.1 Pro Preview, Gemini 3.5 Flash, Gemini 3 Flash Preview and Gemini 3.1 Flash-Lite all have a verified 1M-token context window. Note the long-context surcharge: Gemini 3.1 Pro Preview bills prompts over 200k tokens at $4 input / $18 output per 1M tokens, which is double the standard input rate and 1.5× the standard output rate. The calculator applies these rates automatically once your input crosses 200k tokens.
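The rate switch can be sketched as a simple threshold check. This is an assumption-laden illustration rather than Google's billing logic: the function `pro_request_cost` is hypothetical, it ignores caching, and it assumes the higher rates apply to the whole request once the prompt exceeds 200k tokens, not only to the tokens above the threshold.

```python
LONG_CONTEXT_THRESHOLD = 200_000  # tokens; prompts above this pay the surcharge

# Gemini 3.1 Pro Preview rates in USD per 1M tokens: (input, output)
STANDARD_RATES = (2.00, 12.00)
LONG_CONTEXT_RATES = (4.00, 18.00)


def pro_request_cost(input_tokens, output_tokens):
    """Cost in USD of one uncached request on Gemini 3.1 Pro Preview.

    Assumption: the long-context rates cover the entire request when the
    prompt is strictly larger than the threshold.
    """
    long_context = input_tokens > LONG_CONTEXT_THRESHOLD
    input_rate, output_rate = LONG_CONTEXT_RATES if long_context else STANDARD_RATES
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


print(f"{pro_request_cost(150_000, 2_000):.3f}")  # prints 0.324
print(f"{pro_request_cost(300_000, 2_000):.3f}")  # prints 1.236
```

Under these assumptions, doubling the prompt from 150k to 300k tokens nearly quadruples the per-request cost, so splitting or trimming prompts to stay under the threshold can be worthwhile.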
| Use case | Verdict |
|---|---|
| High-volume calls where a budget Flash tier is enough | Gemini's Flash tiers fit |
| You need a large context window across most tiers | Gemini offers it broadly |
| Prompts regularly exceed the long-context threshold on the Pro tier | Budget for the higher long-context rate |