Google released Gemini 3.1 Flash Lite on Monday, a budget model built for bulk workloads where per-token cost matters more than peak reasoning. It is available now in preview through Google AI Studio and Vertex AI.
Pricing sits at $0.25 per million input tokens and $1.50 per million output tokens. That undercuts Claude 4.5 Haiku ($1.00/$5.00) by a wide margin and comes in slightly below the older Gemini 2.5 Flash ($0.30 input). Speed gains are substantial too: Google reports 2.5x faster time-to-first-token and 45% higher output throughput versus 2.5 Flash, while Artificial Analysis measures roughly 389 output tokens per second.
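To make the pricing gap concrete, here is a small cost calculator at the published per-million-token rates. The rates come from the article; the workload sizes in the example are illustrative assumptions, not benchmark figures.

```python
# USD per 1M tokens, as reported in the article.
PRICES = {
    "gemini-3.1-flash-lite": {"input": 0.25, "output": 1.50},
    "claude-4.5-haiku": {"input": 1.00, "output": 5.00},
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one batch job at the listed rates."""
    p = PRICES[model]
    return (input_tokens / 1e6) * p["input"] + (output_tokens / 1e6) * p["output"]

# Hypothetical bulk job: 100M input tokens, 20M output tokens.
lite = job_cost("gemini-3.1-flash-lite", 100_000_000, 20_000_000)
haiku = job_cost("claude-4.5-haiku", 100_000_000, 20_000_000)
print(f"Flash Lite: ${lite:.2f}  Haiku: ${haiku:.2f}")  # $55.00 vs $200.00
```

At these rates the same bulk job costs $55 on Flash Lite versus $200 on Haiku, which is the kind of spread that matters at high volume.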
The benchmarks look strong for the tier. Google claims an Elo score of 1432 on Arena.ai, 86.9% on GPQA Diamond, and 76.8% on MMMU Pro. Those are Google-reported numbers, so independent confirmation is pending, but the Arena.ai score at least involves external evaluation. The model card notes it is based on Gemini 3 Pro's architecture, distilled down for throughput.
One notable feature: configurable thinking levels (Minimal, Low, Medium, High) let developers dial reasoning depth per request. Keep it lean for translation and content moderation, crank it up for UI generation or simulations. Early testers from companies like Latitude and Whering report precision comparable to larger models on structured tasks.
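The per-request dial lends itself to a simple routing policy. This is a hedged sketch: the four level names come from the article, but the task-to-level mapping and the request shape below are illustrative assumptions, not Google's actual API.

```python
from enum import Enum

class ThinkingLevel(Enum):
    # Level names as described in the article.
    MINIMAL = "minimal"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

# Assumed routing policy: latency-sensitive bulk tasks stay lean,
# generative or structured tasks get deeper reasoning.
TASK_LEVELS = {
    "translation": ThinkingLevel.MINIMAL,
    "content_moderation": ThinkingLevel.MINIMAL,
    "ui_generation": ThinkingLevel.HIGH,
    "simulation": ThinkingLevel.HIGH,
}

def build_request(task: str, prompt: str) -> dict:
    """Attach a per-request thinking level (hypothetical request shape)."""
    level = TASK_LEVELS.get(task, ThinkingLevel.MEDIUM)  # default to mid-depth
    return {
        "model": "gemini-3.1-flash-lite",
        "prompt": prompt,
        "thinking_level": level.value,
    }
```

The point of the pattern is that reasoning depth becomes a routing decision made per request, so a single model endpoint can serve both cheap moderation traffic and heavier generation work.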
No general availability date yet. The model supports text, image, audio, and video input with a 1M token context window.
Bottom Line
Gemini 3.1 Flash Lite delivers roughly 389 tokens per second at $0.25/$1.50 per million tokens, making it one of the cheapest options in Google's lineup for high-volume API workloads.
Quick Facts
- $0.25/1M input tokens, $1.50/1M output tokens
- 2.5x faster time-to-first-token vs Gemini 2.5 Flash (Google-reported)
- 389 tokens/sec output speed (Artificial Analysis)
- 1432 Elo on Arena.ai, 86.9% GPQA Diamond (Google-reported)
- 1M token context window, multimodal input