Google released Gemini 3.1 Flash Lite on Monday, a budget model built for bulk workloads where per-token cost matters more than peak reasoning. It is available now in preview through Google AI Studio and Vertex AI.
Pricing sits at $0.25 per million input tokens and $1.50 per million output tokens. That undercuts Claude 4.5 Haiku ($1.00/$5.00) by a wide margin and comes in slightly below the older Gemini 2.5 Flash ($0.30 input). Speed gains are substantial too: Google reports 2.5x faster time-to-first-token and 45% higher output throughput versus 2.5 Flash, while Artificial Analysis measures roughly 389 output tokens per second.
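To make the pricing gap concrete, here is a small cost calculator at the published per-million-token rates. The rates come from the article; the workload sizes in the example are illustrative assumptions, not benchmark figures.

```python
# USD per 1M tokens, as reported in the article.
PRICES = {
    "gemini-3.1-flash-lite": {"input": 0.25, "output": 1.50},
    "claude-4.5-haiku": {"input": 1.00, "output": 5.00},
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one batch job at the listed rates."""
    p = PRICES[model]
    return (input_tokens / 1e6) * p["input"] + (output_tokens / 1e6) * p["output"]

# Hypothetical bulk job: 100M input tokens, 20M output tokens.
lite = job_cost("gemini-3.1-flash-lite", 100_000_000, 20_000_000)
haiku = job_cost("claude-4.5-haiku", 100_000_000, 20_000_000)
print(f"Flash Lite: ${lite:.2f}  Haiku: ${haiku:.2f}")  # $55.00 vs $200.00
```

At these rates the same bulk job costs $55 on Flash Lite versus $200 on Haiku, which is the kind of spread that matters at high volume.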
The benchmarks look strong for the tier. Google claims an Elo score of 1432 on Arena.ai, 86.9% on GPQA Diamond, and 76.8% on MMMU Pro. Those are Google-reported numbers, so independent confirmation is pending, but the Arena.ai score at least involves external evaluation. The model card notes it is based on Gemini 3 Pro's architecture, distilled down for throughput.
One notable feature: configurable thinking levels (Minimal, Low, Medium, High) let developers dial reasoning depth per request. Keep it lean for translation and content moderation, crank it up for UI generation or simulations. Early testers from companies like Latitude and Whering report precision comparable to larger models on structured tasks.
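The per-request dial lends itself to a simple routing policy. This is a hedged sketch: the four level names come from the article, but the task-to-level mapping and the request shape below are illustrative assumptions, not Google's actual API.

```python
from enum import Enum

class ThinkingLevel(Enum):
    # Level names as described in the article.
    MINIMAL = "minimal"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

# Assumed routing policy: latency-sensitive bulk tasks stay lean,
# generative or structured tasks get deeper reasoning.
TASK_LEVELS = {
    "translation": ThinkingLevel.MINIMAL,
    "content_moderation": ThinkingLevel.MINIMAL,
    "ui_generation": ThinkingLevel.HIGH,
    "simulation": ThinkingLevel.HIGH,
}

def build_request(task: str, prompt: str) -> dict:
    """Attach a per-request thinking level (hypothetical request shape)."""
    level = TASK_LEVELS.get(task, ThinkingLevel.MEDIUM)  # default to mid-depth
    return {
        "model": "gemini-3.1-flash-lite",
        "prompt": prompt,
        "thinking_level": level.value,
    }
```

The point of the pattern is that reasoning depth becomes a routing decision made per request, so a single model endpoint can serve both cheap moderation traffic and heavier generation work.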
No general availability date yet. The model supports text, image, audio, and video input with a 1M token context window.
Bottom Line
Gemini 3.1 Flash Lite delivers roughly 389 tokens per second at $0.25/$1.50 per million tokens, making it one of the cheapest options in Google's lineup for high-volume API workloads.
Quick Facts
- $0.25/1M input tokens, $1.50/1M output tokens
- 2.5x faster time-to-first-token vs Gemini 2.5 Flash (Google-reported)
- 389 tokens/sec output speed (Artificial Analysis)
- 1432 Elo on Arena.ai, 86.9% GPQA Diamond (Google-reported)
- 1M token context window, multimodal input