LLMs & Foundation Models

Google Launches Gemini 3.1 Flash Lite, Its Cheapest Gemini 3 Model

New model costs $0.25 per million input tokens and runs 2.5x faster than Gemini 2.5 Flash.

Andrés Martínez
Andrés MartínezAI Content Writer
March 4, 20262 min read
Share:
Abstract visualization of fast, lightweight data processing with Google's signature colors

Google released Gemini 3.1 Flash Lite on Monday, a budget model built for bulk workloads where per-token cost matters more than peak reasoning. It is available now in preview through Google AI Studio and Vertex AI.

Pricing sits at $0.25 per million input tokens and $1.50 per million output tokens. That undercuts Claude 4.5 Haiku ($1.00/$5.00) by a wide margin and comes in slightly below the older Gemini 2.5 Flash ($0.30 input). Speed gains are substantial too: Google reports 2.5x faster time-to-first-token and 45% higher output throughput versus 2.5 Flash, per Artificial Analysis benchmarks. The model pushes roughly 389 tokens per second.

The benchmarks look strong for the tier. Google claims an Elo score of 1432 on Arena.ai, 86.9% on GPQA Diamond, and 76.8% on MMMU Pro. Those are Google-reported numbers, so independent confirmation is pending, but the Arena.ai score at least involves external evaluation. The model card notes it is based on Gemini 3 Pro's architecture, distilled down for throughput.

One notable feature: configurable thinking levels (Minimal, Low, Medium, High) let developers dial reasoning depth per request. Keep it lean for translation and content moderation, crank it up for UI generation or simulations. Early testers from companies like Latitude and Whering report precision comparable to larger models on structured tasks.

No general availability date yet. The model supports text, image, audio, and video input with a 1M token context window.


Bottom Line

Gemini 3.1 Flash Lite delivers roughly 389 tokens per second at $0.25/$1.50 per million tokens, making it one of the cheapest options in Google's lineup for high-volume API workloads.

Quick Facts

  • $0.25/1M input tokens, $1.50/1M output tokens
  • 2.5x faster time-to-first-token vs Gemini 2.5 Flash (Google-reported)
  • 389 tokens/sec output speed (Artificial Analysis)
  • 1432 Elo on Arena.ai, 86.9% GPQA Diamond (Google-reported)
  • 1M token context window, multimodal input
Tags:GoogleGeminiLLMAPI pricingGemini 3.1 Flash LiteAI modelsdeveloper tools
Andrés Martínez

Andrés Martínez

AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.

Related Articles

Stay Ahead of the AI Curve

Get the latest AI news, reviews, and deals delivered straight to your inbox. Join 100,000+ AI enthusiasts.

By subscribing, you agree to our Privacy Policy. Unsubscribe anytime.

Google Launches Gemini 3.1 Flash Lite: Cheapest Gemini 3 Mod | aiHola