
Google's Gemini-SQL2 Tops BIRD Text-to-SQL Leaderboard

Gemini 3.1 Pro system hits 80.04% execution accuracy, but there's no paper or API yet.

Andrés Martínez, AI Content Writer
June 16, 2026 · 2 min read
Abstract visualization of a natural language question transforming into database query code on a dark screen

Google Research rolled out Gemini-SQL2 on June 12, a text-to-SQL system built on Gemini 3.1 Pro. It turns plain-language questions into runnable SQL, so people can pull data without writing the queries themselves. The team unveiled it in a thread on X, not a blog post or technical report.

The headline number: 80.04% execution accuracy on BIRD, which Google reports puts it first on the single-model leaderboard. BIRD doesn't just check whether the SQL looks valid. It runs the query against real databases and confirms the result matches the gold answer. Google calls it a "breakthrough," though that's the company's own framing.
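To make "execution accuracy" concrete, here is a minimal Python sketch of the idea, assuming SQLite databases and a set-based comparison of result rows. That is similar in spirit to BIRD's public evaluation script, but it is not BIRD's or Google's actual harness. The `orders` table, its data, and the helper names `execution_match` and `execution_accuracy` are hypothetical, made up for illustration.

```python
import sqlite3


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """Return True if both queries produce the same set of result rows."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A query that fails to run counts as wrong, however plausible it looks.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Compare as sets (an assumption of this sketch): row order and duplicates are ignored.
    return set(predicted_rows) == set(gold_rows)


def execution_accuracy(conn: sqlite3.Connection, pairs: list[tuple[str, str]]) -> float:
    """Percentage of (predicted, gold) pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return 100.0 * hits / len(pairs)


# Toy in-memory database (hypothetical table, not from BIRD).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 120.0), (2, "US", 80.0), (3, "EU", 45.5)],
)

gold = "SELECT SUM(amount) FROM orders WHERE region = 'EU'"
pairs = [
    # Different SQL text, same result: counts as correct.
    ("SELECT SUM(amount) FROM orders WHERE region IN ('EU')", gold),
    # Valid SQL, wrong answer: counts as incorrect.
    ("SELECT SUM(amount) FROM orders", gold),
    # Malformed SQL: counts as incorrect.
    ("SELEC SUM(amount) FROM orders", gold),
]
print(f"{execution_accuracy(conn, pairs):.2f}%")  # 33.33%
```

The point of the design is that two differently worded queries score the same as long as they return the same rows. Under a metric like this, 80.04% means roughly four in five questions came back with the gold result.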

BIRD is a tougher test than older benchmarks like Spider. It covers more than 12,000 question-SQL pairs across 95 databases and 37 professional domains, complete with dirty values and questions that need outside knowledge to answer. Google now holds the top two spots on the single-model track, with Gemini-SQL2 first and its predecessor, Gemini-SQL, second.

Worth noting what's missing. Google hasn't released a model string, an API, a model card, or a paper, which means nobody outside Google can reproduce the 80.04% figure yet. The score also still trails human performance on BIRD, which sits near 93%. Google says better SQL understanding will feed into natural language features across its data services, but it hasn't named a product or a date.


Bottom Line

Gemini-SQL2 scored 80.04% on BIRD's single-model track, but Google has shipped no API, paper, or model card to back the claim.

Quick Facts

  • 80.04% execution accuracy on BIRD (company-reported)
  • Built on Gemini 3.1 Pro
  • Announced June 12, 2026 via X thread
  • BIRD: 12,751 question-SQL pairs, 95 databases, 37 domains
  • Human performance on BIRD: 92.96%
Tags: Google, Gemini, text-to-SQL, BIRD benchmark, AI databases, Gemini 3.1 Pro, enterprise AI
Andrés Martínez

AI Content Writer

Andrés reports on the AI stories that matter right now. No hype, just clear, daily coverage of the tools, trends, and developments changing industries in real time. He makes the complex feel routine.
