Google Research announced Gemini-SQL2 on June 12, 2026, a text-to-SQL system built on Gemini 3.1 Pro. It turns plain-language questions into runnable SQL, so people can pull data without writing the queries themselves. The team unveiled it in a thread on X, not in a blog post or technical report.
The headline number: 80.04% execution accuracy on BIRD, which Google reports puts it first on the single-model leaderboard. BIRD doesn't just check whether the SQL looks valid. It runs the query against real databases and confirms the result matches the gold answer. Google calls the result a "breakthrough," though that is the company's own framing in its own announcement.
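To make the metric concrete, here is a minimal sketch of an execution-accuracy check, assuming a SQLite database and a simple set comparison of result rows. The toy `orders` table, the query pairs, and the function names `execution_match` and `execution_accuracy` are illustrative assumptions, not Google's or BIRD's actual code, and the official evaluator may differ in details such as timeouts.

```python
import sqlite3

def execution_match(conn, predicted_sql, gold_sql):
    """Return True if both queries produce the same set of result rows."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A predicted query that fails to run counts as a miss.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    return set(predicted_rows) == set(gold_rows)

def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) pairs whose results match."""
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return hits / len(pairs)

# Hypothetical toy database standing in for a benchmark database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL);
INSERT INTO orders VALUES (1, 'EU', 10.0), (2, 'US', 25.0), (3, 'EU', 5.0);
""")

pairs = [
    # Different SQL text, same result: counts as correct.
    ("SELECT SUM(amount) FROM orders WHERE region IN ('EU')",
     "SELECT SUM(amount) FROM orders WHERE region = 'EU'"),
    # Row order differs, but the set of rows is the same: correct.
    ("SELECT region FROM orders WHERE amount > 8 ORDER BY id DESC",
     "SELECT region FROM orders WHERE amount > 8"),
    # Valid SQL, wrong answer: incorrect.
    ("SELECT COUNT(*) FROM orders WHERE region = 'US'",
     "SELECT COUNT(*) FROM orders WHERE region = 'EU'"),
    # Invalid SQL (no such column): incorrect.
    ("SELECT total FROM orders",
     "SELECT SUM(amount) FROM orders"),
]

accuracy = execution_accuracy(conn, pairs)
print(f"execution accuracy: {accuracy:.2%}")
```

The third pair shows why this is stricter than a syntax check: the predicted query is perfectly valid SQL, yet it still scores as a miss because its result differs from the gold answer.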
BIRD is a tougher test than older benchmarks like Spider. It covers more than 12,000 question-SQL pairs across 95 databases and 37 professional domains, complete with dirty values and questions that need outside knowledge to answer. With Gemini-SQL2 placed ahead of the company's earlier Gemini-SQL, Google now holds the top two spots on the single-model track.
Worth noting what's missing. Google hasn't released a model string, an API, a model card, or a paper, which means nobody outside Google can reproduce the 80.04% yet. The score also still trails human performance on BIRD, which sits near 93%. Google says better SQL understanding will feed into natural language features across its data services, but it hasn't named a product or a date.
Bottom Line
Gemini-SQL2 scored 80.04% on BIRD's single-model track, but Google has shipped no API, paper, or model card to back the claim.
Quick Facts
- 80.04% execution accuracy on BIRD (company-reported)
- Built on Gemini 3.1 Pro
- Announced June 12, 2026 via X thread
- BIRD: 12,751 question-SQL pairs, 95 databases, 37 domains
- Human performance on BIRD: 92.96%




