
Google's Gemini 3.5 Flash Beats Its Own Flagship at I/O 2026

Google shipped Gemini 3.5 Flash, Omni video, and Antigravity 2.0, with internal benchmarks doing the heavy lifting.

Andrés Martínez, AI Content Writer
May 20, 2026 · 6 min read
Google CEO Sundar Pichai on stage at I/O 2026 announcing Gemini AI updates in Mountain View

Google fired three barrels at its I/O developer conference on Tuesday, shipping Gemini 3.5 Flash, a multimodal video model called Omni, and a rebuilt Antigravity 2.0 agent platform. All three launched the same day in Mountain View. All three lean on Google's own internal benchmarks to make the case.

Google's headline claim, repeated across keynote slides and the DeepMind blog, is that Gemini 3.5 Flash beats Gemini 3.1 Pro on most coding and agentic tests. The Pro model in question shipped on February 19, 2026. So we're talking about a cheap, fast tier that supposedly overtook the company's own flagship in roughly three months. Whether that reads as a win for Flash or an indictment of Pro depends on who you ask.

The benchmark story Google wants you to read

The numbers Google put on stage: 76.2% on Terminal-Bench 2.1 versus 3.1 Pro's 70.3%, 1,656 Elo on GDPval-AA against 1,314, and 83.6% on MCP Atlas against 78.2%. Throw in 84.2% on CharXiv Reasoning for multimodal work, plus the now-standard "four times faster than comparable frontier models" framing, and the slide writes itself.

And then there's the bit Google didn't put on the slide. GPT-5.5 still leads Terminal-Bench 2.1 at 78.2%, two points above 3.5 Flash. Claude Opus 4.7 holds GDPval-AA at 1,753 Elo, nearly a hundred points clear of Google's new model, and is well ahead on SWE-Bench Pro for repo-scale code edits. The "frontier-level intelligence" language only really holds up if you ignore the actual frontier.
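For anyone who wants to check the gaps rather than take the slide (or this article) at its word, here is a minimal sketch of the arithmetic. The scores are the ones quoted above; the dictionary layout and the `gaps` helper are illustrative names of my own, not anything Google published. No outside leader is quoted here for MCP Atlas, so that entry is left empty rather than guessed.

```python
# Scores as quoted in this article. "leader" is the top outside score
# mentioned here, or None where the article gives no such figure.
SCORES = {
    "Terminal-Bench 2.1 (%)": {
        "flash_3_5": 76.2, "pro_3_1": 70.3, "leader": ("GPT-5.5", 78.2),
    },
    "GDPval-AA (Elo)": {
        "flash_3_5": 1656, "pro_3_1": 1314, "leader": ("Claude Opus 4.7", 1753),
    },
    "MCP Atlas (%)": {
        "flash_3_5": 83.6, "pro_3_1": 78.2, "leader": None,
    },
}


def gaps(row):
    """Return (lead over 3.1 Pro, deficit to the quoted leader or None)."""
    lead_over_pro = round(row["flash_3_5"] - row["pro_3_1"], 1)
    leader = row["leader"]
    deficit = None if leader is None else round(leader[1] - row["flash_3_5"], 1)
    return lead_over_pro, deficit


if __name__ == "__main__":
    for name, row in SCORES.items():
        lead, deficit = gaps(row)
        leader_note = (
            "no outside leader quoted"
            if deficit is None
            else f"{deficit} behind {row['leader'][0]}"
        )
        print(f"{name}: +{lead} over 3.1 Pro, {leader_note}")
```

The same model that clears its own flagship by 342 Elo on GDPval-AA sits 97 behind Claude Opus 4.7 on that test, which is the distinction the keynote framing skips.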

It gets thornier. Gemini 3.5 Flash trails 3.1 Pro on long-context retrieval and on knowledge-heavy tests like Humanity's Last Exam, which Google's own materials acknowledge. The model was tuned for agent execution, not raw recall. Fine. But "beats Pro on the benchmarks we picked" is a different sentence than "beats Pro," and the marketing isn't drawing the distinction.

Pricing puts the rest of the picture in focus. Gemini 3.5 Flash is going for $1.50 per million input tokens and $9 per million output, with cached input at $0.15, on a one-million-token context window. That's a Flash-tier price for what Google is calling Pro-tier reasoning. The economics matter more than the leaderboard position.
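To put those rates in per-request terms, here is a rough cost sketch. The three prices come from the paragraph above; the function name, the token counts in the example, and the assumption that cached tokens are billed at the cached rate in place of the standard input rate are mine, so check Google's pricing page before budgeting against it.

```python
# USD per one million tokens, as quoted in this article.
PRICE_PER_MILLION_USD = {"input": 1.50, "cached_input": 0.15, "output": 9.00}


def estimate_cost_usd(input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate the cost of one request in USD.

    Assumption: cached_input_tokens is the part of input_tokens served
    from cache, billed at the cached rate instead of the standard rate.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    fresh_input_tokens = input_tokens - cached_input_tokens
    total_per_million = (
        fresh_input_tokens * PRICE_PER_MILLION_USD["input"]
        + cached_input_tokens * PRICE_PER_MILLION_USD["cached_input"]
        + output_tokens * PRICE_PER_MILLION_USD["output"]
    )
    return total_per_million / 1_000_000


if __name__ == "__main__":
    # Hypothetical agent step: 200k tokens of context in, 20k tokens out.
    print(f"uncached: ${estimate_cost_usd(200_000, 20_000):.4f}")
    print(f"150k cached: ${estimate_cost_usd(200_000, 20_000, 150_000):.4f}")
```

In that hypothetical request, output tokens account for a large share of the bill despite being a tenth of the volume, so long agent runs that write a lot of code will feel the $9 rate more than the $1.50 one.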

Omni: 'create anything,' minus the dangerous parts

The video model was the bigger consumer headline. Gemini Omni, which Sundar Pichai described from the keynote stage as a model that can "create anything from any input, starting with video," launched the same day. The first release, Omni Flash, is rolling out to Google AI Plus, Pro, and Ultra subscribers through the Gemini app and Google Flow, with free access through YouTube Shorts and YouTube Create later this week. Developer and enterprise API access is promised "in the coming weeks," which is a lifetime in AI-product calendar terms.

What Omni actually does, as opposed to what the keynote sizzle reel implied: it generates and edits video from combinations of images, audio, video, and text. Characters stay consistent across edits. Physics holds up better than in prior Veo releases. There's an avatar feature where you record yourself reading numbers, and the system stores a digital you for later use. Every output carries Google's SynthID watermark.

What Omni doesn't do is the interesting part. Google DeepMind product director Nicole Brichtova told TechCrunch the company is still "working to test" editing the audio and speech in existing videos, so it can "better understand how we can bring this capability to users responsibly." Translation: the deepfake-adjacent feature that would make Omni most powerful is being deliberately held back. Brichtova also confirmed a 10-second clip ceiling for Omni Flash, framing it as a deployment choice rather than a model limit.

A higher-tier "Omni Pro" was teased without a date.

Brichtova said Pro ships "when we feel like we're at a point where we have a step change above Flash," which is corporate for: when we have something better, you'll see it. The avatar onboarding flow, by the way, makes you read out numbers on camera before it'll generate anything in your likeness. Whether that holds up against motivated bad actors is its own question.

And then there's Antigravity 2.0

The third launch is the one most likely to actually change developer workflows. Antigravity 2.0 is a standalone desktop app built around running multiple coding agents in parallel, alongside a new CLI written in Go, an SDK for custom agents, and Managed Agents in the Gemini API that spin up isolated Linux environments per task. Most of it runs on 3.5 Flash, which Google says was itself co-developed using Antigravity.

Buried in the announcement: the original Gemini CLI is being put out to pasture. Consumer access to Gemini CLI and Gemini Code Assist IDE extensions ends June 18, 2026, per Google's developer recap. Antigravity CLI is now the migration path. Whether existing users see this as consolidation or churn depends largely on how attached they got to Gemini CLI in its very brief life.

Google also restructured its consumer AI pricing on the same day. A new $100-per-month AI Ultra tier slots between the $20 Pro plan and the top tier, which Google quietly dropped from $250 to $200. The middle tier gets 5x the Antigravity usage limits of Pro. Anthropic and OpenAI have been carving up that same price band for a while now, so this looks more like Google catching up than leading.

What's actually new here?

That's the question I keep landing on. A faster Flash model with selectively favorable benchmarks. A video generator with its genuinely powerful feature held back. A coding platform that consolidates tools Google shipped six months ago. None of that is bad. Some of it is genuinely useful. But the framing of three "major launches" is doing a lot of work.

The next dates worth watching are close. Omni Pro and Gemini 3.5 Pro are both promised, with Google confirming 3.5 Pro for next month. Omni API access for developers lands in the coming weeks. The Gemini CLI sunset hits on June 18. And given that competitors still lead on the tests that arguably matter most, the counter-launches from Anthropic and OpenAI probably won't be far behind.

Tags: Google, Gemini, Google I/O 2026, Gemini 3.5 Flash, Gemini Omni, Antigravity 2.0, AI agents, Google DeepMind, AI benchmarks
Gemini 3.5 Flash Beats Pro at Google I/O 2026 | aiHola