Google DeepMind dropped a paper on January 12th claiming a new theorem in algebraic geometry, proved with help from an internal math-specialized version of Gemini. This comes after weeks of discussion about GPT-5.2 Pro solving Erdős problems, so naturally everyone's watching.
The theorem itself involves something called "motivic classes of spaces of genus 0 maps to flag varieties." I'm not going to pretend I understand what that means. Neither should you, unless you're an algebraic geometer. What's actually interesting is buried in the paper's appendix: how the process worked.
The president of the American Mathematical Society weighed in
Ravi Vakil, Stanford mathematician and current AMS president, co-authored the paper. He called Gemini's contribution "the kind of insight I would have been proud to produce myself." Strong words from someone who literally wrote the textbook on algebraic geometry.
But here's the nuance that matters: Vakil acknowledged he "might have eventually reached this conclusion" on his own, but couldn't say with certainty. That's a careful hedge. The AI found something a human expert might have found. Or might not have.
Vakil's main takeaway was that "meaningful mathematical progress emerged from this genuine synergy between human ingenuity and Gemini's contributions." Synergy. Not autonomy. Keep that word in mind.
How it actually worked
The paper describes the methodology, and it's far from "hand the model a theorem, receive a proof." According to the details shared in the Russian-language source material and the paper itself:
The team used decomposition, breaking the complex theorem into simpler subproblems. They started by feeding Gemini simple special cases to verify it understood the definitions. The prompts literally included lines like "to make sure you understand me, tell me which [objects] you would choose."
When the model got stuck, researchers didn't just regenerate. They analyzed partially correct outputs, found the useful kernel, and wrote new prompts: "try using this strategy you found in the last step, but for the general case." That's not the model reasoning. That's humans doing the reasoning about model outputs.
Successful proofs of simple subproblems went into context for harder ones. The team essentially built a ladder of problems from easy to hard so the model could climb it. Someone had to design that ladder.
The researchers read model outputs at nearly every step. In one case, they noticed the model made a non-obvious observation in a partial solution. A human verified it and told the model: "this is a good idea, use it for everything else."
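The workflow those paragraphs describe can be sketched as a simple loop. This is a minimal illustration, not DeepMind's actual tooling: every name here (`ask_model`, `human_review`, `climb_ladder`) is an assumption, and the model call is a stub standing in for a real API.

```python
# Hypothetical sketch of the "problem ladder" workflow described above.
# The model call and the expert-verification step are stubbed out; in the
# real process both involved a frontier model and human mathematicians.

def ask_model(prompt, context):
    """Stub for a call to a reasoning model; returns a candidate proof."""
    return "candidate proof for: " + prompt

def human_review(output):
    """Stub for expert verification.

    Returns (is_correct, salvageable_idea): when the attempt fails,
    the humans extract the useful kernel rather than discarding it.
    """
    return True, None

def climb_ladder(subproblems):
    """Work through subproblems from easy to hard, feeding each
    verified proof back into context for the next rung."""
    context = []  # verified proofs of earlier subproblems accumulate here
    for problem in subproblems:
        attempt = ask_model(problem, context)
        ok, idea = human_review(attempt)
        while not ok:
            # Humans re-prompt with the salvaged strategy instead of
            # blindly regenerating from scratch.
            hint = "Try using this strategy: %s. Now handle: %s" % (idea, problem)
            attempt = ask_model(hint, context)
            ok, idea = human_review(attempt)
        context.append(attempt)  # a successful proof becomes context
    return context

proofs = climb_ladder(["simple special case", "general case"])
print(len(proofs))
```

The point of the sketch is where the intelligence sits: the decomposition into `subproblems`, the verification in `human_review`, and the salvaging of partial ideas are all human steps; the model only fills in the `ask_model` calls.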
"Entirely human-authored"
The paper includes an important disclaimer: "the treatment in this paper is entirely human-authored (aside from excerpts in an appendix which are clearly marked as such)."
So we have an AI-assisted proof where the final writeup is human-authored, the problem decomposition is human-designed, the verification is human-performed, and the strategic direction is human-guided. What exactly did Gemini contribute?
The AI proved useful for "routine tasks like identifying connections to cross-disciplinary papers and writing data-generation code." But Vakil said "the most striking experience was how it propelled the project forward intellectually." Those are different claims.
The GPT-5.2 comparison is instructive
This lands in the middle of ongoing Erdős problem discourse. GPT-5.2 Pro recently solved Erdős Problem #397, with Fields Medalist Terence Tao accepting the proof after it was formalized in Lean. GPT-5.2 has now cracked problems #728, #729, and #397.
But Tao emphasized these are "lowest-hanging fruit" problems solvable with standard techniques, not profound breakthroughs. And even the #728 result came with three caveats: the original problem statement was ambiguous, similar results already existed in the literature, and the "autonomous" solve involved multiple models with human coordination at every stage.
The DeepMind paper is more honest about the collaboration. It doesn't claim autonomy.
What this actually demonstrates
The work came from DeepMind's Blueshift team, with researchers Freddie Manners and G. Salafatinos working alongside academic collaborators including Jim Bryan and Balázs Elek. DeepMind has been pushing AI for math since their AlphaGeometry and AlphaProof systems achieved silver-medal standard at the International Mathematical Olympiad in 2024. Gemini Deep Think hit gold-medal level at the 2025 IMO, solving five of six problems.
But competition math and research math are different beasts. Competition problems have known solutions. Research means finding something new. This paper claims to be research. That's the interesting part.
The methodology section mentions a system called "FullProof" that was used during the work, but doesn't explain how it operates. It's likely something akin to the extended-reasoning approaches behind OpenAI's Pro models or DeepMind's own Deep Think, though that's speculation.
The uncomfortable middle ground
Neither "AI can now do math research autonomously" nor "this is just a fancy autocomplete" captures what's happening here.
A more accurate description: AI models can now participate meaningfully in mathematical research when embedded in a carefully human-designed process, with expert oversight at each step and strategic prompting that essentially teaches the model the problem's structure. Whether that's impressive or underwhelming depends on your priors.
Vakil described "meaningful mathematical progress" emerging from "genuine synergy." The emphasis on synergy rather than replacement seems deliberate. Human mathematicians still do the hard conceptual work. The AI accelerates some parts.
The paper doesn't specify what "FullProof" does or how much compute was involved. I reached out to DeepMind for details. No response yet.
Watch for the follow-up papers. DeepMind launched an "AI for Math Initiative" partnering with institutions like Imperial College London to explore AI-assisted mathematical research. More results are coming, and we'll eventually get a clearer picture of what these tools can and can't do.