Google's LEAP Solves All 12 Putnam 2025 Problems in Lean

Google's LEAP framework uses general-purpose LLMs, not specialized provers, to formally solve every Putnam 2025 problem.

Oliver Senti, Senior AI Editor
June 9, 2026 | 3 min read

Google researchers published a paper this month on LEAP, an agentic system that gets general-purpose language models to write formal, machine-checked proofs in Lean. The headline result: it formally solved all twelve problems from the 2025 William Lowell Putnam Mathematical Competition, the undergraduate contest where the 2025 median score was 2 out of 120.

Why this is harder than it sounds

Natural-language proofs are notoriously hard to verify automatically. They skip steps. They hide assumptions. A formal proof, written in a language like Lean and checked by a compiler, gives you a correctness guarantee instead, but writing one is brutal work. Until now the field has mostly been dominated by specialized models fine-tuned for Lean.
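For readers who have not seen one, here is roughly what a machine-checked statement looks like. This is a toy illustration in Lean 4 using only the core library, not anything taken from the LEAP solutions; the theorem name is made up for the example.

```lean
-- Toy example: addition on natural numbers is commutative.
-- `Nat.add_comm` is a lemma from Lean 4's core library.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

If the proof term did not match the statement exactly, the compiler would reject the file. That all-or-nothing check is where the correctness guarantee comes from, and a Putnam-level proof needs the same rigor across hundreds or thousands of lines.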

LEAP takes a different route. The technical paper describes an agentic framework that runs on general models (Gemini 3.1 Pro as the backend) rather than a Lean-specialized prover. It sketches a high-level blueprint, structures it as a dependency graph, then generates Lean code and fixes errors recursively using compiler feedback. Solutions are posted on GitHub.
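The loop described above can be sketched in a few lines of Python. This is a minimal illustration of the general pattern (order lemmas by dependency, generate code, retry with the compiler's error message), not LEAP's actual implementation, which the paper does not release. Every name here is hypothetical: `prove_blueprint`, `fake_generate`, and `fake_check` are stand-ins, with the last two playing the roles of an LLM call and the Lean compiler. A plain retry loop is also a simplification of the recursive repair the paper describes.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical interfaces, not taken from the LEAP paper.
Generate = Callable[[str, Optional[str]], str]  # (lemma, last error) -> Lean source
Check = Callable[[str], Tuple[bool, str]]       # Lean source -> (ok, error message)


def prove_blueprint(
    deps: Dict[str, Set[str]],
    generate: Generate,
    check: Check,
    max_attempts: int = 3,
) -> Dict[str, str]:
    """Prove lemmas in dependency order, feeding compiler errors back in."""
    proofs: Dict[str, str] = {}
    # static_order() yields each lemma only after everything it depends on.
    for lemma in TopologicalSorter(deps).static_order():
        error: Optional[str] = None
        for _ in range(max_attempts):
            source = generate(lemma, error)
            ok, error = check(source)
            if ok:
                proofs[lemma] = source
                break
        else:
            raise RuntimeError(f"could not prove {lemma}: {error}")
    return proofs


def fake_generate(lemma: str, error: Optional[str]) -> str:
    # Stand-in for an LLM: the first draft leaves a `sorry`,
    # and the draft written after an error fills it in.
    body = "sorry" if error is None else "trivial"
    return f"theorem {lemma} : True := by {body}"


def fake_check(source: str) -> Tuple[bool, str]:
    # Stand-in for the Lean compiler: reject unfinished proofs.
    if "sorry" in source:
        return False, "declaration uses 'sorry'"
    return True, ""


# Each key maps a lemma to the set of lemmas it depends on.
deps = {
    "main_theorem": {"lemma_a", "lemma_b"},
    "lemma_b": {"lemma_a"},
    "lemma_a": set(),
}
proofs = prove_blueprint(deps, fake_generate, fake_check)
print(list(proofs))
```

The design point worth noticing is that the compiler, not the model, decides when a lemma is done. The model can be wrong as often as it likes, as long as the feedback loop eventually converges on something that type-checks.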

The numbers, and what they actually compare against

Here the source framing needs a correction. LEAP did not stand alone on Putnam. The paper itself says the 12-out-of-12 result matches two other systems, the closed-source Axiom and Numina, both of which also cleared all twelve. So a perfect Putnam score, while real, is no longer the rare feat it was a year ago.

The sharper comparison is Lean-IMO-Bench, a new set the authors built from 60 IMO-style problems formalized into Lean, picked for non-routine, structurally messy proofs. On that benchmark LEAP hit a 70% solve rate. General LLMs on their own land under 10%. Specialized prover models manage around 5%. Aristotle, the Harmonic system that reached gold-medal level at IMO 2025, scored 48% here. That gap is the part of the story worth paying attention to, more than the Putnam clean sweep.

One concrete flex: LEAP formally verified a key subproblem in Knuth's work on Hamiltonian decomposition of even-order Cayley graphs, generating more than 5,000 lines of Lean 4 to do it. It also formalized a proof for Erdős Problem 457.

About that irony

The authors take a swing at closed competitors. They call Axiom and Numina scientifically unverifiable because both stayed closed-source with no public access. Fair enough.

But the LEAP framework code itself does not appear to be open either. The paper releases the generated Lean proofs, which is something, since anyone with the Lean compiler can recheck those. Full reproducibility of the system that produced them is a different matter, and that part stays inside Google for now.

The paper went up on arXiv on June 2, 2026. The Lean-IMO-Bench resources are listed as available, so independent benchmarking against the 48% Aristotle figure is the obvious next test.

Tags: LEAP, Google, Lean, theorem proving, Putnam 2025, Gemini, formal mathematics, AI math
Oliver Senti

Senior AI Editor

Former software engineer turned tech writer, Oliver has spent the last five years tracking the AI landscape. He brings a practitioner's eye to the hype cycles and genuine innovations defining the field, helping readers separate signal from noise.
