Google DeepMind has published a system that solved nine of 353 open Erdős problems and proved 44 of 492 open conjectures from the Online Encyclopedia of Integer Sequences, none of it checked by a human doing the math by hand. The team described AlphaProof Nexus in a preprint on May 21, pairing large language models with the Lean proof assistant so that every logical step gets verified by a compiler instead of a referee.
The boring version did almost as well
Here is the result that should make people pause. DeepMind built a fancy agent, layering an evolutionary search and its reinforcement-learning prover AlphaProof on top of a plain loop, and used it to crack all nine Erdős problems. Then they checked whether the plain loop, the one that just alternates between an LLM writing Lean and the compiler complaining, could match it. It solved all nine too.
The takeaway is awkward for anyone selling complexity. A plain generate-and-check cycle running on Gemini 3.1 Pro now reaches research-level math on its own, and the elaborate version mostly just kept costs down on the worst cases.
"An ongoing shift from specialized trained systems toward simple agentic loops," the authors call it, which is the kind of claim that ages badly if the next hard problem turns out to need the fancy machinery after all.
What the compiler refuses to swallow
Verification is the actual point. In plain-text math, a model can invent a lemma, gesture at a "known result" that does not exist, and hide the hard part of a problem inside a helper claim nobody inspects. Its own failure analysis admits the agent did exactly this, repeatedly, on problems it could not solve. It dumped the difficulty into a single unproven step and dressed up hallucinated lemmas as established literature.
Lean does not care. A proof with a gap fails to compile, so the fakery never reaches the results column. That is why the nine solves carry more weight than the usual benchmark figure: a machine, not a tired reviewer, signed off. The system even caught errors in how existing conjectures had been written down, flagging at least two cases where the informal statement was ambiguous about what "density" meant.
Nine out of 353 is the part to remember
Strip away the framing and the hit rate is roughly 2.5 percent on Erdős problems and about 9 percent on the OEIS set. These are problems someone already bothered to formalize in Lean, which skews toward combinatorics, number theory and optimization, where the library is mature. Anything demanding a large new theory is still out of reach, and DeepMind admits as much.
A few wins are genuinely meaty. It closed a roughly 15-year-old question on Hilbert functions in algebraic geometry and improved a bound in convex optimization by finding a parameter schedule that, as far as the authors can tell, no human had written down. Two of the Erdős problems had been open for 56 years.
All the Lean proofs sit in a public GitHub repo, logged against Bloom's Erdős catalog. Papers on the graph theory and additive combinatorics results are still in preparation, so the cleaner scorecard to watch next is whether those clear peer review.




