AI Solved an Erdős Problem. Sort Of.

On January 8th, Terence Tao posted to Mathstodon that an Erdős problem had been solved "more or less autonomously by AI." The problem in question is #728, a number theory question about factorial divisibility that had been open for decades. GPT-5.2 generated the proof, Harmonic's Aristotle system formalized it in Lean, and the math community verified it within hours.

The viral version of this story, currently making the rounds on X, goes something like: "For the first time, an LLM has successfully solved an Erdős problem on its own." That's not wrong, exactly. But it's not right either.

What's an Erdős problem?

Paul Erdős was a Hungarian mathematician who died in 1996 after publishing more papers than any other mathematician in history. He had a habit of posing problems, hundreds of them, often with small cash prizes attached. A $25 problem might take you a weekend. A $1000 problem might take a decade. The monetary value was mostly symbolic, a way of ranking difficulty.

Thomas Bloom's erdosproblems.com now catalogues over 1100 of these conjectures. About 40% have been solved. The rest range from "probably tractable with known techniques" to "might require entirely new mathematics." They span number theory, combinatorics, graph theory, and related fields.

Solving an Erdős problem used to mean something. It still does, though the meaning is getting complicated.

The three caveats nobody reads

Cambridge mathematics student AcerFur, who announced the result on January 6th, was careful to include three caveats. Most people sharing the news skipped them.

First: the original problem statement was ambiguous. Erdős's formulation admitted trivial solutions that clearly weren't what he intended. The erdosproblems.com community had to reconstruct what Erdős probably meant, adding constraints like requiring a, b ≤ (1-ε)n. So GPT-5.2 solved a community interpretation of an Erdős problem, which is not quite the same thing as solving the problem Erdős wrote down.

Second: GPT-5.2 needed feedback. The initial attempt didn't work. After some back-and-forth, the model refined its approach. The "more or less" in Tao's "more or less autonomously" is doing real work.

Third: similar results using similar methods already existed in the literature. A 2015 paper by Pomerance covered related ground. The AI's contribution was applying known techniques to this specific formulation, not inventing new mathematics.

We've been here before

If this sounds familiar, it should. In October 2025, OpenAI researchers claimed GPT-5 had solved ten Erdős problems. Kevin Weil, OpenAI's VP, posted that these had "all been open for decades." The math community pushed back immediately. Thomas Bloom, who runs erdosproblems.com, called it "a dramatic misrepresentation." The problems weren't unsolved. GPT-5 had just found existing literature that Bloom hadn't seen.

DeepMind CEO Demis Hassabis called the episode "embarrassing." Yann LeCun was less diplomatic. The original tweets got deleted.

So when Tao writes about problem #728, he's careful. The result is "not replicated in existing literature (although similar results proven by similar methods were located)." He notes this is consistent with "other recent demonstrations of AI using existing methods to resolve Erdos problems."

The distinction matters: AI is now good enough to apply standard techniques to problems humans haven't gotten around to. That's genuinely useful. It's also different from creating new mathematics.

What actually happened with #728

The problem asks whether there exist infinitely many integers a, b, n satisfying certain factorial divisibility conditions with a + b in the n + O(log n) regime. The original formulation on erdosproblems.com was, as Tao puts it, "misformulated."
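
For readers who want to poke at this kind of condition numerically, here is a minimal Python sketch. It assumes the divisibility condition has the form a! b! | n! (a+b-n)!, which this article does not spell out, so treat that form as an assumption and check the entry on erdosproblems.com before relying on it. The function names are made up for illustration. The technique is Legendre's formula, which gives the exponent of a prime in n! without ever computing the factorial itself.

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p <= limit."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [p for p, flag in enumerate(is_prime) if flag]


def legendre(n, p):
    """Exponent of the prime p in n!, via Legendre's formula."""
    total = 0
    while n:
        n //= p
        total += n
    return total


def factorials_divide(lower, upper):
    """True if the product of k! over `lower` divides the product of k! over `upper`."""
    top = max(lower + upper)
    # Compare prime exponents on both sides; primes above `top` contribute nothing.
    return all(
        sum(legendre(k, p) for k in lower) <= sum(legendre(k, p) for k in upper)
        for p in primes_up_to(top)
    )


def satisfies_assumed_condition(a, b, n):
    """ASSUMED form of the condition: a! * b! divides n! * (a + b - n)!."""
    if a + b < n:
        return False  # (a + b - n)! is undefined for negative arguments
    return factorials_divide([a, b], [n, a + b - n])


print(satisfies_assumed_condition(2, 3, 5))  # 2! * 3! = 12 against 5! * 0! = 120
print(satisfies_assumed_condition(3, 3, 4))  # 3! * 3! = 36 against 4! * 2! = 48
```

Under that assumed form, taking a = n makes both sides equal to n! b!, so the condition holds for every b. That is the kind of degenerate solution a bound like a, b ≤ (1-ε)n rules out.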

A user named Kevin Barreto prompted GPT-5.2 Pro and got a proof. The forum discussion on erdosproblems.com shows the back-and-forth: initial attempts, refinements, concerns about whether similar results existed. Boris Alexeev then ran the proof through Aristotle for formal verification.

The Lean formalization passed. The math is correct.

One commenter on the forum thread captured the ambivalence: "In the currently uncertain case... this turns out to be the first case of an LLM entirely solving an open Erdos problem on its own that no human previously solved, for real this time." But then: "I want to voice the following concern: This was a primarily scientific experiment to see how far the models could be pushed."

The long tail theory

Tao has a theory about what's happening. Unsolved mathematical problems follow a long-tail distribution. At the very end of that tail are problems that are "amenable to simple proofs using fairly standard techniques" but haven't been tackled because not enough expert mathematicians have looked at them.

AI is now good enough to harvest that tail. Several problems on erdosproblems.com have been "solved" by AI recently, only for someone to discover the solution already existed in a 1977 paper or a 1981 paper. The AI didn't solve them so much as rediscover them, whether through training data contamination or by independently deriving the same straightforward argument.

Problem #728 might be different. The solution apparently isn't in the literature, at least not for this exact formulation. But it uses techniques from a 2015 paper. So is this AI doing novel mathematics, or AI doing literature review plus interpolation?

What this means

For practical purposes: AI tools are becoming genuinely useful for mathematical research. Not for cracking the Riemann hypothesis. For the grunt work. Finding relevant papers. Testing constructions. Formalizing proofs humans already understand informally.

OpenAI's GPT-5.2 announcement in December claims the model helped "resolve an open research problem in statistical learning theory." They're careful to note that "responsibility for correctness, interpretation, and context remains with human researchers."

That's roughly where we are. AI can contribute to mathematics the way a very diligent graduate student can. It can check things, find things, try things. Sometimes those things turn out to be novel. Often they turn out to already exist.

The erdosproblems.com community has started tracking AI contributions systematically in a table. So far it shows a mix: some full solutions, some partial progress, and many cases where the AI "solution" turned out to match existing literature.

One participant on the forum put it this way: "I support AI assistance, not full-on AI replacement." The concern isn't that AI will take over mathematics. It's that the narrative around these results runs ahead of the reality, and the hype creates problems down the line.

What happens next

The Formal Conjectures project at DeepMind has formalized statements for over 240 Erdős problems. About 40% of the ~1100 problems on erdosproblems.com have been solved, many recently. The infrastructure for AI-assisted mathematics is getting better.

Expect more announcements like this one. Some will hold up to scrutiny. Some won't. The pattern from October suggests checking the details before retweeting.

Tao's Mathstodon posts remain the best source for understanding what's actually happening. He's consistently careful to separate three things: AI finding literature, AI generating proofs, and AI creating new mathematics. The distinctions matter, even when the tweets don't include them.