HeadlinesBriefing.com

AI Math Breakthrough: Models Solve 6-8 of 10 Research Lemmas

Hacker News

Mathematician Daniel Litt reports that AI models have exceeded his expectations for mathematical research. First Proof, a new benchmark project, found that existing models solved between 6 and 8 of 10 challenging lemmas drawn from unpublished mathematical work, far more than the 2-3 Litt had predicted.

Litt, who has been critical of AI hype in mathematics, admits he had been underrating how quickly models were improving. He previously bet against AI producing research papers comparable to those of top human mathematicians by 2030, but now expects to lose that wager. On the FrontierMath benchmark, models reach 40% accuracy on Tier 1-3 problems and 32% on the harder Tier 4 problems, a sign of steady progress.

Solving lemmas is only a small part of mathematical research, but the results suggest AI tools are becoming genuinely useful in mathematicians' daily work. Litt notes that ChatGPT 5.2 Pro can now produce "involved but routine" proofs for experts, though errors remain common. The rapid advancement has forced him to reconsider his timeline for when AI might match the best human mathematicians, with 2040 rather than 2030 suggested as the likely threshold.