AI Models Crack High-Level Math in Surprising Breakthrough

AI models are beginning to solve mathematical problems that once seemed far beyond the reach of machines, and the pace of progress is accelerating faster than many experts expected. Over a recent weekend, Neel Somani, a software engineer, former quantitative researcher, and startup founder, decided to push the limits of modern AI reasoning. He gave a difficult, high-level math problem to OpenAI’s latest model through ChatGPT and let it work uninterrupted for fifteen minutes. When he returned, the system had produced a complete solution.

What made the result striking was not just the final answer but its validity. Somani carefully checked the reasoning, then formalized the proof using Harmonic, a tool designed to verify mathematical logic. Every step held up under scrutiny. The experiment challenged long-standing assumptions about where large language models break down when confronted with advanced mathematics.

Somani explained that his goal was to establish a realistic benchmark for AI problem solving in pure math. He wanted to see when large models genuinely understand structure and logic rather than pattern-matching from training data. With the newest generation of models, he observed that the boundary had shifted. Problems that once caused systems to stall were now being handled end to end, sometimes with surprising elegance.

The internal reasoning produced by the model was equally eye-catching. It referenced classical results such as Legendre’s formula and Bertrand’s postulate while weaving them into a coherent argument. During its reasoning process, the model also identified a MathOverflow discussion from 2013, where Harvard mathematician Noam Elkies had outlined an elegant solution to a related problem. Instead of copying that work, however, the AI produced a distinct proof that addressed a broader version of the question.
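For readers unfamiliar with these results: Legendre’s formula gives the exponent of a prime p in n! as the sum of floor(n / p^k) over k ≥ 1. The sketch below is purely illustrative (it is not drawn from the model’s actual proof) and shows the formula in a few lines of Python:

```python
def legendre(n: int, p: int) -> int:
    """Exponent of the prime p in n!, via Legendre's formula:
    v_p(n!) = sum over k >= 1 of floor(n / p^k)."""
    exponent, power = 0, p
    while power <= n:
        exponent += n // power  # multiples of p^k contribute one factor each
        power *= p
    return exponent

# Example: 10! = 3628800 = 2^8 * 3^4 * 5^2 * 7
print(legendre(10, 2))  # 8
print(legendre(10, 5))  # 2
```

Results like this are elementary individually, but chaining them into a correct end-to-end argument is exactly where earlier models tended to break down.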

That broader version traces back to the work of Paul Erdős, one of the most prolific mathematicians in history. Erdős left behind more than a thousand conjectures that remain a central challenge for the mathematical community. These problems vary widely in difficulty and subject matter, making them an ideal testing ground for emerging AI systems.

For skeptics of machine intelligence, this kind of result is difficult to dismiss. It is also part of a wider pattern. AI tools are now deeply embedded in modern mathematical research. Some models focus on formal proofs, while others specialize in literature review and exploration of prior work. Since the release of GPT 5.2, which Somani describes as noticeably stronger in mathematical reasoning than earlier versions, the number of credible AI-assisted solutions has grown rapidly.

Somani focused much of his exploration on the Erdős problem set, which is maintained online and carefully tracked by the mathematical community. The first wave of autonomous progress appeared late last year, driven by a Gemini-powered system called AlphaEvolve. More recently, however, researchers have found that GPT 5.2 performs exceptionally well on a range of advanced problems that previously resisted automation.

The shift is visible in the public record. Since Christmas, fifteen Erdős problems have moved from open to solved status. Eleven of those solutions explicitly credit AI models as part of the discovery or verification process. While humans remain involved, the role of machines is no longer peripheral.

A more cautious assessment comes from Fields Medalist Terence Tao. On his GitHub page, Tao tracks cases where AI systems have made meaningful contributions to Erdős problems. He counts eight examples where models generated substantial autonomous progress and six additional cases where AI accelerated discovery by locating and extending earlier research. The results, he notes, do not yet point to fully independent machine mathematicians, but they do reveal a clear shift in capability.

Tao has also reflected publicly on why AI may be especially effective in this area. Writing on Mastodon, he suggested that the scalable nature of large models makes them well suited to tackling the long tail of obscure Erdős problems. Many of these conjectures, while intimidating, turn out to have relatively straightforward solutions once the right approach is found. According to Tao, such problems may now be more likely to fall to purely AI-driven methods than to traditional human or hybrid efforts.

Another factor driving progress is a renewed focus on formalization. Formal proofs translate intuitive mathematical arguments into precise logical steps that can be checked and extended by machines. The process has always been time-consuming, but recent tools have reduced that burden significantly. The open-source proof assistant Lean, developed at Microsoft Research in 2013, has become a standard platform for formal reasoning in mathematics.
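To give a sense of what a formal proof looks like, here is a toy Lean 4 theorem (illustrative only, and unrelated to any Erdős problem): every step must reduce to statements the proof checker can verify mechanically.

```lean
-- A trivial Lean 4 theorem: commutativity of natural-number addition,
-- discharged by appealing to the standard library lemma Nat.add_comm.
theorem my_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Real formalizations of research-level results run to thousands of such steps, which is why automating the translation from informal argument to checked proof matters so much.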

AI systems are now layered on top of these tools to automate large portions of the formalization workflow. Harmonic’s Aristotle is one example, designed to help researchers turn informal insights into rigorously verified proofs. This shift makes it easier to validate results and build upon them without redoing foundational work from scratch.

For Harmonic founder Tudor Achim, the growing count of solved Erdős problems is less important than a deeper cultural change. He points to the increasing willingness of respected mathematicians and computer scientists to openly use AI tools in their work. In his view, that adoption speaks louder than any headline number.

Achim argues that academics have strong incentives to protect their reputations. When professors and researchers publicly acknowledge using tools like Aristotle or ChatGPT, it signals real confidence in the technology. That confidence, he believes, marks a turning point in how mathematical research is conducted.

Taken together, these developments suggest that AI is no longer just an assistant for routine calculations or searches. It is becoming a genuine partner in exploring unsolved questions. While human insight remains essential, large language models are now proving capable of advancing the frontier, one conjecture at a time.