An OpenAI Model Just Broke an 80-Year-Old Erdős Conjecture on Its Own — This Is What 'AI Doing Math' Looks Like
On May 20, OpenAI said one of its internal general-purpose reasoning models autonomously disproved the unit distance conjecture, a central problem in discrete geometry Paul Erdős posed in 1946. The 125-page proof leaned on deep algebraic number theory (Golod-Shafarevich theory, infinite class field towers). Fields medalist Tim Gowers called it 'a milestone in AI mathematics'; Princeton's Will Sawin pinned the gain at n^(1+δ), δ≥0.014.

Here's the deal: an AI took a problem statement and broke an 80-year-old conjecture by itself
On May 20, OpenAI announced that one of its internal general-purpose reasoning models autonomously disproved the unit distance conjecture — a central problem in discrete geometry that Paul Erdős posed in 1946. This isn't another benchmark score. The model knocked over an open problem at the heart of a math subfield, without a human walking it through the steps. That's why people are calling it a first.
The problem is easy to state. Place n points in a plane. What's the maximum number of pairs that are exactly distance 1 apart? For nearly 80 years, the field's intuition was that square-grid arrangements are essentially optimal: lay the points out in a suitably scaled regular lattice and you get about as many unit-distance pairs as any arrangement can have. The model built an infinite family of configurations that beats the grid, refuting the bound everyone assumed was right.
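To make the problem statement concrete, here is a minimal Python sketch that counts unit-distance pairs by brute force. The function name `count_unit_pairs` and the tolerance value are illustrative choices, not anything from OpenAI's proof; the three point sets are just small hand-picked examples.

```python
from itertools import combinations
from math import dist, isclose, sqrt


def count_unit_pairs(points, tol=1e-9):
    """Count unordered pairs of points at Euclidean distance 1 (within tol)."""
    return sum(
        1
        for p, q in combinations(points, 2)
        if isclose(dist(p, q), 1.0, abs_tol=tol)
    )


h = sqrt(3) / 2  # height of an equilateral triangle with side 1

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
triangle = [(0, 0), (1, 0), (0.5, h)]
rhombus = [(0, 0), (1, 0), (0.5, h), (1.5, h)]  # two unit triangles glued together

print(count_unit_pairs(square))    # 4: the sides, not the diagonals
print(count_unit_pairs(triangle))  # 3
print(count_unit_pairs(rhombus))   # 5
```

The brute-force check is quadratic in the number of points, which is fine for toy inputs. The hard part of the problem is not counting pairs in a given arrangement but finding arrangements that maximize the count as n grows.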
The shocking part isn't the result — it's the method. The model didn't brute-force its way by nudging grids around. It connected the Golod-Shafarevich criterion (proved in 1964) and infinite class field towers — deep machinery from algebraic number theory — to an elementary geometry problem. Combinatorial geometers hadn't even thought to reach for those tools. Out came a 125-page proof, checked by outside mathematicians.
The reaction has been strong. Fields medalist Tim Gowers called it "a milestone in AI mathematics." Princeton's Noga Alon called it "an outstanding achievement." Princeton's Will Sawin quantified the improvement and published a companion paper the same day. A follow-up paper on the disproof (arXiv 2605.20695) is already circulating, and the math community is actively debating it.
The players — Erdős, Gowers, Sawin, and a 'general-purpose' reasoning model
Paul Erdős (1913–1996). The most prolific mathematician of the 20th century — over 1,500 papers, and the namesake of the "Erdős number." The unit distance problem he posed in 1946 is one of the most famous open problems in combinatorial geometry, a hard-core puzzle whose upper and lower bounds barely budged for 80 years. Erdős himself famously attached a prize to it.
Tim Gowers. Fields medalist (1998), a giant in combinatorics and functional analysis, and a long-time public voice on AI's role in math. When he calls this a "milestone," it's not a courtesy — it reads as a judgment that the model crossed the line from "solving competition problems" to "research-grade discovery."
Will Sawin. Princeton mathematician who took the model's disproof and, in a companion paper, pinned the improvement at n^(1+δ) unit-distance pairs with δ≥0.014. That matters because it converts a qualitative claim ("better than the grid") into a hard mathematical statement about how much better. In other words, a real human-AI collaboration loop ran: the model produced the construction, and a human turned it into theory.
OpenAI's 'general-purpose' reasoning model. The key point: this wasn't a math-only fine-tuned system. Per OpenAI, the model (1) wasn't trained for this problem, (2) didn't search for existing solutions, and (3) didn't get step-by-step human guidance. It took the problem statement and produced a 125-page proof on its own. Unlike a theorem-proving specialist like AlphaProof, this was a general model — that's the differentiator.
What it actually broke, and how
The structure. The unit distance problem asks for u(n), the maximum number of distance-1 pairs among n points in the plane. A suitably scaled square grid yields about n^(1+c/loglog n) pairs for some constant c > 0, and Erdős conjectured that this is essentially the truth: the upper bound should also look roughly like n^(1+c/loglog n). That is the (now-broken) belief that the square grid is nearly optimal. For decades nothing beat it.
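The grid benchmark can be explored numerically. The sketch below counts pairs at a given squared distance d inside an m × m integer grid; rescaling the grid by 1/√d turns those pairs into unit-distance pairs. This is the standard textbook grid construction, not the model's new one, and the function names are hypothetical.

```python
def grid_pairs_at_sq_distance(m, d):
    """Count unordered pairs in the grid {0..m-1}^2 whose squared distance is d (d >= 1)."""
    count = 0
    for dx in range(-(m - 1), m):
        for dy in range(-(m - 1), m):
            if dx * dx + dy * dy == d:
                # Number of start points for which both endpoints stay inside the grid.
                count += (m - abs(dx)) * (m - abs(dy))
    # Each unordered pair was counted once for (dx, dy) and once for (-dx, -dy).
    return count // 2


def best_sq_distance(m):
    """Squared distance d >= 1 with the most pairs in the m x m grid (requires m >= 2)."""
    max_d = 2 * (m - 1) ** 2
    return max(range(1, max_d + 1), key=lambda d: grid_pairs_at_sq_distance(m, d))


print(grid_pairs_at_sq_distance(10, 1))   # 180: plain unit spacing
print(grid_pairs_at_sq_distance(10, 5))   # 288: rescale so sqrt(5) becomes the unit
print(grid_pairs_at_sq_distance(10, 25))  # 268
```

On a 10 × 10 grid the gain from rescaling is modest. The n^(1+c/loglog n) behavior only shows up as the grid grows and d is chosen to have many representations as a sum of two squares.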
What the model did. It found an infinite point family that produces asymptotically more unit-distance pairs than the grid — proving that for large enough n, a configuration exists that beats the lattice. Sawin's δ≥0.014 means this edge isn't a rounding error; it's a genuine polynomial-scale improvement.
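Why a fixed δ counts as a polynomial-scale gain can be written out in one line. The sketch below takes the two bounds as the article states them, with c an unspecified positive constant; the n^{4/3} upper bound at the end is standard background (Spencer, Szemerédi, Trotter) and does not come from the article.

```latex
\begin{align*}
  % Grid benchmark, for some constant c > 0:
  u(n) &\ge n^{\,1 + c/\log\log n} \\
  % Bound attributed to Sawin's companion paper, for the n covered by the construction:
  u(n) &\ge n^{\,1+\delta}, \qquad \delta \ge 0.014 \\
  % Ratio of the new count to the grid count:
  \frac{n^{\,1+\delta}}{n^{\,1 + c/\log\log n}}
       &= n^{\,\delta - c/\log\log n} \;\longrightarrow\; \infty
       \quad \text{as } n \to \infty \\
  % Consistency check against the known upper bound:
  u(n) &= O\!\left(n^{4/3}\right), \qquad 1 + 0.014 < \tfrac{4}{3}
\end{align*}
```

Because c/log log n tends to 0, the exponent δ − c/log log n eventually exceeds δ/2, so the new configurations beat the grid by at least a factor of n^{δ/2} for large n. How large n has to be depends on the constant c, which the article does not give.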
Why algebraic number theory? This is the jaw-dropper. The unit distance problem is geometric — it's about distances in the plane. The model translated it into number-theoretic structure. The Golod-Shafarevich criterion was built to tackle abstract questions like "does an infinite class field tower exist?" The model used it to extract point arrangements, sitting on specific algebraic-integer structures, where unit-distance pairs explode. That bridge between combinatorial geometry and algebraic number theory is so counterintuitive that even human researchers only saw the connection in hindsight.
Verification. The 125-page proof was reviewed by outside mathematicians, and a related paper (2605.20695) is on arXiv. But not everyone is cheering. Skeptics raise (1) the scope and reproducibility of the verification, (2) the precise definition of "autonomous" (where does the model end and human input begin?), and (3) whether the result was marketed too aggressively. That's healthy skepticism.
| Item | Old consensus | This result |
|---|---|---|
| Optimal arrangement | Square grid | Infinite family that beats the grid |
| Improvement | — | n^(1+δ), δ≥0.014 (Will Sawin) |
| Tools used | Combinatorial geometry | Golod-Shafarevich theory, class field towers |
| Proof length | — | 125 pages |
| Author | Human mathematicians | OpenAI general reasoning model (autonomous) + human verification |
Who gains what
OpenAI. First, a narrative shift. "AI solves math olympiad problems" is impressive but those problems already have answers. Breaking an open research problem is a qualitatively different asset. Second, proof of capability — it signals to enterprise and research markets that GPT-5-class reasoning models can produce genuinely new knowledge. Third, credibility — public endorsements from top authorities (Gowers, Alon) are reputation money can't buy.
Mathematics. It strengthens the "AI as collaborator" view. Just as Sawin took the model's output and built theory on it, expect the "model proposes candidate constructions, humans verify and write them up" workflow to spread. Faster attacks on hard problems should raise the field's overall productivity.
The 'AI for Science' camp. This becomes a powerful reference for the claim that AI can make real discoveries in drug design, materials, and physics. Paired with Jack Clark predicting a "Nobel-level discovery within 12 months" the same week, it builds a sense that AI for Science has moved from slogan to track record.
Even the skeptics gain. Paradoxically, this is a good case for the cautious crowd too. Debating the definition of "autonomous," reproducibility, and verification scope will sharpen how we evaluate AI discoveries. That pressure is what makes future announcements more transparent.
Precedents — wins and failures
Win: DeepMind AlphaProof / AlphaGeometry (2024). Google DeepMind unveiled theorem-proving systems that scored at silver-medal level at the IMO in 2024. But those were specialists solving competition problems with known answers. OpenAI's case ranks a notch higher: a general model breaking an open research problem.
Win: computer proofs of the Four Color Theorem and Kepler conjecture. The 1976 proof of the Four Color Theorem and the Flyspeck formal verification of the Kepler conjecture (completed in 2014) both relied on machines checking vast case sets. But there, machines executed human-designed procedures. Here, the model chose its own tools (algebraic number theory), which is a decisive difference.
Disputed / overstated AI math claims. History is littered with "AI solved math" headlines that turned out inflated — the model only worked inside a human-built frame, or the result didn't reproduce. The current caution around "autonomous" comes from that learned wariness. Which is exactly why outside verification and Sawin's companion paper matter so much.
How rivals counter
Google DeepMind. The most direct competitor. Expect it to merge AlphaProof / AlphaGeometry / Gemini into a "general model attacks open problems" push. Right after I/O 2026, a "Gemini cracked problem X" rebuttal wouldn't be surprising. DeepMind has the cred: the work behind AlphaFold earned its creators a share of the 2024 Nobel Prize in Chemistry.
Anthropic. Could push Claude's reasoning toward math and science discovery. But Anthropic leans hard on a "safety and trust" position, so it might differentiate via "verifiable AI math" rather than discovery bragging. Jack Clark's Oxford lecture the same week sets that table.
Meta FAIR and Chinese labs. Meta via open-source math models; DeepSeek and others via their own reasoning models (the R-series) could announce "we cracked problems too." But getting public verification from top authorities is the real gate — score-bragging alone won't match this impact.
The academy itself. Some mathematicians may reinterpret the result as "doable without AI" or "the key idea was human-supplied." That's less competition than verification — and it'll only make the bar for "AI discovery" stricter.
So what actually changes — by persona
Math and theory researchers. A signal that workflows are shifting: throw candidate constructions at the model, keep verification, theory, and write-up for yourself. Recommendation: pick open problems in your field where "construction / counterexample search" is the crux, and try handing them to a reasoning model as an experiment.
AI engineers and researchers. The lesson is that a general reasoning model chose its own tools autonomously. The pattern of attacking domain problems without fine-tuning is direct inspiration for agent design (letting the model decide which tool to use when).
Investors and enterprises. This may be the inflection where "AI for Science" crosses from slogan to results. Next targets: fields with huge search spaces and clear verification — drug discovery, materials, chip design. Just always check for outside verification before buying the "autonomous" claim.
General readers. No direct impact, but "AI finds connections humans missed" just moved from abstract theory to demonstration. At the same time, the "what counts as autonomous" debate is a reminder to read AI-discovery news with a critical eye.
Regulators and policy. "AI produces new scientific knowledge" raises fresh governance questions — research integrity, authorship, reproducibility standards. AI's status in author lists and reproducibility requirements for discoveries will become real academic-policy fights.
References
- OpenAI — Model disproves a discrete geometry conjecture
- arXiv — Remarks on the disproof of the unit distance conjecture (2605.20695)
- Gil Kalai — Amazing: Erdős' Unit Distance Problem was Disproved!
- Interesting Engineering — 80-year-old geometry mystery cracked by OpenAI
- explainx.ai — OpenAI solves 80-year Erdős geometry problem
