OpenAI is circulating a 125-page mathematical writeup claiming that an unreleased internal model, widely speculated to be from the GPT-5.6 / "GPT-next" line, produced a constructive disproof of the conjectured bound in the Erdős planar unit-distance problem, an 80-year-old open problem in combinatorial geometry. The problem, posed by Paul Erdős in 1946, asks for the maximum number of unit-distance pairs realizable by n points in the plane; OpenAI's result reportedly gives a counterexample to the long-conjectured upper bound, supplying a specific point configuration whose pair count exceeds the conjectured ceiling. Critically, the writeup claims that the model produced the result in under 32 wall-clock hours on under $1,000 worth of compute, with the central inductive construction emerging in what the writeup highlights as a "page 39 moment" in the model's scratchpad.
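To make the counted quantity concrete, here is a minimal Python sketch, not taken from the writeup, that tallies pairwise distances in a point set using exact integer arithmetic. It uses a k-by-k integer grid, the starting point of Erdős's original 1946 lower-bound construction, in which the lattice is rescaled so that its most frequent distance becomes the unit. The function names `squared_distance_counts`, `grid`, and `most_popular_distance` are illustrative assumptions, and nothing here reflects the configuration OpenAI reports.

```python
from collections import Counter
from itertools import combinations, product


def squared_distance_counts(points):
    """Count unordered point pairs by squared Euclidean distance.

    Points must have integer coordinates so every comparison is exact.
    """
    counts = Counter()
    for (x1, y1), (x2, y2) in combinations(points, 2):
        counts[(x1 - x2) ** 2 + (y1 - y2) ** 2] += 1
    return counts


def grid(k):
    """All k*k points of the integer grid {0..k-1} x {0..k-1} (illustrative helper)."""
    return list(product(range(k), repeat=2))


def most_popular_distance(points):
    """Return (squared_distance, pair_count) for the most frequent distance.

    Ties are broken in favour of the smaller squared distance.
    """
    counts = squared_distance_counts(points)
    return max(counts.items(), key=lambda kv: (kv[1], -kv[0]))


if __name__ == "__main__":
    for k in (3, 5):
        d2, pairs = most_popular_distance(grid(k))
        print(f"{k}x{k} grid: squared distance {d2} occurs {pairs} times")
```

For the 5-by-5 grid the most frequent squared distance is 5 (48 pairs), ahead of the lattice spacing itself (40 pairs), which is why the construction rescales the grid instead of treating the spacing as the unit.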
The methodological framing is what the field is mostly reacting to. Unlike DeepMind's 2024 IMO silver-medal result, which ran on AlphaProof and a Lean-based formal proof harness, this work is credited to a general-purpose chat model with extended reasoning, with no theorem prover in the loop and no domain-specific tools beyond a Python sandbox for combinatorial enumeration. If the writeup holds up under external verification (and OpenAI is explicitly framing this differently from prior "OpenAI claims X" episodes by publishing the full construction), it would be the first time a general-purpose frontier LLM has produced a novel result on an open problem that competent combinatorialists had worked on for decades without resolving.
The broader signal concerns the trajectory of test-time compute. The cost figure is roughly the same order of magnitude as a single graduate student's monthly stipend, and the wall-clock figure is comparable to what a strong human collaborator might spend on a deep dive. There are reasons to be cautious about the construction, including the fact that combinatorial counterexamples have a long history of subtle parity errors that survive several reads before being caught. If it holds, though, the implication is that we have crossed a threshold where ~30-hour, ~$1k compute budgets on a frontier reasoning model can land novel results on real research conjectures. The community is already cross-checking the page-39 construction; mathematician Terence Tao's social-media response will be closely watched as a leading indicator.
Caveats worth flagging: OpenAI has historically over-claimed on math benchmarks where post-hoc verification revealed pattern-matching to in-training-corpus solutions, and Erdős problems specifically have a known leakage surface via MathOverflow and the Erdős Problems database. Independent verification on a held-out subset of unit-distance variants would help establish whether the capability generalizes.
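One way to make the cross-checking described above mechanical is to recount the unit-distance pairs of a claimed configuration with exact arithmetic and compare against the stated total. The sketch below is a hypothetical checker, not the writeup's or any reviewer's actual tooling. It assumes the coordinates are rational (given as integers, `Fraction` values, or strings such as "3/5"); a configuration with irrational algebraic coordinates would need a computer algebra system instead. The names `unit_pairs` and `check_claim` are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def unit_pairs(points):
    """Return index pairs (i, j), i < j, whose points are at distance exactly 1.

    Coordinates are converted to Fraction, so the test d^2 == 1 is exact
    and free of floating-point rounding.
    """
    pts = [(Fraction(x), Fraction(y)) for x, y in points]
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(pts), 2)
        if (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 == 1
    ]


def check_claim(points, claimed_count):
    """Compare an independently recomputed pair count with a claimed one.

    Returns (matches, pairs) so a reviewer can audit the individual pairs.
    """
    found = unit_pairs(points)
    return len(found) == claimed_count, found


if __name__ == "__main__":
    # Toy example only: a rhombus with unit sides and rational coordinates.
    rhombus = [(0, 0), (1, 0), ("3/5", "4/5"), ("8/5", "4/5")]
    ok, pairs = check_claim(rhombus, claimed_count=4)
    print(ok, pairs)
```

Returning the list of pairs, and not only the total, lets a reader see exactly which pairs a disputed count includes or omits.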