Theorem Proving · LLM Reasoning
AlphaGeometry: Olympiad Geometry Without Human Proofs
AlphaGeometry pairs a language model with a symbolic engine and trains on 100M synthetic theorems, solving 25 of 30 olympiad geometry problems versus 10 for the prior best.
Quick answer
AlphaGeometry, from Google DeepMind and NYU, solves 25 of 30 recent IMO-level geometry problems, versus 10 for the previous best automated method and roughly the 25.9-problem average of human gold medalists. It does this without learning from a single human-written proof: it is trained entirely on 100 million synthetic theorems and their machine-generated proofs.
Why geometry proofs are hard for AI
Olympiad geometry sits in an awkward spot for machines. The deduction itself is rigorous, but the winning move is often an auxiliary construction — drawing a new point, line, or circle that is not in the original figure. That single creative step can collapse a problem that pure deduction would never crack, and the space of possible constructions is effectively infinite.
Two standard approaches each fail here. Symbolic provers can grind out valid deductions but have no taste for which construction to try, so search explodes. Language models trained on text are fluent but hallucinate steps that do not hold. The harder, quieter obstacle is data: there is almost no large corpus of human geometry proofs in a machine-checkable format, so the usual “imitate the experts” recipe has nothing to imitate.
A neuro-symbolic loop
AlphaGeometry splits the job between two components that cover each other’s weaknesses. A symbolic deduction engine exhaustively derives everything that follows from the current figure — fast, and never wrong. When deduction stalls, a transformer language model proposes one auxiliary construction to add to the figure, and the engine runs again on the expanded setup. The loop repeats until the goal is proved or the budget runs out.
The division of labor is the whole point. The language model never has to produce a correct proof; it only has to suggest where to look. Every step it enables is still verified by the symbolic engine, so a bad suggestion costs search time but cannot introduce an invalid proof. Neural intuition picks the construction; symbolic logic guarantees soundness.
Training on synthetic theorems
Because human proof data barely exists, the authors manufacture it. They sample random geometric premises, let the symbolic engine deduce all consequences, and read off theorems together with the deduction steps that prove them — yielding roughly 100 million examples with no human annotation. A key trick recovers the auxiliary constructions: when a proof depends on a point that was not among the original premises, that point is moved into the proof as a construction the model must learn to propose. This is how a system trained only on machine-generated data learns the one genuinely creative move geometers rely on.
Key results
On a test set of 30 olympiad-level geometry problems from 2000–2022, AlphaGeometry solves 25. The prior state-of-the-art automated geometer (Wu’s method) solves 10, and a strong unguided symbolic baseline solves 18; the average human IMO gold medalist solves 25.9 of these problems. Under evaluation by a former IMO medalist, AlphaGeometry’s output is judged to be valid, human-readable proofs rather than opaque coordinate bashing — and in one case it produced a more general solution than the official answer. Crucially, the gains come specifically from the construction-proposing language model: the symbolic engine alone plateaus at 18.
Limits and open questions
AlphaGeometry is narrow. It handles Euclidean plane geometry expressible in its formal language and does not touch combinatorics, number theory, or inequalities, which make up most of an IMO paper. Problems must first be hand-translated into its formal representation, so it is not reading contest PDFs end to end. Some proofs run long and mechanical compared with an elegant human write-up, and the symbolic engine’s reach bounds what is provable at all — if a fact lies outside its rule set, no amount of neural guidance helps.
The open question is generality. The recipe — synthesize verified data, let a model propose the creative leaps, and have a symbolic engine check every step — is appealing precisely because it sidesteps both hallucination and the data drought. Whether it transfers to domains without geometry’s clean, fully decidable deduction is unproven, and that, not the geometry score, is what to watch.
FAQ
How does AlphaGeometry solve geometry problems?
It alternates between a symbolic deduction engine that derives all consequences of the current figure and a language model that proposes a new auxiliary construction whenever deduction stalls. The engine verifies every step, so the model only needs good intuition, not correctness.
Did AlphaGeometry train on human proofs?
No. It is trained from scratch on about 100 million synthetic theorems and machine-generated proofs, with no human demonstrations. The auxiliary constructions are recovered from those synthetic proofs themselves.
How good is AlphaGeometry compared with humans?
On 30 olympiad geometry problems it solves 25, against an average of 25.9 for human IMO gold medalists and 10 for the previous best automated method — so it is near, but not above, gold-medal level on this slice.
What can’t AlphaGeometry do?
It is limited to Euclidean plane geometry stated in its formal language. It does not solve combinatorics, number theory, or inequality problems, and it needs problems pre-translated into its formal input rather than reading raw contest statements.
AlphaGeometry’s lesson is not “AI does geometry” — it is that synthetic data plus a verifier can teach a model the one creative leap a search engine cannot guess. Full results in Nature.