Solving Math Olympiad Geometry Problems with Artificial Intelligence

Google DeepMind has announced AlphaGeometry, an artificial intelligence program that solves geometry problems at a level approaching that of gold medalists in the International Mathematical Olympiad. The announcement is titled “AlphaGeometry: An Olympiad-level AI system for geometry”.

AlphaGeometry is a neuro-symbolic system made up of a neural language model and a symbolic deduction engine, which work together to find proofs for complex geometry theorems. Akin to the idea of “thinking, fast and slow”, one system provides fast, “intuitive” ideas, and the other, more deliberate, rational decision-making.

Because language models excel at identifying general patterns and relationships in data, they can quickly predict potentially useful constructs, but often lack the ability to reason rigorously or explain their decisions. Symbolic deduction engines, on the other hand, are based on formal logic and use clear rules to arrive at conclusions. They are rational and explainable, but they can be “slow” and inflexible - especially when dealing with large, complex problems on their own.

AlphaGeometry’s language model guides its symbolic deduction engine towards likely solutions to geometry problems. Olympiad geometry problems are based on diagrams that need new geometric constructs to be added before they can be solved, such as points, lines or circles. AlphaGeometry’s language model predicts which new constructs would be most useful to add, from an infinite number of possibilities. These clues help fill in the gaps and allow the symbolic engine to make further deductions about the diagram and close in on the solution.
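
To make the division of labor concrete, here is a minimal Python sketch of how such a loop could be organized, assuming a forward-chaining rule engine and a model that proposes auxiliary constructions. The function names, data representations, and the toy example are hypothetical illustrations, not AlphaGeometry’s actual code or API.

```python
# Hypothetical sketch of a neuro-symbolic proof loop: a symbolic engine
# forward-chains deduction rules to a fixed point, and a "language model"
# stand-in is consulted only when deduction stalls, proposing an auxiliary
# construction that unlocks further deductions.

def deduction_closure(facts, rules):
    """Apply every rule repeatedly until no new facts can be derived."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new = rule(closure) - closure
            if new:
                closure |= new
                changed = True
    return closure

def prove(premises, goal, rules, propose_construction, max_constructions=5):
    """Alternate exhaustive symbolic deduction with model-proposed constructions."""
    facts = set(premises)
    for _ in range(max_constructions):
        facts = deduction_closure(facts, rules)
        if goal in facts:
            return facts                                # goal derived: a proof can be read off
        construction = propose_construction(facts)      # stand-in for the neural model
        if construction is None:
            return None                                 # model has no further ideas
        facts |= construction                           # add the auxiliary point/line/circle
    return None

# Toy demo: the goal is unreachable until the "model" suggests the missing
# construction "midpoint(M, A, B)", after which the deduction rule fires.
rules = [lambda f: {"goal"} if {"parallel(L1, L2)", "midpoint(M, A, B)"} <= f else set()]
proof = prove(premises={"parallel(L1, L2)"},
              goal="goal",
              rules=rules,
              propose_construction=lambda f: {"midpoint(M, A, B)"})
print("proved" if proof else "failed")                  # -> proved
```

The point of this structure is that the symbolic closure is exhaustive and explainable, while the (here, stubbed-out) neural model is consulted only when deduction stalls.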

The research paper, published in Nature on 2024-01-17, is “Solving olympiad geometry without human demonstrations” [full text at link]. Here is the abstract.

Proving mathematical theorems at the olympiad level represents a notable milestone in human-level automated reasoning, owing to their reputed difficulty among the world’s best talents in pre-university mathematics. Current machine-learning approaches, however, are not applicable to most mathematical domains owing to the high cost of translating human proofs into machine-verifiable format. The problem is even worse for geometry because of its unique translation challenges, resulting in severe scarcity of training data. We propose AlphaGeometry, a theorem prover for Euclidean plane geometry that sidesteps the need for human demonstrations by synthesizing millions of theorems and proofs across different levels of complexity. AlphaGeometry is a neuro-symbolic system that uses a neural language model, trained from scratch on our large-scale synthetic data, to guide a symbolic deduction engine through infinite branching points in challenging problems. On a test set of 30 latest olympiad-level problems, AlphaGeometry solves 25, outperforming the previous best method that only solves ten problems and approaching the performance of an average International Mathematical Olympiad (IMO) gold medallist. Notably, AlphaGeometry produces human-readable proofs, solves all geometry problems in the IMO 2000 and 2015 under human expert evaluation and discovers a generalized version of a translated IMO theorem in 2004.

DeepMind has released the complete source code and the trained model as open source: you can download them from GitHub.

Yeah, this is pretty closely related to something I tried to get the wxMaxima folks to do several years ago:

Take the AlphaGo approach to symbolic math: first, bootstrap off of the wxMaxima user base’s interactions with the program, treating the “moves” they make in symbolic derivation as a “game” – and then go full-tilt boogie the way AlphaZero did.
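
Purely as an illustration of that framing (nothing the wxMaxima project has actually built), treating simplification as a game means wrapping expressions and rewrite rules in an environment a self-play learner could interact with. Here is a rough Python sketch using SymPy as a stand-in CAS; the class, method names, and reward are made up for this example.

```python
# Hypothetical sketch: symbolic simplification as a single-player "game" whose
# states are expressions, whose moves are rewrite rules, and whose reward
# favors shorter results. A self-play learner in the AlphaZero mold would
# learn which move to pick in which state.

from sympy import symbols, expand, factor

class SimplificationGame:
    def __init__(self, expression, rewrite_rules):
        self.state = expression
        self.rules = rewrite_rules            # rule name -> callable on expressions

    def legal_moves(self):
        """Only moves that actually change the current expression are legal."""
        return [name for name, rw in self.rules.items() if rw(self.state) != self.state]

    def play(self, name):
        self.state = self.rules[name](self.state)

    def reward(self):
        """Shorter printed form = higher reward (a crude proxy for 'simpler')."""
        return -len(str(self.state))

x = symbols("x")
game = SimplificationGame(x**2 + 2*x + 1, {"expand": expand, "factor": factor})
print(game.legal_moves())            # ['factor'] -- expanding the expanded form changes nothing
game.play("factor")
print(game.state, game.reward())     # (x + 1)**2, a shorter expression, higher reward
```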

I suspect the next big advance out of DeepMind will be something more along those lines.

Damn, I (again) wish John were around for this!

Back in 2018 I was suggesting what is now called the rStar-Math approach to “self-play” training of math reasoning. See “#2” in this wxMaxima comment:

Now, for the coup de grâce:

#2 (observing what experts do to simplify expressions) relates directly to practical efforts in expertise acquisition, most fabulously demonstrated by AlphaGo from Google DeepMind’s team (DeepMind was co-founded by one of Hutter’s PhD students). For decades, Go had been considered uncrackable by machine learning because it is so “intuitive”. Although they were able to achieve world-class expertise by observing games played by world-class experts – and that would be adequate for Maxima – AlphaGo Zero was able to master the game of Go without human knowledge (https://www.nature.com/articles/nature24270). Treat algebra (and calculus) as a “game” with “rules” that require the “intuition” of “experts” and something may come of it – even if not to the degree achieved by AlphaGo Zero.

“A twig of lossless compression is worth a tree of thought.”

However, the “twig of lossless compression” aphorism goes beyond rStar-Math: it asserts that merely by focusing on lossless compression of all the data under consideration, the pre-training (a.k.a. “foundation”) model must converge on its own “tree of thought” algorithm prior to “inference time”. Note that all this talk about “tree of thought” presupposes an external judge of the “right” answer. In the Kolmogorov complexity (KC) limit this convergence must happen, because the KC program by definition must regenerate all prior data: the “external judge” is simply the next bit supplied by the dataset, so the KC program must produce a probability distribution over the next bit, and do so based on a tree search.
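
The compression-prediction link this argument leans on can be made concrete: an ideal entropy coder driven by a predictive model spends -log2(p) bits on each symbol, so the total compressed length is just the model’s summed surprisal, and better next-bit prediction is, literally, better lossless compression. Below is a small, self-contained Python illustration; the two predictors are toy stand-ins for a trained model.

```python
import math

def ideal_code_length_bits(bits, predict_next):
    """Code length achieved by an ideal arithmetic coder driven by a predictor.

    `predict_next(prefix)` returns P(next bit = 1 | prefix). The total length is
    the sum of -log2 of the probability the model assigned to each actual bit.
    """
    total = 0.0
    for i, b in enumerate(bits):
        p1 = predict_next(bits[:i])
        p = p1 if b == 1 else 1.0 - p1
        total += -math.log2(p)
    return total

data = [1, 1, 1, 1, 0, 1, 1, 1]
uniform = lambda prefix: 0.5                  # knows nothing: 1 bit per bit
biased  = lambda prefix: 0.8                  # has learned that 1s dominate

print(ideal_code_length_bits(data, uniform))  # 8.0 bits
print(ideal_code_length_bits(data, biased))   # ~4.58 bits: better model, shorter code
```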

At some point someone is going to publish the paper:

Better Lossless Compression Is All You Need

The abstract will go something like this:

ABSTRACT
Machine inference is prompted decompression of a world model. Inference transitions to reasoning when the world model produced by the learning algorithm must use reasoning algorithms to output the original training data without loss. We argue that all attempts at “inference-time reasoning” are best characterized this way. Moreover, all attempts to circumvent the hard problem of lossless compression of the world model (avoiding approximating the Kolmogorov complexity of the training data) are remedied by the aphorism:

“A twig of lossless compression is worth a tree of thought.”