Generative Language Modeling for Automated Theorem Proving (GPT-f)

“Generative Language Modeling for Automated Theorem Proving” (arXiv, September 2020, by Stanislas Polu and Ilya Sutskever) introduced GPT-f, an automated theorem prover for the Metamath formal language that uses a transformer language model to generate proof steps. The work tackled a long-standing weakness of automated provers: the difficulty of inventing the novel mathematical terms a proof sometimes requires, which language models turn out to be good at generating.

The headline result was that GPT-f found new, shorter proofs that were accepted into the main Metamath library. The authors note this was, to their knowledge, the first time a deep-learning system contributed proofs that a formal mathematics community adopted into its official collection, a meaningful step beyond benchmark numbers to real mathematical contribution.

GPT-f helped launch the modern line of neural theorem proving that runs through LeanDojo, autoformalization, and later systems like AlphaProof. For a general reader it is an early, concrete example of AI not merely assisting mathematicians but producing verified results good enough to be kept. That blend of generative creativity with formal checking is a template for trustworthy AI in any domain where correctness can be machine-verified.

Sources

Last verified June 7, 2026