The Hutter Prize

The Hutter Prize is a cash award, funded by AI theorist Marcus Hutter, for the best lossless compression of a large sample of human knowledge. Announced in August 2006, it is unusual among AI contests in resting on a theoretical claim: that the ability to compress text well is essentially the same as the ability to understand it, so that compression ratios “can be regarded as intelligence measures.” The better a program can predict the next piece of text, the fewer bits it needs to encode it, and good prediction, the argument goes, requires a model of the world the text describes.
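The link between prediction and compression can be made concrete with a small sketch. An ideal entropy coder spends about -log2(p) bits on a symbol its model assigned probability p, so a model that predicts better produces a shorter encoding. The code below is an illustration only, not any prize entry's method: the function name, the toy sample text, and the smoothing constant are all assumptions chosen for the example.

```python
import math
from collections import defaultdict

def ideal_code_length_bits(data: bytes, order: int, alpha: float = 1 / 256) -> float:
    """Bits an ideal entropy coder would spend on `data` when driven by
    an adaptive model that predicts each byte from the previous `order` bytes.

    `alpha` is an additive smoothing constant (an assumption of this sketch).
    """
    # counts[context][symbol] = how often `symbol` has followed `context` so far
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    bits = 0.0
    for i, sym in enumerate(data):
        ctx = data[max(0, i - order):i]
        # Smoothed probability of the byte that actually occurs next
        p = (counts[ctx][sym] + alpha) / (totals[ctx] + 256 * alpha)
        bits += -math.log2(p)  # cost of coding this byte
        # Update the model only after coding, so a decoder could do the same
        counts[ctx][sym] += 1
        totals[ctx] += 1
    return bits

# Toy sample text, invented for illustration
sample = b"the cat sat on the mat. " * 40
bits_order0 = ideal_code_length_bits(sample, order=0)  # ignores context
bits_order2 = ideal_code_length_bits(sample, order=2)  # uses two bytes of context
print(f"raw: {8 * len(sample)} bits, order-0: {bits_order0:.0f}, order-2: {bits_order2:.0f}")
```

On this repetitive sample the context-aware model needs far fewer bits than the context-free one, and both beat the raw 8 bits per byte. Competitive compressors use much richer models, but the accounting is the same.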

The benchmark is a fixed file. Originally it was enwik8, the first 100 million bytes of a snapshot of English Wikipedia. In 2020 both the file and the prize pool were expanded tenfold: the target became enwik9, the first billion bytes of the Wikipedia snapshot, an amount of text the organizers liken to roughly a lifetime of reading, and the total funding rose to 500,000 euros. The prize pays 5,000 euros for each one-percent improvement over the previous record. To win, an entrant must submit not just the compressed file but also the decompressor that reconstructs the original exactly. The decompressor's size counts toward the entry's total, which rules out smuggling knowledge in through a bloated program.
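The scoring arithmetic described above can be sketched as follows. This is a simplified reading of the rule as stated here, not the official rules: it assumes the score is the plain sum of the two sizes, that the payout scales proportionally for fractional percentages, and it models no minimum-improvement threshold. The function names and the byte counts are hypothetical.

```python
def total_size(compressed_bytes: int, decompressor_bytes: int) -> int:
    """Score of an entry: the decompressor counts as well as the archive
    (assumed here to be a simple sum)."""
    return compressed_bytes + decompressor_bytes

def payout_euros(previous_record: int, new_total: int,
                 euros_per_percent: float = 5000.0) -> float:
    """Payout at 5,000 euros per one-percent improvement over the record.

    Assumption of this sketch: fractional percentages pay proportionally.
    """
    if new_total >= previous_record:
        return 0.0  # no improvement, no award
    improvement_percent = 100.0 * (previous_record - new_total) / previous_record
    return euros_per_percent * improvement_percent

# Hypothetical figures, invented for illustration only
previous_record = 100_000_000
entry = total_size(compressed_bytes=97_500_000, decompressor_bytes=500_000)
award = payout_euros(previous_record, entry)
print(entry, award)  # 98000000 10000.0
```

Counting the decompressor means that moving a megabyte of knowledge from the archive into the program changes nothing in the score, which is the point of the rule.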

The prize connects directly to Hutter’s broader theory. His work on AIXI and, with Shane Legg, on universal intelligence builds on the idea that prediction by the shortest program, a formalization of Occam’s razor due to Ray Solomonoff, is the core of intelligence. The Hutter Prize turns that abstract principle into a concrete, measurable contest. Its premise has gained relevance with time: the large language models that now dominate AI are trained on next-token prediction, and the cross-entropy loss they minimize is, up to a change of units, the number of bits an ideal coder would need to encode the text under the model. That is the same objective the prize has rewarded since 2006.
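The correspondence between a language model's training loss and a compressed size is a unit conversion, sketched below. The function names and the numbers are hypothetical, and the estimate covers only the encoded text: under the prize's rules the program that reproduces the model's predictions would count toward the total as well.

```python
import math

def nats_to_bits(nats: float) -> float:
    """Convert a cross-entropy value from nats (natural log) to bits (log base 2)."""
    return nats / math.log(2)

def implied_compressed_bytes(loss_nats_per_token: float, n_tokens: int) -> float:
    """Size an ideal entropy coder would reach when driven by a model with
    the given average next-token loss. Excludes the size of the model itself."""
    total_bits = nats_to_bits(loss_nats_per_token) * n_tokens
    return total_bits / 8

# Hypothetical example: an average loss of ln(2) nats is exactly 1 bit per token
example_bytes = implied_compressed_bytes(math.log(2), n_tokens=8_000)
print(example_bytes)  # 1000.0
```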
