An Observation on Generalization

This is Ilya Sutskever’s talk “An Observation on Generalization,” delivered on 14 August 2023 at the Simons Institute for the Theory of Computing in Berkeley as part of its program on Large Language Models and Transformers, and posted on the official Simons Institute YouTube channel. Sutskever was then chief scientist and a co-founder of OpenAI.

The talk offers his own theory of unsupervised learning. Sutskever argues that the reason unsupervised pre-training works at all can be reasoned about through the lens of compression: a model that compresses its training data well is forced to discover the structure that also makes it generalize to new data. He connects this to ideas from algorithmic information theory and uses it to explain, at a conceptual level, why next-token prediction on large corpora yields useful general representations.
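
The link between prediction and compression described above can be illustrated with a toy calculation. This is a minimal sketch, not something taken from the talk: it assumes a simple adaptive character-frequency model with add-one smoothing, and the function names `code_length_bits` and `uniform_bits` are hypothetical. It relies on the standard fact that an ideal entropy coder spends about -log2(p) bits on a symbol the model predicted with probability p, so a model that predicts the data better also describes it in fewer bits.

```python
import math
from collections import Counter


def code_length_bits(text, alphabet_size=256):
    """Ideal code length in bits of `text` under an adaptive order-0 model.

    Assumption for illustration: add-one smoothing over a fixed alphabet.
    Each symbol costs -log2(p), where p is the probability the model
    assigned to that symbol before seeing it.
    """
    counts = Counter()
    total_bits = 0.0
    for i, ch in enumerate(text):
        p = (counts[ch] + 1) / (i + alphabet_size)
        total_bits += -math.log2(p)
        counts[ch] += 1  # the model learns from the symbol only after coding it
    return total_bits


def uniform_bits(text, alphabet_size=256):
    """Code length in bits when every symbol is treated as equally likely."""
    return len(text) * math.log2(alphabet_size)


structured = "ab" * 200  # highly regular toy data
print("adaptive model:", round(code_length_bits(structured), 1), "bits")
print("uniform model: ", round(uniform_bits(structured), 1), "bits")
```

On regular data the adaptive model needs fewer bits than the uniform baseline because it has picked up the pattern. That is the toy-scale version of the claim that compressing well requires discovering structure.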

This is a deep, research-oriented talk aimed at viewers comfortable with machine-learning theory. Its value as a primary source is that it is one of the clearest firsthand statements of how a central architect of modern large language models thinks about why they generalize.
