OpenAI's GPT-2 shows language models can learn tasks unsupervised

In February 2019, OpenAI introduced GPT-2 in the paper “Language Models are Unsupervised Multitask Learners,” authored by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. The official PDF, hosted on OpenAI’s own content delivery network, is dated February 14, 2019.

The central finding was striking. The authors trained a large Transformer purely to predict the next word on WebText, a new dataset of millions of webpages. Without any task-specific training, the model began to perform tasks such as question answering, translation, reading comprehension, and summarization. The paper reports that increasing model capacity improved performance across tasks in a log-linear fashion. Its largest model, GPT-2, is a 1.5-billion-parameter Transformer that achieved state-of-the-art results on 7 of 8 tested language modeling datasets in a zero-shot setting, that is, without any training on those datasets.
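To make "log-linear" concrete, here is a minimal Python sketch. It assumes a score of the form intercept + slope × log10(parameters). The function names and both coefficients are invented for illustration and are not taken from the paper; the point is only that, under such a relation, every tenfold increase in model size adds the same fixed amount to the score.

```python
import math

# Made-up coefficients for illustration only; NOT taken from the GPT-2 paper.
INTERCEPT = -20.0
SLOPE = 5.0


def log_linear_score(num_params: float,
                     intercept: float = INTERCEPT,
                     slope: float = SLOPE) -> float:
    """Hypothetical score that grows linearly in log10(num_params)."""
    return intercept + slope * math.log10(num_params)


def gain_per_tenfold(num_params: float) -> float:
    """Score gained by making a model ten times larger."""
    return log_linear_score(10 * num_params) - log_linear_score(num_params)


if __name__ == "__main__":
    for n in (1e8, 1e9, 1e10):
        print(f"{n:.0e} parameters: score {log_linear_score(n):.1f}, "
              f"gain from 10x more: {gain_per_tenfold(n):.1f}")
```

Under this assumed form the gain per tenfold increase equals the slope regardless of the starting size, so steady improvement requires multiplying model size rather than adding to it.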

This was strong early evidence for the value of scale: a single general model, trained only to predict text, could generalize to many tasks it was never explicitly taught.

GPT-2 was also notable for OpenAI’s cautious rollout, which drew wide attention to the question of how powerful generative text models should be released. The model’s coherent, human-like text samples helped move large language models from a research curiosity toward broad public awareness.