In June 2018, OpenAI published “Improving Language Understanding by Generative Pre-Training,” along with a blog post titled “Improving language understanding with unsupervised learning.” The work introduced the model now known as GPT-1, the first in the Generative Pre-trained Transformer line. The blog post and paper are hosted on OpenAI’s own site.
The core idea was a two-stage recipe. First, a Transformer-based language model was pre-trained on a large corpus of unlabeled text by learning to predict the next token (roughly, the next word) from the tokens before it, an unsupervised task that requires no human labels. Then the same model was fine-tuned on specific labeled tasks such as question answering, textual entailment, and sentiment classification. This generative pre-training followed by fine-tuning improved on the state of the art in 9 of the 12 language understanding tasks the paper studied, using a single, largely task-agnostic architecture.
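As a rough illustration of why the pre-training stage needs no human labels, the sketch below shows how raw text supplies its own training targets: every position yields a (context, next token) pair, and the objective is the summed log-probability the model assigns to each true next token. The names `next_token_pairs`, `log_likelihood`, and `uniform_prob`, the whitespace tokenization, and the uniform toy model are assumptions made for illustration. They are not the paper's implementation, which used a Transformer over subword tokens.

```python
import math


def next_token_pairs(tokens, context_size):
    """Turn a token sequence into (context, next_token) training pairs.

    The "label" for each pair is simply the token that follows the context
    in the text, so no human annotation is involved.
    """
    pairs = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - context_size):i]
        pairs.append((context, tokens[i]))
    return pairs


def log_likelihood(tokens, context_size, prob):
    """Sum of log P(next_token | context) over the sequence.

    `prob` is any callable returning the model's probability for the next
    token given the context; pre-training adjusts the model to raise this sum.
    """
    return sum(
        math.log(prob(context, nxt))
        for context, nxt in next_token_pairs(tokens, context_size)
    )


# Toy data: whitespace tokenization stands in for a real subword tokenizer.
tokens = "the cat sat on the mat".split()
vocab = sorted(set(tokens))


def uniform_prob(context, nxt):
    # Hypothetical untrained "model": every vocabulary item is equally likely.
    return 1.0 / len(vocab)


pairs = next_token_pairs(tokens, context_size=2)
score = log_likelihood(tokens, 2, uniform_prob)
```

A trained model would assign higher probability to the tokens that actually occur, giving a larger (less negative) `score` than this uniform baseline. Fine-tuning then reuses the same model with a supervised objective on labeled examples.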
GPT-1 was modest in size compared with what came later, at roughly 117 million parameters in a 12-layer Transformer decoder, but it established the template for the entire GPT family. Scaling this same pre-train-then-adapt approach to far larger models produced GPT-2, GPT-3, and ultimately the systems behind ChatGPT.