Finetuned Language Models Are Zero-Shot Learners (FLAN)

“Finetuned Language Models Are Zero-Shot Learners” was submitted to arXiv on September 3, 2021 by Jason Wei, Maarten Bosma, Quoc Le, and colleagues at Google. It introduced instruction tuning and the model FLAN, and it is one of the two papers (alongside OpenAI’s InstructGPT) that turned raw language models into systems that follow plain-language requests.

The recipe is straightforward. Take a large pretrained model (here a 137-billion-parameter one) and fine-tune it on a wide mixture of existing NLP tasks, with every task reformatted as a natural-language instruction through a template, such as “Translate this sentence to French:” followed by the input. After training on more than sixty such tasks, the model generalizes to following instructions for tasks it was never explicitly tuned on. The authors reported that FLAN beat zero-shot GPT-3 (175B) on twenty of the twenty-five tasks they evaluated, and beat few-shot GPT-3 on several of them, despite being smaller.
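The templating step can be illustrated with a minimal sketch. It assumes each raw example is a (task, text, target) triple and that each task has a few interchangeable templates. Apart from the “Translate this sentence to French:” phrasing quoted above, the template wording, the task names, and the helpers `to_instruction_example` and `build_mixture` are hypothetical and are not taken from the paper or its released code.

```python
import random

# Hypothetical templates for two tasks. Only the first translation prompt
# echoes the example quoted in the text; the rest are illustrative.
TEMPLATES = {
    "translation": [
        "Translate this sentence to French:\n{text}",
        "How would you say the following in French?\n{text}",
    ],
    "sentiment": [
        "Is the sentiment of this review positive or negative?\n{text}",
        "Review: {text}\nDid the reviewer like it? Answer positive or negative.",
    ],
}


def to_instruction_example(task, text, target, rng):
    """Wrap one labeled example in a randomly chosen instruction template."""
    template = rng.choice(TEMPLATES[task])
    return {"input": template.format(text=text), "target": target}


def build_mixture(raw_examples, seed=0):
    """Reformat (task, text, target) triples and shuffle them into one training mix."""
    rng = random.Random(seed)
    mixture = [to_instruction_example(t, x, y, rng) for t, x, y in raw_examples]
    rng.shuffle(mixture)
    return mixture


if __name__ == "__main__":
    raw = [
        ("translation", "Good morning.", "Bonjour."),
        ("sentiment", "I loved every minute of it.", "positive"),
    ]
    for example in build_mixture(raw):
        print(example["input"])
        print("->", example["target"])
        print()
```

The resulting input and target pairs would then feed an ordinary supervised fine-tuning loop. Sampling among several templates per task is one plausible way to keep the model from latching onto a single phrasing.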

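Crucially, the paper’s ablations isolated what makes instruction tuning work. It helped only above a certain model scale: in the reported experiments, models of roughly 8 billion parameters and below did worse on unseen tasks after instruction tuning, while the largest models improved. It also benefited from the number and diversity of tasks in the tuning mix. Instruction tuning on too few tasks, or on a model too small, did not transfer.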

FLAN established that the gap between a model that merely continues text and one that does what a user asks could be closed cheaply, without reinforcement learning, just by supervised fine-tuning on instructions. InstructGPT, published months later, added human feedback on top; the combination of instruction tuning plus preference learning is the template behind essentially every chat assistant shipped since.

Last verified June 7, 2026