“Emergent Abilities of Large Language Models” was submitted to arXiv on June 15, 2022 by Jason Wei and a large team drawn from Google, Stanford, DeepMind, and UNC Chapel Hill. It popularized a term that framed much of the 2022-2023 debate about how fast AI capabilities would grow.
The paper’s claim was that while overall language-model loss improves smoothly and predictably with scale, certain specific abilities behave differently. On a range of tasks (multi-step arithmetic, certain reasoning problems, following particular instruction formats), small and mid-sized models perform at near-chance levels, and then, past some threshold of scale, performance jumps sharply to well above chance. The authors called these “emergent” abilities, defined as abilities that are not present in smaller models but are present in larger ones, and that therefore cannot be predicted by extrapolating the performance of smaller models. They catalogued dozens of examples across benchmarks including BIG-bench.
The unsettling implication was that scaling might keep unlocking new capabilities discontinuously, so that a model one size larger could suddenly do things no prior model could, with little warning. That framing fed directly into safety and forecasting discussions: if abilities appear without notice, capability planning and risk assessment become much harder.
The paper’s central claim was sharply contested the following year. A 2023 rebuttal by Rylan Schaeffer, Brando Miranda, and Sanmi Koyejo, “Are Emergent Abilities of Large Language Models a Mirage?”, argued that the sudden jumps were largely artifacts of the nonlinear or discontinuous metrics used to score the tasks, such as all-or-nothing exact-match accuracy, and that continuous metrics, which award partial credit, reveal gradual and predictable improvement. The two papers are best read as a pair. Whether scale buys genuine phase changes or merely the appearance of them remains a live disagreement.
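
The rebuttal’s metric argument can be illustrated with a toy calculation. What follows is a minimal sketch, not code or data from either paper: the per-token accuracy curve, its numbers, and the function names are all invented for illustration, and token errors are assumed to be independent. Under those assumptions, a model whose per-token accuracy rises by the same amount at every tenfold increase in size still shows an exact-match score that sits near zero and then climbs steeply.

```python
def per_token_accuracy(log10_params: float) -> float:
    """Hypothetical smooth curve: accuracy rises linearly in log10(parameters).

    The numbers are made up: 0.50 at 10^8 parameters, 0.98 at 10^12.
    """
    return 0.50 + 0.12 * (log10_params - 8.0)


def exact_match(p_token: float, answer_len: int) -> float:
    """All-or-nothing score: the answer counts only if every token is right.

    Assumes token errors are independent, so the probabilities multiply.
    """
    return p_token ** answer_len


ANSWER_LEN = 10  # hypothetical answer length in tokens
scales = [8.0, 9.0, 10.0, 11.0, 12.0]  # log10 of parameter count

# Each row: (log10 params, per-token accuracy, exact-match accuracy)
rows = [
    (s, per_token_accuracy(s), exact_match(per_token_accuracy(s), ANSWER_LEN))
    for s in scales
]

for s, p, em in rows:
    print(f"10^{s:.0f} params  per-token={p:.2f}  exact-match={em:.3f}")
```

In this toy setup the per-token column moves in equal steps of 0.12, while the exact-match column stays below 0.05 through 10^10 parameters and then rises to roughly 0.8 at 10^12. The same underlying improvement looks gradual or abrupt depending only on which column is plotted, which is the shape of the rebuttal’s argument; whether real models behave this way is the empirical question the two papers dispute.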