In-context Learning and Induction Heads

“In-context Learning and Induction Heads” was published on March 8, 2022 by an Anthropic team led by Catherine Olsson and Nelson Elhage, with co-authors including Neel Nanda, Chris Olah, and Dario Amodei, as part of the Transformer Circuits Thread. It is one of the foundational works in mechanistic interpretability - the attempt to reverse-engineer the specific algorithms that neural networks learn.

An induction head is a particular kind of attention head that implements a simple copy-and-continue rule. Scanning back through the text, it finds the previous place the current token appeared, looks at what came next, and raises the probability of that same continuation. The authors summarize the pattern as “[A][B] … [A] -> [B]”: if the sequence earlier contained A followed by B, then seeing A again predicts B. This is a basic but powerful form of learning from context within a single forward pass, with no weight updates.
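The copy-and-continue rule can be sketched as a toy lookup over a list of tokens. This is only an illustration of the "[A][B] … [A] -> [B]" pattern, not the mechanism inside a real model: an actual induction head does the matching softly through attention weights rather than by exact search, and the function name `induction_predict` is a hypothetical chosen for this example.

```python
from typing import Optional, Sequence


def induction_predict(tokens: Sequence[str]) -> Optional[str]:
    """Toy version of the [A][B] ... [A] -> [B] rule (hypothetical helper).

    Looks for the most recent earlier occurrence of the final token and
    returns the token that followed it, or None if there is no match.
    """
    if len(tokens) < 2:
        return None
    current = tokens[-1]
    # Scan backwards over earlier positions, skipping the current one.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            # The token right after the earlier match is the prediction.
            return tokens[i + 1]
    return None


if __name__ == "__main__":
    # "A" was earlier followed by "B", so seeing "A" again suggests "B".
    print(induction_predict(["A", "B", "C", "A"]))
```

Nothing here is trained or updated: the prediction comes entirely from the tokens already in the sequence, which is the sense in which the rule is learning from context.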

The paper’s central claim is that induction heads may be the main mechanism behind in-context learning - the ability of large models to predict later tokens better by using earlier ones, including picking up a new task from examples in the prompt. The key evidence is a sharp “phase change” early in training: within a narrow window, induction heads form, and in that same window the model’s ability to use context for prediction jumps. The authors give six complementary lines of evidence linking the two. They present the case as strongest for small attention-only models and as more correlational for larger ones.
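One simple way to put a number on "ability to use context" is to compare the model's loss late in a sequence with its loss early in the same sequence. The sketch below assumes per-token losses are already available as plain lists. The name `context_gain` and the chosen positions are assumptions for illustration, not values taken from the paper.

```python
from typing import Sequence


def context_gain(
    per_token_losses: Sequence[Sequence[float]], early: int, late: int
) -> float:
    """Mean loss at a late position minus mean loss at an early position.

    per_token_losses holds one list of per-token losses per sequence.
    A more negative result means the model benefits more from context.
    (Hypothetical helper; positions are chosen by the caller.)
    """
    if not per_token_losses:
        raise ValueError("need at least one sequence of losses")
    n = len(per_token_losses)
    mean_early = sum(seq[early] for seq in per_token_losses) / n
    mean_late = sum(seq[late] for seq in per_token_losses) / n
    return mean_late - mean_early


if __name__ == "__main__":
    # Two made-up sequences whose loss falls as more context is seen.
    losses = [[4.0, 3.0, 2.0], [4.0, 3.5, 3.0]]
    print(context_gain(losses, early=0, late=2))
```

Tracked across training checkpoints, a sudden drop in a measure like this is the kind of jump the phase-change argument refers to.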

Why business readers should care: this work suggests that the prompt-following behavior that products rely on is not magic but can be traced, at least in part, to a discoverable, mechanical circuit - evidence that the inner workings of large models can be studied scientifically rather than treated as an impenetrable black box.
