Large Language Models are Zero-Shot Reasoners (Let's think step by step)

“Large Language Models are Zero-Shot Reasoners,” submitted to arXiv on May 24, 2022 by Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa, made one of the most quoted discoveries in prompting: a single phrase, added before the model’s answer, dramatically improves reasoning without any worked examples.

Chain-of-thought prompting had already shown that giving a model a few hand-written step-by-step examples improved its reasoning. This paper showed that the examples are not even necessary. Simply appending “Let’s think step by step” to a question, a method the authors call Zero-shot-CoT, causes the model to lay out intermediate reasoning on its own. The gains were large. With text-davinci-002, accuracy on the MultiArith arithmetic benchmark jumped from 17.7 percent to 78.7 percent, and on GSM8K grade-school math it rose from 10.4 percent to 40.7 percent. The effect was demonstrated on InstructGPT (text-davinci-002) and replicated on the 540-billion-parameter PaLM model.
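To make the mechanics concrete, here is a minimal Python sketch of how a Zero-shot-CoT prompt might be assembled. It follows the two-stage shape described in the paper: first elicit the reasoning with the trigger phrase, then feed that reasoning back with a short cue asking for the final answer. The function names, the `complete` callable, and the canned `fake_complete` stand-in are all hypothetical, introduced only for illustration. The answer cue is modeled on the one the paper uses for arithmetic tasks, so treat its exact wording as an assumption.

```python
from typing import Callable

# Trigger phrase from the paper; the answer cue below is an assumed wording
# modeled on the paper's arithmetic setup.
TRIGGER = "Let's think step by step."
ANSWER_CUE = "Therefore, the answer (arabic numerals) is"


def build_reasoning_prompt(question: str) -> str:
    """Stage 1: the question, with the trigger phrase opening the answer."""
    return f"Q: {question}\nA: {TRIGGER}"


def build_answer_prompt(reasoning_prompt: str, reasoning: str) -> str:
    """Stage 2: append the generated reasoning, then cue the final answer."""
    return f"{reasoning_prompt} {reasoning.strip()}\n{ANSWER_CUE}"


def zero_shot_cot(question: str, complete: Callable[[str], str]) -> str:
    """Run both stages using any text-completion function (hypothetical hook)."""
    reasoning_prompt = build_reasoning_prompt(question)
    reasoning = complete(reasoning_prompt)
    answer = complete(build_answer_prompt(reasoning_prompt, reasoning))
    return answer.strip()


def fake_complete(prompt: str) -> str:
    """Offline stand-in for a model call; returns canned text, not model output."""
    if prompt.rstrip().endswith(ANSWER_CUE):
        return " 8."
    return "There are 3 red and 5 blue marbles, so 3 + 5 = 8 marbles in total."


if __name__ == "__main__":
    q = "A bag holds 3 red marbles and 5 blue marbles. How many marbles are in the bag?"
    print(build_reasoning_prompt(q))
    print(zero_shot_cot(q, fake_complete))
```

In practice `fake_complete` would be replaced by a call to whatever model API is in use. The point of the sketch is that the only difference from an ordinary zero-shot prompt is the trigger phrase placed at the start of the answer.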

The finding suggested that large models already contain latent multi-step reasoning ability that ordinary prompting fails to elicit, and that the right cue can unlock it. The “magic phrase” became a standard trick and a touchstone example of how sensitive model behavior is to prompt wording.

Why business readers should care: this result is a clear demonstration that prompt phrasing alone can multiply a model’s accuracy on reasoning tasks, roughly fourfold on the two benchmarks above, with no retraining and no hand-written examples. It is the canonical case for why prompt design is a real lever, not a cosmetic one.

Sources

Last verified June 7, 2026