Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models

“Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models,” submitted to arXiv on October 9, 2023 by Huaixiu Steven Zheng, Swaroop Mishra, Xinyun Chen, Heng-Tze Cheng, Ed H. Chi, Quoc V. Le, and Denny Zhou of Google DeepMind, introduced a prompting technique built on a simple human habit: before diving into a hard problem, step back and recall the general principle that governs it.

Step-back prompting works in two moves. First, the model is asked to derive a high-level concept or first principle from the specific question: for a physics problem, that might be the relevant law; for a knowledge query, the broader entity or time period involved. Then the model reasons toward the answer using that abstraction as a guide. The intuition is that grounding in a general principle keeps the model from getting lost in the surface details of a specific instance.
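
The two moves can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt wording, the `step_back_answer` helper, and the `fake_llm` stand-in are all hypothetical, and `llm` is assumed to be any function that takes a prompt string and returns the model's text.

```python
from typing import Callable

# Hypothetical prompt templates; the wording is illustrative, not taken from the paper.
ABSTRACTION_TEMPLATE = (
    "Here is a question:\n{question}\n\n"
    "What general principle or broader concept is needed to answer it? "
    "State only the principle."
)
REASONING_TEMPLATE = (
    "Principle:\n{principle}\n\n"
    "Using the principle above, answer this question step by step:\n{question}"
)


def step_back_answer(question: str, llm: Callable[[str], str]) -> dict:
    """Run the two moves: abstract first, then reason with the abstraction."""
    # Move 1: ask for the high-level concept behind the specific question.
    principle = llm(ABSTRACTION_TEMPLATE.format(question=question)).strip()
    # Move 2: answer the original question, guided by that principle.
    answer = llm(
        REASONING_TEMPLATE.format(principle=principle, question=question)
    ).strip()
    return {"principle": principle, "answer": answer}


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (an assumption) so the sketch runs offline."""
    if prompt.startswith("Principle:"):
        return "Pressure falls to one quarter of its original value."
    return "Ideal gas law: PV = nRT."


if __name__ == "__main__":
    result = step_back_answer(
        "What happens to the pressure of an ideal gas if the temperature "
        "doubles and the volume increases eightfold?",
        fake_llm,
    )
    print(result["principle"])
    print(result["answer"])
```

In practice `fake_llm` would be replaced by a call to a real model. The point of the sketch is the ordering: the second prompt always carries the principle produced by the first.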

The authors ran experiments with PaLM-2L, GPT-4, and Llama2-70B and report gains on STEM, knowledge, and multi-hop reasoning tasks. For PaLM-2L, the reported improvements are 7 percent on MMLU Physics, 11 percent on MMLU Chemistry, 27 percent on the TimeQA temporal-reasoning benchmark, and 7 percent on the MuSiQue multi-hop QA set.

Why business readers should care: step-back prompting is a clean example of how the structure of a prompt, not just its content, shapes accuracy. Asking a model to name the governing principle before answering is a low-effort change that, on the paper's benchmarks, measurably improved accuracy on technical questions.

Last verified June 7, 2026