“Least-to-Most Prompting Enables Complex Reasoning in Large Language Models,” submitted to arXiv on May 21, 2022 by Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, and colleagues at Google, tackled a known limitation of chain-of-thought prompting: it tends to perform poorly on problems harder than the exemplars shown in the prompt.
The method works in two stages. First, decomposition: the model is prompted to break a complex problem into a sequence of simpler subproblems, ordered from easiest to hardest. Second, sequential solving: the model solves each subproblem in turn, and the answer to each one is fed in as context for the next. This builds the final solution out of a chain of smaller, manageable steps, so the model never has to leap directly to a hard answer.
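The two stages can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the prompt wording, the function names, and the `llm` callable (any function that takes a prompt string and returns the model's text) are all assumptions made for the example. The paper itself relies on few-shot exemplars rather than a bare instruction like the one used here.

```python
from typing import Callable, List, Tuple

# Hypothetical instruction for stage 1; the paper uses few-shot exemplars instead.
DECOMPOSE_TEMPLATE = (
    "Break the problem below into simpler subproblems, ordered from "
    "easiest to hardest, one per line.\n\nProblem: {problem}\nSubproblems:"
)


def decompose(problem: str, llm: Callable[[str], str]) -> List[str]:
    """Stage 1: ask the model for an ordered list of subproblems."""
    reply = llm(DECOMPOSE_TEMPLATE.format(problem=problem))
    # Assumes the model returns one subproblem per line.
    return [line.strip() for line in reply.splitlines() if line.strip()]


def solve_sequentially(
    problem: str, subproblems: List[str], llm: Callable[[str], str]
) -> List[Tuple[str, str]]:
    """Stage 2: solve each subproblem, feeding earlier answers in as context."""
    context = f"Problem: {problem}\n"
    solved: List[Tuple[str, str]] = []
    for sub in subproblems:
        answer = llm(f"{context}Q: {sub}\nA:").strip()
        solved.append((sub, answer))
        # Append the solved pair so the next prompt can build on it.
        context += f"Q: {sub}\nA: {answer}\n"
    return solved


def least_to_most(problem: str, llm: Callable[[str], str]) -> str:
    """Run both stages and return the answer to the last (hardest) subproblem."""
    solved = solve_sequentially(problem, decompose(problem, llm), llm)
    return solved[-1][1] if solved else ""
```

Passing the model in as a plain callable keeps the sketch independent of any particular API client. With three subproblems, the model is called four times: once to decompose and once per subproblem, each time with a longer context.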
The standout result was on SCAN, a compositional generalization benchmark that tests whether a system can handle longer or novel combinations of known parts. Using the GPT-3 code-davinci-002 model with just 14 exemplars, least-to-most prompting solved SCAN in every split, including the difficult length split, with at least 99 percent accuracy. By comparison, chain-of-thought prompting reached only 16 percent. The authors call the result particularly noteworthy because the specialized neural-symbolic models built for SCAN are trained on the full training set of more than 15,000 examples.
Why business readers should care: least-to-most prompting demonstrates that breaking a task into ordered, easier pieces lets a model handle problems it cannot solve in one shot. The decompose-then-solve pattern is a reusable strategy for any complex, multi-step workflow.