Self-Consistency Improves Chain of Thought Reasoning in Language Models

“Self-Consistency Improves Chain of Thought Reasoning in Language Models,” submitted to arXiv on March 21, 2022 by Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou, introduced a simple but powerful upgrade to chain-of-thought prompting that has become a default technique for reasoning tasks.

Standard chain-of-thought prompting typically uses greedy decoding: the model produces one reasoning path and one answer. Self-consistency replaces this with a decoding strategy that samples many diverse reasoning paths for the same question, then marginalizes over them, which in practice means taking a majority vote on the final answers. The intuition is that a hard problem may have several valid routes to the correct answer, while flawed reasoning tends to scatter across different wrong answers, so the answer the model reaches most often is the most trustworthy.
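
The sample-then-vote procedure can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `self_consistency`, `sample_answer`, and `make_stub_sampler` are hypothetical names, and the stub stands in for a real model call that would sample a chain of thought (for example at nonzero temperature) and parse out the final answer.

```python
from collections import Counter
from typing import Callable, Iterable, Optional


def self_consistency(
    sample_answer: Callable[[str], Optional[str]],
    question: str,
    num_samples: int = 10,
) -> Optional[str]:
    """Sample several reasoning paths and return the most common final answer.

    `sample_answer` is a hypothetical callable (an assumption of this sketch):
    it should run one sampled chain-of-thought generation and return only the
    parsed final answer, or None if no answer could be extracted.
    """
    answers = [sample_answer(question) for _ in range(num_samples)]
    # Count votes per final answer, ignoring samples with no parsable answer.
    votes = Counter(a for a in answers if a is not None)
    if not votes:
        return None
    # The answer with the highest vote count wins.
    return votes.most_common(1)[0][0]


def make_stub_sampler(canned: Iterable[Optional[str]]) -> Callable[[str], Optional[str]]:
    """Build a stand-in sampler that replays fixed answers instead of calling a model."""
    it = iter(canned)

    def sample(question: str) -> Optional[str]:
        return next(it)

    return sample


# Five simulated samples: three agree on "18", one disagrees, one fails to parse.
stub = make_stub_sampler(["18", "24", "18", None, "18"])
answer = self_consistency(stub, "What is 6 * 3?", num_samples=5)
print(answer)  # prints 18
```

Only the final answers are compared; the reasoning text of each sample is discarded before voting, which is why differently worded paths that reach the same answer reinforce one another.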

The reported gains over standard chain-of-thought prompting were substantial across arithmetic and commonsense benchmarks: GSM8K rose by 17.9 percentage points, SVAMP by 11.0, AQuA by 12.2, StrategyQA by 6.4, and ARC-challenge by 3.9. Because it needs no extra training and works on top of any chain-of-thought prompt, self-consistency was adopted quickly and is an early example of spending more inference-time compute to buy accuracy.

Why business readers should care: self-consistency shows that running a model several times and voting can meaningfully beat running it once. It is a direct, tunable trade between cost and reliability for any reasoning-heavy application.
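
To make the cost-versus-reliability trade concrete, here is a toy calculation. It is not taken from the paper and rests on strong simplifying assumptions: each sample is independently correct with probability `p`, and the vote succeeds only when more than half of the samples are correct (as if every wrong sample agreed on the same wrong answer). Real model samples are correlated, so the numbers are illustrative only. `majority_vote_accuracy` is a hypothetical helper name.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that more than half of n independent samples are correct.

    Toy model (an assumption of this sketch): each sample is correct with
    probability p, independently of the others.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


# Cost grows linearly with n (n model calls per question).
for n in (1, 5, 15, 41):
    print(n, round(majority_vote_accuracy(0.6, n), 3))
```

Under these assumptions, a per-sample accuracy of 0.6 becomes roughly 0.68 with five samples, at five times the inference cost, and keeps rising as more samples are added.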
