Beam Search

Beam search is a way of turning a model’s token-by-token probabilities into a full output sequence. The simplest approach, greedy decoding, picks the single most likely token at every step, but that can paint the model into a corner: a locally attractive first word may force awkward choices later. Beam search hedges against this by keeping the few best partial sequences alive at once (the number kept is the “beam width,” often 4 or 10). At each step it extends every kept sequence by each plausible next token, scores the resulting candidates (typically by summing their token log-probabilities), and prunes back down to the best few. At the end it returns the highest-scoring complete sequence. The “Sequence to Sequence Learning with Neural Networks” paper used a simple left-to-right beam search decoder and noted that even a beam of size 1 worked well, while a small beam improved results.
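The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a production decoder: the "model" is a hypothetical hand-written table of next-token probabilities (TOY_MODEL), candidates are scored by summed log-probability with no length normalization, and "</s>" is an assumed end-of-sequence marker. The table is rigged so that the greedy choice of first token leads to a worse complete sequence.

```python
import math

EOS = "</s>"  # assumed end-of-sequence marker

# Hypothetical toy "model": maps a prefix (tuple of tokens) to next-token
# probabilities. A real decoder would get these from a neural network.
TOY_MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.4, "y": 0.3, "z": 0.3},
    ("B",): {"x": 0.9, "y": 0.1},
}


def next_token_probs(prefix):
    # Any prefix missing from the table ends the sequence with certainty.
    return TOY_MODEL.get(prefix, {EOS: 1.0})


def beam_search(beam_width, max_len=10):
    """Return (best complete sequence, its probability) under the toy model."""
    beam = [(0.0, ())]  # each hypothesis: (sum of log-probabilities, tokens)
    finished = []
    for _ in range(max_len):
        # Extend every kept hypothesis by every possible next token.
        candidates = [
            (score + math.log(p), prefix + (token,))
            for score, prefix in beam
            for token, p in next_token_probs(prefix).items()
        ]
        # Prune back down to the best beam_width candidates.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = []
        for score, seq in candidates[:beam_width]:
            if seq[-1] == EOS:
                finished.append((score, seq))  # complete: set it aside
            else:
                beam.append((score, seq))  # still growing: keep extending
        if not beam:
            break
    # Fall back to unfinished hypotheses if max_len was reached first.
    best_score, best_seq = max(finished or beam, key=lambda c: c[0])
    if best_seq and best_seq[-1] == EOS:
        best_seq = best_seq[:-1]
    return best_seq, math.exp(best_score)


if __name__ == "__main__":
    for width in (1, 2):
        seq, prob = beam_search(beam_width=width)
        print(width, seq, round(prob, 2))
```

With beam_width=1 (greedy decoding) the sketch commits to "A" and returns ('A', 'x') with probability 0.24; with beam_width=2 it keeps "B" alive long enough to find ('B', 'x') at 0.36. In this sketch a finished hypothesis uses up one of the beam slots on the step it is selected; real decoders differ on details like this and on length normalization.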

For years beam search was the standard decoder for machine translation and other tasks where there is roughly one correct answer and faithfulness matters more than variety. It is a heuristic, not a guarantee of finding the single most probable sequence, but it usually finds something close while exploring only a tiny fraction of the exponentially large space of possible outputs. Conceptually it is best-first search with pruning: like classical algorithms such as A*, it expands the most promising partial solutions first, but it discards all but a fixed number of them at every step.
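To make "a tiny fraction" concrete, here is a rough count under illustrative assumptions: a vocabulary of V = 32,000 tokens, an output length of T = 20 tokens, and a beam width of k = 4 (all three numbers are hypothetical, chosen only for scale).

```latex
\underbrace{V^{T}}_{\text{all possible outputs}}
  = 32{,}000^{20} \approx 1.27 \times 10^{90}
\qquad\text{vs.}\qquad
\underbrace{k \cdot V \cdot T}_{\text{candidates beam search scores}}
  = 4 \times 32{,}000 \times 20 = 2{,}560{,}000
```

Each step scores at most k·V extensions (k kept hypotheses, each extended by every vocabulary token) and keeps only k of them, so the work grows linearly with T instead of exponentially. That pruning is also why the result is not guaranteed to be the most probable sequence.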

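Its weakness shows up in open-ended generation such as story writing or dialogue. Searching hard for the most likely text tends to produce bland, repetitive output, because the highest-probability continuation is often a safe, self-reinforcing loop. That finding pushed open-ended text generation toward sampling methods such as nucleus (top-p) sampling, which draw each token at random from the most probable part of the distribution instead of always taking the top choice.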
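For contrast, one step of nucleus (top-p) sampling can be sketched as below. This is a minimal illustration that assumes the model's next-token distribution is already available as a Python dict; the functions nucleus_filter and nucleus_sample and the example distribution are hypothetical, and a real implementation would operate on tensors of logits.

```python
import random


def nucleus_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose total probability
    reaches top_p, then renormalize the kept probabilities to sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        total += p
        if total >= top_p:
            break
    return {token: p / total for token, p in kept}


def nucleus_sample(probs, top_p, rng):
    """Draw one token at random from the nucleus instead of taking the argmax."""
    nucleus = nucleus_filter(probs, top_p)
    tokens = list(nucleus)
    return rng.choices(tokens, weights=[nucleus[t] for t in tokens], k=1)[0]


if __name__ == "__main__":
    # Hypothetical next-token distribution, for illustration only.
    probs = {"the": 0.5, "a": 0.3, "this": 0.15, "zebra": 0.05}
    rng = random.Random(0)  # seeded so the run is reproducible
    print(nucleus_filter(probs, top_p=0.9))
    print([nucleus_sample(probs, 0.9, rng) for _ in range(8)])
```

Greedy decoding would always choose "the" at this step. With top_p=0.9 the sampler instead picks among "the", "a", and "this" in proportion to their renormalized probabilities (about 0.53, 0.32, and 0.16) and never picks the low-probability tail token "zebra", which is how it adds variety without drifting into unlikely text.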

Why business readers should care: beam search is the quiet workhorse behind translation and transcription, where you want the most accurate rendering rather than a creative one. Knowing the difference between “search for the best answer” and “sample for a varied one” explains why some AI products give the same output every time while others feel more varied and conversational.

Sources

Related