The Curious Case of Neural Text Degeneration (nucleus sampling)

“The Curious Case of Neural Text Degeneration” was submitted to arXiv on April 22, 2019 by Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi and published at ICLR 2020. It diagnosed why language models that score well on understanding tasks nonetheless generate bland, strangely repetitive text, and it proposed a decoding method that largely avoids the problem.

The key insight is that the problem lies not in the model but in the decoding strategy used to turn its probabilities into text. The authors showed that the same neural model can produce wildly different output quality depending only on how tokens are selected. Greedy decoding and beam search, which chase the highest-probability continuations, tend to fall into repetitive loops. Human text, by contrast, does not stay on the most probable path; it draws on a broader spread of the probability distribution.

Their solution, nucleus sampling (also called top-p sampling), samples from the smallest set of tokens whose cumulative probability reaches or exceeds a threshold p - the “nucleus” of the distribution. This keeps the diversity that makes text feel human while truncating the unreliable long tail of low-probability tokens that tend to produce nonsense. Because the nucleus grows when the model is uncertain and shrinks when it is confident, it adapts at each step in a way that fixed top-k truncation does not.
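
To make the truncation concrete, here is a minimal Python sketch of the selection step. It is an illustration rather than the authors' code: the function names nucleus and sample_top_p and the two toy distributions are invented for this example, and the loop stops as soon as the running total reaches p, which is one reasonable reading of the threshold.

```python
import random


def nucleus(probs, p):
    """Return indices of the smallest top-ranked set whose probabilities sum to at least p."""
    # Rank token indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in ranked:
        kept.append(i)
        total += probs[i]
        if total >= p:  # assumption: stop once the running total reaches p
            break
    return kept


def sample_top_p(probs, p, rng=random):
    """Draw one token index from the nucleus, in proportion to its probability."""
    kept = nucleus(probs, p)
    weights = [probs[i] for i in kept]
    # random.choices treats weights as relative, so the nucleus is
    # effectively renormalized before sampling.
    return rng.choices(kept, weights=weights, k=1)[0]


# Hypothetical next-token distributions over a five-token vocabulary.
confident = [0.90, 0.05, 0.03, 0.01, 0.01]  # model is nearly sure
flat = [0.25, 0.22, 0.20, 0.18, 0.15]       # model is uncertain

print(nucleus(confident, 0.8))  # [0]
print(nucleus(flat, 0.8))       # [0, 1, 2, 3]
print(sample_top_p(flat, 0.8))  # one of 0, 1, 2, 3, chosen at random
```

With p = 0.8, the confident distribution keeps a single token while the flat one keeps four. A fixed top-k of 3 would keep three tokens in both cases, which is the rigidity the adaptive nucleus is meant to avoid.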

Why business readers should care: nucleus sampling is a standard decoding option in many deployed language models. This paper is why a top-p knob sits alongside the older temperature setting, and why tuning them - not just the model - changes whether output reads as fluent or robotic.

Sources

Last verified June 7, 2026