A Watermark for Large Language Models

“A Watermark for Large Language Models” was submitted to arXiv on January 24, 2023 by John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein at the University of Maryland, and presented at ICML 2023. It proposed a practical way to mark text as machine-generated at the moment it is produced.

The technique works during generation rather than after. Before each new token is sampled, the method uses the preceding tokens to pseudo-randomly split the vocabulary into a “green” list and a “red” list, then softly biases the model toward choosing green tokens. To a reader the text looks normal, but the output ends up containing more green tokens than chance would predict. A detector that knows the same secret rule can count green tokens and run a statistical test on the count, flagging text as watermarked without needing access to the model itself or to the original prompt. The authors demonstrated it on multi-billion-parameter models and released code.
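
The green/red mechanism and the counting detector can be illustrated with a small sketch. This is a toy under stated assumptions, not the authors' released code: the vocabulary size, the key, the values of GAMMA and DELTA, and every function name are hypothetical, random logits stand in for a real language model, and the green list is seeded from only the single previous token. The detector computes a one-proportion z-score, (green count - GAMMA * T) / sqrt(T * GAMMA * (1 - GAMMA)), where T is the number of scored tokens.

```python
import hashlib
import math
import random

VOCAB_SIZE = 1000          # hypothetical toy vocabulary size
GAMMA = 0.5                # assumed fraction of the vocabulary placed on the green list
DELTA = 2.0                # assumed bias added to the logits of green tokens
SECRET_KEY = b"demo-key"   # hypothetical key shared by generator and detector


def green_list(prev_token: int) -> set[int]:
    """Derive the green list from the key and the previous token."""
    digest = hashlib.sha256(SECRET_KEY + prev_token.to_bytes(4, "big")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(range(VOCAB_SIZE), int(GAMMA * VOCAB_SIZE)))


def sample_next(logits: list[float], prev_token: int, delta: float,
                rng: random.Random) -> int:
    """Add delta to green-token logits, then sample from the softmax."""
    green = green_list(prev_token)
    biased = [x + delta if i in green else x for i, x in enumerate(logits)]
    top = max(biased)  # subtract the max for numerical stability
    weights = [math.exp(x - top) for x in biased]
    return rng.choices(range(VOCAB_SIZE), weights=weights, k=1)[0]


def generate(length: int, delta: float, seed: int = 0) -> list[int]:
    """Produce token ids; random logits stand in for a real model."""
    rng = random.Random(seed)
    tokens = [rng.randrange(VOCAB_SIZE)]  # arbitrary starting token
    for _ in range(length):
        logits = [rng.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]
        tokens.append(sample_next(logits, tokens[-1], delta, rng))
    return tokens


def z_score(tokens: list[int]) -> float:
    """Count green tokens and compare the count with the chance level."""
    scored = len(tokens) - 1  # the first token has no predecessor
    hits = sum(tokens[i] in green_list(tokens[i - 1])
               for i in range(1, len(tokens)))
    return (hits - GAMMA * scored) / math.sqrt(scored * GAMMA * (1 - GAMMA))


if __name__ == "__main__":
    print("watermarked z:", round(z_score(generate(200, DELTA, seed=1)), 2))
    print("plain z:      ", round(z_score(generate(200, 0.0, seed=2)), 2))
```

In this sketch the detector needs only the token ids and the key: z_score never touches the logits or a prompt. Text generated with delta set to 0 should score near zero, while biased text should score well above any reasonable threshold; the threshold itself is a choice that trades false positives against missed detections.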

The appeal is that detection is principled and self-contained: it rests on a statistical measurement rather than on guessing stylistic tells, and it does not require running the language model again. The trade-offs, explored in follow-up work, include robustness when text is paraphrased or edited and the tension between watermark strength and output quality.

For a business reader, watermarking speaks to provenance and accountability: a reliable way to tell whether content came from an AI system matters for trust, disclosure, and detecting misuse, even though no current watermark is unbreakable.
