The Power of Scale for Parameter-Efficient Prompt Tuning

“The Power of Scale for Parameter-Efficient Prompt Tuning” was submitted to arXiv on April 18, 2021 by Brian Lester, Rami Al-Rfou, and Noah Constant of Google. It introduced prompt tuning, a method that learns “soft prompts” through backpropagation. Rather than writing a text prompt by hand or fine-tuning the whole model, the approach prepends a small set of trainable embedding vectors to the input and adjusts only those vectors while the rest of the model stays frozen.
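The mechanism described above can be illustrated with a toy sketch. This is not the paper's implementation. A mean-pooling linear readout stands in for the frozen language model, the gradient is derived by hand rather than by an autodiff library, and every name and size here (EMBED_DIM, PROMPT_LEN, frozen_readout, and so on) is a hypothetical choice made for illustration. What it shows is the core idea: trainable vectors are prepended to the embedded input, and only those vectors receive updates.

```python
EMBED_DIM = 4    # toy embedding width (hypothetical)
PROMPT_LEN = 2   # number of trainable soft-prompt vectors (hypothetical)
VOCAB_SIZE = 10  # toy vocabulary size (hypothetical)

# Frozen parameters: a fixed embedding table and a linear readout that
# stands in for the whole pretrained model. Neither is ever updated.
frozen_embeddings = [
    [((t * 7 + d * 3) % 5 - 2) / 2 for d in range(EMBED_DIM)]
    for t in range(VOCAB_SIZE)
]
frozen_readout = [0.5, -1.0, 0.25, 1.0]

# Trainable parameters: only the soft prompt, initialised to zeros.
soft_prompt = [[0.0] * EMBED_DIM for _ in range(PROMPT_LEN)]


def forward(token_ids, prompt):
    """Prepend the prompt to the embedded tokens, mean-pool, apply the readout."""
    sequence = prompt + [frozen_embeddings[t] for t in token_ids]
    pooled = [sum(vec[d] for vec in sequence) / len(sequence)
              for d in range(EMBED_DIM)]
    return sum(p * w for p, w in zip(pooled, frozen_readout))


def train_step(token_ids, target, prompt, lr=0.5):
    """One gradient-descent step on squared error; updates the prompt in place."""
    seq_len = len(prompt) + len(token_ids)
    error = forward(token_ids, prompt) - target
    # Hand-derived gradient: d(error**2)/d(prompt[i][d])
    #   = 2 * error * frozen_readout[d] / seq_len
    for vec in prompt:
        for d in range(EMBED_DIM):
            vec[d] -= lr * 2.0 * error * frozen_readout[d] / seq_len
    return error ** 2


tokens, target = [3, 1, 4], 1.0
embeddings_before = [row[:] for row in frozen_embeddings]
losses = [train_step(tokens, target, soft_prompt) for _ in range(200)]

print("first loss:", losses[0])
print("last loss:", losses[-1])
print("frozen table unchanged:", frozen_embeddings == embeddings_before)
```

In a real setting the frozen component would be a pretrained transformer and the gradient would come from backpropagation through it, but the division of labour is the same: the optimizer touches the prompt and nothing else.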

The paper’s headline finding is in its title: the bigger the frozen model, the better prompt tuning works. At smaller scales it lagged full fine-tuning, but as models grew into the billions of parameters the gap closed, and prompt tuning matched fully fine-tuned models on the SuperGLUE benchmark while training only a tiny fraction of the parameters. The authors also showed that prompt tuning was more robust to domain shift than full fine-tuning, and that learned prompts could be ensembled cheaply.
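Ensembling is cheap in this setting because every ensemble member shares the same frozen model and differs only in its learned prompt. One simple way to combine members is a majority vote over their predictions, sketched below. The labels and the per-prompt predictions are made-up placeholders, not results from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of ensemble members."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical outputs: one row per learned prompt, one column per example.
# All rows would come from the same frozen model run with a different prompt.
per_prompt_predictions = [
    ["yes", "no", "yes"],
    ["yes", "no", "no"],
    ["no", "no", "yes"],
    ["yes", "yes", "yes"],
    ["yes", "no", "no"],
]

# zip(*rows) regroups the predictions by example instead of by prompt.
ensemble = [majority_vote(votes) for votes in zip(*per_prompt_predictions)]
print(ensemble)
```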

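Prompt tuning can be seen as a simpler cousin of prefix-tuning: where prefix-tuning learns prefix activations at every layer of the network, prompt tuning restricts its trainable parameters to the input embedding layer. For businesses, the appeal is reuse: one large frozen model can be adapted to many tasks by storing only a small learned prompt per task, from a few kilobytes up to a few megabytes depending on prompt length and embedding width, instead of maintaining a separate multi-gigabyte tuned copy for each.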
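The storage argument can be checked with rough arithmetic. The sketch below assumes an embedding width of 4096 and about 11 billion parameters for the full model (roughly the scale of the largest model discussed), stored as 32-bit floats. These figures and the helper names are assumptions for illustration, and the exact footprint depends on prompt length, embedding width, and numeric precision.

```python
def prompt_parameter_count(prompt_len, embed_dim):
    """A soft prompt is a prompt_len x embed_dim matrix of trainable values."""
    return prompt_len * embed_dim


def size_in_bytes(n_params, bytes_per_param=4):
    """Storage for n_params values; 4 bytes each assumes 32-bit floats."""
    return n_params * bytes_per_param


EMBED_DIM = 4096               # assumed embedding width
MODEL_PARAMS = 11_000_000_000  # assumed size of the full model

model_bytes = size_in_bytes(MODEL_PARAMS)
print(f"full tuned copy: {model_bytes / 1e9:.0f} GB")

for prompt_len in (5, 20, 100):
    n_params = prompt_parameter_count(prompt_len, EMBED_DIM)
    n_bytes = size_in_bytes(n_params)
    print(f"prompt of {prompt_len:>3} tokens: {n_params:>7} parameters, "
          f"{n_bytes / 1024:.0f} KiB, "
          f"{model_bytes // n_bytes:,}x smaller than a full copy")
```

Under these assumptions a 5-token prompt comes to 20,480 parameters, which is several orders of magnitude smaller than a full tuned copy of the model.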

Last verified June 7, 2026