Prefix-Tuning: Optimizing Continuous Prompts for Generation

“Prefix-Tuning: Optimizing Continuous Prompts for Generation” was submitted to arXiv on January 1, 2021 by Xiang Lisa Li and Percy Liang of Stanford University. It proposed a lightweight alternative to full fine-tuning: instead of updating all of a model’s weights for a new task, the model parameters stay frozen and only a small, continuous, task-specific vector, called the prefix, is optimized. The prefix is prepended to the input, and the tokens that follow attend to it as if it were a sequence of “virtual tokens,” so the learned prefix steers what the frozen model generates.
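
The mechanism can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it assumes the prefix takes the form of extra key and value vectors placed in front of the ones a frozen model computes, inside a single attention head. The helper names (`attend`, `attend_with_prefix`, `random_vectors`) and the random numbers are hypothetical, and no training loop is shown.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    width = len(values[0])
    return [sum(w * value[i] for w, value in zip(weights, values))
            for i in range(width)]


def attend_with_prefix(query, keys, values, prefix_keys, prefix_values):
    """Prepend the prefix vectors; the frozen keys and values are not modified."""
    return attend(query, prefix_keys + keys, prefix_values + values)


def random_vectors(count, dim, rng):
    """Hypothetical helper: `count` random vectors of length `dim`."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
DIM = 4

# Stand-ins for what a frozen model would compute from three real input tokens.
frozen_keys = random_vectors(3, DIM, rng)
frozen_values = random_vectors(3, DIM, rng)
query = random_vectors(1, DIM, rng)[0]

# The only parameters that would be trained: two "virtual token" key/value pairs.
prefix_keys = random_vectors(2, DIM, rng)
prefix_values = random_vectors(2, DIM, rng)

baseline = attend(query, frozen_keys, frozen_values)
steered = attend_with_prefix(query, frozen_keys, frozen_values,
                             prefix_keys, prefix_values)
```

Here `steered` differs from `baseline` even though nothing belonging to the frozen model changed. In an actual training run, gradients would flow only into `prefix_keys` and `prefix_values`.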

The authors applied the technique to GPT-2 for table-to-text generation and to BART for summarization. They reported that learning only about 0.1 percent of the parameters achieved performance comparable to full fine-tuning in the full-data setting, while outperforming fine-tuning in low-data settings and extrapolating better to examples whose topics were not seen during training. Because each task needs only its own small prefix rather than a full copy of the model, one frozen base model can serve many tasks at once.
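
To get a feel for the scale involved, here is a rough back-of-the-envelope sketch. Every number in it is an illustrative assumption rather than the paper's own accounting: a GPT-2 Medium-sized model (24 layers, hidden size 1024, roughly 355 million parameters), a prefix of 10 positions, and one key vector plus one value vector stored per position per layer. The function name `prefix_param_count` is hypothetical.

```python
def prefix_param_count(prefix_len, num_layers, hidden_size):
    """Stored prefix size: one key and one value vector per position per layer."""
    return prefix_len * num_layers * 2 * hidden_size


# Assumed sizes for illustration only.
BASE_PARAMS = 355_000_000
NUM_LAYERS = 24
HIDDEN_SIZE = 1024
PREFIX_LEN = 10
NUM_TASKS = 100

per_task = prefix_param_count(PREFIX_LEN, NUM_LAYERS, HIDDEN_SIZE)  # 491520
fraction = per_task / BASE_PARAMS  # on the order of 0.1 percent

# Storage for many tasks: one full model copy per task, versus
# one shared frozen model plus a small prefix per task.
full_copies_total = NUM_TASKS * BASE_PARAMS
shared_total = BASE_PARAMS + NUM_TASKS * per_task

print(f"prefix parameters per task: {per_task}")
print(f"fraction of base model: {fraction:.4%}")
print(f"100 fine-tuned copies: {full_copies_total} parameters")
print(f"1 frozen model + 100 prefixes: {shared_total} parameters")
```

Under these assumptions, adding a task costs about half a million stored parameters instead of a new copy of the whole model, which is the storage argument the paragraph above makes.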

Prefix-tuning was an early and influential member of the parameter-efficient fine-tuning family, alongside adapters and the later prompt-tuning and LoRA methods. Its practical promise is the same theme that runs through the whole open-weight ecosystem: you can specialize a large model for a new job without paying the cost of retraining or storing the entire thing.

Last verified June 7, 2026