“Parameter-Efficient Transfer Learning for NLP” was submitted to arXiv on February 2, 2019 by Neil Houlsby and colleagues at Google. It introduced adapter modules: small, new neural network layers inserted between the existing layers of a large pretrained model such as BERT. During fine-tuning the original model weights are frozen and only the adapters are trained, so the bulk of the network is shared unchanged across every downstream task.
The authors adapted BERT to 26 text classification tasks and reported coming within 0.4 percent of the performance of full fine-tuning while adding only about 3.6 percent new parameters per task. Full fine-tuning, by contrast, requires training and storing a complete copy of the model for each task. Adapters made it practical to support many tasks from one base model without the storage blowup.
This work is widely credited as the origin of the parameter-efficient fine-tuning idea that later flourished with prefix-tuning, prompt tuning, and LoRA. The lesson it established, that you can specialize a frozen giant by training a tiny add-on, became central to how the open-weight model ecosystem is customized today.