YaRN: Efficient Context Window Extension of Large Language Models

YaRN (Yet another RoPE extensioN method), published by Bowen Peng, Jeffrey Quesnelle, Honglu Fan, and Enrico Shippole in August 2023, is a technique for extending the context window of an already-trained language model, that is, how many tokens it can take into account at once. Many models use rotary position embeddings (RoPE) to encode where each token sits in a sequence. RoPE works well within the training length but breaks down on sequences longer than any the model saw during training, which caps the usable context window.
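To make the role of RoPE concrete, here is a minimal sketch in plain Python. It assumes the common formulation in which pair i of a vector is rotated by an angle of position times base^(-2i/d), with base 10000. The function names (rope_frequencies, apply_rope, dot) and the choice to pair adjacent elements are illustrative assumptions, not taken from any particular library; real implementations often pair dimensions differently.

```python
import math

def rope_frequencies(head_dim, base=10000.0):
    """Per-pair rotation frequencies: theta_i = base ** (-2i / head_dim)."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

def apply_rope(vec, position, base=10000.0):
    """Rotate adjacent pairs (x0, x1), (x2, x3), ... by position * theta_i.

    Assumption: adjacent-pair layout, chosen here for readability.
    """
    out = []
    for i, theta in enumerate(rope_frequencies(len(vec), base)):
        angle = position * theta
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    """Plain dot product, standing in for an attention score."""
    return sum(x * y for x, y in zip(a, b))

# Illustrative query and key vectors (made-up values).
q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]

# Both pairs of positions are 3 tokens apart, so the scores should match
# up to floating-point error.
near = dot(apply_rope(q, 5), apply_rope(k, 2))
far = dot(apply_rope(q, 103), apply_rope(k, 100))
print(near, far)
```

The two printed scores agree because the score between a rotated query and a rotated key depends only on the distance between their positions. The difficulty with long inputs is that positions beyond the training length produce rotation angles the model was never trained on.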

YaRN modifies how RoPE’s rotation frequencies are scaled when stretching to longer contexts, doing so in a way that preserves quality far better than naive position interpolation. Crucially, it is compute-efficient: the authors report achieving strong long-context performance with roughly 10 times fewer tokens and 2.5 times fewer training steps than prior extension methods, and the resulting models can even generalize beyond the specific lengths used during fine-tuning.
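The frequency scaling can be sketched as follows. This is a simplified reading of the scheme described in the YaRN paper, not the authors' reference code: each RoPE frequency is blended between its original value and its position-interpolated value (divided by the scale factor), depending on how many full rotations that dimension completes over the original context length. The function names are hypothetical, and the defaults alpha=1 and beta=32 are the values the paper suggests for LLaMA-family models, so treat them as assumptions for any other model.

```python
import math

def yarn_frequencies(head_dim, orig_len, scale, base=10000.0,
                     alpha=1.0, beta=32.0):
    """Blend original and interpolated RoPE frequencies per dimension pair.

    scale is the target context length divided by the original one.
    alpha and beta are assumed ramp thresholds (see the note above).
    """
    freqs = []
    for i in range(head_dim // 2):
        theta = base ** (-2.0 * i / head_dim)
        # Number of full rotations this pair makes across the original context.
        rotations = orig_len * theta / (2.0 * math.pi)
        # Ramp: 0 means fully interpolate, 1 means leave untouched.
        ramp = min(1.0, max(0.0, (rotations - alpha) / (beta - alpha)))
        freqs.append((1.0 - ramp) * theta / scale + ramp * theta)
    return freqs

def yarn_attention_factor(scale):
    """Factor applied to rotated queries and keys: 0.1 * ln(scale) + 1."""
    return 0.1 * math.log(scale) + 1.0 if scale > 1.0 else 1.0

# Example: stretch a 4096-token model by a factor of 16 (illustrative numbers).
freqs = yarn_frequencies(head_dim=128, orig_len=4096, scale=16.0)
factor = yarn_attention_factor(16.0)
print(freqs[0], freqs[-1], factor)
```

In this sketch the fastest-rotating dimensions, which encode fine-grained local order, keep their original frequencies, while the slowest ones are interpolated as in plain position interpolation. Dimensions in between get a linear mix of the two. The attention factor is a separate small rescaling of the attention logits that the paper pairs with the frequency change.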

Because it is cheap and effective, YaRN was quickly adopted by the open-source community to extend models like LLaMA to much longer context windows. From a business perspective, YaRN is one practical reason that long-context versions of open models appeared so quickly: extending an existing model’s reach proved far cheaper than training a long-context model from scratch.
