Matryoshka Representation Learning

Matryoshka Representation Learning (MRL), published by Aditya Kusupati and collaborators in May 2022, addresses a tradeoff in embedding size: longer vectors are more accurate but cost more to store and search, while shorter vectors are cheaper but less precise. Normally you pick one size up front and train a model for it. MRL instead trains a single embedding whose information is nested coarse-to-fine, like the Russian dolls it is named after. The first few dimensions carry the most important signal, the next dimensions add finer detail, and so on, so that each leading prefix of the vector is a usable embedding in its own right.
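The nesting comes from the training objective: the loss is evaluated on several prefixes of the same embedding and summed, so the leading dimensions are pushed to be useful on their own. The following is a minimal sketch for a single classification example, assuming one linear classifier head per prefix length and equal weighting by default. The names `matryoshka_loss`, `heads`, and `weights` are illustrative, not taken from the paper's code.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]

def matryoshka_loss(embedding, label, heads, weights=None):
    """Sum the classification loss over nested prefixes of one embedding.

    heads maps a prefix length m to a weight matrix with one row per class
    and m columns (a hypothetical layout chosen for this sketch).
    weights optionally maps m to a relative importance; default is 1.0 each.
    """
    total = 0.0
    for m, matrix in heads.items():
        prefix = embedding[:m]  # only the first m dimensions are seen
        logits = [sum(w * x for w, x in zip(row, prefix)) for row in matrix]
        scale = 1.0 if weights is None else weights[m]
        total += scale * cross_entropy(logits, label)
    return total

# Toy usage: a 4-dimensional embedding with heads at prefix lengths 2 and 4.
heads = {
    2: [[0.0, 0.0] for _ in range(3)],
    4: [[0.0, 0.0, 0.0, 0.0] for _ in range(3)],
}
loss = matryoshka_loss([1.0, 2.0, 3.0, 4.0], label=0, heads=heads)
```

In real training the encoder and all heads would be updated by gradient descent on this summed loss; the sketch only shows how the loss is assembled.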

The practical payoff is that a downstream task can simply truncate the embedding to its first few dimensions, keeping whatever length it can afford, and still get a usable representation, with no separate model and no extra inference cost. The paper reports embeddings up to 14x smaller for ImageNet-1K classification at the same accuracy, up to 14x faster real-world large-scale retrieval, and accuracy gains of up to 2 percent on long-tail few-shot classification. The results held across vision, vision-language, and language models.
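Truncation itself is just keeping a prefix of the vector. A minimal sketch follows, assuming embeddings are plain lists of floats and that similarity is measured by cosine or dot product, in which case the shortened vector is usually rescaled to unit length. The function name `truncate_embedding` is illustrative.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` coordinates and rescale them to unit length."""
    if not 1 <= dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    prefix = vec[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        return list(prefix)  # nothing to rescale in an all-zero prefix
    return [x / norm for x in prefix]

# Toy usage: shorten a 3-dimensional vector to its first 2 dimensions.
short = truncate_embedding([3.0, 4.0, 12.0], 2)
```

This only gives good results for embeddings trained with a Matryoshka-style objective; truncating an ordinary embedding generally discards information unevenly.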

MRL has since been adopted by major commercial embedding APIs, which is why some models now let you request shorter embeddings on demand. In practice, this means you can tune the cost-accuracy tradeoff of a search or RAG system after the model has been trained, choosing smaller, cheaper vectors where speed matters and longer ones where precision matters.
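One way to use short and long vectors together is a two-stage search: score every candidate on a short prefix, then rescore only a small shortlist with the full vectors. The sketch below assumes an in-memory corpus of equal-length lists and a plain dot-product score; `two_stage_search` and its parameters are illustrative names, and a production system would likely use an approximate nearest-neighbor index for the first stage.

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def two_stage_search(query, corpus, short_dim, shortlist_size, top_k):
    """Return indices of the top_k corpus vectors for `query`.

    Stage 1 ranks every vector using only its first `short_dim` coordinates.
    Stage 2 reranks just the shortlist using the full vectors.
    """
    coarse = sorted(
        range(len(corpus)),
        key=lambda i: dot(query[:short_dim], corpus[i][:short_dim]),
        reverse=True,
    )[:shortlist_size]
    fine = sorted(coarse, key=lambda i: dot(query, corpus[i]), reverse=True)
    return fine[:top_k]

# Toy usage: 4-dimensional vectors, shortlist built from the first 2 dimensions.
corpus = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.0, 0.0, 0.4],
    [0.0, 1.0, 0.0, 0.0],
    [0.8, 0.0, 0.6, 0.0],
]
hits = two_stage_search([0.9, 0.0, 0.0, 0.4], corpus,
                        short_dim=2, shortlist_size=2, top_k=1)
```

The shortlist size controls the tradeoff: a larger shortlist costs more full-length comparisons but is less likely to drop a good match in the first stage.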

