Product Quantization for Nearest Neighbor Search

Product Quantization for Nearest Neighbor Search, by Herve Jegou, Matthijs Douze, and Cordelia Schmid, was published in IEEE Transactions on Pattern Analysis and Machine Intelligence in 2011. It introduced a foundational technique for searching enormous collections of high-dimensional vectors quickly and with little memory. The problem it addresses is that storing millions or billions of full embedding vectors and comparing a query against every one of them exactly is too slow and too memory-hungry to be practical.

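Product quantization splits each vector into several lower-dimensional sub-vectors and quantizes each sub-vector separately to one of a small set of learned centroids. A full vector is then represented by a short code: just the list of centroid indices for its parts. For example, with 8 sub-vectors and 256 centroids per sub-quantizer, each vector is stored in 8 bytes. Distances between a query and the database can be estimated directly from these compact codes using precomputed lookup tables, so search becomes fast and the index fits in a fraction of the original memory. The approximation stays accurate because the effective codebook is the Cartesian product of the sub-codebooks: m sub-quantizers with k centroids each can represent k^m distinct reconstruction points while only m × k centroids have to be learned and stored.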
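The steps described above (split, quantize each part, store indices, estimate distances from lookup tables) can be illustrated with a small sketch. This is a minimal pure-Python illustration under stated assumptions, not the paper's or any library's implementation: the function names (`train_pq`, `encode`, `adc_table`, `adc_distance`), the plain k-means trainer, and the toy sizes (8 dimensions, m = 4 sub-vectors, k = 16 centroids, 200 random vectors) are all chosen here for readability.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def split(vec, m):
    """Cut a vector into m contiguous sub-vectors of equal length."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]

def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; an assumed, simple way to learn centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            buckets[j].append(p)
        for j, bucket in enumerate(buckets):
            if bucket:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids

def train_pq(vectors, m, k):
    """Learn one codebook of k centroids for each of the m sub-spaces."""
    parts = [split(v, m) for v in vectors]
    return [kmeans([p[i] for p in parts], k, seed=i) for i in range(m)]

def encode(vec, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    subs = split(vec, len(codebooks))
    return [min(range(len(cb)), key=lambda c: sq_dist(sub, cb[c]))
            for sub, cb in zip(subs, codebooks)]

def decode(code, codebooks):
    """Rebuild the approximate vector by concatenating the chosen centroids."""
    out = []
    for idx, cb in zip(code, codebooks):
        out.extend(cb[idx])
    return out

def adc_table(query, codebooks):
    """Per sub-space, distances from the query's sub-vector to every centroid."""
    subs = split(query, len(codebooks))
    return [[sq_dist(sub, c) for c in cb] for sub, cb in zip(subs, codebooks)]

def adc_distance(code, table):
    """Estimate the squared distance with m table lookups and additions."""
    return sum(table[i][idx] for i, idx in enumerate(code))

# Toy data: 200 random 8-dimensional vectors (illustrative only).
rng = random.Random(42)
data = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]

codebooks = train_pq(data, m=4, k=16)
codes = [encode(v, codebooks) for v in data]   # 4 small integers per vector

query = [rng.gauss(0, 1) for _ in range(8)]
table = adc_table(query, codebooks)            # built once per query
best = min(range(len(codes)), key=lambda i: adc_distance(codes[i], table))
print("approximate nearest neighbor index:", best)
```

In this sketch the query itself is left unquantized and only the database vectors are encoded, so the table-based estimate equals the exact squared distance between the query and the reconstructed (decoded) vector. Scanning the database then costs m lookups and additions per stored vector instead of a full-dimensional distance computation.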

The method is a cornerstone of modern vector search and is built into the widely used FAISS library, which two of the paper's authors, Jegou and Douze, later helped develop at Facebook AI Research.

For a business, product quantization is a big part of why semantic search and retrieval-augmented systems can run over very large document sets at acceptable cost: it shrinks the index dramatically while keeping results close to exact.
