Product Quantization for Nearest Neighbor Search, by Hervé Jégou, Matthijs Douze, and Cordelia Schmid, was published in IEEE Transactions on Pattern Analysis and Machine Intelligence in 2011. It describes a foundational technique for searching enormous collections of high-dimensional vectors quickly and with little memory. The problem it addresses is that storing millions or billions of full embedding vectors and comparing a query against all of them exactly is too slow and too memory-hungry to be practical.
Product quantization splits each vector into several lower-dimensional sub-vectors and quantizes each sub-vector separately to one of a small set of learned centroids. A full vector is then represented by a short code: just the list of centroid indices for its parts. Distances between a query and the database can be estimated directly from these compact codes using precomputed lookup tables, so search becomes fast and the index fits in a fraction of the original memory. The approximation stays accurate because the codebook is implicitly the Cartesian product of the sub-codebooks: with m sub-vectors and k centroids per sub-vector, only m × k centroids are learned and stored, yet they combine into k^m distinct reconstruction points.
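The steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the paper's or FAISS's implementation: the function names (train_pq, encode, adc_distances), the plain Lloyd k-means loop, and the parameter choices are all assumptions made for the example. It trains one small codebook per sub-space, encodes each vector as a row of centroid indices, and estimates squared Euclidean distances to a query by summing entries of a per-query lookup table.

```python
import numpy as np


def train_pq(data, m, k, n_iter=10, seed=0):
    """Learn one k-centroid codebook per sub-space with plain Lloyd k-means.

    Illustrative only: m sub-vectors, k centroids each (k <= 256 so that
    every index fits in one byte).
    """
    rng = np.random.default_rng(seed)
    n, d = data.shape
    assert d % m == 0, "dimension must be divisible by m"
    assert k <= 256, "indices are stored as uint8"
    ds = d // m  # dimension of each sub-vector
    codebooks = np.empty((m, k, ds), dtype=np.float64)
    for j in range(m):
        sub = data[:, j * ds:(j + 1) * ds]
        # Initialise centroids from k distinct training points.
        centroids = sub[rng.choice(n, size=k, replace=False)].copy()
        for _ in range(n_iter):
            dists = ((sub[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            assign = dists.argmin(axis=1)
            for c in range(k):
                members = sub[assign == c]
                if len(members):  # keep the old centroid if a cluster empties
                    centroids[c] = members.mean(axis=0)
        codebooks[j] = centroids
    return codebooks


def encode(data, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    m, k, ds = codebooks.shape
    codes = np.empty((data.shape[0], m), dtype=np.uint8)
    for j in range(m):
        sub = data[:, j * ds:(j + 1) * ds]
        dists = ((sub[:, None, :] - codebooks[j][None, :, :]) ** 2).sum(axis=2)
        codes[:, j] = dists.argmin(axis=1)
    return codes


def adc_distances(query, codes, codebooks):
    """Estimate squared distances from an uncompressed query to all codes."""
    m, k, ds = codebooks.shape
    # table[j, c] = squared distance from the query's j-th sub-vector
    # to centroid c of sub-space j; built once per query.
    table = ((query.reshape(m, 1, ds) - codebooks) ** 2).sum(axis=2)
    # For every code, pick one table entry per sub-space and add them up.
    return table[np.arange(m), codes].sum(axis=1)


rng = np.random.default_rng(0)
database = rng.normal(size=(2000, 32))        # 2000 vectors, 32 dimensions
codebooks = train_pq(database, m=4, k=16)     # 4 sub-vectors, 16 centroids each
codes = encode(database, codebooks)           # shape (2000, 4), one byte per index
query = rng.normal(size=32)
approx = adc_distances(query, codes, codebooks)
nearest = np.argsort(approx)[:5]              # indices of the 5 closest codes
```

In this toy setting each 32-dimensional vector is stored as 4 one-byte indices. The estimate returned by adc_distances is exactly the squared distance between the query and each vector's reconstruction from its centroids, so the only error comes from quantizing the database, not from the table lookup.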
The method is a cornerstone of modern vector search and is built into the widely used FAISS library, which two of the paper's authors, Jégou and Douze, later helped develop at Facebook AI Research.
For a business, product quantization is a big part of why semantic search and retrieval-augmented systems can run over very large document sets at acceptable cost: it shrinks the index dramatically while keeping results close to exact.