Deep Learning Recommendation Model (DLRM)

In 2019 Maxim Naumov and colleagues at Facebook (now Meta) released the Deep Learning Recommendation Model, or DLRM, along with an open-source implementation. The paper is notable less for a single clever layer than for being a clear, production-grounded account of how a company that serves recommendations to billions of people structures the model and the systems around it.

DLRM handles two kinds of inputs. Dense numerical features pass through a small neural network (the "bottom" multilayer perceptron), while sparse categorical features (such as which user, which item, which page) are each mapped through an embedding table into a learned vector of the same width. The model then computes pairwise interactions between all these vectors, as dot products in the published design, and feeds the result through a second stack of layers (the "top" network) to predict, for instance, the probability of a click. The hard part is scale: the embedding tables can hold billions of entries and dominate memory, while the dense layers dominate computation. DLRM addresses this with a hybrid parallelism scheme, splitting the giant embedding tables across machines (model parallelism) while replicating the smaller dense network on every machine (data parallelism).
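The data flow described above can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the released implementation: the layer sizes, table sizes, random weights, and the helper names (make_mlp, mlp, dlrm_forward) are all hypothetical, and dot products are used as the pairwise interaction.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(sizes):
    """Build random (W, b) pairs for consecutive layer sizes (toy initialization)."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(x, layers):
    """Apply the layers with ReLU between them; no activation after the last one."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

def dlrm_forward(dense, sparse_ids, bottom, tables, top):
    """dense: (batch, num_dense) floats; sparse_ids: (batch, num_tables) ints."""
    d = mlp(dense, bottom)                                  # (batch, dim)
    # One embedding lookup per categorical feature.
    embs = [table[sparse_ids[:, t]] for t, table in enumerate(tables)]
    vecs = np.stack([d] + embs, axis=1)                     # (batch, F, dim)
    # Dot product of every vector with every other vector.
    dots = vecs @ vecs.transpose(0, 2, 1)                   # (batch, F, F)
    rows, cols = np.triu_indices(vecs.shape[1], k=1)        # each pair once
    pairs = dots[:, rows, cols]                             # (batch, F*(F-1)/2)
    z = np.concatenate([d, pairs], axis=1)
    logit = mlp(z, top)[:, 0]
    return 1.0 / (1.0 + np.exp(-logit))                     # click probability

# Hypothetical sizes, chosen only to keep the example small.
num_dense, dim, table_rows, batch = 4, 8, [100, 50, 20], 5
bottom = make_mlp([num_dense, 16, dim])
tables = [rng.normal(0.0, 0.1, (n, dim)) for n in table_rows]
num_vecs = 1 + len(tables)
top = make_mlp([dim + num_vecs * (num_vecs - 1) // 2, 16, 1])

dense = rng.normal(size=(batch, num_dense))
sparse_ids = np.stack([rng.integers(0, n, size=batch) for n in table_rows], axis=1)
probs = dlrm_forward(dense, sparse_ids, bottom, tables, top)
print(probs.shape)
```

Even in this toy the asymmetry is visible: the tables are plain lookups whose size grows with the number of categories, while the two small networks are where the matrix multiplications happen.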

By open-sourcing both the model and a benchmark, Meta gave the field a realistic reference point for industrial-scale recommendation, and DLRM became a standard benchmark workload in hardware and training-efficiency research.

For a business reader, DLRM is a window into what consumer-internet recommendation really costs: the intelligence lives in enormous tables of learned vectors, and serving it is as much a distributed-systems problem as a modeling one.
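A back-of-envelope calculation makes the memory side of that cost concrete. The figures below are assumptions chosen for illustration (one table with a billion rows, 64-dimensional vectors stored as 32-bit floats, accelerators with 80 GB of memory); they are not taken from the paper, and the estimate covers weights only, ignoring optimizer state and activations.

```python
import math

def table_bytes(num_rows, dim, bytes_per_value=4):
    """Memory needed to store one embedding table's weights."""
    return num_rows * dim * bytes_per_value

# Hypothetical sizes, for illustration only.
rows = 1_000_000_000     # one billion categories in a single table
dim = 64                 # embedding width
device_gb = 80           # memory on one accelerator

size_gb = table_bytes(rows, dim) / 1e9
min_devices = math.ceil(size_gb / device_gb)

print(f"{size_gb:.0f} GB for one table, at least {min_devices} devices to hold it")
```

Under these assumptions a single table needs 256 GB and cannot fit on one device, which is why the tables are split across machines while the much smaller dense network can simply be copied to each.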
