At the ACM Recommender Systems conference in 2016, Google engineers Paul Covington, Jay Adams, and Emre Sargin published “Deep Neural Networks for YouTube Recommendations.” The paper described how one of “the largest scale and most sophisticated industrial recommendation systems in existence” worked, and how deep learning had reshaped it.
The system was organized as two deep neural networks in sequence. A candidate generation model first narrowed YouTube’s enormous catalog down to a few hundred videos likely to interest a given user, drawing on signals such as watch history, search history, and context. A separate ranking model then scored those candidates in detail to order what the viewer actually saw. This two-stage “funnel,” a fast, broad recall step followed by a precise ranking step, became a template that recommendation teams across the industry adopted.
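For readers who want to see the shape of the funnel, here is a minimal sketch in Python. It is a toy under stated assumptions, not the paper's model: the catalog is random vectors, a simple dot product stands in for the candidate generation network, and the ranking score with its extra per-video feature is hypothetical. All names (generate_candidates, rank, ranking_score, extra_feature) are invented for illustration.

```python
import random

random.seed(0)  # fixed seed so the toy data is reproducible

DIM = 8              # embedding size (illustrative)
CATALOG_SIZE = 5000  # stand-in for a far larger real catalog
N_CANDIDATES = 200   # "a few hundred" survivors of stage one


def random_vector(dim):
    return [random.gauss(0.0, 1.0) for _ in range(dim)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy catalog: video id -> embedding vector.
catalog = {vid: random_vector(DIM) for vid in range(CATALOG_SIZE)}
# Toy user vector; a real system would derive it from watch history,
# search history, and context.
user_vector = random_vector(DIM)
# Hypothetical extra signal that only the ranking stage looks at.
extra_feature = {vid: random.random() for vid in catalog}


def generate_candidates(user_vec, videos, k):
    """Stage 1: cheap score over the whole catalog, keep the top k ids."""
    by_score = sorted(videos, key=lambda vid: dot(user_vec, videos[vid]), reverse=True)
    return by_score[:k]


def ranking_score(user_vec, vid, videos, features):
    """Hypothetical richer score, affordable because few videos remain."""
    return dot(user_vec, videos[vid]) + 0.5 * features[vid]


def rank(user_vec, candidate_ids, videos, features):
    """Stage 2: order only the candidates by the richer score."""
    return sorted(
        candidate_ids,
        key=lambda vid: ranking_score(user_vec, vid, videos, features),
        reverse=True,
    )


candidates = generate_candidates(user_vector, catalog, N_CANDIDATES)
ranked = rank(user_vector, candidates, catalog, extra_feature)
print(len(catalog), "videos ->", len(candidates), "candidates -> top 5:", ranked[:5])
```

The point of the split is cost: the first stage touches every video, so it must be cheap, while the second stage can afford a more detailed score because it only sees a few hundred items.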
The paper was notable partly because large platforms rarely describe their recommendation machinery in public. It documented practical lessons, for example that representing each video and search token as a learned embedding, and adding a feature for the “age” of each training example, improved freshness and accuracy. The recommendation engine it described drives a large share of the time people spend on YouTube.
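The “age” feature is easy to picture in code. The sketch below assumes a simplified setup: each logged watch becomes one training row, and its age is the time between the log entry and the end of the training window. The paper describes setting this feature to zero when serving recommendations, so that the model predicts as if at the very end of that window. The WatchEvent type, field names, and the choice of days as the unit are assumptions made for illustration.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class WatchEvent:
    user_id: str
    video_id: str
    timestamp: float  # seconds; when the watch was logged


def add_example_age(events, training_end):
    """Build training rows, each tagged with its age in days
    relative to the end of the training window."""
    rows = []
    for event in events:
        age_days = (training_end - event.timestamp) / SECONDS_PER_DAY
        rows.append(
            {
                "user_id": event.user_id,
                "video_id": event.video_id,
                "example_age_days": age_days,
            }
        )
    return rows


def serving_features(user_id):
    """At serving time the age is fixed at zero, i.e. 'now'."""
    return {"user_id": user_id, "example_age_days": 0.0}


# Toy log: two watches, three days and one day before the window closes.
training_end = 10 * SECONDS_PER_DAY
events = [
    WatchEvent("u1", "v42", 7 * SECONDS_PER_DAY),
    WatchEvent("u1", "v99", 9 * SECONDS_PER_DAY),
]
training_rows = add_example_age(events, training_end)
print([row["example_age_days"] for row in training_rows])  # [3.0, 1.0]
print(serving_features("u1"))
```

Giving the model the age of each example lets it learn how popularity shifts over time, instead of averaging over the whole training period.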
For business readers, this paper is a rare look inside the AI that decides what billions of people watch next. It is also a reference design: the candidate-generation-then-ranking pattern it popularized now sits behind feeds and “for you” recommendations across many consumer products.