RL Course by David Silver - Lecture 1: Introduction to Reinforcement Learning

This is the first lecture of David Silver’s Reinforcement Learning course, taught at University College London and posted on the official Google DeepMind YouTube channel. Silver led the AlphaGo project at DeepMind, and his ten-lecture course became a standard introduction to reinforcement learning for students and practitioners.

Lecture 1 lays out the basic vocabulary: the agent and environment, the reward signal, states and actions, and the central idea that an agent learns by trial and error to maximize cumulative reward over time. Silver distinguishes reinforcement learning from supervised and unsupervised learning, introduces the notions of policy, value function, and model, and frames the exploration-versus-exploitation trade-off that runs through the rest of the course.
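
The vocabulary above can be made concrete with a small sketch. The example below is an illustration under stated assumptions, not code from the lecture: it uses a one-state, three-armed bandit as the environment, and the names `BanditEnvironment`, `EpsilonGreedyAgent`, and `run`, as well as the win probabilities and the value of epsilon, are all hypothetical choices. It shows the agent-environment loop, a reward signal, a running value estimate per action, and an epsilon-greedy policy as one simple way to balance exploration and exploitation.

```python
import random


class BanditEnvironment:
    """Hypothetical environment: each action pays reward 1.0 with a fixed probability."""

    def __init__(self, win_probabilities, seed=0):
        self.win_probabilities = win_probabilities
        self.rng = random.Random(seed)

    def step(self, action):
        # The reward signal is the only feedback the agent receives.
        return 1.0 if self.rng.random() < self.win_probabilities[action] else 0.0


class EpsilonGreedyAgent:
    """Hypothetical agent that keeps one value estimate per action."""

    def __init__(self, n_actions, epsilon=0.1, seed=1):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value_estimates = [0.0] * n_actions
        self.action_counts = [0] * n_actions

    def act(self):
        # Policy: explore with probability epsilon, otherwise exploit
        # the action with the highest current value estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.value_estimates))
        return max(range(len(self.value_estimates)),
                   key=self.value_estimates.__getitem__)

    def learn(self, action, reward):
        # Incremental sample average of the rewards seen for this action.
        self.action_counts[action] += 1
        step_size = 1.0 / self.action_counts[action]
        self.value_estimates[action] += step_size * (reward - self.value_estimates[action])


def run(n_steps=5000):
    """Run the agent-environment loop and return the agent and total reward."""
    env = BanditEnvironment([0.2, 0.5, 0.8])  # assumed win probabilities
    agent = EpsilonGreedyAgent(n_actions=3)
    total_reward = 0.0
    for _ in range(n_steps):
        action = agent.act()          # agent chooses an action
        reward = env.step(action)     # environment returns a reward
        agent.learn(action, reward)   # agent updates its value estimate
        total_reward += reward
    return agent, total_reward


if __name__ == "__main__":
    agent, total_reward = run()
    print("value estimates:", [round(v, 2) for v in agent.value_estimates])
    print("total reward:", total_reward)
```

With epsilon set to 0 the agent would never try an action it currently rates poorly, so it could settle on a weak action for good. A small non-zero epsilon keeps every action sampled often enough for the estimates to separate. A bandit has no state transitions, so this sketch leaves out the state and model parts of the vocabulary.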

It is a practitioner-level lecture that assumes comfort with basic probability but builds the rest from first principles. As a primary source it is valuable because it is taught by one of the researchers who helped turn reinforcement learning from a niche subject into the engine behind AlphaGo and AlphaZero.

Sources

Last verified June 7, 2026