Reinforcement learning (RL) trains an agent to make a sequence of decisions by interacting with an environment and receiving rewards or penalties. The canonical reference is Richard Sutton and Andrew Barto’s textbook “Reinforcement Learning: An Introduction,” hosted free on Sutton’s own site, which frames the problem as an agent learning what to do to maximize a numerical reward signal over time.
Unlike supervised learning, RL provides no labeled correct answers. The agent learns by trial and error, discovering which actions lead to better long-term outcomes. A landmark algorithm is Q-learning, introduced by Christopher Watkins in his 1989 PhD thesis and analyzed in Watkins and Peter Dayan’s paper “Q-learning” (Machine Learning, 1992, DOI 10.1007/BF00992698), which lets an agent learn the value of actions without a model of the environment.
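To make the idea of learning action values by trial and error concrete, here is a minimal sketch of tabular Q-learning in Python. The environment is a made-up five-state corridor with a reward only at the right end; the `step` function, the constants `ALPHA`, `GAMMA`, and `EPSILON`, and all other names are illustrative assumptions, not taken from Watkins’s work or from any library.

```python
import random

N_STATES = 5        # hypothetical corridor: states 0..4, goal at state 4
ACTIONS = (-1, +1)  # move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # illustrative learning rate, discount, exploration rate


def step(state, action):
    """Hypothetical environment: reward 1.0 only on reaching the rightmost state."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    # q[state][action_index] holds the current estimate of each action's value.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy choice: usually take the best-looking action,
            # occasionally explore; ties are broken at random.
            if rng.random() < EPSILON or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: nudge the estimate toward the observed reward
            # plus the discounted value of the best action in the next state.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy action for each non-goal state, read off the learned table.
greedy_policy = [ACTIONS[0] if left > right else ACTIONS[1] for left, right in q_table[:-1]]
```

The agent is never told that moving right is correct and never sees the rules inside `step`; it only observes states and rewards. In this toy setup the learned `greedy_policy` points right in every non-goal state, because the update propagates the value of the final reward backward through the corridor.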
RL has powered systems that play games at superhuman level and is increasingly used to fine-tune AI assistants based on human feedback. It tends to need many trials, which is why much RL research happens in simulators where trials are cheap.
Why business readers should care: Reinforcement learning suits problems that involve sequential decisions and delayed payoffs, such as pricing, logistics, and recommendation. It is also a core technique for aligning chat assistants with human preferences, which makes it central to how modern AI is shaped to be helpful.