In 1997 Wolfram Schultz, Peter Dayan, and P. Read Montague published “A Neural Substrate of Prediction and Reward” in Science (Vol. 275, pages 1593-1599). The paper connected a body of single-cell recordings from dopamine neurons in the primate midbrain to the mathematics of temporal-difference learning that had grown up inside machine learning.
The central observation was that dopamine neurons do not simply fire when an animal receives a reward. Instead, changes in their firing rate track the difference between the reward that was expected and the reward that actually arrived. An unexpected reward drives a burst of firing. Once a cue reliably predicts the reward, the burst shifts to the cue and the fully predicted reward itself produces little response. A predicted reward that fails to appear produces a dip below baseline at the moment it was due. This pattern closely matches the reward prediction error that temporal-difference algorithms use to update their value estimates.
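The three response patterns can be reproduced with a very small temporal-difference simulation. What follows is a minimal sketch, not the model from the paper. It assumes a single cue followed by a single reward, one learned value per time step after the cue, a prediction of zero before the cue, and no discounting (gamma = 1). The names run_trial, CUE_T, REWARD_T and N_STEPS are hypothetical and chosen only for illustration. The prediction error on arriving at time step t is computed as delta(t) = r(t) + gamma * V(t) - V(t - 1).

```python
# Toy temporal-difference (TD) model of a cue -> reward trial.
# All names and parameter values here are illustrative assumptions.

N_STEPS = 10    # time steps per trial
CUE_T = 2       # step at which the cue appears
REWARD_T = 6    # step at which the reward is delivered


def run_trial(w, reward=1.0, alpha=0.1, gamma=1.0, learn=True):
    """Run one trial and return the TD error at every time step.

    w[t] is the learned value prediction at step t. Before the cue the
    prediction is taken to be zero, because nothing yet signals a reward.
    """
    def value(t):
        return w[t] if CUE_T <= t < N_STEPS else 0.0

    deltas = [0.0] * N_STEPS
    for t in range(1, N_STEPS):
        r = reward if t == REWARD_T else 0.0
        # Prediction error: what arrived plus what is now expected,
        # minus what was expected one step earlier.
        delta = r + gamma * value(t) - value(t - 1)
        deltas[t] = delta
        # Only post-cue predictions are learnable in this sketch.
        if learn and t - 1 >= CUE_T:
            w[t - 1] += alpha * delta
    return deltas


# 1. Naive animal: the reward is unexpected.
weights = [0.0] * N_STEPS
naive = run_trial(weights, learn=False)

# 2. Train on many cue -> reward pairings, then probe without learning.
for _ in range(1000):
    run_trial(weights)
trained = run_trial(weights, learn=False)

# 3. Trained animal, but the predicted reward is withheld.
omitted = run_trial(weights, reward=0.0, learn=False)

for name, d in [("naive", naive), ("trained", trained), ("omitted", omitted)]:
    print(f"{name:8s} cue: {d[CUE_T]:+.2f}   reward time: {d[REWARD_T]:+.2f}")
```

In this sketch the naive run shows a positive error only at the reward, the trained run shows it only at the cue, and the omission run shows a negative error at the time the reward was due. Those are the burst, the shifted burst, and the dip described above, in the same order.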
The result was a rare two-way bridge between neuroscience and AI. A learning rule invented to make machines learn from delayed feedback turned out to describe, with surprising precision, a chemical signal the brain had been using all along. The work reframed dopamine from a generic pleasure chemical into a specific teaching signal and gave reinforcement-learning researchers biological grounding for their methods.
For a general reader, this is one of the clearest cases where an engineering idea and a biological mechanism converged on the same equation. That convergence is part of why reinforcement learning is taken seriously as a model of how both animals and machines learn from experience.