A Distributional Perspective on RL (C51)

Distributional reinforcement learning was put on a firm footing in “A Distributional Perspective on Reinforcement Learning,” posted to arXiv on July 21, 2017 by Marc G. Bellemare, Will Dabney, and Rémi Munos at DeepMind. Conventional value-based methods estimate a single number for each action: its expected return. The paper argues for modeling the entire distribution of possible returns instead.

The motivation is that two actions can have the same average payoff but very different risk profiles, and an agent that tracks the whole distribution keeps information that an averaged estimate discards. The authors prove that the distributional form of the Bellman operator is a contraction for policy evaluation under a Wasserstein-based metric, and they propose a practical algorithm that represents the return distribution as a categorical distribution over 51 fixed, evenly spaced return values (atoms), which is why it is commonly called C51. At the time of publication, the approach reached state-of-the-art performance on the Atari 2600 games of the Arcade Learning Environment.
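To make the idea of 51 fixed points concrete, the following is a minimal NumPy sketch of the projection step that categorical methods of this kind rely on: after a reward is added and the discount is applied, the shifted distribution no longer lines up with the fixed grid, so each point's probability is split between its two nearest grid neighbours. The support range of -10 to 10, the function name `project_target`, and the constant names are assumptions chosen for illustration, not details taken from the text above.

```python
import numpy as np

# Assumed settings for illustration: 51 points evenly spaced on [-10, 10].
N_ATOMS = 51
V_MIN, V_MAX = -10.0, 10.0
ATOMS = np.linspace(V_MIN, V_MAX, N_ATOMS)
DELTA_Z = (V_MAX - V_MIN) / (N_ATOMS - 1)


def project_target(next_probs, reward, gamma, done):
    """Project the distribution of reward + gamma * Z back onto the fixed grid.

    next_probs: probabilities over ATOMS for the next state-action pair.
    Returns a probability vector over the same ATOMS.
    """
    # Shift and shrink every grid point, then clip to the supported range.
    discount = 0.0 if done else gamma
    tz = np.clip(reward + discount * ATOMS, V_MIN, V_MAX)

    # Fractional grid index of each shifted point.
    b = np.clip((tz - V_MIN) / DELTA_Z, 0, N_ATOMS - 1)
    lower = np.floor(b).astype(int)
    upper = np.ceil(b).astype(int)

    target = np.zeros(N_ATOMS)
    # Split each point's probability between its two neighbours by proximity.
    np.add.at(target, lower, next_probs * (upper - b))
    np.add.at(target, upper, next_probs * (b - lower))
    # A point that lands exactly on the grid has lower == upper, so both
    # weights above are zero; give it its full probability directly.
    exact = lower == upper
    np.add.at(target, lower[exact], next_probs[exact])
    return target


# Two hypothetical return distributions with the same mean but different risk.
safe = np.zeros(N_ATOMS)
safe[N_ATOMS // 2] = 1.0      # always returns 0
risky = np.zeros(N_ATOMS)
risky[[0, -1]] = 0.5          # returns -10 or +10 with equal probability


def mean_and_std(probs):
    """Mean and standard deviation of a distribution over ATOMS."""
    mean = probs @ ATOMS
    std = np.sqrt(probs @ (ATOMS - mean) ** 2)
    return mean, std
```

Calling `mean_and_std(safe)` and `mean_and_std(risky)` gives the same mean for both vectors but very different spreads, which is exactly the information an expected-value estimate throws away. In a full agent the projected vector would serve as the training target for the network's predicted probabilities; that training loop is omitted here.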

Distributional RL became one of the six ingredients in the Rainbow agent and seeded a productive line of follow-up work. For a general reader, it reframes a basic question: instead of asking only what return to expect on average, the agent learns the range of outcomes it might face and how likely each one is.
