Prioritized Experience Replay

Prioritized Experience Replay was introduced in the paper of the same name, posted to arXiv on November 18, 2015 by Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver at DeepMind. It improves how a reinforcement learning agent reuses its memory of past interactions, which deep Q-networks store in a replay buffer and sample from during training.

The standard approach samples old transitions uniformly at random. The paper’s insight is that not all experiences are equally instructive. Transitions where the agent’s prediction was most wrong, as measured by the magnitude of the temporal-difference error, are expected to carry the most information, so the method replays those more frequently. Because this non-uniform sampling biases the updates, it applies importance-sampling weights during the update to correct for it. Applied to DQN, prioritized replay outperformed uniform replay on 41 of 49 Atari games.
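The sampling and correction steps can be sketched in a few lines of NumPy. This is a minimal illustration of the proportional variant, not the authors' implementation: the function names are hypothetical, the values of `alpha`, `beta`, and `eps` are illustrative defaults, and a real buffer would typically use a tree structure rather than recomputing probabilities over the whole array on each draw.

```python
import numpy as np

def sampling_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Turn TD errors into sampling probabilities.

    alpha controls how strongly priorities skew sampling
    (alpha=0 recovers uniform sampling). eps keeps transitions
    with zero error from never being replayed.
    """
    priorities = (np.abs(td_errors) + eps) ** alpha
    return priorities / priorities.sum()

def importance_weights(probs, beta=0.4):
    """Importance-sampling weights for every stored transition.

    Frequently sampled transitions get smaller weights. Dividing by
    the maximum keeps all weights at or below 1 (here the maximum is
    taken over the whole buffer, which is an assumption of this sketch).
    """
    n = len(probs)
    weights = (n * probs) ** (-beta)
    return weights / weights.max()

def sample_batch(td_errors, batch_size, rng, alpha=0.6, beta=0.4):
    """Draw transition indices by priority and return their weights."""
    probs = sampling_probabilities(td_errors, alpha)
    indices = rng.choice(len(probs), size=batch_size, p=probs)
    weights = importance_weights(probs, beta)[indices]
    return indices, weights

# Hypothetical TD errors for four stored transitions.
td_errors = np.array([0.1, 2.0, 0.5, 0.05])
rng = np.random.default_rng(0)
indices, weights = sample_batch(td_errors, batch_size=3, rng=rng)
```

In a training loop, each sampled transition's loss would be multiplied by its weight before the gradient step, and the stored TD errors would be refreshed with the newly computed values.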

Prioritized replay became a standard component of strong value-based agents, including Rainbow. For a general reader, it captures a simple and broadly useful idea: a learner makes faster progress by spending its attention on the cases it currently gets wrong.
