Asynchronous Methods for Deep Reinforcement Learning (A3C)

“Asynchronous Methods for Deep Reinforcement Learning,” posted to arXiv on February 4, 2016 and presented at ICML 2016 by Volodymyr Mnih and colleagues at DeepMind, introduced a family of parallel training methods, the best known of which is Asynchronous Advantage Actor-Critic, or A3C. Mnih had earlier led the Deep Q-Network work, and A3C tackled one of its practical weaknesses.

The Deep Q-Network relied on a large experience replay buffer to decorrelate its training data and stabilize learning. Replay is memory-hungry, and because the buffer holds transitions generated by older versions of the policy, it restricts training to off-policy methods such as Q-learning. A3C took a different route: it runs many agents in parallel, each exploring its own copy of the environment, and has them asynchronously update a shared set of network weights. Because the parallel agents encounter different situations at any given moment, their combined updates are far less correlated than those of a single agent, which stabilizes training without a replay buffer and makes on-policy methods such as actor-critic practical.
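The parallel-worker idea can be illustrated with a deliberately tiny sketch. It is not the paper's implementation: it assumes a hypothetical two-armed bandit in place of Atari, a two-entry table of policy logits and a single value estimate in place of a neural network, and Python threads as the workers. The names ARM_REWARDS, worker, and train, and all step sizes, are illustrative choices. Each worker snapshots the shared parameters, collects a few steps of experience with its own random stream, accumulates advantage-weighted gradients, and then writes them into the shared arrays without taking a lock.

```python
import threading

import numpy as np

# Hypothetical toy environment: a two-armed bandit where arm 1 pays more.
ARM_REWARDS = np.array([0.2, 1.0])


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.exp(logits - logits.max())
    return z / z.sum()


def worker(shared_theta, shared_v, seed, n_updates, t_max=5, lr_pi=0.1, lr_v=0.05):
    """One asynchronous actor-critic worker acting in its own environment copy."""
    rng = np.random.default_rng(seed)  # independent exploration per worker
    for _ in range(n_updates):
        # Snapshot the shared parameters into worker-local copies.
        theta = shared_theta.copy()
        v = shared_v[0]
        pi = softmax(theta)

        d_theta = np.zeros_like(theta)
        d_v = 0.0
        for _ in range(t_max):
            action = rng.choice(2, p=pi)
            reward = ARM_REWARDS[action]
            advantage = reward - v           # critic's value acts as the baseline
            grad_log_pi = -pi.copy()         # d log pi(a) / d theta = onehot(a) - pi
            grad_log_pi[action] += 1.0
            d_theta += advantage * grad_log_pi
            d_v += advantage                 # moves v toward the observed reward

        # Apply the accumulated update to the shared parameters in place,
        # without a lock; stale or lost updates are tolerated in this sketch.
        shared_theta += lr_pi * d_theta
        shared_v += lr_v * d_v


def train(n_workers=4, n_updates=400):
    """Run several workers against one shared parameter set."""
    shared_theta = np.zeros(2)  # policy logits shared by all workers
    shared_v = np.zeros(1)      # shared value estimate
    threads = [
        threading.Thread(target=worker, args=(shared_theta, shared_v, seed, n_updates))
        for seed in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return softmax(shared_theta), float(shared_v[0])


if __name__ == "__main__":
    probs, value = train()
    print("action probabilities:", probs)
    print("value estimate:", value)
```

Because every worker draws from a different random stream, the updates arriving at the shared arrays come from different experience at any moment, which is the decorrelation effect described above in miniature. No replay buffer appears anywhere in the loop; each batch of experience is used once and discarded.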

The practical payoff was striking. A3C surpassed the state of the art on the Atari benchmark while training in half the time on a single multi-core CPU, rather than requiring a GPU. The authors also showed it working on continuous control and on 3D navigation tasks. The method made deep reinforcement learning cheaper and more accessible and became a widely used baseline.
