Continuous control with deep reinforcement learning (DDPG)

“Continuous control with deep reinforcement learning,” posted to arXiv on September 9, 2015 by Timothy Lillicrap, Jonathan Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra at DeepMind, introduced the algorithm known as Deep Deterministic Policy Gradient, or DDPG. It solved a problem the Deep Q-Network could not: continuous action spaces.

The Deep Q-Network learned to play Atari games but could only choose among a small, fixed set of discrete actions, because it picks the action with the highest predicted value and that search is impractical over a continuous range. Many real control problems, such as setting the torque on a robot’s joints, require continuous outputs. DDPG is an actor-critic, model-free algorithm that pairs a critic, which estimates action values in the style of Q-learning, with an actor that outputs continuous actions directly, following the deterministic policy gradient. It borrowed the stabilizing tricks of the Deep Q-Network, including experience replay and target networks.

The authors demonstrated DDPG on more than 20 simulated physics tasks, including cartpole swing-up, dexterous manipulation, and legged locomotion, learning competitive policies directly from raw pixel inputs in some cases. DDPG became a standard baseline for continuous-control reinforcement learning and a foundation for later off-policy methods.

Continuous control with deep reinforcement learning (DDPG)

Sources

Related