Learning to walk in minutes with massively parallel deep RL

In September 2021 Nikita Rudin, David Hoeller, Philipp Reist, and Marco Hutter, of ETH Zurich and NVIDIA, published “Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning.” The paper showed that reinforcement-learning training for legged locomotion, which had typically taken hours or days, could be compressed to minutes by running thousands of simulated robots in parallel on a single workstation GPU.

The key idea is throughput. Earlier setups simulated one or a handful of robots; here the physics simulation and the policy network both run on the GPU, letting thousands of robot instances collect experience simultaneously. The authors paired this with a game-inspired curriculum that automatically promotes or demotes each simulated robot to harder or easier terrain based on how well it is doing, so the policy is always trained on appropriately difficult challenges.
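The promote-or-demote rule can be sketched in a few lines of Python. This is only an illustration of the idea described above, not the authors' implementation: the function name, the distance-based thresholds (`promote_at`, `demote_below`), and the choice to clamp levels at the hardest terrain are all assumptions made for the example.

```python
def update_terrain_levels(levels, distances, promote_at, demote_below, max_level):
    """Return a new terrain level for each simulated robot.

    levels       -- current difficulty level per robot (0 = easiest)
    distances    -- distance each robot covered in its last episode
    promote_at   -- hypothetical threshold: at or above this, move to harder terrain
    demote_below -- hypothetical threshold: below this, move to easier terrain
    max_level    -- hardest level available
    """
    new_levels = []
    for level, dist in zip(levels, distances):
        if dist >= promote_at:
            level += 1      # did well: harder terrain next episode
        elif dist < demote_below:
            level -= 1      # struggled: easier terrain next episode
        # Assumption: keep levels inside the valid range by clamping.
        new_levels.append(min(max(level, 0), max_level))
    return new_levels


# Four robots: one promoted, one demoted, two held at the range limits.
print(update_terrain_levels([0, 3, 5, 0], [6.0, 1.0, 6.0, 0.5],
                            promote_at=4.0, demote_below=2.0, max_level=5))
```

Because each robot's level depends only on its own last episode, a rule like this can be applied to thousands of robots at once, which is what lets the curriculum keep pace with the parallel simulation.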

Training the quadrupedal ANYmal robot to walk on flat ground took under four minutes, and learning to handle uneven terrain took about twenty minutes, a speedup of multiple orders of magnitude over prior work. The accompanying code, released as Legged Gym on top of NVIDIA’s Isaac Gym, became a widely used foundation for later locomotion research.

For a general reader, this paper matters because it turned legged-robot training from an overnight job into something a researcher can rerun over a coffee break, which shortens the cycle of trying an idea and seeing whether it works.
