Diffusion Policy: Visuomotor Policy Learning via Action Diffusion

Diffusion Policy was introduced by Cheng Chi, Zhenjia Xu, Siyuan Feng, Eric Cousineau, Yilun Du, Benjamin Burchfiel, Russ Tedrake, and Shuran Song (a team spanning Columbia University, the Toyota Research Institute, and MIT) in a paper submitted to arXiv on March 7, 2023. It applies the denoising diffusion technique behind modern image generators to robot control: rather than predicting an action directly, the policy starts from random noise and iteratively refines it into a sequence of actions, conditioned on recent camera observations.
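The refinement loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the observation is reduced to a 2-D goal position instead of camera images, `expert_actions` is a hypothetical stand-in for the demonstrated behavior, and `oracle_noise_predictor` replaces the trained neural network with a closed-form noise estimate that is only valid because this toy has exactly one correct action sequence per observation. The noise schedule values are likewise illustrative.

```python
import numpy as np

def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    # Linear noise schedule (illustrative values, not taken from the paper).
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def expert_actions(obs, horizon=8):
    # Hypothetical demonstration: evenly spaced waypoints from the origin
    # to the goal position contained in the observation.
    fractions = np.linspace(1.0 / horizon, 1.0, horizon)[:, None]
    return fractions * obs[None, :]

def oracle_noise_predictor(noisy_actions, obs, t, alpha_bars):
    # Stand-in for the learned network: with a single known target per
    # observation, the noise in a noisy sample can be computed exactly.
    target = expert_actions(obs, horizon=noisy_actions.shape[0])
    scale = np.sqrt(1.0 - alpha_bars[t])
    return (noisy_actions - np.sqrt(alpha_bars[t]) * target) / scale

def sample_actions(obs, horizon=8, num_steps=50, seed=0):
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    # Start from pure Gaussian noise shaped like an action sequence.
    actions = rng.standard_normal((horizon, obs.shape[0]))
    for t in reversed(range(num_steps)):
        eps_hat = oracle_noise_predictor(actions, obs, t, alpha_bars)
        # Remove the predicted noise (standard DDPM reverse-step mean).
        mean = (actions - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat)
        mean = mean / np.sqrt(alphas[t])
        if t > 0:
            # Re-inject a smaller amount of noise on all but the last step.
            var = betas[t] * (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t])
            actions = mean + np.sqrt(var) * rng.standard_normal(actions.shape)
        else:
            actions = mean
    return actions

goal = np.array([0.4, -0.2])
plan = sample_actions(goal)
print(plan.shape)  # one row per future action in the predicted sequence
```

In a real system the oracle would be a network trained on demonstrations, and the robot would typically execute only part of the predicted sequence before observing again and re-planning.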

Framing control as conditional denoising addresses several problems that had plagued behavior-cloning policies. It naturally represents multimodal action distributions: when there are several equally good ways to act, the policy can capture all of them instead of averaging them into something that works for none. It also scales gracefully to high-dimensional action sequences and trains stably. Across 12 tasks drawn from four different robot manipulation benchmarks, Diffusion Policy outperformed prior state-of-the-art robot learning methods, with an average improvement of 46.9 percent.
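The averaging problem is easy to see in a one-dimensional toy. The sketch below assumes a hypothetical scenario in which half of the demonstrations steer left (-1) and half steer right (+1) around an obstacle. A policy trained with mean squared error predicts their average, which points straight at the obstacle, while a denoising sampler returns one mode or the other. As a simplification, the trained network is replaced by the closed-form optimal noise estimate for this two-point distribution, and the schedule values are illustrative.

```python
import numpy as np

MODES = np.array([-1.0, 1.0])  # hypothetical demos: steer left or steer right

def regression_policy(demos):
    # The constant prediction that minimizes mean squared error is the mean.
    return float(np.mean(demos))

def optimal_noise_prediction(x, alpha_bar):
    # Posterior mean of the clean action when demos are an equal mix of -1
    # and +1; this closed form stands in for a trained network.
    x0_mean = np.tanh(np.sqrt(alpha_bar) * x / (1.0 - alpha_bar))
    return (x - np.sqrt(alpha_bar) * x0_mean) / np.sqrt(1.0 - alpha_bar)

def diffusion_policy_samples(num_samples=2000, num_steps=50, seed=0):
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(num_samples)  # every sample starts as noise
    for t in reversed(range(num_steps)):
        eps_hat = optimal_noise_prediction(x, alpha_bars[t])
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat)
        x = x / np.sqrt(alphas[t])
        if t > 0:
            var = betas[t] * (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t])
            x = x + np.sqrt(var) * rng.standard_normal(num_samples)
    return x

demos = np.repeat(MODES, 500)
samples = diffusion_policy_samples()
print("regression output:", regression_policy(demos))
print("share of samples near a demonstrated mode:",
      np.mean(np.abs(samples) > 0.9))
print("share steering right:", np.mean(samples > 0))
```

The regression output sits between the two demonstrated behaviors, whereas each denoised sample commits to one of them, which is the property the paragraph above describes.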

Why business readers should care: Diffusion Policy became one of the default recipes for teaching robots manipulation skills from human demonstrations, and underpins later systems that fold laundry or bus tables. The same generative-model machinery driving image and video tools turned out to be a strong way to generate physical motion, a sign of how broadly the diffusion technique transfers.
