Mobile ALOHA learns household tasks from cheap teleoperation

Mobile ALOHA was introduced in early 2024 by Zipeng Fu and Tony Z. Zhao, advised by Chelsea Finn at Stanford, and was published at the Conference on Robot Learning (CoRL). It extends the earlier stationary ALOHA tabletop system by mounting its two arms on a wheeled base and adding a whole-body teleoperation interface. A human operator can therefore drive the arms and the base at once, demonstrating tasks that require moving through an environment rather than just reaching across a table.

The system became widely known for a viral set of demonstrations. Among other tasks, it could sauté and serve shrimp, open a two-door wall cabinet to store heavy pots, call and enter an elevator, and rinse a used pan under a running faucet. These are bimanual, mobile, whole-body tasks of a kind robots had rarely performed autonomously. Crucially, the policies were learned by imitation from only about 50 human demonstrations per task, often co-trained with the existing static ALOHA datasets to boost performance.
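The co-training idea can be sketched as a batch sampler that mixes the two data sources. This is a minimal illustration under stated assumptions, not the authors' training code: the function name `sample_cotraining_batch`, the `static_prob` parameter, its 0.5 default, and the placeholder demonstration records are all hypothetical.

```python
import random


def sample_cotraining_batch(mobile_demos, static_demos, batch_size,
                            static_prob=0.5, rng=None):
    """Draw a mixed training batch from two demonstration sets.

    Each item is taken from the static data with probability
    `static_prob` (an assumed mixing ratio), otherwise from the
    mobile data. Sampling is with replacement.
    """
    rng = rng or random.Random()
    batch = []
    for _ in range(batch_size):
        # Pick the source dataset first, then a demonstration from it.
        source = static_demos if rng.random() < static_prob else mobile_demos
        batch.append(rng.choice(source))
    return batch


# Placeholder records standing in for real demonstrations (hypothetical).
mobile = [("mobile", i) for i in range(50)]
static = [("static", i) for i in range(200)]

batch = sample_cotraining_batch(mobile, static, batch_size=8,
                                rng=random.Random(0))
```

Choosing the source before choosing the demonstration keeps the small task-specific set from being swamped by the larger static set, whatever their relative sizes.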

Mobile ALOHA mattered for two reasons. First, the hardware was deliberately low-cost and open, putting capable bimanual mobile manipulation within reach of academic labs rather than only well-funded industry teams. Second, it was a vivid proof that complex, useful household skills can be taught through cheap teleoperation and imitation learning, reinforcing the data-driven, demonstration-first direction that robot learning has taken.

