Eureka: Human-Level Reward Design via Coding Large Language Models

Eureka was introduced by Yecheng Jason Ma, William Liang, Guanzhi Wang, De-An Huang, Osbert Bastani, Dinesh Jayaraman, Yuke Zhu, Linxi Fan, and Anima Anandkumar, a team led by researchers at NVIDIA and the University of Pennsylvania with coauthors also affiliated with Caltech and UT Austin. The paper was posted to arXiv on October 19, 2023 and accepted to ICLR 2024. It attacks one of the hardest parts of reinforcement learning (RL) for robots: designing the reward function. A reward that only approximates the intended behavior can be exploited by the agent or fail to guide it toward the task at all, and tuning rewards by hand is slow, expert work.

Eureka hands that job to a coding large language model. Given the environment's source code as context, it prompts GPT-4 to write reward functions as executable code, samples several candidates, trains an RL policy with each, and summarizes the resulting training statistics into textual feedback (which the authors call reward reflection). The model then refines the best candidate, so the reward code improves through an evolutionary search over successive rounds, without task-specific prompting or pre-defined reward templates. Across 29 open-source RL environments spanning 10 distinct robot morphologies, Eureka outperformed expert human-engineered rewards on 83 percent of tasks, with an average normalized improvement of 52 percent.
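
The loop described above can be sketched in Python. This is a toy sketch under loud assumptions, not Eureka's implementation: `propose_reward_code` is a hypothetical stand-in for the GPT-4 call that merely perturbs two numeric weights, `train_and_evaluate` replaces RL training with a greedy agent on a made-up one-dimensional reach task, and every name and default here is illustrative. What it does show is the structure of the search: sample several reward programs, execute each, score them with a task-fitness metric that is separate from the reward itself, keep the best, and turn per-component reward statistics into text feedback for the next round.

```python
import random
from typing import Callable, Dict, Tuple

ACTIONS = (-0.1, 0.0, 0.1)  # toy 1-D action set (hypothetical task)


def reward_code(w_dist: float, w_energy: float) -> str:
    """Python source for a candidate reward: total reward plus named components."""
    return (
        "def compute_reward(pos, action, goal):\n"
        f"    dist_term = -({w_dist!r}) * abs(goal - pos)\n"
        f"    energy_term = -({w_energy!r}) * abs(action)\n"
        "    return dist_term + energy_term, {'dist': dist_term, 'energy': energy_term}\n"
    )


def propose_reward_code(feedback: str, parent: Tuple[float, float],
                        rng: random.Random, sigma: float = 1.0):
    """Hypothetical stand-in for the LLM. A real system would prompt the model with
    environment source, the parent reward code, and `feedback`; this stub ignores
    `feedback` and only perturbs the parent's weights."""
    weights = (parent[0] + rng.gauss(0.0, sigma), parent[1] + rng.gauss(0.0, sigma))
    return reward_code(*weights), weights


def compile_reward(code: str) -> Callable:
    namespace: Dict[str, object] = {}
    exec(code, namespace)  # real systems should sandbox model-written code
    return namespace["compute_reward"]


def greedy_action(reward_fn: Callable, pos: float, goal: float) -> float:
    # Pick the action whose next state scores highest under the candidate reward.
    return max(ACTIONS, key=lambda a: reward_fn(pos + a, a, goal)[0])


def train_and_evaluate(reward_fn: Callable, goal: float = 1.0, steps: int = 20):
    """Hypothetical stand-in for an RL run. Returns the ground-truth task fitness
    (negative final distance to the goal) and the mean of each reward component."""
    pos, totals = 0.0, {}
    for _ in range(steps):
        action = greedy_action(reward_fn, pos, goal)
        _, parts = reward_fn(pos + action, action, goal)
        for name, value in parts.items():
            totals[name] = totals.get(name, 0.0) + value
        pos += action
    return -abs(goal - pos), {k: v / steps for k, v in totals.items()}


def eureka_style_search(iterations: int = 5, samples: int = 16, seed: int = 0):
    rng = random.Random(seed)
    parent = (0.5, 1.0)  # hypothetical start: energy penalty outweighs progress
    best_fitness, best_code, best_weights = float("-inf"), "", parent
    feedback = "No training feedback yet."
    for _ in range(iterations):
        for _ in range(samples):  # sample several reward candidates per round
            code, weights = propose_reward_code(feedback, parent, rng)
            try:
                fitness, means = train_and_evaluate(compile_reward(code))
            except Exception as err:  # a broken candidate is skipped
                feedback = f"Reward code failed: {err!r}"
                continue
            if fitness > best_fitness:  # keep the best candidate by task fitness
                best_fitness, best_code, best_weights = fitness, code, weights
                stats = ", ".join(f"{k}={v:.3f}" for k, v in means.items())
                feedback = f"Mean reward components: {stats}; fitness={fitness:.3f}"
        parent = best_weights  # next round refines the best reward found so far
    return best_fitness, best_code, feedback
```

In this toy task a reward whose energy penalty outweighs its progress term, such as `reward_code(0.1, 1.0)`, trains an agent that never moves, while `reward_code(1.0, 0.1)` reaches the goal; the search has to discover the second kind of reward by judging candidates on fitness rather than on their own reward values.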

Its most striking demonstration was a simulated Shadow Hand performing rapid pen-spinning tricks, a dexterous skill whose reward is notoriously hard to specify by hand. Eureka also pointed to a broader pattern: language models can serve not just as planners or perception modules for robots, but as automated designers of the training signal itself.
