EfficientZero was introduced in “Mastering Atari Games with Limited Data,” posted to arXiv on October 30, 2021 by Weirui Ye, Shaohuai Liu, Thanard Kurutach, Pieter Abbeel, and Yang Gao. It attacks one of deep reinforcement learning’s most stubborn weaknesses: even strong agents typically need enormous amounts of experience, often hundreds of millions of frames, to reach high performance.
Built on the MuZero architecture, EfficientZero adds techniques that extract far more learning from each interaction, including a self-supervised consistency objective for the learned model and corrections to how the agent estimates future value from limited, off-policy data. On the demanding Atari 100k benchmark, which limits the agent to roughly two hours of real-time gameplay, it reached a mean human-normalized score of 194 percent, close to what DQN achieves with 200 million frames, about 500 times more data.
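The "two hours" and "500 times" figures can be checked with simple arithmetic. The sketch below assumes the usual Atari conventions of a frame skip of 4 and an emulator running at 60 frames per second, and takes a 200-million-frame training run as the older-agent baseline. The `human_normalized` helper shows the standard way such percentages are computed; the scores passed to it are hypothetical, not results from the paper.

```python
# Back-of-the-envelope check of the Atari 100k data budget.
AGENT_STEPS = 100_000          # environment steps allowed by the benchmark
FRAME_SKIP = 4                 # assumed: standard Atari frame skip
FRAMES_PER_SECOND = 60         # assumed: Atari emulator frame rate
BASELINE_FRAMES = 200_000_000  # assumed: frame budget of the older baseline


def human_normalized(agent: float, random: float, human: float) -> float:
    """Score as a fraction of the gap between random play and a human."""
    return (agent - random) / (human - random)


frames = AGENT_STEPS * FRAME_SKIP            # game frames actually seen
hours = frames / FRAMES_PER_SECOND / 3600    # real-time play equivalent
data_ratio = BASELINE_FRAMES / frames        # how much more data the baseline uses

print(f"{frames} frames, about {hours:.2f} hours of play")
print(f"Baseline budget is {data_ratio:.0f}x larger")

# Hypothetical scores, for illustration only: 2.0 means 200 percent of human.
print(human_normalized(agent=150.0, random=50.0, human=100.0))
```

Under these assumptions the budget works out to 400,000 frames, a little under two hours of play, and the baseline uses 500 times as many frames. A benchmark-wide figure such as the 194 percent mean is the average of the per-game normalized scores.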
Sample efficiency is what stands between flashy game-playing demos and real-world deployment, where every interaction has a cost. For a business reader, EfficientZero points toward reinforcement learning that could be trained on the limited data available in physical or commercial settings rather than in unlimited simulation.