MuZero mastered Go, chess, shogi, and Atari without ever being told the rules

DeepMind’s earlier AlphaGo, AlphaGo Zero, and AlphaZero were each handed a perfect model of the game: they knew the rules and could simulate the consequences of any move. MuZero, published in Nature in December 2020, removed that assumption. As DeepMind explained, it masters its games “without needing to be told the rules,” learning an internal model that predicts only the three things that matter for deciding what to do next: a position’s value, the best-looking move, and the immediate reward. It matched AlphaZero’s superhuman play at Go, chess, and shogi while also reaching state-of-the-art results on the Atari benchmark.
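
To make the idea of a learned internal model concrete, here is a minimal sketch in Python. It is not DeepMind's implementation: the class `ToyLearnedModel`, the function `imagine`, the layer sizes, and the random, untrained weights are all assumptions made for illustration, and the real system uses deep neural networks combined with tree search. The sketch only shows the shape of the interface: one function encodes an observation into a hidden state, one predicts the policy and value for that state, and one advances the state under a chosen move and predicts the reward. At no point does the code consult the rules of a game.

```python
import math
import random


class ToyLearnedModel:
    """Untrained, hypothetical stand-in for a learned model (illustration only)."""

    def __init__(self, obs_size, hidden_size, num_actions, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            # Random weights; a real system would learn these from play.
            return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

        self.num_actions = num_actions
        self.w_repr = matrix(hidden_size, obs_size)
        self.w_dyn = matrix(hidden_size, hidden_size + num_actions)
        self.w_reward = matrix(1, hidden_size + num_actions)
        self.w_policy = matrix(num_actions, hidden_size)
        self.w_value = matrix(1, hidden_size)

    @staticmethod
    def _matvec(weights, vector):
        return [sum(w * x for w, x in zip(row, vector)) for row in weights]

    def represent(self, observation):
        """Encode a raw observation into a hidden state."""
        return [math.tanh(v) for v in self._matvec(self.w_repr, observation)]

    def predict(self, state):
        """Return (policy, value): move probabilities and a score in [-1, 1]."""
        logits = self._matvec(self.w_policy, state)
        peak = max(logits)
        exps = [math.exp(logit - peak) for logit in logits]
        total = sum(exps)
        policy = [e / total for e in exps]
        value = math.tanh(self._matvec(self.w_value, state)[0])
        return policy, value

    def dynamics(self, state, action):
        """Return (next_state, reward) for taking `action` in `state`."""
        one_hot = [1.0 if i == action else 0.0 for i in range(self.num_actions)]
        inputs = state + one_hot
        next_state = [math.tanh(v) for v in self._matvec(self.w_dyn, inputs)]
        reward = self._matvec(self.w_reward, inputs)[0]
        return next_state, reward


def imagine(model, observation, actions):
    """Unroll the model along a fixed action sequence without any game rules."""
    state = model.represent(observation)
    trajectory = []
    for action in actions:
        policy, value = model.predict(state)
        state, reward = model.dynamics(state, action)
        trajectory.append({"policy": policy, "value": value, "reward": reward})
    return trajectory


# Example: look three moves ahead from a made-up four-number observation.
model = ToyLearnedModel(obs_size=4, hidden_size=8, num_actions=3)
steps = imagine(model, [0.1, -0.2, 0.3, 0.0], actions=[0, 2, 1])
```

Each entry in `steps` holds the three predicted quantities for one imagined move. With untrained weights the numbers are meaningless; the point is that planning ahead needs only these three outputs, not a simulator of the game itself.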