AlphaGo Zero learns Go from scratch

In October 2017, DeepMind announced AlphaGo Zero, a new version of its Go-playing system, in the journal Nature and on its own blog. The earlier AlphaGo, which famously beat world champion Lee Sedol in 2016, had been trained in part on a large database of human games. AlphaGo Zero discarded that crutch entirely. As DeepMind put it, the program “learns to play simply by playing games against itself, starting from completely random play,” with no human game data and no human guidance beyond the rules of Go.

The results were dramatic. Starting from random moves, AlphaGo Zero taught itself Go and, within three days, beat the previously published version that had defeated Lee Sedol, winning 100 games to nil. DeepMind described the system as accumulating “thousands of years of human knowledge” in just a few days of self-play. Along the way it rediscovered well-known human strategies, discarded some, and invented new ones of its own, suggesting that human play had only ever explored part of the game.

Technically, AlphaGo Zero was also simpler than its predecessor. It used a single neural network instead of two (earlier versions had one network to choose moves and another to evaluate positions), learned purely by reinforcement from the outcomes of its own games, and combined that network with a tree search that improved as the network improved. Removing human data did not weaken the system; the result was both stronger and simpler in design.
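
The self-play idea can be sketched on a far smaller game. The toy below is an illustration only, not DeepMind's method: it uses Nim (take one to three stones; whoever takes the last stone wins) instead of Go, a lookup table instead of a neural network, and no tree search at all. All names (`train`, `best_move`, `MAX_TAKE`) and the parameter values are assumptions chosen for the example. What it shares with the description above is the core loop: start with no knowledge, play games against yourself, and learn only from who won.

```python
import random
from collections import defaultdict

MAX_TAKE = 3  # assumed toy rule: a player removes 1, 2, or 3 stones per turn


def legal_moves(pile):
    """Moves available with `pile` stones left."""
    return range(1, min(MAX_TAKE, pile) + 1)


def best_move(q, pile):
    """Greedy choice: the move with the highest learned value estimate."""
    return max(legal_moves(pile), key=lambda move: q[(pile, move)])


def train(games=30000, max_pile=10, epsilon=0.2, seed=0):
    """Learn move values purely from the outcomes of self-play games."""
    rng = random.Random(seed)
    q = defaultdict(float)     # (pile, move) -> average outcome for the mover
    visits = defaultdict(int)  # (pile, move) -> number of times it was played

    for _ in range(games):
        pile = rng.randint(1, max_pile)
        history = []  # (pile, move) pairs; the two sides alternate

        # Both sides use the same table: mostly greedy, sometimes random.
        while pile > 0:
            if rng.random() < epsilon:
                move = rng.choice(list(legal_moves(pile)))
            else:
                move = best_move(q, pile)
            history.append((pile, move))
            pile -= move

        # The side that took the last stone won (+1). Walk the game backwards,
        # flipping the sign at each step because the players alternate.
        outcome = 1.0
        for key in reversed(history):
            visits[key] += 1
            q[key] += (outcome - q[key]) / visits[key]  # running average
            outcome = -outcome

    return q


if __name__ == "__main__":
    q = train()
    for pile in (5, 6, 7, 9, 10):
        print(pile, "stones -> take", best_move(q, pile))
```

With enough games, the table should settle on the known winning strategy for this version of Nim, which is to leave the opponent a multiple of four stones, without ever being shown that rule. AlphaGo Zero replaces the table with a neural network and adds tree search because Go has far too many positions to list.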

Why business readers should care: AlphaGo Zero showed that, in a closed world with clear rules and a clear measure of success, an AI can surpass the best human play by learning entirely from its own experience. That self-play recipe was generalized later in 2017 into AlphaZero, which mastered chess and shogi too, and it remains a touchstone for how far pure trial-and-error learning can reach when the problem is well defined.
