Student of Games unifies perfect- and imperfect-information games

For most of the history of game-playing AI, the strongest systems specialized. AlphaZero and MuZero dominated perfect-information games such as chess and Go, where everything is on the board, while separate systems like DeepStack and Libratus handled imperfect-information games such as poker, where players hide cards. Student of Games, described by Martin Schmid and colleagues in Science Advances in November 2023, was the first single learning algorithm to reach strong performance across both kinds of game.

The algorithm combined guided search, self-play learning, and game-theoretic reasoning into one method. The authors reported that it reached strong play in chess and Go, beat the strongest openly available agent at heads-up no-limit Texas hold’em poker, and defeated the state-of-the-art agent at Scotland Yard, a hidden-information board game. They also proved the method is sound, converging toward perfect play as computation and model capacity grow.
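To make "game-theoretic reasoning" and "converging toward perfect play" concrete, here is a minimal sketch of regret matching in self-play. Regret matching is the update rule underlying counterfactual regret minimization, the family of methods used by imperfect-information systems such as DeepStack. This is an illustration only, not the Student of Games algorithm: it has no search tree and no neural network, and the 2x2 payoff matrix and every function name are assumptions invented for the example. In this toy zero-sum game the equilibrium has each player choosing their first action 40% of the time.

```python
# Toy zero-sum matrix game: PAYOFF[a][b] is the row player's payoff when
# row plays a and column plays b. Hypothetical values, not from the paper.
PAYOFF = [[2.0, -1.0], [-1.0, 1.0]]


def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)  # no positive regret: uniform


def action_values(payoff, opponent, as_row):
    """Expected payoff of each of a player's actions against a mixed opponent."""
    if as_row:
        return [sum(payoff[a][b] * opponent[b] for b in range(len(opponent)))
                for a in range(len(payoff))]
    # The column player receives the negated payoff (zero-sum).
    return [-sum(payoff[a][b] * opponent[a] for a in range(len(opponent)))
            for b in range(len(payoff[0]))]


def self_play(payoff, iterations):
    """Run simultaneous regret matching and return both average strategies."""
    regrets = [[0.0] * len(payoff), [0.0] * len(payoff[0])]
    strategy_sums = [[0.0] * len(payoff), [0.0] * len(payoff[0])]
    for _ in range(iterations):
        row = regret_matching(regrets[0])
        col = regret_matching(regrets[1])
        updates = (
            (row, action_values(payoff, col, True), regrets[0], strategy_sums[0]),
            (col, action_values(payoff, row, False), regrets[1], strategy_sums[1]),
        )
        for strategy, values, regret, sums in updates:
            expected = sum(s * v for s, v in zip(strategy, values))
            for a, value in enumerate(values):
                regret[a] += value - expected  # how much better action a would do
                sums[a] += strategy[a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


def exploitability(payoff, row, col):
    """Total gain available to best responses; zero exactly at equilibrium."""
    best_row = max(action_values(payoff, col, True))
    best_col = max(action_values(payoff, row, False))
    return best_row + best_col


if __name__ == "__main__":
    for iterations in (100, 1000, 20000):
        row, col = self_play(PAYOFF, iterations)
        print(iterations, row, col, exploitability(PAYOFF, row, col))
```

The quantity to watch is exploitability: how much a best-responding opponent could gain against the averaged strategies. It is the averaged strategy, not the last iterate, that approaches equilibrium, and the standard regret bound guarantees that exploitability shrinks as the number of iterations grows. That is a small-scale analogue of a soundness guarantee, here with exact payoffs in place of learned value estimates.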

The work was positioned as a step toward more general game-playing agents: systems that do not need to be rebuilt from scratch for each new game, and that can handle the hidden information and bluffing of real strategic settings as readily as the open board of chess. It drew directly on the search-and-self-play lineage of AlphaZero and the imperfect-information techniques of DeepStack.

