Suphx masters four-player mahjong

Suphx, built by Microsoft Research Asia, was the first AI program to outperform most top human players at mahjong, a four-player game with hidden tiles, chance, and complex scoring. The team described it in “Suphx: Mastering Mahjong with Deep Reinforcement Learning,” submitted to arXiv on March 30, 2020.

The system was evaluated on Tenhou, a major Japanese online mahjong platform. There Suphx achieved a higher stable rank than most top human players and was rated above 99.99 percent of all officially ranked human players. It was trained with deep reinforcement learning plus three techniques the paper highlights. Global reward prediction links each individual round to the eventual outcome of the full game. Oracle guiding first trains the agent with access to hidden information, such as opponents' tiles, then gradually removes that information until the agent plays from normal observations alone. Run-time policy adaptation adjusts the trained policy to the specific round being played during a match.
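Two of these techniques can be illustrated in a few lines. The sketch below is a toy under stated assumptions, not the paper's implementation. For oracle guiding, it assumes the hidden-information ("perfect") features are masked element by element with Bernoulli entries whose keep probability decays linearly from 1 to 0 over training, so the agent gradually shifts to relying on normal features. For global reward prediction, it assumes some trained predictor (not shown) estimates the final game reward after each round, and that each round's reinforcement-learning reward is the change in that estimate. All function names, the linear schedule, and the "before round 1" baseline are hypothetical choices made for this illustration.

```python
import random
from typing import List, Sequence

# --- Oracle guiding (sketch) ---
# Assumption: perfect features are masked with Bernoulli(gamma) entries and
# gamma (the keep probability) decays linearly from 1.0 to 0.0.

def keep_probability(step: int, decay_steps: int) -> float:
    """Probability that a perfect feature is kept at a given training step."""
    return max(0.0, 1.0 - step / decay_steps)


def drop_perfect_features(perfect: Sequence[float], gamma: float,
                          rng: random.Random) -> List[float]:
    """Zero out each perfect feature independently with probability 1 - gamma."""
    return [x if rng.random() < gamma else 0.0 for x in perfect]


def oracle_guided_input(perfect: Sequence[float], normal: Sequence[float],
                        step: int, decay_steps: int,
                        rng: random.Random) -> List[float]:
    """Network input: masked perfect features followed by the normal features."""
    gamma = keep_probability(step, decay_steps)
    return drop_perfect_features(perfect, gamma, rng) + list(normal)


# --- Global reward prediction (sketch) ---
# Assumption: predicted_final[k] is a hypothetical predictor's estimate of the
# final game reward after round k; predicted_final[0] is made before round 1.

def round_rewards(predicted_final: Sequence[float]) -> List[float]:
    """Per-round reward = change in the predicted final-game reward."""
    return [predicted_final[k] - predicted_final[k - 1]
            for k in range(1, len(predicted_final))]


if __name__ == "__main__":
    rng = random.Random(0)
    # Early, middle, and late training: perfect features fade out.
    for step in (0, 5000, 10000):
        print(step, oracle_guided_input([1.0, 1.0, 1.0], [0.5], step, 10000, rng))
    print(round_rewards([0.0, 10.0, 25.0, 15.0]))  # → [10.0, 15.0, -10.0]
```

Because each round's reward is a difference of consecutive estimates, the rewards telescope: their sum equals the last estimate minus the baseline, so a round is credited only for how much it moved the expected game outcome.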

Mahjong is harder for AI than two-player poker in several ways: four players instead of two, a large set of hidden tiles, frequent draws of random tiles, and scoring rules that make the link between a single decision and the final result indirect. Suphx added mahjong to a run of imperfect-information games, including poker and, later, Diplomacy, in which AI systems reached or surpassed strong human play, following earlier milestones in the perfect-information games of checkers, chess, and Go.

Sources

Last verified June 7, 2026