TD-Gammon taught itself backgammon by self-play

fact

In “Temporal Difference Learning and TD-Gammon” (Communications of the ACM, March 1995), Gerald Tesauro reported that his neural network learned backgammon almost entirely by playing against itself with temporal-difference learning, without human coaching. Former world champion Bill Robertie assessed that the program “plays at a strong master level,” and elite player Kit Woolsey said “its positional judgment is far better than mine.” This approach of learning from self-play is a direct ancestor of later systems such as AlphaGo.

Sources

PRIMARY https://www.bkgm.com/articles/tesauro/tdl.html

Last verified June 6, 2026
