In 1961 the British researcher Donald Michie, who had worked with Alan Turing at Bletchley Park, built a machine-learning system out of matchboxes. He called it MENACE, the Matchbox Educable Noughts And Crosses Engine. There was no computer involved: MENACE was 304 matchboxes, one for each distinct board position it might face as the first player in tic-tac-toe (positions that differ only by rotation or reflection shared a box). Each box was filled with colored beads, one color per possible move from that position.
To make a move, you found the box for the current board, shook it, and let one bead drop at random; its color told you where to play. The learning came afterward. If MENACE won the game, you returned the played beads and added extra beads of the same colors to each box it had used, making those moves more likely next time. If it lost, you removed the beads it had played; a draw earned a smaller reward than a win. Over a few hundred games the bead populations shifted so that good moves were heavily represented and bad ones dwindled or disappeared, and MENACE went from playing randomly to playing tic-tac-toe well. Michie described these experiments, along with a computer simulation, in his 1963 paper "Experiments on the mechanization of game-learning" in The Computer Journal.
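The draw-a-bead-then-adjust procedure is short enough to sketch in code. The following is a minimal Python illustration, not a reconstruction of Michie's own simulation. Several details are assumptions made for the sketch: the names `Menace`, `choose`, `learn` and `play_game` are hypothetical, boxes are created on demand and not merged by symmetry, every move starts with 3 beads, the opponent plays at random, an emptied box is simply refilled, and the bead changes are +3 for a win, +1 for a draw and -1 for a loss.

```python
import random

# The eight ways to get three in a row on a board indexed 0..8.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that side has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


class Menace:
    """One 'matchbox' per board position, holding bead counts per move."""

    def __init__(self, initial_beads=3, rng=None):
        self.boxes = {}                 # board string -> {move: bead count}
        self.initial_beads = initial_beads  # assumed starting count
        self.rng = rng or random.Random()
        self.history = []               # (board string, move) used this game

    def choose(self, board):
        key = "".join(board)
        # Create the box the first time this position is seen.
        box = self.boxes.setdefault(
            key,
            {i: self.initial_beads for i, c in enumerate(board) if c == " "},
        )
        if sum(box.values()) == 0:      # assumption: refill an emptied box
            for m in box:
                box[m] = self.initial_beads
        moves = list(box)
        # "Shake the box": draw a move with probability proportional to beads.
        move = self.rng.choices(moves, weights=[box[m] for m in moves])[0]
        self.history.append((key, move))
        return move

    def learn(self, delta):
        """Add (or remove, if negative) beads for every move played."""
        for key, move in self.history:
            self.boxes[key][move] = max(0, self.boxes[key][move] + delta)
        self.history.clear()


def play_game(menace, rng):
    """MENACE plays X and moves first; O moves uniformly at random."""
    board = [" "] * 9
    player = "X"
    for _ in range(9):
        if player == "X":
            move = menace.choose(board)
        else:
            move = rng.choice([i for i, c in enumerate(board) if c == " "])
        board[move] = player
        if winner(board):
            break
        player = "O" if player == "X" else "X"
    result = winner(board)              # None means a draw
    menace.learn({"X": 3, "O": -1, None: 1}[result])  # assumed rewards
    return result
```

To experiment, create `rng = random.Random(0)` and `menace = Menace(rng=rng)`, call `play_game(menace, rng)` a few hundred times, and tally the returned results in blocks of fifty games to see how the mix of wins, draws and losses changes. Inspecting `menace.boxes[" " * 9]` afterward shows how the bead counts for the opening move have drifted from their uniform start.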
MENACE is a strikingly clear physical illustration of reinforcement learning: try moves, get a reward or penalty at the end, and adjust the odds of each action accordingly. The same loop underlies modern systems from TD-Gammon to game-playing agents like AlphaGo. That the whole idea could be shown with matchboxes and beads, years before such ideas were common, is part of why MENACE is still used to teach machine learning today.