Human Compatible (Stuart Russell, 2019)

“Human Compatible: Artificial Intelligence and the Problem of Control” is a 2019 book by Stuart Russell, a computer science professor at UC Berkeley and co-author of the standard AI textbook “Artificial Intelligence: A Modern Approach.” It was published by Viking in October 2019 (ISBN 9780525558637).

Russell’s argument centers on what he calls the “standard model” of AI: building machines that optimize a fixed, human-specified objective as effectively as possible. He contends this model is fundamentally unsafe, because for any sufficiently capable system a slightly wrong objective leads to harmful behavior, and specifying a complete and correct objective is effectively impossible. A machine confidently pursuing a fixed goal also has an incentive to resist being corrected or switched off, since being switched off would prevent it from achieving that goal.

In place of the standard model he proposes “provably beneficial” AI built on three principles: the machine’s only objective is to maximize the realization of human preferences; the machine is initially uncertain about what those preferences are; and the ultimate source of information about human preferences is human behavior. The crucial element is the second principle. A machine that knows it does not fully know what humans want will ask permission, defer, and accept correction, because human actions, including the act of switching it off, are evidence about the preferences it is trying to satisfy. Russell and his collaborators formalized this intuition in earlier technical work, including the off-switch game.
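The arithmetic behind the deference argument can be sketched as a simplified off-switch game. This is a minimal illustration, not the book’s or the original paper’s formulation. It assumes a perfectly rational human, a discrete belief over the action’s true utility, and the convention that switching off is worth 0. The function names and example numbers are hypothetical. The robot can act immediately, switch itself off, or defer to the human, who blocks the action exactly when its utility is negative.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

# Hypothetical representation: a belief is a list of (probability, utility)
# pairs describing the robot's uncertainty about how much the human
# actually values the proposed action.
Belief = List[Tuple[Fraction, int]]


def expected(belief: Belief) -> Fraction:
    """Expected utility under a discrete belief."""
    return sum((p * u for p, u in belief), Fraction(0))


def off_switch_values(belief: Belief) -> Dict[str, Fraction]:
    """Robot's expected payoff for each option, assuming a rational human."""
    return {
        # Act immediately: the robot receives the action's true utility.
        "act": expected(belief),
        # Switch itself off: utility 0 by convention.
        "switch_off": Fraction(0),
        # Defer: the human allows the action only when its utility is >= 0,
        # and otherwise presses the off switch (utility 0).
        "defer": expected([(p, max(u, 0)) for p, u in belief]),
    }


def incentive_to_defer(belief: Belief) -> Fraction:
    """How much deferring beats the robot's best unilateral option."""
    v = off_switch_values(belief)
    return v["defer"] - max(v["act"], v["switch_off"])


half = Fraction(1, 2)
uncertain = [(half, 3), (half, -2)]  # the action might help (+3) or harm (-2)
certain = [(Fraction(1), 3)]         # the robot is sure the action is worth +3

print(incentive_to_defer(uncertain))  # 1
print(incentive_to_defer(certain))    # 0
```

Under these assumptions, deferring is worth E[max(U, 0)], which is never less than max(E[U], 0). The gap is strictly positive only when the robot gives some probability to both helpful and harmful outcomes. A robot that is certain gains nothing by leaving the switch in human hands, which matches the behavior of the fixed-objective machine described in the preceding paragraph.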

The book popularized this reframing of AI safety for a general audience and helped move the “control problem” from the margins into mainstream discussion among AI researchers, building on themes that trace back to Norbert Wiener’s 1960 warnings about machines pursuing literal goals.
