The Orthogonality Thesis

The orthogonality thesis is a claim advanced by philosopher Nick Bostrom in his 2012 paper “The Superintelligent Will,” published in Minds and Machines. Stated plainly, it holds that “intelligence and final goals are orthogonal axes along which possible artificial intellects can freely vary”; that is, more or less any level of intelligence could be combined with more or less any final goal. Being highly intelligent does not, on this view, imply having goals that humans would recognize as wise, benevolent, or even sensible.

The thesis is meant to puncture a comforting intuition: that a sufficiently smart machine would naturally converge on good or human-friendly values, much as we sometimes assume that intelligence and morality go together. Bostrom argues there is no such guarantee. A superintelligent system could be extraordinarily capable while pursuing a goal that strikes us as trivial or arbitrary. His recurring illustration is an agent that values nothing more than manufacturing as many paperclips as possible. Intelligence determines how effectively an agent pursues its goals, not which goals it has.
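The separation between capability and goal can be made concrete with a toy sketch. Everything in it is a hypothetical illustration, not something taken from Bostrom's paper: `hill_climb` is a deliberately simple optimizer, `budget` is a crude stand-in for capability, and the two objective functions are arbitrary stand-ins for final goals. The point is only that the search routine never needs to know, or care, which goal it is handed.

```python
from typing import Callable


def hill_climb(objective: Callable[[int], float], budget: int) -> int:
    """Greedy integer search starting at 0.

    `budget` (number of steps allowed) is a stand-in for capability;
    `objective` is a stand-in for the final goal. Both are hypothetical.
    """
    best = 0
    for _ in range(budget):
        # Look one step left and one step right, keep the better neighbor.
        candidate = max((best - 1, best + 1), key=objective)
        if objective(candidate) > objective(best):
            best = candidate
    return best


def paperclip_goal(x: int) -> float:
    # Arbitrary goal A: score is highest at x = 50.
    return -abs(x - 50)


def other_goal(x: int) -> float:
    # Arbitrary goal B: score is highest at x = -50.
    return -abs(x + 50)


# Same goal, different capability: only how close the agent gets changes.
weak = hill_climb(paperclip_goal, budget=10)      # 10
strong = hill_climb(paperclip_goal, budget=1000)  # 50

# Same capability, different goal: the optimizer serves either one equally well.
strong_other = hill_climb(other_goal, budget=1000)  # -50
```

Raising `budget` moves the result closer to whatever the objective rewards, and swapping the objective redirects the same search somewhere else entirely. Neither change has any effect on the other, which is the sense of "orthogonal" in the thesis.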

The orthogonality thesis is foundational to the case for AI alignment as a distinct problem. If smarter automatically meant safer, alignment would take care of itself. On Bostrom's argument it does not, so the goals a powerful system ends up with have to be deliberately specified and verified, not assumed. The thesis is paired with his instrumental convergence thesis, which argues that whatever an agent’s final goal, a range of intermediate goals, such as self-preservation and resource acquisition, will tend to recur because they are useful for almost any end.

Why business readers should care: capability and intent are separate. A system can be highly competent and still optimize for the wrong objective, which is why specifying goals precisely, and checking what a system is actually optimizing, matters more as capability grows.
