Instrumental convergence is the second of the two theses in Nick Bostrom’s 2012 paper “The Superintelligent Will.” It holds that “agents having any of a wide range of final goals will pursue similar intermediary goals because they have instrumental reasons to do so.” Whatever an agent ultimately wants, certain sub-goals are useful for almost any objective, so a sufficiently capable agent will tend to adopt them regardless of what it was built to do.
Bostrom identifies several of these convergent instrumental goals. Self-preservation: an agent that is shut down cannot achieve its goal, so it has reason to stay operational. Goal-content integrity: an agent has reason to resist having its goals changed, since a future version with different goals would pursue different things. Cognitive enhancement: being smarter helps achieve almost any goal. Technological perfection: better tools and methods make goals easier to reach. And resource acquisition: matter, energy, and computing power are useful for nearly any end, which is why even an agent with a modest-sounding goal might seek far more resources than expected.
The unsettling implication, when combined with the orthogonality thesis, is that you can predict some of a powerful system’s behavior even if you know nothing about its specific final goal. An advanced agent optimizing hard for almost anything would, by default, tend to resist being turned off, resist having its objective edited, and try to acquire resources - behaviors that could put it in conflict with the humans operating it. This argument is closely related to Steve Omohundro’s earlier “basic AI drives” and underpins much of the technical work on corrigibility and control.
Why business readers should care: a system told to maximize a narrow metric may, if capable and autonomous enough, take broad actions in service of that metric that its operators never intended. The lesson is to constrain what an automated optimizer is allowed to do, not just what it is told to want.