Alignment Research Center (ARC)

The Alignment Research Center (ARC) is a nonprofit research organization focused on aligning future machine learning systems with human interests. Its current research aims to develop a theoretical foundation for mechanistic explanations of neural network behavior. The goal is to design algorithms that predict what a network will do by analyzing its weights rather than by sampling its outputs extensively, with applications to problems such as out-of-distribution prediction and anomaly detection.

ARC was founded by Paul Christiano, a prominent AI alignment researcher. In its early years ARC ran an evaluations team known as ARC Evals, which conducted some of the first systematic tests of whether frontier models, such as an early version of GPT-4, could autonomously replicate, acquire resources, or otherwise act in the world. That evaluations team later became an independent organization, and ARC’s own site now directs anyone looking for ARC Evals to METR.

ARC’s theoretical work rests on the argument that, as AI systems grow more powerful, alignment will require new techniques that scale safely across many orders of magnitude of capability, and that a mechanistic understanding of models is a promising route to such techniques.

For a general reader, ARC is significant both for its ambitious theoretical agenda and as the birthplace of the dangerous-capability evaluations that now anchor much of frontier AI governance.
