Paul Christiano is an AI safety researcher known for foundational work on reinforcement learning from human feedback (RLHF) and for theoretical research on aligning advanced AI with human interests. He holds a Ph.D. in computer science from UC Berkeley and a B.S. in mathematics from MIT.
At OpenAI, Christiano led the language-model alignment team and pioneered RLHF, the technique of training models on human preference judgments that underpins ChatGPT, Claude, and most modern instruction-following systems. He co-authored the influential 2017 paper “Deep Reinforcement Learning from Human Preferences” and the 2016 paper “Concrete Problems in AI Safety.” He left OpenAI in 2021 to focus on more conceptual alignment questions and founded the Alignment Research Center (ARC). There he launched the third-party frontier-model evaluation effort that later became METR.
In 2024 Christiano became Head of AI Safety at the US AI Safety Institute within NIST, where he designs and conducts evaluations of frontier models for capabilities of national security concern. He was named to TIME’s 100 most influential people in AI in 2023.