METR, short for Model Evaluation and Threat Research, is an independent research nonprofit that evaluates frontier AI systems to measure their capabilities and the risks they could pose. Its stated mission is “to develop scientific methods to assess catastrophic risks stemming from AI systems’ autonomous capabilities and enable good decision-making about their development.” It operates as a third-party evaluator rather than as part of any AI lab.
The organization is led by founder and CEO Beth Barnes, with Chris Painter as president and Hjalmar Wijk as chief scientist. METR began as ARC Evals, the evaluations team at the Alignment Research Center, and later spun out as a standalone organization. Much of its work focuses on measuring “the extent to which an AI system can autonomously carry out substantial tasks,” including general activities like software development and research as well as concerning capabilities such as conducting cyberattacks and self-preservation.
METR helped popularize Responsible Scaling Policies, a governance approach in which labs commit to safety measures tied to measured capability thresholds. The organization reports that nine leading developers have adopted the idea. METR emphasizes empirical measurement before deployment instead of waiting to observe whether advanced systems cause harm.
For a business audience, METR matters because it is one of the few outside bodies that stress-test frontier models for risks from autonomous capabilities, and its findings increasingly shape how labs and governments decide what is safe to release.