On July 10, 2025, the research nonprofit METR (Model Evaluation and Threat Research) published a randomized controlled trial that cut against the prevailing story about AI coding tools. The study measured how early-2025 AI tools affected the productivity of experienced open-source developers working in large, mature codebases they knew well.
In the study, 16 developers completed 246 real tasks drawn from their own repositories, on which they averaged five years of prior experience. Each task was randomly assigned to either allow or forbid AI tools; when allowed, developers mostly used Cursor Pro with Claude 3.5 and 3.7 Sonnet, the frontier models of the moment. The headline result: when AI was allowed, tasks took about 19 percent longer to complete, not less time. The effect ran opposite to nearly everyone’s expectations.
The perception gap was as striking as the slowdown. Before the study, the developers forecast that AI would cut their completion time by 24 percent; even after finishing and experiencing the slowdown, they still estimated that AI had cut it by about 20 percent. Outside experts in economics and machine learning had predicted speedups of around 38 to 39 percent. METR was careful to bound the claim: the study does not show that AI fails to help developers in general, only that in this specific setting (skilled engineers, familiar and complex repositories, early-2025 tools) the measured effect was a slowdown. The result became a widely cited caution that self-reported productivity gains can diverge sharply from measured ones.