OpenAI's o3 jumped ARC-AGI from about 5 percent to the high 80s in one model generation

fact December 20, 2024

ARC-AGI is a benchmark of abstract visual puzzles designed by François Chollet to resist memorization and reward on-the-fly reasoning. When OpenAI previewed o3 on December 20, 2024, it scored 75.7 percent on the semi-private ARC-AGI-1 evaluation set in a low-compute configuration and 87.5 percent in a high-compute configuration that gave it far more compute to think. As the ARC Prize team noted, “ARC-AGI-1 took 4 years to go from 0% with GPT-3 in 2020 to 5% in 2024 with GPT-4o.” Against that baseline, the jump from roughly 5 percent to the high 80s in a single model generation was among the most dramatic single-benchmark leaps of the reasoning era.

Sources

PRIMARY https://arcprize.org/blog/oai-o3-pub-breakthrough

Last verified June 6, 2026
