ARC (AI2 Reasoning Challenge)

The AI2 Reasoning Challenge, known as ARC, was introduced in “Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge,” submitted to arXiv on March 14, 2018 by Peter Clark, Oren Etzioni, and colleagues at the Allen Institute for AI. It is a set of 7,787 natural, multiple-choice grade-school science questions drawn from standardized exams. It is a distinct benchmark from the later ARC-AGI abstraction puzzles created by François Chollet.

ARC’s defining feature is its split into two parts. The Challenge Set (2,590 questions) contains only questions that two strong baseline methods, one based on information retrieval and one on word co-occurrence statistics, both got wrong. The Easy Set holds the remaining 5,197. The intent was to isolate questions that resist surface-level matching and require some combination of reasoning, knowledge, and inference, such as understanding cause and effect or applying a science concept to a new situation. Alongside the questions, the authors released the ARC Corpus, a collection of 14 million science sentences, and implementations of several neural baseline models.
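
The partition rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the Question type, the split_arc_style function, and the two toy solvers (pick_first and pick_longest) are hypothetical stand-ins for the retrieval-based and co-occurrence-based baselines, and the three questions are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Question:
    stem: str
    choices: Tuple[str, ...]
    answer: int  # index of the correct choice


# A solver maps a question to the index of the choice it selects.
Solver = Callable[[Question], int]


def split_arc_style(
    questions: Sequence[Question], solvers: Sequence[Solver]
) -> Tuple[List[Question], List[Question]]:
    """Put a question in the challenge set only if every solver gets it wrong."""
    challenge: List[Question] = []
    easy: List[Question] = []
    for q in questions:
        if all(solver(q) != q.answer for solver in solvers):
            challenge.append(q)
        else:
            easy.append(q)
    return challenge, easy


# Toy stand-ins for the two baselines (assumptions, not the real methods).
def pick_first(q: Question) -> int:
    return 0


def pick_longest(q: Question) -> int:
    return max(range(len(q.choices)), key=lambda i: len(q.choices[i]))


QUESTIONS = [
    Question(
        "Which gas do plants absorb from the air?",
        ("carbon dioxide", "oxygen", "nitrogen", "helium"),
        0,
    ),
    Question(
        "What causes day and night on Earth?",
        ("the Moon", "Earth rotating on its axis", "clouds", "tides"),
        1,
    ),
    Question(
        "Which object is the best conductor of electricity?",
        ("a wooden spoon", "a rubber band", "a copper wire", "a glass rod"),
        2,
    ),
]

challenge, easy = split_arc_style(QUESTIONS, [pick_first, pick_longest])
print(len(challenge), "challenge question(s),", len(easy), "easy question(s)")
```

With these toy solvers, only the conductor question ends up in the challenge list, because it is the only one that both solvers answer incorrectly. The same filter applied with stronger solvers would yield a smaller and harder challenge set.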

At release, leading neural models that performed well on other question-answering tasks scored no significantly better than random guessing on the Challenge Set, which showed how far question answering beyond surface matching still had to go. ARC became a standard benchmark for science reasoning and a regular entry in large language model evaluation suites, where the Challenge Set in particular remained difficult for years.

Sources

Last verified June 7, 2026