AIME (American Invitational Mathematics Examination)

The American Invitational Mathematics Examination (AIME) is a competition run by the Mathematical Association of America (MAA) for top high-school mathematics students. Per the MAA, it is “a 15-question, 3-hour examination, in which each answer is an integer number between 0 to 999.” Students qualify by scoring in roughly the top few percent on the earlier AMC 10 and AMC 12 contests, and strong AIME scores feed into the selection pipeline for the USA Mathematical Olympiad and ultimately the International Mathematical Olympiad. Because every answer is a single integer, there is no partial credit and no ambiguity about whether an answer is right.

That clean, unambiguous answer format is exactly why the AIME became a favored AI benchmark. Each problem demands genuine multi-step reasoning, yet the answer is a single integer that is trivial to score automatically, with no room for a model to be graded generously. As reasoning-focused models emerged, labs began reporting AIME results as a headline measure of advanced mathematical reasoning. The DeepSeek-R1 paper, for example, reports AIME among the competition-mathematics tasks on which its reinforcement-learning-trained reasoning model excels.
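
To make the "trivial to score automatically" point concrete, here is a minimal sketch of exact-match grading in Python. The function names (extract_answer, score) and the rule of taking the last number in a response are assumptions made for illustration; real evaluation harnesses differ in how they locate the final answer.

```python
import re
from typing import Optional


def extract_answer(response: str) -> Optional[int]:
    """Return the last integer in the response if it is a valid AIME answer.

    Assumption for this sketch: the model's final answer is the last run of
    digits in its output. AIME answers are integers from 0 to 999, often
    written with leading zeros (for example "073").
    """
    numbers = re.findall(r"\d+", response)
    if not numbers:
        return None
    value = int(numbers[-1])
    return value if 0 <= value <= 999 else None


def score(responses: list[str], answer_key: list[int]) -> int:
    """Count exact matches; a wrong or missing answer earns nothing."""
    return sum(
        extract_answer(response) == correct
        for response, correct in zip(responses, answer_key)
    )
```

Because the comparison is integer equality, a near miss and a blank response are scored identically, which is the sense in which the format leaves no room for generous grading.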

Because AIME problems are publicly released each year, the benchmark also illustrates the contamination concern: a model trained on internet text may have seen past AIME questions. Labs therefore increasingly report results on the most recent year’s problems, which reduces the chance that scores reflect memorization.
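
One simple way to act on this is to evaluate only on contests held after a model's training cutoff. The Python sketch below is illustrative only: the record layout, the held_out function, and the conservative rule of dropping the cutoff year itself are all assumptions, not a description of any lab's procedure.

```python
def held_out(problems: list[dict], cutoff_year: int) -> list[dict]:
    """Keep only problems from contest years strictly after the cutoff year.

    Assumption for this sketch: each problem is a dict with a "year" key.
    Dropping the cutoff year itself is deliberately conservative, since a
    contest held earlier in that year may already be in the training data.
    """
    return [p for p in problems if p["year"] > cutoff_year]


# Hypothetical records; the identifiers are placeholders, not real problems.
problems = [
    {"year": 2023, "id": "placeholder-a"},
    {"year": 2024, "id": "placeholder-b"},
    {"year": 2025, "id": "placeholder-c"},
]
recent = held_out(problems, cutoff_year=2024)
```

Filtering by date reduces the risk of memorized answers but does not eliminate it, because a stated cutoff may be imprecise and later fine-tuning data can still include recent problems.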

For business readers, AIME scores are best read as a proxy for a model’s ceiling on hard, multi-step reasoning rather than for everyday usefulness. A high AIME number signals strength at exactly the kind of structured, logical problem-solving that competition math rewards. That is impressive, but it is not the same thing as reliability on messy real-world tasks. Current scores are published by model vendors and leaderboards and change frequently, so they are not reproduced here.

Sources

Last verified June 6, 2026