When Epoch AI released the FrontierMath benchmark of research-level math problems in November 2024, it was widely read as an independent yardstick for advanced mathematical reasoning. In December 2024, OpenAI announced that its new o3 model had made a large leap on FrontierMath, and Sam Altman cited the result publicly. Only around that announcement did it emerge that OpenAI had funded the benchmark.
In a January 23, 2025 statement, Epoch AI laid out the actual arrangement: OpenAI had commissioned Epoch to produce 300 advanced math problems and retained ownership of them, with access to the problems and their solutions. The exception was a 50-problem holdout set, for which OpenAI received only the problem statements, not the solutions, so independent verification remained possible. Epoch said it had been required to get OpenAI’s permission before disclosing the partnership, and that it had received that permission around the o3 announcement.
The friction was about transparency, not just money. Some mathematicians who wrote problems for FrontierMath said they had not been told an AI company was funding the work or would have access to the benchmark, and indicated they might not have participated had they known. Epoch acknowledged it “made a mistake in not being more transparent” and that its communication should have been more systematic, while pointing to the holdout set as a safeguard against the obvious concern - that a model owner with access to the test could inflate its scores.
Why business readers should care: the episode is a concrete illustration of why benchmark independence and disclosed conflicts of interest matter. When the party announcing a record score also funded the test and can see it, the headline number needs scrutiny - and the same caution applies to any vendor citing its own evaluations.