The 2021 paper “Measuring Mathematical Problem Solving With the MATH Dataset” introduces MATH as “a new dataset of 12,500 challenging competition mathematics problems.” Each problem ships with a complete step-by-step solution, so the dataset can be used both to evaluate models and to train them to produce worked derivations rather than only final answers.
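To make the difference between evaluating a final answer and training on a worked derivation concrete, the sketch below models one problem record and pulls the final answer out of its solution. It rests on several assumptions: the `MathProblem` class and its field names are hypothetical, the final answer is taken to be wrapped in `\boxed{...}` inside the solution text, and the exact string comparison in `is_correct` is a deliberate simplification, not the paper's scoring procedure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MathProblem:
    # Hypothetical record layout: the problem statement plus its worked solution.
    problem: str
    solution: str


def extract_boxed_answer(solution: str) -> Optional[str]:
    r"""Return the contents of the last \boxed{...} in the text, or None."""
    marker = r"\boxed{"
    start = solution.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    chars = []
    while i < len(solution):
        ch = solution[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
        i += 1
    return None  # braces never balanced


def is_correct(model_output: str, reference: MathProblem) -> bool:
    # Simplified check (assumption): exact string match on the boxed answers.
    predicted = extract_boxed_answer(model_output)
    expected = extract_boxed_answer(reference.solution)
    return predicted is not None and predicted == expected


example = MathProblem(
    problem=r"What is $\frac{1}{4} + \frac{1}{4}$?",
    solution=r"Adding the fractions gives $\frac{2}{4} = \boxed{\frac{1}{2}}$.",
)
```

For evaluation, only the extracted answer is compared. For training, the whole `solution` string would serve as the target, so the model learns the derivation as well as the result. A more forgiving comparison would normalize equivalent forms such as `0.5` and `\frac{1}{2}`, which this sketch does not attempt.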