MathVista

MathVista tests a skill that combines two hard problems: understanding an image and reasoning mathematically about what it shows. The questions present math in visual form, such as charts, geometric figures, function plots, and scientific diagrams, so a model must read the picture accurately and then carry out the right calculation or logical step.

The benchmark was introduced by Pan Lu, Hritik Bansal, Jianfeng Gao, Kai-Wei Chang, and colleagues in “MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts,” posted in October 2023 and accepted to ICLR 2024. It contains 6,141 examples assembled from 28 existing multimodal datasets plus three newly built ones called IQTest, FunctionQA, and PaperQA.

At release, the strongest system tested, GPT-4V, reached 49.9 percent accuracy, 10.4 percentage points below the human baseline of 60.3 percent. The gap showed that combining visual perception with mathematical reasoning remained an open challenge even for the best available models.
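The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the helper names `accuracy_percent` and `gap_in_points` are hypothetical and not part of any MathVista tooling, and the only numbers used are the two reported above.

```python
def accuracy_percent(num_correct: int, num_total: int) -> float:
    """Share of questions answered correctly, expressed as a percentage."""
    if num_total <= 0:
        raise ValueError("num_total must be positive")
    return 100.0 * num_correct / num_total


def gap_in_points(model_acc: float, reference_acc: float) -> float:
    """Difference between two accuracies, in percentage points (not a ratio)."""
    return reference_acc - model_acc


# Figures from the text: GPT-4V scored 49.9 percent, 10.4 points below humans.
gpt4v_acc = 49.9
reported_gap = 10.4

# The human baseline implied by those two numbers (rounded to one decimal).
implied_human_acc = round(gpt4v_acc + reported_gap, 1)
```

A gap stated in percentage points is a plain difference between two accuracies, so adding the reported gap to the model's score recovers the human baseline.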

For a general reader, MathVista matters because real-world documents are full of charts and figures. A model that can answer questions about a table image or a financial graph is far more useful than one that only handles plain text.
