Mind2Web was introduced in “Mind2Web: Towards a Generalist Agent for the Web,” posted to arXiv on June 9, 2023, by Xiang Deng, Yu Gu, Boyuan Zheng, and colleagues at Ohio State University. Its authors describe it as the first dataset for building and evaluating generalist web agents that follow plain-language instructions to complete tasks on any website. Rather than testing agents on a handful of simplified, hand-built pages, it targets the real, messy web.
The benchmark contains more than 2,000 open-ended tasks collected from 137 real websites spanning 31 domains, each paired with a crowdsourced sequence of the actions a human took to complete it. Because the sites are real, an agent must cope with cluttered HTML, ads, and inconsistent layouts. The paper also reports baseline agents that first filter a page's raw HTML with a smaller language model and then hand the remaining candidate elements to a large language model. These baselines perform reasonably well, even on websites and domains they have not seen, but still fall well short of reliable generalization.
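To make the idea of "a task paired with a sequence of human actions" concrete, here is a minimal sketch of how such a record might be represented and how an agent's predicted actions could be checked against the human reference. The field names (`element_id`, `operation`, `value`), the operation labels, and the `score` function are hypothetical illustrations, not the dataset's actual schema or the paper's evaluation code; the sketch assumes a simple exact-match comparison at each step, and the paper's own metrics may differ in detail.

```python
from dataclasses import dataclass, field


# Hypothetical layout: field names are illustrative, not Mind2Web's real schema.
@dataclass(frozen=True)
class Action:
    element_id: str   # identifier of the target page element (assumed)
    operation: str    # assumed labels such as "CLICK", "TYPE", "SELECT"
    value: str = ""   # text typed or option chosen; empty for clicks


@dataclass
class Task:
    instruction: str  # plain-language goal given to the agent
    website: str
    domain: str
    actions: list[Action] = field(default_factory=list)  # human demonstration


def score(task: Task, predicted: list[Action]) -> tuple[float, bool]:
    """Return (fraction of reference steps matched exactly, whether the whole task matched).

    Assumes exact equality of element, operation, and value at each position.
    """
    if not task.actions:
        return 0.0, False
    correct = sum(1 for ref, pred in zip(task.actions, predicted) if ref == pred)
    step_rate = correct / len(task.actions)
    # The task counts as complete only if every step matches and no extra steps were taken.
    task_ok = correct == len(task.actions) and len(predicted) == len(task.actions)
    return step_rate, task_ok


if __name__ == "__main__":
    # Hypothetical travel task; the site name and element ids are made up.
    task = Task(
        instruction="Find one-way flights from Columbus to Chicago",
        website="example-airline",
        domain="Travel",
        actions=[
            Action("from_input", "TYPE", "Columbus"),
            Action("to_input", "TYPE", "Chicago"),
            Action("search_button", "CLICK"),
        ],
    )
    predicted = [
        Action("from_input", "TYPE", "Columbus"),
        Action("to_input", "TYPE", "Chicago"),
        Action("promo_banner", "CLICK"),  # the agent clicks an ad instead of search
    ]
    print(score(task, predicted))
```

In this example the agent gets two of three steps right but clicks an ad at the last step, so the task as a whole fails. That is the kind of clutter-induced error that real websites make possible.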
Mind2Web matters because so much valuable software has no API, only a web interface built for humans. A benchmark of realistic, cross-site tasks gives researchers a shared yardstick for measuring progress toward an agent that can book travel, fill out forms, or pull reports on whatever site a user points it at. For organizations, that is the difference between automation that works only through pre-wired integrations and an assistant that can operate the tools people actually use.