SuperGLUE

SuperGLUE is the follow-up to GLUE (General Language Understanding Evaluation), the influential 2018 collection of English language-understanding tasks. GLUE was designed to be hard, but within about a year of its release models had pushed past the level of non-expert humans on it. That left little room to measure further progress. SuperGLUE was created to restore that headroom with a harder set of tasks, scored the same way.

The benchmark was introduced by Alex Wang, Samuel R. Bowman, and colleagues in “SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems,” posted to arXiv in May 2019. It packages eight harder language-understanding tasks together with a software toolkit and a public leaderboard. Like GLUE, it reports a single summary score, which makes results on the two benchmarks easy to compare.
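The single summary score is a plain average. This is a minimal sketch, assuming the scoring rule described in the SuperGLUE paper: every task gets equal weight, and a task that reports two metrics (for example F1 and accuracy) has those metrics averaged into one task score first. The function names, dictionary keys, and all numbers below are hypothetical placeholders for illustration. They are not real leaderboard results.

```python
from statistics import mean


def task_score(metrics: dict[str, float]) -> float:
    """Collapse one task's metrics (e.g. F1 and accuracy) into a single task score."""
    return mean(metrics.values())


def overall_score(results: dict[str, dict[str, float]]) -> float:
    """Unweighted mean of the per-task scores: each task counts equally."""
    return mean(task_score(m) for m in results.values())


# Hypothetical per-task numbers, for illustration only (not real results).
example = {
    "BoolQ":   {"acc": 80.0},
    "CB":      {"f1": 70.0, "acc": 80.0},   # two metrics -> task score 75.0
    "COPA":    {"acc": 70.0},
    "MultiRC": {"f1a": 70.0, "em": 30.0},   # two metrics -> task score 50.0
    "ReCoRD":  {"f1": 72.0, "em": 70.0},    # two metrics -> task score 71.0
    "RTE":     {"acc": 70.0},
    "WiC":     {"acc": 70.0},
    "WSC":     {"acc": 60.0},
}

print(overall_score(example))  # 68.25
```

Averaging within a task before averaging across tasks keeps a task that happens to report two metrics from counting twice in the summary.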

The rapid saturation of GLUE, and then of SuperGLUE, whose human baseline was overtaken by early 2021, became a frequently cited example of how quickly the field moves: benchmarks that look challenging at launch can be largely solved within a year or two.

For a general reader, SuperGLUE is part of the foundational era of language-model evaluation, before chat assistants, when the central question was simply how well machines could understand written English.

Sources

Last verified June 7, 2026