The Winograd Schema Challenge was proposed by Hector Levesque, Ernest Davis, and Leora Morgenstern at the 2012 Conference on Principles of Knowledge Representation and Reasoning (KR-2012), as a sharper, harder-to-game alternative to the Turing test. It is named after Terry Winograd, whose early-1970s work on natural-language understanding used the kind of ambiguous sentence the challenge is built on.
A Winograd schema is a pair of sentences that differ by one or two words and contain an ambiguous pronoun whose correct referent flips depending on that change. The canonical example is “The city councilmen refused the demonstrators a permit because they feared/advocated violence”: with “feared,” “they” means the councilmen; with “advocated,” it means the demonstrators. Resolving the pronoun requires world knowledge about who would fear and who would advocate, not just grammar.
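The twin-sentence structure can be made concrete with a small data model. The sketch below is illustrative only: the field names (`template`, `special_word`, `alternate_word`, `answers`) and the `word_blind_resolver` baseline are hypothetical and do not reproduce any official dataset format. It expands the councilmen schema into its two sentences and shows that a resolver ignoring the changed word must give the same answer to both, so it can be right on at most one of the pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WinogradSchema:
    """One schema: a sentence template with a slot for the differing word.

    Field names are hypothetical, chosen for illustration only.
    """
    template: str                 # sentence with "{word}" where the twins differ
    pronoun: str                  # the ambiguous pronoun
    candidates: tuple[str, str]   # the two possible referents
    special_word: str             # first variant of the differing word
    alternate_word: str           # second variant of the differing word
    answers: tuple[str, str]      # correct referent for each variant, in order

    def twins(self) -> list[tuple[str, str]]:
        """Expand the schema into its two sentences, each paired with its answer."""
        return [
            (self.template.format(word=self.special_word), self.answers[0]),
            (self.template.format(word=self.alternate_word), self.answers[1]),
        ]


councilmen = WinogradSchema(
    template=("The city councilmen refused the demonstrators a permit "
              "because they {word} violence."),
    pronoun="they",
    candidates=("the city councilmen", "the demonstrators"),
    special_word="feared",
    alternate_word="advocated",
    answers=("the city councilmen", "the demonstrators"),
)


def word_blind_resolver(sentence: str, schema: WinogradSchema) -> str:
    # Hypothetical baseline: ignores the changed word and always picks the
    # first-mentioned candidate, so it answers both twins identically.
    return schema.candidates[0]


correct = sum(
    word_blind_resolver(sentence, councilmen) == answer
    for sentence, answer in councilmen.twins()
)
print(f"{correct}/2")  # prints "1/2"
```

Because the correct referent flips between the twins, any answer that does not depend on the changed word is wrong on one of them; only knowledge of what "feared" and "advocated" imply can get both right.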
The challenge was deliberately designed so that the answers are obvious to humans but resistant to shortcuts: the sentence pairs are constructed to defeat simple statistical word-association tricks, forcing a system to actually reason about the situation. That made it, for several years, a respected probe of machine commonsense.
Large language models eventually reached near-human accuracy on the original schema set. That result demonstrated their progress, but it also exposed a recurring problem in AI evaluation: once a benchmark is public and models are trained on web-scale text, contamination and pattern-matching can inflate scores without proving genuine reasoning. The challenge's history is now a standard cautionary tale about how quickly a hard benchmark can be saturated.