GPT detectors falsely flag non-native English writers

After ChatGPT’s launch, a market of “AI detector” tools sprang up, promising to tell whether a piece of writing was produced by a human or a language model, and schools began using them to police student work. In a paper first posted to arXiv on April 6, 2023, and later published in the journal Patterns, Weixin Liang and colleagues at Stanford tested seven widely used GPT detectors and found a serious, systematic bias.

The detectors consistently misclassified writing by non-native English speakers as AI-generated. On a set of TOEFL essays written by non-native speakers, the detectors wrongly flagged more than half as machine-written on average, while they classified essays by US students with near-perfect accuracy. The authors traced the bias to the detectors’ reliance on “perplexity,” a measure of how predictable a text is to a language model: non-native writers tend to use simpler, more common vocabulary, which yields low perplexity that the tools mistake for machine output. The authors also showed that simple prompting could fool the detectors in the opposite direction, letting machine-generated text pass as human once it was rewritten with more sophisticated wording.
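To make the perplexity mechanism concrete, here is a minimal sketch of a perplexity-threshold rule. It assumes a language model has already assigned a probability to each token of a text; the per-token probabilities and the threshold of 10.0 below are made-up illustrative values, not data from the paper or settings from any real detector, and `flag_as_ai` is a hypothetical function name.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence: exp of the average negative log-probability
    that a language model assigned to each token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


def flag_as_ai(token_probs, threshold=10.0):
    """Hypothetical detector rule: low perplexity (predictable text) => 'AI'.
    The default threshold is an illustrative assumption."""
    return perplexity(token_probs) < threshold


# Made-up per-token probabilities, for illustration only.
plain_wording = [0.5, 0.4, 0.6, 0.5, 0.3]        # common, predictable word choices
varied_wording = [0.05, 0.02, 0.1, 0.03, 0.08]   # rarer, less predictable words

for name, probs in [("plain", plain_wording), ("varied", varied_wording)]:
    print(name, round(perplexity(probs), 2), flag_as_ai(probs))
```

A rule like this sees only how predictable the words are, not who chose them, so a human writing with a limited vocabulary lands on the "AI" side of the threshold just as machine text does.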

The paper concluded with a warning against deploying such detectors in evaluative or educational settings, where they may unfairly penalize or exclude non-native English speakers.

The lesson is concrete and immediate: a tool that is wrong in a biased way is worse than no tool, because it concentrates harm on an already disadvantaged group. Institutions reaching for an automated shortcut to a hard judgment (did a human write this?) should first ask who pays for the errors.
