TruthfulQA

TruthfulQA was introduced in “TruthfulQA: Measuring How Models Mimic Human Falsehoods,” submitted to arXiv on September 8, 2021 by Stephanie Lin, Jacob Hilton, and Owain Evans, and published at ACL 2022. The benchmark tests not obscure knowledge but whether a language model repeats common human falsehoods.

The dataset contains 817 questions spanning 38 categories including health, law, finance, and politics. The questions are deliberately adversarial: each is crafted so that some people would answer it wrongly because of a popular misconception, superstition, or conspiracy theory. The point is to see whether a model, trained on human text full of such errors, reproduces them or resists them.

The signature finding was counterintuitive. On most benchmarks bigger models do better, but on TruthfulQA the largest models were generally the least truthful. The authors' explanation was that scale makes a model better at imitating its training data, including the false statements common in it. The best model in the original study was truthful on only about 58 percent of questions, against 94 percent for humans. The authors concluded that scaling up alone is not a reliable path to truthfulness and that fine-tuning with training objectives other than imitating web text is more promising.
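The percentages quoted here are proportions over per-question judgments. The following is a minimal sketch of that arithmetic, assuming each answer has already been labeled truthful or not by some judge. The function names and sample records are hypothetical and are not taken from the paper or its released code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def truthful_rate(judgments: List[bool]) -> float:
    """Return the fraction of answers judged truthful."""
    if not judgments:
        raise ValueError("no judgments supplied")
    return sum(judgments) / len(judgments)


def rate_by_category(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Group (category, is_truthful) pairs and compute a rate per category."""
    grouped: Dict[str, List[bool]] = defaultdict(list)
    for category, is_truthful in records:
        grouped[category].append(is_truthful)
    return {category: truthful_rate(flags) for category, flags in grouped.items()}


# Hypothetical judgments for four answers; these are not real results.
sample = [
    ("health", True),
    ("health", False),
    ("law", True),
    ("finance", True),
]

overall = truthful_rate([flag for _, flag in sample])
print(f"overall: {overall:.0%}")
print(rate_by_category(sample))
```

Breaking the rate down by category is one way to see whether a model's falsehoods cluster in particular topics rather than being spread evenly across the benchmark.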

TruthfulQA became a standard component of model evaluation suites and a touchstone in the debate over hallucination and factuality, often reported alongside MMLU and similar benchmarks. Like all static benchmarks, it is vulnerable to contamination once its questions appear in training data.
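One common heuristic for spotting contamination is to measure how many of a question's word n-grams also occur in a training document. The sketch below illustrates that idea only; it is not a method described in the TruthfulQA paper. The function names, the choice of trigrams, and the example strings are all assumptions made for illustration.

```python
def word_ngrams(text: str, n: int) -> set:
    """Return the set of lowercase word n-grams in text (whitespace tokenized)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_fraction(question: str, document: str, n: int = 3) -> float:
    """Fraction of the question's n-grams that also appear in the document."""
    question_grams = word_ngrams(question, n)
    if not question_grams:
        return 0.0  # question shorter than n words: nothing to compare
    document_grams = word_ngrams(document, n)
    return len(question_grams & document_grams) / len(question_grams)


# Made-up strings for illustration; not drawn from the benchmark or any corpus.
question = "does reading in dim light damage your eyes"
leaked_doc = "quiz item: does reading in dim light damage your eyes answer below"
clean_doc = "the committee met on tuesday to discuss the annual budget"

print(overlap_fraction(question, leaked_doc))
print(overlap_fraction(question, clean_doc))
```

A high overlap only flags a document for closer inspection. Paraphrased or translated copies of a question would evade this check, and short common phrases can produce spurious matches.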
