FActScore

FActScore, short for Factual precision in Atomicity Score, is a metric for how factual a long-form generated text is, introduced by Sewon Min and colleagues in a May 2023 paper. Instead of judging a whole paragraph as simply true or false, FActScore breaks the text into a series of atomic facts, short statements that each convey one piece of information and can be checked independently. It then computes the percentage of those atomic facts that are supported by a reliable knowledge source such as Wikipedia. The result is a fine-grained precision score rather than a coarse pass or fail.
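The calculation for a single text can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name factual_precision, the KNOWLEDGE set, and the exact-match support check are hypothetical stand-ins, and the atomic facts are assumed to have been split out by hand. In practice, deciding whether a fact is supported means checking it against a source such as Wikipedia, not looking it up in a set.

```python
from typing import Callable, Sequence


def factual_precision(atomic_facts: Sequence[str],
                      is_supported: Callable[[str], bool]) -> float:
    """Return the fraction of atomic facts that the checker marks as supported."""
    if not atomic_facts:
        raise ValueError("cannot score a text with no atomic facts")
    supported = sum(1 for fact in atomic_facts if is_supported(fact))
    return supported / len(atomic_facts)


# Hypothetical stand-in for a knowledge source: statements treated as supported.
KNOWLEDGE = frozenset({
    "Marie Curie was born in Warsaw.",
    "Marie Curie was born in 1867.",
    "Marie Curie won two Nobel Prizes.",
})

# Atomic facts assumed to have been extracted by hand from one generated biography.
atomic_facts = [
    "Marie Curie was born in Warsaw.",
    "Marie Curie was born in 1867.",
    "Marie Curie won two Nobel Prizes.",
    "Marie Curie discovered penicillin.",  # not in the knowledge source
]

# Exact set membership is an illustrative assumption, not a real support check.
score = factual_precision(atomic_facts, lambda fact: fact in KNOWLEDGE)
print(f"{score:.0%}")  # 3 of 4 facts supported -> 75%
```

Passing the support check in as a function keeps the arithmetic separate from the harder question of how support is decided, so a human annotator or an automated checker could be swapped in without changing the scoring.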

The original study applied the method to model-generated biographies of people and found a striking result: ChatGPT scored only about 58 percent factual precision on that task, meaning that on average roughly four in ten of the atomic facts in a biography were not supported. The authors also built an automated estimator, based on retrieval and a strong language model, so that scoring could run without humans checking every fact. They used it to evaluate roughly 6,500 generations from 13 language models and found that GPT-4 and ChatGPT were more factual than the open models of the time, such as Vicuna and Alpaca.

FActScore matters because it changed how hallucination is measured. Counting supported atomic facts is closer to how a careful human fact-checker works, and it makes the difference between a mostly right and a mostly wrong answer visible in a single, comparable number.
