The Reversal Curse

“The Reversal Curse: LLMs trained on ‘A is B’ fail to learn ‘B is A’” was submitted to arXiv on September 21, 2023 by Lukas Berglund, Owain Evans, and colleagues. It documents a surprisingly basic failure of generalization in large language models: learning a fact in one direction does not automatically teach the model the same fact in reverse.

The experiments fine-tuned models such as GPT-3 and Llama-1 on invented statements of the form “A is B”, for example that a fictional person holds some fictional title. The models learned these forward statements well but could not answer the reversed question at better than chance, even though the information is logically identical. The authors then checked real-world knowledge: GPT-4 correctly named a celebrity’s parent about 79 percent of the time when asked in the celebrity-to-parent direction, but could identify the celebrity from the parent only about 33 percent of the time.
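The shape of such a test can be sketched in a few lines of Python. This is only an illustration of the forward-versus-reverse comparison, not the authors’ code: the facts are hypothetical placeholders, the function names `make_probes` and `accuracy` are assumptions, and `stub_model` is a stand-in that has memorized only the forward prompts, so the numbers it produces are not measurements of any real model.

```python
# Hypothetical placeholder facts; these are NOT the invented names used in the paper.
FACTS = [
    ("Person A", "the holder of title X"),
    ("Person B", "the holder of title Y"),
]

def make_probes(facts):
    """Build a forward and a reversed completion probe for each (name, description) fact."""
    probes = []
    for name, description in facts:
        capitalized = description[0].upper() + description[1:]
        probes.append({
            "forward": (f"{name} is", description),   # prompt, expected answer
            "reverse": (f"{capitalized} is", name),
        })
    return probes

def accuracy(probes, direction, answer_fn):
    """Fraction of probes whose expected answer appears in the model's reply."""
    hits = 0
    for probe in probes:
        prompt, expected = probe[direction]
        if expected.lower() in answer_fn(prompt).lower():
            hits += 1
    return hits / len(probes)

probes = make_probes(FACTS)

# Stand-in for a fine-tuned model: it can only complete prompts it saw verbatim.
memorized = {prompt: answer for prompt, answer in (p["forward"] for p in probes)}

def stub_model(prompt):
    return memorized.get(prompt, "")

forward_acc = accuracy(probes, "forward", stub_model)
reverse_acc = accuracy(probes, "reverse", stub_model)
print(f"forward: {forward_acc:.0%}, reverse: {reverse_acc:.0%}")
```

In a real evaluation, `stub_model` would be replaced by a call to the model under test, and the same two accuracy figures would be compared.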

The finding is striking because the relationship is symmetric to any human: if Valentina Tereshkova was the first woman in space, then the first woman in space was Valentina Tereshkova. The curse appears to reflect how next-token prediction stores associations directionally rather than as bidirectional facts. Notably, it applies to knowledge baked in during training; when the fact is present in the immediate context, models can reverse it without trouble.
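A toy example can make the directional-storage idea concrete. The sketch below is not a language model and makes no claim about how real models work internally; it is a deliberately simple left-to-right lookup table (the names `train` and `complete` are assumptions for illustration) that records only which word follows each prefix it has seen.

```python
from collections import Counter, defaultdict

def train(sentences):
    """Count which token follows each prefix, reading strictly left to right."""
    table = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(1, len(tokens)):
            table[tuple(tokens[:i])][tokens[i]] += 1
    return table

def complete(table, prompt, max_tokens=10):
    """Greedily extend the prompt; return None if nothing can be generated."""
    tokens = prompt.split()
    generated = []
    for _ in range(max_tokens):
        followers = table.get(tuple(tokens))
        if not followers:
            break  # this prefix was never seen during training
        next_token = followers.most_common(1)[0][0]
        tokens.append(next_token)
        generated.append(next_token)
    return " ".join(generated) or None

table = train(["Valentina Tereshkova was the first woman in space"])

forward = complete(table, "Valentina Tereshkova was")
reverse = complete(table, "The first woman in space was")
print(forward)
print(reverse)
```

The forward prompt matches a stored prefix and is completed, while the reversed prompt matches nothing, even though both express the same fact. Real models generalize far better than this table, but the asymmetry has the same flavor.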

Why business readers should care: the reversal curse is a concrete reminder that language models do not store facts the way a database or a person does. Systems that depend on reliable factual recall need retrieval or structured knowledge rather than trust that the model will reason symmetrically.
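As a rough illustration of what “structured knowledge” buys, the sketch below stores each fact once and indexes it in both directions, so a lookup works whichever side the question starts from. The `FactStore` class and its method names are hypothetical, chosen for this example only; a production system would more likely use a database or a retrieval index.

```python
from collections import defaultdict

class FactStore:
    """Minimal store of (subject, relation, object) facts, indexed both ways."""

    def __init__(self):
        self._by_subject = defaultdict(set)  # (subject, relation) -> objects
        self._by_object = defaultdict(set)   # (relation, object) -> subjects

    def add(self, subject, relation, obj):
        # One write updates both indexes, so the two directions cannot drift apart.
        self._by_subject[(subject, relation)].add(obj)
        self._by_object[(relation, obj)].add(subject)

    def objects(self, subject, relation):
        return set(self._by_subject.get((subject, relation), ()))

    def subjects(self, relation, obj):
        return set(self._by_object.get((relation, obj), ()))

store = FactStore()
store.add("Valentina Tereshkova", "was", "the first woman in space")

print(store.objects("Valentina Tereshkova", "was"))
print(store.subjects("was", "the first woman in space"))
```

The retrieved fact can then be placed in the model’s prompt, which is the setting where reversal is not a problem.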
