Faith and Fate: Limits of Transformers on Compositionality

“Faith and Fate: Limits of Transformers on Compositionality” is a May 2023 paper led by Nouha Dziri, with senior authors including Yejin Choi. It asks why transformer language models can ace many hard-looking tasks yet stumble on problems that seem simple, such as multiplying multi-digit numbers. The team studied three compositional tasks: multi-digit multiplication, logic grid puzzles, and a dynamic-programming problem. They represented each task as a computation graph so they could quantify how many reasoning steps a correct solution requires.
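As a rough illustration of the computation-graph framing, the sketch below builds a graph for schoolbook multiplication and measures its size and depth. It is a simplified construction for non-negative integers, not the paper's exact graph; the names `multiplication_graph` and `depth` and the node labels are hypothetical.

```python
def multiplication_graph(x: int, y: int):
    """Build a schoolbook-multiplication graph for non-negative x and y.

    Returns (value, parents): value maps each node name to its computed
    number, and parents maps it to the nodes it depends on.
    """
    xs = [int(d) for d in reversed(str(x))]  # least significant digit first
    ys = [int(d) for d in reversed(str(y))]
    value, parents = {}, {}

    def add(name, val, deps=()):
        value[name], parents[name] = val, list(deps)
        return name

    # Input nodes: one per digit of each operand.
    for i, d in enumerate(xs):
        add(f"x{i}", d)
    for j, d in enumerate(ys):
        add(f"y{j}", d)

    carry, digits = None, []
    for k in range(len(xs) + len(ys) - 1):
        deps = []
        # Single-digit products that land in output column k.
        for i in range(len(xs)):
            j = k - i
            if 0 <= j < len(ys):
                deps.append(add(f"p{i}_{j}", xs[i] * ys[j], (f"x{i}", f"y{j}")))
        if carry is not None:
            deps.append(carry)  # carry from the previous column
        s = add(f"s{k}", sum(value[d] for d in deps), deps)
        digits.append(add(f"d{k}", value[s] % 10, (s,)))
        carry = add(f"c{k}", value[s] // 10, (s,))

    # The answer node combines every output digit with the final carry.
    total = value[carry] * 10 ** len(digits)
    total += sum(value[d] * 10 ** k for k, d in enumerate(digits))
    add("answer", total, digits + [carry])
    return value, parents


def depth(parents, node, memo=None):
    """Length of the longest path from any input node to `node`."""
    memo = {} if memo is None else memo
    if node not in memo:
        memo[node] = 1 + max(
            (depth(parents, p, memo) for p in parents[node]), default=-1
        )
    return memo[node]


for x, y in [(7, 8), (12, 34), (1234, 5678)]:
    value, parents = multiplication_graph(x, y)
    print(x, y, value["answer"], len(parents), depth(parents, "answer"))
```

In this sketch both the node count and the depth grow with the number of digits, which is the kind of complexity measure the graph representation makes available.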

Their conclusion is that transformers do not reason compositionally in the way the tasks demand. Instead, the models reduce multi-step reasoning to what the authors call linearized subgraph matching: they recognize and stitch together patterns they saw during training rather than executing the underlying procedure. This works when a problem resembles training examples but breaks down as the required depth of composition grows, which explains the abrupt failures on larger instances. The paper also shows that performance degrades predictably with problem complexity, undercutting the hope that more scale alone would fix it.
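One way to see why deeper composition hurts is a toy model of compounding error. The sketch below assumes every step is correct independently with the same probability, so an n-step chain is fully correct with probability p to the power n. The independence assumption and the function name `chain_accuracy` are illustrative choices, not the paper's formal argument.

```python
def chain_accuracy(per_step_accuracy: float, steps: int) -> float:
    """Probability that all `steps` steps are correct.

    Assumes each step succeeds independently with the same probability,
    which is a simplification made for illustration.
    """
    return per_step_accuracy ** steps


# Even a 99%-reliable step loses most of its reliability over long chains.
for steps in (1, 5, 10, 50, 100):
    print(steps, round(chain_accuracy(0.99, steps), 3))
```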

This work matters because it offers a mechanistic explanation, not just a benchmark score, for a recurring pattern: models that look like they reason often pattern-match. For practitioners, it argues for caution whenever a task requires reliable many-step computation, and for verifying outputs rather than assuming the model worked the problem the way a person would.
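For a task like multiplication, verifying an output can be as simple as recomputing the answer exactly. The sketch below is one minimal way to do that, assuming the model's reply is a string whose last integer is its final answer; `check_multiplication` and that parsing convention are hypothetical, not something the paper prescribes.

```python
import re


def check_multiplication(x: int, y: int, model_output: str) -> bool:
    """Return True only if the last integer in model_output equals x * y.

    Assumes the final answer is the last number in the text and that
    commas inside a number are thousands separators.
    """
    numbers = re.findall(r"-?\d[\d,]*", model_output)
    if not numbers:
        return False  # no number to check, so treat the reply as unverified
    claimed = int(numbers[-1].replace(",", ""))
    return claimed == x * y


print(check_multiplication(12, 34, "12 * 34 = 408"))
print(check_multiplication(12, 34, "The answer is 418."))
```

The check treats a missing or unparseable answer as a failure, so an output is accepted only when it matches the exact result.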

Last verified June 7, 2026