PAL: Program-Aided Language Models

“PAL: Program-aided Language Models,” submitted to arXiv on November 18, 2022 by Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, and Graham Neubig, addressed a stubborn weakness of chain-of-thought reasoning: language models are good at planning a solution but unreliable at the arithmetic in the middle.

PAL splits those two jobs. The model reads a natural-language problem and writes a program - typically Python - as its intermediate reasoning steps, but it does not execute that program itself. The actual computation is handed to a Python interpreter. The neural model handles understanding and decomposition; the interpreter handles exact symbolic and arithmetic operations that language models frequently get wrong. This is a concrete instance of the broader pattern of letting a model call external tools rather than simulating them in its head.
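The division of labor can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the string GENERATED_PROGRAM is a hard-coded stand-in for what a code-generating model might write for a simple word problem, and the helper run_generated_program is a hypothetical name introduced here. The point is only that the final number comes from the interpreter, not from the model.

```python
# Hypothetical stand-in for model output. In a real PAL-style setup this text
# would be produced by a code-generating language model from the word problem:
# "Olivia has $23. She buys five bagels at $3 each. How much money is left?"
GENERATED_PROGRAM = '''
def solution():
    # Each variable mirrors one step of the natural-language reasoning.
    money_initial = 23
    bagels = 5
    bagel_cost = 3
    money_spent = bagels * bagel_cost
    money_left = money_initial - money_spent
    return money_left
'''


def run_generated_program(program_text: str):
    """Execute model-written code and return the result of its solution().

    Assumption: the generated program defines a zero-argument solution().
    """
    namespace = {}
    # exec keeps the sketch short; untrusted model output should be run in a
    # sandboxed process rather than in the host interpreter.
    exec(program_text, namespace)
    return namespace["solution"]()


answer = run_generated_program(GENERATED_PROGRAM)
print(answer)  # prints 8
```

The model's job ends once the program text exists. Whether 23 - 5 * 3 is evaluated correctly no longer depends on the model, although the program itself can still encode the wrong plan.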

Using the Codex model, PAL set state-of-the-art few-shot accuracy on the GSM8K math word-problem benchmark, beating chain-of-thought prompting on the 540-billion-parameter PaLM model by 15 absolute percentage points in top-1 accuracy.

Why business readers should care: PAL is an early, clear demonstration of why connecting a language model to a calculator or code runtime beats trusting it to compute on its own. The principle - let the model reason, let a deterministic tool execute - is now standard in production AI systems that need correct numbers.

Last verified June 7, 2026