Sparks of Artificial General Intelligence: Early experiments with GPT-4

“Sparks of Artificial General Intelligence: Early experiments with GPT-4” was submitted to arXiv on March 22, 2023 by a Microsoft Research team led by Sébastien Bubeck, with co-authors including Eric Horvitz and Yin Tat Lee. The paper studied an early, unreleased version of GPT-4 and argued, in its words, that the model could “reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system.”

Instead of scoring the model on standard benchmarks, the authors fill the 150-plus-page report with qualitative probes across mathematics, coding, vision (via text descriptions), medicine, law, and psychology. They highlight tasks they consider novel and hard, such as drawing a unicorn in the TikZ graphics language, stacking objects of different shapes, and solving puzzles that require planning, and conclude that GPT-4’s performance is “strikingly close to human-level” and goes beyond memorization.

The paper became one of the most discussed and most criticized AI documents of its era. Critics objected that it was authored by researchers at a company heavily invested in OpenAI, that the tested model was not the public release, that “sparks of AGI” is not a measurable claim, and that cherry-picked successes say little without systematic failure analysis. The paper itself acknowledges GPT-4’s limitations, including unreliable arithmetic, planning failures, and confident hallucination.

Why business readers should care: the paper crystallized the 2023 argument over whether large language models are approaching general intelligence or are impressive pattern matchers, a question that still drives investment and policy.

Last verified June 7, 2026