Transformer

The Transformer is a neural network architecture introduced in the 2017 paper “Attention Is All You Need”. Its defining idea is to process sequences using only attention mechanisms - layers that let every position in a sequence directly weigh its relationship to every other position - instead of the step-by-step recurrence used by earlier models.

Why it matters in practice: because attention has no sequential dependency during training, Transformers parallelize across modern hardware far better than recurrent networks. That efficiency is what made today’s large language models economically and technically feasible. When you use a modern AI assistant, you are using a descendant of this architecture.

For business readers: “Transformer” is the answer to “what actually changed in 2017 that led to ChatGPT?” The architecture scaled where its predecessors could not, and the past decade of AI progress has largely been the story of scaling it up.

Sources

Last verified June 6, 2026