In this nearly two-hour lecture on his own channel, Andrej Karpathy builds a small generatively pretrained transformer from scratch in code, spelling out each step. Following the architecture of the “Attention Is All You Need” paper and OpenAI’s GPT-2 and GPT-3, he trains a character-level model on a tiny Shakespeare dataset and ends up with the core of his nanoGPT project.
The lecture deliberately goes deep instead of staying at a high-level overview. Karpathy implements the attention mechanism, the other building blocks of the transformer, and the training loop, pausing to explain why each piece exists and how it connects to the next. By the end, the viewer has watched a working language model come together line by line.
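To give a flavor of the central piece, here is a minimal sketch of single-head causal self-attention. It is not code from the lecture, which works in PyTorch. It uses plain Python lists, and the function names and the assumption that the queries, keys, and values have already been projected are illustrative choices made here.

```python
import math

def softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(q, k, v):
    """Single-head causal attention over T positions.

    q, k, v are lists of T vectors (lists of floats), assumed to be
    already projected. Returns (outputs, weights).
    """
    T = len(q)
    scale = 1.0 / math.sqrt(len(q[0]))  # scaled dot-product attention
    outputs, weights = [], []
    for t in range(T):
        # Position t may only look at positions 0..t (the causal mask).
        scores = [
            scale * sum(a * b for a, b in zip(q[t], k[s]))
            for s in range(t + 1)
        ]
        w = softmax(scores)
        # Pad with zeros so each row has length T, like a masked matrix.
        weights.append(w + [0.0] * (T - t - 1))
        # Output is the attention-weighted average of the visible values.
        outputs.append([
            sum(w[s] * v[s][j] for s in range(t + 1))
            for j in range(len(v[0]))
        ])
    return outputs, weights

# Hypothetical toy input: three positions, two features each.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = causal_self_attention(x, x, x)
```

Each row of `w` sums to one and is zero to the right of the diagonal, so every position mixes information only from itself and earlier positions. That restriction is what lets a model of this kind be trained to predict the next character.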
Karpathy is the right teacher for the material. He designed and taught Stanford’s first deep learning course and is known for making hard ideas legible. For a technically inclined viewer who wants to move past analogies and see exactly how a GPT is constructed, the lecture is among the most respected resources available.