A Hackers' Guide to Language Models
Jeremy Howard's practical tour of how language models work and how to build with them, from the OpenAI API to local fine-tuning.
Firsthand talks and lectures worth your time.
Demis Hassabis's 2025 Cambridge lecture on using AI for science, from AlphaFold to the broader goal of solving intelligence and then using it to solve everything else.
Demis Hassabis's 2024 Nobel Chemistry lecture on AlphaFold and using AI to solve grand scientific problems.
The full documentary on DeepMind's AlphaGo and its historic 2016 match against Go champion Lee Sedol.
Ilya Sutskever's 2023 Simons Institute talk framing unsupervised learning through compression theory to explain why models generalize.
Andrew Ng's 2017 Stanford talk arguing AI will transform industry after industry the way electricity once did, and where it is still limited.
A visual, ground-up explanation of the attention mechanism that lets transformers relate words to one another.
A visual account of how backpropagation distributes blame across a network's weights to compute gradients.
Geoffrey Hinton's 2024 Nobel Physics lecture explaining Hopfield networks and the Boltzmann machine learning algorithm he co-invented.
3Blue1Brown's visual explanation of how a neural network recognizes handwritten digits, layer by layer.
An accessible talk on reverse-engineering the internal computations of neural networks into understandable parts.
A long-form conversation with Anthropic's CEO on scaling, safety, interpretability, and where powerful AI is heading.
A three-and-a-half-hour deep dive through the full training stack of the models that power ChatGPT.
Jeff Dean's 2024 Rice lecture on how better algorithms and ML hardware enabled the Gemini models, with applications in science and health.
Hinton's Cambridge lecture arguing that digital intelligence may have advantages over biological brains, and the risks that follow.
A visual explanation of how a neural network adjusts its weights by stepping downhill along the slope of a cost function.
Fei-Fei Li's 2015 TED talk on ImageNet and the effort to give computers the ability to understand images.
Rob Miles gives an accessible introduction to AI safety and why aligning capable systems with human intent is hard.
A one-hour, general-audience tour of how large language models are trained, what they can do, and where their risks lie.
François Chollet's AGI-24 keynote arguing that scaling LLMs will not reach general intelligence, and that abstraction is the missing piece.
John Carmack's 2025 Upper Bound keynote on his path to AGI, including a robot that learns to play a real Atari console with a camera.
A live-coded build of a small GPT, implementing a transformer step by step until it produces working text.
Andrej Karpathy builds a byte-pair-encoding tokenizer from scratch and shows why tokenization causes many LLM quirks.
A four-hour live build that reproduces OpenAI's 124M-parameter GPT-2 from scratch, training it along the way.
The opening lecture of MIT's intensive intro deep learning course, covering neurons, training, and sequence models.
Juergen Schmidhuber's keynote tracing modern AI from his 1980s and 1990s work on neural nets, LSTM, and self-supervised learning to the present.
Josh Starmer's StatQuest explainer walks step by step through how backpropagation adjusts a network's weights.
Josh Starmer's StatQuest explainer builds intuition for how a neural network bends and combines simple curves to fit data.
Yann LeCun's Harvard lecture arguing that today's LLMs are not the path to human-level AI, and proposing world models instead.
Andrew Ng's Stanford talk on where the real opportunities in AI are and how to build with them.
Noam Brown traces how search and planning let AI master poker and Diplomacy, and why test-time reasoning matters for LLMs.
Sebastian Raschka codes a small GPT-style model end to end, then loads pretrained weights and fine-tunes it.
An interview in which RL pioneer Richard Sutton argues that learning from experience, not imitation, is the path to real intelligence.
David Silver's first lecture in the classic DeepMind/UCL course, introducing reinforcement learning and defining rewards, agents, states, and the RL problem.
Karpathy's argument that LLMs are a new kind of computer programmed in English, ushering in Software 3.0.
The opening lecture of Stanford's NLP course, introducing word vectors and the word2vec algorithm.
Fei-Fei Li opens Stanford's computer vision course with the history of vision and the rise of deep learning.
A Microsoft Build keynote walking through how GPT assistants are trained, from pretraining to RLHF, and how to use them well.
Yoshua Bengio's 2025 TED talk warning that advanced AI is learning to deceive and self-preserve, and proposing a non-agentic safer path.
A from-scratch build of a tiny autograd engine, spelling out backpropagation one operation at a time.
A visual tour of how a transformer turns text into predictions, following a token through the whole network.
Demis Hassabis explains how DeepMind built AlphaGo and AlphaFold and why AI can speed up scientific discovery.
Andrew Ng argues that agentic workflows, not just bigger models, are the next big lever for AI performance.
Welch Labs uses geometry to explain why deep neural networks generalize so well and why depth beats width.
Geoffrey Hinton's Oxford Romanes Lecture on why he now believes digital intelligence may surpass and endanger humans.
Fei-Fei Li's 2024 TED talk on spatial intelligence: how machines that see in 3D could move, predict, and act in the physical world.