This is chapter two of 3Blue1Brown’s deep learning series, and it tackles the central question of how a neural network learns. Grant Sanderson introduces a cost function that measures how wrong the network’s outputs are, then shows how the network reduces that cost by repeatedly nudging its weights and biases in the direction that makes the cost fall fastest.
The video gives a geometric picture of gradient descent: imagine standing on a hilly surface and repeatedly stepping downhill. Sanderson connects this image to the gradient of the cost taken over the network’s thousands of parameters, and explains why the process is a search for a good configuration, a local minimum of the cost, rather than a guaranteed path to the best possible one.
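The downhill-stepping idea can be sketched in a few lines of Python. This is a minimal illustration, not code from the video: it assumes a toy model with only two parameters (a line `w * x + b`) and a mean-squared-error cost, and the names `cost`, `gradient`, `gradient_descent`, `learning_rate`, and `steps` are all hypothetical choices made for this example.

```python
def cost(w, b, data):
    """Mean squared error of the line w*x + b over (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def gradient(w, b, data):
    """Partial derivatives of the cost with respect to w and b."""
    n = len(data)
    dw = sum(2 * (w * x + b - y) * x for x, y in data) / n
    db = sum(2 * (w * x + b - y) for x, y in data) / n
    return dw, db


def gradient_descent(data, learning_rate=0.05, steps=2000):
    """Start at (0, 0) and repeatedly step against the gradient."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        dw, db = gradient(w, b, data)
        # The gradient points uphill, so subtract it to move downhill.
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b


# Toy data lying exactly on the line y = 2x + 1 (an assumption for the demo).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = gradient_descent(data)
print(w, b, cost(w, b, data))
```

On this toy problem the cost surface is a single bowl, so the steps settle close to `w = 2`, `b = 1`. A real network has thousands of parameters and a far bumpier surface, which is why the same loop there only finds a good configuration rather than a guaranteed best one.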
Gradient descent is the workhorse optimization method behind almost all of deep learning, and understanding it intuitively makes the rest of the field far less mysterious. For a beginner who wants to grasp what “training” really means without first learning multivariable calculus, this is one of the best available explanations, and it sets up the channel’s later chapter on backpropagation.