Batch Normalization: Accelerating Deep Network Training

“Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift” was submitted to arXiv on February 11, 2015 by Sergey Ioffe and Christian Szegedy at Google. It introduced a technique that became a near-universal ingredient in deep neural networks because of how much it eased training.

The authors framed the problem as “internal covariate shift”: as the parameters of early layers change during training, the distribution of inputs arriving at later layers keeps shifting, forcing those layers to readjust constantly. Their fix is to insert a normalization step into the network itself. For each mini-batch of training examples, the activations at a given layer are recentered and rescaled to zero mean and unit variance, and a learnable scale and shift then let the network recover whatever scale and offset it actually needs.
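
The normalization step described above can be sketched in a few lines. This is a minimal pure-Python illustration of the training-time forward pass, not the authors' implementation. The function name `batch_norm_train`, the names `gamma` and `beta` for the learnable scale and shift, and the small constant `eps` added to the variance for numerical stability are assumptions made for this example.

```python
import math


def batch_norm_train(batch, gamma, beta, eps=1e-5):
    """Normalize each feature over a mini-batch, then scale and shift it.

    batch: list of examples, each a list of feature activations.
    gamma, beta: per-feature scale and shift (learned during training;
                 passed in as plain lists here for illustration).
    eps: small constant that keeps the division stable (assumed value).
    """
    n = len(batch)
    num_features = len(batch[0])
    out = [[0.0] * num_features for _ in range(n)]
    for j in range(num_features):
        column = [example[j] for example in batch]
        mean = sum(column) / n
        # Mini-batch variance, dividing by n.
        var = sum((x - mean) ** 2 for x in column) / n
        inv_std = 1.0 / math.sqrt(var + eps)
        for i, x in enumerate(column):
            x_hat = (x - mean) * inv_std  # approximately zero mean, unit variance
            out[i][j] = gamma[j] * x_hat + beta[j]
    return out


# Two features on very different scales end up on a comparable scale.
activations = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
normalized = batch_norm_train(activations, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

With `gamma` set to 1 and `beta` set to 0, each feature column of `normalized` has a mean of about zero and a variance of about one. Other values of `gamma` and `beta` rescale and shift the normalized activations. The sketch covers only training: at inference time the paper replaces mini-batch statistics with estimates computed over the training population.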

The practical payoff was large. Batch normalization let networks train with much higher learning rates, made them far less sensitive to how weights were initialized, and added a mild regularizing effect that sometimes removed the need for dropout. The paper reported reaching the same image classification accuracy with about fourteen times fewer training steps, and an ensemble of batch-normalized networks reached 4.9% top-5 validation error on ImageNet, improving on the best published result at the time.

For a general reader, batch normalization shows how an engineering insight, keeping the signal flowing through a deep network well-behaved at every layer, can unlock depths and training speeds that were previously impractical, and it remains a standard tool more than a decade later.
