ResNet trains ultra-deep networks

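In December 2015, Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun at Microsoft Research published “Deep Residual Learning for Image Recognition” on arXiv. It addressed a stubborn problem: beyond a certain depth, adding more layers to a plain neural network made it harder to train, and accuracy got worse instead of better. The paper calls this the degradation problem and notes that it is not caused by overfitting, because the deeper networks also had higher training error.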

Their fix was the residual connection, a simple shortcut that adds a block's input directly to its output. The layers in between then only have to learn the change to apply to that input (the residual) rather than the whole transformation from scratch. The paper reports residual networks with depths of up to 152 layers on ImageNet, about eight times deeper than VGG networks but with lower computational complexity. An ensemble of these networks reached 3.57 percent top-5 error on the ImageNet test set and won first place in the ILSVRC 2015 classification task.
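The shortcut can be illustrated with a small sketch. This is a simplified, dependency-free toy, not the paper's implementation: it uses two small weight matrices in place of convolutional layers, omits biases and batch normalization, and the names residual_block, plain_block, matvec, and relu are chosen here purely for illustration. It computes y = relu(F(x) + x) with F(x) = W2 · relu(W1 · x).

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]


def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def plain_block(x, w1, w2):
    """Two stacked layers with no shortcut: relu(W2 · relu(W1 · x))."""
    return relu(matvec(w2, relu(matvec(w1, x))))


def residual_block(x, w1, w2):
    """Same two layers, but the input x is added back before the final ReLU."""
    f = matvec(w2, relu(matvec(w1, x)))  # the learned residual F(x)
    return relu([fi + xi for fi, xi in zip(f, x)])


x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]

# With all-zero weights the residual block passes a non-negative input
# through unchanged, while the plain block wipes it out.
print(residual_block(x, zeros, zeros))  # [1.0, 2.0, 3.0]
print(plain_block(x, zeros, zeros))     # [0.0, 0.0, 0.0]
```

The zero-weight case hints at why the shortcut helps. A residual block can act as a do-nothing layer simply by driving its weights toward zero, so stacking more blocks should not, in principle, make the network worse than a shallower one.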

ResNet mattered because it lifted the practical ceiling on network depth, making networks with more than a hundred layers trainable. The residual connection became a standard building block across the field, and the same idea reappears inside the Transformer architecture that powers modern language models, where each sublayer is wrapped in a residual connection.

Sources

Last verified June 6, 2026