Going Deeper with Convolutions (GoogLeNet / Inception)

“Going Deeper with Convolutions” was submitted to arXiv in September 2014 by Christian Szegedy and colleagues at Google, along with collaborators from the University of North Carolina and the University of Michigan. It introduced the network they called GoogLeNet, built from a repeating unit named the Inception module, and it won the classification track of the 2014 ImageNet challenge.

The core idea was to stop choosing a single filter size for each layer. Inside an Inception module, the network applies 1 by 1, 3 by 3, and 5 by 5 convolutions and a pooling operation in parallel to the same input, then concatenates the results along the channel dimension, letting the model capture features at several scales at once. The trick that made this affordable was using 1 by 1 convolutions to shrink the number of channels before the expensive larger filters ran, which kept the computational cost in check even as the network grew to 22 layers. The authors framed this as getting more out of a fixed compute budget rather than simply piling on parameters.
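To make the savings concrete, here is a rough back-of-the-envelope sketch in Python. It counts multiply-accumulate operations (MACs) for one Inception-style module, once with the larger filters applied directly and once with 1 by 1 reductions in front of them. The input size (28 by 28 with 192 channels) and the per-branch channel counts are assumptions picked to resemble the first Inception module in the paper, and the function and variable names are hypothetical. Treat it as an illustration of the arithmetic, not as the authors' implementation.

```python
# Illustrative arithmetic only: compares the cost of one Inception-style
# module with and without 1x1 channel reduction. All sizes below are
# assumptions chosen to resemble the paper's first Inception module.

def conv_macs(height, width, c_in, c_out, kernel):
    """MACs for a stride-1, same-padded convolution (bias ignored)."""
    return height * width * c_in * c_out * kernel * kernel

H, W, C_IN = 28, 28, 192  # assumed input feature map: 28 x 28 x 192

# Naive module: every filter size reads all 192 input channels directly,
# and the pooling branch passes all 192 channels through unchanged.
naive_macs = (
    conv_macs(H, W, C_IN, 64, 1)     # 1x1 branch
    + conv_macs(H, W, C_IN, 128, 3)  # 3x3 branch
    + conv_macs(H, W, C_IN, 32, 5)   # 5x5 branch
)
naive_channels = 64 + 128 + 32 + C_IN  # branches are concatenated

# Reduced module: 1x1 convolutions shrink the channel count before the
# 3x3 and 5x5 filters, and a 1x1 projection follows the pooling branch.
reduced_macs = (
    conv_macs(H, W, C_IN, 64, 1)                                 # 1x1 branch
    + conv_macs(H, W, C_IN, 96, 1) + conv_macs(H, W, 96, 128, 3)  # reduce, then 3x3
    + conv_macs(H, W, C_IN, 16, 1) + conv_macs(H, W, 16, 32, 5)   # reduce, then 5x5
    + conv_macs(H, W, C_IN, 32, 1)                               # projection after pooling
)
reduced_channels = 64 + 128 + 32 + 32  # branches are concatenated

print(f"naive:   {naive_macs:,} MACs, {naive_channels} output channels")
print(f"reduced: {reduced_macs:,} MACs, {reduced_channels} output channels")
print(f"saving:  {naive_macs / reduced_macs:.2f}x fewer MACs")
```

Under these assumed sizes the reduced module does less than half the arithmetic of the naive one. It also emits fewer channels, so the output does not keep widening as modules are stacked.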

GoogLeNet mattered because it stood in contrast to the parameter-heavy design of contemporaries like VGG. It showed that careful architectural design could deliver top accuracy with far fewer parameters (the authors reported about 12 times fewer than AlexNet, the 2012 winner), and the Inception family went through several refinements in the following years. The name itself was a nod to the “we need to go deeper” internet meme, a rare bit of humor in a foundational paper.

For a business reader, Inception is an early example of an idea that recurs throughout AI: efficiency is itself a feature, and the cleverest design often beats the biggest one when compute and deployment cost are real constraints.

Last verified June 7, 2026