“Densely Connected Convolutional Networks” was submitted to arXiv in August 2016 by Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Weinberger, and went on to win a Best Paper Award at CVPR 2017. It pushed the residual idea to its logical extreme.
Where ResNet added a single shortcut around each block, DenseNet connects each layer directly to every layer that comes after it within a block. A layer receives, as its input, the feature maps of all preceding layers concatenated together, so a network of L layers has L(L+1)/2 direct connections, on the order of L squared, rather than the L of a plain feed-forward stack. This dense connectivity has three effects the authors highlight. It eases the vanishing-gradient problem by giving every layer a short path to the loss. It encourages feature reuse, so the network does not have to relearn the same patterns. And, perhaps counterintuitively, it lets each layer be very thin: a layer contributes only a small, fixed number of new feature maps (the paper calls this the growth rate), which means DenseNets reach strong accuracy with substantially fewer parameters than comparable architectures.
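The bookkeeping behind a dense block can be made concrete with a little arithmetic. The sketch below is plain Python rather than a real network: it tracks only how many channels each layer sees and emits when every layer's input is the concatenation of everything before it. The function names, the block size of 4 layers, the growth rate of 12, and the 24 input channels are illustrative assumptions, and the weight count assumes each layer is a single bias-free 3x3 convolution, which is a simplification of the layers used in the paper.

```python
def dense_block_plan(num_layers, growth_rate, in_channels):
    """Channel bookkeeping for one dense block (illustrative helper).

    Each layer reads the concatenation of the block input and all earlier
    layer outputs, and emits `growth_rate` new feature maps.
    Returns (per-layer (input_channels, output_channels), block output channels).
    """
    plan = []
    channels = in_channels
    for _ in range(num_layers):
        plan.append((channels, growth_rate))
        channels += growth_rate  # concatenation widens the next layer's input
    return plan, channels


def direct_connections(num_layers):
    """Direct connections among L densely connected layers: L(L+1)/2."""
    return num_layers * (num_layers + 1) // 2


def conv3x3_weights(in_channels, out_channels):
    """Weights in one bias-free 3x3 convolution (simplifying assumption)."""
    return 3 * 3 * in_channels * out_channels


plan, out_channels = dense_block_plan(num_layers=4, growth_rate=12, in_channels=24)
for index, (c_in, c_out) in enumerate(plan, start=1):
    print(f"layer {index}: reads {c_in} channels, adds {c_out}, "
          f"{conv3x3_weights(c_in, c_out)} weights")
print("block output channels:", out_channels)          # 72
print("direct connections:", direct_connections(4))    # 10
```

Each layer's input grows by the growth rate, but its output stays fixed at 12 maps, which is the sense in which the layers are thin: the width of the representation comes from accumulated concatenation, not from any single wide layer.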
DenseNet sits alongside ResNet as one of the two dominant answers to the question of how to train very deep networks, and the contrast between them, adding versus concatenating earlier features, became a standard teaching example. Its parameter efficiency made it attractive in settings where memory or model size mattered.
For a general reader, DenseNet illustrates that a single structural choice about how information flows through a network can simultaneously improve accuracy and shrink the model. It is the kind of design win that does not require more data or more compute.