“Squeeze-and-Excitation Networks” was submitted to arXiv in September 2017 by Jie Hu, Li Shen, and Gang Sun, with collaborators including Samuel Albanie. The architecture it introduced, SENet, won the classification track of the 2017 ImageNet challenge, the final running of that competition, with a top-5 error of about 2.25 percent.
The contribution is a small, bolt-on module called the Squeeze-and-Excitation (SE) block. A standard convolutional layer has no explicit mechanism for reweighting its output channels based on the input, yet in any given image some feature channels matter more than others. The SE block first squeezes each channel’s feature map down to a single number that summarizes it globally, then learns, from those summaries, a set of per-channel weights that excite the useful channels and suppress the rest. In effect the network learns which features to emphasize for each input, a form of channel-wise attention. Because the block adds little computation, it can be dropped into existing architectures like ResNet or Inception, where it reliably improves accuracy.
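The squeeze, excite, and rescale steps described above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' code: it assumes global average pooling for the squeeze and a two-layer bottleneck (ReLU, then sigmoid) for the excitation. The function name `se_block`, the parameter names, the reduction ratio `r`, and the random weights are all hypothetical choices made for the example; in a real network the weights would be learned.

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Apply a Squeeze-and-Excitation step to one feature map.

    x  : array of shape (C, H, W), the channels of one image
    w1 : array of shape (C // r, C), bottleneck weights (assumed layout)
    w2 : array of shape (C, C // r), expansion weights (assumed layout)
    Returns the rescaled feature map and the per-channel weights.
    """
    # Squeeze: average each channel over its spatial positions,
    # leaving one summary number per channel.
    z = x.mean(axis=(1, 2))                      # shape (C,)

    # Excite: a small bottleneck network maps the summaries to
    # per-channel weights. ReLU in the middle, sigmoid at the end.
    h = np.maximum(0.0, w1 @ z + b1)             # shape (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))     # shape (C,), each in (0, 1)

    # Rescale: multiply every channel by its weight.
    return x * s[:, None, None], s

# Toy example with random (untrained) weights, purely for illustration.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 4                          # r is the assumed reduction ratio
x = rng.standard_normal((C, H, W))
w1 = 0.1 * rng.standard_normal((C // r, C))
b1 = np.zeros(C // r)
w2 = 0.1 * rng.standard_normal((C, C // r))
b2 = np.zeros(C)

scaled, s = se_block(x, w1, b1, w2, b2)
print("per-channel weights:", np.round(s, 3))
print("output shape:", scaled.shape)
```

The output has the same shape as the input, which is what lets the block sit after an existing convolutional stage without changing anything around it. The bottleneck of width `C // r` keeps the number of extra weights small relative to the convolution it accompanies.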
SENet mattered both as a competition winner and as an early, influential example of attention applied to convolutional features, part of the broader move toward attention mechanisms across deep learning. Because the block plugs into existing networks, the idea spread quickly into many later models.
For a general reader, the SE block makes an intuitive idea concrete: a model works better when it can decide which of its own signals to trust, rather than weighting them all equally.