Feature Pyramid Networks for Object Detection

“Feature Pyramid Networks for Object Detection” was submitted to arXiv in December 2016 by Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. It addressed a stubborn problem in object detection: handling objects that appear at very different sizes in the same image.

The classic solution was to resize the input image to many scales and run the detector on each. This is accurate but expensive. Feature Pyramid Networks, or FPN, instead exploit the pyramid of feature maps that a deep convolutional network already produces internally. Early layers are high-resolution but semantically shallow, and deep layers are low-resolution but semantically rich. FPN adds a top-down pathway and lateral connections that combine the two: starting from the coarsest map, each level is upsampled by a factor of 2 and added element-wise to the matching bottom-up map, which a 1x1 convolution has first projected to a common channel width. This propagates the rich, deep semantics back down to the high-resolution maps. The result is a feature pyramid that is both detailed and semantically strong at every level, built at only a small additional cost on top of the existing network. Used in a basic Faster R-CNN detector, it achieved state-of-the-art single-model results on the COCO detection benchmark, with especially clear gains on small objects.
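The top-down pathway and lateral connections can be sketched in a few lines. The sketch below is a minimal illustration, assuming NumPy, feature maps stored as (channels, height, width) arrays whose resolution halves from one level to the next, and nearest-neighbour upsampling. The function names `conv1x1`, `upsample2x`, and `top_down_fuse` are hypothetical, the weights are random rather than learned, the channel counts are tiny for readability, and the 3x3 convolution that the original design applies to each merged map is omitted.

```python
import numpy as np


def conv1x1(x, w):
    """1x1 convolution (no bias): x is (C_in, H, W), w is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def top_down_fuse(bottom_up, lateral_weights):
    """Hypothetical FPN-style merge of bottom-up maps ordered fine -> coarse.

    bottom_up[i] has shape (C_i, H_i, W_i) with H_i == 2 * H_{i+1}.
    lateral_weights[i] has shape (d, C_i), projecting each level to d channels.
    Returns a list of (d, H_i, W_i) maps, also ordered fine -> coarse.
    """
    # Start from the coarsest, most semantic level.
    top = conv1x1(bottom_up[-1], lateral_weights[-1])
    outputs = [top]
    # Walk toward finer levels: lateral projection + upsampled top-down map.
    for feat, w in zip(reversed(bottom_up[:-1]), reversed(lateral_weights[:-1])):
        top = conv1x1(feat, w) + upsample2x(top)
        outputs.append(top)
    return outputs[::-1]


# Usage with made-up shapes: channels grow while resolution halves per level.
rng = np.random.default_rng(0)
shapes = [(8, 16, 16), (16, 8, 8), (32, 4, 4)]
bottom_up = [rng.standard_normal(s) for s in shapes]
d = 4  # common pyramid width (illustrative; much larger in practice)
weights = [rng.standard_normal((d, s[0])) for s in shapes]
pyramid = top_down_fuse(bottom_up, weights)
print([p.shape for p in pyramid])  # [(4, 16, 16), (4, 8, 8), (4, 4, 4)]
```

Every output level ends up with the same channel width, so a single detection head can be shared across the pyramid, while the finest level still carries information that flowed down from the coarsest one.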

FPN became a near-universal component of modern detection and segmentation systems, used in two-stage detectors such as Faster R-CNN and Mask R-CNN variants as well as in one-stage detectors such as RetinaNet. Its top-down, multi-scale fusion is now a standard building block rather than an optional add-on.

For a general reader, FPN is the piece that quietly made detectors far more dependable across a wide range of object sizes, an unglamorous but essential ingredient in systems that have to see both the truck and the road sign.

Sources

Last verified June 7, 2026