SSD: Single Shot MultiBox Detector

“SSD: Single Shot MultiBox Detector” was submitted to arXiv in December 2015 by Wei Liu and colleagues from the University of North Carolina, Zoox, Google, and the University of Michigan. It belongs to the family of one-stage detectors that aim for speed by skipping a separate region-proposal step.

SSD predicts object locations and classes in a single forward pass of the network. At each location of a feature map it lays down a fixed set of default boxes with different aspect ratios and scales, and for every default box it predicts two things: offsets that adjust the box to fit a real object, and a confidence score for each object class. The crucial design choice is that these predictions are made from several feature maps at different resolutions within the network, so coarse, low-resolution maps deep in the network catch large objects while finer, higher-resolution maps catch small ones. This multi-scale approach lets SSD handle objects of widely varying size in one pass, without running the network repeatedly over rescaled copies of the image as image-pyramid methods do. On the PASCAL VOC2007 test set, the paper reports 74.3% mAP at 59 frames per second for the 300×300-input model on an Nvidia Titan X, compared with 73.2% mAP at 7 frames per second for Faster R-CNN.
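The default-box idea can be illustrated with a minimal sketch. It is not the authors' implementation: the feature-map sizes, scales, and aspect ratios below are hypothetical values chosen for readability, and the function names default_boxes and decode are invented for this example. The decoding step uses the center-offset and log-size parameterization that SSD shares with Faster R-CNN, and leaves out the variance scaling that many implementations add.

```python
import math

def default_boxes(fmap_size, scale, aspect_ratios):
    """Return default boxes (cx, cy, w, h), normalized to [0, 1],
    for one square feature map with fmap_size x fmap_size cells."""
    boxes = []
    for row in range(fmap_size):
        for col in range(fmap_size):
            # Each box is centered on its feature-map cell.
            cx = (col + 0.5) / fmap_size
            cy = (row + 0.5) / fmap_size
            for ar in aspect_ratios:
                # The aspect ratio ar = w / h reshapes the box
                # while keeping its area equal to scale ** 2.
                w = scale * math.sqrt(ar)
                h = scale / math.sqrt(ar)
                boxes.append((cx, cy, w, h))
    return boxes

def decode(default_box, offsets):
    """Apply predicted offsets (dx, dy, dw, dh) to one default box.
    Center shifts are relative to the box size; sizes are scaled in log space."""
    cx, cy, w, h = default_box
    dx, dy, dw, dh = offsets
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))

# Hypothetical configuration: (feature-map size, box scale).
# Finer maps get small boxes, coarser maps get large ones.
FEATURE_MAPS = [(8, 0.2), (4, 0.5), (2, 0.8)]
ASPECT_RATIOS = [1.0, 2.0, 0.5]

all_boxes = []
for size, scale in FEATURE_MAPS:
    all_boxes.extend(default_boxes(size, scale, ASPECT_RATIOS))

# 8*8*3 + 4*4*3 + 2*2*3 = 252 default boxes in total.
print(len(all_boxes))
```

In a real detector the network would output one set of offsets and one vector of class scores for each of these boxes, and the decoded boxes would then be filtered by score and by non-maximum suppression.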

SSD, alongside YOLO, defined the single-shot detection approach, which typically trades a modest amount of accuracy for real-time speed, a tradeoff well suited to applications like video analysis and embedded vision. Its multi-scale prediction idea also fed into later detector designs.

For a general reader, SSD is part of the shift that made object detection fast enough to run live on video streams, opening the door to real-time applications rather than offline batch analysis.

Sources

Last verified June 7, 2026