Object Detection

Object detection is the computer vision task of locating and identifying multiple objects within a single image. Unlike image classification, which assigns one label to the whole picture, detection must answer both “what” and “where,” typically by drawing a bounding box around each object and labeling it. It underpins applications from self-driving cars and warehouse robots to medical imaging and retail checkout.
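To make the "what" and "where" concrete, here is a minimal Python sketch of what a detector's output might look like: one record per object, holding a label, a confidence score, and box corners in pixels. The `Detection` class, its field names, and the sample values are hypothetical and chosen only for illustration. The `iou` helper computes intersection over union, a measure the text above does not name but which is the standard way to score how well a predicted box overlaps a reference box.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object (hypothetical layout, for illustration only)."""
    label: str      # the "what"
    score: float    # detector confidence, between 0 and 1
    x_min: float    # the "where": box corners in pixel coordinates
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        # Clamp at zero so a malformed box never yields a negative area.
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


def iou(a: Detection, b: Detection) -> float:
    """Intersection over union of two boxes: 0 means no overlap, 1 means identical."""
    inter_w = max(0.0, min(a.x_max, b.x_max) - max(a.x_min, b.x_min))
    inter_h = max(0.0, min(a.y_max, b.y_max) - max(a.y_min, b.y_min))
    intersection = inter_w * inter_h
    union = a.area() + b.area() - intersection
    return intersection / union if union > 0 else 0.0


# A single image can yield several detections, each with its own label and box.
detections = [
    Detection("car", 0.92, 10, 20, 110, 80),
    Detection("person", 0.81, 120, 30, 150, 100),
]

for d in detections:
    print(f"{d.label} ({d.score:.2f}): "
          f"({d.x_min}, {d.y_min}) to ({d.x_max}, {d.y_max})")
```

A classifier would return only a single label for the whole image; the list of labeled boxes is what distinguishes detection.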

The modern era of object detection began in 2013 with R-CNN, by Ross Girshick and colleagues at UC Berkeley, which applied a convolutional network to candidate image regions and decisively beat hand-engineered pipelines. A rapid series of improvements followed: Fast R-CNN made the approach far quicker by sharing computation across regions, and Faster R-CNN went further by learning the region proposals themselves. Single-pass detectors such as YOLO (“You Only Look Once”) traded a little accuracy for real-time speed by predicting all boxes in one forward pass. Progress was measured on shared benchmarks, especially PASCAL VOC and later COCO (Common Objects in Context).

Detection has since extended into related tasks: instance segmentation (Mask R-CNN), which outlines each object pixel by pixel, and promptable segmentation (Segment Anything), which can isolate any object on request.

Why business readers should care: object detection is the perception layer behind autonomous vehicles, automated inspection, inventory tracking, and security systems, and its accuracy and speed directly determine whether those products are viable.
