“Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks” was submitted to arXiv in June 2015 by Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. It closed the last major gap in the R-CNN family.
Fast R-CNN had made the detection stage quick, but it still depended on a separate, slow algorithm, typically Selective Search running on the CPU, to propose where objects might be in an image, and that proposal step had become the bottleneck. Faster R-CNN’s contribution was the Region Proposal Network (RPN), a small convolutional network that generates candidate object regions directly from the shared feature map the detector already computes: at each position it scores a set of reference boxes (“anchors”) of several sizes and shapes for how likely they are to contain an object, and refines their coordinates. Because region proposal and detection now reuse the same convolutional features, the proposals are, in the authors’ phrase, nearly cost-free, and the two stages merge into a single, unified network. The combined model reached state-of-the-art accuracy on the standard PASCAL VOC and COCO benchmarks while running at around five frames per second on a GPU with the deep VGG-16 backbone.
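To make the proposal mechanism concrete, here is a minimal pure-Python sketch of the box bookkeeping around an RPN. It is an illustration, not the authors’ code. It assumes the convolutional layers have already produced, for every anchor, an objectness score and four box offsets, which are passed in as plain lists. The function names (`make_anchors`, `decode`, `iou`, `propose`) are hypothetical. The defaults (a 16-pixel stride, anchor sizes of 128, 256 and 512 pixels, height-to-width ratios of 0.5, 1 and 2, non-maximum suppression at IoU 0.7, and up to 300 proposals kept) are intended to mirror settings described in the paper but should be treated as assumptions.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def make_anchors(feat_h: int, feat_w: int, stride: int = 16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)) -> List[Box]:
    """Place len(scales) * len(ratios) reference boxes at every feature-map cell."""
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            # Centre of this feature-map cell, mapped back to image coordinates.
            cx = (j + 0.5) * stride
            cy = (i + 0.5) * stride
            for s in scales:
                for r in ratios:
                    # Area stays s*s while height / width equals r.
                    w = s / math.sqrt(r)
                    h = s * math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def decode(anchor: Box, deltas: Sequence[float]) -> Box:
    """Apply predicted offsets (tx, ty, tw, th) to an anchor to get a proposal."""
    x1, y1, x2, y2 = anchor
    wa, ha = x2 - x1, y2 - y1
    xa, ya = x1 + wa / 2, y1 + ha / 2
    tx, ty, tw, th = deltas
    # Centre shifts are relative to anchor size; width/height scale exponentially.
    x, y = xa + tx * wa, ya + ty * ha
    w, h = wa * math.exp(tw), ha * math.exp(th)
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes with positive area."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def propose(anchors: Sequence[Box], deltas: Sequence[Sequence[float]],
            scores: Sequence[float], top_n: int = 300,
            iou_thresh: float = 0.7) -> List[Box]:
    """Decode every anchor, then keep the highest-scoring boxes via greedy NMS."""
    boxes = [decode(a, d) for a, d in zip(anchors, deltas)]
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep: List[int] = []
    for k in order:
        # Discard a box if it overlaps an already-kept, higher-scoring box too much.
        if all(iou(boxes[k], boxes[m]) < iou_thresh for m in keep):
            keep.append(k)
            if len(keep) == top_n:
                break
    return [boxes[k] for k in keep]


if __name__ == "__main__":
    anchors = make_anchors(feat_h=2, feat_w=3)
    print(len(anchors))  # 2 x 3 cells x 9 anchors = 54
```

A production implementation would also clip proposals to the image borders, drop very small boxes, and run these loops as batched tensor operations on the GPU; the plain loops here trade speed for readability.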
Faster R-CNN became the canonical two-stage detector and a workhorse of applied computer vision, underpinning systems for autonomous driving, medical imaging, and industrial inspection. Its design also formed the foundation for Mask R-CNN, which added a parallel branch that predicts a pixel-level segmentation mask for each detected object.
For a business reader, Faster R-CNN is where object detection became fast and unified enough to deploy in production, a quiet turning point behind many of the camera-based AI systems in use today.