Fast R-CNN

“Fast R-CNN” was submitted to arXiv in April 2015 by Ross Girshick, then at Microsoft Research. It was a direct, focused fix for the main weakness of his own earlier R-CNN: speed.

The original R-CNN ran a full convolutional network separately on each of roughly two thousand candidate regions per image, which was accurate but painfully slow both to train and to run. Fast R-CNN reorganized the computation so that the network processes the whole image just once to produce a shared feature map; a new operation called region-of-interest (RoI) pooling then extracts a small fixed-size grid of features for each candidate region from that single map. It also merged the previously separate classification and bounding-box-refinement stages into one network trained with a single combined loss. The payoff was large: the deep VGG16-based detector trained roughly nine times faster than R-CNN and ran about two hundred times faster at test time, while also achieving higher detection accuracy (mean average precision) on the PASCAL VOC 2012 benchmark.
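To make the "compute once, crop many times" idea concrete, here is a minimal sketch of RoI max pooling in plain Python. It is an illustration under simplifying assumptions, not the paper's implementation: the feature map has a single channel, each region is already given in feature-map cells rather than image pixels, and the bin boundaries use one common floor/ceil rounding scheme. The names `roi_max_pool`, `feature_map`, and `rois` are made up for this example.

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one region of a 2-D feature map into an out_h x out_w grid.

    feature_map: list of rows (single channel, for brevity).
    roi: (y0, x0, y1, x1) in feature-map cells, end-exclusive (assumed format).
    """
    y0, x0, y1, x1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Bin edges: floor for the start, ceil for the end, so that every
        # cell of the region falls into at least one bin and no bin is empty.
        ys = y0 + (i * roi_h) // out_h
        ye = y0 + -(-((i + 1) * roi_h) // out_h)
        row = []
        for j in range(out_w):
            xs = x0 + (j * roi_w) // out_w
            xe = x0 + -(-((j + 1) * roi_w) // out_w)
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# Stand-in for the shared feature map: computed once per image.
feature_map = [[r * 6 + c for c in range(6)] for r in range(6)]

# Two candidate regions of different sizes, both read from the same map.
rois = [(0, 0, 4, 4), (2, 1, 6, 6)]
pooled = [roi_max_pool(feature_map, roi, 2, 2) for roi in rois]
print(pooled)
```

Whatever the size of the region, the output grid has the same shape, which is what lets a single set of fully connected layers score every candidate. The expensive step, producing `feature_map`, happens once; only the cheap pooling loop repeats per region.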

Fast R-CNN was a crucial middle step in the evolution of object detection. It still relied on a slow external method, such as selective search, to propose candidate regions. The follow-up Faster R-CNN would remove that bottleneck by generating proposals inside the network itself, but the shared-computation design introduced here became the template that later detectors built on.

For a general reader, Fast R-CNN is a clean example of a single engineering insight (do the expensive work once and reuse it) turning a research curiosity into something fast enough to be practical.

Last verified June 7, 2026