Mask R-CNN, published in March 2017 by Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick, became the standard framework for instance segmentation: not just drawing a box around each object, but tracing its exact pixel outline. A single model detects the objects in an image and, at the same time, generates a high-quality segmentation mask for each one.
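The design is a clean extension of Faster R-CNN, the detector that locates objects and classifies them. Mask R-CNN keeps that architecture and adds a third branch, in parallel with the existing box and class branches, that predicts a binary mask for each region of interest. A small but important fix called RoIAlign replaced the coarse feature pooling of earlier detectors (RoIPool), which rounds each region's coordinates to the feature-map grid, with bilinear interpolation at exact sub-pixel locations. The paper reports that this change alone improves mask accuracy by a relative 10% to 50%, with the largest gains under the strictest localization metrics, which is why it matters so much for getting mask edges right.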
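To make the alignment idea concrete, here is a minimal pure-Python sketch, not the authors' implementation. It assumes a single-channel feature map stored as a list of rows and a box given in feature-map coordinates; the names `bilinear_sample`, `roi_align`, and `snap_to_grid` are hypothetical, chosen for illustration. `snap_to_grid` models only the coordinate-rounding step of RoIPool-style pooling, not its max-pooling.

```python
import math

def bilinear_sample(feature, y, x):
    """Read a 2-D feature map (list of rows) at a fractional (y, x) location."""
    h, w = len(feature), len(feature[0])
    # Clamp so the four neighbouring grid cells always exist.
    y = min(max(y, 0.0), h - 1)
    x = min(max(x, 0.0), w - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align(feature, box, out_size=2, samples=2):
    """Pool a box (y1, x1, y2, x2) into an out_size x out_size grid.

    Box coordinates are never rounded: each output bin averages
    samples x samples bilinear reads at regularly spaced points.
    """
    y1, x1, y2, x2 = box
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0.0
            for sy in range(samples):
                for sx in range(samples):
                    yy = y1 + (i + (sy + 0.5) / samples) * bin_h
                    xx = x1 + (j + (sx + 0.5) / samples) * bin_w
                    total += bilinear_sample(feature, yy, xx)
            row.append(total / (samples * samples))
        out.append(row)
    return out

def snap_to_grid(box):
    """Assumed stand-in for the coarse step: round the box to whole cells."""
    return tuple(float(round(v)) for v in box)

# Toy 8x8 feature map whose value encodes position: 10 * row + column.
feature = [[10 * y + x for x in range(8)] for y in range(8)]

box_a = (0.6, 1.3, 4.6, 5.3)
box_b = (0.6, 1.1, 4.6, 5.1)  # same box shifted 0.2 cells to the left

aligned_a = roi_align(feature, box_a)
aligned_b = roi_align(feature, box_b)
snapped_a = roi_align(feature, snap_to_grid(box_a))
snapped_b = roi_align(feature, snap_to_grid(box_b))

print(round(aligned_a[0][0], 3), round(aligned_b[0][0], 3))  # about 18.3 and 18.1
print(round(snapped_a[0][0], 3), round(snapped_b[0][0], 3))  # both about 22.0
```

In this toy setup the aligned output tracks the 0.2-cell shift, while the two snapped boxes collapse to the same grid cells and return identical features. That lost sub-cell offset is the kind of misalignment that blurs mask edges once it is scaled back up to image pixels.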
The framework posted top results in all three tracks of the COCO suite of challenges (instance segmentation, bounding-box object detection, and person keypoint detection), outperforming every existing single-model entry on each task, including the winners of the COCO 2016 challenge. Because the same architecture handled keypoints, it also became a strong human-pose estimator, showing how general the design was.
Why business readers should care: instance segmentation is the technology behind background removal, automated photo editing, medical image analysis, and the perception stacks in warehouse robots and self-driving cars. Mask R-CNN was the workhorse that made pixel-accurate object understanding a standard, reusable building block.