Distinctive Image Features from Scale-Invariant Keypoints (SIFT)

“Distinctive Image Features from Scale-Invariant Keypoints,” by David G. Lowe of the University of British Columbia, appeared in the International Journal of Computer Vision in 2004 (volume 60, pages 91-110), building on a shorter 1999 conference paper. It introduced the Scale-Invariant Feature Transform, or SIFT, one of the most influential algorithms in the history of computer vision.

SIFT detects keypoints in an image (distinctive local structures such as corners and blobs) and describes each one with a vector that stays stable even when the image is scaled, rotated, partly occluded, lit differently, or viewed from a somewhat different angle. The method finds candidate points across a range of scales as extrema in differences of Gaussian-blurred images, assigns each a dominant orientation from the local image gradients, and then summarizes the gradients around it into a 128-dimensional descriptor. Because these descriptors are both distinctive and repeatable, the same physical point can be matched reliably between two photographs.
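
To make the matching step concrete, here is a minimal sketch of nearest-neighbor descriptor matching with the distance-ratio check described in Lowe's paper: a match is kept only when the closest descriptor is clearly closer than the second closest. The function names, the toy 4-dimensional vectors (real SIFT descriptors have 128 entries), and the brute-force search are illustrative assumptions, not the paper's implementation; the 0.8 threshold is the ratio the paper reports using.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_test_matches(desc_a, desc_b, max_ratio=0.8):
    """Return (i, j) pairs where desc_a[i] is matched to desc_b[j].

    A match is accepted only if the nearest neighbor in desc_b is closer
    than max_ratio times the distance to the second-nearest neighbor.
    Brute-force search; hypothetical helper for illustration only.
    """
    matches = []
    if len(desc_b) < 2:
        # The ratio check needs at least two candidates to compare.
        return matches
    for i, d in enumerate(desc_a):
        ranked = sorted((euclidean(d, e), j) for j, e in enumerate(desc_b))
        (best_dist, best_j), (second_dist, _) = ranked[0], ranked[1]
        if best_dist < max_ratio * second_dist:
            matches.append((i, best_j))
    return matches


# Toy descriptors (made up for this example, 4-D instead of 128-D).
image_a = [
    [1.0, 0.0, 0.0, 0.0],  # has one clear counterpart in image_b
    [0.5, 0.5, 0.5, 0.5],  # roughly equidistant from everything: ambiguous
]
image_b = [
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]

matches = ratio_test_matches(image_a, image_b)
print(matches)
```

In this toy data the first descriptor is accepted because its best candidate is far closer than the runner-up, while the second is rejected as ambiguous. Rejecting ambiguous matches rather than always taking the nearest neighbor is what keeps false correspondences low.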

That reliable matching made SIFT the workhorse behind a generation of vision systems: object recognition, panorama stitching, 3D reconstruction from photos, robot navigation, and image search. For roughly a decade it defined the state of the art in hand-engineered visual features. The deep learning wave that began with AlexNet in 2012 eventually displaced hand-crafted descriptors like SIFT in most recognition tasks, substituting features learned directly from data, but SIFT remains a landmark: the high-water mark of the era when humans, not networks, designed what a computer should look for.

