Histograms of Oriented Gradients for Human Detection (HOG)

“Histograms of Oriented Gradients for Human Detection” was presented at the 2005 CVPR conference by Navneet Dalal and Bill Triggs of INRIA Rhône-Alpes in France. It defined a feature descriptor, HOG, that dominated object detection in the years before deep learning took over.

The descriptor rests on a simple observation: the shape and appearance of an object can be captured well by the distribution of local intensity gradients, that is, the directions in which brightness changes across the image. HOG divides an image into a dense grid of small cells, computes a histogram of gradient orientations within each cell, and then normalizes these histograms across overlapping blocks of cells so that the result is robust to changes in lighting and contrast. The authors fed these descriptors to a linear support vector machine (SVM) and showed that they substantially outperformed earlier feature sets at detecting people in cluttered, real-world images. They also introduced a more challenging pedestrian dataset, now widely known as the INRIA person dataset, to push the field further.
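The three steps just described (gradients, per-cell orientation histograms, block normalization) can be sketched in a few lines of NumPy. This is a simplified illustration, not the authors' implementation: the function name `hog_descriptor` and its parameter names are made up for this example, votes go to a single orientation bin rather than being interpolated between neighboring bins, and blocks are normalized with a plain L2 norm. The defaults (8×8-pixel cells, 2×2-cell blocks, 9 unsigned orientation bins) follow the configuration commonly associated with the paper.

```python
import numpy as np

def hog_descriptor(image, cell=8, block=2, bins=9, eps=1e-5):
    """Simplified HOG sketch for a 2-D grayscale array (illustrative only)."""
    image = np.asarray(image, dtype=np.float64)

    # 1. Gradients with the centered [-1, 0, 1] filter; borders stay zero.
    gx = np.zeros_like(image)
    gy = np.zeros_like(image)
    gx[:, 1:-1] = image[:, 2:] - image[:, :-2]
    gy[1:-1, :] = image[2:, :] - image[:-2, :]
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation: opposite directions share a bin, range [0, 180).
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0

    # 2. One orientation histogram per cell, each pixel voting with its
    #    gradient magnitude (hard assignment to a single bin).
    n_cells_y = image.shape[0] // cell
    n_cells_x = image.shape[1] // cell
    bin_width = 180.0 / bins
    bin_index = np.minimum((orientation // bin_width).astype(int), bins - 1)
    hist = np.zeros((n_cells_y, n_cells_x, bins))
    for cy in range(n_cells_y):
        for cx in range(n_cells_x):
            rows = slice(cy * cell, (cy + 1) * cell)
            cols = slice(cx * cell, (cx + 1) * cell)
            hist[cy, cx] = np.bincount(
                bin_index[rows, cols].ravel(),
                weights=magnitude[rows, cols].ravel(),
                minlength=bins,
            )

    # 3. Overlapping blocks of cells (stride of one cell), each L2-normalized
    #    so the descriptor is insensitive to local contrast.
    blocks = []
    for by in range(n_cells_y - block + 1):
        for bx in range(n_cells_x - block + 1):
            v = hist[by:by + block, bx:bx + block].ravel()
            blocks.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.concatenate(blocks)

# A 128x64 window gives 16x8 cells, 15x7 blocks, and 36 values per block.
rng = np.random.default_rng(0)
window = rng.random((128, 64))
descriptor = hog_descriptor(window)
print(descriptor.shape)  # (3780,)
```

Because every block is normalized independently, multiplying the window by a constant or adding a constant brightness offset leaves the descriptor essentially unchanged, which is the robustness to lighting and contrast mentioned above. In a full detector, a vector like this would be computed for each candidate window and passed to the linear SVM.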

HOG, often paired with an SVM, became the workhorse of pedestrian detection and many other recognition systems, and it was a key reference point that early deep learning detectors like R-CNN explicitly set out to beat. Studying HOG also clarified which design choices actually drove detection performance: fine orientation binning, local normalization, and dense overlapping blocks.

For a general reader, HOG is a window into how computer vision worked before neural networks: it relied on careful, hand-designed features that encoded human intuition about shape. Learned features would later surpass this approach, but it defined the state of the art for nearly a decade.
