Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields (OpenPose)

OpenPose, from a Carnegie Mellon team of Zhe Cao, Tomas Simon, Shih-En Wei, and Yaser Sheikh (paper posted November 2016), was the first system to estimate the 2D poses of multiple people in an image in real time. It detects body joints such as elbows, knees, and wrists, then connects them into skeletons for every person in the frame at once.

The core innovation is Part Affinity Fields (PAFs). Detecting individual joints is the easy part; the hard part is grouping them, deciding which elbow belongs to which person in a crowd. PAFs are a set of learned 2D vector fields, one per limb type, that encode the location and orientation of the limb between two joints, so a simple greedy bottom-up parsing step can connect joints into people correctly. Because this assembly is bottom-up, the network runs once per image and runtime stays roughly constant no matter how many people are in the scene, unlike top-down methods that run a pose estimator once per detected person.
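
To make the grouping idea concrete, here is a minimal sketch in Python of how a PAF can score candidate limbs. It is not the authors' implementation. It assumes a toy field stored as a dictionary from pixel to 2D vector, a score computed by sampling the field along the straight line between two joints, and a simple sort-and-pick greedy matcher. The names paf_score and greedy_match, the 11 sample points, and the 0.5 threshold are all illustrative choices.

```python
import math


def paf_score(paf, a, b, samples=11):
    """Average alignment between the field and the direction a -> b.

    paf maps integer (x, y) pixels to (vx, vy) vectors; pixels that are
    missing from the dict count as the zero vector. samples must be >= 2.
    """
    (ax, ay), (bx, by) = a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return 0.0
    ux, uy = dx / length, dy / length  # unit vector along the candidate limb
    total = 0.0
    for i in range(samples):
        t = i / (samples - 1)  # evenly spaced points from a (t=0) to b (t=1)
        x, y = round(ax + t * dx), round(ay + t * dy)
        vx, vy = paf.get((x, y), (0.0, 0.0))
        total += vx * ux + vy * uy  # dot product: does the field agree?
    return total / samples


def greedy_match(paf, starts, ends, min_score=0.5):
    """Pair start joints with end joints, best-scoring candidates first."""
    candidates = sorted(
        ((paf_score(paf, a, b), i, j)
         for i, a in enumerate(starts)
         for j, b in enumerate(ends)),
        reverse=True,
    )
    used_starts, used_ends, pairs = set(), set(), []
    for score, i, j in candidates:
        if score < min_score:
            break  # everything after this point scores even lower
        if i in used_starts or j in used_ends:
            continue  # each joint joins at most one limb of this type
        pairs.append((i, j))
        used_starts.add(i)
        used_ends.add(j)
    return sorted(pairs)


# Toy scene (assumed data): two vertical forearms, at x=0 and at x=5.
# The field points "down" (+y) along each forearm and is zero elsewhere.
field = {(x, y): (0.0, 1.0) for x in (0, 5) for y in range(11)}
elbows = [(0, 0), (5, 0)]
wrists = [(5, 10), (0, 10)]  # deliberately listed in the "wrong" order

pairs = greedy_match(field, elbows, wrists)
print(pairs)  # [(0, 1), (1, 0)]
```

Each elbow ends up paired with the wrist directly below it, because the diagonal candidates cross pixels where the field is zero and so score well under the threshold. Position alone could not separate the two people if they stood close together; the direction information in the field is what breaks the tie.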

OpenPose won first place in the COCO 2016 keypoints challenge and beat the previous state of the art on the MPII multi-person benchmark, all while running fast enough for live video.

Why business readers should care: real-time multi-person pose estimation underpins motion capture without suits, fitness and physical-therapy apps that check your form, sports analytics, and gesture interfaces. OpenPose made all of this accessible from an ordinary camera, and its public release became a widely used tool across those industries.

Sources

Last verified June 7, 2026