PointNet, first posted to arXiv in December 2016 and presented at CVPR 2017 by Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas at Stanford, was a pioneering deep network that operates directly on raw 3D point clouds - the unordered sets of XYZ points that LiDAR scanners and depth cameras produce. Before it, researchers typically converted point clouds into regular 3D voxel grids or collections of 2D images so that standard convolutions could be applied, which inflates the data volume and loses fine geometric detail to quantization or projection.
The central technical problem is permutation invariance: a point cloud is a set, so feeding the same points in a different order must produce the same result. PointNet solves this by processing each point independently through shared layers and then aggregating with a symmetric function (max pooling) that ignores order. The architecture handles three tasks with one design - whole-object classification, part segmentation (labeling regions of an object), and semantic scene parsing - while staying simple and efficient compared with voxel-based methods.
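The core idea (a shared per-point network followed by max pooling) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer sizes, random weights, and the helper names `shared_mlp` and `global_feature` are assumptions chosen for illustration, and the sketch omits the paper's alignment networks and task-specific heads. It only shows that shuffling the input points leaves the pooled feature unchanged.

```python
import numpy as np


def shared_mlp(points, weights, biases):
    """Apply the same small MLP to every point independently.

    points: array of shape (N, 3); returns an array of shape (N, F).
    Each row (point) is transformed with identical weights, so no
    point "knows" its position in the input order.
    """
    h = points
    for W, b in zip(weights, biases):
        h = np.maximum(h @ W + b, 0.0)  # linear layer followed by ReLU
    return h


def global_feature(points, weights, biases):
    """Aggregate per-point features with a symmetric function (max)."""
    per_point = shared_mlp(points, weights, biases)
    return per_point.max(axis=0)  # max over the point axis ignores order


rng = np.random.default_rng(0)

# Hypothetical layer sizes for illustration only (not the paper's).
layer_sizes = [3, 16, 32]
weights = [rng.normal(size=(i, o))
           for i, o in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [rng.normal(size=o) for o in layer_sizes[1:]]

cloud = rng.normal(size=(128, 3))                  # 128 random XYZ points
shuffled = cloud[rng.permutation(len(cloud))]      # same points, new order

feat_a = global_feature(cloud, weights, biases)
feat_b = global_feature(shuffled, weights, biases)

# The two global features agree up to floating-point tolerance.
order_invariant = np.allclose(feat_a, feat_b)
print("order invariant:", order_invariant)
```

Max pooling works here because the maximum of a set of values does not depend on the order in which they are visited; sum or mean would also be symmetric, and the choice of max follows the description above.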
PointNet and its follow-up PointNet++ became foundational references for 3D deep learning.
Why business readers should care: point clouds are the native output of the sensors in self-driving cars, robots, drones, and 3D scanners. PointNet made it possible to apply deep learning to that data directly, underpinning perception systems that detect obstacles, segment scenes, and understand 3D shape from raw sensor readings.