KITTI is a computer-vision benchmark for autonomous driving, released in 2012 by researchers at the Karlsruhe Institute of Technology and the Toyota Technological Institute at Chicago. The data was recorded from a car equipped with stereo color and grayscale cameras, a Velodyne laser scanner (lidar), and a GPS localization system, with the lidar and GPS supplying the ground truth. The car drove through the city of Karlsruhe, in rural areas, and on highways, producing a set of real-world driving scenes with “up to 15 cars and 30 pedestrians” visible per image.
What made KITTI matter was that it offered standardized tasks with public leaderboards for the problems self-driving cars actually face: stereo depth, optical flow, visual odometry, and especially 3D object detection and tracking. Before KITTI, much vision research was tuned on staged or internet-image datasets; the creators set out to “reduce this bias and complement existing benchmarks by providing real-world benchmarks.” For most of the 2010s, reporting a KITTI number was how a new perception method proved it was competitive.
KITTI’s main limitation is its narrow coverage: a few hours of data from one city in good weather. That gap is also what motivated its successors. Later benchmarks like nuScenes and the Waymo Open Dataset deliberately added more cities, more varied weather, far more annotations, and, in the case of nuScenes, radar.
For a general reader, KITTI is a good example of how shared benchmarks drive a field forward: by giving everyone the same hard test, it turned scattered claims into measurable progress and set a template that later autonomous-driving datasets built on.