The Waymo Open Dataset, released in 2019 and described in the paper “Scalability in Perception for Autonomous Driving,” made a large slice of Waymo’s own sensor data public for research. It contains “high quality LiDAR and camera data captured across a range of urban and suburban geographies,” organized into 1,150 scenes that each span 20 seconds (roughly 6.4 hours of driving in total), with both 2D and 3D bounding boxes carrying “consistent identifiers across frames” so objects can be tracked over time.
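To make the value of “consistent identifiers across frames” concrete, here is a minimal Python sketch of how per-frame labels can be grouped into tracks. It is an illustration under stated assumptions, not the dataset’s actual API: the tuple layout, the identifiers, and the `build_tracks` helper are all hypothetical, and real labels carry full 3D boxes rather than a single coordinate.

```python
from collections import defaultdict

# Hypothetical, simplified labels: (frame_index, object_id, center_x_in_meters).
# These values and field names are invented for illustration only.
labels = [
    (0, "car_a", 10.0),
    (0, "ped_b", 3.0),
    (1, "car_a", 11.5),
    (1, "ped_b", 3.2),
    (2, "car_a", 13.0),
]


def build_tracks(labels):
    """Group per-frame labels into tracks keyed by object identifier."""
    tracks = defaultdict(list)
    # Sorting puts the labels in frame order before they are grouped.
    for frame, object_id, x in sorted(labels):
        tracks[object_id].append((frame, x))
    return dict(tracks)


tracks = build_tracks(labels)
```

Because the same identifier follows an object from frame to frame, each entry in `tracks` is a time-ordered trajectory. Without stable identifiers, a tracker would first have to guess which box in one frame corresponds to which box in the next.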
The dataset’s selling point was diversity and scale: the paper describes it as “15x more diverse than the largest camera+LiDAR dataset available” at the time, as measured by the authors’ own proposed metric, and uses it to study “the effects of dataset size and generalization across geographies on 3D detection methods.” That last point matters because a perception model that works in Phoenix may fail in San Francisco; the dataset let researchers measure that generalization gap directly.
Waymo paired the data release with public challenges and leaderboards, which over time expanded from perception into motion prediction and end-to-end driving. Coming from a company actually operating driverless vehicles, the dataset carried unusual credibility about what real-world autonomy data looks like.
For a general reader, the Waymo Open Dataset illustrates a maturing field in which a leader gives back: the company at the front of deployed self-driving shared enough real data for academics to work on the same hard problems, and it kept cross-city generalization, the thing that separates a demo from a service, front and center.