MNIST (handwritten digit database)

MNIST is a database of handwritten digits assembled by Yann LeCun, Corinna Cortes, and Chris Burges and released in 1998. It contains 70,000 grayscale images of the digits 0 through 9, split into a training set of 60,000 images and a test set of 10,000; each is a 28-by-28 pixel image of a single handwritten digit. It was built by remixing two earlier collections from the US National Institute of Standards and Technology, NIST Special Database 1 and Special Database 3. Special Database 3 was written by Census Bureau employees and Special Database 1 by high-school students, and each MNIST split takes half its digits from each source.

The preprocessing is part of why MNIST became so widely used: the original black-and-white NIST images were size-normalized to fit in a 20-by-20 box while preserving aspect ratio, then centered by center of mass inside a 28-by-28 frame. That uniform, ready-to-use format let researchers compare algorithms directly with almost no data wrangling. For two decades MNIST was the canonical “hello world” of pattern recognition: the first benchmark a new model, optimizer, or student would be tested on.
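The two preprocessing steps can be sketched in code. This is a minimal pure-Python sketch under stated assumptions, not the procedure NIST or LeCun actually ran. The input is assumed to be a black-and-white image given as a list of rows of 0/1 values. The fit into the 20-by-20 box uses simple area averaging, which is one way to turn black-and-white strokes into gray levels. The centering shift is rounded to whole pixels and clamped so the box never leaves the frame. The function name `mnist_style` is hypothetical. The output uses MNIST's convention of 0 for background and 255 for full ink.

```python
FIELD = 28  # side of the final frame
BOX = 20    # side of the box each digit is scaled to fit inside


def _crop_to_ink(img):
    """Trim rows and columns that contain no ink (value 0)."""
    rows = [i for i, row in enumerate(img) if any(row)]
    if not rows:
        raise ValueError("image contains no ink")
    cols = [j for j in range(len(img[0])) if any(row[j] for row in img)]
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]


def _overlap_weights(n_in, n_out):
    """w[k][i] = overlap of output cell k with input cell i, in input-pixel units."""
    scale = n_in / n_out
    w = [[max(0.0, min((k + 1) * scale, i + 1) - max(k * scale, i))
          for i in range(n_in)] for k in range(n_out)]
    return w, scale


def _resize_area(img, out_h, out_w):
    """Area-averaging resize (an assumed method): each output pixel is the
    mean of the input area it covers, so stroke edges become gray levels."""
    wr, sr = _overlap_weights(len(img), out_h)
    wc, sc = _overlap_weights(len(img[0]), out_w)
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = sum(wr[r][i] * wc[c][j] * img[i][j]
                        for i in range(len(img)) if wr[r][i]
                        for j in range(len(img[0])) if wc[c][j])
            row.append(total / (sr * sc))
        out.append(row)
    return out


def mnist_style(img):
    """Hypothetical helper: map a 0/1 digit image to a 28x28 grid of ints 0..255."""
    digit = _crop_to_ink(img)
    h, w = len(digit), len(digit[0])
    # Scale so the longer side becomes BOX pixels, preserving aspect ratio.
    new_h = max(1, round(h * BOX / max(h, w)))
    new_w = max(1, round(w * BOX / max(h, w)))
    small = _resize_area(digit, new_h, new_w)

    # Center of mass, treating pixel (i, j) as the unit square centred at (i+0.5, j+0.5).
    mass = sum(map(sum, small))
    cy = sum((i + 0.5) * v for i, row in enumerate(small) for v in row) / mass
    cx = sum((j + 0.5) * v for row in small for j, v in enumerate(row)) / mass

    # Shift so the center of mass lands on the field centre (14, 14),
    # clamped so the whole box stays inside the 28x28 frame.
    top = min(max(round(FIELD / 2 - cy), 0), FIELD - new_h)
    left = min(max(round(FIELD / 2 - cx), 0), FIELD - new_w)

    out = [[0] * FIELD for _ in range(FIELD)]
    for i, row in enumerate(small):
        for j, v in enumerate(row):
            out[top + i][left + j] = round(255 * v)
    return out


if __name__ == "__main__":
    # A 40x10 vertical stroke on a 60x60 canvas (made-up input for illustration).
    canvas = [[0] * 60 for _ in range(60)]
    for i in range(5, 45):
        for j in range(20, 30):
            canvas[i][j] = 1
    grid = mnist_style(canvas)
    ink_rows = [i for i, row in enumerate(grid) if any(row)]
    print(ink_rows[0], ink_rows[-1])  # → 4 23
```

In the usage example the tall stroke is scaled to 20 by 5 pixels, so its longer side fills the 20-pixel box, and it lands on rows 4 through 23, which puts its center of mass on the middle of the 28-pixel frame. Centering by center of mass rather than by bounding box is what the article describes. For lopsided digits the two methods differ by a pixel or two.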

MNIST was the benchmark for LeCun’s convolutional-network work, including LeNet-5, and it remains a staple of machine-learning courses. Modern methods drove its test error rate well below one percent (fewer than 100 of the 10,000 test images misclassified), so it stopped distinguishing strong models long ago, and research attention moved to harder datasets such as CIFAR and ImageNet. Its lasting role is historical and pedagogical: it helped establish the template of a fixed, public train/test split on which much of AI’s benchmark culture was later built.
