Stanford CNN matches dermatologists at classifying skin cancer

In February 2017 Nature published “Dermatologist-level classification of skin cancer with deep neural networks,” by Andre Esteva, Brett Kuprel, Sebastian Thrun, and colleagues at Stanford. It was an early and influential demonstration that a deep neural network could match physicians at a visual diagnostic task with direct clinical stakes.

The team trained a convolutional neural network (CNN) on 129,450 clinical images covering 2,032 different skin diseases, a corpus two orders of magnitude larger than previous datasets. Rather than train a network from scratch, they used transfer learning: they started from a GoogLeNet Inception v3 model pretrained on roughly 1.28 million general ImageNet photographs and fine-tuned it on the skin images. This let the model inherit general visual features and then specialize in dermatology with the data available.

The network was tested against at least 21 board-certified dermatologists on two binary tasks that map to real clinical decisions: distinguishing keratinocyte carcinomas from benign seborrheic keratoses, and distinguishing malignant melanomas from benign moles (nevi). The melanoma task was run on standard clinical photographs and again on dermoscopy images. Across these tasks the CNN reached an area under the receiver operating characteristic curve (AUC) above 91 percent and performed on par with the tested experts.
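
To make the area-under-the-curve figure concrete, here is a small sketch of how that metric can be computed for a binary task such as malignant versus benign. It uses the pairwise definition: the AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The function names and the toy labels and scores are invented for illustration and are not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when score >= threshold is called malignant."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (invented): 1 = malignant, 0 = benign; scores are model outputs.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1, 0.7, 0.6]

print(roc_auc(toy_labels, toy_scores))                       # 0.875
print(sensitivity_specificity(toy_labels, toy_scores, 0.5))  # (0.75, 0.75)
```

A model traces a whole curve as the threshold varies, whereas a single clinician's yes-or-no decisions give one sensitivity and specificity pair. That makes it possible to compare a classifier with individual experts by checking where each expert's point falls relative to the model's curve.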

The paper became a reference point for medical-imaging AI, in part because skin cancer is the most common human cancer and smartphone cameras are ubiquitous, raising the prospect of low-cost screening. It also became a case study in the gap between benchmark parity and deployment: matching dermatologists on curated comparisons is not the same as safe, equitable performance across all skin tones, devices, and clinical settings, and later work scrutinized exactly those limits.

Last verified June 7, 2026