Scikit-learn: Machine Learning in Python

“Scikit-learn: Machine Learning in Python” appeared in the Journal of Machine Learning Research, volume 12, in 2011 (pages 2825–2830). Its authors were Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay.

The paper describes scikit-learn as a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. Its stated priorities were ease of use, performance, documentation, and API consistency, and it was designed to be accessible to non-specialists. The package was distributed under the simplified BSD license to encourage use in both academic and commercial settings, and it kept external dependencies minimal, building on NumPy and SciPy.

Scikit-learn became a de facto standard for classical machine learning in Python, covering regression, classification, clustering, dimensionality reduction, model selection, and preprocessing, all behind a consistent fit/predict API. While deep learning later moved to frameworks like TensorFlow and PyTorch, scikit-learn remained the workhorse for tabular data and the on-ramp through which a generation of practitioners learned machine learning.
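
To make the "consistent fit/predict API" concrete, here is a minimal sketch in which two unrelated model families are trained and evaluated with the same calls. The synthetic dataset, the choice of models, and the parameter values are assumptions made for illustration; they do not come from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic tabular data, generated only for illustration (an assumption,
# not a dataset used in the paper).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Two different model families, driven by the same method calls.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)            # learn from the training data
    predictions = model.predict(X_test)    # predict labels for unseen rows
    scores[name] = model.score(X_test, y_test)  # mean accuracy on the test set
    print(f"{name}: accuracy={scores[name]:.2f}")
```

Because every estimator exposes the same methods, swapping one model for another usually means changing a single line, which is a large part of why the library is approachable for non-specialists.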

Why business readers should care: the bulk of practical, profitable machine learning inside companies is still classical models on structured data, and scikit-learn is the tool most teams reach for first.

Last verified June 7, 2026