Efficient and Robust Automated Machine Learning (auto-sklearn)

“Efficient and Robust Automated Machine Learning” was presented at NeurIPS (then known as NIPS) in 2015 by Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Tobias Springenberg, Manuel Blum, and Frank Hutter. It introduced auto-sklearn, one of the best-known open-source AutoML systems, built on top of the scikit-learn library.

The system treats building a machine-learning pipeline as a single optimization problem: it chooses among 15 classifiers, 14 feature preprocessing methods, and 4 data preprocessing methods, and tunes the resulting space of about 110 hyperparameters, all with Bayesian optimization. Two ideas made it especially effective. First, meta-learning warm-starts the search with pipelines that worked well on similar past datasets, so the optimizer does not start from scratch. Second, rather than returning only the single best pipeline, it builds an ensemble from the strong pipelines it evaluated along the way, which improves robustness. Auto-sklearn won the first phase of the ChaLearn AutoML challenge.
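The ensemble step can be illustrated with a small sketch. The paper builds its ensemble with greedy ensemble selection (forward selection with replacement, after Caruana et al.). The code below is a toy version of that idea in plain Python, not auto-sklearn's own implementation: the function names, the accuracy metric, the tie-breaking rule, and the validation predictions are all assumptions made up for illustration.

```python
from collections import Counter

def accuracy(probs, labels):
    """Share of examples where thresholding the probability at 0.5 matches the label."""
    hits = sum((p >= 0.5) == bool(y) for p, y in zip(probs, labels))
    return hits / len(labels)

def greedy_ensemble(val_preds, labels, size):
    """Greedy forward selection with replacement (hypothetical helper).

    val_preds maps a model name to its predicted class-1 probabilities on a
    held-out validation set. At each step, add the model whose inclusion gives
    the best accuracy for the averaged prediction. A model may be picked more
    than once, which acts as a weight.
    """
    running_sum = [0.0] * len(labels)
    picks = Counter()
    for step in range(1, size + 1):
        best_name, best_score = None, -1.0
        for name in sorted(val_preds):  # sorted so ties resolve deterministically
            avg = [(s + p) / step for s, p in zip(running_sum, val_preds[name])]
            score = accuracy(avg, labels)
            if score > best_score:
                best_name, best_score = name, score
        picks[best_name] += 1
        running_sum = [s + p for s, p in zip(running_sum, val_preds[best_name])]
    return picks

# Made-up validation predictions from three already-evaluated pipelines.
LABELS = [1, 0, 1, 0]
VAL_PREDS = {
    "forest": [0.9, 0.2, 0.4, 0.1],
    "svm": [0.6, 0.6, 0.8, 0.3],
    "knn": [0.4, 0.7, 0.3, 0.6],
}

picks = greedy_ensemble(VAL_PREDS, LABELS, size=3)
print(dict(picks))
```

In this toy data no single pipeline classifies all four validation examples correctly, but the average of the "forest" and "svm" predictions does, which is the kind of gain the ensemble step is meant to capture. The weak "knn" pipeline is never selected.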

Auto-sklearn became a reference implementation for AutoML research and a practical tool that lets non-experts get strong baselines on tabular data.

For a business reader, auto-sklearn is a concrete example of packaging expert modeling decisions into software that a non-specialist can run end to end.
