Databricks open-sources MLflow for the machine-learning lifecycle

On June 5, 2018, Databricks announced MLflow, an open-source platform for managing the machine-learning lifecycle, in a blog post by Matei Zaharia and Cyrielle Simeone. Zaharia was the original creator of Apache Spark and Databricks’ chief technologist. MLflow was released as an alpha to address recurring pain points in building and shipping models.

The launch post identified three persistent problems: experiments are hard to track across the many tools and parameters involved, results are hard to reproduce, and models are hard to deploy to production. MLflow answered these with three components. MLflow Tracking is an API and UI for logging parameters, code versions, metrics, and output files so that runs can be compared. MLflow Projects is a standard format, built on a simple YAML descriptor, for packaging reusable code together with its dependencies. MLflow Models is a convention for packaging trained models in multiple “flavors” so that the same model can be deployed to different serving platforms. By design, MLflow worked with any ML library, language, or tool, and it was built around REST APIs and simple file formats rather than a closed ecosystem.
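The idea behind the Tracking component, recording each run's parameters and metrics in a simple file format so runs can be compared later, can be sketched with the Python standard library alone. This is a toy illustration and not MLflow's implementation or API: the `RunLogger` class, the `best_run` helper, the one-JSON-file-per-run layout, and the alpha/RMSE numbers are all hypothetical placeholders. In MLflow itself the corresponding calls are `mlflow.log_param` and `mlflow.log_metric`.

```python
import json
import tempfile
import uuid
from pathlib import Path


class RunLogger:
    """Hypothetical toy tracker: stores one JSON file per run (not MLflow's API)."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.run_id = uuid.uuid4().hex  # unique ID so runs never overwrite each other
        self.record = {"run_id": self.run_id, "params": {}, "metrics": {}}

    def log_param(self, key, value):
        # Parameters are the inputs of a run (e.g. hyperparameters).
        self.record["params"][key] = value

    def log_metric(self, key, value):
        # Metrics are the numeric results of a run.
        self.record["metrics"][key] = float(value)

    def close(self):
        # Persist the run as plain JSON so any tool or language can read it.
        path = self.root / f"{self.run_id}.json"
        path.write_text(json.dumps(self.record, indent=2))
        return path


def best_run(root, metric, lower_is_better=True):
    """Load every stored run that logged `metric` and return the best one."""
    runs = [json.loads(p.read_text()) for p in Path(root).glob("*.json")]
    runs = [r for r in runs if metric in r["metrics"]]
    if not runs:
        raise ValueError(f"no run logged the metric {metric!r}")
    pick = min if lower_is_better else max
    return pick(runs, key=lambda r: r["metrics"][metric])


def demo():
    """Log three runs with made-up placeholder numbers, then compare them."""
    root = Path(tempfile.mkdtemp())
    for alpha, rmse in [(0.1, 0.82), (0.5, 0.74), (1.0, 0.79)]:
        run = RunLogger(root)
        run.log_param("alpha", alpha)
        run.log_metric("rmse", rmse)
        run.close()
    return best_run(root, "rmse")
```

Calling `demo()` writes three run files to a temporary directory and returns the record with the lowest `rmse`. Because each run is an ordinary JSON file, the comparison step needs nothing beyond a directory listing, which is the kind of openness the launch post emphasized.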

MLflow grew quickly into one of the most widely adopted MLOps tools and was later donated to the Linux Foundation. It helped establish experiment tracking and model packaging as standard practice rather than ad-hoc scripting.
