MLOps

MLOps is the practice of applying the discipline of DevOps to machine learning systems. Where DevOps unified the development and operation of conventional software through automation, version control, and continuous delivery, MLOps extends those ideas to the additional moving parts that a machine learning system carries: training data, learned model parameters, feature pipelines, and the statistical behavior of models once they are serving live traffic.

The intellectual foundation for the field is usually traced to the paper “Hidden Technical Debt in Machine Learning Systems,” by D. Sculley and colleagues at Google, presented in 2015 at NIPS (the conference since renamed NeurIPS). The paper argues that only a small fraction of a real-world ML system is the machine learning code itself, and that the surrounding plumbing, data dependencies, and configuration accumulate maintenance costs in the same way ordinary software accrues technical debt. It catalogs ML-specific hazards such as entanglement, hidden feedback loops, undeclared consumers, and unstable data dependencies.

Out of that diagnosis grew a set of engineering practices. Because a trained model is a function of both code and data, reproducibility requires versioning the data and the model alongside the source. Because models degrade as the world they were trained on changes, deployed systems need monitoring for data drift and prediction quality, not just uptime. And because retraining is frequent, teams build continuous integration and continuous delivery pipelines that can retrain, validate, and redeploy a model automatically.
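The drift monitoring described above can be sketched with one common statistic, the Population Stability Index (PSI), which compares how a feature's values are distributed in the training data against how they are distributed in live traffic. This is a minimal illustration, not a prescribed method: the function name, the ten equal-width bins, and the smoothing constant are assumptions made for the example, and the alert threshold of 0.25 used in the usage note is a widely quoted rule of thumb rather than anything stated in this article.

```python
import math
from typing import Sequence


def population_stability_index(
    reference: Sequence[float],
    live: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Compare two samples of one numeric feature.

    Returns 0.0 when both samples put the same share of values in every bin;
    larger values mean the live distribution has moved away from the reference.
    """
    if not reference or not live:
        raise ValueError("both samples must be non-empty")

    # Bin edges come from the reference (training) sample only.
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature

    def bin_shares(values: Sequence[float]) -> list:
        counts = [0] * n_bins
        for v in values:
            # Clamp so live values outside the training range fall in an edge bin.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref_shares = bin_shares(reference)
    live_shares = bin_shares(live)
    return sum((l - r) * math.log(l / r) for r, l in zip(ref_shares, live_shares))
```

A monitoring job might compute this per feature on a schedule and raise an alert, or trigger retraining, when the value exceeds a chosen threshold such as 0.25. A production system would typically also track prediction quality against delayed ground-truth labels, which this sketch does not cover.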

In practice MLOps borrows heavily from established software engineering. Experiment tracking gives runs the equivalent of a commit history; a model registry plays the role of an artifact repository; a feature store enforces consistency between the features used in training and those used at inference. The aim is the same one DevOps pursued for ordinary software: to make the path from a change to a reliable production system fast, repeatable, and observable.
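To make the analogy with commit histories and artifact repositories concrete, the sketch below fingerprints a training run from its code version, training data, and hyperparameters, then records the result in a toy registry. Everything here is hypothetical and illustrative: `run_fingerprint`, `ModelRegistry`, and their fields are invented for this example and do not correspond to the API of any particular tracking tool or registry product.

```python
import hashlib
import json
from dataclasses import dataclass, field


def run_fingerprint(code_version: str, data: bytes, params: dict) -> str:
    """Hash everything that determines a trained model: code, data, and settings."""
    record = {
        "code_version": code_version,  # e.g. a source-control commit id
        "data_sha256": hashlib.sha256(data).hexdigest(),
        "params": params,
    }
    # sort_keys makes the result independent of dict insertion order.
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry: model name -> ordered list of versions."""

    _versions: dict = field(default_factory=dict)

    def register(self, name: str, fingerprint: str, metrics: dict) -> int:
        """Store a new version and return its 1-based version number."""
        entries = self._versions.setdefault(name, [])
        entries.append({"fingerprint": fingerprint, "metrics": dict(metrics)})
        return len(entries)

    def get(self, name: str, version: int) -> dict:
        """Look up a previously registered version by its 1-based number."""
        return self._versions[name][version - 1]
```

Two runs with the same code, data, and parameters produce the same fingerprint, so a registry entry can be traced back to the exact inputs that produced it. A real registry would persist entries and store the model artifact itself. This sketch keeps only metadata in memory.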

The term MLOps came into wide use only after 2015, as tooling matured and organizations moved from one-off models to fleets of models in continuous production. It is best understood not as a single technology but as the application of long-standing operational rigor to a class of systems whose behavior depends on data as much as on code.