DVC, short for Data Version Control, is an open-source command-line tool, made by the company Iterative, that brings the discipline of version control to the data and models in a machine-learning project. Git tracks code well but handles poorly the large files (datasets, model weights) that machine learning depends on. DVC fills that gap.
It works by keeping the large artifacts themselves in ordinary cloud or local storage while storing only small metadata pointers in the Git repository. That means a teammate can check out a specific Git commit and DVC will retrieve exactly the dataset and model that went with it, making experiments reproducible. On top of versioning, DVC adds lightweight ML pipelines that describe how data flows through processing and training steps and re-run only the steps affected by a change, plus local experiment tracking and tools to compare data, parameters, models, and metrics across runs. It is written in Python, distributed under the Apache 2.0 license, and describes its aims as Git for data, Makefiles for ML, and local experiment tracking.
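The pointer idea described above can be illustrated with a minimal Python sketch. This is not DVC's code or file format: the function names `track` and `restore`, the `.ptr` suffix, the JSON layout, and the choice of SHA-256 are all assumptions made for illustration. It only shows the general technique: a large file is copied into a store under its content hash, and a small pointer file, the part that would be committed to Git, records that hash.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def track(data_file: Path, store: Path) -> Path:
    """Copy data_file into the store under its digest; write a small pointer file.

    Hypothetical helper: the pointer layout is illustrative, not DVC's format.
    """
    digest = file_digest(data_file)
    store.mkdir(parents=True, exist_ok=True)
    target = store / digest
    if not target.exists():  # identical content is stored only once
        shutil.copy2(data_file, target)
    pointer = data_file.with_name(data_file.name + ".ptr")
    pointer.write_text(json.dumps({"path": data_file.name, "sha256": digest}))
    return pointer


def restore(pointer: Path, store: Path) -> Path:
    """Recreate the data file that a pointer file refers to."""
    meta = json.loads(pointer.read_text())
    out = pointer.with_name(meta["path"])
    shutil.copy2(store / meta["sha256"], out)
    return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        store = root / "store"   # stands in for cloud or local storage
        data = root / "train.csv"

        data.write_text("a,b\n1,2\n")      # first version of the dataset
        pointer = track(data, store)
        old_pointer = pointer.read_text()  # what an earlier commit would hold

        data.write_text("a,b\n3,4\n")      # second version of the dataset
        track(data, store)

        # Putting the old pointer back mimics checking out the earlier commit.
        pointer.write_text(old_pointer)
        print(restore(pointer, store).read_text())
```

In this sketch the pointer file stays a few dozen bytes regardless of dataset size, which is why it can live comfortably in a Git repository while the data itself lives elsewhere.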
Why a business reader should care: DVC makes machine-learning work auditable and reproducible, so a result can be traced back to the exact data and code that produced it, which matters for debugging, compliance, and trust.