Weights and Biases

Weights and Biases, usually abbreviated W&B, is a platform for experiment tracking and collaboration in machine learning. Founded in 2017, it became especially popular in research settings, where teams run large numbers of experiments and need a reliable way to record, visualize, and compare them. Its documentation describes the product as a platform for “experiment tracking, evaluation, and observability” used to develop AI models and ship applications built on them.

At its core, W&B addresses the same problem as other experiment-tracking systems: a single training run produces a tangle of hyperparameters, metrics, and output artifacts that are easy to lose. With a few lines of code added to a training script, W&B captures these automatically. Its tracking guide describes initializing a run, storing a dictionary of hyperparameters in the run’s configuration, logging metrics such as accuracy and loss during the training loop, and uploading outputs such as trained model weights as artifacts.
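The four steps in that guide can be sketched roughly as follows. This is a minimal illustration, not the guide's own code: the project name, artifact name, placeholder weights file, and the `simulate_epoch` helper are all hypothetical, and the fake metrics stand in for a real training loop. It assumes the `wandb` Python package is installed.

```python
import random
import tempfile
from pathlib import Path


def simulate_epoch(epoch: int, lr: float, rng: random.Random) -> dict:
    """Hypothetical stand-in for one real training epoch; returns fake metrics."""
    # Loss shrinks as epochs accumulate, plus a little noise.
    loss = 1.0 / (1.0 + lr * 100 * (epoch + 1)) + rng.uniform(0, 0.01)
    return {"epoch": epoch, "loss": loss, "accuracy": 1.0 - loss}


def tracked_training_run(project: str = "demo-project", mode: str = "offline") -> None:
    """Initialize a run, store config, log metrics, and upload an artifact."""
    # Imported here so the helper above can be used without wandb installed.
    import wandb  # third-party: pip install wandb

    # Hyperparameters are stored in the run's configuration.
    config = {"learning_rate": 0.01, "epochs": 5, "seed": 0}
    run = wandb.init(project=project, config=config, mode=mode)

    # Metrics are logged as dictionaries inside the training loop.
    rng = random.Random(config["seed"])
    for epoch in range(config["epochs"]):
        run.log(simulate_epoch(epoch, config["learning_rate"], rng))

    # Outputs such as model weights are attached to the run as artifacts.
    with tempfile.TemporaryDirectory() as tmp:
        weights_path = Path(tmp) / "model.txt"
        weights_path.write_text("placeholder weights")  # not a real model
        artifact = wandb.Artifact(name="demo-model", type="model")
        artifact.add_file(str(weights_path))
        run.log_artifact(artifact)
        run.finish()
```

Calling `tracked_training_run()` with `mode="offline"` records the run in a local directory without contacting the hosted service; such a run can be uploaded later with the `wandb sync` command. Passing `mode="online"` streams to the dashboard instead and requires a logged-in account.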

What distinguished W&B in practice was its hosted, interactive dashboard. Rather than logging to local files, runs streamed to a web interface where researchers could watch training curves update live, sort and filter across hundreds of runs, and build shared reports. That collaborative, visual layer made it a natural fit for academic groups and industrial research labs, and integrations with popular frameworks meant adoption required minimal code changes.

W&B grew alongside the broader MLOps movement and the open-source MLflow project, with which it is often compared. MLflow began as an open-source, self-hosted toolkit; W&B led with a hosted service and a polished collaboration experience. Both helped turn experiment tracking from an informal habit into standard machine learning infrastructure.

As the field shifted toward large language models, W&B expanded beyond classic model training. It added a model registry and hyperparameter sweep tooling under its Models offering, and a separate product line, Weave, aimed at tracing and evaluating applications built on top of LLMs. Its enduring contribution, however, remains the one it started with: making the runs behind machine learning work visible, comparable, and shareable.
