Active Learning

Active learning is a machine-learning approach in which the learning algorithm participates in deciding which data gets labeled. Labeling data is often the most expensive part of building a model. Instead of labeling a large random sample, the model selects the unlabeled examples it expects to learn the most from and asks a human (the oracle) to label just those. The model is then retrained on the enlarged labeled set, and the loop repeats as the model improves.
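The loop described above can be sketched in a few lines. This is only an illustrative toy, not a reference implementation: the 1-D threshold model, the names `fit_threshold` and `active_learning_loop`, and the integer pool are all assumptions made for the example. The oracle is simulated by a function, and the "most informative" example is taken to be the unlabeled point closest to the current decision boundary.

```python
def fit_threshold(labeled):
    """Toy 1-D model (hypothetical): place the decision boundary midway
    between the largest negative and the smallest positive example.
    Assumes at least one example of each class has been labeled."""
    negatives = [x for x, y in labeled if y == 0]
    positives = [x for x, y in labeled if y == 1]
    return (max(negatives) + min(positives)) / 2


def active_learning_loop(pool, oracle, seed_points, budget):
    """Pool-based loop: fit, pick the most uncertain point, ask the oracle, repeat."""
    # Start from a small labeled seed set (must contain both classes).
    labeled = [(x, oracle(x)) for x in seed_points]
    unlabeled = [x for x in pool if x not in seed_points]

    for _ in range(budget):
        if not unlabeled:
            break
        boundary = fit_threshold(labeled)
        # Query the unlabeled point the model is least sure about:
        # here, the one nearest the current boundary.
        query = min(unlabeled, key=lambda x: abs(x - boundary))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))  # the "human" provides the label

    return fit_threshold(labeled), labeled


# Simulated oracle: the true (hidden) rule is "positive if x >= 37".
boundary, labeled = active_learning_loop(
    pool=list(range(101)),
    oracle=lambda x: int(x >= 37),
    seed_points=[0, 100],
    budget=8,
)
print(boundary, len(labeled))
```

In this toy setting the loop behaves much like a binary search: eight oracle queries are enough to bracket the true cutoff, whereas eight randomly chosen points from the pool of 101 would usually leave a much wider gap around it.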

Burr Settles’ widely cited 2009 literature survey lays out the field’s structure. It distinguishes scenarios such as pool-based sampling (choosing from a fixed pool of unlabeled data) and stream-based selective sampling (deciding for each incoming example, one at a time, whether to request its label). It also catalogs query strategies, including uncertainty sampling (label the examples the model is least sure about), query-by-committee (label the examples on which a set of models disagrees most), and methods based on expected model change or expected error reduction. Finally, the survey covers practical complications such as noisy oracles, varying labeling costs, and batch-mode labeling.
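To make the query strategies concrete, here is a minimal sketch of three common uncertainty-sampling scores and one query-by-committee score. The function names and the example probabilities are assumptions for illustration. Each score takes a model's predicted class probabilities (or a committee's votes) for a single example, and a higher score means the example is a better candidate for labeling.

```python
import math
from collections import Counter


def least_confidence(probs):
    """Uncertainty = 1 minus the probability of the most likely class."""
    return 1 - max(probs)


def margin_uncertainty(probs):
    """A small gap between the top two classes means high uncertainty.
    Returned as 1 - margin so that higher always means more uncertain
    (a convention chosen for this sketch)."""
    top_two = sorted(probs, reverse=True)[:2]
    return 1 - (top_two[0] - top_two[1])


def entropy(probs):
    """Shannon entropy of the predicted class distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def vote_entropy(votes):
    """Query-by-committee: entropy of the labels predicted by committee members.
    Zero when all members agree, larger when they disagree."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def rank_by_uncertainty(pool_probs, score=entropy, k=1):
    """Return the indices of the k pool examples with the highest score."""
    order = sorted(range(len(pool_probs)),
                   key=lambda i: score(pool_probs[i]),
                   reverse=True)
    return order[:k]


# Hypothetical predicted probabilities for three unlabeled examples.
pool = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]]
print(rank_by_uncertainty(pool, k=2))        # indices to send to the oracle
print(vote_entropy(["cat", "dog", "cat"]))   # committee disagreement
```

For binary problems the three uncertainty scores produce the same ranking; they can differ once there are more than two classes, because each one weighs the less likely classes differently.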

Active learning is built into many modern data pipelines and labeling platforms, where a model pre-selects hard or ambiguous cases for human review. It also connects closely to the data-flywheel idea of continuously improving a model from production data.

For a business reader, active learning is how teams cut labeling budgets, spending scarce human attention on the examples that actually move model quality.

Sources

Related