Anomaly Detection

Anomaly detection, also called outlier detection, is the task of identifying the small number of data points that do not fit the pattern of the rest. The classic examples are fraudulent credit-card transactions hidden among millions of legitimate ones, a failing machine among healthy ones on a factory floor, or an intrusion in a stream of normal network traffic.

What makes anomaly detection distinctive is that the interesting cases are rare and often unlabeled. You usually cannot collect a large, balanced set of examples of every kind of fraud, so many anomaly detectors are unsupervised: they learn what normal looks like from unlabeled data and flag whatever deviates. Approaches fall into a few families. Some estimate the density or distribution of the data and flag low-probability points. Some measure distance to neighbors or to cluster centers. And some, like the Isolation Forest of Liu, Ting, and Zhou, directly try to isolate odd points, exploiting the fact that anomalies are easy to separate. Because the cost of a missed anomaly and a false alarm differ, a recurring challenge is setting the threshold that decides what counts as anomalous.

Anomaly detection is one of the most directly valuable uses of machine learning in business operations.

Why business readers should care: it is the technology behind fraud alerts, predictive maintenance, and security monitoring, finding the few costly cases buried in oceans of routine data.

Sources

Related