Isolation Forest

“Isolation Forest” by Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou was presented at the IEEE International Conference on Data Mining in December 2008. It introduced a fast, now widely used method for finding anomalies: the rare, unusual records that often signal fraud, faults, or intrusions.

Most earlier anomaly detectors worked by building a profile of normal data and flagging anything far from it, using distance or density measures that become slow on large, high-dimensional data. Isolation Forest inverts the logic. It builds many random trees, each grown by repeatedly picking a feature at random and splitting the data at a random value of that feature. Anomalies, being few and different, tend to be separated from the rest of the data after only a few splits, so they end up close to the root of each tree. The detector scores each point by the average number of splits needed to isolate it across all trees: the shorter the average path, the more anomalous the point. Because it isolates outliers directly and each tree can be built on a small subsample rather than the full dataset, the method has linear time complexity and low memory use, making it practical at scale.
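The idea can be made concrete with a small sketch in plain Python. It is an illustration, not the authors' reference implementation: the function names (`build_tree`, `path_length`, `fit_forest`, `anomaly_score`) and the toy data are assumptions made for this example. The defaults of 100 trees and a subsample of 256 points, the depth limit of log2 of the subsample size, and the score 2^(-E[h(x)] / c(n)) follow the paper's description.

```python
import math
import random

def c(n):
    """Normalizer: average path length of an unsuccessful search in a
    binary search tree over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximates H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Grow one isolation tree with random features and random split values."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:  # nothing left to split on this feature
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return {
        "feature": feature,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }

def path_length(point, node, depth=0):
    """Count splits until the point reaches a leaf; unresolved leaves add c(size)."""
    if "split" not in node:
        return depth + c(node["size"])
    side = "left" if point[node["feature"]] < node["split"] else "right"
    return path_length(point, node[side], depth + 1)

def fit_forest(data, n_trees=100, sample_size=256, seed=0):
    """Build n_trees trees, each on its own random subsample (needs >= 2 points)."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [
        build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    return trees, sample_size

def anomaly_score(point, trees, sample_size):
    """Score in (0, 1]; values closer to 1 mean shorter paths, i.e. more anomalous."""
    mean_path = sum(path_length(point, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c(sample_size))

# Toy data (hypothetical): a Gaussian cluster plus one far-away point.
data_rng = random.Random(42)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(500)]
data.append((8.0, 8.0))

trees, psi = fit_forest(data)
outlier_score = anomaly_score((8.0, 8.0), trees, psi)
inlier_score = anomaly_score((0.0, 0.0), trees, psi)
print(outlier_score, inlier_score)
```

The far-away point should receive the higher score because random splits cut it off from the cluster within a few levels, while a point in the middle of the cluster usually survives until the depth limit. Each tree sees only `sample_size` points, which is why the cost of building the forest does not depend on the full dataset size.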

Isolation Forest is now a standard tool for unsupervised anomaly detection and is included in common machine learning libraries such as scikit-learn.

Why business readers should care: Isolation Forest is one of the cheapest reliable ways to surface suspicious transactions, sensor readings, or log events hiding in large volumes of normal data.
