LightGBM: A Highly Efficient Gradient Boosting Decision Tree

“LightGBM: A Highly Efficient Gradient Boosting Decision Tree” by Guolin Ke and colleagues at Microsoft Research was published at NeurIPS 2017. It addresses a practical bottleneck in gradient-boosted decision trees: training slows down badly as the number of examples and features grows.

LightGBM introduces two techniques to speed things up. Gradient-based One-Side Sampling (GOSS) keeps all training examples with large gradients, which contribute most to the information gain, and randomly samples only a fraction of those with small gradients, reweighting the sampled ones by a constant so the gain estimate is not skewed toward the large-gradient examples. Each tree therefore sees less data without much loss of accuracy. Exclusive Feature Bundling (EFB) packs together features that are rarely nonzero at the same time, as is common in sparse data, shrinking the effective number of features. The authors report that LightGBM speeds up training by up to more than twenty times over conventional gradient boosting while achieving almost the same accuracy on several public datasets.
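The two ideas can be illustrated with a minimal sketch in plain Python. It is a simplified illustration, not the library's implementation: the function names goss_sample and bundle_exclusive, the parameters a, b, seed and offset, and the assumption that feature values are non-negative integers with 0 as the default are all hypothetical choices made for this example. The real algorithm works on histogram bins, bundles many features at once, and tolerates a small rate of conflicts.

```python
import random


def goss_sample(gradients, a=0.2, b=0.1, seed=None):
    """Pick training rows for one tree, GOSS-style (simplified sketch).

    a: fraction of rows kept because their |gradient| is largest.
    b: fraction of rows drawn at random from the remaining rows.
    Returns (indices, weights); sampled small-gradient rows get weight
    (1 - a) / b so that they stand in for the rows that were dropped.
    """
    rng = random.Random(seed)
    n = len(gradients)
    top_k = int(a * n)
    rand_k = min(int(b * n), n - top_k)

    # Row indices ordered by absolute gradient, largest first.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_idx = order[:top_k]                           # always kept
    sampled_idx = rng.sample(order[top_k:], rand_k)   # random subset of the rest

    indices = top_idx + sampled_idx
    weights = [1.0] * top_k + [(1.0 - a) / b] * rand_k
    return indices, weights


def bundle_exclusive(col_a, col_b, offset):
    """Merge two mutually exclusive features into one column (simplified sketch).

    Assumes non-negative integer values where 0 means "default/absent".
    Values of col_b are shifted by `offset` (e.g. the largest value in col_a)
    so the two original features stay distinguishable inside the bundle.
    """
    bundled = []
    for x, y in zip(col_a, col_b):
        if x != 0 and y != 0:
            raise ValueError("features conflict: both nonzero in the same row")
        if x != 0:
            bundled.append(x)
        elif y != 0:
            bundled.append(y + offset)
        else:
            bundled.append(0)
    return bundled


# Usage: keep the top 20% of rows by |gradient| plus a random 30% of the rest.
grads = [0.9, -0.05, 0.02, -1.2, 0.1, 0.03, -0.01, 0.4, 0.06, -0.02]
rows, row_weights = goss_sample(grads, a=0.2, b=0.3, seed=0)

# Usage: two sparse features that are never nonzero in the same row.
merged = bundle_exclusive([1, 0, 0, 2], [0, 3, 0, 0], offset=2)
```

In this sketch the weight (1 - a) / b compensates for the fact that only a fraction b of all rows is drawn from the small-gradient remainder, which makes up a fraction (1 - a) of the data.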

LightGBM, like XGBoost and CatBoost, became a standard tool for tabular machine learning, and is especially favored when datasets are very large or training time matters. It powered many top finishes in the M5 retail forecasting competition.

Why business readers should care: when a company has millions of rows of transactional data, LightGBM can fit accurate models quickly enough to retrain them often, keeping predictions fresh as conditions change.
