“Greedy Function Approximation: A Gradient Boosting Machine” by Jerome H. Friedman appeared in The Annals of Statistics in 2001. It is the paper that gave gradient boosting its modern theoretical footing, and the major gradient boosting libraries in use today build on its framework.
Friedman’s insight was to view model fitting as optimization in function space. Instead of adjusting a fixed set of parameters, the method builds up a prediction function by adding one simple model at a time. Each new model is fit to the negative gradient of the loss with respect to the current predictions, which is the direction that most rapidly reduces the loss. With regression trees as the simple models, this becomes the gradient tree boosting that underlies XGBoost, LightGBM, and CatBoost. The framework works for many loss functions, which is why the same core machinery handles regression, classification, and ranking. Friedman also showed that shrinking each step by a small learning rate improves accuracy, and his companion paper, “Stochastic Gradient Boosting,” added the idea of fitting each tree on a random subsample of the data for further gains in accuracy and robustness.
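To make the procedure concrete, here is a minimal Python sketch of the loop described above. It is an illustration under simplifying assumptions, not Friedman’s algorithm as published: it uses squared-error loss (for which the negative gradient is simply the residual), a single numeric feature, and one-split trees. The names `fit_stump`, `negative_gradient`, `boost`, and `mean_squared_error` are made up for this example, and the toy data is invented.

```python
# Gradient boosting sketch: squared-error loss, one numeric feature,
# depth-1 regression trees ("stumps"). Standard library only.
# All function names are illustrative, not from the paper or any library.

def fit_stump(xs, targets):
    """Least-squares fit of a one-split tree; returns a prediction function."""
    overall_mean = sum(targets) / len(targets)
    best = None  # (sse, threshold, left_mean, right_mean)
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x identical: no split possible, predict the mean
        return lambda x: overall_mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def negative_gradient(y, prediction):
    """For L(y, F) = 0.5 * (y - F)**2, the negative gradient is y - F."""
    return y - prediction


def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Build an additive model one stump at a time."""
    initial = sum(ys) / len(ys)  # best constant under squared error
    stumps = []
    predictions = [initial] * len(xs)
    for _ in range(n_rounds):
        # Fit the next stump to the negative gradient at the current predictions.
        pseudo_residuals = [negative_gradient(y, p)
                            for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, pseudo_residuals)
        stumps.append(stump)
        # Take a shrunken step in that direction.
        predictions = [p + learning_rate * stump(x)
                       for p, x in zip(predictions, xs)]

    def predict(x):
        return initial + learning_rate * sum(s(x) for s in stumps)

    return predict


def mean_squared_error(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Invented toy data: a curve that no single stump can fit well.
xs = [i / 10.0 for i in range(40)]
ys = [x * x for x in xs]

model = boost(xs, ys, n_rounds=50, learning_rate=0.1)
print("training MSE after 50 rounds:", mean_squared_error(model, xs, ys))
```

The `learning_rate` argument plays the role of the shrinkage factor: smaller values take more cautious steps and typically need more rounds. Swapping in a different loss would mean changing only `negative_gradient`, which is the sense in which the same machinery covers several kinds of prediction task.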
The paper, together with Friedman’s companion work, turned boosting from a clever trick into a general, principled procedure that practitioners could adapt to almost any prediction task on structured data.
Why business readers should care: the gradient-boosting tools that win most tabular prediction problems today are direct descendants of this single paper’s idea.