The bias-variance tradeoff is the central balancing act in supervised learning. The “Machine Learning Basics” chapter of Goodfellow, Bengio, and Courville’s “Deep Learning” treats it as a core concept, tying it to model capacity, overfitting, and underfitting. The framing was made influential by Stuart Geman, Elie Bienenstock, and René Doursat in their 1992 paper “Neural Networks and the Bias/Variance Dilemma.”
The idea decomposes a model’s expected error into two competing sources, plus irreducible noise that no model can remove. Bias is error from wrong assumptions: a model too simple to capture the real pattern underfits and is wrong in a consistent way no matter how much data it sees. Variance is error from oversensitivity: a model so flexible that it fits the random noise in its particular training sample overfits, so its predictions swing widely when the training data changes. Simple models tend to have high bias and low variance; complex models tend to have the reverse. The lowest total error usually sits at an intermediate level of complexity.
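The decomposition can be made concrete with a small simulation. The following is a minimal sketch, not a definitive method: it assumes a hypothetical true pattern (a sine curve), Gaussian noise, and polynomial degree as a stand-in for model complexity. The names `true_fn` and `bias_variance` and all parameter values are illustrative choices, not taken from the works cited above. The sketch refits the model on many training samples that differ only in their noise, then measures how far the average prediction sits from the truth (squared bias) and how much individual predictions scatter around that average (variance).

```python
import numpy as np
from numpy.polynomial import polynomial as P


def true_fn(x):
    # Hypothetical ground-truth pattern, chosen only for illustration.
    return np.sin(2 * np.pi * x)


def bias_variance(degree, n_trials=200, n_train=30, noise_sd=0.3, seed=0):
    """Estimate squared bias and variance of a polynomial fit by simulation."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_train)   # inputs held fixed across trials
    x_test = np.linspace(0.05, 0.95, 50)       # points where predictions are compared
    preds = np.empty((n_trials, x_test.size))

    for t in range(n_trials):
        # Each trial is a new training sample: same inputs, fresh noise.
        y_train = true_fn(x_train) + rng.normal(0.0, noise_sd, n_train)
        coefs = P.polyfit(x_train, y_train, degree)
        preds[t] = P.polyval(x_test, coefs)

    mean_pred = preds.mean(axis=0)
    # Squared bias: gap between the average prediction and the truth.
    bias_sq = float(np.mean((mean_pred - true_fn(x_test)) ** 2))
    # Variance: spread of predictions across training samples.
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance


for degree in (1, 4, 9):
    b, v = bias_variance(degree)
    print(f"degree={degree}  bias^2={b:.4f}  variance={v:.4f}")
```

Under these assumptions, the straight line (degree 1) shows the largest squared bias and the smallest variance of the three, while the degree-9 polynomial shows the largest variance. The exact figures depend on the noise level, sample size, and seed chosen here.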
This framework explains why more complex is not always better and motivates the tools used to manage complexity: regularization, cross-validation, and ensemble methods. The deep-learning era has complicated the classic picture: very large networks can achieve low test error despite enormous capacity, a phenomenon studied under the name “double descent.” Even so, the tradeoff remains the right starting intuition.
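Of the tools named above, cross-validation is the most direct way to pick a complexity level from data. The following is a hedged sketch under stated assumptions: synthetic data from a hypothetical sine pattern, polynomial degree as the complexity setting, and a hand-written 5-fold split. The function name `cv_error` and every parameter value are illustrative, not part of any cited work.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def cv_error(x, y, degree, k=5):
    """Mean squared validation error of a polynomial fit under k-fold CV."""
    folds = np.array_split(np.arange(x.size), k)
    errors = []
    for fold in folds:
        # Hold out one fold for validation and fit on the rest.
        train = np.ones(x.size, dtype=bool)
        train[fold] = False
        coefs = P.polyfit(x[train], y[train], degree)
        pred = P.polyval(x[~train], coefs)
        errors.append(np.mean((pred - y[~train]) ** 2))
    return float(np.mean(errors))


# Synthetic data (an assumption for illustration). The inputs are drawn in
# random order, so contiguous folds are already effectively shuffled.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 60)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, 60)

# Score each candidate complexity level and keep the one with the lowest error.
scores = {d: cv_error(x, y, d) for d in range(1, 10)}
best_degree = min(scores, key=scores.get)
print("selected degree:", best_degree)
```

The selected degree depends on the particular sample and on how the folds fall, so it should be read as an estimate of the right complexity for this data rather than a fixed answer.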
Why business readers should care: the bias-variance tradeoff is the reason model building is not just “use the most powerful model available.” Getting the complexity right for the data on hand is the difference between a model that generalizes and one that fails in production.