“On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities” by Vladimir Vapnik and Alexey Chervonenkis appeared in Theory of Probability and Its Applications, volume 16, issue 2, in 1971 (pages 264-280). It is widely regarded as the foundational paper of statistical learning theory, and it introduced the capacity measure now called the VC dimension.
The paper asks when a model that fits the training data well can be trusted to perform well on new data. The answer turns on the “capacity” of the class of models being considered: roughly, how complex and flexible it is. Vapnik and Chervonenkis defined a precise measure of this, the VC dimension: the size of the largest set of points that the model class can label in every possible way (such a set is said to be “shattered” by the class). A higher VC dimension means more expressive power but also a greater risk of fitting noise. They proved bounds showing that when the number of training examples is large relative to the VC dimension, the error rate measured on the training set is, with high probability, close to the error rate on new data drawn from the same distribution.
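To make “label in every possible way” concrete, here is a minimal sketch for one simple model class, assumed purely for illustration: classifiers that output 1 for points inside a closed interval [a, b] on the real line and 0 otherwise. The function names (`interval_can_realize`, `is_shattered`, `largest_shattered_size`) are hypothetical. The brute-force check shows that intervals can label two distinct points in all four ways but cannot handle three points, consistent with this class having VC dimension 2.

```python
from itertools import combinations, product

def interval_can_realize(points, labels):
    """Return True if some closed interval [a, b] marks exactly the points
    labeled 1 as positive and every other point as negative."""
    positives = [x for x, y in zip(points, labels) if y == 1]
    if not positives:
        # An interval placed away from every point labels them all 0.
        return True
    lo, hi = min(positives), max(positives)
    # [lo, hi] is the tightest interval covering the positives; the labeling
    # is achievable exactly when no negative point falls inside it.
    return all(not (lo <= x <= hi) for x, y in zip(points, labels) if y == 0)

def is_shattered(points, can_realize):
    """True if the model class realizes all 2**len(points) labelings."""
    return all(can_realize(points, labels)
               for labels in product((0, 1), repeat=len(points)))

def largest_shattered_size(pool, can_realize):
    """Size of the largest subset of `pool` that the class shatters.

    Shattering is hereditary (every subset of a shattered set is shattered),
    so the search stops at the first size where no subset is shattered.
    This only lower-bounds the VC dimension: it searches `pool`, not every
    possible set of points."""
    best = 0
    for k in range(1, len(pool) + 1):
        if any(is_shattered(list(s), can_realize) for s in combinations(pool, k)):
            best = k
        else:
            break
    return best

print(is_shattered([1.0, 2.0], interval_can_realize))        # True
print(is_shattered([1.0, 2.0, 3.0], interval_can_realize))   # False: labeling (1, 0, 1) fails
print(largest_shattered_size([0, 1, 2, 3, 4, 5], interval_can_realize))  # 2
```

The same `is_shattered` helper works for any model class if you supply a different `can_realize` function; only the realizability test is specific to intervals.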
This work formalized the trade-off between model complexity and generalization that every practitioner navigates, and it laid the groundwork for the support vector machines that Vapnik developed with colleagues in the 1990s, which were designed to control capacity explicitly.
Why business readers should care: VC dimension is the rigorous version of a hard-won practical lesson - a more complex model is not automatically better, and the right amount of complexity depends on how much data you have.
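The dependence on data size can be made concrete with a commonly quoted textbook form of the VC bound from Vapnik's later work (not the exact statement in the 1971 paper). The sketch below assumes binary classification with 0-1 loss and independent, identically distributed samples; `vc_gap` is a hypothetical helper name.

```python
import math

def vc_gap(d, n, delta=0.05):
    """Confidence term in a commonly quoted form of Vapnik's bound.

    With probability at least 1 - delta, for every model in a class of
    VC dimension d trained on n i.i.d. examples (0-1 loss):
        true_error <= training_error + vc_gap(d, n, delta)
    Assumes 0 < d < n."""
    if not 0 < d < n:
        raise ValueError("requires 0 < d < n")
    return math.sqrt((d * (math.log(2 * n / d) + 1) - math.log(delta / 4)) / n)

# Tabulate how the gap shrinks with more data and grows with more capacity.
for d in (10, 100):
    for n in (1_000, 10_000, 100_000):
        print(f"d={d:>3}  n={n:>7,}  gap={vc_gap(d, n):.3f}")
```

With d = 10, the gap is about 0.26 at n = 1,000 but about 0.034 at n = 100,000, and raising d to 100 widens it at every n shown. Worst-case bounds like this are usually loose in practice; the useful lesson is the trend, not the exact numbers.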