“Maximum Likelihood from Incomplete Data via the EM Algorithm” by Arthur Dempster, Nan Laird, and Donald Rubin appeared in the Journal of the Royal Statistical Society, Series B, volume 39, issue 1, in 1977 (pages 1-38). It gave a single name, Expectation-Maximization or EM, and a unified theory to a family of techniques that statisticians had been reinventing separately.
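EM solves a recurring problem: how to fit a model when some of the data you would need is missing or hidden. Many models have such hidden variables - which cluster a point belongs to, which underlying state a system is in. EM alternates two steps. In the expectation (E) step, it uses the current model to work out how probable each possible value of the missing information is, producing weighted guesses rather than single hard choices. In the maximization (M) step, it updates the model parameters as if those weighted guesses were real data. Each round is guaranteed never to make the fit worse, as measured by the likelihood of the observed data, so repeating the two steps improves the fit or holds it steady until it stops changing. The stopping point is typically a local optimum that depends on where the algorithm started, not necessarily the best possible fit, so practitioners often run EM from several starting points.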
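The two steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's own formulation: it assumes a mixture of two one-dimensional Gaussians, where the hidden variable is which component produced each point. The function names (`normal_pdf`, `log_likelihood`, `em_gmm_1d`), the crude initialization, and the synthetic data are all choices made here for the example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """Log-likelihood of the observed data under the current mixture."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)))
        for x in data
    )


def em_gmm_1d(data, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture with EM (illustrative sketch)."""
    n = len(data)
    # Crude starting point (an assumption of this sketch): means at the data
    # extremes, both variances equal to the overall variance, equal weights.
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    weights = [0.5, 0.5]
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    history = [log_likelihood(data, weights, means, variances)]

    for _ in range(n_iter):
        # E step: for each point, the probability that each component produced it.
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([j / total for j in joint])

        # M step: re-estimate parameters, treating those probabilities as weights.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            # Small floor guards against a collapsing component; not part of textbook EM.
            variances[k] = max(var_k, 1e-6)

        history.append(log_likelihood(data, weights, means, variances))

    return weights, means, variances, history


# Synthetic data: two groups whose labels are then "hidden" from the algorithm.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)] + [random.gauss(5.0, 1.0) for _ in range(200)]
weights, means, variances, history = em_gmm_1d(data)
print("weights:", weights)
print("means:", means)
print("variances:", variances)
```

The `history` list records the log-likelihood after every round, which makes it easy to check that the value never goes down from one round to the next. A different starting point can lead to a different answer on harder data, which is why the sketch's initialization should be read as one option among many.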
The algorithm became foundational across statistics and machine learning. It is the standard way to fit Gaussian mixture models, to train hidden Markov models for speech and bioinformatics (where it is known as the Baum-Welch algorithm), and to handle missing values in everyday data analysis. K-means clustering can be seen as a hard-assignment variant of the same idea: each point is assigned wholly to its nearest cluster instead of being shared among clusters by probability.
Why business readers should care: EM is the mathematical machinery that lets models learn from messy, incomplete real-world data - the norm, not the exception, in most organizations.