Thomas Ferguson published this paper in The Annals of Statistics in 1973 to solve a problem that had blocked Bayesian methods from large parts of statistics. Bayesian inference requires a prior distribution over the unknown quantity, but in nonparametric problems the unknown quantity is itself an entire probability distribution. Ferguson needed a prior that lived on the space of distributions, was mathematically tractable, and gave sensible posteriors after seeing data.
His answer was the Dirichlet process. Its key property, which he proves in the paper, is conjugacy: if the prior over distributions is a Dirichlet process, then after observing samples the posterior is again a Dirichlet process, with the observed points added as unit point masses to the prior's parameter measure. That closure under updating made the prior usable in practice and let Ferguson rederive classical procedures, including the sign test and the Mann-Whitney statistic, from a Bayesian standpoint.
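In symbols, the conjugate update can be sketched as follows. The notation is an assumption chosen for illustration and may differ from the paper's: \alpha is the finite parameter measure of the prior on the sample space \mathcal{X}, \delta_x is a unit point mass at x, A is any measurable set, and F_n is the empirical distribution of the n observations.

```latex
\begin{align*}
P &\sim \mathrm{DP}(\alpha), \qquad X_1, \dots, X_n \mid P \overset{\text{iid}}{\sim} P \\
P \mid X_1, \dots, X_n &\sim \mathrm{DP}\Bigl(\alpha + \sum_{i=1}^{n} \delta_{X_i}\Bigr) \\
\mathbb{E}\bigl[P(A) \mid X_1, \dots, X_n\bigr]
  &= \frac{\alpha(A) + \sum_{i=1}^{n} \mathbf{1}\{X_i \in A\}}{\alpha(\mathcal{X}) + n}
   = w_n \, \frac{\alpha(A)}{\alpha(\mathcal{X})} + (1 - w_n) \, F_n(A),
  \qquad w_n = \frac{\alpha(\mathcal{X})}{\alpha(\mathcal{X}) + n}
\end{align*}
```

Read this way, the posterior mean is a weighted average of the prior guess and the empirical distribution, and the total mass \alpha(\mathcal{X}) behaves like a prior sample size: the larger it is, the more data it takes to move the estimate away from the prior.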
Ferguson was candid about a limitation: distributions drawn from a Dirichlet process are discrete with probability one, which complicates certain goodness-of-fit problems. That same discreteness, however, later turned out to be a feature rather than a bug.
The reason this matters today is clustering. Because a distribution drawn from a Dirichlet process concentrates its mass on countably many points, observations sampled from it repeat values with positive probability and so fall naturally into groups. That made the Dirichlet process the foundation for Bayesian nonparametric models that infer the number of clusters or topics from the data instead of fixing it in advance, an idea now common in document modeling, customer segmentation, and other settings where the right number of groups is unknown.
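The clustering behavior can be illustrated with a small simulation. The sketch below uses the Chinese restaurant process, a sequential description of the partition a Dirichlet process induces that was developed after Ferguson's paper and is not taken from it. The function name `crp_assignments` and the parameter `alpha` (standing for the total mass of the prior's parameter measure) are hypothetical names chosen for this example.

```python
import random


def crp_assignments(n, alpha, seed=0):
    """Assign n observations to clusters via the Chinese restaurant process.

    Hypothetical helper for illustration. `alpha` plays the role of the
    total mass of the Dirichlet process parameter measure.
    """
    if n < 0 or alpha <= 0:
        raise ValueError("n must be non-negative and alpha must be positive")

    rng = random.Random(seed)
    counts = []  # counts[k] = number of observations currently in cluster k
    labels = []  # labels[i] = cluster index assigned to observation i

    for i in range(n):
        # With i observations already placed, the next one joins cluster k
        # with probability counts[k] / (i + alpha) and opens a new cluster
        # with probability alpha / (i + alpha).
        u = rng.random() * (i + alpha)
        chosen = len(counts)  # default: open a new cluster
        cumulative = 0.0
        for k, size in enumerate(counts):
            cumulative += size
            if u < cumulative:
                chosen = k
                break

        if chosen == len(counts):
            counts.append(1)
        else:
            counts[chosen] += 1
        labels.append(chosen)

    return labels, counts


if __name__ == "__main__":
    labels, counts = crp_assignments(n=1000, alpha=2.0, seed=42)
    print("clusters found:", len(counts))
    print("largest cluster sizes:", sorted(counts, reverse=True)[:5])
```

Nothing in the code fixes the number of clusters in advance. It emerges from the draws, and under this scheme the expected count grows only logarithmically with the number of observations, at a rate set by `alpha`.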