Calibrating Noise to Sensitivity in Private Data Analysis

This 2006 paper by Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith, presented at the Third Theory of Cryptography Conference, established the technical core of what is now called differential privacy. Its central result is deceptively simple: you can publish a useful answer to a question about a sensitive dataset while protecting every individual in it, if you add random noise whose size is calibrated to how much any single person’s record could change the answer.

The authors call that quantity, the most that one record can move the result, the sensitivity of the function. A query like “how many people in this dataset have a given disease” has low sensitivity, because adding or removing one person changes the count by at most one, so only a small amount of noise is needed. The paper showed that calibrating noise this way gives a precise, provable guarantee: any published output is almost equally likely whether or not a particular person was included. This generalized earlier work that handled only noisy sums, extending the approach to general functions of the data.
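As a rough illustration of the counting example, the sketch below adds Laplace-distributed noise with scale equal to sensitivity divided by a privacy parameter epsilon. The function names, the toy `patients` data, and the choice of epsilon are assumptions made for illustration, not taken from the paper, and a naive floating-point sampler like this one is not suitable for production use.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centered at 0.

    The difference of two independent exponential variables with mean
    `scale` is Laplace-distributed with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon, rng=None):
    """Return a count of matching records with calibrated Laplace noise."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    # Adding or removing one person changes a count by at most 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical toy dataset: 1000 records, 250 of them flagged.
patients = [{"has_disease": i % 4 == 0} for i in range(1000)]

released = noisy_count(
    patients,
    lambda record: record["has_disease"],
    epsilon=0.5,  # assumed value; smaller epsilon means more noise
    rng=random.Random(0),
)
print(released)
```

A smaller epsilon gives a larger noise scale and stronger protection; a function with higher sensitivity needs proportionally more noise for the same epsilon.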

The framing is what made it durable. Rather than promising that data is “anonymized,” a claim repeatedly broken by re-identification attacks, the paper offered a mathematical definition of privacy as indistinguishability of outcomes, and a recipe for achieving it. The work introduced the mechanism later known as the Laplace mechanism and laid the groundwork for the privacy budget that bounds cumulative disclosure across many queries.
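For readers who want the formal statement, the definition and recipe are usually written as follows. The symbols are the conventional modern ones (a randomized mechanism M, datasets D and D' differing in one record, privacy parameter epsilon) and are assumptions of this sketch rather than quotations from the paper, which uses its own notation and terminology.

```latex
% Indistinguishability of outcomes: for all datasets D, D' differing
% in one record and every set S of possible outputs,
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S]

% Sensitivity of a function f with k real-valued outputs:
\Delta f \;=\; \max_{D, D'} \, \lVert f(D) - f(D') \rVert_1

% Laplace mechanism: add independent noise to each output coordinate,
\mathcal{M}(D) \;=\; f(D) + (Y_1, \dots, Y_k),
\qquad Y_i \sim \mathrm{Lap}\!\left(\Delta f / \varepsilon\right)

% Privacy budget (sequential composition): answering queries with
% parameters \varepsilon_1, \dots, \varepsilon_m on the same data costs
\varepsilon_{\text{total}} \;=\; \sum_{j=1}^{m} \varepsilon_j
```

The last line is why the budget is cumulative: each additional query answered on the same data spends more of a fixed total.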

For a business reader, this paper is the origin of a privacy technique that comes with a formal, provable guarantee rather than a hope. Apple, Google, and the U.S. Census Bureau have all deployed systems built on it, and any organization that wants to publish statistics or train models on regulated data without betting its reputation on “we removed the names” is, knowingly or not, relying on the idea introduced here.
