In her 2000 paper “Simple Demographics Often Identify People Uniquely,” computer scientist Latanya Sweeney analyzed 1990 US Census data and reported that “87% (216 million of 248 million) of the population in the United States had reported characteristics that likely made them unique based only on {5-digit ZIP, gender, date of birth}.”
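To make "unique based only on" concrete, the following is a minimal sketch that counts how many rows of a small table carry a combination of ZIP, gender, and date of birth that no other row shares. The records and the helper name `fraction_unique` are hypothetical, invented for illustration. This direct count over individual rows is not a reconstruction of the paper's own estimation procedure.

```python
from collections import Counter

# Hypothetical toy records: (5-digit ZIP, gender, date of birth).
records = [
    ("02138", "F", "1961-07-14"),
    ("02138", "F", "1961-07-14"),  # shares all three values with the row above
    ("02138", "M", "1958-03-02"),
    ("02139", "F", "1972-11-30"),
    ("02139", "M", "1972-11-30"),
]

def fraction_unique(rows):
    """Share of rows whose full combination of values appears exactly once."""
    counts = Counter(rows)
    unique = sum(1 for row in rows if counts[row] == 1)
    return unique / len(rows)

print(fraction_unique(records))  # 3 of 5 rows are unique -> 0.6
print(round(216 / 248, 2))       # the quoted headline ratio -> 0.87
```

Dropping a field from each tuple, or coarsening it (for example, keeping only the birth year), would typically lower the fraction, which is the intuition behind generalizing quasi-identifiers.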
The same paper estimated that about half the US population (132 million of 248 million, or 53%) was likely to be uniquely identified by place (city, town, or municipality), gender, and date of birth alone, and that even at the county level, county plus gender plus date of birth was likely to single out about 18% of people. Sweeney had already shown that the danger was real: she bought the voter registration list for Cambridge, Massachusetts for twenty dollars and linked it to supposedly anonymous state health insurance records, re-identifying the records of then-governor William Weld.
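The linking step can be sketched as a join on the shared demographic fields. Everything below is assumed for illustration: the rows are made up, and the names `QUASI_IDENTIFIERS`, `key`, and `link` are hypothetical. A medical row counts as re-identified only when exactly one person in the named list has the same ZIP, gender, and date of birth.

```python
# Hypothetical "de-identified" medical rows: names removed, demographics kept.
medical = [
    {"zip": "02138", "gender": "M", "dob": "1950-02-10", "diagnosis": "A"},
    {"zip": "02139", "gender": "F", "dob": "1972-11-30", "diagnosis": "B"},
    {"zip": "02139", "gender": "F", "dob": "1980-01-05", "diagnosis": "C"},
]

# Hypothetical voter roll: names present, same demographic fields.
voters = [
    {"name": "Voter One", "zip": "02138", "gender": "M", "dob": "1950-02-10"},
    {"name": "Voter Two", "zip": "02139", "gender": "F", "dob": "1972-11-30"},
    {"name": "Voter Three", "zip": "02139", "gender": "F", "dob": "1972-11-30"},
]

QUASI_IDENTIFIERS = ("zip", "gender", "dob")

def key(row):
    """Project a row onto the fields shared by both datasets."""
    return tuple(row[field] for field in QUASI_IDENTIFIERS)

def link(medical_rows, voter_rows):
    """Return (name, diagnosis) pairs where exactly one voter shares the key."""
    names_by_key = {}
    for voter in voter_rows:
        names_by_key.setdefault(key(voter), []).append(voter["name"])
    matches = []
    for row in medical_rows:
        names = names_by_key.get(key(row), [])
        if len(names) == 1:  # an ambiguous or missing match identifies no one
            matches.append((names[0], row["diagnosis"]))
    return matches

print(link(medical, voters))  # [('Voter One', 'A')]
```

In this toy data the second medical row matches two voters and the third matches none, so only the first is re-identified. The more often a combination of fields is unique in the population, the more rows fall into that first case.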
The finding became one of the most cited results in data privacy. It showed that stripping names and addresses from a dataset does not make it anonymous, and this line of re-identification work helped motivate formal privacy models such as k-anonymity and, later, differential privacy.