Poisoning Web-Scale Training Datasets Is Practical

“Poisoning Web-Scale Training Datasets is Practical” was submitted to arXiv on February 20, 2023 by Nicholas Carlini, Matthew Jagielski, Christopher A. Choquette-Choo, Daniel Paleka, Will Pearce, Hyrum Anderson, Andreas Terzis, Kurt Thomas, and Florian Tramèr. Data poisoning is the deliberate insertion of crafted examples into a training set so that a model learns attacker-chosen behavior. It had long been studied, but largely as a theoretical threat. This paper showed that the way modern models are trained, on enormous datasets crawled from the open web, makes poisoning cheap and realistic in practice.

The authors introduced two attacks. “Split-view poisoning” exploits the fact that web-scale datasets such as LAION-400M distribute lists of URLs rather than the images themselves, so the content a URL served when the dataset was indexed can differ from the content it serves later, when a victim downloads it. By buying expired domains that those URLs once pointed to, an attacker controls what future downloaders receive. The authors estimated that an adversary could have poisoned 0.01% of LAION-400M or COYO-700M for roughly 60 US dollars. “Frontrunning poisoning” targets datasets built from periodic snapshots of crowd-sourced sources like Wikipedia: the attacker edits a page just before the snapshot captures it, so the malicious edit ends up in the dataset even if moderators revert it shortly afterwards.
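The frontrunning attack reduces to a timing condition: the snapshot must read the page after the malicious edit is saved but before it is reverted. The sketch below is a simplified model, not the paper's code. It assumes the attacker can predict when a given page will be captured and that a malicious edit stays live for a fixed interval before a moderator reverts it. The names `Edit`, `captured_in_snapshot`, and `frontrun`, and all times (in minutes), are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """A hypothetical malicious edit to one crowd-sourced page (times in minutes)."""
    edit_time: float     # when the attacker saves the edit
    revert_delay: float  # how long until a moderator reverts it

    @property
    def revert_time(self) -> float:
        return self.edit_time + self.revert_delay


def captured_in_snapshot(edit: Edit, snapshot_time: float) -> bool:
    """True if the snapshot reads the page while the malicious edit is live."""
    return edit.edit_time <= snapshot_time < edit.revert_time


def frontrun(predicted_snapshot: float, lead: float, revert_delay: float) -> Edit:
    """Schedule the edit `lead` minutes before the predicted snapshot time."""
    return Edit(edit_time=predicted_snapshot - lead, revert_delay=revert_delay)


if __name__ == "__main__":
    snapshot = 600.0
    # Edit saved one minute before capture; moderators need five minutes to revert.
    timed = frontrun(snapshot, lead=1.0, revert_delay=5.0)
    # The same edit made an hour early is reverted long before capture.
    early = Edit(edit_time=snapshot - 60.0, revert_delay=5.0)
    print(captured_in_snapshot(timed, snapshot))  # True
    print(captured_in_snapshot(early, snapshot))  # False
```

The model makes the attacker's key requirement explicit: a predictable capture time. If that time cannot be predicted, an edit that survives only a few minutes is unlikely to overlap it.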

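The authors followed responsible-disclosure practice. They notified the maintainers of the affected datasets, did not actually poison anyone’s training run, and proposed low-cost defenses. These included integrity checks, such as publishing a cryptographic hash of each item so that downloaders can detect content that has changed since indexing, and randomized snapshot timing, so that attackers cannot predict when a page will be captured.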
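The hashing defense against split-view poisoning can be sketched as follows. The sketch assumes a dataset index that records a SHA-256 digest of each item's bytes at indexing time; this is a hypothetical format, not LAION's actual schema. Downloaders keep an item only if the bytes they fetch hash to the recorded value, so content swapped in behind a re-registered expired domain is discarded. `IndexEntry`, `make_entry`, `verify`, and `filter_downloads` are illustrative names, and a dictionary stands in for real HTTP fetching.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One row of a hypothetical URL-list dataset index."""
    url: str
    caption: str
    sha256: str  # hex digest of the bytes seen when the dataset was indexed


def make_entry(url: str, caption: str, content: bytes) -> IndexEntry:
    """Build an index row, recording the SHA-256 of the content seen at indexing time."""
    return IndexEntry(url, caption, hashlib.sha256(content).hexdigest())


def verify(entry: IndexEntry, downloaded: bytes) -> bool:
    """Return True only if the downloaded bytes match the recorded digest."""
    return hashlib.sha256(downloaded).hexdigest() == entry.sha256


def filter_downloads(
    entries: list[IndexEntry], fetched: dict[str, bytes]
) -> list[tuple[str, bytes]]:
    """Keep (caption, bytes) pairs whose content still matches its recorded hash.

    `fetched` maps URL -> downloaded bytes and stands in for a real HTTP client;
    URLs that failed to download are simply absent.
    """
    kept = []
    for entry in entries:
        data = fetched.get(entry.url)
        if data is not None and verify(entry, data):
            kept.append((entry.caption, data))
    return kept


if __name__ == "__main__":
    index = [
        make_entry("https://example.com/cat.jpg", "a cat", b"original cat bytes"),
        make_entry("https://example.org/dog.jpg", "a dog", b"original dog bytes"),
    ]
    # Simulated re-download: the second domain has since expired and been
    # re-registered, so it now serves different bytes under the same URL.
    fetched = {
        "https://example.com/cat.jpg": b"original cat bytes",
        "https://example.org/dog.jpg": b"attacker-chosen bytes",
    }
    for caption, _ in filter_downloads(index, fetched):
        print(caption)  # prints only "a cat"
```

Because an exact hash rejects any change, benign ones such as a re-encoded or resized image are dropped too. The check trades some lost data for the guarantee that every kept item matches what the dataset's creators saw.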

The paper turned data poisoning from a theoretical concern into a demonstrated, low-cost, real-world supply-chain risk for any organization that trains on uncurated web data, and it is a standard citation in discussions of training-data integrity.
