Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models

Nightshade is an offensive companion to Glaze, described in a paper posted to arXiv on October 20, 2023, by Shawn Shan, Wenxin Ding, Josephine Passananti, Stanley Wu, Haitao Zheng, and Ben Y. Zhao of the University of Chicago. Where Glaze defends a single artist’s style, Nightshade is designed as a collective deterrent against companies that scrape images from the web without permission.

The technique is a prompt-specific data-poisoning attack. The authors craft poison images that look visually identical to ordinary pictures but, when ingested into a model’s training set, corrupt how the model responds to a target concept. The paper reports that fewer than 100 poison samples can corrupt a single prompt in Stable Diffusion SDXL (for example, making “dog” generate cats) and that the effect “bleeds through” to related concepts. When enough prompts are poisoned independently, the model can lose its ability to generate coherent images at all. The authors propose Nightshade as “a last defense for content creators against web scrapers” that ignore opt-out signals.
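To make the mechanism concrete, here is a deliberately simplified toy in Python. It is not the authors' method. The "model" only averages two-dimensional feature vectors per caption, and every name and number (`DOG`, `CAT`, `train_centroids`, `generate`, `build_dataset`, the sample counts) is a hypothetical chosen for illustration. It shows only the core idea that training samples captioned with one concept but carrying another concept's features pull the learned concept toward the wrong target. It does not model how real poison images are crafted to look ordinary, so in this toy the poison must outnumber the clean samples for the concept.

```python
from statistics import fmean

# Hypothetical 2-D "feature" prototypes; real models use high-dimensional latents.
DOG = (1.0, 0.0)
CAT = (0.0, 1.0)


def train_centroids(samples):
    """Toy 'training': average the feature vectors seen under each caption."""
    by_caption = {}
    for caption, features in samples:
        by_caption.setdefault(caption, []).append(features)
    return {
        caption: tuple(fmean(axis) for axis in zip(*vectors))
        for caption, vectors in by_caption.items()
    }


def generate(centroids, prompt):
    """Toy 'generation': report which prototype the learned centroid is closest to."""
    x, y = centroids[prompt]
    dist_dog = (x - DOG[0]) ** 2 + (y - DOG[1]) ** 2
    dist_cat = (x - CAT[0]) ** 2 + (y - CAT[1]) ** 2
    return "dog-like" if dist_dog <= dist_cat else "cat-like"


def build_dataset(clean_dogs, clean_cats, poison):
    """Assemble clean samples plus poison samples (illustrative counts only)."""
    clean = [("dog", DOG)] * clean_dogs + [("cat", CAT)] * clean_cats
    # Poison: captioned "dog" but carrying cat features.
    poisoned = [("dog", CAT)] * poison
    return clean + poisoned


clean_model = train_centroids(build_dataset(60, 60, 0))
poisoned_model = train_centroids(build_dataset(60, 60, 90))

print(generate(clean_model, "dog"))     # dog-like
print(generate(poisoned_model, "dog"))  # cat-like
print(generate(poisoned_model, "cat"))  # cat-like (the "cat" prompt itself is untouched)
```

The attack in this sketch is prompt-specific in the same sense as the text above: only the “dog” caption is corrupted, while “cat” still behaves normally.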

The asymmetry is the point: a small number of poisoned images can impose a large cleanup cost on anyone training on uncurated web data, shifting the economics in favor of licensing or honoring opt-outs.

Why business readers should care: Nightshade turned the consent debate from polite requests into a credible threat, showing that ignoring creators’ wishes carries a real technical risk for anyone building models on scraped data.

