Universal Adversarial Perturbations

“Universal adversarial perturbations” was submitted to arXiv on October 26, 2016 by Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard, and appeared at CVPR 2017. Earlier adversarial-example work crafted a separate perturbation for each input image. This paper asked a sharper question: could a single, fixed perturbation fool a network across many different images at once?

The answer was yes. The authors demonstrated the existence of a universal, image-agnostic perturbation vector, small enough to be nearly imperceptible, that causes a state-of-the-art classifier to misclassify the large majority of natural images simply by adding the same pattern to each one. The same universal perturbation also generalized across different network architectures, echoing the transferability seen in single-image attacks.
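
The claim can be stated formally. The notation here is assumed for illustration and does not appear in the text above: let \hat{k} be the classifier's label function, \mu the distribution of natural images, \xi a bound on the size of the perturbation, and \delta the tolerated fraction of images that are not fooled. A universal perturbation is then a vector v satisfying:

```latex
\|v\|_p \le \xi
\quad\text{and}\quad
\Pr_{x \sim \mu}\bigl[\hat{k}(x + v) \ne \hat{k}(x)\bigr] \ge 1 - \delta .
```

The first condition keeps the perturbation small; the second requires it to change the predicted label for most images drawn from \mu.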

The finding mattered because it revealed structure in how these models fail. As the authors put it, the existence of universal perturbations points to “geometric correlations among the high-dimensional decision boundary of classifiers.” The vulnerability was not scattered randomly across input space but concentrated along shared directions that one fixed perturbation could exploit everywhere.

Practically, a universal perturbation lowers the cost of an attack: rather than recomputing a perturbation per input, an adversary can prepare one pattern in advance and reuse it. The result deepened concern about the systematic, rather than incidental, fragility of deep vision models and fed directly into later work on robustness and physical-world attacks.
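
The reuse described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' method: the linear `predict` model, the random images, the helper names `project_linf` and `fooling_rate`, and the bound `xi=0.05` are all hypothetical, and the perturbation `v` is a random placeholder rather than one computed by the paper's algorithm. The sketch only shows the mechanics of adding one fixed pattern to every image and measuring how often the predicted label changes.

```python
import numpy as np

def project_linf(v, xi):
    """Clip v so that its l-infinity norm is at most xi."""
    return np.clip(v, -xi, xi)

def fooling_rate(predict, images, v):
    """Fraction of images whose predicted label changes when v is added."""
    clean = predict(images)
    # Keep perturbed pixels in the valid [0, 1] range.
    perturbed = predict(np.clip(images + v, 0.0, 1.0))
    return float(np.mean(clean != perturbed))

# Hypothetical stand-in for a classifier: a random linear model
# over flattened pixels (assumption, for illustration only).
rng = np.random.default_rng(0)
n_images, n_pixels, n_classes = 200, 64, 10
W = rng.normal(size=(n_pixels, n_classes))

def predict(x):
    return np.argmax(x @ W, axis=1)

# Hypothetical "images": random pixel values in [0, 1).
images = rng.uniform(0.0, 1.0, size=(n_images, n_pixels))

# One fixed pattern, prepared once and reused for every image.
# A random placeholder here, NOT a trained universal perturbation.
v = project_linf(rng.normal(size=n_pixels), xi=0.05)

rate = fooling_rate(predict, images, v)
print(f"fooling rate: {rate:.2f}")
```

A random pattern like this one would not be expected to fool a real network on most images; the point of the paper is that a carefully constructed `v` of similarly small size can.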