One Pixel Attack for Fooling Deep Neural Networks

“One Pixel Attack for Fooling Deep Neural Networks” was submitted to arXiv on October 24, 2017 by Jiawei Su, Danilo Vasconcellos Vargas, and Kouichi Sakurai at Kyushu University. It pushed adversarial perturbations to an extreme: instead of spreading a small change across the whole image, it changes just one pixel.

The attack uses differential evolution, an evolutionary optimization method, to search for the single pixel and color value that, when altered, flips the classifier’s decision. It is a black-box attack in the sense that it needs only the model’s output probabilities, not its gradients or internal structure; the paper describes this as a semi-black-box setting, since hard labels alone would not be enough. The authors reported that modifying one pixel fooled the classifier on roughly 68 percent of the CIFAR-10 test images they tried, and on about 16 percent of the ImageNet images, often while the network reported high confidence in its wrong answer.
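The search described above can be illustrated with a small sketch. It is not the authors' implementation: the "model" is a hypothetical two-class toy function (`predict_proba`) rather than a neural network, the image is a uniform gray 16x16 grid, and the population size, generation count, and scale factor are illustrative choices. What it does share with the described attack is the setup: each candidate encodes one pixel as (x, y, r, g, b), the search sees only output probabilities, and a simple mutation-only form of differential evolution keeps a child whenever it lowers the probability of the true class.

```python
import math
import random

SIZE = 16   # toy image is SIZE x SIZE pixels, RGB values in [0, 1]
BASE = 0.2  # gray level of every pixel in the toy input

# Hypothetical sensitivity map: pixels near (8, 8) influence the toy model most.
WEIGHTS = [[math.exp(-((x - 8) ** 2 + (y - 8) ** 2) / 32.0) for x in range(SIZE)]
           for y in range(SIZE)]
# Bias chosen so the unmodified gray image gets logit -0.5, i.e. class 0.
BIAS = sum(3 * BASE * w for row in WEIGHTS for w in row) + 0.5


def predict_proba(image):
    """Toy stand-in for a black-box model: returns [p(class 0), p(class 1)]."""
    logit = -BIAS
    for y in range(SIZE):
        for x in range(SIZE):
            logit += WEIGHTS[y][x] * sum(image[y][x])
    p1 = 1.0 / (1.0 + math.exp(-logit))
    return [1.0 - p1, p1]


def apply_pixel(image, cand):
    """Return a copy of image with one pixel overwritten from (x, y, r, g, b)."""
    x = min(SIZE - 1, max(0, round(cand[0])))
    y = min(SIZE - 1, max(0, round(cand[1])))
    rgb = tuple(min(1.0, max(0.0, v)) for v in cand[2:5])
    out = [row[:] for row in image]  # pixels are immutable tuples, so row copies suffice
    out[y][x] = rgb
    return out


def one_pixel_attack(image, true_label, pop_size=50, generations=30, f=0.5, seed=0):
    """Mutation-only differential evolution over a single-pixel change."""
    rng = random.Random(seed)
    bounds = [(0, SIZE - 1), (0, SIZE - 1), (0, 1), (0, 1), (0, 1)]

    def fitness(cand):
        # Lower is better: probability the model still gives the true class.
        return predict_proba(apply_pixel(image, cand))[true_label]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [fitness(c) for c in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample(pop, 3)
            child = [a[k] + f * (b[k] - c[k]) for k in range(5)]
            score = fitness(child)
            if score < scores[i]:  # child replaces its parent only if it is better
                pop[i], scores[i] = child, score
    best = min(range(pop_size), key=scores.__getitem__)
    return apply_pixel(image, pop[best]), scores[best]


original = [[(BASE, BASE, BASE)] * SIZE for _ in range(SIZE)]
adversarial, p_true = one_pixel_attack(original, true_label=0)
changed = sum(original[y][x] != adversarial[y][x]
              for y in range(SIZE) for x in range(SIZE))
print("class-0 probability before:", predict_proba(original)[0])
print("class-0 probability after: ", p_true)
print("pixels changed:", changed)
```

The toy model is deliberately easy to fool, so the sketch says nothing about success rates on real networks. Its purpose is only to show that the search never touches gradients: the sole feedback it uses is the probability vector returned by `predict_proba`.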

The result is striking because a one-pixel change is easy for a person to overlook, yet it is enough to change the model’s prediction. It illustrates how the decision boundaries learned by deep networks can be brittle in unexpected, low-dimensional ways, and it showed that evolutionary search is a viable tool for probing those weaknesses without inside knowledge of the model.

For a business reader, the one-pixel attack is a vivid reminder that an AI vision system’s judgment can hinge on details no human would notice. A change too small to register to the eye can be enough to mislead the system, which is a serious consideration anywhere image classification feeds into safety or security decisions.
