Certified Adversarial Robustness via Randomized Smoothing

“Certified Adversarial Robustness via Randomized Smoothing” was submitted to arXiv on February 8, 2019 by Jeremy M. Cohen, Elan Rosenfeld, and J. Zico Kolter at Carnegie Mellon University, and presented at ICML 2019. It addressed a gap in adversarial defenses: most defenses are empirical, meaning they resist the attacks tried so far but offer no guarantee, while existing certified defenses that did offer guarantees did not scale to large images.

The method, randomized smoothing, turns any base classifier into a “smoothed” classifier whose prediction at an input is the class the base classifier returns most often when the input is perturbed with random Gaussian noise. The authors proved a tight bound: if the top class wins that noisy vote by a large enough margin, the smoothed prediction is guaranteed not to change under any L2 perturbation smaller than a computed radius. The radius grows with the noise level and with the margin. Because the vote is estimated by sampling, the certificate in practice holds with high probability rather than with absolute certainty. This converts robustness from a hope into a certificate backed by a proof.
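The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' released code: the function names `sample_votes` and `certify`, the toy classifier, and the default sample sizes are all hypothetical. It also swaps in a simple Hoeffding lower confidence bound where the paper uses a Clopper-Pearson bound, so the radius it reports is more conservative. The radius itself follows the paper's form, sigma times the inverse standard normal CDF of the lower bound on the top-class probability.

```python
import math
import random
from collections import Counter
from statistics import NormalDist


def sample_votes(base_classifier, x, sigma, n, rng):
    """Count the base classifier's predictions on n Gaussian-noised copies of x."""
    votes = Counter()
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[base_classifier(noisy)] += 1
    return votes


def certify(base_classifier, x, sigma, n0=100, n=1000, alpha=0.001, seed=0):
    """Return (label, radius), or (None, 0.0) to abstain.

    Illustrative sketch: names and defaults are assumptions, and the
    Hoeffding bound below stands in for the paper's Clopper-Pearson bound.
    """
    rng = random.Random(seed)

    # Step 1: guess the top class from a small batch of noisy samples.
    guess = sample_votes(base_classifier, x, sigma, n0, rng).most_common(1)[0][0]

    # Step 2: estimate how often that class wins, using fresh samples.
    k = sample_votes(base_classifier, x, sigma, n, rng)[guess]

    # One-sided Hoeffding lower bound on the top-class probability,
    # valid with probability at least 1 - alpha.
    p_lower = k / n - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))

    if p_lower <= 0.5:
        return None, 0.0  # not confident enough to certify anything

    # Certified L2 radius: sigma * Phi^{-1}(p_lower).
    radius = sigma * NormalDist().inv_cdf(p_lower)
    return guess, radius


# Toy base classifier (hypothetical): class 1 if the first coordinate is positive.
def toy_classifier(v):
    return int(v[0] > 0)


label, radius = certify(toy_classifier, [1.0, 0.0], sigma=0.5)
print(label, round(radius, 3))
```

In this toy case the input sits at distance 1.0 from the decision boundary, so the reported radius should come out positive but below 1.0. The gap reflects the finite sample size and the loose confidence bound, not a weakness in the underlying theorem.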

The result was significant because it worked at ImageNet scale. The paper reported a certified top-1 accuracy of 49 percent against L2 perturbations of norm less than 0.5 on ImageNet, where 0.5 corresponds to 127/255 with pixel values scaled to the range 0 to 1. According to the authors, smoothing was at that point the only certified defense that had been shown feasible at that size. The authors released their code.
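To make the headline metric concrete: certified accuracy at a given radius is the fraction of test examples that the smoothed classifier both labels correctly and certifies to at least that radius. The sketch below assumes a hypothetical list of per-example results; the function name `certified_accuracy` and the sample data are invented for illustration and are not figures from the paper.

```python
def certified_accuracy(results, radius):
    """Fraction of examples predicted correctly with certified radius >= radius.

    `results` is a list of (predicted_label, certified_radius, true_label)
    tuples; predicted_label is None when the classifier abstained.
    """
    hits = sum(
        1
        for predicted, certified, truth in results
        if predicted is not None and predicted == truth and certified >= radius
    )
    return hits / len(results)


# Hypothetical results for four test examples (not data from the paper).
example_results = [
    (1, 0.8, 1),     # correct, certified to 0.8
    (0, 0.3, 0),     # correct, certified only to 0.3
    (2, 0.9, 1),     # certified but wrong
    (None, 0.0, 2),  # abstained
]

print(certified_accuracy(example_results, 0.5))  # 0.25
print(certified_accuracy(example_results, 0.0))  # 0.5
```

Abstentions and wrong predictions both count against the score, so certified accuracy can only fall as the required radius grows.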

For a business reader, the distinction matters: an empirical defense says “we have not found an attack yet,” while a certified defense says “we can prove that no attack within this radius can change the prediction.” For high-stakes deployments, that mathematical guarantee, even if it covers only a limited perturbation size, is a fundamentally stronger form of assurance.
