BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain

“BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain” was submitted to arXiv on August 22, 2017 by Tianyu Gu, Brendan Dolan-Gavitt, and Siddharth Garg at NYU. It introduced the idea of a backdoored neural network, which the authors called a BadNet: a model that performs its intended task well but contains a hidden trigger planted by whoever trained it.

The attack works by poisoning the training data so that the model learns a secret rule: classify normally on ordinary inputs, but produce an attacker-chosen output whenever a specific trigger pattern is present. The authors demonstrated a backdoored handwritten-digit (MNIST) classifier and, more vividly, a U.S. street-sign detector that recognized stop signs correctly until a small sticker was added, at which point it classified them as speed-limit signs. Because accuracy on clean inputs stays high, ordinary validation testing gives little sign that a backdoor is present.
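To make the "secret rule" concrete, here is a minimal Python sketch of trigger-based data poisoning, written under explicit assumptions: images are small grayscale grids, the trigger is a solid 3×3 square in the bottom-right corner, and a random fraction of training examples is copied, stamped, and relabeled to an attacker-chosen class. All names (`stamp_trigger`, `poison_dataset`, `attack_success_rate`, `toy_backdoored_model`) are hypothetical illustrations, not code from the paper. Real training is omitted; a hand-written stand-in model mimics what a successfully backdoored network would have learned.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a grayscale image as rows of intensities in [0.0, 1.0].
Image = List[List[float]]
Example = Tuple[Image, int]  # (image, class label)

TRIGGER_SIZE = 3  # assumed side length of the square trigger patch


def stamp_trigger(img: Image, size: int = TRIGGER_SIZE, value: float = 1.0) -> Image:
    """Return a copy of img with a solid square patch in the bottom-right corner."""
    out = [row[:] for row in img]
    h, w = len(out), len(out[0])
    for r in range(h - size, h):
        for c in range(w - size, w):
            out[r][c] = value
    return out


def has_trigger(img: Image, size: int = TRIGGER_SIZE, value: float = 1.0) -> bool:
    """True if the bottom-right size x size patch is filled with the trigger value."""
    h, w = len(img), len(img[0])
    return all(img[r][c] == value for r in range(h - size, h) for c in range(w - size, w))


def poison_dataset(data: Sequence[Example], target_label: int,
                   fraction: float, seed: int = 0) -> List[Example]:
    """Keep every clean example and append triggered copies relabeled to target_label.

    The clean examples preserve the intended task; the extra copies teach
    the rule "trigger present -> target_label".
    """
    rng = random.Random(seed)
    chosen = rng.sample(range(len(data)), int(len(data) * fraction))
    return list(data) + [(stamp_trigger(data[i][0]), target_label) for i in chosen]


def accuracy(predict: Callable[[Image], int], data: Sequence[Example]) -> float:
    """Ordinary test accuracy: fraction of predictions that match the labels."""
    return sum(predict(img) == label for img, label in data) / len(data)


def attack_success_rate(predict: Callable[[Image], int],
                        data: Sequence[Example], target_label: int) -> float:
    """Fraction of non-target examples sent to target_label once the trigger is stamped."""
    victims = [img for img, label in data if label != target_label]
    return sum(predict(stamp_trigger(img)) == target_label for img in victims) / len(victims)


def toy_backdoored_model(img: Image) -> int:
    """Hand-written stand-in for a trained BadNet (no real training happens here).

    Trigger present -> class 7; otherwise a trivial "intended task":
    images whose top-left pixel exceeds 0.25 are class 1, the rest class 0.
    """
    if has_trigger(img):
        return 7
    return 1 if img[0][0] > 0.25 else 0


if __name__ == "__main__":
    dark = [[0.0] * 4 for _ in range(4)]
    gray = [[0.5] * 4 for _ in range(4)]
    clean = [(dark, 0), (gray, 1)] * 5  # 10 clean test examples

    print(len(poison_dataset(clean, target_label=7, fraction=0.2)))  # 12
    print(accuracy(toy_backdoored_model, clean))                     # 1.0
    print(attack_success_rate(toy_backdoored_model, clean, 7))       # 1.0
```

In the demo, clean accuracy and attack success rate are both 1.0. The clean-accuracy check alone cannot distinguish this model from an honest one, and the backdoor only appears when the evaluation stamps the exact trigger, which a defender usually does not know. Appending poisoned copies rather than replacing clean examples is one design choice among several; the fraction, trigger shape, and target class here are all illustrative.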

The paper’s framing was about the machine-learning supply chain. Many organizations do not train models from scratch; they outsource training to cloud services or download pre-trained models. The authors showed that each of those hand-offs is an opportunity to insert a backdoor, and that a backdoor can even survive when the model is later retrained for a new task through transfer learning.

For a business reader, BadNets is a warning about trust and provenance. When you adopt a model you did not train yourself, you inherit whatever was put into it, and standard accuracy checks are unlikely to reveal a hidden trigger. It is one of the founding works that pushed teams to think about verifying and inspecting models, not just measuring their accuracy.