Trojaning Attack on Neural Networks

“Trojaning Attack on Neural Networks” was presented at the Network and Distributed System Security Symposium (NDSS), held February 18-21, 2018. Its authors are Yingqi Liu, Shiqing Ma, Yousra Aafer, Wen-Chuan Lee, Juan Zhai, Weihang Wang, and Xiangyu Zhang of Purdue University and Nanjing University. The paper extended backdoor research to a more realistic threat model.

The key difference from earlier backdoor work is that this attack does not require the original training data. The authors start from an already-trained, publicly shared model. They analyze the network to design a trojan trigger that strongly activates chosen internal neurons, generate a small amount of synthetic data that reproduces the model’s normal behavior, and then retrain the model so that any input stamped with the trigger produces the attacker’s chosen output. The process takes only minutes to hours and leaves accuracy on ordinary, unstamped inputs nearly unchanged.
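
The trigger-design step can be illustrated with a toy sketch. This is not the authors' code: it assumes a single hypothetical neuron (a tanh of a weighted sum) standing in for a real network, and the names `neuron_activation` and `generate_trigger`, the weights, the mask, and the target value are all invented for illustration. The idea shown is gradient descent restricted to the masked input positions, so that the chosen neuron's activation moves toward a high target value.

```python
import math

def neuron_activation(weights, bias, x):
    """Toy stand-in for an internal neuron: tanh of a weighted sum."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return math.tanh(z)

def generate_trigger(weights, bias, x, mask, target=0.99, lr=0.5, steps=200):
    """Adjust only the masked entries of x so the neuron approaches `target`.

    Minimizes (activation - target)^2 by gradient descent; values are
    clipped to [0, 1] to stay in a valid "pixel" range.
    """
    x = list(x)
    for _ in range(steps):
        a = neuron_activation(weights, bias, x)
        # d/dx_i of (a - target)^2 = 2 * (a - target) * (1 - a^2) * w_i
        scale = 2.0 * (a - target) * (1.0 - a * a)
        for i, in_mask in enumerate(mask):
            if in_mask:
                x[i] = min(1.0, max(0.0, x[i] - lr * scale * weights[i]))
    return x

# Hypothetical weights and inputs, chosen only for the demonstration.
weights = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.5, -0.3]
bias = 0.0
benign = [0.5] * 8
mask = [0, 0, 0, 0, 0, 0, 1, 1]  # the trigger occupies the last two positions

stamped = generate_trigger(weights, bias, benign, mask)
before = neuron_activation(weights, bias, benign)
after = neuron_activation(weights, bias, stamped)
print(f"activation before: {before:.3f}, after: {after:.3f}")
```

In this toy setup the activation on the stamped input ends up well above the activation on the benign one, while the unmasked positions stay untouched. In a real network the gradient would come from backpropagation through the layers below the selected neuron, and the retraining step would follow.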

This matters because of how models are distributed. The paper describes a scenario where a face-recognition or self-driving model is downloaded, trojaned, and re-published; users who adopt the tampered version inherit the hidden behavior, which is encoded invisibly in the network’s weights. The authors tested the attack across five applications and discussed possible defenses.

For a business reader, the Trojaning attack reinforces a supply-chain lesson: a shared model can be silently modified by anyone who can re-host it, even without the data it was originally trained on. Provenance, integrity checks, and behavioral testing against suspected triggers become part of responsibly reusing third-party models.
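
As a hedged sketch of two of these checks, the Python below compares a downloaded file's SHA-256 digest with one published by the original author, and measures how often stamping a suspected trigger changes a model's prediction. The helper names, the `toy_predict` classifier with its planted backdoor, and the sample inputs are hypothetical; a real check would call the actual model and use triggers the reviewer suspects.

```python
import hashlib
import hmac
import os
import tempfile

def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_digest(path, expected_hex):
    """Compare against a digest obtained from a trusted channel."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())

def stamp(x, trigger, mask):
    """Overwrite the masked positions of x with the trigger values."""
    return [t if m else xi for xi, t, m in zip(x, trigger, mask)]

def trigger_flip_rate(predict, inputs, trigger, mask):
    """Fraction of inputs whose predicted label changes once stamped."""
    changed = sum(1 for x in inputs if predict(stamp(x, trigger, mask)) != predict(x))
    return changed / len(inputs)

def toy_predict(x):
    """Hypothetical classifier with a planted backdoor on the last feature."""
    if x[-1] >= 0.99:
        return 7  # attacker-chosen label
    return int(sum(x[:-1]) > 1.0)

# Integrity check on a stand-in "model file".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"pretend model weights")
    model_path = tmp.name
published = hashlib.sha256(b"pretend model weights").hexdigest()
digest_ok = matches_published_digest(model_path, published)
with open(model_path, "ab") as f:
    f.write(b"!")  # simulate tampering after publication
digest_after_tamper = matches_published_digest(model_path, published)
os.remove(model_path)

# Behavioral check with one suspected trigger and one harmless patch.
inputs = [[0.1, 0.2, 0.3, 0.0], [0.9, 0.8, 0.1, 0.0], [0.5, 0.4, 0.3, 0.2]]
mask = [0, 0, 0, 1]
suspect_rate = trigger_flip_rate(toy_predict, inputs, [0, 0, 0, 1.0], mask)
harmless_rate = trigger_flip_rate(toy_predict, inputs, [0, 0, 0, 0.5], mask)
print(digest_ok, digest_after_tamper, suspect_rate, harmless_rate)
```

A digest check only helps when the reference digest comes from a channel the re-hoster cannot alter, and a behavioral test can only flag triggers the tester thinks to try, so the two checks complement rather than replace each other.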
