Stealing Machine Learning Models via Prediction APIs

“Stealing Machine Learning Models via Prediction APIs” was submitted to arXiv on September 9, 2016 by Florian Tramèr, Fan Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart, and presented at USENIX Security 2016. It introduced the now-standard threat of model extraction: an attacker with only black-box query access to a deployed model (the kind of access any paying customer of a machine-learning-as-a-service product has) can reconstruct a close copy of the model itself.

The attack exploits ordinary product features. Prediction APIs typically return not just a label but a confidence score, and they often accept partial inputs. The authors showed that by sending a modest number of carefully chosen queries and observing the responses, an adversary can solve for the model’s parameters. They reported near-perfect-fidelity extraction for logistic regression, neural networks, and decision trees, and demonstrated the attacks against the online services of BigML and Amazon Machine Learning. Omitting confidence scores from the output did not prevent extraction: the authors showed that potentially harmful attacks remain possible with labels alone.
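To make "solving for the model's parameters" concrete, here is a minimal sketch of an equation-solving attack on a binary logistic regression model. It is an illustration under stated assumptions, not the authors' code: the function names (`make_black_box`, `extract_logistic_regression`) and the secret weights are hypothetical, and the "API" is a local stand-in that returns an exact, unrounded confidence score. Because the logit of the returned confidence is linear in the input, d + 1 queries on a d-feature model yield d + 1 linear equations in the unknown weights and bias.

```python
import numpy as np


def make_black_box(weights, bias):
    """Hypothetical stand-in for a prediction API that returns a confidence score."""
    def predict_confidence(x):
        return 1.0 / (1.0 + np.exp(-(np.dot(weights, x) + bias)))
    return predict_confidence


def extract_logistic_regression(query, n_features, rng):
    """Recover (weights, bias) from n_features + 1 confidence queries.

    Assumes the API returns the exact sigmoid output; rounded or
    label-only outputs would require a different (costlier) attack.
    """
    # Random query points: almost surely linearly independent.
    X = rng.normal(size=(n_features + 1, n_features))
    p = np.array([query(x) for x in X])

    # Invert the sigmoid: logit(p) = w . x + b, which is linear in (w, b).
    logits = np.log(p / (1.0 - p))

    # Append a column of ones so the bias is solved alongside the weights.
    A = np.hstack([X, np.ones((n_features + 1, 1))])
    solution = np.linalg.solve(A, logits)
    return solution[:-1], solution[-1]


rng = np.random.default_rng(0)
secret_w = np.array([0.8, -1.5, 0.3])   # hypothetical "deployed" parameters
secret_b = 0.25
api = make_black_box(secret_w, secret_b)

stolen_w, stolen_b = extract_logistic_regression(api, n_features=3, rng=rng)
print("recovered weights:", stolen_w)
print("recovered bias:", stolen_b)
```

In this idealized setting four queries suffice for a three-feature model. Real services may round confidences or apply feature preprocessing, so a practical attack would likely need more queries and a least-squares fit rather than an exact solve.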

The consequences are both commercial and security-related. A stolen model undermines the intellectual property and pay-per-query business model of an ML service, lets an attacker avoid usage fees, and provides a local copy that can be probed offline to craft adversarial examples or to mount privacy attacks that recover information about the training data.

The paper established model extraction as a core category of attack against machine learning systems, sitting alongside evasion (adversarial examples), poisoning, and privacy attacks, and it remains the foundational reference for the threat.
