“Membership Inference Attacks against Machine Learning Models” was submitted to arXiv on October 18, 2016 by Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov at Cornell Tech, and presented at the 2017 IEEE Symposium on Security and Privacy. It defined and demonstrated the basic membership inference attack: given a data record and only black-box access to a trained model, an adversary tries to determine whether that record was part of the model’s training data.
The technique exploits a subtle leak. Models tend to behave slightly differently, often more confidently, on examples they were trained on than on examples they have never seen, a side effect related to overfitting. The authors trained “shadow models” to mimic the target’s behavior. Because the attacker knows exactly which records each shadow model was trained on, the shadow models’ outputs provide labeled examples for training an attack classifier, which then distinguishes members from non-members based on the target model’s output. They showed the attack worked against models trained through commercial machine-learning-as-a-service platforms, including those of Google and Amazon, and on sensitive data such as hospital discharge records.
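The shadow-model idea can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the synthetic two-class data, the deliberately overfit `KernelModel` stand-in, and the one-feature threshold used as the attack classifier (in place of a learned attack model) are all assumptions chosen to keep the example short. The attacker here only calls `predict_proba`, matching the black-box setting, and is assumed to know each record's label.

```python
import math
import random


def make_data(rng, n):
    """Synthetic records: two heavily overlapping 2-D Gaussian classes (an assumption)."""
    data = []
    for _ in range(n):
        label = rng.randrange(2)
        center = 1.0 if label == 1 else -1.0
        data.append(((rng.gauss(center, 2.0), rng.gauss(center, 2.0)), label))
    return data


class KernelModel:
    """Hypothetical overfit model: a narrow-kernel vote over memorized training points."""

    def __init__(self, train, bandwidth=0.3):
        self.train = train
        self.bandwidth = bandwidth

    def predict_proba(self, x):
        weights = [1e-12, 1e-12]  # tiny prior avoids division by zero far from the data
        for point, label in self.train:
            d2 = (x[0] - point[0]) ** 2 + (x[1] - point[1]) ** 2
            weights[label] += math.exp(-d2 / (2 * self.bandwidth ** 2))
        total = weights[0] + weights[1]
        return [w / total for w in weights]


def true_label_confidence(model, record):
    """Attack feature: the probability the model assigns to the record's own label."""
    x, label = record
    return model.predict_proba(x)[label]


def collect_attack_data(rng, n_shadows, n):
    """Train shadow models on known data and label their outputs as in (1) or out (0)."""
    rows = []
    for _ in range(n_shadows):
        members, non_members = make_data(rng, n), make_data(rng, n)
        shadow = KernelModel(members)
        rows += [(true_label_confidence(shadow, r), 1) for r in members]
        rows += [(true_label_confidence(shadow, r), 0) for r in non_members]
    return rows


def fit_threshold(rows):
    """Simplified attack classifier: the confidence cutoff that best separates in from out."""
    best_t, best_acc = 0.5, 0.0
    for i in range(101):
        t = i / 100
        acc = sum((conf >= t) == bool(is_member) for conf, is_member in rows) / len(rows)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def run_attack(seed=0, n=200):
    rng = random.Random(seed)
    threshold = fit_threshold(collect_attack_data(rng, n_shadows=4, n=n))
    # The target model is trained on data the attacker never sees directly.
    target_members, target_non_members = make_data(rng, n), make_data(rng, n)
    target = KernelModel(target_members)
    correct = sum(true_label_confidence(target, r) >= threshold for r in target_members)
    correct += sum(true_label_confidence(target, r) < threshold for r in target_non_members)
    return threshold, correct / (2 * n)


threshold, accuracy = run_attack()
print(f"threshold={threshold:.2f}  attack accuracy={accuracy:.2f}  (chance is 0.50)")
```

Because members and non-members are balanced in the evaluation, random guessing scores 0.5, so any accuracy above that reflects leakage from the stand-in model's memorization. A better-generalizing model would be expected to push the attack back toward chance.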
The privacy stakes are concrete. If a model was trained on people who share a medical condition, simply learning that a specific person’s record was in the training set can reveal that they have the condition, even without recovering any explicit attribute. Membership inference has since become a standard way to measure and audit the privacy leakage of a trained model.
The paper is one of the founding works of machine-learning privacy. It spurred work on defenses such as differentially private training and regularization, and it provided an experimental yardstick that much subsequent privacy research, including later training-data extraction from language models, builds upon.