Machine Unlearning (SISA Training)

This 2019 paper by Lucas Bourtoule, Varun Chandrasekaran, Christopher Choquette-Choo, Hengrui Jia, Adelin Travers, Baiwu Zhang, David Lie, and Nicolas Papernot tackled a problem created by privacy law: how do you make a trained model forget a person’s data after that person asks for it to be deleted? Once data is folded into a model’s parameters, deleting the original record does not remove its influence, and models are known to memorize training examples, so the obligation to delete does not stop at the database.

The naive solution, retraining the whole model from scratch every time someone requests deletion, is far too slow and expensive for large models and frequent requests. The paper’s contribution, SISA training, short for Sharded, Isolated, Sliced, and Aggregated, restructures training so unlearning is cheap. The data is split into disjoint shards, and a separate model is trained on each shard in isolation, so any one record influences only one model. Each shard is further divided into slices that are introduced incrementally, with a checkpoint of the model saved before each new slice is added. At prediction time the shard models’ outputs are aggregated, for example by majority vote. When a deletion request arrives, only the one shard model whose data contained that record needs to be retrained, and only from the checkpoint saved just before the record’s slice was introduced, rather than on the entire dataset. The authors reported substantial speedups over full retraining, for example “4.63x” on the Purchase dataset, while keeping accuracy close to that of a conventionally trained model.
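
The shard, slice, checkpoint, and aggregate steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors’ implementation: the names CentroidModel and SisaEnsemble are hypothetical, the learner is a toy nearest-centroid classifier on a single numeric feature, and records are assigned to shards and slices round-robin. Because the toy learner is purely additive, feeding it only the newest slice stands in for the incremental training a neural network would need.

```python
import copy
from collections import Counter


class CentroidModel:
    """Toy incremental learner (an assumption): running mean of a 1-D feature per label."""

    def __init__(self):
        self.sums = {}    # label -> sum of feature values seen so far
        self.counts = {}  # label -> number of examples seen so far

    def fit_slice(self, examples):
        for _rid, x, y in examples:
            self.sums[y] = self.sums.get(y, 0.0) + x
            self.counts[y] = self.counts.get(y, 0) + 1

    def predict(self, x):
        if not self.counts:
            return None  # this shard has no training data left
        # Pick the label whose centroid is closest to x.
        return min(self.counts, key=lambda y: abs(x - self.sums[y] / self.counts[y]))


class SisaEnsemble:
    def __init__(self, data, num_shards, num_slices):
        # data is a list of (record_id, feature, label) tuples.
        self.num_slices = num_slices
        self.shards = [[[] for _ in range(num_slices)] for _ in range(num_shards)]
        self.location = {}  # record_id -> (shard index, slice index)
        for i, (rid, x, y) in enumerate(data):
            s, r = i % num_shards, (i // num_shards) % num_slices
            self.shards[s][r].append((rid, x, y))
            self.location[rid] = (s, r)
        # checkpoints[s][r] is shard s's model BEFORE slice r is introduced;
        # checkpoints[s][-1] is the fully trained shard model.
        self.checkpoints = [[CentroidModel()] + [None] * num_slices
                            for _ in range(num_shards)]
        for s in range(num_shards):
            self._train_shard(s, 0)

    def _train_shard(self, s, start_slice):
        # Resume from the checkpoint saved just before start_slice.
        model = copy.deepcopy(self.checkpoints[s][start_slice])
        for r in range(start_slice, self.num_slices):
            model.fit_slice(self.shards[s][r])
            self.checkpoints[s][r + 1] = copy.deepcopy(model)
        return self.num_slices - start_slice  # number of slices retrained

    def unlearn(self, rid):
        # Drop the record, then retrain only its shard, only from its slice onward.
        s, r = self.location.pop(rid)
        self.shards[s][r] = [ex for ex in self.shards[s][r] if ex[0] != rid]
        return s, self._train_shard(s, r)

    def predict(self, x):
        # Aggregate the isolated shard models by majority vote.
        votes = [cp[-1].predict(x) for cp in self.checkpoints]
        votes = [v for v in votes if v is not None]
        return Counter(votes).most_common(1)[0][0] if votes else None


# Usage: 12 records, label "a" near 0.0 and label "b" near 10.0.
data = [(i, 10.0 * (i % 2), "b" if i % 2 else "a") for i in range(12)]
ens = SisaEnsemble(data, num_shards=3, num_slices=2)
shard, slices_retrained = ens.unlearn(9)  # record 9 sits in shard 0, slice 1
print(shard, slices_retrained, ens.predict(0.3))
```

In this sketch, deleting a record that landed in a late slice retrains a single slice of a single shard, while the other shard models are never touched. That is where the savings over full retraining come from, and it is also why the sharding and checkpointing have to be in place before the first deletion request arrives.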

The work helped define machine unlearning as a research area and connected it directly to the right to be forgotten under regulations like the GDPR.

For a business reader, this paper is the practical answer to a compliance question that is otherwise paralyzing: if a customer exercises a deletion right, can you actually remove their influence from a deployed model without rebuilding it from zero? SISA shows the answer can be yes, if you plan the training architecture for it in advance.
