PATE: Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data

This 2016 paper by Nicolas Papernot, Martín Abadi, Úlfar Erlingsson, Ian Goodfellow, and Kunal Talwar, published at ICLR 2017, introduced PATE, short for Private Aggregation of Teacher Ensembles, an alternative route to training models with a differential privacy guarantee. Where DP-SGD adds noise inside the training of a single model, PATE adds noise at the point where an ensemble of separately trained models pools its predictions.

The method splits the sensitive data into disjoint pieces and trains a separate teacher model on each piece. No teacher sees another’s data, and no individual record influences more than one teacher. To produce a model that can be released, the teachers vote on labels for a set of public, unlabeled examples. Random noise (Laplace noise in the paper) is added to the per-class vote counts and the label with the highest noisy count is returned, so that no single teacher, and therefore no single private record, determines the outcome. These noisy aggregate labels train a student model, and only the student is published. Because the student learns only from the noisy consensus of the teachers and never touches the raw private data, what it can reveal about any individual is bounded. Each answered query spends some privacy budget, which is why the student is trained semi-supervised, on a small number of labeled queries plus further unlabeled data. A useful property is that PATE is agnostic to the model: it treats the teachers as black boxes, so it works with any architecture.
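The aggregation step can be illustrated with a minimal sketch, not the authors' code. It assumes the noisy-max rule with Laplace noise of scale 1/gamma added to each class count, and uses only the Python standard library. The function names, the vote list, the teacher count, and the value of gamma are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def laplace(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two i.i.d. exponential variables with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_max_label(teacher_votes, num_classes, gamma, rng):
    """Return the class with the highest noisy vote count.

    teacher_votes: one predicted class index per teacher (hypothetical input).
    gamma: assumed privacy parameter; smaller gamma means more noise.
    """
    counts = Counter(teacher_votes)
    noisy_counts = [
        counts.get(c, 0) + laplace(1.0 / gamma, rng) for c in range(num_classes)
    ]
    return max(range(num_classes), key=lambda c: noisy_counts[c])


# Hypothetical example: 250 teachers, most of which agree on class 3.
rng = random.Random(0)
votes = [3] * 230 + [7] * 15 + [1] * 5
label = noisy_max_label(votes, num_classes=10, gamma=0.05, rng=rng)
print("label given to the student:", label)
```

When the teachers agree strongly, as in this made-up vote list, the noise rarely changes the winning class, which is the intuition behind the method keeping its accuracy. When the vote is close, the noise can flip the outcome, and that uncertainty is what protects any single teacher's training data.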

The paper reported competitive accuracy on the MNIST and SVHN benchmarks with strong, formally stated privacy guarantees, and the approach offered an intuitive story for non-experts: many independent experts, none of whom can dominate, agree on an answer.

For a business reader, PATE is attractive when sensitive data is naturally partitioned, for example across hospitals, branches, or business units, and you want a single shareable model without any party exposing its raw records or trusting the others with them.

Last verified June 7, 2026