Communication-Efficient Learning of Deep Networks from Decentralized Data (FedAvg)

This paper by H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas at Google, first posted in 2016 and published at AISTATS 2017, coined the term federated learning and introduced the Federated Averaging algorithm, usually shortened to FedAvg. It founded an entire subfield and is the basis for training models on data that never leaves users’ phones.

The standard approach to machine learning pools all the training data on central servers. FedAvg inverts that. In each round, the server sends the current model to a sample of client devices; each device improves the model using only its own local data; and the devices send back just the resulting model updates, never the raw data, which the server averages (weighting each device by how much data it holds) into a new shared model. Repeating this over many rounds produces a model trained on everyone’s data without anyone’s data being collected. The paper’s headline practical result was efficiency: by having each device run several local update steps before communicating, FedAvg reached good accuracy with “a reduction in required communication rounds by 10-100x” compared to synchronized stochastic gradient descent, and it remained robust to unbalanced and non-IID data, where different devices hold very different amounts and kinds of data that are not representative of the whole population.
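The round structure described above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch under stated assumptions, not the paper’s or Google’s code: it swaps the deep network for a toy one-dimensional linear model y ≈ w * x + b, uses plain minibatch SGD as the local optimizer, simulates every "device" in one process, and uses made-up function names (`local_update`, `fedavg_round`). The parameters `fraction`, `epochs`, and `batch_size` play the roles the paper calls C (fraction of clients per round), E (local epochs), and B (local minibatch size). The server weights each returned model by that client’s number of examples, normalized here over the clients sampled in that round.

```python
import random
from typing import List, Tuple

# Toy assumption: a "model" is just the parameter pair (w, b) of y ≈ w * x + b,
# and each client's dataset is a list of (x, y) pairs.
Weights = Tuple[float, float]
Dataset = List[Tuple[float, float]]


def local_update(weights: Weights, data: Dataset, epochs: int, lr: float,
                 batch_size: int, rng: random.Random) -> Weights:
    """Client side (hypothetical): run `epochs` passes of minibatch SGD on local data only."""
    w, b = weights
    data = list(data)  # copy so shuffling never reorders the client's own list
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of the mean squared error over this minibatch.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return (w, b)  # in this simulation, only the parameters are handed back


def fedavg_round(global_weights: Weights, clients: List[Dataset], fraction: float,
                 epochs: int, lr: float, batch_size: int,
                 rng: random.Random) -> Weights:
    """Server side (hypothetical): sample clients, train locally, average the results."""
    num_selected = max(1, round(fraction * len(clients)))
    selected = rng.sample(clients, num_selected)
    total_examples = sum(len(data) for data in selected)
    new_w = new_b = 0.0
    for data in selected:
        w, b = local_update(global_weights, data, epochs, lr, batch_size, rng)
        share = len(data) / total_examples  # weight by local dataset size
        new_w += share * w
        new_b += share * b
    return (new_w, new_b)


if __name__ == "__main__":
    # Non-IID split: client k only sees x in [0.5k, 0.5k + 0.5); the true model is w=3, b=1.
    clients = [[(0.5 * k + 0.025 * i, 3 * (0.5 * k + 0.025 * i) + 1) for i in range(20)]
               for k in range(4)]
    rng = random.Random(0)
    weights: Weights = (0.0, 0.0)
    for _ in range(300):
        weights = fedavg_round(weights, clients, fraction=0.5, epochs=5,
                               lr=0.05, batch_size=5, rng=rng)
    print(weights)
```

Each client in the example sees only a narrow, non-overlapping range of x, a crude stand-in for non-IID data, so its own data constrains the two parameters poorly; averaging the locally trained models each round still moves the shared model toward values that fit everyone’s data. Raising `epochs` spends more computation on each device per round, the same trade of local work for fewer communication rounds that the paper exploits.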

This addressed a real obstacle. Mobile data is plentiful and valuable for training, but it is also private, expensive to upload, and increasingly regulated. Keeping it on the device sidesteps much of that at once.

For a business reader, FedAvg is why a phone keyboard can learn from how millions of people type without those keystrokes ever being uploaded, and why hospitals or banks can collaborate on a shared model without pooling confidential records. It is a core architectural pattern behind privacy-preserving machine learning at scale, and a direct response to the data-governance pressures every data-rich company now faces.

Sources

Last verified June 7, 2026