Federated Learning for Mobile Keyboard Prediction

This 2018 paper from Google, by Andrew Hard, Kanishka Rao, Rajiv Mathews, Swaroop Ramaswamy, Françoise Beaufays, Sean Augenstein, Hubert Eichner, Chloé Kiddon, and Daniel Ramage, documented one of the first production deployments of federated learning: training the next-word prediction model in Gboard, Google’s mobile keyboard, directly on users’ phones.

The system trained a recurrent neural network language model using the Federated Averaging (FedAvg) algorithm. Rather than uploading what people type, which is about as sensitive as data gets, each phone trained the model on its own typing and sent back only the resulting model update; the server averaged those updates into the shared model. The paper compared this on-device approach against the conventional method of training on server-collected data and found that the federated model matched or improved prediction quality. That comparison is what made the paper notable: it moved federated learning from a promising idea, established by the 2016 FedAvg paper, to evidence that it works in a shipped product used by enormous numbers of people, on the messy and uneven data that real phones actually hold.
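The train-locally-then-average loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's or Gboard's implementation: a two-parameter linear model stands in for the recurrent language model, every client takes part in every round, and the names `local_update`, `federated_average`, `run_round`, `train`, and `CLIENTS` are hypothetical. It omits everything a production system needs beyond the averaging step itself.

```python
# Toy Federated Averaging sketch (assumed setup, not the Gboard system).
# The model is y = w * x + b; each "client" holds its own private examples.

def local_update(weights, examples, lr=0.1, epochs=1):
    """Run SGD on one client's private examples, starting from the global weights."""
    w, b = weights
    for _ in range(epochs):
        for x, y in examples:
            err = (w * x + b) - y   # prediction error on this example
            w -= lr * err * x       # gradient step for the slope
            b -= lr * err           # gradient step for the intercept
    return (w, b)

def federated_average(client_weights, client_sizes):
    """Average client weights, weighting each client by its number of examples."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return tuple(
        sum(cw[i] * n for cw, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    )

def run_round(global_weights, clients):
    """One round: every client trains locally; only the weights leave the client."""
    updates = [local_update(global_weights, data) for data in clients]
    sizes = [len(data) for data in clients]
    return federated_average(updates, sizes)

def train(clients, rounds=200, start=(0.0, 0.0)):
    """Repeat rounds, feeding each averaged model back to the clients."""
    weights = start
    for _ in range(rounds):
        weights = run_round(weights, clients)
    return weights

# Hypothetical, deliberately uneven client datasets drawn from y = 2x + 1.
CLIENTS = [
    [(-1.0, -1.0), (0.0, 1.0), (1.0, 3.0)],
    [(0.5, 2.0)],
    [(-0.5, 0.0), (0.25, 1.5)],
]

if __name__ == "__main__":
    w, b = train(CLIENTS)
    print(f"learned w={w:.4f}, b={b:.4f}")
```

The raw `(x, y)` pairs are only ever read inside `local_update`; `federated_average` sees nothing but weights and example counts, which is the structural property the paragraph above describes. Weighting by example count is the usual FedAvg choice, so that a client with one example does not pull the shared model as hard as a client with many.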

For a business reader, this is the proof point that privacy-preserving training is not just an academic exercise. A flagship consumer feature, keyboard autocomplete, was built so that the company improving it never collected the raw keystrokes that improved it. It is the practical template for any organization that wants to learn from highly sensitive user behavior (typing, health signals, location) while structurally avoiding the liability of holding that behavior in a central database.
