Google Live Caption brings on-device captioning to any audio

Google introduced Live Caption at its I/O developer conference in May 2019 and shipped it first on the Pixel 4 later that year. The feature does something that sounds simple but had not been possible before at this quality and scale: with a single tap, it adds real-time captions to almost any audio playing on the phone, including videos, podcasts, voice and video calls, and audio messages, regardless of whether the app was built with captioning in mind. For the hundreds of millions of people who are deaf or hard of hearing, it turns previously inaccessible media into something they can follow.

The technical heart of Live Caption is that speech recognition runs entirely on the device. Google’s documentation states that all audio and captions are processed locally, never stored, and never sent to Google, and that the feature works without mobile data or an internet connection. Running a capable speech model on a phone, offline, was the real advance: it made captioning immediate, private, and available wherever the phone is, rather than dependent on a cloud service and a network connection.

Live Caption sits alongside Google’s Live Transcribe, which is aimed at captioning in-person conversation, as part of a push to make speech accessible. Where Live Transcribe listens to the world around you, Live Caption captions whatever your phone is playing.

Why business readers should care: Live Caption is a flagship example of moving a capable model onto the device itself. On-device inference delivered the accessibility win and the privacy guarantee at once, a pattern that becomes more relevant as models shrink enough to run locally.