Google Duplex demonstrates AI phone calls

On May 8, 2018, Google published a research blog post introducing Duplex, an AI system “for conducting natural conversations to carry out real-world tasks over the phone.” Demonstrated at the Google I/O conference, Duplex placed calls in a strikingly human-sounding voice to do narrow chores like booking a restaurant table or a hair-salon appointment, and the blog reported it could complete the majority of such calls “fully autonomously,” with human operators stepping in for hard cases.

The blog post describes the techniques that made Duplex sound human. The system combined a concatenative text-to-speech engine with a neural synthesis engine (using Tacotron and WaveNet) to control intonation, and it deliberately inserted speech disfluencies, the “hmm” and “uh” sounds people make, to signal that it was still processing and to make the conversation feel natural. It also managed latency deliberately, answering simple turns quickly but adding a beat of delay before more complex replies. Crucially, Google noted that Duplex worked only because it was constrained to closed domains narrow enough to train deeply; it could not hold a general conversation.
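To make the disfluency and latency ideas concrete, here is a minimal illustrative sketch in Python. It is not Google's implementation, which the blog post does not publish. The function name `plan_reply`, the word-count test for "complex" turns, and the delay values are all assumptions chosen for illustration.

```python
import random
from typing import Tuple

# Filler sounds of the kind the blog post mentions ("hmm", "uh").
FILLERS = ("hmm", "uh")

# Assumed values for illustration only; the blog post gives no numbers.
FAST_DELAY_S = 0.2      # near-instant answer to a simple turn
SLOW_DELAY_S = 1.0      # a beat of delay before a complex reply
COMPLEX_WORD_COUNT = 8  # hypothetical proxy: long utterances count as complex


def plan_reply(heard: str, reply: str, rng: random.Random) -> Tuple[float, str]:
    """Return (delay_in_seconds, text_to_speak) for one conversational turn.

    Simple turns are answered quickly and verbatim. Complex turns get a
    longer pause and a leading filler sound, mimicking a person who is
    thinking before answering.
    """
    is_complex = len(heard.split()) >= COMPLEX_WORD_COUNT
    if not is_complex:
        return FAST_DELAY_S, reply
    filler = rng.choice(FILLERS)
    return SLOW_DELAY_S, f"{filler}, {reply}"


if __name__ == "__main__":
    rng = random.Random(0)
    print(plan_reply("Hello?", "Hi, I'd like to book a table.", rng))
    print(plan_reply(
        "We only take reservations for parties of five or more on weekends.",
        "is there usually a long wait for four people?",
        rng,
    ))
```

A real system would judge complexity from the dialogue state rather than a word count; the sketch only shows the shape of the decision: respond fast to simple turns, and pause and hedge audibly before harder ones.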

The demo astonished audiences and immediately raised an ethical question: the AI had not disclosed that it was a machine. After public criticism, Google said Duplex would identify itself and tell people the call might be recorded, and the feature rolled out cautiously in a few U.S. cities.

For a business reader, Duplex marked a turning point: synthetic voices had become good enough to pass as human in real conversations. That forced the question of when an AI must announce itself, an issue now central to how automated calling and customer service are regulated and perceived.