Audrey, the first speech recognizer, understands spoken digits

In 1952, three Bell Telephone Laboratories researchers - K. H. Davis, R. Biddulph, and S. Balashek - described a machine that could recognize the spoken digits zero through nine. Their paper, “Automatic Recognition of Spoken Digits,” appeared in the Journal of the Acoustical Society of America (volume 24, issue 6, page 637). It is widely regarded as the first working automatic speech recognition system. The system became known informally as Audrey, short for Automatic Digit Recognizer.

Audrey was entirely analog. It split the incoming speech into frequency bands above and below 900 cycles per second, measured the formant patterns of each vowel, and matched them against stored reference patterns. There was no software and no general-purpose computer involved - just filters, amplifiers, and integrators wired together. The supporting electronics filled a relay rack about six feet tall.
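
The matching idea can be illustrated in software. The sketch below is a loose modern analogy, not a reconstruction of the Bell Labs circuit. It splits a signal at 900 Hz, estimates one dominant frequency per band from zero crossings, records which cells of a frequency grid the signal occupies over time, and picks the stored pattern with the highest similarity. The sample rate, frame length, grid edges, zero-crossing estimator, and all function names are assumptions made for illustration.

```python
import numpy as np

SAMPLE_RATE = 8000                     # Hz; assumed for this sketch
SPLIT_HZ = 900                         # band boundary described in the text
FRAME = 200                            # samples per frame (25 ms); assumed
F1_EDGES = np.linspace(0, 900, 6)      # hypothetical grid edges, low band
F2_EDGES = np.linspace(900, 4000, 7)   # hypothetical grid edges, high band


def split_bands(signal):
    """Return (low, high) versions of the signal, split at SPLIT_HZ by FFT masking."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / SAMPLE_RATE)
    low = np.fft.irfft(np.where(freqs < SPLIT_HZ, spectrum, 0), n=len(signal))
    high = np.fft.irfft(np.where(freqs >= SPLIT_HZ, spectrum, 0), n=len(signal))
    return low, high


def zero_crossing_freq(frame):
    """Estimate the dominant frequency of a frame from its sign changes."""
    signs = np.signbit(frame)
    crossings = np.count_nonzero(signs[1:] != signs[:-1])
    return crossings * SAMPLE_RATE / (2.0 * len(frame))


def band_track(signal):
    """Per-frame (low-band, high-band) frequency estimates, shape (n_frames, 2)."""
    low, high = split_bands(signal)
    n_frames = len(signal) // FRAME
    rows = [
        (zero_crossing_freq(low[i * FRAME:(i + 1) * FRAME]),
         zero_crossing_freq(high[i * FRAME:(i + 1) * FRAME]))
        for i in range(n_frames)
    ]
    return np.array(rows, dtype=float).reshape(-1, 2)


def occupancy_pattern(track):
    """Unit-length vector of how often the track visits each grid cell."""
    hist, _, _ = np.histogram2d(track[:, 0], track[:, 1],
                                bins=[F1_EDGES, F2_EDGES])
    norm = np.linalg.norm(hist)
    return (hist / norm).ravel() if norm > 0 else hist.ravel()


def recognize(signal, templates):
    """Return the label whose stored pattern best matches the signal."""
    pattern = occupancy_pattern(band_track(signal))
    scores = {label: float(pattern @ ref) for label, ref in templates.items()}
    return max(scores, key=scores.get)
```

Here `templates` is a dictionary mapping each label to a pattern built once from a reference recording, for example `occupancy_pattern(band_track(reference_signal))`. Because the references come from one voice, a sketch like this inherits the same weakness as the original: patterns tuned to one speaker may match another speaker poorly.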

The accuracy was striking for the time. The paper reported recognition between 97 and 99 percent, but only after the circuit was tuned to a particular speaker’s voice. Change the speaker, and accuracy collapsed. This speaker dependence, plus the cost and bulk of the hardware, kept Audrey a laboratory demonstration rather than a product. The vision of dialing a phone number by voice was decades away from being practical.

For business readers, Audrey marks the starting line of a road that runs through the IBM Shoebox, the DARPA speech program, Dragon dictation, and eventually Siri and Alexa. It is also an early example of a pattern that recurs across AI: an impressive demo under controlled conditions, followed by a long, hard slog to make the same capability work for anyone, anywhere.
