Computer Speech Recognition – 1952 AD

“The automatic digit recognition system, also known as Audrey, was developed by Bell Labs in 1952. Audrey was a milestone in the quest to enable computers to recognize and respond to human speech.

Audrey was designed to recognize the spoken digits 0 through 9 and provide feedback with a series of flashing lights associated with a specific digit. Audrey’s accuracy was speaker dependent, because to work, it first had to “learn” the unique sounds emitted by an individual person for reference material. Audrey’s accuracy was around 80 percent with one designer’s voice. Speaker-independent recognition would not be invented for many more years, with modern examples being Amazon Echo with Alexa and Apple Siri.

To create the reference material, the speaker would slowly recite the digits 0 through 9 into an everyday telephone, pausing at least 350 milliseconds between each number. The sounds were then sorted into electrical classes and stored in analog memory. The pauses were needed because at the time, speech-recognition systems had not solved coarticulation—the phenomenon of speakers phonetically linking words as they naturally morph from one to another. That is, it was easier for the system to isolate and recognize individual words than words said together.

Once trained, Audrey could match new spoken digits with the sounds stored in its memory: the computer would flash a light corresponding to a particular digit when it found a match.

While various economic and technical practicalities prevented Audrey from going into production (including specialized hardwired circuitry and large power consumption), Audrey was nevertheless an important building block in advancing speech recognition. Audrey showed that the technique could be used in theory to automate speaker input for things such as account numbers, Social Security numbers, and other kinds of numerical information.

Ten years later, IBM demonstrated the “Shoebox,” a machine capable of recognizing 16 spoken words, at the 1962 World’s Fair in Seattle, Washington.”
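The enroll-then-match procedure in the excerpt can be illustrated with a small digital analogy. This is only a sketch under stated assumptions, not a reconstruction of Audrey, which used hardwired analog circuitry. The function names, the per-frame energy values, the 10 ms frame size, the energy threshold, and the feature vectors are all hypothetical. Only the 350 millisecond pause and the idea of comparing new digits against stored reference sounds come from the text.

```python
import math

def split_on_pauses(energies, frame_ms=10, min_pause_ms=350, threshold=0.1):
    """Split per-frame energies into utterances separated by long pauses.

    Returns (start, end) frame index pairs, end exclusive. A pause shorter
    than min_pause_ms does not end an utterance.
    """
    min_pause_frames = min_pause_ms // frame_ms
    segments = []
    start = None      # first frame of the current utterance, if any
    last_loud = None  # most recent frame above the threshold
    for i, energy in enumerate(energies):
        if energy > threshold:
            if start is None:
                start = i
            last_loud = i
        elif start is not None and i - last_loud >= min_pause_frames:
            # Enough consecutive quiet frames: close the utterance.
            segments.append((start, last_loud + 1))
            start = None
    if start is not None:
        segments.append((start, last_loud + 1))
    return segments

def enroll(samples):
    """Average one speaker's training vectors into a reference per digit.

    samples maps a digit to a list of equal-length feature vectors
    (hypothetical features; the excerpt does not specify them).
    """
    references = {}
    for digit, vectors in samples.items():
        count = len(vectors)
        references[digit] = [sum(column) / count for column in zip(*vectors)]
    return references

def recognize(references, features):
    """Return the digit whose stored reference is nearest to the features."""
    return min(references, key=lambda d: math.dist(references[d], features))
```

With references enrolled from one speaker's recordings, `recognize` returns the closest stored digit for each segment found by `split_on_pauses`. Because the references are built from a single voice, this sketch is speaker dependent in the same sense the excerpt describes.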

“The automatic digit recognition system was the forerunner of many popular applications today, including smartphones that can recognize voice commands.”

