Computer Speech Recognition – 1952 AD

“The automatic digit recognition system, also known as Audrey, was developed by Bell Labs in 1952. Audrey was a milestone in the quest to enable computers to recognize and respond to human speech.

Audrey was designed to recognize the spoken digits 0 through 9 and respond by flashing a light associated with the recognized digit. Audrey was speaker dependent: before it could work, it had to “learn” the distinctive sounds produced by an individual speaker, which it then used as reference material. With the voice of one of its designers, Audrey’s accuracy was around 80 percent. Speaker-independent recognition would not arrive for many more years; modern examples include Amazon’s Alexa (on the Echo) and Apple’s Siri.

To create the reference material, the speaker would slowly recite the digits 0 through 9 into an ordinary telephone, pausing at least 350 milliseconds between numbers. The sounds were then sorted into electrical classes and stored in analog memory. The pauses were needed because speech-recognition systems of the time had not solved coarticulation, the way speakers phonetically blend adjacent words as one flows naturally into the next. In other words, the system could isolate and recognize words spoken one at a time far more easily than words run together.

Once trained, Audrey could match new spoken digits with the sounds stored in its memory: the computer would flash a light corresponding to a particular digit when it found a match.

Although economic and technical constraints, including specialized hardwired circuitry and high power consumption, kept Audrey from going into production, it was nevertheless an important building block in advancing speech recognition. Audrey showed that, in principle, the technique could automate spoken input of account numbers, Social Security numbers, and other kinds of numerical information.

Ten years later, IBM demonstrated the “Shoebox,” a machine capable of recognizing 16 spoken words, at the 1962 World’s Fair in Seattle, Washington.”
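The train-then-match procedure described above amounts to what would now be called speaker-dependent, isolated-word template matching. The following is a minimal sketch of that idea in modern digital terms, not a description of Audrey's analog circuitry. All names are hypothetical: `split_on_pauses` finds words by looking for silences of at least 350 ms (the 10 ms frame size and energy threshold are assumptions), and `DigitTemplateMatcher` stores one reference vector per digit and reports the nearest one. The two-number feature vectors are hypothetical stand-ins for whatever measurements a recognizer would extract from each utterance.

```python
import math

def split_on_pauses(samples, sample_rate, min_pause_ms=350, frame_ms=10, threshold=0.01):
    """Return (start, end) sample ranges of words separated by pauses >= min_pause_ms.

    frame_ms and threshold are illustrative assumptions, not Audrey's values.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    min_silent_frames = -(-min_pause_ms // frame_ms)  # ceiling division

    # Label each frame as speech (True) or silence (False) by its mean energy.
    voiced = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        voiced.append(sum(x * x for x in frame) / len(frame) > threshold)

    segments, seg_start, silent_run = [], None, 0
    for i, is_voiced in enumerate(voiced):
        if is_voiced:
            if seg_start is None:
                seg_start = i  # a new word begins
            silent_run = 0     # short gaps inside a word are ignored
        elif seg_start is not None:
            silent_run += 1
            if silent_run >= min_silent_frames:
                # Pause is long enough: the word ended at the last voiced frame.
                end = (i - silent_run + 1) * frame_len
                segments.append((seg_start * frame_len, end))
                seg_start, silent_run = None, 0
    if seg_start is not None:  # word still open when the signal ends
        end = min((len(voiced) - silent_run) * frame_len, len(samples))
        segments.append((seg_start * frame_len, end))
    return segments


class DigitTemplateMatcher:
    """Speaker-dependent recognizer holding one reference vector per digit."""

    def __init__(self):
        self.templates = {}  # digit -> reference feature vector

    def train(self, digit, features):
        # "Learning" a speaker just means storing how they said each digit.
        self.templates[digit] = list(features)

    def recognize(self, features, max_distance=None):
        """Return the digit with the nearest template (Euclidean distance),
        or None if nothing is trained or the best match exceeds max_distance."""
        best_digit, best_dist = None, math.inf
        for digit, template in self.templates.items():
            dist = math.dist(template, features)
            if dist < best_dist:
                best_digit, best_dist = digit, dist
        if max_distance is not None and best_dist > max_distance:
            return None
        return best_digit


# Usage with hypothetical features from one speaker.
matcher = DigitTemplateMatcher()
matcher.train(1, [300, 2200])
matcher.train(2, [700, 1200])
matcher.train(3, [500, 1700])
print(matcher.recognize([320, 2150]))  # prints 1
print(split_on_pauses([0.5] * 200 + [0.0] * 400 + [0.5] * 300, 1000))  # prints [(0, 200), (600, 900)]
```

Because the templates come from a single voice, another speaker's digits can land nearer the wrong template. This offers one way to picture why a speaker-dependent system's accuracy depends on who is speaking.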

SEE ALSO Electronic Speech Synthesis (1928)

“The automatic digit recognition system was the forerunner of many popular applications today, including smartphones that can recognize voice commands.”

Fair Use Source: B07C2NQSPV

Electronic Speech Synthesis – 1928 AD

Homer Dudley (1896–1980)

“Long before Siri®, Alexa, Cortana, and other synthetic voices were reading emails, telling people the time, and giving driving directions, research scientists were exploring approaches to make a person’s voice take up less bandwidth as it moved through the phone system.

In 1928, Homer Dudley, an engineer at Bell Telephone Labs, developed the vocoder, a process for compressing human speech into intelligible electronic transmissions and then creating synthetic speech from scratch at the receiving end by imitating the sounds of the human vocal cords. The vocoder analyzes real speech and reassembles it as a simplified electronic impression of the original waveform. To recreate the sound of human speech, it uses sound from an oscillator, a gas discharge tube (for the hissing sounds), filters, and other components.

In 1939, Bell Labs unveiled a speech synthesizer at the New York World’s Fair. Called the Voder, it was operated manually by a human, who used a series of keys and foot pedals to generate hisses, tones, and buzzes, forming vowels, consonants, and, ultimately, recognizable speech.

The vocoder followed a different path of technology development than the Voder. In 1939, with war already under way in Europe, Bell Labs and the US government became increasingly interested in developing secure voice communication. After further research, the vocoder was modified and used in World War II as the encoder in SIGSALY, a highly sensitive, secure voice system that Winston Churchill used to speak with Franklin Roosevelt.

Then, in a sharp turn in the 1960s, the vocoder made the leap into music and pop culture. It has been used ever since for a variety of sounds, including electronic melodies and talking robots, as well as for voice-distortion effects in traditional music. In 1961, the International Business Machines Corporation (IBM®) 7094 became the first computer to sing, using a vocoder to warble the tune “Daisy Bell.” (This was the same tune that the HAL 9000 computer would sing seven years later in Stanley Kubrick’s 2001: A Space Odyssey.) In 1995, 2Pac, Dr. Dre, and Roger Troutman used a vocoder to distort their voices in the song “California Love,” and in 1998 the Beastie Boys used a vocoded vocal in their song “Intergalactic.””
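The vocoder's analyze-and-resynthesize idea can be illustrated with a minimal channel vocoder. This is a simplified digital sketch under stated assumptions, not Dudley's circuit. The band center frequencies, the filter Q, the envelope smoothing constant, and the fixed-pitch pulse-train "buzz" are all hypothetical choices. The buzz stands in for the oscillator and random noise stands in for the gas discharge tube's hiss. Analysis splits the signal into a few frequency bands and keeps only each band's slowly varying loudness (its envelope). Synthesis runs the buzz or hiss through the same bands and scales each band by its envelope. The function names are illustrative, and the filter is the standard RBJ "Audio EQ Cookbook" band-pass biquad.

```python
import math
import random

def bandpass(signal, fs, center, q=4.0):
    """Second-order band-pass filter (RBJ cookbook biquad, 0 dB peak gain)."""
    w0 = 2 * math.pi * center / fs
    alpha = math.sin(w0) / (2 * q)
    b0, b1, b2 = alpha, 0.0, -alpha
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    out, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for x in signal:
        y = (b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x1, x2 = x, x1
        y1, y2 = y, y1
        out.append(y)
    return out

def envelope(signal, smoothing=0.99):
    """Band loudness: full-wave rectify, then smooth with a one-pole low-pass."""
    env, level = [], 0.0
    for x in signal:
        level = smoothing * level + (1 - smoothing) * abs(x)
        env.append(level)
    return env

def analyze(speech, fs, centers):
    """Encoder: keep only a slowly varying loudness envelope for each band.
    (Kept at full audio rate here for simplicity; a practical system would
    send envelopes at a much lower rate, which is where bandwidth is saved.)"""
    return [envelope(bandpass(speech, fs, fc)) for fc in centers]

def synthesize(envelopes, fs, centers, pitch_hz=None, seed=0):
    """Decoder: pass a buzz (voiced) or hiss (pitch_hz=None) through the same
    bands and scale each band by its transmitted envelope."""
    n = len(envelopes[0])
    if pitch_hz:
        period = max(1, int(fs / pitch_hz))
        source = [1.0 if i % period == 0 else 0.0 for i in range(n)]  # pulse-train buzz
    else:
        rng = random.Random(seed)
        source = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # noise hiss
    out = [0.0] * n
    for env, fc in zip(envelopes, centers):
        band = bandpass(source, fs, fc)
        for i in range(n):
            out[i] += band[i] * env[i]
    return out

# Usage: a 700 Hz tone stands in for speech; band centers are hypothetical.
fs = 8000
centers = [300, 700, 1200, 2000, 3000]
speech = [math.sin(2 * math.pi * 700 * i / fs) for i in range(fs // 4)]
envs = analyze(speech, fs, centers)
voice = synthesize(envs, fs, centers, pitch_hz=100)
```

In this toy run, the 700 Hz band carries most of the envelope energy, so the resynthesized buzz is emphasized around 700 Hz. With real speech, the envelopes would rise and fall as vowels and consonants change.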

SEE ALSO “As We May Think” (1945), HAL 9000 Computer (1968)

“The Voder, exhibited by Bell Telephone at the New York World’s Fair.”

Fair Use Source: B07C2NQSPV