Commercial Applications of Speech Recognition Microprocessors

Introduction
Speech recognition devices will play a large role in our society as human-computer
interaction becomes a more intimate part of day-to-day life. Early in its
development, this technology had limited accuracy because of variations in human speech
patterns. However, computer software and microcontroller chips have become
increasingly adept at decoding human speech, largely because of improvements in the
statistical algorithms they employ, such as hidden Markov models. This paper
explores the applications of speech recognition devices, how they work, and the
technology used to implement them.
Commercial Applications of Speech Recognition Devices
Speech recognition is currently used most widely in the customer service
industry [6]. When customers need help with tasks such as checking flight times,
purchasing products, or fixing appliances and electronics, they can get
information from an automated system rather than a human representative. Because speech
recognition systems can now effectively handle human factors, such as customers
misspeaking or changing their minds mid-sentence, they increase the throughput of
service and offer a consistently friendly interface. These systems now help customers
successfully often enough that they are becoming commonplace [6].
An application of speech recognition technology that has yet to be widely
implemented is the control of home appliances and electronics [2]. Such devices will
eventually be able to configure and control themselves in response to spoken commands.
However, it is harder to obtain a clear speech signal when the user is not talking
directly into a voice receiver, such as a telephone, so further advances in digital
signal processing are needed before this is feasible on a wide scale [2]. The goal of
making control easier for disabled persons is driving research in this area and has
already produced some such devices [1]. For instance, the
Surfboard is a voice-activated television remote that is marketed to people with low
vision or arthritic hands, though it has limited functionality.
How Speech Recognition Devices Work
Speech recognition is achieved by using statistics to compare an input signal
against a pre-existing bank of keywords. The use of hidden Markov models has vastly
improved the quality of speech recognition technology. A hidden Markov model uses a
virtual state machine to calculate the probability that a sequence of states, each
representing a sound fragment, produces a given word.
Therefore, slightly different arrangements of states can result in the same word match,
much as there are many slightly different ways of saying the same word [1]. The
model uses Gaussian mixture observation distributions, and each state is
linked to the previous state, the next state, and a possible output. It is possible for the
same output to be created by different paths through the state machine, which allows for
variability [4].
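The keyword-matching idea described above can be illustrated with a minimal sketch. It makes several simplifying assumptions that are not taken from the cited sources: observations are discrete sound-fragment labels rather than Gaussian mixture densities, states link only to themselves and to the next state, and every probability in the two word models is invented for demonstration. The function names (forward_likelihood, recognise) are likewise hypothetical. The forward algorithm sums the probability of an input over every path through a word's state machine, and the word whose model scores highest is taken as the match.

```python
# Illustrative keyword matching with discrete hidden Markov models.
# All probabilities below are invented for demonstration only.

def forward_likelihood(obs, start, trans, emit):
    """Return P(obs | model) using the forward algorithm.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i]     : dict mapping a symbol to the probability state i emits it
    """
    n = len(start)
    # Initialise with the first observation.
    alpha = [start[i] * emit[i].get(obs[0], 0.0) for i in range(n)]
    # Fold in each later observation, summing over all predecessor states.
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n))
            * emit[j].get(symbol, 0.0)
            for j in range(n)
        ]
    return sum(alpha)


def recognise(obs, word_models):
    """Return the best-matching keyword and the score of every keyword."""
    scores = {
        word: forward_likelihood(obs, *model)
        for word, model in word_models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical word bank: (start, trans, emit) for each keyword.
WORD_MODELS = {
    "yes": (
        [1.0, 0.0, 0.0],
        [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],
        [{"y": 0.9, "eh": 0.1}, {"eh": 0.8, "ih": 0.2}, {"s": 1.0}],
    ),
    "no": (
        [1.0, 0.0],
        [[0.5, 0.5], [0.0, 1.0]],
        [{"n": 0.9, "m": 0.1}, {"ow": 0.9, "oh": 0.1}],
    ),
}

# Two slightly different pronunciations both match the same keyword.
for utterance in (["y", "eh", "s"], ["y", "y", "ih", "s"]):
    word, _ = recognise(utterance, WORD_MODELS)
    print(utterance, "->", word)
```

Both utterances resolve to "yes" even though they follow different paths through the state machine, which is the variability the model is meant to absorb. A practical recogniser would replace the lookup tables with continuous observation densities and train the probabilities from recorded speech.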
Building Blocks for Implementing Speech Recognition Devices
Speech recognition can be implemented in hardware or in
software. Software such as Nuance’s Dragon NaturallySpeaking uses the computer
microphone for voice input and the CPU for its algorithms. It is highly accurate immediately
after installation, and the user can correct any errors that occur so that the
software learns from its mistakes [3]. This program runs on Windows XP or Vista. A
personal computer running this type of software can be interfaced to IR transmitters, LCD
screens, electric motors, and other types of hardware through a USB or serial cable and
act as their control mechanism.
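As a rough sketch of the control role described above, the following code maps phrases that a recognizer has already converted to text onto command bytes and writes them to a serial-like stream. Everything specific here is an assumption for illustration: the phrase table, the byte codes, and the dispatch function are hypothetical and do not correspond to any real device protocol or to the API of any speech recognition product. An in-memory buffer stands in for the serial port so that the sketch runs without hardware.

```python
import io

# Hypothetical phrase-to-command table. The byte codes are invented and
# do not belong to any real device protocol.
COMMANDS = {
    "channel up": b"\x01",    # e.g. forwarded to an IR transmitter
    "channel down": b"\x02",
    "open blinds": b"\x10",   # e.g. forwarded to a motor controller
}


def dispatch(phrase, port):
    """Write the command for a recognised phrase to a serial-like stream.

    `port` is any object with a write(bytes) method, such as an opened
    serial port. Returns True if the phrase mapped to a command.
    """
    code = COMMANDS.get(phrase.strip().lower())
    if code is None:
        return False  # unknown phrase: send nothing to the hardware
    port.write(code)
    return True


# Stand-in for a real serial port.
fake_port = io.BytesIO()
print(dispatch("Channel Up", fake_port))   # known phrase
print(dispatch("make coffee", fake_port))  # unknown phrase
print(fake_port.getvalue())                # bytes that would go down the cable
```

Keeping the phrase table separate from the write step means new devices can be supported by adding table entries rather than changing the control logic.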
Hardware implementations perform speech recognition using a programmable microcontroller.
Since computing resources are scarce, most chips have two modes of operation: speaker
dependent and speaker independent. Speaker-dependent mode offers higher accuracy,
usually around 97%, but supports a limited number of users. Speaker-independent
mode has slightly lower accuracy, about 93%, but can be used by anyone [5]. Chips
store word banks in internal RAM, with additional space available in flash drive ports.
Microphone considerations include a sensitivity of at least -40 to -46 dB [5]. Additional
hardware is usually connected to voice recognition chips through other microcontrollers,
since most of the computing power is dedicated to word matching. Most microcontrollers
require a power supply voltage between 2.5 and 5 VDC.
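To put the two accuracy figures in perspective, the short calculation below estimates how often a multi-word command sequence would be recognized without any error. It assumes that each word is recognized independently with the quoted per-word accuracy, which is a simplification not made in the cited application note; the function name sequence_success and the five-word sequence length are illustrative choices.

```python
def sequence_success(per_word_accuracy, n_words):
    """Probability that n_words consecutive words are all recognised,
    assuming each word is recognised independently (a simplification)."""
    return per_word_accuracy ** n_words


# Accuracy figures quoted in the text; the sequence length is arbitrary.
for mode, accuracy in [("speaker dependent", 0.97),
                       ("speaker independent", 0.93)]:
    print(f"{mode}: {sequence_success(accuracy, 5):.3f}")
```

Under this assumption, a four-point difference in per-word accuracy grows to roughly sixteen points over a five-word sequence (about 0.859 versus 0.696), which suggests why the more accurate mode may be preferred when the set of users is small.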
References
[1] S. Capistron, “The Inner Working of Speech Recognition,” Illumin, vol. 8,
no. 1, p. 3, August 2005. [Online]. Available: http://illumin.usc.edu.
[Accessed Sept. 1, 2007].
[2] J. Howard, “Automatic control of household activity using speech
recognition and natural language,” U.S. patent no. 6513006, issued
January 28, 2003.
[3] L. Magid, “Voice Recognition Software Put to the Test,” CBS News,
para. 3, September 30, 2006. [Online]. Available:
http://www.cbsnews.com. [Accessed Sept. 3, 2007].
[4] L. R. Rabiner, “A Tutorial on Hidden Markov Models and Selected
Applications in Speech Recognition,” in Proceedings of the IEEE, vol. 77,
no. 2, February 1989, p. 257.
[5] Sensory Inc., Appl. Note 80-0073-M, pp. 1-3.
[6] M. Zager, “Speech Recognition Goes Mainstream,” Customer
Management Insight, para., October 1, 2005. [Online]. Available:
http://www.callcentermagazine.com. [Accessed Sept. 3, 2007].