Introduction

Speech recognition devices will play a large role in our society as human-computer interaction becomes a more intimate part of day-to-day life. Early in its development, this technology had limited accuracy because of variations in human speech patterns. However, computer software and microcontroller chips have become increasingly adept at decoding human speech, owing to improvements in the statistical algorithms they employ, such as hidden Markov models. This paper explores the applications of speech recognition devices, how they work, and the technology used to implement them.

Commercial Applications of Speech Recognition Devices

The largest current use of speech recognition is in the customer service industry [6]. When customers need help with tasks such as obtaining flight times, purchasing products, or fixing appliances and electronics, they can receive information from an automated system rather than a human representative. Because speech recognition systems can now handle human factors effectively, such as customers misspeaking or changing their minds mid-sentence, they increase the throughput of service and offer a consistently friendly interface. These systems now help customers successfully often enough that they are becoming commonplace [6].

An application of speech recognition technology that has yet to be widely implemented is its use in home appliances and electronics [2]. Such devices will eventually be able to configure and control themselves in response to spoken commands. However, it is harder to obtain a clear speech signal when the user is not talking directly into a voice receiver, such as a telephone, so further advances in digital signal processing are needed before this is feasible on a wide scale [2]. The goal of making control easier for people with disabilities is driving research in this area and has already produced some such devices [1].
For instance, the Surfboard is a voice-activated television remote that is marketed to people with low vision or arthritic hands, though it has limited functionality.

How Speech Recognition Devices Work

Speech recognition is achieved by using statistics to compare an input signal against a pre-existing bank of keywords. The use of hidden Markov models has vastly improved the quality of speech recognition technology. A hidden Markov model uses a virtual state machine to calculate the probability that each state, or sound fragment, results in a given word. As a result, slightly different arrangements of states will produce the same word match, much as there are many slightly different ways of saying the same word [1]. The model uses Gaussian mixture observation distributions, and each state is linked to the previous state, the next state, and a possible output. The same output can be produced by different paths through the state machine, which allows for variability [4].

Building Blocks for Implementing Speech Recognition Devices

Speech recognition can be implemented at the hardware or the software level. Software such as Nuance’s Dragon NaturallySpeaking uses the computer microphone for voice input and the CPU for its algorithms. It is highly accurate directly after installation. The user can correct any errors that do occur, and the software learns from its mistakes [3]. This program runs on Windows XP or Vista. A personal computer running this type of software can be interfaced to IR transmitters, LCD screens, electric motors, and other types of hardware through a USB or serial cable, and acts as their control mechanism.

Hardware implementations of speech recognition use a programmable microcontroller. Since computing resources are scarce, most chips have two modes of operation: speaker-dependent and speaker-independent. Speaker-dependent mode offers higher accuracy, usually around 97%, but supports only a limited number of users.
Speaker-independent mode has slightly lower accuracy, about 93%, but can be used by anyone [5]. Chips store word banks in internal RAM, with additional storage available through flash drive ports. Microphone requirements include a sensitivity in the range of -40 to -46 dB [5]. Additional hardware is usually connected to voice recognition chips through other microcontrollers, since most of the chip's computing power is dedicated to word matching. Most microcontrollers require a power supply voltage between 2.5 and 5 VDC.

References

[1] S. Capistron, “The Inner Working of Speech Recognition,” Illumin, vol. 8, no. 1, p. 3, August 2005. [Online]. Available: http://illumin.usc.edu. [Accessed Sept. 1, 2007].
[2] J. Howard, “Automatic control of household activity using speech recognition and natural language,” U.S. patent no. 6513006, issued January 28, 2003.
[3] L. Magid, “Voice Recognition Software Put to the Test,” CBS News, para. 3, September 30, 2006. [Online]. Available: http://www.cbsnews.com. [Accessed Sept. 3, 2007].
[4] L. R. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” in Proceedings of the IEEE, vol. 77, no. 2, February 1989, p. 257.
[5] Sensory Inc., Appl. Note 80-0073-M, pp. 1-3.
[6] M. Zager, “Speech Recognition Goes Mainstream,” Customer Management Insight, para., October 1, 2005. [Online]. Available: http://www.callcentermagazine.com. [Accessed Sept. 3, 2007].
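
Supplementary Sketch: Word Matching with a Hidden Markov Model

The word matching described under How Speech Recognition Devices Work can be illustrated with a small sketch. It is not taken from any of the cited systems. The word models, their probabilities, and the names `forward_likelihood`, `recognize`, and `WORD_MODELS` are all hypothetical, and discrete sound-fragment symbols (0, 1, 2) stand in for the Gaussian mixture observation distributions that a real recognizer would use. Each word gets its own state machine, the forward algorithm sums the probability of an input over every path through that machine, and the word with the highest total is taken as the match.

```python
# Illustrative only: toy two-state, left-to-right word models with made-up
# probabilities. Discrete symbols replace Gaussian mixture observations.

def forward_likelihood(start, trans, emit, observations):
    """Return P(observations | model), summed over all state paths."""
    n_states = len(start)
    # alpha[s]: probability of the observations so far, ending in state s
    alpha = [start[s] * emit[s][observations[0]] for s in range(n_states)]
    for symbol in observations[1:]:
        alpha = [
            emit[s][symbol]
            * sum(alpha[p] * trans[p][s] for p in range(n_states))
            for s in range(n_states)
        ]
    return sum(alpha)


def recognize(models, observations):
    """Return the best-scoring word and the score of every word model."""
    scores = {
        word: forward_likelihood(start, trans, emit, observations)
        for word, (start, trans, emit) in models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical keyword bank: (start probabilities, transitions, emissions).
# Each state may repeat or advance, so a word can be spoken fast or slowly.
WORD_MODELS = {
    "yes": (
        [1.0, 0.0],
        [[0.6, 0.4], [0.0, 1.0]],
        [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]],
    ),
    "no": (
        [1.0, 0.0],
        [[0.6, 0.4], [0.0, 1.0]],
        [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]],
    ),
}

if __name__ == "__main__":
    # Two slightly different pronunciations of the same hypothetical word.
    for utterance in ([0, 0, 1], [0, 1, 1]):
        word, scores = recognize(WORD_MODELS, utterance)
        print(utterance, "->", word, scores)
```

In this toy setup, the sequences `[0, 0, 1]` and `[0, 1, 1]` follow different paths through the state machine yet both score highest against the same word model, which is the variability the text attributes to hidden Markov models.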