Introduction

Voice recognition provides unique capabilities for disabled and elderly users with motor impairments. Advances in CPU speed, real-time signal processing, and analog-to-digital converters have enabled a wide variety of speech recognition applications, ranging from aids for people with motor speech disorders, to PCs and PDAs, to everyday consumer electronics including automobiles [1]. This paper discusses commercial applications of voice recognition, the techniques behind the recognition process, and how to implement speech recognition for patients with dysarthria.

Commercial Application

Applications for speech recognition include intelligent vehicle systems, PC and PDA applications, and even personal safety applications. A well-known application of voice recognition is the intelligent vehicle system, in which a processor recognizes the driver's speech and translates it into commands, as implemented in IBM's Embedded ViaVoice software. Although ViaVoice does not offer every feature, it still covers most everyday commands [2]. “These intelligent system applications impact every facet of the driver experience and improve both vehicle safety and performance” [3].

Another useful application of voice recognition is in PCs and PDAs, where it makes personal life easier and more efficient. For example, a user can browse the web, write an email, make a note or task list, or draft a formal document by voice command alone while working on other tasks simultaneously. Voice recognition on a PC or PDA is fast compared with ordinary typing: “most people speak more than 120 words per minute but type less than 40 words a minute” [4], so dictation is roughly three times faster and free of spelling mistakes. This capability is offered commercially by Nuance's Dragon NaturallySpeaking 10, which costs $209.94 and includes all the basic features [4].

Powered wheelchairs operated by speech, for patients with physical disabilities as well as patients with dysarthria, are also commercially available. In one such system, the headset input is sampled at 16 kHz for reliable detection of human speech, and an additional feature prevents operation of the wheelchair by unauthorized persons near the user [5]. One such commercial voice-controlled wheelchair, the Katalavox, is sold by Kempf; it enables the user to control the chair's movements by voice command, with up to 84 driving positions in all directions at variable speeds from zero to 10 mph [6].

Underlying Technology

Voice recognition consists of two main processes: acquiring the speech signal, and processing it with computer algorithms to remove background noise and detect the speech accurately. The acquired signal can then be used to drive different actions, such as rejecting background and white noise, following the user's commands, or accurately moving an object such as a wheelchair as the user wishes. A number of DSP algorithms are used to process the speech signal, and preloaded libraries often predict upcoming words and complete the word or sentence based on the user's initial words. Speech recognition is generally implemented using voice activity detection (VAD) to find the start and end of an utterance, with the zero-crossing method and fourth-order cumulants used to determine the presence of speech [7].
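As an illustration of the energy and zero-crossing measures just mentioned, below is a minimal C++ sketch (C++ being the language the implementation section recommends for DSP code) of frame-based speech detection. It is not the method of [7]: the frame length, thresholds, and test signal are assumptions chosen for demonstration, and only the 16 kHz sampling rate is taken from the wheelchair headset described above [5].

    #include <cmath>
    #include <cstddef>
    #include <iostream>
    #include <vector>

    // One analysis frame: short-time energy, zero-crossing rate, speech flag.
    struct Frame { double energy; double zcr; bool speech; };

    // Frame-based speech/silence classification using short-time energy and
    // zero-crossing rate. Frame length and thresholds are illustrative only.
    std::vector<Frame> detectSpeech(const std::vector<double>& x,
                                    std::size_t frameLen = 320,  // 20 ms at 16 kHz
                                    double energyThresh = 1e-3,  // assumed threshold
                                    double zcrThresh = 0.25)     // assumed threshold
    {
        std::vector<Frame> frames;
        for (std::size_t start = 0; start + frameLen <= x.size(); start += frameLen) {
            double energy = 0.0;
            std::size_t crossings = 0;
            for (std::size_t i = start; i < start + frameLen; ++i) {
                energy += x[i] * x[i];
                if (i > start && (x[i] >= 0.0) != (x[i - 1] >= 0.0))
                    ++crossings;  // sign change between consecutive samples
            }
            energy /= static_cast<double>(frameLen);             // mean-square energy
            double zcr = static_cast<double>(crossings) / frameLen;
            // Voiced speech: relatively high energy and a low zero-crossing rate.
            bool isSpeech = (energy > energyThresh) && (zcr < zcrThresh);
            frames.push_back({energy, zcr, isSpeech});
        }
        return frames;
    }

    int main() {
        const double kPi = 3.14159265358979323846;
        const double fs = 16000.0;               // 16 kHz, as in the headset of [5]
        std::vector<double> x(16000, 0.0);       // one second of audio
        for (std::size_t n = 0; n < 8000; ++n)   // first 0.5 s: a 200 Hz "voiced" tone
            x[n] = 0.5 * std::sin(2.0 * kPi * 200.0 * static_cast<double>(n) / fs);
        for (const Frame& f : detectSpeech(x))
            std::cout << (f.speech ? "speech " : "silence")
                      << "  energy=" << f.energy << "  zcr=" << f.zcr << '\n';
    }

Voiced speech typically shows high short-time energy and a low zero-crossing rate, while unvoiced sounds and broadband noise show the opposite pattern; the fourth-order cumulant test in [7] adds robustness because Gaussian background noise has vanishing fourth-order cumulants.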
To achieve a quality speech signal, the bit rate and sampling frequency of the input do not need to be exceedingly high; the 16 kHz rate cited above already covers the frequencies that matter for speech detection. For speech detection in patients with dysarthria, the overall algorithm and processing become more complex because of differences in the energy and tonal frequency of the speech. Some of the problems associated with dysarthric speech, which result from neuromuscular deficiency, are “velopharyngeal noise, irregular articulation breakdown and mispronunciation of the fricative /v/ as the nasal /m/” [8].

Building Blocks for Implementation

Both strong software and strong hardware are necessary to implement a speech detection system. The first elements required are the DSP algorithms and the ADC/DAC hardware. A microphone at the input of the system captures the speech signal. The input signal is sent to a signal-processing CMOS chip, where the DSP algorithms operate on it, and the result is finally output to a speaker connected to the PCB via USB. Because Matlab is slower than C/C++, the DSP code for signal processing is mainly written in C++. Overall, the CMOS chip would include acoustic and dictionary modules, along with a recognition decoder and an ADC at the input end. Once the speech signal is detected, software processes it to recognize the speech accurately, and the result is output through a D/A converter. The load end of the system should have a low resistance, approximately 75-800 Ω, to reduce overall power consumption; this is helpful in systems that lack a powerful CPU, or where the use of speech recognition has been limited by software [9]. In other words, the input impedance of a stage should be 5-10 times the output impedance of the stage driving it.

On the hardware side, the main input element is the microphone. For the microphone to supply a good speech signal to the ADC on the chip, it should meet several important specifications. Microphones respond better to frequencies between 20 Hz and 20 kHz than to higher frequencies, and the variation in sensitivity across this range should not exceed ±3 dB. In addition, the voltage a microphone produces in response to an acoustic stimulus should be on the order of tens to hundreds of millivolts per pascal. For example, a sensitivity of 70 mV/Pa means the microphone produces an output of 70 mV when presented with an input of 1 pascal (94 dB SPL) [10].
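The figures quoted in this section can be verified with a short back-of-envelope calculation. The C++ sketch below converts 94 dB SPL to pascals and computes the output of a 70 mV/Pa microphone [10], the PCM bit rate at the 16 kHz sampling rate from [5], and the impedance-bridging rule stated above; the 16-bit mono sample format and the 600 Ω output impedance are assumptions for illustration, not figures from the text.

    #include <cmath>
    #include <iostream>

    int main() {
        // Microphone sensitivity: convert 94 dB SPL to pascals (0 dB SPL is
        // referenced to 20 micropascals), then to the mic's output voltage.
        const double pRef = 20e-6;                                  // Pa
        const double spl = 94.0;                                    // dB SPL
        const double pressure = pRef * std::pow(10.0, spl / 20.0);  // ~1.00 Pa
        const double sensitivity = 0.070;                           // 70 mV/Pa [10]
        std::cout << "94 dB SPL  = " << pressure << " Pa\n";
        std::cout << "mic output = " << sensitivity * pressure * 1e3 << " mV\n"; // ~70 mV

        // PCM bit rate at the 16 kHz sampling rate from [5]; the 16-bit mono
        // sample format is an assumption, not a figure from the text.
        const double bitRate = 16000.0 * 16.0;                      // bits per second
        std::cout << "bit rate   = " << bitRate / 1000.0 << " kbit/s\n"; // 256 kbit/s

        // Impedance bridging: input impedance should be 5-10 times the output
        // impedance of the driving stage. The 600-ohm figure is an assumed example.
        const double zOut = 600.0;                                  // ohms
        std::cout << "input impedance >= " << 5.0 * zOut << " ohms\n"; // 3000 ohms
    }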
References

[1] “Application Fields,” acapela-group.com, May 15, 2008. [Online]. Available: http://www.acapela-group.com/voice-synthesis-application-fields [Accessed: Sept. 1, 2008].
[2] “Embedded ViaVoice,” ibm.com, June 4, 2006. [Online]. Available: http://www01.ibm.com/software/pervasive/embedded_viavoice/ [Accessed: Sept. 3, 2008].
[3] G. Oleg, F. Dimitar, and R. Nestor, “Intelligent vehicle systems: applications and new trends,” Journal of Transportation Engineering, vol. 15, pp. 3-14, 2008.
[4] Nuance, “Dragon NaturallySpeaking 10 Standard,” ds_DNS10_Standard datasheet, Sept. 2008.
[5] S. Suk, S. Chung, and H. Kojima, “Voice/Non-Voice Classification Using Reliable Fundamental Frequency Estimator for Voice Activated Powered Wheelchair Control,” in Proc. IEEE Int. Conf. on Speech ’07, 2007, pp. 1-5.
[6] “KATALAVOX,” abledata.com. [Online]. Available: http://www.abledata.com/abledata.cfm?pageid=19327&top=14377&productid=103357&trail=22,14341&discontinued=0 [Accessed: Sept. 3, 2008].
[7] A. Little and L. Reznik, “Speech Detection Method Analysis and Intelligent Structure Development,” in Proc. Australian New Zealand Conf. on Intelligent Information Systems ’96, 1996, p. 2.
[8] F. Chen and A. Kostov, “Optimization of Dysarthric Speech Recognition,” in Proc. 19th Int. Conf. IEEE/EMBS ’97, 1997, pp. 1-3.
[9] Microelectronics & Electronic Device Technical Staff, “Distributed Speech Recognition (DSR) LSI Solution,” Fujitsu Microelectronics & Electronic Device, 2004.
[10] “Mic Specs Demystified,” emusician.com, para. 7, Jan. 26, 2007. [Online]. Available: http://emusician.com/tutorials/emusic_mic_specs_demystified/ [Accessed: Sept. 3, 2008].