Vrajesh's Technical Review: Voice Recognition

Technical review
Patel 1
Introduction
Voice recognition provides unique capabilities to disabled and elderly users with motor
impairments. Advances in CPU speed, real-time signal processing, and analog-to-digital
converters have enabled a wide variety of speech recognition applications, ranging from aids for
people with motor speech production disorders, to PCs and PDAs, to everyday consumer
electronics including automobiles [1]. This paper discusses commercial applications of voice
recognition, the techniques behind the voice recognition process, and how to implement speech
recognition for dysarthric patients.
Commercial Application
Applications for speech recognition include intelligent vehicle systems, PC and PDA
applications, and even personal safety applications. A well-known application of voice
recognition is the intelligent vehicle system, in which a processor recognizes the input signal
from the driver and translates it into the driver's command, as performed by IBM's Embedded
ViaVoice software. Although ViaVoice does not offer every feature, it still includes most
everyday commands [2]. "These intelligent system applications impact every facet of the
driver experience and improve both vehicle safety and performance [3]." Another useful
application of voice recognition is in PCs and PDAs, where it makes personal life easier and
more efficient. For example, a user can browse the web, write an email, make a note or task list,
or write a formal document using voice commands alone, all while working on other tasks
simultaneously. PC and PDA voice recognition is fast compared to regular typing: since
"most people speak more than 120 words per minute but type less than 40 words a minute [4],"
it is roughly three times faster, and without spelling mistakes. This capability is offered by a
commercial product, Dragon NaturallySpeaking 10 by Nuance, which costs only $209.94 and
includes all the basic features [4]. Powered wheelchairs with speech operation for patients with
physical disabilities, including dysarthric patients, are also commercially available. In this type
of commercial wheelchair, the headset is tuned to a 16 kHz sampling rate for reliable human
speech detection, with an additional feature to prevent operation of the powered wheelchair by
unauthorized persons near the wheelchair user [5]. One such commercial voice-controlled
wheelchair, the Katalavox, is sold by Kempf. The Katalavox enables the user to control the
movements of the chair by voice commands, with up to 84 driving positions in all directions at
variable speeds from zero to 10 mph [6].
Underlying Technology
Voice recognition consists of two main processes: acquiring speech signals, and
processing those signals with computer algorithms to remove background noise and detect the
speech accurately. The acquired signals can then be used to drive different actions, such as
rejecting background and white noise, following the command of the user, or accurately
moving an object such as a wheelchair at the user's request.
In voice recognition, a number of DSP algorithms are used to process the speech signal.
Preloaded libraries often intelligently predict upcoming words and complete the word or
sentence based on the user's initial words. Speech recognition is generally implemented using
Voice Activity Detection (VAD) for start- and end-point detection, along with the zero-crossing
method and fourth-order cumulants to determine the presence of speech [7]. To obtain a quality
speech signal, the bit rate and the sampling frequency of the input signal should not be
exceedingly high.
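The start/end detection described above can be sketched in a minimal form. The snippet below combines short-time energy with the zero-crossing rate on a 16 kHz signal (it omits the fourth-order cumulants from [7]; the frame sizes and thresholds are illustrative assumptions, not values from the cited work):

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def simple_vad(x, frame_len=400, hop=160, energy_thresh=0.01, zcr_thresh=0.25):
    """Flag a frame as speech when its short-time energy is high and its
    zero-crossing rate is low (voiced speech crosses zero far less often
    than broadband noise)."""
    frames = frame_signal(x, frame_len, hop)
    energy = np.mean(frames ** 2, axis=1)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return (energy > energy_thresh) & (zcr < zcr_thresh)

# Example: 1 s of low-level noise followed by 1 s of a 200 Hz tone at 16 kHz.
fs = 16000
rng = np.random.default_rng(0)
t = np.arange(fs) / fs
noise = 0.005 * rng.standard_normal(fs)   # quiet, high zero-crossing rate
tone = 0.5 * np.sin(2 * np.pi * 200 * t)  # loud, low zero-crossing rate
x = np.concatenate([noise, tone])
flags = simple_vad(x)   # False over the noise, True over the tone
```

A real detector would smooth these per-frame decisions and adapt the thresholds to the noise floor; this sketch only shows why the two features complement each other.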
In the case of speech detection for dysarthric patients, the overall algorithm and process
become more complex because of differences in energy and tone frequency. Some problems
associated with dysarthric speech, caused by neuromuscular deficiency, are "velopharyngeal
noise, irregular articulation breakdown and mispronunciation of the fricative /v/ as the nasal /m/
[8]."
Building Blocks for Implementation
Both strong software and hardware are necessary to implement a speech detection
system. The first elements required for speech detection are a DSP algorithm and an ADC/DAC.
A microphone is connected at the input of the system, where the speech signal is captured. The
input signal is sent to a signal-processing CMOS chip, where the DSP algorithm operates on the
speech signal; finally, the signal is output to a speaker connected to the PCB via USB. Since
MATLAB is slow compared to C/C++, DSP code for signal processing is mainly written in
C++. The overall CMOS chip would include modules for acoustics and dictionaries, along with
a recognition decoder and an ADC at the input end. Once the speech signal is detected, the
software processes the signal to detect the speech accurately, and the result is output through a
D/A converter. The load end of the system should have a low resistance, approximately
75-800 Ω, in order to reduce overall system power consumption, which is helpful in systems
without a powerful CPU, or where the use of speech recognition has been limited by software
[9]. In other words, the input impedance should be 5-10 times the output impedance of the
system.
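The 5-10x impedance guideline can be illustrated with a simple voltage-divider calculation. The function and example values below are hypothetical, not taken from [9]; they only show how much signal voltage survives the interface between two stages:

```python
def loading_loss(z_out, z_in):
    """Fraction of the source voltage that reaches the load when a stage
    with output impedance z_out (ohms) drives an input impedance z_in
    (ohms) -- a plain voltage divider: V_load / V_source."""
    return z_in / (z_out + z_in)

# With the input impedance 10x the output impedance, ~91% of the signal
# voltage is preserved; with equal impedances, half of it is lost.
print(loading_loss(100, 1000))  # ~0.909
print(loading_loss(100, 100))   # 0.5
```

This is why the rule of thumb asks for the downstream input impedance to be several times the upstream output impedance: the divider then barely attenuates the signal.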
On the hardware side, the main input element is the microphone. For the microphone to
supply a good speech signal to the ADC on the chip, it should meet some important
specifications. Microphones respond better to frequencies from 20 Hz to 20 kHz than to higher
frequencies, and the frequency response should be flat within +/- 3 dB. In addition, the voltage
produced in response to an acoustic stimulus should be on the order of tens to hundreds of
mV/Pa. For example, a sensitivity of 70 mV/Pa means the microphone produces an output of
70 mV when presented with an input of 1 pascal (94 dB SPL) [10].
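The sensitivity figures above can be checked numerically. The sketch below converts mV/Pa to the dBV re 1 V/Pa rating often printed on datasheets and reproduces the 70 mV at 94 dB SPL example from the text (the helper names are my own, but the formulas are the standard ones):

```python
import math

def sensitivity_dbv(mv_per_pa):
    """Convert microphone sensitivity from mV/Pa to dBV re 1 V/Pa."""
    return 20 * math.log10(mv_per_pa / 1000.0)

def output_mv(mv_per_pa, spl_db):
    """Microphone output voltage (mV) at a given sound pressure level,
    using the 94 dB SPL = 1 Pa reference."""
    pressure_pa = 10 ** ((spl_db - 94) / 20)
    return mv_per_pa * pressure_pa

# The 70 mV/Pa example from the text:
print(round(sensitivity_dbv(70), 1))  # -23.1 (dBV re 1 V/Pa)
print(output_mv(70, 94))              # 70.0 (mV at 1 Pa)
```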
References
[1] "Application Fields," acapela-group.com, May 15, 2008. [Online]. Available:
http://www.acapela-group.com/voice-synthesis-application-fields [Accessed Sept. 1, 2008]
[2] "Embedded ViaVoice," ibm.com, June 04, 2006. [Online]. Available:
http://www01.ibm.com/software/pervasive/embedded_viavoice/ [Accessed Sept. 3, 2008]
[3] G. Oleg, F. Dimitar, and R. Nestor, "Intelligent vehicle systems: applications and new
trends," Journal of Transportation Engineering, vol. 15, pp. 3-14, 2008.
[4] Nuance, "Dragon NaturallySpeaking 10 Standard," ds_DNS10_Standard datasheet, Sept.
2008.
[5] S. Suk, S. Chung, and H. Kojimma, "Voice/Non-Voice Classification Using Reliable
Fundamental Frequency Estimator for Voice Activated Powered Wheelchair Control," in
Proc. IEEE International Conference on Speech '07, 2007, pp. 1-5.
[6] "KATALAVOX," abledata.com. [Online]. Available:
http://www.abledata.com/abledata.cfm?pageid=19327&top=14377&productid=103357&
trail=22,14341&discontinued=0 [Accessed Sept. 3, 2008]
[7] A. Little and L. Reznik, "Speech Detection Method Analysis and Intelligent Structure
Development," in Proc. Australian New Zealand Conference on Intelligent Information
Systems '96, 1996, p. 2.
[8] F. Chen and A. Kostov, "Optimization of Dysarthric Speech Recognition," in Proc. 19th
International Conference - IEEE/EMBS '97, 1997, pp. 1-3.
[9] Microelectronics & Electronic Device Technical Staff, Distributed Speech Recognition
(DSR) LSI Solution, Microelectronics & Electronic Device: Fujitsu, 2004.
[10] "Mic Specs Demystified," emusician.com, para. 7, Jan. 26, 2007. [Online]. Available:
http://emusician.com/tutorials/emusic_mic_specs_demystified/ [Accessed Sept. 3, 2008]