How It Works: Speech Recognition
Speech recognition software is better and less expensive than ever.
Find out how your words go from voice to text on the screen.
Stan Miastkowski - Apr 14, 2000 4:30 pm
Speech recognition: a technology that transforms spoken words into
alphanumeric text and navigational commands that can be recognized by a
PC.
For years, speech recognition has been the poster child for technology that
never lived up to its promise. Only three years ago, the products were
expensive, inaccurate, and hard to use. That's changing. Fast PCs and
ingenious software improvements mean that speech recognition technology
finally offers real benefits. And it's appearing in places you might not have
expected, including your mobile phone. Want to compose e-mail or surf the
Web? All you'll have to do is talk.
Here's what you need to know:
* You can dictate text into applications and control your desktop with up to
95 percent accuracy.
* Speech recognition software requires a fast CPU, plenty of RAM, a good
microphone, and a good sound card.
* New developments let you take speech recognition to the Internet and
even beyond your PC.
A computer doesn't speak your language, so it must transform your words into
something it can understand. A microphone converts your voice into an
analog signal and feeds it to your PC's sound card. An analog-to-digital
converter takes the signal and converts it to a stream of digital data (ones and
zeros). Then the software goes to work.
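
The conversion step can be sketched in a few lines of Python. This is only an illustration of sampling and quantizing, not how any particular sound card works; the function name, the 8,000-samples-per-second rate, and the 440-Hz test tone are assumptions chosen for the example.

```python
import math

def sample_and_quantize(signal, duration_s, sample_rate_hz=8000, bits=16):
    """Sample a continuous signal and round each sample to a signed integer.

    `signal` is any function mapping time in seconds to an amplitude
    between -1.0 and 1.0 (a stand-in for the microphone's analog output).
    """
    max_level = 2 ** (bits - 1) - 1              # 32767 for 16-bit audio
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                   # time of the n-th sample
        amplitude = max(-1.0, min(1.0, signal(t)))   # clip out-of-range input
        samples.append(round(amplitude * max_level))
    return samples

def tone(t):
    """Hypothetical input: a pure 440-Hz tone."""
    return math.sin(2 * math.pi * 440 * t)

digital = sample_and_quantize(tone, duration_s=0.01)
print(len(digital), digital[:5])
```

Each number in `digital` is one snapshot of the voltage coming from the microphone; the recognition software only ever sees this list of integers.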
While each of the leading speech recognition companies has its own
proprietary methods, the two primary components of speech recognition are
common across products. The first piece, called the acoustic model, analyzes
the sounds of your voice and converts them to phonemes, the basic elements
of speech. The English language contains approximately 50 phonemes.
Here's how it breaks down your voice: First, the acoustic model removes
noise and unneeded information such as changes in volume. Then, using
mathematical calculations, it reduces the data to a spectrum of frequencies
(the pitches of the sounds), analyzes the data, and converts the words into
digital representations of phonemes.
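
The "spectrum of frequencies" step can be illustrated with a discrete Fourier transform, a standard way to find which pitches are present in a short slice of audio. The vendors' methods are proprietary, so the sketch below is an assumption about the general idea rather than a description of any product; the function names and the 1,000-Hz test tone are made up for the example.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Return the strength of each frequency bin using a naive DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):                  # bins from 0 Hz up to half the sample rate
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        spectrum.append(abs(total))
    return spectrum

def dominant_frequency(frame, sample_rate_hz):
    """Return the frequency (in Hz) of the strongest non-zero-frequency bin."""
    spectrum = magnitude_spectrum(frame)
    peak_bin = max(range(1, len(spectrum)), key=lambda k: spectrum[k])
    return peak_bin * sample_rate_hz / len(frame)

SAMPLE_RATE_HZ = 8000
# Hypothetical input: 64 samples of a pure 1,000-Hz tone.
frame = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE_HZ) for t in range(64)]
print(dominant_frequency(frame, SAMPLE_RATE_HZ))   # prints 1000.0
```

Real speech contains many frequencies at once, and it is the pattern of strong and weak bins over time that gets matched to phonemes.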
For example, look at this sentence, which has been broken down into
phonemes:
[Figure not reproduced: an example sentence broken down into phonemes.]
Now the second major component of speech recognition software, the
language model, kicks in. The language model analyzes the content of your
speech. It compares the combinations of phonemes to the words in its digital
dictionary, a huge database of the most common words in the English
language. Most of today's packages come with dictionaries containing about
150,000 words. The language model quickly decides which words you said
and displays them on the screen (in theory).
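
A toy version of the dictionary lookup might look like the following. It is a sketch under heavy assumptions: the four-word dictionary, the ARPAbet-style phoneme symbols, and the longest-match strategy are all invented for illustration, and a commercial package with 150,000 words would use far more sophisticated search.

```python
# Hypothetical pronunciation dictionary: phoneme sequence -> word.
PRONUNCIATIONS = {
    ("HH", "AH", "L", "OW"): "hello",
    ("W", "ER", "L", "D"): "world",
    ("L", "EH", "T", "S"): "let's",
    ("G", "OW"): "go",
}
MAX_WORD_LEN = max(len(key) for key in PRONUNCIATIONS)

def decode(phonemes):
    """Turn a list of phonemes into words, trying the longest match first."""
    words = []
    i = 0
    while i < len(phonemes):
        longest = min(MAX_WORD_LEN, len(phonemes) - i)
        for length in range(longest, 0, -1):
            candidate = tuple(phonemes[i:i + length])
            if candidate in PRONUNCIATIONS:
                words.append(PRONUNCIATIONS[candidate])
                i += length
                break
        else:
            # No dictionary entry starts here; flag it and move on.
            words.append("<unknown>")
            i += 1
    return words

print(decode(["HH", "AH", "L", "OW", "W", "ER", "L", "D"]))
```

The "in theory" caveat shows up even here: a phoneme stream with no exact dictionary match produces `<unknown>` entries, which is why real systems score many near matches instead of demanding a perfect one.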
Unfortunately, the English language complicates things. For example, "there,"
"their," and "they're" all sound the same. A key to the power of today's speech
recognition is its use of trigrams, which analyze the context in which a word is
used. In many cases, the software can recognize a word by looking at the two
words that come before it. If you say, "let's go there," for example, the "let's
go" helps the software decide to use "there" instead of "their."
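
Here is a minimal sketch of how a trigram lookup could settle the "there/their/they're" question. The counts are invented for the example (a real model would be built from a very large body of text), and `pick_word` is a hypothetical helper, not part of any vendor's software.

```python
# Hypothetical counts of how often each three-word sequence was seen in training text.
TRIGRAM_COUNTS = {
    ("let's", "go", "there"): 120,
    ("let's", "go", "their"): 1,
    ("is", "this", "their"): 45,
    ("is", "this", "there"): 3,
}

SOUND_ALIKES = ["there", "their", "they're"]

def pick_word(previous_two, candidates):
    """Choose the candidate seen most often after the two preceding words."""
    return max(candidates,
               key=lambda word: TRIGRAM_COUNTS.get((*previous_two, word), 0))

print(pick_word(("let's", "go"), SOUND_ALIKES))   # prints there
print(pick_word(("is", "this"), SOUND_ALIKES))    # prints their
```

The acoustic model cannot tell the three words apart, so the decision rests entirely on which sequence is more common.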
Speech recognition packages also tune themselves to the individual user. The
software customizes itself based on your voice, your unique speech patterns,
and your accent. To improve dictation accuracy, it creates a supplementary
dictionary of the words you use.
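
One simple way to picture the supplementary dictionary is as a running tally of words the user dictates that the main dictionary lacks. The sketch below assumes a tiny stand-in base dictionary and a made-up rule (add a word once it has been used twice); actual packages do not document their adaptation rules in this article.

```python
from collections import Counter

# Stand-in for the package's built-in dictionary.
BASE_DICTIONARY = {"the", "meeting", "is", "at", "noon", "send", "to"}

class UserVocabulary:
    """Track a user's words and collect frequent ones missing from the base dictionary."""

    def __init__(self, min_uses=2):
        self.min_uses = min_uses      # hypothetical threshold for adding a word
        self.usage = Counter()

    def observe(self, corrected_text):
        """Record every word from a passage the user has dictated and corrected."""
        self.usage.update(corrected_text.lower().split())

    def supplementary_words(self):
        """Return words used often enough that are not in the base dictionary."""
        return {word for word, count in self.usage.items()
                if count >= self.min_uses and word not in BASE_DICTIONARY}

vocab = UserVocabulary()
vocab.observe("send the memo to Miastkowski")
vocab.observe("Miastkowski is at the meeting")
print(vocab.supplementary_words())   # prints {'miastkowski'}
```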
Speak and You Shall Be Heard
Dragon Systems, IBM, Lernout & Hauspie (L&H), and Philips are the major
speech recognition companies in the PC arena. On March 28, however, L&H
announced an agreement to purchase Dragon Systems. According to IDC
analysts, Dragon Systems holds about 60 percent of the market, with IBM and
L&H vying for second place. L&H says it will continue to offer both product
lines for the immediate future, which means L&H products will account for a
large majority of speech recognition software sales.
Dragon Systems, L&H, IBM, and Philips each offer basic packages that cost
about $50. More sophisticated versions from Dragon, L&H, and IBM have
larger dictionaries and more extensive application support, and cost between
$200 and $250.
Speech recognition's complexity pushes the limits of PC processing power.
Although most packages will work with a 200-MHz Pentium, a 300-MHz or
faster chip dramatically improves performance. New chips such as the
Pentium III and the Athlon satisfy the applications' demand for power even
better, and many high-end packages can take advantage of the PIII's
multimedia extensions. And the more RAM, the better: Consider 64MB a
practical minimum, with 128MB providing substantial improvements.
Most speech packages come with a basic headset microphone, but a better
one from a third party can improve recognition. Andrea, Plantronics, and VXI
sell a variety of headset microphones ranging in price from $30 to $150.
The quality of your PC's sound card is also crucial. Cheap models won't cut it
because they produce distorted, low-quality output. While standard 16-bit
sound cards work, a high-quality card that costs $100 to $150 will offer better
performance. Or you could try Dragon Systems' $80 USB headset, which
bypasses the sound card entirely (thanks to its built-in digital signal processor)
and works great with notebooks.
Beyond Word Processing
Most of today's speech recognition packages also allow voice control of many
Windows applications (find out from the vendor which programs the
recognition software works with). The packages usually do this by converting
spoken words into the appropriate text or commands and sending them to the
application.
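
In outline, that conversion is a lookup: if the recognized phrase matches a known command, send the command; otherwise treat the words as dictated text. The phrases and keystroke names below are hypothetical, and the sketch ignores how a real package actually delivers keystrokes to Windows.

```python
# Hypothetical mapping from spoken phrases to the keystrokes an application expects.
COMMANDS = {
    "file save": "Ctrl+S",
    "select all": "Ctrl+A",
    "new paragraph": "Enter",
}

def interpret(utterance):
    """Classify recognized speech as an application command or as plain dictation."""
    phrase = utterance.strip().lower()
    if phrase in COMMANDS:
        return ("command", COMMANDS[phrase])
    return ("text", utterance)

print(interpret("File Save"))   # prints ('command', 'Ctrl+S')
print(interpret("Dear Stan"))   # prints ('text', 'Dear Stan')
```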
Applications such as Word or Excel look for standard commands, and
whether those commands come from a keyboard or your mouth doesn't
matter. In addition, most speech recognition packages work with your
browser, allowing you to "voice surf" the Web.
Voice surfing is just the start of what you'll be able to do. Dragon and L&H
now offer portable digital voice recorders that download recordings to your PC
when you get back to the office; your PC's speech recognition software
transcribes your notes directly from the recorder.
Analysts say portable devices--such as Web-enabled mobile phones, which
don't have standard keyboards--are next on the horizon. Rather than having
full-fledged speech recognition, these devices will be tuned to a limited range
of specific applications, such as getting stock updates.
For desktop PCs, the next major leap is three to five years away, when
technologies such as natural language processing and artificial intelligence
come to the consumer. Natural language processing analyzes the context of a
word by looking at a whole sentence instead of a few words, resulting in
greater accuracy.
Even more sophisticated (and perhaps frightening), artificial intelligence will
allow computers to understand what you mean instead of just what you say.
Speech packages will hold a discussion with you and will analyze the
emotional aspects of your voice.
Source:
http://www.pcworld.com/article/162762/how_it_works_speech_recognition.html
Further Reading:
http://electronics.howstuffworks.com/speech-recognition.htm