Battle of Words: Vista® speech recognition versus Dragon NaturallySpeaking®
A comparison between the speech recognition function available in the Microsoft
Vista operating system and Dragon NaturallySpeaking.
There have been many spectacular advances in speech recognition technology over
the last twenty years, with DragonDictate® and Dragon NaturallySpeaking® being
the most successful products in that time. However, perhaps the most significant
development in recent years has been the maturing of the Microsoft® speech
recognition facility within the new Microsoft Vista operating system.
Microsoft’s previous offerings have often been dismissed as poor competitors to the
Dragon products. However, Vista’s speech recognition facility has to be taken more
seriously.
It is particularly important for people with special needs that aids for using their
computers are reliable and easy to operate. In the case of speech recognition products,
many users have abandoned the struggle, deciding to wait for the holy grail of a
reliable and effective hands-free facility.
Vista’s speech recognition system comes close to that goal. Microsoft has clearly
appreciated many of the negative aspects of previously available speech recognition
products and has set out to eliminate them. Early experience with Vista shows that
their efforts have been rewarded in achieving an effective hands-free environment.
Many of those who have attempted to use speech recognition systems in the past have
struggled to familiarise themselves with the technology. Even the tasks of getting the
microphone to work correctly and understanding the mechanics of the system have
proved too great an obstacle. Those who progress beyond this stage have often found
difficulties with the tedious training (enrolment) process by which the recognition
system learns about the characteristics of the user’s voice. This has often meant that
the attempt has been abandoned long before there has been any useful result.
Vista helps to overcome some of these problems. The speech recognition facility and
the audio functions supporting it are part of the operating system. This means that the
user has direct access to the software because it is already loaded and ready to use.
The microphone set-up and the tutorial system are presented in a very straightforward
manner compared with Dragon NaturallySpeaking (DNS), which many people find less than intuitive.
The real tests of any speech recognition system are twofold: its general performance
regarding speed and accuracy and the ease of use in performing the tasks required.
A long-standing problem has been that users frequently try to operate speech systems
on machines with too little processing power or insufficient memory. Because Vista
demands a well-resourced machine to run on, it can take the requirements of a speech
system in its stride. The problem of slow-running or locked machines when using
speech is therefore significantly less common with the Vista operating system. This
means that speech recognition on a Vista machine, whether Vista’s built-in facility or
DNS, will tend to give a better experience than speech on non-Vista systems. Recognition
speed will therefore be adequate for both Vista and DNS under normal conditions.
The accuracy of DNS has been a major selling point over the last five years, reaching
over 98% immediately after the enrolment phase for many users. Version 9 of DNS
and Vista make the enrolment phase optional; DNS defaults to full enrolment and
Vista defaults to none. The recognition system adjusts to the user’s voice as
corrections are made and accuracy steadily increases thereafter, regardless of whether
enrolment has been carried out.
To obtain a guide as to how accurately DNS (version 9.5) and Vista might perform,
two people were asked to use each system without enrolment and the results were
compared. One user was a female who had experienced great difficulty in using
speech in the past. The other was a male with twenty years’ experience of speech
systems who habitually attained over 99% accuracy soon after the training phase.
A document considered to be of moderate reading difficulty was used for the test.
This initial appraisal is not put forward other than as a guide to what may be expected
but it shows interesting results.
                     DNS       Vista
Female “novice”      92.2%     91.1%
Male “expert”        90.8%     88.6%
The level of accuracy at around 90% for both systems is sufficient to encourage users
to continue using the technology. The rather better results for the female user are
important because they suggest that there is no particular bias towards specific user
characteristics. A programme of collection of data for untrained users and children is
now under way and will include people with enunciation and pronunciation
difficulties.
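
The article does not state how the accuracy percentages were calculated. As a rough illustration only, one common convention is to count the word-level substitutions, insertions and deletions needed to turn the recognised text back into the original, and to express accuracy as a percentage of the words in the original. The function name `word_accuracy` and the sample sentences below are assumptions made for this sketch and are not taken from the test described above.

```python
def word_accuracy(reference: str, recognised: str) -> float:
    """Word-level accuracy (%) = 100 * (1 - edit_distance / words in reference).

    Assumption for this sketch: words are split on whitespace and compared
    case-insensitively; punctuation is not stripped.
    """
    ref = reference.lower().split()
    hyp = recognised.lower().split()
    if not ref:
        raise ValueError("reference text must contain at least one word")

    # Dynamic-programming edit distance computed over whole words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from the recognised text
                curr[j - 1] + 1,     # extra word in the recognised text
                prev[j - 1] + cost,  # word matched or substituted
            ))
        prev = curr

    errors = prev[-1]
    # Clamp at zero in case the recognised text contains many extra words.
    return max(0.0, 100.0 * (1 - errors / len(ref)))


if __name__ == "__main__":
    original = "the vocabulary can be extended by the user"
    dictated = "the the Camry can be extended by the user"
    print(word_accuracy(original, dictated))  # 75.0 (2 errors in 8 words)
```

Under this convention a single misrecognition that yields two words, such as “vocabulary” becoming “the Camry”, counts as two errors (one substitution and one insertion), so figures obtained with other scoring methods may differ slightly.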
Correction of errors is similar with DNS and Vista. However, Vista cleverly combines
its spell dialog box with the facility to speak in the required correction. The correct
word can therefore be spelt in or spoken. This is a very considerable improvement on
the DNS correction mechanism, which confusingly incorporates both a correction
menu and a spell box. Many DNS users will find Vista’s correction mechanism a
delight.
Another innovative feature of major benefit to Vista users is the ability to produce a
reference overlay on the screen display by giving the appropriate voice command.
Specific buttons and menu items can then be accessed simply by speaking the
number of the required feature, as indicated on the overlay. Although DNS
does have a similar capability, it is not intuitive, is less effective and its availability is
not obvious.
Both Vista and DNS have excellent tutorials, allowing users to work through them
and gain real experience with each product without worrying about causing problems
on their computers.
A major advantage for groups of speech users using Vista rather than DNS is that the
manufacturer’s licence is valid for each user who individually logs onto the computer.
The DNS licence legally covers only one user and separate licences are required if
more than one user accesses DNS on the same machine.
There was an interesting observation during the preliminary tests. Recognition errors
which occurred with DNS and Vista for the same user for the same word sometimes
produced exactly the same incorrect word. For example, for the male user
“vocabulary” was incorrectly recognised as “the Camry” on both Vista and DNS. For
the female user, “read-back” was wrongly interpreted as “Reback” on both systems.
Perhaps there is less competitiveness between Nuance and Microsoft than might be
imagined. The sharing of vocabularies and speech engine technology between the two
companies may well explain this phenomenon. Any such cooperation can only be
to the benefit of the end user and is to be warmly welcomed.
In conclusion, Vista’s speech recognition system provides an excellent opportunity
for users to experiment with speech. Their only additional outlay need be the cost of a
good noise-cancelling headset microphone, available at low cost from any PC store.
Many people will find the Vista speech system highly effective. However, for those
who have more demanding requirements, such as the ability to create voice macros,
reduce or expand the vocabulary of words and set up application-specific
environments, DNS is still the only option on the Vista operating system.
© September 2007: Dr Peter S Kelway, Words Worldwide Limited: www.wordsworldwide.co.uk .