Battle of Words: Vista® speech recognition versus Dragon NaturallySpeaking®

A comparison between the speech recognition function available in the Microsoft Windows Vista operating system and Dragon NaturallySpeaking.

There have been many spectacular advances in speech recognition technology over the last twenty years, with DragonDictate® and Dragon NaturallySpeaking® being the most successful products in that time. However, perhaps the most significant development in recent years has been the maturing of the Microsoft® speech recognition facility within the new Windows Vista operating system. Microsoft’s previous offerings have often been dismissed as poor competitors to the Dragon products. Vista’s speech recognition facility, however, has to be taken more seriously.

It is particularly important for people with special needs that aids for using their computers are reliable and easy to operate. In the case of speech recognition products, many users have abandoned the struggle, deciding to wait for the holy grail of a reliable and effective hands-free facility. Vista’s speech recognition system comes close to that goal. Microsoft has clearly recognised many of the negative aspects of previously available speech recognition products and has set out to eliminate them. Early experience with Vista shows that its efforts have been rewarded with an effective hands-free environment.

Many of those who have attempted to use speech recognition systems in the past have struggled to familiarise themselves with the technology. Even the tasks of getting the microphone to work correctly and understanding the mechanics of the system have proved too great an obstacle. Those who progressed beyond this stage have often found difficulties with the tedious training (enrolment) process by which the recognition system learns the characteristics of the user’s voice. As a result, the attempt has often been abandoned long before there has been any useful result.
Vista helps to overcome some of these problems. The speech recognition facility and the audio functions supporting it are part of the operating system. This means that the user has direct access to the software because it is already loaded and ready to use. The microphone set-up and the tutorial system are presented in a very straightforward manner compared with Dragon NaturallySpeaking (DNS), which many people find less than intuitive.

The real tests of any speech recognition system are twofold: its general performance in terms of speed and accuracy, and its ease of use in performing the tasks required. A long-standing problem has been that users frequently try to operate speech systems on machines with too little processing power or insufficient memory. Because Vista demands a well-resourced machine to run on, it can take the requirements of a speech system in its stride. The problem of slow-running or locked machines when using speech is therefore significantly less common with the Vista operating system. This means that experience with both Vista and DNS speech recognition on a Vista machine will tend to be better than when using speech on non-Vista systems. Recognition speed will therefore be adequate for both Vista and DNS under normal conditions.

The accuracy of DNS has been a major selling point over the last five years, reaching over 98% immediately after the enrolment phase for many users. Both version 9 of DNS and Vista make the enrolment phase optional; DNS defaults to full enrolment and Vista defaults to none. In either case the recognition system adjusts to the user’s voice as corrections are made, and accuracy steadily increases thereafter, regardless of whether enrolment has been carried out.

To obtain a guide as to how accurately DNS (version 9.5) and Vista might perform, two people were asked to use each system without enrolment and the results were compared. One user was a female who had experienced great difficulty in using speech recognition in the past.
The other was a male with twenty years of experience of speech systems who habitually attained over 99% accuracy soon after the training phase. A document considered to be of moderate reading difficulty was used for the test. This initial appraisal is put forward only as a guide to what may be expected, but it shows interesting results.

             Female “novice”     Male “expert”
    DNS      92.2%               90.8%
    Vista    91.1%               88.6%

The level of accuracy, at around 90% for both systems, is sufficient to encourage users to continue using the technology. The rather better results for the female user are important because they suggest that there is no particular bias towards specific user characteristics. A programme of data collection for untrained users and children is now under way and will include people with enunciation and pronunciation difficulties.

Correction of errors is similar with DNS and Vista. However, Vista cleverly combines its spell dialog box with the facility to speak the required correction. The correct word can therefore be either spelt out or spoken. This is a very considerable improvement on the DNS correction mechanism, which confusingly incorporates both a correction menu and a spell box. Many DNS users will find Vista’s correction mechanism a delight.

Another innovative feature of major benefit to Vista users is the ability to produce a reference overlay on the screen display by giving the appropriate voice command. Specific buttons and menu items can then be accessed simply by speaking the number of the required feature, as indicated on the overlay. Although DNS does have a similar capability, it is not intuitive, is less effective and its availability is not obvious.

Both Vista and DNS have excellent tutorials, allowing users to work through them and gain real experience with each product without worrying about causing problems on their computers.
A major advantage for groups of speech users using Vista rather than DNS is that the manufacturer’s licence is valid for each user who individually logs onto the computer. The DNS licence legally covers only one user, and separate licences are required if more than one user accesses DNS on the same machine.

There was an interesting observation during the preliminary tests. When DNS and Vista both misrecognised the same word for the same user, they sometimes produced exactly the same incorrect word. For example, for the male user “vocabulary” was incorrectly recognised as “the Camry” on both Vista and DNS. For the female user, “read-back” was wrongly interpreted as “Reback” on both systems. Perhaps there is less competitiveness between Nuance and Microsoft than might be imagined. The sharing of vocabularies and speech engine technology between the two companies may well be an explanation for this phenomenon. Any such cooperation can only be to the benefit of the end user and is to be warmly welcomed.

In conclusion, Vista’s speech recognition system provides an excellent opportunity for users to experiment with speech. Their only additional outlay need be the cost of a good noise-cancelling headset microphone, available at low cost from any PC store. Many people will find the Vista speech system highly effective. However, for those who have more demanding requirements, such as the ability to create voice macros, reduce or expand the vocabulary and set up application-specific environments, DNS is still the only option on the Vista operating system.

© September 2007: Dr Peter S Kelway, Words Worldwide Limited: www.wordsworldwide.co.uk