Usability of Continuous Speech Recognition Programs

Hsin Eu
Committee: Alan Hedge, Ph.D.; Geri Gay, Ph.D.
Design and Environmental Analysis
Cornell University

Overview
Continuous speech recognition programs (CSRP) came to market at the end of 1997, with claims that they could recognize users’ continuous speech and transcribe it accurately into text-processing software.

Research Goal
The goal of this research was to determine the critical factors that affect the usability of speech recognition programs, in order to generate universal guidelines for the future design of continuous speech recognition software.

Literature Review
1. Speech Recognition Technology
• Terminology
• History of Speech Recognition
• Components of Speech Recognition
• Factors Influencing the Performance of Speech Recognition

Literature Review (Cont.)
2. Using Speech Recognition
• Strengths and Limitations
• Applications of Speech Recognition

Literature Review (Cont.)
3. Current Speech Recognition Software
• Setup, Training, and Dictation
• Features of Current Speech Recognition Programs
• Product Performance

Literature Review (Cont.)
4. Human Computer Interaction in Speech Recognition
• The Interaction between Users and Recognition Programs
• Program and Human Errors
• User Characteristics and Task Performance

Literature Review (Cont.)
Human Computer Interaction in Speech Recognition
• Guidelines for the Interface Design (cont.) (excerpted from McLeod, 1988)
- Procedures for developing and implementing an application to meet the needs of the users, including vocabulary design, feedback and error recovery strategies, and training techniques.

Literature Review (Cont.)
• Guidelines for the Interface Design (excerpted from McLeod, 1988)
- Procedures for identifying and controlling sources of inter- and intra-person variability.
- Consideration of the implications of the technology for the organization of working groups.
- Techniques for assessing the usability of a recognition system, including overall task performance, physical and mental workload, and users' subjective responses.

Research I: Web Survey

Research I: Web Survey (Cont.)
I-1. Methods
• Subjects: 351 respondents (including 143 CSRP-users)

Gender     CSRP-User       Non-User        Total
Female     34 (23.9%)      88 (42.9%)      122 (35.2%)
Male       108 (76.1%)     117 (57.1%)     225 (64.8%)
Total      142 (100.0%)    205 (100.0%)    347 (100.0%)

Age        CSRP-User       Non-User        Total
18-25      17 (12.0%)      59 (28.6%)      76 (21.9%)
26-50      95 (66.9%)      116 (56.3%)     211 (60.6%)
51 Plus    30 (21.1%)      31 (15.1%)      61 (17.5%)
Total      142 (100.0%)    206 (100.0%)    348 (100.0%)

Research I: Web Survey (Cont.)
• Survey Instrument
Section A: General Computer Use
13 questions / 45 items, completed by all respondents (approx. 3-5 minutes)
Section B: Usability of CSRP
31 questions / 201 items, completed by CSRP-users (approx. 15-20 minutes)
• Procedure

Research I: Web Survey (Cont.)
I-2. Results and Discussion on Findings
• General Computer Use

General Computer Use                           Subject Count (%)    Valid N
How long have you been using a computer?                            347
  Less than 1 year                             2 (0.6%)
  1-3 years                                    18 (5.1%)
  More than 3 years                            327 (93.2%)
How many days a week do you use a computer?                         346
  1-3 days                                     4 (1.2%)
  4-5 days                                     51 (14.6%)
  6-7 days                                     291 (82.9%)

Research I: Web Survey (Cont.)
• General Computer Use (Cont.)
Hours per day of computer use, by place
Place     Mean (hour)    SD (hour)
Office    3.79           2.84
School    0.66           1.62
Home      2.16           1.96
Other     0.25           1.16

Average total time per day of computer use
Time                  Users (%)
1-3 hours             48 (13.7%)
4-6 hours             99 (28.2%)
7-9 hours             165 (47.0%)
More than 10 hours    39 (11.1%)

Valid N: 351

Research I: Web Survey (Cont.)
Tasks                          CSRP-users    Non-users    All respondents    Significance
Composing documents            65.03%        36.76%       48.31%             t=8.233, df=332, p<.001
Database input                 45.10%        27.79%       34.84%             t=4.780, df=286, p<.001
Computer image manipulation    18.87%        11.59%       14.54%             t=2.986, df=271, p=.003
Searching information          48.53%        30.82%       38.03%             t=4.881, df=349, p<.001
Browsing information           49.30%        30.29%       38.03%             t=5.174, df=349, p<.001

Research I: Web Survey (Cont.)
• Usability of CSRP
Dragon NaturallySpeaking    IBM Via Voice    L&H VoiceXpress (Kurzweil)
Personal 2.0 with Corel Executive Standard Preferred 2.0 Gold Advanced Preferred 3.0 Office Professional Standard 3.0 Home for Medicine, General Medicine Edition Professional Topic for Medicine, Specialty Edition for Teens IBM MedSpeak/Radiology Legal Suite Professional/Specialty Vocabularies Medical Suite Developer Suite

Research I: Web Survey (Cont.)
• Usability of CSRP

CSRP Use                                          Users (%)     Valid N
How long have/had you used your CSRP?                           141
  1-6 months                                      64 (44.8%)
  7-11 months                                     27 (18.9%)
  1-2 years                                       38 (26.6%)
  More than 2 years                               12 (8.4%)
How many days a week do/did you use your CSRP?                  136
  1-2 days                                        42 (29.4%)
  3-5 days                                        28 (19.6%)
  6-7 days                                        66 (46.2%)

Research I: Web Survey (Cont.)
• Usability of CSRP (Cont.)

Hours per day of CSRP use, by place
Place     Mean (hr)    SD (hr)
Office    1.50         1.76
School    0.14         0.64
Home      0.92         1.40
Other     0.20         0.91

Average total time per day of CSRP use
Time                  Users (%)
1-3 hours             93 (65.1%)
4-6 hours             28 (19.6%)
7-9 hours             10 (7.0%)
More than 10 hours    1 (0.7%)

Valid N: 143

Research I: Web Survey (Cont.)
• Usability of CSRP (Cont.)
Task                               Average Score (SD)
Dictation (speech to text)
  Emails                           2.76 (1.05)
  Letters                          2.92 (1.05)
  Notes                            2.79 (1.06)
  Reports or papers                2.82 (1.03)
  Slides                           2.08 (1.24)
Navigation (control)
  Within a specific program        2.45 (0.96)
  Between programs                 2.12 (1.05)
Editing/correcting documents       2.42 (1.01)
Read back (text to speech)
  Check content of documents       2.71 (1.13)
  Review my works                  2.72 (1.03)

Research I: Web Survey (Cont.)
• Usability of CSRP (Cont.)

CSRP Characteristics                 Average Score (SD)
Varieties of functions               2.92 (0.74)
Product accuracy                     3.81 (0.63)
Vocabulary capacity                  3.39 (0.70)
Dictation speed                      3.42 (0.76)
Ability to expand vocabulary         3.53 (0.73)
Easy to use                          3.44 (0.77)
Price                                2.48 (0.86)
Compatibility with other software    3.30 (0.81)
User (technical) support             2.81 (0.86)
Program upgrade support              2.99 (0.86)

Research I: Web Survey (Cont.)
• Usability of CSRP (Cont.)

Aspects                              Average Score (SD)
Varieties of functions               2.66 (0.86)
Product accuracy                     2.86 (1.03)
Vocabulary capacity                  3.32 (0.84)
Dictation speed                      2.92 (0.95)
Ability to expand vocabulary         3.33 (0.81)
Easy to use                          2.99 (0.92)
Price                                2.59 (1.01)
Compatibility with other software    2.76 (0.90)
User (technical) support             2.44 (1.01)
Program upgrade support              2.72 (0.97)
Overall satisfaction                 2.91 (0.99)

Research I: Web Survey (Cont.)
• Usability of CSRP (Cont.)

Task                                Most preferred (%)    2nd most preferred (%)      Valid N
Composing documents                 Voice (51.0%)         Voice & Keyboard (25.9%)    137
Correcting mistakes in documents    Keyboard (24.5%)      Voice (23.1%)               136
Editing documents                   Voice (23.8%)         Keyboard (23.1%)            123
Database input                      Keyboard (42.7%)      Voice (23.1%)               138

Research I: Web Survey (Cont.)
• Usability of CSRP (Cont.)

Task                           Most preferred (%)    2nd most preferred (%)      Valid N
Computer image manipulation    Mouse (30.1%)         Keyboard & Mouse (19.6%)    113
Searching & browsing           Keyboard (25.2%)      Mouse (16.8%)               132
Navigating within a program    Voice (25.2%)         Mouse (21.0%)               135
Navigating between programs    Voice (21.7%)         Mouse (21.0%)               133

Research I: Web Survey (Cont.)
• Usability of CSRP (Cont.)

Tasks                          DNS-Preferred 3.0    DNS Professional    Significance
Composing documents            59.35%               74.24%              t=-2.063, df=62, p<0.05
Database input                 45.48%               46.67%              Not significant
Computer image manipulation    13.87%               23.44%              Not significant
Searching information          37.10%               58.18%              t=-2.874, df=62, p<0.01
Browsing information           40.00%               58.18%              t=-2.203, df=62, p<0.05

Research I: Web Survey (Cont.)
I-3. Discussion
• Limitations
- Survey distribution
- Survey length
- Survey format
- Qualitative information
• Future Research

Research II: Usability Testing
II-1. Methods
• Subjects: 10 Cornell students
- 5 females and 5 males
- 8 CSRP-novices and 2 CSRP-users
- Ages ranged from 21 to 30
• Setting and Instruments
- MVR computer lab
- Dell Pentium II MMX PC / Windows 98
- Dragon NaturallySpeaking Preferred 3.0

Research II: Usability Testing (Cont.)
II-1. Methods (cont.)
• Procedure
- Setup and training

Research II: Usability Testing (Cont.)
• Procedure (cont.)
- Research design
Cell entries: Method of Transcription / Editing / Readability of Document*

Subj. #    Level of Experience on CSRP    Section 1            Section 2             Section 3
1          Novice                         Dictate/Type/Ec      Type/Type/Ea          Type/Type/D
2          Novice                         Type/Type/Ec         Dictate/Type/Ea
3          Novice                         Dictate/Type/Ea      Type/Type/Ec
4          Novice                         Type/Type/Ea         Dictate/Type/D
Entry whose position is not recoverable: Dictate/Type/Ec

Research II: Usability Testing (Cont.)
- Research design (cont.)
Cell entries: Method of Transcription / Editing / Readability of Document*

Subj. #    Level of Experience on CSRP    Section 1            Section 2             Section 3
5          Some                           Dictate/Type/Ec      Type/Type/Ea
6          Novice                         Type/Type/Ec         Dictate/Type/Ea
7          Novice                         Dictate/Type/Ea      Type/Type/Ec
8          Novice                         Type/Type/Ea         Dictate/Type/Ec
9          Much                           Dictate/Voicing/D    Dictate/Voicing/Ea
10         Novice                         Dictate/Type/Ec      Type/Type/D
Section 3 entries whose rows are not recoverable: Dictate/Type/D, Type/Type/Ea

Research II: Usability Testing (Cont.)
II-1. Methods (cont.)
• Procedure (cont.)
- Dependent variables
1. Transcription time
2. Number of transcription errors
3. Editing time
4. Total completion time

Research II: Usability Testing (Cont.)
II-2.
Results and Discussion on Findings
• Modality of Transcription

Measure                                          Dictating       Typing          Significance
Transcription Time (sec/word)                    .526 (.105)     1.069 (2.60)    t=-6.428, df=7, p<.001
Number of Transcription Errors (errors/word)     .131 (.067)     .041 (.022)     t=-3.636, df=7, p=.008
Type-Editing Time (sec/word)                     1.058 (.244)    .577 (.198)     t=-5.444, df=7, p=.001
Total Completion Time (sec/word)                 1.584 (.209)    1.645 (.394)    Not significant

• Gender

Research II: Usability Testing (Cont.)
II-2. Results and Discussion on Findings (cont.)
• Modality of Editing

Modality of Editing, Easy Documents
Measure                                          Edit by Typing (N=8)    Edit by Voicing (N=1)
Transcription Time (sec/word)                    0.526 (.105)            .438
Number of Transcription Errors (errors/word)     .131 (.067)             .128
Editing Time (sec/word)                          1.058 (.244)            1.203
Total Completion Time (sec/word)                 1.584 (.209)            1.641

Research II: Usability Testing (Cont.)
II-2. Results and Discussion on Findings (cont.)
• Modality of Editing (cont.)

Modality of Editing, Difficult Documents
Measure                                          Edit by Typing (N=2)    Edit by Voicing (N=1)
Transcription Time (sec/word)                    0.611 (.017)            .549
Number of Transcription Errors (errors/word)     .195 (.084)             .111
Editing Time (sec/word)                          1.633 (.554)            2.062
Total Completion Time (sec/word)                 2.244 (.536)            2.611

Research II: Usability Testing (Cont.)
II-2. Results and Discussion on Findings (cont.)
• Experience on CSRP/DNS

Transcription by Dictating, by Experience of CSRP/DNS
Measure                                          None (N=8)      Some (N=1)    Much (N=1)
Transcription Time (sec/word)                    .526 (.105)     .480          .438
Number of Transcription Errors (errors/word)     .131 (.067)     .037          .128
Type-Editing Time (sec/word)                     1.058 (.244)    .471          N.A.
Total Completion Time (sec/word)                 1.584 (.209)    .951          N.A.

Research II: Usability Testing (Cont.)
II-2. Results and Discussion on Findings (cont.)
• Experience on CSRP/DNS (cont.)
Transcription by Typing, by Experience of CSRP/DNS
Measure                                          None (N=8)      Some (N=1)
Transcription Time (sec/word)                    1.069 (2.60)    1.191
Number of Transcription Errors (errors/word)     .041 (.022)     .036
Type-Editing Time (sec/word)                     .577 (.198)     .414
Total Completion Time (sec/word)                 1.645 (.394)    1.606

Research II: Usability Testing (Cont.)
II-2. Results and Discussion on Findings (cont.)
• Readability of Articles

Transcription by Typing, by Readability of Article (N=2)
Measure                                          Easy            Difficult       Significance
Transcription Time (sec/word)                    1.211 (.383)    1.525 (.440)    Not significant
Number of Transcription Errors (errors/word)     .030 (.031)     .051 (.003)     Not significant
Type-Editing Time (sec/word)                     .510 (.332)     .943 (.272)     Not significant
Total Completion Time (sec/word)                 1.721 (.716)    2.467 (.713)    t=-397.339, df=1, p=.002

Research II: Usability Testing (Cont.)
II-3. Discussion
• Compare Findings to Previous Research

Research              Tested program       Accuracy (%)    Corrected words per minute
Present study         DNS Preferred 3.0    < 95%           29.8
Karat et al., 1999    DNS Preferred 2.0    N.A.            25.1
Poor, 1998            DNS Preferred 2.0    < 95%           N.A.
Linderholm, 1998      DNS Preferred 3.0    N.A.            43.0

Research II: Usability Testing (Cont.)
II-3. Discussion (cont.)
• Limitations
- Sample size
- CSRP-users
- Testing time
- Human performance vs. program performance
- Article readability
• Future Research

Conclusion
• Critical Factors that affect CSRP usability
- Program accuracy
- Program reliability
- Requirement of user-dependent training
- Requirement of memorization
- Ease of error correction
- Ability to learn from mistakes
- Accommodation for people with disabilities
- Hardware compatibility
- Environmental noise level

Conclusion (Cont.)
• Guidelines for Future Design
A continuous speech recognition program should
- have high program accuracy
- have high program reliability
- eliminate the requirement of user-dependent training
- reduce the requirement of memorization
- maximize the ease of error correction
- have the ability to learn from mistakes
- accommodate the needs of people with disabilities
- provide a wide range of hardware compatibility
- minimize the sensitivity to environmental noise
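
Note on the Research II measures: the per-word means reported in the usability-testing results appear to be related by total completion time = transcription time + editing time, and the one significant readability result (t = -397.339, df = 1) can be checked in closed form, because a t distribution with one degree of freedom is the standard Cauchy distribution. The sketch below is only a consistency check on the reported figures, not the analysis used in the study; the function names, the rounding tolerance, and the assumption of a two-tailed test are illustrative choices.

```python
import math

# Mean per-word times (sec/word) copied from the Research II result slides:
# (label, transcription time, editing time, reported total completion time)
REPORTED = [
    ("Dictating, type-edited, easy (N=8)", 0.526, 1.058, 1.584),
    ("Typing, type-edited (N=8)", 1.069, 0.577, 1.645),
    ("Dictating, voice-edited, easy (N=1)", 0.438, 1.203, 1.641),
    ("Dictating, type-edited, difficult (N=2)", 0.611, 1.633, 2.244),
    ("Dictating, voice-edited, difficult (N=1)", 0.549, 2.062, 2.611),
]


def total_matches(transcription, editing, reported_total, tol=0.0015):
    """True if transcription + editing equals the reported total within rounding.

    tol is an assumed allowance for values rounded to three decimals.
    """
    return abs((transcription + editing) - reported_total) <= tol


def two_tailed_p_df1(t):
    """Two-tailed p-value of a t statistic with df = 1.

    With one degree of freedom the t distribution is the standard Cauchy
    distribution, so P(|T| > |t|) = 1 - (2 / pi) * atan(|t|).
    """
    return 1.0 - (2.0 / math.pi) * math.atan(abs(t))


if __name__ == "__main__":
    for label, transcription, editing, total in REPORTED:
        print(label, total_matches(transcription, editing, total))
    # Readability of articles, total completion time: t = -397.339, df = 1
    print(round(two_tailed_p_df1(-397.339), 3))
```

Under these assumptions every listed total agrees with the sum of its two components to within rounding, and the df = 1 check comes out at about .002, in line with the reported p value.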