Research on the Usability of Computer Voice Recognition Programs

Usability of Continuous
Speech Recognition Programs
Hsin Eu
Committee: Alan Hedge, Ph.D.
Geri Gay, Ph.D.
Design and Environmental Analysis
Cornell University
Overview
Continuous speech recognition programs (CSRPs)
were brought to market at the end of
1997, with claims that they could
recognize users’ continuous speech
and transcribe it accurately into text in
word-processing software.
Research Goal
The research goal was to determine the
critical factors that affect the usability of
speech recognition programs in order to
generate universal guidelines for the
future design of continuous speech
recognition software.
Literature Review
1. Speech Recognition Technology
• Terminology
• History of Speech Recognition
• Components of Speech Recognition
• Factors Influencing the Performance of Speech Recognition
Literature Review (Cont.)
2. Using Speech Recognition
• Strengths and Limitations
• Applications of Speech Recognition
Literature Review (Cont.)
3. Current Speech Recognition Software
• Setup, Training, and Dictation
• Features of Current Speech Recognition Programs
• Product Performance
Literature Review (Cont.)
4. Human-Computer Interaction in Speech Recognition
• The Interaction between Users and Recognition Programs
• Program and Human Errors
• User Characteristics and Task Performance
Literature Review (Cont.)
Human-Computer Interaction in Speech Recognition (cont.)
• Guidelines for the Interface Design
(excerpted from McLeod, 1988)
- Procedures for developing and implementing an
application to meet the needs of the users, including
vocabulary design, feedback and error recovery
strategies and training techniques.
Literature Review (Cont.)
• Guidelines for the Interface Design
(excerpted from McLeod, 1988)
- Procedures for identifying and controlling sources of
inter- and intra-person variability.
- Consideration of the implications of the technology on the
organization of working groups.
- Techniques for assessing the usability of a recognition
system, including overall task performance, physical and
mental workload, and users' subjective responses.
Research I: Web Survey
Research I: Web Survey
(Cont.)
I-1. Methods
• Subjects: 351 respondents (including 143 CSRP-users)
Gender     CSRP-User      Non-User       Total
Female     34 (23.9%)     88 (42.9%)     122 (35.2%)
Male       108 (76.1%)    117 (57.1%)    225 (64.8%)
Total      142 (100.0%)   205 (100.0%)   347 (100.0%)

Age        CSRP-User      Non-User       Total
18-25      17 (12.0%)     59 (28.6%)     76 (21.9%)
26-50      95 (66.9%)     116 (56.3%)    211 (60.6%)
51 Plus    30 (21.1%)     31 (15.1%)     61 (17.5%)
Total      142 (100.0%)   206 (100.0%)   348 (100.0%)
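The percentages in the subject table are column percentages: each count is divided by the total for its group (CSRP-users, non-users, or all respondents). A minimal sketch of that arithmetic, using the gender counts reported for CSRP-users; the function name `column_percentages` is illustrative and not part of the study:

```python
def column_percentages(counts):
    """Return each count as a percentage of the column total, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Gender counts for CSRP-users, as reported in the subject table.
csrp_users = {"Female": 34, "Male": 108}
print(column_percentages(csrp_users))  # {'Female': 23.9, 'Male': 76.1}
```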
Research I: Web Survey
(Cont.)
• Survey Instrument
Section A: General Computer Use
13 questions / 45 items, completed by all respondents (approx. 3-5 minutes)
Section B: Usability of CSRP
31 questions / 201 items, completed by CSRP-users (approx. 15-20 minutes)
• Procedure
Research I: Web Survey
(Cont.)
I-2. Results and Discussion on Findings
• General Computer Use
General Computer Use                           Subject Count (%)   Valid N
How long have you been using a computer?                           347
  Less than 1 year                             2 (0.6%)
  1-3 years                                    18 (5.1%)
  More than 3 years                            327 (93.2%)
How many days a week do you use a computer?                        346
  1-3 days                                     4 (1.2%)
  4-5 days                                     51 (14.6%)
  6-7 days                                     291 (82.9%)
Research I: Web Survey
(Cont.)
• General Computer Use (Cont.)
Hours per day of computer use, by place        Mean (hours)   SD (hours)
  Office                                       3.79           2.84
  School                                       0.66           1.62
  Home                                         2.16           1.96
  Other                                        0.25           1.16

Average total hours of computer use per day    Users (%)
  1-3 hours                                    48 (13.7%)
  4-6 hours                                    99 (28.2%)
  7-9 hours                                    165 (47.0%)
  More than 10 hours                           39 (11.1%)

Valid N: 351
Research I: Web Survey
(Cont.)
Tasks                          CSRP-users   Non-users   All respondents   Significance
Composing documents            65.03%       36.76%      48.31%            t=8.233, df=332, p=.000
Database input                 45.10%       27.79%      34.84%            t=4.780, df=286, p=.000
Computer image manipulation    18.87%       11.59%      14.54%            t=2.986, df=271, p=.003
Searching information          48.53%       30.82%      38.03%            t=4.881, df=349, p=.000
Browsing information           49.30%       30.29%      38.03%            t=5.174, df=349, p=.000
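The slides report t statistics for the CSRP-user versus non-user comparison but do not say which variant of the t-test was used. The differing degrees of freedom (332, 286, 271, 349) would be consistent with an independent-samples test that does not always assume equal variances, though that is only a guess. As a sketch under that assumption, Welch's t statistic and its approximate degrees of freedom can be computed as follows; the data are hypothetical, not the survey responses:

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two independent samples."""
    # Per-group variance of the mean (sample variance divided by group size).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical task scores for two groups (not data from the survey).
users = [70.0, 55.0, 80.0, 60.0, 65.0]
non_users = [40.0, 30.0, 45.0, 35.0, 25.0]
t, df = welch_t(users, non_users)
print(f"t={t:.3f}, df={df:.1f}")
```

When the two groups have similar variances and sizes, the Welch degrees of freedom approach the pooled value n1 + n2 - 2, which for 351 respondents would be 349.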
Research I: Web Survey
(Cont.)
• Usability of CSRP
Dragon NaturallySpeaking
IBM Via Voice
L&H VoiceXpress
(Kurzweil)

Personal 2.0 with Corel

Executive

Standard

Preferred 2.0

Gold

Advanced

Preferred 3.0

Office

Professional

Standard 3.0

Home

for Medicine, General
Medicine Edition

Professional

Topic

for Medicine, Specialty
Edition

for Teens

IBM
MedSpeak/Radiology

Legal Suite

Professional/Specialty
Vocabularies

Medical Suite

Developer Suite
Research I: Web Survey
(Cont.)
• Usability of CSRP
CSRP Use                                          Users (%)     Valid N
How long have/had you used your CSRP?                           141
  1-6 months                                      64 (44.8%)
  7-11 months                                     27 (18.9%)
  1-2 years                                       38 (26.6%)
  More than 2 years                               12 (8.4%)
How many days a week do/did you use your CSRP?                  136
  1-2 days                                        42 (29.4%)
  3-5 days                                        28 (19.6%)
  6-7 days                                        66 (46.2%)
Research I: Web Survey
(Cont.)
• Usability of CSRP (Cont.)
Hours per day of CSRP use, by place            Mean (hr)   SD (hr)
  Office                                       1.50        1.76
  School                                       0.14        0.64
  Home                                         0.92        1.40
  Other                                        0.20        0.91

Average total hours of CSRP use per day        Users (%)
  1-3 hours                                    93 (65.1%)
  4-6 hours                                    28 (19.6%)
  7-9 hours                                    10 (7.0%)
  More than 10 hours                           1 (0.7%)

Valid N: 143
Research I: Web Survey
(Cont.)
• Usability of CSRP (Cont.)
                                      Average Score (SD)
Dictation (speech to text)
  Emails                              2.76 (1.05)
  Letters                             2.92 (1.05)
  Notes                               2.79 (1.06)
  Reports or papers                   2.82 (1.03)
  Slides                              2.08 (1.24)
Navigation (control)
  Within a specific program           2.45 (0.96)
  Between programs                    2.12 (1.05)
Editing/correcting documents          2.42 (1.01)
Read back (text to speech)
  Check content of documents          2.71 (1.13)
  Review my works                     2.72 (1.03)
Research I: Web Survey
(Cont.)
• Usability of CSRP (Cont.)
CSRP Characteristics                  Average Score (SD)
Varieties of functions                2.92 (0.74)
Product accuracy                      3.81 (0.63)
Vocabulary capacity                   3.39 (0.70)
Dictation speed                       3.42 (0.76)
Ability to expand vocabulary          3.53 (0.73)
Easy to use                           3.44 (0.77)
Price                                 2.48 (0.86)
Compatibility with other software     3.30 (0.81)
User (technical) support              2.81 (0.86)
Program upgrade support               2.99 (0.86)
Research I: Web Survey
(Cont.)
• Usability of CSRP (Cont.)
Aspects                               Average Score (SD)
Varieties of functions                2.66 (0.86)
Product accuracy                      2.86 (1.03)
Vocabulary capacity                   3.32 (0.84)
Dictation speed                       2.92 (0.95)
Ability to expand vocabulary          3.33 (0.81)
Easy to use                           2.99 (0.92)
Price                                 2.59 (1.01)
Compatibility with other software     2.76 (0.90)
User (technical) support              2.44 (1.01)
Program upgrade support               2.72 (0.97)
Overall satisfaction                  2.91 (0.99)
Research I: Web Survey
(Cont.)
• Usability of CSRP (Cont.)
Task                                 Most preferred (%)   2nd most preferred (%)     Valid N
Composing documents                  Voice (51.0%)        Voice & Keyboard (25.9%)   137
Correcting mistakes in documents     Keyboard (24.5%)     Voice (23.1%)              136
Editing documents                    Voice (23.8%)        Keyboard (23.1%)           123
Database input                       Keyboard (42.7%)     Voice (23.1%)              138
Research I: Web Survey
(Cont.)
• Usability of CSRP (Cont.)
Task                                 Most preferred (%)   2nd most preferred (%)     Valid N
Computer image manipulation          Mouse (30.1%)        Keyboard & Mouse (19.6%)   113
Searching & browsing                 Keyboard (25.2%)     Mouse (16.8%)              132
Navigating within a program          Voice (25.2%)        Mouse (21.0%)              135
Navigating between programs          Voice (21.7%)        Mouse (21.0%)              133
Research I: Web Survey
(Cont.)
• Usability of CSRP (Cont.)
Tasks                          DNS-Preferred   DNS3.0 Professional   Significance
Composing documents            59.35%          74.24%                t=-2.063, df=62, p<0.05
Database input                 45.48%          46.67%                Not significant
Computer image manipulation    13.87%          23.44%                Not significant
Searching information          37.10%          58.18%                t=-2.874, df=62, p<0.01
Browsing information           40.00%          58.18%                t=-2.203, df=62, p<0.05
Research I: Web Survey
(Cont.)
I-3. Discussion
• Limitations
- Survey distribution
- Survey length
- Survey format
- Qualitative information
• Future Research
Research II: Usability Testing
II-1. Methods
• Subjects: 10 Cornell students
- 5 females and 5 males
- 8 CSRP-novices and 2 CSRP-users
- Ages ranged from 21 to 30
• Setting and Instruments
- MVR computer lab
- Dell Pentium II MMX PC/ Windows 98
- Dragon NaturallySpeaking Preferred 3.0
Research II: Usability Testing
(Cont.)
II-1. Methods (cont.)
• Procedure
- Setup and training
Research II: Usability Testing
(Cont.)
• Procedure (cont.)
- Research design
Level of
Experience
on CSRP
Method of Transcription / Editing/ Readability of Document*
Section 1
Section 2
Section 3
1
Novice
Dictate/Type/Ec
Type/Type/Ea
Type/Type/D
2
Novice
Type/Type/Ec
Dictate/Type/Ea
3
Novice
Dictate/Type/Ea
Type/Type/Ec
4
Novice
Type/Type/Ea
Dictate/Type/D
Subj.
#
Dictate/Type/Ec
Research II: Usability Testing
(Cont.)
- Research design (cont.)
Level of
Experience
on CSRP
Method of Transcription / Editing/ Readability of Document*
Section 1
Section 2
5
Some
Dictate/Type/Ec
Type/Type/Ea
6
Novice
Type/Type/Ec
Dictate/Type/Ea
7
Novice
Dictate/Type/Ea
Type/Type/Ec
8
Novice
Type/Type/Ea
Dictate/Type/Ec
9
Much
Dictate/Voicing/D
Dictate/Voicing/Ea
10
Novice
Dictate/Type/Ec
Type/Type/D
Subj.
#
Section 3
Dictate/Type/D
Type/Type/Ea
Research II: Usability Testing
(Cont.)
II-1. Methods (cont.)
• Procedure (cont.)
- Dependent variables
1. Transcription time
2. Number of transcription errors
3. Editing time
4. Total completion time
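The results slides report these variables per word (sec/word and errors/word), and the reported totals match the sum of transcription and editing time (for example, .526 + 1.058 = 1.584). A minimal sketch of that normalization, assuming raw times in seconds and a known word count per document; the function and key names are illustrative, not the study's:

```python
def per_word_measures(words, transcription_sec, errors, editing_sec):
    """Normalize raw session measurements by document length (in words)."""
    if words <= 0:
        raise ValueError("words must be positive")
    return {
        "transcription_time": transcription_sec / words,    # sec/word
        "transcription_errors": errors / words,             # errors/word
        "editing_time": editing_sec / words,                # sec/word
        # Assumed here to be transcription plus editing time.
        "total_completion_time": (transcription_sec + editing_sec) / words,
    }


# Hypothetical session: a 200-word document (not data from the study).
print(per_word_measures(words=200, transcription_sec=100.0, errors=20, editing_sec=220.0))
```

Dividing by word count lets documents of different lengths be compared on the same scale.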
Research II: Usability Testing
(Cont.)
II-2. Results and Discussion on Findings
• Modality of Transcription
                                              Dictating       Typing          Significance
Transcription Time (sec/word)                 .526 (.105)     1.069 (2.60)    t=-6.428, df=7, p=.000
Number of Transcription Errors (errors/word)  .131 (.067)     .041 (.022)     t=-3.636, df=7, p=.008
Type-Editing Time (sec/word)                  1.058 (.244)    .577 (.198)     t=-5.444, df=7, p=.001
Total Completion Time (sec/word)              1.584 (.209)    1.645 (.394)    Not significant
• Gender
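The modality comparison above reports df=7, which with eight novice participants would be consistent with a paired-samples t-test (each participant both dictated and typed). The slides do not name the test, so this is an assumption. A minimal sketch of a paired t statistic, with hypothetical per-participant values:

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for matched measurements."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical transcription times (sec/word) for eight participants.
dictating = [0.45, 0.52, 0.60, 0.48, 0.55, 0.70, 0.41, 0.50]
typing = [0.90, 1.10, 1.30, 0.95, 1.00, 1.40, 0.85, 1.05]
t, df = paired_t(dictating, typing)
print(f"t={t:.3f}, df={df}")
```

A negative t here means the first condition (dictating) took less time per word than the second.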
Research II: Usability Testing
(Cont.)
II-2. Results and Discussion on Findings (cont.)
• Modality of Editing
Modality of Editing, Easy Documents           Edit by Typing (N=8)   Edit by Voicing (N=1)
Transcription Time (sec/word)                 0.526 (.105)           .438
Number of Transcription Errors (errors/word)  .131 (.067)            .128
Editing Time (sec/word)                       1.058 (.244)           1.203
Total Completion Time (sec/word)              1.584 (.209)           1.641
Research II: Usability Testing
(Cont.)
II-2. Results and Discussion on Findings (cont.)
• Modality of Editing (cont.)
Modality of Editing, Difficult Documents      Edit by Typing (N=2)   Edit by Voicing (N=1)
Transcription Time (sec/word)                 0.611 (.017)           .549
Number of Transcription Errors (errors/word)  .195 (.084)            .111
Editing Time (sec/word)                       1.633 (.554)           2.062
Total Completion Time (sec/word)              2.244 (.536)           2.611
Research II: Usability Testing
(Cont.)
II-2. Results and Discussion on Findings (cont.)
• Experience with CSRP/DNS
Transcription by Dictating
Experience of CSRP/DNS                        None (N=8)      Some (N=1)   Much (N=1)
Transcription Time (sec/word)                 .526 (.105)     .480         .438
Number of Transcription Errors (errors/word)  .131 (.067)     .037         .128
Type-Editing Time (sec/word)                  1.058 (.244)    .471         N. A.
Total Completion Time (sec/word)              1.584 (.209)    .951         N. A.
Research II: Usability Testing
(Cont.)
II-2. Results and Discussion on Findings (cont.)
• Experience with CSRP/DNS (cont.)
Transcription by Typing
Experience of CSRP/DNS                        None (N=8)      Some (N=1)
Transcription Time (sec/word)                 1.069 (2.60)    1.191
Number of Transcription Errors (errors/word)  .041 (.022)     .036
Type-Editing Time (sec/word)                  .577 (.198)     .414
Total Completion Time (sec/word)              1.645 (.394)    1.606
Research II: Usability Testing
(Cont.)
II-2. Results and Discussion on Findings (cont.)
• Readability of Articles
Transcription by Typing
Readability of Article (N=2)                  Easy            Difficult       Significance
Transcription Time (sec/word)                 1.211 (.383)    1.525 (.440)    Not significant
Number of Transcription Errors (errors/word)  .030 (.031)     .051 (.003)     Not significant
Type-Editing Time (sec/word)                  .510 (.332)     .943 (.272)     Not significant
Total Completion Time (sec/word)              1.721 (.716)    2.467 (.713)    t=-397.339, df=1, p=.002
Research II: Usability Testing
(Cont.)
II-3. Discussion
• Comparison of Findings with Previous Research
Research             Tested program      Accuracy (%)   Corrected words per minute
Present study        DNS Preferred 3.0   < 95%          29.8
Karat et al., 1999   DNS Preferred 2.0   N. A.          25.1
Poor, 1998           DNS Preferred 2.0   < 95%          N. A.
Linderholm, 1998     DNS Preferred 3.0   N. A.          43.0
Research II: Usability Testing
(Cont.)
II-3. Discussion (cont.)
• Limitations
- Sample size
- CSRP-users
- Testing time
- Human performance vs. program performance
- Article readability
• Future Research
Conclusion
• Critical Factors that affect CSRP usability
- Program accuracy
- Program reliability
- Requirement of user-dependent training
- Requirement of memorization
- Ease of error correction
- Ability to learn from mistakes
- Accommodation for people with disabilities
- Hardware compatibility
- Environmental noise level
Conclusion (Cont.)
• Guidelines for Future Design
A continuous speech recognition program should
- have high program accuracy
- have high program reliability
- eliminate the requirement of user-dependent training
- reduce the requirement of memorization
- maximize the ease of error correction
- have the ability to learn from mistakes
- accommodate the needs of people with disabilities
- provide a wide range of hardware compatibility
- minimize the sensitivity to environmental noise