Comparative Usability Evaluation

CHI99 Panel
Comparative Evaluation of Usability Tests
Presentation by
Rolf Molich
DialogDesign
Denmark
molich@dialogdesign.dk
CHI99 Panel
Comparative Evaluation of Usability Tests
Take a web-site.
Take nine professional usability teams.
Let each team usability test the web-site.
Are the results similar?
What Have We Done?

Nine teams have usability tested the
same web-site
– Seven professional teams
– Two student teams

Test web-site: www.hotmail.com
Free e-mail service
Panel Format

Introduction (Rolf Molich)

Five minute statements from five participating teams

The Customer’s point of view (Meeta Arcuri, Hotmail)

Conclusions (Rolf Molich)

Discussion - 30 minutes
Purposes of Comparison

Survey the state of the art in
professional usability testing of web-sites.

Investigate the reproducibility of
usability test results.
Non-Purposes of Comparison

To pick a winner

To make a profit
Basis for Usability Test

Web-site address: www.hotmail.com

Client scenario

Access to client through intermediary

Three weeks to carry out test
What Each Team Did

Run standard usability test

Anonymize the usability test report

Send the report to Rolf Molich
Problems Found

Total number of different usability problems found: 300

  Found by seven teams:    1
  Found by six teams:      1
  Found by five teams:     4
  Found by four teams:     4
  Found by three teams:   15
  Found by two teams:     49
  Found by one team:     226 (75%)
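The distribution above can be checked with a short sketch. The counts are taken directly from the slide; the only computation is verifying that they sum to 300 problems and that the 226 single-team problems are about 75% of the total.

```python
# Counts from the slide: number of problems found by exactly k of the 9 teams.
found_by = {7: 1, 6: 1, 5: 4, 4: 4, 3: 15, 2: 49, 1: 226}

total = sum(found_by.values())
share_unique = round(100 * found_by[1] / total)

print(total)         # total distinct problems across all teams
print(share_unique)  # percentage found by only a single team
```

Running this confirms the slide's figures: 300 distinct problems, with 75% reported by only one team.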
Comparative Usability Evaluation 2

Barbara Karyukina, SGI (USA)

Klaus Kaasgaard & Ann D. Thomsen, KMD (Denmark)

Lars Schmidt and others, Networkers (Denmark)

Meghan Ede and others, Sun Microsystems, Inc., (USA)

Wilma van Oel, P5 (The Netherlands)

Meeta Arcuri, Hotmail, Microsoft Corp. (USA) (Customer)

Rolf Molich, DialogDesign (Denmark)
(Coordinator)
Comparative Usability Evaluation 2

Joseph Seeley, NovaNET Learning Inc. (USA)

Kent Norman, University of Maryland (USA)

Torben Norgaard Rasmussen and others,
Technical University of Denmark

Marji Schumann and others,
Southern Polytechnic State University (USA)
CHI99 Panel
Comparative Evaluation of Usability Tests
Presentation by
Barbara Karyukina
SGI, Wisconsin
USA
barbarak@sgi.com
Challenges:
Twenty functional areas
+
User preferences questions
Possible Solutions:

Two usability tests

Surveys

User notes

Focus groups
Results:
26 tasks + 10 interview questions
100 findings
CHI99 Panel
Comparative Evaluation of Usability Tests
Presentation by
Klaus Kaasgaard
Kommunedata
Denmark
kka@kmd.dk
Slides currently not available
CHI99 Panel
Comparative Evaluation of Usability Tests
Presentation by
Lars Schmidt
Framtidsfabriken Networkers
Denmark
ls@networkers.dk
Team E
Framtidsfabriken Networkers
Testlab, Denmark
Key learnings CUE-2

Setting up the test
– Insist on dialog with customer
– Secure complete understanding of user groups and user
tasks
– Narrow down test goals

Writing the report
– Use screen dumps
– State conclusions - skip the premises
– Test the usability of the usability report
Improving Test Methodology

Searching for usability and usefulness
– Hook up with different methodologies (e.g. interviews)

Focus on website context
– Test against e.g. YahooMail
– Test against software-based e-mail clients
CHI99 Panel
Comparative Evaluation of Usability Tests
Presentation by
Meghan Ede
Sun Microsystems
California, USA
meghan.ede@sun.com
Hotmail Study Requests

18 Specific Features
 e.g. Registration, Login, Compose...

6 Questions
 e.g. "How do users currently do email?"

24 Potential Study Areas
Usability Methods

Expert Review
 6 Reviewers
 6 Questions

Usability Study
 6 Participants (3 + 3)
 5 Tasks (with sub-tasks)
Report Description
1. Executive Summary
- 4 Main High-Level Themes
- Brief Study Description
2. Debriefing Meeting Summary
- 7 Areas (e.g. overall, navigation, power features, ...)
3. Findings
- 31 Sections
- Study Requests, Extra Areas, Bugs, Task Times, Study Q & A
4. Study Description
Total: 36 Pages - 150 Findings
Lessons Learned

Importance of close contact
with product team

Consider including:
 severity ratings
 more specific recommendations
 screen shots
Discussion Issues

How can we measure the
usability of our reports?

How to deal with the
difference between number
of problems found and
number included in report?
CHI99 Panel
Comparative Evaluation of Usability Tests
Presentation by
Wilma van Oel
P5
The Netherlands
w.vanoel@p5-adviseurs.nl
Wilma van Oel
P5
quality & product
management consultants
Amsterdam, the Netherlands
Structure of Presentation

1. Introduction

2. Deviations in approach
– Test design
– Results and recommendations

3. Lessons for the future
– Change in approach?
– Was it worth the effort?
Introduction
• Company:
P5 Consultants
• Personal background:
psychologist
Test design

Subjects: n=11, pilot, 'critical users', 1-hour session
Data collection: log software, video recording
Methods: lab evaluation + informal approach
Techniques: exploration, task execution, think-aloud, interview, questionnaire
Tool: SUS (System Usability Scale)
A Test Session
Results and recommendations

Results:
– 'general'
– severity?
– Negative: n = median
– Positive: n > mean

Recommendations:
– general
– not 'how'
Lessons for the future

Change in approach?
– Methods: add a usability inspection method
– Procedure: extensive analysis, add session time
– Results: less general, severity?

Was it worth the effort?
– Company: to get experience & benchmarking
– Personally: to improve skills, knowledge
CHI99 Panel
Comparative Evaluation of Usability Tests
Presentation by
Meeta Arcuri
Microsoft Corporation
California, USA
meeta@hotmail.com
CUE-2
The Customer’s Perspective
Meeta Arcuri
User Experience Manager
Microsoft Corp., San Jose, CA
Customer Summary of Findings

New findings: ~4%

Validation of known issues: ~67%
– Previous findings from our lab tests
– Findings from on-going inspections

Remainder: beyond Hotmail usability
– Business reasons for not changing
– Out of Hotmail's control (partner sites)
– Problems generic to the web
Report Content:
Positive Observations

Quick and Dirty results
Recommendations for problem fixes
Participant quotes - get tone/intensity of feedback
Exact number of participants who encountered each issue
Background of participants
Environment (browser, speed of connection, etc.)
Additional Strengths of Reports

Fresh perspectives
Lots of data on non-US users
Recommendations from participants
Trend reporting
Report of outdated material on site (some help files)
Appreciate positive findings, comments
Report Content: Weaknesses

Some recommendations not sensitive to web issues (performance, security)
At least one finding irreproducible (not preserving fields in Reg. Form)
Frequency of issue reported was sometimes vague
Some descriptions terse, vague - had to decipher
How Hotmail Will Use Results

Cross-validate new findings with Hotmail Customer Service reports
Lots of good data to cite in planning meetings
Some good recommendations given by labs and participants
Conclusion

Focused, iterative testing would give better results
Wide array of user data very valuable
Overall: good qualitative and quantitative data to help prioritize, schedule, and improve the usability of Hotmail
CHI99 Panel
Comparative Evaluation of Usability Tests
Presentation by
Rolf Molich
DialogDesign
Denmark
molich@dialogdesign.dk
Comparison of Tests

Based only on test reports

Liberal scoring

Focus on major differences

Two generally recognized textbooks:
– Dumas and Redish, "A Practical Guide to Usability Testing"
– Jeff Rubin, "Handbook of Usability Testing"
Resources

  Team  Person hours    # Usability      Number
        used for test   professionals    of tests
  A     136             2                 7
  B     123             1                 6
  C     84              1                 6
  D     (16)            1                50
  E     130             3                 9
  F     50              1                 5
  G     107             1                11
  H     45              3                 4
  J     218             6                 6
Usability Results

  Team  # Positive    # Problems    % Exclusive
        findings
  A     0             26            42
  B     8             150           24
  C     4             17            10
  D     7             10            57
  E     24            58            51
  F     25            75            33
  G     14            30            56
  H     4             18            60
  J     6             20            71
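The slides do not spell out how "% Exclusive" is computed; a plausible reading is the share of a team's reported problems that no other team also reported. The sketch below illustrates that reading with invented team names and problem IDs, not the actual CUE-2 data.

```python
# Hypothetical illustration of a "% Exclusive" metric: for each team, the
# percentage of its reported problems that no other team reported.
# Teams and problem IDs here are made up for the example.
from collections import Counter

reported = {
    "A": {"p1", "p2", "p3"},
    "B": {"p2", "p4"},
    "C": {"p3", "p5", "p6", "p7"},
}

# How many teams reported each problem.
counts = Counter(p for probs in reported.values() for p in probs)

for team, probs in sorted(reported.items()):
    exclusive = [p for p in probs if counts[p] == 1]
    print(team, round(100 * len(exclusive) / len(probs)))
```

With these invented sets, team A's p2 and p3 are shared, so only p1 counts as exclusive; the loop prints one percentage per team.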
Usability Results

  Team  # Problems    % Core problems    Person hours
                      (100% = 26)        used for test
  A     26            38                 136
  B     150           73                 123
  C     17            35                 84
  D     10            8                  NA
  E     58            58                 130
  F     75            54                 50
  G     30            50                 107
  H     18            27                 45
  J     20            31                 218
Problems Found

Total number of different usability problems found: 300

  Found by seven teams:    1
  Found by six teams:      1
  Found by five teams:     4
  Found by four teams:     4
  Found by three teams:   15
  Found by two teams:     49
  Found by one team:     226 (75%)
Conclusion

If Hotmail is typical, then the total
number of usability problems for a
typical web-site is huge, much larger
than you can hope to find in one
series of usability tests.

Usability testing techniques can be
improved.

We need more awareness of the
usability of usability work.
Download Test Reports and Slides
http://www.dialogdesign.dk/cue2.htm