Usability testing

Usability Evaluation
www.id-book.com
The aims
• Explain the key concepts used in evaluation.
• Introduce different evaluation methods.
• Show how different methods are used for different purposes at different stages of the design process and in different contexts of use.
• Show how evaluators mix and modify methods.
• Discuss the practical challenges.
©2011
Why, what, where, and when to evaluate
Iterative design & evaluation is a continuous process that examines:
• Why: to check that users can use the product and that they like it.
• What: a conceptual model, early prototypes of a new system, and later, more complete prototypes.
• Where: in natural and laboratory settings.
• When: throughout design; finished products can be evaluated to collect information to inform new products.
Designers need to check that they understand users’ requirements.
Bruce Tognazzini tells you why
you need to evaluate
“Iterative design, with its repeating cycle of
design and testing, is the only validated
methodology in existence that will consistently
produce successful results. If you don’t have
user-testing as an integral part of your design
process you are going to throw buckets of
money down the drain.”
See AskTog.com for topical discussions about
design and evaluation.
Types of evaluation
• Controlled settings involving users, e.g., usability testing & experiments in laboratories and living labs.
• Natural settings involving users, e.g., field studies to see how the product is used in the real world.
• Any settings not involving users, e.g., consultants critique, predict, analyze & model aspects of the interface; analytics.
Usability lab
http://iat.ubalt.edu/usability_lab/
Living labs
• People’s use of technology in their everyday lives can be evaluated in living labs.
• Such evaluations are too difficult to do in a usability lab.
• E.g., http://www.sfu.ca/westhouse.html
  - Currently occupied by participants!
Usability testing & field
studies can be complementary
Evaluation case studies
• Experiment to investigate a computer game
  - Regan L. Mandryk and Kori M. Inkpen. 2004. Physiological indicators for the evaluation of co-located collaborative play. In Proc. ACM Conf. on Computer Supported Cooperative Work (CSCW '04), 102-111. http://doi.acm.org/10.1145/1031607.1031625
• In-the-wild field study of skiers
• Crowdsourcing
Challenge & engagement in a collaborative immersive game
• Physiological measures were used.
• Engagement was measured in two conditions:
  1. Playing against another person
  2. Playing against a computer.
What does this data tell you?
Rate each experience state on a scale from 1 to 5 (1 = low, 5 = high)

              Playing against computer    Playing against friend
              Mean       St. Dev.         Mean       St. Dev.
Boring        2.3        0.949            1.7        0.949
Challenging   3.6        1.08             3.9        0.994
Easy          2.7        0.823            2.5        0.850
Engaging      3.8        0.422            4.3        0.675
Exciting      3.5        0.527            4.1        0.568
Frustrating   2.8        1.14             2.5        0.850
Fun           3.9        0.738            4.6        0.699

Source: Mandryk and Inkpen (2004).
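As a quick check on tables like the one above, the means and sample standard deviations can be recomputed from raw ratings with Python's standard library. The per-participant ratings below are invented for illustration (the raw data is not in the slides); they are chosen so the means match the table's "Fun" row, though the standard deviations will differ.

```python
import statistics

# Hypothetical per-participant "Fun" ratings (1-5 scale); NOT the real raw
# data, which does not appear in the slides.
fun_vs_computer = [4, 3, 4, 5, 4, 3, 4, 4, 4, 4]
fun_vs_friend = [5, 4, 5, 5, 4, 5, 4, 5, 5, 4]

for label, ratings in [("computer", fun_vs_computer), ("friend", fun_vs_friend)]:
    mean = statistics.mean(ratings)   # arithmetic mean; prints 3.9 and 4.6
    sd = statistics.stdev(ratings)    # sample standard deviation (n-1 denominator)
    print(f"vs {label}: mean={mean:.1f}, st. dev.={sd:.3f}")
```

Note that `statistics.stdev` uses the sample (n − 1) formula, which is what evaluation reports usually quote.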
Question
• What precautionary measures did the evaluators have to take?
Evaluation case studies
• Experiment to investigate a computer game
• In-the-wild field study of skiers
  - Francis Jambon and Brigitte Meillon. 2009. User experience evaluation in the wild. In Ext. Abstracts of the Int. Conf. on Human Factors in Computing Systems (CHI EA '09), 4069-4074. http://doi.acm.org/10.1145/1520340.1520619
• Crowdsourcing
Why study skiers in the wild?
Jambon and Meillon (2009) User experience evaluation in the wild. In: Ext. Abstracts CHI '09, ACM Press, New York, p. 4072.
e-skiing system components
Jambon and Meillon (2009) User experience evaluation in the wild. In: Ext. Abstracts CHI '09, ACM Press, New York, p. 4072.
Evaluation case studies
• Experiment to investigate a computer game
• In-the-wild field study of skiers
• Crowdsourcing
  - Jeffrey Heer and Michael Bostock. 2010. Crowdsourcing graphical perception: using Mechanical Turk to assess visualization design. In Proc. of the Int. Conf. on Human Factors in Computing Systems (CHI '10), ACM, New York, NY, USA, 203-212. http://doi.acm.org/10.1145/1753326.1753357
Crowdsourcing: when might you use it?
Evaluating an ambient system
• The Hello Wall is a new kind of system that is designed to explore how people react to its presence.
• What are the challenges of evaluating systems like this?
Evaluation methods

Method           Controlled settings   Natural settings   Without users
Observing                x                     x
Asking users             x                     x
Asking experts                                 x                 x
Testing                  x
Modeling                                                         x
The language of evaluation
Analytics
Analytical evaluation
Controlled experiment
Expert review or crit
Field study
Formative evaluation
Heuristic evaluation
In the wild study
Living laboratory
Predictive evaluation
Summative evaluation
Usability laboratory
User studies
Usability testing
Users or participants
Key points
• Evaluation & design are closely integrated in user-centered design.
• Some of the same techniques are used in evaluation as for establishing requirements, but they are used differently (e.g., observation, interviews & questionnaires).
• Three types of evaluation: laboratory-based with users, in the field with users, and studies that do not involve users.
• The main methods are: observing, asking users, asking experts, user testing, inspection, modeling users’ task performance, and analytics.
• Dealing with constraints is an important skill for evaluators to develop.
Butterfly Ballot
• “The Butterfly Ballot: Anatomy of a Disaster” was written by Bruce Tognazzini:
  http://www.asktog.com/columns/042ButterflyBallot.html
• Try it as a design exercise: create a similar ballot, conduct a usability test (did people actually vote as they intended?), create a modified ballot, conduct a second test, and see if the results improve.
AN EVALUATION FRAMEWORK
The aims are:
• Introduce and explain the DECIDE framework.
• Discuss the conceptual, practical, and ethical issues involved in evaluation.
DECIDE: a framework to guide evaluation
• Determine the goals.
• Explore the questions.
• Choose the evaluation methods.
• Identify the practical issues.
• Decide how to deal with the ethical issues.
• Evaluate, analyze, interpret and present the data.
Determine the goals
• What are the high-level goals of the evaluation?
• Who wants it and why?
• The goals influence the methods used for the study.
• Goals vary and could be to:
  - identify the best metaphor for the design;
  - check that user requirements are met;
  - check for consistency;
  - investigate how technology affects working practices;
  - improve the usability of an existing product.
Explore the questions
• Questions help to guide the evaluation.
• The goal of finding out why some customers prefer to purchase paper airline tickets rather than e-tickets can be broken down into sub-questions:
  - What are customers’ attitudes to e-tickets?
  - Are they concerned about security?
  - Is the interface for obtaining them poor?
• What questions might you ask about the design of a cell phone?
Choose the evaluation approach & methods
• The evaluation method influences how data is collected, analyzed and presented.
• E.g., field studies typically:
  - involve observation and interviews;
  - involve users in natural settings;
  - do not involve controlled tests;
  - produce qualitative data.
Identify practical issues
For example, how to:
• select users;
• find evaluators;
• select equipment;
• stay on budget;
• stay on schedule.
Decide about ethical issues
• Develop an informed consent form.
• Participants have a right to:
  - know the goals of the study;
  - know what will happen to the findings;
  - privacy of personal information;
  - leave when they wish;
  - be treated politely.
Evaluate, interpret & present data
• The methods used influence how data is evaluated, interpreted and presented.
• The following need to be considered:
  - Reliability: can the study be replicated?
  - Validity: is it measuring what you expected?
  - Biases: is the process creating biases?
  - Scope: can the findings be generalized?
  - Ecological validity: is the environment influencing the findings? (e.g., the Hawthorne effect)
Key points
• There are many issues to consider before conducting an evaluation study.
• These include: the goals of the study; whether or not users are involved; the methods to use; practical & ethical issues; and how data will be collected, analyzed & presented.
• The DECIDE framework provides a useful checklist for planning an evaluation study.
An exercise left for you…
• Find an evaluation study (e.g., in the proceedings of CHI, CSCW, or SOUPS).
• Use the DECIDE framework to analyze it.
• Describe which aspects of DECIDE are explicitly addressed in the report and which are not.
• On a scale of 1-5, where 1 = poor and 5 = excellent, how would you rate this study?
EVALUATION STUDIES: FROM CONTROLLED TO NATURAL SETTINGS
The aims:
• Explain how to do usability testing
• Outline the basics of experimental design
• Describe how to do field studies
Usability testing
• Involves recording the performance of typical users doing typical tasks.
• Controlled settings.
• Users are observed and timed.
• Data is recorded on video & key presses are logged.
• The data is used to calculate performance times, and to identify & explain errors.
• User satisfaction is evaluated using questionnaires & interviews.
• Field observations may be used to provide contextual understanding.
Experiments & usability testing
• Experiments test hypotheses to discover new knowledge by investigating the relationship between two or more things, i.e., variables.
• Usability testing is applied experimentation.
• Developers check that the system is usable by the intended user population for their tasks.
• Experiments may also be done in usability testing.
Usability testing vs. research

Usability testing:
• Improve products
• Few participants
• Results inform design
• Usually not completely replicable
• Conditions controlled as much as possible
• Procedure planned
• Results reported to developers

Experiments for research:
• Discover knowledge
• Many participants
• Results validated statistically
• Must be replicable
• Strongly controlled conditions
• Experimental design
• Scientific report to scientific community
Usability testing
• Goals & questions focus on how well users perform tasks with the product.
• Comparison of products or prototypes is common.
• Focus is on time to complete a task & the number & type of errors.
• Data is collected by video & interaction logging.
• Testing is central.
• User satisfaction questionnaires & interviews provide data about users’ opinions.
Usability lab with observers
watching a user & assistant
Portable equipment for use in
the field
Mobile head-mounted eye
tracker
Picture courtesy of SensoMotoric Instruments (SMI), copyright 2010
Testing conditions
• Usability lab or other controlled space.
• Emphasis on:
  - selecting representative users;
  - developing representative tasks.
• 5-10 users typically selected.
• Tasks usually last no more than 30 minutes.
• The test conditions should be the same for every participant.
• Informed consent form explains procedures and deals with ethical issues.
Typical metrics
• Time to complete a task.
• Time to complete a task after a specified time away from the product.
• Number and type of errors per task.
• Number of errors per unit of time.
• Number of navigations to online help or manuals.
• Number of users making a particular error.
• Number of users completing a task successfully.
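Metrics like these can be computed from a simple interaction log. A minimal sketch in Python follows; the log format and event names (start, error, complete, abandon) are invented for illustration, and real usability-testing tools record much richer data.

```python
from datetime import datetime

# Hypothetical interaction log: (timestamp, user, task, event).
log = [
    ("2011-05-01 10:00:00", "u1", "task1", "start"),
    ("2011-05-01 10:00:40", "u1", "task1", "error"),
    ("2011-05-01 10:01:30", "u1", "task1", "complete"),
    ("2011-05-01 10:00:00", "u2", "task1", "start"),
    ("2011-05-01 10:02:10", "u2", "task1", "abandon"),
]

def task_time_seconds(log, user, task):
    """Time-to-complete for one user and task, or None if not completed."""
    times = {event: datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
             for ts, u, t, event in log if u == user and t == task}
    if "start" in times and "complete" in times:
        return (times["complete"] - times["start"]).total_seconds()
    return None

# Error count and number of users completing the task successfully.
errors = sum(1 for *_, event in log if event == "error")
completions = sum(1 for *_, event in log if event == "complete")
print(task_time_seconds(log, "u1", "task1"))  # 90.0 seconds
print(errors, completions)
```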
Other metrics?
 Not everything is performance driven…
Usability engineering orientation
• Aim is improvement with each version.
• Current level of performance.
• Minimum acceptable level of performance.
• Target level of performance.
How many participants is enough for user testing?
• The number is a practical issue.
• Depends on:
  - schedule for testing;
  - availability of participants;
  - cost of running tests.
• Typically 5-10 participants.
• Some experts argue that testing should continue until no new insights are gained.
Name 3 features of the iPad that could be tested by usability testing
CASE STUDY: IPAD EVALUATION
Nielsen Norman Group study
http://www.nngroup.com/reports/mobile/ipad/
• “These reports are based on 2 rounds of usability studies with real users, reporting how they actually used a broad variety of iPad apps as well as websites accessed on the iPad.”
• “The first edition of our iPad report (from 2010) describes the original research we conducted with the first crop of iPad apps immediately after the launch of this tablet device.”
• “The second edition of our iPad report (from 2011) describes the follow-up research we conducted a year later, after apps and websites had been given the chance to improve based on the early findings.”
Research Goals
• Understand how the interactions with the device affected people.
• Get feedback to their clients and developers.
• Report on whether the iPad lived up to the hype.
• Are user expectations different for the iPad compared with the iPhone?
Methodology: Usability Testing
• With a think-aloud protocol.
• Conducted in Chicago and Fremont, CA.
• Aim of understanding the typical usability issues that people encounter when using apps and accessing websites on the iPad.
• 7 participants:
  - Experienced iPhone users (3+ months experience) who used a variety of apps
  - Varied ages (20s to 60s) and occupations
  - 3 male, 4 female
Questions:
• Do you think the selection of participants was appropriate? Why?
• What problems may occur when asking participants to think out loud as they complete tasks?
The tests
• 90 minutes of exploring applications on the iPad (self-directed exploration).
• Researcher sat beside the participant, observed, and took notes.
• Session was video recorded.
• Specific tasks:
  - Open a specific app/website.
  - Carry out one or more tasks “as if they were on their own”.
• Balanced presentation order of tasks.
• 60 tasks from 32 different sites.
The Equipment
• Camera (pointing down at the iPad) recorded the participant’s interactions and gestures when using the iPad.
• Recording streamed to a laptop computer.
• Webcam used to record the expressions on the participants’ faces and their think-aloud commentary.
• Laptop ran Morae, which synched the two data streams.
• Up to 3 observers watched the video streams (including the in-room moderator) so that the person’s personal space was not invaded.
App or website   Task

iBook            Download a free copy of Alice’s Adventures in Wonderland and read through the first few pages.
Craigslist       Find some free mulch for your garden.
eBay             You want to buy a new iPad on eBay. Find one that you could buy from a reputable seller.
Time Magazine    Browse through the magazine and find the best pictures of the week.
Epicurious       You want to make an apple pie for tonight. Find a recipe and see what you need to buy in order to prepare it.
Kayak            You are planning a trip to Death Valley in May this year. Find a hotel located in or close to the park.

How representative do you think these tasks are?
Question
• What aspects are considered to be important for good usability and user experience in this case?
Metrics of interest

Usability:
• Efficient
• Effective
• Safe
• Easy to learn
• Easy to remember
• Have good utility

User Experience:
• Support creativity
• Be motivating
• Be helpful
• Be satisfying to use
Results
• Website interactions were not optimal:
  - Links too small to tap reliably.
  - Fonts could be difficult to read.
• Participants sometimes did not know where to tap on the iPad to select options such as buttons and menus.
• Participants got lost (no back button to return to the home page).
• Inconsistencies between portrait and landscape presentations.
Usability Experiments
• Predict the relationship between two or more variables.
• The independent variable is manipulated by the researcher.
• The dependent variable depends on the independent variable.
• Typical experimental designs have one or two independent variables.
• Validated statistically & replicable.
Experimental designs
• Between subjects: different participants - each participant is randomly allocated to a single experimental condition.
• Within subjects: same participants - all participants appear in both conditions.
• Matched participants: participants are matched in pairs, e.g., based on expertise, gender, etc.
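The designs above differ mainly in how participants are allocated to conditions. A small Python sketch (the participant and condition names are invented for illustration) of between-subjects random allocation and within-subjects counterbalancing:

```python
import itertools
import random

participants = [f"p{i}" for i in range(1, 9)]
conditions = ["vs_computer", "vs_friend"]

# Between subjects: each participant is randomly assigned to ONE condition.
rng = random.Random(42)  # fixed seed so the sketch is reproducible
shuffled = participants[:]
rng.shuffle(shuffled)
between = {p: conditions[i % 2] for i, p in enumerate(shuffled)}

# Within subjects: every participant does BOTH conditions; the ORDER is
# counterbalanced across participants to cancel out ordering effects
# (practice, fatigue).
orders = list(itertools.permutations(conditions))  # both possible orders
within = {p: orders[i % len(orders)] for i, p in enumerate(participants)}

print(between)
print(within["p1"], within["p2"])  # consecutive participants get opposite orders
```

For more than two conditions, a Latin square is often used instead of full permutation counterbalancing, since the number of orders grows factorially.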
Between, within, matched participant design

Design    Advantages                              Disadvantages
Between   No order effects                        Many subjects needed & individual
                                                  differences are a problem
Within    Few individuals, no individual          Counterbalancing needed because of
          differences                             ordering effects
Matched   Same as different participants but      Cannot be sure of perfect matching
          individual differences reduced          on all differences
Field studies
• Field studies are done in natural settings.
• “In the wild” is a term for prototypes being used freely in natural settings.
• Aim to understand what users do naturally and how technology impacts them.
• Field studies are used in product design to:
  - identify opportunities for new technology;
  - determine design requirements;
  - decide how best to introduce new technology;
  - evaluate technology in use.
Data collection & analysis
• Observation & interviews
  - Notes, pictures, recordings
  - Video
  - Logging
• Analyses
  - Data is categorized
  - Categories can be provided by theory:
    - Grounded theory
    - Activity theory
Data presentation
• The aim is to show how the products are being appropriated and integrated into their surroundings.
• Typical presentation forms include: vignettes, excerpts, critical incidents, patterns, and narratives.
Key points 1
• Usability testing is done in controlled conditions.
• Usability testing is an adapted form of experimentation.
• Experiments aim to test hypotheses by manipulating certain variables while keeping others constant.
• The experimenter controls the independent variable(s) but not the dependent variable(s).
Key points 2
• There are three types of experimental design: different participants, same participants, & matched participants.
• Field studies are done in natural environments.
• “In the wild” is a recent term for studies in which a prototype is freely used in a natural setting.
• Typically observation and interviews are used to collect field study data.
• Data is usually presented as anecdotes, excerpts, critical incidents, patterns and narratives.