Evaluate

Evaluation, cont’d
Two main types of evaluation

Formative evaluation is done at
different stages of development to
check that the product meets users’
needs.

Summative evaluation assesses the
quality of a finished product.
Our focus is on formative evaluation
What to evaluate
• Iterative design & evaluation is a continuous process that examines:
  - Early ideas for the conceptual model
  - Early prototypes of the new system
  - Later, more complete prototypes
• Designers need to check that they understand users' requirements.
Tog says …
“Iterative design, with its repeating cycle
of design and testing, is the only
validated methodology in existence that
will consistently produce successful
results. If you don’t have user-testing as
an integral part of your design process
you are going to throw buckets of money
down the drain.”
When to evaluate
Throughout design:
• From the first descriptions, sketches, etc. of users' needs through to the final product
• Design proceeds through iterative cycles of 'design-test-redesign'
• Evaluation is a key ingredient of a successful design.

Another example - development of "HutchWorld"
• Many informal meetings with patients, carers & medical staff early in design
• Early prototype informally tested on site
  - Designers learned a lot
    • language of designers & users was different
    • asynchronous communication was also needed
• Redesigned to produce the portal version
Usability testing
• User tasks investigated:
  - how users' identity was represented
  - communication
  - information searching
  - entertainment
• User satisfaction questionnaire
• Triangulation to get different perspectives
Findings from the usability test
• The back button didn’t always work
• Users didn’t pay attention to navigation
buttons
• Users expected all objects in the 3-D view
to be clickable.
• Users did not realize that there could be others in the 3-D world with whom to chat.
• Users tried to chat to the participant list.
Key points
• Evaluation & design are closely integrated in user-centered design.
• Some of the same techniques are used in evaluation & requirements, but they are used differently (e.g., interviews & questionnaires).
• Triangulation involves using a combination of techniques to gain different perspectives.
• Dealing with constraints is an important skill for evaluators to develop.
A case in point …

“The Butterfly Ballot: Anatomy of
disaster”.
See
http://www.asktog.com/columns/042ButterflyBallot.html
An evaluation framework
The aims
• Explain key evaluation concepts & terms.
• Describe the evaluation paradigms & techniques used in interaction design.
• Discuss the conceptual, practical and ethical issues that must be considered when planning evaluations.
• Introduce the DECIDE framework.
Evaluation paradigm
Any kind of evaluation is guided explicitly or implicitly by a set of beliefs, which are often underpinned by theory. These beliefs and the methods associated with them are known as an 'evaluation paradigm'.
User studies
User studies involve looking at how
people behave in their natural
environments, or in the laboratory, both
with old technologies and with new
ones.
Four evaluation paradigms
• 'quick and dirty'
• usability testing
• field studies
• predictive evaluation
Quick and dirty
• 'Quick & dirty' evaluation describes the common practice in which designers informally get feedback from users or consultants to confirm that their ideas are in line with users' needs and are liked.
• Quick & dirty evaluations can be done at any time.
• The emphasis is on fast input to the design process rather than carefully documented findings.
Usability testing
• Usability testing involves recording typical users' performance on typical tasks in controlled settings. Field observations may also be used.
• As the users perform these tasks they are watched & recorded on video & their key presses are logged.
• This data is used to calculate performance times, identify errors & help explain why the users did what they did.
• User satisfaction questionnaires & interviews are used to elicit users' opinions.
Field studies
• Field studies are done in natural settings.
• The aim is to understand what users do naturally and how technology impacts them.
• In product design, field studies can be used to:
  - identify opportunities for new technology
  - determine design requirements
  - decide how best to introduce new technology
  - evaluate technology in use.
Predictive evaluation
• Experts apply their knowledge of typical users, often guided by heuristics, to predict usability problems.
• Another approach involves theoretically based models.
• A key feature of predictive evaluation is that users need not be present.
• Relatively quick & inexpensive.
Overview of techniques
• observing users
• asking users their opinions
• asking experts their opinions
• testing users' performance
• modeling users' task performance
DECIDE: A framework to guide evaluation
• Determine the goals the evaluation addresses.
• Explore the specific questions to be answered.
• Choose the evaluation paradigm and techniques to answer the questions.
• Identify the practical issues.
• Decide how to deal with the ethical issues.
• Evaluate, interpret and present the data.
Determine the goals
• What are the high-level goals of the evaluation?
• Who wants it and why?
• The goals influence the paradigm for the study.
• Some examples of goals:
  - Identify the best metaphor on which to base the design.
  - Check to ensure that the final interface is consistent.
  - Investigate how technology affects working practices.
  - Improve the usability of an existing product.
Explore the questions
• All evaluations need goals & questions to guide them so time is not wasted on ill-defined studies.
• For example, the goal of finding out why many customers prefer to purchase paper airline tickets rather than e-tickets can be broken down into sub-questions:
  - What are customers' attitudes to these new tickets?
  - Are they concerned about security?
  - Is the interface for obtaining them poor?
• What questions might you ask about the design of a cell phone?
Choose the evaluation paradigm & techniques
• The evaluation paradigm strongly influences the techniques used and how data is analyzed and presented.
• E.g., field studies do not involve testing or modeling.
Identify practical issues
For example, how to:
• select users
• stay on budget
• stay on schedule
• find evaluators
• select equipment
Decide on ethical issues
• Develop an informed consent form.
• Participants have a right to:
  - know the goals of the study
  - know what will happen to the findings
  - privacy of personal information
  - not to be quoted without their agreement
  - leave when they wish
  - be treated politely
Evaluate, interpret & present data
• How data is analyzed & presented depends on the paradigm and techniques used.
• The following also need to be considered:
  - Reliability: can the study be replicated?
  - Validity: is it measuring what you thought?
  - Biases: is the process creating biases?
  - Scope: can the findings be generalized?
  - Ecological validity: is the environment of the study influencing it? (e.g., the Hawthorne effect)
Pilot studies
• A small trial run of the main study.
• The aim is to make sure your plan is viable.
• Pilot studies check:
  - that you can conduct the procedure
  - that interview scripts, questionnaires, experiments, etc. work appropriately
• It's worth doing several to iron out problems before doing the main study.
• Ask colleagues if you can't spare real users.
Key points
• An evaluation paradigm is an approach that is influenced by particular theories and philosophies.
• Five categories of techniques were identified: observing users, asking users, asking experts, user testing, modeling users.
• The DECIDE framework has six parts:
  - Determine the overall goals
  - Explore the questions that satisfy the goals
  - Choose the paradigm and techniques
  - Identify the practical issues
  - Decide on the ethical issues
  - Evaluate ways to analyze & present data
Observing users
The aims
• Discuss the benefits & challenges of different types of observation.
• Describe how to observe as an on-looker, a participant, & an ethnographer.
• Discuss how to collect, analyze & present observational data.
• Examine think-aloud, diary studies & logging.
• Provide you with experience in doing observation and critiquing observation studies.
What and when to observe
• Goals & questions determine the paradigms and techniques used.
• Observation is valuable any time during design.
• Quick & dirty observations are useful early in design.
• Observation can be done in the field (i.e., field studies) and in controlled environments (i.e., usability studies).
• Observers can be:
  - outsiders looking on
  - participants, i.e., participant observers
  - ethnographers
Frameworks to guide observation
A simple framework:
- The person. Who?
- The place. Where?
- The thing. What?
The Goetz and LeCompte (1984) framework:
- Who is present?
- What is their role?
- What is happening?
- When does the activity occur?
- Where is it happening?
- Why is it happening?
- How is the activity organized?
The Robinson (1993) framework
• Space. What is the physical space like?
• Actors. Who is involved?
• Activities. What are they doing?
• Objects. What objects are present?
• Acts. What are individuals doing?
• Events. What kind of event is it?
• Goals. What do they want to accomplish?
• Feelings. What is the mood of the group and of individuals?
You need to consider
• Goals & questions
• Which framework & techniques
• How to collect data
• Which equipment to use
• How to gain acceptance
• How to handle sensitive issues
• Whether and how to involve informants
• How to analyze the data
• Whether to triangulate
Observing as an outsider
• As in usability testing
• More objective than participant observation
• In a usability lab the equipment is in place
• Recording is continuous
• Analysis & observation are almost simultaneous
• Care is needed to avoid drowning in data
• Analysis can be coarse or fine grained
• Video clips can be powerful for telling the story
Participant observation & ethnography
• Debate about the differences
• Participant observation is a key component of ethnography
• Must get the co-operation of the people observed
• Informants are useful
• Data analysis is continuous
• An interpretivist technique
• Questions get refined as understanding grows
• Reports usually contain examples
Data collection techniques
• Notes & still camera
• Audio & still camera
• Video
• Tracking users:
  - diaries
  - interaction logging (see the sketch below)
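Interaction logging is simple to set up. Below is a minimal sketch, assuming a timestamped JSON-lines log file (the file name and event names are hypothetical); each user event is appended so that performance times and error counts can be computed later.

```python
import json
import time

LOG_PATH = "interaction_log.jsonl"  # hypothetical log file

def log_event(user_id, event, detail=""):
    """Append one timestamped interaction event to the log."""
    record = {"t": time.time(), "user": user_id, "event": event, "detail": detail}
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps(record) + "\n")

# Example events from a usability-test session
log_event("P01", "keypress", "s")
log_event("P01", "menu_select", "File > Save As")
```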
Data analysis
• Qualitative data - interpreted & used to tell the 'story' about what was observed.
• Qualitative data - categorized using techniques such as content analysis.
• Quantitative data - collected from interaction & video logs. Presented as values, tables, charts, graphs and treated statistically.
Interpretive data analysis
• Look for key events that drive the group's activity
• Look for patterns of behavior
• Test data sources against each other - triangulate
• Report findings in a convincing and honest way
• Produce 'rich' or 'thick descriptions'
• Include quotes, pictures, and anecdotes
• Software tools can be useful, e.g., NUDIST, Ethnograph (URLs will be provided)
Looking for patterns
• Critical incident analysis
• Content analysis
• Discourse analysis
• Quantitative analysis - i.e., statistics
Key points
• Observe from outside or as a participant.
• Analyzing video and data logs can be time-consuming.
• In participant observation, collections of comments, incidents, and artifacts are made.
• Ethnography is a philosophy with a set of techniques that include participant observation and interviews.
• Ethnographers immerse themselves in the culture that they study.
Asking users & experts
The aims
• Discuss the role of interviews & questionnaires in evaluation.
• Teach basic questionnaire design.
• Describe how to do interviews, heuristic evaluation & walkthroughs.
• Describe how to collect, analyze & present data.
• Discuss strengths & limitations of these techniques.
Interviews
• Unstructured - not directed by a script. Rich but not replicable.
• Structured - tightly scripted, often like a questionnaire. Replicable but may lack richness.
• Semi-structured - guided by a script, but interesting issues can be explored in more depth. Can provide a good balance between richness and replicability.
Basics of interviewing
• Remember the DECIDE framework.
• Goals and questions guide all interviews.
• Two types of questions:
  - 'closed questions' have a predetermined answer format, e.g., 'yes' or 'no'
  - 'open questions' do not have a predetermined format
• Closed questions are quicker and easier to analyze.
Things to avoid when preparing interview questions
• Long questions
• Compound sentences - split them into two
• Jargon & language that the interviewee may not understand
• Leading questions that make assumptions, e.g., "Why do you like ...?"
• Unconscious biases, e.g., gender stereotypes
Components of an interview
• Introduction - introduce yourself, explain the goals of the interview, reassure about the ethical issues, ask to record, present an informed consent form.
• Warm-up - make first questions easy & non-threatening.
• Main body - present questions in a logical order.
• A cool-off period - include a few easy questions to defuse tension at the end.
• Closure - thank interviewee, signal the end, e.g., switch recorder off.
The interview process
• Use the DECIDE framework for guidance.
• Dress in a similar way to participants.
• Check recording equipment in advance.
• Devise a system for coding names of participants to preserve confidentiality.
• Be pleasant.
• Ask participants to complete an informed consent form.
Probes and prompts
• Probes - devices for getting more information, e.g., 'Would you like to add anything?'
• Prompts - devices to help the interviewee, e.g., help with remembering a name.
• Remember that probing and prompting should not create bias.
• Too much can encourage participants to try to guess the answer.
Group interviews
• Also known as 'focus groups'
• Typically 3-10 participants
• Provide a diverse range of opinions
• Need to be managed to ensure that:
  - everyone contributes
  - discussion isn't dominated by one person
  - the agenda of topics is covered
Analyzing interview data
• Depends on the type of interview.
• Structured interviews can be analyzed like questionnaires.
• Unstructured interviews generate data like that from participant observation.
• It is best to analyze unstructured interviews as soon as possible to identify topics and themes from the data.
Questionnaires
• Questions can be closed or open.
• Closed questions are easiest to analyze, and analysis may be done by computer.
• Can be administered to large populations.
• Paper, email & the web are used for dissemination.
• An advantage of electronic questionnaires is that data goes into a database & is easy to analyze.
• Sampling can be a problem when the size of a population is unknown, as is common online.
Questionnaire style
• Varies according to goal, so use the DECIDE framework for guidance.
• Questionnaire format can include:
  - 'yes'/'no' checkboxes
  - checkboxes that offer many options
  - Likert rating scales
  - semantic differential scales
  - open-ended responses
• Likert scales have a range of points.
• 3, 5, 7 & 9 point scales are common.
• There is debate about which is best.
Developing a questionnaire
• Provide a clear statement of purpose & guarantee participants anonymity.
• Plan questions - if developing a web-based questionnaire, design it off-line first.
• Decide whether phrases will all be positive, all negative or mixed (see the scoring sketch below).
• Pilot test questions - are they clear, is there sufficient space for responses?
• Decide how data will be analyzed & consult a statistician if necessary.
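As an illustration of the point about positive, negative, or mixed phrasing, here is a minimal sketch of scoring Likert responses numerically, assuming a 5-point scale; negatively worded items are reverse-scored so that all items point the same way. The labels and function names are hypothetical.

```python
# Scoring 5-point Likert responses; reverse-score negatively worded items.
SCALE = {"strongly disagree": 1, "disagree": 2, "neutral": 3,
         "agree": 4, "strongly agree": 5}

def score(response, negatively_worded=False):
    value = SCALE[response.lower()]
    # On a 5-point scale, reverse scoring maps 1<->5, 2<->4, and leaves 3 unchanged.
    return 6 - value if negatively_worded else value

print(score("agree"))                          # 4
print(score("agree", negatively_worded=True))  # 2
```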
Encouraging a good response
• Make sure the purpose of the study is clear.
• Promise anonymity.
• Ensure the questionnaire is well designed.
• Offer a short version for those who do not have time to complete a long questionnaire.
• If mailed, include a stamped addressed envelope.
• Follow up with emails, phone calls, letters.
• Provide an incentive.
• A 40% response rate is high; 20% is often acceptable.
Advantages of online questionnaires
• Responses are usually received quickly.
• No copying and postage costs.
• Data can be collected in a database for analysis.
• Time required for data analysis is reduced.
• Errors can be corrected easily.
Disadvantages:
• Sampling is problematic if the population size is unknown.
• Preventing individuals from responding more than once is difficult.
Problems with online questionnaires
• Sampling is problematic if the population size is unknown.
• Preventing individuals from responding more than once is difficult.
• Individuals have also been known to change questions in email questionnaires.
Questionnaire data analysis & presentation
• Present results clearly - tables may help.
• Simple statistics can say a lot, e.g., mean, median, mode, standard deviation (see the sketch below).
• Percentages are useful, but give the population size.
• Bar graphs show categorical data well.
• More advanced statistics can be used if needed.
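A minimal sketch of the simple statistics mentioned above, using Python's standard library; the response data are hypothetical ratings from one closed question.

```python
import statistics

responses = [4, 5, 3, 4, 2, 5, 4, 4, 3, 5]  # hypothetical 5-point ratings

print("n       :", len(responses))
print("mean    :", statistics.mean(responses))
print("median  :", statistics.median(responses))
print("mode    :", statistics.mode(responses))
print("st. dev.:", round(statistics.stdev(responses), 2))

# Percentages are useful, but report the population size alongside them.
satisfied = sum(r >= 4 for r in responses)
print(f"rated 4 or 5: {satisfied}/{len(responses)} ({satisfied / len(responses):.0%})")
```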
Well-known forms
• SUMI
• MUMMS
• QUIS
(see the Perlman site)
Asking experts
• Experts use their knowledge of users & technology to review software usability.
• Expert critiques (crits) can be formal or informal reports.
• Heuristic evaluation is a review guided by a set of heuristics.
• Walkthroughs involve stepping through a pre-planned scenario, noting potential problems.
Heuristic evaluation
• Developed by Jakob Nielsen in the early 1990s.
• Based on heuristics distilled from an empirical analysis of 249 usability problems.
• These heuristics have been revised for current technology, e.g., HOMERUN for the web.
• Heuristics are still needed for mobile devices, wearables, virtual worlds, etc.
• Design guidelines form a basis for developing heuristics.
Nielsen’s heuristics










Visibility of system status
Match between system and real world
User control and freedom
Consistency and standards
Help users recognize, diagnose, recover
from errors
Error prevention
Recognition rather than recall
Flexibility and efficiency of use
Aesthetic and minimalist design
Help and documentation
Discount evaluation
• Heuristic evaluation is referred to as discount evaluation when 5 evaluators are used.
• Empirical evidence suggests that on average 5 evaluators identify 75-80% of usability problems (see the sketch below).
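The 75-80% figure can be illustrated with the commonly cited Nielsen & Landauer-style model, in which each evaluator independently finds a fixed proportion of the problems. This is only a sketch: the per-evaluator detection rate `lam` is an assumed parameter, chosen here so that five evaluators land in the range quoted on the slide.

```python
def proportion_found(evaluators, lam=0.26):
    """Expected share of known usability problems found by n evaluators,
    assuming each independently detects a fraction `lam` of them."""
    return 1 - (1 - lam) ** evaluators

for n in range(1, 11):
    print(f"{n:2d} evaluators -> {proportion_found(n):.0%}")
# With lam = 0.26, five evaluators find roughly 78% of the problems.
```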
3 stages for doing heuristic evaluation
• Briefing session to tell experts what to do.
• Evaluation period of 1-2 hours in which:
  - each expert works separately
  - take one pass to get a feel for the product
  - take a second pass to focus on specific features
• Debriefing session in which experts work together to prioritize problems (a sketch of one way to record and rank findings follows).
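One way to support the debriefing session is to pool each expert's findings and rank them, for example by mean severity. The sketch below assumes a simple 0-4 severity rating and hypothetical findings; it is not prescribed by the slides.

```python
from collections import defaultdict

findings = [
    # (evaluator, heuristic violated, problem, severity 0-4)
    ("E1", "Visibility of system status", "No feedback after Save", 3),
    ("E2", "Visibility of system status", "No feedback after Save", 4),
    ("E2", "Consistency and standards", "Two different terms for 'delete'", 2),
]

by_problem = defaultdict(list)
for _evaluator, heuristic, problem, severity in findings:
    by_problem[(heuristic, problem)].append(severity)

# Rank problems by mean severity for the debriefing discussion.
ranked = sorted(by_problem.items(),
                key=lambda kv: sum(kv[1]) / len(kv[1]), reverse=True)
for (heuristic, problem), severities in ranked:
    print(f"{sum(severities) / len(severities):.1f}  {heuristic}: {problem}")
```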
Advantages and problems
• Few ethical & practical issues to consider.
• Can be difficult & expensive to find experts.
• The best experts have knowledge of the application domain & users.
• Biggest problems:
  - important problems may get missed
  - many trivial problems are often identified
Cognitive walkthroughs
• Focus on ease of learning.
• Designer presents an aspect of the design & usage scenarios.
• One or more experts walk through the design prototype with the scenario.
• Experts are told the assumptions about the user population, context of use, task details.
• Experts are guided by 3 questions.
The 3 questions
• Will the correct action be sufficiently evident to the user?
• Will the user notice that the correct action is available?
• Will the user associate and interpret the response from the action correctly?
As the experts work through the scenario they note problems.
Pluralistic walkthrough
• A variation on the cognitive walkthrough theme.
• Performed by a carefully managed team.
• The panel of experts begins by working separately.
• Then there is a managed discussion that leads to agreed decisions.
• The approach lends itself well to participatory design.
Key points
• Structured, unstructured, semi-structured interviews, focus groups & questionnaires.
• Closed questions are easiest to analyze & can be replicated.
• Open questions are richer.
• Check boxes, Likert & semantic scales.
• Expert evaluation: heuristic evaluation & walkthroughs.
• Relatively inexpensive because no users are needed.
• Heuristic evaluation is relatively easy to learn.
• May miss key problems & identify false ones.
A project for you …
• Activeworlds.com
• Questionnaire to test reactions with friends
• http://www.acm.org/~perlman/question.html
• http://www.ifsm.umbc.edu/djenni1/osg/
• Develop heuristics to evaluate usability and sociability aspects
A project for you …
• http://www.id-book.com/catherb/ provides heuristics and a template so that you can evaluate different kinds of systems. More information about this is provided in the interactivities section of the id-book.com website.
A project for you …
• Go to The Pew Internet & American Life Survey, www.pewinternet.org/ (or to another survey of your choice).
• Critique one of the recent online surveys.
• Critique a recent survey report.
Interpretive Evaluation
• Contextual inquiry
• Cooperative and participative evaluation
• Ethnography
• Rather than emphasizing statements of goals, objective tests, and research reports, interpretive evaluation emphasizes the usefulness of findings to the people concerned.
• Good for feasibility studies, design feedback, post-implementation review.
Contextual Inquiry
• Users and researchers participate to identify and understand usability problems within the normal working environment of the user.
• Differences from other methods include:
  - work context -- larger tasks
  - time context -- longer times
  - motivational context -- more user control
  - social context -- social support included that is normally lacking in experiments
Why use contextual inquiry?
• Usability issues are located that go undetected in laboratory testing, e.g.:
  - line counting in word processing
  - unpacking and setting up equipment
• Issues are identified by users or by user and evaluator together.
Contextual interview: topics of interest
• Structure and language used in the work
• Individual and group actions and intentions
• The culture affecting the work
• Explicit and implicit aspects of the work
Cooperative evaluation
• A technique to improve a user interface specification by detecting possible usability problems in an early prototype or partial simulation.
• Low cost; little training needed.
• Think-aloud protocols are collected during evaluation.
Cooperative Evaluation
• Typical user(s) recruited.
• Representative tasks selected.
• User verbalizes problems; evaluator makes notes.
• Debriefing sessions held.
• Summarize and report back to the design team.
Participative Evaluation
• More open than cooperative evaluation.
• Subject to greater control by users.
• Cooperative prototyping, facilitated by:
  - focus groups
  - designers working with users to prepare prototypes
  - stable prototypes provided, which users evaluate
  - a tight feedback loop with designers
Ethnography
• Standard practice in anthropology.
• Researchers strive to immerse themselves in the situation they want to learn about.
• Goal: understand the 'real' work situation.
• Typically applies video - videos are viewed, reviewed, logged, analyzed, collections made, often placed in databases, retrieved, visualized ...
Predictive Evaluation
• Predict aspects of usage rather than observe and measure.
• Doesn't involve users.
• Cheaper.
Predictive Evaluation Methods
• Inspection methods:
  - Standards inspections
  - Consistency inspections
  - Heuristic evaluation
  - "Discount" usability evaluation
  - Walkthroughs
• Modelling: the keystroke-level model
Standards inspections
• Standards experts inspect the interface for compliance with specified standards.
• Relatively little task knowledge is required.
Consistency inspections
• Teams of designers inspect a set of interfaces for a family of products.
• Usually one designer from each project.
Usage simulations
• Also known as "expert review" or "expert simulation".
• Experts simulate the behavior of less-experienced users and try to anticipate usability problems.
• More efficient than user trials.
• Prescriptive feedback.
Heuristic evaluation
• A usage simulation in which the system is evaluated against a list of "heuristics" (examples on the next slide).
• Two passes: per screen, and flow from screen to screen.
• Study: 5 evaluators found 75% of problems.
Sample heuristics
• Use simple and natural dialogue
• Speak the user's language
• Minimize user memory load
• Be consistent
• Provide feedback
• Provide clearly marked exits
• Provide shortcuts
• Provide good error messages
• Prevent errors
Discount usability engineering
• Phase 1: usability testing + scenario construction (1-3 users)
• Phase 2: scenarios refined + heuristic evaluation
• "Discount" features:
  - small scenarios, paper mockups
  - informal think-aloud (no psychologists)
  - scenarios + think-aloud + heuristic evaluation
  - small number of heuristics (see previous slide)
  - 2-3 testers sufficient
Walkthroughs
• Goal: detect problems early on and remove them.
• Construct carefully designed tasks from a system specification or screen mockup.
• Walk through the activities required, predict how users would likely behave, and determine the problems they will encounter.
• See the checklist for cognitive walkthroughs.
Modeling: keystroke-level model
• Goal: calculate task performance times for experienced users.
• Requires:
  - a specification of system functionality
  - task analysis, breakdown of each task into its components
Keystroke-level modeling
Time to execute is the sum of:
- Tk - keystroking (0.35 sec)
- Tp - pointing (1.10 sec)
- Td - drawing (problem-dependent)
- Tm - mental (1.35 sec)
- Th - homing (0.40 sec)
- Tr - system response (1.20 sec)
KLM: example
Save a file with a new name in a word processor that uses a mouse and pulldown menus:
(1) initial homing: Th
(2) move cursor to the File menu at the top of the screen: Tp + Tm
(3) select 'Save As' in the File menu (click on the File menu, move down the menu, click on 'Save As'): Tm + Tk + Tp + Tk
(4) word processor prompts for the new file name, user types the filename: Tr + Tm + Tk(filename) + Tk
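Putting the operator times and the step breakdown together, a minimal sketch of the calculation (the 8-character filename is an assumed example value):

```python
# Operator times in seconds, as given on the previous slide.
T = {"k": 0.35, "p": 1.10, "m": 1.35, "h": 0.40, "r": 1.20}

def save_as_time(filename_length=8):
    steps = [
        T["h"],                             # (1) initial homing onto the mouse
        T["p"] + T["m"],                    # (2) point to the File menu
        T["m"] + T["k"] + T["p"] + T["k"],  # (3) open menu, point to 'Save As', click
        T["r"] + T["m"] + filename_length * T["k"] + T["k"],  # (4) prompt, think, type name, confirm
    ]
    return sum(steps)

print(f"Predicted time: {save_as_time():.2f} s")  # about 11.7 s for an 8-character name
```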
Experiments and Benchmarking
• Traditional experiments
• Usability engineering
Traditional Experiments
• Typically narrowly defined; evaluate particular aspects such as:
  - menu depth v. context
  - icon design
  - tickers v. fade-boxes v. replace-boxes
• Usually not practical to include in the design process.
Example: Star Workstation, text selection
• Goal: evaluate methods for selecting text, using 1-3 mouse buttons.
• Operations:
  - Point (between characters; target of move, copy, or insert)
  - Select text (character, word, sentence, paragraph, document)
  - Extend selection to include more text
Selection Schemes
[Table: selection schemes A-G, showing which operations are assigned to mouse buttons 1-3 in each scheme - Point; select Character, Word, Sentence, Paragraph, or Document (C, W, S, P, D); Drawthrough; Adjust.]
Methodology
• Between-subjects paradigm.
• Six groups, 4 subjects per group.
• In each group: 2 experienced with the mouse, 2 not.
• Each subject first trained in use of the mouse and in editing techniques in the Star word-processing system.
• Assigned scheme taught.
• Each subject performs 10 text-editing tasks, 6 times each.
Results: selection time
Mean selection time:
- Scheme A: 12.25 s
- Scheme B: 15.19 s
- Scheme C: 13.41 s
- Scheme D: 13.44 s
- Scheme E: 12.85 s
- Scheme F: 9.89 s (p < 0.001)
Results: Selection Errors
• Average: 1 selection error per four tasks.
• 65% of errors were drawthrough errors, the same across all selection schemes.
• 20% of errors were "too many clicks"; schemes with less clicking did better.
• 15% of errors were "clicked wrong mouse button"; schemes with fewer buttons did better.
Selection scheme: test 2
• Results of test 1 led to the conclusion to avoid:
  - drawthroughs
  - three buttons
  - multiple clicking
• Scheme "G" introduced -- avoids drawthrough, uses only 2 buttons.
• New test, but test groups were 3:1 experienced with the mouse to not.
Results of test 2
• Mean selection time: 7.96 s for scheme G; frequency of "too many clicks" stayed about the same.
• Conclusion: scheme G acceptable
  - selection time is shorter
  - the advantage of quick selection balances the moderate error rate of multi-clicking
Experimental design - concerns
• What to change? What to keep constant? What to measure?
• Hypothesis, stated in a way that can be tested.
• Statistical tests: which ones, why? (a sketch follows below)
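As an example of the kind of test that might be chosen, here is a sketch of comparing two groups' selection times in an independent-subjects design with a t-test. The numbers are hypothetical, not the Star data, and scipy is assumed to be available.

```python
from scipy import stats

# Hypothetical selection times (seconds) for two between-subjects groups.
scheme_a = [12.1, 13.0, 11.8, 12.4]
scheme_f = [9.5, 10.2, 9.8, 10.1]

t, p = stats.ttest_ind(scheme_a, scheme_f)
print(f"t = {t:.2f}, p = {p:.4f}")  # a small p suggests the difference is unlikely to be chance
```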
Variables
• Independent variable - the one the experimenter manipulates (input).
• Dependent variable - affected by the independent variable (output).
• Experimental effect - changes in the dependent variable caused by changes in the independent variable.
• Confounded - when the dependent variable changes because of other variables (task order, learning, fatigue, etc.).
Selecting subjects - avoiding bias
• Age bias -- cover the target age range.
• Gender bias -- equal numbers of males/females.
• Experience bias -- similar level of experience with computers.
• etc. ...
Experimental Designs
• Independent subject design
  - a single group of subjects allocated randomly to each of the experimental conditions (see the allocation sketch below)
• Matched subject design
  - subjects matched in pairs; pairs allocated randomly to each of the experimental conditions
• Repeated measures design
  - all subjects appear in all experimental conditions
  - concerns: order of tasks, learning effects
• Single subject design
  - in-depth experiments on just one subject
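A minimal sketch of random allocation for the independent subject design mentioned above; the subject IDs and the six conditions are hypothetical.

```python
import random

subjects = [f"S{i:02d}" for i in range(1, 25)]  # 24 hypothetical subjects
conditions = ["A", "B", "C", "D", "E", "F"]     # six experimental conditions

random.shuffle(subjects)
# Deal the shuffled subjects out to the conditions, four per group.
groups = {c: subjects[i::len(conditions)] for i, c in enumerate(conditions)}
for condition, members in groups.items():
    print(condition, members)
```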
Critical review of experimental procedure
• User preparation
  - adequate instructions and training?
• Impact of variables
  - how do changes in the independent variables affect users?
• Structure of the tasks
  - were tasks complex enough; did users know the aim?
• Time taken
  - fatigue or boredom?
Critical review of experimental results
• Size of effect
  - statistically significant? Practically significant?
• Alternative interpretations
  - other possible causes for the results found?
• Consistency between dependent variables
  - task completion and error scores versus user preferences and learning scores
• Generalization of results
  - to other tasks, users, working environments?
Usability Engineering
• Usability of the product is specified quantitatively, and in advance.
• As the product is built, it can be demonstrated that it does or does not reach the required levels of usability.
Usability Engineering
• Define usability goals through metrics.
• Set planned levels of usability that need to be achieved.
• Analyze the impact of various design solutions.
• Incorporate user-defined feedback in product design.
• Iterate through the design-evaluate-design loop until planned levels are achieved.
Metrics
Include (see the sketch below):
- time to complete a particular task
- number of errors
- attitude ratings by users
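A sketch of computing these metrics from logged test sessions; the session records and field names are hypothetical.

```python
sessions = [
    {"user": "P01", "task_time_s": 184, "errors": 2, "attitude_1to5": 4},
    {"user": "P02", "task_time_s": 205, "errors": 5, "attitude_1to5": 3},
    {"user": "P03", "task_time_s": 150, "errors": 1, "attitude_1to5": 5},
]

n = len(sessions)
print("mean task time:", sum(s["task_time_s"] for s in sessions) / n, "s")
print("mean errors   :", sum(s["errors"] for s in sessions) / n)
print("mean attitude :", sum(s["attitude_1to5"] for s in sessions) / n)
```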
Metrics - example, conferencing system

Attribute | Measuring concept | Measuring method | Worst case | Planned level | Best case | Now level
Initial use | Conferencing task | Successful interactions / 30 min | 1-2 | 3-4 | 8-10 | ?
Infrequent use | Tasks after 12 weeks' disuse | % of errors | Equal to product Z | 50% better | 0 errors | ?
Learning rate | Task | 1st half vs. 2nd half score | Two halves equal | Second half better | 'Much' better | ?
Preference over product Z | Questionnaire score | Ratio of scores | Same as Z | - | None prefer Z | ?
Preference over product A | Questionnaire score | Ratio of scores | Same as Q | - | None prefer Q | ?
Error recovery | Critical incident analysis | % incidents accounted for | 10% | 50% | 100% | ?
Initial evaluation | Attitude questionnaire | Semantic differential score | 0 (neutral) | 1 (somewhat positive) | 2 (highly positive) | ?
Casual evaluation | Attitude questionnaire | Semantic differential score | 0 (neutral) | 1 (somewhat positive) | 2 (highly positive) | ?
Benchmark tasks
• Carefully constructed standard tests used to monitor users' performance in usability testing.
• Typically use multiple videos and keyboard logging.
• Controlled testing -- a specified set of users, well-specified tasks, a controlled environment.
• Tasks are longer than in scientific experiments, shorter than in "real life".
Making tradeoffs
• Impact analysis - used to establish priorities among usability attributes. It is a listing of attributes and proposed design decisions, and the % impact of each.
• Usability engineering is reported to produce a measurable improvement in usability of about 30%.