The evolution of evaluation

The Evolution of Evaluation
CHI 2007
alt.chi
Joseph ‘Jofish’ Kaye
Phoebe Sengers
Cornell University, Ithaca NY
jofish@cornell.edu
sengers@cs.cornell.edu
30 April 2007
What is evaluation?
• Part of the practice of HCI
• Part of the design-build-evaluate
iterative design cycle
• A comparison of ‘built’ to
‘planned’
• A place to reflect on both this
and the next design
• And…
– A way of defining a field
– The space where a discipline
validates the knowledge it creates.
What is evaluation?
• Something you do at the end of a
project to show it works…
• … so you can publish it.
• A reason papers get rejected
Which, again, are other ways of
saying:
– A way of defining a field
– The space where a discipline
validates the knowledge it creates.
HCI Evaluation: Validity
“Methods for establishing
validity vary depending on the
nature of the contribution.
They may involve empirical
work in the laboratory or the
field, the description of
rationales for design decisions
and approaches, applications
of analytical techniques, or
‘proof of concept’ system
implementations”
CHI 2007 Website
So…
• How and why did we end up
with the system(s) we use for
HCI evaluation today?
• How can our current
approaches to evaluation deal
with novel concepts of HCI,
such as third-wave/paradigm
or experience-focused (rather
than task-focused) HCI?
• And in particular…
The Virtual Intimate
Object (VIO)
• A device for couples in long-distance
relationships to communicate
intimacy
• When one partner clicks, the
other’s circle lights up, and
then fades over time (see the sketch below).
www.intimateobjects.org
Kaye. I just clicked to say I love you. alt.chi, Ext.
Abs. CHI 2006.
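A minimal sketch, in Python, of the click-and-fade behaviour described above; it is not taken from the VIO papers. The class name, the pairing method, the linear fade curve, and the fade duration are all illustrative assumptions.

import time

class VirtualIntimateObject:
    """Sketch of one VIO endpoint: a single circle whose brightness is
    set by the partner's most recent click and then fades over time.
    The fade curve and duration here are illustrative guesses."""

    def __init__(self, fade_seconds=3600.0):
        self.fade_seconds = fade_seconds   # assumed fade duration, not from the papers
        self.partner = None
        self._last_partner_click = None    # timestamp of the partner's last click

    def pair_with(self, other):
        # Link two endpoints so a click on one lights up the other.
        self.partner, other.partner = other, self

    def click(self, now=None):
        # One partner clicks: the *other* partner's circle lights up.
        now = time.time() if now is None else now
        if self.partner is not None:
            self.partner._last_partner_click = now

    def brightness(self, now=None):
        # Brightness in [0, 1]: full right after the partner clicks,
        # fading linearly to zero over fade_seconds.
        if self._last_partner_click is None:
            return 0.0
        now = time.time() if now is None else now
        elapsed = now - self._last_partner_click
        return max(0.0, 1.0 - elapsed / self.fade_seconds)

# Usage: pair two endpoints, click one, and watch the other's circle fade.
if __name__ == "__main__":
    alice = VirtualIntimateObject(fade_seconds=10.0)
    bob = VirtualIntimateObject(fade_seconds=10.0)
    alice.pair_with(bob)
    alice.click()
    for _ in range(3):
        print(f"Brightness of Bob's circle: {bob.brightness():.2f}")
        time.sleep(2)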
Evaluation of the VIO
• It’s about the experience; it’s
not about the task
• How can we measure intimacy
and the transmission thereof?
Kaye, Levitt, Nevins, Golden & Schmidt.
Communicating Intimacy One Bit at a Time. Ext.
Abs. CHI 2005.
Kaye. I just clicked to say I love you. alt.chi, Ext.
Abs. CHI 2006.
Understanding how we got
to where we are today
1. Evaluation by Engineers
2. Evaluation by Computer
Scientists
3. Evaluation by Experimental
Psychologists & Cognitive
Scientists
4. Evaluation by HCI
Professionals
5. Evaluation for Experience
(with case studies)
1. Evaluation by Engineers
2. Evaluation by Computer
Scientists
3. Evaluation by Experimental
Psychologists & Cognitive
Scientists
a. Evaluation of Text Editors
4. Evaluation by HCI
Professionals
a. Damaged Merchandise
5. Evaluation for Experience
Why does evaluation
evolve?
Evolution is adaptation to fit
changing conditions. What
changes?
• Who are the users?
• Who are the evaluators?
• What are the limiting factors?
p.s. note historical chunking and
simplification
Evaluation by Engineers
• Users are engineers &
mathematicians
• Evaluators are engineers
• The limiting factor is
reliability
Evaluation by Computer
Scientists
• Users are programmers
• Evaluators are programmers
• The speed of the machine is
the limiting factor
Evaluation by Computer
Scientists
• First uses of…
• Human-computer interaction
– “It seems that when a system encourages close human-computer
interaction, it also encourages close human-human and
human-computer-human interaction” (Schwartz 1965)
• Computer-human interaction
– “PLANIT A Flexible Language
Designed for Computer-Human
Interaction” (Feingold 1967)
Evaluation by
Experimental Psychologists
& Cognitive Scientists
• Users are users: the computer
is a tool; often in offices.
• Evaluators are cognitive
scientists and experimental
psychologists: they’re used to
measuring things through
experiment
• The limiting factor is what the
human can do
Case Study of ExPsych /
CogSci Evaluation:
Text Editors
Roberts & Moran, 1982, 1983.
Their methodology for
evaluating text editors had
three criteria:
objectivity
thoroughness
ease-of-use
Case Study: Text
Editors
objectivity
“implies that the methodology not be
biased in favor of any particular
editor’s conceptual structure”
thoroughness
“implies that multiple aspects of editor
use be considered”
ease-of-use (of the method, not the
editor itself)
“the methodology should be usable by
editor designers, managers of word
processing centers, or other
nonpsychologists who need this kind of
evaluative information but who have
limited time and equipment resources”
Case Study: Text
Editors
Text editors are
the white rats of HCI
Thomas Green, 1984,
in Grudin, 1990.
Evaluation by HCI
Professionals
• They believe in expertise over
experiment (Nielsen 1984)
• They’ve made a decision to
focus on better results,
regardless of whether those
results are experimentally
provable or not.
Evaluation by HCI
Professionals
• Evaluators are usability
professionals (often with
Exp.Psych/CogSci backgrounds)
• Users are (often) white collar,
using computers to accomplish
their jobs
• The limiting factor is the time
of the worker accomplishing
their job
Case Study: The Damaged
Merchandise Debate
Damaged Merchandise
Setup
Early eighties:
usability evaluation methods
(UEMs)
- heuristics (Nielsen)
- cognitive walkthrough
- GOMS
-…
Damaged Merchandise
Comparison Studies
Jeffries, Miller, Wharton and
Uyeda (1991)
Karat, Campbell and Fiegel
(1992)
Nielsen (1992)
Desurvire, Kondziela, and Atwood
(1992)
Nielsen and Phillips (1993)
Damaged Merchandise
Panel
Wayne D. Gray, Panel at CHI’95
Discount or Disservice? Discount
Usability Analysis at a Bargain
Price or Simply Damaged
Merchandise
Damaged Merchandise
Paper
Wayne D. Gray & Marilyn
Salzman
Special issue of HCI:
Experimental Comparisons of
Usability Evaluation Methods
Damaged Merchandise
Response
Commentary on Damaged
Merchandise
Karat: experiment in context
Jeffries & Miller: real-world
Lund & McClelland: practical
John: case studies
Monk: broad questions
Oviatt: field-wide science
MacKay: triangulate
Newman: simulation & modelling
Damaged Merchandise
Clash of Paradigms
Experimental Psychologists &
Cognitive Scientists
(who believe in experimentation)
vs.
HCI Professionals
(who believe in experience and
expertise, even if ‘unprovable’)
(and who were trying to present
their work in the terms of the
dominant paradigm of the field.)
Kuhn (1962) The Structure of Scientific Revolutions
Damaged Merchandise
Clash of Paradigms
• In this particular work, we’re not
talking about who’s right
• It’s about recognizing what
paradigm clashes look like in HCI
• It’s about the need to present work
in the terms of the dominant
paradigm of the field
• It’s thinking about how to recognize
and re-think our own approaches to
knowing and doing HCI: an HCI that
recognizes how it knows what it
knows
Experience Focused HCI
• A possibly emerging sub-field,
drawing from traditions and
disciplines outside the field
• Emphasis on the experience,
not [just] the task
• Thinking about technology as
more like… a car than a text
editor
• Wright & McCarthy, Gaver,
Blythe, Höök, Taylor & Swan,
Bødker, Peterson, Isbister…
Experience Focused HCI
• For example…
• How can you evaluate a car?
• Why do you drive what you
drive?
– Grad-student-chic?
– Eco-chic?
– Machismo? Safety? Gay? Speed?
• For users, ‘HCI’ is cultural as
well as technological
• We’ll fail if we evaluate purely
on task
Experience Focused HCI
• The users are people choosing
to use technology for the joy
of it, & to do what they want
in everyday life.
• The evaluators are us… and
ethnographers and designers
and documentary filmmakers
and writers and playwrights
• The limiting factor might be
how to express oneself, how to
be and be seen (or not)
Why the evolution of
evaluation matters
• New paradigms require new ways
of knowing and new ways of
evaluation
• Difficulties come when one
paradigm tries to present work in
the manner of another paradigm
• We need to actively recognize
and call attention to when this
happens, both as researchers and
reviewers
An evolving discussion
SIG: Evaluation of Experience-focused HCI
Thursday, 9am, Room C4
Joseph ‘Jofish’ Kaye jofish@cornell.edu (paper & talk at jofish.com)
Phoebe Sengers
sengers@cs.cornell.edu
Research sponsored in part by the NSF and Microsoft Research Cambridge
Thanks to the Culturally Embedded Computing Group, BostonCHI, Alex Taylor, Ken Wood, Richard
Harper, Abi Sellen, Shahram Izadi, Lorna Brown & the CMLG, Microsoft Cambridge, Apala Lahiri
Chavan & Eric Schaffer, HFI, CHI Bangalore, CHI Mumbai, BostonCHI, the Cornell S&TS
Department, Maria Håkansson & IT University Göteborg, Louise Barkhuus, Barry Brown &
University of Glasgow, Mark Blythe & University of York, Andy Warr & the Oxford E-Research
Center, Susanne Bødker, Marianne Graves Petersen & The University of Aarhus, Terry Winograd,
Wendy Ju, Scott Klemmer & The Stanford HCI Seminar, Jonathan Grudin, Liam Bannon, Gilbert
Cockton, William Newman, Kirsten Boehner, Jeff Hancock, Bill Gaver, Janet Vertesi, Kia Höök,
Jarmo Laaksolahti, Anna Ståhl, Helen Jeffries, Paul Dourish, Jen Rode, Peter Wright, Ryan
Aipperspach, Bill Buxton, Michael Lynch, Seth ‘Beemer’ McGinnis & Katherine Isbister,