
The Evolution of Evaluation: Learning from History as a Step Towards the Evaluation of Third Wave HCI
University of Århus
28 November 2006
Joseph ‘Jofish’ Kaye
Microsoft Research, Cambridge
Cornell University, Ithaca, NY
jofish @ cornell.edu
What is evaluation?
• Something you do at the end of a project to show it works…
• … so you can publish it.
• Part of the design-build-evaluate iterative design cycle
• A way of defining a field
• A way a discipline validates the knowledge it creates.
• A reason papers get rejected
HCI Evaluation: Validity
“Methods for establishing validity vary depending on the nature of the contribution. They may involve empirical work in the laboratory or the field, the description of rationales for design decisions and approaches, applications of analytical techniques, or ‘proof of concept’ system implementations”
CHI 2007 Website
So…
• How did we get to where we are today?
• Why did we end up with the system(s) we use today?
• How can our current approaches to evaluation deal with novel concepts of HCI, such as third-wave or experience-focused (rather than task-focused) HCI?
• And in particular…
Evaluation of the VIO
• A device for couples in long-distance relationships to communicate intimacy
• It’s about the experience; it’s not about the task (a rough sketch of the one-bit interaction follows below)
www.intimateobjects.org
Kaye, Levitt, Nevins, Golden & Schmidt. Communicating Intimacy One Bit at a Time. Ext. Abs. CHI 2005.
Kaye. I just clicked to say I love you. alt.chi, Ext. Abs. CHI 2006.
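Because the talk returns to the VIO at the end, a minimal sketch of how this kind of one-bit intimate object could behave may help: one partner clicks, and the other's indicator turns red and then fades back over time (as suggested by the user quote later in the deck, "When VIO became red I feel very happy"). The class name, fade window, and linear decay below are illustrative assumptions, not details taken from the VIO papers.

```python
# Minimal sketch of a one-bit "intimate object": a click from the partner is the
# entire message; the local indicator turns fully red and fades back over time.
# The fade window and the linear decay are assumptions made for illustration.
import time

FADE_SECONDS = 8 * 60 * 60  # assumed fade-out window (8 hours)

class IntimateObject:
    def __init__(self):
        self.last_click = None  # time the partner last clicked, or None

    def receive_click(self):
        """Partner clicked: record the moment. The one bit is 'a click happened now'."""
        self.last_click = time.time()

    def redness(self):
        """1.0 immediately after a click, decaying linearly to 0.0 over the fade window."""
        if self.last_click is None:
            return 0.0
        elapsed = time.time() - self.last_click
        return max(0.0, 1.0 - elapsed / FADE_SECONDS)

# Usage: call receive_click() when the partner's click arrives over the network;
# the UI polls redness() to decide how saturated to draw the taskbar icon.
```

The evaluation question the talk cares about is not whether this mechanism is efficient, but what the experience of using it means to a couple.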
A Brief History and plan for
the talk
1. Evaluation by Engineers
2. Evaluation by Computer Scientists
3. Evaluation by Experimental Psychologists & Cognitive Scientists
4. Evaluation by HCI Professionals
5. Evaluation in CSCW
6. Evaluation for Experience
A Brief History and plan for
the talk
1. Evaluation by Engineers
2. Evaluation by Computer Scientists
3. Evaluation by Experimental Psychologists & Cognitive Scientists
   a) Case Study: Evaluation of Text Editors
4. Evaluation by HCI Professionals
   a) Case Study: The Damaged Merchandise Debate
5. Evaluation in CSCW
6. Evaluation for Experience
3 Questions to ask
about an era
• Who are the users?
• Who are the evaluators?
• What are the limiting factors?
Evaluation by Engineers
• Users are engineers &
mathematicians
• Evaluators are engineers
• The limiting factor is
reliability
Evaluation by Computer
Scientists
• Users are programmers
• Evaluators are programmers
• The speed of the machine is
the limiting factor
Evaluation by
Experimental Psychologists
& Cognitive Scientists
• Users are users: the computer
is a tool, not an end result
• Evaluators are cognitive
scientists and experimental
psychologists: they’re used to
measuring things through
experiment
• The limiting factor is what the
human can do
Evaluation by
Experimental Psychologists
& Cognitive Scientists
Perceptual issues such as print legibility and motor issues arose in designing displays, keyboards and other input devices… [new interface developments] created opportunities for cognitive psychologists to contribute in such areas as motor learning, concept formation, semantic memory and action.
In a sense, this marks the emergence of the distinct discipline of human-computer interaction. (Grudin 2006)
Case Study of
Evaluation: Text Editors
Roberts & Moran, 1982, 1983.
Their methodology for
evaluating text editors had
three criteria:
objectivity
thoroughness
ease-of-use
Case Study: Text
Editors
objectivity
“implies that the methodology not be
biased in favor of any particular
editor’s conceptual structure”
thoroughness
“implies that multiple aspects of editor
use be considered”
ease-of-use (of the method, not the
editor itself)
“the methodology should be usable by
editor designers, managers of word
processing centers, or other
nonpsychologists who need this kind of
evaluative information but who have
limited time and equipment resources”
Case Study: Text
Editors
Text editors are
the white rats of HCI
Thomas Green, 1984,
in Grudin, 1990.
Evaluation by HCI
Professionals
• Usability professionals
• They believe in expertise
(e.g. Nielsen 1984)
• They’ve made a decision to focus on better results, regardless of whether those results were experimentally provable or not.
Case Study: The Damaged
Merchandise Debate
Damaged Merchandise
Setup
Early eighties:
usability evaluation methods (UEMs)
- heuristics (Nielsen)
- cognitive walkthrough
- GOMS (a worked sketch of this style of analytic method follows below)
- …
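For readers unfamiliar with GOMS-family methods, the sketch below shows the kind of analytic prediction they produce, using the commonly cited Keystroke-Level Model operator estimates from Card, Moran & Newell; the task breakdown and the klm_estimate helper are hypothetical illustrations, not anything from the talk.

```python
# Keystroke-Level Model (KLM) sketch: GOMS-family methods predict expert task time
# analytically, by summing standard operator times, with no user testing involved.
OPERATOR_SECONDS = {
    "K": 0.28,  # keystroke or button press (average typist)
    "P": 1.10,  # point at a target with the mouse
    "H": 0.40,  # home hands between keyboard and mouse
    "M": 1.35,  # mental preparation before an action
}

def klm_estimate(operators: str) -> float:
    """Sum operator times for a sequence such as 'HMPK'."""
    return sum(OPERATOR_SECONDS[op] for op in operators)

# Hypothetical task: home to the mouse, prepare, point at a menu item, click,
# home back to the keyboard, type a four-character command.
print(klm_estimate("HMPK" + "H" + "KKKK"))  # ≈ 4.65 s predicted
```

Predictions of this analytic style are what the comparison studies on the following slides set against empirical usability testing.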
Damaged Merchandise
Comparison Studies
Jeffries, Miller, Wharton and
Uyeda (1991)
Karat, Campbell and Fiegel
(1992)
Nielsen (1992)
Desurvire, Kondziela, and Atwood (1992)
Nielsen and Phillips (1993)
Damaged Merchandise
Panel
Wayne D. Gray, Panel at CHI’95
Discount or Disservice? Discount
Usability Analysis at a Bargain
Price or Simply Damaged
Merchandise
Damaged Merchandise
Paper
Wayne D. Gray & Marilyn
Salzman
Special issue of HCI:
Experimental Comparisons of
Usability Evaluation Methods
Damaged Merchandise
Response
Commentary on Damaged
Merchandise
Karat: experiment in context
Jeffries & Miller: real-world
Lund & McClelland: practical
John: case studies
Monk: broad questions
Oviatt: field-wide science
MacKay: triangulate
Newman: simulation & modelling
Damaged Merchandise
What’s going on?
Gray & Salzman, p19
There is a tradition in the human factors
literature of providing advice to practitioners
on issues related to, but not investigated in,
an experiment. This tradition includes the
clear and explicit separation of experiment-based claims from experience-based advice.
Our complaint is not against experimenters
who attempt to offer good advice… the advice
may be understood as research findings rather
than the researcher’s opinion.
Damaged Merchandise
Clash of Paradigms
Experimental Psychologists &
Cognitive Scientists
(who believe in experimentation)
vs.
HCI Professionals
(who believe in experience and
expertise, even if ‘unprovable’)
(and who were trying to present
their work in the terms of the
dominant paradigm of the field.)
CSCW
Briefly…
• CSCW vs. HCI
• Not just groups instead of
users, but philosophy &
approach (ideology?)
• Posits that work is member-created, dynamic, and explicitly not cognitive or modelable
• Follows failure of ‘workplace
studies’ to characterize work
Evaluation in CSCW
• Ramage, The Learning Way
(Ph.D, Lancaster 1999)
– No single ‘right’ or wrong
– Identify why evaluate here
– Determine stakeholders
– Observe & analyze
– Learn
• Note the differences between
this kind of approach and more
traditional HCI user testing.
• Fundamentally different from
HCI; separate field.
• (PS. There are problems with this characterization.)
Experience Focused HCI
• A possibly emerging sub-field,
drawing from traditions and
disciplines outside the field
• Emphasis on the experience,
not [just] the task
• But how to evaluate?
Experience focused HCI
Isbister et al.: open-ended
affective evaluations that
leverage realtime individual
interpretations.
Isbister, Höök, Sharp, Laaksolahti. The Sensual
Evaluation Instrument: Developing an Affective
Evaluation Tool. Proc. CHI’06
Experience focused HCI
Gaver et al.: cultural
commentators with expertise
in their own fields provide
multi-layered assessment.
Gaver, W. Cultural Commentators for Polyphonic
Assessment. To appear in IJHCI.
Experience focused HCI
Virtual Intimate Object
(VIO)
Kaye et al.: cultural probes to
provide user-interpreted thick
descriptions of use experience
Kaye, Levitt, Nevins, Golden & Schmidt.
Communicating Intimacy One Bit at a Time. Ext.
Abs. CHI 2005.
Experience focused HCI
Virtual Intimate Object
(VIO)
Did it make you feel closer to your partner?
I was surprised to see one morning that my partner
had actually turned on his computer just to push
VIO and then turned it off again
YES - We share this experience together, and we use
VIO aware that from another part of the world
someone was thinking to each other! When VIO
became red I feel very happy, because I knew
that my boyfriend was clicking on it. So this
communication was in a instant.
Kaye, J. ‘J.’ I just clicked to say I love you. alt.chi,
Ext. Abs. CHI 2006.
Experience focused HCI
Virtual Intimate Object
(VIO)
The color that currently best represents my
relationship is…
Amber/yellow --> do I proceed w/ caution or speed
up to beat the red or slow down anticipating a
step
Purple - we have a more matured, aged relationship
rather than a new, boundless one which would
best be described by red. Purple is the more
aged, ripened form of red.
Yellow! Like a sun, like a summer. I often laugh with
Sven especially in those days. Using Vio is really
funny and interesting.
Kaye, J. ‘J.’ I just clicked to say I love you. alt.chi,
Ext. Abs. CHI 2006.
Epistemology
• How does a field know what it
knows?
• How does a field know that it
knows it?
• Science: experiment…
• But literature? Anthropology?
Sociology? Therapy? Art?
Theatre? Design?
• These disciplines have ways of talking about experience that are lacking in an experimental paradigm.
Formally…
The aim of this work is to
recognize the ways in which
multiple epistemologies, not
just the experimental
paradigm of science, must
inform the hybrid discipline of
human-computer interaction if
we wish to build systems that
support users’ increasingly rich
interactions with technology.
An evolving discussion
Thanks to Susanne Bødker, Marianne Graves Petersen
and all of you! And…
• Phoebe Sengers & CEmCom, Cornell University
• Alex Taylor & MS Research Cambridge
• Cornell S&TS Department, Maria Håkansson & IT
University Göteborg, Louise Barkhuus, Barry
Brown & University of Glasgow, Mark Blythe &
University of York, Andy Warr & The Oxford e-Research Centre
• Many others, including Jonathan Grudin, Liam
Bannon, Gilbert Cockton, William Newman,
Richard Harper, Kirsten Boehner, Jeff Hancock,
Bill Gaver, Janet Vertesi, Kia Höök, Jarmo
Laaksolahti, Anna Ståhl, Helen Jeffries, Paul
Dourish, Jenn Rode, Peter Wright, Ryan
Aipperspach, Bill Buxton, Michael Lynch, Seth
‘Beemer’ McGinnis, Katherine Isbister.