How We Got To Where We Are Today

The evolution of evaluation
Joseph ‘Jofish’ Kaye
Microsoft Research, Cambridge
Cornell University, Ithaca, NY
jofish @ cornell.edu
What is evaluation?
• Something you do at the end of a project to show it works…
• … so you can publish it.
• A tradition in a field
• A way of defining a field
• A process that changes over time
• A reason papers get rejected
HCI Evaluation: Validity
HCI Evaluation: Validity
“Methods for establishing
validity vary depending on the
nature of the contribution.
They may involve empirical
work in the laboratory or the
field, the description of
rationales for design decisions
and approaches, applications
of analytical techniques, or
‘proof of concept’ system
implementations”
CHI 2007 Website
So…
• How did we get to where we are today?
• Why did we end up with the system(s) we use today?
• How can our current approaches to evaluation deal with novel concepts of HCI, such as experience-focused (rather than task-focused) HCI?
Experience-focused HCI
(a question to think about during this talk)
What does it mean when this is your evaluation method?
A Brief History and plan for the talk
1. Evaluation by Engineers
2. Evaluation by Computer Scientists
3. Evaluation by Experimental Psychologists & Cognitive Scientists
   a. Case study: Evaluation of Text Editors
4. Evaluation by HCI Professionals
   a. Case study: The Damaged Merchandise Debate
5. Evaluation in CSCW
6. Evaluation for Experience
3 Questions to ask about an era
• Who are the users?
• Who are the evaluators?
• What are the limiting factors?
Evaluation by Engineers
• Users are engineers & mathematicians
• Evaluators are engineers
• The limiting factor is reliability
Evaluation by Computer Scientists
• Users are programmers
• Evaluators are programmers
• The speed of the machine is the limiting factor
Evaluation by Experimental Psychologists & Cognitive Scientists
• Users are users: the computer is a tool, not an end result
• Evaluators are cognitive scientists and experimental psychologists: they’re used to measuring things through experiment
• The limiting factor is what the human can do
Evaluation by Experimental Psychologists & Cognitive Scientists
“Perceptual issues such as print legibility and motor issues arose in designing displays, keyboards and other input devices… [new interface developments] created opportunities for cognitive psychologists to contribute in such areas as motor learning, concept formation, semantic memory and action. In a sense, this marks the emergence of the distinct discipline of human-computer interaction.” (Grudin 2006)
Case Study: Text Editors
Roberts & Moran (1982, 1983). Their methodology for evaluating text editors had three criteria:
• objectivity
• thoroughness
• ease-of-use
Case Study: Text Editors
objectivity: “implies that the methodology not be biased in favor of any particular editor’s conceptual structure”
thoroughness: “implies that multiple aspects of editor use be considered”
ease-of-use (of the method, not the editor itself): “the methodology should be usable by editor designers, managers of word processing centers, or other nonpsychologists who need this kind of evaluative information but who have limited time and equipment resources”
Case Study: Text Editors
“Text editors are the white rats of HCI.”
(Thomas Green, 1984, in Grudin, 1990)
…which tells us more about HCI than it does about text editors.
Evaluation by HCI Professionals
• Usability professionals
• They believe in expertise (e.g. Nielsen 1984)
• They’ve decided to focus on better results, regardless of whether those results were experimentally provable or not.
Case Study: The Damaged Merchandise Debate
Damaged Merchandise: Setup
Early eighties: usability evaluation methods (UEMs)
- heuristics (Nielsen)
- cognitive walkthrough
- GOMS
- …
Damaged Merchandise: Comparison Studies
Jeffries, Miller, Wharton and Uyeda (1991)
Karat, Campbell and Fiegel (1992)
Nielsen (1992)
Desurvire, Kondziela, and Atwood (1992)
Nielsen and Phillips (1993)
Damaged Merchandise: Panel
Wayne D. Gray, panel at CHI ’95: “Discount or Disservice? Discount Usability Analysis at a Bargain Price or Simply Damaged Merchandise”
Damaged Merchandise: Paper
Wayne D. Gray & Marilyn Salzman, special issue of the journal Human-Computer Interaction: Experimental Comparisons of Usability Evaluation Methods
Damaged Merchandise: Response
Commentary on Damaged Merchandise:
• Karat: experiment in context
• Jeffries & Miller: real-world
• Lund & McClelland: practical
• John: case studies
• Monk: broad questions
• Oviatt: field-wide science
• MacKay: triangulate
• Newman: simulation & modelling
Damaged Merchandise: What’s going on?
Gray & Salzman, p. 19:
“There is a tradition in the human factors literature of providing advice to practitioners on issues related to, but not investigated in, an experiment. This tradition includes the clear and explicit separation of experiment-based claims from experience-based advice. Our complaint is not against experimenters who attempt to offer good advice… the advice may be understood as research findings rather than the researcher’s opinion.”
Damaged Merchandise: Clash of Paradigms
Experimental Psychologists & Cognitive Scientists (who believe in experimentation)
vs.
HCI Professionals (who believe in experience and expertise, even if ‘unprovable’, and who were trying to present their work in the terms of the dominant paradigm of the field)
Evaluation in CSCW
• A story I’m not telling
• CSCW vs. HCI
• Not just groups, but philosophy (ideology!)
• Member-created, dynamic, not cognitive or modelable
• Follows failure of ‘workplace studies’ to characterize
• i.e. Plans and Situated Actions vs. The Psychology of Human-Computer Interaction
Evaluation of Experience-Focused HCI
• A possibly emerging sub-field:
  – Gaver et al.
  – Isbister et al.
  – Höök et al.
  – Sengers et al.
  – etc.
• How to evaluate?
Epistemology
• How does a field know what it knows?
• How does a field know that it knows it?
• Science: experiment…
• But literature? Anthropology? Sociology? Therapy? Art? Theatre? Design?
Epistemology
Formally: The aim of this work is to recognize the ways in which multiple epistemologies, not just the experimental paradigm of science, can and do inform the hybrid discipline of human-computer interaction.
Shouts To My Homies
• Maria Håkansson
• Lars Erik Holmquist
• Alex Taylor & MS Research
• Phoebe Sengers & CEmCom
• Cornell S&TS Department
• Many discussions over the last year… and this one to come.