Experimental Design Review

MP2
HCI W2014
What is experimental design?
How do I plan an experiment?
Acknowledgement: Much of the material in this lecture is based on material prepared for similar courses by Saul Greenberg (University of Calgary), as adapted by Joanna McGrenere.
Experimental Planning Flowchart
[Figure: Stage 1: Problem definition; Stage 2: Planning; Stage 3: Conduct research; Stage 4: Analysis; Stage 5: Interpretation. Steps shown: research idea, literature review, statement of problem, hypothesis development, define variables, controls, apparatus, procedures, select subjects, design, pilot testing, data collection, data reductions, statistics, hypothesis testing, interpretation, generalization, reporting, feedback.]
What’s the goal?
• Overall research goals impact choice of study design
  – Exploratory research vs. hypothesis confirmation
  – Ecological validity vs. tightly controlled
• The stage in the design process impacts the choice of study design
  – Formative evaluation (to get iterative feedback on initial design and/or design choices)
  – Summative evaluation (to determine whether the design is better/stronger/faster than alternative approaches)
What’s the research question?
• Study research questions impact choice of:
  – Protocol, task
  – Experimental conditions (factors)
  – Constructs (effectiveness)
  – Measures (task completion, error rate)
• Testable hypotheses impact:
  – choice of statistical analysis (also impacted by the nature of the data and the experimental design)
[Figure: Experimental Planning Flowchart, repeated]
Reality check: does the final design support the research questions?
Quantitative system evaluation
• Quantitative:
  – precise measurement, numerical values
  – bounds on how correct our statements are
• Methods
  – Controlled Experiments
  – Statistical Analysis
• Measures
  – Objective: user performance (speed & accuracy)
  – Subjective: user satisfaction
Controlled experiments
• The traditional scientific method
  – clear, convincing result on specific issues
• In HCI:
  – insights into cognitive process, human performance limitations, ...
  – allows comparison of systems, fine-tuning of details, ...
• Strive for
  – lucid and testable hypothesis (usually a causal inference)
  – quantitative measurement
  – measure of confidence in results obtained (inferential statistics)
  – ability to replicate the experiment
  – control of variables and conditions
  – removal of experimenter bias
The experimental method
a) Begin with a lucid, testable hypothesis
H0: there is no difference in user performance (time and error rate) when selecting a single item from a pop-up or a pull-down menu, regardless of the subject’s previous expertise in using a mouse or using the different menu types
[Figure: example pull-down and pop-up menus containing the items File, Edit, View, Insert, New, Open, Close, Save]
The experimental method
b) Explicitly state the independent variables that are to
be altered
• Independent variables
  – the things you control (independent of how a subject behaves)
  – two different kinds:
    1. treatment manipulated (can establish cause/effect, true experiment)
    2. subject individual differences (can never fully establish cause/effect)
• In the menu experiment
  – menu type: pop-up or pull-down
  – menu length: 3, 6, 9, 12, 15
  – expertise: expert or novice (a subject variable; the researcher cannot manipulate it)
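Crossing the levels listed above gives the set of cells that a fully factorial version of the menu experiment would have to cover. A minimal Python sketch, assuming a fully crossed design; the constant names and `condition_cells` are illustrative, not part of the original material. Expertise appears only as a grouping label: participants are classified as novice or expert, never assigned to it.

```python
from itertools import product

# Levels from the menu experiment; names are illustrative.
MENU_TYPES = ["pop-up", "pull-down"]   # treatment variable (manipulated)
MENU_LENGTHS = [3, 6, 9, 12, 15]       # treatment variable (manipulated)
EXPERTISE = ["novice", "expert"]       # subject variable (observed, not assigned)


def condition_cells():
    """Return one dict per combination of the three independent variables."""
    return [
        {"menu_type": t, "menu_length": n, "expertise": e}
        for t, n, e in product(MENU_TYPES, MENU_LENGTHS, EXPERTISE)
    ]


cells = condition_cells()
print(len(cells))  # 2 menu types x 5 lengths x 2 expertise levels = 20
```

The cell count grows multiplicatively with each added factor, which is one reason to keep the number of levels small.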
The experimental method
c) Carefully choose the dependent variables that will be
measured
• Dependent variables
  – variables dependent on the subject’s behaviour / reaction to the independent variable
  – Make sure that what you measure actually represents the higher-level concept!
• In the menu experiment
  – time to select an item
  – selection errors made
  – Higher-level concept (user performance)
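Turning raw trials into the two dependent variables can be sketched as follows. The trial log format `(menu_type, selection_time_s, correct)` and the `summarize` helper are assumptions for illustration, and the numbers are invented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trial log: (menu_type, selection_time_s, correct)
TRIALS = [
    ("pop-up", 1.10, True),
    ("pop-up", 0.95, True),
    ("pop-up", 1.40, False),
    ("pull-down", 1.30, True),
    ("pull-down", 1.25, True),
    ("pull-down", 1.60, True),
]


def summarize(trials):
    """Mean selection time and error rate (errors per trial) for each menu type."""
    by_type = defaultdict(list)
    for menu_type, seconds, correct in trials:
        by_type[menu_type].append((seconds, correct))
    summary = {}
    for menu_type, rows in by_type.items():
        summary[menu_type] = {
            "mean_time_s": mean(s for s, _ in rows),
            "error_rate": sum(not c for _, c in rows) / len(rows),
        }
    return summary


summary = summarize(TRIALS)
```

Whether time and error rate together adequately represent "user performance" is exactly the question the slide warns about; the code only computes the measures, it cannot validate them.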
The experimental method
d) Judiciously select and assign subjects to groups
• Ways of controlling subject variability
  – recognize classes and make them an independent variable
  – minimize unaccounted anomalies in subject group (superstars versus poor performers)
  – use a reasonable number of subjects and random assignment
[Figure: subject groups labelled Novice and Expert]
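Both controls above (treating a subject class as a variable, and random assignment) can be combined by randomizing separately within each class. A sketch, assuming for illustration only that menu type is run between subjects; `assign_within_strata` and its arguments are hypothetical.

```python
import random


def assign_within_strata(participants, conditions, seed=None):
    """Randomly assign participants to conditions separately within each stratum.

    `participants` maps participant id -> stratum label (e.g. "novice"/"expert").
    Within each stratum the ids are shuffled and then dealt out round-robin,
    so group sizes within a stratum differ by at most one.
    """
    rng = random.Random(seed)
    strata = {}
    for pid, stratum in participants.items():
        strata.setdefault(stratum, []).append(pid)
    assignment = {}
    for pids in strata.values():
        rng.shuffle(pids)
        for i, pid in enumerate(pids):
            assignment[pid] = conditions[i % len(conditions)]
    return assignment


people = {f"P{i}": ("novice" if i < 6 else "expert") for i in range(12)}
plan = assign_within_strata(people, ["pop-up", "pull-down"], seed=1)
```

Randomizing within each class keeps novices and experts evenly spread across conditions, so expertise cannot end up confounded with menu type by bad luck.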
The experimental method...
e) Control for biasing factors
  – unbiased instructions + experimental protocols: prepare ahead of time
  – double-blind experiments, ...
  – potential confounding variables
    – order effects
    – learning effects
    – counterbalancing (http://www.yorku.ca/mack/RN-Counterbalancing.html)
[Cartoon: the experimenter tells a subject, "Now you get to do the pop-up menus. I think you will really like them... I designed them myself!"]
The experimental method
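f) Apply statistical methods to data analysis
  – Confidence limits: how much risk you accept that your conclusion is wrong
    "The hypothesis that mouse experience makes no difference is rejected at the .05 level" (i.e., null hypothesis rejected) means:
    – if mouse experience really made no difference, a result at least this extreme would occur less than 5% of the time
    – i.e., you accept a 5% risk of wrongly rejecting a true null hypothesis (a Type I error)
    – strictly, it does not mean there is a 95% chance that your finding is correct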
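The logic of "rejected at the .05 level" can be made concrete with a permutation test: if the null hypothesis is true, the group labels are interchangeable, so we shuffle them many times and count how often a difference at least as large as the observed one appears. This is a swapped-in technique chosen because it needs only the Python standard library; a t-test or ANOVA would be the more usual choice for this design. The selection times and the `permutation_test` helper are made up for illustration.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns an estimated p-value: the share of random relabellings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    return (extreme + 1) / (n_resamples + 1)


# Hypothetical selection times (seconds) for mouse novices and experts.
novices = [1.6, 1.7, 1.5, 1.65, 1.55, 1.7, 1.6, 1.62]
experts = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 1.0, 0.98]

p = permutation_test(novices, experts)
print(f"p = {p:.4f}; reject H0 at the .05 level: {p < 0.05}")
```

A small p says the data would be surprising if H0 were true; it is not the probability that H0 is false.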
g) Interpret your results
  – what you believe the results mean, and their implications
  – yes, there can be a subjective component to quantitative analysis
Experimental designs
• Between subjects: Different participants; a single group of participants is allocated randomly to the experimental conditions.
• Within subjects: Same participants; all participants appear in every condition.
• Matched participants: participants are matched in pairs, e.g., based on expertise, gender, etc.
• Mixed: some independent variables are within subjects, some are between subjects.
www.id-book.com
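The matched-participants design can be sketched as: rank participants on the matching variable, pair neighbours, then randomly split each pair between the two conditions. A minimal Python sketch, assuming a single numeric matching score (real studies may match on several attributes at once); `matched_pairs` is a hypothetical helper.

```python
import random


def matched_pairs(scores, seed=None):
    """Pair participants with adjacent matching scores, then randomize within each pair.

    `scores` maps participant id -> a matching variable (e.g. an expertise score).
    Returns (group_a, group_b). With an odd count, the highest-scoring
    participant is left unassigned.
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get)  # ids ordered by score, ascending
    group_a, group_b = [], []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # coin flip decides which member goes to which group
        group_a.append(pair[0])
        group_b.append(pair[1])
    return group_a, group_b


scores = {"A": 1, "B": 2, "C": 10, "D": 11, "E": 20, "F": 21}
a, b = matched_pairs(scores, seed=3)
```

The randomization inside each pair matters: matching reduces individual differences, but only random assignment prevents the experimenter from systematically sending the slightly stronger member of each pair to one condition.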
Within-subjects
• It removes the problem of individual differences between groups (each participant serves as their own control)
• Allows participants to make comparisons between conditions
• But raises other problems:
  – Need to look at the impact of experiencing the two conditions
Order Effects
• Changes in performance resulting from the (ordinal) position in which a condition appears in an experiment (e.g., is a condition always presented first?)
• Arise from warm-up, learning, learning what they will be asked to reflect upon, fatigue, etc.
• The effect can be averaged and removed if all possible orders are presented in the experiment and there has been random assignment to orders
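Presenting all possible orders with random assignment to orders is full counterbalancing. A sketch, assuming the participant count is a multiple of the number of orders; `full_counterbalance` is a hypothetical helper.

```python
import random
from itertools import permutations


def full_counterbalance(conditions, participants, seed=None):
    """Randomly assign participants to every possible condition order, equally often.

    Requires len(participants) to be a multiple of the number of orders
    (k! for k conditions).
    """
    orders = list(permutations(conditions))
    if len(participants) % len(orders) != 0:
        raise ValueError(f"need a multiple of {len(orders)} participants")
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)  # random assignment of people to orders
    return {pid: orders[i % len(orders)] for i, pid in enumerate(shuffled)}


plan = full_counterbalance(["A", "B", "C"], [f"P{i}" for i in range(12)], seed=2)
```

Because k conditions have k! orders (6 for 3 conditions, 24 for 4, 120 for 5), full counterbalancing quickly becomes impractical, which is where the Latin-square methods listed under Counterbalancing come in.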
Sequence effects
• Changes in performance resulting from interactions among conditions (e.g., if done first, condition 1 has an impact on performance in condition 2)
• Observed effects may not be main effects of the IV, but interaction effects
• Can be controlled by arranging each condition to follow every other condition equally often
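Arranging each condition to follow every other condition equally often is what a balanced Latin square (often called a Williams design) achieves. A sketch using the common construction; `balanced_latin_square` is a hypothetical helper and conditions are numbered 0..n-1.

```python
def balanced_latin_square(n):
    """Return condition orders (indices 0..n-1) forming a balanced Latin square.

    Every condition appears once per row and equally often in each position,
    and every condition immediately follows every other condition equally
    often (once for even n; twice for odd n, which needs 2n rows).
    """
    # First row follows the pattern 0, 1, n-1, 2, n-2, ...
    first, low, high = [0], 1, n - 1
    take_low = True
    while len(first) < n:
        if take_low:
            first.append(low)
            low += 1
        else:
            first.append(high)
            high -= 1
        take_low = not take_low
    # Each later row shifts every entry by one (mod n).
    rows = [[(c + i) % n for c in first] for i in range(n)]
    if n % 2 == 1:
        # With an odd number of conditions, adding the mirrored rows restores balance.
        rows += [row[::-1] for row in rows]
    return rows


for row in balanced_latin_square(4):
    print(row)
```

Each row is one participant's order (cycling through the rows when there are more participants than rows); for n = 4 the first row is [0, 1, 3, 2].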
Counterbalancing
• Controlling order and sequence effects by arranging subjects to experience the various conditions (levels of the IV) in different orders
• Self-directed learning: investigate the different counterbalancing methods
  – Randomization
  – Block randomization
  – Reverse counterbalancing
  – Latin squares and Graeco-Latin squares (when you can’t fully counterbalance)
  – http://www.experiment-resources.com/counterbalanced-measuresdesign.html
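Block randomization, one of the methods listed above, can be sketched in a few lines under one common reading of the term (assumed here): the session is split into blocks, each containing every condition once in a freshly shuffled order. `block_randomized_sequence` is a hypothetical helper.

```python
import random


def block_randomized_sequence(conditions, n_blocks, seed=None):
    """Build one participant's trial sequence from `n_blocks` blocks.

    Each block contains every condition exactly once, in a freshly shuffled
    order, so no condition can pile up early or late in the session.
    """
    rng = random.Random(seed)
    sequence = []
    for _ in range(n_blocks):
        block = list(conditions)
        rng.shuffle(block)
        sequence.extend(block)
    return sequence


seq = block_randomized_sequence(["A", "B", "C"], 4, seed=7)
```

Compared with shuffling all trials at once, blocking guarantees each condition appears equally often in every part of the session.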
Between, within, and matched participant design
Design | Advantages | Disadvantages
Between | No order effects | Many subjects needed; individual differences a problem
Within | Few individuals needed; no individual differences | Counterbalancing needed because of ordering effects
Matched | Same as different participants (between subjects), but individual differences reduced | Cannot be sure of perfect matching on all differences
Internal Validity
• The extent to which a causal conclusion based on a study is warranted
• Internal validity is reduced by the presence of uncontrolled (confounded) variables
  – But the study is not necessarily invalid
• It’s important for the researcher to evaluate the likelihood that there are alternative hypotheses for observed differences
  – Need to convince self and audience of the validity
External validity
• The extent to which the results of a study can be generalized to other situations and to other people
• If the experimental setting more closely replicates the setting of interest, external validity can be higher than in a true experiment run in a controlled lab setting
• Often comes down to what is most important for the research question
  – Control or ecological validity?
Control
• True experiment = complete control over the subject assignment to conditions and the presentation of conditions to subjects
  – Control over the who, what, when, where, how
• Control of the who => random assignment to conditions
  – Only by chance can other variables be confounded with the IV
• Control of the what/when/where/how => control over the way the experiment is conducted
Quasi-Experiment
• When you can’t achieve complete control
  – Lack of complete control over conditions
  – Subjects for different conditions come from potentially non-random pre-existing groups
    – Experts vs. novices
    – Early adopters vs. technophobes?
It’s a matter of control
True Experiment
• Random assignment of subjects to conditions
• Manipulate the IV
• Control allows alternative hypotheses to be ruled out
Quasi Experiment
• Selection of subjects for the conditions
• Observe categories of subjects
  – If the subject variable is the IV, it’s a quasi experiment
• Don’t know whether differences are caused by the IV or by differences in the subjects
Other features
• In some instances you cannot completely control the what, when, where, and how
  – Need to collect data at a certain time or not at all
  – Practical limitations to data collection, experimental protocol