Jake Westfall
University of Colorado Boulder
Charles M. Judd
University of Colorado Boulder
David A. Kenny
University of Connecticut
Cornfield & Tukey (1956):
“The two spans of the bridge of inference”
My actual samples
50 University of Colorado undergraduates;
40 positively/negatively valenced English adjectives
Ultimate targets of generalization
All healthy, Western adults;
All non-neutral visual stimuli
All potentially sampled participants/stimuli
All CU undergraduates taking Psych 101 in Spring 2014;
All short, common, strongly valenced English adjectives
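“Statistical span”: from my actual samples to all potentially sampled participants/stimuli
“Subject-matter span”: from all potentially sampled participants/stimuli to the ultimate targets of generalization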
• Failure to account for uncertainty associated with stimulus sampling (i.e., treating stimuli as fixed rather than random) leads to biased, overconfident estimates of effects
• The pervasive failure to model stimulus as a random factor is probably responsible for many failures to replicate when future studies use different stimulus samples
• Modern statistical procedures solve the statistical problem of stimulus sampling
• These linear mixed models with crossed random effects are easy to apply and are already widely available in major statistical packages
– R, SAS, SPSS, Stata, etc.
• Participants crossed with Stimuli
– Each Participant responds to each Stimulus
• Stimuli nested under Condition
– Each Stimulus always in either Condition A or Condition B
• Participants crossed with Condition
– Participants make responses under both Conditions
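The design described above can be sketched in a few lines of Python. This is only an illustration: the function name `build_design`, the condition labels "A" and "B", and the rule that the first half of the stimuli belong to Condition A are assumptions made for the example.

```python
from itertools import product


def build_design(n_participants, n_stimuli):
    """Return one row per response: (participant, stimulus, condition)."""
    # Stimuli nested under Condition: each stimulus has exactly one condition.
    # Assumption for illustration: first half -> "A", second half -> "B".
    condition_of = {
        j: ("A" if j < n_stimuli // 2 else "B") for j in range(n_stimuli)
    }
    # Participants crossed with Stimuli: every participant meets every stimulus.
    return [
        (i, j, condition_of[j])
        for i, j in product(range(n_participants), range(n_stimuli))
    ]


design = build_design(n_participants=3, n_stimuli=12)
print(len(design))  # number of responses = participants x stimuli
```

Because every participant sees every stimulus, each participant automatically responds under both conditions, which is the third property listed above.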
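Sample of hypothetical dataset (rows = participants, columns = stimuli):

              Black stimuli          White stimuli
Participant   1  2  3  4  5  6       7  8  9  10  11  12
1             5  4  6  7  3  8       8  7  9   5   6   5
2             4  4  7  8  4  6       9  6  7   4   5   6
3             5  3  6  7  4  5       7  5  8   3   4   5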
Typical repeated measures analyses (RM-ANOVA)
“By-participant analysis”

Participant   M Black   M White   Difference
1             5.50      6.67      1.17
2             5.50      6.17      0.67
3             5.00      5.33      0.33

How variable are the stimulus ratings?
The variance is lost due to the aggregation
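A minimal sketch of the by-participant analysis, using only the Python standard library. It assumes the 36 hypothetical ratings are laid out as 3 participants (rows) by 12 stimuli (columns), with the first six stimuli in the Black condition and the last six in the White condition; under that reading the participant means reproduce the values shown above. The variable names are chosen for the example.

```python
from math import sqrt
from statistics import mean, stdev

# Assumed layout: rows = participants, columns = stimuli 1-12
# (stimuli 1-6 = Black, stimuli 7-12 = White).
ratings = [
    [5, 4, 6, 7, 3, 8, 8, 7, 9, 5, 6, 5],
    [4, 4, 7, 8, 4, 6, 9, 6, 7, 4, 5, 6],
    [5, 3, 6, 7, 4, 5, 7, 5, 8, 3, 4, 5],
]

# Aggregate over stimuli within each condition, separately per participant.
m_black = [mean(row[:6]) for row in ratings]
m_white = [mean(row[6:]) for row in ratings]
diffs = [w - b for w, b in zip(m_white, m_black)]

# Paired t statistic on the participant-level differences (df = n - 1).
t_by_participant = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

for b, w, d in zip(m_black, m_white, diffs):
    print(f"{b:.2f}  {w:.2f}  {d:.2f}")
print(f"t = {t_by_participant:.2f}, df = {len(diffs) - 1}")
```

Once the ratings are averaged within participant, the spread among the individual stimuli no longer enters the error term, so the test treats the stimuli as fixed.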
Typical repeated measures analyses (RM-ANOVA)
“By-stimulus analysis”

Stimulus means (averaged over participants):
Sample 1 (stimuli 1-6):   4.00  3.67  6.33  7.33  3.67  6.33
Sample 2 (stimuli 7-12):  8.00  6.00  8.00  4.00  5.00  5.33

Sample 1 vs. Sample 2
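A comparable sketch of the by-stimulus analysis, again with the standard library only. It assumes the same 3 x 12 layout of the hypothetical ratings (rows = participants, first six stimuli in one condition, last six in the other) and uses a pooled-variance two-sample t statistic, which is one common choice rather than necessarily the exact test behind the slide. Note that under this reading of the ratings the first stimulus mean computes to 4.67, whereas the slide lists 4.00.

```python
from math import sqrt
from statistics import mean, variance

# Assumed layout: rows = participants, columns = stimuli 1-12.
ratings = [
    [5, 4, 6, 7, 3, 8, 8, 7, 9, 5, 6, 5],
    [4, 4, 7, 8, 4, 6, 9, 6, 7, 4, 5, 6],
    [5, 3, 6, 7, 4, 5, 7, 5, 8, 3, 4, 5],
]

# Aggregate over participants: one mean per stimulus (column means).
stimulus_means = [mean(col) for col in zip(*ratings)]
sample_1 = stimulus_means[:6]   # stimuli 1-6
sample_2 = stimulus_means[6:]   # stimuli 7-12

# Pooled-variance two-sample t statistic (df = n1 + n2 - 2).
n1, n2 = len(sample_1), len(sample_2)
pooled_var = ((n1 - 1) * variance(sample_1) + (n2 - 1) * variance(sample_2)) / (n1 + n2 - 2)
t_by_stimulus = (mean(sample_2) - mean(sample_1)) / sqrt(pooled_var * (1 / n1 + 1 / n2))

print(f"t = {t_by_stimulus:.2f}, df = {n1 + n2 - 2}")
```

Here the participants are the factor that gets averaged away, so this test treats participants as fixed and generalizes only across stimuli.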
Simulation of type 1 error rates for typical RM-ANOVA analyses
• Design is the same as previously discussed
• Draw random samples of participants and stimuli
– Variance components = 4, Error variance = 16
• Number of participants = 10, 30, 50, 70, 90
• Number of stimuli = 10, 30, 50, 70, 90
• Conducted both by-participant and by-stimulus analysis on each simulated dataset
• True Condition effect = 0
• The exact simulated error rates depend on the variance components, which, although realistic, were ultimately arbitrary
• The main points to take away here are:
1. The standard analyses will virtually always show some degree of positive bias
2. In some (entirely realistic) cases, this bias can be extreme
3. The degree of bias depends in a predictable way on the design of the experiment (e.g., the sample sizes)
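The simulation logic can be sketched for a single cell of the design (10 participants, 10 stimuli) with the standard library. Several details are assumptions made for this example: the random effects are taken to be participant intercepts, participant condition slopes, and stimulus intercepts, each with variance 4; condition is coded -0.5/+0.5; and the critical value is hard-coded for 9 degrees of freedom. It is not the authors' simulation code.

```python
import random
from math import sqrt
from statistics import mean, stdev

T_CRIT_DF9 = 2.262  # two-sided .05 critical t for df = 9 (10 participants only)


def by_participant_rejects(rng, n_participants=10, n_stimuli=10,
                           var_comp=4.0, var_error=16.0):
    """Simulate one null dataset; return True if the by-participant test rejects."""
    sd_c, sd_e = sqrt(var_comp), sqrt(var_error)
    half = n_stimuli // 2  # first half of stimuli in Condition A, rest in B
    stim = [rng.gauss(0, sd_c) for _ in range(n_stimuli)]  # stimulus intercepts
    diffs = []
    for _ in range(n_participants):
        intercept = rng.gauss(0, sd_c)  # participant intercept
        slope = rng.gauss(0, sd_c)      # participant condition slope
        y = [
            intercept + slope * (-0.5 if j < half else 0.5) + stim[j]
            + rng.gauss(0, sd_e)        # residual error; true Condition effect = 0
            for j in range(n_stimuli)
        ]
        diffs.append(mean(y[half:]) - mean(y[:half]))
    t = mean(diffs) / (stdev(diffs) / sqrt(n_participants))
    return abs(t) > T_CRIT_DF9


def type1_rate(n_sims=2000, seed=1):
    rng = random.Random(seed)
    return sum(by_participant_rejects(rng) for _ in range(n_sims)) / n_sims


rate = type1_rate()
print(f"by-participant type 1 error rate: {rate:.3f}")
```

The same stimulus sample is shared by all participants within a simulated dataset, so the chance difference between the two sets of stimuli shifts every participant's difference score in the same direction. The by-participant test has no way to see that shared shift as noise, which is why its rejection rate exceeds the nominal .05.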
• Although quasi-Fs successfully address the statistical problem, they suffer from a variety of limitations
– Require complete orthogonal design (balanced factors)
– No missing data
– No continuous covariates
– A different quasi-F must be derived (often laboriously) for each new experimental design
– Not widely implemented in major statistical packages
• Known variously as:
– Mixed-effects models, multilevel models, random effects models, hierarchical linear models, etc.
• Most psychologists are familiar with mixed models for hierarchical random factors
– E.g., students nested in classrooms
• Less well known is that mixed models can also easily accommodate designs with crossed random factors
– E.g., participants crossed with stimuli
Grand mean = 100
Mean_A = -5, Mean_B = 5
Participant means: 5.86, 7.09, -1.09, -4.53
Stimulus means: -2.84, -9.19, -1.16, 18.17
Participant slopes: 3.02, -9.09, 3.15, -1.38
Everything else = residual error
The linear mixed-effects model with crossed random effects
Fixed effects Random effects
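One way to write this model out is sketched below. It is reconstructed from the sample syntax (`y ~ c + (1 | j) + (c | i)`) rather than copied from the slide, and the symbol names are assumptions: i indexes participants, j indexes stimuli, and c_j is the condition code of stimulus j (condition depends only on the stimulus because stimuli are nested under Condition).

```latex
y_{ij} = \underbrace{\beta_0 + \beta_1 c_j}_{\text{fixed effects}}
       + \underbrace{\alpha^{P}_{i} + \alpha^{P \times C}_{i}\, c_j + \alpha^{S}_{j}}_{\text{random effects}}
       + \varepsilon_{ij}
```

Under this reading, \beta_0 and \beta_1 carry the grand mean and the condition difference, \alpha^{P}_{i} the participant means, \alpha^{P \times C}_{i} the participant slopes, \alpha^{S}_{j} the stimulus means, and \varepsilon_{ij} the residual error.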
Fitting mixed models is easy: Sample syntax
R
  library(lme4)
  model <- lmer(y ~ c + (1 | j) + (c | i))
SAS
  proc mixed covtest;
    class i j;
    model y=c/solution;
    random intercept c/sub=i type=un;
    random intercept/sub=j;
  run;
SPSS
  MIXED y WITH c
    /FIXED=c
    /PRINT=SOLUTION TESTCOV
    /RANDOM=INTERCEPT c | SUBJECT(i) COVTYPE(UN)
    /RANDOM=INTERCEPT | SUBJECT(j).
Mixed models successfully maintain the nominal type 1 error rate (α = .05)
• Stimulus variation is a generalizability issue
• The conclusions we draw in the Discussion sections of our papers ought to be in line with the assumptions of the statistical methods we use
• Mixed models with crossed random effects allow us to generalize across both participants and stimuli
Further reading:
Judd, C. M., Westfall, J., & Kenny, D. A. (2012). Treating stimuli as a random factor in social psychology: A new and comprehensive solution to a pervasive but largely ignored problem. Journal of Personality and Social Psychology, 103(1), 54-69.