Treating Stimuli as a Random Factor in Social Psychology: A New and Comprehensive Solution to a Pervasive but Largely Ignored Problem

Jacob Westfall, University of Colorado Boulder
Charles M. Judd, University of Colorado Boulder
David A. Kenny, University of Connecticut

What to do about replicability?
• Mandatory reporting of all DVs, studies, etc.?
• Journals or journal sections devoted to straight replication attempts?
• Pre-registration of studies?
• Many of the proposed solutions involve large-scale institutional changes, restructuring of incentives, etc.
• These are good ideas worth discussing, but they are neither quick nor easy to implement.

One way to increase replicability: Treat stimuli as random
• Failing to account for the uncertainty associated with stimulus sampling (i.e., treating stimuli as fixed rather than random) leads to biased, overconfident estimates of effects (Clark, 1973; Coleman, 1964).
• This pervasive failure to model stimuli as a random factor is probably responsible for many failures to replicate when later studies use different stimulus samples.

Doing the correct analysis is easy!
• Recently developed statistical methods solve the statistical problem of stimulus sampling.
• These mixed models with crossed random effects are easy to apply and are already widely available in major statistical packages (R, SAS, SPSS, Stata, etc.).

Outline of the rest of the talk
1. The problem
   – Illustrative design and typical RM-ANOVA analyses
   – Estimated type 1 error rates
2. The solution
   – Introducing mixed models with crossed random effects for participants and stimuli
   – Applications of mixed model analyses to actual datasets

Illustrative design
• Participants crossed with Stimuli
   – Each participant responds to each stimulus.
• Stimuli nested under Condition
   – Each stimulus appears only in Condition A or only in Condition B.
• Participants crossed with Condition
   – Each participant makes responses under both conditions.

Sample of hypothetical dataset
(3 participants each rate 12 stimuli: 6 Black stimuli in Condition A, 6 White stimuli in Condition B)

                 Condition A (Black) stimuli    Condition B (White) stimuli
Participant 1    5  4  6  7  3  8               8  7  9  5  6  5
Participant 2    4  4  7  8  4  6               9  6  7  4  5  6
Participant 3    5  3  6  7  4  5               7  5  8  3  4  5

Typical repeated measures analysis (RM-ANOVA): the "by-participant analysis"
• Average each participant's ratings within condition and compare the condition means across participants.
• How variable are the stimulus ratings around each of the participant means? That variance is lost in the aggregation.

Participant    M_Black    M_White    Difference
P1             5.5        6.67       1.17
P2             5.5        6.17       0.67
P3             5.0        5.33       0.33

Typical repeated measures analysis (RM-ANOVA): the "by-stimulus analysis"
• Average each stimulus's ratings across participants and compare the two samples of stimulus means (Sample 1 vs. Sample 2).

Condition A stimulus means:  4.00  3.67  6.33  7.33  3.67  6.33
Condition B stimulus means:  8.00  6.00  8.00  4.00  5.00  5.33

Simulation of type 1 error rates for typical RM-ANOVA analyses
• Design is the same as previously discussed.
• Draw random samples of participants and stimuli.
   – Variance components = 4, error variance = 16
• Number of participants ∈ {10, 30, 50, 70, 90}
• Number of stimuli ∈ {10, 30, 50, 70, 90}
• Both the by-participant and the by-stimulus analysis were conducted on each simulated dataset.
• True Condition effect = 0.
• (A minimal R sketch of this kind of simulation appears after the results below.)

Type 1 error rate simulation results
• The exact simulated error rates depend on the variance components, which, although realistic, were ultimately arbitrary.
• The main points to take away:
   1. The standard analyses will virtually always show some degree of positive bias.
   2. In some (entirely realistic) cases, this bias can be extreme.
   3. The degree of bias depends in a predictable way on the design of the experiment (e.g., the sample sizes).
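Below is a minimal R sketch of the kind of simulation just described, focusing on the by-participant analysis. It is not the authors' original simulation code: the function name simulate_once, the default sample sizes, and the use of a paired t test (equivalent to the by-participant RM-ANOVA F test, since F = t²) are illustrative assumptions.

   # Minimal sketch (illustrative; not the authors' original simulation code).
   # Participants crossed with stimuli, stimuli nested in condition;
   # all random-effect variances = 4, residual variance = 16, true effect = 0.
   set.seed(1)
   simulate_once <- function(n_part = 30, n_stim = 30,
                             var_comp = 4, var_error = 16) {
     cond  <- rep(c(-0.5, 0.5), each = n_stim / 2)  # condition code for each stimulus
     p_int <- rnorm(n_part, 0, sqrt(var_comp))      # participant intercepts
     p_slp <- rnorm(n_part, 0, sqrt(var_comp))      # participant slopes
     s_int <- rnorm(n_stim, 0, sqrt(var_comp))      # stimulus intercepts
     d <- expand.grid(i = seq_len(n_part), j = seq_len(n_stim))
     d$c <- cond[d$j]
     d$y <- p_int[d$i] + p_slp[d$i] * d$c + s_int[d$j] +
            rnorm(nrow(d), 0, sqrt(var_error))      # true condition effect = 0
     # By-participant analysis: aggregate to participant-by-condition means,
     # then test the condition difference across participants.
     m <- tapply(d$y, list(d$i, d$c), mean)
     t.test(m[, 1], m[, 2], paired = TRUE)$p.value
   }
   p_vals <- replicate(2000, simulate_once())
   mean(p_vals < .05)  # simulated type 1 error rate; exceeds the nominal .05
                       # because stimulus variance is ignored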
The old solution: Quasi-F statistics
• Although quasi-F statistics successfully address the statistical problem, they suffer from a variety of limitations:
   – They require a complete, orthogonal (balanced) design.
   – They cannot handle missing data.
   – They cannot handle continuous covariates.
   – A different quasi-F must be derived (often laboriously) for each new experimental design.
   – They are not widely implemented in major statistical packages.

The new solution: Mixed models
• Known variously as mixed-effects models, multilevel models, random-effects models, hierarchical linear models, etc.
• Most social psychologists are familiar with mixed models for hierarchical random factors
   – e.g., students nested in classrooms.
• Less well known is that mixed models can also easily accommodate designs with crossed random factors
   – e.g., participants crossed with stimuli.

[Illustration: decomposing hypothetical responses into fixed and random components]
• Grand mean = 100; condition means (as deviations): Mean_A = -5, Mean_B = +5
• Participant intercepts: 5.86, 7.09, -1.09, -4.53
• Stimulus intercepts: -2.84, -9.19, -1.16, 18.17
• Participant slopes: 3.02, -9.09, 3.15, -1.38
• Everything else = residual error

The linear mixed-effects model with crossed random effects
• For participant i responding to stimulus j in condition C_j:
   Y_ij = (β0 + u_0i + w_0j) + (β1 + u_1i)·C_j + e_ij
• Fixed effects: the intercept β0 and the Condition slope β1.
• Random effects: participant intercepts u_0i, participant slopes u_1i, stimulus intercepts w_0j, and residual error e_ij.
• Parameters to estimate: the two fixed effects, the random-effect variances and the participant intercept-slope covariance, and the residual variance.

Fitting mixed models is easy: Sample syntax (i = participant, j = stimulus, c = condition, y = response)

R:
   library(lme4)
   model <- lmer(y ~ c + (1 | j) + (c | i))

SAS:
   proc mixed covtest;
     class i j;
     model y = c / solution;
     random intercept c / sub=i type=un;
     random intercept / sub=j;
   run;

SPSS:
   MIXED y WITH c
     /FIXED=c
     /PRINT=SOLUTION TESTCOV
     /RANDOM=INTERCEPT c | SUBJECT(i) COVTYPE(UN)
     /RANDOM=INTERCEPT | SUBJECT(j).

Mixed models successfully maintain the nominal type 1 error rate (α = .05).

Applications to existing datasets
1. Representative simulated dataset (for comparison)
2. Afrocentric features data (Blair et al., 2002, 2004, 2005)
3. Shooter data (Correll et al., 2002, 2007)
4. Psi / retroactive priming data (Bem, 2011)
   – Forward-priming condition (classic evaluative priming effect)
   – Reverse-priming condition (psi condition)

Comparison of effects between RM-ANOVA and mixed model analyses

Dataset                        RM-ANOVA (by-participant)     Mixed model                   Stimulus ICC
                               F       d.f.      p           F       d.f.         p
Simulated example              30.48   (1, 29)   <.001       9.11    (1, 38.52)   .005     r = 0.191
Shooter data                   57.89   (1, 35)   <.001       3.39    (1, 48.1)    .072     r = 0.317
Afrocentric features data      6.40    (1, 46)   .015        4.33    (1, 51.1)    .043     r = 0.113
Bem (2011) forward-priming     22.18   (1, 98)   <.001       14.59   (1, 46.91)   .029     Targets: r = 0.349; Primes: r = 0.035
Bem (2011) reverse-priming     6.60    (1, 98)   .012        2.34    (1, 27.58)   .136     Targets: r = 0.292; Primes: r = 0.0
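The sketch below shows one way the mixed-model columns of the table above could be produced in R. It is not the analysis code from the talk: it assumes a long-format data frame d with columns y, c, i, and j; it uses the lmerTest package to obtain Satterthwaite denominator degrees of freedom; and the stimulus ICC shown is one common definition (stimulus intercept variance over the sum of the variance components), which may not match the exact definition behind the table's values.

   # Sketch only: mixed-model F test with approximate denominator df, plus a
   # stimulus intraclass correlation computed from the variance components.
   # Assumes a long-format data frame `d` with columns y, c, i (participant), j (stimulus).
   library(lmerTest)                 # loads lme4 and adds Satterthwaite df

   model <- lmer(y ~ c + (1 | j) + (c | i), data = d)

   anova(model)                      # F ratio, numerator/denominator df, p
   summary(model)                    # fixed effects and variance components

   # One common stimulus ICC: stimulus intercept variance divided by the sum of
   # all variance components (the intercept-slope covariance is excluded).
   vc   <- as.data.frame(VarCorr(model))
   vars <- vc[is.na(vc$var2), ]      # keep variance rows; drop the covariance row
   vars$vcov[vars$grp == "j"] / sum(vars$vcov)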
Conclusion
• Many failures of replication are probably due to the sampling of stimuli and to analyses that fail to take that sampling into account.
• Mixed models with crossed random effects allow generalization to future studies that use different samples of both stimuli and participants.

The end

Further reading:
Judd, C. M., Westfall, J., & Kenny, D. A. (2012). Treating stimuli as a random factor in social psychology: A new and comprehensive solution to a pervasive but largely ignored problem. Journal of Personality and Social Psychology, 103(1), 54-69.