Statistical power in experiments in which samples of participants respond to samples of stimuli

Jake Westfall, University of Colorado Boulder
David A. Kenny, University of Connecticut
Charles M. Judd, University of Colorado Boulder

• Studies involving participants responding to stimuli (hypothetical data matrix):
[Hypothetical data matrix: indexed by Subject # (1, 2, 3, ...), with single-digit responses in the cells (4, 6, 7, 3, 8, ...); the original row and column layout is not recoverable.]
• Just in domain of implicit prejudice and stereotyping:
  – IAT (Greenwald et al.)
  – Affective Priming (Fazio et al.)
  – Shooter task (Correll et al.)
  – Affect Misattribution Procedure (Payne et al.)
  – Go/No-Go task (Nosek et al.)
  – Primed Lexical Decision task (Wittenbrink et al.)
  – Many non-paradigmatic studies

Hard questions
• “How many stimuli should I use?”
• “How similar or variable should the stimuli be?”
• “When should I counterbalance the assignment of stimuli to conditions?”
• “Is it better to have all participants respond to the same set of stimuli, or should each participant receive different stimuli?”
• “Should participants make multiple responses to each stimulus, or should every response by a participant be to a unique stimulus?”

Power analysis in crossed designs
• Power determined by several parameters:
  – 1 effect size (Cohen’s d)
  – 2 sample sizes
    • p = # of participants
    • q = # of stimuli
  – Set of Variance Partitioning Coefficients (VPCs)
    • VPCs describe what proportion of the random variation in the data comes from which sources
    • Different designs depend on different VPCs

Four common experimental designs
[Figure not recoverable. Surviving annotations: “For power = 0.80, need q ≈ 50”; “For power = 0.80, need p ≈ 20”; “?”]

Maximum attainable power
• In crossed designs, power asymptotes at a maximum theoretically attainable value that depends on:
  – Effect size
  – Number of stimuli
  – Stimulus variability
• Under realistic assumptions, maximum attainable power can be quite low!

To obtain max. power = 0.9…
• Pessimist: q = 86
• Realist: q = 20 to 50
• Optimist: q = 11

Implications of maximum attainable power
• Think hard about your experimental stimuli before you begin collecting data!
  – Once data collection begins, maximum attainable power is pretty much determined.
• Even the most optimistic assumptions imply that we should use at least 11 stimuli per between-stimulus condition
  – Based on achieving max. power = 0.9 to detect a medium effect size (d = 0.5)

What about time-consuming stimulus presentation?
• Assume that responses to each stimulus take about 10 minutes (e.g., film clips).
• Power analysis says we need q = 60 to reach power = 0.8 (based on having p = 60)
• But then it would take over 10 hours for a participant to respond to every stimulus!
• The highest feasible number of responses per participant is, say, 6 (about one hour)
• Are we doomed to have low power? No!

Stimuli-within-Block designs
[Figure not recoverable. Surviving annotation: “Standard error reduced by factor of 2.3!”]

The end
URL for power app: JakeWestfall.org/power/
Article reference: Westfall, J., Kenny, D. A., & Judd, C. M. (in press). Statistical Power and Optimal Design in Experiments in Which Samples of Participants Respond to Samples of Stimuli. Journal of Experimental Psychology: General.
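
Supplementary sketch: maximum attainable power
The idea behind the “Maximum attainable power” slide can be illustrated with a rough calculation. Everything in it is an assumption made for illustration, not the authors' formula or the method used by the power app: two conditions, q stimuli in total split evenly between them, every participant responding to every stimulus, d expressed in units of the total standard deviation, and a normal approximation to the test statistic. Under those assumptions, as p grows without bound only the stimulus variance remains in the standard error of the condition difference, so power levels off. The function name `max_power_approx` and the parameter `stimulus_vpc` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def max_power_approx(d, q, stimulus_vpc, alpha=0.05):
    """Approximate two-sided power in the limit of infinitely many participants.

    Assumed model (illustrative only): q stimuli in total, q/2 nested in each
    of two conditions; stimulus_vpc is the share of total random variance due
    to stimuli; d is the condition difference in total-SD units.
    """
    if q <= 0 or stimulus_vpc <= 0:
        raise ValueError("q and stimulus_vpc must be positive")
    # Each condition mean keeps variance stimulus_vpc / (q / 2) from stimuli,
    # so the difference of the two means has variance 4 * stimulus_vpc / q.
    se_limit = sqrt(4.0 * stimulus_vpc / q)
    ncp = d / se_limit  # limiting noncentrality of the test statistic
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    # Normal approximation: probability of landing in either rejection region.
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)


# Power ceiling for a medium effect at several stimulus counts,
# using an assumed stimulus VPC of 0.3.
for q in (10, 20, 50, 100):
    print(q, round(max_power_approx(0.5, q, 0.3), 3))
```

Because the normal approximation ignores the limited degrees of freedom available when q is small, it will tend to overstate the ceiling for small stimulus sets; a noncentral t calculation would be more accurate there.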