Overlooking Stimulus Variance

Jake Westfall, University of Colorado Boulder

Charles M. Judd, University of Colorado Boulder

David A. Kenny, University of Connecticut

Cornfield & Tukey (1956):

“The two spans of the bridge of inference”

Ultimate targets of generalization

All healthy, Western adults;

All non-neutral visual stimuli

“Subject-matter span”

All potentially sampled participants/stimuli

All CU undergraduates taking Psych 101 in Spring 2014;

All short, common, strongly valenced English adjectives

“Statistical span”

My actual samples

50 University of Colorado undergraduates;

40 positively/negatively valenced English adjectives

Difficulties crossing the statistical span

• Failure to account for uncertainty associated with stimulus sampling (i.e., treating stimuli as fixed rather than random) leads to biased, overconfident estimates of effects

• The pervasive failure to model stimulus as a random factor is probably responsible for many failures to replicate when future studies use different stimulus samples

Doing the correct analysis is easy!

• Modern statistical procedures solve the statistical problem of stimulus sampling

• These linear mixed models with crossed random effects are easy to apply and are already widely available in major statistical packages

– R, SAS, SPSS, Stata, etc.

Illustrative Design

• Participants crossed with Stimuli

– Each Participant responds to each Stimulus

• Stimuli nested under Condition

– Each Stimulus always in either Condition A or Condition B

• Participants crossed with Condition

– Participants make responses under both Conditions
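
The three properties of this design can be sketched in a few lines of Python. The sizes (3 participants, 6 stimuli) and the labels `P1`, `S1`, `A`, `B` are hypothetical and chosen only to keep the example small.

```python
from itertools import product

# Hypothetical sizes and labels, for illustration only.
participants = [f"P{i}" for i in range(1, 4)]
stimuli = [f"S{j}" for j in range(1, 7)]

# Stimuli nested under Condition: each stimulus belongs to exactly one condition.
half = len(stimuli) // 2
condition_of = {s: ("A" if k < half else "B") for k, s in enumerate(stimuli)}

# Participants crossed with Stimuli: one row per (participant, stimulus) pair.
design = [(p, s, condition_of[s]) for p, s in product(participants, stimuli)]

# Participants crossed with Condition: every participant responds under both.
conditions_seen = {p: {c for q, _, c in design if q == p} for p in participants}

for row in design[:6]:
    print(row)
```

Each stimulus appears with a single condition label in every row, while each participant appears with both labels.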

Sample of hypothetical dataset (rows = participants, columns = stimuli):

                 Black                  White
Participant   1  2  3  4  5  6     7  8  9 10 11 12
     1        5  4  6  7  3  8     8  7  9  5  6  5
     2        4  4  7  8  4  6     9  6  7  4  5  6
     3        5  3  6  7  4  5     7  5  8  3  4  5

Typical repeated measures analyses (RM-ANOVA)

“By-participant analysis”

Participant   M Black   M White   Difference
     1          5.50      6.67       1.17
     2          5.50      6.17       0.67
     3          5.00      5.33       0.33

How variable are the stimulus ratings? The variance is lost due to the aggregation.
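
A minimal sketch of the by-participant analysis in Python. The `ratings` rows are reconstructed from the numbers on this slide, assuming three participants, twelve stimuli, and the first six stimuli in the Black condition. That layout is an inference, although it does reproduce the participant means shown. The paired t statistic is the usual one-sample t on the difference scores.

```python
from math import sqrt
from statistics import mean, stdev

# Reconstructed ratings (assumed layout): one row per participant,
# columns 0-5 = Black stimuli, columns 6-11 = White stimuli.
ratings = [
    [5, 4, 6, 7, 3, 8, 8, 7, 9, 5, 6, 5],
    [4, 4, 7, 8, 4, 6, 9, 6, 7, 4, 5, 6],
    [5, 3, 6, 7, 4, 5, 7, 5, 8, 3, 4, 5],
]

# Aggregate over stimuli within each condition, separately for each participant.
m_black = [mean(row[:6]) for row in ratings]
m_white = [mean(row[6:]) for row in ratings]
diffs = [w - b for b, w in zip(m_black, m_white)]
print([round(d, 2) for d in diffs])  # [1.17, 0.67, 0.33]

# Paired t statistic on the differences (df = number of participants - 1).
# Stimulus-to-stimulus variability no longer appears anywhere in this formula.
t_by_participant = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
print(round(t_by_participant, 2))
```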

Typical repeated measures analyses (RM-ANOVA)

“By-stimulus analysis”

Stimulus means (averaged over participants):

Black:  4.67  3.67  6.33  7.33  3.67  6.33
White:  8.00  6.00  8.00  4.00  5.00  5.33

Sample 1 vs. Sample 2
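
A matching sketch of the by-stimulus analysis. The `ratings` layout is the same reconstruction assumed for the by-participant case (three participants, first six stimuli Black), and the pooled-variance two-sample t test is one common choice for comparing the two sets of stimulus means, not necessarily the exact computation behind the slide.

```python
from math import sqrt
from statistics import mean, variance

# Reconstructed ratings (assumed layout): one row per participant,
# columns 0-5 = Black stimuli, columns 6-11 = White stimuli.
ratings = [
    [5, 4, 6, 7, 3, 8, 8, 7, 9, 5, 6, 5],
    [4, 4, 7, 8, 4, 6, 9, 6, 7, 4, 5, 6],
    [5, 3, 6, 7, 4, 5, 7, 5, 8, 3, 4, 5],
]

# Aggregate over participants: one mean per stimulus (per column).
stimulus_means = [mean(col) for col in zip(*ratings)]
black, white = stimulus_means[:6], stimulus_means[6:]

# Pooled-variance two-sample t statistic treating the two stimulus sets
# as independent samples; participant variability is averaged away.
n1, n2 = len(black), len(white)
pooled_var = ((n1 - 1) * variance(black) + (n2 - 1) * variance(white)) / (n1 + n2 - 2)
t_by_stimulus = (mean(white) - mean(black)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
print(round(mean(white) - mean(black), 2), round(t_by_stimulus, 2))
```

The mean difference is the same as in the by-participant analysis, but the test statistic is much smaller because stimulus variability now enters the error term.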

Simulation of Type 1 error rates for typical RM-ANOVA analyses

• Design is the same as previously discussed

• Draw random samples of participants and stimuli

– Variance components = 4, Error variance = 16

• Number of participants = 10, 30, 50, 70, 90

• Number of stimuli = 10, 30, 50, 70, 90

• Conducted both by-participant and by-stimulus analysis on each simulated dataset

• True Condition effect = 0
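
A scaled-down sketch of one cell of such a simulation (10 participants, 10 stimuli, by-participant analysis only), using only the Python standard library. Several details are assumptions and not the authors' code: the random effects are participant intercepts, participant condition slopes, and stimulus intercepts, each with variance 4 and mutually independent; the conditions are coded -0.5 and +0.5; and the number of replications and the seed are arbitrary.

```python
import random
from math import sqrt
from statistics import mean, stdev

N_PARTICIPANTS = 10
T_CRIT = 2.262  # two-sided alpha = .05 critical value of t with 9 df


def by_participant_type1_rate(n_stimuli=10, n_sims=2000, seed=2014):
    """Share of null datasets in which the by-participant t test rejects."""
    rng = random.Random(seed)
    sd_effect, sd_error = sqrt(4), sqrt(16)
    n_a = n_stimuli // 2  # first half of the stimuli in Condition A, rest in B
    codes = [-0.5] * n_a + [0.5] * (n_stimuli - n_a)
    rejections = 0
    for _ in range(n_sims):
        # A fresh random sample of stimuli for every simulated experiment.
        stim = [rng.gauss(0, sd_effect) for _ in range(n_stimuli)]
        diffs = []
        for _ in range(N_PARTICIPANTS):
            intercept = rng.gauss(0, sd_effect)
            slope = rng.gauss(0, sd_effect)
            # True Condition effect = 0, so no fixed condition term is added.
            y = [intercept + slope * c + s + rng.gauss(0, sd_error)
                 for c, s in zip(codes, stim)]
            diffs.append(mean(y[n_a:]) - mean(y[:n_a]))
        t = mean(diffs) / (stdev(diffs) / sqrt(N_PARTICIPANTS))
        rejections += abs(t) > T_CRIT
    return rejections / n_sims


rate = by_participant_type1_rate()
print(f"By-participant Type 1 error rate: {rate:.3f} (nominal .05)")
```

Because the true effect is zero, any rejection rate clearly above .05 reflects the stimulus variance that the by-participant test ignores.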

Type 1 error rate simulation results

• The exact simulated error rates depend on the variance components, which, although realistic, were ultimately arbitrary

• The main points to take away here are:

1. The standard analyses will virtually always show some degree of positive bias (Type 1 error rates above the nominal level)

2. In some (entirely realistic) cases, this bias can be extreme

3. The degree of bias depends in a predictable way on the design of the experiment (e.g., the sample sizes)
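
One way to see why the sample sizes matter, sketched for the by-participant analysis only. The setup is an assumption, not something stated on the slides: a balanced design with n participants and m stimuli per condition, independent random effects, stimulus variance \sigma^2_S, variance \sigma^2_{P \times C} of participants' condition differences, and error variance \sigma^2_E. Here d_i is participant i's mean under one condition minus the mean under the other, and s_d^2 is the sample variance of the d_i.

```latex
\begin{align*}
\operatorname{Var}(\bar{d})
  &= \frac{2\sigma^2_S}{m} + \frac{\sigma^2_{P \times C}}{n} + \frac{2\sigma^2_E}{nm}
  && \text{(true sampling variance of the mean difference)} \\
\mathbb{E}\!\left[\frac{s_d^2}{n}\right]
  &= \frac{\sigma^2_{P \times C}}{n} + \frac{2\sigma^2_E}{nm}
  && \text{(what the by-participant test uses)}
\end{align*}
```

Under these assumptions the by-participant standard error leaves out the term 2\sigma^2_S / m. That term does not shrink as n grows, so adding participants makes the omission proportionally larger, while adding stimuli reduces it.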

The old solution: Quasi-F statistics

• Although quasi-Fs successfully address the statistical problem, they suffer from a variety of limitations

– Require complete orthogonal design (balanced factors)

– No missing data

– No continuous covariates

– A different quasi-F must be derived (often laboriously) for each new experimental design

– Not widely implemented in major statistical packages

The new solution: Mixed models

• Known variously as:

– Mixed-effects models, multilevel models, random effects models, hierarchical linear models, etc.

• Most psychologists are familiar with mixed models for hierarchical random factors

– E.g., students nested in classrooms

• Less well known is that mixed models can also easily accommodate designs with crossed random factors

– E.g., participants crossed with stimuli

Grand mean = 100

Mean_A = -5, Mean_B = 5

Participant means: 5.86, 7.09, -1.09, -4.53

Stimulus means: -2.84, -9.19, -1.16, 18.17

Participant slopes: 3.02, -9.09, 3.15, -1.38

Everything else = residual error

The linear mixed-effects model with crossed random effects

Fixed effects Random effects
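
One way to write a model of this kind, matching the structure of the sample syntax for R, SAS, and SPSS (participants indexed by i, stimuli by j, condition code c). The symbols below are assumed notation and not necessarily the slide's own. Because each stimulus belongs to one condition, the code is written c_j.

```latex
\begin{align*}
y_{ij} &= \underbrace{\beta_0 + \beta_1 c_j}_{\text{fixed effects}}
        + \underbrace{u_{0i} + u_{1i} c_j + v_j + \varepsilon_{ij}}_{\text{random effects}} \\
(u_{0i}, u_{1i})^\top &\sim \mathcal{N}(\mathbf{0}, \Sigma_P), \qquad
v_j \sim \mathcal{N}(0, \sigma^2_S), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2_E)
\end{align*}
```

Here u_{0i} and u_{1i} are participant i's random intercept and condition slope with an unstructured covariance matrix \Sigma_P, v_j is the random intercept of stimulus j, and \varepsilon_{ij} is the residual error.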

Fitting mixed models is easy: Sample syntax

R

    library(lme4)
    model <- lmer(y ~ c + (1 | j) + (c | i))

SAS

    proc mixed covtest;
      class i j;
      model y=c/solution;
      random intercept c/sub=i type=un;
      random intercept/sub=j;
    run;

SPSS

    MIXED y WITH c
      /FIXED=c
      /PRINT=SOLUTION TESTCOV
      /RANDOM=INTERCEPT c | SUBJECT(i) COVTYPE(UN)
      /RANDOM=INTERCEPT | SUBJECT(j).

Mixed models successfully maintain the nominal Type 1 error rate (α = .05)

Conclusion

• Stimulus variation is a generalizability issue

• The conclusions we draw in the Discussion sections of our papers ought to be in line with the assumptions of the statistical methods we use

• Mixed models with crossed random effects allow us to generalize across both participants and stimuli

The end

Further reading:

Judd, C. M., Westfall, J., & Kenny, D. A. (2012). Treating stimuli as a random factor in social psychology: A new and comprehensive solution to a pervasive but largely ignored problem. Journal of Personality and Social Psychology, 103(1), 54-69.
