Mixed Effects ANOVA

Random and Mixed Effects ANOVA
A classification variable in ANOVA may be either “fixed” or “random.” The
meanings of “fixed” and “random” are the same as they were when we discussed the
distinction between regression and correlation analysis. With a fixed variable we treat
the observed values of the variable as the entire population of interest. Another way to
state this is to note that the sampling fraction is one. The sampling fraction is the
number of values in the sample divided by the number of values in the population.
Suppose that one of the classification variables in which I am interested is the
diagnosis given to a patient. There are three levels of this variable, 1 (melancholic
depression), 2 (postpartum depression), and 3 (seasonal affective disorder). Since I
consider these values (1, 2, and 3) the entire population of interest, the variable is fixed.
Suppose that a second classification variable is dose of experimental therapeutic
drug. The population of values of interest ranges from 0 units to 100 units. I randomly
chose five levels from a uniform population that ranges from 0 to 100, using this SAS
code:
Data Sample; Do Value=1 To 5;Dose=round(100*Uniform(0)); Output; End; run;
Proc Print; run;
quit;
Obs   Value   Dose
  1       1     12
  2       2     23
  3       3     54
  4       4     64
  5       5     98
In my research, I shall use the values 12, 23, 54, 64, and 98 units of the drug.
There is an uncountably large number of possible values between 0 and 100, so my
sampling fraction is 5/∞ = 0. Put another way, dose of drug is a random effects
variable.
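The sampling-fraction idea can be illustrated with a couple of lines of arithmetic. This is only a sketch in Python rather than SAS, and the function name `sampling_fraction` is made up for the illustration. It treats the population of doses as infinite, as the text does.

```python
import math

def sampling_fraction(n_sampled, n_population):
    """Number of values in the sample divided by the number in the population."""
    return n_sampled / n_population

# Diagnosis: all three values of interest are observed, so the variable is fixed.
diagnosis_fraction = sampling_fraction(3, 3)

# Dose: five values drawn from an uncountably large population (assumed infinite here).
dose_fraction = sampling_fraction(5, math.inf)

print("Diagnosis:", diagnosis_fraction)  # fraction of one, so fixed
print("Dose:", dose_fraction)            # fraction of zero, so random
```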
The Group x Dose ANOVA here will be “mixed effects,” because there is a
mixture of fixed and random effects. When calculating the F ratios, we need to consider
the expected values for the mean squares in both numerator and denominator. We
want the denominator (error term) to have an expected mean square that contains
everything in the numerator except the effect being tested.
Howell (Statistical Methods for Psychology, 7th edition, page 433) shows the
expected values for the mean squares. They are:
Main effect of Group (fixed):   Group, Interaction, Error
Main effect of Dose (random):   Dose, Error
Group x Dose Interaction:       Interaction, Error
Within Cells Error (MSE):       Error
ANOVA-MixedEffects.doc
The F for the main effect of group will be
$F = \dfrac{MS_{Group}}{MS_{Group \times Dose}}$, with expected mean squares $\dfrac{\text{Group} + \text{Interaction} + \text{Error}}{\text{Interaction} + \text{Error}}$.
Under the null, group has zero effect,
and the expected value of F is (0 + interaction + error)/(interaction + error) = 1. If group
has an effect, the expected value of F > 1.
The F for the main effect of dose will be
$F = \dfrac{MS_{dose}}{MS_{error}}$, with expected mean squares $\dfrac{\text{Dose} + \text{Error}}{\text{Error}}$.
Under the null, dose has no effect, and the expected value of F is (0 + error)/error = 1. If dose has
an effect, the expected value of F > 1.
The F for the Group x Dose interaction will be
$F = \dfrac{MS_{Group \times Dose}}{MS_{error}}$, with expected mean squares $\dfrac{\text{Interaction} + \text{Error}}{\text{Error}}$.
Under the null, the interaction has no effect, and
the expected value of F is (0 + error)/error = 1. If the interaction has an effect, the expected
value of F > 1.
You can use the TEST statement in PROC GLM to construct the appropriate F
tests.
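The rule for choosing an error term (the denominator should contain everything in the numerator except the effect being tested) can be checked mechanically. The following is a minimal sketch in Python, not part of the SAS analysis. The dictionary `EMS` simply restates the expected mean square components listed earlier, and `error_term` is a hypothetical helper name.

```python
# Expected mean square (EMS) components for each source,
# with Group fixed and Dose random, as listed earlier in this document.
EMS = {
    "Group": {"Group", "Interaction", "Error"},
    "Dose": {"Dose", "Error"},
    "Interaction": {"Interaction", "Error"},
    "Error": {"Error"},
}

def error_term(effect):
    """Return the source whose EMS equals the effect's EMS minus the effect itself."""
    target = EMS[effect] - {effect}
    for source, components in EMS.items():
        if components == target:
            return source
    return None  # no exact error term exists among the listed sources

for effect in ("Group", "Dose", "Interaction"):
    print(effect, "is tested against", error_term(effect))
```

Under these assumptions the sketch selects the interaction mean square for Group and the within-cells error for both Dose and the interaction.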
An Example
Download the Excel file ANOVA-MixedEffects.xls, available at
http://core.ecu.edu/psyc/wuenschk/StatData/StatData.htm .
Bring it into SAS. If you do not know how to do this, read my document Excel to
SAS.
Run this code:
proc glm; class group dose; model score = group|dose / ss3;
Test H = group E = group*dose;
title 'Mixed Effects ANOVA: Group is fixed, dose is random'; run;
Mixed Effects ANOVA: Group is fixed, dose is random
The GLM Procedure
Dependent Variable: Score

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             14       978.986667     69.927619     15.20   <.0001
Error             60       276.000000      4.600000
Corrected Total   74      1254.986667

R-Square   Coeff Var   Root MSE   Score Mean
0.780077    28.02388   2.144761     7.653333
Source       DF   Type III SS   Mean Square   F Value   Pr > F
Group         2   246.7466667   123.3733333     26.82   <.0001
Dose          4   612.5866667   153.1466667     33.29   <.0001
Group*Dose    8   119.6533333    14.9566667      3.25   0.0039
Tests of Hypotheses Using the Type III MS for Group*Dose as an Error Term
Source   DF   Type III SS   Mean Square   F Value   Pr > F
Group     2   246.7466667   123.3733333      8.25   0.0114
The appropriate F statistics are:
Group: F(2, 8) = 8.25
Dose: F(4, 60) = 33.29
Group x Dose: F(8, 60) = 3.25.
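These three F ratios can be reproduced by hand from the Type III mean squares in the SAS output. The short Python sketch below is only an arithmetic check, not a replacement for the GLM run. The mean squares are copied from the output above, and the variable names are illustrative.

```python
# Type III mean squares copied from the PROC GLM output.
ms = {
    "Group": 123.3733333,
    "Dose": 153.1466667,
    "Group*Dose": 14.9566667,
    "Error": 4.6,
}

# Group (fixed) is tested against the interaction; Dose (random) and the
# interaction are tested against the within-cells error.
f_group = ms["Group"] / ms["Group*Dose"]
f_dose = ms["Dose"] / ms["Error"]
f_interaction = ms["Group*Dose"] / ms["Error"]

print(f"Group:        F(2, 8)  = {f_group:.2f}")
print(f"Dose:         F(4, 60) = {f_dose:.2f}")
print(f"Group x Dose: F(8, 60) = {f_interaction:.2f}")
```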
Random Command
GLM has a random command that can be used to identify random effects.
Unfortunately it does not result in the proper selection of error terms for a mixed
model.
proc glm; class group dose; model score = group|dose / ss3; Random dose
group*dose / Test;
title 'Mixed Effects ANOVA: Group is fixed, dose is random'; run;
Tests of Hypotheses for Mixed Model Analysis of Variance
Dependent Variable: Score

Source   DF   Type III SS   Mean Square   F Value   Pr > F
Group     2    246.746667    123.373333      8.25   0.0114
Dose      4    612.586667    153.146667     10.24   0.0031
Error     8    119.653333     14.956667
Error: MS(Group*Dose)

Source             DF   Type III SS   Mean Square   F Value   Pr > F
Group*Dose          8    119.653333     14.956667      3.25   0.0039
Error: MS(Error)   60    276.000000      4.600000
Notice that SAS has used the interaction MS as the error term for both main
effects. It should have used it only for the main effect of group. SPSS UNIANOVA has
the same problem. Howell (7th ed.) has noted this in footnote 2 on page 434.
Power Considerations
Interaction mean squares typically have far fewer degrees of freedom than do
error mean squares. This can cost one considerable power when an interaction mean
square is used as the denominator of an F ratio. Howell (5th edition, page 445)
suggested one possible way around this problem. If the interaction effect has a large p
value (.25 or more), dump it from the model. This will result in the interaction SS and
the interaction df being pooled together with the error SS and the error df. The resulting
SSGroup Dose  SSerror
pooled error term,
, is then used as the denominator for testing both
dfGroup Dose  dferror
main effects. For the data used above, the interaction was significant, so this would not
be appropriate. If the interaction had not been even close to significant, this code would
produce the appropriate analysis:
proc glm; class group dose; model score = group dose / ss3;
title 'Main Effects Only, Interaction Pooled With Within-Cells Error'; run;
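To show only the arithmetic of pooling, the sketch below applies the pooled error formula to the sums of squares from the example above. As already noted, pooling is not appropriate for these data because the interaction was significant, so the resulting numbers are purely illustrative. The code is Python, and the variable names are made up for the illustration.

```python
# Sums of squares and degrees of freedom copied from the earlier GLM output.
ss_interaction, df_interaction = 119.6533333, 8
ss_error, df_error = 276.0, 60

# Pool the interaction with the within-cells error.
df_pooled = df_interaction + df_error
ms_pooled = (ss_interaction + ss_error) / df_pooled

# Both main effects would then be tested against the pooled term.
f_group = 123.3733333 / ms_pooled
f_dose = 153.1466667 / ms_pooled

print(f"Pooled error MS = {ms_pooled:.4f} on {df_pooled} df")
print(f"Group: F(2, {df_pooled}) = {f_group:.2f}")
print(f"Dose:  F(4, {df_pooled}) = {f_dose:.2f}")
```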
Subjects – the Hidden Random Effect
We pretend that we have randomly sampled subjects from the population to
which we wish to generalize our results – or we restrict our generalizations to that
abstract population for which our sample could be considered random. Accordingly,
subjects is a random variable. In the typical ANOVA, independent samples and fixed
effects, subjects is lurking there as a random effect. What we call “error” is simply the
effect of subjects, which is nested within the cells. ANOVA is only necessary when we
have at least one random effect (typically subjects) and we wish to generalize our
results to the entire population of subjects from which we randomly sampled. If we were
to consider subjects to be a fixed variable, then we would have the entire population
and would not need ANOVA – the means, standard deviations, etc. computed with our
data would be parameters, not statistics, and there would be no need for inferential
statistics.
Nested and Crossed Factors
Suppose one factor was Households and another was Neighborhoods.
Households would be nested within Neighborhoods – each household is in only one
neighborhood. If you know the identity of the household, you also know the identity of
the neighborhood.
Neighborhood 1   Neighborhood 2   Neighborhood 3
H1               H4               H7
H2               H5               H8
H3               H6               H9
Now suppose that one factor is Teachers, the other is Schools, and each teacher
taught at each of the three schools. Teachers and Schools are crossed.
School 1   School 2   School 3
T1         T1         T1
T2         T2         T2
T3         T3         T3
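The difference between nested and crossed factors can also be expressed as a simple check on which combinations of levels occur. The following is a minimal sketch in Python. The helper names `is_nested` and `is_crossed` and the labels `N1` to `N3` are hypothetical, and the data mirror the two layouts described above.

```python
def is_nested(pairs):
    """True if every inner level occurs under exactly one outer level."""
    owner = {}
    for outer, inner in pairs:
        # setdefault records the first outer level seen for this inner level
        if owner.setdefault(inner, outer) != outer:
            return False
    return True

def is_crossed(pairs):
    """True if every inner level occurs with every outer level."""
    pairs = set(pairs)
    outers = {outer for outer, _ in pairs}
    inners = {inner for _, inner in pairs}
    return len(pairs) == len(outers) * len(inners)

# Households within neighborhoods: each household is in only one neighborhood.
households = [("N1", "H1"), ("N1", "H2"), ("N1", "H3"),
              ("N2", "H4"), ("N2", "H5"), ("N2", "H6"),
              ("N3", "H7"), ("N3", "H8"), ("N3", "H9")]

# Teachers and schools: each teacher taught at each of the three schools.
teachers = [(school, teacher)
            for school in ("School 1", "School 2", "School 3")
            for teacher in ("T1", "T2", "T3")]

print("Households nested:", is_nested(households), "crossed:", is_crossed(households))
print("Teachers nested:", is_nested(teachers), "crossed:", is_crossed(teachers))
```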
Between Subjects (Independent Samples) Designs
The subjects factor is nested within the grouping factor(s).
Group 1   Group 2   Group 3
S1        S4        S7
S2        S5        S8
S3        S6        S9
Within Subjects (Repeated Measures, Related Samples, Randomized Blocks,
Split-Plot) Designs
With this design, the subjects (or blocks or plots) factor is crossed with the other
factor.
Condition 1   Condition 2   Condition 3
S1            S1            S1
S2            S2            S2
S3            S3            S3
Omega Squared
If you want to use ω² with data from a mixed-effects or random effects ANOVA,
you will need to read pages 438-440 in Howell (7th ed.). After reading that, you just might decide
that η² is adequate.
Karl L. Wuensch, Dept. of Psychology, East Carolina Univ., Greenville, NC USA
19. December 2010