Chapter 22: Three-Way ANOVA

Section A: Conceptual Foundation

You will need to use the following from previous chapters:
Symbols
k: Number of independent groups in a one-way ANOVA
c: Number of levels (i.e., conditions) of an RM factor
n: Number of subjects in each cell of a factorial ANOVA
NT: Total number of observations in an experiment
Formulas
Formula 16.2: SSinter (by subtraction); also Formulas 16.3, 16.4, and 16.5
Formula 14.3: SSbet or one of its components
Concepts
Advantages and disadvantages of the RM ANOVA
SS components of the one-way RM ANOVA
SS components of the two-way ANOVA
Interaction of factors in a two-way ANOVA
So far I have covered two types of two-way factorial ANOVAs: two-way independent (Chapter 14) and the mixed design ANOVA (Chapter 16). There is only
one more simple two-way ANOVA to describe: the two-way repeated measures
design. [There are other two-way designs, such as those including random-effects or nested factors, but they are not commonly used—see Hays (1994) for
a description of some of these.] Just as the one-way RM ANOVA can be
described in terms of a two-way independent-groups ANOVA, the two-way RM
ANOVA can be described in terms of a three-way independent-groups ANOVA.
This gives me a reason to describe the latter design next. Of course, the three-way factorial ANOVA is interesting in its own right, and its frequent use in the
psychological literature makes it an important topic to cover, anyway. I will deal
with the three-way independent-groups ANOVA and the two-way RM ANOVA
in this section and the two types of three-way mixed designs in Section B.
Computationally, the three-way ANOVA adds nothing new to the procedure you learned for the two-way; the same basic formulas are used a greater
number of times to extract a greater number of SS components from SStotal
(eight SSs for the three-way as compared with four for the two-way). However,
anytime you include three factors, you can have a three-way interaction, and
that is something that can get quite complicated, as you will see. To give you a
manageable view of the complexities that may arise when dealing with three
factors, I’ll start with a description of the simplest case: the 2 × 2 × 2 ANOVA.
A Simple Three-Way Example
At the end of Section B in Chapter 14, I reported the results of a published
study, which was based on a 2 × 2 ANOVA. In that study one factor contrasted subjects who had an alcohol-dependent parent with those who did
not. I’ll call this the alcohol factor and its two levels, at risk (of codependency) and control. The other factor (the experimenter factor) also had two
levels; in one level subjects were told that the experimenter was an exploitive
person, and in the other level the experimenter was described as a nurturing
person. All of the subjects were women. If we imagine that the experiment
was replicated using equal-sized groups of men and women, the original
two-way design becomes a three-way design with gender as the third factor.
We will assume that all eight cells of the 2 × 2 × 2 design contain the same
number of subjects. As in the case of the two-way ANOVA, unbalanced three-way designs can be difficult to deal with both computationally and conceptually and therefore will not be discussed in this chapter (see Chapter 18, Section A). The cell means for a three-factor experiment are often displayed
in published articles in the form of a table, such as Table 22.1.
Table 22.1

                   Nurturing    Exploitive    Row Mean
Control:
  Men                 40            28           34
  Women               30            22           26
  Mean                35            25           30
At risk:
  Men                 36            48           42
  Women               40            88           64
  Mean                38            68           53
Column mean          36.5          46.5         41.5
Graphing Three Factors
The easiest way to see the effects of this experiment is to graph the cell
means. However, putting all of the cell means on a single graph would not be
an easy way to look at the three-way interaction. It is better to use two
graphs side by side, as shown in Figure 22.1. With a two-way design one has
to decide which factor is to be placed along the horizontal axis, leaving the
other to be represented by different lines on the graph. With a three-way
design one chooses both the factor to be placed along the horizontal axis and
the factor to be represented by different lines, leaving the third factor to be
represented by different graphs. These decisions result in six different ways
that the cell means of a three-way design can be presented.
[Figure 22.1: Graph of Cell Means for Data in Table 22.1. Two side-by-side panels (Women and Men), each plotting the At risk and Control cell means across the Nurturing and Exploitive conditions.]

Let us look again at Figure 22.1. The graph for the women shows the two-way interaction you would expect from the study on which it is based. The graph for the men shows the same kind of interaction, but to a considerably lesser extent (the lines for the men are closer to being parallel). This difference
in amount of two-way interaction for men and women constitutes a three-way
interaction. If the two graphs had looked exactly the same, the F ratio for the
three-way interaction would have been zero. However, that is not a necessary
condition. A main effect of gender could raise the lines on one graph relative
to the other without contributing to a three-way interaction. Moreover, an
interaction of gender with the experimenter factor could rotate the lines on
one graph relative to the other, again without contributing to the three-way
interaction. As long as the difference in slopes (i.e., the amount of two-way
interaction) is the same in both graphs, the three-way interaction will be zero.
Simple Interaction Effects
A three-way interaction can be defined in terms of simple effects in a way that
is analogous to the definition of a two-way interaction. A two-way interaction
is a difference in the simple main effects of one of the variables as you change
levels of the other variable (if you look at just the graph of the women in Figure 22.1, each line is a simple main effect). In Figure 22.1 each of the two
graphs can be considered a simple effect of the three-way design—more specifically, a simple interaction effect. Each graph depicts the two-way interaction
of alcohol and experimenter at one level of the gender factor. The three-way
interaction can be defined as the difference between these two simple interaction effects. If the simple interaction effects differ significantly, the three-way
interaction will be significant. Of course, it doesn’t matter which of the three
variables is chosen as the one whose different levels are represented as different graphs—if the three-way interaction is statistically significant, there will be
significant differences in the simple interaction effects in each case.
Varieties of Three-Way Interactions
Just as there are many patterns of cell means that lead to two-way interactions (e.g., one line is flat while the other goes up or down, the two lines go
in opposite directions, or the lines go in the same direction but with different slopes), there are even more distinct patterns in a three-way design. Perhaps the simplest is when all of the means are about the same, except for
one, which is distinctly different. For instance, in our present example the
results might have shown no effect for the men (all cell means about 40), no
difference for the control women (both means about 40), and a mean of 40
for at-risk women exposed to the nice experimenter. Then, if the mean for at-risk women with the exploitive experimenter were well above 40, there
would be a strong three-way interaction. This is a situation in which all three
variables must be at the “right” level simultaneously to see the effect—in this
variation of our example the subject must be female and raised by an alcohol-dependent parent and exposed to the exploitive experimenter to attain a
high score. Not only might the three-way interaction be significant, but one
cell mean might be significantly different from all of the other cell means,
making an even stronger case that all three variables must be combined
properly to see any effect (if you were sure that this pattern were going to
occur, you could test a contrast comparing the average of seven cell means
to the one you expect to be different and not bother with the ANOVA at all).
More often the results are not so clear-cut, but there is one cell mean
that is considerably higher than the others (as in Figure 22.1). This kind of
pattern is analogous to the ordinal interaction in the two-way case and tends
to cause all of the effects to be significant. On the other hand, a three-way
interaction could arise because the two-way interaction reverses its pattern
when changing levels of the third variable (e.g., imagine that in Figure 22.1
the labels of the two lines were reversed for the graph of men but not for the
women). This is analogous to the disordinal interaction in the two-way case.
Or, the two-way interaction could be strong at one level of the third variable
and much weaker (or nonexistent) at another level. Of course, there are
many other possible variations. And consider how much more complicated
the three-way interaction can get when each factor has more than two levels
(we will deal with a greater number of levels in Section B).
Fortunately, three-way (between-subjects) ANOVAs with many levels for
each factor are not common. One reason is a practical one: the number of
subjects required. Even a design as simple as a 2 × 3 × 4 has 24 cells (to find
the number of cells, you just multiply the numbers of levels). If you want to
have at least 5 subjects per cell, 120 subjects are required. This is not an
impractical study, but you can see how quickly the addition of more levels
would result in a required sample size that could be prohibitive.
Main Effects
In addition to the three-way interaction there are three main effects to look at,
one for each factor. To look at the gender main effect, for instance, just take the
average of the scores for all of the men and compare it to the average of all of
the women. If you have the cell means handy and the design is balanced, you
can average all of the cell means involving men and then all of the cell means
involving women. In Table 22.1, you can average the four cell means for the
men (40, 28, 36, 48) to get 38 (alternatively, you could use the row means in the
extreme right column and average 34 and 42 to get the same result). The average for the women (30, 22, 40, 88) is 45. The means for the other main effects
have already been included in Table 22.1. Looking at the bottom row you can
see that the mean for the nurturing experimenter is 36.5 as compared to 46.5
for the exploitive one. In the extreme right column you’ll find that the mean
for the control subjects is 30, as compared to 53 for the at-risk subjects.
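Because the design is balanced, these marginal means can be computed mechanically from the cell means. Here is a minimal sketch, assuming NumPy is available (the array layout and variable names are my own, not from the text):

```python
import numpy as np

# Cell means from Table 22.1, indexed as [alcohol, gender, experimenter]:
# alcohol: 0 = control, 1 = at risk; gender: 0 = men, 1 = women;
# experimenter: 0 = nurturing, 1 = exploitive.
cells = np.array([[[40, 28],     # control, men
                   [30, 22]],    # control, women
                  [[36, 48],     # at risk, men
                   [40, 88]]])   # at risk, women

# A main effect averages over the other two factors (valid because n is
# equal in every cell).
print(cells.mean(axis=(0, 2)))   # gender means: [38. 45.]
print(cells.mean(axis=(0, 1)))   # experimenter means: [36.5 46.5]
print(cells.mean(axis=(1, 2)))   # alcohol means: [30. 53.]
```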
Two-Way Interactions in Three-Way ANOVAs
Further complicating the three-way ANOVA is that, in addition to the three-way interaction and the three main effects, there are three two-way interactions to consider. In terms of our example there are the gender by
experimenter, gender by alcohol, and experimenter by alcohol interactions.
We will look at the last of these first. Before graphing a two-way interaction
in a three-factor design, you have to “collapse” (i.e., average) your scores over
the variable that is not involved in the two-way interaction. To graph the alcohol by experimenter (A × B) interaction you need to average the men with the
women for each combination of alcohol and experimenter levels (i.e., each
cell of the A × B matrix). These means have also been included in Table 22.1.
[Figure 22.2: Graph of Cell Means in Table 22.1 after Averaging Across Gender (average of men and women); At risk and Control lines across the Nurturing and Exploitive conditions.]

The graph of these cell means is shown in Figure 22.2. If you compare
this overall two-way interaction with the two-way interactions for the men
and women separately (see Figure 22.1), you will see that the overall interaction looks like an average of the two separate interactions; the amount of
interaction seen in Figure 22.2 is midway between the amount of interaction
for the men and that amount for the women. Does it make sense to average
the interactions for the two genders into one overall interaction? It does if
they are not very different. How different is too different? The size of the
three-way interaction tells us how different these two two-way interactions
are. A statistically significant three-way interaction suggests that we should
be cautious in interpreting any of the two-way interactions. Just as a significant two-way interaction tells us to look carefully at, and possibly test, the simple main effects (rather than the overall main effects), a significant three-way interaction suggests that we focus on the simple interaction effects—the
two-way interactions at each level of the third variable (which of the three
independent variables is treated as the “third” variable is a matter of convenience). Even if the three-way interaction falls somewhat short of significance, I would recommend caution in interpreting the two-way interactions
and the main effects, as well, whenever the simple interaction effects look
completely different and, perhaps, show opposite patterns.
So far I have been focusing on the two-way interaction of alcohol and
experimenter in our example, but this choice is somewhat arbitrary. The two
genders are populations that we are likely to have theories about, so it is
often meaningful to compare them. However, I can just as easily graph the
three-way interaction using “alcohol” as the third factor, as I have done in
Figure 22.3a. To graph the overall two-way interaction of gender and experimenter, you can go back to Table 22.1 and average across the alcohol factor.
For instance, the mean for men in the nurturing condition is found by averaging the mean for control group men in the nurturing condition (40) with the mean for at-risk men in the nurturing condition (36), which is 38. The
overall two-way interaction of gender and experimenter is shown in Figure
22.3b. Note that once again the two-way interaction is a compromise. (Actually, the two two-way interactions are not as different as they look; in both
cases the slope of the line for the women is more positive—or at least less
negative). For completeness, I have graphed the three-way interaction using
experimenter as the third variable, and the overall two-way interaction of
gender and alcohol in Figures 22.4a and 22.4b.

[Figure 22.3a: Graph of Cell Means in Table 22.1 Using the "Alcohol" Factor to Distinguish the Panels. Two panels (At Risk and Control), each plotting the Men and Women means across the Nurturing and Exploitive conditions.]

[Figure 22.3b: Graph of Cell Means in Table 22.1 after Averaging Across the "Alcohol" Factor (average of control and at risk); Women and Men lines across the Nurturing and Exploitive conditions.]

[Figure 22.4a: Graph of Cell Means in Table 22.1 Using the "Experimenter" Factor to Distinguish the Panels. Two panels (Nurturing and Exploitive), each plotting the Men and Women means across the Control and At risk conditions.]

[Figure 22.4b: Graph of Cell Means in Table 22.1 after Averaging Across the "Experimenter" Factor (average of nurturing and exploitive); Women and Men lines across the Control and At risk conditions.]
An Example of a Disordinal Three-Way Interaction
In the three-factor example I have been describing, it looks like all three
main effects and all three two-way interactions, as well as the three-way
interaction, could easily be statistically significant. However, it is important
to note that in a balanced design all seven of these effects are independent;
the seven F ratios do share the same error term (i.e., denominator), but the
sizes of the numerators are entirely independent. It is quite possible to have a large three-way interaction while all of the other effects are quite small. By
changing the means only for the men in our example, I will illustrate a large,
disordinal interaction that obliterates two of the two-way interactions and
two of the main effects. You can see in Figure 22.5a that this new three-way
interaction is caused by a reversal of the alcohol by experimenter interaction
from one gender to the other. In Figure 22.5b, you can see that the overall
interaction of alcohol by experimenter is now zero (the lines are parallel); the gender by experimenter interaction is also zero (not shown). On the other hand,
the large gender by alcohol interaction very nearly obliterates the main
effects of both gender and alcohol (see Figure 22.5c). The main effect of
experimenter is, however, large, as can be seen in Figure 22.5b.
[Figure 22.5a: Rearranging the Cell Means of Table 22.1 to Depict a Disordinal Three-Way Interaction. Two panels (Women and Men), each plotting the At risk and Control means across the Nurturing and Exploitive conditions.]

[Figure 22.5b: Regraphing Figure 22.5a after Averaging Across Gender (average of men and women); At risk and Control lines across the Nurturing and Exploitive conditions.]

[Figure 22.5c: Regraphing Figure 22.5a after Averaging Across the "Experimenter" Factor (average of nurturing and exploitive); Women and Men lines across the Control and At risk conditions.]

An Example in which the Three-Way Interaction Equals Zero
Finally, I will change the means for the men once more to create an example
in which the three-way interaction is zero, even though the graphs for the two genders do not look the same. In Figure 22.6, I created the means for the
men by starting out with the women’s means and subtracting 10 from each
(this creates a main effect of gender); then I added 30 only to the men’s
means that involved the nurturing condition. The latter change creates a
two-way interaction between experimenter and gender, but because it affects both of the men's nurturing means equally, it does not produce any three-way interaction. One way to see that the three-way interaction is zero in Figure 22.6 is to subtract the slopes of the two lines for each gender. For the
women the slope of the at-risk line is positive: 88 − 40 = 48. The slope of the
control line is negative: 22 − 30 = −8. The difference of the slopes is 48 − (−8) = 56.
If we do the same for the men, we get slopes of 18 and −38, whose difference
is also 56. You may recall that a 2 × 2 interaction has only one df, and can be
summarized by a single number, L, that forms the basis of a simple linear
contrast. The same is true for a 2 × 2 × 2 interaction or any higher-order
interaction in which all of the factors have two levels. Of course, quantifying
a three-way interaction gets considerably more complicated when the factors have more than two levels, but it is safe to say that if the two (or more)
graphs are exactly the same, there will be no three-way interaction (they will
continue to be identical, even if a different factor is chosen to distinguish the
graphs). Bear in mind, however, that even if the graphs do not look the same, the three-way interaction will be zero if the amount of two-way interaction is the same for every graph.

[Figure 22.6: Rearranging the Cell Means of Table 22.1 to Depict a Zero Amount of Three-Way Interaction. Two panels (Women and Men), each plotting the At risk and Control means across the Nurturing and Exploitive conditions.]
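The slope-subtraction argument above is easy to verify numerically. A small sketch in Python (the men's means are reconstructed from the description of Figure 22.6, and the layout is my own):

```python
# (nurturing, exploitive) means for each line in Figure 22.6
women = {"at_risk": (40, 88), "control": (30, 22)}
men   = {"at_risk": (60, 78), "control": (50, 12)}  # women − 10, then +30 on nurturing

def L(cells):
    """2 x 2 interaction contrast: difference between the two line slopes."""
    slope_at_risk = cells["at_risk"][1] - cells["at_risk"][0]
    slope_control = cells["control"][1] - cells["control"][0]
    return slope_at_risk - slope_control

print(L(women))            # 48 − (−8) = 56
print(L(men))              # 18 − (−38) = 56
print(L(women) - L(men))   # 0: the three-way interaction is zero
```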
Calculating the Three-Way ANOVA
Calculating a three-way independent-groups ANOVA is a simple extension of
the method for a two-way independent-groups ANOVA, using the same basic
formulas. In particular, there is really nothing new about calculating MSW (the
error term for all the F ratios); it is just the ordinary average of the cell variances when the design is balanced. (It is hard to imagine that anyone would
calculate an unbalanced three-way ANOVA with a calculator rather than a
computer, so I will not consider that possibility. The analysis of unbalanced
designs is described in general in Chapter 18, Section A). Rather than give you
all of the cell standard deviations or variances for the example in Table 22.1, I’ll
just tell you that SSW equals 6,400; later I’ll divide this by dfW to obtain MSW. (If
you had all of the raw scores, you would also have the option of obtaining SSW
by calculating SStotal and subtracting SSbetween-cells as defined in the following.)
Main Effects
The calculation of the main effects is also the same as in the two-way
ANOVA; the SS for a main effect is just the biased variance of the relevant
group means multiplied by the total N. Let us say that each of the eight cells
in our example contains five subjects, so NT equals 40. Then the SS for the
experimenter factor (SSexper) is 40 times the biased variance of 36.5 and 46.5
(the nurturing and exploitive means from Table 22.1), which equals 40(25) =
1000 (the shortcut for finding the biased variance of two numbers is to take
the square of the difference between them and then divide by 4). Similarly,
SSalcohol = 40(132.25) = 5290, and SSgender = 40(12.25) = 490.
The Two-Way Interactions
When calculating the two-way ANOVA, the SS for the two-way interaction is
found by subtraction; it is the amount of the SSbetween-cells that is left after subtracting the SSs for the main effects. Similarly, the three-way interaction SS
is the amount left over after subtracting the SSs for the main effects and the
SSs for all the two-way interactions from the overall SSbetween-cells. However,
finding the SSs for the two-way interactions in a three-way design gets a little tricky. In addition to the overall SSbetween-cells, we must also calculate some
intermediate “two-way” SSbetween terms.
To keep track of these I will have to introduce some new subscripts. The
overall SSbetween-cells is based on the variance of all the cell means, so no factors
are “collapsed,” or averaged over. Representing gender as G, alcohol as A, and
experimenter as E, the overall SSbetween-cells will be written as SSGAE. We will
also need to calculate an SSbetween after averaging over gender. This is based on
the four means (included in Table 22.1) I used to graph the alcohol by experimenter interaction and will be represented by SSAE. Because the design is
balanced, you can take the simple average of the appropriate male cell mean
and female cell mean in each case. Note that SSAE is not the SS for the alcohol by experimenter interaction because it also includes the main effects of
those two factors. In similar fashion, we need to find SSGA from the means
you get after averaging over the experimenter factor and SSGE by averaging
over the alcohol factor. Once we have calculated these four SSbetween terms, all
of the SSs we need for the three-way ANOVA can be found by subtraction.
Let’s begin with the calculation of SSGAE; the biased variance of the eight
cell means is 366.75, so SSGAE = 40(366.75) = 14,670. The means for SSAE are
35, 25, 38, 68, and their biased variance equals 257.25, so SSAE = 10,290. SSGA
is based on the following means: 34, 26, 42, 64, so SSGA = 40(200.75) = 8,030.
Finally, SSGE, based on means of 38, 38, 35, 55, equals 2,490.
Next we find the SSs for each two-way interaction:
SSA × E = SSAE − SSalcohol − SSexper = 10,290 − 5,290 − 1,000 = 4,000
SSG × A = SSGA − SSgender − SSalcohol = 8,030 − 490 − 5,290 = 2,250
SSG × E = SSGE − SSgender − SSexper = 2,490 − 490 − 1,000 = 1,000
Finally, the SS for the three-way interaction (SSG × A × E) equals
SSGAE − SSA × E − SSG × A − SSG × E − SSgender − SSalcohol − SSexper
= 14,670 − 4,000 − 2,250 − 1,000 − 490 − 5,290 − 1,000 = 640
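The whole decomposition above can be reproduced in a few lines. A sketch assuming NumPy (the ss helper simply implements "biased variance of the relevant means times NT"; the names are mine):

```python
import numpy as np

# Cell means from Table 22.1, indexed as [gender, alcohol, experimenter].
cells = np.array([[[40., 28.], [36., 48.]],    # men: control, at risk
                  [[30., 22.], [40., 88.]]])   # women: control, at risk
NT = 40  # 8 cells x 5 subjects

def ss(means):
    # np.var is the biased variance (it divides by the number of means)
    return NT * np.asarray(means).var()

SS_gender  = ss(cells.mean(axis=(1, 2)))                          # 490
SS_alcohol = ss(cells.mean(axis=(0, 2)))                          # 5,290
SS_exper   = ss(cells.mean(axis=(0, 1)))                          # 1,000
SS_AxE = ss(cells.mean(axis=0)) - SS_alcohol - SS_exper           # 4,000
SS_GxA = ss(cells.mean(axis=2)) - SS_gender - SS_alcohol          # 2,250
SS_GxE = ss(cells.mean(axis=1)) - SS_gender - SS_exper            # 1,000
SS_GxAxE = (ss(cells) - SS_AxE - SS_GxA - SS_GxE
            - SS_gender - SS_alcohol - SS_exper)                  # 640
```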
Formulas for the General Case
It is traditional to assign the letters A, B, and C to the three independent
variables in the general case; variables D, E, and so forth, can then be added
to represent a four-way, five-way, or higher ANOVA. I’ll assume that the following components have already been calculated using Formula 14.3
applied to the appropriate means: SSA, SSB, SSC, SSAB, SSAC, SSBC, SSABC. In
addition, I’ll assume that SSW has also been calculated, either by averaging
the cell variances and multiplying by dfW or by subtracting SSABC from SStotal.
The remaining SS components are found by Formula 22.1:
a. SSA × B = SSAB − SSA − SSB
b. SSA × C = SSAC − SSA − SSC                              Formula 22.1
c. SSB × C = SSBC − SSB − SSC
d. SSA × B × C = SSABC − SSA × B − SSB × C − SSA × C − SSA − SSB − SSC
At the end of the analysis, SStotal (whether or not it has been calculated
separately) has been divided into eight components: SSA, SSB, SSC, the four
interactions listed in Formula 22.1, and SSW. Each of these is divided by its
corresponding df to form a variance estimate, MS. Using a to represent the
number of levels of the A factor, b for the B factor, c for the C factor, and n
for the number of subjects in each cell, the formulas for the df components
are as follows:
a. dfA = a − 1
b. dfB = b − 1
c. dfC = c − 1
d. dfA × B = (a − 1)(b − 1)                                Formula 22.2
e. dfA × C = (a − 1)(c − 1)
f. dfB × C = (b − 1)(c − 1)
g. dfA × B × C = (a − 1)(b − 1)(c − 1)
h. dfW = abc(n − 1)
Completing the Analysis for the Example
Because each factor in the example has only two levels, all of the numerator
df’s are equal to 1, which means that all of the MS terms are equal to their
corresponding SS terms—except, of course, for the error term. The df for the
error term (i.e., dfW) equals the number of cells (abc) times one less than the
number of subjects per cell (this gives the same value as NT minus the number of cells); in this case dfW = 8(4) = 32. MSW = SSW/dfW; therefore, MSW =
6400/32 = 200. (Reminder: I gave the value of SSW to you to reduce the
amount of calculation.)
Now we can complete the three-way ANOVA by calculating all of the
possible F ratios and testing each for statistical significance:
Fgender = MSgender/MSW = 490/200 = 2.45
Falcohol = MSalcohol/MSW = 5,290/200 = 26.45
Fexper = MSexper/MSW = 1,000/200 = 5.0
FA × E = MSA × E/MSW = 4,000/200 = 20.0
FG × A = MSG × A/MSW = 2,250/200 = 11.25
FG × E = MSG × E/MSW = 1,000/200 = 5.0
FG × A × E = MSG × A × E/MSW = 640/200 = 3.2
Because the df happens to be 1 for all of the numerator terms, the critical F
for all seven tests is F.05 (1,32), which is equal (approximately) to 4.15. Except
for the main effect of gender and the three-way interaction, all of the F ratios
exceed the critical value (4.15) and are therefore significant at the .05 level.
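If SciPy is available, the approximate critical value quoted above, and exact p values for each F ratio, can be checked directly (a sketch; the numbers are the F ratios just computed):

```python
from scipy import stats

print(stats.f.ppf(0.95, 1, 32))   # critical F(1, 32) at alpha = .05, ~4.15

Fs = {"gender": 2.45, "alcohol": 26.45, "exper": 5.0, "A x E": 20.0,
      "G x A": 11.25, "G x E": 5.0, "G x A x E": 3.2}
for name, F in Fs.items():
    print(name, round(stats.f.sf(F, 1, 32), 4))   # p value for each effect
```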
Follow-Up Tests for the Three-Way ANOVA
Decisions concerning follow-up comparisons for a factorial ANOVA are
made in a top-down fashion. First, one checks the highest-order interaction
for significance; in a three-way ANOVA it is the three-way interaction. (Two-way interactions are the simplest possible interactions and are called first-order interactions; three-way interactions are known as second-order
interactions, etc.) If the highest interaction is significant, the post hoc tests
focus on the various simple effects or interaction contrasts, followed by
appropriate cell-to-cell comparisons. In a three-way ANOVA in which the
three-way interaction is not significant, as in the present example, attention
turns to the three two-way interactions. Although all of the two-way interactions are significant in our example, the alcohol by experimenter interaction
is the easiest to interpret because it replicates previous results.
It would be appropriate to follow up the significant alcohol by experimenter interaction with four t tests (e.g., one of the relevant t tests would
determine whether at-risk subjects differ significantly from controls in the
exploitive condition). Given the disordinal nature of the interaction (see Figure 22.2), it is likely that the main effects would simply be ignored. A similar
approach would be taken to the two other significant two-way interactions.
Thus, all three main effects would be regarded with caution. Note that
because all of the factors are dichotomous, there would be no follow-up tests
to perform on significant main effects, even if none of the interactions were
significant. With more than two levels for some or all of the factors, it
becomes possible to test partial interactions, and significant main effects for
factors not involved in significant interactions can be followed by pairwise
or complex comparisons, as described in Chapter 14, Section C. I will illustrate some of the complex planned and post hoc comparisons for the three-way design in Section B.
Types of Three-Way Designs
Cases involving significant three-way interactions and factors with more
than two levels will be considered in the context of mixed designs in Section
B. However, before we turn to mixed designs, let us look at some of the typical situations in which three-way designs with no repeated measures arise.
One situation involves three experimental manipulations for which repeated
measures are not feasible. For instance, subjects perform a repetitive task in
one of two conditions: They are told that their performance is being measured or that it is not. In each condition half of the subjects are told that performance on the task is related to intelligence, and the other half are told
that it is not. Finally, within each of the four groups just described, half the
subjects are treated respectfully and half are treated rudely. The work output
of each subject can then be analyzed by a 2 × 2 × 2 ANOVA.
Another possibility involves three grouping variables, each of which
involves selecting subjects whose group is already determined. For instance, a
group of people who exercise regularly and an equal-sized group of those who
don’t are divided into those high and those relatively low on self-esteem (by a
median split). If there are equal numbers of men and women in each of the
four cells, we have a balanced 2 × 2 × 2 design. More commonly one or two of
the variables involve experimental manipulations and two or one involve
grouping variables. The example calculated earlier in this section involved two
grouping variables (gender and having an alcohol-dependent parent or not)
and one experimental variable (nurturing vs. exploitive experimenter).
To devise an interesting example with two experimental manipulations
and one grouping variable, start with two experimental factors that are
expected to interact (e.g., one factor is whether or not the subjects are told
that performance on the experimental task is related to intelligence, and the
other factor is whether or not the group of subjects run together will know
each other’s final scores). Then, add a grouping variable by comparing subjects who are either high or low on self-esteem, need for achievement, or
some other relevant aspect of personality. If the two-way interaction differs
significantly between the two groups of subjects, the three-way interaction
will be significant.
The Two-Way RM ANOVA
One added benefit of learning how to calculate a three-way ANOVA is that
you now know how to calculate a two-way ANOVA in which both factors
involve repeated measures. In Chapter 15, I showed you that the SS components of a one-way RM design are calculated as though the design were a
two-way independent-groups ANOVA with no within-cell variability. Similarly, a two-way RM ANOVA is calculated just as shown in the preceding for
the three-way independent-groups ANOVA, with the following modifications: (1) One of the three factors is the subjects factor—each subject represents a different level of the subjects factor, (2) the main effect of subjects is
not tested, and there is no MSW error term, (3) each of the two main effects
that is tested uses the interaction of that factor with the subjects factor as the
error term, and (4) the interaction of the two factors of interest is tested by
using as the error term the interaction of all three factors (i.e., including the
subjects factor). If one RM factor is labeled Q and the other factor, R, and we
use S to represent the subjects factor, the equations for the three F ratios can
be written as follows:
FQ = MSQ/MSQ × S,    FR = MSR/MSR × S,    FQ × R = MSQ × R/MSQ × R × S
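In practice, this analysis is usually delegated to software. Here is a sketch using statsmodels' AnovaRM, which forms exactly these F ratios (each effect is tested against its interaction with subjects); the data file and column names are hypothetical:

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Long format: one row per subject per Q x R condition (hypothetical file).
df = pd.read_csv("rm_data.csv")   # columns: subject, Q, R, score

res = AnovaRM(df, depvar="score", subject="subject", within=["Q", "R"]).fit()
print(res)   # F and p for Q, R, and Q:R
```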
Higher-Order ANOVA
This text will not cover factorial designs of higher order than the three-way
ANOVA. Although higher-order ANOVAs can be difficult to interpret, no new
principles are introduced. The four-way ANOVA produces 15 different F
ratios to test: 4 main effects, 6 two-way interactions, 4 three-way interactions, and 1 four-way interaction. Testing each of these 15 effects at the .05
level raises serious concerns about the increased risk of Type I errors. Usually, all of the F ratios are not tested; specific hypotheses should guide the
selection of particular effects to test. Of course, the potential for an inflated
rate of Type I errors only increases as factors are added. In general, an N-way
ANOVA produces 2^N − 1 F ratios that can be tested for significance.
In the next section I will delve into more complex varieties of the three-way ANOVA—in particular those that include repeated measures on one or
two of the factors.
Section A Summary
1. To display the cell means of a three-way factorial design, it is convenient
to create two-way graphs for each level of the third variable and place
these graphs side by side (you have to decide which of the three variables will distinguish the graphs and which of the two remaining variables will be placed along the X axis of each graph). Each two-way graph
depicts a simple interaction effect; if the simple interaction effects are
significantly different from each other, the three-way interaction will be
significant.
2. Three-way interactions can occur in a variety of ways. The interaction of two of the factors can be strong at one level of the third factor and close to zero at a different level (or even stronger at a different level). The direction of the two-way interaction can reverse from one level of the third variable to another. Also, a three-way interaction can arise when all of the cell means are similar except for one.
3. The main effects of the three-way ANOVA are based on the means at each level of one of the factors, averaging across the other two. A two-way interaction is the average of the separate two-way interactions (simple interaction effects) at each level of the third factor; it is based on a two-way table of means created by averaging across the third factor.
4. The error term for the three-way ANOVA, MSW, is a simple extension of the error term for a two-way ANOVA; in a balanced design, it is the simple average of all of the cell variances. All of the SSbetween components are found by Formula 14.3, or by subtraction using Formula 22.1. There are seven F ratios that can be tested for significance: the three main effects, three two-way interactions, and the three-way interaction.
5. Averaging simple interaction effects together to create a two-way interaction is reasonable only if these effects do not differ significantly. If they do differ, follow-up tests usually focus on the simple interaction effects themselves or particular 2 × 2 interaction contrasts. If the three-way interaction is not significant, but a two-way interaction is, the significant two-way interaction is explored as in a two-way ANOVA—with simple main effects or interaction contrasts. Also, when the three-way interaction is not significant, any significant main effect can be followed up in the usual way if that variable is not involved in a significant two-way interaction.
6. All three factors in a three-way ANOVA can be grouping variables (i.e., based on intact groups), but this is rare. It is more common to have just one grouping variable and compare the interaction of two experimental factors among various subgroups of the population. Of course, all three factors can involve experimental manipulations.
7. The two-way ANOVA in which both factors involve repeated measures is analyzed as a three-way ANOVA, with the different subjects serving as the levels of the third factor. The error term for each RM factor is the interaction of that factor with the subject factor; the error term for the interaction of the two RM factors is the three-way interaction.
8. In an N-way factorial ANOVA, there are 2^N − 1 F ratios that can be tested. The two-way interaction is called a first-order interaction, the three-way is a second-order interaction, and so forth.
EXERCISES
1. Imagine an experiment in which each subject is required to use his or her memories to
create one emotion: either happiness, sadness, anger, or fear. Within each emotion
group, half of the subjects participate in a
relaxation exercise just before the emotion
condition, and half do not. Finally, half the
subjects in each emotion/relaxation condition are run in a dark, sound-proof chamber,
and the other half are run in a normally lit
room. The dependent variable is the subject’s
systolic blood pressure when the subject signals that the emotion is fully present. The
design is balanced, with a total of 128 subjects. The results of the three-way ANOVA for
this hypothetical experiment are as follows:
SSemotion = 223.1, SSrelax = 64.4, SSdark = 31.6, SSemo × rel = 167.3, SSemo × dark = 51.5, SSrel × dark = 127.3, and SSemo × rel × dark = 77.2. The total sum of squares is 2,344.
a. Calculate the seven F ratios, and test each
for significance.
b. Calculate partial eta squared for each of
the three main effects (use Formula 14.9).
Are any of these effects at least moderate
in size?
2. In this exercise there are 20 subjects in each
cell of a 3 × 3 × 2 design. The levels of the first
factor (location) are urban, suburban, and
rural. The levels of the second factor are no
siblings, one or two siblings, and more than
two siblings. The third factor has only two
levels: presently married and not presently
married. The dependent variable is the number of close friends that each subject reports
having. The cell means are as follows:
                          Urban    Suburban    Rural
No Siblings
  Married                  1.9        2.3       3.2
  Not Married              4.7        4.5       3.9
1 or 2 Siblings
  Married                  3.1        3.0       4.5
  Not Married              5.7        5.3       6.2
More than 2 Siblings
  Married                  2.0        3.3       2.9
  Not Married              3.5        4.6       4.6
a. Given that SSW equals 1,094, complete the
three-way ANOVA, and present your
results in a summary table.
b. Draw a graph of the means for Location ×
Number of Siblings (averaging across marital status). Describe the nature of the
interaction.
c. Using the means from part b, test the simple effect of number of siblings at each
location.
3. Seventy-two patients with agoraphobia are
randomly assigned to one of four drug conditions: SSRI (e.g., Prozac), tricyclic antidepressant (e.g., Elavil), antianxiety (e.g., Xanax), or
a placebo (offered as a new drug for agoraphobia). Within each drug condition, a third
of the patients are randomly assigned to each
of three types of psychotherapy: psychodynamic, cognitive/behavioral, and group. The
subjects are assigned so that half the subjects
in each drug/therapy group are also depressed, and half are not. After 6 months of
treatment, the severity of agoraphobia is
measured for each subject (30 is the maximum possible phobia score); the cell means
(n = 3) are as follows:
                     SSRI    Tricyclic    Antianxiety    Placebo
Psychodynamic
  Not Depressed       10       11.5          11.0          12.6
  Depressed            8.7      8.7          14.0          12.0
Cog/Behav
  Not Depressed        9.5     19.0          12.0          19.3
  Depressed           10.3     14.5          10.0          17.0
Group
  Not Depressed       11.6     22.0          17.0          13.0
  Depressed            9.7     19.0          16.5          11.0

a. Given that SSW equals 131, complete the three-way ANOVA, and present your results in a summary table.
b. Draw a graph of the cell means, with separate panels for depressed and not
depressed. Describe the nature of the
therapy × drug interaction in each panel.
Does there appear to be a three-way interaction? Explain.
c. Given your results in part a, describe a set
of follow-up tests that would be justifiable.
d. Optional: Test the 2 × 2 × 2 interaction
contrast that results from deleting Group
therapy and the SSRI and placebo conditions from the analysis (extend the techniques of Chapter 13, Section B, and
Chapter 14, Section C).
4. An industrial psychologist is studying the
relation between motivation and productivity. Subjects are told to perform as many
repetitions of a given clerical task as they
can in a 1-hour period. The dependent variable is the number of tasks correctly performed. Sixteen subjects participated in the
experiment for credit toward a requirement
of their introductory psychology course
(credit group). Another 16 subjects were
recruited from other classes and paid $10
for the hour (money group). All subjects
performed a small set of similar clerical
tasks as practice before the main study; in
each group (credit or money) half the subjects (selected randomly) were told they had
performed unusually well on the practice
trials (positive feedback), and half were told
they had performed poorly (negative feedback). Finally, within each of the four
groups created by the manipulations just
described, half of the subjects (at random)
were told that performing the tasks quickly
and accurately was correlated with other
important job skills (self motivation),
whereas the other half were told that good
performance would help the experiment
(other motivation). The data appear in the
following table:
              CREDIT SUBJECTS              PAID SUBJECTS
           Positive     Negative       Positive     Negative
           Feedback     Feedback       Feedback     Feedback
Self          22           12             21           25
              25           15             17           23
              26           12             15           30
              30           10             21           26
Other         11           20             33           21
              18           23             29           22
              12           21             35           19
              14           26             29           17
a. Perform a three-way ANOVA on the data.
Test all seven F ratios for significance, and
present your results in a summary table.
b. Use graphs of the cell means to help you
describe the pattern underlying each
effect that was significant in part a.
c. Based on the results in part a, what post
hoc tests would be justified?
5. Imagine that subjects are matched in blocks
of three based on height, weight, and other
physical characteristics; six blocks are
formed in this way. Then the subjects in each
block are randomly assigned to three different weight-loss programs. Subjects are measured before the diet, at the end of the diet
program, 3 months later, and 6 months
later. The results of the two-way RM ANOVA
for this hypothetical experiment are given in
terms of the SS components, as follows:
SSdiet = 403.1, SStime = 316.8, SSdiet × time = 52,
SSdiet × S = 295.7, SStime × S = 174.1, and SSdiet × time × S
= 230.
a. Calculate the three F ratios, and test each
for significance.
b. Find the conservatively adjusted critical F
for each test. Will any of your conclusions
be affected if you do not assume that
sphericity exists in the population?
6. A psychologist wants to know how both the
affective valence (happy vs. sad vs. neutral)
and the imageability (low, medium, high) of
words affect their recall. A list of 90 words is
prepared with 10 words from each combination of factors (e.g., happy, low imagery: promotion; sad, high imagery: cemetery)
randomly mixed together. The number of
words recalled in each category by each of
the six subjects in the study is given in the
following table:
                      SAD                  NEUTRAL                  HAPPY
Subject No.    Low  Medium  High     Low  Medium  High      Low  Medium  High
     1          5     6      9        2     5      6          3     4      8
     2          2     5      7        3     6      6          5     5      6
     3          5     7      5        2     4      5          4     3      7
     4          3     6      5        3     5      6          4     4      5
     5          4     9      8        4     7      7          4     5      9
     6          3     5      7        4     5      6          6     4      4
a. Perform a two-way RM ANOVA on the data.
Test the three F ratios for significance, and
present your results in a summary table.
b. Find the conservatively adjusted critical F
for each test. Will any of your conclusions
be affected if you do not assume that
sphericity exists in the population?
c. Draw a graph of the cell means, and
describe any trend toward an interaction
that you can see.
d. Based on the variables in this exercise,
and the results in part a, what post
hoc tests would be justified and meaningful?
Section B: Basic Statistical Procedures

An important way in which one three-factor design can differ from another is the number of factors that involve repeated measures (or matching). The design in which none of the factors involve repeated measures was covered in Section A. The design in which all three factors are RM factors will not be covered in this text; however, the three-way RM design is a straightforward extension of the two-way RM design described at the end of Section A. This section will focus on three-way designs with either one or two RM factors (i.e., mixed designs), and it will also elaborate on the general principles of dealing with three-way ANOVAs, as introduced in Section A, and consider
the complexities of interactions and post hoc tests when the factors have
more than two levels each.
One RM Factor
I will begin with a three-factor design in which there are repeated measures
on only one of the factors. The ANOVA for this design is not much more
complicated than the two-way mixed ANOVA described in the previous
chapter—for instance, there are only two different error terms. Such designs
arise frequently in psychological research. One simple way to arrive at such
a design is to start with a two-way ANOVA with no repeated measures. For
instance, patients with two different types of anxiety disorders (generalized
anxiety vs. specific phobias) are treated with two different forms of psychotherapy (psychodynamic vs. behavioral). The third factor is added by
measuring the patients’ anxiety at several points in time (e.g., beginning of
therapy, end of therapy, several months after therapy has stopped); I will
refer to this factor simply as time.
To illustrate the analysis of this type of design I will take the two-way
ANOVA from Section B of Chapter 14 and add time as an RM factor. You may
recall that that example involved four levels of sleep deprivation and three
levels of stimulation. Performance was measured only once—after 4 days in
the sleep lab. Now imagine that performance on the simulated truck driving
task is measured three times: after 2, 4, and 6 days in the sleep lab. The raw
data for the three-factor study are given in Table 22.2, along with the various
means we will need to graph and analyze the results; note that the data
for Day 4 are identical to the data for the corresponding two-way ANOVA
in Chapter 14. To see what we may expect from the results of a three-way ANOVA on these data, the cell means have been graphed so that we can
look at the sleep by stimulation interaction at each time period (see Figure
22.7).
You can see from Figure 22.7 that the sleep × stimulation interaction,
which was not quite significant for Day 4 alone (see Chapter 14, Section B),
increases over time, perhaps enough so as to produce a three-way interaction. We can also see that the main effects of stimulation and sleep, significant at Day 4, are likely to be significant in the three-way analysis. The
general decrease in scores from Day 2 to Day 4 to Day 6 is also likely to yield
a significant main effect for time. Without regraphing the data, it is hard to
see whether the interactions of time with either sleep or stimulation are
large or small. However, because these interactions are less interesting in the
context of this experiment, I won’t bother to present the two other possible
sets of graphs.
To present general formulas for analyzing the kind of experiment shown
in Table 22.2, I will adopt the following notation. The two between-subject
factors will be labeled A and B. Of course, it is arbitrary which factor is
called A and which B; in this example the sleep deprivation factor will be A,
and the stimulation factor will be B. The lowercase letters a and b will stand
for the number of levels of their corresponding factors—in this case, 4 and
3, respectively. The within-subject factor will be labeled R, and its number of
levels, c, to be consistent with previous chapters.
Let us begin with the simplest SS components: SStotal, and the SSs for the numerators of each main effect. SStotal is based on the total number of observations, NT, which for any balanced three-way factorial ANOVA is equal to abcn, where n is the number of different subjects in each cell of the A × B table. So, NT = 4 × 3 × 3 × 5 = 180. The biased variance obtained by entering all 180 scores is 43.1569, so SStotal = 43.1569 × 180 = 7,768.24.
Table 22.2

                        PLACEBO                         MOTIVATION                        CAFFEINE
               Day 2  Day 4  Day 6  Subject    Day 2  Day 4  Day 6  Subject    Day 2  Day 4  Day 6  Subject     Row
                                    Means                           Means                           Means       Means
None             26     24     24    24.67       29     28     26    27.67       29     26     26    27.0
                 30     29     25    28.0        26     23     23    24.0        24     22     23    23.0
                 29     28     27    28.0        23     24     25    24.0        23     20     17    20.0
                 23     20     20    21.0        29     30     27    28.67       31     30     30    30.33
                 21     20     20    20.33       35     33     22    30.0        29     27     25    27.0
  AB means     25.8   24.2   23.2    24.4       28.4   27.6   24.6   26.87      27.2   25.0   24.2   25.47      25.58
Jet Lag          24     22     17    21.0        27     26     33    28.67       24     25     20    23.0
                 20     18     15    17.67       29     30     17    25.33       30     27     24    27.0
                 15     16     13    14.67       34     32     25    30.33       30     31     25    28.67
                 27     25     19    23.67       23     20     18    20.33       25     24     17    22.0
                 28     27     22    25.67       25     23     20    22.67       23     21     22    22.0
  AB means     22.8   21.6   17.2    20.53      27.6   26.2   22.6   25.46      26.4   25.6   21.6   24.53      23.51
Interrupt        17     16      9    14.0        25     16     10    17.0        23     23     20    22.0
                 19     19      6    14.67       21     13      9    14.33       29     28     23    26.67
                 22     20     11    17.67       19     12      8    13.0        28     26     23    25.67
                 11     11      7     9.67       25     18     12    18.33       20     17     12    16.33
                 15     14     10    13.0        24     19     14    19.0        21     19     17    19.0
  AB means     16.8   16.0    8.6    13.8       22.8   15.6   10.6   16.33      24.2   22.6   19.0   21.93      17.35
Total            16     14      5    11.67       24     15     14    17.67       25     23     18    22.0
                 18     17      6    13.67       19     11      8    12.67       16     16     14    15.33
                 20     18     10    16.0        20     11     15    15.33       19     18     12    16.33
                 14     12      7    11.0        27     19     17    21.0        27     26     21    24.67
                 11     10      7     9.33       26     17     10    17.67       26     24     21    23.67
  AB means     15.8   14.2    7.0    12.33      23.2   14.6   12.8   16.87      22.6   21.4   17.2   20.4       16.53
Column means   20.3   19.0   14.0    17.77      25.5   21.0   17.65  21.38      25.1   23.65  20.5   23.08
[Figure 22.7: Graph of the Cell Means in Table 22.2. Three panels (Day 2, Day 4, Day 6), each plotting the Placebo, Motivation, and Caffeine cell means across the four sleep-deprivation levels (None, Jet Lag, Interrupt, Total).]
SSA is based on the means for the four sleep deprivation levels, which can be found in the
rightmost column of the table, labeled “row means.” SSB is based on the
means for the three stimulation levels, which are found where the bottom
row of the table (Column Means), intersects the columns labeled “Subject
Means” (these are averaged over the three days, as well as the sleep levels).
The means for the three different days are not in the table but can be found
by averaging the three Column Means for Day 2, the three for Day 4, and
similarly for Day 6. The SSs for the main effects are as follows:
SSA = σ²(25.58, 23.51, 17.35, 16.53) × 180 = 15.08 × 180 = 2,714.4
SSB = σ²(17.77, 21.38, 23.08) × 180 = 4.902 × 180 = 882.36
SSR = σ²(23.63, 21.22, 17.38) × 180 = 6.622 × 180 = 1,192.0
As in Section A, we will need the SS based on the cell means, SSABR, and
the SSs for each two-way table of means: SSAB, SSAR, and SSBR. In addition,
because one factor has repeated measures we will also need to find the
means for each subject (averaging their scores for Day 2, Day 4, and Day 6)
and the SS based on those means, SSbetween-subjects.
The cell means we need for SSABR are given in Table 22.2, under Day 2, Day 4, and Day 6, in each of the rows labeled AB Means; there are 36 of them (a × b × c). The biased variance of these cell means is 30.746, so SSABR = 30.746 × 180 = 5,534.28. The means for SSAB are found by averaging across the 3 days for each combination of sleep and stimulation levels and are found in the rows for AB Means under "Subject Means." The biased variance of these 12 (i.e., a × b) means equals 22.078, so SSAB = 3,974. The nine means for SSBR are the column means of Table 22.2, except for the columns labeled "Subject Means." SSBR = σ²(20.3, 19.0, 14.0, 25.5, 21.0, 17.65, 25.1, 23.65, 20.5) × 180 = 2,169.14. Unfortunately, there was no convenient place in Table 22.2 to put the means for SSAR. They are found by averaging the (AB) means for each day and level of sleep deprivation over the three stimulation levels. SSAR = σ²(27.13, 25.6, 24, 25.6, 24.47, 20.47, 21.27, 18.07, 12.73, 20.53, 16.73, 12.33) × 180 = 4,066.6. Finally, we need to calculate SSbetween-subjects for the 60 (a × b × n) subject means found in Table 22.2 under "Subject Means" (ignoring the entries in the rows labeled AB Means and Column Means, of course). SSbetween-subjects = 32.22 × 180 = 5,799.6.
Now we can get the rest of the SS components we need by subtraction.
The SSs for the two-way interactions are found just as in Section A from
Formula 22.1a, b, and c (except that factor C has been changed to R):
SSA × B = SSAB − SSA − SSB
SSA × R = SSAR − SSA − SSR
SSB × R = SSBR − SSB − SSR
Plugging in the SSs for the present example, we get
SSA × B = 3,974 − 2,714.4 − 882.4 = 377.2
SSA × R = 4,066.6 − 2,714.4 − 1,192 = 160.2
SSB × R = 2,169.14 − 882.4 − 1,192 = 94.74
The three-way interaction is found by subtracting from SSABR the SSs for
three two-way interactions and the three main effects (Formula 22.1d).
SSA × B × R = SSABR − SSA × B − SSA × R − SSB × R − SSA − SSB − SSR
SSA × B × R = 5,534.28 − 377.2 − 160.2 − 94.74 − 2,714.4 − 882.4 − 1192 = 113.34
As in the two-way mixed design there are two different error terms. One
of the error terms involves subject-to-subject variability within each group—
or, in the case of the present design, within each cell formed by the two
between-group factors. This is the error component you have come to know
as SSW, and I will continue to call it that. The total variability from one subject to another (averaging across the RM factor) is represented by a term we
have already calculated: SSbetween-subjects, or SSbet-s, for short. In the one-way
RM ANOVA this source of variability was called the “subjects” factor (SSsub),
or the main effect of “subjects,” and because it did not play a useful role, we
ignored it. In the mixed design of the previous chapter it was simply divided
between SSgroups and SSW. Now that we have two between-group factors, that
source of variability can be divided into four components, as follows:
SSbet-s = SSA + SSB + SSA × B + SSW
This relation can be expressed more simply as
SSbet-s = SSAB + SSW
The error portion, SSW, is found most easily by subtraction:
SSW = SSbet-S − SSAB
Formula 22.3
This SS is the basis of the error term that is used for all three of the between-group effects. The other error term involves the variability within subjects.
The total variability within subjects, represented by SSwithin-subjects, or SSW-S,
for short, can be found by taking the total SS and subtracting the between-subject variability:
SSW-S = SStotal − SSbet-S
Formula 22.4
The within-subject variability can be divided into five components,
which include the main effect of the RM factor and all of its interactions:
SSW-S = SSR + SSA × R + SSB × R + SSA × B × R + SSS × R
The last term is the basis for the error term that is used for all of the
effects involving the RM factor (it was called SSS × RM in Chapter 16). It is
found conveniently by subtraction:
SSS × R = SSW-S − SSR − SSA × R − SSB × R − SSA × B × R
Formula 22.5
We are now ready to get the remaining SS components for our example.
SSW = SSbet-S − SSAB = 5,799.6 − 3,974 = 1,825.6
SSW-S = SStotal − SSbet-S = 7,768.24 − 5,799.6 = 1,968.64
SSS × R = SSW-S − SSR − SSA × R − SSB × R − SSA × B × R
= 1,968.64 − 1,192 − 160.2 − 94.74 − 113.34 = 408.36
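These subtraction steps are simple enough to check in a few lines of code. A sketch in Python (the values are the SS components computed above; the variable names are mine):

```python
# SS components computed above for the sleep-deprivation example
SS_total, SS_bet_S, SS_AB = 7768.24, 5799.6, 3974.0
SS_R, SS_AxR, SS_BxR, SS_AxBxR = 1192.0, 160.2, 94.74, 113.34

SS_W = SS_bet_S - SS_AB                               # 1,825.6  (Formula 22.3)
SS_W_S = SS_total - SS_bet_S                          # 1,968.64 (Formula 22.4)
SS_SxR = SS_W_S - SS_R - SS_AxR - SS_BxR - SS_AxBxR   # 408.36   (Formula 22.5)
print(SS_W, SS_W_S, round(SS_SxR, 2))
```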
A more tedious but more instructive way to find SSS × R would be to find the subject by RM interaction separately for each of the 12 cells of the between-groups (AB) matrix and then add these 12 components together. This overall error term is justified only if you can assume that all 12 interactions would be the same in the entire population. As mentioned in the previous chapter, there is a statistical test (Box's M criterion) that can be used to
give some indication of whether this assumption is reasonable.
Now that we have divided SStotal into all of its components, we need to
do the same for the degrees of freedom. This division, along with all of the
df formulas, is shown in the degrees of freedom tree in Figure 22.8.
The df's we will need to complete the ANOVA are based on the following formulas:

a. dfA = a − 1
b. dfB = b − 1
c. dfA × B = (a − 1)(b − 1)
d. dfR = c − 1
e. dfA × R = (a − 1)(c − 1)
f. dfB × R = (b − 1)(c − 1)
g. dfA × B × R = (a − 1)(b − 1)(c − 1)
h. dfW = ab(n − 1)
i. dfS × R = dfW × dfR = ab(n − 1)(c − 1)

Formula 22.6

For the present example,

dfA = 4 − 1 = 3
dfB = 3 − 1 = 2
dfA × B = 3 × 2 = 6
dfR = 3 − 1 = 2
dfA × R = 3 × 2 = 6
dfB × R = 2 × 2 = 4
dfA × B × R = 3 × 2 × 2 = 12
dfW = 4 × 3 × (5 − 1) = 48
dfS × R = dfW × dfR = 48 × 2 = 96
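These df computations are easy to check in code; the sketch below (mine, not the text's) confirms that the nine components exhaust dftotal = 179:

```python
# Formula 22.6 for the worked example: a = 4, b = 3, c = 3, n = 5.
a, b, c, n = 4, 3, 3, 5
dfs = {'A': a - 1, 'B': b - 1, 'AxB': (a - 1) * (b - 1),
       'R': c - 1, 'AxR': (a - 1) * (c - 1), 'BxR': (b - 1) * (c - 1),
       'AxBxR': (a - 1) * (b - 1) * (c - 1),
       'W': a * b * (n - 1), 'SxR': a * b * (n - 1) * (c - 1)}
print(dfs)
print(sum(dfs.values()), a * b * c * n - 1)   # both equal 179
```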
Figure 22.8. Degrees of Freedom Tree for Three-Way ANOVA with Repeated Measures on One Factor:

dftotal [abcn − 1]
    dfbetween-subjects [abn − 1]
        dfgroups [ab − 1]
            dfA [a − 1]
            dfB [b − 1]
            dfA × B [(a − 1)(b − 1)]
        dfW [ab(n − 1)]
    dfwithin-subjects [abn(c − 1)]
        dfR [c − 1]
        dfA × R [(a − 1)(c − 1)]
        dfB × R [(b − 1)(c − 1)]
        dfA × B × R [(a − 1)(b − 1)(c − 1)]
        dfS × R [ab(n − 1)(c − 1)]
Note that the sum of all the df’s is 179, which equals dftotal (NT − 1 = abcn − 1 =
180 − 1).
The next step is to divide each SS by its df to obtain the corresponding
MS. The results of this step are shown in Table 22.3 along with the F ratios
and their p values. The seven F ratios were formed according to Formula
22.7:
Table 22.3

Source                   SS         df    MS       F       p
Between-subjects         5,799.6    59
  Sleep deprivation      2,714.4     3    904.8    23.8    <.001
  Stimulation              882.4     2    441.2    11.6    <.001
  Sleep × Stim             375.8     6     62.63    1.65   >.05
  Within-groups          1,825.6    48     38.03
Within-subjects          1,968.64  120
  Time                   1,192       2    596     140.2    <.001
  Sleep × Time             160.2     6     26.7     6.28   <.001
  Stim × Time               94.74    4     23.7     5.58   <.001
  Sleep × Stim × Time      114.74   12      9.56    2.25   <.05
  Subject × Time           408.36   96      4.25

Note: The errors that you get from rounding off the means before applying Formula 14.3 are compounded in a complex design. If you retain more digits after the decimal place than I did in the various group and cell means, or use raw-score formulas, or analyze the data by computer, your F ratios will differ by a few tenths of a point from those in Table 22.3 (fortunately, your conclusions should be the same). If you are going to present your findings to others, regardless of the purpose, I strongly recommend that you use statistical software, and in particular a program or package that is quite popular (so that there is a good chance that its bugs have already been eliminated, at least for basic procedures, such as those in this text).
a. FA = MSA / MSW
b. FB = MSB / MSW
c. FA × B = MSA × B / MSW
d. FR = MSR / MSS × R
e. FA × R = MSA × R / MSS × R
f. FB × R = MSB × R / MSS × R
g. FA × B × R = MSA × B × R / MSS × R

Formula 22.7
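As an illustration (my own sketch, not the author's), the following Python code forms the seven F ratios of Formula 22.7 from the MS values in Table 22.3 and attaches p values from the F distribution:

```python
from scipy.stats import f as f_dist

# Between-group effects are tested against MS_W; effects involving the
# RM factor are tested against MS_SxR. MS values taken from Table 22.3.
ms_W, ms_SxR = 38.03, 4.25
tests = {                 # effect: (MS, error MS, df_num, df_error)
    'A (sleep)':  (904.8, ms_W,    3, 48),
    'B (stim)':   (441.2, ms_W,    2, 48),
    'A x B':      (62.63, ms_W,    6, 48),
    'R (time)':   (596.0, ms_SxR,  2, 96),
    'A x R':      (26.7,  ms_SxR,  6, 96),
    'B x R':      (23.7,  ms_SxR,  4, 96),
    'A x B x R':  (9.56,  ms_SxR, 12, 96),
}
for name, (ms, ms_err, df1, df2) in tests.items():
    F = ms / ms_err
    print(f"{name}: F({df1},{df2}) = {F:.2f}, p = {f_dist.sf(F, df1, df2):.4f}")
```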
Interpreting the Results
Although the three-way interaction is significant, the ordering of most of the
effects is consistent enough that the main effects are interpretable. The significant main effect of sleep is due to a general decline in performance
across the four levels, with “no deprivation” producing the least deficit and
“total deprivation” the most, as would be expected. It is also no surprise that
overall performance significantly declines with increased time in the sleep
lab. The significant stimulation main effect seems to be due mainly to the
consistently lower performance of the placebo group rather than the fairly
small difference between caffeine and reward.
In Figure 22.9, I have graphed the sleep by stimulation interaction, by
averaging the three panels of Figure 22.7. Although the interaction looks like
it might be significant, we know from Table 22.3 that it is not. Remember
that the error term for testing this interaction is based on subject-to-subject
variability within each cell and does not benefit from the added power of
repeated measures. The other two interactions use MSS × RM as their error
term and therefore do gain the extra power usually conferred by repeated
measures. Of course, even if the sleep by stimulation interaction were significant, its interpretation would be qualified by the significance of the
three-way interaction. The significant three-way interaction tells us to be
cautious in our interpretation of the other six F ratios and suggests that we
look at simple interaction effects.
There are three ways to look at simple interaction effects in a three-way
ANOVA (depending on which factor is looked at one level at a time), but the
most interesting two-way interaction for the present example is sleep deprivation by stimulation, so we will look at that interaction at each level of the
time factor. The results have already been graphed this way in Figure 22.7. It
is easy to see that the three-way interaction in this study is due to the progressive increase in the sleep by stimulation interaction over time.
[Figure 22.9: Graph of the cell means in Table 22.2 after averaging across the time factor. Mean performance is plotted against the four sleep-deprivation levels (None, Jet-Lag, Interrupt, Total), with separate lines for the Caffeine, Motivation (i.e., reward), and Placebo conditions.]
Assumptions
The sphericity tests and adjustments you learned in Chapters 15 and 16 are
easily extended to apply to this design as well. Box’s M criterion can be used
to test that the covariances for each pair of RM levels are the same (in the
population) for every combination of the two between-group factors. If M is
not significant, the interactions can be pooled across all the cells of the two-way between-groups part of the design and then tested for sphericity with Mauchly's W. If you cannot perform these tests (or do not trust them), you can use the modified univariate approach as described in Chapter 15. A factorial MANOVA is also an option (see Section C). The df's and p levels for
the within-subjects effects in Table 22.3 were based on the assumption of
sphericity. Fortunately, the effects are so large that even using the most conservative adjustment of the df’s (i.e., lower-bound epsilon), all of the effects
remain significant at the .05 level (although the three-way interaction is just
at the borderline with p = .05).
Follow-up Comparisons: Simple Interaction Effects
To test the significance of the simple interaction effects just discussed, the
appropriate error term is MSwithin-cell, as defined in section C of Chapter 16,
rather than MSW from the overall analysis. This entails adding SSW to SSS × R
and dividing by the sum of dfW and dfS × R. Thus, MSwithin-cell equals (1,825.6 + 408.36)/(48 + 96) = 2,233.96/144 ≈ 15.5. However, given the small sample sizes in
our example, it would be even safer (with respect to controlling Type I
errors) to test the two-way interaction in each simple interaction effect as
though it were a separate two-way ANOVA. There is little difference between
the two approaches in this case because MSwithin-cell is just the ordinary average of the MSW terms for the three simple interaction effects, and these do
not differ much. The middle graph in Figure 22.7 represents the results of
the two-way experiment of Chapter 14 (Section B), so if we don’t pool error
terms, we know from the Chapter 14 analysis that the two-way interaction
after 4 days is not statistically significant (F = 1.97). Because the interaction
after 2 days is clearly less than it is after 4 days (and the error term is similar), it is a good guess that the two-way interaction after 2 days is not statistically significant, either (in fact, F < 1). However, the sleep × stimulation
interaction becomes quite strong after 6 days; indeed, the F for that simple
interaction effect is statistically significant (F = 2.73, p < .05).
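The pooling step is simple enough to show in code; this is a minimal sketch (my own, with values from the overall analysis), in which the hypothetical ms_simple_inter stands in for the MS of whichever simple interaction you are testing:

```python
# Pooled error term for testing simple interaction effects.
ss_W, df_W = 1825.6, 48
ss_SxR, df_SxR = 408.36, 96
ms_within_cell = (ss_W + ss_SxR) / (df_W + df_SxR)   # about 15.5

# e.g., a simple sleep x stimulation interaction at one time level would
# be tested as F = ms_simple_inter / ms_within_cell on (6, 144) df.
ms_simple_inter = 42.3        # hypothetical value, for illustration only
print(ms_within_cell, ms_simple_inter / ms_within_cell)
```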
Although it may not have been predicted specifically that the sleep × stimulation interaction would grow stronger over time, it is a perfectly reasonable
result, and it would make sense to focus our remaining follow-up analyses on
Day 6 alone. We would then be dealing with an ordinary 4 × 3 ANOVA with no
repeated measures, and post hoc analyses would proceed by testing simple
main effects or interaction contrasts exactly as described in Chapter 14, Section C. Alternatively, we could have explored the significant three-way interaction by testing the sleep by time interaction for each stimulation level or the
stimulation by time interaction for each sleep deprivation level. In these two
cases, the appropriate error term, if all of the assumptions of the overall
analysis are met, is MSS × RM from the omnibus analysis. However, as you know
by now, caution is recommended with respect to the sphericity assumption,
which dictates that each simple interaction effect be analyzed as a separate
two-way ANOVA in which only the interaction is analyzed.
Follow-up Comparisons: Partial Interactions
As in the case of the two-way ANOVA, a three-way ANOVA in which at least
two of the factors have three levels or more can be analyzed in terms of partial
interactions, either as planned comparisons or as a way to follow up a significant three-way interaction. However, with three factors in the design, there
are two distinct options. The first type of partial interaction involves forming
a pairwise or complex comparison for one of the factors and crosses that comparison with all levels of the other two factors. For instance, you could reduce
the stimulation factor to a comparison of caffeine and reward (pairwise) or to
a comparison of placebo with the average of caffeine and reward (complex)
but include all the levels of the other two factors. The second type of partial
interaction involves forming a comparison for each of two of the factors: for example, caffeine versus reward (for the stimulation factor) crossed with jet lag versus interrupted (for the sleep factor), including all three time periods. If a pairwise or complex comparison is created for all three factors, the result is a 2 × 2 × 2 subset of the original design, which has only one
numerator df and therefore qualifies as an interaction contrast. A significant
partial interaction may be decomposed into a series of interaction contrasts,
or one can plan to test several of these from the outset. Another alternative is
that a significant three-way interaction can be followed directly by post hoc
interaction contrasts, skipping the analysis of partial interactions, even when
they are possible. A significant three-way (i.e., 2 × 2 × 2) interaction contrast
would be followed by a test of simple interaction effects, and, if appropriate,
simple main effects (i.e., t tests between two cells).
Follow-Up Comparisons: Three-Way
Interaction Not Significant
When the three-way interaction is not significant, attention shifts to the
three two-way interactions. If none of the two-way interactions is significant, any significant main effect with more than two levels can be explored
further with pairwise or complex comparisons among its levels. If only one
of the two-way interactions is significant, the factor not involved in the interaction can be explored in the usual way if its main effect is significant. Any
significant two-way interaction can be followed up with an analysis of its
simple effects or with partial interactions and/or interaction contrasts, as
described in Chapter 14, Section C.
Planned Comparisons for the Three-Way ANOVA
Bear in mind that a three-way ANOVA with several levels of each factor creates so many possibilities for post hoc testing that it is rare for a researcher
to follow every significant omnibus F ratio (remember, there are seven of
these) with post hoc tests and every significant post hoc test with more localized tests until all allowable cell-to-cell comparisons are made. It is more
common when analyzing a three-way ANOVA to plan several comparisons
based on one’s research hypotheses.
Although a set of orthogonal contrasts is desirable, more often the
planned comparisons are a mixture of simple effects, two- and three-way
interaction contrasts, and cell-to-cell comparisons. If there are not too many
of these, it is not unusual to test each planned comparison at the .05 level.
However, if the planned comparisons are not orthogonal, and overlap in various ways, the cautious researcher is likely to use the Bonferroni adjustment
to determine the alpha for each comparison. After the planned comparisons
have been tested, it is not unusual for a researcher to test the seven F ratios
of the overall analysis but to report and follow up only those effects that are
both significant and interesting (and whose patterns of means make sense).
When the RM Factor Has Only Two Levels
If you have only one RM factor in your three-way ANOVA, and that factor has
only two levels, you have the option of creating difference scores (i.e., the difference between the two RM levels) and conducting a two-way ANOVA on the
difference scores. For this two-way ANOVA, the main effect of factor A is
really the interaction of the RM factor with factor A, and similarly for factor
B. The A × B interaction is really the three-way interaction of A, B, and the
RM factor. The parts of the three-way ANOVA that you lose with this trick are
the three main effects and the A × B interaction, but if you are only interested
in interactions involving the RM factor, this shortcut can be convenient. The
most likely case in which you would want to use difference scores is when the
two levels of the RM factor are measurements taken before and after some
treatment. However, as I mentioned in Chapter 16, this type of design is a
good candidate for ANCOVA (you would use factorial ANCOVA if you had
two between-group factors).
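If you want to see the difference-score shortcut concretely, here is a hypothetical sketch in Python (the data, column names, and effect sizes are invented for illustration; statsmodels is assumed to be available). The two-way ANOVA on the post − pre differences tests what would otherwise be the interactions involving the RM factor:

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
n = 10   # subjects per cell of the two between-group factors
df = pd.DataFrame({
    'A':    np.repeat(['a1', 'a2'], 2 * n),
    'B':    np.tile(np.repeat(['b1', 'b2'], n), 2),
    'pre':  rng.normal(50, 5, 4 * n),      # invented pre-treatment scores
    'post': rng.normal(55, 5, 4 * n),      # invented post-treatment scores
})
df['diff'] = df['post'] - df['pre']

# Two-way ANOVA on the difference scores.
print(anova_lm(ols('diff ~ C(A) * C(B)', data=df).fit()))
```

In this output, the row for C(A) corresponds to the A × RM interaction, C(B) to the B × RM interaction, and C(A):C(B) to the three-way interaction; the lost effects (the main effects and A × B) simply do not appear.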
Published Results of a Three-way ANOVA (One RM Factor)
It is not hard to find published examples of the three-way ANOVA with one
RM factor; the 2 × 2 × 2 design is probably the most common and is illustrated in a study entitled “Outcome of Cognitive-Behavioral Therapy for
Depression: Relation to Hemispheric Dominance for Verbal Processing”
(Bruder, et al., 1997). In this experiment, two dichotic listening tasks were
used to assess hemispheric dominance: a verbal (i.e., syllables) task for
which most people show a right-ear advantage (indicating left-hemispheric
cerebral dominance for speech) and a nonverbal (i.e., complex tones) task
for which most subjects exhibit a left-ear advantage. These two tasks are the
levels of the RM factor. The dependent variable was a measure of perceptual
asymmetry (PA), based on how much more material is reported from the
right ear as compared to the left ear. Obviously, a strong main effect of the
RM factor is to be expected.
All of the subjects were patients with depression. The two between-groups factors were treatment group (cognitive therapy or placebo) and
therapy response or outcome (significant clinical improvement or not). The
experiment tested whether people who have greater left-hemisphere dominance are more likely to respond to cognitive therapy; this effect is not
expected for those “responding” to a placebo. The results exhibited a clear
pattern, as I have shown in Figure 22.10 (I redrew their figure to make the presentation consistent with similar figures in this chapter).

[Figure 22.10: Graph of cell means for the Bruder et al. (1997) study. Two panels (Placebo and Cognitive Therapy) plot perceptual asymmetry (PA) for responders and nonresponders on the syllables and tones tests.]

The authors state:
There was a striking difference in PA between cognitive-therapy responders and
nonresponders on the syllables test but not on the complex tones test. In contrast,
there was no significant difference in PA between placebo responders and nonresponders on either test. The dependence of PA differences between responders
and nonresponders on treatment and test was reflected in a significant Outcome × Treatment × Test interaction in an overall ANOVA of these data, F (1, 72) =
5.81, p = .018. Further analyses indicated that this three-way interaction was due
to the presence of a significant Outcome × Test interaction for cognitive therapy,
F (1, 29) = 5.67, p = .025, but not for placebo, F (1, 43) = 0.96, p = .332. Cognitive-therapy responders had a significantly larger right-ear (left-hemisphere) advantage
for syllables when compared with nonresponders, t (29) = 2.58, p = .015, but no significant group difference was found for the tones test, t (29) = −1.12, p = .270.
Notice that the significant three-way interaction is followed by tests of
the simple interaction effects, and the significant simple interaction is, in
turn, followed by t tests on the simple main effects of that two-way interaction (of course, the t tests could have been reported as Fs, but it is common
to report t values for cell-to-cell comparisons when no factors are being collapsed). Until recently, F values less than 1.0 were usually shown as F < 1,
p > .05 (or ns), but there is a growing trend to report Fs and ps as given by
one’s statistical software output (note the reporting of F = 0.96 above).
Two RM Factors
There are many ways that a three-way factorial design with two RM factors
can arise in psychological research. In one case you begin with a two-way
RM design and then add a grouping factor. For instance, tension in the brow
and cheek, as measured electrically (EMG), can reveal facial expressions
that are hard to observe visually. While watching a happy scene from a
movie, cheek tension generally rises in a subject (due to smiling), whereas
brow tension declines (due to a decrease in frowning). The opposite pattern
occurs while watching a sad scene. If tension is analyzed with a 2 (brow vs.
cheek) × 2 (happy vs. sad) ANOVA, a significant interaction is likely to
emerge. This is not an impressive result in itself, but the degree of the two-way (RM) interaction can be used as an index of the intensity of a subject's
(appropriate) emotional reactions. For example, in one (as yet unpublished)
experiment, half the subjects were told to get involved in the movie scenes
they were watching, whereas the other half were told to analyze the scene for
various technical details. As expected, the two-way interaction was stronger
for the involved subjects, producing a three-way interaction. In another
experiment subjects were selected from an introductory psychology class
based on their responses to an empathy questionnaire. Not surprisingly,
there was again a three-way interaction due to the stronger two-way interaction for the high-empathy as compared to the low-empathy subjects.
The example I will use to illustrate the calculation of a three-way
ANOVA with two RM factors was inspired by a published study in industrial
psychology entitled: “Gender and attractiveness biases in hiring decisions:
Are more experienced managers less biased?” (Marlowe, Schneider & Nelson, 1996). For pedagogical reasons, I changed the structure and conclusions of the experiment quite a bit.
In my example the subjects are all men, chosen because they hold corporate positions in which they frequently make hiring decisions. The
between-groups factor is based on how many years of experience a subject
has in such a position: little experience (less than 5 years), moderate experience (5 to 10 years), or much experience (more than 10 years). The dependent variable is the rating a subject gives to each resume (with attached
photograph) he is shown; low ratings indicate that the subject would not be
likely to hire the applicant (0 = no chance), whereas high ratings indicate that
hiring would be likely (9 = would certainly hire). The two RM factors are the
gender of the applicant and his or her attractiveness, as based on prior ratings of the photographs (above-average, average, below average). Each subject rates five applicants in each of the six attractiveness/gender categories;
for each subject and each category, the five ratings have been averaged
together and presented in Table 22.4. To reduce the necessary calculations I
have included only four subjects in each experience group. Of course, the 30
applicants rated by each subject would be mixed randomly for each subject,
eliminating both the possibility of simple order effects and the need for counterbalancing. In addition, the resumes would be counterbalanced with the
photographs, so no photograph would be consistently paired with a better
resume (the resumes would be similar anyway).
For the sake of writing general formulas in which it is easy to spot the
between-group and RM factors, I will use the letter A to represent the
between-groups factor (amount of hiring experience, in this example) and Q
and R to represent the two RM factors (gender and attractiveness, respectively). The “subject” factor will be designated as S. You have seen this factor
before written as “between-subjects,” “sub” or “S,” but with two RM factors
the shorter abbreviation is more convenient. The ANOVA that follows is the
most complex one that will be described in this text. It requires all of the SS
components of the previous analysis plus two more SS components that are
extracted to create additional error terms.
The analysis can begin in the same way as the previous one—with the
calculation of SStotal and the SSs for the three main effects. The total number
of observations, NT, equals aqrn = 3 × 2 × 3 × 4 = 72. SStotal, as usual, is equal to
the biased variance of all 72 observations times 72, which equals 69.85. SSA
is based on the means of the three independent groups, which appear in the
Row Means column, in the rows that represent cell means (i.e., each group
mean is the mean of the six RM cell means). SSR is based on the means for
the attractiveness levels, which appear in the Column Means row under the
columns labeled “Mean” (which takes the mean across gender). The gender
means needed for SSQ are not in the table but can be found by averaging separately the column means for females and for males. The SSs for the main
effects can now be found in the usual way.
Table 22.4

                           BELOW                    AVERAGE                   ABOVE
Experience  Subject   Female  Male   Mean     Female  Male   Mean      Female  Male   Mean     Row Mean
Low            1        5.2    5.2   5.2        5.8    6.0   5.9         7.4    7.6   7.5       6.2
               2        5.8    6.0   5.9        6.4    5.2   5.8         7.6    8.0   7.8       6.5
               3        5.6    5.6   5.6        6.0    6.2   6.1         6.6    7.8   7.2       6.3
               4        4.4    5.8   5.1        7.0    6.8   6.9         7.8    6.4   7.1       6.37
  Cell Means            5.25   5.65  5.45       6.3    6.05  6.175       7.35   7.45  7.4       6.3417
Moderate       5        4.8    5.4   5.1        5.6    6.0   5.8         6.4    7.0   6.7       5.87
               6        5.4    4.8   5.1        5.4    6.6   6.0         5.8    7.6   6.7       5.93
               7        4.2    5.2   4.7        5.0    5.8   5.4         7.6    6.8   7.2       5.77
               8        4.6    6.0   5.3        6.2    5.4   5.8         7.2    6.4   6.8       5.97
  Cell Means            4.75   5.35  5.05       5.55   5.95  5.75        6.75   6.95  6.85      5.8833
High           9        4.4    5.8   5.1        6.0    7.0   6.5         7.0    5.6   6.3       5.97
              10        5.2    6.6   5.9        5.6    6.2   5.9         6.6    4.8   5.7       5.83
              11        3.6    6.4   5.0        6.2    7.8   7.0         5.2    6.4   5.8       5.93
              12        4.0    5.0   4.5        5.2    6.8   6.0         6.8    5.8   6.3       5.60
  Cell Means            4.30   5.95  5.125      5.75   6.95  6.35        6.40   5.65  6.025     5.8333
Column Means            4.77   5.65  5.21       5.87   6.32  6.095       6.83   6.68  6.755     6.02
SSA = 72 × σ²(6.3417, 5.8833, 5.8333) = 3.77
SSQ = 72 × σ²(5.8233, 6.2167) = 2.785
SSR = 72 × σ²(5.21, 6.095, 6.755) = 28.85
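Each of these SSs is just NT times the biased (population) variance of the relevant set of means, which happens to be what numpy's var computes by default; a one-line sketch (mine, not the text's):

```python
import numpy as np

# SS for the main effect of A (hiring experience): N_T times the biased
# variance of the three group means.
N_T = 72
ss_A = N_T * np.var([6.3417, 5.8833, 5.8333])   # np.var is biased by default
print(round(ss_A, 2))                           # about 3.77
```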
As in the previous analysis we will need SSbetween-subjects based on the 12
overall subject means (across all six categories of applicants). These are the
row means (ignoring the rows labeled “Cell Means” and “Column Means,” of
course) in Table 22.4.
SSbet-S = 72 × σ²(6.2, 6.5, 6.3, 6.37, 5.87, 5.93, 5.77, 5.97, 5.97, 5.83, 5.93, 5.6) = 4.694 = SSS
Because this is the same SS you would get if you were going to calculate a
main effect of subjects, I will call this SS component SSS.
Before we get enmeshed in the complexities of dealing with two RM factors, we can complete the between-groups part of the analysis. I will use Formula 16.2 from the two-way mixed design with an appropriate change in the
subscripts:
SSW = SSS − SSA
Formula 22.8
For this example, SSW = 4.694 − 3.77 = .924
dfA = a − 1 = 3 − 1 = 2
dfW = a(n − 1) = an − a = 12 − 3 = 9
Therefore,
SSA 3.77
MSA = = = 1.885
2
2
and
SSW .924
MSW = = = .103
9
9
Finally,
1.885
FA = = 18.4
.103
The appropriate critical F is F.05(2,9) = 4.26, so FA is easily significant. A
look at the means for the three groups of subjects shows us that managers
with greater experience are, in general, more cautious with their hirability
ratings (perhaps they have been “burned” more times), especially when comparing low to moderate experience. However, there is no point in trying to
interpret this finding before testing the various interactions, which may
make this finding irrelevant or even misleading. I have completed the
between-groups part of the analysis at this point just to show you that at
least part of the analysis is easy and to get it out of the way before the more
complicated within-subject part of the analysis begins.
With only one RM factor there is only one error term that involves an
interaction with the subject factor, and that error term is found easily by
subtraction. However, with two RM factors the subject factor interacts with
each RM factor separately, and with the interaction of the two of them, yielding three different error terms. The extraction of these extra error terms
requires the collapsing of more intermediate tables, and the calculation of
more intermediate SS terms. Of course, the calculations are performed the
same way as always—there are just more of them. Let’s begin, however, with
the numerators of the various interaction terms, which involve the same
procedures as the three-way analysis with only one RM factor. First, we can
get SSAQR from the 18 cell means in Table 22.4 (all of the female and male
columns of the three Cell Means rows).
SSAQR = 72 × σ²(cell means) = 72 × .694 = 49.96
The means needed to find SSAR are averaged across the Q factor (i.e.,
gender); they are found in the three Cell Mean rows, in the columns labeled
“Means.”
SSAR = 72 × σ²(5.45, 6.175, 7.4, 5.05, 5.75, 6.85, 5.125, 6.35, 6.025) = 72 × .5407 = 38.93
The means for SSQR are the Column Means in Table 22.4 for females and males
(but not Means) and are averaged across all subjects, regardless of group.
SSQR = 72 × σ²(4.77, 5.65, 5.87, 6.32, 6.83, 6.68) = 72 × .4839 = 34.84
The means needed for SSAQ do not have a convenient place in Table 22.4;
those means would fit easily in a table in which the female columns are all
adjacent (for Below, Average, and Above), followed by the three male
columns. Using Table 22.4, you can average together for each group all of the
female cell means and then all of the male cell means, thus producing the six
AQ means.
SSAQ = 72 × σ²(6.3, 6.383, 5.683, 6.083, 5.483, 6.185) = 72 × .1072 = 7.72
Now we can get the SSs for all of the interactions by subtraction, using Formula 22.1 (except that B and C have been changed to Q and R):
SSA × Q = SSAQ − SSA − SSQ = 7.72 − 3.77 − 2.785 = 1.16
SSA × R = SSAR − SSA − SSR = 38.93 − 3.77 − 28.85 = 6.31
SSQ × R = SSQR − SSQ − SSR = 34.84 − 2.785 − 28.85 = 3.21
SSA × Q × R = SSAQR − SSA × Q − SSA × R − SSQ × R − SSA − SSQ − SSR
= 49.96 − 1.16 − 6.31 − 3.21 − 3.77 − 2.785 − 28.85 = 3.875
The next (and trickiest) step is to calculate the SSs for the three RM
error terms. These are the same error terms I described at the end of Section
A in the context of the two-way RM ANOVA. For each RM factor the appropriate error term is based on the interaction of the subject factor with that
RM factor. The more that subjects move in parallel from one level of the RM
factor to another, the smaller the error term. The error term for each RM factor is based on averaging over the other factor. However, the third RM error
term, the error term for the interaction of the two RM factors, is based on
the three-way interaction of the subject factor and the two RM factors, with
no averaging of scores. To the extent that each subject exhibits the same twoway interaction for the RM factors, this error term will be small.
Two more intermediate SSs are required: SSQS and SSRS. These SSs
come from two additional two-way means tables, each one averaging scores
over one of the RM factors but not the other. (Note: The A factor isn’t mentioned for these components because it is simply being ignored. Some of the
subject means are from subjects in the same group, and some are from subjects in different groups, but this distinction plays no role for these SS components.) You can get SSRS from the entries in the columns labeled “Means”
(ignoring the rows labeled “Cell Means” and “Column Means,” of course) in
Table 22.4; in all there are 36 male/female averages, or RS means:
SSRS = 72 × σ²(RS means) = 72 × .6543 = 47.11
To find the QS means, you need to create, in addition to the row means in
Table 22.4, two additional means for each row: one for the “females” in that
row, and one for the “males,” for a total of 24 “gender” row means.
SSQS = 72 × σ²(6.13, 6.27, 6.6, 6.4, 6.07, 6.53, 6.4, 6.33, 5.6, 6.13, 5.53, 6.33, 5.6, 5.93, 6, 5.93, 5.8, 6.13, 5.8, 5.87, 5.0, 6.87, 5.33, 5.87) = 72 × .1737 = 12.51

(Note: the means are in the order "female, male" for each subject (i.e., row), from the top to the bottom of Table 22.4.)
Now we are ready to get the error terms by subtraction, using Formula
22.9A.
SSQ × S = SSQS − SSQ − SSS − SSA × Q
Formula 22.9A
So,
SSQ × S = 12.51 − 2.785 − 4.694 − 1.16 = 3.87
[Note: I subtracted the group by gender interaction at the end of the preceding calculation because what we really want (and what I mean by
SSQ × S) is the gender by subject interaction within each group (i.e., level of
the A factor), added across all the groups. This is not the same as just finding the gender by subject interaction, ignoring group. Any group by gender interaction will increase the gender by subject interaction when
ignoring group, but not if you calculate the interaction separately within
each group. Rather than calculating the gender by subject interaction for
each group, it is easier to calculate the overall interaction ignoring group
and then subtract the group by gender interaction. The same trick is used
to find SSR × S.]
SSR × S = SSRS − SSR − SSS − SSA × R
Formula 22.9B
Therefore,
SSR × S = 47.11 − 28.85 − 4.694 − 6.31 = 7.26
Finally, the last error term, SSQ × R × S, can be found by subtracting all of
the other SS components from SStotal. To simplify this last calculation, note
that SStotal is the sum of all the cell-to-cell variation and the four error
terms:
SStotal = SSAQR + SSW + SSQ × S + SSR × S + SSQ × R × S
Formula 22.9C
So,
SSQ × R × S = SStotal − SSAQR − SSW − SSQ × S − SSR × S
= 69.85 − 49.96 − .924 − 3.87 − 7.26 = 7.836
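Here is the same chain of subtractions as a minimal Python sketch (mine, not the text's), using the SS values computed above for the hiring example (Formulas 22.9A-C):

```python
# Error terms for the three-way ANOVA with two RM factors.
ss = dict(total=69.85, S=4.694, W=0.924, Q=2.785, R=28.85,
          AxQ=1.16, AxR=6.31, QS=12.51, RS=47.11, AQR=49.96)

ss_QxS = ss['QS'] - ss['Q'] - ss['S'] - ss['AxQ']      # 22.9A -> about 3.87
ss_RxS = ss['RS'] - ss['R'] - ss['S'] - ss['AxR']      # 22.9B -> about 7.26
ss_QxRxS = (ss['total'] - ss['AQR'] - ss['W']          # 22.9C -> about 7.84
            - ss_QxS - ss_RxS)
print(ss_QxS, ss_RxS, ss_QxRxS)
```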
The degrees of freedom are divided up for this design in a way that is
best illustrated in a df tree, as shown in Figure 22.11. The formulas for the
df’s are as follows:
a. dfA = a − 1
b. dfQ = q − 1
c. dfR = r − 1
d. dfA × Q = (a − 1)(q − 1)
e. dfA × R = (a − 1)(r − 1)
f. dfQ × R = (q − 1)(r − 1)
g. dfA × Q × R = (a − 1)(q − 1)(r − 1)
h. dfW = a(n − 1)
i. dfQ × S = dfQ × dfW = a(q − 1)(n − 1)
j. dfR × S = dfR × dfW = a(r − 1)(n − 1)
k. dfQ × R × S = dfQ × dfR × dfW = a(q − 1)(r − 1)(n − 1)

Formula 22.10

For this example,
dfA = 3 − 1 = 2
dfQ = 2 − 1 = 1
dfR = 3 − 1 = 2
dfA × Q = 2 × 1 = 2
dfA × R = 2 × 2 = 4
dfQ × R = 1 × 2 = 2
dfA × Q × R = 2 × 1 × 2 = 4
dfW = 3 × 3 = 9
dfQ × S = dfQ × dfW = 1 × 9 = 9
dfR × S = dfR × dfW = 2 × 9 = 18
dfQ × R × S = dfQ × dfR × dfW = 1 × 2 × 9 = 18
Note that the sum of all the df’s is 71, which equals dftotal (NT − 1 = aqrn − 1 =
72 − 1). When you have converted each SS to an MS, the seven F ratios are
formed as follows:
a. FA = MSA / MSW
b. FQ = MSQ / MSQ × S
c. FR = MSR / MSR × S
d. FA × Q = MSA × Q / MSQ × S
e. FA × R = MSA × R / MSR × S
f. FQ × R = MSQ × R / MSQ × R × S
g. FA × Q × R = MSA × Q × R / MSQ × R × S

Formula 22.11
The completed analysis is shown in Table 22.5. Notice that each of the
three different RM error terms is being used twice.
Figure 22.11. Degrees of Freedom Tree for 3-Way ANOVA with Repeated Measures on Two Factors:

dftotal [aqrn − 1]
    dfbetween-S [an − 1]
        dfA [a − 1]
        dfW [a(n − 1)]
    dfwithin-S [an(qr − 1)]
        dfQ [q − 1]
        dfA × Q [(a − 1)(q − 1)]
        dfQ × S [a(n − 1)(q − 1)]
        dfR [r − 1]
        dfA × R [(a − 1)(r − 1)]
        dfR × S [a(n − 1)(r − 1)]
        dfQ × R [(q − 1)(r − 1)]
        dfA × Q × R [(a − 1)(q − 1)(r − 1)]
        dfQ × R × S [a(n − 1)(q − 1)(r − 1)]
Table 22.5

Source                          SS        df    MS       F       p
Between-groups                   4.694    11
  Hiring Experience              3.77      2    1.885    18.4    <.001
  Within-group error              .924     9     .103
Within-subjects                 65.156    60
  Gender                         2.785     1    2.785     6.48   <.05
  Group × Gender                 1.16      2     .580     1.35   >.05
  Gender × Subject               3.87      9     .430
  Attractiveness                28.85      2   14.43     35.81   <.001
  Group × Attract                6.31      4    1.578     3.92   <.05
  Attract × Subject              7.26     18     .403
  Gender × Attract               3.21      2    1.60      3.69   <.05
  Group × Gender × Attract       3.875     4     .970     2.23   >.05
  Gender × Attract × Subject     7.836    18     .435

Note: The note from Table 22.3 applies here as well.
This is just an extension of what you saw in the two-way mixed design, when the S × RM error term was used for both the RM main effect and its interaction with the between-groups factor.
Interpreting the Results
Although the three-way interaction was not significant, you will probably
want to graph all of the cell means in any case to see what’s going on in your
results; I did this in Figure 22.12, choosing applicant gender as the variable
whose levels are represented by different graphs and hiring experience levels
to be represented as different lines on each graph. You can see by comparing
the two graphs in the figure why the F ratio for the three-way interaction
was not very small, even though it failed to attain significance. The three-way interaction is due almost entirely to the drop in hirability from average to above-average attractiveness only for highly experienced subjects judging male applicants. It is also obvious (and not misleading) that the main effect of attractiveness should be significant (with the one exception just mentioned, all the lines go up with increasing attractiveness), and the main effect of gender as well (the lines on the "male" graph are generally higher).

[Figure 22.12: Graph of the cell means for the data in Table 22.4. Two panels (Female and Male applicants) plot mean hirability ratings across the three attractiveness levels (Below, Average, Above), with separate lines for the Low-, Moderate-, and High-experience groups.]
[Figure 22.13: Graph of the cell means for Table 22.4 after averaging across gender (the average of female and male applicants), with separate lines for the Low-, Moderate-, and High-experience groups across the three attractiveness levels.]

That
the line for the low experience group is consistently above the line for moderate experience seems to account, at least in part, for the significance of the
main effect for that factor.
The significant attractiveness by experience (i.e., group) interaction is
clearly due to a strong interaction for the male condition being averaged
with a lack of interaction for the females (Figure 22.13 shows the male and
female conditions averaged together, which bears a greater resemblance to
the male than female condition). This is a case when a three-way interaction
that is not significant should nonetheless lead to caution in interpreting significant two-way interactions.
Perhaps the most interesting significant result is the interaction of
attractiveness and gender. Figure 22.14 shows that although attractiveness is
a strong factor in hirability for both genders, it makes somewhat less of a
difference for males. However, the most potentially interesting result would
have been the three-way interaction, had it been significant; it could have
shown that the impact of attractiveness on hirability changes with the experience of the employer, but more for male than female applicants.
[Figure 22.14: Graph of the cell means for Table 22.4 after averaging across the levels of hiring experience, with separate lines for female and male applicants across the three attractiveness levels.]
Assumptions
For each of the three RM error terms (Q × S, R × S, Q × R × S), pairwise interactions should be the same for each independent group; these assumptions
can be tested with three applications of Box’s M test. With interactions
pooled across groups, sphericity can then be tested with Mauchly’s W for
each of the three error terms (Huynh & Mandeville, 1979). In the preceding
example, sphericity is not an issue for gender, which has only two levels, but
sphericity can be tested separately for both attractiveness and the gender by
attractiveness interaction. However, rather than relying on the outcome of
statistical tests of assumptions, researchers often “play it safe” by ensuring
that all of the groups have the same number of subjects and adjusting the df
with epsilon before testing effects involving a repeated-measures factor.
Follow-Up Comparisons
Given the significance of the attractiveness by experience interaction, it
would be reasonable to perform follow-up tests, similar to those described
for the two-way mixed design in Chapter 16. This includes the possibility of
analyzing simple effects (a one-way ANOVA at each attractiveness level or a
one-way RM ANOVA for each experience group), partial interactions (e.g.,
averaging the low and moderate experience conditions and performing the
resulting 2 × 3 ANOVA) or interaction contrasts (e.g., the average of the low
and moderate conditions and the high condition crossed with the average
and above average attractiveness conditions). Such tests, if significant, could
justify various cell-to-cell comparisons. To follow up on the significant gender by attractiveness interaction, the most sensible approach would be simply to conduct RM t tests between the genders at each level of attractiveness.
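For the gender-by-attractiveness follow-up just described, a matched t test at each attractiveness level is a one-liner with scipy; here is a minimal sketch (my own), using the twelve subjects' below-average-attractiveness means from Table 22.4:

```python
import numpy as np
from scipy.stats import ttest_rel

# RM (matched) t test between female and male applicant ratings at the
# below-average attractiveness level; one value per subject (Table 22.4).
female_below = np.array([5.2, 5.8, 5.6, 4.4, 4.8, 5.4, 4.2, 4.6,
                         4.4, 5.2, 3.6, 4.0])
male_below   = np.array([5.2, 6.0, 5.6, 5.8, 5.4, 4.8, 5.2, 6.0,
                         5.8, 6.6, 6.4, 5.0])
t, p = ttest_rel(female_below, male_below)
# With three such tests (one per attractiveness level), a Bonferroni-
# adjusted alpha of .05/3 per test would be a cautious criterion.
print(t, p)
```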
In general, planned and post hoc comparisons for the three-way ANOVA
with two RM factors follow the same logic as those described for the design
with one RM factor. The only differences concern the error terms for these
comparisons. If your between-group factor is significant, involves more than
two levels, and is not involved in an interaction with one of the RM factors,
you can use MSW from the overall analysis as your error term. For all other
comparisons, using an error term from the overall analysis requires some
questionable homogeneity assumption. For tests involving one or both of
the two RM factors, it is safest to perform all planned and post hoc comparisons using an error term based only on the conditions included in the test.
Published Results of a Three-way ANOVA (Two RM Factors)
Banaji and Hardin (1996) studied automatic stereotyping by presenting
common gender pronouns to subjects (e.g., she, him) and measuring their
reaction times to judging the gender of the pronouns (i.e., male or female; no
neutral pronouns were used in their Experiment 1). The interesting manipulation was that the pronouns were preceded by primes—words that subjects were told to ignore but which could refer to a particular gender by
definition (i.e., mother) or by stereotype (i.e., nurse). The gender of the
prime on each trial was either female, male, neutral (i.e., postal clerk) or just
a string of letters (nonword). The authors describe their 4 × 2 × 2 experimental design as a "mixed factorial, with subject gender the between-subjects factor" (p. 138). An excerpt of their results follows:
The omnibus Prime Gender (female, male, neutral, nonword) × Target Gender
(female, male) × Subject Gender (female, male) three-way analysis of variance
yielded the predicted Prime Gender × Target Gender interaction, F (3,198) =
723
Cohen_Chapter22.j.qxd 8/23/02 11:56 M Page 724
Chapter 22 • Three-Way ANOVA
724
72.25, p < .0001 . . . No other reliable main effects or interactions were obtained
as a function of either subject gender or target gender (Fs < 1) (p. 138).
The significant two-way interaction was then followed with an interaction
contrast (dropping the neutral and nonword prime conditions) and cell-to-cell comparisons:
The specific Prime Gender × Target Gender interaction (excluding the neutral
conditions) was also reliable, F (1,66) = 117.56, p < .0001. Subjects were faster to
judge male pronouns after male than female primes, t (67) = 11.59, p < .0001, but
faster to judge female pronouns after female than male primes, t (67) = 6.90, p <
.0001 (p. 138).
B SUMMARY
1. The calculation of the three-way ANOVA with repeated measures on one
factor follows the basic outline of the independent three-way ANOVA, as
described in Section A, but adds elements of the mixed design, as delineated in Chapter 16. The between-subject factors are labeled A and B,
whereas the within-subject factor is labeled R (short for RM). The number of levels of the factors are symbolized by a, b, and c, respectively.
The following steps should be followed:
a. Begin with a table of the individual scores and then find the mean for
each level of each factor, the mean for each different subject (averaging across the levels of the RM factor), and the mean for each cell of
the three-way design. From your table of cell means, create three
“two-way” tables of means, in each case taking a simple average of
the cell means across one of the three factors.
b. Use Formula 14.3 to find SStotal from the individual scores; SSA, SSB,
and SSR from the means at each factor level; SSbetween-subjects from the
means for each subject; SSABR from the cell means; and SSAB, SSAR,
and SSBR from the two-way tables of means.
c. Find the SS components for the three two-way interactions, the
three-way interaction, and the two error terms (SSW and SSS × R) by
subtraction. Divide these six SS components, along with the three SS
components for the main effects, by their respective df to create the
nine necessary MS terms.
d. Form the seven F ratios by using MSW as the error term for the main
effects of A and B and their interaction and then, using MSS × R as the
error term for the main effect of the RM factor, its interaction with A,
its interaction with B, and the three-way interaction.
2. The calculation of the three-way ANOVA with repeated measures on two
factors is related to both the independent three-way ANOVA and the
two-way RM ANOVA. The between-subject factor is labeled A, whereas
the two RM factors are labeled R and Q. The number of levels of the factors are symbolized by a, r, and q, respectively. The following steps
should be followed.
a. Begin with a table of the individual scores and then find the mean for
each level of each factor, the mean for each different subject (averaging across the levels of both RM factors), and the mean for each cell
of the three-way design. From your table of cell means, create three
“two-way” tables of means, in each case taking a simple average of
the cell means across one of the three factors. In addition, create two
more two-way tables in which scores are averaged over one RM factor or the other, but not both, and subjects are not averaged across
groups (i.e., each table is a two-way matrix of subjects by one of the
RM factors.).
b. Use Formula 14.3 to find SStotal from the individual scores; SSA, SSQ, and SSR from the means at each factor level; SSS from the means for each subject; SSAQR from the cell means; SSAQ, SSAR, and SSQR from the two-way tables of means; and SSQS and SSRS from the additional two-way tables of subject means.
c. Find the SS components for the three two-way interactions, the
three-way interaction, and the four error terms (SSW, SSQ × S, SSR × S,
and SSQ × R × S) by subtraction. Divide these eight SS components,
along with the three SS components for the main effects by their
respective df to create the 11 necessary MS terms.
d. Form the seven F ratios by using MSW as the error term for the main
effect of A; MSQ × S as the error term for both the main effect of Q and
its interaction with A; MSR × S as the error term for both the main
effect of R and its interaction with A; and MSQ × R × S as the error term
for both the interaction of Q and R, and the three-way interaction.
3. If the three-way interaction is not significant, the focus shifts to the two-way interactions. A significant two-way interaction is followed either by
an analysis of simple effects or by an analysis of partial interactions
and/or interaction contrasts, as described in Chapter 14. Any significant
main effect for a factor not involved in a two-way interaction can be
explored with pairwise or complex comparisons among its levels (if
there are more than two).
4. If the three-way interaction is significant, it is common to test the simple interaction effects. Any significant simple interaction effect can be
followed by an analysis of simple main effects and finally by cell-to-cell
comparisons, if warranted. Alternatively, the significant three-way interaction can be localized by analyzing partial interactions involving all
three factors; for example, either one or two of the factors can be
reduced to only two levels. It is also reasonable to skip this phase and
proceed directly to test various 2 × 2 × 2 interaction contrasts, which are
then followed by simple interaction effects and cell-to-cell comparisons
if the initial tests are significant.
5. Three-way ANOVAs that include RM factors require homogeneity and sphericity assumptions that are a simple extension of those for the two-way mixed design. Tests of these assumptions can be unreliable, violations are likely in many psychological experiments, and violations can greatly inflate the Type I error rate, especially when conducting post hoc tests. It is therefore usually recommended that post hoc comparisons, and even planned ones, use an error term based only on the factor levels included in the comparison rather than the error term from the overall analysis.
EXERCISES
1. A total of 60 college students participated in a
study of attitude change. Each student was
randomly assigned to one of three groups that
differed according to the style of persuasion
that was used: rational arguments, emotional
appeal, and stern/commanding (Style factor).
Each of these groups was randomly divided in
half, with one subgroup hearing the arguments from a fellow student, and the other
from a college administrator (Speaker factor).
Each student heard arguments on the same
four campus issues (e.g., tuition increase),
and attitude change was measured for each of
the four issues (Issue factor). The sums of
squares for the three-way mixed ANOVA
are as follows: SSstyle = 50.4, SSspeaker = 12.9,
SSissue = 10.6, SSstyle × speaker = 21.0, SSstyle × issue =
72.6, SSspeaker × issue = 5.3, SSstyle × speaker × issue =
14.5, SSW = 189, and SStotal = 732.7.
a. Calculate the seven F ratios, and test each
for significance.
b. Find the conservatively adjusted critical
F for each test involving a repeated-measures factor. Will any of your conclusions be affected if you do not make any assumptions about sphericity?

2. Based on a questionnaire they had filled out earlier in the semester, students were classified as high, low, or average in empathy. The 12 students recruited in each category for this experiment were randomly divided in half, with one subgroup given instructions to watch videotapes to check for the quality of the picture and sound (detail group) and the other subgroup given instructions to get involved in the story portrayed in the videotape. All subjects viewed the same two videotapes (in counterbalanced order): one presenting a happy story and one presenting a sad story. The dependent variable was the subject's rating of his or her mood at the end of each tape, using a 10-point happiness scale (0 = extremely sad, 5 = neutral, and 10 = extremely happy). The data for the study appear in the following table:

             LOW EMPATHY     AVERAGE EMPATHY     HIGH EMPATHY
             Happy   Sad     Happy   Sad         Happy   Sad
Detail         6      5        5      4            7      3
               6      5        7      2            8      3
               5      7        5      3            7      1
               7      4        5      5            5      5
               4      6        4      4            6      4
               6      5        5      5            5      5
Involved       5      4        6      2            7      2
               5      4        6      2            8      1
               6      4        7      1            9      1
               4      5        4      4            7      2
               5      3        6      2            6      1
               4      5        4      4            7      2

a. Calculate the three-way mixed-design ANOVA for the data. Present your results in a summary table.
b. Use graphs of the cell means to help you describe the pattern underlying each effect that was significant in part a.
c. Do you need to retest any of your results in part a if you make no assumptions about sphericity? Explain.
d. How can you transform the data above so it can be analyzed by a two-way independent-groups ANOVA? Which effects from the analysis in part a would no longer be testable?
e. If a simple order effect is present in the data, which error term is being inflated? How can you remove the extra variance from that error term?

3. The dean at a large college is testing the effects of a new advisement system on students' feelings of satisfaction with their educational experience. A random sample of 12 first-year students coming from small high schools was selected, along with an equal-sized sample of students from large high schools. Within each sample, a third of the students were randomly assigned to the new system, a third to the old system, and a third to a combination of the two systems. Satisfaction was measured on a 10-point scale (10 = completely satisfied) at the end of each student's first, second, third, and fourth years. The data for the study appear in the following table:

             SMALL HIGH SCHOOL               LARGE HIGH SCHOOL
             First  Second  Third  Fourth    First  Second  Third  Fourth
Old System     4      4       5      4         5      5       5      6
               5      5       4      4         6      7       7      6
               3      4       5      6         4      4       4      4
               5      4       4      6         5      5       5      5
New System     6      5       6      7         7      8       8      8
               7      8       9      9         7      8       8      8
               5      4       4      5         6      7       7      7
               6      6       7      7         6      5       6      8
Combined       5      5       6      6         9      7       7      7
               4      5       6      7         8      7       7      6
               6      6       5      6         9      6       5      5
               5      6       6      6         8      6       6      8
a. Calculate the appropriate three-way
ANOVA for the data. Present your results
in a summary table.
b. Use graphs of the cell means to help you
describe the pattern underlying each effect
that was significant in part a. Describe a
partial interaction that would be meaningful. How might you use trend components?
c. What analyses of simple effects are justified,
if any, by the results in part a? What error
term should you use in each case if you
make no assumptions about sphericity?
d. Find the conservatively adjusted critical
F for each test involving a repeated-measures factor. Will any of your conclusions be affected if you do not make any
assumptions about sphericity?
4. Imagine an experiment in which all subjects
solve two types of problems (spatial and verbal), each at three levels of difficulty (easy,
moderate, and hard). Half of the 24 subjects
are given instructions to use visual imagery,
and half are told to use subvocalization. The
dependent variable is the number of eye
movements that a subject makes during the
5-second problem-solving period. The cell
means for this experiment are given in the
following table:
              SUBVOCAL INSTRUCTIONS     IMAGERY INSTRUCTIONS
Difficulty    Spatial    Verbal         Spatial    Verbal
Easy            1.5        1.6            3.9        2.2
Moderate        2.5        1.9            5.2        2.4
Hard            2.7        2.1            7.8        2.8

a. Given that SStype × S = 224, SSdifficulty × S = 130, SStype × difficulty × S = 62, and SSW = 528, perform the appropriate three-way ANOVA on the data. Present your results in a summary table.
b. Graph the Type × Difficulty means, averaging across instruction group. Compare this graph to the Type × Difficulty graph for each instruction group. Can the overall Type × Difficulty interaction be meaningfully interpreted? Explain.
c. Find the conservatively adjusted critical F for each test. Will any of your conclusions be affected if you do not assume that sphericity exists in the population?
d. Given the results you found in part a, which simple effects can be justifiably analyzed?

5. Imagine that the psychologist in Exercise 6 of Section A runs her study under two different conditions with two different random samples of subjects. The two conditions depend on the type of background music played to the subjects as they memorize the list of words: very happy or very sad. The number of words recalled in each word category for each subject in the two groups is given in the following table:
                  SAD                 NEUTRAL             HAPPY
              Low  Medium  High   Low  Medium  High   Low  Medium  High
Happy Music     4     6      9      3     5      6      4     4      9
                2     5      7      4     6      7      5     6      6
                4     7      5      3     5      5      4     5      7
                2     5      4      4     6      6      4     4      5
                4     8      8      5     7      7      5     5     10
                3     5      6      5     5      6      6     4      5
Sad Music       5     6      9      2     4      6      3     4      6
                3     5      9      3     6      5      4     5      5
                6     7      6      2     4      5      3     3      6
                3     6      7      3     4      6      4     4      5
                4    10      9      4     6      7      5     5      8
                5     5      7      4     5      6      4     4      5
a. Perform a three-way mixed-design ANOVA on the data. Present your results in a summary table.
b. Find the conservatively adjusted critical F for each test. Will any of your conclusions be affected if you do not assume that sphericity exists in the population?
c. Draw a graph of the cell means (with separate panels for the two types of background music), and describe the nature of any effects that are noticeable. Which 2 × 2 × 2 interaction contrast appears to be the largest?
d. Based on the variables in this exercise, and the results in part a, what post hoc tests would be justified and meaningful?

6. A neuropsychologist is testing the benefits of a new cognitive training program designed to improve memory in patients who have suffered brain damage. The effects of the training are being tested on four types of memory: abstract words, concrete words, human faces, and simple line drawings. Each subject performs all four types of tasks. The dependent variable is the number of items correctly identified in a subsequent recognition test. Six subjects are selected from each of the following categories: damage confined to the right cerebral hemisphere, damage confined to the left, and equal damage to the two hemispheres. Within each category, subjects are matched into three pairs, and one member of each pair is randomly selected to receive training, and the other member is not.

                     NO TRAINING                          TRAINING
                 Abstract Concrete Faces Drawings     Abstract Concrete Faces Drawings
Right brain         11       19      7      5           12       18      10      8
damage              13       20     10      9           10       19       7     11
                     9       18      4      1           14       17      13      5
Left brain           5        5     13     11            7       10      15     12
damage               7        8     15      7            9        8      17      9
                     3        5     11     15            5       12      13     15
Equal                7        6     11      7            8        9      11      9
damage               8        5      8      9            7       11       9      7
                     6        7     14      5            9        7      13     11

a. Perform the appropriate three-way mixed-design ANOVA on the data (don't forget that subjects are matched on the Training factor). Present your results in a summary table.
b. How many different order conditions would be needed to counterbalance this study? How can you tell from the cell sizes that this study could not have been properly counterbalanced?
c. Describe a meaningful partial interaction involving all three factors. Describe a set of orthogonal contrasts for completely analyzing the three-way interaction.
C OPTIONAL MATERIAL
Multivariate Analysis of Variance
Multifactor experiments have become very popular in recent years, in part
because they allow for the testing of complex interactions, but also because
they can be an efficient (not to mention economical) way to test several
hypotheses in one experiment, with one set of subjects. This need for efficiency is driven to some extent by the ever-increasing demand to publish as
well as the scarcity of funding. Given the current situation, it is not surprising that researchers rarely measure only one dependent variable. Once you
have invested the resources to conduct an elaborate experiment, the cost is
usually not increased very much by measuring additional variables; it makes
sense to squeeze in as many extra measures or tasks as you can without
exhausting the subjects and without one task interfering with another. Having gathered measures on several DVs, you can then test each DV separately
with the appropriate ANOVA design. However, if each DV is tested at the .05
level, you are increasing the risk of making a Type I error in the overall
study—that is, you are increasing the experimentwise alpha. You can use the
Bonferroni adjustment to reduce the alpha for each test, but there is an
alternative that is frequently more powerful. This method, in which all of the
DVs are tested simultaneously, is called the multivariate analysis of variance
(MANOVA); the term multivariate refers to the incorporation of more than
one DV in the test (all of the ANOVA techniques you have learned thus far
are known as univariate tests).
Although it seems clear to me that the most common use of MANOVA at
present is the control of experimentwise alpha, it is certainly not the most
interesting use. I think it is safe to say that the most interesting use of
MANOVA is to find a combination of the DVs that distinguishes your groups
better than any of the individual DVs separately. In fact, the MANOVA can
attain significance even when none of the DVs does by itself. This is the type
of situation I will use to introduce MANOVA. The choice of my first example
is also dictated by the fact that MANOVA is much simpler when it is performed on only two groups.
The Two-Group Case: Hotelling’s T 2
Imagine that a sample of high school students is divided into two groups
depending on their parents’ scores on a questionnaire measuring parental
attitudes toward education. One group of students has parents who place a
high value on education, and the other group has parents who place relatively little value on education. Each student is measured on two variables:
scholastic aptitude (for simplicity I’ll use IQ) and an average of grades for
the previous semester. The results are shown in Figure 22.15. Notice that
almost all of the students from “high value” (HV) homes (the filled circles)
have grades that are relatively high for their IQs, whereas nearly all the students from “low value” (LV) homes show the opposite pattern. However, if
you performed a t test between the two groups for IQ alone, it would be
nearly zero, and although the HV students have somewhat higher grades on
average, a t test on grades alone is not likely to be significant, either. But you
can see that the two groups are fairly well separated on the graph, so it
should come as no surprise that there is a way to combine the two DVs into
a quantity that will distinguish the groups significantly.
Figure 22.15  Plot in which Two Groups of Students Differ Strongly on Two Variables (axes: Grades, IQ; groups: HV homes, LV homes)
A simple difference score, IQ − grades (i.e., IQ minus grades), would
separate the groups rather well, with the LV students clearly having the
higher scores. (This difference score is essentially an underachievement
score; in this hypothetical example students whose parents do not value education do not get grades as high as their IQs suggest they could, whereas
their HV counterparts tend to be “overachievers”). However, the MANOVA
procedure can almost always improve on a simple difference score by finding the weighted combination of the two variables that produces the largest
t value possible. (If you used GPA on a four-point scale to replace grades, it
would have to be multiplied by an appropriate constant before it would be
reasonable to take a difference score, but even if you transform both variables to z scores, the MANOVA procedure will find a weighted combination
that is better than just a simple difference. In many cases, a sum works better than a difference score, in which case MANOVA finds the best weighted
sum.) Given the variables in this problem, the discriminant function, which
creates the new variable to be tested, can be written as W1 IQ + W2 Grades + Constant (the constant is not relevant to the present discussion).
For the data in Figure 22.15, the weights would come out close to W1 = 1 and
W2 = −1, leading to something resembling a simple difference score. However, for the data in Figure 22.16, the weights would be quite different.
Looking at the data in Figure 22.16, you can see that once again the two
groups are well separated, but this time the grades variable is doing most of
the discrimination, with IQ contributing little. The weights for the discriminant function would reflect that; the weight multiplying the z score for
grades would be considerably larger than the weight for the IQ z score.
It is not a major complication to use three, four, or even more variables
to discriminate between the two groups of students. The raw-score (i.e.,
unstandardized) discriminant function for four variables would be written
as W1X1 + W2X2 + W3X3 + W4X4 + Constant. This equation can, of course, be
expanded to accommodate any number of variables. Adding more variables
nearly always improves your ability to discriminate between the groups, but
you pay a price in terms of losing degrees of freedom, as you will see when I
discuss testing the discriminant function for statistical significance. Unless
a variable is improving your discrimination considerably, adding it can actually reduce your power and hurt your significance test.
Going back to the two-variable case, the weights of the discriminant
function become increasingly unreliable (in terms of changing if you repeat
the experiment with a new random sample) as the correlation of the two variables increases.

Figure 22.16  Plot in which Two Groups of Students Differ Strongly on One Variable and Weakly on a Second Variable (axes: Grades, IQ; groups: HV homes, LV homes)

It is not a good idea to use two variables that are nearly
redundant (e.g., both SAT scores and IQ). The likelihood of two of your variables being highly correlated increases as you add more variables, which is
another reason not to add variables casually. The weights of your discriminant function depend on the discriminating power of each variable individually (its rpb with the grouping variable) and the intercorrelations among the
variables. When your variables have fairly high intercorrelations, the discriminant loading of each variable can be a more stable indicator of its contribution to the discriminant function. A DV’s discriminant loading is its
ordinary Pearson correlation with the scores from the discriminant function.
High positive correlations among your DVs reduce the power of
MANOVA when all the DVs vary in the same direction across your groups
(Cole, Maxwell, Arvey, & Salas, 1994), which is probably the most common
case. In fact, it has been suggested that one can obtain more power by running separate univariate ANOVAs for each of the highly correlated DVs and
adjusting the alpha for each test according to the Bonferroni inequality. On
the other hand, MANOVA becomes particularly interesting and powerful
when some of the intercorrelations among the DVs are negative or when
some of the DVs vary little from group to group but correlate highly (either
positively or negatively) with other DVs. A DV that fits the latter description
is acting like a “suppressor” variable in multiple regression. The advantage
of that type of relation was discussed in Chapter 17.
In the two-group case the discriminant weights will be closely related to
the beta weights of multiple regression when you use your variables to predict the grouping variable (which can just be coded arbitrarily as 1 for one
group, and 2 for the other). This was touched upon in Chapter 17, section C,
under “Multiple Regression with a Dichotomous Criterion.” Because discriminant functions are not used nearly as often as multiple regression
equations, I will not go into much detail on that topic. The way that discriminant functions are most often used is as the basis of the MANOVA procedure, and when performing MANOVA, there is usually no need to look at
the underlying discriminant function. We are often only interested in testing
its significance by methods I will turn to next.
Testing T 2 for Significance
It is not easy to calculate a discriminant function, even when you have only
two groups to discriminate (this requires matrix algebra and is best left to
statistical software), but it is fairly easy to understand how it works. The discriminant function creates a new score for each subject by taking a weighted
combination of that subject’s scores on the various dependent variables.
Then, a t test is performed on the two groups using the new scores. There are
an infinite number of possible discriminant functions that could be tested,
but the one that is tested is the one that creates the highest possible t value.
Because you are creating the best possible combination of two or more variables to obtain your t value, it is not fair to compare it to the usual critical t.
When combining two or more variables, you have a greater chance of getting
a high t value by accident. The last step of MANOVA involves finding the
appropriate null hypothesis distribution.
One problem in testing our new t value is that the t distribution cannot
adjust for the different number of variables that can go into our combination. We will have to square the t value so that it follows an F distribution. To
indicate that our new t value has not only been squared but that it is based
on a combination of variables, it is customary to refer to it as T 2—and in particular, Hotelling’s T 2—after the mathematician who determined its distri-
bution under the null hypothesis. T 2 follows an F distribution, but it first
must be reduced by a factor that is related to the number of variables, P, that
were used to create it. The formula is
F = T2 [(n1 + n2 − P − 1) / (P(n1 + n2 − 2))]
Formula 22.12
where n1 and n2 are the sizes of the two groups. The critical F is found with
P and n1 + n2 − P − 1 degrees of freedom. Notice that when the sample sizes
are fairly large compared to P, T 2 is multiplied by approximately 1/P. Of
course, when P = 1, there is no adjustment at all.
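To make the conversion concrete, here is a minimal Python sketch of Formula 22.12; the function name is mine, and the sample numbers are taken from Exercise 1 at the end of this section:

```python
from scipy import stats

def t2_to_F(T2, n1, n2, P):
    # Formula 22.12: shrink T^2 by a factor that grows with P
    F = T2 * (n1 + n2 - P - 1) / (P * (n1 + n2 - 2))
    return F, P, n1 + n2 - P - 1

# Numbers from Exercise 1 below: best t = 3.8 over P = 3 DVs, n1 = n2 = 12
F, df1, df2 = t2_to_F(3.8 ** 2, 12, 12, 3)
print(F, df1, df2)              # about 4.38 on (3, 20) df
print(stats.f.sf(F, df1, df2))  # p value from the F distribution
```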
There is one case in which it is quite easy to calculate T 2. Suppose you
have equal-sized groups of left- and right-handers and have calculated t tests
for two DVs: a verbal test and a spatial test. If across all your subjects the two
DVs have a zero correlation, you can find the square of the point-biserial
correlation corresponding to each t test (use Formula 10.13 without taking
the square root) and add the two together. The resulting rpb2 can be converted
back to a t value by using Formula 10.12 (for testing the significance of rpb).
However, if you use the square of that formula to get t2 instead, what you are
really getting is T 2 for the combination of the two DVs. T 2 can then be tested
with the preceding formula. If the two DVs are positively correlated, finding
T 2 as just described would overestimate the true value (and underestimate it
if the DVs are negatively correlated). If you have any number of DVs, and
each possible pair has a zero correlation over all your subjects, you can add
all the squared point-biserial rs and convert to T 2, as just described. If any
two of your DVs have a nonzero correlation with each other, you can use
multiple regression to combine all of the squared point-biserial rs; the combination is called R2. The F ratio used in multiple regression to test R2 would
give the same result as the F for testing T 2 in this case. In other words, the
significance test of a MANOVA with two groups is the same as the significance test for a multiple regression to predict group membership from your
set of dependent variables.
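As a check on this recipe, here is a small sketch using the numbers of Exercise 3 at the end of this section (two uncorrelated DVs, t = 1.9 and 1.8, nine subjects per group):

```python
# Formula 10.13 without the square root: rpb^2 = t^2 / (t^2 + df)
df = 9 + 9 - 2
R2 = sum(t ** 2 / (t ** 2 + df) for t in (1.9, 1.8))  # sum of rpb^2 values
T2 = df * R2 / (1 - R2)  # Formula 10.12, squared: back from R^2 to T^2
print(round(R2, 3), round(T2, 2))  # .352 and about 8.7 (the answer key,
                                   # rounding earlier, gets 8.69)
```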
If you divide an ordinary t value by the square root of n/2 (if the groups
are not the same size, n has to be replaced by the harmonic mean of the two
sample sizes), you get g, a sample estimate of the effect size in the population. If you divide T 2 by n/2 (again, you need the harmonic mean if the ns are
unequal) you get MD2, where MD is a multivariate version of g, called the
Mahalanobis distance.
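The same arithmetic in a brief sketch, again with the equal-n numbers of Exercise 1:

```python
import math

n, T2 = 12, 14.44             # per-group n and Hotelling's T^2
MD = math.sqrt(T2 / (n / 2))  # Mahalanobis distance, about 1.55
print(round(MD, 2))
```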
In Figure 22.17 I have reproduced Figure 22.15, but added a measure
of distance. The means of the LV and HV groups are not far apart on either
IQ or grades separately, but if you create a new axis from the discriminant
function that optimally combines the two variables, you can see that the
groups are well separated on the new axis. Each group has a mean (called
a centroid) in the two-dimensional space formed by the two variables. The
MD is the standardized distance between the centroids, taking into
account the correlation between the two variables. If you had three discriminator variables, you could draw the points of the two groups in three-dimensional space, but you would still have two centroids and one
distance between them. The MD can be found, of course, if you have even
more discriminator variables, but unfortunately I can’t draw such a case.
Because T 2 = (n/2)MD2, even a tiny MD can attain statistical significance
with a large enough sample size. That is why it is useful to know MD in
addition to T 2, so you can evaluate whether the groups are separated
enough to be easily discriminable. I will return to this notion when I discuss discriminant analysis.
Figure 22.17  Plot of Two Groups of Students Measured on Two Different Variables, Including the Discriminant Function and the Group Centroids (axes: Grades, IQ; the LV and HV centroids are separated by the Mahalanobis distance along the discriminant function)
The Multigroup Case: MANOVA
The relation between Hotelling’s T 2 and MANOVA is analogous to the relation between the ordinary t test and the univariate ANOVA. Consistent with
this analogy, Hotelling's T 2 statistic cannot be applied when you have
more than two groups, but its principles do apply. A more flexible statistic,
which will work for any number of groups and any number of variables, is
the one known as Wilks' lambda; it is symbolized as Λ (an uppercase Greek
letter, corresponding to the Roman L), and calculated simply as Λ =
SSW/SStotal. This statistic should remind you of η2 (eta squared). In fact, Λ =
1 − η2. Just as the significance of η2 can be tested with an F ratio, so can Λ.
In the simple case of only two groups and P variables, Wilks’ Λ can be tested
in an exact way with the following F ratio:
F = [(1 − Λ)/Λ] [(n1 + n2 − P − 1)/P]
Formula 22.13
The critical F is based on P and n1 + n2 − P − 1 degrees of freedom. The
ratio of 1 − Λ to Λ is equal to SSbet / SSW, and when this ratio is multiplied by
the ratio of df’s as in Formula 22.13, the result is the familiar ratio, MSbet/MSW,
that you know from the one-way ANOVA and gives the same value as Formula
22.12. [In the two-group case, Λ = df/(T2 + df) where df = n1 + n2 − 2.]
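A short sketch of Formula 22.13 and the Λ-T2 relation, using the sums of squares given in Exercise 2 at the end of this section (the variable names are mine):

```python
SSW, SSbet = 200.0, 55.0                       # sums of squares from Exercise 2
n1, n2, P = 20, 25, 4
lam = SSW / (SSW + SSbet)                      # lambda = SSW/SStotal, about .784
F = ((1 - lam) / lam) * (n1 + n2 - P - 1) / P  # Formula 22.13: 2.75 on (4, 40) df
T2 = ((1 - lam) / lam) * (n1 + n2 - 2)         # invert lambda = df/(T2 + df)
print(round(lam, 3), round(F, 2), round(T2, 2))
```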
The problem that you encounter as soon as you have more than two
groups (and more than one discriminator variable) is that more than one
discriminant function can be found. If you insist that the scores from each
of the discriminant functions you find are completely uncorrelated with
those from each and every one of the others (and we always do), there is, fortunately, a limit to the number of discriminant functions you can find for
any given MANOVA problem. The maximum number of discriminant functions, s, cannot be more than P or k − 1 (where k is the number of groups),
whichever is smaller. We can write this as s = min(k − 1, P). Unfortunately,
there is no universal agreement on how to test these discriminant functions
for statistical significance.
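In code, the rule is a one-liner; the function name is mine:

```python
def max_discriminant_functions(k, P):
    # s = min(k - 1, P): the ceiling on mutually uncorrelated functions
    return min(k - 1, P)

print(max_discriminant_functions(4, 6))  # 3, as in Exercise 4a below
```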
Consider the case of three groups and two variables. The first discriminant function that is found is the combination of the two variables that yields
the largest possible F ratio in an ordinary one-way ANOVA. This combination
of variables provides what is called the largest or greatest characteristic root
(gcr). However, it is possible to create a second discriminant function whose
scores are not correlated with the scores from the first function. (It is not possible to create a third function with scores uncorrelated with the first two; we
know this because, for this case, s = 2). Each function corresponds to its own
lowercase lambda (λ), which can be tested for significance.
Having more than one little lambda to test is analogous to having several pairs of means to test in a one-way ANOVA—there is more than one way
to go about it while trying to keep Type I errors down and to maximize
power at the same time. The most common approach is to form an overall
Wilks' Λ by pooling (through multiplication) the little lambdas and then to
test Λ with an F ratio (the F ratio will follow an F distribution only approximately if you have more than three groups and more than two DVs). Pillai’s
trace (sometimes called the Pillai-Bartlett statistic) is another way of pooling
the little lambdas, and it leads to a statistic that appears to be more robust
than Wilk’s Λ with respect to violations of the assumptions of MANOVA (see
the following). Therefore, statisticians tend to prefer Pillai’s trace especially
when sample sizes are small or unequal. A third way to pool the little lambdas, Hotelling’s trace (sometimes called the Hotelling-Lawley trace), is
reported and tested by the common statistical software packages but is
rarely used. All three of the statistics just described lead to similar F ratios in
most cases, and it is not very common to attain significance with one but not
the others. All of these statistics (including the one to be described next) can
be calculated when there are only two groups, but in that case they all lead
to exactly the same F ratio.
A different approach to testing a multigroup MANOVA for significance
is to test only the gcr for significance, usually with Roy’s largest root test.
Unfortunately, it is possible to attain significance with one of the "multiple-root" tests previously described, even though the gcr is not significant. In
such a case, it is quite difficult to pinpoint the source of your multivariate
group differences, which is why some authors of statistical texts (notably,
Harris, 1985) strongly prefer gcr tests. The gcr test is a reasonable alternative when its assumptions are met and when the largest root (corresponding
to the best discriminant function) is considerably larger than any of the
others. But consider the following situation. The three groups are normals,
neurotics, and psychotics, and the two variables are degree of orientation to
reality and inclination to seek psychotherapy. The best discriminant function might consist primarily of the reality variable, with normals and neurotics being similar but very different from psychotics. The second function
might be almost as discriminating as the first, but if it was weighted most
heavily on the psychotherapy variable, it would be discriminating the neurotics from the other two groups. When several discriminant functions are
about equally good, a multiple-root test, like Wilks' lambda or Pillai's trace,
is likely to be more powerful than a gcr test.
The multiple-root test should be followed by a gcr test if you are interested in understanding why your groups differ; as already mentioned, a significant multiple-root test does not guarantee a significant gcr test (Harris,
1985). A significant gcr test, whether or not it is preceded by a multiple-root
test, can be followed by a separate test of the next largest discriminant function, and so on, until you reach a root that is not significant. (Alternatively,
you can recalculate Wilks' Λ without the largest root, test it for significance,
and then drop the second largest and retest until Wilks' Λ is no longer significant.) Each significant discriminant function (if standardized) can be
understood in terms of the weight each of the variables has in that function
(or the discriminant loading of each variable). I’ll say a bit more about this
in the section on discriminant analysis.
Any ANOVA design can be performed as a MANOVA; in fact, factorial
MANOVAs are quite common. A different set of discriminant functions is
found for each main effect, as well as for the interactions. A significant two-
way interaction might be followed by separate one-way MANOVAs (i.e., simple main effects), and a significant main effect in the absence of interaction
might be followed by pairwise comparisons. However, even with a (relatively) simple one-way multigroup MANOVA, follow-up tests can get quite
difficult to interpret when several discriminant functions are significant.
For instance, if you redid the MANOVA for each pair of groups in the
normals/neurotics/psychotics example, you would get a very different discriminant function in each case. The situation can get even more complex
and difficult to interpret as the number of groups and variables increases.
However, the most common way of following up a significant one-way
MANOVA is with separate univariate ANOVAs for each DV. Then any significant univariate ANOVA can be followed up as described in Chapter 13. Of
course, this method of following up a MANOVA is appropriate when you are
not interested in multivariate relations and are simply trying to control Type
I errors. With respect to this last point, bear in mind that following a significant MANOVA with separate tests for each DV involves a danger analogous
to following a significant ANOVA with protected t tests. Just as adding one
control or other kind of group that is radically different from the others (i.e.,
the complete null is not true) destroys the protection afforded by a significant ANOVA, adding one DV that clearly differentiates the groups (e.g., a
manipulation check) can make the MANOVA significant and thus give you
permission to test a series of DVs that may be essentially unaffected by the
independent variable.
All of the assumptions of MANOVA are analogous to assumptions with
which you should already be familiar. First, the DVs should each be normally distributed in the population and together follow a multivariate normal distribution. For instance, if there are only two DVs, they should follow
a bivariate normal distribution as described in Chapter 9 as the basis of the
significance test for linear correlation. It is generally believed that a situation analogous to the central limit theorem for the univariate case applies to
the multivariate case, so violations of multivariate normality are not serious
when the sample size is fairly large. Unfortunately, as in the case of bivariate
outliers, multivariate outliers can distort your results. Multivariate outliers
can be found just as you would in the context of multiple regression (see
Chapter 17, section B).
Second, the DVs should have the same variance in every population
being sampled (i.e., homogeneity of variance), and, in addition, the covariance of any pair of DVs should be the same in every population being sampled. The last part of this assumption is essentially the same as the
requirement, described in Chapter 16, that pairs of RM levels in a mixed
design have the same covariance at each level of the between-groups factor.
In both cases, this assumption can be tested with Box’s M test but is generally not a problem when all of the groups are the same size. It is also
assumed that no pair of DVs exhibits a curvilinear relation. These assumptions are also the basis of the procedure to be discussed next, discriminant
analysis.
Discriminant Analysis
When a MANOVA is performed, the underlying discriminant functions are
tested for significance, but the discriminant functions themselves are often
ignored. Sometimes the standardized weight or the discriminant loading of
each variable is inspected to characterize a discriminant function and better
understand how the groups can be differentiated. Occasionally, it is appropriate to go a step further and use a discriminant function to “predict” an
individual’s group membership; that procedure is called discriminant analysis. Discriminant analysis (DA) is to MANOVA as linear regression and prediction is to merely testing the significance of the linear relation between
two variables. As in the case of MANOVA, DA is much simpler in the two-group case, so that is where I'll begin.
In a typical two-group MANOVA situation you might be comparing
right- and left-handers on a battery of cognitive tasks, especially those tasks
known to be lateralized in the brain, to see if the two groups are significantly
different. You want to see if handedness has an impact on cognitive functioning. The emphasis of discriminant analysis is different. In discriminant
analysis you want to find a set of variables that differentiates the two groups
as well as possible. You might start with a set of variables that seem likely to
differ between the two groups and perform a stepwise discriminant analysis
in which variables that contribute well to the discrimination (based on statistical tests) are retained and those which contribute little are dropped (this
procedure is similar to stepwise regression, which is described in much detail
in Chapter 17). The weights of the resulting standardized (i.e., variables are
converted to z scores) discriminant function (also called a canonical variate
because discriminant analysis is a special case of canonical correlation), if
significant, can be compared to get an idea of which variables are doing the
best job of differentiating the two groups. (The absolute sizes of the weights,
but not their relation to one another, are arbitrary and are usually "normalized"
so that the squared weights sum to 1.0). Unfortunately, highly correlated DVs
can lead to unreliable and misleading relative weights, so an effort is generally made to combine similar variables or delete redundant ones.
Depending on your purpose for performing a discriminant analysis, you
may want to add a final step: classification. This is fairly straightforward in
the two-group case. If you look again at Figure 22.17, you can see that the
result of a discriminant analysis with two groups is to create a new dimension upon which each subject has a score. It is this dimension that tends to
maximize the separation of the two groups while minimizing variability
within groups (eta squared is maximized, which is the same in this case as
R2, where R is both the canonical correlation and the coefficient of multiple
correlation). This dimension can be used for classification by selecting a cutoff score; every subject below the cutoff score is classified as being in one
group, whereas everyone above the cutoff is classified as being in the other
group. The simplest way to choose a cutoff score is to halve the distance
between the two group centroids. In Figure 22.17 this cutoff score results in
two subjects being misclassified. The most common way to evaluate the
success of a classification scheme is to calculate the rate of misclassification.
If the populations represented by your two groups are unequal in size, or
there is a greater cost for one type of classification error than the other (e.g.,
erroneously judging from a battery of tests that a child should be placed in special education may have a different "cost" than erroneously denying special education
to a child who needs it), the optimal cutoff score may not be in the middle.
There are also alternative ways to make classifications, such as measuring the
Mahalanobis distance between each subject and each of the two centroids. But,
you may be wondering, why all this effort to classify subjects when you already
know what group they are in? The first reason is that the rate of correct classification is one way of evaluating the success of your discriminant analysis.
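Here is a minimal numpy sketch of the whole two-group procedure: a pooled-covariance discriminant function (Fisher's classic approach), a cutoff halfway between the centroids, and the resulting misclassification rate. The IQ/grade data are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
# Invented IQ and grade scores for 15 HV and 15 LV students
hv = rng.multivariate_normal([100, 88], [[60, 20], [20, 40]], 15)
lv = rng.multivariate_normal([100, 80], [[60, 20], [20, 40]], 15)

# Discriminant weights from the pooled covariance: w = S^-1 (m1 - m2)
m1, m2 = hv.mean(axis=0), lv.mean(axis=0)
S_pooled = (np.cov(hv, rowvar=False) + np.cov(lv, rowvar=False)) / 2
w = np.linalg.solve(S_pooled, m1 - m2)

# Cutoff halfway between the two centroids on the new dimension
cutoff = w @ (m1 + m2) / 2
scores = np.concatenate([hv, lv]) @ w
predicted_hv = scores > cutoff
actual_hv = np.repeat([True, False], 15)
print("misclassification rate:", np.mean(predicted_hv != actual_hv))
```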
The second reason is analogous to linear regression. The regression
equation is calculated for subjects for whom you know both X and Y scores,
but it can be used to predict the Y scores for new subjects for whom you only
have X scores. Similarly, the discriminant function and cutoff score for your
present data can be used to classify future subjects whose correct group is
not known. For instance, you measure a sample of babies on a battery of
perceptual tests, wait a few years until it is known which children have
learning difficulties, and perform a discriminant analysis. Then new babies
are tested and classified according to the results of the original study. If your
misclassification rate is low, you can confidently recommend remedial
measures for babies classified in the (future) learning disabled group and
perhaps eliminate the disability before the child begins school. Of course,
classification can also be used retroactively, for instance to classify early
hominids as Neanderthal or Cro-Magnon based on various skeletal measures
(assuming there are some specimens you can be relatively sure about).
With two significant discriminant functions and three groups, the centroids of the groups will not fall on one straight line, but they can be located
on a plane formed by the two (orthogonal) discriminant functions. Figure
22.18 depicts the normals/neurotics/psychotics example; each discriminant
function is named after the DV that carries the largest weight on it. Instead
of a cutoff score, classification is made by assigning a region around each
group, such that the regions are mutually exclusive and exhaustive (i.e.,
every subject must land in one, but only one, region). The regions displayed
in Figure 22.18 form what is called a territorial map. Of course, it becomes
impossible to draw the map in two dimensions as you increase the number
of groups and the number of variables, but it is possible to extend the general principle of classification to any number of dimensions. Unfortunately,
having several discriminant functions complicates the procedure for testing
their significance, as discussed under the topic of MANOVA.
Using MANOVA to Test Repeated Measures
There is one more application of MANOVA that is becoming too popular to
be ignored: MANOVA can be used as a replacement for the univariate one-way RM ANOVA. To understand how this is done, it will help to recall the
direct-difference method for the matched t test. By creating difference scores,
a two-sample test is converted to a one-sample test against the null hypothesis that the mean of the difference scores is zero in the population.

Figure 22.18  A Territorial Map of Three Groups of Subjects Measured Along Two Discriminant Functions (axes: Orientation to Reality, Inclination to Seek Psychotherapy, each running from low to high; centroids shown for Normals, Neurotics, and Psychotics)

Now sup-
pose that your RM factor has three levels (e.g., before, during, and after some
treatment). You can create two sets of difference scores, such as before-during (BD) and during-after (DA). (The third difference score, before-after,
would be exactly the same as the sum of the other two—because there are
only two df, there can only be two sets of nonredundant difference scores.)
Even though you now have two dependent variables, you can still perform a
one-sample test to determine whether your difference scores differ significantly from zero. This can be accomplished by performing a one-sample
MANOVA. The MANOVA procedure will “find” the weighted combination of
BD and DA that produces a mean score as far from zero as possible.
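Here is a minimal sketch of the idea; the data are invented, and the T2-to-F conversion anticipates the df discussion in the next paragraph:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 10
before = rng.normal(50, 5, n)
during = before - rng.normal(4, 2, n)  # treatment lowers the scores
after = before - rng.normal(1, 2, n)   # partial rebound afterward

# Two nonredundant difference scores (the third would be their sum)
D = np.column_stack([before - during, during - after])
d = D.shape[1]                     # number of difference scores (= c - 1)

m = D.mean(axis=0)                 # mean difference vector
S = np.cov(D, rowvar=False)        # covariance matrix of the differences
T2 = n * m @ np.linalg.solve(S, m) # one-sample Hotelling's T^2 against zero

F = T2 * (n - d) / (d * (n - 1))   # F on (d, n - d) df; n - d = n - P + 1
print(round(F, 2), stats.f.sf(F, d, n - d))
```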
Finding the best weighted average of the difference scores sounds like
an advantage over the ordinary RM ANOVA, which just deals with ordinary
averages, and it can be—but you pay a price for the “customized” combinations of MANOVA. The price is a considerable loss of degrees of freedom in
the error term. For a one-way RM ANOVA, dferror equals (n − 1)(P − 1), where
n is the number of different subjects (or matched blocks), and P is the number of levels of the RM factor. If you perform the analysis as a one-sample
MANOVA on P − 1 difference scores, dferror drops to n − P + 1 (try a few values for n and P, and you will notice the differences). In fact, you cannot use
the MANOVA approach to RM analysis when the number of subjects is less
than the number of RM levels (i.e., n < P); your error term won’t have any
degrees of freedom. And when n is only slightly greater than P, the power of
the MANOVA approach is usually less than the RM ANOVA.
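Trying a few values, as just suggested (a quick sketch):

```python
# Error df: (n - 1)(P - 1) for the RM ANOVA versus n - P + 1 for MANOVA
for n in (8, 12, 20, 30):
    for P in (3, 4, 6):
        print(n, P, (n - 1) * (P - 1), n - P + 1)
```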
So, why is the MANOVA alternative strongly encouraged by many statisticians and becoming increasingly popular? Because MANOVA does not
take a simple average of the variances of the possible difference scores and
therefore does not assume that these variances are all the same (the sphericity assumption), the MANOVA approach is not vulnerable to the Type I error
inflation that occurs with RM ANOVA when sphericity does not exist in the
population. Of course, there are adjustments you can make to RM ANOVA,
as you learned in Chapter 15, but now that MANOVA is so easy to use on RM
designs (thanks to recent advances in statistical software), it is a reasonable
alternative whenever your sample is fairly large. As I mentioned in Chapter
15, it is not an easy matter to determine which approach has greater power
for fairly large samples and fairly large departures from sphericity. Consequently, it has been suggested that in such situations both procedures be
routinely performed and the better of the two accepted in each case. This is
a reasonable approach with respect to controlling Type I errors only if you
use half of your alpha for each test (usually .025 for each), and you evaluate
the RM ANOVA with the ε adjustment of the df.
Complex MANOVA
The MANOVA approach can be used with designs more complicated than the
one-way RM ANOVA. For instance, in a two-way mixed design, MANOVA can
be used to test the main effect of the RM factor, just as described for the oneway RM ANOVA. In addition, the interaction of the mixed design can be tested
by forming the appropriate difference scores separately for each group of subjects and then using a two- or multigroup (i.e., one-way) MANOVA. A significant one-way MANOVA indicates that the groups differ in their level-to-level
RM differences, which demonstrates a group by RM interaction. The MANOVA
approach can also be extended to factorial RM ANOVAs (as described at the
end of Section A in this chapter) and designs that are called doubly multivariate. The latter design is one in which a set of DVs is measured at several points
in time or under several different conditions within the same subjects.
C
SUMMARY
1. In the simplest form of multivariate analysis of variance (MANOVA)
there are two independent groups of subjects measured on two dependent variables. The MANOVA procedure finds the weighted combination
of the two DVs that yields the largest possible t value for comparing the
two groups. The weighted combination of (two or more) DVs is called
the discriminant function, and the t value that is based on it, when
squared, is called Hotelling’s T2.
2. T2 can be converted into an F ratio for significance testing; the larger the
number of DVs that contributed to T2, the more the F ratio is reduced
before testing. The T2 value is the product of (half) the sample size and the
square of an effect size measure analogous to g (the Mahalanobis distance), which
measures the multivariate separation between the two groups.
3. When there are more than two independent groups, the T2 statistic is usually replaced by Wilks’ Λ, the ratio of error variability (i.e., SSW) to total
variability (you generally want Λ to be small). However, when there are
more than two groups and more than one DV, there is more than one discriminant function that can be found (the maximum number is one less
than the number of groups or the number of DVs, whichever is smaller)
and therefore more than one lambda to calculate. The most common way
to test a MANOVA for significance is to pool the lambdas from all the possible discriminant functions and test the pooled Wilks’ Λ with an approximate F ratio (Pillai’s trace is a way of combining the lambdas that is more
robust when the assumptions of MANOVA are violated).
4. When more than one discriminant function can be found,
the first one calculated is the one that produces the largest F ratio; this
one is called the greatest characteristic root (gcr). An alternative to testing
the pooled lambdas is to test only the gcr (usually with Roy’s test). The
gcr test has an advantage when one of the discriminant functions is
much larger than all of the others. If, in addition to finding out whether
the groups differ significantly, you want to explore and interpret the discriminant functions, you can follow a significant gcr test by testing successively smaller discriminant functions until you come to one that is not
significant. Alternatively, you can follow a significant test of the pooled
lambdas by dropping the largest discriminant function, retesting, and
continuing the process until the pooled lambda is no longer significant.
5. The most common use of MANOVA is to control Type I errors when testing several DVs in the same experiment; a significant MANOVA is then
followed by univariate tests of each DV. However, if the DVs are virtually
uncorrelated, or one of the DVs very obviously differs among the
groups, it may be more legitimate (and powerful) to skip the MANOVA
test and test all of the DVs separately, using the Bonferroni adjustment.
Another use for MANOVA is to find combinations of DVs that discriminate the groups more efficiently than any one DV. In this case you want
to avoid using DVs that are highly correlated because this will lead to
unreliable discriminant functions.
6. If you want to use a set of DVs to predict which of several groups a particular subject is likely to belong to, you want to use a procedure called
discriminant analysis (DA). DA finds discriminant functions, as in
MANOVA, and then uses these functions to create territorial maps:
regions in the space of the discriminant functions, each of which captures one group as well as possible. With only two groups, a cutoff score on a single discriminant function can be used to classify subjects as likely to belong to
one group or the other (e.g., high school dropouts or graduates). To the
extent that the groups tend to differ on the discriminant function, the
rate of misclassification will be low, and the DA will be considered suc-
cessful. DA can also be used as a theoretical tool to understand how
groups differ in complex ways involving several DVs simultaneously.
7. It may not be efficient or helpful to use all of the DVs you have available
for a particular discriminant analysis. There are procedures, such as
stepwise discriminant analysis, that help you systematically to find the
subset of your DVs that does the best job of discriminating among your
groups. These stepwise procedures are similar to the procedures for
stepwise multiple regression.
8. The MANOVA procedure can be used as a substitute for the one-way RM
ANOVA by forming difference scores (between pairs of RM levels) and
then finding the weighted combination of these difference scores that
best discriminates them from zero (the usual expectation under the null
hypothesis). Because MANOVA does not require the sphericity assumption, the df does not have to be conservatively adjusted. However, in the
process of “customizing” the combination of difference scores,
MANOVA has fewer degrees of freedom available than the corresponding RM ANOVA [n − P + 1 for MANOVA, but (n − 1)(P − 1) for RM
ANOVA]. MANOVA cannot be used in place of RM ANOVA when there
are fewer subjects than treatment levels, and MANOVA is not recommended when the number of subjects is only slightly larger than the
number of treatments. However, when the sample size is relatively large,
MANOVA is likely to have more power than RM ANOVA, especially if the
sphericity assumption does not seem to apply to your data.
EXERCISES
1. In a two-group experiment, three dependent
variables were combined to give a maximum
t value of 3.8.
a. What is the value of T 2?
b. Assuming both groups contain 12 subjects
each, test T 2 for significance.
c. Find the Mahalanobis distance between
these two groups.
d. Recalculate parts b and c if the sizes of the
two groups are 10 and 20.
2. In a two-group experiment, four dependent
variables were combined to maximize the
separation of the groups. SSbet = 55 and SSW =
200.
a. What is the value of Λ?
b. Assuming one group contains 20 subjects
and the other 25 subjects, test Λ for significance.
c. What is the value of T 2?
d. Find the Mahalanobis distance between
these two groups.
3. Nine men and nine women are tested on two
different variables. In each case, the t test
falls short of significance; t = 1.9 for the first
DV, and 1.8 for the second. The correlation
between the two DVs over all subjects is zero.
a. What is the value of T 2?
b. Find the Mahalanobis distance between
these two groups.
c. What is the value of Wilks’ Λ?
d. Test T 2 for significance. Explain the
advantage of using two variables rather
than one to discriminate the two groups
of subjects.
4. What is the maximum number of (orthogonal) discriminant functions that can be
found when
a. There are four groups and six dependent
variables?
b. There are three groups and eight dependent variables?
c. There are seven groups and five dependent variables?
5. Suppose you have planned an experiment in
which each of your 12 subjects is measured
under six different conditions.
a. What is the df for the error term if you
perform a one-way RM ANOVA on your
data?
b. What is the df for the error term if you
perform a MANOVA on your data?
6. Suppose you have planned an experiment in
which each of your 20 subjects is measured
under four different conditions.
a. What is the df for the error term if you
perform a one-way RM ANOVA on your
data?
b. What is the df for the error term if you
perform a MANOVA on your data?
KEY
FORMULAS
The SS components for the interaction effects of the three-way ANOVA with
independent groups:
a. SSA × B = SSAB − SSA − SSB
b. SSA × C = SSAC − SSA − SSC
c. SSB × C = SSBC − SSB − SSC
d. SSA × B × C = SSABC − SSA × B − SSB × C − SSA × C − SSA − SSB − SSC
Formula 22.1
The df components for the three-way ANOVA with independent groups:
a. dfA = a − 1
b. dfB = b − 1
c. dfC = c − 1
d. dfA × B = (a − 1)(b − 1)
e. dfA × C = (a − 1)(c − 1)
f. dfB × C = (b − 1)(c − 1)
g. dfA × B × C = (a − 1)(b − 1)(c − 1)
h. dfW = abc(n − 1)
Formula 22.2
The SS for the between-groups error term of the three-way ANOVA with one
RM factor:
SSW = SSbet-S − SSAB
Formula 22.3
The within-subjects portion of the total sums of squares in a three-way
ANOVA with one RM factor:
SSW-S = SStotal − SSbet-S
Formula 22.4
The SS for the within-subjects error term of the three-way ANOVA with one
RM factor:
SSS × R = SSW-S − SSR − SSA × R − SSB × R − SSA × B × R
Formula 22.5
The df components for the three-way ANOVA with one RM factor:
a. dfA = a − 1
b. dfB = b − 1
c. dfA × B = (a − 1)(b − 1)
d. dfR = c − 1
e. dfA × R = (a − 1)(c − 1)
f. dfB × R = (b − 1)(c − 1)
g. dfA × B × R = (a − 1)(b − 1)(c − 1)
h. dfW = ab(n − 1)
i. dfS × R = dfW × dfR = ab(n − 1)(c − 1)
Formula 22.6
The F ratios for the three-way ANOVA with one RM factor:
a. FA = MSA/MSW
b. FB = MSB/MSW
c. FA × B = MSA × B/MSW
d. FR = MSR/MSS × R
e. FA × R = MSA × R/MSS × R
f. FB × R = MSB × R/MSS × R
g. FA × B × R = MSA × B × R/MSS × R
Formula 22.7
The SS for the between-groups error term of the three-way ANOVA with two
RM factors:
SSW = SSS − SSA
Formula 22.8
The SS components for the within-subjects error terms of the three-way
ANOVA with two RM factors:
a. SSQ × S = SSQS − SSQ − SSS − SSA × Q
b. SSR × S = SSRS − SSR − SSS − SSA × R
c. SSQ × R × S = SStotal − SSAQR − SSW − SSQ × S − SSR × S
Formula 22.9
The df components for the three-way ANOVA with two RM factors:
a. dfA = a − 1
b. dfQ = q − 1
c. dfR = r − 1
d. dfA × Q = (a − 1)(q − 1)
e. dfA × R = (a − 1)(r − 1)
f. dfQ × R = (q − 1)(r − 1)
g. dfA × Q × R = (a − 1)(q − 1)(r − 1)
h. dfW = a(n − 1)
i. dfQ × S = dfQ × dfW = a(q − 1)(n − 1)
j. dfR × S = dfR × dfW = a(r − 1)(n − 1)
k. dfQ × R × S = dfQ × dfR × dfW = a(q − 1)(r − 1)(n − 1)
Formula 22.10
The F ratios for the three-way ANOVA with two RM factors:
a. FA = MSA/MSW
b. FQ = MSQ/MSQ × S
c. FR = MSR/MSR × S
d. FA × Q = MSA × Q/MSQ × S
e. FA × R = MSA × R/MSR × S
f. FQ × R = MSQ × R/MSQ × R × S
g. FA × Q × R = MSA × Q × R/MSQ × R × S
Formula 22.11
The F ratio for testing the significance of T2 calculated for P dependent variables and two independent groups:
F = T2 [(n1 + n2 − P − 1) / (P(n1 + n2 − 2))]
Formula 22.12
The F ratio for testing the significance of Wilks’ lambda calculated for P
dependent variables and two independent groups:
F = [(1 − Λ)/Λ] [(n1 + n2 − P − 1)/P]
Formula 22.13
REFERENCES
Banaji, M. R., & Hardin, C. D. (1996). Automatic stereotyping. Psychological
Science, 7, 136–141.
Bruder, G. E., Stewart, J. W., Mercier, M. A., Agosti, V., Leite, P., Donovan, S.,
& Quitkin, F. M. (1997). Outcome of cognitive-behavioral therapy for
depression: Relation to hemispheric dominance for verbal processing.
Journal of Abnormal Psychology, 106, 138–144.
Cole, D. A., Maxwell, S. E., Arvey, R., & Salas, E. (1994). How the power of
MANOVA can both increase and decrease as a function of the intercorrelations among the dependent variables. Psychological Bulletin, 115,
465–474.
Harris, R. J. (1985). A primer of multivariate statistics (2nd ed.). Orlando,
Florida: Academic Press.
Hays, W. L. (1994). Statistics (5th ed.). New York: Harcourt Brace College
Publishing.
Huynh, H., & Mandeville, G. K. (1979). Validity conditions in repeated measures designs. Psychological Bulletin, 86, 964–973.
Marlowe, C. M., Schneider, S. L., & Nelson, C. E. (1996). Gender and attractiveness biases in hiring decisions: Are more experienced managers less
biased? Journal of Applied Psychology, 81, 11–21.
Appendix • Answers to the Odd-Numbered Exercises
CHAPTER 22
Section A

1. a & b) Femotion = 74.37/14.3 = 5.2, p < .01, η2 = .122
Frelax = 64.4/14.3 = 4.5, p < .05, η2 = .039
Fdark = 31.6/14.3 = 2.21, n.s., η2 = .019
Femo × rel = 55.77/14.3 = 3.9, p < .05, η2 = .095
Femo × dark = 17.17/14.3 = 1.2, n.s., η2 = .031
Frel × dark = 127.3/14.3 = 8.9, p < .01, η2 = .074
Femo × rel × dark = 25.73/14.3 = 1.8, n.s., η2 = .046
Assuming that a moderate effect size is about .06 (or 6%), the main effect of emotion is more than moderate, as are the two-way interactions of emotion × relaxation and relaxation × dark.

3. a)
Source                          SS     df      MS        F        p
Drug                          496.8     3    165.6     60.65    <.001
Therapy                        32.28    2     16.14     5.91    <.01
Depression                     36.55    1     36.55    13.4     <.01
Drug × Therapy                384.15    6     64.03    23.45    <.001
Therapy × Depression           31.89    2     15.95     5.84    <.05
Drug × Depression              20.26    3      6.75     2.47    n.s.
Drug × Therapy × Depress       10.2     6      1.7       .62    n.s.
Within-groups                 131      48      2.73

b) Although there are some small differences between the two graphs, indicating that the three-way interaction is not zero, the two graphs are quite similar. This similarity suggests that the three-way interaction is not large, and is probably not significant. This observation is consistent with the F ratio being less than 1.0 for the three-way interaction in this example.
c) You could begin by exploring the large drug by therapy interaction, perhaps by looking at the simple effect of therapy for each drug. Then you could explore the therapy by depression interaction, perhaps by looking at the simple effect of depression for each type of therapy.
d) L = [(11.5 − 8.7) − (11 − 14)] − [(19 − 14.5) − (12 − 10)] = [2.8 − (−3)] − [4.5 − 2] = 5.8 − 2.5 = 3.3; SScontrast = nL2/Σc2 = 3(3.3)2/8 = 32.67/8 = 4.08375; Fcontrast = 4.08375/2.73 = 1.5 (not significant, but better than the overall three-way interaction).

5. a) Fdiet = 201.55/29.57 = 6.82, p < .05
Ftime = 105.6/11.61 = 9.1, p < .01
Fdiet × time = 8.67/7.67 = 1.13, n.s.
b) Conservative F.05 (1, 5) = 6.61; given the usual .05 criterion, none of the three conclusions will be affected (the main effect of time is no longer significant at the .01 level, but it is still significant at the .05 level).

Section B

1. a) Fstyle = 25.2/3.5 = 7.2, p < .01
Fspeaker = 12.9/3.5 = 3.69, n.s.
Fissue = 3.53/2.2 = 1.6, n.s.
Fstyle × speaker = 10.5/3.5 = 3.0, n.s.
Fstyle × issue = 12.1/2.2 = 5.5, p < .01
Fspeaker × issue = 1.767/2.2 = .80, n.s.
Fstyle × speaker × issue = 2.417/2.2 = 1.1, n.s.
b) For Fissue and Fspeaker × issue, conservative F.05 (1, 54) = 4.01; for Fstyle × issue and Fstyle × speaker × issue, conservative F.05 (2, 54) = 4.01. None of the conclusions involving RM factors will be affected.

3. a)
Source                    SS     df     MS        F       p
Between-Subjects
Size                     21.1     1    21.1      7.29    <.05
System                   61.6     2    30.8     10.65    <.01
Size × System             1.75    2      .88      .30    >.05
Within-groups            52.1    18     2.89
Within-Subjects
Year                      4.36    3     1.46     2.89    <.05
Size × Year               4.70    3     1.57     3.11    <.05
System × Year             6.17    6     1.03     2.04    >.05
Size × System × Year      9.83    6     1.64     3.26    <.01
Subject × Year           27.19   54      .50

b) You can see that the line for the new system is generally the highest (if you are plotting by year), the line for the old system is lowest, and the combination is in between, producing a main effect of system. The lines are generally higher for the large school, producing a main effect of size. However, the ordering of the systems is the same regardless of size, so there is very little size by system interaction. Ratings generally go up over the years, producing a main effect of year. However, the ratings are aberrantly high for the first year in the large school, producing a size by year, as well as a three-way interaction. One partial interaction would result from averaging the new and combined system and comparing to the old system across the other intact factors.
c) Given the significant three-way interaction, it would be reasonable to look at simple interaction effects—perhaps, the system by year interaction for each school size. This two-way interaction would likely be significant only for the large school, and would then be followed by testing the simple main effects of year for each system. To be cautious about sphericity, you can use an error term based only on the conditions included in that follow-up test. There are other legitimate possibilities for exploring simple effects, as well.
d) For Fyear and Fsize × year, conservative F.05 (1, 18) = 4.41; for Fsystem × year and Fsize × system × year, conservative F.05 (2, 18) = 3.55. All of the conclusions involving RM factors will be affected by not assuming that sphericity holds, as none of these tests are significant at the .05 level once df's are adjusted by lower-bound epsilon. It is recommended that conclusions be determined after adjusting df's with an exact epsilon calculated by statistical software.

5. a)
Source                         SS     df     MS        F       p
Between-subjects
Background                      .93    1      .93      .22    n.s.
Within-group                  42.85   10     4.29
Within-subjects
Affect                        13.72    2     6.86     7.04    <.01
Background × Affect           19.02    2     9.51     9.76    <.01
Subject × Affect              19.48   20      .97
Image                        131.06    2    65.53    51.21    <.001
Background × Image              .24    2      .12      .09    n.s.
Subject × Image               25.59   20     1.28
Affect × Image                18.39    4     4.60     4.71    <.01
Back × Affect × Image          2.32    4      .58      .59    n.s.
Subject × Affect × Image      39.07   40      .98

b) The conservative F.05 (1, 10) = 4.96 for all of the F's involving an RM factor (i.e., all F's except the main effect of background music). The F for the affect by image interaction is no longer significant with a conservative adjustment to df; a more exact adjustment of df is recommended in this case. None of the other conclusions are affected (except that the main effect of affect and its interaction with background music are significant at the .05, instead of the .01 level, after the conservative adjustment).
c) If you plot affect on the X axis, you can see a large main effect of image, because the three image lines are clearly and consistently separate. You can see some affect by image interaction in that the lines are not parallel, especially due to the medium line. There is a noticeable background by affect interaction, because for the happy music condition, recall of happy words is higher, whereas sad word recall is higher during the sad background music. The main effect of affect is not obvious with affect plotted on the horizontal axis. The medium/low (or high/low) by sad/neutral by background contrast appears to be one of the largest of the possible 2 × 2 × 2 interaction contrasts.
d) The three-way interaction is not significant, so the focus shifts to the two significant two-way interactions: affect by image and affect by background. Averaging across the imageability levels, one could look at the simple effects of affect for each type of background music; averaging across the background levels, one could look at the simple effects of affect for each level of imageability. Significant simple effects can then be followed by appropriate pairwise comparisons. There are other legitimate possibilities, as well.

Section C

1. a) T2 = 3.8² = 14.44
b) F = 14.44 × (24 − 3 − 1)/[3(22)] = 4.376 > F.05 (3, 20) = 3.1, so T2 is significant at the .05 level.
c) MD2 = T2/(n/2) = 14.44/6 = 2.407; MD = 1.55
d) F = 14.44 × (30 − 3 − 1)/[3(28)] = 4.47; harmonic mean of 10 and 20 = 13.33, so MD2 = 14.44/(13.33/2) = 2.167; MD = 1.47

3. a) R2 (the sum of the two rpb2's) = .184 + .168 = .352; T2 = 16 × [.352/(1 − .352)] = 8.69
b) MD2 = 8.69/4.5 = 1.93; MD = 1.39
c) Λ = 16/(8.69 + 16) = .648
d) F = (15/32) × 8.69 = 4.07 > F.05 (2, 15) = 3.68, so T2 is significant at the .05 level. As in multiple regression with uncorrelated predictors, each DV captures a different part of the variance between the two groups; together the two DV's account for much more variance than either one alone.

5. a) df = (6 − 1)(12 − 1) = 5 × 11 = 55
b) df = 12 − 6 + 1 = 7