Explaining Psychological Statistics (2nd Ed.) by Barry H. Cohen
Chapter 12, Section D
Adjusting the One-way ANOVA for Heterogeneity of Variance
In section C of this chapter, I mentioned the possibility of adjusting the one-way ANOVA when
the sample sizes are not all equal and the sample variances are so different that a test of homogeneity of
variance (HOV) reaches statistical significance (some conservative statisticians recommend using an
alpha of .1 or even .25 for the HOV test, especially when dealing with small samples). In particular, I
mentioned an adjustment devised by Welch (1951). Later in this section I will discuss that particular
procedure, but before I do, I will describe a simpler alternative published by Brown and Forsythe
(1974).
The denominator of the ordinary one-way ANOVA, MSW, is simply a weighted average of all of
the sample variances (a simple average if all of the samples are the same size). This pooling of sample
variances, a straightforward extension of the pooled-variance in a t test, is justified only if you can
assume that all of the populations underlying your study have the same amount of variance. If it turns
out that your larger samples come from populations with smaller variances than do your smaller samples
(and vice versa), even when the null hypothesis is true (i.e., all of the population means are actually
equal to each other), your chance of rejecting the null (and therefore committing a Type I error) could be
considerably higher than alpha if you are using the ordinary one-way ANOVA. If the smaller samples
have the smaller variances, the ordinary ANOVA has less power than a properly adjusted version.
Finally, if all of the samples are the same size, even large differences in population variances have little
effect on the Type I error rate of the ordinary ANOVA, so heterogeneity of variance is almost always
ignored in this case.
It is true that extremely large differences in population variances will affect an ANOVA even
when all the n's are equal, but if you have evidence that the population variances are very different, why
would you even bother testing the hypothesis that all of the population means are the same? First, you
should think about why the variances might be so different. Depending on how you answer this
question, it may be appropriate to transform your data, trim the extreme scores from your data (and then
use special procedures for robust statistics, as described by Wilcox, 2001), or abandon ANOVA
completely (especially if your samples are fairly small), in favor of a nonparametric version, based on
rank-ordering your data (see Chapter 21, section C). For the most part, psychological researchers tend
to use one of the methods just mentioned if the underlying distributions seem to be quite skewed or
otherwise very different from normal, and use the ordinary ANOVA if the distributions seem fairly
normal, even if there seem to be rather large differences in variance. Therefore, the two adjusted
ANOVA procedures described below are rarely used or reported in the psychological literature. The
fact that neither procedure is consistently more accurate than the other, and that there are no simple rules
that suggest which procedure to use for each possible pattern of sample sizes and variances, further
contributes to their lack of popularity, even though both alternatives (Welch and Brown-Forsythe) are
now available as options in SPSS for Windows. However, rather than avoiding these procedures for lack
of understanding, you can read the material below carefully, make a more informed decision about their
use, and deepen your understanding of ANOVA in the process.
Brown-Forsythe Formula
The difference between the adjusted F ratio devised by Brown and Forsythe (1974) and the
ordinary F is quite analogous to the difference between the separate-variance (s-v) and pooled-variance
(p-v) t tests. For instance, in both cases it is only the denominator (i.e., error term) of the formula that
changes. In fact, the error term of the Brown-Forsythe F (I'll refer to it as F′) is quite similar (though not
identical) to what you would get from dividing each sample variance by its corresponding sample size
rather than pooling the variances. The formula for F′ consists of placing the usual value for MSbet over
the following error term (MSW′):
MSW′ = [Σ (1 − ni/NT) si²] / dfbet          Formula 12.21
where the summation goes from 1 to k, which is the number of groups (i.e., levels of the IV). When all
of the variances are equal, the top part of the formula reduces to Σ (1 − ni/NT) s² = (k − 1) s², which is
why it must be divided by k − 1 (i.e., dfbet). A simple example involving three groups will help to
demonstrate the difference between MSW and MSW′.
Suppose we are comparing three patient groups on a psychiatric ward, and the sizes of these
groups are 10, 15, and 25. Further suppose that the variances of the three groups are 3, 6, and 8,
respectively. The ordinary MSW would equal (9*3 + 14*6 + 24*8) / 47 = 303 / 47 = 6.45. However,
MSW′ will be different:
MSW′ = [(1 − 10/50)*3 + (1 − 15/50)*6 + (1 − 25/50)*8] / 2 = 10.6 / 2 = 5.3
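To make the contrast concrete, here is a short sketch (not from the text; the function names are my own) that reproduces MSW and the Brown-Forsythe error term of Formula 12.21 for this example:

```python
# A sketch comparing the ordinary pooled error term with the
# Brown-Forsythe error term of Formula 12.21.

def ms_within(ns, variances):
    """Ordinary MS_W: the df-weighted average of the sample variances."""
    dfs = [n - 1 for n in ns]
    return sum(df * v for df, v in zip(dfs, variances)) / sum(dfs)

def ms_within_bf(ns, variances):
    """Brown-Forsythe error term: sum of (1 - n_i/N_T) * s_i^2, over df_bet."""
    n_total, k = sum(ns), len(ns)
    return sum((1 - n / n_total) * v for n, v in zip(ns, variances)) / (k - 1)

ns = [10, 15, 25]
print(round(ms_within(ns, [3, 6, 8]), 2))     # 6.45
print(round(ms_within_bf(ns, [3, 6, 8]), 2))  # 5.3  (larger groups, larger variances)
print(round(ms_within_bf(ns, [8, 6, 3]), 2))  # 6.05 (variances reversed)
```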
In this example, as the group gets larger, so does its variance; this pattern always results in MSW′
being smaller than MSW, which means that F′ will be larger and more likely to attain significance than
the usual F. Consequently, the usual F is conservative in this case, and whereas it usually has less
power than F′, the conservative statistician will not object to the use of the ordinary ANOVA in this
situation. Let's reverse the variances of the largest and smallest groups, and see what happens. The
ordinary MSW is reduced to (9*8 + 14*6 + 24*3) / 47 = 228 / 47 = 4.85, but MSW′ increases:
MSW′ = [(1 − 10/50)*8 + (1 − 15/50)*6 + (1 − 25/50)*3] / 2 = 12.1 / 2 = 6.05
Now, MSW′ is larger than MSW, so F′ is smaller than F. When the larger groups have the smaller
variances, using the usual ANOVA has more power, but can result in a higher Type I error rate than the
alpha you are using to make your statistical decisions. This possibility is unacceptable to the
conservative researcher. However, if you want to use F′ you have to deal with the fact that even when
the null hypothesis is true, F′ does not follow an F distribution with the usual df's. Fortunately, F′ has
been found to follow what is called a "quasi-F" distribution, which means that its distribution looks
similar to an F distribution, but the df for the error term of the F′ distribution is not the usual dfW. If this
problem sounds familiar, that's because it is closely related to the Behrens-Fisher problem that you read
about in Chapter 7, section C. The most popular solution to that problem is called the Welch-Satterthwaite
(W-S) formula, which is used to adjust the df of a separate-variances t test. I did not
present that formula in Chapter 7, but I will here, because the df adjustment for F′ is a logical extension
of the W-S formula, and the F′ version is too complex to grasp by looking at it.
Welch-Satterthwaite Degrees of Freedom
When there are only two groups in the analysis, MSW′ reduces to the following sum:
[1 − (n1/NT)] s1² + [1 − (n2/NT)] s2². With a little algebra, this expression can be rearranged like this:
(n2/NT) s1² + (n1/NT) s2². Because the harmonic mean of the two sample sizes, nh, is equal to
2n1n2 / NT, MSW′ can be transformed to the following form in the two-group case:

MSW′ = (nh/2) (s1²/n1 + s2²/n2)
The relation to the s-v t test should be obvious; when there are only two groups, F′ equals the
square of the s-v t value. Moreover, the df associated with the denominator of F′, in the two-group case,
are the same as you would get from the W-S formula, which I present next. To make the formula easier
to read, it is usually expressed in terms of weighting factors, such that w1 = s1²/n1 and w2 = s2²/n2.

dfW-S = (w1 + w2)² / [w1²/(n1 − 1) + w2²/(n2 − 1)]          Formula 12.22
When the two samples are equal in size, the p-v t is the same as the s-v t, and F′ is the same as F.
However, as long as the sample variances differ, dfW-S will be less than dfW, even when the n's are
equal. I have never seen any researcher use dfW-S when the n's are equal, but Formula 12.22 reduces to a
simple form, which can be instructive, in this special case. For n1 = n2, the formula for dfW-S reduces to:

dfW-S = (n − 1) (s1² + s2²)² / (s1⁴ + s2⁴)
In the above formula, you can see that the term n − 1 is being multiplied by a correction factor
(note that s⁴ is the same as squaring the variance). What is less obvious is that the maximum value of
this correction factor is 2. The maximum occurs when the variances are equal. Suppose that both
variances equal 5. The correction factor would then be: (5 + 5)² / (5² + 5²) = 10² / (25 + 25) = 100 / 50 =
2, so dfW-S = 2 (n − 1). This is the ordinary (i.e., uncorrected) df for a t test with equal n's. Now let us
suppose that the two variances still sum to 10, but are much more disparate, say 1 and 9; the correction
factor becomes: (1 + 9)² / (1² + 9²) = 10² / (1 + 81) = 100 / 82 = 1.22. As the variances continue to
diverge, the correction factor approaches a minimum of 1, and dfW-S approaches n − 1. You may recall
from Chapter 7 that in the case of two unequal groups, the minimum for dfW-S is the smaller of n1 − 1 and
n2 − 1, while the maximum is n1 + n2 − 2. From the above example you can see that even when the n's
are equal the error df can be minimally or maximally corrected, depending on the divergence of the
variances.
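The behavior of this correction factor can be verified with a quick sketch (my own code, not from the text), using Formula 12.22 with wi = si²/ni:

```python
# A sketch of the Welch-Satterthwaite df (Formula 12.22).
# Arguments are sample sizes and sample variances (s^2), not SDs.

def df_welch_satterthwaite(n1, s2_1, n2, s2_2):
    w1, w2 = s2_1 / n1, s2_2 / n2
    return (w1 + w2) ** 2 / (w1 ** 2 / (n1 - 1) + w2 ** 2 / (n2 - 1))

# With equal n's, the correction factor runs from 2 (equal variances) down to 1.
print(round(df_welch_satterthwaite(10, 5, 10, 5), 1))  # 18.0, i.e., 2 * (n - 1)
print(round(df_welch_satterthwaite(10, 1, 10, 9), 1))  # 11.0, i.e., about 1.22 * (n - 1)
```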
For the same reason that the above formula reduces dfW-S more and more as the variances
increasingly diverge, dfW-S gets smaller in the general case (n's not all equal) as the w's in Formula
12.22 grow further apart. To show what pattern of sample sizes and variances results in a greater
discrepancy between the two w's, I'll return to the example of the three patient groups. In the first part
of the example, when the larger groups have the larger variances, the w's would come out to: .3, .4, and
.32. In the second part of the example, the variances of the smallest and largest groups are reversed,
leading to w's of: .8, .4, .12. Notice how much more divergent the w's are when the largest group has
the smallest variance (and vice versa). It should be clear that in the two-group case this pattern not only
produces a larger error term, making the s-v t smaller than the p-v t, it also yields a greater reduction in
df, thus raising the critical value, and further reducing power (but also curbing a possible inflation of the
Type I error rate, which is the point of the s-v test).
Brown-Forsythe Degrees of Freedom
As long as all of the samples are the same size, the error df for F′ (i.e., dfW′) reduces to a simple
formula no matter how many groups are involved.

dfW′ = (n − 1) (Σ si²)² / (Σ si⁴)          Formula 12.23

(Note that both summations go from 1 to k.) As you may have guessed, dfW′ ranges from a maximum of
k (n − 1), when all the sample variances are equal, down to a minimum of n − 1, as the variances
maximally diverge. The denominator of F′ also reduces to a (very) simple formula when all of the
groups are the same size:
MSW′ = Σ (1 − n/NT) si² / dfbet = (1 − n/NT) Σ si² / dfbet
Because 1 - (n / NT) happens to equal 1 - (1 / k), which also equals (k - 1) / k, and dfbet = k - 1, the
formula can be further simplified.
MSW′ = [(k − 1)/k] Σ si² / (k − 1) = Σ si² / k
As you can see, when all of the n's are equal, MSW′ is identical to the ordinary MSW no matter
how many groups are involved, and, because F′ involves no adjustment of MSbet, the Brown-Forsythe F
is always the same as the ordinary F when the groups are all the same size. As in the case of two
equal-sized groups, the df can still be adjusted when more than two groups are all the same size (the
greater the discrepancies among the sample variances, the more severe is the df correction), but because
the one-way ANOVA is quite robust with respect to the HOV assumption when all n's are equal, the df
adjustment is very rarely used in this case. Whereas the Brown-Forsythe F makes no adjustment when
the n's are equal, and never adjusts MSbet in any case, the same is not true of Welch's formula for F,
which I will label F*. To help you understand the correction that F* makes to MSbet, I will compare the
usual weighted-means solution to the unweighted-means solution for the one-way ANOVA. [I glossed
over the latter procedure in section B of this chapter, because it is so rarely used in practice. The
unweighted-means approach used to be popular for two-way ANOVA (see Chapter 14, section C), but
has been almost entirely replaced by the regression approach (see Chapter 18, section A).]
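As an aside, the equal-n results above are easy to check numerically. The following sketch (my own code, not part of the text) computes dfW′ from Formula 12.23 and shows that MSW′ collapses to the simple mean of the sample variances when the n's are equal:

```python
# A sketch of the equal-n reductions: Formula 12.23 for df_W', and the fact
# that the Brown-Forsythe error term (Formula 12.21) equals the plain mean
# of the variances when all groups are the same size.

def df_bf_equal_n(n, variances):
    """df_W' = (n - 1) * (sum s_i^2)^2 / (sum s_i^4), for equal n's."""
    return (n - 1) * sum(variances) ** 2 / sum(v ** 2 for v in variances)

def ms_within_bf(ns, variances):
    """MS_W' = sum((1 - n_i/N_T) * s_i^2) / (k - 1), Formula 12.21."""
    n_total, k = sum(ns), len(ns)
    return sum((1 - n / n_total) * v for n, v in zip(ns, variances)) / (k - 1)

print(round(ms_within_bf([10, 10, 10], [2, 5, 11]), 2))  # 6.0, the mean of 2, 5, 11
print(round(df_bf_equal_n(10, [5, 5, 5]), 1))            # 27.0, i.e., k * (n - 1)
print(round(df_bf_equal_n(10, [1, 1, 98]), 1))           # 9.4, approaching n - 1
```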
The Analysis of Unweighted Means for One-Way ANOVA
The usual formula for MSbet [Σ ni (Mi − MG)² / (k − 1)] weights the squared difference of each
group mean from the grand mean by the size of that group, and therefore forms the basis of what is
called "the weighted-means" ANOVA. Let's apply this formula to the patient-group example with n's of
10, 15, and 25; suppose the means of these groups are 7, 9, and 17, respectively. Then, the grand mean is:
(10*7 + 15*9 + 25*17) / NT = (70 + 135 + 425) / 50 = 630 / 50 = 12.6. So, MSbet is [10*(−5.6)² +
15*(−3.6)² + 25*(4.4)²] / 2 = (313.6 + 194.4 + 484) / 2 = 992 / 2 = 496. If the means of the largest and
smallest groups were reversed, the grand mean would be reduced to 9.6, and MSbet would become:
[10*(7.4)² + 15*(−.6)² + 25*(−2.6)²] / 2 = (547.6 + 5.4 + 169) / 2 = 722 / 2 = 361. Notice that when the
most deviant mean (i.e., 17) is associated with the largest group (n = 25), MSbet is considerably larger
than when the most deviant mean is associated with the smallest group (496 vs. 361, respectively).
The unweighted-means formula for MSbet (you can think of it as the "equally-weighted" formula)
is identical to the formula for equal n's, except that the harmonic mean of the sample sizes (nh) replaces
"n": unweighted MSbet = nh s², where s² is the unbiased variance of the group means. Using Formula
13.15, nh for 10, 15, and 25 is 14.52, and the unbiased variance of 7, 9, and 17 is 28, so unweighted
MSbet = 14.52 * 28 = 406.6. Note that this value for MSbet is between the two more extreme values for
the weighted-means solution, and does not depend at all on the association between means and sample
sizes. The unweighted-means analysis seems to make sense when the differences in sample sizes are
accidental, so that none of the samples actually represents a larger population, but as I mentioned earlier
in this chapter, it is rarely used, which is why it is not included in major statistical packages, like SPSS.
Contributing to the lack of popularity of the method of unweighted means is the fact that the resulting F
ratio may be slightly biased in the positive direction, increasing the Type I error rate above the alpha that
is used to look up the critical F. So, why am I mentioning this method at all? Because understanding the
difference between the weighted and unweighted approaches to MSbet can help you grasp an important
difference between the Welch F (F*) and F′. The numerators of both F* and F′ are based on a
weighted-means solution, but F* uses weights that reflect not only sample sizes, but sample variances, as
well. I will call the numerator of the Welch formula Wnum, because it is not directly comparable to
MSbet. However, dividing Wnum by the denominator of the Welch formula does yield F*, which follows
a quasi-F distribution that is similar but not identical to the distribution of F′.
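The weighted- and unweighted-means values computed above can be reproduced with this sketch (my own code; the harmonic mean is computed directly rather than via Formula 13.15):

```python
# A sketch contrasting the weighted-means MS_bet with the unweighted
# (harmonic-mean) MS_bet for the three patient groups.

def ms_bet_weighted(ns, means):
    grand = sum(n * m for n, m in zip(ns, means)) / sum(ns)
    return sum(n * (m - grand) ** 2 for n, m in zip(ns, means)) / (len(ns) - 1)

def ms_bet_unweighted(ns, means):
    k = len(ns)
    nh = k / sum(1 / n for n in ns)                 # harmonic mean of the n's
    mean_of_means = sum(means) / k
    var_means = sum((m - mean_of_means) ** 2 for m in means) / (k - 1)  # unbiased
    return nh * var_means

ns, means = [10, 15, 25], [7, 9, 17]
print(round(ms_bet_weighted(ns, means), 1))        # 496.0
print(round(ms_bet_weighted(ns, means[::-1]), 1))  # 361.0 (means reversed)
print(round(ms_bet_unweighted(ns, means), 1))      # 406.5, between the two
```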
Welch Formula
The formula for Wnum is similar to the weighted-means formula for MSbet, but with different
weights. Instead of using only the sample sizes as weights, Wnum uses the ratio of sample size to sample
variance for each group. If wi is defined as in Formula 12.22 (i.e., wi = si²/ni), then Wnum can be
written as:

Wnum = Σ (1/wi) (X̄i − X̄WG)² / (k − 1)          Formula 12.24

[Note that due to font difficulties, I will be using the symbol M for mean in the text, and X-bar in the
formulas.]
It makes sense to use the reciprocal of wi in the above formula, because the weights are being
applied to the numerator rather than the error term or dferror. Also, note that MWG is not the usual grand
mean, which can be found by weighting the various group means by their sample sizes, but rather a
"Welch" grand mean that is found by using the reciprocals of the wi's as the weighting factors, as follows:
X̄WG = Σ (1/wi) X̄i / Σ (1/wi)
The wi's have already been calculated for our patient-group example, so let us see how Wnum is
affected by the pattern of means, variances, and sample sizes. In the first part of the example, the larger
groups have the larger variances (and means), and the wi's came out to: .3, .4, and .32. Therefore, MWG
equals: (3.333*7 + 2.5*9 + 3.125*17) / (3.333 + 2.5 + 3.125) = 98.956 / 8.958 = 11.05 (this is very close to
a simple average of 7, 9, and 17, because in this case the larger n's are being divided by larger s²'s,
helping to cancel out the larger weights that would be given to the means of the larger groups when
finding the ordinary grand mean). For this part of the example, Wnum is equal to:

Wnum = [3.333 (7 − 11.05)² + 2.5 (9 − 11.05)² + 3.125 (17 − 11.05)²] / (3 − 1) = (54.67 + 10.51 + 110.63) / 2 = 87.91
In the second part of the example, the variances of the smallest and largest groups are reversed
(but not the means), so the 1/wi's are: 1.25, 2.5, 8.33, and MWG is 14.31 (now, the large group has the
small variance, and is therefore having a large effect on the Welch grand mean). In this case, Wnum is
equal to [1.25*(−7.31)² + 2.5*(−5.31)² + 8.33*(2.69)²] / 2 = (66.795 + 70.49 + 60.277) / 2 = 197.56 / 2 =
98.78. You cannot compare this value directly to MSbet, but you can compare 98.78 to 87.91; when the
larger groups have the smaller variances, the weighting factors are more discrepant (and, on balance,
larger), so the pattern of means can have a greater effect. For instance, if we reverse the largest and
smallest means in this latest example, so that the smallest group not only has the largest variance but the
largest mean as well, the Welch grand mean is 8.5, and Wnum is only 54.84 (in this case, the most
discrepant mean, 17, is getting the smallest weight, 1.25). In comparison, when the larger samples have
the larger variances, the weighting factors are less discrepant, minimizing the effect of discrepant means.
For instance, in the first part of our example (when the weights are 3.333, 2.5, and 3.125), reversing the
means has little effect on the Welch grand mean, which goes from 11.05 to 11.28. Moreover, the value
of Wnum, which was 87.91 (see above) before the reversal of means, increases only slightly (to 89.65)
upon reversing the means (you might want to calculate this for yourself, as an exercise).
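All of the Wnum values just computed can be reproduced (within rounding) by a short sketch of Formula 12.24 and the Welch grand mean (my own code, not from the text):

```python
# A sketch of the Welch numerator (Formula 12.24): group means weighted by
# n_i / s_i^2 (i.e., 1/w_i), measured around the Welch grand mean.

def welch_numerator(ns, variances, means):
    k = len(ns)
    inv_w = [n / v for n, v in zip(ns, variances)]   # 1/w_i = n_i / s_i^2
    grand_welch = sum(iw * m for iw, m in zip(inv_w, means)) / sum(inv_w)
    return sum(iw * (m - grand_welch) ** 2
               for iw, m in zip(inv_w, means)) / (k - 1)

ns, means = [10, 15, 25], [7, 9, 17]
print(round(welch_numerator(ns, [3, 6, 8], means), 1))       # 87.9
print(round(welch_numerator(ns, [8, 6, 3], means), 1))       # 98.8
print(round(welch_numerator(ns, [8, 6, 3], [17, 9, 7]), 1))  # 54.8
```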
The denominator of the Welch formula is weighted in a manner similar to MSW′ (the
denominator gets larger when the smaller groups are associated with larger variances), so F*, like F′,
tends to be conservative in this case, and more powerful when it is the larger groups that have the larger
variances (the adjustment of dfW also tends to be similar between the Welch and Brown-Forsythe
solutions). However, as we have just seen, the Welch formula can be seriously affected by whether the
most discrepant means are associated with, for instance, large groups that have small variances, or small
groups with large variances. The association of means and variances can even have an effect when all
the n's are equal, so unlike F′, F* is not usually equal to the ordinary F when all of the samples are the
same size. Therefore, some statisticians recommend the use of F* even when the n's are equal, if the
variances are quite discrepant, but given the reputation of the ordinary F's robustness with respect to
heterogeneity of variance when the n's are equal, this suggestion has little chance of being widely
adopted anytime soon.
Which Method Should I Use: Brown-Forsythe or Welch?
Assuming that we are dealing only with fairly normal-like distributions, a few generalizations
about these modified ANOVAs can be made. Although both F′ and F* will be predictably smaller than
the usual F when larger samples consistently have smaller variances, the value of F* in this situation can
depend a good deal on whether the larger samples have the larger or the smaller means. If the small but
highly variable groups have particularly large means (this could be due to outliers), then F* tends to be
more corrected (i.e., smaller) than F′. If the small-n, high-s² groups have the smaller means, then F*
tends to be larger than F′ (but still less than the usual F). On the other hand, when the larger groups have
the larger variances, both F′ and F* will be larger than the usual F, and though F* can still be affected by
whether or not the larger groups have the larger means, it is less susceptible to this influence, and tends
to have consistently more power than F′. It is permissible to use the ordinary F in this circumstance,
because it is on the conservative side, but you may want to opt for F* to gain extra power.
As for the adjustment in degrees of freedom, you can be sure that the dfW associated with either
F′ (dfW′) or F* (dfW*) will never be larger than the ordinary dfW, and that the more the ratios of s² to n
differ from sample to sample, the greater will be the reduction in dfW. However, dfW′ seems to be more
sensitive to the association between n's and variances than dfW*. Both of these df's will undergo a
greater reduction when the larger groups have the smaller rather than larger variances, but the correction
tends to be more drastic for dfW′ in this case. In the reverse situation (larger n with larger s²), the df
correction is milder for both tests, but in this case dfW′ is likely to be larger than dfW*. Thus, dfW*
seems to be more stable than dfW′. When the larger groups have the smaller variances, the ordinary F
tends to be too liberal, but unless the discrepancies in the sample data are quite dramatic, the uncorrected
F is often used to test the one-way ANOVA, anyway. In this case, F′ will be predictably smaller than F,
but F* can easily have greater power than F′, or even F, depending on the pattern of the means.
Does the above discussion help you to decide when to use each method? Not in general. Bear in
mind that to keep things simple, I used three-group examples in which the middle-sized group had the
middle-sized variance. However, if the middle-sized group were to have by far the largest (or smallest)
variance of the three groups, it would not be easy to predict how F′ and F* would compare to the
ordinary F. With more than three groups, the association of sample sizes with sample variances (not to
mention sample means!) could grow even more complex. Unless there is a strong positive or negative
correlation between the sizes of the groups and their variances, there is nothing simple about the
relationships among F, F′, and F*. Tomarken and Serlin (1986) found that F* seems to have greater
power than F′ for most combinations of means, variances, and sample sizes, but Clinch and Keselman
(1982) found that F′ seems to maintain better control over the Type I error rate than does F*, when the
underlying distributions are skewed. About the worst problem you can have involves distributions that
are skewed in different directions, but in that situation it doesn't appear to make sense to simply test the
null hypothesis that all of the population means are equal. You would first need to make sense of just
what is going on with your data.
Furthermore, the two ANOVA-adjustment methods just described are not the only alternatives
when homogeneity of variance is not a reasonable assumption. For instance, Wilcox (1988)
recommends a test developed by James (1951) instead, especially when there are more than four
groups in the analysis. Unfortunately, James's "second-order" method is quite complex, and not yet
available from major statistical packages. About the only clear recommendation I can make is that if
your samples exhibit a consistent pattern in which the larger samples have the smaller variances (and
vice versa), the cautious and relatively simple thing to do (if you are using SPSS or a similarly
comprehensive stats package) is to use either F′ or F* (tested according to their adjusted df's, of course),
instead of the usual ANOVA. F* is usually the more recommended of the two procedures, but you
might want to consider the association you expect between your sample sizes and your sample means,
before choosing the Welch procedure.
There is one more point I wish to emphasize. Simulation studies have shown that the various
alternatives for F do not diverge dramatically until the variance of one group is at least several times the
variance of another. However, if your samples exhibit extreme differences in variance, it simply does
not seem sensible to test the null hypothesis that the population means are all equal. Obviously,
whatever it is that distinguishes your groups is having some effect on your data, and it would seem
incumbent upon you to explore your data further in an attempt to understand just why the variances
diverge so widely, before testing any difference in the means.
Effect-size Estimates in One-way ANOVA
Estimating the Proportion of Population Variance Accounted for
As I pointed out in section C of this chapter, eta-squared, which gives you the proportion of
variance in your DV that is accounted for by your IV in your data, is biased; it is an overestimate of
omega-squared (ω²), the variance that would be accounted for if your study involved the entire
population from which you are sampling. Eta-squared, as expressed in Formula 12.12 (i.e., SSbet /
SStotal), can be modified to produce a much less biased estimate of ω², as in Formula 12.14. However, if
you are reading a journal article that provides an F ratio and its associated df's, but no indication of
effect size, it would be much more convenient to calculate eta-squared with Formula 12.13 (reproduced
below) than with Formula 12.12.
η² = dfbet F / (dfbet F + dfW)          Formula 12.13
Of course, you get the same (biased) estimate of omega-squared from both of the formulas for
eta-squared, but whereas I showed you the bias correction for Formula 12.12 (see Formula 12.14), I did
not show the equivalent correction as applied to Formula 12.13. However, I think it would be
instructive to present the "corrected" version of Formula 12.13 here, as Formula 12.25.
est. ω² = dfbet (F − 1) / [dfbet (F − 1) + NT]          Formula 12.25
Although both formulas always give exactly the same answer, one fact that becomes obvious in
Formula 12.25, but not in Formula 12.14, is that this estimate is not defined if F is less than 1.0 (ω²
cannot be negative). The estimate for ω² is zero when F equals 1.0, and by convention it is set to zero
for any F below 1.0. Usually ω² is only estimated when F is statistically significant, or was expected to
be significant (or, perhaps, if one wants to make a point about how small it is), but very rarely estimated
when F is near 1.0 (recall that an F of 1.0 is telling you that the variability of your sample means is just
about what you would expect purely from sampling error, without the contribution of any experimental
effect). Note that when there are only two groups, dfbet equals 1, and F equals t², so Formula 12.25
reduces to the following:
est. ω² = (t² − 1) / [t² − 1 + (dfW + 2)] = (t² − 1) / (t² + dfW + 1)
which is identical to Formula 10.15 (the unbiased estimate of omega-squared associated with a
two-group t test). To illustrate the use of Formula 12.25, I will use the example summarized in Table
12.3 in section B. However, suppose that you don't have access to the full information in Table 12.3;
you have simply seen this phrase in a journal article: "...the difference in means approached statistical
significance, F (2, 12) = 3.4, p < .07...", and you want to obtain an (almost) unbiased estimate of
omega-squared. You know immediately that dfbet = 2 and dfW = 12, so dftotal = 14 and NT therefore
equals 15.
Inserting these values into Formula 12.25, we obtain the following estimate of omega-squared:
est. ω² = 2 (3.4 − 1) / [2 (3.4 − 1) + 15] = 4.8 / 19.8 = .242
Had I kept three digits to the right of the decimal point when illustrating the use of Formula 12.14 in
section C, the value above is exactly what I would have obtained, although I was not using the F value in
section C, but rather the appropriate SS components from Table 12.3.
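As a sketch (my own code), Formula 12.25 is easy to package as a function for reading effect sizes out of journal articles that report only F and its df's:

```python
# A sketch of Formula 12.25: estimating omega-squared from a reported F ratio.

def omega_sq_from_f(f, df_bet, df_w):
    n_total = df_bet + df_w + 1        # because df_total = N_T - 1
    if f <= 1.0:
        return 0.0                     # by convention, set to zero for F <= 1
    return df_bet * (f - 1) / (df_bet * (f - 1) + n_total)

# The journal-article example: F(2, 12) = 3.4
print(round(omega_sq_from_f(3.4, 2, 12), 3))  # 0.242
```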
Multiple Regression Approach
Formula 12.25 gives you exactly the same value (except for possible rounding errors) as Formula
12.14, but there is another interesting formula for adjusting eta-squared, which generally gives you a
slightly different estimate. To understand this alternative formula, it would help if you have already
read pages 528, 535, 536, 566, and 567 in the text. Otherwise, it might be better to skip this section until
you have covered some basics of multiple regression. Just as a t test for two independent groups can
always be performed by first calculating the corresponding point-biserial r, and then using a t value to
test it for significance (see p. 295), a one-way ANOVA can be performed by creating k − 1 "dummy"
predictors, using multiple regression to find the R² for predicting the DV from those predictors, and then
using an F ratio to test R² for significance. The connection here is that the R² you would get from the
multiple regression is exactly the same as eta-squared, and like any R² it is an overestimate of the
population variance accounted for (i.e., ω²). In ordinary multiple regression situations it is routine to
correct the bias of R² by calculating what is called an adjusted R² (a typical formula for that purpose
would be the square of Formula 17.14). Because the number of predictors (P) equals k − 1 when
multiple regression is used to perform an ANOVA, I will square and then modify Formula 17.14 to
represent eta-squared accordingly.
adj. η² = η² − (k − 1)(1 − η²) / (NT − k)
In terms of the usual df components of a one-way ANOVA, this formula can be expressed as:
adj. η² = η² − (dfbet/dfW)(1 − η²)
This form of the correction formula is useful in that you can see two factors that affect how much
is subtracted from the original eta-squared. First, you can see that for a given total N, there is less
adjustment if you have a few relatively large groups as opposed to many smaller groups (you get yet
another chance to underestimate the error term, and thereby inflate eta-squared, each time you calculate
a sample mean and then the variance around it). Second, it is obvious that the correction gets smaller as
eta-squared gets larger; there is simply less room for error as the DV becomes increasingly more
predictable from knowing which group a score comes from. I will apply the formula above to the data
from Table 12.3; note that η² in that example equals 161.97 / 448.00 = .36154.
adj. η² = .36154 − (2/12)(1 − .36154) = .36154 − (.63846/6) = .36154 − .10641 = .255
Adjusted η² is an almost unbiased estimate of ω², but it is not identical to the estimate I got from
Formula 12.25 (the latter was .242, but the former is .255). This is not a rounding error; the two
formulas are not algebraically equivalent, and will usually yield values that are different, but only
slightly different. Formula 12.25 represents the more traditional approach for estimating ω² in the
context of ANOVA, but both estimates are considered reasonable. Although I don't expect you to use it
for any practical purpose, I will present another formula for adjusted η², which is algebraically
equivalent to the one above.
adj. η² = η² (1 − 1/F)          Formula 12.26
I like the conceptual simplicity of this formula. Notice that η² is being adjusted by being
multiplied by a correction factor that depends only on the F ratio for testing the ANOVA. In the above
formula, you can see a property that the adjusted η² has in common with the estimate of omega-squared
as expressed by Formula 12.25: the estimate is zero when F equals 1, and the adjustment is not
valid for F less than 1. You can also see that as F gets larger, the correction diminishes, with the
correction factor eventually heading for its maximum value of 1.0 (i.e., no adjustment). It doesn't matter
if F is getting larger due to a larger effect, or just larger sample sizes; larger F's indicate that the effect in
your samples is a more accurate reflection of the effect in the population. In case the formula looks just
too simple to work, let's put in the numbers from the example I have been using all along:
adj. η² = .36154 (1 − 1/3.4) = .36154 (.706) = .255
As I mentioned above, the estimate created by Formula 12.25 is preferred to this one when reporting an
ANOVA, but I will return to Formula 12.26 in the context of multiple regression, where it is equivalent
to the usual adjustment of R2.
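A final sketch (my own code, not from the text) confirms that the df form of adjusted η² and Formula 12.26 agree for the Table 12.3 example:

```python
# A sketch of the two equivalent forms of adjusted eta-squared.

def adj_eta_sq(eta_sq, df_bet, df_w):
    """Adjusted eta-squared via the df form."""
    return eta_sq - (df_bet / df_w) * (1 - eta_sq)

def adj_eta_sq_from_f(eta_sq, f):
    """Adjusted eta-squared via Formula 12.26."""
    return eta_sq * (1 - 1 / f)

eta_sq = 161.97 / 448.00                 # .36154, from Table 12.3
print(round(adj_eta_sq(eta_sq, 2, 12), 3))       # 0.255
print(round(adj_eta_sq_from_f(eta_sq, 3.4), 3))  # 0.255 (F reported as 3.4)
```

The two printed values match only to three decimals here because the reported F of 3.4 is itself rounded; with the exact F implied by η², the forms are identical.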
References
Brown, M. B., & Forsythe, A. B. (1974). The ANOVA and multiple comparisons for data with
heterogeneous variances. Biometrics, 30, 719–724.
Clinch, J. J., & Keselman, H. J. (1982). Parametric alternatives to the analysis of variance.
Journal of Educational Statistics, 7, 207–214.
Cohen, J. (1969). Statistical power analysis for the behavioral sciences. New York: Academic
Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ:
Lawrence Erlbaum.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. New York: Academic
Press.
James, G. S. (1951). The comparison of several groups of observations when the ratios of the
population variances are unknown. Biometrika, 38, 324–329.
Tomarken, A. J., & Serlin, R. C. (1986). Comparisons of ANOVA alternatives under variance
heterogeneity and specific noncentrality structures. Psychological Bulletin, 99, 90–99.
Welch, B. L. (1951). On the comparison of several mean values: An alternative approach.
Biometrika, 38, 330–336.
Wilcox, R. R. (1988). A new alternative to the ANOVA F and new results on James's
second-order method. British Journal of Mathematical and Statistical Psychology, 41, 109–117.
Wilcox, R. R. (2001). Fundamentals of modern statistical methods: Substantially improving
power and accuracy. New York: Springer-Verlag.