Uploaded by Joseph Akinkunmi

Introduction to Analysis of Variance in Biostatistics

CHAPTER 7
Introduction to Analysis of Variance
We now proceed to a study of the analysis of variance. This method, developed by R. A. Fisher, is fundamental to much of the application of statistics in biology and especially to experimental design. One use of the analysis of variance is to test whether two or more sample means have been obtained from populations with the same parametric mean. Where only two samples are involved, the t test can also be used. However, the analysis of variance is a more general test, which permits testing two samples as well as many, and we are therefore introducing it at this early stage in order to equip you with this powerful weapon for your statistical arsenal. We shall discuss the t test for two samples as a special case in Section 8.4.
In Section 7.1 we shall approach the subject on familiar ground, the sampling experiment of the housefly wing lengths. From these samples we shall obtain two independent estimates of the population variance. We digress in Section 7.2 to introduce yet another continuous distribution, the F distribution, needed for the significance test in analysis of variance. Section 7.3 is another digression; here we show how the F distribution can be used to test whether two samples may reasonably have been drawn from populations with the same variance. We are now ready for Section 7.4, in which we examine the effects of subjecting the samples to different treatments. In Section 7.5, we describe the partitioning of sums of squares and of degrees of freedom, the actual analysis of variance. The last two sections (7.6 and 7.7) take up in a more formal way the two scientific models for which the analysis of variance is appropriate, the so-called fixed treatment effects model (Model I) and the variance component model (Model II).
Except for Section 7.3, the entire chapter is largely theoretical. We shall postpone the practical details of computation to Chapter 8. However, a thorough understanding of the material in Chapter 7 is necessary for working out actual examples of analysis of variance in Chapter 8.
One final comment. We shall use J. W. Tukey's acronym "anova" interchangeably with "analysis of variance" throughout the text.
7.1 The variances of samples and their means
We shall approach analysis of variance through the familiar sampling experiment of housefly wing lengths (Experiment 5.1 and Table 5.1), in which we combined seven samples of 5 wing lengths to form samples of 35. We have reproduced one such sample in Table 7.1. The seven samples of 5, here called groups, are listed vertically in the upper half of the table.
Before we proceed to explain Table 7.1 further, we must become familiar with added terminology and symbolism for dealing with this kind of problem. We call our samples groups; they are sometimes called classes or are known by yet other terms we shall learn later. In any analysis of variance we shall have two or more such samples or groups, and we shall use the symbol a for the number of groups. Thus, in the present example a = 7. Each group or sample is based on n items, as before; in Table 7.1, n = 5. The total number of items in the table is a times n, which in this case equals 7 × 5 or 35.
The sums of the items in the respective groups are shown in the row underneath the horizontal dividing line. In an anova, summation signs can no longer be as simple as heretofore. We can sum either the items of one group only or the items of the entire table. We therefore have to use superscripts with the summation symbol. In line with our policy of using the simplest possible notation, whenever this is not likely to lead to misunderstanding, we shall use $\sum^{n} Y$ to indicate the sum of the items of a group and $\sum^{an} Y$ to indicate the sum of all the items in the table. The sum of the items of each group is shown in the first row under the horizontal line. The mean of each group, symbolized by $\bar{Y}$, is in the next row and is computed simply as $\sum^{n} Y/n$. The remaining two rows in that portion of Table 7.1 list $\sum^{n} Y^2$ and $\sum^{n} y^2$, separately for each group. These are the familiar quantities, the sum of the squared Y's and the sum of squares of Y.
[Table 7.1. Housefly wing lengths for seven groups (a = 7) of five items each (n = 5), with the group sums, means, $\sum^{n} Y^2$, and $\sum^{n} y^2$, and the computation of the sum of squares of means. Table body not recoverable.]

From the sum of squares for each group we can obtain an estimate of the population variance of housefly wing length. Thus, in the first group $\sum^{n} y^2 = 29.2$. Therefore, our estimate of the population variance is

$$s^2 = \frac{\sum^{n} y^2}{n - 1} = \frac{29.2}{4} = 7.3$$

a rather low estimate compared with those obtained in the other samples. Since we have a sum of squares for each group, we could obtain an estimate of the population variance from each of these. However, it stands to reason that we would get a better estimate if we averaged these separate variance estimates in some way. This is done by computing the weighted average of the variances by Expression (3.2) in Section 3.1. Actually, in this instance a simple average would suffice, since all estimates of the variance are based on samples of the same size. However, we prefer to give the general formula, which works for this case as well as for instances of unequal sample sizes, where the weighted average is necessary. In this case each sample variance $s_i^2$ is weighted by its degrees of freedom, $w_i = n_i - 1$, resulting in a sum of squares ($\sum y_i^2$), since $(n_i - 1)s_i^2 = \sum y_i^2$. Thus, the numerator of Expression (3.2) is the sum of the sums of squares. The denominator is $\sum^{a}(n_i - 1) = 7 \times 4$, the sum of the degrees of freedom of each group. The average variance, therefore, is

$$s^2 = \frac{29.2 + 12.0 + 75.2 + 45.2 + 98.8 + 81.2 + 107.2}{28} = \frac{448.8}{28} = 16.029$$

This quantity is an estimate of 15.21, the parametric variance of housefly wing lengths. This estimate, based on 7 independent estimates of variances of groups, is called the average variance within groups or simply variance within groups. Note that we use the expression within groups, although in previous chapters we used the term variance of groups. The reason we do this is that the variance estimates used for computing the average variance have so far all come from sums of squares measuring the variation within one column. As we shall see in what follows, one can also compute variances among groups, cutting across group boundaries.
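The weighted average described above can be sketched in a few lines of Python. This is only an illustration: the seven sums of squares are the ones quoted in the text, while the function name `pooled_within_variance` is made up for this example.

```python
# Sums of squares of the seven groups, as quoted in the text for Table 7.1.
group_ss = [29.2, 12.0, 75.2, 45.2, 98.8, 81.2, 107.2]
n = 5  # items per group


def pooled_within_variance(sums_of_squares, group_sizes):
    """Weighted average variance: total sum of squares over total degrees of freedom."""
    total_ss = sum(sums_of_squares)
    total_df = sum(size - 1 for size in group_sizes)
    return total_ss / total_df


s2_within = pooled_within_variance(group_ss, [n] * len(group_ss))
print(round(s2_within, 3))  # 16.029
```

Because each variance is weighted by its own degrees of freedom, the same function would also serve for unequal group sizes.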
To obtain a second estimate of the population variance, we treat the seven group means $\bar{Y}$ as though they were a sample of seven observations. The resulting statistics are shown in the lower right part of Table 7.1, headed "Computation of sum of squares of means." There are seven means in this example; in the general case there will be a means. We first compute $\sum^{a} \bar{Y}$, the sum of the means. Note that this is rather sloppy symbolism. To be entirely proper, we should identify this quantity as $\sum_{i=1}^{i=a} \bar{Y}_i$, summing the means of group 1 through group a. The next quantity computed is $\bar{\bar{Y}}$, the grand mean of the group means, computed as $\bar{\bar{Y}} = \sum^{a} \bar{Y}/a$. The sum of the seven means is $\sum^{a} \bar{Y} = 317.4$, and the grand mean is $\bar{\bar{Y}} = 45.34$, a fairly close approximation to the parametric mean μ = 45.5. The sum of squares represents the deviations of the group means from the grand mean, $\sum^{a} (\bar{Y} - \bar{\bar{Y}})^2$. For this we first need the quantity $\sum^{a} \bar{Y}^2$, which equals 14,417.24. The customary computational formula for sum of squares applied to these means is $\sum^{a} \bar{Y}^2 - [(\sum^{a} \bar{Y})^2/a] = 25.417$. From the sum of squares of the means we obtain a variance among the means in the conventional way as follows: $\sum^{a} (\bar{Y} - \bar{\bar{Y}})^2/(a - 1)$. We divide by a − 1 rather than n − 1 because the sum of squares was based on a items (means). Thus, variance of the means $s_{\bar{Y}}^2 = 25.417/6 = 4.2362$. We learned in Chapter 6, Expression (6.1), that when we randomly sample from a single population,

$$\sigma_{\bar{Y}}^2 = \frac{\sigma^2}{n}$$

and hence

$$\sigma^2 = n\sigma_{\bar{Y}}^2$$

Thus, we can estimate a variance of items by multiplying the variance of means by the sample size on which the means are based (assuming we have sampled at random from a common population). When we do this for our present example, we obtain $s^2 = 5 \times 4.2362 = 21.181$. This is a second estimate of the parametric variance 15.21. It is not as close to the true value as the previous estimate based on the average variance within groups, but this is to be expected, since it is based on only 7 "observations." We need a name describing this variance to distinguish it from the variance of means from which it has been computed, as well as from the variance within groups with which it will be compared. We shall call it the variance among groups; it is n times the variance of means and is an independent estimate of the parametric variance $\sigma^2$ of the housefly wing lengths. It may not be clear at this stage why the two estimates of $\sigma^2$ that we have obtained, the variance within groups and the variance among groups, are independent. We ask you to take on faith that they are.
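As a rough check on the arithmetic of this second estimate, the following Python sketch repeats the computation from the two summary quantities given in the text (the sum of the means and the sum of the squared means). The variable names are illustrative only.

```python
a, n = 7, 5                 # number of groups, items per group
sum_means = 317.4           # sum of the seven group means (from the text)
sum_sq_means = 14417.24     # sum of the squared group means (from the text)

# Computational formula for the sum of squares of the means.
ss_means = sum_sq_means - sum_means ** 2 / a
var_means = ss_means / (a - 1)   # variance of the means
var_among = n * var_means        # variance among groups, an estimate of sigma^2

print(round(ss_means, 3), round(var_means, 4), round(var_among, 2))
```

Carrying full precision gives a variance among groups of about 21.18; the text's 21.181 comes from multiplying the rounded value 4.2362 by 5.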
Let us review what we have done so far by expressing it in a more formal way. Table 7.2 represents a generalized table for data such as the samples of housefly wing lengths. Each individual wing length is represented by Y, subscripted to indicate the position of the quantity in the data table. The wing length of the jth fly from the ith sample or group is given by $Y_{ij}$. Thus, you will notice that the first subscript changes with each column representing a group in the table, and the second subscript changes with each row representing an individual item.

TABLE 7.2
Data arranged for simple analysis of variance, single classification, completely randomized.

                                Groups
Items     1       2       3      ...     i      ...     a
1        Y_11    Y_21    Y_31    ...    Y_i1    ...    Y_a1
2        Y_12    Y_22    Y_32    ...    Y_i2    ...    Y_a2
3        Y_13    Y_23    Y_33    ...    Y_i3    ...    Y_a3
...
j        Y_1j    Y_2j    Y_3j    ...    Y_ij    ...    Y_aj
...
n        Y_1n    Y_2n    Y_3n    ...    Y_in    ...    Y_an
Sums     ΣY_1    ΣY_2    ΣY_3    ...    ΣY_i    ...    ΣY_a
Means    Ȳ_1     Ȳ_2     Ȳ_3     ...    Ȳ_i     ...    Ȳ_a

Using this notation, we can compute the variance of sample 1 as

$$\frac{1}{n - 1}\sum_{j=1}^{j=n}(Y_{1j} - \bar{Y}_1)^2$$

The variance within groups, which is the average variance of the samples, is computed as

$$\frac{1}{a(n - 1)}\sum_{i=1}^{i=a}\sum_{j=1}^{j=n}(Y_{ij} - \bar{Y}_i)^2$$

Note the double summation. It means that we start with the first group, setting i = 1 (i being the index of the outer Σ). We sum the squared deviations of all items from the mean of the first group, changing index j of the inner Σ from 1 to n in the process. We then return to the outer summation, set i = 2, and sum the squared deviations for group 2 from j = 1 to j = n. This process is continued until i, the index of the outer Σ, is set to a. In other words, we sum all the squared deviations within one group first and add this sum to similar sums from all the other groups.
The variance among groups is computed as

$$\frac{n}{a - 1}\sum_{i=1}^{i=a}(\bar{Y}_i - \bar{\bar{Y}})^2$$

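The two formulas can be sketched as one small Python function for the equal-sample-size layout of Table 7.2. This is a minimal sketch, not the book's computational procedure (that is deferred to Chapter 8); the function name `anova_variances` and the toy data are invented for illustration.

```python
def anova_variances(groups):
    """Return (variance within groups, variance among groups).

    `groups` is a list of a groups, each a list of n items (equal n assumed).
    """
    a = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    means = [sum(g) / n for g in groups]
    grand_mean = sum(means) / a
    # Double summation: the inner sum runs over items j, the outer over groups i.
    ss_within = sum(sum((y - m) ** 2 for y in g) for g, m in zip(groups, means))
    ss_means = sum((m - grand_mean) ** 2 for m in means)
    return ss_within / (a * (n - 1)), n * ss_means / (a - 1)


# Hypothetical toy data: a = 3 groups of n = 3 items.
toy = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]
within, among = anova_variances(toy)
print(within, among)  # 1.0 21.0
```

In the toy data each group has a sum of squares of 2, so the variance within groups is 6/6 = 1, while the group means 2, 3, and 7 give a variance among groups of 3 × 14/2 = 21.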
Now that we have two independent estimates of the population variance, what shall we do with them? We might wish to find out whether they do in fact estimate the same parameter. To test this hypothesis, we need a statistical test that will evaluate the probability that the two sample variances are from the same population. Such a test employs the F distribution, which is taken up next.
7.2 The F distribution
Let us devise yet another sampling experiment. This is quite a tedious one without the use of computers, so we will not ask you to carry it out. Assume that you are sampling at random from a normally distributed population, such as the housefly wing lengths with mean μ and variance $\sigma^2$. The sampling procedure consists of first sampling $n_1$ items and calculating their variance $s_1^2$, followed by sampling $n_2$ items and calculating their variance $s_2^2$. Sample sizes $n_1$ and $n_2$ may or may not be equal to each other, but are fixed for any one sampling experiment. Thus, for example, we might always sample 8 wing lengths for the first sample ($n_1$) and 6 wing lengths for the second sample ($n_2$). After each pair of values ($s_1^2$ and $s_2^2$) has been obtained, we calculate

$$F_s = \frac{s_1^2}{s_2^2}$$

This will be a ratio near 1, because these variances are estimates of the same quantity. Its actual value will depend on the relative magnitudes of variances $s_1^2$ and $s_2^2$. If we repeatedly take samples of sizes $n_1$ and $n_2$ from the population and calculate the ratios $F_s$ of their variances, the average of these ratios will in fact approach the quantity $(n_2 - 1)/(n_2 - 3)$, which is close to 1.0 when $n_2$ is large.
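With a computer the sampling experiment is no longer tedious. The sketch below is one possible way to run it in Python, assuming a normal population with the housefly parameters μ = 45.5 and σ² = 15.21 (so σ = 3.9). The sample sizes 8 and 16, the number of replicates, and the seed are arbitrary choices for illustration; a larger second sample is used here because the ratio has a heavy right tail when n2 is small, which makes the average settle down slowly.

```python
import random


def sample_variance(values):
    """Unbiased sample variance: sum of squares divided by n - 1."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def average_variance_ratio(n1, n2, replicates, mu=45.5, sigma=3.9, seed=1):
    """Average F_s = s1^2 / s2^2 over repeated pairs of normal samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    for _ in range(replicates):
        s1_sq = sample_variance([rng.gauss(mu, sigma) for _ in range(n1)])
        s2_sq = sample_variance([rng.gauss(mu, sigma) for _ in range(n2)])
        total += s1_sq / s2_sq
    return total / replicates


n1, n2 = 8, 16  # hypothetical sample sizes
simulated = average_variance_ratio(n1, n2, replicates=20000)
expected = (n2 - 1) / (n2 - 3)
print(round(simulated, 3), round(expected, 3))
```

The simulated average should land close to (n2 − 1)/(n2 − 3) = 15/13, slightly above 1, rather than at exactly 1.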
The distribution of this statistic is called the F distribution, in honor of R. A. Fisher. This is another distribution described by a complicated mathematical function that need not concern us here. Unlike the t and $\chi^2$ distributions, the shape of the F distribution is determined by two values for degrees of freedom, $\nu_1$ and $\nu_2$ (corresponding to the degrees of freedom of the variance in the numerator and the variance in the denominator, respectively). Thus, for every possible combination of values $\nu_1$, $\nu_2$, each ν ranging from 1 to infinity, there exists a separate F distribution. Remember that the F distribution is a theoretical probability distribution, like the t distribution and the $\chi^2$ distribution. Variance ratios $s_1^2/s_2^2$ based on sample variances are sample statistics that may or may not follow the F distribution. We have therefore distinguished the sample variance ratio by calling it $F_s$, conforming to our convention of separate symbols for sample statistics as distinct from probability distributions (such as $t_s$ and $X^2$ contrasted with t and $\chi^2$).
We have discussed how to generate an F distribution by repeatedly taking two samples from the same normal distribution. We could also have generated it by sampling from two separate normal distributions differing in their mean but identical in their parametric variances; that is, with $\mu_1 \neq \mu_2$ but $\sigma_1^2 = \sigma_2^2$. Thus, we obtain an F distribution whether the samples come from the same normal population or from different ones, so long as their variances are identical.
Figure 7.1 shows several representative F distributions. For very low degrees of freedom the distribution is L-shaped, but it becomes humped and strongly skewed to the right as both degrees of freedom increase. Table V in Appendix A2 shows the cumulative probability distribution of F for three selected probability values. The values in the table represent $F_{\alpha[\nu_1,\nu_2]}$, where α is the proportion of the F distribution to the right of the given F value (in one tail) and $\nu_1$, $\nu_2$ are the degrees of freedom pertaining to the variances in the numerator and the denominator of the ratio, respectively. The table is arranged so that across the top one reads $\nu_1$, the degrees of freedom pertaining to the upper (numerator) variance, and along the left margin one reads $\nu_2$, the degrees of freedom pertaining to the lower (denominator) variance. At each intersection of degree of freedom values we list three values of F decreasing in magnitude of α. For example, the F value for $\nu_1 = 6$, $\nu_2 = 24$ is 2.51 at α = 0.05. By that we mean that 0.05 of the area under the curve lies to the right of F = 2.51. Figure 7.2 illustrates this. Only 0.01 of the area under the curve lies to the right of F = 3.67. Thus, if we have a null hypothesis $H_0: \sigma_1^2 = \sigma_2^2$, with the alternative hypothesis $H_1: \sigma_1^2 > \sigma_2^2$, we use a one-tailed F test, as illustrated by Figure 7.2.

[Figure 7.1. Representative F distributions.]
We can now test the two variances obtained in the sampling experiment of Section 7.1 and Table 7.1. The variance among groups based on 7 means was 21.181, and the variance within 7 groups of 5 individuals was 16.029. Our null hypothesis is that the two variances estimate the same parametric variance; the alternative hypothesis in an anova is always that the parametric variance estimated by the variance among groups is greater than that estimated by the variance within groups. The reason for this restrictive alternative hypothesis, which leads to a one-tailed test, will be explained in Section 7.4. We calculate the variance ratio $F_s = s_1^2/s_2^2 = 21.181/16.029 = 1.32$. Before we can inspect the F table, we have to know the appropriate degrees of freedom for this variance ratio. We shall learn simple formulas for degrees of freedom in an anova later, but at the moment let us reason it out for ourselves. The upper variance (among groups) was based on the variance of 7 means; hence it should have a − 1 = 6 degrees of freedom. The lower variance was based on an average of 7 variances, each of them based on 5 individuals yielding 4 degrees of freedom per variance: a(n − 1) = 7 × 4 = 28 degrees of freedom. Thus, the upper variance has 6, the lower variance 28 degrees of freedom. If we check Table V for $\nu_1 = 6$, $\nu_2 = 24$, the closest arguments in the table, we find that $F_{0.05[6,24]} = 2.51$. For F = 1.32, corresponding to the $F_s$ value actually obtained, α is clearly > 0.05. Thus, we may expect more than 5% of all variance ratios of samples based on 6 and 28 degrees of freedom, respectively, to have $F_s$ values greater than 1.32. We have no evidence to reject the null hypothesis and conclude that the two sample variances estimate the same parametric variance. This corresponds, of course, to what we knew anyway from our sampling experiment. Since the seven samples were taken from the same population, the estimate using the variance of their means is expected to yield another estimate of the parametric variance of housefly wing length.

[Figure 7.2. Frequency curve of the F distribution for 6 and 24 degrees of freedom, respectively.]
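The reasoning of this test can be restated as a short Python sketch. The two variances and the critical value 2.51 are taken from the text (the latter is the tabled value for 6 and 24 degrees of freedom, the closest arguments to 6 and 28); the helper name `anova_degrees_of_freedom` is hypothetical.

```python
def anova_degrees_of_freedom(a, n):
    """Degrees of freedom (among groups, within groups) for a groups of n items."""
    return a - 1, a * (n - 1)


var_among, var_within = 21.181, 16.029   # the two estimates from Section 7.1
f_s = var_among / var_within             # sample variance ratio
df_upper, df_lower = anova_degrees_of_freedom(a=7, n=5)

critical_05 = 2.51  # tabled F_0.05[6,24], as quoted in the text
reject_null = f_s > critical_05          # one-tailed test

print(round(f_s, 2), df_upper, df_lower, reject_null)  # 1.32 6 28 False
```

Because the observed ratio falls well below the critical value, the sketch reaches the same conclusion as the text: there is no evidence against the null hypothesis.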
Whenever the alternative hypothesis is that the two parametric variances are unequal (rather than the restrictive hypothesis $H_1: \sigma_1^2 > \sigma_2^2$), the sample variance $s_1^2$ can be smaller as well as greater than $s_2^2$. This leads to a two-tailed test, and in such cases a 5% type I error means that rejection regions of 2½% will occur at each tail of the curve. In such a case it is necessary to obtain F values for α > 0.5 (that is, in the left half of the F distribution). Since these values are rarely tabulated, they can be obtained by using the simple relationship

$$F_{\alpha[\nu_1,\nu_2]} = \frac{1}{F_{(1-\alpha)[\nu_2,\nu_1]}}$$

For example, $F_{0.05[5,24]} = 2.62$. If we wish to obtain $F_{0.95[5,24]}$ (the F value to the right of which lies 95% of the area of the F distribution with 5 and 24 degrees of freedom, respectively), we first have to find $F_{0.05[24,5]} = 4.53$. Then $F_{0.95[5,24]}$ is the reciprocal of 4.53, which equals 0.221. Thus 95% of an F distribution with 5 and 24 degrees of freedom lies to the right of 0.221.
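The reciprocal relationship is simple enough to express as a one-line helper. The Python sketch below only wraps the arithmetic of the example; the tabled value 4.53 comes from the text, and the function name `lower_critical_value` is invented here.

```python
def lower_critical_value(upper_value_swapped_df):
    """Left-tail value F_(1-alpha)[v1,v2], computed as 1 / F_alpha[v2,v1]."""
    return 1.0 / upper_value_swapped_df


f_upper_24_5 = 4.53                      # tabled F_0.05[24,5], from the text
f_lower_5_24 = lower_critical_value(f_upper_24_5)
print(round(f_lower_5_24, 3))  # 0.221
```

Note that the degrees of freedom must be swapped before the table is consulted; taking the reciprocal of $F_{0.05[5,24]} = 2.62$ instead would give the wrong left-tail value.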
There is an important relationship between the F distribution and the $\chi^2$ distribution. You may remember that the ratio $X^2 = \sum y^2/\sigma^2$ was distributed as a $\chi^2$ with n − 1 degrees of freedom. If you divide the numerator of this expression by n − 1, you obtain the ratio $F_s = s^2/\sigma^2$, which is a variance ratio with an expected distribution of $F_{[n-1,\infty]}$. The upper degrees of freedom are n − 1 (the degrees of freedom of the sum of squares or sample variance). The lower degrees of freedom are infinite, because only on the basis of an infinite number of items can we obtain the true, parametric variance of a population. Therefore, by dividing a value of $X^2$ by n − 1 degrees of freedom, we obtain an $F_s$ value with n − 1 and ∞ df, respectively. In general, $\chi^2_{[\nu]}/\nu = F_{[\nu,\infty]}$. We can convince ourselves of this by inspecting the F and $\chi^2$ tables. From the $\chi^2$ table (Table IV) we find that $\chi^2_{0.05[10]} = 18.307$. Dividing this value by 10 df, we obtain 1.8307, which agrees with the F value for 10 and ∞ degrees of freedom at α = 0.05, namely 1.83.
Thus, the two statistics of significance are closely related and, lacking a $\chi^2$ table, we could make do with an F table alone, using the values of $\nu F_{[\nu,\infty]}$ in place of $\chi^2_{[\nu]}$.
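The step from $X^2$ to $F_s$ can be written out as a short worked derivation. It uses only the definitions already given ($s^2 = \sum y^2/(n-1)$ and $X^2 = \sum y^2/\sigma^2$); the numerical line reuses the tabled $\chi^2$ value quoted in the text.

$$\frac{X^2}{n-1} = \frac{1}{n-1}\cdot\frac{\sum y^2}{\sigma^2} = \frac{\sum y^2/(n-1)}{\sigma^2} = \frac{s^2}{\sigma^2} = F_s$$

$$\frac{\chi^2_{0.05[10]}}{10} = \frac{18.307}{10} = 1.8307$$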
Before we return to analysis of variance, we shall first apply our newly won knowledge of the F distribution to testing a hypothesis about two sample variances.
BOX 7.1
Testing the significance of differences between two variances.
Survival in days of the cockroach Blattella vaga when kept without food or water.

Females    $n_1 = 10$    $\bar{Y}_1 = 8.5$ days    $s_1^2 = 3.6$
Males      $n_2 = 10$    $\bar{Y}_2 = 4.8$ days    $s_2^2 = 0.9$

$H_0: \sigma_1^2 = \sigma_2^2$    $H_1: \sigma_1^2 \neq \sigma_2^2$

Source: Data modified from Willis and Lewis (1957).

The alternative hypothesis is that the two variances are unequal. We have no reason to suppose that one sex should be more variable than the other. In view of the alternative hypothesis this is a two-tailed test. Since only the right tail of the F distribution is tabled extensively in Table V and in most other tables, we calculate $F_s$ as the ratio of the greater variance over the lesser one:

$$F_s = \frac{s_1^2}{s_2^2} = \frac{3.6}{0.9} = 4.00$$

Because the test is two-tailed, we look up the critical value $F_{\alpha/2[\nu_1,\nu_2]}$, where α is the type I error accepted and $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ are the degrees of freedom for the upper and lower variance, respectively. Whether we look up $F_{\alpha/2[\nu_1,\nu_2]}$ or $F_{\alpha/2[\nu_2,\nu_1]}$ depends on whether sample 1 or sample 2 has the greater variance and has been placed in the numerator.
From Table V we find $F_{0.025[9,9]} = 4.03$ and $F_{0.05[9,9]} = 3.18$. Because this is a two-tailed test, we double these probabilities. Thus, the F value of 4.03 represents a probability of α = 0.05, since the right-hand tail area of α = 0.025 is matched by a similar left-hand area to the left of $F_{0.975[9,9]} = 1/F_{0.025[9,9]} = 0.248$. Therefore, assuming the null hypothesis is true, the probability of observing an F value greater than 4.00 or smaller than 1/4.00 = 0.25 is 0.10 > P > 0.05. Strictly speaking, the two sample variances are not significantly different; we cannot reject the hypothesis that the two sexes are equally variable in their duration of survival. However, the outcome is close enough to the 5% significance level to make us suspicious that possibly the variances are in fact different. It would be desirable to repeat this experiment with larger sample sizes in the hope that more decisive results would emerge.
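The steps of Box 7.1 can be traced in a few lines of Python. This is a sketch of the table-lookup logic only: the variances and the two tabled critical values are those given in the box, and the function and dictionary names are made up for illustration.

```python
def two_tailed_variance_ratio(s2_a, s2_b):
    """F_s for a two-tailed test: the greater variance over the lesser one."""
    return max(s2_a, s2_b) / min(s2_a, s2_b)


f_s = two_tailed_variance_ratio(3.6, 0.9)   # females, males

# Tabled right-tail values for [9,9] df, keyed by the two-tailed alpha they
# represent (one-tail 0.05 -> 0.10, one-tail 0.025 -> 0.05).
critical_by_two_tailed_alpha = {0.10: 3.18, 0.05: 4.03}

significant_at = [alpha for alpha, f_crit in critical_by_two_tailed_alpha.items()
                  if f_s > f_crit]
print(round(f_s, 2), significant_at)  # 4.0 [0.1]
```

The ratio exceeds the 10% critical value but not the 5% one, which is the bracket 0.10 > P > 0.05 reported in the box.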
7.3 The hypothesis $H_0: \sigma_1^2 = \sigma_2^2$
A test of the null hypothesis that two normal populations represented by two samples have the same variance is illustrated in Box 7.1. As will be seen later, some tests leading to a decision about whether two samples come from populations with the same mean assume that the population variances are equal. However, this test is of interest in its own right. We will repeatedly have to test whether two samples have the same variance. In genetics we may need to know whether an offspring generation is more variable for a character than the parent generation. In systematics we might like to find out whether two local populations are equally variable. In experimental biology we may wish to demonstrate under which of two experimental setups the readings will be more variable. In general, the less variable setup would be preferred; if both setups were equally variable, the experimenter would pursue the one that was simpler or less costly to undertake.
7.4 Heterogeneity among sample means
We shall now modify the data of Table 7.1, discussed in Section 7.1. Suppose the seven groups of houseflies did not represent random samples from the same population but resulted from the following experiment. Each sample was reared in a separate culture jar, and the medium in each of the culture jars was prepared in a different way. Some had more water added, others more sugar, yet others more solid matter. Let us assume that sample 7 represents the standard medium against which we propose to compare the other samples. The various changes in the medium affect the sizes of the flies that emerge from it; this in turn affects the wing lengths we have been measuring.
We shall assume the following effects resulting from treatment of the medium:
Medium 1: decreases average wing length of a sample by 5 units
Medium 2: decreases average wing length of a sample by 2 units
Medium 3: does not change average wing length of a sample
Medium 4: increases average wing length of a sample by 1 unit
Medium 5: increases average wing length of a sample by 1 unit
Medium 6: increases average wing length of a sample by 5 units
Medium 7 (control): does not change average wing length of a sample
The effect of treatment i is usually symbolized as $\alpha_i$. (Please note that this use of α is not related to its use as a symbol for the probability of a type I error.) Thus $\alpha_i$ assumes the following values for the above treatment effects.

$\alpha_1 = -5$    $\alpha_4 = 1$
$\alpha_2 = -2$    $\alpha_5 = 1$
$\alpha_3 = 0$     $\alpha_6 = 5$
                   $\alpha_7 = 0$

[Table 7.3. The data of Table 7.1 with the fixed treatment effects $\alpha_i$ added to each group, arranged identically to Table 7.1. Table body not recoverable.]
Note that the $\alpha_i$'s have been defined so that $\sum^{a} \alpha_i = 0$; that is, the effects cancel out. This is a convenient property that is generally postulated, but it is unnecessary for our argument. We can now modify Table 7.1 by adding the appropriate values of $\alpha_i$ to each sample. In sample 1 the value of $\alpha_1$ is −5; therefore, the first wing length, which was 41 (see Table 7.1), now becomes 36; the second wing length, formerly 44, becomes 39; and so on. For the second sample $\alpha_2$ is −2, changing the first wing length from 48 to 46. Where $\alpha_i$ is 0, the wing lengths do not change; where $\alpha_i$ is positive, they are increased by the magnitude indicated. The changed values can be inspected in Table 7.3, which is arranged identically to Table 7.1.
We now repeat our previous computations. We first calculate the sum of squares of the first sample to find it to be 29.2. If you compare this value with the sum of squares of the first sample in Table 7.1, you find the two values to be identical. Similarly, all other values of $\sum^{n} y^2$, the sum of squares of each group, are identical to their previous values. Why is this so? The effect of adding $\alpha_i$ to each group is simply that of an additive code, since $\alpha_i$ is constant for any one group. From Appendix A1.2 we can see that additive codes do not affect sums of squares or variances. Therefore, not only is each separate sum of squares the same as before, but the average variance within groups is still 16.029. Now let us compute the variance of the means. It is 100.617/6 = 16.770, which is a value much higher than the variance of means found before, 4.236. When we multiply by n = 5 to get an estimate of $\sigma^2$, we obtain the variance among groups, which now is 83.848 and is no longer even close to an estimate of $\sigma^2$. We repeat the F test with the new variances and find that $F_s = 83.848/16.029 = 5.23$, which is much greater than the closest critical value of $F_{0.05[6,24]} = 2.51$. In fact, the observed $F_s$ is greater than $F_{0.01[6,24]} = 3.67$. Clearly, the upper variance, representing the variance among groups, has become significantly larger. The two variances are most unlikely to represent the same parametric variance.
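That an additive code leaves a sum of squares unchanged is easy to confirm numerically. In the Python sketch below, only the first two wing lengths (41 and 44) and the treatment effect of −5 are taken from the text; the remaining three values are illustrative assumptions, chosen so that the sum of squares equals the 29.2 quoted for the first sample.

```python
def sum_of_squares(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


# First two values from the text; the last three are assumed for illustration.
sample_1 = [41.0, 44.0, 48.0, 43.0, 42.0]
alpha_1 = -5.0
treated_1 = [y + alpha_1 for y in sample_1]   # additive code

ss_before = sum_of_squares(sample_1)
ss_after = sum_of_squares(treated_1)
print(round(ss_before, 1), round(ss_after, 1))  # 29.2 29.2

# The variance among groups, by contrast, changes; the new ratio from the text:
f_s_treated = 83.848 / 16.029
print(round(f_s_treated, 2))  # 5.23
```

The mean of the treated sample drops by 5 units, but every deviation from that mean is unchanged, so the within-group sum of squares stays the same.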
What has happened? We can easily explain it by means of Table 7.4, which represents Table 7.3 symbolically in the manner that Table 7.2 represented Table 7.1. We note that each group has a constant $\alpha_i$ added and that this constant changes the sums of the groups by $n\alpha_i$ and the means of these groups by $\alpha_i$. In Section 7.1 we computed the variance within groups as

$$\frac{1}{a(n - 1)}\sum_{i=1}^{i=a}\sum_{j=1}^{j=n}(Y_{ij} - \bar{Y}_i)^2$$

When we try to repeat this, our formula becomes more complicated, because to each $Y_{ij}$ and each $\bar{Y}_i$ there has now been added $\alpha_i$. We therefore write

$$\frac{1}{a(n - 1)}\sum_{i=1}^{i=a}\sum_{j=1}^{j=n}\left[(Y_{ij} + \alpha_i) - (\bar{Y}_i + \alpha_i)\right]^2$$

Then we open the parentheses inside the square brackets, so that the second $\alpha_i$ changes sign and the $\alpha_i$'s cancel out, leaving the expression exactly as before, substantiating our earlier observation that the variance within groups does not change despite the treatment effects.

TABLE 7.4
Data of Table 7.3 arranged in the manner of Table 7.2.

                                          Groups
Items     1              2              3              ...    i              ...    a
1        Y_11 + α_1     Y_21 + α_2     Y_31 + α_3     ...    Y_i1 + α_i     ...    Y_a1 + α_a
2        Y_12 + α_1     Y_22 + α_2     Y_32 + α_3     ...    Y_i2 + α_i     ...    Y_a2 + α_a
3        Y_13 + α_1     Y_23 + α_2     Y_33 + α_3     ...    Y_i3 + α_i     ...    Y_a3 + α_a
...
j        Y_1j + α_1     Y_2j + α_2     Y_3j + α_3     ...    Y_ij + α_i     ...    Y_aj + α_a
...
n        Y_1n + α_1     Y_2n + α_2     Y_3n + α_3     ...    Y_in + α_i     ...    Y_an + α_a
Sums     ΣY_1 + nα_1    ΣY_2 + nα_2    ΣY_3 + nα_3    ...    ΣY_i + nα_i    ...    ΣY_a + nα_a
Means    Ȳ_1 + α_1      Ȳ_2 + α_2      Ȳ_3 + α_3      ...    Ȳ_i + α_i      ...    Ȳ_a + α_a

The variance of means was previously calculated by the formula

$$\frac{1}{a - 1}\sum_{i=1}^{i=a}(\bar{Y}_i - \bar{\bar{Y}})^2$$

However, from Table 7.4 we see that the new grand mean equals

$$\frac{1}{a}\sum_{i=1}^{i=a}(\bar{Y}_i + \alpha_i) = \frac{1}{a}\sum_{i=1}^{i=a}\bar{Y}_i + \frac{1}{a}\sum_{i=1}^{i=a}\alpha_i = \bar{\bar{Y}} + \bar{\alpha}$$

When we substitute the new values for the group means and the grand mean the formula appears as

$$\frac{1}{a - 1}\sum_{i=1}^{i=a}\left[(\bar{Y}_i + \alpha_i) - (\bar{\bar{Y}} + \bar{\alpha})\right]^2$$

which in turn yields

$$\frac{1}{a - 1}\sum_{i=1}^{i=a}\left[(\bar{Y}_i - \bar{\bar{Y}}) + (\alpha_i - \bar{\alpha})\right]^2$$

Squaring the expression in the square brackets, we obtain the terms

$$\frac{1}{a - 1}\sum_{i=1}^{i=a}(\bar{Y}_i - \bar{\bar{Y}})^2 + \frac{1}{a - 1}\sum_{i=1}^{i=a}(\alpha_i - \bar{\alpha})^2 + \frac{2}{a - 1}\sum_{i=1}^{i=a}(\bar{Y}_i - \bar{\bar{Y}})(\alpha_i - \bar{\alpha})$$

The first of these terms we immediately recognize as the previous variance of the means, $s_{\bar{Y}}^2$. The second is a new quantity, but is familiar by general appearance; it clearly is a variance or at least a quantity akin to a variance. The third expression is a new type; it is a so-called covariance, which we have not yet encountered. We shall not be concerned with it at this stage except to say that in cases such as the present one, where the magnitude of the treatment effects $\alpha_i$ is assumed to be independent of the $\bar{Y}_i$ to which they are added, the expected value of this quantity is zero; hence it does not contribute to the new variance of means.
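The three-term expansion can be checked numerically. The Python sketch below uses the treatment effects $\alpha_i$ given in the text; the seven group means are hypothetical stand-ins (Table 7.1 is not reproduced here), so only the identity itself and the value of the middle term should be read from the output.

```python
def variance(values):
    """Sum of squared deviations from the mean, divided by (count - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def covariance(xs, ys):
    """Sum of cross products of deviations, divided by (count - 1)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


alphas = [-5, -2, 0, 1, 1, 5, 0]                          # from the text
group_means = [43.6, 47.4, 45.0, 42.8, 46.2, 45.8, 46.6]  # hypothetical values

new_means = [m + alpha for m, alpha in zip(group_means, alphas)]
new_variance_of_means = variance(new_means)

term_1 = variance(group_means)                # previous variance of the means
term_2 = variance(alphas)                     # variance-like term from treatments
term_3 = 2 * covariance(group_means, alphas)  # covariance term

n = 5
print(round(term_2, 3), round(n * term_2, 3))  # 9.333 46.667
```

For any single data set the covariance term is generally not exactly zero; it is only its expected value that vanishes when treatment effects are independent of the sample means. With the effects used here, the middle term is 56/6 = 9.333, and n times that quantity is 46.667.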
The independence of the treatment effects and the sample means is an important concept that we must understand clearly. If we had not applied different treatments to the medium jars, but simply treated all jars as controls, we would still have obtained differences among the wing length means. Those are the differences found in Table 7.1 with random sampling from the same population. By chance, some of these means are greater, some are smaller. In our planning of the experiment we had no way of predicting which sample means would be small and which would be large. Therefore, in planning our treatments, we had no way of matching up a large treatment effect, such as that of medium 6, with the mean that by chance would be the greatest, as that for sample 2. Also, the smallest sample mean (sample 4) is not associated with the smallest treatment effect. Only if the magnitude of the treatment effects were deliberately correlated with the sample means (this would be difficult to do in the experiment designed here) would the third term in the expression, the covariance, have an expected value other than zero.
The second term in the expression for the new variance of means is clearly
added as a result of the treatment effects. It is analogous to a variance, but it
cannot be called a variance, since it is not based on a random variable, but
rather on deliberately chosen treatments largely under our control. By changing
the magnitude and nature of the treatments, we can more or less alter the
variancelike quantity at will. We shall therefore call it the added component due
to treatment effects. Since the $\alpha_i$'s are arranged so that $\bar{\alpha} = 0$, we can rewrite
the middle term as

$$\frac{1}{a-1}\sum_{i=1}^{a}(\alpha_i - \bar{\alpha})^2 = \frac{1}{a-1}\sum_{i=1}^{a}\alpha_i^2$$
In analysis of variance we multiply the variance of the means by $n$ in order
to estimate the parametric variance of the items. As you know, we call the
quantity so obtained the variance of groups. When we do this for the case in
which treatment effects are present, we obtain

$$n\left(s_{\bar{Y}}^2 + \frac{1}{a-1}\sum_{i=1}^{a}\alpha_i^2\right) = n s_{\bar{Y}}^2 + \frac{n}{a-1}\sum_{i=1}^{a}\alpha_i^2$$
Thus we see that the estimate of the parametric variance of the population is
increased by the quantity

$$\frac{n}{a-1}\sum_{i=1}^{a}\alpha_i^2$$

which is $n$ times the added component due to treatment effects. We found the
variance ratio $F_s$ to be significantly greater than could be reconciled with the
null hypothesis. It is now obvious why this is so. We were testing the variance
ratio expecting to find $F$ approximately equal to $\sigma^2/\sigma^2 = 1$. In fact, however,
we have

$$F \approx \frac{\sigma^2 + \dfrac{n}{a-1}\sum\limits^{a}\alpha^2}{\sigma^2}$$

It is clear from this formula (deliberately displayed in this lopsided manner)
that the F test is sensitive to the presence of the added component due to
treatment effects.
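To see how the lopsided ratio responds to treatment effects, it can be evaluated for chosen values. The sketch below is only illustrative: the parametric variance of 16, the sample size of 5, and the four treatment effects are assumed values, and the ratio of the two expected mean squares is used as a rough guide to $F$, not as its exact expectation.

```python
def expected_ms_among_groups(sigma2, n, effects):
    """sigma^2 plus n times the added component due to treatment effects."""
    a = len(effects)
    added_component = sum(alpha ** 2 for alpha in effects) / (a - 1)
    return sigma2 + n * added_component


def approximate_variance_ratio(sigma2, n, effects):
    """Ratio of the expected MS among groups to the expected MS within groups."""
    return expected_ms_among_groups(sigma2, n, effects) / sigma2


# Hypothetical inputs (assumed for illustration only).
sigma2 = 16.0
n = 5
no_effects = [0.0, 0.0, 0.0, 0.0]
some_effects = [-2.0, 1.0, 0.0, 1.0]  # arranged to sum to zero

ratio_null = approximate_variance_ratio(sigma2, n, no_effects)
ratio_treated = approximate_variance_ratio(sigma2, n, some_effects)
print(ratio_null, ratio_treated)
```

With no treatment effects the ratio is 1; with the assumed effects the numerator grows by $n\sum\alpha^2/(a-1) = 10$, so the ratio rises to 26/16.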
At this point, you have an additional insight into the analysis of variance.
It permits us to test whether there are added treatment effects, that is, whether
a group of means can simply be considered random samples from the same
population, or whether treatments that have affected each group separately
have resulted in shifting these means so much that they can no longer be
considered samples from the same population. If the latter is so, an added
component due to treatment effects will be present and may be detected by an F test
in the significance test of the analysis of variance. In such a study, we are
generally not interested in the magnitude of $\sum^{a}\alpha^2/(a - 1)$,
but we are interested in the magnitude of the separate values of $\alpha_i$. In our
example these are the effects of different formulations of the medium on wing
length. If, instead of housefly wing length, we were measuring blood pressure
in samples of rats and the different groups had been subjected to different drugs
or different doses of the same drug, the quantities $\alpha_i$ would represent the effects
of drugs on the blood pressure, which is clearly the issue of interest to the
investigator. We may also be interested in studying differences of the type
$\alpha_1 - \alpha_2$, leading us to the question of the significance of the differences between
the effects of any two types of medium or any two drugs. But we are a little
ahead of our story.
When analysis of variance involves treatment effects of the type just studied,
we call it a Model I anova. Later in this chapter (Section 7.6), Model I will
be defined precisely. There is another model, called a Model II anova, in which
the added effects for each group are not fixed treatments but are random effects.
By this we mean that we have not deliberately planned or fixed the treatment
for any one group, but that the actual effects on each group are random and
only partly under our control. Suppose that the seven samples of houseflies in
Table 7.3 represented the offspring of seven randomly selected females from a
population reared on a uniform medium. There would be genetic differences
among these females, and their seven broods would reflect this. The exact nature
of these differences is unclear and unpredictable. Before actually measuring
them, we have no way of knowing whether brood 1 will have longer wings than
brood 2, nor have we any way of controlling this experiment so that brood 1
will in fact grow longer wings. So far as we can ascertain, the genetic factors
for wing length are distributed in an unknown manner in the population of
houseflies (we might hope that they are normally distributed), and our sample
of seven is a random sample of these factors.
In another example for a Model II anova, suppose that instead of making
up our seven cultures from a single batch of medium, we have prepared seven
batches separately, one right after the other, and are now analyzing the variation
among the batches. We would not be interested in the exact differences from
batch to batch. Even if these were measured, we would not be in a position to
interpret them. Not having deliberately varied batch 3, we have no idea why,
for example, it should produce longer wings than batch 2. We would, however,
be interested in the magnitude of the variance of the added effects. Thus, if we
used seven jars of medium derived from one batch, we could expect the variance
of the jar means to be $\sigma^2/5$, since there were 5 flies per jar. But when based on
different batches of medium, the variance could be expected to be greater,
because all the imponderable accidents of formulation and environmental
differences during medium preparation that make one batch of medium different
from another would come into play. Interest would focus on the added variance
component arising from differences among batches. Similarly, in the other
example we would be interested in the added variance component arising from
genetic differences among the females.
We shall now take a rapid look at the algebraic formulation of the anova
in the case of Model II. In Table 7.3 the second row at the head of the data
columns shows not only $\alpha_i$ but also $A_i$, which is the symbol we shall use for
a random group effect. We use a capital letter to indicate that the effect is a
variable. The algebra of calculating the two estimates of the population
variance is the same as in Model I, except that in place of $\alpha_i$ we imagine $A_i$
substituted in Table 7.4. The estimate of the variance among means now represents
the quantity

$$\frac{1}{a-1}\sum_{i=1}^{a}(\bar{Y}_i - \bar{\bar{Y}})^2 + \frac{1}{a-1}\sum_{i=1}^{a}(A_i - \bar{A})^2 + \frac{2}{a-1}\sum_{i=1}^{a}(\bar{Y}_i - \bar{\bar{Y}})(A_i - \bar{A})$$
The first term is the variance of means $s_{\bar{Y}}^2$, as before, and the last term is the
covariance between the group means and the random effects $A_i$, the expected
value of which is zero (as before), because the random effects are independent
of the magnitude of the means. The middle term is a true variance, since $A_i$
is a random variable. We symbolize it by $s_A^2$ and call it the added variance
component among groups. It would represent the added variance component
among females or among medium batches, depending on which of the designs
discussed above we were thinking of. The existence of this added variance
component is demonstrated by the F test. If the groups are random samples, we
may expect $F$ to approximate $\sigma^2/\sigma^2 = 1$; but with an added variance
component, the expected ratio, again displayed lopsidedly, is

$$F \approx \frac{\sigma^2 + n\sigma_A^2}{\sigma^2}$$
Note that $\sigma_A^2$, the parametric value of $s_A^2$, is multiplied by $n$, since we have to
multiply the variance of means by $n$ to obtain an independent estimate of the
variance of the population. In a Model II anova we are interested not in the
magnitude of any $A_i$ or in differences such as $A_1 - A_2$, but in the magnitude
of $\sigma_A^2$ and its relative magnitude with respect to $\sigma^2$, which is generally expressed
as the percentage $100 s_A^2/(s^2 + s_A^2)$. Since the variance among groups estimates
$\sigma^2 + n\sigma_A^2$, we can calculate $s_A^2$ as

$$\frac{1}{n}(\text{variance among groups} - \text{variance within groups}) = \frac{1}{n}\left[(s^2 + n s_A^2) - s^2\right] = \frac{1}{n}(n s_A^2) = s_A^2$$

For the present example, $s_A^2 = \frac{1}{5}(83.848 - 16.029) = 13.56$. This added
variance component among groups is

$$\frac{100 \times 13.56}{16.029 + 13.56} = \frac{1356}{29.589} = 45.83\%$$

of the sum of the variances among and within groups. Model II will be formally
discussed at the end of this chapter (Section 7.7); the methods of estimating
variance components are treated in detail in the next chapter.
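The arithmetic of this example is easily reproduced. The following minimal sketch uses the two mean squares and the sample size quoted in the text; the function names are ours and are chosen only for illustration.

```python
def added_variance_component(ms_among, ms_within, n):
    """s_A^2 = (variance among groups - variance within groups) / n."""
    return (ms_among - ms_within) / n


def percent_among_groups(s2_A, s2_within):
    """Added variance component as a percentage of s^2 + s_A^2."""
    return 100.0 * s2_A / (s2_within + s2_A)


# Mean squares for the data of Table 7.3, with n = 5 flies per group.
ms_among = 83.848
ms_within = 16.029
n = 5

s2_A = added_variance_component(ms_among, ms_within, n)
pct = percent_among_groups(s2_A, ms_within)
print(round(s2_A, 2), round(pct, 2))
```

Carrying the unrounded value of $s_A^2$ through the percentage changes the result only in the third decimal place.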
7.5 Partitioning the total sum of squares and degrees of freedom
So far we have ignored one other variance that can be computed from the
data in Table 7.1. If we remove the classification into groups, we can consider
the housefly data to be a single sample of $an = 35$ wing lengths and calculate
the mean and variance of these items in the conventional manner. The various
quantities necessary for this computation are shown in the last column at the
right in Tables 7.1 and 7.3, headed "Computation of total sum of squares." We
obtain a mean of $\bar{Y} = 45.34$ for the sample in Table 7.1, which is, of course,
the same as the quantity $\bar{\bar{Y}}$ computed previously from the seven group means.
The sum of squares of the 35 items is 575.886, which gives a variance of 16.938
when divided by 34 degrees of freedom. Repeating these computations for the
data in Table 7.3, we obtain $\bar{Y} = 45.34$ (the same as in Table 7.1 because
$\sum^{a}\alpha_i = 0$) and $s^2 = 27.997$, which is considerably greater than the
corresponding variance from Table 7.1. The total variance computed from all $an$ items is
another estimate of $\sigma^2$. It is a good estimate in the first case, but in the second
sample (Table 7.3), where added components due to treatment effects or added
variance components are present, it is a poor estimate of the population variance.
However, the purpose of calculating the total variance in an anova is not
for using it as yet another estimate of $\sigma^2$, but for introducing an important
mathematical relationship between it and the other variances. This is best seen
when we arrange our results in a conventional analysis of variance table, as
shown in Table 7.5. Such a table is divided into four columns. The first
identifies the source of variation as among groups, within groups, and total (groups
amalgamated to form a single sample). The column headed df gives the degrees
of freedom by which the sums of squares pertinent to each source of variation
must be divided in order to yield the corresponding variance. The degrees of
freedom for variation among groups is $a - 1$, that for variation within groups
is $a(n - 1)$, and that for the total variation is $an - 1$. The next two columns
show sums of squares and variances, respectively. Notice that the sums of
squares entered in the anova table are the sum of squares among groups, the
sum of squares within groups, and the sum of squares of the total sample of
$an$ items. You will note that variances are not referred to by that term in anova,
but are generally called mean squares, since, in a Model I anova, they do not
estimate a population variance. These quantities are not true mean squares,
because the sums of squares are divided by the degrees of freedom rather than
sample size. The sum of squares and mean square are frequently abbreviated
SS and MS, respectively.

TABLE 7.5
Anova table for data in Table 7.1.

                               (1)                    (2)    (3)               (4)
                               Source of variation    df     Sum of squares    Mean square
                                                             SS                MS
$\bar{Y} - \bar{\bar{Y}}$      Among groups            6     127.086           21.181
$Y - \bar{Y}$                  Within groups          28     448.800           16.029
$Y - \bar{\bar{Y}}$            Total                  34     575.886           16.938
The sums of squares and mean squares in Table 7.5 are the same as those
obtained previously, except for minute rounding errors. Note, however, an
important property of the sums of squares. They have been obtained
independently of each other, but when we add the SS among groups to the SS within
groups we obtain the total SS. The sums of squares are additive! Another way of
saying this is that we can decompose the total sum of squares into a portion
due to variation among groups and another portion due to variation within
groups. Observe that the degrees of freedom are also additive and that the total
of 34 df can be decomposed into 6 df among groups and 28 df within groups.
Thus, if we know any two of the sums of squares (and their appropriate degrees
of freedom), we can compute the third and complete our analysis of variance.
Note that the mean squares are not additive. This is obvious, since generally
$(a + b)/(c + d) \neq a/c + b/d$.
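These additivity relations can be verified directly from the entries of Table 7.5. The sketch below is a minimal check; the dictionary layout and names are ours, while the numbers are those of the table.

```python
# Degrees of freedom and sums of squares as given in Table 7.5.
table_7_5 = {
    "among groups": {"df": 6, "SS": 127.086},
    "within groups": {"df": 28, "SS": 448.800},
    "total": {"df": 34, "SS": 575.886},
}

# A mean square is a sum of squares divided by its degrees of freedom.
mean_squares = {source: row["SS"] / row["df"] for source, row in table_7_5.items()}

ss_sum = table_7_5["among groups"]["SS"] + table_7_5["within groups"]["SS"]
df_sum = table_7_5["among groups"]["df"] + table_7_5["within groups"]["df"]
ms_sum = mean_squares["among groups"] + mean_squares["within groups"]

print(ss_sum, df_sum)               # these reproduce the total SS and total df
print(ms_sum, mean_squares["total"])  # these two do not agree
```

The sums of squares and the degrees of freedom add to the totals, whereas the sum of the two mean squares is far from the total mean square.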
We shall use the computational formula for sum of squares (Expression
(3.8)) to demonstrate why these sums of squares are additive. Although it is an
algebraic derivation, it is placed here rather than in the Appendix because
these formulas will also lead us to some common computational formulas for
analysis of variance. Depending on computational equipment, the formulas we
have used so far to obtain the sums of squares may not be the most rapid procedure.
The sum of squares of means in simplified notation is

$$SS_{\text{means}} = \sum^{a}(\bar{Y} - \bar{\bar{Y}})^2 = \sum^{a}\bar{Y}^2 - \frac{1}{a}\left(\sum^{a}\bar{Y}\right)^2$$

$$= \sum^{a}\left(\frac{1}{n}\sum^{n}Y\right)^2 - \frac{1}{a}\left[\sum^{a}\left(\frac{1}{n}\sum^{n}Y\right)\right]^2$$

$$= \frac{1}{n^2}\sum^{a}\left(\sum^{n}Y\right)^2 - \frac{1}{an^2}\left(\sum^{a}\sum^{n}Y\right)^2$$

Note that the deviation of means from the grand mean is first rearranged to
fit the computational formula (Expression (3.8)), and then each mean is written
in terms of its constituent variates. Collection of denominators outside the
summation signs yields the final desired form. To obtain the sum of squares of
groups, we multiply $SS_{\text{means}}$ by $n$, as before. This yields

$$SS_{\text{groups}} = n \times SS_{\text{means}} = \frac{1}{n}\sum^{a}\left(\sum^{n}Y\right)^2 - \frac{1}{an}\left(\sum^{a}\sum^{n}Y\right)^2$$
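Next we evaluate the sum of squares within groups:

$$SS_{\text{within}} = \sum^{a}\sum^{n}(Y - \bar{Y})^2 = \sum^{a}\left[\sum^{n}Y^2 - \frac{1}{n}\left(\sum^{n}Y\right)^2\right] = \sum^{a}\sum^{n}Y^2 - \frac{1}{n}\sum^{a}\left(\sum^{n}Y\right)^2$$

The total sum of squares represents

$$SS_{\text{total}} = \sum^{a}\sum^{n}(Y - \bar{\bar{Y}})^2 = \sum^{a}\sum^{n}Y^2 - \frac{1}{an}\left(\sum^{a}\sum^{n}Y\right)^2$$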
We now copy the formulas for these sums of squares, slightly rearranged as
follows:

$$SS_{\text{groups}} = \frac{1}{n}\sum^{a}\left(\sum^{n}Y\right)^2 - \frac{1}{an}\left(\sum^{a}\sum^{n}Y\right)^2$$

$$SS_{\text{within}} = -\frac{1}{n}\sum^{a}\left(\sum^{n}Y\right)^2 + \sum^{a}\sum^{n}Y^2$$

$$SS_{\text{total}} = \sum^{a}\sum^{n}Y^2 - \frac{1}{an}\left(\sum^{a}\sum^{n}Y\right)^2$$

Adding the expression for $SS_{\text{groups}}$ to that for $SS_{\text{within}}$, we obtain a quantity that
is identical to the one we have just developed as $SS_{\text{total}}$. This demonstration
explains why the sums of squares are additive.
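The three computational formulas translate directly into a short routine. The following is a minimal sketch, assuming equal sample sizes in all groups; the function name and the small data set of three groups of three items are made up for illustration and are not taken from the tables of this chapter.

```python
def sums_of_squares(groups):
    """Computational formulas for SS among groups, SS within groups, and total SS.

    Assumes every group holds the same number n of items.
    """
    a = len(groups)
    n = len(groups[0])
    grand_total = sum(sum(group) for group in groups)              # double sum of Y
    sum_of_squared_items = sum(y * y for group in groups for y in group)
    sum_of_squared_group_totals = sum(sum(group) ** 2 for group in groups)
    correction_term = grand_total ** 2 / (a * n)                   # (sum sum Y)^2 / an

    ss_groups = sum_of_squared_group_totals / n - correction_term
    ss_within = sum_of_squared_items - sum_of_squared_group_totals / n
    ss_total = sum_of_squared_items - correction_term
    return ss_groups, ss_within, ss_total


# Hypothetical data: a = 3 groups of n = 3 items each.
data = [[1, 2, 3], [4, 5, 6], [7, 8, 12]]
ss_groups, ss_within, ss_total = sums_of_squares(data)
print(ss_groups, ss_within, ss_total)  # 74.0 18.0 92.0
```

The term $\frac{1}{n}\sum^{a}(\sum^{n}Y)^2$ enters $SS_{\text{groups}}$ with a plus sign and $SS_{\text{within}}$ with a minus sign, so it cancels when the two are added, leaving $SS_{\text{total}}$.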
We shall not go through any derivation, but simply state that the degrees
of freedom pertaining to the sums of squares are also additive. The total degrees
of freedom are split up into the degrees of freedom corresponding to variation
among groups and those of variation of items within groups.
Before we continue, let us review the meaning of the three mean squares
in the anova. The total MS is a statistic of dispersion of the 35 ($an$) items around
their mean, the grand mean 45.34. It describes the variance in the entire sample
due to all the sundry causes and estimates $\sigma^2$ when there are no added treatment
effects or variance components among groups. The within-group MS, also
known as the individual or intragroup or error mean square, gives the average
dispersion of the 5 ($n$) items in each group around the group means. If the $a$
groups are random samples from a common homogeneous population, the
within-group MS should estimate $\sigma^2$. The MS among groups is based on the
variance of group means, which describes the dispersion of the 7 ($a$) group
means around the grand mean. If the groups are random samples from a
homogeneous population, the expected variance of their mean will be $\sigma^2/n$. Therefore,
in order to have all three variances of the same order of magnitude, we multiply
the variance of means by $n$ to obtain the variance among groups. If there are
no added treatment effects or variance components, the MS among groups is
an estimate of $\sigma^2$. Otherwise, it is an estimate of

$$\sigma^2 + \frac{n}{a-1}\sum^{a}\alpha^2 \qquad \text{or} \qquad \sigma^2 + n\sigma_A^2$$

depending on whether the anova at hand is Model I or II.
The additivity relations we have just learned are independent of the presence
of added treatment or random effects. We could show this algebraically, but
it is simpler to inspect Table 7.6, which summarizes the anova of Table 7.3 in
which $\alpha_i$ or $A_i$ is added to each sample. The additivity relation still holds,
although the values for group SS and the total SS are different from those of
Table 7.5.
TABLE 7.6
Anova table for data in Table 7.3.

                               (1)                    (2)    (3)               (4)
                               Source of variation    df     Sum of squares    Mean square
                                                             SS                MS
$\bar{Y} - \bar{\bar{Y}}$      Among groups            6     503.086           83.848
$Y - \bar{Y}$                  Within groups          28     448.800           16.029
$Y - \bar{\bar{Y}}$            Total                  34     951.886           27.997
Another way of looking at the partitioning of the variation is to study the
deviation from means in a particular case. Referring to Table 7.1, we can look
at the wing length of the first individual in the seventh group, which happens
to be 41. Its deviation from its group mean is

$$Y_{71} - \bar{Y}_7 = 41 - 45.4 = -4.4$$

The deviation of the group mean from the grand mean is

$$\bar{Y}_7 - \bar{\bar{Y}} = 45.4 - 45.34 = 0.06$$

and the deviation of the individual wing length from the grand mean is

$$Y_{71} - \bar{\bar{Y}} = 41 - 45.34 = -4.34$$

Note that these deviations are additive. The deviation of the item from the group
mean and that of the group mean from the grand mean add to the total
deviation of the item from the grand mean. These deviations are stated algebraically
as $(Y - \bar{Y}) + (\bar{Y} - \bar{\bar{Y}}) = (Y - \bar{\bar{Y}})$. Squaring and summing these deviations for $an$
items will result in

$$\sum^{a}\sum^{n}(Y - \bar{Y})^2 + n\sum^{a}(\bar{Y} - \bar{\bar{Y}})^2 = \sum^{a}\sum^{n}(Y - \bar{\bar{Y}})^2$$
Before squaring, the deviations were in the relationship $a + b = c$. After
squaring, we would expect them to take the form $a^2 + b^2 + 2ab = c^2$. What happened
to the cross-product term corresponding to $2ab$? This is

$$2\sum^{a}\sum^{n}(Y - \bar{Y})(\bar{Y} - \bar{\bar{Y}}) = 2\sum^{a}\left[(\bar{Y} - \bar{\bar{Y}})\sum^{n}(Y - \bar{Y})\right]$$

a covariance-type term that is always zero, since $\sum^{n}(Y - \bar{Y}) = 0$ for each of the
$a$ groups (proof in Appendix A1.1).
We identify the deviations represented by each level of variation at the left
margins of the tables giving the analysis of variance results (Tables 7.5 and 7.6).
Note that the deviations add up correctly: the deviation among groups plus
the deviation within groups equals the total deviation of items in the analysis
of variance, $(\bar{Y} - \bar{\bar{Y}}) + (Y - \bar{Y}) = (Y - \bar{\bar{Y}})$.
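Both points, the additivity of the deviations for a single item and the vanishing of the cross-product term, can be illustrated in a few lines. This is a minimal sketch: the first computation uses the wing length 41, the group mean 45.4, and the grand mean 45.34 quoted in the text, while the second uses a small made-up data set; the function names are ours.

```python
def deviation_parts(y, group_mean, grand_mean):
    """Deviations of one item: within group, among groups, and total."""
    within = y - group_mean
    among = group_mean - grand_mean
    total = y - grand_mean
    return within, among, total


def cross_product_term(groups):
    """2 * sum over groups of (group mean - grand mean) * sum of (Y - group mean)."""
    items = [y for group in groups for y in group]
    grand_mean = sum(items) / len(items)
    accumulated = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        accumulated += (group_mean - grand_mean) * sum(y - group_mean for y in group)
    return 2 * accumulated


# First individual of the seventh group in Table 7.1 (values quoted in the text).
within, among, total = deviation_parts(41, 45.4, 45.34)
print(within, among, total)

# Hypothetical data (assumed for illustration): three groups of three items.
data = [[1, 2, 3], [4, 5, 6], [7, 8, 12]]
print(cross_product_term(data))
```

The cross-product term vanishes for any data set, because the deviations of the items from their own group mean sum to zero within every group.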
7.6 Model I anova
An important point to remember is that the basic setup of data, as well as the
actual computation and significance test, in most cases is the same for both
models. The purposes of analysis of variance differ for the two models. So do
some of the supplementary tests and computations following the initial
significance test.
Let us now try to resolve the variation found in an analysis of variance
case. This will not only lead us to a more formal interpretation of anova but
will also give us a deeper understanding of the nature of variation itself. For
purposes of discussion, we return to the housefly wing lengths of Table 7.3. We
ask the question, What makes any given housefly wing length assume the value
it does? The third wing length of the first sample of flies is recorded as 43 units.
How can we explain such a reading?
If we knew nothing else about this individual housefly, our best guess of
its wing length would be the grand mean of the population, which we know
to be $\mu = 45.5$. However, we have additional information about this fly. It is a
member of group 1, which has undergone a treatment shifting the mean of the
group downward by 5 units. Therefore, $\alpha_1 = -5$, and we would expect our
individual $Y_{13}$ (the third individual of group 1) to measure $45.5 - 5 = 40.5$ units.
In fact, however, it is 43 units, which is 2.5 units above this latest expectation.
To what can we ascribe this deviation? It is individual variation of the flies
within a group because of the variance of individuals in the population
($\sigma^2 = 15.21$). All the genetic and environmental effects that make one housefly
different from another housefly come into play to produce this variance.
By means of carefully designed experiments, we might learn something
about the causation of this variance and attribute it to certain specific genetic
or environmental factors. We might also be able to eliminate some of the
variance. For instance, by using only full sibs (brothers and sisters) in any one
culture jar, we would decrease the genetic variation in individuals, and
undoubtedly the variance within groups would be smaller. However, it is hopeless
to try to eliminate all variance completely. Even if we could remove all genetic
variance, there would still be environmental variance. And even in the most
improbable case in which we could remove both types of variance, measurement
error would remain, so that we would never obtain exactly the same reading
even on the same individual fly. The within-groups MS always remains as a
residual, greater or smaller from experiment to experiment, part of the nature
of things. This is why the within-groups variance is also called the error variance
or error mean square. It is not an error in the sense of our making a mistake,
but in the sense of a measure of the variation you have to contend with when
trying to estimate significant differences among the groups. The error variance
is composed of individual deviations for each individual, symbolized by $\epsilon_{ij}$, the
random component of the $j$th individual variate in the $i$th group. In our case,
$\epsilon_{13} = 2.5$, since the actual observed value is 2.5 units above its expectation
of 40.5.
We shall now state this relationship more formally. In a Model I analysis
of variance we assume that the differences among group means, if any, are due
to the fixed treatment effects determined by the experimenter. The purpose of
the analysis of variance is to estimate the true differences among the group
means. Any single variate can be decomposed as follows:

$$Y_{ij} = \mu + \alpha_i + \epsilon_{ij} \qquad (7.2)$$

where $i = 1, \ldots, a$; $j = 1, \ldots, n$; and $\epsilon_{ij}$ represents an independent, normally
distributed variable with mean $\bar{\epsilon}_{ij} = 0$ and variance $\sigma_{\epsilon}^2 = \sigma^2$. Therefore, a given
reading is composed of the grand mean $\mu$ of the population, a fixed deviation
$\alpha_i$ of the mean of group $i$ from the grand mean $\mu$, and a random deviation $\epsilon_{ij}$
of the $j$th individual of group $i$ from its expectation, which is $(\mu + \alpha_i)$. Remember
that both $\alpha_i$ and $\epsilon_{ij}$ can be positive as well as negative. The expected value (mean)
of the $\epsilon_{ij}$'s is zero, and their variance is the parametric variance of the
population, $\sigma^2$. For all the assumptions of the analysis of variance to hold, the
distribution of $\epsilon_{ij}$ must be normal.
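Expression (7.2) can be applied to the single reading discussed above. The sketch below is a minimal illustration; the function name is ours, and the values $\mu = 45.5$, $\alpha_1 = -5$, and $Y_{13} = 43$ are those quoted in the text.

```python
def decompose_model_i(y, mu, alpha_i):
    """Split a reading into its expectation (mu + alpha_i) and its random deviation."""
    expectation = mu + alpha_i
    epsilon = y - expectation
    return expectation, epsilon


# Third individual of group 1 in Table 7.3, as described in the text.
mu = 45.5
alpha_1 = -5.0
y_13 = 43.0

expectation, epsilon_13 = decompose_model_i(y_13, mu, alpha_1)
print(expectation, epsilon_13)  # 40.5 2.5

# Reassembling the three parts recovers the observed reading.
print(mu + alpha_1 + epsilon_13)  # 43.0
```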
In a Model I anova we test for differences of the type $\alpha_1 - \alpha_2$ among the
group means by testing for the presence of an added component due to
treatments. If we find that such a component is present, we reject the null hypothesis
that the groups come from the same population and accept the alternative
hypothesis that at least some of the group means are different from each other,
which indicates that at least some of the $\alpha_i$'s are unequal in magnitude. Next,
we generally wish to test which $\alpha_i$'s are different from each other. This is done
by significance tests, with alternative hypotheses such as $H_1: \alpha_1 > \alpha_2$ or
$H_1: \frac{1}{2}(\alpha_1 + \alpha_2) > \alpha_3$. In words, these test whether the mean of group 1 is greater
than the mean of group 2, or whether the mean of group 3 is smaller than the
average of the means of groups 1 and 2.
Some examples of Model I analyses of variance in various biological
disciplines follow. An experiment in which we try the effects of different drugs
on batches of animals results in a Model I anova. We are interested in the results
of the treatments and the differences between them. The treatments are fixed
and determined by the experimenter. This is true also when we test the effects
of different doses of a given factor: a chemical, or the amount of light to which
a plant has been exposed, or temperatures at which culture bottles of insects have
been reared. The treatment does not have to be entirely understood and
manipulated by the experimenter. So long as it is fixed and repeatable, Model I will
apply.
If we wanted to compare the birth weights of the Chinese children in the
hospital in Singapore with weights of Chinese children born in a hospital in
China, our analysis would also be a Model I anova. The treatment effects then
would be "China versus Singapore," which sums up a whole series of different
factors, genetic and environmental, some known to us but most of them not
understood. However, this is a definite treatment we can describe and also
repeat: we can, if we wish, again sample birth weights of infants in Singapore
as well as in China.
Another example of Model I anova would be a study of body weights for
animals of several age groups. The treatments would be the ages, which are
fixed. If we find that there are significant differences in weight among the ages,
we might proceed with the question of whether there is a difference from age 2 to
age 3 or only from age 1 to age 2.
To a very large extent, Model I anovas are the result of an experiment and
of deliberate manipulation of factors by the experimenter. However, the study
of differences such as the comparison of birth weights from two countries, while
not an experiment proper, also falls into this category.
7.7 Model II anova
The structure of variation in a Model II anova is quite similar to that in
Model I:

$$Y_{ij} = \mu + A_i + \epsilon_{ij} \qquad (7.3)$$

where $i = 1, \ldots, a$; $j = 1, \ldots, n$; $\epsilon_{ij}$ represents an independent, normally
distributed variable with mean $\bar{\epsilon}_{ij} = 0$ and variance $\sigma_{\epsilon}^2 = \sigma^2$; and $A_i$ represents
a normally distributed variable, independent of all $\epsilon$'s, with mean $\bar{A}_i = 0$ and
variance $\sigma_A^2$. The main distinction is that in place of fixed-treatment effects $\alpha_i$,
we now consider random effects $A_i$ that differ from group to group. Since the
effects are random, it is uninteresting to estimate the magnitude of these random
effects on a group, or the differences from group to group. But we can estimate
their variance, the added variance component among groups $\sigma_A^2$. We test for its
presence and estimate its magnitude $s_A^2$, as well as its percentage contribution to
the variation in a Model II analysis of variance.
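A small simulation may help make Expression (7.3) concrete. The following is a sketch under stated assumptions only: the parameter values ($\mu = 45.5$, $\sigma^2 = 16$, $\sigma_A^2 = 9$), the very large number of groups, the seed, and all function names are ours, chosen so that the sample mean squares settle close to the quantities they estimate.

```python
import random


def simulate_model_ii(a, n, mu, sigma2, sigma2_A, seed=0):
    """Generate a groups of n items following Y_ij = mu + A_i + epsilon_ij."""
    rng = random.Random(seed)
    groups = []
    for _ in range(a):
        A_i = rng.gauss(0.0, sigma2_A ** 0.5)  # random group effect, variance sigma2_A
        groups.append([mu + A_i + rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)])
    return groups


def mean_squares(groups):
    """Return (MS among groups, MS within groups) for equal sample sizes."""
    a = len(groups)
    n = len(groups[0])
    group_means = [sum(group) / n for group in groups]
    grand_mean = sum(group_means) / a
    ms_among = n * sum((m - grand_mean) ** 2 for m in group_means) / (a - 1)
    ss_within = sum((y - m) ** 2 for group, m in zip(groups, group_means) for y in group)
    ms_within = ss_within / (a * (n - 1))
    return ms_among, ms_within


# Hypothetical parameter values (assumed for illustration only).
n = 5
simulated = simulate_model_ii(a=4000, n=n, mu=45.5, sigma2=16.0, sigma2_A=9.0)
ms_among, ms_within = mean_squares(simulated)
s2_A = (ms_among - ms_within) / n  # estimate of the added variance component

print(ms_within)  # should lie near sigma^2 = 16
print(ms_among)   # should lie near sigma^2 + n * sigma_A^2 = 61
print(s2_A)       # should lie near sigma_A^2 = 9
```

With only a handful of groups, as in a real experiment, the estimate $s_A^2$ would scatter much more widely around $\sigma_A^2$; the large number of groups here serves only to make the expected values visible.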
Some examples will illustrate the applications of Model II anova. Suppose
we wish to determine the DNA content of rat liver cells. We take five rats and
make three preparations from each of the five livers obtained. The assay
readings will be for $a = 5$ groups with $n = 3$ readings per group. The five rats
presumably are sampled at random from the colony available to the experimenter.
They must be different in various ways, genetically and environmentally, but we
have no definite information about the nature of the differences. Thus, if we learn
that rat 2 has slightly more DNA in its liver cells than rat 3, we can do little
with this information, because we are unlikely to have any basis for following
up this problem. We will, however, be interested in estimating the variance of
the three replicates within any one liver and the variance among the five rats;
that is, does variance $\sigma_A^2$ exist among rats in addition to the variance $\sigma^2$ expected
on the basis of the three replicates? The variance among the three preparations
presumably arises only from differences in technique and possibly from
differences in DNA content in different parts of the liver (unlikely in a homogenate).
Added variance among rats, if it existed, might be due to differences in ploidy
or related phenomena. The relative amounts of variation among rats and
"within" rats (= among preparations) would guide us in designing further
studies of this sort. If there was little variance among the preparations and
relatively more variation among the rats, we would need fewer preparations and
more rats. On the other hand, if the variance among rats was proportionately
smaller, we would use fewer rats and more preparations per rat.
In a study of the amount of variation in skin pigment in human populations,
we might wish to study different families within a homogeneous ethnic or racial
group and brothers and sisters within each family. The variance within families
would be the error mean square, and we would test for an added variance
component among families. We would expect an added variance component
$\sigma_A^2$ because there are genetic differences among families that determine amount
of skin pigmentation. We would be especially interested in the relative
proportions of the two variances $\sigma^2$ and $\sigma_A^2$, because they would provide us with
important genetic information. From our knowledge of genetic theory, we
would expect the variance among families to be greater than the variance among
brothers and sisters within a family.
T h e a b o v e examples illustrate the t w o types of p r o b l e m s involving M o d e l
II analysis of variance t h a t a r e m o s t likely to arise in biological w o r k . O n e is
c o n c e r n e d with the general p r o b l e m of the design of a n e x p e r i m e n t a n d the
m a g n i t u d e of the e x p e r i m e n t a l e r r o r at different levels of replication, such as
e r r o r a m o n g replicates within rat livers a n d a m o n g rats, e r r o r a m o n g batches,
experiments, a n d so forth. T h e o t h e r relates t o variation a m o n g a n d within
families, a m o n g a n d within females, a m o n g a n d within p o p u l a t i o n s , a n d so
forth. Such p r o b l e m s are c o n c e r n e d with the general p r o b l e m of the relation
between genetic a n d p h e n o t y p i c variation.
Exercises
7.1
In a study comparing the chemical composition of the urine of chimpanzees
and gorillas (Gartler, Firschein, and Dobzhansky, 1956), the following results
were obtained. For 37 chimpanzees the variance for the amount of glutamic acid
in milligrams per milligram of creatinine was 0.01069. A similar study based on
six gorillas yielded a variance of 0.12442. Is there a significant difference between the variability in chimpanzees and that in gorillas? ANS. $F_s = 11.639$,
$F_{0.025[5,36]} \approx 2.90$.
7.2
The following data are from an experiment by Sewall Wright. He crossed Polish
and Flemish giant rabbits and obtained 27 $F_1$ rabbits. These were inbred and
112 $F_2$ rabbits were obtained. We have extracted the following data on femur
length of these rabbits.
            $n$    $\bar{Y}$    $s$
$F_1$        27     83.39      1.65
$F_2$       112     80.5       3.81
Is there a significantly greater amount of variability in femur lengths among the
$F_2$ than among the $F_1$ rabbits? What well-known genetic phenomenon is illustrated by these data?
7.3
For the following data obtained by a physiologist, estimate $\sigma^2$ (the variance
within groups), $\alpha_i$ (the fixed treatment effects), the variance among the groups,
and the added component due to treatment $\sum \alpha^2/(a - 1)$, and test the hypothesis
that the last quantity is zero.
Treatment    $\bar{Y}$    $s^2$    $n$
A              6.12       2.85     10
B              4.34       6.70     10
C              5.12       4.06     10
D              7.28       2.03     10
ANS. $s^2 = 3.91$, $\hat{\alpha}_1 = 0.405$, $\hat{\alpha}_2 = 1.375$, $\hat{\alpha}_3 = 0.595$, $\hat{\alpha}_4 = 1.565$, MS among
groups = 124.517, and $F_s = 31.846$ (which is significant beyond the 0.01 level).
7.4
For the data in Table 7.3, make tables to represent partitioning of the value of
each variate into its three components, $\bar{\bar{Y}}$, $(\bar{Y}_i - \bar{\bar{Y}})$, $(Y_{ij} - \bar{Y}_i)$. The first table would
then consist of 35 values, all equal to the grand mean. In the second table all
entries in a given column would be equal to the difference between the mean of
that column and the grand mean. And the last table would consist of the deviations of the individual variates from their column means. These tables represent
estimates of the individual components of Expression (7.3). Compute the mean
and sum of squares for each table.
7.5
A geneticist recorded the following measurements taken on two-week-old mice
of a particular strain. Is there evidence that the variance among mice in different
litters is larger than one would expect on the basis of the variability found within
each litter?
                          Litters
   1       2       3       4       5       6       7
 19.49   22.94   23.06   15.90   16.72   20.00   21.52
 20.62   22.15   20.05   21.48   19.22   19.79   20.37
 19.51   19.16   21.47   22.48   26.62   21.15   21.93
 18.09   20.98   14.90   18.79   20.74   14.88   20.14
 22.75   23.13   19.72   19.70   21.82   19.79   22.28
ANS. $s^2 = 5.987$, MS among = 4.416, $s^2_A = 0$, and $F_s = 0.7375$, which is clearly not
significant at the 5% level.
7.6
Show that it is possible to represent the value of an individual variate as follows:
$Y_{ij} = (\bar{\bar{Y}}) + (\bar{Y}_i - \bar{\bar{Y}}) + (Y_{ij} - \bar{Y}_i)$. What does each of the terms in parentheses
estimate in a Model I anova and in a Model II anova?
CHAPTER 8
Single-Classification
Analysis of Variance
We are now ready to study actual cases of analysis of variance in a variety of
applications and designs. The present chapter deals with the simplest kind of
anova, single-classification analysis of variance. By this we mean an analysis in
which the groups (samples) are classified by only a single criterion. Either interpretation of the seven samples of housefly wing lengths studied in the last
chapter, as different medium formulations (Model I) or as progenies of different females (Model II), would represent a single criterion for classification. Other
examples would be different temperatures at which groups of animals were
raised or different soils in which samples of plants have been grown.
We shall start in Section 8.1 by stating the basic computational formulas
for analysis of variance, based on the topics covered in the previous chapter.
Section 8.2 gives an example of the common case with equal sample sizes. We
shall illustrate this case by means of a Model I anova. Since the basic computations for the analysis of variance are the same in either model, it is not
necessary to repeat the illustration with a Model II anova. The latter model is
featured in Section 8.3, which shows the minor computational complications
resulting from unequal sample sizes, since all groups in the anova need not
necessarily have the same sample size. Some computations unique to a Model
II anova are also shown; these estimate variance components. Formulas become especially simple for the two-sample case, as explained in Section 8.4.
In Model I of this case, the mathematically equivalent t test can be applied
as well.
When a Model I analysis of variance has been found to be significant,
leading to the conclusion that the means are not from the same population,
we will usually wish to test the means in a variety of ways to discover which
pairs of means are different from each other and whether the means can be
divided into groups that are significantly different from each other. To this end,
Section 8.5 deals with so-called planned comparisons designed before the test
is run; and Section 8.6, with unplanned multiple-comparison tests that suggest
themselves to the experimenter as a result of the analysis.
8.1 Computational formulas
We saw in Section 7.5 that the total sum of squares and degrees of freedom
can be additively partitioned into those pertaining to variation among groups
and those to variation within groups. For the analysis of variance proper, we
need only the sum of squares among groups and the sum of squares within
groups. But when the computation is not carried out by computer, it is simpler to calculate the total sum of squares and the sum of squares among groups,
leaving the sum of squares within groups to be obtained by the subtraction
$SS_{total} - SS_{groups}$. However, it is a good idea to compute the individual variances so we can check for heterogeneity among them (see Section 10.1). This will
also permit an independent computation of $SS_{within}$ as a check. In Section 7.5
we arrived at the following computational formulas for the total and among-groups sums of squares:
$$SS_{total} = \sum^{a}\sum^{n} Y^2 - \frac{1}{an}\left(\sum^{a}\sum^{n} Y\right)^2$$
$$SS_{groups} = \frac{1}{n}\sum^{a}\left(\sum^{n} Y\right)^2 - \frac{1}{an}\left(\sum^{a}\sum^{n} Y\right)^2$$
These formulas assume equal sample size $n$ for each group and will be modified
in Section 8.3 for unequal sample sizes. However, they suffice in their present
form to illustrate some general points about computational procedures in
analysis of variance.
We note that the second, subtracted term is the same in both sums of
squares. This term can be obtained by summing all the variates in the anova
(this is the grand total), squaring the sum, and dividing the result by the total
number of variates. It is comparable to the second term in the computational
formula for the ordinary sum of squares (Expression (3.8)). This term is often
called the correction term (abbreviated CT).
The first term for the total sum of squares is simple. It is the sum of all
squared variates in the anova table. Thus the total sum of squares, which
describes the variation of a single unstructured sample of $an$ items, is simply
the familiar sum-of-squares formula of Expression (3.8).
The first term of the sum of squares among groups is obtained by squaring
the sum of the items of each group, dividing each square by its sample size,
and summing the quotients from this operation for each group. Since the
sample size of each group is equal in the above formulas, we can first sum all
the squares of the group sums and then divide their sum by the constant $n$.
From the formula for the sum of squares among groups emerges an important computational rule of analysis of variance: To find the sum of squares
among any set of groups, square the sum of each group and divide by the sample
size of the group; sum the quotients of these operations and subtract from the sum
a correction term. To find this correction term, sum all the items in the set, square
the sum, and divide it by the number of items on which this sum is based.
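The computational rule can be written out in a few lines. The following is a minimal Python sketch, not part of the original text; the function names and the small toy data set are hypothetical and serve only to show the order of operations.

```python
def correction_term(groups):
    """Grand total squared, divided by the total number of items (CT)."""
    items = [y for group in groups for y in group]
    return sum(items) ** 2 / len(items)


def ss_among_groups(groups):
    """Square each group sum, divide by that group's size, add up, subtract CT."""
    quotients = [sum(group) ** 2 / len(group) for group in groups]
    return sum(quotients) - correction_term(groups)


def ss_total(groups):
    """Sum of all squared variates minus the same correction term."""
    return sum(y * y for group in groups for y in group) - correction_term(groups)


# Hypothetical toy data: two groups of three items each.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
print("SS among groups:", ss_among_groups(toy_groups))
print("SS within groups:", ss_total(toy_groups) - ss_among_groups(toy_groups))
```

For the toy data the group means are 2 and 4 around a grand mean of 3, so the sum of squares among groups is 3 x (1 + 1) = 6 and the remainder within groups is 10. Because each quotient uses its own group size, the same function also covers unequal sample sizes.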
8.2 Equal n
We shall illustrate a single-classification anova with equal sample sizes by a
Model I example. The computation up to and including the first test of significance is identical for both models. Thus, the computation of Box 8.1 could also
serve for a Model II anova with equal sample sizes.
The data are from an experiment in plant physiology. They are the lengths
in coded units of pea sections grown in tissue culture with auxin present. The
purpose of the experiment was to test the effects of the addition of various
sugars on growth as measured by length. Four experimental groups, representing three different sugars and one mixture of sugars, were used, plus one control
without sugar. Ten observations (replicates) were made for each treatment. The
term "treatment" already implies a Model I anova. It is obvious that the five
groups do not represent random samples from all possible experimental conditions but were deliberately designed to test the effects of certain sugars on the
growth rate. We are interested in the effect of the sugars on length, and our null
hypothesis will be that there is no added component due to treatment effects
among the five groups; that is, the population means are all assumed to be equal.
The computation is illustrated in Box 8.1. After quantities 1 through 7 have
been calculated, they are entered into an analysis-of-variance table, as shown
in the box. General formulas for such a table are shown first; these are followed
by a table filled in for the specific example. We note 4 degrees of freedom among
groups, there being five treatments, and 45 df within groups, representing 5
times (10 - 1) degrees of freedom. We find that the mean square among groups
is considerably greater than the error mean square, giving rise to a suspicion
that an added component due to treatment effects is present. If the $MS_{groups}$ is
equal to or less than the $MS_{within}$, we do not bother going on with the analysis,
for we would not have evidence for the presence of an added component. You
may wonder how it could be possible for the $MS_{groups}$ to be less than the
$MS_{within}$. You must remember that these two are independent estimates. If there
is no added component due to treatment or variance component among groups,
the estimate of the variance among groups is as likely to be less as it is to be
greater than the variance within groups.
Expressions for the expected values of the mean squares are also shown
in the first anova table of Box 8.1. They are the expressions you learned in the
previous chapter for a Model I anova.
BOX 8.1
Single-classification anova with equal sample sizes.
The effect of the addition of different sugars on length, in ocular units
(x 0.114 = mm), of pea sections grown in tissue culture with auxin present: n = 10
(replications per group). This is a Model I anova.

                                    Treatments (a = 5)
Observations,                  2% Glucose   2% Fructose   1% Glucose +         2% Sucrose
i.e., replications   Control     added         added      1% Fructose added      added
        1              75          57            58              58                62
        2              67          58            61              59                66
        3              70          60            56              58                65
        4              75          59            58              61                63
        5              65          62            57              57                64
        6              71          60            56              56                62
        7              67          60            61              58                65
        8              67          57            60              57                65
        9              76          59            57              57                62
       10              68          61            58              59                67
$\sum^{n} Y$          701         593           582             580               641
$\bar{Y}$            70.1        59.3          58.2            58.0              64.1

Source: Data by W. Purves.

Preliminary computations
1. Grand total = $\sum^{a}\sum^{n} Y = 701 + 593 + \cdots + 641 = 3097$
2. Sum of the squared observations
   $= \sum^{a}\sum^{n} Y^2 = 75^2 + 67^2 + \cdots + 68^2 + 57^2 + \cdots + 67^2 = 193{,}151$
3. Sum of the squared group totals divided by $n$
   $= \frac{1}{n}\sum^{a}\left(\sum^{n} Y\right)^2 = \frac{1}{10}(701^2 + 593^2 + \cdots + 641^2) = \frac{1}{10}(1{,}929{,}055) = 192{,}905.50$
4. Grand total squared and divided by total sample size = correction term
   $CT = \frac{1}{an}\left(\sum^{a}\sum^{n} Y\right)^2 = \frac{(3097)^2}{5 \times 10} = \frac{9{,}591{,}409}{50} = 191{,}828.18$
5. $SS_{total} = \sum^{a}\sum^{n} Y^2 - CT$
   = quantity 2 - quantity 4 = 193,151 - 191,828.18 = 1322.82
6. $SS_{groups} = \frac{1}{n}\sum^{a}\left(\sum^{n} Y\right)^2 - CT$
   = quantity 3 - quantity 4 = 192,905.50 - 191,828.18 = 1077.32
7. $SS_{within} = SS_{total} - SS_{groups}$
   = quantity 5 - quantity 6 = 1322.82 - 1077.32 = 245.50

The anova table is constructed as follows.

Source of variation                        df         SS    MS                $F_s$                     Expected MS
$\bar{Y} - \bar{\bar{Y}}$  Among groups    a - 1      6     6/(a - 1)         $MS_{groups}/MS_{within}$  $\sigma^2 + \frac{n}{a-1}\sum^{a}\alpha^2$
$Y - \bar{Y}$  Within groups               a(n - 1)   7     7/[a(n - 1)]                                 $\sigma^2$
$Y - \bar{\bar{Y}}$  Total                 an - 1     5

(Entries in the SS and MS columns refer to the numbered quantities above.)

Substituting the computed values into the above table, we obtain the following:

Anova table
Source of variation                                             df      SS        MS       $F_s$
$\bar{Y} - \bar{\bar{Y}}$  Among groups (among treatments)       4    1077.32   269.33   49.33**
$Y - \bar{Y}$  Within groups (error, replicates)                45     245.50     5.46
$Y - \bar{\bar{Y}}$  Total                                      49    1322.82

$F_{0.05[4,45]} = 2.58$     $F_{0.01[4,45]} = 3.77$
* = $0.01 < P \le 0.05$.
** = $P \le 0.01$.
These conventions will be followed throughout the text and will no longer be explained in subsequent
boxes and tables.

Conclusions. There is a highly significant ($P \ll 0.01$) added component due to
treatment effects in the mean square among groups (treatments). The different
sugar treatments clearly have a significant effect on growth of the pea sections.
See Sections 8.5 and 8.6 for the completion of a Model I analysis of variance:
that is, the method for determining which means are significantly different from
each other.
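The seven quantities of Box 8.1 are easy to check by machine. The sketch below is an illustration added here, not part of the original box; the variable names are arbitrary, and the assignment of observations to treatments assumes each treatment column of the box is read top to bottom (an assumption that reproduces the column totals 701, 593, 582, 580, and 641).

```python
# Pea-section lengths (ocular units) by treatment, as listed in Box 8.1.
treatments = {
    "control": [75, 67, 70, 75, 65, 71, 67, 67, 76, 68],
    "2% glucose": [57, 58, 60, 59, 62, 60, 60, 57, 59, 61],
    "2% fructose": [58, 61, 56, 58, 57, 56, 61, 60, 57, 58],
    "1% glucose + 1% fructose": [58, 59, 58, 61, 57, 56, 58, 57, 57, 59],
    "2% sucrose": [62, 66, 65, 63, 64, 62, 65, 65, 62, 67],
}
a = len(treatments)  # number of groups
n = 10               # replicates per group (equal sample sizes)

grand_total = sum(sum(g) for g in treatments.values())               # quantity 1
sum_sq_obs = sum(y * y for g in treatments.values() for y in g)      # quantity 2
sum_sq_totals = sum(sum(g) ** 2 for g in treatments.values()) / n    # quantity 3
correction_term = grand_total ** 2 / (a * n)                         # quantity 4

ss_total = sum_sq_obs - correction_term                              # quantity 5
ss_groups = sum_sq_totals - correction_term                          # quantity 6
ss_within = ss_total - ss_groups                                     # quantity 7

ms_groups = ss_groups / (a - 1)
ms_within = ss_within / (a * (n - 1))
f_s = ms_groups / ms_within

print(f"SS total  = {ss_total:.2f}")
print(f"SS groups = {ss_groups:.2f}, MS groups = {ms_groups:.2f}")
print(f"SS within = {ss_within:.2f}, MS within = {ms_within:.2f}")
print(f"Fs = {f_s:.2f}")
```

Carrying the unrounded error mean square (about 5.456) gives a variance ratio near 49.37; the 49.33 shown in the box corresponds to dividing 269.33 by the rounded 5.46. The difference has no bearing on the conclusion.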
It may seem that we are carrying an unnecessary number of digits in the
computations in Box 8.1. This is often necessary to ensure that the error sum
of squares, quantity 7, has sufficient accuracy.
Since $\nu_2$ is relatively large, the critical values of F have been computed by
harmonic interpolation in Table V (see footnote to Table III for harmonic
interpolation). The critical values have been given here only to present a complete record of the analysis. Ordinarily, when confronted with this example, you
would not bother working out these values of F. Comparison of the observed
variance ratio $F_s = 49.33$ with $F_{0.01[4,40]} = 3.83$, the conservative critical value
(the next tabled F with fewer degrees of freedom), would convince you that the
null hypothesis should be rejected. The probability that the five groups differ as
much as they do by chance is almost infinitesimally small. Clearly, the sugars
produce an added treatment effect, apparently inhibiting growth and consequently reducing the length of the pea sections.
At this stage we are not in a position to say whether each treatment is
different from every other treatment, or whether the sugars are different from the
control but not different from each other. Such tests are necessary to complete
a Model I analysis, but we defer their discussion until Sections 8.5 and 8.6.
8.3 Unequal n
This time we shall use a Model II analysis of variance for an example. Remember
that up to and including the F test for significance, the computations are exactly
the same whether the anova is based on Model I or Model II. We shall point
out the stage in the computations at which there would be a divergence of
operations depending on the model.
The example is shown in Table 8.1. It concerns a series of morphological
measurements of the width of the scutum (dorsal shield) of samples of tick
larvae obtained from four different host individuals of the cottontail rabbit.
These four hosts were obtained at random from one locality. We know nothing
about their origins or their genetic constitution. They represent a random
sample of the population of host individuals from the given locality. We would
not be in a position to interpret differences between larvae from different hosts,
since we know nothing of the origins of the individual rabbits. Population
biologists are nevertheless interested in such analyses because they provide an
answer to the following question: Are the variances of means of larval characters
among hosts greater than expected on the basis of variances of the characters
within hosts? We can calculate the average variance of width of larval scutum
on a host. This will be our "error" term in the analysis of variance. We then
test the observed mean square among groups and see if it contains an added
component of variance. What would such an added component of variance
represent? The mean square within host individuals (that is, of larvae on any
one host) represents genetic differences among larvae and differences in environmental experiences of these larvae. Added variance among hosts demonstrates
significant differentiation among the larvae possibly due to differences among
the hosts affecting the larvae. It also may be due to genetic differences among
TABLE 8.1
Data and anova table for a single classification anova with unequal sample sizes. Width of scutum
(dorsal shield) of larvae of the tick Haemaphysalis leporispalustris in samples from 4 cottontail
rabbits. Measurements in microns. This is a Model II anova.

                                Hosts (a = 4)
                       1            2            3            4
                      380          350          354          376
                      376          356          360          344
                      360          358          362          342
                      368          376          352          372
                      372          338          366          374
                      366          342          372          360
                      374          366          362
                      382          350          344
                                   344          342
                                   364          358
                                                351
                                                348
                                                348
$\sum^{n_i} Y$       2978         3544         4619         2168
$n_i$                   8           10           13            6
$\sum^{n_i} Y^2$  1,108,940    1,257,272    1,642,121      784,536
$s^2$                54.21       142.04        79.56       233.07

Source: Data by P. A. Thomas.

Anova table
Source of variation                                                        df      SS       MS      $F_s$
$\bar{Y} - \bar{\bar{Y}}$  Among groups (among hosts)                       3    1807.7    602.6    5.26**
$Y - \bar{Y}$  Within groups (error; among larvae on a host)               33    3778.0    114.5
$Y - \bar{\bar{Y}}$  Total                                                 36    5585.7

$F_{0.05[3,33]} = 2.89$     $F_{0.01[3,33]} = 4.44$

Conclusion. There is a significant ($P < 0.01$) added variance component among
hosts for width of scutum in larval ticks.
the larvae, should each host carry a family of ticks, or at least a population
whose individuals are more related to each other than they are to tick larvae
on other host individuals.
The emphasis in this example is on the magnitudes of the variances. In view
of the random choice of hosts this is a clear case of a Model II anova. Because
this is a Model II anova, the means for each host have been omitted from
Table 8.1. We are not interested in the individual means or possible differences
among them. The one occasion for looking at the means would be at the beginning of the analysis: one might wish to inspect the group means to spot outliers,
which might represent readings that for a variety of reasons could be in error.
The computation follows the outline furnished in Box 8.1, except that the
symbol $\sum^{n}$ now needs to be written $\sum^{n_i}$, since sample sizes differ for each group.
Steps 1, 2, and 4 through 7 are carried out as before. Only step 3 needs to be
modified appreciably. It is:
3. Sum of the squared group totals, each divided by its sample size,
$$= \sum^{a} \frac{\left(\sum^{n_i} Y\right)^2}{n_i}$$
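The modified step 3 and the rest of the computation can be followed on the tick data. The following Python sketch is an added illustration rather than the authors' own procedure; the variable names are arbitrary, and the assignment of widths to hosts assumes each host column of Table 8.1 is read top to bottom (an assumption that reproduces the tabled sums 2978, 3544, 4619, and 2168).

```python
# Scutum widths (microns) of tick larvae, one list per host rabbit (Table 8.1).
hosts = [
    [380, 376, 360, 368, 372, 366, 374, 382],
    [350, 356, 358, 376, 338, 342, 366, 350, 344, 364],
    [354, 360, 362, 352, 366, 372, 362, 344, 342, 358, 351, 348, 348],
    [376, 344, 342, 372, 374, 360],
]
a = len(hosts)                         # number of groups
total_n = sum(len(h) for h in hosts)   # total number of variates

grand_total = sum(sum(h) for h in hosts)                 # step 1
sum_sq_obs = sum(y * y for h in hosts for y in h)        # step 2
# Step 3 for unequal n: each squared group total is divided by its own n_i.
sum_sq_totals = sum(sum(h) ** 2 / len(h) for h in hosts)
correction_term = grand_total ** 2 / total_n             # step 4

ss_total = sum_sq_obs - correction_term                  # step 5
ss_groups = sum_sq_totals - correction_term              # step 6
ss_within = ss_total - ss_groups                         # step 7

ms_groups = ss_groups / (a - 1)
ms_within = ss_within / (total_n - a)   # df within = sum of (n_i - 1)
f_s = ms_groups / ms_within

print(f"SS groups = {ss_groups:.1f}, MS groups = {ms_groups:.1f}")
print(f"SS within = {ss_within:.1f}, MS within = {ms_within:.1f}")
print(f"Fs = {f_s:.2f}")
```

Run on the listed observations, this gives a sum of squares among hosts of about 1807.7 and within hosts of about 3778.0, which lead to the tabled mean squares of 602.6 and 114.5 and to a variance ratio of 5.26.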
The critical 5% and 1% values of F are shown below the anova table in
Table 8.1 (2.89 and 4.44, respectively). You should confirm them for yourself
in Table V. Note that the argument $\nu_2 = 33$ is not given. You therefore have
to interpolate between arguments representing 30 and 40 degrees of freedom,
respectively. The values shown were computed using harmonic interpolation.
However, again, it was not necessary to carry out such an interpolation. The
conservative value of F, $F_{\alpha[3,30]}$, is 2.92 and 4.51, for $\alpha = 0.05$ and $\alpha = 0.01$,
respectively. The observed value $F_s$ is 5.26, considerably above the interpolated
as well as the conservative value of $F_{0.01}$. We therefore reject the null hypothesis
($H_0: \sigma^2_A = 0$) that there is no added variance component among groups and that
the two mean squares estimate the same variance, allowing a type I error of less
than 1%. We accept, instead, the alternative hypothesis of the existence of an
added variance component $\sigma^2_A$.
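For readers who want to see where 2.89 and 4.44 come from, here is a small sketch of harmonic interpolation, taken here to mean linear interpolation in the reciprocal of the degrees of freedom. The function name is hypothetical. The tabled values for 30 degrees of freedom (2.92 and 4.51) are those quoted above; the values for 40 degrees of freedom (2.84 and 4.31) are not given in the text and are assumed from a standard F table.

```python
def harmonic_interpolate(df, df_low, f_low, df_high, f_high):
    """Interpolate a tabled F value linearly in 1/df between two tabled arguments."""
    fraction = (1 / df_low - 1 / df) / (1 / df_low - 1 / df_high)
    return f_low + fraction * (f_high - f_low)


# nu_2 = 33 lies between the tabled arguments 30 and 40 (nu_1 = 3).
# The 40-df entries 2.84 and 4.31 are assumed standard-table values.
f_05 = harmonic_interpolate(33, 30, 2.92, 40, 2.84)
f_01 = harmonic_interpolate(33, 30, 4.51, 40, 4.31)

print(f"F 0.05[3,33] is approximately {f_05:.2f}")
print(f"F 0.01[3,33] is approximately {f_01:.2f}")
```

Under these assumptions the interpolated values round to the 2.89 and 4.44 shown beneath the anova table.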
What is the biological meaning of this conclusion? For some reason, the
ticks on different host individuals differ more from each other than do individual
ticks on any one host. This may be due to some modifying influence of individual hosts on the ticks (biochemical differences in blood, differences in the skin,
differences in the environment of the host individual, all of them rather unlikely in this case), or it may be due to genetic differences among the ticks.
Possibly the ticks on each host represent a sibship (that is, are descendants of a
single pair of parents) and the differences in the ticks among host individuals
represent genetic differences among families; or perhaps selection has acted differently on the tick populations on each host, or the hosts have migrated to the
collection locality from different geographic areas in which the ticks differ in
width of scutum. Of these various possibilities, genetic differences among sibships seem most reasonable, in view of the biology of the organism.
The computations up to this point would have been identical in a Model I
anova. If this had been Model I, the conclusion would have been that there
is a significant treatment effect rather than an added variance component. Now,
however, we must complete the computations appropriate to a Model II anova.
These will include the estimation of the added variance component and the
calculation of percentage variation at the two levels.
Since sample size $n_i$ differs among groups in this example, we cannot write
$\sigma^2 + n\sigma^2_A$ for the expected $MS_{groups}$. It is obvious that no single value of $n$ would
be appropriate in the formula. We therefore use an average $n$; this, however,
is not simply $\bar{n}$, the arithmetic mean of the $n_i$'s, but is
$$n_0 = \frac{1}{a-1}\left(\sum^{a} n_i - \frac{\sum^{a} n_i^2}{\sum^{a} n_i}\right) \tag{8.1}$$
which is an average usually close to but always less than $\bar{n}$, unless sample sizes
are equal, in which case $n_0 = \bar{n}$. In this example,
$$n_0 = \frac{1}{4-1}\left[(8 + 10 + 13 + 6) - \frac{8^2 + 10^2 + 13^2 + 6^2}{8 + 10 + 13 + 6}\right] = 9.009$$
Since the Model II expected $MS_{groups}$ is $\sigma^2 + n\sigma^2_A$ and the expected $MS_{within}$ is
$\sigma^2$, it is obvious how the variance component among groups $\sigma^2_A$ and the error
variance $\sigma^2$ are obtained. Of course, the values that we obtain are sample estimates and therefore are written as $s^2_A$ and $s^2$. The added variance component $s^2_A$
is estimated as $(MS_{groups} - MS_{within})/n$. Whenever sample sizes are unequal, the
denominator becomes $n_0$. In this example, (602.7 - 114.5)/9.009 = 54.190. We
are frequently not so much interested in the actual values of these variance components as in their relative magnitudes. For this purpose we sum the components and express each as a percentage of the resulting sum. Thus $s^2 + s^2_A$ =
114.5 + 54.190 = 168.690, and $s^2$ and $s^2_A$ are 67.9% and 32.1% of this sum, respectively; relatively more variation occurs within groups (larvae on a host)
than among groups (larvae on different hosts).
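The estimation of the variance components can be sketched in a few lines of Python. This is an added illustration with hypothetical names; the mean squares 602.7 and 114.5 are simply the values used in the computation above.

```python
def average_sample_size(sizes):
    """n_0 of Expression (8.1): the average n appropriate for unequal sample sizes."""
    a = len(sizes)
    total = sum(sizes)
    return (total - sum(n * n for n in sizes) / total) / (a - 1)


sizes = [8, 10, 13, 6]             # larvae measured on each of the four hosts
n0 = average_sample_size(sizes)

ms_groups = 602.7                  # mean square among hosts, as used in the text
ms_within = 114.5                  # mean square within hosts

s2_within = ms_within                      # estimate of sigma^2
s2_among = (ms_groups - ms_within) / n0    # estimate of sigma^2_A

total_variance = s2_within + s2_among
pct_within = 100 * s2_within / total_variance
pct_among = 100 * s2_among / total_variance

print(f"n0 = {n0:.3f}")
print(f"added variance component = {s2_among:.3f}")
print(f"within hosts: {pct_within:.1f}%, among hosts: {pct_among:.1f}%")
```

With equal sample sizes the function returns the common n, for example 10 for three groups of 10, in agreement with the remark following Expression (8.1).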
8.4 Two groups
A frequent test in statistics is to establish the significance of the difference
between two means. This can easily be done by means of an analysis of variance
for two groups. Box 8.2 shows this procedure for a Model I anova, the common
case.
The example in Box 8.2 concerns the onset of reproductive maturity in
water fleas, Daphnia longispina. This is measured as the average age (in days)
at beginning of reproduction. Each variate in the table is in fact an average,
and a possible flaw in the analysis might be that the averages are not based
on equal sample sizes. However, we are not given this information and have
to proceed on the assumption that each reading in the table is an equally
reliable variate. The two series represent different genetic crosses, and the seven
replicates in each series are clones derived from the same genetic cross. This
example is clearly a Model I anova, since the question to be answered is whether
series I differs from series II in average age at the beginning of reproduction.
Inspection of the data shows that the mean age at beginning of reproduction
BOX 8.2
Testing the difference in means between two groups.
Average age (in days) at beginning of reproduction in Daphnia longispina (each
variate is a mean based on approximately similar numbers of females). Two series
derived from different genetic crosses and containing seven clones each are
compared; n = 7 clones per series. This is a Model I anova.

                     Series (a = 2)
                     I            II
                    7.2          8.8
                    7.1          7.5
                    9.1          7.7
                    7.2          7.6
                    7.3          7.4
                    7.2          6.7
                    7.5          7.2
$\sum^{n} Y$       52.6         52.9
$\bar{Y}$         7.5143       7.5571
$\sum^{n} Y^2$    398.28       402.23
$s^2$             0.5047       0.4095

Source: Data by Ordway, from Banta (1939).

Single classification anova with two groups with equal sample sizes

Anova table
Source of variation                                                    df      SS         MS        $F_s$
$\bar{Y} - \bar{\bar{Y}}$  Between groups (series)                      1    0.00643    0.00643    0.0141
$Y - \bar{Y}$  Within groups (error; clones within series)             12    5.48571    0.45714
$Y - \bar{\bar{Y}}$  Total                                             13    5.49214

$F_{0.05[1,12]} = 4.75$

Conclusions. Since $F_s \ll F_{0.05[1,12]}$, the null hypothesis is accepted. The means
of the two series are not significantly different; that is, the two series do not differ
in average age at beginning of reproduction.

A t test of the hypothesis that two sample means come from a population with
equal $\mu$; also confidence limits of the difference between two means

This test assumes that the variances in the populations from which the two
samples were taken are identical. If in doubt about this hypothesis, test by method
of Box 7.1, Section 7.3.
The appropriate formula for $t_s$ is one of the following:
Expression (8.2), when sample sizes are unequal and $n_1$ or $n_2$ or both sample
sizes are small (< 30): $df = n_1 + n_2 - 2$
Expression (8.3), when sample sizes are identical (regardless of size): $df = 2(n - 1)$
Expression (8.4), when $n_1$ and $n_2$ are unequal but both are large (> 30): $df = n_1 + n_2 - 2$
For the present data, since sample sizes are equal, we choose Expression (8.3):
$$t_s = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{1}{n}(s_1^2 + s_2^2)}}$$
We are testing the null hypothesis that $\mu_1 - \mu_2 = 0$. Therefore we replace this
quantity by zero in this example. Then
$$t_s = \frac{7.5143 - 7.5571}{\sqrt{(0.5047 + 0.4095)/7}} = \frac{-0.0428}{\sqrt{0.9142/7}} = \frac{-0.0428}{0.3614} = -0.1184$$
The degrees of freedom for this example are $2(n - 1) = 2 \times 6 = 12$. The critical value of $t_{0.05[12]} = 2.179$. Since the absolute value of our observed $t_s$ is less than
the critical t value, the means are found to be not significantly different, which is
the same result as was obtained by the anova.

Confidence limits of the difference between two means
$$L_1 = (\bar{Y}_1 - \bar{Y}_2) - t_{\alpha[\nu]}\, s_{\bar{Y}_1 - \bar{Y}_2}$$
$$L_2 = (\bar{Y}_1 - \bar{Y}_2) + t_{\alpha[\nu]}\, s_{\bar{Y}_1 - \bar{Y}_2}$$
In this case $\bar{Y}_1 - \bar{Y}_2 = -0.0428$, $t_{0.05[12]} = 2.179$, and $s_{\bar{Y}_1 - \bar{Y}_2} = 0.3614$, as computed earlier for the denominator of the t test. Therefore
$L_1 = -0.0428 - (2.179)(0.3614) = -0.8303$
$L_2 = -0.0428 + (2.179)(0.3614) = 0.7447$
The 95% confidence limits contain the zero point (no difference), as was to be
expected, since the difference $\bar{Y}_1 - \bar{Y}_2$ was found to be not significant.
is very similar for the two series. It would surprise us, therefore, to find that
they are significantly different. However, we shall carry out a test anyway. As
you realize by now, one cannot tell from the magnitude of a difference whether
it is significant. This depends on the magnitude of the error mean square, representing the variance within series.
The computations for the analysis of variance are not shown. They would
be the same as in Box 8.1. With equal sample sizes and only two groups, there
is one further computational shortcut. Quantity 6, $SS_{groups}$, can be directly computed by the following simple formula:
$$SS_{groups} = \frac{\left(\sum^{n} Y_1 - \sum^{n} Y_2\right)^2}{2n} = \frac{(52.6 - 52.9)^2}{14} = 0.00643$$
There is only 1 degree of freedom between the two groups. The critical value of
$F_{0.05[1,12]}$ is given underneath the anova table, but it is really not necessary to
consult it. Inspection of the mean squares in the anova shows that $MS_{groups}$
is much smaller than $MS_{within}$; therefore the value of $F_s$ is far below unity,
and there cannot possibly be an added component due to treatment effects
between the series. In cases where $MS_{groups} \le MS_{within}$, we do not usually bother
to calculate $F_s$, because the analysis of variance could not possibly be significant.
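The two-group shortcut can be checked against the general rule of Section 8.1. The sketch below is an added illustration with arbitrary variable names. The assignment of the fourteen readings to the two series is an assumption: the box is flattened in this copy, and the split used here is the one that reproduces the tabled sums (52.6 and 52.9) and sums of squares (398.28 and 402.23).

```python
# Average age (days) at beginning of reproduction, seven clones per series (Box 8.2).
series_1 = [7.2, 7.1, 9.1, 7.2, 7.3, 7.2, 7.5]
series_2 = [8.8, 7.5, 7.7, 7.6, 7.4, 6.7, 7.2]
n = 7

sum_1, sum_2 = sum(series_1), sum(series_2)

# Shortcut for two groups of equal size: (difference of group sums)^2 / 2n.
ss_groups_shortcut = (sum_1 - sum_2) ** 2 / (2 * n)

# General rule: squared group sums over n, minus the correction term.
correction_term = (sum_1 + sum_2) ** 2 / (2 * n)
ss_groups_general = (sum_1 ** 2 + sum_2 ** 2) / n - correction_term

sum_sq_obs = sum(y * y for y in series_1 + series_2)
ss_within = sum_sq_obs - (sum_1 ** 2 + sum_2 ** 2) / n

ms_groups = ss_groups_shortcut / 1       # 1 df between two groups
ms_within = ss_within / (2 * (n - 1))    # 12 df within groups
f_s = ms_groups / ms_within

print(f"SS groups (shortcut) = {ss_groups_shortcut:.5f}")
print(f"SS groups (general)  = {ss_groups_general:.5f}")
print(f"MS within = {ms_within:.5f}, Fs = {f_s:.4f}")
```

Both routes give a sum of squares between series of about 0.00643, and the variance ratio of about 0.014 is far below unity, as stated above.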
There is another method of solving a Model I two-sample analysis of variance. This is a t test of the differences between two means. This t test is the
traditional method of solving such a problem; it may already be familiar to you
from previous acquaintance with statistical work. It has no real advantage in
either ease of computation or understanding, and as you will see, it is mathematically equivalent to the anova in Box 8.2. It is presented here mainly for
the sake of completeness. It would seem too much of a break with tradition
not to have the t test in a biostatistics text.
In Section 6.4 we learned about the t distribution and saw that a t distribution of $n - 1$ degrees of freedom could be obtained from a distribution of
the term $(\bar{Y}_i - \mu)/s_{\bar{Y}_i}$, where $s_{\bar{Y}_i}$ has $n - 1$ degrees of freedom and $\bar{Y}$ is normally
distributed. The numerator of this term represents a deviation of a sample mean
from a parametric mean, and the denominator represents a standard error for
such a deviation. We now learn that the expression
$$t_s = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\left[\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left(\dfrac{n_1 + n_2}{n_1 n_2}\right)}} \tag{8.2}$$
is also distributed as t. Expression (8.2) looks complicated, but it really has
the same structure as the simpler term for t. The numerator is a deviation,
this time, not between a single sample mean and the parametric mean, but
between a single difference between two sample means, $\bar{Y}_1$ and $\bar{Y}_2$, and the
true difference between the means of the populations represented by these
means. In a test of this sort our null hypothesis is that the two samples come
from the same population; that is, they must have the same parametric mean.
Thus, the difference $\mu_1 - \mu_2$ is assumed to be zero. We therefore test the deviation of the difference $\bar{Y}_1 - \bar{Y}_2$ from zero. The denominator of Expression (8.2)
is a standard error, the standard error of the difference between two means
$s_{\bar{Y}_1 - \bar{Y}_2}$. The left portion of the expression, which is in square brackets, is a
weighted average of the variances of the two samples, $s_1^2$ and $s_2^2$, computed
in the manner of Section 7.1. The right term of the standard error is the computationally easier form of $(1/n_1) + (1/n_2)$, which is the factor by which the
average variance within groups must be multiplied in order to convert it into
a variance of the difference of means. The analogy with the multiplication of
a sample variance $s^2$ by $1/n$ to transform it into a variance of a mean $s_{\bar{Y}}^2$ should
be obvious.
The test as outlined here assumes equal variances in the two populations
sampled. This is also an assumption of the analyses of variance carried out so
far, although we have not stressed this. With only two variances, equality may
be tested by the procedure in Box 7.1.
When sample sizes are equal in a two-sample test, Expression (8.2) simplifies
to the expression

$$t_s = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{1}{n}\left(s_1^2 + s_2^2\right)}} \tag{8.3}$$

which is what is applied in the present example in Box 8.2. When the sample
sizes are unequal but rather large, so that the differences between $n_i$ and $n_i - 1$
are relatively trivial, Expression (8.2) reduces to the simpler form

$$t_s = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_2} + \dfrac{s_2^2}{n_1}}} \tag{8.4}$$

The simplification of Expression (8.2) to Expressions (8.3) and (8.4) is shown in
Appendix A1.3. The pertinent degrees of freedom for Expressions (8.2) and (8.4)
are $n_1 + n_2 - 2$, and for Expression (8.3) df is $2(n - 1)$.
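The algebra behind the two simplifications is short. The following is only a sketch, assuming the squared denominator of Expression (8.2) is the weighted average variance multiplied by $(n_1 + n_2)/(n_1 n_2)$; the full treatment is the one in Appendix A1.3.

```latex
% Equal sample sizes, n_1 = n_2 = n, giving the denominator of (8.3):
\[
\left[\frac{(n-1)s_1^2 + (n-1)s_2^2}{2(n-1)}\right]\left(\frac{2n}{n^2}\right)
  = \frac{s_1^2 + s_2^2}{2}\cdot\frac{2}{n}
  = \frac{1}{n}\left(s_1^2 + s_2^2\right)
\]

% Large unequal samples, n_i - 1 \approx n_i, giving the denominator of (8.4):
\[
\left[\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}\right]\left(\frac{n_1 + n_2}{n_1 n_2}\right)
  = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 n_2}
  = \frac{s_1^2}{n_2} + \frac{s_2^2}{n_1}
\]
```

Note that in the large-sample form each variance ends up divided by the size of the other sample.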
The test of significance for differences between means using the t test is
shown in Box 8.2. This is a two-tailed test because our alternative hypothesis
is $H_1: \mu_1 \neq \mu_2$. The results of this test are identical to those of the anova in the
same box: the two means are not significantly different. We can demonstrate
this mathematical equivalence by squaring the value for $t_s$. The result should
be identical to the $F_s$ value of the corresponding analysis of variance. Since
$t_s = -0.1184$ in Box 8.2, $t_s^2 = 0.0140$. Within rounding error, this is equal to
the $F_s$ obtained in the anova ($F_s = 0.0141$). Why is this so? We learned that
$t_{[\nu]} = (\bar{Y} - \mu)/s_{\bar{Y}}$, where $\nu$ is the degrees of freedom of the variance of the mean
$s_{\bar{Y}}^2$; therefore $t_{[\nu]}^2 = (\bar{Y} - \mu)^2/s_{\bar{Y}}^2$. However, this expression can be regarded as a
variance ratio. The denominator is clearly a variance with $\nu$ degrees of freedom.
The numerator is also a variance. It is a single deviation squared, which
represents a sum of squares possessing 1 rather than zero degrees of freedom
(since it is a deviation from the true mean $\mu$ rather than a sample mean). A
sum of squares based on 1 degree of freedom is at the same time a variance.
Thus, $t^2$ is a variance ratio, since $t_{[\nu]}^2 = F_{[1,\nu]}$, as we have seen. In Appendix
A1.4 we demonstrate algebraically that the $t_s^2$ and the $F_s$ value obtained in
Box 8.2 are identical quantities. Since t approaches the normal distribution as
its degrees of freedom increase, $F_{[1,\nu]}$ approaches the distribution of
the square of the normal deviate as $\nu \to \infty$. We also know (from Section 7.2)
that $\chi^2_{[\nu_1]}/\nu_1 = F_{[\nu_1,\infty]}$. Therefore, when $\nu_1 = 1$ and $\nu_2 = \infty$, $\chi^2_{[1]} = F_{[1,\infty]} = t^2_{[\infty]}$
(this can be demonstrated from Tables IV, V, and III, respectively):

$\chi^2_{0.05[1]} = 3.841$
$F_{0.05[1,\infty]} = 3.84$
$t_{0.05[\infty]} = 1.960 \qquad t^2_{0.05[\infty]} = 3.8416$
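The equivalence of $t_s^2$ and $F_s$ can also be checked numerically. The sketch below is a minimal illustration, not the computation of Box 8.2: the two samples are hypothetical values chosen for easy arithmetic, and the function names are assumptions introduced here. It computes $t_s$ from Expression (8.2) under the null hypothesis $\mu_1 - \mu_2 = 0$ and $F_s$ from a two-group anova.

```python
from statistics import mean, variance


def two_sample_t(y1, y2):
    """t_s from Expression (8.2), with mu1 - mu2 assumed to be zero."""
    n1, n2 = len(y1), len(y2)
    # Weighted average of the two sample variances (the bracketed term).
    pooled = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
    # Standard error of the difference between the two means.
    se = (pooled * (n1 + n2) / (n1 * n2)) ** 0.5
    return (mean(y1) - mean(y2)) / se


def two_sample_f(y1, y2):
    """F_s of a single-classification anova with two groups."""
    n1, n2 = len(y1), len(y2)
    grand_mean = mean(y1 + y2)
    ss_groups = n1 * (mean(y1) - grand_mean) ** 2 + n2 * (mean(y2) - grand_mean) ** 2
    ss_within = (n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)
    ms_groups = ss_groups / 1              # a - 1 = 1 degree of freedom
    ms_within = ss_within / (n1 + n2 - 2)
    return ms_groups / ms_within


# Hypothetical samples (NOT the Daphnia data of Box 8.2).
sample_1 = [4.0, 5.0, 6.0, 7.0]
sample_2 = [6.0, 8.0, 7.0, 9.0, 10.0]

t_s = two_sample_t(sample_1, sample_2)
f_s = two_sample_f(sample_1, sample_2)
print(t_s ** 2, f_s)   # the two values agree up to rounding error
```

The sign of $t_s$ depends on which sample is listed first; squaring removes it, which is why the anova corresponds to the two-tailed t test.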
The t test for differences between two means is useful when we wish to set
confidence limits to such a difference. Box 8.2 shows how to calculate 95%
confidence limits to the difference between the series means in the Daphnia
example. The appropriate standard error and degrees of freedom depend on
whether Expression (8.2), (8.3), or (8.4) is chosen for $t_s$. It does not surprise us
to find that the confidence limits of the difference in this case enclose the value
of zero, ranging from −0.8303 to +0.7447. This must be so when a difference
is found to be not significantly different from zero. We can interpret this by
saying that we cannot exclude zero as the true value of the difference between
the means of the two series.
Another instance when you might prefer to compute the t test for differences
between two means rather than use analysis of variance is when you are lacking
the original variates and have only published means and standard errors available for the statistical test. Such an example is furnished in Exercise 8.4.
8.5 Comparisons among means: Planned comparisons
We have seen that after the initial significance test, a Model II analysis of
variance is completed by estimation of the added variance components. We
usually complete a Model I anova of more than two groups by examining the
data in greater detail, testing which means are different from which other ones
or which groups of means are different from other such groups or from single
means. Let us look again at the Model I anovas treated so far in this chapter.
We can dispose right away of the two-sample case in Box 8.2, the average age
of water fleas at beginning of reproduction. As you will recall, there was no
significant difference in age between the two genetic series. But even if there
had been such a difference, no further tests are possible. However, the data on
length of pea sections given in Box 8.1 show a significant difference among the
five treatments (based on 4 degrees of freedom). Although we know that the
means are not all equal, we do not know which ones differ from which other
ones. This leads us to the subject of tests among pairs and groups of means.
Thus, for example, we might test the control against the 4 experimental treatments representing added sugars. The question to be tested would be, Does the
addition of sugars have an effect on length of pea sections? We might also test
for differences among the sugar treatments. A reasonable test might be pure
sugars (glucose, fructose, and sucrose) versus the mixed sugar treatment (1%
An important point about such tests is that they are designed and chosen
independently of the results of the experiment. They should be planned before
the experiment has been carried out and the results obtained. Such comparisons
are called planned or a priori comparisons. Such tests are applied regardless of
the results of the preliminary overall anova. By contrast, after the experiment
has been carried out, we might wish to compare certain means that we notice
to be markedly different. For instance, sucrose, with a mean of 64.1, appears
to have had less of a growth-inhibiting effect than fructose, with a mean of 58.2.
We might therefore wish to test whether there is in fact a significant difference
between the effects of fructose and sucrose. Such comparisons, which suggest
themselves as a result of the completed experiment, are called unplanned or a
posteriori comparisons. These tests are performed only if the preliminary overall
anova is significant. They include tests of the comparisons between all possible
pairs of means. When there are $a$ means, there can, of course, be $a(a - 1)/2$
possible comparisons between pairs of means. The reason we make this distinction between a priori and a posteriori comparisons is that the tests of significance appropriate for the two comparisons are different. A simple example will
show why this is so.
Let us assume we have sampled from an approximately normal population
of heights on men. We have computed their mean and standard deviation. If
we sample two men at a time from this population, we can predict the difference between them on the basis of ordinary statistical theory. Some men will
be very similar, others relatively very different. Their differences will be distributed normally with a mean of 0 and an expected variance of $2\sigma^2$, for reasons
that will be learned in Section 12.2. Thus, if we obtain a large difference between
two randomly sampled men, it will have to be a sufficient number of standard
deviations greater than zero for us to reject our null hypothesis that the two
men come from the specified population. If, on the other hand, we were to look
at the heights of the men before sampling them and then take pairs of men
who seemed to be very different from each other, it is obvious that we would
repeatedly obtain differences within pairs of men that were several standard
deviations apart. Such differences would be outliers in the expected frequency
distribution of differences, and time and again we would reject our null hypothesis when in fact it was true. The men would be sampled from the same
population, but because they were not being sampled at random but being
inspected before being sampled, the probability distribution on which our
hypothesis testing rested would no longer be valid. It is obvious that the tails
in a large sample from a normal distribution will be anywhere from 5 to 7
standard deviations apart. If we deliberately take individuals from each tail and
compare them, they will appear to be highly significantly different from each
other, according to the methods described in the present section, even though
they belong to the same population.
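The argument about the heights of men can be illustrated with a small simulation. This is a sketch under stated assumptions rather than anything from the text: the population is taken to be standard normal ($\sigma = 1$), the sample size of 50, the number of trials, the seed, and the function name are all arbitrary choices made here. Every individual comes from the same population, so each rejection counted below is a type I error.

```python
import random


def rejection_rates(trials=2000, sample_size=50, seed=1):
    """Type I error rate for random pairs vs. pairs picked after inspection."""
    rng = random.Random(seed)
    # Two-tailed 5% critical difference: 1.960 standard deviations of a
    # difference whose variance is 2 * sigma**2 (sigma = 1 here).
    critical = 1.960 * 2 ** 0.5
    random_rejections = 0
    extreme_rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        # Pair sampled at random: simply the first two individuals drawn.
        if abs(sample[0] - sample[1]) > critical:
            random_rejections += 1
        # Pair chosen after looking at the data: the tallest and the shortest.
        if max(sample) - min(sample) > critical:
            extreme_rejections += 1
    return random_rejections / trials, extreme_rejections / trials


random_rate, extreme_rate = rejection_rates()
print(random_rate, extreme_rate)
```

With random pairs the rejection rate should stay near the nominal 5%, whereas the tallest-versus-shortest pair should be declared "different" in nearly every trial, which is the invalid inference the paragraph warns against.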
When we compare means differing greatly from each other as the result of
some treatment in the analysis of variance, we are doing exactly the same thing
as taking the tallest and the shortest men from the frequency distribution of
heights. If we wish to know whether these are significantly different from each
other, we cannot use the ordinary probability distribution on which the analysis
of variance rests, but we have to use special tests of significance. These unplanned tests will be discussed in the next section. The present section concerns
itself with the carrying out of those comparisons planned before the execution
of the experiment.
The general rule for making a planned comparison is extremely simple; it
is related to the rule for obtaining the sum of squares for any set of groups
(discussed at the end of Section 8.1). To compare $k$ groups of any size $n_i$, take
the sum of each group, square it, divide the result by the sample size $n_i$, and
sum the $k$ quotients so obtained. From the sum of these quotients, subtract a
correction term, which you determine by taking the grand sum of all the groups
in this comparison, squaring it, and dividing the result by the number of items
in the grand sum. If the comparison includes all the groups in the anova, the
correction term will be the main CT of the study. If, however, the comparison
includes only some of the groups of the anova, the CT will be different, being
restricted only to these groups.
These rules can best be learned by means of an example. Table 8.2 lists the
means, group sums, and sample sizes of the experiment with the pea sections
from Box 8.1. You will recall that there were highly significant differences among
the groups. We now wish to test whether the mean of the control differs from
that of the four treatments representing addition of sugar. There will thus be two
groups, one the control group and the other the "sugars" groups, the latter with
a sum of 2396 and a sample size of 40. We therefore compute

$$\begin{aligned}
SS\ (\text{control versus sugars}) &= \frac{(701)^2}{10} + \frac{(593 + 582 + 580 + 641)^2}{40} - \frac{(701 + 593 + 582 + 580 + 641)^2}{50} \\
&= \frac{(701)^2}{10} + \frac{(2396)^2}{40} - \frac{(3097)^2}{50} = 832.32
\end{aligned}$$

In this case the correction term is the same as for the anova, because it involves
all the groups of the study. The result is a sum of squares for the comparison
TABLE 8.2
Means, group sums, and sample sizes from the data in Box 8.1. Length of pea sections grown in
tissue culture (in ocular units).

             Control   Glucose   Fructose   Glucose + fructose   Sucrose   $\Sigma$
$\bar{Y}$    70.1      59.3      58.2       58.0                 64.1      (61.94 = $\bar{\bar{Y}}$)
$\sum Y$     701       593       582        580                  641       3097
$n$          10        10        10         10                   10        50

between these two groups. Since a comparison between two groups has only 1
degree of freedom, the sum of squares is at the same time a mean square. This
mean square is tested over the error mean square of the anova to give the
following comparison:

$$F_s = \frac{MS\ (\text{control versus sugars})}{MS_{\text{within}}} = \frac{832.32}{5.46} = 152.44$$

$$F_{0.05[1,45]} = 4.05, \qquad F_{0.01[1,45]} = 7.23$$

This comparison is highly significant, showing that the additions of sugars have
significantly retarded the growth of the pea sections.
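The general rule for a planned comparison translates directly into a few lines of code. The following is a minimal sketch; the function name `comparison_ss` is an assumption introduced here for illustration, while the group sums, sample sizes, and error mean square (5.46) are the values quoted in the text and in Table 8.2.

```python
def comparison_ss(group_sums, group_sizes):
    """Sum of squares among k groups: sum of (sum_i**2 / n_i) minus the CT."""
    quotients = sum(s ** 2 / n for s, n in zip(group_sums, group_sizes))
    # Correction term restricted to the groups in this comparison.
    correction_term = sum(group_sums) ** 2 / sum(group_sizes)
    return quotients - correction_term


# Group sums from Table 8.2; every treatment has n = 10.
control = 701
sugars = 593 + 582 + 580 + 641              # 2396, based on 40 items

ss_control_vs_sugars = comparison_ss([control, sugars], [10, 40])
ms_within = 5.46                            # error mean square of the anova
f_s = ss_control_vs_sugars / ms_within      # 1 df, so the SS is also the MS
print(round(ss_control_vs_sugars, 2), round(f_s, 2))   # 832.32 152.44
```

Because the correction term is computed from whatever groups are passed in, the same function serves for comparisons that involve only a subset of the treatments.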
Next we test whether the mixture of sugars is significantly different from
the pure sugars. Using the same technique, we calculate

$$\begin{aligned}
SS\ (\text{mixed sugars versus pure sugars}) &= \frac{(580)^2}{10} + \frac{(593 + 582 + 641)^2}{30} - \frac{(593 + 582 + 580 + 641)^2}{40} \\
&= \frac{(580)^2}{10} + \frac{(1816)^2}{30} - \frac{(2396)^2}{40} = 48.13
\end{aligned}$$

Here the CT is different, since it is based on the sum of the sugars only. The
appropriate test statistic is

$$F_s = \frac{MS\ (\text{mixed sugars versus pure sugars})}{MS_{\text{within}}} = \frac{48.13}{5.46} = 8.82$$

This is significant in view of the critical values of $F_{\alpha[1,45]}$ given in the preceding
paragraph.
A final test is among the three sugars. This mean square has 2 degrees
of freedom, since it is based on three means. Thus we compute

$$SS\ (\text{among pure sugars}) = \frac{(593)^2}{10} + \frac{(582)^2}{10} + \frac{(641)^2}{10} - \frac{(1816)^2}{30} = 196.87$$

$$MS\ (\text{among pure sugars}) = \frac{SS\ (\text{among pure sugars})}{df} = \frac{196.87}{2} = 98.433$$

$$F_s = \frac{MS\ (\text{among pure sugars})}{MS_{\text{within}}} = \frac{98.433}{5.46} = 18.03$$

This $F_s$ is highly significant, since even $F_{0.01[2,40]} = 5.18$.
We conclude that the addition of the three sugars retards growth in the pea
sections, that mixed sugars affect the sections differently from pure sugars, and
that the pure sugars are significantly different among themselves, probably because the sucrose has a far higher mean. We cannot test the sucrose against
the other two, because that would be an unplanned test, which suggests itself
to us after we have looked at the results. To carry out such a test, we need the
methods of the next section.
Our a priori tests might have been quite different, depending entirely on our
initial hypotheses. Thus, we could have tested control versus sugars initially,
followed by disaccharides (sucrose) versus monosaccharides (glucose, fructose,
glucose + fructose), followed by mixed versus pure monosaccharides and finally
by glucose versus fructose.
The pattern and number of planned tests are determined by one's hypotheses about the data. However, there are certain restrictions. It would clearly
be a misuse of statistical methods to decide a priori that one wished to compare every mean against every other mean ($a(a - 1)/2$ comparisons). For $a$
groups, the sum of the degrees of freedom of the separate planned tests should
not exceed $a - 1$. In addition, it is desirable to structure the tests in such a
way that each one tests an independent relationship among the means (as was
done in the example above). For example, we would prefer not to test if means
1, 2, and 3 differed if we had already found that mean 1 differed from mean 3,
since significance of the latter suggests significance of the former.
Since these tests are independent, the three sums of squares we have so far
obtained, based on 1, 1, and 2 df, respectively, together add up to the sum of
squares among treatments of the original analysis of variance based on 4 degrees of freedom. Thus:

                                              df
SS (control versus sugars)      =   832.32     1
SS (mixed versus pure sugars)   =    48.13     1
SS (among pure sugars)          =   196.87     2
SS (among treatments)           =  1077.32     4

This again illustrates the elegance of analysis of variance. The treatment sums
of squares can be decomposed into separate parts that are sums of squares
in their own right, with degrees of freedom pertaining to them. One sum of
squares measures the difference between the controls and the sugars, the second
that between the mixed sugars and the pure sugars, and the third the remaining
variation among the three sugars. We can present all of these results as an
anova table, as shown in Table 8.3.
TABLE 8.3
Anova table from Box 8.1, with treatment sum of squares decomposed into
planned comparisons.

Source of variation          df        SS        MS       $F_s$
Treatments                    4   1077.32    269.33     49.33**
  Control vs. sugars          1    832.32    832.32    152.44**
  Mixed vs. pure sugars       1     48.13     48.13      8.82**
  Among pure sugars           2    196.87     98.43     18.03**
Within                       45    245.50      5.46
Total                        49   1322.82

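The additivity of the three planned comparisons can be verified from the group sums alone. The sketch below is illustrative only; the helper `comparison_ss` and the dictionary keys are names assumed here, and the data are the group sums of Table 8.2 with $n = 10$ per treatment.

```python
def comparison_ss(group_sums, group_sizes):
    """Sum of squares among groups: sum of (sum_i**2 / n_i) minus the CT."""
    quotients = sum(s ** 2 / n for s, n in zip(group_sums, group_sizes))
    return quotients - sum(group_sums) ** 2 / sum(group_sizes)


n = 10
sums = {"control": 701, "glucose": 593, "fructose": 582,
        "glucose+fructose": 580, "sucrose": 641}
pure = [sums["glucose"], sums["fructose"], sums["sucrose"]]
mixed = sums["glucose+fructose"]

ss_control = comparison_ss([sums["control"], sum(pure) + mixed], [n, 4 * n])  # 1 df
ss_mixed = comparison_ss([mixed, sum(pure)], [n, 3 * n])                      # 1 df
ss_pure = comparison_ss(pure, [n, n, n])                                      # 2 df
ss_treatments = comparison_ss(list(sums.values()), [n] * 5)                   # 4 df

# The independent comparisons partition the treatment sum of squares.
print(round(ss_control + ss_mixed + ss_pure, 2), round(ss_treatments, 2))
```

Both printed values should equal the treatment SS of Table 8.3; a set of comparisons that was not independent would in general not add up in this way.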
When the planned comparisons are not independent, and when the number
of comparisons planned is less than the total number of comparisons possible
between all pairs of means, which is $a(a - 1)/2$, we carry out the tests as just
shown but we adjust the critical values of the type I error $\alpha$. In comparisons
that are not independent, if the outcome of a single comparison is significant,
the outcomes of subsequent comparisons are more likely to be significant as
well, so that decisions based on conventional levels of significance might be in
doubt. For this reason, we employ a conservative approach, lowering the type
I error of the statistic of significance for each comparison so that the probability of making any type I error at all in the entire series of tests does not
exceed a predetermined value $\alpha$. This value is called the experimentwise error
rate. Assuming that the investigator plans a number of comparisons, adding
up to $k$ degrees of freedom, the appropriate critical values will be obtained if
the probability $\alpha'$ is used for any one comparison, where

$$\alpha' = \frac{\alpha}{k}$$

The approach using this relation is called the Bonferroni method; it assures us
of an experimentwise error rate $\leq \alpha$.
Applying this approach to the pea section data, as discussed above, let us
assume that the investigator has good reason to test the following comparisons
between and among treatments, given here in abbreviated form: (C) versus (G,
F, S, G + F); (G, F, S) versus (G + F); and (G) versus (F) versus (S); as well
as (G, F) versus (G + F). The 5 degrees of freedom in these tests require that
each individual test be adjusted to a significance level of

$$\alpha' = \frac{\alpha}{k} = \frac{0.05}{5} = 0.01$$

for an experimentwise critical $\alpha = 0.05$. Thus, the critical value for the $F_s$ ratios
of these comparisons is $F_{0.01[1,45]}$ or $F_{0.01[2,45]}$, as appropriate. The first three
tests are carried out as shown above. The last test is computed in a similar
manner:

$$\begin{aligned}
SS\ (\text{average of glucose and fructose vs. glucose and fructose mixed}) &= \frac{(593 + 582)^2}{20} + \frac{(580)^2}{10} - \frac{(593 + 582 + 580)^2}{30} \\
&= \frac{(1175)^2}{20} + \frac{(580)^2}{10} - \frac{(1755)^2}{30} = 3.75
\end{aligned}$$

In spite of the change in critical value, the conclusions concerning the
first three tests are unchanged. The last test, the average of glucose and fructose
versus a mixture of the two, is not significant, since $F_s = 3.75/5.46 = 0.687$. Adjusting the critical value is a conservative procedure: individual comparisons using
this approach are less likely to be significant.
The Bonferroni method generally will not employ the standard, tabled
arguments of $\alpha$ for the F distribution. Thus, if we were to plan tests involving
altogether 6 degrees of freedom, the value of $\alpha'$ would be 0.0083. Exact tables
for Bonferroni critical values are available for the special case of single degree
of freedom tests. Alternatively, we can compute the desired critical value by
means of a computer program. A conservative alternative is to use the next
smaller tabled value of $\alpha$. For details, consult Sokal and Rohlf (1981), Section 9.6.
The Bonferroni method (or a more recent refinement, the Dunn-Šidák
method) should also be employed when you are reporting confidence limits for
more than one group mean resulting from an analysis of variance. Thus, if you
wanted to publish the means and $1 - \alpha$ confidence limits of all five treatments
in the pea section example, you would not set confidence limits to each mean
as though it were an independent sample, but you would employ $t_{\alpha'[\nu]}$, where
$\nu$ is the degrees of freedom of the entire study and $\alpha'$ is the adjusted type I error
explained earlier. Details of such a procedure can be learned in Sokal and
Rohlf (1981), Section 14.10.
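The adjusted significance level is easily computed. The sketch below uses function names invented for this illustration. The Bonferroni relation $\alpha' = \alpha/k$ is the one given above; the Dunn-Šidák form $\alpha' = 1 - (1 - \alpha)^{1/k}$ is not stated in this chapter and is included here as an assumption based on its standard definition.

```python
def bonferroni_alpha(alpha, k):
    """Per-comparison level for k degrees of freedom of planned tests."""
    return alpha / k


def dunn_sidak_alpha(alpha, k):
    """Standard Dunn-Sidak level (assumed form, not given in the text)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / k)


# The two cases mentioned in the text, experimentwise alpha = 0.05.
print(round(bonferroni_alpha(0.05, 5), 4))   # 0.01
print(round(bonferroni_alpha(0.05, 6), 4))   # 0.0083

# The Dunn-Sidak level is slightly larger, hence slightly less conservative.
print(dunn_sidak_alpha(0.05, 5) > bonferroni_alpha(0.05, 5))   # True
```

Neither adjusted level usually coincides with a tabled value of $\alpha$, which is why the text suggests computing the critical value or falling back on the next smaller tabled $\alpha$.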
8.6 Comparisons among means: Unplanned comparisons
A single-classification anova is said to be significant if

$$\frac{MS_{\text{groups}}}{MS_{\text{within}}} > F_{\alpha[a-1,\,a(n-1)]} \tag{8.5}$$

Since $MS_{\text{groups}}/MS_{\text{within}} = SS_{\text{groups}}/[(a - 1)\,MS_{\text{within}}]$, we can rewrite Expression
(8.5) as

$$SS_{\text{groups}} > (a - 1)\,MS_{\text{within}}\,F_{\alpha[a-1,\,a(n-1)]} \tag{8.6}$$

For example, in Box 8.1, where the anova is significant, $SS_{\text{groups}} = 1077.32$. Substituting into Expression (8.6), we obtain

$$1077.32 > (5 - 1)(5.46)(2.58) = 56.35 \qquad \text{for } \alpha = 0.05$$

It is therefore possible to compute a critical SS value for a test of significance
of an anova. Thus, another way of calculating overall significance would be to
see whether the $SS_{\text{groups}}$ is greater than this critical SS. It is of interest to investigate why the $SS_{\text{groups}}$ is as large as it is and to test for the significance of
the various contributions made to this SS by differences among the sample
means. This was discussed in the previous section, where separate sums of
squares were computed based on comparisons among means planned before
the data were examined. A comparison was called significant if its $F_s$ ratio was
$> F_{\alpha[k-1,\,a(n-1)]}$, where $k$ is the number of means being compared. We can now
also state this in terms of sums of squares: An SS is significant if it is greater
than $(k - 1)\,MS_{\text{within}}\,F_{\alpha[k-1,\,a(n-1)]}$.
The above tests were a priori comparisons. One procedure for testing a
posteriori comparisons would be to set $k = a$ in this last formula, no matter
how many means we compare; thus the critical value of the SS will be larger
than in the previous method, making it more difficult to demonstrate the significance of a sample SS. Setting $k = a$ allows for the fact that we choose for
testing those differences between group means that appear to be contributing
substantially to the significance of the overall anova.
For an example, let us return to the effects of sugars on growth in pea
sections (Box 8.1). We write down the means in ascending order of magnitude:
58.0 (glucose + fructose), 58.2 (fructose), 59.3 (glucose), 64.1 (sucrose), 70.1
(control). We notice that the first three treatments have quite similar means and
suspect that they do not differ significantly among themselves and hence do not
contribute substantially to the significance of the $SS_{\text{groups}}$.
To test this, we compute the SS among these three means by the usual
formula:

$$SS = \frac{(593)^2 + (582)^2 + (580)^2}{10} - \frac{(593 + 582 + 580)^2}{3(10)} = 102{,}677.3 - 102{,}667.5 = 9.8$$

The differences among these means are not significant, because this SS is less
than the critical SS (56.35) calculated above.
The sucrose mean looks suspiciously different from the means of the other
sugars. To test this we compute

$$SS = \frac{(641)^2}{10} + \frac{(593 + 582 + 580)^2}{30} - \frac{(641 + 593 + 582 + 580)^2}{10 + 30} = 41{,}088.1 + 102{,}667.5 - 143{,}520.4 = 235.2$$

which is greater than the critical SS. We conclude, therefore, that sucrose retards growth significantly less than the other sugars tested. We may continue
in this fashion, testing all the differences that look suspicious or even testing
all possible sets of means, considering them 2, 3, 4, and 5 at a time. This latter
approach may require a computer if there are more than 5 means to be compared, since there are very many possible tests that could be made. This
procedure was proposed by Gabriel (1964), who called it a sum of squares simultaneous test procedure (SS-STP).
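The two unplanned tests above follow one pattern: compute the SS for the chosen subset and compare it with the single critical SS of Expression (8.6). A minimal sketch follows; the function names are assumptions introduced for illustration, and the critical value $F_{0.05[4,45]} = 2.58$ and $MS_{\text{within}} = 5.46$ are taken from the text rather than computed.

```python
def comparison_ss(group_sums, group_sizes):
    """Sum of squares among groups: sum of (sum_i**2 / n_i) minus the CT."""
    quotients = sum(s ** 2 / n for s, n in zip(group_sums, group_sizes))
    return quotients - sum(group_sums) ** 2 / sum(group_sizes)


def critical_ss(a, ms_within, f_critical):
    """Right-hand side of Expression (8.6): (a - 1) * MS_within * F."""
    return (a - 1) * ms_within * f_critical


crit = critical_ss(a=5, ms_within=5.46, f_critical=2.58)

# Glucose + fructose, fructose, and glucose: three similar means.
ss_similar = comparison_ss([580, 582, 593], [10, 10, 10])
# Sucrose against the other three sugar treatments pooled.
ss_sucrose = comparison_ss([641, 593 + 582 + 580], [10, 30])

for label, ss in [("three similar sugars", ss_similar),
                  ("sucrose vs. other sugars", ss_sucrose)]:
    verdict = "significant" if ss > crit else "not significant"
    print(f"{label}: SS = {ss:.1f}, critical SS = {crit:.2f}, {verdict}")
```

The critical SS is the same for every subset because $k$ is set equal to $a$ regardless of how many means enter a given comparison.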
In the SS-STP and in the original anova, the chance of making any type I
error at all is $\alpha$, the probability selected for the critical F value from Table V.
By "making any type I error at all" we mean making such an error in the overall
test of significance of the anova and in any of the subsidiary comparisons among
means or sets of means needed to complete the analysis of the experiment. This
probability $\alpha$ therefore is an experimentwise error rate. Note that though the
probability of any error at all is $\alpha$, the probability of error for any particular
test of some subset, such as a test of the difference among three or between two
means, will always be less than $\alpha$. Thus, for the test of each subset one is really
using a significance level $\alpha'$, which may be much less than the experimentwise
$\alpha$, and if there are many means in the anova, this actual error rate $\alpha'$ may be
one-tenth, one one-hundredth, or even one one-thousandth of the experimentwise $\alpha$ (Gabriel, 1964). For this reason, the unplanned tests discussed above
and the overall anova are not very sensitive to differences between individual
means or differences within small subsets. Obviously, not many differences are
going to be considered significant if $\alpha'$ is minute. This is the price we pay for
not planning our comparisons before we examine the data: if we were to make
planned tests, the error rate of each would be greater, hence less conservative.
The SS-STP procedure is only one of numerous techniques for multiple
unplanned comparisons. It is the most conservative, since it allows a large
number of possible comparisons. Differences shown to be significant by this
method can be reliably reported as significant differences. However, more sensitive and powerful comparisons exist when the number of possible comparisons
is circumscribed by the user. This is a complex subject, to which a more complete
introduction is given in Sokal and Rohlf (1981), Section 9.7.
Exercises
8.1 The following is an example with easy numbers to help you become familiar
with the analysis of variance. A plant ecologist wishes to test the hypothesis
that the height of plant species X depends on the type of soil it grows in. He has
measured the height of three plants in each of four plots representing different
soil types, all four plots being contained in an area of two miles square. His
results are tabulated below. (Height is given in centimeters.) Does your analysis support this hypothesis? ANS. Yes, since $F_s = 6.951$ is larger than
$F_{0.05[3,8]} = 4.07$.

Observation        Localities
number          1     2     3     4
1              15    25    17    10
2               9    21    23    13
3              14    19    20    16

8.2 The following are measurements (in coded micrometer units) of the thorax length
of the aphid Pemphigus populitransversus. The aphids were collected in 28 galls
on the cottonwood Populus deltoides. Four alate (winged) aphids were randomly
selected from each gall and measured. The alate aphids of each gall are isogenic
(identical twins), being descended parthenogenetically from one stem mother.
Thus, any variance within galls can be due to environment only. Variance between galls may be due to differences in genotype and also to environmental
differences between galls. If this character, thorax length, is affected by genetic
variation, significant intergall variance must be present. The converse is not necessarily true: significant variance between galls need not indicate genetic variation; it could as well be due to environmental differences between galls (data by
Sokal, 1952). Analyze the variance of thorax length. Is there significant intergall
variance present? Give estimates of the added component of intergall variance,
if present. What percentage of the variance is controlled by intragall and what
percentage by intergall factors? Discuss your results.
Gall no.                              Gall no.
 1.    6.1, 6.0, 5.7, 6.0             15.    6.3, 6.5, 6.1, 6.3
 2.    6.2, 5.1, 6.1, 5.3             16.    5.9, 6.1, 6.1, 6.0
 3.    6.2, 6.2, 5.3, 6.3             17.    5.8, 6.0, 5.9, 5.7
 4.    5.1, 6.0, 5.8, 5.9             18.    6.5, 6.3, 6.5, 7.0
 5.    4.4, 4.9, 4.7, 4.8             19.    5.9, 5.2, 5.7, 5.7
 6.    5.7, 5.1, 5.8, 5.5             20.    5.2, 5.3, 5.4, 5.3
 7.    6.3, 6.6, 6.4, 6.3             21.    5.4, 5.5, 5.2, 6.3
 8.    4.5, 4.5, 4.0, 3.7             22.    4.3, 4.7, 4.5, 4.4
 9.    6.3, 6.2, 5.9, 6.2             23.    6.0, 5.8, 5.7, 5.9
10.    5.4, 5.3, 5.0, 5.3             24.    5.5, 6.1, 5.5, 6.1
11.    5.9, 5.8, 6.3, 5.7             25.    4.0, 4.2, 4.3, 4.4
12.    5.9, 5.9, 5.5, 5.5             26.    5.8, 5.6, 5.6, 6.1
13.    5.8, 5.9, 5.4, 5.5             27.    4.3, 4.0, 4.4, 4.6
14.    5.6, 6.4, 6.4, 6.1             28.    6.1, 6.0, 5.6, 6.5
8.3
Millis and Seng (1954) published a study on the relation of birth order to the
birth weights of infants. The data below on first-born and eighth-born infants are
extracted from a table of birth weights of male infants of Chinese third-class
patients at the Kandang Kerbau Maternity Hospital in Singapore in 1950 and
1951.
Birth weight
(lb:oz)
3:0 - 3:7
3:8 - 3:15
4:0 - 4:7
4:8 - 4:15
5:0 - 5:7
5:8 - 5:15
6:0 - 6:7
6:8 - 6:15
7:0 - 7:7
7:8 - 7:15
8:0 - 8:7
8:8 - 8:15
9:0 - 9:7
9:8 - 9:15
10:0 - 10:7
10:8 - 10:15
Birth order
1
8
.1
3
7
111
267
457
485
363
162
64
6
5
4
5
19
52
55
61
48
39
19
4
1
1932
307
Which birth order appears to be accompanied by heavier infants? Is this difference significant? Can you conclude that birth order causes differences in birth
weight? (Computational note: The variable should be coded as simply as possible.) Reanalyze, using the t test, and verify that $t_s^2 = F_s$. ANS. $t_s = 11.016$ and
$F_s = 121.352$.
8.4
The following cytochrome oxidase assessments of male Periplaneta roaches in
cubic millimeters per ten minutes per milligram were taken from a larger study:

                                             n      $\bar{Y}$     $s_{\bar{Y}}$
24 hours after methoxychlor injection        5      24.8          0.9
Control                                      3      19.7          1.4

Are the two means significantly different?
8.5
P. E. Hunter (1959, detailed data unpublished) selected two strains of D. melanogaster, one for short larval period (SL) and one for long larval period (LL). A
nonselected control strain (CS) was also maintained. At generation 42 these data
were obtained for the larval period (measured in hours). Analyze and interpret.
                              Strain
                        SL        CS        LL
$n_i$                   80        69        33
$\sum^{n_i} Y$        8070      7291      3640

$\sum^{3}\sum^{n_i} Y^2 = 1{,}994{,}650$

Note that part of the computation has already been performed for you. Perform
unplanned tests among the three means (short vs. long larval periods and each
against the control). Set 95% confidence limits to the observed differences of
means for which these comparisons are made. ANS. $MS_{(SL\ vs.\ LL)} = 2076.6697$.
8.6
These data are measurements of five random samples of domestic pigeons collected during January, February, and March in Chicago in 1955. The variable is the length from the anterior end of the narial opening to the tip of the bony
beak and is recorded in millimeters. Data from Olson and Miller (1958).
               Samples
  1      2      3      4      5
 5.4    5.2    5.5    5.1    5.1
 5.3    5.1    4.7    4.6    5.5
 5.2    4.7    4.8    5.4    5.9
 4.5    5.0    4.9    5.5    6.1
 5.0    5.9    5.9    5.2    5.2
 5.4    5.3    5.2    5.0    5.0
 3.8    6.0    4.8    4.8    5.9
 5.9    5.2    4.9    5.1    5.0
 5.4    6.6    6.4    4.4    4.9
 5.1    5.6    5.1    6.5    5.3
 5.4    5.1    5.1    4.8    5.3
 4.1    5.7    4.5    4.9    5.1
 5.2    5.1    5.3    6.0    4.9
 4.8    4.7    4.8    4.8    5.8
 4.6    6.5    5.3    5.7    5.0
 5.7    5.1    5.4    5.5    5.6
 5.9    5.4    4.9    5.8    6.1
 5.8    5.8    4.7    5.6    5.1
 5.0    5.8    4.8    5.5    4.8
 5.0    5.9    5.0    5.0    4.9
8.7
The following data were taken from a study of blood protein variations in deer
(Cowan and Johnston, 1962). The variable is the mobility of serum protein fraction II expressed as $10^{-5}$ cm$^2$/volt-seconds.
                                  $\bar{Y}$     $s_{\bar{Y}}$
Sitka                             2.8           0.07
California blacktail              2.5           0.05
Vancouver Island blacktail        2.9           0.05
Mule deer                         2.5           0.05
Whitetail                         2.8           0.07

n = 12 for each mean. Perform an analysis of variance and a multiple-comparison
test, using the sums of squares STP procedure. ANS. $MS_{within} = 0.0416$; maximal
nonsignificant sets (at P = 0.05) are samples 1, 3, 5 and 2, 4 (numbered in the
order given).
8.8
For the data from Exercise 7.3 use the Bonferroni method to test for differences
between the following 5 pairs of treatment means:
A, B
A, C
A, D
A, (B + C + D)/3
B, (C + D)/2
CHAPTER
Two-Way
Analysis
of Variance
From the single-classification anova of Chapter 8 we progress to the two-way
anova of the present chapter by a single logical step. Individual items may be
grouped into classes representing the different possible combinations of two
treatments or factors. Thus, the housefly wing lengths studied in earlier chapters,
which yielded samples representing different medium formulations, might also
be divided into males and females. Suppose we wanted to know not only whether
medium 1 induced a different wing length than medium 2 but also whether
male houseflies differed in wing length from females. Obviously, each combination of factors should be represented by a sample of flies. Thus, for seven
media and two sexes we need at least 7 × 2 = 14 samples. Similarly, the experiment testing five sugar treatments on pea sections (Box 8.1) might have
been carried out at three different temperatures. This would have resulted in a
two-way analysis of variance of the effects of sugars as well as of temperatures.
It is the assumption of this two-way method of anova that a given temperature and a given sugar each contribute a certain amount to the growth of a pea
section, and that these two contributions add their effects without influencing
each other. In Section 9.1 we shall see how departures from the assumption
are measured; we shall also consider the expression for decomposing variates
in a two-way anova.
The two factors in the present design may represent either Model I or
Model II effects or one of each, in which case we talk of a mixed model.
The computation of a two-way anova for replicated subclasses (more than
one variate per subclass or factor combination) is shown in Section 9.1, which
also contains a discussion of the meaning of interaction as used in statistics.
Significance testing in a two-way anova is the subject of Section 9.2. This is
followed by Section 9.3, on two-way anova without replication, or with only a
single variate per subclass. The well-known method of paired comparisons is a
special case of a two-way anova without replication.
We will now proceed to illustrate the computation of a two-way anova.
You will obtain closer insight into the structure of this design as we explain
the computations.
9.1 Two-way anova with replication
We illustrate the computation of a two-way anova in a study of oxygen consumption by two species of limpets at three concentrations of seawater. Eight
replicate readings were obtained for each combination of species and seawater
concentration. We have continued to call the number of columns a, and are
calling the number of rows b. The sample size for each cell (row and column
combination) of the table is n. The cells are also called subgroups or subclasses.
The data are featured in Box 9.1. The computational steps labeled Preliminary computations provide an efficient procedure for the analysis of variance,
but we shall undertake several digressions to ensure that the concepts underlying this design are appreciated by the reader. We commence by considering
the six subclasses as though they were six groups in a single-classification anova.
Each subgroup or subclass represents eight oxygen consumption readings. If
we had no further classification of these six subgroups by species or salinity,
such an anova would test whether there was any variation among the six subgroups over and above the variance within the subgroups. But since we have the
subdivision by species and salinity, our only purpose here is to compute some
quantities necessary for the further analysis. Steps 1 through 3 in Box 9.1 correspond to the identical steps in Box 8.1, although the symbolism has changed
slightly, since in place of a groups we now have ab subgroups. To complete
the anova, we need a correction term, which is labeled step 6 in Box 9.1. From
these quantities we obtain $SS_{total}$, $SS_{subgr}$, and $SS_{within}$ in steps 7, 8, and 12, corresponding to steps 5, 6, and 7 in the layout of Box 8.1. The results of this preliminary
anova are featured in Table 9.1.
The computation is continued by finding the sums of squares for rows and
columns of the table. This is done by the general formula stated at the end of
Section 8.1. Thus, for columns, we square the column sums, sum the resulting
squares, and divide the result by 24, the number of items per column. This is step
4 in Box 9.1. A similar quantity is computed for rows (step 5). From these
BOX 9.1
[Data table: oxygen consumption readings for two species of limpets, Acmaea scabra and A. digitalis, at three seawater concentrations (100%, 75%, and 50%), with n = 8 readings per subgroup.]
BOX 9.1
Continued
Preliminary computations
1. Grand total = $\sum^{a}\sum^{b}\sum^{n} Y = 461.74$
2. Sum of the squared observations = $\sum^{a}\sum^{b}\sum^{n} Y^2 = \cdots + (12.30)^2 = 5065.1530$
3. Sum of the squared subgroup (cell) totals, divided by the sample size of the subgroups
   $= \frac{\sum^{a}\sum^{b}\left(\sum^{n} Y\right)^2}{n} = \frac{(84.49)^2 + \cdots + (98.61)^2}{8} = 4663.6317$
4. Sum of the squared column totals divided by the sample size of a column
   $= \frac{\sum^{a}\left(\sum^{b}\sum^{n} Y\right)^2}{bn} = \frac{(245.00)^2 + (216.74)^2}{(3 \times 8)} = 4458.3844$
5. Sum of the squared row totals divided by the sample size of a row
   $= \frac{\sum^{b}\left(\sum^{a}\sum^{n} Y\right)^2}{an} = \frac{(143.92)^2 + (121.82)^2 + (196.00)^2}{(2 \times 8)} = 4623.0674$
6. Grand total squared and divided by the total sample size = correction term CT
   $= \frac{\left(\sum^{a}\sum^{b}\sum^{n} Y\right)^2}{abn} = \frac{(\text{quantity 1})^2}{abn} = \frac{(461.74)^2}{(2 \times 3 \times 8)} = 4441.7464$
7. $SS_{total} = \sum^{a}\sum^{b}\sum^{n} Y^2 - CT$ = quantity 2 - quantity 6 = 5065.1530 - 4441.7464 = 623.4066
8. $SS_{subgr} = \frac{\sum^{a}\sum^{b}\left(\sum^{n} Y\right)^2}{n} - CT$ = quantity 3 - quantity 6 = 4663.6317 - 4441.7464 = 221.8853
9. $SS_A$ (SS of columns) $= \frac{\sum^{a}\left(\sum^{b}\sum^{n} Y\right)^2}{bn} - CT$ = quantity 4 - quantity 6 = 4458.3844 - 4441.7464 = 16.6380
10. $SS_B$ (SS of rows) $= \frac{\sum^{b}\left(\sum^{a}\sum^{n} Y\right)^2}{an} - CT$ = quantity 5 - quantity 6 = 4623.0674 - 4441.7464 = 181.3210
11. $SS_{A \times B}$ (interaction SS) $= SS_{subgr} - SS_A - SS_B$ = quantity 8 - quantity 9 - quantity 10
    = 221.8853 - 16.6380 - 181.3210 = 23.9263
12. $SS_{within}$ (within subgroups; error SS) $= SS_{total} - SS_{subgr}$ = quantity 7 - quantity 8
    = 623.4066 - 221.8853 = 401.5213
As a check on your computations, ascertain that the following relations hold for some of the above quantities: 2 ≥ 3 ≥ 4 ≥ 6;
3 ≥ 5 ≥ 6.
Explicit formulas for these sums of squares suitable for computer programs are as follows:
9a. $SS_A = nb\sum^{a}(\bar{Y}_A - \bar{\bar{Y}})^2$
10a. $SS_B = na\sum^{b}(\bar{Y}_B - \bar{\bar{Y}})^2$
11a. $SS_{AB} = n\sum^{a}\sum^{b}(\bar{Y} - \bar{Y}_A - \bar{Y}_B + \bar{\bar{Y}})^2$
12a. $SS_{within} = \sum^{a}\sum^{b}\sum^{n}(Y - \bar{Y})^2$
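BOX 9.1
Continued
Now fill in the anova table.

Source of variation      Deviation                                              df                SS    MS                        Expected MS (Model I)
A (columns)              $\bar{Y}_A - \bar{\bar{Y}}$                            $a - 1$           9     $9/(a - 1)$               $\sigma^2 + \frac{nb}{a - 1}\sum^{a}\alpha^2$
B (rows)                 $\bar{Y}_B - \bar{\bar{Y}}$                            $b - 1$           10    $10/(b - 1)$              $\sigma^2 + \frac{na}{b - 1}\sum^{b}\beta^2$
A × B (interaction)      $\bar{Y} - \bar{Y}_A - \bar{Y}_B + \bar{\bar{Y}}$      $(a - 1)(b - 1)$  11    $11/[(a - 1)(b - 1)]$     $\sigma^2 + \frac{n}{(a - 1)(b - 1)}\sum^{ab}(\alpha\beta)^2$
Within subgroups         $Y - \bar{Y}$                                          $ab(n - 1)$       12    $12/[ab(n - 1)]$          $\sigma^2$
Total                    $Y - \bar{\bar{Y}}$                                    $abn - 1$         7

Since the present example is a Model I anova for both factors, the expected MS above are correct. Below are the corresponding
expressions for other models.

Source of variation      Model II                                           Mixed model (A fixed, B random)
A                        $\sigma^2 + n\sigma^2_{AB} + nb\sigma^2_A$         $\sigma^2 + n\sigma^2_{AB} + \frac{nb}{a - 1}\sum^{a}\alpha^2$
B                        $\sigma^2 + n\sigma^2_{AB} + na\sigma^2_B$         $\sigma^2 + na\sigma^2_B$
A × B                    $\sigma^2 + n\sigma^2_{AB}$                        $\sigma^2 + n\sigma^2_{AB}$
Within subgroups         $\sigma^2$                                         $\sigma^2$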
Anova table

Source of variation              df       SS          MS        $F_s$
A (columns; species)              1      16.6380     16.638     1.740 ns
B (rows; salinities)              2     181.3210     90.660     9.483**
A × B (interaction)               2      23.9263     11.963     1.251 ns
Within subgroups (error)         42     401.5213      9.560
Total                            47     623.4066

$F_{0.05[1,42]} = 4.07$     $F_{0.05[2,42]} = 3.22$     $F_{0.01[2,42]} = 5.15$
Since this is a Model I anova, all mean squares are tested over the error MS. For a discussion of significance tests, see Section
9.2.
Conclusions. Oxygen consumption does not differ significantly between the two species of limpets but differs with the salinity.
At 50% seawater, the O₂ consumption is increased. Salinity appears to affect the two species equally, for there is insufficient evidence
of a species × salinity interaction.
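The preliminary computations of Box 9.1 can be retraced in a few lines of code. The following is a minimal sketch in Python, not a definitive implementation. It starts from the six subgroup totals and from the sum of squared observations (quantity 2) reported in the box, since the individual readings are not needed for steps 3 through 12. The function name `two_way_ss` and all variable names are illustrative assumptions.

```python
def two_way_ss(cell_totals, n, sum_sq_obs=None):
    """Sums of squares for a balanced two-way anova with replication.

    cell_totals[i][j] is the total of the n readings for column i (factor A)
    and row j (factor B); sum_sq_obs is the sum of all squared readings.
    """
    a = len(cell_totals)                     # number of columns (factor A)
    b = len(cell_totals[0])                  # number of rows (factor B)
    col_totals = [sum(col) for col in cell_totals]
    row_totals = [sum(col[j] for col in cell_totals) for j in range(b)]
    grand_total = sum(col_totals)                               # quantity 1

    q3 = sum(t ** 2 for col in cell_totals for t in col) / n    # quantity 3
    q4 = sum(t ** 2 for t in col_totals) / (b * n)              # quantity 4
    q5 = sum(t ** 2 for t in row_totals) / (a * n)              # quantity 5
    ct = grand_total ** 2 / (a * b * n)                         # quantity 6

    ss = {"subgr": q3 - ct, "A": q4 - ct, "B": q5 - ct}         # steps 8, 9, 10
    ss["AxB"] = ss["subgr"] - ss["A"] - ss["B"]                 # step 11
    if sum_sq_obs is not None:
        ss["total"] = sum_sq_obs - ct                           # step 7
        ss["within"] = ss["total"] - ss["subgr"]                # step 12
    return ss


# Subgroup totals from Box 9.1: one inner list per species (column),
# ordered by seawater concentration 100%, 75%, 50% (rows).
limpet_totals = [
    [84.49, 63.12, 97.39],   # A. scabra
    [59.43, 58.70, 98.61],   # A. digitalis
]
limpet_ss = two_way_ss(limpet_totals, n=8, sum_sq_obs=5065.1530)
for source, value in limpet_ss.items():
    print(f"SS {source:7s} = {value:9.4f}")
```

Working from totals mirrors the box, where a single division is carried out after the squared sums have been accumulated.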
TABLE 9.1
Preliminary anova of subgroups in two-way anova. Data from Box 9.1.

Source of variation                                   df                         SS          MS
$\bar{Y} - \bar{\bar{Y}}$    Among subgroups           5     $ab - 1$            221.8853    44.377**
$Y - \bar{Y}$                Within subgroups         42     $ab(n - 1)$         401.5213     9.560
$Y - \bar{\bar{Y}}$          Total                    47     $abn - 1$           623.4066
quotients we subtract the correction term, computed as quantity 6. These subtractions are carried out as steps 9 and 10, respectively. Since the rows and
columns are based on equal sample sizes, we do not have to obtain a separate
quotient for the square of each row or column sum but carry out a single division after accumulating the squares of the sums.
Let us return for a moment to the preliminary analysis of variance in
Table 9.1, which divided the total sum of squares into two parts: the sum of
squares among the six subgroups; and that within the subgroups, the error sum
of squares. The new sums of squares pertaining to row and column effects clearly
are not part of the error, but must contribute to the differences that comprise
the sum of squares among the six subgroups. We therefore subtract row and
column SS from the subgroup SS. The latter is 221.8853. The row SS is 181.3210,
and the column SS is 16.6380. Together they add up to 197.9590, almost but
not quite the value of the subgroup sum of squares. The difference represents
a third sum of squares, called the interaction sum of squares, whose value in
this case is 23.9263.
We shall discuss the meaning of this new sum of squares presently. At the
moment let us say only that it is almost always present (but not necessarily
significant) and generally that it need not be independently computed but may
be obtained as illustrated above by the subtraction of the row SS and the column SS from the subgroup SS. This procedure is shown graphically in Figure
9.1, which illustrates the decomposition of the total sum of squares into the subgroup SS and error SS. The former is subdivided into the row SS, column SS,
and interaction SS. The relative magnitudes of these sums of squares will differ
from experiment to experiment. In Figure 9.1 they are not shown proportional
to their actual values in the limpet experiment; otherwise the area representing
the row SS would have to be about 11 times that allotted to the column SS.
Before we can intelligently test for significance in this anova we must understand the meaning of interaction. We can best explain interaction in a two-way
anova by means of an artificial illustration based on the limpet data we have
just studied. If we interchange the readings for 75% and 50% for A. digitalis
only, we obtain the data table shown in Table 9.2. Only the sums of the subgroups, rows, and columns are shown. We complete the analysis of variance
in the manner presented above and note the results at the foot of Table 9.2.
The total and error SS are the same as before (Table 9.1). This should not be
FIGURE 9.1
Diagrammatic representation of the partitioning of the total sums of squares in a two-way orthogonal
anova. The areas of the subdivisions are not shown proportional to the magnitudes of the sums
of squares.
[Diagram: total SS = 623.4066, divided into subgroup SS = 221.8853 and error SS = 401.5213; the subgroup SS is subdivided into row SS = 181.3210, column SS = 16.6380, and interaction SS = 23.9263.]
surprising, since we are using the same data. All that we have done is to interchange the contents of the lower two cells in the right-hand column of the
table. When we partition the subgroup SS, we do find some differences. We
note that the SS between species (between columns) is unchanged. Since the
change we made was within one column, the total for that column was not
altered and consequently the column SS did not change. However, the sums
TABLE 9.2
An artificial example to illustrate the meaning of interaction. The readings
for 75% and 50% seawater concentrations of Acmaea digitalis in Box 9.1
have been interchanged. Only subgroup and marginal totals are given
below.

                                Species
Seawater
concentration       A. scabra    A. digitalis       Σ
100%                  84.49         59.43         143.92
75%                   63.12         98.61         161.73
50%                   97.39         58.70         156.09
Σ                    245.00        216.74         461.74

Completed anova

Source of variation      df        SS           MS
Species                   1       16.6380      16.638 ns
Salinities                2       10.3566       5.178 ns
Sp × Sal                  2      194.8907      97.445**
Error                    42      401.5213       9.560
Total                    47      623.4066
of the second and third rows have been altered appreciably as a result of the
interchange of the readings for 75% and 50% salinity in A. digitalis. The sum
for 75% salinity is now very close to that for 50% salinity, and the difference
between the salinities, previously quite marked, is now no longer so. By contrast, the interaction SS, obtained by subtracting the sums of squares of rows
and columns from the subgroup SS, is now a large quantity. Remember that
the subgroup SS is the same in the two examples. In the first example we subtracted sums of squares due to the effects of both species and salinities, leaving
only a tiny residual representing the interaction. In the second example these
two main effects (species and salinities) account only for little of the subgroup
sum of squares, leaving the interaction sum of squares as a substantial residual.
What is the essential difference between these two examples?
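The contrast between the two examples can be checked numerically. The sketch below, written in Python under the assumption that only the subgroup totals of Box 9.1 and Table 9.2 are available, partitions the subgroup SS for both layouts. The helper name `partition_subgroup_ss` is an illustrative choice, not part of the original text.

```python
def partition_subgroup_ss(cell_totals, n):
    """Split the subgroup SS into column, row, and interaction SS.

    cell_totals holds one list per species (column); each list is ordered
    by seawater concentration (rows). Every cell is a total of n readings.
    """
    a, b = len(cell_totals), len(cell_totals[0])
    col_totals = [sum(col) for col in cell_totals]
    row_totals = [sum(col[j] for col in cell_totals) for j in range(b)]
    ct = sum(col_totals) ** 2 / (a * b * n)          # correction term
    ss_subgr = sum(t ** 2 for col in cell_totals for t in col) / n - ct
    ss_cols = sum(t ** 2 for t in col_totals) / (b * n) - ct
    ss_rows = sum(t ** 2 for t in row_totals) / (a * n) - ct
    return {
        "subgroups": ss_subgr,
        "species": ss_cols,
        "salinities": ss_rows,
        "interaction": ss_subgr - ss_cols - ss_rows,
    }


# Rows are 100%, 75%, 50% seawater; columns are A. scabra, A. digitalis.
original = [[84.49, 63.12, 97.39], [59.43, 58.70, 98.61]]
# Same totals with the 75% and 50% cells of A. digitalis interchanged.
artificial = [[84.49, 63.12, 97.39], [59.43, 98.61, 58.70]]

results = {
    "original": partition_subgroup_ss(original, n=8),
    "artificial": partition_subgroup_ss(artificial, n=8),
}
for name, ss in results.items():
    print(name, {source: round(value, 4) for source, value in ss.items()})
```

The subgroup and species SS agree between the two runs, while the salinity SS shrinks and the interaction SS grows by the same amount.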
In Table 9.3 we have shown the subgroup and marginal means for the
original data from Table 9.1 and for the altered data of Table 9.2. The original
results are quite clear: at 75% salinity, oxygen consumption is lower than at
the other two salinities, and this is true for both species. We note further that
A. scabra consumes more oxygen than A. digitalis at two of the salinities. Thus
our statements about differences due to species or to salinity can be made
largely independent of each other. However, if we had to interpret the artificial
data (lower half of Table 9.3), we would note that although A. scabra still consumes more oxygen than A. digitalis (since column sums have not changed), this
difference depends greatly on the salinity. At 100% and 50%, A. scabra consumes considerably more oxygen than A. digitalis, but at 75% this relationship
is reversed. Thus, we are no longer able to make an unequivocal statement
about the amount of oxygen taken up by the two species. We have to qualify
our statement by the seawater concentration at which they are kept. At 100%
TABLE 9.3
Comparison of means of the data in Box 9.1 and Table 9.2.

                                Species
Seawater
concentration       A. scabra    A. digitalis     Mean

Original data from Box 9.1
100%                  10.56          7.43          9.00
75%                    7.89          7.34          7.61
50%                   12.17         12.33         12.25
Mean                  10.21          9.03          9.62

Artificial data from Table 9.2
100%                  10.56          7.43          9.00
75%                    7.89         12.33         10.11
50%                   12.17          7.34          9.76
Mean                  10.21          9.03          9.62
and 50%, $\bar{Y}_{scabra} > \bar{Y}_{digitalis}$, but at 75%, $\bar{Y}_{scabra} < \bar{Y}_{digitalis}$. If we examine the
effects of salinity in the artificial example, we notice a mild increase in oxygen
consumption at 75%. However, again we have to qualify this statement by the
species of the consuming limpet; scabra consumes least at 75%, while digitalis
consumes most at this concentration.
This dependence of the effect of one factor on the level of another factor
is called interaction. It is a common and fundamental scientific idea. It indicates
that the effects of the two factors are not simply additive but that any given
combination of levels of factors, such as salinity combined with any one species,
contributes a positive or negative increment to the level of expression of the
variable. In common biological terminology a large positive increment of this
sort is called synergism. When drugs act synergistically, the result of the interaction of the two drugs may be above and beyond the sum of the separate effects
of each drug. When levels of two factors in combination inhibit each other's
effects, we call it interference. (Note that "levels" in anova is customarily used
in a loose sense to include not only continuous factors, such as the salinity in
the present example, but also qualitative factors, such as the two species of
limpets.) Synergism and interference will both tend to magnify the interaction
SS.
Testing for interaction is an important procedure in analysis of variance.
If the artificial data of Table 9.2 were real, it would be of little value to state
that 75% salinity led to slightly greater consumption of oxygen. This statement
would cover up the important differences in the data, which are that scabra
consumes least at this concentration, while digitalis consumes most.
We are now able to write an expression symbolizing the decomposition of
a single variate in a two-way analysis of variance in the manner of Expression (7.2) for single-classification anova. The expression below assumes that
both factors represent fixed treatment effects, Model I. This would seem reasonable, since species as well as salinity are fixed treatments. Variate $Y_{ijk}$ is
the kth item in the subgroup representing the ith group of treatment A and
the jth group of treatment B. It is decomposed as follows:
$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk} \qquad (9.1)$$
where $\mu$ equals the parametric mean of the population, $\alpha_i$ is the fixed treatment effect for the ith group of treatment A, $\beta_j$ is the fixed treatment effect
of the jth group of treatment B, $(\alpha\beta)_{ij}$ is the interaction effect in the subgroup
representing the ith group of factor A and the jth group of factor B, and $\epsilon_{ijk}$
is the error term of the kth item in subgroup ij. We make the usual assumption
that $\epsilon_{ijk}$ is normally distributed with a mean of 0 and a variance of $\sigma^2$. If one
or both of the factors represent Model II effects, we replace the $\alpha_i$ and/or $\beta_j$ in
the formula by $A_i$ and/or $B_j$.
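To make Expression (9.1) concrete, the sketch below builds a table of subgroup means from chosen values of $\mu$, $\alpha_i$, $\beta_j$, and $(\alpha\beta)_{ij}$ and then recovers those components from the means alone. It is an illustration in Python under stated assumptions: the parameter values are hypothetical, the error term is set to zero so that the recovery is exact, and the effects are chosen to sum to zero over each index, as is usual for fixed effects. The function name `decompose` is also an assumption.

```python
def decompose(cell_means):
    """Estimate mu, alpha_i, beta_j and (alpha beta)_ij from subgroup means.

    cell_means[i][j] is the mean for group i of factor A and group j of B.
    """
    a, b = len(cell_means), len(cell_means[0])
    mu = sum(sum(row) for row in cell_means) / (a * b)
    alpha = [sum(cell_means[i]) / b - mu for i in range(a)]
    beta = [sum(cell_means[i][j] for i in range(a)) / a - mu for j in range(b)]
    interaction = [
        [cell_means[i][j] - mu - alpha[i] - beta[j] for j in range(b)]
        for i in range(a)
    ]
    return mu, alpha, beta, interaction


# Hypothetical parameters (not taken from the limpet data).
mu_true = 10.0
alpha_true = [1.0, -1.0]                                 # sums to zero
beta_true = [2.0, -3.0, 1.0]                             # sums to zero
inter_true = [[0.5, -0.5, 0.0], [-0.5, 0.5, 0.0]]        # rows and columns sum to zero

# Subgroup means implied by Expression (9.1) when the error term is zero.
cell_means = [
    [mu_true + alpha_true[i] + beta_true[j] + inter_true[i][j] for j in range(3)]
    for i in range(2)
]

mu_hat, alpha_hat, beta_hat, inter_hat = decompose(cell_means)
print(mu_hat, alpha_hat, beta_hat, inter_hat)
```

With real data the error term does not vanish, so the recovered components are estimates rather than the parametric values.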
In previous chapters we have seen that each sum of squares represents a
sum of squared deviations. What actual deviations does an interaction SS represent? We can see this easily by referring back to the anovas of Table 9.1. The
variation among subgroups is represented by $(\bar{Y} - \bar{\bar{Y}})$, where $\bar{Y}$ stands for the
subgroup mean, and $\bar{\bar{Y}}$ for the grand mean. When we subtract the deviations
due to rows $(\bar{R} - \bar{\bar{Y}})$ and those due to columns $(\bar{C} - \bar{\bar{Y}})$ from those due to subgroups, we obtain
$$(\bar{Y} - \bar{\bar{Y}}) - (\bar{R} - \bar{\bar{Y}}) - (\bar{C} - \bar{\bar{Y}}) = \bar{Y} - \bar{\bar{Y}} - \bar{R} + \bar{\bar{Y}} - \bar{C} + \bar{\bar{Y}} = \bar{Y} - \bar{R} - \bar{C} + \bar{\bar{Y}}$$
This somewhat involved expression is the deviation due to interaction. When
we evaluate one such expression for each subgroup, square it, sum the squares,
and multiply the sum by n, we obtain the interaction SS. This partition of the
deviations also holds for their squares. This is so because the sums of the products of the separate terms cancel out.
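The recipe just described (evaluate the deviation for each subgroup, square, sum, multiply by n) can be sketched as follows. This Python illustration assumes the subgroup totals of Box 9.1 and divides them by n = 8 to obtain means; the variable names are illustrative.

```python
n = 8
a, b = 2, 3
# Subgroup totals from Box 9.1: columns are A. scabra and A. digitalis,
# rows are 100%, 75%, and 50% seawater.
cell_totals = [[84.49, 63.12, 97.39], [59.43, 58.70, 98.61]]

cell_means = [[t / n for t in col] for col in cell_totals]            # subgroup means
col_means = [sum(col) / b for col in cell_means]                      # C-bar
row_means = [sum(col[j] for col in cell_means) / a for j in range(b)] # R-bar
grand_mean = sum(col_means) / a                                       # grand mean

# Deviation due to interaction for each subgroup:
# subgroup mean - row mean - column mean + grand mean.
interaction_devs = [
    [cell_means[i][j] - row_means[j] - col_means[i] + grand_mean for j in range(b)]
    for i in range(a)
]

ss_interaction = n * sum(d ** 2 for col in interaction_devs for d in col)
print(round(ss_interaction, 4))
```

Up to rounding, the result agrees with the interaction SS obtained by subtraction in step 11 of Box 9.1.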
A simple method for revealing the nature of the interaction present in the
data is to inspect the means of the original data table. We can do this in Table
9.3. The original data, showing no interaction, yield the following pattern of
relative magnitudes:

            Scabra    Digitalis
100%
              ∨           ∨
75%
              ∧           ∧
50%

The relative magnitudes of the means in the lower part of Table 9.3 can be summarized as follows:

            Scabra    Digitalis
100%
              ∨           ∧
75%
              ∧           ∨
50%

When the pattern of signs expressing relative magnitudes is not uniform as in
this latter table, interaction is indicated. As long as the pattern of means is
consistent, as in the former table, interaction may not be present. However,
interaction is often present without change in the direction of the differences;
sometimes only the relative magnitudes are affected. In any case, the statistical
test needs to be performed to test whether the deviations are larger than can
be expected from chance alone.
In summary, when the effect of two treatments applied together cannot be
predicted from the average responses of the separate factors, statisticians call
this phenomenon interaction and test its significance by means of an interaction
mean square. This is a very common phenomenon. If we say that the effect of
density on the fecundity or weight of a beetle depends on its genotype, we
imply that a genotype × density interaction is present. If the success of several
alternative surgical procedures depends on the nature of the postoperative
treatment, we speak of a procedure × treatment interaction. Or if the effect of
temperature on a metabolic process is independent of the effect of oxygen
concentration, we say that temperature × oxygen interaction is absent.
Significance testing in a two-way anova will be deferred until the next
section. However, we should point out that the computational steps 4 and 9
of Box 9.1 could have been shortened by employing the simplified formula for
a sum of squares between two groups, illustrated in Section 8.4. In an analysis
with only two rows and two columns the interaction SS can be computed
directly as
$$\frac{(\text{Sum of one diagonal} - \text{sum of other diagonal})^2}{abn}$$
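The diagonal shortcut can be compared with the subtraction method on a small case. The following Python sketch uses hypothetical subgroup totals for a 2 × 2 layout with n = 5 readings per cell; the numbers are invented for illustration only.

```python
n = 5
a, b = 2, 2
# Hypothetical subgroup totals: cell_totals[i][j] for column i and row j.
cell_totals = [[40.0, 52.0], [35.0, 61.0]]

# Shortcut: (sum of one diagonal - sum of other diagonal)^2 / abn.
diag_main = cell_totals[0][0] + cell_totals[1][1]
diag_other = cell_totals[0][1] + cell_totals[1][0]
ss_shortcut = (diag_main - diag_other) ** 2 / (a * b * n)

# Subtraction method: subgroup SS minus column SS minus row SS.
col_totals = [sum(col) for col in cell_totals]
row_totals = [cell_totals[0][j] + cell_totals[1][j] for j in range(b)]
ct = sum(col_totals) ** 2 / (a * b * n)
ss_subgr = sum(t ** 2 for col in cell_totals for t in col) / n - ct
ss_cols = sum(t ** 2 for t in col_totals) / (b * n) - ct
ss_rows = sum(t ** 2 for t in row_totals) / (a * n) - ct
ss_by_subtraction = ss_subgr - ss_cols - ss_rows

print(ss_shortcut, ss_by_subtraction)
```

The two values coincide only for a 2 × 2 layout with equal subgroup sizes, which is the case the shortcut is stated for.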
9.2 Two-way anova: Significance testing
Before we can test hypotheses about the sources of variation isolated in Box 9.1,
we must become familiar with the expected mean squares for this design. In
the anova table of Box 9.1 we first show the expected mean squares for Model
I, both species differences and seawater concentrations being fixed treatment
effects. The terms should be familiar in the context of your experience in the
previous chapter. The quantities $\sum^{a}\alpha^2$, $\sum^{b}\beta^2$, and $\sum^{ab}(\alpha\beta)^2$ represent added
components due to treatment for columns, rows, and interaction, respectively.
Note that the within-subgroups or error MS again estimates the parametric
variance of the items, $\sigma^2$.
The most important fact to remember about a Model I anova is that the
mean square at each level of variation carries only the added effect due to that
level of treatment. Except for the parametric variance of the items, it does not
contain any term from a lower line. Thus, the expected MS of factor A contains
only the parametric variance of the items plus the added term due to factor A,
but does not also include interaction effects. In Model I, the significance test
is therefore simple and straightforward. Any source of variation is tested by the
variance ratio of the appropriate mean square over the error MS. Thus, for the
appropriate tests we employ variance ratios A/Error, B/Error, and (A × B)/
Error, where each boldface term signifies a mean square. Thus A = $MS_A$,
Error = $MS_{within}$.
When we do this in the example of Box 9.1, we find only factor B, salinity,
significant. Neither factor A nor the interaction is significant. We conclude that
the differences in oxygen consumption are induced by varying salinities ($O_2$
consumption responds in a V-shaped manner), and there does not appear to be
sufficient evidence for species differences in oxygen consumption. The tabulation
of the relative magnitudes of the means in the previous section shows that the
pattern of signs in the two lines is identical. However, this may be misleading,
since the mean of A. scabra is far higher at 100% seawater than at 75%, but that
of A. digitalis is only very slightly higher. Although the oxygen consumption
curves of the two species when graphed appear far from parallel (see Figure
9.2), this suggestion of a species × salinity interaction cannot be shown to be
significant when compared with the within-subgroups variance. Finding a
significant difference among salinities does not conclude the analysis. The data
suggest that at 75% salinity there is a real reduction in oxygen consumption.
Whether this is really so could be tested by the methods of Section 8.6.
When we analyze the results of the artificial example in Table 9.2, we find
only the interaction MS significant. Thus, we would conclude that the response
to salinity differs in the two species. This is brought out by inspection of the
data, which show that at 75% salinity A. scabra consumes least oxygen and
A. digitalis consumes most.
In the last (artificial) example the mean squares of the two factors (main
effects) are not significant, in any case. However, many statisticians would not
even test them once they found the interaction mean square to be significant,
since in such a case an overall statement for each factor would have little
meaning. A simple statement of response to salinity would be unclear. The presence
of interaction makes us qualify our statements: "The pattern of response to
changes in salinity differed in the two species." We would consequently have
to describe separate, nonparallel response curves for the two species.
Occasionally, it becomes important to test for overall significance in a Model I
anova in spite of the presence of interaction. We may wish to demonstrate
the significance of the effect of a drug, regardless of its significant interaction
with age of the patient. To support this contention, we might wish to test the
mean square among drug concentrations (over the error MS), regardless of
whether the interaction MS is significant.
FIGURE 9.2
Oxygen consumption by two species of limpets at three salinities (horizontal axis: % seawater, at 50, 75, and 100). Data from Box 9.1.
Box 9.1 also lists expected mean squares for a Model II anova and a
mixed-model two-way anova. Here, variance components for columns (factor A), for
rows (factor B), and for interaction make their appearance, and they are
designated $\sigma^2_A$, $\sigma^2_B$, and $\sigma^2_{AB}$, respectively. In the
Model II anova note that the two main effects contain the variance component of
the interaction as well as their own variance component. In a Model II anova we
first test (A × B)/Error. If the interaction is significant, we continue testing
A/(A × B) and B/(A × B). But when A × B is not significant, some authors suggest
computation of a pooled error
MS = $(SS_{A \times B} + SS_{within})/(df_{A \times B} + df_{within})$ to test
the significance of the main effects. The conservative position is to continue
to test the main effects over the interaction MS, and we shall follow this
procedure in this book. Only one type of mixed model is shown in Box 9.1, in
which factor A is assumed to be fixed and factor B to be random. If the
situation is reversed, the expected mean squares change accordingly. In the
mixed model, it is the mean square representing the fixed treatment that carries
with it the variance component of the interaction, while the mean square
representing the random factor contains only the error variance and its own
variance component and does not include the interaction component. We therefore
test the MS of the random main effect over the error, but test the fixed
treatment MS over the interaction.
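The testing rules for the three models can be summarized compactly. The sketch below is only an illustration of the rules stated in this section; the function names, the string labels, and the `pool` switch are our own assumptions, and the numbers in the usage line are hypothetical.

```python
def f_ratio_denominators(model, interaction_significant=True, pool=False):
    """Return, for each source of variation, the mean square used as denominator.

    model: "I" (both factors fixed), "II" (both random), or
           "mixed" (factor A fixed, factor B random, as in Box 9.1).
    pool:  if True and the interaction is not significant in Model II,
           use the pooled error MS suggested by some authors; the
           conservative default keeps the interaction MS.
    """
    if model == "I":
        return {"A": "Error", "B": "Error", "AxB": "Error"}
    if model == "II":
        main = "AxB"
        if pool and not interaction_significant:
            main = "Pooled"
        return {"A": main, "B": main, "AxB": "Error"}
    if model == "mixed":
        # Fixed factor A is tested over the interaction, random factor B over error.
        return {"A": "AxB", "B": "Error", "AxB": "Error"}
    raise ValueError("model must be 'I', 'II', or 'mixed'")


def pooled_error_ms(ss_axb, df_axb, ss_within, df_within):
    """Pooled error MS = (SS_AxB + SS_within) / (df_AxB + df_within)."""
    return (ss_axb + ss_within) / (df_axb + df_within)


print(f_ratio_denominators("mixed"))
print(pooled_error_ms(12.0, 2, 48.0, 18))  # hypothetical SS and df values
```

Keeping the pooled error as an explicit option, off by default, mirrors the conservative procedure followed in this book.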
9.3 Two-way anova without replication
In many experiments there will be no replication for each combination of factors
represented by a cell in the data table. In such cases we cannot easily talk of
"subgroups," since each cell contains a single reading only. Frequently it may
be too difficult or too expensive to obtain more than one reading per cell,
or the measurements may be known to be so repeatable that there is little
point in estimating their error. As we shall see in the following, a two-way anova
without replication can be properly applied only with certain assumptions.
For some models and tests in anova we must assume that there is no interaction
present.
Our illustration for this design is from a study in metabolic physiology.
In Box 9.2 we show levels of a chemical, S-PLP, in the blood serum of eight
students before, immediately after, and 12 hours after the administration of an
alcohol dose. Each student has been measured only once at each time. What
is the appropriate model for this anova?
Clearly, the times are Model I. The eight individuals, however, are not likely
to be of specific interest. It is improbable that an investigator would try to ask
why student 4 has an S-PLP level so much higher than that of student 3. We
would draw more meaningful conclusions from this problem if we considered
the eight individuals to be randomly sampled. We could then estimate the
variation among individuals with respect to the effect of alcohol over time.
The computations are shown in Box 9.2. They are the same as those in Box
9.1 except that the expressions to be evaluated are considerably simpler. Since
n = 1, much of the summation can be omitted. The subgroup sum of squares
BOX 9.2
Two-way anova without replication.
Serum-pyridoxal-t-phosphate (S-PLP) content (ng per ml of serum) of blood serum before and after ingestion of alcohol in eight subjects. This is a mixed-model anova.

                          Factor A: Time (a = 3)
Factor B:        Before       Immediately
Individuals      alcohol      after          12 hours
(b = 8)          ingestion    ingestion      later         Σ
1                 20.00        12.34          17.45         49.79
2                 17.62        16.72          18.25         52.59
3                 11.77         9.84          11.45         33.06
4                 30.78        20.25          28.70         79.73
5                 11.25         9.70          12.50         33.45
6                 19.17        15.67          20.04         54.88
7                  9.33         8.06          10.00         27.39
8                 32.96        19.10          30.45         82.51
Σ                152.88       111.68         148.84        413.40

Source: Data from Leinert et al. (1983).
The eight sets of three readings are treated as replications (blocks) in this analysis. Time is a fixed treatment effect, while differences between individuals are considered to be random effects. Hence, this is a mixed-model anova.

Preliminary computations
1. Grand total = $\sum^a \sum^b Y = 413.40$
2. Sum of the squared observations = $\sum^a \sum^b Y^2 = (20.00)^2 + \cdots + (30.45)^2 = 8349.4138$
3. Sum of squared column totals divided by sample size of a column
   $= \frac{\sum^a \left(\sum^b Y\right)^2}{b} = \frac{(152.88)^2 + (111.68)^2 + (148.84)^2}{8} = 7249.7578$
4. Sum of squared row totals divided by sample size of a row
   $= \frac{\sum^b \left(\sum^a Y\right)^2}{a} = \frac{(49.79)^2 + \cdots + (82.51)^2}{3} = 8127.8059$
5. Grand total squared and divided by the total sample size = correction term
   $CT = \frac{\left(\sum^a \sum^b Y\right)^2}{ab} = \frac{(\text{quantity 1})^2}{ab} = \frac{(413.40)^2}{24} = 7120.8150$
6. $SS_{total} = \sum^a \sum^b Y^2 - CT$ = quantity 2 - quantity 5 = 8349.4138 - 7120.8150 = 1228.5988
7. $SS_A$ (SS of columns) $= \frac{\sum^a \left(\sum^b Y\right)^2}{b} - CT$ = quantity 3 - quantity 5 = 7249.7578 - 7120.8150 = 128.9428
8. $SS_B$ (SS of rows) $= \frac{\sum^b \left(\sum^a Y\right)^2}{a} - CT$ = quantity 4 - quantity 5 = 8127.8059 - 7120.8150 = 1006.9909
9. $SS_{error}$ (remainder; discrepance) $= SS_{total} - SS_A - SS_B$ = quantity 6 - quantity 7 - quantity 8
   = 1228.5988 - 128.9428 - 1006.9909 = 92.6651
Anova table
Source of variation                df    SS           MS          F_s
A (columns; time)                   2     128.9428     64.4714     9.741**
B (rows; individuals)               7    1006.9909    143.8558    21.73**
Error (remainder; discrepance)     14      92.6651      6.6189
Total                              23    1228.5988
FIGURE 9.3
Diagrammatic representation of the partitioning of the total sums of squares in a two-way orthogonal anova without replication. The areas of the subdivisions are not shown proportional to the magnitudes of the sums of squares. Total SS = 1228.5988 is partitioned into Row SS = 1006.9909, Column SS = 128.9428, and Interaction SS = 92.6651 (the remainder); Subgroup SS = 1228.5988 and Error SS = 0.
in this example is the same as the total sum of squares. If this is not immediately
apparent, consult Figure 9.3, which, when compared with Figure 9.1, illustrates
that the error sum of squares based on variation within subgroups is missing
in this example. Thus, after we subtract the sum of squares for columns (factor
A) and for rows (factor B) from the total SS, we are left with only a single sum
of squares, which is the equivalent of the previous interaction SS but which is
now the only source for an error term in the anova. This SS is known as the
remainder SS or the discrepance.
If you refer to the expected mean squares for the two-way anova in Box 9.1,
you will discover why we made the statement earlier that for some models and
tests in a two-way anova without replication we must assume that the
interaction is not significant. If interaction is present, only a Model II anova can
be entirely tested, while in a mixed model only the fixed level can be tested
over the remainder mean square. But in a pure Model I anova, or for the
random factor in a mixed model, it would be improper to test the main effects
over the remainder unless we could reliably assume that no added effect due
to interaction is present. General inspection of the data in Box 9.2 convinces
us that the trends with time for any one individual are faithfully reproduced
for the other individuals. Thus, interaction is unlikely to be present. If, for
example, some individuals had not responded with a lowering of their S-PLP
levels after ingestion of alcohol, interaction would have been apparent, and the
test of the mean square among individuals carried out in Box 9.2 would not
have been legitimate.
Since we assume no interaction, the row and column mean squares are
tested over the error MS. The results are not surprising; casual inspection of
the data would have predicted our findings. Differences with time are highly
significant, yielding an $F_s$ value of 9.741. The added variance among individuals
is also highly significant, assuming there is no interaction.
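The arithmetic of Box 9.2 can be retraced in a few lines of Python. This is a minimal sketch rather than part of the original box: the variable names are our own, and the only data used are the 24 S-PLP readings listed in the box.

```python
# S-PLP readings from Box 9.2: one row per individual, columns are
# before, immediately after, and 12 hours after alcohol ingestion.
data = [
    [20.00, 12.34, 17.45],
    [17.62, 16.72, 18.25],
    [11.77, 9.84, 11.45],
    [30.78, 20.25, 28.70],
    [11.25, 9.70, 12.50],
    [19.17, 15.67, 20.04],
    [9.33, 8.06, 10.00],
    [32.96, 19.10, 30.45],
]
a = len(data[0])  # columns (times)
b = len(data)     # rows (individuals)

grand_total = sum(sum(row) for row in data)               # quantity 1
sum_sq = sum(y * y for row in data for y in row)          # quantity 2
col_totals = [sum(row[j] for row in data) for j in range(a)]
row_totals = [sum(row) for row in data]
ct = grand_total ** 2 / (a * b)                           # quantity 5

ss_total = sum_sq - ct                                    # quantity 6
ss_a = sum(t * t for t in col_totals) / b - ct            # quantity 7
ss_b = sum(t * t for t in row_totals) / a - ct            # quantity 8
ss_rem = ss_total - ss_a - ss_b                           # quantity 9

# With n = 1 the remainder is the only available error term.
ms_a = ss_a / (a - 1)
ms_b = ss_b / (b - 1)
ms_rem = ss_rem / ((a - 1) * (b - 1))
f_time = ms_a / ms_rem
f_individuals = ms_b / ms_rem

print(f"SS_total={ss_total:.4f} SS_A={ss_a:.4f} SS_B={ss_b:.4f} SS_rem={ss_rem:.4f}")
print(f"F_time={f_time:.3f} F_individuals={f_individuals:.3f}")
```

The sums of squares should agree with quantities 6 through 9 of Box 9.2, and the variance ratio for time should agree with the value quoted in the text up to rounding of the mean squares.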
A common application of two-way anova without replication is the repeated
testing of the same individuals. By this we mean that the same group of individuals
is tested repeatedly over a period of time. The individuals are one factor (usually
considered as random and serving as replication), and the time dimension is
the second factor, a fixed treatment effect. For example, we might measure
growth of a structure in ten individuals at regular intervals. When we test for
the presence of an added variance component (due to the random factor), we
again must assume that there is no interaction between time and the individuals;
that is, the responses of the several individuals are parallel through time.
Another use of this design is found in various physiological and psychological
experiments in which we test the same group of individuals for the appearance
of some response after treatment. Examples include increasing immunity after
antigen inoculations, altered responses after conditioning, and measures of
learning after a number of trials. Thus, we may study the speed with which ten
rats, repeatedly tested on the same maze, reach the end point. The
fixed-treatment effect would be the successive trials to which the rats have been
subjected. The second factor, the ten rats, is random, presumably representing
a random sample of rats from the laboratory population.
One special case, common enough to merit separate discussion, is repeated
testing of the same individuals in which only two treatments (a = 2) are given.
This case is also known as paired comparisons, because each observation
for one treatment is paired with one for the other treatment. This pair is
composed of the same individuals tested twice or of two individuals with common
experiences, so that we can legitimately arrange the data as a two-way
anova.
Let us elaborate on this point. Suppose we test the muscle tone of a group
of individuals, subject them to severe physical exercise, and measure their muscle
tone once more. Since the same group of individuals will have been tested twice,
we can arrange our muscle tone readings in pairs, each pair representing readings
on one individual (before and after exercise). Such data are appropriately treated
by a two-way anova without replication, which in this case would be a
paired-comparisons test because there are only two treatment classes. This "before and
after treatment" comparison is a very frequent design leading to paired
comparisons. Another design simply measures two stages in the development of a
group of organisms, time being the treatment intervening between the two
stages. The example in Box 9.3 is of this nature. It measures lower face width
in a group of girls at age five and in the same group of girls when they are six
years old. The paired comparison is for each individual girl, between her face
width when she is five years old and her face width at six years.
Paired comparisons often result from dividing an organism or other
individual unit so that half receives treatment 1 and the other half treatment 2,
which may be the control. Thus, if we wish to test the strength of two antigens
or allergens we might inject one into each arm of a single individual and
measure the diameter of the red area produced. It would not be wise, from the
point of view of experimental design, to test antigen 1 on individual 1 and
antigen 2 on individual 2. These individuals may be differentially susceptible
to these antigens, and we may learn little about the relative potency of the
BOX 9.3
Paired comparisons (randomized blocks with a = 2).
Lower face width (skeletal bigonial diameter in cm) for 15 North American white
girls measured when 5 and again when 6 years old.

                 (1)            (2)            (3)         (4)
Individuals      5-year-olds    6-year-olds    Σ           D = Y_i2 - Y_i1
                                                           (difference)
1                  7.33           7.53          14.86       0.20
2                  7.49           7.70          15.19        .21
3                  7.27           7.46          14.73        .19
4                  7.93           8.21          16.14        .28
5                  7.56           7.81          15.37        .25
6                  7.81           8.01          15.82        .20
7                  7.46           7.72          15.18        .26
8                  6.94           7.13          14.07        .19
9                  7.49           7.68          15.17        .19
10                 7.44           7.66          15.10        .22
11                 7.95           8.11          16.06        .16
12                 7.47           7.66          15.13        .19
13                 7.04           7.20          14.24        .16
14                 7.10           7.25          14.35        .15
15                 7.64           7.79          15.43        .15
ΣY               111.92         114.92         226.84       3.00
ΣY²              836.3300       881.8304      3435.6992     0.6216

Source: From a larger study by Newman and Meredith (1956).

Two-way anova without replication
Anova table
Source of variation              df   SS       MS             F_s          Expected MS
Ages (columns; factor A)          1   0.3000   0.3000         388.89**     $\sigma^2 + \sigma^2_{AB} + \frac{b}{a-1}\sum^a \alpha^2$
Individuals (rows; factor B)     14   2.6367   0.188,34       (244.14)**   $\sigma^2 + a\sigma^2_B$
Remainder                        14   0.0108   0.000,771,43                $\sigma^2 + \sigma^2_{AB}$
Total                            29   2.9475

$F_{0.01[1,14]} = 8.86$        $F_{0.01[12,12]} = 4.16$ (Conservative tabled value)

Conclusions. The variance ratio for ages is highly significant. We conclude
that faces of 6-year-old girls are wider than those of 5-year-olds. If we are willing
to assume that the interaction $\sigma^2_{AB}$ is zero, we may test for an added variance
component among individual girls and would find it significant.

The t test for paired comparisons
$$t_s = \frac{\bar{D} - (\mu_1 - \mu_2)}{s_{\bar{D}}}$$
where $\bar{D}$ is the mean difference between the paired observations.
$$\bar{D} = \frac{\sum D}{b} = \frac{3.00}{15} = 0.20$$
and $s_{\bar{D}} = s_D/\sqrt{b}$ is the standard error of $\bar{D}$ calculated from the observed differences
in column (4):
$$s_D = \sqrt{\frac{\sum D^2 - (\sum D)^2/b}{b - 1}} = \sqrt{\frac{0.6216 - (3.00^2/15)}{14}} = \sqrt{\frac{0.0216}{14}} = \sqrt{0.001{,}542{,}86} = 0.039{,}279{,}2$$
and thus
$$s_{\bar{D}} = \frac{s_D}{\sqrt{b}} = \frac{0.039{,}279{,}2}{\sqrt{15}} = 0.010{,}141{,}9$$
We assume that the true difference between the means of the two groups, $\mu_1 - \mu_2$,
equals zero:
$$t_s = \frac{\bar{D} - 0}{s_{\bar{D}}} = \frac{0.20 - 0}{0.010{,}141{,}9} = 19.7203 \quad \text{with } b - 1 = 14 \text{ degrees of freedom}$$
This yields $P \ll 0.01$. Also $t_s^2 = 388.89$, which equals the previous $F_s$.
antigens, since this would be confounded by the differential responses of the
subjects. A much better design would be first to inject antigen 1 into the left arm
and antigen 2 into the right arm of a group of n individuals and then to analyze
the data as a two-way anova without replication, with n rows (individuals) and
2 columns (treatments). It is probably immaterial whether an antigen is injected
into the right or left arm, but if we were designing such an experiment and
knew little about the reaction of humans to antigens, we might, as a precaution,
randomly allocate antigen 1 to the left or right arm for different subjects, antigen
2 being injected into the opposite arm. A similar example is the testing of certain
plant viruses by rubbing a concentration of the virus over the surface of a leaf
and counting the resulting lesions. Since different leaves are susceptible in
different degrees, a conventional way of measuring the strength of the virus is to
wipe it over the half of the leaf on one side of the midrib, rubbing the other
half of the leaf with a control or standard solution.
Another design leading to paired comparisons is to apply the treatment to
two individuals sharing a common experience, be this genetic or environmental.
Thus, a drug or a psychological test might be given to groups of twins or sibs,
one of each pair receiving the treatment, the other one not.
Finally, the paired-comparisons technique may be used when the two
individuals to be compared share a single experimental unit and are thus subjected
to common environmental experiences. If we have a set of rat cages, each of
which holds two rats, and we are trying to compare the effect of a hormone
injection with a control, we might inject one of each pair of rats with the
hormone and use its cage mate as a control. This would yield a 2 × n anova
for n cages.
One reason for featuring the paired-comparisons test separately is that it
alone among the two-way anovas without replication has an equivalent,
alternative method of analysis: the t test for paired comparisons, which is the
traditional method of analyzing it.
The paired-comparisons case shown in Box 9.3 analyzes face widths of five-
and six-year-old girls, as already mentioned. The question being asked is
whether the faces of six-year-old girls are significantly wider than those of
five-year-old girls. The data are shown in columns (1) and (2) for 15 individual girls.
Column (3) features the row sums that are necessary for the analysis of variance.
The computations for the two-way anova without replication are the same as
those already shown for Box 9.2 and thus are not shown in detail. The anova
table shows that there is a highly significant difference in face width between
the two age groups. If interaction is assumed to be zero, there is a large added
variance component among the individual girls, undoubtedly representing
genetic as well as environmental differences.
The other method of analyzing paired-comparisons designs is the
well-known t test for paired comparisons. It is quite simple to apply and is illustrated
in the second half of Box 9.3. It tests whether the mean of sample differences
between pairs of readings in the two columns is significantly different from a
hypothetical mean, which the null hypothesis puts at zero. The standard error
over which this is tested is the standard error of the mean difference. The
difference column has to be calculated and is shown in column (4) of the data
table in Box 9.3. The computations are quite straightforward, and the
conclusions are the same as for the two-way anova. This is another instance in which
we obtain the value of $F_s$ when we square the value of $t_s$.
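The equivalence of the two analyses can be checked numerically on the face-width data of Box 9.3. The following is a minimal sketch, not part of the original box; the variable names are our own, and only the 30 readings from columns (1) and (2) of the box are used.

```python
import math

# Lower face widths (cm) from Box 9.3, columns (1) and (2).
five = [7.33, 7.49, 7.27, 7.93, 7.56, 7.81, 7.46, 6.94,
        7.49, 7.44, 7.95, 7.47, 7.04, 7.10, 7.64]
six = [7.53, 7.70, 7.46, 8.21, 7.81, 8.01, 7.72, 7.13,
       7.68, 7.66, 8.11, 7.66, 7.20, 7.25, 7.79]
a, b = 2, len(five)

# Paired-comparisons t test on the differences of column (4).
diffs = [y2 - y1 for y1, y2 in zip(five, six)]
d_bar = sum(diffs) / b
s_d = math.sqrt((sum(d * d for d in diffs) - sum(diffs) ** 2 / b) / (b - 1))
se_d_bar = s_d / math.sqrt(b)
t_s = (d_bar - 0) / se_d_bar  # null hypothesis: true mean difference is zero

# Two-way anova without replication on the same data.
ct = (sum(five) + sum(six)) ** 2 / (a * b)
ss_total = sum(y * y for y in five + six) - ct
ss_ages = (sum(five) ** 2 + sum(six) ** 2) / b - ct
ss_individuals = sum((y1 + y2) ** 2 for y1, y2 in zip(five, six)) / a - ct
ss_remainder = ss_total - ss_ages - ss_individuals
f_s = (ss_ages / (a - 1)) / (ss_remainder / ((a - 1) * (b - 1)))

print(f"t_s = {t_s:.4f}, t_s squared = {t_s ** 2:.2f}, F_s = {f_s:.2f}")
```

The squared t statistic and the variance ratio for ages should coincide apart from floating-point rounding, which is the numerical counterpart of the statement that the two methods lead to the same conclusion.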
Although the paired-comparisons t test is the traditional method of solving
this type of problem, we prefer the two-way anova. Its computation is no more
time-consuming and has the advantage of providing a measure of the variance
component among the rows (blocks). This is useful knowledge, because if there
is no significant added variance component among blocks, one might simplify
the analysis and design of future, similar studies by employing single
classification anova.
Exercises
9.1
Swanson, Latshaw, and Tague (1921) determined soil pH electrometrically for
various soil samples from Kansas. An extract of their data (acid soils) is
shown below. Do subsoils differ in pH from surface soils (assume that there is
no interaction between localities and depth for pH reading)?

County         Soil type                    Surface pH    Subsoil pH
Finney         Richfield silt loam          6.57          8.34
Montgomery     Summit silty clay loam       6.77          6.13
Doniphan       Brown silt loam              6.53          6.32
Jewell         Jewell silt loam             6.71          8.30
Jewell         Colby silt loam              6.72          8.44
Shawnee        Crawford silty clay loam     6.01          6.80
Cherokee       Oswego silty clay loam       4.99          4.42
Greenwood      Summit silty clay loam       5.49          7.90
Montgomery     Cherokee silt loam           5.56          5.20
Montgomery     Oswego silt loam             5.32          5.32
Cherokee       Bates silt loam              5.92          5.21
Cherokee       Cherokee silt loam           6.55          5.66
Cherokee       Neosho silt loam             6.53          5.66

ANS. MS between surface and subsoils = 0.6246, $MS_{residual}$ = 0.6985, $F_s$ = 0.849
which is clearly not significant at the 5% level.
9.2
The following data were extracted from a Canadian record book of purebred
dairy cattle. Random samples of 10 mature (five-year-old and older) and 10
two-year-old cows were taken from each of five breeds (honor roll, 305-day
class). The average butterfat percentages of these cows were recorded. This
gave us a total of 100 butterfat percentages, broken down into five breeds
and into two age classes. The 100 butterfat percentages are given below.
Analyze and discuss your results. You will note that the tedious part of
the calculation has been done for you.

        Ayrshire         Canadian         Guernsey         Holstein-Friesian  Jersey
        Mature  2-yr     Mature  2-yr     Mature  2-yr     Mature  2-yr       Mature  2-yr
         3.74    4.44     3.92    4.29     4.54    5.30     3.40    3.79       4.80    5.75
         4.01    4.37     4.95    5.24     5.18    4.50     3.55    3.66       6.45    5.14
         3.77    4.25     4.47    4.43     5.75    4.59     3.83    3.58       5.18    5.25
         3.78    3.71     4.28    4.00     5.04    5.04     3.95    3.38       4.49    4.76
         4.10    4.08     4.07    4.62     4.64    4.83     4.43    3.71       5.24    5.18
         4.06    3.90     4.10    4.29     4.79    4.55     3.70    3.94       5.70    4.22
         4.27    4.41     4.38    4.85     4.72    4.97     3.30    3.59       5.41    5.98
         3.94    4.11     3.98    4.66     3.88    5.38     3.93    3.55       4.77    4.85
         4.11    4.37     4.46    4.40     5.28    5.39     3.58    3.55       5.18    6.55
         4.25    3.53     5.05    4.33     4.66    5.97     3.54    3.43       5.23    5.72
ΣY      40.03   41.17    43.66   45.11    48.48   50.52    37.21   36.18      52.45   53.40
Mean     4.003   4.117    4.366   4.511    4.848   5.052    3.721   3.618      5.245   5.340

$\sum^a \sum^b \sum^n Y^2 = 2059.6109$
9.3
Blakeslee (1921) studied length-width ratios of second seedling leaves of two
types of Jimson weed called globe (G) and nominal (N). Three seeds of each
type were planted in 16 pots. Is there sufficient evidence to conclude that globe
and nominal differ in length-width ratio?

Pot                              Types
identification
number               G                         N
16533        1.67    1.53    1.61      2.18    2.23    2.32
16534        1.68    1.70    1.49      2.00    2.12    2.18
16550        1.38    1.76    1.52      2.41    2.11    2.60
16668        1.66    1.48    1.69      1.93    2.00    2.00
16767        1.38    1.61    1.64      2.32    2.23    1.90
16768        1.70    1.71    1.71      2.48    2.11    2.00
16770        1.58    1.59    1.38      2.00    2.18    2.16
16771        1.49    1.52    1.68      1.94    2.13    2.29
16773        1.48    1.44    1.58      1.93    1.95    2.10
16775        1.28    1.45    1.50      1.77    2.03    2.08
16776        1.55    1.45    1.44      2.06    1.85    1.92
16777        1.29    1.57    1.44      2.00    1.94    1.80
16780        1.36    1.22    1.41      1.87    1.87    2.26
16781        1.47    1.43    1.61      2.24    2.00    2.23
16787        1.52    1.56    1.56      1.79    2.08    1.89
16789        1.37    1.38    1.40      1.85    2.10    2.00

ANS. $MS_{within}$ = 0.0177, $MS_{T \times P}$ = 0.0203, $MS_{types}$ = 7.3206 ($F_s$ = 360.62**),
$MS_{pots}$ = 0.0598 ($F_s$ = 3.378**). The effect of pots is considered to be a Model II
factor, and types, a Model I factor.
9.4
The following data were extracted from a more extensive study by Sokal and
Karten (1964). The data represent mean dry weights (in mg) of three genotypes
of beetles, Tribolium castaneum, reared at a density of 20 beetles per gram of
flour. The four series of experiments represent replications.

                  Genotypes
Series      ++        +b        bb
1           0.958     0.986     0.925
2           0.971     1.051     0.952
3           0.927     0.891     0.829
4           0.971     1.010     0.955

Test whether the genotypes differ in mean dry weight.
9.5
The mean length of developmental period (in days) for three strains of
houseflies at seven densities is given. (Data by Sullivan and Sokal, 1963.) Do these
flies differ in development period with density and among strains? You may
assume absence of strain × density interaction.

Density                  Strains
per
container       OL       BELL      bwb
60               9.6      9.3       9.3
80              10.6      9.1       9.2
160              9.8      9.3       9.5
320             10.7      9.1      10.0
640             11.1     11.1      10.4
1280            10.9     11.8      10.8
2560            12.8     10.6      10.7

ANS. $MS_{residual}$ = 0.3426, $MS_{strains}$ = 1.3943 ($F_s$ = 4.070*), $MS_{density}$ = 2.0905
($F_s$ = 6.1019**).
9.6
The following data are extracted from those of French (1976), who carried out
a study of energy utilization in the pocket mouse Perognathus longimembris
during hibernation at different temperatures. Is there evidence that the amount
of food available affects the amount of energy consumed at different
temperatures during hibernation?

               Restricted food                              Ad-libitum food
        8°C                   18°C                   8°C                   18°C
Animal   Energy used   Animal   Energy used   Animal   Energy used   Animal   Energy used
no.      (kcal/g)      no.      (kcal/g)      no.      (kcal/g)      no.      (kcal/g)
1         62.69         5        72.60        13        95.73        17       101.19
2         54.07         6        70.97        14        63.95        18        76.88
3         65.73         7        74.32        15       144.30        19        74.08
4         62.98         8        53.02        16       144.30        20        81.40