CHAPTER 7

Introduction to Analysis of Variance

We now proceed to a study of the analysis of variance. This method, developed by R. A. Fisher, is fundamental to much of the application of statistics in biology and especially to experimental design. One use of the analysis of variance is to test whether two or more sample means have been obtained from populations with the same parametric mean. Where only two samples are involved, the t test can also be used. However, the analysis of variance is a more general test, which permits testing two samples as well as many, and we are therefore introducing it at this early stage in order to equip you with this powerful weapon for your statistical arsenal. We shall discuss the t test for two samples as a special case in Section 8.4.

In Section 7.1 we shall approach the subject on familiar ground, the sampling experiment of the housefly wing lengths. From these samples we shall obtain two independent estimates of the population variance. We digress in Section 7.2 to introduce yet another continuous distribution, the F distribution, needed for the significance test in analysis of variance. Section 7.3 is another digression; here we show how the F distribution can be used to test whether two samples may reasonably have been drawn from populations with the same variance. We are now ready for Section 7.4, in which we examine the effects of subjecting the samples to different treatments. In Section 7.5, we describe the partitioning of sums of squares and of degrees of freedom, the actual analysis of variance.
The last two sections (7.6 and 7.7) take up in a more formal way the two scientific models for which the analysis of variance is appropriate, the so-called fixed treatment effects model (Model I) and the variance component model (Model II).

Except for Section 7.3, the entire chapter is largely theoretical. We shall postpone the practical details of computation to Chapter 8. However, a thorough understanding of the material in Chapter 7 is necessary for working out actual examples of analysis of variance in Chapter 8.

One final comment. We shall use J. W. Tukey's acronym "anova" interchangeably with "analysis of variance" throughout the text.

7.1 The variances of samples and their means

We shall approach analysis of variance through the familiar sampling experiment of housefly wing lengths (Experiment 5.1 and Table 5.1), in which we combined seven samples of 5 wing lengths to form samples of 35. We have reproduced one such sample in Table 7.1. The seven samples of 5, here called groups, are listed vertically in the upper half of the table.

Before we proceed to explain Table 7.1 further, we must become familiar with added terminology and symbolism for dealing with this kind of problem. We call our samples groups; they are sometimes called classes or are known by yet other terms we shall learn later. In any analysis of variance we shall have two or more such samples or groups, and we shall use the symbol a for the number of groups. Thus, in the present example a = 7. Each group or sample is based on n items, as before; in Table 7.1, n = 5. The total number of items in the table is a times n, which in this case equals 7 × 5 or 35.

The sums of the items in the respective groups are shown in the row underneath the horizontal dividing line. In an anova, summation signs can no longer be as simple as heretofore.
We can sum either the items of one group only or the items of the entire table. We therefore have to use superscripts with the summation symbol. In line with our policy of using the simplest possible notation, whenever this is not likely to lead to misunderstanding, we shall use $\sum^{n} Y$ to indicate the sum of the items of a group and $\sum^{an} Y$ to indicate the sum of all the items in the table. The sum of the items of each group is shown in the first row under the horizontal line. The mean of each group, symbolized by $\bar{Y}$, is in the next row and is computed simply as $\sum^{n} Y / n$. The remaining two rows in that portion of Table 7.1 list $\sum^{n} Y^2$ and $\sum^{n} y^2$, separately for each group. These are the familiar quantities, the sum of the squared Y's and the sum of squares of Y.

From the sum of squares for each group we can obtain an estimate of the population variance of housefly wing length. Thus, in the first group $\sum^{n} y^2 = 29.2$. Therefore, our estimate of the population variance is

$$s^2 = \frac{\sum^{n} y^2}{n - 1} = \frac{29.2}{4} = 7.3$$

a rather low estimate compared with those obtained in the other samples.

[TABLE 7.1 Seven groups of 5 housefly wing lengths, with the group sums, means, sums of squared items, and sums of squares, and the computation of the sum of squares of means. The body of the table is not recoverable here.]

Since we have a sum of squares for each group, we could obtain an estimate of the population variance from each of these. However, it stands to reason that we would get a better estimate if we averaged these separate variance estimates in some way. This is done by computing the weighted average of the variances by Expression (3.2) in Section 3.1.
Actually, in this instance a simple average would suffice, since all estimates of the variance are based on samples of the same size. However, we prefer to give the general formula, which works equally well for this case and for instances of unequal sample sizes, where the weighted average is necessary. In this case each sample variance $s_i^2$ is weighted by its degrees of freedom, $w_i = n_i - 1$, resulting in a sum of squares ($\sum y_i^2$), since $(n_i - 1)s_i^2 = \sum y_i^2$. Thus, the numerator of Expression (3.2) is the sum of the sums of squares. The denominator is $\sum^{a}(n_i - 1) = 7 \times 4$, the sum of the degrees of freedom of each group. The average variance, therefore, is

$$s^2 = \frac{29.2 + 12.0 + 75.2 + 45.2 + 98.8 + 81.2 + 107.2}{28} = \frac{448.8}{28} = 16.029$$

This quantity is an estimate of 15.21, the parametric variance of housefly wing lengths. This estimate, based on 7 independent estimates of variances of groups, is called the average variance within groups or simply variance within groups. Note that we use the expression within groups, although in previous chapters we used the term variance of groups. The reason we do this is that the variance estimates used for computing the average variance have so far all come from sums of squares measuring the variation within one column. As we shall see in what follows, one can also compute variances among groups, cutting across group boundaries.

To obtain a second estimate of the population variance, we treat the seven group means $\bar{Y}$ as though they were a sample of seven observations. The resulting statistics are shown in the lower right part of Table 7.1, headed "Computation of sum of squares of means." There are seven means in this example; in the general case there will be a means. We first compute $\sum^{a} \bar{Y}$, the sum of the means. Note that this is rather sloppy symbolism.
To be entirely proper, we should identify this quantity as $\sum_{i=1}^{i=a} \bar{Y}_i$, summing the means of group 1 through group a. The next quantity computed is $\bar{\bar{Y}}$, the grand mean of the group means, computed as $\bar{\bar{Y}} = \sum^{a} \bar{Y} / a$. The sum of the seven means is $\sum^{a} \bar{Y} = 317.4$, and the grand mean is $\bar{\bar{Y}} = 45.34$, a fairly close approximation to the parametric mean $\mu = 45.5$. The sum of squares represents the deviations of the group means from the grand mean, $\sum^{a} (\bar{Y} - \bar{\bar{Y}})^2$. For this we first need the quantity $\sum^{a} \bar{Y}^2$, which equals 14,417.24. The customary computational formula for sum of squares applied to these means is $\sum^{a} \bar{Y}^2 - [(\sum^{a} \bar{Y})^2 / a] = 25.417$. From the sum of squares of the means we obtain a variance among the means in the conventional way as follows: $\sum^{a} (\bar{Y} - \bar{\bar{Y}})^2 / (a - 1)$. We divide by a − 1 rather than n − 1 because the sum of squares was based on a items (means). Thus, the variance of the means is $s_{\bar{Y}}^2 = 25.417/6 = 4.2362$. We learned in Chapter 6, Expression (6.1), that when we randomly sample from a single population,

$$\sigma_{\bar{Y}}^2 = \frac{\sigma^2}{n}$$

and hence

$$\sigma^2 = n \sigma_{\bar{Y}}^2$$

Thus, we can estimate a variance of items by multiplying the variance of means by the sample size on which the means are based (assuming we have sampled at random from a common population). When we do this for our present example, we obtain $s^2 = 5 \times 4.2362 = 21.181$. This is a second estimate of the parametric variance 15.21. It is not as close to the true value as the previous estimate based on the average variance within groups, but this is to be expected, since it is based on only 7 "observations." We need a name describing this variance to distinguish it from the variance of means from which it has been computed, as well as from the variance within groups with which it will be compared.
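The arithmetic behind the two estimates can be retraced from the summary figures quoted above. The following Python lines are a minimal sketch for checking the numbers, not part of the original computation scheme; the variable names are our own, and the inputs are the seven group sums of squares, the sum of the means, and the sum of the squared means as given in the text.

```python
# Summary figures quoted in the text for the housefly wing-length example.
a = 7                      # number of groups
n = 5                      # items per group
group_ss = [29.2, 12.0, 75.2, 45.2, 98.8, 81.2, 107.2]  # sum of squares per group
sum_means = 317.4          # sum of the seven group means
sum_sq_means = 14417.24    # sum of the squared group means

# Variance within groups: pooled sums of squares over pooled degrees of freedom.
var_within = sum(group_ss) / (a * (n - 1))

# Sum of squares and variance of the means (computational formula).
ss_means = sum_sq_means - sum_means ** 2 / a
var_means = ss_means / (a - 1)

# Second estimate of the variance of items: n times the variance of the means.
var_from_means = n * var_means

print(round(var_within, 3), round(ss_means, 3),
      round(var_means, 4), round(var_from_means, 3))
```

Because every group has the same size here, pooling the sums of squares gives the same result as a simple average of the seven group variances; the pooled form is used because it also covers unequal sample sizes.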
We shall call this second estimate the variance among groups; it is n times the variance of means and is an independent estimate of the parametric variance $\sigma^2$ of the housefly wing lengths. It may not be clear at this stage why the two estimates of $\sigma^2$ that we have obtained, the variance within groups and the variance among groups, are independent. We ask you to take on faith that they are.

Let us review what we have done so far by expressing it in a more formal way. Table 7.2 represents a generalized table for data such as the samples of housefly wing lengths. Each individual wing length is represented by Y, subscripted to indicate the position of the quantity in the data table. The wing length of the jth fly from the ith sample or group is given by $Y_{ij}$. Thus, you will notice that the first subscript changes with each column representing a group in the table, and the second subscript changes with each row representing an individual item.

TABLE 7.2 Data arranged for simple analysis of variance, single classification, completely randomized.

                              Groups
Items    1       2       3       ...   i       ...   a
1        Y_11    Y_21    Y_31    ...   Y_i1    ...   Y_a1
2        Y_12    Y_22    Y_32    ...   Y_i2    ...   Y_a2
3        Y_13    Y_23    Y_33    ...   Y_i3    ...   Y_a3
...
j        Y_1j    Y_2j    Y_3j    ...   Y_ij    ...   Y_aj
...
n        Y_1n    Y_2n    Y_3n    ...   Y_in    ...   Y_an
Sums     ΣY_1    ΣY_2    ΣY_3    ...   ΣY_i    ...   ΣY_a
Means    Ȳ_1     Ȳ_2     Ȳ_3     ...   Ȳ_i     ...   Ȳ_a

Using this notation, we can compute the variance of sample 1 as

$$\frac{1}{n-1} \sum_{j=1}^{j=n} (Y_{1j} - \bar{Y}_1)^2$$

The variance within groups, which is the average variance of the samples, is computed as

$$\frac{1}{a(n-1)} \sum_{i=1}^{i=a} \sum_{j=1}^{j=n} (Y_{ij} - \bar{Y}_i)^2$$

Note the double summation. It means that we start with the first group, setting i = 1 (i being the index of the outer Σ). We sum the squared deviations of all items from the mean of the first group, changing index j of the inner Σ from 1 to n in the process. We then return to the outer summation, set i = 2, and sum the squared deviations for group 2 from j = 1 to j = n. This process is continued until i, the index of the outer Σ, is set to a.
In other words, we sum all the squared deviations within one group first and add this sum to similar sums from all the other groups. The variance among groups is computed as

$$\frac{n}{a-1} \sum_{i=1}^{i=a} (\bar{Y}_i - \bar{\bar{Y}})^2$$

Now that we have two independent estimates of the population variance, what shall we do with them? We might wish to find out whether they do in fact estimate the same parameter. To test this hypothesis, we need a statistical test that will evaluate the probability that the two sample variances are from the same population. Such a test employs the F distribution, which is taken up next.

7.2 The F distribution

Let us devise yet another sampling experiment. This is quite a tedious one without the use of computers, so we will not ask you to carry it out. Assume that you are sampling at random from a normally distributed population, such as the housefly wing lengths with mean $\mu$ and variance $\sigma^2$. The sampling procedure consists of first sampling $n_1$ items and calculating their variance $s_1^2$, followed by sampling $n_2$ items and calculating their variance $s_2^2$. Sample sizes $n_1$ and $n_2$ may or may not be equal to each other, but are fixed for any one sampling experiment. Thus, for example, we might always sample 8 wing lengths for the first sample ($n_1$) and 6 wing lengths for the second sample ($n_2$). After each pair of values ($s_1^2$ and $s_2^2$) has been obtained, we calculate

$$F_s = \frac{s_1^2}{s_2^2}$$

This will be a ratio near 1, because these variances are estimates of the same quantity. Its actual value will depend on the relative magnitudes of variances $s_1^2$ and $s_2^2$. If samples of sizes $n_1$ and $n_2$ are taken repeatedly and the ratios $F_s$ of their variances are calculated, the average of these ratios will in fact approach the quantity $(n_2 - 1)/(n_2 - 3)$, which is close to 1.0 when $n_2$ is large.

The distribution of this statistic is called the F distribution, in honor of R. A. Fisher.
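With a computer the sampling experiment just described is no longer tedious. The sketch below is one possible way to carry it out in Python, under stated assumptions: the function names, the seed, and the number of trials are our own choices, and the population is taken to be normal with $\mu = 45.5$ and $\sigma^2 = 15.21$ (that is, $\sigma = 3.9$), the parametric values quoted for the housefly wing lengths.

```python
import random

def sample_variance(values):
    """Sample variance with n - 1 in the denominator."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def simulate_variance_ratios(n1, n2, trials, mu=45.5, sigma=3.9, seed=1):
    """Repeatedly draw two samples from one normal population and
    return the variance ratios F_s = s1^2 / s2^2."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        sample1 = [rng.gauss(mu, sigma) for _ in range(n1)]
        sample2 = [rng.gauss(mu, sigma) for _ in range(n2)]
        ratios.append(sample_variance(sample1) / sample_variance(sample2))
    return ratios

n1, n2 = 8, 6                      # sample sizes used as the example in the text
ratios = simulate_variance_ratios(n1, n2, trials=20000)
mean_ratio = sum(ratios) / len(ratios)
expected_mean = (n2 - 1) / (n2 - 3)  # limiting average stated in the text

print(f"average F_s = {mean_ratio:.3f}; (n2 - 1)/(n2 - 3) = {expected_mean:.3f}")
```

With $n_2$ as small as 6 the distribution of the ratios has a long right tail, so the average settles down slowly; a histogram of `ratios` would show the humped, right-skewed shape characteristic of an F distribution.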
The F distribution is another distribution described by a complicated mathematical function that need not concern us here. Unlike the t and $\chi^2$ distributions, the shape of the F distribution is determined by two values for degrees of freedom, $\nu_1$ and $\nu_2$ (corresponding to the degrees of freedom of the variance in the numerator and the variance in the denominator, respectively). Thus, for every possible combination of values $\nu_1$, $\nu_2$, each $\nu$ ranging from 1 to infinity, there exists a separate F distribution. Remember that the F distribution is a theoretical probability distribution, like the t distribution and the $\chi^2$ distribution. Variance ratios $s_1^2/s_2^2$ based on sample variances are sample statistics that may or may not follow the F distribution. We have therefore distinguished the sample variance ratio by calling it $F_s$, conforming to our convention of separate symbols for sample statistics as distinct from probability distributions (such as $t_s$ and $X^2$ contrasted with t and $\chi^2$).

We have discussed how to generate an F distribution by repeatedly taking two samples from the same normal distribution. We could also have generated it by sampling from two separate normal distributions differing in their mean but identical in their parametric variances; that is, with $\mu_1 \neq \mu_2$ but $\sigma_1^2 = \sigma_2^2$. Thus, we obtain an F distribution whether the samples come from the same normal population or from different ones, so long as their variances are identical.

Figure 7.1 shows several representative F distributions. For very low degrees of freedom the distribution is L-shaped, but it becomes humped and strongly skewed to the right as both degrees of freedom increase. Table V in Appendix A2 shows the cumulative probability distribution of F for three selected probability values.

[FIGURE 7.1 Representative F distributions.]
The values in the table represent $F_{\alpha[\nu_1,\nu_2]}$, where $\alpha$ is the proportion of the F distribution to the right of the given F value (in one tail) and $\nu_1$, $\nu_2$ are the degrees of freedom pertaining to the variances in the numerator and the denominator of the ratio, respectively. The table is arranged so that across the top one reads $\nu_1$, the degrees of freedom pertaining to the upper (numerator) variance, and along the left margin one reads $\nu_2$, the degrees of freedom pertaining to the lower (denominator) variance. At each intersection of degree of freedom values we list three values of F, in order of decreasing $\alpha$. For example, the critical value of an F distribution with $\nu_1 = 6$, $\nu_2 = 24$ is 2.51 at $\alpha = 0.05$. By that we mean that 0.05 of the area under the curve lies to the right of F = 2.51. Figure 7.2 illustrates this. Only 0.01 of the area under the curve lies to the right of F = 3.67. Thus, if we have a null hypothesis $H_0$: $\sigma_1^2 = \sigma_2^2$, with the alternative hypothesis $H_1$: $\sigma_1^2 > \sigma_2^2$, we use a one-tailed F test, as illustrated by Figure 7.2.

We can now test the two variances obtained in the sampling experiment of Section 7.1 and Table 7.1. The variance among groups based on 7 means was 21.181, and the variance within 7 groups of 5 individuals was 16.029. Our null hypothesis is that the two variances estimate the same parametric variance; the alternative hypothesis in an anova is always that the parametric variance estimated by the variance among groups is greater than that estimated by the variance within groups. The reason for this restrictive alternative hypothesis, which leads to a one-tailed test, will be explained in Section 7.4.
We calculate the variance ratio $F_s = s_1^2/s_2^2 = 21.181/16.029 = 1.32$. Before we can inspect the F table, we have to know the appropriate degrees of freedom for this variance ratio. We shall learn simple formulas for degrees of freedom in an anova later, but at the moment let us reason it out for ourselves. The upper variance (among groups) was based on the variance of 7 means; hence it should have a − 1 = 6 degrees of freedom. The lower variance was based on an average of 7 variances, each of them based on 5 individuals yielding 4 degrees of freedom per variance: a(n − 1) = 7 × 4 = 28 degrees of freedom. Thus, the upper variance has 6, the lower variance 28 degrees of freedom. If we check Table V for $\nu_1 = 6$, $\nu_2 = 24$, the closest arguments in the table, we find that $F_{0.05[6,24]} = 2.51$. For F = 1.32, corresponding to the $F_s$ value actually obtained, $\alpha$ is clearly > 0.05. Thus, we may expect more than 5% of all variance ratios of samples based on 6 and 28 degrees of freedom, respectively, to have $F_s$ values greater than 1.32. We have no evidence to reject the null hypothesis, and we conclude that the two sample variances estimate the same parametric variance. This corresponds, of course, to what we knew anyway from our sampling experiment. Since the seven samples were taken from the same population, the estimate using the variance of their means is expected to yield another estimate of the parametric variance of housefly wing length.

[FIGURE 7.2 Frequency curve of the F distribution for 6 and 24 degrees of freedom, respectively. A one-tailed ...]

Whenever the alternative hypothesis is that the two parametric variances are unequal (rather than the restrictive hypothesis $H_1$: $\sigma_1^2 > \sigma_2^2$), the sample variance $s_1^2$ can be smaller as well as greater than $s_2^2$.
This leads to a two-tailed test, and in such cases a 5% type I error means that rejection regions of 2½% will occur at each tail of the curve. In such a case it is necessary to obtain F values for $\alpha > 0.5$ (that is, in the left half of the F distribution). Since these values are rarely tabulated, they can be obtained by using the simple relationship

$$F_{\alpha[\nu_1,\nu_2]} = \frac{1}{F_{(1-\alpha)[\nu_2,\nu_1]}}$$

For example, $F_{0.05[5,24]} = 2.62$. If we wish to obtain $F_{0.95[5,24]}$ (the F value to the right of which lies 95% of the area of the F distribution with 5 and 24 degrees of freedom, respectively), we first have to find $F_{0.05[24,5]} = 4.53$. Then $F_{0.95[5,24]}$ is the reciprocal of 4.53, which equals 0.221. Thus 95% of an F distribution with 5 and 24 degrees of freedom lies to the right of 0.221.

There is an important relationship between the F distribution and the $\chi^2$ distribution. You may remember that the ratio $X^2 = \sum y^2/\sigma^2$ was distributed as a $\chi^2$ with n − 1 degrees of freedom. If you divide the numerator of this expression by n − 1, you obtain the ratio $F_s = s^2/\sigma^2$, which is a variance ratio with an expected distribution of $F_{[n-1,\infty]}$. The upper degrees of freedom are n − 1 (the degrees of freedom of the sum of squares or sample variance). The lower degrees of freedom are infinite, because only on the basis of an infinite number of items can we obtain the true, parametric variance of a population. Therefore, by dividing a value of $X^2$ by n − 1 degrees of freedom, we obtain an $F_s$ value with n − 1 and ∞ df, respectively. In general, $\chi^2_{[\nu]}/\nu = F_{[\nu,\infty]}$. We can convince ourselves of this by inspecting the F and $\chi^2$ tables. From the $\chi^2$ table (Table IV) we find that $\chi^2_{0.05[10]} = 18.307$. Dividing this value by 10 df, we obtain 1.8307, which agrees, to the precision of the F table, with $F_{0.05[10,\infty]} = 1.83$.
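Both relationships can be checked by simulation without recourse to the tables. The sketch below is illustrative only: the helper name `f_variate`, the seed, and the number of trials are assumptions of ours, and the critical values 4.53 and 18.307 are the tabled figures quoted above. It relies on the fact that a $\chi^2$ variate with $\nu$ degrees of freedom is a gamma variate with shape $\nu/2$ and scale 2.

```python
import random

def f_variate(rng, v1, v2):
    """One random F value with v1 and v2 degrees of freedom: the ratio of
    two independent chi-square variates, each divided by its df."""
    # gammavariate(k / 2, 2) draws a chi-square variate with k degrees of freedom.
    chi2_num = rng.gammavariate(v1 / 2, 2)
    chi2_den = rng.gammavariate(v2 / 2, 2)
    return (chi2_num / v1) / (chi2_den / v2)

rng = random.Random(7)
trials = 50000

# Reciprocal rule: F_0.95[5,24] = 1 / F_0.05[24,5], with F_0.05[24,5] = 4.53.
f_lower = 1 / 4.53
share_right = sum(f_variate(rng, 5, 24) > f_lower for _ in range(trials)) / trials

# Chi-square rule: chi2_0.05[10] / 10 should cut off about 5% of chi2[10] / 10.
cutoff = 18.307 / 10
share_above = sum(rng.gammavariate(10 / 2, 2) / 10 > cutoff
                  for _ in range(trials)) / trials

print(f"1/4.53 = {f_lower:.3f}; share of F[5,24] to its right = {share_right:.3f}")
print(f"share of chi2[10]/10 above {cutoff:.4f} = {share_above:.3f}")
```

The first share should come out close to 0.95 and the second close to 0.05, apart from sampling error in the simulation and rounding in the tabled values.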
The two statistics of significance are therefore closely related and, lacking a $\chi^2$ table, we could make do with an F table alone, using the values of $\nu F_{[\nu,\infty]}$ in place of $\chi^2_{[\nu]}$.

Before we return to analysis of variance, we shall first apply our newly won knowledge of the F distribution to testing a hypothesis about two sample variances.

BOX 7.1 Testing the significance of differences between two variances.

Survival in days of the cockroach Blattella vaga when kept without food or water.

Females    n_1 = 10    Ȳ_1 = 8.5 days    s_1² = 3.6
Males      n_2 = 10    Ȳ_2 = 4.8 days    s_2² = 0.9

$H_0$: $\sigma_1^2 = \sigma_2^2$        $H_1$: $\sigma_1^2 \neq \sigma_2^2$

Source: Data modified from Willis and Lewis (1957).

The alternative hypothesis is that the two variances are unequal. We have no reason to suppose that one sex should be more variable than the other. In view of the alternative hypothesis this is a two-tailed test. Since only the right tail of the F distribution is tabled extensively in Table V and in most other tables, we calculate $F_s$ as the ratio of the greater variance over the lesser one:

$$F_s = \frac{s_1^2}{s_2^2} = \frac{3.6}{0.9} = 4.00$$

Because the test is two-tailed, we look up the critical value $F_{\alpha/2[\nu_1,\nu_2]}$, where $\alpha$ is the type I error accepted and $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ are the degrees of freedom for the upper and lower variance, respectively. Whether we look up $F_{\alpha/2[\nu_1,\nu_2]}$ or $F_{\alpha/2[\nu_2,\nu_1]}$ depends on whether sample 1 or sample 2 has the greater variance and has been placed in the numerator.

From Table V we find $F_{0.025[9,9]} = 4.03$ and $F_{0.05[9,9]} = 3.18$. Because this is a two-tailed test, we double these probabilities. Thus, the F value of 4.03 represents a probability of $\alpha = 0.05$, since the right-hand tail area of $\alpha = 0.025$ is matched by a similar left-hand area to the left of $F_{0.975[9,9]} = 1/F_{0.025[9,9]} = 0.248$. Therefore, assuming the null hypothesis is true, the probability of observing an F value greater than 4.00 or smaller than 1/4.00 = 0.25 is 0.10 > P > 0.05.
Strictly speaking, the two sample variances are not significantly different; on this evidence the two sexes may be regarded as equally variable in their duration of survival. However, the outcome is close enough to the 5% significance level to make us suspicious that possibly the variances are in fact different. It would be desirable to repeat this experiment with larger sample sizes in the hope that more decisive results would emerge.

7.3 The hypothesis $H_0$: $\sigma_1^2 = \sigma_2^2$

A test of the null hypothesis that two normal populations represented by two samples have the same variance is illustrated in Box 7.1. As will be seen later, some tests leading to a decision about whether two samples come from populations with the same mean assume that the population variances are equal. However, this test is of interest in its own right. We will repeatedly have to test whether two samples have the same variance. In genetics we may need to know whether an offspring generation is more variable for a character than the parent generation. In systematics we might like to find out whether two local populations are equally variable. In experimental biology we may wish to demonstrate under which of two experimental setups the readings will be more variable. In general, the less variable setup would be preferred; if both setups were equally variable, the experimenter would pursue the one that was simpler or less costly to undertake.

7.4 Heterogeneity among sample means

We shall now modify the data of Table 7.1, discussed in Section 7.1. Suppose the seven groups of houseflies did not represent random samples from the same population but resulted from the following experiment. Each sample was reared in a separate culture jar, and the medium in each of the culture jars was prepared in a different way. Some had more water added, others more sugar, yet others more solid matter.
Let us assume that sample 7 represents the standard medium against which we propose to compare the other samples. The various changes in the medium affect the sizes of the flies that emerge from it; this in turn affects the wing lengths we have been measuring. We shall assume the following effects resulting from treatment of the medium:

Medium 1: decreases average wing length of a sample by 5 units
Medium 2: decreases average wing length of a sample by 2 units
Medium 3: does not change average wing length of a sample
Medium 4: increases average wing length of a sample by 1 unit
Medium 5: increases average wing length of a sample by 1 unit
Medium 6: increases average wing length of a sample by 5 units
Medium 7 (control): does not change average wing length of a sample

The effect of treatment i is usually symbolized as $\alpha_i$. (Please note that this use of $\alpha$ is not related to its use as a symbol for the probability of a type I error.) Thus $\alpha_i$ assumes the following values for the above treatment effects.

$\alpha_1 = -5$    $\alpha_4 = 1$
$\alpha_2 = -2$    $\alpha_5 = 1$
$\alpha_3 = 0$     $\alpha_6 = 5$
$\alpha_7 = 0$

[TABLE 7.3 Data of Table 7.1 with the treatment effects $\alpha_i$ added to each group. The body of the table is not recoverable here.]

Note that the $\alpha_i$'s have been defined so that $\sum^{a} \alpha_i = 0$; that is, the effects cancel out. This is a convenient property that is generally postulated, but it is unnecessary for our argument. We can now modify Table 7.1 by adding the appropriate values of $\alpha_i$ to each sample. In sample 1 the value of $\alpha_1$ is −5; therefore, the first wing length, which was 41 (see Table 7.1), now becomes 36; the second wing length, formerly 44, becomes 39; and so on. For the second sample $\alpha_2$ is −2, changing the first wing length from 48 to 46.
Where $\alpha_i$ is 0, the wing lengths do not change; where $\alpha_i$ is positive, they are increased by the magnitude indicated. The changed values can be inspected in Table 7.3, which is arranged identically to Table 7.1.

We now repeat our previous computations. We first calculate the sum of squares of the first sample to find it to be 29.2. If you compare this value with the sum of squares of the first sample in Table 7.1, you find the two values to be identical. Similarly, all other values of $\sum^{n} y^2$, the sum of squares of each group, are identical to their previous values. Why is this so? The effect of adding $\alpha_i$ to each group is simply that of an additive code, since $\alpha_i$ is constant for any one group. From Appendix A1.2 we can see that additive codes do not affect sums of squares or variances. Therefore, not only is each separate sum of squares the same as before, but the average variance within groups is still 16.029. Now let us compute the variance of the means. It is 100.617/6 = 16.770, which is a value much higher than the variance of means found before, 4.236. When we multiply by n = 5 to get an estimate of $\sigma^2$, we obtain the variance among groups, which now is 83.848 and is no longer even close to an estimate of $\sigma^2$. We repeat the F test with the new variances and find that $F_s = 83.848/16.029 = 5.23$, which is much greater than the closest critical value of $F_{0.05[6,24]} = 2.51$. In fact, the observed $F_s$ is greater than $F_{0.01[6,24]} = 3.67$. Clearly, the upper variance, representing the variance among groups, has become significantly larger. The two variances are most unlikely to represent the same parametric variance.

What has happened?
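The additive-code argument used above (adding a constant to every item of a group shifts its mean but leaves its sum of squares untouched) is easy to verify numerically. The Python sketch below is only an illustration under an explicit assumption: of the five wing lengths in the list, only the first two (41 and 44) and the resulting sum of squares (29.2) are quoted in the text; the remaining three values are hypothetical, chosen to be consistent with that sum of squares.

```python
def sum_of_squares(values):
    """Sum of squared deviations of the items from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

# Hypothetical group of five wing lengths. Only the first two items (41, 44)
# and the sum of squares (29.2) come from the text; the other three values
# are assumed for illustration.
group = [41, 44, 48, 43, 42]
alpha = -5                          # treatment effect assumed for medium 1
coded = [y + alpha for y in group]  # additive coding of every item

ss_before = sum_of_squares(group)
ss_after = sum_of_squares(coded)
shift_in_mean = sum(coded) / len(coded) - sum(group) / len(group)

print(round(ss_before, 1), round(ss_after, 1), round(shift_in_mean, 1))
```

The same check applies to any group and any constant: the deviations from the group mean are unchanged, so every within-group sum of squares, and hence the variance within groups, stays as it was.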
We can easily explain what has happened by means of Table 7.4, which represents Table 7.3 symbolically in the manner that Table 7.2 represented Table 7.1. We note that each group has a constant $\alpha_i$ added and that this constant changes the sums of the groups by $n\alpha_i$ and the means of these groups by $\alpha_i$. In Section 7.1 we computed the variance within groups as

$$\frac{1}{a(n-1)} \sum_{i=1}^{i=a} \sum_{j=1}^{j=n} (Y_{ij} - \bar{Y}_i)^2$$

When we try to repeat this, our formula becomes more complicated, because to each $Y_{ij}$ and each $\bar{Y}_i$ there has now been added $\alpha_i$. We therefore write

$$\frac{1}{a(n-1)} \sum_{i=1}^{i=a} \sum_{j=1}^{j=n} [(Y_{ij} + \alpha_i) - (\bar{Y}_i + \alpha_i)]^2$$

Then we open the parentheses inside the square brackets, so that the second $\alpha_i$ changes sign and the $\alpha_i$'s cancel out, leaving the expression exactly as before, substantiating our earlier observation that the variance within groups does not change despite the treatment effects.

TABLE 7.4 Data of Table 7.3 arranged in the manner of Table 7.2.

                                        Groups
Items    1            2            3            ...   i            ...   a
1        Y_11 + α_1   Y_21 + α_2   Y_31 + α_3   ...   Y_i1 + α_i   ...   Y_a1 + α_a
2        Y_12 + α_1   Y_22 + α_2   Y_32 + α_3   ...   Y_i2 + α_i   ...   Y_a2 + α_a
3        Y_13 + α_1   Y_23 + α_2   Y_33 + α_3   ...   Y_i3 + α_i   ...   Y_a3 + α_a
...
j        Y_1j + α_1   Y_2j + α_2   Y_3j + α_3   ...   Y_ij + α_i   ...   Y_aj + α_a
...
n        Y_1n + α_1   Y_2n + α_2   Y_3n + α_3   ...   Y_in + α_i   ...   Y_an + α_a
Sums     ΣY_1 + nα_1  ΣY_2 + nα_2  ΣY_3 + nα_3  ...   ΣY_i + nα_i  ...   ΣY_a + nα_a
Means    Ȳ_1 + α_1    Ȳ_2 + α_2    Ȳ_3 + α_3    ...   Ȳ_i + α_i    ...   Ȳ_a + α_a

The variance of means was previously calculated by the formula

$$\frac{1}{a-1} \sum_{i=1}^{i=a} (\bar{Y}_i - \bar{\bar{Y}})^2$$

However, from Table 7.4 we see that the new grand mean equals

$$\frac{1}{a} \sum_{i=1}^{i=a} (\bar{Y}_i + \alpha_i) = \frac{1}{a} \sum_{i=1}^{i=a} \bar{Y}_i + \frac{1}{a} \sum_{i=1}^{i=a} \alpha_i = \bar{\bar{Y}} + \bar{\alpha}$$

When we substitute the new values for the group means and the grand mean, the formula appears as

$$\frac{1}{a-1} \sum_{i=1}^{i=a} [(\bar{Y}_i + \alpha_i) - (\bar{\bar{Y}} + \bar{\alpha})]^2$$

which in turn yields

$$\frac{1}{a-1} \sum_{i=1}^{i=a} [(\bar{Y}_i - \bar{\bar{Y}}) + (\alpha_i - \bar{\alpha})]^2$$

Squaring the expression in the square brackets, we obtain the terms

$$\frac{1}{a-1} \sum_{i=1}^{i=a} (\bar{Y}_i - \bar{\bar{Y}})^2 + \frac{1}{a-1} \sum_{i=1}^{i=a} (\alpha_i - \bar{\alpha})^2 + \frac{2}{a-1} \sum_{i=1}^{i=a} (\bar{Y}_i - \bar{\bar{Y}})(\alpha_i - \bar{\alpha})$$

The first of these terms we immediately recognize as the previous variance of the means, $s_{\bar{Y}}^2$. The second is a new quantity, but is familiar by general appearance; it clearly is a variance or at least a quantity akin to a variance. The third expression is a new type; it is a so-called covariance, which we have not yet encountered. We shall not be concerned with it at this stage except to say that in cases such as the present one, where the magnitude of the treatment effects $\alpha_i$ is assumed to be independent of the $\bar{Y}_i$ to which they are added, the expected value of this quantity is zero; hence it does not contribute to the new variance of means.

The independence of the treatment effects and the sample means is an important concept that we must understand clearly. If we had not applied different treatments to the medium jars, but simply treated all jars as controls, we would still have obtained differences among the wing length means. Those are the differences found in Table 7.1 with random sampling from the same population. By chance, some of these means are greater, some are smaller. In our planning of the experiment we had no way of predicting which sample means would be small and which would be large. Therefore, in planning our treatments, we had no way of matching up a large treatment effect, such as that of medium 6, with the mean that by chance would be the greatest, as that for sample 2.
Also, the smallest sample mean (sample 4) is not associated with the smallest treatment effect. Only if the magnitude of the treatment effects were deliberately correlated with the sample means (this would be difficult to do in the experiment designed here) would the third term in the expression, the covariance, have an expected value other than zero.

The second term in the expression for the new variance of means is clearly added as a result of the treatment effects. It is analogous to a variance, but it cannot be called a variance, since it is not based on a random variable, but rather on deliberately chosen treatments largely under our control. By changing the magnitude and nature of the treatments, we can more or less alter the variancelike quantity at will. We shall therefore call it the added component due to treatment effects. Since the $\alpha_i$'s are arranged so that $\bar{\alpha} = 0$, we can rewrite the middle term as

\[ \frac{1}{a-1} \sum_{i=1}^{a} (\alpha_i - \bar{\alpha})^2 = \frac{1}{a-1} \sum_{i=1}^{a} \alpha_i^2 \]

In analysis of variance we multiply the variance of the means by $n$ in order to estimate the parametric variance of the items. As you know, we call the quantity so obtained the variance of groups. When we do this for the case in which treatment effects are present, we obtain

\[ n \left( s_{\bar{Y}}^2 + \frac{1}{a-1} \sum_{i=1}^{a} \alpha_i^2 \right) = n s_{\bar{Y}}^2 + \frac{n}{a-1} \sum_{i=1}^{a} \alpha_i^2 \]

Thus we see that the estimate of the parametric variance of the population is increased by the quantity

\[ \frac{n}{a-1} \sum_{i=1}^{a} \alpha_i^2 \]

which is $n$ times the added component due to treatment effects. We found the variance ratio $F_s$ to be significantly greater than could be reconciled with the null hypothesis. It is now obvious why this is so. We were testing the variance ratio expecting to find $F$ approximately equal to $\sigma^2/\sigma^2 = 1$. In fact, however, we have

\[ F \approx \frac{\sigma^2 + \dfrac{n}{a-1} \sum_{i=1}^{a} \alpha_i^2}{\sigma^2} \]

It is clear from this formula (deliberately displayed in this lopsided manner) that the F test is sensitive to the presence of the added component due to treatment effects.
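
The two claims of this section, that adding fixed treatment effects leaves the variance within groups unchanged and that the variance among groups grows by the added component plus a covariance term, can be checked numerically. The following is a minimal sketch, not the computation used for Table 7.3: the data are simulated from a normal population with the parametric values quoted in this chapter ($\mu = 45.5$, $\sigma^2 = 15.21$), and the list `alphas`, the seed, and the function name `mean_squares` are assumptions chosen only for illustration (the effects merely have to sum to zero).

```python
import random
from statistics import mean, variance

def mean_squares(groups):
    """Return (MS among groups, MS within groups) for equal-sized groups."""
    n = len(groups[0])
    group_means = [mean(g) for g in groups]
    ms_among = n * variance(group_means)             # n times the variance of means
    ms_within = mean([variance(g) for g in groups])  # average within-group variance
    return ms_among, ms_within

random.seed(7)                      # arbitrary seed, only for reproducibility
a, n = 7, 5                         # seven groups of five items
mu, sigma = 45.5, 15.21 ** 0.5      # parametric mean and standard deviation
alphas = [-5, -2, 0, 1, 1, 5, 0]    # assumed fixed effects; they sum to zero

control = [[random.gauss(mu, sigma) for _ in range(n)] for _ in range(a)]
treated = [[y + alpha for y in group] for group, alpha in zip(control, alphas)]

ms_among_control, ms_within_control = mean_squares(control)
ms_among_treated, ms_within_treated = mean_squares(treated)

# Rebuild the treated MS among groups from the three terms of the expansion:
# n * (variance of means + added component + covariance term).
ybar = [mean(g) for g in control]
grand = mean(ybar)
added = sum(alpha ** 2 for alpha in alphas) / (a - 1)
covariance = sum((y - grand) * alpha for y, alpha in zip(ybar, alphas)) / (a - 1)
rebuilt = ms_among_control + n * added + 2 * n * covariance

print(f"MS within: {ms_within_control:.3f} -> {ms_within_treated:.3f}")
print(f"MS among:  {ms_among_control:.3f} -> {ms_among_treated:.3f} (rebuilt {rebuilt:.3f})")
```

In any single simulated sample the covariance term is not exactly zero; only its expected value is, which is why the rebuilt figure includes it explicitly.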
At this point, you have an additional insight into the analysis of variance. It permits us to test whether there are added treatment effects, that is, whether a group of means can simply be considered random samples from the same population, or whether treatments that have affected each group separately have resulted in shifting these means so much that they can no longer be considered samples from the same population. If the latter is so, an added component due to treatment effects will be present and may be detected by an F test in the significance test of the analysis of variance. In such a study, we are generally not interested in the magnitude of $\sum \alpha_i^2/(a-1)$, but we are interested in the magnitude of the separate values of $\alpha_i$. In our example these are the effects of different formulations of the medium on wing length. If, instead of housefly wing length, we were measuring blood pressure in samples of rats and the different groups had been subjected to different drugs or different doses of the same drug, the quantities $\alpha_i$ would represent the effects of drugs on the blood pressure, which is clearly the issue of interest to the investigator. We may also be interested in studying differences of the type $\alpha_1 - \alpha_2$, leading us to the question of the significance of the differences between the effects of any two types of medium or any two drugs. But we are a little ahead of our story.

When analysis of variance involves treatment effects of the type just studied, we call it a Model I anova. Later in this chapter (Section 7.6), Model I will be defined precisely. There is another model, called a Model II anova, in which the added effects for each group are not fixed treatments but are random effects.
By this we mean that we have not deliberately planned or fixed the treatment for any one group, but that the actual effects on each group are random and only partly under our control. Suppose that the seven samples of houseflies in Table 7.3 represented the offspring of seven randomly selected females from a population reared on a uniform medium. There would be genetic differences among these females, and their seven broods would reflect this. The exact nature of these differences is unclear and unpredictable. Before actually measuring them, we have no way of knowing whether brood 1 will have longer wings than brood 2, nor have we any way of controlling this experiment so that brood 1 will in fact grow longer wings. So far as we can ascertain, the genetic factors for wing length are distributed in an unknown manner in the population of houseflies (we might hope that they are normally distributed), and our sample of seven is a random sample of these factors.

In another example for a Model II anova, suppose that instead of making up our seven cultures from a single batch of medium, we have prepared seven batches separately, one right after the other, and are now analyzing the variation among the batches. We would not be interested in the exact differences from batch to batch. Even if these were measured, we would not be in a position to interpret them. Not having deliberately varied batch 3, we have no idea why, for example, it should produce longer wings than batch 2. We would, however, be interested in the magnitude of the variance of the added effects.
Thus, if we used seven jars of medium derived from one batch, we could expect the variance of the jar means to be $\sigma^2/5$, since there were 5 flies per jar. But when based on different batches of medium, the variance could be expected to be greater, because all the imponderable accidents of formulation and environmental differences during medium preparation that make one batch of medium different from another would come into play. Interest would focus on the added variance component arising from differences among batches. Similarly, in the other example we would be interested in the added variance component arising from genetic differences among the females.

We shall now take a rapid look at the algebraic formulation of the anova in the case of Model II. In Table 7.3 the second row at the head of the data columns shows not only $\alpha_i$ but also $A_i$, which is the symbol we shall use for a random group effect. We use a capital letter to indicate that the effect is a variable. The algebra of calculating the two estimates of the population variance is the same as in Model I, except that in place of $\alpha_i$ we imagine $A_i$ substituted in Table 7.4. The estimate of the variance among means now represents the quantity

\[ \frac{1}{a-1} \sum_{i=1}^{a} (\bar{Y}_i - \bar{\bar{Y}})^2 + \frac{1}{a-1} \sum_{i=1}^{a} (A_i - \bar{A})^2 + \frac{2}{a-1} \sum_{i=1}^{a} (\bar{Y}_i - \bar{\bar{Y}})(A_i - \bar{A}) \]

The first term is the variance of means $s_{\bar{Y}}^2$, as before, and the last term is the covariance between the group means and the random effects $A_i$, the expected value of which is zero (as before), because the random effects are independent of the magnitude of the means. The middle term is a true variance, since $A_i$ is a random variable. We symbolize it by $s_A^2$ and call it the added variance component among groups.
It would represent the added variance component among females or among medium batches, depending on which of the designs discussed above we were thinking of. The existence of this added variance component is demonstrated by the F test. If the groups are random samples, we may expect $F$ to approximate $\sigma^2/\sigma^2 = 1$; but with an added variance component, the expected ratio, again displayed lopsidedly, is

\[ F \approx \frac{\sigma^2 + n\sigma_A^2}{\sigma^2} \]

Note that $\sigma_A^2$, the parametric value of $s_A^2$, is multiplied by $n$, since we have to multiply the variance of means by $n$ to obtain an independent estimate of the variance of the population. In a Model II anova we are interested not in the magnitude of any $A_i$ or in differences such as $A_1 - A_2$, but in the magnitude of $\sigma_A^2$ and its relative magnitude with respect to $\sigma^2$, which is generally expressed as the percentage $100\,s_A^2/(s^2 + s_A^2)$. Since the variance among groups estimates $\sigma^2 + n\sigma_A^2$, we can calculate $s_A^2$ as

\[ \frac{1}{n}(\text{variance among groups} - \text{variance within groups}) = \frac{1}{n}\left[ (s^2 + n s_A^2) - s^2 \right] = \frac{1}{n}(n s_A^2) = s_A^2 \]

For the present example, $s_A^2 = \frac{1}{5}(83.848 - 16.029) = 13.56$. This added variance component among groups is

\[ \frac{100 \times 13.56}{16.029 + 13.56} = \frac{1356}{29.589} = 45.83\% \]

of the sum of the variances among and within groups. Model II will be formally discussed at the end of this chapter (Section 7.7); the methods of estimating variance components are treated in detail in the next chapter.

7.5 Partitioning the total sum of squares and degrees of freedom

So far we have ignored one other variance that can be computed from the data in Table 7.1. If we remove the classification into groups, we can consider the housefly data to be a single sample of $an = 35$ wing lengths and calculate the mean and variance of these items in the conventional manner.
The various quantities necessary for this computation are shown in the last column at the right in Tables 7.1 and 7.3, headed "Computation of total sum of squares." We obtain a mean of $\bar{Y} = 45.34$ for the sample in Table 7.1, which is, of course, the same as the quantity $\bar{\bar{Y}}$ computed previously from the seven group means. The sum of squares of the 35 items is 575.886, which gives a variance of 16.938 when divided by 34 degrees of freedom. Repeating these computations for the data in Table 7.3, we obtain $\bar{Y} = 45.34$ (the same as in Table 7.1 because $\sum^a \alpha_i = 0$) and $s^2 = 27.997$, which is considerably greater than the corresponding variance from Table 7.1. The total variance computed from all $an$ items is another estimate of $\sigma^2$. It is a good estimate in the first case, but in the second sample (Table 7.3), where added components due to treatment effects or added variance components are present, it is a poor estimate of the population variance.

However, the purpose of calculating the total variance in an anova is not for using it as yet another estimate of $\sigma^2$, but for introducing an important mathematical relationship between it and the other variances. This is best seen when we arrange our results in a conventional analysis of variance table, as shown in Table 7.5.

TABLE 7.5
Anova table for data in Table 7.1.

            (1)                   (2)    (3)               (4)
            Source of variation   df     Sum of squares    Mean square
                                         SS                MS
Ȳ − Ȳ̄      Among groups           6     127.086           21.181
Y − Ȳ       Within groups         28     448.800           16.029
Y − Ȳ̄      Total                 34     575.886           16.938

Such a table is divided into four columns. The first identifies the source of variation as among groups, within groups, and total (groups amalgamated to form a single sample).
The column headed df gives the degrees of freedom by which the sums of squares pertinent to each source of variation must be divided in order to yield the corresponding variance. The degrees of freedom for variation among groups is $a - 1$, that for variation within groups is $a(n - 1)$, and that for the total variation is $an - 1$. The next two columns show sums of squares and variances, respectively. Notice that the sums of squares entered in the anova table are the sum of squares among groups, the sum of squares within groups, and the sum of squares of the total sample of $an$ items. You will note that variances are not referred to by that term in anova, but are generally called mean squares, since, in a Model I anova, they do not estimate a population variance. These quantities are not true mean squares, because the sums of squares are divided by the degrees of freedom rather than sample size. The sum of squares and mean square are frequently abbreviated SS and MS, respectively.

The sums of squares and mean squares in Table 7.5 are the same as those obtained previously, except for minute rounding errors. Note, however, an important property of the sums of squares. They have been obtained independently of each other, but when we add the SS among groups to the SS within groups we obtain the total SS. The sums of squares are additive! Another way of saying this is that we can decompose the total sum of squares into a portion due to variation among groups and another portion due to variation within groups. Observe that the degrees of freedom are also additive and that the total of 34 df can be decomposed into 6 df among groups and 28 df within groups. Thus, if we know any two of the sums of squares (and their appropriate degrees of freedom), we can compute the third and complete our analysis of variance. Note that the mean squares are not additive.
This is obvious, since generally $(a + b)/(c + d) \neq a/c + b/d$.

We shall use the computational formula for sum of squares (Expression (3.8)) to demonstrate why these sums of squares are additive. Although it is an algebraic derivation, it is placed here rather than in the Appendix because these formulas will also lead us to some common computational formulas for analysis of variance. Depending on computational equipment, the formulas we have used so far to obtain the sums of squares may not be the most rapid procedure.

The sum of squares of means in simplified notation is

\[ SS_{\text{means}} = \sum^{a} (\bar{Y} - \bar{\bar{Y}})^2 = \sum^{a} \bar{Y}^2 - \frac{1}{a} \left( \sum^{a} \bar{Y} \right)^2 \]
\[ = \sum^{a} \left( \frac{1}{n} \sum^{n} Y \right)^2 - \frac{1}{a} \left[ \sum^{a} \left( \frac{1}{n} \sum^{n} Y \right) \right]^2 \]
\[ = \frac{1}{n^2} \sum^{a} \left( \sum^{n} Y \right)^2 - \frac{1}{an^2} \left( \sum^{a} \sum^{n} Y \right)^2 \]

Note that the deviation of means from the grand mean is first rearranged to fit the computational formula (Expression (3.8)), and then each mean is written in terms of its constituent variates. Collection of denominators outside the summation signs yields the final desired form. To obtain the sum of squares of groups, we multiply $SS_{\text{means}}$ by $n$, as before. This yields

\[ SS_{\text{groups}} = n \times SS_{\text{means}} = \frac{1}{n} \sum^{a} \left( \sum^{n} Y \right)^2 - \frac{1}{an} \left( \sum^{a} \sum^{n} Y \right)^2 \]

Next we evaluate the sum of squares within groups:

\[ SS_{\text{within}} = \sum^{a} \sum^{n} (Y - \bar{Y})^2 = \sum^{a} \sum^{n} Y^2 - \frac{1}{n} \sum^{a} \left( \sum^{n} Y \right)^2 \]

The total sum of squares represents

\[ SS_{\text{total}} = \sum^{a} \sum^{n} (Y - \bar{\bar{Y}})^2 = \sum^{a} \sum^{n} Y^2 - \frac{1}{an} \left( \sum^{a} \sum^{n} Y \right)^2 \]

We now copy the formulas for these sums of squares, slightly rearranged as follows:

\[ SS_{\text{groups}} = \frac{1}{n} \sum^{a} \left( \sum^{n} Y \right)^2 - \frac{1}{an} \left( \sum^{a} \sum^{n} Y \right)^2 \]
\[ SS_{\text{within}} = -\frac{1}{n} \sum^{a} \left( \sum^{n} Y \right)^2 + \sum^{a} \sum^{n} Y^2 \]
\[ SS_{\text{total}} = \sum^{a} \sum^{n} Y^2 - \frac{1}{an} \left( \sum^{a} \sum^{n} Y \right)^2 \]

Adding the expression for $SS_{\text{groups}}$ to that for $SS_{\text{within}}$, we obtain a quantity that is identical to the one we have just developed as $SS_{\text{total}}$.
This demonstration explains why the sums of squares are additive.

We shall not go through any derivation, but simply state that the degrees of freedom pertaining to the sums of squares are also additive. The total degrees of freedom are split up into the degrees of freedom corresponding to variation among groups and those of variation of items within groups.

Before we continue, let us review the meaning of the three mean squares in the anova. The total MS is a statistic of dispersion of the 35 ($an$) items around their mean, the grand mean 45.34. It describes the variance in the entire sample due to all the sundry causes and estimates $\sigma^2$ when there are no added treatment effects or variance components among groups. The within-group MS, also known as the individual or intragroup or error mean square, gives the average dispersion of the 5 ($n$) items in each group around the group means. If the $a$ groups are random samples from a common homogeneous population, the within-group MS should estimate $\sigma^2$. The MS among groups is based on the variance of group means, which describes the dispersion of the 7 ($a$) group means around the grand mean. If the groups are random samples from a homogeneous population, the expected variance of their mean will be $\sigma^2/n$. Therefore, in order to have all three variances of the same order of magnitude, we multiply the variance of means by $n$ to obtain the variance among groups. If there are no added treatment effects or variance components, the MS among groups is an estimate of $\sigma^2$. Otherwise, it is an estimate of

\[ \sigma^2 + \frac{n}{a-1} \sum^{a} \alpha^2 \qquad \text{or} \qquad \sigma^2 + n\sigma_A^2 \]

depending on whether the anova at hand is Model I or II.

The additivity relations we have just learned are independent of the presence of added treatment or random effects.
We could show this algebraically, but it is simpler to inspect Table 7.6, which summarizes the anova of Table 7.3 in which $\alpha_i$ or $A_i$ is added to each sample. The additivity relation still holds, although the values for group SS and the total SS are different from those of Table 7.5.

TABLE 7.6
Anova table for data in Table 7.3.

            (1)                   (2)    (3)               (4)
            Source of variation   df     Sum of squares    Mean square
                                         SS                MS
Ȳ − Ȳ̄      Among groups           6     503.086           83.848
Y − Ȳ       Within groups         28     448.800           16.029
Y − Ȳ̄      Total                 34     951.886           27.997

Another way of looking at the partitioning of the variation is to study the deviation from means in a particular case. Referring to Table 7.1, we can look at the wing length of the first individual in the seventh group, which happens to be 41. Its deviation from its group mean is

\[ Y_{71} - \bar{Y}_7 = 41 - 45.4 = -4.4 \]

The deviation of the group mean from the grand mean is

\[ \bar{Y}_7 - \bar{\bar{Y}} = 45.4 - 45.34 = 0.06 \]

and the deviation of the individual wing length from the grand mean is

\[ Y_{71} - \bar{\bar{Y}} = 41 - 45.34 = -4.34 \]

Note that these deviations are additive. The deviation of the item from the group mean and that of the group mean from the grand mean add to the total deviation of the item from the grand mean. These deviations are stated algebraically as $(Y - \bar{Y}) + (\bar{Y} - \bar{\bar{Y}}) = (Y - \bar{\bar{Y}})$. Squaring and summing these deviations for $an$ items will result in

\[ \sum^{a} \sum^{n} (Y - \bar{Y})^2 + n \sum^{a} (\bar{Y} - \bar{\bar{Y}})^2 = \sum^{a} \sum^{n} (Y - \bar{\bar{Y}})^2 \]

Before squaring, the deviations were in the relationship $a + b = c$. After squaring, we would expect them to take the form $a^2 + b^2 + 2ab = c^2$. What happened to the cross-product term corresponding to $2ab$? This is

\[ 2 \sum^{a} \sum^{n} (\bar{Y} - \bar{\bar{Y}})(Y - \bar{Y}) = 2 \sum^{a} \left[ (\bar{Y} - \bar{\bar{Y}}) \sum^{n} (Y - \bar{Y}) \right] \]

a covariance-type term that is always zero, since $\sum^{n} (Y - \bar{Y}) = 0$ for each of the $a$ groups (proof in Appendix A1.1).
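
The partition just described can be verified on any equal-sized set of groups. The sketch below is illustrative only: the function name `partition` and the small data set are hypothetical (they are not the wing lengths of Table 7.1), and the last lines simply re-add the sums of squares printed in Tables 7.5 and 7.6.

```python
from statistics import mean

def partition(groups):
    """Return (SS among, SS within, cross-product term, SS total)."""
    grand = mean([y for g in groups for y in g])
    ss_among = ss_within = cross = ss_total = 0.0
    for g in groups:
        gbar = mean(g)
        for y in g:
            ss_among += (gbar - grand) ** 2            # (group mean - grand mean)^2
            ss_within += (y - gbar) ** 2               # (item - group mean)^2
            cross += 2 * (gbar - grand) * (y - gbar)   # the 2ab term
            ss_total += (y - grand) ** 2               # (item - grand mean)^2
    return ss_among, ss_within, cross, ss_total

# Hypothetical data: three groups of four items each.
groups = [[41, 44, 48, 43], [46, 49, 50, 47], [52, 45, 51, 48]]
ss_among, ss_within, cross, ss_total = partition(groups)
print(ss_among, ss_within, cross, ss_total)

# The sums of squares printed in Tables 7.5 and 7.6 obey the same relation.
table_7_5 = (127.086, 448.800, 575.886)
table_7_6 = (503.086, 448.800, 951.886)
checks = [abs(among + within - total) < 1e-9
          for among, within, total in (table_7_5, table_7_6)]
print(checks)
```

Because the cross-product term vanishes within every group separately, the check does not depend on whether treatment or random effects have been added to the groups.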
We identify the deviations represented by each level of variation at the left margins of the tables giving the analysis of variance results (Tables 7.5 and 7.6). Note that the deviations add up correctly: the deviation among groups plus the deviation within groups equals the total deviation of items in the analysis of variance, $(\bar{Y} - \bar{\bar{Y}}) + (Y - \bar{Y}) = (Y - \bar{\bar{Y}})$.

7.6 Model I anova

An important point to remember is that the basic setup of data, as well as the actual computation and significance test, in most cases is the same for both models. The purposes of analysis of variance differ for the two models. So do some of the supplementary tests and computations following the initial significance test.

Let us now try to resolve the variation found in an analysis of variance case. This will not only lead us to a more formal interpretation of anova but will also give us a deeper understanding of the nature of variation itself. For purposes of discussion, we return to the housefly wing lengths of Table 7.3. We ask the question, What makes any given housefly wing length assume the value it does? The third wing length of the first sample of flies is recorded as 43 units. How can we explain such a reading?

If we knew nothing else about this individual housefly, our best guess of its wing length would be the grand mean of the population, which we know to be $\mu = 45.5$. However, we have additional information about this fly. It is a member of group 1, which has undergone a treatment shifting the mean of the group downward by 5 units. Therefore, $\alpha_1 = -5$, and we would expect our individual $Y_{13}$ (the third individual of group 1) to measure $45.5 - 5 = 40.5$ units.
In fact, however, it is 43 units, which is 2.5 units above this latest expectation. To what can we ascribe this deviation? It is individual variation of the flies within a group because of the variance of individuals in the population ($\sigma^2 = 15.21$). All the genetic and environmental effects that make one housefly different from another housefly come into play to produce this variance.

By means of carefully designed experiments, we might learn something about the causation of this variance and attribute it to certain specific genetic or environmental factors. We might also be able to eliminate some of the variance. For instance, by using only full sibs (brothers and sisters) in any one culture jar, we would decrease the genetic variation in individuals, and undoubtedly the variance within groups would be smaller. However, it is hopeless to try to eliminate all variance completely. Even if we could remove all genetic variance, there would still be environmental variance. And even in the most improbable case in which we could remove both types of variance, measurement error would remain, so that we would never obtain exactly the same reading even on the same individual fly. The within-groups MS always remains as a residual, greater or smaller from experiment to experiment; it is part of the nature of things. This is why the within-groups variance is also called the error variance or error mean square. It is not an error in the sense of our making a mistake, but in the sense of a measure of the variation you have to contend with when trying to estimate significant differences among the groups. The error variance is composed of individual deviations for each individual, symbolized by $\epsilon_{ij}$, the random component of the $j$th individual variate in the $i$th group.
In our case, $\epsilon_{13} = 2.5$, since the actual observed value is 2.5 units above its expectation of 40.5.

We shall now state this relationship more formally. In a Model I analysis of variance we assume that the differences among group means, if any, are due to the fixed treatment effects determined by the experimenter. The purpose of the analysis of variance is to estimate the true differences among the group means. Any single variate can be decomposed as follows:

\[ Y_{ij} = \mu + \alpha_i + \epsilon_{ij} \qquad (7.2) \]

where $i = 1, \ldots, a$; $j = 1, \ldots, n$; and $\epsilon_{ij}$ represents an independent, normally distributed variable with mean $\bar{\epsilon}_{ij} = 0$ and variance $\sigma_\epsilon^2 = \sigma^2$. Therefore, a given reading is composed of the grand mean $\mu$ of the population, a fixed deviation $\alpha_i$ of the mean of group $i$ from the grand mean $\mu$, and a random deviation $\epsilon_{ij}$ of the $j$th individual of group $i$ from its expectation, which is $(\mu + \alpha_i)$. Remember that both $\alpha_i$ and $\epsilon_{ij}$ can be positive as well as negative. The expected value (mean) of the $\epsilon_{ij}$'s is zero, and their variance is the parametric variance of the population, $\sigma^2$. For all the assumptions of the analysis of variance to hold, the distribution of $\epsilon_{ij}$ must be normal.

In a Model I anova we test for differences of the type $\alpha_1 - \alpha_2$ among the group means by testing for the presence of an added component due to treatments. If we find that such a component is present, we reject the null hypothesis that the groups come from the same population and accept the alternative hypothesis that at least some of the group means are different from each other, which indicates that at least some of the $\alpha_i$'s are unequal in magnitude. Next, we generally wish to test which $\alpha_i$'s are different from each other.
This is done by significance tests, with alternative hypotheses such as $H_1\colon \alpha_1 > \alpha_2$ or $H_1\colon \frac{1}{2}(\alpha_1 + \alpha_2) > \alpha_3$. In words, these test whether the mean of group 1 is greater than the mean of group 2, or whether the mean of group 3 is smaller than the average of the means of groups 1 and 2.

Some examples of Model I analyses of variance in various biological disciplines follow. An experiment in which we try the effects of different drugs on batches of animals results in a Model I anova. We are interested in the results of the treatments and the differences between them. The treatments are fixed and determined by the experimenter. This is true also when we test the effects of different doses of a given factor: a chemical, or the amount of light to which a plant has been exposed, or temperatures at which culture bottles of insects have been reared. The treatment does not have to be entirely understood and manipulated by the experimenter. So long as it is fixed and repeatable, Model I will apply.

If we wanted to compare the birth weights of the Chinese children in the hospital in Singapore with weights of Chinese children born in a hospital in China, our analysis would also be a Model I anova. The treatment effects then would be "China versus Singapore," which sums up a whole series of different factors, genetic and environmental, some known to us but most of them not understood. However, this is a definite treatment we can describe and also repeat: we can, if we wish, again sample birth weights of infants in Singapore as well as in China.

Another example of Model I anova would be a study of body weights for animals of several age groups. The treatments would be the ages, which are fixed. If we find that there are significant differences in weight among the ages, we might proceed with the question of whether there is a difference from age 2 to age 3 or only from age 1 to age 2.

To a very large extent,
Model I anovas are the result of an experiment and of deliberate manipulation of factors by the experimenter. However, the study of differences such as the comparison of birth weights from two countries, while not an experiment proper, also falls into this category.

7.7 Model II anova

The structure of variation in a Model II anova is quite similar to that in Model I:

\[ Y_{ij} = \mu + A_i + \epsilon_{ij} \qquad (7.3) \]

where $i = 1, \ldots, a$; $j = 1, \ldots, n$; $\epsilon_{ij}$ represents an independent, normally distributed variable with mean $\bar{\epsilon}_{ij} = 0$ and variance $\sigma_\epsilon^2 = \sigma^2$; and $A_i$ represents a normally distributed variable, independent of all $\epsilon$'s, with mean $\bar{A}_i = 0$ and variance $\sigma_A^2$. The main distinction is that in place of fixed treatment effects $\alpha_i$, we now consider random effects $A_i$ that differ from group to group. Since the effects are random, it is uninteresting to estimate the magnitude of these random effects on a group, or the differences from group to group. But we can estimate their variance, the added variance component among groups $\sigma_A^2$. We test for its presence and estimate its magnitude $s_A^2$, as well as its percentage contribution to the variation in a Model II analysis of variance.

Some examples will illustrate the applications of Model II anova. Suppose we wish to determine the DNA content of rat liver cells. We take five rats and make three preparations from each of the five livers obtained. The assay readings will be for $a = 5$ groups with $n = 3$ readings per group. The five rats presumably are sampled at random from the colony available to the experimenter. They must be different in various ways, genetically and environmentally, but we have no definite information about the nature of the differences.
Thus, if we learn that rat 2 has slightly more DNA in its liver cells than rat 3, we can do little with this information, because we are unlikely to have any basis for following up this problem. We will, however, be interested in estimating the variance of the three replicates within any one liver and the variance among the five rats; that is, does variance $\sigma_A^2$ exist among rats in addition to the variance $\sigma^2$ expected on the basis of the three replicates? The variance among the three preparations presumably arises only from differences in technique and possibly from differences in DNA content in different parts of the liver (unlikely in a homogenate). Added variance among rats, if it existed, might be due to differences in ploidy or related phenomena. The relative amounts of variation among rats and "within" rats (= among preparations) would guide us in designing further studies of this sort. If there was little variance among the preparations and relatively more variation among the rats, we would need fewer preparations and more rats. On the other hand, if the variance among rats was proportionately smaller, we would use fewer rats and more preparations per rat.

In a study of the amount of variation in skin pigment in human populations, we might wish to study different families within a homogeneous ethnic or racial group and brothers and sisters within each family. The variance within families would be the error mean square, and we would test for an added variance component among families. We would expect an added variance component $\sigma_A^2$ because there are genetic differences among families that determine amount of skin pigmentation.
We would be especially interested in the relative proportions of the two variances $\sigma^2$ and $\sigma_A^2$, because they would provide us with important genetic information. From our knowledge of genetic theory, we would expect the variance among families to be greater than the variance among brothers and sisters within a family.

The above examples illustrate the two types of problems involving Model II analysis of variance that are most likely to arise in biological work. One is concerned with the general problem of the design of an experiment and the magnitude of the experimental error at different levels of replication, such as error among replicates within rat livers and among rats, error among batches, experiments, and so forth. The other relates to variation among and within families, among and within females, among and within populations, and so forth. Such problems are concerned with the general problem of the relation between genetic and phenotypic variation.

Exercises

7.1 In a study comparing the chemical composition of the urine of chimpanzees and gorillas (Gartler, Firschein, and Dobzhansky, 1956), the following results were obtained. For 37 chimpanzees the variance for the amount of glutamic acid in milligrams per milligram of creatinine was 0.01069. A similar study based on six gorillas yielded a variance of 0.12442. Is there a significant difference between the variability in chimpanzees and that in gorillas? ANS. $F_s = 11.639$, $F_{0.025[5,36]} \approx 2.90$.

7.2 The following data are from an experiment by Sewall Wright. He crossed Polish and Flemish giant rabbits and obtained 27 $F_1$ rabbits. These were inbred and 112 $F_2$ rabbits were obtained. We have extracted the following data on femur length of these rabbits.
            $n$     $\bar{Y}$     $s$
    $F_1$    27    83.39    1.65
    $F_2$   112    80.5     3.81

Is there a significantly greater amount of variability in femur lengths among the $F_2$ than among the $F_1$ rabbits? What well-known genetic phenomenon is illustrated by these data?

7.3 For the following data obtained by a physiologist, estimate $\sigma^2$ (the variance within groups), $\alpha_i$ (the fixed treatment effects), the variance among the groups, and the added component due to treatment $\sum \alpha^2/(a - 1)$, and test the hypothesis that the last quantity is zero.

    Treatment    $\bar{Y}$    $s^2$    $n$
    A           6.12    2.85    10
    B           4.34    6.70    10
    C           5.12    4.06    10
    D           7.28    2.03    10

ANS. $s^2 = 3.91$, $\hat{\alpha}_1 = 0.405$, $\hat{\alpha}_2 = -1.375$, $\hat{\alpha}_3 = -0.595$, $\hat{\alpha}_4 = 1.565$, MS among groups = 124.517, and $F_s = 31.846$ (which is significant beyond the 0.01 level).

7.4 For the data in Table 7.3, make tables to represent partitioning of the value of each variate into its three components, $\bar{\bar{Y}}$, $(\bar{Y}_i - \bar{\bar{Y}})$, $(Y_{ij} - \bar{Y}_i)$. The first table would then consist of 35 values, all equal to the grand mean. In the second table all entries in a given column would be equal to the difference between the mean of that column and the grand mean. And the last table would consist of the deviations of the individual variates from their column means. These tables represent estimates of the individual components of Expression (7.3). Compute the mean and sum of squares for each table.

7.5 A geneticist recorded the following measurements taken on two-week-old mice of a particular strain. Is there evidence that the variance among mice in different litters is larger than one would expect on the basis of the variability found within each litter?

    Litters
      1        2        3        4        5        6        7
    19.49    22.94    23.06    15.90    16.72    20.00    21.52
    20.62    22.15    20.05    21.48    19.22    19.79    20.37
    19.51    19.16    21.47    22.48    26.62    21.15    21.93
    18.09    20.98    14.90    18.79    20.74    14.88    20.14
    22.75    23.13    19.72    19.70    21.82    19.79    22.28

ANS. $s^2 = 5.987$, MS among = 4.416, $s^2_A = 0$, and $F_s = 0.7375$, which is clearly not significant at the 5% level.
7.6 Show that it is possible to represent the value of an individual variate as follows: $Y_{ij} = (\bar{\bar{Y}}) + (\bar{Y}_i - \bar{\bar{Y}}) + (Y_{ij} - \bar{Y}_i)$. What does each of the terms in parentheses estimate in a Model I anova and in a Model II anova?

CHAPTER 8
Single-Classification Analysis of Variance

We are now ready to study actual cases of analysis of variance in a variety of applications and designs. The present chapter deals with the simplest kind of anova, single-classification analysis of variance. By this we mean an analysis in which the groups (samples) are classified by only a single criterion. Either interpretation of the seven samples of housefly wing lengths studied in the last chapter, as different medium formulations (Model I) or as progenies of different females (Model II), would represent a single criterion for classification. Other examples would be different temperatures at which groups of animals were raised or different soils in which samples of plants have been grown.

We shall start in Section 8.1 by stating the basic computational formulas for analysis of variance, based on the topics covered in the previous chapter. Section 8.2 gives an example of the common case with equal sample sizes. We shall illustrate this case by means of a Model I anova. Since the basic computations for the analysis of variance are the same in either model, it is not necessary to repeat the illustration with a Model II anova. The latter model is featured in Section 8.3, which shows the minor computational complications resulting from unequal sample sizes, since all groups in the anova need not necessarily have the same sample size. Some computations unique to a Model II anova are also shown; these estimate variance components. Formulas become especially simple for the two-sample case, as explained in Section 8.4. In Model I of this case, the mathematically equivalent $t$ test can be applied as well.
When a Model I analysis of variance has been found to be significant, leading to the conclusion that the means are not from the same population, we will usually wish to test the means in a variety of ways to discover which pairs of means are different from each other and whether the means can be divided into groups that are significantly different from each other. To this end, Section 8.5 deals with so-called planned comparisons designed before the test is run; and Section 8.6, with unplanned multiple-comparison tests that suggest themselves to the experimenter as a result of the analysis.

8.1 Computational formulas

We saw in Section 7.5 that the total sum of squares and degrees of freedom can be additively partitioned into those pertaining to variation among groups and those to variation within groups. For the analysis of variance proper, we need only the sum of squares among groups and the sum of squares within groups. But when the computation is not carried out by computer, it is simpler to calculate the total sum of squares and the sum of squares among groups, leaving the sum of squares within groups to be obtained by the subtraction $SS_{total} - SS_{groups}$. However, it is a good idea to compute the individual variances so we can check for heterogeneity among them (see Section 10.1). This will also permit an independent computation of $SS_{within}$ as a check. In Section 7.5 we arrived at the following computational formulas for the total and among groups sums of squares:

$$SS_{total} = \sum^{a}\sum^{n} Y^2 - \frac{1}{an}\left(\sum^{a}\sum^{n} Y\right)^2$$

$$SS_{groups} = \frac{1}{n}\sum^{a}\left(\sum^{n} Y\right)^2 - \frac{1}{an}\left(\sum^{a}\sum^{n} Y\right)^2$$

These formulas assume equal sample size $n$ for each group and will be modified in Section 8.3 for unequal sample sizes. However, they suffice in their present form to illustrate some general points about computational procedures in analysis of variance. We note that the second, subtracted term is the same in both sums of squares.
This term can be obtained by summing all the variates in the anova (this is the grand total), squaring the sum, and dividing the result by the total number of variates. It is comparable to the second term in the computational formula for the ordinary sum of squares (Expression (3.8)). This term is often called the correction term (abbreviated CT).

The first term for the total sum of squares is simple. It is the sum of all squared variates in the anova table. Thus the total sum of squares, which describes the variation of a single unstructured sample of $an$ items, is simply the familiar sum-of-squares formula of Expression (3.8).

The first term of the sum of squares among groups is obtained by squaring the sum of the items of each group, dividing each square by its sample size, and summing the quotients from this operation for each group. Since the sample size of each group is equal in the above formulas, we can first sum all the squares of the group sums and then divide their sum by the constant $n$.

From the formula for the sum of squares among groups emerges an important computational rule of analysis of variance: To find the sum of squares among any set of groups, square the sum of each group and divide by the sample size of the group; sum the quotients of these operations and subtract from the sum a correction term. To find this correction term, sum all the items in the set, square the sum, and divide it by the number of items on which this sum is based.

8.2 Equal n

We shall illustrate a single-classification anova with equal sample sizes by a Model I example. The computation up to and including the first test of significance is identical for both models. Thus, the computation of Box 8.1 could also serve for a Model II anova with equal sample sizes.
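The computational rule stated at the end of Section 8.1 translates almost word for word into code. The following is a minimal sketch, not part of the original text: the function name `ss_among_groups` and the toy data are hypothetical and serve only to illustrate the rule.

```python
def ss_among_groups(groups):
    """Sum of squares among groups, following the rule of Section 8.1."""
    # Square the sum of each group and divide by that group's sample size.
    quotients = [sum(g) ** 2 / len(g) for g in groups]
    # Correction term: sum all items, square the sum, and divide by the
    # number of items on which the sum is based.
    grand_total = sum(sum(g) for g in groups)
    n_items = sum(len(g) for g in groups)
    correction_term = grand_total ** 2 / n_items
    # Sum the quotients and subtract the correction term.
    return sum(quotients) - correction_term


# Hypothetical toy data (not from the text): two groups of three items.
toy_groups = [[1, 2, 3], [4, 5, 6]]
ss_toy = ss_among_groups(toy_groups)
print(ss_toy)
```

Because each squared group sum is divided by its own sample size, the same function should also cover the unequal sample sizes treated in Section 8.3.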
The data are from an experiment in plant physiology. They are the lengths in coded units of pea sections grown in tissue culture with auxin present. The purpose of the experiment was to test the effects of the addition of various sugars on growth as measured by length. Four experimental groups, representing three different sugars and one mixture of sugars, were used, plus one control without sugar. Ten observations (replicates) were made for each treatment. The term "treatment" already implies a Model I anova. It is obvious that the five groups do not represent random samples from all possible experimental conditions but were deliberately designed to test the effects of certain sugars on the growth rate. We are interested in the effect of the sugars on length, and our null hypothesis will be that there is no added component due to treatment effects among the five groups; that is, the population means are all assumed to be equal.

The computation is illustrated in Box 8.1. After quantities 1 through 7 have been calculated, they are entered into an analysis-of-variance table, as shown in the box. General formulas for such a table are shown first; these are followed by a table filled in for the specific example. We note 4 degrees of freedom among groups, there being five treatments, and 45 df within groups, representing 5 times (10 - 1) degrees of freedom. We find that the mean square among groups is considerably greater than the error mean square, giving rise to a suspicion that an added component due to treatment effects is present. If the $MS_{groups}$ is equal to or less than the $MS_{within}$, we do not bother going on with the analysis, for we would not have evidence for the presence of an added component.
You may wonder how it could be possible for the $MS_{groups}$ to be less than the $MS_{within}$. You must remember that these two are independent estimates. If there is no added component due to treatment or variance component among groups, the estimate of the variance among groups is as likely to be less as it is to be greater than the variance within groups.

Expressions for the expected values of the mean squares are also shown in the first anova table of Box 8.1. They are the expressions you learned in the previous chapter for a Model I anova.

BOX 8.1
Single-classification anova with equal sample sizes.

The effect of the addition of different sugars on length, in ocular units ($\times 0.114$ = mm), of pea sections grown in tissue culture with auxin present: $n = 10$ (replications per group). This is a Model I anova.

    Treatments ($a = 5$)

    Observations,                    2% Glucose   2% Fructose   1% Glucose +         2% Sucrose
    i.e., replications    Control    added        added         1% Fructose added    added
     1                    75         57           58            58                   62
     2                    67         58           61            59                   66
     3                    70         60           56            58                   65
     4                    75         59           58            61                   63
     5                    65         62           57            57                   64
     6                    71         60           56            56                   62
     7                    67         60           61            58                   65
     8                    67         57           60            57                   65
     9                    76         59           57            57                   62
    10                    68         61           58            59                   67

    $\sum^{n} Y$              701        593          582           580                  641
    $\bar{Y}$                 70.1       59.3         58.2          58.0                 64.1

Source: Data by W. Purves.

Preliminary computations

1. Grand total $= \sum^{a}\sum^{n} Y = 701 + 593 + \cdots + 641 = 3097$

2. Sum of the squared observations $= \sum^{a}\sum^{n} Y^2 = 75^2 + 67^2 + \cdots + 68^2 + 57^2 + \cdots + 67^2 = 193{,}151$

3. Sum of the squared group totals divided by $n$ $= \frac{1}{n}\sum^{a}\left(\sum^{n} Y\right)^2 = \frac{1}{10}(701^2 + 593^2 + \cdots + 641^2) = \frac{1}{10}(1{,}929{,}055) = 192{,}905.50$

4. Grand total squared and divided by total sample size = correction term $CT = \frac{1}{an}\left(\sum^{a}\sum^{n} Y\right)^2 = \frac{(3097)^2}{5 \times 10} = \frac{9{,}591{,}409}{50} = 191{,}828.18$

5. $SS_{total} = \sum^{a}\sum^{n} Y^2 - CT$ = quantity 2 - quantity 4 = 193,151 - 191,828.18 = 1322.82

6. $SS_{groups} = \frac{1}{n}\sum^{a}\left(\sum^{n} Y\right)^2 - CT$ = quantity 3 - quantity 4 = 192,905.50 - 191,828.18 = 1077.32

7. $SS_{within} = SS_{total} - SS_{groups}$ = quantity 5 - quantity 6 = 1322.82 - 1077.32 = 245.50

The anova table is constructed as follows.

    Source of variation                     df           SS    MS                 $F_s$                       Expected MS
    $\bar{Y} - \bar{\bar{Y}}$  Among groups        $a - 1$        6     6/$(a - 1)$          $MS_{groups}/MS_{within}$     $\sigma^2 + \frac{n}{a-1}\sum^{a}\alpha^2$
    $Y - \bar{Y}$  Within groups       $a(n - 1)$     7     7/$[a(n - 1)]$                                   $\sigma^2$
    $Y - \bar{\bar{Y}}$  Total               $an - 1$       5

Substituting the computed values into the above table, we obtain the following:

    Anova table
    Source of variation                                  df    SS         MS        $F_s$
    $\bar{Y} - \bar{\bar{Y}}$  Among groups (among treatments)      4     1077.32    269.33    49.33**
    $Y - \bar{Y}$  Within groups (error, replicates)    45    245.50     5.46
    $Y - \bar{\bar{Y}}$  Total                              49    1322.82

    $F_{0.05[4,45]} = 2.58$     $F_{0.01[4,45]} = 3.77$

    * = $0.01 < P \le 0.05$.     ** = $P \le 0.01$.

These conventions will be followed throughout the text and will no longer be explained in subsequent boxes and tables.

Conclusions. There is a highly significant ($P \ll 0.01$) added component due to treatment effects in the mean square among groups (treatments). The different sugar treatments clearly have a significant effect on growth of the pea sections.

See Sections 8.5 and 8.6 for the completion of a Model I analysis of variance: that is, the method for determining which means are significantly different from each other.

It may seem that we are carrying an unnecessary number of digits in the computations in Box 8.1. This is often necessary to ensure that the error sum of squares, quantity 7, has sufficient accuracy. Since $\nu_2$ is relatively large, the critical values of $F$ have been computed by harmonic interpolation in Table V (see footnote to Table III for harmonic interpolation). The critical values have been given here only to present a complete record of the analysis. Ordinarily, when confronted with this example, you would not bother working out these values of $F$.
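The seven quantities of Box 8.1 can be retraced in a few lines of code. This is a sketch for checking the arithmetic, not part of the original box; the variable names are illustrative. The data are the pea-section lengths of Box 8.1, one list per treatment.

```python
# Pea-section lengths from Box 8.1 (ocular units), one list per treatment.
pea_lengths = {
    "control": [75, 67, 70, 75, 65, 71, 67, 67, 76, 68],
    "2% glucose": [57, 58, 60, 59, 62, 60, 60, 57, 59, 61],
    "2% fructose": [58, 61, 56, 58, 57, 56, 61, 60, 57, 58],
    "1% glucose + 1% fructose": [58, 59, 58, 61, 57, 56, 58, 57, 57, 59],
    "2% sucrose": [62, 66, 65, 63, 64, 62, 65, 65, 62, 67],
}
groups = list(pea_lengths.values())
a = len(groups)       # number of groups
n = len(groups[0])    # replicates per group (equal sample sizes)

grand_total = sum(sum(g) for g in groups)                   # quantity 1
sum_sq_obs = sum(y * y for g in groups for y in g)          # quantity 2
sq_totals_over_n = sum(sum(g) ** 2 for g in groups) / n     # quantity 3
correction_term = grand_total ** 2 / (a * n)                # quantity 4
ss_total = sum_sq_obs - correction_term                     # quantity 5
ss_groups = sq_totals_over_n - correction_term              # quantity 6
ss_within = ss_total - ss_groups                            # quantity 7

ms_groups = ss_groups / (a - 1)
ms_within = ss_within / (a * (n - 1))
f_s = ms_groups / ms_within

print(grand_total, sum_sq_obs)
print(round(ss_total, 2), round(ss_groups, 2), round(ss_within, 2))
print(round(ms_groups, 2), round(ms_within, 2), round(f_s, 2))
```

The variance ratio computed from unrounded mean squares may differ in the second decimal place from the 49.33 shown in the box, which corresponds to dividing 269.33 by the rounded error mean square 5.46.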
Comparison of the observed variance ratio $F_s = 49.33$ with $F_{0.01[4,40]} = 3.83$, the conservative critical value (the next tabled $F$ with fewer degrees of freedom), would convince you that the null hypothesis should be rejected. The probability that the five groups differ as much as they do by chance is almost infinitesimally small. Clearly, the sugars produce an added treatment effect, apparently inhibiting growth and consequently reducing the length of the pea sections.

At this stage we are not in a position to say whether each treatment is different from every other treatment, or whether the sugars are different from the control but not different from each other. Such tests are necessary to complete a Model I analysis, but we defer their discussion until Sections 8.5 and 8.6.

8.3 Unequal n

This time we shall use a Model II analysis of variance for an example. Remember that up to and including the $F$ test for significance, the computations are exactly the same whether the anova is based on Model I or Model II. We shall point out the stage in the computations at which there would be a divergence of operations depending on the model. The example is shown in Table 8.1. It concerns a series of morphological measurements of the width of the scutum (dorsal shield) of samples of tick larvae obtained from four different host individuals of the cottontail rabbit. These four hosts were obtained at random from one locality. We know nothing about their origins or their genetic constitution. They represent a random sample of the population of host individuals from the given locality. We would not be in a position to interpret differences between larvae from different hosts, since we know nothing of the origins of the individual rabbits.
Population biologists are nevertheless interested in such analyses because they provide an answer to the following question: Are the variances of means of larval characters among hosts greater than expected on the basis of variances of the characters within hosts? We can calculate the average variance of width of larval scutum on a host. This will be our "error" term in the analysis of variance. We then test the observed mean square among groups and see if it contains an added component of variance. What would such an added component of variance represent? The mean square within host individuals (that is, of larvae on any one host) represents genetic differences among larvae and differences in environmental experiences of these larvae. Added variance among hosts demonstrates significant differentiation among the larvae possibly due to differences among the hosts affecting the larvae. It also may be due to genetic differences among the larvae, should each host carry a family of ticks, or at least a population whose individuals are more related to each other than they are to tick larvae on other host individuals.

TABLE 8.1
Data and anova table for a single classification anova with unequal sample sizes. Width of scutum (dorsal shield) of larvae of the tick Haemaphysalis leporispalustris in samples from 4 cottontail rabbits. Measurements in microns. This is a Model II anova.

    Hosts ($a = 4$)
                    1            2            3            4
                   380          350          354          376
                   376          356          360          344
                   360          358          362          342
                   368          376          352          372
                   372          338          366          374
                   366          342          372          360
                   374          366          362
                   382          350          344
                                344          342
                                364          358
                                             351
                                             348
                                             348

    $\sum^{n_i} Y$         2978         3544         4619         2168
    $n_i$                 8           10           13            6
    $\sum^{n_i} Y^2$    1,108,940    1,257,272    1,642,121      784,536
    $s^2$               54.21       142.04        79.56       233.07

Source: Data by P. A. Thomas.

    Anova table
    Source of variation                                          df    SS        MS       $F_s$
    $\bar{Y} - \bar{\bar{Y}}$  Among groups (among hosts)                   3     1807.7    602.6    5.26**
    $Y - \bar{Y}$  Within groups (error; among larvae on a host)   33    3778.0    114.5
    $Y - \bar{\bar{Y}}$  Total                                      36    5585.7

    $F_{0.05[3,33]} = 2.89$     $F_{0.01[3,33]} = 4.44$

Conclusion. There is a significant ($P < 0.01$) added variance component among hosts for width of scutum in larval ticks.

The emphasis in this example is on the magnitudes of the variances. In view of the random choice of hosts this is a clear case of a Model II anova. Because this is a Model II anova, the means for each host have been omitted from Table 8.1. We are not interested in the individual means or possible differences among them. The one reason for looking at the means would arise at the beginning of the analysis, when one might wish to inspect the group means to spot outliers, which might represent readings that for a variety of reasons could be in error.

The computation follows the outline furnished in Box 8.1, except that the symbol $\sum^{n}$ now needs to be written $\sum^{n_i}$, since sample sizes differ for each group. Steps 1, 2, and 4 through 7 are carried out as before. Only step 3 needs to be modified appreciably. It is:

3. Sum of the squared group totals, each divided by its sample size, $= \sum^{a} \frac{\left(\sum^{n_i} Y\right)^2}{n_i}$

The critical 5% and 1% values of $F$ are shown below the anova table in Table 8.1 (2.89 and 4.44, respectively). You should confirm them for yourself in Table V. Note that the argument $\nu_2 = 33$ is not given. You therefore have to interpolate between arguments representing 30 and 40 degrees of freedom, respectively. The values shown were computed using harmonic interpolation. However, again, it was not necessary to carry out such an interpolation.
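For readers who do want to retrace the interpolated critical values, here is a minimal sketch. It assumes that harmonic interpolation means interpolating linearly in the reciprocal of the degrees of freedom, which is the usual meaning of the term; the text itself defers the definition to the footnote of Table III. The tabled values for 40 degrees of freedom (2.84 and 4.31) are taken from standard $F$ tables and are not quoted in the text; the values for 30 degrees of freedom are the conservative values 2.92 and 4.51 cited in this section. The function name is hypothetical.

```python
def harmonic_interpolate(df, df_lo, value_lo, df_hi, value_hi):
    """Interpolate a tabled critical value linearly in 1/df (assumed definition)."""
    # Fraction of the way from df_lo to df_hi, measured on the reciprocal scale.
    weight = (1 / df_lo - 1 / df) / (1 / df_lo - 1 / df_hi)
    return value_lo + weight * (value_hi - value_lo)


# F[3,30] from the text; F[3,40] assumed from standard F tables.
f05_33 = harmonic_interpolate(33, 30, 2.92, 40, 2.84)
f01_33 = harmonic_interpolate(33, 30, 4.51, 40, 4.31)
print(round(f05_33, 2), round(f01_33, 2))
```

Under these assumptions the sketch reproduces the two critical values printed below the anova table in Table 8.1.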
The conservative value of $F$, $F_{\alpha[3,30]}$, is 2.92 and 4.51 for $\alpha = 0.05$ and $\alpha = 0.01$, respectively. The observed value $F_s$ is 5.26, considerably above the interpolated as well as the conservative value of $F_{0.01}$. We therefore reject the null hypothesis ($H_0$: $\sigma^2_A = 0$) that there is no added variance component among groups and that the two mean squares estimate the same variance, allowing a type I error of less than 1%. We accept, instead, the alternative hypothesis of the existence of an added variance component $\sigma^2_A$.

What is the biological meaning of this conclusion? For some reason, the ticks on different host individuals differ more from each other than do individual ticks on any one host. This may be due to some modifying influence of individual hosts on the ticks (biochemical differences in blood, differences in the skin, differences in the environment of the host individual, all of them rather unlikely in this case), or it may be due to genetic differences among the ticks. Possibly the ticks on each host represent a sibship (that is, are descendants of a single pair of parents) and the differences in the ticks among host individuals represent genetic differences among families; or perhaps selection has acted differently on the tick populations on each host, or the hosts have migrated to the collection locality from different geographic areas in which the ticks differ in width of scutum. Of these various possibilities, genetic differences among sibships seem most reasonable, in view of the biology of the organism.

The computations up to this point would have been identical in a Model I anova. If this had been Model I, the conclusion would have been that there is a significant treatment effect rather than an added variance component. Now, however, we must complete the computations appropriate to a Model II anova.
These will include the estimation of the added variance component and the calculation of percentage variation at the two levels.

Since sample size $n_i$ differs among groups in this example, we cannot write $\sigma^2 + n\sigma^2_A$ for the expected $MS_{groups}$. It is obvious that no single value of $n$ would be appropriate in the formula. We therefore use an average $n$; this, however, is not simply $\bar{n}$, the arithmetic mean of the $n_i$'s, but is

$$n_0 = \frac{1}{a - 1}\left(\sum^{a} n_i - \frac{\sum^{a} n_i^2}{\sum^{a} n_i}\right) \qquad (8.1)$$

which is an average usually close to but always less than $\bar{n}$, unless sample sizes are equal, in which case $n_0 = \bar{n}$. In this example,

$$n_0 = \frac{1}{4 - 1}\left[(8 + 10 + 13 + 6) - \frac{8^2 + 10^2 + 13^2 + 6^2}{8 + 10 + 13 + 6}\right] = 9.009$$

Since the Model II expected $MS_{groups}$ is $\sigma^2 + n\sigma^2_A$ and the expected $MS_{within}$ is $\sigma^2$, it is obvious how the variance component among groups $\sigma^2_A$ and the error variance $\sigma^2$ are obtained. Of course, the values that we obtain are sample estimates and therefore are written as $s^2_A$ and $s^2$. The added variance component $s^2_A$ is estimated as $(MS_{groups} - MS_{within})/n$. Whenever sample sizes are unequal, the denominator becomes $n_0$. In this example, $(602.7 - 114.5)/9.009 = 54.190$. We are frequently not so much interested in the actual values of these variance components as in their relative magnitudes. For this purpose we sum the components and express each as a percentage of the resulting sum. Thus $s^2 + s^2_A = 114.5 + 54.190 = 168.690$, and $s^2$ and $s^2_A$ are 67.9% and 32.1% of this sum, respectively; relatively more variation occurs within groups (larvae on a host) than among groups (larvae on different hosts).
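The whole Model II computation for the tick example, from the modified step 3 through Expression (8.1) to the percentage variation at the two levels, can be sketched as follows. This is an illustrative check rather than the authors' procedure; the variable names are assumptions, and the data are the scutum widths of Table 8.1, one list per host.

```python
# Scutum widths (microns) of tick larvae from Table 8.1, one list per host.
scutum_widths = [
    [380, 376, 360, 368, 372, 366, 374, 382],
    [350, 356, 358, 376, 338, 342, 366, 350, 344, 364],
    [354, 360, 362, 352, 366, 372, 362, 344, 342, 358, 351, 348, 348],
    [376, 344, 342, 372, 374, 360],
]
a = len(scutum_widths)
sizes = [len(g) for g in scutum_widths]
total_n = sum(sizes)

grand_total = sum(sum(g) for g in scutum_widths)
correction_term = grand_total ** 2 / total_n
ss_total = sum(y * y for g in scutum_widths for y in g) - correction_term
# Modified step 3: each squared group total is divided by its own n_i.
ss_groups = sum(sum(g) ** 2 / len(g) for g in scutum_widths) - correction_term
ss_within = ss_total - ss_groups

ms_groups = ss_groups / (a - 1)
ms_within = ss_within / (total_n - a)
f_s = ms_groups / ms_within

# Expression (8.1): the average sample size n_0 for unequal n_i.
n0 = (total_n - sum(n_i ** 2 for n_i in sizes) / total_n) / (a - 1)

# Added variance component among hosts and percentage variation.
s2_within = ms_within
s2_among = (ms_groups - ms_within) / n0
pct_within = 100 * s2_within / (s2_within + s2_among)
pct_among = 100 - pct_within

print(round(ms_groups, 1), round(ms_within, 1), round(f_s, 2))
print(round(n0, 3), round(s2_among, 2))
print(round(pct_within, 1), round(pct_among, 1))
```

Working from unrounded mean squares, the estimate of the added variance component may differ slightly in the second decimal place from the value obtained in the text with rounded inputs.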
8.4 Two groups

A frequent test in statistics is to establish the significance of the difference between two means. This can easily be done by means of an analysis of variance for two groups. Box 8.2 shows this procedure for a Model I anova, the common case.

The example in Box 8.2 concerns the onset of reproductive maturity in water fleas, Daphnia longispina. This is measured as the average age (in days) at beginning of reproduction. Each variate in the table is in fact an average, and a possible flaw in the analysis might be that the averages are not based on equal sample sizes. However, we are not given this information and have to proceed on the assumption that each reading in the table is an equally reliable variate. The two series represent different genetic crosses, and the seven replicates in each series are clones derived from the same genetic cross. This example is clearly a Model I anova, since the question to be answered is whether series I differs from series II in average age at the beginning of reproduction. Inspection of the data shows that the mean age at beginning of reproduction is very similar for the two series. It would surprise us, therefore, to find that they are significantly different. However, we shall carry out a test anyway. As you realize by now, one cannot tell from the magnitude of a difference whether it is significant. This depends on the magnitude of the error mean square, representing the variance within series.

BOX 8.2
Testing the difference in means between two groups.

Average age (in days) at beginning of reproduction in Daphnia longispina (each variate is a mean based on approximately similar numbers of females). Two series derived from different genetic crosses and containing seven clones each are compared; $n = 7$ clones per series. This is a Model I anova.

    Series ($a = 2$)
                    I          II
                   7.2         8.8
                   7.1         7.5
                   9.1         7.7
                   7.2         7.6
                   7.3         7.4
                   7.2         6.7
                   7.5         7.2

    $\sum^{n} Y$          52.6        52.9
    $\bar{Y}$            7.5143      7.5571
    $\sum^{n} Y^2$       398.28      402.23
    $s^2$           0.5047      0.4095

Source: Data by Ordway, from Banta (1939).

Single classification anova with two groups with equal sample sizes

    Anova table
    Source of variation                                        df    SS         MS         $F_s$
    $\bar{Y} - \bar{\bar{Y}}$  Between groups (series)                     1     0.00643    0.00643    0.0141
    $Y - \bar{Y}$  Within groups (error; clones within series)   12    5.48571    0.45714
    $Y - \bar{\bar{Y}}$  Total                                    13    5.49214

    $F_{0.05[1,12]} = 4.75$

Conclusions. Since $F_s \ll F_{0.05[1,12]}$, the null hypothesis is accepted. The means of the two series are not significantly different; that is, the two series do not differ in average age at beginning of reproduction.

A t test of the hypothesis that two sample means come from a population with equal $\mu$; also confidence limits of the difference between two means

This test assumes that the variances in the populations from which the two samples were taken are identical. If in doubt about this hypothesis, test by method of Box 7.1, Section 7.3.

The appropriate formula for $t_s$ is one of the following:

Expression (8.2), when sample sizes are unequal and $n_1$ or $n_2$ or both sample sizes are small (< 30): $df = n_1 + n_2 - 2$

Expression (8.3), when sample sizes are identical (regardless of size): $df = 2(n - 1)$

Expression (8.4), when $n_1$ and $n_2$ are unequal but both are large (> 30): $df = n_1 + n_2 - 2$

For the present data, since sample sizes are equal, we choose Expression (8.3):

$$t_s = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{1}{n}(s_1^2 + s_2^2)}}$$

We are testing the null hypothesis that $\mu_1 - \mu_2 = 0$. Therefore we replace this quantity by zero in this example. Then

$$t_s = \frac{7.5143 - 7.5571}{\sqrt{(0.5047 + 0.4095)/7}} = \frac{-0.0428}{\sqrt{0.9142/7}} = \frac{-0.0428}{0.3614} = -0.1184$$

The degrees of freedom for this example are $2(n - 1) = 2 \times 6 = 12$. The critical value of $t_{0.05[12]} = 2.179$. Since the absolute value of our observed $t_s$ is less than the critical $t$ value, the means are found to be not significantly different, which is the same result as was obtained by the anova.

Confidence limits of the difference between two means

$$L_1 = (\bar{Y}_1 - \bar{Y}_2) - t_{\alpha[\nu]}\, s_{\bar{Y}_1 - \bar{Y}_2}$$
$$L_2 = (\bar{Y}_1 - \bar{Y}_2) + t_{\alpha[\nu]}\, s_{\bar{Y}_1 - \bar{Y}_2}$$

In this case $\bar{Y}_1 - \bar{Y}_2 = -0.0428$, $t_{0.05[12]} = 2.179$, and $s_{\bar{Y}_1 - \bar{Y}_2} = 0.3614$, as computed earlier for the denominator of the $t$ test. Therefore

$$L_1 = -0.0428 - (2.179)(0.3614) = -0.8303$$
$$L_2 = -0.0428 + (2.179)(0.3614) = 0.7447$$

The 95% confidence limits contain the zero point (no difference), as was to be expected, since the difference $\bar{Y}_1 - \bar{Y}_2$ was found to be not significant.

The computations for the analysis of variance are not shown. They would be the same as in Box 8.1. With equal sample sizes and only two groups, there is one further computational shortcut. Quantity 6, $SS_{groups}$, can be directly computed by the following simple formula:

$$SS_{groups} = \frac{\left(\sum^{n} Y_1 - \sum^{n} Y_2\right)^2}{2n} = \frac{(52.6 - 52.9)^2}{14} = 0.00643$$

There is only 1 degree of freedom between the two groups. The critical value of $F_{0.05[1,12]}$ is given underneath the anova table, but it is really not necessary to consult it. Inspection of the mean squares in the anova shows that $MS_{groups}$ is much smaller than $MS_{within}$; therefore the value of $F_s$ is far below unity, and there cannot possibly be an added component due to treatment effects between the series. In cases where $MS_{groups} \le MS_{within}$, we do not usually bother to calculate $F_s$, because the analysis of variance could not possibly be significant.

There is another method of solving a Model I two-sample analysis of variance. This is a $t$ test of the differences between two means.
This $t$ test is the traditional method of solving such a problem; it may already be familiar to you from previous acquaintance with statistical work. It has no real advantage in either ease of computation or understanding, and as you will see, it is mathematically equivalent to the anova in Box 8.2. It is presented here mainly for the sake of completeness. It would seem too much of a break with tradition not to have the $t$ test in a biostatistics text.

In Section 6.4 we learned about the $t$ distribution and saw that a $t$ distribution of $n - 1$ degrees of freedom could be obtained from a distribution of the term $(\bar{Y}_i - \mu)/s_{\bar{Y}_i}$, where $s_{\bar{Y}_i}$ has $n - 1$ degrees of freedom and $\bar{Y}$ is normally distributed. The numerator of this term represents a deviation of a sample mean from a parametric mean, and the denominator represents a standard error for such a deviation. We now learn that the expression

$$t_s = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\left[\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left(\dfrac{n_1 + n_2}{n_1 n_2}\right)}} \qquad (8.2)$$

is also distributed as $t$. Expression (8.2) looks complicated, but it really has the same structure as the simpler term for $t$. The numerator is a deviation, this time, not between a single sample mean and the parametric mean, but between a single difference between two sample means, $\bar{Y}_1$ and $\bar{Y}_2$, and the true difference between the means of the populations represented by these means. In a test of this sort our null hypothesis is that the two samples come from the same population; that is, they must have the same parametric mean. Thus, the difference $\mu_1 - \mu_2$ is assumed to be zero. We therefore test the deviation of the difference $\bar{Y}_1 - \bar{Y}_2$ from zero. The denominator of Expression (8.2) is a standard error, the standard error of the difference between two means $s_{\bar{Y}_1 - \bar{Y}_2}$. The left portion of the expression, which is in square brackets, is a weighted average of the variances of the two samples, $s_1^2$ and $s_2^2$,
computed in the manner of Section 7.1. The right term of the standard error is the computationally easier form of $(1/n_1) + (1/n_2)$, which is the factor by which the average variance within groups must be multiplied in order to convert it into a variance of the difference of means. The analogy with the multiplication of a sample variance $s^2$ by $1/n$ to transform it into a variance of a mean $s^2_{\bar{Y}}$ should be obvious.

The test as outlined here assumes equal variances in the two populations sampled. This is also an assumption of the analyses of variance carried out so far, although we have not stressed this. With only two variances, equality may be tested by the procedure in Box 7.1.

When sample sizes are equal in a two-sample test, Expression (8.2) simplifies to the expression

$$t_s = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{1}{n}(s_1^2 + s_2^2)}} \qquad (8.3)$$

which is what is applied in the present example in Box 8.2. When the sample sizes are unequal but rather large, so that the differences between $n_i$ and $n_i - 1$ are relatively trivial, Expression (8.2) reduces to the simpler form

$$t_s = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_2} + \dfrac{s_2^2}{n_1}}} \qquad (8.4)$$

The simplification of Expression (8.2) to Expressions (8.3) and (8.4) is shown in Appendix A1.3. The pertinent degrees of freedom for Expressions (8.2) and (8.4) are $n_1 + n_2 - 2$, and for Expression (8.3) $df$ is $2(n - 1)$.

The test of significance for differences between means using the $t$ test is shown in Box 8.2. This is a two-tailed test because our alternative hypothesis is $H_1$: $\mu_1 \ne \mu_2$. The results of this test are identical to those of the anova in the same box: the two means are not significantly different.
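As a numerical illustration of Expressions (8.2) and (8.3), the sketch below computes $t_s$ both ways for the Daphnia data and then the 95% confidence limits of the difference between the means. It is a sketch under stated assumptions: the function names are hypothetical, the critical value 2.179 is the tabled $t_{0.05[12]}$ quoted in Box 8.2, and the assignment of the fourteen readings to the two series is a reconstruction chosen so that the series totals (52.6 and 52.9) and sums of squares (398.28 and 402.23) match those given in the box.

```python
import math
from statistics import mean, variance

# Average age (days) at beginning of reproduction, Daphnia longispina.
# Assumption: readings are assigned to series so that the totals match Box 8.2.
series_1 = [7.2, 7.1, 9.1, 7.2, 7.3, 7.2, 7.5]
series_2 = [8.8, 7.5, 7.7, 7.6, 7.4, 6.7, 7.2]


def t_general(y1, y2):
    """Expression (8.2) under the null hypothesis mu_1 - mu_2 = 0."""
    n1, n2 = len(y1), len(y2)
    # Weighted average of the two sample variances (bracketed term).
    pooled = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
    # Standard error of the difference between the two means.
    se_diff = math.sqrt(pooled * (n1 + n2) / (n1 * n2))
    return (mean(y1) - mean(y2)) / se_diff, se_diff


def t_equal_n(y1, y2):
    """Expression (8.3); applies only when both samples have the same size n."""
    n = len(y1)
    se_diff = math.sqrt((variance(y1) + variance(y2)) / n)
    return (mean(y1) - mean(y2)) / se_diff, se_diff


t_82, se_82 = t_general(series_1, series_2)
t_83, se_83 = t_equal_n(series_1, series_2)

# 95% confidence limits of the difference, using the tabled t_0.05[12].
t_crit = 2.179
diff = mean(series_1) - mean(series_2)
lower = diff - t_crit * se_83
upper = diff + t_crit * se_83

print(round(t_82, 4), round(t_83, 4))
print(round(lower, 4), round(upper, 4))
```

With equal sample sizes the two expressions should agree to within floating-point error. Small differences from the hand computation in Box 8.2 can arise because the box rounds the difference between the means before dividing.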
We can demonstrate this mathematical equivalence by squaring the value for $t_s$. The result should be identical to the $F_s$ value of the corresponding analysis of variance. Since $t_s = -0.1184$ in Box 8.2, $t_s^2 = 0.0140$. Within rounding error, this is equal to the $F_s$ obtained in the anova ($F_s = 0.0141$). Why is this so? We learned that $t_{[\nu]} = (\bar{Y} - \mu)/s_{\bar{Y}}$, where $\nu$ is the degrees of freedom of the variance of the mean $s_{\bar{Y}}^2$; therefore $t_{[\nu]}^2 = (\bar{Y} - \mu)^2/s_{\bar{Y}}^2$. However, this expression can be regarded as a variance ratio. The denominator is clearly a variance with $\nu$ degrees of freedom. The numerator is also a variance. It is a single deviation squared, which represents a sum of squares possessing 1 rather than zero degrees of freedom (since it is a deviation from the true mean $\mu$ rather than a sample mean). A sum of squares based on 1 degree of freedom is at the same time a variance. Thus, $t^2$ is a variance ratio, since $t_{[\nu]}^2 = F_{[1,\nu]}$, as we have seen. In Appendix A1.4 we demonstrate algebraically that the $t_s^2$ and the $F_s$ value obtained in Box 8.2 are identical quantities. Since $t$ approaches the normal distribution as the degrees of freedom increase, $F_{[1,\nu]}$ approaches the distribution of the square of the normal deviate as $\nu \to \infty$.

We also know (from Section 7.2) that $\chi^2_{[\nu_1]}/\nu_1 = F_{[\nu_1,\infty]}$. Therefore, when $\nu_1 = 1$ and $\nu_2 = \infty$, $\chi^2_{[1]} = F_{[1,\infty]} = t^2_{[\infty]}$ (this can be demonstrated from Tables IV, V, and III, respectively):

$$\chi^2_{0.05[1]} = 3.841$$
$$F_{0.05[1,\infty]} = 3.84$$
$$t_{0.05[\infty]} = 1.960 \qquad t^2_{0.05[\infty]} = 3.8416$$

The t test for differences between two means is useful when we wish to set confidence limits to such a difference. Box 8.2 shows how to calculate 95% confidence limits to the difference between the series means in the Daphnia example.
The appropriate standard error and degrees of freedom depend on whether Expression (8.2), (8.3), or (8.4) is chosen for $t_s$. It does not surprise us to find that the confidence limits of the difference in this case enclose the value of zero, ranging from -0.8303 to +0.7447. This must be so when a difference is found to be not significantly different from zero. We can interpret this by saying that we cannot exclude zero as the true value of the difference between the means of the two series.

Another instance when you might prefer to compute the t test for differences between two means rather than use analysis of variance is when you are lacking the original variates and have only published means and standard errors available for the statistical test. Such an example is furnished in Exercise 8.4.

8.5 Comparisons among means: Planned comparisons

We have seen that after the initial significance test, a Model II analysis of variance is completed by estimation of the added variance components. We usually complete a Model I anova of more than two groups by examining the data in greater detail, testing which means are different from which other ones or which groups of means are different from other such groups or from single means. Let us look again at the Model I anovas treated so far in this chapter. We can dispose right away of the two-sample case in Box 8.2, the average age of water fleas at beginning of reproduction. As you will recall, there was no significant difference in age between the two genetic series. But even if there had been such a difference, no further tests are possible.
However, the data on length of pea sections given in Box 8.1 show a significant difference among the five treatments (based on 4 degrees of freedom). Although we know that the means are not all equal, we do not know which ones differ from which other ones. This leads us to the subject of tests among pairs and groups of means. Thus, for example, we might test the control against the 4 experimental treatments representing added sugars. The question to be tested would be, Does the addition of sugars have an effect on length of pea sections? We might also test for differences among the sugar treatments. A reasonable test might be pure sugars (glucose, fructose, and sucrose) versus the mixed sugar treatment (1% glucose + 1% fructose).

An important point about such tests is that they are designed and chosen independently of the results of the experiment. They should be planned before the experiment has been carried out and the results obtained. Such comparisons are called planned or a priori comparisons. Such tests are applied regardless of the results of the preliminary overall anova. By contrast, after the experiment has been carried out, we might wish to compare certain means that we notice to be markedly different. For instance, sucrose, with a mean of 64.1, appears to have had less of a growth-inhibiting effect than fructose, with a mean of 58.2. We might therefore wish to test whether there is in fact a significant difference between the effects of fructose and sucrose. Such comparisons, which suggest themselves as a result of the completed experiment, are called unplanned or a posteriori comparisons. These tests are performed only if the preliminary overall anova is significant.
They include tests of the comparisons between all possible pairs of means. When there are $a$ means, there can, of course, be $a(a - 1)/2$ possible comparisons between pairs of means. The reason we make this distinction between a priori and a posteriori comparisons is that the tests of significance appropriate for the two comparisons are different. A simple example will show why this is so.

Let us assume we have sampled from an approximately normal population of heights of men. We have computed their mean and standard deviation. If we sample two men at a time from this population, we can predict the difference between them on the basis of ordinary statistical theory. Some men will be very similar, others relatively very different. Their differences will be distributed normally with a mean of 0 and an expected variance of $2\sigma^2$, for reasons that will be learned in Section 12.2. Thus, if we obtain a large difference between two randomly sampled men, it will have to be a sufficient number of standard deviations greater than zero for us to reject our null hypothesis that the two men come from the specified population. If, on the other hand, we were to look at the heights of the men before sampling them and then take pairs of men who seemed to be very different from each other, it is obvious that we would repeatedly obtain differences within pairs of men that were several standard deviations apart. Such differences would be outliers in the expected frequency distribution of differences, and time and again we would reject our null hypothesis when in fact it was true.
The men would be sampled from the same population, but because they were not being sampled at random but being inspected before being sampled, the probability distribution on which our hypothesis testing rested would no longer be valid. It is obvious that the tails in a large sample from a normal distribution will be anywhere from 5 to 7 standard deviations apart. If we deliberately take individuals from each tail and compare them, they will appear to be highly significantly different from each other, according to the methods described in the present section, even though they belong to the same population.

When we compare means differing greatly from each other as the result of some treatment in the analysis of variance, we are doing exactly the same thing as taking the tallest and the shortest men from the frequency distribution of heights. If we wish to know whether these are significantly different from each other, we cannot use the ordinary probability distribution on which the analysis of variance rests, but we have to use special tests of significance. These unplanned tests will be discussed in the next section. The present section concerns itself with the carrying out of those comparisons planned before the execution of the experiment.

The general rule for making a planned comparison is extremely simple; it is related to the rule for obtaining the sum of squares for any set of groups (discussed at the end of Section 8.1).
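The argument about the heights of men can be illustrated by a small simulation. The Python sketch below is only an illustration under assumed settings: heights are standardized to a normal distribution with mean 0 and $\sigma = 1$, each trial inspects a hypothetical group of 50 men, and the function name, group size, number of trials, and seed are all arbitrary choices made here.

```python
import random


def rejection_rates(trials=2000, group_size=50, seed=1):
    """Compare random pairs with deliberately chosen extreme pairs."""
    rng = random.Random(seed)
    # Two-tailed 5% limit for a difference whose variance is 2 * sigma^2,
    # with sigma = 1 (1.96 standard deviations of the difference).
    critical = 1.96 * 2 ** 0.5
    random_hits = 0
    extreme_hits = 0
    for _ in range(trials):
        heights = [rng.gauss(0.0, 1.0) for _ in range(group_size)]
        # (a) Two men taken at random from the group.
        a, b = rng.sample(heights, 2)
        if abs(a - b) > critical:
            random_hits += 1
        # (b) The tallest and the shortest man, chosen after inspection.
        if max(heights) - min(heights) > critical:
            extreme_hits += 1
    return random_hits / trials, extreme_hits / trials


random_rate, extreme_rate = rejection_rates()
print("random pairs rejected: ", random_rate)
print("extreme pairs rejected:", extreme_rate)
```

Under these assumptions the random pairs should be rejected in roughly 5% of trials, whereas the tallest and shortest men are declared different in nearly every trial, although all were drawn from the same population.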
To compare $k$ groups of any size $n_i$, take the sum of each group, square it, divide the result by the sample size $n_i$, and sum the $k$ quotients so obtained. From the sum of these quotients, subtract a correction term, which you determine by taking the grand sum of all the groups in this comparison, squaring it, and dividing the result by the number of items in the grand sum. If the comparison includes all the groups in the anova, the correction term will be the main CT of the study. If, however, the comparison includes only some of the groups of the anova, the CT will be different, being restricted only to these groups.

These rules can best be learned by means of an example. Table 8.2 lists the means, group sums, and sample sizes of the experiment with the pea sections from Box 8.1. You will recall that there were highly significant differences among the groups. We now wish to test whether the mean of the control differs from that of the four treatments representing addition of sugar. There will thus be two groups, one the control group and the other the "sugars" groups, the latter with a sum of 2396 and a sample size of 40. We therefore compute

$$
\begin{aligned}
SS\,(\text{control versus sugars}) &= \frac{(701)^2}{10} + \frac{(593 + 582 + 580 + 641)^2}{40} - \frac{(701 + 593 + 582 + 580 + 641)^2}{50} \\
&= \frac{(701)^2}{10} + \frac{(2396)^2}{40} - \frac{(3097)^2}{50} = 832.32
\end{aligned}
$$

In this case the correction term is the same as for the anova, because it involves all the groups of the study.
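The rule just stated translates almost word for word into code. The following Python sketch is one possible rendering; the function name `comparison_ss` is hypothetical, and the group sums and sample sizes are those of the pea section example.

```python
def comparison_ss(group_sums, group_sizes):
    """Sum of squares for a comparison among groups of any size.

    Each group sum is squared and divided by its sample size; from the
    sum of these quotients we subtract a correction term based on the
    grand sum of only those groups that enter the comparison.
    """
    quotients = sum(s ** 2 / n for s, n in zip(group_sums, group_sizes))
    correction_term = sum(group_sums) ** 2 / sum(group_sizes)
    return quotients - correction_term


control_sum = 701
sugar_sums = [593, 582, 580, 641]  # the four sugar treatments, n = 10 each

# Two groups: the control (n = 10) and the pooled "sugars" group (n = 40).
ss_control_vs_sugars = comparison_ss([control_sum, sum(sugar_sums)], [10, 40])
print(round(ss_control_vs_sugars, 2))
```

Because the correction term is computed from the groups passed in, the same function serves for comparisons that involve only some of the groups of the anova.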
The result is a sum of squares for the comparison between these two groups. Since a comparison between two groups has only 1 degree of freedom, the sum of squares is at the same time a mean square. This mean square is tested over the error mean square of the anova to give the following comparison:

$$F_s = \frac{MS\,(\text{control versus sugars})}{MS_{\text{within}}} = \frac{832.32}{5.46} = 152.44$$

$$F_{0.05[1,45]} = 4.05, \qquad F_{0.01[1,45]} = 7.23$$

This comparison is highly significant, showing that the additions of sugars have significantly retarded the growth of the pea sections.

TABLE 8.2
Means, group sums, and sample sizes from the data in Box 8.1. Length of pea sections grown in tissue culture (in ocular units).

                                             1% glucose +
              Control   Glucose   Fructose   1% fructose   Sucrose   Sum
$\bar{Y}$       70.1      59.3      58.2        58.0         64.1    (61.94 = $\bar{\bar{Y}}$)
$\sum Y$         701       593       582         580          641    3097
$n$               10        10        10          10           10      50

Next we test whether the mixture of sugars is significantly different from the pure sugars. Using the same technique, we calculate

$$
\begin{aligned}
SS\,(\text{mixed sugars versus pure sugars}) &= \frac{(580)^2}{10} + \frac{(593 + 582 + 641)^2}{30} - \frac{(593 + 582 + 580 + 641)^2}{40} \\
&= \frac{(580)^2}{10} + \frac{(1816)^2}{30} - \frac{(2396)^2}{40} = 48.13
\end{aligned}
$$

Here the CT is different, since it is based on the sum of the sugars only. The appropriate test statistic is

$$F_s = \frac{MS\,(\text{mixed sugars versus pure sugars})}{MS_{\text{within}}} = \frac{48.13}{5.46} = 8.82$$

This is significant in view of the critical values of $F_{\alpha[1,45]}$ given in the preceding paragraph.

A final test is among the three sugars. This mean square has 2 degrees of freedom, since it is based on three means.
Thus we compute

$$SS\,(\text{among pure sugars}) = \frac{(593)^2}{10} + \frac{(582)^2}{10} + \frac{(641)^2}{10} - \frac{(1816)^2}{30} = 196.87$$

$$MS\,(\text{among pure sugars}) = \frac{SS\,(\text{among pure sugars})}{df} = \frac{196.87}{2} = 98.433$$

$$F_s = \frac{MS\,(\text{among pure sugars})}{MS_{\text{within}}} = \frac{98.433}{5.46} = 18.03$$

This $F_s$ is highly significant, since even $F_{0.01[2,40]} = 5.18$.

We conclude that the addition of the three sugars retards growth in the pea sections, that mixed sugars affect the sections differently from pure sugars, and that the pure sugars are significantly different among themselves, probably because the sucrose has a far higher mean. We cannot test the sucrose against the other two, because that would be an unplanned test, which suggests itself to us after we have looked at the results. To carry out such a test, we need the methods of the next section.

Our a priori tests might have been quite different, depending entirely on our initial hypotheses. Thus, we could have tested control versus sugars initially, followed by disaccharides (sucrose) versus monosaccharides (glucose, fructose, glucose + fructose), followed by mixed versus pure monosaccharides and finally by glucose versus fructose.

The pattern and number of planned tests are determined by one's hypotheses about the data. However, there are certain restrictions.
It would clearly be a misuse of statistical methods to decide a priori that one wished to compare every mean against every other mean ($a(a - 1)/2$ comparisons). For $a$ groups, the sum of the degrees of freedom of the separate planned tests should not exceed $a - 1$. In addition, it is desirable to structure the tests in such a way that each one tests an independent relationship among the means (as was done in the example above). For example, we would prefer not to test if means 1, 2, and 3 differed if we had already found that mean 1 differed from mean 3, since significance of the latter suggests significance of the former.

Since these tests are independent, the three sums of squares we have so far obtained, based on 1, 1, and 2 df, respectively, together add up to the sum of squares among treatments of the original analysis of variance based on 4 degrees of freedom. Thus:

                                         SS      df
SS (control versus sugars)        =    832.32     1
SS (mixed versus pure sugars)     =     48.13     1
SS (among pure sugars)            =    196.87     2
SS (among treatments)             =   1077.32     4

This again illustrates the elegance of analysis of variance. The treatment sums of squares can be decomposed into separate parts that are sums of squares in their own right, with degrees of freedom pertaining to them.
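The additivity of these independent comparisons can be checked numerically. The Python sketch below recomputes the three sums of squares from the group sums of the pea section example and compares their total with the sum of squares among all five treatments; the helper name `comparison_ss` and the dictionary layout are illustrative assumptions, and the error mean square 5.46 is the value quoted in the text.

```python
def comparison_ss(group_sums, group_sizes):
    """Sum of squared group sums over n, minus the comparison's own CT."""
    quotients = sum(s ** 2 / n for s, n in zip(group_sums, group_sizes))
    return quotients - sum(group_sums) ** 2 / sum(group_sizes)


# Group sums from the pea section example, n = 10 per treatment.
control, glucose, fructose, mixed, sucrose = 701, 593, 582, 580, 641
pure = glucose + fructose + sucrose
ms_within = 5.46  # error mean square of the anova, 45 df

# Each planned comparison: (sum of squares, degrees of freedom).
planned = {
    "control vs. sugars": (comparison_ss([control, pure + mixed], [10, 40]), 1),
    "mixed vs. pure sugars": (comparison_ss([mixed, pure], [10, 30]), 1),
    "among pure sugars": (comparison_ss([glucose, fructose, sucrose], [10] * 3), 2),
}
ss_treatments = comparison_ss([control, glucose, fructose, mixed, sucrose], [10] * 5)

for name, (ss, df) in planned.items():
    f_s = (ss / df) / ms_within
    print(f"{name:24s} SS = {ss:8.2f}  df = {df}  Fs = {f_s:7.2f}")

total_ss = sum(ss for ss, _ in planned.values())
total_df = sum(df for _, df in planned.values())
print(f"sum of parts = {total_ss:.2f} on {total_df} df; "
      f"among treatments = {ss_treatments:.2f}")
```

The parts add up to the treatment sum of squares only because the comparisons were chosen to be independent; an arbitrary set of comparisons would not, in general, show this additivity.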
One sum of squares measures the difference between the controls and the sugars, the second that between the mixed sugars and the pure sugars, and the third the remaining variation among the three sugars. We can present all of these results as an anova table, as shown in Table 8.3.

TABLE 8.3
Anova table from Box 8.1, with treatment sum of squares decomposed into planned comparisons.

Source of variation           df        SS         MS        $F_s$
Treatments                     4     1077.32     269.33     49.33**
  Control vs. sugars           1      832.32     832.32    152.44**
  Mixed vs. pure sugars        1       48.13      48.13      8.82**
  Among pure sugars            2      196.87      98.43     18.03**
Within                        45      245.50       5.46
Total                         49     1322.82

When the planned comparisons are not independent, and when the number of comparisons planned is less than the total number of comparisons possible between all pairs of means, which is $a(a - 1)/2$, we carry out the tests as just shown but we adjust the critical values of the type I error $\alpha$. In comparisons that are not independent, if the outcome of a single comparison is significant, the outcomes of subsequent comparisons are more likely to be significant as well, so that decisions based on conventional levels of significance might be in doubt. For this reason, we employ a conservative approach, lowering the type I error of the statistic of significance for each comparison so that the probability of making any type I error at all in the entire series of tests does not exceed a predetermined value $\alpha$. This value is called the experimentwise error rate.
Assuming that the investigator plans a number of comparisons, adding up to $k$ degrees of freedom, the appropriate critical values will be obtained if the probability $\alpha'$ is used for any one comparison, where

$$\alpha' = \frac{\alpha}{k}$$

The approach using this relation is called the Bonferroni method; it assures us of an experimentwise error rate $\le \alpha$.

Applying this approach to the pea section data, as discussed above, let us assume that the investigator has good reason to test the following comparisons between and among treatments, given here in abbreviated form: (C) versus (G, F, S, G + F); (G, F, S) versus (G + F); and (G) versus (F) versus (S); as well as (G, F) versus (G + F). The 5 degrees of freedom in these tests require that each individual test be adjusted to a significance level of

$$\alpha' = \frac{\alpha}{k} = \frac{0.05}{5} = 0.01$$

for an experimentwise critical $\alpha = 0.05$. Thus, the critical value for the $F_s$ ratios of these comparisons is $F_{0.01[1,45]}$ or $F_{0.01[2,45]}$, as appropriate. The first three tests are carried out as shown above. The last test is computed in a similar manner:

$$
\begin{aligned}
SS\,(\text{average of glucose and fructose vs. glucose and fructose mixed}) &= \frac{(593 + 582)^2}{20} + \frac{(580)^2}{10} - \frac{(593 + 582 + 580)^2}{30} \\
&= \frac{(1175)^2}{20} + \frac{(580)^2}{10} - \frac{(1755)^2}{30} = 3.75
\end{aligned}
$$

In spite of the change in critical value, the conclusions concerning the first three tests are unchanged. The last test, the average of glucose and fructose versus a mixture of the two, is not significant, since $F_s = 3.75/5.46 = 0.687$. Adjusting the critical value is a conservative procedure: individual comparisons using this approach are less likely to be significant.
The Bonferroni method generally will not employ the standard, tabled arguments of $\alpha$ for the F distribution. Thus, if we were to plan tests involving altogether 6 degrees of freedom, the value of $\alpha'$ would be 0.0083. Exact tables for Bonferroni critical values are available for the special case of single degree of freedom tests. Alternatively, we can compute the desired critical value by means of a computer program. A conservative alternative is to use the next smaller tabled value of $\alpha$. For details, consult Sokal and Rohlf (1981), Section 9.6.

The Bonferroni method (or a more recent refinement, the Dunn-Šidák method) should also be employed when you are reporting confidence limits for more than one group mean resulting from an analysis of variance. Thus, if you wanted to publish the means and $1 - \alpha$ confidence limits of all five treatments in the pea section example, you would not set confidence limits to each mean as though it were an independent sample, but you would employ $t_{\alpha'[\nu]}$, where $\nu$ is the degrees of freedom of the entire study and $\alpha'$ is the adjusted type I error explained earlier. Details of such a procedure can be learned in Sokal and Rohlf (1981), Section 14.10.
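The adjusted significance levels themselves are easy to compute. The Python sketch below evaluates the Bonferroni relation $\alpha' = \alpha/k$ for the cases of 5 and 6 degrees of freedom mentioned in this section. For comparison it also evaluates the Dunn-Šidák adjustment in its usual form, $\alpha' = 1 - (1 - \alpha)^{1/k}$; that formula is not given in this chapter and is included here as an assumption about which form is intended. The function names are illustrative.

```python
def bonferroni_alpha(alpha, k):
    """Per-comparison type I error for k degrees of freedom in all."""
    return alpha / k


def dunn_sidak_alpha(alpha, k):
    """Dunn-Sidak adjustment in its usual form (assumed, see lead-in)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / k)


experimentwise_alpha = 0.05
for k in (5, 6):
    print(k,
          round(bonferroni_alpha(experimentwise_alpha, k), 4),
          round(dunn_sidak_alpha(experimentwise_alpha, k), 4))
```

The Dunn-Šidák value is always slightly larger than the Bonferroni value for $k > 1$, which is why it is described as a refinement: it is a little less conservative while still holding the experimentwise error rate at $\alpha$ for independent tests.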
8.6 Comparisons among means: Unplanned comparisons

A single-classification anova is said to be significant if

$$\frac{MS_{\text{groups}}}{MS_{\text{within}}} > F_{\alpha[a-1,\,a(n-1)]} \tag{8.5}$$

Since $MS_{\text{groups}}/MS_{\text{within}} = SS_{\text{groups}}/[(a - 1)\,MS_{\text{within}}]$, we can rewrite Expression (8.5) as

$$SS_{\text{groups}} > (a - 1)\,MS_{\text{within}}\,F_{\alpha[a-1,\,a(n-1)]} \tag{8.6}$$

For example, in Box 8.1, where the anova is significant, $SS_{\text{groups}} = 1077.32$. Substituting into Expression (8.6), we obtain

$$1077.32 > (5 - 1)(5.46)(2.58) = 56.35 \qquad \text{for } \alpha = 0.05$$

It is therefore possible to compute a critical SS value for a test of significance of an anova. Thus, another way of calculating overall significance would be to see whether the $SS_{\text{groups}}$ is greater than this critical SS. It is of interest to investigate why the $SS_{\text{groups}}$ is as large as it is and to test for the significance of the various contributions made to this SS by differences among the sample means. This was discussed in the previous section, where separate sums of squares were computed based on comparisons among means planned before the data were examined. A comparison was called significant if its $F_s$ ratio was $> F_{\alpha[k-1,\,a(n-1)]}$, where $k$ is the number of means being compared. We can now also state this in terms of sums of squares: An SS is significant if it is greater than $(k - 1)\,MS_{\text{within}}\,F_{\alpha[k-1,\,a(n-1)]}$.

The above tests were a priori comparisons.
One procedure for testing a posteriori comparisons would be to set $k = a$ in this last formula, no matter how many means we compare; thus the critical value of the SS will be larger than in the previous method, making it more difficult to demonstrate the significance of a sample SS. Setting $k = a$ allows for the fact that we choose for testing those differences between group means that appear to be contributing substantially to the significance of the overall anova.

For an example, let us return to the effects of sugars on growth in pea sections (Box 8.1). We write down the means in ascending order of magnitude: 58.0 (glucose + fructose), 58.2 (fructose), 59.3 (glucose), 64.1 (sucrose), 70.1 (control). We notice that the first three treatments have quite similar means and suspect that they do not differ significantly among themselves and hence do not contribute substantially to the significance of the $SS_{\text{groups}}$.

To test this, we compute the SS among these three means by the usual formula:

$$SS = \frac{(593)^2 + (582)^2 + (580)^2}{10} - \frac{(593 + 582 + 580)^2}{30} = 102{,}677.3 - 102{,}667.5 = 9.8$$

The differences among these means are not significant, because this SS is less than the critical SS (56.35) calculated above.

The sucrose mean looks suspiciously different from the means of the other sugars. To test this we compute

$$SS = \frac{(641)^2}{10} + \frac{(593 + 582 + 580)^2}{30} - \frac{(641 + 593 + 582 + 580)^2}{10 + 30} = 41{,}088.1 + 102{,}667.5 - 143{,}520.4 = 235.2$$

which is greater than the critical SS. We conclude, therefore, that sucrose retards growth significantly less than the other sugars tested.
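The two unplanned tests just carried out can be reproduced with a few lines of Python. This is a sketch under the values quoted in the text: $MS_{\text{within}} = 5.46$, the tabled critical value 2.58, and the group sums of Table 8.2. The function and variable names are hypothetical.

```python
def sum_of_squares(group_sums, group_sizes):
    """SS among the groups supplied, with their own correction term."""
    quotients = sum(s ** 2 / n for s, n in zip(group_sums, group_sizes))
    return quotients - sum(group_sums) ** 2 / sum(group_sizes)


a = 5                # number of groups in the anova
ms_within = 5.46     # error mean square, 45 df
f_critical = 2.58    # tabled critical F for alpha = 0.05, as used in the text

# Critical SS with k set equal to a, whatever the number of means compared.
critical_ss = (a - 1) * ms_within * f_critical

# SS among the three lowest means: glucose, fructose, glucose + fructose.
ss_low_three = sum_of_squares([593, 582, 580], [10, 10, 10])

# SS for sucrose against the other three sugar treatments pooled.
ss_sucrose_vs_rest = sum_of_squares([641, 593 + 582 + 580], [10, 30])

for label, ss in [("three lowest means", ss_low_three),
                  ("sucrose vs. other sugars", ss_sucrose_vs_rest)]:
    verdict = "significant" if ss > critical_ss else "not significant"
    print(f"{label:26s} SS = {ss:7.1f}  critical SS = {critical_ss:.2f}  {verdict}")
```

Note that the same critical SS is used for both tests, regardless of how many means enter each comparison; this is what distinguishes the unplanned procedure from the planned comparisons of the previous section.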
We may continue in this fashion, testing all the differences that look suspicious or even testing all possible sets of means, considering them 2, 3, 4, and 5 at a time. This latter approach may require a computer if there are more than 5 means to be compared, since there are very many possible tests that could be made. This procedure was proposed by Gabriel (1964), who called it a sum of squares simultaneous test procedure (SS-STP).

In the SS-STP and in the original anova, the chance of making any type I error at all is $\alpha$, the probability selected for the critical F value from Table V. By "making any type I error at all" we mean making such an error in the overall test of significance of the anova and in any of the subsidiary comparisons among means or sets of means needed to complete the analysis of the experiment. This probability $\alpha$ therefore is an experimentwise error rate. Note that though the probability of any error at all is $\alpha$, the probability of error for any particular test of some subset, such as a test of the difference among three or between two means, will always be less than $\alpha$. Thus, for the test of each subset one is really using a significance level $\alpha'$, which may be much less than the experimentwise $\alpha$, and if there are many means in the anova, this actual error rate $\alpha'$ may be one-tenth, one one-hundredth, or even one one-thousandth of the experimentwise $\alpha$ (Gabriel, 1964). For this reason, the unplanned tests discussed above and the overall anova are not very sensitive to differences between individual means or differences within small subsets.
Obviously, not many differences are going to be considered significant if $\alpha'$ is minute. This is the price we pay for not planning our comparisons before we examine the data: if we were to make planned tests, the error rate of each would be greater, hence less conservative.

The SS-STP procedure is only one of numerous techniques for multiple unplanned comparisons. It is the most conservative, since it allows a large number of possible comparisons. Differences shown to be significant by this method can be reliably reported as significant differences. However, more sensitive and powerful comparisons exist when the number of possible comparisons is circumscribed by the user. This is a complex subject, to which a more complete introduction is given in Sokal and Rohlf (1981), Section 9.7.

Exercises

8.1 The following is an example with easy numbers to help you become familiar with the analysis of variance. A plant ecologist wishes to test the hypothesis that the height of plant species X depends on the type of soil it grows in. He has measured the height of three plants in each of four plots representing different soil types, all four plots being contained in an area of two miles square. His results are tabulated below. (Height is given in centimeters.) Does your analysis support this hypothesis? ANS. Yes, since $F_s = 6.951$ is larger than $F_{0.05[3,8]} = 4.07$.

Observation           Localities
number            1     2     3     4
    1            15    25    17    10
    2             9    21    23    13
    3            14    19    20    16

8.2 The following are measurements (in coded micrometer units) of the thorax length of the aphid Pemphigus populitransversus.
The aphids were collected in 28 galls on the cottonwood Populus deltoides. Four alate (winged) aphids were randomly selected from each gall and measured. The alate aphids of each gall are isogenic (identical twins), being descended parthenogenetically from one stem mother. Thus, any variance within galls can be due to environment only. Variance between galls may be due to differences in genotype and also to environmental differences between galls. If this character, thorax length, is affected by genetic variation, significant intergall variance must be present. The converse is not necessarily true: significant variance between galls need not indicate genetic variation; it could as well be due to environmental differences between galls (data by Sokal, 1952). Analyze the variance of thorax length. Is there significant intergall variance present? Give estimates of the added component of intergall variance, if present. What percentage of the variance is controlled by intragall and what percentage by intergall factors? Discuss your results.

Gall no.                           Gall no.
   1.   6.1, 6.0, 5.7, 6.0           15.   6.3, 6.5, 6.1, 6.3
   2.   6.2, 5.1, 6.1, 5.3           16.   5.9, 6.1, 6.1, 6.0
   3.   6.2, 6.2, 5.3, 6.3           17.   5.8, 6.0, 5.9, 5.7
   4.   5.1, 6.0, 5.8, 5.9           18.   6.5, 6.3, 6.5, 7.0
   5.   4.4, 4.9, 4.7, 4.8           19.   5.9, 5.2, 5.7, 5.7
   6.   5.7, 5.1, 5.8, 5.5           20.   5.2, 5.3, 5.4, 5.3
   7.   6.3, 6.6, 6.4, 6.3           21.   5.4, 5.5, 5.2, 6.3
   8.   4.5, 4.5, 4.0, 3.7           22.   4.3, 4.7, 4.5, 4.4
   9.   6.3, 6.2, 5.9, 6.2           23.   6.0, 5.8, 5.7, 5.9
  10.   5.4, 5.3, 5.0, 5.3           24.   5.5, 6.1, 5.5, 6.1
  11.   5.9, 5.8, 6.3, 5.7           25.   4.0, 4.2, 4.3, 4.4
  12.   5.9, 5.9, 5.5, 5.5           26.   5.8, 5.6, 5.6, 6.1
  13.   5.8, 5.9, 5.4, 5.5           27.   4.3, 4.0, 4.4, 4.6
  14.   5.6, 6.4, 6.4, 6.1           28.   6.1, 6.0, 5.6, 6.5

8.3 Millis and Seng (1954) published a study on the relation of birth order to the birth weights of infants.
The data below on first-born and eighth-born infants are extracted from a table of birth weights of male infants of Chinese third-class patients at the Kandang Kerbau Maternity Hospital in Singapore in 1950 and 1951.

Birth weight classes (lb:oz): 3:0-3:7, 3:8-3:15, 4:0-4:7, 4:8-4:15, 5:0-5:7, 5:8-5:15, 6:0-6:7, 6:8-6:15, 7:0-7:7, 7:8-7:15, 8:0-8:7, 8:8-8:15, 9:0-9:7, 9:8-9:15, 10:0-10:7, 10:8-10:15

Frequencies in ascending order of weight class, empty classes omitted:
Birth order 1:   2  3  7  111  267  457  485  363  162  64  6  5     (total 1932)
Birth order 8:   4  5  19  52  55  61  48  39  19  4  1              (total 307)

Which birth order appears to be accompanied by heavier infants? Is this difference significant? Can you conclude that birth order causes differences in birth weight? (Computational note: The variable should be coded as simply as possible.) Reanalyze, using the t test, and verify that $t_s^2 = F_s$. ANS. $t_s = 11.016$ and $F_s = 121.352$.

8.4 The following cytochrome oxidase assessments of male Periplaneta roaches in cubic millimeters per ten minutes per milligram were taken from a larger study.

                                            $n$    $\bar{Y}$    $s_{\bar{Y}}$
24 hours after methoxychlor injection        5      24.8        0.9
Control                                      3      19.7        1.4

Are the two means significantly different?

8.5 P. E. Hunter (1959, detailed data unpublished) selected two strains of D. melanogaster, one for short larval period (SL) and one for long larval period (LL). A nonselected control strain (CS) was also maintained. At generation 42 these data were obtained for the larval period (measured in hours). Analyze and interpret.

                    Strain
              SL       CS       LL
$n_i$         80       69       33
$\sum Y$    8070     7291     3640

$\sum\sum Y^2 = 1{,}994{,}650$

Note that part of the computation has already been performed for you. Perform unplanned tests among the three means (short vs. long larval periods and each against the control). Set 95% confidence limits to the observed differences of means for which these comparisons are made. ANS. $MS_{(\text{SL vs. LL})} = 2076.6697$.

8.6 These data are measurements of five random samples of domestic pigeons collected during January, February, and March in Chicago in 1955.
The variable is the length from the anterior end of the narial opening to the tip of the bony beak and is recorded in millimeters. Data from Olson and Miller (1958).

Samples
1:   5.4  5.3  5.2  4.5  5.0  5.4  3.8  5.9  5.4  5.1  5.4  4.1  5.2  4.8  4.6  5.7  5.9  5.8  5.0  5.0
2:   5.2  5.1  4.7  5.0  5.9  5.3  6.0  5.2  6.6  5.6  5.1  5.7  5.1  4.7  6.5  5.1  5.4  5.8  5.8  5.9
3:   5.5  4.8  5.9  4.8  6.4  5.1  5.3  5.3  4.9  4.8  5.1  5.4  5.2  4.8  4.4  4.8  6.0  5.7  5.8  5.5
4:   4.7  4.9  5.2  4.9  5.1  4.5  4.8  5.4  4.7  5.0  4.6  5.5  5.0  5.1  6.5  4.9  4.8  5.5  5.6  5.0
5:   5.1  5.5  5.9  6.1  5.2  5.0  5.9  5.0  4.9  5.3  5.3  5.1  4.9  5.8  5.0  5.6  6.1  5.1  4.8  4.9

8.7  The following data were taken from a study of blood protein variations in deer (Cowan and Johnston, 1962). The variable is the mobility of serum protein fraction II expressed as $10^{-5}$ cm$^2$/volt-seconds.

                              $\bar{Y}$     $s_{\bar{Y}}$
Sitka                         2.8           0.07
California blacktail          2.5           0.05
Vancouver Island blacktail    2.9           0.05
Mule deer                     2.5           0.05
Whitetail                     2.8           0.07

n = 12 for each mean. Perform an analysis of variance and a multiple-comparison test, using the sums of squares STP procedure. ANS. $MS_{within} = 0.0416$; maximal nonsignificant sets (at P = 0.05) are samples 1, 3, 5 and 2, 4 (numbered in the order given).

8.8  For the data from Exercise 7.3 use the Bonferroni method to test for differences between the following 5 pairs of treatment means:
A, B
A, C
A, D
A, (B + C + D)/3
B, (C + D)/2

CHAPTER 9
Two-Way Analysis of Variance

From the single-classification anova of Chapter 8 we progress to the two-way anova of the present chapter by a single logical step. Individual items may be grouped into classes representing the different possible combinations of two treatments or factors.
Thus, the housefly wing lengths studied in earlier chapters, which yielded samples representing different medium formulations, might also be divided into males and females. Suppose we wanted to know not only whether medium 1 induced a different wing length than medium 2 but also whether male houseflies differed in wing length from females. Obviously, each combination of factors should be represented by a sample of flies. Thus, for seven media and two sexes we need at least 7 × 2 = 14 samples. Similarly, the experiment testing five sugar treatments on pea sections (Box 8.1) might have been carried out at three different temperatures. This would have resulted in a two-way analysis of variance of the effects of sugars as well as of temperatures.

It is the assumption of this two-way method of anova that a given temperature and a given sugar each contribute a certain amount to the growth of a pea section, and that these two contributions add their effects without influencing each other. In Section 9.1 we shall see how departures from the assumption are measured; we shall also consider the expression for decomposing variates in a two-way anova.

The two factors in the present design may represent either Model I or Model II effects or one of each, in which case we talk of a mixed model.
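The additivity assumption can be made concrete with a small sketch. The grand mean, the level names, and the effect sizes below are hypothetical values chosen only for illustration; they are not taken from the pea-section experiment.

```python
# Hypothetical additive contributions (illustrative values, not from the text).
grand_mean = 60.0
sugar_effect = {"glucose": -2.0, "fructose": -1.0, "sucrose": 3.0}
temperature_effect = {"low": -4.0, "medium": 0.0, "high": 4.0}


def additive_cell_mean(sugar, temperature):
    """Expected cell mean when the two factors add without interaction."""
    return grand_mean + sugar_effect[sugar] + temperature_effect[temperature]


# Under additivity, the difference between two sugars is the same
# at every temperature.
differences = [
    additive_cell_mean("sucrose", t) - additive_cell_mean("glucose", t)
    for t in temperature_effect
]
print(differences)
```

If the printed differences varied from one temperature to the next, the two factors would not be acting additively, which is the situation the chapter goes on to call interaction.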
The computation of a two-way anova for replicated subclasses (more than one variate per subclass or factor combination) is shown in Section 9.1, which also contains a discussion of the meaning of interaction as used in statistics. Significance testing in a two-way anova is the subject of Section 9.2. This is followed by Section 9.3, on two-way anova without replication, or with only a single variate per subclass. The well-known method of paired comparisons is a special case of a two-way anova without replication.

We will now proceed to illustrate the computation of a two-way anova. You will obtain closer insight into the structure of this design as we explain the computations.

9.1 Two-way anova with replication

We illustrate the computation of a two-way anova in a study of oxygen consumption by two species of limpets at three concentrations of seawater. Eight replicate readings were obtained for each combination of species and seawater concentration. We have continued to call the number of columns a, and are calling the number of rows b. The sample size for each cell (row and column combination) of the table is n. In this example a = 2, b = 3, and n = 8. The cells are also called subgroups or subclasses.

The data are featured in Box 9.1. The computational steps labeled Preliminary computations provide an efficient procedure for the analysis of variance, but we shall undertake several digressions to ensure that the concepts underlying this design are appreciated by the reader. We commence by considering the six subclasses as though they were six groups in a single-classification anova.
Each subgroup or subclass represents eight oxygen consumption readings. If we had no further classification of these six subgroups by species or salinity, such an anova would test whether there was any variation among the six subgroups over and above the variance within the subgroups. But since we have the subdivision by species and salinity, our only purpose here is to compute some quantities necessary for the further analysis. Steps 1 through 3 in Box 9.1 correspond to the identical steps in Box 8.1, although the symbolism has changed slightly, since in place of a groups we now have ab subgroups. To complete the anova, we need a correction term, which is labeled step 6 in Box 9.1. From these quantities we obtain $SS_{total}$, $SS_{subgr}$, and $SS_{within}$ in steps 7, 8, and 12, corresponding to steps 5, 6, and 7 in the layout of Box 8.1. The results of this preliminary anova are featured in Table 9.1.

The computation is continued by finding the sums of squares for rows and columns of the table. This is done by the general formula stated at the end of Section 8.1. Thus, for columns, we square the column sums, sum the resulting squares, and divide the result by 24, the number of items per column. This is step 4 in Box 9.1. A similar quantity is computed for rows (step 5). From these quotients we subtract the correction term, computed as quantity 6. These subtractions are carried out as steps 9 and 10, respectively. Since the rows and columns are based on equal sample sizes, we do not have to obtain a separate quotient for the square of each row or column sum but carry out a single division after accumulating the squares of the sums.

BOX 9.1
[Data table: eight oxygen consumption readings for each species-by-seawater-concentration subgroup.]

BOX 9.1 Continued

Preliminary computations

1. Grand total $= \sum^{a}\sum^{b}\sum^{n} Y = 461.74$

2. Sum of the squared observations $= \sum^{a}\sum^{b}\sum^{n} Y^2 = \cdots + (12.30)^2 = 5065.1530$

3. Sum of the squared subgroup (cell) totals, divided by the sample size of the subgroups
$= \dfrac{\sum^{a}\sum^{b}\left(\sum^{n} Y\right)^2}{n} = \dfrac{(84.49)^2 + \cdots + (98.61)^2}{8} = 4663.6317$

4. Sum of the squared column totals divided by the sample size of a column
$= \dfrac{\sum^{a}\left(\sum^{b}\sum^{n} Y\right)^2}{bn} = \dfrac{(245.00)^2 + (216.74)^2}{3 \times 8} = 4458.3844$

5. Sum of the squared row totals divided by the sample size of a row
$= \dfrac{\sum^{b}\left(\sum^{a}\sum^{n} Y\right)^2}{an} = \dfrac{(143.92)^2 + (121.82)^2 + (196.00)^2}{2 \times 8} = 4623.0674$

6. Grand total squared and divided by the total sample size = correction term
$CT = \dfrac{\left(\sum^{a}\sum^{b}\sum^{n} Y\right)^2}{abn} = \dfrac{(\text{quantity 1})^2}{abn} = \dfrac{(461.74)^2}{2 \times 3 \times 8} = 4441.7464$

7. $SS_{total} = \sum^{a}\sum^{b}\sum^{n} Y^2 - CT$ = quantity 2 - quantity 6 = 5065.1530 - 4441.7464 = 623.4066

8. $SS_{subgr} = \dfrac{\sum^{a}\sum^{b}\left(\sum^{n} Y\right)^2}{n} - CT$ = quantity 3 - quantity 6 = 4663.6317 - 4441.7464 = 221.8853

9. $SS_A$ (SS of columns) $= \dfrac{\sum^{a}\left(\sum^{b}\sum^{n} Y\right)^2}{bn} - CT$ = quantity 4 - quantity 6 = 4458.3844 - 4441.7464 = 16.6380

10. $SS_B$ (SS of rows) $= \dfrac{\sum^{b}\left(\sum^{a}\sum^{n} Y\right)^2}{an} - CT$ = quantity 5 - quantity 6 = 4623.0674 - 4441.7464 = 181.3210

11. $SS_{A \times B}$ (interaction SS) $= SS_{subgr} - SS_A - SS_B$ = quantity 8 - quantity 9 - quantity 10 = 221.8853 - 16.6380 - 181.3210 = 23.9263

12. $SS_{within}$ (within subgroups; error SS) $= SS_{total} - SS_{subgr}$ = quantity 7 - quantity 8 = 623.4066 - 221.8853 = 401.5213

As a check on your computations, ascertain that the following relations hold for some of the above quantities: 2 ≥ 3 ≥ 4 ≥ 6; 3 ≥ 5 ≥ 6.

Explicit formulas for these sums of squares suitable for computer programs are as follows:

9a. $SS_A = nb \sum^{a} (\bar{Y}_A - \bar{\bar{Y}})^2$
10a. $SS_B = na \sum^{b} (\bar{Y}_B - \bar{\bar{Y}})^2$
11a. $SS_{AB} = n \sum^{a}\sum^{b} (\bar{Y} - \bar{Y}_A - \bar{Y}_B + \bar{\bar{Y}})^2$
12a. $SS_{within} = \sum^{a}\sum^{b}\sum^{n} (Y - \bar{Y})^2$

BOX 9.1 Continued

Now fill in the anova table.

Source of variation      Deviation                                            df                SS (step)   MS
A (columns)              $\bar{Y}_A - \bar{\bar{Y}}$                          a - 1             9           9/(a - 1)
B (rows)                 $\bar{Y}_B - \bar{\bar{Y}}$                          b - 1             10          10/(b - 1)
A × B (interaction)      $\bar{Y} - \bar{Y}_A - \bar{Y}_B + \bar{\bar{Y}}$    (a - 1)(b - 1)    11          11/[(a - 1)(b - 1)]
Within subgroups         $Y - \bar{Y}$                                        ab(n - 1)         12          12/[ab(n - 1)]
Total                    $Y - \bar{\bar{Y}}$                                  abn - 1           7

Expected MS (Model I)
A (columns):             $\sigma^2 + \dfrac{nb}{a-1}\sum^{a}\alpha^2$
B (rows):                $\sigma^2 + \dfrac{na}{b-1}\sum^{b}\beta^2$
A × B (interaction):     $\sigma^2 + \dfrac{n}{(a-1)(b-1)}\sum^{ab}(\alpha\beta)^2$
Within subgroups:        $\sigma^2$

Since the present example is a Model I anova for both factors, the expected MS above are correct. Below are the corresponding expressions for other models.

Source of variation      Model II                                     Mixed model (A fixed, B random)
A                        $\sigma^2 + n\sigma^2_{AB} + nb\sigma^2_A$   $\sigma^2 + n\sigma^2_{AB} + \dfrac{nb}{a-1}\sum^{a}\alpha^2$
B                        $\sigma^2 + n\sigma^2_{AB} + na\sigma^2_B$   $\sigma^2 + na\sigma^2_B$
A × B                    $\sigma^2 + n\sigma^2_{AB}$                  $\sigma^2 + n\sigma^2_{AB}$
Within subgroups         $\sigma^2$                                   $\sigma^2$

Anova table

Source of variation            df     SS          MS        $F_s$
A (columns; species)            1     16.6380     16.638    1.740 ns
B (rows; salinities)            2     181.3210    90.660    9.483**
A × B (interaction)             2     23.9263     11.963    1.251 ns
Within subgroups (error)       42     401.5213     9.560
Total                          47     623.4066

$F_{.05[1,42]} = 4.07$     $F_{.05[2,42]} = 3.22$     $F_{.01[2,42]} = 5.15$

Since this is a Model I anova, all mean squares are tested over the error MS. For a discussion of significance tests, see Section 9.2.

Conclusions. Oxygen consumption does not differ significantly between the two species of limpets but differs with the salinity. At 50% seawater, the O2 consumption is increased. Salinity appears to affect the two species equally, for there is insufficient evidence of a species × salinity interaction.

TABLE 9.1
Preliminary anova of subgroups in two-way anova. Data from Box 9.1.

Source of variation      Deviation                        df                   SS          MS
Among subgroups          $\bar{Y} - \bar{\bar{Y}}$         5    ab - 1          221.8853    44.377**
Within subgroups         $Y - \bar{Y}$                    42    ab(n - 1)       401.5213     9.560
Total                    $Y - \bar{\bar{Y}}$              47    abn - 1         623.4066

Let us return for a moment to the preliminary analysis of variance in Table 9.1, which divided the total sum of squares into two parts: the sum of squares among the six subgroups; and that within the subgroups, the error sum of squares. The new sums of squares pertaining to row and column effects clearly are not part of the error, but must contribute to the differences that comprise the sum of squares among the six subgroups. We therefore subtract row and column SS from the subgroup SS. The latter is 221.8853. The row SS is 181.3210, and the column SS is 16.6380. Together they add up to 197.9590, almost but not quite the value of the subgroup sum of squares. The difference represents a third sum of squares, called the interaction sum of squares, whose value in this case is 23.9263.
We shall discuss the meaning of this new sum of squares presently. At the moment let us say only that it is almost always present (but not necessarily significant) and generally that it need not be independently computed but may be obtained as illustrated above by the subtraction of the row SS and the column SS from the subgroup SS. This procedure is shown graphically in Figure 9.1, which illustrates the decomposition of the total sum of squares into the subgroup SS and error SS. The former is subdivided into the row SS, column SS, and interaction SS. The relative magnitudes of these sums of squares will differ from experiment to experiment. In Figure 9.1 they are not shown proportional to their actual values in the limpet experiment; otherwise the area representing the row SS would have to be about 11 times that allotted to the column SS. Before we can intelligently test for significance in this anova we must understand the meaning of interaction.

We can best explain interaction in a two-way anova by means of an artificial illustration based on the limpet data we have just studied. If we interchange the readings for 75% and 50% for A. digitalis only, we obtain the data table shown in Table 9.2. Only the sums of the subgroups, rows, and columns are shown. We complete the analysis of variance in the manner presented above and note the results at the foot of Table 9.2.
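The partition can be retraced numerically from the subgroup totals alone. The following is a minimal sketch, not the book's own program: the function name `partition_two_way` and the dictionary keys are assumptions introduced for illustration, while the subgroup totals, n = 8, and the sum of squared observations (quantity 2) are the values given in Box 9.1. The second table applies the interchange of the 75% and 50% totals for A. digitalis.

```python
def partition_two_way(cell_totals, n, sum_sq_obs):
    """Partition sums of squares from subgroup totals (rows x columns).

    cell_totals: list of rows, each a list of subgroup totals.
    n: number of readings per subgroup.
    sum_sq_obs: sum of the squared individual observations (quantity 2).
    """
    b = len(cell_totals)       # number of rows
    a = len(cell_totals[0])    # number of columns
    row_totals = [sum(row) for row in cell_totals]
    col_totals = [sum(col) for col in zip(*cell_totals)]
    grand_total = sum(row_totals)

    ct = grand_total ** 2 / (a * b * n)                        # quantity 6
    ss_total = sum_sq_obs - ct                                 # step 7
    ss_subgr = sum(t ** 2 for row in cell_totals for t in row) / n - ct  # step 8
    ss_cols = sum(t ** 2 for t in col_totals) / (b * n) - ct   # step 9
    ss_rows = sum(t ** 2 for t in row_totals) / (a * n) - ct   # step 10
    return {
        "total": ss_total,
        "subgr": ss_subgr,
        "A": ss_cols,
        "B": ss_rows,
        "AxB": ss_subgr - ss_cols - ss_rows,   # step 11
        "within": ss_total - ss_subgr,         # step 12
    }


# Rows: 100%, 75%, 50% seawater; columns: A. scabra, A. digitalis.
original = [[84.49, 59.43], [63.12, 58.70], [97.39, 98.61]]
interchanged = [[84.49, 59.43], [63.12, 98.61], [97.39, 58.70]]
SUM_SQ_OBS = 5065.1530  # quantity 2 of Box 9.1; unchanged by the interchange

ss_original = partition_two_way(original, 8, SUM_SQ_OBS)
ss_interchanged = partition_two_way(interchanged, 8, SUM_SQ_OBS)
for name in ("A", "B", "AxB", "within"):
    print(name, round(ss_original[name], 4), round(ss_interchanged[name], 4))
```

Working from totals rather than raw readings mirrors the layout of the box; only the row and interaction sums of squares respond to the interchange, since column totals and individual readings are untouched.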
The total and error SS are the same as before (Table 9.1). This should not be surprising, since we are using the same data. All that we have done is to interchange the contents of the lower two cells in the right-hand column of the table. When we partition the subgroup SS, we do find some differences. We note that the SS between species (between columns) is unchanged. Since the change we made was within one column, the total for that column was not altered and consequently the column SS did not change. However, the sums of the second and third rows have been altered appreciably as a result of the interchange of the readings for 75% and 50% salinity in A. digitalis. The sum for 75% salinity is now very close to that for 50% salinity, and the difference between the salinities, previously quite marked, is now no longer so. By contrast, the interaction SS, obtained by subtracting the sums of squares of rows and columns from the subgroup SS, is now a large quantity. Remember that the subgroup SS is the same in the two examples. In the first example we subtracted sums of squares due to the effects of both species and salinities, leaving only a tiny residual representing the interaction. In the second example these two main effects (species and salinities) account for only little of the subgroup sum of squares, leaving the interaction sum of squares as a substantial residual. What is the essential difference between these two examples?

FIGURE 9.1
Diagrammatic representation of the partitioning of the total sums of squares in a two-way orthogonal anova. The areas of the subdivisions are not shown proportional to the magnitudes of the sums of squares. (Total SS = 623.4066 is divided into subgroup SS = 221.8853 and error SS = 401.5213; the subgroup SS is divided into row SS = 181.3210, column SS = 16.6380, and interaction SS = 23.9263.)

TABLE 9.2
An artificial example to illustrate the meaning of interaction. The readings for 75% and 50% seawater concentrations of Acmaea digitalis in Box 9.1 have been interchanged. Only subgroup and marginal totals are given below.

                              Species
Seawater concentration    A. scabra    A. digitalis    Σ
100%                       84.49        59.43          143.92
75%                        63.12        98.61          161.73
50%                        97.39        58.70          156.09
Σ                         245.00       216.74          461.74

Completed anova

Source of variation    df     SS          MS
Species                 1     16.6380     16.638 ns
Salinities              2     10.3566      5.178 ns
Sp × Sal                2     194.8907    97.445**
Error                  42     401.5213     9.560
Total                  47     623.4066

In Table 9.3 we have shown the subgroup and marginal means for the original data from Box 9.1 and for the altered data of Table 9.2. The original results are quite clear: at 75% salinity, oxygen consumption is lower than at the other two salinities, and this is true for both species. We note further that A. scabra consumes more oxygen than A. digitalis at two of the salinities. Thus our statements about differences due to species or to salinity can be made largely independent of each other. However, if we had to interpret the artificial data (lower half of Table 9.3), we would note that although A. scabra still consumes more oxygen overall than A. digitalis (since column sums have not changed), this difference depends greatly on the salinity. At 100% and 50%, A. scabra consumes considerably more oxygen than A. digitalis, but at 75% this relationship is reversed. Thus, we are no longer able to make an unequivocal statement about the amount of oxygen taken up by the two species. We have to qualify our statement by the seawater concentration at which they are kept. At 100% and 50%, $\bar{Y}_{scabra} > \bar{Y}_{digitalis}$, but at 75%, $\bar{Y}_{scabra} < \bar{Y}_{digitalis}$. If we examine the effects of salinity in the artificial example, we notice a mild increase in oxygen consumption at 75%. However, again we have to qualify this statement by the species of the consuming limpet; scabra consumes least at 75%, while digitalis consumes most at this concentration.

TABLE 9.3
Comparison of means of the data in Box 9.1 and Table 9.2.

                              Species
Seawater concentration    A. scabra    A. digitalis    Mean

Original data from Box 9.1
100%                       10.56         7.43           9.00
75%                         7.89         7.34           7.61
50%                        12.17        12.33          12.25
Mean                       10.21         9.03           9.62

Artificial data from Table 9.2
100%                       10.56         7.43           9.00
75%                         7.89        12.33          10.11
50%                        12.17         7.34           9.76
Mean                       10.21         9.03           9.62

This dependence of the effect of one factor on the level of another factor is called interaction. It is a common and fundamental scientific idea. It indicates that the effects of the two factors are not simply additive but that any given combination of levels of factors, such as salinity combined with any one species, contributes a positive or negative increment to the level of expression of the variable. In common biological terminology a large positive increment of this sort is called synergism.
When drugs act synergistically, the result of the interaction of the two drugs may be above and beyond the sum of the separate effects of each drug. When levels of two factors in combination inhibit each other's effects, we call it interference. (Note that "levels" in anova is customarily used in a loose sense to include not only continuous factors, such as the salinity in the present example, but also qualitative factors, such as the two species of limpets.) Synergism and interference will both tend to magnify the interaction SS.

Testing for interaction is an important procedure in analysis of variance. If the artificial data of Table 9.2 were real, it would be of little value to state that 75% salinity led to slightly greater consumption of oxygen. This statement would cover up the important differences in the data, which are that scabra consumes least at this concentration, while digitalis consumes most.

We are now able to write an expression symbolizing the decomposition of a single variate in a two-way analysis of variance in the manner of Expression (7.2) for single-classification anova. The expression below assumes that both factors represent fixed treatment effects, Model I. This would seem reasonable, since species as well as salinity are fixed treatments. Variate $Y_{ijk}$ is the kth item in the subgroup representing the ith group of treatment A and the jth group of treatment B.
It is decomposed as follows:

$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk} \qquad (9.1)$$

where $\mu$ equals the parametric mean of the population, $\alpha_i$ is the fixed treatment effect for the ith group of treatment A, $\beta_j$ is the fixed treatment effect of the jth group of treatment B, $(\alpha\beta)_{ij}$ is the interaction effect in the subgroup representing the ith group of factor A and the jth group of factor B, and $\epsilon_{ijk}$ is the error term of the kth item in subgroup ij. We make the usual assumption that $\epsilon_{ijk}$ is normally distributed with a mean of 0 and a variance of $\sigma^2$. If one or both of the factors represent Model II effects, we replace the $\alpha_i$ and/or $\beta_j$ in the formula by $A_i$ and/or $B_j$.

In previous chapters we have seen that each sum of squares represents a sum of squared deviations. What actual deviations does an interaction SS represent? We can see this easily by referring back to the anova of Table 9.1. The variation among subgroups is represented by $(\bar{Y} - \bar{\bar{Y}})$, where $\bar{Y}$ stands for the subgroup mean, and $\bar{\bar{Y}}$ for the grand mean. When we subtract the deviations due to rows $(\bar{R} - \bar{\bar{Y}})$ and those due to columns $(\bar{C} - \bar{\bar{Y}})$ from those due to subgroups, we obtain

$$(\bar{Y} - \bar{\bar{Y}}) - (\bar{R} - \bar{\bar{Y}}) - (\bar{C} - \bar{\bar{Y}}) = \bar{Y} - \bar{\bar{Y}} - \bar{R} + \bar{\bar{Y}} - \bar{C} + \bar{\bar{Y}} = \bar{Y} - \bar{R} - \bar{C} + \bar{\bar{Y}}$$

This somewhat involved expression is the deviation due to interaction. When we evaluate one such expression for each subgroup, square it, sum the squares, and multiply the sum by n, we obtain the interaction SS. This partition of the deviations also holds for their squares. This is so because the sums of the products of the separate terms cancel out.
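As a check on the deviation form of the interaction SS, the expression $\bar{Y} - \bar{R} - \bar{C} + \bar{\bar{Y}}$ can be evaluated for each of the six limpet subgroups. This is a minimal sketch under the assumption that means are recovered from the subgroup totals of Box 9.1 (each a sum of n = 8 readings); the variable names are illustrative only.

```python
# Subgroup totals from Box 9.1 (rows: 100%, 75%, 50% seawater;
# columns: A. scabra, A. digitalis); each total sums n = 8 readings.
cell_totals = [[84.49, 59.43], [63.12, 58.70], [97.39, 98.61]]
n = 8
b, a = len(cell_totals), len(cell_totals[0])

cell_means = [[t / n for t in row] for row in cell_totals]
row_means = [sum(row) / (a * n) for row in cell_totals]
col_means = [sum(col) / (b * n) for col in zip(*cell_totals)]
grand_mean = sum(map(sum, cell_totals)) / (a * b * n)

# Deviation due to interaction for each subgroup:
# subgroup mean - row mean - column mean + grand mean.
deviations = [
    [cell_means[i][j] - row_means[i] - col_means[j] + grand_mean
     for j in range(a)]
    for i in range(b)
]

# Square each deviation, sum the squares, and multiply by n.
interaction_ss = n * sum(d ** 2 for row in deviations for d in row)
print(round(interaction_ss, 4))
```

Apart from rounding in the last decimal place, the result should agree with the interaction SS obtained by subtraction in step 11 of Box 9.1. Note also that the deviations in each row sum to zero, as do those in each column.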
A simple method for revealing the nature of the interaction present in the data is to inspect the means of the original data table. We can do this in Table 9.3. The original data, showing no interaction, yield the following pattern of relative magnitudes:

              100%        75%        50%
Scabra               >          <
Digitalis            >          <

The relative magnitudes of the means in the lower part of Table 9.3 can be summarized as follows:

              100%        75%        50%
Scabra               >          <
Digitalis            <          >

When the pattern of signs expressing relative magnitudes is not uniform as in this latter table, interaction is indicated. As long as the pattern of means is consistent, as in the former table, interaction may not be present. However, interaction is often present without change in the direction of the differences; sometimes only the relative magnitudes are affected. In any case, the statistical test needs to be performed to test whether the deviations are larger than can be expected from chance alone.

In summary, when the effect of two treatments applied together cannot be predicted from the average responses of the separate factors, statisticians call this phenomenon interaction and test its significance by means of an interaction mean square. This is a very common phenomenon. If we say that the effect of density on the fecundity or weight of a beetle depends on its genotype, we imply that a genotype × density interaction is present. If the success of several alternative surgical procedures depends on the nature of the postoperative treatment, we speak of a procedure × treatment interaction.
Or if the effect of temperature on a metabolic process is independent of the effect of oxygen concentration, we say that temperature × oxygen interaction is absent.

Significance testing in a two-way anova will be deferred until the next section. However, we should point out that the computational steps 4 and 9 of Box 9.1 could have been shortened by employing the simplified formula for a sum of squares between two groups, illustrated in Section 8.4. In an analysis with only two rows and two columns the interaction SS can be computed directly as

$$\frac{(\text{Sum of one diagonal} - \text{sum of other diagonal})^2}{abn}$$

9.2 Two-way anova: Significance testing

Before we can test hypotheses about the sources of variation isolated in Box 9.1, we must become familiar with the expected mean squares for this design. In the anova table of Box 9.1 we first show the expected mean squares for Model I, both species differences and seawater concentrations being fixed treatment effects. The terms should be familiar in the context of your experience in the previous chapter. The quantities $\sum^{a} \alpha^2$, $\sum^{b} \beta^2$, and $\sum^{ab} (\alpha\beta)^2$ represent added components due to treatment for columns, rows, and interaction, respectively. Note that the within-subgroups or error MS again estimates the parametric variance of the items, $\sigma^2$.

The most important fact to remember about a Model I anova is that the mean square at each level of variation carries only the added effect due to that level of treatment. Except for the parametric variance of the items, it does not contain any term from a lower line.
Thus, the expected MS of factor A contains only the parametric variance of the items plus the added term due to factor A, but does not also include interaction effects. In Model I, the significance test is therefore simple and straightforward. Any source of variation is tested by the variance ratio of the appropriate mean square over the error MS. Thus, for the appropriate tests we employ variance ratios A/Error, B/Error, and (A × B)/Error, where A, B, A × B, and Error each signify a mean square. Thus A = $MS_A$, Error = $MS_{within}$.

When we do this in the example of Box 9.1, we find only factor B, salinity, significant. Neither factor A nor the interaction is significant. We conclude that the differences in oxygen consumption are induced by varying salinities (O2 consumption responds in a V-shaped manner), and there does not appear to be sufficient evidence for species differences in oxygen consumption. The tabulation of the relative magnitudes of the means in the previous section shows that the pattern of signs in the two lines is identical. However, this may be misleading, since the mean of A. scabra is far higher at 100% seawater than at 75%, but that of A. digitalis is only very slightly higher. Although the oxygen consumption curves of the two species when graphed appear far from parallel (see Figure 9.2), this suggestion of a species × salinity interaction cannot be shown to be significant when compared with the within-subgroups variance. Finding a significant difference among salinities does not conclude the analysis.
The data suggest that at 75% salinity there is a real reduction in oxygen consumption. Whether this is really so could be tested by the methods of Section 8.6.

When we analyze the results of the artificial example in Table 9.2, we find only the interaction MS significant. Thus, we would conclude that the response to salinity differs in the two species. This is brought out by inspection of the data, which show that at 75% salinity A. scabra consumes least oxygen and A. digitalis consumes most.

In the last (artificial) example the mean squares of the two factors (main effects) are not significant, in any case. However, many statisticians would not even test them once they found the interaction mean square to be significant, since in such a case an overall statement for each factor would have little meaning. A simple statement of response to salinity would be unclear. The presence of interaction makes us qualify our statements: "The pattern of response to changes in salinity differed in the two species." We would consequently have to describe separate, nonparallel response curves for the two species. Occasionally, it becomes important to test for overall significance in a Model I anova in spite of the presence of interaction. We may wish to demonstrate the significance of the effect of a drug, regardless of its significant interaction with age of the patient.
To support this contention, we might wish to test the mean square among drug concentrations (over the error MS), regardless of whether the interaction MS is significant.

FIGURE 9.2
Oxygen consumption by two species of limpets at three salinities. Data from Box 9.1. (Horizontal axis: % seawater, at 50, 75, and 100.)

Box 9.1 also lists expected mean squares for a Model II anova and a mixed-model two-way anova. Here, variance components for columns (factor A), for rows (factor B), and for interaction make their appearance, and they are designated $\sigma^2_A$, $\sigma^2_B$, and $\sigma^2_{AB}$, respectively. In the Model II anova note that the two main effects contain the variance component of the interaction as well as their own variance component. In a Model II anova we first test (A × B)/Error. If the interaction is significant, we continue testing A/(A × B) and B/(A × B). But when A × B is not significant, some authors suggest computation of a pooled error MS = $(SS_{A \times B} + SS_{within})/(df_{A \times B} + df_{within})$ to test the significance of the main effects. The conservative position is to continue to test the main effects over the interaction MS, and we shall follow this procedure in this book. Only one type of mixed model is shown in Box 9.1, in which factor A is assumed to be fixed and factor B to be random. If the situation is reversed, the expected mean squares change accordingly.
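The choice of denominator under Model I and Model II can be summarized in a short sketch. The function `f_ratios` and its dictionary keys are hypothetical names introduced here; the mean squares are those of the limpet anova in Box 9.1. Applying the Model II rule to the limpet data is purely illustrative, since that anova is Model I.

```python
def f_ratios(ms, model):
    """Variance ratios for a two-way anova with replication.

    ms: dict of mean squares with keys "A", "B", "AxB", "within".
    model: "I" (both factors fixed) or "II" (both factors random).
    """
    if model == "I":
        # Every mean square is tested over the error MS.
        return {k: ms[k] / ms["within"] for k in ("A", "B", "AxB")}
    if model == "II":
        # Interaction is tested over error; main effects are tested over
        # the interaction MS (the conservative procedure, without pooling).
        return {
            "A": ms["A"] / ms["AxB"],
            "B": ms["B"] / ms["AxB"],
            "AxB": ms["AxB"] / ms["within"],
        }
    raise ValueError("model must be 'I' or 'II'")


# Mean squares from the anova table of Box 9.1.
limpet_ms = {"A": 16.638, "B": 90.660, "AxB": 11.963, "within": 9.560}

model_1 = f_ratios(limpet_ms, "I")
model_2 = f_ratios(limpet_ms, "II")  # illustrative only; the limpet anova is Model I
print({k: round(v, 3) for k, v in model_1.items()})
print({k: round(v, 3) for k, v in model_2.items()})
```

The Model I ratios can be compared with the $F_s$ column and the critical values listed in Box 9.1. The sketch deliberately omits the pooled-error alternative and the mixed model.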
In the mixed model, it is the mean square representing the fixed treatment that carries with it the variance component of the interaction, while the mean square representing the random factor contains only the error variance and its own variance component and does not include the interaction component. We therefore test the MS of the random main effect over the error, but test the fixed treatment MS over the interaction.

9.3 Two-way anova without replication

In many experiments there will be no replication for each combination of factors represented by a cell in the data table. In such cases we cannot easily talk of "subgroups," since each cell contains a single reading only. Frequently it may be too difficult or too expensive to obtain more than one reading per cell, or the measurements may be known to be so repeatable that there is little point in estimating their error. As we shall see in the following, a two-way anova without replication can be properly applied only with certain assumptions. For some models and tests in anova we must assume that there is no interaction present.

Our illustration for this design is from a study in metabolic physiology. In Box 9.2 we show levels of a chemical, S-PLP, in the blood serum of eight students before, immediately after, and 12 hours after the administration of an alcohol dose. Each student has been measured only once at each time. What is the appropriate model for this anova? Clearly, the times are Model I. The eight individuals, however, are not likely to be of specific interest. It is improbable that an investigator would try to ask why student 4 has an S-PLP level so much higher than that of student 3.
We would draw more meaningful conclusions from this problem if we considered the eight individuals to be randomly sampled. We could then estimate the variation among individuals with respect to the effect of alcohol over time.

The computations are shown in Box 9.2. They are the same as those in Box 9.1 except that the expressions to be evaluated are considerably simpler. Since n = 1, much of the summation can be omitted. The subgroup sum of squares in this example is the same as the total sum of squares. If this is not immediately apparent, consult Figure 9.3, which, when compared with Figure 9.1, illustrates that the error sum of squares based on variation within subgroups is missing in this example. Thus, after we subtract the sum of squares for columns (factor A) and for rows (factor B) from the total SS, we are left with only a single sum of squares, which is the equivalent of the previous interaction SS but which is now the only source for an error term in the anova. This SS is known as the remainder SS or the discrepance.

BOX 9.2
Two-way anova without replication.

Serum-pyridoxal-5-phosphate (S-PLP) content (ng per ml of serum) of blood serum before and after ingestion of alcohol in eight subjects. This is a mixed-model anova.

                     Factor A: Time (a = 3)
Factor B:            Before alcohol   Immediately        12 hours
Individuals (b = 8)  ingestion        after ingestion    later         Σ
1                    20.00            12.34              17.45         49.79
2                    17.62            16.72              18.25         52.59
3                    11.77             9.84              11.45         33.06
4                    30.78            20.25              28.70         79.73
5                    11.25             9.70              12.50         33.45
6                    19.17            15.67              20.04         54.88
7                     9.33             8.06              10.00         27.39
8                    32.96            19.10              30.45         82.51
Σ                   152.88           111.68             148.84        413.40

Source: Data from Leinert et al. (1983).

The eight sets of three readings are treated as replications (blocks) in this analysis. Time is a fixed treatment effect, while differences between individuals are considered to be random effects. Hence, this is a mixed-model anova.

Preliminary computations

1. Grand total $= \sum^{a}\sum^{b} Y = 413.40$

2. Sum of the squared observations $= \sum^{a}\sum^{b} Y^2 = (20.00)^2 + \cdots + (30.45)^2 = 8349.4138$

3. Sum of squared column totals divided by sample size of a column $= \dfrac{\sum^{a}\left(\sum^{b} Y\right)^2}{b} = \dfrac{(152.88)^2 + (111.68)^2 + (148.84)^2}{8} = 7249.7578$

4. Sum of squared row totals divided by sample size of a row $= \dfrac{\sum^{b}\left(\sum^{a} Y\right)^2}{a} = \dfrac{(49.79)^2 + \cdots + (82.51)^2}{3} = 8127.8059$

5. Grand total squared and divided by the total sample size = correction term $CT = \dfrac{\left(\sum^{a}\sum^{b} Y\right)^2}{ab} = \dfrac{(\text{quantity 1})^2}{ab} = \dfrac{(413.40)^2}{24} = 7120.8150$

6. $SS_{\text{total}} = \sum^{a}\sum^{b} Y^2 - CT$ = quantity 2 - quantity 5 = 8349.4138 - 7120.8150 = 1228.5988

7. $SS_A$ (SS of columns) $= \dfrac{\sum^{a}\left(\sum^{b} Y\right)^2}{b} - CT$ = quantity 3 - quantity 5 = 7249.7578 - 7120.8150 = 128.9428

8. $SS_B$ (SS of rows) $= \dfrac{\sum^{b}\left(\sum^{a} Y\right)^2}{a} - CT$ = quantity 4 - quantity 5 = 8127.8059 - 7120.8150 = 1006.9909

9. $SS_{\text{error}}$ (remainder; discrepance) $= SS_{\text{total}} - SS_A - SS_B$ = quantity 6 - quantity 7 - quantity 8 = 1228.5988 - 128.9428 - 1006.9909 = 92.6651

[Anova table of Box 9.2: not recoverable from the source.]

FIGURE 9.3
Diagrammatic representation of the partitioning of the total sums of squares in a two-way orthogonal anova without replication. The areas of the subdivisions are not shown proportional to the magnitudes of the sums of squares. Quantities shown in the diagram: Total SS = 1228.5988; Subgroup SS = 1228.5988; Row SS = 1006.9909; Column SS = 128.9428; Interaction SS = 92.6651 = remainder; Error SS = 0.
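The preliminary computations of Box 9.2 can be reproduced with a short Python sketch. The function and variable names are hypothetical; the numbered comments refer to the quantities of Box 9.2. The sketch also forms the two variance ratios over the remainder mean square, which is legitimate only under the no-interaction assumption discussed in this section.

```python
# S-PLP readings (ng per ml of serum) from Box 9.2.
# One row per individual; columns are before, immediately after,
# and 12 hours after alcohol ingestion.
SPLP = [
    [20.00, 12.34, 17.45],
    [17.62, 16.72, 18.25],
    [11.77, 9.84, 11.45],
    [30.78, 20.25, 28.70],
    [11.25, 9.70, 12.50],
    [19.17, 15.67, 20.04],
    [9.33, 8.06, 10.00],
    [32.96, 19.10, 30.45],
]


def two_way_no_replication(rows):
    """Partition the total SS for a table with one reading per cell."""
    b = len(rows)        # number of rows (factor B)
    a = len(rows[0])     # number of columns (factor A)
    grand_total = sum(sum(row) for row in rows)               # quantity 1
    sum_squares = sum(y * y for row in rows for y in row)     # quantity 2
    col_totals = [sum(row[j] for row in rows) for j in range(a)]
    row_totals = [sum(row) for row in rows]
    ct = grand_total ** 2 / (a * b)                           # quantity 5
    ss_total = sum_squares - ct                               # quantity 6
    ss_a = sum(t * t for t in col_totals) / b - ct            # quantity 7
    ss_b = sum(t * t for t in row_totals) / a - ct            # quantity 8
    ss_rem = ss_total - ss_a - ss_b                           # quantity 9
    ms_rem = ss_rem / ((a - 1) * (b - 1))
    return {
        "SS_total": ss_total,
        "SS_A": ss_a,
        "SS_B": ss_b,
        "SS_remainder": ss_rem,
        "F_A": (ss_a / (a - 1)) / ms_rem,   # columns over remainder
        "F_B": (ss_b / (b - 1)) / ms_rem,   # rows over remainder
    }


result = two_way_no_replication(SPLP)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Because the remainder is obtained by subtraction, any rounding in the earlier quantities carries into it; working from the raw readings, as here, avoids accumulating rounding from the intermediate steps.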
If you refer to the expected mean squares for the two-way anova in Box 9.1, you will discover why we made the statement earlier that for some models and tests in a two-way anova without replication we must assume that the interaction is not significant. If interaction is present, only a Model II anova can be entirely tested, while in a mixed model only the fixed level can be tested over the remainder mean square. But in a pure Model I anova, or for the random factor in a mixed model, it would be improper to test the main effects over the remainder unless we could reliably assume that no added effect due to interaction is present. General inspection of the data in Box 9.2 convinces us that the trends with time for any one individual are faithfully reproduced for the other individuals. Thus, interaction is unlikely to be present. If, for example, some individuals had not responded with a lowering of their S-PLP levels after ingestion of alcohol, interaction would have been apparent, and the test of the mean square among individuals carried out in Box 9.2 would not have been legitimate.

Since we assume no interaction, the row and column mean squares are tested over the error MS. The results are not surprising; casual inspection of the data would have predicted our findings. Differences with time are highly significant, yielding an $F_s$ value of 9.741. The added variance among individuals is also highly significant, assuming there is no interaction.

A common application of two-way anova without replication is the repeated testing of the same individuals. By this we mean that the same group of individuals is tested repeatedly over a period of time.
The individuals are one factor (usually considered as random and serving as replication), and the time dimension is the second factor, a fixed treatment effect. For example, we might measure growth of a structure in ten individuals at regular intervals. When we test for the presence of an added variance component (due to the random factor), we again must assume that there is no interaction between time and the individuals; that is, the responses of the several individuals are parallel through time. Another use of this design is found in various physiological and psychological experiments in which we test the same group of individuals for the appearance of some response after treatment. Examples include increasing immunity after antigen inoculations, altered responses after conditioning, and measures of learning after a number of trials. Thus, we may study the speed with which ten rats, repeatedly tested on the same maze, reach the end point. The fixed-treatment effect would be the successive trials to which the rats have been subjected. The second factor, the ten rats, is random, presumably representing a random sample of rats from the laboratory population.

One special case, common enough to merit separate discussion, is repeated testing of the same individuals in which only two treatments (a = 2) are given. This case is also known as paired comparisons, because each observation for one treatment is paired with one for the other treatment. This pair is composed of the same individuals tested twice or of two individuals with common experiences, so that we can legitimately arrange the data as a two-way anova. Let us elaborate on this point.
Suppose we test the muscle tone of a group of individuals, subject them to severe physical exercise, and measure their muscle tone once more. Since the same group of individuals will have been tested twice, we can arrange our muscle tone readings in pairs, each pair representing readings on one individual (before and after exercise). Such data are appropriately treated by a two-way anova without replication, which in this case would be a paired-comparisons test because there are only two treatment classes. This "before and after treatment" comparison is a very frequent design leading to paired comparisons. Another design simply measures two stages in the development of a group of organisms, time being the treatment intervening between the two stages. The example in Box 9.3 is of this nature. It measures lower face width in a group of girls at age five and in the same group of girls when they are six years old. The paired comparison is for each individual girl, between her face width when she is five years old and her face width at six years.

BOX 9.3
Paired comparisons (randomized blocks with a = 2).

Lower face width (skeletal bigonial diameter in cm) for 15 North American white girls measured when 5 and again when 6 years old.

              (1)            (2)            (3)        (4)
Individuals   5-year-olds    6-year-olds    Σ          D = Y_i2 - Y_i1 (difference)
 1            7.33           7.53           14.86      0.20
 2            7.49           7.70           15.19      0.21
 3            7.27           7.46           14.73      0.19
 4            7.93           8.21           16.14      0.28
 5            7.56           7.81           15.37      0.25
 6            7.81           8.01           15.82      0.20
 7            7.46           7.72           15.18      0.26
 8            6.94           7.13           14.07      0.19
 9            7.49           7.68           15.17      0.19
10            7.44           7.66           15.10      0.22
11            7.95           8.11           16.06      0.16
12            7.47           7.66           15.13      0.19
13            7.04           7.20           14.24      0.16
14            7.10           7.25           14.35      0.15
15            7.64           7.79           15.43      0.15
ΣY          111.92         114.92          226.84      3.00
ΣY²         836.3300       881.8304       3435.6992    0.6216

Source: From a larger study by Newman and Meredith (1956).

Two-way anova without replication

Anova table

Source of variation            df   SS       MS             F_s           Expected MS
Ages (columns; factor A)        1   0.3000   0.3000         388.89**      $\sigma^2 + \sigma^2_{AB} + \frac{b}{a-1}\sum\alpha^2$
Individuals (rows; factor B)   14   2.6367   0.188,34       (244.14)**    $\sigma^2 + a\sigma^2_B$
Remainder                      14   0.0108   0.000,771,43                 $\sigma^2 + \sigma^2_{AB}$
Total                          29   2.9475

$F_{0.01[1,14]} = 8.86$          $F_{0.01[12,12]} =$ [value missing in the source] (Conservative tabled value)

Conclusions. The variance ratio for ages is highly significant. We conclude that faces of 6-year-old girls are wider than those of 5-year-olds. If we are willing to assume that the interaction $\sigma^2_{AB}$ is zero, we may test for an added variance component among individual girls and would find it significant.

The t test for paired comparisons

$t_s = \dfrac{\bar{D} - (\mu_1 - \mu_2)}{s_{\bar{D}}}$

where $\bar{D}$ is the mean difference between the paired observations,

$\bar{D} = \dfrac{\sum D}{b} = \dfrac{3.00}{15} = 0.20$

and $s_{\bar{D}} = s_D/\sqrt{b}$ is the standard error of $\bar{D}$ calculated from the observed differences in column (4):

$s_D = \sqrt{\dfrac{\sum D^2 - (\sum D)^2/b}{b - 1}} = \sqrt{\dfrac{0.6216 - (3.00^2/15)}{14}} = \sqrt{\dfrac{0.0216}{14}} = \sqrt{0.001{,}542{,}86} = 0.039{,}279{,}2$

and thus

$s_{\bar{D}} = \dfrac{s_D}{\sqrt{b}} = \dfrac{0.039{,}279{,}2}{\sqrt{15}} = 0.010{,}141{,}9$

We assume that the true difference between the means of the two groups, $\mu_1 - \mu_2$, equals zero:

$t_s = \dfrac{\bar{D} - 0}{s_{\bar{D}}} = \dfrac{0.20 - 0}{0.010{,}141{,}9} = 19.7203$ with $b - 1 = 14$ df.

This yields $P \ll 0.01$. Also $t_s^2 = 388.89$, which equals the previous $F_s$.

Paired comparisons often result from dividing an organism or other individual unit so that half receives treatment 1 and the other half treatment 2, which may be the control. Thus, if we wish to test the strength of two antigens or allergens we might inject one into each arm of a single individual and measure the diameter of the red area produced. It would not be wise, from the point of view of experimental design, to test antigen 1 on individual 1 and antigen 2 on individual 2. These individuals may be differentially susceptible to these antigens, and we may learn little about the relative potency of the antigens, since this would be confounded by the differential responses of the subjects. A much better design would be first to inject antigen 1 into the left arm and antigen 2 into the right arm of a group of n individuals and then to analyze the data as a two-way anova without replication, with n rows (individuals) and 2 columns (treatments). It is probably immaterial whether an antigen is injected into the right or left arm, but if we were designing such an experiment and knew little about the reaction of humans to antigens, we might, as a precaution, randomly allocate antigen 1 to the left or right arm for different subjects, antigen 2 being injected into the opposite arm. A similar example is the testing of certain plant viruses by rubbing a concentration of the virus over the surface of a leaf and counting the resulting lesions. Since different leaves are susceptible in different degrees, a conventional way of measuring the strength of the virus is to wipe it over the half of the leaf on one side of the midrib, rubbing the other half of the leaf with a control or standard solution.
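The precaution of randomly allocating antigen 1 to the left or right arm of each subject, with antigen 2 going into the opposite arm, might be carried out as in the following sketch. The function name, the record layout, and the use of a seed are assumptions made for illustration; any sound source of random numbers would serve.

```python
import random


def allocate_arms(n_subjects, seed=None):
    """Randomly assign antigen 1 to one arm of each subject.

    Antigen 2 always goes into the opposite arm, so every subject
    still contributes one pair of readings (one row of the anova).
    """
    rng = random.Random(seed)  # seed is optional; fix it to reproduce a plan
    plan = []
    for subject in range(1, n_subjects + 1):
        arm_1 = rng.choice(["left", "right"])
        arm_2 = "right" if arm_1 == "left" else "left"
        plan.append({"subject": subject, "antigen_1": arm_1, "antigen_2": arm_2})
    return plan


for entry in allocate_arms(5, seed=1):
    print(entry)
```

The pairing within each subject is untouched by the randomization, so the data are still arranged as n rows (individuals) and 2 columns (treatments).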
Another design leading to paired comparisons is to apply the treatment to two individuals sharing a common experience, be this genetic or environmental. Thus, a drug or a psychological test might be given to groups of twins or sibs, one of each pair receiving the treatment, the other one not.

Finally, the paired-comparisons technique may be used when the two individuals to be compared share a single experimental unit and are thus subjected to common environmental experiences. If we have a set of rat cages, each of which holds two rats, and we are trying to compare the effect of a hormone injection with a control, we might inject one of each pair of rats with the hormone and use its cage mate as a control. This would yield a 2 × n anova for n cages.

One reason for featuring the paired-comparisons test separately is that it alone among the two-way anovas without replication has an equivalent, alternative method of analysis: the t test for paired comparisons, which is the traditional method of analyzing it.

The paired-comparisons case shown in Box 9.3 analyzes face widths of five- and six-year-old girls, as already mentioned. The question being asked is whether the faces of six-year-old girls are significantly wider than those of five-year-old girls. The data are shown in columns (1) and (2) for 15 individual girls. Column (3) features the row sums that are necessary for the analysis of variance. The computations for the two-way anova without replication are the same as those already shown for Box 9.2 and thus are not shown in detail. The anova table shows that there is a highly significant difference in face width between the two age groups.
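The equivalence of the two analyses of Box 9.3 can be checked numerically. The sketch below, with hypothetical function names, computes the variance ratio for ages from the two-way anova without replication and the paired-comparisons t from the differences, using the face widths of Box 9.3.

```python
import math

# Lower face widths (cm) from Box 9.3, columns (1) and (2).
AGE5 = [7.33, 7.49, 7.27, 7.93, 7.56, 7.81, 7.46, 6.94,
        7.49, 7.44, 7.95, 7.47, 7.04, 7.10, 7.64]
AGE6 = [7.53, 7.70, 7.46, 8.21, 7.81, 8.01, 7.72, 7.13,
        7.68, 7.66, 8.11, 7.66, 7.20, 7.25, 7.79]


def paired_anova_f(first, second):
    """F_s for treatments in a two-way anova without replication, a = 2."""
    a, b = 2, len(first)
    ct = (sum(first) + sum(second)) ** 2 / (a * b)
    ss_total = sum(y * y for y in first + second) - ct
    ss_a = (sum(first) ** 2 + sum(second) ** 2) / b - ct
    ss_b = sum((y1 + y2) ** 2 for y1, y2 in zip(first, second)) / a - ct
    ss_rem = ss_total - ss_a - ss_b
    return (ss_a / (a - 1)) / (ss_rem / ((a - 1) * (b - 1)))


def paired_t(first, second):
    """t_s for paired comparisons, testing a mean difference of zero."""
    b = len(first)
    d = [y2 - y1 for y1, y2 in zip(first, second)]
    d_bar = sum(d) / b
    s_d = math.sqrt((sum(x * x for x in d) - sum(d) ** 2 / b) / (b - 1))
    return d_bar / (s_d / math.sqrt(b))


f_s = paired_anova_f(AGE5, AGE6)
t_s = paired_t(AGE5, AGE6)
print(f"F_s = {f_s:.2f}, t_s = {t_s:.4f}, t_s squared = {t_s ** 2:.2f}")
```

The two functions share no intermediate results, so agreement between `f_s` and the square of `t_s` is a genuine check on both computations rather than a restatement of one of them.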
If interaction is assumed to be zero, there is a large added variance component among the individual girls, undoubtedly representing genetic as well as environmental differences.

The other method of analyzing paired-comparisons designs is the well-known t test for paired comparisons. It is quite simple to apply and is illustrated in the second half of Box 9.3. It tests whether the mean of sample differences between pairs of readings in the two columns is significantly different from a hypothetical mean, which the null hypothesis puts at zero. The standard error over which this is tested is the standard error of the mean difference. The difference column has to be calculated and is shown in column (4) of the data table in Box 9.3. The computations are quite straightforward, and the conclusions are the same as for the two-way anova. This is another instance in which we obtain the value of $F_s$ when we square the value of $t_s$.

Although the paired-comparisons t test is the traditional method of solving this type of problem, we prefer the two-way anova. Its computation is no more time-consuming and has the advantage of providing a measure of the variance component among the rows (blocks). This is useful knowledge, because if there is no significant added variance component among blocks, one might simplify the analysis and design of future, similar studies by employing single-classification anova.

Exercises

9.1 Swanson, Latshaw, and Tague (1921) determined soil pH electrometrically for various soil samples from Kansas. An extract of their data (acid soils) is shown below.
Do subsoils differ in pH from surface soils (assume that there is no interaction between localities and depth for pH reading)?

County        Soil type                   Surface pH   Subsoil pH
Finney        Richfield silt loam         6.57         8.34
Montgomery    Summit silty clay loam      6.77         6.13
Doniphan      Brown silt loam             6.53         6.32
Jewell        Jewell silt loam            6.71         8.30
Jewell        Colby silt loam             6.72         8.44
Shawnee       Crawford silty clay loam    6.01         6.80
Cherokee      Oswego silty clay loam      4.99         4.42
Greenwood     Summit silty clay loam      5.49         7.90
Montgomery    Cherokee silt loam          5.56         5.20
Montgomery    Oswego silt loam            5.32         5.32
Cherokee      Bates silt loam             5.92         5.21
Cherokee      Cherokee silt loam          6.55         5.66
Cherokee      Neosho silt loam            6.53         5.66

ANS. MS between surface and subsoils = 0.6246, $MS_{\text{residual}}$ = 0.6985, $F_s$ = 0.894, which is clearly not significant at the 5% level.

9.2 The following data were extracted from a Canadian record book of purebred dairy cattle. Random samples of 10 mature (five-year-old and older) and 10 two-year-old cows were taken from each of five breeds (honor roll, 305-day class). The average butterfat percentages of these cows were recorded. This gave us a total of 100 butterfat percentages, broken down into five breeds and into two age classes. The 100 butterfat percentages are given below. Analyze and discuss your results. You will note that the tedious part of the calculation has been done for you.
       Ayrshire         Canadian         Guernsey         Holstein-Friesian   Jersey
       Mature   2-yr    Mature   2-yr    Mature   2-yr    Mature   2-yr       Mature   2-yr
        3.74    4.44     3.92    4.29     4.54    5.30     3.40    3.79        4.80    5.75
        4.01    4.37     4.95    5.24     5.18    4.50     3.55    3.66        6.45    5.14
        3.77    4.25     4.47    4.43     5.75    4.59     3.83    3.58        5.18    5.25
        3.78    3.71     4.28    4.00     5.04    5.04     3.95    3.38        4.49    4.76
        4.10    4.08     4.07    4.62     4.64    4.83     4.43    3.71        5.24    5.18
        4.06    3.90     4.10    4.29     4.79    4.55     3.70    3.94        5.70    4.22
        4.27    4.41     4.38    4.85     4.72    4.97     3.30    3.59        5.41    5.98
        3.94    4.11     3.98    4.66     3.88    5.38     3.93    3.55        4.77    4.85
        4.11    4.37     4.46    4.40     5.28    5.39     3.58    3.55        5.18    6.55
        4.25    3.53     5.05    4.33     4.66    5.97     3.54    3.43        5.23    5.72
ΣY     40.03   41.17    43.66   45.11    48.48   50.52    37.21   36.18       52.45   53.40
Mean    4.003   4.117    4.366   4.511    4.848   5.052    3.721   3.618       5.245   5.340

$\sum^{a}\sum^{b}\sum^{n} Y^2 = 2059.6109$

9.3 Blakeslee (1921) studied length-width ratios of second seedling leaves of two types of Jimson weed called globe (G) and nominal (N). Three seeds of each type were planted in 16 pots. Is there sufficient evidence to conclude that globe and nominal differ in length-width ratio?

Pot identification   Types
number               G                       N
16533                1.67   1.53   1.61      2.18   2.23   2.32
16534                1.68   1.70   1.49      2.00   2.12   2.18
16550                1.38   1.76   1.52      2.41   2.11   2.60
16668                1.66   1.48   1.69      1.93   2.00   2.00
16767                1.38   1.61   1.64      2.32   2.23   1.90
16768                1.70   1.71   1.71      2.48   2.11   2.00
16770                1.58   1.59   1.38      2.00   2.18   2.16
16771                1.49   1.52   1.68      1.94   2.13   2.29
16773                1.48   1.44   1.58      1.93   1.95   2.10
16775                1.28   1.45   1.50      1.77   2.03   2.08
16776                1.55   1.45   1.44      2.06   1.85   1.92
16777                1.29   1.57   1.44      2.00   1.94   1.80
16780                1.36   1.22   1.41      1.87   1.87   2.26
16781                1.47   1.43   1.61      2.24   2.00   2.23
16787                1.52   1.56   1.56      1.79   2.08   1.89
16789                1.37   1.38   1.40      1.85   2.10   2.00

ANS. $MS_{\text{within}}$ = 0.0177, $MS_{T \times P}$ = 0.0203, $MS_{\text{types}}$ = 7.3206 ($F_s$ = 360.62**), $MS_{\text{pots}}$ = 0.0598 ($F_s$ = 3.378**). The effect of pots is considered to be a Model II factor, and types, a Model I factor.

9.4 The following data were extracted from a more extensive study by Sokal and Karten (1964).
The data represent mean dry weights (in mg) of three genotypes of beetles, Tribolium castaneum, reared at a density of 20 beetles per gram of flour. The four series of experiments represent replications.

          Genotypes
Series    ++       +b       bb
1         0.958    0.986    0.925
2         0.971    1.051    0.952
3         0.927    0.891    0.829
4         0.971    1.010    0.955

Test whether the genotypes differ in mean dry weight.

9.5 The mean length of developmental period (in days) for three strains of houseflies at seven densities is given. (Data by Sullivan and Sokal, 1963.) Do these flies differ in development period with density and among strains? You may assume absence of strain × density interaction.

Density          Strains
per container    OL      BELL    bwb
  60              9.6     9.3     9.3
  80             10.6     9.1     9.2
 160              9.8     9.3     9.5
 320             10.7     9.1    10.0
 640             11.1    11.1    10.4
1280             10.9    11.8    10.8
2560             12.8    10.6    10.7

ANS. $MS_{\text{residual}}$ = 0.3426, $MS_{\text{strains}}$ = 1.3943 ($F_s$ = 4.070*), $MS_{\text{density}}$ = 2.0905 ($F_s$ = 6.1019**).

9.6 The following data are extracted from those of French (1976), who carried out a study of energy utilization in the pocket mouse Perognathus longimembris during hibernation at different temperatures. Is there evidence that the amount of food available affects the amount of energy consumed at different temperatures during hibernation?

Restricted food                                  Ad-libitum food
8°C                     18°C                     8°C                     18°C
Animal   Energy used    Animal   Energy used     Animal   Energy used    Animal   Energy used
no.      (kcal/g)       no.      (kcal/g)        no.      (kcal/g)       no.      (kcal/g)
1        62.69          5        72.60           13        95.73         17       101.19
2        54.07          6        70.97           14        63.95         18        76.88
3        65.73          7        74.32           15       144.30         19        74.08
4        62.98          8        53.02           16       144.30         20        81.40