3.0 ANOVA - ANALYSIS OF VARIANCE (updated Spring 2001) ANOVA is extensively used in the science and engineering. Study the coagulation time for samples of blood drawn from 24 animals receiving 4 different diets. Does diet affect blood coagulation time? Twenty four (24) animals were randomly allocated to four diets (A,B,C, and D). The blood samples were taken and tested in random order. The analysis that follows is exactly justified if the data are random samples from four normal populations, and is approximately justified based on the randomization distn idea presented previously. 2 We will assume that the four populations have equal variance, σ y . Is there evidence to indicate real differences between the diets/treatments? H0: µ A = µ B = µ C = µ D . HA: At least one of the means differs from the others. In the following table, test order is listed in the parenthesis. DIET (Treatment) A B C D ----------------------------------------------------------------------63 (12) 62 (20) 68 (16) 56 (23) 67 (9) 60 (2) 66 (7) 62 (3) 71 (15) 63 (11) 71 (1) 60 (6) 64 (14) 59 (10) 67 (17) 61 (18) 65 (4) 68 (13) 63 (22) 66 (8) 68 (21) 64 (19) 63 (5) 59 (24) nA =6 Treat. Avg. Y A = 66 Grand Avg. 64 nB = 4 nC = 6 nD = 8 Y B = 61 Y C = 68 Y D = 61 Are the differences between the treatment averages greater than expected given the variation within treatments? 22 • Variation within treatments NA ∑ ( Y Aj – Y A ) 2 Sum of squares within treatment A S 2 j=1 - = -----A- = s A = -------------------------------------na – 1 νA Degree of freedom for treatment A 2 2 2 S 2 ( 63 – 66 ) + ( 67 – 66 ) + … + ( 66 – 66 ) 40 s A = ----------------------------------------------------------------------------------------------------- = -----A- = ------ = 8 vA 6–1 5 Similarly, S 2 10 s B = -----B- = ------ = 3.33 3 νB S 2 14 s C = -----C- = ------ = 2.8 νC 5 S 2 48 s D = -----D- = ------ = 6.857 νD 7 2 2 2 2 Since s A , s B , s C , and s D all provide estimates of the unknown variance, 2 σ y . They may be combined to provide a pooled estimate. 2 2 sP 2 2 2 S A + SB + SC + SD ν A s A + νB sB + νC sC + νD sD = -------------------------------------------------------------------= -------------------------------------------ν A + νB + νC + νD ν A + νB + νC + νD We’ll refer to this sample variance as the within treatment sample variance. SW 2 40 + 10 + 14 + 48 112 - = ------------------------------------------- = --------s w = -----νW 5+3+5+7 20 Within Treatment Sum of Squares = ------------------------------------------------------------------------------------------Within Treatment Degree of Freedom 2 s w = 5.6. It is the within treatment mean square estimate of the true 2 variance, σ y . One-Way ANOVA In general, 23 nI k ∑ ∑ ( Y ij – Y i ) Σ WithinTreatment = 2 i=1j=1 S WithinTreatment 2 sw = Total number of observations - Number of treatment Variation Between Treatments 2 If there is no difference between treatment means, a second estimate of σ y can be obtained based on the variation of the treatment means about y . = ( Y treatment – Y ) ∑ OverAllTreatments If we calculate ---------------------------------------------------------------------------------- , it can be used as a #Treatment - 1 2 2 estimate of σ y . 2 2 2 But we want an estimate of σ y . Since σ y = n σ y , let’s scale = 2 ( y treatment – y ) by ntreat. So, in general, k = ∑ ni ( Y treatment – Y ) 2 S 2 --------------------------------------------------------- = s BT = -----Bk–1 νB . i=1 where k is the number of treatments. It is the Between Treatment estimate 2 of σ y , or, the Between Treatment Mean Square. For our Problem, Treatment Yi = Y = (Y i – Y ) ni A 66 B 61 C 68 D 61 64 64 64 64 2 6 -3 4 4 6 -3 8 24 S B = 6(2)2 + 4(-3)2 +6(4)2 +8(-3)2 = 228, νB = 3, 2 228 S and s B = ------ = --------- = 76.0 3 νB Summary: Within Treatments: Between Treatments: 2 Sum of Squares: Degrees of Freedom: SW = 112 νW = 20 Mean Square: s W = 5.6 Sum of Square Degrees of Freedom SB = 228 νB = 3 Mean Square s B = 76.0 2 2 2 2 Under H0, both s W and s B are estimates of σ y If there is a difference between treatment means, the between treatment variance will be inflated. If a sample of size n is drawn from a normal 2 population with variance σ , The quantity, 2 s (n – 1) --------------------2 σ 2 2 2 2 s (n – 1) - can be follows the χ distribution. If σ is postulated, χ calc = --------------------2 σ 2 computed and compared to the existing values. If σ is not postulated, a 2 100(1- α )% confidence interval for σ can be calculated. 2 2 s (n – 1) 2 s (n – 1) --------------------- ≤ σ ≤ --------------------2 2 χ χ α α ν, 1 – --2 ν, --2 25 χ χ 2 χ α ν, --2 2 2 α ν, 1 – --2 • Sampling distn of Two Variances 2 If sample of size n1 is drawn from normal distn with a variance of σ 1 . and 2 sample of size n2 is drawn from normal distn with a variance of σ 2 , 2 2 estimates of the population variances may be calculated: s 1 and s 2 . 2 2 2 s1 χ ν1 s2 The quantity -----2- is -------- distributed and -----2- is ν1 σ1 σ2 2 χ v2 -------- distributed. The ratio ν2 2 χ ν1 ⁄ ν 1 ----------------- follows the F distn with ν1, ν2 degrees of freedom. 2 χ ν2 ⁄ ν 2 Therefore, 2 2 s1 ⁄ σ1 --------------~F ν1, ν 2 2 2 s2 ⁄ σ2 or 2 2 s1 σ2 ----2 -----2- ∼ F ν1, ν2 s2 σ1 2 2 If σ 1 = σ 2 , then 2 s1 ----~F ν1, ν2 2 s2 2 For the blood coagulation effect problem, s W = 5.6 (νw=20) and 2 s B = 76. 0 (νB=3) 2 2 2 Under H0, both are estimates of σ y , and F calc = s B ⁄ s W = 13.57 . Note 26 that this ratio should be distributed according to F ν1 = 3, ν2 = 20 under H0. From a F-distn table, it is found that F (3,20,.95) = 3.10 and F(3,20,.99) = 4.94. Since both numbers are less than Fcalc, strong evidence shows that 2 Fcalc is not a typical value. This means that s B is inflated by differences between µ A , µ B , µ C , and µ D . Therefore, reject H0. At least one of the means is different. To confirm ANOVA results, we would like to display Y’s on their reference distn. It may be difficult, since each Y is based on different Degrees of 2 Freedom ( σ y differs from each other). = Y 2 = 64 , s y = 5.6 , sy = 2.366. We need to calculate a t for each Y. A B C D ---------------------------------------------------------------66 61 68 61 6 4 6 8 0.966 1.183 0.966 0.837 2.070 -2.536 4.141 -3.584 Y n sY tcalc t20 -2.54 2.07 -3.58 -4 -3 -2 -1 0 1 2 4.14 3 4 5 The four calculated t points are plotted on a t20 distribution plot. From the plot, we can ask if these four t values be drawn randomly from a t20 distn. Decomposition of the variance The coagulation time for the ith treatment and jth trial within the treatment is Yij where i goes from 1 to k (k is the number of treatments/diets) and j goes from 1 to ni (ni is the number of trials for the ith treatment). The coagulation time, Yij, may be expressed as 27 = = Yij = Y + ( Y i – Y ) + ( Y ij – Y i ) Squaring both sides and summing over all trials and treatments, and after simplification, ni k ∑∑ ni k 2 Y ij = ∑ ∑Y ni ∑∑ 2 Y ij = 2 = NY i=1j=1 k STOT = k ni = ∑ ∑ (Y i – Y + i=1j=1 = 1 j= 1 k = 2 2 ) + i=1j=1 k + k ni ∑ ∑ ( Y ij – Y i ) 2 i=1j=1 = ∑ ni ( Y i – Y 2 ) + k ni ∑ ∑ ( Y ij – Y i ) 2 i=1j=1 i=1 ni ∑ ∑ Y ij , is the Total Sum of Squares 2 i=1j=1 = 2 SAVG = N Y k SB = = ∑ ni ( Y i – Y i=1 ni k SW = , is the Sum of Squares Due to the Mean 2 ) , is the Between Treatment Sum of Squares. ∑ ∑ ( Y ij – Y i ) 2 , is the Within Treatment Sum of Squares. i=1j=1 For our blood coagulation time example, STOT = 98644, SAVG = 98304, SB = 228, SW = 112. We observe that in fact SAVG + SB + SW = STOT Additionally, νTOT = νAVG + νB +νW (24) = (1) + (3) + (20) What we know can be summarized in an ANOVA Table Source of Var. SS. D.of F. Mean Sqr. Fcalc ------------------------------------------------------------------------------------Average 98304 1 98304 17554.3 Between Treat. 228 3 76 13.57 Within Treat. 112 20 5.6 Total 98644 24 In the above table, the average mean square error (98304) is an estimate of 2 σ y under the assumption that µy=0. The Between Treatment Mean Square 2 Error (76) is an estimate of σ y under the assumption that 28 µ A = µB = µC = µD . To test the hypothesis that µ A = µ B = µ C = µ D , compare Between/ Within Treatment Mean Square 2 sB ----- = 13.57 = F calc 2 sW to F ν1 = 3, ν2 = 20 . F(3,20,.95) = 3.10 and F (3,20,.99) = 4.94. To test the hypothesis that µ y = 0 , compare average and within treatment mean squares 2 s AVG ----------- = 17554.3 to F V 1 = 3, V 2 = 20 . F (1,20,.95) = 4.35 and F (1,20,.99) = 2 sW 8.10. So, the average is significant, so as are the treatments. Earlier, we expressed the response as: = = Y ij = Y + ( Y i – Y ) + ( Y ij – Y i ) = = where Y is the grand mean. ( Y i – Y ) is treatment effect, and ( Y ij – Y i ) is experimental error. y = η + τi + ε = y is an estimate of η . A model for the response is: ŷ = η̂ + τˆi In blood coagulation experiment, we studied one variable, diet. We will now look at two variables (1 blocking variables and 1 treatment variable or 2 blocking variables). 29