Analysis of Variance
11/6

Comparing Several Groups

Section:       M 8-10   M 12-2   T 9-11   W 8-10   R 9-11
Exam 1 Mean:   76.6%    79.5%    82.5%    83.7%    82.9%

• Do the group means differ?
• Naive approach
  – Independent-samples t-tests of all pairs
  – Each test doesn't use all the data → less power
  – 10 total tests → greater chance of Type I error
• Analysis of Variance (ANOVA)
  – Single test for any group differences
  – Null hypothesis: All means are equal
  – Works using variance of the sample means
  – Also based on separating explained and unexplained variance

Variance of Sample Means

• Say we have k groups of n subjects each
  – Want to test whether $\mu_i$ are all equal for i = 1,…,k
• Even if all population means are equal, sample means will vary
  – Central limit theorem tells how much: $\sigma^2_M = \sigma^2 / n$
• Compare actual variance of sample means to amount expected by chance
  – var(M) vs. $\sigma^2 / n$, or n × var(M) vs. $\sigma^2$
  – Estimate $\sigma^2$ using MSresidual
  – Test statistic: $F = \dfrac{n \times \mathrm{var}(M)}{MS_{residual}}$
• If F is large enough
  – Sample means vary more than expected by chance
  – Reject null hypothesis

[Figure: sampling distribution of the sample mean, centered on $\mu$, with sample means M1, M2, M3, …, Mk marked along the axis]

Residual Mean Square

• Best estimate of population variance, $\sigma^2$
• Based on remaining variability after removing group differences
• Extends MS for independent-samples t-test

2 groups:
$$MS_{residual} = \frac{\sum_{\text{group 1}} (X_1 - M_1)^2 + \sum_{\text{group 2}} (X_2 - M_2)^2}{n_1 + n_2 - 2}$$

k groups:
$$MS_{residual} = \frac{\sum_{\text{group 1}} (X_1 - M_1)^2 + \dots + \sum_{\text{group } k} (X_k - M_k)^2}{n_1 + \dots + n_k - k}$$

Example: Lab Sections

• Group means: M = [76.6 82.5 83.7 79.5 82.9]
• Variance of group means:
  – $\mathrm{mean}(M) = \dfrac{\sum_i M_i}{5} = 81.0$
  – $\mathrm{var}(M) = \dfrac{\sum_i (M_i - 81.0)^2}{5 - 1} = 8.69$
• Var(M) expected by chance: $\sigma^2 / 22$
  – var(M) vs. $\sigma^2 / 22$
  – Instead: 22 × var(M) vs. $\sigma^2$
  – 191.1 vs. $\sigma^2$
• Estimate of $\sigma^2$:
$$MS_{residual} = \frac{\sum_{\text{Sec 1}} (X - M_1)^2 + \sum_{\text{Sec 2}} (X - M_2)^2 + \sum_{\text{Sec 3}} (X - M_3)^2 + \sum_{\text{Sec 4}} (X - M_4)^2 + \sum_{\text{Sec 5}} (X - M_5)^2}{110 - 5} = 199.1$$
• Ratio of between-group variance to amount expected by chance:
$$F_{4,105} = \frac{191.1}{199.1} = .96$$

[Figure: density of the F(4, 105) distribution, shown twice; observed F = .96 with p = .44, and Fcrit = 2.46]

Partitioning Sum of Squares

• General approach to ANOVA uses sums of squares
  – Works with unequal sample sizes and other complications
  – Breaks total variability into explained and unexplained parts
• SStotal
  – Sum of squares for all data, treated as a single sample
  – Based on differences from grand mean, M
$$SS_{total} = \sum_{\text{all groups}} (X - M)^2$$
• SSresidual (SSwithin)
  – Variability within groups, just like with independent-samples t-test
  – Not explainable by group differences
$$SS_{residual} = \sum_{i=1}^{k} \Big( \sum_{\text{group } i} (X_i - M_i)^2 \Big) = \sum_{\text{group 1}} (X_1 - M_1)^2 + \dots + \sum_{\text{group } k} (X_k - M_k)^2$$
• SStreatment (SSbetween)
  – Variability explainable by difference among groups
  – Treat each datum as its group mean and take difference from grand mean
$$SS_{treatment} = \sum_{i=1}^{k} n_i (M_i - M)^2$$

Magic of Squares

SStotal = SStreatment + SSresidual

• $SS_{total} = \sum (X - M)^2$
• $SS_{residual} = \sum_i \big( \sum (X_i - M_i)^2 \big)$
• $SS_{treatment} = \sum_i n_i (M_i - M)^2$

[Figure: group means M1, M2, M3, M4 and grand mean M, with labels ×5, ×6, ×5, ×4]

Test Statistic for ANOVA

• Goal: Determine whether group differences account for more variability than expected by chance
• Same approach as with regression
• Calculate Mean Square: $MS_{treatment} = \dfrac{SS_{treatment}}{df_{treatment}}$
• H0: $MS_{treatment} \approx \sigma^2$
• Estimate $\sigma^2$ using $MS_{residual} = \dfrac{SS_{residual}}{df_{residual}}$
• Test statistic: $F = \dfrac{MS_{treatment}}{MS_{residual}}$

Lab Sections Revisited

• Two approaches
  – Variance of section means
  – Mean Square for treatment (lab section)
• Variance of group means
  – Var(76.6 82.5 83.7 79.5 82.9) = 8.69
  – Expected by chance: $\sigma^2 / 22$
  – Var(M) vs. $\sigma^2 / 22$, or 22 × var(M) vs. $\sigma^2$
• Mean Square for treatment
$$MS_{treatment} = \frac{\sum_i n_i (M_i - M)^2}{5 - 1} = 22 \times \mathrm{var}(M)$$
$$= \frac{22 (76.6 - 81.0)^2 + 22 (82.5 - 81.0)^2 + 22 (83.7 - 81.0)^2 + 22 (79.5 - 81.0)^2 + 22 (82.9 - 81.0)^2}{4} = 191.1$$
$$F = \frac{MS_{treatment}}{MS_{residual}} = \frac{191.1}{199.1} = .96$$

Two Views of ANOVA

• Simple: Equal sample sizes
  – var(M) vs. $\sigma^2 / n$, or n × var(M) vs. $\sigma^2$
  – $F = \dfrac{n \times \mathrm{var}(M)}{MS_{residual}}$
• General: Using sums of squares
  – $F = \dfrac{MS_{treatment}}{MS_{residual}}$
  – $MS_{treatment} = \dfrac{SS_{treatment}}{df_{treatment}} = \dfrac{\sum_i n_i (M_i - M)^2}{k - 1}$
• If sample sizes equal:
  – $MS_{treatment} = n \times \dfrac{\sum_i (M_i - M)^2}{k - 1} = n \times \mathrm{var}(M)$

[Figure: sampling distribution of the sample mean, centered on $\mu$, with sample means M1, M2, M3, …, Mk marked along the axis]

Degrees of Freedom

• $SS_{treatment} \approx SS(M)$: $df_{treatment} = k - 1$
• $SS_{total} = \sum (X - M)^2$: $df_{total} = \sum n_i - 1$
• $SS_{residual} = \sum_i SS(X_i)$: $df_{residual} = \sum (n_i - 1) = \sum n_i - k$
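
The two views of ANOVA can be checked numerically. The Python sketch below is illustrative only and is not part of the original slides. Part 1 reuses the lab-section means and the MSresidual value of 199.1 from the example, and assumes n = 22 students per section (inferred from the slides' 22 × var(M) and 110 − 5). Part 2 uses made-up toy scores with unequal group sizes, purely to illustrate SStotal = SStreatment + SSresidual. All variable and function names are hypothetical.

```python
from statistics import mean, variance

# --- Part 1: lab sections, from the summary numbers on the slides ---
section_means = [76.6, 79.5, 82.5, 83.7, 82.9]  # exam means from the slides
n_per_group = 22       # assumption: 110 students / 5 sections
ms_residual = 199.1    # taken from the slides, not recomputed here

k = len(section_means)
grand_mean = mean(section_means)        # equal n, so mean of means = grand mean
var_of_means = variance(section_means)  # sample variance (divides by k - 1)

# Simple view: n * var(M)
ms_treatment_simple = n_per_group * var_of_means

# General view: SS_treatment / df_treatment
ss_treatment = sum(n_per_group * (m - grand_mean) ** 2 for m in section_means)
ms_treatment_general = ss_treatment / (k - 1)

f_stat = ms_treatment_general / ms_residual
print(f"var(M) = {var_of_means:.2f}")
print(f"MS_treatment = {ms_treatment_simple:.1f} (simple), "
      f"{ms_treatment_general:.1f} (general)")
print(f"F({k - 1}, {k * n_per_group - k}) = {f_stat:.2f}")


# --- Part 2: sum-of-squares partition on made-up toy data ---
def sum_sq(values, center):
    """Sum of squared deviations of values from center."""
    return sum((x - center) ** 2 for x in values)


# Hypothetical scores, unequal group sizes; NOT data from the slides.
toy_groups = [
    [4, 5, 6, 5, 5],
    [7, 8, 6, 9, 7, 8],
    [5, 6, 4, 5, 5],
    [9, 8, 10, 9],
]
all_scores = [x for group in toy_groups for x in group]
toy_grand_mean = mean(all_scores)

ss_total = sum_sq(all_scores, toy_grand_mean)
ss_residual = sum(sum_sq(g, mean(g)) for g in toy_groups)
ss_treat = sum(len(g) * (mean(g) - toy_grand_mean) ** 2 for g in toy_groups)

df_treatment = len(toy_groups) - 1
df_residual = len(all_scores) - len(toy_groups)
toy_f = (ss_treat / df_treatment) / (ss_residual / df_residual)

print(f"SS_total = {ss_total:.3f}")
print(f"SS_treatment + SS_residual = {ss_treat + ss_residual:.3f}")
print(f"toy F({df_treatment}, {df_residual}) = {toy_f:.2f}")
```

Part 1 computes MStreatment both ways to show that they coincide when sample sizes are equal. Turning F into a p-value needs the F distribution, which the standard library does not provide, so that step is left out here.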