6.3 - ANOVA

CHAPTER 6 Statistical Inference & Hypothesis Testing
• 6.1 - One Sample: Mean μ, Variance σ², Proportion π
• 6.2 - Two Samples: Means μ1 vs. μ2, Variances σ1² vs. σ2², Proportions π1 vs. π2
• 6.3 - Multiple Samples: Means μ1, …, μk, Variances σ1², …, σk², Proportions π1, …, πk

Example: Y = “$ Cost of a certain medical service”
Assume Y is known to be normally distributed at each of k = 2 health care facilities (“groups”).
Hospital: Y1 ~ N(μ1, σ1)    Clinic: Y2 ~ N(μ2, σ2)
• Null Hypothesis H0: μ1 = μ2, i.e., μ1 - μ2 = 0 (“No difference exists.”); 2-sided test at significance level α = .05
• Data: Sample 1 = {667, 653, 614, 612, 604}; n1 = 5    Sample 2 = {593, 525, 520}; n2 = 3
• Analysis via T-test (if equivariance holds):

Point estimates

“Group Means” ($\bar{y} = \sum y_i / n$):
\[ \bar{y}_1 = \frac{667 + 653 + 614 + 612 + 604}{5} = 630, \qquad \bar{y}_2 = \frac{593 + 525 + 520}{3} = 546, \qquad \bar{y}_1 - \bar{y}_2 = 84 \]
NOTE: the difference is > 0.

“Group Variances” ($s^2 = SS/df$):
\[ s_1^2 = \frac{(667 - 630)^2 + \cdots + (604 - 630)^2}{5 - 1} = \frac{SS_1}{4} = 788.5, \qquad s_2^2 = \frac{(593 - 546)^2 + \cdots + (520 - 546)^2}{3 - 1} = \frac{SS_2}{2} = 1663 \]
\[ F = \frac{1663}{788.5} = 2.11 < 4 \]
(the ratio of the larger to the smaller sample variance is below 4, so equivariance is plausible)

Pooled Variance:
\[ s_{pooled}^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2} = \frac{(5 - 1)(788.5) + (3 - 1)(1663)}{5 + 3 - 2} = 1080 \]
The pooled variance is a weighted average of the group variances, using the degrees of freedom as the weights.
In terms of sums of squares, SSErr = 6480 and dfErr = 6, so $s_{pooled}^2 = 6480 / 6 = 1080$.

Standard Error:
\[ s.e. = \sqrt{ s_{pooled}^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right) } = \sqrt{ 1080 \left( \frac{1}{5} + \frac{1}{3} \right) } = 24 \]

p-value:
\[ p = 2\, P(\bar{Y}_1 - \bar{Y}_2 \ge 84) = 2\, P\!\left( T_6 \ge \frac{84 - 0}{24} \right) = 2\, P(T_6 \ge 3.5) \]
> 2 * (1 - pt(3.5, 6))
[1] 0.01282634
Reject H0 at α = .05: statistically significant, Hospital > Clinic.

R code:
> y1 = c(667, 653, 614, 612, 604)
> y2 = c(593, 525, 520)
>
> t.test(y1, y2, var.equal = TRUE)

        Two Sample t-test

data: y1 and y2
t = 3.5, df = 6, p-value = 0.01283
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  25.27412 142.72588
sample estimates:
mean of x mean of y
      630       546

Formal Conclusion: p-value < α = .05. Reject H0 at this level.

Interpretation: The samples provide evidence that the difference between mean costs is (moderately) statistically significant at the 5% level, with the hospital being higher than the clinic (by an average of $84).
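The hand calculation above can be retraced step by step in R. This is a minimal sketch using only base R; the variable names (`s2_pooled`, `se`, `tval`, `pval`) are illustrative choices, not part of the original slides.

```r
# Data from the example (hospital vs. clinic costs)
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)
n1 <- length(y1)
n2 <- length(y2)

# Pooled variance: average of the group variances, weighted by degrees of freedom
s2_pooled <- ((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / (n1 + n2 - 2)

# Standard error of the difference in means, then the t-statistic
se   <- sqrt(s2_pooled * (1 / n1 + 1 / n2))
tval <- (mean(y1) - mean(y2)) / se

# Two-sided p-value on n1 + n2 - 2 degrees of freedom
pval <- 2 * pt(abs(tval), df = n1 + n2 - 2, lower.tail = FALSE)

print(c(s2_pooled = s2_pooled, se = se, t = tval, p = pval))
```

The printed values should agree with the pooled variance, standard error, t-score and p-value reported by `t.test(y1, y2, var.equal = TRUE)`.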
Alternate method: ANOVA

Main Idea: Among several (k ≥ 2) independent, equivariant, normally-distributed “treatment groups” Y1, Y2, …, Yk, with means μ1, μ2, …, μk and a common standard deviation σ1 = σ2 = … = σk,

“Total Variability” = “Variability between groups” + “Variability within groups”

H0: μ1 = μ2 = … = μk
HA: At least one “treatment mean” μi is different from the others.

Same example (hospital vs. clinic costs, k = 2):
• ANOVA F-test (if equivariance holds):

Point estimates

“Group Means”:
\[ \bar{y}_1 = \frac{667 + 653 + 614 + 612 + 604}{5} = 630, \qquad \bar{y}_2 = \frac{593 + 525 + 520}{3} = 546 \]

“Grand Mean”:
\[ \bar{y} = \frac{667 + 653 + 614 + 612 + 604 + 593 + 525 + 520}{5 + 3} = \frac{5\,(630) + 3\,(546)}{5 + 3} = 598.50 \]
The grand mean is a weighted average of the group means, using the sample sizes as the weights.
How far is the “total” sample from the grand mean?

\[ SS_{Tot} = (667 - 598.5)^2 + (653 - 598.5)^2 + (614 - 598.5)^2 + (612 - 598.5)^2 + (604 - 598.5)^2 + (593 - 598.5)^2 + (525 - 598.5)^2 + (520 - 598.5)^2 = 19710 \]
dfTot = (5 + 3) - 1 = 7

“Total Variability” = “Variability between groups” + “Variability within groups”

“Variability between groups”: How can we measure this? Imagine zero variability within groups…
With zero variability within groups, every observation equals its own group mean (“The Clonemaster”):
• Data: Sample 1 = {630, 630, 630, 630, 630}; n1 = 5    Sample 2 = {546, 546, 546}; n2 = 3

The grand mean is still 598.50, and the only variability left is between groups:
\[ SS_{Trt} = 5\,(630 - 598.5)^2 + 3\,(546 - 598.5)^2 = 13230 \]
dfTrt = (2) - 1 = 1
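The “clone” construction can be sketched directly in R: replace each observation by its group mean and measure how far the clones fall from the grand mean. The object names `clones`, `grand` and `ss_trt` are illustrative assumptions.

```r
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)

# Replace every observation by its own group mean (the "clones")
clones <- c(rep(mean(y1), length(y1)), rep(mean(y2), length(y2)))

# The grand mean is unchanged by this replacement
grand <- mean(clones)

# With no within-group variability, all remaining variability is between groups
ss_trt <- sum((clones - grand)^2)

print(c(grand = grand, SSTrt = ss_trt))
```

Because each group contributes n_i identical terms, this sum is the same as weighting each squared distance between group mean and grand mean by the group size.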
Back to the actual data: Sample 1 = {667, 653, 614, 612, 604}; n1 = 5    Sample 2 = {593, 525, 520}; n2 = 3

\[ SS_{Tot} = 19710, \quad df_{Tot} = (5 + 3) - 1 = 7; \qquad SS_{Trt} = 13230, \quad df_{Trt} = (2) - 1 = 1 \]

“Variability within groups”: How far is each sample from its own group mean?

\[ SS_{Err} = (667 - 630)^2 + (653 - 630)^2 + (614 - 630)^2 + (612 - 630)^2 + (604 - 630)^2 + (593 - 546)^2 + (525 - 546)^2 + (520 - 546)^2 \]
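The three sums of squares can be checked numerically. The sketch below uses only base R, with illustrative object names, and confirms that the total variability splits into a between-group part and a within-group part.

```r
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)
y  <- c(y1, y2)
grand <- mean(y)

# Total: every observation against the grand mean
ss_tot <- sum((y - grand)^2)

# Between groups: each group mean against the grand mean, weighted by group size
ss_trt <- length(y1) * (mean(y1) - grand)^2 + length(y2) * (mean(y2) - grand)^2

# Within groups: every observation against its own group mean
ss_err <- sum((y1 - mean(y1))^2) + sum((y2 - mean(y2))^2)

print(c(SSTot = ss_tot, SSTrt = ss_trt, SSErr = ss_err))
```

The within-group sum of squares here is the quantity that, divided by its degrees of freedom, gives the pooled variance used in the T-test.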
Recall the T-test analysis: the two pieces of SSErr are exactly the numerators of the group variances ($s^2 = SS/df$).

\[ SS_1 = (667 - 630)^2 + \cdots + (604 - 630)^2 = 4\,(788.5) = 3154, \qquad SS_2 = (593 - 546)^2 + \cdots + (520 - 546)^2 = 2\,(1663) = 3326 \]
\[ SS_{Err} = SS_1 + SS_2 = 6480, \qquad df_{Err} = (5 - 1) + (3 - 1) = 6 \]
\[ s_{pooled}^2 = \frac{(5 - 1)(788.5) + (3 - 1)(1663)}{5 + 3 - 2} = \frac{SS_{Err}}{df_{Err}} = \frac{6480}{6} = 1080 \]
Putting the pieces together:

\[ SS_{Trt} = 5\,(630 - 598.5)^2 + 3\,(546 - 598.5)^2 = 13230, \qquad SS_{Err} = 4\,(788.5) + 2\,(1663) = 6480 \]
SSTot = SSTrt + SSErr:  19710 = 13230 + 6480
dfTot = dfTrt + dfErr:  7 = 1 + 6, where dfTot = (5 + 3) - 1, dfTrt = (2) - 1, dfErr = (5 + 3) - 2

MS = SS / df, and F = MSTrt / MSErr, which follows the F1,6 distribution under H0.

ANOVA Table

Source      df   SS      MS                      F-ratio   p-value
Treatment   1    13230   13230 (= s² between)    12.25     .01282634
Error       6    6480    1080  (= s² within)
Total       7    19710   -

The p-value is computed on F1,6 as 1 - pf(12.25, 1, 6); with an F-table, compare the F-ratio with the critical value at α.
Note: MSErr = 1080 is also s² pooled.

Thus, the treatment accounts for 13230 / 19710 = 67.1% of the total variability in the response Y.

R code:
# ANOVA FOR UNBALANCED DESIGN
> y1 = c(667, 653, 614, 612, 604)
> y2 = c(593, 525, 520)
>
> Data = data.frame(
+   Y = c(y1, y2),
+   X = factor(rep(c("y1", "y2"), times = c(length(y1), length(y2))))
+ )
>
> var.test(Y ~ X, data = Data) # EQUIVARIANCE?
        F test to compare two variances

data: Y by X
F = 0.4741, num df = 4, denom df = 2, p-value = 0.4738
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.01208057 5.04920249
sample estimates:
ratio of variances
         0.4741431

R code:
# ANOVA FOR UNBALANCED DESIGN
> out = aov(Y ~ X, data = Data)
> anova(out)
Analysis of Variance Table

Response: Y
          Df Sum Sq Mean Sq F value  Pr(>F)
X          1  13230   13230   12.25 0.01283 *
Residuals  6   6480    1080
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Note: Vis-à-vis T-test vs. F-test,
• The p-value is the same using either method (.01283), since the sample is unchanged!
• The square of the T_df-score (3.5) is equal to the F_{1, df}-score (12.25). (Recall that the square of the Z-score is equal to the χ²_1-score.)

Remarks on the assumptions (independent, equivariant, normally-distributed “treatment groups”):
• Equivariance can be tested via the very similar “two variances” F-test in 6.2.2 (but this is very sensitive to the normality assumption), or by other tests. If violated, one can extend the Welch Test for two means.
• Normality can be tested via the usual methods. If violated, use the nonparametric Kruskal-Wallis Test.
• Extensions of ANOVA exist for data in matched “blocks” designs, repeated measures, multiple factor levels within groups, etc.
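The fallbacks named in the remarks above are available in base R. The sketch below is one possible way to run them on the same data, assuming `oneway.test()` with `var.equal = FALSE` as the Welch-type extension and `kruskal.test()` as the Kruskal-Wallis test; the object names `welch` and `kw` are illustrative.

```r
y1 <- c(667, 653, 614, 612, 604)
y2 <- c(593, 525, 520)
Data <- data.frame(
  Y = c(y1, y2),
  X = factor(rep(c("y1", "y2"), times = c(length(y1), length(y2))))
)

# Welch-type one-way test: does not assume equal group variances
welch <- oneway.test(Y ~ X, data = Data, var.equal = FALSE)

# Kruskal-Wallis rank sum test: does not assume normality
kw <- kruskal.test(Y ~ X, data = Data)

print(welch)
print(kw)
```

With only k = 2 groups, the Welch-type F statistic is the square of the Welch two-sample t statistic, mirroring the T vs. F relationship noted for the equal-variance case.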
Y1 Y2 Yk k 1 1 H0: 1 k 2 2 = 2 = = k Alternate method ~  Main Idea: Among several (k  2) independent, equivariant, normally-distributed “treatment groups”… • How to identify significant group(s)? Pairwise testing, with correction (e.g., Bonferroni) for spurious significance. • Example: k = 5 groups result in 10 such tests, so let each α* = α / 10. Y1 Y2 Yk k 1 1 H0: 1 k 2 2 = 2 = = k
