Lab 11: Comparing k(> 2) Populations
Michael Akritas

Outline
- The ANOVA F-Test for Comparing k > 2 Populations
- Multiple Comparisons for One-Way ANOVA
- The Kruskal-Wallis Test for Comparing k > 2 Populations
- Tukey’s Multiple Comparisons on the Ranks
- Homework

The ANOVA F-Test for Comparing k > 2 Populations

- One-way ANOVA refers to the methodology for testing H0 : µ1 = · · · = µk vs the alternative that not all the means are equal.
- Copy the following data and paste it in C1-C4:
  http://www.stat.psu.edu/~mga/401/Data/anova.fe.data.txt
  The data are about total Fe for four types of iron formation (1 = carbonate, 2 = silicate, 3 = magnetite, 4 = hematite).
- Use the command sequence
  Stat>ANOVA>One-way (Unstacked)>Enter C1-C4 for Response, 95 for confidence level>OK
- If all the data are in one column, say C1, there must also be a second column, say C2, which indicates the group membership of each observation.
  The command sequence in this case is:
  Stat>ANOVA>One-way>Enter C1 for Response and C2 for Factor, 95 for confidence level>OK

The output produced is:

One-way ANOVA: C1, C2, C3, C4

Source  DF      SS     MS      F      P
Factor   3   509.1  169.7  10.85  0.000
Error   36   563.1   15.6
Total   39  1072.3

S = 3.955   R-Sq = 47.48%   R-Sq(adj) = 43.10%

- The ANOVA table decomposes the total sum of squares into a sum of squares due to the population differences (Factor) and a sum of squares due to the intrinsic error. Thus, 509.1 + 563.1 = 1072.2, which agrees with the Total of 1072.3 up to rounding.
- The DF for Factor and Error sum to the DF for Total. Thus, 3 + 36 = 39.
- The DF for Factor equals the number of factor levels (or populations) minus one.
- The DF for Total equals the total sample size minus one.
- MS = SS/DF.
- The F statistic is the ratio of the MS for Factor over the MS for Error. Thus, 10.85 ≈ 169.7/15.6 (Minitab uses the unrounded mean squares; the rounded values give 10.88).
- Because the p-value is small (it is reported as 0.000), the hypothesis of equality of the population means is rejected.
- The estimate of the standard deviation, which is assumed to be the same in all populations, is S = 3.955.
- R-Sq has the same interpretation as explained in the activity for regression, but it is not as popular in ANOVA.
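The relations among the table entries listed above can be checked by hand or with a few lines of code. The following is a minimal Python sketch (Python is not part of this lab, and the function and variable names are illustrative assumptions). It uses only the rounded sums of squares and degrees of freedom printed in the Minitab output, so it reproduces the output only up to rounding.

```python
import math

def anova_summary(ss_factor, ss_error, ss_total, df_factor, df_error):
    """Recompute MS, F, S and R-Sq from the SS and DF columns of an ANOVA table."""
    ms_factor = ss_factor / df_factor      # MS = SS / DF for Factor
    ms_error = ss_error / df_error         # MS = SS / DF for Error
    return {
        "ms_factor": ms_factor,
        "ms_error": ms_error,
        "f": ms_factor / ms_error,         # F = MS(Factor) / MS(Error)
        "s": math.sqrt(ms_error),          # pooled standard deviation estimate
        "r_sq": 100.0 * ss_factor / ss_total,  # R-Sq as a percentage
    }

# Values copied from the Minitab output for the Fe data.
result = anova_summary(509.1, 563.1, 1072.3, 3, 36)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these inputs, F, S and R-Sq round to the 10.85, 3.955 and 47.48% shown in the output.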

Multiple Comparisons for One-Way ANOVA

- When H0 is rejected, we are confident that at least one of the population means is different from the others.
- When k > 2, additional testing needs to be done to identify which means differ. This additional testing is called multiple comparisons.
- It involves performing all pairwise comparisons in such a way that the probability of committing one or more type I errors does not exceed the designated α.
- One way of doing multiple comparisons is to perform each pairwise comparison at an adjusted level of significance. The adjusted level equals the designated α divided by the total number of pairwise comparisons. This is called the Bonferroni method.
- Here we will demonstrate the Tukey method:
  Stat>ANOVA>One-way (Unstacked)>Enter C1-C4 for Response, 95 for confidence level>Click Comparisons, select Tukey’s, enter family error rate (5 for overall level of significance 0.05)>OK>OK
- The additional Minitab output is:

Tukey 95% Simultaneous CIs for All Pairwise Comparisons
Individual confidence level = 98.93%

C1 subtracted from:
      Lower  Center   Upper
C2   -6.155  -1.390   3.375
C3   -0.895   3.870   8.635
C4    2.995   7.760  12.525

- The above are simultaneous CIs for the contrasts µ2 − µ1, µ3 − µ1, and µ4 − µ1 (the mean of C1 is subtracted from the mean of each listed column).
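For comparison with the individual confidence level that Tukey's method reports, the Bonferroni adjustment described above can be worked out directly. This is a minimal Python sketch; the function name `bonferroni_level` is a hypothetical helper introduced only for illustration, not a Minitab or library routine.

```python
from itertools import combinations

def bonferroni_level(k, alpha=0.05):
    """Return (number of pairwise comparisons, adjusted per-comparison alpha)."""
    # All unordered pairs of the k populations, e.g. (1, 2), (1, 3), ...
    pairs = list(combinations(range(1, k + 1), 2))
    return len(pairs), alpha / len(pairs)

# k = 4 types of iron formation, overall level of significance 0.05.
m, alpha_adj = bonferroni_level(4, 0.05)
print(f"pairwise comparisons: {m}")
print(f"adjusted alpha per comparison: {alpha_adj:.4f}")
print(f"individual confidence level: {100 * (1 - alpha_adj):.2f}%")
```

For k = 4 there are 6 pairwise comparisons, so each is performed at level 0.05/6 ≈ 0.0083, an individual confidence level of about 99.17%. This is slightly more conservative than the 98.93% used by the Tukey intervals above.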
- If a CI does not contain 0, the two means are declared significantly different. Thus, µ1 is significantly different from µ4, but not significantly different from µ2 or from µ3.

C2 subtracted from:
      Lower  Center   Upper
C3    0.495   5.260  10.025
C4    4.385   9.150  13.915

- These are simultaneous CIs for µ3 − µ2 and µ4 − µ2. Neither of these CIs contains zero, and thus µ2 is significantly different from both µ3 and µ4.

C3 subtracted from:
      Lower  Center   Upper
C4   -0.875   3.890   8.655

- Since the CI contains zero, µ3 is not significantly different from µ4.

The Kruskal-Wallis Test for Comparing k > 2 Populations

- The ANOVA F test is valid only if the population variances are equal (homoscedasticity) and either the population distributions are normal or the sample sizes are large.
- Moreover, the F test is most powerful only when the k population distributions are normal and homoscedastic.
- The Kruskal-Wallis test is nearly as powerful as the F test under normality and homoscedasticity, but can be much more powerful than the F test when the population distributions are non-normal.
- The Kruskal-Wallis procedure consists of combining the data from the k populations and ranking the combined data set from smallest to largest. The ranks are then used to compute the Kruskal-Wallis statistic.
- Copy the following data and paste it in C5, C6, C7:
  http://www.stat.psu.edu/~mga/401/Data/k-w.cortisol.data.txt
- To use the Kruskal-Wallis procedure, the data must be stacked.
  Data>Stack>Columns, enter C5-C7 under Stack the Following Columns, click on Column of Current Worksheet and enter C8, enter C9 in Store Subscripts in>OK
- Then use the sequence of commands:
  Stat>Nonparametrics>Kruskal-Wallis, enter C4 for Response, and C5 for Factor>OK
- The commands and output shown here name C4 as the stacked data column and C5 as the subscripts column, with groups labeled C1, C2, C3. This corresponds to data pasted in C1-C3 and stacked into C4 and C5. If you stacked into C8 and C9 as described above, enter C8 for Response and C9 for Factor, and use any free column in place of C6 below.

The output is:

Kruskal-Wallis Test on C4

C5        N  Median  Ave Rank      Z
C1       10   305.5       6.9  -3.03
C2        6   460.0      15.0   1.55
C3        6   729.5      15.7   1.84
Overall  22              11.5

H = 9.23  DF = 2  P = 0.010

- The test statistic is H = 9.23, and it corresponds to a p-value of 0.010. Thus, H0 can be rejected at level α = 0.05.

Tukey’s Multiple Comparisons on the Ranks

- Data>Rank>Enter C4 in "Rank data in:", Enter C6 in "Store ranks in:">OK
- Stat>ANOVA>One-way>Enter C6 for Response, C5 for Factor, 95 for Confidence level>Click Comparisons, select Tukey’s, enter family error rate>OK>OK

Tukey 95% Simultaneous CIs for All Pairwise Comparisons
Individual confidence level = 98.00%

C5 = C1 subtracted from:
C5    Lower  Center   Upper
C2    1.401   8.100  14.799
C3    2.067   8.767  15.466
(Neither CI contains zero. Thus, µ1 is significantly different from both µ2 and µ3.)

C5 = C2 subtracted from:
C5    Lower  Center   Upper
C3   -6.823   0.667   8.157
(The CI contains zero. Thus, µ2 is not significantly different from µ3.)

Homework

- Perform the Kruskal-Wallis test, and Tukey’s multiple comparisons on the ranks, using the data:
  http://www.stat.psu.edu/~mga/401/Data/anova.fe.data.txt
  Compare the results (p-value for the test of equality, and the conclusions from the multiple comparisons on the ranks) with the analysis we did using the ANOVA approach.
- Perform the ANOVA test, and Tukey’s multiple comparisons, using the data:
  http://www.stat.psu.edu/~mga/401/Data/k-w.cortisol.data.txt
  Compare the results with the analysis we did using the Kruskal-Wallis approach. If they differ, which results do you trust most?
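When comparing results in the homework, it can help to sanity-check a Kruskal-Wallis output from its own summary table. The Python sketch below does this for the Kruskal-Wallis output shown earlier in the lab (N = 10, 6, 6; average ranks 6.9, 15.0, 15.7; H = 9.23; DF = 2). The function names are assumptions made for illustration. The p-value uses the large-sample chi-square approximation with k − 1 degrees of freedom, and the closed form exp(−H/2) is valid only when DF = 2 (that is, k = 3); it does not apply to the Fe data, where k = 4.

```python
import math

def h_from_average_ranks(sizes, avg_ranks):
    """Kruskal-Wallis H from group sizes and average ranks (no tie correction)."""
    n_total = sum(sizes)
    overall = (n_total + 1) / 2                 # overall average rank
    spread = sum(n * (r - overall) ** 2 for n, r in zip(sizes, avg_ranks))
    return 12.0 * spread / (n_total * (n_total + 1))

def chi2_upper_tail_df2(h):
    """Upper-tail probability of a chi-square variable with exactly 2 DF."""
    return math.exp(-h / 2.0)

# Group sizes and (rounded) average ranks copied from the Minitab output.
h_approx = h_from_average_ranks([10, 6, 6], [6.9, 15.0, 15.7])
p_value = chi2_upper_tail_df2(9.23)             # H as reported by Minitab

print(f"H from rounded average ranks: {h_approx:.2f}")
print(f"p-value for H = 9.23 with DF = 2: {p_value:.3f}")
```

The recomputed H is about 9.27 rather than 9.23 because the average ranks in the output are rounded to one decimal place; the p-value for H = 9.23 rounds to the reported 0.010.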