Lab 11: Comparing k(> 2) Populations
Michael Akritas

Outline
The ANOVA F-Test for Comparing k > 2 Populations
Multiple Comparisons for One-Way ANOVA
The Kruskal-Wallis Test for Comparing k > 2 Populations
Tukey’s Multiple Comparisons on the Ranks
Homework

The ANOVA F-Test for Comparing k > 2 Populations
One-way ANOVA is the methodology for testing $H_0: \mu_1 = \cdots = \mu_k$ against the alternative that not all $k$ population means are equal.
Copy the following data and paste them into columns C1-C4:
http://www.stat.psu.edu/~mga/401/Data/anova.fe.data.txt
The data are total Fe measurements for four types of iron formation
(1 = carbonate, 2 = silicate, 3 = magnetite, 4 = hematite).
Use the command sequence
Stat>ANOVA>One-way (Unstacked)>Enter C1-C4 for
Response, 95 for confidence level>OK.
If all the data are in one column, say C1, there must also be a
second column, say C2, which indicates the group membership
of each observation. The command sequence in this case is:
Stat>ANOVA>One-way>Enter C1 for Response and C2 for
Factor, 95 for confidence level>OK
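To make the two data layouts concrete, here is a small sketch (plain Python, not Minitab) of what stacking does: it turns one column per group into a response column plus a parallel column of group labels, which is the layout the stacked One-way dialog expects. The function name `stack_columns` and the toy numbers are hypothetical; they are not the Fe data.

```python
def stack_columns(columns):
    """Turn unstacked data (one list per group) into stacked form:
    a single response list plus a parallel list of group labels."""
    response = []
    factor = []
    for label, values in columns.items():  # dicts keep insertion order (Python 3.7+)
        for value in values:
            response.append(value)
            factor.append(label)  # group membership of this observation
    return response, factor


# Hypothetical toy data: three groups of unequal size.
unstacked = {"C1": [1, 2], "C2": [3, 4, 5], "C3": [6]}
response, factor = stack_columns(unstacked)
print(response)  # [1, 2, 3, 4, 5, 6]
print(factor)    # ['C1', 'C1', 'C2', 'C2', 'C2', 'C3']
```

The labels play the role of the second column (C2 in the stacked command sequence above).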
The output produced is:

One-way ANOVA: C1, C2, C3, C4

Source  DF      SS     MS      F      P
Factor   3   509.1  169.7  10.85  0.000
Error   36   563.1   15.6
Total   39  1072.3

S = 3.955   R-Sq = 47.48%   R-Sq(adj) = 43.10%
The ANOVA table decomposes the total sum of squares into a sum of squares due to differences among the population means (Factor) and a sum of squares due to the intrinsic, within-population variability (Error).
Thus, 509.1 + 563.1 = 1072.3, up to rounding (the displayed values actually add to 1072.2).
The DF for Factor and Error also sum to the DF for Total: 3 + 36 = 39.
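The decomposition can be reproduced from raw data. In the minimal sketch below, SS for Factor measures how far the group means lie from the overall mean, SS for Error how far the observations lie from their own group mean, and the two add up to SS for Total. The function name `anova_decomposition` and the toy groups are hypothetical (not the Fe data).

```python
def anova_decomposition(groups):
    """Return (SS, DF) pairs for Factor, Error and Total in one-way ANOVA."""
    values = [x for g in groups for x in g]
    n_total, k = len(values), len(groups)
    grand_mean = sum(values) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group variation: group means around the grand mean.
    ss_factor = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group (intrinsic error) variation: observations around their group mean.
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    # Total variation: observations around the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in values)

    return {
        "Factor": (ss_factor, k - 1),
        "Error": (ss_error, n_total - k),
        "Total": (ss_total, n_total - 1),
    }


table = anova_decomposition([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(table)  # {'Factor': (54.0, 2), 'Error': (6.0, 6), 'Total': (60.0, 8)}
```

With the toy data both identities hold exactly: 54 + 6 = 60 and 2 + 6 = 8.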
- The DF for Factor equals the number of factor levels (populations) minus one: 4 - 1 = 3.
- The DF for Total equals the total sample size minus one, so here there are 39 + 1 = 40 observations.
- MS = SS/DF; for example, 169.7 = 509.1/3.
- The F statistic is the ratio of the MS for Factor to the MS for Error. Thus, 10.85 = 169.7/15.6 (up to rounding).
- Because the p-value is small (0.000 means it is below 0.0005), the hypothesis of equal population means is rejected.
- The estimate of the standard deviation, which is assumed to be the same in all populations, is S = 3.955, the square root of the MS for Error.
- R-Sq has the same meaning as explained in the activity for regression (here 509.1/1072.3 = 47.48% of the total variation is due to Factor), but it is not as commonly reported in ANOVA.
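The relationships listed above can be checked directly from the printed table. This minimal sketch uses only the rounded SS and DF values shown in the output, so it matches Minitab only to roughly the displayed precision (for instance, R-Sq(adj) comes out near 43.11% rather than 43.10%). The p-value would need the F distribution with 3 and 36 DF (for example `scipy.stats.f.sf(f_stat, 3, 36)` if SciPy is available), so it is left out of this standard-library sketch.

```python
import math

# Sums of squares and DF as printed in the Minitab ANOVA table.
ss_factor, df_factor = 509.1, 3
ss_error, df_error = 563.1, 36
ss_total, df_total = 1072.3, 39

ms_factor = ss_factor / df_factor          # MS = SS/DF
ms_error = ss_error / df_error
f_stat = ms_factor / ms_error              # F = MS(Factor) / MS(Error)
s = math.sqrt(ms_error)                    # common standard deviation estimate
r_sq = ss_factor / ss_total                # share of total variation due to Factor
r_sq_adj = 1 - ms_error / (ss_total / df_total)

print(f"MS Factor = {ms_factor:.1f}, MS Error = {ms_error:.1f}")
print(f"F = {f_stat:.2f}, S = {s:.3f}")
print(f"R-Sq = {100 * r_sq:.2f}%, R-Sq(adj) = {100 * r_sq_adj:.2f}%")
```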

Multiple Comparisons for One-Way ANOVA
- When $H_0$ is rejected, we are confident that at least one of the population means differs from the others.
- When k > 2, additional testing is needed to identify which means differ. This additional testing is called multiple comparisons.
- It involves performing all pairwise comparisons in such a way that the probability of committing one or more type I errors (the family-wise error rate) does not exceed the designated $\alpha$.
- One way of doing multiple comparisons is to perform each pairwise comparison at an adjusted level of significance, equal to the designated $\alpha$ divided by the total number of pairwise comparisons, k(k - 1)/2. This is called the Bonferroni method.
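The Bonferroni adjustment is simple arithmetic: with k means there are k(k - 1)/2 pairwise comparisons, and each is tested at $\alpha$ divided by that number. A minimal sketch (the function name `bonferroni_level` is ours); for the four iron formations it gives 6 comparisons and a per-comparison level of 0.05/6, about 0.0083.

```python
from math import comb


def bonferroni_level(alpha, k):
    """Per-comparison significance level for all pairwise comparisons among k means."""
    n_pairs = comb(k, 2)  # k(k - 1)/2 pairwise comparisons
    return alpha / n_pairs


# Four populations, overall (family-wise) level 0.05.
level = bonferroni_level(0.05, 4)
print(comb(4, 2), round(level, 5))  # 6 0.00833
```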
Here we demonstrate the Tukey method:
Stat>ANOVA>One-way (Unstacked)>Enter C1-C4 for Response, 95 for confidence level>Click Comparisons, select Tukey’s, enter family error rate (5 for an overall level of significance of 0.05)>OK>OK
The additional Minitab output is:

Tukey 95% Simultaneous CIs for All Pairwise Comparisons
Individual confidence level = 98.93%

C1 subtracted from:
     Lower  Center   Upper
C2  -6.155  -1.390   3.375
C3  -0.895   3.870   8.635
C4   2.995   7.760  12.525

The above are simultaneous CIs for the contrasts $\mu_2 - \mu_1$, $\mu_3 - \mu_1$, and $\mu_4 - \mu_1$ (C1 is subtracted from C2, C3, and C4).
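For readers who want to see where such intervals come from, here is a minimal sketch of the Tukey (Tukey-Kramer) interval for $\mu_j - \mu_i$: the difference of sample means plus or minus $(q/\sqrt{2})\,S\sqrt{1/n_i + 1/n_j}$, where $S^2$ is the MS for Error and $q$ is the upper-$\alpha$ critical value of the studentized range distribution for k means and the error DF. The function name `tukey_interval` and the inputs are hypothetical, and `q` must be supplied (from a studentized range table or, assuming a recent SciPy, `scipy.stats.studentized_range.ppf(0.95, k, df)`); the sketch does not compute it.

```python
import math


def tukey_interval(mean_i, mean_j, n_i, n_j, mse, q):
    """Simultaneous CI for mu_j - mu_i ("i subtracted from j", as in Minitab).

    q is the studentized range critical value for k means and the error DF;
    it is an input here, not computed.
    """
    center = mean_j - mean_i
    half_width = q / math.sqrt(2) * math.sqrt(mse * (1 / n_i + 1 / n_j))
    return center - half_width, center, center + half_width


# Hypothetical inputs for illustration only.
lower, center, upper = tukey_interval(10.0, 14.0, n_i=10, n_j=10, mse=15.6, q=3.8)
print(round(lower, 3), center, round(upper, 3))
```

As a rough consistency check, and assuming 10 observations per group (40 observations in total, and all six intervals in the output share the half-width 4.765), solving 4.765 = q · 3.955 · sqrt(1/10) gives the implied value q ≈ 3.81.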
If a CI does not contain 0, the two means are declared significantly different. Thus, $\mu_1$ is significantly different from $\mu_4$, but not significantly different from $\mu_2$ or from $\mu_3$.

C2 subtracted from:
     Lower  Center   Upper
C3   0.495   5.260  10.025
C4   4.385   9.150  13.915

These are simultaneous CIs for $\mu_3 - \mu_2$ and $\mu_4 - \mu_2$. Neither CI contains zero, so $\mu_2$ is significantly different from both $\mu_3$ and $\mu_4$.

C3 subtracted from:
     Lower  Center   Upper
C4  -0.875   3.890   8.655

Since this CI contains zero, $\mu_3$ is not significantly different from $\mu_4$.
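The decision rule (declare a pair different when its interval excludes 0) is mechanical, so it can be applied to all six intervals at once. A minimal sketch with the intervals copied from the output above; the dictionary keyed by (i, j) for the contrast $\mu_j - \mu_i$ and the function name `significant_pairs` are our own choices.

```python
# Tukey intervals from the Minitab output, keyed by (i, j) for mu_j - mu_i:
# value is (lower, upper).
tukey_cis = {
    (1, 2): (-6.155, 3.375),
    (1, 3): (-0.895, 8.635),
    (1, 4): (2.995, 12.525),
    (2, 3): (0.495, 10.025),
    (2, 4): (4.385, 13.915),
    (3, 4): (-0.875, 8.655),
}


def significant_pairs(cis):
    """Return the pairs whose simultaneous CI lies entirely above or below 0."""
    return sorted(pair for pair, (lo, hi) in cis.items() if lo > 0 or hi < 0)


print(significant_pairs(tukey_cis))  # [(1, 4), (2, 3), (2, 4)]
```

This reproduces the conclusions above: only the pairs (1, 4), (2, 3), and (2, 4) differ significantly.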

The Kruskal-Wallis Test for Comparing k > 2 Populations
- The ANOVA F test is valid only if the population variances are equal (homoscedastic) and either the population distributions are normal or the sample sizes are large.
- Moreover, the F test is most powerful only when the k population distributions are normal and homoscedastic.
- The Kruskal-Wallis test is nearly as powerful as the F test under normality and homoscedasticity, but it can be much more powerful than the F test when the population distributions are non-normal.
- The Kruskal-Wallis procedure consists of combining the data from the k populations and ranking the combined data set from smallest to largest. The ranks are then used to compute the Kruskal-Wallis statistic.
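In its usual textbook form, with $N$ observations in total and $n_i$ observations and rank sum $R_i$ in group $i$, the statistic is $H = \frac{12}{N(N+1)} \sum_i R_i^2/n_i - 3(N+1)$; when there are ties it is divided by $1 - \sum (t^3 - t)/(N^3 - N)$, where $t$ runs over the sizes of the groups of tied values. Below is a minimal sketch of the combine, rank, and compute steps; the function names and toy data are hypothetical, and tied values get the average of the positions they occupy.

```python
from collections import Counter


def average_ranks(values):
    """Rank from smallest to largest; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (H, H adjusted for ties) for a list of samples."""
    pooled = [x for g in groups for x in g]  # combine the k samples
    n = len(pooled)
    ranks = average_ranks(pooled)            # rank the combined data set

    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # R_i for this group
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: t is the size of each set of tied values.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h, h / (1 - ties / (n ** 3 - n))


h, h_adj = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 4), round(h_adj, 4))  # 7.2 7.2
```

With no ties, as in the toy data, the correction factor is 1 and both values agree.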
- Copy the following data and paste them into columns C1, C2, C3 of a new worksheet (C1-C4 of the current worksheet hold the Fe data):
http://www.stat.psu.edu/~mga/401/Data/k-w.cortisol.data.txt
- To use the Kruskal-Wallis procedure, the data must be stacked:
Data>Stack>Columns, enter C1-C3 under Stack the Following Columns, click on Column of Current Worksheet and enter C4, enter C5 in Store Subscripts in>OK
- Then use the sequence of commands:
Stat>Nonparametrics>Kruskal-Wallis, enter C4 for Response and C5 for Factor>OK
The output is:

Kruskal-Wallis Test on C4

C5        N  Median  Ave Rank      Z
C1       10   305.5       6.9  -3.03
C2        6   460.0      15.0   1.55
C3        6   729.5      15.7   1.84
Overall  22              11.5

H = 9.23  DF = 2  P = 0.010

The test statistic is H = 9.23, and it corresponds to a p-value of 0.010. Thus, $H_0$ can be rejected at level $\alpha = 0.05$.
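Minitab's p-value for H comes from the chi-square approximation with k - 1 DF (the DF = 2 shown in the output). For DF = 2 the upper tail is simply $e^{-H/2}$, and $e^{-4.615} \approx 0.0099$, which rounds to 0.010. Below is a standard-library sketch for any integer DF, using the recurrence $Q(k+2) = Q(k) + (x/2)^{k/2} e^{-x/2} / \Gamma(k/2 + 1)$ for the chi-square upper-tail probability $Q$. The function name `chi2_sf` is ours; with SciPy installed, `scipy.stats.chi2.sf` does the same job.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a chi-square random variable X with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    # Closed forms for the two starting cases: df = 2 and df = 1.
    if df % 2 == 0:
        k, sf = 2, math.exp(-half)
    else:
        k, sf = 1, math.erfc(math.sqrt(half))
    # Step up two DF at a time: Q(k + 2) = Q(k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1).
    while k < df:
        sf += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return sf


# H and DF as reported in the Minitab output above.
p_value = chi2_sf(9.23, df=2)
print(round(p_value, 3))  # 0.01
```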

Tukey’s Multiple Comparisons on the Ranks
- Data>Rank>Enter C4 in "Rank data in:", Enter C6 in "Store ranks in:">OK
- Stat>ANOVA>One-way>Enter C6 for Response, C5 for Factor, 95 for Confidence level>Click Comparisons, select Tukey’s, enter family error rate>OK>OK

Tukey 95% Simultaneous CIs for All Pairwise Comparisons
Individual confidence level = 98.00%

C5 = C1 subtracted from:
C5   Lower  Center   Upper
C2   1.401   8.100  14.799
C3   2.067   8.767  15.466

Neither CI contains zero; thus, $\mu_1 \neq \mu_2$ and $\mu_1 \neq \mu_3$.

C5 = C2 subtracted from:
C5   Lower  Center   Upper
C3  -6.823   0.667   8.157

This CI contains zero; thus, $\mu_2$ is not significantly different from $\mu_3$.
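Since Tukey's comparisons on the ranks are the ordinary one-way ANOVA comparisons applied to the ranked column, each Center value is a difference of group mean ranks (for example, 15.0 - 6.9 = 8.1 from the Ave Rank column of the Kruskal-Wallis output, matching the Center 8.100 up to rounding). Below is a minimal sketch of the rank step on stacked data (a response column plus a factor column, like C4 and C5). The function names and toy data are hypothetical.

```python
from collections import defaultdict


def average_ranks(values):
    """Rank from smallest to largest; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # positions i..j hold tied values
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mean_ranks(response, factor):
    """Rank the stacked response column (like Data>Rank), then average ranks per level."""
    by_level = defaultdict(list)
    for level, r in zip(factor, average_ranks(response)):
        by_level[level].append(r)
    return {level: sum(rs) / len(rs) for level, rs in by_level.items()}


# Hypothetical stacked data: response column and factor (subscript) column.
response = [5, 1, 8, 8, 12, 15]
factor = ["C1", "C1", "C2", "C2", "C3", "C3"]
means = mean_ranks(response, factor)
print(means)  # {'C1': 1.5, 'C2': 3.5, 'C3': 5.5}
# The Center for "C1 subtracted from" row C2 is a difference of mean ranks.
print(means["C2"] - means["C1"])  # 2.0
```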

Homework
- Perform the Kruskal-Wallis test and Tukey’s multiple comparisons on the ranks, using the data:
http://www.stat.psu.edu/~mga/401/Data/anova.fe.data.txt
Compare the results (the p-value for the test of equality, and the conclusions from the multiple comparisons on the ranks) with the analysis we did using the ANOVA approach.
- Perform the ANOVA F test and Tukey’s multiple comparisons, using the data:
http://www.stat.psu.edu/~mga/401/Data/k-w.cortisol.data.txt
Compare the results with the analysis we did using the Kruskal-Wallis approach. If they differ, which results do you trust most?