U4.5-MultipleComparisonsExample

Multiple Comparisons: Example
Study Objective: Test the response of six varieties of wheat to a particular race of stem rust.
Treatment: Wheat variety
Levels: A (i = 1), B (i = 2), C (i = 3), D (i = 4), E (i = 5), F (i = 6)
Experimental Unit: Pot of well-mixed potting soil.
Replication: Four pots per treatment, four plants per pot.
Randomization: Varieties randomized to 24 pots (CRD).
Response: Yield $Y_{ij}$ (in grams) of wheat variety i at maturity in pot j.
Implementation Notes: Six seeds of a variety are planted in a pot. Once plants emerge, the four most vigorous are retained and inoculated with stem rust.
STA 6166 - MCP
Statistics and AOV Table
Rank   Variety   Mean Yield (g)
5      A         50.3
4      B         69.0
6      C         24.0
2      D         94.0
3      E         75.0
1      F         95.3
n1 = n2 = n3 = n4 = n5 = n6 = n = 4 (rank 1 = largest mean yield)

ANOVA Table
Source    df   Mean Square   F
Variety    5   2976.44       24.80**
Error     18   120.00
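As a quick arithmetic check on the ANOVA table, the F ratio is the Variety mean square divided by MSE. The sketch below uses only the slide's numbers, including the quoted critical value F(5, 18, 0.05) = 2.77. It also recovers the sums of squares as df × MS, which the table does not list.

```python
# Values copied from the ANOVA table above (alpha = 0.05).
df_variety, ms_variety = 5, 2976.44
df_error, ms_error = 18, 120.00   # ms_error is the MSE
f_crit = 2.77                     # F(5, 18, 0.05), as quoted on the slides

f_stat = ms_variety / ms_error
ss_variety = df_variety * ms_variety   # sum of squares = df x mean square
ss_error = df_error * ms_error

print(f"F = {f_stat:.2f}")   # F = 24.80
print(f"SS(Variety) = {ss_variety:.2f}, SSE = {ss_error:.2f}")
print("Reject H0" if f_stat > f_crit else "Fail to reject H0")
```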
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = \mu_6$
$H_A: \mu_i \neq \mu_{i'}$ for some $i \neq i'$
The overall F test indicates that we reject H0 in favor of HA. But which means differ from which other means?
Consider all possible comparisons between varieties: $\bar y_i - \bar y_j$.
First sort the treatment levels from the level with the smallest sample mean down to the level with the largest sample mean. Then, in a table (matrix) format, compute the differences for all of the t(t − 1)/2 possible pairs of level means:
$$\frac{t(t-1)}{2} = \frac{6(5)}{2} = 15$$
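The sort-then-difference step can be sketched in a few lines of Python. This is an illustrative sketch and not part of the original notes. It ranks the six sample means from the table above and forms all t(t − 1)/2 = 15 differences, always taking the larger mean minus the smaller one.

```python
from itertools import combinations

# Sample mean yields (g) from the table of statistics above.
means = {"A": 50.3, "B": 69.0, "C": 24.0, "D": 94.0, "E": 75.0, "F": 95.3}

# Rank levels from the smallest to the largest sample mean.
ranked = sorted(means, key=means.get)

# All t(t-1)/2 pairs; combinations() keeps ranked order, so `hi` has the larger mean.
diffs = {
    (hi, lo): round(means[hi] - means[lo], 1)
    for lo, hi in combinations(ranked, 2)
}

print(ranked)       # ['C', 'A', 'B', 'E', 'D', 'F']
print(len(diffs))   # 15
for (hi, lo), d in diffs.items():
    print(f"{hi} - {lo} = {d:.1f}")
```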
Differences for all of the t(t − 1)/2 = 15 possible pairs of level means
Each entry is the row mean minus the column mean. For example, row A, column C gives $\bar y_A - \bar y_C = 26.3$.
            C      A      B      E      D      F
         24.0   50.3   69.0   75.0   94.0   95.3
C  24.0    --
A  50.3  26.3     --
B  69.0  45.0   18.7     --
E  75.0  51.0   24.7    6.0     --
D  94.0  70.0   43.7   25.0   19.0     --
F  95.3  71.3   45.0   26.3   20.3    1.3     --
Largest difference: $\bar y_F - \bar y_C = 71.3$. Smallest difference: $\bar y_F - \bar y_D = 1.3$.
Question: How big does a difference have to be before we consider it “significantly big”?
Fisher’s Protected LSD
$F = 24.8 > F_{5,18,0.05} = 2.77$, so F is significant and the LSD comparisons are protected.
$$LSD = t_{18,\alpha/2}\sqrt{MSE\left(\frac{2}{n}\right)} = t_{18,0.025}\sqrt{120\left(\frac{2}{4}\right)} = 2.101 \times 7.746 = 16.27$$
            C      A      B      E      D      F
         24.0   50.3   69.0   75.0   94.0   95.3
C  24.0    --
A  50.3  26.3‡    --
B  69.0  45.0‡  18.7‡    --
E  75.0  51.0‡  24.7‡   6.0     --
D  94.0  70.0‡  43.7‡  25.0‡  19.0‡    --
F  95.3  71.3‡  45.0‡  26.3‡  20.3‡   1.3     --
‡ The two treatment level means are statistically different at the α = 0.05 level.
Alternatively, indicate the grouping of means with letters. Means sharing a letter are not significantly different:
a     C  24.0
b     A  50.3
c     B  69.0
c     E  75.0
d     D  94.0
d     F  95.3
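Here is a minimal Python sketch of Fisher’s protected LSD for this example. It assumes equal replication (n = 4) and uses the t value 2.101 read from a t table, as on the slide, so no statistics library is needed. The protection step is the check on the overall F test. Only if that test passes are the 15 pairwise differences compared with the single LSD.

```python
import math
from itertools import combinations

means = {"A": 50.3, "B": 69.0, "C": 24.0, "D": 94.0, "E": 75.0, "F": 95.3}
mse, n = 120.0, 4              # MSE (18 df) and replicates per variety
t_crit = 2.101                 # t(18, 0.025) from a t table, as on the slide
f_stat, f_crit = 24.80, 2.77   # overall F and F(5, 18, 0.05)

# Protection: pairwise LSD tests are carried out only after a significant F test.
if f_stat <= f_crit:
    raise SystemExit("Overall F test not significant: no pairwise comparisons.")

lsd = t_crit * math.sqrt(mse * 2 / n)
ranked = sorted(means, key=means.get)
significant = [
    (hi, lo) for lo, hi in combinations(ranked, 2)
    if means[hi] - means[lo] > lsd
]

print(f"LSD = {lsd:.2f}")                       # LSD = 16.27
print(len(significant), "of 15 pairs differ")   # 13 of 15 pairs differ
```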
Tukey’s W (Honestly Significant Difference)
Not protected, hence no preliminary F test is required.
$$W = q_{\alpha}(t, df_{error})\sqrt{\frac{MSE}{n}} = q_{0.05}(6, 18)\sqrt{\frac{120}{4}} = 4.49 \times 5.477 = 24.59$$
where $q_{0.05}(6, 18) = 4.49$ is read from Table 10.
            C      A      B      E      D      F
         24.0   50.3   69.0   75.0   94.0   95.3
C  24.0    --
A  50.3  26.3‡    --
B  69.0  45.0‡  18.7     --
E  75.0  51.0‡  24.7‡   6.0     --
D  94.0  70.0‡  43.7‡  25.0‡  19.0     --
F  95.3  71.3‡  45.0‡  26.3‡  20.3    1.3     --
‡ The two treatment level means are statistically different at the α = 0.05 level.
a     C  24.0
b     A  50.3
bc    B  69.0
cd    E  75.0
d     D  94.0
d     F  95.3
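The same pairwise screen with Tukey’s single critical difference W is sketched below in Python. The studentized range value q_0.05(6, 18) = 4.49 is taken from the slide (Table 10) rather than computed. Unlike SNK and Duncan, one W is used for every pair, however far apart the two means sit in the ranking.

```python
import math
from itertools import combinations

means = {"A": 50.3, "B": 69.0, "C": 24.0, "D": 94.0, "E": 75.0, "F": 95.3}
mse, n = 120.0, 4
q_crit = 4.49   # q_0.05(6, 18) from Table 10, as on the slide

# One critical difference for every pair, whatever their distance in the ranking.
w = q_crit * math.sqrt(mse / n)
ranked = sorted(means, key=means.get)
significant = [
    (hi, lo) for lo, hi in combinations(ranked, 2)
    if means[hi] - means[lo] > w
]

print(f"W = {w:.2f}")                           # W = 24.59
print(len(significant), "of 15 pairs differ")   # 10 of 15 pairs differ
```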
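Student-Newman-Keuls Procedure (SNK)
Not protected, hence no preliminary F test is required.
$$W_r = q_{\alpha}(r, df_{error})\sqrt{\frac{MSE}{n}} = q_{0.05}(r, 18)\sqrt{30}$$
Here r is the number of ordered means spanned by the pair being compared: r = 2 for neighbors, r = 3 for means with one between, r = 4 for means with two between, and so on.
Values of q from Table 10 (row: error df = 18; α = 0.05; column: r):
r               2       3       4       5       6
q(r, df_error)  2.97    3.61    4.00    4.28    4.49
W_r             16.27   19.77   21.91   23.44   24.59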
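SNK
Critical ranges (α = 0.05, error df = nT − t = 18):
r             2       3       4       5       6
q(r, nT−t)    2.97    3.61    4.00    4.28    4.49
W_r           16.27   19.77   21.91   23.44   24.59
Each difference is compared with the W_r for the number of ordered means r that the pair spans. For example, $\bar y_D - \bar y_E = 19.0$ spans r = 2 and exceeds $W_2 = 16.27$, and $\bar y_F - \bar y_E = 20.3$ spans r = 3 and exceeds $W_3 = 19.77$.
            C      A      B      E      D      F
         24.0   50.3   69.0   75.0   94.0   95.3
C  24.0    --
A  50.3  26.3‡    --
B  69.0  45.0‡  18.7‡    --
E  75.0  51.0‡  24.7‡   6.0     --
D  94.0  70.0‡  43.7‡  25.0‡  19.0‡    --
F  95.3  71.3‡  45.0‡  26.3‡  20.3‡   1.3     --
‡ The two treatment level means are statistically different at the α = 0.05 level.
a     C  24.0
b     A  50.3
c     B  69.0
c     E  75.0
d     D  94.0
d     F  95.3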
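SNK is usually stated as a step-down range test. Ranges spanning the most means are tested first, and a pair is not declared different if it lies inside a range already found nonsignificant. In this example no such nesting occurs, so the step-down rule does not change the table above. The Python sketch below uses the q values listed for Table 10. The helper name `snk_test` is hypothetical.

```python
import math

means = {"A": 50.3, "B": 69.0, "C": 24.0, "D": 94.0, "E": 75.0, "F": 95.3}
mse, n = 120.0, 4
# q_0.05(r, 18) for r = 2..6, as listed from Table 10 on the slide.
q_table = {2: 2.97, 3: 3.61, 4: 4.00, 5: 4.28, 6: 4.49}


def snk_test(means, q_table, mse, n):
    """Hypothetical helper: step-down SNK; returns (W_r by r, significant pairs)."""
    w = {r: q * math.sqrt(mse / n) for r, q in q_table.items()}
    ranked = sorted(means, key=means.get)
    k = len(ranked)
    nonsig = []        # (i, j) index ranges found non-significant
    significant = []
    for r in range(k, 1, -1):            # widest ranges first
        for i in range(k - r + 1):
            j = i + r - 1                # the pair (i, j) spans r ordered means
            if any(a <= i and j <= b for a, b in nonsig):
                continue                 # inside a non-significant range
            if means[ranked[j]] - means[ranked[i]] > w[r]:
                significant.append((ranked[j], ranked[i]))
            else:
                nonsig.append((i, j))
    return w, significant


w, significant = snk_test(means, q_table, mse, n)
print({r: round(v, 2) for r, v in w.items()})
# {2: 16.27, 3: 19.77, 4: 21.91, 5: 23.44, 6: 24.59}
print(len(significant), "of 15 pairs differ")   # 13 of 15 pairs differ
```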
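Duncan’s New Multiple Range Test (Passé)
Not protected, hence no preliminary F test is required.
$$W_r = q'_{\alpha}(r, df_{error})\sqrt{\frac{MSE}{n}} = q'_{0.05}(r, 18)\sqrt{30}$$
As for SNK, r is the number of ordered means spanned by the pair: r = 2 for neighbors, r = 3 for one between, r = 4 for two between, and so on.
Values of q' from Table 11 (next pages; row: error df = 18; α = 0.05; column: r):
r                2       3       4       5       6
q'(r, df_error)  2.97    3.12    3.21    3.27    3.32
W_r              16.27   17.09   17.58   17.91   18.18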
Duncan’s Test Critical Values (Table 11) [table image not reproduced]
Duncan’s MRT
Critical ranges (α = 0.05, error df = nT − t = 18):
r              2       3       4       5       6
q'(r, nT−t)    2.97    3.12    3.21    3.27    3.32
W_r            16.27   17.09   17.58   17.91   18.18
            C      A      B      E      D      F
         24.0   50.3   69.0   75.0   94.0   95.3
C  24.0    --
A  50.3  26.3‡    --
B  69.0  45.0‡  18.7‡    --
E  75.0  51.0‡  24.7‡   6.0     --
D  94.0  70.0‡  43.7‡  25.0‡  19.0‡    --
F  95.3  71.3‡  45.0‡  26.3‡  20.3‡   1.3     --
‡ The two treatment level means are statistically different at the α = 0.05 level.
a     C  24.0
b     A  50.3
c     B  69.0
c     E  75.0
d     D  94.0
d     F  95.3
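Duncan’s test follows the same step-down range logic as SNK: no pair inside a nonsignificant range may be declared different. Only the table of critical values changes, with q' from Table 11 replacing q from Table 10. The Python sketch below runs one hypothetical helper, `multiple_range_test`, with both tables. This makes it easy to see that Duncan’s W_r never exceeds SNK’s here, which is why Duncan’s test is the more liberal of the two.

```python
import math

means = {"A": 50.3, "B": 69.0, "C": 24.0, "D": 94.0, "E": 75.0, "F": 95.3}
mse, n = 120.0, 4
# Critical values for alpha = 0.05, error df = 18, r = 2..6, as listed on the slides.
q_duncan = {2: 2.97, 3: 3.12, 4: 3.21, 5: 3.27, 6: 3.32}   # q' (Table 11)
q_snk = {2: 2.97, 3: 3.61, 4: 4.00, 5: 4.28, 6: 4.49}      # q (Table 10)


def multiple_range_test(means, q_table, mse, n):
    """Hypothetical helper: step-down range test; returns (W_r by r, significant pairs)."""
    w = {r: q * math.sqrt(mse / n) for r, q in q_table.items()}
    ranked = sorted(means, key=means.get)
    k = len(ranked)
    nonsig, significant = [], []
    for r in range(k, 1, -1):                 # widest ranges first
        for i in range(k - r + 1):
            j = i + r - 1
            # No pair inside a non-significant range may be declared different.
            if any(a <= i and j <= b for a, b in nonsig):
                continue
            if means[ranked[j]] - means[ranked[i]] > w[r]:
                significant.append((ranked[j], ranked[i]))
            else:
                nonsig.append((i, j))
    return w, significant


w_d, sig_d = multiple_range_test(means, q_duncan, mse, n)
w_s, _ = multiple_range_test(means, q_snk, mse, n)
for r in sorted(w_d):
    print(f"r = {r}: Duncan W_r = {w_d[r]:.2f}, SNK W_r = {w_s[r]:.2f}")
print(len(sig_d), "of 15 pairs differ")   # 13 of 15 pairs differ
```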
Scheffé’s S Method
$F = 24.8 > F_{5,18,0.05} = 2.77$, so F is significant.
For comparing $\mu_1$ with $\mu_2$:
$$l = (1)\mu_1 + (-1)\mu_2 + (0)\mu_3 + (0)\mu_4 + (0)\mu_5 + (0)\mu_6$$
$$\hat l = (1)\bar y_1 + (-1)\bar y_2 + (0)\bar y_3 + (0)\bar y_4 + (0)\bar y_5 + (0)\bar y_6$$
$$\hat V(\hat l) = MSE\sum_{i=1}^{6}\frac{a_i^2}{n_i} = MSE\left(\frac{(1)^2}{n_1} + \frac{(-1)^2}{n_2} + \frac{(0)^2}{n_3} + \frac{(0)^2}{n_4} + \frac{(0)^2}{n_5} + \frac{(0)^2}{n_6}\right) = MSE\left(\frac{2}{n}\right)$$
where $a_i$ is the coefficient of $\mu_i$ in $l$.
Reject $H_0: l = 0$ at $\alpha = 0.05$ if $|\hat l| > S$, where
$$S = \sqrt{\hat V(\hat l)}\sqrt{(t-1)F_{t-1,\,n_T-t,\,\alpha}} = \sqrt{120\left(\frac{2}{4}\right)}\sqrt{5F_{5,18,0.05}} = \sqrt{60(5)(2.77)} = 28.83$$
Since each treatment is replicated the same number of times, S is the same for comparing any pair of treatment means.
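The Scheffé computation works for any contrast, not only pairs. The Python sketch below takes a coefficient vector a_1, ..., a_6 and returns the estimate and S, using the slide’s MSE, n, and F(5, 18, 0.05) = 2.77. The function name `scheffe_test` is hypothetical. So is the second contrast (A, B, C against D, E, F), which is included only to show that S changes with $\sum a_i^2/n_i$.

```python
import math

# Sample means in the slide's index order: A (i=1), B (i=2), ..., F (i=6).
means = [50.3, 69.0, 24.0, 94.0, 75.0, 95.3]
mse, n, t = 120.0, 4, 6
f_crit = 2.77   # F(t-1, nT-t, 0.05) = F(5, 18, 0.05), as quoted on the slide


def scheffe_test(coeffs):
    """Hypothetical helper: (l_hat, S) for l = sum(a_i * mu_i) with equal n_i = n."""
    if abs(sum(coeffs)) > 1e-9:
        raise ValueError("contrast coefficients must sum to zero")
    l_hat = sum(a * y for a, y in zip(coeffs, means))
    var_l_hat = mse * sum(a * a / n for a in coeffs)   # MSE * sum(a_i^2 / n_i)
    s = math.sqrt(var_l_hat) * math.sqrt((t - 1) * f_crit)
    return l_hat, s


# mu_1 - mu_2 (A versus B), the contrast worked on the slide
l_hat, s = scheffe_test([1, -1, 0, 0, 0, 0])
print(f"l_hat = {l_hat:.1f}, S = {s:.2f}, reject H0: {abs(l_hat) > s}")
# l_hat = -18.7, S = 28.83, reject H0: False

# Hypothetical contrast: mean of A, B, C versus mean of D, E, F
third = 1 / 3
l2, s2 = scheffe_test([third, third, third, -third, -third, -third])
print(f"l_hat = {l2:.2f}, S = {s2:.2f}, reject H0: {abs(l2) > s2}")
```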
Scheffé’s S Method
Any difference larger than S = 28.83 is significant.
            C      A      B      E      D      F
         24.0   50.3   69.0   75.0   94.0   95.3
C  24.0    --
A  50.3  26.3     --
B  69.0  45.0‡  18.7     --
E  75.0  51.0‡  24.7    6.0     --
D  94.0  70.0‡  43.7‡  25.0   19.0     --
F  95.3  71.3‡  45.0‡  26.3   20.3    1.3     --
‡ The two treatment level means are statistically different at the α = 0.05 level.
a     C  24.0
ab    A  50.3
bc    B  69.0
bc    E  75.0
c     D  94.0
c     F  95.3
Very conservative: the method is driven by control of the experimentwise error rate.
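To see how conservative Scheffé’s S is in practice, the Python sketch below compares the three methods that use one critical difference for every pair. It counts how many of the 15 differences each one declares significant. All table values (2.101, 4.49, 2.77) are the ones quoted on the slides.

```python
import math
from itertools import combinations

means = {"A": 50.3, "B": 69.0, "C": 24.0, "D": 94.0, "E": 75.0, "F": 95.3}
mse, n, t = 120.0, 4, 6
# Table values quoted on the slides (alpha = 0.05, error df = 18).
t_crit, q_crit, f_crit = 2.101, 4.49, 2.77

se_diff = math.sqrt(2 * mse / n)   # standard error of a difference of two means
critical = {
    "Fisher's LSD": t_crit * se_diff,
    "Tukey's HSD": q_crit * math.sqrt(mse / n),
    "Scheffe's S": se_diff * math.sqrt((t - 1) * f_crit),
}
diffs = [abs(means[a] - means[b]) for a, b in combinations(means, 2)]
counts = {name: sum(d > cv for d in diffs) for name, cv in critical.items()}

for name, cv in critical.items():
    print(f"{name:<13} critical difference {cv:5.2f}: {counts[name]} of 15 pairs differ")
```

The larger the critical difference, the fewer pairs are declared different. That is the price of tighter experimentwise error control.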
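Grouping of Ranked Means
(Means sharing a letter are not significantly different at α = 0.05.)
              C      A      B      E      D      F
              24.0   50.3   69.0   75.0   94.0   95.3
LSD           a      b      c      c      d      d
SNK           a      b      c      c      d      d
Duncan’s      a      b      c      c      d      d
Tukey’s HSD   a      b      bc     cd     d      d
Scheffé’s S   a      ab     bc     bc     c      c
Which grouping will you use?
1) What is your risk level?
2) Comparisonwise versus experimentwise error concerns.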
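For the methods with a single critical difference (LSD, Tukey, and Scheffé for pairs), the letter groupings can be generated mechanically. Each maximal run of consecutive ranked means whose range does not exceed the critical difference gets one letter. The Python sketch below, built around a hypothetical helper `letter_groups`, reproduces the LSD, Tukey, and Scheffé rows of the table above. SNK and Duncan use range-dependent W_r, so they would need the step-down check instead of a single cutoff.

```python
import math
import string

means = {"A": 50.3, "B": 69.0, "C": 24.0, "D": 94.0, "E": 75.0, "F": 95.3}


def letter_groups(means, crit):
    """Hypothetical helper: letters for ranked means; a shared letter means |diff| <= crit.

    Each maximal run of consecutive ranked means whose range (largest minus
    smallest) does not exceed crit receives one letter.
    """
    ranked = sorted(means, key=means.get)
    runs = []
    for i in range(len(ranked)):
        j = i
        while j + 1 < len(ranked) and means[ranked[j + 1]] - means[ranked[i]] <= crit:
            j += 1
        if not runs or j > runs[-1][1]:   # drop runs nested in the previous one
            runs.append((i, j))
    letters = {name: "" for name in ranked}
    for letter, (i, j) in zip(string.ascii_lowercase, runs):
        for name in ranked[i:j + 1]:
            letters[name] += letter
    return letters


tukey_w = 4.49 * math.sqrt(120 / 4)                        # 24.59, from the slides
scheffe_s = math.sqrt(120 * 2 / 4) * math.sqrt(5 * 2.77)   # about 28.83
print(letter_groups(means, tukey_w))
# {'C': 'a', 'A': 'b', 'B': 'bc', 'E': 'cd', 'D': 'd', 'F': 'd'}
print(letter_groups(means, scheffe_s))
# {'C': 'a', 'A': 'ab', 'B': 'bc', 'E': 'bc', 'D': 'c', 'F': 'c'}
```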
So, which MC method should you use?
There is a famous story about a statistician and his two clients:
• Client 1 arrives daily with his hypothesis test and asks for assistance. The statistician helps him using α = 0.05. After one year they have done 365 tests. If all of the null hypotheses tested were in fact true, they would have made approximately (365)(0.05) ≈ 18 erroneous rejections, but they are satisfied with the progress of the research.
• Client 2 saves all of his statistical analyses for the end of the year and approaches the statistician for help. The statistician responds: “My! You have a terrible multiple comparisons problem!”
In cases where the researcher is just searching the data (and does not have an interest in every comparison made), some form of error rate control beyond the simple Fisher’s LSD may be appropriate. On the other hand, if you definitely have an interest in every comparison, it may be better to use the LSD (and accept the comparisonwise error rate).
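The arithmetic behind the story can be sketched in Python. The familywise figure assumes independent tests with every null hypothesis true. That is an assumption: pairwise comparisons that share a mean are not independent. The figure for the 15 wheat comparisons is therefore only a rough guide.

```python
alpha = 0.05


def expected_false_rejections(m, alpha=alpha):
    """Expected number of erroneous rejections when all m nulls are true."""
    return m * alpha


def familywise_error(m, alpha=alpha):
    """P(at least one erroneous rejection), ASSUMING m independent tests, all nulls true."""
    return 1 - (1 - alpha) ** m


print(expected_false_rejections(365))    # 18.25
print(round(familywise_error(365), 6))   # 1.0
print(round(familywise_error(15), 3))    # 0.537
```

Client 1 never faces the second number. Client 2, looking at all 365 tests at once, almost surely has at least one false rejection somewhere.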
Which method to use? Some practical advice
If comparisons were decided upon before examining the data (best):
• Just one comparison: use the standard (two-sample) t-test. In this case, use the pooled estimate of the common variance, MSE, and its corresponding error df. This is just Fisher’s LSD.
• Few comparisons: use the Bonferroni adjustment to the t-test. With m comparisons, use α/m for the critical value.
• Many comparisons: Bonferroni becomes increasingly conservative as m increases. At some point it is better to use Tukey (for pairwise comparisons) or Scheffé (for contrasts).
If comparisons were decided upon after examining the data:
• Just want pairwise comparisons: use Tukey.
• All contrasts (linear combinations of treatment means): use Scheffé.
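Python’s standard library has no Student t quantile. This sketch therefore computes one from the regularized incomplete beta function, using a continued-fraction evaluation in the style of Numerical Recipes plus bisection. All helper names are hypothetical. As a check, it first reproduces t(18, 0.025) ≈ 2.101 from the LSD slide. It then finds the Bonferroni critical value for all m = 15 pairwise comparisons in the wheat example, with α/m split over two tails, and compares the resulting critical difference with Tukey’s W.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))          # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_upper_tail(t, df):
    """P(T > t) for Student's t with df degrees of freedom, t >= 0."""
    return 0.5 * _betai(df / 2.0, 0.5, df / (df + t * t))


def t_quantile_upper(p, df, lo=0.0, hi=100.0):
    """t such that P(T > t) = p (0 < p < 0.5), found by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if t_upper_tail(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


alpha, m, df_error = 0.05, 15, 18   # all 15 pairwise comparisons, error df = 18
mse, n = 120.0, 4
se_diff = math.sqrt(2 * mse / n)    # standard error of a difference of two means

t_lsd = t_quantile_upper(alpha / 2, df_error)          # check against 2.101
t_bonf = t_quantile_upper(alpha / (2 * m), df_error)   # Bonferroni: alpha/m, two-sided
tukey_w = 4.49 * math.sqrt(mse / n)                    # q_0.05(6, 18) = 4.49, slides
print(f"t(18, 0.025) = {t_lsd:.3f}")
print(f"Bonferroni t = {t_bonf:.3f}, critical difference = {t_bonf * se_diff:.2f}")
print(f"Tukey W = {tukey_w:.2f}")
```

For all 15 pairs, the Bonferroni critical difference here is larger than Tukey’s W, which matches the advice to switch to Tukey once the number of pairwise comparisons grows.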