Multiple Comparisons: Example Study

Objective: Compare the response of six wheat varieties to a particular race of stem rust.
Treatment: Wheat variety
Levels: A (i = 1), B (i = 2), C (i = 3), D (i = 4), E (i = 5), F (i = 6)
Experimental unit: Pot of well-mixed potting soil.
Replication: Four (4) pots per treatment, four (4) plants per pot.
Randomization: Varieties randomized to 24 pots (CRD).
Response: Yield $Y_{ij}$ (in grams) of wheat variety $i$ at maturity in pot $j$.
Implementation notes: Six seeds of a variety are planted in a pot. Once plants emerge, the four most vigorous are retained and inoculated with stem rust.

STA 6166 - MCP 1

Statistics and AOV Table

Variety              A      B      C      D      E      F
Mean yield (g)    50.3   69.0   24.0   94.0   75.0   95.3
Rank                 5      4      6      2      3      1

$n_1 = n_2 = \cdots = n_6 = n = 4$ pots per variety (rank 1 = largest mean).

ANOVA Table
Source     df   Mean Square   F
Variety     5       2976.44   24.80**
Error      18        120.00

$$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = \mu_6 \qquad H_A: \mu_i \neq \mu_{i'} \text{ for some } i \neq i'$$

The overall F-test indicates that we reject $H_0$ and accept $H_A$. But which means differ from which other means?

Consider all possible comparisons between varieties, $\bar{y}_i - \bar{y}_j$. First sort the treatment levels from the smallest sample mean to the largest. Then, in a table (matrix) format, compute the differences for all $t(t-1)/2$ possible pairs of level means:

$$\frac{t(t-1)}{2} = \frac{6(5)}{2} = 15$$

Differences for all 15 possible pairs of level means (row mean minus column mean):

              C       A       B       E       D       F
           24.0    50.3    69.0    75.0    94.0    95.3
C  24.0      --
A  50.3    26.3      --
B  69.0    45.0    18.7      --
E  75.0    51.0    24.7     6.0      --
D  94.0    70.0    43.7    25.0    19.0      --
F  95.3    71.3    45.0    26.3    20.3     1.3      --

The largest difference, $\bar{y}_F - \bar{y}_C = 71.3$, sits in the lower-left corner; the smallest, $\bar{y}_F - \bar{y}_D = 1.3$, sits next to the diagonal.

Question: How big does a difference have to be before we consider it “significantly big”?
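As an arithmetic check on the AOV table and the difference matrix, the sketch below recomputes the variety mean square and F ratio from the six means alone. That shortcut is valid only because the design is balanced (n = 4 pots per variety). It then builds the 15 sorted pairwise differences. It assumes the rounded means and MSE shown above; the variable names are illustrative only.

```python
from itertools import combinations

# Summary statistics from the wheat / stem rust example (mean yield in grams).
means = {"A": 50.3, "B": 69.0, "C": 24.0, "D": 94.0, "E": 75.0, "F": 95.3}
n = 4          # pots per variety (balanced CRD)
mse = 120.00   # error mean square from the AOV table (18 df)

t = len(means)
grand = sum(means.values()) / t
# Balanced design: SS(variety) = n * sum over i of (ybar_i - ybar..)^2
ss_variety = n * sum((m - grand) ** 2 for m in means.values())
ms_variety = ss_variety / (t - 1)
f_stat = ms_variety / mse

# Sort levels from smallest to largest mean, then form all t(t-1)/2 differences.
order = sorted(means, key=means.get)
diffs = {(lo, hi): round(means[hi] - means[lo], 1)
         for lo, hi in combinations(order, 2)}

print(f"MS(variety) = {ms_variety:.2f}, F = {f_stat:.2f}")
print("sorted levels:", order)
print("number of pairs:", len(diffs))
print("largest:", max(diffs.items(), key=lambda kv: kv[1]))
print("smallest:", min(diffs.items(), key=lambda kv: kv[1]))
```

Small discrepancies from the table in the last decimal place would come from the means being rounded to one decimal.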
Fisher’s Protected LSD

$F = 24.8 > F_{5,18,0.05} = 2.77$, so the overall F-test is significant and the pairwise comparisons are protected.

$$LSD = t_{18,\alpha/2}\sqrt{\frac{2\,MSE}{n}} = t_{18,0.025}\sqrt{\frac{2(120)}{4}} = 2.101 \times 7.746 = 16.27$$

              C       A       B       E       D       F
           24.0    50.3    69.0    75.0    94.0    95.3
C  24.0      --
A  50.3    26.3‡     --
B  69.0    45.0‡   18.7‡     --
E  75.0    51.0‡   24.7‡    6.0      --
D  94.0    70.0‡   43.7‡   25.0‡   19.0‡     --
F  95.3    71.3‡   45.0‡   26.3‡   20.3‡    1.3      --

‡ Implies that the two treatment level means are statistically different at the α = 0.05 level.

Letter grouping (means sharing a letter are not significantly different; this is one of several alternate ways to indicate the grouping of means):

a   C  24.0
b   A  50.3
c   B  69.0
c   E  75.0
d   D  94.0
d   F  95.3

Tukey’s W (Honestly Significant Difference)

Not protected, hence no preliminary F-test is required.

$$W = q_\alpha(t, df_{error})\sqrt{\frac{MSE}{n}} = q_{0.05}(6, 18)\sqrt{\frac{120}{4}} = 4.49 \times 5.477 = 24.59$$

where $q_{0.05}(6, 18)$ is taken from Table 10.

              C       A       B       E       D       F
           24.0    50.3    69.0    75.0    94.0    95.3
C  24.0      --
A  50.3    26.3‡     --
B  69.0    45.0‡   18.7      --
E  75.0    51.0‡   24.7‡    6.0      --
D  94.0    70.0‡   43.7‡   25.0‡   19.0      --
F  95.3    71.3‡   45.0‡   26.3‡   20.3     1.3      --

‡ Significantly different at α = 0.05.

a   C  24.0
b   A  50.3
bc  B  69.0
cd  E  75.0
d   D  94.0
d   F  95.3

Student-Newman-Keuls Procedure (SNK)

Not protected, hence no preliminary F-test is required.

$$W_r = q_\alpha(r, df_{error})\sqrt{\frac{MSE}{n}} = q_{0.05}(r, 18)\sqrt{30}$$

Here r is the number of ranked means spanned by the pair: r = 2 for neighbors, r = 3 with one mean between, r = 4 with two between, and so on. The q values come from Table 10 (row: error df = 18, α = 0.05; column: r).

r                      2      3      4      5      6
q_0.05(r, 18)       2.97   3.61   4.00   4.28   4.49
W_r                16.27  19.77  21.91  23.44  24.59

Each difference is compared with the $W_r$ for its span. For example, $\bar{y}_F - \bar{y}_E = 20.3$ spans r = 3 means (E, D, F) and exceeds $W_3 = 19.77$, whereas Tukey’s single cutoff of 24.59 does not declare it significant.

              C       A       B       E       D       F
           24.0    50.3    69.0    75.0    94.0    95.3
C  24.0      --
A  50.3    26.3‡     --
B  69.0    45.0‡   18.7‡     --
E  75.0    51.0‡   24.7‡    6.0      --
D  94.0    70.0‡   43.7‡   25.0‡   19.0‡     --
F  95.3    71.3‡   45.0‡   26.3‡   20.3‡    1.3      --

‡ Significantly different at α = 0.05.

a   C  24.0
b   A  50.3
c   B  69.0
c   E  75.0
d   D  94.0
d   F  95.3

Duncan’s New Multiple Range Test (Passé)

Not protected, hence no preliminary F-test is required.
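LSD, Tukey, and SNK differ only in the critical value each ranked difference is compared against. Below is a minimal Python sketch written from that perspective. The t and q values are copied from the slides rather than computed, and `step_down` and `letter_groups` are hypothetical helper names for this illustration. Ranges are tested from widest to narrowest, and a pair nested inside a range already found non-significant is not declared significant (the SNK rule). With a constant cutoff, as in LSD and Tukey, this gives the same answer as comparing every pair directly. For these data it reproduces the three letter groupings listed for these methods.

```python
from itertools import combinations

# Ranked means (smallest to largest) and design constants from the example.
levels = ["C", "A", "B", "E", "D", "F"]
means = [24.0, 50.3, 69.0, 75.0, 94.0, 95.3]
n, mse = 4, 120.0
se = (mse / n) ** 0.5                                   # sqrt(MSE/n), about 5.477

# Critical values copied from the slides (alpha = 0.05, error df = 18).
t_crit = 2.101                                          # t_{18, 0.025}
q_crit = {2: 2.97, 3: 3.61, 4: 4.00, 5: 4.28, 6: 4.49}  # q_0.05(r, 18), Table 10


def step_down(crit):
    """Return index pairs (i, j), i < j, declared significant.

    Ranges are tested from widest (r = t) to narrowest (r = 2); a pair lying
    inside a range already found non-significant is not declared significant.
    """
    k = len(means)
    sig, ns_ranges = set(), []
    for r in range(k, 1, -1):
        for i in range(k - r + 1):
            j = i + r - 1
            nested = any(a <= i and j <= b for a, b in ns_ranges)
            if not nested and means[j] - means[i] > crit(r):
                sig.add((i, j))
            else:
                ns_ranges.append((i, j))
    return sig


def letter_groups(sig):
    """Give a shared letter to each maximal run of adjacent ranked means
    whose extremes are not significantly different (the underline display)."""
    runs, last_end = [], -1
    for i in range(len(means)):
        j = i
        while j + 1 < len(means) and (i, j + 1) not in sig:
            j += 1
        if j > last_end:                  # skip runs nested in an earlier run
            runs.append((i, j))
            last_end = j
    labels = [""] * len(means)
    for letter, (i, j) in zip("abcdefgh", runs):
        for idx in range(i, j + 1):
            labels[idx] += letter
    return labels


lsd = t_crit * (2 * mse / n) ** 0.5       # about 16.27
tukey_w = q_crit[6] * se                  # about 24.59
procedures = {
    "LSD": lambda r: lsd,                 # one cutoff for every pair
    "Tukey HSD": lambda r: tukey_w,       # one larger cutoff (experimentwise)
    "SNK": lambda r: q_crit[r] * se,      # cutoff grows with the span r
}
groupings = {name: letter_groups(step_down(crit))
             for name, crit in procedures.items()}
for name, labels in groupings.items():
    print(f"{name:10s}", " ".join(f"{lv}:{lab}" for lv, lab in zip(levels, labels)))
```

Swapping in a different cutoff function (for example Duncan’s $W_r$) is all it takes to try another range-based procedure under the same assumptions.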
$$W_r = q'_\alpha(r, df_{error})\sqrt{\frac{MSE}{n}} = q'_{0.05}(r, 18)\sqrt{30}$$

with r defined as for SNK (neighbors r = 2, one between r = 3, two between r = 4, and so on). The q' values come from Table 11 (reproduced on the following slides; row: error df = 18, α = 0.05; column: r).

r                      2      3      4      5      6
q'_0.05(r, 18)      2.97   3.12   3.21   3.27   3.32
W_r                16.27  17.09  17.58  17.91  18.18

[Duncan’s Test Critical Values (Table 11)]

Duncan’s MRT

              C       A       B       E       D       F
           24.0    50.3    69.0    75.0    94.0    95.3
C  24.0      --
A  50.3    26.3‡     --
B  69.0    45.0‡   18.7‡     --
E  75.0    51.0‡   24.7‡    6.0      --
D  94.0    70.0‡   43.7‡   25.0‡   19.0‡     --
F  95.3    71.3‡   45.0‡   26.3‡   20.3‡    1.3      --

‡ Implies that the two treatment level means are statistically different at the α = 0.05 level.

a   C  24.0
b   A  50.3
c   B  69.0
c   E  75.0
d   D  94.0
d   F  95.3

Scheffé’s S Method

$F = 24.8 > F_{5,18,0.05} = 2.77$, so the overall F-test is significant.

For comparing $\mu_1$ with $\mu_2$, use the contrast with coefficients $c_i = (1, -1, 0, 0, 0, 0)$:

$$l = (1)\mu_1 + (-1)\mu_2 + (0)\mu_3 + (0)\mu_4 + (0)\mu_5 + (0)\mu_6$$

$$\hat{l} = (1)\bar{y}_1 + (-1)\bar{y}_2 + (0)\bar{y}_3 + (0)\bar{y}_4 + (0)\bar{y}_5 + (0)\bar{y}_6$$

$$\hat{V}(\hat{l}) = MSE\sum_{i=1}^{6}\frac{c_i^2}{n_i} = MSE\left[\frac{(1)^2}{n} + \frac{(-1)^2}{n} + \frac{(0)^2}{n} + \frac{(0)^2}{n} + \frac{(0)^2}{n} + \frac{(0)^2}{n}\right] = \frac{2\,MSE}{n}$$

Reject $H_0: l = 0$ at α = 0.05 if $|\hat{l}| > S$, where

$$S = \sqrt{\hat{V}(\hat{l})}\,\sqrt{(t-1)\,F_{t-1,\,n_T-t,\,\alpha}} = \sqrt{\frac{2(120)}{4}}\,\sqrt{5\,F_{5,18,0.05}} = \sqrt{60}\,\sqrt{(5)(2.77)} = 28.82$$

Since each treatment is replicated the same number of times, S is the same for comparing any pair of treatment means: any difference larger than S = 28.82 is significant.

              C       A       B       E       D       F
           24.0    50.3    69.0    75.0    94.0    95.3
C  24.0      --
A  50.3    26.3      --
B  69.0    45.0‡   18.7      --
E  75.0    51.0‡   24.7     6.0      --
D  94.0    70.0‡   43.7‡   25.0    19.0      --
F  95.3    71.3‡   45.0‡   26.3    20.3     1.3      --

‡ Significantly different at α = 0.05.

a   C  24.0
ab  A  50.3
bc  B  69.0
bc  E  75.0
c   D  94.0
c   F  95.3

Very conservative, because the method is driven by the experimentwise error rate.

Grouping of Ranked Means

Variety  Mean   LSD  SNK  Duncan’s  Tukey’s HSD  Scheffé’s S
C        24.0   a    a    a         a            a
A        50.3   b    b    b         b            ab
B        69.0   c    c    c         bc           bc
E        75.0   c    c    c         cd           bc
D        94.0   d    d    d         d            c
F        95.3   d    d    d         d            c

Which grouping will you use?
1) What is your risk level?
2) Comparisonwise versus experimentwise error concerns.
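Duncan’s test and Scheffé’s method fit the same compare-against-a-cutoff pattern. The sketch below computes Duncan’s $W_r$ from the q' values and Scheffé’s S from the general contrast-variance formula $\hat{V}(\hat{l}) = MSE\sum c_i^2/n_i$, then lists the non-significant pairs for each method. Critical values are copied from the slides. `scheffe_s`, `duncan_w`, and `nonsignificant` are hypothetical helper names for this illustration. Each pair is checked directly against its cutoff, which is adequate here because Duncan’s test finds no wider range non-significant for these data.

```python
from itertools import combinations

# Ranked means and design constants from the example.
levels = ["C", "A", "B", "E", "D", "F"]
means = [24.0, 50.3, 69.0, 75.0, 94.0, 95.3]
n, mse = 4, 120.0
t = len(means)                                           # number of treatments
se = (mse / n) ** 0.5                                    # sqrt(MSE/n)

# Critical values copied from the slides (alpha = 0.05, error df = 18).
f_crit = 2.77                                            # F_{5,18,0.05}
dq_crit = {2: 2.97, 3: 3.12, 4: 3.21, 5: 3.27, 6: 3.32}  # Duncan q'_0.05(r, 18), Table 11


def scheffe_s(coefs):
    """Scheffe critical value S for the contrast sum(c_i * mu_i), equal n per level."""
    var_hat = mse * sum(c * c / n for c in coefs)        # V(l-hat) = MSE * sum(c_i^2 / n_i)
    return (var_hat * (t - 1) * f_crit) ** 0.5


def duncan_w(r):
    """Duncan's least significant range for a span of r ranked means."""
    return dq_crit[r] * se


def nonsignificant(crit):
    """Pairs (lower, higher) whose difference does not exceed crit(r)."""
    return [(levels[i], levels[j]) for i, j in combinations(range(t), 2)
            if means[j] - means[i] <= crit(j - i + 1)]


s_pair = scheffe_s([1, -1, 0, 0, 0, 0])                  # pairwise contrast mu_1 - mu_2
duncan_ns = nonsignificant(duncan_w)
scheffe_ns = nonsignificant(lambda r: s_pair)            # same S for every pair (balanced)

print(f"Scheffe S = {s_pair:.2f}")
print("Duncan, not significant:", duncan_ns)
print("Scheffe, not significant:", scheffe_ns)
```

Because `scheffe_s` accepts any coefficient vector, the same function covers non-pairwise contrasts, which is where Scheffé’s method is usually recommended.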
So, which MC method should you use…?

There is a famous story of a statistician and his two clients:
• Client 1 arrives daily with a hypothesis test and asks for assistance. The statistician helps him using α = 0.05. After one year they have done 365 tests. If all nulls tested were indeed true, they would expect to have made approximately (365)(0.05) ≈ 18 erroneous rejections, but they are satisfied with the progress of the research.
• Client 2 saves all of his statistical analyses for the end of the year and then approaches the statistician for help. The statistician responds: “My! You have a terrible multiple comparisons problem!”

In cases where the researcher is just searching the data (and does not have an interest in every comparison made), some form of error rate control beyond the simple Fisher’s LSD may be appropriate. On the other hand, if you definitely have an interest in every comparison, it may be better to use LSD (and accept the comparisonwise error rate).

Which method to use? Some practical advice

If comparisons were decided upon before examining the data (best):
• Just one comparison – use the standard (two-sample) t-test. (In this case use the pooled estimate of the common variance, MSE, and its corresponding error df. This is just Fisher’s LSD.)
• Few comparisons – use a Bonferroni adjustment to the t-test. With m comparisons, use α/m in place of α when finding the critical value.
• Many comparisons – Bonferroni becomes increasingly conservative as m increases. At some point it is better to use Tukey (for pairwise comparisons) or Scheffé (for contrasts).

If comparisons were decided upon after examining the data:
• Only pairwise comparisons are wanted – use Tukey.
• All contrasts (linear combinations of treatment means) – use Scheffé.
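The arithmetic behind the two-client story and the Bonferroni advice fits in a few lines. The familywise formula $1 - (1-\alpha)^m$ used here assumes independent tests. Pairwise comparisons among the same six means are not independent, so treat those numbers as an illustration only. The Bonferroni guarantee itself, familywise error at most $m(\alpha/m) = \alpha$, holds regardless of dependence. The function names are hypothetical.

```python
alpha = 0.05


def familywise_error(m, alpha):
    """P(at least one false rejection) among m INDEPENDENT tests at level alpha."""
    return 1 - (1 - alpha) ** m


def bonferroni_alpha(m, alpha):
    """Per-comparison level alpha/m; keeps the familywise error at or below alpha."""
    return alpha / m


# Client 1: 365 tests at alpha = 0.05 with every null true.
expected_false = 365 * alpha                    # 18.25, roughly 18

# All t(t-1)/2 = 15 pairwise comparisons among the six varieties.
m = 15
fwer_unadjusted = familywise_error(m, alpha)
alpha_per_test = bonferroni_alpha(m, alpha)
fwer_bonferroni = familywise_error(m, alpha_per_test)

print(f"expected false rejections over 365 tests: {expected_false:.2f}")
print(f"FWER, 15 unadjusted tests (independence assumed): {fwer_unadjusted:.3f}")
print(f"Bonferroni per-comparison alpha: {alpha_per_test:.4f}")
print(f"FWER with Bonferroni (independence assumed): {fwer_bonferroni:.4f}")
```

Even under this simplified independence model, more than half of all such 15-comparison studies would produce at least one false “significant” pair if no adjustment were made.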