Multiple Comparison Procedures

Comfort Ratings of 13 Fabric Types

A.V. Cardello, C. Winterhalter, and H.G. Schutz (2003). "Predicting the Handle and Comfort of Military Clothing Fabrics from Sensory and Instrumental Data: Development and Application of New Psychophysical Methods," Textile Research Journal, Vol. 73, pp. 221-237.

Treatments

Means and standard deviations of 45 comfort ratings for each of 13 military fabrics. Fabric types:

10R - 50/50 Nylon/combed cotton, ripstop poplin weave
11A - 50/50 Nylon/Polyester, oxford weave (Australian)
12T - 50/50 Nylon/cotton, twill weave
13P - 92/5/3 Nomex/Kevlar/P140, plain weave
14N - 100 Cotton (former flame retardant treated)
15B - 77/23 Cotton sheath/synthetic core, twill (UK)
16C - 100 combed cotton, ripstop poplin (former hot weather BDU)
17C - 65/35 Wool/Polyester, plain weave (Canada-unlaundered)
18L - 65/35 Wool/Polyester, plain weave (Canada-laundered)
19N - 92/5/3 Nomex/Kevlar/P140, oxford weave
20J - Carded cotton sheath/nylon core, plain weave (Canada)
124 - 100 Pima cotton, ripstop poplin (experimental)
176 - 50/50 Nylon/carded cotton, ripstop poplin weave

Multiple Comparisons

Throughout, $\alpha$ denotes the target error rate (0.05 in the examples), $t = 13$ is the number of treatments, $r_i$ is the number of ratings for treatment $i$, and $\nu = N - t = 572$ is the error degrees of freedom.

• Individual and combined null hypotheses: $H_0$ and $H_{01}, \ldots, H_{0k}$
• Comparisonwise error rate: Pr(Reject $H_{0i}$ | $H_{0i}$ true)
• Experimentwise error rate: Pr(Reject any $H_{0i}$ | all $H_{0i}$ true)
• False discovery rate: E(# false rejections / total rejections)
• Strong familywise error rate: Pr(any false discoveries)
• Simultaneous confidence intervals: Pr(all correct) $= 1 - \alpha$
• Multiple comparison procedures control a Type I error rate other than the per-comparison rate

Outcomes of m tests, by true state and test result:

Test Result \ True State   H0 True   H0 False   Total
Do Not Reject H0           U         T          m-R = U+T
Reject H0                  V         S          R = V+S
Total                      m0        m-m0       m

$$\mathrm{FWER} = P(V \ge 1) = 1 - P(V = 0), \qquad \mathrm{FDR} = E(V/R)$$

Data and Analysis of Variance

Fabric   Mean    SD     n    AllMean   SS(TRT)     SS(ERR)
18L      -9.8    44.8   45   18.575    36231.33    88309.76
17C      -1.4    40.3   45   18.575    17955.03    71459.96
176       2.4    29.4   45   18.575    11773.38    38031.84
124       9.8    25.0   45   18.575     3465.028   27500
20J      10.9    31.0   45   18.575     2650.753   42284
16C      22.0    26.2   45   18.575      527.8781  30203.36
12T      23.6    27.1   45   18.575     1136.278   32314.04
14N      24.2    30.8   45   18.575     1423.828   41740.16
19N      28.5    36.1   45   18.575     4432.753   57341.24
10R      28.9    25.7   45   18.575     4797.253   29061.56
13P      37.4    25.3   45   18.575    15947.13    28163.96
15B      46.4    22.5   45   18.575    34840.38    22275
11A      47.2    27.8   45   18.575    36872.58    34004.96
Sum                     585            172053.6    542689.8

For each fabric, SS(TRT) $= n(\bar{Y}_i - \bar{Y})^2$ and SS(ERR) $= (n-1)\,SD_i^2$.

ANOVA

Source   df    SS         MS         F       F(.05)   P-value
Fabric   12    172053.6   14337.80   15.11   1.769    <0.0001
Error    572   542689.8     948.76
Total    584   714743.4

Bonferroni Based Methods

• Construct P-values for all k test statistics
• Order the P-values from smallest to largest: $p_{(1)} \le \cdots \le p_{(k)}$

Bonferroni: Reject $H_{0(i)}$ if $p_{(i)} \le \alpha/k$
Holm (controls strong FWER): Reject $H_{0(i)}$ if $p_{(j)} \le \alpha/(k-j+1)$ for all $j \le i$
False Discovery Rate: Reject $H_{0(i)}$ if $p_{(j)} \le j\alpha/k$ for some $j \ge i$ (assumes independent tests, which is not the case for this example)

• Example: comparing all k = 13(12)/2 = 78 pairs of fabrics

Fabric Example, j = 1,…,78

In the last three columns, 1 = reject and 0 = do not reject at $\alpha = 0.05$. The Bonferroni cutoff is .05/78 = 0.000641 for every pair.

 j  HighTrt LowTrt  YbarHi  YbarLo  P             Holm .05/(K-j+1)  FDR j*.05/K  Holm  FDR  Bonferroni
 1  11A     18L     47.2    -9.8    1.92281E-17   0.000641          0.000641     1     1    1
 2  15B     18L     46.4    -9.8    5.02202E-17   0.000649          0.001282     1     1    1
 3  11A     17C     47.2    -1.4    2.73176E-13   0.000658          0.001923     1     1    1
 4  15B     17C     46.4    -1.4    6.38303E-13   0.000667          0.002564     1     1    1
 5  13P     18L     37.4    -9.8    1.19751E-12   0.000676          0.003205     1     1    1
 6  11A     176     47.2     2.4    1.39182E-11   0.000685          0.003846     1     1    1
 7  15B     176     46.4     2.4    3.0811E-11    0.000694          0.004487     1     1    1
 8  13P     17C     37.4    -1.4    4.04527E-09   0.000704          0.005128     1     1    1
 9  10R     18L     28.9    -9.8    4.42097E-09   0.000714          0.005769     1     1    1
10  19N     18L     28.5    -9.8    6.29464E-09   0.000725          0.006410     1     1    1
11  11A     124     47.2     9.8    1.37836E-08   0.000735          0.007051     1     1    1
12  15B     124     46.4     9.8    2.73042E-08   0.000746          0.007692     1     1    1
13  11A     20J     47.2    10.9    3.51693E-08   0.000758          0.008333     1     1    1
14  15B     20J     46.4    10.9    6.84851E-08   0.000769          0.008974     1     1    1
15  13P     176     37.4     2.4    1.03211E-07   0.000781          0.009615     1     1    1
16  14N     18L     24.2    -9.8    2.30981E-07   0.000794          0.010256     1     1    1
17  12T     18L     23.6    -9.8    3.70989E-07   0.000806          0.010897     1     1    1
18  16C     18L     22.0    -9.8    1.26732E-06   0.000820          0.011538     1     1    1
19  10R     17C     28.9    -1.4    3.82636E-06   0.000833          0.012179     1     1    1
20  19N     17C     28.5    -1.4    5.09826E-06   0.000847          0.012821     1     1    1
21  13P     124     37.4     9.8    2.49233E-05   0.000862          0.013462     1     1    1
22  10R     176     28.9     2.4    5.12351E-05   0.000877          0.014103     1     1    1
23  13P     20J     37.4    10.9    5.12351E-05   0.000893          0.014744     1     1    1
24  19N     176     28.5     2.4    6.61739E-05   0.000909          0.015385     1     1    1
25  14N     17C     24.2    -1.4    9.06917E-05   0.000926          0.016026     1     1    1
26  11A     16C     47.2    22.0    0.000116266   0.000943          0.016667     1     1    1
27  12T     17C     23.6    -1.4    0.000131479   0.000962          0.017308     1     1    1
28  15B     16C     46.4    22.0    0.000189193   0.000980          0.017949     1     1    1
29  11A     12T     47.2    23.6    0.000303797   0.001000          0.018590     1     1    1
30  16C     17C     22.0    -1.4    0.000341271   0.001020          0.019231     1     1    1
31  11A     14N     47.2    24.2    0.000429581   0.001042          0.019872     1     1    1
32  15B     12T     46.4    23.6    0.000481366   0.001064          0.020513     1     1    1
33  15B     14N     46.4    24.2    0.000673894   0.001087          0.021154     1     1    0
34  14N     176     24.2     2.4    0.000839827   0.001111          0.021795     1     1    0
35  12T     176     23.6     2.4    0.001161089   0.001136          0.022436     0     1    0
36  20J     18L     10.9    -9.8    0.001512166   0.001163          0.023077     0     1    0
37  124     18L      9.8    -9.8    0.002654749   0.001190          0.023718     0     1    0
38  16C     176     22.0     2.4    0.002654749   0.001220          0.024359     0     1    0
39  10R     124     28.9     9.8    0.003400107   0.001250          0.025000     0     1    0
40  19N     124     28.5     9.8    0.004128893   0.001282          0.025641     0     1    0
41  11A     19N     47.2    28.5    0.004128893   0.001316          0.026282     0     1    0
42  11A     10R     47.2    28.9    0.004997148   0.001351          0.026923     0     1    0
43  10R     20J     28.9    10.9    0.005753551   0.001389          0.027564     0     1    0
44  15B     19N     46.4    28.5    0.006027804   0.001429          0.028205     0     1    0
45  19N     20J     28.5    10.9    0.006922844   0.001471          0.028846     0     1    0
46  15B     10R     46.4    28.9    0.00724678    0.001515          0.029487     0     1    0
47  13P     16C     37.4    22.0    0.018043298   0.001563          0.030128     0     1    0
48  14N     124     24.2     9.8    0.026976865   0.001613          0.030769     0     1    0
49  12T     124     23.6     9.8    0.034000896   0.001667          0.031410     0     0    0
50  13P     12T     37.4    23.6    0.034000896   0.001724          0.032051     0     0    0
51  14N     20J     24.2    10.9    0.040999764   0.001786          0.032692     0     0    0
52  13P     14N     37.4    24.2    0.042537508   0.001852          0.033333     0     0    0
53  12T     20J     23.6    10.9    0.050978985   0.001923          0.033974     0     0    0
54  20J     17C     10.9    -1.4    0.058706891   0.002000          0.034615     0     0    0
55  176     18L      2.4    -9.8    0.060784417   0.002083          0.035256     0     0    0
56  16C     124     22.0     9.8    0.060784417   0.002174          0.035897     0     0    0
57  124     17C      9.8    -1.4    0.085108937   0.002273          0.036538     0     0    0
58  16C     20J     22.0    10.9    0.08792458    0.002381          0.037179     0     0    0
59  11A     13P     47.2    37.4    0.131806304   0.002500          0.037821     0     0    0
60  15B     13P     46.4    37.4    0.166294081   0.002632          0.038462     0     0    0
61  13P     19N     37.4    28.5    0.171044456   0.002778          0.039103     0     0    0
62  20J     176     10.9     2.4    0.191067415   0.002941          0.039744     0     0    0
63  13P     10R     37.4    28.9    0.191067415   0.003125          0.040385     0     0    0
64  17C     18L     -1.4    -9.8    0.196333249   0.003333          0.041026     0     0    0
65  124     176      9.8     2.4    0.254937943   0.003571          0.041667     0     0    0
66  10R     16C     28.9    22.0    0.288419882   0.003846          0.042308     0     0    0
67  19N     16C     28.5    22.0    0.317258191   0.004167          0.042949     0     0    0
68  10R     12T     28.9    23.6    0.414733355   0.004545          0.043590     0     0    0
69  19N     12T     28.5    23.6    0.450807179   0.005000          0.044231     0     0    0
70  10R     14N     28.9    24.2    0.469491888   0.005556          0.044872     0     0    0
71  19N     14N     28.5    24.2    0.508116877   0.006250          0.045513     0     0    0
72  176     17C      2.4    -1.4    0.558650627   0.007143          0.046154     0     0    0
73  14N     16C     24.2    22.0    0.734889019   0.008333          0.046795     0     0    0
74  12T     16C     23.6    22.0    0.805464205   0.010000          0.047436     0     0    0
75  20J     124     10.9     9.8    0.86554415    0.012500          0.048077     0     0    0
76  11A     15B     47.2    46.4    0.901993639   0.016667          0.048718     0     0    0
77  14N     12T     24.2    23.6    0.926413823   0.025000          0.049359     0     0    0
78  10R     19N     28.9    28.5    0.950903718   0.050000          0.050000     0     0    0

At $\alpha = 0.05$, Bonferroni rejects the first 32 ordered hypotheses, Holm the first 34, and the FDR procedure the first 48.

Scheffe's Method for All Contrasts

• Can be used for any number of contrasts, even those suggested by the data. Conservative (wide CIs, low power).

Contrast and its estimate, with coefficients $k_i$ that sum to zero:

$$C = \sum_{i=1}^{t} k_i \mu_i, \qquad \hat{C} = \sum_{i=1}^{t} k_i \bar{y}_i, \qquad \text{s.t. } \sum_{i=1}^{t} k_i = 0$$

$$SE(\hat{C}) = \sqrt{MSE \sum_{i=1}^{t} \frac{k_i^2}{r_i}}$$

$$H_0: \sum_{i=1}^{t} k_i \mu_i = 0 \qquad H_A: \sum_{i=1}^{t} k_i \mu_i \ne 0$$

t-test: test statistic $t_{obs} = \hat{C} / SE(\hat{C})$. Reject $H_0$ if $|t_{obs}| \ge \sqrt{(t-1) F_{\alpha, t-1, \nu}}$, where $\nu$ = df Error.

F-test: test statistic

$$F_{obs} = \frac{\hat{C}^2}{MSE \sum_{i=1}^{t} k_i^2 / r_i}$$

Reject $H_0$ if $F_{obs} \ge (t-1) F_{\alpha, t-1, \nu}$.

Simultaneous $(1-\alpha)100\%$ confidence intervals:

$$\hat{C} \pm \sqrt{(t-1) F_{\alpha, t-1, \nu}} \sqrt{MSE \sum_{i=1}^{t} \frac{k_i^2}{r_i}}$$

Example: Scheffe's Method, All Pairwise Tests/CIs

For the contrast $\mu_i - \mu_j$: $k_i = 1$, $k_j = -1$, $k_m = 0$ for $m \ne i, j$.

$$\hat{C} = \bar{Y}_i - \bar{Y}_j, \qquad SE(\hat{C}) = \sqrt{MSE\left(\frac{1}{r_i} + \frac{1}{r_j}\right)} = \sqrt{948.76\left(\frac{1}{45} + \frac{1}{45}\right)} = 6.49$$

$$H_0: \mu_i - \mu_j = 0 \qquad H_A: \mu_i - \mu_j \ne 0$$

$$\text{Reject } H_0 \text{ if } |\bar{Y}_i - \bar{Y}_j| \ge \sqrt{(t-1) F_{\alpha, t-1, N-t}} \sqrt{MSE\left(\frac{1}{r_i} + \frac{1}{r_j}\right)} = MSD_{ij}$$

With $\alpha = 0.05$:

$$MSD_{ij} = \sqrt{(13-1)(1.769)}\,(6.49) = 29.92$$

Simultaneous 95% confidence intervals: $(\bar{Y}_i - \bar{Y}_j) \pm 29.92$

[Pairwise display of the 13 fabrics, ordered by mean as in the data table; only the fabric labels survive in the source.]

Tukey's Method for All Pairwise Comparisons

• Makes use of the Studentized Range Distribution
• $\Pr\{(\max(Y_1,\ldots,Y_n) - \min(Y_1,\ldots,Y_n))/S \ge q(\alpha, n, \nu)\} = \alpha$
• $Y_1,\ldots,Y_n$ are independent normal observations with common mean and variance; $S$ is an independent estimate of their standard deviation; $\nu$ = degrees of freedom for $S$

When all treatment means are equal:

$$\Pr\left\{\frac{\max(\bar{Y}_1,\ldots,\bar{Y}_t) - \min(\bar{Y}_1,\ldots,\bar{Y}_t)}{\sqrt{MSE/r}} \le q(\alpha, t, \nu)\right\} = 1 - \alpha$$

$$\Pr\left\{\frac{|\bar{Y}_i - \bar{Y}_j|}{\sqrt{MSE/r}} \le q(\alpha, t, \nu) \;\; \forall\, i, j\right\} = 1 - \alpha$$

Tests of $H_{0(i,j)}: \mu_i - \mu_j = 0$:

$$\text{Reject } H_{0(i,j)} \text{ if } |\bar{Y}_i - \bar{Y}_j| \ge q(\alpha, t, \nu)\sqrt{\frac{MSE}{r}} = HSD_{i,j}$$

Approximate test for unequal sample sizes:

$$\text{Reject } H_{0(i,j)} \text{ if } |\bar{Y}_i - \bar{Y}_j| \ge \frac{q(\alpha, t, \nu)}{\sqrt{2}} \sqrt{MSE\left(\frac{1}{r_i} + \frac{1}{r_j}\right)}$$

Simultaneous $(1-\alpha)100\%$ confidence intervals:

$$\text{Equal sample sizes: } (\bar{Y}_i - \bar{Y}_j) \pm q(\alpha, t, \nu)\sqrt{\frac{MSE}{r}}$$

$$\text{Unequal sample sizes: } (\bar{Y}_i - \bar{Y}_j) \pm \frac{q(\alpha, t, \nu)}{\sqrt{2}} \sqrt{MSE\left(\frac{1}{r_i} + \frac{1}{r_j}\right)}$$

Fabric example:

$$HSD_{i,j} = q(0.05, 13, 585-13)\sqrt{948.76/45} = 4.70(4.59) = 21.58$$

Simultaneous 95% confidence intervals: $(\bar{Y}_i - \bar{Y}_j) \pm 21.58$

[Pairwise display of the 13 fabrics, ordered by mean as in the data table; only the fabric labels survive in the source.]

Bonferroni's Method for All Pairwise Comparisons

• Adjusts the type I error rate for each test to $\alpha$/(# of tests)
• Increases the confidence level of each CI to $1 - \alpha$/(# of CIs)

Tests of $H_{0(i,j)}: \mu_i - \mu_j = 0$, with $k$ = # of tests/CIs:

$$\text{Reject } H_{0(i,j)} \text{ if } |\bar{Y}_i - \bar{Y}_j| \ge t_{\alpha/(2k), \nu} \sqrt{MSE\left(\frac{1}{r_i} + \frac{1}{r_j}\right)} = MSD_{i,j}$$

Equivalently, reject $H_{0(i,j)}$ if P-value $\le \alpha/k$.

Simultaneous $(1-\alpha)100\%$ confidence intervals:

$$(\bar{Y}_i - \bar{Y}_j) \pm t_{\alpha/(2k), \nu} \sqrt{MSE\left(\frac{1}{r_i} + \frac{1}{r_j}\right)}$$

Fabric example:

$$MSD_{i,j} = t_{0.05/(2(78)),\,572} \sqrt{948.76\left(\frac{1}{45} + \frac{1}{45}\right)} = 3.43(6.49) = 22.29$$

Simultaneous 95% confidence intervals: $(\bar{Y}_i - \bar{Y}_j) \pm 22.29$

[Pairwise display of the 13 fabrics, ordered by mean as in the data table; only the fabric labels survive in the source.]

SNK Method for All Pairwise Comparisons

• Controls False Discovery Rate at $\alpha$
• Uses different critical values for different ranges of means

Tests of $H_{0(i,j)}: \mu_{(i)} - \mu_{(j)} = 0$, where $\bar{Y}_{(1)} \le \cdots \le \bar{Y}_{(t)}$ and $i > j$. Here $k = i - j + 1$ is the number of ordered means spanned by the pair (the "stretch").

$$\text{Reject } H_{0(i,j)} \text{ if } |\bar{Y}_{(i)} - \bar{Y}_{(j)}| \ge \frac{q(\alpha, k, \nu)}{\sqrt{2}} \sqrt{MSE\left(\frac{1}{r_i} + \frac{1}{r_j}\right)} = MSD_k, \qquad k = i - j + 1$$

Start with the largest stretch and work down. If a stretch is not rejected, do not test the stretches contained within it.

Fabric example:

$$MSD_k = \frac{q(\alpha, k, \nu)}{\sqrt{2}} \sqrt{948.76\left(\frac{1}{45} + \frac{1}{45}\right)} = \frac{q(\alpha, k, \nu)}{\sqrt{2}}\,(6.49)$$

k    q(.05,k,inf)   q*SE/sqrt(2)
2    2.772          12.73
3    3.314          15.22
4    3.633          16.68
5    3.858          17.71
6    4.03           18.50
7    4.17           19.15
8    4.286          19.68
9    4.387          20.14
10   4.474          20.54
11   4.552          20.90
12   4.622          21.22
13   4.685          21.51

Differences between ordered means, by stretch. The entry in row k under fabric f is the mean of the fabric k-1 positions above f in the ordered list minus the mean of f (for example, k = 13 under 18L is 47.2 - (-9.8) = 57.0). Compare each entry with $MSD_k$ from the table above.

k \ Low   18L    17C    176    124    20J    16C    12T    14N    19N    10R    13P    15B
13        57.0
12        56.2   48.6
11        47.2   47.8   44.8
10        38.7   38.8   44.0   37.4
9         38.3   30.3   35.0   36.6   36.3
8         34.0   29.9   26.5   27.6   35.5   25.2
7         33.4   25.6   26.1   19.1   26.5   24.4   23.6
6         31.8   25.0   21.8   18.7   18.0   15.4   22.8   23.0
5         20.7   23.4   21.2   14.4   17.6    6.9   13.8   22.2   18.7
4         19.6   12.3   19.6   13.8   13.3    6.5    5.3   13.2   17.9   18.3
3         12.2   11.2    8.5   12.2   12.7    2.2    4.9    4.7    8.9   17.5    9.8
2          8.4    3.8    7.4    1.1   11.1    1.6    0.6    4.3    0.4    8.5    9.0    0.8

Fisher's Protected LSD for All Pairwise Comparisons

• Controls Experimentwise Error Rate at $\alpha$
• Only conducted if the F-test is significant (P-value $\le \alpha$)

Tests of $H_{0(i,j)}: \mu_i - \mu_j = 0$:

$$\text{Reject } H_{0(i,j)} \text{ if } |\bar{Y}_i - \bar{Y}_j| \ge t_{\alpha/2, \nu} \sqrt{MSE\left(\frac{1}{r_i} + \frac{1}{r_j}\right)} = MSD_{ij}$$

Fabric example:

$$MSD_{ij} = t_{.025,\,572} \sqrt{948.76\left(\frac{1}{45} + \frac{1}{45}\right)} = 1.96(6.49) = 12.72$$

[Pairwise display of the 13 fabrics, ordered by mean as in the data table; only the fabric labels survive in the source.]

Multiple Comparisons with Best Treatment/Control

• Pr{subset of treatments contains the best} $= 1 - \alpha$

Case where higher means are good. The subset consists of all treatments $i$ such that:

$$\bar{Y}_i \ge \bar{Y}_j - d'_{\alpha}(t-1, \nu) \sqrt{MSE\left(\frac{1}{r_i} + \frac{1}{r_j}\right)} \quad \forall\, j \ne i$$

where $d'_{\alpha}(t-1, \nu)$ are 1-sided critical values from Dunnett's t-table.

Case where lower means are good. The subset consists of all treatments $i$ such that:

$$\bar{Y}_i \le \bar{Y}_j + d'_{\alpha}(t-1, \nu) \sqrt{MSE\left(\frac{1}{r_i} + \frac{1}{r_j}\right)} \quad \forall\, j \ne i$$

Fabric example (higher means are good):

$$\bar{Y}_i \ge \bar{Y}_j - d'_{0.05}(12, 572) \sqrt{948.76\left(\frac{1}{45} + \frac{1}{45}\right)} = \bar{Y}_j - 2.50(6.49) = \bar{Y}_j - 16.22 \quad \forall\, j \ne i$$

Treatments 13P, 15B, and 11A (means 37.4, 46.4, and 47.2) all lie within 16.22 of the highest mean, so they form the subset. The next fabric, 10R (28.9), is 18.3 below the highest mean and is excluded.
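
The three P-value rules from the Bonferroni Based Methods slide can be checked with a short plain-Python sketch. It is an illustration only, not the spreadsheet behind the slides: the names `bonferroni`, `holm`, `benjamini_hochberg`, and `P_VALUES` are introduced here, and the FDR rule is assumed to be the Benjamini-Hochberg step-up procedure, which is what "p(j) ≤ jα/k for some j ≥ i" describes. `P_VALUES` holds the 78 pairwise P-values from the fabric example, in the order tabulated.

```python
# Sketch: Bonferroni, Holm and (assumed) Benjamini-Hochberg decisions
# for the 78 pairwise fabric comparisons. Standard library only.

P_VALUES = [
    1.92281e-17, 5.02202e-17, 2.73176e-13, 6.38303e-13, 1.19751e-12, 1.39182e-11,
    3.0811e-11, 4.04527e-09, 4.42097e-09, 6.29464e-09, 1.37836e-08, 2.73042e-08,
    3.51693e-08, 6.84851e-08, 1.03211e-07, 2.30981e-07, 3.70989e-07, 1.26732e-06,
    3.82636e-06, 5.09826e-06, 2.49233e-05, 5.12351e-05, 5.12351e-05, 6.61739e-05,
    9.06917e-05, 0.000116266, 0.000131479, 0.000189193, 0.000303797, 0.000341271,
    0.000429581, 0.000481366, 0.000673894, 0.000839827, 0.001161089, 0.001512166,
    0.002654749, 0.002654749, 0.003400107, 0.004128893, 0.004128893, 0.004997148,
    0.005753551, 0.006027804, 0.006922844, 0.00724678, 0.018043298, 0.026976865,
    0.034000896, 0.034000896, 0.040999764, 0.042537508, 0.050978985, 0.058706891,
    0.060784417, 0.060784417, 0.085108937, 0.08792458, 0.131806304, 0.166294081,
    0.171044456, 0.191067415, 0.191067415, 0.196333249, 0.254937943, 0.288419882,
    0.317258191, 0.414733355, 0.450807179, 0.469491888, 0.508116877, 0.558650627,
    0.734889019, 0.805464205, 0.86554415, 0.901993639, 0.926413823, 0.950903718,
]


def bonferroni(pvals, alpha=0.05):
    """Reject H0(i) when p(i) <= alpha / k."""
    k = len(pvals)
    return [p <= alpha / k for p in pvals]


def holm(pvals, alpha=0.05):
    """Step-down: reject H0(i) when p(j) <= alpha/(k-j+1) for all j <= i."""
    k = len(pvals)
    order = sorted(range(k), key=lambda i: pvals[i])
    reject = [False] * k
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] > alpha / (k - rank + 1):
            break  # first failure stops all later (larger) P-values
        reject[idx] = True
    return reject


def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up: reject H0(i) when p(j) <= j*alpha/k for some j >= i."""
    k = len(pvals)
    order = sorted(range(k), key=lambda i: pvals[i])
    last = 0  # largest rank whose P-value is under its threshold
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * alpha / k:
            last = rank
    reject = [False] * k
    for rank, idx in enumerate(order, start=1):
        reject[idx] = rank <= last
    return reject


# Number of rejected pairwise hypotheses under each rule at alpha = 0.05.
print("Bonferroni:", sum(bonferroni(P_VALUES)))        # 32
print("Holm:      ", sum(holm(P_VALUES)))              # 34
print("FDR (BH):  ", sum(benjamini_hochberg(P_VALUES)))  # 48
```

The counts agree with the 0/1 indicator columns in the fabric example: Bonferroni stops after j = 32, Holm after j = 34, and the FDR rule after j = 48. The functions return decisions in the input order, so they also work on unsorted P-values.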