Comparison of 2 Population Means
• Goal: To compare 2 populations/treatments with respect to a numeric outcome
• Sampling Design: Independent Samples (Parallel Groups) vs Paired Samples (Crossover Design)
• Data Structure: Normal vs Non-normal
• Sample Sizes: Large ($n_1, n_2 > 20$) vs Small

Independent Samples
• Units in the two samples are different
• Sample sizes may or may not be equal
• Large-sample inference is based on the Normal distribution (Central Limit Theorem)
• Small-sample inference depends on the distribution of individual outcomes (Normal vs non-Normal)

Parameters/Estimates (Independent Samples)
• Parameter: $\mu_1 - \mu_2$
• Estimator: $\bar{Y}_1 - \bar{Y}_2$
• Estimated standard error: $\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$
• Shape of sampling distribution:
  – Normal if data are normal
  – Approximately normal if $n_1, n_2 > 20$
  – Non-normal otherwise (typically)

Large-Sample Test of $\mu_1 - \mu_2$
• Null hypothesis: The population means differ by $D_0$ (which is typically 0): $H_0: \mu_1 - \mu_2 = D_0$
• Alternative Hypotheses:
  – 1-Sided: $H_A: \mu_1 - \mu_2 > D_0$
  – 2-Sided: $H_A: \mu_1 - \mu_2 \neq D_0$
• Test Statistic: $z_{obs} = \frac{(\bar{y}_1 - \bar{y}_2) - D_0}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$
• Decision Rule:
  – 1-sided alternative $H_A: \mu_1 - \mu_2 > D_0$
    • If $z_{obs} \ge z_{\alpha}$ ==> Conclude $\mu_1 - \mu_2 > D_0$
    • If $z_{obs} < z_{\alpha}$ ==> Do not reject $\mu_1 - \mu_2 = D_0$
  – 2-sided alternative $H_A: \mu_1 - \mu_2 \neq D_0$
    • If $z_{obs} \ge z_{\alpha/2}$ ==> Conclude $\mu_1 - \mu_2 > D_0$
    • If $z_{obs} \le -z_{\alpha/2}$ ==> Conclude $\mu_1 - \mu_2 < D_0$
    • If $-z_{\alpha/2} < z_{obs} < z_{\alpha/2}$ ==> Do not reject $\mu_1 - \mu_2 = D_0$
• Observed Significance Level (P-Value):
  – 1-sided alternative $H_A: \mu_1 - \mu_2 > D_0$: $P = P(Z \ge z_{obs})$ (from the standard Normal distribution)
  – 2-sided alternative $H_A: \mu_1 - \mu_2 \neq D_0$: $P = 2P(Z \ge |z_{obs}|)$ (from the standard Normal distribution)
  – If P-value $\le \alpha$, reject the null hypothesis

Large-Sample $(1-\alpha)100\%$ Confidence Interval for $\mu_1 - \mu_2$
• Confidence Coefficient $(1-\alpha)$ refers to the proportion of times this rule would produce an interval that contains the true parameter value if it were applied over all possible samples
• Rule: $(\bar{y}_1 - \bar{y}_2) \pm z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$
• For 95% confidence intervals, $z_{.025} = 1.96$
• Confidence intervals and 2-sided tests give identical conclusions at the same $\alpha$ level:
  – If the entire interval is above $D_0$, conclude $\mu_1 - \mu_2 > D_0$
  – If the entire interval is below $D_0$, conclude $\mu_1 - \mu_2 < D_0$
  – If the interval contains $D_0$, do not reject $\mu_1 - \mu_2 = D_0$

Example: Vitamin C for Common Cold
• Outcome: Number of colds during the study period for each student
• Group 1: Given Placebo: $\bar{y}_1 = 2.2$, $s_1 = 0.12$, $n_1 = 155$
• Group 2: Given Ascorbic Acid (Vitamin C): $\bar{y}_2 = 1.9$, $s_2 = 0.10$, $n_2 = 208$
Source: Pauling (1971)

2-Sided Test to Compare Groups
• $H_0: \mu_1 - \mu_2 = 0$ (no difference in treatment effects)
• $H_A: \mu_1 - \mu_2 \neq 0$ (difference in treatment effects)
• Test Statistic: $z_{obs} = \frac{(2.2 - 1.9) - 0}{\sqrt{\frac{(0.12)^2}{155} + \frac{(0.10)^2}{208}}} = \frac{0.3}{0.0119} \approx 25.3$
• Decision Rule ($\alpha = 0.05$): Conclude $\mu_1 - \mu_2 > 0$ since $z_{obs} = 25.3 > z_{.025} = 1.96$

95% Confidence Interval for $\mu_1 - \mu_2$
• Point Estimate: $\bar{y}_1 - \bar{y}_2 = 2.2 - 1.9 = 0.3$
• Estimated Std. Error: $\sqrt{\frac{(0.12)^2}{155} + \frac{(0.10)^2}{208}} = 0.0119$
• Critical Value: $z_{.025} = 1.96$
• 95% CI: $0.30 \pm 1.96(0.0119) = 0.30 \pm 0.023 \Rightarrow (0.277, 0.323)$
• The entire interval is above 0, so conclude $\mu_1 - \mu_2 > 0$

Small-Sample Test for Normal Populations
• Case 1: Common Variances ($\sigma_1^2 = \sigma_2^2 = \sigma^2$)
• Null Hypothesis: $H_0: \mu_1 - \mu_2 = D_0$
• Alternative Hypotheses:
  – 1-Sided: $H_A: \mu_1 - \mu_2 > D_0$
  – 2-Sided: $H_A: \mu_1 - \mu_2 \neq D_0$
• Test Statistic (where $S_p^2$ is a "pooled" estimate of $\sigma^2$):
  $t_{obs} = \frac{(\bar{y}_1 - \bar{y}_2) - D_0}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$
• Decision Rule (based on the t-distribution with $\nu = n_1 + n_2 - 2$ df):
  – 1-sided alternative
    • If $t_{obs} \ge t_{\alpha,\nu}$ ==> Conclude $\mu_1 - \mu_2 > D_0$
    • If $t_{obs} < t_{\alpha,\nu}$ ==> Do not reject $\mu_1 - \mu_2 = D_0$
  – 2-sided alternative
    • If $t_{obs} \ge t_{\alpha/2,\nu}$ ==> Conclude $\mu_1 - \mu_2 > D_0$
    • If $t_{obs} \le -t_{\alpha/2,\nu}$ ==> Conclude $\mu_1 - \mu_2 < D_0$
    • If $-t_{\alpha/2,\nu} < t_{obs} < t_{\alpha/2,\nu}$ ==> Do not reject $\mu_1 - \mu_2 = D_0$
• Observed Significance Level (P-Value): special tables are needed; P-values are printed by statistical software packages
  – 1-sided alternative: $P = P(t \ge t_{obs})$ (from the $t_\nu$ distribution)
  – 2-sided alternative: $P = 2P(t \ge |t_{obs}|)$ (from the $t_\nu$ distribution)
  – If P-value $\le \alpha$, reject the null hypothesis

Small-Sample $(1-\alpha)100\%$ Confidence Interval for Normal Populations
• Confidence Coefficient $(1-\alpha)$: same meaning as for the large-sample interval
• Rule: $(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2,\nu}\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
• Interpretations are the same as for large-sample CIs

Small-Sample Inference for Normal Populations
• Case 2: $\sigma_1^2 \neq \sigma_2^2$
• Don't pool variances: $S_{\bar{y}_1 - \bar{y}_2} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$
• Use "adjusted" degrees of freedom (Satterthwaite's approximation):
  $\nu^* = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{(S_1^2/n_1)^2}{n_1 - 1} + \frac{(S_2^2/n_2)^2}{n_2 - 1}}$
• The test statistic and CI take the same form as in Case 1, with $S_{\bar{y}_1 - \bar{y}_2}$ in place of the pooled standard error and $\nu^*$ in place of $\nu$

Example - Scalp Wound Closure
• Groups: Stapling ($n_1 = 15$) / Suturing ($n_2 = 16$)
• Outcome: Physician-reported VAS score at 1 year

| Group | Mean | Std Dev | Sample Size |
|---|---|---|---|
| Stapling (i=1) | 96.92 | 7.51 | 15 |
| Suturing (i=2) | 96.31 | 8.06 | 16 |

• Conduct a 2-sided test of whether mean scores differ
• Construct a 95% confidence interval for the true difference
Source: Khan, et al (2002)

• $H_0: \mu_1 - \mu_2 = 0$, $H_A: \mu_1 - \mu_2 \neq 0$ ($\alpha = 0.05$)
• $S_p^2 = \frac{(15 - 1)(7.51)^2 + (16 - 1)(8.06)^2}{15 + 16 - 2} = 60.83$
• TS: $t_{obs} = \frac{96.92 - 96.31}{\sqrt{60.83\left(\frac{1}{15} + \frac{1}{16}\right)}} = \frac{0.61}{2.80} = 0.22$
• RR: $|t_{obs}| \ge t_{.025,29} = 2.045$
• 95% CI: $0.61 \pm 2.045(2.80) = 0.61 \pm 5.73 \Rightarrow (-5.12, 6.34)$
• Since $|t_{obs}| = 0.22 < 2.045$ (and the interval contains 0): no significant difference between the 2 methods

Small-Sample Test to Compare Two Medians - Nonnormal Populations
• Two Independent Samples (Parallel Groups)
• Procedure (Wilcoxon Rank-Sum Test):
  – Rank the measurements across both samples from smallest (1) to largest ($n_1 + n_2$); tied values take the average of their ranks
  – Obtain the rank sum for each group ($T_1$, $T_2$)
  – 1-sided tests: Conclude $H_A: M_1 > M_2$ if $T_2 \le T_0$
  – 2-sided tests: Conclude $H_A: M_1 \neq M_2$ if $\min(T_1, T_2) \le T_0$
  – Values of $T_0$ are given in many texts for various sample sizes and significance levels; P-values are printed by statistical software packages

Example - Levocabastine in Renal Patients
• 2 Groups: Non-Dialysis/Hemodialysis ($n_1 = n_2 = 6$)
• Outcome: Levocabastine AUC (1 outlier/group)

| Group | AUC (rank) | Rank Sum |
|---|---|---|
| Non-Dialysis | 857 (12), 567 (9), 626 (10), 532 (8), 444 (5), 357 (1) | $T_1 = 45$ |
| Hemodialysis | 527 (7), 740 (11), 392 (2.5), 514 (6), 433 (4), 392 (2.5) | $T_2 = 33$ |

• 2-sided test: Conclude the medians differ if $\min(T_1, T_2) \le 26$
• Here $\min(T_1, T_2) = 33 > 26$, so we do not conclude that the medians differ
Source: Zagornik, et al (1993)

Computer Output - SPSS
[Output table not recoverable from the source]

Inference Based on Paired Samples (Crossover Designs)
• Setting: Each treatment is applied to each subject or pair (preferably in random order)
• Data: $d_i$ is the difference in scores (Trt 1 − Trt 2) for subject (pair) $i$
• Parameter: $\mu_D$, the population mean difference
• Sample Statistics: $\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n}, \quad s_d^2 = \frac{\sum_{i=1}^{n}(d_i - \bar{d})^2}{n - 1}, \quad s_d = \sqrt{s_d^2}$

Test Concerning $\mu_D$
• Null Hypothesis: $H_0: \mu_D = D_0$ (almost always 0)
• Alternative Hypotheses:
  – 1-Sided: $H_A: \mu_D > D_0$
  – 2-Sided: $H_A: \mu_D \neq D_0$
• Test Statistic: $t_{obs} = \frac{\bar{d} - D_0}{s_d/\sqrt{n}}$
• Decision Rule (based on the t-distribution with $\nu = n - 1$ df):
  – 1-sided alternative
    • If $t_{obs} \ge t_{\alpha,\nu}$ ==> Conclude $\mu_D > D_0$
    • If $t_{obs} < t_{\alpha,\nu}$ ==> Do not reject $\mu_D = D_0$
  – 2-sided alternative
    • If $t_{obs} \ge t_{\alpha/2,\nu}$ ==> Conclude $\mu_D > D_0$
    • If $t_{obs} \le -t_{\alpha/2,\nu}$ ==> Conclude $\mu_D < D_0$
    • If $-t_{\alpha/2,\nu} < t_{obs} < t_{\alpha/2,\nu}$ ==> Do not reject $\mu_D = D_0$

Confidence Interval for $\mu_D$
• $\bar{d} \pm t_{\alpha/2,\nu}\frac{s_d}{\sqrt{n}}$

Example - Evaluation of Transdermal Contraceptive Patch in Adolescents
• Subjects: Adolescent females on O.C. who then received the Ortho Evra patch
• Response: 5-point scores on ease of use for each type of contraception (1 = Strongly Agree)
• Data: $d_i$ = difference (O.C. − EVRA) for subject $i$
• Summary Statistics: $\bar{d} = 1.77$, $s_d = 1.48$, $n = 13$
Source: Rubinstein, et al (2004)

• 2-sided test for differences in ease of use ($\alpha = 0.05$): $H_0: \mu_D = 0$, $H_A: \mu_D \neq 0$
• TS: $t_{obs} = \frac{1.77}{1.48/\sqrt{13}} = \frac{1.77}{0.41} \approx 4.31$
• RR: $|t_{obs}| \ge t_{.025,12} = 2.179$
• 95% CI: $1.77 \pm 2.179(0.41) = 1.77 \pm 0.89 \Rightarrow (0.88, 2.66)$
• Conclusion: Since $t_{obs} = 4.31 \ge 2.179$, mean scores are higher for O.C.; because low scores are better, the girls find the patch easier to use

Small-Sample Test for Nonnormal Data
• Paired Samples (Crossover Design)
• Procedure (Wilcoxon Signed-Rank Test):
  – Compute the differences $d_i$ (as in the paired t-test) and obtain their absolute values (ignoring 0s)
  – Rank the observations by $|d_i|$ (smallest = 1), averaging ranks for ties
  – Compute $T_+$ and $T_-$, the rank sums for the positive and negative differences, respectively
  – 1-sided tests: Conclude $H_A: M_1 > M_2$ if $T_- \le T_0$
  – 2-sided tests: Conclude $H_A: M_1 \neq M_2$ if $\min(T_+, T_-) \le T_0$
  – Values of $T_0$ are given in many texts for various sample sizes and significance levels; P-values are printed by statistical software packages
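The large-sample z test and confidence interval from the Vitamin C example above can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the function name `two_sample_z` is hypothetical, and the 2-sided P-value uses the identity $2P(Z \ge |z|) = \mathrm{erfc}(|z|/\sqrt{2})$ so that a very large $z_{obs}$ does not round the P-value to exactly 0.

```python
import math
from statistics import NormalDist

def two_sample_z(y1, s1, n1, y2, s2, n2, d0=0.0, alpha=0.05):
    """Large-sample z test (2-sided) and (1 - alpha)100% CI for mu1 - mu2.

    Hypothetical helper for illustration; assumes independent samples.
    """
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)       # estimated standard error
    diff = y1 - y2                                 # point estimate of mu1 - mu2
    z_obs = (diff - d0) / se                       # test statistic
    # 2-sided P-value 2P(Z >= |z_obs|) written via erfc to avoid underflow to 0
    p_two_sided = math.erfc(abs(z_obs) / math.sqrt(2))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96 for alpha = 0.05
    ci = (diff - z_crit * se, diff + z_crit * se)
    return se, z_obs, p_two_sided, ci

# Vitamin C example: placebo (group 1) vs ascorbic acid (group 2)
se, z, p, ci = two_sample_z(2.2, 0.12, 155, 1.9, 0.10, 208)
print(f"SE = {se:.4f}, z = {z:.1f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The printed values match the slides (SE 0.0119, $z_{obs} \approx 25.3$, CI (0.277, 0.323)); the P-value is far below any usual $\alpha$.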
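The pooled-variance (Case 1) calculation for the Scalp Wound Closure example above is sketched below. It is a minimal illustration, not a full t-test routine: the function name `pooled_t` is hypothetical, and because the Python standard library has no t-distribution, the critical value $t_{.025,29} = 2.045$ is passed in from the slide's table rather than computed.

```python
import math

def pooled_t(y1, s1, n1, y2, s2, n2, t_crit, d0=0.0):
    """Pooled-variance t statistic and CI for mu1 - mu2 (common variances).

    t_crit is t_{alpha/2, n1+n2-2}, read from a t table (not computed here).
    """
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df   # pooled estimate of sigma^2
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))            # standard error of y1 - y2
    diff = y1 - y2
    t_obs = (diff - d0) / se
    ci = (diff - t_crit * se, diff + t_crit * se)
    reject = abs(t_obs) >= t_crit                      # 2-sided rejection region
    return df, sp2, se, t_obs, ci, reject

# Scalp wound example: stapling (group 1) vs suturing (group 2)
df, sp2, se, t, ci, reject = pooled_t(96.92, 7.51, 15, 96.31, 8.06, 16, t_crit=2.045)
print(f"df = {df}, Sp^2 = {sp2:.2f}, SE = {se:.2f}, t = {t:.2f}, "
      f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), reject H0: {reject}")
```

If SciPy is available, `scipy.stats.ttest_ind_from_stats(..., equal_var=True)` computes the same statistic together with an exact P-value from summary statistics.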
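Satterthwaite's adjusted degrees of freedom for Case 2 ($\sigma_1^2 \neq \sigma_2^2$) are easy to get wrong by hand, so here is a small sketch of the formula above. The function name `welch_se_df` is hypothetical. Purely as an illustration, it is applied to the scalp-wound summary statistics, which the slides analyze with the pooled test instead.

```python
import math

def welch_se_df(s1, n1, s2, n2):
    """Unpooled standard error and Satterthwaite's adjusted df (Case 2)."""
    v1 = s1**2 / n1                 # S1^2 / n1
    v2 = s2**2 / n2                 # S2^2 / n2
    se = math.sqrt(v1 + v2)         # unpooled standard error of y1 - y2
    df_star = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se, df_star

# Illustration only: scalp wound summary statistics
se, df_star = welch_se_df(7.51, 15, 8.06, 16)
print(f"SE = {se:.2f}, df* = {df_star:.1f}")
```

Because the two standard deviations and sample sizes are similar, the unpooled SE (about 2.80) and $\nu^*$ (about 29.0) are nearly identical to the pooled values. $\nu^*$ is generally not an integer; software uses it directly, and with a printed table it is common to round it down.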
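The ranking step of the Wilcoxon rank-sum test, including average ranks for ties, can be sketched as follows and checked against the Levocabastine example above. The helper names `average_ranks` and `rank_sums` are hypothetical; the critical value $T_0 = 26$ is taken from the slide, not computed.

```python
def average_ranks(values):
    """Rank values from smallest (1) to largest; tied values get the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with the one at sorted position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sums(group1, group2):
    """Rank both samples together and return the rank sums (T1, T2)."""
    ranks = average_ranks(list(group1) + list(group2))
    return sum(ranks[:len(group1)]), sum(ranks[len(group1):])

non_dialysis = [857, 567, 626, 532, 444, 357]
hemodialysis = [527, 740, 392, 514, 433, 392]
t1, t2 = rank_sums(non_dialysis, hemodialysis)
T0 = 26  # 2-sided, alpha = 0.05, n1 = n2 = 6 (value from the slide)
print(t1, t2, min(t1, t2) <= T0)
```

This reproduces $T_1 = 45$ and $T_2 = 33$; as a quick check, the two rank sums always add up to $(n_1 + n_2)(n_1 + n_2 + 1)/2 = 78$.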
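The paired t test reduces to a one-sample t test on the differences. A minimal sketch follows, applied to the contraceptive patch summary statistics above; the function names are hypothetical, and $t_{.025,12} = 2.179$ is taken from the slide's table because the standard library has no t-distribution. `paired_summary` shows how $\bar{d}$ and $s_d$ (with the $n - 1$ denominator) would be obtained from raw differences, which are not given for this example.

```python
import math
import statistics

def paired_summary(diffs):
    """d-bar and s_d (n - 1 denominator) from raw paired differences."""
    return statistics.mean(diffs), statistics.stdev(diffs), len(diffs)

def paired_t(d_bar, s_d, n, t_crit, d0=0.0):
    """Paired t statistic and CI for mu_D; t_crit = t_{alpha/2, n-1} from a t table."""
    se = s_d / math.sqrt(n)             # standard error of d-bar
    t_obs = (d_bar - d0) / se
    ci = (d_bar - t_crit * se, d_bar + t_crit * se)
    return se, t_obs, ci, abs(t_obs) >= t_crit

# Patch example (only summary statistics are available)
se, t, ci, reject = paired_t(1.77, 1.48, 13, t_crit=2.179)
print(f"SE = {se:.2f}, t = {t:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), reject H0: {reject}")
```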
Example - New MRI for 3D Coronary Angiography
• Previous vs new magnetization prep schemes ($n = 7$)
• Response: Blood/myocardium contrast-to-noise ratio

| Subject | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| Previous | 20 | 31 | 20 | 19 | 40 | 28 | 10 |
| New | 36 | 37 | 27 | 32 | 48 | 40 | 25 |
| Diff = Previous − New | −16 | −6 | −7 | −13 | −8 | −12 | −15 |
| Absolute Diff | 16 | 6 | 7 | 13 | 8 | 12 | 15 |
| Rank (Absolute Diff) | 7 | 1 | 2 | 5 | 3 | 4 | 6 |

• All differences are negative: $T_- = 1 + 2 + \cdots + 7 = 28$, $T_+ = 0$
• From tables for 2-sided tests with $n = 7$, $\alpha = 0.05$: $T_0 = 2$
• Since $\min(T_+, T_-) = \min(0, 28) = 0 \le 2$, conclude that the two schemes differ ($M_1 \neq M_2$)
Source: Nguyen, et al (2004)

Computer Output - SPSS
[Output tables not recoverable from the source]
Note that SPSS is taking NEW − PREVIOUS in the top table
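The signed-rank procedure can be checked against the MRI data above with a short sketch. The helper names are hypothetical, `average_ranks` is repeated so the block is self-contained, and $T_0 = 2$ comes from the slide's table. Swapping the argument order (New − Previous, as SPSS does) simply exchanges $T_+$ and $T_-$.

```python
def average_ranks(values):
    """Rank values from smallest (1) to largest; tied values get the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def signed_rank_sums(first, second):
    """T+ and T- for differences first - second; zero differences are dropped."""
    diffs = [a - b for a, b in zip(first, second) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])     # rank by |d_i|
    t_plus = float(sum(r for d, r in zip(diffs, ranks) if d > 0))
    t_minus = float(sum(r for d, r in zip(diffs, ranks) if d < 0))
    return t_plus, t_minus, len(diffs)                  # n = number of nonzero d_i

previous = [20, 31, 20, 19, 40, 28, 10]
new = [36, 37, 27, 32, 48, 40, 25]
t_plus, t_minus, n = signed_rank_sums(previous, new)
T0 = 2  # 2-sided, alpha = 0.05, n = 7 (value from the slide)
print(t_plus, t_minus, n, min(t_plus, t_minus) <= T0)
```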