Comparing 2 Population Means

Comparison of 2 Population Means
• Goal: To compare 2 populations/treatments with respect to a numeric outcome
• Sampling Design: Independent Samples
(Parallel Groups) vs Paired Samples
(Crossover Design)
• Data Structure: Normal vs Non-normal
• Sample Sizes: Large (n1,n2>20) vs Small
Independent Samples
• Units in the two samples are different
• Sample sizes may or may not be equal
• Large-sample inference based on Normal
Distribution (Central Limit Theorem)
• Small-sample inference depends on
distribution of individual outcomes (Normal
vs non-Normal)
Parameters/Estimates
(Independent Samples)
• Parameter: $\mu_1 - \mu_2$
• Estimator: $\bar{Y}_1 - \bar{Y}_2$
• Estimated standard error: $\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}$
• Shape of sampling distribution:
– Normal if data are normal
– Approximately normal if n1,n2>20
– Non-normal otherwise (typically)
Large-Sample Test of $\mu_1 - \mu_2$
• Null hypothesis: The population means differ by $D_0$ (which is typically 0): $H_0: \mu_1 - \mu_2 = D_0$
• Alternative Hypotheses:
– 1-Sided: $H_A: \mu_1 - \mu_2 > D_0$
– 2-Sided: $H_A: \mu_1 - \mu_2 \ne D_0$
• Test Statistic:
$$z_{obs} = \frac{(\bar{y}_1 - \bar{y}_2) - D_0}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$
Large-Sample Test of $\mu_1 - \mu_2$
• Decision Rule:
– 1-sided alternative $H_A: \mu_1 - \mu_2 > D_0$
• If $z_{obs} \ge z_{\alpha}$ ==> Conclude $\mu_1 - \mu_2 > D_0$
• If $z_{obs} < z_{\alpha}$ ==> Do not reject $\mu_1 - \mu_2 = D_0$
– 2-sided alternative $H_A: \mu_1 - \mu_2 \ne D_0$
• If $z_{obs} \ge z_{\alpha/2}$ ==> Conclude $\mu_1 - \mu_2 > D_0$
• If $z_{obs} \le -z_{\alpha/2}$ ==> Conclude $\mu_1 - \mu_2 < D_0$
• If $-z_{\alpha/2} < z_{obs} < z_{\alpha/2}$ ==> Do not reject $\mu_1 - \mu_2 = D_0$
Large-Sample Test of $\mu_1 - \mu_2$
• Observed Significance Level (P-Value)
– 1-sided alternative $H_A: \mu_1 - \mu_2 > D_0$
• $P = P(z \ge z_{obs})$ (From the std. Normal distribution)
– 2-sided alternative $H_A: \mu_1 - \mu_2 \ne D_0$
• $P = 2P(z \ge |z_{obs}|)$ (From the std. Normal distribution)
• If P-Value $\le \alpha$, then reject the null hypothesis
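The test statistic and P-value rules above can be sketched in Python with the standard library alone. The function name `two_sample_z_test`, its argument names, and the sample numbers are illustrative assumptions, not part of the slides; the sketch assumes large independent samples and, in the 1-sided case, the alternative $\mu_1 - \mu_2 > D_0$.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(ybar1, s1, n1, ybar2, s2, n2, d0=0.0, two_sided=True):
    """Large-sample z test of H0: mu1 - mu2 = d0 from summary statistics.

    Returns (z_obs, p_value). The 1-sided alternative is HA: mu1 - mu2 > d0.
    """
    se = sqrt(s1**2 / n1 + s2**2 / n2)        # estimated standard error
    z_obs = ((ybar1 - ybar2) - d0) / se
    if two_sided:
        p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z_obs)))
    else:
        p_value = 1.0 - NormalDist().cdf(z_obs)
    return z_obs, p_value


# Hypothetical summary statistics, for illustration only
z_obs, p = two_sample_z_test(10.4, 2.0, 50, 9.6, 2.5, 60)
print(f"z_obs = {z_obs:.2f}, P-value = {p:.4f}")
```

Rejecting when the returned P-value is at most $\alpha$ gives the same decision as comparing $z_{obs}$ with the critical value.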
Large-Sample $(1-\alpha)100\%$ Confidence Interval for $\mu_1 - \mu_2$
• Confidence Coefficient $(1-\alpha)$ refers to the proportion of times this rule would provide an interval that contains the true parameter value $\mu_1 - \mu_2$ if it were applied over all possible samples
• Rule:
$$(\bar{y}_1 - \bar{y}_2) \pm z_{\alpha/2}\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}$$
Large-Sample $(1-\alpha)100\%$ Confidence Interval for $\mu_1 - \mu_2$
• For 95% Confidence Intervals, $z_{.025} = 1.96$
• Confidence Intervals and 2-sided tests give identical conclusions at the same $\alpha$-level:
– If entire interval is above $D_0$, conclude $\mu_1 - \mu_2 > D_0$
– If entire interval is below $D_0$, conclude $\mu_1 - \mu_2 < D_0$
– If interval contains $D_0$, do not reject $\mu_1 - \mu_2 = D_0$
Example: Vitamin C for Common Cold
• Outcome: Number of Colds During Study Period
for Each Student
• Group 1: Given Placebo
$\bar{y}_1 = 2.2$, $s_1 = 0.12$, $n_1 = 155$
• Group 2: Given Ascorbic Acid (Vitamin C)
$\bar{y}_2 = 1.9$, $s_2 = 0.10$, $n_2 = 208$
Source: Pauling (1971)
2-Sided Test to Compare Groups
• $H_0: \mu_1 - \mu_2 = 0$ (No difference in trt effects)
• $H_A: \mu_1 - \mu_2 \ne 0$ (Difference in trt effects)
• Test Statistic:
$$z_{obs} = \frac{(2.2 - 1.9) - 0}{\sqrt{\dfrac{(0.12)^2}{155} + \dfrac{(0.10)^2}{208}}} = \frac{0.3}{0.0119} = 25.3$$
• Decision Rule ($\alpha = 0.05$)
– Conclude $\mu_1 - \mu_2 > 0$ since $z_{obs} = 25.3 > z_{.025} = 1.96$
95% Confidence Interval for $\mu_1 - \mu_2$
• Point Estimate: $\bar{y}_1 - \bar{y}_2 = 2.2 - 1.9 = 0.3$
• Estimated Std. Error: $\sqrt{\dfrac{(0.12)^2}{155} + \dfrac{(0.10)^2}{208}} = 0.0119$
• Critical Value: $z_{.025} = 1.96$
• 95% CI: $0.30 \pm 1.96(0.0119) = 0.30 \pm 0.023 = (0.277, 0.323)$. Entire interval > 0
Small-Sample Test for $\mu_1 - \mu_2$
Normal Populations
• Case 1: Common Variances ($\sigma_1^2 = \sigma_2^2 = \sigma^2$)
• Null Hypothesis: $H_0: \mu_1 - \mu_2 = D_0$
• Alternative Hypotheses:
– 1-Sided: $H_A: \mu_1 - \mu_2 > D_0$
– 2-Sided: $H_A: \mu_1 - \mu_2 \ne D_0$
• Test Statistic: (where $S_p^2$ is a “pooled” estimate of $\sigma^2$)
$$t_{obs} = \frac{(\bar{y}_1 - \bar{y}_2) - D_0}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \qquad S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
Small-Sample Test for $\mu_1 - \mu_2$
Normal Populations
• Decision Rule: (Based on t-distribution with $\nu = n_1 + n_2 - 2$ df)
– 1-sided alternative
• If $t_{obs} \ge t_{\alpha,\nu}$ ==> Conclude $\mu_1 - \mu_2 > D_0$
• If $t_{obs} < t_{\alpha,\nu}$ ==> Do not reject $\mu_1 - \mu_2 = D_0$
– 2-sided alternative
• If $t_{obs} \ge t_{\alpha/2,\nu}$ ==> Conclude $\mu_1 - \mu_2 > D_0$
• If $t_{obs} \le -t_{\alpha/2,\nu}$ ==> Conclude $\mu_1 - \mu_2 < D_0$
• If $-t_{\alpha/2,\nu} < t_{obs} < t_{\alpha/2,\nu}$ ==> Do not reject $\mu_1 - \mu_2 = D_0$
Small-Sample Test for $\mu_1 - \mu_2$
Normal Populations
• Observed Significance Level (P-Value)
• Special Tables Needed, Printed by Statistical Software Packages
– 1-sided alternative
• $P = P(t \ge t_{obs})$ (From the $t_{\nu}$ distribution)
– 2-sided alternative
• $P = 2P(t \ge |t_{obs}|)$ (From the $t_{\nu}$ distribution)
• If P-Value $\le \alpha$, then reject the null hypothesis
Small-Sample $(1-\alpha)100\%$ Confidence Interval for $\mu_1 - \mu_2$ - Normal Populations
• Confidence Coefficient $(1-\alpha)$ refers to the proportion of times this rule would provide an interval that contains the true parameter value $\mu_1 - \mu_2$ if it were applied over all possible samples
• Rule:
$$(\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2,\nu}\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$$
• Interpretations same as for large-sample CI’s
Small-Sample Inference for $\mu_1 - \mu_2$
Normal Populations
• Case 2: $\sigma_1^2 \ne \sigma_2^2$
• Don’t pool variances:
$$S_{\bar{y}_1 - \bar{y}_2} = \sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}$$
• Use “adjusted” degrees of freedom (Satterthwaite’s Approximation):
$$\nu^* = \frac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{\left(S_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(S_2^2/n_2\right)^2}{n_2 - 1}}$$
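The adjusted degrees of freedom are easy to get wrong by hand, so a short Python sketch may help. The function name `satterthwaite_df` and the input numbers are hypothetical; the formula is the approximation stated above.

```python
def satterthwaite_df(s1, n1, s2, n2):
    """Approximate degrees of freedom for the unequal-variance t procedure."""
    v1 = s1**2 / n1   # estimated variance of the first sample mean
    v2 = s2**2 / n2   # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


# Hypothetical inputs, for illustration only
df_equal = satterthwaite_df(3.0, 10, 3.0, 10)    # equal SDs and equal n
df_unequal = satterthwaite_df(2.0, 12, 6.0, 8)   # very different SDs
print(df_equal, round(df_unequal, 2))
```

With equal standard deviations and equal sample sizes the result equals the pooled value $n_1 + n_2 - 2$; otherwise it lies between $\min(n_1 - 1, n_2 - 1)$ and $n_1 + n_2 - 2$.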










Example - Scalp Wound Closure
• Groups: Stapling (n1=15) / Suturing (n2=16)
• Outcome: Physician Reported VAS Score at 1-Year
                 Mean    Std Dev   Sample Size
Stapling (i=1)   96.92   7.51      15
Suturing (i=2)   96.31   8.06      16
• Conduct a 2-sided test of whether mean scores differ
• Construct a 95% Confidence Interval for true difference
Source: Khan, et al (2002)
Example - Scalp Wound Closure
$H_0: \mu_1 - \mu_2 = 0$
$H_A: \mu_1 - \mu_2 \ne 0$
($\alpha = 0.05$)
$$S_p^2 = \frac{(15 - 1)(7.51)^2 + (16 - 1)(8.06)^2}{15 + 16 - 2} = 60.83$$
$$TS: t_{obs} = \frac{96.92 - 96.31}{\sqrt{60.83\left(\dfrac{1}{15} + \dfrac{1}{16}\right)}} = \frac{0.61}{2.80} = 0.22$$
$RR: |t_{obs}| \ge t_{.025,29} = 2.045$
95% CI: $0.61 \pm 2.045(2.80) = 0.61 \pm 5.73 = (-5.12, 6.34)$
No significant difference between 2 methods
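The pooled-variance calculation for this example can be reproduced with a short Python sketch. It uses only the summary statistics from the table; the critical value 2.045 is taken from the slide as a table lookup, since the standard library has no t-distribution, and the variable names are illustrative.

```python
from math import sqrt

# Summary statistics from the scalp wound closure example
ybar1, s1, n1 = 96.92, 7.51, 15   # stapling
ybar2, s2, n2 = 96.31, 8.06, 16   # suturing
t_crit = 2.045                    # t_.025,29 as quoted on the slide

df = n1 + n2 - 2
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df   # pooled variance
se = sqrt(sp2 * (1 / n1 + 1 / n2))                 # std. error of the difference
diff = ybar1 - ybar2
t_obs = diff / se                                  # D0 = 0
ci = (diff - t_crit * se, diff + t_crit * se)      # 95% confidence interval
reject = abs(t_obs) >= t_crit                      # 2-sided rejection region

print(f"Sp^2 = {sp2:.2f}, SE = {se:.2f}, t_obs = {t_obs:.2f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), reject H0: {reject}")
```

The results match the slide: $S_p^2 = 60.83$, a standard error of 2.80, $t_{obs} = 0.22$, and an interval of (-5.12, 6.34) that contains 0.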
Small Sample Test to Compare Two
Medians - Nonnormal Populations
• Two Independent Samples (Parallel Groups)
• Procedure (Wilcoxon Rank-Sum Test):
– Rank measurements across samples from smallest (1)
to largest (n1+n2). Ties take average ranks.
– Obtain the rank sum for each group (T1 , T2 )
– 1-sided tests: Conclude $H_A: M_1 > M_2$ if $T_2 \le T_0$
– 2-sided tests: Conclude $H_A: M_1 \ne M_2$ if $\min(T_1, T_2) \le T_0$
– Values of T0 are given in many texts for various sample
sizes and significance levels. P-values printed by
statistical software packages.
Example - Levocabastine in Renal Patients
• 2 Groups: Non-Dialysis/Hemodialysis (n1 = n2 = 6)
• Outcome: Levocabastine AUC (1 Outlier/Group)
Values are AUC with the pooled rank in parentheses:
Non-Dialysis   Hemodialysis
857 (12)       527 (7)
567 (9)        740 (11)
626 (10)       392 (2.5)
532 (8)        514 (6)
444 (5)        433 (4)
357 (1)        392 (2.5)
T1 = 45        T2 = 33
2-sided Test: Conclude Medians differ if $\min(T_1, T_2) \le 26$
Source: Zagornik, et al (1993)
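The ranking step, including average ranks for ties, can be sketched in Python as follows. The helper `average_ranks` is a hypothetical name written for this illustration; the data and the critical value 26 are those quoted in the example.

```python
def average_ranks(values):
    """Rank values from smallest (1) to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = ((i + 1) + (j + 1)) / 2   # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


non_dialysis = [857, 567, 626, 532, 444, 357]
hemodialysis = [527, 740, 392, 514, 433, 392]

ranks = average_ranks(non_dialysis + hemodialysis)
t1 = sum(ranks[:len(non_dialysis)])   # rank sum, non-dialysis group
t2 = sum(ranks[len(non_dialysis):])   # rank sum, hemodialysis group
t0 = 26                               # 2-sided critical value quoted in the example
medians_differ = min(t1, t2) <= t0

print(t1, t2, medians_differ)
```

The rank sums come out as $T_1 = 45$ and $T_2 = 33$, as in the table. Since $\min(T_1, T_2) = 33$ exceeds 26, the stated rule does not conclude that the medians differ.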
Computer Output - SPSS
[SPSS output table not recoverable from the source]
Inference Based on Paired Samples
(Crossover Designs)
• Setting: Each treatment is applied to each subject or pair
(preferably in random order)
• Data: $d_i$ is the difference in scores (Trt1-Trt2) for subject (pair) $i$
• Parameter: $\mu_D$ - Population mean difference
• Sample Statistics:
$$\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n} \qquad s_d^2 = \frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1} \qquad s_d = \sqrt{s_d^2}$$
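A minimal Python sketch of these sample statistics, starting from raw paired scores, might look like the following. The scores are hypothetical and exist only to show the computation; `statistics.stdev` uses the $n - 1$ divisor, matching $s_d$ above.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired scores, for illustration only
trt1 = [12.0, 15.0, 11.0, 14.0, 13.0]
trt2 = [10.0, 14.0, 12.0, 11.0, 10.0]

d = [a - b for a, b in zip(trt1, trt2)]   # d_i = Trt1 - Trt2 for each subject
n = len(d)
d_bar = mean(d)                           # sample mean difference
s_d = stdev(d)                            # sample SD of differences (n - 1 divisor)
se = s_d / sqrt(n)                        # standard error of d_bar

print(d, d_bar, round(s_d, 3), round(se, 3))
```

Working with the differences turns the paired problem into a one-sample problem, which is why only $\bar{d}$, $s_d$, and $n$ are needed afterwards.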
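Test Concerning $\mu_D$
• Null Hypothesis: $H_0: \mu_D = D_0$ (almost always 0)
• Alternative Hypotheses:
– 1-Sided: $H_A: \mu_D > D_0$
– 2-Sided: $H_A: \mu_D \ne D_0$
• Test Statistic (shown for $D_0 = 0$; for general $D_0$ the numerator is $\bar{d} - D_0$):
$$t_{obs} = \frac{\bar{d}}{s_d / \sqrt{n}}$$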
Test Concerning $\mu_D$
Decision Rule: (Based on t-distribution with $\nu = n - 1$ df)
1-sided alternative
If $t_{obs} \ge t_{\alpha,\nu}$ ==> Conclude $\mu_D > D_0$
If $t_{obs} < t_{\alpha,\nu}$ ==> Do not reject $\mu_D = D_0$
2-sided alternative
If $t_{obs} \ge t_{\alpha/2,\nu}$ ==> Conclude $\mu_D > D_0$
If $t_{obs} \le -t_{\alpha/2,\nu}$ ==> Conclude $\mu_D < D_0$
If $-t_{\alpha/2,\nu} < t_{obs} < t_{\alpha/2,\nu}$ ==> Do not reject $\mu_D = D_0$
Confidence Interval for $\mu_D$
$$\bar{d} \pm t_{\alpha/2,\nu}\left(\frac{s_d}{\sqrt{n}}\right)$$
Example - Evaluation of Transdermal
Contraceptive Patch In Adolescents
• Subjects: Adolescent Females on O.C. who then
received Ortho Evra Patch
• Response: 5-point scores on ease of use for each
type of contraception (1=Strongly Agree)
• Data: di = difference (O.C.-EVRA) for subject i
• Summary Statistics:
$\bar{d} = 1.77$, $s_d = 1.48$, $n = 13$
Source: Rubinstein, et al (2004)
Example - Evaluation of Transdermal
Contraceptive Patch In Adolescents
• 2-sided test for differences in ease of use ($\alpha = 0.05$)
• $H_0: \mu_D = 0$, $H_A: \mu_D \ne 0$
$$TS: t_{obs} = \frac{1.77}{1.48 / \sqrt{13}} = \frac{1.77}{0.41} = 4.31$$
$RR: |t_{obs}| \ge t_{.025,12} = 2.179$
95% CI: $1.77 \pm 2.179(0.41) = 1.77 \pm 0.89 = (0.88, 2.66)$
Conclude mean scores are higher for O.C.; girls find the Patch easier to use (low scores are better)
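The paired t computation for this example can be checked with a few lines of Python. The sketch below uses the summary statistics from the slides and takes the critical value 2.179 from the slide as a table lookup; the variable names are illustrative.

```python
from math import sqrt

# Summary statistics from the contraceptive patch example
d_bar, s_d, n = 1.77, 1.48, 13
t_crit = 2.179                    # t_.025,12 as quoted on the slide

se = s_d / sqrt(n)                # standard error of the mean difference
t_obs = d_bar / se                # D0 = 0
ci = (d_bar - t_crit * se, d_bar + t_crit * se)
reject = abs(t_obs) >= t_crit     # 2-sided rejection region

print(f"SE = {se:.2f}, t_obs = {t_obs:.2f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), reject H0: {reject}")
```

The output agrees with the slide: a standard error of 0.41, $t_{obs} = 4.31$, and an interval of (0.88, 2.66) that lies entirely above 0.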
Small-Sample Test For Nonnormal Data
• Paired Samples (Crossover Design)
• Procedure (Wilcoxon Signed-Rank Test)
– Compute Differences di (as in the paired t-test) and obtain their
absolute values (ignoring 0s)
– Rank the observations by |di| (smallest=1), averaging ranks for ties
– Compute T+ and T-, the rank sums for the positive and negative
differences, respectively
– 1-sided tests: Conclude $H_A: M_1 > M_2$ if $T_- \le T_0$
– 2-sided tests: Conclude $H_A: M_1 \ne M_2$ if $\min(T_+, T_-) \le T_0$
– Values of T0 are given in many texts for various sample sizes and
significance levels. P-values printed by statistical software
packages.
Example - New MRI for 3D Coronary
Angiography
• Previous vs new Magnetization Prep Schemes (n=7)
• Response: Blood/Myocardium Contrast-Noise-Ratio
Subject   Previous   New   Diff=Pre-New   |Diff|   Rank(|Diff|)
A         20         36    -16            16       7
B         31         37    -6             6        1
C         20         27    -7             7        2
D         19         32    -13            13       5
E         40         48    -8             8        3
F         28         40    -12            12       4
G         10         25    -15            15       6
• All Differences are negative, T- = 1+2+…+7 = 28, T+ = 0
• From tables for 2-sided tests, n=7, a=0.05, T0=2
• Since $\min(0, 28) \le 2$, Conclude the scheme means differ
Source: Nguyen, et al (2004)
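The signed-rank procedure for this example can be sketched in Python as follows. The helper `avg_rank` is a hypothetical name written for this illustration; the data and the critical value $T_0 = 2$ are those given in the example.

```python
previous = [20, 31, 20, 19, 40, 28, 10]
new = [36, 37, 27, 32, 48, 40, 25]

# Differences (Previous - New), dropping any zero differences
diffs = [p - q for p, q in zip(previous, new) if p != q]
abs_sorted = sorted(abs(x) for x in diffs)


def avg_rank(a):
    """Rank of |d| = a among all |d| (smallest = 1), averaging over ties."""
    first = abs_sorted.index(a) + 1   # rank of the first occurrence
    count = abs_sorted.count(a)       # number of tied values
    return first + (count - 1) / 2


t_plus = sum(avg_rank(abs(x)) for x in diffs if x > 0)    # rank sum, positive diffs
t_minus = sum(avg_rank(abs(x)) for x in diffs if x < 0)   # rank sum, negative diffs
t0 = 2                                                    # 2-sided, n=7, alpha=0.05
conclude_differ = min(t_plus, t_minus) <= t0

print(t_plus, t_minus, conclude_differ)
```

Every difference is negative, so $T_+ = 0$ and $T_- = 28$, and the minimum falls at or below $T_0 = 2$, as stated in the example.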
Computer Output - SPSS
[SPSS output tables not recoverable from the source]
Note that SPSS is taking NEW-PREVIOUS in top table