AMS 572 Lecture Notes 8 Oct. 8th , 2010 Ch8 Inference on two

advertisement
AMS 572 Lecture Notes 8
Oct. 8th , 2010
Ch8 Inference on two population means (and two population
variances).

1. The samples are paired
paired sample t-test
2. The samples are independent  independent sample t-test
a)  12   2 2  pooled-variance t-test
b)  12   2 2  unspooled-variance t-test
(Note: to check if  12   2 2 , use F-test)
2a.Inference on 2 population means, when both populations are
normal. We have 2 independent samples, the population variances
are unknown but equal (  12   2 2   2 )  pooled-variance t-test.
Data:
iid
X1 ,
, X n1
N ( 1 , 12 )
iid
Y1 ,
Goal:
, Yn2 ~ N (2 ,  22 )
Compare 1 and 2
1) Point estimator:
1  2  X  Y ~ N ( 1  2 ,
2) Pivotal quantity:
 12
n1

 22
n2
)  N ( 1   2 , (
1 1 2
 ) )
n1 n2
Z
( X  Y )  ( 1  2 )
~ N (0,1)
1 1


n1 n2
(n1  1) S12
2
~  n21 1 ,
(n2  1) S22
2
not the PQ, since we don’t know  2
~  n22 1 , and they are independent ( S12 & S 22 are
independent because these two samples are independent to each other)
W 
(n1  1) S12
2

(n2  1) S22
2
Definition: W  Z12  Z 22 
~  n21  n2 2
i .i .d .
 Z k2 , when Z i ~ N (0,1) , then W ~  k2 .
Definition:
t-distribution: Z ~ N (0,1) , W ~  k2 , and Z & W are independent, then T 
T 
Z

W
n1  n2  2
where S p2 
( X  Y )  ( 1  2 )
1 1


n1 n2
(n1  1) S12
2

(n2  1) S 22
2
n1  n2  2

( X  Y )  ( 1  2 )
~ tn1  n2  2 .
1 1
Sp

n1 n2
(n1  1) S12  (n2  1) S22
is the pooled variance.
n1  n2  2
This is the PQ of the inference on the parameter of interest ( 1  2 )
3) Confidence Interval for (1  2 )
Z
~ tk
W
k
1    P ( t
1    P ( t
1    P ( t
n1  n2  2,
n1  n2  2,
n1  n2  2,
T t

2


2
)
2
1 1
1 1

 ( X  Y )  ( 1   2 )  t
 )
  Sp
n1  n2  2,
n1 n2
n1 n2
2
2
1    P( X  Y  t

( X  Y )  ( 1  2 )
t
)
n1  n2  2,
1 1
2
Sp

n1 n2
 Sp

n1  n2  2,
n1  n2  2,

2
 Sp
1 1
1 1

 1   2  X  Y  t
 )
  Sp
n1  n2  2,
n1 n2
n1 n2
2
 This is the 100(1   )% C.I for (1  2 )
4) Test:
Test statistic: T0 =
( X  Y )  c0
Sp
 H 0 : 1  2  c0
a) 
 H a : 1  2  c0
1 1

n1 n2
H0
~ tn1  n2 2
(The most common situation is c0 =0  H 0 : 1  2 )
At the significance level α, we reject H 0 in favor of H a iff T0  tn1  n2 2,
If P  value  P(T0  t0 | H 0 )   , reject H 0
 H 0 : 1  2  c0
b) 
 H a : 1  2  c0
At significance level α, reject H 0 in favor of H a iff T0  tn1  n2  2,
If P  value  P(T0  t0 | H 0 )   , reject H 0
 H 0 : 1  2  c0
c) 
 H a : 1  2  c0
At α=0.05, reject H 0 in favor of H a iff | T0 | t
n1  n2  2,

2
If P  value  2 P(T0 | t0 || H 0 )   , reject H 0
Example 2A. An experiment was conducted to compare the mean number of tapeworms in the
stomachs of sheep that had been treated for worms against the mean number in those that were
untreated. A sample of 14 worm-infected lambs was randomly divided into 2 groups. Seven were
injected with the drug and the remainders were left untreated. After a 6-month period, the lambs
were slaughtered and the following worm counts were recorded:
Drug-treated sheep
18 43 28 50 16 32 13
Untreated sheep
40 54 26 63 21 37 39
(a). Test at α = 0.05 whether the treatment is effective or not.
(b) What assumptions do you need for the inference in part (a)?
(c). Please write up the entire SAS program necessary to answer questions raised in (a) and (b).
SOLUTION:
Inference on two population means. Two small and independent samples.
Drug-treated sheep: X 1  28.57 , s12  198.62 , n1  7
Untreated sheep: X 2  40.0 , s 22  215.33 , n2  7
(a) Under the normality assumption, we first test if the two population variances are equal. That is,
H 0 :  12   22 versus H a :  12   22 . The test statistic is
F0 
s12 198.62

 0.92 , F6,6, 0.05,U  4.28 and F6,6,0.05, L  1 / 4.28  0.23 .
s 22 215.33
Since F0 is between 0.23 and 4.28, we cannot reject H0 . Therefore it is reasonable to assume
that  12   22 .
Next we perform the pooled-variance t-test with hypotheses H 0 : 1   2  0 versus
H a : 1   2  0
t0 
X1  X 2 0
sp
1
1

n
n2

28.57  40.0  0  1.49
14.39
1 1

7 7
Since t 0  1.49 is greater than  t12, 0.05  1.782 , we cannot reject H0. We have insufficient
evidence to reject the hypothesis that there is no difference in the mean number of worms in
treated and untreated lambs.
(b) (1) Both populations are normally distributed
(2)  12   22
(c) /*Problem #2B*/
data sheep;
input group worms;
datalines;
1 18
1 43
1 28
1 50
1 16
1 32
1 13
2 40
2 54
2 26
2 63
2 21
2 37
2 39
;
run;
proc univariate data=sheep normal;
class group;
var worms;
title 'Check for normality';
run;
proc ttest data=sheep;
class group;
var worms;
title 'Independent samples t-test';
run;
proc npar1way data=sheep wilcoxon;
class group;
var worms;
title 'Nonparametric test for two-mean comparisons';
run;
Oct. 11th , 2010
2b. Inference on 2 population means, when both populations are
normal and the population variances are not equal (  12   22 ). We
have two independent samples.

unpooled-variance t-test.
1) P.Q:
T
( X 1  X 2 )  ( 1  2 )
S12 S 2 2

n1
n2

~ tdf
Formula for the df:
Walch Satterthwaite method:
df 
W12
(W1  W2 ) 2
S12
S2 2
W

,
W

where
1
2
(n1  1)  W2 2 (n2  1)
n1
n2
or another way to find df (less accurate and more conservative)
df  min(n1  1, n2  1)
2) 100(1-α)% C.I. for ( 1  2 )
X Y  t
df ,


2
S12 S2 2

n1 n2
3) Test:
Test statistic: T0 =
( X  Y )  c0
S12 S2 2

n1 n2
H0
~ tdf
 H :   2  c0
a)  0 1
 H a : 1  2  c0
At α=0.05, reject H 0 in favor of H a iff T0  tdf ,
If P  value  P(T0  t0 | H 0 )   , reject H 0
 H 0 : 1  2  c0
b) 
 H a : 1  2  c0
At α=0.05, reject H 0 in favor of H a iff T0  tdf ,
If P  value  P(T0  t0 | H 0 )   , reject H 0
 H 0 : 1  2  c0
c) 
 H a : 1  2  c0
At α=0.05, reject H 0 in favor of H a iff | T0 | t
If P  value  2 P(T0 | t0 || H 0 )   , reject H 0
4) F-test:
iid
Data: X1 ,
, X n1
N ( 1 , 12 )
df ,

2
iid
, Yn2 ~ N (2 ,  22 )
Y1 ,

 12
H
:
1
 0
 H 0 :  12   22
 22



2
2
2
 H a :  1   2
H : 1  1
 a  22
① Point estimator: (
 12
 12 S12
)


 2 2  2 2 S2 2
② P.Q:
Definition: F-Distribution
Let W1 ~  k21 , W2 ~  k22 , and W1 , W2 are independent. Then F 
(n1  1) S12
 12
~  n21 1
(n2  1) S22

2
2
~  n22 1
(n1  1) S12
 F
W1 / k1
~ Fk1 ,k2
W2 / k2
 12
(n2  1) S 22
 22
(n1  1)

(n2  1)
S12  12
~ Fn1 1,n2 1
S 22  22
 Pivotal Quantity
③ Conference Interval for
 12
 22
1- α = P( Fn1 1,n2 1, L  F  Fn1 1,n2 1,U ) = P ( FL 
S12
S12
S2 2 S2
= P( 2  1 2  2 )
FU  2
FL
④ F-test:
Test Statistic:
S12 H0
F0  2 = ~ Fn1 1,n2 1
S2

 12
H
:
 0 2 1
2

a) 
2
H : 1  1
 a  22
S12
S2 2
 12
 22
 FU )
At significance level  , we will reject H 0 in favor of H a iif. F0  Fn1 1,n2 1, 2,upper
or F0  Fn1 1,n2 1, 2,lower

 12
H0 : 2  1
2

b) 
2
H : 1  1
 a  22
Reject H 0 iif F0  Fn1 1,n2 1, ,upper

 12
H
:
 0 2 1
2

c) 
2
H : 1  1
 a  22
Reject H 0 iff F0  Fn1 1,n2 1, ,lower
5) A trick for the F distribution
1
Fk2 ,k1
F
Some F-table only gives the upper bound. If we know Fk1 ,k2 , ,U , how to find
If F
Fk1 ,k2 , then
Fk1 ,k2 , , L ?
  P( F  Fk ,k
1
1
Fk1 ,k2 , , L
2 , , L
)  P(
1
1
1
 Fk2 ,k1 , ,U ) = P( 
)
F
F Fk1 ,k2 , , L
 Fk2 ,k1 , ,U
6) Example:
A new drug for reducing blood pressure (BP) is compared to an old drug. 20
patients with comparable high BP were recruited and randomized evenly to the 2
drugs. Reductions in BP after 1 month of taking the drugs are as follows:
New drug: 0, 10, -3, 15, 2, 27, 19, 21, 18, 10
Old drug: 8, -4, 7, 5, 10, 11, 9, 12, 7, 8
① Assume both populations are normal, please test at α =0.05 whether the new
drug is better than the old one.
② Run the SAS program to do the above tests (including the normality test)
Solution:
① n1 =10, X =11.9, S12 =97.43, n2 =10, Y =7.3, S 2 2 =20.01
2
2
 H 0 : 1   2
F test: 
2
2
 H a : 1   2
Test statistic: F0 
S12
=4.87> F10,10,0.25,U
S22
 At α=0.05, reject H 0 and use unpooled-variance t-test
 H 0 : 1  2  0
T-test: 
 H a : 1  2  0
Test statistic: T0 =
X Y
S12 S 2 2

n1
n2
=1.34< tdf ,0.05
 At α =0.05, fail to reject H 0 and we cannot say that the new drug is
better than the old one.
②
Data BP;
input drug BPR;
datalines;
1 0
1 10
1 -3
1 15
1 2
1 27
1 19
1 21
1 18
1 10
2 8
2 -4
2 7
2 5
2 10
2 11
2 9
2 12
2 7
2 8
;
run;
PROC UNIVARIATE DATA=BP NORMAL;
CLASS DRUG;
VAR BPR;
RUN;
PROC TTEST DATA=BP;
CLASS DRUG;
VAR BPR;
RUN;
Selected result:
Tests for Normality
Test
--Statistic---
-----p Value------
Shapiro-Wilk
Kolmogorov-Smirnov
Cramer-von Mises
Anderson-Darling
W
D
W-Sq
A-Sq
Pr
Pr
Pr
Pr
Test
0.955526
0.142059
0.037679
0.238555
--Statistic---
Shapiro-Wilk
Kolmogorov-Smirnov
Cramer-von Mises
Anderson-Darling
W
D
W-Sq
A-Sq
<
>
>
>
W
D
W-Sq
A-Sq
0.7339
>0.1500
>0.2500
>0.2500
-----p Value------
0.805147
0.273266
0.124461
0.776159
Pr
Pr
Pr
Pr
<
>
>
>
W
D
W-Sq
A-Sq
0.0167
0.0334
0.0447
0.0294
*The data of old drug is not normal, use nonparametric test.
Method
Pooled
Satterthwaite
Variances
Equal
Unequal
DF
18
12.547
t Value
1.34
1.34
Pr > |t|
0.1962
0.2033
Equality of Variances
Method
Folded F
Num DF
Den DF
F Value
Pr > F
9
9
4.87
0.0274
If the populations are not normal, use nonparametric test for 2 means
from 2 independent samples (Wilcoxon Rank Sum test)
PROC NPAR1WAY DATA=BP;
CLASS DRUG;
VAR DRUG;
RUN;
Selected result:
Wilcoxon Two-Sample Test
Statistic
55.0000
Normal Approximation
Z
One-Sided Pr < Z
Two-Sided Pr > |Z|
-4.3153
<.0001
<.0001
t Approximation
One-Sided Pr < Z
Two-Sided Pr > |Z|
0.0002
0.0004
*In this problem, we can conclude that the new drug is better than the old
one. The result conflicts with the manually calculated one due to the
assumption of normality.
Download