Statistics 512 Notes 6 - Wharton Statistics Department

Hypothesis Testing Continued
Quick Review on Hypothesis Testing
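Goal: Decide between two hypotheses about a parameter of interest $\theta$:
$H_0: \theta \in \Theta_0$
$H_1: \theta \in \Theta_1$,
where $\Theta_0$ and $\Theta_1$ are disjoint subsets of the parameter space $\Theta$.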
Null vs. Alternative Hypothesis: The alternative hypothesis is the hypothesis for which we are looking for strong evidence. The null hypothesis is the default hypothesis, which we retain unless there is strong evidence for the alternative hypothesis.
Test statistic and critical region: A test is defined by a test statistic and a critical region. The critical region is the set of values of the test statistic for which we reject the null hypothesis.
Errors in hypothesis testing: A Type I error is rejecting $H_0$ when it is true; a Type II error is failing to reject $H_0$ when $H_1$ is true.
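Size of test, power of test: The power function of a test with critical region $C$ is
$$\gamma_C(\theta) = P_\theta(W(X_1, \ldots, X_n) \in C),$$
the probability of rejecting the null hypothesis when the true parameter is $\theta$.
Size of test $= \max_{\theta \in \Theta_0} \gamma_C(\theta)$.
Power at an alternative $\theta \in \Theta_1$ $= \gamma_C(\theta)$.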
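As a sketch of these definitions, assume (for illustration only) a test statistic $Y \sim \text{Binomial}(n, p)$ with critical region $C = \{Y \geq k\}$. The hypothetical helper `binomial_power` evaluates $\gamma_C(p)$; the size is the maximum over $\Theta_0$ (a single point here) and the power is read off at alternatives in $\Theta_1$.

```python
from math import comb

def binomial_power(p: float, n: int, k: int) -> float:
    """Power function gamma_C(p) = P_p(Y in C) for C = {Y >= k}, Y ~ Binomial(n, p)."""
    return sum(comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(k, n + 1))

n, k = 10, 6            # hypothetical sample size and cutoff
theta0 = [0.5]          # Theta_0 for H0: p = 0.5 (a single point)

# Size = max of the power function over Theta_0.
size = max(binomial_power(p, n, k) for p in theta0)

# Power at a few alternatives in Theta_1 = {p > 0.5}.
powers = {p: binomial_power(p, n, k) for p in (0.6, 0.7, 0.8)}

print(f"size = {size:.3f}")
for p, g in powers.items():
    print(f"power at p = {p}: {g:.3f}")
```

The power grows as the alternative moves away from $\Theta_0$, which is the behavior the Neyman-Pearson paradigm trades off against the size.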
Neyman-Pearson paradigm: Choose size of test to be
reasonably small to protect against Type I error, typically
0.05 or 0.01. Among tests with the prescribed size,
choose the most powerful test.
P-value: For a test statistic $W(X_1, \ldots, X_n)$, consider a family of critical regions $\{C_a\}$, each with a different size. For the observed value $W_{obs}$ of the test statistic from the sample, consider the subset of critical regions for which we would reject the null hypothesis, $\{C_a : W_{obs} \in C_a\}$. The p-value is the minimum size of the tests in this subset:
$$\text{p-value} = \min_{\{C_a : W_{obs} \in C_a\}} \text{Size(test with critical region } C_a).$$
The p-value is a measure of how much evidence there is against the null hypothesis; it is the smallest significance level at which we would still reject the null hypothesis.
Consider the family of critical regions $C_j = \{\sum_{i=1}^{10} X_i \geq j\}$ for the motivating example. Since the graphologist made 6 correct identifications, we reject the null hypothesis for the critical regions $C_j$, $j \leq 6$. The size of $C_j$ decreases as $j$ increases, so the minimum size among the critical regions $C_j$, $j \leq 6$, is attained at $j = 6$ and equals 0.377. The p-value is thus 0.377.
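A minimal sketch of this calculation, assuming (as the value 0.377 suggests) that under $H_0$ the 10 identifications are independent and each correct with probability 1/2, so $\sum X_i \sim \text{Binomial}(10, 1/2)$. The helper name `tail_prob` is hypothetical.

```python
from math import comb

def tail_prob(n: int, j: int, p: float = 0.5) -> float:
    """Size of C_j = {sum X_i >= j} when sum X_i ~ Binomial(n, p) (assumed null model)."""
    return sum(comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(j, n + 1))

n_trials = 10   # upper limit of the sum in C_j
observed = 6    # correct identifications

# Sizes of every critical region in the family C_0, ..., C_10.
sizes = {j: tail_prob(n_trials, j) for j in range(n_trials + 1)}

# We reject with C_j exactly when observed >= j; the p-value is the smallest such size.
rejecting = [sizes[j] for j in sizes if observed >= j]
p_value = min(rejecting)
print(round(p_value, 3))  # 0.377
```

The exact value is $386/1024$, since $\binom{10}{6} + \binom{10}{7} + \binom{10}{8} + \binom{10}{9} + \binom{10}{10} = 386$.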
Scale of evidence
p-value      Evidence
< 0.01       Very strong evidence against the null hypothesis
0.01-0.05    Strong evidence against the null hypothesis
0.05-0.10    Weak evidence against the null hypothesis
> 0.10       Little or no evidence against the null hypothesis
Large sample binomial hypothesis tests:
For large samples, we can use the Central Limit Theorem
to construct a test with size approaching a prescribed value
as the sample becomes large.
The Sports Illustrated Jinx:
Many athletes believe that there is a Sports Illustrated jinx:
appearing on the cover of Sports Illustrated tends to lead to
a subsequent decline in performance. Gluckson and Leone
(1984) put the Sports Illustrated jinx to the test. Let p
denote the probability that the performance level of a cover
subject declines. If, under normal circumstances, performance is as likely to decline as not, then the hypotheses that Gluckson and Leone set out to test can be written
$H_0: p = \frac{1}{2}$ (SI cover has no effect)
$H_1: p > \frac{1}{2}$ (SI jinx exists)
Included in the study were some 271 subjects appearing on
SI covers during the years 1954 through 1983. Let Y
denote the number of subjects whose performance
subsequently declined. We use Y as our test statistic. We
would like to do a test of size approximately 0.05.
Consider critical regions of the form $C = \{Y : Y \geq y^*\}$. To choose $y^*$ so that the test has size 0.05, we need to solve
$$P_{p=0.5}(Y \geq y^*) = \sum_{y=y^*}^{271} \binom{271}{y} \left(\frac{1}{2}\right)^{y} \left(\frac{1}{2}\right)^{271-y} = 0.05.$$
As written, solving this equation would be difficult. The task can be greatly simplified by using the Central Limit Theorem, which says that for a binomial random variable
$$\frac{Y - np}{\sqrt{np(1-p)}} \xrightarrow{D} N(0,1).$$
Thus,
$$P_{p=0.5}(Y \geq y^*) \approx P\left(Z \geq \frac{y^* - 271(0.5)}{\sqrt{271(0.5)(1-0.5)}}\right)$$
for a standard normal random variable $Z$. Since $P(Z \geq 1.64) \approx 0.05$, it follows that
$$\frac{y^* - 135.5}{\sqrt{271(0.5)(0.5)}} = 1.64.$$
Specifically, $y^* = 135.5 + 1.64\sqrt{67.75} \approx 149$.
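With a computer, the exact binomial equation is also easy to handle, so the CLT cutoff can be checked directly. This is a sketch under the notes' setup ($n = 271$, $p = 0.5$); `exact_size` is a hypothetical helper name.

```python
from math import comb, sqrt

n, p0 = 271, 0.5
mean = n * p0                     # 135.5
sd = sqrt(n * p0 * (1 - p0))      # sqrt(67.75), about 8.23

# CLT-based cutoff, using the 1.64 normal quantile from the notes.
y_star_approx = mean + 1.64 * sd

def exact_size(y_star: int) -> float:
    """Exact P_{p=0.5}(Y >= y_star) for Y ~ Binomial(271, 0.5).

    With p = 0.5 every outcome has probability 2**-n, so the tail is an
    integer count divided by 2**n (Python ints avoid overflow)."""
    return sum(comb(n, y) for y in range(y_star, n + 1)) / 2**n

# Smallest integer cutoff whose exact size is at most 0.05.
y_star_exact = next(y for y in range(n + 1) if exact_size(y) <= 0.05)

print(f"normal approximation: y* = {y_star_approx:.2f}")
print(f"exact size of C = {{Y >= 149}}: {exact_size(149):.4f}")
print(f"smallest exact cutoff with size <= 0.05: {y_star_exact}")
```

Because $Y$ is discrete, no cutoff gives size exactly 0.05, which is why the exact search looks for the smallest cutoff with size at most 0.05. Comparing it with the CLT cutoff gives a sense of how good the approximation is at $n = 271$.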
The observed number of declines was found to be 114.
Since 114<149, we do not reject the null hypothesis. There
is no strong evidence of a Sports Illustrated jinx.
p-value: Consider the family of tests with critical regions
$$C_a = \left\{Y : \frac{Y - 271(0.5)}{\sqrt{271(0.5)(1-0.5)}} > a\right\}.$$
The approximate size of the test with critical region $C_a$ is $P(Z > a) = 1 - \Phi(a)$, where $\Phi$ is the standard normal CDF. We have
$$\frac{Y_{obs} - 271(0.5)}{\sqrt{271(0.5)(0.5)}} = \frac{114 - 135.5}{\sqrt{67.75}} = -2.61.$$
Thus, we reject $H_0$ for all tests with critical region $C_a$ with $a < -2.61$. The minimum size among these tests is for $C_{-2.61}$, with size $1 - \Phi(-2.61) = 0.995$. This is the p-value: p-value = 0.995. There is no evidence against the null hypothesis and hence no evidence of a Sports Illustrated jinx.
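The p-value computation above reduces to evaluating $1 - \Phi$ at the observed standardized statistic. A minimal sketch using Python's standard-library `statistics.NormalDist` for $\Phi$:

```python
from math import sqrt
from statistics import NormalDist

n, p0, y_obs = 271, 0.5, 114
z_obs = (y_obs - n * p0) / sqrt(n * p0 * (1 - p0))   # about -2.61

# The tests C_a = {standardized Y > a} have approximate size 1 - Phi(a), which
# decreases in a. We reject exactly when a < z_obs, so the sizes of the rejecting
# tests shrink toward 1 - Phi(z_obs) as a increases to z_obs.
p_value = 1 - NormalDist().cdf(z_obs)
print(f"z = {z_obs:.2f}, p-value = {p_value:.4f}")
```

A p-value this close to 1 reflects that the observed count (114 declines) falls well below $n/2$, on the side opposite to the jinx alternative.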
Choosing the sample size
In the Neyman-Pearson paradigm, we choose the size of the test to be small to protect against Type I errors; typically we set the size to be 0.05. For a fixed sample size, this constrains the power of the test. To achieve both a small size and a high power, we can choose the sample size.
Example: Suppose we want to test $H_0: p = \frac{1}{2}$ vs. $H_1: p > \frac{1}{2}$ from an iid Bernoulli sample. Suppose we want the size to be 0.05 and the power to be 0.8 for the alternative $p = 0.6$. How large a sample size do we need if we use the large sample binomial test?
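Let $Y$ be the number of successes. Using the large sample binomial test, the test statistic is
$$W = \frac{Y - n(0.5)}{\sqrt{n(0.5)(1-0.5)}}.$$
For large $n$, $W$ has approximately a standard normal distribution when $p = 0.5$. Thus, a test of size 0.05 has critical region $C = \{W : W \geq 1.645\}$. The power of this test when $p = 0.6$ is
$$P_{p=0.6}\left(\frac{Y - n(0.5)}{\sqrt{n(0.5)(0.5)}} \geq 1.645\right) = P_{p=0.6}\left(\frac{Y - n(0.6)}{\sqrt{n(0.5)(0.5)}} \geq 1.645 - \frac{n(0.1)}{\sqrt{n(0.5)(0.5)}}\right) \approx P\left(Z \geq 1.645 - \frac{n(0.1)}{\sqrt{n(0.5)(0.5)}}\right),$$
where the approximation treats $(Y - n(0.6))/\sqrt{n(0.5)(0.5)}$ as standard normal (its variance under $p = 0.6$ is $0.24/0.25 = 0.96$, close to 1).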
1
We have  (0.8)  0.842 where  is the standard normal
CDF. Thus, we want to choose the sample size n so that
n(0.1)
1.645 
 0.842
n(0.5)(0.5)
The smallest sample size n that achieves this is found by
n(0.1)
1.645

 0.842 and solving for
setting
n(0.5)(0.5)
n resulting in n  16.12 . Thus, the smallest sample size
n needed is 17.
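The sample-size calculation can be sketched as a small function, assuming the same approximation as above (the null standard deviation $\sqrt{p_0(1-p_0)}$ is used both in the cutoff and in the power). The names `required_n` and `approx_power` are hypothetical; exact normal quantiles replace the rounded 1.645 and 0.842.

```python
from math import ceil, sqrt
from statistics import NormalDist

def required_n(p0: float, p1: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Smallest n with approximate power >= `power` at p1 for the size-alpha
    large sample test of H0: p = p0 vs H1: p > p0 (notes' approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # about 1.645
    z_beta = NormalDist().inv_cdf(power)        # about 0.842
    # Solve z_alpha - sqrt(n) * (p1 - p0) / sqrt(p0 (1 - p0)) = -z_beta for sqrt(n).
    root_n = (z_alpha + z_beta) * sqrt(p0 * (1 - p0)) / (p1 - p0)
    return ceil(root_n**2)

def approx_power(n: int, p0: float, p1: float, alpha: float = 0.05) -> float:
    """P(Z >= z_alpha - sqrt(n)(p1 - p0)/sqrt(p0(1 - p0))), as in the derivation above."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    shift = sqrt(n) * (p1 - p0) / sqrt(p0 * (1 - p0))
    return 1 - NormalDist().cdf(z_alpha - shift)

n_needed = required_n(0.5, 0.6)
print(n_needed, round(approx_power(n_needed, 0.5, 0.6), 3), round(approx_power(17, 0.5, 0.6), 3))
```

Evaluating the power at $n = 17$ shows why a sample that small falls far short of the 0.8 target.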
Testing a normal mean
Suppose $X_1, \ldots, X_n$ are iid $N(\mu, \sigma^2)$ with the variance $\sigma^2$ known. We want to test
$H_0: \mu = \mu_0$ vs. $H_1: \mu > \mu_0$.
Consider the test statistic $z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ and critical region $C = \{z : z \geq c\}$. What do we need to choose $c$ to be so that the size of the test is 0.05? We have
$$P_{\mu_0}\left(\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \geq c\right) = P(Z \geq c),$$
where $Z$ is a standard normal random variable. Thus, we want to choose $c$ to be the 0.95 quantile of the standard normal distribution, which equals 1.645.
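A minimal sketch of this one-sided z-test with known $\sigma$. The function name `one_sided_z_test` and the numbers in the example call are hypothetical; the cutoff is the 0.95 normal quantile from `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

CUTOFF_95 = NormalDist().inv_cdf(0.95)   # 0.95 quantile of N(0, 1), about 1.645

def one_sided_z_test(xbar: float, mu0: float, sigma: float, n: int,
                     c: float = CUTOFF_95) -> bool:
    """Return True if H0: mu = mu0 is rejected in favor of H1: mu > mu0 (z >= c)."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return z >= c

# Hypothetical summary: n = 25, sigma = 2, observed mean 10.8, testing mu0 = 10,
# so z = 0.8 / 0.4 = 2.0, which exceeds the cutoff.
print(round(CUTOFF_95, 3), one_sided_z_test(10.8, 10.0, 2.0, 25))
```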
Suppose we wanted to test $H_0: \mu \leq \mu_0$ vs. $H_1: \mu > \mu_0$. The size of the test with test statistic $z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ and critical region $C = \{z : z \geq c\}$ is
$$\max_{\mu \leq \mu_0} P_\mu\left(\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \geq c\right).$$
We have
$$P_\mu\left(\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \geq c\right) = P_\mu\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \geq c + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}}\right) = P\left(Z \geq c + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}}\right) = 1 - \Phi\left(c + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}}\right).$$
Because $1 - \Phi\left(c + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}}\right)$ is an increasing function of $\mu$, the size of the test is
$$P_{\mu_0}\left(\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \geq c\right).$$
Thus, a test of size 0.05 for testing $H_0: \mu \leq \mu_0$ vs. $H_1: \mu > \mu_0$ is the same as the test of size 0.05 for testing $H_0: \mu = \mu_0$ vs. $H_1: \mu > \mu_0$: the critical region is $C = \{z : z \geq c\}$ with $c = 1.645$, where $z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$.
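The monotonicity argument can be checked numerically: evaluate the rejection probability $1 - \Phi(c + (\mu_0 - \mu)/(\sigma/\sqrt{n}))$ on a grid of $\mu \leq \mu_0$ and confirm that the maximum sits at $\mu = \mu_0$. The values $\mu_0 = 0$, $\sigma = 1$, $n = 16$ and the helper `rejection_prob` are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def rejection_prob(mu: float, mu0: float, sigma: float, n: int, c: float = 1.645) -> float:
    """P_mu(z >= c) = 1 - Phi(c + (mu0 - mu)/(sigma/sqrt(n))) for z = (Xbar - mu0)/(sigma/sqrt(n))."""
    se = sigma / sqrt(n)
    return 1 - NormalDist().cdf(c + (mu0 - mu) / se)

mu0, sigma, n = 0.0, 1.0, 16                    # hypothetical values
grid = [mu0 - 0.1 * k for k in range(11)]       # mu values in Theta_0 = {mu <= mu0}, decreasing
probs = [rejection_prob(mu, mu0, sigma, n) for mu in grid]

size = max(probs)
print(round(size, 4), probs[0] == size)         # the maximum is attained at mu = mu0
```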
Two-sided tests: Suppose we want to test $H_0: \mu = \mu_0$ vs. $H_1: \mu \neq \mu_0$. Using the test statistic $z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ still seems reasonable, but now it makes sense to reject for both very large and very small values of $z$. We can use a critical region of the form $C = \{z : |z| \geq c\}$. A test of size 0.05 has critical region $C = \{z : |z| \geq 1.96\}$ because
$$P_{\mu_0}\left(\left|\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\right| \geq c\right) = P(|Z| \geq c),$$
which equals 0.05 when $c = 1.96$, the 0.975 quantile of the standard normal distribution.
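A short check of the two-sided cutoff: by symmetry of the standard normal, $P(|Z| \geq c) = 2(1 - \Phi(c))$, so the size-0.05 cutoff is the 0.975 quantile. This sketch uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

Z = NormalDist()                 # standard normal
c = Z.inv_cdf(0.975)             # 0.975 quantile, about 1.96
size = 2 * (1 - Z.cdf(c))        # P(|Z| >= c) by symmetry
print(round(c, 2), round(size, 3))
```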


Duality between tests and confidence intervals
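Suppose we want to test $H_0: \mu = \mu_0$ vs. $H_1: \mu \neq \mu_0$ and use the rejection region $C = \{z : |z| \geq 1.96\}$. Then, the set of $\mu_0$ for which $H_0: \mu = \mu_0$ is not rejected is
$$\left\{\mu_0 : \left|\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\right| < 1.96\right\} = \left\{\mu_0 : -1.96 < \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} < 1.96\right\} = \left\{\mu_0 : \bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu_0 < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right\},$$
which is the 95% confidence interval for $\mu$ that we have used.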
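The duality can be seen numerically by "inverting" the test: scan a grid of candidate $\mu_0$ values, keep those the two-sided test does not reject, and compare the kept range with $\bar{X} \pm 1.96\sigma/\sqrt{n}$. The summary statistics and the helper names `two_sided_rejects` and `ci_by_inversion` are hypothetical; a grid scan is only an approximation of the exact set.

```python
from math import sqrt

def two_sided_rejects(xbar: float, mu0: float, sigma: float, n: int, c: float = 1.96) -> bool:
    """Size-0.05 two-sided z-test of H0: mu = mu0 with sigma known."""
    return abs((xbar - mu0) / (sigma / sqrt(n))) >= c

def ci_by_inversion(xbar: float, sigma: float, n: int, lo: float, hi: float,
                    steps: int = 100_000) -> tuple[float, float]:
    """Approximate {mu0 : H0: mu = mu0 not rejected} by scanning a grid on [lo, hi]."""
    grid = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    kept = [mu0 for mu0 in grid if not two_sided_rejects(xbar, mu0, sigma, n)]
    return min(kept), max(kept)

xbar, sigma, n = 5.0, 2.0, 16                   # hypothetical summary statistics
interval = ci_by_inversion(xbar, sigma, n, 3.0, 7.0)
formula = (xbar - 1.96 * sigma / sqrt(n), xbar + 1.96 * sigma / sqrt(n))
print(interval, formula)
```

The grid endpoints agree with the formula up to the grid spacing (here $4 \times 10^{-5}$).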
In general, there is a duality between tests and confidence intervals.
Suppose we have a family of tests of size $\alpha$ of $H_0: \theta = \theta_0$ vs. $H_1: \theta \neq \theta_0$, one for each $\theta_0 \in \Theta$. Then
$$\{\theta_0 : \text{the test of } H_0: \theta = \theta_0 \text{ vs. } H_1: \theta \neq \theta_0 \text{ does not reject}\}$$
is a $(1-\alpha)$ confidence interval for $\theta$.
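Proof: Call the set above $S(X_1, \ldots, X_n)$. For each $\theta_0 \in \Theta$, $\theta_0 \in S(X_1, \ldots, X_n)$ exactly when the test of $H_0: \theta = \theta_0$ does not reject. If $\theta_0$ is the true parameter value, this null hypothesis is true, so $P_{\theta_0}(\theta_0 \in S(X_1, \ldots, X_n)) = 1 - P_{\theta_0}(\text{reject } H_0: \theta = \theta_0) = 1 - \alpha$. Since this holds for every $\theta_0 \in \Theta$, $S(X_1, \ldots, X_n)$ is a $(1-\alpha)$ confidence interval for $\theta$.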
Conversely, suppose we have a (1   ) confidence interval
for  . Then a test of size  of H 0 :   0 vs. H a :   0 is
to reject the null hypothesis if and only if 0 does not belong
to the confidence region.
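Proof: Let $S(X_1, \ldots, X_n)$ be the confidence interval, so that $P_\theta(\theta \in S(X_1, \ldots, X_n)) = 1 - \alpha$ for every $\theta \in \Theta$. Since $H_0: \theta = \theta_0$ is simple, the size of the test that rejects if and only if $\theta_0 \notin S(X_1, \ldots, X_n)$ is $P_{\theta_0}(\theta_0 \notin S(X_1, \ldots, X_n)) = 1 - (1 - \alpha) = \alpha$.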