Hypothesis testing

Some general concepts:
Null hypothesis, H0: a statement we "wish" to refute.
Alternative hypothesis, H1: the whole or part of the complement of H0.
Common case:
The statement is about an unknown parameter θ:
H0: θ ∈ ω
H1: θ ∈ Ω − ω (= Ω \ ω)
where ω is a well-defined subset of the parameter space Ω.
Simple hypothesis: ω (or Ω − ω) contains only one point (one single value)
Composite hypothesis: a hypothesis that is not simple
Critical region (rejection region)
A subset C of the sample space for the random sample X = (X1, …, Xn)
such that we reject H0 if X ∈ C (and accept — better phrase: do not
reject — H0 otherwise).
The complement of C, i.e. C̄, will be referred to as the acceptance region.
C is usually defined in terms of a statistic, T(X), called the test statistic.
Simple null and alternative hypotheses
Errors in hypothesis testing:
Type I error: rejecting a true H0
Type II error: accepting a false H0
Significance level α: the probability of a Type I error; also referred to
as the size of the test or the risk level
Risk of Type II error β: the probability of a Type II error
Power: the probability of rejecting a false H0, i.e. the probability of
the complement of a Type II error = 1 − β
Writing it more "mathematically":

$$\alpha = \Pr(\mathbf{X} \in C \mid H_0)$$

$$\beta = \Pr(\mathbf{X} \in \bar{C} \mid H_1) = 1 - \Pr(\mathbf{X} \in C \mid H_1)$$

$$\text{power} = 1 - \beta = \Pr(\mathbf{X} \in C \mid H_1)$$
Classical approach: Fix α and then find a test that makes β desirably
small.
A low value of α does not imply a low value of β; rather the contrary.
Most powerful test
A test which minimizes β for a fixed value of α is called a most
powerful test (or best test) of size α.
Neyman-Pearson lemma
x = (x1, …, xn) a random sample from a distribution with p.d.f. f(x; θ).
We wish to test
H0: θ = θ0 (simple hypothesis)
versus
H1: θ = θ1 (simple hypothesis)
Then the most powerful test of size α has a critical region of the form

$$\frac{L(\theta_1; \mathbf{x})}{L(\theta_0; \mathbf{x})} \geq A$$

where A is some non-negative constant.
Proof: See the course book.
Note! Both hypotheses are simple.
Example:
x = (x1, …, xn) a random sample from Exp(λ), i.e. with p.d.f.

$$f(x; \lambda) = \lambda^{-1} e^{-\lambda^{-1} x}; \quad x > 0$$

Test H0: λ = λ0 vs. H1: λ = λ1, where λ1 > λ0, with a test of size α.

$$L(\lambda; \mathbf{x}) = \lambda^{-n} e^{-\lambda^{-1} \sum x_i}$$

⇒ The critical region for a most powerful test is given by

$$\frac{L(\lambda_1; \mathbf{x})}{L(\lambda_0; \mathbf{x})} = \frac{\lambda_1^{-n} e^{-\lambda_1^{-1} \sum x_i}}{\lambda_0^{-n} e^{-\lambda_0^{-1} \sum x_i}} = \left(\frac{\lambda_1}{\lambda_0}\right)^{-n} e^{\left(\lambda_0^{-1} - \lambda_1^{-1}\right) \sum x_i} \geq A$$

$$\Leftrightarrow\; \left(\lambda_0^{-1} - \lambda_1^{-1}\right) \sum x_i \geq \ln A + n\left(\ln \lambda_1 - \ln \lambda_0\right)$$

$$\Leftrightarrow\; \sum x_i \geq \frac{\ln A + n\left(\ln \lambda_1 - \ln \lambda_0\right)}{\lambda_0^{-1} - \lambda_1^{-1}} = B$$

where the direction of the inequality is preserved since λ1 > λ0 ⇒ λ0⁻¹ − λ1⁻¹ > 0.

⇒ T(x) = Σ xi ≥ B

If λ1 had been < λ0, the critical region would instead be Σ xi ≤ B.
Logical, since λ = E(X).
How to find B?
If λ1 > λ0, then B must satisfy

$$\Pr\left(\sum_{i=1}^{n} X_i \geq B \;\middle|\; \lambda = \lambda_0\right) = \alpha$$

Use the result that a sum of Exp(λ)-distributed variables is
Gamma-distributed, Gamma(n, λ), i.e. with T(X) = Σ Xi:

$$f_T(t) = \frac{t^{n-1} e^{-t/\lambda_0}}{\lambda_0^{n}\, \Gamma(n)} \;\Rightarrow\; \int_B^{\infty} \frac{t^{n-1} e^{-t/\lambda_0}}{\lambda_0^{n}\, \Gamma(n)}\, dt = \alpha$$

Solve for B (numerically).
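The numerical solution for B can be sketched with scipy's Gamma distribution; the values of n, λ0 and α below are hypothetical, chosen only for illustration.

```python
# Sketch (not part of the course material): finding the critical value B
# numerically. Under H0, T = sum(X_i) ~ Gamma(n, scale = lambda_0).
from scipy import stats

n = 10        # sample size (assumed)
lam0 = 2.0    # lambda_0 under H0 (assumed)
alpha = 0.05  # size of the test

# Pr(T >= B | lambda_0) = alpha  <=>  B is the (1 - alpha)-quantile
B = stats.gamma.ppf(1 - alpha, a=n, scale=lam0)

# verify: the upper-tail probability at B equals alpha
print(B, stats.gamma.sf(B, a=n, scale=lam0))
```

Note that no root-finding is needed here: the integral equation is exactly the defining equation of the Gamma quantile function.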
If the sample x comes from a distribution belonging to the one-parameter
exponential family:

$$L(\theta; \mathbf{x}) = e^{A(\theta) \sum_{i=1}^{n} B(x_i) + \sum_{i=1}^{n} C(x_i) + nD(\theta)}$$

$$\Rightarrow\; \frac{L(\theta_1; \mathbf{x})}{L(\theta_0; \mathbf{x})} = e^{\left(A(\theta_1) - A(\theta_0)\right) \sum_{i=1}^{n} B(x_i) + n\left(D(\theta_1) - D(\theta_0)\right)}$$

If A(θ1) − A(θ0) > 0 ⇒ the critical region is T(x) = Σᵢ B(xi) ≥ C
If A(θ1) − A(θ0) < 0 ⇒ the critical region is T(x) = Σᵢ B(xi) ≤ C
"Pure significance tests"
Assume we wish to test H0: θ = θ0 with a test of size α.
The test statistic T(x) is observed to take the value t.
Case 1: H1: θ > θ0
The P-value is defined as Pr(T(x) ≥ t | H0)
Case 2: H1: θ < θ0
The P-value is defined as Pr(T(x) ≤ t | H0)
If the P-value is less than α, H0 is rejected.
Case 3: H1: θ ≠ θ0
The P-value is defined as the probability that T(x) is as extreme as the
observed value, allowing for extremes in both directions from H0.
In general:
Suppose we have only a null hypothesis, H0, which could specify
• the value of a parameter (as above)
• a particular distribution
• independence between two or more variables
• …
What matters is that H0 specifies something under which calculations
are feasible.
Given an observed test statistic T = t, the P-value is defined as
Pr(T is as extreme as t | H0)
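A one-sided P-value for the exponential example above (Case 1, H1: λ > λ0) can be sketched as follows; the sample and λ0 are hypothetical.

```python
# Sketch: one-sided P-value for H0: lambda = lambda_0 vs H1: lambda > lambda_0.
# Under H0, T = sum(X_i) ~ Gamma(n, scale = lambda_0).
from scipy import stats

x = [2.1, 3.5, 0.8, 4.2, 1.9, 2.7, 5.1, 0.6, 3.3, 2.4]  # hypothetical data
lam0 = 2.0
n = len(x)
t = sum(x)  # observed value of T(x)

# Case 1: P-value = Pr(T >= t | H0)
p_value = stats.gamma.sf(t, a=n, scale=lam0)
print(p_value)  # reject H0 at level alpha if p_value < alpha
```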
Uniformly most powerful tests (UMP)
Generalizations of some concepts to composite (null and) alternative
hypotheses:
H0: θ ∈ ω
H1: θ ∈ Ω − ω (= Ω \ ω)
Power function:

$$\pi(\theta) = \Pr(\mathbf{X} \in C \mid \theta), \text{ i.e. a function of } \theta$$

Size:

$$\alpha = \sup_{\theta \in \omega} \pi(\theta)$$

A test of size α is said to be uniformly most powerful (UMP) if

$$\pi(\theta) \geq \pi^{*}(\theta) \quad \forall\, \theta \in \Omega - \omega$$

where π*(θ) is the power function of any other test of size α.
If H0 is simple but H1 is composite and we have found a best test
(Neyman-Pearson) for H0 vs. H1′: θ = θ1, where θ1 ∈ Ω − ω, then
if this best test takes the same form for all θ1 ∈ Ω − ω, the test is UMP.
Univariate cases:
H0: θ = θ0 vs. H1: θ > θ0 (or H1: θ < θ0): a UMP test is usually found
H0: θ = θ0 vs. H1: θ ≠ θ0: a UMP test is usually not found
Unbiased test:
A test is said to be unbiased if π(θ) ≥ α for all θ ∈ Ω − ω.
Similar test:
A test is said to be similar if π(θ) = α for all θ ∈ ω.
Invariant test:
Assume that the hypotheses of a test are unchanged if a transformation of
the sample data is applied. If the critical region is not changed by this
transformation, the test is said to be invariant.
Consistent test:
Suppose the test depends on the sample size n, so that π(θ) = πn(θ).
If limn→∞ πn(θ) = 1 for θ ∈ Ω − ω, the test is said to be consistent.
Efficiency:
Consider two tests of the pair of simple hypotheses H0 and H1. If n1 and
n2 are the minimum sample sizes for tests 1 and 2, respectively, to achieve
size α and power 1 − β, then the relative efficiency of test 1 vs. test 2 is
defined as n2 / n1.
(Maximum) Likelihood Ratio Tests
Consider again that we wish to test
H0: θ ∈ ω
H1: θ ∈ Ω − ω (= Ω \ ω)
The Maximum Likelihood Ratio Test (MLRT) is defined as rejecting H0 if

$$\lambda = \frac{\max_{\theta \in \omega} L(\theta; \mathbf{x})}{\max_{\theta \in \Omega} L(\theta; \mathbf{x})} \leq A$$

• 0 ≤ λ ≤ 1
• For a simple H0, λ gives a UMP test
• The MLRT is asymptotically most powerful unbiased
• The MLRT is asymptotically similar
• The MLRT is asymptotically efficient
If H0 is simple, i.e. H0: θ = θ0, the MLRT simplifies to

$$\lambda = \frac{L(\theta_0; \mathbf{x})}{L(\hat{\theta}_{ML}; \mathbf{x})} \leq A$$
Example
x = (x1, …, xn) a random sample from Exp(λ).
H0: λ = λ0
H1: λ ≠ λ0

$$L(\lambda; \mathbf{x}) = \prod_{i=1}^{n} \lambda^{-1} e^{-\lambda^{-1} x_i} = \lambda^{-n} e^{-\lambda^{-1} \sum x_i}$$

λ̂ML = x̄ (according to earlier examples)

$$\Rightarrow\; \lambda = \frac{L(\lambda_0; \mathbf{x})}{L(\hat{\lambda}_{ML}; \mathbf{x})} = \frac{\lambda_0^{-n} e^{-\lambda_0^{-1} \sum x_i}}{\bar{x}^{-n} e^{-\bar{x}^{-1} \sum x_i}} = \left(\frac{\bar{x}}{\lambda_0}\right)^{n} e^{\,n - \lambda_0^{-1} \sum x_i} = \left(\frac{\bar{x}}{\lambda_0}\right)^{n} e^{-n\left(\bar{x}/\lambda_0 - 1\right)}$$

$$\Rightarrow\; \ln \lambda = n \ln \bar{x} - n \ln \lambda_0 - n\left(\frac{\bar{x}}{\lambda_0} - 1\right)$$
Sampling distribution of λ
Sometimes λ has a well-defined sampling distribution:
e.g. the critical region λ ≤ A can be shown to give an ordinary t-test
when the sample is from the normal distribution with unknown variance
and H0: μ = μ0.
Often, this is not the case.
Asymptotic result:
Under H0 it can be shown that −2 ln λ is asymptotically χ²-distributed
with d degrees of freedom, where d is the difference in the number of
estimated parameters (including "nuisance" parameters) between

$$\max_{\theta \in \omega} L(\theta; \mathbf{x}) \quad \text{and} \quad \max_{\theta \in \Omega} L(\theta; \mathbf{x})$$
Example Exp(λ), continued:

$$\ln \lambda = n \ln \bar{x} - n \ln \lambda_0 - n\left(\frac{\bar{x}}{\lambda_0} - 1\right)$$

d = 1, as we estimate 0 parameters in the numerator of λ
and 1 parameter (λ) in the denominator.

$$-2 \ln \lambda = 2n \ln \lambda_0 - 2n \ln \bar{x} + 2n\left(\frac{\bar{x}}{\lambda_0} - 1\right)$$

is asymptotically χ²₁-distributed when λ = λ0 (i.e. when H0 is true).
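The asymptotic MLRT above can be sketched numerically; the sample and λ0 are hypothetical.

```python
# Sketch: asymptotic MLRT for H0: lambda = lambda_0 in the Exp(lambda) example.
import math
from scipy import stats

x = [2.1, 3.5, 0.8, 4.2, 1.9, 2.7, 5.1, 0.6, 3.3, 2.4]  # hypothetical data
lam0 = 2.0
n = len(x)
xbar = sum(x) / n

# -2 ln(lambda) = 2n ln(lambda_0 / xbar) + 2n (xbar / lambda_0 - 1)
stat = 2 * n * math.log(lam0 / xbar) + 2 * n * (xbar / lam0 - 1)

# asymptotically chi^2 with d = 1 degree of freedom under H0
p_value = stats.chi2.sf(stat, df=1)
print(stat, p_value)
```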
Score tests
Test of H0: θ = θ0 vs. H1: θ ≠ θ0 (θ ∈ Ω − {θ0})
Test statistic:

$$\psi = \mathbf{u}^{T}(\theta_0)\, I(\theta_0)^{-1}\, \mathbf{u}(\theta_0)$$

where

$$\mathbf{u}(\theta_0) = \left( \frac{\partial l}{\partial \theta_1},\, \frac{\partial l}{\partial \theta_2},\, \ldots,\, \frac{\partial l}{\partial \theta_k} \right)^{T} \Bigg|_{\theta = \theta_0}$$

Under H0, ψ is asymptotically χ²_k-distributed, and the test is
asymptotically equivalent to the corresponding MLRT.
Wald tests
Test of H0: θ = θ0 vs. H1: θ ≠ θ0 (θ ∈ Ω − {θ0})
Test statistic:

$$\psi = \left(\hat{\theta}_{ML} - \theta_0\right)^{T} I\left(\hat{\theta}_{ML}\right) \left(\hat{\theta}_{ML} - \theta_0\right)$$

Under H0, ψ is asymptotically χ²_k-distributed, and the test is
asymptotically equivalent to the corresponding MLRT.
Score and Wald tests are particularly used in Generalized Linear Models.
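For the Exp(λ) example (k = 1) the two statistics take a simple closed form, which can be sketched as follows. The derivation of u(λ) = −n/λ + Σxᵢ/λ² and I(λ) = n/λ² is not in the slides; it follows from the log-likelihood l(λ) = −n ln λ − Σxᵢ/λ. Data are hypothetical.

```python
# Sketch: score and Wald statistics for H0: lambda = lambda_0, Exp(lambda).
from scipy import stats

x = [2.1, 3.5, 0.8, 4.2, 1.9, 2.7, 5.1, 0.6, 3.3, 2.4]  # hypothetical data
lam0 = 2.0
n = len(x)
xbar = sum(x) / n  # also the MLE of lambda

# u(lam0)^2 / I(lam0), with u(lam0) = n(xbar - lam0)/lam0^2, I(lam0) = n/lam0^2
score = n * (xbar - lam0) ** 2 / lam0 ** 2
# (lam_hat - lam0)^2 * I(lam_hat), with I(lam_hat) = n / xbar^2
wald = n * (xbar - lam0) ** 2 / xbar ** 2

# both are asymptotically chi^2_1-distributed under H0
p_score = stats.chi2.sf(score, df=1)
p_wald = stats.chi2.sf(wald, df=1)
print(score, p_score, wald, p_wald)
```

The two statistics differ only in where the information I is evaluated: at θ0 (score) or at the MLE (Wald).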
Confidence sets and confidence intervals
Definition:
Let x be a random sample from a distribution with p.d.f. f(x; θ), where θ
is an unknown parameter with parameter space Ω, i.e. θ ∈ Ω.
If SX is a subset of Ω, depending on X, such that

$$\Pr(\mathbf{X} : S_{\mathbf{X}} \ni \theta) = 1 - \alpha$$

then SX is said to be a confidence set for θ with confidence coefficient
(level) 1 − α.
For a one-dimensional parameter θ we rather refer to this set as a
confidence interval.
Pivotal quantities
A pivotal quantity is a function g of the unknown parameter θ and the
observations in the sample, i.e. g = g(x; θ), whose distribution is known
and independent of θ.
Examples:
x a random sample from N(μ, σ²):

$$\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \;\text{ is } N(0,1)\text{-distributed, and thus independent of } \mu \text{ and } \sigma^2$$

$$\frac{\bar{X} - \mu}{s / \sqrt{n}} \;\text{ is } t_{n-1}\text{-distributed, and thus independent of } \mu \text{ and } \sigma^2$$

$$\frac{(n-1)S^2}{\sigma^2} \;\text{ is } \chi^2_{n-1}\text{-distributed, and thus independent of } \mu \text{ and } \sigma^2$$
To obtain a confidence set from a pivotal quantity we write a probability
statement as

$$\Pr\left(g_1 \leq g(\mathbf{X}; \theta) \leq g_2\right) = 1 - \alpha \qquad (1)$$

For a one-dimensional θ and g monotonic, the probability statement can
be rewritten as

$$\Pr\left(\theta_1(\mathbf{X}) \leq \theta \leq \theta_2(\mathbf{X})\right) = 1 - \alpha$$

where the limits are now random variables, and the resulting observed
confidence interval becomes

$$\left(\theta_1(\mathbf{x}),\, \theta_2(\mathbf{x})\right)$$

For a k-dimensional θ the transformation of (1) to a confidence set is
more complicated but feasible.
In particular, a point estimator of θ is often used to construct the
pivotal quantity.
Example:
x a random sample from N(μ, σ²), with μ and σ² unknown.

$$\frac{\bar{X} - \mu}{s / \sqrt{n}} \;\text{ is } t_{n-1}\text{-distributed}$$

$$\Rightarrow\; \Pr\left(-t_{\alpha/2,\, n-1} \leq \frac{\bar{X} - \mu}{s/\sqrt{n}} \leq t_{\alpha/2,\, n-1}\right) = 1 - \alpha$$

$$\Rightarrow\; \Pr\left(\bar{X} - t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}} \leq \mu \leq \bar{X} + t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}\right) = 1 - \alpha$$

$$\Rightarrow\; \mu_1(\mathbf{X}) = \bar{X} - t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}} \quad \text{and} \quad \mu_2(\mathbf{X}) = \bar{X} + t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}$$

⇒ the 1 − α observed confidence interval for μ is

$$\left(\mu_1(\mathbf{x}),\, \mu_2(\mathbf{x})\right) = \left(\bar{x} - t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}},\; \bar{x} + t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}\right)$$
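As a numerical sketch, the t-interval can be computed with scipy; the sample below is hypothetical.

```python
# Sketch: t-based confidence interval for mu with unknown variance.
import math
from scipy import stats

x = [4.3, 5.1, 3.8, 6.0, 4.9, 5.5, 4.1, 5.2]  # hypothetical data
n = len(x)
alpha = 0.05
xbar = sum(x) / n
s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))  # sample sd

t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)  # t_{alpha/2, n-1}
lower = xbar - t_crit * s / math.sqrt(n)
upper = xbar + t_crit * s / math.sqrt(n)
print(lower, upper)
```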
n  1S 2
is χ n21 - distribute d
2
2
 2


n

1
S
2 

Pr 1 2 
  2   1  
2





2
 n  1S 2


n

1
S
2
  1
 Pr



2
 2



2
1


2


2


n

1
S
  2 X  
1

2
2
and
2


n

1
S
 2 X  
2
12
2
 1   observed confidence interval for  2 is

 12
x
,  22
 n  1s 2 n  1s 2 

 x    2 , 2
1 2 
  2

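The chi-square interval for σ² can likewise be sketched numerically; the sample is hypothetical, and χ²_{α/2} denotes the upper α/2 point as in the derivation above.

```python
# Sketch: chi-square confidence interval for sigma^2.
from scipy import stats

x = [4.3, 5.1, 3.8, 6.0, 4.9, 5.5, 4.1, 5.2]  # hypothetical data
n = len(x)
alpha = 0.05
xbar = sum(x) / n
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)  # sample variance

chi_hi = stats.chi2.ppf(1 - alpha / 2, df=n - 1)  # chi^2_{alpha/2, n-1}
chi_lo = stats.chi2.ppf(alpha / 2, df=n - 1)      # chi^2_{1-alpha/2, n-1}

lower = (n - 1) * s2 / chi_hi
upper = (n - 1) * s2 / chi_lo
print(lower, upper)
```

Note that the interval is not symmetric around s², since the chi-square distribution is skewed.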
Using the asymptotic normality of MLEs:
One-dimensional parameter θ:

$$\hat{\theta}_{ML} \sim N\left(\theta,\, I(\theta)^{-1}\right) \text{ (asymptotically)}$$

$$\Rightarrow\; \frac{\hat{\theta}_{ML} - \theta}{\sqrt{I(\theta)^{-1}}} \sim N(0, 1) \;\Rightarrow\; \Pr\left(-z_{\alpha/2} \leq \frac{\hat{\theta}_{ML} - \theta}{\sqrt{I(\theta)^{-1}}} \leq z_{\alpha/2}\right) = 1 - \alpha$$

⇒ an approximate 1 − α confidence interval for θ is

$$\left(\hat{\theta}_{ML} - z_{\alpha/2} \sqrt{I\left(\hat{\theta}_{ML}\right)^{-1}},\; \hat{\theta}_{ML} + z_{\alpha/2} \sqrt{I\left(\hat{\theta}_{ML}\right)^{-1}}\right)$$

k-dimensional parameter θ:

$$\hat{\boldsymbol{\theta}} \sim N\left(\boldsymbol{\theta},\, I(\boldsymbol{\theta})^{-1}\right) \text{ (asymptotically)}$$

$$\Rightarrow\; \left(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}\right)^{T} I(\boldsymbol{\theta}) \left(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}\right) \sim \chi^2_k$$

⇒ an ellipsoidal confidence set for θ can be constructed.
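For the one-dimensional case, the Exp(λ) example gives a concrete sketch: the MLE is x̄, and using I(λ) = n/λ² (derived from the log-likelihood, not stated in the slides) we get I(λ̂)⁻¹ = x̄²/n. Data are hypothetical.

```python
# Sketch: approximate normal-theory CI for lambda in Exp(lambda).
import math
from scipy import stats

x = [2.1, 3.5, 0.8, 4.2, 1.9, 2.7, 5.1, 0.6, 3.3, 2.4]  # hypothetical data
n = len(x)
alpha = 0.05
xbar = sum(x) / n                  # lambda_hat (MLE)
se = xbar / math.sqrt(n)           # sqrt(I(lambda_hat)^(-1)) = xbar / sqrt(n)
z = stats.norm.ppf(1 - alpha / 2)  # z_{alpha/2}

lower = xbar - z * se
upper = xbar + z * se
print(lower, upper)
```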
Construction of confidence intervals from hypothesis tests:
Assume a test of H0: θ = θ0 vs. H1: θ ≠ θ0 with critical region C(θ0).
Then a confidence set for θ with confidence coefficient 1 − α is

$$S_{\mathbf{X}} = \left\{\theta_0 : \mathbf{X} \in \bar{C}(\theta_0)\right\}$$

where C̄(θ0) is the acceptance region, i.e. the set of all θ0 that the
test does not reject.