Likelihood Ratio Test - Department of Applied Mathematics & Statistics

AMS 572 Lecture Notes
Oct. 3, 2013.
Topic 1: Let W ~ χ²_k. W is in fact a special Gamma random variable.

Gamma distribution

X ~ Gamma(r, λ) (some books use β = 1/λ) if

$$f(x) = \frac{\lambda^r}{\Gamma(r)} x^{r-1} e^{-\lambda x}, \qquad x > 0.$$

Since the density integrates to one,

$$1 = \int_0^\infty f(x)\,dx = \int_0^\infty \frac{\lambda^r}{\Gamma(r)} x^{r-1} e^{-\lambda x}\,dx \quad\Rightarrow\quad \Gamma(r) = \lambda^r \int_0^\infty x^{r-1} e^{-\lambda x}\,dx.$$

If r is a positive integer, then Γ(r) = (r − 1)!.

The moment generating function is

$$M_X(t) = \left(1 - \frac{t}{\lambda}\right)^{-r} = \left(\frac{\lambda}{\lambda - t}\right)^{r}, \qquad t < \lambda,$$

and therefore

$$E(X) = \frac{r}{\lambda}, \qquad \mathrm{Var}(X) = \frac{r}{\lambda^2}.$$

Special case: when r = k/2 and λ = 1/2, X ~ χ²_k.
Special case: when r = 1, X ~ exp(λ).
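As a quick numerical check, here is a minimal Python sketch (assuming NumPy and SciPy are available; note that SciPy parameterizes the Gamma by shape a = r and scale = 1/λ) confirming that the χ²_k density is exactly the Gamma(r = k/2, λ = 1/2) density:

```python
import numpy as np
from scipy import stats

k = 5
x = np.linspace(0.1, 20, 200)

# chi-square(k) pdf vs. Gamma(a = k/2, scale = 1/lambda = 2) pdf
print(np.allclose(stats.chi2.pdf(x, df=k), stats.gamma.pdf(x, a=k/2, scale=2)))  # True

# E(X) = r/lambda = k and Var(X) = r/lambda^2 = 2k
print(stats.gamma.mean(a=k/2, scale=2), stats.gamma.var(a=k/2, scale=2))  # 5.0 10.0
```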
Review

X ~ exp(λ):
p.d.f. f(x) = λe^{−λx}, x > 0
m.g.f. M_X(t) = λ/(λ − t)

e.g. Let X_i ~ exp(λ) i.i.d., i = 1, ..., n. What is the distribution of $\sum_{i=1}^{n} X_i$?
Solution

$$M_{\sum_{i=1}^{n} X_i}(t) = \prod_{i=1}^{n} M_{X_i}(t) = \left(\frac{\lambda}{\lambda - t}\right)^{n} \quad\Rightarrow\quad \sum_{i=1}^{n} X_i \sim \mathrm{Gamma}(r = n, \lambda)$$
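This closure property is easy to confirm by simulation; below is a minimal sketch (assuming NumPy and SciPy) comparing simulated sums against the Gamma(n, λ) distribution with a Kolmogorov-Smirnov test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, lam = 4, 1.5

# 100,000 replications of the sum of n i.i.d. exponential(lambda) variables
sums = rng.exponential(scale=1/lam, size=(100_000, n)).sum(axis=1)

# Compare with Gamma(a = n, scale = 1/lambda); a large p-value is consistent
print(stats.kstest(sums, stats.gamma(a=n, scale=1/lam).cdf))
```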
e.g. Let W ~ χ²_k. What is the mgf of W?

Solution

$$M_W(t) = \left(\frac{1/2}{1/2 - t}\right)^{k/2} = \left(\frac{1}{1 - 2t}\right)^{k/2}$$
e.g. Let W₁ ~ χ²_{k₁}, W₂ ~ χ²_{k₂}, and W₁ and W₂ are independent. What is the distribution of W₁ + W₂?

Solution

$$M_{W_1+W_2}(t) = M_{W_1}(t) \cdot M_{W_2}(t) = \left(\frac{1}{1-2t}\right)^{k_1/2}\left(\frac{1}{1-2t}\right)^{k_2/2} = \left(\frac{1}{1-2t}\right)^{(k_1+k_2)/2}$$

$$\Rightarrow\quad W_1 + W_2 \sim \chi^2_{k_1+k_2}$$
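The same simulation idea verifies the additivity of independent chi-square variables (a minimal sketch, assuming NumPy and SciPy):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
k1, k2 = 3, 7

# Sum of independent chi-square(k1) and chi-square(k2) draws
w = rng.chisquare(k1, 100_000) + rng.chisquare(k2, 100_000)

# Should be consistent with chi-square(k1 + k2)
print(stats.kstest(w, stats.chi2(df=k1 + k2).cdf))
```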
Topic 2. Methods for deriving point estimators
Maximum Likelihood Estimator (MLE)
Method Of Moment Estimator (MOME)
Example. X_i ~ N(μ, σ²) i.i.d., i = 1, ..., n.
1. Derive the MLE for μ and σ².
2. Derive the MOME for μ and σ².
Solution.
1. MLE
[i] The density of each observation is

$$f(x_i) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x_i-\mu)^2}{2\sigma^2}} = (2\pi\sigma^2)^{-\frac{1}{2}} \exp\left[-\frac{(x_i-\mu)^2}{2\sigma^2}\right], \qquad x_i \in \mathbb{R},\ i = 1, \ldots, n$$

[ii] likelihood function

$$L = f(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} f(x_i) = \prod_{i=1}^{n} (2\pi\sigma^2)^{-\frac{1}{2}} \exp\left[-\frac{(x_i-\mu)^2}{2\sigma^2}\right] = (2\pi\sigma^2)^{-\frac{n}{2}} \exp\left[-\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{2\sigma^2}\right]$$

[iii] log likelihood function

$$l = \ln L = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{\sum_{i=1}^{n}(x_i-\mu)^2}{2\sigma^2}$$

[iv] Set the partial derivatives to zero:

$$\frac{\partial l}{\partial \mu} = \frac{\sum_{i=1}^{n}(x_i-\mu)}{\sigma^2} = 0, \qquad \frac{\partial l}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{\sum_{i=1}^{n}(x_i-\mu)^2}{2\sigma^4} = 0$$

$$\Rightarrow\quad \hat{\mu} = \bar{X}, \qquad \hat{\sigma}^2 = \frac{\sum_{i=1}^{n}(X_i-\bar{X})^2}{n}$$
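These closed forms can be checked against a direct numerical maximization of the log likelihood; here is a minimal sketch (assuming NumPy and SciPy):

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(2)
x = rng.normal(loc=10, scale=3, size=500)

def neg_log_lik(theta):
    mu, sigma2 = theta
    return -np.sum(stats.norm.logpdf(x, loc=mu, scale=np.sqrt(sigma2)))

# Numerical MLE of (mu, sigma^2), keeping sigma^2 positive
res = optimize.minimize(neg_log_lik, x0=[0.0, 1.0],
                        bounds=[(None, None), (1e-6, None)])
print(res.x)                    # numerical maximizer
print(x.mean(), x.var(ddof=0))  # closed form: Xbar and Sum(Xi - Xbar)^2 / n
```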

2. MOME

Order | Population Moment | Sample Moment
1st   | E(X)              | (X₁ + X₂ + ... + X_n)/n
2nd   | E(X²)             | (X₁² + X₂² + ... + X_n²)/n
kth   | E(X^k)            | (X₁^k + X₂^k + ... + X_n^k)/n

Matching the first two moments:

$$E(X) = \mu = \bar{X}, \qquad E(X^2) = \mu^2 + \sigma^2 = \frac{\sum_{i=1}^{n} X_i^2}{n}$$

$$\Rightarrow\quad \hat{\mu} = \bar{X}, \qquad \hat{\sigma}^2 = \frac{\sum_{i=1}^{n} X_i^2}{n} - (\bar{X})^2$$
We can also see that:

$$\frac{\sum_{i=1}^{n} X_i^2}{n} - (\bar{X})^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X} + \bar{X})^2}{n} - (\bar{X})^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2 + 2\bar{X}\sum_{i=1}^{n}(X_i - \bar{X}) + n(\bar{X})^2}{n} - (\bar{X})^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n},$$

where the cross term vanishes because $\sum_{i=1}^{n}(X_i - \bar{X}) = 0$. Therefore, the MLE and MOME of σ² are the same for the normal population.
$$E(\hat{\sigma}^2) = E\left[\frac{\sum (X_i - \bar{X})^2}{n}\right] = \frac{n-1}{n} E\left[\frac{\sum (X_i - \bar{X})^2}{n-1}\right] = \frac{n-1}{n} E(S^2) = \frac{n-1}{n}\sigma^2 \to \sigma^2 \quad \text{(asymptotically unbiased)}$$
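Both the identity above and the bias factor (n − 1)/n are easy to verify by simulation; a minimal sketch (assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma2 = 10, 4.0

# The MOME and MLE forms of sigma^2-hat agree on any sample
x = rng.normal(0, np.sqrt(sigma2), size=n)
print(np.isclose((x**2).mean() - x.mean()**2, ((x - x.mean())**2).mean()))  # True

# Averaging the MLE over many samples approaches (n-1)/n * sigma^2, not sigma^2
samples = rng.normal(0, np.sqrt(sigma2), size=(200_000, n))
print(samples.var(axis=1, ddof=0).mean(), (n - 1) / n * sigma2)  # both near 3.6
```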
Example. Let X_i ~ Bernoulli(p) i.i.d., i = 1, ..., n. Please derive:
1. The MLE of p.
2. The MOME of p.

Solution.

1. MLE

[i] $f(x_i) = p^{x_i}(1-p)^{1-x_i}$, i = 1, ..., n

[ii] $L = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i} = p^{\sum x_i}(1-p)^{n-\sum x_i}$

[iii] $l = \ln L = \left(\sum x_i\right)\ln p + \left(n - \sum x_i\right)\ln(1-p)$

[iv] $\frac{dl}{dp} = \frac{\sum x_i}{p} - \frac{n - \sum x_i}{1-p} = 0 \quad\Rightarrow\quad \hat{p} = \frac{\sum x_i}{n}$

2. MOME

$$E(X) = p = \frac{X_1 + X_2 + \cdots + X_n}{n} \quad\Rightarrow\quad \hat{p} = \frac{\sum x_i}{n}$$
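A minimal grid-search sketch (assuming NumPy) showing that the Bernoulli log likelihood peaks at the sample proportion:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.binomial(1, 0.3, size=200)

# Evaluate l(p) = (sum x) ln p + (n - sum x) ln(1-p) on a grid of p values
p_grid = np.linspace(0.01, 0.99, 981)
log_lik = x.sum() * np.log(p_grid) + (len(x) - x.sum()) * np.log(1 - p_grid)
print(p_grid[log_lik.argmax()], x.mean())  # grid maximizer matches p-hat = Xbar
```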
Topic 3. Likelihood Ratio Test (for one population mean, normal
population)
1. Please derive the likelihood ratio test for H0: μ = μ0 versus Ha: μ ≠ μ0, when the
population is normal and population variance σ2 is known.
Solution:

First we write down the two parameter spaces as follows:
(1) The parameter space under H0: ω = {μ : μ = μ₀}, and
(2) The unrestricted parameter space: Ω = {μ : −∞ < μ < ∞}.
The likelihoods are:

(1) The likelihood under H0:

$$L(\omega) = L(\mu_0) = f(x_1, x_2, \ldots, x_n; \mu_0) = \prod_{i=1}^{n} f(x_i; \mu_0) = \left(\frac{1}{\sqrt{2\pi}\sigma}\right)^{n} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu_0)^2\right]$$

The MLE for μ under H0 is μ̂_ω = μ₀. Thus the maximum value of the likelihood L(ω) is L(ω̂) = L(μ̂_ω) = L(μ₀).
(2) The unrestricted likelihood is:

$$L(\Omega) = L(\mu) = f(x_1, x_2, \ldots, x_n; \mu) = \prod_{i=1}^{n} f(x_i; \mu) = \left(\frac{1}{\sqrt{2\pi}\sigma}\right)^{n} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right]$$

There is only one free parameter, μ, in L(Ω). Now we shall derive the MLE for μ; that is, we will find the value of μ that maximizes the log likelihood

$$\ln L(\Omega) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2.$$

By solving $\frac{d \ln L(\Omega)}{d\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0$, we have μ̂ = x̄.

It is easy to verify that μ̂ = x̄ indeed maximizes the log likelihood, and thus the likelihood function. So it is the MLE. Thus the maximum value of the likelihood L(Ω) is L(Ω̂) = L(μ̂) = L(x̄).

(***Note: the unrestricted MLE is simply the usual MLE we discussed in the point estimation section. Therefore, to simplify the notation, we shall not insert any subscript for the point estimators under Ω.***)
Therefore the likelihood ratio is:

$$\lambda = \frac{L(\hat{\omega})}{L(\hat{\Omega})} = \frac{L(\mu_0)}{L(\bar{x})} = \frac{\left(\frac{1}{\sqrt{2\pi}\sigma}\right)^{n} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu_0)^2\right]}{\left(\frac{1}{\sqrt{2\pi}\sigma}\right)^{n} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \bar{x})^2\right]} = \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left[(x_i - \mu_0)^2 - (x_i - \bar{x})^2\right]\right\}$$

Since $\sum_{i=1}^{n}(x_i - \mu_0)^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 + n(\bar{x} - \mu_0)^2$, this reduces to

$$\lambda = \exp\left[-\frac{1}{2}\left(\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\right)^2\right] = \exp\left(-\frac{1}{2}z_0^2\right)$$

Therefore, the likelihood ratio test that will reject H0 when λ ≤ λ* is equivalent to the z-test that will reject H0 when |Z₀| ≥ c, where c can be determined by the significance level α as c = z_{α/2}.
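Since λ = exp(−z₀²/2) is strictly decreasing in |z₀|, the two rejection rules pick out exactly the same samples. A minimal numerical sketch (assuming NumPy and SciPy):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
mu0, sigma, n = 0.0, 2.0, 25
x = rng.normal(0.5, sigma, size=n)  # data generated with mean 0.5, so H0 is false

z0 = (x.mean() - mu0) / (sigma / np.sqrt(n))
lam = np.exp(-0.5 * np.sum((x - mu0)**2 - (x - x.mean())**2) / sigma**2)
print(np.isclose(lam, np.exp(-0.5 * z0**2)))  # True: lambda = exp(-z0^2 / 2)

alpha = 0.05
c = stats.norm.ppf(1 - alpha / 2)                # z_{alpha/2}
print(abs(z0) >= c, lam <= np.exp(-0.5 * c**2))  # identical test decisions
```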
2. Please derive the likelihood ratio test for H0: μ = μ0 versus Ha: μ ≠ μ0, when the
population is normal and population variance σ2 is unknown.
Solution: Now that we have two unknown parameters, the parameter spaces are:

$$\omega = \{(\mu, \sigma^2) : \mu = \mu_0,\ 0 < \sigma^2 < \infty\} \quad \text{and} \quad \Omega = \{(\mu, \sigma^2) : -\infty < \mu < \infty,\ 0 < \sigma^2 < \infty\}$$
The likelihood under the null hypothesis is:

$$L(\omega) = L(\mu_0, \sigma^2) = (2\pi\sigma^2)^{-\frac{n}{2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu_0)^2\right]$$

There is one free parameter, σ², in L(ω). Now we shall find the value of σ² that maximizes the log likelihood

$$\ln L(\omega) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu_0)^2.$$

By solving

$$\frac{d \ln L(\omega)}{d\sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu_0)^2 = 0,$$

we have

$$\hat{\sigma}^2_\omega = \frac{1}{n}\sum_{i=1}^{n}(X_i - \mu_0)^2$$

It is easy to verify that this solution indeed maximizes the log likelihood, and thus the likelihood function.
The unrestricted likelihood is:

$$L(\Omega) = L(\mu, \sigma^2) = (2\pi\sigma^2)^{-\frac{n}{2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right]$$

There are two free parameters, μ and σ², in L(Ω). Now we shall find the values of μ and σ² that maximize the log likelihood

$$\ln L(\Omega) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2.$$

By solving the equation system:

$$\frac{\partial \ln L(\Omega)}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0 \quad \text{and} \quad \frac{\partial \ln L(\Omega)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i - \mu)^2 = 0,$$

we have μ̂ = X̄ and

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$$

It is easy to verify that this solution indeed maximizes the log likelihood, and thus the likelihood function.
Therefore the likelihood ratio is:

$$\lambda = \frac{L(\hat{\omega})}{L(\hat{\Omega})} = \frac{\max_{\sigma^2} L(\mu_0, \sigma^2)}{\max_{\mu, \sigma^2} L(\mu, \sigma^2)} = \frac{L(\mu_0, \hat{\sigma}^2_\omega)}{L(\hat{\mu}, \hat{\sigma}^2)}$$

Plugging each MLE of σ² into its own likelihood makes the exponent equal to −n/2 in both the numerator and the denominator:

$$\lambda = \frac{\left(\frac{n}{2\pi\sum_{i=1}^{n}(x_i-\mu_0)^2}\right)^{\frac{n}{2}} e^{-\frac{n}{2}}}{\left(\frac{n}{2\pi\sum_{i=1}^{n}(x_i-\bar{x})^2}\right)^{\frac{n}{2}} e^{-\frac{n}{2}}} = \left[\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\mu_0)^2}\right]^{\frac{n}{2}}$$

Since $\sum_{i=1}^{n}(x_i-\mu_0)^2 = \sum_{i=1}^{n}(x_i-\bar{x})^2 + n(\bar{x}-\mu_0)^2$,

$$\lambda = \left[\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2 + n(\bar{x}-\mu_0)^2}\right]^{\frac{n}{2}} = \left[1 + \frac{n(\bar{x}-\mu_0)^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}\right]^{-\frac{n}{2}} = \left[1 + \frac{t_0^2}{n-1}\right]^{-\frac{n}{2}}$$

Therefore, the likelihood ratio test that will reject H0 when λ ≤ λ* is equivalent to the t-test that will reject H0 when |t₀| ≥ c, where c can be determined by the significance level α as c = t_{n−1, α/2}.
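A minimal numerical sketch (assuming NumPy and SciPy) checking the algebra, with t₀ taken from SciPy's one-sample t-test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
mu0, n = 0.0, 20
x = rng.normal(0.8, 2.0, size=n)

s2_omega = np.mean((x - mu0)**2)       # restricted MLE of sigma^2 (mu = mu0)
s2_Omega = np.mean((x - x.mean())**2)  # unrestricted MLE of sigma^2
lam = (s2_Omega / s2_omega)**(n / 2)

t0, _ = stats.ttest_1samp(x, popmean=mu0)
print(np.isclose(lam, (1 + t0**2 / (n - 1))**(-n / 2)))  # True
```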
3. Optimal property of the Likelihood Ratio Test

Theorem (Neyman-Pearson Lemma). The likelihood ratio test for a simple null hypothesis H₀: θ = θ′ versus a simple alternative hypothesis Hₐ: θ = θ′′ is a most powerful test.
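As an illustration (a minimal simulation sketch, assuming NumPy and SciPy): for H₀: μ = 0 versus Hₐ: μ = 1 with a normal population and known σ, the likelihood ratio is monotone in x̄, so the LRT is the one-sided z-test; the lemma says no other level-α test can have higher power than the value estimated below.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, sigma, mu_a, alpha = 16, 2.0, 1.0, 0.05
c = stats.norm.ppf(1 - alpha)  # critical value of the one-sided level-alpha z-test

# Sampling distribution of xbar under the simple alternative mu = mu_a
xbar = rng.normal(mu_a, sigma / np.sqrt(n), size=100_000)
z = (xbar - 0.0) / (sigma / np.sqrt(n))

power_sim = np.mean(z >= c)
power_exact = 1 - stats.norm.cdf(c - (mu_a - 0.0) / (sigma / np.sqrt(n)))
print(power_sim, power_exact)  # both near 0.64
```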
[Portraits: Egon Sharpe Pearson (August 11, 1895 – June 12, 1980); Karl Pearson (March 27, 1857 – April 27, 1936); Maria Pearson (née Sharpe); Sigrid Letitia Sharpe Pearson.]
In 1901, with Weldon and Galton, Karl Pearson founded the journal Biometrika
whose object was the development of statistical theory. He edited this journal until
his death. In 1911, Karl Pearson founded the world's first university statistics
department at University College London. His only son, Egon Pearson, became
an eminent statistician himself, establishing the Neyman-Pearson lemma. He
succeeded his father as head of the Applied Statistics Department at University
College. http://en.wikipedia.org/wiki/Karl_Pearson
Jerzy Neyman (April 16, 1894 – August 5, 1981) is best known for the Neyman-Pearson Lemma. He also developed the concept of the confidence interval (1937) and contributed significantly to sampling theory. He published many books dealing with experiments and statistics, and devised the way in which the FDA tests medicines today. Jerzy Neyman was also the founder of the Department of Statistics at the University of California, Berkeley, in 1955.
http://en.wikipedia.org/wiki/Jerzy_Neyman
Note: Under certain conditions, the likelihood ratio test is also a uniformly most powerful test for a simple null hypothesis versus a composite alternative hypothesis, or for a composite null hypothesis versus a composite alternative hypothesis. (See, for example, the Karlin-Rubin theorem.)
Topic 4. Likelihood Ratio Test (for two population means, both populations are normal, the population variances are unknown but equal, and we have two independent samples)

Derivation of the Pooled Variance T-Test (2-sided test)

Suppose we have two independent random samples from two normal populations with equal but unknown variances. We now derive the likelihood ratio test for:

H0: μ1 = μ2 vs Ha: μ1 ≠ μ2

Let μ1 = μ2 = μ under H0. Then

$$\omega = \{-\infty < \mu_1 = \mu_2 = \mu < +\infty,\ 0 < \sigma^2 < +\infty\}, \qquad \Omega = \{-\infty < \mu_1, \mu_2 < +\infty,\ 0 < \sigma^2 < +\infty\}$$
$$L(\omega) = L(\mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{\frac{n_1+n_2}{2}} \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu)^2 + \sum_{j=1}^{n_2}(y_j - \mu)^2\right)\right],$$

and there are two parameters.

$$\ln L(\omega) = -\frac{n_1+n_2}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu)^2 + \sum_{j=1}^{n_2}(y_j - \mu)^2\right)$$

Since it contains two parameters, we take the partial derivatives with respect to μ and σ² respectively and set them equal to 0. Then we have:

$$\hat{\mu} = \frac{\sum_{i=1}^{n_1} x_i + \sum_{j=1}^{n_2} y_j}{n_1 + n_2} = \frac{n_1\bar{x} + n_2\bar{y}}{n_1 + n_2}, \qquad \hat{\sigma}^2_\omega = \frac{1}{n_1 + n_2}\left[\sum_{i=1}^{n_1}(x_i - \hat{\mu})^2 + \sum_{j=1}^{n_2}(y_j - \hat{\mu})^2\right]$$
$$L(\Omega) = L(\mu_1, \mu_2, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{\frac{n_1+n_2}{2}} \exp\left[-\frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu_1)^2 + \sum_{j=1}^{n_2}(y_j - \mu_2)^2\right)\right],$$

and there are three parameters.

$$\ln L(\Omega) = -\frac{n_1+n_2}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left(\sum_{i=1}^{n_1}(x_i - \mu_1)^2 + \sum_{j=1}^{n_2}(y_j - \mu_2)^2\right)$$

We take the partial derivatives with respect to μ₁, μ₂, and σ² respectively and set them all equal to 0. Then we have:

$$\hat{\mu}_1 = \bar{x}, \qquad \hat{\mu}_2 = \bar{y}, \qquad \hat{\sigma}^2_\Omega = \frac{1}{n_1 + n_2}\left[\sum_{i=1}^{n_1}(x_i - \bar{x})^2 + \sum_{j=1}^{n_2}(y_j - \bar{y})^2\right]$$
At this point we have estimated all the parameters. Then, after some cancellations/simplifications (as before, the exponential terms in the numerator and denominator both reduce to $e^{-(n_1+n_2)/2}$), we have:

$$\lambda = \frac{L(\hat{\omega})}{L(\hat{\Omega})} = \frac{\left(\frac{1}{2\pi\hat{\sigma}^2_\omega}\right)^{\frac{n_1+n_2}{2}}}{\left(\frac{1}{2\pi\hat{\sigma}^2_\Omega}\right)^{\frac{n_1+n_2}{2}}} = \left[\frac{\hat{\sigma}^2_\Omega}{\hat{\sigma}^2_\omega}\right]^{\frac{n_1+n_2}{2}} = \left[\frac{\sum_{i=1}^{n_1}(x_i - \bar{x})^2 + \sum_{j=1}^{n_2}(y_j - \bar{y})^2}{\sum_{i=1}^{n_1}\left(x_i - \frac{n_1\bar{x}+n_2\bar{y}}{n_1+n_2}\right)^2 + \sum_{j=1}^{n_2}\left(y_j - \frac{n_1\bar{x}+n_2\bar{y}}{n_1+n_2}\right)^2}\right]^{\frac{n_1+n_2}{2}}$$

Using the identity

$$\sum_{i=1}^{n_1}(x_i - \hat{\mu})^2 + \sum_{j=1}^{n_2}(y_j - \hat{\mu})^2 = \sum_{i=1}^{n_1}(x_i - \bar{x})^2 + \sum_{j=1}^{n_2}(y_j - \bar{y})^2 + \frac{n_1 n_2}{n_1+n_2}(\bar{x} - \bar{y})^2,$$

we obtain

$$\lambda = \left[1 + \frac{t_0^2}{n_1 + n_2 - 2}\right]^{-\frac{n_1+n_2}{2}}$$

where t₀ is the test statistic in the pooled variance t-test. Therefore, λ ≤ λ* is equivalent to |t₀| ≥ c. Thus at the significance level α, we reject the null hypothesis in favor of the alternative when |t₀| ≥ c = t_{n₁+n₂−2, α/2}.
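A minimal numerical sketch (assuming NumPy and SciPy) checking this relation, with t₀ taken from SciPy's pooled two-sample t-test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
x = rng.normal(0.0, 2.0, size=12)
y = rng.normal(1.0, 2.0, size=15)
n1, n2 = len(x), len(y)

mu_hat = (n1 * x.mean() + n2 * y.mean()) / (n1 + n2)  # pooled MLE of mu under H0
s2_omega = (np.sum((x - mu_hat)**2) + np.sum((y - mu_hat)**2)) / (n1 + n2)
s2_Omega = (np.sum((x - x.mean())**2) + np.sum((y - y.mean())**2)) / (n1 + n2)
lam = (s2_Omega / s2_omega)**((n1 + n2) / 2)

t0, _ = stats.ttest_ind(x, y, equal_var=True)  # pooled variance t statistic
print(np.isclose(lam, (1 + t0**2 / (n1 + n2 - 2))**(-(n1 + n2) / 2)))  # True
```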