Statistics 512 Notes 13: Properties of Maximum Likelihood Estimates
L( )  L( ; X1 , , X n )  f ( X1 , , X n )
ˆMLE  max L( ; X1, , X n )  max l ( ; X1,
, Xn )
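As a concrete illustration (a minimal sketch in Python, not part of the notes; the exponential model and the numbers below are assumptions chosen for illustration), the MLE can be computed numerically by minimizing the negative log likelihood:

# A minimal sketch: numerical MLE by maximizing l(theta) = sum_i log f(X_i; theta).
# Assumed model for illustration: Exponential with rate theta, f(x; theta) = theta * exp(-theta * x).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.exponential(scale=1 / 2.0, size=500)   # simulated data, true rate theta_0 = 2

def neg_log_lik(theta):
    # negative log likelihood; minimizing it is equivalent to maximizing l(theta)
    return -(len(x) * np.log(theta) - theta * x.sum())

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 50), method="bounded")
print(res.x)          # numerical MLE of the rate
print(1 / x.mean())   # closed-form MLE (n / sum of X_i), for comparison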
Good properties of maximum likelihood estimates:
(1) Invariance
(2) Consistency
(3) Asymptotic Normality
(4) Efficiency
Invariance (Theorem 6.1.2): Let $X_1, \ldots, X_n$ be iid with the pdf $f(x; \theta)$, $\theta \in \Omega$. For a specified function $g$, let $\eta = g(\theta)$ be a parameter of interest. Suppose $\hat{\theta}$ is the mle of $\theta$. Then $g(\hat{\theta})$ is the mle of $\eta = g(\theta)$.
Proof: For each $\eta$ in the range of $g$, define the set
$g^{-1}(\eta) = \{\theta : g(\theta) = \eta\}$.
The maximum of $L(\theta)$ occurs at $\hat{\theta}$, and the domain of $g$ is $\Omega$, which contains $\hat{\theta}$. Hence $\hat{\theta}$ lies in one of these sets and, in fact, can lie in only one of them. Hence, to maximize $L(\theta)$, choose $\hat{\eta}$ so that $g^{-1}(\hat{\eta})$ is the unique set containing $\hat{\theta}$. Then $\hat{\eta} = g(\hat{\theta})$.
Example: $X_1, \ldots, X_n$ iid Bernoulli($p$). The large-sample confidence interval for $p$ is
$\hat{p} \pm 1.96 \sqrt{\widehat{\mathrm{Var}}(\hat{p})}$.
The MLE of $p$ is $\hat{p}_{MLE} = \frac{\sum_{i=1}^{n} X_i}{n}$.
The variance of the MLE is $\mathrm{Var}(\hat{p}_{MLE}) = \frac{p(1-p)}{n}$.
By invariance, the maximum likelihood estimate of $\mathrm{Var}(\hat{p}_{MLE})$ is $\frac{\hat{p}(1-\hat{p})}{n}$, so the interval becomes $\hat{p} \pm 1.96 \sqrt{\hat{p}(1-\hat{p})/n}$.
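A brief Python sketch of this example (the simulated data, true $p$, and sample size are assumed for illustration):

# Bernoulli example: the MLE p_hat, the plug-in (MLE) estimate of Var(p_hat),
# and the large-sample 95% confidence interval p_hat +/- 1.96 * sqrt(p_hat * (1 - p_hat) / n).
import numpy as np

rng = np.random.default_rng(2)
x = rng.binomial(1, 0.4, size=1000)
n = len(x)

p_hat = x.mean()                         # MLE of p
var_hat = p_hat * (1 - p_hat) / n        # MLE of Var(p_hat), by invariance
half_width = 1.96 * np.sqrt(var_hat)
print(p_hat, (p_hat - half_width, p_hat + half_width))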
Consistency:
Consistency means that the MLE converges in probability to the true value. To proceed, we need a definition. If $f$ and $g$ are pdfs, define the Kullback-Leibler distance between $f$ and $g$ to be
$D(f, g) = \int f(x) \log\left(\frac{f(x)}{g(x)}\right) dx$
It can be shown that $D(f, g) \geq 0$ and $D(f, f) = 0$. For any $\theta, \theta'$, write $D(\theta, \theta')$ to mean $D(f(x; \theta), f(x; \theta'))$.
We say that the model is identifiable if $\theta \neq \theta'$ implies $D(\theta, \theta') > 0$. This means that different values of the parameter correspond to different distributions. We will assume that the model is identifiable.
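As an illustration (a sketch with assumed normal densities, not part of the notes), the Kullback-Leibler distance can be evaluated by numerical integration:

# Kullback-Leibler distance D(f, g) = integral of f(x) * log(f(x) / g(x)) dx,
# computed numerically for two normal densities.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def kl(f, g):
    integrand = lambda x: f.pdf(x) * (f.logpdf(x) - g.logpdf(x))
    return quad(integrand, -np.inf, np.inf)[0]

f = norm(loc=0, scale=1)
g = norm(loc=1, scale=1)
print(kl(f, g))   # 0.5 for N(0,1) vs N(1,1); positive, consistent with D(f, g) >= 0
print(kl(f, f))   # 0, consistent with D(f, f) = 0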
Let 0 denote the true value of  . Let ln ( ) denote the log
likelihood of  based on an iid sample X 1 , , X n .
Maximizing ln ( ) is equivalent to maximizing
f (X )
1
M n ( )   log  i .
n i
f0 ( X i )
By the law of large numbers, $M_n(\theta)$ converges to
$E_{\theta_0}\left[\log \frac{f_{\theta}(X_i)}{f_{\theta_0}(X_i)}\right] = \int \log\left(\frac{f_{\theta}(x)}{f_{\theta_0}(x)}\right) f_{\theta_0}(x)\, dx = -\int \log\left(\frac{f_{\theta_0}(x)}{f_{\theta}(x)}\right) f_{\theta_0}(x)\, dx = -D(\theta_0, \theta)$.
Hence, $M_n(\theta) \approx -D(\theta_0, \theta)$, which is maximized at $\theta_0$ since $-D(\theta_0, \theta_0) = 0$ and $-D(\theta_0, \theta) < 0$ for $\theta \neq \theta_0$. Therefore, we expect that the maximizer will tend to $\theta_0$. To prove this formally, we need more than $M_n(\theta) \xrightarrow{P} -D(\theta_0, \theta)$. We need this convergence to be uniform over $\theta$. We also have to make sure that the function $D(\theta_0, \theta)$ is well behaved.
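A simulation sketch of this heuristic (the Bernoulli model, sample size, and grid of $\theta$ values are assumptions for illustration): $M_n(\theta)$ stays close to $M(\theta) = -D(\theta_0, \theta)$ and is maximized near $\theta_0$.

# For Bernoulli(theta_0 = 0.4), compare M_n(theta) = (1/n) sum_i log(f_theta(X_i) / f_theta0(X_i))
# with its limit -D(theta_0, theta); both are maximized (at value 0) at theta_0.
import numpy as np

rng = np.random.default_rng(3)
theta0 = 0.4
x = rng.binomial(1, theta0, size=5000)
thetas = np.linspace(0.05, 0.95, 19)

def log_f(x, theta):
    return x * np.log(theta) + (1 - x) * np.log(1 - theta)

M_n = np.array([np.mean(log_f(x, t) - log_f(x, theta0)) for t in thetas])
neg_D = -(theta0 * np.log(theta0 / thetas) + (1 - theta0) * np.log((1 - theta0) / (1 - thetas)))

print(thetas[np.argmax(M_n)])        # close to theta_0 = 0.4
print(np.max(np.abs(M_n - neg_D)))   # small: M_n(theta) is close to -D(theta_0, theta)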
Here are the formal details.
Theorem: Let 0 denote the true value of  . Define
f (X )
1
M n ( )   log  i
n i
f0 ( X i )
and M ( )   D(0 , ) . Suppose that
P
sup  | M n ( )  M ( ) |  0
and that for every   0 ,
sup :| * | M ( )  M ( 0 )
P
Let ˆn denote the MLE. Then ˆn   0 .
Proof: Since $\hat{\theta}_n$ maximizes $M_n(\theta)$, we have $M_n(\hat{\theta}_n) \geq M_n(\theta_0)$. Hence,
$M(\theta_0) - M(\hat{\theta}_n) = M_n(\theta_0) - M(\hat{\theta}_n) + M(\theta_0) - M_n(\theta_0)$
$\leq M_n(\hat{\theta}_n) - M(\hat{\theta}_n) + M(\theta_0) - M_n(\theta_0)$
$\leq \sup_{\theta} |M_n(\theta) - M(\theta)| + M(\theta_0) - M_n(\theta_0) \xrightarrow{P} 0$,
where the last line follows from the assumption $\sup_{\theta} |M_n(\theta) - M(\theta)| \xrightarrow{P} 0$. Thus $M(\hat{\theta}_n) \xrightarrow{P} M(\theta_0)$.
Pick any $\epsilon > 0$. By the assumption $\sup_{\theta : |\theta - \theta_0| \geq \epsilon} M(\theta) < M(\theta_0)$, there exists $\delta > 0$ such that $|\theta - \theta_0| \geq \epsilon$ implies $M(\theta) < M(\theta_0) - \delta$. Hence,
$P(|\hat{\theta}_n - \theta_0| > \epsilon) \leq P(M(\hat{\theta}_n) < M(\theta_0) - \delta) \to 0$,
where the last probability tends to 0 because $M(\hat{\theta}_n) \xrightarrow{P} M(\theta_0)$.
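A simulation sketch of consistency (the Bernoulli model, true value, and replication counts are assumed for illustration): the probability that the MLE misses $\theta_0$ by more than a fixed amount shrinks as $n$ grows.

# For Bernoulli(theta_0 = 0.3), the MLE is the sample mean; estimate
# P(|theta_hat_n - theta_0| > 0.05) by simulation for increasing n.
import numpy as np

rng = np.random.default_rng(4)
theta0 = 0.3
for n in (10, 100, 1000, 10000):
    estimates = rng.binomial(n, theta0, size=2000) / n                 # 2000 simulated MLEs
    print(n, np.mean(np.abs(estimates - theta0) > 0.05))               # shrinks toward 0 as n grows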