Statistics 512 Notes 14: Properties of Maximum
Likelihood Estimates Continued
Good properties of maximum likelihood estimates:
(1) Invariance
(2) Consistency
(3) Asymptotic Normality
(4) Efficiency
Asymptotic Normality
Suppose $X_1, \ldots, X_n$ are iid with density $f(x;\theta)$, $\theta \in \Theta$. Under regularity conditions, the large sample distribution of $\hat{\theta}_{MLE}$ is approximately normal with mean $\theta_0$ and variance $1/(n I(\theta_0))$, where $\theta_0$ is the true value of $\theta$.
Regularity Conditions:
(R0) The pdfs $f(x;\theta)$ are distinct, i.e., $\theta \neq \theta'$ implies $f(x;\theta) \neq f(x;\theta')$ (the model is identifiable).
(R1) The pdfs have common support for all $\theta$.
(R2) The point $\theta_0$ is an interior point of $\Theta$.
(R3) The pdf $f(x;\theta)$ is twice differentiable as a function of $\theta$.
(R4) The integral $\int f(x;\theta)\,dx$ can be differentiated twice under the integral sign as a function of $\theta$.
Note that $X_1, \ldots, X_n$ iid uniform on $[0,\theta]$ does not satisfy (R1).
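To see why (R1) matters here: for the uniform model the support $[0,\theta]$ changes with $\theta$, and the MLE (the sample maximum) is not asymptotically normal. A quick simulation sketch, with arbitrary choices of seed, $n = 400$, and true $\theta_0 = 2$:

# For Uniform(0, theta), the MLE is max(X). Rather than sqrt(n)*(MLE - theta)
# being asymptotically normal, n*(theta - MLE) has an exponential-type limit.
set.seed(1)
theta0 <- 2; n <- 400
mle <- replicate(5000, max(runif(n, 0, theta0)))
hist(n * (theta0 - mle), breaks = 50)  # right-skewed, not bell-shaped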
Fisher information: Define $I(\theta)$ by
$$I(\theta) = E_\theta\left[\left(\frac{\partial}{\partial\theta}\log f(X;\theta)\right)^2\right].$$
$I(\theta)$ is called the Fisher information about $\theta$.
The greater the squared value of $\frac{\partial}{\partial\theta}\log f(X;\theta)$ is on average, the more information there is to distinguish between different values of $\theta$, making it easier to estimate $\theta$.
Lemma: Under the regularity conditions,
$$I(\theta) = -E_\theta\left[\frac{\partial^2}{\partial\theta^2}\log f(X;\theta)\right].$$
Proof: First, we observe that since $\int f(x;\theta)\,dx = 1$,
$$\frac{\partial}{\partial\theta}\int f(x;\theta)\,dx = 0.$$
Combining this with the identity
$$\frac{\partial}{\partial\theta} f(x;\theta) = \left[\frac{\partial}{\partial\theta}\log f(x;\theta)\right] f(x;\theta),$$
we have
$$0 = \frac{\partial}{\partial\theta}\int f(x;\theta)\,dx = \int \left[\frac{\partial}{\partial\theta}\log f(x;\theta)\right] f(x;\theta)\,dx,$$
where we have interchanged differentiation and integration using regularity condition (R4). Taking derivatives of the expression just above, we have
$$0 = \frac{\partial}{\partial\theta}\int \left[\frac{\partial}{\partial\theta}\log f(x;\theta)\right] f(x;\theta)\,dx = \int \left[\frac{\partial^2}{\partial\theta^2}\log f(x;\theta)\right] f(x;\theta)\,dx + \int \left[\frac{\partial}{\partial\theta}\log f(x;\theta)\right]^2 f(x;\theta)\,dx,$$
so that
$$I(\theta) = \int \left[\frac{\partial}{\partial\theta}\log f(x;\theta)\right]^2 f(x;\theta)\,dx = -\int \left[\frac{\partial^2}{\partial\theta^2}\log f(x;\theta)\right] f(x;\theta)\,dx.$$
Example: Information for a Bernoulli random variable.
Let X be Bernoulli (p). Then
$$\log f(x;p) = x\log p + (1-x)\log(1-p),$$
$$\frac{\partial}{\partial p}\log f(x;p) = \frac{x}{p} - \frac{1-x}{1-p},$$
$$\frac{\partial^2}{\partial p^2}\log f(x;p) = -\frac{x}{p^2} - \frac{1-x}{(1-p)^2}.$$
Thus,
$$I(p) = -E_p\left[-\frac{X}{p^2} - \frac{1-X}{(1-p)^2}\right] = \frac{p}{p^2} + \frac{1-p}{(1-p)^2} = \frac{1}{p} + \frac{1}{1-p} = \frac{1}{p(1-p)}.$$
There is more information about p when p is closer to zero
or one.
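As a numerical check of both the lemma and this calculation, the two expectations can be computed exactly for the Bernoulli model (the expectation is just a sum over $x \in \{0,1\}$). A short R sketch; the function name bernoulli.info is ours:

# Compare E[(d/dp log f)^2], -E[d^2/dp^2 log f], and the closed form 1/(p(1-p)).
bernoulli.info <- function(p){
  x <- c(0, 1)
  probs <- c(1 - p, p)                          # P(X = 0), P(X = 1)
  score.sq   <- (x/p - (1 - x)/(1 - p))^2       # squared score
  neg.second <- x/p^2 + (1 - x)/(1 - p)^2       # minus the second derivative
  c(E.score.sq   = sum(score.sq * probs),
    E.neg.second = sum(neg.second * probs),
    closed.form  = 1/(p * (1 - p)))
}
bernoulli.info(0.3)  # all three entries equal 4.7619...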

Additional regularity condition:
(R5) The pdf $f(x;\theta)$ is three times differentiable as a function of $\theta$. Further, for all $\theta \in \Theta$, there exist a constant $c$ and a function $M(x)$ such that
$$\left|\frac{\partial^3}{\partial\theta^3}\log f(x;\theta)\right| \leq M(x),$$
with $E_{\theta_0}[M(X)] < \infty$, for all $\theta_0 - c < \theta < \theta_0 + c$ and all $x$ in the support of $X$.
Theorem (6.2.2): Assume $X_1, \ldots, X_n$ are iid with pdf $f(x;\theta_0)$ for $\theta_0 \in \Theta$ such that the regularity conditions (R0)-(R5) are satisfied. Suppose further that the Fisher information satisfies $0 < I(\theta_0) < \infty$. Then
$$\sqrt{n}\left(\hat{\theta}_{MLE} - \theta_0\right) \xrightarrow{D} N\left(0, \frac{1}{I(\theta_0)}\right).$$
Proof: Sketch of proof. From a Taylor series expansion,
$$0 = l'(\hat{\theta}_{MLE}) \approx l'(\theta_0) + (\hat{\theta}_{MLE} - \theta_0)\, l''(\theta_0)$$
$$\hat{\theta}_{MLE} - \theta_0 \approx \frac{-l'(\theta_0)}{l''(\theta_0)}$$
$$\sqrt{n}\,(\hat{\theta}_{MLE} - \theta_0) \approx \frac{n^{-1/2}\, l'(\theta_0)}{-\frac{1}{n}\, l''(\theta_0)}$$
First, we consider the numerator of this last expression. Its expectation is
$$n^{-1/2}\sum_{i=1}^n E_{\theta_0}\left[\frac{\partial}{\partial\theta}\log f(X_i;\theta)\Big|_{\theta_0}\right] = 0$$
because
$$E_{\theta_0}\left[\frac{\partial}{\partial\theta}\log f(X_i;\theta)\Big|_{\theta_0}\right] = \int \frac{\frac{\partial}{\partial\theta} f(x;\theta)\big|_{\theta_0}}{f(x;\theta_0)}\, f(x;\theta_0)\,dx = \frac{\partial}{\partial\theta}\int f(x;\theta)\,dx\,\Big|_{\theta_0} = 0.$$
Its variance is
$$\mathrm{Var}_{\theta_0}[n^{-1/2}\, l'(\theta_0)] = \frac{1}{n}\sum_{i=1}^n E_{\theta_0}\left[\left(\frac{\partial}{\partial\theta}\log f(X_i;\theta)\Big|_{\theta_0}\right)^2\right] = I(\theta_0).$$
Next we consider the denominator:
$$\frac{1}{n}\, l''(\theta_0) = \frac{1}{n}\sum_{i=1}^n \frac{\partial^2}{\partial\theta^2}\log f(X_i;\theta)\Big|_{\theta_0}.$$
By the law of large numbers, the latter expression converges to
$$E_{\theta_0}\left[\frac{\partial^2}{\partial\theta^2}\log f(X;\theta)\Big|_{\theta_0}\right] = -I(\theta_0).$$
We thus have
$$\sqrt{n}\,(\hat{\theta}_{MLE} - \theta_0) \approx \frac{n^{-1/2}\, l'(\theta_0)}{I(\theta_0)}.$$
Therefore,
$$E[\sqrt{n}\,(\hat{\theta}_{MLE} - \theta_0)] \approx 0.$$
Furthermore,
$$\mathrm{Var}[\sqrt{n}\,(\hat{\theta}_{MLE} - \theta_0)] \approx \frac{I(\theta_0)}{I(\theta_0)^2} = \frac{1}{I(\theta_0)}$$
and thus
$$\mathrm{Var}[\hat{\theta}_{MLE} - \theta_0] \approx \frac{1}{n\, I(\theta_0)}.$$
The central limit theorem may be applied to $l'(\theta_0)$, which is a sum of iid random variables:
$$l'(\theta_0) = \sum_{i=1}^n \frac{\partial}{\partial\theta}\log f(X_i;\theta)\Big|_{\theta_0}.$$
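A minimal simulation sketch of the theorem for the Bernoulli model from the example above, where $\hat{p} = \bar{X}$ and $I(p) = 1/(p(1-p))$; the seed, $n$, and number of replications are arbitrary:

# Check that sqrt(n)*(phat - p0) has variance near 1/I(p0) = p0*(1-p0).
set.seed(512)
p0 <- 0.3; n <- 400; nsim <- 5000
phat <- replicate(nsim, mean(rbinom(n, 1, p0)))
z <- sqrt(n) * (phat - p0)
var(z)   # close to p0*(1-p0) = 0.21
hist(z)  # approximately normal, centered at 0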
Corollary: Under the same assumptions as Theorem 6.2.2,
$$\sqrt{n}\left(\hat{\theta}_{MLE} - \theta_0\right) \xrightarrow{D} N\left(0, \frac{1}{I(\hat{\theta}_{MLE})}\right).$$
Informally, Theorem 6.2.2 and its corollary say that the distribution of the MLE can be approximated by
$$N\left(\theta_0, \frac{1}{n\, I(\hat{\theta}_{MLE})}\right).$$
From this fact, we can construct an asymptotically correct confidence interval.
Let
$$C_n = \left(\hat{\theta}_{MLE} - z_{\alpha/2}\sqrt{\frac{1}{n\, I(\hat{\theta}_{MLE})}},\;\; \hat{\theta}_{MLE} + z_{\alpha/2}\sqrt{\frac{1}{n\, I(\hat{\theta}_{MLE})}}\right).$$
Then $P_{\theta_0}(\theta_0 \in C_n) \to 1 - \alpha$ as $n \to \infty$.
For $\alpha = 0.05$, $z_{\alpha/2} = 1.96 \approx 2$, so
$$\hat{\theta}_{MLE} \pm 2\sqrt{\frac{1}{n\, I(\hat{\theta}_{MLE})}}$$
is an approximate 95% confidence interval for $\theta$.
Example 1: Let $X_1, \ldots, X_n$ be iid Bernoulli($p$). The MLE is $\hat{p} = \bar{X}$. We calculated above that $I(p) = \frac{1}{p(1-p)}$. Thus, an approximate 95% confidence interval for $p$ is
$$\hat{p} \pm 2\left(\frac{\hat{p}(1-\hat{p})}{n}\right)^{1/2}.$$
This is what the newspapers report when they say “the poll is accurate to within four points, 95 percent of the time.”
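A short R sketch of this interval; the function name bernoulli.ci and the simulated poll are ours:

# Approximate 95% confidence interval for p from a vector of 0/1 responses.
bernoulli.ci <- function(xvec){
  phat <- mean(xvec)
  n <- length(xvec)
  se <- sqrt(phat * (1 - phat) / n)  # plug-in estimate of sqrt(1/(n I(p)))
  c(lower = phat - 2 * se, upper = phat + 2 * se)
}
set.seed(2)
bernoulli.ci(rbinom(1000, 1, 0.48))  # half-width about 0.03, i.e., 3 points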
Computation of maximum likelihood estimates
Example 2: Logistic distribution. Let $X_1, \ldots, X_n$ be iid with density
$$f(x;\theta) = \frac{\exp\{-(x-\theta)\}}{(1 + \exp\{-(x-\theta)\})^2}, \quad -\infty < x < \infty,\;\; -\infty < \theta < \infty.$$
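This is the logistic location family, available in R as dlogis(x, location = theta); a quick check at an arbitrary test point that the built-in density matches the formula above:

x <- 1.3; theta <- 0.4
exp(-(x - theta)) / (1 + exp(-(x - theta)))^2  # the density formula above
dlogis(x, location = theta)                    # built-in; same value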
The log of the likelihood simplifies to:
$$l(\theta) = \sum_{i=1}^n \log f(X_i;\theta) = n\theta - n\bar{X} - 2\sum_{i=1}^n \log(1 + \exp\{-(X_i - \theta)\}).$$
Using this, the first derivative is
$$l'(\theta) = n - 2\sum_{i=1}^n \frac{\exp\{-(X_i - \theta)\}}{1 + \exp\{-(X_i - \theta)\}}.$$
Setting this equal to 0 and rearranging terms results in the equation:
$$\sum_{i=1}^n \frac{\exp\{-(X_i - \theta)\}}{1 + \exp\{-(X_i - \theta)\}} = \frac{n}{2}. \qquad (*)$$
Although this does not simplify, we can show the equation (*) has a unique solution. The derivative of the left hand side of (*) simplifies to
$$\frac{\partial}{\partial\theta}\sum_{i=1}^n \frac{\exp\{-(X_i - \theta)\}}{1 + \exp\{-(X_i - \theta)\}} = \sum_{i=1}^n \frac{\exp\{-(X_i - \theta)\}}{(1 + \exp\{-(X_i - \theta)\})^2} > 0.$$
Thus, the left hand side of (*) is a strictly increasing function of $\theta$. Finally, the left hand side of (*) approaches 0 as $\theta \to -\infty$ and approaches $n$ as $\theta \to \infty$. Thus, the equation (*) has a unique solution. Also, the second derivative of $l(\theta)$ is strictly negative for all $\theta$, so the solution is a maximum.
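Since the left hand side of (*) is strictly increasing and crosses $n/2$ exactly once, a bracketing root-finder can also locate the unique solution numerically. A minimal sketch with R's uniroot, on a simulated sample with true $\theta = 5$ (our choice):

# Solve (*) directly: LHS(theta) - n/2 = 0, bracketed by the data range.
lhs.minus.half.n <- function(theta, xvec){
  sum(exp(-(xvec - theta)) / (1 + exp(-(xvec - theta)))) - length(xvec) / 2
}
set.seed(7)
xvec <- rlogis(100, location = 5)
uniroot(lhs.minus.half.n, interval = range(xvec), xvec = xvec)$root  # near 5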
How do we find the maximum likelihood estimate that is
the solution to (*)?
Newton’s method is a numerical method for approximating solutions to equations. The method produces a sequence of values $\theta^{(0)}, \theta^{(1)}, \ldots$ that, under ideal conditions, converges to the MLE $\hat{\theta}_{MLE}$.
To motivate the method, we expand the derivative of the log likelihood around $\theta^{(j)}$:
$$0 = l'(\hat{\theta}_{MLE}) \approx l'(\theta^{(j)}) + (\hat{\theta}_{MLE} - \theta^{(j)})\, l''(\theta^{(j)}).$$
Solving for $\hat{\theta}_{MLE}$ gives
$$\hat{\theta}_{MLE} \approx \theta^{(j)} - \frac{l'(\theta^{(j)})}{l''(\theta^{(j)})}.$$
This suggests the following iterative scheme:
$$\theta^{(j+1)} = \theta^{(j)} - \frac{l'(\theta^{(j)})}{l''(\theta^{(j)})}.$$
The following is an R function that uses Newton’s method
to approximate the maximum likelihood estimate for a
logistic distribution:
mlelogisticfunc=function(xvec,toler=.001){
  startvalue=median(xvec)
  n=length(xvec)
  thetahatcurr=startvalue
  # Compute first derivative of log likelihood
  firstderivll=n-2*sum(exp(-xvec+thetahatcurr)/(1+exp(-xvec+thetahatcurr)))
  # Continue Newton's method until the first derivative
  # of the log likelihood is within toler of 0
  while(abs(firstderivll)>toler){
    # Compute second derivative of log likelihood
    secondderivll=-2*sum(exp(-xvec+thetahatcurr)/(1+exp(-xvec+thetahatcurr))^2)
    # Newton's method update of estimate of theta
    thetahatnew=thetahatcurr-firstderivll/secondderivll
    thetahatcurr=thetahatnew
    # Recompute first derivative of log likelihood at the updated estimate
    firstderivll=n-2*sum(exp(-xvec+thetahatcurr)/(1+exp(-xvec+thetahatcurr)))
  }
  list(thetahat=thetahatcurr)
}
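A hypothetical usage example on a simulated sample with true $\theta = 5$ (our choice), with a cross-check that maximizes the log likelihood directly using R's optimize:

set.seed(14)
xvec <- rlogis(200, location = 5)
mlelogisticfunc(xvec)$thetahat  # Newton's method estimate, near 5
# Cross-check: maximize the log likelihood over the data range.
loglik <- function(theta) sum(dlogis(xvec, location = theta, log = TRUE))
optimize(loglik, interval = range(xvec), maximum = TRUE)$maximum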