Statistics 512 Notes 16: Efficiency of Estimators
and the Asymptotic Efficiency of the MLE
Method of moments estimator
$X_1, \ldots, X_n$ iid $f(x;\theta)$, $\theta \in \Omega$.
Find $E(X_i) = h(\theta)$.
Method of moments estimator: $\hat{\theta}_{MOM} = h^{-1}(\bar{X})$.
Examples:
(1) $X_1, \ldots, X_n$ iid uniform$(0, \theta)$.
$E(X_i) = \theta/2$, so $\hat{\theta}_{MOM} = 2\bar{X}$.
(2) $X_1, \ldots, X_n$ iid logistic distribution
$$f(x;\theta) = \frac{\exp\{-(x-\theta)\}}{(1 + \exp\{-(x-\theta)\})^2}, \quad -\infty < x < \infty,\ -\infty < \theta < \infty.$$
$E(X_i) = \theta$, so $\hat{\theta}_{MOM} = \bar{X}$.
$\hat{\theta}_{MLE}$ solves
$$\sum_{i=1}^{n} \frac{\exp\{-(X_i-\theta)\}}{1 + \exp\{-(X_i-\theta)\}} = \frac{n}{2},$$
which has no closed-form solution and must be found numerically (see the sketch below).
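To illustrate how the two estimators are computed on data, here is a minimal sketch, assuming numpy and scipy are available; the helper names logistic_mom and logistic_mle are chosen for this example and are not from the course notes.

```python
import numpy as np
from scipy.optimize import brentq

def logistic_mom(x):
    # Method of moments: E(X_i) = theta, so the estimate is the sample mean.
    return np.mean(x)

def logistic_mle(x):
    # The MLE solves sum_i exp{-(x_i - theta)} / (1 + exp{-(x_i - theta)}) = n/2.
    # The left side is increasing in theta, so the root can be bracketed and
    # found numerically.
    def score(theta):
        # exp{-(x-theta)} / (1 + exp{-(x-theta)}) = 1 / (1 + exp{x - theta})
        return np.sum(1.0 / (1.0 + np.exp(x - theta))) - len(x) / 2.0
    return brentq(score, np.min(x) - 10.0, np.max(x) + 10.0)

rng = np.random.default_rng(0)
theta_true = 2.0
x = rng.logistic(loc=theta_true, scale=1.0, size=200)
print("MOM estimate:", logistic_mom(x))
print("MLE estimate:", logistic_mle(x))
```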
Efficiency of estimators:
A good criterion for comparing estimators is the mean
squared error:
MSE (ˆ)  E (ˆ   )2   {Bias (ˆ)}2  Var (ˆ)
For unbiased estimators, MSE (ˆ)  Var (ˆ)
Relative efficiency of two unbiased estimators:
Let $W_1$ and $W_2$ be two unbiased estimators for $\theta$ with variances $Var(W_1)$ and $Var(W_2)$ respectively. We will call $W_1$ more efficient than $W_2$ if $Var(W_1) < Var(W_2)$.
Also, the relative efficiency of $W_1$ with respect to $W_2$ is $Var(W_2)/Var(W_1)$.
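For instance (a simple illustrative example, not from the course text): if $X_1, \ldots, X_n$ are iid with mean $\theta$ and variance $\sigma^2$, then $W_1 = \bar{X}$ and $W_2 = X_1$ are both unbiased for $\theta$, and the relative efficiency of $W_1$ with respect to $W_2$ is
$$\frac{Var(W_2)}{Var(W_1)} = \frac{\sigma^2}{\sigma^2/n} = n,$$
so the sample mean is $n$ times as efficient as a single observation.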
Rao-Cramer Lower Bound:
The concept of relative efficiency provides a working criterion for choosing between two competing estimators, but it does not give us any assurance that even the better of $W_1$ and $W_2$ is any good. How do we know that there isn't an unbiased estimator $W_3$ that is better than both $W_1$ and $W_2$? The Rao-Cramer lower bound provides a partial answer to this question.
Theorem 6.2.1 (Rao-Cramer Lower Bound): Let $X_1, \ldots, X_n$ be iid with pdf $f(x;\theta)$ for $\theta \in \Omega$. Assume that the regularity conditions (R0)-(R4) hold. Let $Y = u(X_1, \ldots, X_n)$ be a statistic with mean $E(Y) = E_\theta[u(X_1, \ldots, X_n)] = k(\theta)$. Then
$$Var(Y) \geq \frac{[k'(\theta)]^2}{nI(\theta)}.$$
Note (Corollary 6.2.1): If $Y = u(X_1, \ldots, X_n)$ is an unbiased estimator of $\theta$, then $E(Y) = E_\theta[u(X_1, \ldots, X_n)] = k(\theta) = \theta$, so that $k'(\theta) = 1$. Thus for unbiased estimators $Y = u(X_1, \ldots, X_n)$, there is a lower bound on the variance:
$$Var(Y) \geq \frac{1}{nI(\theta)}.$$
Proof: The proof of this theorem is a clever application of the Cauchy-Schwarz Inequality or, stated statistically, the fact that for any two random variables $V$ and $W$,
$$[Cov(V,W)]^2 \leq (Var\,V)(Var\,W). \qquad (*)$$
If we rearrange (*), we can get a lower bound on the variance of $V$:
$$Var\,V \geq \frac{[Cov(V,W)]^2}{Var\,W}. \qquad (**)$$
The cleverness in this theorem comes from choosing $V$ to be the estimator $Y = u(X_1, \ldots, X_n)$ and $W$ to be the quantity $\frac{\partial}{\partial\theta}\log f(X_1, \ldots, X_n;\theta)$, and applying the Cauchy-Schwarz Inequality.
First, we calculate $Cov\!\left(u(X_1, \ldots, X_n),\ \frac{\partial}{\partial\theta}\log f(X_1, \ldots, X_n;\theta)\right)$.

We have
$$E_\theta\!\left[u(X_1, \ldots, X_n)\,\frac{\partial}{\partial\theta}\log f(X_1, \ldots, X_n;\theta)\right]$$
$$= \int\cdots\int u(x_1, \ldots, x_n)\,\frac{\partial}{\partial\theta}\log f(x_1, \ldots, x_n;\theta)\, f(x_1;\theta)\cdots f(x_n;\theta)\,dx_1\cdots dx_n$$
$$= \int\cdots\int u(x_1, \ldots, x_n)\,\frac{\frac{\partial}{\partial\theta} f(x_1, \ldots, x_n;\theta)}{f(x_1, \ldots, x_n;\theta)}\, f(x_1;\theta)\cdots f(x_n;\theta)\,dx_1\cdots dx_n$$
$$= \int\cdots\int u(x_1, \ldots, x_n)\,\frac{\partial}{\partial\theta} f(x_1, \ldots, x_n;\theta)\,dx_1\cdots dx_n$$
$$= \frac{\partial}{\partial\theta}\int\cdots\int u(x_1, \ldots, x_n)\, f(x_1, \ldots, x_n;\theta)\,dx_1\cdots dx_n = \frac{\partial}{\partial\theta} E_\theta[u(X_1, \ldots, X_n)] = k'(\theta),$$
where we use $f(x_1, \ldots, x_n;\theta) = f(x_1;\theta)\cdots f(x_n;\theta)$ and the regularity conditions to take the derivative outside the integral.
Also we have
$$E_\theta\!\left[\frac{\partial}{\partial\theta}\log f(X_1, \ldots, X_n;\theta)\right] = \int\cdots\int \frac{\partial}{\partial\theta}\log f(x_1, \ldots, x_n;\theta)\, f(x_1, \ldots, x_n;\theta)\,dx_1\cdots dx_n$$
$$= \int\cdots\int \frac{\frac{\partial}{\partial\theta} f(x_1, \ldots, x_n;\theta)}{f(x_1, \ldots, x_n;\theta)}\, f(x_1, \ldots, x_n;\theta)\,dx_1\cdots dx_n = \frac{\partial}{\partial\theta}\int\cdots\int f(x_1, \ldots, x_n;\theta)\,dx_1\cdots dx_n = \frac{\partial}{\partial\theta}(1) = 0.$$
Thus, since $Cov(V,W) = E(VW) - E(V)E(W)$,
$$Cov\!\left(u(X_1, \ldots, X_n),\ \frac{\partial}{\partial\theta}\log f(X_1, \ldots, X_n;\theta)\right) = k'(\theta).$$

Finally, we calculate
$$Var_\theta\!\left[\frac{\partial}{\partial\theta}\log f(X_1, \ldots, X_n;\theta)\right] = Var_\theta\!\left[\sum_{i=1}^{n}\frac{\partial}{\partial\theta}\log f(X_i;\theta)\right]$$
$$= \sum_{i=1}^{n} Var_\theta\!\left[\frac{\partial}{\partial\theta}\log f(X_i;\theta)\right] \quad\text{(by independence of the } X_i\text{)}$$
$$= n\left\{E_\theta\!\left[\left(\frac{\partial}{\partial\theta}\log f(X_1;\theta)\right)^{2}\right] - \left(E_\theta\!\left[\frac{\partial}{\partial\theta}\log f(X_1;\theta)\right]\right)^{2}\right\} = n(I(\theta) - 0) = nI(\theta).$$
Thus, using (**) with $V = Y$ and $W = \frac{\partial}{\partial\theta}\log f(X_1, \ldots, X_n;\theta)$, we conclude that
$$Var(Y) \geq \frac{[k'(\theta)]^2}{nI(\theta)}.$$

Example: Let $X_1, \ldots, X_n$ be iid Poisson($\lambda$). On your homework, you should have found that
$$\hat{\lambda}_{MLE} = \bar{X}, \qquad I(\lambda) = \frac{1}{\lambda}.$$
From the properties of the Poisson distribution, we know that $E(\bar{X}) = \lambda$ and $Var(\bar{X}) = \lambda/n$.
The Rao-Cramer lower bound for the variance of an unbiased estimator is
$$Var(Y) \geq \frac{1}{nI(\lambda)} = \frac{1}{n(1/\lambda)} = \frac{\lambda}{n}.$$

Thus, the maximum likelihood estimator is efficient.
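A quick simulation check of this example (a minimal sketch, assuming numpy is available; the values of $\lambda$, $n$, and the number of replications are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
lam, n, reps = 4.0, 50, 100_000

# Simulate many Poisson samples and record the MLE (the sample mean) for each.
samples = rng.poisson(lam, size=(reps, n))
mle = samples.mean(axis=1)

print("Monte Carlo Var(X bar):        ", mle.var())
print("Rao-Cramer lower bound lambda/n:", lam / n)
```

The two printed values should agree closely, consistent with the sample mean attaining the bound.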
The Rao-Cramer lower bound might not be achieved by
any unbiased estimator.
Asymptotic Optimality of MLE
The maximum likelihood estimator is consistent, so that its bias converges to 0 as $n \to \infty$.
Example 6.2.4 shows that the maximum likelihood
estimator may not achieve the Rao-Cramer lower bound for
finite samples.
Under the regularity conditions assumed in Theorem 6.2.2,
$$\sqrt{n}\,(\hat{\theta}_{MLE} - \theta_0) \xrightarrow{D} N\!\left(0,\ \frac{1}{I(\theta_0)}\right).$$
Informally, Theorem 6.2.2 and its corollary say that the distribution of the MLE can be approximated by
$$N\!\left(\theta_0,\ \frac{1}{nI(\hat{\theta}_{MLE})}\right).$$
Thus, the MLE is asymptotically unbiased and its asymptotic variance equals the Rao-Cramer lower bound.
In this sense, the MLE is as efficient as any other estimator
for large samples. For large enough samples, the MLE is
the optimal estimator.
Monte Carlo comparison of MSE for maximum likelihood
vs. method of moments for the logistic distribution.
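The notes do not reproduce the simulation code; the following is a minimal sketch of such a comparison, assuming numpy and scipy are available, and repeating the hypothetical logistic_mom and logistic_mle helpers from the earlier sketch so the block is self-contained. For the standard logistic location model, $I(\theta) = 1/3$, so the asymptotic variance of the MLE is $3/n$, while $Var(\bar{X}) = \pi^2/(3n) \approx 3.29/n$; the simulated MSEs should reflect this gap.

```python
import numpy as np
from scipy.optimize import brentq

def logistic_mom(x):
    # Method of moments estimate: the sample mean.
    return np.mean(x)

def logistic_mle(x):
    # Solve the score equation sum_i 1/(1 + exp{x_i - theta}) = n/2 numerically.
    def score(theta):
        return np.sum(1.0 / (1.0 + np.exp(x - theta))) - len(x) / 2.0
    return brentq(score, np.min(x) - 10.0, np.max(x) + 10.0)

rng = np.random.default_rng(2)
theta_true, n, reps = 0.0, 50, 2000

mom_est = np.empty(reps)
mle_est = np.empty(reps)
for r in range(reps):
    x = rng.logistic(loc=theta_true, scale=1.0, size=n)
    mom_est[r] = logistic_mom(x)
    mle_est[r] = logistic_mle(x)

# Monte Carlo MSE = average squared error over the replications.
print("MSE of method of moments:", np.mean((mom_est - theta_true) ** 2))
print("MSE of MLE:              ", np.mean((mle_est - theta_true) ** 2))
print("Asymptotic variance of MLE, 3/n:", 3.0 / n)
print("Var(X bar), pi^2/(3n):          ", np.pi ** 2 / (3 * n))
```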