Statistics 512 Notes 14: Properties of Maximum Likelihood Estimates Continued

Good properties of maximum likelihood estimates:
(1) Invariance
(2) Consistency
(3) Asymptotic Normality
(4) Efficiency

Asymptotic Normality

Suppose $X_1, \ldots, X_n$ are iid with density $f(x;\theta)$, $\theta \in \Omega$. Under regularity conditions, the large-sample distribution of $\hat{\theta}_{MLE}$ is approximately normal with mean $\theta_0$ and variance $1/(nI(\theta_0))$, where $\theta_0$ is the true value of $\theta$.

Regularity Conditions:
(R0) The pdfs $f(x;\theta)$ are distinct, i.e., $\theta \neq \theta'$ implies $f(x;\theta) \neq f(x;\theta')$ (the model is identifiable).
(R1) The pdfs have common support for all $\theta$.
(R2) The point $\theta_0$ is an interior point of $\Omega$.
(R3) The pdf $f(x;\theta)$ is twice differentiable as a function of $\theta$.
(R4) The integral $\int f(x;\theta)\,dx$ can be differentiated twice under the integral sign as a function of $\theta$.

Note that $X_1, \ldots, X_n$ iid uniform on $[0,\theta]$ does not satisfy (R1).

Fisher information: Define $I(\theta)$ by
$$I(\theta) = E_\theta\left[\left(\frac{\partial}{\partial \theta} \log f(X;\theta)\right)^2\right].$$
$I(\theta)$ is called the Fisher information about $\theta$. The greater the squared value of $\frac{\partial}{\partial \theta} \log f(X;\theta)$ is on average, the more information there is to distinguish between different values of $\theta$, making it easier to estimate $\theta$.

Lemma: Under the regularity conditions,
$$I(\theta) = -E_\theta\left[\frac{\partial^2}{\partial \theta^2} \log f(X;\theta)\right].$$

Proof: First, we observe that since $\int f(x;\theta)\,dx = 1$,
$$\frac{\partial}{\partial \theta} \int f(x;\theta)\,dx = 0.$$
Combining this with the identity
$$\frac{\partial}{\partial \theta} f(x;\theta) = \left[\frac{\partial}{\partial \theta} \log f(x;\theta)\right] f(x;\theta),$$
we have
$$0 = \int \frac{\partial}{\partial \theta} f(x;\theta)\,dx = \int \left[\frac{\partial}{\partial \theta} \log f(x;\theta)\right] f(x;\theta)\,dx,$$
where we have interchanged differentiation and integration using regularity condition (R4). Taking derivatives of the expression just above, we have
$$0 = \int \left[\frac{\partial^2}{\partial \theta^2} \log f(x;\theta)\right] f(x;\theta)\,dx + \int \left[\frac{\partial}{\partial \theta} \log f(x;\theta)\right]^2 f(x;\theta)\,dx,$$
so that
$$I(\theta) = \int \left[\frac{\partial}{\partial \theta} \log f(x;\theta)\right]^2 f(x;\theta)\,dx = -\int \left[\frac{\partial^2}{\partial \theta^2} \log f(x;\theta)\right] f(x;\theta)\,dx. \qquad \blacksquare$$

Example: Information for a Bernoulli random variable. Let $X$ be Bernoulli$(p)$. Then
$$\log f(x;p) = x \log p + (1-x) \log(1-p),$$
$$\frac{\partial}{\partial p} \log f(x;p) = \frac{x}{p} - \frac{1-x}{1-p},$$
$$\frac{\partial^2}{\partial p^2} \log f(x;p) = -\frac{x}{p^2} - \frac{1-x}{(1-p)^2}.$$
Thus,
$$I(p) = E\left[\frac{X}{p^2} + \frac{1-X}{(1-p)^2}\right] = \frac{p}{p^2} + \frac{1-p}{(1-p)^2} = \frac{1}{p} + \frac{1}{1-p} = \frac{1}{p(1-p)}.$$
There is more information about $p$ when $p$ is closer to zero or one.
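As a quick numerical check of this calculation, the defining formula for Fisher information says that the average squared score should be close to $1/(p(1-p))$. The following is a minimal R simulation sketch (the variable names are illustrative, not from the notes):

# Simulation check that I(p) = 1/(p(1-p)) for Bernoulli(p):
# average the squared score X/p - (1-X)/(1-p) over many draws
# and compare with the exact value.
set.seed(512)
p <- 0.3
x <- rbinom(100000, size=1, prob=p)
score <- x/p - (1-x)/(1-p)
mean(score^2)   # simulation estimate of I(p)
1/(p*(1-p))     # exact value, approximately 4.76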
Additional regularity condition:
(R5) The pdf $f(x;\theta)$ is three times differentiable as a function of $\theta$. Further, for all $\theta \in \Omega$, there exist a constant $c$ and a function $M(x)$ such that
$$\left|\frac{\partial^3}{\partial \theta^3} \log f(x;\theta)\right| \le M(x), \quad \theta_0 - c < \theta < \theta_0 + c,$$
with $E_{\theta_0}[M(X)] < \infty$, for all $\theta_0 \in \Omega$ and all $x$ in the support of $X$.

Theorem (6.2.2): Assume $X_1, \ldots, X_n$ are iid with pdf $f(x;\theta_0)$ for $\theta_0 \in \Omega$ such that the regularity conditions (R0)-(R5) are satisfied. Suppose further that the Fisher information satisfies $0 < I(\theta_0) < \infty$. Then
$$\sqrt{n}\left(\hat{\theta}_{MLE} - \theta_0\right) \xrightarrow{D} N\left(0, \frac{1}{I(\theta_0)}\right).$$

Proof (sketch): From a Taylor series expansion,
$$0 = l'(\hat{\theta}_{MLE}) \approx l'(\theta_0) + (\hat{\theta}_{MLE} - \theta_0)\, l''(\theta_0)$$
$$\Rightarrow \hat{\theta}_{MLE} - \theta_0 \approx -\frac{l'(\theta_0)}{l''(\theta_0)}$$
$$\Rightarrow \sqrt{n}\,(\hat{\theta}_{MLE} - \theta_0) \approx \frac{n^{-1/2}\, l'(\theta_0)}{-\frac{1}{n} l''(\theta_0)}.$$
First, we consider the numerator of this last expression. Its expectation is
$$E_{\theta_0}\left[n^{-1/2} \sum_{i=1}^n \frac{\partial}{\partial \theta} \log f(X_i;\theta_0)\right] = 0$$
because
$$E_{\theta_0}\left[\frac{\partial}{\partial \theta} \log f(X_i;\theta_0)\right] = \int \frac{\frac{\partial}{\partial \theta} f(x;\theta_0)}{f(x;\theta_0)}\, f(x;\theta_0)\,dx = \frac{\partial}{\partial \theta} \int f(x;\theta)\,dx \bigg|_{\theta = \theta_0} = 0.$$
Its variance is
$$\mathrm{Var}_{\theta_0}\left[n^{-1/2}\, l'(\theta_0)\right] = \frac{1}{n} \sum_{i=1}^n E_{\theta_0}\left[\left(\frac{\partial}{\partial \theta} \log f(X_i;\theta_0)\right)^2\right] = I(\theta_0).$$
Next we consider the denominator:
$$-\frac{1}{n} l''(\theta_0) = -\frac{1}{n} \sum_{i=1}^n \frac{\partial^2}{\partial \theta^2} \log f(X_i;\theta_0).$$
By the law of large numbers, the latter expression converges to
$$-E_{\theta_0}\left[\frac{\partial^2}{\partial \theta^2} \log f(X;\theta_0)\right] = I(\theta_0).$$
We thus have
$$\sqrt{n}\,(\hat{\theta}_{MLE} - \theta_0) \approx \frac{n^{-1/2}\, l'(\theta_0)}{I(\theta_0)}.$$
Therefore, $E[\sqrt{n}\,(\hat{\theta}_{MLE} - \theta_0)] \approx 0$. Furthermore,
$$\mathrm{Var}[\sqrt{n}\,(\hat{\theta}_{MLE} - \theta_0)] \approx \frac{I(\theta_0)}{I(\theta_0)^2} = \frac{1}{I(\theta_0)},$$
and thus
$$\mathrm{Var}[\hat{\theta}_{MLE} - \theta_0] \approx \frac{1}{n I(\theta_0)}.$$
Finally, the central limit theorem may be applied to $l'(\theta_0)$, which is a sum of iid random variables:
$$l'(\theta_0) = \sum_{i=1}^n \frac{\partial}{\partial \theta} \log f(X_i;\theta_0);$$
this yields the claimed limiting normal distribution. $\blacksquare$

Corollary: Under the same assumptions as Theorem 6.2.2,
$$\sqrt{n}\left(\hat{\theta}_{MLE} - \theta_0\right) \xrightarrow{D} N\left(0, \frac{1}{I(\hat{\theta}_{MLE})}\right).$$

Informally, Theorem 6.2.2 and its corollary say that the distribution of the MLE can be approximated by
$$N\left(\theta_0, \frac{1}{n I(\hat{\theta}_{MLE})}\right).$$
From this fact, we can construct an asymptotically correct confidence interval. Let
$$C_n = \left(\hat{\theta}_{MLE} - z_{\alpha/2} \sqrt{\frac{1}{n I(\hat{\theta}_{MLE})}}, \;\; \hat{\theta}_{MLE} + z_{\alpha/2} \sqrt{\frac{1}{n I(\hat{\theta}_{MLE})}}\right).$$
Then $P_{\theta_0}(\theta_0 \in C_n) \to 1 - \alpha$ as $n \to \infty$. For $\alpha = 0.05$, $z_{\alpha/2} = 1.96 \approx 2$, so $\hat{\theta}_{MLE} \pm 2\sqrt{1/(n I(\hat{\theta}_{MLE}))}$ is an approximate 95% confidence interval for $\theta$.

Example 1: Let $X_1, \ldots, X_n$ be iid Bernoulli$(p)$. The MLE is $\hat{p} = \bar{X}$. We calculated above that $I(p) = \frac{1}{p(1-p)}$. Thus, an approximate 95% confidence interval for $p$ is
$$\hat{p} \pm 2\left(\frac{\hat{p}(1-\hat{p})}{n}\right)^{1/2}.$$
This is what the newspapers report when they say "the poll is accurate to within four points, 95 percent of the time."

Computation of maximum likelihood estimates

Example 2: Logistic distribution. Let $X_1, \ldots, X_n$ be iid with density
$$f(x;\theta) = \frac{\exp\{-(x-\theta)\}}{(1 + \exp\{-(x-\theta)\})^2}, \quad -\infty < x < \infty, \; -\infty < \theta < \infty.$$
The log of the likelihood simplifies to:
$$l(\theta) = \sum_{i=1}^n \log f(X_i;\theta) = n\theta - n\bar{X} - 2 \sum_{i=1}^n \log(1 + \exp\{-(X_i - \theta)\}).$$
Using this, the first derivative is
$$l'(\theta) = n - 2 \sum_{i=1}^n \frac{\exp\{-(X_i - \theta)\}}{1 + \exp\{-(X_i - \theta)\}}.$$
Setting this equal to 0 and rearranging terms results in the equation:
$$\sum_{i=1}^n \frac{\exp\{-(X_i - \theta)\}}{1 + \exp\{-(X_i - \theta)\}} = \frac{n}{2}. \qquad (*)$$
Although this does not simplify, we can show that equation (*) has a unique solution. The derivative of the left-hand side of (*) simplifies to
$$\frac{\partial}{\partial \theta} \sum_{i=1}^n \frac{\exp\{-(X_i - \theta)\}}{1 + \exp\{-(X_i - \theta)\}} = \sum_{i=1}^n \frac{\exp\{-(X_i - \theta)\}}{(1 + \exp\{-(X_i - \theta)\})^2} > 0.$$
Thus, the left-hand side of (*) is a strictly increasing function of $\theta$. Finally, the left-hand side of (*) approaches 0 as $\theta \to -\infty$ and approaches $n$ as $\theta \to \infty$. Thus, equation (*) has a unique solution. Also, the second derivative of $l(\theta)$ is strictly negative for all $\theta$, so the solution is a maximum.

How do we find the maximum likelihood estimate that is the solution to (*)? Newton's method is a numerical method for approximating solutions to equations. The method produces a sequence of values $\theta^{(0)}, \theta^{(1)}, \ldots$ that, under ideal conditions, converges to the MLE $\hat{\theta}_{MLE}$.

To motivate the method, we expand the derivative of the log likelihood around $\theta^{(j)}$:
$$0 = l'(\hat{\theta}_{MLE}) \approx l'(\theta^{(j)}) + (\hat{\theta}_{MLE} - \theta^{(j)})\, l''(\theta^{(j)}).$$
Solving for $\hat{\theta}_{MLE}$ gives
$$\hat{\theta}_{MLE} \approx \theta^{(j)} - \frac{l'(\theta^{(j)})}{l''(\theta^{(j)})}.$$
This suggests the following iterative scheme:
$$\theta^{(j+1)} = \theta^{(j)} - \frac{l'(\theta^{(j)})}{l''(\theta^{(j)})}.$$

The following is an R function that uses Newton's method to approximate the maximum likelihood estimate for a logistic distribution (here exp(-xvec+thetahatcurr) computes $\exp\{-(X_i - \theta)\}$):

mlelogisticfunc=function(xvec,toler=.001){
  startvalue=median(xvec);
  n=length(xvec);
  thetahatcurr=startvalue;
  # Compute first derivative of log likelihood
  firstderivll=n-2*sum(exp(-xvec+thetahatcurr)/(1+exp(-xvec+thetahatcurr)));
  # Continue Newton's method until the first derivative
  # of the log likelihood is within toler of 0
  while(abs(firstderivll)>toler){
    # Compute second derivative of log likelihood
    secondderivll=-2*sum(exp(-xvec+thetahatcurr)/(1+exp(-xvec+thetahatcurr))^2);
    # Newton's method update of estimate of theta
    thetahatnew=thetahatcurr-firstderivll/secondderivll;
    thetahatcurr=thetahatnew;
    # Compute first derivative of log likelihood at the updated estimate
    firstderivll=n-2*sum(exp(-xvec+thetahatcurr)/(1+exp(-xvec+thetahatcurr)));
  }
  list(thetahat=thetahatcurr);
}
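To see the function in action, here is a usage sketch (the simulated data and variable names are illustrative): draw a sample from a logistic distribution with known location using R's built-in rlogis, run Newton's method, and verify that the resulting estimate satisfies equation (*).

# Simulate 200 iid logistic draws with true location theta = 5
set.seed(512)
xvec <- rlogis(200, location=5, scale=1)
out <- mlelogisticfunc(xvec)
out$thetahat   # MLE; should be close to 5
median(xvec)   # the starting value used by the function
# Check that the estimate solves (*): this sum should be near n/2 = 100
sum(exp(-xvec+out$thetahat)/(1+exp(-xvec+out$thetahat)))

The median is a sensible starting value because it is itself a consistent estimate of the logistic location parameter, so Newton's method begins close to the solution of (*).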