Statistics 512 Notes 15: Properties of Maximum Likelihood Estimates Continued

Computation of maximum likelihood estimates

Example 2: Logistic distribution. Let $X_1, \ldots, X_n$ be iid with density
$$f(x;\theta) = \frac{\exp\{-(x-\theta)\}}{(1+\exp\{-(x-\theta)\})^2}, \quad -\infty < x < \infty, \; -\infty < \theta < \infty.$$
The log likelihood simplifies to
$$l(\theta) = \sum_{i=1}^n \log f(X_i;\theta) = n\theta - n\bar{X} - 2\sum_{i=1}^n \log(1+\exp\{-(X_i-\theta)\}).$$
Using this, the first derivative is
$$l'(\theta) = n - 2\sum_{i=1}^n \frac{\exp\{-(X_i-\theta)\}}{1+\exp\{-(X_i-\theta)\}}.$$
Setting this equal to 0 and rearranging terms results in the equation
$$\sum_{i=1}^n \frac{\exp\{-(X_i-\theta)\}}{1+\exp\{-(X_i-\theta)\}} = \frac{n}{2}. \qquad (*)$$
Although this does not simplify, we can show that equation (*) has a unique solution. The derivative of the left hand side of (*) simplifies to
$$\sum_{i=1}^n \frac{\exp\{-(X_i-\theta)\}}{(1+\exp\{-(X_i-\theta)\})^2} > 0.$$
Thus, the left hand side of (*) is a strictly increasing function of $\theta$. Finally, the left hand side of (*) approaches 0 as $\theta \to -\infty$ and approaches $n$ as $\theta \to \infty$. Thus, equation (*) has a unique solution. Also, the second derivative of $l(\theta)$,
$$l''(\theta) = -2\sum_{i=1}^n \frac{\exp\{-(X_i-\theta)\}}{(1+\exp\{-(X_i-\theta)\})^2},$$
is strictly negative for all $\theta$, so the solution is a maximum.

How do we find the maximum likelihood estimate that is the solution to (*)? Newton's method is a numerical method for approximating solutions to equations. The method produces a sequence of values $\theta^{(0)}, \theta^{(1)}, \ldots$ that, under ideal conditions, converges to the MLE $\hat{\theta}_{MLE}$.

To motivate the method, we expand the derivative of the log likelihood around $\theta^{(j)}$:
$$0 = l'(\hat{\theta}_{MLE}) \approx l'(\theta^{(j)}) + (\hat{\theta}_{MLE} - \theta^{(j)})\, l''(\theta^{(j)}).$$
Solving for $\hat{\theta}_{MLE}$ gives
$$\hat{\theta}_{MLE} \approx \theta^{(j)} - \frac{l'(\theta^{(j)})}{l''(\theta^{(j)})}.$$
This suggests the following iterative scheme:
$$\theta^{(j+1)} = \theta^{(j)} - \frac{l'(\theta^{(j)})}{l''(\theta^{(j)})}.$$

The following is an R function that uses Newton's method to approximate the maximum likelihood estimate for a logistic distribution:

mlelogisticfunc=function(xvec,toler=.001){
  startvalue=median(xvec);
  n=length(xvec);
  thetahatcurr=startvalue;
  # Compute first derivative of log likelihood
  firstderivll=n-2*sum(exp(-xvec+thetahatcurr)/(1+exp(-xvec+thetahatcurr)));
  # Continue Newton's method until the first derivative
  # of the log likelihood is within toler of 0
  while(abs(firstderivll)>toler){
    # Compute second derivative of log likelihood
    secondderivll=-2*sum(exp(-xvec+thetahatcurr)/(1+exp(-xvec+thetahatcurr))^2);
    # Newton's method update of estimate of theta
    thetahatnew=thetahatcurr-firstderivll/secondderivll;
    thetahatcurr=thetahatnew;
    # Compute first derivative of log likelihood
    firstderivll=n-2*sum(exp(-xvec+thetahatcurr)/(1+exp(-xvec+thetahatcurr)));
  }
  list(thetahat=thetahatcurr);
}
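To illustrate, here is a minimal usage sketch; the sample size of 100 and the true value $\theta = 5$ are arbitrary choices for this example (rlogis, which simulates logistic data in R, is used again below for the parametric bootstrap):

# Simulate 100 iid logistic observations with true theta=5,
# then compute the MLE by Newton's method
xvec=rlogis(100,location=5,scale=1);
mlelogisticfunc(xvec)$thetahat;  # should be close to 5
median(xvec);                    # Newton's starting value

Because the log likelihood for this model is strictly concave, the iteration started from the median typically converges in a few steps.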
Issues with Newton's method:
1. The method does not work if $l''(\theta^{(j)}) = 0$.
2. The method does not always converge.
3. The method may converge to a local but not a global maximum.
4. The starting value is important since different starting values can converge to different local maxima. If the log likelihood is not concave, try different starting values.

Another useful method for computing maximum likelihood estimates is the EM algorithm (Chapter 6.6).

Good book on computation in statistics: Numerical Methods of Statistics, John Monahan.

Confidence interval

Theorem 6.2.2: Assume $X_1, \ldots, X_n$ are iid with pdf $f(x;\theta_0)$ for $\theta_0 \in \Omega$ such that the regularity conditions (R0)-(R5) are satisfied. Then
$$\sqrt{n}\,(\hat{\theta}_{MLE} - \theta_0) \xrightarrow{D} N\left(0, \frac{1}{I(\theta_0)}\right).$$

Approximate $(1-\alpha)$ confidence interval for $\theta$:
$$\hat{\theta}_{MLE} \pm z_{\alpha/2} \frac{1}{\sqrt{n I(\hat{\theta}_{MLE})}}.$$

For the logistic distribution model,
$$I(\theta) = E\left[-\frac{\partial^2}{\partial \theta^2} \log f(X;\theta)\right] = E\left[\frac{2\exp\{-(X-\theta)\}}{(1+\exp\{-(X-\theta)\})^2}\right].$$

Problem: It is hard to compute $I(\theta)$.

Solutions:

(1) Use the observed Fisher information rather than the expected information. The observed Fisher information is
$$-\sum_{i=1}^n \frac{\partial^2}{\partial \theta^2} \log f(X_i;\theta)\bigg|_{\theta = \hat{\theta}_{MLE}}.$$
Another approximate $(1-\alpha)$ confidence interval for $\theta$, besides that given by Theorem 6.2.2, is
$$\hat{\theta}_{MLE} \pm z_{\alpha/2} \left(-\sum_{i=1}^n \frac{\partial^2}{\partial \theta^2} \log f(X_i;\theta)\bigg|_{\theta = \hat{\theta}_{MLE}}\right)^{-1/2}.$$

(2) Parametric bootstrap: In the nonparametric bootstrap, we approximated the true distribution $F$ by the empirical distribution $\hat{F}_n$. In the parametric bootstrap, we approximate the true distribution $F$ by the density $f(x;\hat{\theta}_{MLE})$.

To obtain a bootstrap percentile confidence interval:
1. Draw an iid sample $X_1^*, \ldots, X_n^*$ from $f(x;\hat{\theta}_{MLE})$.
2. Compute $\hat{\theta}_b^* = \hat{\theta}_{MLE}(X_1^*, \ldots, X_n^*)$.
3. Repeat steps 1 and 2 for $b = 1, \ldots, M$.
4. The bootstrap percentile confidence interval endpoints for $\theta$ are the $\alpha/2$ and $1-\alpha/2$ quantiles of $(\hat{\theta}_1^*, \ldots, \hat{\theta}_M^*)$.

The function rlogis(n,location=theta,scale=1) generates n iid logistic random variables in R.

# Function that forms percentile bootstrap confidence
# interval for parameter theta of logistic distribution
percentcibootlogisticfunc=function(X,m,alpha){
  # X is a vector containing the original sample
  # m is the desired number of bootstrap replications
  thetahat=mlelogisticfunc(X)$thetahat;
  thetastar=rep(0,m); # stores bootstrap estimates of theta
  n=length(X);
  # Carry out m bootstrap resamples and estimate theta for
  # each resample
  for(i in 1:m){
    Xstar=rlogis(n,location=thetahat,scale=1);
    thetastar[i]=mlelogisticfunc(Xstar)$thetahat;
  }
  thetastarordered=sort(thetastar); # order the thetastars
  cutoff=floor((alpha/2)*(m+1));
  lower=thetastarordered[cutoff];     # lower CI endpoint
  upper=thetastarordered[m+1-cutoff]; # upper CI endpoint
  list(thetahat=thetahat,lower=lower,upper=upper);
}

Optimality (Efficiency) of the MLE:

We will now show that, for large samples, the MLE does as well as any other possible estimator in a certain sense. We start by presenting a lower bound on the variance of an estimator in finite samples.

Theorem 6.2.1 (Rao-Cramer Lower Bound): Let $X_1, \ldots, X_n$ be iid with pdf $f(x;\theta)$ for $\theta \in \Omega$. Assume that the regularity conditions (R0)-(R4) hold. Let $Y = u(X_1, \ldots, X_n)$ be a statistic with mean $E(Y) = E_\theta[u(X_1, \ldots, X_n)] = k(\theta)$. Then
$$\mathrm{Var}(Y) \geq \frac{[k'(\theta)]^2}{n I(\theta)}.$$

Proof: We will follow the proof in the book.
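To see the bound in action, here is a standard illustration using the normal model rather than the logistic example above. Suppose $X_1, \ldots, X_n$ are iid $N(\theta, 1)$. Then $I(\theta) = 1$, and $Y = \bar{X}$ is an unbiased estimator of $\theta$, so $k(\theta) = \theta$ and $k'(\theta) = 1$. The Rao-Cramer bound is
$$\frac{[k'(\theta)]^2}{n I(\theta)} = \frac{1}{n},$$
and $\mathrm{Var}(\bar{X}) = 1/n$, so $\bar{X}$ attains the lower bound: no unbiased estimator of the normal mean has smaller variance.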