9. Maximum Likelihood Estimation

I. Ordinary Least Squares Estimation

OLS only requires a model for the conditional mean of the response variable. For a linear model
$$Y_j = \beta_0 + \beta_1 X_{1j} + \cdots + \beta_r X_{rj} + \epsilon_j,$$
the OLS estimator of $\beta = (\beta_0, \ldots, \beta_r)^T$ is any $b = (b_0, \ldots, b_r)^T$ that minimizes the sum of squared residuals
$$Q(b) = \sum_{j=1}^{n} (Y_j - b_0 - b_1 X_{1j} - \cdots - b_r X_{rj})^2.$$

The estimating equations (normal equations) are
$$\frac{\partial Q(b)}{\partial b_0} = -2 \sum_{j=1}^{n} (Y_j - b_0 - b_1 X_{1j} - \cdots - b_r X_{rj}) = 0$$
and
$$\frac{\partial Q(b)}{\partial b_i} = -2 \sum_{j=1}^{n} X_{ij} (Y_j - b_0 - b_1 X_{1j} - \cdots - b_r X_{rj}) = 0 \quad \text{for } i = 1, 2, \ldots, r.$$
The matrix form of these equations is
$$(X^T X) b = X^T Y,$$
and a solution is
$$b = (X^T X)^{-} X^T Y.$$

For a Gauss-Markov model with $E(Y) = X\beta$ and $\mathrm{Var}(Y) = \sigma^2 I$, the OLS estimator of an estimable function $C^T \beta$ is the unique best linear unbiased estimator (b.l.u.e.) of $C^T \beta$. The OLS estimator is
$$C^T b = C^T (X^T X)^{-} X^T Y$$
for any solution $b$ to the normal equations, with
$$E(C^T b) = C^T \beta \quad \text{and} \quad \mathrm{Var}(C^T b) = \sigma^2 C^T (X^T X)^{-} C.$$
The distribution of $Y$ is not completely specified.

II. Generalized Least Squares Estimation

Consider the Aitken model $E(Y) = X\beta$ and $\mathrm{Var}(Y) = \sigma^2 V$, where $V$ is a positive definite symmetric matrix of known constants and $\sigma^2$ is an unknown variance parameter. A GLS estimator of $\beta$ is any $b$ that minimizes
$$Q(b) = (Y - Xb)^T V^{-1} (Y - Xb)$$
(from Definition 3.8 with $\Sigma = \sigma^2 V$).

The estimating equations are
$$(X^T V^{-1} X) b = X^T V^{-1} Y,$$
and a solution is
$$b_{GLS} = (X^T V^{-1} X)^{-} X^T V^{-1} Y.$$
For any estimable function $C^T \beta$, the unique b.l.u.e. is
$$C^T b_{GLS} = C^T (X^T V^{-1} X)^{-} X^T V^{-1} Y$$
for any solution to the normal equations, with
$$E(C^T b_{GLS}) = C^T \beta \quad \text{and} \quad \mathrm{Var}(C^T b_{GLS}) = \sigma^2 C^T (X^T V^{-1} X)^{-} C.$$
The distribution of $Y$ is not completely specified.

An unbiased estimator of $\sigma^2$ in the Aitken model is
$$\hat{\sigma}^2_{GLS} = \frac{1}{n - \mathrm{rank}(X)} \, Y^T \left[ V^{-1} - V^{-1} X (X^T V^{-1} X)^{-} X^T V^{-1} \right] Y.$$

In practice, $V$ may not be known. Then $b_{GLS}$ and $\hat{\sigma}^2_{GLS}$ can be approximated by replacing $V$ with a consistent estimator of $V$:
- The estimator of $C^T \beta$ is no longer b.l.u.e.
- The estimator of $\sigma^2$ is no longer unbiased.
- Both estimators are consistent.

III. Maximum Likelihood Estimation

The model must include a complete specification of the joint distribution of the observations. Find the parameter values that maximize the "likelihood" of the observed data.

Example: normal-theory Gauss-Markov model,
$$Y_j = \beta_0 + \beta_1 X_{1j} + \cdots + \beta_r X_{rj} + \epsilon_j, \quad \epsilon_j \sim NID(0, \sigma^2), \quad j = 1, \ldots, n,$$
or, equivalently, $Y = (Y_1, \ldots, Y_n)^T \sim N(X\beta, \sigma^2 I)$.

For the normal-theory Gauss-Markov model, the likelihood function is
$$L(\beta, \sigma^2; Y_1, \ldots, Y_n) = \frac{1}{(2\pi)^{n/2} \sigma^n} \, e^{-\frac{1}{2\sigma^2} (Y - X\beta)^T (Y - X\beta)}.$$
Find the values of $\beta$ and $\sigma^2$ that maximize this likelihood function. This is equivalent to finding values of $\beta$ and $\sigma^2$ that maximize the log-likelihood
$$\ell(\beta, \sigma^2; Y_1, \ldots, Y_n) = \log L(\beta, \sigma^2; Y_1, \ldots, Y_n)$$
$$= -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} (Y - X\beta)^T (Y - X\beta)$$
$$= -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} \sum_{j=1}^{n} (Y_j - \beta_0 - \cdots - \beta_r X_{rj})^2.$$
The final sum of squares is minimized by an OLS estimator of $\beta$, regardless of the value of $\sigma^2$.

Solve the likelihood equations:
$$\frac{\partial \ell(\beta, \sigma^2; Y)}{\partial \beta_0} = \frac{1}{\sigma^2} \sum_{j=1}^{n} (Y_j - \beta_0 - \cdots - \beta_r X_{rj}) = 0,$$
$$\frac{\partial \ell(\beta, \sigma^2; Y)}{\partial \beta_i} = \frac{1}{\sigma^2} \sum_{j=1}^{n} X_{ij} (Y_j - \beta_0 - \cdots - \beta_r X_{rj}) = 0 \quad \text{for } i = 1, 2, \ldots, r,$$
$$\frac{\partial \ell(\beta, \sigma^2; Y)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{j=1}^{n} (Y_j - \beta_0 - \cdots - \beta_r X_{rj})^2 = 0.$$

Solution:
$$\hat{\beta} = b_{OLS} = (X^T X)^{-} X^T Y$$
and
$$\hat{\sigma}^2 = \frac{1}{n} \sum_{j=1}^{n} (Y_j - \hat{\beta}_0 - \cdots - \hat{\beta}_r X_{rj})^2 = \frac{1}{n} Y^T (I - P_X) Y = \frac{1}{n} SSE.$$
This estimator of $\sigma^2$ is biased; $\frac{1}{n - \mathrm{rank}(X)} SSE$ is an unbiased estimator of $\sigma^2$, and $\frac{1}{n} SSE$ and $\frac{1}{n - \mathrm{rank}(X)} SSE$ are asymptotically equivalent.
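To make the formulas above concrete, here is a minimal numerical sketch (not part of the original notes; the design matrix, parameter values, and random seed are invented for illustration). It solves the normal equations for $\hat{\beta}$ and compares the biased MLE $\frac{1}{n} SSE$ with the unbiased estimator $\frac{1}{n - \mathrm{rank}(X)} SSE$.

```python
import numpy as np

# Minimal sketch of ML / OLS estimation in the normal-theory Gauss-Markov
# model Y = X beta + eps, eps_j ~ NID(0, sigma^2).  All inputs are made up.
rng = np.random.default_rng(1)

n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([2.0, 1.0, -0.5])
Y = X @ beta_true + rng.normal(scale=1.5, size=n)

# Solve the normal equations (X^T X) b = X^T Y (X has full column rank here).
b_ols = np.linalg.solve(X.T @ X, X.T @ Y)

resid = Y - X @ b_ols
sse = resid @ resid
rank_X = np.linalg.matrix_rank(X)

sigma2_mle = sse / n                   # maximum likelihood estimate (biased)
sigma2_unbiased = sse / (n - rank_X)   # unbiased estimator of sigma^2

print("beta hat:", b_ols)
print("sigma^2 MLE:", sigma2_mle, "  unbiased:", sigma2_unbiased)
```

For $n$ large relative to $\mathrm{rank}(X)$, the two variance estimates are nearly identical, which is the asymptotic equivalence noted above.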
General normal-theory linear model:
$$Y = X\beta + \epsilon, \quad \epsilon \sim N(0, \Sigma), \quad \Sigma \text{ known}.$$
The multivariate normal likelihood function is
$$L(\beta; Y) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \, e^{-\frac{1}{2} (Y - X\beta)^T \Sigma^{-1} (Y - X\beta)},$$
and the log-likelihood function is
$$\ell(\beta; Y) = -\frac{n}{2} \log(2\pi) - \frac{1}{2} \log(|\Sigma|) - \frac{1}{2} (Y - X\beta)^T \Sigma^{-1} (Y - X\beta).$$

Maximizing the log-likelihood when $\Sigma$ is known is equivalent to finding a $\beta$ that minimizes
$$(Y - X\beta)^T \Sigma^{-1} (Y - X\beta).$$
The estimating equations are
$$(X^T \Sigma^{-1} X) \beta = X^T \Sigma^{-1} Y,$$
and solutions are of the form
$$\hat{\beta} = b_{GLS} = (X^T \Sigma^{-1} X)^{-} X^T \Sigma^{-1} Y.$$
For the general normal-theory linear model, when $\Sigma$ is known, maximum likelihood estimation is the same as generalized least squares estimation.

When $\Sigma$ contains unknown parameters, you could maximize the log-likelihood
$$\ell(\beta, \Sigma; Y) = -\frac{n}{2} \log(2\pi) - \frac{1}{2} \log(|\Sigma|) - \frac{1}{2} (Y - X\beta)^T \Sigma^{-1} (Y - X\beta)$$
with respect to both $\beta$ and $\Sigma$. There may be no algebraic formulas for solutions to the joint likelihood equations, say $\hat{\beta}$ and $\hat{\Sigma}$, and the MLE for $\Sigma$ is usually biased.

Similarly, generalized least squares estimation and maximum likelihood estimation are equivalent for $\beta$ in the Aitken model $Y \sim N(X\beta, \sigma^2 V)$ when $V$ is known. Substituting $\Sigma = \sigma^2 V$ into the previous discussion, the log-likelihood is
$$\ell(\beta, \sigma^2; Y) = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log(\sigma^2) - \frac{1}{2} \log(|V|) - \frac{1}{2\sigma^2} (Y - X\beta)^T V^{-1} (Y - X\beta).$$
The likelihood equations are
$$(X^T V^{-1} X) \beta = X^T V^{-1} Y \quad \text{and} \quad \sigma^2 = \frac{1}{n} (Y - X\beta)^T V^{-1} (Y - X\beta).$$
Solutions have the form
$$\hat{\beta} = b_{GLS} = (X^T V^{-1} X)^{-} X^T V^{-1} Y$$
and
$$\hat{\sigma}^2 = \frac{1}{n} (Y - X\hat{\beta})^T V^{-1} (Y - X\hat{\beta}).$$
The likelihood equations are more complicated when $V$ contains unknown parameters.

General Properties of MLEs

Regularity Conditions 9.1:
(i) The parameter space has finite dimension, is closed and compact, and the true parameter vector is in the interior of the parameter space.
(ii) Probability distributions defined by any two different values of the parameter vector are distinct (an identifiability condition).
(iii) The first three partial derivatives of the log-likelihood function, with respect to the parameters, (a) exist and (b) are bounded by a function with a finite expectation.
(iv) The expectation of the negative of the matrix of second partial derivatives of the log-likelihood is (a) finite and (b) positive definite in a neighborhood of the true value of the parameter vector. This matrix is called the Fisher information matrix.

Suppose $Y_1, \ldots, Y_n$ are independent vectors of observations, with $Y_j = (Y_{1j}, \ldots, Y_{pj})^T$, $j = 1, \ldots, n$, and density function (or probability function) $f(Y_j; \theta)$, where $\theta = (\theta_1, \ldots, \theta_r)^T$. Then the joint likelihood function is
$$L(\theta; Y_1, \ldots, Y_n) = \prod_{j=1}^{n} f(Y_j; \theta),$$
and the log-likelihood function is
$$\ell(\theta; Y_1, \ldots, Y_n) = \log\left( L(\theta; Y_1, \ldots, Y_n) \right) = \sum_{j=1}^{n} \log f(Y_j; \theta).$$

The score function
$$u(\theta) = \begin{bmatrix} u_1(\theta) \\ \vdots \\ u_r(\theta) \end{bmatrix} = \begin{bmatrix} \frac{\partial \ell(\theta; Y_1, \ldots, Y_n)}{\partial \theta_1} \\ \vdots \\ \frac{\partial \ell(\theta; Y_1, \ldots, Y_n)}{\partial \theta_r} \end{bmatrix}$$
is the vector of first partial derivatives of the log-likelihood function with respect to the elements of $\theta$. The likelihood equations are
$$u(\theta; Y_1, \ldots, Y_n) = 0.$$
The maximum likelihood estimator (MLE) $\hat{\theta} = (\hat{\theta}_1, \ldots, \hat{\theta}_r)^T$ is a solution to the likelihood equations that maximizes the log-likelihood function.

Fisher information matrix:
$$i(\theta) = \mathrm{Var}\left( u(\theta; Y_1, \ldots, Y_n) \right) = E\left( u(\theta; Y_1, \ldots, Y_n) [u(\theta; Y_1, \ldots, Y_n)]^T \right),$$
the matrix whose $(r, k)$ element is
$$-E\left[ \frac{\partial^2 \ell(\theta; Y_1, \ldots, Y_n)}{\partial \theta_r \, \partial \theta_k} \right].$$
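As a numerical check of the score and information formulas for the linear models discussed above, the following sketch (not from the notes; the design, $V$, $\sigma^2$, and seed are invented, and $\sigma^2$ is treated as known so that the information for $\beta$ takes the simple form $X^T V^{-1} X / \sigma^2$) verifies that the GLS estimator solves the likelihood equations $u(\beta) = 0$ in an Aitken model with known $V$.

```python
import numpy as np

# Minimal sketch: score u(beta) and Fisher information i(beta) for beta in an
# Aitken model Y ~ N(X beta, sigma^2 V), with V and sigma^2 treated as known.
# All inputs below are made up for illustration.
rng = np.random.default_rng(0)

n, sigma2 = 50, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])
V = np.diag(rng.uniform(0.5, 2.0, size=n))          # known positive definite V
beta_true = np.array([1.0, -0.5])
Y = X @ beta_true + np.linalg.cholesky(sigma2 * V) @ rng.normal(size=n)

V_inv = np.linalg.inv(V)

def score(beta):
    # u(beta) = (1 / sigma^2) X^T V^{-1} (Y - X beta)
    return X.T @ V_inv @ (Y - X @ beta) / sigma2

info = X.T @ V_inv @ X / sigma2                      # i(beta), 2 x 2 here

# The GLS estimator solves the likelihood equations u(beta) = 0.
b_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ Y)
print("score at b_GLS (should be ~ 0):", score(b_gls))
print("inverse information (approx. Var(b_GLS)):", np.linalg.inv(info))
```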
Let $\theta$ denote the parameter vector, $i(\theta)$ denote the Fisher information matrix, and $\hat{\theta}$ denote the MLE for $\theta$. Then, if Regularity Conditions 9.1 are satisfied, we have the following results.

Result 9.1: $\hat{\theta}$ is a consistent estimator:
$$\Pr\left\{ (\hat{\theta} - \theta)^T (\hat{\theta} - \theta) > \delta \right\} \to 0 \quad \text{as } n \to \infty, \text{ for any } \delta > 0.$$

Result 9.2: Asymptotic normality:
$$\sqrt{n} \, (\hat{\theta} - \theta) \xrightarrow{\ dist\ } N\left( 0, \ \lim_{n \to \infty} n [i(\theta)]^{-1} \right) \quad \text{as } n \to \infty.$$
With a slight abuse of notation we may express this as
$$\hat{\theta} \sim N\left( \theta, \ [i(\theta)]^{-1} \right)$$
for "large" sample sizes. (A small simulation illustrating this approximation appears after the references.)

References:

Anderson, T.W. (1984). An Introduction to Multivariate Statistical Analysis (2nd ed.). Wiley, New York.

Cox, C. (1984). American Statistician, 38, pp. 283-287.

Cox, D.R. and Hinkley, D.V. (1974). Theoretical Statistics. Chapman & Hall, London (Chapters 8 and 9).

Rao, C.R. (1973). Linear Statistical Inference. Wiley, New York (Chapter 5).
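As a closing illustration of Result 9.2, here is a small simulation sketch (not part of the original notes; the exponential model, the value of $\theta$, the sample size, and the number of replicates are arbitrary choices). For an exponential model with rate $\theta$, the MLE is $\hat{\theta} = 1/\bar{Y}$ and the Fisher information is $i(\theta) = n/\theta^2$, so the empirical variance of $\hat{\theta}$ across replicates should be close to $[i(\theta)]^{-1} = \theta^2/n$.

```python
import numpy as np

# Simulation sketch of Result 9.2 for an exponential model with rate theta:
# the MLE is 1 / Ybar and the Fisher information is n / theta^2.
rng = np.random.default_rng(2)

theta, n, reps = 2.0, 200, 5000
samples = rng.exponential(scale=1.0 / theta, size=(reps, n))
theta_hat = 1.0 / samples.mean(axis=1)          # MLE in each replicate

print("mean of MLEs:", theta_hat.mean(), " (true theta =", theta, ")")
print("empirical variance of MLEs:", theta_hat.var(ddof=1))
print("1 / i(theta) = theta^2 / n:", theta**2 / n)
```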