9. Maximum Likelihood Estimation

I. Ordinary Least Squares Estimation

For a linear model
$$Y_j = \beta_0 + \beta_1 X_{1j} + \cdots + \beta_r X_{rj} + \epsilon_j, \qquad j = 1, \dots, n,$$
the OLS estimator for $\beta = (\beta_0, \beta_1, \dots, \beta_r)^T$ is any $b = (b_0, b_1, \dots, b_r)^T$ that minimizes the sum of squared residuals
$$Q(b) = \sum_{j=1}^{n} (Y_j - b_0 - b_1 X_{1j} - \cdots - b_r X_{rj})^2.$$

The estimating equations (normal equations) are
$$\frac{\partial Q(b)}{\partial b_0} = -2 \sum_{j=1}^{n} (Y_j - b_0 - b_1 X_{1j} - \cdots - b_r X_{rj}) = 0$$
and
$$\frac{\partial Q(b)}{\partial b_i} = -2 \sum_{j=1}^{n} X_{ij} (Y_j - b_0 - b_1 X_{1j} - \cdots - b_r X_{rj}) = 0 \quad \text{for } i = 1, 2, \dots, r.$$
The matrix form of these equations is
$$(X^T X) b = X^T Y,$$
and a solution is
$$b = (X^T X)^{-} X^T Y.$$

The OLS estimator of an estimable function $C^T\beta$ is
$$C^T b = C^T (X^T X)^{-} X^T Y$$
for any solution $b$ to the normal equations, with
$$E(C^T b) = C^T\beta \quad \text{and} \quad Var(C^T b) = C^T (X^T X)^{-} X^T \Sigma X [(X^T X)^{-}]^T C,$$
where $\Sigma = Var(Y)$. The distribution of $Y$ is not completely specified.

For a Gauss-Markov model with $E(Y) = X\beta$ and $Var(Y) = \sigma^2 I$, the OLS estimator of an estimable function $C^T\beta$ is the unique best linear unbiased estimator (b.l.u.e.) of $C^T\beta$:
$$E(C^T b) = C^T\beta \quad \text{and} \quad Var(C^T b) = \sigma^2 C^T (X^T X)^{-} C$$
is smaller than the variance of any other linear unbiased estimator for $C^T\beta$. The distribution of $Y$ is not completely specified.

II. Generalized Least Squares Estimation

Consider the Aitken model $E(Y) = X\beta$ and $Var(Y) = \sigma^2 V$, where $V$ is a positive definite symmetric matrix of known constants and $\sigma^2$ is an unknown variance parameter.

A GLS estimator for $\beta$ is any $b$ that minimizes
$$Q(b) = (Y - Xb)^T V^{-1} (Y - Xb)$$
(from Definition 3.8 with $\Sigma = \sigma^2 V$).

The estimating equations are
$$(X^T V^{-1} X) b = X^T V^{-1} Y,$$
and a solution is
$$b_{GLS} = (X^T V^{-1} X)^{-} X^T V^{-1} Y.$$
For any estimable function $C^T\beta$ the unique b.l.u.e. is
$$C^T b_{GLS} = C^T (X^T V^{-1} X)^{-} X^T V^{-1} Y$$
for any solution to the normal equations, with
$$E(C^T b_{GLS}) = C^T\beta \quad \text{and} \quad Var(C^T b_{GLS}) = \sigma^2 C^T (X^T V^{-1} X)^{-} C.$$
The distribution of $Y$ is not completely specified.

An unbiased estimator for $\sigma^2$ in the Aitken model is
$$\hat{\sigma}^2_{GLS} = \frac{Y^T \left[ V^{-1} - V^{-1} X (X^T V^{-1} X)^{-} X^T V^{-1} \right] Y}{n - rank(X)} = \frac{(Y - X b_{GLS})^T V^{-1} (Y - X b_{GLS})}{n - rank(X)}.$$

In practice, $V$ may not be known. Then $b_{GLS}$ and $\hat{\sigma}^2_{GLS}$ can be approximated by replacing $V$ with a consistent estimator:
- The estimator for $C^T\beta$ is not b.l.u.e.
- The estimator for $\sigma^2$ is not unbiased.
- Both estimators are consistent.

III. Maximum Likelihood Estimation

Find the parameter values that maximize the "likelihood" of the observed data. The model must include a specification of the joint distribution of the observations.

Example: normal-theory Gauss-Markov model,
$$Y_j = \beta_0 + \beta_1 X_{1j} + \cdots + \beta_r X_{rj} + \epsilon_j, \qquad \epsilon_j \sim NID(0, \sigma^2), \qquad j = 1, \dots, n,$$
or
$$Y = (Y_1, \dots, Y_n)^T \sim N(X\beta, \sigma^2 I).$$

For the normal-theory Gauss-Markov model, the likelihood function is
$$L(\beta, \sigma^2; Y_1, \dots, Y_n) = \frac{1}{(2\pi\sigma^2)^{n/2}} \, e^{-\frac{1}{2\sigma^2}(Y - X\beta)^T (Y - X\beta)}.$$
Find values of $\beta$ and $\sigma^2$ that maximize this likelihood function.

This is equivalent to finding values of $\beta$ and $\sigma^2$ that maximize the log-likelihood
$$\ell(\beta, \sigma^2; Y_1, \dots, Y_n) = \log L(\beta, \sigma^2; Y_1, \dots, Y_n)$$
$$= -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}(Y - X\beta)^T(Y - X\beta)$$
$$= -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{j=1}^{n}(Y_j - \beta_0 - \cdots - \beta_r X_{rj})^2.$$
The sum of squares in the last term is minimized by an OLS estimator for $\beta$, regardless of the value of $\sigma^2$.

Solve the likelihood equations:
$$0 = \frac{\partial \ell(\beta, \sigma^2; Y)}{\partial \beta_0} = \frac{1}{\sigma^2}\sum_{j=1}^{n}(Y_j - \beta_0 - \cdots - \beta_r X_{rj}),$$
$$0 = \frac{\partial \ell(\beta, \sigma^2; Y)}{\partial \beta_i} = \frac{1}{\sigma^2}\sum_{j=1}^{n} X_{ij}(Y_j - \beta_0 - \cdots - \beta_r X_{rj}) \quad \text{for } i = 1, 2, \dots, r,$$
$$0 = \frac{\partial \ell(\beta, \sigma^2; Y)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum_{j=1}^{n}(Y_j - \beta_0 - \cdots - \beta_r X_{rj})^2.$$

Solution:
$$\hat{\beta} = b_{OLS} = (X^T X)^{-} X^T Y$$
and
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{j=1}^{n}(Y_j - \hat{\beta}_0 - \cdots - \hat{\beta}_r X_{rj})^2 = \frac{1}{n} Y^T (I - P_X) Y = \frac{1}{n}\,SSE.$$
This is a biased estimator for $\sigma^2$; $\frac{1}{n - rank(X)}\,SSE$ is an unbiased estimator for $\sigma^2$. The two estimators $\frac{1}{n}\,SSE$ and $\frac{1}{n - rank(X)}\,SSE$ are asymptotically equivalent.
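To make the computations above concrete, here is a minimal numerical sketch (not part of the original notes; the simulated data and design matrix are invented for illustration). It solves the normal equations for a small simulated data set and compares the biased ML estimator $SSE/n$ with the unbiased estimator $SSE/(n - rank(X))$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (illustrative only): intercept plus two covariates
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])
sigma_true = 1.5
Y = X @ beta_true + rng.normal(scale=sigma_true, size=n)

# Solve the normal equations (X^T X) b = X^T Y; lstsq returns the
# minimum-norm least squares solution, one choice of (X^T X)^- X^T Y
b, *_ = np.linalg.lstsq(X, Y, rcond=None)

residuals = Y - X @ b
SSE = residuals @ residuals
rank_X = np.linalg.matrix_rank(X)

sigma2_mle = SSE / n                   # biased ML estimator
sigma2_unbiased = SSE / (n - rank_X)   # unbiased estimator

print("b_OLS            :", b)
print("SSE/n            :", sigma2_mle)
print("SSE/(n - rank(X)):", sigma2_unbiased)
```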
Normal-theory Aitken model:
$$Y = X\beta + \epsilon, \qquad \epsilon \sim N(0, \sigma^2 V),$$
where $V$ is a known positive definite matrix. The multivariate normal likelihood function is
$$L(\beta, \sigma^2; Y) = \frac{1}{(2\pi\sigma^2)^{n/2} |V|^{1/2}} \, e^{-\frac{1}{2\sigma^2}(Y - X\beta)^T V^{-1} (Y - X\beta)}.$$

The log-likelihood function is
$$\ell(\beta, \sigma^2; Y) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log(|V|) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}(Y - X\beta)^T V^{-1}(Y - X\beta).$$

For any value of $\sigma^2$, the log-likelihood is maximized by finding a $\beta$ that minimizes
$$(Y - X\beta)^T V^{-1}(Y - X\beta).$$
The estimating equations are
$$(X^T V^{-1} X)\beta = X^T V^{-1} Y,$$
and solutions are of the form
$$\hat{\beta} = b_{GLS} = (X^T V^{-1} X)^{-} X^T V^{-1} Y.$$
When $V$ is known, the MLE for $\beta$ is also the generalized least squares estimator.

The additional estimating equation corresponding to $\sigma^2$ is
$$0 = \frac{\partial \ell(\beta, \sigma^2; Y)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}(Y - X\beta)^T V^{-1}(Y - X\beta).$$
Substituting the solution to the other estimating equations for $\beta$, the solution is
$$\hat{\sigma}^2 = \frac{1}{n}(Y - X b_{GLS})^T V^{-1}(Y - X b_{GLS}).$$
This is a biased estimator for $\sigma^2$.

When $V$ contains unknown parameters, you could maximize the log-likelihood
$$\ell(\beta, \sigma^2; Y) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log(|V|) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}(Y - X\beta)^T V^{-1}(Y - X\beta)$$
with respect to $\beta$, $\sigma^2$, and the parameters in $V$.
- There may be no algebraic formulas for the solutions to the joint likelihood equations.
- The MLE's for $\sigma^2$ and the parameters in $V$ are usually biased (too small).
- REML estimates are often used.

General Properties of MLE's

Regularity Conditions:
(i) The parameter space has finite dimension, is closed and compact, and the true parameter vector is in the interior of the parameter space.
(ii) Probability distributions defined by any two different values of the parameter vector are distinct (an identifiability condition).
(iii) The first three partial derivatives of the log-likelihood function with respect to the parameters (a) exist and (b) are bounded by a function with a finite expectation.
(iv) The expectation of the negative of the matrix of second partial derivatives of the log-likelihood is (a) finite and (b) positive definite in a neighborhood of the true value of the parameter vector. This matrix is called the Fisher information matrix.

Suppose $Y_1, \dots, Y_n$ are independent vectors of observations, with
$$Y_j = (Y_{1j}, \dots, Y_{pj})^T,$$
and density function (or probability function) $f(Y_j; \theta)$. Then the joint likelihood function is
$$L(\theta; Y_1, \dots, Y_n) = \prod_{j=1}^{n} f(Y_j; \theta),$$
and the log-likelihood function is
$$\ell(\theta; Y_1, \dots, Y_n) = \log\left(L(\theta; Y_1, \dots, Y_n)\right) = \sum_{j=1}^{n} \log\left(f(Y_j; \theta)\right).$$

The score function
$$u(\theta) = \begin{bmatrix} u_1(\theta) \\ \vdots \\ u_r(\theta) \end{bmatrix} = \begin{bmatrix} \frac{\partial \ell(\theta; Y_1, \dots, Y_n)}{\partial \theta_1} \\ \vdots \\ \frac{\partial \ell(\theta; Y_1, \dots, Y_n)}{\partial \theta_r} \end{bmatrix}$$
is the vector of first partial derivatives of the log-likelihood function with respect to the elements of $\theta = (\theta_1, \dots, \theta_r)^T$. The likelihood equations are
$$u(\theta; Y_1, \dots, Y_n) = 0.$$

The maximum likelihood estimator (MLE) $\hat{\theta} = (\hat{\theta}_1, \dots, \hat{\theta}_r)^T$ is a solution to the likelihood equations that maximizes the log-likelihood function.

Fisher information matrix:
$$i(\theta) = Var\left(u(\theta; Y_1, \dots, Y_n)\right) = E\left(u(\theta; Y_1, \dots, Y_n)\,[u(\theta; Y_1, \dots, Y_n)]^T\right) = -E\left[\frac{\partial^2 \ell(\theta; Y_1, \dots, Y_n)}{\partial\theta\,\partial\theta^T}\right].$$

Let $\theta$ denote the parameter vector, $i(\theta)$ denote the Fisher information matrix, and $\hat{\theta}$ denote the MLE for $\theta$. Then, if the Regularity Conditions are satisfied, we have the following results.

Result 9.1: $\hat{\theta}$ is a consistent estimator:
$$Pr\left\{(\hat{\theta} - \theta)^T(\hat{\theta} - \theta) > \delta\right\} \to 0 \quad \text{as } n \to \infty, \text{ for any } \delta > 0.$$

Result 9.2: Asymptotic normality.
$$\sqrt{n}\,(\hat{\theta} - \theta) \xrightarrow{\;dist\;} N\left(0, \ \lim_{n\to\infty} n\,[i(\theta)]^{-1}\right) \quad \text{as } n \to \infty.$$
With a slight abuse of notation we may express this as
$$\hat{\theta} \,\dot\sim\, N\left(\theta, \ [i(\theta)]^{-1}\right)$$
for "large" sample sizes.
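As a small illustration of how Result 9.2 is used (a sketch, not part of the original notes; the i.i.d. $N(\mu, \sigma^2)$ example and all numbers are assumptions made for illustration), the code below evaluates the Fisher information $i(\mu, \sigma^2) = \mathrm{diag}\!\left(n/\sigma^2,\ n/(2\sigma^4)\right)$ at the MLEs $\hat{\mu} = \bar{Y}$ and $\hat{\sigma}^2 = \frac{1}{n}\sum_j (Y_j - \bar{Y})^2$, and uses $[i(\hat{\theta})]^{-1}$ as an approximate covariance matrix for the MLEs.

```python
import numpy as np

rng = np.random.default_rng(1)

# i.i.d. sample from N(mu, sigma^2); values chosen only for illustration
n, mu_true, sigma_true = 200, 5.0, 2.0
Y = rng.normal(loc=mu_true, scale=sigma_true, size=n)

# MLEs: mu_hat = sample mean, sigma2_hat = SSE/n (the biased estimator)
mu_hat = Y.mean()
sigma2_hat = np.mean((Y - mu_hat) ** 2)

# Fisher information for theta = (mu, sigma^2) in this normal model:
# i(theta) = diag(n / sigma^2, n / (2 sigma^4)), evaluated at the MLE
info = np.diag([n / sigma2_hat, n / (2.0 * sigma2_hat**2)])

# Result 9.2: theta_hat is approximately N(theta, [i(theta)]^{-1}),
# so approximate standard errors come from the inverse information matrix
approx_cov = np.linalg.inv(info)
se_mu, se_sigma2 = np.sqrt(np.diag(approx_cov))

print("mu_hat     :", mu_hat, "  approx SE:", se_mu)
print("sigma2_hat :", sigma2_hat, "  approx SE:", se_sigma2)
```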
Result 9.3: If $\hat{\theta}$ is the MLE for $\theta$, then the MLE for $g(\theta)$ is $g(\hat{\theta})$, for any function $g(\cdot)$.

References:

Anderson, T.W. (1984). An Introduction to Multivariate Statistical Analysis (2nd ed.), Wiley, New York.

Cox, C. (1984). American Statistician, 38, pp. 283-287.

Cox, D.R. and Hinkley, D.V. (1974). Theoretical Statistics, Chapman & Hall, London (Chapters 8 and 9).

Rao, C.R. (1973). Linear Statistical Inference, Wiley, New York (Chapter 5).