9. Maximum Likelihood Estimation

I. Ordinary Least Squares Estimation: only requires a model for the conditional mean of the response variable.

For a linear model
\[
Y_j = \beta_0 + \beta_1 X_{1j} + \cdots + \beta_r X_{rj} + \epsilon_j ,
\]
the OLS estimator for
\[
\beta = \begin{bmatrix} \beta_0 \\ \vdots \\ \beta_r \end{bmatrix}
\quad \mbox{is any} \quad
b = \begin{bmatrix} b_0 \\ \vdots \\ b_r \end{bmatrix}
\]
that minimizes the sum of squared residuals
\[
Q(b) = \sum_{j=1}^{n} (Y_j - b_0 - b_1 X_{1j} - \cdots - b_r X_{rj})^2 .
\]

The estimating equations (normal equations) are
\[
\frac{\partial Q(b)}{\partial b_0} = -2 \sum_{j=1}^{n} (Y_j - b_0 - b_1 X_{1j} - \cdots - b_r X_{rj}) = 0
\]
and
\[
\frac{\partial Q(b)}{\partial b_i} = -2 \sum_{j=1}^{n} X_{ij}\,(Y_j - b_0 - b_1 X_{1j} - \cdots - b_r X_{rj}) = 0
\quad \mbox{for } i = 1, 2, \ldots, r .
\]
The matrix form of these equations is
\[
(X^T X)\, b = X^T Y ,
\]
and a solution is
\[
b = (X^T X)^{-} X^T Y .
\]

The OLS estimator for an estimable function $C^T \beta$ is
\[
C^T b = C^T (X^T X)^{-} X^T Y
\]
for any solution $b$ to the normal equations.

For a Gauss-Markov model with $E(Y) = X\beta$ and $Var(Y) = \sigma^2 I$, the OLS estimator of an estimable function $C^T\beta$ is the unique best linear unbiased estimator (b.l.u.e.) of $C^T\beta$:
\[
E(C^T b) = C^T \beta
\quad \mbox{and} \quad
Var(C^T b) = \sigma^2\, C^T (X^T X)^{-} C
\]
is smaller than the variance of any other linear unbiased estimator for $C^T\beta$. The distribution of $Y$ is not completely specified.

When $Var(Y) = \Sigma$ is not necessarily $\sigma^2 I$, the OLS estimator still satisfies
\[
E(C^T b) = C^T \beta
\quad \mbox{and} \quad
Var(C^T b) = C^T (X^T X)^{-} X^T \Sigma\, X (X^T X)^{-} C ,
\quad \mbox{where } \Sigma = Var(Y) .
\]
The distribution of $Y$ is not completely specified.

II. Generalized Least Squares Estimation

Consider the Aitken model $E(Y) = X\beta$ and $Var(Y) = \sigma^2 V$, where $V$ is a positive definite symmetric matrix of known constants and $\sigma^2$ is an unknown variance parameter. The distribution of $Y$ is not completely specified.

A GLS estimator for $\beta$ is any $b$ that minimizes
\[
Q(b) = (Y - Xb)^T V^{-1} (Y - Xb) .
\]
The estimating equations are
\[
(X^T V^{-1} X)\, b = X^T V^{-1} Y ,
\]
and a solution is
\[
b_{GLS} = (X^T V^{-1} X)^{-} X^T V^{-1} Y .
\]

For any estimable function $C^T\beta$, the unique b.l.u.e. is
\[
C^T b_{GLS} = C^T (X^T V^{-1} X)^{-} X^T V^{-1} Y
\]
for any solution to the normal equations, with
\[
E(C^T b_{GLS}) = C^T \beta
\quad \mbox{and} \quad
Var(C^T b_{GLS}) = \sigma^2\, C^T (X^T V^{-1} X)^{-} C
\]
(from Definition 3.8 with $\Sigma = \sigma^2 V$).

An unbiased estimator for $\sigma^2$ in the Aitken model is
\[
\hat{\sigma}^2_{GLS} = \frac{1}{n - \mbox{rank}(X)}\; Y^T \left[ V^{-1} - V^{-1} X (X^T V^{-1} X)^{-} X^T V^{-1} \right] Y .
\]

In practice, $V$ may not be known. Then $\sigma^2$ and $b_{GLS}$ can be approximated by replacing $V$ with a consistent estimator for $V$:
- The estimator for $C^T\beta$ is not b.l.u.e.
- The estimator for $\sigma^2$ is not unbiased.
- Both estimators are consistent.

III. Maximum Likelihood Estimation

The model must include a specification of the joint distribution of the observations. Find the parameter values that maximize the "likelihood" of the observed data.

Example: Normal theory Gauss-Markov model:
\[
Y_j = \beta_0 + \beta_1 X_{1j} + \cdots + \beta_r X_{rj} + \epsilon_j ,
\quad \mbox{where } \epsilon_j \sim NID(0, \sigma^2), \quad j = 1, \ldots, n ,
\]
or
\[
Y = \begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix} \sim N(X\beta,\, \sigma^2 I) .
\]

For the normal-theory Gauss-Markov model, the likelihood function is
\[
L(\beta, \sigma^2; Y_1, \ldots, Y_n) = \frac{1}{(2\pi)^{n/2}\, \sigma^{n}}\; e^{-\frac{1}{2\sigma^2} (Y - X\beta)^T (Y - X\beta)} .
\]
Find values of $\beta$ and $\sigma^2$ that maximize this likelihood function. This is equivalent to finding values of $\beta$ and $\sigma^2$ that maximize the log-likelihood
\[
\ell(\beta, \sigma^2; Y_1, \ldots, Y_n) = \log L(\beta, \sigma^2; Y_1, \ldots, Y_n)
= -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2} (Y - X\beta)^T (Y - X\beta)
\]
\[
= -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2} \sum_{j=1}^{n} (Y_j - \beta_0 - \cdots - \beta_r X_{rj})^2 .
\]
The sum of squares in the last term is minimized by an OLS estimator for $\beta$, regardless of the value of $\sigma^2$.

Solve the likelihood equations:
\[
\frac{\partial \ell(\beta, \sigma^2; Y)}{\partial \beta_0} = \frac{1}{\sigma^2} \sum_{j=1}^{n} (Y_j - \beta_0 - \cdots - \beta_r X_{rj}) = 0 ,
\]
\[
\frac{\partial \ell(\beta, \sigma^2; Y)}{\partial \beta_i} = \frac{1}{\sigma^2} \sum_{j=1}^{n} X_{ij}\,(Y_j - \beta_0 - \cdots - \beta_r X_{rj}) = 0
\quad \mbox{for } i = 1, 2, \ldots, r ,
\]
and
\[
\frac{\partial \ell(\beta, \sigma^2; Y)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{j=1}^{n} (Y_j - \beta_0 - \cdots - \beta_r X_{rj})^2 = 0 .
\]

Solution:
\[
\hat{\beta} = b_{OLS} = (X^T X)^{-} X^T Y
\]
and
\[
\hat{\sigma}^2 = \frac{1}{n} \sum_{j=1}^{n} (Y_j - \hat{\beta}_0 - \hat{\beta}_1 X_{1j} - \cdots - \hat{\beta}_r X_{rj})^2
= \frac{1}{n}\, Y^T (I - P_X)\, Y = \frac{1}{n}\, SSE .
\]

This estimator for $\sigma^2$ is biased. $\frac{1}{n - \mbox{rank}(X)}\, SSE$ is an unbiased estimator for $\sigma^2$. $\frac{1}{n}\, SSE$ and $\frac{1}{n - \mbox{rank}(X)}\, SSE$ are asymptotically equivalent.
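As a numerical companion to the formulas above (not part of the original notes), the following Python sketch simulates data from an Aitken model with a known diagonal $V$, solves the OLS and GLS normal equations, and compares the unbiased estimator of $\sigma^2$ with the divide-by-$n$ maximum likelihood form. The variable names, the simulated design, and the use of numpy are illustrative choices, not anything prescribed by the notes.

    # Illustrative sketch: OLS and GLS for a simulated Aitken model
    # Y = X beta + eps with Var(eps) = sigma^2 V and V known.
    import numpy as np

    rng = np.random.default_rng(0)
    n, r = 50, 2
    X = np.column_stack([np.ones(n), rng.normal(size=(n, r))])   # intercept + r covariates
    beta_true = np.array([1.0, 2.0, -0.5])
    sigma2_true = 4.0

    # A known positive definite V (diagonal, i.e. heteroscedastic errors)
    V = np.diag(rng.uniform(0.5, 2.0, size=n))
    Y = X @ beta_true + np.linalg.cholesky(sigma2_true * V) @ rng.normal(size=n)

    # OLS: solve the normal equations (X'X) b = X'Y
    b_ols = np.linalg.pinv(X.T @ X) @ X.T @ Y

    # GLS: solve (X' V^{-1} X) b = X' V^{-1} Y
    Vinv = np.linalg.inv(V)
    b_gls = np.linalg.pinv(X.T @ Vinv @ X) @ X.T @ Vinv @ Y

    # Unbiased estimator of sigma^2 in the Aitken model: SSE_GLS / (n - rank(X))
    resid = Y - X @ b_gls
    sse_gls = resid @ Vinv @ resid
    sigma2_unbiased = sse_gls / (n - np.linalg.matrix_rank(X))

    # Maximum likelihood form divides by n instead and is biased downward
    sigma2_mle = sse_gls / n

    print("OLS: ", np.round(b_ols, 3))
    print("GLS: ", np.round(b_gls, 3))
    print("sigma^2 unbiased:", round(sigma2_unbiased, 3), " MLE:", round(sigma2_mle, 3))

Under the Aitken model, the GLS estimator of any estimable function has variance no larger than that of the OLS estimator, which is the b.l.u.e. property stated above; the divide-by-$n$ estimator of $\sigma^2$ corresponds to the maximum likelihood solution discussed next.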
General normal-theory linear model:
\[
Y = X\beta + \epsilon , \quad \mbox{where } \epsilon \sim N(0, \Sigma) \mbox{ and } \Sigma \mbox{ is known.}
\]
The multivariate normal likelihood function is
\[
L(\beta; Y) = \frac{1}{(2\pi)^{n/2}\, |\Sigma|^{1/2}}\; e^{-\frac{1}{2} (Y - X\beta)^T \Sigma^{-1} (Y - X\beta)} ,
\]
and the log-likelihood function is
\[
\ell(\beta; Y) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log(|\Sigma|) - \frac{1}{2} (Y - X\beta)^T \Sigma^{-1} (Y - X\beta) .
\]

Maximizing the log-likelihood when $\Sigma$ is known is equivalent to finding a $\beta$ that minimizes
\[
(Y - X\beta)^T \Sigma^{-1} (Y - X\beta) .
\]
The estimating equations are
\[
(X^T \Sigma^{-1} X)\, \beta = X^T \Sigma^{-1} Y ,
\]
and solutions are of the form
\[
\hat{\beta} = b_{GLS} = (X^T \Sigma^{-1} X)^{-} X^T \Sigma^{-1} Y .
\]
For the general normal theory linear model, when $\Sigma$ is known, maximum likelihood estimation is the same as generalized least squares estimation.

Similarly, generalized least squares estimation and maximum likelihood estimation are equivalent for $\beta$ in the Aitken model $Y \sim N(X\beta, \sigma^2 V)$ when $V$ is known. Substitute $\Sigma = \sigma^2 V$ into the previous discussion. Then, the log-likelihood is
\[
\ell(\beta, \sigma^2; Y) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2}\log(|V|)
- \frac{1}{2\sigma^2} (Y - X\beta)^T V^{-1} (Y - X\beta) .
\]
The "likelihood equations" are
\[
(X^T V^{-1} X)\, \beta = X^T V^{-1} Y
\quad \mbox{and} \quad
\sigma^2 = \frac{1}{n} (Y - X\beta)^T V^{-1} (Y - X\beta) .
\]
Solutions have the form
\[
\hat{\beta} = b_{GLS} = (X^T V^{-1} X)^{-} X^T V^{-1} Y
\quad \mbox{and} \quad
\hat{\sigma}^2 = \frac{1}{n} (Y - X\hat{\beta})^T V^{-1} (Y - X\hat{\beta}) .
\]

When $\Sigma$ contains unknown parameters, you could maximize the log-likelihood
\[
\ell(\beta, \Sigma; Y) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log(|\Sigma|) - \frac{1}{2} (Y - X\beta)^T \Sigma^{-1} (Y - X\beta)
\]
with respect to both $\beta$ and $\Sigma$.
- There may be no algebraic formulas for solutions to the joint likelihood equations, say $\hat{\beta}$ and $\hat{\Sigma}$.
- The MLE for $\Sigma$ is usually biased.
The likelihood equations are more complicated when $V$ contains unknown parameters.

General Properties of MLE's

Regularity Conditions:
(i) The parameter space has finite dimension, is closed and compact, and the true parameter vector is in the interior of the parameter space.
(ii) Probability distributions defined by any two different values of the parameter vector are distinct (an identifiability condition).
(iii) The first three partial derivatives of the log-likelihood function, with respect to the parameters,
  (a) exist
  (b) are bounded by a function with a finite expectation.
(iv) The expectation of the negative of the matrix of second partial derivatives of the log-likelihood is
  (a) finite
  (b) positive definite
in a neighborhood of the true value of the parameter vector. This is called the Fisher information matrix.

Suppose $Y_1, \ldots, Y_n$ are independent vectors of observations, with
\[
Y_j = \begin{bmatrix} Y_{1j} \\ \vdots \\ Y_{pj} \end{bmatrix} ,
\]
and the density function (or probability function) is $f(Y_j; \theta)$. Then, the joint likelihood function is
\[
L(\theta; Y_1, \ldots, Y_n) = \prod_{j=1}^{n} f(Y_j; \theta)
\]
and the log-likelihood function is
\[
\ell(\theta; Y_1, \ldots, Y_n) = \log\left( L(\theta; Y_1, \ldots, Y_n) \right) = \sum_{j=1}^{n} \log\left( f(Y_j; \theta) \right) .
\]

The score function
\[
u(\theta) = \begin{bmatrix} u_1(\theta) \\ \vdots \\ u_r(\theta) \end{bmatrix}
= \begin{bmatrix} \frac{\partial \ell(\theta;\, Y_1, \ldots, Y_n)}{\partial \theta_1} \\ \vdots \\ \frac{\partial \ell(\theta;\, Y_1, \ldots, Y_n)}{\partial \theta_r} \end{bmatrix}
\]
is the vector of first partial derivatives of the log-likelihood function with respect to the elements of
\[
\theta = \begin{bmatrix} \theta_1 \\ \vdots \\ \theta_r \end{bmatrix} .
\]
The likelihood equations are
\[
u(\theta; Y_1, \ldots, Y_n) = 0 .
\]
The maximum likelihood estimator (MLE)
\[
\hat{\theta} = \begin{bmatrix} \hat{\theta}_1 \\ \vdots \\ \hat{\theta}_r \end{bmatrix}
\]
is a solution to the likelihood equations that maximizes the log-likelihood function.

Fisher information matrix:
\[
i(\theta) = Var\left( u(\theta; Y_1, \ldots, Y_n) \right)
= E\left( u(\theta; Y_1, \ldots, Y_n)\, [u(\theta; Y_1, \ldots, Y_n)]^T \right)
= -E\left( \left[ \frac{\partial^2 \ell(\theta;\, Y_1, \ldots, Y_n)}{\partial \theta_j\, \partial \theta_k} \right] \right) .
\]

Let $\theta$ denote the parameter vector, $i(\theta)$ denote the Fisher information matrix, and $\hat{\theta}$ denote the MLE for $\theta$. Then, if the Regularity Conditions are satisfied, we have the following results:

Result 9.1: $\hat{\theta}$ is a consistent estimator:
\[
\Pr\left\{ (\hat{\theta} - \theta)^T (\hat{\theta} - \theta) > \epsilon \right\} \rightarrow 0
\quad \mbox{as } n \rightarrow \infty , \mbox{ for any } \epsilon > 0 .
\]

Result 9.2: Asymptotic normality:
\[
\sqrt{n}\, (\hat{\theta} - \theta) \stackrel{dist}{\longrightarrow} N\!\left( 0,\; \lim_{n \rightarrow \infty} n\, [i(\theta)]^{-1} \right)
\quad \mbox{as } n \rightarrow \infty .
\]
With a slight abuse of notation we may express this as
\[
\hat{\theta} \sim N\!\left( \theta,\; [i(\theta)]^{-1} \right)
\]
for "large" sample sizes.
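The sketch below (again illustrative, not from the notes) ties these pieces together for a general normal-theory model with known $\Sigma$: it maximizes the log-likelihood numerically, confirms that the maximizer agrees with the closed-form GLS solution, checks that the score vector vanishes at the MLE, and computes $[i(\beta)]^{-1} = (X^T \Sigma^{-1} X)^{-1}$, which for this model (with $\Sigma$ known and $X$ of full column rank) is exactly the variance of the estimator anticipated by Result 9.2. All variable names, the simulated $\Sigma$, and the choice of scipy's optimizer are assumptions made for the example.

    # Illustrative sketch: with Sigma known, maximizing the normal log-likelihood
    # over beta numerically reproduces the closed-form GLS solution.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)
    n, p = 40, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    beta_true = np.array([0.5, 1.0, -2.0])

    A = rng.normal(size=(n, n))
    Sigma = A @ A.T + n * np.eye(n)          # a known positive definite covariance
    Y = rng.multivariate_normal(X @ beta_true, Sigma)
    Sinv = np.linalg.inv(Sigma)

    def score(beta):
        # u(beta) = X' Sigma^{-1} (Y - X beta), the gradient of the log-likelihood
        return X.T @ Sinv @ (Y - X @ beta)

    def neg2_loglik(beta):
        # -2 * log-likelihood, dropping constants that do not involve beta
        resid = Y - X @ beta
        return resid @ Sinv @ resid

    fit = minimize(neg2_loglik, x0=np.zeros(p),
                   jac=lambda b: -2.0 * score(b), method="BFGS")
    beta_hat_gls = np.linalg.pinv(X.T @ Sinv @ X) @ X.T @ Sinv @ Y

    print("max |numeric MLE - GLS|:", np.max(np.abs(fit.x - beta_hat_gls)))  # ~ 0
    print("score at the MLE:", np.round(score(beta_hat_gls), 8))             # ~ 0 vector
    # Inverse Fisher information; here it equals Var(beta_hat_gls) exactly
    print("[i(beta)]^{-1}:\n", np.linalg.inv(X.T @ Sinv @ X))

Supplying the analytic score as the gradient is just a convenience so that the numerical maximizer converges to essentially the same point as the algebraic GLS solution.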
Result 9.3: If $\hat{\theta}$ is the MLE for $\theta$, then the MLE for $g(\theta)$ is $g(\hat{\theta})$, for any function $g(\cdot)$.
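A small sketch (not from the notes) of the invariance property in Result 9.3, using the normal Gauss-Markov model: the MLE of $\sigma^2$ is $SSE/n$, so the MLE of $\sigma = g(\sigma^2) = \sqrt{\sigma^2}$ should be $\sqrt{SSE/n}$, and a direct numerical maximization over $\sigma$ gives the same answer. The simulated data and variable names are illustrative assumptions.

    # Illustrative sketch: invariance of MLEs (Result 9.3) in the normal
    # Gauss-Markov model.  The MLE of sigma^2 is SSE/n, so the MLE of
    # sigma = g(sigma^2) = sqrt(sigma^2) is sqrt(SSE/n).
    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(2)
    n = 30
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    Y = X @ np.array([1.0, 2.0]) + rng.normal(scale=1.5, size=n)

    b = np.linalg.pinv(X.T @ X) @ X.T @ Y          # OLS = MLE of beta
    sse = float(np.sum((Y - X @ b) ** 2))
    sigma_by_invariance = np.sqrt(sse / n)         # g(MLE of sigma^2)

    def neg2_profile_loglik(sigma):
        # -2 * log-likelihood profiled over beta, up to an additive constant
        return n * np.log(sigma ** 2) + sse / sigma ** 2

    sigma_direct = minimize_scalar(neg2_profile_loglik, bounds=(0.01, 100.0),
                                   method="bounded").x

    print(sigma_by_invariance, sigma_direct)       # agree up to optimizer tolerance

The plug-in value $\sqrt{SSE/n}$ and the direct maximization over $\sigma$ agree up to the optimizer's tolerance, which is exactly what the invariance result asserts.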