Regression Analysis II
MIT 18.443, Dr. Kempthorne, Spring 2015

Outline
1. Distribution Theory: Normal Regression Models
2. Maximum Likelihood Estimation
3. Generalized M Estimation


Distribution Theory: Normal Regression Models

Marginal Distributions of Least Squares Estimates

Because β̂ ∼ N_p(β, σ²(XᵀX)⁻¹), the marginal distribution of each β̂_j is

    β̂_j ∼ N(β_j, σ² C_{j,j}),

where C_{j,j} is the jth diagonal element of (XᵀX)⁻¹.

The Q-R Decomposition of X

Consider expressing the (n × p) matrix X of explanatory variables as

    X = Q · R,

where
  Q is an (n × p) orthonormal matrix, i.e., QᵀQ = I_p;
  R is a (p × p) upper-triangular matrix.

The columns of Q = [Q[1], Q[2], ..., Q[p]] can be constructed by applying the Gram-Schmidt orthonormalization procedure to the columns of X = [X[1], X[2], ..., X[p]].

If

    R = [ r_{1,1}  r_{1,2}  ···  r_{1,p−1}     r_{1,p}
            0      r_{2,2}  ···  r_{2,p−1}     r_{2,p}
            ⋮                ⋱      ⋮            ⋮
            0        0      ···  r_{p−1,p−1}  r_{p−1,p}
            0        0      ···     0          r_{p,p}  ],

then

    X[1] = Q[1] r_{1,1}
      ⟹ r_{1,1}² = X[1]ᵀ X[1]  and  Q[1] = X[1] / r_{1,1},

    X[2] = Q[1] r_{1,2} + Q[2] r_{2,2}
      ⟹ Q[1]ᵀ X[2] = Q[1]ᵀ Q[1] r_{1,2} + Q[1]ᵀ Q[2] r_{2,2} = 1 · r_{1,2} + 0 · r_{2,2} = r_{1,2}
         (known once Q[1] is specified).

With r_{1,2} and Q[1] specified we can solve for r_{2,2}:

    Q[2] r_{2,2} = X[2] − Q[1] r_{1,2}.

Taking the squared norm of both sides,

    r_{2,2}² = X[2]ᵀ X[2] − 2 r_{1,2} Q[1]ᵀ X[2] + r_{1,2}²

(all terms on the right-hand side are known). With r_{2,2} specified,

    Q[2] = (1 / r_{2,2}) (X[2] − r_{1,2} Q[1]).

Continuing in this way solves for the remaining elements of R and columns of Q.

With the Q-R decomposition X = QR (QᵀQ = I_p, and R is p × p upper-triangular):

    β̂ = (XᵀX)⁻¹ Xᵀ y = R⁻¹ Qᵀ y              (plug in X = QR and simplify)
    Cov(β̂) = σ² (XᵀX)⁻¹ = σ² R⁻¹ (R⁻¹)ᵀ
    H = X (XᵀX)⁻¹ Xᵀ = Q Qᵀ                   (giving ŷ = Hy and ε̂ = (I_n − H) y)
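As a quick numerical check (not part of the original slides), the following sketch verifies on simulated data that the Q-R forms of β̂ and H agree with the normal-equation forms. It assumes numpy is available; the sample size, dimensions, and coefficient values are arbitrary choices for illustration.

    # Sketch: verify the Q-R identities above on simulated data using numpy's QR routine.
    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 50, 3
    X = rng.normal(size=(n, p))
    beta = np.array([1.0, -2.0, 0.5])
    y = X @ beta + rng.normal(scale=0.3, size=n)

    # "Economy" QR: Q is n x p with orthonormal columns, R is p x p upper-triangular.
    Q, R = np.linalg.qr(X)

    beta_hat_normal_eq = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y
    beta_hat_qr = np.linalg.solve(R, Q.T @ y)                # R^{-1} Q'y

    H_normal_eq = X @ np.linalg.solve(X.T @ X, X.T)          # X (X'X)^{-1} X'
    H_qr = Q @ Q.T                                           # Q Q'

    print(np.allclose(beta_hat_normal_eq, beta_hat_qr))      # True
    print(np.allclose(H_normal_eq, H_qr))                    # True

The Q-R route is also how β̂ is typically computed in practice, since solving the triangular system R β̂ = Qᵀy avoids forming and inverting XᵀX.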
More Distribution Theory

Assume y = Xβ + ε, where the ε_i are i.i.d. N(0, σ²), i.e.,

    ε ∼ N_n(0_n, σ² I_n)   or equivalently   y ∼ N_n(Xβ, σ² I_n).

Theorem*. For any (m × n) matrix A of rank m ≤ n, the random normal vector y transformed by A, z = Ay, is also a random normal vector:

    z ∼ N_m(μ_z, Σ_z),   where μ_z = A E(y) = AXβ and Σ_z = A Cov(y) Aᵀ = σ² A Aᵀ.

Earlier, A = (XᵀX)⁻¹ Xᵀ yielded the distribution of β̂ = Ay. With a different definition of A (and z), Theorem* gives an easy proof of the following theorem.

Theorem. For the normal linear regression model y = Xβ + ε, where X (n × p) has rank p and ε ∼ N_n(0_n, σ² I_n):

(a) β̂ = (XᵀX)⁻¹ Xᵀ y and ε̂ = y − Xβ̂ are independent random vectors.
(b) β̂ ∼ N_p(β, σ² (XᵀX)⁻¹).
(c) Σ_{i=1}^n ε̂_i² = ε̂ᵀ ε̂ ∼ σ² χ²_{n−p} (a scaled chi-squared r.v.).
(d) For each j = 1, 2, ..., p,

        t̂_j = (β̂_j − β_j) / (σ̂ √C_{j,j}) ∼ t_{n−p}   (t-distribution),

    where σ̂² = (1/(n−p)) Σ_{i=1}^n ε̂_i² and C_{j,j} = [(XᵀX)⁻¹]_{j,j}.

Proof. Note that (d) follows immediately from (a), (b), and (c).

Define A = [Qᵀ; Wᵀ] (the p rows of Qᵀ stacked above the n − p rows of Wᵀ), where
  A is an (n × n) orthogonal matrix (i.e., Aᵀ = A⁻¹);
  Q is the column-orthonormal matrix in a Q-R decomposition of X.

Note: W can be constructed by continuing the Gram-Schmidt orthonormalization process (which was used to construct Q from X) with X* = [X  I_n].

Then consider

    z = Ay = [Qᵀ y; Wᵀ y] = [z_Q; z_W],   with z_Q (p × 1) and z_W ((n − p) × 1).

The distribution of z = Ay is N_n(μ_z, Σ_z), where

    μ_z = A Xβ = [Qᵀ; Wᵀ] Q R β = [QᵀQ; WᵀQ] R β = [I_p; 0_{(n−p)×p}] R β = [Rβ; 0_{n−p}],
    Σ_z = A (σ² I_n) Aᵀ = σ² A Aᵀ = σ² I_n,   since Aᵀ = A⁻¹.

Thus

    z = [z_Q; z_W] ∼ N_n([Rβ; 0_{n−p}], σ² I_n)
    ⟹ z_Q ∼ N_p(Rβ, σ² I_p)   and   z_W ∼ N_{n−p}(0_{n−p}, σ² I_{n−p}),

and z_Q and z_W are independent. The Theorem follows by showing:

(a*) β̂ = R⁻¹ z_Q and ε̂ = W z_W (i.e., β̂ and ε̂ are functions of different, independent vectors).
(b*) The distribution of β̂ = R⁻¹ z_Q follows by applying Theorem* with A = R⁻¹ and "y" = z_Q.
(c*) ε̂ᵀ ε̂ = z_Wᵀ z_W is the sum of squares of (n − p) r.v.'s which are i.i.d. N(0, σ²), and is therefore distributed as σ² χ²_{n−p}, a scaled chi-squared r.v.

Proof of (a*): β̂ = R⁻¹ z_Q follows from β̂ = (XᵀX)⁻¹ Xᵀ y and X = QR with QᵀQ = I_p. For the residual vector,

    ε̂ = y − ŷ = y − Xβ̂ = y − (QR)(R⁻¹ z_Q)
       = y − Q z_Q
       = y − Q Qᵀ y = (I_n − Q Qᵀ) y
       = W Wᵀ y          (since I_n = Aᵀ A = Q Qᵀ + W Wᵀ)
       = W z_W.
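The following simulation sketch (not part of the original slides) checks parts (c) and (d) of the theorem by Monte Carlo: across repeated datasets, the t-statistics should behave like t_{n−p} draws and the scaled residual sums of squares like χ²_{n−p} draws. It assumes numpy and scipy are available; all dimensions and parameter values are arbitrary illustration choices.

    # Sketch: Monte Carlo check of parts (c)-(d) of the theorem.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n, p, sigma = 30, 3, 2.0
    X = rng.normal(size=(n, p))
    beta = np.array([1.0, 0.0, -1.0])
    C = np.linalg.inv(X.T @ X)            # C_{j,j} on its diagonal

    t_stats, rss = [], []
    for _ in range(5000):
        y = X @ beta + rng.normal(scale=sigma, size=n)
        beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
        resid = y - X @ beta_hat
        s2 = resid @ resid / (n - p)      # sigma-hat^2 (unbiased estimate of sigma^2)
        t_stats.append((beta_hat[0] - beta[0]) / np.sqrt(s2 * C[0, 0]))
        rss.append(resid @ resid / sigma**2)

    # Kolmogorov-Smirnov tests: t-statistics vs t_{n-p}, scaled RSS vs chi^2_{n-p};
    # large p-values are consistent with the stated distributions.
    print(stats.kstest(t_stats, "t", args=(n - p,)).pvalue)
    print(stats.kstest(rss, "chi2", args=(n - p,)).pvalue)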
Maximum Likelihood Estimation

Consider the normal linear regression model:

    y = Xβ + ε, where the ε_i are i.i.d. N(0, σ²), i.e., ε ∼ N_n(0_n, σ² I_n) or y ∼ N_n(Xβ, σ² I_n).

Definitions: The likelihood function is

    L(β, σ²) = p(y | X, β, σ²),

where p(y | X, β, σ²) is the joint probability density function (pdf) of the conditional distribution of y given the data X (known) and the parameters (β, σ²) (unknown). The maximum likelihood estimates of (β, σ²) are the values maximizing L(β, σ²), i.e., those which make the observed data y most likely in terms of its pdf.

Because the y_i are independent r.v.'s with y_i ∼ N(μ_i, σ²), where μ_i = Σ_{j=1}^p β_j x_{i,j},

    L(β, σ²) = Π_{i=1}^n p(y_i | β, σ²)
             = Π_{i=1}^n (2πσ²)^{−1/2} exp( −(y_i − Σ_{j=1}^p β_j x_{i,j})² / (2σ²) )
             = (2πσ²)^{−n/2} exp( −½ (y − Xβ)ᵀ (σ² I_n)⁻¹ (y − Xβ) ).

The maximum likelihood estimates (β̂, σ̂²) maximize the log-likelihood function (dropping constant terms):

    log L(β, σ²) = −(n/2) log(σ²) − ½ (y − Xβ)ᵀ (σ² I_n)⁻¹ (y − Xβ)
                 = −(n/2) log(σ²) − (1/(2σ²)) Q(β),

where Q(β) = (y − Xβ)ᵀ (y − Xβ) is the least-squares criterion. Hence the OLS estimate β̂ is also the ML estimate.

The ML estimate of σ² solves ∂ log L(β̂, σ²)/∂(σ²) = 0, i.e.,

    −(n/2)(1/σ²) − ½ (−1)(σ²)⁻² Q(β̂) = 0
    ⟹ σ̂²_ML = Q(β̂)/n = (Σ_{i=1}^n ε̂_i²)/n   (biased!).


Generalized M Estimation

For data (y, X), fit the linear regression model

    y_i = x_iᵀ β + ε_i,   i = 1, 2, ..., n,

by specifying β = β̂ to minimize

    Q(β) = Σ_{i=1}^n h(y_i, x_i, β, σ²).

The choice of the function h(·) distinguishes different estimators; a numerical sketch of fitting several of them follows the list.

(1) Least Squares: h(y_i, x_i, β, σ²) = (y_i − x_iᵀ β)².
(2) Mean Absolute Deviation (MAD): h(y_i, x_i, β, σ²) = |y_i − x_iᵀ β|.
(3) Maximum Likelihood (ML): assuming the y_i are independent with pdfs p(y_i | β, x_i, σ²),
    h(y_i, x_i, β, σ²) = −log p(y_i | β, x_i, σ²).
(4) Robust M-Estimator: h(y_i, x_i, β, σ²) = χ(y_i − x_iᵀ β), where χ(·) is even and monotone increasing on (0, ∞).
(5) Quantile Estimator: for a fixed quantile τ, 0 < τ < 1,
    h(y_i, x_i, β, σ²) = τ |y_i − x_iᵀ β|         if y_i ≥ x_iᵀ β,
                       = (1 − τ) |y_i − x_iᵀ β|   if y_i < x_iᵀ β.
    E.g., τ = 0.90 corresponds to the 90th quantile / upper decile, and τ = 0.50 corresponds to the MAD estimator.
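The sketch below (not part of the original slides) illustrates generalized M-estimation by numerically minimizing Q(β) = Σ_i h(y_i − x_iᵀ β) for the least-squares, MAD, and quantile (τ = 0.90) choices of h. It assumes numpy and scipy are available; the helper name fit_m_estimator, the simulated data, and the use of a generic Nelder-Mead optimizer are illustrative choices, not prescriptions from the slides.

    # Sketch: generalized M-estimation by direct numerical minimization of Q(beta).
    import numpy as np
    from scipy.optimize import minimize

    def fit_m_estimator(X, y, h):
        """Return beta-hat minimizing sum_i h(y_i - x_i' beta)."""
        beta0 = np.linalg.solve(X.T @ X, X.T @ y)     # OLS starting value
        obj = lambda b: np.sum(h(y - X @ b))
        # Nelder-Mead is derivative-free, so it tolerates the non-smooth losses below.
        return minimize(obj, beta0, method="Nelder-Mead").x

    # (1) least squares, (2) MAD, (5) quantile loss with tau = 0.90
    h_ls = lambda r: r**2
    h_mad = lambda r: np.abs(r)
    h_q90 = lambda r: np.where(r >= 0, 0.90 * np.abs(r), 0.10 * np.abs(r))

    rng = np.random.default_rng(2)
    n = 200
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ np.array([1.0, 2.0]) + rng.standard_t(df=3, size=n)   # heavy-tailed errors

    for name, h in [("LS", h_ls), ("MAD", h_mad), ("tau=0.90", h_q90)]:
        print(name, fit_m_estimator(X, y, h))

With heavy-tailed errors, the MAD fit is typically less affected by outliers than least squares, while the τ = 0.90 fit shifts the intercept upward toward the upper decile of the error distribution, consistent with the roles of h described above.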