Lecture 6: Regression Analysis
MIT 18.S096, Dr. Kempthorne, Fall 2013

Outline
1. Regression Analysis
   - Linear Regression: Overview
   - Ordinary Least Squares (OLS)
   - Gauss-Markov Theorem
   - Generalized Least Squares (GLS)
   - Distribution Theory: Normal Regression Models
   - Maximum Likelihood Estimation
   - Generalized M Estimation

Multiple Linear Regression: Setup

Data set of n cases, i = 1, 2, ..., n:
- Response (dependent) variable: yᵢ, i = 1, 2, ..., n.
- p explanatory (independent) variables: xᵢ = (xᵢ,₁, xᵢ,₂, ..., xᵢ,ₚ)ᵀ, i = 1, 2, ..., n.

Goal of regression analysis: extract/exploit the relationship between yᵢ and xᵢ.

Examples: prediction, causal inference, approximation, functional relationships.

General Linear Model

For each case i, the conditional distribution [yᵢ | xᵢ] is given by
    yᵢ = ŷᵢ + εᵢ,  where  ŷᵢ = β₁xᵢ,₁ + β₂xᵢ,₂ + ··· + βₚxᵢ,ₚ
- β = (β₁, β₂, ..., βₚ)ᵀ are the p regression parameters (constant over all cases).
- εᵢ is the residual (error) variable (varies over cases).

Extensive breadth of possible models:
- Polynomial approximation: xᵢ,ⱼ = (xᵢ)ʲ, so the explanatory variables are different powers of the same variable x = xᵢ.
- Fourier series: xᵢ,ⱼ = sin(j·xᵢ) or cos(j·xᵢ), so the explanatory variables are different sin/cos terms of a Fourier series expansion.
- Time series regressions: time indexed by i, with explanatory variables including lagged response values.

Note: linearity of ŷᵢ (in the regression parameters) is maintained even with non-linear transformations of x.

Steps for Fitting a Model

(1) Propose a model in terms of:
    - the response variable Y (specify the scale),
    - the explanatory variables X₁, X₂, ..., Xₚ (include different functions of the explanatory variables if appropriate),
    - assumptions about the distribution of ε over the cases.
(2) Specify/define a criterion for judging different estimators.
(3) Characterize the best estimator and apply it to the given data.
(4) Check the assumptions in (1).
(5) If necessary, modify the model and/or assumptions and return to (1).
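To make the "linear in parameters, not in x" point concrete, here is a minimal numpy sketch (all data values hypothetical) that builds a polynomial design matrix and fits it with an ordinary linear solver:

```python
import numpy as np

# Hypothetical data: a quadratic signal in a single variable x.
rng = np.random.default_rng(0)
n = 100
x = np.linspace(0.0, 2.0, n)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0.0, 0.1, n)

# Polynomial regression is still a *linear* model: each column of X
# is a fixed, known function of x, and y-hat is linear in beta.
X = np.column_stack([x**j for j in range(3)])   # columns: 1, x, x^2

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)   # approximately [1.0, 2.0, -0.5]
```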
Specifying Assumptions in (1) for the Residual Distribution

- Gauss-Markov: zero mean, constant variance, uncorrelated.
- Normal linear models: the εᵢ are i.i.d. N(0, σ²) r.v.s.
- Generalized Gauss-Markov: zero mean and general covariance matrix (possibly correlated, possibly heteroscedastic).
- Non-normal/non-Gaussian distributions (e.g., Laplace, Pareto).
- Contaminated normal: some fraction (1 − δ) of the εᵢ are i.i.d. N(0, σ²) r.v.s; the remaining fraction δ follows some contamination distribution.

Specifying the Estimator Criterion in (2)

- Least squares
- Maximum likelihood
- Robust (contamination-resistant)
- Bayes (assume the βⱼ are r.v.s with a known prior distribution)
- Accommodating incomplete/missing data

Case Analyses for (4): Checking Assumptions

- Residual analysis. The model errors εᵢ are unobservable; the model residuals for fitted regression parameters β̃ⱼ are
      eᵢ = yᵢ − [β̃₁xᵢ,₁ + β̃₂xᵢ,₂ + ··· + β̃ₚxᵢ,ₚ]
- Influence diagnostics (identify cases which are highly "influential").
- Outlier detection.

Ordinary Least Squares Estimates

Least squares criterion: for β = (β₁, β₂, ..., βₚ)ᵀ, define
    Q(β) = Σᵢ₌₁ⁿ [yᵢ − ŷᵢ]² = Σᵢ₌₁ⁿ [yᵢ − (β₁xᵢ,₁ + β₂xᵢ,₂ + ··· + βₚxᵢ,ₚ)]²
The ordinary least squares (OLS) estimate β̂ minimizes Q(β).

Matrix notation:
    y = (y₁, y₂, ..., yₙ)ᵀ (n × 1),  X = (xᵢ,ⱼ) (n × p) with ith row xᵢᵀ,  β = (β₁, ..., βₚ)ᵀ (p × 1)

Solving for the OLS Estimate β̂

    ŷ = (ŷ₁, ŷ₂, ..., ŷₙ)ᵀ = Xβ, and
    Q(β) = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² = (y − ŷ)ᵀ(y − ŷ) = (y − Xβ)ᵀ(y − Xβ)

The OLS β̂ solves ∂Q(β)/∂βⱼ = 0, j = 1, 2, ..., p, where
    ∂Q(β)/∂βⱼ = Σᵢ₌₁ⁿ 2(−xᵢ,ⱼ)[yᵢ − (xᵢ,₁β₁ + xᵢ,₂β₂ + ··· + xᵢ,ₚβₚ)] = −2(X₍ⱼ₎)ᵀ(y − Xβ),
with X₍ⱼ₎ the jth column of X. Stacking these p equations gives
    ∂Q/∂β = −2Xᵀ(y − Xβ).
So the OLS estimate β̂ solves the "Normal Equations":
    Xᵀ(y − Xβ̂) = 0  ⇐⇒  XᵀXβ̂ = Xᵀy  =⇒  β̂ = (XᵀX)⁻¹Xᵀy

N.B. For β̂ to exist (uniquely), XᵀX must be invertible, which holds if and only if X has full column rank.
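A minimal numerical sketch of the normal equations (hypothetical data), solving XᵀXβ̂ = Xᵀy directly and via the numerically preferable built-in least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(0.0, 0.3, n)

# Normal equations: solve (X^T X) beta = X^T y.
# np.linalg.solve is preferred over forming the explicit inverse.
beta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# Equivalent, better-conditioned least-squares solver.
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

assert np.allclose(beta_ne, beta_ls)
print(beta_ne)   # close to beta_true
```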
(Ordinary) Least Squares Fit

OLS estimate: β̂ = (β̂₁, β̂₂, ..., β̂ₚ)ᵀ = (XᵀX)⁻¹Xᵀy.

Fitted values: ŷ = (ŷ₁, ŷ₂, ..., ŷₙ)ᵀ, with ŷᵢ = xᵢ,₁β̂₁ + ··· + xᵢ,ₚβ̂ₚ, i.e.,
    ŷ = Xβ̂ = X(XᵀX)⁻¹Xᵀy = Hy,
where H = X(XᵀX)⁻¹Xᵀ is the n × n "Hat Matrix."

The hat matrix H projects Rⁿ onto the column space of X.

Residuals: ε̂ᵢ = yᵢ − ŷᵢ, i = 1, 2, ..., n, i.e.,
    ε̂ = (ε̂₁, ..., ε̂ₙ)ᵀ = y − ŷ = (Iₙ − H)y.
Normal equations: Xᵀ(y − Xβ̂) = Xᵀε̂ = 0ₚ.
N.B. The least-squares residual vector ε̂ is orthogonal to the column space of X.

Gauss-Markov Theorem: Assumptions

The data y = (y₁, ..., yₙ)ᵀ and X = (xᵢ,ⱼ) (n × p) follow a linear model satisfying the Gauss-Markov assumptions if y is an observation of the random vector Y = (Y₁, Y₂, ..., Yₙ)ᵀ with
- E(Y | X, β) = Xβ, where β = (β₁, β₂, ..., βₚ)ᵀ is the p-vector of regression parameters, and
- Cov(Y | X, β) = σ²Iₙ, for some σ² > 0.
I.e., the random variables generating the observations are uncorrelated and have constant variance σ² (conditional on X and β).

Gauss-Markov Theorem

For known constants c₁, c₂, ..., cₚ, c₍ₚ₊₁₎, consider the problem of estimating
    θ = c₁β₁ + c₂β₂ + ··· + cₚβₚ + c₍ₚ₊₁₎.
Under the Gauss-Markov assumptions, the estimator
    θ̂ = c₁β̂₁ + c₂β̂₂ + ··· + cₚβ̂ₚ + c₍ₚ₊₁₎,
where β̂₁, β̂₂, ..., β̂ₚ are the least squares estimates, is
1) an unbiased estimator of θ, and
2) a linear estimator of θ: θ̂ = Σᵢ₌₁ⁿ bᵢyᵢ for some constants bᵢ (known given X).

Theorem: Under the Gauss-Markov assumptions, the estimator θ̂ has the smallest ("Best") variance among all Linear Unbiased Estimators of θ; i.e., θ̂ is BLUE.
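Before the proof, a quick Monte Carlo sketch of the BLUE property (setup and all values hypothetical): in a simple regression, the endpoint slope (yₙ − y₁)/(xₙ − x₁) is also a linear, unbiased estimator of the slope, but its variance exceeds that of the OLS slope:

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 50, 1.0
x = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])   # intercept and slope columns
beta = np.array([1.0, 3.0])
A = np.linalg.pinv(X)                  # beta_hat = A @ y (linear in y)

ols_slopes, endpoint_slopes = [], []
for _ in range(20000):
    y = X @ beta + rng.normal(0.0, sigma, n)
    ols_slopes.append((A @ y)[1])
    # Endpoint slope: also linear in y and unbiased for the slope.
    endpoint_slopes.append((y[-1] - y[0]) / (x[-1] - x[0]))

print(np.mean(ols_slopes), np.mean(endpoint_slopes))  # both near 3.0
print(np.var(ols_slopes), np.var(endpoint_slopes))    # OLS variance is smaller
```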
Gauss-Markov Theorem: Proof

Proof: Without loss of generality, assume c₍ₚ₊₁₎ = 0 and define c = (c₁, c₂, ..., cₚ)ᵀ. The least squares estimate of θ = cᵀβ is
    θ̂ = cᵀβ̂ = cᵀ(XᵀX)⁻¹Xᵀy ≡ dᵀy,
a linear estimate in y with coefficients d = (d₁, d₂, ..., dₙ)ᵀ.

Consider an alternative linear estimate of θ: θ̃ = bᵀy, with fixed coefficients b = (b₁, ..., bₙ)ᵀ. Define f = b − d and note that
    θ̃ = bᵀy = (d + f)ᵀy = θ̂ + fᵀy.
If θ̃ is unbiased then, because θ̂ is unbiased,
    0 = E(fᵀy) = fᵀE(y) = fᵀ(Xβ) for all β ∈ Rᵖ
    =⇒ f is orthogonal to the column space of X
    =⇒ f is orthogonal to d = X(XᵀX)⁻¹c.

The orthogonality of f to d then implies
    Var(θ̃) = Var(bᵀy) = Var(dᵀy + fᵀy)
            = Var(dᵀy) + Var(fᵀy) + 2·Cov(dᵀy, fᵀy)
            = Var(θ̂) + Var(fᵀy) + 2dᵀCov(y)f
            = Var(θ̂) + Var(fᵀy) + 2dᵀ(σ²Iₙ)f
            = Var(θ̂) + Var(fᵀy) + 2σ²·dᵀf
            = Var(θ̂) + Var(fᵀy) + 2σ²·0
            ≥ Var(θ̂).  ∎

Generalized Least Squares (GLS) Estimates

Consider generalizing the Gauss-Markov assumptions for the linear regression model to
    Y = Xβ + ε,
where the random n-vector ε satisfies E[ε] = 0ₙ and E[εεᵀ] = σ²Σ, with
- σ² an unknown scale parameter, and
- Σ a known (n × n) positive definite matrix specifying the relative variances and correlations of the component observations.

Transform the data (Y, X) to Y* = Σ^(−1/2)Y and X* = Σ^(−1/2)X; the model becomes
    Y* = X*β + ε*,  where E[ε*] = 0ₙ and E[ε*(ε*)ᵀ] = σ²Iₙ.
By the Gauss-Markov Theorem, the BLUE ("GLS" estimate) of β is
    β̂ = [(X*)ᵀ(X*)]⁻¹(X*)ᵀY* = [XᵀΣ⁻¹X]⁻¹(XᵀΣ⁻¹Y)
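A minimal sketch of GLS via whitening (hypothetical heteroscedastic data): multiply through by an inverse square-root factor of Σ, here taken from a Cholesky factorization, then apply OLS to the transformed data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])

# Known (diagonal, heteroscedastic) Sigma, up to the scale sigma^2.
Sigma = np.diag(np.linspace(0.5, 5.0, n))
y = X @ beta + rng.multivariate_normal(np.zeros(n), 0.3 * Sigma)

# Whiten with L^{-1}, where Sigma = L L^T (Cholesky). L^{-1} plays the
# role of Sigma^{-1/2}: it gives the same GLS estimate.
L = np.linalg.cholesky(Sigma)
X_star = np.linalg.solve(L, X)
y_star = np.linalg.solve(L, y)
beta_gls = np.linalg.lstsq(X_star, y_star, rcond=None)[0]

# Equivalent closed form: (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1} y.
Si = np.linalg.inv(Sigma)
beta_closed = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)
assert np.allclose(beta_gls, beta_closed)
print(beta_gls)
```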
Normal Linear Regression Models: Distribution Theory

    Yᵢ = xᵢ,₁β₁ + xᵢ,₂β₂ + ··· + xᵢ,ₚβₚ + εᵢ = μᵢ + εᵢ
Assume {ε₁, ε₂, ..., εₙ} are i.i.d. N(0, σ²). Then
    [Yᵢ | xᵢ,₁, xᵢ,₂, ..., xᵢ,ₚ, β, σ²] ~ N(μᵢ, σ²), independent over i = 1, 2, ..., n.
Conditioning on X, β, and σ²:
    Y = Xβ + ε,  where ε = (ε₁, ..., εₙ)ᵀ ~ Nₙ(0ₙ, σ²Iₙ).

Distribution Theory

    μ = (μ₁, ..., μₙ)ᵀ = E(Y | X, β, σ²) = Xβ
    Σ = Cov(Y | X, β, σ²) = σ²Iₙ
That is, Σᵢ,ⱼ = Cov(Yᵢ, Yⱼ | X, β, σ²) = σ²·δᵢ,ⱼ.

Apply moment-generating functions (MGFs) to derive:
- the joint distribution of Y = (Y₁, Y₂, ..., Yₙ)ᵀ, and
- the joint distribution of β̂ = (β̂₁, β̂₂, ..., β̂ₚ)ᵀ.

MGF of Y

For the n-variate r.v. Y and a constant n-vector t = (t₁, ..., tₙ)ᵀ,
    M_Y(t) = E(e^{tᵀY}) = E(e^{t₁Y₁ + t₂Y₂ + ··· + tₙYₙ})
           = E(e^{t₁Y₁})·E(e^{t₂Y₂})···E(e^{tₙYₙ})   (by independence)
           = M_{Y₁}(t₁)·M_{Y₂}(t₂)···M_{Yₙ}(tₙ)
           = Πᵢ₌₁ⁿ e^{tᵢμᵢ + ½tᵢ²σ²}
           = e^{Σᵢ₌₁ⁿ tᵢμᵢ + ½Σᵢ,ₖ tᵢΣᵢ,ₖtₖ} = e^{tᵀμ + ½tᵀΣt}
    =⇒ Y ~ Nₙ(μ, Σ), multivariate normal with mean μ and covariance Σ.
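The next slides apply this MGF to show that β̂ = AY, with A = (XᵀX)⁻¹Xᵀ, is itself multivariate normal with mean β and covariance σ²(XᵀX)⁻¹. A small simulation sketch (hypothetical values) previews that result by comparing the empirical mean and covariance of β̂ against the theory:

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma = 40, 0.5
X = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])
beta = np.array([1.0, 2.0])
A = np.linalg.inv(X.T @ X) @ X.T               # beta_hat = A @ y

# 50000 simulated samples; each row of draws is one beta_hat.
Y = X @ beta + rng.normal(0.0, sigma, size=(50000, n))
draws = Y @ A.T

print(draws.mean(axis=0))                       # ~ beta
print(np.cov(draws.T))                          # ~ sigma^2 (X^T X)^{-1}
print(sigma**2 * np.linalg.inv(X.T @ X))
```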
MGF of β̂

For the p-variate r.v. β̂ and a constant p-vector τ = (τ₁, ..., τₚ)ᵀ,
    M_β̂(τ) = E(e^{τᵀβ̂}) = E(e^{τ₁β̂₁ + τ₂β̂₂ + ··· + τₚβ̂ₚ}).
Defining A = (XᵀX)⁻¹Xᵀ, we can express β̂ = (XᵀX)⁻¹Xᵀy = AY, and
    M_β̂(τ) = E(e^{τᵀβ̂}) = E(e^{τᵀAY}) = E(e^{tᵀY}), with t = Aᵀτ
            = M_Y(t) = e^{tᵀμ + ½tᵀΣt}.

Plug in t = Aᵀτ = X(XᵀX)⁻¹τ, μ = Xβ, and Σ = σ²Iₙ:
    tᵀμ = τᵀβ
    tᵀΣt = τᵀ(XᵀX)⁻¹Xᵀ[σ²Iₙ]X(XᵀX)⁻¹τ = τᵀ[σ²(XᵀX)⁻¹]τ
So the MGF of β̂ is
    M_β̂(τ) = e^{τᵀβ + ½τᵀ[σ²(XᵀX)⁻¹]τ}
    =⇒ β̂ ~ Nₚ(β, σ²(XᵀX)⁻¹)

Marginal Distributions of Least Squares Estimates

Because β̂ ~ Nₚ(β, σ²(XᵀX)⁻¹), the marginal distribution of each β̂ⱼ is
    β̂ⱼ ~ N(βⱼ, σ²Cⱼ,ⱼ),  where Cⱼ,ⱼ is the jth diagonal element of (XᵀX)⁻¹.

The Q-R Decomposition of X

Consider expressing the (n × p) matrix X of explanatory variables as
    X = Q·R
where
- Q is an (n × p) orthonormal matrix, i.e., QᵀQ = Iₚ, and
- R is a (p × p) upper-triangular matrix (entries rⱼ,ₖ for j ≤ k, zeros below the diagonal).
The columns of Q = [Q₍₁₎, Q₍₂₎, ..., Q₍ₚ₎] can be constructed by applying the Gram-Schmidt orthonormalization procedure to the columns of X = [X₍₁₎, X₍₂₎, ..., X₍ₚ₎]:

    X₍₁₎ = Q₍₁₎r₁,₁  =⇒  r₁,₁² = X₍₁₎ᵀX₍₁₎ and Q₍₁₎ = X₍₁₎/r₁,₁
    X₍₂₎ = Q₍₁₎r₁,₂ + Q₍₂₎r₂,₂  =⇒
        Q₍₁₎ᵀX₍₂₎ = Q₍₁₎ᵀQ₍₁₎r₁,₂ + Q₍₁₎ᵀQ₍₂₎r₂,₂ = 1·r₁,₂ + 0·r₂,₂ = r₁,₂
        (known once Q₍₁₎ is specified)
With r₁,₂ and Q₍₁₎ specified we can solve for r₂,₂:
    Q₍₂₎r₂,₂ = X₍₂₎ − Q₍₁₎r₁,₂
Taking the squared norm of both sides,
    r₂,₂² = X₍₂₎ᵀX₍₂₎ − 2r₁,₂Q₍₁₎ᵀX₍₂₎ + r₁,₂²
(all terms on the right-hand side are known). With r₂,₂ specified,
    Q₍₂₎ = (1/r₂,₂)[X₍₂₎ − r₁,₂Q₍₁₎]
Continuing in this way solves for all the elements of R and the columns of Q.
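A short sketch of OLS via the Q-R decomposition (hypothetical data; numpy and scipy assumed), as summarized on the next slide: np.linalg.qr plays the role of Gram-Schmidt, and β̂ = R⁻¹Qᵀy is computed by back-substitution:

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(5)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(0.0, 0.2, n)

Q, R = np.linalg.qr(X)                   # "reduced" QR: Q is n x p, R is p x p
beta_qr = solve_triangular(R, Q.T @ y)   # back-substitution for R beta = Q^T y

beta_ne = np.linalg.solve(X.T @ X, X.T @ y)
assert np.allclose(beta_qr, beta_ne)

# Hat matrix via Q: H = Q Q^T, so the fitted values are Q (Q^T y).
y_hat = Q @ (Q.T @ y)
```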
With the Q-R Decomposition

With X = QR (QᵀQ = Iₚ, and R a p × p upper-triangular matrix):
- β̂ = (XᵀX)⁻¹Xᵀy = R⁻¹Qᵀy (plug in X = QR and simplify)
- Cov(β̂) = σ²(XᵀX)⁻¹ = σ²R⁻¹(R⁻¹)ᵀ
- H = X(XᵀX)⁻¹Xᵀ = QQᵀ (giving ŷ = Hy and ε̂ = (Iₙ − H)y)

More Distribution Theory

Assume y = Xβ + ε, where the {εᵢ} are i.i.d. N(0, σ²), i.e.,
    ε ~ Nₙ(0ₙ, σ²Iₙ), or y ~ Nₙ(Xβ, σ²Iₙ).

Theorem*: For any (m × n) matrix A of rank m ≤ n, the random normal vector y transformed by A, z = Ay, is also a random normal vector:
    z ~ Nₘ(μ_z, Σ_z),  where μ_z = AE(y) = AXβ and Σ_z = A·Cov(y)·Aᵀ = σ²AAᵀ.
Earlier, A = (XᵀX)⁻¹Xᵀ yielded the distribution of β̂ = Ay. With a different definition of A (and z) we can give an easy proof of:

Theorem: For the normal linear regression model y = Xβ + ε, where X (n × p) has rank p and ε ~ Nₙ(0ₙ, σ²Iₙ):
(a) β̂ = (XᵀX)⁻¹Xᵀy and ε̂ = y − Xβ̂ are independent r.v.s.
(b) β̂ ~ Nₚ(β, σ²(XᵀX)⁻¹)
(c) Σᵢ₌₁ⁿ ε̂ᵢ² = ε̂ᵀε̂ ~ σ²χ²₍ₙ₋ₚ₎ (a scaled chi-squared r.v.)
(d) For each j = 1, 2, ..., p,
        t̂ⱼ = (β̂ⱼ − βⱼ)/(σ̂·√Cⱼ,ⱼ) ~ t₍ₙ₋ₚ₎ (t-distribution),
    where σ̂² = (1/(n − p))·Σᵢ₌₁ⁿ ε̂ᵢ² and Cⱼ,ⱼ = [(XᵀX)⁻¹]ⱼ,ⱼ.
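As a practical aside before the proof (hypothetical data; numpy and scipy assumed), part (d) is what standard regression t-tests compute. A minimal sketch of σ̂², the standard errors, and the t-statistics for H₀: βⱼ = 0:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, p = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(0.0, 1.0, n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)         # unbiased estimate of sigma^2

C = np.linalg.inv(X.T @ X)
se = np.sqrt(sigma2_hat * np.diag(C))        # standard errors of beta_hat_j
t_stats = beta_hat / se                      # t-statistics for H0: beta_j = 0
p_values = 2 * stats.t.sf(np.abs(t_stats), df=n - p)
print(t_stats, p_values)
```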
Proof of the Theorem

Note that (d) follows immediately from (a), (b), and (c).

Define A = [Qᵀ; Wᵀ] (stacking Qᵀ on top of Wᵀ), where
- A is an (n × n) orthogonal matrix (i.e., Aᵀ = A⁻¹),
- Q is the column-orthonormal matrix in a Q-R decomposition of X, and
- W can be constructed by continuing the Gram-Schmidt orthonormalization process (which was used to construct Q from X) on X* = [X Iₙ].

Then consider
    z = Ay = [Qᵀy; Wᵀy] = [z_Q; z_W],  with z_Q (p × 1) and z_W ((n − p) × 1).

The distribution of z = Ay is Nₙ(μ_z, Σ_z), where
    μ_z = A·(Xβ) = [Qᵀ; Wᵀ]·(Q·R·β) = [QᵀQ; WᵀQ]·(R·β) = [Iₚ; 0₍ₙ₋ₚ₎ₓₚ]·(R·β) = [R·β; 0₍ₙ₋ₚ₎]
    Σ_z = A·(σ²Iₙ)·Aᵀ = σ²AAᵀ = σ²Iₙ, since Aᵀ = A⁻¹.
Thus
    z = [z_Q; z_W] ~ Nₙ([Rβ; 0₍ₙ₋ₚ₎], σ²Iₙ)
    =⇒ z_Q ~ Nₚ(Rβ, σ²Iₚ) and z_W ~ N₍ₙ₋ₚ₎(0₍ₙ₋ₚ₎, σ²I₍ₙ₋ₚ₎),
and z_Q and z_W are independent.

The Theorem follows by showing:
(a*) β̂ = R⁻¹z_Q and ε̂ = Wz_W, i.e., β̂ and ε̂ are functions of different, independent vectors.
(b*) The distribution of β̂ = R⁻¹z_Q follows from Theorem*, applied with A = R⁻¹ and "y" = z_Q.
(c*) ε̂ᵀε̂ = z_Wᵀz_W is the sum of (n − p) squared r.v.s which are i.i.d. N(0, σ²), hence ~ σ²χ²₍ₙ₋ₚ₎, a scaled chi-squared r.v.

Proof of (a*): β̂ = R⁻¹z_Q follows from β̂ = (XᵀX)⁻¹Xᵀy with X = QR and QᵀQ = Iₚ. For the residuals,
    ε̂ = y − ŷ = y − Xβ̂ = y − (QR)·(R⁻¹z_Q)
      = y − Qz_Q = y − QQᵀy = (Iₙ − QQᵀ)y
      = WWᵀy   (since Iₙ = AᵀA = QQᵀ + WWᵀ)
      = Wz_W.  ∎

Maximum-Likelihood Estimation

Consider the normal linear regression model:
    y = Xβ + ε, where the {εᵢ} are i.i.d. N(0, σ²), i.e., ε ~ Nₙ(0ₙ, σ²Iₙ), or y ~ Nₙ(Xβ, σ²Iₙ).

Definitions:
- The likelihood function is L(β, σ²) = p(y | X, β, σ²), where p(y | X, β, σ²) is the joint probability density function (pdf) of the conditional distribution of y given the data X (known) and the parameters (β, σ²) (unknown).
- The maximum likelihood estimates of (β, σ²) are the values maximizing L(β, σ²), i.e., those which make the observed data y most likely in terms of its pdf.

Because the yᵢ are independent r.v.s with yᵢ ~ N(μᵢ, σ²), where μᵢ = Σⱼ₌₁ᵖ βⱼxᵢ,ⱼ,
    L(β, σ²) = Πᵢ₌₁ⁿ p(yᵢ | β, σ²)
             = Πᵢ₌₁ⁿ (1/√(2πσ²))·e^{−(1/2σ²)(yᵢ − Σⱼ βⱼxᵢ,ⱼ)²}
             = (2πσ²)^{−n/2}·e^{−½(y − Xβ)ᵀ(σ²Iₙ)⁻¹(y − Xβ)}
The maximum likelihood estimates (β̂, σ̂²) maximize the log-likelihood function (dropping constant terms)
    log L(β, σ²) = −(n/2)·log(σ²) − ½(y − Xβ)ᵀ(σ²Iₙ)⁻¹(y − Xβ)
                 = −(n/2)·log(σ²) − (1/2σ²)·Q(β),
where Q(β) = (y − Xβ)ᵀ(y − Xβ) is the least-squares criterion! So the OLS estimate β̂ is also the ML estimate.

The ML estimate of σ² solves
    ∂ log L(β̂, σ²)/∂(σ²) = 0,  i.e.,  −(n/2)·(1/σ²) − ½(−1)(σ²)⁻²Q(β̂) = 0
    =⇒ σ̂²_ML = Q(β̂)/n = (Σᵢ₌₁ⁿ ε̂ᵢ²)/n   (biased!)
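A short simulation sketch of the bias (hypothetical values): the ML estimate Q(β̂)/n underestimates σ², with E[σ̂²_ML] = σ²(n − p)/n, while the degrees-of-freedom-corrected Q(β̂)/(n − p) is unbiased:

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, sigma2 = 20, 4, 2.0
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
H = X @ np.linalg.pinv(X)                 # hat matrix

ml, unbiased = [], []
for _ in range(20000):
    y = X @ beta + rng.normal(0.0, np.sqrt(sigma2), n)
    resid = y - H @ y                     # residuals (I - H) y
    q = resid @ resid                     # Q(beta_hat)
    ml.append(q / n)                      # ML estimate of sigma^2
    unbiased.append(q / (n - p))          # degrees-of-freedom corrected

print(np.mean(ml), np.mean(unbiased))     # ~ 1.6 and ~ 2.0 here
```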
Generalized M Estimation

For data (y, X), fit the linear regression model
    yᵢ = xᵢᵀβ + εᵢ,  i = 1, 2, ..., n,
by specifying β = β̂ to minimize
    Q(β) = Σᵢ₌₁ⁿ h(yᵢ, xᵢ, β, σ²).
The choice of the function h( ) distinguishes different estimators:

(1) Least Squares: h(yᵢ, xᵢ, β, σ²) = (yᵢ − xᵢᵀβ)²
(2) Mean Absolute Deviation (MAD): h(yᵢ, xᵢ, β, σ²) = |yᵢ − xᵢᵀβ|
(3) Maximum Likelihood (ML): assume the yᵢ are independent with pdfs p(yᵢ | β, xᵢ, σ²); then h(yᵢ, xᵢ, β, σ²) = −log p(yᵢ | β, xᵢ, σ²)
(4) Robust M-Estimator: h(yᵢ, xᵢ, β, σ²) = χ(yᵢ − xᵢᵀβ), where χ( ) is even and monotone increasing on (0, ∞)
(5) Quantile Estimator: for a fixed quantile τ, 0 < τ < 1,
        h(yᵢ, xᵢ, β, σ²) = τ·|yᵢ − xᵢᵀβ|        if yᵢ ≥ xᵢᵀβ
                           (1 − τ)·|yᵢ − xᵢᵀβ|   if yᵢ < xᵢᵀβ
    E.g., τ = 0.90 corresponds to the 90th quantile / upper decile; τ = 0.50 corresponds to the MAD estimator.

MIT OpenCourseWare (http://ocw.mit.edu)
18.S096 Topics in Mathematics with Applications in Finance, Fall 2013
For information about citing these materials or our Terms of Use, visit http://ocw.mit.edu/terms.