# Chapter 3: The Multiple Linear Regression Model

Christophe Hurlin (University of Orléans)
November 23, 2013

## Section 1: Introduction
The objectives of this chapter are the following:

1. Define the multiple linear regression model.
2. Introduce the ordinary least squares (OLS) estimator.
The outline of this chapter is the following:

- Section 2: The multiple linear regression model
- Section 3: The ordinary least squares estimator
- Section 4: Statistical properties of the OLS estimator
  - Subsection 4.1: Finite sample properties
  - Subsection 4.2: Asymptotic properties
References

- Amemiya, T. (1985), Advanced Econometrics, Harvard University Press.
- Greene, W. (2007), Econometric Analysis, sixth edition, Pearson Prentice Hall (recommended).
- Pelgrin, F. (2010), Lecture Notes in Advanced Econometrics, HEC Lausanne (a special thank).
- Ruud, P. (2000), An Introduction to Classical Econometric Theory, Oxford University Press.
Notations: in this chapter, I will (try to...) follow some conventions of notation.

- $f_Y(y)$: probability density or mass function
- $F_Y(y)$: cumulative distribution function
- $\Pr(\cdot)$: probability
- $\mathbf{y}$: vector
- $\mathbf{Y}$: matrix

Be careful: in this chapter, I don't distinguish between a random vector (matrix) and a vector (matrix) of deterministic elements. For more appropriate notations, see Abadir and Magnus (2002), "Notation in econometrics: a proposal for a standard", Econometrics Journal.
## Section 2: The Multiple Linear Regression Model
Objectives:

1. Define the concept of multiple linear regression model.
2. Introduce the semi-parametric and parametric multiple linear regression models.
3. Introduce the multiple linear Gaussian model.
**Definition (Multiple linear regression model)**
The multiple linear regression model is used to study the relationship between a dependent variable and one or more independent variables. The generic form of the linear regression model is
$$y = x_1 \beta_1 + x_2 \beta_2 + \dots + x_K \beta_K + \varepsilon$$
where $y$ is the dependent or explained variable and $x_1, \dots, x_K$ are the independent or explanatory variables.
Notations:

1. $y$ is the dependent variable, the regressand or the explained variable.
2. $x_j$ is an explanatory variable, a regressor or a covariate.
3. $\varepsilon$ is the error term or disturbance.

IMPORTANT: do not use the term "residual" here.
Notations (cont'd)

The term $\varepsilon$ is a random disturbance, so named because it "disturbs" an otherwise stable relationship. The disturbance arises for several reasons:

1. Primarily because we cannot hope to capture every influence on an economic variable in a model, no matter how elaborate. The net effect, which can be positive or negative, of these omitted factors is captured in the disturbance.
2. There are many other contributors to the disturbance in an empirical model. Probably the most significant is errors of measurement. It is one thing to theorize about the relationships among precisely defined variables; it is quite another to obtain accurate measures of these variables.
Notations (cont'd)

We assume that each observation in a sample $\{y_i, x_{i1}, x_{i2}, \dots, x_{iK}\}$ for $i = 1, \dots, N$ is generated by an underlying process described by
$$y_i = x_{i1}\beta_1 + x_{i2}\beta_2 + \dots + x_{iK}\beta_K + \varepsilon_i$$

Remark: $x_{ik}$ is the value of the $k$-th explanatory variable for the $i$-th unit of the sample, following the convention $x_{\text{unit},\text{variable}}$.
Notations (cont'd)

- Let the $N \times 1$ column vector $\mathbf{x}_k$ be the $N$ observations on variable $x_k$, for $k = 1, \dots, K$.
- Assemble these data in an $N \times K$ data matrix $\mathbf{X}$.
- Let $\mathbf{y}$ be the $N \times 1$ column vector of the $N$ observations $y_1, y_2, \dots, y_N$.
- Let $\varepsilon$ be the $N \times 1$ column vector containing the $N$ disturbances.
Notations (cont'd)

$$\underset{N \times 1}{\mathbf{y}} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_i \\ \vdots \\ y_N \end{pmatrix} \qquad \underset{N \times 1}{\mathbf{x}_k} = \begin{pmatrix} x_{1k} \\ x_{2k} \\ \vdots \\ x_{ik} \\ \vdots \\ x_{Nk} \end{pmatrix} \qquad \underset{N \times 1}{\varepsilon} = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_i \\ \vdots \\ \varepsilon_N \end{pmatrix} \qquad \underset{K \times 1}{\beta} = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_K \end{pmatrix}$$
Notations (cont'd)

$$\underset{N \times K}{\mathbf{X}} = (\mathbf{x}_1 : \mathbf{x}_2 : \dots : \mathbf{x}_K)$$
or equivalently
$$\underset{N \times K}{\mathbf{X}} = \begin{pmatrix} x_{11} & x_{12} & \dots & x_{1k} & \dots & x_{1K} \\ x_{21} & x_{22} & \dots & x_{2k} & \dots & x_{2K} \\ \vdots & \vdots & & \vdots & & \vdots \\ x_{i1} & x_{i2} & \dots & x_{ik} & \dots & x_{iK} \\ \vdots & \vdots & & \vdots & & \vdots \\ x_{N1} & x_{N2} & \dots & x_{Nk} & \dots & x_{NK} \end{pmatrix}$$
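As a small illustration (a sketch with made-up values, assuming NumPy is available), the $N \times K$ data matrix $\mathbf{X}$ can be assembled by stacking the column vectors $\mathbf{x}_k$ side by side:

```python
import numpy as np

# Hypothetical data: N = 4 observations on K = 2 explanatory variables
x1 = np.array([1.0, 2.0, 3.0, 4.0])   # observations on variable x_1
x2 = np.array([5.0, 6.0, 7.0, 8.0])   # observations on variable x_2

# Stack the K column vectors side by side into the N x K data matrix X
X = np.column_stack([x1, x2])
print(X.shape)  # (4, 2): N rows (units), K columns (variables)
```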
Fact: in most cases, the first column of $\mathbf{X}$ is assumed to be a column of 1s, so that $\beta_1$ is the constant term in the model:
$$\underset{N \times 1}{\mathbf{x}_1} = \underset{N \times 1}{\mathbf{1}} \qquad \underset{N \times K}{\mathbf{X}} = \begin{pmatrix} 1 & x_{12} & \dots & x_{1k} & \dots & x_{1K} \\ 1 & x_{22} & \dots & x_{2k} & \dots & x_{2K} \\ \vdots & \vdots & & \vdots & & \vdots \\ 1 & x_{i2} & \dots & x_{ik} & \dots & x_{iK} \\ \vdots & \vdots & & \vdots & & \vdots \\ 1 & x_{N2} & \dots & x_{Nk} & \dots & x_{NK} \end{pmatrix}$$
Remark: more generally, the matrix $\mathbf{X}$ may contain both stochastic and non-stochastic elements, such as:

- a constant;
- a time trend;
- dummy variables (for specific episodes in time);
- etc.

Therefore, $\mathbf{X}$ is generally a mixture of fixed and random variables.
**Definition (Simple linear regression model)**
The simple linear regression model is a model with only one stochastic regressor: $K = 1$ if there is no constant,
$$y_i = \beta_1 x_i + \varepsilon_i$$
or $K = 2$ if there is a constant:
$$y_i = \beta_1 + \beta_2 x_{i2} + \varepsilon_i$$
for $i = 1, \dots, N$, or in vector form
$$\mathbf{y} = \beta_1 + \beta_2 \mathbf{x}_2 + \varepsilon$$
**Definition (Multiple linear regression model)**
The multiple linear regression model can be written
$$\underset{N \times 1}{\mathbf{y}} = \underset{N \times K}{\mathbf{X}} \, \underset{K \times 1}{\beta} + \underset{N \times 1}{\varepsilon}$$
One key difference for the specification of the MLRM: parametric versus semi-parametric specification.

- Parametric model: the distribution of the error terms is fully characterized, e.g. $\varepsilon \sim \mathcal{N}(0, \Omega)$.
- Semi-parametric specification: only a few moments of the error terms are specified, e.g. $E(\varepsilon) = 0$ and $V(\varepsilon) = E(\varepsilon \varepsilon^\top) = \Omega$.
This difference does not matter for the derivation of the ordinary least squares estimator. But it matters for (among others):

1. The characterization of the statistical properties of the OLS estimator (e.g., efficiency);
2. The choice of alternative estimators (e.g., the maximum likelihood estimator);
3. Etc.
**Definition (Semi-parametric multiple linear regression model)**
The semi-parametric multiple linear regression model is defined by
$$\mathbf{y} = \mathbf{X}\beta + \varepsilon$$
where the error term $\varepsilon$ satisfies
$$E(\varepsilon \mid \mathbf{X}) = \mathbf{0}_{N \times 1} \qquad V(\varepsilon \mid \mathbf{X}) = \sigma^2 \mathbf{I}_N$$
and $\mathbf{I}_N$ is the identity matrix of order $N$.
Remarks

1. If the matrix $\mathbf{X}$ is non-stochastic (fixed), i.e. there are only fixed regressors, then the conditions on the error term $\varepsilon$ read:
$$E(\varepsilon) = 0 \qquad V(\varepsilon) = \sigma^2 \mathbf{I}_N$$
2. If the (conditional) variance-covariance matrix of $\varepsilon$ is not diagonal, i.e. if
$$V(\varepsilon \mid \mathbf{X}) = \Omega$$
the model is called the multiple generalized linear regression model.
Remarks (cont'd)

The two conditions on the error term $\varepsilon$
$$E(\varepsilon \mid \mathbf{X}) = \mathbf{0}_{N \times 1} \qquad V(\varepsilon \mid \mathbf{X}) = \sigma^2 \mathbf{I}_N$$
are equivalent to
$$E(\mathbf{y} \mid \mathbf{X}) = \mathbf{X}\beta \qquad V(\mathbf{y} \mid \mathbf{X}) = \sigma^2 \mathbf{I}_N$$
**Definition (The multiple linear Gaussian model)**
The (parametric) multiple linear Gaussian model is defined by
$$\mathbf{y} = \mathbf{X}\beta + \varepsilon$$
where the error term $\varepsilon$ is normally distributed:
$$\varepsilon \sim \mathcal{N}\left(\mathbf{0}, \sigma^2 \mathbf{I}_N\right)$$
As a consequence, the vector $\mathbf{y}$ has a conditional normal distribution:
$$\mathbf{y} \mid \mathbf{X} \sim \mathcal{N}\left(\mathbf{X}\beta, \sigma^2 \mathbf{I}_N\right)$$
Remarks

1. The multiple linear Gaussian model is (by definition) a parametric model.
2. If the matrix $\mathbf{X}$ is non-stochastic (fixed), i.e. there are only fixed regressors, then the vector $\mathbf{y}$ has a marginal normal distribution:
$$\mathbf{y} \sim \mathcal{N}\left(\mathbf{X}\beta, \sigma^2 \mathbf{I}_N\right)$$
The classical linear regression model consists of a set of assumptions that describe how the data set is produced by a data generating process (DGP):

- Assumption 1: Linearity
- Assumption 2: Full rank condition or identification
- Assumption 3: Exogeneity
- Assumption 4: Spherical error terms
- Assumption 5: Data generation
- Assumption 6: Normal distribution
**Definition (Assumption 1: Linearity)**
The model is linear with respect to the parameters $\beta_1, \dots, \beta_K$.
Remarks

The model specifies a linear relationship between the dependent variable and the regressors. For instance, the models
$$y = \beta_0 + \beta_1 x + u \qquad y = \beta_0 + \beta_1 \cos(x) + v \qquad y = \beta_0 + \beta_1 \frac{1}{x} + w$$
are all linear (with respect to $\beta$). In contrast, the model $y = \beta_0 + \beta_1 x^{\beta_2} + \varepsilon$ is nonlinear.
Remark: the model can be linear after some transformations. Starting from $y = A x^{\beta} \exp(\varepsilon)$, one has a log-linear specification:
$$\ln(y) = \ln(A) + \beta \ln(x) + \varepsilon$$
**Definition (Log-linear model)**
The log-linear model is
$$\ln(y_i) = \beta_1 \ln(x_{i1}) + \beta_2 \ln(x_{i2}) + \dots + \beta_K \ln(x_{iK}) + \varepsilon_i$$
This equation is also known as the constant elasticity form, since in this equation the elasticity of $y$ with respect to changes in $x$ does not vary with $x_{ik}$:
$$\beta_k = \frac{\partial \ln(y_i)}{\partial \ln(x_{ik})} = \frac{\partial y_i}{\partial x_{ik}} \cdot \frac{x_{ik}}{y_i}$$
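A quick numerical sanity check of the constant-elasticity property (a sketch with made-up numbers, one regressor, $A = 1$ and $\varepsilon = 0$): scaling $x$ changes $\ln(y)$ by exactly $\beta$ times the change in $\ln(x)$.

```python
import numpy as np

beta = 0.7           # hypothetical elasticity
x = 10.0
y = x ** beta        # log-linear model with A = 1 and epsilon = 0

# Increase x by 1% and measure the relative (log) change in y
x_new = 1.01 * x
y_new = x_new ** beta
elasticity = (np.log(y_new) - np.log(y)) / (np.log(x_new) - np.log(x))
print(elasticity)    # equals beta: d ln(y) / d ln(x) = beta
```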
**Definition (Assumption 2: Full column rank)**
$\mathbf{X}$ is an $N \times K$ matrix with rank $K$.
Interpretation

1. There is no exact relationship among any of the independent variables in the model.
2. The columns of $\mathbf{X}$ are linearly independent.
Example: suppose that a cross-section model satisfies
$$y_i = \beta_0 + \beta_1 \,\text{non-labor income}_i + \beta_2 \,\text{salary}_i + \beta_3 \,\text{total income}_i + \varepsilon_i$$
The identification condition does not hold, since total income is exactly equal to salary plus non-labor income (exact linear dependency in the model).
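The exact linear dependency can be seen numerically (a sketch with fabricated income data): the column "total income = salary + non-labor income" drops the rank of $\mathbf{X}$ below $K$, so $\mathbf{X}^\top \mathbf{X}$ is singular.

```python
import numpy as np

# Fabricated data for 5 individuals
non_labor = np.array([1.0, 0.5, 2.0, 0.0, 1.5])
salary    = np.array([30.0, 45.0, 25.0, 50.0, 40.0])
total     = non_labor + salary          # exact linear dependency

# Design matrix with a constant: N = 5, K = 4
X = np.column_stack([np.ones(5), non_labor, salary, total])

print(np.linalg.matrix_rank(X))         # 3 < K = 4: identification fails
print(np.linalg.det(X.T @ X))           # (numerically) zero: X'X is not invertible
```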
Remarks

1. Perfect multicollinearity is generally not difficult to spot and is signalled by most statistical software.
2. Imperfect multicollinearity is a more serious issue (see further).
**Definition (Identification)**
The multiple linear regression model is identifiable if and only if one of the following equivalent assertions holds:

(i) $\operatorname{rank}(\mathbf{X}) = K$
(ii) The matrix $\mathbf{X}^\top \mathbf{X}$ is invertible
(iii) The columns of $\mathbf{X}$ form a basis of $\mathcal{L}(\mathbf{X})$
(iv) $\mathbf{X}\beta_1 = \mathbf{X}\beta_2 \implies \beta_1 = \beta_2, \; \forall (\beta_1, \beta_2) \in \mathbb{R}^K \times \mathbb{R}^K$
(v) $\mathbf{X}\beta = 0 \implies \beta = 0, \; \forall \beta \in \mathbb{R}^K$
(vi) $\ker(\mathbf{X}) = \{0\}$
**Definition (Assumption 3: Strict exogeneity of the regressors)**
The regressors are exogenous in the sense that
$$E(\varepsilon \mid \mathbf{X}) = \mathbf{0}_{N \times 1}$$
or equivalently, for all units $i \in \{1, \dots, N\}$,
$$E(\varepsilon_i \mid \mathbf{X}) = 0$$
or equivalently
$$E(\varepsilon_i \mid x_{jk}) = 0$$
for any explanatory variable $k \in \{1, \dots, K\}$ and any unit $j \in \{1, \dots, N\}$.
Remarks

1. The expected value of the error term at observation $i$ (in the sample) is not a function of the independent variables observed at any observation (including the $i$-th observation). The independent variables are not predictors of the error terms.
2. The strict exogeneity condition can be rewritten as:
$$E(\mathbf{y} \mid \mathbf{X}) = \mathbf{X}\beta$$
3. If the regressors are fixed, this condition can be rewritten as:
$$E(\varepsilon) = \mathbf{0}_{N \times 1}$$
Implications

The (strict) exogeneity condition $E(\varepsilon \mid \mathbf{X}) = \mathbf{0}_{N \times 1}$ has two implications:

1. The zero conditional mean of $\varepsilon$ implies that the unconditional mean of $\varepsilon$ is also zero (the reverse is not true):
$$E(\varepsilon) = E_{\mathbf{X}}\left(E(\varepsilon \mid \mathbf{X})\right) = E_{\mathbf{X}}(0) = 0$$
2. The zero conditional mean of $\varepsilon$ implies that (the reverse is not true):
$$E(\varepsilon_i x_{jk}) = 0 \quad \forall i, j, k \qquad \text{or} \qquad \operatorname{Cov}(\varepsilon_i, \mathbf{X}) = 0 \quad \forall i$$
**Definition (Assumption 4: Spherical disturbances)**
The error terms are such that
$$V(\varepsilon_i \mid \mathbf{X}) = E\left(\varepsilon_i^2 \mid \mathbf{X}\right) = \sigma^2 \quad \text{for all } i \in \{1, \dots, N\}$$
and
$$\operatorname{Cov}(\varepsilon_i, \varepsilon_j \mid \mathbf{X}) = E(\varepsilon_i \varepsilon_j \mid \mathbf{X}) = 0 \quad \text{for all } i \neq j$$
The condition of constant variances is called homoscedasticity. The uncorrelatedness across observations is called nonautocorrelation.
Remarks

1. Spherical disturbances = homoscedasticity + nonautocorrelation.
2. If the errors are not spherical, we call them nonspherical disturbances.
3. The assumption of homoscedasticity is a strong one: it is the exception rather than the rule!
Let us consider the (conditional) variance-covariance matrix of the error terms:
$$\underset{N \times N}{V(\varepsilon \mid \mathbf{X})} = \underset{N \times N}{E\left(\varepsilon \varepsilon^\top \mid \mathbf{X}\right)} = \begin{pmatrix} E\left(\varepsilon_1^2 \mid \mathbf{X}\right) & E(\varepsilon_1 \varepsilon_2 \mid \mathbf{X}) & \dots & E(\varepsilon_1 \varepsilon_N \mid \mathbf{X}) \\ E(\varepsilon_2 \varepsilon_1 \mid \mathbf{X}) & E\left(\varepsilon_2^2 \mid \mathbf{X}\right) & \dots & E(\varepsilon_2 \varepsilon_N \mid \mathbf{X}) \\ \vdots & \vdots & \ddots & \vdots \\ E(\varepsilon_N \varepsilon_1 \mid \mathbf{X}) & E(\varepsilon_N \varepsilon_2 \mid \mathbf{X}) & \dots & E\left(\varepsilon_N^2 \mid \mathbf{X}\right) \end{pmatrix}$$
The two assumptions (homoscedasticity and nonautocorrelation) imply that:
$$\underset{N \times N}{V(\varepsilon \mid \mathbf{X})} = \underset{N \times N}{E\left(\varepsilon \varepsilon^\top \mid \mathbf{X}\right)} = \sigma^2 \mathbf{I}_N = \begin{pmatrix} \sigma^2 & 0 & \dots & 0 \\ 0 & \sigma^2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma^2 \end{pmatrix}$$
**Definition (Assumption 5: Data generation)**
The data in $(x_{i1}, x_{i2}, \dots, x_{iK})$ may be any mixture of constants and random variables.
Remarks

1. Analysis will be done conditionally on the observed $\mathbf{X}$, so whether the elements in $\mathbf{X}$ are fixed constants or random draws from a stochastic process will not influence the results.
2. In the case of stochastic regressors, the unconditional statistical properties of the estimators are obtained in two steps: (1) using the result conditioned on $\mathbf{X}$, and (2) finding the unconditional result by "averaging" (i.e., integrating over) the conditional distributions.
An assumption regarding how $(x_{i1}, x_{i2}, \dots, x_{iK}, y_i)$ for $i = 1, \dots, N$ is drawn is also required; this is a statement about how the sample is generated. In the sequel, we assume that the $(x_{i1}, x_{i2}, \dots, x_{iK}, y_i)$, for $i = 1, \dots, N$, are independently and identically distributed (i.i.d.): the observations are drawn by simple random sampling from a large population.
**Definition (Assumption 6: Normal distribution)**
The disturbances are normally distributed:
$$\varepsilon_i \mid \mathbf{X} \sim \mathcal{N}\left(0, \sigma^2\right)$$
or equivalently
$$\varepsilon \mid \mathbf{X} \sim \mathcal{N}\left(\mathbf{0}_{N \times 1}, \sigma^2 \mathbf{I}_N\right)$$
Remarks

1. Once again, this is a convenience that we will dispense with after some analysis of its implications.
2. Normality is not necessary to obtain many of the results presented below.
3. Assumption 6 implies assumptions 3 (exogeneity) and 4 (spherical disturbances).
Summary: the main assumptions of the multiple linear regression model.

| Assumption | Statement |
| --- | --- |
| A1: linearity | The model is linear in $\beta$ |
| A2: identification | $\mathbf{X}$ is an $N \times K$ matrix with rank $K$ |
| A3: exogeneity | $E(\varepsilon \mid \mathbf{X}) = \mathbf{0}_{N \times 1}$ |
| A4: spherical error terms | $V(\varepsilon \mid \mathbf{X}) = \sigma^2 \mathbf{I}_N$ |
| A5: data generation | $\mathbf{X}$ may be fixed or random |
| A6: normal distribution | $\varepsilon \mid \mathbf{X} \sim \mathcal{N}\left(\mathbf{0}_{N \times 1}, \sigma^2 \mathbf{I}_N\right)$ |
Key Concepts

1. Simple linear regression model
2. Multiple linear regression model
3. Semi-parametric multiple linear regression model
4. Multiple linear Gaussian model
5. Assumptions of the multiple linear regression model
6. Linearity (A1), identification (A2), exogeneity (A3), spherical error terms (A4), data generation (A5) and normal distribution (A6)
## Section 3: The ordinary least squares estimator
Introduction

1. The linear regression model assumes that the following specification is true in the population:
$$\mathbf{y} = \mathbf{X}\beta + \varepsilon$$
where other unobserved factors determining $\mathbf{y}$ are captured by the error term $\varepsilon$.
2. Consider a sample $\{x_{i1}, x_{i2}, \dots, x_{iK}, y_i\}_{i=1}^N$ of i.i.d. random variables (be careful about the change of notation here) and only one realization of this sample (your data set).
3. How can we estimate the vector of parameters $\beta$?
Introduction (cont'd)

1. If we assume that assumptions A1-A6 hold, we have a multiple linear Gaussian model (parametric model), and a solution is to use the MLE. The ML estimator for $\beta$ coincides with the ordinary least squares (OLS) estimator (cf. chapter 2).
2. If we assume that only assumptions A1-A5 hold, we have a semi-parametric multiple linear regression model, and the MLE is unfeasible.
3. In this case, the only solution is to use the ordinary least squares (OLS) estimator.
Intuition

Let us consider the simple linear regression model and, for simplicity, denote $x_i = x_{i2}$:
$$y_i = \beta_1 + \beta_2 x_i + \varepsilon_i$$
The general idea of OLS consists in minimizing the "distance" between the points $(x_i, y_i)$ and the regression line
$$\hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2 x_i$$
i.e. the points $(x_i, \hat{y}_i)$, for all $i = 1, \dots, N$.
Intuition (cont'd)

Estimates of $\beta_1$ and $\beta_2$ are chosen by minimizing the sum of squared residuals (SSR):
$$\sum_{i=1}^N \hat{\varepsilon}_i^2 = \sum_{i=1}^N \left( y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i \right)^2$$
Therefore, $\hat{\beta}_1$ and $\hat{\beta}_2$ are the solutions of the minimization problem
$$\left( \hat{\beta}_1, \hat{\beta}_2 \right) = \arg\min_{(\beta_1, \beta_2)} \sum_{i=1}^N \left( y_i - \beta_1 - \beta_2 x_i \right)^2$$
**Definition (OLS, simple linear regression model)**
In the simple linear regression model $y_i = \beta_1 + \beta_2 x_i + \varepsilon_i$, the OLS estimators $\hat{\beta}_1$ and $\hat{\beta}_2$ are the solutions of the minimization problem
$$\left( \hat{\beta}_1, \hat{\beta}_2 \right) = \arg\min_{(\beta_1, \beta_2)} \sum_{i=1}^N \left( y_i - \beta_1 - \beta_2 x_i \right)^2$$
The solutions are:
$$\hat{\beta}_1 = \bar{y}_N - \hat{\beta}_2 \bar{x}_N \qquad \hat{\beta}_2 = \frac{\sum_{i=1}^N (x_i - \bar{x}_N)(y_i - \bar{y}_N)}{\sum_{i=1}^N (x_i - \bar{x}_N)^2}$$
where $\bar{y}_N = N^{-1} \sum_{i=1}^N y_i$ and $\bar{x}_N = N^{-1} \sum_{i=1}^N x_i$ respectively denote the sample means of the dependent variable $y$ and the regressor $x$.
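The closed-form solutions above translate directly into code (a minimal sketch on simulated data; the true values $\beta_1 = 1$, $\beta_2 = 2$ and the noise level are made up):

```python
import numpy as np

# Simulated sample: y_i = 1 + 2 x_i + noise
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + 0.1 * rng.normal(size=100)

x_bar, y_bar = x.mean(), y.mean()

# beta2_hat = sum (x_i - x_bar)(y_i - y_bar) / sum (x_i - x_bar)^2
beta2_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# beta1_hat = y_bar - beta2_hat * x_bar
beta1_hat = y_bar - beta2_hat * x_bar

print(beta1_hat, beta2_hat)  # close to the true values (1, 2)
```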
Remark: the OLS estimator is a linear estimator (cf. chapter 1), since it can be expressed as a linear function of the observations $y_i$:
$$\hat{\beta}_2 = \sum_{i=1}^N \omega_i y_i \qquad \text{with} \qquad \omega_i = \frac{x_i - \bar{x}_N}{\sum_{i=1}^N (x_i - \bar{x}_N)^2}$$
in the case where $\bar{y}_N = 0$.
**Definition (Fitted value)**
The predicted or fitted value for observation $i$ is
$$\hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2 x_i$$
with a sample mean equal to the sample average of the observations:
$$\bar{\hat{y}}_N = \frac{1}{N} \sum_{i=1}^N \hat{y}_i = \bar{y}_N = \frac{1}{N} \sum_{i=1}^N y_i$$
**Definition (Fitted residual)**
The residual for observation $i$ is
$$\hat{\varepsilon}_i = y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i$$
with a sample mean equal to zero by definition:
$$\bar{\hat{\varepsilon}}_N = \frac{1}{N} \sum_{i=1}^N \hat{\varepsilon}_i = 0$$
Remarks

1. The fit of the regression is "good" if the sum $\sum_{i=1}^N \hat{\varepsilon}_i^2$ (or SSR) is "small", i.e., the unexplained part of the variance of $y$ is "small".
2. The coefficient of determination, or $R^2$, is given by:
$$R^2 = \frac{\sum_{i=1}^N (\hat{y}_i - \bar{y}_N)^2}{\sum_{i=1}^N (y_i - \bar{y}_N)^2} = 1 - \frac{\sum_{i=1}^N \hat{\varepsilon}_i^2}{\sum_{i=1}^N (y_i - \bar{y}_N)^2}$$
Orthogonality conditions

Under assumption A3 (strict exogeneity), we have $E(\varepsilon_i \mid x_i) = 0$. This condition implies that:
$$E(\varepsilon_i) = 0 \qquad E(\varepsilon_i x_i) = 0$$
Using the sample analogs of these moment conditions (cf. chapter 6, GMM), one has:
$$\frac{1}{N} \sum_{i=1}^N \left( y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i \right) = 0 \qquad \frac{1}{N} \sum_{i=1}^N \left( y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i \right) x_i = 0$$
This is a system of two equations in the two unknowns $\hat{\beta}_1$ and $\hat{\beta}_2$.
**Definition (Orthogonality conditions)**
The ordinary least squares estimator can be defined from the sample analogs of the following two moment conditions:
$$E(\varepsilon_i) = 0 \qquad E(\varepsilon_i x_i) = 0$$
The corresponding system of equations is just-identified.
OLS and the multiple linear regression model

Now consider the multiple linear regression model
$$\mathbf{y} = \mathbf{X}\beta + \varepsilon \qquad \text{or} \qquad y_i = \sum_{k=1}^K \beta_k x_{ik} + \varepsilon_i$$
Objective: find an estimator (estimate) of $\beta_1, \beta_2, \dots, \beta_K$ and $\sigma^2$ under assumptions A1-A5.
Different methods:

1. Minimize the sum of squared residuals (SSR).
2. Solve the same minimization problem in matrix notation.
3. Use moment conditions.
4. Geometric interpretation.
1. Minimize the sum of squared residuals (SSR)

As in the simple linear regression,
$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^N \varepsilon_i^2 = \arg\min_{\beta} \sum_{i=1}^N \left( y_i - \sum_{k=1}^K \beta_k x_{ik} \right)^2$$
One can derive the first-order conditions with respect to $\beta_k$ for $k = 1, \dots, K$ and solve a system of $K$ equations in $K$ unknowns.
2. Using matrix notation

**Definition (OLS and the multiple linear regression model)**
In the multiple linear regression model $y_i = \mathbf{x}_i^\top \beta + \varepsilon_i$, with $\mathbf{x}_i = (x_{i1}, \dots, x_{iK})^\top$, the OLS estimator $\hat{\beta}$ is the solution of
$$\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^N \left( y_i - \mathbf{x}_i^\top \beta \right)^2$$
The OLS estimator of $\beta$ is:
$$\hat{\beta} = \left( \sum_{i=1}^N \mathbf{x}_i \mathbf{x}_i^\top \right)^{-1} \left( \sum_{i=1}^N \mathbf{x}_i y_i \right)$$
**Definition (Normal equations)**
Under suitable regularity conditions, in the multiple linear regression model $y_i = \mathbf{x}_i^\top \beta + \varepsilon_i$, with $\mathbf{x}_i = (x_{i1} : \dots : x_{iK})^\top$, the normal equations are
$$\sum_{i=1}^N \mathbf{x}_i \left( y_i - \mathbf{x}_i^\top \hat{\beta} \right) = \mathbf{0}_{K \times 1}$$
**Definition (OLS and the multiple linear regression model)**
In the multiple linear regression model $\mathbf{y} = \mathbf{X}\beta + \varepsilon$, the OLS estimator $\hat{\beta}$ is the solution of the minimization problem
$$\hat{\beta} = \arg\min_{\beta} \varepsilon^\top \varepsilon = \arg\min_{\beta} \left( \mathbf{y} - \mathbf{X}\beta \right)^\top \left( \mathbf{y} - \mathbf{X}\beta \right)$$
The OLS estimator of $\beta$ is:
$$\hat{\beta} = \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{X}^\top \mathbf{y}$$
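The matrix formula $\hat{\beta} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}$ is a one-liner (a sketch on simulated data with made-up true parameters; solving the normal equations with `np.linalg.solve` is numerically preferable to forming the explicit inverse):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 200, 3
# Constant plus 2 random regressors: an N x K design matrix
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + 0.1 * rng.normal(size=N)

# OLS: solve the normal equations (X'X) beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to (1, 2, -0.5)
```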
**Definition**
The ordinary least squares estimator $\hat{\beta}$ of $\beta$ minimizes the following criterion:
$$s(\beta) = \left\| \mathbf{y} - \mathbf{X}\beta \right\|_{\mathbf{I}_N}^2 = \left( \mathbf{y} - \mathbf{X}\beta \right)^\top \left( \mathbf{y} - \mathbf{X}\beta \right)$$
The first-order conditions (normal equations) are:
$$\left. \frac{\partial s(\beta)}{\partial \beta} \right|_{\hat{\beta}} = -2 \underset{K \times N}{\mathbf{X}^\top} \underset{N \times 1}{\left( \mathbf{y} - \mathbf{X}\hat{\beta} \right)} = \mathbf{0}_{K \times 1}$$
The second-order conditions hold:
$$\left. \frac{\partial^2 s(\beta)}{\partial \beta \, \partial \beta^\top} \right|_{\hat{\beta}} = 2 \underset{K \times K}{\mathbf{X}^\top \mathbf{X}} \text{ is positive definite}$$
since, under the full rank condition, $\mathbf{X}^\top \mathbf{X}$ is a positive definite matrix. We therefore have a minimum.
**Definition (Normal equations)**
Under suitable regularity conditions, in the multiple linear regression model $\mathbf{y} = \mathbf{X}\beta + \varepsilon$, the normal equations are given by:
$$\underset{K \times N}{\mathbf{X}^\top} \underset{N \times 1}{\left( \mathbf{y} - \mathbf{X}\hat{\beta} \right)} = \mathbf{0}_{K \times 1}$$
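The normal equations can be verified numerically: the OLS residual vector is orthogonal to every column of $\mathbf{X}$, up to floating-point error (a sketch on simulated data):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
residuals = y - X @ beta_hat

# Normal equations: X'(y - X beta_hat) = 0 (up to floating-point error)
print(X.T @ residuals)  # approximately the zero vector
```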
**Definition (Unbiased variance estimator)**
In the multiple linear regression model $\mathbf{y} = \mathbf{X}\beta + \varepsilon$, the unbiased estimator of $\sigma^2$ is given by:
$$\hat{\sigma}^2 = \frac{1}{N - K} \sum_{i=1}^N \hat{\varepsilon}_i^2 = \frac{\text{SSR}}{N - K}$$
The estimator $\hat{\sigma}^2$ can also be written as:
$$\hat{\sigma}^2 = \frac{1}{N - K} \sum_{i=1}^N \left( y_i - \mathbf{x}_i^\top \hat{\beta} \right)^2 = \frac{\left( \mathbf{y} - \mathbf{X}\hat{\beta} \right)^\top \left( \mathbf{y} - \mathbf{X}\hat{\beta} \right)}{N - K} = \frac{\left\| \mathbf{y} - \mathbf{X}\hat{\beta} \right\|_{\mathbf{I}_N}^2}{N - K}$$
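The unbiased variance estimator divides the SSR by the degrees of freedom $N - K$ (a sketch on simulated data; the true $\sigma = 0.5$, so $\sigma^2 = 0.25$, is made up):

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 500, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, 2.0]) + 0.5 * rng.normal(size=N)  # sigma = 0.5

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
residuals = y - X @ beta_hat

# sigma2_hat = SSR / (N - K), the unbiased estimator of sigma^2
sigma2_hat = (residuals @ residuals) / (N - K)
print(sigma2_hat)  # close to sigma^2 = 0.25
```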
3. Using moment conditions

Under assumption A3 (strict exogeneity), we have $E(\varepsilon \mid \mathbf{X}) = 0$. This condition implies:
$$E(\varepsilon_i \mathbf{x}_i) = \mathbf{0}_{K \times 1}$$
with $\mathbf{x}_i = (x_{i1} : \dots : x_{iK})^\top$. Using the sample analogs, one has:
$$\frac{1}{N} \sum_{i=1}^N \mathbf{x}_i \left( y_i - \mathbf{x}_i^\top \hat{\beta} \right) = \mathbf{0}_{K \times 1}$$
We have $K$ (normal) equations in the $K$ unknown parameters $\hat{\beta}_1, \dots, \hat{\beta}_K$: the system is just-identified.
4. Geometric interpretation

1. The ordinary least squares estimation method consists in determining $\hat{\mathbf{y}}$, the vector closest to $\mathbf{y}$ (in a certain space...), such that the squared norm between $\mathbf{y}$ and $\hat{\mathbf{y}}$ is minimized.
2. Finding $\hat{\mathbf{y}}$ is equivalent to finding an estimator of $\beta$.
**Definition (Geometric interpretation)**
The fitted vector $\hat{\mathbf{y}}$ is the (orthogonal) projection of $\mathbf{y}$ onto the column space of $\mathbf{X}$. The fitted error term $\hat{\varepsilon}$ is the projection of $\mathbf{y}$ onto the orthogonal complement of the column space of $\mathbf{X}$. The vectors $\hat{\mathbf{y}}$ and $\hat{\varepsilon}$ are orthogonal.
[Figure: orthogonal projection of $\mathbf{y}$ onto the column space of $\mathbf{X}$. Source: F. Pelgrin (2010), Lecture Notes, Advanced Econometrics.]
**Definition (Projection matrices)**
The vectors $\hat{\mathbf{y}}$ and $\hat{\varepsilon}$ are defined by
$$\hat{\mathbf{y}} = \mathbf{P}\mathbf{y} \qquad \hat{\varepsilon} = \mathbf{M}\mathbf{y}$$
where $\mathbf{P}$ and $\mathbf{M}$ denote the two following projection matrices:
$$\mathbf{P} = \mathbf{X} \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{X}^\top \qquad \mathbf{M} = \mathbf{I}_N - \mathbf{P} = \mathbf{I}_N - \mathbf{X} \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{X}^\top$$
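The algebraic properties of $\mathbf{P}$ and $\mathbf{M}$ are easy to confirm numerically (a sketch on random data): both are idempotent, their product is zero, and $\hat{\mathbf{y}} = \mathbf{P}\mathbf{y}$ and $\hat{\varepsilon} = \mathbf{M}\mathbf{y}$ decompose $\mathbf{y}$ into two orthogonal pieces.

```python
import numpy as np

rng = np.random.default_rng(4)
N, K = 50, 3
X = rng.normal(size=(N, K))
y = rng.normal(size=N)

P = X @ np.linalg.inv(X.T @ X) @ X.T   # projection onto the column space of X
M = np.eye(N) - P                       # projection onto its orthogonal complement

y_hat = P @ y                           # fitted values
eps_hat = M @ y                         # fitted residuals

print(np.allclose(P @ P, P))            # True: P is idempotent
print(np.allclose(P @ M, np.zeros((N, N))))  # True: the two projections are orthogonal
print(np.allclose(y_hat + eps_hat, y))  # True: y decomposes as y_hat + eps_hat
print(abs(y_hat @ eps_hat) < 1e-6)      # True: y_hat and eps_hat are orthogonal
```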
3. The ordinary least squares estimator

Other geometric interpretations:
Suppose that there is a constant term in the model.
1 The least squares residuals sum to zero: ∑_{i=1}^N ε̂ᵢ = 0.
2 The regression hyperplane passes through the point of means of the data (x̄_N, ȳ_N).
3 The mean of the fitted (adjusted) values of y equals the mean of the actual values of y: (1/N) ∑_{i=1}^N ŷᵢ = ȳ_N.
3. The ordinary least squares estimator

Definition (Coefficient of determination)
The coefficient of determination of the multiple linear regression model (with a constant term) is the ratio of the total (empirical) variance explained by the model to the total (empirical) variance of y:

R² = ∑_{i=1}^N (ŷᵢ − ȳ_N)² / ∑_{i=1}^N (yᵢ − ȳ_N)² = 1 − ∑_{i=1}^N ε̂ᵢ² / ∑_{i=1}^N (yᵢ − ȳ_N)²
3. The ordinary least squares estimator

Remark
1 The coefficient of determination measures the proportion of the total variance (or variability) in y that is accounted for by variation in the regressors (or the model).
2 Problem: the R² automatically and spuriously increases when extra explanatory variables are added to the model.
3. The ordinary least squares estimator

The adjusted R-squared coefficient is defined to be:

R̄² = 1 − (N − 1)/(N − p − 1) × (1 − R²)

where p denotes the number of regressors (not counting the constant term, i.e., p = K − 1 if there is a constant or p = K otherwise).
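Both R² and the adjusted R² are easy to compute from the residuals. A minimal sketch on simulated data (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 100, 3                       # constant + p = K - 1 regressors
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=N)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
eps_hat = y - y_hat                 # residuals sum to zero (constant term included)

ssr = eps_hat @ eps_hat
tss = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ssr / tss
p = K - 1                           # regressors, not counting the constant
r2_adj = 1.0 - (N - 1) / (N - p - 1) * (1.0 - r2)
```
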
3. The ordinary least squares estimator

Remark
One can show that:
1 R̄² < R²
2 If N is large, R̄² ≃ R²
3 The adjusted R-squared R̄² can be negative.
3. The ordinary least squares estimator

Key Concepts
1 OLS estimator and estimate
2 Fitted or predicted value
3 Residual or fitted residual
4 Orthogonality conditions
5 Normal equations
6 Geometric interpretations of the OLS
7 Coefficient of determination and adjusted R-squared
Section 4
Statistical properties of the OLS estimator
4. Statistical properties of the OLS estimator

In order to study the statistical properties of the OLS estimator, we have to distinguish (cf. chapter 1):
1 The finite sample properties
2 The large sample or asymptotic properties
4. Statistical properties of the OLS estimator

But we also have to distinguish the properties according to the assumptions made on the linear regression model:
1 Semi-parametric linear regression model (the exact distribution of ε is unknown) versus parametric linear regression model (and especially the Gaussian linear regression model, assumption A6).
2 X is a matrix of random regressors versus X is a matrix of fixed regressors.
4. Statistical properties of the OLS estimator

Fact (Assumptions)
In the rest of this section, we assume that assumptions A1-A5 hold.
A1 (linearity): the model is linear in β.
A2 (identification): X is an N × K matrix with rank K.
A3 (exogeneity): E(ε ∣ X) = 0_{N×1}.
A4 (spherical error terms): V(ε ∣ X) = σ² I_N.
A5 (data generation): X may be fixed or random.
Subsection 4.1.
Finite sample properties of the OLS estimator
4.1. Finite sample properties

Objectives
The objectives of this subsection are the following:
1 Compute the first two moments of the (unknown) finite sample distribution of the OLS estimators β̂ and σ̂².
2 Determine the finite sample distribution of the OLS estimators β̂ and σ̂² under particular assumptions (A6).
3 Determine whether the OLS estimators are "good": efficient estimator versus BLUE.
4 Introduce the Gauss-Markov theorem.
4.1. Finite sample properties
First moments of the OLS estimators
4.1. Finite sample properties

Moments
In a first step, we will derive the first moments of the OLS estimators:
Step 1: compute E(β̂) and V(β̂)
Step 2: compute E(σ̂²) and V(σ̂²)
4.1. Finite sample properties

Definition (Unbiased estimator)
In the multiple linear regression model y = Xβ₀ + ε, under assumption A3 (strict exogeneity), the OLS estimator β̂ is unbiased:

E(β̂) = β₀

where β₀ denotes the true value of the vector of parameters. This result holds whether or not the matrix X is considered as random.
4.1. Finite sample properties

Proof
Case 1: fixed regressors (cf. chapter 1)

β̂ = (X⊤X)⁻¹ X⊤y = β₀ + (X⊤X)⁻¹ X⊤ε

So, if X is a matrix of fixed regressors:

E(β̂) = β₀ + (X⊤X)⁻¹ X⊤ E(ε)

Under assumption A3 (exogeneity), E(ε ∣ X) = E(ε) = 0. Then, we get:

E(β̂) = β₀

The OLS estimator is unbiased.
4.1. Finite sample properties

Proof (cont'd)
Case 2: random regressors

β̂ = (X⊤X)⁻¹ X⊤y = β₀ + (X⊤X)⁻¹ X⊤ε

If X includes some random elements:

E(β̂ ∣ X) = β₀ + (X⊤X)⁻¹ X⊤ E(ε ∣ X)

Under assumption A3 (exogeneity), E(ε ∣ X) = 0. Then, we get:

E(β̂ ∣ X) = β₀
4.1. Finite sample properties

Proof (cont'd)
Case 2: random regressors
The OLS estimator β̂ is conditionally unbiased:

E(β̂ ∣ X) = β₀

Besides, by the law of iterated expectations, we have:

E(β̂) = E_X[E(β̂ ∣ X)] = E_X(β₀) = β₀

where E_X denotes the expectation with respect to the distribution of X. So, the OLS estimator β̂ is unbiased:

E(β̂) = β₀
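Unbiasedness can be illustrated by Monte Carlo: averaging β̂ over many samples drawn with E(ε) = 0 should recover β₀. A sketch with a fixed design and simulated errors (illustrative values):

```python
import numpy as np

rng = np.random.default_rng(3)
N, K, R = 50, 2, 5000
X = np.column_stack([np.ones(N), rng.normal(size=N)])  # fixed design across replications
beta0 = np.array([1.0, 2.0])

estimates = np.empty((R, K))
for r in range(R):
    y = X @ beta0 + rng.normal(size=N)                 # errors with E(eps) = 0
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)

mean_beta_hat = estimates.mean(axis=0)                 # close to beta0
```
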
4.1. Finite sample properties

Definition (Variance of the OLS estimator, non-stochastic regressors)
In the multiple linear regression model y = Xβ₀ + ε, if the matrix X is non-stochastic, the unconditional variance covariance matrix of the OLS estimator β̂ is:

V(β̂) = σ² (X⊤X)⁻¹
4.1. Finite sample properties

Proof

β̂ = (X⊤X)⁻¹ X⊤y = β₀ + (X⊤X)⁻¹ X⊤ε

So, if X is a matrix of fixed regressors:

V(β̂) = E[(β̂ − β₀)(β̂ − β₀)⊤]
      = E[(X⊤X)⁻¹ X⊤ εε⊤ X (X⊤X)⁻¹]
      = (X⊤X)⁻¹ X⊤ E(εε⊤) X (X⊤X)⁻¹
4.1. Finite sample properties

Proof (cont'd)
Under assumption A4 (spherical disturbances), we have:

V(ε) = E(εε⊤) = σ² I_N

The variance covariance matrix of the OLS estimator is then:

V(β̂) = (X⊤X)⁻¹ X⊤ E(εε⊤) X (X⊤X)⁻¹
      = (X⊤X)⁻¹ X⊤ (σ² I_N) X (X⊤X)⁻¹
      = σ² (X⊤X)⁻¹ (X⊤X) (X⊤X)⁻¹
      = σ² (X⊤X)⁻¹
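The formula V(β̂) = σ²(X⊤X)⁻¹ can be checked by simulation: the empirical covariance of β̂ across replications on a fixed design should match it. A sketch (illustrative values):

```python
import numpy as np

rng = np.random.default_rng(4)
N, R = 40, 20000
X = np.column_stack([np.ones(N), rng.normal(size=N)])  # fixed design
beta0 = np.array([0.5, -1.0])
sigma2 = 2.0

V_theory = sigma2 * np.linalg.inv(X.T @ X)             # sigma^2 (X'X)^(-1)

draws = np.empty((R, 2))
for r in range(R):
    y = X @ beta0 + np.sqrt(sigma2) * rng.normal(size=N)
    draws[r] = np.linalg.solve(X.T @ X, X.T @ y)

V_empirical = np.cov(draws, rowvar=False)              # close to V_theory
```
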
4.1. Finite sample properties

Definition (Variance of the OLS estimator, stochastic regressors)
In the multiple linear regression model y = Xβ₀ + ε, if the matrix X is stochastic, the conditional variance covariance matrix of the OLS estimator β̂ is:

V(β̂ ∣ X) = σ² (X⊤X)⁻¹

The unconditional variance covariance matrix is equal to:

V(β̂) = σ² E_X[(X⊤X)⁻¹]

where E_X denotes the expectation with respect to the distribution of X.
4.1. Finite sample properties

Proof

β̂ = (X⊤X)⁻¹ X⊤y = β₀ + (X⊤X)⁻¹ X⊤ε

So, if X is a stochastic matrix:

V(β̂ ∣ X) = E[(β̂ − β₀)(β̂ − β₀)⊤ ∣ X]
          = E[(X⊤X)⁻¹ X⊤ εε⊤ X (X⊤X)⁻¹ ∣ X]
          = (X⊤X)⁻¹ X⊤ E(εε⊤ ∣ X) X (X⊤X)⁻¹
4.1. Finite sample properties

Proof (cont'd)
Under assumption A4 (spherical disturbances), we have:

V(ε ∣ X) = E(εε⊤ ∣ X) = σ² I_N

The conditional variance covariance matrix of the OLS estimator is then:

V(β̂ ∣ X) = (X⊤X)⁻¹ X⊤ E(εε⊤ ∣ X) X (X⊤X)⁻¹
          = (X⊤X)⁻¹ X⊤ (σ² I_N) X (X⊤X)⁻¹
          = σ² (X⊤X)⁻¹
4.1. Finite sample properties

Proof (cont'd)
We have:

V(β̂ ∣ X) = σ² (X⊤X)⁻¹

The (unconditional) variance covariance matrix of the OLS estimator is defined by:

V(β̂) = E_X[V(β̂ ∣ X)] = σ² E_X[(X⊤X)⁻¹]

(the term V_X[E(β̂ ∣ X)] of the variance decomposition vanishes, since E(β̂ ∣ X) = β₀ is constant). Here E_X denotes the expectation with respect to the distribution of X.
4.1. Finite sample properties

Summary

| | Case 1: X stochastic | Case 2: X nonstochastic |
|---|---|---|
| Mean | E(β̂) = β₀ | E(β̂) = β₀ |
| Variance | V(β̂) = σ² E_X[(X⊤X)⁻¹] | V(β̂) = σ² (X⊤X)⁻¹ |
| Cond. mean | E(β̂ ∣ X) = β₀ | — |
| Cond. var | V(β̂ ∣ X) = σ² (X⊤X)⁻¹ | — |
4.1. Finite sample properties

Question
How can we estimate the variance covariance matrix of the OLS estimator?

V(β̂) = σ² (X⊤X)⁻¹ if X is nonstochastic

V(β̂) = σ² E_X[(X⊤X)⁻¹] if X is stochastic
4.1. Finite sample properties

Question (cont'd)

Definition (Variance estimator)
An unbiased estimator of the variance covariance matrix of the OLS estimator is given by:

V̂(β̂) = σ̂² (X⊤X)⁻¹

where σ̂² = (N − K)⁻¹ ε̂⊤ε̂ is an unbiased estimator of σ². This result holds whether X is stochastic or nonstochastic.
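In practice the estimated covariance matrix and the usual standard errors follow directly from σ̂². A sketch on simulated data (illustrative values):

```python
import numpy as np

rng = np.random.default_rng(5)
N, K = 120, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, -0.7, 0.4]) + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
eps_hat = y - X @ beta_hat

sigma2_hat = eps_hat @ eps_hat / (N - K)   # unbiased estimator of sigma^2
V_hat = sigma2_hat * XtX_inv               # estimated covariance matrix of beta_hat
std_errors = np.sqrt(np.diag(V_hat))
```
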
4.1. Finite sample properties

Summary

| | Case 1: X stochastic | Case 2: X nonstochastic |
|---|---|---|
| Variance | V(β̂) = σ² E_X[(X⊤X)⁻¹] | V(β̂) = σ² (X⊤X)⁻¹ |
| Estimator | V̂(β̂) = σ̂² (X⊤X)⁻¹ | V̂(β̂) = σ̂² (X⊤X)⁻¹ |
4.1. Finite sample properties

Definition (Estimator of the variance of the disturbances)
Under assumptions A1-A5, in the multiple linear regression model y = Xβ₀ + ε, the estimator σ̂² is unbiased:

E(σ̂²) = σ²

where

σ̂² = (1/(N − K)) ∑_{i=1}^N ε̂ᵢ² = ε̂⊤ε̂ / (N − K)

This result holds whether or not the matrix X is considered as random.
4.1. Finite sample properties

Proof
We assume that X is stochastic. Let M denote the projection matrix ("residual maker") defined by:

M = I_N − X (X⊤X)⁻¹ X⊤

with

ε̂ = M y, where ε̂ is (N,1), M is (N,N) and y is (N,1).

The N × N matrix M satisfies the following properties:
1 If X is regressed on X, a perfect fit will result and the residuals will be zero, so M X = 0.
2 The matrix M is symmetric (M⊤ = M) and idempotent (M M = M).
4.1. Finite sample properties

Proof (cont'd)
The residuals are defined to be:

ε̂ = M y

Since y = Xβ + ε, we have:

ε̂ = M (Xβ + ε) = M Xβ + M ε

Since M X = 0, we have:

ε̂ = M ε
4.1. Finite sample properties

Proof (cont'd)
The estimator σ̂² is based on the sum of squared residuals (SSR):

σ̂² = ε̂⊤ε̂ / (N − K) = ε⊤ M ε / (N − K)

The expected value of the SSR is:

E(ε̂⊤ε̂ ∣ X) = E(ε⊤ M ε ∣ X)

The quantity ε⊤ M ε is a 1 × 1 scalar, so it is equal to its trace:

E(ε⊤ M ε ∣ X) = E(tr(ε⊤ M ε) ∣ X) = E(tr(M εε⊤) ∣ X)

since tr(AB) = tr(BA).
4.1. Finite sample properties

Proof (cont'd)
Since M = I_N − X (X⊤X)⁻¹ X⊤ depends only on X, we have:

E(ε̂⊤ε̂ ∣ X) = E(tr(M εε⊤) ∣ X) = tr(M E(εε⊤ ∣ X))

Under assumptions A3 and A4, we have:

E(εε⊤ ∣ X) = σ² I_N

As a consequence:

E(ε̂⊤ε̂ ∣ X) = tr(σ² M I_N) = σ² tr(M)
4.1. Finite sample properties

Proof (cont'd)

E(ε̂⊤ε̂ ∣ X) = σ² tr(M)
            = σ² tr(I_N − X (X⊤X)⁻¹ X⊤)
            = σ² tr(I_N) − σ² tr(X (X⊤X)⁻¹ X⊤)
            = σ² tr(I_N) − σ² tr((X⊤X)⁻¹ X⊤X)
            = σ² tr(I_N) − σ² tr(I_K)
            = σ² (N − K)
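Both the identity tr(M) = N − K and the resulting expectation E(ε̂⊤ε̂) = σ²(N − K) can be verified numerically (simulated design, illustrative values):

```python
import numpy as np

rng = np.random.default_rng(6)
N, K = 30, 4
X = rng.normal(size=(N, K))
M = np.eye(N) - X @ np.linalg.inv(X.T @ X) @ X.T

trace_M = np.trace(M)                       # equals N - K

# Monte Carlo check of E(eps_hat' eps_hat) = sigma^2 (N - K)
sigma2, R = 1.5, 20000
ssr = np.empty(R)
for r in range(R):
    eps = np.sqrt(sigma2) * rng.normal(size=N)
    ssr[r] = eps @ M @ eps                  # eps_hat' eps_hat = eps' M eps
mean_sigma2_hat = ssr.mean() / (N - K)      # close to sigma2
```
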
4.1. Finite sample properties

Proof (cont'd)
By definition of σ̂², we have:

E(σ̂² ∣ X) = E(ε̂⊤ε̂ ∣ X) / (N − K) = σ² (N − K) / (N − K) = σ²

So, the estimator σ̂² is conditionally unbiased. By the law of iterated expectations:

E(σ̂²) = E_X[E(σ̂² ∣ X)] = E_X(σ²) = σ²

The estimator σ̂² is unbiased:

E(σ̂²) = σ²
4.1. Finite sample properties

Remark
By the same principle, we can compute the variance of the estimator σ̂². Since

σ̂² = ε̂⊤ε̂ / (N − K) = ε⊤ M ε / (N − K)

we have:

V(σ̂² ∣ X) = (N − K)⁻² V(ε⊤ M ε ∣ X)

V(σ̂²) = E_X[V(σ̂² ∣ X)]

But it takes... at least ten slides.....
4.1. Finite sample properties

Definition (Variance of the estimator σ̂²)
In the multiple linear regression model y = Xβ₀ + ε, the variance of the estimator σ̂² is:

V(σ̂²) = 2σ⁴ / (N − K)

where σ² denotes the true value of the variance of the error terms. This result holds whether or not the matrix X is considered as random.
4.1. Finite sample properties

Summary

| | Case 1: X stochastic | Case 2: X nonstochastic |
|---|---|---|
| Mean | E(σ̂²) = σ² | E(σ̂²) = σ² |
| Variance | V(σ̂²) = 2σ⁴/(N − K) | V(σ̂²) = 2σ⁴/(N − K) |
| Cond. mean | E(σ̂² ∣ X) = σ² | — |
| Cond. var | V(σ̂² ∣ X) = 2σ⁴/(N − K) | — |
4.1. Finite sample properties
Finite sample distributions
4.1. Finite sample properties

Summary
1 Under assumptions A1-A5, we can derive the first two moments of the (unknown) finite sample distribution of the OLS estimator β̂, i.e. E(β̂) and V(β̂).
2 Are we able to characterize the finite sample distribution (or exact sampling distribution) of β̂?
3 For that, we need to put an assumption on the distribution of ε and use a parametric specification.
4.1. Finite sample properties

Finite sample distribution

Fact (Finite sample distribution, I)
(1) In a parametric multiple linear regression model with stochastic regressors, the conditional finite sample distribution of β̂ is known; the unconditional finite sample distribution is generally unknown:

β̂ ∣ X ~ D    β̂ ~ ??

where D is a multivariate distribution.
4.1. Finite sample properties

Finite sample distribution

Fact (Finite sample distribution, II)
(2) In a parametric multiple linear regression model with nonstochastic regressors, the marginal (unconditional) finite sample distribution of β̂ is known:

β̂ ~ D
4.1. Finite sample properties

Finite sample distribution

Fact (Finite sample distribution, III)
(3) In a semi-parametric multiple linear regression model, with fixed or random regressors, the finite sample distribution of β̂ is unknown:

β̂ ∣ X ~ ??    β̂ ~ ??
4.1. Finite sample properties

Example (Linear Gaussian regression model)
Under assumption A6 (normality - linear Gaussian regression model),

ε ~ N(0, σ² I_N)

β̂ = (X⊤X)⁻¹ X⊤y = β + (X⊤X)⁻¹ X⊤ε

If X is a stochastic matrix, the conditional distribution of β̂ is:

β̂ ∣ X ~ N(β, σ² (X⊤X)⁻¹)

If X is a nonstochastic matrix, the marginal distribution of β̂ is:

β̂ ~ N(β, σ² (X⊤X)⁻¹)
4.1. Finite sample properties

Theorem (Linear Gaussian regression model)
Under assumption A6 (normality), the estimators β̂ and σ̂² have a finite sample distribution given by:

β̂ ~ N(β, σ² (X⊤X)⁻¹)

(N − K) σ̂² / σ² ~ χ²(N − K)

Moreover, β̂ and σ̂² are independent. This result holds whether or not the matrix X is considered as random. In the latter case, the distribution of β̂ is conditional on X, with V(β̂ ∣ X) = σ² (X⊤X)⁻¹.
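The χ²(N − K) result can be illustrated by simulating Gaussian errors and checking that (N − K)σ̂²/σ² has the mean N − K and variance 2(N − K) of a χ²(N − K) variate (illustrative values):

```python
import numpy as np

rng = np.random.default_rng(7)
N, K, R = 25, 3, 30000
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # fixed design
sigma2 = 1.0
M = np.eye(N) - X @ np.linalg.inv(X.T @ X) @ X.T

stat = np.empty(R)
for r in range(R):
    eps = rng.normal(size=N)                   # Gaussian errors (assumption A6)
    sigma2_hat = eps @ M @ eps / (N - K)
    stat[r] = (N - K) * sigma2_hat / sigma2    # chi2(N - K) under A6

# A chi2(N - K) variate has mean N - K and variance 2(N - K)
```
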
4.1. Finite sample properties

Efficient estimator or BLUE?
4.1. Finite sample properties

Question: is the OLS estimator a "good" estimator of β?
The question is whether this estimator is preferred to other unbiased estimators:

V(β̂_OLS) ≷ V(β̂_other)

In general, to answer this question we use the FDCR or Cramer-Rao bound and study the efficiency of the estimator (cf. chapters 1 and 2).
Problem: the computation of the FDCR bound requires an assumption on the distribution of ε.
4.1. Finite sample properties

Two cases:
1 In a parametric model, it is possible to show that the OLS estimator is efficient - its variance reaches the FDCR bound. The OLS estimator is then the Best Unbiased Estimator (BUE).
2 In a semi-parametric model:
  1 We don't know the conditional distribution of y given X, nor the likelihood of the sample.
  2 The FDCR bound is unknown, so it is impossible to show the efficiency of the OLS estimator.
  3 We therefore introduce another concept: the Best Linear Unbiased Estimator (BLUE).
4.1. Finite sample properties

Problem (Efficiency of the OLS estimator)
In a semi-parametric multiple linear regression model (with no assumption on the distribution of ε), it is impossible to compute the FDCR or Cramer-Rao bound and to show the efficiency of the OLS estimator. The efficiency of the OLS estimator can only be shown in a parametric multiple linear regression model.
4.1. Finite sample properties

Theorem (Efficiency - Gaussian model)
In a Gaussian multiple linear regression model y = Xβ₀ + ε, with ε ~ N(0, σ² I_N) (assumption A6), the OLS estimator β̂ is efficient. Its variance attains the FDCR or Cramer-Rao bound:

V(β̂) = I_N⁻¹(β₀)

This result holds whether or not the matrix X is considered as random.
4.1. Finite sample properties

Proof
Let us consider the case where X is nonstochastic. Under assumption A6, the distribution of y is known, since y ~ N(Xβ, σ² I_N). The log-likelihood of the sample {yᵢ, xᵢ}_{i=1}^N, with xᵢ = (x_{i1} : .. : x_{iK})⊤, is then equal to:

ℓ_N(β; y, x) = −(N/2) ln σ² − (N/2) ln(2π) − (y − Xβ)⊤(y − Xβ) / (2σ²)

If we denote θ = (β⊤ : σ²)⊤, we have (cf. chapter 2):

I_N(θ₀) = −E_θ[∂²ℓ_N(θ; y, x) / ∂θ∂θ⊤] = ( (1/σ²) X⊤X , 0_{K×1} ; 0_{1×K} , N/(2σ⁴) )
4.1. Finite sample properties

Proof (cont'd)

I_N(θ₀) = ( (1/σ²) X⊤X , 0_{K×1} ; 0_{1×K} , N/(2σ⁴) )

The inverse of this block-diagonal matrix is given by:

I_N⁻¹(θ₀) = ( σ² (X⊤X)⁻¹ , 0_{K×1} ; 0_{1×K} , 2σ⁴/N ) = ( I_N⁻¹(β₀) , 0_{K×1} ; 0_{1×K} , 2σ⁴/N )

Then we have:

V(β̂) = I_N⁻¹(β₀) = σ² (X⊤X)⁻¹
4.1. Finite sample properties

Problem
1 In a semi-parametric model (with no assumption on the distribution of ε), it is impossible to compute the FDCR bound and to show the efficiency of the OLS estimator.
2 The solution consists in introducing the concept of best linear unbiased estimator (BLUE): the Gauss-Markov theorem...
4.1. Finite sample properties

Theorem (Gauss-Markov theorem)
In the linear regression model under assumptions A1-A5, the least squares estimator β̂ is the best linear unbiased estimator (BLUE) of β₀, whether X is stochastic or nonstochastic.
4.1. Finite sample properties

Comment
The estimator β̂_k, for k = 1, .., K, is the BLUE of β_k:
Best = smallest variance
Linear (in y or yᵢ): β̂_k = ∑_{i=1}^N ω_{ki} yᵢ
Unbiased: E(β̂_k) = β_k
Estimator: β̂_k = f(y₁, .., y_N)
4.1. Finite sample properties

Remark
The fact that the OLS estimator is the BLUE means that there is no linear unbiased estimator with a smaller variance than β̂.
BUT nothing guarantees that a nonlinear unbiased estimator cannot have a smaller variance than β̂.
4.1. Finite sample properties

Summary

| Model | Case 1: X stochastic | Case 2: X nonstochastic |
|---|---|---|
| Semi-parametric | β̂ is the BLUE (Gauss-Markov) | β̂ is the BLUE (Gauss-Markov) |
| Parametric | β̂ is efficient and BUE (FDCR bound) | β̂ is efficient and BUE (FDCR bound) |

BUE: Best Unbiased Estimator
4.1. Finite sample properties
Other remarks
4.1. Finite sample properties

Remarks
1 Including irrelevant variables (i.e., variables whose true coefficient is 0) does not affect the unbiasedness of the estimator.
2 Irrelevant variables decrease the precision of the OLS estimator, but do not result in bias.
3 Omitting relevant regressors obviously does affect the estimator, since the zero conditional mean assumption no longer holds.
4.1. Finite sample properties

Remarks
1 Omitting relevant variables causes bias (under some conditions on these variables).
  1 The direction of the bias can (sometimes) be characterized.
  2 The size of the bias is more complicated to derive.
2 More generally, correlation between a single regressor and the error term (endogeneity) usually results in all OLS estimators being biased.
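The omitted variable bias is easy to reproduce by simulation: dropping a regressor that is correlated with an included one shifts the estimated coefficient away from its true value (all values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
N, R = 200, 3000
b_full = np.empty(R)
b_short = np.empty(R)
for r in range(R):
    x2 = rng.normal(size=N)
    x1 = 0.8 * x2 + rng.normal(size=N)        # x1 is correlated with x2
    y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=N)
    X_full = np.column_stack([np.ones(N), x1, x2])
    X_short = np.column_stack([np.ones(N), x1])           # x2 omitted
    b_full[r] = np.linalg.solve(X_full.T @ X_full, X_full.T @ y)[1]
    b_short[r] = np.linalg.solve(X_short.T @ X_short, X_short.T @ y)[1]

# b_full is centered on 2.0; b_short is biased upward because Cov(x1, x2) > 0
```
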
4.1. Finite sample properties

Key Concepts
1 OLS unbiased estimator under assumption A3 (exogeneity)
2 Mean and variance of the OLS estimator under assumptions A3-A4 (exogeneity - spherical disturbances)
3 Finite sample distribution
4 Estimator of the variance of the error terms
5 BLUE estimator and efficient estimator (FDCR bound)
6 Gauss-Markov theorem
Subsection 4.2.
Asymptotic properties of the OLS estimator
4.2. Asymptotic properties

Objectives
The objectives of this subsection are the following:
1 Study the consistency of the OLS estimator.
2 Derive the asymptotic distribution of the OLS estimator β̂.
3 Estimate the asymptotic variance covariance matrix of β̂.
4.2. Asymptotic properties

Theorem (Consistency)
Consider the multiple linear regression model yᵢ = xᵢ⊤β₀ + εᵢ, with xᵢ = (x_{i1} : .. : x_{iK})⊤. Under assumptions A1-A5, the OLS estimators β̂ and σ̂² are (weakly) consistent:

β̂ →ᵖ β₀    σ̂² →ᵖ σ²
4.2. Asymptotic Properties

Proof (cf. chapter 1)
We have:

β̂ = ( ∑_{i=1}^N xᵢxᵢ⊤ )⁻¹ ( ∑_{i=1}^N xᵢyᵢ ) = β₀ + ( ∑_{i=1}^N xᵢxᵢ⊤ )⁻¹ ( ∑_{i=1}^N xᵢεᵢ )

By multiplying and dividing by N, we get:

β̂ = β₀ + ( (1/N) ∑_{i=1}^N xᵢxᵢ⊤ )⁻¹ ( (1/N) ∑_{i=1}^N xᵢεᵢ )
4.2. Asymptotic Properties

Proof (cf. chapter 1)

β̂ = β₀ + ( (1/N) ∑_{i=1}^N xᵢxᵢ⊤ )⁻¹ ( (1/N) ∑_{i=1}^N xᵢεᵢ )

1 By using the (weak) law of large numbers (Khinchine's theorem), we have:

(1/N) ∑_{i=1}^N xᵢxᵢ⊤ →ᵖ E(xᵢxᵢ⊤)

(1/N) ∑_{i=1}^N xᵢεᵢ →ᵖ E(xᵢεᵢ)

2 By using Slutsky's theorem:

β̂ →ᵖ β₀ + E⁻¹(xᵢxᵢ⊤) E(xᵢεᵢ)
4.2. Asymptotic Properties

Proof (cf. chapter 1)

β̂ →ᵖ β₀ + E⁻¹(xᵢxᵢ⊤) E(xᵢεᵢ)

Under assumption A3 (exogeneity):

E(εᵢ ∣ xᵢ) = 0 ⟹ E(εᵢxᵢ) = 0_{K×1} and E(εᵢ) = 0

We thus have:

β̂ →ᵖ β₀

The OLS estimator β̂ is (weakly) consistent.
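Consistency can be illustrated by simulation: the average distance between β̂ and β₀ shrinks as N grows. A sketch (the helper `ols_error` is my own, illustrative construction):

```python
import numpy as np

rng = np.random.default_rng(9)
beta0 = np.array([1.0, -0.5])

def ols_error(N, reps=200):
    """Average distance between beta_hat and beta0 over `reps` simulated samples."""
    total = 0.0
    for _ in range(reps):
        X = np.column_stack([np.ones(N), rng.normal(size=N)])
        y = X @ beta0 + rng.normal(size=N)
        b = np.linalg.solve(X.T @ X, X.T @ y)
        total += np.linalg.norm(b - beta0)
    return total / reps

errors = [ols_error(N) for N in (50, 500, 5000)]   # decreasing in N
```
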
4.2. Asymptotic properties

Theorem (Asymptotic distribution)
Consider the multiple linear regression model yᵢ = xᵢ⊤β₀ + εᵢ, with xᵢ = (x_{i1} : .. : x_{iK})⊤. Under assumptions A1-A5, the OLS estimator β̂ is asymptotically normally distributed:

√N (β̂ − β₀) →ᵈ N(0, σ² E_X⁻¹(xᵢxᵢ⊤))

or equivalently:

β̂ ~asy N(β₀, (σ²/N) E_X⁻¹(xᵢxᵢ⊤))
4.2. Asymptotic properties

1 Whatever the specification used (parametric or semi-parametric), it is always possible to characterize the asymptotic distribution of the OLS estimator.
2 Under assumptions A1-A5, the OLS estimator β̂ is asymptotically normally distributed, even if ε is not normally distributed.
3 This result (asymptotic normality) is valid whether X is stochastic or nonstochastic.
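Point 2 can be illustrated by simulation: with Uniform(−1, 1) errors (clearly non-Gaussian), the scaled estimator √N(β̂ − β₀) still has covariance close to σ² E_X⁻¹(xᵢxᵢ⊤), which equals σ² I₂ in the illustrative design below:

```python
import numpy as np

rng = np.random.default_rng(10)
N, R = 400, 5000
beta0 = np.array([1.0, 2.0])
sigma2 = 1.0 / 3.0                     # variance of Uniform(-1, 1)

z = np.empty((R, 2))
for r in range(R):
    X = np.column_stack([np.ones(N), rng.normal(size=N)])   # E(x_i x_i') = I_2
    eps = rng.uniform(-1.0, 1.0, size=N)                    # non-Gaussian errors
    y = X @ beta0 + eps
    b = np.linalg.solve(X.T @ X, X.T @ y)
    z[r] = np.sqrt(N) * (b - beta0)

V_emp = np.cov(z, rowvar=False)        # close to sigma2 * I_2
```
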
4.2. Asymptotic properties

Proof (cf. chapter 1)
This proof was given in chapter 1 (Estimation theory).
1 Let us rewrite the OLS estimator as:

β̂ = β₀ + ( (1/N) ∑_{i=1}^N xᵢxᵢ⊤ )⁻¹ ( (1/N) ∑_{i=1}^N xᵢεᵢ )

2 Normalize the vector β̂ − β₀:

√N (β̂ − β₀) = ( (1/N) ∑_{i=1}^N xᵢxᵢ⊤ )⁻¹ ( (1/√N) ∑_{i=1}^N xᵢεᵢ ) = A_N⁻¹ b_N

with

A_N = (1/N) ∑_{i=1}^N xᵢxᵢ⊤    b_N = (1/√N) ∑_{i=1}^N xᵢεᵢ
4.2. Asymptotic properties

Proof (cf. chapter 1)
3. Using the WLLN (Khinchine's theorem) and the CMT (continuous mapping theorem) - remark: if xᵢ is nonstochastic, this step is not needed:

( (1/N) ∑_{i=1}^N xᵢxᵢ⊤ )⁻¹ →ᵖ E_X⁻¹(xᵢxᵢ⊤)

4. Using the CLT:

b_N = √N ( (1/N) ∑_{i=1}^N xᵢεᵢ − E(xᵢεᵢ) ) →ᵈ N(0, V(xᵢεᵢ))

Under assumption A3, E(εᵢ ∣ xᵢ) = 0 ⟹ E(xᵢεᵢ) = 0, and

V(xᵢεᵢ) = E(xᵢεᵢεᵢxᵢ⊤) = E_X[E(xᵢεᵢεᵢxᵢ⊤ ∣ xᵢ)] = E_X[xᵢ V(εᵢ ∣ xᵢ) xᵢ⊤] = σ² E_X(xᵢxᵢ⊤)
4.2. Asymptotic properties

Proof (cf. chapter 1)
So we have:

√N (β̂ − β₀) = A_N⁻¹ b_N

with

A_N⁻¹ →ᵖ A⁻¹ = E_X⁻¹(xᵢxᵢ⊤)

b_N →ᵈ N(E_b, V_b)    E_b = 0    V_b = σ² E_X(xᵢxᵢ⊤) = σ² A
4.2. Asymptotic properties

Proof (cf. chapter 1)

A_N⁻¹ →ᵖ A⁻¹ = E⁻¹(xᵢxᵢ⊤)    b_N →ᵈ N(E_b, V_b)    E_b = 0    V_b = σ² E_X(xᵢxᵢ⊤) = σ² A

By using Slutsky's theorem (for a convergence in distribution), we have:

A_N⁻¹ b_N →ᵈ N(E_c, V_c)

with

E_c = A⁻¹ E_b = A⁻¹ · 0 = 0

V_c = A⁻¹ V_b A⁻¹ = A⁻¹ (σ² A) A⁻¹ = σ² A⁻¹
4.2. Asymptotic properties

Proof (cf. chapter 1)

√N (β̂ − β₀) = A_N⁻¹ b_N →ᵈ N(0, σ² A⁻¹)

with A⁻¹ = E_X⁻¹(xᵢxᵢ⊤) (be careful: it is the inverse of the expectation and not the reverse).
Finally, we have:

√N (β̂ − β₀) →ᵈ N(0, σ² E_X⁻¹(xᵢxᵢ⊤))
4.2. Asymptotic properties

Remark
1 In some textbooks (Greene, 2007 for instance), the asymptotic variance covariance matrix of the OLS estimator is expressed as a function of the plim of N⁻¹X⊤X, denoted by Q:

Q = plim (1/N) X⊤X

2 Both notations are equivalent since:

(1/N) X⊤X = (1/N) ∑_{i=1}^N xᵢxᵢ⊤

and, by using the WLLN (Khinchine's theorem), we have:

(1/N) X⊤X →ᵖ Q = E_X(xᵢxᵢ⊤)
4.2. Asymptotic properties

Theorem (Asymptotic distribution)
Consider the multiple linear regression model y = Xβ₀ + ε. Under assumptions A1-A5, the OLS estimator β̂ is asymptotically normally distributed:

√N (β̂ − β₀) →ᵈ N(0, σ² Q⁻¹)

where

Q = plim (1/N) X⊤X = E_X(xᵢxᵢ⊤)

or equivalently:

β̂ ~asy N(β₀, (σ²/N) Q⁻¹)
4.2. Asymptotic properties

Definition (Asymptotic variance covariance matrix)
The asymptotic variance covariance matrix of the OLS estimator β̂ is equal to:

V_asy(β̂) = (σ²/N) E_X⁻¹(xᵢxᵢ⊤)

or equivalently:

V_asy(β̂) = (σ²/N) Q⁻¹

where Q = plim N⁻¹ X⊤X.
4.2. Asymptotic properties

Remark

| Finite sample variance | Asymptotic variance |
|---|---|
| V(β̂) = σ² E_X[(X⊤X)⁻¹] | V_asy(β̂) = (σ²/N) E_X⁻¹(xᵢxᵢ⊤) |
4.2. Asymptotic properties

Remark
How can we estimate the asymptotic variance covariance matrix of the OLS estimator?

V_asy(β̂) = (σ²/N) E_X⁻¹(xᵢxᵢ⊤)
4.2. Asymptotic properties

Definition (Estimator of the asymptotic variance matrix)
A consistent estimator of the asymptotic variance covariance matrix is given by:

V̂_asy(β̂) = σ̂² (X⊤X)⁻¹

where σ̂² is a consistent estimator of σ²:

σ̂² = ε̂⊤ε̂ / (N − K)
4.2. Asymptotic properties

Proof

V_asy(β̂) = (σ²/N) E_X⁻¹(xᵢxᵢ⊤)

We know that:

(1/N) X⊤X = (1/N) ∑_{i=1}^N xᵢxᵢ⊤ →ᵖ E_X(xᵢxᵢ⊤) = Q

σ̂² →ᵖ σ²

By using the CMT and the Slutsky theorem, we get:

σ̂² ( (1/N) X⊤X )⁻¹ →ᵖ σ² E_X⁻¹(xᵢxᵢ⊤)
4.2. Asymptotic properties

Proof (cont'd)
As a consequence:

σ̂² ( (1/N) X⊤X )⁻¹ →ᵖ σ² E_X⁻¹(xᵢxᵢ⊤)

or equivalently:

V̂_asy(β̂) = σ̂² (X⊤X)⁻¹ = (σ̂²/N) ( (1/N) X⊤X )⁻¹ →ᵖ (σ²/N) E_X⁻¹(xᵢxᵢ⊤) = V_asy(β̂)

so that:

V̂_asy(β̂) →ᵖ V_asy(β̂)
4.2. Asymptotic properties

Remark
Under assumptions A1-A5, for a stochastic matrix X, we have:

| | Finite sample | Asymptotic |
|---|---|---|
| Variance | V(β̂) = σ² E_X[(X⊤X)⁻¹] | V_asy(β̂) = σ² N⁻¹ Q⁻¹, with Q = plim (1/N) X⊤X |
| Estimator | V̂(β̂) = σ̂² (X⊤X)⁻¹ | V̂_asy(β̂) = σ̂² (X⊤X)⁻¹ |
4.2. Asymptotic properties

Example (CAPM)
The empirical analogue of the CAPM is given by:

r̃ᵢₜ = αᵢ + βᵢ r̃ₘₜ + εₜ

where r̃ᵢₜ = rᵢₜ − r_{ft} is the excess return of security i at time t, r̃ₘₜ = rₘₜ − r_{ft} is the market excess return at time t, and εₜ is an i.i.d. error term with:

E(εₜ) = 0    V(εₜ) = σ²    E(εₜ ∣ r̃ₘₜ) = 0

Data file: capm.xls for Microsoft, SP500 and Tbill (closing prices) from 11/1/1993 to 04/03/2003.
4.2. Asymptotic properties

Example (CAPM, cont'd)
Question: write a Matlab code in order to find (1) the estimates of α and β for Microsoft, (2) the corresponding standard errors, (3) the SE of the regression, (4) the SSR, (5) the coefficient of determination and (6) the adjusted R². Compare your results to the following table (Eviews).
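A Python sketch of the requested computations, using simulated returns in place of capm.xls (the data file is not reproduced here; all numbers are illustrative). The same steps translate line by line to Matlab:

```python
import numpy as np

# Simulated stand-in for the capm.xls data
rng = np.random.default_rng(11)
T = 500
rm_excess = rng.normal(0.005, 0.04, size=T)                       # market excess return
ri_excess = 0.01 + 1.2 * rm_excess + rng.normal(0, 0.08, size=T)  # security excess return

X = np.column_stack([np.ones(T), rm_excess])
N, K = X.shape

XtX_inv = np.linalg.inv(X.T @ X)
coef = XtX_inv @ X.T @ ri_excess                       # (1) alpha_hat, beta_hat
resid = ri_excess - X @ coef
ssr = resid @ resid                                    # (4) sum of squared residuals
sigma2_hat = ssr / (N - K)
se_regression = np.sqrt(sigma2_hat)                    # (3) SE of the regression
std_errors = np.sqrt(np.diag(sigma2_hat * XtX_inv))    # (2) standard errors
tss = np.sum((ri_excess - ri_excess.mean()) ** 2)
r2 = 1.0 - ssr / tss                                   # (5) coefficient of determination
r2_adj = 1.0 - (N - 1) / (N - K) * (1.0 - r2)          # (6) adjusted R^2 (p = K - 1)
```
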
4.2. Asymptotic properties

[Eviews estimation output for the CAPM regression of Microsoft excess returns]
4.2. Asymptotic properties

Key Concepts
1 The OLS estimator is weakly consistent.
2 The OLS estimator is asymptotically normally distributed.
3 Asymptotic variance covariance matrix.
4 Estimator of the asymptotic variance.