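Lecture 2: Multiple Regression

Digression: Differentiation involving vectors and matrices

Consider the vectors $a$, $x$ and the matrix $A$ defined as

$$a = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}, \qquad x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \qquad A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$$

Then

$$L = a'x = a_1 x_1 + a_2 x_2 + a_3 x_3$$

is linear in the $x$s, and

$$Q = x'Ax = a_{11}x_1^2 + a_{12}x_1x_2 + a_{13}x_1x_3 + a_{21}x_1x_2 + a_{22}x_2^2 + a_{23}x_2x_3 + a_{31}x_1x_3 + a_{32}x_2x_3 + a_{33}x_3^2$$

where $Q$ is called a quadratic form in the $x$s.

Note that

$$\frac{\partial L}{\partial x_1} = a_1, \qquad \frac{\partial L}{\partial x_2} = a_2, \qquad \frac{\partial L}{\partial x_3} = a_3$$

Denote

$$\frac{\partial L}{\partial x} = \begin{pmatrix} \partial L/\partial x_1 \\ \partial L/\partial x_2 \\ \partial L/\partial x_3 \end{pmatrix}$$

Rules of differentiation involving vectors and matrices

R1) $\dfrac{\partial L}{\partial x} = a$

R2) $\dfrac{\partial^2 L}{\partial x\,\partial x'} = 0$, where $0$ is an $(n \times n)$ matrix of zeros.

Now

$$\frac{\partial Q}{\partial x_1} = (a_{11}x_1 + a_{12}x_2 + a_{13}x_3) + (a_{11}x_1 + a_{21}x_2 + a_{31}x_3)$$

with similar expressions for $\partial Q/\partial x_2$ and $\partial Q/\partial x_3$. Hence

R3) $\dfrac{\partial Q}{\partial x} = Ax + A'x$ and $\dfrac{\partial^2 Q}{\partial x\,\partial x'} = A + A'$

R4) If $A$ is symmetric, i.e. $A = A'$, then $\dfrac{\partial Q}{\partial x} = 2Ax$ and $\dfrac{\partial^2 Q}{\partial x\,\partial x'} = 2A$

The Multiple Regression Model in Matrix Notation

Consider the multiple regression model with $k$ explanatory variables

$$Y_i = \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + e_i, \qquad i = 1, 2, \dots, n$$

which can be written as

$$\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} X_{11} & X_{21} & \cdots & X_{k1} \\ X_{12} & X_{22} & \cdots & X_{k2} \\ \vdots & \vdots & & \vdots \\ X_{1n} & X_{2n} & \cdots & X_{kn} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} + \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}$$

or

$$Y = X\beta + e$$

$Y$ = $n \times 1$ vector of observations on the explained variable
$X$ = $n \times k$ matrix of observations on the explanatory variables
$e$ = $n \times 1$ vector of errors
$\beta$ = $k \times 1$ vector of parameters to be estimated

Assumptions

1.) $e \sim IID(0, \sigma^2 I)$, where $I$ is the identity matrix. The errors are independently and identically distributed with mean 0 and covariance matrix $\sigma^2 I$, so each error has variance $\sigma^2$.

2.) The $X$s are non-stochastic and hence are independent of the $e$s.

3.) The $X$s are linearly independent. Hence rank$(X'X)$ = rank$(X)$ = $k$, which implies that $(X'X)^{-1}$ exists.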
Under these assumptions the best (minimum variance) linear unbiased estimator of $\beta$ is obtained by minimising the error sum of squares

$$Q = \sum_{i=1}^{n} e_i^2 = e'e = (Y - X\beta)'(Y - X\beta)$$

a result known as the Gauss-Markov theorem.

Derivation

Multiplying out the brackets gives

$$Q = Y'Y - 2\beta'X'Y + \beta'X'X\beta$$

As $X'X$ is symmetric, rules R1 and R4 give

$$\frac{\partial Q}{\partial \beta} = -2X'Y + 2X'X\beta$$

Setting $\partial Q/\partial \beta = 0$:

$$-2X'Y + 2X'X\hat\beta = 0$$

$$\hat\beta = (X'X)^{-1}X'Y$$

Consider the following data relating to real investment. The columns of $X$ are: constant, time, real GDP, interest rate and inflation. (Entries of $Y$ marked … are missing from this copy of the notes.)

Obs    Y        Constant   Time   Real GDP   Interest rate   Inflation
 1     …        1           1     1.058       5.16           4.40
 2     …        1           2     1.088       5.87           5.15
 3     …        1           3     1.086       5.95           5.37
 4     …        1           4     1.122       4.88           4.99
 5     …        1           5     1.186       4.50           4.16
 6     …        1           6     1.254       6.44           5.75
 7     …        1           7     1.246       7.83           8.82
 8     0.163    1           8     1.232       6.25           9.31
 9     0.195    1           9     1.298       5.50           5.21
10     0.231    1          10     1.370       5.46           5.83
11     0.257    1          11     1.439       7.46           7.40
12     0.259    1          12     1.479      10.28           8.64
13     0.225    1          13     1.474      11.77           9.31
14     0.241    1          14     1.503      13.42           9.44
15     0.204    1          15     1.475      11.02           5.99

The aim is to model real investment by estimating the equation $Y = X\beta + e$, where $\hat\beta$ minimises the sum of squared residuals $e'e$.

$$X'X = \begin{pmatrix} 15.00 & 120.00 & 19.31 & 111.79 & 99.77 \\ 120.00 & 1240.00 & 164.30 & 1035.90 & 875.60 \\ 19.310 & 164.30 & 25.218 & 148.98 & 131.22 \\ 111.79 & 1035.90 & 148.98 & 953.86 & 799.02 \\ 99.77 & 875.60 & 131.22 & 799.02 & 716.67 \end{pmatrix}$$

$$(X'X)^{-1} = \begin{pmatrix} 67.41 & 2.270 & -66.77 & 0.1242 & -0.0711 \\ 2.27 & 0.08624 & -2.257 & -0.0064 & -0.0009 \\ -66.77 & -2.257 & 67.09 & -0.1614 & -0.0506 \\ 0.1242 & -0.0064 & -0.1614 & 0.03295 & -0.01665 \\ -0.0711 & -0.0009 & -0.0506 & -0.01665 & 0.040428 \end{pmatrix}$$

$$X'Y = \begin{pmatrix} 3.050 \\ 26.004 \\ 3.993 \\ 23.521 \\ 20.732 \end{pmatrix}$$

$$\hat\beta = (X'X)^{-1}X'Y = \begin{pmatrix} -0.5090 \\ -0.0166 \\ 0.6704 \\ -0.0023 \\ -0.0001 \end{pmatrix}$$

Since $(X'X)^{-1}X'$ is a matrix of constants, the elements of $\hat\beta$ are linear functions of $Y$, which implies that $\hat\beta$ is a linear estimator.
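The normal equations $X'X\hat\beta = X'Y$ can be solved directly on a small example. The following is a minimal sketch in plain Python, not the method used to produce the investment results above: the data are hypothetical (generated exactly from $Y = 1 + 2X$), and the helper names `normal_equations` and `solve` are illustrative. It also evaluates the gradient $-2X'Y + 2X'X\hat\beta$ at the solution to confirm that the first-order condition holds.

```python
def normal_equations(X, Y):
    """Return (X'X, X'Y) for a design matrix X (list of rows) and response Y."""
    cols = list(zip(*X))
    XtX = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    XtY = [sum(a * y for a, y in zip(ci, Y)) for ci in cols]
    return XtX, XtY


def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # Bring the largest remaining entry in this column to the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


# Hypothetical data: a constant plus one regressor, with Y = 1 + 2X exactly.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y = [1.0, 3.0, 5.0, 7.0]

XtX, XtY = normal_equations(X, Y)
beta_hat = solve(XtX, XtY)

# First-order condition: -2X'Y + 2X'X beta_hat should be (numerically) zero.
gradient = [
    -2.0 * XtY[i] + 2.0 * sum(XtX[i][j] * beta_hat[j] for j in range(len(beta_hat)))
    for i in range(len(beta_hat))
]

print("beta_hat:", beta_hat)
print("gradient at beta_hat:", gradient)
```

Because the hypothetical data lie exactly on a line, the estimates recover the intercept and slope used to generate them, up to floating-point rounding. For ill-conditioned matrices such as the investment example, a library routine based on a QR decomposition would normally be preferred to hand-rolled elimination.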
Recall that $Y = X\beta + e$ and substitute into $\hat\beta = (X'X)^{-1}X'Y$ to give

$$\hat\beta = (X'X)^{-1}X'(X\beta + e) = \beta + (X'X)^{-1}X'e$$

$$E(\hat\beta) = \beta \quad \text{as } E(e) = 0$$

so $\hat\beta$ is an unbiased estimator. Its variance is

$$V(\hat\beta) = E[(\hat\beta - \beta)(\hat\beta - \beta)'] = (X'X)^{-1}X'E(ee')X(X'X)^{-1} = (X'X)^{-1}\sigma^2$$

since $E(ee') = I\sigma^2$.

To show that least squares estimates have the minimum variance, consider any other linear estimator

$$\beta^* = \hat\beta + CY$$

Then

$$\beta^* = \beta + CX\beta + [(X'X)^{-1}X' + C]e$$

$$E(\beta^*) = \beta + CX\beta$$

so $CX = 0$ is required for $\beta^*$ to be unbiased.

$$V(\beta^*) = E[(\beta^* - \beta)(\beta^* - \beta)'] = [(X'X)^{-1}X' + C]\,E(ee')\,[(X'X)^{-1}X' + C]'$$

Since $E(ee') = I\sigma^2$ and $CX = 0$,

$$V(\beta^*) = (X'X)^{-1}\sigma^2 + (CC')\sigma^2$$

Because $CC'$ is positive semi-definite,

$$V(\beta^*) \geq V(\hat\beta)$$

Example

Calculating the variance-covariance matrix corresponding to the real investment function:

$$V(\hat\beta) = \hat\sigma^2 (X'X)^{-1} \quad \text{where} \quad \hat\sigma^2 = \frac{e'e}{n - k}$$

$$e'e = (Y - X\hat\beta)'(Y - X\hat\beta) = Y'Y - \hat\beta'X'Y = 0.0004507$$

$$\hat\sigma^2 = \frac{0.0004507}{15 - 5} = 0.00004507$$

$$V(\hat\beta) = \begin{pmatrix} 0.00304 & 0.000102 & -0.00301 & 0.0000056 & -0.0000032 \\ 0.000102 & 0.0000039 & -0.000102 & -0.0000003 & -0.00000004 \\ -0.00301 & -0.000102 & 0.00302 & -0.000007 & -0.0000022 \\ 0.0000056 & -0.0000003 & -0.000007 & 0.0000015 & -8 \times 10^{-7} \\ -0.0000032 & -0.00000004 & -0.0000022 & -8 \times 10^{-7} & 1.8 \times 10^{-6} \end{pmatrix}$$

The standard errors are the square roots of the diagonal elements of $V(\hat\beta)$:

Variable         Coefficient    Standard error
Constant         -0.50907       0.0551
Time             -0.01658       0.001972
Real GDP          0.67038       0.05499
Interest rate    -0.00232       0.001219
Inflation        -0.00009       0.001347

Hypotheses Tests and Analysis of Variance

The test for $r$ restrictions in the multiple regression model with $k$ explanatory variables

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + e_i$$

is given by

$$F(r,\, n-k-1) = \frac{(RRSS - URSS)/r}{URSS/(n-k-1)}$$

where
URSS = unrestricted residual sum of squares
RRSS = restricted residual sum of squares, obtained by imposing the restrictions of the hypothesis

Example

Consider the restriction

$$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$$

Now

$$R^2 = \frac{ESS}{S_{yy}} = \frac{S_{yy} - RSS}{S_{yy}} = 1 - \frac{RSS}{S_{yy}} \quad \Rightarrow \quad 1 - R^2 = \frac{RSS}{S_{yy}}$$

so that

$$URSS = (1 - R^2)S_{yy}, \qquad RRSS = S_{yy}$$

Hence

$$F(k,\, n-k-1) = \frac{[S_{yy} - S_{yy}(1 - R^2)]/k}{S_{yy}(1 - R^2)/(n-k-1)} = \frac{R^2}{1 - R^2} \cdot \frac{n-k-1}{k}$$

Analysis of Variance for Multiple Regression Model

Source of variation   Sum of squares     Degrees of freedom   Mean square                        F-test
Regression            R^2 Syy            k                    MS1 = R^2 Syy / k                  MS1/MS2
Residual              (1 - R^2) Syy      n-k-1                MS2 = [(1 - R^2) Syy] / (n-k-1)
Total                 Syy                n-1

Measuring the goodness of fit

The question is how best to measure the goodness of fit of a multiple regression equation. The problem with

$$R^2 = \frac{ESS}{TSS}$$

is that as more explanatory variables are added to the regression equation, $R^2$ will at worst remain constant and will generally increase. Consequently, the number of explanatory variables must be taken into account when assessing the goodness of fit of a multiple regression equation.

a) Adjusted $R^2$, $\bar R^2$

$$\bar R^2 = 1 - \frac{n-1}{n-k-1}(1 - R^2) = 1 - \frac{n-1}{n-k-1} \cdot \frac{RSS}{TSS}$$

$\bar R^2$ adjusts $R^2$ to take into account the loss of degrees of freedom when more explanatory variables are added.

b) Standard error of the regression (SER)

$$SER = \hat\sigma = \sqrt{\frac{RSS}{n-k-1}}$$

where RSS = residual sum of squares. As the number of explanatory variables increases, RSS will tend to decline, but there is a corresponding decrease in the degrees of freedom, $n-k-1$, so the SER falls only if RSS declines proportionally more than the degrees of freedom.

Information criteria are used as a guide in model selection, especially for autoregressive models. In general, the information contained in a model is its distance from the "true" model, which is measured by the log likelihood function. An information criterion provides a measure that balances goodness of fit against parsimonious specification of the model.
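The formulas above for the overall F-test, adjusted $R^2$ and the SER can be evaluated directly. The sketch below is illustrative only: the function names and all input values are hypothetical, and $k$ counts the explanatory variables excluding the intercept, so the residual degrees of freedom are $n-k-1$.

```python
import math


def f_stat_from_r2(r2, n, k):
    """Overall F-statistic: F(k, n-k-1) = [R^2 / (1 - R^2)] * [(n-k-1) / k]."""
    return (r2 / (1.0 - r2)) * ((n - k - 1) / k)


def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - [(n-1) / (n-k-1)] * (1 - R^2)."""
    return 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)


def ser(rss, n, k):
    """Standard error of the regression: sqrt(RSS / (n-k-1))."""
    return math.sqrt(rss / (n - k - 1))


# Hypothetical regression: n = 23 observations, k = 2 regressors, R^2 = 0.50.
n = 23
print("F-statistic:", f_stat_from_r2(0.50, n, 2))
print("adjusted R^2 (k=2):", adjusted_r2(0.50, n, 2))

# Adding a third regressor that raises R^2 only slightly lowers adjusted R^2.
print("adjusted R^2 (k=3):", adjusted_r2(0.505, n, 3))

# Hypothetical residual sum of squares of 80 with the same n and k = 2.
print("SER:", ser(80.0, n, 2))
```

In this hypothetical case $R^2$ rises from 0.50 to 0.505 when the third regressor is added, yet adjusted $R^2$ falls, which is the loss-of-degrees-of-freedom penalty described above.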
c) Akaike information criterion (AIC)

$$AIC = -\frac{2}{n}\ln L_{max} + \frac{2k}{n}$$

where
$L_{max}$ = maximum likelihood value
$n$ = number of observations
$k$ = number of estimated parameters

d) Schwarz information criterion (SC)

$$SC = -\frac{2}{n}\ln L_{max} + \frac{k \ln n}{n}$$

Dummy variables in the regression model

A dummy variable (also described as an indicator variable or binary variable) takes the value 1 if a particular event occurs and 0 otherwise. Consider

$$X_{1i} = \begin{cases} 1 & \text{if over 35} \\ 0 & \text{otherwise} \end{cases}$$

in a consumption function

$$C_i = 0.73 + 0.21X_{1i} + 0.83I_i$$

The inclusion of a dummy variable shifts the intercept upwards but keeps the marginal propensity to consume the same for all ages. The size of the increase in the intercept is the coefficient on the dummy variable.

[Figure: consumption $C_i$ plotted against income $I_i$, showing the line $C = 0.73 + 0.21X + 0.83I$.]

The introduction of the dummy variable means that the $X$ matrix has been altered to

$$X = \begin{pmatrix} 1 & 0 & I_1 \\ 1 & 0 & I_2 \\ 1 & 1 & I_3 \\ 1 & 0 & I_4 \\ \vdots & \vdots & \vdots \\ 1 & 1 & I_n \end{pmatrix}$$

where $I_i$ denotes the income level of the $i$th individual. Hence the OLS estimate of the coefficient on the dummy variable is obtained from the second element of the vector

$$\hat\beta = (X'X)^{-1}X'Y$$

Dummy variables can be used in time series analysis to remove the effect of unusual observations, for example strikes or stock market crashes, and to proxy policy changes, especially changes in taxation that cannot be quantified.

Dummy variables are most frequently used in multiple regression models to remove the seasonal pattern from the data.
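The intercept-shift interpretation of the consumption function can be checked numerically. This is a small sketch using the coefficients 0.73, 0.21 and 0.83 quoted above; the income values and the function names are hypothetical.

```python
def fitted_consumption(income, over_35):
    """Fitted value from C = 0.73 + 0.21*X1 + 0.83*I, where X1 is the age dummy."""
    x1 = 1 if over_35 else 0
    return 0.73 + 0.21 * x1 + 0.83 * income


def design_row(income, over_35):
    """One row of the X matrix: [constant, dummy, income]."""
    return [1, 1 if over_35 else 0, income]


# Hypothetical income levels.
incomes = [10.0, 20.0, 30.0]

# At any income level, the over-35 group is shifted by the dummy coefficient.
gaps = [fitted_consumption(i, True) - fitted_consumption(i, False) for i in incomes]

# The slope (marginal propensity to consume) is the same for both groups.
slope_young = (fitted_consumption(20.0, False) - fitted_consumption(10.0, False)) / 10.0
slope_old = (fitted_consumption(20.0, True) - fitted_consumption(10.0, True)) / 10.0

print("intercept gaps:", [round(g, 2) for g in gaps])
print("slopes:", round(slope_young, 2), round(slope_old, 2))
print("example design rows:", design_row(10.0, False), design_row(10.0, True))
```

The gap between the two groups is the same at every income level, which is what "shifts the intercept but keeps the marginal propensity to consume the same" means in practice.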
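$$Q_1 = \begin{cases} 1 & \text{if Spring} \\ 0 & \text{otherwise} \end{cases} \qquad Q_2 = \begin{cases} 1 & \text{if Summer} \\ 0 & \text{otherwise} \end{cases} \qquad Q_3 = \begin{cases} 1 & \text{if Autumn} \\ 0 & \text{otherwise} \end{cases}$$

Hence the $X$ matrix is transformed to

$$X = \begin{pmatrix} 1 & X_{11} & X_{21} & 1 & 0 & 0 \\ 1 & X_{12} & X_{22} & 0 & 1 & 0 \\ 1 & X_{13} & X_{23} & 0 & 0 & 1 \\ 1 & X_{14} & X_{24} & 0 & 0 & 0 \\ 1 & X_{15} & X_{25} & 1 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & X_{1n} & X_{2n} & 0 & 0 & 1 \end{pmatrix}$$

Omission of Relevant Variables

Consider

True model: $Y_i = \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$
Estimated model: $Y_i = \beta_1 X_{1i} + v_i$

The estimate of $\beta_1$ is

$$\hat\beta_1 = \frac{\sum_{i=1}^{n} X_{1i}Y_i}{\sum_{i=1}^{n} X_{1i}^2}$$

Substituting for $Y_i$ from the true model gives

$$\hat\beta_1 = \frac{\sum_{i=1}^{n} X_{1i}(\beta_1 X_{1i} + \beta_2 X_{2i} + u_i)}{\sum_{i=1}^{n} X_{1i}^2} = \beta_1 + \beta_2\frac{\sum_{i=1}^{n} X_{1i}X_{2i}}{\sum_{i=1}^{n} X_{1i}^2} + \frac{\sum_{i=1}^{n} X_{1i}u_i}{\sum_{i=1}^{n} X_{1i}^2}$$

$$E(\hat\beta_1) = \beta_1 + b_{21}\beta_2$$

as

$$E\left(\sum_{i=1}^{n} X_{1i}u_i\right) = 0 \quad \text{and} \quad b_{21} = \frac{\sum_{i=1}^{n} X_{1i}X_{2i}}{\sum_{i=1}^{n} X_{1i}^2}$$

Hence $\hat\beta_1$ is biased unless $b_{21} = 0$ or $\beta_2 = 0$.

Inclusion of Irrelevant Variables

Consider

True model: $Y = \beta_1 X_1 + U$
Estimated model: $Y = \beta_1 X_1 + \beta_2 X_2 + U$

$$\tilde\beta_1 = \frac{S_{22}S_{1y} - S_{12}S_{2y}}{S_{11}S_{22} - S_{12}^2}, \qquad \tilde\beta_2 = \frac{S_{11}S_{2y} - S_{12}S_{1y}}{S_{11}S_{22} - S_{12}^2}$$

Now

$$S_{1y} = \sum X_{1i}Y_i = \sum X_{1i}(\beta_1 X_{1i} + U_i) \quad \Rightarrow \quad E(S_{1y}) = \beta_1 S_{11}$$

Likewise $E(S_{2y}) = \beta_1 S_{12}$. Therefore

$$E(\tilde\beta_1) = \beta_1, \qquad E(\tilde\beta_2) = 0$$

so both estimators are unbiased. The variance from the correct model is

$$var(\hat\beta_1) = \frac{\sigma^2}{S_{11}}$$

compared to

$$var(\tilde\beta_1) = \frac{\sigma^2}{(1 - r_{12}^2)S_{11}}$$

so that $var(\tilde\beta_1) > var(\hat\beta_1)$ whenever $r_{12} \neq 0$.

Tests for parameter stability

The equations are

First data set: $Y_{1t} = \beta_{10} + \beta_{11}X_{1t} + \beta_{12}X_{2t} + \beta_{13}X_{3t} + \dots + \beta_{1k}X_{kt} + \varepsilon_{1t}$
Second data set: $Y_{2t} = \beta_{20} + \beta_{21}X_{1t} + \beta_{22}X_{2t} + \beta_{23}X_{3t} + \dots + \beta_{2k}X_{kt} + \varepsilon_{2t}$

A test for stability of the parameters between the populations that generate the two data sets is a test of the following hypothesis:

$$H_0: \beta_{10} = \beta_{20},\ \beta_{11} = \beta_{21},\ \beta_{12} = \beta_{22},\ \dots,\ \beta_{1k} = \beta_{2k}$$

If this hypothesis is true, a single equation can be estimated for the data set obtained by pooling the two data sets. Let

RSS1 = residual sum of squares for the first data set
RSS2 = residual sum of squares for the second data set
RRSS = restricted residual sum of squares, from the pooled regression
URSS = RSS1 + RSS2, the unrestricted residual sum of squares

a) Chow test

$$F(k+1,\, n_1+n_2-2k-2) = \frac{(RRSS - URSS)/(k+1)}{URSS/(n_1+n_2-2k-2)}$$

where $n_1$ and $n_2$ are the respective sample sizes.

b) Predictive stability test

$$F(n_2,\, n_1-k-1) = \frac{(RRSS - RSS_1)/n_2}{RSS_1/(n_1-k-1)}$$

G26/27/28: Core Econometrics 2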
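Both stability statistics are simple functions of the residual sums of squares. The sketch below assumes the usual definition URSS = RSS1 + RSS2 for the Chow test; the function names and all numerical inputs are hypothetical and serve only to illustrate the arithmetic.

```python
def chow_f(rrss, rss1, rss2, n1, n2, k):
    """Chow test: F(k+1, n1+n2-2k-2), assuming URSS = RSS1 + RSS2."""
    urss = rss1 + rss2
    numerator = (rrss - urss) / (k + 1)
    denominator = urss / (n1 + n2 - 2 * k - 2)
    return numerator / denominator


def predictive_f(rrss, rss1, n1, n2, k):
    """Predictive stability test: F(n2, n1-k-1)."""
    numerator = (rrss - rss1) / n2
    denominator = rss1 / (n1 - k - 1)
    return numerator / denominator


# Hypothetical values: k = 2 regressors plus an intercept in each sub-sample.
chow = chow_f(rrss=30.0, rss1=10.0, rss2=10.0, n1=13, n2=13, k=2)
pred = predictive_f(rrss=30.0, rss1=10.0, n1=13, n2=5, k=2)

print("Chow F-statistic:", round(chow, 4))
print("Predictive F-statistic:", round(pred, 4))
```

Each statistic would then be compared with the critical value of the F distribution with the degrees of freedom shown in the corresponding formula. The predictive test is typically used when the second sample is too small ($n_2 < k+1$) to estimate its own regression.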