ECONOMETRICS
CHAPTER 4

Multiple Regression = more than one explanatory variable

$Y_i = B_1 + B_2 X_{2i} + B_3 X_{3i} + u_i$

Independent variables are $X_2$ and $X_3$. $X_{2i}$ is the ith observation of $X_2$.

$B_2$ and $B_3$ are partial regression coefficients.
• $B_2$ measures the change in $E(Y)$ per unit change in $X_2$, holding the value of $X_3$ constant.

$Y_i = b_1 + b_2 X_{2i} + b_3 X_{3i} + e_i$
Sample regression function with parameter estimates.

Estimating the impact of GDP and population on education expenditures.

$\text{Educ}_i = 414 + 0.052\,\text{GDP}_i - 50\,\text{Pop}_i$
Holding population fixed, education spending increases 5.2¢ for every $1 of GDP.
Holding GDP fixed, education spending decreases $50 for each additional person.

$\text{Educ}_i = -161 + 0.048\,\text{GDP}_i$
GDP and population are correlated. When we don't control for population, part of the population effect gets picked up by GDP.

$\text{Educ}_i = 2{,}946 + 78.7\,\text{Pop}_i$
When we don't control for GDP, population picks up the GDP effect.

[Figure: Impact of Population on Education (millions). Education expenditures (vertical axis) plotted against population (horizontal axis), with fitted line y = 78.716x + 2946.4, R2 = 0.0865.]

The Classical Linear Regression Model

One more assumption
8. No exact linear relationship between explanatory variables, i.e. no perfect multicollinearity.

Example of multicollinearity:
$X_2$ = population of the state
$X_3$ = female population of the state
$X_4$ = male population of the state
Linear relationship: $X_2 = X_3 + X_4$

Second example of multicollinearity:
$X_2$ = % females in the state
$X_3$ = % males in the state
Linear relationship: $X_2 = 1 - X_3$ (with the percentages expressed as fractions)

Perfect collinearity is rare; regression software typically returns an error message if it happens.
Regression is possible with high collinearity, but caution in interpretation of coefficients is needed.

Estimation of Parameters
Procedures for estimating parameters using OLS are the same as before (the equations just become more complicated).
Standard errors of the estimators are calculated in much the same way.
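The OLS procedure for two or more explanatory variables can be sketched in a few lines. The following is a minimal illustration, not the course's own code: the helper names `solve` and `ols` are hypothetical, and the data are invented so that Y = 1 + 2 X2 + 3 X3 holds exactly. It solves the normal equations (X'X) b = X'y, then shows what happens when a correlated regressor is omitted and when one regressor is an exact sum of two others.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-9:
            raise ValueError("X'X is singular: exact collinearity among regressors")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(y, *regressors):
    """Return [b1, b2, ...] from the normal equations (X'X) b = X'y."""
    rows = [[1.0, *xs] for xs in zip(*regressors)]  # leading 1.0 is the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Invented data (an assumption for illustration, not from the slides).
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [1 + 2 * u + 3 * v for u, v in zip(x2, x3)]

full = ols(y, x2, x3)   # controls for both regressors
short = ols(y, x2)      # omits x3, which is correlated with x2

# Exact collinearity: x_total = x3 + x4, so the regression cannot be estimated.
x4 = [1.0, 3.0, 2.0, 5.0, 4.0, 7.0]
x_total = [u + v for u, v in zip(x3, x4)]
collinear_error = None
try:
    ols(y, x_total, x3, x4)
except ValueError as exc:
    collinear_error = str(exc)

print("full: ", [round(c, 3) for c in full])
print("short:", [round(c, 3) for c in short])
print("collinear:", collinear_error)
```

In the short regression the coefficient on `x2` is larger than in the full regression because it absorbs part of the effect of the omitted `x3`, the same pattern as in the GDP and population example.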
We estimate the variance of the disturbance term in the population from the residuals in the sample.

$$\hat{\sigma}^2 = \frac{\sum e_i^2}{n - k}$$

k represents the number of coefficients estimated.

Estimating Goodness of Fit
As before, $R^2$ is used as a measure of goodness of fit.
$R^2 = ESS / TSS$

Hypothesis Testing
Testing the null hypothesis that $B_i = 0$ is the same as before except: df = n - k

The test of significance approach to hypothesis testing
$\text{Educ}_i = 414 + 0.052\,\text{GDP}_i - 50\,\text{Pop}_i$
Test statistic: $t = b_1 / se(b_1) = 414 / 267 = 1.55$
p = TDIST(t, df, tails)
1 tail: p = 0.065
2 tails: p = 0.13

[Figure: t distribution, with t = -1.55, 0 and 1.55 marked on the horizontal axis.]

Testing the Joint Hypothesis that $B_2 = B_3 = 0$
Testing that all the coefficients* are equal to zero is the same as testing that $R^2 = 0$.
* Not necessarily the intercept, $B_1$.

$$F = \frac{R^2 / (k - 1)}{(1 - R^2) / (n - k)}$$

F follows the F distribution with (k - 1) df in the numerator and (n - k) df in the denominator.

From the regression of education expenditures on GDP and population ($R^2 = 0.962$):

$$F = \frac{0.962 / 2}{0.038 / 35} = 443.0$$

p = FDIST(F, df1, df2)
p = FDIST(443, 2, 35) = 1.6E-25 ≈ 0.000*
* Note: This number is reported in standard regression output.

Adjusted $R^2$
Adjusted $R^2$ is a goodness of fit measure that is adjusted for the number of explanatory variables.
$R^2$ never decreases as you add explanatory variables. Adjusted $R^2$ can decrease.

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k}$$
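The arithmetic behind the test statistics and adjusted R2 can be checked directly. This is a small sketch using only the figures quoted in the slides; the sample size n = 38 is not stated there and is inferred here from n - k = 35 with k = 3, so treat it as an assumption.

```python
# Figures quoted in the slides for the education-expenditure regression.
r2 = 0.962   # R^2 from regressing Educ on GDP and Pop
k = 3        # coefficients estimated: intercept, GDP, Pop
n = 38       # assumption: inferred from n - k = 35, not stated in the slides

# Joint test that B2 = B3 = 0: F with (k - 1, n - k) degrees of freedom.
f_stat = (r2 / (k - 1)) / ((1 - r2) / (n - k))

# Adjusted R^2 scales the unexplained share by (n - 1) / (n - k).
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)

# t statistic for the intercept: b1 / se(b1).
t_stat = 414 / 267

print(f"F = {f_stat:.1f}, adjusted R2 = {adj_r2:.4f}, t = {t_stat:.2f}")
```

With these inputs the F statistic and t statistic match the values on the slides (443.0 and 1.55), and adjusted R2 comes out slightly below R2, at about 0.960.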