Professor Òscar Jordà
ECN 140 – ECONOMETRICS
MIDTERM 2, Spring 2003
Name:
Student ID:

Instructions

Completion of this test only requires a writing instrument, scratch paper, and a calculator. Put away all other materials. Answer the questions in the space provided in the test. You will not receive credit for any work done in the scratch area or in any papers that you provide. Budget your time appropriately. Remember to write down YOUR NAME and ID number. Good luck!

Multiple Choice Questions: [20 points]

Please write the answers in the template provided here:

1.   2.   3.   4.   5.   6.   7.   8.   9.   10.

1) Heteroskedasticity means that
a. homogeneity cannot be assumed automatically for the model.
b. the variance of the error term is not constant.
c. the observed units have different preferences.
d. agents are not all rational.
Answer: b

2) E(ui | Xi) = 0 says that
a. dividing the error by the explanatory variable results in a zero (on average).
b. the sample regression function residuals are unrelated to the explanatory variable.
c. the sample mean of the Xs is much larger than the sample mean of the errors.
d. the conditional distribution of the error given the explanatory variable has a zero mean.
Answer: d

3) In the simple linear regression model, the regression slope
a. indicates by how many percent Y increases, given a one percent increase in X.
b. when multiplied with the explanatory variable will give you the predicted Y.
c. indicates by how many units Y increases, given a one unit increase in X.
d. represents the elasticity of Y on X.
Answer: c

4) Under imperfect multicollinearity
a. the OLS estimator cannot be computed.
b. two or more of the regressors are highly correlated.
c. the OLS estimator is biased even in samples of n > 100.
d. the error terms are highly, but not perfectly, correlated.
Answer: b

5) If you had a two regressor regression model, then omitting one variable which is relevant
a. will have no effect on the coefficient of the included variable if the correlation between the excluded and the included variable is negative.
b. will always bias the coefficient of the included variable upwards.
c. can result in a negative value for the coefficient of the included variable, even though the coefficient would have a significant positive effect on Y if the omitted variable were included.
d. makes the sum of the product between the included variable and the residuals different from 0.
Answer: c

6) If you reject a joint null hypothesis using the F-test in a multiple hypothesis setting, then
a. a series of t-tests may or may not give you the same conclusion.
b. the regression is always significant.
c. all of the hypotheses are always simultaneously rejected.
d. the F-statistic must be negative.
Answer: a

7) A nonlinear function
a. makes little sense, because variables in the real world are related linearly.
b. can be adequately described by a straight line between the dependent variable and one of the explanatory variables.
c. is a concept that only applies to the case of a single or two explanatory variables since you cannot draw a line in four dimensions.
d. is a function with a slope that is not constant.
Answer: d

8) In the regression model $Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + \beta_3 (X_i \times D_i) + u_i$, where X is a continuous variable and D is a dummy variable, $\beta_3$
a. indicates the slope of the regression when D = 1.
b. has a standard error that is not normally distributed even in large samples since D is not a normally distributed variable.
c. indicates the difference in the slopes of the two regressions.
d. has no meaning since $(X_i \times D_i) = 0$ when $D_i = 0$.
Answer: c

9) In the model $Y_i = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 \times X_2) + u_i$, the expected effect $\Delta Y / \Delta X_1$ is
a. $\beta_1 + \beta_3 X_2$.
b. $\beta_1$.
c. $\beta_1 + \beta_3$.
d. $\beta_1 + \beta_3 X_1$.
Answer: a

10) By including another variable in the regression, you will
a. decrease the regression $R^2$ if that variable is important.
b. eliminate the possibility of omitted variable bias from excluding that variable.
c. look at the t-statistic of the coefficient of that variable and include the variable only if the coefficient is statistically significant at the 1% level.
d. decrease the variance of the estimator of the coefficients of interest.
Answer: b

Analytical Questions: [40 points]

1. You have collected data for 104 countries to address the difficult question of what determines differences in the standard of living among the countries of the world. You recall from your macroeconomics lectures that the neoclassical growth model suggests that output per worker (per capita income) levels are determined by, among others, the saving rate and population growth rate. To test the predictions of this growth model, you run the following regression:

$$\widehat{RelPersInc} = \underset{(0.068)}{0.339} - \underset{(3.177)}{12.894} \times n + \underset{(0.229)}{1.397} \times s_K, \quad R^2 = 0.621, \; SER = 0.177$$

where RelPersInc is GDP per worker relative to the United States, n is the average population growth rate, 1980-1990, and $s_K$ is the average investment share of GDP from 1960 to 1990 (remember investment equals saving). Numbers in parentheses are heteroskedasticity-robust standard errors.

Hint: in the Solow growth model, the steady state level of savings (and therefore investment) is such that the level of capital per worker is kept constant. Therefore, investment is such that it replaces the capital that depreciates and covers the capital for the new workers added by population growth. Countries far away from steady state require heavy rates of savings (since their capital per worker levels are low).
Similarly, developed countries will exhibit savings rates that are close to the steady state level. With declining marginal product rates, productivity will be higher the further away a country is from steady state, all else being equal.

(a) Interpret the results by interpreting the slope coefficients of the regression, commenting on the fit, and any other characteristics you deem relevant. How does your interpretation of the regression results square with the predictions of the Solow growth model?

Answer: The Solow growth model predicts higher productivity with higher saving rates and lower population growth. The signs therefore correspond to prior expectations. A 10 percentage point increase in the saving rate results in a roughly 14 percentage point increase in per capita income relative to the United States. Lowering the population growth rate by 1 percentage point results in a 13 percentage point higher per capita income relative to the United States. It is best not to interpret the intercept. The regression explains approximately 62 percent of the variation in per capita income among the 104 countries of the world.

(b) Calculate the t-statistics and test (say at the 95% confidence level) whether or not each of the population parameters is significantly different from zero.

Answer: The t-statistics for population growth and the saving rate are –4.06 and 6.10. Both exceed 1.96 in absolute value, making both coefficients significantly different from zero at conventional levels of significance.

(c) You remember that human capital in addition to physical capital also plays a role in determining the standard of living of a country. You therefore collect additional data on the average educational attainment in years for 1985, and add this variable (Educ) to the above regression.
This results in the modified regression output:

$$\widehat{RelPersInc} = \underset{(0.079)}{0.046} - \underset{(2.238)}{5.869} \times n + \underset{(0.294)}{0.738} \times s_K + \underset{(0.010)}{0.055} \times Educ, \quad R^2 = 0.775, \; SER = 0.1377$$

How has the inclusion of Educ affected your interpretation of the slope estimates, R-squared and other properties of the regression?

Answer: The coefficient on the population growth rate is roughly half of what it was originally (–5.869 versus –12.894), and the coefficient on the saving rate has also approximately halved (0.738 versus 1.397). The regression $R^2$ has increased substantially, from 0.621 to 0.775.

(d) Brazil has the following values in your sample: RelPersInc = 0.30, n = 0.021, $s_K$ = 0.169, Educ = 3.5. Does your equation overpredict or underpredict the relative GDP per worker? What would happen to this result if Brazil managed to double the average educational attainment?

Answer: The predicted value for Brazil is 0.240. Hence the regression underpredicts Brazil's per capita income. Increasing Educ to 7.0 would result in a predicted per capita income of 0.43, which is a substantial increase from both its current actual position and the previously predicted value.

2. After analyzing the age-earnings profile for 1,744 workers as shown in the figure, it becomes clear to you that the relationship cannot be approximately linear.

[Figure: scatterplot of EARNINGS (vertical axis, 0 to 2000) against AGE (horizontal axis, 0 to 100)]

You estimate the following polynomial regression model, controlling for the effect of gender by using a dummy variable that takes on the value of one for females and is zero otherwise:

$$\widehat{Earn} = \underset{(283.11)}{-795.90} + \underset{(29.29)}{82.93} \times Age - \underset{(1.06)}{1.69} \times Age^2 + \underset{(0.016)}{0.015} \times Age^3 - \underset{(0.0009)}{0.0005} \times Age^4 - \underset{(12.45)}{163.19} \times Female, \quad R^2 = 0.225, \; SER = 259.78$$

(a) Test for the significance of the $Age^4$ coefficient. Describe one strategy you can think of to determine the appropriate degree of the polynomial.
Answer: The coefficient has a t-statistic of 0.56 in absolute value (–0.0005/0.0009) and hence is not statistically significant at conventional levels. The strategy is described in section 6.2 of the textbook. Starting from a polynomial of degree r, the coefficient on the highest power, $Age^r$, is tested for significance. If it is not significant, that term is dropped and the test is repeated for degree r – 1, following a sequential hypothesis testing procedure until the highest-order coefficient is significant.

(b) You run two further regressions. Present an argument as to which one you should use for further analysis.

$$\widehat{Earn} = \underset{(120.13)}{-683.21} + \underset{(9.27)}{65.83} \times Age - \underset{(0.22)}{1.05} \times Age^2 + \underset{(0.002)}{0.005} \times Age^3 - \underset{(12.45)}{163.23} \times Female, \quad R^2 = 0.225, \; SER = 259.73$$

$$\widehat{Earn} = \underset{(51.58)}{-344.88} + \underset{(2.64)}{41.48} \times Age - \underset{(0.03)}{0.45} \times Age^2 - \underset{(12.47)}{163.81} \times Female, \quad R^2 = 0.222, \; SER = 260.22$$

Answer: The coefficient on $Age^3$ is statistically significant at the 1% level using a one-sided hypothesis (t = 0.005/0.002 = 2.5, which exceeds 2.33). The polynomial of degree three therefore seems the appropriate regression.

(c) Sketch the graph of fitted earnings of males against age of your preferred regression. Does this make sense? Are you concerned about the negative coefficient on the regression intercept? What is the implication for female earners in this sample?

Answer:

[Figure: "Predicted Earnings and Age". Predicted earnings (vertical axis, 0 to 700) against age in years (horizontal axis, 15 to 65), with two curves: Polynomial Degree 3 and Polynomial Degree 2]

There is little difference between the two fits for ages between 25 and 60. The inverted U-shape is well known to exist for age-earnings profiles, and hence the plot makes sense. There is no interpretation for the intercept, since there is no data close to the origin. Females earn significantly less at every age level.

(d) Calculate the effect of changing age from 30 to 31 on earnings, holding constant the gender variable and using the model with polynomial of degree 3 regressors.
Finally, calculate the standard error of the estimated effect (you do not have to calculate this numerically, just explain the procedure in detail).

Answer: Since this is a nonlinear relationship, the effect will depend on the age level. This is described in section 6.1 of the textbook. In essence, the predicted earnings value for one age level has to be computed first. Next, the same has to be done for the age level plus one. Finally the two values are differenced to find the change in earnings associated with the age level. For the polynomial of degree 3, the first task is to consider the estimated change in earnings associated with a change in age by one year, say from 30 to 31. This is given by

$$\Delta\hat{Y} = \hat{\beta}_1 \times (31 - 30) + \hat{\beta}_2 (31^2 - 30^2) + \hat{\beta}_3 (31^3 - 30^3)$$

or

$$\Delta\hat{Y} = \hat{\beta}_1 + 61\hat{\beta}_2 + 2791\hat{\beta}_3.$$

Using the reported (rounded) degree-3 estimates, this is approximately 65.83 – 1.05 × 61 + 0.005 × 2791 ≈ 15.74. The standard error of the estimated effect is then given from

$$SE(\Delta\hat{Y}) = \frac{|\Delta\hat{Y}|}{\sqrt{F}}, \quad \text{where} \quad F = \left[ \frac{\hat{\beta}_1 + 61\hat{\beta}_2 + 2791\hat{\beta}_3}{SE(\hat{\beta}_1 + 61\hat{\beta}_2 + 2791\hat{\beta}_3)} \right]^2.$$

In practice F is obtained as the F-statistic for testing the single restriction $\beta_1 + 61\beta_2 + 2791\beta_3 = 0$. A 95% confidence interval for the change in the expected value of earnings is

$$(\hat{\beta}_1 + 61\hat{\beta}_2 + 2791\hat{\beta}_3) \pm 1.96 \times SE(\hat{\beta}_1 + 61\hat{\beta}_2 + 2791\hat{\beta}_3).$$

Obviously these expressions get quite complicated once you go beyond a quadratic.

Empirical Question: [40 points]

1. The following output is based on a sample of 401 12th grade students and the variables:

DRUGS: Index of how much drug selling there is in the student's neighborhood.
ENROLLMENT: Enrollment at the school attended by the student.
MATH87: Math test score in 8th Grade, in standard deviations from the mean.
MATH91: Math test score in 12th Grade, in standard deviations from the mean.
SES: Socio-economic status = a combination of parents' education, income, and goods in the household. SES is measured in standard deviations from the mean.
URBAN: % of people in the school's zip code that live in an urban area.
Three models are estimated with these data, whose EViews output is reported below:

Model 1
Dependent Variable: MATH91
Method: Least Squares
Sample: 1 407
Included observations: 402
Excluded observations: 5

Variable       Coefficient   Std. Error   t-Statistic   Prob.
C                 0.012374     0.083436      0.148309   0.8822
MATH87            0.641167     0.037062     17.29973    0.0000
SES               0.133049     0.039321      3.383634   0.0008
URBAN             0.000446     0.000880      0.507469   0.6121
ENROLLMENT        6.76E-05     7.72E-05      0.876704   0.3812
DRUGS            -0.916503     0.436590     -2.099230   0.0364

R-squared              0.548663    Mean dependent var      -0.025206
Adjusted R-squared     0.542964    S.D. dependent var       0.902533
S.E. of regression     0.610152    Akaike info criterion    1.864597
Sum squared resid      147.4253    Schwarz criterion        1.924246
Log likelihood        -368.7840    F-statistic              96.27848
Durbin-Watson stat     1.970192    Prob(F-statistic)        0.000000

Model 2
Dependent Variable: MATH91
Method: Least Squares
Sample: 1 407
Included observations: 405
Excluded observations: 2

Variable       Coefficient   Std. Error   t-Statistic   Prob.
C                 0.070319     0.071595      0.982178   0.3266
MATH87            0.639761     0.036826     17.37269    0.0000
SES               0.141094     0.039050      3.613192   0.0003
DRUGS            -0.735341     0.413901     -1.776612   0.0764

R-squared              0.542521    Mean dependent var      -0.026229
Adjusted R-squared     0.539099    S.D. dependent var       0.899902
S.E. of regression     0.610941    Akaike info criterion    1.862194
Sum squared resid      149.6727    Schwarz criterion        1.901738
Log likelihood        -373.0943    F-statistic              158.5144
Durbin-Watson stat     1.954262    Prob(F-statistic)        0.000000

Model 3
Dependent Variable: MATH91
Method: Least Squares
Sample: 1 407
Included observations: 406
Excluded observations: 1

Variable       Coefficient   Std. Error   t-Statistic   Prob.
C                -0.044097     0.030527     -1.444501   0.1494
MATH87            0.647725     0.036618     17.68890    0.0000
SES               0.132900     0.038857      3.420206   0.0007

R-squared              0.540066    Mean dependent var      -0.023767
Adjusted R-squared     0.537784    S.D. dependent var       0.900158
S.E. of regression     0.611986    Akaike info criterion    1.863147
Sum squared resid      150.9343    Schwarz criterion        1.892750
Log likelihood        -375.2188    F-statistic              236.6066
Durbin-Watson stat     1.969220    Prob(F-statistic)        0.000000

Answer the following questions:

(a) Choose the “best” model and explain how you based your choice.

(b) Test the null hypothesis that the coefficients of the variables URBAN and ENROLLMENT are jointly zero in Model 1 using the information in the output for any of the three models. Hint: the F-statistic can be computed as

$$F = \frac{SSR_R - SSR_u}{SSR_u} \times \frac{n - k - 1}{q} \sim F_{q,\infty}.$$

The 95% confidence level critical values are: $F_{1,\infty} = 3.84$; $F_{2,\infty} = 3.00$; $F_{3,\infty} = 2.60$; $F_{4,\infty} = 2.37$; $F_{5,\infty} = 2.21$.

(c) Test the null hypothesis that the coefficient for the variable DRUGS in Model 2 is zero (that is, that DRUGS is not statistically significant) without using the t-statistic reported in that output. Compare your results to the t-statistic reported in Model 2. Hint: The F ratio is the square of the t-ratio when only one constraint is being tested.

(d) What do you learn (from a policy point of view) from the output in these three models? In particular, discuss the coefficients on the variables MATH87, DRUGS and SES, and I am not just referring to their signs.

SCRATCH AREA
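For parts (b) and (c) of the empirical question, the hint's formula can be applied directly to the sums of squared residuals reported in the three EViews outputs. The sketch below is only an illustration, not an official answer key: the function name `f_statistic` and the dictionary `CRITICAL_5PCT` are hypothetical names introduced here, and it assumes that n and k are taken from the unrestricted model in each comparison. The three models use slightly different numbers of observations (402, 405, 406), so the sums of squared residuals are not strictly comparable and the results should be read as approximate.

```python
def f_statistic(ssr_restricted, ssr_unrestricted, n, k_unrestricted, q):
    """Homoskedasticity-only F-statistic from the hint.

    n: observations in the unrestricted regression (assumption)
    k_unrestricted: regressors in that regression, excluding the constant
    q: number of restrictions being tested
    """
    relative_increase = (ssr_restricted - ssr_unrestricted) / ssr_unrestricted
    return relative_increase * (n - k_unrestricted - 1) / q


# 5% critical values of F(q, infinity) as listed in the hint
CRITICAL_5PCT = {1: 3.84, 2: 3.00, 3: 2.60, 4: 2.37, 5: 2.21}

# (b) H0: URBAN = ENROLLMENT = 0. Model 1 is unrestricted, Model 2 restricted.
f_b = f_statistic(149.6727, 147.4253, n=402, k_unrestricted=5, q=2)

# (c) H0: DRUGS = 0. Model 2 is unrestricted, Model 3 restricted.
f_c = f_statistic(150.9343, 149.6727, n=405, k_unrestricted=3, q=1)

# Square of the DRUGS t-statistic reported in Model 2, for comparison with f_c
t_drugs_squared = (-1.776612) ** 2

print(f"(b) F = {f_b:.3f}, 5% critical value = {CRITICAL_5PCT[2]}")
print(f"(c) F = {f_c:.3f}, t^2 = {t_drugs_squared:.3f}, "
      f"5% critical value = {CRITICAL_5PCT[1]}")
```

With these inputs the statistic for (b) comes out only slightly above 3.00, so the joint test is borderline at the 5% level. The statistic for (c) falls below 3.84 and is close to, but not exactly equal to, the squared t-statistic. One plausible reason for the gap is the differing sample sizes across models.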