Department of Economics, Columbia University
Economics UN3412, Spring 2019
Final Exam, Section 2 (Mon/Wed section at 2:40pm)
(Seyhan Erden)

Instructions
1. Do not turn this page until so instructed.
2. This exam ends promptly at 4pm.
3. This exam has five questions for a total of 100 points.
4. Write down your Columbia ID number on the cover of this exam.
5. You are permitted to use a simple calculator. No computers, wireless, or other electronic devices without prior permission. You may not share resources with anyone else.
6. Some questions ask you to draw a real-world judgment in a problem of practical importance. The quality of that judgment counts. For example, consider the question: “It is 10°F outside. In your judgment, why are so many people wearing heavy coats?” The answer “To stay warm” would receive more points than the answer “Because they are fashion-conscious.”

NAME: _________________________________________________________
UNI: __________________________________________________________

Question 1 [15 points]: Imagine that you are interested in learning about the relationship between the research and development (R&D) expenditures of firms and the patents they apply for. Suppose that you would like to measure the outcome of innovation by the count of patents. Specifically, let the dependent variable $Y_{it}$ be the total number of patents applied for by firm $i$ in year $t$, and let the explanatory variable $X_{it}$ be the logarithm of real R&D expenditures. We have data $\{(Y_{it}, X_{it}) : i = 1, \ldots, n;\ t = 1, 2\}$ for a large number ($n$) of firms and for two years ($T = 2$). Consider the linear panel data model

$$Y_{it} = X_{it}\beta_1 + \alpha_i + \lambda_t + u_{it}, \qquad t = 1, 2;\ i = 1, \ldots, n,$$

where $\alpha_i$ is the firm fixed effect and $\lambda_t$ is the year fixed effect.

(a) (4p) Explain why a linear regression model with only one year of data may not be appropriate for understanding the relationship between R&D and patents.

(b) (4p) Describe carefully how you would estimate $\beta_1$ in this case.
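For part (b), one standard approach is to difference out $\alpha_i$: regress $\Delta Y_i = Y_{i2} - Y_{i1}$ on $\Delta X_i = X_{i2} - X_{i1}$ with an intercept. With $T = 2$ this gives the same $\beta_1$ estimate as the fixed effects regression with firm and year effects. The Python sketch below illustrates this on hypothetical simulated data; the design ($\beta_1 = 2$, $\lambda = (0, 0.5)$, log R&D built to be correlated with $\alpha_i$), the seed, and the helper names `simulate_panel`, `first_difference` and `pooled_ols_slope` are illustrative assumptions, not part of the exam.

```python
import random


def ols_fit(x, y):
    """Return (intercept, slope) from OLS of y on x with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate_panel(n, beta1=2.0, lambdas=(0.0, 0.5), seed=0):
    """Hypothetical data from Y_it = X_it*beta1 + alpha_i + lambda_t + u_it.

    X_it = alpha_i + noise, so the regressor and the firm effect are
    correlated. Returns one (x1, y1, x2, y2) tuple per firm.
    """
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)
        row = []
        for lam in lambdas:
            x = alpha + rng.gauss(0.0, 1.0)
            y = beta1 * x + alpha + lam + rng.gauss(0.0, 1.0)
            row.extend((x, y))
        panel.append(tuple(row))
    return panel


def first_difference(panel):
    """OLS of (Y_i2 - Y_i1) on (X_i2 - X_i1); alpha_i drops out.

    The slope estimates beta_1; the intercept estimates lambda_2 - lambda_1.
    """
    dx = [x2 - x1 for x1, _, x2, _ in panel]
    dy = [y2 - y1 for _, y1, _, y2 in panel]
    return ols_fit(dx, dy)


def pooled_ols_slope(panel):
    """Pooled OLS slope that ignores alpha_i (biased upward in this design)."""
    xs = [x for x1, _, x2, _ in panel for x in (x1, x2)]
    ys = [y for _, y1, _, y2 in panel for y in (y1, y2)]
    return ols_fit(xs, ys)[1]


if __name__ == "__main__":
    panel = simulate_panel(5000)
    intercept, slope = first_difference(panel)
    print(f"first-difference slope: {slope:.3f}, intercept: {intercept:.3f}")
    print(f"pooled OLS slope:       {pooled_ols_slope(panel):.3f}")
```

Under this assumed design the pooled slope tends toward 2.5 rather than 2.0, since $\mathrm{cov}(X_{it}, \alpha_i)/\mathrm{var}(X_{it}) = 1/2$, while the first-difference slope stays close to the true $\beta_1$.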
(c) (4p) A friend of yours claims that both $\lambda_1$ and $\lambda_2$ can be identified and estimated consistently, since you have two years of data. Would you agree? Justify your answer.

(d) (3p) Another friend of yours suggests that you should use “clustered standard errors” that are robust to possible correlation across years within a firm. Would you agree? Justify your answer.

Question 2 [15 points]: A study tried to find the determinants of the increase in the number of households headed by a woman. Using 1940 and 1960 historical census data, a logit model was estimated to predict whether a woman is the head of a household (living on her own) or is living within another's household. The limited dependent variable takes the value one if the woman lives on her own and zero if she shares housing. The results for 1960, using 6,051 observations on prime-age whites and 1,294 on nonwhites, are shown in the table:

Regression model              (1) White logit      (2) Nonwhite logit
Constant                      1.459 (0.685)        -2.874 (1.423)
Age                           -0.275 (0.037)       0.084 (0.068)
Age squared                   0.00463 (0.00044)    0.00021 (0.00081)
Education                     -0.171 (0.026)       -0.127 (0.038)
Farm status                   -0.687 (0.173)       -0.498 (0.346)
South                         0.376 (0.098)        -0.520 (0.180)
Expected family earnings      0.0018 (0.00019)     0.0011 (0.00024)
Family composition            4.123 (0.294)        2.751 (0.345)
Pseudo-R2                     0.266                0.189
Percent correctly predicted   82.0                 83.4

where age is measured in years, education is years of schooling of the family head, farm status is a binary variable taking the value one if the family head lived on a farm, South is a binary variable for living in a certain region of the country, expected family earnings was generated from a separate OLS regression to predict earnings from a set of regressors, and family composition refers to the number of family members under the age of 18 divided by the total number in the family.

The mean values of the variables are shown in the following table.
Variable                      (1) White mean       (2) Nonwhite mean
Age                           46.1                 42.9
Age squared                   2,263.5              1,965.6
Education                     12.6                 10.4
Farm status                   0.03                 0.02
South                         0.3                  0.5
Expected family earnings      2,336.4              1,507.3
Family composition            0.2                  0.3

(a) (5p) Interpret the results. Do the coefficients have the expected signs? Why do you think age was entered both in levels and in squares?

(b) (5p) Calculate the difference in the predicted probability between whites and nonwhites at the sample mean values of the explanatory variables. Why do you think the study did not combine the observations and allow a nonwhite binary variable to enter?

(c) (5p) What would be the effect on the probability of a nonwhite woman living on her own if education and family composition were changed from their current means to the means for whites, while all other variables were left unchanged at the nonwhite mean values?

Question 3 [15 points]: Earnings functions, in which the log of earnings is regressed on years of education, years of on-the-job training, and individual characteristics, have been studied for a variety of reasons. Some studies have focused on the returns to education, others on discrimination, union/non-union differentials, etc. For all these studies, a major concern has been that ability should enter as a determinant of earnings, but it is close to impossible to measure and therefore represents an omitted variable.

Assume that the coefficient on years of education is the parameter of interest. Given that education is positively correlated with ability (for example, more able students attract scholarships and hence receive more years of education), the OLS estimator of the returns to education could be upward biased. To overcome this problem, various authors have used instrumental variable estimation techniques. For each of the potential instruments listed below, briefly discuss instrument validity.

(a) (3p) The individual's postal zip code.
(b) (4p) The individual's IQ or test score on a work-related exam.

(c) (4p) Years of education of the individual's mother or father.

(d) (4p) Number of siblings the individual has.

Question 4 [15 points]: Your textbook estimates the initial relationship between the percentage change in the real price of frozen orange juice (OJ) and the number of freezing degree days (FDD) as follows:

$$\widehat{\%ChgP_t} = \underset{(0.22)}{-0.40} + \underset{(0.13)}{0.47}\,FDD_t$$

$t$ = 1950:1–2000:12, $R^2 = 0.09$, $SER = 4.8$

(a) (4p) Calculate the t-statistic for the slope coefficient. Can you reject the null hypothesis that the coefficient is zero in the population at the 5% level?

(b) (5p) The above regression was estimated using HAC standard errors. When you re-estimate the regression using homoskedasticity-only standard errors, the standard error of the slope coefficient drops to 0.06. Calculate the t-statistic for the slope coefficient again.

(c) (5p) Which of the two standard errors should you use for statistical inference?

Question 5 [40 points]: True/False/Uncertain with explanations

(a) The two conditions for a valid instrument are $\mathrm{corr}(Z_i, X_i) = 0$ and $\mathrm{corr}(Z_i, u_i) \neq 0$.

(b) Stationarity means that the probability distribution of the time series variable does not change over time.

(c) In a time series model of dynamic causal effects, if the regressor is strictly exogenous, OLS is still efficient as an estimator of the dynamic causal effects.

(d) Time fixed effects regressions are useful in dealing with omitted variables if these omitted variables are constant over time but vary across entities.

Selected Tables from Stock and Watson, Introduction to Econometrics
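The arithmetic asked for in Question 2(b) and (c), and the t-statistics in Question 4(a) and (b), can be sketched in Python as below. This is a minimal sketch under stated assumptions: the coefficients and means are transcribed from the two Question 2 tables (with the family composition means read as 0.2 for whites and 0.3 for nonwhites), each predicted probability is the logistic CDF evaluated at the mean regressors, and the names `COEF`, `MEANS` and `logit_prob` are illustrative, not part of the exam.

```python
import math

# Logit coefficients transcribed from the Question 2 table, in the order:
# constant, age, age squared, education, farm status, South,
# expected family earnings, family composition.
COEF = {
    "white": [1.459, -0.275, 0.00463, -0.171, -0.687, 0.376, 0.0018, 4.123],
    "nonwhite": [-2.874, 0.084, 0.00021, -0.127, -0.498, -0.520, 0.0011, 2.751],
}

# Sample means in the same order (the leading 1.0 multiplies the constant).
# Assumption: family composition means are 0.2 (white) and 0.3 (nonwhite).
MEANS = {
    "white": [1.0, 46.1, 2263.5, 12.6, 0.03, 0.3, 2336.4, 0.2],
    "nonwhite": [1.0, 42.9, 1965.6, 10.4, 0.02, 0.5, 1507.3, 0.3],
}
EDUCATION, FAMILY_COMPOSITION = 3, 7  # positions in the lists above


def logit_prob(coef, x):
    """Logistic CDF of the linear index: 1 / (1 + exp(-x'b))."""
    index = sum(b * v for b, v in zip(coef, x))
    return 1.0 / (1.0 + math.exp(-index))


# Question 2(b): predicted probabilities at each group's own means.
p_white = logit_prob(COEF["white"], MEANS["white"])
p_nonwhite = logit_prob(COEF["nonwhite"], MEANS["nonwhite"])

# Question 2(c): nonwhite coefficients and means, except that education
# and family composition are set to the white means.
x_c = list(MEANS["nonwhite"])
for k in (EDUCATION, FAMILY_COMPOSITION):
    x_c[k] = MEANS["white"][k]
p_nonwhite_c = logit_prob(COEF["nonwhite"], x_c)

# Question 4(a)-(b): t-statistic = estimate / standard error.
t_hac = 0.47 / 0.13
t_homoskedastic = 0.47 / 0.06

print(f"2(b): P(white) = {p_white:.3f}, P(nonwhite) = {p_nonwhite:.3f}, "
      f"difference = {p_white - p_nonwhite:.3f}")
print(f"2(c): change for nonwhite = {p_nonwhite_c - p_nonwhite:.3f}")
print(f"4: t (HAC) = {t_hac:.2f}, t (homoskedasticity-only) = {t_homoskedastic:.2f}")
```

Evaluating the logit at the sample means gives the probability for a "representative" woman in each group, which is what Question 2(b) asks for; it is not the same as averaging the predicted probabilities over all women in the sample.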