ECONOMETRICS – Spring 2022
Professor John Ham and Naima Hafeez
Problem Set 4 – Due on Wednesday, May 11th, 11:59 PM

Only problems 1, 2 and 3 will be marked for the problem set. The remaining problems are practice for the final and do not count towards the grade for PS4.

1. In 2016 the government surprised investors by announcing that Area A would get an MRT station. You want to estimate the effect of this announcement on housing prices in Area A. You are given 2016 prices for a random sample of houses in Area A, and 2016 prices and characteristics for a random sample of houses in Area B, which does not have an MRT station and is not expected to get one in the near future.

(a) In addition to the above information, you are given 2015 prices for a random sample of houses in Area A as well as for a random sample of houses in Area B. How would you proceed to evaluate the effect of the announcement using a difference-in-differences estimator? Why might this estimator give an inconsistent estimate of the announcement effect?

(b) You are now given both 2014 data and 2015 data for random samples in A and B. How would you use this data to estimate the effect of this announcement on housing prices in Area A using a triple difference estimator? Be sure to state the equation you would estimate by regression analysis.

2. Suppose you are interested in the effect of education on the fertility of women in rural India. Consider the following model for women over 45 years old:

$\mathit{children}_i = \beta_0 + \beta_1 \mathit{educ}_i + \beta_2 \mathit{age}_i + \beta_3 \mathit{age}_i^2 + u_i$,

where $\mathit{children}_i$ is the number of children a mother has, $\mathit{educ}_i$ is her education (in years) and $\mathit{age}_i$ is the mother's age (in years).

(a) Given the simplicity of this model, explain why $\mathit{educ}$ might be endogenous.

(b) Given (a), IV (or 2SLS) may be a more appropriate estimator than OLS. Clearly explain (in words) the key attributes of a valid instrument.
(c) $\mathit{Firsthalf}$ is a dummy variable equal to one if the woman was born during the first six months of the year. Assuming that women in rural India are more likely to leave school upon reaching a certain age (and get married), explain why it might be reasonable to use $\mathit{Firsthalf}$ as an instrument for $\mathit{educ}$. (You may assume students born early in the year typically begin school at an older age.)

(d) How would you test whether $\mathit{Firsthalf}$ is a relevant IV candidate for $\mathit{educ}$? How would you check if it is weak?

(e) How would you implement two stage least squares to consistently estimate $\beta_1$?

3. A common specification for the labor supply of prime-aged men is given by

$\ln(h_{it}) = \delta \ln(w_{it}) + \lambda_i + e_{it}$,

where $h_{it}$ denotes individual i's hours of work in year t, $w_{it}$ denotes individual i's hourly wage in year t, $\lambda_i$ is proportional to individual i's lifetime wealth and is unobserved, and $e_{it}$ is an unobserved error term that captures, among other things, individual tastes towards working in year t. Thus the overall error term is $u_{it} = \lambda_i + e_{it}$. You have data on 1000 men for the US.

(a) Lifetime wealth depends on the current wage, so $\mathrm{cov}(w_{it}, \lambda_i) \neq 0$. Suppose for each person you have 5 observations. How can you use a fixed effects model to avoid bias when estimating $\delta$? If the $e_{it}$ are independent across time for the same person, will you have to adjust your standard errors?

(b) Again, suppose for each person you have 5 observations. How can you use a first difference model to avoid bias when estimating $\delta$? If the $e_{it}$ are independent across time for the same person, will you have to adjust your standard errors?

(c) A researcher claims that you also need to worry about $\mathrm{cov}(w_{it}, e_{it}) \neq 0$, since employers may reward workers who have a strong work ethic with higher wages. If this is true and you use the fixed effects or first difference model, will your estimate of $\delta$ be unbiased? Why or why not?

Practice Questions

1.
Suppose we are trying to estimate the effect of smoking on a health measure Y. We have data on a group of smokers and a group of nonsmokers and observe their health measure. Suppose, however, that the nonsmokers were more apt to exercise and exercise improves their health level. Thus, our true model is

$Y_i = \beta_0 + \beta_1 \mathit{Smoke}_i + \beta_2 \mathit{BMI}_i + u_i$,

where $\mathit{Smoke}_i$ is a dummy variable equal to 1 if an individual smokes and 0 otherwise, and $\mathit{BMI}_i$ is an individual's Body Mass Index. We expect $\beta_1 < 0$ and $\beta_2 < 0$. We also expect $\mathrm{Cov}(\mathit{Smoke}_i, \mathit{BMI}_i) < 0$. Instead of the true model, you run the regression

$Y_i = \alpha_0 + \alpha_1 \mathit{Smoke}_i + v_i$.

How is $\hat{\alpha}_1$ related to the estimated coefficient you would have obtained if you ran a regression for the true model, $\hat{\beta}_1$? Does omitted variable bias make smoking look better or worse? Briefly justify your answer.

2. Suppose, for simplicity, that there are only two goods: X and Z. The demand for good X is given by

$\log(X) = \beta_0 + \beta_1 \log(P_x) + \beta_2 \log(P_z) + \beta_3 \log(Y) + u$,

where $X$ denotes consumption of good X, $P_x$ denotes the price of X, $P_z$ denotes the price of Z, and $Y$ denotes income. We know from economic theory that this function should be homogeneous of degree zero, e.g., doubling $P_x$, $P_z$ and $Y$ should not change consumption of $X$.

(a) Show that this homogeneity property implies that $\beta_1 + \beta_2 + \beta_3 = 0$.

(b) You are given a data set that allows you to estimate the regression equation. Write down the t-statistic that you would use to test whether homogeneity holds in your data and explain how you would obtain the numerator and the denominator for the test statistic. When would you reject the null hypothesis of homogeneity at the 5% level of statistical significance?

(c) How would you test the same null hypothesis using an F-test? Carefully write the restricted and unrestricted model and explain all steps of conducting the test.

3.
The following question pertains to determining the effect of various background characteristics on the probability that an applicant is admitted to study at a particular university. We draw a random sample of 800 applicants and record their characteristics and whether they were admitted. The names of the variables, a brief description and summary statistics are as follows:

Variable      Description                                         Mean    Std. Dev.   Min   Max
in            = 1 if admitted to the university, = 0 if not       0.37    0.48        0     1
math          Admission test math score                           68.83   10.18       44    95
verb          Admission test verbal score                         61.68   9.70        36    95
public        = 1 if from public school, = 0 if private school    0.46    0.50        0     1
male          = 1 if male, = 0 if female                          0.61    0.49        0     1
public*male   public x male                                       0.30    0.46        0     1

A linear probability model is estimated that yields the following results (standard errors in parentheses):

$\widehat{\mathit{in}} = \underset{(1.06)}{-13.3} + \underset{(0.01)}{0.10}\,\mathit{math} + \underset{(0.01)}{0.12}\,\mathit{verb} + \underset{(0.30)}{0.002}\,\mathit{public} - \underset{(0.26)}{0.17}\,\mathit{male} - \underset{(0.38)}{0.66}\,\mathit{public} \times \mathit{male}$

(a) What fraction of applicants are both women and not from public schools?

(b) What is the predicted probability of being admitted for a male, public school applicant who has scored exactly the average mark in the two admission test components?

(c) What is the predicted probability of being admitted for a female, private school applicant who has scored exactly the average mark in the two admission test components?

(d) What is the precise interpretation of the coefficient on $\mathit{male}$?

(e) The coefficient on the interaction term is negative. What does it imply?