PS4 2022

ECONOMETRICS – Spring 2022
Professor John Ham and Naima Hafeez
Problem Set 4 – Due on Wednesday, May 11th, 11:59 PM

Only problems 1, 2 and 3 will be marked for the problem set. The remaining problems are practice for the final (they do not count towards the grade of PS4).

1. In 2016 the government surprised investors by announcing that Area A would get an MRT station. You want to estimate the effect of this announcement on housing prices in Area A. You are given 2016 prices for a random sample of houses in Area A, and 2016 prices and characteristics for a random sample of houses in Area B, which does not have an MRT station and is not expected to get one in the near future.

(a) In addition to the above information, you are given 2015 prices for a random sample of houses in Area A as well as for a random sample of houses in Area B. How would you proceed to evaluate the effect of the announcement using a difference-in-differences estimator? Why might this estimator give an inconsistent estimate of the announcement effect?

(b) You are now given both 2014 data and 2015 data for random samples in A and B. How would you use this data to estimate the effect of this announcement on housing prices in Area A using a triple difference estimator? Be sure to state the equation you would estimate by regression analysis.

2. Suppose you are interested in the effect of education on the fertility of women in rural India. Consider the model for women over 45 years old

$$\mathit{children}_i = \beta_0 + \beta_1 \mathit{educ}_i + \beta_2 \mathit{age}_i + \beta_3 \mathit{age}_i^2 + u_i,$$

where $\mathit{children}_i$ is the number of children a mother has, $\mathit{educ}_i$ is her education (in years) and $\mathit{age}_i$ is the age of the mother (in years).

(a) Given the simplicity of this model, explain why $\mathit{educ}$ might be endogenous.

(b) Given (a), IV (or 2SLS) may be a more appropriate estimator than OLS. Clearly explain (in words) the key attributes of a valid instrument.

(c) $\mathit{Frsthalf}$ is a dummy variable equal to one if the woman was born during the first six months of the year.
Assuming that women in rural India are more likely to leave school upon reaching a certain age (and get married), explain why it might be reasonable to use $\mathit{Frsthalf}$ as an instrument for $\mathit{educ}$. (You may assume that students born early in the year typically begin school at an older age.)

(d) How would you test whether $\mathit{Frsthalf}$ is a relevant IV candidate for $\mathit{educ}$? How would you check if it is weak?

(e) How would you implement two-stage least squares to consistently estimate $\beta_1$?

3. A common specification for the labor supply of prime-aged men is given by

$$\ln(h_{it}) = \delta \ln(w_{it}) + \lambda_i + e_{it},$$

where $h_{it}$ denotes individual i's hours of work in year t, $w_{it}$ denotes individual i's hourly wage in year t, $\lambda_i$ is proportional to individual i's lifetime wealth and is unobserved, and $e_{it}$ is an unobserved error term that captures, among other things, individual tastes towards working in year t. Thus the overall error term is $u_{it} = \lambda_i + e_{it}$. You have data on 1000 men for the US.

(a) Lifetime wealth depends on the current wage, so $\operatorname{cov}(w_{it}, \lambda_i) \neq 0$. Suppose for each person you have 5 observations. How can you use a fixed effects model to avoid bias when estimating $\delta$? If the $e_{it}$ are independent across time for the same person, will you have to adjust your standard errors?

(b) Again, suppose for each person you have 5 observations. How can you use a first difference model to avoid bias when estimating $\delta$? If the $e_{it}$ are independent across time for the same person, will you have to adjust your standard errors?

(c) A researcher claims that you also need to worry about $\operatorname{cov}(w_{it}, e_{it}) \neq 0$, since employers may reward workers who have a strong work ethic with higher wages. If this is true and you use the fixed effects or first difference model, will your estimate of $\delta$ be unbiased? Why or why not?

Practice Questions

1. Suppose we are trying to estimate the effect of smoking on a health measure Y. We have data on a group of smokers and a group of nonsmokers and observe their health measure.
Suppose, however, that the nonsmokers were more apt to exercise and exercise improves their health level. Thus, our true model is

$$Y_i = \beta_0 + \beta_1 \mathit{Smoke}_i + \beta_2 \mathit{BMI}_i + u_i,$$

where $\mathit{Smoke}_i$ is a dummy variable equal to 1 if an individual smokes and 0 otherwise, and $\mathit{BMI}_i$ is an individual's Body Mass Index. We expect $\beta_1 < 0$ and $\beta_2 < 0$. We also expect $\operatorname{Cov}(\mathit{Smoke}_i, \mathit{BMI}_i) < 0$. Instead of the true model, you run the regression

$$Y_i = \alpha_0 + \alpha_1 \mathit{Smoke}_i + v_i.$$

How is $\hat{\alpha}_1$ related to $\hat{\beta}_1$, the estimated coefficient you would have obtained if you had run a regression for the true model? Does omitted variable bias make smoking look better or worse? Briefly justify your answer.

2. Suppose, for simplicity, that there are only two goods: X and Z. The demand for good X is given by

$$\log(X) = \beta_0 + \beta_1 \log(p_x) + \beta_2 \log(p_z) + \beta_3 \log(Y) + u,$$

where $X$ denotes consumption of good X, $p_x$ denotes the price of X, $p_z$ denotes the price of Z, and $Y$ denotes income. We know from economic theory that this function should be homogeneous of degree zero; for example, doubling $p_x$, $p_z$ and $Y$ should not change consumption of $X$.

(a) Show that this homogeneity property implies that $\beta_1 + \beta_2 + \beta_3 = 0$.

(b) You are given a data set that allows you to estimate the regression equation. Write down the t-statistic that you would use to test whether homogeneity holds in your data and explain how you would obtain the numerator and the denominator of the test statistic. When would you reject the null hypothesis of homogeneity at the 5% level of statistical significance?

(c) How would you test the same null hypothesis using an F-test? Carefully write the restricted and unrestricted models and explain all steps of conducting the test.

3. The following question pertains to determining the effect of various background characteristics on the probability that an applicant is admitted to study at a particular university. We draw a random sample of 800 applicants and record their characteristics and whether they were admitted.
The names of the variables, a brief description and summary statistics are as follows:

Variable      Description                                      Mean    Std. Dev.   Min   Max
in            =1 if got into university, =0 if not             0.37    0.48        0     1
math          Admission test math score                        68.83   10.18       44    95
verb          Admission test verbal score                      61.68   9.70        36    95
public        =1 if from public school, 0 if private school    0.46    0.50        0     1
male          =1 if male, 0 if female                          0.61    0.49        0     1
public*male   public x male                                    0.30    0.46        0     1

A linear probability model is estimated that yields the following results (standard errors in parentheses):

$$\widehat{\mathit{in}} = \underset{(1.06)}{-13.3} + \underset{(0.01)}{0.10}\,\mathit{math} + \underset{(0.01)}{0.12}\,\mathit{verb} + \underset{(0.30)}{0.002}\,\mathit{public} - \underset{(0.26)}{0.17}\,\mathit{male} - \underset{(0.38)}{0.66}\,\mathit{public} \ast \mathit{male}$$

(a) What fraction of applicants are both women and not from public schools?

(b) What is the predicted probability of being admitted for a male, public school applicant who has scored exactly the average mark in the two admission test components?

(c) What is the predicted probability of being admitted for a female, private school applicant who has scored exactly the average mark in the two admission test components?

(d) What is the precise interpretation of the coefficient on $\mathit{male}$?

(e) The coefficient on the interaction term is negative. What does it imply?
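
As an aid for the practice question above, here is a minimal Python sketch of how a fitted linear probability model with an interaction term is evaluated for a given applicant profile. Only the coefficients come from the reported results; the names `COEF` and `predict_in` and the example scores (math = 70, verb = 60) are hypothetical choices made for illustration, deliberately different from the sample means asked about in parts (b) and (c).

```python
# Coefficients copied from the fitted linear probability model reported above.
COEF = {
    "const": -13.3,
    "math": 0.10,
    "verb": 0.12,
    "public": 0.002,
    "male": -0.17,
    "public_male": -0.66,
}


def predict_in(math, verb, public, male):
    """Return the fitted value of `in` for one applicant profile.

    `public` and `male` are 0/1 dummies. The interaction regressor is
    their product, so it is computed here rather than passed in.
    (Hypothetical helper, not part of the problem set.)
    """
    return (
        COEF["const"]
        + COEF["math"] * math
        + COEF["verb"] * verb
        + COEF["public"] * public
        + COEF["male"] * male
        + COEF["public_male"] * public * male
    )


if __name__ == "__main__":
    # Hypothetical test scores, assumed only for this illustration.
    for public in (0, 1):
        for male in (0, 1):
            p = predict_in(math=70, verb=60, public=public, male=male)
            print(f"public={public} male={male} fitted={p:.3f}")
```

Fitted values from a linear probability model are not constrained to lie between 0 and 1, so it is worth checking the range of any prediction before reading it as a probability.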
