
PS4 2022

ECONOMETRICS - Spring 2022
Professor John Ham and Naima Hafeez
Problem Set 4 - Due on Wednesday, May 11th, 11:59 PM
Only problems 1, 2 and 3 will be marked for the problem set. The remaining problems are
practice for the final and do not count towards the grade of PS4.
1. In 2016 the government surprised investors by announcing that Area A would get an MRT
station.
You want to estimate the effect of this announcement on housing prices in Area A. You
are given 2016 prices for a random sample of houses in Area A and given 2016 prices
and characteristics for a random sample of houses for Area B, which does not have an
MRT station, and is not expected to get one in the near future.
(a) In addition to the above information, you are given 2015 prices for a random sample of
houses in Area A as well as for a random sample of houses in Area B. How would you
proceed to evaluate the effect of the announcement using a difference-in-difference
estimator? Why might this estimator give inconsistent estimates of the announcement
effect?
(b) You are now given both 2014 data and 2015 data for random samples in A and B. How
would you use this data to estimate the effect of this announcement on housing prices in
Area A using a triple difference estimator? Be sure to state the equation you would
estimate by regression analysis.
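As a rough illustration of the difference-in-difference regression asked about in part (a), the sketch below uses simulated data only. The cell size, price levels, noise and the "true" effect of 10 are all assumptions made up for the example, not values from the problem. It regresses price on an Area A dummy, a post-announcement dummy and their interaction, and checks that the interaction coefficient matches the difference of the four cell means.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000              # houses per area-year cell (assumed)
true_effect = 10.0    # assumed announcement effect, for illustration only

# Build four cells: (Area B, 2015), (Area B, 2016), (Area A, 2015), (Area A, 2016)
cells = []
for a in (0, 1):          # a = 1 for Area A
    for p in (0, 1):      # p = 1 for the post-announcement year
        price = (100.0 + 20.0 * a + 5.0 * p + true_effect * a * p
                 + rng.normal(0.0, 5.0, n))
        cells.append(np.column_stack([np.full(n, a), np.full(n, p), price]))
data = np.vstack(cells)
area_a, post, price = data[:, 0], data[:, 1], data[:, 2]

# Regression form: price = b0 + b1*area_a + b2*post + b3*(area_a*post) + error
X = np.column_stack([np.ones_like(price), area_a, post, area_a * post])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
did_reg = coef[3]

# Same estimate from the four cell means
def cell_mean(a, p):
    return price[(area_a == a) & (post == p)].mean()

did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print(did_reg, did_means)
```

The simulation builds in a common time trend for both areas, so the estimator recovers the assumed effect here; with real data that common-trend condition is exactly what needs to be argued.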
2. Suppose you are interested in the effect of education on the fertility of women in rural
India. Consider the model for women over 45 years old
$$\mathit{children}_i = \beta_0 + \beta_1 \mathit{educ}_i + \beta_2 \mathit{age}_i + \beta_3 \mathit{age}_i^2 + u_i,$$
where $\mathit{children}_i$ is the number of children a mother has, $\mathit{educ}_i$ is her education (in
years) and $\mathit{age}_i$ is the age of the mother (in years).
(a) Given the simplicity of this model, explain why $\mathit{educ}$ might be endogenous.
(b) Given (a), IV (or 2SLS) may be a more appropriate estimator than OLS. Clearly explain
(in words) the key attributes of a valid instrument.
(c) $\mathit{Frsthalf}$ is a dummy variable equal to one if the woman was born during the first six
months of the year. Assuming that women in rural India are more likely to leave school
upon reaching a certain age (and get married), explain why it might be reasonable to use
$\mathit{Frsthalf}$ as an instrument for $\mathit{educ}$. (You may assume students born early in the year
typically begin school at an older age.)
(d) How would you test whether $\mathit{Frsthalf}$ is a relevant IV candidate for $\mathit{educ}$? How would
you check if it is weak?
(e) How would you implement two stage least squares to consistently estimate $\beta_1$?
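The mechanics behind parts (d) and (e) can be sketched on simulated data. Everything in the data-generating process below is an assumption for illustration: the unobserved `ability` confounder, the coefficient values, and the continuous `children` outcome. The sketch runs the first stage of educ on Frsthalf and the exogenous regressors, reports the squared t-statistic on the instrument (the first-stage F for a single instrument, often compared with the rule-of-thumb value of 10), and then runs the second stage with the fitted values.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
frsthalf = rng.integers(0, 2, n).astype(float)   # instrument (simulated)
age = rng.uniform(45.0, 65.0, n)
ability = rng.normal(0.0, 1.0, n)                # hypothetical unobserved confounder
educ = 6.0 + 2.0 * frsthalf + ability + rng.normal(0.0, 1.0, n)
beta1 = -0.3                                     # assumed true effect of educ
children = (2.0 + beta1 * educ + 0.2 * age - 0.002 * age**2
            - 0.8 * ability + rng.normal(0.0, 1.0, n))

def ols(X, y):
    """OLS coefficients and classical standard errors via a QR decomposition."""
    Q, R = np.linalg.qr(X)
    coef = np.linalg.solve(R, Q.T @ y)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    R_inv = np.linalg.inv(R)
    se = np.sqrt(sigma2 * np.sum(R_inv**2, axis=1))  # sqrt of diag of sigma2*(X'X)^-1
    return coef, se

exog = np.column_stack([np.ones(n), age, age**2])

# OLS for comparison: biased here because ability is omitted
b_ols, _ = ols(np.column_stack([educ, exog]), children)

# First stage: educ on the instrument and all exogenous regressors
Z = np.column_stack([frsthalf, exog])
pi, pi_se = ols(Z, educ)
first_stage_F = (pi[0] / pi_se[0]) ** 2
educ_hat = Z @ pi

# Second stage: replace educ with its first-stage fitted values
b_2sls, _ = ols(np.column_stack([educ_hat, exog]), children)
print(b_ols[0], b_2sls[0], first_stage_F)
```

The standard errors from a hand-run second stage are not valid, because they are based on residuals computed with the fitted values rather than the actual educ. A dedicated 2SLS routine corrects for this.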
3. A common specification for the labor supply of prime-aged men is given by
$$\ln(h_{it}) = \delta \ln(w_{it}) + \lambda_i + e_{it}$$
where $h_{it}$ denotes individual i's hours of work in year t, $w_{it}$ denotes individual i's hourly
wage in year t, $\lambda_i$ is proportional to individual i's lifetime wealth and is unobserved, and $e_{it}$ is
an unobserved error term that captures, among other things, individual tastes towards
working in year t. Thus the overall error term is $u_{it} = \lambda_i + e_{it}$. You have data on 1000 men
for the US.
(a) Lifetime wealth depends on the current wage, so $\mathrm{cov}(w_{it}, \lambda_i) \neq 0$. Suppose for each
person you have 5 observations. How can you use a fixed effects model to avoid bias
when estimating $\delta$? If the $e_{it}$ are independent across time for the same person, will you
have to adjust your standard errors?
(b) Again, suppose for each person you have 5 observations. How can you use a first
difference model to avoid bias when estimating $\delta$? If the $e_{it}$ are independent across time
for the same person, will you have to adjust your standard errors?
(c) A researcher claims that you also need to worry about $\mathrm{cov}(w_{it}, e_{it}) \neq 0$, since employers
may reward workers who have a strong work ethic with higher wages. If this is true and
you use the fixed effects or first difference model, will your estimate of $\delta$ be unbiased?
Why or why not?
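The fixed effects (within) and first-difference transformations in parts (a) and (b) can be sketched on a simulated panel of 1000 men with 5 observations each. The value of delta, the link between the log wage and the individual effect, and the noise levels are assumptions for illustration. The simulation keeps the error $e_{it}$ independent of the wage, so it does not cover the situation raised in part (c).

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 1000, 5
delta = 0.2                                        # assumed true elasticity
lam = rng.normal(0.0, 1.0, (N, 1))                 # unobserved individual effect
ln_w = 0.5 * lam + rng.normal(0.0, 1.0, (N, T))    # wage correlated with lam
e = rng.normal(0.0, 0.5, (N, T))                   # independent over time
ln_h = delta * ln_w + lam + e

def slope(x, y):
    """OLS slope of y on x with an intercept."""
    xd, yd = x - x.mean(), y - y.mean()
    return (xd * yd).sum() / (xd * xd).sum()

# Pooled OLS ignores lam and is biased in this design
pooled = slope(ln_w.ravel(), ln_h.ravel())

# Fixed effects: subtract each person's time average, which removes lam
w_dm = ln_w - ln_w.mean(axis=1, keepdims=True)
h_dm = ln_h - ln_h.mean(axis=1, keepdims=True)
fe = (w_dm * h_dm).sum() / (w_dm ** 2).sum()

# First differences: subtract the previous year's value, which also removes lam
dw = np.diff(ln_w, axis=1)
dh = np.diff(ln_h, axis=1)
fd = (dw * dh).sum() / (dw ** 2).sum()

# Correlation between adjacent first-differenced errors
de = np.diff(e, axis=1)
corr_de = np.corrcoef(de[:, :-1].ravel(), de[:, 1:].ravel())[0, 1]
print(pooled, fe, fd, corr_de)
```

The last statistic is included because differencing an error that is independent over time produces differenced errors that share a common term in adjacent periods, which is relevant when thinking about the standard errors in part (b).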
Practice Questions
1. Suppose we are trying to estimate the effect of smoking on a health measure Y. We have
data on a group of smokers and a group of nonsmokers and observe their health measure.
Suppose, however, that the nonsmokers were more apt to exercise and exercise improves
their health level.
Thus, our true model is
$$Y_i = \beta_0 + \beta_1 \mathit{Smoke}_i + \beta_2 \mathit{BMI}_i + u_i,$$
where $\mathit{Smoke}_i$ is a dummy variable equal to 1 if an individual smokes, and 0 otherwise
and $\mathit{BMI}_i$ is an individual's Body Mass Index. We expect $\beta_1 < 0$ and $\beta_2 < 0$. We also
expect $\mathrm{Cov}(\mathit{Smoke}_i, \mathit{BMI}_i) < 0$.
Instead of the true model, you run the regression
$$Y_i = \alpha_0 + \alpha_1 \mathit{Smoke}_i + v_i$$
How is $\hat{\alpha}_1$ related to the estimated coefficient you would have obtained if you ran a
regression for the true model, $\hat{\beta}_1$? Does omitted variable bias make smoking look better
or worse? Briefly justify your answer.
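The link between the short and long regressions can be checked numerically. The sketch below uses simulated data with assumed coefficients whose signs match those stated in the question. The auxiliary regression of BMI on Smoke and its slope `gamma[1]` are names introduced here for illustration. In any sample, the short-regression slope equals the long-regression slope plus the BMI coefficient times the auxiliary slope.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
smoke = rng.integers(0, 2, n).astype(float)
bmi = 27.0 - 2.0 * smoke + rng.normal(0.0, 3.0, n)   # Cov(Smoke, BMI) < 0 by construction
y = 100.0 - 5.0 * smoke - 1.0 * bmi + rng.normal(0.0, 4.0, n)   # beta1 < 0, beta2 < 0

def ols(X, target):
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

ones = np.ones(n)
beta = ols(np.column_stack([ones, smoke, bmi]), y)   # long regression (true model)
alpha = ols(np.column_stack([ones, smoke]), y)       # short regression (BMI omitted)
gamma = ols(np.column_stack([ones, smoke]), bmi)     # auxiliary regression: BMI on Smoke

# Algebraic identity: alpha1_hat = beta1_hat + beta2_hat * gamma1_hat
implied = beta[1] + beta[2] * gamma[1]
print(alpha[1], implied, beta[1])
```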
2. Suppose, for simplicity, that there are only two goods: X and Z. The demand for good X is
given by
$$\log(X) = \beta_0 + \beta_1 \log(p_x) + \beta_2 \log(p_z) + \beta_3 \log(Y) + u,$$
where X denotes consumption of good X, $p_x$ denotes the price of X, $p_z$ denotes the price
of Z, and Y denotes income. We know from economic theory that this function should be
homogeneous of degree zero, e.g., doubling $p_x$, $p_z$ and $Y$ should not change consumption of
$X$.
(a) Show that this homogeneity property implies that $\beta_1 + \beta_2 + \beta_3 = 0$.
(b) You are given a data set that allows you to estimate the regression equation. Write
down the t-statistic that you would use to test whether homogeneity holds in your data
and explain how you would obtain the numerator and the denominator for the test
statistic. When would you reject the null hypothesis of homogeneity at the 5% level
of statistical significance?
(c) How would you test the same null hypothesis using an F-test? Carefully write the
restricted and unrestricted model and explain all steps of conducting the test.
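The two tests in parts (b) and (c) can be sketched on simulated data. The sample size, coefficient values and noise level are assumptions, and the simulated coefficients are chosen to satisfy homogeneity. The t-statistic uses the estimated variance-covariance matrix of the coefficients to get the standard error of the sum. The F-statistic compares the unrestricted model with a restricted model obtained by substituting $\beta_3 = -\beta_1 - \beta_2$. With a single restriction and classical standard errors, the F-statistic equals the square of the t-statistic.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
log_px = rng.normal(0.0, 0.3, n)
log_pz = rng.normal(0.0, 0.3, n)
log_y = rng.normal(3.0, 0.5, n)
# Assumed coefficients satisfy homogeneity: -0.8 + 0.3 + 0.5 = 0
log_x = 1.0 - 0.8 * log_px + 0.3 * log_pz + 0.5 * log_y + rng.normal(0.0, 0.2, n)

def fit(X, target):
    """Return OLS coefficients and the sum of squared residuals."""
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    return coef, resid @ resid

ones = np.ones(n)

# Unrestricted model
X_ur = np.column_stack([ones, log_px, log_pz, log_y])
b, ssr_ur = fit(X_ur, log_x)
k = X_ur.shape[1]
sigma2 = ssr_ur / (n - k)
V = sigma2 * np.linalg.inv(X_ur.T @ X_ur)    # classical variance-covariance matrix

# t-statistic for H0: beta1 + beta2 + beta3 = 0
r = np.array([0.0, 1.0, 1.0, 1.0])
t_stat = (r @ b) / np.sqrt(r @ V @ r)

# Restricted model: log(X) = b0 + b1*(log px - log Y) + b2*(log pz - log Y) + u
X_r = np.column_stack([ones, log_px - log_y, log_pz - log_y])
_, ssr_r = fit(X_r, log_x)

# F-statistic with q = 1 restriction and n - k degrees of freedom
F_stat = (ssr_r - ssr_ur) / (ssr_ur / (n - k))
print(t_stat, F_stat, t_stat ** 2)
```

With 496 degrees of freedom, the two-sided 5% critical value for the t-statistic is close to 1.96.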
3. The following question pertains to determining the effect of various background
characteristics on the probability that an applicant is admitted to study at a particular
university. We draw a random sample of 800 applicants and record their characteristics
and whether they were admitted.
The names of the variables, a brief description and summary statistics are as follows:
| Variable | Description | Mean | Std. Dev | Min | Max |
|---|---|---|---|---|---|
| in | =1 if got into university, 0 if not | 0.37 | 0.48 | 0 | 1 |
| math | Admission test math score | 68.83 | 10.18 | 44 | 95 |
| verb | Admission test verbal score | 61.68 | 9.70 | 36 | 95 |
| public | =1 if from public school, 0 if private school | 0.46 | 0.50 | 0 | 1 |
| male | =1 if male, 0 if female | 0.61 | 0.49 | 0 | 1 |
| public*male | public x male | 0.30 | 0.46 | 0 | 1 |
A linear probability model is estimated that yields the following results (standard errors in
parentheses),
𝑖𝑛
Μ‚ = −13.3 + 0.10π‘šπ‘Žπ‘‘β„Ž + 0.12π‘£π‘’π‘Ÿπ‘ + 0.002𝑝𝑒𝑏𝑙𝑖𝑐 − 0.17π‘šπ‘Žπ‘™π‘’ − 0.66 𝑝𝑒𝑏𝑙𝑖𝑐 ∗ π‘šπ‘Žπ‘™π‘’
(1.06) (0.01)
(0.01)
(0.30)
(0.26)
(0.38)
(a) What fraction of applicants are both women and not from public schools?
(b) What is the predicted probability of being admitted for a male, public school applicant
who has scored exactly the average mark in the two admission test components?
(c) What is the predicted probability of being admitted for a female, private school applicant
who has scored exactly the average mark in the two admission test components?
(d) What is the precise interpretation of the coefficient on $\mathit{male}$?
(e) The coefficient on the interaction term is negative. What does it imply?