EC338 Econometrics 2: Microeconometrics Assignment 2
Adrian Pulchny (u2293153)
Section A
1)
This question is interesting because we want to know whether veterans were suitably compensated for their service in the military. We aim to estimate the long-term effect of military service on the civilian earnings of veterans, using men born between 1950 and 1953, who were at risk of induction at age 19. Angrist analyses this through an IV approach. The question is also relevant for policy: if veterans are supposed to be compensated fairly, the government can introduce additional policies to offset the negative effects of serving in the military. In addition, the government may wish to encourage people to join the army in order to defend the country; policies that make veterans better off could be enough of an incentive to build a larger army. Finally, previous researchers found different outcomes because they used different data sets.
2)
A simple regression of earnings on veteran status will be biased if there are omitted factors that are correlated with veteran status and also relevant for earnings; in that case veteran status is endogenous. If we don't control for those factors, we get a biased estimator. For example, if men with poor civilian opportunities tend to enlist in the army, veteran status will be correlated with this omitted factor, and because earnings also depend on it, the estimator is biased. Whether the effect is over- or underestimated depends on the sign of the correlation between the omitted factor and veteran status and on the sign of the omitted factor's effect on earnings.
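A small simulation can illustrate this sign rule. It is a sketch with made-up numbers, not Angrist's data: a hypothetical unobserved "ability" variable raises earnings (coefficient 3) and lowers the probability of enlisting, and the true effect of service is set to -2. Because the omitted variable is negatively correlated with veteran status and positively related to earnings, the short regression should overstate the earnings loss, while the regression that controls for ability (computed here by partialling out, i.e. the Frisch-Waugh-Lovell theorem) should come out close to -2. The function names are illustrative.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_ovb(n=50_000, true_effect=-2.0, ability_effect=3.0, seed=1):
    """Hypothetical data: men with low 'ability' are more likely to enlist."""
    rng = random.Random(seed)
    ability = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Veteran status is negatively correlated with ability.
    vet = [1.0 if -0.5 * a + rng.gauss(0.0, 1.0) > 0 else 0.0 for a in ability]
    earnings = [true_effect * d + ability_effect * a + rng.gauss(0.0, 1.0)
                for d, a in zip(vet, ability)]

    # Short regression: ability is omitted.
    short_est = ols_slope(vet, earnings)

    # Long regression via partialling out (Frisch-Waugh-Lovell): residualise
    # veteran status on ability, then regress earnings on that residual.
    b = ols_slope(ability, vet)
    mv, ma = statistics.fmean(vet), statistics.fmean(ability)
    resid = [d - mv - b * (a - ma) for d, a in zip(vet, ability)]
    long_est = ols_slope(resid, earnings)
    return short_est, long_est


short_est, long_est = simulate_ovb()
print(f"omitting ability: {short_est:.2f}   controlling for ability: {long_est:.2f}")
```

Under these assumptions the short-regression slope lies well below -2 (around -4), which is the negative omitted variable bias the sign rule predicts.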
3)
In the IV approach we need a variable that is correlated with veteran status but affects earnings only through veteran status, i.e. it is uncorrelated with the error term of the earnings equation. Functions of the draft lottery numbers can serve as instruments because the numbers were randomly assigned. We use an indicator for draft eligibility (together with a constant) as the instrument, which yields the Wald estimator. Simply comparing earnings by draft eligibility would give the effect of draft eligibility on earnings, not the effect of serving. The problem is that not every draft-eligible man served in the military, and draft-ineligible men could volunteer. The Wald estimator adjusts for this by dividing the difference in mean earnings between draft-eligible and draft-ineligible men by the difference in their probabilities of being a veteran.
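The Wald estimator is the ratio of two differences in means: the reduced form (difference in mean earnings between draft-eligible and draft-ineligible men) divided by the first stage (difference in the share of veterans between the two groups). A minimal sketch, with a hypothetical function name and tiny made-up data rather than the paper's:

```python
import statistics


def wald_estimate(y, d, z):
    """Wald IV estimator with a binary instrument z.

    Returns (reduced form, first stage, reduced form / first stage).
    """
    y1 = [yi for yi, zi in zip(y, z) if zi == 1]
    y0 = [yi for yi, zi in zip(y, z) if zi == 0]
    d1 = [di for di, zi in zip(d, z) if zi == 1]
    d0 = [di for di, zi in zip(d, z) if zi == 0]
    reduced_form = statistics.fmean(y1) - statistics.fmean(y0)  # effect of z on y
    first_stage = statistics.fmean(d1) - statistics.fmean(d0)   # effect of z on d
    return reduced_form, first_stage, reduced_form / first_stage


# Toy data (hypothetical): z = draft-eligible, d = veteran, y = earnings.
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 0, 0, 1, 0, 0, 0]
y = [8, 10, 16, 14, 9, 15, 14, 14]
print(wald_estimate(y, d, z))  # (-1.0, 0.25, -4.0)
```

In the toy data, eligibility lowers mean earnings by 1 but raises the veteran share by only 0.25, so the implied effect per man induced to serve is -1 / 0.25 = -4.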
4)
The first IV assumption (relevance) requires the instrument to have a causal effect on the treatment, in this case veteran status. Because draft-eligible men were more likely to be called up, draft eligibility raises the probability of serving, so it is correlated with veteran status. The second IV assumption requires the instrument to be as good as randomly assigned, which holds if the lottery was fair. If draft eligibility was randomly assigned, the earnings of draft-eligible and draft-ineligible men should be similar before the lotteries, and in fact their pre-lottery earnings do not differ. The only thing that separates the two groups is the higher risk of being drafted for eligible men. The third assumption (the exclusion restriction) holds because the instrument is a function of the lottery numbers and is not correlated with the unobserved components of the earnings equation. The only problem is that if draft avoidance behaviour (for example, staying in school to obtain a deferment) is correlated with lottery numbers, the instrument fails the exclusion restriction, because it is then correlated with the error term. Finally, we assume that there are no defiers (monotonicity).
5)
The estimator is a Wald estimator. It rescales the effect of draft eligibility on earnings by the effect of draft eligibility on the probability of serving, which takes into account that not everyone who was draft-eligible served in the military. This relies on the assumption that draft eligibility affects earnings only through the difference in the probability of being a veteran. The paper talks about compliers because the estimator is a LATE (Local Average Treatment Effect): it measures the effect of veteran status on earnings for men who served because they were draft-eligible and would not have served otherwise, not for men who enlisted voluntarily. The estimated effect for white veterans of the 1950 cohort is an average loss of 2,000 constant dollars.
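The complier interpretation can be checked with a simulation. The sketch assumes a hypothetical population of always-takers (serve regardless of the lottery, 10%), never-takers (never serve, 60%) and compliers (serve only if draft-eligible, 30%), with no defiers, and made-up effects of service of -1 for always-takers and -3 for compliers. Since the lottery is randomly assigned, the Wald ratio should recover the complier effect rather than the average effect over everyone who served. All shares, effects and names are illustrative assumptions.

```python
import random
import statistics

# Hypothetical population shares and effects of service by compliance type.
SHARES = {"always": 0.1, "never": 0.6, "complier": 0.3}
EFFECTS = {"always": -1.0, "never": 0.0, "complier": -3.0}


def simulate_late(n=100_000, seed=7):
    rng = random.Random(seed)
    kinds = list(SHARES)
    weights = list(SHARES.values())
    y, d, z, eff = [], [], [], []
    for _ in range(n):
        kind = rng.choices(kinds, weights=weights)[0]
        zi = rng.randint(0, 1)  # fair lottery: eligible with probability 1/2
        di = 1 if kind == "always" or (kind == "complier" and zi == 1) else 0
        y.append(rng.gauss(10.0, 1.0) + EFFECTS[kind] * di)
        d.append(di)
        z.append(zi)
        eff.append(EFFECTS[kind])

    def mean_if(values, cond):
        return statistics.fmean([v for v, c in zip(values, cond) if c])

    eligible = [zi == 1 for zi in z]
    ineligible = [zi == 0 for zi in z]
    reduced_form = mean_if(y, eligible) - mean_if(y, ineligible)
    first_stage = mean_if(d, eligible) - mean_if(d, ineligible)
    wald = reduced_form / first_stage
    # Average effect over everyone who actually served (always-takers + compliers).
    att = mean_if(eff, [di == 1 for di in d])
    return wald, att


wald, att = simulate_late()
print(f"Wald/LATE: {wald:.2f}   average effect on all veterans: {att:.2f}")
```

With these assumed numbers the Wald estimate should be close to -3, whereas the average effect over all veterans is about -2.2: always-takers serve whether or not they are eligible, so they drop out of the Wald ratio.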
6)
There is a lot of criticism of the 1970 draft lottery: evaluations based on Monte Carlo simulation show that it is highly unlikely that the lottery was really fair. I would like the author to review the draft lottery critically and perform additional balance checks to verify the randomness of the lottery. This is critical for the independence assumption, and thus for the validity of the instrument, because if draft eligibility is not randomly assigned, it may be correlated with the error term and the IV approach fails. Furthermore, I would have liked the author to discuss policy implications. The effect of veteran status on earnings is significantly negative, which may discourage people from enlisting voluntarily, and the author could propose solutions to this problem. As economists we are researchers but also government advisers, so I would have expected some proposed solutions.
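The kind of Monte Carlo check mentioned above can be sketched as a permutation test. Under a fair lottery every assignment of the 366 lottery numbers to birth dates is equally likely, so the correlation between birth date (day of the year) and lottery number should be close to zero. The sketch simulates many fair draws to obtain the null distribution of that correlation and returns a two-sided p-value for an observed value. The function names are hypothetical, and the observed correlation has to be computed from the actual 1970 draw, which is not reproduced here; the -0.2 passed in the last line is purely illustrative.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def lottery_p_value(observed_r, reps=1000, days=366, seed=0):
    """Two-sided Monte Carlo p-value for the birth-date/lottery-number correlation.

    Each replication is one fair lottery: a uniformly random permutation of
    the numbers 1..days assigned to the birth dates 1..days.
    """
    rng = random.Random(seed)
    birth_day = list(range(1, days + 1))
    numbers = list(range(1, days + 1))
    extreme = 0
    for _ in range(reps):
        rng.shuffle(numbers)
        if abs(pearson(birth_day, numbers)) >= abs(observed_r):
            extreme += 1
    # Add-one correction so the p-value is never exactly zero.
    return (extreme + 1) / (reps + 1)


# Illustrative input only: replace -0.2 with the correlation from the real draw.
print(lottery_p_value(-0.2))
```

Because both sequences are permutations of 1 to 366, the Pearson correlation used here equals the Spearman rank correlation.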
Section B
1)
The fraction of smokers in the dataset is 24.23%
2)
We can see that smokers have a higher high school dropout rate (0.14 vs. 0.08) and a higher high school graduation rate (0.42 vs. 0.30), whereas the college graduation rate is twice as high for non-smokers (0.22) as for smokers (0.11).
Descriptive statistics for smokers:
##          vars    n  mean    sd median trimmed   mad min max range  skew kurtosis   se
## smoker      1 2423  1.00  0.00      1    1.00  0.00   1   1     0   NaN      NaN 0.00
## smkban      2 2423  0.53  0.50      1    0.54  0.00   0   1     1 -0.13    -1.98 0.01
## age         3 2423 37.96 11.61     36   37.30 11.86  18  78    60  0.49    -0.33 0.24
## hsdrop      4 2423  0.14  0.35      0    0.05  0.00   0   1     1  2.06     2.22 0.01
## hsgrad      5 2423  0.42  0.49      0    0.40  0.00   0   1     1  0.31    -1.90 0.01
## colsome     6 2423  0.28  0.45      0    0.23  0.00   0   1     1  0.96    -1.09 0.01
## colgrad     7 2423  0.11  0.31      0    0.01  0.00   0   1     1  2.48     4.16 0.01
## black       8 2423  0.08  0.27      0    0.00  0.00   0   1     1  3.17     8.03 0.01
## hispanic    9 2423  0.10  0.30      0    0.00  0.00   0   1     1  2.63     4.92 0.01
## female     10 2423  0.54  0.50      1    0.55  0.00   0   1     1 -0.14    -1.98 0.01
Descriptive statistics for non-smokers:
##          vars    n  mean    sd median trimmed   mad min max range  skew kurtosis   se
## smoker      1 7577  0.00  0.00      0    0.00  0.00   0   0     0   NaN      NaN 0.00
## smkban      2 7577  0.63  0.48      1    0.67  0.00   0   1     1 -0.56    -1.69 0.01
## age         3 7577 38.93 12.26     38   38.21 13.34  18  88    70  0.52    -0.21 0.14
## hsdrop      4 7577  0.08  0.26      0    0.00  0.00   0   1     1  3.22     8.39 0.00
## hsgrad      5 7577  0.30  0.46      0    0.24  0.00   0   1     1  0.90    -1.20 0.01
## colsome     6 7577  0.28  0.45      0    0.22  0.00   0   1     1  0.99    -1.03 0.01
## colgrad     7 7577  0.22  0.42      0    0.16  0.00   0   1     1  1.32    -0.26 0.00
## black       8 7577  0.08  0.27      0    0.00  0.00   0   1     1  3.18     8.10 0.00
## hispanic    9 7577  0.12  0.32      0    0.02  0.00   0   1     1  2.38     3.67 0.00
## female     10 7577  0.57  0.49      1    0.59  0.00   0   1     1 -0.29    -1.91 0.01
3)
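Two-sample t-tests for differences in means, assuming equal variances (x: smokers, y: non-smokers). Both differences, in the high school dropout rate and in the share of women, are significant at the 1% level.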
Estimation results:
##  Two Sample t-test
##
## data:  df$hsdrop and dg$hsdrop
## t = 9.9397, df = 9998, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.05335701 0.07957172
## sample estimates:
##  mean of x  mean of y
## 0.14156005 0.07509568
##
##  Two Sample t-test
##
## data:  df$female and dg$female
## t = -3.147, df = 9998, p-value = 0.001654
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.05908867 -0.01373103
## sample estimates:
## mean of x mean of y
## 0.5361123 0.5725221
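The pooled (equal-variance) t statistics above can be reproduced from summary statistics alone. For a 0/1 variable the within-group sum of squared deviations is n·p(1-p), so the counts implied by the reported means are enough: 0.14156005 × 2423 ≈ 343 and 0.07509568 × 7577 ≈ 569 high school dropouts, 0.5361123 × 2423 ≈ 1299 and 0.5725221 × 7577 ≈ 4338 women. This is a sketch for checking the printed output (the function name is hypothetical), not the code that produced it.

```python
import math


def pooled_t_binary(k1, n1, k0, n0):
    """Equal-variance two-sample t statistic for a 0/1 variable.

    k1, k0: number of ones in each group; n1, n0: group sizes.
    """
    p1, p0 = k1 / n1, k0 / n0
    # Within-group sums of squared deviations for 0/1 data: n * p * (1 - p).
    ss = n1 * p1 * (1 - p1) + n0 * p0 * (1 - p0)
    pooled_var = ss / (n1 + n0 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n0))
    return (p1 - p0) / se


t_hsdrop = pooled_t_binary(343, 2423, 569, 7577)
t_female = pooled_t_binary(1299, 2423, 4338, 7577)
print(round(t_hsdrop, 4), round(t_female, 4))
```

Both values should be close to the printed t = 9.9397 and t = -3.147.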
4)
Smoking is not randomly assigned, so there is selection bias: certain types of people are more likely to smoke. As we saw in 2) and 3), smokers have a higher high school dropout rate and non-smokers have a higher college graduation rate. If we do not control for these covariates, the estimated negative effect of smoking on income will be biased.
5)
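Of the 2,423 smokers, 1,293 are affected by a smoking ban and 1,130 are not. The smoking rate is 21.2% among individuals subject to a ban and 29.0% among those who are not, a difference of -7.76 percentage points, which is significant at all conventional levels.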
Estimation results:
##  Two Sample t-test
##
## data:  smokesmk1$smoker and smokesmk0$smoker
## t = -8.8633, df = 9998, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.09471105 -0.06040564
## sample estimates:
## mean of x mean of y
## 0.2120367 0.2895951
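With a single dummy regressor, the OLS slope of smoker on smkban equals the difference in smoking rates between the two groups, and its conventional (homoskedastic) t statistic equals the pooled two-sample t statistic. The sketch checks this with group sizes implied by the output: 1293 / 0.2120367 ≈ 6098 individuals with a ban and 1130 / 0.2895951 ≈ 3902 without (6098 out of 10,000 matches the mean of smkban, 0.6098, reported in 9)). The helper name is hypothetical and the individual rows are rebuilt from counts, so this is an illustration rather than the assignment's code.

```python
import math
import statistics


def ols_dummy(y, x):
    """OLS of y on a constant and x; returns the slope and its homoskedastic t statistic."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    ssr = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    se = math.sqrt(ssr / (len(y) - 2) / sxx)
    return slope, slope / se


# Rows rebuilt from group counts (row order does not matter for OLS):
# 6098 people with a ban (1293 smokers), 3902 without a ban (1130 smokers).
smkban = [1] * 6098 + [0] * 3902
smoker = [1] * 1293 + [0] * (6098 - 1293) + [1] * 1130 + [0] * (3902 - 1130)
slope_ban, t_ban = ols_dummy(smoker, smkban)
print(round(slope_ban, 7), round(t_ban, 4))
```

The slope reproduces the difference in means (about -0.0776), and the t statistic should be close to the printed -8.8633.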
6)
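In absolute value, the coefficient on smkban is considerably smaller than the raw difference in 5): -4.5 percentage points instead of -7.8. The reason is that we now control for additional covariates (age, education, race and gender), which reduces omitted variable bias.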
Estimation results:
## Call:
## lm(formula = smoker ~ ., data = data)
##
## Coefficients:
## (Intercept)       smkban          age       hsdrop       hsgrad      colsome
##    0.201276    -0.045343    -0.001354     0.309919     0.224112     0.156170
##     colgrad        black     hispanic       female
##    0.041541    -0.026503    -0.103745    -0.032874
7)
Unfortunately, our LPM has a problem: the minimum fitted value is -0.08, i.e. a negative predicted probability. Predicted probabilities must lie in [0, 1] to be interpretable.
Descriptive statistics of the LPM fitted values:
##    vars     n mean  sd median trimmed  mad   min  max range  skew kurtosis se
## X1    1 10000 0.24 0.1   0.26    0.24 0.11 -0.08 0.49  0.57 -0.18    -0.74  0
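Plugging a profile into the printed LPM coefficients shows how the linear model can leave [0, 1]. The profile below (an 88-year-old Hispanic woman subject to a smoking ban, with all four education dummies equal to 0, i.e. the omitted education category) is hypothetical and need not occur in the data; the helper name is also illustrative. The second calculation shows that in the LPM the effect of smkban is the same constant for every profile.

```python
# LPM coefficients as printed in 6).
LPM = {
    "(Intercept)": 0.201276, "smkban": -0.045343, "age": -0.001354,
    "hsdrop": 0.309919, "hsgrad": 0.224112, "colsome": 0.156170,
    "colgrad": 0.041541, "black": -0.026503, "hispanic": -0.103745,
    "female": -0.032874,
}


def lpm_fitted(profile):
    """Fitted value x'b for a dict of regressor values (missing regressors = 0)."""
    return LPM["(Intercept)"] + sum(
        coef * profile.get(name, 0) for name, coef in LPM.items() if name != "(Intercept)"
    )


# Hypothetical profile; education dummies are all 0.
person = {"smkban": 1, "age": 88, "hispanic": 1, "female": 1}
no_ban = dict(person, smkban=0)
print(round(lpm_fitted(person), 6))                       # a negative "probability"
print(round(lpm_fitted(person) - lpm_fitted(no_ban), 6))  # equals the smkban coefficient
```

The fitted value for this profile is roughly -0.10, even further below zero than the sample minimum of -0.08.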
8)
The probit coefficients are hard to interpret, because the effect of a regressor on the probability of the outcome depends on the values of all explanatory variables. We cannot interpret them like the coefficients of a linear regression model. What we can say is that the coefficient on smkban is negative, which means that a smoking ban lowers the predicted probability of smoking. Furthermore, unlike in the LPM, the marginal effects in the probit model are not constant.
The estimation results for the probit model:
## Call:  glm(formula = smoker ~ ., family = binomial(link = "probit"),
##     data = data)
##
## Coefficients:
## (Intercept)       smkban          age       hsdrop       hsgrad      colsome
##   -0.984241    -0.151762    -0.004203     1.094230     0.851858     0.649256
##     colgrad        black     hispanic       female
##    0.222474    -0.079690    -0.332704    -0.110625
##
## Degrees of Freedom: 9999 Total (i.e. Null);  9990 Residual
## Null Deviance:     11070
## Residual Deviance: 10500    AIC: 10520
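The non-constant effects can be made concrete with the printed probit coefficients. The predicted probability is Phi(x'b), with Phi the standard normal CDF (computed here from `math.erf`), and the effect of a ban is the change in that probability when smkban switches from 0 to 1. The two profiles are hypothetical and chosen only to have very different baseline smoking probabilities; the helper names are illustrative.

```python
import math

# Probit coefficients as printed in 8).
PROBIT = {
    "(Intercept)": -0.984241, "smkban": -0.151762, "age": -0.004203,
    "hsdrop": 1.094230, "hsgrad": 0.851858, "colsome": 0.649256,
    "colgrad": 0.222474, "black": -0.079690, "hispanic": -0.332704,
    "female": -0.110625,
}


def norm_cdf(v):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def probit_prob(profile):
    """Predicted P(smoker = 1) = Phi(x'b) for a dict of regressor values."""
    index = PROBIT["(Intercept)"] + sum(
        coef * profile.get(name, 0) for name, coef in PROBIT.items() if name != "(Intercept)"
    )
    return norm_cdf(index)


def ban_effect(profile):
    """Change in predicted probability when smkban goes from 0 to 1."""
    return probit_prob(dict(profile, smkban=1)) - probit_prob(dict(profile, smkban=0))


young_dropout = {"age": 25, "hsdrop": 1}                  # hypothetical profile
older_graduate = {"age": 60, "colgrad": 1, "female": 1}   # hypothetical profile
print(ban_effect(young_dropout), ban_effect(older_graduate))
```

Both changes are negative, as the sign of the coefficient implies, but the first is about twice as large (roughly -0.06 vs. -0.03), because that profile's index lies near zero, where the normal density is highest.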
9)
The marginal effect at the average (MEA) and the average marginal effect (AME) of smkban are very similar (-0.04507 and -0.04494). This is not surprising: the AME averages the marginal effect of smkban over all individuals, whereas the MEA evaluates it once at the sample means, and the two coincide closely when the normal density changes little over the range of fitted index values. Furthermore, we have to ask ourselves whether computing the MEA makes sense here: smkban is a dummy variable that only takes the values 0 or 1, but the MEA evaluates the effect at its mean (0.6098), which is not a possible value.
Estimation results for the Average Marginal Effect:
##   smkban       age hsdrop hsgrad colsome colgrad   black hispanic   female
## -0.04494 -0.001245  0.324 0.2522  0.1922 0.06587 -0.0236 -0.09851 -0.03276
Estimation results for the Marginal Effect at the Average:
## at(smkban)   smkban       age hsdrop hsgrad colsome colgrad    black hispanic   female
##     0.6098 -0.04507 -0.001248  0.325  0.253  0.1928 0.06608 -0.02367 -0.09881 -0.03286
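The relationship between the two measures can be illustrated on simulated data. The sketch assumes a hypothetical probit index with intercept -0.75, the printed smkban coefficient (-0.151762), one made-up continuous covariate with coefficient 0.3, and smkban drawn with probability 0.61; it does not use the assignment data. With phi the standard normal density, it computes the AME as the average of the individual derivatives phi(x_i'b) · b_smkban, the MEA as the same derivative evaluated at the sample means, and, for comparison, the average discrete change from smkban = 0 to 1.

```python
import math
import random
import statistics

B0, B_BAN, B_Z = -0.75, -0.151762, 0.3  # B0 and B_Z are hypothetical


def norm_pdf(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def norm_cdf(v):
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def ame_vs_mea(n=20_000, seed=3):
    rng = random.Random(seed)
    ban = [1 if rng.random() < 0.61 else 0 for _ in range(n)]
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    index = [B0 + B_BAN * b + B_Z * zi for b, zi in zip(ban, z)]

    # AME: average of the individual derivatives with respect to smkban.
    ame = statistics.fmean([norm_pdf(v) * B_BAN for v in index])
    # MEA: derivative evaluated at the sample means (smkban set to its mean).
    mean_index = B0 + B_BAN * statistics.fmean(ban) + B_Z * statistics.fmean(z)
    mea = norm_pdf(mean_index) * B_BAN
    # Average discrete change when smkban switches from 0 to 1 for everyone.
    discrete = statistics.fmean(
        [norm_cdf(B0 + B_BAN + B_Z * zi) - norm_cdf(B0 + B_Z * zi) for zi in z]
    )
    return ame, mea, discrete


ame, mea, discrete = ame_vs_mea()
print(ame, mea, discrete)
```

Under these assumptions all three numbers are close, mirroring the similarity of the printed AME and MEA; the gap would widen if the index varied much more across individuals.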
10)
The estimates are nearly the same: -0.0453 for the LPM coefficient and -0.0449 for the probit AME. The LPM coefficient measures how a smoking ban changes the probability of smoking, and the probit AME measures essentially the same average change. So it is no surprise that they agree, but the LPM produces fitted values outside [0, 1], which cannot be interpreted as probabilities.