EC338 Econometrics 2: Microeconometrics
Assignment 2
Adrian Pulchny (u2293153)

Section A

1) This question is interesting because we want to know whether veterans were suitably compensated for their service in the military. The aim is to estimate the long-term effect of military service on the civilian earnings of veterans, using men born between 1950 and 1953 who were at risk of induction at around age 19. Angrist analyses this through an IV approach. The question is also policy relevant: if veterans are not compensated adequately, the government can introduce additional policies to offset the negative effects of serving in the military. In addition, the government may want to incentivize people to join the army in order to defend the country, and policies that make veterans better off could provide such an incentive. Finally, previous researchers working with different data sets found different outcomes.

2) A simple regression of earnings on veteran status will be biased if there are omitted factors that are correlated with veteran status and also relevant for earnings, that is, if veteran status is endogenous. For example, if men with poor civilian opportunities tend to enlist in the army, veteran status is correlated with those unobserved opportunities. Because earnings also depend on them, the OLS estimator is biased. Whether the effect is over- or underestimated depends on the signs of the two correlations.

3) In the IV approach we need a variable that is correlated with veteran status but uncorrelated with the error term, so that it affects earnings only through veteran status. Functions of the draft lottery numbers are valid instruments when the numbers are randomly assigned. We use draft eligibility and a constant as instruments, which yields the Wald estimator.
Simply regressing earnings on an indicator for draft eligibility would give us the effect of draft eligibility on earnings, not the effect of military service. The problem is that not every draft-eligible man served in the military, and men who were not draft eligible could enlist voluntarily. The Wald estimator adjusts for this by using the difference in the probability of being a veteran between the two groups.

4) The first IV assumption (relevance) requires the instrument to have a causal effect on the treatment, in this case veteran status. Draft eligibility is a function of the randomly assigned lottery numbers and raised the risk of being called up, so it increases the probability of serving. The second IV assumption requires the instrument to be as good as randomly assigned, which holds when the lottery is fair. If the assignment of draft eligibility is random, the earnings of the two groups should be similar prior to the lotteries. In fact, the earnings of draft-eligible and draft-ineligible men do not differ before the lottery. The only thing that sets them apart is the increased risk of being called up for draft-eligible men. The third assumption (exclusion restriction) is plausible, because the instrument is a function of the lottery numbers and is not correlated with the unobserved components of the earnings equation. The only problem is draft avoidance: if avoidance behaviour is correlated with lottery numbers, our instrument fails the exclusion restriction, because it is then correlated with the error term. Furthermore, we assume that defiers do not exist (monotonicity).

5) The estimation results in a Wald estimator. It rescales the estimated effect of draft eligibility on earnings to account for the fact that not everyone who was draft eligible served in the military. This is valid because we assume that nothing other than the difference in the probability of being a veteran causes earnings to differ by draft eligibility.
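The adjustment described above can be written compactly. As a sketch in notation that is not taken from the assignment (Y for earnings, D for the veteran indicator, Z for the draft-eligibility indicator, bars for sample means), the Wald estimator is the ratio of two differences in means:

```latex
\[
\hat{\beta}_{\text{Wald}}
  = \frac{\bar{Y}_{Z=1} - \bar{Y}_{Z=0}}{\bar{D}_{Z=1} - \bar{D}_{Z=0}}
\]
```

The numerator is the effect of draft eligibility on earnings, and the denominator is the difference in the probability of being a veteran between draft-eligible and draft-ineligible men.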
The paper talks about compliers because the estimator is a LATE (Local Average Treatment Effect): it describes the effect of veteran status on earnings for men who served because they were draft eligible and who would not have served otherwise. The estimated effect for white veterans of the 1950 cohort is an average loss of 2000 constant dollars.

6) The 1970 draft lottery has attracted a lot of criticism: when evaluated by Monte Carlo simulation, it appears highly unlikely that the lottery was really fair. I would like the researcher to review the draft lottery critically and perform additional balance checks to verify its randomness. This is critical for the validity of the instrument, because if draft eligibility is not randomly assigned it may be correlated with the error term and IV will not work. Furthermore, I would like the author to discuss policy implications. The effect of veteran status on earnings is significantly negative, which may discourage people from enlisting voluntarily, and the author could propose solutions to this problem. As economists we are researchers but also government consultants, so I would have expected some proposed solutions.
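One way such a randomness check could look is a permutation test of the correlation between birth date and lottery number. The sketch below is only an illustration under stated assumptions: the data are hypothetical (not the real 1970 lottery), and the function names `correlation` and `permutation_p_value` are made up for this example.

```python
import random


def correlation(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(birth_days, lottery_numbers, draws=500, seed=0):
    """Share of simulated fair lotteries with |correlation| >= the observed one."""
    rng = random.Random(seed)
    observed = abs(correlation(birth_days, lottery_numbers))
    simulated = list(lottery_numbers)
    at_least_as_extreme = 0
    for _ in range(draws):
        rng.shuffle(simulated)  # a fair lottery is a random permutation
        if abs(correlation(birth_days, simulated)) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / draws


# Hypothetical data, NOT the real lottery: 366 possible birth dates.
birth_days = list(range(1, 367))
# Deliberately rigged assignment: later birthdays always get lower numbers.
rigged_numbers = [367 - day for day in birth_days]
# A fair draw for comparison.
fair_numbers = birth_days[:]
random.Random(1).shuffle(fair_numbers)

p_rigged = permutation_p_value(birth_days, rigged_numbers)
p_fair = permutation_p_value(birth_days, fair_numbers)
print(f"rigged lottery p-value: {p_rigged:.3f}")
print(f"fair lottery p-value:   {p_fair:.3f}")
```

A small p-value means that a correlation as strong as the observed one would rarely arise under a fair lottery. With real data, the same function would be applied to the actual birth dates and lottery numbers.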
Section B

1) The fraction of smokers in the dataset is 24.23%.

2) Smokers have a higher high school dropout rate (0.14 vs 0.08) and a higher share of high school graduates (0.42 vs 0.30), whereas the college graduation rate is twice as high for nonsmokers (0.22) as for smokers (0.11).

Description of results for smokers:

##          vars    n  mean    sd median trimmed   mad min max range  skew
## smoker      1 2423  1.00  0.00      1    1.00  0.00   1   1     0   NaN
## smkban      2 2423  0.53  0.50      1    0.54  0.00   0   1     1 -0.13
## age         3 2423 37.96 11.61     36   37.30 11.86  18  78    60  0.49
## hsdrop      4 2423  0.14  0.35      0    0.05  0.00   0   1     1  2.06
## hsgrad      5 2423  0.42  0.49      0    0.40  0.00   0   1     1  0.31
## colsome     6 2423  0.28  0.45      0    0.23  0.00   0   1     1  0.96
## colgrad     7 2423  0.11  0.31      0    0.01  0.00   0   1     1  2.48
## black       8 2423  0.08  0.27      0    0.00  0.00   0   1     1  3.17
## hispanic    9 2423  0.10  0.30      0    0.00  0.00   0   1     1  2.63
## female     10 2423  0.54  0.50      1    0.55  0.00   0   1     1 -0.14
##          kurtosis   se
## smoker        NaN 0.00
## smkban      -1.98 0.01
## age         -0.33 0.24
## hsdrop       2.22 0.01
## hsgrad      -1.90 0.01
## colsome     -1.09 0.01
## colgrad      4.16 0.01
## black        8.03 0.01
## hispanic     4.92 0.01
## female      -1.98 0.01

Description of results for non-smokers:

##          vars    n  mean    sd median trimmed   mad min max range  skew
## smoker      1 7577  0.00  0.00      0    0.00  0.00   0   0     0   NaN
## smkban      2 7577  0.63  0.48      1    0.67  0.00   0   1     1 -0.56
## age         3 7577 38.93 12.26     38   38.21 13.34  18  88    70  0.52
## hsdrop      4 7577  0.08  0.26      0    0.00  0.00   0   1     1  3.22
## hsgrad      5 7577  0.30  0.46      0    0.24  0.00   0   1     1  0.90
## colsome     6 7577  0.28  0.45      0    0.22  0.00   0   1     1  0.99
## colgrad     7 7577  0.22  0.42      0    0.16  0.00   0   1     1  1.32
## black       8 7577  0.08  0.27      0    0.00  0.00   0   1     1  3.18
## hispanic    9 7577  0.12  0.32      0    0.02  0.00   0   1     1  2.38
## female     10 7577  0.57  0.49      1    0.59  0.00   0   1     1 -0.29
##          kurtosis   se
## smoker        NaN 0.00
## smkban      -1.69 0.01
## age         -0.21 0.14
## hsdrop       8.39 0.00
## hsgrad      -1.20 0.01
## colsome     -1.03 0.01
## colgrad     -0.26 0.00
## black        8.10 0.00
## hispanic     3.67 0.00
## female      -1.91 0.01

3) Test for differences in means assuming equal variances (homogeneity).
Both differences are significant at the 1% level.

Estimation results:

##  Two Sample t-test
##
## data:  df$hsdrop and dg$hsdrop
## t = 9.9397, df = 9998, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.05335701 0.07957172
## sample estimates:
##  mean of x  mean of y
## 0.14156005 0.07509568

##  Two Sample t-test
##
## data:  df$female and dg$female
## t = -3.147, df = 9998, p-value = 0.001654
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.05908867 -0.01373103
## sample estimates:
## mean of x mean of y
## 0.5361123 0.5725221

4) Smoking is not randomly assigned, so we have selection bias: certain types of people are more likely to smoke. As we saw in 2) and 3), smokers had a higher high school dropout rate and nonsmokers had a higher college graduation rate. Without controlling for these covariates, the estimated negative effect of smoking on income will be biased.

5) 1293 smokers are affected by the smoking ban and 1130 are not. The share of smokers is 21.2% among people subject to a ban and 29.0% among people who are not, a difference in means of -7.76 percentage points. The difference is significant at all common levels.

Estimation results:

##  Two Sample t-test
##
## data:  smokesmk1$smoker and smokesmk0$smoker
## t = -8.8633, df = 9998, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.09471105 -0.06040564
## sample estimates:
## mean of x mean of y
## 0.2120367 0.2895951

6) The coefficient on the smoking ban is smaller in absolute value than the difference in 5): it is -4.5 percentage points. The reason is that we control for additional covariates and hence reduce the omitted variable bias.
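The omitted variable bias argument in 6) can be sketched with the standard decomposition. The notation is an assumption made for illustration: the "short" coefficient is the one on smkban without controls (which equals the difference in means from 5)), the "long" coefficient is the one with controls, \hat{\gamma}_k is the coefficient on covariate k in the long regression, and \hat{\delta}_k is the coefficient from regressing covariate k on smkban.

```latex
\[
\hat{\beta}^{\text{short}}
  = \hat{\beta}^{\text{long}} + \sum_{k} \hat{\gamma}_k \hat{\delta}_k ,
\qquad
-0.0776 \approx -0.0453 + (-0.0322)
\]
```

Under this reading, roughly 3.2 percentage points of the raw difference are attributable to the covariates rather than to the smoking ban itself.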
Estimation results:

## Call:
## lm(formula = smoker ~ ., data = data)
##
## Coefficients:
## (Intercept)       smkban          age       hsdrop       hsgrad      colsome
##    0.201276    -0.045343    -0.001354     0.309919     0.224112     0.156170
##     colgrad        black     hispanic       female
##    0.041541    -0.026503    -0.103745    -0.032874

7) Unfortunately, our LPM has an issue: the minimum fitted value is -0.08, which would imply a negative probability. Fitted values have to lie in the range [0,1] to be interpretable as probabilities.

Description results:

##    vars     n mean  sd median trimmed  mad   min  max range  skew kurtosis se
## X1    1 10000 0.24 0.1   0.26    0.24 0.11 -0.08 0.49  0.57 -0.18    -0.74  0

8) The coefficient is hard to interpret, because its effect on the probability of the outcome depends on the values of the explanatory variables. We cannot interpret it as in a normal linear regression model. The least we can say is that the coefficient on smkban is negative, which means that a smoking ban lowers the predicted probability of smoking. Furthermore, in the probit model the effects are not constant across individuals.

The estimation results for the probit model:

## Call:  glm(formula = smoker ~ ., family = binomial(link = "probit"), data = data)
##
## Coefficients:
## (Intercept)       smkban          age       hsdrop       hsgrad      colsome
##   -0.984241    -0.151762    -0.004203     1.094230     0.851858     0.649256
##     colgrad        black     hispanic       female
##    0.222474    -0.079690    -0.332704    -0.110625
##
## Degrees of Freedom: 9999 Total (i.e. Null);  9990 Residual
## Null Deviance:     11070
## Residual Deviance: 10500    AIC: 10520

9) The marginal effect at the average (MEA) and the average marginal effect (AME) are very similar. This is not surprising, because smkban is a dummy variable that only takes the values 0 or 1. The average marginal effect averages the effect of changing smkban from 0 to 1 over all observations, and the marginal effect at the average cannot be very different, because the predicted probability changes almost linearly as smkban moves between 0 and 1.
Furthermore, we have to ask whether computing the MEA makes sense: the mean of smkban (0.6098) is not a value that smkban can actually take.

Estimation results for the Average Marginal Effect:

##    smkban       age hsdrop hsgrad colsome colgrad   black hispanic   female
##  -0.04494 -0.001245  0.324 0.2522  0.1922 0.06587 -0.0236 -0.09851 -0.03276

Estimation results for the Marginal Effect at the Average:

##  at(smkban)   smkban       age hsdrop hsgrad colsome colgrad    black hispanic   female
##      0.6098 -0.04507 -0.001248  0.325  0.253  0.1928 0.06608 -0.02367 -0.09881 -0.03286

10) The estimates are nearly the same: the LPM coefficient on smkban is -0.0453 and the probit average marginal effect is -0.0449. The LPM estimates how a change in smkban shifts the probability of smoking, and the average marginal effect of the probit model measures essentially the same thing. So it is no surprise that the two agree, but the LPM produces some fitted values outside [0,1], which cannot be interpreted as probabilities.
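The contrast between the constant LPM effect and the non-constant probit effect can be illustrated with the reported coefficients. The sketch below is an illustration, not the code used for the assignment: the coefficients are copied from the probit output in 8), while the two individuals and the helper names (`normal_cdf`, `probit_probability`, `smkban_effect`) are hypothetical.

```python
from math import erf, sqrt

# Probit coefficients copied from the estimation output in 8).
PROBIT = {
    "(Intercept)": -0.984241, "smkban": -0.151762, "age": -0.004203,
    "hsdrop": 1.094230, "hsgrad": 0.851858, "colsome": 0.649256,
    "colgrad": 0.222474, "black": -0.079690, "hispanic": -0.332704,
    "female": -0.110625,
}
LPM_SMKBAN = -0.045343  # LPM coefficient from 6), the same for everyone


def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def probit_probability(person):
    """Predicted probability of smoking for a dict of covariate values."""
    index = PROBIT["(Intercept)"] + sum(
        PROBIT[name] * value for name, value in person.items()
    )
    return normal_cdf(index)


def smkban_effect(person):
    """Change in predicted probability when smkban switches from 0 to 1."""
    with_ban = dict(person, smkban=1)
    without_ban = dict(person, smkban=0)
    return probit_probability(with_ban) - probit_probability(without_ban)


# Two hypothetical individuals (illustrative values, not rows of the data).
dropout = {"age": 30, "hsdrop": 1, "hsgrad": 0, "colsome": 0, "colgrad": 0,
           "black": 0, "hispanic": 0, "female": 0}
graduate = {"age": 60, "hsdrop": 0, "hsgrad": 0, "colsome": 0, "colgrad": 1,
            "black": 0, "hispanic": 0, "female": 1}

effect_dropout = smkban_effect(dropout)
effect_graduate = smkban_effect(graduate)
print(f"probit effect, dropout:  {effect_dropout:.4f}")
print(f"probit effect, graduate: {effect_graduate:.4f}")
print(f"LPM effect, everyone:    {LPM_SMKBAN:.4f}")
```

Both probit effects are negative, but they differ between the two profiles, whereas the LPM assigns the same effect to everyone. The probit probabilities also stay inside (0,1) by construction.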