Economics 345
Applied Econometrics
Problem Set 7—Solutions
Prof: Martin Farnham
Problem sets in this course are ungraded. An answer key will be posted on the course website within a few days of the release of each problem set. As noted in class, it is highly recommended that you make every effort to complete these problems before viewing the answer key.
1) Book Problems 7.1-7.3
7.1 i) Yes, there is evidence that men sleep more than women. The point estimate suggests that men sleep 88 minutes a week more than women, on average, controlling for other factors included on the RHS. The t-statistic for the null that the coeff on male equals zero is 87.75/34.33 which is greater than 2. This indicates clear rejection of the null hypothesis at the 5% level.
Whether this is a big effect is debatable. We’re talking about 13 minutes a night more sleep than women. Personally, I would consider this a moderate effect. ii) Yes, there is a statistically significant tradeoff between working and sleeping. Each extra minute worked per week leads to about 0.16 fewer minutes of sleep per week.
Judging from the t-statistic (for a null of zero) of -0.163/0.018, this is clearly statistically significant.
iii) Formally, this null hypothesis is $H_0: \beta_{age} = 0,\ \beta_{age\_squared} = 0$. To test the null hypothesis we need to estimate the restricted and unrestricted forms of the regression, and conduct an F-test. We’ve estimated the unrestricted form already, so to estimate the restricted form we’d need to estimate
$sleep = \beta_0 + \beta_1\, totwrk + \beta_2\, educ + \beta_5\, male + v$.
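The t-statistics quoted in 7.1 i) and ii) are just ratios of a coefficient estimate to its standard error. A minimal sketch of that arithmetic follows; the helper name `t_stat` is made up for illustration, and the numbers are the ones quoted in the answers above.

```python
# Coefficient estimates and standard errors quoted in the answers to 7.1.
coef_male, se_male = 87.75, 34.33
coef_totwrk, se_totwrk = -0.163, 0.018


def t_stat(coef, se, null_value=0.0):
    """t-statistic for the null that the coefficient equals null_value."""
    return (coef - null_value) / se


t_male = t_stat(coef_male, se_male)
t_totwrk = t_stat(coef_totwrk, se_totwrk)

print(f"t(male)   = {t_male:.2f}")    # 2.56
print(f"t(totwrk) = {t_totwrk:.2f}")  # -9.06

# The weekly sleep difference expressed per night.
print(f"minutes per night = {coef_male / 7:.1f}")  # 12.5
```

Both t-statistics exceed 2 in absolute value, which is the rough rule of thumb for rejecting a zero null at the 5% level.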
7.2
Note that interpreting coefficients on dummies in log-linear specifications is a bit more complicated than interpreting coefficients on continuous x variables in log-linear specifications (by “log-linear” I mean a model where the y variable is in logs, but the x variables are in levels). Your textbook glosses over this point. For small values, the coefficient estimate on a dummy variable in a log-linear specification, if multiplied by 100, gives roughly the percent change in y associated with changing the dummy from 0 to 1. But if you consult your lecture notes on dummy variables, you’ll realize this interpretation is only a rough one, and that for large coefficient values, this interpretation will be far from the truth. Because the estimates here are fairly small, we’ll stick to the rough interpretation given by your textbook in these answers. But consider doing the calculation correctly (per the lecture notes) as an exercise.
i) The coefficient estimate on cigs suggests that an extra cigarette smoked (on average) daily during pregnancy will reduce birthweight by 0.44%. This would suggest that an extra 10 cigarettes per day will reduce birthweight by 4.4%.
ii) A white child is expected to weigh 5.5% more than a non-white child on average, holding other factors fixed. Since the t-stat (for a zero null) is 0.055/0.013, or about 4.2, this is clearly a statistically significant effect.
iii) According to the estimates in the second equation, more mother’s education is associated with lower birth weight (holding other factors fixed). According to the point estimate, an extra year of mother’s education lowers birthweight by 0.3%. However, this estimate is not statistically significant. The t-stat (for a null of zero) is about 1 in absolute value.
iv) The sample sizes are different for the two equations. To do an F-test, one could use the R-squared formula to compare equation 1 (the restricted equation) to equation 2 (the unrestricted equation). But one must use the same sample of data in estimating each regression. We can’t use the R-squareds provided here because they are calculated using different samples. The first sample size is larger, presumably because data are missing for motheduc and fatheduc (this causes the number of observations to shrink in the second equation).
The F-test would be easy to conduct if both equations were re-estimated on the same sample.
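To see how rough the “multiply by 100” reading of a dummy coefficient can be, here is a small sketch comparing it with the exact percent change. The lecture notes are not reproduced here, so I am assuming the calculation they describe is the standard one, 100·[exp(β) − 1]. The function names are made up for illustration, and 0.5 and -0.5 are hypothetical “large” coefficients, not estimates from the problem.

```python
import math


def approx_pct(beta):
    """Rough reading: 100 times the coefficient."""
    return 100 * beta


def exact_pct(beta):
    """Exact percent change in y when the dummy switches from 0 to 1."""
    return 100 * (math.exp(beta) - 1)


# 0.055 is the coefficient on white from 7.2; the others are hypothetical.
for beta in (0.055, 0.5, -0.5):
    print(f"beta = {beta:6.3f}: approx {approx_pct(beta):7.2f}%, "
          f"exact {exact_pct(beta):7.2f}%")
```

For the coefficient of 0.055 the two readings are close (5.5% versus roughly 5.65%), which is why the rough interpretation does little harm here. For the hypothetical large coefficients the gap is substantial.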
7.3 i) There is pretty strong evidence that $hsize^2$ should be included. With a t-stat greater than 4 in absolute value, it’s clear that this term is statistically significant.
To find the optimal high school size, you want to find the size that maximizes SAT score.
Take the derivative with respect to hsize and set it equal to zero.
$\frac{\partial sat}{\partial hsize} = 19.30 - 2(2.19)\, hsize = 0$. This implies that 4.38(hsize) = 19.30, or hsize ≈ 4.4.
Recalling that hsize is expressed in units of 100s of students, this suggests that a high school class of approximately 440 maximizes SAT performance. ii) According to these estimates, nonblack females get a score of 45 points less than nonblack males, on average. This estimated difference is very statistically significant, as one can see by comparing the coefficient estimate and its standard error.
To see this most clearly, it helps to write out 1 equation for nonblack females (female=1, black=0) and 1 equation for nonblack males (female=0, black=0). The difference between the two (holding hsize constant) is 45.09. iii) Black males receive a score that is about 170 points less than nonblack males, ceteris paribus. This result is also highly statistically significant. To state the setup of the hypothesis test formally
$H_0: \beta_{black} = 0$
$H_1: \beta_{black} \neq 0$
The t-statistic is -169.81/12.71 = -13.36. This is statistically significant at any conventional significance level. iv) Controlling for other conditions, black females score about 107.5 points lower than non-black females.
To see this, write out the equation for black females:
$sat = 1028 + 19.3\, hsize - 2.19\, hsize^2 - 45.09(1) - 169.81(1) + 62.31(1)$
Then write out the equation for nonblack females:
$sat = 1028 + 19.3\, hsize - 2.19\, hsize^2 - 45.09(1)$
The difference between these is 107.5.
To test whether the difference is statistically significant, we would just need to do a t-test of the null that (beta4+beta5)=0. This can be done in EViews. To calculate the t-statistic by hand you’d need to calculate se(betahat4+betahat5). p149 of your text talks about how to do this (though ultimately you’d probably still need EViews to do it this way). One more way you could do it (also requiring EViews) is to define a parameter $\theta = \beta_4 + \beta_5$, and rewrite the equation in terms of this parameter:
$sat = \beta_0 + \beta_1\, hsize + \beta_2\, hsize^2 + \beta_3\, female + (\theta - \beta_5)\, black + \beta_5\, black \cdot female + u$
This can be rearranged to
$sat = \beta_0 + \beta_1\, hsize + \beta_2\, hsize^2 + \beta_3\, female + \theta\, black + \beta_5 (black \cdot female - black) + u$
and estimated. The estimate of theta should be equal to -107.5. The standard error of theta should be se(betahat4+betahat5). Thus a t-stat could be easily calculated, given the reformulation we did.
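The point-estimate calculations in 7.3 can be checked with a few lines of arithmetic. This is only a sketch using the coefficient estimates quoted in the answers above. The function name `predicted_sat` and the evaluation point `hsize = 4.0` are illustrative assumptions; the group differences do not depend on hsize.

```python
# Coefficient estimates quoted in the answers to 7.3.
b0, b_hsize, b_hsize2 = 1028.0, 19.30, -2.19
b_female, b_black, b_female_black = -45.09, -169.81, 62.31


def predicted_sat(hsize, female, black):
    """Fitted SAT score for a given class size (in 100s) and group."""
    return (b0 + b_hsize * hsize + b_hsize2 * hsize ** 2
            + b_female * female + b_black * black
            + b_female_black * female * black)


# i) Class size that maximizes the fitted SAT score.
hsize_star = -b_hsize / (2 * b_hsize2)
print(f"optimal hsize = {hsize_star:.2f} (about {100 * hsize_star:.0f} students)")

# ii)-iv) Group differences, holding hsize fixed at an arbitrary value.
h = 4.0
gap_female = predicted_sat(h, 1, 0) - predicted_sat(h, 0, 0)
gap_black_male = predicted_sat(h, 0, 1) - predicted_sat(h, 0, 0)
gap_black_female = predicted_sat(h, 1, 1) - predicted_sat(h, 1, 0)
print(f"nonblack female vs nonblack male:  {gap_female:.2f}")
print(f"black male vs nonblack male:       {gap_black_male:.2f}")
print(f"black female vs nonblack female:   {gap_black_female:.2f}")

# iii) t-statistic for the coefficient on black.
print(f"t(black) = {b_black / 12.71:.2f}")
```

Note that this reproduces only the point estimates. The standard error of betahat4+betahat5 still requires the covariance between the two estimates, which is why the reparameterized regression is useful.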
2) Book Problems 7.5-7.6
7.5. i) Plugging in PC=1-noPC and rearranging yields
$ColGPA = (\beta_0 + \delta_0) - \delta_0\, noPC + \beta_1\, hsGPA + \beta_2\, ACT + u$
The fitted version of this is
$\widehat{ColGPA} = (1.26 + 0.157) - 0.157\, noPC + 0.447\, hsGPA + 0.008\, ACT$.
So the intercept should now be 1.417. ii) Nothing should happen to the R-squared. It should remain the same, because we’re controlling for exactly the same variables as before. We’ve just replaced PC with a linear transformation of itself. iii) No, they should not both be included, because PC + noPC = 1 for every observation, so each is an exact linear function of the other (and together they are perfectly collinear with the intercept). This violates the assumption of no perfect collinearity. Only one should be included.
To include both would be to fall into the “dummy variable trap.”
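A quick way to convince yourself that recoding PC as noPC changes nothing of substance is to compute fitted values both ways. The sketch below uses the estimates quoted in the answer to 7.5. The function names and the student’s values (hsGPA = 3.0, ACT = 24) are hypothetical, chosen only for illustration.

```python
# Estimates quoted in the answer to 7.5.
b0, d0, b1, b2 = 1.26, 0.157, 0.447, 0.008


def fitted_with_pc(pc, hsgpa, act):
    """Fitted colGPA using the original PC dummy."""
    return b0 + d0 * pc + b1 * hsgpa + b2 * act


def fitted_with_nopc(nopc, hsgpa, act):
    """Fitted colGPA after substituting PC = 1 - noPC."""
    return (b0 + d0) - d0 * nopc + b1 * hsgpa + b2 * act


# Same fitted value under either coding, for owners and non-owners alike.
for pc in (0, 1):
    a = fitted_with_pc(pc, hsgpa=3.0, act=24)
    b = fitted_with_nopc(1 - pc, hsgpa=3.0, act=24)
    print(f"PC={pc}: {a:.3f} vs {b:.3f}")

# The dummy variable trap: PC + noPC reproduces the constant column exactly.
pc_column = [0, 1, 1, 0, 1]
nopc_column = [1 - pc for pc in pc_column]
trap = all(pc + nopc == 1 for pc, nopc in zip(pc_column, nopc_column))
print("PC + noPC equals the intercept column:", trap)
```

Because the fitted values are identical, the residuals and hence the R-squared are identical too.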
7.6
I discussed this in class when discussing the use of dummy variables for program evaluation. If we think less able people are more likely to receive training, then train and ability are negatively correlated. If we think ability leads to higher wages, then the omission of a control for ability should cause the coefficient on train to be negatively biased.
The best way to fix this would be to find a variable that controls for ability and include it in the regression.
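The direction of the bias in 7.6 can be illustrated with a small simulation. Everything in it is made up for illustration: the true training effect of 0.5, the effect of ability of 1.0, the selection rule, and the sample size are all hypothetical. It is a sketch of the omitted-variable argument, not an analysis of real data.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

TRUE_EFFECT = 0.5  # hypothetical true effect of training on the wage
N = 5000

train, wage = [], []
for _ in range(N):
    ability = random.gauss(0, 1)
    # Less able people are more likely to receive training.
    t = 1 if ability + random.gauss(0, 1) < 0 else 0
    # Ability raises the wage but is NOT observed by the econometrician.
    w = 1.0 + TRUE_EFFECT * t + 1.0 * ability + random.gauss(0, 1)
    train.append(t)
    wage.append(w)


def simple_ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


slope = simple_ols_slope(train, wage)
print(f"true effect of train:        {TRUE_EFFECT}")
print(f"estimate omitting ability:   {slope:.3f}")
```

With ability omitted, the estimated coefficient on train falls well below the true effect, which is the negative bias described above.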
3) Book Problem 7.8
i) We could estimate the following model:
$\ln(wage) = \beta_0 + \beta_1 (\#joints) + \beta_2\, educ + \beta_3\, exper + \beta_4\, male + u$
ii) We could estimate the following model:
$\ln(wage) = \beta_0 + \beta_1 (\#joints) + \beta_2\, educ + \beta_3\, exper + \beta_4\, male + \beta_5\, male \cdot (\#joints) + u$
As above, note the caveats about interpreting coefficients on dummies where you have a log-linear specification.
Beta1 gives the effect of smoking an extra joint on the wage for women (as a percent change). Beta1+beta5 gives the effect of smoking an extra joint on the wage for men (as a percent change). So the difference in the effect is given by beta5. To test whether there are differences in the effects of marijuana use for men and women, we could do a t-test
on beta5, the coefficient on male*(#joints). If we reject the null that this is zero, it would be indicative of different effects of drug use for men and women.
iii) We could estimate the following model:
$\ln(wage) = \beta_0 + \beta_1\, light + \beta_2\, moderate + \beta_3\, high + \beta_4\, educ + \beta_5\, exper + \beta_6\, male + u$
Note that the reference group here is non-users. I don’t include a dummy for them, because that would be perfectly collinear with the other drug use dummies. iv) To test the null that marijuana use has no effect on the wage, an F-test would be appropriate. We could test for the joint significance of beta1, beta2, and beta3.
We could take the R-squared from the above model, and also estimate a model omitting light, moderate, and high, and use the R-squared from that (restricted) model to construct the F-stat. Numerator degrees of freedom would be q=3. Denominator degrees of freedom would be n-k-1, where k in this case is 6. At the 5% significance level, the critical F-value given a large sample would be about 2.6. If the F-stat we obtained was greater than this, we could reject the null that marijuana use has no effect on the wage. v) There are a couple of potential problems. First, people may not honestly answer questions about drug use if they are worried that someone else (e.g. their employer or police) might hear about their answer. This can cause bias under certain circumstances.
Second, drug use may be correlated with other attributes that affect the wage. Nothing against drug users, but they probably have a tendency to be less motivated and hardworking (if you prefer, more “easy going”) than non-users. If we don’t control for these other attributes that may be correlated with drug use, then we will obtain biased estimates of the effect of drug use on the wage.
Another possibility is that low-wage workers may have less to lose from a drug conviction. This could mean that low-wage workers choose to smoke more marijuana than high wage workers (simply due to different cost-benefit calculus). In such a case, the wages could actually be causing the drug use, rather than vice versa, as we’ve assumed. This would also lead to a biased measure of the causal effect of drug use on the wage.
4) Book Problem C7.2 (Computer Problem)
i) Holding other factors fixed, blacks earn a monthly salary that is 18.8% less than nonblacks. The t-statistic on the coefficient on black is -5. So this difference is clearly statistically significant.
ii) Here we need to do an F-test. We can use the R-squared formula to construct the F-statistic.
$F = \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n - k - 1)} = \frac{(0.2550 - 0.2526)/2}{(1 - 0.2550)/925} \approx 1.49$
The critical F-value for the 10% significance level with df=(2,925) is about 2.30. Our F-statistic of 1.49 is clearly less than that, so we cannot reject the null at the 10% level.
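The R-squared form of the F-statistic is easy to compute directly. A minimal sketch, using the R-squareds, q, and denominator degrees of freedom quoted above; the function name `f_stat_from_r2` is made up for illustration, and the 10% critical value of 2.30 is taken from the text rather than computed.

```python
def f_stat_from_r2(r2_ur, r2_r, q, df_denom):
    """R-squared form of the F-statistic for q exclusion restrictions.

    r2_ur: R-squared of the unrestricted model
    r2_r: R-squared of the restricted model (same sample)
    df_denom: n - k - 1 from the unrestricted model
    """
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_denom)


f_stat = f_stat_from_r2(r2_ur=0.2550, r2_r=0.2526, q=2, df_denom=925)
print(f"F = {f_stat:.2f}")  # 1.49

critical_10pct = 2.30  # value quoted in the text for df = (2, 925)
print("reject at 10%:", f_stat > critical_10pct)  # False
```

Remember that this form is only valid when both regressions use the same sample and the same dependent variable.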
To test for whether we can reject the null at the 20% significance level, we should probably use EViews. The way to directly restrict the coefficients of interest is described in Lab 5, ii.g. This will produce a p-value for the F-test which will tell you the lowest significance level at which you can reject the null. You should obtain a p-value of 0.226, which says that the 22.6% significance level is the lowest level at which you can reject the null. So we cannot reject at the 20% level either.
iii) To do this, we should estimate the model
$\ln(wage) = \beta_0 + \beta_1\, educ + \beta_2\, exper + \beta_3\, tenure + \beta_4\, married + \beta_5\, black + \beta_6\, south + \beta_7\, urban + \beta_8\, black \cdot educ + u$
As above, note the caveats about interpreting coefficients on dummies where you have a log-linear specification.
We obtain a coefficient of -0.0226, which suggests that an extra year of education has a 2.3 percentage point lower return for blacks than for non-blacks. To clarify: the return to an extra year of education for a nonblack worker is a 6.7% increase in the monthly salary. The return to an extra year of education for a black worker is a (6.7%+(-2.3%))=4.4% increase in the monthly salary.
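The return-to-education calculation above amounts to adding the interaction coefficient to the main education coefficient for black workers. A small sketch using the rounded figures quoted in the text (0.067 for educ, -0.0226 for the interaction); the function name `return_to_educ` is made up for illustration.

```python
b_educ = 0.067          # return to education for nonblacks, from the text
b_black_educ = -0.0226  # coefficient on the black*educ interaction


def return_to_educ(black):
    """Approximate proportional wage change per extra year of education."""
    return b_educ + b_black_educ * black


print(f"nonblack: {100 * return_to_educ(0):.1f}%")  # 6.7%
print(f"black:    {100 * return_to_educ(1):.1f}%")  # 4.4%
```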
To test for whether the difference between the return to education for blacks and nonblacks is statistically significant, we need to test the null hypothesis
$H_0: \beta_8 = 0$ against the alternative that it is not equal to zero. A simple t-test will suffice. t equals -1.12. This indicates that we cannot reject the null at standard significance levels (for example, the critical t-values for a 2-sided alternative at the 5% level would be ±1.96).
iv) Now we want to estimate
$\ln(wage) = \beta_0 + \beta_1\, educ + \beta_2\, exper + \beta_3\, tenure + \beta_4\, married + \beta_5\, black + \beta_6\, black \cdot married + \beta_7\, south + \beta_8\, urban + u$
If you plug in zeros and ones where they belong you can find an expression for the log(wage) for married blacks and married nonblacks. For married blacks this is
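(1) $\ln(wage) = \beta_0 + \beta_1\, educ + \beta_2\, exper + \beta_3\, tenure + \beta_4 + \beta_5 + \beta_6 + \beta_7\, south + \beta_8\, urban + u$
For married nonblacks, we get
(2) $\ln(wage) = \beta_0 + \beta_1\, educ + \beta_2\, exper + \beta_3\, tenure + \beta_4 + \beta_7\, south + \beta_8\, urban + u$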
Thus, the difference in log wage between married blacks and married nonblacks, holding everything else constant, is beta5+beta6.
To estimate this model in EViews you need to first generate a variable for the interaction term. Then estimate the model. Then obtain the coefficient estimates of beta5 and beta6. These are -0.24 and 0.06 respectively. Thus the difference in log wage between married blacks and married nonblacks is -0.24 + 0.06 = -0.18. This translates to roughly 18% lower wages for married blacks (relative to married nonblacks).
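As a check on the arithmetic, the sketch below adds the two rounded estimates quoted above and converts the log-wage gap to a percent, both roughly and exactly. The exact conversion 100·[exp(d) − 1] is the standard one for log-linear models; I am assuming it matches the calculation in the lecture notes.

```python
import math

b_black = -0.24         # rounded estimate of beta5 quoted in the text
b_black_married = 0.06  # rounded estimate of beta6 quoted in the text

# Log-wage gap between married blacks and married nonblacks.
log_gap = b_black + b_black_married
approx_pct = 100 * log_gap
exact_pct = 100 * (math.exp(log_gap) - 1)

print(f"log wage gap:        {log_gap:.2f}")      # -0.18
print(f"rough percent gap:   {approx_pct:.1f}%")  # -18.0%
print(f"exact percent gap:   {exact_pct:.1f}%")   # -16.5%
```

The gap between the rough and exact figures here is about 1.5 percentage points, which illustrates the earlier caveat about reading dummy coefficients in a log-linear specification.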