ECO420, Fall 2013, Second homework assignment.
Prof. Bill Even
The assignment is due via email submission by 5 p.m. on Wednesday 10/30 (20-point penalty per
day, or any part thereof, that it is late). Insert all your answers in this Word document, leaving the original
questions in place. Be sure to provide both the relevant Stata code and results for each answer.
1. (50 points) A data set named Mroz.dta is located in g:\eco\evenwe\eco420. The variables in the
data set are described in the file Mroz_descr.txt.
a. Estimate a log(wage) equation as a function of the person’s age, years of education, experience, and
experience squared using OLS. Re-estimate the log(wage) equation with the same controls, but treat
education as an endogenous variable by using IVREG2 (depending on which version of Stata you are
using, you may have to download IVREG2: go to help, search net resources, search for
IVREG2, and then install it). Use mother’s education and father’s education as instruments for the
person’s own education. Use outreg2 to put the OLS and IV estimates in a single table. Clearly label
the columns.
b. Based on how the coefficients change from the OLS to the IVREG2 results, what can you conclude
about the nature of the endogeneity problem? Explain.
c. IVREG2 automatically generates a Cragg-Donald Wald F statistic for “weak identification”. Explain
how this test statistic is generated. What conclusion can be drawn from the value of the statistic
in this particular empirical problem? Explain.
d. IVREG2 automatically generates a Sargan statistic. Read about the Sargan statistic in the IVREG2
help under the section titled “testing overidentifying restrictions” and in chapter 15 of Wooldridge. State
precisely what hypothesis the Sargan statistic is testing, how the test statistic is calculated, and provide a
brief description of what the results imply for this empirical problem.
e. Use the OLS and IVREG2 estimates to calculate a Hausman test of the hypothesis that education is
exogenous. (See the help for hausman in Stata.) Interpret the results.
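As a rough sketch of the mechanics for part a (a layout of the commands, not a worked answer), a do-file might look like the following. The variable names lwage, age, educ, exper, expersq, motheduc, and fatheduc and the output file name q1a are assumptions made for illustration; check Mroz_descr.txt for the actual names. ivreg2 and outreg2 are user-written commands available from SSC, and recent versions of ivreg2 also rely on the ranktest package.

```stata
* Sketch only: variable names and the output file name are ASSUMED,
* not taken from Mroz_descr.txt
ssc install ranktest    // needed by recent versions of ivreg2
ssc install ivreg2      // user-written IV estimator
ssc install outreg2     // user-written table exporter

use "g:\eco\evenwe\eco420\Mroz.dta", clear

* OLS: education treated as exogenous
regress lwage age educ exper expersq
outreg2 using q1a, word replace ctitle(OLS)

* IV: educ is endogenous; parents' education are the excluded instruments
ivreg2 lwage age exper expersq (educ = motheduc fatheduc)
outreg2 using q1a, word append ctitle(IV)
```

The endogenous regressor and its instruments go inside the parentheses; everything outside them is treated as exogenous.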
2. (50 points) For this problem, you will be using panel data extracted from the Panel Study of
Income Dynamics (PSID) for 1980 through 1997. The data set is g:\eco\evenwe\eco420\psid.dta. It
contains the following variables:
earnings = annual earnings
white, black, othrace = dummies indicating whether the person's race is white, black, or some other race
educ = years of education (-1 implies missing value; you may delete these observations)
age = age in years
married = dummy indicating whether the person is married
female = dummy indicating whether the person is female
year = year of observation
id = id number that identifies each person
It is important to note that people could be in the panel for as many as 18 years and as few as 1
year. This is referred to as an "unbalanced" panel since the number of years of observations is
not the same for all individuals.
For this problem, you will use the xtreg command in Stata. xtreg allows you to estimate random
effects and fixed effects models (among others). Before running xtreg, you must declare the
variables that index the time period and the group. That is, if the model is written as
$y_{it} = x_{it} b + v_i + u_{it}$
the t subscript is indexed by the year variable, and the i subscript is indexed by the id
variable. In this case, you would tell Stata what indexes i and t by executing
xtset id year
See the help for xtreg to learn how to estimate a fixed or random effects model.
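A minimal sketch of this workflow, run on Stata's example panel nlswork rather than on the assignment data, is given here. The variables idcode, ln_wage, ttl_exp, and msp belong to that example data set, not to psid.dta. The sketch also assumes Stata 11 or later, so that the factor-variable prefix i.year can stand in for hand-made year dummies.

```stata
* Illustration on Stata's nlswork example data, NOT on psid.dta
webuse nlswork, clear

* Declare the panel: idcode indexes i (person), year indexes t
xtset idcode year

* Pooled OLS, random effects, and fixed effects with the same regressors
regress ln_wage ttl_exp msp i.year
xtreg ln_wage ttl_exp msp i.year, re
xtreg ln_wage ttl_exp msp i.year, fe
```

The only difference between the last two commands is the re or fe option; xtset needs to be run just once per session for a given data set.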
a. Estimate earnings as a function of education, race, age, marital status, gender, and the year of
the observation (see footnote 1) using:
i. OLS
ii. random effects (RE)
iii. fixed effects (FE)
Output the results to a single table using the outreg2 command and clearly label the three
specifications.
Footnote 1: To control for year, create a dummy variable for each year. A shortcut for this is:
tabulate year, gen(ydum)
This will create dummies for each year, named ydum1 through ydum18. To include dummies 2-18 in the
regression, you can refer to them as ydum2-ydum18 rather than type out all 17 names.
b. Why are some variables automatically dropped in the FE model? Provide an econometric
justification for this.
c. Re-estimate the FE model by creating “deviations from individual specific means” (see footnote 2). Recall
that this model should not have an intercept included (see the noconstant option of regress).
Demonstrate that you get the same slope coefficients on the variables.
d. Compare the RE and FE coefficient estimates for the married variable. Given the implied
bias in the RE estimate, what does this tell you about the unobservables of married people?
e. From the FE model, generate predictions of the fixed effects (the u option of predict after
xtreg; Stata's u here is the individual effect, written as v in the model above). Compute the
correlation between the married variable and the fixed effects. Does this confirm what you
observed in part d? Why or why not?
f. Compare the standard errors of the RE and FE coefficient estimates. How do they
differ? Why should you expect this?
g. Test whether the assumptions necessary for the random effects model are appropriate (see
the help for hausman). Explain the difference between the assumptions of the RE and FE models. If the RE
assumptions are inappropriate, why is the FE model preferred? If the RE assumptions are
appropriate, why would the RE model be preferred over the FE model? (You can find more info
on the Hausman test in xtreg at http://www.stata.com/help.cgi?hausman where one of the
examples provided shows how to test the appropriateness of the random effects assumption.)
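The mechanics of the Hausman comparison follow the pattern shown in the Stata help for hausman: fit and store the FE estimates, fit the RE model, then call hausman. The sketch below uses Stata's nlswork example data, so the variable names and the stored-estimate name fixed are illustrative and not part of the assignment.

```stata
* Illustration on Stata's nlswork example data, NOT on psid.dta
webuse nlswork, clear
xtset idcode year

* Consistent whether or not the effects are correlated with the regressors
xtreg ln_wage age msp ttl_exp, fe
estimates store fixed

* Efficient under the null that the effects are uncorrelated with the regressors
xtreg ln_wage age msp ttl_exp, re

* "." refers to the most recent (RE) estimates
hausman fixed ., sigmamore
```

The consistent estimator is listed first and the efficient one second; the sigmamore option bases both covariance matrices on the disturbance variance estimate from the efficient estimator.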
Footnote 2: You can create a variable containing individual-specific means for age as follows:
* creates individual-specific means
bysort id: egen agemn = mean(age)
Now each observation will have a new variable called “agemn” that contains the individual-specific mean of age
across that person's years in the sample.
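The same egen step can be extended to every variable in a model. The sketch below, which again uses Stata's nlswork example data rather than psid.dta, demeans the dependent variable and two regressors by hand and compares the resulting slope with the one from xtreg, fe. The prefixes mean_ and dm_ and the scalar name b_fe_age are arbitrary names chosen for this illustration.

```stata
* Illustration on Stata's nlswork example data, NOT on psid.dta
webuse nlswork, clear
keep if !missing(ln_wage, age, ttl_exp)   // same sample for both regressions
xtset idcode year

* Benchmark: built-in within (fixed effects) estimator
xtreg ln_wage age ttl_exp, fe
scalar b_fe_age = _b[age]

* Manual within transformation: subtract each person's own mean
foreach v of varlist ln_wage age ttl_exp {
    bysort idcode: egen double mean_`v' = mean(`v')
    generate double dm_`v' = `v' - mean_`v'
}

* No intercept: the demeaned variables average to zero within each person
regress dm_ln_wage dm_age dm_ttl_exp, noconstant
display "difference in age slope: " _b[dm_age] - b_fe_age
```

The slopes should agree up to rounding. The standard errors reported by regress will generally be smaller than those from xtreg, fe, because regress does not deduct the estimated individual means from the degrees of freedom.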