PC Exercise 2
Dummy variables and interaction terms
Home assignment: exercises 1, 2, 3 and 4.
Due date: prior to 10am (Swiss time) on 28 October 2015.
Please email your documented solutions, including calculation/regression results, as a PDF file to
svitlana.tyahlo@unifr.ch.
Group submissions by up to four students are welcome.
If possible, please bring your own laptops for the PC sessions.
This exercise is about wage discrimination in the U.S. labor market and the use of dummy variables in
multiple regressions. Please use WAGE2.DTA for this exercise.
1. Create a variable that measures hourly wages as hwage=wage/(hours*4). Why is it important to
look at hourly wages rather than monthly earnings? Also create log hourly wages as
lhwage=ln(hwage). Calculate mean and standard deviation of hwage and lhwage.
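The construction in exercise 1 can be sketched in a few lines. The sketch below uses Python with the standard library only rather than the software used in the PC sessions, and the three (wage, hours) pairs are invented placeholder values, not rows from WAGE2.DTA. It assumes that wage is monthly earnings and hours is average weekly hours, so that hours*4 approximates monthly hours; the helper name hourly_wage is hypothetical.

```python
import math
import statistics

# Hypothetical (monthly wage, weekly hours) pairs; NOT taken from WAGE2.DTA.
sample = [(800.0, 40.0), (1200.0, 50.0), (960.0, 40.0)]


def hourly_wage(wage, hours):
    """Monthly wage divided by approximate monthly hours (hours * 4)."""
    return wage / (hours * 4)


hwage = [hourly_wage(w, h) for w, h in sample]
lhwage = [math.log(v) for v in hwage]

print("hwage:", hwage)
print("mean hwage:", statistics.mean(hwage))
print("sd hwage:", statistics.stdev(hwage))    # sample standard deviation (n - 1)
print("mean lhwage:", statistics.mean(lhwage))
print("sd lhwage:", statistics.stdev(lhwage))
```

Note that the second person earns the highest monthly wage but the same hourly wage as the third, because of longer working hours.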
2. What are the mean and the standard deviation of the hourly wage among blacks and non-blacks,
respectively?
3. Regress hwage on a constant and black.
a. What is the interpretation of the constant and coefficient on black?
b. Do blacks and non-blacks have significantly different hourly wages?
c. Does this mean that there is discrimination against blacks?
4. Create a dummy variable nonblack as 1-black. Regress hwage on a constant, black and nonblack.
Discuss the result (hint: why does R ignore one of the dummy variables?)
5. Regress hwage on black and nonblack but without a constant. What is the interpretation of the
coefficients and what is the relation to the estimates in 3.a?
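Exercises 3 to 5 rest on an algebraic fact: with a single dummy regressor, OLS reproduces group means. The following is a minimal sketch in plain Python on invented numbers (not WAGE2.DTA; the helper simple_ols is hypothetical). It illustrates that the constant equals the mean of the black=0 group, that the coefficient on black equals the difference in group means, and that black + nonblack reproduces the constant column, which is the source of the perfect collinearity in exercise 4.

```python
import statistics

# Invented hourly wages and race dummy; NOT rows from WAGE2.DTA.
hwage = [5.0, 6.0, 7.0, 4.0, 5.0]
black = [0, 0, 0, 1, 1]
nonblack = [1 - b for b in black]


def simple_ols(y, x):
    """OLS of y on a constant and one regressor x; returns (intercept, slope)."""
    ybar, xbar = statistics.mean(y), statistics.mean(x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


intercept, slope = simple_ols(hwage, black)

# Group means. A regression on black and nonblack without a constant returns
# exactly these two numbers, because the two dummies never overlap.
mean_black = statistics.mean([y for y, b in zip(hwage, black) if b == 1])
mean_nonblack = statistics.mean([y for y, b in zip(hwage, black) if b == 0])

# Dummy variable trap: black + nonblack equals the constant column in every row.
collinear = all(b + nb == 1 for b, nb in zip(black, nonblack))

print("intercept:", intercept, "slope:", slope)
print("mean nonblack:", mean_nonblack, "mean black:", mean_black)
print("constant = black + nonblack in every row:", collinear)
```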
6. Regress lhwage on a constant and all available explanatory variables (hint: all variables in the data
except wage, lwage, hours, hwage, nonblack).
a. Which of the regressors are dummy variables?
b. What is the reference group in this regression relative to which all coefficients have to be
interpreted?
c. What is the predicted hourly wage of an unmarried black living in a rural area in the south
with mean values of all other variables?
d. How would the predicted hourly wage of the same person change if he married?
e. Is there evidence of discrimination against blacks in the U.S. labor market? Discuss!
7. You want to know whether the returns to education differ between blacks and nonblacks.
a. What can you do to answer this question?
b. Perform this regression and interpret the result.
c. Discuss whether the interpretation of the coefficients on any of the variables from the model in
exercise 6 changes.
d. What happens if you exclude black from this regression? Discuss.
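One standard way to approach exercise 7 is to add an interaction between black and years of education to the model from exercise 6. The sketch below assumes the education variable is called educ (the sheet does not name it), and the β symbols are introduced here for illustration only.

```latex
\[
\mathit{lhwage} = \beta_0 + \beta_1\,\mathit{educ} + \beta_2\,\mathit{black}
  + \beta_3\,(\mathit{black}\times\mathit{educ}) + \text{other regressors} + u
\]
\[
\frac{\partial\,\mathit{lhwage}}{\partial\,\mathit{educ}} =
\begin{cases}
\beta_1 & \text{if } \mathit{black}=0,\\
\beta_1+\beta_3 & \text{if } \mathit{black}=1.
\end{cases}
\]
```

Under this parameterisation, the question of whether returns to education differ by race corresponds to the hypothesis that β3 equals zero.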
8. You also want to know whether the coefficient of any of the other regressors differs between
blacks and nonblacks.
a. What can you do to answer this question?
b. Perform this regression and interpret the result.
c. Test whether there is model heterogeneity by race.
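For exercise 8.c, a joint F test of all race interaction terms is one common choice. As a reminder, the generic statistic is shown below; the notation is introduced here and is not taken from the sheet. SSR_r is the sum of squared residuals of the restricted model (no interactions), SSR_ur that of the unrestricted model (all interactions), q the number of restrictions, n the sample size and k the number of slope coefficients in the unrestricted model.

```latex
\[
F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)}
\]
```

This form of the statistic relies on homoskedastic errors; with heteroskedasticity-robust standard errors, a robust Wald test would typically be used instead.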
9. You would like to estimate the association between monthly earnings and age, education,
white/blue collar worker, number of children and gender using an OLS regression. For this
purpose, you rely on the following data set (1120 observations) consisting of the following
variables:
Variable name    Description
earnings         Monthly earnings in EUR
age              Age in years
edu              Education in years
whitecollar      Dummy for being a white collar worker (1=white collar, 0=blue collar)
child0           Dummy for having no kids
child1           Dummy for having one kid
child2plus       Dummy for having two or more kids
male             Dummy for being male
mchild0          Interaction between male and child0
mchild1          Interaction between male and child1
mchild2plus      Interaction between male and child2plus
Assume that the assumptions required for unbiasedness of OLS, namely MLR.1-MLR.4, are satisfied
for this dataset.
a. Applying OLS with robust standard errors, you obtain the following regression output:
Interpret the coefficients on “age”, “edu”, and “male” and also discuss whether they are
significantly different from zero.
b. The regression shown under 9.a includes a set of interaction terms between the dummy
indicating whether a person is male and the dummies for the number of children. Interpret
the coefficients of the constant and of the interaction terms “mchild0” and “mchild1”.
c. Discuss whether the specification under 9.a allows for different marginal effects of having
higher numbers of children (e.g. two versus three children) and why this is (not) the case.
d. A colleague claims that you have omitted the interaction term “mchild2plus” in your
regression under 9.a. However, another colleague claims that including “mchild2plus”
would induce perfect multicollinearity with another variable. Discuss which argument is
correct and give an explanation.
10. Consider the following linear model:
For education equal to zero, which parameter(s) is (are) needed to predict the expected wage
for (1) young men, (2) old men, (3) young women, (4) old women?
11. What are the standard OLS assumptions for a multiple linear regression model? Discuss their
implications and potential violations.
12. Explain the concept of heteroskedasticity and its implications for the consistency of point
estimates and standard errors. (Hint: distinguish between consistency and efficiency of the OLS
estimator.)
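To make the distinction in exercise 12 concrete, the following sketch compares the conventional standard error of a slope with a heteroskedasticity-robust one in a simple regression. It is written in plain Python on invented data. The robust variant shown is White's HC0 estimator, which is a choice made here for illustration; the sheet does not say which robust estimator is used in exercise 9.

```python
import math
import statistics

# Invented data for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 6.0]

n = len(x)
xbar, ybar = statistics.mean(x), statistics.mean(y)
sxx = sum((xi - xbar) ** 2 for xi in x)
slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
intercept = ybar - slope * xbar
resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

# Conventional variance of the slope: assumes one common error variance.
sigma2 = sum(u ** 2 for u in resid) / (n - 2)
se_conventional = math.sqrt(sigma2 / sxx)

# White (HC0) variance of the slope: each squared residual is weighted by
# its own squared deviation (x - xbar)^2 instead of a common sigma^2.
se_robust = math.sqrt(
    sum(((xi - xbar) ** 2) * (u ** 2) for xi, u in zip(x, resid)) / sxx ** 2
)

print("slope:", slope)
print("conventional SE:", se_conventional)
print("HC0 robust SE:", se_robust)
```

The slope estimate is identical in both cases; only the estimated standard error changes.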