PC Exercise 2
Dummy variables and interaction terms

Home assignment: 1, 2, 3, 4.
Due date: prior to 10am (Swiss time) on 28 October 2015.
Please email your documented solutions, including calculation/regression results, as a PDF file to: svitlana.tyahlo@unifr.ch. Group papers by at most 4 students are welcome. If possible, please bring your own laptops to the PC sessions.

This exercise is about wage discrimination in the U.S. labor market and the use of dummy variables in multiple regressions. Please use WAGE2.DTA for this exercise.

1. Create a variable that measures hourly wages as hwage=wage/(hours*4). Why is it important to look at hourly wages rather than monthly earnings? Also create log hourly wages as lhwage=ln(hwage). Calculate the mean and standard deviation of hwage and lhwage.

2. What are the mean and the standard deviation of the hourly wage among blacks and non-blacks, respectively?

3. Regress hwage on a constant and black.
   a. What is the interpretation of the constant and of the coefficient on black?
   b. Do blacks and non-blacks have significantly different hourly wages?
   c. Does this mean that there is discrimination against blacks?

4. Create a dummy variable nonblack as 1-black. Regress hwage on a constant, black and nonblack. Discuss the result (hint: why does R ignore one of the dummy variables?).

5. Regress hwage on black and nonblack but without a constant. What is the interpretation of the coefficients and how do they relate to the estimates in 3.a?

6. Regress lhwage on a constant and all available explanatory variables (hint: all variables in the data except wage, lwage, hours, hwage, nonblack).
   a. Which of the regressors are dummy variables?
   b. What is the reference group in this regression, relative to which all coefficients have to be interpreted?
   c. What is the predicted hourly wage of an unmarried black person living in a rural area in the south, with mean values of all other variables?
   d. How would the predicted hourly wage of the same person change if he were married?
   e. Is there evidence of discrimination against blacks in the U.S. labor market? Discuss!

7. You want to know whether the returns to education differ between blacks and non-blacks.
   a. What can you do to answer this question?
   b. Perform this regression and interpret the result.
   c. Discuss whether the interpretation of the coefficients of any of the variables in the model from exercise 6 changes.
   d. What happens if you exclude black from this regression? Discuss.

8. You also want to know whether the coefficient of any of the other regressors differs between blacks and non-blacks.
   a. What can you do to answer this question?
   b. Perform this regression and interpret the result.
   c. Test whether there is model heterogeneity by race.

9. You would like to estimate the association between monthly earnings and age, education, white/blue collar worker status, number of children and gender using an OLS regression. For this purpose, you rely on a data set (1120 observations) consisting of the following variables:

   Variable name   Description
   earnings        Monthly earnings in EUR
   age             Age in years
   edu             Education in years
   whitecollar     Dummy for being a white collar worker (1=white collar, 0=blue collar)
   child0          Dummy for having no kids
   child1          Dummy for having one kid
   child2plus      Dummy for having two or more kids
   male            Dummy for being male
   mchild0         Interaction between male and child0
   mchild1         Interaction between male and child1
   mchild2plus     Interaction between male and child2plus

   Assume that the assumptions required for unbiasedness of OLS, namely MLR.1-MLR.4, are satisfied for this data set.
   a. Applying OLS with robust standard errors, you obtain the following regression output:
      Interpret the coefficients on "age", "edu", and "male" and also discuss whether they are significantly different from zero.
   b. The regression shown under 9.a includes a set of interaction terms between the dummy indicating whether a person is male and the dummies for the number of children. Interpret the coefficients of the constant and of the interaction terms "mchild0" and "mchild1".
   c. Discuss whether the specification under 9.a allows for different marginal effects of having higher numbers of children (e.g. two versus three children) and why this is (not) the case.
   d. A colleague claims that you have omitted the interaction term "mchild2plus" in your regression under 9.a. However, another colleague claims that including "mchild2plus" would induce perfect multicollinearity with another variable. Discuss which argument is correct and give an explanation.

10. Consider the following linear model:
    For education being equal to zero, which parameter(s) is (are) needed to predict the expected wage for (1) young men, (2) old men, (3) young women, (4) old women?

11. What are the standard OLS assumptions for a multiple linear regression model? Discuss their implications and potential violations.

12. Explain the concept of heteroskedasticity and its implications for the consistency of point estimates and standard errors (hint: distinguish between consistency and efficiency of the OLS estimator).
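
Exercises 3 to 5 can be tried on simulated data before working with WAGE2.DTA. The following is a minimal R sketch under that assumption: the data are randomly generated, and the numerical values (sample size, group share, wage levels) are invented for illustration. Only the variable names hwage, black and nonblack come from the exercise sheet.

```r
# Simulated stand-in for WAGE2.DTA; all numbers are hypothetical.
set.seed(1)
n <- 200
black <- rbinom(n, 1, 0.3)            # dummy: 1 = black, 0 = non-black
hwage <- 6 - 1.5 * black + rnorm(n)   # invented hourly wage process
nonblack <- 1 - black                 # complementary dummy, as in exercise 4

# Exercise 3: with a constant and one dummy, the constant is the mean of the
# reference group (non-blacks) and the slope is the difference in group means.
m3 <- lm(hwage ~ black)

# Exercise 4: black + nonblack equals the constant for every observation,
# so the regressors are perfectly collinear and lm() returns NA for one of them.
m4 <- lm(hwage ~ black + nonblack)

# Exercise 5: without a constant the collinearity disappears and each
# coefficient is the mean of its own group.
m5 <- lm(hwage ~ 0 + black + nonblack)

# Group means for comparison; names are "0" (non-black) and "1" (black).
group_means <- tapply(hwage, black, mean)

print(coef(m3))
print(coef(m4))
print(coef(m5))
print(group_means)
```

With the real data, the same lm() calls would apply once WAGE2.DTA has been loaded into a data frame, for example with a Stata import function such as haven::read_dta (an assumption about the setup, not something the exercise sheet prescribes).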