Economics 231W, Econometrics
University of Rochester
Fall 2008
Homework: Chapter 10

Text Problems: 10.1, 10.5, 10.8, 10.19, 10.24

10.1.
(a) and (b) These are variables that cannot be quantified on a cardinal scale. They usually denote the possession or nonpossession of an attribute, such as nationality, religion, sex, color, etc.
(c) Regression models in which explanatory variables are qualitative are known as ANOVA models.
(d) Regression models in which one or more explanatory variables are quantitative, although others may be qualitative, are known as ANCOVA models.
(e) In a regression model with an intercept, if a qualitative variable has m categories, one must introduce only (m - 1) dummy variables. If we introduce m dummies in such a model, we fall into the dummy variable trap, that is, we cannot estimate the parameters of such models because of perfect (multi)collinearity.
(f) They tell whether the average value of the dependent variable varies from group to group.
(g) If the rate of change of the mean value of the dependent variable varies between categories, the differential slope dummies will point that out.

10.5.
(a) False. Letting D take the values of (0, 2) will halve both the estimated B2 and its standard error, leaving the t ratio unchanged.
(b) False. Since the dummy variables do not violate any of the assumptions of OLS, the estimators obtained by OLS are unbiased in small as well as large samples.

10.8.
(a) The coefficient -0.1647 is the own-price elasticity, 0.5115 is the income elasticity, and 0.1483 is the cross-price elasticity.
(b) It is inelastic because, in absolute value, the coefficient is less than one.
(c) Since the cross-price elasticity is positive, coffee and tea are substitute products.
(d) and (e) The trend coefficient of -0.0089 suggests that over the sample period coffee consumption had been declining at the quarterly rate of 0.89 percent. Among other things, the side effects of caffeine may have something to do with the decline.
(f) 0.5115.
(g) The estimated t value of the income elasticity coefficient is 1.23, which is not statistically significant. Therefore, it does not make much sense to test the hypothesis that it is not different from one.
(h) The dummies here perhaps represent seasonal effects, if any.
(i) Each dummy coefficient tells by how much the average value of ln Q is different from that of the base quarter, which is the fourth quarter. The actual values of the intercepts in the various quarters are, respectively, 1.1828, 1.1219, 1.2692, and 1.2789. Taking the antilogs of these values, we obtain 3.2635, 3.0707, 3.5580, and 3.5927 as the average pounds of coffee consumed per capita in the first, second, third, and fourth quarters, holding the logs of all explanatory variables at zero.
Note: On the general interpretation of the dummy variables in a semi-log model, see Robert Halvorsen and Raymond Palmquist, "The Interpretation of Dummy Variables in Semilogarithmic Equations," The American Economic Review, vol. 70 (June 1980), no. 3, pp. 474-475.
(j) The coefficients of D1 and D2 are individually statistically significant.
(k) That seems to be the case in quarters one and two. Among other things, coffee prices and weather may have something to do with the observed seasonal pattern in these two quarters.
(l) The benchmark is the fourth quarter. If we choose another quarter for the base, the numerical values of the dummy coefficients will change.
(m) The implicit assumption is that the partial slope coefficients do not change among quarters.
(n) We can incorporate differential slope dummies as follows:

$$
\begin{aligned}
\ln Q ={} & B_1 + B_2 \ln P + B_3 \ln I + B_4 \ln P' + B_5 T \\
& + B_6 D_1 + B_7 D_2 + B_8 D_3 \\
& + B_9 (D_1 \ln P) + B_{10} (D_2 \ln P) + B_{11} (D_3 \ln P) \\
& + B_{12} (D_1 \ln I) + B_{13} (D_2 \ln I) + B_{14} (D_3 \ln I) \\
& + B_{15} (D_1 \ln P') + B_{16} (D_2 \ln P') + B_{17} (D_3 \ln P') + u
\end{aligned}
$$

Note: The subscript "t" has been omitted to avoid cluttering the equation. Here P is the price of coffee, I is income, and P' is the price of tea. The first two rows of the equation are the same as in the text. The differential slope dummies are in the last three rows.
(o) One could estimate the model given in (n). If there are other substitutes for coffee, they can be brought into the model.

10.19.
(a) Based on the 19 observations, the EViews regression results are:

Dependent Variable: NDIV
Sample: 1999:1 2003:3

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C            248.8055      31.89255     7.801368      0.0000
ATPROFITS    0.206553      0.049390     4.182100      0.0006

R-squared    0.507103

As these results show, there is a statistically significant positive relationship between the two variables, an unsurprising finding.
(b), (c), and (d) We can introduce three dummies to distinguish the four quarters and can also interact them with the profits variable. This exercise yielded no satisfactory results, since both the dummies and the interaction terms were completely insignificant, suggesting that perhaps there is no seasonality involved. This makes sense, for most corporations do not change their dividends from quarter to quarter. There seems to be no reason to model seasonality explicitly in the present case.

10.24.
From Table 10.10 we observe that of the 40 observations, 6 have negative predicted values and 6 have predicted values in excess of 1. Hence, there are 12 incorrect predictions. Therefore, Count R² = 28 / 40 = 0.7000. The conventional R² value is 0.8047.

Other Problems

1. Suppose you have been hired by a union that wants to convince workers in local dry-cleaning establishments that joining the union will improve their well-being. As your first assignment, your boss asks you to build a model of wages for dry-cleaning workers that measures the impact of union membership on those wages.
Your first equation (standard errors in parentheses) is:

Wi = -11.40 + 0.30 Agei - 0.003 Agei² + 1.00 Edui + 1.20 Di
se = (0.10) for Agei, (0.002) for Agei², (0.20) for Edui, (1.00) for Di
n = 34, R² = .14, F = 24.2

where:
Wi = the hourly wage in dollars of the ith worker
Agei = the age of the ith worker
Edui = the number of years of education of the ith worker
Di = a dummy variable = 1 if the ith worker is a union member, 0 otherwise.

a. Interpret the regression results. How do the signs compare with your expectations?
The intercept has no real economic meaning. If age increases by 1 year, average hourly wage increases by $0.30. If Age² increases by one unit, average hourly wage decreases by $0.003; since Age and Age² move together, the net effect of one more year of age is 0.30 - 0.006 Age dollars. If education increases by one year, average hourly wage increases by $1.00. If the worker is a member of a union, the average hourly wage increases by $1.20. Each effect holds the other variables constant.

b. Using a two-sided t-test at the 5% significance level, determine whether the coefficients on the independent variables are statistically significant.
H0: Bk = 0 and H1: Bk ≠ 0
tk = (bk - Bk) / se(bk), with n - k degrees of freedom
tAge = (0.30 - 0)/0.10 = 3
tAge² = (-0.003 - 0)/0.002 = -1.5
tEdu = (1.00 - 0)/0.20 = 5
tD = (1.20 - 0)/1.00 = 1.2
The t critical value with 34 - 5 = 29 degrees of freedom and α = 0.05 is 2.045. Therefore reject H0 for both Age and Education; that is, they are both statistically significantly different from zero. Fail to reject H0 for Age² and D, whose t-statistics are smaller than 2.045 in absolute value; that is, they are not statistically significantly different from zero.

c. What relationship between Age and W does the above result imply? Why doesn't the inclusion of Age and Age² violate the classical assumption of no perfect multicollinearity?
This implies a nonlinear relationship between wage and age (wages rise with age, but at a decreasing rate).
[Figure: Wage plotted against Age]
This does not violate the assumption of no perfect multicollinearity because Age² is a nonlinear function of Age; the assumption rules out only exact linear relationships among the explanatory variables.

d. On the basis of the regression results, should the workers be convinced that joining the union will improve their well-being? Why or why not?
No, because the coefficient on the dummy variable is not statistically significant.

e. Test the hypothesis that the coefficients are jointly significant at the 5 percent level.
H0: B2 = B3 = B4 = B5 = 0 and H1: at least one of B2, B3, B4, B5 is not zero
Or H0: R² = 0 and H1: R² > 0
The calculated F-statistic is 24.2, which is greater than the F critical value of 2.69 with 4 d.f. in the numerator and 29 d.f. in the denominator at α = 0.05 (I used 30 d.f. in the denominator). Therefore reject H0 and conclude that the coefficients are jointly statistically significantly different from zero.

2. Suppose you want to estimate the effect of gender and education on wages using the following regression:

ln(wage) = B1 + B2 D1i + B3 (educationi) + B4 (educationi*D1i) + ui

where:
ln(wage) = the natural log of the hourly wage of the ith individual
D1 = 1 if female, 0 otherwise
Education = years of education

You have a sample of 274 men and 252 women, for 526 observations in total. The average wage for men in the sample is $7.10/hr and for women the average wage is $4.59/hr. The results of the regression are:

ln(wage) = .389 - .227 D1i + .082 (educationi) - .0056 (educationi*D1i)
(se) = (0.023) for D1i, (0.008) for educationi, (0.0014) for educationi*D1i
n = 526, Adjusted R² = .441

a. Using a two-sided t-test at the 1% significance level, determine whether the coefficients on the independent variables are statistically significant.
H0: Bk = 0 and H1: Bk ≠ 0
tk = (bk - Bk) / se(bk), with n - k degrees of freedom
tD = (-0.227 - 0)/0.023 = -9.87
tedu = (0.082 - 0)/0.008 = 10.25
tedu*D = (-0.0056 - 0)/0.0014 = -4.00
The t critical value with 526 - 4 = 522 degrees of freedom and α = 0.01 is 2.576. Every t-statistic exceeds this value in absolute value. Therefore reject H0 and conclude that all of the coefficients are statistically significantly different from zero.

b. Do the results indicate a difference in average wages between males and females holding education constant? How do you know?
Yes, because the coefficient on the dummy variable is statistically significantly different from zero.

c. What is the return to an additional year of education for males? What is the return to an additional year of education for females? Is the difference statistically significant? How do you know?
For males the return to education is the slope coefficient on education, 0.082 (about 8.2 percent per year). For females the return to education is the slope coefficient on education plus the differential slope coefficient: 0.082 - 0.0056 = 0.0764. Yes, the difference is statistically significant because the differential slope coefficient is statistically significantly different from zero.

d. Sketch a graph of the results.
[Figure: Wage plotted against Edu, with separate lines labeled Male and Female]

3. Use the "Crime Data" from my website, which has arrest information on 500 men in California in 1986, to estimate the following LPM of arrests:

Arrestedi = b1 + b2 PPCi + b3 Empi + ei

where:
Arrestedi = 1 if the ith man was arrested in 1986, 0 otherwise
PPCi = the proportion of prior arrests that led to conviction for the ith man
Empi = the number of quarters the ith man was employed in 1986

a. What are your expectations for the signs on PPC and EMP? Why?
If incarceration works, then the coefficient on PPC should be negative; but if incarceration does not work and there is a class of career criminals, then it will be positive. We would expect the coefficient on EMP to be negative, since an individual who is employed is less likely to be arrested.

b. Report the regression results and the usual statistics in an appropriate format.

Regression Statistics
R Square             0.021
Adjusted R Square    0.017
Standard Error       0.448
Observations         500

ANOVA
              df     SS         MS       F        Significance F
Regression    2      2.149      1.074    5.343    0.005
Residual      497    99.953     0.201
Total         499    102.102

                                   Coefficients   Standard Error   t Stat    P-value
Intercept                          0.398          0.040            9.934     0.000
Proportion of Prior Convictions    -0.110         0.059            -1.870    0.062
Number of Quarters Employed        -0.031         0.012            -2.590    0.010

c. Test whether the coefficients on PPC and EMP are statistically significantly different from zero at the 10 percent level (two-tailed).
Both calculated t-statistics above are greater in absolute value than the t critical value of 1.645. Therefore reject the null and conclude that both coefficients are statistically significantly different from zero.

d. Interpret the regression equation.
Because this is a linear probability model, the coefficients are changes in the probability of arrest. An increase of 0.10 (10 percentage points) in the proportion of prior arrests that led to conviction reduces the probability of being arrested by about 0.011, or 1.1 percentage points. Each additional quarter of employment reduces the probability of arrest by about 0.031, or 3.1 percentage points.
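
As a check on the arithmetic in Problem 3, the summary statistics can be recomputed from the sums of squares, coefficients, and standard errors reported in the output. The following is only a sketch: the variable names are illustrative and not part of the original assignment, and because the inputs are the rounded values from the table, the recomputed t-statistics differ slightly from the reported ones.

```python
import math

# Values copied from the Problem 3 regression output (rounded as reported).
n_obs = 500
ss_regression, df_regression = 2.149, 2
ss_residual, df_residual = 99.953, 497

# (label, coefficient, standard error); labels are illustrative shorthand.
coefficients = [
    ("Intercept", 0.398, 0.040),
    ("PPC", -0.110, 0.059),
    ("EMP", -0.031, 0.012),
]

# Two-tailed 10% critical value used in part (c).
T_CRITICAL_10PCT = 1.645

# ANOVA quantities: mean squares, F, R-squared, adjusted R-squared.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual
r_squared = ss_regression / (ss_regression + ss_residual)
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual
std_error = math.sqrt(ms_residual)

# t = (b - 0) / se(b); compare |t| with the critical value.
t_stats = {name: b / se for name, b, se in coefficients}
significant = {name: abs(t) > T_CRITICAL_10PCT for name, t in t_stats.items()}

print(f"F = {f_stat:.3f}, R2 = {r_squared:.3f}, adj. R2 = {adj_r_squared:.3f}")
for name, t in t_stats.items():
    print(f"{name}: t = {t:.3f}, significant at 10%: {significant[name]}")
```

The same pattern (coefficient divided by standard error, compared in absolute value with the critical value) applies to the t-tests in Problems 1 and 2.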