Economics 231W, Econometrics

University of Rochester
Fall 2008
Homework: Chapter 10
Text Problems: 10.1, 10.5, 10.8, 10.19, 10.24
10.1 (a) and (b) These are variables that cannot be quantified on a
cardinal scale.
They usually denote the possession or nonpossession of an attribute,
such as nationality, religion, sex, color, etc.
(c) Regression models in which explanatory variables are qualitative
are known as ANOVA models.
(d) Regression models in which one or more explanatory variables
are quantitative, although others may be qualitative, are known as
ANCOVA models.
(e) In a regression model with an intercept, if a qualitative variable
has m categories, one must introduce only (m – 1) dummy variables.
If we introduce m dummies in such a model, we fall into the dummy
variable trap, that is, we cannot estimate the parameters of such
models because of perfect (multi)collinearity.
(f) They tell whether the average value of the dependent variable
varies from group to group.
(g) If the rate of change of the mean value of the dependent variable
varies between categories, the differential slope dummies will point
that out.
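The dummy variable trap described in (e) can be illustrated with a small sketch. The quarterly layout, the sample, and the helper name `design_row` are assumptions made only for this example. The point is that with an intercept and all m dummies, the dummy columns sum to the intercept column in every row, which is an exact linear dependence.

```python
# Illustrative only: a qualitative variable with m = 4 categories (quarters).
QUARTERS = [1, 2, 3, 4]

def design_row(quarter, n_dummies):
    """Return [intercept, D1, ..., Dn] for one observation."""
    dummies = [1 if quarter == q else 0 for q in QUARTERS[:n_dummies]]
    return [1] + dummies

sample = [1, 2, 3, 4, 1, 2, 3, 4]  # hypothetical observations

# m dummies plus an intercept: the dummies add up to the intercept column.
trap = [design_row(q, 4) for q in sample]
trap_dependent = all(sum(row[1:]) == row[0] for row in trap)

# m - 1 dummies plus an intercept: fourth-quarter rows break that dependence.
ok = [design_row(q, 3) for q in sample]
ok_dependent = all(sum(row[1:]) == row[0] for row in ok)

print(trap_dependent)  # the dependence holds in every row
print(ok_dependent)    # the dependence fails once the base category is dropped
```

Dropping one dummy makes the omitted category the base against which the others are compared.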
10.5 (a) False. Letting D take the values of (0, 2) will halve both the
estimated B2 and its standard error, leaving the t ratio unchanged.
(b) False. Since the dummy variables do not violate any of the
assumptions of OLS, the estimators obtained by OLS are unbiased
in small as well as large samples.
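The claim in (a) can be checked numerically. The following is a minimal sketch using made-up data and a hand-written two-variable OLS routine (`simple_ols` is a hypothetical helper, not something from the text). It fits the same regression with the dummy coded (0, 1) and then (0, 2).

```python
import math

def simple_ols(x, y):
    """Slope, its standard error, and t ratio for y = b1 + b2*x + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    se_b2 = math.sqrt(rss / (n - 2) / sxx)
    return b2, se_b2, b2 / se_b2

# Hypothetical data, for illustration only.
y = [10.0, 12.0, 11.0, 15.0, 16.0, 14.0]
d01 = [0, 0, 0, 1, 1, 1]
d02 = [2 * d for d in d01]  # the same dummy coded (0, 2)

b_01, se_01, t_01 = simple_ols(d01, y)
b_02, se_02, t_02 = simple_ols(d02, y)
print(b_01, b_02)    # slope under each coding
print(se_01, se_02)  # standard error under each coding
print(t_01, t_02)    # t ratio under each coding
```

Doubling the regressor multiplies Sxy by 2 and Sxx by 4, so the slope and its standard error are both halved and their ratio is unchanged.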
10.8 (a) The coefficient -0.1647 is the own-price elasticity, 0.5115 is the
income elasticity, and 0.1483 is the cross-price elasticity.
(b) It is inelastic because, in absolute value, the coefficient is less
than one.
(c) Since the cross-price elasticity is positive, coffee and tea are
substitute products.
(d) and (e) The trend coefficient of -0.0089 suggests that over the
sample period coffee consumption had been declining at the
quarterly rate of 0.89 percent. Among other things, the side effects of
caffeine may have something to do with the decline.
(f) 0.5115.
(g) The estimated t value of the income elasticity coefficient is 1.23,
which is not statistically significant. Therefore, it does not make
much sense to test the hypothesis that it is not different from one.
(h) The dummies here perhaps represent seasonal effects, if any.
(i) Each dummy coefficient tells by how much the average value of
ln Q is different from that of the base quarter, which is the fourth
quarter. The actual values of the intercepts in the various quarters
are, respectively, 1.1828, 1.1219, 1.2692, and 1.2789. Taking the
antilogs of these values, we obtain: 3.2635, 3.0707, 3.5580, and
3.5927 as the average pounds of coffee consumed per capita in the
first, second, third, and the fourth quarter, holding the values of the
logs of all explanatory variables zero.
Note: On the general interpretation of the dummy variables in a
semi-log model, see Robert Halvorsen and Raymond Palmquist,
"The Interpretation of Dummy Variables in Semilogarithmic
Equations," The American Economic Review, vol. 70 (June 1980),
no.3, pp. 474-475.
(j) The dummy coefficients D1 and D2 are individually statistically
significant.
(k) That seems to be the case in quarters one and two. Among other
things, coffee prices and weather may have something to do with the
observed seasonal pattern in these two quarters.
(l) The benchmark is the fourth quarter. If we choose another
quarter for the base, the numerical values of the dummy coefficients
will change.
(m) The implicit assumption that is made is that the partial slope
coefficients do not change among quarters.
(n) We can incorporate differential slope dummies as follows:
ln Q = B1 + B2 ln P + B3 ln I + B4 ln P' + B5 T
       + B6 D1 + B7 D2 + B8 D3
       + B9 (D1 ln P) + B10 (D2 ln P) + B11 (D3 ln P)
       + B12 (D1 ln I) + B13 (D2 ln I) + B14 (D3 ln I)
       + B15 (D1 ln P') + B16 (D2 ln P') + B17 (D3 ln P') + u
Note: The subscript “t” has been omitted to avoid cluttering the
equation. The first two rows of the equation are the same as in the
text. The differential slope dummies are in the last three rows.
(o) One could estimate the model given in (n). If there are other
substitutes for coffee, they can be brought in the model.
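The arithmetic in part (i) can be reproduced with a short sketch. It takes the four quarterly intercepts quoted there, computes their antilogs, and backs out each differential intercept relative to the fourth quarter. The percent column uses the 100(e^b – 1) conversion for a dummy coefficient in a semi-log model, in the spirit of the Halvorsen and Palmquist note. The variable names are assumptions made for this example.

```python
import math

# Quarterly intercepts of ln Q from part (i); the fourth quarter is the base.
intercepts = {"Q1": 1.1828, "Q2": 1.1219, "Q3": 1.2692, "Q4": 1.2789}
base = intercepts["Q4"]

# Antilog of each intercept (average pounds per capita when all logs are zero).
levels = {q: math.exp(v) for q, v in intercepts.items()}

# Differential intercepts, i.e. the implied dummy coefficients.
differentials = {q: v - base for q, v in intercepts.items()}

for q in intercepts:
    pct = (math.exp(differentials[q]) - 1) * 100  # percent difference from Q4
    print(f"{q}: antilog = {levels[q]:.4f}, "
          f"differential = {differentials[q]:+.4f}, percent vs Q4 = {pct:+.2f}")
```

The base quarter shows a differential of zero by construction, which is a quick check that the benchmark has been set correctly.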
10.19 (a) Based on the 19 observations, the EViews regression results are:
Dependent Variable: NDIV
Sample: 1999:1 2003:3

Variable      Coefficient   Std. Error   t-Statistic   Prob.
Intercept     248.8055      31.89255     7.801368      0.0000
Profits       0.206553      0.049390     4.182100      0.0006
As these results show, there is a statistically significant positive
relationship between the two variables, an unsurprising finding.
(b), (c), and (d) We can introduce three dummies to distinguish four
quarters and can also interact them with the profits variable. This
exercise yielded no satisfactory results, since both the dummies and
interaction terms were completely insignificant, suggesting that
perhaps there is no seasonality involved. This makes sense, for most
corporations do not change their dividends from quarter to quarter. It
seems that there is no reason to consider explicitly seasonality in the
present case.
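The specification described in (b) through (d) can be sketched as a design-matrix row. This is only an illustration: the helper name `seasonal_row`, the choice of the fourth quarter as the base, and the profit figures are all assumptions, not values from the data set.

```python
def seasonal_row(quarter, profits):
    """Regressors for one observation: intercept, profits, three
    differential intercept dummies, and three differential slope
    (dummy x profits) terms. The fourth quarter is the assumed base."""
    dummies = [1 if quarter == q else 0 for q in (1, 2, 3)]
    interactions = [d * profits for d in dummies]
    return [1, profits] + dummies + interactions

# Hypothetical (quarter, profits) pairs, for illustration only.
observations = [(1, 100.0), (2, 110.0), (3, 105.0), (4, 120.0)]
rows = [seasonal_row(q, p) for q, p in observations]
for row in rows:
    print(row)
```

A base-quarter row carries zeros in all six seasonal columns, so its fitted value depends only on the common intercept and the common profits slope.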
10.24 From Table 10.10 we observe that of the 40 observations, 6
observations have negative predicted values and 6 have predicted
values in excess of 1. Hence, there are 12 incorrect predictions.
Count R2 = 28 / 40 = 0.7000.
The conventional R2 value is 0.8047.
Other Problems
1. Suppose you have been hired by a union that wants to convince workers in local dry-cleaning
establishments that joining the union will improve their well-being. As your first assignment, your
boss asks you to build a model of wages for dry-cleaning workers that measures the impact of union
membership on those wages. Your first equation (standard errors in parentheses) is:
Wi = -11.40 + 0.30Agei – 0.003Agei^2 + 1.00Edui + 1.20Di
              (0.10)     (0.002)        (0.20)     (1.00)
n = 34, R2 = .14, F = 24.2
Where: Wi = the hourly wage in dollars of the ith worker
Agei = the age of the ith worker
Edui = the number of years of education of the ith worker
Di = a dummy variable = 1 if the ith worker is a union member, 0 otherwise.
Interpret the regression results. How do the signs compare with your expectations?
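The intercept has no real economic meaning.
Because age enters as a quadratic, the two age coefficients must be read together: holding education and
union status constant, one more year of age changes the average hourly wage by 0.30 – 2(0.003)Age dollars.
Wages therefore rise with age at a decreasing rate, peaking at 0.30/(2 × 0.003) = 50 years.
If education increases by one year, average hourly wage increases by $1.00, holding the other variables constant.
If the worker is a member of a union, the average hourly wage is $1.20 higher than for an otherwise similar
nonmember.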
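Since Age and Age squared move together, the effect of one more year of age is the derivative of the fitted equation with respect to Age, 0.30 – 2(0.003)Age. The sketch below evaluates it at a few ages and finds the age at which the fitted wage peaks. The function name and the ages chosen are assumptions made for illustration.

```python
B_AGE = 0.30       # coefficient on Age
B_AGE_SQ = -0.003  # coefficient on Age squared

def marginal_effect_of_age(age):
    """Change in the fitted hourly wage (dollars) per additional year of age."""
    return B_AGE + 2 * B_AGE_SQ * age

# Age at which the marginal effect is zero, i.e. the fitted wage peaks.
turning_point = -B_AGE / (2 * B_AGE_SQ)

for age in (20, 40, 60):
    print(age, round(marginal_effect_of_age(age), 3))
print(round(turning_point, 1))
```

The marginal effect is positive before the turning point and negative after it, which is the inverted-U shape implied by a positive coefficient on Age and a negative one on Age squared.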
Using a two-sided t-test at the 5% significance level determine if the coefficients on the independent
variables are statistically significant?
H0: Bk = 0 and H1: Bk ≠ 0
tk = (bk – Bk) / se(bk), with n – k degrees of freedom
tage = (0.30 – 0)/0.10 = 3
tage2 = (-0.003 – 0)/0.002 = -1.5
tedu = (1.00 – 0)/0.20 = 5
tD = (1.20 – 0)/1.00 = 1.2
The t critical value with 34 – 5 = 29 degrees of freedom and α = 0.05 is 2.045.
Therefore reject H0 for both Age and Education (|t| > 2.045); that is, they are both statistically significantly
different from zero. Fail to reject H0 for Age2 and D (|t| < 2.045); that is, they are not statistically
significantly different from zero.
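The four tests can be run in one pass. This is a minimal sketch that takes the coefficients and standard errors used in the t ratios above and compares each |t| with the critical value of 2.045. The dictionary layout and variable names are assumptions made for this example.

```python
T_CRITICAL = 2.045  # two-sided 5% critical value with 29 d.f., as given above

# (coefficient, standard error) pairs taken from the worked t ratios.
estimates = {
    "Age": (0.30, 0.10),
    "Age^2": (-0.003, 0.002),
    "Edu": (1.00, 0.20),
    "D": (1.20, 1.00),
}

reject = {}
for name, (b, se) in estimates.items():
    t = (b - 0) / se                     # test statistic under H0: Bk = 0
    reject[name] = abs(t) > T_CRITICAL   # True means reject H0
    print(f"{name}: t = {t:.2f}, reject H0: {reject[name]}")
```

Comparing |t| rather than t with the critical value is what makes the test two-sided, so the negative sign on the Age squared coefficient does not affect the decision.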
What relationship between A and W does the above result imply? Why doesn’t the inclusion of A and
A2 violate the classical assumption of no perfect multicollinearity?
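This implies a nonlinear (quadratic) relationship: with a positive coefficient on Age and a negative coefficient
on Age2, wages rise with age at a decreasing rate and eventually decline.
Including both Age and Age2 does not violate the assumption of no perfect multicollinearity because that
assumption rules out only exact linear relationships among the regressors, and Age2 is a nonlinear function of
Age.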
On the basis of the regression results, should the workers be convinced that joining the union will
improve their well-being? Why or why not?
No, because the coefficient on the dummy variable is not statistically significant.
Test the hypothesis that the coefficients are jointly significant at the 5 percent level.
H0: B2 = B3 = B4 = B5 = 0 and H1: at least one of B2, B3, B4, B5 is not equal to 0
Equivalently, H0: R2 = 0 and H1: R2 > 0
The calculated F-stat is 24.2, which is greater than the F-critical value of 2.69 with 4 d.f. in the numerator
and 29 d.f. in the denominator at α = 0.05 (I used 30 d.f. in the denominator); therefore reject H0 and
conclude that the variables are jointly statistically significantly different from zero.
2. Suppose you want to estimate the effect of gender and education on wages using the following model:
ln (wage) = B1 + B2D1i + B3(educationi) + B4(educationi*D1i) + ui
ln(wage) = the natural log of the hourly wage of the ith individual
D1 = 1 if female, 0 otherwise
Education = years of education
You have a sample of 274 men and 252 women which is 526 observations, the average wage for men in the
sample is $7.10/hr and for women the average wage is $4.59/hr.
The results of the regression are:
ln(wage) = .389 - .227D1i + .082(educationi) - .0056(educationi*D1i)
se =              (.023)    (.008)             (.0014)
n = 526
Adjusted R2 = .441
Using a two-sided t-test (at the 1% significance level) determine if the coefficients on the
independent variables are statistically significant?
H0: Bk = 0 and H1: Bk ≠ 0
tk = (bk – Bk) / se(bk), with n – k degrees of freedom
tD = (-0.227 – 0)/0.023 = -9.87
tedu = (0.082 – 0)/0.008 = 10.25
tedu*D = (-0.0056 – 0)/0.0014 = -4.00
The t critical value with 526 – 4 = 522 degrees of freedom and α = 0.01 is approximately 2.576.
Therefore reject H0 and conclude that all of the coefficients are statistically significantly different from
zero, since each |t| exceeds 2.576.
Do the results indicate a difference in average wages between males and females holding
education constant? How do you know?
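Yes. Holding education constant, the female-male difference in ln(wage) is -0.227 – 0.0056(education), and
both the differential intercept and the differential slope coefficient are negative and statistically
significantly different from zero.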
What is the return to an additional year of education for males? What is the return to an additional
year of education for females? Is the difference statistically significant? How do you know?
For males the return to education is the slope coefficient on education, 0.082, or about 8.2 percent per
additional year.
For females the return to education is the slope coefficient on education plus the (negative) differential slope
coefficient = 0.082 – 0.0056 = 0.0764, or about 7.64 percent per additional year.
Yes, the difference is statistically significant because the differential slope coefficient is statistically
significantly different from zero.
Sketch a graph of the results.
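As a starting point for the sketch, the two fitted lines can be tabulated. This is a minimal illustration using the coefficients reported above; the function name `fitted_ln_wage` and the education levels chosen are assumptions made for this example. The male line has intercept 0.389 and slope 0.082; the female line has intercept 0.389 – 0.227 = 0.162 and slope 0.082 – 0.0056 = 0.0764, so it starts lower and is slightly flatter.

```python
# Estimated coefficients from the regression reported above.
B1, B2, B3, B4 = 0.389, -0.227, 0.082, -0.0056

def fitted_ln_wage(education, female):
    """Fitted ln(wage) for given years of education; female is 0 or 1."""
    return B1 + B2 * female + B3 * education + B4 * education * female

for years in (0, 8, 12, 16):
    male = fitted_ln_wage(years, 0)
    female = fitted_ln_wage(years, 1)
    print(f"education = {years:2d}: male = {male:.3f}, female = {female:.3f}")
```

Plotting ln(wage) against years of education with these two lines gives a graph in which the gap between men and women widens as education increases.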
3. Use the “Crime Data” from my website, which has arrest information on 500 men in California in 1986,
to estimate the following LPM of arrests
Arrestedi = b1 + b2PPCi + b3Empi + ei
Arrestedi = 1 if the ith man was arrested in 1986, 0 otherwise
PPCi = the proportion of prior arrests that led to conviction for the ith man
Empi = the number of quarters the ith man was employed in 1986
What are your expectations for the signs on PPC and EMP? Why?
If incarceration works, then the coefficient on PPC should be negative, but if incarceration does not work and
we have a class of career criminals, then the coefficient on PPC will be positive.
We would expect the coefficient on EMP to be negative: if the individual is employed, he is less likely to be
arrested.
Report the regression results with the usual statistics in an appropriate format.
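[Regression output table: Regression Statistics (R Square, Adjusted R Square, Standard Error) and coefficient
rows for Proportion of Prior Convictions and Number of Quarters Employed with their t Stat values; the
numerical entries are missing.]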
Test whether the coefficients on PPC and EMP are statistically significantly different from zero at
the 10 percent level (two-tailed).
Both calculated t-statistics are greater in absolute value than the t-critical value of 1.645; therefore reject
the null and conclude that they are both statistically significantly different from zero.
Interpret the regression equation.
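These results indicate that an increase of 0.01 (one percentage point) in the proportion of prior arrests that
led to conviction reduces the probability of being arrested by about 0.11 percentage points; moving PPC from
0 to 1 reduces it by about 11 percentage points.
If we increase the number of employed quarters by 1, the probability of arrest falls by about 3 percentage
points.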
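Because the model is a linear probability model, each coefficient is the change in the probability of arrest per unit change in its regressor. The sketch below turns that into a small calculator. The coefficient values -0.11 and -0.03 are inferred from the interpretation above rather than read from the output table, so they should be treated as approximate; the function name is also an assumption.

```python
B_PPC = -0.11  # approximate coefficient on PPC, inferred from the interpretation
B_EMP = -0.03  # approximate coefficient on Emp, inferred from the interpretation

def change_in_arrest_probability(d_ppc=0.0, d_emp=0.0):
    """Change in the fitted probability of arrest for given changes in PPC and Emp."""
    return B_PPC * d_ppc + B_EMP * d_emp

print(round(change_in_arrest_probability(d_ppc=0.10), 4))  # PPC rises by 0.10
print(round(change_in_arrest_probability(d_emp=1), 4))     # one more quarter employed
print(round(change_in_arrest_probability(d_emp=4), 4))     # employed all four quarters
```

Since PPC is a proportion between 0 and 1, a one-unit change covers its entire range, which is why realistic changes in PPC are expressed in hundredths or tenths.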