Economics 405 Problem Set #4 Due: December 5 at 5:00 pm.

1.
An education researcher wished to test the claim that larger schools adversely impact student test
scores. The researcher collected data for n = 44 Michigan high schools. She estimated the
following model
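$$
\widehat{math10} = \underset{(48.70)}{-207.66} + \underset{(4.06)}{21.16}\,\log(totcomp) + \underset{(4.19)}{3.98}\,\log(staff) - \underset{(0.71)}{1.35}\,\log(enroll)
$$
$$
n = 44, \quad R^2 = .0654
$$
(standard errors in parentheses)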
where math10 is the percentage of tenth-graders receiving a passing grade on a standardized math
test; totcomp is average annual teacher compensation; staff is the number of staff per thousand
students; and enroll is the number of students enrolled.
a.
Interpret the regression coefficient on log(enroll).
This is a level-log model, so the coefficient needs careful interpretation.
Read literally, a one-log-point increase in enrollment is predicted to reduce
the percentage of students passing the test by 1.35 percentage points. A
one-log-point increase is a huge change in actual enrollment, however (roughly
a 172% increase). It is more useful to evaluate the effect of a 1% change in
enrollment (about .01 of a log point). Accordingly, divide $\hat{\beta}_3$
by 100 to get -.0135. Hence a 1% increase in enrollment is predicted to reduce
the pass rate by only .0135 of a percentage point.
b.
What does the model predict if enrollments increase by 10%? Explain.
Using the result from part a, if a 1% increase in enrollment reduces the
predicted pass rate by .0135 of a percentage point, then a 10% increase reduces
it by about 10 x .0135 = .135 of a percentage point. Again, this is not much of
an effect from a practical point of view.
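As a quick numerical check of parts a and b, the sketch below compares the divide-by-100 rule of thumb with the exact change implied by a level-log model, which is the coefficient times log(1 + g) for a proportional change g in enrollment. The function name and layout are illustrative assumptions, not part of the original solution.

```python
import math

BETA_ENROLL = -1.35  # estimated coefficient on log(enroll) from the fitted model


def level_log_effect(beta, pct_change):
    """Return (approximate, exact) change in y for a pct_change% rise in x.

    approximate: (beta / 100) * pct_change, the rule of thumb used above
    exact:       beta * log(1 + pct_change / 100)
    """
    approx = beta / 100 * pct_change
    exact = beta * math.log(1 + pct_change / 100)
    return approx, exact


for pct in (1, 10):
    approx, exact = level_log_effect(BETA_ENROLL, pct)
    print(f"{pct:>2}% rise in enrollment: approx {approx:.4f}, exact {exact:.4f} points")
```

The two figures agree closely for a 1% change and drift apart slowly as the change grows, which is why the rule of thumb is adequate for the 10% case discussed here.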
c.
Conduct the test implied in the first sentence at the 5% level of significance. (Be sure to
state the null and alternative hypotheses.)
$H_0: \beta_3 = 0$
$H_1: \beta_3 < 0$ (since the researcher wishes to test for an “adverse effect”)
Test statistic: $t_{\hat{\beta}_3} = \dfrac{-1.35}{.71} = -1.901$
Rejection rule: This is a lower-tail test, so reject $H_0$ if $t_{\hat{\beta}_3} < -t_{\alpha,\,n-(k+1)} = -t_{.05,\,40} = -1.684$.
Conclusion: Since the test statistic, at -1.901, is less than the critical value of -1.684, we
reject the null hypothesis and conclude that enrollments do have an adverse impact on
the pass rate at the 5% level of significance.
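The arithmetic of the test in part c can be sketched in a few lines. The critical value is simply typed in from a t table (it is not computed here), and the helper name `t_statistic` is an illustrative assumption.

```python
def t_statistic(estimate, std_error, hypothesized=0.0):
    """t statistic for H0: beta = hypothesized."""
    return (estimate - hypothesized) / std_error


t_enroll = t_statistic(-1.35, 0.71)  # coefficient and standard error on log(enroll)
critical = -1.684                    # -t_{.05, 40}, taken from a t table
reject = t_enroll < critical         # lower-tail rejection rule

print(f"t = {t_enroll:.3f}, critical value = {critical}, reject H0: {reject}")
```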
d.
Would the conclusion to part c change if the 1% level of significance were used? Why or
why not?
Yes. At the 1% level the critical value is $-t_{.01,\,40} = -2.423$, and the test
statistic of -1.901 is not below it, so it does not fall in the rejection region and
we fail to reject the null hypothesis. The evidence against the null hypothesis of no
effect is strong enough to reject at the 5% level but not at the 1% level.
e.
Use Excel to generate the p-value of the test statistic. Explain how you obtained the p-value. Interpret the p-value obtained for the test statistic in this problem.
Use the TDIST function with X = 1.901 (the absolute value of the test statistic,
since TDIST does not accept negative values), degrees of freedom = 40, and tails = 1.
Using Excel, the p-value for $t_{\hat{\beta}_3} = -1.901$ is .032. Note that this is
between .05 and .01, which explains why we can reject at the 5% level but not at the
1% level. A p-value of .032 implies that if the null hypothesis is correct, there is a
3.2% chance of randomly sampling 44 schools and obtaining a test statistic at least as
far below zero as the one observed ($t \le -1.901$). In other words, there is not much
chance of getting the result we got if enrollment really does not affect the pass rate.
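For readers without Excel, the one-tailed p-value can be approximated with the Python standard library alone. The sketch below integrates the Student's t density numerically with Simpson's rule; it is a stand-in for TDIST rather than a description of how Excel computes it, and the function names and the step count are assumptions chosen for illustration.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log(1 + x * x / df))


def t_upper_tail(t, df, steps=2000):
    """P(T > |t|): 0.5 minus the Simpson's-rule integral of the density on [0, |t|]."""
    t = abs(t)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# By symmetry, P(T < -1.901) equals P(T > 1.901), so the lower-tail p-value
# is the upper-tail area beyond the absolute value of the statistic.
p_value = t_upper_tail(-1.901, 40)
print(f"one-tailed p-value: {p_value:.3f}")
```

The result should come out near the .032 reported from Excel. The same function can be pointed at other one-tailed t tests by changing the statistic and the degrees of freedom.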
2.
The RENTAL.dta dataset from Wooldridge contains information on rent in college towns. I
regressed log(rent) on log(pop), log(avginc), and pctstu, where rent is average monthly rent, pop is
total town population, avginc is average household income, and pctstu is the percentage of students
in the total town population. Here is the regression output:
      Source |       SS       df       MS              Number of obs =     128
-------------+------------------------------           F(  3,   124) =  162.39
       Model |  11.2058779     3  3.73529263           Prob > F      =  0.0000
    Residual |  2.85225674   124   .02300207           R-squared     =  0.7971
-------------+------------------------------           Adj R-squared =  0.7922
       Total |  14.0581346   127  .110693974           Root MSE      =  .15166

---------------------------------------------------------
       lrent |      Coef.   Std. Err.       t     P>|t|
-------------+-------------------------------------------
        lpop |   .0313456   .0270786     1.16    0.249
     lavginc |   .8771387   .0413247    21.23    0.000
      pctstu |   .0065849   .0012027     5.48    0.000
       _cons |  -3.368309   .4639438    -7.26    0.000
---------------------------------------------------------
a.
Interpret the regression coefficient on lavginc.
Since both the dependent variable and the independent variable of interest are in logs,
the coefficient on lavginc is the estimated elasticity of rent with respect to average
community income. Given $\hat{\beta}_{lavginc} = .877$, a 1% increase in the average
income of a community is predicted to raise average rent by .877 of a percent, holding
log population and the percentage of students fixed. Equivalently, a 10% increase in
average community income is predicted to raise average rent by about 8.77%, other
things equal.
b.
Using $\alpha = .05$, test the null hypothesis that the elasticity of rents with respect to income
equals 1 against the lower-tail alternative. In conducting the test, write out the null
hypothesis, the alternative hypothesis, the test statistic, the rejection region, and the
conclusion.
$H_0: \beta_{lavginc} = 1$
$H_1: \beta_{lavginc} < 1$
Rejection region: Reject $H_0$ if $t_{\hat{\beta}_{lavginc}} < -t_{.05,\,124} = -1.658$.
Test statistic: $t_{\hat{\beta}_{lavginc}} = \dfrac{\hat{\beta}_{lavginc} - 1}{se(\hat{\beta}_{lavginc})} = \dfrac{.877 - 1}{.041} = -3$
Since the test statistic of -3 is less than the critical value of -1.658, we reject the
null hypothesis of unitary elasticity with respect to income. In other words, we
conclude that while rents rise with community income, they do so less than proportionately.
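As a check on the rounding, the same statistic can be recomputed from the unrounded coefficient and standard error in the Stata output. This is only a worked restatement of the calculation above, and it leads to the same conclusion:

```latex
t_{\hat{\beta}_{lavginc}}
  = \frac{0.8771387 - 1}{0.0413247}
  = \frac{-0.1228613}{0.0413247}
  \approx -2.973 < -1.658
```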
c.
Use Excel to generate the p-value of the test statistic. Explain how you obtained the p-value. Interpret the p-value obtained for the test statistic in this problem.
Use TDIST in Excel, setting X = 3 (the absolute value of the test statistic), degrees of
freedom = 124, and tails = 1.
The p-value equals .0016. Since the p-value is less than even $\alpha = .01$, we conclude
that the test statistic is highly significant. Specifically, there is less than a 2/10 of
1% chance in repeated sampling of obtaining an estimated regression coefficient of .877
or less when the null hypothesis of unitary elasticity is actually true. The sample
evidence is therefore strongly inconsistent with an income elasticity of rents equal to one.
3.
Consider the population model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + u$. There is
some question as to whether x4 and x5 belong in the model at all. Neither x4 nor x5 was
statistically significant in the estimated unrestricted regression. Conduct a test to
determine whether x4 and x5 are jointly significant at the 10% level. The sample size is n
= 32. The SSR of the restricted model is $SSR_r = 3{,}047$ and the SSR of the unrestricted
model is $SSR_{ur} = 2{,}691$.
a.
What is the null hypothesis? Explain.
The question is whether x4 and x5 should be in the model at all. The null
hypothesis, then, imposes q = 2 exclusion restrictions on the model, such that
$H_0: \beta_4 = 0,\ \beta_5 = 0$.
b.
What is the alternative hypothesis? Explain.
The alternative hypothesis in a test of joint significance is that the null
hypothesis is not true. In the problem at hand, this allows for the possibility
that x4 alone has an effect on the dependent variable, that x5 alone has an
effect, or that both x4 and x5 have an effect. Formally,
$H_1$: $H_0$ is not true (at least one of $\beta_4$, $\beta_5$ is nonzero)
c.
Calculate the test statistic. Show your work.
We know the restricted and unrestricted residual sums of squares and we
know that n = 32, k = 5, and q = 2. The appropriate test statistic is:
$$
F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-(k+1))} = \frac{(3{,}047 - 2{,}691)/2}{2{,}691/(32-(5+1))} = \frac{178}{103.5} = 1.7198
$$
d.
What is the critical value for the test statistic? Explain/show.
The F-statistic is distributed as an F random variable with q numerator
degrees of freedom and n-(k+1) denominator degrees of freedom under the
null hypothesis. The test for joint significance is strictly an upper tail test.
The desired level of significance is $\alpha = .10$, i.e., the weakest conventional standard of evidence.
Notice that I don’t set the significance level at .05 or .01. We’re looking for
any evidence that variation in x4 and/or x5 explains some of the variation in y.
If we find even weak evidence, we should retain the variables in the
regression, otherwise we’re likely to induce omitted variable bias. We would
rather accept some inefficiency in the coefficient standard errors than cause
bias in the coefficient estimates of the remaining included regressors.
We’ll reject the null hypothesis if
$F > F_{\alpha,\,q,\,n-(k+1)} = F_{.10,\,2,\,26} = 2.52$.
e.
Are x4 and x5 jointly significant? Why or why not?
Since F = 1.7198 is less than the critical value of 2.52, the test statistic does
not satisfy the rejection rule. Hence we cannot reject the null hypothesis that
x4 and x5 have no effect on the dependent variable at even the 10% level of
significance. There appears to be little harm in excluding these regressors from
the model. Furthermore, we should gain some precision in the estimation of the
effects of x1 through x3, since dropping x4 and x5 removes any collinearity
between them and the remaining regressors.
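The F statistic from part c and the decision in part e can be reproduced with a short sketch. The critical value 2.52 is typed in from an F table as quoted in part d, and the function name is an illustrative assumption.

```python
def f_statistic(ssr_r, ssr_ur, q, n, k):
    """F statistic for q exclusion restrictions in a model with k regressors."""
    numerator = (ssr_r - ssr_ur) / q          # per-restriction rise in SSR
    denominator = ssr_ur / (n - (k + 1))      # unrestricted SSR per residual df
    return numerator / denominator


f_stat = f_statistic(ssr_r=3047, ssr_ur=2691, q=2, n=32, k=5)
critical = 2.52  # F_{.10, 2, 26}, taken from an F table

print(f"F = {f_stat:.4f}, reject H0 at the 10% level: {f_stat > critical}")
```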
f.
Use Excel to generate the p-value of the test statistic. Explain how you obtained
the p-value. Interpret the p-value of the test statistic.
Use FDIST setting X = 1.7198, Deg_freedom1 = 2, and Deg_freedom2 = 26.
The resulting p-value is .199. In other words, the probability is almost 20%
of getting a test statistic at least as large as 1.7198 when the null hypothesis of
no effect from x4 and x5 is true. This p-value is greater than all conventional
significance levels, and so we cannot reject the null hypothesis at any of them
(of course, this is already clear from part e, since we could not reject $H_0$
at even the .10 level of significance there).
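When the numerator has exactly 2 degrees of freedom, as it does here, the upper-tail probability of the F distribution has a simple closed form, P(F > f) = (1 + 2f/df2)^(-df2/2), so the p-value can be checked without Excel. The sketch below relies on that special case only; it does not generalize to other values of q, and the function name is an illustrative assumption.

```python
def f_upper_tail_q2(f_stat, df2):
    """P(F > f_stat) for an F(2, df2) variable; valid only for numerator df = 2."""
    return (1 + 2 * f_stat / df2) ** (-df2 / 2)


p_value = f_upper_tail_q2(1.7198, 26)
print(f"p-value: {p_value:.3f}")
```

For tests with more than two restrictions, a statistics package or Excel's FDIST would be the practical route.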