Introductory Econometrics, Fall 2012
Homework 4 Solutions
1. Consider the regression models
Model 1: wage=β0+β1educ+u
Model 2: log(wage)=γ0+γ1educ+v
Using the data set WAGE1.wf1 (this is different from the data set I posted for Homework
3), choose between the two models using the procedure that I showed in class.
Reminder:
i) Estimate both models (report the estimated regressions)
ii) Compute predictions for wage (not log wage!) from Model 1 as well as Model 2. For
Model 2 you’ll need to use the adjustment discussed in class. Don’t report anything from
this step.
iii) Regress wage on the adjusted predicted wage from Model 2 and record the R-squared
statistic. Compare with the R-squared from Model 1 and take your pick.
i)
Dependent Variable: WAGE
Method: Least Squares
Date: 12/06/11 Time: 22:18
Sample: 1 526
Included observations: 526
Variable    Coefficient   Std. Error   t-Statistic   Prob.
EDUC           0.541359     0.053248      10.16675   0.0000
C             -0.904852     0.684968     -1.321013   0.1871

R-squared            0.164758    Mean dependent var      5.896103
Adjusted R-squared   0.163164    S.D. dependent var      3.693086
S.E. of regression   3.378390    Akaike info criterion   5.276470
Sum squared resid    5980.682    Schwarz criterion       5.292688
Log likelihood      -1385.712    Hannan-Quinn criter.    5.282820
F-statistic          103.3627    Durbin-Watson stat      1.823686
Prob(F-statistic)    0.000000
Dependent Variable: LWAGE
Method: Least Squares
Date: 12/06/11 Time: 22:18
Sample: 1 526
Included observations: 526
Variable    Coefficient   Std. Error   t-Statistic   Prob.
EDUC           0.082744     0.007567      10.93534   0.0000
C              0.583773     0.097336      5.997510   0.0000

R-squared            0.185806    Mean dependent var      1.623268
Adjusted R-squared   0.184253    S.D. dependent var      0.531538
S.E. of regression   0.480079    Akaike info criterion   1.374061
Sum squared resid    120.7691    Schwarz criterion       1.390279
Log likelihood      -359.3781    Hannan-Quinn criter.    1.380411
F-statistic          119.5816    Durbin-Watson stat      1.801328
Prob(F-statistic)    0.000000
ii)
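wage_hat1 = -0.904852 + 0.541359*educ
logwage_hat = 0.583773 + 0.082744*educ
wage_hat2 = e^((SSR/(n-k-1))/2) * e^(logwage_hat)
where SSR = 120.7691 (from the Model 2 regression) and n-k-1 = 526-1-1 = 524, so that
SSR/(n-k-1) ≈ 0.2305 is the estimated error variance of Model 2.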
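As a quick check on step ii), the adjustment factor can be computed directly from the Model 2 output. The sketch below uses only the reported SSR, sample size, and coefficients; the evaluation point educ = 12 is a hypothetical value chosen for illustration and does not come from the assignment.

```python
import math

# Reported in the Model 2 (LWAGE) output above
ssr = 120.7691               # sum of squared residuals
n, k = 526, 1                # observations, number of slope coefficients
g0, g1 = 0.583773, 0.082744  # estimated intercept and educ coefficient

sigma2_hat = ssr / (n - k - 1)           # estimated error variance of Model 2
adjustment = math.exp(sigma2_hat / 2.0)  # factor that multiplies exp(logwage_hat)

educ = 12  # hypothetical years of education, for illustration only
logwage_hat = g0 + g1 * educ
naive_wage = math.exp(logwage_hat)       # prediction without the adjustment
wage_hat2 = adjustment * naive_wage      # adjusted prediction of wage

print(f"sigma2_hat = {sigma2_hat:.6f}")
print(f"adjustment = {adjustment:.4f}")
print(f"wage_hat2 at educ={educ}: {wage_hat2:.3f} (unadjusted: {naive_wage:.3f})")
```

Because the adjustment factor exceeds one, simply exponentiating the fitted log wage would systematically underpredict wage.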
iii)
Dependent Variable: WAGE
Method: Least Squares
Date: 12/06/11 Time: 22:49
Sample: 1 526
Included observations: 526
Variable    Coefficient   Std. Error   t-Statistic   Prob.
WAGE_HAT2      1.254539     0.113088      11.09346   0.0000
C             -1.422516     0.675480     -2.105933   0.0357

R-squared            0.190189    Mean dependent var      5.896103
Adjusted R-squared   0.188644    S.D. dependent var      3.693086
S.E. of regression   3.326559    Akaike info criterion   5.245549
Sum squared resid    5798.580    Schwarz criterion       5.261767
Log likelihood      -1377.579    Hannan-Quinn criter.    5.251899
F-statistic          123.0649    Durbin-Watson stat      1.829516
Prob(F-statistic)    0.000000
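First model (wage on educ): R^2 = 0.164758
Last model (wage on WAGE_HAT2, the adjusted prediction from Model 2): R^2 = 0.190189
The adjusted prediction from Model 2 explains more of the variation in wage, so choose
the log(wage) model.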
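The whole three-step procedure can be sketched in a few lines. The example below is only an illustration under loud assumptions: the data are synthetic (generated from a log-linear model with made-up parameters), not WAGE1, and the helper `ols` is a hypothetical name introduced here. It will not reproduce the numbers reported above.

```python
import numpy as np

def ols(y, X):
    """OLS of y on X (X already contains the constant column).
    Returns coefficients, residuals, and R-squared."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, resid, r2

# Synthetic data, for illustration only (not the WAGE1 data set)
rng = np.random.default_rng(0)
n = 526
educ = rng.integers(6, 19, size=n).astype(float)
wage = np.exp(0.6 + 0.08 * educ + rng.normal(0.0, 0.5, size=n))

X = np.column_stack([np.ones(n), educ])

# Step i): estimate both models
_, _, r2_level = ols(wage, X)
gamma, v_hat, _ = ols(np.log(wage), X)

# Step ii): adjusted prediction of wage (not log wage) from Model 2
sigma2_hat = (v_hat @ v_hat) / (n - 1 - 1)
wage_hat2 = np.exp(sigma2_hat / 2.0) * np.exp(X @ gamma)

# Step iii): regress wage on the adjusted prediction and compare R-squared values
_, _, r2_log = ols(wage, np.column_stack([np.ones(n), wage_hat2]))
choice = "log(wage)" if r2_log > r2_level else "wage"

print(f"R^2 from Model 1: {r2_level:.4f}")
print(f"R^2 from wage on wage_hat2: {r2_log:.4f}")
print(f"Preferred model: {choice}")
```

Both R-squared values measure fit for wage in levels, which is what makes them comparable; the R-squared reported for the LWAGE regression itself measures fit for log(wage) and cannot be compared with Model 1 directly.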
2) [Wooldridge 7.3]
(i) The t statistic on hsize^2 is over four in absolute value, so there is very strong evidence
that it belongs in the equation. We obtain the optimal high school size by looking at the
first order condition. This is the value of hsize that maximizes sat_hat (other things fixed):
19.3/(2*2.19) ≈ 4.41. Because hsize is measured in hundreds, the optimal size of
graduating class is about 441.
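The first order condition in (i) amounts to a one-line calculation. The sketch below uses the two coefficients quoted above (19.3 on hsize and -2.19 on hsize^2); the variable names are illustrative.

```python
# Coefficients quoted in the solution: 19.3 on hsize, -2.19 on hsize^2
b_hsize, b_hsize_sq = 19.3, -2.19

# First order condition: b_hsize + 2 * b_hsize_sq * hsize = 0.
# The negative coefficient on hsize^2 means this turning point is a maximum.
hsize_star = -b_hsize / (2.0 * b_hsize_sq)

# hsize is measured in hundreds of students
optimal_class_size = round(hsize_star * 100)

print(f"hsize* = {hsize_star:.2f}, i.e. about {optimal_class_size} students")
```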
(ii) This is given by the coefficient on female (since black= 0): nonblack females have
SAT scores about 45 points lower than nonblack males. The t statistic is about –10.51, so
the difference is very statistically significant. (The very large sample size certainly
contributes to the statistical significance.)
(iii) Because female= 0, the coefficient on black implies that a black male has an
estimated SAT score almost 170 points less than a comparable nonblack male. The t
statistic is over 13 in absolute value, so we easily reject the hypothesis that there is no
ceteris paribus difference.
(iv) We plug in black = 1, female = 1 for black females and black = 0, female = 1 for
nonblack females. The difference is therefore –169.81 + 62.31 = –107.50, i.e. black females
are estimated to score about 107.5 points lower than nonblack females. Because the
estimate depends on two coefficients, we cannot construct a t statistic from the
information given. The easiest approach is to define dummy variables for three of the
four race/gender categories and choose nonblack females as the base group. We can then
obtain the t statistic we want as the coefficient on the black female dummy variable.
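The arithmetic in (iv) can be checked by writing out the group effects implied by the dummy variables and their interaction. In this sketch the coefficients on black and on the interaction are the ones quoted above; the exact value used for the female coefficient is an assumption (the solution only says "about 45"), and it cancels out of the comparison anyway. The function name is hypothetical.

```python
# B_BLACK and B_INTERACT are quoted in the solution.
# B_FEMALE is an assumed value (the text says "about 45 points lower").
B_FEMALE = -45.09
B_BLACK = -169.81
B_INTERACT = 62.31   # coefficient on female*black

def group_effect(female: int, black: int) -> float:
    """Predicted SAT score relative to nonblack males, holding hsize fixed."""
    return B_FEMALE * female + B_BLACK * black + B_INTERACT * female * black

black_female = group_effect(female=1, black=1)
nonblack_female = group_effect(female=1, black=0)

# The female coefficient appears in both terms and cancels in the difference
diff = black_female - nonblack_female

print(f"black females vs. nonblack females: {diff:.2f}")
```

The difference combines two estimated coefficients, so its standard error also depends on their covariance, which is why a t statistic cannot be formed from the reported output alone.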
3)
a) To do the Breusch-Pagan test, one needs to estimate the model
wage=β0+β1educ+β2exper+u
by OLS and then regress the squared residuals on a constant, educ and exper. The results
are displayed below:
ls wage c educ exper
genr uhat2=resid^2
ls uhat2 c educ exper
Dependent Variable: UHAT2
Method: Least Squares
Date: 12/07/10 Time: 09:49
Sample: 1 526
Included observations: 526
Variable    Coefficient   Std. Error   t-Statistic   Prob.
C             -23.23086     6.211818     -3.739784   0.0002
EDUC           2.163008     0.436014      4.960867   0.0000
EXPER          0.388162     0.088957      4.363499   0.0000

R-squared            0.060546    Mean dependent var      10.54783
Adjusted R-squared   0.056954    S.D. dependent var      27.17855
S.E. of regression   26.39324    Akaike info criterion   9.389780
Sum squared resid    364323.5    Schwarz criterion       9.414107
Log likelihood      -2466.512    F-statistic             16.85322
Durbin-Watson stat   1.960873    Prob(F-statistic)       0.000000
As seen from the F-statistic for the overall significance of the explanatory variables, the
null hypothesis of “no heteroskedasticity” is strongly rejected.
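The test statistic can be recovered from the R-squared of the auxiliary regression alone. The sketch below reproduces the F form used above and, as an addition not reported in the output, the LM form n*R^2; only the reported R-squared and sample size are used.

```python
import math

r2_aux = 0.060546   # R-squared from the regression of uhat2 on educ and exper
n, k = 526, 2       # observations, regressors in the auxiliary regression

# F form: overall significance of the auxiliary regression
f_stat = (r2_aux / k) / ((1.0 - r2_aux) / (n - k - 1))

# LM form: n * R-squared, asymptotically chi-square with k degrees of freedom
lm_stat = n * r2_aux

# With exactly 2 degrees of freedom the chi-square upper tail is exp(-x/2)
lm_pvalue = math.exp(-lm_stat / 2.0)

print(f"F  = {f_stat:.3f}")
print(f"LM = {lm_stat:.3f}, p-value = {lm_pvalue:.2e}")
```

The F value agrees with the reported F-statistic up to rounding of the R-squared, and both forms lead to the same conclusion.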
Next we estimate the model wage=β0+β1educ+β2exper+u with heteroskedasticity-robust
standard errors:
Dependent Variable: WAGE
Method: Least Squares
Date: 12/07/10 Time: 09:55
Sample: 1 526
Included observations: 526
White Heteroskedasticity-Consistent Standard Errors & Covariance
Variable    Coefficient   Std. Error   t-Statistic   Prob.
C             -3.390540     0.864875     -3.920267   0.0001
EDUC           0.644272     0.065187      9.883457   0.0000
EXPER          0.070095     0.010994      6.375622   0.0000

R-squared            0.225162    Mean dependent var      5.896103
Adjusted R-squared   0.222199    S.D. dependent var      3.693086
S.E. of regression   3.257044    Akaike info criterion   5.205204
Sum squared resid    5548.160    Schwarz criterion       5.229531
Log likelihood      -1365.969    F-statistic             75.98998
Durbin-Watson stat   1.820274    Prob(F-statistic)       0.000000
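For readers who want to see what the robust option computes, the following is a minimal sketch of White standard errors in their plain "HC0" form. It rests on assumptions: the data are synthetic (not WAGE1), and EViews may apply a degrees-of-freedom adjustment, so this is not claimed to be the exact formula behind the output above.

```python
import numpy as np

# Synthetic data, for illustration only (not the WAGE1 data set)
rng = np.random.default_rng(1)
n = 526
educ = rng.integers(6, 19, size=n).astype(float)
exper = rng.integers(1, 41, size=n).astype(float)
u = rng.normal(0.0, 0.3 * educ, size=n)   # error spread grows with educ
wage = -3.0 + 0.6 * educ + 0.07 * exper + u

X = np.column_stack([np.ones(n), educ, exper])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ wage
resid = wage - X @ beta

# Usual OLS covariance: sigma^2 (X'X)^-1, valid only under homoskedasticity
sigma2 = (resid @ resid) / (n - X.shape[1])
se_ols = np.sqrt(np.diag(sigma2 * XtX_inv))

# White (HC0) covariance: (X'X)^-1 [X' diag(resid^2) X] (X'X)^-1
meat = (X * resid[:, None] ** 2).T @ X
se_white = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print("coefficients:", np.round(beta, 4))
print("usual s.e.:  ", np.round(se_ols, 4))
print("White s.e.:  ", np.round(se_white, 4))
```

The coefficient estimates are the same under both options; only the standard errors, and hence the t statistics and p-values, change.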
b) In order to transform the model, first generate a variable (call it h) equal to
Var(u|educ,exper)=exp(-3.5+0.28educ+0.05exper). Then generate the transformed
variables wage1=wage/√h, c1=1/√h (this is the transformed variable that corresponds to
the constant term in the original regression), educ1=educ/√h, and exper1=exper/√h.
genr h=exp(-3.5+0.28*educ+0.05*exper)
genr c1=1/h^.5
genr wage1=wage/h^.5
genr educ1=educ/h^.5
genr exper1=exper/h^.5
We now regress wage1 on c1, educ1 and exper1. A separate constant term is no longer needed, since c1 takes its place:
ls wage1 c1 educ1 exper1
Dependent Variable: WAGE1
Method: Least Squares
Date: 12/07/10 Time: 10:02
Sample: 1 526
Included observations: 526
Variable    Coefficient   Std. Error   t-Statistic   Prob.
C1             0.290811     0.433471      0.670890   0.5026
EDUC1          0.325125     0.033217      9.787948   0.0000
EXPER1         0.072856     0.009672      7.532938   0.0000

R-squared            0.064396    Mean dependent var      3.713637
Adjusted R-squared   0.060818    S.D. dependent var      1.958125
S.E. of regression   1.897646    Akaike info criterion   4.124793
Sum squared resid    1883.356    Schwarz criterion       4.149120
Log likelihood      -1081.821    F-statistic             17.99850
Durbin-Watson stat   1.741310    Prob(F-statistic)       0.000000
You do not need to use the robust standard error formula because the new regression has
(presumably) homoskedastic errors.
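The transformation in part b) is ordinary weighted least squares. As a hedged illustration, the sketch below checks on synthetic data (not WAGE1; the coefficients used to generate wage are made up) that OLS on the variables divided by √h gives the same estimates as solving the weighted normal equations directly. Only the variance function h is taken from the problem.

```python
import numpy as np

# Synthetic regressors, for illustration only (not the WAGE1 data set)
rng = np.random.default_rng(2)
n = 526
educ = rng.integers(6, 19, size=n).astype(float)
exper = rng.integers(1, 41, size=n).astype(float)

# Variance function stated in part b)
h = np.exp(-3.5 + 0.28 * educ + 0.05 * exper)
wage = 0.3 + 0.3 * educ + 0.07 * exper + rng.normal(0.0, np.sqrt(h))

X = np.column_stack([np.ones(n), educ, exper])

# Route 1: divide every variable, including the constant, by sqrt(h), then run
# OLS with no extra intercept (mirrors "ls wage1 c1 educ1 exper1")
w = 1.0 / np.sqrt(h)
X_star = X * w[:, None]
y_star = wage * w
beta_transformed, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)

# Route 2: weighted normal equations (X' H^-1 X) b = X' H^-1 y
XtW = X.T / h
beta_wls = np.linalg.solve(XtW @ X, XtW @ wage)

print("transformed OLS:", np.round(beta_transformed, 4))
print("direct WLS:     ", np.round(beta_wls, 4))
```

The two routes agree up to floating-point error, which is why the transformed regression must include c1 in place of the usual constant.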
c) In comparing the estimated models presented in parts a) and b), first note that the
coefficient on exper is pretty much the same, but the GLS estimate of the coefficient on
educ is just half of the OLS estimate. Moreover, the difference is rather big in
comparison to either standard error estimate. This could be a sign of model
misspecification (e.g. omitted variables), which is not surprising given how rudimentary
this model is.
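To put numbers on the comparison of the educ coefficients, the sketch below uses only the estimates and standard errors reported in parts a) and b). Expressing the gap in units of each standard error is an informal gauge of its size, not a formal specification test.

```python
# Estimates and standard errors for educ reported in parts a) and b)
ols_educ, ols_se = 0.644272, 0.065187   # OLS with White standard errors
gls_educ, gls_se = 0.325125, 0.033217   # transformed (GLS) regression

ratio = gls_educ / ols_educ             # GLS estimate relative to OLS
gap = ols_educ - gls_educ
gap_in_ols_se = gap / ols_se            # gap measured in OLS standard errors
gap_in_gls_se = gap / gls_se            # gap measured in GLS standard errors

print(f"GLS/OLS ratio: {ratio:.3f}")
print(f"gap = {gap:.4f}: {gap_in_ols_se:.1f} OLS s.e., {gap_in_gls_se:.1f} GLS s.e.")
```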
Turning to the estimated standard errors, we see that they are smaller in part b) than in
part a). Theoretically, we know that if the stated expression for Var(u|educ,exper) is
indeed correct, GLS is the better (more efficient) estimator. In this case (at least when the
sample is reasonably large) we expect this fact to be reflected in the estimated standard
errors. However, if Var(u|educ,exper) is misspecified, GLS might not have smaller
variance. Moreover, the usual (non-robust) standard error estimates from the transformed
regression will be biased because heteroskedasticity is not fully removed.