Tutorial to use with Davis, Duncan Data

POL 681: Standard Errors and Such
Run Regression:
. reg measwt reptwt if female==1

      Source |       SS       df       MS              Number of obs =     101
-------------+------------------------------           F(  1,    99) = 1024.54
       Model |  4334.88935     1  4334.88935           Prob > F      =  0.0000
    Residual |  418.873025    99  4.23104066           R-squared     =  0.9119
-------------+------------------------------           Adj R-squared =  0.9110
       Total |  4753.76238   100  47.5376238           Root MSE      =  2.0569

------------------------------------------------------------------------------
      measwt |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      reptwt |   .9772242   .0305301    32.01   0.000     .9166458    1.037803
       _cons |   1.777503   1.744408     1.02   0.311     -1.68378    5.238787
------------------------------------------------------------------------------
The MSE is the variance of the residuals. It is given by the formula:
SSResidual/(n-k-1)
Here n = 101 and k = 1, so the residual degrees of freedom are 99.
For these data, it is:
. display 418.873025/99
4.2310407
The standard error of the estimate (or RMSE) is given by the square root of the
variance:
. display sqrt(418.873025/99)
2.0569494
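Rather than retyping the sums of squares, the same quantities can be pulled from the results that regress leaves behind. The following is a minimal sketch, assuming the Davis data are still in memory and the regression above is re-run. e(rss), e(df_r), and e(rmse) are the results regress stores for the residual sum of squares, the residual degrees of freedom, and the root MSE.

```stata
* Sketch: recompute MSE and RMSE from stored results (assumes Davis data in memory)
quietly regress measwt reptwt if female==1
* MSE = residual sum of squares / residual degrees of freedom (n-k-1)
display e(rss)/e(df_r)
* RMSE = square root of the MSE
display sqrt(e(rss)/e(df_r))
* RMSE as stored directly by regress
display e(rmse)
```

The last two lines should agree with each other and with the Root MSE reported in the regression header.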
To see where the standard error of b1 comes from, recall the estimator:
se(b1) = RMSE / sqrt(sum of (Xi - Xbar)^2)
We can reproduce each of these quantities. First, generate descriptive
statistics:
. summ reptwt if female==1, detail

                       Reported Weight
-------------------------------------------------------------
      Percentiles      Smallest
 1%           44             41
 5%           45             44
10%           50             44       Obs                 101
25%           53             45       Sum of Wgt.         101

50%           56                      Mean           56.74257
                        Largest       Std. Dev.      6.737438
75%           61             71
90%           65             75       Variance       45.39307
95%           68             75       Skewness       .4570697
99%           75             77       Kurtosis       3.639294

Now, generate the “denominator.” The variance times n-1 = 100 gives the sum of
squared deviations of X:
. display 45.39307*100
4539.307
. display sqrt(45.39307*100)
67.374379
The standard error (as estimated) is:
. display 2.0569494/67.374379
.03053014
This matches the standard error on reptwt in our regression output (.0305301).
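The hand calculation can also be scripted so that no figures are copied by eye. This is a sketch only, assuming the Davis data are in memory. The scalar names s_rmse and s_ssx are made up for this illustration, while r(Var), r(N), e(rmse), and _se[] are standard Stata results.

```stata
* Sketch: rebuild the standard error of the reptwt slope from its pieces
quietly regress measwt reptwt if female==1
* Keep the RMSE before running another command
scalar s_rmse = e(rmse)
* Summarize X over the estimation sample only
quietly summarize reptwt if e(sample)
* Sum of squared deviations of X = variance * (n-1)
scalar s_ssx = r(Var)*(r(N)-1)
* RMSE divided by the square root of the sum of squared deviations
display s_rmse/sqrt(s_ssx)
* Standard error as reported by regress, for comparison
display _se[reptwt]
```

The two displayed values should agree up to rounding.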
Q: What happens as the variance in X decreases?
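One way to explore this question is to shrink the variance of X by hand while holding everything else fixed. The sketch below keeps the RMSE (2.0569494) and n-1 (100) at the values above, which is an assumption made purely for illustration, since in real data the RMSE would change as well. The halved and quartered variances are hypothetical.

```stata
* Illustration only: RMSE and n-1 held fixed, variance of X shrunk by hand
* Variance of X as observed
display 2.0569494/sqrt(45.39307*100)
* Variance of X cut in half (hypothetical)
display 2.0569494/sqrt((45.39307/2)*100)
* Variance of X cut to one quarter (hypothetical)
display 2.0569494/sqrt((45.39307/4)*100)
```

Each value is larger than the one before it: with less spread in X, the denominator shrinks and the standard error grows.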
Now, let’s turn to the multiple regression setting. Let’s run the following
model:
. reg prestige income educ

      Source |       SS       df       MS              Number of obs =      45
-------------+------------------------------           F(  2,    42) =  101.22
       Model |  36180.9458     2  18090.4729           Prob > F      =  0.0000
    Residual |  7506.69865    42   178.73092           R-squared     =  0.8282
-------------+------------------------------           Adj R-squared =  0.8200
       Total |  43687.6444    44   992.90101           Root MSE      =  13.369

------------------------------------------------------------------------------
    prestige |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   .5987328   .1196673     5.00   0.000     .3572343    .8402313
        educ |   .5458339   .0982526     5.56   0.000     .3475521    .7441158
       _cons |  -6.064663   4.271941    -1.42   0.163    -14.68579    2.556463
------------------------------------------------------------------------------
Let’s verify the RMSE estimate:
. display sqrt(7506.69865/42)
13.369028
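Recall the estimator for the standard error of a slope coefficient in multiple
regression:
se(bj) = RMSE / sqrt(sum of (Xij - Xjbar)^2 * (1 - Rj^2))
where Rj^2 is the R-squared from regressing Xj on the other independent
variables. Taking educ as the example, we can compute the “denominator.”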
. summ educ, detail

       Percent of males in occupation in 1950 who were
                    high-school graduates
-------------------------------------------------------------
      Percentiles      Smallest
 1%            7              7
 5%           17             15
10%           20             17       Obs                  45
25%           26             19       Sum of Wgt.          45

50%           45                      Mean           52.55556
                        Largest       Std. Dev.      29.76083
75%           84             93
90%           92             97       Variance       885.7071
95%           97             98       Skewness       .2262167
99%          100            100       Kurtosis       1.451751

. display 885.7071*44
38971.112
This is our “variance” term: the variance of educ times n-1 = 44, i.e., the sum
of squared deviations of educ.
What about the auxiliary regression?
. reg educ income

      Source |       SS       df       MS              Number of obs =      45
-------------+------------------------------           F(  1,    43) =   47.51
       Model |  20456.6437     1  20456.6437           Prob > F      =  0.0000
    Residual |  18514.4674    43  430.569009           R-squared     =  0.5249
-------------+------------------------------           Adj R-squared =  0.5139
       Total |  38971.1111    44  885.707071           Root MSE      =   20.75

------------------------------------------------------------------------------
        educ |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   .8824238   .1280211     6.89   0.000     .6242448    1.140603
       _cons |   15.61141   6.188362     2.52   0.015      3.13139    28.09143
------------------------------------------------------------------------------
The R-squared from this model is the auxiliary R-squared. With only one other
independent variable, it is of course equivalent to the square of the
correlation coefficient:
. corr educ income
(obs=45)

             |     educ   income
-------------+------------------
        educ |   1.0000
      income |   0.7245   1.0000

. display .7245^2
.52490025
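The same comparison can be made without rounding the correlation to four digits. This is a brief sketch, assuming the Duncan data are in memory. r(rho) is the correlation stored by correlate and e(r2) is the R-squared stored by regress.

```stata
* Sketch: squared correlation versus the auxiliary R-squared
quietly correlate educ income
* Square of the correlation between educ and income
display r(rho)^2
quietly regress educ income
* R-squared from the auxiliary regression
display e(r2)
```

Both lines should print the same value, close to the .5249 shown in the auxiliary regression header.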
Putting the pieces together, the standard error of the educ coefficient is:
. display 13.369028/sqrt(38971.112*(1-.52490025))
.09825
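The whole calculation can be chained together from stored results. The sketch below assumes the Duncan data are in memory. The scalar names s_r2aux and s_sseduc are invented for this illustration, while e(r2), e(mss), e(rss), e(rmse), and _se[] are standard results from regress.

```stata
* Sketch: rebuild the standard error of the educ coefficient from its pieces
quietly regress educ income
* Auxiliary R-squared
scalar s_r2aux = e(r2)
* Total sum of squares of educ = model SS + residual SS
scalar s_sseduc = e(mss) + e(rss)
quietly regress prestige income educ
* RMSE over sqrt(sum of squares of educ * (1 - auxiliary R-squared))
display e(rmse)/sqrt(s_sseduc*(1 - s_r2aux))
* Standard error as reported by regress, for comparison
display _se[educ]
```

The two values should match the .0982526 in the regression table up to rounding. The term 1/(1 - auxiliary R-squared) is the variance inflation factor, which estat vif reports after regress.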