
(GLBL 122) PSET 1

Beckett Elkins – GLBL 122
Problem Set 1
Study Group: Jason Zeng
================================================================================
Describe data using summary statistics
1. regress ed incomehi
      Source |       SS           df       MS      Number of obs   =     2,278
-------------+----------------------------------   F(1, 2276)      =    107.95
       Model |  338.850403         1  338.850403   Prob > F        =    0.0000
    Residual |  7144.17857     2,276  3.13891853   R-squared       =    0.0453
-------------+----------------------------------   Adj R-squared   =    0.0449
       Total |  7483.02897     2,277   3.2863544   Root MSE        =    1.7717

------------------------------------------------------------------------------
          ed | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    incomehi |   .8475595   .0815748    10.39   0.000     .6875907    1.007528
       _cons |   13.60521    .044141   308.22   0.000     13.51865    13.69177
------------------------------------------------------------------------------
Years of education and the dummy variable for whether a family earns more than 25k/yr are
positively correlated. On average, kids from families making more than 25k/yr attain 0.85 more years
of education than kids from lower-income families. The 95% confidence interval for the population
regression coefficient runs from 0.69 to 1.01. Because this interval excludes zero, the null hypothesis
of no relationship is rejected at the 5% level, and the relationship is statistically significant.
2. regress ed momcoll, robust
Linear regression                               Number of obs     =      2,278
                                                F(1, 2276)        =     163.73
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0637
                                                Root MSE          =     1.7546

------------------------------------------------------------------------------
             |               Robust
          ed | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     momcoll |   1.348584   .1053945    12.80   0.000     1.141904    1.555263
       _cons |    13.6746   .0396651   344.75   0.000     13.59681    13.75238
------------------------------------------------------------------------------
. regress ed dadcoll, robust
Linear regression                               Number of obs     =      2,278
                                                F(1, 2276)        =     211.58
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0865
                                                Root MSE          =      1.733

------------------------------------------------------------------------------
             |               Robust
          ed | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     dadcoll |   1.327785   .0912824    14.55   0.000      1.14878     1.50679
       _cons |   13.58526    .040518   335.29   0.000      13.5058    13.66471
------------------------------------------------------------------------------
Years of education and the dummy variable for whether a kid’s mom went to college are positively
correlated. On average, kids whose moms went to college attained 1.35 more years of education. The
95% confidence interval for the population regression coefficient runs from 1.14 to 1.56. Because
this interval excludes zero, the null hypothesis of no relationship is rejected at the 5% level, and
the relationship is statistically significant.
Years of education and the dummy variable for whether a kid’s dad went to college are
positively correlated. On average, kids whose dads went to college attained 1.33 more years of
education. The 95% confidence interval for the population regression coefficient runs from 1.15 to
1.51. Because this interval excludes zero, the null hypothesis of no relationship is rejected at the
5% level, and the relationship is statistically significant.
3. tabstat dist, statistics(mean median min max p25 p75)
    Variable |      Mean       p50       Min       Max       p25       p75
-------------+------------------------------------------------------------
        dist |  1.733714         1         0        16        .4       2.3
--------------------------------------------------------------------------
Distance from a college is measured in tens of miles. The mean is 1.73 (about 17.3 miles), the median
is 1 (10 miles), the minimum is 0, the maximum is 16 (160 miles), the 25th percentile is 0.4 (4 miles),
and the 75th percentile is 2.3 (23 miles).
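The significance and confidence-interval statements in items 1 and 2 can be reproduced from the reported coefficients and standard errors alone. The sketch below is only a rough check in Python, not part of the original Stata work. It assumes the normal critical value is a close enough stand-in for the t critical value with 2,276 degrees of freedom, so the bounds differ slightly from Stata's in the later decimals. The helper name `summarize_estimate` is made up for illustration.

```python
from statistics import NormalDist

# Coefficients and standard errors copied from the regression output above.
estimates = {
    "incomehi": (0.8475595, 0.0815748),
    "momcoll": (1.348584, 0.1053945),
    "dadcoll": (1.327785, 0.0912824),
}

# Assumption: with 2,276 degrees of freedom the t distribution is very close
# to the standard normal, so the normal critical value is used here.
z = NormalDist().inv_cdf(0.975)


def summarize_estimate(coef, se):
    """Return the t statistic and an approximate 95% confidence interval."""
    t_stat = coef / se
    return t_stat, coef - z * se, coef + z * se


results = {name: summarize_estimate(coef, se) for name, (coef, se) in estimates.items()}

for name, (t_stat, low, high) in results.items():
    excludes_zero = low > 0 or high < 0
    print(f"{name}: t = {t_stat:.2f}, CI = ({low:.2f}, {high:.2f}), excludes 0: {excludes_zero}")
```

Each interval lies entirely above zero, which is what the three rejections of the null hypothesis rest on.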
Regression Analysis I
4. regress ed dist, robust
Linear regression                               Number of obs     =      2,278
                                                F(1, 2276)        =      10.89
                                                Prob > F          =     0.0010
                                                R-squared         =     0.0045
                                                Root MSE          =     1.8091

------------------------------------------------------------------------------
             |               Robust
          ed | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        dist |  -.0568495   .0172281    -3.30   0.001    -.0906339   -.0230651
       _cons |   13.95194   .0485974   287.09   0.000     13.85664    14.04724
------------------------------------------------------------------------------
The regression shows a weak but statistically significant, negative correlation between distance from a college
and average years of education.
5. ed_i = β0 + β1(dist_i) + μ_i
6. The slope coefficient represents the average change in years of education completed for each additional
10 miles of distance from a 4-year college. The coefficient is negative (-0.057), so greater distance from a
college is associated with fewer years of formal education.
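7. The standard error of .0172281 estimates how much the sample slope coefficient varies from sample to
sample: across repeated samples, the estimated slope would typically fall about .017 from the true population
slope. It is small relative to the sample coefficient of -.0568495, which gives a t statistic of -3.30.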
8. Assuming an alpha of 0.05, the null hypothesis that the slope coefficient is 0 is rejected, as 0 falls outside
the 95% confidence interval for the true value of the slope coefficient and the p-value (0.001) is below 0.05.
9. The 95% confidence interval runs from -.0906339 to -.0230651. Intervals constructed this way contain the
true regression coefficient in 95% of repeated samples, so we can be 95% confident that the true slope lies
in this range.
10. The R^2 value of 0.0045 indicates that 0.45% of the variance in years of education can be explained by
distance from a college through this model.
11. Predicted ed = _cons + β1(dist), with dist measured in tens of miles
10 miles from a college (dist = 1): 13.95 - 0.057(1) = 13.90
50 miles from a college (dist = 5): 13.95 - 0.057(5) = 13.67
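The two predicted values in item 11 can be reproduced from the unrounded estimates. The following is a minimal Python sketch of that arithmetic, assuming only the intercept and slope shown in the output for item 4. The function name `predict_ed` is a hypothetical helper, not something from the problem set.

```python
# Estimates copied from the "regress ed dist, robust" output.
INTERCEPT = 13.95194
SLOPE = -0.0568495  # change in years of education per 10 miles of distance


def predict_ed(miles):
    """Predicted years of education for a student living `miles` from a college."""
    dist = miles / 10  # the dist variable is measured in tens of miles
    return INTERCEPT + SLOPE * dist


for miles in (10, 50):
    print(f"{miles} miles: {predict_ed(miles):.2f} years")
```

Converting miles to tens of miles inside the function keeps the unit handling in one place, which is the step most easily gotten wrong when plugging values in by hand.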
Regression Analysis II
12. regress ed incomehi, robust
Linear regression                               Number of obs     =      2,278
                                                F(1, 2276)        =     104.17
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0453
                                                Root MSE          =     1.7717

------------------------------------------------------------------------------
             |               Robust
          ed | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    incomehi |   .8475595   .0830406    10.21   0.000     .6847163    1.010403
       _cons |   13.60521   .0435719   312.25   0.000     13.51977    13.69066
------------------------------------------------------------------------------
The regression shows a positive correlation between years of education and coming from a family making more than 25k/yr.
13. ed_i = β0 + β1(incomehi_i) + μ_i
14. The "omitted" category is the category coded 0, which in this case is families
making 25k a year or less.
15. The slope coefficient represents the average difference in years of education completed between kids from
families making more than 25k/yr and kids from families making the same or less. The coefficient is positive, so
kids from families in the higher-earning category typically receive more years of education.
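16. The standard error of .0830406 estimates how much the sample slope coefficient varies from sample to
sample: across repeated samples, the estimated slope would typically fall about .083 from the true population
slope. It is small relative to the sample coefficient of .8475595, which gives a t statistic of 10.21.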
17. Assuming an alpha of 0.05, the null hypothesis that the slope coefficient is 0 is rejected, as 0 falls outside
the 95% confidence interval for the true value of the slope coefficient.
18. The 95% confidence interval runs from .6847163 to 1.010403. Intervals constructed this way contain the
true regression coefficient in 95% of repeated samples, so we can be 95% confident that the true coefficient
lies in this range.
19. The R^2 value of 0.045 indicates that 4.5% of the variance in years of education can be explained by a
parent having an above-25k/yr income through this model.
20. ed = _cons + β1(incomehi)
= 13.605 + 0.848(1)
= 14.45 <- predicted value for observation from high-income family
= 13.605 + 0.848(0)
= 13.61 <- predicted value for observation from low-income family
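With a single dummy regressor, the intercept is the predicted value for the omitted group and the slope is the gap between the two groups, which is what items 14, 15 and 20 describe. A small Python sketch of that reading, using the estimates from the output for item 12, is given below. The function name `predict_ed` is hypothetical and chosen only for this illustration.

```python
# Estimates copied from the "regress ed incomehi, robust" output.
INTERCEPT = 13.60521  # predicted ed for the omitted (incomehi = 0) group
SLOPE = 0.8475595     # difference between the incomehi = 1 and incomehi = 0 groups


def predict_ed(incomehi):
    """Predicted years of education for a given value of the incomehi dummy."""
    if incomehi not in (0, 1):
        raise ValueError("incomehi is a dummy variable and must be 0 or 1")
    return INTERCEPT + SLOPE * incomehi


low = predict_ed(0)
high = predict_ed(1)
print(f"low income:  {low:.2f}")
print(f"high income: {high:.2f}")
print(f"difference:  {high - low:.2f}")
```

Using the unrounded estimates avoids the small discrepancy that appears when 13.60521 is rounded to 13.60 in one line and 13.61 in another.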
Regression Analysis III
21. regress ed dist incomehi female, robust
Linear regression                               Number of obs     =      2,278
                                                F(3, 2274)        =      37.97
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0481
                                                Root MSE          =     1.7698

------------------------------------------------------------------------------
             |               Robust
          ed | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        dist |  -.0447538   .0171276    -2.61   0.009    -.0783412   -.0111665
    incomehi |   .8314262   .0835712     9.95   0.000     .6675425    .9953099
      female |   -.021912   .0750971    -0.29   0.770    -.1691779    .1253539
       _cons |   13.69958   .0702361   195.05   0.000     13.56185    13.83731
------------------------------------------------------------------------------
The multivariate regression has weak predictive power (R-squared of 0.048). Distance from a college and being
female have negative coefficients, although the female coefficient is not statistically significant, and
high-income status has a positive coefficient.
22. ed_i = β0 + β1(dist_i) + β2(incomehi_i) + β3(female_i) + μ_i
23. The coefficient for incomehi is .83, indicating that kids from high-income families receive 0.83 more years of
education than kids from low-income families, holding distance and gender constant. Since the 95% confidence
interval is wholly positive, I am confident that the true relationship is significant and positive.
24. Gender does not appear to have a clear effect on years of education as the 95% confidence interval (-.17 to
.13) includes both positive and negative values.
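The reasoning in items 23 and 24 amounts to checking whether each 95% confidence interval contains zero. The Python sketch below applies that check to all three slope coefficients from the output for item 21. It is only an illustration of the decision rule, and `rejects_zero` is a hypothetical helper name.

```python
# 95% confidence intervals copied from "regress ed dist incomehi female, robust".
intervals = {
    "dist": (-0.0783412, -0.0111665),
    "incomehi": (0.6675425, 0.9953099),
    "female": (-0.1691779, 0.1253539),
}


def rejects_zero(low, high):
    """True when the interval excludes 0, so H0: coefficient = 0 is rejected at the 5% level."""
    return not (low <= 0 <= high)


significant = {name: rejects_zero(low, high) for name, (low, high) in intervals.items()}

for name, flag in significant.items():
    verdict = "significant at 5%" if flag else "not significant at 5%"
    print(f"{name}: {verdict}")
```

Only the interval for female straddles zero, which matches its p-value of 0.770 in the table.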
Regression Analysis IV
25. regress ed dist incomehi female bytest, robust
Linear regression                               Number of obs     =      2,278
                                                F(4, 2273)        =     240.24
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2567
                                                Root MSE          =     1.5643

------------------------------------------------------------------------------
             |               Robust
          ed | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        dist |  -.0273668   .0145552    -1.88   0.060    -.0559097     .001176
    incomehi |   .5381466   .0750767     7.17   0.000     .3909205    .6853726
      female |   .0704433   .0659398     1.07   0.286    -.0588653    .1997518
      bytest |   .0950745   .0035131    27.06   0.000     .0881853    .1019637
       _cons |   8.853386    .178784    49.52   0.000     8.502789    9.203983
------------------------------------------------------------------------------
In this multivariate regression, distance from a college has a negative coefficient, while test score, female,
and high-income status have positive coefficients. Only test score and high-income status are statistically
significant at the 5% level; the intervals for distance and female both include zero. While the model has
more predictive power than the previous models, it can still only explain 26% of the variance in years of
education.
26. ed_i = β0 + β1(dist_i) + β2(incomehi_i) + β3(female_i) + β4(bytest_i) + μ_i
27. The coefficient on bytest is 0.095 meaning for every point increase in the base year test score, the expected
years of education increases by 0.095. I am confident that there is a positive, non-zero correlation as the
95% confidence interval does not contain zero.
Mean estimation                          Number of obs = 2,278

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
      bytest |   51.02446   .1854846      50.66072     51.3882
        dist |   1.733714   .0449891       1.64549    1.821938
--------------------------------------------------------------
28. ed = 8.85 - .027(1.733714) + .54(1) + .07(0) + 0.095(51.02)
= 14.20
29. For regression 4, the adjusted R-squared is 0.0013 lower than the ordinary R-squared (0.2554 versus
0.2567). This is a small change; one might expect a larger difference as the number of explanatory variables
increases, but with 2,278 observations the penalty for four regressors is slight.
30. I decided to use heteroskedasticity-robust standard errors since the data for distance displays
heteroskedasticity, with variance decreasing as distance increases. This violates the homoskedasticity
assumption behind the conventional standard errors of a typical regression.
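31. The robust option leaves the coefficients unchanged and alters only the standard errors. For the one model
estimated both ways (ed on incomehi), the robust standard error on incomehi is slightly larger (.0830 versus
.0816), so its 95% confidence interval widens slightly (.685 to 1.010 versus .688 to 1.008), while the interval
for the constant narrows slightly.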
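Two of the closing items can be checked with a few lines of arithmetic: the gap between R-squared and adjusted R-squared in regression 4 (item 29), and the change in confidence-interval width for incomehi once the robust option is added (item 31). The Python sketch below uses the standard adjusted R-squared formula and the two sets of interval bounds reported for `regress ed incomehi`. The helper names `adjusted_r2` and `width` are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k regressors (not counting the constant)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Regression 4: R-squared, number of observations, and four regressors.
r2 = 0.2567
adj = adjusted_r2(r2, n=2278, k=4)
print(f"adjusted R-squared = {adj:.4f}, gap = {r2 - adj:.4f}")

# 95% confidence intervals for incomehi in "regress ed incomehi",
# first without and then with the robust option.
classical = (0.6875907, 1.007528)
robust = (0.6847163, 1.010403)


def width(interval):
    """Width of a (low, high) confidence interval."""
    low, high = interval
    return high - low


print(f"classical width = {width(classical):.4f}")
print(f"robust width    = {width(robust):.4f}")
```

With these numbers the robust interval for incomehi comes out slightly wider than the classical one, and the adjusted R-squared penalty is small because the sample is large relative to the number of regressors.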
Do file
use "/Users/beckettpechon-elkins/Downloads/ps1.dta"
regress ed incomehi
regress ed momcoll, robust
regress ed dadcoll, robust
summarize dist
regress ed dist, robust
regress ed incomehi, robust
regress ed dist incomehi female, robust
regress ed dist incomehi female bytest, robust
Log file
use "/Users/beckettpechon-elkins/Downloads/ps1.dta"
.
. regress ed incomehi
      Source |       SS           df       MS      Number of obs   =     2,278
-------------+----------------------------------   F(1, 2276)      =    107.95
       Model |  338.850403         1  338.850403   Prob > F        =    0.0000
    Residual |  7144.17857     2,276  3.13891853   R-squared       =    0.0453
-------------+----------------------------------   Adj R-squared   =    0.0449
       Total |  7483.02897     2,277   3.2863544   Root MSE        =    1.7717

------------------------------------------------------------------------------
          ed | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    incomehi |   .8475595   .0815748    10.39   0.000     .6875907    1.007528
       _cons |   13.60521    .044141   308.22   0.000     13.51865    13.69177
------------------------------------------------------------------------------
.
. regress ed momcoll, robust
Linear regression                               Number of obs     =      2,278
                                                F(1, 2276)        =     163.73
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0637
                                                Root MSE          =     1.7546

------------------------------------------------------------------------------
             |               Robust
          ed | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     momcoll |   1.348584   .1053945    12.80   0.000     1.141904    1.555263
       _cons |    13.6746   .0396651   344.75   0.000     13.59681    13.75238
------------------------------------------------------------------------------
. regress ed dadcoll, robust
Linear regression                               Number of obs     =      2,278
                                                F(1, 2276)        =     211.58
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0865
                                                Root MSE          =      1.733

------------------------------------------------------------------------------
             |               Robust
          ed | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     dadcoll |   1.327785   .0912824    14.55   0.000      1.14878     1.50679
       _cons |   13.58526    .040518   335.29   0.000      13.5058    13.66471
------------------------------------------------------------------------------
.
. summarize dist
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        dist |      2,278    1.733714    2.147257          0         16
.
. regress ed dist, robust
Linear regression                               Number of obs     =      2,278
                                                F(1, 2276)        =      10.89
                                                Prob > F          =     0.0010
                                                R-squared         =     0.0045
                                                Root MSE          =     1.8091

------------------------------------------------------------------------------
             |               Robust
          ed | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        dist |  -.0568495   .0172281    -3.30   0.001    -.0906339   -.0230651
       _cons |   13.95194   .0485974   287.09   0.000     13.85664    14.04724
------------------------------------------------------------------------------
.
. regress ed incomehi, robust
Linear regression                               Number of obs     =      2,278
                                                F(1, 2276)        =     104.17
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0453
                                                Root MSE          =     1.7717

------------------------------------------------------------------------------
             |               Robust
          ed | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    incomehi |   .8475595   .0830406    10.21   0.000     .6847163    1.010403
       _cons |   13.60521   .0435719   312.25   0.000     13.51977    13.69066
------------------------------------------------------------------------------
.
. regress ed dist incomehi female, robust
Linear regression                               Number of obs     =      2,278
                                                F(3, 2274)        =      37.97
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0481
                                                Root MSE          =     1.7698

------------------------------------------------------------------------------
             |               Robust
          ed | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        dist |  -.0447538   .0171276    -2.61   0.009    -.0783412   -.0111665
    incomehi |   .8314262   .0835712     9.95   0.000     .6675425    .9953099
      female |   -.021912   .0750971    -0.29   0.770    -.1691779    .1253539
       _cons |   13.69958   .0702361   195.05   0.000     13.56185    13.83731
------------------------------------------------------------------------------
.
. regress ed dist incomehi female bytest, robust
Linear regression                               Number of obs     =      2,278
                                                F(4, 2273)        =     240.24
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2567
                                                Root MSE          =     1.5643

------------------------------------------------------------------------------
             |               Robust
          ed | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        dist |  -.0273668   .0145552    -1.88   0.060    -.0559097     .001176
    incomehi |   .5381466   .0750767     7.17   0.000     .3909205    .6853726
      female |   .0704433   .0659398     1.07   0.286    -.0588653    .1997518
      bytest |   .0950745   .0035131    27.06   0.000     .0881853    .1019637
       _cons |   8.853386    .178784    49.52   0.000     8.502789    9.203983
------------------------------------------------------------------------------
.
end of do-file
.