Beckett Elkins – GLBL 122 Problem Set 1
Study Group: Jason Zeng
================================================================================

Describe data using summary statistics

1. regress ed incomehi

      Source |       SS           df       MS      Number of obs   =     2,278
-------------+----------------------------------   F(1, 2276)      =    107.95
       Model |  338.850403         1  338.850403   Prob > F        =    0.0000
    Residual |  7144.17857     2,276  3.13891853   R-squared       =    0.0453
-------------+----------------------------------   Adj R-squared   =    0.0449
       Total |  7483.02897     2,277   3.2863544   Root MSE        =    1.7717

------------------------------------------------------------------------------
          ed | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    incomehi |   .8475595   .0815748    10.39   0.000     .6875907    1.007528
       _cons |   13.60521    .044141   308.22   0.000     13.51865    13.69177
------------------------------------------------------------------------------

Years of education and the dummy variable for whether a family earns more than 25k/yr are positively correlated. On average, kids from families making more than 25k/yr attain 0.85 more years of education. We can be 95% confident that the population regression coefficient falls between 0.69 and 1.01. Because this interval excludes zero, the null hypothesis of no relationship is rejected at the 5% level, and the relationship is statistically significant.

2. regress ed momcoll, robust

Linear regression                               Number of obs     =      2,278
                                                F(1, 2276)        =     163.73
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0637
                                                Root MSE          =     1.7546

------------------------------------------------------------------------------
             |               Robust
          ed | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     momcoll |   1.348584   .1053945    12.80   0.000     1.141904    1.555263
       _cons |    13.6746   .0396651   344.75   0.000     13.59681    13.75238
------------------------------------------------------------------------------
regress ed dadcoll, robust

Linear regression                               Number of obs     =      2,278
                                                F(1, 2276)        =     211.58
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0865
                                                Root MSE          =      1.733

------------------------------------------------------------------------------
             |               Robust
          ed | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     dadcoll |   1.327785   .0912824    14.55   0.000      1.14878     1.50679
       _cons |   13.58526    .040518   335.29   0.000      13.5058    13.66471
------------------------------------------------------------------------------

Years of education and the dummy variable for whether a kid’s mom went to college are positively correlated. On average, kids whose moms went to college attained 1.35 more years of education. We can be 95% confident that the population regression coefficient falls between 1.14 and 1.56. Because this interval excludes zero, the null hypothesis of no relationship is rejected at the 5% level, and the relationship is statistically significant.

Years of education and the dummy variable for whether a kid’s dad went to college are positively correlated. On average, kids whose dads went to college attained 1.33 more years of education. We can be 95% confident that the population regression coefficient falls between 1.15 and 1.51. Because this interval excludes zero, the null hypothesis of no relationship is rejected at the 5% level, and the relationship is statistically significant.

3. tabstat dist, statistics(mean median min max p25 p75)

    Variable |      Mean       p50       Min       Max       p25       p75
-------------+------------------------------------------------------------
        dist |  1.733714         1         0        16        .4       2.3
--------------------------------------------------------------------------

The mean distance from a college (in tens of miles) is 1.73. The median is 1, the minimum is 0, the maximum is 16, the 25th percentile is 0.4, and the 75th percentile is 2.3.

Regression Analysis I

4.
regress ed dist, robust

Linear regression                               Number of obs     =      2,278
                                                F(1, 2276)        =      10.89
                                                Prob > F          =     0.0010
                                                R-squared         =     0.0045
                                                Root MSE          =     1.8091

------------------------------------------------------------------------------
             |               Robust
          ed | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        dist |  -.0568495   .0172281    -3.30   0.001    -.0906339   -.0230651
       _cons |   13.95194   .0485974   287.09   0.000     13.85664    14.04724
------------------------------------------------------------------------------

The regression shows a weak but statistically significant negative correlation between distance from a college and years of education.

5. ed_i = β0 + β1(dist_i) + μ_i

6. The slope coefficient represents the average change in years of education completed for every additional 10 miles of distance from a 4-year college. The coefficient is negative, so greater distance from a college is associated with fewer years of formal education.

7. The standard error of .0172281 estimates the standard deviation of the sampling distribution of the slope estimate: across repeated samples of this size, the estimated slope would typically differ from the population slope by about .017. Here the sample coefficient is -.0568495.

8. Assuming an alpha of 0.05, the null hypothesis is rejected, as 0 falls outside the 95% confidence interval for the true value of the slope coefficient.

9. The confidence interval indicates that we can be 95% confident the true value of the regression coefficient lies between -.0906339 and -.0230651; intervals constructed this way contain the true value in 95% of repeated samples.

10. The R^2 value of 0.0045 indicates that 0.45% of the variance in years of education can be explained by distance from a college through this model.

11. ed = _cons + β1(dist)
    = 13.952 - 0.0568(1) = 13.90  <- predicted value for observation 10 miles away from college
    = 13.952 - 0.0568(5) = 13.67  <- predicted value for observation 50 miles away from college

Regression Analysis II

12. regress ed incomehi, robust

Linear regression                               Number of obs     =      2,278
                                                F(1, 2276)        =     104.17
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0453
                                                Root MSE          =     1.7717

------------------------------------------------------------------------------
             |               Robust
          ed | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    incomehi |   .8475595   .0830406    10.21   0.000     .6847163    1.010403
       _cons |   13.60521   .0435719   312.25   0.000     13.51977    13.69066
------------------------------------------------------------------------------

The regression shows a positive correlation between years of education and coming from a family making more than 25k/yr.

13. ed_i = β0 + β1(incomehi_i) + μ_i

14. The "omitted" category is the category represented by 0, which in this case is families making 25k/yr or less.

15. The slope coefficient represents the difference in average years of education completed between kids from families making more than 25k/yr and kids from families making the same or less. The coefficient is positive, so kids from families in the higher-earning category typically receive more years of education.

16. The standard error of .0830406 estimates the standard deviation of the sampling distribution of the slope estimate: across repeated samples of this size, the estimated slope would typically differ from the population slope by about .083. Here the sample coefficient is .8475595.

17. Assuming an alpha of 0.05, the null hypothesis is rejected, as 0 falls outside the 95% confidence interval for the true value of the slope coefficient.

18. The confidence interval indicates that we can be 95% confident the true value of the regression coefficient lies between .6847163 and 1.010403; intervals constructed this way contain the true value in 95% of repeated samples.

19. The R^2 value of 0.045 indicates that 4.5% of the variance in years of education can be explained by family income above 25k/yr through this model.

20. ed = _cons + β1(incomehi)
    = 13.6052 + 0.8476(1) = 14.45  <- predicted value for observation from high-income family
    = 13.6052 + 0.8476(0) = 13.61  <- predicted value for observation from low-income family

Regression Analysis III

21. regress ed dist incomehi female, robust

Linear regression                               Number of obs     =      2,278
                                                F(3, 2274)        =      37.97
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0481
                                                Root MSE          =     1.7698

------------------------------------------------------------------------------
             |               Robust
          ed | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        dist |  -.0447538   .0171276    -2.61   0.009    -.0783412   -.0111665
    incomehi |   .8314262   .0835712     9.95   0.000     .6675425    .9953099
      female |   -.021912   .0750971    -0.29   0.770    -.1691779    .1253539
       _cons |   13.69958   .0702361   195.05   0.000     13.56185    13.83731
------------------------------------------------------------------------------

The regression is a multivariate model with weak predictive power (R-squared of 0.048). Distance from a college and female have negative coefficients and high-income status has a positive coefficient, although the coefficient on female is not statistically significant.

22. ed_i = β0 + β1(dist_i) + β2(incomehi_i) + β3(female_i) + μ_i

23. The coefficient for incomehi is .83, indicating that, holding distance and gender constant, kids from high-income families receive 0.83 more years of education than kids from low-income families. Since the 95% confidence interval is wholly positive, I am confident that the true relationship is significant and positive.

24. Gender does not appear to have a clear effect on years of education, as the 95% confidence interval (-.17 to .13) includes both positive and negative values.

Regression Analysis IV

25. regress ed dist incomehi female bytest, robust

Linear regression                               Number of obs     =      2,278
                                                F(4, 2273)        =     240.24
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2567
                                                Root MSE          =     1.5643

------------------------------------------------------------------------------
             |               Robust
          ed | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        dist |  -.0273668   .0145552    -1.88   0.060    -.0559097     .001176
    incomehi |   .5381466   .0750767     7.17   0.000     .3909205    .6853726
      female |   .0704433   .0659398     1.07   0.286    -.0588653    .1997518
      bytest |   .0950745   .0035131    27.06   0.000     .0881853    .1019637
       _cons |   8.853386    .178784    49.52   0.000     8.502789    9.203983
------------------------------------------------------------------------------

In this multivariate regression, distance from a college has a negative coefficient, while high-income status, female, and test score have positive coefficients. The coefficients on distance (p = 0.060) and female (p = 0.286) are not statistically significant at the 5% level. While the model has more predictive power than the previous models, it still explains only 26% of the variance in years of education.

26. ed_i = β0 + β1(dist_i) + β2(incomehi_i) + β3(female_i) + β4(bytest_i) + μ_i

27. The coefficient on bytest is 0.095, meaning that for every one-point increase in the base-year test score, the expected years of education increases by 0.095, holding the other variables constant. I am confident that there is a positive, non-zero correlation, as the 95% confidence interval does not contain zero.

Mean estimation                          Number of obs = 2,278

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
      bytest |   51.02446   .1854846      50.66072     51.3882
        dist |   1.733714   .0449891       1.64549    1.821938
--------------------------------------------------------------

28. ed = 8.85 - .027(1.733714) + .54(1) + .07(0) + 0.095(51.02) = 14.20 (evaluated at the mean of dist and bytest with incomehi = 1 and female = 0, using the unrounded coefficients)

29. The adjusted R-squared for regression IV is 0.0013 lower than the ordinary R-squared. This is a small change; one would expect a larger difference as the number of explanatory variables increases.

30. I decided to use heteroskedasticity-robust standard errors since the data for distance display heteroskedasticity, with the variance decreasing as distance increases. This violates the homoskedasticity assumption of a classical regression.

31.
The introduction of the robust option left the coefficients in the regression of ed on incomehi unchanged but changed the standard errors. The standard error on incomehi rose from .0815748 to .0830406, which slightly widened its 95% confidence interval (from .6875907–1.007528 to .6847163–1.010403), while the standard error on the constant fell from .044141 to .0435719, which slightly narrowed its interval.

Do file

use "/Users/beckettpechon-elkins/Downloads/ps1.dta"
regress ed incomehi
regress ed momcoll, robust
regress ed dadcoll, robust
summarize dist
regress ed dist, robust
regress ed incomehi, robust
regress ed dist incomehi female, robust
regress ed dist incomehi female bytest, robust

Log file

. use "/Users/beckettpechon-elkins/Downloads/ps1.dta"

. regress ed incomehi

      Source |       SS           df       MS      Number of obs   =     2,278
-------------+----------------------------------   F(1, 2276)      =    107.95
       Model |  338.850403         1  338.850403   Prob > F        =    0.0000
    Residual |  7144.17857     2,276  3.13891853   R-squared       =    0.0453
-------------+----------------------------------   Adj R-squared   =    0.0449
       Total |  7483.02897     2,277   3.2863544   Root MSE        =    1.7717

------------------------------------------------------------------------------
          ed | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    incomehi |   .8475595   .0815748    10.39   0.000     .6875907    1.007528
       _cons |   13.60521    .044141   308.22   0.000     13.51865    13.69177
------------------------------------------------------------------------------

. regress ed momcoll, robust

Linear regression                               Number of obs     =      2,278
                                                F(1, 2276)        =     163.73
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0637
                                                Root MSE          =     1.7546

------------------------------------------------------------------------------
             |               Robust
          ed | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     momcoll |   1.348584   .1053945    12.80   0.000     1.141904    1.555263
       _cons |    13.6746   .0396651   344.75   0.000     13.59681    13.75238
------------------------------------------------------------------------------
. regress ed dadcoll, robust

Linear regression                               Number of obs     =      2,278
                                                F(1, 2276)        =     211.58
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0865
                                                Root MSE          =      1.733

------------------------------------------------------------------------------
             |               Robust
          ed | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     dadcoll |   1.327785   .0912824    14.55   0.000      1.14878     1.50679
       _cons |   13.58526    .040518   335.29   0.000      13.5058    13.66471
------------------------------------------------------------------------------

. summarize dist

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        dist |      2,278    1.733714    2.147257          0         16

. regress ed dist, robust

Linear regression                               Number of obs     =      2,278
                                                F(1, 2276)        =      10.89
                                                Prob > F          =     0.0010
                                                R-squared         =     0.0045
                                                Root MSE          =     1.8091

------------------------------------------------------------------------------
             |               Robust
          ed | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        dist |  -.0568495   .0172281    -3.30   0.001    -.0906339   -.0230651
       _cons |   13.95194   .0485974   287.09   0.000     13.85664    14.04724
------------------------------------------------------------------------------

. regress ed incomehi, robust

Linear regression                               Number of obs     =      2,278
                                                F(1, 2276)        =     104.17
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0453
                                                Root MSE          =     1.7717

------------------------------------------------------------------------------
             |               Robust
          ed | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    incomehi |   .8475595   .0830406    10.21   0.000     .6847163    1.010403
       _cons |   13.60521   .0435719   312.25   0.000     13.51977    13.69066
------------------------------------------------------------------------------

. regress ed dist incomehi female, robust

Linear regression                               Number of obs     =      2,278
                                                F(3, 2274)        =      37.97
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0481
                                                Root MSE          =     1.7698

------------------------------------------------------------------------------
             |               Robust
          ed | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        dist |  -.0447538   .0171276    -2.61   0.009    -.0783412   -.0111665
    incomehi |   .8314262   .0835712     9.95   0.000     .6675425    .9953099
      female |   -.021912   .0750971    -0.29   0.770    -.1691779    .1253539
       _cons |   13.69958   .0702361   195.05   0.000     13.56185    13.83731
------------------------------------------------------------------------------

. regress ed dist incomehi female bytest, robust

Linear regression                               Number of obs     =      2,278
                                                F(4, 2273)        =     240.24
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2567
                                                Root MSE          =     1.5643

------------------------------------------------------------------------------
             |               Robust
          ed | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        dist |  -.0273668   .0145552    -1.88   0.060    -.0559097     .001176
    incomehi |   .5381466   .0750767     7.17   0.000     .3909205    .6853726
      female |   .0704433   .0659398     1.07   0.286    -.0588653    .1997518
      bytest |   .0950745   .0035131    27.06   0.000     .0881853    .1019637
       _cons |   8.853386    .178784    49.52   0.000     8.502789    9.203983
------------------------------------------------------------------------------

. 
end of do-file
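Cross-check of hand calculations

The predicted values in questions 11, 20, and 28 and the adjusted R-squared gap in question 29 can be cross-checked in Stata. The sketch below is not part of the original do-file. It re-enters the coefficients and means reported in the output above and uses display as a calculator, so it should run without the dataset in memory. The adjusted R-squared line assumes the standard formula 1 - (1 - R^2)(n - 1)/(n - k - 1) with n = 2,278 and k = 4; the robust output itself does not report that statistic.

```stata
* Cross-check of hand calculations (illustrative; not part of the original do-file).
* All numbers are copied from the regression and mean-estimation output above.

* Q11: predicted ed at dist = 1 and dist = 5 (10 and 50 miles from a college)
display "Q11, dist = 1:     " %6.2f (13.95194 - .0568495*1)
display "Q11, dist = 5:     " %6.2f (13.95194 - .0568495*5)

* Q20: predicted ed for incomehi = 1 and incomehi = 0
display "Q20, incomehi = 1: " %6.2f (13.60521 + .8475595*1)
display "Q20, incomehi = 0: " %6.2f (13.60521 + .8475595*0)

* Q28: incomehi = 1, female = 0, dist and bytest at their sample means
display "Q28:               " %6.2f (8.853386 - .0273668*1.733714 + .5381466*1 + .0704433*0 + .0950745*51.02446)

* Q29: adjusted R-squared for regression IV, assuming the standard formula
* 1 - (1 - R2)*(n - 1)/(n - k - 1) with n = 2278 observations and k = 4 regressors
display "Q29, adj. R2:      " %6.4f (1 - (1 - .2567)*(2278 - 1)/(2278 - 4 - 1))
```

With ps1.dta in memory, the same predictions could instead be obtained right after the corresponding regress command with lincom, for example lincom _cons + 5*dist, which also reports a standard error and confidence interval for the predicted value.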