Regression Project

Introduction

The Human Capital Theory of economics states that each labor market participant has a distinct set of skills and abilities called human capital. Workers with valuable human capital increase their probability of earning high wages in the labor market. The skills acquired in college greatly increase a worker’s human capital and therefore the worker’s expected earnings (Borjas 250).

Colleges, especially elite private colleges, are usually very expensive to attend. Presumably, students who go to these schools expect to acquire more human capital there, which would increase their productivity and thereby their wage in the labor market. Students attending private colleges expect a private education to be worth more in the labor market than a public college education. That is, they expect the present value of a private school education to be greater than the present value of a public education:

$$PV_{Private} > PV_{Public}$$

The present value of a private college education equals the present value of the wages the student expects to receive each year in the labor market minus the present value of the cost of attending a private college:

$$PV_{Private} = \sum_{t=4}^{46} \frac{W_{Private}}{(1+r)^{t}} - \sum_{t=0}^{3} \frac{H_{Private}}{(1+r)^{t}}$$

where $r$ is the discount rate, $t$ stands for the year (the sum stops at year 46 to account for retirement), and $H_{Private}$ represents the tuition of a private school for one year (Borjas 258).

The present value of a public school education equals the present value of the wages the student expects to receive each year in the labor market minus the present value of the cost of attending a public college:

$$PV_{Public} = \sum_{t=4}^{46} \frac{W_{Public}}{(1+r)^{t}} - \sum_{t=0}^{3} \frac{H_{Public}}{(1+r)^{t}}$$

where $r$ is the discount rate, $t$ stands for the year, and $H_{Public}$ represents the tuition of a public school for one year (Borjas 258). Because private tuition is higher than public tuition, these equations imply that the wages of a private school graduate must be higher than the wages of a public school graduate in order for $PV_{Private} > PV_{Public}$.
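To make the present value comparison concrete, the two sums can be computed directly. The following Python sketch is only an illustration: it assumes tuition is paid in years 0 through 3 and wages are earned in years 4 through 46, and the wage, tuition, and discount rate figures are made-up values, not data from this study. The function names `present_value` and `break_even_wage` are likewise hypothetical.

```python
def present_value(wage, tuition, r, college_years=4, last_year=46):
    """Discounted wages for years college_years..last_year
    minus discounted tuition for years 0..college_years-1."""
    wages = sum(wage / (1 + r) ** t for t in range(college_years, last_year + 1))
    costs = sum(tuition / (1 + r) ** t for t in range(college_years))
    return wages - costs


def break_even_wage(wage_public, tuition_public, tuition_private, r,
                    college_years=4, last_year=46):
    """Private-school wage at which PV_Private equals PV_Public."""
    wage_factor = sum(1 / (1 + r) ** t for t in range(college_years, last_year + 1))
    cost_factor = sum(1 / (1 + r) ** t for t in range(college_years))
    extra_cost = (tuition_private - tuition_public) * cost_factor
    return wage_public + extra_cost / wage_factor


# Hypothetical inputs chosen only for illustration.
r = 0.05
wage_public = 40000.0
tuition_public = 5000.0
tuition_private = 25000.0

pv_public = present_value(wage_public, tuition_public, r)
w_star = break_even_wage(wage_public, tuition_public, tuition_private, r)

print(f"PV of public education:        {pv_public:,.0f}")
print(f"Break-even private-school wage: {w_star:,.0f}")
```

Any private-school wage above the break-even figure makes the private present value the larger of the two, which is the condition the text describes: when private tuition is higher, the private graduate's wage has to be higher as well.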
Expected higher wages, however, may not be the only reason students attend private colleges. Private colleges tend to have smaller classes and allow more access to faculty, which benefits some students more than others. Students who need more personal attention from faculty to acquire valuable human capital may be willing to pay the high tuition of a private college even if they do not expect a higher wage in the labor market. Other students would prefer the lower tuition of public schools.

Another theory about the effect of education on labor market outcomes is the signaling theory. This theory is based on the idea that a certain level of education, or a degree from a specific college, signals a worker’s qualifications. According to the signaling theory, education has no social rate of return: these schools do not teach more skills or increase a worker’s human capital. Rather, a degree from certain colleges signals a worker’s skills. Employers value credentials from elite colleges because they signal a high level of human capital (Borjas 241).

Both of these economic theories predict that students attending elite colleges or universities can expect higher wages than students attending other universities. However, education is not the only factor that determines a worker’s wage in the labor market. Personal characteristics, occupation, and labor market conditions also affect wages.

Other Studies

Many economists have tried to measure the effect of college quality on earnings. Brewer, Eide, and Ehrenberg used data from the National Longitudinal Study of the High School Class of 1972 to gauge the effect of college choice on wages. They obtained their information on the characteristics of colleges from the Higher Education General Information Survey. The dependent variable used by Brewer, Eide, and Ehrenberg was the natural log of annual earnings. They controlled for many factors, including individual characteristics and the college selection process.
They found a statistically significant and large return to attending an elite private school relative to a low-ranked public school. They found weaker evidence of a premium to attending an elite public university. Evidence presented by Brewer, Eide, and Ehrenberg also suggests that the return to attending an elite private school in 1980 was greater than the return to attending an elite private school in 1972 (Brewer, Eide, and Ehrenberg 11).

In “College Quality and Future Earnings: Where Should You Send Your Child to College,” James, Alsalam, Conaty, and To also use data from the National Longitudinal Study of the High School Class of 1972 to measure the effects of college quality on wages. Their regression equation controls for individual characteristics, college characteristics, college major, occupation, and other labor market variables. James, Alsalam, Conaty, and To found that attending an Eastern elite private college or university had a positive effect on earnings. However, they also found that taking many math and science courses in college had a more substantial effect on earnings (James, Alsalam, Conaty, and To 251-252).

Wales used NBER-Thorndike data for his regression analysis. The NBER-Thorndike data were collected from a sample of men who took a test for the Air Force measuring their math and reasoning skills, physical coordination, spatial perception, and reaction to stress. Wales used the Gourman report as his measure of college quality. He found that the quality of the college or university had its most profound effect at the graduate school level. Specifically, Wales found that a student who had attended an undergraduate school and a graduate school that were both in the top fifth of all schools earned 60 percent more than the average high school graduate. However, a student who attended undergraduate and graduate schools in the bottom four fifths of all schools earned only 15 percent more than the average high school graduate (Wales 314).
Solmon and Wachtel also used NBER-Thorndike data and the Gourman rating of school quality. Controlling for experience, personal intelligence, years of schooling, occupation, and college type, Solmon and Wachtel found that college type does influence earnings. However, their results suggest that the effects of college type differ for different types of students (Solmon and Wachtel 89).

Unlike Wales and Solmon and Wachtel, who studied only men, I used National Longitudinal Study (NLS) young women data for my analysis. The National Longitudinal Study of the Labor Market observed over 5000 young women, ages 14 to 25, from 1968 to 1991. I performed two regressions using this data. The first tests whether higher levels of education, such as college, increase a woman’s earnings. The second measures the effect of college quality on wages. The dependent variable in both regressions was the natural log of wages in 1992. I took the natural log of wages because it lets me read the effects of the explanatory variables in terms of percent change: the coefficient of each independent variable shows the approximate percent change in wages due to a one-unit increase in that variable. Like the other studies, I took variables other than college quality into account when performing my regressions. In each regression I used a set of variables for personal characteristics, labor market conditions, and education.

First Regression

My first regression measures the effect of level of education on hourly wages. I control for age, race, region, field of study in college (if the subject went to college), college quality (whether the college attended was in the top quartile of all the colleges in the data set), and occupation. Usually experience is included as a control variable in studies measuring effects on wages. However, many subjects in the NLS data set did not report their experience in the labor market.
As a result, using experience in my regression would limit the number of observations. Therefore, in this regression, age is used as a proxy for experience. Age is a better proxy for experience for men than for women, but I treat it as an adequate substitute for women here. The regression includes age as well as age squared because the effect of age on wages is not constant: wages tend to increase as workers grow older, but they increase at a decreasing rate.

Some individual characteristics are left out of the regression as well. Although mother’s and father’s education most likely do affect a worker’s wage, they are omitted from the regression for the same reason as experience. The data set is missing information on parents’ level of education for many observations, so the variables are omitted in order to retain a high number of observations.

The regression equation is:

$$
\begin{aligned}
\ln(\text{Wage}) = {} & b_0 + b_1(\text{Age}) + b_2(\text{Age}^2) + b_3(\text{South}) + b_4(\text{Black}) \\
& + b_5(\text{Occ-Prof}) + b_6(\text{Occ-Man}) + b_7(\text{Occ-Cler}) + b_8(\text{Occ-Sal}) + b_9(\text{Occ-Cra}) + b_{10}(\text{Occ-HH}) + b_{11}(\text{Occ-Svc}) \\
& + b_{12}(\text{Field-SC}) + b_{13}(\text{Field-En}) + b_{14}(\text{Field-SS}) + b_{15}(\text{Field-Co}) + b_{16}(\text{Field-Bu}) + b_{17}(\text{Field-Ed}) \\
& + b_{18}(\text{HsGrad}) + b_{19}(\text{SomeCol}) + b_{20}(\text{Bach}) + b_{21}(\text{GradSchl}) + b_{22}(\text{Top25}) + E
\end{aligned}
$$

E is a normally distributed error term. South, Black, the occupations, the fields of study, the levels of education, and the quality of school are all dummy variables. The occupations are measured against farm workers, and the fields of study are measured against a humanities major. Levels of education are measured against high school dropouts. For a complete list and definitions of the dummy variables, see Appendix A.

First Regression Results

Many occupations and all levels of education are statistically significant at the 99 percent level. According to this regression, education does have an effect on wages.
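Because the dependent variable is the natural log of wages, the coefficient b on a dummy variable is a difference in log wages, and the corresponding exact percent difference is 100(e^b - 1). Reading b itself as the percent change is an approximation that works best for small coefficients. The sketch below is illustrative and not part of the original analysis; it applies that conversion to the education coefficients listed in Appendix C, and the helper name `exact_percent_change` is an assumption made for this example.

```python
import math


def exact_percent_change(b):
    """Exact percent change in wages implied by a log-wage coefficient b."""
    return 100.0 * (math.exp(b) - 1.0)


# Education coefficients from the first regression (Appendix C),
# each measured relative to a high school dropout.
education_coefficients = {
    "HSGRAD": 0.20231,
    "SOMECOL": 0.29010,
    "BACH": 0.40975,
    "GRADSCHL": 0.61356,
}

exact = {name: exact_percent_change(b) for name, b in education_coefficients.items()}

for name, b in education_coefficients.items():
    # 100 * b is the usual approximation; exact[name] is 100 * (e^b - 1).
    print(f"{name:9s} approx {100 * b:5.1f}%   exact {exact[name]:5.1f}%")
```

The gap between the two readings grows with the size of the coefficient: for HSGRAD the exact figure is about 22 percent rather than 20, while for GRADSCHL it is about 85 percent rather than 61.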
Relative to a high school dropout, a high school diploma increases wages by roughly 20 percent, attending some college increases wages by roughly 29 percent, a bachelor’s degree increases wages by roughly 40 percent, and going to graduate school increases wages by roughly 61 percent. The p-value for the F test of the regression is <0.0001, so at the 99 percent level I can reject the hypothesis that all of the coefficients of the independent variables are jointly equal to zero. In other words, the independent variables, taken together, do have an effect on wages.

Using a correlation matrix of the estimates, I detected some multicollinearity among the education levels. However, this is to be expected, because in order to graduate from college a person must first graduate from high school, and in order to finish graduate school a person must have completed college. Multicollinearity also exists between certain occupations, but this probably did not affect the results greatly. Heteroscedasticity does not seem to be a problem.

The R-squared of the regression is 0.2995, meaning that the data points do not fit the regression line very well. Only 30 percent of the variation in log wages around the mean can be explained by the regression. This is most likely because other variables that influence wages, such as personal drive, productivity, and luck, are extremely difficult to measure. For the actual regression coefficients and summary statistics, see Appendix C.

Second Regression

So it seems that education does affect wages. But does the quality of a college education affect wages, too? The previous regression found a slightly positive but statistically insignificant effect. In order to obtain a better answer to this question, I performed a second regression. In this regression, I reduce the observation group to only those who graduated from college.
To test whether the quality of college affects wages among college graduates, I control for race, age, region, college field of study, and graduate school. Again I include a dummy variable indicating whether the college attended was in the top quartile, ranked by average SAT score, of the colleges attended by people in the data set. As in the last regression, age is used as a proxy for experience. In this regression, however, occupation is omitted to preserve a higher number of degrees of freedom.

The regression equation is:

$$
\begin{aligned}
\ln(\text{Wage}) = {} & b_0 + b_1(\text{Age}) + b_2(\text{Age}^2) + b_3(\text{Black}) + b_4(\text{South}) + b_5(\text{GradSchl}) + b_6(\text{Top25}) \\
& + b_7(\text{Field-SC}) + b_8(\text{Field-En}) + b_9(\text{Field-SS}) + b_{10}(\text{Field-Co}) + b_{11}(\text{Field-Bu}) + b_{12}(\text{Field-Ed}) + E
\end{aligned}
$$

Again, E is a normally distributed error term. For a list and definitions of the dummy variables, refer to Appendix A.

Second Regression Results

Nearly all of the coefficients in the second regression are statistically insignificant. Only the coefficients for Graduate School and a Science or Math major in college are statistically different from zero at the 99 percent level; the coefficient for South is significant at the 95 percent level. There is no evidence of multicollinearity or heteroscedasticity. The R-squared value is 0.1038; the data points fit the regression line very poorly. Again, this is most likely because of unobservable characteristics that also affect wages. The p-value for the F test is again <.0001, meaning that at the 99 percent level the independent variables jointly affect the dependent variable. For complete regression results and summary statistics, see Appendix D.

Hypothesis Tests

In the second regression, like the first, the coefficient for the top 25 percent of colleges by quality is small, slightly positive, and statistically insignificant. So is there any real difference between attending a top school and attending any other school? In order to answer this question, I performed a hypothesis test for the difference of two means. The mean hourly wage of subjects with a college degree who attended colleges in the bottom 75 percent of this study is 1585.41.
The mean wage for those who attended a college in the top 25 percent of schools is 1687.86. For summary statistics of the variables used in the hypothesis tests, refer to Appendix B.

Hypothesis test for the difference between the two means:

H0: Top25 - College = 0. The mean wage of top 25 percent graduates minus the mean wage of all other college graduates equals zero.
H1: Top25 - College ≠ 0. The mean wage of top 25 percent graduates minus the mean wage of all other college graduates does not equal zero.
Level of alpha: .05, so Z.025 = 1.96
Test statistic:

$$z = \frac{\bar{x}_{Top25} - \bar{x}_{College}}{\sqrt{s_{Top25}^{2}/n_{Top25} + s_{College}^{2}/n_{College}}} = \frac{1687.86 - 1585.41}{\sqrt{1050.91^{2}/173 + 804.36^{2}/469}} = 1.16$$

Decision rule: Reject H0 if z > 1.96 or z < -1.96.
-1.96 < 1.16 < 1.96

I cannot reject the null hypothesis. There is no statistically significant difference between the mean wage of a top 25 percent college graduate and a bottom 75 percent college graduate. There seems to be no return to attending a school in the top 25 percent of colleges in the NLS study.

The regressions and the hypothesis test therefore suggest that there is no real difference in labor market returns between a college education from a school in the top 25 percent of all schools by quality and one from a school in the bottom 75 percent. Perhaps schools that are even more elite will have a higher return. I performed a hypothesis test for the difference between graduates of colleges in the top 10 percent and all other college graduates. The mean wage of the college graduates who attended a college ranked in the top 10 percent by SAT score is 1977.75. The mean for all other college graduates (bottom 90 percent) is 1569.81.

Hypothesis test:

H0: Top10 - College = 0. The mean wage of top 10 percent graduates minus the mean wage of all other college graduates equals zero.
H1: Top10 - College ≠ 0. The mean wage of top 10 percent graduates minus the mean wage of all other college graduates does not equal zero.
Level of alpha: .05, so Z.025 = 1.96
Test statistic:

$$z = \frac{\bar{x}_{Top10} - \bar{x}_{College}}{\sqrt{s_{Top10}^{2}/n_{Top10} + s_{College}^{2}/n_{College}}} = \frac{1977.75 - 1569.81}{\sqrt{1397.21^{2}/68 + 785.08^{2}/574}} = 2.36$$

Decision rule: Reject H0 if z > 1.96 or z < -1.96.
2.36 > 1.96

I can reject the null hypothesis. The difference between the wages of a top 10 percent college graduate and a bottom 90 percent college graduate is significantly different from zero.

Conclusion

In the hypothesis test as well as the regressions, I found no significant effect on wages of attending a school in the top 25 percent of colleges versus the bottom 75 percent. This result differs from previous studies, perhaps because this study was limited to women, while the other studies were either co-ed or limited to men. According to economists Lawrence Mishel, Jared Bernstein, and John Schmitt, the ratio of female to male wages in 1997 was about .79, meaning that women make 79 percent of what men make (134-135). This could cause the different results in the return to college quality for men and women. Further, there is evidence that because of socialization processes or the importance of child rearing, women often cluster into certain occupations, which are often low-wage.

I also found some results similar to those of other studies. Like Wales, I found a large return to attending graduate school. According to my regressions, those who attended graduate school earn 61 percent higher wages than high school dropouts and 24 percent higher wages than college graduates. Similar to James, Alsalam, Conaty, and To, I found that some fields of study, particularly science and math, affect wages. In the regression including solely college graduates, the coefficient for the science and math field of study was .24, significant at the 99 percent level. Consistent with the results of James, Alsalam, Conaty, and To, in the second hypothesis test I found that there is a significant return to attending an elite college in the top 10 percent of schools.
However, most of my regression coefficients in both regressions were statistically insignificant. This was probably due, in part, to the small number of female subjects in the NLS data set who attended elite schools. Perhaps an analysis of a larger group of college graduates, accounting for more variables such as high school grades, college grades, and individual SAT scores, would produce more significant results about the return to college quality for women.

Works Cited

Borjas, George J. Labor Economics. The McGraw-Hill Companies, 1996.
Brewer, Dominic J., Eric R. Eide, and Ronald G. Ehrenberg. “Does it Pay to Attend an Elite Private College? Cross-Cohort Evidence on the Effects of College Type on Earnings.” Journal of Human Resources 34 (Winter 1999).
James, Estelle, Nabeel Alsalam, Joseph C. Conaty, and Duc-Le To. “College Quality and Future Earnings: Where Should You Send Your Child to College.” American Economic Association Papers and Proceedings, May.
Mishel, Lawrence, Jared Bernstein, and John Schmitt. The State of Working America. Ithaca: Cornell University Press, 1999.
Solmon, Lewis C. and Paul Wachtel. “The Effects on Income of Type of College Attended.” Sociology of Education 48 (Winter 1975): 75-90.
Wales, Terence J. “The Effect of College Quality on Earnings: Results from the NBER-Thorndike Data.” Journal of Human Resources 8: 306-315.

Appendix A

Dummy Variables

Fields of Study
For all fields the observation equals 1 if the subject studied that field in college; otherwise the observation equals zero.
Field-SC=Sciences and Math
Field-En=Engineering
Field-SS=Social Sciences
Field-CO=Computers
Field-BU=Business
Field-ED=Education
The omitted variable is Field-HU=Humanities. All of the coefficients of the Field dummy variables compare the specific field to a humanities field.

Occupations
For all occupations the observation equals 1 if the subject works in that occupation; otherwise the observation equals zero.
OCC-PROF=Professional, Technical, and Kindred
OCC-MAN=Managers, Officials, and Proprietors
OCC-CLER=Clerical Workers
OCC-SAL=Sales Workers
OCC-CRA=Craftsmen, Foremen, Operatives, and Kindred
OCC-HH=Private Household Workers
OCC-SVC=Service Workers except Private Household
The omitted variable is OCC-FARM=Farm Workers, Farm Managers, Farm Laborers, and Foremen. All of the coefficients of the occupational dummy variables compare the specific occupation to farm workers.

Black
The observation is equal to 1 if the subject is black, 0 otherwise.

South
The observations take on different values depending on the location of the home of the subject.

Levels of Education
For all levels of education the observation equals 1 if the subject has obtained the specific level of education; otherwise the observation equals zero.
HSGRAD=High School Graduate
SOMECOL=The subject has had some college
BACH=The subject has graduated from college
GRADSCHL=Graduate School
The omitted variable is HSDROP=High School Dropout. The coefficients of the level of education variables compare the specific level of education to a high school dropout.

Top 25
The colleges were ranked by average SAT score of the student body. The observation is equal to one if the subject attended a college in the top quartile of schools, zero otherwise.

Top 10
The colleges were ranked by average SAT score of the student body. The observation equals one if the subject attended a college in the top tenth of schools, zero otherwise.

Appendix B

Summary Statistics for hourly wages for subjects who graduated college

Group        N      Mean       Std. Dev.   Minimum   Maximum
Top 10       68     1977.75    1397.21     275.00    9622.00
Bottom 90    574    1569.81     785.08     231.00    9622.00
Top 25       173    1687.86    1050.91     275.00    9622.00
Bottom 75    469    1585.41     804.36     231.00    9623.00

Appendix C

The First Regression Analysis

Dependent Variable: hourly_wage_log

Analysis of Variance

Source             DF      Sum of Squares   Mean Square   F Value   Pr > F
Model              22      250.29674        11.37712      45.09     <.0001
Error              2320    585.33384         0.25230
Corrected Total    2342    835.63058

Root MSE           0.50229     R-Square    0.2995
Dependent Mean     6.89650     Adj R-Sq    0.2929
Coeff Var          7.28331

Parameter Estimates

Variable     DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1     5.94265             2.34114            2.54    0.0112
SOUTH        1    -0.00833             0.00387           -2.15    0.0315
AGE93        1     0.01577             0.10709            0.15    0.8829
agesqu       1    -0.00017423          0.00122           -0.14    0.8865
FIELD_SC     1     0.15518             0.03773            4.11    <.0001
FIELD_EN     1     0.30953             0.29236            1.06    0.2898
FIELD_SS     1     0.04913             0.05059            0.97    0.3316
FIELD_CO     1     0.15363             0.09845            1.56    0.1188
FIELD_BU     1     0.05904             0.03844            1.54    0.1247
FIELD_ED     1    -0.07414             0.03958           -1.87    0.0612
BLACK        1    -0.03558             0.02556           -1.39    0.1641
top25        1     0.02438             0.03915            0.62    0.5335
OCC_PROF     1     0.46105             0.08362            5.51    <.0001
OCC_MAN      1     0.57039             0.08499            6.71    <.0001
OCC_CLER     1     0.24549             0.08166            3.01    0.0027
OCC_SAL      1     0.33536             0.09546            3.51    0.0005
OCC_CRA      1     0.17051             0.08440            2.02    0.0435
OCC_HH       1    -0.37108             0.12537           -2.96    0.0031
OCC_SVC      1     0.05746             0.08402            0.68    0.4941
HSGRAD       1     0.20231             0.03653            5.54    <.0001
SOMECOL      1     0.29010             0.04357            6.66    <.0001
BACH         1     0.40975             0.05173            7.92    <.0001
GRADSCHL     1     0.61356             0.05269           11.64    <.0001

The First Regression Summary Statistics

The MEANS Procedure

Variable      N      Mean           Std Dev        Minimum    Maximum    Miss
hourly_wage   2343   1174.91        780.5697857      10.00    9623.00    2816
SOUTH         5159   -48.7045939     62.4078255    -128.00       1.00       0
AGE93         5159    43.7220392      3.0198196      39.00      49.00       0
FIELD_SC      5159     0.0773406      0.2671570       0          1.00       0
FIELD_EN      5159     0.0013569      0.0368140       0          1.00       0
FIELD_SS      5159     0.0346966      0.1830281       0          1.00       0
FIELD_CO      5159     0.0065904      0.0809213       0          1.00       0
FIELD_BU      5159     0.0738515      0.2615545       0          1.00       0
FIELD_HU      5159     0.0407056      0.1976264       0          1.00       0
FIELD_ED      5159     0.0750145      0.2634403       0          1.00       0
BLACK         5159     0.2828067      0.4504069       0          1.00       0
top25         5159     0.0792789      0.2701998       0          1.00       0
DROPOUT       5159     0.0924598      0.2897020       0          1.00       0
HSGRAD        5159     0.2333786      0.4230221       0          1.00       0
SOMECOL       5159     0.1316147      0.3381041       0          1.00       0
BACH          5159     0.0744330      0.2624997       0          1.00       0
GRADSCHL      5159     0.0858694      0.2801982       0          1.00       0
OCC_PROF      5159     0.1515798      0.3586478       0          1.00       0
OCC_MAN       5159     0.0730762      0.2602867       0          1.00       0
OCC_CLER      5159     0.1703819      0.3760044       0          1.00       0
OCC_SAL       5159     0.0294631      0.1691170       0          1.00       0
OCC_CRA       5159     0.0703625      0.2557817       0          1.00       0
OCC_HH        5159     0.0137624      0.1165143       0          1.00       0
OCC_SVC       5159     0.0827680      0.2755579       0          1.00       0
OCC_FARM      5159     0.0120178      0.1089757       0          1.00       0

Appendix D

The Second Regression Analysis

Dependent Variable: hourly_wage_log

Analysis of Variance

Source             DF     Sum of Squares   Mean Square   F Value   Pr > F
Model              12      17.37674        1.44806       6.07      <.0001
Error              629    149.95519        0.23840
Corrected Total    641    167.33192

Root MSE           0.48826     R-Square    0.1038
Dependent Mean     7.26062     Adj R-Sq    0.0867
Coeff Var          6.72483

Parameter Estimates

Variable     DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1    11.18661             4.44167            2.52    0.0120
SOUTH        1    -0.09784             0.04163           -2.35    0.0191
AGE93        1    -0.19267             0.20314           -0.95    0.3432
agesqu       1     0.00226             0.00231            0.98    0.3289
FIELD_SC     1     0.24321             0.05884            4.13    <.0001
FIELD_EN     1     0.27962             0.28521            0.98    0.3273
FIELD_SS     1     0.10904             0.06486            1.68    0.0932
FIELD_CO     1     0.00202             0.22281            0.01    0.9928
FIELD_BU     1     0.07671             0.07366            1.04    0.2981
FIELD_ED     1    -0.02312             0.05217           -0.44    0.6578
BLACK        1     0.01876             0.05364            0.35    0.7266
GRADSCHL     1     0.24144             0.03908            6.18    <.0001
top25        1     0.00924             0.04559            0.20    0.8394

The Second Regression Summary Statistics

Variable      N     Mean            Std Dev        Minimum   Maximum   Miss
Hourly wage   642   1613.0200000    877.9523162    231.00    9623.00   185
SOUTH         827      0.3712213      0.4834239      0          1.00     0
AGE93         827     43.5054414      2.9876107     39.00      49.00     0
FIELD_SC      827      0.1958888      0.3971235      0          1.00     0
FIELD_EN      827      0.0036276      0.0601563      0          1.00     0
FIELD_SS      827      0.1269649      0.3331352      0          1.00     0
FIELD_CO      827      0.0084643      0.0916670      0          1.00     0
FIELD_BU      827      0.1003628      0.3006649      0          1.00     0
FIELD_HU      827      0.1318017      0.3384798      0          1.00     0
FIELD_ED      827      0.2889964      0.4535705      0          1.00     0
BLACK         827      0.1523579      0.3595849      0          1.00     0
GRADSCHL      827      0.5356711      0.4990278      0          1.00     0
Top25         827      0.2877872      0.4530054      0          1.00     0
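
As a check on the two difference-of-means tests in the Hypothesis Tests section, the z statistics can be recomputed from the Appendix B summary statistics alone. The following is a minimal Python sketch, not the procedure actually used for this paper. The function name `two_sample_z` is introduced here for illustration, and the means, standard deviations, and group sizes are the ones listed in Appendix B.

```python
import math


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """z statistic for the difference between two independent sample means."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Appendix B: (mean, standard deviation, N) of hourly wages by college group.
top25 = (1687.86, 1050.91, 173)
bottom75 = (1585.41, 804.36, 469)
top10 = (1977.75, 1397.21, 68)
bottom90 = (1569.81, 785.08, 574)

CRITICAL_Z = 1.96  # two-sided test at alpha = .05

z25 = two_sample_z(*top25, *bottom75)
z10 = two_sample_z(*top10, *bottom90)

for label, z in (("Top 25 vs. bottom 75", z25), ("Top 10 vs. bottom 90", z10)):
    decision = "reject H0" if abs(z) > CRITICAL_Z else "cannot reject H0"
    print(f"{label}: z = {z:.2f}, {decision}")
```

With the Appendix B group sizes, the first statistic is about 1.16 and the second about 2.36, so only the top 10 percent comparison exceeds the 1.96 critical value.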