I. Introduction: THE CALIFORNIA TEST SCORE DATA SET

The California Standardized Testing and Reporting (STAR) dataset contains data on test performance, school characteristics and student demographic backgrounds. The data used here are from all 420 K-6 and K-8 districts in California with data available for 1998 and 1999. Test scores are the average of the reading and math scores on the Stanford 9 standardized test administered to 5th grade students. School characteristics (averaged across the district) include enrollment, number of teachers (measured as “full-time-equivalents”), number of computers per classroom, and expenditures per student. The student-teacher ratio used here is the number of students in the district divided by the number of full-time-equivalent teachers. Demographic variables for the students are also averaged across the district. They include the percentage of students in the public assistance program CalWorks (formerly AFDC), the percentage of students who qualify for a reduced-price lunch, and the percentage of students who are English Learners (that is, students for whom English is a second language). All of these data were obtained from the California Department of Education (www.cde.ca.gov).

In this research we wish to test how a set of selected variables from the data set affect test scores. The statistical software used to carry out this research is STATA. We wish to test the following hypotheses:

i. Test the relationship of average test score with district average income, percent of English learners, percent qualifying for CalWorks, percent qualifying for reduced-price lunch, computers per student, expenditures per student, and student-teacher ratio.
ii. Test whether there is a concave relationship between average test score and expenditures per student.
iii. Test whether there is a negative relationship between district average income and percent qualifying for reduced-price lunch.
iv. Test that a percentage increase in expenditure per student affects the percentage change in average test score positively.
v. Test whether computers per student and expenditure per student affect average test score equally.

II. Empirical Methodology:

These are the variables we have used in our research:

Dependent Variable:
TESTSCR: AVG TEST SCORE (= (READ_SCR+MATH_SCR)/2)

Independent Variables:
COMP_STU: COMPUTERS PER STUDENT (= COMPUTER/ENRL_TOT)
EXPN_STU: EXPENDITURES PER STUDENT ($’S)
STR: STUDENT TEACHER RATIO (ENRL_TOT/TEACHERS)
EL_PCT: PERCENT OF ENGLISH LEARNERS (SECOND LANGUAGE)
MEAL_PCT: PERCENT QUALIFYING FOR REDUCED-PRICE LUNCH
CALW_PCT: PERCENT QUALIFYING FOR CALWORKS
AVGINC: DISTRICT AVERAGE INCOME (IN $1000'S)

Other variables that have been constructed/generated to test certain hypotheses:

Dependent:
LOG_TESTSCR: LOG(TESTSCR)

Independent:
EXPN2: (EXPN_STU)^2
AVGINC_MEAL: (AVGINC*MEAL_PCT)
LOG_EXP: LOG(EXPN_STU)
COMP_EXPN: (COMP_STU + EXPN_STU)

Presentation and discussion of the summary statistics of key variables:

Variable         TESTSCR
Mean             654.1565
St. Deviation    19.05335
Variance         363.0301
Skewness         .0916151
Kurtosis         2.745712

The mean of test scores is 654.1565 with a std. deviation of 19.05335. Since Kurtosis < 3, it has a Platykurtic distribution, flatter than a normal distribution with a wider peak. Although the distribution is positively skewed, the magnitude of the skewness is negligible, and hence the distribution can be assumed to be symmetrical around the mean.

Variable         AVGINC
Mean             15.31659
St. Deviation    7.22589
Variance         52.21348
Skewness         2.215156
Kurtosis         9.532125

The mean of district average income is $15316.59 with a std. deviation of 7.22589 (in $1000's). Since Kurtosis > 3, it has a Leptokurtic distribution, sharper than a normal distribution, with values concentrated around the mean and thicker tails. Skewness > 0, so we have a Right skewed distribution: most values are concentrated on the left of the mean, with extreme values to the right. This means that there are a few exceptional districts with average income much higher than the mean.

Variable         EL_PCT
Mean             15.76816
St. Deviation    18.28593
Variance         334.3751
Skewness         1.426798
Kurtosis         4.435401

The mean percentage of English learners is 15.77% with a std. deviation of 18.29. The std. deviation is larger than the mean, indicating that the values are widely scattered: there are districts where the percentage of English learners is much higher than the mean and districts where it is much lower. Since Kurtosis > 3, we have a Leptokurtic distribution, sharper than a normal distribution, with values concentrated around the mean and thicker tails. Skewness > 0, so we have a Right skewed distribution with most values concentrated on the left of the mean and extreme values to the right. This means that a few districts have a substantially higher percentage of students studying English as a second language.

Variable         COMP_STU
Mean             .1359266
St. Deviation    .0649558
Variance         .0042193
Skewness         .9223692
Kurtosis         4.431126

The mean number of computers per student is .136 with a std. deviation of .065. Since Kurtosis > 3, the distribution is Leptokurtic, sharper than a normal distribution, with values concentrated around the mean and thicker tails. Skewness > 0 implies a Right skewed distribution: most values are concentrated on the left of the mean, with extreme values to the right. This means that there are a few districts which have a much higher number of computers per student than the mean.

Variable         MEAL_PCT
Mean             44.70524
St. Deviation    27.12338
Variance         735.6778
Skewness         .1839536
Kurtosis         2.000198

The mean percentage of students qualifying for a reduced-price meal is 44.7%, which is quite substantial. The std. deviation is 27.12, which is quite high, indicating that there are districts where the percentage of students qualifying for a reduced-price meal is much higher than the mean and districts where it is much lower. Since Kurtosis < 3, it is a Platykurtic distribution, flatter than a normal distribution with a wider peak. Skewness > 0, so we have a Right skewed distribution, but the skewness is low. Most values are concentrated on the left of the mean, with very few extreme values (districts where the percentage of students qualifying for a reduced-price meal is much higher than 44.7%).

Variable         EXPN_STU
Mean             5312.408
St. Deviation    633.9371
Variance         401876.2
Skewness         1.067897
Kurtosis         4.875713

The mean of expenditure per student is $5312.4 with a std. deviation of 633.9371. Since Kurtosis > 3, the distribution is Leptokurtic, sharper than a normal distribution, with values concentrated around the mean and thicker tails. Skewness > 0, implying a Right skewed distribution: most values are concentrated on the left of the mean, with extreme values to the right. This means there are a few districts that have a very high expenditure per student.

The Regression Equation(s):

i. Testscr = β0 + β1avginc + β2el_pct + β3calw_pct + β4meal_pct + β5comp_stu + β6expn_stu + β7str + ɛ
ii. Testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4comp_stu + β5expn_stu + β6expn2 + ɛ
iii. Testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4comp_stu + β5expn_stu + β6(avginc_meal) + ɛ
iv. Log_testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4comp_stu + β5log_exp + ɛ
v. Testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4(comp_expn) + ɛ

Expected signs of coefficients:

For Regression Equation (i)
Testscr = β0 + β1avginc + β2el_pct + β3calw_pct + β4meal_pct + β5comp_stu + β6expn_stu + β7str + ɛ
It is expected that,
β1>0 (since students from districts with higher average income are expected to get better facilities/tools for education)
β2<0 (since students who have English as a second language are expected to be weaker in English, which should show up as lower reading scores and hence lower test scores; so where the percentage of English learners is higher, test scores should be lower on average)
β3<0 (since a higher share of students in CalWorks indicates that students are worse-off and hence probably have less access to tools that can help in education, resulting in lower test scores on average)
β4<0 (since those who qualify for a reduced-price meal are expected to be worse-off and hence probably have less access to facilities/tools that can help in education)
β5>0 (since more computers per student means each student can use a computer more effectively and for a longer time)
β6>0 (since an increase in expenditure per student means better facilities and better quality of education, and hence better results)
β7<0 (since a higher student-teacher ratio means that students get less individual attention from the teacher)

For Regression Equation (ii)
Testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4comp_stu + β5expn_stu + β6expn2 + ɛ
It is expected that,
β6<0 (since facilities that are received too easily may be valued less by the students and hence not put to proper use)

For Regression Equation (iii)
Testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4comp_stu + β5expn_stu + β6(avginc_meal) + ɛ
It is expected that,
β6<0 (since it is expected that districts with higher average income would have a lower percentage of students who qualify for reduced-price lunch)

For Regression Equation (iv)
Log_testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4comp_stu + β5log_exp + ɛ
It is expected that,
β5>0 (since a percentage increase in expenditure per student is likely to cause an increase in test scores)

For Regression Equation (v)
Testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4(comp_expn) + ɛ
It is expected that,
β4>0 (since the coefficients of both comp_stu and expn_stu are expected to be positive individually; however, whether they affect test scores equally is ambiguous)

III. Estimated Results:

Report of estimated results and explanation of estimated models:

For Regression Equation (i)
Testscr = β0 + β1avginc + β2el_pct + β3calw_pct + β4meal_pct + β5comp_stu + β6expn_stu + β7str + ɛ

Independent Variables    Value of Coefficient    P-value
avginc                   .6216732                0.000
el_pct                   -.1981365               0.000
calw_pct                 -.0778183               0.175
meal_pct                 -.375618                0.000
comp_stu                 11.89028                0.086
expn_stu                 .0015263                0.088
str                      -.18991                 0.503
_cons                    659.5871                0.000

Standard error (Root MSE)    8.3914
R-squared                    0.8093
Adjusted R-squared           0.8060
F-statistic (Prob>F)         249.74 (0.0000)

From the table above we can see that str has a negative coefficient (β7<0, as hypothesized), but its p-value is very high at 0.503, clearly indicating that the coefficient is insignificant. The coefficient of calw_pct is also negative (β3<0, as hypothesized), but its p-value (0.175) is again high, so this coefficient is insignificant as well. So, the variables str and calw_pct should be dropped from our model. We have to reject our hypotheses for the coefficients of these two variables. We therefore get a new equation for our model:

New Regression equation (i)
Testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4comp_stu + β5expn_stu + ɛ

Note: The numbering of the coefficients has changed from the original equation, since we have changed our original model by dropping two variables. E.g. β3 was the coefficient of calw_pct in the original model, but in the new model β3 is the coefficient of meal_pct.
Hence, when comparing the results of the hypothesis tests, this difference in numbering must be kept in mind.

The estimated results for the new model are as follows:

Independent Variables    Value of Coefficient    P-value
avginc                   .6194044                0.000
el_pct                   -.1859543               0.000
meal_pct                 -.4064677               0.000
comp_stu                 13.50629                0.048
expn_stu                 .0016874                0.021
_cons                    654.9726                0.000

Standard error (Root MSE)    8.3945
R-squared                    0.8082
Adjusted R-squared           0.8059
F-statistic (Prob>F)         348.91 (0.0000)

From the table above we can see that all the coefficients have p-values less than 0.05. Hence, all the coefficients are significant at the 5% level.

The coefficient of avginc is .6194044, i.e. β1>0 as we hypothesized. So, we do not reject our hypothesis. This means that if district average income increases by 1 unit, i.e. $1000, the average test score is estimated to increase by .6194044 points on average.

The coefficient of el_pct is negative, i.e. β2<0 as we hypothesized. So, we do not reject our hypothesis. This means that where the percentage of English learners is 1 percentage point higher, test scores are estimated to be .1859543 points lower on average.

The coefficient of meal_pct is negative, i.e. β3<0 as we hypothesized. So, we do not reject our hypothesis. This means that where the percentage of students qualifying for reduced-price lunch is 1 percentage point higher, we expect test scores to be .4064677 points lower on average.

The coefficient of comp_stu is positive, i.e. β4>0 as we hypothesized. So, we do not reject our hypothesis. This means that when there is an increase of 1 computer per student, test scores are estimated to go up by 13.50629 points on average.

The coefficient of expn_stu is positive, i.e. β5>0 as we hypothesized. So, we do not reject our hypothesis. This means that when expenditure per student increases by $1, test scores are estimated to go up by .0016874 points on average.
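Because the regressors are measured in very different units, the raw coefficients of the new regression model (i) are hard to compare directly. As a rough illustration only, the sketch below multiplies each estimated coefficient by the standard deviation of its regressor (both taken from the STATA output) to give the estimated change in test score for a one-standard-deviation rise in that regressor. It is written in Python rather than STATA, the names coefficients, std_devs and one_sd_effects are hypothetical, and it assumes the fitted linear model holds over that range.

```python
# Coefficients of the new regression model (i), copied from the results table.
coefficients = {
    "avginc": 0.6194044,
    "el_pct": -0.1859543,
    "meal_pct": -0.4064677,
    "comp_stu": 13.50629,
    "expn_stu": 0.0016874,
}

# Standard deviations of the regressors, copied from the summary statistics.
std_devs = {
    "avginc": 7.22589,
    "el_pct": 18.28593,
    "meal_pct": 27.12338,
    "comp_stu": 0.0649558,
    "expn_stu": 633.9371,
}


def one_sd_effects(coefs, sds):
    """Estimated change in testscr for a one-std-dev rise in each regressor."""
    return {name: coefs[name] * sds[name] for name in coefs}


effects = one_sd_effects(coefficients, std_devs)
for name, change in effects.items():
    print(f"{name:9s} {change:+.2f} points")
```

On these figures, a one-standard-deviation rise in meal_pct is associated with a fall of roughly 11 points, while one-standard-deviation rises in comp_stu and expn_stu are each associated with a rise of roughly 1 point.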
For Regression Equation (ii)
Testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4comp_stu + β5expn_stu + β6expn2 + ɛ

Independent Variables    Value of Coefficient    P-value
avginc                   .615671                 0.000
el_pct                   -.1849029               0.000
meal_pct                 -.4062285               0.000
comp_stu                 12.77274                0.063
expn_stu                 -.0041014               0.565
expn2                    5.16e-07                0.414
_cons                    671.0976                0.000

Standard error (Root MSE)    8.3979
R-squared                    0.8085
Adjusted R-squared           0.8057
F-statistic (Prob>F)         290.64 (0.0000)

From the table we can see that the coefficient of expn2 is neither negative nor significant. Hence, we should reject our hypothesis that the relationship between test score and expenditure per student is concave.

For Regression Equation (iii)
Testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4comp_stu + β5expn_stu + β6(avginc_meal) + ɛ

Independent Variables    Value of Coefficient    P-value
avginc                   .6514535                0.000
el_pct                   -.1882845               0.000
meal_pct                 -.3719248               0.000
comp_stu                 12.46924                0.070
expn_stu                 .0016199                0.028
avginc_meal              -.0033181               0.235
_cons                    655.3017                0.000

Standard error (Root MSE)    8.3903
R-squared                    0.8089
Adjusted R-squared           0.8061
F-statistic (Prob>F)         291.29 (0.0000)

Our hypothesis that β6<0 should be rejected because, although the sign of the coefficient is negative, the high p-value (0.235) indicates that the coefficient is insignificant. Hence, we cannot conclude that districts with higher average income have a lower percentage of students who qualify for reduced-price lunch.

For Regression Equation (iv)
Log_testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4comp_stu + β5log_exp + ɛ

Independent Variables    Value of Coefficient    P-value
avginc                   .0009058                0.000
el_pct                   -.0002937               0.000
meal_pct                 -.0006265               0.000
comp_stu                 .0210436                0.043
log_exp                  .0132001                0.032
_cons                    6.38569                 0.000

Standard error (Root MSE)    0.01284
R-squared                    0.8079
Adjusted R-squared           0.8056
F-statistic (Prob>F)         348.26 (0.0000)

The p-value of the coefficient of log_exp, i.e. log(expn_stu), is less than 5%, indicating that the coefficient is significant. The positive value of the coefficient matches our hypothesis (i.e. β5>0).
This means that a 1% increase in expenditure per student is estimated to increase test scores by .0132001% (since both test score and expenditure per student enter this model in logs).

For Regression Equation (v)
Testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4(comp_expn) + ɛ

Independent Variables    Value of Coefficient    P-value
avginc                   .6218424                0.000
el_pct                   -.1953478               0.000
meal_pct                 -.4079631               0.000
comp_expn                .0020521                0.004
_cons                    655.0487                0.000

Standard error (Root MSE)    8.4242
R-squared                    0.8064
Adjusted R-squared           0.8045
F-statistic (Prob>F)         432.09 (0.0000)

The coefficient of comp_expn is significant and positive, as expected. However, to test whether comp_stu and expn_stu affect testscr equally, we must compute F-calc, comparing this restricted model with the unrestricted new regression model (i).

SSEr = 29451.6675; SSEur = 29173.7582; q = 1 (H0: β4 = β5 in the new regression model (i)); n - k = 414, the residual degrees of freedom of the unrestricted model [refer to STATA outputs in the appendix for these values].

Hence F-calc = [(SSEr - SSEur)/q] / [SSEur/(n - k)] = 3.9438
F-crit (α = 5%) = 3.84; F-crit (α = 2.5%) = 5.03

At the 5% significance level, F-calc > F-crit (3.944 > 3.84). So, we reject the null, i.e. we can conclude that the number of computers per student and expenditure per student do not affect test scores equally (β4 ≠ β5).

At the 2.5% significance level, F-calc < F-crit (3.944 < 5.03). So, we do not reject the null, i.e. we cannot say that the number of computers per student and expenditure per student do not affect test scores equally.

The estimated models, the ‘best’ model and whether we reject or do not reject our hypotheses based on the ‘best’ model:

All the models we have used have very high F values, indicating that the regressors in each model are jointly significant.

Regression model (iv) has by far the lowest standard error (Root MSE) = 0.01284, although this value is in log units because the dependent variable is log_testscr.

All our models have R-squared and adjusted R-squared > 0.8, indicating that all models account for more than 80% of the variation in the dependent variable (test score).

Since regression model (i) had two variables (calw_pct and str) with insignificant coefficients, these two variables were dropped and a new regression model (i) was formed.
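As a check on the equal-effects test for equation (v), the F statistic can be recomputed from the two residual sums of squares in the STATA output. The sketch below is an illustration in Python, not part of the STATA do-file, and the function name restricted_f and its argument names are hypothetical. It uses the textbook restricted-versus-unrestricted formula, in which the denominator degrees of freedom are those of the unrestricted model (414 for the new regression model (i)); for comparison it also shows the value obtained with 415, the residual degrees of freedom of the restricted model.

```python
def restricted_f(sse_restricted, sse_unrestricted, num_restrictions, denominator_df):
    """F statistic comparing a restricted model with an unrestricted one."""
    numerator = (sse_restricted - sse_unrestricted) / num_restrictions
    denominator = sse_unrestricted / denominator_df
    return numerator / denominator


# Residual sums of squares from the STATA outputs in the appendix.
SSE_R = 29451.6675   # restricted model: equation (v)
SSE_UR = 29173.7582  # unrestricted model: new regression model (i)

f_414 = restricted_f(SSE_R, SSE_UR, 1, 414)  # residual df of the unrestricted model
f_415 = restricted_f(SSE_R, SSE_UR, 1, 415)  # residual df of the restricted model

print(f"F with 414 denominator df: {f_414:.4f}")
print(f"F with 415 denominator df: {f_415:.4f}")
```

Both values lie between the 5% critical value (3.84) and the 2.5% critical value (5.03) used in the test, so the decision at each significance level does not depend on which of the two denominators is used.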
Since both regression models (ii) and (iii) have insignificant coefficients for the variables that are relevant to our study (i.e. expn2 and avginc_meal), these two models are set aside.

Since regression model (v) is a restricted version of the new regression model (i), estimated only to test a specific hypothesis, we shall not count it as a candidate for a general model.

Hence, we are left with two candidates for the ‘best’ model: the new regression model (i) and regression model (iv). All the coefficients of the variables in these two models are statistically significant. Between these two models, the new regression model (i) has the higher R-squared (=0.8082) and adjusted R-squared (=0.8059); regression model (iv) has R-squared=0.8079 and adjusted R-squared=0.8056; the differences in these values are negligible. However, the values of the standard error in these two models differ greatly: the new regression model (i) has a standard error of 8.3945, whereas regression model (iv) has a standard error of 0.01284. Note that the two values are measured in different units (test-score points and log points respectively), so this comparison is only indicative. On this basis, our choice for the best model is regression model (iv), i.e.

Log_testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4comp_stu + β5log_exp + ɛ

Independent Variables    Value of Coefficient    P-value
avginc                   .0009058                0.000
el_pct                   -.0002937               0.000
meal_pct                 -.0006265               0.000
comp_stu                 .0210436                0.043
log_exp                  .0132001                0.032
_cons                    6.38569                 0.000

Standard error (Root MSE)    0.01284
R-squared                    0.8079
Adjusted R-squared           0.8056
F-statistic (Prob>F)         348.26 (0.0000)

All the signs of the coefficients in the best model have come out as we expected (i.e. β1, β4, β5>0; β2, β3<0). Hence, we do not reject any of our hypotheses relating to the best model (although we have rejected some of our hypotheses in other models). The coefficients are interpreted as follows:

(β1*100) is the estimated percentage change in test scores when district average income increases by 1 unit (i.e. $1000). The table shows that test scores are estimated to increase by .09% on average when district average income increases by $1000.
(β2*100) is the estimated percentage change in test scores when the percentage of English learners is 1 percentage point higher. The table shows that test scores are estimated to be .029% lower on average where the percentage of English learners is 1 percentage point higher.

(β3*100) is the estimated percentage change in test scores when the percentage qualifying for a reduced-price meal increases by 1 percentage point. The table shows that test scores are estimated to be .063% lower on average where the percentage qualifying for a reduced-price meal is 1 percentage point higher.

(β4*100) is the estimated percentage change in test scores when the number of computers per student increases by 1. The table shows that test scores are estimated to increase by 2.104% when the number of computers per student increases by 1.

β5 is the estimated percentage change in test scores when expenditure per student increases by 1%. The table shows that a 1% increase in expenditure per student is estimated to increase test scores by .0132001%.

IV. Summary:

Our research brings us to some interesting conclusions. We have found that the student-teacher ratio does not have a statistically significant effect on test scores, although we presumed it would have a negative effect. The percentage of students qualifying for CalWorks also has no statistically significant effect on test scores. Districts with higher average income do see better test scores, but increasing district average income will not increase test scores by much; this is indicated by the low coefficient. Districts with a higher percentage of students who are weak in English, i.e. those who are studying English as a second language, do score lower in tests on average; yet the coefficient is small, indicating that improving students’ abilities in English will contribute little to the increase in test scores. Districts which see a greater percentage of students qualifying for a reduced-price meal also see lower test scores. However, yet again the coefficient is small, indicating that increased family income will not increase test scores in a major way.
If the number of computers per student is increased, we see a substantial rise in test scores. Hence, our suggestion is that initiative be taken to ensure that there is one computer per student, as this will help improve test scores. Where expenditure per student is higher, we see higher test scores. However, increasing expenditure per student generally will increase test scores only by a small amount, as indicated by the small coefficient. We have evidence at the 5% significance level that expenditure per student and computers per student do not affect test scores equally. Hence, a more detailed study needs to be carried out to find out which specific facilities/sectors expenditure should be directed to in order to increase test scores substantially.

V. Appendix:

The data set contains the following variables:

DIST_CODE: DISTRICT CODE
READ_SCR: AVG READING SCORE
MATH_SCR: AVG MATH SCORE
COUNTY: COUNTY
DISTRICT: DISTRICT
GR_SPAN: GRADE SPAN OF DISTRICT
ENRL_TOT: TOTAL ENROLLMENT
TEACHERS: NUMBER OF TEACHERS
COMPUTER: NUMBER OF COMPUTERS
TESTSCR: AVG TEST SCORE (= (READ_SCR+MATH_SCR)/2)
COMP_STU: COMPUTERS PER STUDENT (= COMPUTER/ENRL_TOT)
EXPN_STU: EXPENDITURES PER STUDENT ($’S)
STR: STUDENT TEACHER RATIO (ENRL_TOT/TEACHERS)
EL_PCT: PERCENT OF ENGLISH LEARNERS
MEAL_PCT: PERCENT QUALIFYING FOR REDUCED-PRICE LUNCH
CALW_PCT: PERCENT QUALIFYING FOR CALWORKS
AVGINC: DISTRICT AVERAGE INCOME (IN $1000'S)

The STATA do-file:

sum testscr, detail
sum avginc, detail
sum el_pct, detail
sum calw_pct, detail
sum meal_pct, detail
sum comp_stu, detail
sum expn_stu, detail
sum str, detail
reg testscr avginc el_pct calw_pct meal_pct comp_stu expn_stu str
reg testscr avginc el_pct meal_pct comp_stu expn_stu
gen expn2=(expn_stu)^2
reg testscr avginc el_pct meal_pct comp_stu expn_stu expn2
gen avginc_meal=avginc*meal_pct
reg testscr avginc el_pct meal_pct comp_stu expn_stu avginc_meal
gen log_testscr=log(testscr)
gen log_exp=log(expn_stu)
reg log_testscr avginc el_pct meal_pct comp_stu log_exp
gen comp_expn=comp_stu+expn_stu
reg testscr avginc el_pct meal_pct comp_expn

All Outputs after executing the STATA do-file:

. do "C:\Users\Jamil H Chowdhury\Desktop\NSU\ECO 372 project\caschool.do"

. sum testscr, detail

                           testscr
-------------------------------------------------------------
      Percentiles      Smallest
 1%       612.65         605.55
 5%       623.15         606.75
10%      630.375            609       Obs                 420
25%          640          612.5       Sum of Wgt.         420

50%       654.45                      Mean           654.1565
                        Largest       Std. Dev.      19.05335
75%      666.675          699.1
90%        679.1          700.3       Variance       363.0301
95%        685.5          704.3       Skewness       .0916151
99%       698.45         706.75       Kurtosis       2.745712

. sum avginc, detail

                           avginc
-------------------------------------------------------------
      Percentiles      Smallest
 1%        6.613          5.335
 5%        7.632          5.699
10%     8.925666          6.216       Obs                 420
25%       10.639          6.577       Sum of Wgt.         420

50%      13.7278                      Mean           15.31659
                        Largest       Std. Dev.       7.22589
75%       17.638          43.23
90%      22.7997         49.939       Variance       52.21348
95%     30.73425         50.677       Skewness       2.215156
99%        43.23         55.328       Kurtosis       9.532125

. sum el_pct, detail

                           el_pct
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%            0              0       Obs                 420
25%     1.939866              0       Sum of Wgt.         420

50%     8.777634                      Mean           15.76816
                        Largest       Std. Dev.      18.28593
75%     23.00052       77.00581
90%     43.91753       80.12326       Variance       334.3751
95%     53.65335       80.42009       Skewness       1.426798
99%     76.66525       85.53972       Kurtosis       4.435401

. sum calw_pct, detail

                          calw_pct
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%       .73285              0
10%       1.9716              0       Obs                 420
25%      4.37715              0       Sum of Wgt.         420

50%     10.52045                      Mean           13.24604
                        Largest       Std. Dev.      11.45482
75%      19.0308        55.0323
90%      27.2148        58.7522       Variance       131.2129
95%     34.39185        71.7131       Skewness       1.683061
99%      52.2199        78.9942       Kurtosis       7.589592

. sum meal_pct, detail

                          meal_pct
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%      2.23835              0
10%        9.902              0       Obs                 420
25%      23.2634              0       Sum of Wgt.         420

50%      41.7507                      Mean           44.70524
                        Largest       Std. Dev.      27.12338
75%     66.87865            100
90%      83.1386            100       Variance       735.6778
95%      90.4543            100       Skewness       .1839536
99%          100            100       Kurtosis       2.000198

. sum comp_stu, detail

                          comp_stu
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%     .0544449              0
10%     .0663632              0       Obs                 420
25%     .0936371              0       Sum of Wgt.         420

50%     .1254644                      Mean           .1359266
                        Largest       Std. Dev.      .0649558
75%     .1645296       .3435898
90%     .2256257       .3497942       Variance       .0042193
95%     .2527498       .3589744       Skewness       .9223692
99%     .3276955       .4208333       Kurtosis       4.431126

. sum expn_stu, detail

                          expn_stu
-------------------------------------------------------------
      Percentiles      Smallest
 1%     4136.251        3926.07
 5%     4438.913       4016.416
10%      4615.08       4023.532       Obs                 420
25%      4906.13       4079.129       Sum of Wgt.         420

50%     5214.517                      Mean           5312.408
                        Largest       Std. Dev.      633.9371
75%     5603.195       7593.406
90%     6110.483       7614.379       Variance       401876.2
95%     6552.784       7667.572       Skewness       1.067897
99%     7542.038       7711.507       Kurtosis       4.875713

. sum str, detail

                             str
-------------------------------------------------------------
      Percentiles      Smallest
 1%     15.13898             14
 5%     16.41658       14.20176
10%     17.34573       14.54214       Obs                 420
25%     18.58179       14.70588       Sum of Wgt.         420

50%     19.72321                      Mean           19.64043
                        Largest       Std. Dev.      1.891812
75%     20.87183          24.95
90%     21.87561       25.05263       Variance       3.578952
95%     22.64514       25.78512       Skewness      -.0253655
99%     24.88889           25.8       Kurtosis       3.609597

. reg testscr avginc el_pct calw_pct meal_pct comp_stu expn_stu str

      Source |       SS       df       MS              Number of obs =     420
-------------+------------------------------           F(  7,   412) =  249.74
       Model |  123098.481     7  17585.4973           Prob > F      =  0.0000
    Residual |  29011.1128   412  70.4153223           R-squared     =  0.8093
-------------+------------------------------           Adj R-squared =  0.8060
       Total |  152109.594   419  363.030056           Root MSE      =  8.3914

------------------------------------------------------------------------------
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avginc |   .6216732   .0877192     7.09   0.000     .4492401    .7941062
      el_pct |  -.1981365    .033234    -5.96   0.000     -.263466   -.1328071
    calw_pct |  -.0778183   .0572156    -1.36   0.175    -.1902892    .0346526
    meal_pct |   -.375618   .0358925   -10.47   0.000    -.4461733   -.3050627
    comp_stu |   11.89028   6.898228     1.72   0.086     -1.66983     25.4504
    expn_stu |   .0015263   .0008917     1.71   0.088    -.0002264    .0032791
         str |    -.18991   .2835384    -0.67   0.503    -.7472724    .3674525
       _cons |   659.5871   9.023305    73.10   0.000     641.8496    677.3245
------------------------------------------------------------------------------

. reg testscr avginc el_pct meal_pct comp_stu expn_stu

      Source |       SS       df       MS              Number of obs =     420
-------------+------------------------------           F(  5,   414) =  348.91
       Model |  122935.835     5  24587.1671           Prob > F      =  0.0000
    Residual |  29173.7582   414  70.4680149           R-squared     =  0.8082
-------------+------------------------------           Adj R-squared =  0.8059
       Total |  152109.594   419  363.030056           Root MSE      =  8.3945

------------------------------------------------------------------------------
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avginc |   .6194044   .0877352     7.06   0.000     .4469423    .7918665
      el_pct |  -.1859543   .0313278    -5.94   0.000    -.2475356   -.1243729
    meal_pct |  -.4064677    .027952   -14.54   0.000    -.4614132   -.3515222
    comp_stu |   13.50629   6.800089     1.99   0.048     .1392873     26.8733
    expn_stu |   .0016874   .0007307     2.31   0.021      .000251    .0031238
       _cons |   654.9726    3.61974   180.94   0.000     647.8573     662.088
------------------------------------------------------------------------------

. gen expn2=(expn_stu)^2

. reg testscr avginc el_pct meal_pct comp_stu expn_stu expn2

      Source |       SS       df       MS              Number of obs =     420
-------------+------------------------------           F(  6,   413) =  290.64
       Model |  122982.974     6  20497.1623           Prob > F      =  0.0000
    Residual |  29126.6197   413   70.524503           R-squared     =  0.8085
-------------+------------------------------           Adj R-squared =  0.8057
       Total |  152109.594   419  363.030056           Root MSE      =  8.3979

------------------------------------------------------------------------------
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avginc |    .615671   .0878891     7.01   0.000     .4429052    .7884368
      el_pct |  -.1849029   .0313667    -5.89   0.000    -.2465613   -.1232446
    meal_pct |  -.4062285   .0279647   -14.53   0.000    -.4611994   -.3512576
    comp_stu |   12.77274    6.86173     1.86   0.063    -.7155316    26.26101
    expn_stu |  -.0041014   .0071182    -0.58   0.565    -.0180938    .0098911
       expn2 |   5.16e-07   6.31e-07     0.82   0.414    -7.24e-07    1.76e-06
       _cons |   671.0976     20.053    33.47   0.000     631.6789    710.5162
------------------------------------------------------------------------------

. gen avginc_meal= avginc*meal_pct

. reg testscr avginc el_pct meal_pct comp_stu expn_stu avginc_meal

      Source |       SS       df       MS              Number of obs =     420
-------------+------------------------------           F(  6,   413) =  291.29
       Model |  123035.301     6  20505.8836           Prob > F      =  0.0000
    Residual |  29074.2922   413   70.397802           R-squared     =  0.8089
-------------+------------------------------           Adj R-squared =  0.8061
       Total |  152109.594   419  363.030056           Root MSE      =  8.3903

------------------------------------------------------------------------------
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avginc |   .6514535    .091743     7.10   0.000     .4711121    .8317949
      el_pct |  -.1882845   .0313735    -6.00   0.000    -.2499562   -.1266129
    meal_pct |  -.3719248   .0403118    -9.23   0.000    -.4511667   -.2926828
    comp_stu |   12.46924   6.852468     1.82   0.070    -1.000829     25.9393
    expn_stu |   .0016199   .0007326     2.21   0.028     .0001799      .00306
 avginc_meal |  -.0033181   .0027915    -1.19   0.235    -.0088055    .0021692
       _cons |   655.3017    3.62851   180.60   0.000      648.169    662.4343
------------------------------------------------------------------------------

. gen log_testscr= log( testscr)

. gen log_exp=log( expn_stu)

. reg log_testscr avginc el_pct meal_pct comp_stu log_exp

      Source |       SS       df       MS              Number of obs =     420
-------------+------------------------------           F(  5,   414) =  348.26
       Model |  .286965303     5  .057393061           Prob > F      =  0.0000
    Residual |  .068227399   414    .0001648           R-squared     =  0.8079
-------------+------------------------------           Adj R-squared =  0.8056
       Total |  .355192702   419  .000847715           Root MSE      =  .01284

------------------------------------------------------------------------------
 log_testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avginc |   .0009058   .0001338     6.77   0.000     .0006428    .0011687
      el_pct |  -.0002937   .0000479    -6.13   0.000    -.0003878   -.0001995
    meal_pct |  -.0006265   .0000427   -14.66   0.000    -.0007105   -.0005425
    comp_stu |   .0210436    .010361     2.03   0.043     .0006768    .0414104
     log_exp |   .0132001   .0061378     2.15   0.032     .0011349    .0252653
       _cons |    6.38569   .0511922   124.74   0.000     6.285061    6.486319
------------------------------------------------------------------------------

. gen comp_expn= comp_stu+ expn_stu

. reg testscr avginc el_pct meal_pct comp_expn

      Source |       SS       df       MS              Number of obs =     420
-------------+------------------------------           F(  4,   415) =  432.09
       Model |  122657.926     4  30664.4815           Prob > F      =  0.0000
    Residual |  29451.6675   415  70.9678735           R-squared     =  0.8064
-------------+------------------------------           Adj R-squared =  0.8045
       Total |  152109.594   419  363.030056           Root MSE      =  8.4242

------------------------------------------------------------------------------
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avginc |   .6218424   .0880372     7.06   0.000     .4487879    .7948969
      el_pct |  -.1953478   .0310783    -6.29   0.000    -.2564383   -.1342574
    meal_pct |  -.4079631   .0280408   -14.55   0.000    -.4630827   -.3528435
   comp_expn |   .0020521   .0007098     2.89   0.004     .0006569    .0034473
       _cons |   655.0487   3.632352   180.34   0.000     647.9086    662.1888
------------------------------------------------------------------------------

. end of do-file
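The summary figures printed in the header of each regression output (R-squared, adjusted R-squared, Root MSE and the F statistic) can be recovered from the Model and Residual rows of the same output. The sketch below, written in Python purely as an illustration, does this for the last regression above (the restricted model with comp_expn). The function name fit_statistics and its argument names are hypothetical; the formulas are the standard OLS definitions.

```python
import math


def fit_statistics(model_ss, resid_ss, model_df, resid_df):
    """Recompute regression header statistics from the sums of squares."""
    total_ss = model_ss + resid_ss
    total_df = model_df + resid_df
    resid_ms = resid_ss / resid_df  # residual mean square
    return {
        "r_squared": model_ss / total_ss,
        "adj_r_squared": 1 - resid_ms / (total_ss / total_df),
        "root_mse": math.sqrt(resid_ms),
        "f_stat": (model_ss / model_df) / resid_ms,
    }


# Model and Residual rows of the regression of testscr on
# avginc, el_pct, meal_pct and comp_expn.
stats = fit_statistics(model_ss=122657.926, resid_ss=29451.6675,
                       model_df=4, resid_df=415)

for name, value in stats.items():
    print(f"{name:14s} {value:.4f}")
```

The recomputed values agree, up to rounding, with the header of that output (R-squared 0.8064, adjusted R-squared 0.8045, Root MSE 8.4242, F 432.09).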