Uploaded by nader jamil

Econometric Analysis - California School Test Scores

I. Introduction:
THE CALIFORNIA TEST SCORE DATA SET
The California Standardized Testing and Reporting (STAR) dataset contains data on test
performance, school characteristics and student demographic backgrounds. The data used here
are from all 420 K-6 and K-8 districts in California with data available for 1998 and 1999. Test
scores are the average of the reading and math scores on the Stanford 9 standardized test
administered to 5th grade students. School characteristics (averaged across the district) include
enrollment, number of teachers (measured as “full-time-equivalents”), number of computers
per classroom, and expenditures per student. The student-teacher ratio used here is the
number of students in the district divided by the number of full-time-equivalent teachers.
Demographic variables for the students also are averaged across the district. The demographic
variables include the percentage of students in the public assistance program CalWorks
(formerly AFDC), the percentage of students that qualify for a reduced price lunch, and the
percentage of students that are English Learners (that is, students for whom English is a second
language). All of these data were obtained from the California Department of Education
(www.cde.ca.gov). In this research we test how a set of selected variables from the data
set affects test scores. The statistical software used to carry out this research is
STATA.
We wish to test the following hypotheses:
i. Test the relationship of average test score with district average income, percent of English
learners, percent qualifying for CalWorks, percent qualifying for reduced-price lunch,
computers per student, expenditures per student, and student-teacher ratio.
ii. Test whether there is a concave relationship between average test score and expenditures
per student.
iii. Test whether there is a negative relationship between district average income and
percent qualifying for reduced-price lunch.
iv. Test whether a percentage increase in expenditure per student affects the percentage
change in average test score positively.
v. Test whether computers per student and expenditure per student affect average test score
equally.
II. Empirical Methodology:
These are the variables we have used in our research:
Dependent Variable:
TESTSCR: AVG TEST SCORE (= (READ_SCR+MATH_SCR)/2);
Independent Variables:
COMP_STU: COMPUTERS PER STUDENT (= COMPUTER/ENRL_TOT);
EXPN_STU: EXPENDITURES PER STUDENT ($'S);
STR: STUDENT TEACHER RATIO (= ENRL_TOT/TEACHERS);
EL_PCT: PERCENT OF ENGLISH LEARNERS (ENGLISH AS A SECOND LANGUAGE);
MEAL_PCT: PERCENT QUALIFYING FOR REDUCED-PRICE LUNCH;
CALW_PCT: PERCENT QUALIFYING FOR CALWORKS;
AVGINC: DISTRICT AVERAGE INCOME (IN $1000'S);
Other variables that have been constructed/generated to test certain hypotheses:
Dependent:
LOG_TESTSCR: LOG(TESTSCR)
Independent:
EXPN2: (EXPN_STU)^2
AVGINC_MEAL: (AVGINC*MEAL_PCT)
LOG_EXP: LOG(EXPN_STU)
COMP_EXPN: (COMP_STU + EXPN_STU)
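The constructed variables above can be illustrated with a short Python sketch. It is not part of the original STATA workflow: it simply applies the same definitions to a single illustrative row whose values are the sample means reported in this paper, and the function name add_constructed_variables is a hypothetical helper. LOG is taken to be the natural logarithm, as in STATA's log().

```python
import math

# Sample means reported in this paper's summary statistics, used here only
# as one illustrative row; a real run would apply this to every district.
district = {
    "testscr": 654.1565,
    "avginc": 15.31659,
    "meal_pct": 44.70524,
    "comp_stu": 0.1359266,
    "expn_stu": 5312.408,
}


def add_constructed_variables(row):
    """Return a copy of row with the constructed variables added."""
    out = dict(row)
    out["log_testscr"] = math.log(row["testscr"])         # LOG(TESTSCR)
    out["expn2"] = row["expn_stu"] ** 2                   # (EXPN_STU)^2
    out["avginc_meal"] = row["avginc"] * row["meal_pct"]  # AVGINC*MEAL_PCT
    out["log_exp"] = math.log(row["expn_stu"])            # LOG(EXPN_STU)
    out["comp_expn"] = row["comp_stu"] + row["expn_stu"]  # COMP_STU+EXPN_STU
    return out


derived = add_constructed_variables(district)
for name in ("log_testscr", "expn2", "avginc_meal", "log_exp", "comp_expn"):
    print(name, derived[name])
```

Note that COMP_EXPN adds a ratio (computers per student) to a dollar amount, so its value is dominated by EXPN_STU.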
Presentation and discussion of the summary statistics of key variables:
Variable    Mean        St. Deviation   Variance    Skewness    Kurtosis
TESTSCR     654.1565    19.05335        363.0301    .0916151    2.745712
The mean of test scores is 654.1565 points with a std. deviation of 19.05335. Since
Kurtosis < 3, it has a Platykurtic distribution, flatter than a normal distribution with a
wider peak. Although the distribution is positively skewed, the magnitude of skewness is
negligible and hence the distribution can be assumed to be symmetrical around the mean.
Variable    Mean        St. Deviation   Variance    Skewness    Kurtosis
AVGINC      15.31659    7.22589         52.21348    2.215156    9.532125
The mean of district average income is $15,316.59 with a std. deviation of 7.22589 (i.e.
about $7,226). Since Kurtosis > 3, it has a Leptokurtic distribution, sharper than a normal
distribution with values concentrated around the mean and thicker tails. Skewness > 0, so we
have a Right skewed distribution: most values are concentrated on the left of the mean, with
extreme values to the right. This means that there are a few exceptional districts
with average income much higher than the mean.
Variable    Mean        St. Deviation   Variance    Skewness    Kurtosis
EL_PCT      15.76816    18.28593        334.3751    1.426798    4.435401
The mean percentage of English learners is 15.77% with a std. deviation of
18.29. The std. deviation is very high, indicating that the values are widely scattered
around the mean, meaning that there are districts where the percentage of English
learners is much higher and districts where it is much lower than the mean. Since Kurtosis >
3, we have a Leptokurtic distribution, sharper than a normal distribution, with values
concentrated around the mean and thicker tails. Skewness > 0, so we have a Right
skewed distribution with most values concentrated on the left of the mean, with extreme
values to the right. This means that a few districts have a substantially higher
percentage of students studying English as a second language.
Variable    Mean        St. Deviation   Variance    Skewness    Kurtosis
MEAL_PCT    44.70524    27.12338        735.6778    .1839536    2.000198
The mean percentage of students qualifying for a reduced-price lunch is 44.7%,
which is quite substantial. The std. deviation is 27.12, which is quite high, indicating that
there are districts where the percentage of students qualifying for a reduced-price lunch
is much higher and districts where it is much lower. Since Kurtosis < 3, it is a Platykurtic
distribution, flatter than a normal distribution with a wider peak. Skewness > 0, so we
have a Right skewed distribution, but the skewness is low. Most values are
concentrated on the left of the mean, with very few extreme values (districts where
the percentage of students qualifying for a reduced-price lunch is much higher than
44.7%).
Variable    Mean        St. Deviation   Variance    Skewness    Kurtosis
COMP_STU    .1359266    .0649558        .0042193    .9223692    4.431126
The mean number of computers per student is .136 with a std. deviation of .065.
Since Kurtosis > 3, the distribution is Leptokurtic, sharper than a normal distribution,
with values concentrated around the mean and thicker tails. Skewness > 0 implies a
Right skewed distribution: most values are concentrated on the left of the mean, with
extreme values to the right. This means that there are a few districts which have a
much higher number of computers per student than the mean.
Variable    Mean        St. Deviation   Variance    Skewness    Kurtosis
EXPN_STU    5312.408    633.9371        401876.2    1.067897    4.875713
The mean of expenditure per student is $5312.41 with a std. deviation of 633.9371.
Since Kurtosis > 3, the distribution is Leptokurtic, sharper than a normal distribution,
with values concentrated around the mean and thicker tails. Skewness > 0, implying a
Right skewed distribution: most values are concentrated on the left of the mean, with
extreme values to the right. This means there are a few districts that have a very
high expenditure per student.
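The classification rule used in the discussion above (kurtosis compared with 3, sign and size of skewness) can be written down as a small Python sketch. The skewness and kurtosis values are those reported in this paper; the 0.2 cutoff for "negligible" skewness is an illustrative assumption, not a threshold stated by the authors.

```python
# (skewness, kurtosis) pairs as reported in the summary statistics.
reported = {
    "testscr": (0.0916151, 2.745712),
    "avginc": (2.215156, 9.532125),
    "el_pct": (1.426798, 4.435401),
    "meal_pct": (0.1839536, 2.000198),
    "comp_stu": (0.9223692, 4.431126),
    "expn_stu": (1.067897, 4.875713),
}


def kurtosis_class(kurtosis, normal=3.0):
    """Compare kurtosis with the normal-distribution benchmark of 3."""
    if kurtosis > normal:
        return "leptokurtic"
    if kurtosis < normal:
        return "platykurtic"
    return "mesokurtic"


def skew_class(skewness, negligible=0.2):
    """Label the skewness; the 'negligible' cutoff is an assumed value."""
    if abs(skewness) < negligible:
        return "approximately symmetric"
    return "right skewed" if skewness > 0 else "left skewed"


labels = {
    name: (skew_class(skew), kurtosis_class(kurt))
    for name, (skew, kurt) in reported.items()
}
for name, (skew_label, kurt_label) in labels.items():
    print(f"{name}: {skew_label}, {kurt_label}")
```

With this cutoff both testscr and meal_pct are labelled approximately symmetric, which matches the text's remarks that their skewness is negligible or low.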
The Regression Equation(s):
i. Testscr = β0 + β1avginc + β2el_pct + β3calw_pct + β4meal_pct + β5comp_stu + β6 expn_stu + β7 str + ɛ
ii. Testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4comp_stu + β5expn_stu + β6 expn2 + ɛ
iii. Testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4comp_stu + β5expn_stu + β6(avginc_meal) + ɛ
iv. Log_testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4comp_stu + β5log_exp+ ɛ
v. Testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4(comp_expn) + ɛ
Expected signs of coefficients:
For Regression Equation (i)
Testscr = β0 + β1avginc + β2el_pct + β3calw_pct + β4meal_pct + β5comp_stu + β6 expn_stu + β7 str + ɛ
It is expected that,

β1>0 (since students from districts with higher average income are expected to get
better facilities/tools for education )

β2 <0 (since students who have English as a second language are expected to be weaker
in English and this should be reflected in them getting lower scores in read score and
eventually leading to lower test scores; hence if percentage of English learners is higher,
test scores should be lower on average)

β3<0 (since if more students are in CalWorks, this indicates that they are worse-off and hence
probably have less access to tools that can help in education, resulting in lower test
scores on average)

β4 <0 (since those who qualify for a reduced-price lunch are expected to be worse-off
and hence probably have less access to facilities/tools that can help in education)

β5>0 (since more computers per student ensures everyone can utilize their computer
better and for a longer time)

β6 >0 (since increase in expenditure per student means better facilities and better
quality of education and hence better results)

β7<0 (a higher student-teacher ratio means that students get less individual attention from
the teacher)
For Regression Equation (ii)
Testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4comp_stu + β5expn_stu + β6 expn2 + ɛ
It is expected that,

β6<0 (since facilities that are received too easily may be valued less by the students
and hence not be put to proper use)
For Regression Equation (iii)
Testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4comp_stu + β5expn_stu + β6(avginc_meal) + ɛ
It is expected that,

β6<0 (since it is expected that districts with higher average income would have lower
percentage of students that qualify for reduced price lunch)
For Regression Equation (iv)
Log_testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4comp_stu + β5log_exp + ɛ
It is expected that,

β5>0 (since a percentage increase in expenditure per student is likely to cause an
increase in test scores)
For Regression Equation (v)
Testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4(comp_expn) + ɛ
It is expected that,

β4>0 (since the coefficients of comp_stu and expn_stu are both expected to be positive
individually; however, whether they affect test scores equally is ambiguous).
III. Estimated Results:
Report of estimated results and explanation of estimated models:
For Regression Equation (i)
Testscr = β0 + β1avginc + β2el_pct + β3calw_pct + β4meal_pct + β5comp_stu + β6 expn_stu + β7 str + ɛ
Independent Variables    Value of Coefficient    P-value
Avginc                   .6216732                0.000
El_pct                   -.1981365               0.000
calw_pct                 -.0778183               0.175
meal_pct                 -.375618                0.000
comp_stu                 11.89028                0.086
expn_stu                 .0015263                0.088
str                      -.18991                 0.503
_cons                    659.5871                0.000
Standard error (Root MSE): 8.3914
R-squared: 0.8093
Adjusted R-squared: 0.8060
F-statistic (Prob>F): 249.74 (0.0000)

From the table above we can see that str has a negative coefficient (β7 <0, as
hypothesized) but the p-value is very high at 0.503, indicating that the
coefficient is statistically insignificant.

Also, the coefficient of calw_pct is negative (β3<0, as hypothesized) but here also the
p-value is high (0.175), again indicating that the coefficient is insignificant.

So, the variables str and calw_pct should be dropped from our model. We do not find
statistically significant support for our hypotheses about the coefficients of these two
variables.
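The insignificance of these two coefficients can be checked by hand from the coefficients and standard errors in the STATA output in the appendix. The Python sketch below recomputes the t-statistics; the 1.96 cutoff is the usual large-sample two-sided 5% critical value and is used here as an approximation.

```python
# (coefficient, standard error) pairs taken from the STATA output for
# regression equation (i) in the appendix.
estimates = {
    "calw_pct": (-0.0778183, 0.0572156),
    "str": (-0.18991, 0.2835384),
}

CRITICAL_T = 1.96  # large-sample two-sided 5% critical value (approximation)


def t_statistic(coefficient, standard_error):
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return coefficient / standard_error


t_values = {name: t_statistic(b, se) for name, (b, se) in estimates.items()}
significant = {name: abs(t) > CRITICAL_T for name, t in t_values.items()}

for name, t in t_values.items():
    verdict = "significant" if significant[name] else "not significant"
    print(f"{name}: t = {t:.2f} ({verdict} at the 5% level)")
```

Both t-statistics are well inside the interval from -1.96 to 1.96, consistent with the reported p-values of 0.175 and 0.503.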
We therefore get a new equation for our model:
New Regression equation (i)
Testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4comp_stu + β5expn_stu + ɛ
Note: The numbering of the coefficients has changed from the original equation because we
have dropped two variables from our original model. E.g.
β3 was the coefficient of calw_pct in the original model but in the new model β3 is the
coefficient of meal_pct. Hence, this difference in numbering must be kept in mind when
comparing the results of the hypothesis tests.
The estimated results for the new model are as follows:
Independent Variables    Value of Coefficient    P-value
Avginc                   .6194044                0.000
El_pct                   -.1859543               0.000
meal_pct                 -.4064677               0.000
comp_stu                 13.50629                0.048
expn_stu                 .0016874                0.021
_cons                    654.9726                0.000
Standard error (Root MSE): 8.3945
R-squared: 0.8082
Adjusted R-squared: 0.8059
F-statistic (Prob>F): 348.91 (0.0000)

From the table above we can see that all the coefficients have p-values less than 0.05.
Hence, all the coefficients are significant at the 5% level.

The coefficient of avginc is .6194044 i.e. β1>0 as we hypothesized. So, we do not reject
our hypothesis. This means that if district average income increases by 1 unit, i.e. $1000,
average test score is estimated to increase by .6194044 points on average.

The coefficient of el_pct is negative i.e. β2<0 as we hypothesized. So, we do not reject
our hypothesis. This means that where the percentage of English learners is 1 percentage
point higher, test scores are estimated to be .1859543 points lower on average.

The coefficient of meal_pct is negative i.e. β3<0 as we hypothesized. So, we do not
reject our hypothesis. This means that where the percentage of students qualifying for
reduced-price lunch is 1 percentage point higher, we expect test scores to be .4064677
points lower on average.

The coefficient of comp_stu is positive i.e. β4>0 as we hypothesized. So, we do not reject
our hypothesis. This means that when there is an increase of 1 computer per student,
test scores are estimated to go up by 13.50629 points on average.

The coefficient of expn_stu is positive i.e. β5>0 as we hypothesized. So, we do not reject
our hypothesis. This means that when there is an increase in expenditure per student of
$1, test scores are estimated to go up by .0016874 points on average.
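Because the regressors are measured in very different units, it can help to restate these coefficients as the estimated effect of a one-standard-deviation increase in each variable. The Python sketch below does this using only the coefficients of the new regression model (i) and the standard deviations reported in this paper (for el_pct, meal_pct, comp_stu and expn_stu the latter come from the STATA output in the appendix). The one-standard-deviation scaling is an illustrative choice, not a calculation made by the authors.

```python
# Coefficients of the new regression model (i), as reported above.
coefficients = {
    "avginc": 0.6194044,
    "el_pct": -0.1859543,
    "meal_pct": -0.4064677,
    "comp_stu": 13.50629,
    "expn_stu": 0.0016874,
}

# Sample standard deviations from the summary statistics.
std_devs = {
    "avginc": 7.22589,
    "el_pct": 18.28593,
    "meal_pct": 27.12338,
    "comp_stu": 0.0649558,
    "expn_stu": 633.9371,
}


def effect_of_change(variable, change):
    """Estimated change in testscr (points) for a change in one regressor."""
    return coefficients[variable] * change


one_sd_effects = {
    name: effect_of_change(name, sd) for name, sd in std_devs.items()
}
for name, effect in one_sd_effects.items():
    print(f"{name}: {effect:+.2f} points per one-standard-deviation increase")
```

On this scale the large comp_stu coefficient corresponds to less than one test-score point, because an increase of one full computer per student is far outside the range observed in the sample (the maximum is about 0.42).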
For Regression Equation (ii)
Testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4comp_stu + β5expn_stu + β6 expn2 + ɛ
Independent Variables    Value of Coefficient    P-value
Avginc                   .615671                 0.000
El_pct                   -.1849029               0.000
meal_pct                 -.4062285               0.000
comp_stu                 12.77274                0.063
expn_stu                 -.0041014               0.565
expn2                    5.16e-07                0.414
_cons                    671.0976                0.000
Standard error (Root MSE): 8.3979
R-squared: 0.8085
Adjusted R-squared: 0.8057
F-statistic (Prob>F): 290.64 (0.0000)
From the table we can see that the coefficient of expn2 is neither negative nor
significant. Hence, we should reject our hypothesis that the relationship between test
score and expenditure per student is concave.
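The shape implied by the point estimates can be checked directly: with a quadratic in expn_stu, the sign of the second derivative (2·β6) decides between concave and convex. The Python sketch below uses the two reported coefficients; since neither is statistically significant, the resulting numbers are only illustrative.

```python
# Point estimates from regression equation (ii), as reported above.
B_EXPN = -0.0041014   # coefficient of expn_stu
B_EXPN2 = 5.16e-07    # coefficient of expn2


def marginal_effect(expn_stu):
    """d(testscr)/d(expn_stu) implied by the fitted quadratic."""
    return B_EXPN + 2 * B_EXPN2 * expn_stu


second_derivative = 2 * B_EXPN2
if second_derivative < 0:
    shape = "concave"
elif second_derivative > 0:
    shape = "convex"
else:
    shape = "linear"

# Expenditure level at which the fitted marginal effect is zero.
turning_point = -B_EXPN / (2 * B_EXPN2)

print(f"shape: {shape}")
print(f"turning point: {turning_point:.2f} dollars per student")
print(f"marginal effect at the sample mean: {marginal_effect(5312.408):.6f}")
```

The fitted curve is convex with a turning point of roughly $3,974, which lies near the lower end of the observed expenditure range (the sample minimum is $3,926.07), so the fitted marginal effect is positive over almost all of the sample.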
For Regression Equation (iii)
Testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4comp_stu + β5expn_stu + β6(avginc_meal) + ɛ
Independent Variables    Value of Coefficient    P-value
Avginc                   .6514535                0.000
El_pct                   -.1882845               0.000
meal_pct                 -.3719248               0.000
comp_stu                 12.46924                0.070
expn_stu                 .0016199                0.028
Avginc_meal              -.0033181               0.235
_cons                    655.3017                0.000
Standard error (Root MSE): 8.3903
R-squared: 0.8089
Adjusted R-squared: 0.8061
F-statistic (Prob>F): 291.29 (0.0000)
Our hypothesis that β6<0 is not supported: although the sign of the coefficient
is negative, the high p-value (0.235) indicates that the coefficient is insignificant.
Hence, we cannot conclude that districts with higher average income would have a lower
percentage of students that qualify for reduced-price lunch.
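One way to read the interaction coefficient is through the marginal effect of avginc on testscr, which in this specification is β1 + β6·meal_pct. The Python sketch below evaluates it at a few values of meal_pct using the reported point estimates; because β6 is insignificant, the variation across meal_pct values should not be given much weight.

```python
# Point estimates from regression equation (iii), as reported above.
B_AVGINC = 0.6514535        # coefficient of avginc
B_AVGINC_MEAL = -0.0033181  # coefficient of avginc_meal (avginc*meal_pct)


def marginal_effect_of_income(meal_pct):
    """d(testscr)/d(avginc) = beta1 + beta6 * meal_pct."""
    return B_AVGINC + B_AVGINC_MEAL * meal_pct


# 0%, the sample mean of meal_pct, and 100%.
for meal_pct in (0.0, 44.70524, 100.0):
    effect = marginal_effect_of_income(meal_pct)
    print(f"meal_pct = {meal_pct:6.2f}: {effect:.4f} points per $1000")
```

At the sample mean of meal_pct the implied effect of an extra $1000 of district income is about 0.50 test-score points.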
For Regression Equation (iv)
Log_testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4comp_stu + β5log_exp + ɛ
Independent Variables    Value of Coefficient    P-value
Avginc                   .0009058                0.000
El_pct                   -.0002937               0.000
meal_pct                 -.0006265               0.000
comp_stu                 .0210436                0.043
Log_exp                  .0132001                0.032
_cons                    6.38569                 0.000
Standard error (Root MSE): 0.01284
R-squared: 0.8079
Adjusted R-squared: 0.8056
F-statistic (Prob>F): 348.26 (0.0000)
The p-value of the coefficient of log(expn_stu) is less than 5%, indicating the coefficient
is significant. The positive value of the coefficient matches our hypothesis (i.e. β5>0).
This means that a 1% increase in expenditure per student is estimated to increase test
scores by .0132001% (since both test score and expenditure per student enter the model in
logs, β5 is an elasticity).
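To give a sense of the magnitude of this elasticity, the Python sketch below applies it to a 10% increase in expenditure per student and converts the result into test-score points at the sample mean. The 10% figure is an arbitrary illustrative choice; the coefficient and the mean test score are the values reported in this paper.

```python
ELASTICITY = 0.0132001   # coefficient of log_exp in regression equation (iv)
MEAN_TESTSCR = 654.1565  # sample mean of testscr


def approx_pct_change(pct_increase_in_spending):
    """First-order approximation: %change in testscr = elasticity * %change."""
    return ELASTICITY * pct_increase_in_spending


def exact_pct_change(pct_increase_in_spending):
    """Exact change implied by the log-log form, holding other regressors fixed."""
    ratio = 1 + pct_increase_in_spending / 100
    return (ratio ** ELASTICITY - 1) * 100


approx = approx_pct_change(10)
exact = exact_pct_change(10)
points_at_mean = MEAN_TESTSCR * exact / 100

print(f"approximate change: {approx:.4f}%")
print(f"exact change:       {exact:.4f}%")
print(f"change at the mean test score: {points_at_mean:.3f} points")
```

Under these assumptions a 10% rise in spending corresponds to less than one test-score point at the mean.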
For Regression Equation (v)
Testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4(comp_expn) + ɛ
Independent Variables    Value of Coefficient    P-value
Avginc                   .6218424                0.000
El_pct                   -.1953478               0.000
meal_pct                 -.4079631               0.000
comp_expn                .0020521                0.004
_cons                    655.0487                0.000
Standard error (Root MSE): 8.4242
R-squared: 0.8064
Adjusted R-squared: 0.8045
F-statistic (Prob>F): 432.09 (0.0000)

The coefficient of comp_expn is significant and positive as expected.

However, to test whether comp_stu and expn_stu affect testscr equally, we must find
the F-calc for this restricted model against the unrestricted new regression model (i).

SSEr = 29451.6675; SSEur = 29173.7582; q = 1 (H0: β4 = β5 in the new regression model (i));
n-k-1 = 414, the residual degrees of freedom of the unrestricted model [refer to STATA
outputs in the appendix for these values].

Hence F-calc = [(SSEr - SSEur)/q] / [SSEur/(n-k-1)] = 3.9438

F-crit (α= 5%) = 3.84; F-crit (α= 2.5%)= 5.03 (approximate large-sample values)

At the 5% significance level, F-calc>F-crit (3.944>3.84). So, we reject the null,
although only marginally, i.e. we can conclude that number of computers per student and
expenditure per student do not affect test scores equally (β4 ≠ β5).

At the 2.5% significance level, F-calc<F-crit (3.944<5.03). So, we do not reject the null,
i.e. at this level we cannot say that number of computers per student and expenditure per
student do not affect test scores equally.
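The F-statistic for this restriction can be reproduced from the two residual sums of squares in the STATA output. The Python sketch below uses the standard formula F = [(SSEr - SSEur)/q] / [SSEur/df], where df is the residual degrees of freedom of the unrestricted model (414 in the appendix output). For comparison it also shows the value obtained with the restricted model's 415 degrees of freedom. The function name f_statistic is a hypothetical helper.

```python
def f_statistic(sse_restricted, sse_unrestricted, q, df_unrestricted):
    """F-statistic for q linear restrictions on a linear regression."""
    numerator = (sse_restricted - sse_unrestricted) / q
    denominator = sse_unrestricted / df_unrestricted
    return numerator / denominator


SSE_R = 29451.6675   # residual SS of regression (v), the restricted model
SSE_UR = 29173.7582  # residual SS of the new regression (i), unrestricted
Q = 1                # one restriction: beta4 = beta5

f_414 = f_statistic(SSE_R, SSE_UR, Q, 414)  # unrestricted residual df
f_415 = f_statistic(SSE_R, SSE_UR, Q, 415)  # restricted residual df

print(f"F with 414 degrees of freedom: {f_414:.4f}")
print(f"F with 415 degrees of freedom: {f_415:.4f}")
```

Either way the statistic falls between the 5% and 2.5% critical values quoted above, so the conclusion of the test does not depend on this choice.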
The estimated models, the ‘best’ model and whether we reject or do not reject
our hypothesis based on the ‘best’ model:

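All the models we have used have very high F values, indicating that the regressors in
each model are jointly significant.

Regression model (iv) has the lowest reported standard error (Root MSE) = 0.01284, although
this figure is in log units because the dependent variable of model (iv) is log_testscr.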

All our models have R-squared and adjusted R-squared >0.8, indicating that all models
are able to account for more than 80% of the variation in the dependent variable (test
score).

Since the regression model (i) had two variables (calw_pct and str) with insignificant
coefficients, these two variables were dropped and a new regression model (i) was
formed.

Since both regression models (ii) and (iii) have insignificant coefficients for the variables
that are relevant to our study (i.e. expn2 and avginc_meal), these two models are now
irrelevant.

Since the regression model (v) was a restricted model for the new regression model (i)
to test a certain hypothesis, we shall not count it as a candidate for a general model.

Hence, we are left with two general models as candidates for the ‘best’ model: the new
regression model (i) and regression model (iv).

All the coefficients of the variables of these two models are statistically significant.

Between these two models, new regression model (i) has the higher R-squared
(=0.8082) and Adjusted R-squared (=0.8059); regression model (iv) has R-squared=0.8079
and adjusted R-squared=0.8056; the difference in these values is
negligible. However, the difference in the values for standard error in these two models
is large. The new regression model (i) has a standard error = 8.3945 whereas the
regression model (iv) has a standard error = 0.01284 (the latter is measured in log units,
so the two figures are not on the same scale).

Hence, our choice for the best model is the regression model (iv) i.e.
Log_testscr = β0 + β1avginc + β2el_pct + β3meal_pct + β4comp_stu + β5log_exp + ɛ
Independent Variables    Value of Coefficient    P-value
Avginc                   .0009058                0.000
El_pct                   -.0002937               0.000
meal_pct                 -.0006265               0.000
comp_stu                 .0210436                0.043
Log_exp                  .0132001                0.032
_cons                    6.38569                 0.000
Standard error (Root MSE): 0.01284
R-squared: 0.8079
Adjusted R-squared: 0.8056
F-statistic (Prob>F): 348.26 (0.0000)
Where,
• All our hypotheses relating to the best model have come out as we expected (i.e.
β1, β4, β5>0; β2, β3<0). Hence, we do not reject any of our hypotheses relating to the best
model (although we have rejected some of our hypotheses in other models).
• (β1*100) is the estimated percentage change in test scores when district average income
increases by 1 unit (i.e. $1000). The table shows that test scores are estimated to increase
by .09% on average when district average income increases by $1000.
• (β2*100) is the estimated percentage change in test scores when the percentage of English
learners is 1 percentage point higher. The table shows that test scores are estimated to be
.029% lower on average where the percentage of English learners is 1 percentage point higher.
• (β3*100) is the estimated percentage change in test scores when the percentage qualifying
for reduced-price lunch is 1 percentage point higher. The table shows that test scores are
estimated to be .063% lower on average where the percentage qualifying for reduced-price
lunch is 1 percentage point higher.
• (β4*100) is the estimated percentage change in test scores when the number of
computers per student increases by 1. The table shows that test scores are estimated
to increase by 2.104% when the number of computers per student increases by 1.
• β5 is the estimated percentage change in test scores when expenditure per student is
increased by 1%. The table shows that a 1% increase in expenditure per student is
estimated to increase test scores by .0132001%.
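The percentage interpretations of the non-logged regressors follow from multiplying each coefficient by 100, which is a first-order approximation when the dependent variable is in logs. The Python sketch below reproduces those figures from the reported coefficients and, as an additional check not made in the text, compares them with the exact value 100·(exp(β) - 1).

```python
import math

# Coefficients of the non-logged regressors in regression equation (iv).
coefficients = {
    "avginc": 0.0009058,
    "el_pct": -0.0002937,
    "meal_pct": -0.0006265,
    "comp_stu": 0.0210436,
}


def approx_pct_effect(beta):
    """Approximate % change in testscr per unit change in the regressor."""
    return 100 * beta


def exact_pct_effect(beta):
    """Exact % change implied by a log dependent variable."""
    return 100 * (math.exp(beta) - 1)


approx = {name: approx_pct_effect(b) for name, b in coefficients.items()}
exact = {name: exact_pct_effect(b) for name, b in coefficients.items()}

for name in coefficients:
    print(f"{name}: approx {approx[name]:+.4f}%, exact {exact[name]:+.4f}%")
```

For coefficients this small the two versions agree to within a few hundredths of a percentage point, so the 100·β reading used in the text is adequate.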
IV. Summary:
Our research brings us to some interesting conclusions. We have found that the student-teacher
ratio does not have a statistically significant effect on test scores, although we presumed it
would have a negative effect on test scores.
The percentage of students qualifying for CalWorks also does not significantly affect test scores.
Districts with higher average income do see better test scores but increasing district average
income will not increase test scores by much; this is indicated by the low coefficient.
Districts with more students who are studying English as a second language
do score lower in tests on average; yet the coefficient is small, indicating that improving
students’ abilities in English will contribute little to the increase in test scores.
Districts which see a greater number of students qualifying for a reduced-price meal also see
lower test scores. However, yet again the coefficient is small, indicating that increased family
income will not increase test scores in a major way.
If the number of computers per student is increased, we see there is a substantial rise in test
scores. Hence, our suggestion is that initiative be taken to ensure that there is one computer
per student as this will help improve test scores.
Where expenditure per student is higher we observe higher test scores. However,
increasing expenditure per student generally will increase test scores by a small amount, as
indicated by the small coefficient. We have evidence at the 5% significance level (though not at
the 2.5% level) that expenditure per student and computers per student do not affect test scores
equally. Hence, a more detailed study needs to be carried out to find out expenditure on which
specific facilities/sectors would increase test scores substantially.
V. Appendix:
The data set contains the following variables:
DIST_CODE: DISTRICT CODE;
READ_SCR: AVG READING SCORE;
MATH_SCR: AVG MATH SCORE;
COUNTY: COUNTY;
DISTRICT: DISTRICT;
GR_SPAN: GRADE SPAN OF DISTRICT;
ENRL_TOT: TOTAL ENROLLMENT;
TEACHERS: NUMBER OF TEACHERS;
COMPUTER: NUMBER OF COMPUTERS;
TESTSCR: AVG TEST SCORE (= (READ_SCR+MATH_SCR)/2);
COMP_STU: COMPUTERS PER STUDENT (= COMPUTER/ENRL_TOT);
EXPN_STU: EXPENDITURES PER STUDENT ($'S);
STR: STUDENT TEACHER RATIO (= ENRL_TOT/TEACHERS);
EL_PCT: PERCENT OF ENGLISH LEARNERS;
MEAL_PCT: PERCENT QUALIFYING FOR REDUCED-PRICE LUNCH;
CALW_PCT: PERCENT QUALIFYING FOR CALWORKS;
AVGINC: DISTRICT AVERAGE INCOME (IN $1000'S);
The STATA do-file:
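* Summary statistics (including skewness and kurtosis) for each variable
sum testscr, detail
sum avginc, detail
sum el_pct, detail
sum calw_pct, detail
sum meal_pct, detail
sum comp_stu, detail
sum expn_stu, detail
sum str, detail
* Regression (i): full model
reg testscr avginc el_pct calw_pct meal_pct comp_stu expn_stu str
* New regression (i): calw_pct and str dropped
reg testscr avginc el_pct meal_pct comp_stu expn_stu
* Regression (ii): quadratic term in expenditure per student
gen expn2 = (expn_stu)^2
reg testscr avginc el_pct meal_pct comp_stu expn_stu expn2
* Regression (iii): interaction of average income and reduced-price lunch share
gen avginc_meal = avginc*meal_pct
reg testscr avginc el_pct meal_pct comp_stu expn_stu avginc_meal
* Regression (iv): log test score on log expenditure per student
gen log_testscr = log(testscr)
gen log_exp = log(expn_stu)
reg log_testscr avginc el_pct meal_pct comp_stu log_exp
* Regression (v): restricted model imposing equal coefficients on comp_stu and expn_stu
gen comp_expn = comp_stu + expn_stu
reg testscr avginc el_pct meal_pct comp_expn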
All Outputs after executing the STATA do-file:
do "C:\Users\Jamil H Chowdhury\Desktop\NSU\ECO 372 project\caschool.do"
. sum testscr, detail

                           testscr
-------------------------------------------------------------
      Percentiles      Smallest
 1%      612.65         605.55
 5%      623.15         606.75
10%     630.375            609       Obs                  420
25%         640          612.5       Sum of Wgt.          420

50%      654.45                      Mean            654.1565
                       Largest       Std. Dev.       19.05335
75%     666.675          699.1
90%       679.1          700.3       Variance        363.0301
95%       685.5          704.3       Skewness        .0916151
99%      698.45         706.75       Kurtosis        2.745712

. sum avginc,detail

                           avginc
-------------------------------------------------------------
      Percentiles      Smallest
 1%       6.613          5.335
 5%       7.632          5.699
10%    8.925666          6.216       Obs                  420
25%      10.639          6.577       Sum of Wgt.          420

50%     13.7278                      Mean            15.31659
                       Largest       Std. Dev.        7.22589
75%      17.638          43.23
90%     22.7997         49.939       Variance        52.21348
95%    30.73425         50.677       Skewness        2.215156
99%       43.23         55.328       Kurtosis        9.532125

. sum el_pct,detail

                           el_pct
-------------------------------------------------------------
      Percentiles      Smallest
 1%           0              0
 5%           0              0
10%           0              0       Obs                  420
25%    1.939866              0       Sum of Wgt.          420

50%    8.777634                      Mean            15.76816
                       Largest       Std. Dev.       18.28593
75%    23.00052       77.00581
90%    43.91753       80.12326       Variance        334.3751
95%    53.65335       80.42009       Skewness        1.426798
99%    76.66525       85.53972       Kurtosis        4.435401

. sum calw_pct,detail

                          calw_pct
-------------------------------------------------------------
      Percentiles      Smallest
 1%           0              0
 5%      .73285              0
10%      1.9716              0       Obs                  420
25%     4.37715              0       Sum of Wgt.          420

50%    10.52045                      Mean            13.24604
                       Largest       Std. Dev.       11.45482
75%     19.0308        55.0323
90%     27.2148        58.7522       Variance        131.2129
95%    34.39185        71.7131       Skewness        1.683061
99%     52.2199        78.9942       Kurtosis        7.589592

. sum meal_pct,detail

                          meal_pct
-------------------------------------------------------------
      Percentiles      Smallest
 1%           0              0
 5%     2.23835              0
10%       9.902              0       Obs                  420
25%     23.2634              0       Sum of Wgt.          420

50%     41.7507                      Mean            44.70524
                       Largest       Std. Dev.       27.12338
75%    66.87865            100
90%     83.1386            100       Variance        735.6778
95%     90.4543            100       Skewness        .1839536
99%         100            100       Kurtosis        2.000198

. sum comp_stu,detail

                          comp_stu
-------------------------------------------------------------
      Percentiles      Smallest
 1%           0              0
 5%    .0544449              0
10%    .0663632              0       Obs                  420
25%    .0936371              0       Sum of Wgt.          420

50%    .1254644                      Mean            .1359266
                       Largest       Std. Dev.       .0649558
75%    .1645296       .3435898
90%    .2256257       .3497942       Variance        .0042193
95%    .2527498       .3589744       Skewness        .9223692
99%    .3276955       .4208333       Kurtosis        4.431126

. sum expn_stu,detail

                          expn_stu
-------------------------------------------------------------
      Percentiles      Smallest
 1%    4136.251        3926.07
 5%    4438.913       4016.416
10%     4615.08       4023.532       Obs                  420
25%     4906.13       4079.129       Sum of Wgt.          420

50%    5214.517                      Mean            5312.408
                       Largest       Std. Dev.       633.9371
75%    5603.195       7593.406
90%    6110.483       7614.379       Variance        401876.2
95%    6552.784       7667.572       Skewness        1.067897
99%    7542.038       7711.507       Kurtosis        4.875713

. sum str, detail

                             str
-------------------------------------------------------------
      Percentiles      Smallest
 1%    15.13898             14
 5%    16.41658       14.20176
10%    17.34573       14.54214       Obs                  420
25%    18.58179       14.70588       Sum of Wgt.          420

50%    19.72321                      Mean            19.64043
                       Largest       Std. Dev.       1.891812
75%    20.87183          24.95
90%    21.87561       25.05263       Variance        3.578952
95%    22.64514       25.78512       Skewness       -.0253655
99%    24.88889           25.8       Kurtosis        3.609597

. reg testscr avginc el_pct calw_pct meal_pct comp_stu expn_stu str

      Source |       SS       df       MS              Number of obs =     420
-------------+------------------------------           F(  7,   412) =  249.74
       Model |  123098.481     7  17585.4973           Prob > F      =  0.0000
    Residual |  29011.1128   412  70.4153223           R-squared     =  0.8093
-------------+------------------------------           Adj R-squared =  0.8060
       Total |  152109.594   419  363.030056           Root MSE      =  8.3914

------------------------------------------------------------------------------
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avginc |   .6216732   .0877192     7.09   0.000     .4492401    .7941062
      el_pct |  -.1981365    .033234    -5.96   0.000     -.263466   -.1328071
    calw_pct |  -.0778183   .0572156    -1.36   0.175    -.1902892    .0346526
    meal_pct |   -.375618   .0358925   -10.47   0.000    -.4461733   -.3050627
    comp_stu |   11.89028   6.898228     1.72   0.086     -1.66983     25.4504
    expn_stu |   .0015263   .0008917     1.71   0.088    -.0002264    .0032791
         str |    -.18991   .2835384    -0.67   0.503    -.7472724    .3674525
       _cons |   659.5871   9.023305    73.10   0.000     641.8496    677.3245
------------------------------------------------------------------------------

. reg testscr avginc el_pct meal_pct comp_stu expn_stu

      Source |       SS       df       MS              Number of obs =     420
-------------+------------------------------           F(  5,   414) =  348.91
       Model |  122935.835     5  24587.1671           Prob > F      =  0.0000
    Residual |  29173.7582   414  70.4680149           R-squared     =  0.8082
-------------+------------------------------           Adj R-squared =  0.8059
       Total |  152109.594   419  363.030056           Root MSE      =  8.3945

------------------------------------------------------------------------------
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avginc |   .6194044   .0877352     7.06   0.000     .4469423    .7918665
      el_pct |  -.1859543   .0313278    -5.94   0.000    -.2475356   -.1243729
    meal_pct |  -.4064677    .027952   -14.54   0.000    -.4614132   -.3515222
    comp_stu |   13.50629   6.800089     1.99   0.048     .1392873     26.8733
    expn_stu |   .0016874   .0007307     2.31   0.021      .000251    .0031238
       _cons |   654.9726    3.61974   180.94   0.000     647.8573     662.088
------------------------------------------------------------------------------

. gen expn2=(expn_stu)^2

. reg testscr avginc el_pct meal_pct comp_stu expn_stu expn2

      Source |       SS       df       MS              Number of obs =     420
-------------+------------------------------           F(  6,   413) =  290.64
       Model |  122982.974     6  20497.1623           Prob > F      =  0.0000
    Residual |  29126.6197   413   70.524503           R-squared     =  0.8085
-------------+------------------------------           Adj R-squared =  0.8057
       Total |  152109.594   419  363.030056           Root MSE      =  8.3979

------------------------------------------------------------------------------
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avginc |    .615671   .0878891     7.01   0.000     .4429052    .7884368
      el_pct |  -.1849029   .0313667    -5.89   0.000    -.2465613   -.1232446
    meal_pct |  -.4062285   .0279647   -14.53   0.000    -.4611994   -.3512576
    comp_stu |   12.77274    6.86173     1.86   0.063    -.7155316    26.26101
    expn_stu |  -.0041014   .0071182    -0.58   0.565    -.0180938    .0098911
       expn2 |   5.16e-07   6.31e-07     0.82   0.414    -7.24e-07    1.76e-06
       _cons |   671.0976     20.053    33.47   0.000     631.6789    710.5162
------------------------------------------------------------------------------

. gen avginc_meal= avginc*meal_pct

. reg testscr avginc el_pct meal_pct comp_stu expn_stu avginc_meal

      Source |       SS       df       MS              Number of obs =     420
-------------+------------------------------           F(  6,   413) =  291.29
       Model |  123035.301     6  20505.8836           Prob > F      =  0.0000
    Residual |  29074.2922   413   70.397802           R-squared     =  0.8089
-------------+------------------------------           Adj R-squared =  0.8061
       Total |  152109.594   419  363.030056           Root MSE      =  8.3903

------------------------------------------------------------------------------
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avginc |   .6514535    .091743     7.10   0.000     .4711121    .8317949
      el_pct |  -.1882845   .0313735    -6.00   0.000    -.2499562   -.1266129
    meal_pct |  -.3719248   .0403118    -9.23   0.000    -.4511667   -.2926828
    comp_stu |   12.46924   6.852468     1.82   0.070    -1.000829     25.9393
    expn_stu |   .0016199   .0007326     2.21   0.028     .0001799      .00306
 avginc_meal |  -.0033181   .0027915    -1.19   0.235    -.0088055    .0021692
       _cons |   655.3017    3.62851   180.60   0.000      648.169    662.4343
------------------------------------------------------------------------------

. gen log_testscr= log( testscr)

. gen log_exp=log( expn_stu)

. reg log_testscr avginc el_pct meal_pct comp_stu log_exp

      Source |       SS       df       MS              Number of obs =     420
-------------+------------------------------           F(  5,   414) =  348.26
       Model |  .286965303     5  .057393061           Prob > F      =  0.0000
    Residual |  .068227399   414    .0001648           R-squared     =  0.8079
-------------+------------------------------           Adj R-squared =  0.8056
       Total |  .355192702   419  .000847715           Root MSE      =  .01284

------------------------------------------------------------------------------
 log_testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avginc |   .0009058   .0001338     6.77   0.000     .0006428    .0011687
      el_pct |  -.0002937   .0000479    -6.13   0.000    -.0003878   -.0001995
    meal_pct |  -.0006265   .0000427   -14.66   0.000    -.0007105   -.0005425
    comp_stu |   .0210436    .010361     2.03   0.043     .0006768    .0414104
     log_exp |   .0132001   .0061378     2.15   0.032     .0011349    .0252653
       _cons |    6.38569   .0511922   124.74   0.000     6.285061    6.486319
------------------------------------------------------------------------------

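. gen comp_expn= comp_stu+ expn_stu

. reg testscr avginc el_pct meal_pct comp_expn

      Source |       SS       df       MS              Number of obs =     420
-------------+------------------------------           F(  4,   415) =  432.09
       Model |  122657.926     4  30664.4815           Prob > F      =  0.0000
    Residual |  29451.6675   415  70.9678735           R-squared     =  0.8064
-------------+------------------------------           Adj R-squared =  0.8045
       Total |  152109.594   419  363.030056           Root MSE      =  8.4242

------------------------------------------------------------------------------
     testscr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      avginc |   .6218424   .0880372     7.06   0.000     .4487879    .7948969
      el_pct |  -.1953478   .0310783    -6.29   0.000    -.2564383   -.1342574
    meal_pct |  -.4079631   .0280408   -14.55   0.000    -.4630827   -.3528435
   comp_expn |   .0020521   .0007098     2.89   0.004     .0006569    .0034473
       _cons |   655.0487   3.632352   180.34   0.000     647.9086    662.1888
------------------------------------------------------------------------------

.
end of do-file
.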