Expenditures, Eligibility, and SAT Scores 1
Running Header: EXCEL PROJECT
Excel Project:
An Analysis of State Expenditures, Student Eligibility Percentages, and SAT Scores
Gavin L. Molitor
Seattle Pacific University
The data was collected from all 50 states and the District of Columbia (referred to collectively as "states"), as reported in the Digest of Education Statistics, an annual publication of the U.S. Department of Education, for a study prompted by the debate and controversy over equity in public school expenditures and their relation to academic performance. A variety of statistical measures were used to analyze, compare, and interpret this data, and each analysis is accompanied by a description, results, conclusions, and a discussion.
Part I – Histograms, Box Plots, and Frequency Distributions
The data collected from each state includes expenditures per pupil in average daily attendance, the student/teacher ratio, average SAT scores in Reading, and average SAT scores in Writing. The following histograms and corresponding box plots describe the distributions of these variables. We first look at expenditures.
Figure 1.1 - Expenditures (histogram; x-axis: expenditure per pupil in $1,000 bins from <=7000 to >14000; y-axis: number of states, 0 to 12)
In histogram figure 1.1, the most frequent bin contains the eleven states spending between $8,000 and $9,000 per pupil in average daily attendance in public elementary and secondary schools, followed by ten states spending between $10,000 and $11,000 and nine states spending between $9,000 and $10,000. Together these three bins hold 30 of the 51 states, so about 59% of the states spend between $8,000 and $11,000. The lowest value belongs to Utah, which spends $5,960. This data does not follow a normal distribution. It is positively skewed (skewed to the right): a few states spend far more per pupil than the rest, as shown in box plot figure 1.2.
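The bin labels in figure 1.1 are right-closed intervals, so a state spending exactly $9,000 is counted in (8000, 9000]. The sketch below shows that binning rule in Python; the function names are our own, and the spending values are illustrative, not the study data. The 59% figure follows directly from the reported counts.

```python
import math
from collections import Counter


def expenditure_bin(amount):
    """Label an expenditure with a figure 1.1 style bin.

    Interior bins are right-closed $1,000 intervals such as (8000, 9000];
    the extremes are the open-ended bins <=7000 and >14000.
    """
    if amount <= 7000:
        return "<=7000"
    if amount > 14000:
        return ">14000"
    upper = math.ceil(amount / 1000) * 1000  # 9000 lands in (8000, 9000]
    return f"({upper - 1000}, {upper}]"


def frequency_table(amounts):
    """Count values per bin; bins with no values are simply absent."""
    return Counter(expenditure_bin(a) for a in amounts)


# Illustrative per-pupil amounts only; these are not the study data.
sample = [5960, 8450, 8999, 9000, 9805, 10400, 14500]
print(frequency_table(sample))

# Share of states in the three bins from $8,000 to $11,000, using the
# counts reported for figure 1.1: 11 + 10 + 9 states out of 51.
share = (11 + 10 + 9) / 51
print(f"{share:.0%}")  # 59%
```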
Figure 1.2 - Expenditures (box plot; axis: $0 to $25,000)
Figure 1.2 shows that the median expenditure is $9,805. The outliers in this distribution, the District of Columbia, New York, and New Jersey, spend significantly more than the other states and are largely responsible for the skewness. We now turn our attention to the student-to-teacher ratio at the secondary level.
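The text does not say which rule the box plot used to flag outliers. A common choice is Tukey's fences: any value more than 1.5 interquartile ranges below the first quartile or above the third quartile. A minimal sketch under that assumption, with hypothetical data; `method="inclusive"` interpolates quartiles with the minimum as the 0th and the maximum as the 100th percentile, which, as far as we know, matches Excel's QUARTILE.INC.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    # Inclusive interpolation: min is the 0th and max the 100th percentile.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outliers(values, k=1.5):
    """Values lying beyond either fence."""
    low, high = tukey_fences(values, k)
    return [v for v in values if v < low or v > high]


# Hypothetical per-pupil expenditures with one extreme value.
spending = [8000, 8500, 9000, 9500, 10000, 10500, 11000, 20000]
print(tukey_fences(spending))  # (6250.0, 13250.0)
print(outliers(spending))      # [20000]
```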
Figure 2.1 - Secondary Student/Teacher Ratio (histogram; x-axis: ratio in bins of 1.0 from <=10 to >23; y-axis: number of states, 0 to 14)
In histogram figure 2.1, the most frequent student-to-teacher ratio falls between 13.0 and 14.0, a bin containing thirteen states. The distribution appears to approach normality, but twenty-two states cluster between 14.0 and 17.0, and several states have much higher ratios that form a long upper tail. As a result, this data is positively skewed (skewed to the right), as indicated by the shorter spans of the two lower quartiles and the longer upper whisker in box plot figure 2.2.
Figure 2.2 - Secondary Student/Teacher Ratio (box plot; axis: 0 to 30)
The median ratio in figure 2.2 is 14.8, even though the most frequent ratios fall between 13.0 and 14.0. An analysis of the outliers shows that Utah, Oregon, California, and Arizona have unusually high student-to-teacher ratios. Next, we look at SAT scores in Reading.
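The direction of skew follows the longer tail: a long upper tail gives positive skewness, a long lower tail negative skewness. The sketch below computes the adjusted Fisher-Pearson coefficient, which is, to our knowledge, the formula behind Excel's SKEW function; the ratios are hypothetical values chosen to have a long upper tail, not the study data.

```python
import statistics


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness coefficient (sample statistic).

    Positive values mean a longer upper (right) tail; negative values
    mean a longer lower (left) tail.
    """
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three values")
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)


# Hypothetical ratios: most values near 13-16, a few much higher.
right_tailed = [12, 13, 13, 14, 14, 14, 15, 16, 22, 24]
print(round(sample_skewness(right_tailed), 2))
```

Mirroring the same values (multiplying by -1) flips the sign of the result, and a symmetric list such as [1, 2, 3, 4, 5] gives zero.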
Figure 3.1 - SAT Scores in Reading (histogram; x-axis: average score in 10-point bins from <=480 to >600; y-axis: number of states, 0 to 12)
Histogram figure 3.1 shows that the most frequent average SAT score in Reading falls between 490 and 500, a bin containing eleven states. This data has a strong positive skew (skewed to the right): only three states have lower averages than this most frequent bin, while the remaining states spread out across the higher scores. Box plot figure 3.2 shows that the lower 50% of scores are more tightly clustered, whereas the upper 50% span a much greater range.
Figure 3.2 - SAT Scores in Reading (box plot; axis: 0 to 900)
The median value for this data is 523. The lowest value is 482 and the lower hinge is 498, so the lower 50% of the data spans a range of 41 (523 - 482). The upper hinge is 569 and the highest value is 610, so the upper 50% of the data spans a range of 87 (610 - 523). The states with higher average scores are therefore much more spread out across the upper range of values, whereas the states with lower averages cluster near the lower hinge.
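The half ranges are measured from the median, not from the hinges. A small sketch using the five-number summary reported for figure 3.2; the function and key names are our own.

```python
def spread_from_summary(minimum, lower_hinge, median, upper_hinge, maximum):
    """Spreads implied by a five-number summary."""
    return {
        "lower_half": median - minimum,      # min to median
        "upper_half": maximum - median,      # median to max
        "lower_box": median - lower_hinge,   # lower hinge to median
        "upper_box": upper_hinge - median,   # median to upper hinge
        "iqr": upper_hinge - lower_hinge,
    }


# Five-number summary reported for figure 3.2 (SAT Reading).
reading = spread_from_summary(482, 498, 523, 569, 610)
print(reading)
```

The two halves of the box (25 points below the median, 46 above) point the same way as the half ranges of 41 and 87.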
Figure 4.1 - SAT Scores in Writing (histogram; x-axis: average score in 10-point bins from <=480 to >600; y-axis: number of states, 0 to 12)
The data for SAT scores in Writing, shown in figures 4.1 and 4.2, follows a trend very similar to that of the SAT scores in Reading.
Figure 4.2 - SAT Scores in Writing (box plot; axis: 0 to 900)
The median value of 511 for Writing is 12 points lower than the median of 523 for Reading. The lowest value of 472 is 10 points lower, and the lower hinge of 490.5 is 7.5 points lower. The upper hinge of 564 is 5 points lower, but the highest value of 591 is 19 points lower. The lower 50% of the data spans a range of 39, while the upper 50% spans a range of 80. The skew is very similar to that of the SAT scores in Reading, but the range of the entire distribution of SAT scores in Writing (119 points, versus 128 for Reading) is not quite as large.
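The point-by-point comparison above can be checked directly from the two five-number summaries reported for figures 3.2 and 4.2. A short sketch; the dictionary keys are our own labels.

```python
KEYS = ("minimum", "lower hinge", "median", "upper hinge", "maximum")
reading = dict(zip(KEYS, (482, 498, 523, 569, 610)))    # figure 3.2
writing = dict(zip(KEYS, (472, 490.5, 511, 564, 591)))  # figure 4.2

# How far each Writing summary point sits below the Reading one.
gaps = {k: reading[k] - writing[k] for k in KEYS}
print(gaps)

# Half ranges (measured from the median) and full ranges.
for name, s in (("Reading", reading), ("Writing", writing)):
    lower = s["median"] - s["minimum"]
    upper = s["maximum"] - s["median"]
    full = s["maximum"] - s["minimum"]
    print(f"{name}: lower half {lower}, upper half {upper}, range {full}")
```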
Figure 5 shows the frequency distribution for the categorical variable region. The South comprises 17 states, whereas the Northeast consists of only 9. The West and the Midwest comprise 13 and 12 states, respectively. Together these four regions make up our population of 51 states.
Figure 5 - Regions (bar chart of the number of states in the West, Midwest, South, and Northeast; y-axis: 0 to 18)
Part II – Regional Comparisons
Having analyzed the distributions for the four continuous variables of interest as
well as the regional breakdown, we now focus on identifying differences between regions
and determining statistical and practical significance.
A one-way ANOVA (figure 6.1) tests whether mean expenditures differ among the regions. We conclude from this between-subjects test that there is a significant difference between the regions, because the calculated F ratio of 9.75 is greater than the critical value F(3,47) = 2.80 at the .05 level.
Figure 6.1 - Tests of Between-Subjects Effects: Expenditures
Dependent Variable: Current expenditure per pupil in average daily attendance in public elem and sec schools 2005-06
Source | Type III Sum of Squares | df | Mean Square | F | p | Partial Eta Squared
region | 120,097,648.80 | 3 | 40,032,549.60 | 9.75 | .00 | .38
Error | 192,953,101.83 | 47 | 4,105,385.15 | | |
Total | 5,752,870,321.00 | 51 | | | |
Corrected Total | 313,050,750.63 | 50 | | | |
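Each derived column in figure 6.1 follows from the sums of squares and degrees of freedom: a mean square is SS / df, F is the ratio of the two mean squares, and with a single factor the partial eta squared reduces to SS_region / (SS_region + SS_error). A sketch reproducing the table's values; the function name is ours. The critical value itself comes from an F table (if SciPy is available, `scipy.stats.f.ppf(0.95, 3, 47)` should return roughly 2.80).

```python
def anova_from_sums_of_squares(ss_between, df_between, ss_within, df_within):
    """F ratio and partial eta squared from an ANOVA table's sums of squares."""
    ms_between = ss_between / df_between   # mean square for region
    ms_within = ss_within / df_within      # mean square for error
    f_ratio = ms_between / ms_within
    # With a single factor, partial eta squared equals SS_between / SS_corrected_total.
    partial_eta_sq = ss_between / (ss_between + ss_within)
    return ms_between, ms_within, f_ratio, partial_eta_sq


ms_b, ms_w, f_value, eta_sq = anova_from_sums_of_squares(
    120_097_648.80, 3, 192_953_101.83, 47
)
print(f"MS_region = {ms_b:,.2f}, MS_error = {ms_w:,.2f}")
print(f"F(3, 47) = {f_value:.2f}, partial eta squared = {eta_sq:.2f}")
```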
The multiple comparisons post-hoc Tukey HSD test in figure 6.2 shows that the difference in expenditures is statistically significant at the .05 level when the Northeast is compared with each of the other three regions, but not when those three regions are compared with one another. We can therefore expect statistically similar expenditures among states in the West, Midwest, and South, and significantly higher expenditures in the Northeast.
Figure 6.2 - Multiple Comparisons: Expenditures
Current expenditure per pupil in average daily attendance in public elem and sec schools 2005-06
Tukey HSD
(I) region | (J) region | Mean Difference (I-J) | Std. Error | p | 95% CI Lower Bound | 95% CI Upper Bound
West | Midwest | -660.49 | 811.12 | .85 | -2820.82 | 1499.83
West | South | -475.96 | 746.52 | .92 | -2464.23 | 1512.31
West | Northeast | -4356.52* | 878.61 | .00 | -6696.60 | -2016.45
Midwest | West | 660.49 | 811.12 | .85 | -1499.83 | 2820.82
Midwest | South | 184.53 | 763.94 | 1.00 | -1850.14 | 2219.21
Midwest | Northeast | -3696.03* | 893.46 | .00 | -6075.66 | -1316.40
South | West | 475.96 | 746.52 | .92 | -1512.31 | 2464.23
South | Midwest | -184.53 | 763.94 | .995 | -2219.21 | 1850.14
South | Northeast | -3880.56* | 835.25 | .00 | -6105.17 | -1655.96
Northeast | West | 4356.52* | 878.61 | .00 | 2016.45 | 6696.60
Northeast | Midwest | 3696.03* | 893.46 | .00 | 1316.40 | 6075.66
Northeast | South | 3880.56* | 835.25 | .00 | 1655.96 | 6105.17
*. The mean difference is significant at the .05 level.
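The Std. Error column in figure 6.2 is consistent with the Tukey-Kramer form used for unequal group sizes, SE = sqrt(MSE (1/n_i + 1/n_j)), with MSE taken from the Error row of figure 6.1 and the group sizes from figure 5. A sketch that reproduces those six standard errors; the names are our own, and the studentized range critical value used to build the intervals is not computed here.

```python
import math

MSE = 4_105_385.15  # Error mean square, figure 6.1
SIZES = {"West": 13, "Midwest": 12, "South": 17, "Northeast": 9}  # figure 5


def tukey_kramer_se(group_i, group_j, mse=MSE, sizes=SIZES):
    """Std. error of a pairwise mean difference using the pooled error term."""
    return math.sqrt(mse * (1 / sizes[group_i] + 1 / sizes[group_j]))


PAIRS = [("West", "Midwest"), ("West", "South"), ("West", "Northeast"),
         ("Midwest", "South"), ("Midwest", "Northeast"), ("South", "Northeast")]
for i, j in PAIRS:
    print(f"{i} vs {j}: {tukey_kramer_se(i, j):.2f}")
```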
The mean differences in figure 6.2 between the expenditures of the Northeast and those of the other regions range from $3,696.03 to $4,356.52 per pupil. The practical significance is that expenditures per pupil in average daily attendance in public elementary and secondary schools can be expected to be substantially greater in the 9 Northeastern states than in the other 42 states.
Further research is needed to analyze regional education expenditures. It should focus on the spending categories that make up large shares of each state's education budget, such as higher education, special education, and teacher salaries, and should also examine differences between rural, urban, and suburban areas.
Let us now turn our attention to student/teacher ratios. An ANOVA test between regions yields an F ratio of 13.08, which is statistically significant because it exceeds 2.80, the critical value of F(3,47). A Dunnett C test with an alpha level of .05 indicates that the West differs significantly from each of the other three regions, with a higher student-to-teacher ratio; the mean differences between the West and the other regions range from 2.94 to 5.07. The Northeast has a lower ratio than the other regions, but the difference is statistically significant only in comparison with the South and the West. Figure 7 shows the mean ratio for each region.
Figure 7 - Descriptive Statistics: Student/Teacher Ratio
Dependent Variable: Average pupil/teacher ratio Fall 2005
region | Mean | Std. Deviation | N
West | 17.81 | 2.90 | 13
Midwest | 14.81 | 1.71 | 12
South | 14.86 | 1.28 | 17
Northeast | 12.73 | 1.46 | 9
Total | 15.23 | 2.54 | 51
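Because figure 7 reports each region's mean, standard deviation, and N, the one-way ANOVA F ratio can be approximately reconstructed from these summaries alone: SS_between = Σ n_i (mean_i - grand mean)^2 and SS_within = Σ (n_i - 1) s_i^2. A sketch; since the inputs are rounded to two decimals, the results match the reported values only approximately. Applied to figure 7 it gives F ≈ 13.08, and applied to the figure 8.1 summaries it gives F ≈ 16.66.

```python
def f_from_summary(groups):
    """One-way ANOVA F ratio and grand mean from (mean, sd, n) per group."""
    total_n = sum(n for _, _, n in groups)
    grand_mean = sum(m * n for m, _, n in groups) / total_n
    ss_between = sum(n * (m - grand_mean) ** 2 for m, _, n in groups)
    ss_within = sum((n - 1) * sd ** 2 for _, sd, n in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, grand_mean


# (mean, std. deviation, N) for West, Midwest, South, Northeast (figure 7).
RATIO = [(17.81, 2.90, 13), (14.81, 1.71, 12), (14.86, 1.28, 17), (12.73, 1.46, 9)]
f_ratio, grand_mean = f_from_summary(RATIO)
print(f"F(3, 47) = {f_ratio:.2f}, grand mean = {grand_mean:.2f}")
```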
Because the standard deviations within each region are small (from 1.28 in the South to 2.90 in the West), individual states tend to lie close to their regional mean, which gives these differences practical significance. However, this research does not account for support staff and other certificated positions that might create a gap between the theoretical student/teacher ratio and the actual average number of students in a classroom. Further research is needed to explore the impact of such positions on this ratio in each region.
Before we focus on SAT scores, it is important to analyze the percentage of eligible students taking the SAT in each region. An F ratio of 16.66, greater than the critical value of 2.80 for F(3,47), indicates a highly significant difference between regions, and this may be one of the most crucial variables to consider when making comparisons and inferences about SAT performance. As figure 8.1 shows, the mean percentage of eligible students taking the SAT in the Northeast was 81.44%, while the means for the other three regions ranged from 12.67% in the Midwest to 40.35% in the South. The standard deviations matter as well: the Northeastern states vary much less (10.33) than the states in the other three regions (16.75 to 30.89).
Figure 8.1 - Descriptive Statistics: SAT Eligible
Dependent Variable: Percentage of all eligible students taking the SAT 2006-07
region | Mean | Std. Deviation | N
West | 33.46 | 18.85 | 13
Midwest | 12.67 | 16.75 | 12
South | 40.35 | 30.89 | 17
Northeast | 81.44 | 10.33 | 9
Total | 39.33 | 31.12 | 51
The Dunnett C test in figure 8.2 shows statistical significance at the .05 level when the Northeast is compared with each of the other three regions, and the mean differences in these comparisons are larger than in any other comparison. The difference between the very low mean percentage in the Midwest and the mean percentage in the South is also statistically significant. The confidence intervals are wide, indicating a large range in which the true mean difference between two regions is likely to fall, but in the significant comparisons they lie entirely on one side of zero and so clearly indicate the direction of the difference. The strongest of these relationships appear in the comparisons of the Northeast with each of the other regions.
Figure 8.2 - Multiple Comparisons: SAT Eligible
Percentage of all eligible students taking the SAT 2006-07
Dunnett C
(I) region | (J) region | Mean Difference (I-J) | Std. Error | 95% CI Lower Bound | 95% CI Upper Bound
West | Midwest | 20.80 | 7.12 | -.48 | 42.07
West | South | -6.89 | 9.14 | -33.35 | 19.57
West | Northeast | -47.98* | 6.26 | -67.02 | -28.95
Midwest | West | -20.80 | 7.12 | -42.07 | .48
Midwest | South | -27.69* | 8.92 | -53.59 | -1.79
Midwest | Northeast | -68.78* | 5.94 | -87.03 | -50.53
South | West | 6.89 | 9.14 | -19.57 | 33.35
South | Midwest | 27.69* | 8.92 | 1.79 | 53.59
South | Northeast | -41.09* | 8.25 | -65.17 | -17.01
Northeast | West | 47.98* | 6.26 | 28.95 | 67.02
Northeast | Midwest | 68.78* | 5.94 | 50.53 | 87.03
Northeast | South | 41.09* | 8.25 | 17.01 | 65.17
*. The mean difference is significant at the .05 level.
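Unlike Tukey HSD, Dunnett's C does not pool the variances: the standard error for each pair is sqrt(s_i^2/n_i + s_j^2/n_j), built from each region's own standard deviation. The sketch below reproduces the Std. Error column of figure 8.2 from the figure 8.1 summaries; the names are our own, and the Dunnett C critical values (taken from the studentized range distribution) are not computed here.

```python
import math

# (std. deviation, N) per region, figure 8.1
SUMMARY = {"West": (18.85, 13), "Midwest": (16.75, 12),
           "South": (30.89, 17), "Northeast": (10.33, 9)}


def unpooled_se(group_i, group_j, summary=SUMMARY):
    """Std. error of a mean difference using each group's own variance."""
    (sd_i, n_i), (sd_j, n_j) = summary[group_i], summary[group_j]
    return math.sqrt(sd_i ** 2 / n_i + sd_j ** 2 / n_j)


for i, j in [("West", "Midwest"), ("Midwest", "Northeast"), ("South", "Northeast")]:
    print(f"{i} vs {j}: {unpooled_se(i, j):.2f}")
```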
The practical significance of this data lies in its effect on the generalizability of any conclusions about SAT scores, including correlations involving those scores. It raises serious questions about whether the students taking the SAT in the South, the West, and especially the Midwest are representative of all eligible students in those regions.
Further research is needed to determine whether the reported regional mean SAT scores are skewed by these participation percentages and by the large differences between regions shown in this Dunnett C test. Such research might focus on the correlation between an eligible student's GPA and whether that student takes the SAT, or between the percentage of SAT test takers and enrollment in higher education.
We can now focus on SAT scores in Writing. Figure 9.1 shows that the mean SAT Writing score was highest in the Midwest (564.17) and lowest in the Northeast (497.22). The West (515.00) and the South (520.82) cluster near the overall mean of 525.37. The standard deviation is much lower in the Northeast (11.24) than in the other regions, indicating less variation between states' scores.
Figure 9.1 - Descriptive Statistics: SAT Scores in Writing
Dependent Variable: Average writing SAT score 2005-06
region | Mean | Std. Deviation | N
West | 515.00 | 25.29 | 13
Midwest | 564.17 | 31.04 | 12
South | 520.82 | 39.19 | 17
Northeast | 497.22 | 11.24 | 9
Total | 525.37 | 37.63 | 51
Figure 9.2 shows a statistically significant difference at the .05 level when the Midwest is compared with each of the other three regions. These mean differences range from 43.34 to 66.94 points in the Midwest's favor. However, the confidence intervals are wide (for example, 5.06 to 81.63 points for the Midwest versus the South), so the size of the difference between regions is estimated only imprecisely.
Figure 9.2 - Multiple Comparisons: SAT Scores in Writing
Average writing SAT score 2005-06
Dunnett C
(I) region | (J) region | Mean Difference (I-J) | Std. Error | 95% CI Lower Bound | 95% CI Upper Bound
West | Midwest | -49.17* | 11.38 | -83.24 | -15.10
West | South | -5.82 | 11.81 | -40.07 | 28.42
West | Northeast | 17.78 | 7.95 | -6.24 | 41.80
Midwest | West | 49.17* | 11.38 | 15.10 | 83.24
Midwest | South | 43.34* | 13.06 | 5.06 | 81.63
Midwest | Northeast | 66.94* | 9.71 | 37.43 | 96.46
South | West | 5.82 | 11.81 | -28.42 | 40.07
South | Midwest | -43.34* | 13.06 | -81.63 | -5.06
South | Northeast | 23.60 | 10.22 | -6.10 | 53.30
Northeast | West | -17.78 | 7.95 | -41.80 | 6.24
Northeast | Midwest | -66.94* | 9.71 | -96.46 | -37.43
Northeast | South | -23.60 | 10.22 | -53.30 | 6.10
*. The mean difference is significant at the .05 level.
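The Dunnett C tables report no p column; the starred rows are exactly those whose 95% confidence interval lies entirely on one side of zero. A small sketch that applies this rule to the unique pairs in figure 9.2; the data structure and function name are our own.

```python
# Unique pairs from figure 9.2: (mean difference, lower bound, upper bound)
WRITING = {
    ("West", "Midwest"): (-49.17, -83.24, -15.10),
    ("West", "South"): (-5.82, -40.07, 28.42),
    ("West", "Northeast"): (17.78, -6.24, 41.80),
    ("Midwest", "South"): (43.34, 5.06, 81.63),
    ("Midwest", "Northeast"): (66.94, 37.43, 96.46),
    ("South", "Northeast"): (23.60, -6.10, 53.30),
}


def significant_pairs(table):
    """Pairs whose 95% confidence interval does not contain zero."""
    return sorted(pair for pair, (_, lo, hi) in table.items() if lo > 0 or hi < 0)


print(significant_pairs(WRITING))
```

All three flagged pairs involve the Midwest, matching the starred rows.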
The practical significance of this data is questionable, given the issues raised above about the percentages of eligible students taking the SAT in each region. It is also beyond the scope of this data to infer causation between these variables. Any conclusions about the reported SAT scores may be significantly distorted by sampling bias, because the students who take the SAT are a self-selected group whose share of eligible students varies widely between states and regions. We take a closer look at correlations between these and other variables in Part III.
Part III – Correlation, Scatterplots, and Regression Equations
Here we examine correlation and analyze the relationships between variables using scatterplots and regression equations. The analysis continues to treat SAT scores in Writing as the dependent variable; the independent variables are Expenditures, Student/Teacher Ratio, and the percentage of Eligible Students who take the SAT. Each scatterplot shows a negative relationship, as indicated by the downward slope of the line of best fit. Each correlation coefficient is tested for significance against the critical value of 0.279 at the .05 alpha level, which corresponds to 48 degrees of freedom, the closest tabled value to our actual df of 49 (n - 2 = 51 - 2). Because the critical value for 48 df is slightly larger than the one for 49 df, this choice makes the test slightly conservative. Significance depends on the magnitude of r: the closer the coefficient is to +/-1.00, a perfect correlation, the stronger the relationship, so a negative coefficient is significant whenever its absolute value exceeds the critical value. We are looking to make determinations about statistical and practical significance through correlation, not to draw conclusions about causation. Let us look at our first scatterplot.
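A critical correlation can be derived from a critical t value with the identity r_crit = t_crit / sqrt(t_crit^2 + df). A sketch, assuming the two-tailed t value at .05 for 48 df is about 2.0106 (a standard t-table entry, not a number from this report); the function names are our own.

```python
import math


def critical_r(t_crit, df):
    """Convert a two-tailed critical t value into a critical |r| (df = n - 2)."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


def r_is_significant(r, r_crit):
    """Significance depends on the magnitude of r, not its sign."""
    return abs(r) > r_crit


r_crit = critical_r(2.0106, 48)  # assumed t(.975, 48) from a t table
print(round(r_crit, 3))  # 0.279
for r in (-0.3969, -0.0662, -0.8698):  # figures 10, 11, and 12
    print(r, r_is_significant(r, r_crit))
```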
Figure 10 - Expenditures and SAT Scores in Writing (scatterplot; x-axis: Expenditures, $0 to $20,000; y-axis: SAT Scores, 0 to 700; line of best fit y = -0.006x + 587.02)
Regression Equation: y = -0.006x + 587.02
Coefficient of Determination: r^2 = 0.1576
Coefficient of Correlation: r = -0.3969
In figure 10, we see a negative correlation between Expenditures and SAT scores in Writing. The slope is -0.006, or about 6 points lower per additional $1,000 per pupil, and the y-intercept is 587.02. The absolute value of the correlation coefficient, r = -0.397, is greater than our critical value of 0.279, so the correlation is statistically significant. The relationship is nonetheless weak: r^2 = 0.158 means that expenditures account for only about 16% of the variation in Writing scores. Most of the points on the scatterplot are clustered between $8,000 and $12,000 and do not fall very close to the line of best fit. Though statistically significant, Expenditures are not a strong predictor of SAT scores, and this correlation has limited practical significance. Let us turn our attention to our second set of variables.
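The regression equations and coefficients reported with each scatterplot come from ordinary least squares. A self-contained sketch of the computation, shown on small hypothetical data rather than the study data; on Python 3.10 and later, `statistics.linear_regression` and `statistics.correlation` compute the same quantities.

```python
import statistics


def linear_fit(xs, ys):
    """Least-squares slope and intercept, plus Pearson r and r^2."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r, r * r


# Hypothetical (x, y) pairs with a downward trend.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 9, 5, 3]
slope, intercept, r, r_squared = linear_fit(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}, r = {r:.3f}, r^2 = {r_squared:.2f}")
```

r^2 is the share of the variance in y accounted for by the fitted line, which is why r^2 = 0.1576 for figure 10 corresponds to roughly 16%.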
Figure 11 - Student/Teacher Ratio and SAT Scores in Writing (scatterplot; x-axis: Student/Teacher Ratio, 0 to 25; y-axis: SAT Scores, 0 to 700; line of best fit y = -0.9872x + 540.34)
Regression Equation: y = -0.9872x + 540.34
Coefficient of Determination: r^2 = 0.0044
Coefficient of Correlation: r = -0.0662
In figure 11, the relationship between Student/Teacher Ratio and SAT scores in Writing shows only a very slight negative correlation. The regression equation gives a slope of -0.987 and a y-intercept of 540.34, and the slight downward direction of the line of best fit again reflects the negative relationship. The correlation coefficient is -0.066, and r^2 = 0.0044 means the ratio accounts for less than 1% of the variation in scores. Because |r| = 0.066 is less than our critical value of 0.279, we fail to reject the null hypothesis that there is no linear relationship between these two variables. Let us look at our last scatterplot.
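A quick consistency check on the trendline output: Pearson r always has the same sign as the least-squares slope, so r can be recovered from r^2 and the sign of the slope. A sketch using the values printed with figures 10 through 12 (the function name is ours); the recovered r matches each reported coefficient to within rounding, which confirms that all three slopes are negative.

```python
import math


def r_from_trendline(r_squared, slope):
    """Recover Pearson r from a trendline's r^2; r takes the sign of the slope."""
    return math.copysign(math.sqrt(r_squared), slope)


# (r^2, slope, reported r) for each scatterplot
FIGURES = {
    "Figure 10": (0.1576, -0.006, -0.3969),
    "Figure 11": (0.0044, -0.9872, -0.0662),
    "Figure 12": (0.7565, -1.0516, -0.8698),
}
for name, (r2, slope, reported_r) in FIGURES.items():
    print(name, round(r_from_trendline(r2, slope), 4), reported_r)
```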
Figure 12 - Eligible and SAT Scores in Writing (scatterplot; x-axis: Eligible, percentage of eligible students taking the SAT, 0 to 120; y-axis: SAT Scores, 0 to 700; line of best fit y = -1.0516x + 566.74)
Regression Equation: y = -1.0516x + 566.74
Coefficient of Determination: r^2 = 0.7565
Coefficient of Correlation: r = -0.8698
Figure 12 illustrates the negative correlation between the percentage of eligible students who take the SAT and performance on the SAT Writing test. The correlation coefficient for this relationship is -0.870, whose absolute value is much greater than our critical value of 0.279. It indicates a strong correlation, since it approaches -1.00, and r^2 = 0.757 means that participation accounts for about 76% of the variation in Writing scores across states. Looking along the line of best fit, the data points either touch the line or lie very close to it. The regression equation gives a slope of -1.052 and a y-intercept of 566.74.
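The slope means that each additional percentage point of eligible students taking the SAT is associated with a state Writing average about 1.05 points lower. A minimal sketch that evaluates the figure 12 line of best fit; the function name is ours, and the fitted values describe an association within the observed range of percentages, not a causal effect.

```python
def predicted_writing_score(pct_eligible_taking_sat):
    """Fitted value from the figure 12 line of best fit (y = -1.0516x + 566.74)."""
    return -1.0516 * pct_eligible_taking_sat + 566.74


for pct in (10, 50, 80):
    print(pct, round(predicted_writing_score(pct), 1))
```

At 10% participation the line predicts about 556.2, and at 80% about 482.6, a gap of roughly 74 points.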
Conclusions
There is a concerning degree of skewness in the variables evaluated in Part I. While some sampling error is expected, this data departs far enough from normality to suggest that some of the sample means are not representative of the population. The skewness was most obvious in the SAT scores. The scatterplots in Part III show a strong relationship between the percentage of eligible students who take the SAT and student performance on the SAT Writing test, and further research is needed to investigate this relationship. In particular, states that have a very low percentage of students taking the SAT yet report a high mean Writing score should be evaluated to determine whether the mean score of the test-taking sample is representative of all eligible students, or whether it is severely skewed. Part II showed a high degree of variability, across states and regions, in the percentage of students who go on to take the SAT. This analysis of SAT participation percentages and scores raises serious concerns about external validity that require further research and evaluation.