A Response to “Oklahoma School Grades: Hiding ‘Poor’ Achievement”
25 November 2013
In October 2013, the staff of The Oklahoma Center for Education Policy (University of Oklahoma) and
The Center for Educational Research and Evaluation (Oklahoma State University) released “Oklahoma
School Grades: Hiding ‘Poor’ Achievement”, a paper critiquing Oklahoma’s A-F Report Card. Their
research outlined three primary criticisms of the A-F Report Card. First, the authors argued that
differences between predicted A-F letter grades are small and effectively meaningless. Secondly, they
claimed that summarizing a school’s test performance on math, reading, and science in a single letter
grade is neither a clear nor a reliable indicator of school performance. Finally, they argued that letter
grades mask achievement gaps between poor and minority children and their wealthier, non-minority
peers.
We think that the conclusions of this study may be misleading to the public and have prepared this
response in order to clarify some of the claims made by these researchers. To investigate the validity of
the three aforementioned criticisms, we conducted an analysis similar to the one completed by the
OSU/OU researchers. In contrast to the OSU/OU researchers who ran the analysis using a small (3% of
the entire data), non-representative sample of state data, our analysis used data from the entire state in
order to get more reliable estimates. Our analysis resulted in the following conclusions:

•	Actual differences between A and F schools were much larger than the small differences reported by OSU/OU. The difference for math, for instance, was approximately 14 points, on average.

•	These differences between A and F schools mean a great deal in terms of student learning. Students in “F” schools, for instance, are about one and a half years of learning behind students in “A” schools, even after taking into account the impact of factors such as poverty and ethnicity.

•	Analysis of the entire dataset did not reproduce the counter-intuitive finding that “F” schools served poor and minority students better than “A” schools. Instead, we found that while racial and income gaps existed, “A” schools served all types of students better than “F” schools, on average.
The following pages summarize our response to this study and discuss our findings in greater detail.
Claim 1: Very Small Differences Predicted By Letter Grades
The article claimed that when school raw scores for reading, math, and science were averaged, three to
six correct responses separated “A” from “F” schools on 50-question tests. We find this statement misleading. The 3-6 question differences they report are measured after controlling for race, poverty, and prior
achievement. Before adjusting for these factors, the differences between A and F schools were much
larger. The difference for math, for instance, was approximately 14 points, on average, as shown in
Table 1 below. While it may be appropriate to control for such factors, the article does not make this point entirely clear, so we want to highlight that the reported differences are measured after adjusting for factors considered to be outside of a school’s control.
Table 1. Unadjusted Differences in Mean Correct Response by School Grade

School Grade    Raw Math Score¹
A-B             2.3
A-C             4.5
A-D             9.7
A-F             14.5
More important, however, is the authors’ argument that the 3-6 question differences between A and F schools are small effects. Although these effects may seem small when looking at the raw scores, comparisons based on raw scores are difficult to interpret because they require knowledge of the difficulty of the test and of how the scores were spread above and below the mean for each grade and subject. Given these challenges, standardized scores² are much more appropriate. Standardized scores provide more consistent measures across grades and subjects, making it possible to interpret raw scores and to compare the scores of different students on different tests, in different subjects, and in different grades.
¹ To ensure a more accurate comparison, this analysis uses OCCT scores only (OMAAP and OAAP scores are omitted).
² A standardized z-score represents both the relative position of an individual score in a distribution as compared to the mean and the variation of scores in the distribution. A negative z-score indicates the score is below the distribution mean; a positive z-score indicates the score is above it. Standardized scores are calculated by subtracting the mean from the individual score and dividing by the standard deviation. In this way, standardized scores are directly comparable within and between different groups of cases.
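The calculation described in the footnote can be sketched in a few lines; the raw scores below are hypothetical and are used only to illustrate the formula, not actual OCCT data.

```python
# A minimal sketch of z-score standardization: z = (score - mean) / SD.
from statistics import mean, pstdev

raw_scores = [28, 31, 35, 22, 34]   # hypothetical raw test scores
mu = mean(raw_scores)               # distribution mean
sigma = pstdev(raw_scores)          # population standard deviation

# Scores above the mean get positive z-scores; scores below, negative.
z_scores = [(s - mu) / sigma for s in raw_scores]
```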
As shown in Table 2, when looking at standardized scores, we see a difference of .10 standard deviations
from A – B schools, .27 from A – C schools, .47 from A – D schools and .64 from A – F schools, even when
controlling for factors outside of the school’s control such as race/ethnicity, gender, free lunch status, and prior test achievement. In education, such differences are considered large and meaningful. Several studies (for example, Kane, Rockoff, and Staiger, 2008) find that the value-added of the average 3rd or 4th grade teacher is .06 to .09 standard deviations above that of novice teachers. In these terms,
the impact of being in an A versus F school is as much as ten times the impact of having a new versus
experienced teacher.
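The arithmetic behind that comparison is straightforward; this short sketch simply divides the A-F gap by the cited teacher value-added range.

```python
# The A-F standardized gap (.64 SD) relative to the .06-.09 SD
# teacher value-added range cited from Kane, Rockoff, and Staiger (2008).
a_f_gap = 0.64
teacher_effects = (0.06, 0.09)

ratios = [a_f_gap / e for e in teacher_effects]
# 0.64 / 0.06 is roughly 10.7 and 0.64 / 0.09 roughly 7.1,
# i.e. about seven to ten times the teacher-experience effect.
```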
Table 2. Mean Achievement Differences by School Grade

School Grade    Months of Learning
A-B             2
A-C             6
A-D             11
A-F             15
Another commonly used way to look at this difference is in months of schooling.³,⁴ Using this conversion, students in “A” schools were approximately 2 months ahead of students in “B” schools, 6 months ahead of students in “C” schools, 11 months ahead of students in “D” schools, and 15 months ahead of students in “F” schools. In other words, the difference between being in an “A” and an “F” school is about one and a half academic years of learning. Clearly, when the differences between A and F schools are viewed relative to other factors that affect classroom learning, such as the experience level of the teacher, or in months of learning, they are not small, insignificant, or explained by chance.
³ According to Hill et al. (2007), the average effect size from nationally normed tests for 3rd-8th graders ranged from .30 to .56 standard deviations, with an average of .422.
⁴ We use this conversion factor for ease of explanation, fully recognizing that the actual factor is likely to vary by test, grade level, and skill. For instance, learning gains in math and reading comprehension are smaller in later grades relative to early grades.
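The months-of-learning figures in Table 2 can be reproduced with a short sketch. Two inputs here are our assumptions rather than values stated outright in the text: the .422 SD average annual gain from Hill et al. (2007) and a 10-month school year.

```python
# Sketch of the months-of-learning conversion behind Table 2.
ANNUAL_GAIN_SD = 0.422    # average yearly effect size (Hill et al., 2007)
MONTHS_PER_YEAR = 10      # assumed school-year length in months

# Standardized (SD) differences between A schools and each other grade.
sd_gaps = {"A-B": 0.10, "A-C": 0.27, "A-D": 0.47, "A-F": 0.64}

# Years of learning = gap / annual gain; scale to months and round.
months = {pair: round(gap / ANNUAL_GAIN_SD * MONTHS_PER_YEAR)
          for pair, gap in sd_gaps.items()}
# months -> {'A-B': 2, 'A-C': 6, 'A-D': 11, 'A-F': 15}
```

Under these assumptions the conversion matches the reported values of 2, 6, 11, and 15 months.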
Claim 2: Classification Error
Secondly, the authors argued that using a single letter grade to summarize overall school performance is
neither a clear nor a reliable indicator of school performance. To support this claim, they point out that
there were schools with lower overall letter grades that had higher achievement in certain subjects than
schools with higher overall letter grades. While it is true that some schools with lower overall scores performed better in certain subjects than schools with higher overall scores, and vice versa, we do not agree with the authors’ conclusion that a single summative grade is meaningless.
The purpose of the single summative grade is to provide parents and citizens with a clear, easy-to-understand, and concise indicator of overall performance. Performance indices and letter grades are also provided for Language Arts, Mathematics, Science, Social Studies/History/Geography, and Writing in order to provide parents and citizens with additional information on school performance by subject. Statistics and letter grades for various factors related to the school environment, including attendance, dropout rate, and parent and community engagement, also provide parents with information on non-academic factors of interest.
Thus, just as students are given a GPA which summarizes their overall academic performance as well as
individual letter grades in each subject, the A-F Report Card also provides parents with grades for
individual subjects and an overall GPA. Given that grades for each subject are reported in addition to
the GPA, the OU/OSU authors’ claim that the A-F Report Card collapses school performance into a single, relatively meaningless indicator of performance is misleading. Their use of examples of low-achieving schools outperforming relatively higher-achieving schools to illustrate this point, moreover, is also concerning. While A or B schools outperform C or D schools on average, that doesn’t
mean that they outperform them in every subject. This would be like expecting a student with a 3.0 or
“B” average to outperform all of his or her peers with a lower overall GPA in every subject. Just as
individual students have unique strengths, so do schools.
Claim 3: Achievement Gaps
Third, the authors of this article argue that letter grades hide low test performance by poor and minority
children. They claimed that in their analysis, minority and poor children tested highest in “D” and “F”
schools and lowest in “A” and “B” schools. Our analysis, however, did not result in the same findings.
Instead, as demonstrated in Figures 1 and 2, we found that while income and racial achievement gaps
do exist, all subgroups of students experienced increased test scores as the quality of the school (as
measured by the A-F grading system) increased. Such differences between our analysis and the one conducted by the staff at OU/OSU may be explained by their small, non-representative sample, which covered only 3% of schools statewide and included only three “A” and five “F”
schools. The extremely small number of “A” and “F” schools in this sample makes it nearly impossible to
draw larger conclusions about the quality of these schools and thus makes their results particularly
prone to the counter-intuitive claims made in their paper.
[Figure 1. Mean Math Achievement by FRL Status: bar chart of mean raw OCCT score (0-40) by school grade (A-F), comparing FRL and Non-FRL students.]
[Figure 2. Mean Math Achievement by Minority Status: bar chart of mean raw OCCT score (0-40) by school grade (A-F), comparing Minority and Non-Minority students.]