A Response to "Oklahoma School Grades: Hiding 'Poor' Achievement"
25 November 2013

In October 2013, the staff of the Oklahoma Center for Education Policy (University of Oklahoma) and the Center for Educational Research and Evaluation (Oklahoma State University) released "Oklahoma School Grades: Hiding 'Poor' Achievement," a paper critiquing Oklahoma's A-F Report Card. Their research outlined three primary criticisms of the A-F Report Card. First, the authors argued that differences between predicted A-F letter grades are small and effectively meaningless. Second, they claimed that summarizing a school's test performance in math, reading, and science in a single letter grade is neither a clear nor a reliable indicator of school performance. Finally, they argued that letter grades mask achievement gaps between poor and minority children and their wealthier, non-minority peers.

We think that the conclusions of this study may be misleading to the public and have prepared this response to clarify some of the claims made by these researchers. To investigate the validity of the three criticisms above, we conducted an analysis similar to the one completed by the OU/OSU researchers. In contrast to the OU/OSU researchers, who ran their analysis on a small (3% of the entire data), non-representative sample of state data, our analysis used data from the entire state in order to produce more reliable estimates. Our analysis resulted in the following conclusions:

- Actual differences between A and F schools were much larger than the small differences reported by OU/OSU. The difference for math, for instance, was approximately 14 points, on average.
- These differences between A and F schools meant a great deal in terms of student learning. Students in "F" schools, for instance, are about one and a half years of learning behind students in "A" schools, even after taking into account the impact of factors such as poverty and ethnicity.
- Analysis of the entire dataset did not produce the counter-intuitive result that "F" schools served poor and minority students better than "A" schools. Instead, we found that while racial and income gaps existed, "A" schools served all types of students better than "F" schools, on average.

The following pages summarize our response to this study and discuss our findings in greater detail.

Claim 1: Very Small Differences Predicted by Letter Grades

The article claimed that when school raw scores for reading, math, and science were averaged, three to six correct responses separated "A" from "F" schools on 50-question tests. We find this statement misleading. The 3-6 question differences they report are after controlling for race, poverty, and prior achievement. Before adjusting for these factors, the differences between A and F schools were much larger. The difference for math, for instance, was approximately 14 points, on average, as shown in Table 1 below. While it may be appropriate to control for such factors, we do not think that the article makes this point entirely clear, and we want to highlight the fact that these differences hold only after controlling for the aforementioned factors considered to be outside of a school's control.

Table 1. Unadjusted Differences in Mean Correct Responses by School Grade

School Grades    Raw Math Score Difference¹
A-B              2.3
A-C              4.5
A-D              9.7
A-F              14.5

More important, however, the authors argue that the 3-6 question differences between A and F schools are small effects. Although these effects do seem small when looking at the raw scores, comparisons based on raw scores are difficult to interpret because they require knowledge about the difficulty of the test and how the scores were spread above and below the mean for each grade and subject. Given these challenges, standardized scores² are much more appropriate.
The use of standardized scores ensures more consistent measurement across grades and subjects, and thus makes it possible to interpret raw scores and better compare the scores of different students on different tests, in different subjects, and in different grades.

¹ To ensure a more accurate comparison, this analysis uses OCCT scores only (OMAAP and OAAP scores are omitted).
² A standardized z-score represents the relative position of an individual score in a distribution, taking into account both the distribution's mean and the variation of scores around it. A negative z-score indicates the score is below the distribution mean; a positive z-score indicates the score is above it. Standardized scores are calculated by subtracting the mean from the individual score and dividing by the standard deviation. In this way, standardization yields scores that are directly comparable within and between different groups of cases.

As shown in Table 2, when looking at standardized scores, we see a difference of .10 standard deviations between A and B schools, .27 between A and C schools, .47 between A and D schools, and .64 between A and F schools, even when controlling for factors outside of the school's control such as race/ethnicity, gender, free lunch status, and prior test achievement. In education, such differences are considered large and meaningful. Several studies (for example, Kane, Rockoff, and Staiger, 2008) find that the value-added of the average 3rd or 4th grade teacher is .06 to .09 standard deviations above that of novice teachers. In these terms, the impact of being in an A versus an F school is as much as ten times the impact of having an experienced versus a new teacher.

Table 2. Mean Achievement Differences by School Grade

School Grades    Difference (Standard Deviations)    Months of Learning
A-B              .10                                 2
A-C              .27                                 6
A-D              .47                                 11
A-F              .64                                 15

Another commonly used way to look at this difference is in months of schooling³,⁴.
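As a rough illustration, the standardization described in footnote 2 and the months-of-learning conversion can be sketched as below. The raw scores are hypothetical, and the 10-month school year is an assumption we make here so that the arithmetic reproduces the months shown in Table 2; the actual factor varies by test, grade, and skill.

```python
# Illustrative sketch only: hypothetical raw scores and an assumed
# 10-month school year, not the actual state data or methodology.

def z_scores(scores):
    """Standardize raw scores: subtract the mean, divide by the std. dev."""
    n = len(scores)
    mean = sum(scores) / n
    sd = (sum((s - mean) ** 2 for s in scores) / n) ** 0.5
    return [(s - mean) / sd for s in scores]

# Hypothetical raw OCCT scores on a 50-question test.
raw = [22, 28, 31, 35, 39]
print([round(z, 2) for z in z_scores(raw)])

# Converting standard-deviation differences to months of learning,
# assuming an average annual gain of .422 SD over a 10-month school year.
SD_PER_MONTH = 0.422 / 10

for pair, sd_diff in [("A-B", 0.10), ("A-C", 0.27), ("A-D", 0.47), ("A-F", 0.64)]:
    print(pair, round(sd_diff / SD_PER_MONTH))  # 2, 6, 11, 15 months
```

Dividing each standardized difference by the assumed per-month gain recovers the months-of-learning figures reported in Table 2.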
Using this conversion, students in "A" schools were approximately 2 months ahead of students in "B" schools, 6 months ahead of students in "C" schools, 11 months ahead of students in "D" schools, and 15 months ahead of students in "F" schools. In other words, the difference between being in an "A" and an "F" school is about one and a half academic years of learning. Clearly, when looking at the differences between A and F schools relative to other factors that affect classroom learning, such as the experience level of the teacher, or when expressed in months of learning, the differences between these schools are not small, insignificant, or explained by chance.

³ According to Hill et al. (2007), the average effect size from nationally normed tests for 3rd-8th graders ranged from .30 to .56 standard deviations, with an average of .422.
⁴ We use this conversion factor for ease of explanation, fully recognizing that the actual factor is likely to vary by test, grade level, and skill. For instance, learning gains in math and reading comprehension are smaller in later grades relative to early grades.

Claim 2: Classification Error

Second, the authors argued that using a single letter grade to summarize overall school performance is neither a clear nor a reliable indicator of school performance. To support this claim, they point out that there were schools with lower overall letter grades that had higher achievement in certain subjects than schools with higher overall letter grades. While it is true that some schools with lower overall scores scored higher on certain subjects than schools with higher overall scores, and vice versa, we do not agree with the authors' conclusion that a single summative grade is meaningless. The purpose of the single summative grade is to provide parents and citizens with a clear, easy-to-understand, and concise indicator of overall performance.
Performance indices and letter grades are also provided for Language Arts, Mathematics, Science, Social Studies/History/Geography, and Writing in order to give parents and citizens additional information on school performance by subject. Statistics and letter grades for various factors related to the school environment, including attendance, dropout rate, and parent and community engagement, also provide parents with information on non-academic factors of interest. Thus, just as students are given a GPA that summarizes their overall academic performance as well as individual letter grades in each subject, the A-F Report Card provides parents with grades for individual subjects and an overall GPA.

Given that grades for each subject are reported in addition to the GPA, the OU/OSU authors' claim that A-F Report Cards collapse school performance into a single, relatively meaningless indicator of performance is misleading. Their use of examples of low-achieving schools outperforming relatively higher-achieving schools to illustrate this point, moreover, is also concerning. While on average A or B schools outperform C or D schools, that does not mean that they outperform them in every subject. This would be like expecting a student with a 3.0 or "B" average to outperform all of his or her peers with lower overall GPAs in every subject. Just as individual students have unique strengths, so do schools.

Claim 3: Achievement Gaps

Third, the authors of this article argue that letter grades hide low test performance by poor and minority children. They claimed that in their analysis, minority and poor children tested highest in "D" and "F" schools and lowest in "A" and "B" schools. Our analysis, however, did not result in the same findings.
Instead, as demonstrated in Figures 1 and 2, we found that while income and racial achievement gaps do exist, all subgroups of students experienced higher test scores as the quality of the school (as measured by the A-F grading system) increased. Such differences between our analysis and the analysis conducted by the staff at OU/OSU may be explained by their small sample size, which included a non-representative sample of only 3% of schools statewide and only three "A" and five "F" schools. The extremely small number of "A" and "F" schools in this sample makes it nearly impossible to draw larger conclusions about the quality of these schools, and thus makes their results particularly prone to the counter-intuitive claims made in their paper.

Figure 1. Mean Math Achievement (Raw OCCT Score) by School Grade and FRL Status (FRL vs. Non-FRL)

Figure 2. Mean Math Achievement (Raw OCCT Score) by School Grade and Minority Status (Minority vs. Non-Minority)