FIRST, DO NO HARM: AN ANALYSIS OF ASSESSMENT-BASED ACCOUNTABILITY

By Michael J. Moody, Ed.D.
For the Board of Education
Saline County School District 0068
Friend Public Schools
April 2014

Introduction

The following white paper has been developed at the request of the Board of Education of the Friend (Nebraska) Public School District. The intent of the paper is to provide a review of relevant literature and research regarding the effect of high-stakes, assessment-based accountability systems on the teaching/learning process. For the purpose of this paper, an assessment-based accountability system exists if "a test is used to hold individuals or institutions responsible for their performance and has stakes attached to it…" (Supovitz, 2009, p. 213). According to Popham (2001), an assessment is considered to be high-stakes if either (or both) of the following conditions are present:

1. There are significant consequences linked to individual students' test performance [such as graduation or grade advancement] [and/or]
2. The students' test scores determine the "instructional success" of the school or district. (p. 33, italics in the original)

Nebraska public schools operate under the auspices of two separate but closely interrelated assessment-based accountability systems. Public Law 107-110, the No Child Left Behind Act of 2001 (No Child Left Behind [NCLB], 2002), provides the basis for educational accountability at the Federal level, while the Nebraska state accountability system (mandated by NCLB) is known as NePAS (the Nebraska Performance Accountability System). Accountability indicators for both the NCLB and the NePAS systems are determined primarily through achievement test scores produced on NeSA (Nebraska State Accountability) assessments. NeSA is a "bank" of high-stakes, standardized, criterion-referenced achievement tests encompassing the academic domains of Reading, Writing, Mathematics, and Science (Nebraska Department of Education, 2012).

The essence of the paper (first, do no harm) is captured in the following constructs. The medical term "iatrogenic" essentially denotes an unintended illness or adverse effect induced in a patient through the (often well-intended) actions of a medical doctor. In a similar vein, Madaus and Russell (2010/2011) write that "[t]he paradox of high-stakes testing might well be called peiragenics, that is, the negative, unanticipated effects on students, teachers, and schools of well-intended testing policies" (p. 28, italics in the original).

High-stakes Standardized Assessments

Standardized achievement testing has been part of the educational landscape for well over 60 years. As originally designed and implemented, standardized achievement testing was intended to be diagnostic in nature. Within the diagnostic paradigm, assessment results held enormous potential to identify instructional strengths and weaknesses, as well as possible gaps in a school's curriculum. In other words, as initially envisioned, the primary function of standardized assessments was to inform instruction. Over time, as the use of standardized assessments became more commonplace, the results of these same assessment instruments have increasingly been utilized as a basis for administrative decision making (such as grouping students for purposes of instruction and/or for grade advancement) as well as for political purposes, i.e.,
accountability measures (Cohen & Swerdlik, 2002; Madaus & Russell, 2010/2011; Popham, 2000; Stiggins, 1997). Within this current paradigm, achievement test results have become more and more an instrument of management and accountability (in other words, an instrument of control), while the diagnostic function has been summarily diminished (Kohn, 2000; Madaus & Russell, 2010/2011; Popham, 2000).

This evolution in function has been broadly based upon an exaggerated (as well as inappropriate) level of confidence in the art and science of standardized achievement testing. This escalating ethos of infallibility, together with a generalized misunderstanding of what standardized achievement tests can and cannot do (DuFour, DuFour, & Eaker, 2008; Kohn, 2000; Popham, 2001; Stiggins, 1997), has led to accountability-based assessments assuming an unwarranted status as the conservator of standards of educational excellence (Bracey, 1995; Kohn, 2000; Popham, 2001; Stiggins, 1997). It is also noteworthy that Stiggins (1997), a widely respected assessment expert in his own right, has written, "[S]tandardized tests typically are not the precision tools or accurate predictors most think they are. They do not produce high-resolution portraits of student achievement" (p. 352). Diane Ravitch (2010), a former Assistant Secretary of Education in the George H. W. Bush administration and a one-time ardent proponent of test-based accountability, adds to the discourse, stating, "The problem with using tests to make important decisions about people's lives is that standardized tests are not precise instruments" (p. 152).

A report by the prestigious National Research Council (2011) highlights the problematic nature of standardized achievement assessments, stating, "Although large-scale tests can provide a relatively objective and efficient way to gauge the most valued aspects of student achievement, they are neither perfect nor comprehensive measures….they cover only a subset of the content domain that is being tested" (p. 38, emphasis added). As a case in point, a Nebraska Department of Education (2009) publication indicates that there are 49 fourth-grade Math standards and 78 high school Math standards covering the following Mathematical content areas: (1) Number Sense, (2) Geometry & Measurement, (3) Algebra, and (4) Data Analysis & Probability. However, according to a Data Recognition Corporation (2013) technical manual, the 2013 NeSA Mathematics examination consisted of 55 test items at the fourth-grade level (with a 3.0 standard error of measurement) and 60 test items at the 11th-grade level (with a 3.1 standard error of measurement). In other words, the fourth-grade assessment provides barely more than one test item per standard (55 items across 49 standards, roughly 1.1 items per standard), while the secondary assessment falls short of even one item per standard (60 items across 78 standards, roughly 0.8 items per standard). This technical information furnished by the Data Recognition Corporation supports the following National Research Council (2011) statement: "[T]ests that are typically used to measure performance in education fall short of providing a complete measure of desired educational outcomes in many ways" (p. 47).

Within the above context, a widely recognized psychometric phenomenon lends theoretical credence to claims that assessment-based accountability systems are generally flawed and, as such, incapable of producing valid and reliable accountability measures (Goodwin, 2014; Madaus & Russell, 2010/2011; Ravitch, 2010).
Donald Campbell, a noted psychologist, test developer, and one-time President of the American Psychological Association, proffered the theory now known as Campbell's Law. According to Campbell (1976), "The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures…[In other words,] the higher the stakes attached to any measure, the less valid that measure becomes" (cited in Goodwin, 2014, p. 78, italics in the original).

Friend Public Schools administers two standardized achievement tests annually. The NWEA/MAPS, a low-stakes, norm-referenced, standardized achievement assessment, is administered to all students in grades 3-11 each fall and to all students in grades 2-11 in the spring. Additionally, Friend students in grades 4, 8, and 11 sit for the NeSA Writing examination each January, while students in grades 3 through 8 and in grade 11 take the NeSA Reading and Mathematics tests in April. Students in grades 5, 8, and 11 take the NeSA Science test, also in April. As stated previously, the NeSA assessments are State-mandated, high-stakes, criterion-referenced standardized achievement tests.

The school gives the NWEA/MAPS assessments because MAPS results provide students and staff with a wealth of useful diagnostic data. Additionally, NWEA/MAPS results are generated immediately after the assessments are completed; the results are therefore both timely and beneficial. NeSA tests, on the other hand, are mandated by the State of Nebraska, and the results from the spring assessments are not received until the following fall. Unfortunately, the logistics of this spring-testing/fall-reporting paradigm render the diagnostic potential of the NeSA tests virtually useless. While this assessment format holds very little pedagogical value, it does serve the political/control function well.

Assessment-based Accountability

NeSA assessment scores are the most salient component of the Nebraska Performance Accountability System (NePAS). NePAS incorporates NeSA test scores along with a number of additional indicators (for example, graduation rate, historical growth and improvement test data, and assessment participation rate) to assign a rank-order configuration to all Nebraska public school districts. The state ranking system ranges from a rank of 1 (the highest) to a rank of 249 (the lowest). Apparently, the theoretical (albeit flawed) framework driving the NePAS system is that the prospect of a low ranking will motivate schools to aspire to higher levels of instruction as well as student performance.

Standardized achievement assessment (for purposes of accountability) initially surfaced in the early 1990s and became a major component of public education following the passage of Public Law 107-110, the No Child Left Behind Act of 2001 (Ravitch, 2010). While the intentions of accountability systems such as NePAS and NCLB (the Federal system) are certainly noble (the stated purpose is to improve education as well as to close the achievement gap), the approach is, quite frankly, ineffective. Within this context, Wiliam (2010) argues, "[T]he systems currently in use have significant shortcomings that call into question some of the interpretations that are routinely based on the scores yielded by these tests" (p. 107).
Additionally, in response to the ubiquitous and seemingly pervasive nature of assessment-based accountability, the American Federation of Teachers (AFT) commissioned a study designed to examine the impact of assessment-based accountability systems:

    The AFT study indicated that the time students spend taking tests ranged from 20 to 50 hours per year in heavily tested grades. In addition, [their results indicate that] students can spend 60 to more than 110 hours per year in test prep in high-stakes testing grades. (Nelson, 2013, p. 3)

The prevailing practice of using standardized achievement test results as an accountability measure has proven to be both time-consuming and ineffective; moreover, a growing body of literature strongly suggests that an assessment-based approach to educational accountability is pedagogically detrimental as well (Au & Gourd, 2013; DuFour et al., 2008; Madaus & Russell, 2010/2011; Popham, 2000; Ravitch, 2010; Stiggins, 1997). Wiliam (2010) proffers that high-stakes standardized assessments have "the potential for a range of unintended outcomes, many of which will have a negative impact" (p. 120). In a similar vein, Ravitch (2010) adds credence to claims of the detrimental effects of high-stakes assessment with the following: "[T]est-based accountability has corrupted education, narrowed the curriculum, and distorted the goals of schooling" (p. 161).

Assessment-based accountability systems such as NCLB and NePAS place significant pressure on schools and teachers to increase test scores (or else!). Within this context, as the educational community rallied around the accountability agenda, specific unintended outcomes have invariably surfaced (Goodwin, 2014; Madaus & Russell, 2010/2011; National Research Council, 2011; Popham, 2000; Ravitch, 2010). For example, according to Au and Gourd (2013), "[W]e know that high-stakes testing is controlling both what and how subjects are taught..." (p. 17). As a direct result of increasing pressure to produce acceptable test scores, curricular offerings are being steadily reduced and restricted; in other words, what is tested becomes what is taught (Au & Gourd, 2013; Kohn, 2000; National Research Council, 2011; Popham, 2000; Ravitch, 2010). A report by the National Research Council (2011) addresses an additional unintended consequence regarding the impact of test-based accountability on curriculum and instruction, stating, "The likely outcome is that performance on the untested material will show less improvement or decrease, but this difference will be invisible because the material is not covered by the test" (p. 39). In response to this disturbing trend, Au and Gourd (2013) also stress that "[U]ntested subjects are being reduced in the curriculum and teachers nationwide are moving toward more teacher-centered, lecture-based pedagogies that encourage rote learning…" (p. 17).

In addition to narrowing the curriculum, research has shown that teachers (especially those in the tested grades and subject areas) are devoting an increasing amount of instructional time to "a steadier diet of test preparation activities that distract from the larger goals of educating students with the more complex skills and habits to compete in the global economy and a more sophisticated democratic society" (Supovitz, 2009, p. 221). Shepard (2000) concurs, stating, "[E]xternally imposed testing programs prevent and drive out thoughtful classroom practices" (p. 9).
No Child Left Behind (2001-2014)

Public Law 107-110 (the No Child Left Behind Act of 2001, or NCLB) is, quite simply, a reauthorization of the Elementary and Secondary Education Act (ESEA) of 1965. ESEA, a major component of President Johnson's "War on Poverty" initiative, established the Title I program, which was designed to provide additional instructional supports to students in "targeted" schools determined to be "below grade-level" in reading and mathematics. The 2001 reauthorization of the ESEA (now known as the No Child Left Behind Act of 2001) was passed by the United States Congress with strong bipartisan support and signed into law on January 8, 2002.

Arguably, the most significant facet of NCLB is that it established a federally mandated school accountability system based almost exclusively upon the results of standardized achievement tests. According to the Center on Education Policy (2003), NCLB contained "two major purposes: [1] to raise student achievement across the board and [2] to eliminate the achievement gap between students from different backgrounds" (p. iii). The law further mandates that schools must demonstrate adequate yearly progress (AYP) in order to satisfy the stated goal that all students score at the "proficient" level in both reading and math by the year 2014 (Center on Education Policy, 2003; DuFour et al., 2008; Ellis, 2007; Lee & Reeves, 2012; Ravitch, 2010).

As the text of this paper would indicate, there exists a considerable amount of professional literature that questions the overall effectiveness of assessment-based accountability systems like NCLB and NePAS. With the NCLB 2014 "due date" for the "100 percent proficiency" mandate looming large, Dee and Jacob (2010), writing for the Brookings Institution, stated, "Given the national scope of the policy, it is difficult to reach definitive conclusions about its impact" (p. 190). Additionally, Ellis (2007) adds a cautionary note, stating that research data "supporting the purported success of No Child Left Behind are ambivalent at best" (p. 222).

That being said, much of the research directed at identifying a relationship between high-stakes assessments (mandated by NCLB) and improved student achievement has focused upon comparisons between test scores generated on state-developed high-stakes assessments and a relatively low-stakes national assessment known as the National Assessment of Educational Progress (NAEP), sometimes referred to as The Nation's Report Card. According to a Nebraska Department of Education (n.d.-a) web page, "The National Assessment of Educational Progress…is designed to measure what students across the nation know and can do in ten subject areas, including mathematics, reading, writing, and science" (¶ 1). It is important to note that NAEP was created in 1969 by the United States Congress to "provide a common national yardstick for accurately evaluating the performance of American students" (NDE, n.d.-a, ¶ 4). While it would seem logical that NAEP scores should closely mirror corresponding state-level, accountability-linked assessment scores, such has not been the case. In fact, as detailed above, academic improvement data related to the high-stakes, assessment-based accountability systems mandated by NCLB have proved to be mixed, confusing, and questionable (DuFour et al., 2008; Lee, 2008; Lee & Reeves, 2012; Fuller, Wright, Gesicki, & Kang, 2007; Nichols, Glass, & Berliner, 2006/2012).
In this respect, Ravitch (2010) has indicated that the intense pressures to improve test scores at the state and local level may well lead to significant test score inflation, and that said inflation may be strongly correlated with "coaching or cheating or manipulating the pool of test takers" (p. 161). She states further:

    The starkest display of score inflation is the contrast between the state-reported test scores, which have been steadily (and sometimes sharply) rising since the passage of NCLB, and the scores registered by states on NAEP, the federal assessment program. (p. 161)

As a case in point, the Nebraska State of the Schools Report indicates that for the 2012-2013 NeSA assessment cycle, 79% of Nebraska's fourth-grade students scored at the proficient level (NDE, n.d.-b), while the 2013 State Snapshot Reading Report (Nation's Report Card, 2013) indicates that only 37% of Nebraska's fourth-grade students scored at the NAEP proficient level. Similar conflicting results are reported for fourth- and eighth-grade Mathematics, as well as eighth-grade Reading (Nation's Report Card, 2013). With the Nebraska results serving as a specific example, Fuller et al. (2007) proclaim, "[S]tate results continue to exaggerate the percentage of fourth graders deemed proficient or above in reading and math when compared with NAEP results" (p. 275).

Within a backdrop of conflicting assessment data, Arizona State University researchers Nichols et al. (2006) stated, "To date there is no consistent evidence that high-stakes testing works to increase achievement" (p. 6). Within this context, a review of relevant research and literature indicates that student achievement has not generally experienced appreciable gains since NCLB was signed into law (Dee & Jacob, 2010; DuFour et al., 2008; Ellis, 2007; Lee, 2008; Lee & Reeves, 2012; Nichols et al., 2006/2012; Ravitch, 2010). In fact, the research literature indicates that student gains in reading have remained (at best) flat, and may well have declined, since the implementation of NCLB. In terms of student achievement gains in Mathematics, the data are more favorable (especially for fourth-grade Hispanic students); however, it appears that academic improvement in Mathematics was more pronounced prior to NCLB, and improvement in Math seems to have slowed or even stalled, especially at the eighth-grade level (Dee & Jacob, 2010; DuFour et al., 2008; Ellis, 2007; Lee, 2008; Lee & Reeves, 2012; Nichols et al., 2006/2012; Ravitch, 2010).

In regard to the apparent impact of NCLB on gains in student achievement, Nichols et al. (2012) proffer the following summary statement: "a pattern seems to have emerged that suggests that high-stakes testing has little or no relationship to reading achievement, and a weak to moderate relationship to math, especially in fourth grade but only for certain student groups" (p. 3). With respect to the specific NCLB goal of closing the "achievement gap," Fuller et al. (2007) offer the following observation:

    When it comes to narrowing achievement gaps, the historical patterns are similar. For reading, ethnic gaps on the NAEP closed steadily from the early 1970s through 1992, then widened in 1994, and then narrowed through 2002. But no further narrowing has occurred since 2002. For math, the Black-White gap narrowed by over half a grade level between 1992 and 2003, but no further progress was observed in 2005.
    The Latino-White gap has continued to close with a bit of progress post-NCLB—the one bright spot on the equity front. (p. 275)

Given the apparent failure of assessment-based accountability systems (such as NePAS and NCLB) to positively impact student achievement, Nichols et al. (2006) provide the following poignant summary statement:

    In light of the rapidly growing body of evidence of the deleterious unintended effects of high-stakes testing, and the fact that our study finds no dependable or compelling evidence that the pressure associated with high-stakes testing leads to increased achievement, there is no reason to continue the practice of high-stakes testing. (p. 52)

References

Au, W., & Gourd, K. (2013). Asinine assessment: Why high-stakes testing is bad for everyone, including English teachers. English Journal, 103(1), 14-19.

Bracey, G. W. (1995). Final exam: A study of the perpetual scrutiny of American education. Bloomington, IN: TECHNOS Press.

Center on Education Policy. (2003). From the capital to the classroom: State and Federal efforts to implement the No Child Left Behind Act. Available online at http://www.cep-dc.org/displayDocument.cfm?DocumentID=298

Cohen, R. J., & Swerdlik, M. E. (2002). Psychological testing and assessment: An introduction to tests and measurement (5th ed.). Boston: McGraw-Hill.

Data Recognition Corporation. (2013, August). Spring 2013 Nebraska State Accountability (NeSA) Reading, Mathematics, and Science: Technical report. Available online at http://www.education.ne.gov/assessment/pdfs/Final_2013_NeSA_Technical_Report.pdf

Dee, T. S., & Jacob, B. A. (2010, Fall). The impact of No Child Left Behind on students, teachers, and schools. Brookings Papers on Economic Activity, 149-207. Available online at http://faculty.smu.edu/Millimet/classes/eco4361/readings/dee%20et%20al%202010.pdf

DuFour, R., DuFour, R., & Eaker, R. (2008). Revisiting professional learning communities at work: New insights for improving schools. Bloomington, IN: Solution Tree.

Ellis, C. R. (2007). No Child Left Behind—A critical analysis: "A nation at greater risk." Curriculum and Teaching Dialogue, 9(1&2), 221-233.

Fuller, B., Wright, J., Gesicki, K., & Kang, E. (2007). Gauging growth: How to judge No Child Left Behind. Educational Researcher, 36(5), 268-278.

Goodwin, B. (2014). Better tests don't guarantee better instruction. Educational Leadership, 71(6), 78-80.

Kohn, A. (2000). The case against standardized testing: Raising the scores, ruining the schools. Portsmouth, NH: Heinemann.

Lee, J. (2008). Is test-driven external accountability effective? Synthesizing the evidence from cross-state causal-comparative and correlational studies. Review of Educational Research, 78(3), 608-644.

Lee, J., & Reeves, T. (2012). Revisiting the impact of NCLB high-stakes school accountability, capacity, and resources: State NAEP 1990-2009 reading and math achievement gaps and trends. Educational Evaluation and Policy Analysis, 34(2), 209-231.

Madaus, G., & Russell, M. (2010/2011). Paradoxes of high-stakes testing. Journal of Education, 190(1/2), 21-30.

National Research Council. (2011). Incentives and test-based accountability in public education. M. Hout & S. W. Elliott (Eds.). Washington, DC: The National Academies Press.

Nebraska Department of Education. (n.d.-a). National Assessment of Educational Progress (NAEP). Available online at http://www.education.ne.gov/naep/

Nebraska Department of Education. (n.d.-b).
2012-2013 State of the schools report: A report on Nebraska public schools. Available online at http://reportcard.education.ne.gov/

Nebraska Department of Education. (2009). Nebraska mathematics standards. Available online at http://www.education.ne.gov/math/PDFs/Math_StandardsAdopted10-809Horizontal.pdf

Nebraska Department of Education. (2012, August 9). Nebraska Performance Accountability System [NePAS]. Available online at http://www.education.ne.gov/Assessment/pdfs/Nebraska_Performance_Accountability_System_Aug_2012.pdf

Nelson, H. (2013). Testing more, teaching less: What America's obsession with student testing costs in money and lost instructional time. American Federation of Teachers. Available online at www.aft.org/pdfs/teachers/testingmore2013.pdf

Nichols, S. L., Glass, G. V., & Berliner, D. C. (2006). High-stakes testing and student achievement: Does accountability pressure increase student learning? Education Policy Analysis Archives, 14(1), 1-95.

Nichols, S. L., Glass, G. V., & Berliner, D. C. (2012). High-stakes testing and student achievement: Updated analyses with NAEP data. Education Policy Analysis Archives, 20(20), 1-30.

No Child Left Behind (NCLB) Act of 2001, Pub. L. No. 107-110, 115 Stat. 1425 (2002).

Popham, W. J. (2001). The truth about testing: An educator's call to action. Alexandria, VA: ASCD.

Ravitch, D. (2010). The death and life of the great American school system: How testing and choice are undermining education. New York: Basic Books.

Shepard, L. A. (2000, October). The role of assessment in a learning culture. Educational Researcher, 29(7), 4-14.

Stiggins, R. J. (1997). Student-centered classroom assessment (2nd ed.). Upper Saddle River, NJ: Prentice-Hall.

Supovitz, J. (2009). Can high stakes testing leverage educational improvement? Prospects from the last decade of testing and accountability reform. Journal of Educational Change, 10, 211-227.

The Nation's Report Card. (2013). Reading: 2013 state snapshot report. Available online at http://www.education.ne.gov/naep/PDFs/2014_NAEP_grade_4_reading.pdf

Wiliam, D. (2010). Standardized testing and school accountability. Educational Psychologist, 45(2), 107-122.

Questions or comments regarding the content of this publication may be directed to the following:

Dr. Michael Moody
Superintendent of Schools
Friend Public Schools
501 S. Main St., PO Box 67
Friend, NE 68359
Email: m.moody@friendschool.org