FIRST, DO NO HARM: AN ANALYSIS OF ASSESSMENT-BASED
ACCOUNTABILITY
By
Michael J. Moody, Ed.D.
For the Board of Education
Saline County School District 0068
Friend Public Schools
April 2014
Introduction
The following white paper has been developed at the request of the Board of Education of
the Friend (Nebraska) Public School District. The intent of the paper is to provide a review
of relevant literature and research regarding the effect of high-stakes, assessment-based
accountability systems on the teaching/learning process. For the purpose of this paper, an
assessment-based accountability system exists if “a test is used to hold individuals or
institutions responsible for their performance and has stakes attached to it…” (Supovitz,
2009, p. 213). According to Popham (2001), an assessment is considered to be high-stakes if
either (or both) of the following conditions are present:
1. There are significant consequences linked to individual students’ test
performance [such as graduation or grade advancement]
[and/or]
2. The students’ test scores determine the “instructional success” of the
school or district. (p. 33, italics in the original)
Nebraska public schools operate under the auspices of two separate, but
uniquely interrelated assessment-based accountability systems. Public Law 107-110:
the No Child Left Behind Act of 2001 (No Child Left Behind [NCLB], 2002)
provides the basis for educational accountability at the Federal level, while the
Nebraska state accountability system (mandated by NCLB) is known as NePAS (the
Nebraska Performance Accountability System). Accountability indicators for both
the NCLB and the NePAS systems are determined primarily through achievement
test scores produced on NeSA (Nebraska State Accountability) assessments. NeSA
is a “bank” of high-stakes, standardized, criterion-referenced achievement tests
encompassing the academic domains of Reading, Writing, Mathematics, and Science
(Nebraska Department of Education, 2012).
The essence of the paper (first, do no harm) is captured in the following constructs. The
medical term iatrogenic describes an unintended illness or adverse effect induced in a
patient through the (often well-intended) actions of a physician. In a similar vein, Madaus
and Russell (2010/2011) write that “[t]he paradox of high-stakes testing might well be called
peiragenics, that is, the negative, unanticipated effects on students, teachers, and schools of
well-intended testing policies” (p. 28, italics in the original).
High-stakes Standardized Assessments
Standardized achievement testing has been part of the educational landscape for well over
60 years. As originally designed and implemented, standardized achievement testing was
intended to be diagnostic in nature. Within the diagnostic paradigm, assessment results held
enormous potential to identify instructional strengths and weaknesses, as well as possible
gaps in a school’s curriculum. In other words (as initially envisioned), the primary function
of standardized assessments was to inform instruction. Over time, as the use of
standardized assessments became more commonplace, the results of these same assessment
instruments have increasingly been utilized as a basis for administrative decision making
(such as grouping students for instruction and/or determining grade advancement) as
well as for political purposes, i.e., accountability measures (Cohen & Swerdlik, 2002;
Madaus & Russell, 2010/2011; Popham, 2000; Stiggins, 1997). Within this current
paradigm, achievement test results have become more and more an instrument of
management and accountability (in other words an instrument of control), while the
diagnostic function has been summarily diminished (Kohn, 2000; Madaus & Russell,
2010/2011; Popham, 2000).
This evolution in function has been broadly based upon an exaggerated (and inappropriate)
level of confidence in the art and science of standardized achievement testing.
This escalating ethos of infallibility, as well as a generalized misunderstanding of what
standardized achievement tests can and cannot do (DuFour, DuFour, & Eaker, 2008; Kohn,
2000; Popham, 2001; Stiggins, 1997), has led to accountability-based assessments
assuming an unwarranted status as the conservator of standards of educational excellence
(Bracey, 1995; Kohn, 2000; Popham, 2001; Stiggins, 1997). It is also noteworthy that
Stiggins (1997), a widely respected assessment expert in his own right, has written,
“[S]tandardized tests typically are not the precision tools or accurate predictors most think
they are. They do not produce high-resolution portraits of student achievement” (p. 352).
Diane Ravitch (2010), a former Assistant Secretary of Education in the George H. W. Bush
administration and a one-time ardent proponent of test-based accountability, adds to
the discourse, stating “The problem with using tests to make important decisions about
people’s lives is that standardized tests are not precise instruments” (p. 152). A report by
the prestigious National Research Council (2011), highlights the problematic nature of
standardized achievement assessments stating, “Although large-scale tests can provide a
relatively objective and efficient way to gauge the most valued aspects of student
achievement, they are neither perfect nor comprehensive measures….they cover only a
subset of the content domain that is being tested” (p. 38, emphasis added). As a case in
point, a Nebraska Department of Education (2009) publication indicates that there are 49
fourth-grade Math standards and 78 high school Math standards covering the following
Mathematical content areas: (1) Number Sense, (2) Geometry & Measurement, (3)
Algebra, and (4) Data Analysis & Probability. However, according to a Data Recognition
Corporation (2013) technical manual, the 2013 NeSA Mathematics examination consisted of
55 test items at the fourth grade level (with a 3.0 standard error of measurement) and 60 test
items at the 11th grade level (with a 3.1 standard error of measurement). Obviously, the
fourth-grade assessment instrument barely provides one test item per standard, while the
secondary assessment provides fewer test items than there are standards. This technical information
furnished by the Data Recognition Corporation serves to support the following National
Research Council (2011) statement, “[T]ests that are typically used to measure performance
in education fall short of providing a complete measure of desired educational outcomes in
many ways” (p. 47).
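To make these coverage figures concrete, the items-per-standard ratios implied by the numbers cited above (a back-of-the-envelope illustration, not a figure reported by either source) work out as follows:

\[ \text{Grade 4: } \frac{55 \text{ items}}{49 \text{ standards}} \approx 1.12 \text{ items per standard}, \qquad \text{Grade 11: } \frac{60 \text{ items}}{78 \text{ standards}} \approx 0.77 \text{ items per standard} \]

Likewise, under the conventional interpretation of a standard error of measurement, an individual student’s observed score is expected to fall within one SEM of his or her “true” score only about two-thirds of the time; the reported SEM of 3.0 thus implies an uncertainty band of roughly ±3 points around any single result.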
Within the above context, a widely recognized psychometric phenomenon lends
theoretical credence to claims that assessment-based accountability systems are generally
flawed, and as such, incapable of producing valid and reliable accountability measures
(Goodwin, 2014; Madaus & Russell 2010/2011; Ravitch, 2010). Donald Campbell, a noted
psychologist and test developer, and one-time President of the American Psychological
Association, proffered the following theory known as Campbell’s Law. According to
Campbell (1976), “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures…[In other words,] the higher the
stakes attached to any measure, the less valid that measure becomes” (cited in Goodwin,
2014, p. 78, italics in the original).
Friend Public Schools administers two standardized achievement testing programs annually. The
NWEA/MAPS, a low-stakes, norm-referenced, standardized achievement assessment, is
administered to all students in grades 3-11 each fall and to all students in grades 2-11 in the
spring. Additionally, Friend students in grades four, eight, and 11 sit for the NeSA Writing
examination in January each year, while students in grades three through eight, and in grade
11, take the NeSA Reading and Mathematics tests in April. Students in grades five, eight,
and 11 take the State Science exam in April as well. As stated previously, the NeSA
assessments are State-mandated, high-stakes, criterion-referenced standardized achievement
tests.
The school gives the NWEA/MAPS assessments because MAPS assessment results
provide the students and staff with a wealth of useful diagnostic data. Additionally,
NWEA/MAPS results are generated immediately after the assessments are completed;
therefore, the results are both timely and beneficial. NeSA tests, on the other hand, are
mandated by the State of Nebraska, and the results from the spring assessments are not
received until the next fall. Unfortunately, the logistics of the spring-testing/fall-reporting
paradigm render the diagnostic potential of the NeSA tests virtually useless.
While this assessment format holds very little pedagogical value, the NeSA format does
serve the political/control function well.
Assessment-based Accountability
NeSA assessment scores are the most salient component of the Nebraska Performance
Accountability System (NePAS). NePAS incorporates NeSA test scores along with a
number of additional indicators (for example, graduation rate, historical growth and
improvement test data, as well as the assessment participation rate) to develop a system that
assigns a rank-order configuration to all Nebraska public school districts. The state ranking
system ranges from a rank of 1 (the highest) to 249 (the lowest). Apparently,
the theoretical (albeit flawed) framework driving the NePAS system is that the potential of
a low ranking will motivate schools to aspire to higher levels of instruction as well as
student performance.
Standardized achievement assessment (for purposes of accountability) initially
surfaced in the early 1990s, and became a major component of public education following
the passage of Public Law 107-110: the No Child Left Behind Act of 2001 (Ravitch, 2010).
While the intentions of accountability systems such as NePAS and NCLB (the Federal system)
are certainly noble (the stated purposes being to improve education and to close the
achievement gap), the approach is, quite frankly, ineffective. Within this context, Wiliam
(2010) argues, “[T]he systems currently in use have significant shortcomings that call into
question some of the interpretations that are routinely based on the scores yielded by these
tests” (p. 107). Additionally, in response to the seemingly ubiquitous nature
of assessment-based accountability, the American Federation of Teachers (AFT)
commissioned a study designed to examine the impact of assessment-based
accountability systems. The AFT study indicated that
the time students spend taking tests ranged from 20 to 50 hours per year in
heavily tested grades. In addition, [their results indicate that] students can
spend 60 to more than 110 hours per year in test prep in high-stakes testing
grades. (Nelson, 2013, p. 3)
Not only has the prevailing practice of using standardized achievement test results as
an accountability measure proven to be both time-consuming and ineffective; there is
a growing body of literature that strongly suggests that an assessment-based approach to
educational accountability is pedagogically detrimental as well (Au & Gourd, 2013;
DuFour et al., 2008; Madaus & Russell, 2010/2011; Popham, 2000; Ravitch, 2010; Stiggins,
1997). Wiliam (2010) proffers that high-stakes standardized assessments have “the
potential for a range of unintended outcomes, many of which will have a negative impact”
(p. 120). In a similar vein, Ravitch (2010) adds credence to the detrimental effect of
high-stakes assessment with the following: “[T]est-based accountability has corrupted education,
narrowed the curriculum, and distorted the goals of schooling” (p. 161).
Assessment-based accountability systems such as NCLB and NePAS place
significant pressure on schools and teachers to increase test scores (or else!). Within this
context, as the educational community has rallied around the accountability agenda, specific
unintended outcomes have invariably surfaced (Goodwin, 2014; Madaus & Russell,
2010/2011; National Research Council, 2011; Popham, 2000; Ravitch, 2010). For example,
according to Au & Gourd (2013), “[W]e know that high-stakes testing is controlling both
what and how subjects are taught...” (p. 17). As a direct result of increasing pressure to
produce acceptable test scores, curricular offerings are being steadily reduced and
restricted—in other words, what is tested becomes what is taught (Au & Gourd, 2013;
Kohn, 2000; National Research Council, 2011; Popham, 2000; Ravitch, 2010). A report by
the National Research Council (2011) addresses an additional unintended consequence
regarding the impact of test-based accountability on curriculum and instruction stating, “The
likely outcome is that performance on the untested material will show less improvement or
decrease, but this difference will be invisible because the material is not covered by the test”
(p. 39). In response to this disturbing trend, Au & Gourd (2013) also stress that “[U]ntested
subjects are being reduced in the curriculum and teachers nationwide are moving toward
more teacher-centered, lecture-based pedagogies that encourage rote learning…” (p. 17). In
addition to narrowing the curriculum, research has shown that teachers (especially those in the
tested grades and subject areas) are devoting an increasing amount of instructional time to “a
steadier diet of test preparation activities that distract from the larger goals of educating
students with the more complex skills and habits to compete in the global economy and a
more sophisticated democratic society” (Supovitz, 2009, p. 221). Shepard (2000) concurs
stating, “[E]xternally imposed testing programs prevent and drive out thoughtful classroom
practices” (p. 9).
No Child Left Behind (2001-2014)
Public Law 107-110 (the No Child Left Behind Act of 2001—NCLB) is, quite simply, a
reauthorization of the Elementary and Secondary Education Act (ESEA) of 1965. ESEA, a
major component of President Johnson’s “War on Poverty” initiative, established the Title I
program that was designed to provide additional instructional supports to students in
“targeted” schools that were determined to be “below grade-level” in reading and
mathematics. The 2001 reauthorization of the ESEA (now known as the No Child Left
Behind Act of 2001) was passed by the United States Congress with strong bi-partisan
support, and signed into law on January 8, 2002. Arguably, the most significant facet of
NCLB is that it established a federally mandated school accountability system based almost
exclusively upon the results of standardized achievement tests. According to the
Center on Education Policy (2003), NCLB contained “two major purposes: [1] to raise
student achievement across the board and [2] to eliminate the achievement gap between
students from different backgrounds” (p. iii). The law further mandates that schools must
demonstrate adequate yearly progress (AYP) in order to satisfy the stated goal that all
students score at the “proficient” level in both reading and math by the year 2014 (Center
on Education Policy, 2003; DuFour et al., 2008; Ellis, 2007; Lee & Reeves, 2012; Ravitch,
2010).
As the text of this paper would indicate, there exists a considerable amount of
professional literature that questions the overall effectiveness of assessment-based
accountability systems like NCLB and NePAS. With the NCLB 2014 “due date” for
the “100 percent proficiency” mandate looming large, Dee and Jacob (2010), writing
for the Brookings Institution, stated, “Given the national scope of the policy, it is
difficult to reach definitive conclusions about its impact” (p. 190). Additionally,
Ellis (2007) adds a cautionary note stating that research data “supporting the
purported success of No Child Left Behind are ambivalent at best” (p. 222). That
being said, much of the research directed at identifying a relationship between high-stakes
assessments (mandated by NCLB) and improved student achievement has focused upon
comparisons between test scores generated on state-developed high-stakes assessments and
a relatively low-stakes national assessment known as the National Assessment of
Educational Progress (NAEP), sometimes referred to as The Nation’s Report Card.
According to a Nebraska Department of Education (n.d.-a) web page, “The National Assessment of Educational Progress…is designed to
measure what students across the nation know and can do in ten subject areas, including
mathematics, reading, writing, and science” (¶ 1). It is important to note that NAEP was
created in 1969 by the United States Congress to “provide a common national yardstick
for accurately evaluating the performance of American students” (NDE, n.d.-a, ¶ 4).
While it would seem logical that NAEP scores should closely mirror corresponding
state-level, accountability-linked assessment scores, such has not been the case. In fact, as
detailed above, academic improvement data related to high-stakes, assessment-based
accountability systems mandated by NCLB have proved to be mixed, confusing, and
questionable (DuFour et al., 2008; Lee, 2008; Lee & Reeves, 2012; Fuller, Wright, Gesicki,
& Kang, 2007; Nichols, Glass, & Berliner, 2006/2012). In this respect, Ravitch (2010) has
indicated that the intense pressures to improve test scores at the state and local level may
well lead to significant test score inflation, and that said inflation may be strongly
correlated to “coaching or cheating or manipulating the pool of test takers” (p. 161). She
states further,
The starkest display of score inflation is the contrast between the state-reported test scores, which have been steadily (and sometimes sharply) rising
since the passage of NCLB, and the scores registered by states on NAEP, the
federal assessment program. (p. 161)
As a case in point, the Nebraska State of the Schools Report indicates that for the
2012-2013 NeSA assessment cycle, 79% of Nebraska’s fourth-grade students scored at the
proficient level (NDE, n.d.-b), while the 2013 State Snapshot Reading Report (Nation’s
Report Card, 2013) indicates that only 37% of Nebraska’s fourth-grade students scored at
the NAEP proficient level. Similar conflicting results are reported for fourth- and eighth-grade
Mathematics, and for eighth-grade Reading as well (Nation’s Report Card, 2013). The Nebraska
results exemplify the national pattern described by Fuller et al. (2007): “[S]tate results continue
to exaggerate the percentage of fourth graders deemed proficient or above in reading and
math when compared with NAEP results” (p. 275).
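Stated as simple arithmetic (using the two percentages cited above), the discrepancy amounts to

\[ 79\% - 37\% = 42 \text{ percentage points}, \]

meaning the state-reported proficiency rate is more than double the rate registered on NAEP.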
Against a backdrop of conflicting assessment data, Arizona State University
researchers Nichols et al. (2006) stated, “To date there is no consistent evidence that high-stakes
testing works to increase achievement” (p. 6). Within this context, a review of
relevant research and literature indicates that student achievement has not generally
experienced appreciable gains since NCLB was signed into law (Dee & Jacob, 2010;
DuFour et al., 2008; Ellis, 2007; Lee, 2008; Lee & Reeves, 2012; Nichols et al.,
2006/2012; Ravitch, 2010). In fact, the research literature indicates that student gains in
reading have remained (at best) flat, and may well have declined since the implementation
of NCLB. In terms of student achievement gains in the area of Mathematics, the data are
more favorable (especially for fourth grade Hispanic students); however, it appears that
academic improvement in Mathematics was more pronounced prior to NCLB, and
improvement in the area of Math seems to have slowed down or even stalled (especially at
the eighth grade level) (Dee & Jacob, 2010; DuFour et al., 2008; Ellis, 2007; Lee, 2008;
Lee & Reeves, 2012; Nichols et al., 2006/2012; Ravitch, 2010). In regard to the apparent
impact of NCLB on gains in student achievement, Nichols et al. (2012) proffer the
following summary statement: “a pattern seems to have emerged that suggests that high-stakes
testing has little or no relationship to reading achievement, and a weak to moderate
relationship to math, especially in fourth grade but only for certain student groups” (p. 3).
With respect to the specific NCLB goal of closing the “achievement gap,” Fuller et al.
(2007) offer the following observation:
When it comes to narrowing achievement gaps, the historical patterns are
similar. For reading, ethnic gaps on the NAEP closed steadily from the early
1970s through 1992, then widened in 1994, and then narrowed through 2002.
But no further narrowing has occurred since 2002. For math, the Black-White
gap narrowed by over half a grade level between 1992 and 2003, but no
further progress was observed in 2005. The Latino-White gap has continued
to close with a bit of progress post-NCLB—the one bright spot on the equity
front. (p. 275)
Given the apparent failure of assessment-based accountability systems (such
as NePAS and NCLB) to positively impact student achievement, Nichols et al. (2006)
provide the following poignant summary statement:
In light of the rapidly growing body of evidence of the deleterious unintended
effects of high-stakes testing, and the fact that our study finds no dependable
or compelling evidence that the pressure associated with high-stakes testing
leads to increased achievement, there is no reason to continue the practice of
high-stakes testing. (p. 52)
References
Au, W. & Gourd, K. (2013). Asinine assessment: Why high-stakes testing is bad for
everyone, including English teachers. English Journal. 103(1), pp. 14-19.
Bracey, G. W. (1995). Final exam: A study of the perpetual scrutiny of American education.
Bloomington, IN: TECHNOS Press.
Center on Education Policy (2003). From the capital to the classroom: State and Federal
efforts to implement the No Child Left Behind Act. Available on-line at
http://www.cep-dc.org/displayDocument.cfm?DocumentID=298
Cohen, R. J. & Swerdlik, M. E. (2002). Psychological testing and assessment: An
introduction to tests and measurement (5th Ed.). Boston: McGraw-Hill.
Data Recognition Corporation (August 2013). Spring 2013 Nebraska State Accountability
(NeSA) Reading, Mathematics, and Science: Technical Report. Available on-line at
http://www.education.ne.gov/assessment/pdfs/Final_2013_NeSA_Technical_Report.
pdf.
Dee, T. S. & Jacob, B. A. (Fall 2010). The impact of No Child Left Behind on students,
teachers, and schools. Brookings Papers on Economic Activity. pp. 149-207. Available
on-line at http://faculty.smu.edu/Millimet/classes/eco4361/readings/dee%20et%20al%202010.pdf
DuFour, R., DuFour, R. & Eaker, R. (2008). Revisiting professional learning communities
at work: New insights for improving schools. Bloomington, IN: Solution Tree.
Ellis, C. R. (2007). No child left behind—A critical analysis: “A nation at greater risk.”
Curriculum and Teaching Dialogue. 9(1&2), pp. 221-233.
Fuller, B., Wright, J., Gesicki, K., & Kang, E. (2007). Gauging growth: How to judge No
Child Left Behind. Educational Researcher. 36(5), pp. 268-278.
Goodwin, B. (2014). Better tests don’t guarantee better instruction. Educational Leadership.
71(6), pp. 78-80.
Kohn, A. (2000). The case against standardized testing: Raising the scores, ruining the
schools. Portsmouth, NH: Heinemann.
Lee, J. (2008). Is test-driven external accountability effective? Synthesizing the evidence
from cross-state causal-comparative and correlational studies. Review of Educational
Research. 78(3), pp. 608-644.
Lee, J. & Reeves, T. (2012). Revisiting the impact of NCLB high-stakes school
accountability, capacity, and resources: State NAEP 1990-2009 reading and math
achievement gaps and trends. Educational Evaluation and Policy Analysis. 34(2), pp.
209-231.
Madaus, G. & Russell, M. (2010/2011). Paradoxes of high-stakes testing. Journal of
Education 190(1/2), pp. 21-30.
National Research Council (2011). Incentives and test-based accountability in public
education. M. Hout & S. W. Elliott (Eds.). Washington, DC: The National Academies
Press.
Nebraska Department of Education. (n.d.-a.). National Assessment of Educational Progress
(NAEP). Available on-line at http://www.education.ne.gov/naep/
Nebraska Department of Education (n.d.-b.) 2012-2013 State of the schools report: A report
on Nebraska public schools. Available on-line at http://reportcard.education.ne.gov/
Nebraska Department of Education (2009). Nebraska Mathematics Standards. Available
on-line at http://www.education.ne.gov/math/PDFs/Math_StandardsAdopted10-809Horizontal.pdf
Nebraska Department of Education. (August 9, 2012). Nebraska Performance
Accountability System [NePAS]. Available on-line at http://www.education.ne.gov/
Assessment/pdfs/Nebraska_Performance_Accountability_System_Aug_2012.pdf
Nelson, H. (2013). Testing more, teaching less: What America’s obsession with student
testing costs in money and lost instructional time. American Federation of Teachers.
Available on-line at www.aft.org/pdfs/teachers/testingmore2013.pdf
Nichols, S. L., Glass, G. V. & Berliner, D. C. (2006). High-stakes testing and student
achievement: Does accountability pressure increase student learning? Education
Policy Analysis Archives. 14(1), pp. 1-95.
Nichols, S. L., Glass, G. V. & Berliner, D. C. (2012). High-stakes testing and student
achievement: Updated analyses with NAEP data. Education Policy Analysis
Archives. 20(20), pp. 1-30.
No Child Left Behind (NCLB) Act of 2001, Pub. L. No. 107-110, 115 Stat. 1425 (2002).
Popham, W.J. (2001). The truth about testing: An educator's call to action. Alexandria, VA:
ASCD.
Ravitch, D. (2010). The death and life of the great American school system: How testing
and choice are undermining education. New York: Basic Books.
Shepard, L. A. (October, 2000). The role of assessment in a learning culture. Educational
Researcher. 29(7), pp. 4-14.
Stiggins, R. J. (1997). Student-centered classroom assessment (2nd Ed.). Upper Saddle
River, NJ: Prentice-Hall, Inc.
Supovitz, J. (2009). Can high stakes testing leverage educational improvement? Prospects
from the last decade of testing and accountability reform. Journal of Educational
Change. 10, pp. 211-227.
The Nation’s Report Card. (2013). Reading: 2013 State Snapshot Report. Available on-line
at http://www.education.ne.gov/naep/PDFs/2014_NAEP_grade_4_reading.pdf
Wiliam, D. (2010). Standardized testing and school accountability. Educational
Psychologist. 45(2), pp. 107-122.
Questions or comments regarding the content of this publication may be directed to the
following:
Dr. Michael Moody
Superintendent of Schools
Friend Public Schools
501 S. Main St., PO Box 67
Friend, NE 68359
Email: m.moody@friendschool.org