A Quandary for School Leaders: Equity, High-stakes Testing and Accountability Submitted for AERA Handbook of Research on Educational Leadership for Diversity and Equity Julian Vasquez Heilig, Ph.D. Department of Educational Administration The University of Texas at Austin George I. Sanchez Building (SZB) 374D 1 University Station D5400 Austin, TX 78712-0374 Tel: (512) 897-6768 Fax: (512) 471-5975 jvh@austin.utexas.edu Sharon L. Nichols, Ph.D. Department of Educational Psychology University of Texas at San Antonio 501 W. Durango Blvd., Rm. 4.342 San Antonio, TX 78207-4415 Tel: (210) 458-2035 Fax: (210) 458-2019 Sharon.Nichols@utsa.edu 2 The goal of this chapter is to consider the interrelation among equity, high-stakes testing and accountability as they relate to evolving roles of today’s school leaders. As educational policy has developed over the past eighty years, a rapidly growing fear and uncertainty has emerged around the “core technology” of education (Young and Brewer, 2008). As a result, many schools leaders feel as if their work has changed dramatically, from a focus on curriculum and instruction to one on assessment and intervention (McNeil, 2005). The intense focus on test results and how those results are used and shared with the public has left many school leaders feeling disillusioned, anxious and uncertain about the decisions they make (Vasquez Heilig & Darling-Hammond, 2008). Consequently, school leaders face a quandary over how best to manage their schools when policy-driven accountability mandates conflict with curriculum-based, studentcentered instructional practice—an issue particularly salient among leaders serving in historically low performing schools and where leaders are often rewarded for discarding the neediest underperforming students (Vasquez Heilig, Williams, & Young, 2011). As we discuss in this chapter, for many school leaders in low-performing schools, the exclusion of at-risk students from school appears to be a rational response to the quandary fomented by the current educational policy environment. We begin with a short history to locate the roots of high-stakes testing and accountability in the debates between Deweyan pedagogical principles and administrative reformers and briefly trace the emergence of testing out of the progressive era into the educational discourse. We then describe the role that the Lone Star State has played in the push for national high-stakes testing and accountability followed by a précis of Texas data considering long-term African American and Latina/o student achievement trends 3 and graduation/dropout rates in the midst of accountability. Next, we examine the literature to understand how educators in low-performing schools across the nation have responded to the press of standards and accountability. We conclude with recommendations for future research and some suggestions for school leaders based on proposed alternative views of high-stakes testing and accountability. Foundations of the High-Stakes Testing and Accountability Movement At the turn of 20th century, the United States experienced dramatic population and demographic growth. Driven by immigration, from 1890 to 1930, the population nearly doubled, from 62 million to 122 million. Concomitant with population growth, compulsory schooling laws doubled the cost of schooling in many localities as attendance quadrupled (Tyack, 1976). 
School enrollment ballooned from 15.5 to 50 million, and the number of schoolteachers increased from 400,000 to 5.6 million (Nichols & Good, 2004). In spite of (or because of) these shifts, in conjunction with a highly racialized society, the struggle for educational equity and access were readily apparent (Blanton, 2004). In the early 1900s, questions about quality and access were particularly relevant as African American and Latina/o students attended woefully under resourced schools at disproportionately higher rates (Blanton, 2004). The progressive movement came into prominence in this era of rapidly changing demographics and evolving social context. An awakening of social conscience among the muckrakers, prohibitionists and corporate reformers spurred the movement dubbed the progressive era. Progressive reformers, so named by historians, fomented the public policy discussions of the day. On one side were the administrative progressives who argued that the primary goal of schooling was a uniform structure in the mold of 4 Taylorism that efficiently prepared individuals for a place in the workforce (Tyack, 1974). In today’s language, the tenants of administrative progressivism could be considered neoliberal. On the other side were the pedagogical progressives who proffered that schools should recognize and adapt to the individual capacity and interests of students rather than rote standardization (Tyack, 1974)—a.position that aligns more closely with the socio-constructivist conception of teaching and learning (Vygotsky, 1978). These dichotomous positions contrast with today’s view of the term “progressive” which is commonly associated with a particular political point of view (i.e., liberal). The administrative progressives sought to apply a corporate model where expert bureaucrats ran schools seeking social efficiency. They supported multiple ability tracks, extensive revisions of curriculum, detailed records of students and upgraded training for education professionals (Cuban, 1988). Administrative progressives argued that the governance of city schools was immersed in local politics and inefficiency and should be turned over to educational experts. They also instituted inter-locking directorates that created a “symbiosis” between state departments of education, education professors and state educational associations to encourage local administrators to publicize and enforce new education codes, regulations and standards (Tyack, James, & Benavot, 1987). The administrative progressives were concerned with organizational performance and aggressive “uniform” goals rather than individualized development. The pedagogical progressives’ counterpoint to administrative progressives was that the key to student success was not centered in management efficiency, but rather a focus on meeting the needs of individual students. John Dewey (as cited in Tyack, 1974) argued, 5 It is easy to fall into the habit of regarding the mechanics of school organization and administration as something comparatively external and indifferent to educational purposes and ideas…We forget that is it such matters as the classifying of pupils the way decisions are made, the manner in which the machinery of instruction bears upon the child that really controls the whole system. (Tyack, 1974 p. 
176) Pedagogical progressives concentrated on inspiring teachers to change philosophy, curriculum, and methods to subvert the “hegemony” of school and giving them independence to do so in order to increase student achievement and success (Tyack, 1974). Dewey argued that democratic education required “substantial autonomy” for teachers and children. He theorized that children needed education that was authentic— allowing them to grow mentally, physically and socially by providing student-centered opportunity to be creative, critical thinkers (Dewey, 1938). Labaree (2005) proffers that the modern educational policy environment is heavily influenced by this long-standing debate between administrative and pedagogical views. He argued that pedagogical progressives failed “miserably” due to the structure formed by administrative progressives that “reconstructed the organization and curriculum of American schools in a form that has lasted until this day” (p. 276). The administrative progressives won the struggle to focus school reform on the management of schools and the measurement of uniform curriculum structures. In their study of progressive era, Lynd and Lynd (as cited in Tyack, 1974) acknowledged that in the clash between quantitative administrative efficiency and qualitative education goals, “the big guns are all on the side of the heavily concentrated controls behind the former” (p. 198). 6 These positivistic values cultivated by the administrative progressives undoubtedly underlie the emerging power of standardized testing in education from the progressive era onward. The Transition from Low-Stakes to High-Stakes Testing: Toward Greater School Efficiency For centuries, philosophers and scientists have sought ways to measure human potential and intelligence (Gould, 1996). As the science of measurement evolved, so too did the discourse surrounding how tests and measures might be used to facilitate the educational experience. The use of standardized tests in school settings emerged in the late 18th century at the same time psychologists were becoming increasingly interested in developing scientific methods for measuring human qualities such as intelligence (Gamson, 2007; Mazzeo, 2001; Sacks, 1999). The widespread use of such tests was popularized with the construction of the standardized “National Intelligence Test” spearheaded by Edward Thorndike, Lewis Terman, and Robert Yerkes (Giordano, 2005). Standardized intelligence testing gained momentum between 1880 and World War I as a variety of instruments were utilized to gauge mental ability (Boake, 2002; Gamson, 2007). Soon after the turn of the 20th century, the Binet-Simon scale began to test children aged 3 to 15 years to compare their intellectual capacity with that of “normal” children and adolescents of various ages to devise an IQ, or intelligence quotient (Binet & Simon, 1916). Ability testing of immigrants also emerged as federal immigration acts as early as 1917 required basic literacy tests for new arrivals (Timmer & Williamson, 1998). During WWI, the Army utilized mental testing developed by Edward Thorndike to determine which soldiers were eligible for the officer corps (Samelson, 7 1977). The conception that aptitude could be quantitatively measured and used for educational purposes also gained momentum as intelligence tests were revised for use in schools and promoted “tracking” systems used to segregate students according to ability (Chapman, 1981). 
Growing familiarity of these standardized tests ignited debates early in our educational history about how best to assess students’ academic success potential. Proponents, skeptical of subjective teacher grading systems (Starch & Elliott, 1912, as referenced in Giordano, 2005), believed standardized tests were the perfect way to elicit meaningful and reliable data about students. Opponents worried about test bias and their limited capacity to adequately account for student differences (e.g., race, income). These are familiar worries about the appropriate use of standardized tests that have persisted since their inception (Sacks, 1999). In spite of ongoing debates regarding the fundamental purposes of schools (and therefore, use of tests, e.g., Cuban, 1988; Tyack & Cuban, 1995), proponents of standardized tests and purveyors of the administrative progressive viewpoint of schooling were convinced of their necessity and role. E. L. Thorndike put it this way, Educational science and educational practice alike need more objective, more accurate and more convenient measures…Any progress toward measuring how well a child can read with something of the objectivity, precision, commensurability, and convenience which characterize our measurement of how tall he is, how much he can lift with his back or squeeze with his hand, or how acute his vision is, would be of great help in grading, promoting, testing the value of methods of teaching and in every other case where we need to know ourselves and to inform others how well an individual or a class or a school population can read (Thorndike, 1923, p. 1-2). Since Thorndike’s time, the form and function of standardized tests have expanded swiftly and have been used for many purposes over the years (Sacks, 1999). Norm- 8 referenced tests (e.g., IQ) gave us a way to rank students according to aptitude. Criterionreferenced tests provided measures of students’ progress against eternally defined standards. Other standardized tests gave us a way to make predictions about students’ academic potential (e.g., SAT, GRE) (Giordano, 2005; Herman & Haertel, 2005; Moss, 2007; Sacks, 1999). When it comes to consequences, until the 1980s, performance on any one of these types of tests had relatively low stakes for students and essentially no stakes for teachers. Although some states and districts have a history of experimenting with high-stakes testing as a way to reform and improve schools, this practice was inconsistent and relatively inconsequential for large groups of students (Allington & McGill-Franzen, 1992; Mazzeo, 2001; Tyack, 1974). This has changed radically over the past few decades as the rhetoric of public schools in “crisis” expanded (Berliner & Biddle, 1995; Glass, 2008; National Commission for Excellence in Education, 1983) culminating with the No Child Left Behind act (NCLB, 2002) that mandated the use of high-stakes testing in all schools for all students. Consequential standardized testing systems pervade modern American classrooms (Herman & Haertel, 2005; Orfield & Kornhaber, 2001). Using Stakes-Based Testing to Address Underperformance and Inequities in Education The current emphasis on tests for making important decisions schools, students, teachers, and administrators can be traced back to two important events. The first rests with the Cold War Era launch of Sputnik—an event that ignited deep concern about the adequacy of American public schools to prepare students to compete internationally. 
Since then, public school bashing has become a tool justifying series of federal reforms 9 that over time have become increasingly intrusive (Gamson, 2007; Giordano, 2005). The second event came with the 1965 authorization of the Elementary and Secondary Education Act (ESEA). ESEA was the first large-scale federal legislation aimed at equalizing educational opportunities for all of America’s students. A significant goal of ESEA was to address resource allocation inequities among schools serving wealthy and poor students. Of its many provisions, ESEA provided federal dollars to schools serving large populations of students of poverty. At the time, it was argued that equalizing school inputs (resources, library books, access to materials, supplies and teachers) would help to minimize achievement gaps among students. Over time, the focus shifted from school inputs (e.g., what do schools provide to students?) to their outputs (e.g., what skills do students leave with?). As a result, states began implementing minimum competency tests as a way to comply with federal recommendations to ensure all students left school with at least the ability to read and do basic math (Herman & Haertel, 2005; Moss, 2007). In some states, students could be denied a diploma if they did not pass these tests, but there were few if any consequences for teachers or schools. Eventually the minimum competency tests were criticized for being relatively easy to pass since they were concerned with minimums to be learned: the achievement floor and not the achievement ceiling. Many politicians and citizens alike believed that students were not being stretched enough, that overall US school performance was not improving, and that the terrible achievement gap between white middle-class students and black, Hispanic, or poor students was not being reduced. As discontent grew, the 10 language of minimum competency changed to the language of standards-based achievement (Glass, 2008; McDonnell, 2005). In the years following ESEA concern for the emerging American educational “crisis” grew (Aronowitz & Giroux, 1985). This was fueled, in part, by international data that purported to show that US schools were not as good as those in other nations. More fuel was provided because of a growth of the international economy at a time when the national economy was performing poorly. A scapegoat for that state of affairs needed to be found (Berliner & Biddle, 1995). The concerns of the 1970s culminated in the release of A Nation at Risk (1983)—a report that predicted that unless public education received a major overhaul, and unless expectations for student achievement were raised, America’s economic security would be severely compromised (National Commission for Excellence In Education, 1983). Indeed, A Nation at Risk sparked a renewed interest in and urgency about how we educate America’s students. That sense of urgency set in motion a series of policy initiatives aimed at improving the “failing” American educational system. The past 20 years have also seen a renewal of the rapid demographic changes experienced in during the progressive era, spurring a renewed focus on educational policy to solve America’s educational problems—among these was a call for “closing the achievement gaps,” accountability, and more consequential testing (Anderson. 2004; Lipman, 2004; Herman & Heartel, 2005). 
11 Closing the Achievement Gap: The Modern Rise of High-Stakes Testing and Accountability While immigrants mostly of European origin led the immigration wave during the progressive era, people of color are leading the current demographic shift in US schools.i According to the U.S. Census Bureau, the number of Latina/os living in American households will rise 500% between 1970 and 2010 (U.S. Census, 2006). By the 2000 census, Latina/os had become the largest racially collated ethnic minority group in the United States at almost 13% of the populace (U.S. Census, 2001).1 By 2008, nearly one in six U.S. residents were Latina/o (U.S. Census, 2009). Turning to African Americans, they increased by 6.6 million between 1990 and 2000—the largest increase since the first Census in 1790. African Americans were 12.9% of the U.S. population in 2000 and increased 14% by 2008 (U.S. Census, 2008). Nationally, students of color constituted a majority in the public schools in 11 states by 2008 (Southern Education Foundation, 2010).ii One of the most pressing problems in the United States is improving student academic performance, within the nation’s burgeoning African American and Latina/o student population (Rumberger & Arellano Anguiano, 2004). African American and Latina/o students comprise a large sector of students vulnerable to poor school performance, as many of these youth arrive in high school having received uneven or irregular instruction (Jencks & Phillips, 1998). Although there have been multitude of efforts (both successful and unsuccessful) over time targeting achievement-related gaps among poorer students of color and their white counterparts (Anderson, 2004), we focus 1 People who identify as Latina/o may be of any race. 12 on the latest efforts subsumed within the No Child Left Behind Act of 2001 that reauthorized the Elementary and Secondary Education Act of 1965. No Child Left Behind: Lassoed From Texas No Child Left Behind was rooted in Texas long before it’s passage in 2001. Soon after the Nation at Risk (1983) report, there was a new push by Texas policymakers and business leaders to reform the state’s schools. The Perot Commission, and later the Texas Business-Education Coalition coalesced corporate leaders to represent the business perspective in education reform (Grissmer & Flanagan, 1998). Ross Perot and his allies were “influential actors” and proponents of accountability and testing in Texas (Carnoy and Loeb, 2003). Codified in this reform effort was a determination to inculcate reform measures to increase efficiency, quality and accountability in a push for schools to perform more like businesses (Grubb, 1985). As a result, Texas was one of the earlier states to develop statewide testing systems during the 1980s, adopting minimum competency tests for school graduation in 1987.iii In the early 1990s, the Texas Legislature passed Senate Bill 7 (1993), which mandated the creation of the first-generation of Texas public school accountability to rate school districts and evaluate campuses. The Texas accountability system was initially supported by the Public Education Information Management System (PEIMS) data collection system, a state-mandated curriculum and the Texas Assessment of Academic Skills (TAAS) statewide testing program. 
The prevailing theory of action underlying Texas-style high-stakes testing and accountability ratings was that schools and students held accountable to these measures would automatically increase educational output as educators tried harder, schools 13 adopted more effective methods, and students learned more (Vasquez Heilig & DarlingHammond, 2008). Pressure to improve test scores would produce genuine gains in student achievement (Scheurich, Skrla, Johnson, 2003). As test-based accountability commenced in Texas, achievement gains across grade levels conjoined with increases in high school graduation rates and decreases in dropout rates brought nationwide acclaim to the Texas accountability ‘miracle’ (Haney 2000). Citing the success of the first generation of Texas-style high-stakes testing and accountability, President George W. Bush and former Secretary of Education Rod Paige, two primary arbiters of NCLB, lassoed their ideas for federal education policy from Texas. NCLB replicated the Texas model of accountability by injecting public rewards and sanctions into national education policy and ushered in an era where states and localities are required to build state accountability systems on high-stakes assessments. Early on, the literature echoed the administrative progressive ideals that the long term implications of accountability pointed to increase efficiency and achievement (Cohen, 1996; Smith and O’Day, 1991) others posited the Deweyan ideal that testing would dramatically narrow the curriculum and could negatively impacting classroom pedagogy (McNeil & Valenzuela, 2001; Valencia & Bernal, 2000). Nevertheless, at the point of NCLB’s national implementation, the Texas Miracle was the primary source of evidence fueling the notion that accountability created more equitable schools and districts by positively impacting the long-term success of low-performing students (Nichols, Glass, & Berliner, 2006). The Texas Case: The Long-Term Impact of Accountability The successes of the Lone Star State’s accountability policy in the midst of the 14 Texas Miracle has been debated vociferously in the literature (Carnoy, Loeb, & Smith, 2001; Haney, 2000; Klein, Hamilton, McCaffrey, & Stecher, 2000; McNeil, Coppola, Radigan & Vasquez Heilig, 2008; Linton & Kester, 2003; Toenjes & Dworkin, 2002; Vasquez Heilig & Darling-Hammond, 2008). In theory, accountability spurs high schools to increase education output for all students, especially for African American and Latina/o students, who have been historically underserved by U.S. schools. Yet the question remains: Do policies that reward and sanction schools and students based on high-stakes tests improve African American and Latina/o student outcomes over the long term? The Texas Comptroller of Public Accounts (2001) indicated that the PEIMS was created in 1983 for the Texas Education Agency (TEA) to provide a uniform accounting system to collect all information about public education, including student demographics, academic performance, personnel, and school finances. The PEIMS lies at the heart of the Texas student accountability system, and the wealth of information gathered from school districts offers the opportunity to gauge the success of Texas-style accountability for students over time. At the time of writing, there were no publicly available TEA reports or published research considering cross-sectional African American and Latina/o achievement and progress through school in Texas. 
As a result, this chapter gathers data from multiple state reports to descriptively consider TAAS and TAKS exit testing achievement, grade retention, dropout rates, and graduation rates for more than 15 years of Texas-style accountability.iv To address the question of whether student outcomes for African American and Latina/o students have improved over time in Texas, we examined cross-sectional high 15 school exit exam data from the inception of accountability in 1994 through 2010—the most recent data available. During this time, Texas utilized two generations of accountability assessment systems. The first generation of relied on the Texas Assessment of Academic Skills (TAAS) and lasted 1994-2002. The second generation included the Texas Assessment of Knowledge and Skills (TAKS) and includes data from 2003-2010. Our descriptive statistical analyses focus on African American and Latina/o high-stakes high school exit test score trends for 10th graders (TAAS, 1994-2002) and 11th graders (TAKS, 2003-2010). TAAS Exit Exam Figure 1 shows that African Americans dramatically increased their achievement on the TAAS Exit Math, from only 32% meeting minimum standards in 1994 to 85% by 2002. Concurrently, the percent of Latina/os meeting minimum standards increased from 40% to 88%. Although achievement gap between minorities and whites remained, the gap for Latina/os and African Americans narrowed to 8% and 11%, respectively, between 1994 and 2002. Figure 2 also shows large gains in the percent of African American and Latina/o students meeting minimum standards on the TAAS Exit Reading. By 2002, TEA reported that 92% of African Americans and 90% of Latina/os in the state had met minimum standards on the TAAS Exit Reading. African Americans showed an increase of 32% more students meeting minimum standards, while Latina/os showed an overall increase of 29%. The achievement gap closed to 8% for Latina/os and 6% for African Americans. [INSERT FIGURES 1 AND 2 HERE] 16 TAKS Exit Exam In 2003, the TAKS replaced the TAAS as the exit exam in Texas. As shown in Figure 3, between 2003 and 2010 the percentage of African Americans passing the TAKS Exit Math increased from 25% to 81%, a gain of 56%. Latina/os showed a similar gain of 55% more students meeting minimum standards on the TAKS Exit Math (from 30% to 85%). Similar to the closing of the achievement gap on the TAAS Exit Math, the TAKS Exit Math gap for African Americans and Latina/os decreased to 4% and 9% by 2010 (see Figure 3). [INSERT FIGURE 3 HERE] During the past 8 years of TAKS Exit testing, the percentage of African Americans passing the TAKS Exit English Language Arts increased 43%, while the proportion of Latina/os meeting minimum standards increased 38% (see Figure 4). Similar to the closing of the achievement gap noted on the TAAS Exit Reading, the gap between African American and White students decreased to 6%. By 2010, the gap between the percent of Whites and Latina/os passing the TAKS Exit English Language Arts had declined to 7%. [INSERT FIGURE 4 HERE] Dropout Cross-sectional snapshot cohort rates show that publicly reported yearly dropout rates more than halved in the first decade of accountability—arriving at about 1% for African Americans and 2% for Latina/os by 2004 (see Figure 5). However, after 2005, when the state began to use the National Center for Education Statistics (NCES) dropout definition for leaver reporting, the yearly count tripled for Latina/os and quadrupled for 17 African Americans. 
Clearly, as evidenced by Figure 5, Latina/os and African Americans were over-represented in the underreporting of yearly dropouts. In the 1998–1999 school year, TEA introduced tracking of individual students in cohorts between Grades 9 and 12 (TEA, 2001). As a result, the longitudinal cohort dropout analysis begins in 1998 instead of 1994.v Figure 6 shows that TEA-reported African American and Latina/o cohort dropout rates halved between 1999 and 2005. However, after 2005, using the NCES dropout standard for leaver reporting, a 100% increase in the number of publicly reported dropouts occurred in Texas. [INSERT FIGURES 5 AND 6 HERE] Notably, the cohort dropout rates more than doubled for African Americans and Latina/os, after the adoption of the NCES standard. These numbers align with empirical research critical of TEA’s publicly reported dropout numbers (Losen, Orfield, & Balfanz, 2006; Vasquez Heilig & Darling-Hammond, 2008) and suggests that student leavers were underreported for quite some time by the state, especially when it came to African American and Latina/o populations. Texas has vastly undercounted and underreported dropout data over time. Graduation Rates If significantly larger numbers of African Americans and Latina/os were dropping out of school in Texas, then cohort graduation rates should be correspondingly low. Figure 7 shows that TEA reported African American and Latina/o graduation rates from 1996-2004 gradually rose to about 80% then dipped by almost 10% when NCES standards were instituted in 2005. Notably, the large decline did not occur for Whites in 18 Texas, as their cohort graduation rates only dipped about 1% after the NCES readjustment. [INSERT FIGURE 7 HERE] In a study of Texas dropout data, Losen et al. (2006) argued that Texas graduation rates historically have been overstated. They examined PEIMS data for individual students and proffered that between 1994 and 2003, the state’s graduation rate increased from 56% to 67%. In contrast, TEA’s publicly released statistics locate the graduation rates at 72% and 84% for the same period—a difference of 17% by 2003, the equivalent of approximately 46,000 students. Losen et al. noted that the overstatement of graduation rates in Texas occurred partly because PEIMS has included many ways that students could be excluded from enrollment data used to calculate graduation rates. Instead of utilizing PEIMS to define away the dropout and graduation numbers in Texas, the NCES definition has created more transparency in the state while calling into question whether gains have actually occurred in Texas since the inception of accountability in 1994. The Intercultural Development Research Association (IDRA) has argued that adopting the NCES national dropout definition for Texas has provided a more accurate, yet still understated representation of the magnitude of the overall dropout problem in Texas (Johnson, 2008). More than two decades of IDRA’s yearly high school attrition studies of PEIMS data have suggested that TEA has consistently and severely undercounted student leaving in publicly reported dropout and graduation rates. IDRA found the overall student attrition rate of 33% was the same in 2007–2008 as it was more than two decades ago (Johnson, 2008). In contrast, TEA had reported annual dropout rates that declined from 5% to 1% and longitudinal cohort dropout rates that declined 19 from about 35% to around 5% over the same time frame (Figure 8). 
IDRA also posited that the high school attrition rates for Latina/o and African American students accounted for more than two thirds of the estimated 2.8 million students lost from Texas public high school enrollment since the 1980s (Johnson, 2008). [INSERT FIGURE 8 HERE] In summary, TEA’s TAAS and TAKS exit exam data show that African American and Latina/o students apparently made dramatic achievement gains and narrowed the achievement gaps during the TAAS and TAKS eras. However, the crosssectional student progress analysis showed that dropout rates and graduation rates for African Americans in Texas do not appear to have improved (even with the apparently inflated rates released by TEA) after about 15 years of high-stakes testing and accountability policy; in fact, if data from empirical sources are to be believed, the situation has worsened. As a cautionary note, we acknowledge that this review of data is limited because of the ongoing debate about the validity of leaver data collected by the state. Data reported by the state of Texas has long been accused of inaccuracy in the accounting of student leavers (Haney, 2000; Orfield et al., 2004; Vasquez Heilig & Darling-Hammond, 2008). The data used in these analyses are the same data that has drawn criticism from IDRA and other researchers that have argued that the leaver problem is underreported (Johnson, 2008). We believe the actual dropout rates to be much higher and graduation rates lower than the publicly reported data (See also McNeil, Coppola, Radigan and Vasquez Heilig, 2008). Furthermore, critics have questioned the validity of TAKS and TAAS score growth over time due to TEA’s lowering of cut scores in successive state- 20 mandated testing regimes (Mellon, 2010; Stutz, 2011). We will return to these issues in forthcoming sections. Are High-stakes Testing and Accountability As Good as Advertised? What explains the “dubious” nature of these data? Why does Texas consistently overstate the educational trajectories of its African American and Latina/o youth population? Campbell’s law cogently describes inevitable corruption that emerges when a single indicator is used to evaluate complex social systems: “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor” (Campbell, 1975, p. 35). Campbell’s law thusly warns of the impending corruption and distortion associated with high-stakes testing systems that are reliant solely on test scores to evaluate complex educational processes. Rapidly emerging research seems to bear out Campbell’s early warning and increasingly, data suggest educators engage in all types of activities aimed at making test scores as favorable as possible. These “gaming” activities (Ryan, 2004; Figlio & Getzler, 2002) have a particularly negative effect on African American and Latina/o students’ educational outcomes since poor students of color and those with learning disabilities or for whom English is a Second Language are more likely to be low scorers on these tests. For example, poor, African American and Latina/o students are more likely to suffer from low-quality pedagogy due to excessive teaching to the test and rote and drill activities as well as more restricted curriculum (McNeil, 2000; Vasquez Heilig, 2011 Vasquez Heilig, Cole, & Aguilar, 2010). 
Other types of gaming activities include blatant cheating such as correcting test answers (Amrein-Beardsley et al., 2010; Jacob and 21 Levitt, 2002a; Pedulla et al., 2003) and excluding special populations from testing through exemptions and other means in order to show overall increased educational achievement (Cullen & Reback, 2006; Jacob, 2005; Jennings & Beveridge, 2009). Gaming and Cheating Because of High-Stakes Testing Before we review some of the research underscoring the prescience of Campbell’s law, it is important to note that not all acts of cheating or gaming are the same (Nichols & Berliner, 2007a, b). We believe, as Amrein-Beardsley, Berliner, and Rideau (2010) describe, that there are different levels or “degrees” of cheating that ensue from highstakes testing pressures. First-degree actions constitute willful, premeditated acts designed to intentionally change, alter, or fudge test score data. Second-degree actions include more nuanced types of test fudging that include subtle modifications of test administration protocols such as when teachers provide test taking cues, encouragements and reminders, as well as gaming associated with how proficiency cut scores are determined and manipulating testing populations. Finally, third degree “cheating” is characterized by involuntary actions or cheating without awareness such as when educators use readily available test items to prepare students but “focus only on the testable standards, concept areas, and performance objectives” (Amrein-Beardsley, Berliner, & Rideau, 2010, p. 7). Prevalence of cheating It is difficult to gauge the prevalence of test-related teacher/administrator cheating/fudging because of the reliance on self-report data drawn from nongeneralizable samples. Still, some clues have emerged. A survey conducted with teachers and administrators in Tennessee found that 9% of teachers witnessed some type of test 22 impropriety on that state’s test (Edmonson, 2003). A national survey of teachers conducted in the early years of NCLB suggested that approximately 10% admitted to providing hints about answers during the test (second degree) with approximately 1.5% admitting to actually changing answers on the test (first-degree) (Pedulla et al., 2003). Jacob and Levitt (2002a, b) studied classroom test results in the Chicago Public School district that at the time used ITBS to make decisions about the quality of teachers. They concluded that approximately 4-5% of classroom teachers had engaged in blatant cheating (e.g. changing answers from wrong to right). Another difficulty in estimating prevalence has to do with varying definitions of what constitutes cheating. Amrein-Beardsley, Berliner, and Rideau’s (2010) self-report study of teacher cheating provides some additional insight on prevalence according to their first, second, and third degree characterizations. In their study, Amrein-Beardsley et al. (2010) surveyed a non-representative sample of 3,085 teachers across the state of Arizona, asking them to provide yes/no responses to a series of questions pertaining to their views of other teachers’ cheating behaviors as well as their own. Not surprisingly, teachers reported lower instances of personal cheating and greater instances of others’ cheating. Reports on what colleagues do in the classrooms were based on what they overheard or what they were told (i.e., not what they witnessed). 
From this data, they found that 39% of teachers said colleagues encouraged students to redo test problems, 34% gave students extra time on the test, and 34% wrote down questions for students to use to prepare for future tests. Few knew of anyone who outright cheated (10%). Only 1% of respondents admitted to blatantly cheating themselves, but were more likely to admit second and third degree types of cheating activities such as encouraging students to 23 redo problems (16%) and giving students extra time on the test (14%) or encouraging them during the test (8%). Although we must be cautious about generalizing from this data, it seems safe to estimate that anywhere from 25% to 40% of teachers at some point in time engage in “second” degree forms of cheating (consisting primarily of test administration protocol violations). When it comes to more blatant, willful acts (changing answers), similar to what others have found, it is likely more in the 1-5% range. Interview and focus group data provide more information on the type of blatant acts witnessed. For example, one teacher reported the blatant cheating activities of a once valued mentor: She told me to sit down. She said that she needed my help because of some of the bubbles were wrong and they needed to be fixed. I did not even think to question her. I was just, like, ‘okay.’ So, I sat down next to her and I saw her. She was going down looking at all the bubbles. She told me that some of them were wrong. So, she gave me a stack of them and she just told me this one is supposed to be B or this one is supposed to be C and whatever (Amrein-Beardsley, Berliner, & Rideau, 2010, p. 16). Unfortunately, this was not an isolated incident. Another teacher reported the following: I observed one time during the AIMS a fellow teacher in the same grade level doing the writing, which is supposed to be done by the children—the prewriting, first draft, final draft. I walked by her classroom and the door was opened. She had the kids lined up at her desk…She told me they were just doing the AIMS test and she was correcting their writing and sending them back to their seat[s] to correct their writing. Of course that year she got kudos because her kids reached 24 the 85th percentile. So, she got great results. She was doing the writing for them, but she got all the pats on the back and everything else. I even brought it to the principal’s attention. He did not really want to hear about it because of course if our scores look better they [administrators] don’t care. They turn their heads (Amrein-Beardsley et al., 2010, p. 20). When learning is viewed as a product (i.e., test score) instead of a process (i.e., an experience), and when that product is the only thing that stands between educators and financial resources that would significantly improve theirs and their students academic livelihoods, then it stands to reason that educators will engage in behaviors aimed at making that indicator as favorable as possible. No Child Behind Left The most prominent and empirically documented form of second-degree acts of cheating involves manipulating the test taking pool. An overreliance on test scores as the sole criteria for significant funding decisions has the unintended consequence of incentivizing exclusionary practices where educators—either blatantly or indirectly— remove low-test scorers from taking the test. Studies have shown that greater numbers of marginalized populations of students tend to be excluded from the test (Figlio & Getzler, 2002; Jacob, 2002). 
These exclusionary practices come in many forms. Figlio and Getzler’s (2002) study provides evidence that the introduction of high-stakes tests in Florida increased by almost 50% the likelihood a student would be classified with a learning disability, thereby making it more likely their test scores would be excluded from AYP calculations. Elsewhere, Figlio (2006) found that lower test scorers were more 25 likely to receive longer suspensions when caught in altercations with other, higher test scoring students. Sometimes exclusionary practices are more abrupt, aggressive, and obvious. In Birmingham Alabama, for example, just before the state test, 522 students were administratively “withdrawn” from the rolls (Orel, 2003). One of these students, attempting to re-enroll in school, noted: He [the principal] said that he would try to get…the students out of the school who he thought would bring the test scores down. He also gave us this same message over the intercom a couple of times after that. On the last day that I went to school, I was told to report to the principal’s office because my name was not on the roster. I was given a withdrawal slip that said “Lack of interest.” I did miss a lot of school days. I had family problems. I had allergies (Orel, 2003) And there are many examples. In North Carolina, some district-figures showed a spike in suspensions around test taking time, and in Florida in 2004, over 100 low scoring students were “purged” from attendance roles just before the state test (Nichols & Berliner, 2007b). In another more subtle effort at gaming the test taking pool, Figlio and Winicki (2004) looked at school lunches before, during, and after high-stakes tests in Virginia school districts. Their analysis of nutrition content and caloric amount throughout Virginia’s high-stakes testing administration period revealed that districts faced with the threat of imminent test-based sanctions were significantly more likely to provide higher (empty) caloric lunches than other districts not facing immediate threats of sanctions. This was done presumably in the quest to give lower scoring students a cognitive boost 26 on test days (Figlio & Winicki, 2004), thereby helping district to avoid federal sanctions. In contrast to the intent of NCLB to expand educational opportunities to historically marginalized populations, what we see over and over, is the exact opposite—high-stakes testing incentivizing ways to further deny opportunity and high quality educational experiences to all students. Vasquez Heilig & Darling-Hammond (2008) examine longitudinal urban student progress and learning over time in an large, urban district in Texas to see the effects of high-stakes testing and educational opportunities provided to African American and Latina/o students. Their analyses show that there were increases in student retention, student dropout/disappearance, and, ultimately, failure to advance to graduation, disproportionately affecting African Americans and Latina/os. Additionally, student retention and disappearance rates increased high schools’ average Reading and Math Exit TAAS (Texas Assessment of Academic Skills) scores and Texas Education Agency (TEA) accountability ratings, enabling a large, urban Texas district’s high schools to maintain a public appearance of success. Playing With Numbers Another form of second degree cheating has to do with politicization/gaming of cut scores. 
When consequences are tied to tests, it becomes extremely important that there are clear definitions of what level of performance constitutes passing and failing. If a test includes 30 items, for example, how many must a test taker get right to deem him/her proficient in that area? Must they get 29 correct? 28? As the stakes of doing well (or poorly) increase, we see explicit and implicit forms of gaming associated with how these arbitrary cut scores are determined (Glass, 2003). For example, in the spring of 27 2005, Arizona State Board of Education and Schools’ Chief Tom Horne publicly debated the merits of two different cut scores: one which would have resulted in 71% of Arizona students passing, and the other resulting in 60% of them passing. In short, the state board wanted “easier” standards while Tom Horne was arguing for “tougher” standards (Kossan, 2005). In Texas, TEA has consistently lowered the standards in successive testing regimes. Stutz (2001) reported that TEA lowered testing standards as “1.9 million students tested in math, reading and other subjects… were required to correctly answer significantly fewer questions to pass the high-stakes Texas Assessment of Academic Skills.” For example, in math, students had to get only about half the math questions right. Two years earlier, they had to get about 70 percent of the TAAS questions correct. TEA also conducted similar reductions to the TAKS testing standards. Mellon (2010) revealed, “The biggest change involved the social studies test for students in grades 8 and 10. This year, for example, eighth-graders had to answer correctly 21 of 48 questions — or 44 percent. Last year, the passing standard was 25 questions, or 52 percent.” The lower passing standards that TEA has consistently implemented over time calls into question the much- touted improvements on the state-mandated TAAS and TAKS testing regimes. In 2005, Achieve Inc compared state high-stakes test proficiency levels with those set by National Assessment for Educational Progress (NAEP), a federally funded achievement test viewed as a comparable assessment to most state tests. When it came to fourth grade math performance in 2005, states varied widely in how they defined “proficient” (Figure 9). Compared to NAEP’s standard of proficiency, Mississippi’s tests 28 were the “easiest” compared to NAEP standards, whereas Massachusetts’ assessments were much “harder.” [INSERT FIGURE 9 HERE] The nature of cut scores brings into question the meaningfulness (or contentrelated validity) of resultant high-stakes testing performance. Data we review from Texas (Figures 1-4) suggest that overtime greater proportions of African American and Latina/o students have attained minimum levels of competency on the state’s TAAS/TAKS tests. Although this pattern may represent the “truth” in the public policy sphere, the empirical research on the consequential nature and limited validity of testing in Texas makes this interpretation somewhat suspect. As others have pointed out, favorable patterns of student performance on Texas’s high-stakes tests are more likely the result of lowering of cut scores standards and suspicious exclusionary practices (removing low scorers from test taking) or other forms of data manipulation (e.g., misrepresentation of dropout/graduation rates), particularly with respect to African American and Latina/o populations (Haney, 2000; Linn, Graue, & Sanders, 1990; Shepard, 1990; Stutz, 2001; Mellon, 2010; Vasquez Heilig & Darling-Hammond, 2008). 
Thus, it is difficult to ascertain how much students actually learn under high-stakes testing environments (Amrein & Berliner, 2002; Nichols, Glass, & Berliner, 2006) Cheating Unknowingly Cheating in the third degree is characterized by involuntary actions or cheating without awareness such as when educators use readily available test items to prepare students but only focus on narrow testable standards, concept areas, and performance objectives (Amrein-Beardsley, Berliner, & Rideau, 2010). Studies show that as test- 29 related pressure goes up, distortions in sound instructional decision-making become more likely. Teachers report that the tests come to drive instruction, forcing them to teach to the test, narrow the curriculum, restrict their instructional creativity, and compromise their professional standards (Jones & Egley, 2004; Jones, Jones, & Hargrove, 2003; Pedulla et al., 2003; Perlstien, 2007). Vasquez Heilig (2011) examined how high schools in Texas have narrowed curriculum and pedagogy in response to Texas high-stakes exit testing. Teachers (11 of 33) and principals (6 of 7) from each of the four study high schools detailed aspects of “teaching to the test” and the impact of exit testing on the narrowing of the curriculum. A staff member at suburban high school in Texas was concerned that the attention being paid to the TAKS testing was distracting from the core mission of educating students. The respondent had been on staff at the high school for over a decade, which provided a long view of how the policy was changing the school environment: I think personally that we're so caught up in this game of making the numbers look good you know, so that your AYP and all this other stuff, the report card, that we've forgotten why we're here as educators. . . . I think it's very transparent, to them [students] too, that the emphasis is on teaching the tests, and manipulating, in a sense, the figures, rather than on focusing on really teaching. The students in the suburban high school also seemed concerned with how exit testing was impacting instruction and the quality of the curriculum. As the staff member mentioned, the tensions associated with the TAKS testing were also on the minds of the students. When asked whether the TAKS appeared in the daily curriculum, students 30 related that they had noticed that many of their courses had a heavy TAKS preparation focus. One student stated, In my pre-AP [Advanced Placement] we were working in class on one problem by day or sometimes, some days with practice like, many problems during the class. But in regular classes they just give you the [TAKS] book and the class was about that, was according with that book. The students who faced the heaviest test-prep focused courses were those who had not passed TAKS during prior testing opportunities. Students relayed that they were tracked into courses where the amount of TAKS curriculum was increased. A Texas student who had failed previous administrations of the TAKS Exit related, Every class I have, it’s based on the TAKS, you know? Like we do exercises that are in the TAKS. . . . Right now I’m taking the TAKS classes for the tests I need and basically he lectures the whole period [on TAKS]. . . . We don’t have homework. Student informants at a rural Texas high school related that the curriculum was saturated with teaching to the exit tests. They reported that their chemistry class entailed 100% TAKS test preparation—no textbooks, labs, experiments, or other traditional means of science curriculum. 
The entire chemistry course was solely designed to drill students for science exit testing by utilizing multiple-choice worksheets. The idea seemed somewhat implausible, until the rural high school chemistry teacher was randomly chosen to participate in a focus group. She characterized the worksheets in the course as being entirely geared for the TAKS: “Mine is not going through a 15-minute bell-ringer, 31 then going on to teaching chemistry. No, no me it’s everything, so mine are actual [TAKS] lessons. . . . I don’t just teach my course, now I teach towards the TAKS.” It is also an open question whether tested subjects receive more attention in an environment of accountability. While high-stakes testing did not halt nationwide arts education in America, it is readily apparent that the focus of NCLB is elsewhere (Chapman 2007; Zastrow & Janc, 2004). The only subjects for whom the federal government holds states accountable (through Adequate Yearly Progress, the accountability mechanism of NCLB) are reading and mathematics. Vasquez Heilig, Cole and Aguilar (2010) examined another class of third-degree cheating—the decline of arts education curriculum due to testing in language arts and mathematics. They tracked the evolution and devolution of visual arts education from the progressive era through the modern accountability movement. Their analysis of archival material, state curricular documents, and conversations with policymakers show an increasing focus in the accountability era on core subject areas of reading, writing, and mathematics at the expense of arts education. They found that high-stakes testing had narrowed the curriculum and pushed arts education and other non-core subjects to the margin. Berliner (2009) argued that the accountability era has also culminated a movement from curriculum being driven solely by local pedagogical and curricular discourse, to an environment where educational standards defined by the state and federal levels have determined the declining prominence and presence of the arts and other non-tested core subjects in school curriculum. Diane Ravitch, a widely respected conservative and initial supporter of No Child Left Behind, has recently reversed her view concluding that high-stakes testing has 32 eroded educational quality, restricted curriculum access and lowered expectations of our students (Ravitch, 2010). She concluded, At the present time, public education is in peril. Efforts to reform public education are, ironically, diminishing its quality and endangering its very survival. We must turn our attention to improving the schools, infusing them with the substance of genuine learning and revising that conditions that make learning possible (p. 242.) If we didn’t know any better, we might have guessed John Dewey made this statement. High-stakes testing pressure has resulted in view of learning as a product, a simple test score, rather than as a process. It has also narrowed our perspective of learners as their test score rather than as social beings who are learning to negotiate complex social, and academic environments (McCaslin 1996; 2006). This commodification of learning and learners as test scores or products fosters a specific kind of educational context that has the unintended effect of rewarding cheating—a type of corruption Campbell’s law predicted decades ago (1975). 
Although student cheating is a familiar worry among educators and researchers (Murdock & Anderman, 2006; Cizek, 1999; Hollinger & Lanza-Kaduce, 1996; Jensen et al., 2002; McCabe & Trevino, 1997), only since this institution of high-stakes testing systems have incidents of gaming become structural, visible and pernicious. Teachers and administrators, faced with the potential deleterious consequences of not making AYP (i.e., of students not scoring well on the test), behave in ways that will better ensure that resultant test scores are as favorable as possible. Although gaming the system harms everyone, marginalized populations are most at risk of enduring unintended negative consequences of rational acts of cheating enacted by teachers and administrators. 33 School leaders must be more knowledgeable about how much they can truly trust their students’ assessment data. This is especially true because of inherent validity uncertainties associated with these data. The quandary for schools leaders rests with what to do with these suspicious data from which they are asked to make decisions. Do you use the data to assess success or to push out students? School leaders are in a pivotal position of having to negotiate tensions between policy mandates (top down pressure) and the need to support hard working teachers and students (bottom up pressures). Educational leaders are the fulcrum of high-stakes testing and accountability policy as they manage the press from a manifold of directions. The question arises— What should school leaders do with the avalanche of data? Some school leaders approach schooling of children with Deweyan ideals, while others are blinded by gaming their data to produce enron-esque results. We argue that the former rather than the latter is the purpose of education. Future Research: Equity, Standards and Accountability There are several areas of research that would provide important information on the potential for standards and accountability to create more equity for districts and schools. One important area for future research is how standards and accountability have impacted how school districts and leaders go about the process of hiring and distributing teachers. There is paucity in the research literature on whether No Child Left Behind has remedied the historical inequality in the distribution of great teachers and leaders for lowperforming schools and students (Lipman, 2004). It seems equally important to better understand the relationship between school leaders’ experiences with high-stakes testing (i.e., as teachers) and their views of testing 34 later when they enter the schools as leaders. For example, Achinstein, Ogawa, and Speiglman (2004) found that the accountability movement along with the local conditions of specific schools that interact in ways that end up socializing two tracks of teachers going into teaching. Are there two tracks of school leaders? In general, there seems to be one track of leaders who seek out and are relatively amenable to local conditions where the instructional decision-making is out of one’s hands (i.e., district-controlled) and another track of school leaders who prefer greater degree of autonomy and independence. There are many potential explanations for this seemingly two-track system—which begs for more research to uncover the role in which previous experiences with tests may play in influencing later professional orientations and decision-making. 
Finally, the ultimate NCLB sanction, levied after multiple years of failing to make Adequate Yearly Progress (AYP), is total school restructuring or closure. Restructuring policies under NCLB currently provide a broad framework for school districts to make strategic changes in order to improve achievement, including school reconstitution or turnaround. While there is a sizable amount of research on piecemeal school reform and restructuring efforts, there is very little coherent evidence or comprehensive research on whether the wholesale firing of school leaders and teachers improves student achievement.

Conclusion

Despite a lack of empirical research demonstrating that standards and accountability have accomplished equity for schools and districts, the current educational policy discourse suggests that the solution to the persisting minority-majority achievement gap, high African American and Latina/o dropout rates, and low graduation rates is higher passing standards and even more high-stakes tests. Clearly, the structural aims of the administrative progressives have overwhelmed the pedagogical progressives' focus on practice in the classroom (Labaree, 2005). No Child Left Behind, the current educational policy construct, has bypassed pedagogical progressivism by entrenching and codifying administrative progressive principles of management efficiency rooted in quantitatively measurable outcomes. Thus, the ideas derived from the administrative progressive era persist as the student-centered pedagogical progressive ideals wane. The consequences are potentially dire, especially for poor African American and Latina/o populations whose educational experiences are significantly compromised, as the data and literature we review suggest.

The historical roots of the design and implementation of high-stakes testing in modern educational policy offer a lesson for school leaders in the midst of accountability. When positing new policy directions, one must recognize that the ways in which our educational problems are defined are of central importance. In the fall of 2009, Secretary of Education Arne Duncan spoke to a Washington-based education interest group, arguing for what was to become the Race to the Top initiative. In the quest for more top-down policies, he posited, "It's not enough to define the problem. We've had that for 50 years. We need to find solutions—based on the very best evidence and the very best ideas." Mr. Duncan's assumption that we somehow know (and agree upon) the problems we face in improving America's schools implies that there is a single, identifiable solution. The problems with education are quite varied (too many behavioral problems, underfunded schools, large class sizes), necessitating a more nuanced set of solutions (more school counselors, more teachers, multilingual environments). Thus, our educational problems are vast and varied, and they necessitate a wide spectrum of solutions that acknowledge complex interrelationships and educational discourses (Weaver-Hightower, 2008). Mr. Duncan's supposed reliance on "evidence" as the pathway to our educational solutions must also be viewed with skepticism. For administrators, it is an era of data- or evidence-based decision making; however, evidence born of high-stakes testing contexts is inherently unreliable and invalid, as we review throughout this chapter. It is equally important that administrators be suspicious of their own inherently biased interpretations of what that "evidence" represents.
Spillane and Miele (2007, p. 48) put it this way:

Contrary to many people's intuitions, a single piece of information (e.g., "test scores are down") does not entail a single conclusion (e.g., "classroom instruction is poor"). This is in part because information is always interpreted with respect to a person or organization's existing beliefs, values, and norms. These constructs serve as the lens through which new information is understood and thus influence how information gets shaped into evidence.

By considering the inadequacy of high-stakes testing and accountability for fomenting equity, this chapter seeks to push the field toward a new paradigm of standards and assessment as an ecology and to move the field beyond the uneasy dichotomy that currently pits assessment as a technical exercise involving the quantification of cognitive abilities against assessment as the humanistic endeavor of portraying learners' qualitative development (Falsgraf, 2008). Ultimately, if standards and high-stakes tests do not provide a quality assessment of knowledge or cognitive ability as measured by college or workforce readiness (higher education and career success), then a more ecological approach is suggested: the development of a multiple-measures approach that entails broader subjective and objective assessments that can better predict long-term student success.

Returning to the initial framing of the genesis of high-stakes testing and accountability, administrative progressives and pedagogical/student-focused progressives represent an inherent tension in American education. The persistence of high-stakes accountability flies in the face of empirical research: there is no evidence to show that such a system is improving children's education in low-performing schools over the long term (especially as measured by graduation and dropout rates). Proponents of accountability are very adept at using the language of equity and other seemingly "progressive" concepts to promote the current NCLB-inspired system. The ascendancy and dominance of the administrative/positivistic paradigm has co-opted the equity and student-centered language of pedagogical progressivism and has, in a sense, silenced that debate by creating the appearance of striving for an educational ideal even as that ideal is being undermined. School leaders who are pressed by high-stakes testing and accountability may welcome this chapter; however, those who support the rhetoric and the hegemony of these structures may be unlikely to support our conclusion. In response, we conclude with a thought proffered by Aronowitz and Giroux (1985, pp. 199-200):

…the debate about the reality and promise of U.S. education should be analyzed not only on the strength of its stated assumptions but also on the nature of its structured silences, that is, those issues which it has chosen to ignore or deemphasize. Such an analysis is valuable because it provides the opportunity to examine the basis of the public philosophy that has strongly influenced the language of the debate and the issues it has chosen to legitimate.

Clearly, equity provides a ratiocinative critique of high-stakes testing and accountability.

References

Achinstein, B., Ogawa, R. T., & Speiglman, A. (2004). Are we creating separate and unequal tracks of teachers? The effects of state policy, local conditions, and teacher characteristics on new teacher socialization. American Educational Research Journal, 41(3), 557-603.

Allington, R., & McGill-Franzen, A. (1992). Unintended effects of educational reform in New York. Educational Policy, 6(4), 397-414.
Amrein, A. L., & Berliner, D. C. (2002). High-stakes testing, uncertainty, and student learning. Education Policy Analysis Archives, 10(18). Retrieved June 5, 2007, from http://epaa.asu.edu/epaa/v10n18/

Amrein-Beardsley, A., Berliner, D. C., & Rideau, S. (2010). Cheating in the first, second, and third degree: Educators' responses to high-stakes testing. Education Policy Analysis Archives, 18(14). Retrieved July 15, 2010, from http://epaa.asu.edu/ojs/article/view/714

Anderson, J. D. (2004). The historical context for understanding the test score gap. National Journal of Urban Education and Practice, 1(1), 1-21.

Aronowitz, S., & Giroux, H. A. (1985). Education under siege: The conservative, liberal and radical debate over schooling. London: Bergin & Garvey.

Berliner, D. C. (2009). Rational response to high-stakes testing and the special case of narrowing the curriculum. Paper presented at the International Conference on Redesigning Pedagogy, National Institute of Education, Nanyang Technological University, Singapore.

Berliner, D. C., & Biddle, B. J. (1995). The manufactured crisis: Myths, fraud, and the attack on America's public schools. Reading, MA: Addison-Wesley.

Binet, A., & Simon, T. (1916). The development of intelligence in children. Baltimore: Williams & Wilkins.

Blanton, C. K. (2004). The strange career of bilingual education in Texas, 1836-1981. College Station: Texas A&M University Press.

Boake, C. (2002). From the Binet-Simon to the Wechsler-Bellevue: Tracing the history of intelligence testing. Journal of Clinical and Experimental Neuropsychology, 24, 383-405.

Campbell, D. (1975). Assessing the impact of planned social change. In G. Lyons (Ed.), Social research and public policies: The Dartmouth/OECD conference. Hanover, NH: Public Affairs Center, Dartmouth College.

Carnoy, M., & Loeb, S. (2003). Does external accountability affect student outcomes? A cross-state analysis. Educational Evaluation and Policy Analysis, 24, 305-331.

Carnoy, M., Loeb, S., & Smith, T. (2001). Do higher state test scores in Texas make for better high school outcomes? (CPRE Research Report No. RR-047). Philadelphia, PA: Consortium for Policy Research in Education. (ERIC ED478984)

Chapman, L. (2007). An update on No Child Left Behind and national trends in education. Arts Education Policy Review, 109(1), 25-36.

Chapman, P. (1981). Schools as sorters: Testing and tracking in California, 1910-1925. Journal of Social History, 14, 701-717.

Cizek, G. J. (1999). Cheating on tests: How to do it, detect it, and prevent it. Mahwah, NJ: Erlbaum.

Cohen, D. (1996). Standards-based school reform: Policy, practice, and performance. In H. F. Ladd (Ed.), Holding schools accountable. Washington, DC: The Brookings Institution.

Cuban, L. (1988). Constancy and change in schools, 1880 to present. In P. Jackson (Ed.), Contributing to educational change. Berkeley, CA: McCutchan.

Cullen, J., & Reback, R. (2006). Tinkering toward accolades: School gaming under a performance accountability system (NBER Working Paper No. 12286). Cambridge, MA: National Bureau of Economic Research.

Dewey, J. (1938). Experience and education. New York, NY: Macmillan.

Edmonson, A. (2003, September 21). Exams test educator integrity—emphasis on scores can lead to cheating, teacher survey finds. The Commercial Appeal.

Falsgraf, C. (2008). The ecology of assessment. Language and Teaching, 42(4), 491-503.

Figlio, D. N. (2006). Testing, crime, and punishment. Journal of Public Economics, 90(4-5), 837-851.
Figlio, D., & Getzler, L. (2002). Accountability, ability and disability: Gaming the system? (NBER Working Paper No. 9307). Cambridge, MA: National Bureau of Economic Research.

Figlio, D., & Winicki, J. (2004). Food for thought: The effects of school accountability plans on school nutrition. Journal of Public Economics, 89, 381-394.

Gamson, D. (2007). Historical perspectives on democratic decision making in education: Paradigms, paradoxes, and promises. In P. A. Moss (Ed.), Evidence and decision making: The 106th yearbook of the National Society for the Study of Education, Part 1 (pp. 15-45). Malden, MA: Blackwell.

Giordano, G. (2005). How testing came to dominate American schools: The history of educational assessment. NY: Peter Lang.

Glass, G. V. (2003 revision). Standards and criteria redux. Retrieved June 5, 2007, from http://glass.ed.asu.edu/gene/papers/standards/

Glass, G. V. (2008). Fertilizers, pills, and magnetic strips: The fate of public education in America. Charlotte, NC: Information Age Publishing.

Gould, S. J. (1996). The mismeasure of man. NY: W. W. Norton & Co.

Grissmer, D., & Flanagan, A. (1998). Exploring rapid achievement gains in North Carolina and Texas: Lessons from the states. Washington, DC: National Education Goals Panel.

Grubb, N. (1985). The initial effects of House Bill 72 on Texas public schools: The challenges of equity and effectiveness. Austin: Lyndon B. Johnson School of Public Affairs.

Haney, W. (2000). The myth of the Texas miracle in education. Education Policy Analysis Archives, 8(41). Retrieved from http://epaa.asu.edu/epaa/v8n41/

Herman, J. L., & Haertel, E. H. (Eds.). (2005). Uses and misuses of data for educational accountability and improvement: The 104th yearbook of the National Society for the Study of Education, Part II. Malden, MA: Blackwell.

Hollinger, R. C., & Lanza-Kaduce, L. (1996). Academic dishonesty and the perceived effectiveness of countermeasures: An empirical survey of cheating at a major public university. NASPA Journal, 33, 292-306.

Jacob, B. (2002). Accountability, incentives and behavior: The impact of high-stakes testing in the Chicago public schools (NBER Working Paper No. 8968). Cambridge, MA: National Bureau of Economic Research.

Jacob, B. (2005). Accountability, incentives and behavior: The impact of high-stakes testing in the Chicago Public Schools. Journal of Public Economics, 89(5-6), 761-796.

Jacob, B., & Levitt, S. (2002a). Rotten apples: An investigation of the prevalence and predictors of teacher cheating (NBER Working Paper No. 9413). Cambridge, MA: National Bureau of Economic Research. Retrieved June 5, 2007, from http://www.nber.org/papers/w9413/

Jacob, B., & Levitt, S. (2002b). Catching cheating teachers: The results of an unusual experiment in implementing theory (NBER Working Paper No. 9414). Cambridge, MA: National Bureau of Economic Research. Retrieved June 5, 2006, from http://www.nber.org/papers/w9414/

Jencks, C., & Phillips, M. (Eds.). (1998). The Black-White test score gap. Washington, DC: Brookings Institution Press.

Jennings, J., & Beveridge, A. (2009). How does test exemption affect schools' and students' academic performance? Educational Evaluation and Policy Analysis, 31(2), 153-175.

Jensen, L. A., Arnett, J. J., Feldman, S., & Cauffman, E. (2002). It's wrong, but everybody does it: Academic dishonesty among high school and college students. Contemporary Educational Psychology, 27, 209-228.

Johnson, R. (2008, October). Texas public school attrition study, 2007-08: At current pace, schools will lose many more generations. IDRA Newsletter. Retrieved from http://www.idra.org/newsletterplus/October_2008/
Jones, B., & Egley, R. (2004). Voices from the frontlines: Teachers' perceptions of high-stakes testing. Education Policy Analysis Archives, 12(39). Retrieved June 5, 2007, from http://epaa.asu.edu/epaa/v12n39/

Jones, M. G., Jones, B., & Hargrove, T. (2003). The unintended consequences of high-stakes testing. Lanham, MD: Rowman & Littlefield.

Klein, S. P., Hamilton, L. S., McCaffrey, D. F., & Stecher, B. M. (2000). What do test scores in Texas tell us? (Issue Paper). Santa Monica, CA: RAND.

Kossan, P. (2005, May 13). State deems failing grades good enough to pass AIMS. Arizona Republic. Retrieved June 5, 2007, from http://www.azcentral.com/arizonarepublic/news/articles/0513scores13.html

Labaree, D. F. (2005). Progressivism, schools and schools of education: An American romance. Paedagogica Historica, 41, 275-288.

Linn, R. L., Graue, M. E., & Sanders, N. M. (1990). Comparing state and district test results to national norms: The validity of claims that "everyone is above average." Educational Measurement: Issues and Practice, 9(3), 5-14.

Linton, T. H., & Kester, D. (2003, March 14). Exploring the achievement gap between white and minority students in Texas: A comparison of the 1996 and 2000 NAEP and TAAS eighth grade mathematics test results. Education Policy Analysis Archives, 11(10). Retrieved March 5, 2009, from http://epaa.asu.edu/epaa/v11n10/

Lipman, P. (2004). High stakes education: Inequality, globalization and urban school reform. NY: RoutledgeFalmer.

Losen, D., Orfield, G., & Balfanz, R. (2006). Confronting the graduation rate crisis in Texas. Cambridge, MA: The Civil Rights Project at Harvard University.

Lynd, R. S., & Lynd, H. M. (1929). Middletown: A study in modern American culture. New York: Harcourt, Brace & World.

Mazzeo, C. (2001). Frameworks of state: Assessment policy in historical perspective. Teachers College Record, 103(3), 367-397.

McCabe, D. L., & Trevino, L. K. (1997). Individual and contextual influences on academic dishonesty: A multicampus investigation. Research in Higher Education, 38, 379-396.

McCaslin, M. (1996). The problem of problem representation: The Summit's conception of student. Educational Researcher, 25, 13-15.

McCaslin, M. (2006). Student motivational dynamics in the era of school reform. Elementary School Journal, 106, 479-490.

McDonnell, L. (2005). Assessment and accountability from the policymaker's perspective. In J. L. Herman & E. H. Haertel (Eds.), Uses and misuses of data for educational accountability and improvement: The 104th yearbook of the National Society for the Study of Education, Part II. Malden, MA: Blackwell.

McNeil, L. (2000). Contradictions of school reform: Educational costs of standardized testing. New York, NY: Routledge.

McNeil, L. (2005). Faking equity: High-stakes testing and the education of Latino youth. In A. Valenzuela (Ed.), Leaving children behind: How "Texas-style" accountability fails Latino youth. New York, NY: State University of New York Press.

McNeil, L. M., Coppola, E., Radigan, J., & Vasquez Heilig, J. (2008). Avoidable losses: High-stakes accountability and the dropout crisis. Education Policy Analysis Archives, 16(3). Retrieved June 20, 2009, from http://epaa.asu.edu/epaa/v16n3/

McNeil, L., & Valenzuela, A. (2001). The harmful impact of the TAAS system of testing in Texas: Beneath the accountability rhetoric. In M. Kornhaber & G. Orfield (Eds.), Raising standards or raising barriers? Inequality and high stakes testing in public education (pp. 127-150). NY: Century Foundation Press.
Mellon, E. (2010, June 7). Qualms arise over TAKS standards. The Houston Chronicle.

Moss, P. A. (Ed.). (2007). Evidence and decision making: The 106th yearbook of the National Society for the Study of Education, Part 1. Malden, MA: Blackwell.

Murdock, T. B., & Anderman, E. M. (2006). Motivational perspectives on student cheating: Toward an integrated model of academic dishonesty. Educational Psychologist, 41(3), 129-145.

National Commission on Excellence in Education. (1983). A nation at risk: The imperative for educational reform. Washington, DC: U.S. Department of Education.

Nichols, S., & Berliner, D. C. (2007a). Collateral damage: How high-stakes testing corrupts America's schools. Cambridge, MA: Harvard Education Press.

Nichols, S., & Berliner, D. C. (2007b). The pressure to cheat in a high-stakes testing environment. In E. M. Anderman & T. B. Murdock (Eds.), Psychology of academic cheating (pp. 289-312). NY: Elsevier.

Nichols, S. L., Glass, G. V., & Berliner, D. C. (2006). High-stakes testing and student achievement: Does accountability pressure increase student learning? Education Policy Analysis Archives, 14(1). Retrieved June 30, 2009, from http://epaa.asu.edu/epaa/v14n1/

Nichols, S. L., & Good, T. (2004). America's teenagers—myths and realities: Media images, schooling, and the social costs of careless indifference. Mahwah, NJ: Erlbaum.

Orel, S. (2003). Left behind in Birmingham: 522 pushed-out students. In R. Cossett Lent & G. Pipkin (Eds.), Silent no more: Voices of courage in American schools. Portsmouth, NH: Heinemann.

Orfield, G., & Kornhaber, M. L. (Eds.). (2001). Raising standards or raising barriers? Inequality and high stakes testing in public education. New York: The Century Foundation Press.

Orfield, G., Losen, D., Wald, J., & Swanson, C. B. (2004). Losing our future: How minority youth are being left behind by the graduation rate crisis. Cambridge, MA: The Civil Rights Project at Harvard University.

Pedulla, J., Abrams, L., Madaus, G., Russell, M., Ramos, M., & Jing, M. (2003). Perceived effects of state-mandated testing programs on teaching and learning: Findings from a national survey of teachers. Boston: National Board on Educational Testing and Public Policy.

Perlstein, L. (2007). Tested: One American school struggles to make the grade. NY: Henry Holt & Co.

Ravitch, D. (2010). The death and life of the great American school system: How testing and choice are undermining education. NY: Basic Books.

Rumberger, R., & Arellano Anguiano, B. (2004). Understanding and addressing the California Latino achievement gap in early elementary school (UC Linguistic Minority Research Institute Working Paper). Retrieved from http://lmri.ucsb.edu

Ryan, J. (2004). The perverse incentives of the No Child Left Behind Act. New York University Law Review, 79, 932-989.

Sacks, P. (1999). Standardized minds: The high price of America's testing culture and what we can do to change it. Cambridge, MA: Perseus Publishing.

Samelson, F. (1977). World War I intelligence testing and the development of psychology. Journal of the History of the Behavioral Sciences, 13(3), 274-282.

Scheurich, J. J., Skrla, L., & Johnson, J. F. (2003). Thinking carefully about equity and accountability. In L. Skrla & J. J. Scheurich (Eds.), Educational equity and accountability: Paradigms, policies and politics. NY: Routledge.

Shepard, L. A. (1990). Inflated test score gains: Is the problem old norms or teaching the test? Educational Measurement: Issues and Practice, 9(3), 15-22.
Smith, M., & O'Day, J. (1991). Systemic school reform. In S. Fuhrman & B. Malen (Eds.), The politics of curriculum and testing. NY: Falmer.

Southern Education Foundation. (2010). A new diverse majority. Atlanta, GA: Author.

Spillane, J., & Miele, D. (2007). Evidence in practice: A framing of the terrain. In P. Moss (Ed.), Evidence and decision making: The 106th yearbook of the National Society for the Study of Education, Part 1 (pp. 46-73). Malden, MA: Blackwell.

Starch, D., & Elliott, E. C. (1912). Reliability of the grading of high-school work in English. School Review, 20, 442-457.

Stutz, E. (2001, June 9). Bar for passing TAAS lowered; cutoff scores for math test debated. The Dallas Morning News.

Texas Comptroller of Public Accounts. (2001). Special report: Undocumented immigrants in Texas, a financial analysis of the impact to the state budget and economy (Publication No. 96-1224). Retrieved June 1, 2009, from http://www.window.state.tx.us/specialrpt/undocumented/undocumented.pdf

Texas Education Agency. (1998). Grade-level retention in Texas public schools, 1996–97 (Document No. GE08 601 07). Austin, TX: Author.

Texas Education Agency. (1999). 1996–97 report on high school completion rates. Retrieved June 16, 2009, from http://ritter.tea.state.tx.us/research/pdfs/9697comp.pdf

Texas Education Agency. (2000). Academic Excellence Indicator System reports, SY 2000–01. Retrieved June 1, 2009, from http://ritter.tea.state.tx.us/perfreport/aeis/2000/state.html

Texas Education Agency. (2001). Secondary school completion and dropouts in Texas public schools 1999–00. Retrieved June 15, 2009, from http://ritter.tea.state.tx.us/research/pdfs/9900drpt.pdf

Texas Education Agency. (2002). Three-year follow-up of a Texas public high school cohort. Retrieved June 11, 2009, from http://ritter.tea.state.tx.us/research/pdfs/wp06.pdf

Texas Education Agency. (2003a). Secondary school completion and dropouts in Texas public schools 2000–01. Retrieved June 15, 2009, from http://ritter.tea.state.tx.us/research/pdfs/0001drpt_reprint.pdf

Texas Education Agency. (2003b). Statewide TAAS results—percent passing tables, Spring 1994–Spring 2002, Grade 10, reading, mathematics, writing. Retrieved June 11, 2009, from http://ritter.tea.state.tx.us/student.assessment/reporting/results/swresults/august/g10all_au.pdf

Texas Education Agency. (2004a). Academic Excellence Indicator System: 2003–2004. Austin, TX: Author.

Texas Education Agency. (2004b). Secondary school completion and dropouts in Texas public schools 2001–02. Retrieved June 15, 2009, from http://ritter.tea.state.tx.us/research/pdfs/0102drpt.pdf

Texas Education Agency. (2005). Secondary school completion and dropouts in Texas public schools 2002–03. Retrieved June 15, 2009, from http://ritter.tea.state.tx.us/research/pdfs/dropcomp_2002-03.pdf

Texas Education Agency. (2006). Secondary school completion and dropouts in Texas public schools 2003–04. Retrieved June 15, 2009, from http://ritter.tea.state.tx.us/research/pdfs/dropcomp_2003-04.pdf

Texas Education Agency. (2007). Secondary school completion and dropouts in Texas public schools 2005–06. Retrieved June 15, 2009, from http://ritter.tea.state.tx.us/research/pdfs/dropcomp_2005-06.pdf

Texas Education Agency. (2008a). Grade-level retention in Texas public schools, 2006–07 (Document No. GE09 601 01). Austin, TX: Author.

Texas Education Agency. (2008b). Secondary school completion and dropouts in Texas public schools 2006–07. Retrieved June 15, 2009, from http://ritter.tea.state.tx.us/research/pdfs/dropcomp_2006-07.pdf
Texas Education Agency. (2009). Statewide TAKS performance results. Retrieved January 27, 2010, from http://www.tea.state.tx.us/index3.aspx?id=3220&menu_id3=793

Texas Senate Bill 7, 73rd Texas Legislature, Education Code § 16.007 (1993).

Thorndike, E. L. (1923). Education: A first book. NY: Macmillan.

Timmer, A., & Williamson, J. (1998). Immigration policy prior to the 1930s: Labor markets, policy interactions, and global backlash. Population and Development Review, 24(2), 739-771.

Toenjes, L. A., & Dworkin, A. G. (2002). Are increasing test scores in Texas really a myth, or is Haney's myth a myth? Education Policy Analysis Archives, 10(17). Available from http://epaa.asu.edu/epaa/v10n17/

Tyack, D. (1974). The one best system: A history of American urban education. Cambridge: Harvard University Press.

Tyack, D. (1976). Ways of seeing: An essay on the history of compulsory schooling. Harvard Educational Review, 46, 355-389.

Tyack, D., & Cuban, L. (1995). Tinkering toward utopia: A century of public school reform. Cambridge, MA: Harvard University Press.

Tyack, D., James, T., & Benavot, A. (1987). Law and the shaping of public education, 1785–1954. Madison: University of Wisconsin Press.

U.S. Census Bureau. (2001). The Hispanic population reports: Census 2000 brief. Washington, DC: U.S. Department of Commerce, Economics and Statistics Administration.

U.S. Census Bureau. (2006). U.S. population estimates by age, sex, race, and Hispanic origin: July 1, 2005. Washington, DC: U.S. Department of Commerce, Economics and Statistics Administration.

U.S. Census Bureau. (2008). An older and more diverse nation by midcentury (U.S. Census Bureau News, Press Release CB08-123). Washington, DC: Author.

U.S. Census Bureau. (2009). School enrollment—social and economic characteristics of students: October 2008. Retrieved November 9, 2009, from http://www.census.gov/population/www/socdemo/school/cps2008.html

Valencia, R. R., & Bernal, E. M. (2000). An overview of conflicting opinions in the TAAS case. Hispanic Journal of Behavioral Sciences, 22(4), 423-443.

Vasquez Heilig, J. (2011). Understanding the interaction between high-stakes graduation tests and English language learners. Teachers College Record, 113(12).

Vasquez Heilig, J., Cole, H., & Aguilar, A. (2010). From Dewey to No Child Left Behind: The evolution and devolution of public arts education. Arts Education Policy Review, 111(4), 136-145.

Vasquez Heilig, J., & Darling-Hammond, L. (2008). Accountability Texas-style: The progress and learning of urban minority students in a high-stakes testing context. Educational Evaluation and Policy Analysis, 30(2), 75-110.

Vasquez Heilig, J., Dietz, L., & Volonnino, M. (2011). From Jim Crow to the Top 10% Plan: A historical analysis of Latino access to a selective flagship university. Enrollment Management Journal: Student Access, Finance, and Success in Higher Education, in press.

Vasquez Heilig, J., Williams, A., & Young, M. (2011). At-risk student averse: Risk management and accountability. Working paper, University of Texas at Austin.

Vygotsky, L. S. (1978). Mind in society (M. Cole, V. John-Steiner, S. Scribner, & E. Souberman, Eds.). Cambridge, MA: Harvard University Press.

Weaver-Hightower, M. B. (2008). An ecology metaphor for educational policy analysis: A call to complexity. Educational Researcher, 37(3), 153-167.

Young, M. D., & Brewer, C. (2008). Fear and the preparation of school leaders: The role of ambiguity, anxiety and power in meaning making. Journal of Education Policy, 22(1), 106-129.
Zastrow, C., & Janc, H. (2004). Academic atrophy: The condition of the liberal arts in America's public schools. Washington, DC: Council for Basic Education.

Figure 1. Texas Assessment of Academic Skills (TAAS) Exit Math: Percent meeting minimum standards (1994–2002). Source: Statewide TAAS Results, by the Texas Education Agency, 2003b.

Figure 2. Texas Assessment of Academic Skills (TAAS) Exit Reading: Percent meeting minimum standards (1994–2002). Source: Statewide TAAS Results, by the Texas Education Agency, 2003b.

Figure 3. Texas Assessment of Knowledge and Skills (TAKS) Exit Math: Percent meeting minimum standards (2003–2009). Source: Statewide TAKS Performance Results, by the Texas Education Agency, 2009.

Figure 4. Texas Assessment of Knowledge and Skills (TAKS) Exit English Language Arts: Percent meeting minimum standards (2003–2009). Source: Statewide TAKS Performance Results, by the Texas Education Agency, 2009.

Figure 5. Dropout rates (1995–2008). Source: Secondary school completion and dropout data from the Texas Education Agency.vi

Figure 6. Cohort dropout rates (1998–2008). Source: Secondary school completion and dropout data from the Texas Education Agency.vii

Figure 7. Cohort graduation rates (1996–2008). Source: Secondary school completion and dropout data from the Texas Education Agency.viii

Figure 8. From http://www.idra.org/IDRA_Newsletter/October_2009_School_Holding_Power/Texas_Public_School_Attrition_Study_2008_09/

Figure 9. Achieve, Inc. (2005). NAEP vs. state proficiency 2005. Retrieved June 5, 2007, from http://www.achieve.org/node/482

i Although Asian and Latin American immigration increased steadily through much of the 19th century and the start of the 20th century, these regions still contributed substantially fewer newcomers than Europe during this time period.

ii Southern Education Foundation (2010) relates, "Six of these states were in the South and five, including Hawaii, were in the West. Nine of the ten states in the continental US were at or near the nation's southern border. Latina/os represented almost nine out of every 10 non-White students in the West, where there was also a higher percentage of Asian-Pacific students (9 percent) than African American students (6 percent). African Americans were not the largest non-White student group in any of the Western states."

iii As Mazzeo (2001) points out, other states and regions adopted various forms of high-stakes testing (e.g., New York). We focus on Texas because of its history of experimenting with accountability-based testing and because of our familiarity with the resultant data.

iv The most recent publicly available data at the time of writing were utilized in the research.

v To understand student leavers, a cohort method is more desirable than the yearly snapshot because it considers what happens to a group of students over time and is based on repeated measures of each cohort to reveal how students progress in school. The cohort method is more accurate than the yearly snapshot dropout rate that TEA has historically highlighted in the public sphere.

vi Data from TEA, 2001, 2003a, 2004b, 2005, 2006, 2007, 2008b.

vii In 1999, TEA first reported longitudinal EL dropout rates for students who left the system between Grades 9 and 12 for reasons other than obtaining a GED certificate or graduation (TEA, 2002). Data from TEA, 2001, 2003a, 2004b, 2005, 2006, 2007, 2008b.
viii Data from TEA, 2001, 2003a, 2004b, 2005, 2006, 2007, 2008b.