NCLB: Changing It; Fixing It; Living With It

NATIONAL ASSOCIATION OF TEST DIRECTORS 2007 SYMPOSIUM

Organized by: Bonnie Wilkerson, Ed.D., Northbrook (IL) School District 27
Edited by: Joseph O'Reilly, Ph.D., Mesa (AZ) Public Schools

National Association of Test Directors 2007 Proceedings

This is the twenty-third volume of the published symposia, papers and surveys of the National Association of Test Directors (NATD). This publication serves an essential mission of NATD - to promote discussion and debate on testing matters from both a theoretical and practical perspective. In the spirit of that mission, the views expressed in this volume are those of the authors and not NATD. The papers and discussant comments in this volume were presented at the April 2007 meeting of the National Council on Measurement in Education (NCME) in Chicago, Illinois.

The authors, organizer and editor of this volume are:

Robert Linn
University of Colorado at Boulder
PO Box 1815
100 Fifth St
Ouray, CO 81427
robert.linn@colorado.edu

Judy Feil
Ohio Department of Education
Office of Assessment, Mail Stop #507
25 South Front Street
Columbus, OH 43215
judy.feil@ode.state.oh.us

David Kroeze
Northbrook School District 27
1250 Sanders Road
Northbrook, IL
kroeze.d@northbrook27.k12.il.us

Barbara Boyd
Education Partnerships Officer, Nationwide Mutual Insurance Company
One Nationwide Plaza
Columbus, OH 43215
boydb@nationwide.com

Glynn Ligon
ESP Solutions Group
8627 N. Mopac, Suite 400
Austin, TX 78759
gligon@espsg.com

Bonnie Wilkerson, Organizer and Moderator
Northbrook School District 27
1250 Sanders Road
Northbrook, IL
wilkerson.b@nb27.org

Joe O'Reilly, Editor
Mesa Public Schools
63 East Main Street #101
Mesa, AZ 85201
(480) 472-0241
joreilly@mpsaz.org

Table of Contents

A Nationwide Overview of NCLB
    Robert Linn
The State of NCLB at a State Department of Education
    Judy Feil
Is A High Performing District’s Performance High Enough for NCLB?
    David Kroeze
Business Looks at NCLB’s Bottom Line
    Barbara Boyd
Discussant Comments
    Glynn Ligon

Needed Modifications of NCLB

Bob Linn
CRESST, University of Colorado at Boulder

NCLB has much that is worthwhile. It is particularly praiseworthy for its emphasis on all children, for the special attention it gives to improving learning for children who have lagged behind in the past, and for the attention given to closing persistent gaps in achievement. Although there are definition and identification problems, NCLB has also called attention to the need to have qualified teachers. It is important that the positive aspects of NCLB be preserved in the reauthorization process that is now under consideration. It is also important, however, that some fundamental changes be made to make the law more functional. Among the major issues that need to be addressed are funding, providing increased flexibility to states, districts, and schools in implementing the law, and defining teacher quality.
My focus, however, is much narrower and more specific: fixing the NCLB accountability system. In my judgment, there are several fundamental problems with the NCLB accountability system, and those problems are serious enough that they threaten to undermine the more laudable aspects of the law. There are many problems with the NCLB accountability system, but I will focus on just four of them that I believe are especially serious. These are: (1) unrealistic expectations, (2) the meaning of proficient achievement, (3) the reliance on current-status targets, and (4) the use of multiple hurdles. A fifth problem is really caused by states trying to do something reasonable in light of the four problems just identified: keeping an overabundance of schools from failing to make adequate yearly progress (AYP). States have found ways to game the system, approved by the U.S. Department of Education, to avoid having many more schools identified as needing improvement than can possibly be provided with effective assistance. I include as approved game playing such activities as watering down the definition of proficient achievement, backloading state trajectories to the 2014 goal of 100% proficiency, increasing the minimum number of students required to hold schools responsible for subgroup results, and the use of confidence intervals, both for initial AYP calculations and for safe-harbor calculations (see, for example, Porter, Linn, & Trimble, 2005 for a discussion of some of these issues).

Unrealistic Expectations

The title of the law, No Child Left Behind, is rhetorically brilliant. No reasonable person would argue that the education of some children should be ignored or that some identified fraction of students should be left behind. Our society has a moral obligation to attempt to provide a high quality education for all children. Society also has a vested interest in enhancing the educational level of all students.
This does not mean, however, that it is reasonable to expect that all students will, in fact, achieve a high level of proficiency in reading and mathematics. As will be discussed in the next section, NCLB does not provide a detailed definition of proficient achievement, but it does specify that states must set “challenging achievement standards” and that the proficient level be one of two high levels of achievement “that determine how well children are mastering the material in the State academic content standards” (NCLB, 2001, Part A, Subpart 1, Sec. 1111 (b) (D) (ii)). The NCLB mandate that all children achieve at least the proficient level is totally unrealistic when coupled with the mandate that states set a “challenging” proficient achievement standard. As Rothstein, Jacobsen, and Wilder (2006) so aptly put it, “‘proficiency for all’ is an oxymoron” (p. 16).

I have previously used trends observed in reading and mathematics on the National Assessment of Educational Progress (NAEP) and the distributions of performance of students from other countries on the Third International Mathematics and Science Study (TIMSS) to show that 100% proficient is a goal that is completely out of reach even with extraordinary effort on the part of teachers and students (Linn, 2000; 2003; in press). Although there have been fairly sizeable increases in the percentage of students who score at the proficient level or above on the NAEP mathematics assessments, particularly at the fourth grade, the rate of improvement would have to be several times as fast in the next twenty years as it was in the last decade to come close to 100% by the year 2020, much less to achieve that level in just seven more years. In reading, where the trend lines on NAEP are best described as essentially flat, the prospect of obtaining universal proficiency is even more remote. As Rothstein and his colleagues point out, the issue is more than just the time line.
“There is no date by which all (or even nearly all) students in any subgroup, even middle-class white students, can achieve proficiency” (Rothstein, Jacobsen, & Wilder, 2006, p. 1).

The results of international assessments (see, for example, Mullis, Martin, & Foy, 2005; Mullis, Martin, Gonzalez, & Kennedy, 2003; Organization for Economic Co-operation and Development, 2004) clearly demonstrate that there is substantial variation in student achievement in every participating country. As a result of the large within-country variation, no country has nearly all its students performing above some high standard of achievement that would correspond to proficient achievement as is demanded by NCLB. Based on a linking of the 1999 TIMSS to the 2000 NAEP, Phillips (2007), for example, has shown that while countries such as Singapore and Korea come reasonably close to having all their students perform at or above the NAEP basic level on grade 8 mathematics, no country has even three quarters of its students at or above the proficient level.

The only way to have all, or nearly all, students exceed a standard of performance is to set that standard at a very low level. A standard that is obtainable by the lowest achieving students will not be a challenge for moderately high achieving students. On the other hand, a standard that is challenging to moderately high achieving students will be unobtainable for the lowest achieving students (Rothstein, Jacobsen, & Wilder, 2006). If proficient remains a challenging standard of achievement and the universal proficiency goal is maintained for the year 2014, then nearly all schools will fail to meet the goal. Although it may be reasonable to believe that improvement is desirable for everyone and every school, it makes no sense to impose sanctions on schools because they fall short of reaching the unrealistic goal of all students achieving at the proficient level or above.
Goals should be ambitious, but they should also be obtainable given sufficient effort and adequate resources. How can a goal be set that is ambitious, but realistically obtainable? One way is to rely on past experience. Schools might be rank ordered in terms of the rate of improvement in student achievement on the state’s assessments in reading and in mathematics over the past 4 or 5 years. The highest ranking, say 20%, of schools in terms of gains made on each assessment could then be used to set the goal for all schools. If the top 20% increased the percentage of students performing at the proficient level or above by an average of, say, 2 percentage points per year in reading and, say, 3 percentage points per year in mathematics, then increases of 2 and 3 percentage points per year could be set as the goals for reading and mathematics respectively for all schools. Those would certainly be ambitious goals for schools that had shown little if any improvement or possibly even declined during the past 4 or 5 years, but they would also be based on the knowledge that continued improvement at the identified rate is possible, as demonstrated by the performance of the set of schools that were used to set the goals.

Proficient Achievement

As was previously noted, NCLB specifies that the proficient achievement standard should be challenging and represent a high level of achievement, but it does not give a detailed definition of proficient achievement. Rather, it is left to each state to define the proficient standard. States typically set their achievement standards for each assessment by convening a panel of judges who use a provisional definition of proficient achievement when reviewing items on a state assessment. Judges translate the provisional definition of proficient achievement into a cut score on the test using the standard setting method selected by the state. The stringency of the proficient standard varies widely from state to state.
This is evident from even a cursory consideration of the percentage of students who scored at the proficient level for different states. Olson (2005) reported the percentage of students who scored at the proficient level or above in reading and in mathematics at grades 4 and 8 for 47 states (see footnote 1). The percentage proficient or above in reading ranged from 35% to 89% at grade 4 and from 30% to 88% at grade 8. In mathematics the range was even larger, from 29% to 92% at grade 4 and from 16% to 87% at grade 8. These ranges are much larger than the corresponding ranges found on the 2005 NAEP state-by-state assessments in reading and mathematics at those grades. Furthermore, the states with extremely high or low percentages proficient or above make no sense in terms of other things that are known about education in those states. Only 16% of the students were proficient or above on the 2005 grade 8 mathematics assessment in Missouri, whereas 87% of the 8th graders in Tennessee were reported to be at that level. Even without resorting to a comparison of achievement on NAEP, it simply is not credible that more than 5 times as many 8th grade students are proficient in mathematics in Tennessee as in Missouri. It becomes even less plausible when the 71 percentage point difference on the state assessments is contrasted with the finding that the percentage of grade 8 students who scored at the proficient level or above on the NAEP mathematics assessment in 2005 was slightly higher for Missouri (26%) than for Tennessee (21%) (Perie, Grigg, & Dion, 2005).

There are several preferable approaches to reporting results in terms of percent proficient. For example, the standard could be defined to be equal to the state median score in a base year. The percentage of students scoring above that constant cut score would then be used to monitor improvement in achievement.
Target increases could be set based on past experience with the gains made by schools showing high rates of improvement in each of the last several years. This might lead to a figure something like an increase of 3 percentage points per year in the percentage of students above the state median. For a school with 50% of its students above the state median in the 2006 base year, the goal would be 53% in 2007 and 74% in 2014. That would represent a gigantic improvement in the achievement of the state’s students, but might not be totally unrealistic, and surely is not as implausible as 100% proficient or above.

Another alternative would be to use what Jim Popham (2004) has called grade-level descriptions. “At grade level” might correspond more closely to the “basic” than the “proficient” level in most states. Using past experience, targets could be set that would bring the achievement of an ever increasing percentage of students up to the “at-grade-level” standard.

Footnote 1: The closest grade was used for states that did not have assessments in one of the subjects at grade 4 or grade 8.

Current-Status vs. Growth or Improvement

Although the NCLB accountability system might appear to focus on improvement, as suggested by the word progress in AYP, it actually focuses on current status. Schools where students are already achieving at relatively high levels, for example, can actually have a decline in achievement from one year to the next and still make AYP. Schools with very low achievement initially, on the other hand, will routinely fail to meet AYP even if they show rather sizeable year-to-year gains in student achievement. With the exception of the rarely applicable safe harbor provision, AYP focuses on current achievement in a given year in comparison to an Annual Measurable Objective (AMO) for that year rather than changes in achievement from one year to the next.
Consequently, schools that are high achieving to begin with have a relatively easy time meeting AYP without any gains in achievement, at least in the first few years. On the other hand, schools with initially low achieving students would have to have extraordinary improvement in achievement to meet the AMO. Consequently, many schools that are actually showing considerable progress, and deserve recognition for the gains they are making, fail to meet AYP because of their initial low performance.

Basing evaluations of schools almost exclusively on current performance of students in relationship to fixed targets ignores the fact that schools differ substantially in the achievement of their students when they enter school. It privileges schools serving students who are already high achieving and puts schools serving initially low achieving students at a substantial disadvantage. The inference that School A is of low quality, or that the teachers in School A are less effective than those in School B, based solely on the fact that the percentage of students who are at the proficient level or above in a given year is smaller in School A than the corresponding percentage at School B, is simply not justified. There are many other possible explanations of the difference, most notably that the students in the two schools differed in their levels of achievement at the start of the year or when they entered first grade.

Many state-devised school accountability systems base their evaluations of schools on a combination of current status measures and improvement in student achievement from one year to the next. Therefore it is not surprising that a number of states have expressed interest in the possibility of changing the way in which AYP is determined for NCLB to allow greater emphasis on improvement.
A change in the NCLB accountability system that would allow schools to meet AYP either because their current achievement met a target or because the improvement in achievement met an improvement target seems desirable. This might be accomplished with a less stringent safe harbor criterion. Consistent with proposals above, both the current year achievement target and the improvement target should be set in light of what has been shown to be possible by schools that have shown substantial gains over a period of 4 or 5 years.

An alternative way of evaluating change in achievement that is attractive to several states is the use of longitudinal student records to track the growth in achievement for individual students. Analytical procedures, commonly referred to as value-added models, are used to estimate the school effects on student growth. Consideration should be given to the possibility of allowing states to use results of value-added analyses to provide evidence of improved achievement. The value-added results could be used, possibly in combination with status measures, to satisfy AYP requirements.

In response to widespread interest in approaches that focus on growth for purposes of determining AYP, the U.S. Department of Education authorized a pilot program that allowed states to submit proposals to use a growth model to make AYP determinations. The pilot program was announced by Secretary Spellings on November 21, 2005. Several “core principles” that must be met for a proposal to be approved were identified in a letter from Secretary Spellings to the Chief State School Officers regarding the pilot program. The first, and perhaps the most constraining principle, specifies that the growth model “must ensure that all students are proficient by 2013-2014 and set annual goals to ensure that the achievement gap is closing for all subgroups of students” (Spellings, 2005).
Thus, despite the argument that the expectation is unrealistic, the fixed achievement target of 100% proficient or above in 2013-2014 is maintained. Eight states submitted proposals to participate in the growth model pilot program and two of those proposals (North Carolina and Tennessee) were approved for implementation of growth model pilots in 2005-2006 (Spellings, 2006). Three more states (Arkansas, Delaware, and Florida) have been approved for 2006-2007 and nine additional states have submitted proposals that are currently under review (Olson, 2006).

The pilot program takes one step toward a system that would use information about improvement as well as current achievement in determining whether or not schools are performing adequately. This is an important step, but so far it applies to only a small fraction of the states. It is also limited by the continuing requirement that the amount of growth will lead to all students reaching at least the proficient level by 2014. The option of using improvement as well as current status to determine AYP needs to be available to more states and, as was argued above, more realistic achievement goals need to be set.

Many states lack a longitudinal data system that would allow them to implement a value-added model. Improvement in performance of students in those states could still be used in the determination of AYP by comparing the performance of student cohorts from one year to the next. Comparisons of successive cohorts of students (e.g., 4th grade students in 2006 compared to 4th grade students in 2005) lack some of the advantages of longitudinal tracking of student achievement, but can still provide information on changes in student achievement that would complement the comparisons of current performance to fixed targets each year.
Multiple Hurdles

There are many ways that a school can fail to meet the AYP requirements in a given year, but only one way that it can meet them. It must meet or exceed the participation rate requirements (95% of eligible students) for mathematics and reading/English language arts for the student body as a whole and for each subgroup of students where disaggregated reporting is required, and must meet or exceed the percent proficient or above targets for all students and for all subgroups. Thus, at a minimum, schools must clear 5 hurdles to make AYP.

Because of disaggregated reporting requirements for subgroups, schools with diverse student bodies are frequently confronted with many more than the 5 hurdles based on all students in the school. As the number of subgroups for which disaggregated reporting is required increases, the number of hurdles that a school must clear rapidly increases (Marion, White, Carlson, Erpenbach, Rabinowitz, & Sheinker, 2002). Thus a school with more than the minimum number of students in each of several subgroups identified for disaggregated reporting has substantially more than 5 hurdles to clear. For example, a school with 6 subgroups (African American students, Hispanic students, white students, students with limited English proficiency, economically disadvantaged students, and students with disabilities) meeting the minimum size requirement would have not 5, but 29, hurdles to clear: the 5 when all students in the school are considered as a whole, plus 24 for the 4 hurdles (participation rates in reading and mathematics, and achievement in reading and mathematics) for each of the 6 subgroups. Thus, the latter school could fail to make AYP in 29 different ways but could make AYP in only one way: by clearing all 29 hurdles.

Requiring schools to meet AYP requirements for separate subgroups of students is consistent with the NCLB goal of closing gaps in achievement for the identified subgroups.
Nevertheless, it is clear that NCLB’s multiple-hurdle approach makes it considerably more difficult for large schools with diverse student bodies to meet AYP requirements than it is for small schools or schools with homogeneous student bodies (Kim & Sunderman, 2005; Linn, 2005).

There are alternatives to the conjunctive system of multiple hurdles used in the NCLB school accountability system. The most obvious alternative is some form of a compensatory system. With a compensatory approach, high achievement that is above the goal in one content area can be used to compensate for achievement that falls below the goal in another area. If the AMO for a given year was 50% proficient or above in reading and 40% proficient or above in mathematics, for example, then a school where, say, 55% of its students were proficient or above in reading but only 38% of its students were proficient or above in mathematics could make AYP under a compensatory system, while it would fail to do so under the current multiple-hurdle system. A number of state accountability systems that were in place prior to the enactment of NCLB used a compensatory approach.

Conclusion

NCLB has the potential to make substantial positive contributions to education. It can contribute to the improvement of student achievement and, through its focus on students who have lagged behind and too often been ignored in the past, to the closing of achievement gaps among racial/ethnic groups, between economically disadvantaged students and their more affluent counterparts, between limited English proficient students and native English speakers, and between students with and without disabilities. Some features of the NCLB accountability system, however, need to be modified if the praiseworthy goals of NCLB are going to be achieved.
The most important modification is to set performance targets for judging adequate yearly progress that are more reasonable and for which there is a realistic hope that they might be achieved given sufficient effort. The need for more realistic goals applies to both the safe harbor provision of the law and to the annual performance targets.

The current definitions of proficient achievement established by states lack any semblance of a common meaning. Alternatives to defining proficiency should be considered that would provide more meaningful and comparable achievement targets. Past data on what schools showing exemplary gains in achievement have accomplished should be used to set goals that are ambitious, but obtainable with hard work, and those goals should be expressed in ways other than the currently poorly defined proficient academic achievement standards that vary wildly from state to state.

Changes to AYP requirements should be made that would allow schools to get credit for gains in achievement as well as absolute performance in a given year. The recently introduced pilot program that allows the use of longitudinal growth models by a small number of states is a step in that direction, but the fact that the unrealistic 100% proficiency requirement in 2013-2014 is maintained undercuts the value of the program. It is limited to just 2 states in 2005-2006 and 5 states in 2006-2007. Furthermore, gains made by schools in states without longitudinal tracking systems do not count toward making AYP if the school fails to meet the AMO for either reading/English language arts or mathematics.

Finally, the multiple-hurdle approach used to determine AYP should be replaced by a compensatory or hybrid approach. This would make the system fairer for schools that serve heterogeneous student bodies. It would also enhance the reliability of school classification.

References

Kim, J. S.
& Sunderman, G. L. (2005). Measuring academic proficiency under the No Child Left Behind Act: Implications for educational equity. Educational Researcher, 34(8), 3-13.

Linn, R. L. (2000). Assessments and accountability. Educational Researcher, 29(2), 4-14.

Linn, R. L. (2003). Accountability: Responsibility and reasonable expectations. Educational Researcher, 32(7), 3-13.

Linn, R. L. (2005, June 28). Conflicting demands of No Child Left Behind and state systems: Mixed messages about school performance. Education Policy Analysis Archives, 13(33).

Linn, R. L. (in press). Toward a more effective definition of adequate yearly progress. In G. Sunderman (Ed.), Holding NCLB accountable: Achieving accountability, equity, and school reform. Thousand Oaks, CA: Corwin Press.

Marion, S. T., White, C., Carlson, D., Erpenbach, W. J., Rabinowitz, A., & Sheinker, J. (2002). Making valid and reliable decisions in determining adequate yearly progress. A paper series: Implementing the state accountability requirements under the No Child Left Behind Act of 2001. Washington, DC: Council of Chief State School Officers.

Mullis, I. V. S., Martin, M. O., & Foy, P. (2005). IEA’s TIMSS 2003 international report on achievement in mathematics cognitive domains: Findings from a developmental project. Chestnut Hill, MA: International Study Center, Lynch School of Education, Boston College.

Mullis, I. V. S., Martin, M. O., Gonzalez, E. J., & Kennedy, A. M. (2003). PIRLS 2001 international report: IEA’s study of reading literacy achievement in primary school in 35 countries. Chestnut Hill, MA: International Study Center, Lynch School of Education, Boston College.

No Child Left Behind Act of 2001, Public Law No. 107-110.

Olson, L. (2005). Room to maneuver. Education Week, Special pull out section: A progress report on the No Child Left Behind Act, pp. S1-S6, December 14.

Olson, L. (2006). 3 states get OK to use “growth model” to gauge AYP. Education Week, 26(12), Nov. 15, p. 24.

Organization for Economic Co-operation and Development. (2004). Learning for tomorrow’s world: First results from PISA 2003. Paris, France: Author.

Perie, M., Grigg, W., & Dion, G. (2005). The nation’s report card: Mathematics 2005 (NCES 2006-453). U.S. Department of Education, National Center for Education Statistics. Washington, DC: U.S. Government Printing Office.

Phillips, G. W. (2007). Expressing international educational achievement in terms of U.S. performance standards: Linking NAEP achievement levels to TIMSS. Technical Report. Washington, DC: American Institutes for Research.

Popham, W. J. (2004). Ruminations regarding NCLB’s most malignant provisions: Adequate yearly progress. Available at //www.ctredpol.org/pubs/Forum28July2004.

Porter, A. C., Linn, R. L., & Trimble, S. (2005). The effects of state decisions about NCLB Adequate Yearly Progress targets. Educational Measurement: Issues and Practice, 24(4), 32-39.

Rothstein, R., Jacobsen, R., & Wilder, T. (2006). ‘Proficiency for all’ – An oxymoron. Paper presented at a Symposium, “Examining America’s commitment to closing achievement gaps: NCLB and its alternatives,” sponsored by the Campaign for Educational Equity, Teachers College, Columbia University, November 13-14.

Spellings, M. (November 21, 2005). Letter to Chief State School Officers, announcing growth model pilot program, with enclosures. Available at: http://www.ed.gov/nclb/landing.jhtml.

Spellings, M. (May 17, 2006). Press release. Secretary Spellings approves Tennessee and North Carolina growth model pilots for 2005-2006. Available at http://www.ed.gov/news/pressreleases/2006/05/05172006a.html.
The State of NCLB at a State Department of Education

Judy Feil
Ohio State Department of Education

We all know that NCLB is about more than annually measuring students’ academic achievement in English/Language Arts and mathematics in grades 3-8 and once in the grade span 9-12, and in science once in each of three different grade spans. NCLB is about the accountability of schools, districts, and states for the learning of all students and for improving the educational systems that serve those students. It is about providing resources, both fiscal and human, to struggling schools; giving technical assistance to schools that need help; training and hiring highly qualified teachers at all grade levels; and allowing school choice for parents in an effort to improve the educational opportunities of all students, close the achievement gap, and provide a fairer and more equitable education to all groups. Are these lofty goals? Yes. And who can argue with those goals? Who is opposed to any of them? The underlying premise is that NCLB will improve the educational system in America, helping to raise the standards and expectations for all: states, districts, schools, and students.

I would like to share some of the speed bumps along this road to improving opportunities for students, and maybe even share some potholes that states have fallen into and are struggling to overcome. Let me remind everyone, although that is probably not necessary, that AYP has three parts, two of which are dependent upon a state assessment system. That assessment system is the foundation upon which all the other tenets rest. It is the topic of my presentation today.

There are three topics I wish to discuss today: some demands of NCLB, the costs of NCLB, and the balancing act required to make NCLB work. These topics are in the realm of my expertise as a state testing director.
Remember that Ohio is not unique but is one of 50 states struggling to comply with NCLB and, I might add, making it work.

Most of you are aware of the many demands on the state assessment system under NCLB. Only a very few are listed here:

• Statewide System of
  – Challenging Academic Content Standards
  – Challenging Academic Achievement Standards
  – Annual High-Quality Assessments
• Assessments with High Technical Quality
• Inclusion of All Students in the Assessment System
• Comparability of Test Results

Some key words are highlighted: challenging, annual, high technical quality, all students, and comparability. Even though the balancing act required under NCLB is listed as the third topic of discussion today, it is interwoven into all the remarks, subtly underlying all the assessment work.

Consider the word “challenging” alongside “all students.” What does the word challenging mean when asked of the entire student population? Challenging for the students who routinely perform at a low level of academic achievement means something entirely different from what it means for the students who perform at a high level of academic achievement. What is challenging to one group of students may represent the impossible for another group of students.

Also consider the “comparability” of test results and “all students.” All students in the state must be given the same or a comparable assessment. Some students should be provided accommodations and/or special versions in order to have access to the assessment. Are their results truly comparable to those of the student who just barely misses qualifying for the accommodations? How can the results from the portfolio used as the alternate assessment for the significantly cognitively disabled student be comparable to the scores from a paper-and-pencil, on-demand assessment? States must do a remarkable balancing act to demonstrate comparability of test results to the NCLB peer reviewers.
And lastly, consider the “annual” assessments coupled with “high technical quality.” Most people would agree that the large-scale standardized assessments developed by the testing industry for decades were of high technical quality. But those products have been shelved. The timeline for the testing companies to develop an assessment and collect evidence of high technical quality was between 5 and 7 years. Gone also is that timeline. States are required to produce annual assessments of high technical quality, in most cases with unique assessments every year. Sadly, quality must be balanced against expediency. Despite these seemingly overwhelming challenges, states, to date, have been relatively successful.

The Costs of NCLB

There are three primary costs associated with the state assessment system linked to the high demands of NCLB: fiscal costs, human costs, and public perception or credibility costs.

Fiscal Costs

As you can see from Chart 1, the costs for the state assessment system required by federal law have risen steadily over the approximately 5 ½ years since NCLB was signed into law. The red line is the federal appropriation for the mandated state assessments. The blue line represents the costs of the assessment contracts with the testing companies working on the Ohio project. As new tests came on line to complete the required battery of mandated tests, the costs increased; as the state worked to test all students, the costs increased; and as the state worked to establish the high quality and comparability of the assessments, special versions, alternate assessments, and accommodations, the costs increased.
Chart 1. State Assessment Costs and Federal Appropriation
[Line graph: dollar costs, $0 to $90,000,000, by fiscal year, 2002-2007. Series: Federal Appropriation; Ohio Assessment Costs.]

As can be seen in Chart 2, 28% was the highest proportion of the assessment costs paid for by the federal appropriation, but as the state costs increased, the appropriation remained relatively flat, and it has now dropped to just 15.5% of the assessment costs. Need it be said: unfunded mandate.

Chart 2. Federal Appropriation as Percentage of Total Costs
[Chart: percentage, 0% to 35%, by fiscal year, 2002-2007. Data labels shown: 28.0%, 23.9%, 18.5%, 17.7%, 15.5%, and 0.0%.]

Human Costs

We have all seen the headlines about testing companies struggling to meet the customized testing needs of the 50 states and the resulting problems when the highly taxed testing industry stumbles: tests not delivered on time, tests scored incorrectly, test results reported incorrectly, or web-based online testing systems failing. The days of the off-the-shelf products that were the mainstay of the testing industry for decades are over, and those tests are for the most part being put back on the shelf. Not only is customization the new requirement, but customization to different state standards is the norm. As a result of the strain on both the testing industry and the state departments, new partnership relationships are essential for survival in the current environment. States are assuming more of the quality assurance roles, are more closely monitoring the testing contractors, and are more involved in the test development process than ever before.
In order to meet the ever increasing challenges of NCLB, states and their respective testing companies must work as teams to jointly resolve problems and find solutions to the challenges if both entities are to survive. But that too has a price in human costs.

Over the past six years, Ohio has seen the number of test forms increase yearly until leveling off in 2007 (Chart 3). The green line represents the number of general education assessments. The red and blue lines represent, respectively, the number of alternate assessments and special versions: Braille, large print, English audio on CDs, Spanish bilingual, and 5 foreign language translations on CDs. Please note that the number of test forms for the students needing special versions or the alternate assessments (representing the “all” in NCLB) far exceeds the number of test forms for the general education population. The numbers displayed here are Ohio’s response to the NCLB requirements. Other states would have similar numbers.

Chart 3. Number of Tests
[Line graph: number of forms, 0 to 300, by fiscal year, 2002-2007. Series: General Education; Alternate Assessment; Special Versions; Total Forms.]

Herein lies one of the potholes into which some states have fallen on their road to NCLB peer review approval. Because the alternate assessments and special versions are much more costly on a per-student basis, and because of manpower shortages, some states have not dedicated the necessary resources to those groups. They have failed to find the correct balance between the various needs.

These graphs represent the yearly forms for the Ohio statewide assessment system, which is monitored by the assessment office. The assessment office in Ohio has formed a successful partnership with its testing contractor to produce high quality assessments annually.
While the number of test forms has increased over the past several years, the number of assessment staff members in Ohio increased only slightly until 2007, when the number of employees dropped (Chart 4). Because the number of employees has not increased at the same rate as the number of tests, the work load for each individual employee has increased.

Chart 4. Staffing for Assessment
[Chart: number of people, 0 to 30, by fiscal year, 2002-2007.]

Other data give a different view of the assessment office staffing. Of the original 17 employees in the office in 2002, only 4 are still in the assessment office. Of the current staff, only 4 have previous experience in large-scale test development and/or test scoring; 4 have technical experience in data management and statistics; and there have been 3 testing directors in the six years since NCLB was signed into law. Please do not get me wrong; I am not complaining. Ohio has a relatively large assessment staff, for which I am very thankful. Other states are not so fortunate.

The very nature of the work of the assessment office is sometimes a source of high human costs. The work is very specialized and requires extensive attention to detail. It can be tedious and time consuming, with little recognition for a job well done. There is a steep learning curve for the work involved. It takes between 3 and 6 months for a new employee to be trained to work efficiently, effectively, and accurately in assessment work. That is why turnover is so costly. The work is highly stressful, both because of the work load and because of the recognition of the high-stakes nature of the results. Yes, the stakes are high for schools, districts, and the state. There is an ever increasing need for high quality work and continuous oversight of all phases of the assessment system. There is very little margin or tolerance for error.
Because of the condensed timelines of annual high quality tests, there is no forgiveness for missed deadlines. The work is ongoing, with little down time. Because the department is a public organization, the staff must be responsive to the public’s requests for help and information. Not only must the staff be available to help with customer service, but they must strive to make sure the assessment administration logistics run as smoothly as possible. All aspects of the assessment administration and reporting must be customized to the various stakeholder groups so that every group has the help and information it needs to administer the tests and understand the results.

You must be thinking that what I am describing is an impossible task. That is not the case. Most states have been successful in meeting the NCLB demands. Because of the dedicated, hard-working assessment staff and the teamwork model employed, the yearly output from the Ohio office is over 7,500 new test questions, with oversight of an average of 90 committee meetings annually. Including embedded field test forms, there are between 500 and 600 unique forms developed each year. Ohio administers and scores tests and reports test results for approximately 2,380,000 students on an annual basis.

Credibility Costs

The last cost I want to discuss is the cost to the credibility of the assessment systems and the accountability system. We have all seen the sensational headlines about state testing programs.
These are a few of the headlines Ohio has seen in the past two years: “Kids Suffer When Test Fails,” “16 Local School Districts Have Students Affected by Test Error,” “Tests wrongly marked fail,” “Company’s error sent more than 900 students word they didn’t pass,” “Ohio Student Claim Test Confusing,” and “Test gaffe sent wrong message to student.” One article began with the line “Errors on the state’s test do more than jeopardize the credibility of Ohio’s accountability system. They risk significant harm to individual students.” As this opening line states, the harm from testing errors falls on the student. But I contend that real harm in the NCLB environment can also happen to the school, district, and state. Gone are the days of administering the shelf products and the relatively low stakes of the test results. Under NCLB’s requirement for customized assessments, the states have ownership of the tests and are consequently responsible for both the success and the failure of those tests. Sadly, when any aspect of the testing system fails, the credibility of the state department is at stake.

Not only is the state ultimately responsible for the assessment system, but it must walk a fine line to keep external pressures in relative equilibrium. There are many different sources of pressure on a state’s assessment system (Figure 1): the federal mandates, state laws, state policy makers, the state board of education, the general public, and schools/districts. Each creates a different pressure for a different reason.

Figure 1. Sources of Pressures
[Diagram: pressures on the Assessment System from Federal Requirements, State Policy Makers, State Laws/Legislature, General Public/Press, State Board of Education, and Schools/Districts.]

Let me give a few examples.
Early on, in the initial stages of setting up the assessments, Ohio elected to use three types of items on the assessments: multiple choice, short answer, and extended response. Up until this year, Ohio has administered the grades 3-8 assessments in March, with the results due in May. There was a general feeling in some schools and districts that students would do better if they had more instruction time before they were administered the statewide tests. These groups put pressure on the state legislature to move the test administration date to May. After several years of debate, a state law was passed, effective this year, changing the test administration date to May, with the results due back in districts in time for the accountability local report card. Because of the federal requirement to report the AYP results before school begins, the deadline for test results remained the same. The pressure upon the state department was to make this change happen smoothly and error free. The change shortened the time for returning, scoring, and reporting the test results to about half the time previously allowed. Even though Ohio has worked closely with our testing contractor to work out the logistics of this compressed timeline, it was necessary to form a closer partnership with school districts to find solutions for the unforeseen consequences of this change. Have these new partnerships worked? We hope so, but we will not know for several months.

Another example of a balancing act is the 2% modified assessment. This assessment is voluntary, so there is no federal funding for this program. The regulations for this assessment were just finalized and released, and pressure is beginning to build within the state to create a new series of tests for this special population. Is that wrong? Of course not. Is it possible? Yes. So what is the problem?
The balancing act is meeting the needs of the students for whom the 2% modified assessment is intended when the program is not funded by the federal government because it is voluntary and the state budget language funds only mandatory tests. This is complicated by the increased interest in formative assessments and end-of-course exams that is sweeping the country and has been picked up by the state board of education and policy makers within the state. With limited fiscal and human resources, the assessment office will be doing a fine balancing act to make it all happen.

While the states contemplate and discuss, policy makers haggle, and politicians form alliances only to reconfigure into new support groups for the reenactment of NCLB scheduled for this year, quietly across America in every school and classroom, teachers teach and students learn. Isn’t that what NCLB is all about?

NCLB: Changing It, Fixing It, Living With It: Is a High Performing District’s Performance High Enough for NCLB?

David J. Kroeze
Northbrook [IL] School District 27

The No Child Left Behind Act of 2001 (NCLB) is noteworthy for its emphasis on addressing the needs of all children, in particular those students who typically underachieve and lag behind others. It also focuses the attention of all school districts in America on the need to use assessment data as a means to understand student growth and to make curricular and instructional decisions that specifically support the needs of low and underperforming students. The Act, though well-intentioned, has inherent implementation flaws that need to be addressed in its reauthorization. These flaws spotlight bureaucratic and political aspects of national proficiency and achievement at the expense of addressing the real-life issues that practitioners face in their day-to-day work with the teaching-learning process.
As a result, there are some unintended outcomes that pose challenges for all school districts and cause them to devote time and effort to restructuring and retrofitting components of their curriculum, instructional practice, and staffing. The adverse impact of NCLB has been widely documented (AASA, 2007; CCSSO, 2007; NGA, 2007; Linn, 2007; Nichols & Berliner, 2007; Popham, 2004). Nevertheless, the potential for this law to have lasting positive impact is very real.

This paper articulates the impact of NCLB on a high-performing school district, Northbrook School District 27 (NSD27). At the outset, I provide a brief context of NSD27 to give the reader an understanding of the District and the broader picture of our focus on organizational improvement and student performance within that context. Three areas are then addressed. First, this paper focuses on the positive aspects NCLB has brought to enhance our ability to meet students’ needs. The paper also addresses the adverse impact that the law has on our efforts to address performance, community, and staff issues because of its unrealistic expectations. The paper concludes with suggestions for improvement as NCLB moves toward reauthorization.

The Context of Northbrook School District 27

Located in the suburbs outside Chicago, NSD27 is a small elementary school district of approximately five square miles in a middle-to-upper-middle-class socio-economic community, with 15% of the population being non-White. It has a community population of 10,000 and a student enrollment of approximately 1,300. The District is one of four school districts in the town, serving the northwest segment, which accounts for approximately 30% of the village population.
The village founders believed that small school districts with their own governance structure would be able to better serve the needs of the community and provide closer access to school leaders and Board of Education members. NSD27 started out small in population but grew to 2,000 students in the early 1970s due to baby boomer births. Since that time it has experienced a steady enrollment decline, with short periods of modest increases. For the past five years enrollment has remained relatively stable.

NSD27 is a 154-year-old district with a history of high student academic performance. It is recognized for its performance by outside groups who judge the state’s more than 850 school districts. The District offers comprehensive support programs for students, along with many opportunities for students to be involved in extra-curricular activities.

NSD27 offers academic programs and professional student services based on curriculum guidelines and regulations established by the State Board of Education. Regular academic programs are offered in two elementary schools (K-3), one intermediate school (4-5), and one middle school (6-8). The District delivers regular educational programs via classroom and technology-based instruction, educational learning labs, and school-related activities. Educational program delivery occurs during the traditional school calendar, with a four-week summer school program. NSD27 provides other programs that include special education, instruction for English Language Learners (ELL), pre-kindergarten at-risk support, and gifted education. Numerous clubs, organizations, and sports programs provide before- and after-school extra-curricular activities.

Based on national, state, and local comparison data, NSD27 is considered an academically and financially high-performing system in Illinois. For an academically high-performing district, improving test scores is not the overarching goal.
Rather, the challenge is implementing an effective continuous improvement model to affect systemic change. Continuous improvement demands relentless pursuit of doing things better, more efficiently, at lower costs, and with greater sensitivity to the customers’ needs. Ultimately, it distinguishes the leading educational systems of the 21st century (Kimmelman & Kroeze, 2002). Over the past fifteen years NSD27 made a significant shift from what we call an “Event-based” approach of operation to a “Systems Approach.” The event-based approach led to what we called “Random Acts of Improvement” but did not provide a profound or even noticeable impact on the District’s various performance indicators. Therefore, the District developed a systemic approach of continuous improvement to raise its level of organizational effectiveness. Taking a more systemic view of the District provided a more interactive context where alignment of processes and services could be studied for their impact on the overall organization. After researching various continuous improvement models, the Board of Education adopted the Malcolm Baldrige National Quality Program Framework because it more appropriately reflected the type of approaches and thinking that we needed to help our organization move to the next level of maturity. NSD27 began its continuous improvement process at the district administrators’ annual summer retreat in June 2003, implemented it fully in 2004, and progressed to the point of submitting a Baldrige application for review in 2006. 
As the primary instructional leader of a high-performing school district, it is my responsibility to ensure that all students receive a quality educational program that enables them to perform at highly competitive levels and prepares them to be part of what I call a “Worldwide Community of Excellence.” A high-performing district has some inherent advantages that enable its staff to maximize its resources and provide students with the support they need to perform well. The community expectations for high performance are an ever-present reminder that there is no substitute for excellence. It is within this context that I must implement the provisions of NCLB, manage the conventional reporting mechanisms of the state report card to our community, and offer a comprehensive, well-rounded program of instruction to meet the needs of the whole child. This broader program offering supersedes the limited, narrower scope of the state standards pertaining to NCLB. In our District, NCLB state learning standards are merely part of the curriculum, not the total curriculum. Finally, I must provide a staff that is “highly qualified,” meeting not only the requirements of NCLB but also those of IDEA and of other content expertise areas such as gifted education and English Language Learners.

It is within this context that NSD27 implements the No Child Left Behind Act of 2001. NCLB is part of the regulatory environment within which the District operates and exists as part of the broader operation of the District.

Positive Aspects of NCLB

NCLB has actually called attention to a number of factors on which we have capitalized to improve our ability to meet student needs and to improve our practices. I focus on three distinct positive aspects: (a) greater attention to low performing students; (b) heightened awareness of our diversity; and (c) enhanced use of assessment data for instructional decision-making.
Greater Attention to Low Performing Students

In a high-performing school district, aggregate performance scores on NCLB provide an overall picture of excellence, and we are able to demonstrate that our schools are more than sufficiently meeting the AYP requirements. Many high-performing districts could rest on their laurels and tout the aggregate performance. However, the story needs to go deeper, with particular attention to individual students.

We embraced the intent of NCLB and operationalized it for our students and parents. In NSD27, if a student does not meet the state learning standards, we address that student’s needs individually with support. We take the student’s performance information and triangulate it with other formative and norm-referenced data. The teacher and other key personnel in the school develop an individual learning plan for every student who does not meet our targets. These plans are monitored throughout the year.

One key measure that we use is the Northwest Evaluation Association (NWEA) Measures of Academic Progress (MAP). This measure, which is aligned with the Illinois State Learning Standards, identifies the student’s performance. Using a companion NWEA tool called the Dynamic Reporting Suite, we are able to project student performance on the Illinois Standards Achievement Test (ISAT). In the fall of each year we use our NWEA data and an additional NWEA tool called the Learning Continuum. This tool enables us to identify key skills and concepts that are deficient and then provide skill development support through regular and extra instruction. This approach increases the probability that the student will meet the AYP target the following year. Our internal analysis of student growth shows that most students who initially do not meet the target make substantial growth in the year and will, within three years, meet the state learning targets.
Another aspect of meeting the individual needs of students is the recognition that students can improve their state standing even if they already meet state learning standards. A common point of conversation with our Board of Education and parents is setting internal targets for ourselves to raise the percentage of students who “exceed” state learning standards. Consistent with the high performance expectations of our community, we identify internal targets of performance that far exceed the current AYP expectations. We make it a practice to have all of our students who do not score in the “exceeds” level strive to increase their performance level over time. We consistently set new standards of performance for all students and identify those skills that, if enhanced, would improve their standing. By the time students graduate from eighth grade, most of them have improved their standing on the ISAT.

Heightened Awareness of Our Diversity

NCLB also heightened our awareness of the diversity we have even in this very homogeneous community. While NSD27 is socio-economically homogeneous, we have a few areas of diversity. Specifically, we have two subgroups under NCLB: ethnic (13% Asian, 86% white) and special education (18% of the population). The District implements well-developed processes to meet the needs of students with specific needs. What NCLB offers is data that support our effort to determine whether our approaches are having the impact we believe they should on these students. In other words, are these approaches really working? Analyzing the results of NCLB for our subgroups enables us to make better decisions about the approaches that we use to support these students. These data inform us about the areas in which we can make the adjustments needed to address student needs.

Enhanced Use of Assessment Data for Instructional Decision-Making

NCLB encouraged and enabled districts’ use of assessment data to make better instructional decisions for students.
I have long believed that assessment data has been one of the most underutilized and misunderstood components of the teaching-learning process (Curriculum-Instruction-Assessment). Although testing students has become commonplace in school districts, educators struggle with the challenges of assessing student learning. By this I mean that districts are collecting student achievement data, but it is primarily being used to describe characteristics of student performance. Understanding why students achieve the way they do is a more challenging task. Teachers need to know how to use the data to inform their practice and understand student progress. In essence, they have an abundance of data but lack the proper understanding of how to make use of this information. Michael Fullan (1998), in his book What’s Worth Fighting For Out There?, encourages teachers and administrators to become more “assessment literate.” This can be done in several ways, including expanding their assessment repertoires, showing parents and students how they arrive at their assessment decisions, collecting assessment data as an ongoing part of classroom learning, monitoring how well their students are achieving over time, and communicating the results clearly to parents and the public.

NCLB provided us with information that enabled us to revisit how we use assessment data. We incorporated these data into our School Improvement Planning, grade level planning, and, as mentioned, individual student learning plans. NCLB drives quality information to the teacher level, allowing teachers to examine whether the practices we are implementing have an impact on student performance. Teachers in NSD27 take the NCLB data very seriously and work hard to ensure that their instructional approaches are focused on meeting students’ needs and are consistent with the District’s curriculum and the Illinois State Learning Standards.
We are now tracking information over time using NCLB, NWEA, and other performance measures that help us to evaluate not only student performance but also our approaches.

Negative Aspects of NCLB Implementation

NCLB is not without its critics or criticisms. From the perspective of a high-performing school district, there are three negative aspects that result from its implementation: (a) statistical implications, (b) definition of proficiency, and (c) requirements of highly qualified staff. The following section elaborates on each of these aspects.

Statistical Implications

Like other high-performing school districts across the nation, NSD27’s statistical mean on national assessments is significantly higher than the national mean (50%), typically 20-25% higher. This fact alone would seem to put high-performing school districts in an enviable position to meet AYP targets. Currently, NSD27 meets AYP targets and does so each year, with performances on the state assessments indicating that 90+% of NSD27 students meet or exceed levels of proficiency in reading and mathematics.

While our overall means are high, the story does not end there. The impact of individual student performances is problematic and gives misleading perceptions of “growth” and “decline” within a grade level. Where large school districts have the challenge of demonstrating substantial change, school districts with small enrollments have the opposite problem. NSD27’s small enrollment leaves it highly vulnerable to individual student performances that can result in substantial fluctuations of overall percentages within the State’s performance level designations (“exceeds,” “meets,” “does not meet,” and “academic warning”). A single student’s performance can influence the performance level percentages by multiple percentage points. In a community that expects consistent, high performance, a perceived decline in performance is a red flag.
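The sensitivity of small groups to individual results can be illustrated with simple arithmetic. The sketch below is only an illustration, not the District's actual analysis: the group sizes and the function names are hypothetical, chosen to show how the share of a percentage carried by one student grows as the tested group shrinks.

```python
def points_per_student(cohort_size: int) -> float:
    """Percentage points that a single student represents in a tested group."""
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    return 100.0 / cohort_size


def students_behind_shift(cohort_size: int, shift_points: float) -> float:
    """Number of students whose change in level produces a given shift."""
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    return shift_points * cohort_size / 100.0


# Hypothetical tested-group sizes (assumptions, not NSD27 figures).
for size in (25, 50, 100, 500):
    share = points_per_student(size)
    print(f"{size:>4} students: one student = {share:.1f} percentage points")
```

Under the assumed group size of 50, `students_behind_shift(50, 5)` gives 2.5, meaning a five-point swing would correspond to only two or three students changing performance level, while in a group of 500 the same swing would take 25 students.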
NSD27 works hard to educate our community regarding the variation we can expect to see. For example, when reporting that a grade level increased or decreased by five percentage points, we are quick to mention that this variation reflects only 2-3 students and is not a significant change. As a result, we continue to advocate the use of trend data and the triangulation of other data to give a more complete view of student performance at any grade level. Despite this work, we find it to be an ongoing educational challenge.

Despite our pattern of strong performance on the ISAT, it is unlikely that NSD27 will meet 100% proficiency by 2014. We have a number of new students who enter the district each year, and we implement what we call a New Student Assimilation Process designed to diagnose student performance levels and needs. This process enables us to begin immediately addressing student learning and developmental needs. We also have low performing students who are making progress but will not meet the proficiency levels by 2014, simply because they have unique learning needs. The greater issue for NSD27 is to ensure that we have implemented instructional and support programs for these students that will result in their ongoing growth and development.

Linn points out, from his study of proficiency and his work on the National Assessment of Educational Progress (NAEP) and the Third International Mathematics and Science Study (TIMSS), that “100% proficient is a goal that is completely out of reach even with extraordinary effort on the part of teachers and students” (Linn, 2000; 2007). One question results: If districts with consistent performances as strong as NSD27’s cannot make 100% proficiency, then who can? The target is not only unrealistic but statistically impossible (Linn, 2004, 2007).

Definition of Proficiency

The proficiency standard is another negative aspect of NCLB.
Whose definition of proficiency are we using? As Linn (2004, 2007) has appropriately addressed, there is a fundamental flaw with using proficiency as the assessment benchmark. Currently, NCLB allows every state the right and responsibility to define a proficiency standard. The lack of a universally accepted proficiency standard fosters inevitable state-to-state assessment comparisons. More importantly, the lack of clarity regarding proficiency standards does not stop at the state or national level but raises the question of preparedness within a global context. Today’s students will have to learn to function and be productive as adults in a highly competitive global environment. Currently, the various definitions of proficiency do give us an indication that national proficiency improves our standing on an international basis.

I worked with two sets of the TIMSS data (TIMSS 1995 and TIMSS-R 1999) as a member of the First in the World Consortium (FITW) in partnership with Boston College, Michigan State University, the North Central Regional Education Laboratory (NCREL), and the US Department of Education. We sought to understand our student performance by looking at what we defined as the A+ countries, those that achieved at the highest levels internationally. We also studied the US performance, which was close to the mean of its international counterparts. The FITW performed competitively with the highest achieving students in the world, but we still had a number of areas where we could improve our performance. The proficiency standard issue of NCLB suggests that if we attain national proficiency, we might improve our global standing. In other words, we would be competitive internationally.
This implication raises the following question in my mind: Would American students be competitive with our international counterparts? And, more significantly: Would we be competitive with the highest achieving students in the world if all students in the country were to achieve proficiency (by 2014 or later)? The data suggest that we would not (Linn, 2000; 2007). This suggests that our current variable definitions of proficiency are still removed from the reality of our future global workforce. As a result, NSD27 relies on its development of rigorous and coherent programs and performance indicators to define a high level of performance.

Highly Qualified Staffing

The third negative aspect of NCLB exists within the expectations of highly qualified staff. No one argues that all children have the right to be taught by a highly qualified teacher. Fortunately for high-performing school districts, teachers desire to work in these resource-rich settings. However, even high-performing districts have significant challenges in meeting the criteria for “highly qualified” staff.

The first challenge comes in the significant investment of time and resources in examining every certificated staffing position. This action is necessary to ensure that every teacher holds the appropriate certification, endorsements, and testing necessary to meet the state’s “highly qualified” criteria. In addition to reviewing current staff’s credentials, changes in the hiring process result. Beyond typical personnel screening procedures, all applicants now are screened through the “highly qualified” lens as well. There is no need to interview an applicant who does not meet the criteria for “highly qualified.”

The second challenge of NCLB’s “highly qualified” criteria has a major impact on staffing for subgroup populations, specifically special education (SpEd) and English language learners (ELL).
Currently, SpEd teachers must be “highly qualified” in every instructional content area if they are the teacher of record, that is, the teacher who determines the student’s grade. In the middle grades, this means SpEd teachers must have a minimum of 18 semester hours in every content area plus three hours in middle school pedagogy and three hours in adolescent psychology. These criteria are in addition to their SpEd certification and endorsement requirements. As a result we (a) restructure our instructional delivery approaches for SpEd students, and/or (b) send SpEd staff back to college for coursework.

The restrictive nature of certification contributes to the third challenge of NCLB: teachers of special populations are choosing to go back into the regular classroom to get out from under the burdensome requirements to become adequately certified. Many excellent teachers who have a heart for students with special needs and have the gifts and talent to work with these students are discouraged and are seeking a more reasonable approach to certification. Similar procedures have been followed for teachers working with the ELL population. As a result we have a greater challenge to find “highly qualified” teachers to fill these critical special area positions. Meeting NCLB’s “highly qualified” criteria has had a major impact on our school system.

Suggestions for Improving NCLB As It Moves to Reauthorization

As NCLB moves into reauthorization, a number of recommendations have been proposed to improve it. I would offer three improvements as they pertain to high performing districts: (a) provide alternative models to demonstrate target acquisition and student growth, (b) set more realistic goals for meeting AYP, and (c) provide more realistic expectations for “highly qualified” criteria in special areas.
Provide Alternative Models to Demonstrate Target Acquisition and Student Growth

One of the fundamental flaws of NCLB is its sole reliance on current status to determine AYP. While this is one performance measure, it lacks the ability to identify student learning over time. The use of cohort data will enable educators to view growth over time and trends in performance. The basic AYP measurement system should be expanded to include growth model/value added approaches. This expansion would permit districts to demonstrate student growth over time, and growth toward meeting the AYP targets. States should be given great flexibility to design their accountability systems while continuing to support the broader goals of NCLB.

Set More Realistic Goals for Meeting AYP

Second, the terminal goal of 100% of students meeting the proficiency standards by 2014 needs to be re-examined. Statistically this goal is unattainable and will only result in every school district failing to meet the expectation. As mentioned earlier, we do not have a clear and consistent definition of proficiency nationwide. Linn has concluded that it is impossible for us to attain 100% proficiency despite the work of teachers and students (Linn, 2000; 2007).

In high-performing districts, our expectation is to educate students who will succeed in a global environment that is rapidly changing. The Third International Mathematics and Science Study (TIMSS) data indicate that there are clear distributions of performance of students from the various countries. The real question for districts like NSD27 is: Do we compete favorably with the highest achieving students in the world? It is in this competitive environment that our students will be working. My concern over the proficiency goal stems from the lack of definition of proficiency and the lack of a connection with a universal view of competence.
Even if all students in America were to achieve the proficiency standard, we do not know if this performance would make us competitive with students in other countries, much less the high achieving students. We must continue to set academically challenging benchmarks and strive to meet them. Then we need to continue to compare our performance with our international counterparts to strive for a competitive standing.

More Realistic Expectations for Highly Qualified Criteria

All students, and in particular students with special needs, deserve exceptional teachers. In order to attract and retain quality educators, I would encourage legislators to revisit the current criteria for “highly qualified” during the reauthorization phase of NCLB. Providing appropriate service for special education students and English language learners should remain a priority; however, determining credentials and endorsements for these teachers demands attention. NCLB needs to encourage quality teachers to enter and remain in the field of working with special needs students, not discourage them. Establishing less restrictive and more realistic criteria for “highly qualified” staff would go a long way to lessen the burden of NCLB implementation.

Conclusion

NCLB has had an impact on all school districts including the high performing ones. This paper delineates some of the most important positive and negative aspects of NCLB on a high-performing school district. Among the many positive aspects of the law, it has drawn greater attention to students who do not meet state standards and who are underachieving or lagging behind. Moreover, it has heightened our awareness of our diverse groups of students and the practices that we implement to meet their needs. NSD27 has embraced the guidance to help us better meet these students’ needs.
Overall, NCLB has put the spotlight on the need to use assessment data to meet students’ needs and improve our programs and services.

The law has also had some adverse impacts on our districts. The fundamental problems identified involved statistical anomalies, confusion surrounding definitions of proficiency, and the restrictive nature of “highly qualified” criteria for teachers of special needs students.

I offered a few suggestions for consideration that might improve NCLB as it moves into reauthorization. Specifically, Congress could improve the law by providing alternative models to demonstrate target acquisition and student growth. This approach would enable districts to demonstrate progress to AYP and provide trend data on growth. Second, NCLB needs to set more realistic goals for meeting AYP. Few, if any, school districts in America will meet 100% proficiency. Moreover, reaching proficiency may not ensure our competitiveness on a more global scale. Finally, the law should provide more realistic expectations for “highly qualified” criteria in special areas. By doing so we can attract and retain teachers who have training and talents to meet the needs of these special populations without experiencing the burdensome certification requirements that are now in place.

References

American Association of School Administrators. (2007, March). NCLB Update: Presentation at Annual Meeting of the American Association of School Administrators in New Orleans, LA.
Aumiller, B.E. (2007). Case study of a K-8 school district's administrative leadership's implementation of the Baldrige education criteria for performance excellence. (Doctoral dissertation, University of Illinois, 2007). 12, 69-72.
Council of Chief State School Officers. (2007). Recommendations to reauthorize the elementary and secondary education act. Washington, DC: CCSSO.
Dougherty, C. (2006). Identifying and studying high-performing schools.
Austin, TX: National Center for Education Accountability.
Fullan, M., & Hargreaves, A. (1998). What’s worth fighting for out there. NY: Teachers College Press.
Kimmelman, P. L., & Kroeze, D. J. (2002). Achieving world-class schools: Mastering school improvement using a genetic model. Norwood, MA: Christopher-Gordon.
Linn, R.L. (2000). Assessments and accountability. Educational Researcher, 29(2), 4-14.
Linn, R.L. (2004). Accountability: Responsibility and reasonable expectations. Educational Researcher, 32(7), 3-13.
Linn, R.L. (2007). Needed modifications of NCLB. Paper presented at a symposium sponsored by the National Association of Test Directors entitled “NCLB: Changing It; Fixing It; Living With It” at the Annual Meeting of the National Council on Measurement in Education, Chicago, IL.
National Governors Association. (2007). Reauthorization of NCLB. Washington, DC: National Governors Association.
National School Boards Association. (2007). NCLB reauthorization: Guiding principles. Alexandria, VA: NSBA.
Nichols, S.L., & Berliner, D.C. (2007). Collateral damage: How high-stakes testing corrupts America’s schools. Cambridge: Harvard Education Press.
Popham, W.J. (2004). America’s failing schools: How parents and teachers can cope with no child left behind. New York: RoutledgeFalmer.

NCLB: A Business Perspective
Barb Boyd
Nationwide Insurance

Why does the business community care about education and the No Child Left Behind legislation? What can the business community do to help? From a business perspective, what should be considered for the NCLB reauthorization? Answers to these and many other questions are addressed in this brief.

Business Community and Education

Many businesses are actively involved in and concerned about public K-12 education.
There are numerous reasons for this involvement and concern:

Genuine concern for the community - education is the path to economic stability along with the accumulation of assets.
o Increased poverty in inner cities. The number of Franklin County residents (Columbus is within Franklin County) living in poverty has increased. In 1999, 11.6 percent of Franklin County residents were below the federal poverty level threshold. By 2005, the percentage of residents living below the federal poverty level threshold had risen to 14.5 percent.[1] Just 10 years ago 55 percent of the students in Columbus Public Schools received free or reduced price lunch; now 74 percent of the students receive free or reduced price lunch.[2]
o Earnings increase in relation to education level. U.S. Census data show that those with less than a high school degree earn about $19,000 per year; high school graduates earn an average of $27,000 a year. Those with bachelor’s degrees earn $51,000 a year – almost twice that of those with a high school degree.
o While education can be the key to higher earnings, it is also linked to the accumulation of assets. Research shows that, on average, households headed by a high school graduate accumulate ten times more wealth than households headed by a high school dropout. Wealth is critical to the economic wellbeing of individuals and families as it is the gauge of a household’s financial security and prospects. The potential additional wealth, if all households were headed by high school graduates, would be $2.7 billion for Ohio alone![3]

In general, the business community has an enlightened self interest.
o There is more demand for workers with post-secondary education. Large companies often have job openings for applicants with post-secondary education.
For example: Nationwide has about 600 jobs open all the time; only 15 percent of those positions are for applicants with only a high school diploma.
o 13 of the 15 fastest growing occupations in Ohio require post-secondary education.
o Businesses need a quality workforce. We need workers who have the technical as well as communication, creativity and innovation skills to be contributing members of the company, right away. It is too expensive for businesses to provide remedial training while on the job.
o Businesses need knowledge workers. There will always be idea-generating jobs within the United States. The following graphic from Tough Choices or Tough Times illustrates that point.[4]

The business community also needs consumers. The company I work for is interested in protecting things that are important to you – your home, car, wealth, and health. Those who are in poverty cannot afford this protection, which reduces the number of consumers of our products.

There is a strong case for being involved in public education, and Nationwide has taken the initiative to become involved in the Columbus Public School district. Nationwide provides volunteers, corporate giving, and philanthropic investments.

Volunteers:
Math tutoring. Associates from all levels of the organization volunteer to tutor students in math. We tutor students one day a week for five weeks to prepare them to take the math portion of the 10th grade Ohio Graduation Test. We bring the students to Nationwide’s downtown offices to give them a sense of the business world.
Partner in Education. Nationwide has partnered with a high need elementary school for over 10 years. We now have 160 tutors who go to the elementary school one day a week for the entire school year!
Executives volunteer to be on sub-committees of the Columbus School board such as the audit and accountability and budget development committees. These are areas where business people have expertise and can make a difference.

Corporate giving:
Principal for a Day is a program co-sponsored by the Nationwide CEO and the Superintendent of the Columbus district. This program encourages business leaders and those leaders who are up and coming to spend a day with a principal in the Columbus district to get a real view of what is happening in the school buildings and of the barriers facing our students.
Nationwide has committed three people for two years to develop a leading indicator model using compliance rules for state and federal AYP indicators for Columbus Public Schools. From a business perspective, this seems critical to making dramatic improvements. It is common for the private sector to look at results at least on a monthly basis, make adjustments and monitor again. By the end of the year there are few surprises, because of the continuous monitoring.

Philanthropic investments:
Nationwide supports many non-profit organizations that support the students of the Columbus School district.

As you can see, Nationwide cares about public education and we are willing to put our time and efforts towards improving our students’ chances at success. I believe Nationwide is setting the standard on corporate involvement in education in Columbus.

Business Community and NCLB

Outcomes Attained

The establishment of standards, holding schools accountable to meet them and focusing on groups that have lagged the mainstream are the primary elements of NCLB that are of interest to the business community.[5] The Business Roundtable has stated the need for a well trained and productive workforce and the need to increase the competitiveness of our workforce.
They believe that measurable results are needed from the education system to ensure there is a strong workforce.[6] Business needs a labor supply to fuel innovation and economic growth. We also need stronger science, technology, engineering and math education (STEM). In Ohio, we found a gap in the required curriculum. Today only a quarter of Ohio graduates take higher level math and science, which are necessary to get good, meaningful jobs. We believe we need more rigorous math and science course work beyond the 10th grade graduation test. To eliminate this gap, the legislature just added STEM curriculum to graduation requirements for all Ohio students. I believe this gap was found, in part, because of the heightened awareness created by NCLB accountability.

As a person in the private sector, I also believe we need increased accountability and increased transparency in order to improve the product – the graduate of the system. I have seen evidence that increased accountability has intensified the attention given to ensuring all students meet the standards. The transparency provided by accountability has given us insights into issues such as teacher effectiveness, early childhood readiness, and mobility. I believe this caused us to discuss issues we hadn’t faced before. We have also learned that we can successfully assess student performance, or assessment of learning.

Improvements to Consider

There are a few items that could be improved with the No Child Left Behind Act:

Invest in an infrastructure for local districts to develop leading indicators, such as we are providing for the Columbus Public Schools. This will enable the educators to ensure that students are learning the state standards prior to being tested.

In Ohio, we count the same tests three different ways, making the reporting of the accountability confusing to the general public.
Ohio is considering adding a fourth way, the value-added model, one which could be more informative. The value-added model recognizes that students come to schools at different levels of readiness. The value-added model will measure the progress of individual students from year to year instead of comparing different cohorts each year. I believe this model will provide even more transparency into methods that are most effective. While the value-added approach is informative, we will need to find a way to report the accountability that is less confusing to the public.

NCLB tests critical topics such as math and reading; however, it doesn’t test technology skills, creativity and innovation, or communication skills, which are skills we desperately need in a flat world.[7]

The Bottom-Line

The business community needs school systems to be successful – we need a viable workforce with higher level skills; rote skills can be done elsewhere at a much cheaper rate. However, some of the higher level skills are not easily measured. The accountability and transparency of NCLB have the potential to show us where districts and businesses can work together to improve the outcome for our children. Businesses are willing to invest the time, money and resources to improve the effectiveness of the system. I have given you examples of what Nationwide is doing in Columbus, Ohio. I encourage you to find companies within your community with which to partner. The bottom-line is that it takes all of us to educate a child – it is our workforce!
Notes

1 What Matters, Community Assessment 2004, United Way of Central Ohio, 2004
2 Columbus Public Schools, 2007
3 Elena Gouskova and Frank Stafford, University of Michigan Institute for Social Research, 2005
4 Tough Choices or Tough Times, National Center on Education and the Economy, 2007
5 The Columbus Dispatch, editorial, March 12, 2007
6 Education and the Workforce, www.businessroundtable.org, 2006
7 The World is Flat, Thomas L. Friedman, 2005

Discussant’s Comments
Glynn Ligon
ESP Solutions Group

Bob Linn took a short amount of time for what I think is one of the most important points in his presentation -- the compensatory model, which is looking across different areas rather than having all of these different points of failure. If you have his paper it is well worth looking at that and considering the compensatory model even more.

We have to thank Judy for the mini-poster session by proxy. [Ed. note: due to illness Judy could not fly and her paper was read by the moderator]. That was excellent. So you can see all of those lines in the graphs. One thing that comes down to me as I go around and visit the states and then listen to this paper is that states are paying too much for their testing programs. They just flat overpay. I don’t know if we have any test publishers sitting here, but if there are, you’re overpaid. The fact of the matter is that as states we are spending way too much money on security. We’re paranoid about security. How many forms do they have? Way too many. Why do they have that many forms? Security. So, that’s one thing we have to change.

David, I gotta tell you, I have two degrees in psychology, and your presentation was fascinating because every time you talk about No Child Left Behind, you say things like, to some degree, nevertheless, on the other hand. But when you talk about continuous improvement you were right on target.
You were committed and you were excited about what you were saying. I don’t know if you noticed on the back of the paper that I received it says that Jack Grayson is going to write a foreword to this paper. Jack Grayson created Baldrige and if he had been here he would have given you very high marks. Because once you started talking about continuous improvement you were right on with all that Baldrige involves. That was an excellent model for others to follow.

Barbara, I love when you get into talking about leading indicators. Again in this paper, it talks about what in the world are leading indicators. In education we talk more about trailing indicators and that has evolved a bit. And quite frankly, do you know the difference between trailing indicators and leading indicators? When you look at them and what you do with them. They are all just numbers. If you look at your numbers soon enough and take action on them they become leading indicators. That’s what we need to do. So, Nationwide, we’re glad you’re on our side!

So general comments. The compensatory model is great. It’s what needs to be done, it is the correct fix except for one thing. It should be applied to students, we should look at merging student performance across areas. It should be applied to subgroups. But I don’t think it should be applied to schools. I’m worried because it’s No Child Left Behind, and that translates to the subgroups; if we start averaging then we go back and start covering up low performing subgroups with high performing subgroups. So that’s what bothers me a little, but I think that compensatory model is the right approach.

Multiple indicators. I was at a meeting in Colorado where the groups were talking about the multiple indicators used and the three accountability systems in Colorado. And, quite frankly, the folks were confused by it.
So, there was this whole session and it filled up a wall of posters about what indicators they should have, what multiple indicators they should have. In the end the guy got up and summarized it all by saying, well, what we need to do is take all of these multiple indicators and we need to bring them together in one measure. And then I realized that that conundrum really makes sense. And what we do want is multiple indicators that influence the very small number of indicators that we actually pay attention to. Margaret, this kind of gets to your point too, that you don’t want all of these indicators, you want to get them together into one thing you can look at. And again, in the paper I handed out it talks about the difference between an indicator and an index that synthesizes indicators, which is basically what No Child Left Behind is (see Appendix for the paper).

So yesterday I did something I’ve never done. I’ve been attending AERA meetings since 1976. I’ve never gone to a presidential address and I wanted to see what Eva Baker would say about testing. And she had a slide that talked about accountability fixes. Accountability fixes -- more indicators, opportunities to learn measures, performance assessment, formative assessments and prioritize our standards. She couldn’t have been more wrong. Absolutely wrong. She understands, but she did exactly what is happening in the United States with No Child Left Behind. People have lost sight of the fact that accountability and getting diagnostics and helping the teachers, giving people information back that they can actually use to improve our students’ performance, are two different devices. And as people who have psychometric backgrounds, we’ve got to step up, and we’ve got to be the ones to say, look, states, you need two testing programs. You need a testing program that does accountability, and here are the characteristics of an accountability test.
It’s not testing every standard with 12 items if individual students lack in diagnostic scores that teachers get back at the beginning of the next year and keep doing that every year. It’s a very tight survey measure that has the right psychometric properties for accountability. And then we do all of those other great things that Eva wants us to do – the diagnostic assessments of all kinds. I wrote all sorts of notes on her program yesterday and half of them are underlined and have exclamation marks next to them.

Do we do formative assessment separately? There are many types of formative assessments -- process assessments, diagnostic, interim, pre-test, teacher-made test, curriculum tests, performance-based test, unit tests, pop quizzes, grades for credits, benchmarks, adaptive assessments. All of these things have to do with learning, but not with accountability. And that’s what’s causing the confusion with No Child Left Behind. Because we try to do all of those things with the same state assessment test and, as we saw in Judy’s paper, it costs millions of dollars in Ohio to try to please everybody with a single test.

So we need two tests, we need an index that handles this multiple measures situation and we need a compensatory model, but we need to be careful not to lose sight of what No Child Left Behind is all about, and not go back to masking the performance of the individual subgroups. So, I think my message to you is, it’s time for this group – the nation’s test directors – to step up and say we need our assessments to be true to their focuses. And that probably means we need two test programs in each state. And quite frankly states ought to be doing formative assessment. Now to pay for that, many districts are doing it themselves. They can talk about what you’re doing and then the states can focus on accountability assessments. So I think to go back to the title of our session, ‘Change It, Fix It Or Live With It.’ I think we can fix it.