Assessing Teacher-Led Reform: Using Measures of Accountability Beyond Test Scores
By Jackie Bennett, Christina Collins, Maisie McAdoo and Rhonda Rosenberg
United Federation of Teachers Research Department
Introduction
The New York City school system, under a progressive new mayor and his veteran-educator chancellor, has taken several steps to enhance teachers’ roles in decision-making, in schools as well as classrooms. The chancellor also has pressed for greater collaboration within and between schools and has emphasized the importance of measures of school quality beyond standardized test scores in school accountability processes. This reverses the approach of former Mayor Michael Bloomberg and his chancellor, Joel Klein, who emphasized principal leadership and used standardized test results as the primary means to judge school and teacher performance.
These changes are new, but the stakes are high. The NYC school system, like many others, has huge challenges, limited budgets, a high-needs student population and pockets of evident failure. The need for reform is indisputable, but what the mayor and chancellor are doing, in collaboration with the teachers’ and principals’ unions, runs counter to the national education reform agenda. The education research community will rightly ask if this approach works, and researchers will seek to answer that question sooner rather than later.
Based on early feedback from schools and teachers involved in some of the new reforms, we suggest that educational researchers join us in exploring a series of indicators beyond standardized test scores. These indicators can identify quantitative and qualitative changes in school climate, teacher effectiveness and student well-being.
Drawing from initial data gathered during the launch of these reforms, this paper concludes that, in assessing these new teacher-led reforms, researchers, policy-makers and school stakeholders should use measures that allow them to do the following:
1) Define and discuss student academic achievement using measures other than standardized test scores;
2) Examine students’ non-academic outcomes, including social-emotional growth;
3) Assess levels of collaboration between teachers and levels of parent engagement;
4) Measure the quality of implementation of innovations;
5) Access data that are usable and useful during the current classroom year.
Scholarship that looks at these types of measures, we argue, is essential for the implementation of truly effective school improvement efforts. It is urgent that the research community work towards more consistent inclusion of such measures in scholarship and policy research.
Measurement under Teacher Leadership
The urgent need to expand measures of school and teacher success in New York City came into focus as the new administration sought to make teachers the leaders of school change. What data could these new leaders use to analyze and assess their changes? At the classroom level, once-yearly standardized test results are of minimal usefulness. Even at the school level, test-score-based snapshots do not produce granular pictures of schools or point a way to improvement. The advent of new teacher-leader positions under the new administration brought with it a need for more detailed and useful ways to examine student outcomes and make decisions.
The most radical of the new teacher-leader reforms has been the launch of the five-year Progressive Redesign Opportunity Schools for Excellence (PROSE) program in 2014. There are currently 62 PROSE schools, with another 40 or so to be added in 2015-16.
They are selected by a joint panel of district and union representatives and are required to have a strong record of collaborative practices and respect for teacher “voice” or input. The result of negotiations between the city, the Department of Education (DOE) and the teachers’ union, PROSE permits schools, with a 65 percent faculty vote, to alter DOE or union regulations in pursuit of improvement. Teachers and leaders in PROSE schools have wide latitude to change scheduling, hire and fire, plan curriculum and conduct professional development, as long as the school meets performance targets chosen by the school and the joint panel. An “Option PROSE” allows some schools to substitute peer observations for part of the state-mandated teacher evaluation process.
Though some NYC schools have been using teacher-leader practices for years, the deliberate institutionalizing of teacher leadership is a new reform at the system level. In addition to PROSE, the systemic teacher leadership changes include identifying and compensating specific teacher-leader roles such as mentor, master and "ambassador" teachers, who may be released from some teaching duties to observe and help train colleagues. The new chancellor has also moved to replace many former Department of Education employees, mainly lawyers and administrators, with educators or education administrators. She has mandated that principals have seven years of teaching experience, up from the former three-year requirement. And she has championed teacher-to-teacher professional development over expert-led training.
Outputs, Inputs and Standardized Tests
In order to measure these reforms, we have to address the inputs vs. outputs issue.
Many researchers and reformers in recent years have criticized assessments of school improvement programs that focus primarily on “inputs” (how much money the state and city invest in schools, how much cutting-edge technology they deliver, or even teachers’ experience and academic qualifications), arguing that such measures don’t truly matter in judging whether a given reform has succeeded in raising student achievement (Hanushek, Levin cites). Instead, discussions of reform have been dominated by the widespread conviction that “outputs,” defined almost exclusively by standardized measures of student achievement, are the only really objective, hard-nosed way to demonstrate education success.
But just as weighing the pig doesn’t fatten it, measuring changes in test scores does not necessarily lead even to improved test scores, let alone to increases in other desired outputs. Driving reform by measuring test score outcomes does not tell teachers or policy-makers how to get there. (More worrisome, over the last decade, have been too many instances of unintended consequences, such as excessive test-prep, abandonment of non-tested subjects and cheating scandals.) Further, cause and effect, in terms of a score on a test given once a year, are almost impossible to disentangle in schools, where so many parts are in simultaneous motion.
A simple value-added or growth measure of test-score outcomes is important but inadequate in assessing the reforms being put into place in New York City. It is inherently difficult to claim clear statistical significance and causality (based on correlations in either direction) in an education reform context in which many policy changes are being implemented simultaneously. How can test scores identify the impact of new teacher roles separately from other reforms taking place, such as new state-mandated Common Core Learning Standards or redesigned graduation requirements?
In addition, reporting and analysis of test score outcomes may take too long to be useful to practitioners and policy-makers; in a one- or two-year data sample, changes of a point or two in either direction are necessarily inconclusive. Strong evidence of success or failure of a given reform requires several years of fairly conclusive movement in one direction or another on a stable and predictable test. Teachers and policy-makers at the ground level increasingly see this timeline and the limited data available from state-wide standardized test results as useless, at best, for the day-to-day work of improving the educational experience of their students and, at worst, as a distraction from or misrepresentation of the true teaching and learning happening in their classrooms and schools. A new paradigm for measuring success is necessary to move forward in school improvement research.
Alternative Measures of Student Growth and Program Success
On the national level, much of the recent interest in alternatives to standardized test scores as the default measure of student academic achievement and growth has been driven by changes to teacher evaluation policy. In response to federal guidelines and incentives in the Race to the Top program, which required states to include student learning measures in their teacher evaluation formulas, multiple states have sought to develop measures of growth for teachers in subjects and grades for which value-added measures are not available: generally, teachers other than those who teach English Language Arts or math in grades 4-8.
Among the most commonly adopted alternatives to value-added measures are Student Learning Objectives. Researchers have begun to focus on these more frequently over the past two years, as states that were early adopters of this method have released initial results and begun to adjust their original policies in response to implementation challenges (Gill et al. 2014; Lacireno-Paquet et al. 2014).
Other researchers have used correlations between student growth scores and non-academic measures as evidence for increased use of those alternative measures as a necessary complement to standardized test results in teacher evaluation and in policy and program evaluation. The best known of these in recent years has likely been the Gates-funded Measures of Effective Teaching study, which found that measures such as student surveys and teachers’ ratings on observation rubrics had consistently significant (although relatively small) correlations with student growth scores. The study recommended including such measures in teacher evaluation formulas in order to increase the ratings’ usefulness in shaping classroom practice and district- and school-wide decision-making.
There is also a growing body of research linking measures of increased collaboration at the school level with increased student academic achievement. Researchers such as Anthony Bryk and his colleagues in Chicago have used novel measures of collaboration and trust in conjunction with more familiar measures of student learning. They argue that, if teachers and policy-makers are to have usable information for changing practice, consistently tracking and discussing the inputs occurring in a classroom and school is as important as having high-quality analysis of data on student achievement outputs.
(Quintero 2014, Bryk; Daly)
Current Alternative Measures in Use in NYC
In New York City, we already have a large body of data in place which offers researchers and policy-makers alternative measures of school performance and of reform implementation, past and present. The challenge in the past has been that these measures were largely overshadowed by the heavy weight given to standardized test performance and growth in the previous administration’s “Progress Report” accountability system, which assigned each school a grade from A to F. The new DOE administration has been gradually shifting away from this system toward a more nuanced portrayal of school performance, which adapts many of the measures below using a “Capacity Framework” based on Anthony Bryk’s research on school success in Chicago. (Bryk cite)
One large source of data is a yearly survey of parents, teachers and (middle and high school) students at each school that collects information on academic expectations, learning climate, leadership and parent engagement. These surveys, which have been given to all public school teachers, parents and upper-grades students for the last eight years, can be analyzed quantitatively. The responses of parents, teachers and students are reported separately and provide nuanced information that can be cross-referenced.
School Quality Reviews are conducted by education experts (though not every school is assessed every year) and rate schools’ curriculum, instructional practices, learning environment and professional collaborations on a four-point scale.
Peer School Comparisons: The DOE places schools into peer groups that share similar entering test scores, percentages of special education and high-needs special education students, and some other factors. Schools are assessed against their peers on test-score growth, credit accumulation, attendance, graduation, post-secondary enrollment and other factors.
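The logic of a peer comparison can be illustrated with a minimal sketch. It assumes hypothetical school names, peer-group labels and attendance rates (none of these values come from DOE data), and it uses a simple gap from the peer-group mean, which is our illustrative simplification and not a description of the DOE's actual method.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (school, peer_group, attendance_rate).
# All names and values are illustrative assumptions, not DOE data.
schools = [
    ("School A", "group_1", 0.91),
    ("School B", "group_1", 0.88),
    ("School C", "group_1", 0.94),
    ("School D", "group_2", 0.82),
    ("School E", "group_2", 0.86),
]


def gap_from_peer_mean(records):
    """Return each school's metric minus the mean of its own peer group."""
    by_group = defaultdict(list)
    for _, group, value in records:
        by_group[group].append(value)
    group_means = {group: mean(values) for group, values in by_group.items()}
    return {
        name: round(value - group_means[group], 4)
        for name, group, value in records
    }


gaps = gap_from_peer_mean(schools)
for name, gap in gaps.items():
    print(f"{name}: {gap:+.3f} relative to peer-group mean")
```

The same function could be applied to any of the other factors named above (credit accumulation, graduation, post-secondary enrollment) by swapping in a different metric column; a school is judged only against schools serving a similar student population, not against the city as a whole.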
Local Measures of Student Learning (MOSL): As part of statewide teacher evaluations, teachers, in discussion with their principals, currently set goals for local ‘measures of student learning,’ which can include student engagement, quality of questioning, mastery of a skill and the like. In schools with strong teacher leadership, these aspects of the evaluation can provide evidence of success.
Thanks to New York State’s and New York City’s many advances in education data collection and reporting over the last decade, researchers also have access to many other aggregate (or aggregate-able) measures. Although access to this data is sometimes restricted by student privacy laws and other regulations, many of these data points have been collected for all schools for multiple years. They include but are not limited to:
1) Academic measures such as grades; rates of grade promotion; credit accumulation; college attendance and persistence; and student portfolio assessment;
2) Indicators of “conditions for student growth,” such as attendance; suspensions; school safety incidents; survey reports on school environment; parent participation rates; and enrollment and retention of low-income or high-needs students, such as English language learners, special education students or students in temporary housing;
3) Measures of teacher effectiveness and persistence: teacher turnover; teacher experience and longevity; teacher leadership (grade or department chairs, mentor and master teachers); participation on School Leadership Teams; participation in curriculum writing; and state teacher evaluation ratings.*
*State-mandated teacher evaluations, new this year, will also include principal and teacher observational metrics that may be accessible if privacy issues can be resolved.
New Measures
Teachers’ accounts of how their daily practice changes under new policies should be at the core of assessing instructional and curricular reforms, and yet little research has focused on this key question of implementation at the classroom level. The United Federation of Teachers has been active in seeking to explore and expand the use of alternative measures of school and teacher success, with the goal of increasing the use of measures that are instructionally relevant and timely as well as rigorous and comparable across classrooms.
Teachers’ first-person accounts of their efforts are part of the raw data available in a database of published interviews with teachers from the New York Teacher and other union publications and reports. The union also has begun using an annual survey, based on a representative sample, to assess members’ experiences and opinions on a range of subjects not covered by the district survey. The second iteration of this survey is scheduled to be distributed and analyzed in Spring 2015.
In addition, the launch of the PROSE program has allowed the union and district to work collaboratively with each other and with the staff and leadership of the 62 schools in the first PROSE cohort to explore and identify measures of success that the schools believe will be especially relevant in gauging their implementation of a range of innovations, including changes to teacher evaluation, scheduling and calendar changes, and school and city policies for grading and credit accumulation.
As part of their initial application to the program in June 2014, PROSE applicants were asked to identify which measures their schools either currently used or were interested in exploring as ways to measure the success of innovative practices. Several of the schools are members of the New York Performance Standards Consortium.
These schools use Performance Based Assessment Tasks (PBATs), in which students present an individually selected project or portfolio to a panel of teachers and other adults at the end of the year. Consortium schools and other PROSE schools proposed dozens of different measures, summarized in the attached table and excerpted below.
Measures identified by the PROSE schools as important to their definitions of success included the following categories and selected examples of measures other than state tests:
Academic Measures: student portfolios; middle- and high-school admissions; high school credit accumulation; student promotion and graduation; enrollment in advanced courses; college admission and retention.
Student Non-Academic Measures: behavior; attendance and retention; social-emotional well-being; enrollment and retention of high-need students; number and diversity of student applicants.
Other Measures: teacher retention; teacher performance (assessed by school leaders and peer teachers); administrator performance; support for collaboration within school; collaboration with other schools; parent participation and satisfaction.
One specific example of an innovation being pursued by a number of PROSE schools is the mastery-based model of learning and grading. Students are given the opportunity to show learning by documenting mastery of a body of knowledge at an individualized level rather than within a standardized curriculum pacing model. PROSE has offered a subset of these schools the opportunity to experiment with different ways of scheduling student classes and reporting students’ grades and credit accumulation to the district as part of a pilot program this spring.
Standardized methods of tracking student credit accumulation or test results would not provide adequate data to assess the implementation and success of this pilot program, especially since many of these schools are already using school-specific student learning management programs to track students’ progress in mastery-based courses. Such practices make it imperative for researchers and policy-makers to turn to the schools themselves for accurate information about what is happening at the school and classroom level and how it has affected their students’ experiences.
Using the Data: Theory of Change
Many PROSE and other schools are currently tracking, or are interested in finding ways to track, data points that are not required by or available within city or state accountability systems. The schools have identified these data points as important to student success, and many use them to make instructional decisions on a regular basis during the school year. The PROSE leadership team, composed of union and DOE representatives, is working closely with local staff and with schools to determine how best to use this data to inform and improve the implementation of the program and the school-level innovations.
In addition to continuing to identify and refine the measures above, we have begun to develop a Theory of Change (TOC) model to help stakeholders link the implementation of PROSE innovations to their aims for student learning and growth. The leadership team is also developing a TOC for the program as a whole. By making the assumed links between a given practice at the school level and teacher or student outcomes more explicit, the TOC process has helped us understand which measures are most important for determining the innovations that could have the greatest impact on the schools we are working with and the students they serve.
In March 2015, we will launch a Participatory Action Research project to work closely with seven PROSE schools to develop a Theory of Change and help them identify which measures of success will be most relevant and useful in gauging the impact of the innovations they are implementing.
PROSE is just a single example of an innovative and teacher-led school improvement effort in which standardized tests cannot capture all the information necessary to gauge success in either implementation or results. Researchers interested in fuller and more nuanced examinations of the success of school and district reforms should take advantage of the similar engagement with self-selected measures of success at many schools to shape their research questions and strategies. In doing so, they will both break new ground in the field of educational research methods and provide teachers, schools and other stakeholders in education with the information they need to truly improve student learning.