Growth, Value-Added and Teacher Effectiveness Measures
Philip R. Fletcher
Senior Research Scientist
Pearson

Teacher opinion
A recent international survey of teachers shows:
--That the vast majority of teachers welcome appraisal and feedback on their work.
--That it improves their job satisfaction and effectiveness as teachers.
--But too many teachers do not receive any feedback on their work at all.
--Moreover, evaluation is perceived to be an instrument of compliance rather than development.

Teacher ratings
Most school districts use pass-fail ratings where nearly all teachers pass.
99% of teachers in districts using binary ratings are rated satisfactory.
94% of teachers in districts using multi-point ratings are in the top two categories.
As Arne Duncan noted, “Ninety-nine percent of our teachers are above average.”

Teacher salaries
Teacher compensation is very predictable.
It is based on the teacher’s highest degree and years of seniority.
It is almost completely unrelated to variations in teacher effectiveness.

Effectiveness varies
Anecdotal and empirical evidence suggests that teachers differ dramatically in effectiveness.
An effective teacher will raise student test scores by ten percentiles per year.
Three years of effective teachers raise test scores by thirty percentiles.
Traditional teacher evaluation systems fail to recognize these differences.

Teacher recognition
The need to recognize teachers who make magnificent contributions to student learning.
The need to motivate people to gain expertise.
And the need to leverage expert teachers and reward them for their efforts.
To ensure that students are taught successfully, there is a need to differentiate teachers by their impact on student learning.

Status, growth and effectiveness
Student achievement is the status of accumulated subject matter knowledge at one point in time: a lagging indicator.
Student learning is growth in subject matter knowledge over time: a leading indicator.
It is student learning, not student achievement, that is most relevant in defining and assessing teaching effectiveness.

Status, growth and effectiveness
Achievement provides evidence of the status of student knowledge and understanding at one point in time.
Learning is demonstrated by growth in student achievement from one point in time to another, not by status at either point in time alone.
Effectiveness is demonstrated by above-average student learning and growth.

Status, growth and effectiveness
Schematically:
Status = Achievement
Growth = Learning
Relative Growth = Effectiveness

Status and growth

Relative growth and effectiveness

Why growth?
Growth reflects learning, and we care about student learning.
The principal role of teachers is to enhance student learning.
Teacher effectiveness should be reflected in how much their students learn.

Official incentives
Teacher Incentive Fund (TIF) grants require school districts to evaluate teachers.
Race to the Top (RttT) funds require a state commitment to measuring teacher effectiveness.
No Child Left Behind (NCLB) required testing of all students in reading and mathematics, leading to the development of longitudinal data systems linked to individual teachers.

Student testing
Most states have test data linked to specific schools and teachers that can be used to track student growth.
Many assessment systems are based on student test score growth over time:
--Value-added models
--Student growth percentiles
Both address effectiveness in terms of learning rather than status.

Value-added assessment
Value-added models are designed to assess school and teacher contributions to student growth.
A value-added assessment model is designed to demonstrate the impact of individual schools and teachers.
It is designed to distinguish between teacher effects and other outside influences.

Value-added assessment
Value-added captures the growth that classes of students achieve during a single year of schooling.
To estimate classroom effects, student data include only the students enrolled in a particular class.

Value-added assessment
The key idea is to statistically isolate the contribution of individual teachers from all other sources of influence.
Value-added analyses attempt to determine the amount of student growth that can be attributed to an individual teacher.
Value-added models quantify teacher effectiveness: the teacher’s contribution to student learning and growth.

Value-added assessment
Value-added attributes causality to the teacher.
Teachers are responsible for the learning and growth of their students.
Under conditions of high-stakes accountability, student growth is interpreted in terms of responsibility and cause.

Value-added assessment
Some statisticians would argue that value-added is unsuited for drawing the causal inference that a given teacher is responsible for the increase in student test scores.
“We do not think that their analyses are estimating causal quantities, except under extreme and unrealistic assumptions.” –Rubin, Stuart, and Zanutto (2004).
“…it does not appear possible to separate teacher and school effects using currently available accountability data.” –Raudenbush (2004).

Value-added assessment
Policymakers and school administrators generally express no such reservations and offer strong support for value-added.
“If quality instruction is essential for student learning, then student learning should tell us something about the quality of instruction.”

Descriptive accountability
Accountability system results may have value without making causal inferences.
From this perspective, accountability results should not be used to sanction teachers in schools.
Instead, they should be used to make sound judgments about quality and needed improvements.
They provide descriptive information and identify schools that may require further investigation.

Describing student growth
The Colorado Growth Model was designed to describe student growth and learning.
Quantile regression is used to model the complete distribution of student achievement over time.
The model quantifies distance = rate × time, probabilistically.
Growth percentiles describe the rarity of a student’s current growth, given their prior achievement.

Student growth percentiles

Student growth percentiles
Examining growth together with achievement sheds new light on school performance.
Median growth above the 50th percentile identifies best practices and schools that can provide support.
Median growth below the 50th percentile identifies the greatest needs and schools that should receive support.
A gap-closing strategy is built around a consensus of school improvement.

Student growth percentiles

Common yardstick
Most states have administrative data that can be used as a common yardstick to identify the 25% most effective teachers.
Supervisor ratings and classroom observations provide no such common yardstick.
Local implementation of these other measures varies across 1600 school districts nationwide.
More importantly, they do not directly represent student learning.

Value-added and growth limitations
Value-added and growth percentiles are only available for teachers in certain subject matter areas.
Value-added and growth percentiles are available for only a small subset of teachers.
Value-added and growth percentiles are limited by the test.
Growth metrics are too narrow to provide information about how teachers can improve.

Value-added and growth shortcomings
Value-added metrics and growth percentiles for individual teachers fluctuate from year to year.
They can be influenced by factors beyond the teacher’s control.
They are imperfect measures with a relatively large error component.

Concern
How well does value-added predict the top 25% from year to year?
How well do alternative measures of teacher effectiveness predict the same top 25% from year to year?
--Classroom observations?
--Principals’ ratings?
--Student surveys?
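
The year-to-year question raised on the Concern slide can be made concrete with a small calculation. The sketch below is only an illustration, not a method taken from this presentation: the teacher labels and scores are made up, and the helper names `pearson`, `top_quartile` and `top_quartile_persistence` are hypothetical. It computes the correlation between two years of value-added estimates and the share of one year's top 25% who remain in the top 25% the next year.

```python
import math

# Hypothetical value-added estimates for eight teachers in two consecutive
# years. These numbers are invented for illustration only.
YEAR_1 = {"A": 0.9, "B": 0.7, "C": 0.5, "D": 0.3,
          "E": 0.1, "F": -0.1, "G": -0.3, "H": -0.5}
YEAR_2 = {"A": 0.6, "B": -0.2, "C": 0.8, "D": 0.1,
          "E": 0.3, "F": -0.4, "G": 0.0, "H": -0.6}


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def top_quartile(scores):
    """Return the teachers in the top 25% (at least one teacher)."""
    # Sort by score descending; break ties by name so the result is stable.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    k = max(1, len(ranked) // 4)
    return set(ranked[:k])


def top_quartile_persistence(year_1, year_2):
    """Share of year-1 top-quartile teachers who are top-quartile in year 2."""
    shared = sorted(set(year_1) & set(year_2))
    first = top_quartile({t: year_1[t] for t in shared})
    second = top_quartile({t: year_2[t] for t in shared})
    return len(first & second) / len(first)


if __name__ == "__main__":
    teachers = sorted(set(YEAR_1) & set(YEAR_2))
    r = pearson([YEAR_1[t] for t in teachers], [YEAR_2[t] for t in teachers])
    print("year-to-year correlation:", round(r, 2))
    print("top-25% persistence:", top_quartile_persistence(YEAR_1, YEAR_2))
```

The same two functions could be applied to observation ratings, principals' ratings or student surveys to compare how stable each measure's top 25% is, assuming each measure is available for the same teachers in both years.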
Value-added and growth compare favorably
Value-added metrics and growth percentiles compare favorably with performance measures in other fields.
The correlation between SAT test scores and freshman success in college is 0.35.
The correlation in batting averages between years in professional baseball is 0.36.
The correlation between value-added estimates this year and next lies between 0.20 and 0.60.
Most value-added estimates correlate between 0.30 and 0.40 from year to year.

Value-added and growth prognosis
Recommend the use of value-added measures and growth percentiles, principally because they are related to student learning and growth.
Be mindful of their limitations and imperfections.
Strive continually to improve these growth measures.

Suggestion
Use multiple measures, not only value-added metrics and growth percentiles.
Alternate measures should meaningfully supplement state test score data and increase prediction.
Alternate measures should be applicable to a broader range of teachers.
They should provide direct information and feedback suggesting how teachers can improve their teaching.

Suggestion
Use core and non-core measures to validate the full range of teacher effectiveness for a broader range of teachers.
Value-added metrics and growth percentiles serve as the benchmark for the reliability of other teacher effectiveness measures.
The key idea is to predict the benchmark growth measures.
Weight different measures based on their power of prediction.

Observational measures
What is needed is not so much an accounting of teacher time or a rating of teacher performance, but rather higher-level inferences about the teacher’s ultimate purposes and effects.
Making holistic judgments requires higher levels of inference.
In short, we need a method to obtain holistic rankings reliably and validly.
Procedures must minimize rater effects and coding errors.

Classroom Interactions
A complex situation, difficult to characterize unassisted.
Teacher practice and student-teacher interactions, from the participants’ point of view.
How do students and teachers interact in a practical and personal sort of way?
How do they approach and solve problems together?
Are there different classroom profiles?

Concourse of meaning
The first challenge is to figure out what makes great teaching.
This is difficult and controversial from an educational perspective.
Yet it is relatively straightforward from a managerial perspective.
Find the best educators and give them an opportunity to debate and create the best pedagogy and teaching practice.

Danielson Framework
Charlotte Danielson’s Framework serves as a source of statements about teacher effectiveness.
The Framework is divided into:
--4 Domains
--23 Components
--76 Elements
--304 Items

Danielson Framework
The 4 Domains include:
--Planning and Preparation
--The Classroom Environment
--Instruction
--Professional Responsibilities

Danielson Framework
The 2 Domains that students actually see:
--The Classroom Environment
--Instruction

Danielson Framework
Scoring rubrics:
--Danielson: Unsatisfactory, Basic, Proficient, Distinguished
--New York State: Ineffective, Developing, Effective, Highly Effective

Danielson Framework
Items, by rubric level:
--Unsatisfactory: Students not working with the teacher are disruptive to the class.
--Basic: Small groups are only partially engaged while not working directly with the teacher.
--Proficient: The students are productively engaged during small group work.
--Distinguished: Students take the initiative with their classmates to ensure that their time is used productively.

Danielson Framework
The Danielson Framework is prescriptive.
Unsatisfactory and basic performance are often just the negation of proficient and distinguished performance.
No guide to what teachers do when under stress.
Good behavior follows rules.
Lacks insight from control theory and negative feedback.
“Students help set high standards.”

Danielson Framework
A good basis for a limited number of items.
These items can be readily supplemented with items from other sources, by other authors.
Use these sources and create new items to fully cover what students and teachers actually do.

Growth, value-added and teacher effectiveness measures
Features compared across three measures: Student Growth, Value-Added, Teacher Effectiveness.

Student Growth
--Focus: Student
--Questions addressed: 1. How much did this student grow? 2. Is the student on track?
--Input variables: Student scores only
--Output: 1. Student achievement percentile 2. Student growth percentile

Value-Added
--Focus: Teacher/Educator
--Questions addressed: 1. How does teacher-classroom growth compare to expected growth? 2. How does teacher-classroom growth compare to that of other teacher-classrooms?
--Input variables: 1. Student scores and their characteristics 2. Teacher characteristics
--Output: 1. Teacher value-added metric 2. Teacher growth percentile

Teacher Effectiveness
--Focus: Teacher/Educator
--Questions addressed: To what extent is the teacher/educator effective?
--Input variables: 1. Multiple measures 2. Multiple methods
--Output: 1. Effectiveness scores on individual measures 2. Composite score on multiple measures 3. Predicted comparable value-added metric 4. Predicted comparable growth percentile
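
The composite score listed under Teacher Effectiveness, together with the earlier suggestion to weight measures by their power of prediction, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure used in this presentation: weights are taken to be proportional to each measure's correlation with the benchmark value-added metric (a regression-based weighting would be another reasonable choice), the measure names and all numbers are made up, and the function names are hypothetical.

```python
import math

# Hypothetical scores for five teachers on two alternate measures, plus a
# benchmark value-added metric. All values are invented for illustration.
MEASURES = {
    "observation": [3.1, 2.4, 3.6, 2.0, 2.9],
    "student_survey": [4.0, 3.2, 3.5, 3.9, 3.0],
}
BENCHMARK = [0.3, -0.1, 0.5, -0.4, 0.0]


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def zscores(xs):
    """Standardize scores to mean 0 and standard deviation 1."""
    m = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
    if sd == 0:
        raise ValueError("measure has no variation")
    return [(x - m) / sd for x in xs]


def prediction_weights(measures, benchmark):
    """Weight each measure in proportion to its correlation with the benchmark.

    Assumption: negative correlations are clipped to zero so that a measure
    never receives a negative weight.
    """
    raw = {name: max(pearson(vals, benchmark), 0.0)
           for name, vals in measures.items()}
    total = sum(raw.values())
    if total == 0:
        raise ValueError("no measure predicts the benchmark")
    return {name: r / total for name, r in raw.items()}


def composite_scores(measures, weights):
    """Weighted sum of standardized measures, one value per teacher."""
    z = {name: zscores(vals) for name, vals in measures.items()}
    n = len(next(iter(z.values())))
    return [sum(weights[name] * z[name][i] for name in z) for i in range(n)]


if __name__ == "__main__":
    w = prediction_weights(MEASURES, BENCHMARK)
    print("weights:", {k: round(v, 3) for k, v in w.items()})
    print("composite:", [round(c, 2) for c in composite_scores(MEASURES, w)])
```

Standardizing each measure first keeps a measure with a wide scale from dominating the composite purely because of its units.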