Jacob - Policy Applications

Policy Applications of Teacher
Performance Measures:
A Review of the Evidence
Brian Jacob
Presented at the Policy Forum on
Teacher Effectiveness in New Jersey
March 22, 2011
My Task
• Summarize existing evidence on the impact of
policy applications of teacher performance
measures
• I will focus on teacher-level programs including:
– Teacher value-added measures (VAM) and/or
– Classroom observation protocols
• I will focus on programs designed for evaluation
rather than for only diagnostic purposes
Outline
1. The use of VAM to determine teacher
compensation, hire/transfer, and
tenure/dismissal
2. Classroom Observation Protocols
3. Holistic teacher evaluation “systems” (e.g.,
TAP, Minnesota Q-Comp)
4. Some thoughts on implementation details
Teacher VAM and Compensation
• Most common use of VAM to date involves teacher pay,
usually bonuses
• Two goals of these policies:
– Increase teacher effort in the short/medium-term
– Change the composition of the teaching force in the longer-term
• Important underlying assumptions:
– Teachers are not working on the “efficiency frontier” in the existing
system
– Teachers have the capacity/support to change behavior
– The labor supply of potentially effective teachers is relatively elastic
Evidence on VAM and Teacher Pay
• Studies in other countries show positive impact on student
performance
– Probably not very applicable to U.S. context
• Nashville, TN: individual bonuses for middle school math
teachers up to $15,000
– No effect on achievement; few effects on any other intermediate
outcomes
– Period of tremendous achievement growth in Nashville: student
achievement improved substantially in both treatment and control groups
• New York City: bonuses assigned to schools on the basis of
school performance goals, with an average bonus of
$3,000/teacher for meeting achievement goals
– No effect on student achievement or a host of other student and
teacher measures
– Some evidence of more positive effects in smaller schools
VAM and Other Teacher Policies
• No existing programs use VAM for hiring
• “Talent Transfer” program bases transfers/bonuses on VAM
– Identifies high-performing teachers and offers them incentives to
stay/move to high-needs schools for up to two years
– Large financial incentives: $10,000/yr for two years to move
– Teachers identified through VA analysis and schools identified on the
basis of low performance
– Charlotte-Mecklenburg started in 2008; LA and Miami in 2010
• No existing policy dismisses and/or denies tenure to teachers
based on VAM alone
Classroom Observation Protocols (COPs)
• Why use classroom observations to measure teacher
performance?
– Additional measure increases reliability of overall rating
– Captures different aspects of effectiveness
– By focusing on teacher practices, COPs can (a) mitigate
gaming/strategic behavior, and (b) provide useful feedback to teachers
• How do current COPs differ from traditional teacher
evaluations conducted by administrators?
– Based on set of pre-specified, observable behaviors
– Outside observers; multiple unannounced observations per year
• Variation in COPs
– Danielson: student-teacher relationships, instructional approaches
– CLASS: focus on emotional climate of the classroom
– MQI: accuracy and richness of teacher’s mathematical knowledge
– PLATO: focus on instructional strategies specific to secondary ELA
Research on COPs
• Growing research base suggests COP scores are positively
associated with student achievement
– Danielson: 200+ teachers in Cincinnati, positive relationship with student
achievement
– PLATO: 24 middle school ELA teachers in NYC – 2nd vs. 4th quartile
– MQI: 24 middle school math teachers in Southwestern district – correlations
of .2 to .5 with VAM
• Caveats/Limitations
– Very small samples; timing of observations and student achievement
– COPs are *less* predictive of current student achievement than prior VAM
– COPs are *not* generally more predictive than informal/holistic supervisor
ratings
– Gates-funded MET project (Measures of Effective Teaching) will explore this
issue with much larger samples, using much more sophisticated analysis
• Recent study in Cincinnati shows that participation in
evaluation itself increases teacher VA by .1 s.d.
Teacher Evaluation “Systems”
• No evidence that merit pay alone works in the
short-run
• Some evidence that COPs are associated with
teacher VAM, and the evaluation process itself
may be beneficial
• => Comprehensive programs: development and
training opportunities, data and feedback,
incentives and evaluation
• Examples include: D.C. IMPACT, Teacher
Advancement Program (TAP), Q-Comp in
Minnesota
Teacher Advancement Program (TAP)
• Four components:
– (1) performance-based compensation
– (2) master/mentor teachers
– (3) framework for teacher evaluation
– (4) ongoing trainings and collaborative groups
• Springer, Ballou and Peng (2008)
– Uses student-level NWEA data in 2 states to compare achievement gains
within schools over time: do gains increase after a school adopts TAP?
– Some positive effects for grades 1-5, but negative effects for grades 9 and 10
• Mathematica Study of Chicago TAP – Yr 2 (2010)
– Performance pay for principals and other school staff; average teacher bonus
was $2,000; mentor teachers received $7,000 and master teachers $15,000
– Hybrid design – random assignment of 16 schools + matching
– No effects on student achievement, teacher retention
Denver Pro-Comp
• Consists of 4 components:
– Market incentives (hard-to-staff schools/subjects)
– Student growth (VAM)
– Knowledge and skills (completion of degrees or PD)
– Professional evaluations
• Voluntary for teachers hired before 2006; mandatory for
teachers hired on/after Jan 1, 2006
• Growth in math and reading since 02-03 districtwide, but
hard to attribute this to Pro-Comp alone
• Some tentative evidence suggesting there might be some
very small positive effects
– Productivity effects and composition effects
• Ongoing work looks at retention in hard-to-serve schools
Minnesota’s Q-Comp
• Voluntary district-level program
• District plan must include multiple teacher career paths, job-embedded
professional development, teacher evaluation, performance pay, and a
revised teacher salary schedule
• Districts get $190 per pupil in state aid + $70 per pupil extra
• Since inception in 2005, 46 of Minnesota’s 337 traditional
districts and several charter schools have opted in
• Considerable variation in content of district plans
• Recent evaluation found few positive impacts on student
achievement
– No effect on math achievement
– Potential small effects on reading achievement in some districts that
focused on actions/outcomes rather than subjective evaluations
The Devil is in the
(Implementation) Details
• Much of the debate focuses on what
measures will be included and how heavily
they will weigh in the overall evaluation
• This misses at least 3 critical implementation
choices:
– Where to set cutoffs for different levels?
– How to average across components (VAM, COP)?
– What actions are tied to evaluation outcomes?
Setting Cutoffs to Avoid the
Widget Effect
• Report by TNTP documents that traditional teacher evaluation
systems overstate the number of exemplary teachers and
understate the number of mediocre or ineffective teachers
Distribution of Teacher Ratings Across Districts (Source: Widget Effect)
District      Distinguished   Proficient   Basic   Unsatisfactory
Chicago       68.7%           24.9%        6.1%    0.4%
Cincinnati    57.8%           34.7%        6.9%    0.6%
Distribution of Teacher Ratings in DC IMPACT 09-10
Overall Score   Highly Effective   Effective   Minimally Effective   Ineffective
                16%                66%         16%                   2%
The Effect of Averaging
• Given the 4-point scales commonly used, components such as
school performance and “other” outcomes that comprise a small
fraction (~10%) of overall rating will generally have no impact on
teacher’s overall rating
• Suppose that a system includes only VAM and COP, classifies
teachers on a 4-pt scale, weights both equally, and that VAM and
COP are correlated .3
– If 2% of teachers receive ratings of “1” on each, then only .4% will
receive a final rating of “1”
– If 10% of teachers receive ratings of “1” on each, then only 3.5% will
receive lowest rating
• Possible remedies?
– Triage: score of “1” on any measure triggers review
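The averaging arithmetic on this slide can be explored with a short simulation. The sketch below is illustrative only and rests on assumptions that are not in the slides: VAM and COP scores are treated as standard normal with correlation rho, a teacher is rated "1" on a measure when the score falls in the bottom p_lowest share, and the final rating is "1" only when both measures are "1". The slide does not spell out its aggregation rule, so the shares this produces need not match the .4% and 3.5% quoted above. The second function assumes a rounded weighted average and shows why a component weighted at 10% cannot move a whole-number rating on a 4-point scale. All function and parameter names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def share_rated_lowest_on_both(p_lowest, rho, n_teachers=200_000, seed=0):
    """Simulated share of teachers rated "1" on BOTH measures.

    Assumptions (not from the slides): both scores are standard normal with
    correlation rho, and a "1" means a score in the bottom p_lowest share.
    """
    rng = random.Random(seed)
    cutoff = NormalDist().inv_cdf(p_lowest)   # z-score marking the bottom share
    noise_scale = math.sqrt(1.0 - rho ** 2)
    both_lowest = 0
    for _ in range(n_teachers):
        vam = rng.gauss(0.0, 1.0)
        # Build a COP score with correlation rho to the VAM score.
        cop = rho * vam + noise_scale * rng.gauss(0.0, 1.0)
        if vam < cutoff and cop < cutoff:
            both_lowest += 1
    return both_lowest / n_teachers


def overall_rating(main_rating, minor_rating, minor_weight=0.10):
    """Weighted average of two 1-4 ratings, rounded half up to a whole rating.

    Assumption: main_rating stands in for all other components combined.
    """
    weighted = (1.0 - minor_weight) * main_rating + minor_weight * minor_rating
    return math.floor(weighted + 0.5)
```

Calling share_rated_lowest_on_both(0.02, 0.3) and share_rated_lowest_on_both(0.10, 0.3) returns a share well below the 2% or 10% of teachers rated "1" on each measure separately, which is the qualitative point of the slide. Calling overall_rating(m, k) for any whole-number m and k between 1 and 4 returns m, since a 10% weight shifts the average by at most 0.3 points.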
Teacher Non-Renewal Policy in
the Chicago Public Schools (CPS)
• Starting in 2004-05, new collective bargaining agreement in
CPS stipulated that principals could dismiss (i.e., non-renew)
any probationary teacher (years 1-4) outside the RIF process
and without the typical process associated with dismissal for
cause
• Streamlined process: on-line system allows principals to “click”
teachers to non-renew
• Effect of this policy provides some insight on potential impact
of COPs and TESs
– How frequently will principals dismiss teachers?
– What teacher characteristics do principals value?
– What are the effects on teacher and student outcomes?
Teacher Non-Renewal in Chicago
• Principals do seem to consider (proxies for) teacher
productivity in determining which teachers to dismiss.
– Principals are more likely to dismiss teachers who are frequently
absent, failed the teacher certification exam at least once, and
who have received worse evaluations in the past.
– Principals are less likely to dismiss teachers who attended a
more competitive college and have an MA degree.
– Elementary teachers who were dismissed had lower VAM than
their peers who were not dismissed.
• Policy reduced absences among probationary teachers by
roughly 10-20 percent
Teacher Non-Renewal in Chicago
• Roughly 40% of principals did not dismiss any of their
probationary teachers over the first 3 years of the policy
– Includes many principals in the lowest performing schools in the
district
• Potential explanations
– Teacher labor supply
– Social norms
• Implications
– Managerial discretion alone will not necessarily change
personnel practices
– Principal training?
– Changing the default?
Extra slides
Cincinnati Teacher Evaluation System
• Adapted from Danielson framework; started in 2000-01
• 4 evaluations per year by outside peer experts; 1 additional
observation by school administrator
• 4 domains, only 2 of which are assessed by observation:
learning environment (D2) and teaching for learning (D3)
– Teachers receive scores from 1-4 in each of 32 elements, which
are aggregated up to 15 standards and then the 4 domains
• End-of-year scores in each standard/domain are a
holistic/subjective determination by outside evaluators
– Based on “preponderance of the evidence” and can account for
growth over the year and/or extenuating circumstances
Evidence on Cincinnati TES
• TES scores predict student achievement even after
controlling for various student characteristics (including
prior achievement)
– 1 point increase in TES => 0.10-0.15 s.d. increase in scores
– Top vs. bottom quartile teacher => 3 percentage point diff
• Conditional on overall score, teachers who score high in
classroom environment (Domain 2) relative to teaching
practices (Domain 3) appear more effective in math
• Conditional on overall score, teachers whose instruction
focuses on questions/discussion relative to
standards/content appear more effective in reading
The Widget Effect
Distribution of Teacher Ratings in Cincinnati TES – Since 2001
Domain                  Distinguished   Proficient   Basic   Unsatisfactory
Classroom Environment   64.1%           31.4%        3.7%    0.7%
Teaching Strategies     46.1%           47.4%        6.4%    0.1%