Teacher Effectiveness - National Network of State Teachers of the

Growth, Value-Added and
Teacher Effectiveness Measures
Philip R. Fletcher
Senior Research Scientist
Pearson
Teacher opinion
A recent international survey of teachers shows:
--That the vast majority of teachers welcome appraisal and
feedback on their work.
--That it improves their job satisfaction and effectiveness as
teachers.
--But too many teachers do not receive any feedback on their
work at all.
--Moreover, evaluation is perceived to be an instrument of
compliance rather than development.
Teacher ratings
Most school districts use pass-fail ratings where
nearly all teachers pass.
99% of teachers in districts using binary ratings are
rated satisfactory.
94% of teachers in districts using multiple points
are in the top two categories.
As Arne Duncan noted, “Ninety-nine percent of our
teachers are above average.”
Teacher salaries
Teacher compensation is very predictable.
Based on the teacher’s highest degree and years of
seniority.
Almost completely unrelated to variations in teacher
effectiveness.
Effectiveness varies
Anecdotal and empirical evidence suggests that
teachers differ dramatically in effectiveness.
An effective teacher will raise student test scores by
ten percentiles per year.
Three years of effective teachers raise test scores
by thirty percentiles.
Traditional teacher evaluation systems fail to
recognize these differences.
Teacher recognition
The need to recognize teachers who make
magnificent contributions to student learning.
The need to motivate people to gain expertise.
And the need to leverage expert teachers and
reward them for their efforts.
To ensure that students are taught successfully,
there is need to differentiate teacher effectiveness
in terms of their impact on student learning.
Status, growth and effectiveness
Student achievement is the status of accumulated
subject matter knowledge at one point in time—a
lagging indicator.
Student learning is growth in subject matter
knowledge over time—a leading indicator.
It is student learning—not student achievement—
that is most relevant in defining and assessing
teaching effectiveness.
Status, growth and effectiveness
Achievement provides evidence of the status of
student knowledge and understanding at one point
in time.
Learning is demonstrated by growth in student
achievement from one point in time to another point
in time–not by status at either point in time alone.
Effectiveness is demonstrated by above-average
student learning and growth.
Status, growth and effectiveness
Schematically:
Status = Achievement
Growth = Learning
Relative Growth = Effectiveness
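The schematic above can be illustrated with a minimal sketch. The records, class labels, and the function name `class_summary` are hypothetical, and "relative growth" is taken here simply as a class's mean growth minus the overall mean growth, which is one possible reading of the slide rather than the author's stated formula.

```python
from statistics import mean

# Hypothetical records: (student, class, prior-year score, current-year score).
records = [
    ("s1", "A", 410, 440),
    ("s2", "A", 455, 480),
    ("s3", "B", 500, 510),
    ("s4", "B", 520, 525),
]

def class_summary(rows):
    """Return {class: (mean status, mean growth, relative growth)}."""
    # Average growth over all students serves as the comparison point.
    overall_growth = mean(cur - prior for _, _, prior, cur in rows)
    summary = {}
    for label in sorted({cls for _, cls, _, _ in rows}):
        members = [(prior, cur) for _, cls, prior, cur in rows if cls == label]
        status = mean(cur for _, cur in members)              # achievement at one point in time
        growth = mean(cur - prior for prior, cur in members)  # learning between two points in time
        summary[label] = (status, growth, growth - overall_growth)
    return summary

summary = class_summary(records)
for label, (status, growth, relative) in summary.items():
    print(label, status, growth, relative)
```

In this made-up data, class B has the higher status but class A has the higher growth, so the two measures rank the classes differently.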
Status and growth
Relative growth and effectiveness
Why growth?
Growth reflects learning, and we care about student
learning.
Because the principal role of teachers is to enhance
student learning.
Teacher effectiveness should be reflected in how
much their students learn.
Official incentives
Teacher Incentive Fund (TIF) grants require school
districts to evaluate teachers.
Race to the Top (RttT) funds require a state
commitment to measuring teacher effectiveness.
No Child Left Behind (NCLB) required testing of all
students in reading and mathematics, leading to the
development of longitudinal data systems linked to
individual teachers.
Student testing
Most states have test data linked to specific schools and
teachers that can be used to track student growth.
Many assessment systems are based on student test
score growth over time:
Value-added models
Student growth percentiles
Both address effectiveness in terms of learning rather
than status.
Value-added assessment
Value-added models are designed to assess school
and teacher contributions to student growth.
A value-added assessment model is designed to
demonstrate the impact of individual schools and
teachers.
It is designed to distinguish between teacher effects
and other outside influences.
Value-added assessment
Value-added captures the growth that classes of
students achieve during a single year of schooling.
To estimate classroom effects, student data include
only the students enrolled in a particular class.
Value-added assessment
Key idea is to statistically isolate the contribution of
individual teachers from all other sources of
influence.
Value-added analyses attempt to determine the
amount of student growth that can be attributed to
an individual teacher.
Value-added models quantify teacher effectiveness—
the teacher’s contribution to student learning and
growth.
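As a rough illustration of the idea of isolating a teacher's contribution, the sketch below uses a deliberately simple covariate-adjustment approach: regress current scores on prior scores across all students, then average each teacher's residuals. This is an assumption chosen for brevity, not the model used in any particular state system; operational value-added models are considerably more elaborate. The data and the names `fit_line` and `value_added` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of ys on xs; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def value_added(rows):
    """rows: (teacher, prior score, current score). Returns {teacher: mean residual}."""
    intercept, slope = fit_line([r[1] for r in rows], [r[2] for r in rows])
    residuals = defaultdict(list)
    for teacher, prior, current in rows:
        expected = intercept + slope * prior          # score predicted from prior achievement
        residuals[teacher].append(current - expected)  # growth beyond (or below) expectation
    return {teacher: mean(values) for teacher, values in residuals.items()}

# Hypothetical data: two teachers whose students start from the same prior scores.
rows = [
    ("T1", 400, 430), ("T1", 500, 530),
    ("T2", 400, 410), ("T2", 500, 510),
]
effects = value_added(rows)
print(effects)
```

A positive mean residual means the teacher's students scored above what their prior scores predicted. Whether that difference can be read causally is a separate question from the arithmetic.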
Value-added assessment
Value-added attributes causality to the teacher.
Teachers are responsible for the learning and
growth of their students.
Under conditions of high-stakes accountability,
student growth is directed toward responsibility and
cause.
Value-added assessment
Some statisticians would argue that value-added is
unsuited for drawing causal inferences that a given
teacher is responsible for the increase in student test
scores.
“We do not think that their analyses are estimating
causal quantities, except under extreme and unrealistic
assumptions.” –Rubin, Stuart, and Zanutto (2004).
“…it does not appear possible to separate teacher and
school effects using currently available accountability
data.” –Raudenbush (2004).
Value-added assessment
Policymakers and school administrators generally
express no such reservations and offer strong
support for value-added.
“If quality instruction is essential for student
learning, then student learning should tell us
something about the quality of instruction.”
Descriptive accountability
Accountability system results may have value
without making causal inferences.
From this perspective, accountability results should
not be used to sanction teachers in schools.
Instead, they should be used to make sound
judgments about quality and needed improvements.
Descriptive information and identification of schools
that may require further investigation.
Describing student growth
The Colorado Growth Model was designed to
describe student growth and learning.
Quantile regression is used to model the complete
distribution of student achievement over time.
The model quantifies distance = rate × time,
probabilistically.
Growth percentiles describe the rarity of a student’s
current growth, given their prior achievement.
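The notion of a growth percentile can be sketched without quantile regression. The simplified stand-in below ranks a student's current score only against "academic peers", taken here to be students whose prior score falls within a fixed band. The band width, the data, and the function name `growth_percentile` are assumptions for illustration; this is not the Colorado Growth Model's actual estimation procedure.

```python
def growth_percentile(student, cohort, band=10):
    """Percentile rank of a student's current score among peers with similar prior scores.

    student and cohort entries are (prior score, current score) pairs.
    The student is assumed to be a member of the cohort, so peers is never empty.
    """
    prior, current = student
    peers = [cur for p, cur in cohort if abs(p - prior) <= band]
    below = sum(1 for cur in peers if cur < current)
    ties = sum(1 for cur in peers if cur == current)
    # Mid-rank convention: ties count as half below.
    return round(100 * (below + 0.5 * ties) / len(peers))

# Hypothetical cohort: five students starting near 400, two starting near 500.
cohort = [
    (400, 410), (402, 420), (398, 430), (405, 440), (395, 450),
    (500, 520), (505, 515),
]
low_start = growth_percentile((395, 450), cohort)
high_start = growth_percentile((500, 520), cohort)
print(low_start, high_start)
```

In this made-up cohort the student who started at 395 has a lower current score than the student who started at 500, yet a higher growth percentile, because each is compared only with students who began at a similar level.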
Student growth percentiles
Examining growth with achievement sheds new light
on school performance.
Median growth above the 50th percentile identifies
best practices and needs to provide support.
Median growth below the 50th percentile identifies
greatest needs and needs to receive support.
A gap-closing strategy is built around a consensus of
school improvement.
Student growth percentiles
Common yardstick
Most states have administrative data that can be
used as a common yardstick to identify the 25%
most effective teachers.
Supervisor ratings and classroom observations
provide no such common yardstick.
Local implementation of these other measures
varies in 1600 school districts nationwide.
More importantly, they do not directly represent
student learning.
Value-added and growth
limitations
Value-added and growth percentiles are only
available for teachers in certain subject matter
areas.
Value-added and growth percentiles are available
for only a small subset of teachers.
Value-added and growth percentiles are limited by
the test.
Growth metrics are too narrow to provide
information about how teachers can improve.
Value-added and growth
shortcomings
Value-added metrics and growth percentiles for
individual teachers fluctuate from year to year.
They can be influenced by factors beyond the
teacher’s control.
They are imperfect measures with a relatively large
error component.
Concern
How well does value-added predict the top 25%
from year-to-year?
How well do alternative measures of teacher
effectiveness predict the same top 25% from year to
year?
Classroom observations?
Principals’ ratings?
Student surveys?
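The year-to-year question can be made concrete as the share of one year's top 25% who are again in the top 25% the following year. The sketch below is a minimal illustration with hypothetical scores; the same function could be applied to any measure, whether value-added, observation ratings, or survey results.

```python
def top_quarter(scores):
    """Return the set of teachers in the top 25% of a {teacher: score} mapping."""
    k = max(1, len(scores) // 4)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

def persistence(year1, year2):
    """Share of year-1 top-quarter teachers who are also top-quarter in year 2."""
    first = top_quarter(year1)
    return len(first & top_quarter(year2)) / len(first)

# Hypothetical effectiveness scores for eight teachers in two consecutive years.
year1 = {"t1": 9, "t2": 8, "t3": 5, "t4": 4, "t5": 3, "t6": 2, "t7": 1, "t8": 0}
year2 = {"t1": 7, "t2": 2, "t3": 9, "t4": 5, "t5": 4, "t6": 3, "t7": 1, "t8": 0}
share = persistence(year1, year2)
print(share)
```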
Value-added and growth compare
favorably
Value-added metrics and growth percentiles compare favorably
with performance measures in other fields.
The correlation between SAT test scores and freshman success
in college is 0.35.
The correlation in batting averages between years in
professional baseball is 0.36.
The correlation between value-added estimates this year and
next lies between 0.20 and 0.60.
Most value-added estimates correlate between 0.30 and 0.40
from one year to the next.
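The year-to-year figures quoted above are Pearson correlations. As a reminder of what is being computed, here is a minimal sketch on hypothetical estimates for five teachers; the numbers are invented for illustration and are not taken from any study.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical value-added estimates for the same five teachers in two years.
this_year = [1, 2, 3, 4, 5]
next_year = [3, 1, 5, 2, 4]
r = pearson(this_year, next_year)
print(round(r, 2))
```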
Value-added and growth prognosis
Recommend the use of value-added measures and
growth percentiles, principally because they are
related to student learning and growth.
Be mindful of their limitations and imperfections.
Strive continually to improve these growth
measures.
Suggestion
Use multiple measures—not only value-added
metrics and growth percentiles.
Alternate measures should meaningfully supplement
state test score data and increase prediction.
Alternate measures should be applicable to a
broader range of teachers.
Provide direct information and feedback suggesting
how teachers can improve teaching.
Suggestion
Use core and non-core measures to validate the full
range of teacher effectiveness for a broader range of
teachers.
Where value-added metrics and growth percentiles
benchmark the reliability of other teacher effectiveness
measures.
The key idea is to predict the benchmark growth
measures.
Weight different measures based on their power of
prediction.
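One way to weight measures by their power of prediction is sketched below. It assumes each measure's correlation with the benchmark growth measure is already known, and it sets weights proportional to the squared correlation. Both the weighting rule and the correlation values are assumptions for illustration; the slides do not specify a formula.

```python
def prediction_weights(correlations):
    """Turn each measure's correlation with the benchmark into a weight.

    Weights are proportional to squared correlation (share of variance
    explained) and sum to 1. This rule is an illustrative assumption.
    """
    explained = {name: r ** 2 for name, r in correlations.items()}
    total = sum(explained.values())
    return {name: value / total for name, value in explained.items()}

def composite(scores, weights):
    """Weighted sum of one teacher's standardized scores on each measure."""
    return sum(weights[name] * scores[name] for name in weights)

# Hypothetical correlations of each measure with the benchmark growth measure.
correlations = {"observation": 0.2, "principal_rating": 0.4, "student_survey": 0.4}
weights = prediction_weights(correlations)

# Hypothetical standardized scores for one teacher.
teacher = {"observation": 1.0, "principal_rating": 0.0, "student_survey": 0.5}
score = composite(teacher, weights)
print(weights, score)
```

In practice the weights could also come from a regression of the benchmark on all measures at once, which would account for overlap among the measures.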
Observational measures
What is needed is not so much an accounting of
teacher time or a rating of teacher performance, but
rather higher level inferences about the teacher’s
ultimate purposes and effects.
Making holistic judgments requires higher levels of
inference.
In short, we need a method to obtain holistic
rankings reliably and validly.
Procedures must minimize rater effects and coding
errors.
Classroom Interactions
A complex situation, difficult to characterize
unassisted.
Teacher practice and student-teacher interactions—
from the participants’ point of view.
How do students and teachers interact in a practical
and personal sort of way?
How do they approach and solve problems together?
Are there different classroom profiles?
Concourse of meaning
The first challenge is to figure out what makes great
teaching.
This is difficult and controversial from an
educational perspective.
Yet relatively straightforward from a managerial
perspective.
Find the best educators and give them an
opportunity to debate and create the best pedagogy
and teaching practice.
Danielson Framework
Charlotte Danielson’s Framework serves as a source
of statements about teacher effectiveness.
The Framework is divided into:
--4 Domains
--23 Components
--76 Elements
--304 Items
Danielson Framework
The 4 Domains include:
--Planning and Preparation
--The Classroom Environment
--Instruction
--Professional Responsibilities
Danielson Framework
The 2 Domains that students actually see:
--The Classroom Environment
--Instruction
Danielson Framework
Scoring rubrics:
--Danielson: Unsatisfactory, Basic, Proficient, Distinguished
--New York State: Ineffective, Developing, Effective, Highly Effective
Danielson Framework
Items, by rubric level:
--Unsatisfactory: Students not working with the teacher are disruptive to the class.
--Basic: Small groups are only partially engaged while not working directly with the teacher.
--Proficient: The students are productively engaged during small group work.
--Distinguished: Students take the initiative with their classmates to ensure that their time is used productively.
Danielson Framework
The Danielson Framework is prescriptive.
Unsatisfactory and basic performance are often just
the negation of proficient and distinguished
performance.
No guide to what teachers do when under stress.
Good behavior follows rules. Lacks insight from
control theory and negative feedback.
“Students help set high standards.”
Danielson Framework
A good basis for a limited number of items.
These items can be readily supplemented with items
from other sources, by other authors.
Use these sources and create new items to fully
cover what students and teachers actually do.
Growth, value-added and teacher
effectiveness measures
Features compared across Student Growth, Value-Added, and Teacher Effectiveness:
Focus
--Student Growth: Student
--Value-Added: Teacher/Educator
--Teacher Effectiveness: Teacher/Educator
Questions addressed
--Student Growth: 1. How much did this student grow? 2. Is the student on track?
--Value-Added: 1. How does teacher-classroom growth compare to expected growth? 2. How does teacher-classroom growth compare to that of other teacher-classrooms?
--Teacher Effectiveness: To what extent is the teacher/educator effective?
Input variables
--Student Growth: Student scores only
--Value-Added: 1. Student scores and their characteristics 2. Teacher characteristics
--Teacher Effectiveness: 1. Multiple measures 2. Multiple methods
Output
--Student Growth: 1. Student achievement percentile 2. Student growth percentile
--Value-Added: 1. Teacher value-added metric 2. Teacher growth percentile
--Teacher Effectiveness: 1. Effectiveness scores on individual measures 2. Composite score on multiple measures 3. Predicted comparable value-added metric 4. Predicted comparable growth percentile