Formative evaluation of teaching performance

Dylan Wiliam (@dylanwiliam)
INEE seminar, Mexico City, 5 December 2013
www.dylanwiliam.org
Outline
1. Education matters, for individuals and society
2. Teaching quality is the crucial variable
3. Teaching quality is not the same as teacher quality
4. Predicting who will be good teachers is almost impossible
5. Evaluating teacher quality is inherently difficult
6. Professional development is the key to teacher quality
7. Feedback is more complicated than generally assumed
8. Formative evaluation of teaching performance
9. Strategies for formative evaluation
10. Validity of formative evaluation of teaching
11. Implementing formative evaluation of teaching
Education matters: for individuals and society
What is the purpose of education?

Four main philosophies of education
- Personal empowerment
- Cultural transmission
- Preparation for citizenship
- Preparation for work
All are important: any education system is a (sometimes uneasy) compromise between these four forces.
Raising achievement matters
For individuals:
- Increased lifetime earnings
- Improved health
- Longer life
For society:
- Lower criminal justice costs
- Lower healthcare costs
- Increased economic growth:
  - Net present value to Mexico of a 25-point increase on PISA: US$5 trillion
  - Net present value to Mexico of getting all students to 400 on PISA: US$26 trillion (Hanushek & Woessmann, 2010)
Teaching quality is the crucial variable
We need to focus on classrooms, not schools
In most countries, variability at the classroom level is much greater than that at school level.
- As long as you go to school, it doesn’t matter very much which school you go to.
- But it matters very much which classrooms you are in.
[Figure: Variation in student achievement, partitioned into within-school variation and between-school variation (explained by the social background of schools, explained by the social background of students, and not explained by social background), for Iceland, Finland, Norway, Sweden, Poland, Denmark, Ireland, Canada, Spain, New Zealand, United States, Mexico, Portugal, Luxembourg, Switzerland, Italy, Greece, Slovak Republic, Austria, Germany, Belgium, Japan, Hungary, Turkey, Korea, Czech Republic, Netherlands and Australia (McGaw, 2008). Annotated values: within-school variation 64%; between-school variation explained by social background of schools 16%; between-school variation explained by social background of students 5%; between-school variation not explained by social background 18%.]
Teaching quality is not the same as teacher quality
Teaching quality/teacher quality
Teaching quality depends on a number of factors:
- The time teachers have to plan teaching
- The size of classes
- The resources available
- The skills of the teacher
All of these are important, but the quality of the teacher seems to be especially important.
Teacher quality
Take a group of 50 teachers all teaching the same subject:
- In the classroom of the best teacher, students learn in six months what students taught by the average teacher will take a year to learn.
- In the classroom of the least effective teacher, students will take two years to learn the same amount (Hanushek & Rivkin, 2006).
- And in the classrooms of the best teachers, students from disadvantaged backgrounds learn as much as others (Hamre & Pianta, 2005).
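The figures above can be turned into relative rates of learning. The sketch below assumes, as a simplification not stated in the source, that learning accrues at a constant rate, and expresses each teacher's rate in "average-teacher years of learning per year". On that reading, the best teacher's students learn at twice the average rate and the least effective teacher's at half the average rate.

```python
def learning_rate(months_to_learn_one_year: float) -> float:
    """Years of 'average-teacher' learning gained per year of teaching.

    Assumes learning accrues at a constant rate (a simplification).
    An average teacher takes 12 months to cover one year's learning.
    """
    return 12.0 / months_to_learn_one_year


# Months needed to cover what an average teacher covers in a year
# (figures as quoted above from Hanushek & Rivkin, 2006).
best = learning_rate(6)      # best of the 50 teachers
average = learning_rate(12)  # average teacher
worst = learning_rate(24)    # least effective of the 50 teachers

print(f"best: {best:.1f}x, average: {average:.1f}x, worst: {worst:.1f}x")
print(f"best vs. least effective: {best / worst:.1f} times as productive")
```

Under this assumption the ratio is 4.0, so the best teacher is four times as productive as the least effective; as a percentage, a fourfold rate is a (4 - 1) x 100% = 300% increase.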
The “dark matter” of teacher quality
Teachers make a difference.
But what makes the difference in teachers?
- In particular, can we predict student progress from:
  - Teacher qualifications?
  - Value-added?
  - Teacher observation?
Predicting who will be good teachers is almost impossible
Teacher qualifications and student progress
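[Table: Associations between teacher qualifications (general theory of education courses, teaching practice courses, pedagogical content courses, advanced university courses, aptitude test scores) and student progress in mathematics and reading at primary, middle and high school level, with associations marked + (positive) or - (negative) (Harris and Sass, 2007).]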
Evaluating teacher quality is inherently difficult
Framework for teaching (Danielson, 1996)
Four domains of professional practice:
1. Planning and preparation
2. Classroom environment
3. Instruction
4. Professional responsibilities
Links with student achievement (Sartain et al., 2011):
- Domains 1 and 4: no impact on student achievement
- Domains 2 and 3: some impact on student achievement
A framework for teaching (Danielson, 1996)
Domain 2: The classroom environment
- 2a: Creating an environment of respect and rapport
- 2b: Establishing a culture for learning
- 2c: Managing classroom procedures
- 2d: Managing student behavior
- 2e: Organizing physical space
Domain 3: Instruction
- 3a: Communicating with students
- 3b: Using questioning and discussion techniques
- 3c: Engaging students in learning
- 3d: Using assessment in instruction
- 3e: Demonstrating flexibility and responsiveness
Observations and teacher quality
[Figure: Percentage change in rate of learning in reading and mathematics for teachers rated Unsatisfactory, Basic, Proficient and Distinguished (Sartain, Stoelinga, Brown, Luppescu, Matsko, Miller, Durwood, Jiang, and Glazer, 2011).]
So, the highest-rated teachers are 30% more productive than the lowest rated.
But the best teachers are four times as productive as the least effective.
We don’t know much about teaching…
- We cannot predict how good a teacher will be
- We cannot tell good teaching when we see it:
  - Expert ratings of teaching
  - Student ratings of teaching
- We cannot evaluate teaching with test scores
Traditional approaches to improving teaching
Two main approaches:
- Removing ineffective teachers
- Rewarding good teachers
Problems:
- Consume large amounts of management time
- Technically difficult to do well
- Create competition between teachers
- Differentially effective according to task complexity
The story so far
- Improving student achievement is a priority for every country
- Improving student achievement requires improving teacher quality
- Improving teacher quality requires investment in serving teachers
Professional development is the key to teacher quality
General conclusions about expertise
- Elite performance is the result of at least a decade of maximal efforts to improve performance through an optimal distribution of deliberate practice
- What distinguishes experts from others is the commitment to deliberate practice
- Deliberate practice is:
  - an effortful activity that can be sustained only for a limited time each day
  - neither motivating nor enjoyable: it is instrumental in achieving further improvement in performance
Expertise
According to Berliner (1994), experts:
- Excel mainly in their own domain
- Often develop automaticity for the repetitive operations that are needed to accomplish their goals
- Are more sensitive to the task demands and social situation when solving problems
- Are more opportunistic and flexible in their teaching than novices
- Represent problems in qualitatively different ways than novices
- Have faster and more accurate pattern recognition capabilities
- Perceive meaningful patterns in the domain in which they are experienced
- Begin to solve problems more slowly, but bring richer and more personal sources of information to bear
Effects of experience in teaching
[Figure: Extra months per year of learning in mathematics and reading, by years of teaching experience (0, 1, 2, 3 to 5) (Rivkin, Hanushek and Kain, 2005).]
Implications for education systems
- Pursuing a strategy of getting the “best and brightest” into teaching is unlikely to succeed
- Currently, all teachers’ improvement slows, and most teachers actually stop improving, after two or three years in the classroom
- Expertise research therefore suggests that they are only beginning to scratch the surface of what they are capable of
- What we need is to persuade those with a real passion for working with young people to become teachers, and to continue to improve as long as they stay in the job
- There is no limit to what we can achieve if we support our teachers in the right way
Feedback is more complex than generally assumed
Important caveats about research findings
- Educational research can only tell us what was, not what might be.
- Moreover, in education, “What works?” is not the right question, because:
  - everything works somewhere, and
  - nothing works everywhere, which is why
  - in education, the right question is, “Under what conditions does this work?”
Effects of formative assessment
Standardized effect size: differences in means, measured in population standard deviations
Source                          Effect size
Kluger & DeNisi (1996)          0.41
Black & Wiliam (1998)           0.4 to 0.7
Wiliam et al. (2004)            0.32
Hattie & Timperley (2007)       0.96
Shute (2008)                    0.4 to 0.8
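Understanding meta-analysis
- A technique for aggregating results from different studies by converting empirical results to a common measure (usually effect size)
- Standardized effect size is defined as:
  $$d = \frac{\bar{x}_E - \bar{x}_C}{\sigma}$$
  where x̄_E and x̄_C are the mean scores of the experimental and control groups, and σ is the population standard deviation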
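As a rough illustration of standardized effect size (a difference in means divided by a standard deviation), the sketch below computes one from two lists of scores. Because the population standard deviation is rarely known, it assumes the population SD is estimated by the pooled sample standard deviation of the two groups (the usual Cohen's d estimator); the scores are made-up numbers for illustration only.

```python
import math
import statistics


def standardized_effect_size(treatment: list[float], control: list[float]) -> float:
    """Difference in means divided by the pooled standard deviation.

    The pooled SD stands in for the population SD, which is usually unknown.
    """
    n_t, n_c = len(treatment), len(control)
    var_t = statistics.variance(treatment)  # sample variance (n - 1 denominator)
    var_c = statistics.variance(control)
    pooled_sd = math.sqrt(((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2))
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd


# Illustrative (made-up) test scores for two small groups.
treatment_scores = [12, 14, 16]
control_scores = [10, 12, 14]
print(standardized_effect_size(treatment_scores, control_scores))  # 1.0
```

Here both groups have a standard deviation of 2 points and the means differ by 2 points, so the effect size is 1.0.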
Problems with meta-analysis:
- The “file drawer” problem
- Variation in population variability
- Selection of studies
- Sensitivity of outcome measures
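One way to see the “variation in population variability” problem: the same raw improvement yields very different standardized effect sizes depending on how spread out the population being studied is. The sketch below uses hypothetical numbers (a 5-point gain, with standard deviations of 10 and 5 points) chosen only to make the arithmetic easy; they are not drawn from any of the studies cited.

```python
def effect_size(mean_gain: float, population_sd: float) -> float:
    """Standardized effect size: raw difference in means divided by the SD."""
    return mean_gain / population_sd


raw_gain = 5.0  # hypothetical improvement in test points, identical in both studies

# Hypothetical study A: broad population (e.g. a whole age cohort), SD = 10 points.
# Hypothetical study B: restricted population (e.g. one attainment band), SD = 5 points.
for label, sd in [("broad population", 10.0), ("restricted population", 5.0)]:
    print(f"{label}: effect size = {effect_size(raw_gain, sd):.2f}")
```

The broad study reports 0.50 and the restricted one 1.00, so a meta-analysis pooling them would treat them as different results even though the underlying improvement is the same.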
Effects of feedback
- Kluger & DeNisi (1996) review of 3000 research reports
- Excluding those:
  - without adequate controls
  - with poor design
  - with fewer than 10 participants
  - where performance was not measured
  - without details of effect sizes
- left 131 reports, 607 effect sizes, involving 12,652 individuals
- On average, feedback increases achievement
- Effect sizes highly variable
- 38% (50 out of 131) of effect sizes were negative
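The exclusion criteria in the Kluger and DeNisi review amount to a filter applied to each candidate report. The sketch below expresses them that way; the `Report` record and its field names are hypothetical, invented for illustration, and the example reports are made up (they are not Kluger and DeNisi's data).

```python
from dataclasses import dataclass


@dataclass
class Report:
    # Hypothetical fields, one per exclusion criterion in the review.
    name: str
    has_adequate_controls: bool
    has_poor_design: bool
    participants: int
    measured_performance: bool
    reports_effect_sizes: bool


def meets_inclusion_criteria(r: Report) -> bool:
    """Keep a report only if it survives every exclusion criterion."""
    return (
        r.has_adequate_controls
        and not r.has_poor_design
        and r.participants >= 10  # exclude reports with fewer than 10 participants
        and r.measured_performance
        and r.reports_effect_sizes
    )


candidates = [
    Report("A", True, False, 120, True, True),
    Report("B", True, False, 8, True, True),     # too few participants
    Report("C", False, False, 200, True, True),  # no adequate controls
    Report("D", True, False, 45, True, False),   # no effect-size details
]
included = [r.name for r in candidates if meets_inclusion_criteria(r)]
print(included)  # ['A']
```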
Getting feedback right is hard
                   Feedback indicates performance…
Response type      Falls short of goal        Exceeds goal
Change behavior    Increase effort            Exert less effort
Change goal        Reduce aspiration          Increase aspiration
Abandon goal       Decide goal is too hard    Decide goal is too easy
Reject feedback    Feedback is ignored        Feedback is ignored
Kluger and DeNisi’s conclusions…
These considerations of utility and alternative interventions
suggest that even an FI [feedback intervention] with
demonstrated positive effects on performance should not be
administered whenever possible. Rather, additional
development of FIT [feedback intervention theory] is needed to
establish the circumstance under which positive FI effects on
performance are also lasting and efficient and when these
effects are transient and have questionable utility. This research
must focus on the processes induced by FIs and not on the
general question of whether FIs improve performance—look at
how little progress 90 years of attempts to answer the latter
question have yielded. (p. 278)
Formative evaluation of teaching performance
The evidence base for formative assessment
- Fuchs & Fuchs (1986)
- Natriello (1987)
- Crooks (1988)
- Bangert-Drowns et al. (1991)
- Dempster (1991, 1992)
- Elshout-Mohr (1994)
- Kluger & DeNisi (1996)
- Black & Wiliam (1998)
- Nyquist (2003)
- Brookhart (2004)
- Allal & Lopez (2005)
- Köller (2005)
- Brookhart (2007)
- Wiliam (2007)
- Hattie & Timperley (2007)
- Shute (2008)
Assessment for learning/formative assessment
“Assessment for learning is any assessment for which the first priority
in its design and practice is to serve the purpose of promoting
students’ learning. It thus differs from assessment designed primarily
to serve the purposes of accountability, or of ranking, or of certifying
competence. An assessment activity can help learning if it provides
information that teachers and their students can use as feedback in
assessing themselves and one another and in modifying the teaching
and learning activities in which they are engaged. Such assessment
becomes ‘formative assessment’ when the evidence is actually used
to adapt the teaching work to meet learning needs.” (Black, Harrison,
Lee, Marshall & Wiliam, 2004, p. 10)
Theoretical questions
- Need for clear definitions
  - So that research outcomes are commensurable
- Theorization and definition
  - Possible variables:
    - Category (instruments, outcomes, functions)
    - Beneficiaries (teachers, learners)
    - Timescale (months, weeks, days, hours, minutes)
    - Consequences (outcomes, instruction, decisions)
    - Theory of action (what gets formed?)
Formative assessment: a new definition
“An evaluation of teacher performance functions
formatively to the extent that evidence of teacher
performance that is elicited by the assessment is
interpreted by leaders, teachers, or their peers to
make decisions about the professional development of
the teacher that are likely to be better, or better
founded, than those that would have been taken in the
absence of that evidence.”


Formative evaluation involves the creation of, and capitalization upon, moments of contingency in the regulation of teachers’ learning processes.
Kinds of regulation (Perrenoud, 1998):
- Proactive
- Interactive
- Retroactive
Agents:
- Leaders (external regulation)
- Peers (co-regulation)
- Teachers (self-regulation)
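Strategies of formative evaluation
Unpacking formative assessment of teaching
Three agents (leader, peer, teacher) crossed with three questions (where the teacher is going, where the teacher is now, how to get there):
- Where the teacher is going (leader, peer and teacher): clarifying, sharing and understanding learning intentions
- Where the teacher is now (leader): engineering effective situations, tasks and activities that elicit evidence of development
- How to get there (leader): providing feedback that moves learners forward
- Where the teacher is now and how to get there (peer): activating teachers as learning resources for one another
- Where the teacher is now and how to get there (teacher): activating teachers as owners of their own learning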
Validity of formative evaluation
Validity: an evolving concept
- Evolution of the idea:
  - A property of a test
  - A property of students’ results on a test
  - A property of the inferences drawn on the basis of test results
- For any test:
  - some inferences are warranted
  - some are not
- “One validates not a test but an interpretation of data arising from a specified procedure” (Cronbach, 1971; emphasis in original)
- No such thing as a valid assessment!
Validating formative evaluation
- An assessment is a procedure for making inferences:
  - about what the learner knows (summative)
  - about what to do next (formative)
- Summative inferences are validated by consistency of meanings across different readers
- Formative inferences are validated by the consequences for learners
Implementing formative evaluation of teaching performance
A model for teacher learning
- Content, then process
- Content (what we want teachers to change):
  - Evidence
  - Ideas (strategies and techniques)
- Process (how to go about change):
  - Choice
  - Flexibility
  - Small steps
  - Accountability
  - Support
Choice
A strengths-based approach to change
- Talent development requires attending to both strengths and weaknesses
- The question is how to distribute attention between the two:
  - For novices, attention to weaknesses is likely to have the greatest payoff
  - For more experienced teachers, attention to strengths is likely to be more advantageous
Flexibility
Tight, but loose
- Two opposing factors in any school reform:
  - Need for flexibility to adapt to local circumstances
  - Need to maintain fidelity to the theory of action of the reform, to minimise “lethal mutations”
- The “tight but loose” formulation:
  … combines an obsessive adherence to central design principles (the “tight” part) with accommodations to the needs, resources, constraints, and affordances that occur in any school or district (the “loose” part), but only where these do not conflict with the theory of action of the intervention.
Small steps
Looking at the wrong knowledge
- The most powerful teacher knowledge is not explicit:
  - What we know is more than we can say.
  - That’s why telling teachers what to do doesn’t work.
  - And that is why most professional development has been relatively ineffective.
- Improving practice involves changing habits, not adding knowledge:
  - That’s why it’s hard:
    - And the hardest bit is not getting new ideas into people’s heads. It’s getting the old ones out.
  - That’s why it takes time.
- But it doesn’t happen naturally:
  - If it did, the most experienced teachers would be the most productive, and that’s not true (Hanushek & Rivkin, 2006).
Hand hygiene in hospitals
Study                                 Focus            Compliance rate
Preston, Larson, & Stamm (1981)       Open ward        16%
                                      ICU              30%
Albert & Condie (1981)                ICU              28% to 41%
Larson (1983)                         All wards        45%
Donowitz (1987)                       Pediatric ICU    30%
Graham (1990)                         ICU              32%
Dubbert (1990)                        ICU              81%
Pettinger & Nettleman (1991)          Surgical ICU     51%
Larson et al. (1992)                  Neonatal ICU     29%
Doebbeling et al. (1992)              ICU              40%
Zimakoff et al. (1992)                ICU              40%
Meengs et al. (1994)                  ER (Casualty)    32%
Pittet, Mourouga, & Perneger (1999)   All wards        48%
                                      ICU              36%
(Pittet, 2001)
Accountability
Making a commitment
- Action planning:
  - Forces teachers to make their ideas concrete and creates a record
  - Makes the teachers accountable for doing what they promised
  - Requires each teacher to focus on a small number of changes
  - Requires the teachers to identify what they will give up or reduce
- A good action plan:
  - Does not try to change everything at once
  - Spells out specific changes in teaching practice
  - Relates to the five “key strategies” of assessment for learning (AFL)
  - Is achievable within a reasonable period of time
  - Identifies something that the teacher will no longer do or will do less of
Support
Supportive accountability
- What is needed from teachers: a commitment to
  - the continual improvement of practice
  - focus on those things that make a difference to students
- What is needed from leaders: a commitment to engineer effective learning environments for teachers by
  - creating expectations for continually improving practice
  - keeping the focus on the things that make a difference to students
  - providing the time, space, dispensation, and support for innovation
  - supporting risk-taking