When is assessment learning oriented?

Dylan Wiliam
www.dylanwiliam.net
4th Biennial EARLI/Northumbria Assessment
Conference, Potsdam, Germany, August 2008
Overview of presentation
Why do we need formative assessment?
Clarifying assumptions and definitions about formative assessment
A theoretically-based definition of formative assessment
How formative assessment relates to other aspects of education
Which of the following categories of skill has disappeared from the
work-place most rapidly over the last 40 years?
1. Routine manual
2. Non-routine manual
3. Routine cognitive
4. Complex communication
5. Expert thinking/problem-solving
The changing demand for skill
Autor, Levy & Murnane, 2003
There is only one 21st century skill
So the model that says learn while you’re at school, while you’re young,
the skills that you will apply during your lifetime is no longer tenable. The
skills that you can learn when you’re at school will not be applicable. They
will be obsolete by the time you get into the workplace and need them,
except for one skill. The one really competitive skill is the skill of being
able to learn. It is the skill of being able not to give the right answer to
questions about what you were taught in school, but to make the right
response to situations that are outside the scope of what you were taught
in school. We need to produce people who know how to act when they’re
faced with situations for which they were not specifically prepared.
(Papert, 1998)
A convergence of interests
Philosophies of education (Williams, 1966)…
Transmission of culture
Preparation for employment
Self-actualization
…all require preparation for future learning (PFL)
Cannot be taught in isolation from other learning
Students still need the basic skills of literacy, numeracy, concepts and facts
Learning power is developed primarily through pedagogy, not curriculum
We have to develop the way teachers teach, not what they teach
Learning power environments
Key concept:
Teachers do not create learning
Learners create learning
Teaching is the engineering of effective learning environments
Key features of effective learning environments:
Create student engagement (pedagogies of engagement)
Are well-regulated (pedagogies of contingency)
Why pedagogies of engagement?
Intelligence is partly inherited
So what?
Intelligence is partly environmental
Environment creates intelligence
Intelligence creates environment
Learning environments
High cognitive demand
Inclusive
Obligatory
Motivation: cause or effect?
(Figure: Csikszentmihalyi's (1990) flow model: challenge (low to high) plotted against competence (low to high), with regions labelled anxiety, arousal, flow, worry, control, relaxation, apathy and boredom; flow occurs where both challenge and competence are high.)
Why pedagogies of contingency?
Learners do not learn what they are taught
Assessment is the bridge between teaching and learning, and thus the
central process of teaching (as opposed to lecturing).
Pedagogies of contingency
Personalisation
Mass customization (rather than mass production or individualisation)
Diversity
A valuable teaching resource (rather than a challenge to be minimized)
What gets learnt?
Denvir & Brown (1996): understanding of basic number in low-achieving 7-9 year olds
Extensive conceptual hierarchy developed
Students assessed
Teacher plans teaching programme
Students re-assessed
The research evidence
Several major reviews of the research…
Natriello (1987)
Crooks (1988)
Kluger & DeNisi (1996)
Black & Wiliam (1998)
Nyquist (2003)
… all find consistent, substantial effects
It’s the cost-benefit ratio, stupid…
Intervention | Extra months of learning per year | Cost/classroom/yr
Class-size reduction (by 30%) | 4 | €25k
Increase teacher content knowledge from weak to strong | 2 | ?
Formative assessment/Assessment for learning | 8 | €2.5k
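As a rough check on the cost-benefit claim, the figures in the table can be turned into extra months of learning per €1,000; this is a hypothetical calculation using only the table's numbers:

```python
# Hypothetical cost-effectiveness calculation using the figures above:
# (extra months of learning per year, cost per classroom per year in EUR).
interventions = {
    "Class-size reduction (by 30%)": (4, 25_000),
    "Formative assessment / AfL": (8, 2_500),
}

for name, (months, cost) in interventions.items():
    months_per_1k = months / (cost / 1_000)  # extra months per EUR 1,000 spent
    print(f"{name}: {months_per_1k:.2f} extra months per EUR 1,000")
# Class-size reduction yields 0.16; formative assessment yields 3.20,
# i.e. roughly twenty times the learning per euro.
```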
Independent dimensions of assessment
Scale
 Large-scale (nomothetic) versus small-scale (idiographic)
Locus
 Classroom versus examination hall
Authority
 Teacher-produced versus expert-produced
Scope
 Continuous versus one-off
Format
 Multiple-choice versus constructed response
Function
 Formative versus summative
No such thing as formative assessment
Purposes of assessments
 Evaluative
 Summative
 Formative
 Instruments
 Purposes
 Functions
Prospects for integration are bleak
Formative assessment involves the
creation of, and capitalization upon,
moments of contingency in instruction
“An assessment functions formatively
when evidence about student
achievement elicited by the
assessment is interpreted and used to
make decisions about the next steps in
instruction that are likely to be better,
or better founded, than the decisions
that would have been made in the
absence of that evidence.”
Black and Wiliam, 2009 (we hope!)
Some principles
A commitment to formative assessment
Does not entail any view of what is to be learned
Does not entail any view of what happens when learning takes place
…although clarity on these is essential.
Evolving conceptions of formative assessment
“Feedback” metaphor
 Components of a feedback system
data on the actual level of some measurable attribute;
data on the reference level of that attribute;
a mechanism for comparing the two levels and generating
information about the ‘gap’ between the two levels;
a mechanism by which the information can be used to alter the gap.
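The four components above can be sketched as a minimal loop; a toy illustration in which the function name and the adjustment rate are assumptions, not from the source:

```python
# Minimal sketch of the four feedback-system components (illustrative names):
#   1) data on the actual level, 2) data on the reference level,
#   3) a mechanism comparing the two and generating 'gap' information,
#   4) a mechanism using that information to alter the gap.

def feedback_step(actual: float, reference: float, adjust_rate: float = 0.5) -> float:
    gap = reference - actual           # component 3: compare actual with reference
    return actual + adjust_rate * gap  # component 4: act so as to reduce the gap

level = 2.0  # component 1: the actual level
for _ in range(5):
    level = feedback_step(level, reference=8.0)  # component 2: the reference level
# the gap halves on each step: 6.0, 3.0, 1.5, 0.75, 0.375 -> level approaches 8.0
```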
Feedback system
 Importance of eliciting the right data
 The role of the learner
 The role of the learning milieu (e.g., as activity system)
Unpacking formative assessment
Key processes
Establishing where the learners are in their learning
Establishing where they are going
Working out how to get there
Participants
Teachers
Peers
Learners
Aspects of formative assessment
| Where the learner is going | Where the learner is | How to get there
Teacher | Clarify and share learning intentions | Engineering effective discussions, tasks and activities that elicit evidence of learning | Providing feedback that moves learners forward
Peer | Understand and share learning intentions | Activating students as learning resources for one another (spans both remaining columns)
Learner | Understand learning intentions | Activating students as owners of their own learning (spans both remaining columns)
Five “key strategies”…
Clarifying, understanding, and sharing learning intentions
curriculum philosophy
Engineering effective classroom discussions, tasks and activities that
elicit evidence of learning
classroom discourse, interactive whole-class teaching
Providing feedback that moves learners forward
 feedback
Activating students as learning resources for one another
 collaborative learning, reciprocal teaching, peer-assessment
Activating students as owners of their own learning
metacognition, motivation, interest, attribution, self-regulated learning, self-assessment
(Wiliam & Thompson, 2007)
…and one big idea
Use evidence about achievement to adapt instruction to meet learner
needs
Keeping learning on track (KLT)
A pilot guides a plane or boat toward its destination by taking constant
readings and making careful adjustments in response to wind, currents,
weather, etc.
A KLT teacher does the same:
Plans a carefully chosen route ahead of time (in essence building the track)
Takes readings along the way
Changes course as conditions dictate
Effects of formative assessment
Long-cycle
 Span: across units, terms
 Length: four weeks to one year
 Impact: Student monitoring; curriculum alignment
Medium-cycle
 Span: within and between teaching units
 Length: one to four weeks
 Impact: Improved, student-involved, assessment; teacher cognition about learning
Short-cycle
 Span: within and between lessons
 Length:
 day-by-day: 24 to 48 hours
 minute-by-minute: 5 seconds to 2 hours
 Impact: more responsive classroom practice; increased student engagement
System responsiveness and time-frames
If evidence is to inform decision-making, the evidence needs to be available
before the decision…
 Long-cycle: Are our professional development programmes well-aligned with the
needs of our teachers?
 Cycle-length: two years
 Long-cycle: Does our curriculum adequately cover the state standards as
operationalized in the annual state test?
 Cycle-length: one year
 Medium-cycle: Is this student responding adequately to the tier 1 intervention for
reading or do they require a tier 2 intervention?
 Cycle-length: one month
 Short cycle: Does the class understand the generation of equivalent fractions well
enough to move on to the addition of fractions?
 Cycle-length: five minutes
The formative assessment hi-jack…
“Statistical process control” models of learning
USA: “Formative tests”
Tests administered at intervals of 6 to 10 weeks
Often not even keyed to instruction
England (5 to 16 year olds): “Assessment for learning strategy”
Government policy focused on target-setting and level chasing
Focus on “tracking achievement”
England (Higher Education): Portfolio assessment
New focus on formative e-assessment
Ideas whose time has come…or gone…
Diagnostic analysis of standardized tests is probably dead
 Lack of agreement about models
 Models make assumptions not just about items, but about how students answer them
 Dearth of assessment developers who know enough about learners
 Poor efficiency
More promising developments
 Use of Bayesian inference networks to build proficiency models
 But
 Proficiency models are not necessarily developmental models
 Models need large amounts of data to run
Getting the cycle right (and the right cycle)
Within this view of formative assessment
 feedback is not the whole of formative assessment
 It is not even the most important component of formative assessment
Medium- and long-cycle formative assessments
 Are supported by existing psychometrics
 Are easy to manage, but
 Generally produce small effects
Short-cycle formative assessments
 Contradict important psychometric assumptions
 Reliability
 Monotonicity of ICCs (item characteristic curves)
 Are difficult to establish, but
 Generally produce large effects
The overlap between age-cohorts is large…
The spread of achievement within each cohort is greater than generally assumed
…so individual progress is hard to track
On typical standardized tests growth is slow…
Average annual growth of achievement of individuals is around 0.4 sd
So monthly growth of individual achievement is 0.03 sd
…and the reliability of the test is limited…
A reliability of 0.90 corresponds to a standard error of measurement of 0.3
sd
In other words, the SEM of a highly reliable test is ten times the monthly
growth in achievement.
So standardized tests are completely useless for monitoring individual
progress in achievement—they are insensitive to instruction.
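The arithmetic can be checked directly; a minimal sketch, using the classical-test-theory result that with scores in sd units the standard error of measurement is the square root of one minus the reliability:

```python
import math

reliability = 0.90       # of a "highly reliable" standardized test
annual_growth_sd = 0.4   # average annual growth of individual achievement

# Classical test theory: SEM = SD * sqrt(1 - reliability); with scores
# expressed in sd units (SD = 1), SEM = sqrt(1 - reliability).
sem = math.sqrt(1 - reliability)        # about 0.316 sd, i.e. ~0.3 sd
monthly_growth = annual_growth_sd / 12  # about 0.033 sd

print(sem / monthly_growth)  # about 9.5: the SEM is roughly ten times monthly growth
```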
…and the data is of no use when it arrives…
Traditional testing deals with individuals, but teachers mostly deal with groups
Data-Push vs. Decision-Pull
 “Data-push”
 Quality control at end of an instructional sequence
 Monitoring assessment that identifies that remediation is needed, but not what
 Requires new routines to utilize the information
 “a series of unwanted answers to unasked questions” (Popper)
 Decision-Pull
 Starts with the decisions teachers make daily
 Supports teachers’ “on-the-fly” decisions
If a 30-item test provides useful information on an individual, then responses
from 30 individuals on a single item might provide useful information on a class
Characteristics of hinge-point questions
Relate to important learning outcomes necessary for progression in
learning
Can be used at any point in a learning sequence
Beginning (range-finding)
Middle (mid-course correction)
End (e.g., “exit pass”)
When used in “real-time”, the teacher must be able to collect and interpret the responses of all students in 30 seconds
Low probability of correct guessing
In which of these right-angled triangles is a² + b² = c²?
(Figure: six right-angled triangles, labelled A to F, with the sides labelled a, b and c in different positions; students identify the triangles for which a² + b² = c² holds.)
Build on key (mis-)conceptions…in math
What can you say about the means of the following two data sets?
Set 1: 10, 12, 13, 15
Set 2: 10, 12, 13, 15, 0
A. The two sets have the same mean.
B. The two sets have different means.
C. It depends on whether you choose to count the zero.
…in Science…
The ball sitting on the table is not moving. It is not moving because:
A. no forces are pushing or pulling on the ball.
B. gravity is pulling down, but the table is in the way.
C. the table pushes up with the same force that gravity pulls down.
D. gravity is holding it onto the table.
E. there is a force inside the ball keeping it from rolling off the table.
Wilson & Draney, 2004
… and History.
Why are historians concerned with bias when analyzing sources?
A. People can never be trusted to tell the truth
B. People deliberately leave out important details
C. People are only able to provide meaningful information if they
experienced an event firsthand
D. People interpret the same event in different ways, according to their
experience
E. People are unaware of the motivations for their actions
F. People get confused about sequences of events
Requirements for hinge-point questions
For an item to support instructional decision-making, the key
requirement is that in no case do incorrect and correct cognitive rules
map on to the same response (Wylie & Wiliam, 2007)
(Figure: the correct cognitive rule maps to response A; distinct incorrect rules map to responses B, C and D; no incorrect rule maps to response A.)
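This requirement can be expressed as a simple check over a rule-to-response mapping; the rule names and the helper function are illustrative assumptions, not from Wylie & Wiliam:

```python
# Hypothetical mapping from cognitive rules to the responses they produce;
# rule names are illustrative, not from Wylie & Wiliam (2007).
rule_to_response = {
    "correct rule": "A",
    "misconception 1": "B",
    "misconception 2": "C",
    "misconception 3": "D",
}

def supports_decision_making(mapping: dict) -> bool:
    """True if no incorrect rule maps to the same response as the correct rule."""
    correct = mapping["correct rule"]
    return all(response != correct
               for rule, response in mapping.items()
               if rule != "correct rule")

print(supports_decision_making(rule_to_response))  # True
```

If a newly discovered misconception turned out to produce response A, the check would fail, signalling that the item needs improvement.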
Item improvement
In which of these figures is one-quarter shaded?
(Figure: two versions of a multiple-choice item, each with options A to D showing differently shaded figures.)
The discovery of new incorrect cognitive rules that interpret item keys leads to item improvement
Feedback
Kinds of feedback in Higher Education (Nyquist, 2003)
 Weaker feedback only
Knowledge of results (KoR)
 Feedback only
KoR + clear goals or knowledge of correct results (KCR)
 Weak formative assessment
KCR + explanation (KCR+e)
 Moderate formative assessment
(KCR+e) + specific actions for gap reduction
 Strong formative assessment
(KCR+e) + activity
Effect of formative assessment (HE)
Kind of feedback | N | Effect*
Weaker feedback only | 31 | 0.14
Feedback only | 48 | 0.36
Weaker formative assessment | 49 | 0.29
Moderate formative assessment | 41 | 0.39
Strong formative assessment | 16 | 0.56
*corrected values
Feedback
Feedback should
 Cause thinking
 Provide guidance on how to improve
 Focus on what to take forward to the next assignment rather than what is deficient about the last assignment
 Be used
Techniques
 Delayed scores/grades
 Learning portfolios
 “Five of these answers are wrong. Find them and fix them”
 “Three-quarters of the way through a unit” test
Sharing learning intentions
Effective summative assessment:
 Requires raters to share a construct of quality
Effective formative assessment
 Requires learners to share the same construct of quality as the raters
 Requires teachers to have an anatomy of quality
Techniques
 Explaining learning intentions at start of lesson/unit
Learning intentions
Success criteria
 Intentions/criteria in students’ language
 Posters of key words to talk about learning
 Planning/writing frames
 Annotated examples ‘flesh out’ assessment standards (e.g. lab reports)
 Opportunities for students to design their own tests
Students owning their learning and as
learning resources for one another
Techniques
Students assessing their own/peers’ work
with rubrics
with exemplars
“two stars and a wish”
Training students to pose questions/identify group weaknesses
Self-assessment of understanding
Traffic lights
Red/green discs
The learning milieu
Dual processing theory (Boekaerts, 1993)
Self-regulated learning is both metacognitively governed and affectively
charged (Boekaerts, 2006, p. 348)
Students form mental representations of the task-in-context and appraise:
current perceptions of the task and the physical, social, and instructional
context within which it is embedded;
activated domain-specific knowledge and (meta)cognitive strategies
related to the task;
beliefs about motivation (including domain-specific capacity), interest and
the effects of effort
Resulting appraisal generates activity along one of two pathways:
‘well-being’ pathway
‘growth’ pathway
When is assessment learning oriented?
Assessment is learning oriented when it
Is integrated into instructional design so that it becomes invisible
Creates engagement in learning
Helps learners to understand what successful performance looks like
Generates information that can be interpreted in terms of a learning
progression
Focuses attention on growth rather than well-being
Provides a focus for supportive conversations between learners