Why teaching will never be a research-based profession, and why that’s a Good Thing
Dylan Wiliam (@dylanwiliam)
www.dylanwiliam.net
Outline
- What does it mean for a practice to be “research-based”?
- Why educational research falls short
- What educational research should do, and how it should do it
- The role of teachers in educational research
What does it mean to be research-based?
- In a ‘research-based’ profession, professionals would, for the majority of decisions they need to take, be able to find and access credible research studies providing evidence that particular courses of action, implemented as directed, would be substantially more likely than others to lead to better outcomes.
Important caveats about research findings
- Educational research can only tell us what was, not what might be.
- Moreover, in education, “What works?” is rarely the right question, because:
  - everything works somewhere, and
  - nothing works everywhere, which is why
  - in education, the right question is, “Under what conditions does this work?”
Causality: a tricky issue
- Traditionally, causality has been defined in terms of a counterfactual argument:
  - “We may define a cause to be an object followed by another, and where all the objects, similar to the first, are followed by objects similar to the second. Or, in other words, where, if the first object had not been, the second never had existed.” (Hume, 1748, Section VII)
  - “If c and e are two actual events such that e would not have occurred without c, then c is a cause of e.” (Lewis, 1973, p. 563)
Research methods 101: causality
- Does c cause e?
  - Given c, e happened (factual)
    - Problem: post hoc ergo propter hoc
  - If c had not happened, e would not have happened (counterfactual)
    - Problem: c did happen
  - So we need to create a parallel world where c did not happen:
    - Same group, different time (baseline measurement): need to assume stability over time
    - Different group, same time (control group): need to assume groups are equivalent
    - Randomized controlled trial
Problems with RCTs in education
- Clustering (see the sketch below)
- Power
- Implementation
- Context
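Clustering deserves unpacking: pupils come in classes and schools, so outcomes within a cluster are correlated, and a trial that randomizes whole schools carries much less information than its raw pupil count suggests. A minimal sketch using the standard Kish design effect; the pupil numbers and intra-class correlation below are assumptions for illustration, not figures from the talk:

```python
# Sketch: how clustering shrinks the effective sample size of a school-based RCT.
def design_effect(cluster_size: int, icc: float) -> float:
    """Kish design effect: DEFF = 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc

n_pupils = 1000      # total pupils across all schools (assumed)
cluster_size = 25    # pupils per randomized class/school (assumed)
icc = 0.2            # intra-class correlation for school attainment (assumed)

deff = design_effect(cluster_size, icc)
print(f"design effect: {deff:.1f}")                     # 5.8
print(f"effective sample size: {n_pupils / deff:.0f}")  # ~172 of 1000 pupils
```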
Meta-analysis in education:
“I think you’ll find it’s a bit more complicated than that” (Goldacre, 2008)
Education Endowment Foundation toolkit

Intervention | Cost | Extra months of learning
Feedback | ££ | +8
Metacognition and self-regulation | ££ | +8
Peer tutoring | ££ | +6
Early years intervention | £££££ | +6
One to one tuition | ££££ | +5
Homework (secondary) | £ | +5
Collaborative learning | £ | +5
Phonics | £ | +4
Small group tuition | £££ | +4
Behaviour interventions | £££ | +4
Digital technology | ££££ | +4
Social and emotional learning | £ | +4
Education Endowment Foundation toolkit

Intervention | Cost | Extra months of learning
Parental involvement | £££ | +3
Reducing class size | £££££ | +3
Summer schools | £££ | +3
Sports participation | £££ | +2
Arts participation | ££ | +2
Extended school time | £££ | +2
Individualized instruction | £ | +2
After school programmes | ££££ | +2
Learning styles | £ | +2
Mentoring | £££ | +1
Homework (primary) | £ | +1
Education Endowment Foundation toolkit

Intervention | Cost | Extra months of learning
Teaching assistants | ££££ | 0
Performance pay | ££ | 0
Aspiration interventions | £££ | 0
Block scheduling | £ | 0
School uniform | £ | 0
Physical environment | ££ | 0
Ability grouping | £ | -1
An illustrative example: feedback
- Kluger and DeNisi (1996): review of 3,000 research reports on feedback
- Excluding those:
  - without adequate controls
  - with poor design
  - with fewer than 10 participants
  - where performance was not measured
  - without details of effect sizes
  left 131 reports, 607 effect sizes, involving 12,652 individuals
- On average, feedback increases achievement, but:
  - effect sizes were highly variable
  - 38% (50 out of 131) of effect sizes were negative
Understanding meta-analysis
- A technique for aggregating results from different studies by converting empirical results to a common measure (usually effect size)
- The standardized effect size is defined as:

  $d = \dfrac{\bar{x}_{\text{experimental}} - \bar{x}_{\text{control}}}{s_{\text{pooled}}}$
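A minimal sketch of this calculation, with invented scores for the two groups and the pooled standard deviation as the denominator:

```python
# Sketch: standardized effect size (Cohen's d) with a pooled-SD denominator.
from statistics import mean, stdev

experimental = [72, 75, 81, 68, 77, 74]  # invented post-test scores
control = [70, 69, 74, 66, 71, 68]

def cohens_d(treat, ctrl):
    n1, n2 = len(treat), len(ctrl)
    s1, s2 = stdev(treat), stdev(ctrl)
    # Pool the two groups' variances, weighted by degrees of freedom
    s_pooled = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (mean(treat) - mean(ctrl)) / s_pooled

print(f"d = {cohens_d(experimental, control):.2f}")  # d = 1.32
```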
Problems with meta-analysis
- The “file drawer” problem
- Variation in population variability
- Selection of studies
- Sensitivity of outcome measures
The “file drawer” problem
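A small simulation makes the problem concrete: every study below estimates a true effect of zero, but if only statistically significant results make it out of the file drawer, the published record shows a sizeable average effect. All parameters here are assumptions for illustration:

```python
# Sketch: publication bias when only "significant" results are published.
import random
import statistics

random.seed(1)
N_STUDIES, N_PER_GROUP, Z_CRIT = 1000, 20, 1.96

published = []
for _ in range(N_STUDIES):
    # True effect is zero: both groups drawn from the same distribution
    treat = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
    ctrl = [random.gauss(0, 1) for _ in range(N_PER_GROUP)]
    d = statistics.mean(treat) - statistics.mean(ctrl)  # SD = 1 by construction
    se = (2 / N_PER_GROUP) ** 0.5                       # standard error of the difference
    if abs(d / se) > Z_CRIT:                            # only "significant" studies get written up
        published.append(abs(d))

print(f"published: {len(published)} of {N_STUDIES} studies")
print(f"mean published |effect|: {statistics.mean(published):.2f}")  # ~0.7, not 0
```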
The importance of statistical power
- The statistical power of an experiment is the probability that the experiment will yield an effect that is large enough to be statistically significant.
- In single-level designs, power depends on:
  - the significance level set
  - the magnitude of the effect
  - the size of the experiment
- The power of most social science experiments is low:
  - Psychology: 0.4 (Sedlmeier & Gigerenzer, 1989)
  - Neuroscience: 0.2 (Button et al., 2013)
  - Education: 0.4
- Only lucky experiments get published…
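To see what figures like these mean in practice, here is a rough two-group power calculation using the normal approximation; the effect size and group size are assumptions chosen to be typical of education experiments:

```python
# Sketch: approximate power of a two-group comparison (normal approximation).
from math import sqrt
from statistics import NormalDist

def power_two_group(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    se = sqrt(2 / n_per_group)         # SE of the mean difference when SD = 1
    return 1 - z.cdf(z_crit - d / se)

# An effect of d = 0.3 (assumed) with 40 pupils per group:
print(f"power = {power_two_group(0.3, 40):.2f}")  # 0.27: most such studies miss a real effect
```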
Variation in variability
Annual growth in achievement, by age

[Chart: annual growth in achievement (in standard deviations) plotted against age, from 5 to 16]
- A 50% increase in the rate of learning for six-year-olds is equivalent to an effect size of 0.76
- A 50% increase in the rate of learning for 15-year-olds is equivalent to an effect size of 0.1

Bloom, Hill, Black, and Lipsey (2008)
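The two annotations are a single line of arithmetic: a 50% increase in the rate of learning adds half a year’s growth, so the equivalent effect size is half the annual growth in standard-deviation units (growth figures read back off the chart):

```latex
\Delta d = 0.5 \times g_{\text{age}}, \quad \text{where } g = \text{annual growth in SD units}
% Age 6:  g \approx 1.52 \implies \Delta d = 0.5 \times 1.52 \approx 0.76
% Age 15: g \approx 0.2  \implies \Delta d = 0.5 \times 0.2  = 0.1
```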
Variation in variability
- Studies with younger children will produce larger effect size estimates
- Studies with restricted populations (e.g., children with special needs, gifted students) will produce larger effect size estimates (see the sketch below)
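A two-line illustration of the second point, with assumed numbers: because the effect size divides the raw gain by the population’s standard deviation, the same raw gain looks larger in a more homogeneous (restricted) population:

```python
# Sketch: range restriction inflates standardized effect sizes.
raw_gain = 5.0              # same raw improvement in both scenarios (assumed)
sd_full_population = 15.0   # all pupils (assumed)
sd_restricted = 7.5         # e.g., a special-needs or gifted subgroup (assumed)

print(f"d, full population: {raw_gain / sd_full_population:.2f}")  # 0.33
print(f"d, restricted:      {raw_gain / sd_restricted:.2f}")       # 0.67
```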
Selection of studies
Feedback in STEM subjects
- Review of 9,000 papers on feedback in mathematics, science and technology
- Only 238 papers retained:
  - Background papers: 24
  - Descriptive papers: 79
  - Qualitative papers: 24
  - Quantitative papers: 111
    - Mathematics: 60
    - Science: 35
    - Technology: 16

Ruiz-Primo and Li (2013)
Classification of feedback studies
1. Who provided the feedback (teacher, peer, self, or technology-based)?
2. How was the feedback delivered (individual, small group, or whole class)?
3. What was the role of the student in the feedback (provider or receiver)?
4. What was the focus of the feedback (e.g., product, process, self-regulation for cognitive feedback; or goal orientation, self-efficacy for affective feedback)?
5. On what was the feedback based (student product or process)?
6. What type of feedback was provided (evaluative, descriptive, or holistic)?
7. How was feedback provided or presented (written, oral, or video)?
8. What was the referent of feedback (self, others, or mastery criteria)?
9. How, and how often, was feedback given in the study (one time or multiple times; with or without pedagogical use)?
Main findings

Characteristic of studies included | Maths | Science
Feedback treatment is a single event lasting minutes | 85% | 72%
Reliability of outcome measures | 39% | 63%
Validity of outcome measures | 24% | 3%
Dealing only or mainly with declarative knowledge | 12% | 36%
Schematic knowledge (e.g., knowing why) | 9% | 0%
Multiple feedback events in a week | 14% | 17%
Sensitivity to instruction
Sensitivity of outcome measures
- Distance of the assessment from the curriculum:
  - Immediate: e.g., science journals, notebooks, and classroom tests
  - Close: e.g., where an immediate assessment asked about the number of pendulum swings in 15 seconds, a close assessment asks about the time taken for 10 swings
  - Proximal: e.g., if an immediate assessment asked students to construct boats out of paper cups, the proximal assessment would ask for an explanation of what makes bottles float
  - Distal: e.g., where the assessment task is sampled from a different domain and where the problem, procedures, materials and measurement methods differed from those used in the original activities
  - Remote: e.g., standardized national achievement tests

Ruiz-Primo, Shavelson, Hamilton, and Klein (2002)
Impact of sensitivity to instruction

[Chart: effect sizes for close versus proximal assessments]
Why research hasn’t changed teaching
- Aristotle’s main intellectual virtues:
  - Episteme: knowledge of universal truths
  - Techne: ability to make things
  - Phronesis: practical wisdom
- Flyvbjerg (2001):
  - “By definition, phronetic researchers focus on values; for example by taking their point of departure in the classic value-rational questions: Where are we going? Is it desirable? What should be done?” (p. 130)
Maxims and rules
27
“Maxims are rules, the correct application of which is part
of the art which they govern. The true maxims of golfing or
of poetry increase our insight into golfing or poetry and
may even give valuable guidance to golfers and poets; but
these maxims would instantly condemn themselves to
absurdity if they tried to replace the golfer's skill or the
poet's art. Maxims cannot be understood, still less applied
by anyone not already possessing a good practical
knowledge of the art. They derive their interest from our
appreciation of the art and cannot themselves either
replace or establish that appreciation.”
Polanyi (1958, pp. 31-32)
The knowledge-creating spiral

from \ to | Tacit knowledge | Explicit knowledge
Tacit knowledge | Socialization: sympathized knowledge (sharing experience) | Externalization: conceptual knowledge (dialogue)
Explicit knowledge | Internalization: operational knowledge (learning by doing) | Combination: systemic knowledge (networking)

Nonaka and Takeuchi (1995)
Inquiry systems

System | Evidence
Leibnizian | Rationality
Lockean | Observation
Kantian | Representation
Hegelian | Dialectic
Singerian | Values, ethics, practical consequences

Churchman (1971)
Inquiry systems
The Lockean inquirer displays the ‘fundamental’ data
that all experts agree are accurate and relevant, and
then builds a consistent story out of these. The Kantian
inquirer displays the same story from different points of
view, emphasising thereby that what is put into the story
by the internal mode of representation is not given from
the outside. But the Hegelian inquirer, using the same
data, tells two stories, one supporting the most
prominent policy on one side, the other supporting the
most promising story on the other side (Churchman, 1971, p. 177).
Singerian inquiry systems
The ‘is taken to be’ is a self-imposed imperative of the community.
Taken in the context of the whole Singerian theory of inquiry and
progress, the imperative has the status of an ethical judgment. That is,
the community judges that to accept its instruction is to bring about a
suitable tactic or strategy [...]. The acceptance may lead to social
actions outside of inquiry, or to new kinds of inquiry, or whatever. Part
of the community’s judgement is concerned with the appropriateness
of these actions from an ethical point of view. Hence the linguistic
puzzle which bothered some empiricists—how the inquiring system
can pass linguistically from “is” statements to “ought” statements— is
no puzzle at all in the Singerian inquirer: the inquiring system speaks
exclusively in the “ought,” the “is” being only a convenient façon de
parler when one wants to block out the uncertainty in the discourse.
(Churchman, 1971, p. 202).
Educational research…
- …can be characterised as a never-ending process of assembling evidence that:
  - particular inferences are warranted on the basis of the available evidence;
  - such inferences are more warranted than plausible rival inferences;
  - the consequences of such inferences are ethically defensible.
- The basis for warrants, the other plausible interpretations, and the ethical bases for defending the consequences are themselves constantly open to scrutiny and question.
A way forward: in Pasteur’s quadrant

Quest for fundamental understanding? | Considerations of use? No | Considerations of use? Yes
Yes | Pure basic research (Bohr) | Use-inspired basic research (Pasteur)
No | Applied research unmotivated by applications (Brahe) | Pure applied research (Edison)

Stokes (1997)
The roles of teachers and researchers
- The role of teachers:
  - All teachers should be seeking to improve their practice through a process of ‘disciplined inquiry’
  - Some may wish to share their work with others
  - Some may wish to write their work up for publication
  - Some may wish to pursue research degrees
  - Some may even wish to undertake research
- The role of education researchers:
  - Abandoning “physics envy”
  - Working with teachers to make their findings applicable in contexts other than the context of data collection
References
Bloom, H. S., Hill, C. J., Black, A. R., & Lipsey, M. W. (2008).
Performance trajectories and performance gaps as
achievement effect-size benchmarks for educational
interventions. Journal of Research on Educational
Effectiveness, 1(4), 289–328.
Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., & Munafò, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365-376. doi: 10.1038/nrn3475
Churchman, C. W. (1971). The design of inquiring systems:
basic concepts of systems and organization. New York,
NY: Basic Books.
Flyvbjerg, B. (2001). Making social science matter: why
social inquiry fails and how it can succeed again.
Cambridge, UK: Cambridge University Press.
Goldacre, B. (2008). Bad science. London, UK: Fourth
Estate.
Hume, D. (1748). An enquiry concerning human
understanding. London, UK: Andrew Millar.
Kluger, A. N., & DeNisi, A. (1996). The effects of feedback
interventions on performance: a historical review, a
meta-analysis, and a preliminary feedback intervention
theory. Psychological Bulletin, 119(2), 254-284.
Lewis, D. (1973). Causation. Journal of Philosophy, 70(17),
556-567.
Nonaka, I., & Takeuchi, H. (1995). The knowledge-creating
company: how Japanese companies create the dynamics
of innovation. New York, NY: Oxford University Press.
Polanyi, M. (1958). Personal knowledge. London, UK:
Routledge & Kegan Paul.
Ruiz-Primo, M. A., & Li, M. (2013). Examining formative feedback in the classroom context: New research perspectives. In J. H. McMillan (Ed.), Sage handbook of research on classroom assessment (2nd ed., pp. 215-232). Thousand Oaks, CA: Sage.
Ruiz-Primo, M. A., Shavelson, R. J., Hamilton, L., & Klein, S.
(2002). On the evaluation of systemic science education
reform: searching for instructional sensitivity. Journal of
Research in Science Teaching, 39(5), 369-393.
Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of
statistical power have an effect on the power of studies?
Psychological Bulletin, 105(2), 309-316. doi:
10.1037/0033-2909.105.2.309
Stokes, D. E. (1997). Pasteur's quadrant: basic science and
technological innovation. Washington, DC: Brookings
Institution Press.