Implications of complication and complexity
for evaluation
Patricia J. Rogers
CIRCLE (Collaboration for Interdisciplinary Research, Consulting and Learning in Evaluation)
Royal Melbourne Institute of Technology, Australia
Patricia.Rogers@rmit.edu.au
Evaluation Revisited: Improving the quality of evaluative practice
by embracing complexity
Utrecht, the Netherlands, May 20–21, 2010
The naïve experimentalism view of evaluation and
evidence-based policy and practice

[Diagram: researchers conduct a single study (or several studies) and find that thing 'A' works → policymakers decide to do thing 'A' → practitioners do thing 'A' → intended beneficiaries benefit as expected]
But things are often more complicated or complex than this …
What can (and does) go wrong with naïve experimentalism

[Diagram: the same chain, annotated with what can go wrong at each step:]
• random error
• narrow studies that ignore important evidence
• misrepresentation of results
• negative effects ignored
• differential effects – thing 'A' only works in some contexts
• not feasible in other locations
• not scaleable
An alternative view of knowledge-building
An approach to evaluation and evidence-based policy and practice that recognizes the complicated and complex aspects of situations and interventions

[Diagram: policymakers, researchers and evaluators, practitioners and managers, and community and civil society jointly addressing the questions: What is needed? What is possible? What works? What works for whom in what situations? What is working?]
Advocacy for RCTs (Randomised Controlled Trials) in development evaluation

[Timeline, 2003–2010, including:]
• “J-PAL is best understood as a network of affiliated researchers … united by their use of the randomized trial methodology”
• advocacy for more use of RCTs
• a TED talk that used leeches to illustrate the alternative to using RCTs as evidence
• the argument that experimental and quasi-experimental designs have a comparative advantage because they provide an unbiased numeric estimate of impact
Distinguishing between RCTs and naïve experimentalism
RCT (Randomised Controlled Trial)
– one of many research designs that can be suitable
– involves randomly assigning (truly randomly, not ad hoc) potential participants either to receive the treatment (or one of several versions of the treatment) or to be in the control group (who might receive nothing or the current standard treatment) – see the illustrative sketch below
– in ‘double blind’ RCTs neither the participants nor the researchers know who is in the treatment group (eg the control group gets pills that look the same, and group assignments are kept secret until after the results are recorded)
Naïve experimentalism
– believes that RCTs always provide the best evidence (the ‘gold standard’ approach)
– ignores (or is ignorant of) the potential risks in using RCTs and the other approaches that can be appropriate
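A minimal illustrative sketch of random assignment and a difference-in-means impact estimate (all data, the outcome model and the effect size are invented for illustration; this is not from any real trial):

```python
# Minimal illustrative sketch (invented data and effect size, not a real trial):
# truly random assignment to treatment or control, then a difference-in-means
# estimate of the average treatment effect.
import random
import statistics

random.seed(42)

N = 200
participants = list(range(N))
random.shuffle(participants)                     # truly random, not ad hoc
treatment, control = participants[: N // 2], participants[N // 2:]

def outcome(treated):
    """Hypothetical outcome model: noisy baseline plus a true effect of +2."""
    return random.gauss(10, 3) + (2 if treated else 0)

treated_outcomes = [outcome(True) for _ in treatment]
control_outcomes = [outcome(False) for _ in control]

# With random assignment, the difference in means is an unbiased estimate
# of the average treatment effect.
effect = statistics.mean(treated_outcomes) - statistics.mean(control_outcomes)
print(f"Estimated average treatment effect: {effect:.2f}")
```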
Exploring complication and complexity in evaluation

[Timeline of publications exploring complexity in evaluation: 1997, 2006, 2008, 2008, 2009, 2010]
Some unhelpful ways ‘complex’ is used
• Difficult –
eg little available data, hard to get additional data
• Beyond scrutiny –
eg too technical for others to understand or challenge
• Ad hoc –
eg too overwhelmed with implementation to think about planning or follow through
Two framings of simple, complicated and complex

Simple
– Glouberman and Zimmerman 2002: tested ‘recipes’ assure replicability; expertise is not needed
– Kurtz and Snowden 2003: the domain of the ‘known’; cause and effect are well understood; best practices can be confidently recommended

Complicated
– Glouberman and Zimmerman 2002: success requires a high level of expertise in many specialized fields, plus coordination
– Kurtz and Snowden 2003: the domain of the ‘knowable’; expert knowledge is required

Complex
– Glouberman and Zimmerman 2002: every situation is unique – previous success does not guarantee success; expertise can help but is not sufficient; relationships are key
– Kurtz and Snowden 2003: the domain of the ‘unknowable’; patterns are only evident in retrospect
Using the framework
• Can be used to refer to a situation or to an intervention
• Not useful as a way of classifying the whole situation or intervention – most useful to consider aspects of interventions
• Not normative – complex is not better than simple; simple interventions can still be difficult to do well, or to get good data about
Simple can sometimes be appropriate

“It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience.”

“Everything should be made as simple as possible, but no simpler.”

Albert Einstein, Oxford University, 1933
Implications of complicated and complex situations and
interventions for evaluation
1. Focus
2. Governance
3. Consistency
4. Necessariness
5. Sufficiency
6. Change trajectory
7. Unintended outcomes
(Funnell and Rogers 2010, Purposeful Program Theory. Jossey-Bass)
1. Focus - implications for evaluation?
Simple
Single set of objectives
Complicated
Different objectives valued by different
stakeholders
Multiple, competing imperatives
Objectives at multiple levels of a system
Complex
Emergent objectives
(Funnell and Rogers 2010, Purposeful Program Theory. Jossey-Bass)
Focus – Objectives at multiple levels of a system

[Diagram: the intervention comprises activities at system level, site level and client level, each leading to shorter-term outcomes at that level, which together contribute to longer-term outcomes]
(Funnell and Rogers 2010, Purposeful Program Theory. Jossey-Bass)
2. Governance - implications for evaluation?
Simple
Single organization
Complicated
Specific organizations with formalized requirements
Complex
Emergent organizations working together in flexible
ways
(Funnell and Rogers 2010, Purposeful Program Theory. Jossey-Bass)
3. Consistency - implications for evaluation?
Simple
Standardized
Complicated
Adapted
Complex
Adaptive
(Funnell and Rogers 2010, Purposeful Program Theory. Jossey-Bass)
What interventions look like – teaching reading
Simple – best practice
Teachers select a reading program which has been shown in RCTs to be effective (eg Reading First program – $1b p.a.)
Complicated – adapted
Teachers identify children’s learning stage and provide exercises to match this (eg Victorian Catholic Education Systems Literacy Assessment Project)
Griffin, P. (2009) ‘Ambitious new project to raise literacy and numeracy levels in Victorian Schools’. http://newsroom.melbourne.edu/studio/ep-29
Griffin, P., Murray, L., Care, E., Thomas, A., & Perri, P. (2009). Developmental Assessment: Lifting literacy through Professional Learning Teams. Assessment in Education, in press.
What interventions look like – supporting small businesses
Complicated – what are the ‘active ingredients’?
An RCT compares the effect on small businesses of providing
(i) business training
(ii) savings incentive
(iii) wages support
(iv) business training and savings incentive
(v) business training and wages support
(vi) savings incentive and wages support (McKenzie, 2010)
(a sketch of comparing arm means follows below)
Complex – adaptive
A program works with small businesses to iteratively identify what they need, and meet this need
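An illustrative sketch (simulated businesses and invented, additive component effects; not McKenzie's actual data or analysis) of comparing mean outcomes across the arms of such a multi-arm trial to look for ‘active ingredients’:

```python
# A minimal, hypothetical sketch (simulated data, not McKenzie's actual study)
# of comparing mean outcomes across the arms of a multi-arm trial to look for
# 'active ingredients' and their combinations.
import random
import statistics

random.seed(0)

# Hypothetical average effects of each component on business profit.
COMPONENT_EFFECT = {"training": 3.0, "savings": 1.5, "wages": 0.5}

ARMS = {
    "training": ["training"],
    "savings": ["savings"],
    "wages": ["wages"],
    "training + savings": ["training", "savings"],
    "training + wages": ["training", "wages"],
    "savings + wages": ["savings", "wages"],
}

def simulated_profit(components):
    """Baseline profit plus the (assumed additive) effect of each component."""
    return random.gauss(20, 5) + sum(COMPONENT_EFFECT[c] for c in components)

# Simulate 100 small businesses per arm and compare arm means.
for arm, components in ARMS.items():
    outcomes = [simulated_profit(components) for _ in range(100)]
    print(f"{arm:20s} mean profit: {statistics.mean(outcomes):6.2f}")
```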
4. Necessariness - implications for evaluation?
Simple
Only way to achieve the intended impacts
Complicated
One of several ways to achieve the intended impacts – which can
be identified in advance
Complex
One of several ways to achieve the intended impacts – which are
only evident in retrospect
(Funnell and Rogers 2010, Purposeful Program Theory. Jossey-Bass)
Necessariness – with/without comparisons
A US program to assist poor families through linking them to services
found that families receiving the program experienced improvements
in welfare — but so did the families that were randomly assigned to a
control group that did not receive the visits (St. Pierre and Layzer
1999).
“[As this case shows], a good study helps avoid spending funds on ineffective programs and redirects attention to improving designs or to more promising alternatives.” (When Will We Ever Learn?)
But families in the control group had also accessed services.
The appropriate comparison would have been to compare the costs incurred in the different groups (an illustrative cost comparison is sketched below).
St Pierre et al, 1996 Report on the National Evaluation of the Comprehensive Child Development
Program. Summary and links to reports available at
http://www.researchforum.org/project_abstract_166.html
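A purely hypothetical illustration (all figures invented) of the kind of cost comparison meant here – comparing what each group's services cost rather than assuming the control group received nothing:

```python
# Hypothetical illustration only (numbers invented): when treatment and control
# families both end up receiving services, comparing costs per family across
# groups can be more informative than comparing outcomes alone.
cost_of_program_per_family = 4500        # assumed cost of the case-managed program
cost_of_services_used_by_control = 3800  # assumed cost of services control families accessed on their own

extra_cost = cost_of_program_per_family - cost_of_services_used_by_control
print(f"Extra cost per family of the program over usual access: ${extra_cost}")
# If outcomes are similar in both groups, the question becomes what this extra
# cost buys (e.g. easier access, equity), not simply 'does it work?'.
```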
5. Sufficiency - implications for evaluation?
Simple
Sufficient to produce the intended impacts. Works the same for
everyone
Complicated
Only works in conjunction with other interventions (previously,
concurrently, or subsequently) and/or only works for some
people and/or only works in some circumstances – which can
be identified in advance
Complex
Only works in conjunction with other interventions (previously,
concurrently, or subsequently) and/or only works for some
people and/or only works in some circumstances – which is
only evident in retrospect
(Funnell and Rogers 2010, Purposeful Program Theory. Jossey-Bass)
False negatives – the potted plant thought experiment
If 200 potted plants are randomly assigned either to a treatment group that receives daily water or to a control group that receives none, and both groups are placed in a dark cupboard, the treatment group does not have better outcomes than the control group.
Possible conclusions: Watering plants is
ineffective in making them grow.
Better conclusion: Water is not sufficient.
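A minimal simulation sketch of the thought experiment (hypothetical growth model, invented numbers): water only ‘works’ when light is also present, so a trial run entirely in the dark finds no effect of watering.

```python
# A minimal sketch (simulated, hypothetical growth model) of the potted plant
# thought experiment: watering only 'works' when light is also present, so an
# RCT run entirely in the dark finds no effect of watering.
import random
import statistics

random.seed(1)

def growth(watered, has_light):
    """Hypothetical model: growth requires both water and light."""
    return random.gauss(0, 0.5) + (5 if (watered and has_light) else 0)

def run_trial(has_light, n=200):
    """Simulate n/2 watered and n/2 unwatered plants; return difference in mean growth."""
    watered_mean = statistics.mean(growth(True, has_light) for _ in range(n // 2))
    dry_mean = statistics.mean(growth(False, has_light) for _ in range(n // 2))
    return watered_mean - dry_mean

print(f"Estimated effect of watering in a dark cupboard: {run_trial(False):.2f}")
print(f"Estimated effect of watering with light present: {run_trial(True):.2f}")
```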
False positives – Early Head Start
• Early Head Start program - on average effective. Listed as an
‘evidence-based program’
• But unfavourable outcomes for children in families with high levels of demographic risk factors (Mathematica Policy Research Inc 2002; Westhorp 2008) – see the sketch below
Westhorp, G. (2008). Development of Realist Evaluation Methods for Small Scale Community Based Settings. Unpublished PhD thesis, Nottingham Trent University.
Mathematica Policy Research Inc (2002). Making a Difference in the Lives of Infants and Toddlers and Their Families: The Impacts of Early Head Start, Vol 1. US Department of Health and Human Services.
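An illustrative simulation (invented data and effect sizes, not the actual Early Head Start findings) of how a favourable average effect can mask unfavourable effects for a high-risk subgroup:

```python
# Hypothetical sketch (simulated data, not the actual Early Head Start results):
# a program can look effective on average while having unfavourable effects
# for a high-risk subgroup, so the average can mask harm.
import random
import statistics

random.seed(2)

def score(treated, high_risk):
    """Assumed effects: +4 for low-risk families, -2 for high-risk families."""
    effect = (-2 if high_risk else 4) if treated else 0
    return random.gauss(50, 5) + effect

families = []
for _ in range(2000):
    high_risk = random.random() < 0.3      # assume 30% of families are high risk
    treated = random.random() < 0.5        # random assignment
    families.append({"high_risk": high_risk, "treated": treated,
                     "score": score(treated, high_risk)})

def mean_effect(group):
    treated = [fam["score"] for fam in group if fam["treated"]]
    control = [fam["score"] for fam in group if not fam["treated"]]
    return statistics.mean(treated) - statistics.mean(control)

high_risk_families = [fam for fam in families if fam["high_risk"]]
print(f"Average effect, all families:       {mean_effect(families):+.2f}")
print(f"Average effect, high-risk families: {mean_effect(high_risk_families):+.2f}")
```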
6. Change trajectory - implications for evaluation?
Simple
Simple relationship – readily understood
Complicated
Complicated relationship – needs expertise to understand and predict
Complex
Complex relationship (including tipping points) – cannot be predicted but only understood in retrospect
(Funnell and Rogers 2010, Purposeful Program Theory. Jossey-Bass)
Complicated dose-response relationship – does stress improve performance?

[Graph: dose-response relationship between stress and performance]
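An illustrative sketch assuming an inverted-U dose-response curve (in the spirit of the classic stress-performance relationship; the exact shape is an assumption here), showing that the ‘effect of more stress’ depends on the starting point:

```python
# Hypothetical sketch (assumed inverted-U shape): whether 'more stress' improves
# performance depends on where on the dose-response curve you start, so a single
# linear 'effect of stress' is misleading.
def performance(stress):
    """Assumed inverted-U: performance peaks at moderate stress (stress in 0..10)."""
    return -(stress - 5) ** 2 + 25

for stress in (1, 3, 5, 7, 9):
    marginal = performance(stress + 1) - performance(stress)
    print(f"stress={stress}: performance={performance(stress):5.1f}, "
          f"effect of one more unit of stress={marginal:+.1f}")
```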
7. Unintended outcomes - implications for evaluation?
Simple
Unintended outcomes can be anticipated and
monitored
Complicated
Different unintended outcomes are likely in particular
combinations of circumstances – expertise is needed
to anticipate them and identify them
Complex
Unintended outcomes cannot be anticipated but only
identified (and addressed) as they emerge or in
retrospect
(Funnell and Rogers 2010, Purposeful Program Theory. Jossey-Bass)
Some thoughts on how evaluation might help us to
understand the complicated and the complex
Issues that may need to be addressed
1. Focus
2. Governance
3. Consistency
4. Necessariness
5. Sufficiency
6. Change trajectory
7. Unintended outcomes

Possible evaluation methods, approaches and methodologies
• Emergent evaluation design that can accommodate emergent program objectives and emergent evaluation issues
• Collaborative evaluation across different stakeholders and organisations
• Non-experimental approaches to causal attribution/contribution that don’t rely on a standardized ‘treatment’
• Realist evaluation that pays attention to the contexts in which causal mechanisms operate
• Realist synthesis that can integrate diverse evidence (including credible single case studies) in different contexts
• ‘Butterfly nets’ to catch unanticipated results
Looking forward to
hearing about your approaches
to addressing these issues in evaluation