What’s involved in “rigorous impact evaluation”?

IOCE proposes more holistic perspectives

Presented by Jim Rugh to the NONIE Conference in Paris, 28 March 2011
Join me in a review of the basics of:

• An introduction to various evaluation designs
Illustrating the need for a quasi-experimental longitudinal time-series evaluation design:

[Figure: trend lines on the scale of a major impact indicator for project participants vs. a comparison group, with observation points at baseline, end-of-project evaluation, and post-project evaluation.]
… one at a time, beginning with the most rigorous design.
Key to the notation used in the following designs:

X = Intervention (treatment), i.e. what the project does in a community
O = Observation event (e.g. baseline, mid-term evaluation, end-of-project evaluation)
P (top row) = Project participants
C (bottom row) = Comparison (control) group
Design #1: Longitudinal Quasi-experimental

Project participants:  P1   X   P2   X   P3   P4
Comparison group:      C1       C2       C3   C4
(Observation points: baseline, midterm, end-of-project evaluation, post-project evaluation.)
Design #2: Quasi-experimental (pre + post, with comparison)

Project participants:  P1   X   P2
Comparison group:      C1       C2
(Observation points: baseline and end-of-project evaluation.)
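To make Design #2 concrete: its estimate of the project effect is commonly computed as a difference-in-differences, the participants’ before-after change minus the comparison group’s change. A minimal Python sketch, with invented numbers purely for illustration:

```python
# Difference-in-differences sketch for Design #2 (pre + post, with comparison).
# P1/P2 and C1/C2 follow the notation above; all numbers are invented.

def diff_in_diff(p1, p2, c1, c2):
    """Estimated project effect = (P2 - P1) - (C2 - C1)."""
    return (p2 - p1) - (c2 - c1)

# Hypothetical mean scores on a major impact indicator:
p1, p2 = 40.0, 55.0   # project participants at baseline and end of project
c1, c2 = 41.0, 47.0   # comparison group at the same two points

print(f"Participants changed by {p2 - p1:+.1f}; comparison changed by {c2 - c1:+.1f}")
print(f"Estimated project effect: {diff_in_diff(p1, p2, c1, c2):+.1f}")  # +9.0 here
```

If the comparison group really captures what would have happened anyway, the remainder is plausibly attributable to the project.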
Design #2+: Typical Randomized Control Trial

Project participants:  P1   X   P2
Control group:         C1       C2
(Research subjects are randomly assigned either to the project group or to the control group; observation points: baseline and end-of-project evaluation.)
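Because Design #2+ randomizes assignment, the control group’s endline mean itself estimates the counterfactual, so the effect estimate can be as simple as a difference in endline means. A sketch using simulated data (everything here is invented, not from any real study):

```python
# Simulated RCT sketch for Design #2+: randomly assign subjects,
# then compare endline means. All data is generated for illustration.
import random

random.seed(1)
subjects = list(range(200))
random.shuffle(subjects)                           # random assignment...
treated, control = subjects[:100], subjects[100:]  # ...to project or control group

# Hypothetical endline outcome: a common base level plus noise,
# with a +5.0 true effect added for the treated group.
def endline_outcome(is_treated):
    return 50.0 + random.gauss(0, 5) + (5.0 if is_treated else 0.0)

p2 = [endline_outcome(True) for _ in treated]
c2 = [endline_outcome(False) for _ in control]

mean = lambda xs: sum(xs) / len(xs)
print(f"Estimated effect (mean P2 - mean C2): {mean(p2) - mean(c2):+.2f}")
```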
Design #3: Truncated QED (quasi-experimental design)

Project participants:  X   P1   X   P2
Comparison group:          C1       C2
(Observation points: midterm and end-of-project evaluation.)
Design #4: Pre + post of project; post-only comparison

Project participants:  P1   X   P2
Comparison group:               C
(Observation points: baseline and end-of-project evaluation; the comparison group is observed only at end of project.)
Design #5: Post-test only of project and comparison

Project participants:  X   P
Comparison group:          C
(Observation point: end-of-project evaluation only.)
Design #6: Pre + post of project; no comparison group

Project participants:  P1   X   P2
(Observation points: baseline and end-of-project evaluation.)
Design #7: Post-test only of project participants

Project participants:  X   P
(Observation point: end-of-project evaluation only.)
Summary of the seven designs (T1 = baseline, T2 = midterm, T3 = endline, T4 = ex-post; X = intervention):

Design   T1 (baseline)   (intervention)   T2 (midterm)   (intervention, cont.)   T3 (endline)   T4 (ex-post)
  1      P1  C1          X                P2  C2         X                       P3  C3         P4  C4
  2      P1  C1          X                                                       P2  C2
  3                      X                P1  C1         X                       P2  C2
  4      P1              X                                                       P2  C
  5                      X                                                       P   C
  6      P1              X                                                       P2
  7                      X                                                       P

Note: These seven evaluation designs are described in the RealWorld Evaluation book.
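One compact way to restate the table is as a small data structure, with one sequence of observations and interventions per group. This encoding is our own illustration, not taken from the RealWorld Evaluation book:

```python
# The seven designs encoded as per-group sequences over the time points
# T1 (baseline), intervention, T2 (midterm), intervention (cont.), T3 (endline), T4 (ex-post).
# "O" = observation, "X" = intervention, "-" = nothing at that point.
DESIGNS = {
    1: ("O X O X O O", "O - O - O O"),  # longitudinal quasi-experimental
    2: ("O X - - O -", "O - - - O -"),  # pre + post, with comparison
    3: ("- X O X O -", "- - O - O -"),  # truncated QED
    4: ("O X - - O -", "- - - - O -"),  # pre + post; post-only comparison
    5: ("- X - - O -", "- - - - O -"),  # post-test only, project and comparison
    6: ("O X - - O -", None),           # pre + post; no comparison group
    7: ("- X - - O -", None),           # post-test only, project participants
}

for n, (p_row, c_row) in sorted(DESIGNS.items()):
    line = f"Design {n}:  P: {p_row}"
    if c_row:
        line += f"   C: {c_row}"
    print(line)
```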
What kinds of evaluation designs are actually used in the real world (of international development)? Findings from meta-evaluations of 336 evaluation reports of an INGO:

• Post-test only: 59%
• Before-and-after: 25%
• With-and-without: 15%
• Other counterfactual: 1%
Even proponents of RCTs have acknowledged that RCTs are only appropriate for perhaps 5% of development interventions. An empirical study by Forss and Bandstein, examining evaluations in the OECD/DAC DEReC database by bilateral and multilateral organisations, found that only 5% used even a counterfactual design.
While we recognize that experimental and quasi-experimental designs have a place in the toolkit for impact evaluations, we think that more attention needs to be paid to the roughly 95% of situations where these designs would not be possible or appropriate.
One form of Program Theory (Logic) Model

[Figure: a results chain of Design → Inputs → Implementation Process → Outputs → Outcomes → Impacts (orange boxes), set within the contexts in which the project operates: the economic context, the political context, the institutional and operational context, and the socio-economic and cultural characteristics of the affected populations (blue boxes).]

Note: The orange boxes are included in conventional Program Theory Models. The addition of the blue boxes provides the recommended, more complete analysis.

Sustainability: the further question of whether these changes will last beyond the life of the project.
[Figure: a generic problem tree and its mirror-image solution tree. In the problem tree, a central PROBLEM leads upward to consequences and rests on PRIMARY CAUSES 1, 2 and 3; each primary cause is underpinned by secondary causes (2.1, 2.2, 2.3), and each secondary cause by tertiary causes (2.2.1, 2.2.2, 2.2.3). In the solution tree, a DESIRED IMPACT leads upward to consequences and is supported by OUTCOMES 1, 2 and 3, which rest on OUTPUTS 2.1, 2.2 and 2.3, produced by Interventions 2.2.1, 2.2.2 and 2.2.3.]
[Figure: an example problem tree. Problem: high infant mortality rate, because children are malnourished. Contributing causes include insufficient food, poor quality of food, and diarrheal disease; diarrheal disease in turn stems from contaminated water, flies and rodents, and unsanitary practices (people do not wash hands before eating, do not use facilities correctly), pointing to the need for improved health policies.]
[Figure: an example objectives tree. Desired impact: reduction in poverty, through women empowered. Pathways include women in leadership roles, economic opportunities for women, and young women educated; the last is supported by improved educational policies, parents persuaded to send girls to school, female enrollment rates increasing, curriculum improved, schools built, and the school system hiring and paying teachers.]

To have synergy and achieve impact, all of these need to address the same target population.
Program Goal (at impact level): Young women educated

• Advocacy Project Goal: Improved educational policies enacted (ASSUMPTION: that others will do this)
• Construction Project Goal: More classrooms built (OUR project)
• Teacher Education Project Goal: Improved quality of curriculum (PARTNER will do this)
We need to recognize which evaluative process is most appropriate for measurement at various levels:

• Impact
• Outcomes
• Outputs
• Activities
• Inputs

[Figure: brackets map these levels to the appropriate process: PROGRAM EVALUATION at the impact/outcome levels, PROJECT EVALUATION at the middle levels, and PERFORMANCE MONITORING at the activity/input levels.]
The “Rosetta Stone of Logical Frameworks”

[Table: maps the logical-framework terminology used by many agencies (American Red Cross, AusAID, CARE, CIDA + GTZ, CRS Proframe, DANIDA + DfID, EIDHR, European Union, FAO + UNDP, NORAD, Peace Corps PC/LogFrame, SAVE Results Framework, UNHCR, USAID LogFrame, and the USAID Results Framework) onto a common needs-based hierarchy: Ultimate Impact, End Outcomes, Intermediate Outcomes, Outputs, Interventions, and Inputs. Each agency applies its own labels (e.g. Program Goal, Scheme Goal, Strategic Objective, Development Objective, Project Purpose, Expected Results, Outputs, Activities, Inputs) to roughly equivalent levels.]
How do we know if the observed changes in the project participants or communities (income, health, attitudes, school attendance, etc.) are due to the implementation of the project (credit, water supply, transport vouchers, school construction, etc.) or to other unrelated factors (changes in the economy, demographic movements, other development programs, etc.)?
What change would have occurred in the relevant condition of the target population if there had been no intervention by this project?
Control group = randomized allocation of subjects to project and non-treatment groups.

Comparison group = a separate procedure for sampling project and non-treatment groups that are as similar as possible in all aspects except the treatment (intervention).
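In practice, building a comparison group that is “as similar as possible in all aspects except the treatment” often means matching non-participants to participants on observable characteristics. A minimal nearest-neighbor matching sketch, with hypothetical households and a deliberately crude distance function (a real design would normalize scales or match on a propensity score):

```python
# Nearest-neighbor matching sketch: build a comparison group of
# non-participants who most closely resemble the participants.
# All records are hypothetical.

participants = [
    {"id": "p1", "hh_size": 5, "income": 120},
    {"id": "p2", "hh_size": 3, "income": 200},
]
non_participants = [
    {"id": "n1", "hh_size": 6, "income": 110},
    {"id": "n2", "hh_size": 3, "income": 210},
    {"id": "n3", "hh_size": 2, "income": 400},
]

def distance(a, b):
    # Crude similarity on observed covariates; income is rescaled so
    # the two covariates contribute on comparable orders of magnitude.
    return abs(a["hh_size"] - b["hh_size"]) + abs(a["income"] - b["income"]) / 50

comparison_group = [
    min(non_participants, key=lambda n: distance(p, n)) for p in participants
]
print([n["id"] for n in comparison_group])  # ['n1', 'n2']
```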
[Timeline slide, 2003–2010: milestones in the rapid rise of the randomized-trial movement, including J-PAL, described as “best understood as a network of affiliated researchers … united by their use of the randomized trial methodology…”]
So, are Randomized Control Trials (RCTs) the Gold Standard, and should they be used in most if not all program impact evaluations?

Yes or no? Why or why not?
If so, under what circumstances should they be used?
If not, under what circumstances would they not be appropriate?
• Question needed for evidence-based policy → What works?
• What interventions look like → Discrete, standardized intervention
• How interventions work → Pretty much the same everywhere
• Process needed for evidence uptake → Knowledge transfer

(Adapted from Patricia Rogers, RMIT University)
• Complicated, complex programs where there are multiple interventions by multiple actors
• Projects working in evolving contexts (e.g. countries in transition, conflicts, natural disasters)
• Projects with multiple layered logic models, or unclear cause-effect relationships between outputs and higher-level “vision statements” (as is often the case in the real world of international development projects)
There are other methods for assessing the counterfactual:

• Reliable secondary data that depicts relevant trends in the population
• Longitudinal monitoring data (if it includes the non-reached population)
• Qualitative methods to obtain the perspectives of key informants, participants, neighbors, etc.
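The first of these can be made concrete: project the participants’ baseline forward using the trend found in the secondary data, and treat that projection as the counterfactual. A sketch with invented enrolment figures:

```python
# Using a secondary-data trend as the counterfactual (illustrative numbers).
# Suppose national statistics show school enrolment rising ~2% per year
# in comparable districts over the project period.

baseline_enrolment = 60.0      # participants' rate at baseline (%)
secondary_trend = 0.02         # annual change seen in secondary data
years = 4                      # project duration
observed_endline = 71.0        # participants' rate at end of project (%)

counterfactual = baseline_enrolment * (1 + secondary_trend) ** years
print(f"Projected without the project: {counterfactual:.1f}%")
print(f"Estimated project contribution: {observed_endline - counterfactual:+.1f} points")
```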
A conventional statistical counterfactual (with random selection into treatment and control groups) is often not possible or appropriate:

• When conducting the evaluation of complex interventions
• When the project involves a number of interventions which may be used in different combinations in different locations
• When each project location is affected by a different set of contextual factors
• When it is not possible to use standard implementation procedures for all project locations
• When many outcomes involve complex behavioral changes
• When many outcomes are multidimensional or difficult to measure through standardized quantitative indicators
Some of the alternative approaches for
constructing a counterfactual
A: Theory-based approaches
1. Program theory / logic models
2. Realistic evaluation
3. Process tracing
4. Venn diagrams and many other PRA methods
5. Historical methods
6. Forensic detective work
7. Compilation of a list of plausible alternative causes
8. …
(for more details see www.RealWorldEvaluation.org)
Some of the alternative approaches for
constructing a counterfactual
B: Quantitatively oriented approaches
1. Pipeline design (see the sketch after this list)
2. Natural variations
3. Creative uses of secondary data
4. Creative creation of comparison groups
5. Comparison with other programs
6. Comparing different types of interventions
7. Cohort analysis
8. …
(for more details see www.RealWorldEvaluation.org)
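As flagged above, the pipeline design (item 1) uses people already selected for a later phase of the same program, but not yet reached, as the stand-in comparison group. A hypothetical sketch:

```python
# Pipeline-design sketch: phase-2 communities (selected, not yet served)
# act as the comparison for phase-1 communities. All data is invented.

phase1_outcomes = [62, 68, 71, 65, 70]   # already received the intervention
phase2_outcomes = [58, 61, 57, 63, 60]   # in the pipeline, not yet reached

mean = lambda xs: sum(xs) / len(xs)
print(f"Phase 1 mean: {mean(phase1_outcomes):.1f}")
print(f"Phase 2 (pipeline) mean: {mean(phase2_outcomes):.1f}")
print(f"Estimated effect: {mean(phase1_outcomes) - mean(phase2_outcomes):+.1f}")
```

The design leans on the assumption that later-phase communities were selected by the same criteria as earlier ones, so they approximate the counterfactual.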
Some of the alternative approaches for
constructing a counterfactual
C: Qualitatively oriented approaches
1. Concept mapping
2. Creative use of secondary data
3. Many PRA techniques
4. Process tracing
5. Compiling a book of possible causes
6. Comparisons between different projects
7. Comparisons among project locations with different
combinations and levels of treatment
(for more details see www.RealWorldEvaluation.org)
Different lenses needed for different situations in the RealWorld

Simple (like following a recipe):
• Recipes are tested to assure easy replication
• The best recipes give good results every time

Complicated (like sending a rocket to the moon):
• Sending one rocket to the moon increases assurance that the next will also be a success
• There is a high degree of certainty of outcome

Complex (like raising a child):
• Raising one child provides experience but is no guarantee of success with the next
• Uncertainty of outcome remains

Sources: Westley et al. (2006) and Stacey (2007), cited in Patton 2008; also presented by Patricia Rogers at the Cairo impact conference, 2009.
What’s a conscientious evaluator to do when facing such a complex world?

[Figure: the objectives tree again (consequences above a DESIRED IMPACT, supported by OUTCOMES 1–3, OUTPUTS 2.1–2.3, and Interventions 2.2.1–2.2.3). “A Simple RCT” is drawn around a single intervention-to-output link, while “A more comprehensive design” encompasses the whole tree.]
Expanding the results chain for a multi-donor, multi-component program

[Figure: a results chain with four strands.
Inputs: Donor; Government; Other donors.
Outputs: credit for small farmers; rural roads; schools; health services.
Intermediate outcomes: increased production; access to off-farm employment; increased school enrolment; increased use of health services.
Impacts: increased rural household income; increased political participation; improved education performance; improved health.]

Attribution gets very difficult! Consider the plausible contributions each makes.
OECD-DAC (2002: 24) defines impact as “the positive and negative, primary and secondary long-term effects produced by a development intervention, directly or indirectly, intended or unintended. These effects can be economic, sociocultural, institutional, environmental, technological or of other types”.

Is this limited to direct attribution? Or does it point to the need for counterfactuals or Randomized Control Trials (RCTs)?
1. Direct cause-effect relationship between one output (or a very limited number of outputs) and an outcome that can be measured by the end of the research project? → Pretty clear attribution.

… OR …

2. Changes in higher-level indicators of sustainable improvement in the quality of life of people, e.g. the MDGs (Millennium Development Goals)? → More significant. But assessing plausible contribution is more feasible than assessing unique direct attribution.
Rigorous impact evaluation should include (but is not limited to):

1) thorough consultation with and involvement by a variety of stakeholders,
2) articulating a comprehensive logic model that includes relevant external influences,
3) getting agreement on desirable ‘impact level’ goals and indicators,
4) adapting evaluation design as well as data collection and analysis methodologies to respond to the questions being asked,
5) adequately monitoring and documenting the process throughout the life of the program being evaluated,
6) using an appropriate combination of methods to triangulate evidence being collected,
7) being sufficiently flexible to account for evolving contexts,
8) using a variety of ways to determine the counterfactual,
9) estimating the potential sustainability of whatever changes have been observed,
10) communicating the findings to different audiences in useful ways,
11) etc. …
The point is that the list of what’s required for ‘rigorous’ impact evaluation goes way beyond initial randomization into treatment and ‘control’ groups.
To attempt to conduct an impact evaluation of a program using only one pre-determined tool is to suffer from myopia, which is unfortunate. On the other hand, to prescribe to donors and senior managers of major agencies that there is a single preferred design and method for conducting all impact evaluations can have, and has had, unfortunate consequences for all of those who are involved in the design, implementation and evaluation of international development programs.
We must be careful that in using the “Gold Standard” we do not violate the “Golden Rule”: “Judge not, that you be not judged!” In other words: “Evaluate others as you would have them evaluate you.”
Caution: Too often what is called Impact Evaluation is based on a “we will examine and judge you” paradigm. When we want our own programs evaluated, we prefer a more holistic approach.

To use the language of the OECD/DAC, let’s be sure our evaluations are consistent with these criteria:

RELEVANCE: The extent to which the aid activity is suited to the priorities and policies of the target group, recipient and donor.

EFFECTIVENESS: The extent to which an aid activity attains its objectives.

EFFICIENCY: Measures the outputs – qualitative and quantitative – in relation to the inputs.

IMPACT: The positive and negative changes produced by a development intervention, directly or indirectly, intended or unintended.

SUSTAINABILITY: Concerned with measuring whether the benefits of an activity are likely to continue after donor funding has been withdrawn. Projects need to be environmentally as well as financially sustainable.
The bottom line is defined by this question: Are our programs making plausible contributions towards positive impact on the quality of life of our intended beneficiaries? Let’s not forget them!