Introduction to impact evaluation

Quality Impact Evaluation: an
introductory workshop
Howard White
International Initiative for Impact Evaluation
PART I
INTRODUCTION TO IMPACT
EVALUATION
What is impact and why does it matter?
• Write down a definition of impact
evaluation
• Impact = the (outcome) indicator with the
intervention compared to what it would
have been in the absence of the
intervention
• Impact evaluation is the only meaningful
way to measure results
Why results?
• Results agenda from early 1990s
• USAID experience
• Move away from outcome monitoring to impact evaluation to evidence-based development
What is evidence-based development?
Allocating resources to programs designed and implemented on the basis of evidence of what works and what doesn't
Why did the Bangladesh Integrated Nutrition Project (BINP) fail?
Comparison of impact estimates
Summary of theory

Assumptions in the causal chain:
• Target group (mothers of young children) participate in the program
• Target group for nutritional counselling is the relevant one
• Exposure to nutritional counselling results in knowledge acquisition and behaviour change
• Behaviour change is sufficient to change child nutrition
• Children are correctly identified to be enrolled in the program
• Food is delivered to those enrolled
• Food is of sufficient quantity and quality
• Supplementary feeding is supplemental, i.e. no leakage or substitution
→ Improved nutritional outcomes
The theory of change

[Diagram: the same causal chain as above, built up over four slides, highlighting in turn: the right target group for nutritional counselling; knowledge acquired and used; the right children enrolled in the programme; supplementary feeding is supplementary, i.e. no leakage or substitution.]
Participation rates

[Figure: probability of participation in growth monitoring (0.0 to 1.0), from the base value through: living in Rajnagar or Shahrasti; living with mother-in-law; living with mother-in-law in Rajnagar or Shahrasti; higher education; no water or sanitation (remote location).]
The attribution problem:
factual and counterfactual
[Figure: factual and counterfactual outcome paths over time; the gap between them, the impact, varies over time.]
… and is it sustainable?
What has been the impact of the French
revolution?
“It is too early to say”
Zhou Enlai
• So where does the counterfactual come
from?
• Most usual is to use a comparison group
of similar people / households / schools /
firms…
What do we need to measure impact?
Girls' secondary enrolment

                      Before   After
Project (treatment)      –       92
Comparison               –        –

The majority of evaluations have just this information… which means we can say absolutely nothing about impact.
Before versus after: single difference comparison
Before versus after = 92 – 40 = 52

                      Before   After
Project (treatment)     40       92
Comparison               –        –

"Scholarships have led to rising schooling of young girls in the project villages"

This 'before versus after' approach is outcome monitoring, which has become popular recently. Outcome monitoring has its place, but it is not impact evaluation.
Rates of completion of elementary male and female students in all rural China's poor areas

[Figure: share of rural children (0–100%) completing elementary school, girls and boys, 1993 versus 2008.]
Post-treatment comparison
Single difference = 92 – 84 = 8

                      Before   After
Project (treatment)      –       92
Comparison               –       84

But we don't know if they were similar before… though there are ways of doing this (statistical matching = quasi-experimental approaches)
Double difference
Double difference = (92 – 40) – (84 – 26) = 52 – 58 = –6

                      Before   After
Project (treatment)     40       92
Comparison              26       84

Conclusion: Longitudinal (panel) data, with a comparison group, allow for the strongest impact evaluation design (though still need matching). SO WE NEED BASELINE DATA FROM PROJECT AND COMPARISON AREAS
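For readers who want the arithmetic in executable form, a minimal sketch in plain Python, using the enrolment figures from the scholarship example above:

```python
# Enrolment rates from the scholarship example (hypothetical data).
treat_before, treat_after = 40, 92   # project (treatment) villages
comp_before, comp_after = 26, 84     # comparison villages

before_after = treat_after - treat_before   # 52: ignores the secular trend
single_diff = treat_after - comp_after      # 8: ignores the pre-existing gap
double_diff = (treat_after - treat_before) - (comp_after - comp_before)  # -6

print(before_after, single_diff, double_diff)  # 52 8 -6
```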
Exercise
• What is the objective of your intervention?
• Define up to three main outcome indicators for your intervention
• Using hypothetical outcome data for one indicator, write down the before/after, comparison/treatment matrix and calculate the impact estimates:
  – Ex-post single difference
  – Before versus after (single difference)
  – Double difference

              Before   After
Project
Comparison
Small n versus large n
evaluation designs
Main points so far
• Analysis of impact implies a counterfactual
comparison
• Outcome monitoring is a factual analysis, and so
cannot tell us about impact
• The counterfactual is most commonly
determined by using a comparison group
If you are going to do impact evaluation you
need a credible counterfactual using a comparison group
- VERY PREFERABLY WITH BASELINE DATA
However….
• This is for ‘large n’ interventions
– There are a large number of units of intervention, e.g. children,
households, firms, schools.
– Examples of small n are policy reform and many (but not all) capacity
building projects.
– Some reforms (e.g. health insurance) can be given large n designs
• ‘Small n’ interventions require either
– Modelling (computable general equilibrium, CGE, models), e.g. trade
and fiscal policy
– Qualitative approaches, e.g. the impact of impact assessments
– A theory-based large n study may have elements of small n analysis at
some stages of the causal chain
Thoughts on small n
• Identify theory of change reflecting
multiple players and channels of influence
• Stakeholder mapping
• Avoiding leading questions
• Looking for footprints
Example: channels for donor influence

Channels range from formal through semi-formal to informal:
• Direct influence on government: annual aid negotiations; CG meetings; local consultative groups; direct high-level communication to the recipient head of state; margins of CG meetings; impact of donor-financed TC; informal contacts; 'gentleman's agreements'; social contacts in country
• Indirect influence via IFIs: Board; SPA; CG meetings; direct communication from Minister to head of IFI; informal contacts (IFI and own staff)
• Indirect via other agencies: like-minded groups; margins of CG/SPA meetings; joint programs; leadership by example
Exercise
• Which elements of your intervention are
amenable to a large n impact evaluation
design, and which a small n design?
• Are there any bits left?
Problems in implementing rigorous impact
evaluation: selecting a comparison group
• Contagion: other interventions
• Spillover effects: comparison group affected by the intervention
• Selection bias: beneficiaries are different
• Ethical and political considerations
The problem of selection bias
• Program participants are not chosen at random,
but selected through
– Program placement
– Self-selection
• This is a problem if the correlates of selection
are also correlated with the outcomes of interest,
since those participating would do better (or
worse) than others regardless of the intervention
Selection bias from program placement
• A program of school improvements is targeted at the
poorest schools
• Since these schools are in poorer areas, it is likely that students have home and parental characteristics that are associated with lower learning outcomes (e.g. illiteracy, no electricity, child labour)
• Hence learning outcomes in project schools will be
lower than the average for other schools
• The comparison group has to be drawn from a
group of schools in similarly deprived areas
Selection bias from self-selection
• A community fund is available for community-identified
projects
• An intended outcome is to build social capital for future
community development activities
• But those communities with higher degrees of cohesion
and social organization (i.e. social capital) are more
likely to be able to make proposals for financing
• Hence social capital is higher amongst beneficiary
communities than non-beneficiaries regardless of the
intervention, so a comparison between these two groups
will overstate program impact
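A tiny simulation of this logic, with entirely made-up numbers: participation depends on baseline social capital, which also drives the outcome, so the naive participant-versus-non-participant comparison overstates the true effect.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
social_capital = rng.normal(size=n)   # baseline cohesion, also drives the outcome

# Self-selection: communities with more social capital apply more often.
p_apply = 1 / (1 + np.exp(-2 * social_capital))
participates = rng.random(n) < p_apply

true_effect = 1.0
outcome = social_capital + true_effect * participates + rng.normal(size=n)

naive = outcome[participates].mean() - outcome[~participates].mean()
print(naive)  # well above the true effect of 1.0: selection bias inflates it
```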
Examples of selection bias
• Hospital delivery in Bangladesh (0.115 vs 0.067)
• Secondary education and teenage pregnancy in
Zambia
• Male circumcision and HIV/AIDS in Africa
HIV/AIDS and circumcision: geographical overlay

[Figure: map overlaying HIV/AIDS prevalence and male circumcision rates.]
Main point
There is 'selection' in who benefits from nearly all interventions, so we need a comparison group which has the same characteristics as those selected for the intervention.
Discussion
• Is selection bias likely for your
intervention? Why and how will it affect the
attempt to measure impact?
Dealing with selection bias
• Need to use experimental or quasi-experimental
methods to cope with this; this is what has been meant
by rigorous impact evaluation
• Experimental (randomized control trials = RCTs), commonly used in agricultural research and medical trials, but more widely applicable
• Quasi-experimental
  – Propensity score matching
  – Regression discontinuity
  – Pipeline approach
  – Regressions (including instrumental variables)
Randomization (RCTs)
• Randomization addresses the problem of selection bias by
the random allocation of the treatment
• Randomization may not be at the same level as the unit of
intervention
– Randomize across schools but measure individual learning outcomes
– Randomize across sub-districts but measure village-level outcomes
• The fewer units over which you randomize, the higher your standard errors
• But you need to randomize across a ‘reasonable number’ of
units
– At least 30 for simple randomized design (though possible imbalance
considered a problem for n < 200)
– Can be as few as 10 for matched pair randomization
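As a sketch of these design choices (the school IDs and baseline scores are hypothetical, not from the slides), here are simple randomization across 30 schools and the matched-pair variant:

```python
import random

random.seed(2011)  # fix and record the seed as part of the randomization protocol

# Hypothetical frame: 30 eligible schools are the unit of assignment,
# even though learning outcomes are measured at the pupil level.
schools = [f"school_{i:02d}" for i in range(30)]

# Simple randomization: shuffle and split in half.
random.shuffle(schools)
treatment, control = schools[:15], schools[15:]

# Matched-pair randomization: rank units on a baseline covariate
# (here a made-up baseline test score), pair neighbours, then flip
# a coin within each pair.
baseline = {s: random.uniform(40, 60) for s in schools}  # stand-in data
ranked = sorted(schools, key=baseline.get)
assignment = {}
for a, b in zip(ranked[::2], ranked[1::2]):
    t = random.choice([a, b])
    assignment[a] = "treatment" if t == a else "control"
    assignment[b] = "treatment" if t == b else "control"
```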
Issues in randomization
• Randomize across eligible population not whole
population
• Can randomize across the pipeline
• Is no less unethical than any other method with
a control group (perhaps more ethical), and any
intervention which is not immediately universal
in coverage has an untreated population to act
as a potential control group
• No more costly than other survey-based
approaches
Conducting an RCT
• Has to be an ex-ante design
• Has to be politically feasible, and confidence that program
managers will maintain integrity of the design
• Perform power calculation to determine sample size (and
therefore cost)
• Adopt strict randomization protocol
• Maintain information on how randomization done, refusals
and ‘cross-overs’
• A, B and A+B designs (factorial designs)
• Collect baseline data to:
– Test quality of the match
– Conduct difference in difference analysis
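A minimal power-calculation sketch using statsmodels; the effect size, significance level and power below are illustrative assumptions, and a cluster-randomized design would need a further design-effect adjustment:

```python
from statsmodels.stats.power import TTestIndPower

# Sample size per arm for an individual-level two-arm comparison.
n_per_arm = TTestIndPower().solve_power(
    effect_size=0.25,  # assumed standardized effect (Cohen's d)
    alpha=0.05,        # significance level
    power=0.80,        # desired power
)
print(round(n_per_arm))  # roughly 250 observations per arm
```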
Exercise
• Is any element of your intervention (or
indirectly supported activities) amenable to
randomization?
• What are the unit of assignment and the
unit of analysis?
• Over how many units will you assign the
treatment?
• What is the treatment? What is the
control?
Quasi-experimental approaches
• Possible methods
– Propensity score matching
– Regression discontinuity
– Instrumental variables
• Advantage: can be done ex post, and when
random assignment not possible
• Disadvantage: cannot be assured of absence of
selection bias
Propensity score matching
• Need someone with the same age, education, religion, etc.
• But matching on a single number, calculated as a weighted average of these characteristics, gives the same result as matching individually on every characteristic – this is the basis of propensity score matching
• The weights are given by the 'participation equation', that is, a probit equation of whether a person participates in the project or not
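A minimal sketch of this on simulated data (all variable names and numbers are made up): estimate the participation equation as a probit, predict the score, and match each participant to the nearest non-participant on that single number.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                # matching variables (age, education, ...)
d = (X @ [0.5, -0.3, 0.2] + rng.normal(size=500) > 0).astype(int)  # participation
y = X @ [1.0, 0.5, -0.5] + 2.0 * d + rng.normal(size=500)          # outcome

# The 'participation equation': a probit of participation on the covariates.
pscore = sm.Probit(d, sm.add_constant(X)).fit(disp=0).predict()

# Nearest-neighbour matching on the propensity score.
treated = np.flatnonzero(d == 1)
untreated = np.flatnonzero(d == 0)
nearest = np.abs(pscore[treated][:, None] - pscore[untreated][None, :]).argmin(axis=1)
att = (y[treated] - y[untreated[nearest]]).mean()  # effect on the treated
print(att)  # should land near the true effect of 2.0
```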
Propensity score matching:
what you need
• Can be based on ex post single difference,
though double difference is better
• Need common survey for treatment and
potential comparison, or survey with
common sections for matching variables
and outcomes
Propensity score matching: example of matching – water supply in Nepal

Variable                    Before matching             After matching
Rural resident              Treatment 29% / Comp. 78%   Treatment 33% / Comp. 38%
Richest wealth quintile     Treatment 46% / Comp. 2%    Treatment 39% / Comp. 36%
H/h higher education        Treatment 21% / Comp. 4%    Treatment 17% / Comp. 17%
Outcome (diarrhea           Treatment 18% / Comp. 23%   Treatment 15% / Comp. 23%
incidence, children < 2)    OR = 1.28                   OR = 1.53
Regression discontinuity: an example –
agricultural input supply program
Naïve impact estimates
• Total = income(treatment) – income(comparison) = 9.6
• Agricultural h/h only = 7.7
• But there is a clear link between net income and land
holdings
• And it turns out that the program targeted those households with at least 1.5 ha of land (you can see this in the graph)
• So selection bias is a real issue: the treatment group would have been better off in the absence of the program, so the single difference estimate is upward biased
Regression discontinuity
• Where there is a ‘threshold allocation rule’ for program
participation, then we can estimate impact by comparing
outcomes for those just above and below the threshold
(as these groups are very similar)
• We can do that by estimating a regression with a dummy
for the threshold value (and possibly also a slope
dummy) – see graph
• In our case the impact estimate is 4.5, which is much
less than that from the naïve estimates (less than half)
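A sketch of that threshold-dummy regression on simulated data; the 1.5 ha cutoff mirrors the input-supply example, and the jump is set to 4.5 so the regression has something to find (all numbers illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
land = rng.uniform(0.0, 3.0, 400)        # assignment variable: land holdings (ha)
treated = (land >= 1.5).astype(float)    # program targeted h/h with >= 1.5 ha
income = 5.0 + 3.0 * land + 4.5 * treated + rng.normal(0.0, 2.0, 400)

# Regress the outcome on the running variable plus a dummy at the threshold;
# adding land * treated would give the slope dummy mentioned above.
exog = sm.add_constant(np.column_stack([land, treated]))
fit = sm.OLS(income, exog).fit()
print(fit.params[2])  # the jump at the cutoff: the impact estimate, ~4.5
```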
Exercise
• What design would you use to establish
the impact of your intervention on the
outcomes of interest?
PART II
THEORY-BASED IMPACT EVALUATION
"I think you should be more explicit here in Stage 2…"
The missing middle
Think about theory

[Figures: two plots of test scores against the pupil–teacher ratio.]
Examples of ‘atheoretical’ IEs
• School capitation grant studies that don’t ask
how the money was used
• BCC (behaviour change communication) intervention studies that don't ask if behaviour has changed (indeed, almost any study that does not capture behavior change)
• Microfinance studies that don’t look at use of
funds and cash flow
• Studies of capacity development that don’t ask if
knowledge acquired and used
Examples of theory-based impact evaluation
• Bangladesh Integrated Nutrition Project
• Orissa rural sanitation
Common questions in TBIE

Targeting
  Design: The right people?
  Implementation: Type I and II errors
  Evaluation questions: Why is targeting failing? (Protocols faulty, not being followed, corruption…)

Training / capacity building
  Design: Right people? Is it appropriate? Mechanisms to ensure skills utilized
  Implementation: Quality of delivery; skills / knowledge acquired and used
  Evaluation questions: Have skills / knowledge changed? Are skills / knowledge being applied? Do they make a difference to outcomes?

Intervention delivery
  Design: Tackling a binding constraint? Appropriate? Within local institutional capacity
  Implementation: Delivered as intended: protocols followed, no leakages, technology functioning and maintained
  Evaluation questions: What problems have been encountered in implementation? When did first benefits start being realized? How is the intervention perceived by IA staff and beneficiaries?

Behavior change
  Design: Is desired BC culturally possible and appropriate; will it benefit intended beneficiaries?
  Implementation: Is BC being promoted as intended (right people, right message, right media)?
  Evaluation questions: Is behavior change occurring? If not, why not?
The principles
• Map out the causal chain (programme theory): see
figure from BINP example
• Understand context: Bangladesh is not Tamil Nadu
• Anticipate heterogeneity: more malnourished children;
different implementing agencies
• Rigorous evaluation of impact using an appropriate
counterfactual: PSM versus simple comparison
• Rigorous factual analysis: targeting, KP gap, CNPs
• Use mixed methods: informed by anthropology, focus
groups, own field visits
Using theory to avoid IE
• Were all the other links in the causal chain
in place and working?
• Example: sustainability of social fund investments, three aspects
– Technical capacity
– Institutional mechanisms
– Financial
Exercise
• Map out theory of change underlying your
project
• What are the related evaluation
questions?
MIXED METHODS
Different parts of causal chain
require different analysis
• Factual versus counterfactual
• Examples of factual
– Use of funds
– Targeting
– Participatory processes
• Quantitative (and different types of quantitative) and
qualitative (includes but not restricted to participatory
methods) and the combination of the two
• And cost data for cost effectiveness or cost-benefit
analysis
Examples from Andhra Pradesh Self Help Groups
• Number of SHGs and % penetration
• Drop-outs & corrupt practices
• The angry man
• Returns to cows and goats: quantitative ethnography
Why the angry man was angry
Loan allocation is to households not individuals
More examples
• The disconnected in connected villages
pretty much everywhere
• The role of the community in social funds
in Malawi and Zambia
Poorest don't connect

[Figure: electrification rate (%) against years since grid connection, for all households and poor households. 54% connect in the first year; another 10% connect in the next two years; then it takes 7 years for the next 10% to connect.]
People participate in making bricks,
not decisions
• Cost data, e.g. DCPP (Disease Control Priorities Project) on $/DALY
• Cost-effectiveness ratios (CER) and cost-benefit analysis (CBA)
• Willingness to pay (WTP)
Also mix methods for
identification strategy
AP village fund allocation
• Fixed funds per community: more households per SHG
• Lower membership rates in larger villages
The puzzle of the disconnected households in Laos
You can't carry electricity on boats
Mixing methods
• Understanding context to
– Shape evaluation questions (empowering the
poor = democratic IE)
– Design data collection
• Mapping out theory(ies) of change
• Addressing factual questions, leading to…
• … interpretation of counterfactual findings
General principle: the quality of data deteriorates the more formal the process of data collection
What do questionnaires miss?
• Consumption
– Festivals
– Labour exchange
– Wild foods
• Net income from household enterprises
• Abuse of nearly all kinds
• Who is a household member?
Protein in Northern Zambia
What is to be done?
• Know what questions to ask and how
• Proxy measures
• Enumerator training
• Contrived informality
The challenge of integration
• Parallel studies not integrated studies (multi-disciplinary
not inter-disciplinary)
• Why?
– At best silo mentality, at worst arrogance (“trust me,
I’m an economist”)
– Academic incentives
– People just don’t know how to do it
• What to do?
– Start with theory of change
– Detailed team discussions around causal chain
– Team members who bridge studies
– Quality of external peer review
Exercise
What sort of data do you require to answer
your evaluation questions, and how will
you collect those data?
PART III
MANAGING IMPACT EVALUATIONS
When to do an impact evaluation
• Different stuff
– Pilot programs
– Innovative programs
– New activity areas
• Established stuff
– Representative programs
– Important programs
• Look to fill gaps
What do IE managers need to know?
• If an IE is needed and viable
• Your role as champion
• The importance of ex ante designs with
baseline (building evaluation into design)
– Funding issues
• The importance of a credible design with a
strong team (and how to recognize that)
– Help on design
• Ensure management feedback loops
Issues in managing IEs
• Different objective functions of managers
and study teams
• Project management buy-in
• Trade-offs
– On time
– On richness of study design
Overview on data collection
• Baseline, midterm and endline
• Treatment and comparison
• Process data
• Capture contagion and spillovers
• Quant and qual
• Different levels (e.g. facility data, worker data) – link the data
• Multiple data sources
Data used in BINP study
• Project evaluation data (three rounds)
• Save the Children evaluation
• Helen Keller Nutritional Surveillance
Survey
• DHS (one round)
• Project reports
• Anthropological studies of village life
• Action research (focus groups, CNP
survey)
Some study costs
• IADB vocational training studies:
US$20,000 each
• IEG BINP study US$40,000-60,000
• IEG rural electrification study US$120,000
• IEG Ghana education study US$500,000
• Average 3ie study: US$300,000+
• Average 3ie study in Africa with two rounds of surveys: US$500,000+
Some timelines
• Ex post: 12-18 months
• Ex ante:
  – Lead time for survey design: 3-6 months
  – Post-survey to first impact estimates: 6-9 months
  – Report writing and consultation: 3-6 months
Budget and timeline
• Ex post or ex ante?
• Existing data or new data?
• How many rounds of data collection?
• How large is the sample?
• When is it sensible to estimate impact?
Exercise
• For your PROPOSED IMPACT EVALUATION, set out:
  – Management structure (quality assurance)
  – Timeline for impact evaluation
  – Budget
AND THEN GROUP PRESENTATIONS OF NO MORE THAN FIVE MINUTES
Presentations and voting
Remember
• Results means impact, but be selective
• Be issues-driven, not methods-driven
• Find the best available method for the evaluation questions at hand
• Randomization is often possible
• But do ask: is this sufficiently credible to be worth doing?

PLEASE COMPLETE YOUR EXIT SURVEY
Thank you
Visit www.3ieimpact.org