Impact Evaluation Overview - Center for Effective Global Action

Impact Evaluation:
An Overview
Lori Beaman, PhD
RWJF Scholar in Health Policy
UC Berkeley
What is Impact Evaluation?

IE assesses how a program affects the well-being
or welfare of individuals, households or
communities (or businesses)

Well-being at the individual level can be captured
by income & consumption, health outcomes or
ideally both

At the community level, poverty levels or growth
rates may be appropriate, depending on the
question
Outline

Advantages of Impact Evaluation

Challenges for IE: Need for Comparison
Groups

Methods for Constructing Comparison Groups
IE Versus other M&E Tools

The key distinction between impact evaluation
and other M&E tools is the focus on discerning
the impact of the program from all other
confounding effects

IE seeks to provide evidence of the causal link
between an intervention and outcomes
Monitoring and IE
IMPACT: Effect on living standards and welfare
- infant and child mortality
- improved household income
OUTCOMES: Access, usage and satisfaction of users
- number of children vaccinated
- percentage within 5 km of health center
OUTPUTS: Goods and services generated
- number of nurses
- availability of medicine
INPUTS: Financial and physical resources
- spending in primary health care
Monitoring and IE
IMPACTS: Program impacts confounded by local, national, global effects
OUTCOMES: Users meet service delivery
OUTPUTS: Gov’t/program production function
INPUTS
[Figure note: an arrow alongside the four levels is labeled “difficulty of showing causality”.]
Logic Model: An Example
Consider a program of providing
Insecticide-Treated Nets (ITNs) to poor
households
What are:
- Inputs?
- Outputs?
- Outcomes?
- Impacts?
Logic Model: An Example
Inputs: # of ITNs; # of health or NGO employees to help dissemination
Outputs: # of ITNs received by HHs
Outcomes: ITNs utilized by # of households
Impact: Reduction in illness from malaria; increase in income; improvements in children’s school attendance and performance

Advantages of IE

To determine which projects are successful, we need a carefully designed impact evaluation strategy

This is useful for:
- Understanding if projects worked:
  - Justification for funding
  - Scaling up
- Meta-analysis: learning from others
- Cost-benefit tradeoffs across projects
- Testing between different approaches to the same program, or between different projects meant to meet a national indicator
Essential Methodology

Difficulty is determining what would have
happened to the individuals or communities of
interest in absence of the project

The key component to an impact evaluation is to
construct a suitable comparison group to proxy
for the “counterfactual”

Problem: can only observe people in one state of
the world at one time
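One way to write the counterfactual problem down is sketched here. The notation is an assumption of this sketch, not the slides' own: $Y_i(1)$ is the outcome of unit $i$ with the program, $Y_i(0)$ the outcome without it, and $D = 1$ marks participants.

```latex
\[
\text{Impact}_i = Y_i(1) - Y_i(0)
\]
% Only one of Y_i(1), Y_i(0) is ever observed for a given unit.
\[
\underbrace{E[Y(1)\mid D=1] - E[Y(0)\mid D=0]}_{\text{observed difference}}
= \underbrace{E[Y(1) - Y(0)\mid D=1]}_{\text{impact on participants}}
+ \underbrace{E[Y(0)\mid D=1] - E[Y(0)\mid D=0]}_{\text{bias from the comparison group}}
\]
```

A suitable comparison group is one for which the last term is close to zero.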
Before/After Comparisons

Why not collect data on individuals before and after the intervention (the reflexive comparison)? The difference in income, etc., would then be attributed to the project

Problem: many things change over time, not only the project

- The country is growing and ITN usage is increasing generally (from 2000-2003 in NetMark data), so how do we know whether an increase in ITN use is due to the program or would have occurred in the absence of the program?
- Many factors affect the malaria rate in a given year
Example: Providing Insecticide-Treated
Nets (ITNs) to Poor Households

The intervention: provide free ITNs to households in Zamfara
- Program targets poor areas
- Women have to enroll at local NGO office in order to receive bednets
- Starts in 2002, ends in 2003; we have data on malaria rates from 2001-2004

Scenario 1: we observe that the households in Zamfara we provided bednets to have an increase in malaria rates from 2002 to 2003
Basic Problem of Impact Evaluation:
Scenario 1
Underestimated Impact when using before/after comparisons: High rainfall year
[Figure: malaria rate (vertical axis) against years 2001-2004 (horizontal axis), with 2002 as the treatment period. Zamfara households with bednets move from point A to point C. Impact = C – A? An increase in the malaria rate!]
Basic Problem of Impact Evaluation:
Scenario 1
Underestimated Impact when using before/after comparisons: High rainfall year
[Figure: same axes as before (malaria rate against years 2001-2004, 2002 as the treatment period). Zamfara households with bednets move from point A to point C; the “counterfactual” line, for Zamfara households if no bednets were provided, ends at point B. Impact ≠ C – A. Impact = C – B: a decline in the malaria rate!]
Basic Problem of Impact Evaluation:
Scenario 2
Overestimated Impact: Bad Rainfall
[Figure: malaria rate (vertical axis) against years 2001-2004 (horizontal axis), with 2002 as the treatment period. Zamfara households move from point A to point C; the “counterfactual” line (Zamfara households if no bednets provided) ends at point B. Impact ≠ C – A. TRUE Impact = C – B.]
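The two scenarios can be checked with a few lines of arithmetic. This is a minimal sketch: the malaria rates below are made-up numbers chosen only to reproduce the shape of the figures, and the function names are hypothetical.

```python
# Hypothetical malaria rates (cases per 1,000 people). These numbers are
# illustrative assumptions, not data from the program.

def before_after_estimate(rate_before, rate_after):
    """Reflexive (before/after) estimate: C - A."""
    return rate_after - rate_before


def counterfactual_estimate(rate_after, rate_counterfactual):
    """Estimate against the counterfactual: C - B."""
    return rate_after - rate_counterfactual


# A = treated households before, C = treated households after,
# B = the same households after, had no bednets been provided.
scenarios = {
    "Scenario 1 (high rainfall)": {"A": 100, "C": 120, "B": 150},
    "Scenario 2 (bad rainfall)": {"A": 100, "C": 60, "B": 80},
}

for name, p in scenarios.items():
    naive = before_after_estimate(p["A"], p["C"])
    impact = counterfactual_estimate(p["C"], p["B"])
    print(f"{name}: C - A = {naive:+d}, C - B = {impact:+d}")
```

With these assumed numbers, the before/after reading in Scenario 1 is an increase of 20 while the impact against the counterfactual is a decline of 30. In Scenario 2 the before/after reading (a decline of 40) overstates the decline against the counterfactual (20).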
Comparison Groups


Instead of using before/after comparisons, we
need to use comparison groups to proxy for the
counterfactual
Two Core Problems in Finding Suitable Groups:

Programs are targeted
- Recipients receive intervention for particular reason

Participation is voluntary
- Individuals who participate differ in observable and unobservable ways (selection bias)

Hence, a comparison of participants and an arbitrary group of non-participants can lead to misleading or incorrect results
Comparison 1: Treatment and Region B

Scenario 1: Failure of reflexive comparison due to higher
rainfall, and everyone experienced an increase in malaria
rates

We compare the households in the program region to those
in another region

We find that our “treatment” households in Zamfara have a
larger increase in malaria rates than those in region B, Oyo.
Did the program have a negative impact?

Not necessarily! Program placement is important:
- Region B has better sanitation and is therefore affected less by rainfall (unobservable)
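Basic Problem of Impact Evaluation:
Program Placement
High Rainfall
[Figure: malaria rate (vertical axis) against years 2001-2004 (horizontal axis), with 2002 as the treatment period. The “Treatment” line for Zamfara runs from point A to point E; point D is also marked. TRUE IMPACT: E-D.]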
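Basic Problem of Impact Evaluation:
Program Placement
Underestimated Impact when using region B comparison group: High Rainfall
[Figure: malaria rate (vertical axis) against years 2001-2004 (horizontal axis), with 2002 as the treatment period. The “Treatment” line for Zamfara runs from point A to point E, with point D marked; the line for Region B (Oyo) runs from point B to point C. E-A > C-B: Region B affected less by rainfall. TRUE IMPACT: E-D.]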
Comparison 2: Treatment vs. Neighbors

We compare “treatment” households with their neighbors.
We think the sanitation and rainfall patterns are about the
same.

Scenario 2: Let’s say we observe that treatment
households’ malaria rates decrease more than comparison
households. Did the program work?
- Not necessarily: there may be two types of households, types A and B, with type A households knowing how malaria is transmitted and also burning mosquito coils
- Type A households were more likely to register with the program. However, their other characteristics mean they would have had lower malaria rates in the absence of the ITNs (individual unobservables).
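Basic Problem of Impact Evaluation:
Selection Bias
Comparing Project Beneficiaries (Type A) to Neighbors (Type B)
[Figure: malaria rates (vertical axis) against years Y1-Y4 (horizontal axis), with the treatment period marked. One line shows Type B HHs and one shows Type A HHs with project; the gap between them is labeled “Observed difference”.]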
Basic Problem of Impact Evaluation:
Selection Bias
Participants are often different than Non-participants
[Figure: malaria rates (vertical axis) against years Y1-Y4 (horizontal axis), with the treatment period marked. Lines show Type B HHs, Type A Households, and Type A HHs with project. The “Observed difference” is split into “Selection Bias” and “True Impact”.]
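The split of the observed difference into selection bias and true impact can be written out directly. A minimal sketch, assuming made-up malaria rates and hypothetical variable names; the rate for Type A households without the project is never observed in practice and is included only to make the decomposition visible.

```python
# Hypothetical malaria rates (cases per 1,000) after the treatment period.
type_b_neighbors = 300        # Type B households, no project (observed)
type_a_with_project = 120     # Type A households with ITNs (observed)
type_a_without_project = 200  # Type A households without ITNs (never observed)

# What a naive participant-versus-neighbor comparison reports.
observed_difference = type_a_with_project - type_b_neighbors

# What the program actually changed for Type A households.
true_impact = type_a_with_project - type_a_without_project

# Gap that would exist even with no program, because the types differ.
selection_bias = type_a_without_project - type_b_neighbors

# The observed difference is the sum of the two components.
assert observed_difference == true_impact + selection_bias
print(observed_difference, true_impact, selection_bias)
```

With these assumed numbers, more than half of the observed gap comes from the household types differing, not from the ITNs.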
Basic Problem of Impact Evaluation:
Spillover Effects

Another difficulty in finding a true counterfactual
has to do with spillover or contagion effects

Example: ITNs will not only reduce malaria rates
for those sleeping under nets, but also may lower
overall rates because ITNs kill mosquitoes

Problem: children who did not receive
“treatment” may also have lower malaria rates –
and therefore higher school attendance rates

Generally leads to an underestimate of the treatment
effect
Basic Problem of Impact Evaluation:
Spillover Effects
[Figure: school attendance (vertical axis) against years 2001-2004 (horizontal axis), with 2002 as the treatment period. “Treatment” children reach point B; the “control” group of children in the neighborhood school reaches point C; point A lies below C. C > A due to spillover from treatment children. Impact ≠ B - C. Impact = B - A.]
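The size of the underestimate in the figure follows from the three points A, B and C. A minimal sketch, assuming made-up attendance rates and a hypothetical helper name:

```python
def spillover_comparison(treated, control_observed, control_no_spillover):
    """Return (naive estimate B - C, impact B - A, size of the underestimate)."""
    naive = treated - control_observed            # B - C
    impact = treated - control_no_spillover       # B - A
    return naive, impact, impact - naive


# Hypothetical school attendance rates in percent.
B = 90  # "treatment" children
C = 80  # "control" children in the neighborhood school, who benefit from spillover
A = 70  # the same control children had there been no spillover (not observed)

naive, impact, underestimate = spillover_comparison(B, C, A)
print(naive, impact, underestimate)
```

With these assumed numbers, comparing to the neighborhood school reports half of the impact, because the control group has itself been partly treated.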
Counterfactual: Methodology

We need a comparison group that is as close as possible, in observable and unobservable dimensions, to those receiving the program, and that will not receive spillover benefits.

A number of techniques:
- Randomization as gold standard
- Various techniques of matching
How to construct a comparison group –
building the counterfactual
1. Randomization
2. Difference-in-Difference
3. Regression discontinuity
4. Matching
   - Pipeline comparisons
   - Propensity score
1. Randomization

Individuals/communities/firms are randomly
assigned into participation

Counterfactual: randomized-out group

Advantages:
- Often referred to as the “gold standard”: by design, selection bias is zero on average and mean impact is revealed
- Perceived as a fair process of allocation with limited resources
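The claim that selection bias is zero on average under randomization can be illustrated with a small simulation. This is a sketch under stated assumptions: the household types, malaria rates, enrollment probabilities and the true impact of -50 are all invented for illustration, and `simulate` is a hypothetical name.

```python
import random


def simulate(n=20000, true_impact=-50.0, seed=0):
    """Compare self-selected enrollment with random assignment on simulated data."""
    rng = random.Random(seed)
    households = []
    for _ in range(n):
        # Unobserved trait: the household knows how malaria is transmitted.
        informed = rng.random() < 0.5
        # Malaria rate the household would have WITHOUT ITNs.
        baseline = (200.0 if informed else 300.0) + rng.gauss(0, 20)
        households.append((informed, baseline))

    def difference_in_means(assign):
        treated, control = [], []
        for informed, baseline in households:
            if assign(informed):
                treated.append(baseline + true_impact)
            else:
                control.append(baseline)
        return sum(treated) / len(treated) - sum(control) / len(control)

    # Voluntary participation: informed households enroll far more often.
    self_selected = difference_in_means(
        lambda informed: rng.random() < (0.8 if informed else 0.2)
    )
    # Randomization: assignment ignores every household characteristic.
    randomized = difference_in_means(lambda informed: rng.random() < 0.5)
    return self_selected, randomized


self_selected, randomized = simulate()
print(f"self-selected estimate: {self_selected:.1f}")
print(f"randomized estimate:    {randomized:.1f}")
```

In this setup the self-selected comparison mixes the program effect with the pre-existing gap between household types, while the randomized comparison should land close to the assumed true impact.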
Randomization: Disadvantages

Disadvantages:
- Ethical issues, political constraints
- Internal validity (exogeneity): people might not comply with the assignment (selective non-compliance)
- External validity (generalizability): the controlled experiment is usually run as a small-scale pilot, so it is difficult to extrapolate the results to a larger population
- Does not always solve the problem of spillovers
When to Randomize

If funds are insufficient to treat all eligible recipients
- Randomization can be the most fair and transparent approach

The program is administered at the individual, household or community level
- Higher levels of implementation are difficult: example – trunk roads

Program will be scaled up: learning what works is very valuable
2. Difference-in-difference

Observations over time: compare observed changes
in the outcomes for a sample of participants and
non-participants

Identification assumption: the selection bias or
unobservable characteristics are time-invariant
(‘parallel trends’ in the absence of the program)

Counterfactual: changes over time for the non-participants
Diff-in-Diff: Continued
Constraint: Requires at least two cross-sections of data, pre-program and post-program, on participants and non-participants
- Need to think about the evaluation ex-ante, before the program
- More credible if there are 2 pre-periods, so we can observe whether the trends are the same

Can be in principle combined with matching to
adjust for pre-treatment differences that affect
the growth rate
Implementing differences in differences:
Different Strategies
- Some arbitrary comparison group
- Matched diff in diff
- Randomized diff in diff

These are in order from more problems to fewer problems; think about this as we look at it graphically
Essential Assumptions of Diff-in-Diff
- Initial difference must be time invariant
- In absence of program, the change over time would be identical
[Figure: outcomes against time, from t=0 to t=1. The outcome levels Y0, Y1 and Y1* are marked, and the Impact is labeled.]
Difference-in-Difference in ITN
Example

Instead of comparing Zamfara to Oyo, compare Zamfara to Niger if:
- While Zamfara and Niger have different malaria rates and different ITN usage, we expect that they change in parallel
- Use NetMark data to compare 2000 to 2003 in Zamfara and Niger states
- Use additional data (GHS, NLSS) to compare incomes and sanitation infrastructure levels and changes prior to program implementation
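The difference-in-difference calculation, together with a simple check on pre-program trends, can be sketched as follows. The malaria rates, period labels and function names are hypothetical, chosen only to show the arithmetic; they are not NetMark, GHS or NLSS figures.

```python
def diff_in_diff(treat_before, treat_after, comp_before, comp_after):
    """Change among participants minus change among non-participants."""
    change_treated = treat_after - treat_before
    change_comparison = comp_after - comp_before
    return change_treated - change_comparison


def pre_trends_look_parallel(treat_pre, comp_pre, tolerance):
    """Compare the change across two pre-program periods (earlier, later)."""
    treat_trend = treat_pre[1] - treat_pre[0]
    comp_trend = comp_pre[1] - comp_pre[0]
    return abs(treat_trend - comp_trend) <= tolerance


# Hypothetical malaria rates (cases per 1,000) in two pre-program periods
# and one post-program period.
zamfara = {"pre1": 290, "pre2": 300, "post": 320}  # program state
niger = {"pre1": 240, "pre2": 250, "post": 310}    # comparison state

parallel = pre_trends_look_parallel(
    (zamfara["pre1"], zamfara["pre2"]),
    (niger["pre1"], niger["pre2"]),
    tolerance=5,
)
estimate = diff_in_diff(zamfara["pre2"], zamfara["post"],
                        niger["pre2"], niger["post"])
print(parallel, estimate)
```

In this made-up example the levels differ between the two states but the pre-program changes match, and the estimate is a decline of 40 even though malaria rose in both states.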
3. Regression discontinuity design

Exploit the rule generating assignment into a program that is given only to individuals above a given threshold. Assume that there is a discontinuity in participation at the threshold but not in counterfactual outcomes

Counterfactual: individuals just below the cut-off who did not participate

Advantages:
- “Identification” is built into the program design
- Delivers marginal gains from the program around the eligibility cut-off point. Important for program expansion
Disadvantages:
- Threshold has to be applied in practice, and individuals should not be able to manipulate the score used in the program to become eligible
RDD in ITN Example
Program available for poor households
- Eligibility criteria: must be below the national poverty line or < 1 ha of land
- Treatment group: those below cut-off
  - Those with income below the poverty line and therefore qualified for ITNs
- Comparison group: those right above the cutoff
  - Those with income just above poverty line and therefore not eligible
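The comparison of households just below and just above the cutoff can be sketched as a difference in local means. This is a deliberately simple version: applied work often fits regressions on each side of the cutoff instead, and the records, cutoff, bandwidth and function name here are all hypothetical.

```python
from statistics import mean


def rdd_local_difference(records, cutoff, bandwidth):
    """Mean outcome just below the cutoff (eligible) minus just above (not eligible).

    records: iterable of (income, malaria_rate) pairs.
    """
    below = [y for x, y in records if cutoff - bandwidth <= x < cutoff]
    above = [y for x, y in records if cutoff <= x <= cutoff + bandwidth]
    if not below or not above:
        raise ValueError("no households within the bandwidth on one side")
    return mean(below) - mean(above)


# Hypothetical (income, malaria rate) pairs, with the poverty line at 100.
households = [
    (80, 30), (90, 28), (95, 27), (98, 26),      # eligible for ITNs
    (102, 36), (105, 35), (110, 34), (130, 30),  # not eligible
]

print(rdd_local_difference(households, cutoff=100, bandwidth=10))
```

Households far from the cutoff (incomes of 80 and 130 here) are left out, which is why the estimate is local to the poverty line.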
RDD in ITN Example

Problems:
- How well enforced was the rule?
- Can the rule be manipulated?
- Local effect: may not be generalizable if program expands to households well above poverty line
  - Particularly relevant since NetMark data indicate low ITN usage across all socio-economic status groups
4. Matching

Match participants with non-participants from a larger
survey

Counterfactual: matched comparison group

Each program participant is paired with one or more non-participants who are similar based on observable
characteristics

Assumes that, conditional on the set of observables, there
is no selection bias based on unobserved heterogeneity

When the set of variables to match is large, often match
on a summary statistic: the probability of participation
as a function of the observables (the propensity score)
4. Matching

Advantages:
- Does not require randomization, nor baseline (pre-intervention) data
Disadvantages:
- Strong identification assumptions
  - In many cases, may make interpretation of results very difficult
- Requires very good quality data: need to control for all factors that influence program placement
- Requires a significantly large sample size to generate the comparison group
Matching in Practice

Using statistical techniques, we match a group of
non-participants with participants using variables
like gender, household size, education,
experience, land size, rainfall (to control for
drought) and irrigation (as many observable
characteristics not affected by the program
intervention as possible)

One common method: Propensity Score Matching
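The matching step can be sketched as nearest-neighbor matching on the propensity score. The sketch assumes the scores have already been estimated (for example from a logit of participation on the observables), matches with replacement, and drops participants with no close match. The records, the caliper value and the function name are hypothetical.

```python
def matched_difference(participants, non_participants, caliper):
    """Average participant-minus-match outcome difference.

    Each record is (household_id, propensity_score, outcome). Each participant
    is matched, with replacement, to the non-participant with the closest
    score; pairs further apart than `caliper` are dropped.
    """
    differences = []
    for _, score, outcome in participants:
        nearest = min(non_participants, key=lambda rec: abs(rec[1] - score))
        if abs(nearest[1] - score) <= caliper:
            differences.append(outcome - nearest[2])
    if not differences:
        raise ValueError("no participant has a match within the caliper")
    return sum(differences) / len(differences)


# Hypothetical households: (id, propensity score, malaria rate).
received_itns = [("p1", 0.70, 20), ("p2", 0.55, 25), ("p3", 0.90, 18)]
no_itns = [("n1", 0.68, 28), ("n2", 0.50, 31), ("n3", 0.20, 40)]

print(matched_difference(received_itns, no_itns, caliper=0.1))
```

Here `p3` has no non-participant with a similar score and is left out. The estimate is only as good as the assumption that nothing unobserved drives both participation and malaria rates.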
Matching in Practice: 2 Approaches

Approach 1: After program implementation, we match (within region) those who received ITNs with those who did not. Problem?
- Problem: likelihood of usage of different households is unobservable, so not included in propensity score
- This creates selection bias

Approach 2: The program is allocated based on land size. After implementation, we match those eligible in region A with those in region B. Problem?
- Problems: same issues of individual unobservables, but lessened because we compare eligible to potentially eligible
- Now there is a problem of unobservable factors across regions
An extension of matching:
pipeline comparisons

Idea: compare those just about to get an
intervention with those getting it now

Assumption: the stopping point of the intervention
does not separate two fundamentally different
populations

Example: extending irrigation networks

In ITN example: If only some communities within
Zamfara receive ITNs in round 1, compare them to
nearby communities that will receive ITNs in round 2

Difficulty with infrastructure: spillover or
anticipatory effects may be strong