
The counterfactual logic for
public policy evaluation
Alberto Martini
Everybody likes “impacts”
(politicians, funders, managing
authorities, Eurocrats)
Impact is the most used and
misused term in evaluation
Impacts differ in a
fundamental way from
outputs and results
Outputs and results are
observable quantities
Can we observe an
impact?
No, we can’t
This is a major point of departure
between this and other paradigms
Just as output indicators measure outputs
and result indicators
measure results,
so supposedly
impact indicators
measure impacts
Sorry, they don’t
Almost everything about programmes can
be observed (at least in principle):
outputs (beneficiaries served, activities
carried out, training courses offered,
km of roads built, sewers cleaned)
outcomes/results (income levels,
inequality, well-being of the population,
pollution, congestion, inflation,
unemployment, birth rate)
Unlike outputs and results,
to measure impacts
one needs to deal with
unobservables
To measure impacts, it is not
enough to “count” something,
or compare results with targets,
or to check progress from baseline
It is necessary to deal with
causality
“Causality is in
the mind”
J.J. Heckman
Nobel Prize Economics 2000
How would you define
impact/effect?
“the difference between a situation
observed after an intervention
has been implemented
and the situation that
would have occurred without the
intervention”
There is just one tiny problem
with this definition
the situation that would have
occurred without the
intervention cannot be
observed
The social science community
has developed the
notion of potential outcomes
“given a treatment, the potential
outcomes are what we would observe
for the same individual for different
values of the treatment”
Hollywood’s
version of
potential
outcomes
A priori there are only
potential outcomes of the
intervention, but later
one becomes an observed
outcome, while the other
becomes the counterfactual
outcome
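In code terms, this is easy to make concrete. Below is a minimal Python sketch (illustrative only, not from the slides; the 50/40 numbers anticipate the mid-ability pupils in the example that follows): each unit carries two potential outcomes, but the data only ever reveal one of them.

    # Illustrative sketch: each pupil has two potential outcomes,
    # but the data only ever reveal one of them.
    from dataclasses import dataclass

    @dataclass
    class Pupil:
        y_play: float     # potential math score if the pupil plays chess
        y_no_play: float  # potential math score if the pupil does not
        plays: bool       # what actually happens

        @property
        def observed(self) -> float:
            # the outcome under the realized treatment: this is all we see
            return self.y_play if self.plays else self.y_no_play

        @property
        def counterfactual(self) -> float:
            # the other potential outcome: never observable in practice
            return self.y_no_play if self.plays else self.y_play

    p = Pupil(y_play=50, y_no_play=40, plays=True)
    print(p.observed)              # 50: what the data show
    print(p.counterfactual)        # 40: what the impact definition needs
    print(p.y_play - p.y_no_play)  # 10: individual impact, crystal-ball only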
A very intuitive example of the
role of counterfactual analysis
in producing credible evidence
for policy decisions
Does learning and
playing chess have
a positive impact
on achievement in
mathematics?
Policy-relevant question:
Should we make chess part of the
regular curriculum in elementary
schools, to improve mathematics
achievement? Or would it
be a waste of time?
Which kind of evidence do we need to
make this decision in an informed way?
Let us assume we
have a crystal ball
and we know
the “truth”:
for all pupils we know both potential
outcomes: the math score they would
obtain if they practiced chess and
the score they would obtain if they
did not practice chess
General rule:
what we observe
can be very different
from what is true
Types of pupils and what happens to them

High ability (1/3): practice chess at home and do
not gain much if taught in school
Mid ability (1/3): practice chess only if taught in
school; otherwise they do not learn chess
Low ability (1/3): unable to play chess effectively,
even if taught in school
Potential outcomes (math test scores)

                 If they play      If they do NOT     Difference
                 chess at school   play at school
High ability          70                70                 0
Mid ability           50                40                10
Low ability           20                20                 0

Try to memorize these numbers: 70, 50, 40, 20, 10
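The table’s arithmetic, as an illustrative Python sketch (the one-third shares come from the types-of-pupils slide above):

    # Illustrative sketch: true impacts implied by the crystal-ball table.
    potential = {
        "high": {"play": 70, "no_play": 70},
        "mid":  {"play": 50, "no_play": 40},
        "low":  {"play": 20, "no_play": 20},
    }

    for group, y in potential.items():
        print(group, "true impact:", y["play"] - y["no_play"])
    # high 0, mid 10, low 0: only mid-ability pupils gain anything

    # with one third of pupils in each group, the true average impact
    # is small: (0 + 10 + 0) / 3
    true_avg = sum(y["play"] - y["no_play"] for y in potential.values()) / 3
    print(round(true_avg, 2))  # 3.33 points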
SO WE KNOW THAT
1. there is a true impact,
but it is small
2. the only ones to benefit are
mid-ability students; for them
the impact is 10 points
The naive evidence:
observe the differences between chess
players and non-players and infer
something about the impact of chess
The difference between players and
non-players measures the effect of
playing chess.
DO YOU AGREE?
The usefulness of the potential
outcomes way of reasoning is to
make clear what we observe and
what we do not observe,
what we can learn and cannot
learn from the data, and how
mistakes are made
What we observe
High ability (playing chess at home): 70
Mid ability (not playing): 40
Low ability (not playing): 20
Average score of non-players = 30
DO YOU SEE THE POINT?
Results of the direct comparison
Pupils who play chess: average score = 70 points
Pupils who do not play chess: average score = 30 points
Difference = 40 points
is this the impact of playing chess?
Can we attribute the difference of 40 points to
playing chess alone?
OBVIOUSLY NOT
There are many more factors
at play that influence math scores
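An illustrative sketch of the naive arithmetic, using the crystal-ball numbers from above:

    # Illustrative sketch: the naive comparison reflects selection, not impact.
    # Without the school programme only high-ability pupils play chess (at home).
    players_avg = 70                      # high-ability pupils, who all play
    non_players_avg = (40 + 20) / 2       # mid (40) and low (20), equal shares
    print(players_avg - non_players_avg)  # 40.0 points, versus a true average
                                          # impact of about 3.3 points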
Does playing chess have an impact on math test scores?

[Diagram] Play chess → does it have an impact on? → Math test scores
          Math ability → SELECTION PROCESS → who plays chess
          Math ability → DIRECT INFLUENCE → math test scores

Ignoring math ability could severely mislead us,
if we intend to interpret the difference in test
scores as a causal effect of chess
First (obvious) lesson we learn
Most observed differences tell us
nothing about causality
We should, in general, be careful
about making causal claims based
on the data we observe
However, comparing math
test scores for kids who
have learned chess by
themselves and kids
who have not
is pretty silly, isn’t it?
Almost as silly as:
Comparing participants in training
courses with non-participants and
calling the difference in subsequent
earnings “the impact of training”
Comparing enterprises applying for subsidies
with those not applying and calling the
difference in subsequent investment
“the impact of the subsidy”
The raw difference between self-selected
participants and non-participants is a silly
way to apply the counterfactual approach
The problem is selection bias
(pre-existing differences)
Now we decide to teach
pupils how to play
chess in school
Schools can
participate
or not
We compare pupils in schools that
participated in the program and
pupils in schools which did not
in order to get an estimate of the
impact of teaching chess in school
We get the following results
Pupils in the treated schools: average score = 53 points
Pupils in the non-treated schools: average score = 29 points
Difference = 24 points
is this the TRUE impact?
                 Schools that      Schools that did
                 participated      NOT participate
High ability         30%               10%
Mid ability          60%               20%
Low ability          10%               70%

There is an evident difference in
composition between the two
types of schools

Treated schools: WEIGHTED average of 70, 50 and 20 = 53
Non-treated schools: WEIGHTED average of 70, 40 and 20 = 29
Average impact = 53 – 29 = 24
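The same weighted-average arithmetic, as an illustrative Python sketch:

    # Illustrative sketch: composition drives most of the 24-point difference.
    score_play = {"high": 70, "mid": 50, "low": 20}  # score if taught chess
    score_no   = {"high": 70, "mid": 40, "low": 20}  # score if not taught

    treated_pct = {"high": 30, "mid": 60, "low": 10}  # mix in treated schools
    control_pct = {"high": 10, "mid": 20, "low": 70}  # mix in non-treated ones

    treated_avg = sum(treated_pct[g] * score_play[g] for g in score_play) / 100
    control_avg = sum(control_pct[g] * score_no[g] for g in score_no) / 100
    print(treated_avg, control_avg)   # 53.0 29.0
    print(treated_avg - control_avg)  # 24.0 = true impact + composition bias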
The difference of 24 points is a
combination of the true impact and
of the difference in composition
If we did not know the truth, we might
take 24 as the true impact on math
scores and, since it looks large,
make the wrong decision
We have two alternatives:
statistically adjusting the data
or
conducting an experiment
Any form of adjustment
assumes we have a model in mind:
we know that ability influences
math scores, and we know how to
measure ability
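One way such an adjustment could work, sketched in Python (illustrative and hypothetical: it assumes ability is observed and simply stratifies on it; the slides do not prescribe a specific method):

    # Illustrative, hypothetical sketch: adjust by stratifying on observed
    # ability, i.e. compare treated and non-treated pupils within each group.
    score_play = {"high": 70, "mid": 50, "low": 20}  # crystal-ball values
    score_no   = {"high": 70, "mid": 40, "low": 20}

    within = {g: score_play[g] - score_no[g] for g in score_play}
    print(within)  # {'high': 0, 'mid': 10, 'low': 0}

    # weighting the within-group differences by the population shares
    # (one third each) recovers the small true average impact
    print(round(sum(within.values()) / 3, 2))  # 3.33, not the misleading 24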
But even if we do not have all this
information, we can conduct a
randomized experiment
The schools that participate get free
instructors to teach chess, provided
they agree to exclude
one classroom at random
Results of the randomized experiment
Pupils in the treated classes in the volunteer schools:
average score = 53 points
Pupils in the excluded classes in the volunteer schools:
average score = 47 points
Difference = 6 points
this is very close to the TRUE impact
Schools that volunteered:
High ability 30%, Mid ability 60%, Low ability 10%
(schools that did NOT volunteer stay out of the experiment)

Random assignment: flip a coin

EXPERIMENTALS: weighted average of 70, 50 & 20 = 53
CONTROLS: weighted average of 70, 40 & 20 = 47
Impact = 53 – 47 = 6
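The experimental arithmetic, as an illustrative Python sketch:

    # Illustrative sketch: randomization gives treated and excluded classes
    # the same ability mix, so the composition bias disappears.
    pct        = {"high": 30, "mid": 60, "low": 10}  # mix in volunteer schools
    score_play = {"high": 70, "mid": 50, "low": 20}
    score_no   = {"high": 70, "mid": 40, "low": 20}

    experimentals = sum(pct[g] * score_play[g] for g in pct) / 100  # 53.0
    controls      = sum(pct[g] * score_no[g] for g in pct) / 100    # 47.0
    print(experimentals - controls)  # 6.0 points

In this stylized arithmetic the 6 points equal the true average impact for the volunteer schools’ mix (60% mid-ability pupils × a 10-point gain); a real experiment would also carry sampling noise, which is why the estimate is called very close to, rather than equal to, the truth.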
Experiments are not the only
way to identify impacts
However, it is very unlikely that an experiment
will generate grossly mistaken estimates
If anything, they tend to be biased toward zero
On the other hand, some wrong
comparisons can produce wildly mistaken
estimates