Cevdet Denizer (Bosphorus University)
Daniel Kaufmann (Revenue Watch and Brookings Institution)
Aart Kraay (World Bank)
IEG Evaluation Week Presentation
March 18, 2013
• Huge literature on aid effectiveness at two levels:
– “macro” level – e.g. does total aid raise aggregate GDP growth?
– “micro” level – e.g. evaluations (randomized or otherwise) of individual projects
• Know much less about the relative importance of project-specific versus country-specific factors in determining project outcomes
– “macro” literature uninformative about individual projects
– “micro” literature (mostly) does not have a cross-country dimension
• We use a very large sample of 6000+ World Bank projects evaluated since the 1980s
– crude, but credible, outcome measure for each project based on internal evaluation processes (IEG project success ratings)
• Match these up with two types of potential correlates of project success:
– “macro” country-level variables (easy...)
– “micro” project-level variables (hard...but interesting)
• Project-level outcomes vary much more within countries than between countries
• Limited cross-country average variation in project performance is well-explained by standard “macro” variables
• Look at variety of “micro” project-level correlates of project-level outcomes
– basic project characteristics
– early-warning indicators
– identity of task team leader
– much more to be done here since this is where most of the action is!
• very crude (sat/unsat, or 6-point scale after 1995)
– definitely not randomized evaluations!
• projects are assessed only relative to their own development objectives (DOs), and these are not standardized across projects
– different standards for DOs across different sectors?
• include sector dummies
– evolving standards for setting DOs and evaluating them?
• include sector x approval period dummies
• include sector x evaluation period dummies
– “setting bar low” in difficult countries?
• significant self-reporting component
– incentives of task managers to give poor ratings?
• independence of IEG?
• many steps from effective individual World Bank projects to any macro growth effects of aid
Despite these concerns, these ratings seem broadly credible and have the advantage of huge country-year-project coverage
• Start with universe of 7342 completed projects evaluated since 1983, and construct two subsets based on (i) availability of RHS variables and (ii) units of evaluation ratings
– 6569 projects evaluated 1983-2011 (binary outcome variable)
– 4191 projects evaluated 1995-2011 (6-point outcome variable)
• All specifications control for:
– potential mean differences across three types of evaluations
– evaluation lag (time between evaluation and completion), usually significantly negative
• “Standard” set of country-level variables from literature
– Good policy (CPIA)
– Shocks (GDP growth)
– Democracy (Freedom House)
• Average each one over life of project
– non-trivial decision how to do this, because projects last a long time (median=6 years)
• alternatives might be initial? final? weighting? separately by year of project life?
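To make the averaging choice concrete, here is a minimal Python sketch. It assumes (hypothetically) that a country-level variable is stored as a dict mapping year to value, and that a project is active from its start year through its end year inclusive; the function name and the CPIA numbers are illustrative only, not the paper's data or code.

```python
from statistics import mean


def life_of_project_average(series, start_year, end_year):
    """Average a country-year variable (e.g. CPIA) over a project's life.

    `series` maps year -> value for the project's country. Years with
    missing data are skipped rather than imputed (an assumption).
    """
    values = [series[y] for y in range(start_year, end_year + 1) if y in series]
    if not values:
        raise ValueError("no country data during the life of the project")
    return mean(values)


# Hypothetical CPIA scores for one country over a six-year project
cpia = {2001: 3.0, 2002: 3.2, 2003: 3.5, 2004: 3.5, 2005: 3.8, 2006: 4.0}

average = life_of_project_average(cpia, 2001, 2006)
initial, final = cpia[2001], cpia[2006]
print(f"initial={initial}, final={final}, life-of-project average={average:.2f}")
```

With an improving CPIA, the initial (3.0), final (4.0), and life-of-project average (3.50) measures differ by a full point, which is why the choice matters for projects that last around six years.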
Dependent Variable Is:              All Projects              AFR Projects
                                    1983-2011    1995-2011    1983-2011    1995-2011
                                    Sat/Unsat    1-6 Rating   Sat/Unsat    1-6 Rating
Real GDP Per Capita Growth          1.915***     4.839***     2.316***     5.892***
                                    (8.53)       (6.36)       (4.96)       (3.51)
CPIA Rating                         0.118***     0.533***     0.118***     0.488***
                                    (9.70)       (10.81)      (5.11)       (4.74)
Freedom House Rating                0.00434      0.0143       -0.00653     -0.00469
                                    (0.99)       (0.88)       (-0.59)      (-0.11)
Number of Observations              6569         4191         1936         1172
R-Squared                           0.122        0.143        0.165        0.173
Sector Dummies                      Y            Y            Y            Y
Sector x Evaluation Period Dummies  Y            Y            Y            Y
Sector x Approval Period Dummies    Y            Y            Y            Y
Estimation Method                   OLS          OLS          OLS          OLS
(t-statistics in parentheses)
• Generally sensible results in full sample
– policies/institutions matter a lot
• validation of the CPIA's use in performance-based allocation (PBA)
– growth matters
– no strong evidence that political rights/civil liberties matter
• Country-level variables by construction will explain only country-level average variation in project outcomes
• But, country-level average variation in project outcomes is only 20% of the total variation in project outcomes
– based on regression of project outcomes on country dummies, by year – average R-squared is about 0.2
– “macro” correlates explain this 20% reasonably well
• Points to importance of considering project-level factors
(which we do next)
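The 20% figure rests on a simple identity: regressing outcomes on a full set of country dummies makes each fitted value equal to its country mean, so the R-squared is the between-country share of total variance. The minimal Python sketch below uses a single hypothetical cross-section (the slide describes running this regression by year and averaging the R-squared); the function name and data are illustrative assumptions.

```python
from collections import defaultdict


def country_dummy_r_squared(ratings, countries):
    """R-squared from OLS of ratings on a full set of country dummies.

    With one dummy per country and no other regressors, each fitted value
    is the country mean, so
    R^2 = 1 - (within-country sum of squares) / (total sum of squares).
    """
    groups = defaultdict(list)
    for rating, country in zip(ratings, countries):
        groups[country].append(rating)
    country_mean = {c: sum(v) / len(v) for c, v in groups.items()}
    overall_mean = sum(ratings) / len(ratings)

    ss_total = sum((r - overall_mean) ** 2 for r in ratings)
    ss_within = sum((r - country_mean[c]) ** 2 for r, c in zip(ratings, countries))
    return 1.0 - ss_within / ss_total


# Hypothetical sat/unsat outcomes (1 = satisfactory) for two countries
ratings = [1, 1, 1, 0, 1, 0, 0, 0]
countries = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(country_dummy_r_squared(ratings, countries))  # 0.25
```

Even though the two toy countries have very different success rates (75% versus 25%), three-quarters of the variance in this example is still within countries.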
[Figure: IEG rating and fitted values against average CPIA score over life of project]
• dummy for investment lending (vs DPLs, SALs, etc)
• three proxies for complexity
– “concentration” of project in its major sector
– dummy for “repeater” projects, e.g. Botswana Education II, III are repeats, Education I is not
– ln(size in dollars)
• project length (years from approval to evaluation)
• preparation and supervision costs as share of total project size
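A minimal sketch of how some of these project-level variables could be built. It assumes, hypothetically, that repeater projects can be identified by a trailing Roman numeral of II or higher in the project name, and that sector shares are recorded as percentages of commitments. The function names and the name-based rule are illustrative assumptions, not the paper's actual procedure.

```python
import math
import re

# Assumption: a "repeater" has a trailing sequel numeral (II, III, ...) in its name.
SEQUEL_SUFFIX = re.compile(r"\s(?:II|III|IV|V|VI|VII|VIII|IX|X)$")


def is_repeater(project_name: str) -> int:
    """1 if the project name ends in a sequel numeral (II or later), else 0."""
    return int(SEQUEL_SUFFIX.search(project_name.strip()) is not None)


def largest_sector_share(sector_shares: dict) -> float:
    """Share (in percent, by assumption) of commitments in the largest sector."""
    return max(sector_shares.values())


def log_ratio(cost: float, total_size: float) -> float:
    """Log of a cost (e.g. preparation or supervision) relative to project size."""
    return math.log(cost / total_size)


for name in ["Botswana Education I", "Botswana Education II", "Botswana Education III"]:
    print(name, is_repeater(name))
```

The loop reproduces the slide's example: Education I is coded 0, while Education II and III are coded 1.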
Dependent Variable Is:              All Projects              AFR Projects
                                    1983-2011    1995-2011    1983-2011    1995-2011
                                    Sat/Unsat    1-6 Rating   Sat/Unsat    1-6 Rating
Dummy for Investment Projects       0.0489*      0.0771       0.0603       0.430**
                                    (1.73)       (0.81)       (1.08)       (2.41)
Share of Project in Largest Sector  -0.00111***  -0.00305***  -0.00145**   -0.00431**
                                    (-3.31)      (-3.28)      (-2.06)      (-2.25)
Dummy for Repeater Projects         0.00323      -0.0126      -0.0235      -0.0127
                                    (0.25)       (-0.27)      (-0.85)      (-0.13)
Log(Total Project Size)             -0.0486***   -0.136***    -0.0673***   -0.0777
                                    (-4.46)      (-3.72)      (-3.22)      (-1.13)
Project length (years)              -0.00523     -0.0307**    -0.0135      -0.0534*
                                    (-1.12)      (-2.11)      (-1.46)      (-1.81)
Log(Preparation Costs/Total Size)   -0.00664     -0.0419      -0.0114      -0.00414
                                    (-0.83)      (-1.46)      (-0.64)      (-0.08)
Log(Supervision Costs/Total Size)   -0.0479***   -0.137***    -0.0628***   -0.148**
                                    (-4.55)      (-3.93)      (-3.20)      (-2.38)
Number of Observations              6569         4191         1936         1172
R-Squared                           0.130        0.156        0.177        0.188
Sector Dummies                      Y            Y            Y            Y
Sector x Evaluation Period Dummies  Y            Y            Y            Y
Sector x Approval Period Dummies    Y            Y            Y            Y
Estimation Method                   OLS          OLS          OLS          OLS
(t-statistics in parentheses)
• Investment projects do slightly better
• Mixed results on complexity
– projects more concentrated in one sector do worse??
– “repeater” projects don’t do better?
– larger projects do worse
• Length, preparation (and especially supervision) costs negatively correlated with outcomes
– big-time endogeneity problem – e.g. “difficult” projects require more preparation, supervision, take longer
– more on this later (and in paper)
• Effectiveness delay (time in quarters from approval to first disbursement)
• “Early-warning” indicators of problem projects from end-of-FY Implementation Status Review (ISR) Reports for each year project is active
• “problem project” flag – raised if task manager thinks progress towards development objective is unsatisfactory
• “potential problem” flag – raised if three or more of 12 detailed flags are raised
• dummy for restructuring (very rare)
– dummy=1 if these flags are observed in first half of project (only for projects lasting at least four years)
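The flag definitions above can be sketched as follows. The sketch assumes, hypothetically, that each project's ISRs are available as parallel lists of fiscal years and flag values, and that "first half" means ISRs dated before the midpoint between approval and evaluation (project length is measured from approval to evaluation, as for the project-length variable). The record layout and function names are illustrative.

```python
def potential_problem(detailed_flags, threshold=3):
    """True if at least `threshold` of the 12 detailed ISR flags are raised."""
    if len(detailed_flags) != 12:
        raise ValueError("expected 12 detailed flags")
    return sum(bool(f) for f in detailed_flags) >= threshold


def first_half_dummy(isr_years, flag_raised, approval_year, evaluation_year,
                     min_length=4):
    """1 if a flag is raised in any ISR during the first half of the project.

    Returns None for projects shorter than `min_length` years, which are
    excluded. Assumption: "first half" = ISRs dated before the midpoint
    between approval and evaluation.
    """
    length = evaluation_year - approval_year
    if length < min_length:
        return None
    midpoint = approval_year + length / 2
    return int(any(raised for year, raised in zip(isr_years, flag_raised)
                   if year < midpoint))


# Hypothetical project approved in FY2000 and evaluated in FY2008
years = [2001, 2002, 2003, 2004, 2005, 2006, 2007]
problem_flag = [False, True, False, False, False, False, False]
print(first_half_dummy(years, problem_flag, 2000, 2008))  # 1
```

The same first-half function applies to either flag: pass the task manager's problem-project flag directly, or pass `potential_problem(...)` evaluated on each ISR's 12 detailed flags.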
Dependent Variable Is:              All Projects              AFR Projects
                                    1983-2011    1995-2011    1983-2011    1995-2011
                                    Sat/Unsat    1-6 Rating   Sat/Unsat    1-6 Rating
Time from Approval to               0.00237      0.0110**     0.00232      0.0135
First Disbursement (quarters)       (1.49)       (2.23)       (0.75)       (1.55)
Dummy for Restructuring             0.0978*      0.355***     0.269**      0.786***
During First Half of Project        (1.94)       (2.89)       (2.52)       (3.34)
Dummy for Problem Project Flag      -0.141***    -0.374***    -0.109***    -0.198*
During First Half of Project        (-7.19)      (-6.33)      (-2.90)      (-1.83)
Dummy for Potential Problem Flag    -0.0381      -0.100       -0.0824*     -0.213*
During First Half of Project        (-1.54)      (-1.48)      (-1.86)      (-1.77)
Number of Observations              3764         2682         1082         785
R-Squared                           0.156        0.181        0.200        0.230
Sector Dummies                      Y            Y            Y            Y
Sector x Evaluation Period Dummies  Y            Y            Y            Y
Sector x Approval Period Dummies    Y            Y            Y            Y
Estimation Method                   OLS          OLS          OLS          OLS
(t-statistics in parentheses)
Figure: Project status from approval to evaluation (3764 projects)
• Approval: 3764 projects
• First half of implementation:
– 943 problem projects (25%); 55% eventually successful
– 2821 good projects (75%); 75% eventually successful
• Second half of implementation:
– of the 943 first-half problem projects: 592 problem (63%), 41% success at evaluation; 351 good (37%), 81% success at evaluation
– of the 2821 first-half good projects: 853 problem (30%), 48% success at evaluation; 1968 good (70%), 87% success at evaluation
• Evaluation: overall success rate 71%
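The group-level success rates in the figure are project-weighted averages of the path-level rates. A short Python check, with counts and rates transcribed from the figure (the variable and function names are illustrative):

```python
# Counts and path-level success rates transcribed from the figure
second_half = {  # (first-half status, second-half status) -> number of projects
    ("problem", "problem"): 592, ("problem", "good"): 351,
    ("good", "problem"): 853, ("good", "good"): 1968,
}
success_rate = {
    ("problem", "problem"): 0.41, ("problem", "good"): 0.81,
    ("good", "problem"): 0.48, ("good", "good"): 0.87,
}


def weighted_success(paths):
    """Project-weighted average success rate over a set of paths."""
    n = sum(second_half[p] for p in paths)
    return sum(second_half[p] * success_rate[p] for p in paths) / n


first_half_problem = [p for p in second_half if p[0] == "problem"]
first_half_good = [p for p in second_half if p[0] == "good"]

print(f"flagged in first half:     {weighted_success(first_half_problem):.3f}")
print(f"not flagged in first half: {weighted_success(first_half_good):.3f}")
print(f"all projects:              {weighted_success(list(second_half)):.3f}")
```

This gives roughly 0.56, 0.75, and 0.70, compared with 55%, 75%, and 71% in the figure. The small gaps presumably come from rounding the path-level rates to whole percentages.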
• Effectiveness delays are associated with slightly better outcomes
• Problem project flags raised in the first half of a project's life have highly significant negative coefficients
• not a mechanical correlation with outcome
• potential problem flags also significant in AFR
• Restructurings are positively correlated with outcomes (more so in AFR)
• Again partial correlations are hard to interpret – e.g. a “difficult” project is more likely to be flagged and is more likely to turn out unsuccessful
• Many of the project variables respond endogenously to project characteristics, e.g.
– “difficult” projects require more supervision, are more likely to be flagged, and also are more likely to be unsuccessful
– creates downward bias in OLS estimates of effects of interventions such as supervision
• Can’t rely on standard solutions like randomized controlled assignment of Bank inputs (infeasible) or instrumental variables (unjustifiable)
The paper details an alternative approach to quantify the likely biases – with reasonable assumptions we can retrieve intuitively plausible positive effects of supervision, flags, etc. on project outcomes – but their magnitude is hard to pin down precisely
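A small simulation shows the direction of the bias. The setup is a stylized assumption of ours, not the paper's model: an unobserved "difficulty" raises supervision and lowers outcomes, while supervision has a true positive effect of 0.5. Because difficulty is omitted, the OLS slope converges to 0.5 − 2 × cov(s, d)/var(s) = 0.5 − 2 × 1/2 = −0.5, so supervision looks harmful even though it helps.

```python
import random


def ols_slope(x, y):
    """Slope from a simple (bivariate) OLS regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var = sum((xi - x_bar) ** 2 for xi in x)
    return cov / var


def simulate(n=100_000, true_effect=0.5, seed=42):
    """OLS slope of outcome on supervision when difficulty is omitted."""
    rng = random.Random(seed)
    supervision, outcome = [], []
    for _ in range(n):
        difficulty = rng.gauss(0.0, 1.0)           # unobserved by the analyst
        s = difficulty + rng.gauss(0.0, 1.0)       # harder projects get more supervision
        y = true_effect * s - 2.0 * difficulty + rng.gauss(0.0, 1.0)
        supervision.append(s)
        outcome.append(y)
    return ols_slope(supervision, outcome)


print(f"true effect 0.5, OLS estimate {simulate():.2f}")
```

The coefficients 1.0 and −2.0 on difficulty are arbitrary. Any positive link from difficulty to supervision, combined with a negative link from difficulty to outcomes, biases the OLS estimate downward.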
• Task team leader (TTL) is important World Bank “input” into projects
• We have data on the staff ID number of the TTL:
– from final ISR, for 3,925 projects in post-1995 sample
• publicly available in Project Portal
– for each ISR, for 3,187 projects in post-1995 sample
• use to investigate TTL turnover
• Explore two practical questions:
– How important are TTL fixed effects relative to country fixed effects?
– How important is TTL “quality” relative to other correlates of project outcomes?
• In order to investigate this, need a sample where there is meaningful variation across countries and TTLs
– e.g. if each TTL worked in only one country, can’t separately identify country and TTL effects
• Restrict attention to sample of 2407 projects where TTL has managed (i) more than one project, and (ii) in more than one country
– covers 136 countries and 710 TTLs
• For projects where we have “time series of TTLs” by ISR within projects, also identify “Initial” TTL, as distinct from “Final” TTL at time of final ISR
– look at subset of projects where “Initial” and “Final” TTL are different to separately identify “Initial” and “Final” TTL effects
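The sample restriction can be sketched as a simple filter. The sketch assumes, hypothetically, that project records take the form (project ID, TTL ID, country); the record layout and the function name are illustrative.

```python
from collections import defaultdict


def identification_sample(projects):
    """Keep projects whose TTL managed more than one project in more than one country.

    `projects` is a list of (project_id, ttl_id, country) tuples.
    """
    n_projects = defaultdict(int)
    countries = defaultdict(set)
    for _, ttl, country in projects:
        n_projects[ttl] += 1
        countries[ttl].add(country)
    return [p for p in projects
            if n_projects[p[1]] > 1 and len(countries[p[1]]) > 1]


# Hypothetical records
projects = [
    ("P1", "T1", "Botswana"), ("P2", "T1", "Kenya"),   # kept: 2 projects, 2 countries
    ("P3", "T2", "Botswana"),                          # dropped: only 1 project
    ("P4", "T3", "Kenya"), ("P5", "T3", "Kenya"),      # dropped: only 1 country
]
print([p[0] for p in identification_sample(projects)])  # ['P1', 'P2']
```

Condition (ii) implies condition (i): a TTL active in two countries must have managed at least two projects. The first check is kept only to mirror the text.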
• Proxy for quality of TTL on a given project as average IEG rating on all other projects with same TTL
– only for projects with TTLs managing two or more projects
– variant 1: define quality as average IEG rating over previous projects managed by same TTL
– variant 2: define quality as weighted average (by number of ISRs) of all other projects the TTL was ever responsible for (not just at the end of project)
• TTL “turnover” is average number of TTLs per ISR
– median project lasts six years, has 12 ISRs, and 2 TTLs
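The baseline quality proxy is a leave-one-out mean, and turnover is a simple ratio. The minimal Python sketch below makes several assumptions. It uses a hypothetical record layout. It orders "previous" projects by evaluation year, so projects evaluated in the same year are not counted as previous. It counts each distinct TTL ID once, even if a TTL returns after a gap. The ISR-weighted variant 2 is not sketched.

```python
from collections import defaultdict


def ttl_quality(projects):
    """Leave-one-out and previous-project TTL quality for each project.

    `projects` is a list of dicts with keys "id", "ttl", "year" (evaluation
    year, used to order projects) and "rating" (IEG score). Returns
    {project_id: (all_other_mean, previous_mean)}, with None where a measure
    is undefined (no other / no earlier projects for that TTL).
    """
    by_ttl = defaultdict(list)
    for p in projects:
        by_ttl[p["ttl"]].append(p)

    quality = {}
    for group in by_ttl.values():
        total = sum(p["rating"] for p in group)
        for p in group:
            others = len(group) - 1
            all_other = (total - p["rating"]) / others if others else None
            earlier = [q["rating"] for q in group if q["year"] < p["year"]]
            previous = sum(earlier) / len(earlier) if earlier else None
            quality[p["id"]] = (all_other, previous)
    return quality


def ttl_turnover(isr_ttl_ids):
    """Number of distinct TTLs divided by the number of ISRs for one project."""
    if not isr_ttl_ids:
        raise ValueError("project has no ISRs")
    return len(set(isr_ttl_ids)) / len(isr_ttl_ids)


# Hypothetical TTL with three projects rated on the 1-6 scale
projects = [
    {"id": "A", "ttl": "T1", "year": 2000, "rating": 4},
    {"id": "B", "ttl": "T1", "year": 2003, "rating": 5},
    {"id": "C", "ttl": "T1", "year": 2006, "rating": 3},
]
print(ttl_quality(projects))
# The median project: 12 ISRs handled by 2 TTLs
print(ttl_turnover(["T1"] * 6 + ["T2"] * 6))
```

Project B, for example, gets an all-other quality of (4 + 3)/2 = 3.5 and a previous-project quality of 4.0. The median project's turnover is 2/12.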
Dependent Variable Is:                  (1)        (2)        (3)        (5)        (6)        (7)
All Projects Evaluated 1995-2011        1-6 Rating 1-6 Rating 1-6 Rating 1-6 Rating 1-6 Rating 1-6 Rating
CPIA Rating                             0.539***   0.542***   0.413***   0.471***   0.458***   0.317***
                                        (10.63)    (8.88)     (6.73)     (7.84)     (7.39)     (3.21)
TTL Quality (Average Outcome on         0.180***                         0.167***   0.131***   0.0969**
all Other Projects)                     (6.29)                           (5.19)     (3.99)     (2.42)
TTL Quality (Average Outcome on all                0.155***
Previous Projects)                                 (5.31)
TTL Quality (ISR-Weighted Average                             0.188***
Outcome on all Other Projects)                                (4.32)
TTL Turnover (Number of TTLs per ISR)                                    -1.282***             -1.672***
                                                                         (-6.08)               (-4.79)
Evaluator "Toughness" (Average                                                      0.271***   0.0660
Outcome of all Other Projects Rated                                                 (3.50)     (0.77)
By Same Evaluator)
Number of Observations                  2407       1707       1783       1895       1672       1063
R-Squared                               0.084      0.082      0.049      0.089      0.059      0.227
(t-statistics in parentheses)
• TTL quality is highly significant with economically large effects, e.g. consider a move from the 25th to the 75th percentile (P25 to P75) of:
– TTL Quality: 3.5→4.75, IEG score ↑ by 0.23
– CPIA Score: 3.1→3.6, IEG score ↑ by 0.22
– Alternative quality measures have similarly large effects
• TTL turnover is highly significant – moving from 2/12 TTLs per ISR to 3/12 TTLs per ISR implies IEG score ↓ by 0.10
– but need to be cautious about endogeneity of TTL turnover
– much more to be done here, e.g. to better understand costs and benefits of 3-5-7 rule
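Each magnitude above is simply a coefficient multiplied by the change in the regressor. The minimal check below assumes the relevant estimates are the 0.180 TTL-quality coefficient and the −1.282 turnover coefficient from the regression table. Which column each bullet uses is our assumption.

```python
def implied_change(coefficient, low, high):
    """Change in the predicted IEG score when a regressor moves from low to high."""
    return coefficient * (high - low)


ttl_quality_effect = implied_change(0.180, 3.5, 4.75)       # P25 -> P75 of TTL quality
turnover_effect = implied_change(-1.282, 2 / 12, 3 / 12)    # 2/12 -> 3/12 TTLs per ISR

print(f"TTL quality P25->P75: {ttl_quality_effect:+.3f}")
print(f"TTL turnover 2/12->3/12: {turnover_effect:+.3f}")
```

This gives roughly +0.225 and −0.107. Because the turnover coefficient is negative, the extra TTL per 12 ISRs lowers the predicted score by about 0.1 points.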
• So far have focused on TTL effects – but could very well also be evaluator effects
– are there “tough” and “easy” evaluators?
– how do they match to TTLs?
• Two data sources on evaluator identity
– anonymized data from IEG on staff who do desk reviews of ICRs, for each project since 1995
– manually (!) collected data on TTL for 1150 Project Performance Audit Reports since 1995
• Some evidence of evaluator effects, but:
– does not undermine significance of TTL effects
– does not survive addition of other controls (likely reflects sectoral specialization of reviewers?)
• Evidence suggests there is a quantitatively important “human factor” in project outcomes
• But much more needs to be done:
– are there common attributes to TTLs who have a track record of successful projects?
– are there endogeneity problems in the “assignment” of TTLs to projects?
– do higher levels of management matter?
– are there other dimensions, such as counterpart quality, that matter as well?
• Country-level policies and institutions do matter a lot for project outcomes
– don’t throw out baby with bathwater!
– (one more) piece of support for donor policies targeting aid to countries with better policy
– but at most this can help us with 20% of variation in project outcomes that occurs across countries
• The 80% of variation in project outcomes within countries challenges us to think hard about how to improve project success within countries, e.g.
– why are problem projects hard to turn around, or cancel outright once warning signs emerge?
– is there scope for project- as well as country-level aid allocation mechanisms to ensure better outcomes?
• e.g. what if the WB were to allocate some resources to “proposals” submitted by TTLs?
– analogous to NSF (or KCP) proposals to obtain research grants
– criteria for judging proposals could be tailored to reflect country and TTL characteristics
– how can we better learn about the effectiveness of Bank inputs into project outcomes?
• Many more interesting questions to be answered using this kind of project data
– some preliminary evidence that projects managed by “decentralized” TTLs located in country of project do better
– assembling TTL-VPU assignment data to see if “3-5-7”-induced TTL turnover matters for project outcomes
– working with colleagues at AfDB and AsDB to assemble similar data for their projects
– and much more....suggestions welcome!