Good Countries or Good Projects?

Micro and Macro Correlates of World Bank Project Performance

Cevdet Denizer (Bosphorus University)

Daniel Kaufmann (Revenue Watch and Brookings Institution)

Aart Kraay (World Bank)

IEG Evaluation Week Presentation

March 18, 2013

Motivation

• Huge literature on aid effectiveness at two levels:

– “macro” level – e.g. does total aid raise aggregate GDP growth?

– “micro” level – e.g. evaluations (randomized or otherwise) of individual projects

• Know much less about the relative importance of project-specific versus country-specific factors in determining project outcomes

– “macro” literature uninformative about individual projects

– “micro” literature (mostly) does not have cross-country dimension

This Paper

• Uses very large sample of 6000+ World Bank projects since 1980s

– crude, but credible, outcome measure for each project based on internal evaluation processes (IEG project success ratings)

• Match these up with two types of potential correlates of project success:

– “macro” country-level variables (easy...)

– “micro” project-level variables (hard...but interesting)

Preview of Main Results

• Project-level outcomes vary much more within countries than between countries

• Limited cross-country average variation in project performance is well-explained by standard “macro” variables

• Look at variety of “micro” project-level correlates of project-level outcomes

– basic project characteristics

– early-warning indicators

– identity of task team leader

– much more to be done here since this is where most of the action is!

Many Potential Concerns with Outcome Measure

• very crude (sat/unsat, or 6 point scale after 1995)

– definitely not randomized evaluations!

• projects assessed relative to development objective (DO) only, and DOs are not standardized across projects

– different standards for DOs across different sectors?

• include sector dummies

– evolving standards for setting DOs and evaluating them?

• include sector x approval period dummies

• include sector x evaluation period dummies

– “setting bar low” in difficult countries?

Potential Concerns, Cont’d

• significant self-reporting component

– incentives of task managers to give poor ratings?

• independence of IEG?

• many steps from effective individual World Bank projects to any macro growth effects of aid

Despite these concerns, these ratings seem broadly credible and have advantage of huge country-year-project coverage

Setup of Empirical Results

• Start with universe of 7342 completed projects evaluated since 1983, and construct two subsets based on (i) availability of RHS variables and (ii) units of evaluation ratings

– 6569 projects evaluated 1983-2011 (binary outcome variable)

– 4191 projects evaluated 1995-2011 (6-point outcome variable)

• All specifications control for:

– potential mean differences across three types of evaluations

– evaluation lag (time between evaluation and completion), usually significantly negative

“Macro” Correlates of Project Outcomes

• “Standard” set of country-level variables from literature

– Good policy (CPIA)

– Shocks (GDP growth)

– Democracy (Freedom House)

• Average each one over life of project

– non-trivial decision how to do this, because projects last a long time (median=6 years)

• alternatives might be initial? final? weighting? separately by year of project life?
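As a minimal sketch of the averaging step, the helper below takes a yearly country-level series and averages it over the years a project is active. The function name, the year-keyed dictionary, and the CPIA numbers are all hypothetical and only illustrate the choice between a life-of-project average and the initial or final value.

```python
from statistics import mean


def average_over_project_life(series_by_year, start_year, end_year):
    """Mean of a yearly series over start_year..end_year, skipping missing years."""
    values = [
        series_by_year[year]
        for year in range(start_year, end_year + 1)
        if year in series_by_year
    ]
    if not values:
        raise ValueError("no data for any year of the project")
    return mean(values)


# Hypothetical CPIA scores for one country (illustrative numbers only).
cpia = {1996: 3.0, 1997: 3.2, 1998: 3.4, 1999: 3.4, 2000: 3.5, 2001: 3.5}

life_average = average_over_project_life(cpia, 1996, 2001)
initial_value = cpia[1996]  # alternative: value at the start of the project
final_value = cpia[2001]    # alternative: value at the end of the project
```

With a median project life of six years, the three choices can differ noticeably whenever the country variable trends over time.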

Results: “Macro” Correlates

                                      All Projects                AFR Sample
                                      1983-2011     1995-2011     1983-2011     1995-2011
Dependent Variable Is:                Sat/Unsat     1-6 Rating    Sat/Unsat     1-6 Rating

Real GDP Per Capita Growth            1.915***      4.839***      2.316***      5.892***
                                      (8.53)        (6.36)        (4.96)        (3.51)
CPIA Rating                           0.118***      0.533***      0.118***      0.488***
                                      (9.70)        (10.81)       (5.11)        (4.74)
Freedom House Rating                  0.00434       0.0143        -0.00653      -0.00469
                                      (0.99)        (0.88)        (-0.59)       (-0.11)

Number of Observations                6569          4191          1936          1172
R-Squared                             0.122         0.143         0.165         0.173
Sector Dummies                        Y             Y             Y             Y
Sector x Evaluation Period Dummies    Y             Y             Y             Y
Sector x Approval Period Dummies      Y             Y             Y             Y
Estimation Method                     OLS           OLS           OLS           OLS

Results: “Macro” Correlates

• Generally sensible results in full sample

– policies/institutions matter a lot

• validation of CPIA in PBA

– growth matters

– no strong evidence that political rights/civil liberties matter

Results: From “Macro” to “Micro” Correlates

• Country-level variables by construction will explain only country-level average variation in project outcomes

• But, country-level average variation in project outcomes is only 20% of the total variation in project outcomes

– based on regression of project outcomes on country dummies, by year – average R-squared is about 0.2

– “macro” correlates explain this 20% reasonably well

• Points to importance of considering project-level factors

(which we do next)
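The 20% figure is the R-squared from regressing project outcomes on country dummies. A minimal sketch of that calculation, using NumPy and synthetic data, is given here. The variances are chosen so that the between-country share comes out near 0.2; they are an assumption for illustration, not the Bank data, and the sketch skips the by-year averaging described above.

```python
import numpy as np


def between_group_share(outcomes, groups):
    """R-squared from regressing outcomes on a full set of group dummies.

    With only dummies on the right-hand side, the fitted value for each
    observation is its group mean, so R-squared is the between-group sum
    of squares divided by the total sum of squares.
    """
    outcomes = np.asarray(outcomes, dtype=float)
    groups = np.asarray(groups)
    grand_mean = outcomes.mean()
    fitted = np.empty_like(outcomes)
    for g in np.unique(groups):
        mask = groups == g
        fitted[mask] = outcomes[mask].mean()
    ss_total = ((outcomes - grand_mean) ** 2).sum()
    ss_between = ((fitted - grand_mean) ** 2).sum()
    return ss_between / ss_total


# Synthetic example: country effects with variance 1, project noise with variance 4.
rng = np.random.default_rng(0)
n_countries, projects_per_country = 100, 50
country_effect = rng.normal(0.0, 1.0, n_countries)
groups = np.repeat(np.arange(n_countries), projects_per_country)
outcomes = country_effect[groups] + rng.normal(0.0, 2.0, groups.size)

share = between_group_share(outcomes, groups)
print(f"between-country share of variation: {share:.2f}")
```

One minus this share is the within-country variation that country-level variables cannot explain by construction.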

Project Outcome Ratings and Country Performance

[Figure: IEG rating and fitted values plotted against average CPIA score over life of project]

“Micro” Correlates of Project Outcomes, 1

• dummy for investment lending (vs DPLs, SALs, etc)

• three proxies for complexity

– “concentration” of project in its major sector

– dummy for “repeater” projects, e.g. Botswana Education II, III are repeats, Education I is not

– ln(size in dollars)

• project length (years from approval to evaluation)

• preparation and supervision costs as share of total project size

Results: Basic Project Characteristics

                                      All Projects                AFR Projects
                                      1983-2011     1995-2011     1983-2011     1995-2011
Dependent Variable Is:                Sat/Unsat     1-6 Rating    Sat/Unsat     1-6 Rating

Dummy for Investment Projects         0.0489*       0.0771        0.0603        0.430**
                                      (1.73)        (0.81)        (1.08)        (2.41)
Share of Project in Largest Sector    -0.00111***   -0.00305***   -0.00145**    -0.00431**
                                      (-3.31)       (-3.28)       (-2.06)       (-2.25)
Dummy for Repeater Projects           0.00323       -0.0126       -0.0235       -0.0127
                                      (0.25)        (-0.27)       (-0.85)       (-0.13)
Log(Total Project Size)               -0.0486***    -0.136***     -0.0673***    -0.0777
                                      (-4.46)       (-3.72)       (-3.22)       (-1.13)
Project length (years)                -0.00523      -0.0307**     -0.0135       -0.0534*
                                      (-1.12)       (-2.11)       (-1.46)       (-1.81)
Log(Preparation Costs/Total Size)     -0.00664      -0.0419       -0.0114       -0.00414
                                      (-0.83)       (-1.46)       (-0.64)       (-0.08)
Log(Supervision Costs/Total Size)     -0.0479***    -0.137***     -0.0628***    -0.148**
                                      (-4.55)       (-3.93)       (-3.20)       (-2.38)

Number of Observations                6569          4191          1936          1172
R-Squared                             0.130         0.156         0.177         0.188
Sector Dummies                        Y             Y             Y             Y
Sector x Evaluation Period Dummies    Y             Y             Y             Y
Sector x Approval Period Dummies      Y             Y             Y             Y
Estimation Method                     OLS           OLS           OLS           OLS

Results: Basic Project Characteristics

• Investment projects do slightly better

• Mixed results on complexity

– projects more concentrated in one sector do worse??

– “repeater” projects don’t do better?

– larger projects do worse

• Length, preparation (and especially supervision) costs negatively correlated with outcomes

– big-time endogeneity problem – e.g. “difficult” projects require more preparation, supervision, take longer

– more on this later (and in paper)

“Micro” Correlates of Project Outcomes, 2

• Effectiveness delay (time in quarters from approval to first disbursement)

• “Early-warning” indicators of problem projects from end-of-FY Implementation Status Review (ISR) Reports for each year project is active

• “problem project” flag – raised if task manager thinks progress towards development objective is unsatisfactory

• “potential problem” flag – raised if three or more of 12 detailed flags are raised

• dummy for restructuring (very rare)

– dummy=1 if these flags observed in first half of project (only for projects lasting at least four years)
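The flag definitions above can be sketched in a few lines. The sketch assumes one ISR per year of project life, takes "first half" to mean the first `n // 2` reports, and treats the dummy as 1 if the flag appears in any of them. Those three choices, and both function names, are illustrative assumptions and not the authors' documented coding.

```python
def potential_problem_flag(detailed_flags, threshold=3):
    """True when at least `threshold` of the detailed flags are raised."""
    return sum(bool(flag) for flag in detailed_flags) >= threshold


def flagged_in_first_half(flag_by_year):
    """1 if the flag is raised in any report from the first half of the project.

    Returns None for projects with fewer than four yearly reports, which
    are excluded from this indicator.
    """
    n_years = len(flag_by_year)
    if n_years < 4:
        return None
    first_half = flag_by_year[: n_years // 2]
    return int(any(first_half))


# Three of twelve detailed flags raised: potential problem project.
print(potential_problem_flag([1, 1, 1] + [0] * 9))
# Six-year project flagged in year 2: early-warning dummy is 1.
print(flagged_in_first_half([0, 1, 0, 0, 0, 0]))
```

Restricting the dummy to the first half keeps the indicator from picking up problems that only surface close to the final evaluation.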

Results: Early Warning Indicators


                                      All Projects                AFR Projects
                                      1983-2011     1995-2011     1983-2011     1995-2011
                                      Sat/Unsat     1-6 Rating    Sat/Unsat     1-6 Rating

Time from Approval to                 0.00237       0.0110**      0.00232       0.0135
First Disbursement (quarters)         (1.49)        (2.23)        (0.75)        (1.55)
Dummy for Restructuring               0.0978*       0.355***      0.269**       0.786***
During First Half of Project          (1.94)        (2.89)        (2.52)        (3.34)
Dummy for Problem Project Flag        -0.141***     -0.374***     -0.109***     -0.198*
                                      (-7.19)       (-6.33)       (-2.90)       (-1.83)
Dummy for Potential Problem Flag      -0.0381       -0.100        -0.0824*      -0.213*
                                      (-1.54)       (-1.48)       (-1.86)       (-1.77)

Number of Observations                3764          2682          1082          785
R-Squared                             0.156         0.181         0.200         0.230
Sector Dummies                        Y             Y             Y             Y
Sector x Evaluation Period Dummies    Y             Y             Y             Y
Sector x Approval Period Dummies      Y             Y             Y             Y
Estimation Method                     OLS           OLS           OLS           OLS

Approval: 3764 Projects

First Half of Implementation:
– 943 Problem Projects (25%)
– 2821 Good Projects (75%)

Second Half of Implementation:
– of the 943 Problem Projects: 592 Problem Projects (63%), 351 Good Projects (37%)
– of the 2821 Good Projects: 853 Problem Projects (30%), 1968 Good Projects (70%)

Evaluation:
– Problem in first half, Problem in second half: 41% Success
– Problem in first half, Good in second half: 81% Success
– Good in first half, Problem in second half: 48% Success
– Good in first half, Good in second half: 87% Success
– Problem in first half: 55% Success
– Good in first half: 75% Success
– Overall Success Rate: 71%
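The aggregate success rates in the flow chart are project-weighted averages of the four branch rates. The check below assumes the four rates attach to the branches in the order listed (41%, 81%, 48%, 87%). That pairing is a reading of the chart and is not stated explicitly.

```python
# (first-half status, second-half status): (number of projects, success rate)
branches = {
    ("problem", "problem"): (592, 0.41),
    ("problem", "good"): (351, 0.81),
    ("good", "problem"): (853, 0.48),
    ("good", "good"): (1968, 0.87),
}


def success_rate(first_half=None):
    """Project-weighted success rate, optionally restricted by first-half status."""
    selected = [
        (n, rate)
        for (first, _second), (n, rate) in branches.items()
        if first_half in (None, first)
    ]
    total = sum(n for n, _ in selected)
    return sum(n * rate for n, rate in selected) / total


problem_rate = success_rate("problem")
good_rate = success_rate("good")
overall_rate = success_rate()
print(f"{problem_rate:.3f} {good_rate:.3f} {overall_rate:.3f}")
```

Because the branch rates are rounded, the weighted averages come out at roughly 56%, 75% and 70%, close to the 55%, 75% and 71% on the slide.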

Results: Early Warning Indicators

• Effectiveness delays are associated with slightly better outcomes

• Problem project flags raised in first half of life of project are highly significantly negative

• not a mechanical correlation with outcome

• potential problem flags also significant in AFR

• Restructurings are positively correlated with outcomes (more so in AFR)

• Again partial correlations are hard to interpret – e.g. a “difficult” project is more likely to be flagged and is more likely to turn out unsuccessful

Role of Unobserved (by us) Project Characteristics

• Many of the project variables respond endogenously to project characteristics, e.g.

– “difficult” projects require more supervision, are more likely to be flagged, and also are more likely to be unsuccessful

– creates downward bias in OLS estimates of effects of interventions such as supervision

• Can’t rely on standard solutions like randomized controlled assignment of Bank inputs (infeasible) or instrumental variables (unjustifiable)

Paper has details on alternative approach to quantify likely biases – with reasonable assumptions can retrieve intuitively-plausible positive effects of supervision, flags, etc. on project outcomes – but magnitude hard to pin down precisely
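The downward bias can be illustrated with a small simulation. This is a sketch under assumed parameters and is not the paper's approach to quantifying the bias. Supervision has a true positive effect of 0.5, unobserved difficulty raises supervision and lowers outcomes, and all variances are set to one for convenience.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
true_effect = 0.5  # assumed positive effect of supervision on the outcome

difficulty = rng.normal(size=n)                # unobserved by the analyst
supervision = difficulty + rng.normal(size=n)  # harder projects get more supervision
outcome = true_effect * supervision - 1.0 * difficulty + rng.normal(size=n)

# OLS slope of outcome on supervision alone (difficulty omitted).
cov = np.cov(supervision, outcome)
naive_slope = cov[0, 1] / cov[0, 0]

# OLS with difficulty included, which is infeasible in practice
# because difficulty is not observed.
X = np.column_stack([np.ones(n), supervision, difficulty])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
controlled_slope = coef[1]

print(f"naive: {naive_slope:.2f}, controlling for difficulty: {controlled_slope:.2f}")
```

Under these assumptions the naive slope is close to zero while the slope controlling for difficulty is close to 0.5, so a zero or negative partial correlation is compatible with a positive causal effect.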

Role of Task Team Leaders

• Task team leader (TTL) is important World Bank “input” into projects

• We have data on the staff ID number of the TTL:

– from final ISR, for 3,925 projects in post-1995 sample

• publicly available in Project Portal

– for each ISR, for 3,187 projects in post-1995 sample

• use to investigate TTL turnover

• Explore two practical questions:

– How important are TTL fixed effects relative to country fixed effects?

– How important is TTL “quality” relative to other correlates of project outcomes?

Country Effects vs TTL Effects

• In order to investigate this, need a sample where there is meaningful variation across countries and TTLs

– e.g. if each TTL worked in only one country, can’t separately identify country and TTL effects

• Restrict attention to sample of 2407 projects where TTL has managed (i) more than one project, and (ii) in more than one country

– covers 136 countries and 710 TTLs

• For projects where we have “time series of TTLs” by ISR within projects, also identify “Initial” TTL, as distinct from “Final” TTL at time of final ISR

– look at subset of projects where “Initial” and “Final” TTL are different to separately identify “Initial” and “Final” TTL effects
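The sample restriction can be sketched as a simple filter. The tuple layout `(project_id, ttl_id, country)` and the example records are hypothetical.

```python
from collections import defaultdict


def identifying_sample(projects):
    """Keep projects whose TTL managed more than one project in more than one country.

    `projects` is a list of (project_id, ttl_id, country) tuples.
    """
    projects_by_ttl = defaultdict(set)
    countries_by_ttl = defaultdict(set)
    for project_id, ttl_id, country in projects:
        projects_by_ttl[ttl_id].add(project_id)
        countries_by_ttl[ttl_id].add(country)
    return [
        p for p in projects
        if len(projects_by_ttl[p[1]]) > 1 and len(countries_by_ttl[p[1]]) > 1
    ]


projects = [
    ("P1", "A", "Kenya"), ("P2", "A", "Ghana"),  # TTL A: two projects, two countries
    ("P3", "B", "Peru"), ("P4", "B", "Peru"),    # TTL B: two projects, one country
    ("P5", "C", "India"),                        # TTL C: a single project
]
kept = identifying_sample(projects)
print([p[0] for p in kept])
```

Only TTL A's projects survive, since TTL B never leaves one country and TTL C has a single project, so neither helps separate country effects from TTL effects.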

How Much Does TTL “Quality” Matter?

• Proxy for quality of TTL on a given project as average IEG rating on all other projects with same TTL

– only for projects with TTLs managing two or more projects

– variant 1: define quality as average IEG rating over previous projects managed by same TTL

– variant 2: define quality as weighted average (by number of ISRs) of all other projects the TTL was ever responsible for (not just at the end of project)

• TTL “turnover” is average number of TTLs per ISR

– median project lasts six years, has 12 ISRs, and 2 TTLs
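The baseline quality proxy is a leave-one-out average: for each project, the mean rating on the same TTL's other projects. A minimal sketch follows, with a hypothetical `(project_id, ttl_id, rating)` layout and made-up ratings.

```python
from collections import defaultdict


def ttl_quality(ratings):
    """Leave-one-out mean rating per project.

    `ratings` is a list of (project_id, ttl_id, rating) tuples. Projects whose
    TTL managed only one project get no quality measure.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for _project_id, ttl_id, rating in ratings:
        total[ttl_id] += rating
        count[ttl_id] += 1
    quality = {}
    for project_id, ttl_id, rating in ratings:
        if count[ttl_id] >= 2:
            # Exclude the project's own rating from the TTL's average.
            quality[project_id] = (total[ttl_id] - rating) / (count[ttl_id] - 1)
    return quality


ratings = [("P1", "A", 5), ("P2", "A", 3), ("P3", "A", 4), ("P4", "B", 6)]
quality = ttl_quality(ratings)
print(quality)
```

Leaving out the project's own rating avoids a mechanical correlation between the proxy and the outcome it is meant to explain.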

Results: TTL Quality and Project Outcomes


All Projects Evaluated 1995-2011

                                         (1)         (2)         (3)         (5)         (6)         (7)
                                         1-6 Rating  1-6 Rating  1-6 Rating  1-6 Rating  1-6 Rating  1-6 Rating

CPIA Rating                              0.539***    0.542***    0.413***    0.471***    0.458***    0.317***
                                         (10.63)     (8.88)      (6.73)      (7.84)      (7.39)      (3.21)
TTL Quality (Average Outcome on          0.180***                            0.167***    0.131***    0.0969**
all Other Projects)                      (6.29)                              (5.19)      (3.99)      (2.42)
TTL Quality (Average Outcome on all                  0.155***
Previous Projects)                                   (5.31)
TTL Quality (ISR-Weighted Average                                0.188***
Outcome on all Other Projects)                                   (4.32)
TTL Turnover (Number of TTLs per ISR)                                        -1.282***               -1.672***
                                                                             (-6.08)                 (-4.79)
Evaluator "Toughness" (Average                                                           0.271***    0.0660
Outcome of all Other Projects Rated                                                      (3.50)      (0.77)
By Same Evaluator)

Number of Observations                   2407        1707        1783        1895        1672        1063
R-Squared                                0.084       0.082       0.049       0.089       0.059       0.227


• TTL quality is highly significant with economically large effects, e.g. consider move from P25 to P75 of:

– TTL Quality: 3.5→4.75, IEG score ↑ by 0.23

– CPIA Score: 3.1→3.6, IEG score ↑ by 0.22

– Alternative quality measures have similarly large effects

• TTL turnover is highly significant – moving from 2/12 TTLs per ISR to 3/12 TTLs per ISR implies IEG score ↓ by 0.10

– but need to be cautious about endogeneity of TTL turnover

– much more to be done here, e.g. to better understand costs and benefits of 3-5-7 rule
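These magnitudes are a coefficient multiplied by the change in the regressor. The quick check below uses the TTL quality coefficient (0.180) and the TTL turnover coefficient (-1.282) from the results table. Pairing those two coefficients with these bullets is an assumption, and `implied_change` is a hypothetical helper.

```python
def implied_change(coefficient, low, high):
    """Predicted change in the outcome when the regressor moves from low to high."""
    return coefficient * (high - low)


# TTL quality: move from the 25th to the 75th percentile.
ttl_quality_effect = implied_change(0.180, 3.5, 4.75)

# TTL turnover: from 2 TTLs per 12 ISRs to 3 TTLs per 12 ISRs.
turnover_effect = implied_change(-1.282, 2 / 12, 3 / 12)

print(f"TTL quality: {ttl_quality_effect:+.3f}, turnover: {turnover_effect:+.3f}")
```

The turnover coefficient is negative, so one more TTL over twelve ISRs goes with an IEG score about 0.1 lower.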


• So far have focused on TTL effects – but could very well also be evaluator effects

– are there “tough” and “easy” evaluators?

– how do they match to TTLs?

• Two data sources on evaluator identity

– anonymized data from IEG on staff who do desk reviews of ICRs, for each project since 1995

– manually (!) collected data on TTL for 1150 Project Performance Audit Reports since 1995

• Some evidence of evaluator effects, but:

– does not undermine significance of TTL effects

– does not survive addition of other controls (likely reflects sectoral specialization of reviewers?)


• Evidence suggests there is a quantitatively-important “human factor” in project outcomes

• But much more needs to be done:

– are there common attributes to TTLs who have a track record of successful projects?

– are there endogeneity problems in the “assignment” of TTLs to projects?

– do higher levels of management matter?

– are there other dimensions, such as counterpart quality, that matter as well?

Policy Implications

• Country-level policies and institutions do matter a lot for project outcomes

– don’t throw out baby with bathwater!

– (one more) piece of support for donor policies targeting aid to countries with better policy

– but at most this can help us with 20% of variation in project outcomes that occurs across countries

Policy Implications, Cont’d

• The 80% of variation in project outcomes within countries challenges us to think hard about how to improve project success within countries, e.g.

– why are problem projects hard to turn around, or cancel outright once warning signs emerge?

– is there scope for project- as well as country-level aid allocation mechanisms to ensure better outcomes?

• e.g. what if WB were to allocate some resources to “proposals” submitted by TTLs?

– analogous to NSF (or KCP) proposals to obtain research grants

– criteria for judging proposals could be tailored to reflect country and TTL characteristics

– how can we better learn about the effectiveness of Bank inputs into project outcomes?

Pipeline

• Many more interesting questions to be answered using this kind of project data

– some preliminary evidence that projects managed by “decentralized” TTLs located in country of project do better

– assembling TTL-VPU assignment data to see if “3-5-7”-induced TTL turnover matters for project outcomes

– working with colleagues at AfDB and AsDB to assemble similar data for their projects

– and much more....suggestions welcome!
