Quality Impact Evaluation: an introductory workshop
Howard White
International Initiative for Impact Evaluation (3ie)
www.3ieimpact.org

PART I: INTRODUCTION TO IMPACT EVALUATION

What is impact and why does it matter?
• Write down a definition of impact evaluation
• Impact = the (outcome) indicator with the intervention compared to what it would have been in the absence of the intervention
• Impact evaluation is the only meaningful way to measure results

Why results?
• The results agenda dates from the early 1990s
• USAID experience
• A move from outcome monitoring, to impact evaluation, to evidence-based development

What is evidence-based development?
Allocating resources to programs designed and implemented on the basis of evidence of what works and what doesn't

Why did the Bangladesh Integrated Nutrition Project (BINP) fail?

Comparison of impact estimates

The theory of change
• The target group (mothers of young children) participate in the program
• The target group for nutritional counselling is the relevant one
• Exposure to nutritional counselling results in knowledge acquisition and behaviour change
• Knowledge is acquired and used, and behaviour change is sufficient to change child nutrition
• The right children are correctly identified and enrolled in the program
• Food is delivered to those enrolled
• Food is of sufficient quantity and quality
• Supplementary feeding is supplemental, i.e. no leakage or substitution
→ Improved nutritional outcomes

Participation rates
[Figure: probability of participation in growth monitoring under different household characteristics – base value; living in Rajnagar or Shahrasti; living with mother-in-law in Rajnagar or Shahrasti; higher education; no water or sanitation (remote location)]

The attribution problem: factual and counterfactual

Impact varies over time… and is it sustainable?

What has been the impact of the French Revolution?
"It is too early to say" – Zhou Enlai

Where does the counterfactual come from?
• Most usual is to use a comparison group of similar people / households / schools / firms…

What do we need to measure impact?
Girls' secondary enrolment
                      Before   After
Project (treatment)      –       92
Comparison               –        –
The majority of evaluations have just this information… which means we can say absolutely nothing about impact.

Before versus after (single difference) comparison
Before versus after = 92 − 40 = 52
                      Before   After
Project (treatment)     40       92
Comparison               –        –
"Scholarships have led to rising schooling of young girls in the project villages."
This 'before versus after' approach is outcome monitoring, which has become popular recently. Outcome monitoring has its place, but it is not impact evaluation.

[Figure: rates of completion of elementary school among male and female students in all rural China's poor areas – share of rural children, girls and boys, 1993 vs 2008]

Post-treatment comparison
Single difference = 92 − 84 = 8
                      Before   After
Project (treatment)      –       92
Comparison               –       84
But we don't know if the two groups were similar before… though there are ways of checking this (statistical matching = quasi-experimental approaches).

Double difference
Double difference = (92 − 40) − (84 − 26) = 52 − 58 = −6
                      Before   After
Project (treatment)     40       92
Comparison              26       84
Conclusion: longitudinal (panel) data, with a comparison group, allow for the strongest impact evaluation design (though matching is still needed). SO WE NEED BASELINE DATA FROM PROJECT AND COMPARISON AREAS.

Exercise
• What is the objective of your intervention?
• Define up to three main outcome indicators for your intervention
• Using hypothetical outcome data for one indicator, write down the before/after, comparison/treatment matrix and calculate the impact estimates:
  – Ex-post single difference
  – Before versus after (single difference)
  – Double difference

              Before   After
  Project       __       __
  Comparison    __       __

Small n versus large n evaluation designs

Main points so far
• Analysis of impact implies a counterfactual comparison
• Outcome monitoring is a factual analysis, and so cannot tell us about impact
• The counterfactual is most commonly constructed using a comparison group
If you are going to do impact evaluation you need a credible counterfactual using a comparison group – VERY PREFERABLY WITH BASELINE DATA

However…
• This applies to 'large n' interventions
  – There are a large number of units of intervention, e.g. children, households, firms, schools
  – Examples of small n are policy reform and many (but not all) capacity building projects
  – Some reforms (e.g. health insurance) can be given large n designs
• 'Small n' interventions require either
  – Modelling (computable general equilibrium, CGE, models), e.g. for trade and fiscal policy
  – Qualitative approaches, e.g. the impact of impact assessments
• A theory-based large n study may have elements of small n analysis at some stages of the causal chain

Thoughts on small n
• Identify a theory of change reflecting multiple players and channels of influence
• Stakeholder mapping
• Avoiding leading questions
• Looking for footprints

Example: channels for donor influence
• Direct influence on government
  – Formal channels: annual aid negotiations; CG meetings; local consultative groups
  – Semi-formal channels: direct high-level communication to the recipient head of state; margins of CG meetings; impact of donor-financed TC
  – Informal channels: direct communication from Minister to head of IFI; informal contacts with IFI and own staff; 'gentleman's agreements'; social contacts in country
• Indirect influence via the IFIs: Board; SPA; CG meetings
• Indirect influence via other agencies: like-minded groups; margins of CG/SPA meetings; joint programs; leadership by example

Exercise
• Which elements of your intervention are amenable to a large n impact evaluation design, and which to a small n design?
• Are there any bits left?
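The single and double difference calculations from the scholarship example in Part I can be sketched in a few lines. This is a minimal illustration using the hypothetical enrolment figures from the slides, not a full estimator:

```python
# Hypothetical girls' secondary enrolment from the scholarship example.
treatment = {"before": 40, "after": 92}
comparison = {"before": 26, "after": 84}

# Before-versus-after (outcome monitoring): ignores the comparison group.
before_after = treatment["after"] - treatment["before"]  # 92 - 40 = 52

# Ex-post single difference: ignores pre-existing differences between groups.
single_diff = treatment["after"] - comparison["after"]   # 92 - 84 = 8

# Double difference: change in the treatment group minus change in the
# comparison group, netting out both starting levels and the common trend.
double_diff = (treatment["after"] - treatment["before"]) - (
    comparison["after"] - comparison["before"]
)                                                        # 52 - 58 = -6

print(before_after, single_diff, double_diff)
```

The point of the sketch is that the three estimators give very different answers (52, 8 and −6) from the same four numbers, which is why baseline data for both project and comparison areas matter.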
Problems in implementing rigorous impact evaluation: selecting a comparison group
• Contagion: other interventions
• Spillover effects: the comparison group is affected by the intervention
• Selection bias: beneficiaries are different
• Ethical and political considerations

The problem of selection bias
• Program participants are not chosen at random, but selected through
  – Program placement
  – Self-selection
• This is a problem if the correlates of selection are also correlated with the outcomes of interest, since those participating would do better (or worse) than others regardless of the intervention

Selection bias from program placement
• A program of school improvements is targeted at the poorest schools
• Since these schools are in poorer areas, students are likely to have home and parental characteristics that are associated with lower learning outcomes (e.g. illiteracy, no electricity, child labour)
• Hence learning outcomes in project schools will be lower than the average for other schools
• The comparison group has to be drawn from schools in similarly deprived areas

Selection bias from self-selection
• A community fund is available for community-identified projects
• An intended outcome is to build social capital for future community development activities
• But those communities with higher degrees of cohesion and social organization (i.e. social capital) are more likely to be able to make proposals for financing
• Hence social capital is higher amongst beneficiary communities than non-beneficiaries regardless of the intervention, so a comparison between these two groups will overstate program impact

Examples of selection bias
• Hospital delivery in Bangladesh (0.115 vs 0.067)
• Secondary education and teenage pregnancy in Zambia
• Male circumcision and HIV/AIDS in Africa

HIV/AIDS and circumcision: geographical overlay

Main point
There is 'selection' in who benefits from nearly all interventions. So we need a comparison group which has the same characteristics as those selected for the intervention.

Discussion
• Is selection bias likely for your intervention? Why, and how will it affect the attempt to measure impact?

Dealing with selection bias
• We need experimental or quasi-experimental methods to cope with this; this is what has been meant by rigorous impact evaluation
• Experimental: randomized control trials (RCTs), commonly used in agricultural research and medical trials, but more widely applicable
• Quasi-experimental:
  – Propensity score matching
  – Regression discontinuity
  – Pipeline approach
  – Regressions (including instrumental variables)

Randomization (RCTs)
• Randomization addresses the problem of selection bias by the random allocation of the treatment
• Randomization need not be at the same level as the unit of intervention
  – Randomize across schools but measure individual learning outcomes
  – Randomize across sub-districts but measure village-level outcomes
• The fewer the units over which you randomize, the higher your standard errors
• But you need to randomize across a 'reasonable number' of units
  – At least 30 for a simple randomized design (though possible imbalance is considered a problem for n < 200)
  – Can be as few as 10 for matched-pair randomization

Issues in randomization
• Randomize across the eligible population, not the whole population
• Can randomize across the pipeline
• It is no less ethical than any other method with a control group (perhaps more ethical), and any intervention which is not immediately universal in coverage has an untreated population to act as a potential control group
• No more costly than other survey-based approaches

Conducting an RCT
• Has to be an ex ante design
• Has to be politically feasible, with confidence that program managers will maintain the integrity of the design
• Perform a power calculation to determine sample size (and therefore cost)
• Adopt a strict randomization protocol
• Maintain information on how the randomization was done, refusals and 'cross-overs'
• A, B and A+B designs (factorial designs)
• Collect baseline data to:
  – Test the quality of the match
  – Conduct difference-in-difference analysis

Exercise
• Is any element of your intervention (or indirectly supported activities) amenable to randomization?
• What are the unit of assignment and the unit of analysis?
• Over how many units will you assign the treatment?
• What is the treatment? What is the control?

Quasi-experimental approaches
• Possible methods
  – Propensity score matching
  – Regression discontinuity
  – Instrumental variables
• Advantage: can be done ex post, and when random assignment is not possible
• Disadvantage: cannot be assured of the absence of selection bias

Propensity score matching
• An exact match would need someone with all the same age, education, religion etc.
• But matching on a single number, calculated as a weighted average of these characteristics, gives the same result as matching individually on every characteristic – this is the basis of propensity score matching
• The weights are given by the 'participation equation', that is, a probit equation for whether a person participates in the project or not

Propensity score matching: what you need
• Can be based on an ex post single difference, though double difference is better
• Need a common survey for treatment and potential comparison groups, or surveys with common sections for the matching variables and outcomes

Propensity score matching: example – water supply in Nepal
Variable                               Before matching                 After matching
Rural resident                         T: 29%  C: 78%                  T: 33%  C: 38%
Richest wealth quintile                T: 46%  C: 2%                   T: 39%  C: 36%
H/h higher education                   T: 21%  C: 4%                   T: 17%  C: 17%
Outcome (diarrhea incidence,           T: 18%  C: 23% (OR = 1.28)      T: 15%  C: 23% (OR = 1.53)
children < 2)

Regression discontinuity: an example – an agricultural input supply program

Naïve impact estimates
• Total = income(treatment) − income(comparison) = 9.6
• Agricultural households only = 7.7
• But there is a clear link between net income and landholdings
• And it turns out that the program targeted households with at least 1.5 ha of land (you can see this in the graph)
• So selection bias is a real issue: the treatment group would have been better off even in the absence of the program, so the single difference estimate is biased upward

Regression discontinuity
• Where there is a 'threshold allocation rule' for program participation, we can estimate impact by comparing outcomes for those just above and just below the threshold (as these groups are very similar)
• We can do that by estimating a regression with a dummy for the threshold value (and possibly also a slope dummy) – see graph
• In our case the impact estimate is 4.5, less than half of the naïve estimates

Exercise
• What design would you use to establish the impact of your intervention on the outcomes of interest?

PART II: THEORY-BASED IMPACT EVALUATION

"I think you should be more explicit here in Stage 2…"

The missing middle

Think about theory
[Figure: test scores versus pupil–teacher ratio]

Examples of 'atheoretical' IEs
• School capitation grant studies that don't ask how the money was used
• BCC intervention studies that don't ask if behaviour has changed (indeed, almost any study that does not capture behaviour change)
• Microfinance studies that don't look at use of funds and cash flow
• Studies of capacity development that don't ask if knowledge was acquired and used

Examples of theory-based impact evaluation
• Bangladesh Integrated Nutrition Project
• Orissa rural sanitation

Common questions in TBIE
• Targeting
  – Design: the right people?
  – Implementation: Type I and II errors
  – Evaluation questions: why is targeting failing? (protocols faulty, not being followed, corruption…)
• Training / capacity building
  – Design: the right people? Is it appropriate? Mechanisms to ensure skills are utilized
  – Implementation: quality of delivery; skills/knowledge acquired and used
  – Evaluation questions: have skills/knowledge changed? Are skills/knowledge being applied? Do they make a difference to outcomes?
• Intervention delivery
  – Design: tackling a binding constraint? Appropriate? Within local institutional capacity
  – Implementation: delivered as intended (protocols followed, no leakages, technology functioning and maintained)
  – Evaluation questions: what problems have been encountered in implementation? When did the first benefits start being realized? How is the intervention perceived by IA staff and beneficiaries?
• Behaviour change
  – Design: is the desired behaviour change culturally possible and appropriate; will it benefit the intended beneficiaries?
  – Implementation: is behaviour change being promoted as intended (right people, right message, right media)?
  – Evaluation questions: is behaviour change occurring? If not, why not?

The principles
• Map out the causal chain (programme theory): see the figure from the BINP example
• Understand context: Bangladesh is not Tamil Nadu
• Anticipate heterogeneity: more malnourished children; different implementing agencies
• Rigorous evaluation of impact using an appropriate counterfactual: PSM versus a simple comparison
• Rigorous factual analysis: targeting, the KP gap, CNPs
• Use mixed methods: informed by anthropology, focus groups, own field visits

Using theory to avoid IE
• Were all the other links in the causal chain in place and working?
• Example: sustainability of social fund investments, three aspects
  – Technical capacity
  – Institutional mechanisms
  – Financial

Exercise
• Map out the theory of change underlying your project
• What are the related evaluation questions?
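Returning to propensity score matching from Part I, the matching logic can be sketched as follows. This is an illustrative nearest-neighbour match on invented data: the households, characteristics and coefficient values are all assumptions made up for the example. In a real study the coefficients would come from estimating the probit (or logit) participation equation on survey data, not be written in by hand:

```python
import math

# Synthetic households: (participates, rural, rich, outcome). Invented data.
households = [
    (1, 0, 1, 15), (1, 0, 1, 18), (1, 1, 0, 20), (1, 0, 0, 17),
    (0, 1, 0, 25), (0, 1, 0, 23), (0, 0, 1, 19), (0, 0, 0, 22),
    (0, 1, 1, 21), (0, 0, 1, 16),
]

def pscore(rural, rich, b0=-0.2, b_rural=-1.5, b_rich=1.2):
    """Propensity score from an (assumed) logit participation equation."""
    z = b0 + b_rural * rural + b_rich * rich
    return 1 / (1 + math.exp(-z))

treated = [h for h in households if h[0] == 1]
control = [h for h in households if h[0] == 0]

# Nearest-neighbour matching on the propensity score (with replacement):
# each treated unit is compared with the control unit whose score is closest,
# rather than requiring an exact match on every characteristic.
effects = []
for (_, rural, rich, y) in treated:
    p = pscore(rural, rich)
    match = min(control, key=lambda c: abs(pscore(c[1], c[2]) - p))
    effects.append(y - match[3])

att = sum(effects) / len(effects)  # average treatment effect on the treated
print(f"Matched ATT estimate: {att:.2f}")
```

The single weighted index (the propensity score) stands in for matching on every characteristic separately, which is the point made on the slides.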
MIXED METHODS

Different parts of the causal chain require different analysis
• Factual versus counterfactual
• Examples of factual questions:
  – Use of funds
  – Targeting
  – Participatory processes
• Quantitative analysis (of different types) and qualitative analysis (which includes, but is not restricted to, participatory methods), and the combination of the two
• And cost data for cost-effectiveness or cost-benefit analysis

Examples from Andhra Pradesh self-help groups (SHGs)
• Number of SHGs and % penetration
• Drop-outs and corrupt practices
• The angry man
• Returns to cows and goats: quantitative ethnography

Why the angry man was angry
Loan allocation is to households, not individuals

More examples
• The disconnected in connected villages, pretty much everywhere
• The role of the community in social funds in Malawi and Zambia

[Figure: electrification rate by years since grid connection, all households versus poor households – 54% connect in the first year, another 10% connect in the next two years, then it takes 7 years for the next 10%; the poorest don't connect]

People participate in making bricks, not decisions

• Cost data, e.g. the Disease Control Priorities Project (DCPP) on $/DALY
• Cost-effectiveness ratios (CER) and cost-benefit analysis (CBA)
• Willingness to pay (WTP)

Also mix methods for the identification strategy

AP village fund allocation
• Fixed funds per community mean more households per SHG
• Lower membership rates in larger villages

The puzzle of the disconnected households in Laos
You can't carry electricity on boats

Mixing methods
• Understanding context to
  – Shape evaluation questions (empowering the poor = democratic IE)
  – Design data collection
• Mapping out the theory(ies) of change
• Addressing factual questions, leading to…
• …interpretation of counterfactual findings

General principle: the quality of data deteriorates the more formal the process of data collection

What do questionnaires miss?
• Consumption
  – Festivals
  – Labour exchange
  – Wild foods
• Net income from household enterprises
• Abuse of nearly all kinds
• Who is a household member?

Protein in Northern Zambia

What is to be done?
• Know what questions to ask and how
• Proxy measures
• Enumerator training
• Contrived informality

The challenge of integration
• Parallel studies, not integrated studies (multi-disciplinary, not inter-disciplinary)
• Why?
  – At best a silo mentality, at worst arrogance ("trust me, I'm an economist")
  – Academic incentives
  – People just don't know how to do it
• What to do?
  – Start with the theory of change
  – Detailed team discussions around the causal chain
  – Team members who bridge studies
  – Quality external peer review

Exercise
• What sort of data do you require to answer your evaluation questions, and how will you collect those data?
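The regression discontinuity logic from Part I (the agricultural input program with a 1.5 ha threshold) can be sketched on synthetic data. All the numbers below are invented for illustration; and where the slides estimate a regression with a threshold dummy, this dependency-free sketch instead compares mean outcomes in a narrow bandwidth either side of the cutoff, which captures the same idea with a little residual slope bias:

```python
import random

random.seed(1)

TRUE_EFFECT = 4.5   # assumed program effect, matching the slide's estimate
THRESHOLD = 1.5     # ha of land required for program participation

# Synthetic households: income rises with landholding, and only households
# at or above the threshold receive the program.
data = []
for _ in range(2000):
    land = random.uniform(0.5, 2.5)
    treated = land >= THRESHOLD
    income = 10 + 8 * land + (TRUE_EFFECT if treated else 0) + random.gauss(0, 1)
    data.append((land, treated, income))

# Naive estimate: treated households have more land and would earn more
# anyway, so this is biased upward.
t_inc = [y for land, t, y in data if t]
c_inc = [y for land, t, y in data if not t]
naive = sum(t_inc) / len(t_inc) - sum(c_inc) / len(c_inc)

# RD-style estimate: households just above and just below the cutoff have
# nearly identical landholdings, so their counterfactual incomes are similar.
h = 0.1  # bandwidth around the threshold
above = [y for land, t, y in data if THRESHOLD <= land < THRESHOLD + h]
below = [y for land, t, y in data if THRESHOLD - h <= land < THRESHOLD]
rd = sum(above) / len(above) - sum(below) / len(below)

print(f"naive: {naive:.1f}, RD: {rd:.1f}")
```

As in the slides, the naive difference of means is far larger than the discontinuity-based estimate, which sits close to the true effect.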
PART III: MANAGING IMPACT EVALUATIONS

When to do an impact evaluation
• Different stuff
  – Pilot programs
  – Innovative programs
  – New activity areas
• Established stuff
  – Representative programs
  – Important programs
• Look to fill gaps

What do IE managers need to know?
• Whether an IE is needed and viable
• Your role as champion
• The importance of ex ante designs with a baseline (building evaluation into design)
  – Funding issues
• The importance of a credible design with a strong team (and how to recognize one)
  – Help on design
• Ensure management feedback loops

Issues in managing IEs
• Different objective functions of managers and study teams
• Project management buy-in
• Trade-offs
  – On time
  – On richness of study design

Overview of data collection
• Baseline, midterm and endline
• Treatment and comparison
• Process data
• Capture contagion and spillovers
• Quant and qual
• Different levels (e.g. facility data, worker data) – link the data
• Multiple data sources

Data used in the BINP study
• Project evaluation data (three rounds)
• Save the Children evaluation
• Helen Keller nutritional surveillance survey
• DHS (one round)
• Project reports
• Anthropological studies of village life
• Action research (focus groups, CNP survey)

Some study costs
• IADB vocational training studies: US$20,000 each
• IEG BINP study: US$40,000–60,000
• IEG rural electrification study: US$120,000
• IEG Ghana education study: US$500,000
• Average 3ie study: US$300,000+
• Average 3ie study in Africa with two rounds of surveys: US$500,000+

Some timelines
• Ex post: 12–18 months
• Ex ante:
  – Lead time for survey design: 3–6 months
  – Post-survey to first impact estimates: 6–9 months
  – Report writing and consultation: 3–6 months

Budget and timeline
• Ex post or ex ante?
• Existing data or new data?
• How many rounds of data collection?
• How large is the sample?
• When is it sensible to estimate impact?

Exercise
• For your proposed impact evaluation, propose:
  – A management structure (quality assurance)
  – A timeline for the impact evaluation
  – A budget
AND THEN GROUP PRESENTATIONS OF NO MORE THAN FIVE MINUTES

Presentations and voting

Remember
• Results means impact
• But be selective
• Be issues-driven, not methods-driven
• Find the best available method for the evaluation questions at hand
• Randomization often is possible
• But do ask: is this sufficiently credible to be worth doing?
PLEASE COMPLETE YOUR EXIT SURVEY

Thank you
Visit www.3ieimpact.org