* What’s involved in “rigorous impact evaluation”? IOCE proposes more holistic perspectives
Presented by Jim Rugh to the NONIE Conference in Paris, 28 March 2011

Join me in a review of the basics:

* An introduction to various evaluation designs

Illustrating the need for a quasi-experimental longitudinal time-series evaluation design
[Figure: scale of a major impact indicator for project participants and a comparison group, observed at baseline, at the end-of-project evaluation and at a post-project evaluation.]

We will look at the designs one at a time, beginning with the most rigorous.

X = Intervention (treatment), i.e. what the project does in a community
O = Observation event (e.g. baseline, midterm evaluation, end-of-project evaluation)
P (top row): Project participants
C (bottom row): Comparison (control) group

Design #1: Longitudinal quasi-experimental
Project participants:  P1  X  P2  X  P3  P4
Comparison group:      C1     C2     C3  C4
Observations: baseline, midterm, end-of-project evaluation, post-project evaluation

Design #2: Quasi-experimental (pre+post, with comparison)
Project participants:  P1  X  P2
Comparison group:      C1     C2
Observations: baseline, end-of-project evaluation

Design #2+: Typical Randomized Control Trial
Project participants:  P1  X  P2
Control group:         C1     C2
Research subjects are randomly assigned either to the project or to the control group.
Observations: baseline, end-of-project evaluation

Design #3: Truncated QED
Project participants:  X  P1  X  P2
Comparison group:         C1     C2
Observations: midterm, end-of-project evaluation

Design #4: Pre+post of project; post-only comparison
Project participants:  P1  X  P2
Comparison group:             C
Observations: baseline, end-of-project evaluation

Design #5: Post-test only of project and comparison
Project participants:  X  P
Comparison group:         C
Observation: end-of-project evaluation

Design #6: Pre+post of project; no comparison
Project participants:  P1  X  P2
Observations: baseline, end-of-project evaluation

Design #7: Post-test only of project participants
Project participants:  X  P
Observation: end-of-project evaluation

Summary of the 7 designs:
Design | T1 (baseline) | X (intervention) | T2 (midterm) | X (intervention, cont.) | T3 (endline) | T4 (ex-post)
1      | P1 C1         | X                | P2 C2        | X                       | P3 C3        | P4 C4
2      | P1 C1         | X                |              | X                       | P2 C2        |
3      |               | X                | P1 C1        | X                       | P2 C2        |
4      | P1            | X                |              | X                       | P2 C2        |
5      |               | X                |              | X                       | P1 C1        |
6      | P1            | X                |              | X                       | P2           |
7      |               | X                |              | X                       | P1           |
Note: These 7 evaluation designs are described in the RealWorld Evaluation book.

What kinds of evaluation designs are actually used in the real world (of international development)? Findings from meta-evaluations of 336 evaluation reports of an INGO:
Post-test only: 59%
Before-and-after: 25%
With-and-without: 15%
Other counterfactual: 1%

Even proponents of RCTs have acknowledged that RCTs are appropriate for perhaps only 5% of development interventions. An empirical study by Forss and Bandstein, examining evaluations by bilateral and multilateral organisations in the OECD/DAC DEReC database, found that only 5% used even a counterfactual design.

While we recognize that experimental and quasi-experimental designs have a place in the toolkit for impact evaluations, we think that more attention needs to be paid to the roughly 95% of situations where these designs would not be possible or appropriate.

* One form of Program Theory (Logic) Model
[Figure: Design → Inputs → Implementation process → Outputs → Outcomes → Impacts, set within the economic context in which the project operates, the political context in which the project operates, the institutional and operational context, and the socio-economic and cultural characteristics of the affected populations.]
Note: Conventional Program Theory Models include only the results chain (the orange boxes in the original diagram). Adding the contextual factors (the blue boxes) provides the recommended, more complete analysis.
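For Design #2 (pre+post, with comparison), one common way to turn the four observations P1, P2, C1 and C2 into an impact estimate is a difference-in-differences calculation: the change among participants minus the change in the comparison group. The sketch below is not taken from the RealWorld Evaluation book; it assumes that, without the project, participants would have changed by the same amount as the comparison group (the "parallel trends" assumption), and the function names and numbers are hypothetical. The same arithmetic extends to each later observation in Design #1.

```python
# Minimal sketch (hypothetical names and data): difference-in-differences using
# the P/C observation notation of the evaluation designs. It assumes participants
# would have followed the comparison group's trend without the project.

def did_estimate(p_baseline, p_followup, c_baseline, c_followup):
    """Estimated project effect at one follow-up observation."""
    participant_change = p_followup - p_baseline      # observed change, e.g. P2 - P1
    counterfactual_change = c_followup - c_baseline   # assumed change without project, C2 - C1
    return participant_change - counterfactual_change


def longitudinal_estimates(p_series, c_series):
    """Design #1: one estimate per later observation, each relative to baseline.

    p_series = [P1, P2, ...] and c_series = [C1, C2, ...] are indicator means
    observed at the same points in time, baseline first.
    """
    if len(p_series) != len(c_series) or len(p_series) < 2:
        raise ValueError("need matching series with a baseline and at least one follow-up")
    p1, c1 = p_series[0], c_series[0]
    return [did_estimate(p1, p, c1, c) for p, c in zip(p_series[1:], c_series[1:])]


if __name__ == "__main__":
    # Hypothetical school-attendance rates (%) at baseline, midterm, endline, ex-post.
    p = [40, 55, 70, 68]
    c = [42, 46, 50, 52]
    print(longitudinal_estimates(p, c))            # [11, 22, 18]
    print(did_estimate(p[0], p[2], c[0], c[2]))    # 22 (Design #2: baseline vs endline)
    print(p[2] - p[0])                             # 30 (Design #6: no comparison group)
```

In this made-up example, the before-and-after change alone (30 points) would attribute to the project the 8-point improvement that the comparison group also experienced.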
[Figure: the Program Theory Model above, with Sustainability added.]

[Figure: a problem tree and the matching objectives tree. In the problem tree, a central PROBLEM (with its Consequences above it) has Primary causes 1 to 3; Primary cause 2 has Secondary causes 2.1 to 2.3, and Secondary cause 2.2 has Tertiary causes 2.2.1 to 2.2.3. The objectives tree mirrors it: the DESIRED IMPACT (with its Consequences) is reached through Outcomes 1 to 3, Outputs 2.1 to 2.3 under Outcome 2, and Interventions 2.2.1 to 2.2.3 under Output 2.2.]

[Figure: example problem tree for a high infant mortality rate, with causes including children being malnourished, insufficient food, poor quality of food, diarrheal disease, contaminated water, flies and rodents, unsanitary practices, people not using facilities correctly, people not washing hands before eating, and the need for improved health policies.]

[Figure: example objectives tree for a reduction in poverty, with results including women empowered, women in leadership roles, economic opportunities for women, young women educated, improved educational policies, parents persuaded to send girls to school, female enrollment rates increasing, curriculum improved, schools built, and the school system hiring and paying teachers.]

To have synergy and achieve impact, all of these need to address the same target population.
Program Goal (at impact level): Young women educated
• Advocacy Project Goal: Improved educational policies enacted (ASSUMPTION: others will do this)
• Construction Project Goal: More classrooms built (OUR project)
• Teacher Education Project Goal: Improve quality of curriculum (a PARTNER will do this)

We need to recognize which evaluative process is most appropriate for measurement at the various levels:
• Impact
• Outcomes
• Outputs
• Activities
• Inputs
[Figure: program evaluation, project evaluation and performance monitoring shown alongside these levels.]

[Table: The “Rosetta Stone of Logical Frameworks”, comparing the terms used for each level of the results hierarchy (from Goal, Ultimate Impact or Development Objective at the top, through Purpose, Outcomes and Intermediate Results, down to Outputs, Activities and Inputs) by a needs-based framework, the American Red Cross, AusAID, CARE (logframe and terminology), CIDA + GTZ, CRS (Proframe), DANIDA + DfID, EIDHR, the European Union, FAO + UNDP + NORAD, PC/LogFrame, the Peace Corps, SAVE, UNHCR and USAID (LogFrame and Results Framework).]

* How do we know if the observed changes in the project
participants’ or communities’ conditions (income, health, attitudes, school attendance, etc.) are due to the implementation of the project (credit, water supply, transport vouchers, school construction, etc.) or to other, unrelated factors (changes in the economy, demographic movements, other development programs, etc.)?

What change would have occurred in the relevant condition of the target population if there had been no intervention by this project?

* Control group = randomized allocation of subjects to the project and non-treatment groups.
Comparison group = a separate procedure for sampling project and non-treatment groups that are as similar as possible in all respects except the treatment (intervention).

[Figure: timeline (2003, 2006, 2008, 2009, 2010) including the quotation: “J-PAL is best understood as a network of affiliated researchers … united by their use of the randomized trial methodology…”]

So, are Randomized Control Trials (RCTs) the Gold Standard, and should they be used in most if not all program impact evaluations? Yes or no? Why or why not? If so, under what circumstances should they be used? If not, under what circumstances would they not be appropriate?

Question needed for evidence-based policy: What works?
What interventions look like: Discrete, standardized intervention
How interventions work: Pretty much the same everywhere
Process needed for evidence uptake: Knowledge transfer
(Adapted from Patricia Rogers, RMIT University)

• Complicated, complex programs where there are multiple interventions by multiple actors
• Projects working in evolving contexts (e.g.
countries in transition, conflicts, natural disasters)
• Projects with multiple layered logic models, or unclear cause-effect relationships between outputs and higher-level “vision statements” (as is often the case in the real world of international development projects)

There are other methods for assessing the counterfactual:
• Reliable secondary data that depict relevant trends in the population
• Longitudinal monitoring data (if they include the non-reached population)
• Qualitative methods to obtain the perspectives of key informants, participants, neighbors, etc.

A conventional statistical counterfactual (with random selection into treatment and control groups) is often not possible or appropriate:
• When conducting the evaluation of complex interventions
• When the project involves a number of interventions which may be used in different combinations in different locations
• When each project location is affected by a different set of contextual factors
• When it is not possible to use standard implementation procedures for all project locations
• When many outcomes involve complex behavioral changes
• When many outcomes are multidimensional or difficult to measure through standardized quantitative indicators

Some of the alternative approaches for constructing a counterfactual
A: Theory-based approaches
1. Program theory / logic models
2. Realistic evaluation
3. Process tracing
4. Venn diagrams and many other PRA methods
5. Historical methods
6. Forensic detective work
7. Compilation of a list of plausible alternative causes
8. … (for more details see www.RealWorldEvaluation.org)

Some of the alternative approaches for constructing a counterfactual
B: Quantitatively oriented approaches
1. Pipeline design
2. Natural variations
3. Creative uses of secondary data
4. Creative creation of comparison groups
5. Comparison with other programs
6. Comparing different types of interventions
7. Cohort analysis
8.
… (for more details see www.RealWorldEvaluation.org)

Some of the alternative approaches for constructing a counterfactual
C: Qualitatively oriented approaches
1. Concept mapping
2. Creative use of secondary data
3. Many PRA techniques
4. Process tracing
5. Compiling a book of possible causes
6. Comparisons between different projects
7. Comparisons among project locations with different combinations and levels of treatment
(for more details see www.RealWorldEvaluation.org)

* Different lenses needed for different situations in the RealWorld
Simple: Following a recipe. Recipes are tested to assure easy replication. The best recipes give good results every time.
Complicated: Sending a rocket to the moon. Sending one rocket to the moon increases assurance that the next will also be a success. There is a high degree of certainty of outcome.
Complex: Raising a child. Raising one child provides experience but is no guarantee of success with the next. Uncertainty of outcome remains.
Sources: Westley et al. (2006) and Stacey (2007), cited in Patton (2008); also presented by Patricia Rogers at the Cairo impact conference, 2009.

What’s a conscientious evaluator to do when facing such a complex world?
[Figure: the objectives tree (Desired impact, Outcomes 1 to 3, Outputs 2.1 to 2.3, Interventions 2.2.1 to 2.2.3), contrasting the narrow scope of “A Simple RCT” at the level of interventions and outputs with “A more comprehensive design” that reaches up to outcomes and impact.]

Expanding the results chain for a multi-donor, multi-component program:
Impacts: Increased rural H/H income; Increased political participation; Improved education performance; Improved health
Intermediate outcomes: Increased production; Access to off-farm employment; Increased school enrolment; Increased use of health services
Outputs: Credit for small farmers; Rural roads; Schools; Health services
Inputs: Donor; Government; Other donors
Attribution gets very difficult! Consider the plausible contributions each makes.
* OECD-DAC (2002: 24) defines impact as “the positive and negative, primary and secondary long-term effects produced by a development intervention, directly or indirectly, intended or unintended. These effects can be economic, sociocultural, institutional, environmental, technological or of other types”.
Is it limited to direct attribution? Or does it point to the need for counterfactuals or Randomized Control Trials (RCTs)?

1. A direct cause-effect relationship between one output (or a very limited number of outputs) and an outcome that can be measured by the end of the research project? Pretty clear attribution.
… OR …
2. Changes in higher-level indicators of sustainable improvement in the quality of life of people, e.g. the MDGs (Millennium Development Goals)? More significant, but assessing plausible contribution is more feasible than assessing unique direct attribution.

Rigorous impact evaluation should include (but is not limited to):
1) thorough consultation with, and involvement by, a variety of stakeholders,
2) articulating a comprehensive logic model that includes relevant external influences,
3) getting agreement on desirable ‘impact level’ goals and indicators,
4) adapting the evaluation design, as well as data collection and analysis methodologies, to respond to the questions being asked,
5) adequately monitoring and documenting the process throughout the life of the program being evaluated,
6) using an appropriate combination of methods to triangulate the evidence being collected,
7) being sufficiently flexible to account for evolving contexts,
8) using a variety of ways to determine the counterfactual,
9) estimating the potential sustainability of whatever changes have been observed,
10) communicating the findings to different audiences in useful ways,
11) etc.
… The point is that the list of what’s required for ‘rigorous’ impact evaluation goes way beyond initial randomization into treatment and ‘control’ groups.

To attempt to conduct an impact evaluation of a program using only one pre-determined tool is to suffer from myopia, which is unfortunate. On the other hand, prescribing to donors and senior managers of major agencies that there is a single preferred design and method for conducting all impact evaluations can have, and has had, unfortunate consequences for all those involved in the design, implementation and evaluation of international development programs.

We must be careful that in using the “Gold Standard” we do not violate the “Golden Rule”: “Judge not, that you not be judged!” In other words: “Evaluate others as you would have them evaluate you.”

Caution: Too often what is called Impact Evaluation is based on a “we will examine and judge you” paradigm. When we want our own programs evaluated, we prefer a more holistic approach.

To use the language of the OECD/DAC, let’s be sure our evaluations are consistent with these criteria:
RELEVANCE: The extent to which the aid activity is suited to the priorities and policies of the target group, recipient and donor.
EFFECTIVENESS: The extent to which an aid activity attains its objectives.
EFFICIENCY: Efficiency measures the outputs, qualitative and quantitative, in relation to the inputs.
IMPACT: The positive and negative changes produced by a development intervention, directly or indirectly, intended or unintended.
SUSTAINABILITY: Sustainability is concerned with measuring whether the benefits of an activity are likely to continue after donor funding has been withdrawn. Projects need to be environmentally as well as financially sustainable.

The bottom line is defined by this question: Are our programs making plausible contributions towards positive impact on the quality of life of our intended beneficiaries? Let’s not forget them!