Implications of complication and complexity for evaluation
Patricia J. Rogers
CIRCLE (Collaboration for Interdisciplinary Research, Consulting and Learning in Evaluation), Royal Melbourne Institute of Technology, Australia
Patricia.Rogers@rmit.edu.au
Evaluation Revisited: Improving the quality of evaluative practice by embracing complexity
Utrecht, the Netherlands, May 20-21 2010

The naïve experimentalism view of evaluation and evidence-based policy and practice
[Diagram: researchers find that thing 'A' works (in a single study or in several studies) → policymakers decide to do thing 'A' → practitioners do thing 'A' → intended beneficiaries benefit as expected]

But things are often more complicated or complex than this …

What can (and does) go wrong with naïve experimentalism
[Diagram: the same chain, annotated with what can go wrong at each step – random error; narrow studies that ignore important evidence; negative effects ignored; misrepresentation of results; differential effects (thing 'A' only works in some contexts); not feasible in other locations; not scalable]

An alternative view of knowledge-building

An approach to evaluation and evidence-based policy and practice that recognizes the complicated and complex aspects of situations and interventions
[Diagram: researchers and evaluators, policymakers, practitioners and managers, and community and civil society working together on the questions: What is needed? What is possible? What works? What works for whom in what situations? What is working?]

Advocacy for RCTs (Randomised Controlled Trials) in development evaluation
[Timeline of milestones in 2003, 2006, 2009 and 2010:]
• "J-PAL is best understood as a network of affiliated researchers … united by their use of the randomized trial methodology"
• Advocated more use of RCTs
• A TED talk used leeches to illustrate the alternative to using RCTs as evidence
• Argued that experimental and quasi-experimental designs had a comparative advantage because they provide an unbiased numeric estimate of impact

Distinguishing between RCTs and naïve experimentalism
RCT (Randomised Controlled Trial)
– one of many research designs that can be suitable
– involves randomly assigning (truly randomly, not ad hoc) potential participants either to receive the treatment (or one of several versions of the treatment) or to be in the control group (who might receive nothing or the current standard treatment)
– in 'double blind' RCTs neither the participants nor the researchers know who is in the treatment group (eg the control group gets pills that look the same, and the details of the groups are kept secret until after the results are recorded)
Naïve experimentalism
– believes that RCTs always provide the best evidence (the 'gold standard' approach)
– ignores (or is ignorant of) the potential risks in using RCTs and the other approaches that can be appropriate
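To make the distinction between truly random and ad hoc assignment concrete, here is a minimal sketch in Python; the participant identifiers, group sizes and seed are hypothetical, invented purely for illustration:

import random

# Hypothetical pool of potential participants.
participants = [f"participant_{i}" for i in range(200)]

rng = random.Random(42)      # fixed seed so this example allocation is reproducible
shuffled = list(participants)
rng.shuffle(shuffled)        # truly random: every participant has the same chance of either group

treatment = shuffled[:100]   # receive the treatment (or one of several versions of it)
control = shuffled[100:]     # receive nothing, or the current standard treatment

# In a 'double blind' trial this allocation list would be concealed from both
# participants and researchers until after the outcomes have been recorded.
print(len(treatment), "assigned to treatment;", len(control), "assigned to control")

Ad hoc allocation, by contrast (for example, treating whoever arrives first or whoever seems most in need), lets other factors influence who gets the treatment, so differences in outcomes can no longer be attributed to the treatment alone.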
Exploring complication and complexity in evaluation
[Timeline of publications: 1997, 2006, 2008, 2008, 2009, 2010]

Some unhelpful ways 'complex' is used
• Difficult – eg little available data, hard to get additional data
• Beyond scrutiny – eg too technical for others to understand or challenge
• Ad hoc – eg too overwhelmed with implementation to think about planning or follow-through

Two framings of simple, complicated and complex
Simple
– Glouberman and Zimmerman 2002: tested 'recipes' assure replicability; expertise is not needed
– Kurtz and Snowden 2003: the domain of the 'known' – cause and effect are well understood, and best practices can be confidently recommended
Complicated
– Glouberman and Zimmerman 2002: success requires a high level of expertise in many specialized fields, plus coordination
– Kurtz and Snowden 2003: the domain of the 'knowable' – expert knowledge is required
Complex
– Glouberman and Zimmerman 2002: every situation is unique – previous success does not guarantee success; expertise can help but is not sufficient; relationships are key
– Kurtz and Snowden 2003: the domain of the 'unknowable' – patterns are only evident in retrospect

Using the framework
• Can be used to refer to a situation or to an intervention
• Not useful as a way of classifying the whole situation or intervention – most useful for considering aspects of interventions
• Not normative – complex is not better than simple; simple interventions can still be difficult to do well, or to get good data about

Simple can sometimes be appropriate
"It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience." – Albert Einstein, Oxford University, 1933
"Everything should be made as simple as possible, but no simpler."

Implications of complicated and complex situations and interventions for evaluation
1. Focus
2. Governance
3. Consistency
4. Necessariness
5. Sufficiency
6. Change trajectory
7. Unintended outcomes
(Funnell and Rogers 2010, Purposeful Program Theory. Jossey-Bass)

1. Focus – implications for evaluation?
Simple: a single set of objectives
Complicated: different objectives valued by different stakeholders; multiple, competing imperatives; objectives at multiple levels of a system
Complex: emergent objectives
(Funnell and Rogers 2010)

Focus – objectives at multiple levels of a system
[Diagram: an intervention involves activities at system, site and client level; each leads to shorter term outcomes at that level, which together contribute to longer term outcomes]
(Funnell and Rogers 2010)

2. Governance – implications for evaluation?
Simple: a single organization
Complicated: specific organizations with formalized requirements
Complex: emergent organizations working together in flexible ways
(Funnell and Rogers 2010)

3. Consistency – implications for evaluation?
Simple: standardized
Complicated: adapted
Complex: adaptive
(Funnell and Rogers 2010)

What interventions look like – teaching reading
Simple (best practice): teachers select a reading program which has been shown in RCTs to be effective (eg the Reading First program, $1b p.a.)
Complicated (adapted): teachers identify children's learning stage and provide exercises to match it (eg the Victorian Catholic Education System's Literacy Assessment Project)
Griffin, P. (2009) 'Ambitious new project to raise literacy and numeracy levels in Victorian schools'. http://newsroom.melbourne.edu/studio/ep-29
Griffin, P., Murray, L., Care, E., Thomas, A., & Perri, P. (2009). Developmental Assessment: Lifting Literacy through Professional Learning Teams. Assessment in Education, in press.

What interventions look like – supporting small businesses
Complicated (what are the 'active ingredients'?): an RCT compares the effect on small businesses of providing (i) business training, (ii) a savings incentive, (iii) wages support, (iv) business training and a savings incentive, (v) business training and wages support, or (vi) a savings incentive and wages support (McKenzie, 2010)
Complex (adaptive): a program works with small businesses to iteratively identify what they need, and meets this need
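As a rough illustration of how the factorial comparison described under 'Complicated' above separates out the contribution of each 'active ingredient', here is a minimal sketch in Python; the outcome measure, sample size and effect sizes are invented for illustration and are not McKenzie's data:

import itertools
import random

rng = random.Random(0)
components = ["business_training", "savings_incentive", "wages_support"]

# For simplicity this sketch uses a full 2x2x2 factorial (all eight combinations,
# including 'none' and 'all three') rather than only the six arms listed on the
# slide, so that the simple with/without comparison below is balanced.
arms = [set(combo) for r in range(4) for combo in itertools.combinations(components, r)]

def observed_outcome(arm):
    # Hypothetical outcome model: a baseline, an invented effect for each component
    # received, and noise. In a real study this would be the measured business outcome.
    effects = {"business_training": 5.0, "savings_incentive": 2.0, "wages_support": 1.0}
    return 100.0 + sum(effects[c] for c in arm) + rng.gauss(0, 3)

# Randomly assign 800 hypothetical businesses evenly across the eight arms.
assignments = arms * 100
rng.shuffle(assignments)
outcomes = [observed_outcome(arm) for arm in assignments]

# Estimate each component's contribution: mean outcome with it minus mean outcome without it.
for c in components:
    with_c = [y for arm, y in zip(assignments, outcomes) if c in arm]
    without_c = [y for arm, y in zip(assignments, outcomes) if c not in arm]
    estimate = sum(with_c) / len(with_c) - sum(without_c) / len(without_c)
    print(f"{c}: estimated contribution ~ {estimate:+.1f}")

Because every combination is represented, the same data can also be used to look for interactions, for example whether business training only pays off when a savings incentive is also provided.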
4. Necessariness – implications for evaluation?
Simple: the only way to achieve the intended impacts
Complicated: one of several ways to achieve the intended impacts – which can be identified in advance
Complex: one of several ways to achieve the intended impacts – which are only evident in retrospect
(Funnell and Rogers 2010)

Necessariness – with/without comparisons
'A US program to assist poor families through linking them to services found that families receiving the program experienced improvements in welfare — but so did the families that were randomly assigned to a control group that did not receive the visits (St. Pierre and Layzer 1999). [As this case shows], a good study helps avoid spending funds on ineffective programs and redirects attention to improving designs or to more promising alternatives.' (When Will We Ever Learn?)
But families in the control group had also accessed services. The appropriate comparison would have been to compare the costs incurred in the different groups.
St. Pierre et al. (1996) Report on the National Evaluation of the Comprehensive Child Development Program. Summary and links to reports available at http://www.researchforum.org/project_abstract_166.html

5. Sufficiency – implications for evaluation?
Simple: sufficient to produce the intended impacts; works the same for everyone
Complicated: only works in conjunction with other interventions (previously, concurrently, or subsequently), and/or only works for some people, and/or only works in some circumstances – which can be identified in advance
Complex: only works in conjunction with other interventions (previously, concurrently, or subsequently), and/or only works for some people, and/or only works in some circumstances – which is only evident in retrospect
(Funnell and Rogers 2010)

False negatives – the potted plant thought experiment
If 200 potted plants are randomly assigned either to a treatment group that receives daily water or to a control group that receives none, and both groups are placed in a dark cupboard, the treatment group does not have better outcomes than the control.
Possible conclusion: watering plants is ineffective in making them grow.
Better conclusion: water is not sufficient.

False positives – Early Head Start
• The Early Head Start program was effective on average, and is listed as an 'evidence-based program'
• But it had unfavourable outcomes for children in families with high levels of demographic risk factors (Mathematica Policy Research Inc 2002; Westhorp 2008)
Westhorp, G. (2008) Development of Realist Evaluation Methods for Small Scale Community Based Settings. Unpublished PhD thesis, Nottingham Trent University.
Mathematica Policy Research Inc (2002). Making a Difference in the Lives of Infants and Toddlers and Their Families: The Impacts of Early Head Start, Vol. 1. US Department of Health and Human Services.

6. Change trajectory – implications for evaluation?
Simple: a simple relationship – readily understood
Complicated: a complicated relationship – needs expertise to understand and predict
Complex: a complex relationship (including tipping points) – cannot be predicted, only understood in retrospect
(Funnell and Rogers 2010)

Complicated dose-response relationship – does stress improve performance?
[Chart: dose-response relationship between stress and performance]
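To illustrate why a relationship like this is easy to miss with a simple linear summary, here is a minimal sketch in Python; it assumes an inverted-U between stress and performance, and the curve, noise level and bands are invented for illustration:

import random

rng = random.Random(1)

def performance(stress):
    # Assumed inverted-U: performance peaks at moderate stress (stress measured 0 to 10).
    return 10 - 0.4 * (stress - 5) ** 2 + rng.gauss(0, 0.5)

stress_levels = [rng.uniform(0, 10) for _ in range(500)]
scores = [performance(s) for s in stress_levels]

# A simple linear summary: the correlation between stress and performance.
n = len(scores)
mean_s = sum(stress_levels) / n
mean_p = sum(scores) / n
cov = sum((s - mean_s) * (p - mean_p) for s, p in zip(stress_levels, scores)) / n
var_s = sum((s - mean_s) ** 2 for s in stress_levels) / n
var_p = sum((p - mean_p) ** 2 for p in scores) / n
print("linear correlation ~", round(cov / (var_s * var_p) ** 0.5, 2))  # close to zero

# Looking within bands of stress reveals the pattern the linear summary hides.
for lo, hi in [(0.0, 3.3), (3.3, 6.6), (6.6, 10.0)]:
    band = [p for s, p in zip(stress_levels, scores) if lo <= s < hi]
    print(f"stress {lo:.1f}-{hi:.1f}: mean performance ~ {sum(band) / len(band):.1f}")

A linear summary reports almost no relationship, while the banded means show performance peaking at moderate stress: the relationship needs expertise, or the right model, to detect and describe.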
7. Unintended outcomes – implications for evaluation?
Simple: unintended outcomes can be anticipated and monitored
Complicated: different unintended outcomes are likely in particular combinations of circumstances – expertise is needed to anticipate and identify them
Complex: unintended outcomes cannot be anticipated, but only identified (and addressed) as they emerge or in retrospect
(Funnell and Rogers 2010)

Some thoughts on how evaluation might help us to understand the complicated and the complex
Issues that may need to be addressed:
1. Focus
2. Governance
3. Consistency
4. Necessariness
5. Sufficiency
6. Change trajectory
7. Unintended outcomes
Possible evaluation methods, approaches and methodologies:
• Emergent evaluation design that can accommodate emergent program objectives and emergent evaluation issues
• Collaborative evaluation across different stakeholders and organisations
• Non-experimental approaches to causal attribution/contribution that don't rely on a standardized 'treatment'
• Realist evaluation that pays attention to the contexts in which causal mechanisms operate
• Realist synthesis that can integrate diverse evidence (including credible single case studies) in different contexts
• 'Butterfly nets' to catch unanticipated results

Looking forward to hearing about your approaches to addressing these issues in evaluation