Monitoring and Evaluation: Evaluation Designs

Objectives of the Session
By the end of this session, participants will be able to:
• Understand the purpose, strengths, and shortcomings of different study designs
• Distinguish between study designs that allow us to causally link program activities to observed changes and study designs that do not
• Link evaluation designs to the types of decisions that need to be made

Causality Requirements
• A precedes B.
• B is present only when A is present.
• We can rule out all other possible causes of B.

The Basic Experimental Principle
• The intervention is the only difference between the two groups.
• This is achieved by random assignment.

Class Activity
Can you name situations in which random assignment can be used in evaluation?

An Experimental Design
RA   Experimental group:  O1   X   O2
     Control group:       O3       O4
• In this design, there are two groups, an experimental group and a control group. Both are formed by random assignment (RA) and both complete the pre-test (O1, O3). Only the experimental group gets the intervention (X); then both groups complete the post-test (O2, O4).

Steps
1. Identify people or groups, some of whom could get the intervention.
2. Pre-test everyone.
3. Randomly assign participants to either the control group or the experimental group.
4. Deliver the intervention to the experimental group. The control group may receive an alternative intervention or nothing at all.
5. Post-test both groups with the same instrument under the same conditions.

Factors That May Lead Us to Make Invalid Conclusions
• Dropout: There may be loss to follow-up.
• Instrumentation effects: Occur when a questionnaire is changed between pre-test and post-test.
• Testing effects: Occur because study participants remember questions asked of them at pre-test and perform better at post-test because they are familiar with the questions.
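The random-assignment step in the design above can be sketched in a few lines of Python. This is only an illustration: the participant labels and the fixed seed are invented here, not part of the original design.

```python
import random

def randomly_assign(participants, seed=None):
    """Split participants into experimental and control groups at random.

    Shuffling the full list and splitting it in half gives every
    participant the same chance of landing in either group, so the
    intervention becomes the only systematic difference between groups.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]  # (experimental, control)

# Illustrative use: six hypothetical participants, fixed seed so the
# split is reproducible
experimental, control = randomly_assign(
    ["P1", "P2", "P3", "P4", "P5", "P6"], seed=1
)
print(experimental, control)
```

In a real trial the assignment list would be generated before the program starts and kept sealed, but the coin-flip logic is exactly this simple.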
A Second Experimental Design
RA   Experimental group:  X   O2
     Control group:           O4
• In this design, experimental and control groups are formed by random assignment; however, there is no pre-test. Instead, the experimental group gets the intervention, and then both groups are measured at the end of the program.

A Non-Experimental Design
Time →
Experimental group:  O1   X   O2
• In this design, only people who are participating in the program get the pre- and post-test.

Steps
1. Pre-test everyone in the program.
2. Deliver the intervention.
3. Post-test the same individuals.

This design does not provide any information about what results might have occurred without the program, and it is the weakest in terms of scientific rigor.

Another Factor That May Lead to Invalid Conclusions
• History effects: These occur when extraneous events (events that occur outside the study) influence study-measured outcomes.

A Second Non-Experimental Design
Time →
Experimental group:  O1   O2   O3   X   O4   O5   O6
• In this design, a survey is administered multiple times: before, during, and after the program.

Steps
1. Select a program-outcome measure that can be used repeatedly.
2. Decide who will be in the experimental group. Will it be the same group of people measured many times, or successive groups of different people?
3. Collect at least three measurements, made at regular intervals, before the intervention.
4. Check the implementation of the intervention.
5. Continue to collect measurements, at least through the duration of the program.

A Quasi-Experimental Design
Time →
Experimental group:  O1   X   O2
---------------------------------
Comparison group:    O3       O4

A Quasi-Experimental Design (Cont'd.)
• In this design, two groups that are similar, but that were not formed by random assignment, are measured both before and after one of the groups gets the program intervention.

Steps
1. Identify people who will be getting the program.
2. Identify people who are not getting the program but are in other ways very similar.
3. Pre-test both groups.
4. Deliver the intervention to the experimental group. The comparison group may receive an alternative intervention or nothing at all.
5. Post-test both groups.

Threat to Validity
• Selection effects: Occur when people selected for the comparison group differ from the experimental group.

Summary: Features of Different Study Designs

                 True experiment           Quasi-experiment             Non-experimental
Applicability    Partial-coverage/         Partial-coverage/            Full-coverage programs
                 new programs              new programs
Comparison       Control group             Comparison group             --
Rigor            Strongest design          Weaker than experimental     Weakest design
Cost             Most expensive            Moderately expensive         Least expensive

Summary: Features of Different Study Designs (Cont'd.)

I. Non-experimental (One-Group, Post-Only)
   IMPLEMENT PROGRAM
   ASSESS TARGET GROUP AFTER PROGRAM

II. Non-experimental (One-Group, Pre- and Post-Program)
   ASSESS TARGET GROUP BEFORE PROGRAM
   IMPLEMENT PROGRAM
   ASSESS TARGET GROUP AFTER PROGRAM

III. Experimental (Pre- and Post-Program with Control Group)
   RANDOMLY ASSIGN PEOPLE FROM THE SAME TARGET POPULATION TO GROUP A OR GROUP B
   TARGET GROUP A                          CONTROL GROUP B
   ASSESS TARGET GROUP A                   ASSESS CONTROL GROUP B
   IMPLEMENT PROGRAM WITH TARGET GROUP A
   ASSESS TARGET GROUP A                   ASSESS CONTROL GROUP B

IV. Quasi-Experimental (Pre- and Post-Program with Non-Randomized Comparison Group)
   ASSESS TARGET GROUP BEFORE PROGRAM      ASSESS COMPARISON GROUP BEFORE PROGRAM
   IMPLEMENT PROGRAM
   ASSESS TARGET GROUP AFTER PROGRAM       ASSESS COMPARISON GROUP AFTER PROGRAM

Summary: Features of Different Study Designs (Cont'd.)
• The different designs vary in their capacity to produce information that allows you to link program outcomes to program activities.
• The more confident you want to be about making these connections, the more rigorous the design and the more costly the evaluation.
• Your evaluator will help determine which design will maximize your program's resources and answer your team's evaluation questions with the greatest degree of certainty.

Important Issues to Consider When Choosing a Design
• Complex evaluation designs are more costly, but they allow for greater confidence in a study's findings.
• Complex evaluation designs are more difficult to implement, and so they require higher levels of expertise in research methods and analysis.
• Be prepared to encounter stakeholder resistance to the use of comparison or control groups, such as a parent wondering why his or her child will not receive a potentially beneficial intervention.
• No evaluation design is immune to threats to its validity; there is a long list of possible complications associated with any evaluation study. However, your evaluator will help you maximize the quality of your evaluation study.

Exercise
• A maternity hospital wishes to determine whether the offer of post-partum family-planning methods will increase contraceptive use among women who deliver at the hospital.
• What study design would you recommend to test the hypothesis that women who are offered post-partum family-planning services are more likely to use family planning than women who are not offered services?

Exercise
• You have been asked to evaluate the impact of a national mass-media AIDS-prevention campaign on condom use.
• What study design would you choose, and why?

Linking Evaluation Design to Decision-Making

Deciding Upon an Appropriate Evaluation Design
• Indicators: What do you want to measure?
  – Provision
  – Utilization
  – Coverage
  – Impact
• Type of inference: How sure do you want to be?
  – Adequacy
  – Plausibility
  – Probability
• Other factors
Source: Habicht, Victora, and Vaughan (1999)

Clarification of Terms: Types of Evaluation

Performance evaluation
  Provision (process)   Are the services available? Are they accessible? Is their quality adequate?
  Utilization           Are the services being used?
  Coverage              Is the target population being reached?
Impact evaluation
  Impact                Were there improvements in disease patterns or health-related behaviors?

Clarification of Terms: Types of Inference

Adequacy assessment
• Did the expected changes occur? Are objectives being met? Were activities performed as planned?
• May or may not require a before/after comparison; does not require controls.

Plausibility assessment
• Did the program seem to have an effect above and beyond other external influences?
• Requires a before-and-after comparison with controls and treatment of confounding factors.

Probability assessment
• Did the program have an effect (P < x%)?
• Determines the statistical probability that the intervention caused the effect.
• Requires a before/after comparison with randomized controls.

Adequacy Assessment
• Adequacy studies only describe whether a condition is met or not.
  – They typically address provision, utilization, or coverage; no controls or pre/post data are needed in such cases.
  – Hypothesis tested: Are expected levels achieved?
• They can also answer questions of impact (magnitude of change), provided pre/post data are available.
  – Hypothesis tested: The difference is equal to or greater than expected.

Features of Adequacy Assessment
• Simplest (and cheapest) of the evaluation models, as it does not try to control for external effects. Data are needed only for outcomes.
• If only input or output results are needed, then the lack of controls is not a problem.
• When measuring impact, however, it is not possible to infer that the change is due to the program, because there are no controls.
• Also, if there is no change, it will not be possible to say whether the lack of change is due to program inefficiency or whether the program has prevented a further deterioration.

Class Activity
For each of the following outcomes of interest, provide indicators that would be useful in evaluating a program for the control of diarrheal diseases aimed at young children, with emphasis on the promotion of oral rehydration salts (ORS):
– Provision: Are the services available? Are services accessible? Is their quality adequate?
– Utilization: Are the services being used?
– Coverage: Is the target population being reached?
– Impact: Were there improvements in disease patterns or health behaviors?

Adequacy Assessment Inferences
• Are objectives being met?
  – Compares program performance with previously established adequacy criteria (e.g., an 80% ORT-use rate)
  – No control group
  – Two or more measurements to assess the adequacy of change over time
• Provision, utilization, coverage
  – Are activities being performed as planned?
• Impact
  – Are observed changes in health or behavior of the expected direction and magnitude?
• Cross-sectional or longitudinal
Source: Habicht, Victora, and Vaughan (1999)

Class Activity
• What are the advantages of adequacy evaluations?
• What are the limitations of adequacy evaluations?
• If an adequacy evaluation shows a lack of change in indicators, how can this be interpreted?
• Which of the study designs discussed earlier can be used for adequacy evaluations?
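The adequacy inference discussed above reduces to comparing an observed indicator against a previously established criterion, with no control group. A minimal sketch follows; the 80% ORT-use target echoes the example in the slides, while the survey counts (164 of 220 caretakers) are invented for illustration.

```python
def adequacy_met(observed_rate, target_rate):
    """Adequacy question: did the indicator reach the expected level?

    No control group is involved -- the comparison is against a
    pre-set adequacy criterion, not against non-participants.
    """
    return observed_rate >= target_rate

# Hypothetical survey: 164 of 220 caretakers report ORT use (~74.5%)
ort_use = 164 / 220
print(adequacy_met(ort_use, 0.80))  # False: below the 80% criterion
```

The same comparison, repeated on two or more survey rounds, is what the slides call assessing the adequacy of change over time.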
Plausibility Assessment Inferences (1)
• The program appears to have an effect above and beyond the impact of non-program influences.
• Includes a control group:
  – Historical control group
    • Compares changes in a community before and after the program and attempts to rule out external factors
    • Same target population
  – Internal control group
    • Compares groups/individuals with different intensities of exposure to the program (dose-response)
    • Compares previous exposure to the program between individuals with and without the disease (case-control)
  – External control group
    • Compares communities/geographic areas with and without the program
    • A population that was never targeted by the intervention but shares key characteristics with the beneficiaries
Source: Habicht, Victora, and Vaughan (1999)

Plausibility Assessment Inferences (2)
• Provision, utilization, coverage
  – The intervention group appears to perform better than the control group
  – Cross-sectional, longitudinal, longitudinal-control
• Impact
  – Changes in health/behavior appear to be more beneficial in the intervention group than in the control group
  – Cross-sectional, longitudinal, longitudinal-control, case-control
Source: Habicht, Victora, and Vaughan (1999)

Controls and Confounding Factors
• For all types of controls, the groups being compared should be similar in all respects except their exposure to the intervention.
• That is almost never possible, however; there is nearly always some factor that influences one group more than another (a confounding factor). For example, a decline in diarrhea mortality may be due to better access to drinking water, not to the ORS program.
• To address this problem, confounding factors must be measured and treated statistically, via matching, standardization, or multivariate analysis.
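Of the three statistical treatments just listed, standardization is the easiest to show in a few lines: stratum-specific rates from each area are re-weighted to a common standard population, so a confounder such as water access no longer drives the comparison. All counts and rates below are invented for illustration.

```python
def directly_standardized_rate(stratum_rates, standard_weights):
    """Weight each stratum's rate by that stratum's share of a common
    standard population, removing the confounder's influence on the
    crude comparison between areas."""
    assert abs(sum(standard_weights) - 1.0) < 1e-9  # weights must sum to 1
    return sum(rate * w for rate, w in zip(stratum_rates, standard_weights))

# Strata: (good water access, poor water access); rates are hypothetical
# diarrhea deaths per 1,000 children
ors_area_rates = [2.0, 8.0]      # ORS-program area
control_area_rates = [2.5, 9.0]  # non-program area
standard = [0.5, 0.5]            # common standard population shares

print(directly_standardized_rate(ors_area_rates, standard))      # 5.0
print(directly_standardized_rate(control_area_rates, standard))  # 5.75
```

Because both areas are weighted to the same standard population, the remaining gap (5.0 vs. 5.75 per 1,000) can no longer be explained by the two areas having different mixes of water access.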
Probability Assessment Inferences
• There is only a small probability that the differences between program and control areas were due to chance (P < .05).
• Requires a control group.
• Requires randomization.
• Often not feasible for assessing program effectiveness:
  – Randomization is needed before the program starts
  – Political factors
  – Scale-up
  – Inability to generalize results
  – Known efficacy of the intervention
Source: Habicht, Victora, and Vaughan (1999)

Summary

Adequacy (assessment of change in outcome)
• Objective: Assess whether the expected impact was reached
• What it says: Indicates whether resources were well spent or not
• Data needs: Outcome data collected among beneficiaries

Plausibility (before/after comparison controlling for confounding factors)
• Objective: Understand what affects the outcomes
• What it says: Helps understand the determinants of success/failure of the program
• Data needs: Outcome data plus confounders, collected among beneficiaries and controls

Probability (causal analysis of before/after differences)
• Objective: Determine the causal effect of the intervention on the outcome
• What it says: Establishes precise causation between action and effect
• Data needs: Outcome data collected among beneficiaries and controls

Discuss with Decision-Makers Before Choosing an Evaluation Design
Possible Areas of Concern to Different Decision-Makers

Type of evaluation   Provision                Utilization                Coverage                      Impact
Adequacy             Health center manager    District health managers   International agencies        Donor agencies
Plausibility         International agencies   International agencies     Donor agencies & scientists   Scientists
Probability          --                       --                         --                            --
Source: Habicht, Victora, and Vaughan (1999)

Evaluation Flow from Simpler to More Complex Designs

Type of evaluation   Provision   Utilization   Coverage   Impact
Adequacy             1st         2nd           3rd        4th (b)
Plausibility         --          --            --         4th (a)
Probability          --          --            --         5th
Source: Habicht, Victora, and Vaughan (1999)

Key Issues to Discuss with Decision-Makers Before Choosing a Design
• Is there a need for collecting new data? If so, at what level?
• Does the design include an intervention-control or a before-after comparison?
• How rare is the event to be measured?
• How small is the difference to be detected?
• How complex will the data analysis be?
• How much will alternative designs cost?
Source: Habicht, Victora, and Vaughan (1999)

References
• Adamchak S et al. (2000). A Guide to Monitoring and Evaluating Adolescent Reproductive Health Programs. Focus on Young Adults, Tool Series 5. Washington, D.C.: Focus on Young Adults.
• Fisher A et al. (2002). Designing HIV/AIDS Intervention Studies: An Operations Research Handbook. New York: The Population Council.
• Habicht JP et al. (1999). Evaluation Designs for Adequacy, Plausibility, and Probability of Public Health Programme Performance and Impact. International Journal of Epidemiology, 28: 10-18.
• Rossi P et al. (1999). Evaluation: A Systematic Approach. Thousand Oaks: Sage Publications.
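As a closing illustration, the chance logic behind a probability assessment (is P < .05?) can be sketched with a simple permutation test: if treatment labels were assigned at random, re-shuffling them shows how often a difference as large as the observed one would arise by chance alone. The condom-use scores below are invented, and real evaluations would use standard statistical tests; this only makes the reasoning concrete.

```python
import random

def permutation_p_value(treated, control, n_perm=10_000, seed=0):
    """Estimate the probability that a difference in group means at
    least as large as the observed one could arise by chance, by
    re-shuffling the group labels many times (mimicking the random
    assignment that a probability assessment requires)."""
    rng = random.Random(seed)
    observed = sum(treated) / len(treated) - sum(control) / len(control)
    pooled = treated + control
    n = len(treated)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[:n]) / n - sum(pooled[n:]) / (len(pooled) - n)
        if abs(diff) >= abs(observed):
            extreme += 1
    return extreme / n_perm

# Hypothetical condom-use scores in program vs. control areas
p = permutation_p_value([8, 9, 7, 9, 10, 8], [5, 6, 7, 5, 6, 6])
print(p < 0.05)  # a large observed difference is rarely matched by chance
```

When the two groups are identical, every shuffled difference is at least as large as the observed difference of zero, and the p-value is 1.0: no evidence of a program effect beyond chance.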