Quasi-Experimental Approaches to Outcome Evaluation
Presented By: Lana, Kasia & Catherine

Concept Map: Let Us Explain
– Quasi-Experiments
– How to Increase the Validity of Interpretations
– Making Observations a Greater # of Times
– Observing Other Groups
– Nonequivalent Control Group Designs
– Regression-Discontinuity Design
– Observing Other Dependent Variables
– Problems in Selecting Comparison Groups
– Combining Designs to Increase Internal Validity
– Time-Series Designs (Information across many time intervals)
– Selective Control Design
– Time-Series and Nonequivalent Control Groups
– Analysis of Time-Series Designs
– ABAB Design
– When the Intervention Cannot Be Removed, but the Effect Is Large
– How the Feminist/Critical Research Paradigm Applies

Quasi?
The word "quasi" means "as if" or "almost," so a quasi-experiment is almost a true experiment.
What makes a true experiment is random assignment of groups or people to treatments.
In quasi-experiments, however, you have only partial control over the independent variables, because assignment to conditions is not random.
This is useful when random assignment is impossible or unethical (males vs. females, high vs. low self-esteem).
It's not just having intact groups that creates a quasi-experiment. Individuals who are not in intact groups could enter treatment levels through self-selection (e.g., because of a particular performance category) or because the researcher has "paired" individuals who are believed to be similar.

How to Tell If Something Caused Something Else… 3 Steps
1) The cause comes before the expected effect.
2) The cause co-varies with the effect (the more of the cause, the more of the effect).
Now are you ready??? 'Cause this next step is the focus of this chapter!!!
3) No other plausible explanation of the effect can be found EXCEPT the assumed cause.

Quasi-Experiments Use Three Methods to Increase Validity
1) Observing participants at additional times, both before and after the program
2) Observing additional natural groups of people who were not involved in the program
3) Using a variety of variables, some expected to be affected by the program and some not

Keep in Mind…
These methods do not achieve the airtight control of true experiments.
However, quasi-experiments do control for many biases and can yield highly interpretable evaluations.

Making Observations a Greater Number of Times
This can help with the following problems:
– It helps separate random changes from one time period to the next from real changes. All variables show variation over time (e.g., the number of crimes on a given day is not consistent).
– Without information on day-to-day variation, one cannot know whether there is anything that needs to be explained.
– Ignoring random variation and threats to internal validity can lead to erroneous interpretations of the causal connection between two events.

Time-Series Designs
This is a longitudinal design, meaning that participants are tested at different times during the course of the study.
Strategy: obtain a baseline measurement before an intervention, then document both the change and the maintenance of change.
This is a way of meeting some of the internal validity challenges that were described in Chapter 8.

How Can It Help with Internal Validity?
– Maturation: the effects can be traced during the time periods before and after the intervention.
– History: effects are more easily detected. By relating changes in dependent variables to historical events, it is possible to distinguish the effects of the program from the impact of major non-program influences.
Time-series designs can be of two types: 1) interrupted and 2) non-interrupted.

Interrupted vs. Non-interrupted
Both types examine changes in the dependent variable over time.
However, the interrupted time-series design involves before and after measurement.
Program evaluators use interrupted designs almost exclusively when a definite intervention has occurred at a specific time.
The evaluator's job is to learn whether the interruption (that is, the introduction of a program) had an impact.

Characteristics of a Time-Series Design
1) A single unit is defined
2) Measurements are made
3) over a number of time intervals
4) that precede and follow some controlled or natural intervention.
The unit observed (person, group, etc.) serves as its own control.

Possible Patterns
There are a number of possible patterns observable in graphs of a program's outcome plotted over time periods. We will now explain two of the possibilities. Note: the dashed line in the pictures represents the time of the program/intervention.

No Effect of the Intervention
There appears to be no out-of-the-ordinary change in the observations after the program.
[Figure: criterion plotted over time intervals]

Most Hoped-For Finding
This shows a marked increase from a fairly stable level before the intervention, and the criterion remains fairly stable afterwards.
[Figure: criterion plotted over time intervals]

Analysis of Time-Series Designs
1) ABAB Design
– The intervention is implemented and then removed.
– After establishing a baseline, an intervention is introduced that is supposed to reduce the frequency of a problem behavior.
– Suppose the intervention is effective: the problem behavior decreases.
– After several observation periods, the intervention is removed.
– If the rate of the problem behavior increases, it appears that the intervention had an effect (that is, the change was not due just to maturation or history).
– If the intervention is reintroduced and the problem behavior decreases again, it is quite safe to say the intervention is effective!
Continued…
2) When the intervention cannot be removed, but the effect is large
– In this case, the impact of the intervention may still be obvious because of the large effect.
– There is a good example in your book that you can make a note to look over on page 182. This example discusses smoothing.
– Smoothing is a method used to reduce the effect of random variation and reveal trends.
– Smoothing a graph is similar in spirit to finding the mean of a set of numbers in order to identify the general pattern.
– A smoothed graph can reveal a pattern better than a graph of the raw data can, just as a mean reveals a general trend better than a list of original data points.

Observing Other Groups
Nonequivalent Control Group Designs
This is another way of increasing the interpretability of an evaluation.
– Why nonequivalent? Because we did not use random assignment to place participants in treatment groups, we cannot assume that, on average, the groups are the same (equivalent) to begin with.

Continued… So What Does It Involve?
– It increases the number of groups observed.
– Here we have experimental and control groups that are designated before the treatment occurs and are not randomly assigned.
– If the pretest-posttest design could be duplicated with another group that did not receive the program, a potentially strong research design would result.
– As long as the groups are comparable, this design controls nearly all the threats to internal validity.

What Is Expected?
A larger improvement between the pretest and posttest for the program group than for the comparison group.
[Figure: dependent variable for the program group and the comparison group, before and after the program (time of observation)]

How to Analyze This Data
Use a 2 (group) × 2 (time period) analysis of variance with repeated measurements over time periods (a mixed design).
Remember: analysis of variance tells us whether the group means differ…
If the program is successful and the group means follow the picture you just saw, the analysis of variance would reveal a significant interaction between group and testing period.

Continued
Before beginning a statistical analysis, it is always wise to inspect the data carefully.
Ideally, we would find that the standard deviations associated with the means in the example are similar.
In this example, the means before the program were nearly equal. If the pretest means are quite different, analyses of nonequivalent control group designs may be misleading.

Positive Features of This Design
Including the comparison group permits the isolation of the threats to internal validity:
– Because both groups were tested at the same time, they had the same amount of time to mature.
– Historical forces have presumably affected the groups equally.
– Because both groups are tested twice, testing effects should be equivalent.
– Finally, the rates of participant loss between pretest and posttest can be examined to be sure that they are similar.

Useful?
Nonequivalent control groups are especially useful when part of an organization is exposed to the program while other parts are not. Selection into the program is not in the hands of the participants, and the participants' level of need does not determine eligibility, so the comparability of the groups is quite good.

Why Comparison Groups?
– A no-treatment comparison group is chosen when one seeks to learn whether a program has an effect.
– This would not be appropriate if a comparison is needed among different ways to offer a service. In that case you would compare the different programs.
– If there is a suspicion that attention alone could affect an outcome, then the comparison group would be a placebo group (a group that experiences a program not expected to affect the outcome variable).
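The group × testing-period interaction described above can be made concrete with a short Python sketch. It assumes that each group's pretest and posttest scores are stored as two lists aligned by participant. The function names and the scores are hypothetical and do not come from the book. The sketch relies on a standard equivalence: in a 2 × 2 mixed design with two testing periods, the interaction F from the analysis of variance equals the square of a pooled-variance t test that compares the two groups' gain scores (posttest minus pretest). The sketch also reports the pretest gap, because very different pretest means make this design harder to interpret.

```python
from math import sqrt
from statistics import mean, variance

def gain_scores(pre, post):
    """Posttest minus pretest for each participant (lists aligned by person)."""
    if len(pre) != len(post):
        raise ValueError("pre and post need one score per participant")
    return [after - before for before, after in zip(pre, post)]

def nonequivalent_group_summary(program, comparison):
    """Summarize a 2 (group) x 2 (pretest/posttest) nonequivalent control group design.

    `program` and `comparison` are (pretest_scores, posttest_scores) tuples.
    Returns the pretest gap, the group x time interaction contrast (difference
    in mean gains), and a pooled-variance t on the gain scores; t squared
    equals the interaction F of the 2 x 2 mixed analysis of variance.
    """
    gains_p = gain_scores(*program)
    gains_c = gain_scores(*comparison)
    n_p, n_c = len(gains_p), len(gains_c)
    # Large pretest differences are a warning sign for this design.
    pretest_gap = mean(program[0]) - mean(comparison[0])
    # Program group's mean gain minus comparison group's mean gain.
    interaction = mean(gains_p) - mean(gains_c)
    # Pool the two sample variances of the gain scores, weighted by degrees of freedom.
    pooled_var = ((n_p - 1) * variance(gains_p) + (n_c - 1) * variance(gains_c)) / (n_p + n_c - 2)
    t = interaction / sqrt(pooled_var * (1 / n_p + 1 / n_c))
    return {"pretest_gap": pretest_gap, "interaction": interaction, "t": t, "F": t * t}

# Hypothetical scores for four participants per group.
program = ([10, 12, 11, 13], [16, 19, 16, 19])
comparison = ([11, 12, 10, 13], [12, 14, 12, 14])
summary = nonequivalent_group_summary(program, comparison)
print(summary["pretest_gap"], summary["interaction"], round(summary["t"], 2))
```

With these made-up scores, the pretest means are identical, the program group gains 4.5 points more than the comparison group, and t is about 9. A real analysis would also compare the standard deviations and obtain a p-value from the t (or F) distribution.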
Problems in Selecting Comparison Groups
Major weakness: finding a comparison group sufficiently similar to the treatment group to permit drawing valid interpretations.
– Ex. Parents who seek out special programs for their children may also be devoting more attention to their children at home than parents who do not seek out special programs.

Matching Gone Wrong
Matching is often used to select comparison groups (on income level, test score level, rated adjustment, locality of residence, etc.), but there are situations where it can go wrong. Example: pg. 187 (2nd paragraph).

The Moral Is Clear
The nonequivalent control group design is especially sensitive to regression effects when groups are systematically different on some dimensions.

Regression Is Not the Only Weakness
Groups may also differ from each other for other reasons:
– Most examples come down to a lack of consistency in how the groups are treated (a teacher using different methods for different classes, one physician promoting a brochure while others do not, etc.).

Regression-Discontinuity Design
Seen as a useful method for determining whether a program or treatment is effective.
– The term actually refers to a set of design variations.
In its simplest, most traditional form, this is a pretest-posttest program-comparison group strategy.

How Is It Different?
What sets this design apart from other pretest-posttest group designs is the method by which research participants are assigned to conditions. Here, participants are assigned to program or comparison groups solely on the basis of a cutoff score on a preprogram measure.

Major Advantage…
This cutoff criterion implies the major advantage of these designs:
– They are appropriate when we wish to target a program or treatment to those who most need or deserve it.
– So, unlike the other quasi-experimental alternatives, this design does not require us to assign potentially needy individuals to a no-program comparison group in order to evaluate the effectiveness of a program.

Regression-Discontinuity Design
Used when eligibility for a service is based on a continuous variable (income, achievement, level of disability, etc.).
Example: You have 300 students who are being tested for reading achievement. Those scoring the lowest are defined as those needing the most assistance. If the program has facilities for 100 students, it seems reasonable and fair to take the 100 with the lowest scores into the program.
If all 300 are retested at the end of the school year, what would be expected? We would not expect the 100 to outperform the 200 regular-class students. If the program was effective, however, we would expect the treated children to have gained more than they would have if they had stayed in their regular classrooms. The regression-discontinuity design enables the evaluator to measure such effects: if the program worked, the treated students' posttest scores will sit above the pretest-posttest regression line projected from the untreated students, producing a break (discontinuity) at the cutoff score.

Observing Other Dependent Variables
Used to increase the validity of interpretations.
These variables are not expected to be changed by the program or, at most, are expected to change only marginally. AKA the control construct design.
– The added dependent measures must be similar to the major dependent variable WITHOUT being strongly influenced by the program.

Back to the Example… (We'll put it into perspective for you!)
If the children do read better after a special program, they might be expected to do better on a social sciences test than they did before the reading program.
Consequently, a social sciences test would NOT be a good control construct measure for a reading program.
However, a test of math skills requiring only minimal reading levels might be an excellent additional dependent variable.

Activity
In your group, you were given a type of quasi-experiment.
What you need to do with it:
– List one benefit and one drawback of this type.
– Give an example of a program that your quasi-experiment would be best suited to evaluate.
– At what point within the program would it be most helpful?

Combining Designs to Increase Internal Validity
1) Time-Series and Nonequivalent Control Groups
– If a group similar to the program participants can be observed, the simple interrupted time-series design is strengthened considerably.
– A key to drawing valid interpretations from observations lies in being able to repeat the observations.
– If a finding can be replicated, one can be more sure of conclusions than if conditions make replication impossible.

Interrupted Time Series with Switching Replications
This is a refinement in which there are two groups. Each serves as either the treatment or the comparison group on an alternating basis, through multiple replications of treatment and removal.

Continued…
This requires an even higher level of control over participants by the researcher, but it is a particularly strong design for ruling out threats to validity.
It is not useful in studies where the treatment intervention has been gradual, or when the treatment effect does not decay well.

How About an Example???
Let's say you're testing whether a new alcoholism treatment program works.
Steps of Interrupted Time Series with Switching Replications:
– Form two groups of patients, the experimental group and the comparison group.
– Pretest both groups with an instrument that would provide a baseline for the groups, such as the Alcohol Dependence Scale.
– Apply the treatment to the experimental group and withhold it from the comparison group.
– Measure both groups many times (e.g., every two weeks) to see whether the experimental group responded to your treatment. If it did, apply the same treatment to the comparison group, keep measuring, and see whether you get the same results. If you do, you can be considerably more confident that this new program is promising.
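The switching-replications logic in the alcoholism example can be sketched in a few lines of Python. This is a minimal illustration that rests on several assumptions. Both groups are measured at the same time points (e.g., every two weeks). `start_a` and `start_b` are hypothetical indices marking each group's first treated observation. Lower Alcohol Dependence Scale scores mean improvement. The scores are made up. The sketch simply compares mean scores before and after each group's own start point.

```python
from statistics import mean

def phase_change(series, start):
    """Mean score after treatment starts minus mean score before it.

    `start` is the index of the first treated observation.
    """
    if not 0 < start < len(series):
        raise ValueError("need observations both before and after the start")
    return mean(series[start:]) - mean(series[:start])

def switching_replication(series_a, series_b, start_a, start_b):
    """Return (A's change, B's change while untreated, B's change once treated).

    Group A (experimental) is treated first; group B (comparison) is treated
    later, so start_a < start_b. Negative values mean scores went down.
    """
    if not start_a < start_b:
        raise ValueError("group A must start treatment before group B")
    # Window 1: only group A is treated, so group B serves as the comparison.
    change_a = phase_change(series_a[:start_b], start_a)
    change_b_untreated = phase_change(series_b[:start_b], start_a)
    # Window 2: group B now receives the same treatment (the replication).
    change_b_treated = phase_change(series_b, start_b)
    return change_a, change_b_untreated, change_b_treated

# Hypothetical biweekly Alcohol Dependence Scale scores (lower = better).
group_a = [30, 31, 29, 30, 22, 21, 20, 21, 20, 22]
group_b = [29, 31, 30, 30, 31, 29, 30, 22, 20, 21]
print(switching_replication(group_a, group_b, start_a=4, start_b=7))
```

With these made-up scores, group A drops by 9 points while group B stays flat, and group B then drops by 9 points once it is treated. That is the replicated pattern the design looks for. Comparing simple phase means ignores trends and autocorrelation in the series, so a real evaluation would use a proper time-series analysis.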
Still Combining Designs…
2) Selective Control Design
– By understanding the context of a program, evaluators may be able to identify the threats to internal validity that are most likely to affect an evaluation.
– Evaluators may then choose comparison groups to control for specific threats to internal validity, so that the most plausible alternative interpretations are eliminated.
– When the appropriate nonequivalent control groups are available, the selective control design can be a powerful approach.

Feminist/Critical Research Paradigm
– Ontology (nature of reality): The apprehended world makes a material difference in terms of race, gender, and class.
– Epistemology (viewpoint/perception of the knowledge base): Knowledge is subjective and political; researchers' values frame inquiry.
– Methodology (how knowledge is gained): Transformative inquiry; changing the questions.
– Products (forms of knowledge produced): Value-mediated critiques that challenge existing power structures and promote resistance.

How Does This Apply to Quasi-Experiments???
Quasi-experiments: contextual analyses
True experiments: criticized by feminist researchers
– No one "true" reality
– All research is biased by the perception/experience of the researcher
– Feminist research includes background about the researcher(s)

Continued…
Criticism of traditional methods
So… the feminist stance on quasi-experiments would be a favorable one, because there is no random assignment.

Who Uses Quasi-Experiments in Our Community? (A few examples)
– Well-come Centre for Human Potential
– SACC
– Hiatus House

THANK YOU
Time for the Test!!!