Research Methods in Psychology
Quasi-Experimental Designs and Program Evaluation

Applied Research
Goal
• to improve the conditions in which people live and work
Natural settings
• messy, “real world,” hard to establish experimental control
Quasi-experiments
• procedures that approximate the conditions of highly controlled laboratory experiments
Program evaluation
• applied research to learn whether real-world treatments work

Characteristics of True Experiments
• Manipulate an independent variable (IV)
  - treatment and control conditions
• High degree of control
  - especially random assignment to conditions
• Unambiguous outcome regarding the effect of the IV on the dependent variable (DV)
  - internal validity

Obstacles to Conducting True Experiments in Natural Settings
Permission
• difficult to gain permission to conduct true experiments in natural settings
• difficult to gain access to participants
Random assignment perceived as unfair
• people want a “treatment”
• yet random assignment is the best way to determine whether a treatment is effective
• one solution: use a “waiting-list” control group, which receives the treatment after the study

Advantage of True Experiments
Threats to internal validity are controlled
• confoundings (alternative explanations for findings) are controlled
• ruling out alternative explanations allows a causal inference about the effect of the IV on the DV
• 8 general classes of threats to internal validity
  - History
  - Maturation
  - Testing
  - Instrumentation
  - Regression
  - Selection
  - Subject attrition
  - Additive effects with selection

Threats to Internal Validity
History
• occurs when an event happens at the same time as the treatment and changes participants’ behavior
• participants’ “history” includes events other than the treatment
• makes it difficult to determine whether the treatment has an effect

History Threat, continued
[Figure: condom sales by week (weeks 1 to 8), with the treatment (X) introduced after week 4]
Does an AIDS awareness campaign on campus influence condom sales in campus vending machines?
History threat: Suppose that at week 4 (X = treatment) a celebrity announces he is HIV+.
Can you conclude the awareness campaign was effective?

Threats to Internal Validity, continued
Maturation
• Participants naturally change over time.
• These maturational changes, not the treatment, may explain any changes in participants during an experiment.

Maturation Threat, continued
[Figure: mean comprehension at pretest and posttest]
Does a new reading program improve 2nd graders’ reading comprehension?
Reading comprehension improves naturally as children mature over the year.
Can you conclude the reading program was effective?

Threats to Internal Validity, continued
Testing
• Taking a test generally affects subsequent testing.
• Participants’ performance on a measure at the end of a study may differ from an initial testing because of their familiarity with the measure.

Testing Threat, continued
[Figure: mean minutes to solve problems at pretest and posttest]
Does teaching people a new problem-solving strategy influence their ability to solve problems quickly?
If similar problems are used in the pretest, faster problem solving may be due to familiarity with the test.
Can we conclude that the new strategy improves problem-solving ability?

Threats to Internal Validity, continued
Instrumentation
• Instruments used to measure participants’ performance may change over time
  - example: observers may become bored or tired
• Changes in participants’ performance may be due to changes in the instruments used to measure performance, not to a treatment

Instrumentation, continued
[Figure: reports of rape by month (months 1 to 8), with the program (X) introduced after month 4]
Suppose that a police protection program is implemented to decrease the incidence of rape. At the same time the program is implemented (X), reporting laws change so that the definition of rape is broadened.
Can we conclude the program was effective (or ineffective)?
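The maturation threat in a one-group pretest-posttest comparison can be made concrete with a small simulation. This is a hypothetical sketch: the 15-point natural gain, the score distributions, and all variable names are assumptions chosen for illustration, and the reading program is given no effect at all. The pre-post difference still comes out near 15 points, which a naive reading would credit to the program.

```python
import random
import statistics

# Hypothetical one-group pretest-posttest study (O1 X O2) of a reading program.
# Assumptions (illustrative only): comprehension grows by about 15 points over the
# school year through maturation alone, and the program itself has NO effect.
random.seed(1)

N_CHILDREN = 500
MATURATION_GAIN = 15.0   # assumed natural growth between pretest and posttest
TREATMENT_EFFECT = 0.0   # the program does nothing in this simulation

pretest = [random.gauss(50, 10) for _ in range(N_CHILDREN)]
posttest = [
    score + MATURATION_GAIN + TREATMENT_EFFECT + random.gauss(0, 5)
    for score in pretest
]

observed_gain = statistics.mean(posttest) - statistics.mean(pretest)
print(f"Mean pretest:  {statistics.mean(pretest):.1f}")
print(f"Mean posttest: {statistics.mean(posttest):.1f}")
print(f"Observed gain: {observed_gain:.1f} (true treatment effect = {TREATMENT_EFFECT})")
```

Without a comparison group, nothing in these data distinguishes the maturation gain from a program effect.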
Threats to Internal Validity, continued
Regression
• Participants sometimes perform very well or very poorly on a measure because of chance factors (e.g., luck).
• These chance factors are not likely to be present during a second testing, so their scores will not be so extreme.
• The scores will “regress” (move toward) the mean.
• Regression effects, not the treatment, may account for changes in participants’ performance over time.

Regression, continued
A test score = true score + error (e.g., chance)
Definition of an unreliable test or measure:
• it measures with a lot of error
If people score very high or very low on a test, it’s possible that chance factors produced the extreme score. On a second testing, those chance factors are less likely to be present (that’s why they’re “chance”).

Regression, continued
[Figure: mean test scores at pretest and posttest]
Suppose that students were selected for an accelerated enrichment program because of their very high scores on a brief test.
Regression: to the extent the test is an unreliable measure of ability, we can expect their scores to regress toward the mean at the 2nd testing.
Can we conclude the enrichment program was effective?

Threats to Internal Validity, continued
Subject attrition
• When participants are lost from the study (attrition), the group equivalence formed at the start of the study may be destroyed.
• Differences between treatment and control groups at the end of the study may be due to differences in those who remain in each group.

Subject Attrition, continued
[Figure: mean weight at Time 1 and Time 2]
Suppose that an exercise program is offered to employees who would like to lose weight.
At Time 1, N = 50, M weight = 225 pounds
At Time 2, N = 25 (25 drop out of the study)
Suppose the 25 who stayed in the program weighed, on average, 150 pounds at Time 1.
Did the exercise program help people to lose weight?
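The regression threat follows directly from “test score = true score + error.” A minimal simulation sketch, under assumed numbers (true scores with mean 50 and SD 10, error SD 10, so the brief test is quite unreliable, and no enrichment effect at all): students selected for their top 5% first-test scores score noticeably lower on a retest, though still above the overall mean, because the chance factors that inflated their first scores are drawn fresh the second time.

```python
import random
import statistics

# Observed score = true score + error. Students are selected for an enrichment
# program because of very high scores on a brief (unreliable) test, then retested.
# Assumption for illustration: there is no program effect at all.
random.seed(42)

N_STUDENTS = 10_000
ERROR_SD = 10.0   # large error relative to true-score spread = unreliable test

true_scores = [random.gauss(50, 10) for _ in range(N_STUDENTS)]
test1 = [t + random.gauss(0, ERROR_SD) for t in true_scores]
test2 = [t + random.gauss(0, ERROR_SD) for t in true_scores]  # fresh chance factors

# Select the top 5% on the first test, as in the enrichment example.
cutoff = sorted(test1)[N_STUDENTS * 95 // 100]
selected = [i for i in range(N_STUDENTS) if test1[i] >= cutoff]

mean_first = statistics.mean(test1[i] for i in selected)
mean_second = statistics.mean(test2[i] for i in selected)
print(f"Selected students, first test:  {mean_first:.1f}")
print(f"Selected students, second test: {mean_second:.1f}")
print(f"Overall mean, first test:       {statistics.mean(test1):.1f}")
```

Lowering `ERROR_SD` (a more reliable test) shrinks the drop, which matches the slide’s point that regression matters “to the extent the test is an unreliable measure.”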
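The subject-attrition example can be worked out with simple arithmetic using only the numbers given on the slide. The one added assumption, labeled below, is that the people who stayed did not lose any weight; even then, the group mean appears to fall by 75 pounds because the heavier employees dropped out.

```python
# Subject-attrition example from the slide: N = 50 at Time 1 with M = 225 lb;
# the 25 who stayed averaged 150 lb at Time 1. Solve for the dropouts' mean.
n_total, mean_total = 50, 225.0
n_stayed, mean_stayed_time1 = 25, 150.0
n_dropped = n_total - n_stayed

total_weight = n_total * mean_total                # 11,250 lb
stayed_weight = n_stayed * mean_stayed_time1       # 3,750 lb
mean_dropped_time1 = (total_weight - stayed_weight) / n_dropped

print(f"Dropouts' mean weight at Time 1: {mean_dropped_time1:.0f} lb")  # 300 lb

# Assumption for illustration: nobody who stayed lost any weight, so the
# Time 2 mean (stayers only) equals their Time 1 mean.
mean_time2_no_change = mean_stayed_time1
print(f"Apparent 'loss': {mean_total - mean_time2_no_change:.0f} lb")  # 75 lb
```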
Threats to Internal Validity, continued
Selection
• occurs when differences exist between individuals in the treatment and control groups at the start of a study
• these differences become alternative explanations for any differences observed at the end of the study
• random assignment controls the selection threat

Selection, continued
[Figure: mean lbs/week of garbage for recycling-program participants (Recyc.) and nonparticipants (Not)]
Suppose that a community recycling program is tested. Individuals who are interested in recycling are encouraged to participate.
Evaluation: Compare the weight of garbage from participants in the program with the weight of garbage from those not in the new program.
Can we tell if the new recycling program is effective?

Threats to Internal Validity, continued
Additive effects with selection
• occur when one group of participants in an experiment
  - responds differently to an external event (history)
  - matures at a different rate (maturation)
  - is measured more sensitively by a test (instrumentation)
• these threats (rather than the treatment) may account for any group differences at the end of a study

Additive Effects with Selection, continued
[Figure: condom sales by week (weeks 1 to 8) for School A and School B, with the campaign (X) introduced after week 4]
Does an AIDS awareness campaign at School A affect condom sales compared to a control school with no awareness campaign (School B)?
History threat: Suppose a celebrity announces at week 4 that he is HIV+.
Can you conclude the awareness campaign at School A is effective?
Yes, if sales rise more at School A than at School B: both schools should have experienced this history event equally, so it cannot explain a difference between them.

Additive Effects with Selection, continued
[Figure: condom sales by week (weeks 1 to 8) for School A and School B, with the campaign (X) introduced after week 4]
Does an AIDS awareness campaign at School A affect condom sales compared to a control school with no awareness campaign (School B)?
Additive effect of selection and history: Suppose at week 4 (X), the student newspaper at School A reports about students who are HIV+ (not part of the awareness campaign).
Can you conclude the awareness campaign was effective?
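To see why random assignment controls the selection threat in the recycling example, here is a hypothetical sketch: every household gets an assumed pre-existing “interest in recycling” score (standard normal, invented for illustration). When high-interest households choose to join, the groups differ sharply on interest before the program starts; when the same households are randomly split, the groups are nearly equal on interest.

```python
import random
import statistics

# Hypothetical community of 10,000 households, each with a pre-existing
# "interest in recycling" score (assumed standard normal; illustrative only).
random.seed(7)
interest = [random.gauss(0, 1) for _ in range(10_000)]

# Self-selection: only households with high interest join the program.
joined = [x for x in interest if x > 0.5]
not_joined = [x for x in interest if x <= 0.5]
self_selected_gap = statistics.mean(joined) - statistics.mean(not_joined)

# Random assignment: shuffle, then split in half regardless of interest.
shuffled = interest[:]
random.shuffle(shuffled)
half = len(shuffled) // 2
treatment, control = shuffled[:half], shuffled[half:]
random_gap = statistics.mean(treatment) - statistics.mean(control)

print(f"Interest gap, self-selected groups:     {self_selected_gap:.2f}")
print(f"Interest gap, randomly assigned groups: {random_gap:.2f}")
```

With self-selection, any difference in garbage weight could reflect prior interest rather than the program; with random assignment, prior interest is spread roughly evenly across conditions.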
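The comparison-group logic in the School A / School B slides can be sketched as a “change in School A minus change in School B” calculation. The baseline sales numbers and the 10-unit history boost are invented, and the campaign is assumed to have no effect. A history event that reaches both schools cancels out; an event that reaches only School A (selection × history) produces a difference that looks like a campaign effect.

```python
# Hypothetical mean weekly condom sales (illustrative numbers, not data from the slides).
# Assumption: the campaign at School A has NO real effect.
BASELINE = {"A": 30.0, "B": 25.0}   # assumed mean sales, weeks 1-4
CAMPAIGN_EFFECT = 0.0


def change_in_difference(history_boost_a: float, history_boost_b: float) -> float:
    """Return (post - pre at School A) - (post - pre at School B)."""
    pre_a, pre_b = BASELINE["A"], BASELINE["B"]
    post_a = pre_a + CAMPAIGN_EFFECT + history_boost_a
    post_b = pre_b + history_boost_b
    return (post_a - pre_a) - (post_b - pre_b)


# Shared history (celebrity announcement reaches both schools): cancels out.
print(change_in_difference(history_boost_a=10.0, history_boost_b=10.0))  # 0.0
# Selection x history (only School A's newspaper runs the story): looks like an effect.
print(change_in_difference(history_boost_a=10.0, history_boost_b=0.0))   # 10.0
```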
Additive Effects with Selection, continued
[Figure: condom sales by week (weeks 1 to 8) for School A and School B, with the campaign (X) introduced after week 4]
Does an AIDS awareness campaign at School A affect condom sales compared to a control school with no awareness campaign (School B)?
Additive effect of selection and history: Suppose at week 4 (X), the student newspaper at School B reports about students who are HIV+.
Can you conclude whether the awareness campaign at School A was effective?

Threats to Internal Validity, continued
Important points to remember
• When there is no comparison group in a study, the following threats to internal validity must be ruled out: history, maturation, testing, instrumentation, regression, subject attrition, selection
• When a comparison group is added, the following threats must be ruled out: selection, additive effects with selection
• Adding a comparison group helps researchers to rule out many threats to internal validity

Threats to Internal Validity, continued
Threats that even true experiments may not eliminate
• contamination
• experimenter expectancy effects
• novelty effects (including the Hawthorne effect)
Threats to external validity
• occur when treatment effects may not generalize beyond the particular people, setting, treatment, and outcome of an experiment
• the best way to assess external validity: replication

Threats to Internal Validity, continued
Contamination
• occurs when there is communication about the experiment between groups of participants
• three possible outcomes
  - resentment
  - rivalry
  - diffusion of treatments

Threats to Internal Validity, continued
Expectancy effects
• occur when an experimenter unintentionally influences the results of an experiment
• two types
  - expectations lead to systematic errors in interpreting participants’ performance
  - expectations lead to errors in recording data

Threats to Internal Validity, continued
Novelty effects
• refer to changes in people’s behavior simply because an innovation (e.g., a treatment) produces excitement, energy, and enthusiasm
• Hawthorne effect: a special case
  - performance changes when people know “significant others” (e.g., researchers, employers) are interested in them or care about their living or work conditions
Because of contamination, expectancy effects, and novelty effects, researchers may have trouble concluding that a treatment was effective.

Quasi-Experiments
“Quasi-” (resembling) experiments
• an important alternative when true experiments are not possible
• lack the high degree of control found in true experiments
• researchers must seek additional evidence to eliminate threats to internal validity

The One-Group Pretest-Posttest Design
“Bad experiment” or “preexperimental design”
• an intact group is selected to receive a treatment
  - e.g., a classroom of children, a group of employees
• a pretest records participants’ performance before the treatment: observation 1 (O1)
• the treatment is implemented (X)
• a posttest records performance following the treatment: observation 2 (O2)
O1 X O2

One-Group Pretest-Posttest Design, continued
O1 X O2
• None of the threats to internal validity are controlled.
• Any change between the pretest (O1) and posttest (O2) may be due to the treatment (X) or to
  - history (some other event coincided with the treatment)
  - testing (effects of repeated testing)
  - maturation (natural changes in participants over time)
  - instrumentation, regression, or subject attrition

Quasi-Experimental Designs
Nonequivalent Control Group Design
• a group similar to the treatment group serves as a comparison group
• obtain pretest and posttest measures for individuals in both groups
• random assignment to groups is not used
• pretest scores are used to determine whether the groups are equivalent
  - the groups are shown to be equivalent only on this dimension

Nonequivalent Control Group Design, continued
    treatment group                O1   X   O2
    -------------------------------------------
    nonequivalent control group    O1       O2
    (O1 = pretest, X = treatment, O2 = posttest)

Nonequivalent Control Group Design, continued
Example: Does taking a research methods course improve reasoning ability?
• Compare students in research methods and developmental psychology courses
• DV: 7-item test of methodological and statistical reasoning ability
• Suppose group differences are observed at the posttest

Nonequivalent Control Group Design, continued
[Figure: mean reasoning score at pretest and posttest for the developmental psychology (Develop) and research methods (Methods) groups]
By adding a comparison group, we can rule out these threats to internal validity:
• history
• maturation
• testing
• instrumentation
• regression
We assume these threats affect both groups equally; therefore, they can’t explain posttest differences.

Nonequivalent Control Group Design, continued
What threats are not ruled out?
• Selection
  - Without random assignment to conditions, the two groups are probably not equivalent on many dimensions.
  - These preexisting differences may account for group differences at the posttest.

Nonequivalent Control Group Design, continued
• Additive effects with selection
  The two groups
  - may have different experiences (selection × history)
  - may mature at different rates (selection × maturation)
  - may be measured more or less sensitively by the instrument (selection × instrumentation)
  - may drop out of the study (courses) at different rates (differential subject attrition)
  - may differ in terms of regression to the mean (differential regression)

Quasi-Experiments, continued
Simple Interrupted Time-Series Design
• Observe a DV for some time before and after a treatment is introduced.
• Archival data are often used.
• Look for a clear discontinuity in the time-series data as evidence of treatment effectiveness.
O1 O2 O3 O4 X O5 O6 O7 O8

Simple Interrupted Time-Series Design, continued
Example: Study habits
• intervention: an instructional course to change students’ study habits, implemented during the summer following the sophomore year (after semester 4)
• DV: semester GPA
• Suppose that a discontinuity is observed when the treatment (X) is introduced

Simple Interrupted Time-Series Design, continued
What threats can be ruled out?
[Figure: mean GPA by semester (semesters 1 to 8)]
• maturation: assume maturational changes are gradual, not abrupt
• testing (GPA): if testing influences performance, these effects are likely to show up in the initial observations (before X)
  - testing effects are less likely with archival data
• regression: if scores regress toward the mean, they will do so in the initial observations

Quasi-Experiments, continued
Time-Series with Nonequivalent Control Group Design
• Add a comparison group to the simple time-series design
    O1 O2 O3 O4 X O5 O6 O7 O8
    --------------------------------
    O1 O2 O3 O4   O5 O6 O7 O8

Time-Series with Nonequivalent Control Group Design, continued
Example: Study habits
[Figure: mean GPA by semester (semesters 1 to 8) for the Study and Control groups]
• Suppose that a nonequivalent control group is added: these students don’t participate in the study habits course
• Who could be in the comparison group?
• What threats would you be able to rule out?

Program Evaluation
Goal
• provide feedback to administrators of human service organizations to help them decide
  - what services to provide
  - whom to provide services to
  - how to provide services most effectively and efficiently
Big growth area (especially health care)
Program evaluators assess
• the needs, process, outcomes, and efficiency of social services

Four Questions of Program Evaluation
Needs
• Is an agency or organization meeting the needs of the people it serves?
  - survey research designs
Process
• How is a program being implemented (is it going as planned)?
  - observational research designs

Four Questions of Program Evaluation, continued
Outcome
• Has a program been effective in meeting its stated goals?
  - experimental, quasi-experimental research designs; archival data
Efficiency
• Is a program cost-efficient relative to alternative programs?
  - experimental, quasi-experimental research designs; archival data

Basic Research and Applied Research
Program evaluation is the most extreme case of applied research
• the goal is practical, not theoretical
The relationship between basic and applied research is reciprocal
• basic research provides scientifically based principles about behavior and mental processes
• these principles are applied in the complex, real world
• new complexities are recognized, and new hypotheses must be tested using basic research
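Outcome questions answered with archival data often use the interrupted time-series designs described earlier. A minimal sketch of one simple summary for the study-habits example with a nonequivalent control group: compare the shift in mean semester GPA from before X (semesters 1 to 4) to after X (semesters 5 to 8) in the program group with the same shift in the control group. The GPA values and the `level_shift` helper are invented for illustration; a real analysis would also model trends within each series rather than comparing only pre and post means.

```python
import statistics

# Invented semester GPAs for illustration only (semesters 1-8; X after semester 4).
study_group = [2.6, 2.7, 2.6, 2.7, 3.1, 3.2, 3.1, 3.2]
control_group = [2.7, 2.6, 2.7, 2.7, 2.8, 2.7, 2.8, 2.8]
TREATMENT_AFTER = 4  # O1..O4 before X, O5..O8 after X


def level_shift(series: list[float], split: int = TREATMENT_AFTER) -> float:
    """Mean of post-treatment observations minus mean of pre-treatment observations."""
    return statistics.mean(series[split:]) - statistics.mean(series[:split])


study_shift = level_shift(study_group)
control_shift = level_shift(control_group)
print(f"Study group shift:   {study_shift:+.2f}")
print(f"Control group shift: {control_shift:+.2f}")
print(f"Difference:          {study_shift - control_shift:+.2f}")
```

An abrupt shift that appears only in the program group is harder to attribute to history or maturation, since those should affect the control group as well; selection-based additive effects still have to be considered.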