Quasi-experimental Design

Stephen W. Watts

Northcentral University

Quasi-experimental Design

Jackson (2012) Chapter Exercises

#2. The psychology professor has two sections of students that were not randomly assigned. The treatment will be weekly quizzes. The desired outcome is improved student learning. I will assume that student learning is measured by the total score on all exams, not including the weekly quizzes. This design can be diagrammed with the following notation:

N  X  O
N     O

Based on these assumptions, I would recommend a nonequivalent control group posttest-only design for this quasi-experiment. If student learning is measured by scores on major exams throughout the course, however, the notation changes, as would the recommendation. The following notation represents an alternative possibility for this design:

N  X  O  X  O
N     O     O

Based on these assumptions, I would recommend a nonequivalent control group time-series design for this experiment. Overall, this latter design is the stronger design because of the multiple observations and points of comparison between the two sections.
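To illustrate why the time-series design is stronger, the sketch below (in Python) tabulates hypothetical exam means for the two sections at each observation point; all numbers and names are invented for illustration, not data from the exercise. Each exam supplies another point of comparison between the quizzed and non-quizzed sections.

# A minimal sketch of the time-series comparison with hypothetical exam means.
# Each tuple holds (quizzed section mean, non-quizzed section mean).
exam_means = {
    "Exam 1": (78.2, 77.9),
    "Exam 2": (83.5, 78.4),
    "Exam 3": (85.1, 78.8),
}

# A gap that appears after the quizzes begin, and persists across exams,
# is harder to attribute to chance than a single posttest difference.
for exam, (quizzed, control) in exam_means.items():
    print(f"{exam}: difference = {quizzed - control:+.1f}")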
#4. Usually the nonequivalent control group design uses intact groups. Since these groups are not randomly selected, the major confound for this design is selection bias. Selection bias arises when the control and experimental groups are not comparable before the study and provides an alternative explanation for any differences in the posttest. In this design a couple of social interaction threats are also possible. Students in the experimental section may find out that students in the control section are not getting quizzes and react negatively, causing resentful demoralization, or react competitively, causing compensatory rivalry. Either case would tend “to equalize the outcomes between groups, minimizing the chance of seeing a program effect even if there is one” (Trochim & Donnelly, 2008, p. 171). A single-group design's worst confound is that there is no comparison group; therefore, if student scores are high or low, it is impossible to tell whether they are that way because of the treatment or because of some alternative explanation. With only a single group to measure, all of the single-group threats to internal validity are possible. In this particular study the most likely would be a history threat, where an event or set of events could affect the outcome more than the treatment.

#6. Three reasons a researcher might choose a single-case design include: (a) when a single person or condition is of interest, (b) when replication of results is essential, and (c) situations in which error variance needs to be eliminated (Jackson, 2012). Clinical trials are often interested in the effect of a treatment on a single individual. One study was interested in mitigating or eliminating auditory hallucinations and delusions resulting from schizophrenia and evaluated the use of “an innovative rational-emotive cognitive treatment” (Qumtin, Bélanger, & Lamontagne, 2012, p. 114). From this treatment there was a noticeable and immediate reduction in depression and anxiety and an increase in the patient's quality of life that extended through the 12-month follow-up. A discussion regarding replication of empirical results from single-case designs in psychology and education was written by Kratochwill and Levin (2010), along with suggestions for improving the credibility of these designs by using randomization. Error variance results from differences between participants in a group. In a single-case design there is no group, hence no error variance, making the determination of outcomes resulting from the independent variable much less complicated.

#8. Single-case designs can be implemented in a number of ways. In a reversal design the focus is on a single participant, a single behavior, or a single situation, and the independent variable is applied and removed one or more times to assess its impact. By evaluating the behavior without the treatment applied and then comparing the behavior after treatment, a determination can be made regarding the treatment's effect. Succeeding periods of non-treatment are called the reversal and can demonstrate what happens to the behavior with a return to baseline. A reversal design is very similar to a within-subjects group design because each subject experiences both the control and experimental conditions. Greenhoot (2003) summarized that the reversal of an effect upon the removal of a treatment provides “strong evidence for a causal link between the independent and dependent variable” (p. 98). In a multiple-baseline design multiple participants, behaviors, or situations may be involved. In these single-case studies the measures of interest are individualized, and descriptive statistics are not used to aggregate them. The multiple-baseline design measures the effect of the application of a treatment at differing times for multiple subjects, or can be used with a single subject by applying the treatment in conjunction with different behaviors or in different scenarios.
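To make the reversal logic concrete, the sketch below summarizes hypothetical ABAB data; the phase labels, session counts, and behavior counts are all invented for illustration. If the behavior changes when the treatment is applied and returns toward baseline when it is withdrawn, the pattern supports a causal interpretation.

# A minimal sketch of how reversal (ABAB) data might be summarized.
# The phase data are hypothetical counts of a target behavior per session.
phases = {
    "A1 (baseline)":  [9, 8, 10, 9],
    "B1 (treatment)": [5, 4, 4, 3],
    "A2 (reversal)":  [8, 9, 8, 9],
    "B2 (treatment)": [3, 4, 2, 3],
}

# If the behavior improves in each B phase and returns toward baseline in
# each A phase, the reversal pattern supports a causal interpretation.
for phase, observations in phases.items():
    mean = sum(observations) / len(observations)
    print(f"{phase}: mean = {mean:.1f}")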
Part I Assignment Question Answers

Describe the advantages and disadvantages of quasi-experiments. What is the fundamental weakness of a quasi-experimental design? Why is it a weakness? Does its weakness always matter?

A quasi-experimental design has certain advantages, including: (a) the ability to “draw slightly stronger conclusions than . . . with correlational research” (Jackson, 2012, p. 342); (b) contributing original research to the body of knowledge in a field (Ellis & Levy, 2011); (c) allowing real-world research in the field as opposed to more controlled conditions (Jackson, 2012); (d) accommodating an independent variable that is nonmanipulated (Hoadley, 2007); and (e) usability with intact groups (de Anda, 2007). The disadvantages of using a quasi-experimental design are that (a) it “limits internal validity in a study” (Jackson, 2012, p. 342), (b) it does not establish a causal relationship between variables (Jackson, 2012), and (c) probabilistic equivalence cannot be assumed (Dimitrov & Rumrill, 2003). The fundamental weakness of a quasi-experimental design, and what makes it “quasi,” is that it does not involve random assignment of participants to experimental conditions (Greenhoot, 2003). Nonrandom assignment weakens internal validity and diminishes the ability to establish cause and effect by eliminating the expectation that the groups are equivalent. This weakness does not exclude quasi-experiments from a researcher's arsenal, because of the advantages mentioned above. These advantages make quasi-experiments the most used form of quantitative research (Ellis & Levy, 2011). There may be times when subjects cannot be randomly assigned to groups, there is only one subject of interest, or the topic of research either exists or it does not; in all of these cases a quasi-experiment is a viable option for conducting the research while a true experiment is not (Greenhoot, 2003).

If you randomly assign participants to groups, can you assume the groups are equivalent at the beginning of the study? At the end? Why or why not? If you cannot assume equivalence at either end, what can you do? Please explain.

If participants are randomly assigned to groups from the same population of interest to the study, the groups can be said to be probabilistically equivalent at the beginning of the study (Trochim & Donnelly, 2008). Random assignment does not ensure that the groups are exactly the same, but differences between the groups can be statistically determined, since “groups can differ only due to chance assignment” (p. 189). If, however, participants are chosen from a pool that does not represent the study population, or from multiple pools with differing characteristics, they will be neither equivalent nor probabilistically equivalent. Randomly assigning subjects to control and experimental groups is the best means of ensuring probabilistic equivalence at the beginning of a study, as long as the subjects are chosen from the appropriate population. Even though the groups will not be exactly the same, the probability that they differ can be calculated and minimized. There are also designs that allow for the use of the same participants in both control and experimental conditions, ensuring equivalence between conditions. Determining whether the groups remain probabilistically equivalent at the end of a study is, in effect, the purpose of the research. In a perfect world, where the treatment has an effect, the experimental group will constitute a different population from the control group at the end of the study, and this difference is the treatment effect. Unfortunately, there are confounds other than the application of the treatment that may exacerbate or minimize differences between the groups: the threats to internal validity. The best way to ensure that the treatment is the only condition that changes between the groups is to design the experiment to minimize threats to internal validity. By demonstrating that an independent variable precedes the dependent variable and that the two covary, while showing that there are no other plausible explanations for the relationship, internal validity is high and can support causal claims. In a properly designed study the first two criteria are easy to establish, so it is critical to focus on the third criterion and minimize alternative explanations.
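The idea of probabilistic equivalence lends itself to a short simulation. The sketch below is a hypothetical illustration (the population, scores, seed, and sample sizes are all invented): it repeatedly splits the same pool at random and shows that the resulting group differences are centered on zero and bounded by chance alone.

# A minimal sketch of probabilistic equivalence under random assignment.
# The population and measure are hypothetical; the point is that repeated
# random splits of the same pool differ only by chance.
import random
import statistics

random.seed(42)
population = [random.gauss(100, 15) for _ in range(200)]  # hypothetical scores

differences = []
for _ in range(1000):
    shuffled = random.sample(population, len(population))
    group_a, group_b = shuffled[:100], shuffled[100:]
    differences.append(statistics.mean(group_a) - statistics.mean(group_b))

# The mean difference across many random assignments hovers near zero,
# and its spread quantifies how far chance alone can separate the groups.
print(f"mean difference: {statistics.mean(differences):.2f}")
print(f"std of differences: {statistics.stdev(differences):.2f}")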
Explain and give examples of how the particular outcomes of a study can suggest whether a particular threat is likely to have been present.

Trochim and Donnelly (2008) identified five possible outcomes for a research study and the potential internal validity threats associated with each outcome. They noted that the crossover pattern has the “clearest pattern of evidence for the effectiveness of the program of all five of the hypothetical outcomes” (p. 215) and that there are no plausible threats to internal validity. In this pattern the control group does not appear to change from pretest to posttest, while the experimental group scores lower than the control group on the pretest but higher than the control group on the posttest.

Two patterns identified by Trochim and Donnelly (2008) have the control and experimental groups close in score on the pretest, which I will label diverging patterns. In the first diverging outcome pattern, the control group maintains a consistent pretest-posttest score while the treatment group's score increases from pretest to posttest. The most likely internal validity threat in this situation is a selection-history threat, where an event occurs to which the treatment group reacts and the control group does not. The second diverging outcome pattern reflects improvement in scores for both groups between pretest and posttest, but the treatment group improves more. Because of the improvement in scores by both groups, almost all of the multiple-group threats to internal validity are possible, with the exception of selection-regression. Improvement by both groups can indicate selection-maturation, selection-history, selection-testing, and selection-instrumentation as possible threats that affected the scores.

Two patterns identified by Trochim and Donnelly (2008) have the control and experimental groups close in score on the posttest, which I will label converging patterns. In the first converging outcome pattern the two groups begin far apart on pretest scores, with the experimental group much higher than the control group, and end with both groups' posttest scores close together. The second converging outcome pattern is similar, except that the pretest scores are much lower for the experimental group, and on the posttest the control and experimental scores are close together. In both cases the control group remains consistent between pretest and posttest scores, while the experimental group varies greatly and approaches the scores of the control group. The most likely threat to internal validity in the converging scenarios is selection-regression: participants may have been selected because of high or low scores on a measure, and whether or not the treatment had an effect, their later scores are closer to the mean.
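Selection-regression can be made concrete with a short simulation. The following sketch is a hypothetical illustration, not an analysis from the readings: it selects the lowest scorers on a noisy pretest and shows that their retest scores drift back toward the population mean even though no treatment is applied between the two tests.

# A minimal sketch of regression to the mean under extreme-score selection.
# All values are hypothetical; no treatment is applied between "tests".
import random
import statistics

random.seed(1)
true_ability = [random.gauss(100, 10) for _ in range(500)]

def noisy_test(abilities):
    # Each observed score is true ability plus measurement error.
    return [a + random.gauss(0, 10) for a in abilities]

pretest = noisy_test(true_ability)
posttest = noisy_test(true_ability)

# Select the 50 lowest pretest scorers, as a remedial program might.
selected = sorted(range(500), key=lambda i: pretest[i])[:50]

print(f"selected pretest mean:  {statistics.mean(pretest[i] for i in selected):.1f}")
print(f"selected posttest mean: {statistics.mean(posttest[i] for i in selected):.1f}")
# The posttest mean is higher despite no intervention: regression to the mean.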
Describe each of the following types of designs, explain its logic, and explain why the design does or does not address the selection threats discussed in Chapter 7 of Trochim and Donnelly (2008):

a. Nonequivalent control group posttest-only. The nonequivalent control group posttest-only design has at least two nonrandomly selected groups. One of the groups is the experimental group and is administered the new treatment, while the other group is the control group and receives no treatment, or the standard treatment. If the two groups can be shown to be relatively equivalent, the results of this design are very similar to those of a true experiment, and the comparison between the groups on the posttest should reflect the effect of the treatment (Jackson, 2012). The problem with this design is that it provides very little within the design itself to determine whether the groups are roughly equivalent in the first place, meaning that “you can never be sure the groups are comparable” (Trochim & Donnelly, 2008, p. 211). The design is therefore especially susceptible to selection bias: the groups are not randomly selected and may not be equivalent. It is possible in this design for the two groups to have differing history or maturation over the duration of the program that could affect the posttest, but none of the other multiple-group threats to internal validity apply. If participants in each group are not isolated from one another, the social interaction threats to internal validity are also possible: diffusion or imitation of treatment, compensatory rivalry, and resentful demoralization (Trochim & Donnelly, 2008).

b. Nonequivalent control group pretest/posttest. The nonequivalent control group pretest/posttest design has at least two nonrandomly selected groups. One group comprises the experimental group and receives the treatment, while the other group is the control group, receiving no treatment or the standard treatment. Each group receives a measure of the dependent variables at the beginning of the study and an equivalent measure of the dependent variables at the end of the study. The addition of the pretest allows the researcher to compare the groups with respect to the dependent variables to determine whether the groups are roughly equivalent. If the groups are equivalent, the internal validity of the posttest comparison is increased; if they are not, the posttest scores can be statistically adjusted based on the pretest scores.

c. Cross-sectional. A cross-sectional design studies multiple strata of a population at the same time. For example, a cross-sectional design focused on writing development may sample from 7th-, 9th-, and 11th-grade students. The logic behind a cross-sectional study is to collect as much data regarding the strata as quickly as possible. This design often does not adequately address selection bias, since the subjects are by definition stratified around some criterion that distinguishes them from the other strata.

d. Regression-discontinuity. The regression-discontinuity design begins by administering a pretest to all participants and then assigns them to control or experimental groups based on a specific cutoff score. Those with the greatest need for treatment are administered the treatment, while the other group serves as a control. Unlike randomized experiments, this design makes no attempt to equalize the groups. While in other designs nonequivalence damages internal validity, in a regression-discontinuity design it serves to strengthen internal validity. The reason is that the groups are intentionally designed not to be equivalent but instead to have a linear relationship with each other. If at the posttest there is still a single linear relationship between the groups, the treatment had no effect. However, if there is a discontinuity between the groups at the cutoff point, a gap between each group's linear representation of scores, this indicates an effect of the treatment. This design addresses all of the multiple-group threats to internal validity but can still be affected by social interaction threats, which will tend to diminish the effect size.
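The logic of the discontinuity lends itself to a brief simulation. The sketch below is a hypothetical illustration rather than a procedure from the readings: the cutoff, the assumed treatment effect, and all scores are invented, and the fit_line helper is a hand-rolled least-squares fit. A gap between the two fitted lines at the cutoff recovers the assumed effect.

# A minimal sketch of the regression-discontinuity logic with hypothetical data.
import random
import statistics

random.seed(7)
CUTOFF = 50
TREATMENT_EFFECT = 10  # assumed effect added for those below the cutoff

pretest = [random.uniform(20, 80) for _ in range(200)]
posttest = [
    x + random.gauss(0, 3) + (TREATMENT_EFFECT if x < CUTOFF else 0)
    for x in pretest
]

def fit_line(xs, ys):
    # Ordinary least squares slope and intercept for one side of the cutoff.
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

below = [(x, y) for x, y in zip(pretest, posttest) if x < CUTOFF]
above = [(x, y) for x, y in zip(pretest, posttest) if x >= CUTOFF]
slope_b, int_b = fit_line(*zip(*below))
slope_a, int_a = fit_line(*zip(*above))

# The gap between the two fitted lines at the cutoff estimates the effect.
gap = (slope_b * CUTOFF + int_b) - (slope_a * CUTOFF + int_a)
print(f"estimated discontinuity at cutoff: {gap:.1f}")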
Why are quasi-experimental designs used more often than experimental designs?

Quasi-experimental designs are often used when it is not feasible or ethical to conduct a randomized controlled study (Trochim & Donnelly, 2008). For example, in education research the general desire is to know how some treatment affects students in the classroom. In most cases, if an experiment is going to be conducted in a classroom setting, students cannot be randomly assigned by teachers to experimental and control groups. Thus, for the majority of educational studies it is not feasible to randomize. Further, if a treatment is known to be efficacious, it is unethical to withhold that treatment from certain individuals while providing it to others. Much medical research chooses quasi-experimental designs for this reason (Harris et al., 2006). One conclusion that might be reached after completing the readings for this assignment is that there are no bad designs, only bad design choices (and implementations).

State a research question for which a single-group posttest-only design can yield relatively unambiguous findings.

Many training classes use this very design. At the culmination of the training an evaluation is presented to the students, who are asked to fill it out, indicating their satisfaction with various factors regarding the course. A suitable research question would be: How satisfied are students with this training course? Students could not meaningfully rate their satisfaction with, or how much they learned from, a class before taking it, so in this situation a pretest would make little sense.

Part II Assignment Question Answers

What research question(s) does the study address?

Research questions are designed to determine relationships between variables. In this study there were two independent variables, identified as the income level of parents and exposure time to children's TV commercials. One research question for this study could have been: How effective are commercials directed toward pre-pubescent children at generating product recognition? Another research question could have been: What effect does parents' income level have on the amount of time children view TV?

What is Goldberg's rationale for the study? Was the study designed to contribute to theory? Do the results of the study contribute to theory? For both questions: If so, how? If not, why not?

Goldberg's (1990) rationale for conducting the quasi-experiment was three-fold. First, Goldberg identified that “research using an experimental paradigm has tended to support the view that the influence of commercials targeted at children is considerable” (p. 445, emphasis in original) and gave a number of examples. Second, Goldberg demonstrated that “much of the research stream that has utilized a survey research/correlational paradigm to study the effects of advertising on children tends to indicate that advertising has a fairly minor role in influencing measures such as children's preferences” (p. 446, emphasis in original). Third, Goldberg identified that others had “noted the need to combine the two approaches” (p. 446) and proposed a quasi-experiment as a way to have “some of the advantages of both the experimental and survey methods” (p. 446) to determine the effect of advertising directed toward children.

The study was designed to contribute to theory because Goldberg (1990) continued to test whether advertising directed toward children is efficacious. To this point the literature was inconclusive regarding the effect of advertising on children, owing to a divergence between outcomes gathered through experimental versus survey/correlational studies. If advertising geared toward children is ineffectual, companies need to determine a better and more effective use of their advertising dollars. The applicable theory in this study is advertising theory: the assumptions and principles that encourage people to buy products. This framework is subdivided in this study, focusing on children and how commercials directed toward them impact their recognition of and desire to obtain certain products. The results of Goldberg's (1990) study did contribute to theory.
The outcomes indicated that children with minimal to no exposure to commercials targeting them were much less likely to recognize advertised products or to have them in their homes. On the other hand, children with more exposure to advertising were shown to be more likely to recognize certain products and to have them in their homes. This study corroborates much of the experimental research regarding the effectiveness of advertising on children and contributes to advertising theory.

What constructs does the study address? How are they operationalized?

The main construct in Goldberg (1990) is advertising: is advertising to children effective? Advertising as a construct was operationalized in the study through statistics regarding what is advertised to children. Since “the most prevalent product categories advertised to children are toys . . . and foods” (p. 448), these were chosen as a way to determine the effectiveness of the advertisements. Further, one-quarter of the foods marketed toward children are breakfast cereals, so “children's toys and cereals were selected as the focus of the study” (p. 448). A moderating construct was created in the study based on income level. Since some of the children's parents' income levels were unavailable, this construct was operationalized by collecting the children's mothers' and fathers' occupations. Another construct utilized in the study was exposure time to French-language and English-language TV stations, determined by using a survey asking which shows were watched and when. By calculating how often shows were watched, the researchers were able to “estimate the total number of [American children's commercial TV] programs each child watched during the year” (p. 448).

What are the independent and dependent variables in the study?

The independent variables in Goldberg's (1990) study were “cultural affiliation as defined by the language spoken” (p. 448), “the total number of ACTV programs each child watched during the year” (p. 448), and family income as differentiated by parental occupations. Goldberg's dependent variables consisted of recognition of toys and breakfast cereals that were marketed on American TV.

Name the type of design the researchers used.

Goldberg (1990) used a nonequivalent control group posttest-only quasi-experimental design for the main focus of this study.

What internal and external validity threats did the researchers address in their design? How did they address them? Are there threats they did not address? If so, how does the failure to address the threats affect the researchers' interpretations of their findings? Are Goldberg's conclusions convincing? Why or why not?

By conducting each group's survey at one time and gathering surveys from different socio-economic groups in disparate locations, the researchers eliminated or minimized the social interaction threats to internal validity. The study consisted of multiple groups, eliminating the single-group threats to internal validity. With multiple groups there is only a single threat to internal validity: that of selection bias (Jackson, 2012). By administering the surveys at one time for each group and not having a pretest, the selection threats of testing, instrumentation, mortality, and regression were eliminated. With all surveys for all groups taken within a two-week period, the selection-maturation threat was minimized or eliminated. The only remaining threat to internal validity is history; there could have been some event that occurred between the surveys taken in the schools and the surveys taken in the camps that affected the later collection. Goldberg (1990) addressed the threat of history by using a t-test to determine whether there were any significant differences between the groups with regard to the dependent measures, and there were not.
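A sketch of the kind of two-sample t-test Goldberg describes appears below. The scores are hypothetical stand-ins rather than Goldberg's data; the test simply asks whether the two collections differ significantly on a dependent measure, with a nonsignificant result arguing against a history threat.

# A minimal sketch of a two-sample t-test like the equivalence check
# Goldberg (1990) describes. The scores below are hypothetical stand-ins.
from scipy import stats

school_scores = [4, 6, 5, 7, 5, 6, 4, 5]  # dependent measure, school collection
camp_scores = [5, 5, 6, 4, 6, 5, 5, 6]    # dependent measure, camp collection

t_stat, p_value = stats.ttest_ind(school_scores, camp_scores)
# A large p-value gives no evidence of a history-related difference
# between the two collections.
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")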
External validity addresses the generalizability of a study. For this study the major threat to external validity was that the groups were not randomly assigned to the control and experimental conditions. Goldberg (1990) raised the issue that an alternative explanation for the results may be “due to cultural differences between the two groups” (p. 453). Goldberg attempted to address this issue by comparing the dependent variables with the independent variables within groups, finding that “at comparable levels of ACTV viewing, English- and French-speaking children purchased equivalent numbers of children's cereals” (p. 453). He admits, however, that with regard to the recognition of toys, “other factors . . . may have contributed to the observed difference in toy awareness levels between English- and French-speaking children” (p. 453).

Goldberg's (1990) conclusions are reasonably convincing, but there are limitations. The study shows that there is a difference between the language groups, and his rationale regarding the children's access to commercials geared toward them is appealing. A major weakness in the overall conclusion is that he compares children only within the single province of Quebec and admits that differences in recognition could be the result of cultural factors. He also does not proceed to the next logical step of showing whether recognition of toys translates into buying them. I am therefore willing to accept that advertising directed toward children is effective at garnering brand recognition, but I cannot find support for the claim that the legislation “appears to have reduced consumption of those cereals” (p. 453). Nowhere in his study did Goldberg have provisions for comparing current consumption with past consumption. Further, his conclusion that reduced exposure to toy commercials “would leave children unaware of the toys and thus less able to pressure their parents to buy them” (p. 453) is not warranted by his study, since he tested recognition only and failed to determine whether this recognition translated into “pressure” on parents to buy more toys.

References

de Anda, D. (2007). Intervention research and program evaluation in the school setting: Issues and alternative research designs. Children & Schools, 29(2), 87-94. Retrieved from ERIC database. (EJ762838)

Dimitrov, D. M., & Rumrill, P. D., Jr. (2003). Pretest-posttest designs and measurement of change. Work, 20(2), 159-165.

Ellis, T. J., & Levy, Y. (2011). Framework of problem-based research: A guide for novice researchers on the development of a research-worthy problem. Informing Science: The International Journal of an Emerging Transdiscipline, 11(1), 17-33. Retrieved from http://inform.nu/Articles/Vol11/ISJv11p017-033Ellis486.pdf

Goldberg, M. E. (1990). A quasi-experiment assessing the effectiveness of TV advertising directed to children. Journal of Marketing Research, 27(4), 445-454. Retrieved from http://www.jstor.org/stable/3172629

Greenhoot, A. F. (2003). Design and analysis of experimental and quasi-experimental investigations. In M. C. Roberts & S. S. Ilardi (Eds.), Handbook of research methods in clinical psychology (pp. 92-114). doi:10.1002/9780470756980.ch6

Harris, A., McGregor, J. C., Perencevich, E. N., Furuno, J. P., Zhu, J., Peterson, D. E., & Finkelstein, J. (2006). The use and interpretation of quasi-experimental studies in medical informatics. Journal of the American Medical Informatics Association, 13(1), 16-23. doi:10.1197/jamia.M1749

Hoadley, C. (2007). Learning sciences theories and methods for e-learning researchers. In R. Andrews & C. Haythornthwaite (Eds.), The SAGE handbook of e-learning research (pp. 139-156). Los Angeles, CA: SAGE.

Jackson, S. L. (2012). Research methods and statistics: A critical thinking approach (4th ed.). Belmont, CA: Wadsworth Cengage Learning.

Kratochwill, T. R., & Levin, J. R. (2010). Enhancing the scientific credibility of single-case intervention research: Randomization to the rescue. Psychological Methods, 15(2), 124-144. doi:10.1037/a0017736

Qumtin, E., Bélanger, C., & Lamontagne, V. (2012). A single-case experiment for an innovative cognitive behavioral treatment of auditory hallucinations and delusions in schizophrenia. International Journal of Psychological Studies, 4(1), 114-121. doi:10.5539/ijps.v4n1p114

Trochim, W. M. K., & Donnelly, J. P. (2008). The research methods knowledge base (3rd ed.). Mason, OH: Cengage Learning.