H 615 Week 4 Comments/Critiques

The readings cover two types of study designs considered stronger than the quasi-experimental designs previously discussed, yet still limited with respect to causal inferences when compared to randomized experiments. The basic form of the interrupted time-series (ITS) design requires one treatment group with many observations before and after treatment. Other quasi-experimental design features can be added to the basic design to strengthen causal inferences. The regression discontinuity (RD) design involves preassignment to treatment or control based on a cutoff score or pre-program measure, making it an appealing design for researchers who wish to target a program/treatment to those in greatest need. ITS and RD designs are similar in that the effect occurs at a specific point on a continuum, where time is the continuum for ITS and the assignment variable for RD (Shadish et al., 2002). Both designs are improved when combined with a randomized controlled or group-randomized trial (RCT/GRT). For example, Biglan et al. (2000) suggest the multiple baseline design (MBD) be utilized to identify potentially effective interventions, followed by RCTs to test efficacy and generalizability. Similarly, Shadish et al. (2002) suggest RD designs be combined with randomized experiments to increase the power and ethics of both designs.

I feel like I come back to this idea every week, but reading about the time-series and regression-discontinuity designs made me think again about the importance of being a good, thorough researcher. Time-series and RD designs force the researcher to know, and address, as many of the possible validity threats as possible. But it also means they must know when the effect should take place and who should be included in the treatment groups. For time-series designs, a policy change in January may not mean an effect will be seen in February (or maybe even the following January).
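The basic ITS logic described above (one group, many observations before and after the interruption) is commonly analyzed with segmented regression, testing for a change in level and slope at the interruption point. The following is a minimal sketch on simulated data; the monthly series, the 24/24 split, and the -5.0 level drop are all hypothetical numbers chosen for illustration, not values from the readings.

```python
import numpy as np

# Hypothetical interrupted time-series (ITS) analysis via segmented
# regression: 24 monthly observations before and 24 after an intervention.
rng = np.random.default_rng(0)

n_pre, n_post = 24, 24
t = np.arange(n_pre + n_post)                 # time index
post = (t >= n_pre).astype(float)             # 1 after the interruption
t_since = np.where(post == 1.0, t - n_pre, 0) # time elapsed since intervention

# Simulated outcome: baseline level 50, upward trend, -5.0 level drop
y = 50 + 0.2 * t - 5.0 * post + rng.normal(0, 1, t.size)

# Segmented regression: y = b0 + b1*t + b2*post + b3*t_since
X = np.column_stack([np.ones(t.size), t, post, t_since])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
level_change, slope_change = beta[2], beta[3]
print(round(level_change, 2))  # estimated level shift, near the true -5.0
```

The point of the many pre-intervention observations is visible here: they pin down the baseline trend (b1), so the level-change coefficient is not confounded with ordinary drift in the series.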
For RD designs, a researcher must have strong confidence that a 70 percent on a college entrance exam, for example, is the cutoff point below which students would need additional support to continue enrollment. This also goes back to programs of research: a researcher might not know during the first study that 70% is the appropriate cutoff mark. Maybe it is 60% or 80%. This finally drives me back again to policy: how do political forces affect the implementation or effectiveness of programs that we examine using these two designs?

The Biglan et al. article, to me, was the most interesting. It addressed a viable experimental option when an RCT is not feasible. I think it would be interesting to discuss the pros and cons of how each group is first a control and then a treatment. Biglan's article relates to public health and how the time-series experiment can be used to develop and evaluate community interventions. Particularly relevant is his framing of the time-series experiment as an initial step and a complement to the RCT. Using this experimental approach in the context of a community-based participatory model would be quite powerful. I think this would address Rhoda's concerns about the confounding factors of community effects, etc. In fact, participation may improve efforts to address internal and external validity while also having the benefit of communities helping other communities achieve similar effects (a grassroots effort in the making). After the principal factors have been identified and adjustments have been made to the intervention, it would be prudent to utilize an RCT to assess its efficacy and effectiveness. This complementary process seems to me to be the basis of a well-designed RCT experiment that enhances both internal and external validity. ITS, RD, and MBD designs are especially relevant when assessing the sustainable impacts of programs and policies.
The readings advocate for these designs, noting the cost and frequent impracticality of RCTs, the importance of identifying key principles that connect independent variables to outcomes, and the distribution of effective interventions to vulnerable populations. Although ITS and RD can supplement quasi-experimental or experimental designs, as stand-alone designs is randomization less relevant, because sampling is purposive and treatment groups can serve as their own control? Biglan et al. assert there is a need to understand the contextual principles that lend themselves to the applicability of interventions across communities; could ITS or RD designs substitute for qualitative work, such as the case study, when resources are limited? Importantly, when findings lack generalizability, it is not necessarily a failure of the study but a "clue" that the moderating influences of particular communities need to be better understood in order to demonstrate similar effects of the intervention. Rhoda et al. assert that confounding factors may muddle causal inference, given investigators' inability to control many aspects of community-based research, and although the readings were compelling in the application of these designs, when causal inference is of utmost importance, RCTs remain supreme.

The readings for this week reemphasized the complexities of design choice by introducing even more options. While time-series designs were fairly straightforward, I had some difficulty digesting the regression discontinuity design. As mentioned in the book chapters and the Trochim (2006) and Pennell et al. (2010) articles, a major benefit of the RDD is that it allows the treatment to be offered based on need, thus providing an ethical advantage over RCTs. These readings also understandably note that to achieve comparable statistical power, an RDD needs a sample size about 2.75 times that necessary for an RCT, hence reducing efficiency.
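The 2.75 figure mentioned above can be reproduced with a small simulation. The intuition: when treatment is assigned strictly at a cutoff, the treatment indicator is highly collinear with the assignment variable the analysis must adjust for, inflating the variance of the treatment-effect estimate relative to random assignment. This sketch assumes (hypothetically) a standard-normal assignment variable with the cutoff at its mean; the result will differ for other distributions and cutoff placements.

```python
import numpy as np

# Sketch of why a sharp-cutoff RDD needs roughly 2.75x the sample of an
# RCT for comparable power. Assumptions (hypothetical): standard-normal
# assignment scores and a cutoff at the mean/median.
rng = np.random.default_rng(1)
x = rng.normal(size=200_000)      # assignment variable
d = (x >= 0).astype(float)        # treatment assigned strictly at cutoff

# Collinearity between treatment and the covariate the model adjusts for
r2 = np.corrcoef(d, x)[0, 1] ** 2
vif = 1.0 / (1.0 - r2)            # variance inflation vs. randomization
print(round(vif, 2))              # approximately 2.75
```

The variance inflation factor translates directly into the sample-size multiplier: to recover the standard error an RCT achieves with n participants, the RDD needs roughly vif * n.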
But what about the efficiency of allocating resources, given the deliberate assignment to treatment and control based on need in an RDD? On the other hand, how do we deal with the possibility that those determined to be most in need actually are not, or that the cutoff is exclusionary? When, if ever, does clinical importance/necessity take precedence over statistical significance/power? Though the readings touched on this a bit, I think it would be useful to further discuss when/how it would be appropriate to combine RDDs and RCTs for optimal power and solid ethics.

Chapters 6 & 7 of the SCC text introduce interrupted time series (ITS) and regression discontinuity (RD) designs, in addition to an array of elements that may strengthen each with regard to the plausibility of causal inference. The Biglan et al. article discusses why/how two types of ITS designs (i.e., across- and within-case multiple baseline designs [MBD]) could be used in community-based prevention research, whereas Rhoda et al. (2011) compare the MBD (with emphasis on the stepped wedge design [SWD], similar to the across-cases MBD) to group randomized trials (GRT). Pennell et al. (2011) describe how/why the RD design including a randomization interval (for group assignment) could be extended to the cutoff group randomized trial (CO-GRT) design. As Rhoda et al. point out in their discussion of SWD versus GRT, the array of research designs described across these readings may overlap according to how they are specified. One of many questions that arose for me across these readings is: what would be a feasible/socially acceptable example of the last RD/experimental design described by Pennell et al. (i.e., making the probability of treatment assignment vary continuously with the level of a baseline covariate) for a community-based research study? Is this ever done?

Biglan et al.
argue time-series designs are more affordable than RCTs, but they go on to describe how multiple time series are useful for identifying the mechanisms that make a program successful in subgroups of the population, which could be quite costly too. They also claim that grouping time series into the quasi-experimental category will make people think (1) less of time series or (2) that any quasi-experiment can validate an intervention. If that is the case, there should be more emphasis on educating researchers about different methodologies if they cannot determine the validity differences between time-series and single-group posttest-only designs. Furthermore, they advise against randomizing program delivery in favor of delivering when there is a steady state of the variable of interest, which contradicts SCC and Rhoda, who mention randomization as a way to improve the time-series design. For the RDD, it is interesting that all the readings credit this design with being more ethically sound, when there may be biases in the selection of the cutoff process and programs are still being kept from people who could benefit from them. I'm not sure that I would say it is on higher moral ground, just different moral ground.

The time-series (TS) and regression discontinuity (RD) research designs are quasi-experiments that are exceptionally good at modeling a treatment's counterfactual. However, Rhoda et al. and Pennell et al. argue that these designs are not as effective or efficient as RCT experiments in detecting effect sizes. The crucial deciding factor in whether to use these designs will therefore be feasibility. For the TS, the researcher must ask whether it is feasible to find a randomized sample of the population of interest; if a researcher wants to study community-wide patterns as Biglan et al. promote, an RCT might be infeasible. For community interventions, the TS could be combined with focus group and interview methods to capture multiple levels of effects and potential moderators.
If an RCT study of an intervention is desired but there are ethical concerns about denying treatment to certain groups, an RD study is useful. In either case, the interventionist might want to use mixed methods, or consider the study a single step in a program of research. The TS can provide general preliminary findings to inform more specified studies, while the RD can serve to confirm previous hypotheses about treatment in a more controlled environment.

With the time-series design, we are reminded again of the critical role of temporality in establishing causal inference. More than in any other design discussed so far, the fundamental assumption of exposure preceding outcome in determining causality is particularly evident with this design. The close attention to details of time helps to reduce potential historical bias and further improve internal validity. However, the realities of research today (particularly for the fledgling behavioral scientist) do not allow the luxury of the long swathes of time (or boundless funding) required to collect data firsthand. A more realistic approach might be to take advantage of meticulously kept archives whenever possible. These would still allow for the multiple pre-treatment observations that are necessary for this type of design, while (hopefully) being much less capital intensive. Similar to the time-series design, in its likeness to a scale with intervals, the regression-discontinuity design seems to be a middle ground between randomized-experimental and quasi-experimental designs, allowing for both high internal validity and practicality (i.e., ease of implementation). This is a huge advantage in behavioral science, where issues of ethics may preclude the utilization of, admittedly, more rigorous experimental designs. In fact, it seems to be generally agreed that, when done right, this design can be just as powerful as the randomized experiment in making deductions about causality.
After reading through this week's chapters and articles, I found myself wondering whether studies that apply different designs can be cross-examined. I understand the concept of a meta-analysis, but if the study designs vary, is it truly feasible to compare results and compile information about the outcomes measured? I presume the answer is rarely simple, even for those in the field of research, but as a novice researcher I often find I draw on such basic questions to generate an understanding of the material. As research evolves, for instance with the recent emphasis on community-based intervention research, do we continue to throw out previously highly recognized study designs? If so, what does the literature then tell us about trends if designs fluctuate in value or recognition? The regression discontinuity design assigns treatment according to level of need, such that individuals most in need are placed in the treatment group. Is this not a form of bias? What about attrition in the control group? If subjects learn that they did not qualify for the treatment, could we anticipate a higher dropout rate in the control group, perhaps based on subjective reasoning (e.g., the subject or group does not feel important enough to receive the treatment)?

For no particular reason, except that the designs did not (generally appear to) utilize randomization in this week's readings, I lumped them together in my mind with quasi-experimental designs. That was before I read (and later began to understand) the heated arguments about how these designs ought to be designated. In comparison to Shadish, Cook, and Campbell's outright enthusiasm about interrupted time series, I was surprised that Biglan and colleagues devoted almost an entire page to differentiating interrupted time series from quasi-experimental designs in a very negative light.
The fear in that article that policymakers would think that "any quasi-experimental design provides sufficient evidence" (page 43) for a given program reminded me of Shonkoff's work on translating study results for practitioners and policymakers, and the caution required of a researcher when phrasing results for audiences without scientific training. Perhaps Biglan's outrage is evidence that thoughtfully designing a study and skillfully matching it with appropriate statistical analyses is no more important than being able to accurately translate one's findings into information for other audiences.

SCC posit that the interrupted time series is a powerful design for allowing causal inferences in quasi-experiments. Although statistical modeling suggests at least 100 time points, in practice models can be built with far fewer. This design is further strengthened by the addition of design elements covered in other chapters that reduce threats to various types of validity. Another design that SCC see as particularly strong among quasi-experimental designs is the regression discontinuity design, where individuals are not randomly assigned to treatment or control groups but are assigned based on a cutoff point on a continuous variable that is not affected by the treatment. The RDD has certain commonalities with the interrupted time-series design, such as that the effect of treatment usually impacts the slope or intercept. Both of these designs are considered good alternatives to gold-standard randomized trials when resources are limited or when it would be unethical to randomize and treat. Researchers interested in community interventions are pushing for more use of such designs and for recognition that these designs are not equal to other quasi-experimental designs in their ability to allow generation of causal inferences (Biglan, Ary & Wagenaar, 2000).
Interrupted time series (ITS) and regression discontinuity designs (RDD) allow health intervention researchers to closely approximate the results found in randomized controlled trials without the requirement of randomization. Shadish, Cook, & Campbell (2002) recommend using the ITS design when longitudinal data are available for a population prior to an intervention, during the intervention, and after the intervention. This type of study design, as well as the RDD, allows researchers to utilize readily available statistical analysis packages when interpreting causation. Biglan, Ary, & Wagenaar (2000) refer to the financial benefits of using ITS designs, which is important when using public funds to conduct research. Shadish et al. (2002) explain that the RDD is commonly used in educational interventions, but it can also be used in health behavior and health promotion (HBHP). An example of its use in HBHP would be pretesting BMI in a group of people, giving those with a BMI of 30 or above an obesity-related intervention and those below 30 no intervention, and then comparing the BMIs of both groups after the intervention. Pennell, Hade, Murray, & Rhoda (2011) also support the use of the RDD in community health intervention research, since the results can be more generalizable.

Randomized controlled trials are considered the standard for all tests. However, there are situations where multiple baseline designs, interrupted time series, or regression discontinuity are preferable for the experiment. Biglan mentioned randomized trials as the most effective, but noted they are not appropriate if cost is an issue. Pennell stated random trials might deny benefits to some groups, which I could see happening to inaccessible groups. SCC and Biglan reference the interrupted time series as a strong alternative when random trials are not possible and time-series data are available.
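The BMI cutoff example above can be expressed as a simple sharp-cutoff analysis: everyone at or above the cutoff gets the intervention, and the effect is estimated while controlling for the centered assignment variable. Everything in this sketch is simulated and hypothetical (the -1.5 BMI effect, the sample size, the linear model); it only illustrates the assignment rule and the standard adjustment.

```python
import numpy as np

# Hypothetical sketch of the BMI regression-discontinuity example:
# baseline BMI >= 30 receives the intervention; the analysis regresses
# follow-up BMI on treatment plus the centered assignment variable.
rng = np.random.default_rng(2)
n = 2000
bmi_pre = rng.normal(29, 4, n)               # baseline BMI (assignment variable)
treated = (bmi_pre >= 30).astype(float)      # assignment strictly by cutoff

# Simulated follow-up: tracks baseline; intervention lowers BMI by 1.5
bmi_post = 2 + 0.9 * bmi_pre - 1.5 * treated + rng.normal(0, 1, n)

# RD analysis: control for the centered assignment variable
X = np.column_stack([np.ones(n), bmi_pre - 30, treated])
beta, *_ = np.linalg.lstsq(X, bmi_post, rcond=None)
effect = beta[2]
print(round(effect, 2))  # estimated treatment effect, near the true -1.5
```

Note that a naive post-test comparison of the two groups would be badly biased here, because the treated group starts with higher BMI by construction; including the assignment variable in the model is what makes the cutoff design interpretable.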
Effective interventions to curb substance abuse, improve consumer safety, and reduce alcohol use are examples of interrupted time-series designs. SCC say regression discontinuity involves a cutoff score and is most powerful when the cutoff is at the mean of the assignment variable. Despite the advantages of the alternative designs, I found that Biglan, Rhoda, and SCC collectively comment on how these designs are less effective than RCTs in drawing causal inferences. What I appreciated about the message from these readings is that, as future professionals in prevention incorporating these designs (whichever one) into programs, it is not just about whether they work (which we sometimes emphasize excessively), but rather about learning the precision, scope, and depth of the interacting variables.

In arguing whether the MBD is acceptable/superior to GRTs, an important consideration is the burden of participation. In the MBD, the number of times a measurement is taken is much higher than in a GRT, and depending on the type of measurement being taken, attrition may increase due to participant fatigue. If the measurement involves, for instance, a blood test, great care must be taken to minimize the frequency of measurement. Furthermore, the problems of instrumentation and recency effects arise when measures are used many times over several weeks or months. Rhoda and colleagues ignore this as a limitation and only discuss prohibitive cost as an issue for GRTs, but the high cost of frequent measurement presents a challenge here as well. Biglan accuses RCTs of being expensive in comparison, but offers only a series of examples of expensive studies, with no evidence that these studies are representative of community RCTs. Depending on the dependent variable, an RDD may be a beneficial design for addressing the cost of frequent measurement while still administering the intervention to the populations most in need.
This, of course, carries the problem that results may not be generalizable to populations scoring below the cutoff on the DV.