H615 Week 4 Comments

H 615 Week 4 Comments/Critiques
The readings cover two types of study designs considered stronger than the quasi-experimental designs previously
discussed, yet still limited with respect to causal inferences when compared to randomized experiments. The basic
form of the Interrupted-Time Series (ITS) design requires one treatment group with many observations before and
after treatment. Other quasi-experimental design features can be added to the basic design to strengthen causal
inferences. The Regression Discontinuity (RD) design involves preassignment to treatment or control based on a
cutoff score or pre-program measure, making it an appealing design for researchers who wish to target a
program/treatment to those in greatest need. ITS and RD designs are similar in that the effect occurs at a specific
point on a continuum, where time is the continuum for ITS and the assignment variable for RD (Shadish et al., 2002). Both designs are improved when combined with a Randomized Controlled or Group-Randomized Trial (RCT/GRT). For example, Biglan et al. (2000) suggest that the multiple baseline design (MBD) be utilized to identify potentially effective interventions, followed by RCTs to test efficacy and generalizability. Similarly, Shadish et al. (2002) suggest that RD designs be combined with randomized experiments to increase the power and ethics of both designs.
I feel like I come back to this idea every week, but reading about the time series and regression-discontinuity designs made me think again about the importance of being a good, thorough researcher. Time series and RD designs force the researcher to know, and address, as many of the possible validity threats as possible. But it also means they must know when the effect should take place and who should be included in the treatment groups. For time series designs, a policy change in January may not mean an effect will be seen in February (or maybe even the following January). For RD designs, a researcher must have strong confidence that a 70 percent on a college entrance exam, for example, is the cutoff point below which students would need additional support to continue enrollment. This also goes back to programs of research in a way. A researcher might not know during the first study that 70% is the appropriate cutoff mark. Maybe it is 60% or 80%. This finally drives me back again to policy. How do political forces affect the implementation or effectiveness of programs that we examine using these two designs?
The Biglan et al. article, to me, was the most interesting. It addressed a viable experimental option when an RCT is not feasible. I think it would be interesting to discuss the pros and cons of how each group is first a control and then a treatment. Biglan’s article relates to public health and how the time-series experiment can be used to develop and evaluate community interventions. Particularly relevant is his framing of the time-series experiment as an initial step and a complement to the RCT. Using this experimental approach in the context of a community-based participatory model would be quite powerful. I think this would address Rhoda’s concerns around the confounding factors of community effects, etc. In fact, participation may not only improve efforts to address internal and external validity but also have the benefit of communities helping other communities achieve similar effects (a grassroots effort in the making). After the principal factors have been identified and adjustments have been made to the intervention, it would be prudent to utilize an RCT to assess its efficacy and effectiveness. This complementary process seems to me to be the basis of a well-designed RCT experiment that enhances both internal and external validity.
ITS, RD, and MBD designs are especially relevant when assessing the sustainable impacts of programs and policies. The readings advocate for these designs, noting the cost and frequent impracticality of RCTs, the importance of identifying key principles that connect independent variables to outcomes, and the distribution of effective interventions to vulnerable populations. Although ITS and RD can supplement quasi-experimental or experimental designs, as stand-alone designs is randomization less relevant, because sampling is purposive and treatment groups can serve as their own controls? Biglan et al. assert there is a need to understand the contextual principles that lend themselves to the applicability of interventions across communities – could ITS or RD designs substitute for qualitative work, such as the case study, when resources are limited? Importantly, when findings lack generalizability, it is not necessarily a failure of the study, but a “clue” that the moderating influences of particular communities need to be better understood to demonstrate similar effects of the intervention. Rhoda et al. assert that confounding factors may muddle causal inference, given investigators’ inability to control many aspects of community-based research; and although the readings were compelling in the application of these designs, when causal inference is of utmost importance, RCTs remain supreme.
The readings for this week reemphasized the complexities of design choice by introducing even more options. While time-series designs were fairly straightforward, I had some difficulty digesting the regression discontinuity designs. As mentioned in the book chapters, and the Trochim (2006) and Pennell et al. (2010) articles, a major benefit of the RDD is that it allows the treatment to be offered based on need, thus providing an ethical advantage over RCTs. These readings also understandably note that to achieve comparable statistical power, an RDD needs a sample size about 2.75 times that necessary for an RCT, hence reducing efficiency. But what about the efficiency of allocating resources given the deliberate assignment to treatment and control based on need in an RDD? On the other hand, how do we deal with the possibility that those determined to be most in need are actually not, or that the cutoff is exclusionary? When, if ever, does clinical importance/necessity take precedence over statistical significance/power? Though the readings touched on this a bit, I think it would be useful to further discuss when/how it would be appropriate to combine RDDs and RCTs for optimal power and solid ethics.
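The 2.75 figure lends itself to a quick back-of-the-envelope calculation. A minimal sketch, assuming the inflation factor applies directly to total sample size (the helper function and example numbers here are illustrative, not from the readings):

```python
import math

# Assumption taken from the readings: an RDD needs roughly 2.75x the
# sample of an RCT for comparable statistical power (Pennell et al.).
RDD_INFLATION = 2.75

def rdd_sample_size(rct_n: int) -> int:
    """Approximate RDD sample size matching the power of an RCT of size rct_n."""
    return math.ceil(rct_n * RDD_INFLATION)

# A hypothetical RCT of 200 participants would need about 550 in an RDD.
print(rdd_sample_size(200))  # 550
```

Seen this way, the efficiency question in the paragraph above is concrete: the design buys targeted allocation at the cost of nearly tripling recruitment.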
Chapters 6 & 7 of the SCC text introduce interrupted time series (ITS) and regression discontinuity (RD) designs in
addition to an array of elements that may strengthen each with regard to the plausibility of causal inference. The
Biglan et al. article discusses why/how two types of ITS designs (i.e., across- and within-case(s) multiple baseline designs [MBD]) could be used in community-based prevention research, whereas Rhoda et al. (2011) compare the MBD (with emphasis on the stepped wedge design [SWD], similar to the across-cases MBD) to group randomized trials (GRT). Pennell et al. (2011) describe how/why the RD design including a randomization interval (for group assignment) could be extended to the cutoff group randomized trial (CO-GRT) design. As Rhoda et al. point out in
their discussion of SWD versus GRT, the array of research designs described across these readings may overlap
according to how they are specified. One of many questions that arose for me across these readings is: what
would be a feasible/socially-acceptable example of the last RD/experimental design described by Pennell et al.
(i.e. making the probability of treatment assignment vary continuously with the level of a baseline covariate) for
a community-based research study? Is this ever done?
Biglan et al. argue time-series designs are more affordable than RCTs, but go on to describe how multiple time-series are useful for identifying the mechanisms that make a program successful in subgroups of the population, which could be quite costly too. They go on to claim that grouping time-series designs into the quasi-experimental category will make people think (1) less of time-series designs or (2) that any quasi-experiment can validate an intervention. If that is the case, there should be more of an emphasis on educating researchers about different methodologies if they can’t determine the validity differences between time-series and single-group posttest-only designs. Furthermore, they advise against randomizing program delivery in favor of delivery when there is a steady state of the variable of interest, which contradicts SCC and Rhoda, who mention randomization as a way to improve the time-series design. For RDD it is interesting that all the readings credit this design with being more ethically sound when there may be biases in the selection of the cut-off process and programs are still being kept from people who could benefit from them. I’m not sure that I would say it is on higher moral ground, just different moral ground.
The Time Series (TS) and Regression Discontinuity (RD) research designs are quasi-experiments which are
exceptionally good at modeling a treatment’s counterfactual argument. However, Rhoda et al. and Pennell et al.
argue that these designs are not as effective or efficient as RCT experiments in detecting effect sizes. The crucial
deciding factor in whether to use these designs will therefore be feasibility. For the TS, the researcher must ask
whether it is feasible to find a randomized sample of the population of interest – if a researcher wants to study
community-wide patterns as Biglan et al. promote, an RCT might be infeasible. For community interventions, the
TS could be combined with focus group and interview methods to capture multiple levels of effects and potential
moderators. If an RCT study of an intervention is desired but there are ethical concerns about denying treatment
to certain groups, an RD study is useful. In either case, the interventionist might want to use mixed methods, or
consider this study as a single step in a program of research. The TS can provide general preliminary findings to
inform more specified studies, while the RD can serve to confirm previous hypotheses about treatment in a more
controlled environment.
With the time series design, we are reminded again of the critical role of temporality in establishing causal
inference. More than any other design so far discussed, the fundamental assumption of exposure preceding
outcome in determining causality is particularly evident with this design. The close attention to details of time
helps to reduce potential historical bias and further improve internal validity.
However, the realities of research today (particularly for the fledgling behavioral scientist) do not allow
the luxury of long swathes of time (or boundless funding) required to collect data firsthand. A more realistic
approach might be to take advantage of meticulously kept archives whenever possible. These would still allow for
the multiple pre-treatment points that are necessary for this type of design, while (hopefully) being much less
capital intensive.
Similar to the time-series design in its likeness to a scale with intervals, the regression-discontinuity design seems to be a middle ground between randomized-experimental and quasi-experimental designs, allowing for both high internal validity and practicality (i.e., ease of implementation). This is a huge advantage in behavioral science, where issues of ethics may preclude the utilization of, admittedly, more rigorous experimental designs. In fact, it seems to be generally agreed upon that when done right, this design can be just as powerful as the randomized experiment in making deductions about causality.
After reading through this week’s chapters and articles, I found myself wondering whether studies that apply different designs can be cross-examined. I understand the concept of a meta-analysis study, but if the study designs vary, is it truly feasible to compare results and compile information about the outcomes measured? I presume that the answer is rarely simple, even for those well versed in research, but as a novice researcher I often find myself drawing on such basic questions to build an understanding of the material.
As research evolves (for instance, with the recent emphasis on community-based intervention research), do we continue to throw out previously highly recognized study designs? If so, what does the literature then tell us about trends if designs fluctuate in value or recognition? The regression discontinuity design assigns according to level of need, in which individuals most in need are placed in the treatment group. Is this not a form of bias? What about attrition in the control group? If subjects learn that they did not qualify for the treatment, could we anticipate a dropout rate for the control group, perhaps based on subjective reasoning (e.g., the subject or group does not feel important enough to receive the treatment)?
For no particular reason, except that the designs didn’t (generally appear to) utilize randomization in this week’s
readings, I lumped them in my mind together with quasi-experimental designs. That was before I read (and later
began to understand) the angsty arguments about how these designs ought to be designated. In comparison to
Shadish, Cook and Campbell’s outright enthusiasm about interrupted time-series, I was surprised that Biglan and
colleagues devoted almost an entire page to differentiating interrupted time-series from quasi-experimental
designs in a very negative light. The fear in that article that policymakers would think that “any quasi-experimental
design provides sufficient evidence” (page 43) for a given program reminded me of Shonkoff’s work on translating
study results to practitioners and policymakers and the caution required of a researcher when phrasing results for
audiences without scientific training. Perhaps Biglan’s outrage is evidence that thoughtfully designing a study and
skillfully matching it with appropriate statistical analyses is not more important than being able to accurately
translate one’s findings into information for other audiences.
SCC posit that the interrupted time series is a powerful design for supporting causal inferences in quasi-experiments. Although statistical modeling suggests at least 100 time points, in practice models can be built with far fewer. This design is further strengthened by the addition of design elements covered in other chapters that reduce threats to various types of validity. Another design that SCC see as a particularly strong quasi-experimental design is the regression discontinuity design, in which individuals are not randomly assigned to treatment or control groups but are assigned based on a cut-off point on a continuous variable that is not caused by the treatment. RDD has certain commonalities with the interrupted time series design, such as the treatment effect usually appearing as a change in slope or intercept. Both of these designs are considered good alternatives to the gold-standard
randomized trials when resources are limited or when it would be unethical to randomize and treat. Researchers
interested in community interventions are pushing for more use of such designs and for the recognition that these
designs are not equal to other quasi-experimental designs in their ability to allow generation of causal inferences
(Biglan, Ary & Wagenaar, 2000).
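The slope-or-intercept effect described above can be illustrated with a simple segmented regression on simulated data. This is a generic sketch, not code from the readings; the simulated numbers, variable names, and the assumption of a single known interruption point are all ours:

```python
import numpy as np

# Simulate 48 monthly observations with an intervention at month 24.
rng = np.random.default_rng(0)
n, cut = 48, 24
t = np.arange(n)
post = (t >= cut).astype(float)   # indicator: 1 after the interruption
t_post = post * (t - cut)         # time elapsed since the interruption
# True model: baseline level 10, trend 0.2, level jump 3.0, slope change 0.5
y = 10 + 0.2 * t + 3.0 * post + 0.5 * t_post + rng.normal(0, 0.5, n)

# Segmented regression: intercept, baseline slope, change in level,
# and change in slope at the interruption point.
X = np.column_stack([np.ones(n), t, post, t_post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # estimates should land near [10, 0.2, 3.0, 0.5]
```

The coefficients on `post` and `t_post` correspond directly to the level and slope changes that SCC describe as the typical signatures of a treatment effect in an ITS design.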
Interrupted time series (ITS) and regression discontinuity designs (RDD) allow health intervention researchers to
closely approximate the results found in randomized control trials without the requirement of randomization.
Shadish, Cook, & Campbell (2002) recommend ITS design use when longitudinal data is available for a population
prior to an intervention, during an intervention, and after the intervention. This type of study design, as well as
RDD, allows researchers to utilize readily available statistical analysis packages when interpreting causation. Biglan,
Ary, & Wagenaar (2000) refer to the financial benefits of using ITS designs, which is important when using public
funds to conduct research. Shadish et al. (2002) explain RDD is commonly used in educational interventions, but
can also be used in health behavior and promotion (HBHP). An example of its use in HBHP would be doing a pretest of BMI on a group of people, giving those with a BMI of 30 or above an obesity-related intervention, giving those below 30 no intervention, and then comparing the BMIs of both groups after the intervention. Pennell, Hade,
Murry, & Rhoda (2011) also support the use of RDD in community health intervention research since the results
can be more generalizable.
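The BMI example above reduces to a deterministic assignment rule; a hypothetical sketch (the cutoff of 30 follows the example, while the function and cohort values are illustrative):

```python
def assign_treatment(baseline_bmi: float, cutoff: float = 30.0) -> str:
    """RD assignment: treatment is determined entirely by the baseline score."""
    return "intervention" if baseline_bmi >= cutoff else "control"

# A small hypothetical cohort screened at pretest:
cohort = [27.5, 30.0, 33.2, 29.9]
print([assign_treatment(b) for b in cohort])
# ['control', 'intervention', 'intervention', 'control']
```

Because assignment depends only on the observed cutoff variable, the selection process is fully known, which is what lets the RD design support causal inference at the cutoff despite the lack of randomization.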
Randomized control trials are considered the standard for all tests. However, there are situations where multiple baseline designs, interrupted time series, or regression discontinuity designs are preferable for the experiment. Biglan mentioned randomized trials as the most effective, but noted they are not appropriate if cost is an issue. Pennell stated random trials might deny benefits to some groups, which I could see happening with hard-to-reach groups. SCC and Biglan reference interrupted time series as a strong alternative when random trials are not possible and time-series data are available. Effective interventions to curb substance abuse, consumer safety problems, and alcohol usage are examples of interrupted time series designs. SCC say regression discontinuity involves a cutoff score and is powerful when the cutoff is at the mean of the assignment variable. Despite the advantages of the alternative designs, I found that Biglan, Rhoda, and SCC collectively comment on how these designs are not as effective as randomized trials in drawing causal inferences. What I appreciated about the message from these readings is that, as future professionals in prevention incorporating these designs (regardless of which one) into programs, the point is not just whether they work (which we sometimes emphasize excessively), but to learn about the precision, scope, and depth of the interacting variables.
In arguing whether MBD is acceptable/superior to GRTs, an important consideration is the burden of participation.
In MBD the number of times a measurement is taken is much higher than in GRT, and depending on the type of
measurement that is being taken, attrition may be increased due to member fatigue. If the measurement involves,
for instance, a blood test, great care must be taken to minimize the frequency of measurement. Furthermore, the
problem of instrumentation and recency effects arises when measures are used many times over several
weeks/months. Rhoda and colleagues ignore this as a limitation, and only discuss prohibitive cost as an issue for
GRTs, but high cost of frequent measurement presents a challenge here as well. Biglan accuses RCTs of being
expensive in comparison, but offers only a series of examples of expensive studies, with no evidence that these
studies are representative of community RCTs. Depending on the dependent variable, an RDD may be a beneficial
design for addressing the cost of frequent measurement, while still administering the intervention to populations
that are in most need. This, of course, carries the problem that results may not be generalizable to populations
scoring below the cutoff on the DV.