H 615 Week 3 Comments/Critiques
Prior to this week’s readings, I was unaware of the great variety of quasi-experimental designs that exist.
The utility of quasi-experimental designs was supported across all of the readings, from slightly different
perspectives. Shadish et al (2002) argued that commonly used quasi-experimental designs can be
strengthened with the inclusion of design elements such as pretests and control groups to reduce the
plausibility of validity threats and to perhaps (particularly when added to interrupted time series)
contend with causal inferences resulting from randomized experiments. Cook et al (2008) determined
three conditions when causal estimates from quasi-experimental designs rival experimental estimates:
regression-discontinuity studies, matching groups on pretest, and known selection process into
treatment. Des Jarlais et al (2004) cite multiple sources in support of including findings from quasiexperimental designs in the development of evidence-based recommendations to strengthen public
health practice. All of the readings promote the use of nonrandomized designs together with, or as
possible alternatives to, randomized experiments. Ultimately, the best study design depends on the
research question, consideration of validity threats, feasibility, and intent to evaluate efficacy vs.
effectiveness. I was intrigued by the nonequivalent dependent variable posttest, and curious about its
frequency of use in study design.
While we would prefer to study human health and development using randomized experiments, this is
sometimes impossible due to logistic, funding, and ethical constraints. Cook et al. (2008) argue the social
sciences will always need alternatives to randomized experiments, the key issue being how investigators
can design strong nonrandomized studies. They also point out the important difference between a quasi-experimental and a non-experimental study, and indicate that comparing randomized experiments with
poorly designed nonrandomized studies is unfair. Shadish et al. (2002) argue that quasi-experimental
studies can sometimes infer cause, but this causal inference requires collecting more data and making
more analytical assumptions; they encourage researchers to place emphasis on the former rather than
the latter, and caution them to tolerate the ambiguity of results despite their efforts in building stronger
designs. One way to increase the relevance of non-experimental designs in the field of evidence-based
public health is to improve reporting standards: being transparent and clear, and acknowledging pitfalls
and limitations as well as strengths, as Des Jarlais et al. (2004) argue. After all, a possible threat to
validity is not always a plausible one, and we should not automatically dismiss the potential of
nonrandomized studies.
When randomized controlled trials are not the appropriate choice, researchers must turn to quasi-experimental study designs. Shadish et al. (2002) provided useful guidelines for designing studies, starting
with the most basic design, conducting only a posttest, and progressing to sounder designs that incorporate control
groups with multiple pretests as well as posttests. Quasi-experimental designs have been accused of not
being rigorous enough for their results to support causal statements. This idea was challenged by
Cook, Shadish, and Wong (2008), who compared the results of randomized experiments
with those of exemplarily designed observational studies and found that both types of design produced similar
patterns of statistical significance. These results were found across multiple studies. The importance of
exemplary study design and methods for interventions as well as detailed descriptions of that design
were focused on in the Des Jarlais, Lyles, & Crepaz (2004) commentary. Des Jarlais et al. developed a most
useful guideline, “TREND”, to help researchers evaluate the soundness of an intervention that uses a
quasi-experimental design approach. Researchers can use “TREND” to determine if articles are
presenting enough information about their study and methods to be of actual use in a broader context.
In the undergraduate class I TA for, someone recently (erroneously) referred to a study thusly: “they
gave some pregnant women more alcohol and some less, and looked at the effects on their kids.” Social
scientists must often forgo experiments to avoid such obvious ethical concerns. Researchers may
bemoan the fact that we cannot conduct experiments on every relationship of interest, and SCC
certainly show that all non-experimental designs are subject to validity threats. However, the take-home
message from these readings is that through careful, thoughtful design, quasi-experiments can produce
valid outcomes. The issue with many quasi-experimental designs is that the researcher has not taken
care to avoid threats to internal validity. This is the case with selection bias, as Cook, Shadish and Wong
point out. I can think of many studies in which this threat occurred, but researchers relied on “off-the-shelf” selection correlates like race, SES, and gender instead of exploring specific theoretical selection
correlates. The TREND guidelines proposed by Des Jarlais et al. further help researchers to design
studies to meet high standards of validity, and to publish findings in digestible and analyzable ways.
These ideas will help me to design more effective intervention studies in the future.
These readings restored my faith in our collective ability to successfully (with practical means) measure
change in people and their surrounding world (although Table 4.3 read like a synopsis of social
psychology). They all highlighted the importance of careful thought and planning about designs other
than RCTs, and emphasized how to phrase any implications about causal effects resulting from said
studies. I admit that I’m far more comfortable with the designs that use both control groups and
pretests, especially the double pretest (p. 145). Let’s say that there was a double pretest of parent
involvement, then the treatment of childcare setting (Head Start, daycare, parent care, preschool), and
then a posttest of parent involvement conducted at the end of the treatment. How much
of a concern would it be if the post-tests were given over a series of weeks (where some had nearly
completed the treatment, others entirely completed, and others completed some time ago)? Page 158
briefly mentions that in an ideal setting the treatment is temporally separated from the post-test, but it does
not address any statistical approaches for coping with the varied timing of post-tests.
Quasi-experiments are those that resemble randomized experiments but do not include random
assignment in the research design because of various constraints. These experiments may
limit the ability to extract descriptive causal inferences, especially when done without a pretest or
control group. Careful consideration of the alternative explanations and the minimization of threats to
validity with a priori design elements can strengthen the descriptive causal inferences drawn from quasi-experiments.
If randomization is not possible, in addition to research design elements, a researcher can use
other methods of assignment, such as a cutoff point on a specified variable, as is the case with regression
discontinuity design (RDD), in order to approximate randomization. RDD has been shown to be
concordant with random assignment in a few studies that used within-study comparisons to assess the
outcomes of both methodologies (Cook & Wong, 2013). Another strategy to improve causal inferences
drawn from quasi-experiments has been suggested by the TREND group, who follow clinical trials
researchers in proposing a transparent reporting system of evaluations that would standardize reporting
procedures for behavioral interventions. Both of these approaches represent attempts to increase the
validity of quasi-experimental designs that are commonplace within the constraints of the social sciences.
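The cutoff-based assignment at the heart of RDD can be sketched in a few lines. This is a hypothetical illustration only; the pretest scores and the cutoff of 50 are made up, not drawn from the readings:

```python
# Hypothetical sketch of regression discontinuity assignment:
# units are assigned to treatment strictly by a cutoff on an assignment variable.
pretest_scores = [32, 47, 50, 58, 61, 73]  # made-up assignment-variable values
CUTOFF = 50  # arbitrary illustrative cutoff

# Every unit at or above the cutoff receives the treatment; no one self-selects.
assignment = ["treatment" if s >= CUTOFF else "control" for s in pretest_scores]
print(assignment)
# ['control', 'control', 'treatment', 'treatment', 'treatment', 'treatment']
```

Because the selection process is a fully known, deterministic function of the assignment variable, a discontinuity in outcomes at the cutoff can be attributed to the treatment, which is what makes RDD concordant with random assignment in the within-study comparisons cited above.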
It seems clear to those in the social and behavioral sciences that the RCT, while long considered the holy
grail of intervention research, is not always an ethically or practically feasible option as we seek to
identify effective and efficacious interventions. As Cook, Shadish and Wong discuss, there are cases in
which RCTs are not necessary, where the causal inferences of non-randomized experiments can take the
place of RCT results, with the proper statistical adjustments in place as a precaution. Even within-study
analysis of results provides little clarity about the level of agreement between various studies without a
systematic method of analyzing study results and effect sizes from differing methodologies. The TREND
guideline for the publication of non-experimental studies offers a set of critical factors that
authors should consistently include in manuscripts to facilitate analysis of results between studies. I
wondered as I read these articles, after the design considerations raised by SC&C, why is it that the RCT
is so highly regarded in research, if other methods of inquiry can offer insight into causation as well?
And do we lose a valuable aspect of the causal relationship by clinging so tenaciously to the RCT?
The readings for this week suggest that though quasi-experiments contribute to the field, such studies
are “weaker” than their experimental counterparts. As Cook et al. (2008) note, “the randomized
experiment reigns supreme institutionally supported through its privileged role in graduate training,
research funding, and academic publishing” (725). However, the same piece suggests that quasi-experiments can contribute to the field when threats to validity are accounted for. Similarly, the book
chapters suggest that quasi-experiments should be included in our methodological repertoire (134), but
that with such studies, researchers need to “tolerate ambiguity” (161), essentially perpetuating the
prevailing scientific norms privileging experimental designs. This is echoed by Des Jarlais and colleagues
(2004), who state, “when empirical evidence from RCTs is available, ‘weaker’ designs are often
considered to be of little or no evidentiary value” (361). What are the consequences of this? Much like
with the tendency to overlook null hypotheses and qualitative research, could being overly critical of
quasi-experimental designs lead to the dismissal of potentially important contributions? Given real
issues of practicality and restrictions, is it necessary to change these attitudes/norms in the field? If so,
how can we go about doing that?
To be completely honest, navigating the many designs and factors needed to draw inference without threats to
validity appears to be more challenging than actually performing a study. I am also curious to know what exactly
a journal is looking for when they review a study for publication. Are they addressing issues in design
and threats of validity? Does a study’s ability to draw causal inference increase a journal’s impact factor?
For instance, do studies that employ more appropriate study designs and methodology translate to a stronger
overall impact of the journal on its field of research?
The availability and overall ability of a researcher to perform a randomized experiment
influences, and often decides, the study design. Is the attention that quasi-experiments
and nonexperimental studies require to reduce threats to validity and improve the chances for causal inference as
valuable as being able to execute a randomized experiment? Do researchers get “lazy” in designing a
quasi-experimental study because there are so many factors to consider?
Finally, how deep is the evidence for synthesizing results across randomized-experiment, quasi-experiment, and nonexperiment study designs? Earlier reviews found discrepancies between
randomized experimental and observational studies, but a more recent review found the opposite:
observational research produced outcomes similar to those of experimental designs.
Quasiexperimental designs, whose nuances are discussed in detail in Shadish, Cook, & Campbell (2002), share
many features with randomized experiments (RCTs) aside from randomization. Although RCTs are
considered the “gold standard” in causal inference, Cook, Shadish, & Wong (2008) present within-study
comparisons demonstrating that some quasiexperimental designs – regression discontinuity,
abbreviated interrupted time series, and simple quasiexperimental designs (if criteria of population
matching (e.g., Bloom et al., 2005) and careful measurement selection (e.g., Shadish et al., in press) are
met) – produce results that mirror findings from experiments. Thus policy makers should have reasonable
confidence in their use of quasiexperimental studies in their efforts toward evidence-based policy
creation; an especially valuable notion when random assignment is not feasible. The TREND checklist
creates a framework to improve the quality of data reported for quasiexperimental studies and lends
further support to policy makers’ decisions (Des Jarlais et al., 2004). Though the outlook is optimistic, questions still
remain for quasiexperiments, especially regarding the contexts and conditions that produce unbiased estimates.
Campbell and Stanley (1963) state that there is no perfect study, and the general point of CS and CCS is
to examine the validity-based strengths and weaknesses of different designs and possible
counterbalances of those weaknesses, all with the understanding that designs may be dictated by
situational constraints. While the idea of within-study comparisons to examine which quasi-methods produce results that mimic RCT results (such as those examined by Cook et al.) can provide
support for stronger causal statements when RCTs are not possible, these studies exacerbate the
largely arbitrary debate over which quasi-method is “best”. Instead, researchers should recognize the
utility, strengths, weaknesses, and situational constraints of each design, and take what can be learned
from the study based on its strengths to help inform future studies on the relationship of interest. Fewer
resources should be spent debating which design is “better” and more on the publication of more detailed
study designs for RCT, quasi-, and non-experimental studies, as advocated by Des Jarlais et al. These
publications would allow researchers to make more informed conclusions about the results of single
studies and thus strengthen their ability to use these results to inform future studies.
After reading Chapters 4 & 5 in addition to the Cook, Shadish and Wong article, I am still fuzzy on when
the use of propensity scoring versus a regression discontinuity design versus an instrumental variable is
appropriate to strengthen what Cook et al. categorize as non-experimental study designs. I am
specifically thinking of secondary data analysis projects using cross-sectional or perhaps only two years
of panel survey data; how would I determine which of these methods is the most appropriate to use in
creating a comparable comparison group to minimize/detect selection bias? Further, should “off-the-shelf” covariates be utilized to create and apply a propensity score in matching treatment to comparison
cases if those covariates are all that are available? In a similar vein, is the “shotgun” approach to
selecting variables for a propensity score that bad if we don’t know exactly what combination of
variables/factors are correlated with treatment and effect(s) (i.e. theoretical/empirical literature on a
given cause and its effect(s) is scant)? More broadly, how can we discern when the addition of one or
more of these design elements to strengthen the causal inference from a given study is worth pursuing
(and/or convincing others we should)?
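As a toy illustration of the matching idea raised here, the core of propensity-score matching reduces to pairing each treated case with the control whose score is closest. All unit IDs and scores below are made up, the scores are assumed to be pre-estimated (a real analysis would estimate them, e.g., via logistic regression on the available covariates), and this sketch matches with replacement, ignoring calipers and other refinements:

```python
# Hypothetical, pre-estimated propensity scores (probability of treatment),
# keyed by made-up unit IDs.
treated = {"t1": 0.62, "t2": 0.35, "t3": 0.80}
controls = {"c1": 0.30, "c2": 0.58, "c3": 0.77, "c4": 0.45}

# Nearest-neighbor matching with replacement: pair each treated unit with the
# control whose propensity score is closest in absolute distance.
matches = {}
for t_id, t_score in treated.items():
    best = min(controls, key=lambda c_id: abs(controls[c_id] - t_score))
    matches[t_id] = best
print(matches)
# {'t1': 'c2', 't2': 'c1', 't3': 'c3'}
```

Whether the matched groups are truly comparable then depends entirely on whether the covariates behind the scores capture the selection process, which is exactly the “off-the-shelf” versus theoretically chosen covariate question posed above.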
The readings for this week raised questions in my mind about the assumption in randomized
experiments (RE) that random assignment to treated and control groups is indeed enough to eliminate
bias. The article by Shadish, Cook, and Campbell talks about building designs that render threats to
internal and external validity implausible. Cook, Shadish and Wong, when comparing REs to QEDs,
suggested that this is plausible outside of REs, and that the key is greater precision in the selection and
measurement of covariates to reduce the influence of bias; the more effort, generally, the more
comparable the results. Where I am struggling is: how can we say an RE is superior because of random assignment?
I understand the statistical concept, but the RE examples in the book and articles talk about random
assignment from a subset of a population (schools in a school district among a nation of school districts),
and because random assignment was used the experiments were somehow better. Don’t we still have
bias, hidden bias, and still need to face all of the threats to validity? I am not sure how REs are any
different or superior to QEDs in this situation. In either case researchers need to be knowledgeable of all
threats to validity and still face selection bias. I know I am oversimplifying this, but if a well-done QED can
approximate REs, why do we need REs? Why not consistently employ meta-analyses? [You are missing
something critical – and I am glad you laid it out! Randomization provides for maximum internal validity
– that is, causal inference. It does nothing for external validity (generalizability). So, when causal
inference is most important, which it often is, then an RCT is the strongest approach. When
generalizability is most important, which it is AFTER we know that something is efficacious, then, other
designs might be better. However, as we will see in later chapters, there are things we can do to enhance
generalizability from RCTs.]
Various reasons can lead researchers to exclude randomization, control groups or pre-test observations,
moving away from long-standing, gold-standard traditions of RCTs. While causal inferences could be
compromised without such features, the readings were compelling in describing how best to guard against validity
threats in quasi-experimental designs. Questions for my own research arose: Could one-group posttest-only designs be appropriate for programs aiming to increase pro-social bystander behavior, because this
knowledge and these skills are not commonly taught in other areas of one’s life and thus could only be
attributed to an intervention? For research on rare events, such as bystander intervention, finding
suitable proxies is challenging. Could the TTI’s “Related Behaviors” serve as a proxy, if the literature
supports covariance between the related behavior and the behavior of interest? If found to be valid
measures, students could potentially be matched based on their likeliness to intervene in other, related
risk situations. This strategy would be more compelling if this construct were a near-perfect predictor of
intent to intervene in dating violence and sexual assault situations, but it would not be impossible to
support as valid if journal space allowed enough room for investigators to be transparent in their
explanations for such design features.
The readings pointed out the importance of being strategic, thorough, and knowledgeable when
completing quasi-experiments. Using quasi-experiments challenges researchers to use the most
comprehensive model possible, and in addition, makes them take into account the issues that most
likely will come into play. This must be done before the study is conducted, meaning researchers need
to have thorough knowledge of the sample, setting, and other aspects in order to anticipate a project’s possible issues. It is
important to know what one’s priorities are in terms of validity, and to know the literature of similarly conducted
studies to understand if more resources should be utilized toward additional pre-tests, posttests, and
other components. This thoroughness in research design and implementation must then
carry through to the reporting and publishing of the study. The documentation must be done to a high quality
to help push the field forward and enlighten future avenues for research. This makes me ask:
given the importance of documenting research design in high-quality reports, are we failing to
do justice to our work by saying “the sample came from a town next to a large Midwestern
university” instead of saying Ann Arbor or Columbus?