H 615 Week 7 Comments/Critiques
Randomized experiments are the gold standard for establishing causal inference, but they are limited
with respect to generalizability. Shadish et al. (2002) argue that random sampling is inadequate to solve
the problem of generalized causal inference from experiments. Five principles are proposed to improve
generalizations of construct and external validity: surface similarity (similarities between study
operations and characteristics of target of generalization), ruling out irrelevancies (attributes that do not
change generalization), making discriminations (features of persons/settings/treatments/outcomes that
limit generalizability), interpolation-extrapolation (interpolating to unsampled values within sample
range; extrapolating beyond sample range), and causal explanation (developing/testing explanatory
theories about the target of generalization). Flay et al. (2005) propose standards for prevention interventions that are efficacious, effective, and ready for dissemination. Effective interventions must meet the efficacy standards plus four additional standards; interventions ready for dissemination must meet another three. Several “desirable standards” are also proposed, including measurement of proximal
outcomes (also suggested by Shadish as an approach for studying causal explanation), and
measurement of potential negative effects (question: why is it a required standard that efficacy claims involve no serious negative effects, yet measuring such effects is not required?). I would like to learn more about
how the standards have advanced prevention science since this publication.
This week's readings suggest that we begin to think about intervention development as a program of research rather than a series of independent studies. SCC and F/B recognize the limitations of the single study and, in response, created principles and processes that shift the field's (and potentially funders') perspective on what it means to create an efficacious, effective, and scalable intervention. SCC's review of the principles of surface similarity, ruling out irrelevancies, making discriminations, and interpolation and extrapolation in the face of purposive sampling demands greater thought and consideration than might be devoted if one had access to a truly random sample and random assignment. SCC flesh out how real-world barriers can be addressed via sampling methods and statistical techniques while still enlightening our understanding of causal relationships. The SCC reading dovetails with the standards of evidence outlined by F/B, who speak to researchers' accountability and responsibility for creating research that meets professional expectations for the creation, translation, and implementation of interventions that impact human lives. These readings force the researcher to look beyond the immediate issues and develop a long-term vision of a product that is of value to science and the lives it will impact.
The readings this week explore challenges in establishing generalized causal inference while specifying
standards to which experiments should adhere to promote disseminating only the best interventions.
With increasing variability in the range of UTOS studied, is the extent to which findings from RCTs can be generalized diminishing? The proposed solution, formal sampling, appears costly (e.g., cluster sampling when full enumeration of the population is not possible) and beyond the reach of many investigators (e.g., sampling across different settings). Does the movement by some toward community-based participatory research emphasize specificity over generalizability? Sampling theory can guide researchers to make such decisions intentionally. Measuring many constructs, along with mediators and moderators, increases the chance of respondent fatigue but is still preferred over limiting measurement tools to a few items of high validity. The quest for psychometrically sound instruments, which is also one of Flay et al.'s Standards of Evidence, can continue indefinitely if the purpose is to find the right balance between the number of items and the degree of validity; the same is true of using SEM to test assumed causal pathways when a variety of variables and mediators (and the different ways they are measured) are examined.
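As a concrete illustration of testing a mediated pathway, below is a minimal sketch of the product-of-coefficients approach on simulated data; the variable names, effect sizes, and data are hypothetical, and this is only one of several ways such pathways can be modeled:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 500

    # Simulated data: a treatment affects an outcome partly through a mediator.
    treatment = rng.binomial(1, 0.5, n)
    mediator = 0.6 * treatment + rng.normal(size=n)
    outcome = 0.4 * mediator + 0.2 * treatment + rng.normal(size=n)

    # Path a: treatment -> mediator.
    a_model = sm.OLS(mediator, sm.add_constant(treatment)).fit()
    # Path b: mediator -> outcome, controlling for treatment.
    X = sm.add_constant(np.column_stack([mediator, treatment]))
    b_model = sm.OLS(outcome, X).fit()

    a = a_model.params[1]  # effect of treatment on mediator
    b = b_model.params[1]  # effect of mediator on outcome, treatment held fixed
    print(f"indirect (mediated) effect a*b = {a * b:.3f}")
    print(f"direct effect of treatment    = {b_model.params[2]:.3f}")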
This week’s readings provide methods and considerations for improving the generalizability of program results through design choices connected to the five principles of causal inference. Providing an
argument for large-scale program generalizability requires evidence in support of all five principles.
However, it is likely impractical, if not impossible, to address all principles in one or even a few studies; thus a line of research is the best way to establish generalizability. As Flay et al. describe, a line of research is
necessary to establish efficacy, effectiveness, and dissemination. Research must progress through these
phases, as the evidence provided by each phase is required to establish the knowledge needed for the next phase. For example, a study must meet all standards of efficacy before effectiveness can be
established. As there are multiple standards for each research phase, and replication of results is recommended to provide the necessary support, programs of research rather than single studies should be used for program development, implementation, evaluation, and dissemination. Any single study's worth lies not simply in its outcomes but in the ability of other researchers to evaluate its strengths and weaknesses, determine which questions about generalizability were answered and which remain, and build future studies accordingly.
The readings really drove home the point that a program of research is needed for a program to get to
the dissemination phase. They highlighted the benefit of using both qualitative and quantitative methods throughout one's research program. For example, Flay and colleagues commented that it is
important for the design to establish causal effects and that we must be confident that the
program/treatment was responsible for that effect. Chapter 12 also highlighted path models, which can statistically pinpoint associations between variables. However, without a program of research that teases out other possible causes, at times using multiple measures to ensure that what we are examining actually correlates with the outcome, we cannot be confident in our conclusions. Qualitative methods can be brought in to examine whether A is really causing B or whether an unmeasured variable is responsible. It was especially beneficial to read about path models because, without sound constructs and theoretical rationale, significant results may be practically meaningless. This, to me, stresses the importance of collaboration to better grasp how to measure variables and sample, and to discuss issues theoretically and conceptually in order to get research to the dissemination phase.
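For instance, a simple path model of the kind Chapter 12 discusses can be fit in a few lines. The sketch below assumes the third-party semopy package (and its Model/fit/inspect interface) and uses simulated data with hypothetical variables A, B, and C:

    import numpy as np
    import pandas as pd
    from semopy import Model  # third-party SEM package; assumed installed

    rng = np.random.default_rng(1)
    n = 300

    # Simulated data consistent with the hypothesized paths A -> B -> C.
    A = rng.normal(size=n)
    B = 0.5 * A + rng.normal(size=n)
    C = 0.7 * B + rng.normal(size=n)
    data = pd.DataFrame({"A": A, "B": B, "C": C})

    # lavaan-style description: each line specifies one regression path.
    desc = "B ~ A\nC ~ B"
    model = Model(desc)
    model.fit(data)
    print(model.inspect())  # estimated path coefficients and standard errors

If an unmeasured variable drives both B and C, the fitted paths will misattribute that association, which is exactly where qualitative follow-up and multiple measures can help.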
The readings this week took us deeper into generalized causal inference from experiments, particularly
sampling procedures that strengthen claims of causal inference. Research design is an important contributor to our ability to generalize results and to demonstrate the effectiveness of a program. If a program is shown to be effective, the authors/developers should make efforts to make the intervention available to others. In my experience, few studies in the physical activity literature thoroughly explain the intervention itself, and even fewer provide information on how to obtain more detail about it. Are authors not required to make tested/evaluated interventions public?
From the information presented in Chapter 12, and previous chapters on the use of statistics in developing causal inference, I am still caught up in the idea that an investigator can generate a number of desired outcomes purely by choosing the right statistical methods. If this were the case, is it common to observe “slacking” in the design of a study? As the Duke incident showed, investigators can easily manipulate data to mask poor study design or errors on the part of the researcher in the delivery of a study.
No matter how good one study or experiment is, alone, it is not enough for making causal inferences.
Single studies are unlikely to have enough power in terms of samples, settings, treatments, and
outcomes to answer our questions in a definitive manner. As Valentine et al. (2011) state, replication is an essential feature of science, and without it we cannot make good decisions about public health interventions. Shadish et al. (2002) identify several ways to address that issue, including multistudy programs, narrative reviews of existing research, and meta-analysis, all of which have their flaws but which, used in conjunction, can strengthen causal inference. Shadish et al. (2002) also call for re-evaluating our assumptions as social scientists and for being as critical of our own work as we are of others' work. As
the field of experimentation evolves and becomes more specialized – both in terms of knowledge and
opportunity – critical evaluation of research becomes even more essential. To that end, Valentine et al.
(2011) suggest that more funds and incentives be devoted to replicating studies and experiments as a
way to improve the field of public health.
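As a small illustration of how replications combine, here is a minimal fixed-effect meta-analysis using inverse-variance weighting; the effect estimates and variances are entirely hypothetical:

    import numpy as np

    # Hypothetical effect estimates (e.g., standardized mean differences)
    # and their variances from five replication studies.
    effects = np.array([0.30, 0.45, 0.15, 0.38, 0.25])
    variances = np.array([0.02, 0.05, 0.03, 0.04, 0.02])

    # Fixed-effect meta-analysis: weight each study by the inverse of
    # its variance, i.e., by the precision of its estimate.
    weights = 1.0 / variances
    pooled = np.sum(weights * effects) / np.sum(weights)
    pooled_se = np.sqrt(1.0 / np.sum(weights))

    print(f"pooled effect = {pooled:.3f} (SE = {pooled_se:.3f})")
    print(f"95% CI = [{pooled - 1.96 * pooled_se:.3f}, "
          f"{pooled + 1.96 * pooled_se:.3f}]")

Each study is weighted by the precision of its estimate, so a handful of modest replications can yield a far more precise pooled effect than any single study.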
As is evident from SCC, in the progression of an intervention from possible to plausible to proven, and
from proven within a limited set of circumstances to widely accepted and effectively implemented
(disseminated), a great number of potential pitfalls and detours exist, for which Flay, Biglan and
colleagues offer a set of standards to guide the researcher. One standard in particular struck me, first for its impracticality and then for its utility. Standard 2.b.ii, the first desirable standard (measurement of proximal outcomes and mediators), initially struck me as an onerous and expensive addition. In the behavioral sciences, such outcomes may be difficult to measure, may require the use of proxy measures (a violation of another standard), and may not be initially intuitive. However, given the likelihood that changes will need to be made to the original intervention protocol as the study is extended to broader demographics, it will prove useful to determine whether an intervention still exerts its effect
through the same mechanisms, and if not, whether alternate outcomes are being affected. This can also
provide critical guidance when an intervention fails to have the intended effect in a new population, indicating whether the intervention is completely ineffective or whether its impact on alternative outcomes may be an acceptable indication of effectiveness. Despite the utility of such data, each research endeavor is limited by its budget, and measuring everything is difficult and costly.
These readings built on the foundation laid by previous readings and class discussions regarding
sampling strategies (and how researchers may correctly phrase results based on sampling), as well as
how to discover causal paths in quantitative and qualitative ways. SCC also ties in how theory drives
statistical models in different ways. Flay et al.'s article emphasizes that it takes more than one study design to evaluate the efficacy of an intervention, and it details each standard that must be met before an intervention is ready to be disseminated, while echoing SCC's cautions about specificity regarding not only outcomes but also by whom, where, and how those outcomes were experienced across different groups of people. While it was satisfying to read a synthesis of previous material, from a practical
standpoint, do policymakers appropriately understand the necessity and utility of having multiple
studies with designs that guard against different threats in order to create a picture that allows for a
comprehensive view of both the efficacy and effectiveness of a given intervention? How many
interventions are truly long-lived enough for enough evaluations to take place for a researcher or
policymaker to confidently promote that intervention?
Shadish, Cook, and Campbell (2002) describe many issues surrounding generalized causal inference and
strategies for studying causal explanation. Of particular interest is their extensive discussion of
qualitative methods and structural equation modeling. I am beginning to learn more about qualitative
methods by working on analyzing transcripts that my advisor, Carolyn Mendez-Luck, has collected
regarding Latino caregivers and am also taking a qualitative methods course and learning ethnographic
field methods. While I most likely will not be conducting ethnographies as part of my Public Health
doctoral program, it is an intriguing method nonetheless. I also plan to enroll in Structural Equation
Modeling next term to learn more about causal pathways and covariance. According to our text, this is
an appropriate class for a student in Health Behavior and Promotion, and I am looking forward to learning these methods. Flay et al. (2005) make the case for having interventions meet certain criteria, or
Standards of Evidence, before they are disseminated. An important aspect of these standards includes
having intervention creators provide a detailed manual for their interventions to communities who wish
to implement health behavior interventions. This is often extremely important when laypersons will be
implementing interventions.
Being able to generalize a causal inference ultimately makes it more broadly relevant and useful. As
noted in the book, researchers make generalizations intuitively, alongside more rigorous measures
including obtaining heterogeneous samples and using qualitative methods and statistical models.
Despite this, however, I wonder about the implications of “generalizability” in the professional sphere.
As we’ve discussed in class, for instance, translation and adaptation contribute to the ability to
generalize causal inference. However, researchers may avoid these practices as they aim for innovation.
Although solid translation/adaptation practices require adherence to stringent standards of evidence as
outlined in the Flay et al. piece, coming up with a totally novel program or entirely new causal inference
may be viewed as more “groundbreaking” than implementing evidence-based interventions or adapting
them to different settings, potentially appealing more to funders and publishers. However, these "less
exciting" practices can ultimately move the field forward while potentially minimizing costs. So what is
the position of generalizability efforts (specifically systematic retrospective efforts and
translation/adaptation) in an academic culture that places tremendous value on novelty? Is our field
convinced that generalizability efforts are innovative? How do we bridge the gaps between research and
practice, and generalizability and innovation?
The research field has changed considerably as the demand for application has grown. Having evidence of the effectiveness of a prevention program is not enough; there is a need for applicability to a general audience. Unfortunately, this is challenging, as a wide variety of communities exist across the world and resources are dwindling, which limits researchers' ability to adapt programs. Flay et al. mention the need for more replication studies and better dissemination. Replication reduces the potential for chance findings and helps increase confidence in program effectiveness. In addition, effective dissemination is likely to occur when delivery, fidelity, and proximal goals are part of the intervention process. SCC add that generalization is a product of multiple attempts at replication, which reveal why generalization processes succeed or fail. To improve generalization, SCC suggest sampling people in the community, along with treatments and observations. In an ideal situation, researchers would be able to generalize and provide tailored information on likely success for people with similar traits. However, funding restricts researchers and constrains how they advance their program interventions. Approaches to effectiveness and dissemination should be used with confidence that they lay the groundwork for improved studies in the future.
Randomized controlled trials have long been criticized for their lack of generalizability, calling into question
their usefulness in policy and community intervention development. Shadish, Cook, & Campbell (2002)
suggest five principles of generalized causal inference – surface similarity, ruling out irrelevancies,
making discriminations, interpolation and extrapolation, and causal explanation – and their unique
applications to both construct and external validity. Ruling out irrelevancies seems particularly pertinent
when considering the cost of program implementation within a given community, especially for an RCT. Complications arise when a researcher, looking to rule out irrelevancies, excludes some independent variables over others and misses a key moderating/mediating effect, possibly resulting in flawed analysis and conclusions and greater cost in the long term. As an additional consideration,
Flay et al. (2005) call attention to evidence that even the best-supported intervention is unlikely to be
effective in every implementation due to diversity of settings, place, populations, etc. Although
problematic, this effectiveness quandary creates an opportunity to harken back to the program theory
to explore answers regarding variable irrelevancies. The selected theory (an integrative one such as the TTI would be most comprehensive) can help researchers understand the diffusion of their intervention within a population.
While I am struck by the level of detail and careful deliberation that have obviously gone into
constructing the SPR list of standards (Flay et al., 2005), I cannot help but wonder about the capacity of the potential end users (in particular communities, much more than administrators and policymakers) to
utilize such a seemingly complex list in choosing what prevention program to employ. Unless the field is
prepared to offer such stakeholders a quick and easy-to-assimilate version of this list, we might run the
risk of losing them entirely by overwhelming them with (undeniably) sound information. On the other
hand, for researchers versed in the lingo and theory of behavioral science, this is certainly an invaluable
tool. I was also taken aback to realize that, according to SCC, what I have always considered a scientific process for establishing external validity is in fact “superficial” (to use their own word). As they point out, this would be generalizing purely on the basis of prototypical properties (“proximal similarity”). I do see how this process is a subjective one, depending more on value judgment than on formal sampling.
SCC present their grounded theory of generalized causal inference, which includes five principles for establishing generalizability. These principles outline an approach applicable to research studies, including those that employ purposive sampling. This is an important approach, as
much of quantitative and qualitative work in the social sciences utilizes purposive sampling. In addition,
SCC discuss specific ways of conducting purposive sampling, such as sampling typical or heterogeneous instances, in order to extend the ability to generalize from a study or research program.
Flay et al. (2005) outline Standards of Evidence in the realm of prevention research. Several of the
Standards refer to the generalizability of causal inferences from efficacy and effectiveness trials that
may or may not be ready for dissemination. In a manner similar to SCC, the authors posit that generalizability is limited to persons and settings similar to those in the research study. To that end, the Standards dictate that the research sample be well characterized and sub-grouped in analyses, that appropriate statistical tests be used, and that the practical value of the reported numbers be addressed.
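To make the purposive sampling strategies concrete, here is a minimal sketch of selecting a "typical instance" (the candidate closest to the centroid of a pool) and "heterogeneous instances" (candidates chosen to be maximally spread out). The site data and selection rules are hypothetical illustrations of the idea, not SCC's own procedure:

    import numpy as np

    rng = np.random.default_rng(2)

    # Hypothetical pool of 50 candidate sites, each described by two
    # standardized characteristics (e.g., size and baseline risk).
    sites = rng.normal(size=(50, 2))

    # "Typical instance": the site closest to the centroid of the pool.
    centroid = sites.mean(axis=0)
    typical = int(np.argmin(np.linalg.norm(sites - centroid, axis=1)))

    # "Heterogeneous instances": greedily add the site farthest from
    # everything already chosen (farthest-point sampling).
    chosen = [typical]
    for _ in range(4):
        dist_to_chosen = np.linalg.norm(
            sites[:, None, :] - sites[chosen][None, :, :], axis=2
        ).min(axis=1)
        chosen.append(int(np.argmax(dist_to_chosen)))

    print("typical instance index:", typical)
    print("heterogeneous instance indices:", chosen)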