H 615 Week 2 Comments
The readings attempted to define validity and types of validity. A clear message across all readings was that validity
itself is not a “one size fits all” construct. According to SCC (2002), “…validity judgments are not absolute…” (p. 34). Rather, “…validity threats are heuristic aids…they are not universally relevant…” (p. 40). Threats to validity serve researchers by helping to anticipate criticisms of causal inferences, such that study design controls may minimize threats. Statistical conclusion validity and internal validity relate to study operation and causal inference. Although the former concerns assessment of statistical covariation and the latter causal reasoning, threats to either type may lead to incorrect causal conclusions. Threats to internal validity include potentially influential factors that exist outside of an intervention (e.g., history, maturation). A question remains: how do we tease out intervention effects in large, multilevel, community-based interventions? Construct validity and external validity relate to generalizability. The former is
concerned with understanding what is measured and how to assess it. The latter concerns the extent to which causality
holds true across persons, settings, treatments, and outcomes. Bottom line: addressing all threats to validity is an
undertaking more appropriate for a program of research than for any single study.
The “Truth about Validity” reading reminded me of the parable of the blind men and the elephant, particularly the last line: “For, quarreling, each to his view they cling. Such folk see only one side of a thing.” While reading Shadish’s comments and reflecting on the Shadish, Cook, and Campbell (SCC) readings, it became clearer that addressing validity in research is a process with the long-term goal of understanding the truth about a causal relationship. SCC’s
typology is a heuristic, much like a theory, to help us focus on those threats most pertinent to the research at hand. I
agree with SCC’s suggestion that it is not realistic to operationally address every threat to validity in an experiment. I
think this is what Shadish meant when he stated “Good and truth are two different things.” It has been my perception
that methods coursework and critiques place too much emphasis on validity within the single experiment rather than
addressing the validity of a single experiment within a program of research. The latter shifts the focus to causal direction
rather than effect size and requires more thoughtful consideration when identifying future actions related to uncovering
the truth about a causal relationship.
In Shadish, Cook and Campbell, the authors make the point that basic researchers are more concerned with construct
validity, whereas applied researchers are more concerned with external validity. However, in his 2011 article, Shadish
makes the counterintuitive point that evaluation researchers (whom I associate with applied researchers) more often
cite papers about internal validity, because of evaluators’ “intuitive awareness” that their work is highly contextual and
therefore not widely generalizable. I think that this is a flawed assumption, because it is the researcher’s responsibility
to ensure that results are not over-generalized outside of the range of the research. Conflicting arguments state that
either researchers should never generalize outside the scope of the data, or (as SCC believe) that science relies on
generalization outside of the data to formulate new hypotheses. Generalizations within the mind of the researcher to
spur new questions are one thing, but over-generalizations in the implementation of programs could be costly and
ineffective. In the “outside” world, it is crucial for scientists who do community and evaluation work to serve as
interpreters for the community and help stakeholders to understand what the results of a study tell us about the
program or intervention, and not to assume implicit knowledge about the external validity of a study.
Shadish et al. (2002) thoroughly discuss validity, which is central to scientific effort. They describe four kinds of validity
(statistical conclusion, external, internal, and construct) and threats against them. I appreciate that the authors say that,
even as we design studies to avoid validity threats and to approximately infer the truth, we are fallible human beings
and irrefutable inferences cannot be achieved. I am particularly interested in discussing issues regarding external and
construct validity. As far as constructs go, the anthropologist in me is trained to question whether we, as contextual
scientists, can ever truly operationalize constructs. In the same way that truth is a social construct, so are emotions and
perceptions. Can a US-based researcher adequately operationalize race in Latin America? Can we as individuals fully operationalize how everybody else experiences discrimination? And if so, how do we achieve that? Also, is there always
virtue in attempting to increase our power of generalization? If we conduct a study using random samples from the
entire nation so that we can generalize our conclusions to the US population, what do we lose in the process? Can we
truly apply our findings to every community in the US? Is that desirable or even practical?
In the health sciences, particularly in the field of health promotion where a great deal of importance lies in the
motivations, attitudes, internal drives and cognitions of study participants, it seems as though construct validity as it
relates to measurement is nearly impossible to achieve. Often, due to complex causal chains and behavioral determinants that subjects themselves may not fully understand, a tool may accurately measure a trait or behavior that was poorly chosen and will not impact the outcome of interest. Furthermore, it is possible that researchers can
identify a factor that is a direct cause of the outcome of interest, but the factor is poorly defined or measures do not
completely capture it. The necessity of proxies in health promotion research makes the pursuit of a valid study an
endless task, and sadly, threats to construct validity do not always respond to study design. [Too much focus on
measurement issues rather than design issues.]
In Chapters 2 & 3, Shadish, Cook and Campbell (SCC) detail a validity typology according to the following four validity
types (and their numerous threats): statistical conclusion validity, internal validity, construct validity, and external
validity. Shadish’s 2011 commentary relates aspects of this typology (and those before it) to the issues of validity theory
and methodologies to strengthen causal inference put forth by others in Advancing Validity in Outcome Evaluation:
Theory and Practice. From these readings, I wondered what the public health field generally considers ‘the most serious’
validity threats to be at present, as well as how the priorities of funding sources and the standards of the peer-reviewed publications to which they intend to submit manuscripts influence researchers’ and evaluators’ decisions to address or circumvent certain validity threats over others. Have we moved more toward concerns regarding threats to external validity versus internal validity and/or statistical conclusion validity, as Shadish suggests in his
commentary on certain chapters? Finally, though discussion of qualitative research is embedded in the SCC Chapters, it
remained unclear to me to what extent/how statistical conclusion validity is plausible for causal inferences based on
qualitative research. [Very nice!]
SCC point out, and Shadish reiterates, the catch-22 wherein methodological solutions for improving one type of validity will decrease another, which has sparked debates among researchers over which methods are best practice and which validity is most important to address. However, these arguments are largely arbitrary, as choosing either side will lead to the exclusion of certain study designs and thus likely omit valuable information surrounding the construct(s)/relationship(s) of interest. Instead, researchers should collaborate and consider questions in terms of a series of observational, quasi-experimental, and experimental studies that inform conclusions about causation after examining the relationship using multiple definitions and multiple methods covering all validity considerations. Campbell and Stanley comment that matching is “overrated” and exclude it from their commentary. Though matching does not guarantee representativeness, and thus resulting relationships should not be overemphasized, matching when randomization is impossible or unethical could be a useful tool in narrowing possible confounders and should not be dismissed completely. SCC state that researchers should publish more detailed data and let readers decide whether the resulting effect sizes support future research, but note that publications’ space limitations prevent this. Sharing study details could better inform future studies and provide alternate methods for supporting correlations.
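Since matching is invoked here as a fallback when randomization is impossible, a minimal sketch may make the idea concrete (the data and the single covariate below are hypothetical, chosen only for illustration): each treated unit is paired with the closest remaining control on a measured confounder, and outcomes are then compared within pairs.

# Minimal sketch (hypothetical data): nearest-neighbor matching on one
# covariate when randomization is not possible, to narrow confounding.
treated = [{"id": 1, "age": 34}, {"id": 2, "age": 52}]
controls = [{"id": 10, "age": 33}, {"id": 11, "age": 49}, {"id": 12, "age": 60}]

pairs = []
available = controls.copy()
for t in treated:
    # pick the remaining control closest in age to this treated unit
    match = min(available, key=lambda c: abs(c["age"] - t["age"]))
    available.remove(match)
    pairs.append((t["id"], match["id"]))

print(pairs)  # [(1, 10), (2, 11)] -- compare outcomes within matched pairs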
Although occasionally bogged down with thoughts such as “research seems like a losing battle” during the readings, I
was also reminded of instances where new ideas have emerged from studies whose designs took unexpected turns as a
result of realities outside of the control of the researchers. These chapters affirmed SCC’s assertion that forethought and careful study design can help you avoid problems that retrospective statistical analysis cannot fix. However, I think
that being aware of all of these threats throughout the planning, implementing, analyzing and reporting stages of
research can sometimes lead researchers to use a particular threat to a design as a strength by tweaking the main
question of interest. For example, consider Elder’s use of the Berkeley Growth and Guidance study’s data to explore
differential impacts of the Depression across different age and social strata. For Macfarlane and the other original investigators, the Depression may have posed a history threat to internal validity, given how different its impact was on development within each of the two samples, let alone on the financial standing and opportunity of each family within
the two samples. [Yes, one person’s problem can become another person’s main question of interest – and that is ok,
that is one way that science advances.]
The readings for this week raised several questions for me, particularly regarding the relationship between validity and
evaluation. Though the book covers various validity typologies, I was particularly intrigued by external validity, given
generalizability’s pertinence to program development, implementation, and evaluation. The book chapters note that
external validity reflects the generalizability of inferences across persons, settings, treatments, and outcomes (p. 83), while Shadish’s (2011) piece touches on the concept of validity as it relates to outcome evaluation. Though both concepts are
cornerstones of public health research and practice, their relationship can become a bit ambiguous, particularly when
discussing the trends of translation and adaptation. How do we balance the use of existing programs/information,
adaptation, fidelity, and external validity? To what extent is the generalizability of inferences valid when evaluation is
heavily contextual? With these questions, it seems to me that the line between validity and evaluation can be blurred,
namely in deciphering whether conclusions about validity result from evaluation or are components of the evaluative
process and how we can balance context-specific evaluation with generalizability. [Some confusion re validity and
evaluation. Whether a study is a research study or an evaluation study, the types of validity apply to any inferences from
the result of that study. There is no line between validity and evaluation!]
So what exactly is most important – external or internal validity? And why? The Shadish reading presents the fact that while internal validity is talked about and explored more in the literature, external validity is where our attention is drawn. Is this an effort to master the control of study design yet still demonstrate to our colleagues and policy makers that outcomes or results of the study can be generalized to the larger population (under the parameters selected)?
The use of internal and external validity is also addressed by Shadish, Cook, and Campbell: “No one in the real world of science ever talks about whether A caused B” (p. 94). Their rationale lies within descriptive causal relationships. Is it feasible to think in terms of a controlled environment yet extend beyond such power to suggest a causal relationship between A and B in the real world? Once we address the relationship in a real-world setting (e.g., an intervention program), how well can we account for confounding variables to truly establish that cause precedes effect? Could it not be that the environment in which the intervention is employed is the true cause of the effect, while it still appears that the intervention was more strongly associated with the effect? [This could not be the case with an RCT, but could occur
with a quasi-experiment.]
Acknowledging that truth is socially constructed, SCC describe how researchers can approximate the “truth” of causal
inferences through minimizing the effects of various threats to different types of validity in research studies. Contrary to
what I have previously learned about validity being a construct that describes whether or not researchers measure what
they set out to measure, SCC apply the concept of validity to causal inferences.
Question: Are the types of validity assessed when social scientists develop surveys or questionnaires from a
different literature base than the SCC model? It seems that the application of this typology to survey design would
inevitably lead to improved ability to make causal inferences, especially regarding construct validity.
One of the more complex threats to validity, construct validity, stems from the abstract concepts that social
scientists study and the need to fully explicate and accurately measure the variables being addressed in the study. SCC
identify “experimenter expectancies” as a threat to construct validity and suggest that experimenters remain distant
from the participants and that control groups and multiple experimenters can reduce this threat. They discuss the
Pygmalion effect in education, and I question what implications this may have for action research in education.
[Be aware that the validity of measures and the validity of causal inference are two different "fields."]
Shadish, Cook, and Campbell (2002) move from broad explanations of the types of validity to details on how statistical conclusion validity can be compromised. One method for minimizing threats to statistical conclusion validity is the use of Bayesian statistics, which reduces the need for null hypothesis significance testing by adding test results to existing knowledge bases. Bayesian statistics can be used with meta-analytic procedures. Based on Shadish et al.’s (2002) perspective on the future usefulness of Bayesian statistics, graduate students in HBHP should consider learning the method. Internal validity and the threats that compromise it were also thoroughly reviewed. Shadish et al.’s descriptions of construct validity and external validity focus on how researchers can make their experiments more easily replicable and applicable
validity and external validity focus on how researchers can make their experiments more easily replicable and applicable
to future users. If constructs are well defined, future users of interventions will more easily maintain fidelity to the
original intervention design. Designing interventions that have external validity is a bit more complicated. Shadish (2011)
notes that many interventions, especially those designed through community participation, might have limited
generalizability. Researchers must engage stakeholders to consider generalizability in such intervention design.
According to Shadish, the ability of stakeholders to effectively consider generalization needs to be further researched.
[Very good]
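As a concrete illustration of that Bayesian approach, here is a minimal sketch in Python (all numbers are hypothetical, not from the readings): a new study’s effect estimate updates an existing knowledge base rather than being tested against a null hypothesis.

# Minimal sketch (hypothetical values): conjugate normal-normal updating of
# a treatment-effect estimate with a prior built from earlier literature.
prior_mean, prior_sd = 0.30, 0.20   # effect size summarizing prior studies
study_est, study_se = 0.45, 0.15    # new study's estimate and standard error

# Posterior precision is the sum of the prior and data precisions; the
# posterior mean is a precision-weighted average of the two estimates.
post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / study_se**2)
post_mean = post_var * (prior_mean / prior_sd**2 + study_est / study_se**2)

print(f"updated effect estimate: {post_mean:.3f} (SD {post_var**0.5:.3f})")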
Validity is associated with the concept of truth, with each type of validity having a set of threats. What stood out were the additive and interactive effects of threats to internal validity, since multiple threats can have a multiplicative effect. I had not considered additive/interactive effects during a study, but can see how the effect of multiple threats can occur among high-achieving students. I would have liked for them to include an example of a selection-instrumentation additive effect, as I struggled to identify one. Chapter 3 of SCC was informative, since I initially had a narrow vision of generalization (there are five different types). I was unaware it can be reversed, from an experimental sample to an individual. I realized this is common in medicine, with clinical trials testing experimental treatments that will eventually be used for an individual patient. The Shadish article presented multiple perspectives on validity. I initially agreed with Reichardt’s claim that time is an important facet (since SCC omitted it). Cronbach argued the past cannot be duplicated and that the other functions of time can be incorporated into the other facets. I agree with Cronbach’s first assessment, but am still unsure how time can be incorporated into other facets without altering it.
Validity in experimental research – statistical conclusion, internal, construct, and external – allows researchers to
approximate truth from inference. Challenges arise in understanding the interactive effects of each validity threat and in balancing the tenuous relationship among all four types. For example, statistical conclusion validity warns against the impact of heterogeneity of units on variability (increasing the standard deviation); however, utilizing more homogeneous sampling risks violating assumptions of independence of observations, restricting the range of responses, and compromising external validity. Given this and other complicated additive threats, some scholars (e.g., Julnes, cited in Shadish, 2011) have called for an all-inclusive model to outline these overlapping dimensions. While such a framework might prove comprehensive, it would come at great expense to the utility and simplicity of the current approach.
Additionally, Julnes’ proposal detracts from the interactive nature of scientific research: challenging causal inferences
and experimental results through logic and replication.
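A quick simulation makes the heterogeneity trade-off above concrete (a minimal sketch; the effect size, sample sizes, and standard deviations are hypothetical): holding the raw treatment effect fixed, more heterogeneous units inflate the within-group standard deviation and erode the power of a simple two-group comparison.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
TRUE_DIFF = 0.5  # fixed raw treatment effect

def power(unit_sd, n=50, reps=2000):
    # share of simulated two-group studies detecting the effect at alpha = .05
    hits = 0
    for _ in range(reps):
        control = rng.normal(0.0, unit_sd, n)
        treated = rng.normal(TRUE_DIFF, unit_sd, n)
        if stats.ttest_ind(treated, control).pvalue < 0.05:
            hits += 1
    return hits / reps

print("homogeneous units (SD = 1):", power(1.0))    # higher power
print("heterogeneous units (SD = 2):", power(2.0))  # same effect, lower power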
In light of the two opposing ‘camps’ of thought that Shadish (not so subtly) refers to (Shadish, 2011), it is interesting to note that SCC are very quick to point out that validity is not a property of the design of a study, but rather, of inferences. This might be somewhat unsettling for those who hold fast to the idea that certain study designs (e.g., the oft-cited
randomized experiment) confer greater internal validity (and as such are superior) than others. [As SCC end up saying, it is common language to say that "certain designs confer greater internal validity," and that is ok as shorthand for "certain designs allow for more valid inferences."]
The bigger question should be, ‘What are the aims of the study?’ If the greater need is for evidence of external validity (as Shadish puts it, “Will this work [with xyz population/settings]?”), then I see no need for nitpicking about the
superiority of one design over another. Indeed, it becomes clear that an attempt to increase the internal validity by
over-controlling for all possible sources of bias results in a trade-off in which high internal validity is obtained at the risk
of having low external validity.
In public health, where the end goal of studies is usually translation to large-scale programs, the pursuit of greater
degrees of external validity seems the logical objective in evaluating outcomes. This is particularly important in
extremely diverse populations such as is the case in the US. [Before you worry about generalizability, you need to be very
confident that the intervention works; that is, you need studies with internal validity first, then you can do additional
studies to establish generalizability. Unfortunately, this last step is too rarely taken.]
However, in conducting pilot studies where the goal is to generate sufficient evidence to justify continued
research into promising solutions, designs that would generate a greater degree of internal validity would be far preferable. [But not just pilot studies. You need full-scale studies with adequate statistical power to ensure statistical
conclusion and internal validity.]