H 615 Week 2 Comments

The readings attempted to define validity and its types. A clear message across all readings was that validity is not a “one size fits all” construct. According to SCC (2002), “…validity judgments are not absolute…” (p. 34); rather, “…validity threats are heuristic aids...they are not universally relevant…” (p. 40). Threats to validity serve researchers by helping them anticipate criticisms of causal inferences, so that study design controls may minimize those threats. Statistical conclusion validity and internal validity relate to study operation and causal inference. Although the former concerns the assessment of statistical covariation and the latter causal reasoning, threats to either type may lead to incorrect causal conclusions. Threats to internal validity include potentially influential factors that exist outside of an intervention (e.g., history, maturation). A question remains: how do we tease out intervention effects in large, multilevel, community-based interventions? Construct validity and external validity relate to generalizability. The former is concerned with understanding what is measured and how to assess it; the latter concerns the extent to which causality holds true across persons, settings, treatments, and outcomes. Bottom line: addressing all threats to validity is an undertaking more appropriate for a program of research than for any single study.

The “Truth about Validity” reading reminded me of the parable of the blind men and the elephant, particularly the last line: “For, quarreling, each to his view they cling. Such folk see only one side of a thing.” While reading Shadish’s comments and reflecting on the Shadish, Cook, and Campbell (SCC) readings, it became clearer that addressing validity in research is a process with the long-term goal of understanding the truth about a causal relationship. SCC’s typology is a heuristic, much like a theory, to help us focus on the threats most pertinent to the research at hand. I agree with SCC’s suggestion that it is not realistic to operationally address every threat to validity in a single experiment. I think this is what Shadish meant when he stated, “Good and truth are two different things.” It has been my perception that methods coursework and critiques place too much emphasis on validity within the single experiment rather than on the validity of a single experiment within a program of research. The latter shifts the focus to causal direction rather than effect size and requires more thoughtful consideration when identifying future actions for uncovering the truth about a causal relationship. In Shadish, Cook, and Campbell, the authors make the point that basic researchers are more concerned with construct validity, whereas applied researchers are more concerned with external validity. However, in his 2011 article, Shadish makes the counterintuitive point that evaluation researchers (whom I associate with applied researchers) more often cite papers about internal validity, because of evaluators’ “intuitive awareness” that their work is highly contextual and therefore not widely generalizable. I think that this is a flawed assumption, because it is the researcher’s responsibility to ensure that results are not over-generalized beyond the range of the research. Conflicting arguments hold either that researchers should never generalize outside the scope of the data or (as SCC believe) that science relies on generalization beyond the data to formulate new hypotheses. Generalizations within the mind of the researcher to spur new questions are one thing, but over-generalizations in the implementation of programs could be costly and ineffective. In the “outside” world, it is crucial for scientists who do community and evaluation work to serve as interpreters for the community, helping stakeholders understand what the results of a study tell us about a program or intervention rather than assuming implicit knowledge about the study’s external validity.
Shadish et al. (2002) thoroughly discuss validity, which is central to scientific effort. They describe four kinds of validity (statistical conclusion, internal, construct, and external) and the threats against them. I appreciate that the authors say that, even as we design studies to avoid validity threats and to approximately infer the truth, we are fallible human beings and irrefutable inferences cannot be achieved. I am particularly interested in discussing issues regarding external and construct validity. As far as constructs go, the anthropologist in me is trained to question whether we, as contextual scientists, can ever truly operationalize constructs. In the same way that truth is a social construct, so are emotions and perceptions. Can a US-based researcher adequately operationalize race in Latin America? Can we as individuals fully operationalize how everybody else experiences discrimination? And if so, how do we achieve that? Also, is there always virtue in attempting to increase our power of generalization? If we conduct a study using random samples from the entire nation so that we can generalize our conclusions to the US population, what do we lose in the process? Can we truly apply our findings to every community in the US? Is that desirable or even practical?

In the health sciences, particularly in the field of health promotion, where a great deal of importance lies in the motivations, attitudes, internal drives, and cognitions of study participants, it seems as though construct validity as it relates to measurement is nearly impossible to achieve. Often, due to complex causal chains and behavioral determinants that subjects themselves may not fully understand, a tool may accurately measure a poorly chosen trait or behavior that will not affect the outcome of interest. Furthermore, researchers may identify a factor that is a direct cause of the outcome of interest, yet the factor is poorly defined or the measures do not completely capture it. The necessity of proxies in health promotion research makes the pursuit of a valid study an endless task, and sadly, threats to construct validity do not always respond to study design. [Too much focus on measurement issues rather than design issues.]

In Chapters 2 and 3, Shadish, Cook, and Campbell (SCC) detail a validity typology comprising four validity types (and their numerous threats): statistical conclusion validity, internal validity, construct validity, and external validity. Shadish’s 2011 commentary relates aspects of this typology (and those before it) to the issues of validity theory and the methodologies for strengthening causal inference put forth by others in Advancing Validity in Outcome Evaluation: Theory and Practice. From these readings, I wondered what the public health field currently considers the most serious validity threats, as well as how researchers’ and evaluators’ decisions to address or circumvent certain validity threats over others are influenced by the priorities of their funding sources and the standards of the peer-reviewed journals to which they intend to submit manuscripts. Have we moved more toward concerns about threats to external validity than about threats to internal validity and/or statistical conclusion validity, as Shadish suggests in his commentary on certain chapters? Finally, though discussion of qualitative research is embedded in the SCC chapters, it remained unclear to me to what extent, and how, statistical conclusion validity is plausible for causal inferences based on qualitative research. [Very nice!]
SCC point out, and Shadish reiterates, the catch-22 wherein methodological solutions for improving one type of validity will decrease another, which has sparked debates among researchers about which methods are best practice and which type of validity is most important to address. However, these arguments are largely arbitrary, as choosing either side excludes certain study designs and thus likely omits valuable information about the construct(s) and relationship(s) of interest. Instead, researchers should collaborate and consider questions in terms of a series of observational, quasi-experimental, and experimental studies that inform conclusions about causation only after the relationship has been examined using multiple definitions and multiple methods covering all validity considerations. Campbell and Stanley comment that matching is “overrated” and exclude it from their commentary. Though matching does not guarantee representativeness, and resulting relationships should therefore not be overemphasized, matching when randomization is impossible or unethical can be a useful tool for narrowing possible confounders and should not be dismissed completely (a small sketch of the idea follows this comment). SCC state that researchers should publish more detailed data and let readers decide whether the resulting effect sizes support future research, but that authors are prevented from doing so by journals’ space limitations. Sharing study details could better inform future studies and provide alternate methods for supporting correlations.
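To make the matching point concrete: below is a minimal sketch, in Python with entirely hypothetical data, of nearest-neighbor matching on a single observed covariate (age). This is not a procedure from Campbell and Stanley or SCC, only an illustration of how matching narrows differences on measured confounders while leaving unmeasured ones untouched.

    def nearest_neighbor_match(treated, controls, key):
        """Pair each treated unit with the closest still-unmatched control on `key`."""
        unmatched = list(controls)
        pairs = []
        for t in treated:
            best = min(unmatched, key=lambda c: abs(c[key] - t[key]))
            unmatched.remove(best)  # match without replacement
            pairs.append((t, best))
        return pairs

    # Hypothetical units: age is the only observed covariate here; any
    # unobserved covariate can still differ between groups, so confounding
    # is narrowed, not eliminated.
    treated = [{"id": 1, "age": 34}, {"id": 2, "age": 51}, {"id": 3, "age": 42}]
    controls = [{"id": 4, "age": 33}, {"id": 5, "age": 49},
                {"id": 6, "age": 60}, {"id": 7, "age": 41}]

    for t, c in nearest_neighbor_match(treated, controls, "age"):
        print(f"treated {t['id']} (age {t['age']}) <-> control {c['id']} (age {c['age']})")

In practice one would match on many covariates at once (often via a propensity score), but the caveat above still applies: matching balances only what was measured.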
Although occasionally bogged down during the readings with thoughts such as “research seems like a losing battle,” I was also reminded of instances where new ideas have emerged from studies whose designs took unexpected turns as a result of realities outside the researchers’ control. These chapters affirmed SCC’s assertion that forethought and careful study design can help you avoid problems that retrospective statistical adjustment cannot fix. However, I think that being aware of all of these threats throughout the planning, implementation, analysis, and reporting stages of research can sometimes lead researchers to turn a particular threat to a design into a strength by tweaking the main question of interest. For example, consider Elder’s use of the Berkeley Growth and Guidance studies’ data to explore the differential impacts of the Depression across different age and social strata. For Macfarlane and the other original investigators, the Depression may have posed a history threat to internal validity, given how different its impact on development was within each of the two samples, let alone on the financial standing and opportunity of each family within the two samples. [Yes, one person’s problem can become another person’s main question of interest – and that is ok; that is one way that science advances.]

The readings for this week raised several questions for me, particularly regarding the relationship between validity and evaluation. Though the book covers several validity types, I was particularly intrigued by external validity, given generalizability’s pertinence to program development, implementation, and evaluation. The book chapters note that external validity reflects the generalizability of inferences across persons, settings, treatments, and outcomes (p. 83), while Shadish’s (2011) piece touches on the concept of validity as it relates to outcome evaluation. Though both concepts are cornerstones of public health research and practice, their relationship can become a bit ambiguous, particularly when discussing trends in translation and adaptation. How do we balance the use of existing programs and information, adaptation, fidelity, and external validity? To what extent is the generalizability of inferences valid when evaluation is heavily contextual? With these questions, it seems to me that the line between validity and evaluation can become blurred, namely in deciphering whether conclusions about validity result from evaluation or are components of the evaluative process, and in balancing context-specific evaluation with generalizability. [Some confusion re validity and evaluation. Whether a study is a research study or an evaluation study, the types of validity apply to any inferences from the results of that study. There is no line between validity and evaluation!]

So what exactly is most important – external or internal validity? And why? The Shadish reading notes that while internal validity is discussed and explored more often in the literature, external validity is where our attention is drawn. Is this an effort to master the control of study design while still demonstrating to our colleagues and policy makers that the outcomes of the study can be generalized to the larger population (under the parameters selected)? The interplay of internal and external validity is also addressed by Shadish, Cook, and Campbell: “No one in the real world of science ever talks about whether A caused B” (p. 94). Their rationale lies in descriptive causal relationships. Is it feasible to think in terms of a controlled environment and yet extend beyond that control to suggest a causal relationship between A and B in the real world? Once we address the relationship in a real-world setting (e.g., an intervention program), how well can we account for confounding variables to truly establish that cause precedes effect? Could it not be that the environment in which the intervention is employed is the true cause of the effect, even while the results suggest the intervention was more strongly associated with it? [This could not be the case with an RCT, but could occur with a quasi-experiment.]
Acknowledging that truth is socially constructed, SCC describe how researchers can approximate the “truth” of causal inferences by minimizing the effects of various threats to the different types of validity in research studies. Contrary to what I have previously learned about validity being a construct that describes whether or not researchers measure what they set out to measure, SCC apply the concept of validity to causal inferences. Question: Do the types of validity assessed when social scientists develop surveys or questionnaires come from a different literature base than the SCC model? It seems that the application of this typology to survey design would inevitably improve the ability to make causal inferences, especially regarding construct validity. One of the more complex threats to validity, construct validity, stems from the abstract concepts that social scientists study and the need to fully explicate and accurately measure the variables being addressed in a study. SCC identify “experimenter expectancies” as a threat to construct validity and suggest that experimenters remain distant from the participants and that control groups and multiple experimenters can reduce this threat. They discuss the Pygmalion effect in education, and I question what implications this may have for action research in education. [Be aware that the validity of measures and the validity of causal inference are two different “fields.”]

Shadish, Cook, and Campbell (2002) move from broad explanations of the types of validity to details on how statistical conclusion validity can be compromised. One method for minimizing threats to statistical conclusion validity is the use of Bayesian statistics, which reduces the need for null hypothesis significance testing by adding test results to existing knowledge bases (see the sketch following this comment). Bayesian statistics can also be used with meta-analytic procedures. Based on Shadish et al.’s (2002) perspective on the future usefulness of Bayesian statistics, graduate students in HBHP should consider learning the method. Internal validity, and the ways it can be compromised, is also thoroughly reviewed. Shadish et al.’s descriptions of construct validity and external validity focus on how researchers can make their experiments more easily replicable and applicable for future users. If constructs are well defined, future users of interventions will more easily maintain fidelity to the original intervention design. Designing interventions that have external validity is a bit more complicated. Shadish (2011) notes that many interventions, especially those designed through community participation, might have limited generalizability. Researchers must engage stakeholders to consider generalizability in such intervention design. According to Shadish, stakeholders’ ability to effectively consider generalization needs to be further researched. [Very good]
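As a concrete illustration of the Bayesian point above: below is a minimal sketch in Python, using hypothetical effect estimates and variances, of a conjugate normal-normal update in which each study’s result is folded into the running prior by precision weighting rather than tested against a null in isolation. This is an illustration of the general idea, not a method prescribed by SCC.

    def update(prior_mean, prior_var, est, est_var):
        """Combine a normal prior with a normal study estimate (conjugate update)."""
        w_prior, w_est = 1.0 / prior_var, 1.0 / est_var
        post_var = 1.0 / (w_prior + w_est)
        post_mean = post_var * (w_prior * prior_mean + w_est * est)
        return post_mean, post_var

    mean, var = 0.0, 1.0  # skeptical prior: effect centered at zero, wide variance
    studies = [(0.40, 0.04), (0.25, 0.09), (0.35, 0.02)]  # hypothetical (estimate, variance) pairs

    for est, est_var in studies:  # each posterior becomes the prior for the next study
        mean, var = update(mean, var, est, est_var)
        print(f"posterior effect = {mean:.3f}, sd = {var ** 0.5:.3f}")

The same precision-weighted pooling underlies fixed-effect meta-analysis, which is why the comment’s link between Bayesian statistics and meta-analytic procedures is a natural one.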
Validity is associated with the concept of truth, with each type of validity having its own set of threats. What stood out to me was the additive and interactive effects of threats to internal validity, since multiple threats can have a multiplicative effect. I had not considered additive/interactive effects during a study, but I can see how the effects of multiple threats could combine, for example among high-achieving students. I would have liked the authors to include an example of the selection-instrumentation additive threat, as I struggled to identify one. Chapter 3 of SCC was informative, since I initially had a narrow vision of generalization (SCC describe five different types). I was unaware that generalization can be reversed, from an experimental sample to an individual; I realized this is common in medicine, where clinical trials test experimental treatments that will eventually be used for individual patients. The Shadish article presented multiple perspectives on validity. I initially agreed with Reichardt’s claim that time is an important facet (since SCC omitted it). Cronbach argued that the past cannot be duplicated and that the other functions of time can be incorporated into the other facets. I agree with Cronbach’s first assessment, but am still unsure how time can be incorporated into the other facets without altering them.

Validity in experimental research – statistical conclusion, internal, construct, and external – allows researchers to approximate truth from inference. Challenges arise in understanding the interactive effects of each validity threat and in balancing the tenuous relationship among all four types. For example, statistical conclusion validity warns against the impact of heterogeneity of units on variability (it inflates the standard deviation); however, more homogeneous sampling risks violating assumptions of independence of observations, restricting the range of responses, and compromising external validity (a small simulation sketch at the end of these comments illustrates the first half of this trade-off). Given this and other complicated additive threats, some scholars (e.g., Julnes, cited in Shadish, 2011) have called for an all-inclusive model to outline these overlapping dimensions. While such a framework might prove comprehensive, it would come at great expense to the utility and simplicity of the current approach. Additionally, Julnes’ proposal detracts from the interactive nature of scientific research: challenging causal inferences and experimental results through logic and replication.

In light of the two opposing ‘camps’ of thought that Shadish (not so subtly) refers to (Shadish, 2011), it is interesting to note that SCC are very quick to point out that validity is a property not of a study’s design but of its inferences. This might be somewhat unsettling for those who hold fast to the idea that certain study designs (e.g., the oft-cited randomized experiment) confer greater internal validity than others (and as such are superior). [As SCC end up saying, it is common language to say that “certain designs confer greater internal validity,” and that is ok as shorthand for “certain designs allow for more valid inferences.”] The bigger question should be, ‘What are the aims of the study?’ If the greater need is for evidence of external validity (as Shadish puts it, “Will this work [with xyz populations/settings]?”), then I see no need for nitpicking about the superiority of one design over another. Indeed, it becomes clear that attempting to increase internal validity by over-controlling for all possible sources of bias results in a trade-off in which high internal validity is obtained at the risk of low external validity. In public health, where the end goal of studies is usually translation into large-scale programs, the pursuit of greater external validity seems the logical objective in evaluating outcomes. This is particularly important in extremely diverse populations, as is the case in the US. [Before you worry about generalizability, you need to be very confident that the intervention works; that is, you need studies with internal validity first, then you can do additional studies to establish generalizability. Unfortunately, this last step is too rarely taken.] However, in conducting pilot studies where the goal is to generate sufficient evidence to justify continued research into promising solutions, designs that generate a greater degree of internal validity would be far more preferable. [But not just pilot studies. You need full-scale studies with adequate statistical power to ensure statistical conclusion and internal validity.]
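Finally, the heterogeneity-of-units point above can be made concrete with a small simulation sketch in Python (all parameters hypothetical): the same true treatment effect is estimated with less and with more heterogeneous units, and the standard error of the estimate grows with unit heterogeneity, which is what erodes statistical conclusion validity. The remedy of homogeneous sampling, in turn, restricts the range of units to which the finding generalizes.

    import random
    import statistics

    random.seed(1)
    EFFECT, N = 0.5, 200  # hypothetical true treatment effect; units per arm

    def effect_se(unit_sd):
        """Standard error of the effect estimate under a given unit heterogeneity."""
        control = [random.gauss(0.0, unit_sd) for _ in range(N)]
        treated = [random.gauss(EFFECT, unit_sd) for _ in range(N)]
        pooled_var = (statistics.variance(control) + statistics.variance(treated)) / 2
        return (2 * pooled_var / N) ** 0.5

    print("homogeneous units (sd = 1):   SE of effect =", round(effect_se(1.0), 3))
    print("heterogeneous units (sd = 3): SE of effect =", round(effect_se(3.0), 3))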