Outline
• Validity of Inference
• Theory of Validity
• Statistical Conclusion Validity
• Internal Validity
• Construct Validity – Jill
• External Validity – Tim
• Trade-offs – Tim et al.
• Discussion

Validity of Inference
VALIDITY
• The approximate truth of an inference
– A judgment about the extent to which relevant evidence supports the inference as being true
• Always entails fallible human judgment
– Evidence comes both from empirical findings and from their consistency with past findings and theories
• Validity judgments are never absolute
– We can never be certain that inferences are true or that all possible alternatives have been falsified

Validity of Inferences
• Validity is a property of inferences, not of designs or methods
• Even a randomized experiment does not guarantee a valid causal inference
– It can be “broken” by:
   • Differential attrition
   • Low statistical power
   • Improper statistical analysis
   • Sampling error

Why is it important to remember that validity is a property of a knowledge claim, not a property of the design?

Three Theories of Truth
• Correspondence theory
– A knowledge claim is true if it corresponds to the world
– e.g., “it is raining” is true if we look outside and see rain
• Coherence theory
– A claim is true if it belongs to a coherent set of claims
• Pragmatism
– A claim is true if it is useful to believe it
• Philosophers do not agree on which theory of truth is correct, and for us it doesn’t matter!
– Science uses all three to approximate the truth

The Theory of Validity Is Pragmatic and Uses All Three
• Correspondence between empirical evidence and abstract inferences
• Sensitivity to the degree of coherence between findings and theory
• Pragmatic ruling out of alternative explanations
• Truth is a social construction!

Campbell & Stanley (1963)
• Followed Campbell (1957) closely in defining internal and external validity
• Internal validity: inferences about whether “the experimental treatments make a difference in this specific experimental instance” (p. 5)
• External validity: asks “to what populations, settings, treatment variables and measurement variables can this effect be generalized?” (p. 5)

Cook & Campbell (1979): Expanded Typology of Validity
To draw generalized causal inferences, it is useful to treat the causal and generalizability aspects of the inferences separately:
– Statistical Conclusion Validity
– Internal Validity
– Construct Validity
– External Validity

These Correspond to 4 Questions
• How large and reliable is the covariation between the presumed cause and the effect? (statistical conclusion validity)
• Is the covariation causal, or would it have been obtained without the treatment? (internal validity)
• Which general constructs are involved in the persons (units), treatments, observations, and settings (UTOS)? (construct validity)
• How generalizable is the locally embedded causal relationship over varied UTOS? (external validity)
• These questions and inferences are often considered separately, so it is practical to have the typology reflect that
• However, they are often related, and different combinations are possible (e.g., internal validity with or without construct validity)
• It is interesting to consider the limits of these combinations (e.g., to what extent is it possible to have both high internal and high external validity?)

Threats to Validity
• Threats are specific reasons why we can be partly or completely wrong in our inferences
– About covariation, causation, constructs, or generalization across variations in UTOS
• It is useful to anticipate criticisms of inferences by considering the types of limitations encountered by past research
• Heuristics, such as a list of potential threats, allow us to account for threats in the design or by including measures of anticipated threats

3 Critical Questions about Threats
• For any particular experiment and finding:
– How would the threat apply in this case?
– Is there other evidence that the threat is plausible rather than just possible?
– Does the threat operate in the same direction as the observed effect (so that it could partially or totally explain it)?
• But “ruling out” threats is a falsification enterprise, so it is always limited

Statistical Conclusion Validity
• The validity of inferences about the covariation (correlation) between the treatment and the outcome
• How large and reliable is the covariation?
– Whether the variables covary or not
– How strongly they covary (SCC, p. 42)

Testing Covariation
• Null hypothesis significance testing (NHST)
• Common misunderstandings of the p value
• NHST tells us little about effect size
• Effect sizes bounded by confidence intervals
– An alternative approach
– SCC recommend reporting these along with the exact p value (the exact probability of a Type I error)

Classical Interpretation of the p Value
• In the classic interpretation, exact Type I probability levels tell us the probability that the results that were observed in the experiment could have been obtained by chance from a population in which the null hypothesis is true (Cohen, 1994, as cited in SCC, p. 44)
• “Perhaps not the most interesting hypothesis” (SCC)

Alternative Interpretation of the p Value
• The p value (probability level) signifies the confidence we can have in deciding among the following claims:
– 1) Treatment A did better than treatment B (the sign of the effect is +)
– 2) Treatment B did better than treatment A (the sign of the effect is -)
– 3) The sign is uncertain (p > .05 signifies claim 3: “too close to call”)

Incorrect Statistical Conclusions (SCC, p. 42)
– 1) Whether the variables covary
   – Type I error: concluding there is a difference when there is none
   – Type II error: concluding there is no effect when in fact there is one
– 2) How strongly they covary
   – Overestimating the magnitude of covariation (and the confidence placed in that estimate)
   – Underestimating the magnitude of covariation (and the confidence placed in that estimate)

Threats to Statistical Conclusion Validity
• Low statistical power
– See Table 2.3 (pp. 46-7) for methods to increase power
• Violated assumptions of the test statistics
• Fishing and the error rate problem
• Unreliability of measures
– Always attenuates bivariate relationships
• Restriction of range
– Floor and ceiling effects
• Unreliability of treatment implementation
• Extraneous variance in the experimental setting
• Heterogeneity of respondents (units)
• Inaccurate effect size estimation

Can we prove that covariation between a treatment and an outcome is zero?

To support a causal inference, three things must be established (p. 53):
• 1) A precedes B in time (use design)
• 2) A covaries with B (use statistics)
• 3) No other explanation for the relationship is plausible (use design if possible)

Internal Validity
• The ability to infer with confidence that an independent variable has produced the observed differences in the dependent variable (Singleton & Straits, 2005, p. 188)
– Isolating the independent variable
– Controlling confounds
• Validity: the approximate truth of an inference (SCC, p. 34)

Internal Validity
• The validity of inferences about whether observed covariation between A (treatment) and B (outcome) reflects a causal relationship from A to B as those variables were manipulated or measured
• Is the covariation causal, or would the same effect be obtained without the treatment?

Internal Validity
• “Local Molar Causal Validity”
– Local: limited to the particular UTOS of the study; no claim of generalizability is made
– Molar: treatments are a complex package
– Causal: restricted to claims that “A caused B”
• “One of the things that's most difficult to grasp about internal validity is that it is only relevant to the specific study in question” (Trochim, 2006)

Threats to Internal Validity
• Each threat signifies a distinct class of alternative possible causes (p. 55)
– Ambiguous temporal precedence
– Selection bias
– History
– Maturation
– Statistical regression
– Attrition (a special case of selection bias)
– Testing effects
– Instrumentation
– Additive/interactive effects of these threats

*GARDASIL DAILY DOUBLE*
Threats to internal validity are not necessarily independent of each other. Define two threats to internal validity and explain how they could be related or co-occur in a study.

Randomization Controls Most Threats to Internal Validity
• Indeed, all except:
– Differential attrition
– Differential testing

Relating Statistical and Internal Validity
• Both concern operations (not the constructs they represent)
• Statistical conclusion validity is concerned with errors in assessing covariation
• Internal validity is concerned with errors in causal reasoning
• Internal validity depends substantially on statistical conclusion validity

Jill and Tim
• Jill
– Construct Validity
• Tim
– External Validity
– Trade-offs

Shadish (2011)
• Evaluators discuss external validity much less than internal validity
• Gives some idea of the disagreements in the field(s)
• Threats to validity overlap
– E.g., attrition is listed as a threat to internal validity. But because sample size drops, it can also threaten power (statistical conclusion validity), may require changing how we describe who is and is not in the study (construct validity), and may raise questions about whether the intervention would have the same effect in those who dropped out (external validity).

THREATS TO VALIDITY DISCUSSION QUESTIONS

What happens to the precision and confidence intervals of effect size estimates when a study has low power? What kind of validity is threatened?

A specific instance of selection bias is also defined in SCC’s list as a separate threat to internal validity. What is it?

Confounding of treatment effects with population differences threatens _______ validity.

You are part of a research team that has been funded to tackle the adult obesity epidemic.
The hypothesis is that adults receiving the intervention will have a healthier weight than adults who do not receive the intervention. You ask your boss, “How will we measure healthy weight?” Your boss replies, “Simple: we will ask each participant their height and weight.” You ask, “That’s it?” and your boss replies, “Yes.” You’re new to the team, but you really want to speak up, because this is a threat to _________ validity known as ____________________.

*HERPES DAILY DOUBLE*
Random sampling, though rarely performed in experimental designs, improves what kind of validity?

*HEPATITIS DAILY DOUBLE*
You work at High Times Community College, and your coworker comes to work sharing the results of a new study. He says, “Listen to this! In a new study, students were randomly assigned to take 10, 15, or 20 units of course credit. Results show that college students who took 20 credits were less likely to engage in marijuana use. So to reduce the prevalence of marijuana use here at High Times CC, we have to implement a policy setting a minimum of 20 credit hours for all students!” You, having taken H699, take a closer look at the report and see that the study was conducted at one university… Harvard. Your response to your colleague is, “Sorry, my friend, but this study most likely lacks _________ because ____________________.”

You want to test a novel approach to reducing psychological distress among college students. Your technique is provided to students who come into the campus counseling center. You conduct two-week follow-ups with these students and see that their self-reported levels of psychological distress have improved. You are ready to tell your boss about the success of your program when your colleague points out that your study has a threat to _______ validity known as ____________.

Secular trends pose a threat to _________ validity.

You have completed an RCT in which you examined the impact of an SAT preparation course on SAT performance.
You want to see if results differ for boys versus girls. What will happen to your power if your sample is divided by gender?
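The subgroup question can be made concrete with a quick calculation, and the same numbers also bear on the earlier question about the precision of effect size estimates in low-powered studies. What follows is a minimal Python sketch under stated assumptions, not a prescribed analysis. It assumes two equal arms, a 50/50 gender split, and a two-sided z-test on a standardized mean difference, which is a normal approximation to the usual t-test. The effect size of 0.3 SD and the 200 students per arm are hypothetical values chosen only for illustration. The helper names `two_sample_power` and `ci_half_width` are also hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def two_sample_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test for a standardized
    mean difference d, with n_per_group units in each of two equal arms.
    (Normal approximation; hypothetical helper for illustration.)"""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # With equal arms, the standard error of the standardized difference is sqrt(2 / n)
    noncentrality = d / (2 / n_per_group) ** 0.5
    # Probability that the test statistic lands in either rejection region
    return STD_NORMAL.cdf(noncentrality - z_crit) + STD_NORMAL.cdf(-noncentrality - z_crit)


def ci_half_width(n_per_group, level=0.95):
    """Approximate half-width of the confidence interval for d with equal arms.
    Ignores the small extra variance term that grows with d itself."""
    z = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    return z * (2 / n_per_group) ** 0.5


if __name__ == "__main__":
    d = 0.3               # hypothetical true effect of the SAT course, in SD units
    n_full = 200          # hypothetical students per arm in the whole trial
    n_sub = n_full // 2   # per arm within each gender, assuming a 50/50 split

    for label, n in [("full sample", n_full), ("one gender subgroup", n_sub)]:
        print(f"{label:>20}: n/arm={n:3d}  power={two_sample_power(d, n):.2f}  "
              f"95% CI half-width={ci_half_width(n):.2f}")
```

With these assumed inputs, the full trial has power of about 0.85. Each gender subgroup has half as many students per arm, so its power drops to about 0.56, and its confidence interval half-width grows from about 0.20 to about 0.28 SD (a factor of √2). A direct test of whether the effect differs between boys and girls compares two of these noisier subgroup estimates. Its standard error is twice that of the full-sample effect, so it has even less power to detect a difference of the same size.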