Experiments and Causal Inference

● We had a brief discussion of the role of randomized experiments in estimating causal effects earlier on. Today we take a deeper look.
● Key concepts:
– Explanatory, response, and confounding variables; treatments
– Randomized comparison
– Statistical significance; effect size
– Placebo effect, double-blind experiments
– Internal and external validity
– Completely randomized design vs. block design

Isolating Causal Effects: The Logic of Experimental Design

● We want to study the causal effect of some explanatory ("independent") variable on some response ("dependent") variable. We need to eliminate the effect of confounding variables.
– e.g. Effects of sitting in the front rows on course grades. What are the confounding factors?
– e.g. How might we conduct a classroom experiment on group label effects?
● Logic of experimental design:
– Randomization produces comparable groups before the different treatments (no treatment being a special case) are applied to the groups.
– Because the groups are comparable in every respect except the treatment assignments, differences in the response variable must be due to the effects of the treatments.

Confounding (Lurking) Variables

● The solution:
– Experiment: randomization; possible confounding variables should "even out" across groups.
– Observational study: measure potential confounding variables and determine whether they have an impact on the response (we may then adjust for these variables in the statistical analysis, e.g. by using them as "control" variables in regression models).

Randomized Experiment vs. Observational Study

● Both typically have the goal of detecting a relationship between the explanatory and response variables.
● Experiment:
– Create differences in the explanatory variable and examine any resulting changes in the response variable.
● Observational study:
– Observe differences in the explanatory variable and notice any related differences in the response variable.

Why Not Always Use a Randomized Experiment?

● Sometimes it is unethical or impossible to assign people to receive a specific treatment. e.g.
– Does talking on the cell phone while driving increase the risk of having an accident?
– Do religious people live longer?
● Certain explanatory variables, such as handedness or gender, are inherent traits and cannot be randomly assigned.

Vocabulary of Experiments

Multiple Treatment Values

● e.g. Effects of TV advertising: length of time; frequency of repetition
● e.g. Do Get Out the Vote efforts work? Which methods work better? (Gerber and Green field experiments)
– Personal canvassing
– Direct mail
– Phone calls

Example Design: Energy Conservation

● Multiple treatment values (different methods for monitoring energy use)

Statistical Significance

● If an experiment (or observational study) finds a difference between two (or more) groups, is this difference "real"? Could it be due to chance?
● If the "true" difference in the "population" is 0, what is the probability that we observe a "sample" difference of this size? If this probability is very small, then we call the observed effect statistically significant. (We'll learn later on how to determine this probability; a simulation sketch follows below.)
● Significance is partly determined by sample size, i.e., the number of subjects in an experiment.
● "Significant" typically means "non-zero effect". We should also look at the actual effect size to determine whether the effect is practically important.
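To make the chance reasoning on the last slide concrete, here is a minimal simulation sketch. The data are entirely hypothetical (made-up grades for a front-row vs. back-row assignment), and the permutation test shown is one standard way, not the only way, to approximate the probability described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical outcomes from a completely randomized experiment
# (made-up course grades; front-row = treatment, back-row = control).
treatment = np.array([78, 85, 91, 74, 88, 83, 90, 79])
control = np.array([72, 80, 77, 69, 84, 75, 81, 70])

observed_diff = treatment.mean() - control.mean()

# Permutation test: if the true difference is 0, the group labels are
# arbitrary, so reshuffling them shows how large a difference chance
# alone tends to produce.
pooled = np.concatenate([treatment, control])
n_treat = len(treatment)
n_sims = 10_000

count = 0
for _ in range(n_sims):
    shuffled = rng.permutation(pooled)   # re-randomize the labels
    diff = shuffled[:n_treat].mean() - shuffled[n_treat:].mean()
    if abs(diff) >= abs(observed_diff):  # two-sided comparison
        count += 1

p_value = count / n_sims
print(f"observed difference: {observed_diff:.2f}")
print(f"permutation p-value: {p_value:.4f}")
```

A small p-value says only that a gap this large would be rare if the true effect were zero; as the slide notes, it does not by itself tell us whether the effect size is practically important.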
Experiments in the Real World: Issues and Techniques

● Hawthorne, placebo, and experimenter effects (psychological effects); double-blind designs
● Issues of internal validity: refusals, non-adherence, dropouts
● External validity: generalizing the results
● Toward more powerful inference: block designs and matched pairs (a special case of block design)

Hawthorne, Placebo, and Experimenter Effects

● The problem:
– People may respond differently when they know they are part of an experiment.
– So, for example, the measured effect of a new drug could be the placebo effect (A) + any real effect of the medicine (B).
● The solution:
– Use placebos, control groups, and double-blind studies when possible to isolate (B).

The Hawthorne Effect

● A 1920s experiment at the Hawthorne Works of the Western Electric Company
● What changes in working conditions improve the productivity of workers?
– More lighting?
– Less lighting?
– Other changes?
● All changes improved productivity!
● Was this a double-blind experiment?

Double-Blinded Experiment: an Example

● Quitting smoking with nicotine patches (JAMA, Feb. 23, 1994, pp. 595-600)
● Variables:
– Explanatory: treatment assignment
– Response: cessation of smoking (yes/no)
● Double-blinded:
– Participants don't know which patch they received
– Nor do those measuring smoking behavior

Internal Validity

● Are we getting the causal effect right within the study? Issues:
– Refusals
– Non-adherers (subjects not following procedure)
– Dropouts
● Problem: individuals in these categories are probably not random samples of the subjects.

Comparing Pre- and Post-Treatment Results

● When randomization fails due to issues such as noncompliance, a pre/post comparison can be helpful.
● Differences between the treatment and control groups that do not change over time (or change over time in a similar fashion) get differenced out.
● The so-called "difference in differences" estimation method compares the shift in the response variable, not the post-treatment response itself. (A worked sketch appears at the end of this section.)

External Validity: Generalizing the Results

● Potential problems: lack of generalizability due to:
– unrealistic treatments
– unnatural settings
– a sample not representative of the population (e.g. undergraduate students are not representative of the larger population)
● "Natural experiments" or "quasi-experiments" can have an advantage on some of these fronts, as they take place in the real world.
– e.g. The smoking ban in Helena, Montana, for 6 months in 2002. Helena is geographically isolated and served by only one hospital. During the ban, the observed heart attack rate dropped by 60%.

Experimental Design: Blocking (Stratification)

● Randomization cannot eliminate chance differences between groups; the smaller the N, the larger the differences between the two groups tend to be. (Think of the extreme case of randomizing 4 people!)
● To improve: stratify the subjects into similar groups (blocks).
● Within each stratum, run a randomized experiment.
● This reduces variance in the estimated effects by reducing variance in the (stratified) population.

Block Design Example: Effects of TV Ads (see the randomization sketch below)
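As a minimal sketch of the blocking idea on the previous slides: the code below first stratifies hypothetical subjects by a made-up covariate (prior TV exposure, fitting the TV ads example) and then randomizes to treatment separately within each block. The subject IDs, the covariate, and the fifty-fifty split are illustrative assumptions, not a prescribed design.

```python
import random
from collections import defaultdict

random.seed(42)

# Hypothetical subjects with a blocking covariate (all values made up).
subjects = [
    ("s01", "heavy"), ("s02", "light"), ("s03", "heavy"), ("s04", "light"),
    ("s05", "heavy"), ("s06", "light"), ("s07", "heavy"), ("s08", "light"),
]

# Step 1: stratify -- group subjects by the blocking variable.
blocks = defaultdict(list)
for subject_id, tv_exposure in subjects:
    blocks[tv_exposure].append(subject_id)

# Step 2: randomize within each block, so every block contributes
# equally to the treatment and control groups.
assignment = {}
for members in blocks.values():
    random.shuffle(members)
    half = len(members) // 2
    for sid in members[:half]:
        assignment[sid] = "treatment"
    for sid in members[half:]:
        assignment[sid] = "control"

for sid, arm in sorted(assignment.items()):
    print(sid, arm)
```

Because treatment and control are balanced within each block, comparisons are made between similar subjects; that within-block similarity is what reduces the variance of the estimated effect.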
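Finally, returning to the pre/post comparison slide earlier in this section: a minimal difference-in-differences sketch, using made-up group means. The numbers are hypothetical; the point is only the arithmetic by which a group gap that is stable over time gets differenced out.

```python
# Hypothetical group means (made-up numbers for illustration).
treat_pre, treat_post = 50.0, 62.0
control_pre, control_post = 48.0, 54.0

# A naive post-only comparison mixes the treatment effect with the
# pre-existing gap between the groups (50 - 48 = 2).
naive = treat_post - control_post  # 8.0

# Difference in differences compares the *shifts*, so the stable
# pre-existing gap is differenced out.
did = (treat_post - treat_pre) - (control_post - control_pre)  # 6.0

print(f"naive post-only estimate: {naive:.1f}")
print(f"difference-in-differences estimate: {did:.1f}")
```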