Collecting Data: Experiments and Observations

Association

Two variables are associated if values of one variable tend to be related to values of the other variable. What associations do you notice?

Causation

Two variables are causally associated if changing the value of one variable influences the value of the other variable. When deciding about potential causality between two variables, we need to identify the explanatory variable and the response variable.

Being female is positively associated with getting body piercings. Does being female cause body piercings?

Exercising is negatively associated with smoking. Does exercising cause a person to not smoke? Does smoking cause a person to exercise less?

Height is positively associated with weight. Does being taller cause a person to weigh more?

TVs and Life Expectancy

[Figure: scatterplot of life expectancy (about 40 to 80 years) against TVs per 1000 people (0 to 1000) for countries including Japan, Australia, France, Canada, the United Kingdom, the United States, Mexico, China, Russia, Pakistan, Haiti, South Africa, and Angola; r = 0.74.]

Should people buy more TVs to live longer? Association does not imply causation!

Eating Ice Cream Causes Polio

Confounding Variable

A third variable that is associated with both the explanatory and response variable is called a confounding variable. Whenever confounding variables are present (or may be present), a causal association cannot be determined.

[Diagram: a confounding variable is associated with both the explanatory variable and the response variable.] For example, hot weather is associated with both eating ice cream and getting polio, and wealth is associated with both the number of TVs per capita and life expectancy.

Data Collection: Experiment vs. Observational Study

In an experiment, the researcher controls the value of the explanatory variable (i.e., controls who gets the “treatment”).
In an observational study, the researcher does not control the value of any variable, but simply observes the values as they naturally exist. These are the two ways to collect data: through an experiment or through an observational study.

Observational studies cannot be used to establish causation, because there are almost always confounding variables that have not been measured or accounted for. However, confounding variables can be avoided in experiments by randomly assigning the values of the explanatory variable.

Randomized Experiment

In a randomized experiment, the explanatory variable for each unit is determined randomly, before the response variable is measured. The explanatory variable is also known as the treatment; in the simplest case it takes one of two values, such as 1 (receives the treatment) or 0 (does not). Random assignment ensures that the explanatory variable for each unit is determined by random chance alone and is not influenced by any confounding variables.

Control Group

When determining whether a treatment is effective, it is important to have a comparison group, known as the control group. Randomly divide the sample into two groups and assign one of the groups to receive the treatment. It isn’t enough to know that everyone in one group improved; we need to know whether they improved more than they would have improved without the treatment. All randomized experiments need either a control group or two different treatments to compare.

Caffeine and Academic Performance

Does consuming caffeine before an exam undermine your performance on the exam? Suppose we have a sample of n = 100 students.

Setting up Randomized Experiments

Start by gathering a random sample from the population. Then randomly assign the value of the explanatory variable using one of these methods:

Option 1: Put all the names into a hat and randomly pull out names to go into the different treatment groups.
Option 2: Put each name onto a card, shuffle the cards, and deal out the cards into as many piles as there are treatment groups.
Option 3: Use technology (for example, a random number generator).

Caffeine and Academic Performance

What is the explanatory variable? Whether the student consumed caffeine before the exam.
What is the response variable? Performance on the exam (i.e., the grade).

In an observational study, we simply gather the data from the 100 participants. If we find a relationship between caffeine consumption and exam performance, why can’t we make the causal claim that caffeine consumption undermines exam performance? What are some confounding variables that could affect both caffeine consumption and exam performance? We could try to control for such factors by gathering data on them. However, there are so many factors that may influence both caffeine consumption and exam performance that we could not possibly account for all of them.

In a randomized experiment, we randomly assign the value of the explanatory variable for each participant. To do this, we would randomly select 50 of the students and have them consume caffeine before taking the exam, and we would forbid the other 50 from consuming caffeine before the exam. This ensures that the explanatory variable for each unit (i.e., caffeine consumption) is determined by random chance alone and is not influenced by any confounding variables.

Randomized Experiments

If we find a relationship between caffeine consumption and exam performance in this randomized experiment, can we make the causal claim that caffeine consumption undermines exam performance? Yes: because the explanatory variable is randomly assigned, it is not associated with any other variables, and thus confounding variables are eliminated. What about the multitude of confounding variables that could affect exam performance?
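The question about the many confounding variables we cannot measure can be explored with a small simulation. The sketch below is entirely hypothetical: it invents 100 students with one unmeasured trait (hours of sleep before the exam) and assumes, purely for illustration, that in an observational study the students who slept less are the ones who choose caffeine. It then uses technology (Option 3) to randomly assign 50 students to caffeine and compares average sleep in the two groups under each design. The names `students`, `random_assignment`, and `mean_sleep_by_group`, and all numbers, are assumptions made for this sketch.

```python
import random
import statistics

# Hypothetical data: 100 students, each with an unmeasured trait (hours of
# sleep before the exam) that could affect both caffeine use and grades.
# All numbers are invented for illustration.
random.seed(1)
students = [{"id": i, "sleep": random.gauss(7, 1.5)} for i in range(100)]


def random_assignment(units, n_treatment):
    """Option 3 (technology): randomly pick n_treatment units for the
    treatment group (coded 1); everyone else is in the control group (0)."""
    treated = {u["id"] for u in random.sample(units, n_treatment)}
    return {u["id"]: 1 if u["id"] in treated else 0 for u in units}


def mean_sleep_by_group(units, assignment):
    """Average of the confounder (sleep) in each group."""
    groups = {0: [], 1: []}
    for u in units:
        groups[assignment[u["id"]]].append(u["sleep"])
    return {g: statistics.mean(vals) for g, vals in groups.items()}


# Observational study (assumption): students choose for themselves, and the
# ones who slept less than 7 hours are the ones who drink caffeine.
observational = {u["id"]: 1 if u["sleep"] < 7 else 0 for u in students}

# Randomized experiment: a random 50 get caffeine, the other 50 do not.
randomized = random_assignment(students, 50)

obs = mean_sleep_by_group(students, observational)
rnd = mean_sleep_by_group(students, randomized)
print(f"Observational: caffeine {obs[1]:.2f} h, no caffeine {obs[0]:.2f} h")
print(f"Randomized:    caffeine {rnd[1]:.2f} h, no caffeine {rnd[0]:.2f} h")
```

Under self-selection, the caffeine group sleeps noticeably less on average, so any difference in grades could be due to sleep rather than caffeine. Under random assignment, average sleep tends to be close in the two groups; the random split never looks at sleep, or at any other trait, so the same balancing applies to confounders nobody measured.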
Because people have been randomly assigned to the “caffeine” group or the “no caffeine” group, the values of the confounding variables will tend to be evenly distributed between the two groups.

[Diagram: Randomized Experiment. The link between a confounding variable and the explanatory variable (caffeine consumption) is crossed out; the remaining question is whether caffeine consumption affects the response variable (exam performance).]

Placebo and Blinding

Control groups should be given a placebo, a fake treatment that resembles the active treatment as much as possible. Using a placebo is only helpful if participants do not know whether they are getting the placebo or the active treatment. If possible, randomized experiments should be double-blind: neither the participants nor the researchers involved should know which treatment the participants are actually getting.

Placebo Effect

Often, people will experience the effect they think they should be experiencing, even if they aren’t actually receiving the treatment. This is known as the placebo effect. One study estimated that 75% of the effectiveness of anti-depressant medication is due to the placebo effect.

The Strange Powers of Placebos

Controlling for Placebo Effects

Give the control group a placebo, so that every participant thinks they are receiving the treatment. When ethically acceptable, it is even better if the participants don’t know the nature of the treatment they are receiving (e.g., they are not told “We are giving you caffeine because we think it will undermine your exam performance”).

Limitations of Randomized Experiments

Randomized experiments are ideal, but sometimes they are not ethical, economically feasible, or methodologically possible. Often, you have to do the best you can with data from observational studies.

Randomization in Data Collection

Was the sample randomly selected?
  Yes: can generalize to the population
  No: can’t generalize to the population
Was the explanatory variable randomly assigned?
  Yes: can make causal claims
  No: can’t make causal claims
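One way to organize the double-blind design described under Placebo and Blinding is for a third party, rather than the researchers, to perform the random assignment and keep the key sealed until all responses are recorded. The sketch below assumes that setup, and assumes the active treatment and the placebo come in identical-looking packets labeled only with a participant ID. The function name `double_blind_assignment` is hypothetical, not part of any standard library.

```python
import random


def double_blind_assignment(participant_ids, seed=None):
    """Hypothetical sketch: a third party (not the researchers) randomly
    assigns half of the participants to the active treatment and the rest
    to an identical-looking placebo, and keeps the key sealed."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)  # random order, so the first half is a random half
    half = len(ids) // 2
    sealed_key = {pid: ("active" if i < half else "placebo")
                  for i, pid in enumerate(ids)}
    # Participants and researchers only ever see packets labeled with a
    # participant ID; the packet contents look the same in both conditions.
    packet_labels = sorted(sealed_key)
    return packet_labels, sealed_key


labels, key = double_blind_assignment(range(1, 101), seed=42)
print(labels[:5])  # [1, 2, 3, 4, 5]: IDs only, no conditions revealed
# The third party opens `key` only after every response has been recorded.
```

Keeping the assignment and the key with someone outside the research team is what lets both the participants and the researchers stay unaware of who received the placebo.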
Randomization

Taking a random sample and conducting a randomized experiment is ideal, but rarely achievable.

If the focus of the study is to use a sample to estimate a quantity for the entire population (a parameter), you need a random sample, but you do not need a randomized experiment.

If the focus of the study is to establish causality from one variable to another, you need a randomized experiment, and you can settle for a non-random sample.

Assignment

Part I
Graded Problems: 1.74, 1.76, and 1.88
Additional Practice Problems (not to be turned in): 1.75, 1.77, and 1.85

Part II
Go to http://sda.berkeley.edu/cgi-bin/hsda?harcsda+gss10 and find 5 different variables that you think may be associated with 1 of the 10 variables you selected for the previous assignment. For each of the 5 new variables, provide the variable name and the question associated with it, provide the name of the variable it might be associated with, and briefly (in one sentence) explain why you think the two variables may be associated with each other.

Summary

Association does not imply causation!
In observational studies, confounding variables almost always exist, so causation cannot be established.
Randomized experiments involve randomly assigning the explanatory variable.
Randomized experiments prevent confounding variables, so causation can be inferred.
A control or comparison group is necessary.
The placebo effect exists, so a placebo and blinding should be used.

http://xkcd.com/552/