Collecting Data

Collecting Data: Experiments and Observations

Association
Two variables are associated if values of one variable tend to be related
to values of the other variable.
What associations do you notice?
Causation
Two variables are causally associated if changing the value
of one variable influences the value of the other variable.
When deciding about potential causality between two
variables, we need to identify the explanatory variable and
the response variable.
Being female is positively associated with getting body piercings.
Does being female cause a person to get body piercings?
Exercising is negatively associated with smoking.
Does exercising cause a person to not smoke?
Does smoking cause a person to exercise less?
Height is positively associated with weight.
Does being taller cause a person to weigh more?
TVs and Life Expectancy
[Figure: scatterplot of Life Expectancy (40 to 80) against TVs per 1000 People (0 to 1000) for Japan, Australia, France, Canada, the United Kingdom, the United States, Russia, Mexico, Sri Lanka, China, Egypt, Vietnam, Morocco, Iraq, Pakistan, Yemen, Cambodia, Madagascar, Haiti, Uganda, South Africa and Angola; r = 0.74]
Should people buy more
TVs to live longer?

Eating Ice Cream Causes Polio
Association does not imply causation!
Confounding Variable
A third variable that is associated with both the explanatory
and the response variable is called a confounding variable.
Whenever confounding variables are present (or may be
present), a causal association cannot be determined.
[Diagram: a confounding variable is linked to both the explanatory variable and the response variable; the link between the explanatory and response variables is in question (?).
Hot Weather (confounding) → Eating Ice Cream (explanatory) ? Getting Polio (response)
Wealth (confounding) → Number of TVs per Capita (explanatory) ? Life Expectancy (response)]

Data Collection
There are two ways to collect data:
through an experiment or
through an observational study.
[Diagram: Population → Sample → Data]

Experiment vs. Observational Study
In an experiment, the researcher controls the value of the
explanatory variable (i.e., controls who gets the “treatment”).
In an observational study, the researcher does not control the
value of any variable, but simply observes the values as they
naturally exist.
Observational studies cannot be used to establish causation,
because there are almost always confounding variables that have not
been measured or accounted for in observational studies.
However, confounding variables can be avoided through
experiments by randomly assigning the values of the explanatory
variable.
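A minimal simulation can make the Wealth → TVs / Life Expectancy example concrete. It is a sketch under made-up assumptions: each simulated country's wealth is a number between 0 and 1, wealth raises both TVs per 1000 people and life expectancy, and TVs have no effect on life expectancy at all. The helper names (`pearson_r`, `simulate_countries`) and every coefficient are hypothetical, chosen only for illustration.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate_countries(n=10_000, seed=1):
    """Hypothetical model: wealth drives BOTH TVs and life expectancy.

    In this model TVs have no effect at all on life expectancy.
    All numbers are made up for illustration.
    """
    rng = random.Random(seed)
    wealth, tvs, life = [], [], []
    for _ in range(n):
        w = rng.random()                            # wealth on a 0-1 scale
        wealth.append(w)
        tvs.append(1000 * w + rng.gauss(0, 100))    # richer -> more TVs
        life.append(50 + 30 * w + rng.gauss(0, 3))  # richer -> longer lives
    return wealth, tvs, life

wealth, tvs, life = simulate_countries()

# Across all simulated countries, TVs and life expectancy look strongly associated.
r_all = pearson_r(tvs, life)

# Holding the confounder roughly fixed (similar wealth), most of the association disappears.
band = [i for i, w in enumerate(wealth) if 0.45 <= w <= 0.55]
r_band = pearson_r([tvs[i] for i in band], [life[i] for i in band])

print(f"r (all countries):  {r_all:.2f}")
print(f"r (similar wealth): {r_band:.2f}")
```

In this made-up model the overall r comes out at roughly 0.9 even though TVs do nothing, while among countries of similar wealth it is much closer to 0. Holding a confounder fixed like this only works for confounders that were actually measured, which is why random assignment is preferred when causation is the goal.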
Randomized Experiment
In a randomized experiment the explanatory variable for each unit
is determined randomly, before the response variable is measured.
The explanatory variable is also known as the treatment; with two
groups it is often coded as 0 (no treatment) or 1 (treatment).
Randomly divide the sample into two groups and assign one of
the groups to receive the treatment.
This ensures that the explanatory variable for each unit is
determined by random chance alone, and is not influenced by any
confounding variables.

Control Group
When determining whether a treatment is effective, it is
important to have a comparison group, known as the control
group.
It isn’t enough to know that everyone in one group improved;
we need to know whether they improved more than they
would have improved without the treatment.
All randomized experiments need either a control group or
two different treatments to compare.
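To see why the control group matters, here is a hypothetical simulation with made-up numbers: every participant improves by about 5 points on their own (practice, time, and so on), and the treatment adds about 2 more. Treatment is coded 1 and control 0, and exactly half of the units are randomly assigned to each group. The function name `run_experiment` and its parameters are illustrative assumptions, not part of the course material.

```python
import random

def run_experiment(n=1000, natural_gain=5.0, treatment_effect=2.0, seed=7):
    """Simulate a randomized experiment with a control group.

    Hypothetical setup: everyone improves by about `natural_gain` points
    without any treatment; the treatment adds `treatment_effect` more.
    Returns (mean improvement in treatment group, mean improvement in control group).
    """
    rng = random.Random(seed)
    # Randomly assign exactly half the units to treatment (1), the rest to control (0).
    treatment = [1] * (n // 2) + [0] * (n - n // 2)
    rng.shuffle(treatment)
    # Each unit's improvement = natural gain + treatment effect (if treated) + noise.
    improvement = [
        natural_gain + treatment_effect * t + rng.gauss(0, 3) for t in treatment
    ]
    treated = [imp for imp, t in zip(improvement, treatment) if t == 1]
    control = [imp for imp, t in zip(improvement, treatment) if t == 0]
    return sum(treated) / len(treated), sum(control) / len(control)

mean_treated, mean_control = run_experiment()
print(f"Average improvement, treatment group: {mean_treated:.1f}")
print(f"Average improvement, control group:   {mean_control:.1f}")
print(f"Estimated treatment effect:           {mean_treated - mean_control:.1f}")
```

Looking only at the treatment group (about 7 points of improvement here) would overstate what the treatment did; the difference from the control group (about 2 points) is the estimate of the treatment's own effect.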
Setting up Randomized Experiments
Start by gathering a random sample from the population.
Then randomly assign the value of the explanatory variable by…
Option 1: Putting all the names into a hat and randomly pulling
out names to go into the different treatment groups.
Option 2: Putting each name onto a card, shuffling the cards, and
dealing out the cards into as many piles as there are
treatment groups.
Option 3: Using technology

Caffeine and Academic Performance
Does consuming caffeine before an exam undermine your
performance on the exam? (n = 100)
What is the explanatory variable?
Consumed caffeine before the exam
What is the response variable?
Performance on the exam (i.e., your grade)
In an observational study, we simply gather the data from the
100 participants.
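Option 3 from the setup above can be as simple as doing Option 2 by computer. The sketch below uses Python's standard `random` module; the function name `deal_into_groups` and the participant IDs are hypothetical. It shuffles the 100 caffeine-study participants and deals them into a "caffeine" pile and a "no caffeine" pile.

```python
import random

def deal_into_groups(names, k, seed=None):
    """Option 2 done by computer: shuffle the 'cards', then deal them into k piles.

    Returns a list of k groups whose sizes differ by at most one.
    """
    rng = random.Random(seed)
    cards = list(names)          # copy, so the caller's list is not reordered
    rng.shuffle(cards)           # every ordering of the cards is equally likely
    return [cards[i::k] for i in range(k)]  # card number j goes to pile j % k

# Hypothetical participant IDs for the caffeine study (n = 100).
participants = [f"student_{i:03d}" for i in range(1, 101)]
caffeine, no_caffeine = deal_into_groups(participants, k=2, seed=2024)
print(len(caffeine), len(no_caffeine))  # prints: 50 50
```

The "names in a hat" version (Option 1) would be `rng.sample(participants, 50)`, which draws 50 names without replacement and leaves the other 50 as the second group. Passing a fixed `seed` makes the assignment reproducible for record-keeping; leaving it as `None` seeds the generator from the operating system.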
Caffeine and Academic Performance
If we find a relationship between caffeine consumption and
exam performance, why can’t we make the causal claim that
caffeine consumption undermines exam performance?
What are some confounding variables that could affect both
caffeine consumption and exam performance?
We could try to control for these factors by gathering data on
these variables.
However, there are so many factors that may influence both
caffeine consumption and exam performance that we could
not possibly account for all of them.

Caffeine and Academic Performance
Explanatory variable: Consumed caffeine before the exam
Response variable: Grade on the exam
In a randomized experiment, we randomly assign the value
of the explanatory variable for each participant.
To do this, we would randomly assign 50 of the 100 students to
consume caffeine before taking the exam, and we would
forbid the other 50 from consuming caffeine before the exam.
This ensures that the explanatory variable for each unit (i.e.,
caffeine consumption) is determined by random chance
alone, and is not influenced by any confounding variables.
Caffeine and Academic Performance
If we find a relationship between caffeine consumption and
exam performance, can we make the causal claim that
caffeine consumption undermines exam performance?
What about the multitude of confounding variables that could
affect exam performance?
Because people have been randomly assigned to be in the
“caffeine” group and the “no caffeine” group, the values of
the confounding variables will tend to be balanced between
the two groups.

Randomized Experiments
Because the explanatory variable is randomly assigned, it is
not systematically associated with any other variable, and thus
confounding variables are eliminated!!!
[Diagram: in a randomized experiment, the link from the confounding variable to the explanatory variable (Caffeine Consumption) is crossed out (X), leaving the link from Caffeine Consumption to the response variable (Exam Performance) as the question being tested (?).]
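The claim that random assignment balances confounders can be checked with a small hypothetical simulation. Assume, in a made-up model, that hours of sleep before the exam is a confounder: in the observational version, students who slept under 7 hours choose caffeine 80% of the time and everyone else 20% of the time; in the randomized version, a coin flip decides. The variable names and all numbers are illustrative assumptions.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

rng = random.Random(42)
n = 10_000
# Hypothetical confounder: hours of sleep the night before the exam.
sleep = [rng.gauss(7, 1.5) for _ in range(n)]

# Observational study: students choose for themselves. In this made-up model,
# students who slept less are more likely to reach for caffeine.
chose_caffeine = [rng.random() < (0.8 if s < 7 else 0.2) for s in sleep]

# Randomized experiment: a coin flip decides, ignoring sleep entirely.
assigned_caffeine = [rng.random() < 0.5 for _ in sleep]

def sleep_gap(groups):
    """Average sleep in the no-caffeine group minus average sleep in the caffeine group."""
    with_caffeine = [s for s, g in zip(sleep, groups) if g]
    without_caffeine = [s for s, g in zip(sleep, groups) if not g]
    return mean(without_caffeine) - mean(with_caffeine)

obs_gap = sleep_gap(chose_caffeine)
rand_gap = sleep_gap(assigned_caffeine)
print(f"Sleep gap, self-selected groups: {obs_gap:.2f} hours")
print(f"Sleep gap, randomized groups:    {rand_gap:.2f} hours")
```

In this model the self-selected caffeine group averages roughly 1.4 fewer hours of sleep, so any difference in exam scores could be due to sleep rather than caffeine; the randomized groups differ by only a tiny amount. The coin flip balances confounders nobody thought to measure in the same way, which is what lets a randomized experiment support a causal claim.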
Placebo and Blinding
Control groups should be given a placebo, a fake treatment
that resembles the active treatment as much as possible.
Using a placebo is only helpful if participants do not know
whether they are getting the placebo or the active treatment.
If possible, randomized experiments should be double-blind:
neither the participants nor the researchers involved should
know which treatment the participants are actually getting.

Placebo Effect
Often, people will experience the effect they think they
should be experiencing, even if they aren’t actually receiving
the treatment. This is known as the placebo effect.
One study estimated that 75% of the effectiveness of
antidepressant medication is due to the placebo effect.

The Strange Powers of Placebos

Controlling for Placebo Effects
Give the control group a placebo, so that every participant
thinks they are receiving the treatment.
When ethically acceptable, it is even better if the participants
don’t know the nature of the treatment they are
receiving (e.g., they are not told “We are giving you caffeine because we think
it will undermine your exam performance”).
Limitations of Randomized Experiments
Randomized experiments are ideal, but sometimes they are
not…
ethical
economically feasible
methodologically possible
Often, you have to do the best you can with data from
observational studies.

Randomization in Data Collection
Was the explanatory variable randomly assigned?
Yes: can make causal claims
No: can’t make causal claims
Was the sample randomly selected?
Yes: can generalize to the population
No: can’t generalize to the population
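The "Was the sample randomly selected?" question can be illustrated with a hypothetical simulation: estimating the average exam score of a made-up population of 10,000 students, once from a random sample of 200 and once from a non-random sample of 200. The scenario (only students scoring above 80 attend an optional review session, and we survey only attendees) and all numbers are assumptions for illustration.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

rng = random.Random(0)
# Hypothetical population: exam scores (clipped to 0-100) for 10,000 students.
population = [min(100.0, max(0.0, rng.gauss(70, 12))) for _ in range(10_000)]
true_mean = mean(population)

# Random sample: every student has the same chance of being selected.
random_sample = rng.sample(population, 200)

# Non-random sample (made-up scenario): we only survey students at an optional
# review session, and only students scoring above 80 attend.
attendees = [score for score in population if score > 80]
convenience_sample = rng.sample(attendees, 200)

print(f"Population mean:             {true_mean:.1f}")
print(f"Random-sample estimate:      {mean(random_sample):.1f}")
print(f"Non-random-sample estimate:  {mean(convenience_sample):.1f}")
```

The random-sample average lands close to the population average, while the non-random sample overstates it by well over 10 points in this model, and surveying more of the same kind of students would not fix that bias.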
Randomization
Taking a random sample and conducting a randomized
experiment is ideal, but rarely achievable.
If the focus of the study is to use a sample to estimate a
value (a parameter) for the entire population, you need a random sample,
but you do not need a randomized experiment.
If the focus of the study is to establish causality from one
variable to another, you need a randomized experiment and
you can settle for a non-random sample.

Assignment
Part I
Graded Problems
1.74, 1.76, and 1.88
Additional Practice Problems (not to be turned in):
1.75, 1.77, and 1.85
Part II
Go to http://sda.berkeley.edu/cgi-bin/hsda?harcsda+gss10 and find 5 different
variables that you think may be associated with one of the 10 variables you
selected for the previous assignment. For each of the 5 new variables, provide
the variable name and the question associated with it, provide the name of the
variable it might be associated with, and briefly (in one sentence) explain why
you think the two variables may be associated with each other.
Summary
Association does not imply causation!
In observational studies, confounding variables almost always exist, so causation cannot be established.
Randomized experiments involve randomly assigning the explanatory variable.
Randomized experiments prevent confounding variables, so causation can be inferred.
A control or comparison group is necessary.
The placebo effect exists, so a placebo and blinding should be used.
http://xkcd.com/552/