How can we mitigate against noncausal associations in design and
analysis?
Epidemiology matters: a new introduction to methodological foundations
Chapter 10
Seven steps
1. Define the population of interest
2. Conceptualize and create measures of exposures and health indicators
3. Take a sample of the population
4. Estimate measures of association between exposures and health indicators of interest
5. Rigorously evaluate whether the association observed suggests a causal association
6. Assess the evidence for causes working together
7. Assess the extent to which the result matters, is externally valid, to other populations
Epidemiology Matters – Chapter 10
1. Randomization
2. Matching
3. Stratification
4. Sources of non-comparability
5. Summary
Comparability

Exposed and unexposed should be comparable on all
factors associated with the disease other than the
exposure

One way to ensure this comparability is to randomize the
exposure
Comparability
What is wrong with non-comparability? Consider an example:

Study: 5,000 smokers and 5,000 non-smokers are followed for
10 years

After 10 years, the smokers have 3.0 times the risk of motor
vehicle crash fatality compared with non-smokers

Are you comfortable reporting that smoking causes motor
vehicle crash fatality?
Individuals who choose to smoke are more likely to engage in
other behaviors with adverse consequences for health
Randomization

Creates comparability between groups

Removes individuals’ ability to choose their exposure status
Randomized controlled trial (RCT)
• Sample from population (purposive)
• Randomly assign individuals to be exposed or unexposed
• Follow population forward to determine who develops outcome
The goal of an RCT
We want our comparison groups to be
• “different” on just the main exposure that we are studying in relation to some outcome
AND
• the “same” on all the other important covariates
Why does randomization control for non-comparability? Example
• Two investigators conduct two separate studies
• Both explore the effects of regular cardiovascular exercise on the incidence of cardiovascular disease
• Population is post-menopausal women
• Hypothesis: exercise is protective against cardiovascular disease
Example, study 1
• Purposive sample of 80 post-menopausal women with no history of cardiovascular disease
• Asks women if they engage in ≥ 30 minutes of regular cardiovascular exercise ≥ 3 times/week (regular exercise compared to non-regular exercise)
• Follows groups for five years
• Counts women in each group who have a cardiovascular event
• Assume no losses to follow-up
Study 1
[Figure: Study 1 participants by exposure (exposed, non-exposed) and disease status (diseased, non-diseased)]
Study 1, interpretation
Those who exercise have approximately 0.5 times the risk of cardiovascular disease compared with those who do not exercise.
There are approximately 20 fewer cases of cardiovascular disease per 100 people who exercise compared with those who do not exercise.
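The two measures in this interpretation can be sketched in a few lines of Python. The counts below are an assumption: the slide's 2x2 table did not survive, so 8 cases among 40 exercisers and 16 cases among 40 non-exercisers are illustrative values chosen only because they are compatible with a risk ratio of about 0.5 and about 20 fewer cases per 100. The function names are likewise hypothetical.

```python
def risk(cases, total):
    """Proportion of a group that develops the outcome during follow-up."""
    return cases / total


def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk in the exposed divided by risk in the unexposed."""
    return risk(cases_exp, n_exp) / risk(cases_unexp, n_unexp)


def risk_difference(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk in the exposed minus risk in the unexposed."""
    return risk(cases_exp, n_exp) - risk(cases_unexp, n_unexp)


# ASSUMED counts (not taken from the slides): 8/40 exercisers and
# 16/40 non-exercisers have a cardiovascular event.
rr = risk_ratio(8, 40, 16, 40)
rd = risk_difference(8, 40, 16, 40)

print(f"risk ratio = {rr:.2f}")
print(f"risk difference = {rd:.2f} ({rd * 100:.0f} cases per 100 people)")
```

A risk ratio below 1 and a negative risk difference both point in the protective direction; neither says anything yet about whether the groups were comparable.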
Study 1, validity
• Women who choose to exercise regularly may be more likely to be non-smokers, eat a healthier diet, take multivitamins, etc.
• We do not know whether the exercise had any causal effect on their cardiovascular health
• In fact, the women who exercised had much lower average daily saturated fat intake than the non-exercisers
Impact of saturated fat intake
[Figure: Study 1 exercisers and non-exercisers; dotted figures mark high saturated fat intake]
9 dotted people (high fat consumers) among the 40 exercisers
Prevalence of high fat consumption among the exercisers = 9/40 = 22.5%
18 dotted people (high fat consumers) among the 40 non-exercisers
Prevalence of high fat consumption among the non-exercisers = 18/40 = 45%
There is a greater proportion of high fat consumers among the non-exercisers
Example, study 2

Purposive sample of 80 post-menopausal women with no
history of cardiovascular disease

Randomly assigns women to engage in ≥ 30 minutes of
regular cardiovascular exercise ≥ 3 times/week (regular
exercise compared to non-regular exercise)

Follows groups for five years

Counts women in each group who have a cardiovascular
event

Assume no losses to follow-up or noncompliance
Study 2
[Figure: Study 2 participants by exposure and disease status]
Study 2, interpretation
Risk of cardiovascular disease among
those randomized to exercise is 14.3%
less than the risk among those
randomized to not exercise.
We expect 10 fewer cases per 100 individuals exposed compared with the unexposed.
Study 1 vs Study 2
Study 1 risk ratio = 0.5 and risk difference = -0.2
Study 2 risk ratio = 0.86 and risk difference = -0.1
The estimated effect of exercise is therefore weaker in Study 2 than in Study 1.
Why?
Study 2, impact of saturated fat intake
[Figure: Study 2 exercisers and non-exercisers; dotted figures mark high saturated fat intake]
12 dotted people (high fat consumers) among the 40 exercisers
Prevalence of high fat consumption among the exercisers = 12/40 = 30%
12 dotted people (high fat consumers) among the 40 non-exercisers
Prevalence of high fat consumption among the non-exercisers = 12/40 = 30%
There is the same proportion of high fat consumers in both groups
Limitations to randomization
1. Equipoise and ethics
2. Compliance and intention-to-treat analysis
3. Placebos and placebo effects
4. Importance of blinding
Randomization, summary
• When randomization works, all factors that would differ between two groups who got to choose their exposure status are, on average, evenly distributed between the groups
• This includes all known risk factors for the outcome and myriad factors that are unknown or difficult to measure
• Because they are evenly distributed across the groups, these factors cannot bias the study estimates
• Randomized trials are a powerful way to achieve comparability between exposed and unexposed groups on both known and unknown factors that cause the outcome
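The "on average" in this summary can be illustrated with a small simulation. This is a sketch, not part of the original slides: it takes the 80 women of Study 2, of whom 24 are high fat consumers (12 + 12), and repeatedly reassigns them at random to 40 exercisers and 40 non-exercisers. The function name, the number of repetitions, and the seed are all assumptions made for illustration.

```python
import random


def mean_imbalance(n_trials=2000, seed=0):
    """Average difference in high-fat prevalence (exercise arm minus
    control arm) over repeated random assignments of the same 80 women."""
    rng = random.Random(seed)
    # 24 high fat consumers among 80 women, as in Study 2 (12 + 12).
    women = [True] * 24 + [False] * 56
    total = 0.0
    for _ in range(n_trials):
        rng.shuffle(women)                       # random assignment
        exercise, control = women[:40], women[40:]
        total += sum(exercise) / 40 - sum(control) / 40
    return total / n_trials


imbalance = mean_imbalance()
print(f"mean difference in prevalence across trials: {imbalance:+.4f}")
```

Averaged over many assignments the difference is close to zero, which is the sense in which randomization balances a factor. Any single trial of 80 people can still be imbalanced by chance.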
Matching
1. Why and how to match
2. Analyzing matched pair data
Matching
• Randomization is often unethical or infeasible
• Matching controls non-comparability where randomization is impossible
Matching
• Participants are matched on potential sources of non-comparability
• Matching is a common way to control for non-comparability in the design stage
• In a cohort study, exposed individuals are matched to ≥ 1 unexposed individuals on ≥ 1 factor(s) of interest
• In a case-control study, diseased individuals are matched to a sample of disease-free individuals
Matching, example
• Research question: Is low regular consumption of fish oil associated with development of depression?
• Sample
  • 25 individuals with a first diagnosis of depression recruited from a local mental health treatment center
  • 25 individuals with no history of depression from the community surrounding the mental health treatment center
Matching, example
• Concerned about sex as a potential source of non-comparability
  • Women are more likely to develop depression than men
  • Women on average have more nutritious diets and are more likely to supplement their diets with fish oil
• Other potential sources of non-comparability to worry about (though we are not necessarily matching on them) are age, alcohol and cigarette use, and socio-economic factors
Matching, example
Each time we select a case from the treatment center, we
select one or more controls of the same sex
Matching to control non-comparability
[Figure: participants shown by sex (male, female) and fish oil consumption (low, high)]
               Male   Female   Total
Low fish oil      9       18      27
High fish oil     7       16      23
Total            16       34      50
Matching pairs, sex
[Figure: matched pairs shown by sex (male, female) and fish oil consumption (low, high)]
Each pair is identical with respect to the matched
factors
Sample had 50 individuals
Sample now has 25 matched pairs
Analyzing matched pair data
Analyzing matched pairs, example
Interpretation: Individuals with low fish oil consumption are 2.0 times as likely to develop depression as individuals with high fish oil consumption, controlling for sex.
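A common way to obtain an estimate like this from 1:1 matched case-control pairs is the matched-pair odds ratio, which uses only the discordant pairs (pairs in which the case and the control differ on exposure). The slide showing the actual calculation did not survive extraction, so the sketch below is an assumption about the method, and the discordant-pair counts (6 and 3) are hypothetical values chosen only to give a ratio of 2.0.

```python
def matched_pair_odds_ratio(case_only_exposed, control_only_exposed):
    """Odds ratio for 1:1 matched pairs.

    case_only_exposed: pairs where only the case is exposed
    control_only_exposed: pairs where only the control is exposed
    Concordant pairs (both or neither exposed) carry no information here.
    """
    if control_only_exposed == 0:
        raise ValueError("no pairs in which only the control is exposed")
    return case_only_exposed / control_only_exposed


# HYPOTHETICAL discordant-pair counts among the 25 sex-matched pairs:
# in 6 pairs only the case has low fish oil consumption,
# in 3 pairs only the control does.
odds_ratio = matched_pair_odds_ratio(6, 3)
print(f"matched-pair odds ratio = {odds_ratio:.1f}")
```

Because each pair is identical on sex, the comparison inside every pair is already free of differences due to sex, which is what "controlling for sex" refers to.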
Control of non-comparability
Design stage
• Randomization
• Matching
Analysis stage
• Stratification
Stratification
1. Why and how to stratify
2. Interpreting stratified analyses
Control of non-comparability in the analysis stage
• Collect data on variables that might contribute to non-comparability
• Our ability to control for non-comparability in the analysis stage is only as good as the quality of the measures of the variables contributing to non-comparability
Control of non-comparability in the
analysis stage
Is a potential factor related to non-comparability
associated with the exposure and the outcome?
Stratification
Stratification removes the effects of a non-comparable variable on an exposure-outcome relation by limiting the variance on that variable
Stratification, example
Examine relation between alcohol consumption and esophageal cancer among two groups
Non-smokers
• Among individuals who have never smoked a cigarette in their lives, what is the relation between heavy alcohol consumption and esophageal cancer?
• Smoking cannot confound the effect estimate because no individual in this subgroup has engaged in any smoking
Smokers
• Among smokers (presumably with around the same duration and average amount of smoking), are those who are heavy alcohol consumers more likely to develop esophageal cancer?
• Smoking cannot confound the estimate because everyone is a smoker
Stratification example, non-smokers
Conditional probability of esophageal cancer among heavy alcohol consumers = 1/6 or 16.7%
Conditional probability of esophageal cancer among not heavy alcohol consumers = 1/16 or 6.3%
Risk ratio = (1/6) / (1/16) = 2.67
Risk difference = 16.7% – 6.3% = 10.4 percentage points
Interpretation: There is an increased risk of esophageal cancer among heavy alcohol consumers, even in the subpopulation of individuals who do not smoke.
Stratification example, smokers
Conditional probability of esophageal cancer among heavy alcohol consumers = 21/31 or 67.7%
Conditional probability of esophageal cancer among not heavy alcohol consumers = 7/27 or 25.9%
Risk ratio = 67.7 / 25.9 = 2.61
Risk difference = 67.7% – 25.9% = 41.8 percentage points
Interpretation: There is an increased risk of esophageal cancer among heavy alcohol consumers, even in the subpopulation of individuals who all smoke.
Stratification, example
• There is an increased risk of esophageal cancer among heavy alcohol consumers, even in the subpopulation of individuals who do not smoke
• There is an increased risk of esophageal cancer among heavy alcohol consumers, even in the subpopulation of individuals who all smoke
• Therefore, even when we limit variance on the possible source of non-comparability (i.e., smoking), there still remains an increased risk of esophageal cancer among heavy alcohol drinkers
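The stratified calculation above can be sketched in Python using the counts reported on the two stratum slides (1/6 and 1/16 among non-smokers, 21/31 and 7/27 among smokers). The pooled, or crude, risk ratio is not given in the slides; it is computed here only by adding the two strata together, as an illustration of what ignoring smoking would do. Names such as `strata` and `stratum_rr` are hypothetical.

```python
def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk in the exposed divided by risk in the unexposed."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


# (cancer cases, group size) as reported on the stratum slides.
strata = {
    "non-smokers": {"heavy": (1, 6), "not_heavy": (1, 16)},
    "smokers": {"heavy": (21, 31), "not_heavy": (7, 27)},
}

# Risk ratio within each smoking stratum.
stratum_rr = {
    name: risk_ratio(*groups["heavy"], *groups["not_heavy"])
    for name, groups in strata.items()
}

# Crude analysis: pool the strata, ignoring smoking status.
heavy_cases = sum(g["heavy"][0] for g in strata.values())
heavy_total = sum(g["heavy"][1] for g in strata.values())
other_cases = sum(g["not_heavy"][0] for g in strata.values())
other_total = sum(g["not_heavy"][1] for g in strata.values())
crude_rr = risk_ratio(heavy_cases, heavy_total, other_cases, other_total)

for name, rr in stratum_rr.items():
    print(f"{name}: risk ratio = {rr:.2f}")
print(f"crude (smoking ignored): risk ratio = {crude_rr:.2f}")
```

With these counts the crude risk ratio is larger than either stratum-specific one, because heavy drinkers are over-represented among smokers. Holding smoking fixed within each stratum removes that part of the association.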
Non-comparability through stratification
1. Careful and rigorous measurement of potential non-comparable variables is key to control for non-comparability in data analysis
2. Before stratification, always check that potential non-comparable variables are associated with exposure and outcome under study
3. If a variable is not associated with both exposure and outcome, then stratifying or otherwise controlling for that variable will not change the estimate of the effect of exposure on outcome
Non-comparability, another example
Example: cigarette smoking and depression
• Rate of depression is higher among cigarette smokers than among non-smokers
• Hypothesized that smoking can affect neurotransmitters in the brain that influence negative mood and emotion
• How could sex be a potential source of non-comparability in this association?
  • Men are more likely than women to be smokers
  • Men are less likely to experience depression compared with women
Smoking and depression example
• Population of interest is adults in general population
• Purposive sample of 80 individuals with no history of depression
• Assess smoking status at baseline
• Follow over 5 years to see how many develop depression
• Assume no individuals were lost to follow-up
[Figure: participants shown by sex (male, female) and smoking status (smoker, non-smoker)]
Smoking and depression example, interpretation
Over five years, smokers had 1.04 times the risk of developing depression compared with non-smokers, and 1.05 times the odds. There is approximately 1 excess case of depression among the smoking group per 100 persons over the course of 5 years (risk difference).
But what about sex?
Smoking and depression, sex association
Smoking and sex
73% of men are smokers
38.3% of women are smokers
Men are more likely than women to be smokers
Depression and sex
15% of men are depressed
53.2% of women are depressed
Men are less likely to have depression than women
Smoking and depression: stratified analysis, men
Among men, those who smoke have 1.5 times the risk of depression compared to those
who do not smoke, over 5 years.
Smoking and depression: stratified analysis, women
Among women, those who smoke have 1.49 times the risk of depression compared to those
who do not smoke, over 5 years.
Smoking and depression: stratified analysis, interpretation
• Smoking was not associated with depression in the original, crude analysis
• After stratifying by sex, smoking is associated with the development of depression
• Sex obscured the association between smoking and depression
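A short Python sketch shows how a crude risk ratio near 1 can coexist with risk ratios near 1.5 in both sexes. The cell counts are a reconstruction, not data taken from the slides: the original tables did not survive, so the counts below were chosen as one set of whole numbers (33 men, 47 women, 80 in total) that is compatible with the percentages and risk ratios reported above. Treat them as an assumption.

```python
def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk in the exposed divided by risk in the unexposed."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


# RECONSTRUCTED (depressed, group size) counts; an assumption, see lead-in.
counts = {
    "men": {"smokers": (4, 24), "non_smokers": (1, 9)},
    "women": {"smokers": (12, 18), "non_smokers": (13, 29)},
}

# Stratum-specific risk ratios, one per sex.
rr_by_sex = {
    sex: risk_ratio(*cells["smokers"], *cells["non_smokers"])
    for sex, cells in counts.items()
}

# Crude risk ratio: add men and women together, ignoring sex.
smoker_cases = sum(c["smokers"][0] for c in counts.values())
smoker_total = sum(c["smokers"][1] for c in counts.values())
non_smoker_cases = sum(c["non_smokers"][0] for c in counts.values())
non_smoker_total = sum(c["non_smokers"][1] for c in counts.values())
crude_rr = risk_ratio(smoker_cases, smoker_total,
                      non_smoker_cases, non_smoker_total)

print(f"crude risk ratio = {crude_rr:.2f}")
for sex, rr in rr_by_sex.items():
    print(f"{sex}: risk ratio = {rr:.2f}")
```

In this reconstruction most smokers are men, who have a low baseline risk of depression, and most non-smokers are women, who have a high one. Pooling the sexes therefore pulls the crude comparison toward 1 even though smoking raises risk within each sex.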
Is every variable that is associated with exposure and
outcome a potential source of non-comparability?
No
Sources of non-comparability
1. Factors on the causal pathway between exposure and outcome are not sources of non-comparability
2. Neither are factors that are consequences of both exposure and outcome
Factors in causal pathway
• Factors that are on the causal pathway of interest between the exposure and outcome do not contribute to non-comparability
• If we control for them, we will obstruct our ability to observe the true effects of the exposure on the outcome
• Factors on the causal pathway of interest should not be controlled
Factors in causal pathway, example
• Interested in prenatal exposure to tobacco smoke and offspring growth restriction during puberty
• Hypothesize that prenatal exposure to tobacco causes low birth weight, and that this low birth weight then causes growth restriction in puberty
• Should not control for birth weight
What if we do control for birth weight through stratification?
• Among offspring with low birth weight, we would find that exposure to tobacco smoke is unrelated to offspring growth restriction
  • We restricted analysis to only those with the intermediary outcome of interest (low birth weight)
• Among offspring with normal birth weight, we would not find an association between the exposure and outcome
  • We restricted analysis to only those without the intermediary outcome (low birth weight)
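The effect of stratifying on an intermediary can be sketched with expected risks rather than data. Every probability below is hypothetical and chosen only for illustration. The sketch encodes the slide's hypothesis that tobacco exposure affects growth restriction solely through low birth weight, so within a birth weight stratum the risk is the same whether or not the child was exposed.

```python
# HYPOTHETICAL probabilities, chosen only for illustration.
P_LOW_BIRTH_WEIGHT = {"exposed": 0.4, "unexposed": 0.1}
# Risk of growth restriction depends on birth weight only (by assumption).
P_RESTRICTION = {"low": 0.5, "normal": 0.1}


def crude_risk(group):
    """Overall risk of growth restriction, averaging over birth weight."""
    p_low = P_LOW_BIRTH_WEIGHT[group]
    return p_low * P_RESTRICTION["low"] + (1 - p_low) * P_RESTRICTION["normal"]


def stratum_risk(group, birth_weight):
    """Risk within one birth weight stratum. The group argument is ignored
    on purpose: exposure acts only through birth weight in this sketch."""
    return P_RESTRICTION[birth_weight]


crude_rr = crude_risk("exposed") / crude_risk("unexposed")
stratum_rr = {
    bw: stratum_risk("exposed", bw) / stratum_risk("unexposed", bw)
    for bw in ("low", "normal")
}

print(f"crude risk ratio = {crude_rr:.2f}")
for bw, rr in stratum_rr.items():
    print(f"{bw} birth weight: risk ratio = {rr:.2f}")
```

The crude risk ratio is above 1 while both stratum-specific risk ratios equal 1. Stratifying on the intermediary hides a real effect of exposure, which is why factors on the causal pathway should not be controlled.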
Seven steps
1. Define the population of interest
2. Conceptualize and create measures of exposures and health indicators
3. Take a sample of the population
4. Estimate measures of association between exposures and health indicators of interest
5. Rigorously evaluate whether the association observed suggests a causal association
6. Assess the evidence for causes working together
7. Assess the extent to which the result matters, is externally valid, to other populations
epidemiologymatters.org