Novel Approaches to
Adjusting for Confounding:
Propensity Scores, Instrumental
Variables and MSMs
Matthew Fox
Advanced Epidemiology
What are the exposures you are interested in studying?
Assuming I could guarantee you that you would not create bias, which approach is better: randomization or adjustment for every known variable?
What is intention to treat analysis?
Yesterday

 Causal diagrams (DAGs)
 – Discussed rules of DAGs
 – Goes beyond statistical methods and forces us to use prior causal knowledge
 – Teaches us adjustment can CREATE bias
 – Helps identify a sufficient set of confounders
   – Not how to adjust for them
This week

 Beyond stratification and regression
 – New approaches to adjusting for (not “controlling”) confounding
 – Instrumental variables
 – Propensity scores (confounder scores)
 – Marginal structural models
    Time-dependent confounding
Given the problems with the odds ratio, why does everyone use it?
Non-collapsibility of the OR (Excel; SAS)
Odds ratio collapsibility but confounding
             C+              C-              Total
             E+      E-      E+      E-      E+      E-
Disease+     400     300     240     180     640     480
Disease-     100     200     360     720     460     920
Total        500     500     600     900     1100    1400
Risk         0.80    0.60    0.40    0.20    0.58    0.34
Odds         4.00    1.50    0.67    0.25    1.39    0.52

             C+      C-      Crude    Adj (MH)
RR           1.33    2.00    1.6968   1.5496
OR           2.67    2.67    2.67     2.67
RD           0.2     0.2     0.239    0.2
Solution: SAS Code
title "Crude relative risk model";
proc genmod data=odds descending;
model d = e/link=log dist=bin;
run;
title "Adjusted relative risk model";
proc genmod data=odds descending;
model d = e c/link=log dist=bin;
run;
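The GENMOD calls above (and in the risk-difference code below) assume a dataset named odds with one record per person and 0/1 variables d, e, and c. A minimal sketch of a data step that would build such a dataset from the cell counts in the table; this step is an assumption added here, not part of the original slides:

data odds;
input c e d count;
do i = 1 to count;  * expand each cell count into individual records;
output;
end;
drop i count;
datalines;
1 1 1 400
1 1 0 100
1 0 1 300
1 0 0 200
0 1 1 240
0 1 0 360
0 0 1 180
0 0 0 720
;
run;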
Results
Model crude:
Exp(0.5288) = 1.6968
Crude RR was 1.6968
Results
Model adjusted:
Exp(0.3794) = 1.461
MH RR was 1.55 (the model-adjusted and MH estimates are similar but need not match exactly)
STATA
 glm d e, family(binomial) link(log)
 glm d e c, family(binomial) link(log)
What about risk differences?
Solution: SAS Code
title "Crude risk differences model";
proc genmod data=odds descending;
model d = e/link=identity dist=bin;
run;
title "Adjusted risk differences model";
proc genmod data=odds descending;
model d = e c/link=identity dist=bin;
run;
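The Stata section below also fits a model with an exposure-covariate interaction; the corresponding GENMOD call would look roughly like this (an added sketch, not from the original slides):

title "Risk differences model with interaction";
proc genmod data=odds descending;
model d = e c e*c/link=identity dist=bin;
run;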
Results
Model crude:
Crude RD was 0.239 (0.23896)
Results
Model adjusted:
Adjusted RD = 0.20
MH RD = 0.20
STATA
 glm d e, family(binomial) link(identity)
 glm d e c, family(binomial) link(identity)
 glm d i.e##i.c, family(binomial) link(identity)   (factor-variable notation for main effects plus the e-c interaction)
Novel approaches to
controlling confounding
Limitations of Stratification and
Regression
 Stratification/regression work well with point exposures with complete follow-up and sufficient data to adjust
 – Limited data on confounders or small cells
 – No counterfactual for some people in our dataset
 – Regression often estimates parameters
 Time-dependent exposures and confounding
 – A common situation
 – With time dependence, DAGs get complex
Randomization and Counterfactuals
 Ideally, evidence comes from RCTs
 – Randomization gives the expectation that the unexposed can stand in for the counterfactual ideal
 – In expectation, assuming no other bias
 Full exchangeability: E(p1 = q1, p2 = q2, p3 = q3, p4 = q4)
 Pr(Y^(a=1) = 1) - Pr(Y^(a=0) = 1) = Pr(Y = 1 | A = 1) - Pr(Y = 1 | A = 0)
 Since we assign A, RR_AC = 1 (the confounders do not predict exposure)
 – If we can’t randomize, what can we do to approximate randomization?
How randomization works
[DAG: Randomized Controlled Trial. Randomization → A → D, with C1, C2, C3 → D]
Randomization strongly predicts exposure (ITT)
A typical observational study
[DAG: Observational Study. C1, C2, C3 → A and → D; A → D marked with a “?”]
A typical observational study
[DAG: Observational Study. C1, C2, C3 → A and → D; A → D]
Regression/stratification seeks to block the backdoor path from A to D by averaging A-D associations within levels of Cx
Approach 1:
Instrumental Variables
Intention to treat analysis
 In an RCT we assign the exposure
 – e.g. assign people to take an aspirin a day vs. not
 – But not all will take aspirin when told to, and others will take it even if told not to
 What to do with those who don’t “obey”?
 – The paradigm of intention to treat analysis says analyze subjects in the group they were assigned to
    Maintains the benefits of randomization
    Biases towards the null at worst
Instrumental variables
 An approach to dealing with confounding using a single variable
 – Works along the same lines as randomization
 A commonly used approach in economics, yet rarely used in medical research
 – Suggests we are either behind the times or they are hard to find
 – Partly privileged in economics because little adjustment data exists
Instrumental variables
 An instrument (I):
 – A variable that satisfies 3 conditions:
    Strongly associated with exposure
    Has no effect on outcome except through A (E)
    Shares no common causes with outcome
 Ignore the E-D relationship
 – Measure the association between I and D
    This is not confounded
    Approximates an ITT approach
Adjust the IV estimate
 Can optionally adjust the IV estimate to estimate the effect of A (exposure)
 – But differs from randomization
 If an instrument can be found, it has the advantage that we can adjust for unknown confounders
 – This is the benefit we get from randomization?
Intention to Treat (IV Ex 1)
 A (Exposure): Aspirin vs. placebo
 Outcome: First MI
 Instrument: Randomized assignment
[Diagram: Randomization → Therapy → MI, with Confounders → MI. Annotations: Condition 1: predictor of A? Condition 2: no direct effect on the outcome? Condition 3: no common causes with outcome?]
Regression (17 confounders), no effect
RD: -0.06/100; 95% CI -0.26 to 0.14
Confounding by indication (IV Ex 2)
 A (Exposure): COX-2 inhibitor vs. NSAID
 Outcome: GI complications
 Instrument: Physician’s previous prescription
 IV: protective effect of COX-2; RD: -1.31/100; -2.42 to -0.20
 – Compatible with trial results (RD: -0.65/100; -1.08 to -0.22)
[Diagram: Previous Px → COX2/NSAID → GI comp, with Indications → COX2/NSAID and → GI comp]
Unknown confounders (IV Ex 3)
 A (Exposure): Childhood dehydration
 Outcome: Adult high blood pressure
 Instrument: 1st year summer climate
 Hypothesized that the hottest/driest summers in infancy would be associated with severe infant diarrhea/dehydration, and consequently higher blood pressure in adulthood
 For 3,964 women born 1919–1940, a 1 SD (1.3 ºC) higher-than-mean summer temperature in the 1st year of life was associated with 1.12-mmHg (95% CI: 0.33, 1.91) higher adult systolic blood pressure, and a 1 SD higher-than-mean summer rainfall (33.9 mm) was associated with lower systolic blood pressure (-1.65 mmHg, 95% CI: -2.44, -0.85)
[Diagram: 1st year climate → dehydration → High BP, with SES as a common cause of dehydration and High BP]
Optionally we can adjust for “noncompliance”
 Optionally, if we want to estimate the A-D relationship, not I-D, we can adjust:
 – IV estimate of the A-D effect = RD_ID / RD_IE
 – Inflate the IV estimator to adjust for the lack of perfect correlation between I and E
 – If I perfectly predicts E then RD_IE = 1, so the adjustment does nothing
 Like a per protocol analysis
 – But adjusted for confounders
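A purely hypothetical illustration of this adjustment (numbers invented for the arithmetic, not taken from any study in these slides): if RD_ID = 1 per 100 and the instrument raises the probability of actually being exposed by 50 percentage points (RD_IE = 0.5), the IV estimate of the A-D effect is RD_ID / RD_IE = (1 per 100) / 0.5 = 2 per 100.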
Too good to be true?
 Maybe
 The assumptions needed for an instrument are untestable from the data
 – Can only determine whether I is associated with A
 Failure to meet the assumptions can cause strong bias
 – Particularly if we have a “weak” instrument
Approach 2:
Propensity Scores
Comes out of a world of large datasets (Health insurance data)
 Cases where we have a small (relative to the size of the dataset) exposed population and lots and lots of potential comparisons in the unexposed group
 – And lots of covariate data to adjust for
 Then we have the luxury of deciding whom to include in the study as a comparison group, based on a counterfactual definition
Propensity Score
 Model each subject’s propensity to receive the index condition as a function of confounders
 – Model is independent of outcomes, so good for rare disease, common exposure
 Use the propensity score to balance assignment to index or reference by:
 – Matching
 – Stratification
 – Modeling
Propensity Scores
 The propensity score for subject i is:
 – The probability of being assigned to treatment (A = 1) vs. reference (A = 0) given a vector x_i of observed covariates:
   Pr(A_i = 1 | X_i = x_i)
 In other words, the propensity score is:
 – The probability that the person got the exposure given anything else we know about them
 Why estimate the probability a subject receives a certain treatment when it is known what treatment they received?
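In practice this probability is estimated from a model, typically the logistic regression shown in the SAS code later in this section; written out (the b's are ordinary logistic regression coefficients, notation added here rather than taken from the slides):

pscore_i = Pr(A_i = 1 | X_i = x_i) = 1 / (1 + exp(-(b0 + b1*x_i1 + … + bk*x_ik)))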
How Propensity Scores Work
 Quasi-experiment
 – Using the probability a subject would have been treated (propensity score) to adjust the estimate of the treatment effect, we simulate an RCT
 2 subjects with equal propensity, one E+, one E-
 – We can think of these two as “randomly assigned” to groups, since they have the same probability of being treated, given their covariates
 – Assumes we have enough observed data that within levels of propensity E is truly random
Propensity Scores:
Smoking and Colon Cancer
 Have info on people’s covariates:
 – Alcohol use, sex, weight, age, exercise, etc.
 Person A is a smoker, B is not
 – Both had an 85% predicted probability of smoking
 If the “propensity” to smoke is the same, the only difference is 1 smoked and 1 didn’t
 – This is essentially what randomization does
 – B is the counterfactual for A, assuming a correct model for predicting smoking
Obtaining Propensity Scores in SAS
Calculate the propensity score:
proc logistic descending;
model exposure = cov_1 cov_2 … cov_n;
output out = pscoredat pred = pscore;  * pscore = predicted Pr(exposure = 1);
run;
Either match subjects on the propensity score or adjust for the propensity score:
proc logistic data = pscoredat descending;
model outcome = exposure pscore;
run;
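The earlier slide listed matching, stratification, and modeling as ways to use the score. A rough sketch of the stratification option (quintiles of the score; dataset and variable names follow the code above, but this block is added here, not from the slides):

proc rank data=pscoredat groups=5 out=pscoredat5;
var pscore;
ranks ps_quintile;  * 0-4 = quintile of the propensity score;
run;
proc logistic data=pscoredat5 descending;
class ps_quintile;
model outcome = exposure ps_quintile;  * exposure effect within quintiles of the score;
run;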
Pros and Cons of PS
 Pros
 – Adjustment for 1 confounder (the score) rather than many
 – Allows estimation of the exposure model and fitting of a final model without ever seeing the outcome
 – Lets us see the parts of the data we really should not be drawing conclusions on b/c there is no counterfactual
 Cons
 – Only works if we have good overlap in pscores
 – Does not fix the conditioning-on-a-collider problem
 – Doesn’t deal with unmeasured confounders
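Because the first “con” is lack of overlap, it is worth looking at the score distributions by exposure group before fitting anything; one quick way in SAS (again a sketch using the variable names above, not code from the slides):

proc means data=pscoredat min p5 q1 median q3 p95 max;
class exposure;
var pscore;  * compare the range of propensity scores in exposed vs. unexposed;
run;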
Study of effect of neighborhood segregation on IMR
Approach 3:
Marginal Structural Models
Time Dependent Confounding
 Time dependent confounding:
 1) A time dependent covariate that is a risk factor for, or predictive of, the outcome and also predicts subsequent exposure
 Problematic if also:
 2) Past exposure history predicts subsequent level of the covariate
Example
 Observational study of subjects infected with HIV
 – E = HAART therapy
 – D = All-cause mortality
 – C = CD4 count
Time Dependent Confounding
[Two DAGs with nodes A0, A1, C0, C1, D: 1) the time-dependent covariate (C0, C1) predicts subsequent exposure (A0, A1) and the outcome D; 2) past exposure (A0) predicts the subsequent level of the covariate (C1)]
Failure of Traditional Methods
 Want to estimate the causal effect of A on D
 – Can’t stratify on C (it’s an intermediate)
 – Can’t ignore C (it’s a confounder)
 Solution: rather than stratify, weight
 – Equivalent to standardization
 – Create a pseudo-population where RR_CE = 1
 – Weight each person by the “inverse probability of treatment” they actually received
 – Weighting doesn’t cause the problems pooling did
 – In the DAG, remove the arrow from C to A, don’t box C
Remember back to the SMR
             Crude           C1              C0
             E+      E-      E+      E-      E+      E-
D+           350     70      300     20      50      50
D-           1650    1130    1200    180     450     950
Total        2000    1200    1500    200     500     1000
Risk         0.18    0.06    0.2     0.1     0.1     0.05
RR           3.0             2.0             2.0

SMR-weighted RR, standardized to the exposed (weights W_i = N1_i, with a_i = exposed cases, b_i = unexposed cases, N1_i = exposed total, and N0_i = unexposed total in stratum i):

SMR = [Σ W_i (a_i / N1_i)] / [Σ W_i (b_i / N0_i)]
    = (300 + 50) / [(1500 * 20 / 200) + (500 * 50 / 1000)]
    = 350 / (150 + 25)
    = 2.0
The SMR asks, what if the exposed had also been unexposed?

Starting from the observed data above, give the E- groups the exposed group’s stratum sizes (1500 in C1, 500 in C0) and apply the observed unexposed risks (0.1 and 0.05):

             C1              C0              Collapsed
             E+      E-      E+      E-      E+      E-
D+           300     150     50      25      350     175
D-           1200    1350    450     475     1650    1825
Total        1500    1500    500     500     2000    2000
Risk         0.2     0.1     0.1     0.05    0.175   0.0875
RR           2.0             2.0             2.0

The crude now equals the adjusted. No need to adjust.
2.0
Could also ask, what if everyone was both exposed and unexposed?

Apply each stratum’s exposed and unexposed risks to everyone in the stratum (1700 people in C1, 1500 in C0):

             C1              C0              Collapsed
             E+      E-      E+      E-      E+      E-
D+           340     170     150     75      490     245
D-           1360    1530    1350    1425    2710    2955
Total        1700    1700    1500    1500    3200    3200
Risk         0.2     0.1     0.1     0.05    0.153   0.077
RR           2.0             2.0             2.0
What is Inverse Probability Weighting (IPW)?
 Weight each subject by the inverse probability of the treatment received
 Probability of treatment is:
 – p(receiving the treatment received | covariates)
 – Adjusts the # of E+ and E- subjects in the C strata
 Weighting breaks the E-C link only
 – Now Marginal (Crude) = Causal Effect
 But that’s what we just did
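Spelling this out in symbols (notation added here, restating the bullet above): each subject’s weight is IPTW = 1 / p(E = e | C = c), so an exposed subject gets weight 1 / p(E = 1 | C) and an unexposed subject gets weight 1 / p(E = 0 | C), exactly the weights calculated on the next slides.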
Calculate the weights

             Crude           C1              C0
             E+      E-      E+      E-      E+      E-
D+           350     70      300     20      50      50
D-           1650    1130    1200    180     450     950
Total        2000    1200    1500    200     500     1000
Risk         0.18    0.06    0.2     0.1     0.1     0.05
RR           3.0             2.0             2.0
PT                           0.88    0.12    0.33    0.67
IPTW                         1.13    8.50    3.00    1.50

Calculate p(receiving treatment received | C):
 For C=1, E=1: PT = 1500/1700 = 0.88; IPTW = 1/0.88 = 1.13
 For C=1, E=0: PT = 200/1700 = 0.12; IPTW = 1/0.12 = 8.50
 For C=0, E=1: PT = 500/1500 = 0.33; IPTW = 1/0.33 = 3.00
 For C=0, E=0: PT = 1000/1500 = 0.67; IPTW = 1/0.67 = 1.50
Multiply the cell numbers by the weights.
Apply the weights

Multiplying each stratum-specific cell count by its IPTW gives the pseudo population:

             C1              C0
             E+      E-      E+      E-
D+           340     170     150     75
D-           1360    1530    1350    1425
Total        1700    1700    1500    1500
Risk         0.2     0.1     0.1     0.05
RR           2.0             2.0
Collapse

Collapsing the pseudo population over C:

             Crude
             E+      E-
D+           490     245
D-           2710    2955
Total        3200    3200
Risk         0.153   0.077
RR           2.0

We broke the link between C and E without stratification, so there is no problem of conditioning on a collider.
Pseudo-population
 The “pseudo-population” breaks the link between the confounder and the exposure without stratification
 – Note this is different from stratifying
 – Creates a standard population without confounding
 By creating multiple copies of people, standard errors will be biased
 – Use robust standard errors to adjust
“The IPTW method effectively
simulates the data that would be
observed had, contrary to fact,
exposure been conditionally
randomized”
Robins and Hernán
Time Dependent Confounding
 Extend the method to time dependent confounders
 – Predict p(receiving treatment actually received at time t1 | covariates, treatment at t0)
 Probability of treatment at t1 is:
 – p(receiving treatment received at t0) * p(receiving treatment received at t1)
 See Hernán for SAS code, not hard
 – scwgt statement, robust SE (repeated statement)
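As a rough illustration of what that code looks like for the simpler point-exposure example worked above (the dataset and variable names mydata, id, e, d, and c are placeholders assumed here, not names from the slides):

* 1. Model the probability of exposure given the confounder;
proc logistic data=mydata descending;
model e = c;
output out=ps pred=p_e;  * p_e = predicted Pr(E = 1 | C);
run;
* 2. Build the inverse probability of treatment weight;
data ps;
set ps;
if e = 1 then iptw = 1 / p_e;
else iptw = 1 / (1 - p_e);
run;
* 3. Weighted log-binomial model with robust (empirical) SEs from the REPEATED statement;
proc genmod data=ps descending;
class id;
model d = e / dist=bin link=log;
scwgt iptw;
repeated subject=id / type=ind;
run;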
Time Dependent Confounding
[Two DAGs with nodes E0, E1, C0, C1, D: 1) before IPTW, the time-varying covariates (C0, C1) predict subsequent exposure (E0, E1) and the outcome D; 2) after IPTW, the arrows from the Cs into the Es have been removed]
Limitations of MSMs
 Very sensitive to the weights
 Still need to be able to predict the exposure
 – The method solves the structural problem, but we still need the data to be able to accurately predict exposure
 Still have to get the model right