Correlation, Causation and Counterfactual Theory

Using causal graphs to understand bias in the medical literature
About these slides
 This presentation was created for the Boston Less Wrong Meetup by Anders
Huitfeldt (Anders_H)
 I have tried to optimize for intuitive understanding, not for technical
precision or mathematical rigor
 The slides are inspired by courses taught by Miguel Hernan and Jamie
Robins at the Harvard School of Public Health
Directed Acyclic Graphs
 This is a Directed Acyclic Graph:
 The nodes (letters) are variables
 The graph is “Directed” because the arrows have direction
 It is “Acyclic” because, if you move in the direction of the arrows, you can never
get back to where you began
 We use these graphs to represent the assumptions we are making about the
relationships between the individual variables on the graph
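The "Acyclic" condition can be checked mechanically. The following is a minimal sketch, assuming the graph is stored as a dictionary that maps each variable to the set of variables with an arrow into it; the variable names and both example graphs are hypothetical, since the slide's figure is not reproduced here.

```python
from graphlib import TopologicalSorter, CycleError

def is_acyclic(parents):
    """Return True if no variable can be reached from itself by following arrows.

    `parents` maps each variable to the set of variables pointing into it.
    """
    try:
        # static_order() raises CycleError if the arrows form a loop
        tuple(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False

# Hypothetical graphs: A -> B, A -> C, B -> C is a DAG ...
dag = {"A": set(), "B": {"A"}, "C": {"A", "B"}}
# ... while A -> B -> C -> A lets you get back to where you began
loop = {"A": {"C"}, "B": {"A"}, "C": {"B"}}

print(is_acyclic(dag))   # True
print(is_acyclic(loop))  # False
```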
Directed Acyclic Graphs
 We can use DAGs to reason about which statements about independence
are logical consequences of other statements about independence
 The rules for this type of reasoning are called “D-Separation” (Pearl, 1987)
 It is possible to do the same thing using algebra, but D-Separation saves a
lot of time and energy
Directed Acyclic Graphs
 This DAG is complete
 There is a direct arrow between all variables on the graph
 This means we are not making any assumptions about independence
between any two variables
Directed Acyclic Graphs
 On this DAG, there are missing arrows
 Each missing arrow corresponds to assumptions about independence
 Specifically, when arrows are missing, we assume that every variable is independent of
the past (all earlier variables that are not its parents), given its parents
 Other independencies may follow automatically
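One standard way to write this assumption as a formula is the Markov factorization of the joint distribution. The notation below is not from the slides: pa(v_i) is assumed to mean the values of the parents of variable V_i on the graph.

```latex
P(v_1, \dots, v_n) = \prod_{i=1}^{n} P\bigl(v_i \mid \mathrm{pa}(v_i)\bigr)
```

For example, on a hypothetical three-variable graph A → B → C with the arrow from A to C missing, this gives P(a, b, c) = P(a) P(b | a) P(c | b). The complete graph would instead need P(c | a, b) as its last factor, so the missing arrow encodes the assumption that C is independent of A given B.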
Directed Acyclic Graphs
 There is a «path» between two variables if you can get from one variable to
the other by following arrows. The direction of the arrows does not matter
for determining if a path exists (but does matter for whether it is open).
 We can tell whether two variables are independent by checking whether
there is an open path between them
Colliders and Non-Colliders

A path from A to C via B could be of four different types:

ABC

ABC

ABC

A B  C

The last of these types is different from the others: We call B a “Collider” on this path

Notice how the arrows from A and C “Collide” in B

On the three other types of paths, B is a “Non-Collider”

Note that the concept of “Collider” is path-dependent: B could be a collider on one path, and a non-collider on another
path
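The path-dependence of colliders can be illustrated with a small sketch. It assumes arrows are stored as (from, to) pairs; the function name and the example graph with an extra variable D are hypothetical.

```python
def is_collider(edges, left, middle, right):
    """True if both neighbours on the path point into `middle`."""
    return (left, middle) in edges and (right, middle) in edges

# Hypothetical graph: A -> B <- C, and B -> D
edges = {("A", "B"), ("C", "B"), ("B", "D")}

print(is_collider(edges, "A", "B", "C"))  # True: on the path A - B - C, the arrows collide in B
print(is_collider(edges, "A", "B", "D"))  # False: on the path A - B - D, B is a non-collider
```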
Conditioning
 If we look at the data within levels of a covariate, that covariate has been
“Conditioned on”
 We represent that by drawing a box around the variable on the DAG
The Rules of D-Separation
 If variables have not been conditioned on:
 Non-Colliders are open
 Colliders are closed (unless a downstream consequence is conditioned on)
 If variables have been conditioned on:
 Non-colliders that have been conditioned on are closed
 Colliders that have been conditioned on are open
 Colliders that have downstream consequences that have been conditioned on,
are open
The Rules of D-Separation
 A path from A to B is open if all variables between A and B are open on
that path
 Two variables are d-separated (independent) if all paths between them are
closed
 Two variables A and B are d-separated conditional on a third variable, if
conditioning on the third variable closes all paths between A and B
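The rules on the last two slides can be sketched as a short program. This is an illustration under stated assumptions, not a reference implementation: arrows are stored as (from, to) pairs, all function names are hypothetical, and every path is enumerated by brute force, which is only practical for small graphs.

```python
def descendants(edges, node):
    """All variables reachable from `node` by following arrows forward."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found

def simple_paths(edges, start, end):
    """Every path from start to end, ignoring arrow direction, with no repeated variable."""
    neighbours = {}
    for parent, child in edges:
        neighbours.setdefault(parent, set()).add(child)
        neighbours.setdefault(child, set()).add(parent)

    def walk(path):
        if path[-1] == end:
            yield path
            return
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in path:
                yield from walk(path + [nxt])

    yield from walk([start])

def path_is_open(edges, path, conditioned):
    """A path is open if every variable between its endpoints is open on that path."""
    for left, mid, right in zip(path, path[1:], path[2:]):
        collider = (left, mid) in edges and (right, mid) in edges
        if collider:
            # Colliders are closed unless they, or a downstream consequence, are conditioned on
            if mid not in conditioned and not (descendants(edges, mid) & conditioned):
                return False
        elif mid in conditioned:
            # Non-colliders that have been conditioned on are closed
            return False
    return True

def d_separated(edges, a, b, conditioned=()):
    """True if all paths between a and b are closed, given the conditioned variables."""
    conditioned = set(conditioned)
    return not any(path_is_open(edges, p, conditioned) for p in simple_paths(edges, a, b))

# Hypothetical graphs
collider_graph = {("A", "B"), ("C", "B"), ("B", "D")}   # A -> B <- C, B -> D
chain_graph = {("A", "B"), ("B", "C")}                  # A -> B -> C

print(d_separated(collider_graph, "A", "C"))         # True: the collider B is closed
print(d_separated(collider_graph, "A", "C", {"B"}))  # False: conditioning on B opens it
print(d_separated(collider_graph, "A", "C", {"D"}))  # False: D is downstream of B
print(d_separated(chain_graph, "A", "C"))            # False: the non-collider B is open
print(d_separated(chain_graph, "A", "C", {"B"}))     # True: conditioning on B closes it
```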
Causal DAGs
 We can give the DAG a causal interpretation if we are willing to assume
 That the variables are in temporal (Causal) order
 And that whenever two variables on our graph share a common ancestor, that
ancestor is also shown on the graph
 If we have a Causal DAG, we can use it as a map of the data generating
mechanism:
 Causal DAGs can be interpreted to say that if we change the value of A,
that «change» will propagate to other variables in the direction of the
arrows
Causal DAGs
 All scientific papers make assumptions about the data generating
mechanism. Causal DAGs are simply a very good way of being explicit
about those assumptions. We use them because:
 They assure us that our assumptions correspond to a plausible, logically
consistent causal process
 They make it easy to verify that our analysis matches the assumptions
 They give us very precise definitions of different types of bias
 They make it much easier to think about complicated data generating
mechanisms
Causal DAGs
 Note that we can never know what the data generating mechanism
actually looks like
 The best we can do is make arguments that our map fits the territory
 Sometimes it is very obvious that the map does not match the territory.
Causal Inference
 A pathway is causal if every arrow on the path is in the forward direction
 If I intervene to change the value of A, this will lead to changes in all
variables that are downstream from A
 The goal of causal inference is to predict how much the outcome Y will
change if you change A
 In other words, we are quantifying the magnitude of the combination of all
forward-going pathways from A to Y
 If we have data from the observed world, and we know that the only open
pathway from exposure to the outcome is in the forward direction, then the
observed correlation is purely due to causation
Bias
 However, if there exists any open pathway between the exposure and the
outcome where one or more of the arrows is in the backward direction,
there is bias
 Open pathways that have arrows in the backward direction will lead to
correlation in the observed data
 But that correlation will not be reproduced in an experiment where you
intervened to change the exposure
 The two main types of bias are confounding and selection bias
 Confounding is a question of who gets exposed
 Selection bias is a question of who gets to be part of the study
Confounding
 Confounding bias occurs when there is a common cause of the exposure
and the outcome
 You can check for it using the “Backdoor Criterion”
 If there exists an open path between A and Y that goes into A (as opposed
to out from A), we call that a “Backdoor path”
 A backdoor path between A and Y will always have an arrow in the
backwards direction
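A minimal sketch of how candidate backdoor paths could be listed. It assumes arrows are stored as (from, to) pairs and uses a hypothetical graph with a single common cause L of A and Y. The sketch only checks whether a path starts with an arrow into A; whether that path is open still has to be judged with the rules of D-Separation.

```python
def undirected_paths(edges, start, end):
    """Every path from start to end, ignoring arrow direction, with no repeated variable."""
    neighbours = {}
    for parent, child in edges:
        neighbours.setdefault(parent, set()).add(child)
        neighbours.setdefault(child, set()).add(parent)

    def walk(path):
        if path[-1] == end:
            yield path
            return
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in path:
                yield from walk(path + [nxt])

    yield from walk([start])

def backdoor_candidates(edges, exposure, outcome):
    """Paths whose first arrow points into the exposure rather than out of it."""
    return [p for p in undirected_paths(edges, exposure, outcome)
            if (p[1], exposure) in edges]

# Hypothetical graph: L -> A, L -> Y, A -> Y
edges = {("L", "A"), ("L", "Y"), ("A", "Y")}

print(list(undirected_paths(edges, "A", "Y")))  # [['A', 'L', 'Y'], ['A', 'Y']]
print(backdoor_candidates(edges, "A", "Y"))     # [['A', 'L', 'Y']]
```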
Example of a DAG with Confounding
Confounding
 Notice that if we had randomized people to be smokers or non-smokers,
the arrow from Sex to Smoking could not exist
 We would know it didn’t exist, because the only cause of smoking is our
random number generator
 Therefore, there could be no confounding
 The best way to abolish confounding is to randomize exposure. However,
this is expensive, and is usually not feasible
Controlling for Confounding
 There are many ways to control for confounding if the data is observational
instead of experimental
 Standard analyses (stratification, regression, matching) are based on looking
at the effect within levels of a confounder
 If we do this, we put a box around the confounder on the DAG
 This closes the backdoor path
 If we condition on all the confounders, the only open pathways will be in
the forward direction, and all remaining correlation between the exposure
and the outcome is due to causation
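The effect of putting a box around a confounder can be seen in a small simulation. This is a sketch on made-up data: the probabilities are invented for illustration, with one binary confounder L, exposure A and outcome Y, and a true risk difference of 0.10 built into the data generating mechanism.

```python
import random

def simulate(n, seed=0):
    """Hypothetical data generating mechanism: L -> A, L -> Y, A -> Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l = int(rng.random() < 0.5)
        a = int(rng.random() < 0.2 + 0.5 * l)            # L makes exposure more likely
        y = int(rng.random() < 0.1 + 0.1 * a + 0.3 * l)  # true effect of A is +0.10
        rows.append((l, a, y))
    return rows

def risk(rows, a, l=None):
    """Observed risk of Y among people with exposure a, optionally within one level of L."""
    selected = [y for (li, ai, y) in rows if ai == a and (l is None or li == l)]
    return sum(selected) / len(selected)

rows = simulate(100_000)

# Crude comparison: the backdoor path through L is open
crude = risk(rows, 1) - risk(rows, 0)

# Conditioning on L: compare within each level of the confounder
within = {l: risk(rows, 1, l) - risk(rows, 0, l) for l in (0, 1)}

# Average the stratum-specific differences over the distribution of L
p_l1 = sum(l for l, _, _ in rows) / len(rows)
standardized = (1 - p_l1) * within[0] + p_l1 * within[1]

print(f"crude: {crude:.3f}")                # population value under this model is about 0.25
print(f"within L=0: {within[0]:.3f}")       # population value is 0.10
print(f"within L=1: {within[1]:.3f}")       # population value is 0.10
print(f"standardized: {standardized:.3f}")  # population value is 0.10
```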
Controlling for Confounding
 An alternative way to control for confounding, is to simulate a world where
there are no arrows into treatment
 We do this by weighting all observations by the inverse probability of
treatment, given the confounders.
 We can represent this on the DAG by abolishing the arrows from the
confounders to the treatment (in contrast to drawing a box around the
confounder)
 In this simulated world we can run any analysis we want without
considering confounding
 There are situations where this type of analysis is valid, whereas all
conditioning-based methods such as regression are biased.
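A minimal sketch of inverse probability weighting on made-up data. The probabilities are invented for illustration, with one binary confounder L and a true risk difference of 0.10, and the probability of treatment is estimated by simple counting within levels of L. A real analysis would usually fit a model for the probability of treatment instead.

```python
import random

def simulate(n, seed=0):
    """Hypothetical data generating mechanism: L -> A, L -> Y, A -> Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l = int(rng.random() < 0.5)
        a = int(rng.random() < 0.2 + 0.5 * l)
        y = int(rng.random() < 0.1 + 0.1 * a + 0.3 * l)  # true effect of A is +0.10
        rows.append((l, a, y))
    return rows

def ipw_risk_difference(rows):
    """Weight each observation by 1 / P(treatment actually received | L)."""
    treated = {0: 0, 1: 0}
    total = {0: 0, 1: 0}
    for l, a, _ in rows:
        total[l] += 1
        treated[l] += a
    propensity = {l: treated[l] / total[l] for l in (0, 1)}  # estimated P(A=1 | L=l)

    weighted_events = {0: 0.0, 1: 0.0}
    weight_total = {0: 0.0, 1: 0.0}
    for l, a, y in rows:
        p_received = propensity[l] if a == 1 else 1 - propensity[l]
        weight = 1 / p_received
        weighted_events[a] += weight * y
        weight_total[a] += weight

    # In the weighted data, L no longer predicts A, so a crude comparison is enough
    return weighted_events[1] / weight_total[1] - weighted_events[0] / weight_total[0]

rows = simulate(100_000)
estimate = ipw_risk_difference(rows)
print(f"IPW risk difference: {estimate:.3f}")  # population value under this model is 0.10
```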
Controlling for Confounding
 Before you choose to control for a variable, make sure it actually is a
confounder
 If you control for something that is not a confounder, you can introduce
bias
 For example, this can happen if you control for a causal intermediate
Controlling for Confounding
 Make sure you never control for anything that is causally downstream from
the exposure
 For example, in this situation, the investigators want to find the effect of
eating at McDonalds on the risk of Heart Attacks. They have controlled for
BMI
 This introduces bias by blocking part of the effect we are interested in
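The blocking can be illustrated with a simulation. This is a sketch under strong assumptions: the slide's figure is not reproduced here, so the code assumes the simplest graph in which all of the effect goes through BMI (McDonalds → BMI → Heart Attack), exposure is assigned at random, and all probabilities are invented.

```python
import random

def simulate(n, seed=0):
    """Hypothetical mechanism: McDonalds -> high BMI -> heart attack, no other arrows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        mcd = int(rng.random() < 0.5)               # exposure, assigned at random here
        bmi = int(rng.random() < 0.2 + 0.4 * mcd)   # exposure raises the chance of high BMI
        mi = int(rng.random() < 0.05 + 0.2 * bmi)   # high BMI raises the risk of heart attack
        rows.append((mcd, bmi, mi))
    return rows

def risk(rows, mcd, bmi=None):
    """Observed risk of heart attack by exposure, optionally within one level of BMI."""
    selected = [mi for (m, b, mi) in rows if m == mcd and (bmi is None or b == bmi)]
    return sum(selected) / len(selected)

rows = simulate(100_000)

total_effect = risk(rows, 1) - risk(rows, 0)
within_bmi = {b: risk(rows, 1, b) - risk(rows, 0, b) for b in (0, 1)}

print(f"not controlling for BMI: {total_effect:.3f}")  # population value is 0.08
print(f"within normal BMI: {within_bmi[0]:.3f}")       # population value is 0.00
print(f"within high BMI: {within_bmi[1]:.3f}")         # population value is 0.00
```

In this made-up example the whole effect disappears once BMI is controlled for. If some of the effect did not go through BMI, only part of it would be blocked.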
M-Bias
 Just because a variable is pre-treatment and correlated with the outcome
does not make it safe to control for
 In fact, sometimes controlling for a pre-treatment variable introduces bias.
M-Bias
 Consider the following DAG:
 You want to estimate the effect of smoking on cancer
 Should you control for Coffee Drinking or not?
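One way to explore the question is a simulation. The slide's figure is not reproduced here, so this sketch assumes the usual M-shaped structure: two unmeasured variables U1 and U2, with U1 → Smoking, U1 → Coffee, U2 → Coffee and U2 → Cancer. It also assumes, purely so that any association is visibly bias, that smoking has no effect on cancer in the simulated data. All probabilities are invented.

```python
import random

def simulate(n, seed=0):
    """Hypothetical M-structure: Smoking <- U1 -> Coffee <- U2 -> Cancer."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u1 = int(rng.random() < 0.5)
        u2 = int(rng.random() < 0.5)
        smoking = int(rng.random() < 0.2 + 0.6 * u1)
        coffee = int(rng.random() < 0.1 + 0.4 * u1 + 0.4 * u2)  # collider on the path
        cancer = int(rng.random() < 0.1 + 0.6 * u2)             # no arrow from smoking
        rows.append((smoking, coffee, cancer))
    return rows

def risk(rows, smoking, coffee=None):
    """Observed cancer risk by smoking status, optionally within one level of coffee."""
    selected = [c for (s, k, c) in rows if s == smoking and (coffee is None or k == coffee)]
    return sum(selected) / len(selected)

rows = simulate(200_000)

crude = risk(rows, 1) - risk(rows, 0)
within_coffee = {k: risk(rows, 1, k) - risk(rows, 0, k) for k in (0, 1)}

print(f"not controlling for coffee: {crude:.3f}")    # population value is 0.00
print(f"non-coffee drinkers: {within_coffee[0]:.3f}")  # about -0.06 under this model
print(f"coffee drinkers: {within_coffee[1]:.3f}")      # about -0.06 under this model
```

Under these assumptions, Coffee is a collider on the only non-causal path, so that path is closed until Coffee is controlled for. Controlling for it opens the path and creates an association that is not causal.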
Selection Bias
 Selection bias occurs when the investigators have accidentally
conditioned on a collider
Selection Bias
 Imagine you are interested in the effect of Socioeconomic Status on
Cancer
 Since it is easier to get an exact diagnosis at autopsy, you decided to only
enroll people who had an autopsy in your study
 This means you are looking at the effect within a single level of autopsy:
“Autopsy = 1”
 The variable has been conditioned on
 People of low socioeconomic status are less likely to have an autopsy
 People with cancer are also less likely to have an autopsy.
Selection Bias
• There is now an open pathway from Socioeconomic Status to Cancer with a backward arrow:
• Socioeconomic Status → Autopsy ← Cancer
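A simulation sketch of this example. The probabilities are invented, and the data are generated so that socioeconomic status has no effect on cancer at all. Restricting the study to people with an autopsy still produces an association.

```python
import random

def simulate(n, seed=0):
    """Hypothetical mechanism: low SES -> Autopsy <- Cancer, and no arrow from SES to Cancer."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        low_ses = int(rng.random() < 0.5)
        cancer = int(rng.random() < 0.3)  # independent of SES by construction
        # Both low SES and cancer make an autopsy less likely
        autopsy = int(rng.random() < 0.8 - 0.3 * low_ses - 0.3 * cancer)
        rows.append((low_ses, cancer, autopsy))
    return rows

def cancer_risk(rows, low_ses, autopsied_only=False):
    """Observed cancer risk by SES, in everyone or only among those with an autopsy."""
    selected = [c for (s, c, a) in rows
                if s == low_ses and (a == 1 or not autopsied_only)]
    return sum(selected) / len(selected)

rows = simulate(200_000)

everyone = cancer_risk(rows, 1) - cancer_risk(rows, 0)
autopsied = cancer_risk(rows, 1, True) - cancer_risk(rows, 0, True)

print(f"whole population: {everyone:.3f}")  # population value is 0.00
print(f"Autopsy = 1 only: {autopsied:.3f}")  # about -0.065 under this model
```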
Evaluating a Scientific Paper
 If you are given a paper, and you want to know if the claims are likely to be
true:
1. First, make sure they are addressing a well-defined causal question
2. Look at the analysis section and determine what map the authors have of the
data generating mechanism
3. Ask yourself whether you think the implied map captures the important features
of the territory
 Is there confounding that has not been accounted for? Did the authors accidentally
condition on any variables to cause selection bias?
Evaluating a Scientific Paper
 Example:
 Prof Yudkowsky wants to estimate the effect of reading HPMOR, on the
probability of defeating dark lords
 He controls for sex
Evaluating a Scientific Paper
1. Draw the DAG that Prof Yudkowsky had in mind when he conducted this
analysis
2. Do you think this DAG captures the most important aspects of the
territory?
Time-Dependent Confounding
 In many situations, exposure varies with time
 We can picture this as having an exposure variable for every time point,
labelled A0, A1, A2 etc
 There may also be time-dependent confounding by L0, L1 and L2
Time-Dependent Confounding
 On this graph, L1 confounds the effect of A1 on Y
 However, it is also on the causal pathway from A0 to Y
 Do you control for it or not?
Time-Dependent Confounding
 In this situation, all stratification-based approaches, such as regression or
matching, are biased.
 This is because these methods put a box around the variable L1, blocking
part of the effect we are studying
 Methods for controlling for confounding that do not rely on conditioning on L1
are still valid
 This includes inverse probability weighting (marginal structural models), the
parametric g-formula and G-Estimation
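The contrast between the two kinds of methods can be sketched in a simulation. The slide's figure is not reproduced here, so this assumes a simple graph with A0 → L1, L1 → A1, L1 → Y and A1 → Y, binary variables, and invented probabilities. L1 stands for poor health and Y for death. The comparison of interest is "treated at both time points" against "never treated", and the g-formula is computed by plain counting rather than with a parametric model.

```python
import random

def simulate(n, seed=0):
    """Hypothetical mechanism: A0 -> L1 -> A1, L1 -> Y, A1 -> Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a0 = int(rng.random() < 0.5)
        l1 = int(rng.random() < 0.6 - 0.3 * a0)             # early treatment improves health
        a1 = int(rng.random() < 0.3 + 0.4 * l1)             # poor health prompts treatment
        y = int(rng.random() < 0.1 + 0.3 * l1 - 0.05 * a1)  # risk of death
        rows.append((a0, l1, a1, y))
    return rows

def mean_y(rows, a0, a1, l1=None):
    """Observed risk of Y for a treatment history, optionally within one level of L1."""
    ys = [y for (r0, rl, r1, y) in rows
          if r0 == a0 and r1 == a1 and (l1 is None or rl == l1)]
    return sum(ys) / len(ys)

def g_formula(rows, a0, a1):
    """Sum over l of E[Y | A0=a0, L1=l, A1=a1] * P(L1=l | A0=a0)."""
    l1_given_a0 = [rl for (r0, rl, _, _) in rows if r0 == a0]
    p_l1 = sum(l1_given_a0) / len(l1_given_a0)
    return (1 - p_l1) * mean_y(rows, a0, a1, 0) + p_l1 * mean_y(rows, a0, a1, 1)

rows = simulate(200_000)

# True effect of (1, 1) versus (0, 0) under this model: 0.14 - 0.28 = -0.14
crude = mean_y(rows, 1, 1) - mean_y(rows, 0, 0)
stratified = {l: mean_y(rows, 1, 1, l) - mean_y(rows, 0, 0, l) for l in (0, 1)}
g_estimate = g_formula(rows, 1, 1) - g_formula(rows, 0, 0)

print(f"crude: {crude:.3f}")                   # about -0.02: confounded by L1
print(f"within L1=0: {stratified[0]:.3f}")     # about -0.05: effect of A0 through L1 is blocked
print(f"within L1=1: {stratified[1]:.3f}")     # about -0.05
print(f"g-formula: {g_estimate:.3f}")          # about -0.14
```

The g-formula weights the L1-specific risks by the distribution of L1 under each value of A0, so it adjusts for L1 as a confounder of A1 without blocking the effect of A0 that goes through L1.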
Time-Dependent Confounding
 Time-dependent confounding is very common in real data-generating
mechanisms
 Consider the following scenario:
 If I don’t take my pills this year, my health is likely to decrease next year
 If my health has decreased next year, I am more likely to take my pills next year
 Health predicts my risk of death
 In this situation, it is impossible to obtain the effect of pills on mortality
without using generalized (non-stratification based) methods
 This is true whenever there is a feedback loop like the one described here
Time-Dependent Confounding
 There are many alternative “causal” models that do not recognize time-dependent confounding
 These models work fine if exposure is truly something that does not vary with
time
 However, that is very rarely the case
 If we are not trained to draw maps that recognize this important feature of
the territory, we will end up assuming that it does not exist
 This is often a bad assumption
Further Reading
 If you are a mathematician or computer scientist, and want a very formal
understanding of the theory:
 Judea Pearl. Causality: Models, Reasoning, and Inference. (Cambridge
University Press, 2000)
 If you are not a mathematician, but want to understand how to apply causal
methods to analyze observational data:
 Miguel Hernan and James Robins. Causal Inference. (Chapman & Hall/CRC,
2013)
 Most of the book is available for free at http://www.hsph.harvard.edu/miguelhernan/causal-inference-book/