Causal learning and modeling
David Danks
CMU Philosophy & Psychology

2014 NASSLLI
High-level overview

Monday:
• History of causal inference
• Basic representation of causal structures

Tuesday:
• Inference & reasoning using graphical models
• Interventions in causal structures
High-level overview

Wednesday:
• Basic principles of search & causal discovery

Thursday:
• Challenges to causal discovery, and responses
• Both principled and real-world
High-level overview

Friday: One of two possibilities:
• Singular / actual causation & counterfactuals (in the causal graphical model framework)
• Recent advances in causal learning & inference
• Decided by a vote at the end of class tomorrow (Tues)
Structure & assumptions

Mix of lecture & (group) problem-solving, so if you have questions/uncertainty, ask!
• If you’re confused, then someone else probably is too…

Assuming basic knowledge of probabilities
• Focus is on conceptual/foundational issues, not the technical details
• But ask if you want to know more about those details!
A BRIEF HISTORY OF CAUSAL DISCOVERY
“Big Picture” (very roughly)

Greeks - 1750: Unhelpful platitudes
1750 - 1950: Practical successes
1950 - present: Computers + Formal models = principled methods
Aristotle (384-322 BC)

Trying to answer: “Why does X have A?”
Four types of ‘cause’:
• Formal: Because of its structure
• Material: Because of its composition
• Efficient: Because of its development
• Final: Because of its purpose

But no systematic theory of inference
Francis Bacon (1561-1626)

Novum Organum (1620)
• For any phenomenon, construct:
  - The table of presence (tabula praesentiae)
  - The table of absence (tabula absentiae)
  - The table of degrees (tabula graduum)
• The cause of the phenomenon is the set of properties that explains every case on each of the three tables
John Stuart Mill (1806-1873)

A System of Logic (1843)
• Algorithmic form of Bacon’s method (though unattributed)
  - Method of agreement
  - Method of difference
  - Method of concomitant variation
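Read as procedures, the first two methods amount to set operations over recorded cases. A minimal sketch, assuming each case is just the set of circumstances present plus whether the phenomenon occurred; the function names and the food-poisoning data are hypothetical illustrations, not Mill’s own formulation.

```python
from typing import FrozenSet, List, Set, Tuple

# A case: (circumstances present, did the phenomenon occur?)
Case = Tuple[FrozenSet[str], bool]


def method_of_agreement(cases: List[Case]) -> Set[str]:
    """Circumstances shared by every case in which the phenomenon occurs."""
    positives = [circ for circ, occurred in cases if occurred]
    if not positives:
        return set()
    common = set(positives[0])
    for circ in positives[1:]:
        common &= circ
    return common


def method_of_difference(with_effect: FrozenSet[str],
                         without_effect: FrozenSet[str]) -> Set[str]:
    """Circumstances present when the phenomenon occurs but absent in a
    comparison case where it does not occur."""
    return set(with_effect - without_effect)


# Hypothetical cases: who fell ill after a meal, and what they ate.
cases: List[Case] = [
    (frozenset({"salad", "fish", "wine"}), True),
    (frozenset({"fish", "bread"}), True),
    (frozenset({"fish", "soup", "wine"}), True),
    (frozenset({"salad", "bread", "wine"}), False),
]

print(sorted(method_of_agreement(cases)))  # ['fish']
print(sorted(method_of_difference(cases[0][0], frozenset({"salad", "wine"}))))  # ['fish']
```

Mill’s method of difference also requires that the two cases differ in only one circumstance; the sketch does not check that condition, it simply returns whatever differs.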
David Hume (1711-1776)

Causal inference cannot be done using deduction
• It is always logically possible that future “causes” will not be followed by the effect
• Actually a general argument about induction

But we do it by “custom or habit”
• Had an evolutionary justification, but no framework in which to express it
Responses to Hume’s skepticism

Hume’s arguments were quite influential in philosophical circles
• And still matter in present-day philosophy

But in the sciences, people were starting to find methods that (sometimes) gave answers that at least seemed right…
Regression (Least Squares)

18th c. astronomy: find the “best” values for 6 unknowns given 75 observations
• Euler (1748)
  - Failed due to computational intractability
• Legendre (1805)
  - Developed the method of least squares
• Gauss (1795 / 1809)
  - Independent (earlier, unpublished) discovery & justification

Still the most common causal inference method…
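To make “best values” concrete: least squares picks the parameters that minimize the sum of squared residuals. A minimal sketch for the one-predictor case y ≈ a + b·x, solved in closed form in plain Python; the data points are invented, and the historical astronomy problems involved several unknowns, which requires solving a larger linear system.

```python
from typing import List, Tuple


def least_squares_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Return (intercept a, slope b) minimizing sum((y - (a + b*x))**2)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # closed-form solution of the normal equations
    intercept = mean_y - slope * mean_x    # the fitted line passes through the means
    return intercept, slope


# Hypothetical noisy observations scattered around y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
a, b = least_squares_line(xs, ys)
print(f"intercept={a:.3f}, slope={b:.3f}")  # intercept=1.060, slope=1.970
```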
Growth of statistics

Early theory of statistics emerges from probability theory throughout the 1800s
[Timeline figure: Laplace (1749-1827), Quetelet (1796-1874), Galton (1822-1911), Pearson (1857-1936), Spearman (1863-1945), Yule (1871-1951)]
Ronald A. Fisher (1890-1962)

Essentially the father of modern statistics; developed:
• An array of statistical tests
• An analysis of various experimental designs
• The standard statistical and methodological reference texts for a generation of scientists
Sewall Wright (1889-1988)

Path analysis
• Graphs encode high-level structure, and then regression can be used to estimate parameters
• By the mid-20th century, it had been adopted by a number of economists and sociologists
• But no search procedures were provided
  - You have to know the high-level structure in advance
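Wright’s division of labor (graph supplied by the modeler, parameters estimated by regression) can be sketched by simulating data from a hypothetical linear chain X → Y → Z and regressing each variable on its known parent. The coefficients 0.8 and -0.5, the sample size, and the helper `slope` are illustrative assumptions; the graph is given, not searched for.

```python
import random
from typing import List


def slope(xs: List[float], ys: List[float]) -> float:
    """Least-squares slope of ys regressed on xs (with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


random.seed(0)
n = 20_000
true_xy, true_yz = 0.8, -0.5  # hypothetical path coefficients

# Simulate from the known structure X -> Y -> Z with independent Gaussian noise.
x = [random.gauss(0, 1) for _ in range(n)]
y = [true_xy * xi + random.gauss(0, 1) for xi in x]
z = [true_yz * yi + random.gauss(0, 1) for yi in y]

# With the graph given, each coefficient comes from regressing a node on its parent.
est_xy = slope(x, y)
est_yz = slope(y, z)
print(f"X->Y: {est_xy:.2f}  Y->Z: {est_yz:.2f}")
# Product along the single directed path approximates the total effect of X on Z.
print(f"X->Z via path: {est_xy * est_yz:.2f}  direct regression: {slope(x, z):.2f}")
```

Multiplying the coefficients along a directed path gives that path’s contribution to the total effect, which is the core bookkeeping rule of path analysis.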
Causal graphical models

Developed by statisticians, computer scientists, and philosophers
• Dawid, Spiegelhalter, Wermuth, Cox, Lauritzen, Pearl, Spirtes, Glymour, Scheines

Represent both qualitative and quantitative aspects of causation
REPRESENTING CAUSAL STRUCTURES
Qualitative representation

We want a representation that captures many qualitative features of causality
• Causation occurs among variables ⇒ one node per variable
[Figure: one node each for Food Eaten, Exercise, Weight, Metabolism]
• Asymmetry of causation ⇒ need an asymmetric connection in the graph
[Figure: the same nodes, now joined by directed edges (arrows)]
• No (immediate) reciprocal causation ⇒ no cycles (without explicit temporal indexing)
[Figure: the variables Food Eaten, Exercise, Weight, Metabolism indexed by time, with one copy at time t and one at time t+1]
Directed Acyclic Graphs

More precisely: DAG G = <V, E>
• V = set of nodes (one per variable)
• E = set of edges (i.e., ordered pairs of nodes)

Path π = sequence of adjacent edges
• Directed path = path with all edges pointing in the same direction

Acyclicity: No directed path from any node A back to itself

In general: We use genealogical & topological language (parents, children, ancestors, descendants) to describe graphical relationships
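These definitions map directly onto a small data structure. A minimal sketch with a hypothetical `DAG` class (not any particular library’s API): nodes are strings, edges are ordered pairs, acyclicity is enforced by a depth-first search for a directed path from a node back to itself, and `parents`/`ancestors` illustrate the genealogical vocabulary. The example uses the four variables from these slides, abbreviated E, FE, M, W, with the edges implied by the factorization P(E) P(FE | E) P(M | E) P(W | M, FE).

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # ordered pair (tail, head), read as tail -> head


class DAG:
    def __init__(self, nodes: Set[str], edges: Set[Edge]) -> None:
        for a, b in edges:
            if a not in nodes or b not in nodes:
                raise ValueError(f"edge {a}->{b} uses an unknown node")
        self.nodes = set(nodes)
        self.edges = set(edges)
        if self.has_cycle():
            raise ValueError("graph has a directed cycle")

    def children(self, v: str) -> Set[str]:
        return {b for a, b in self.edges if a == v}

    def parents(self, v: str) -> Set[str]:
        return {a for a, b in self.edges if b == v}

    def ancestors(self, v: str) -> Set[str]:
        """All nodes with a directed path into v."""
        found: Set[str] = set()
        stack = list(self.parents(v))
        while stack:
            u = stack.pop()
            if u not in found:
                found.add(u)
                stack.extend(self.parents(u))
        return found

    def has_cycle(self) -> bool:
        """True iff some node has a directed path back to itself."""
        WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on current path / finished
        color: Dict[str, int] = {v: WHITE for v in self.nodes}

        def visit(u: str) -> bool:
            color[u] = GREY
            for w in self.children(u):
                if color[w] == GREY:
                    return True  # edge back into the current path closes a cycle
                if color[w] == WHITE and visit(w):
                    return True
            color[u] = BLACK
            return False

        return any(color[v] == WHITE and visit(v) for v in self.nodes)


g = DAG({"E", "FE", "M", "W"}, {("E", "FE"), ("E", "M"), ("FE", "W"), ("M", "W")})
print(sorted(g.parents("W")))    # ['FE', 'M']
print(sorted(g.ancestors("W")))  # ['E', 'FE', 'M']
```

Constructing `DAG({"A", "B"}, {("A", "B"), ("B", "A")})` raises `ValueError`, since A → B → A is a directed path from A to itself.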
Quantitative representation

DAGs alone can represent “A causes B”… but not the “strength” or “form” of causation
• Need to represent the relationships between the various variables’ values
• The exact quantitative representation will depend on the type of variables being represented
Bayesian networks


All variables are discrete/categorical

Represent quantitative causation using a joint probability distribution
• I.e., a specification of the probability of any combination of variable values (E = Exercise, FE = Food Eaten, M = Metabolism, W = Weight), such as:
  - P(E=Hi & FE=Lo & M=Hi & W=Hi) = 0.001;
    P(E=Hi & FE=Lo & M=Hi & W=Lo) = 0.03;
    etc.

Note: Nothing inherently Bayesian about Bayes nets!
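A minimal sketch of what “a probability for every combination” means, assuming four binary variables with values Lo/Hi (E, FE, M, W as above). The probabilities are random placeholders, not data; the hypothetical helper `prob` sums rows of the table to obtain marginal and conditional probabilities.

```python
import itertools
import random
from typing import Dict, Tuple

VARS = ("E", "FE", "M", "W")  # Exercise, Food Eaten, Metabolism, Weight
VALUES = ("Lo", "Hi")
INDEX = {name: i for i, name in enumerate(VARS)}

# Hypothetical joint distribution: random positive weights normalized to sum to 1.
random.seed(1)
weights = {combo: random.random() for combo in itertools.product(VALUES, repeat=len(VARS))}
total = sum(weights.values())
joint: Dict[Tuple[str, ...], float] = {combo: w / total for combo, w in weights.items()}


def prob(**fixed: str) -> float:
    """P(partial assignment): sum the joint over all unmentioned variables."""
    return sum(
        p
        for combo, p in joint.items()
        if all(combo[INDEX[name]] == value for name, value in fixed.items())
    )


print(len(joint))         # 16: one row per combination of values
print(round(prob(), 10))  # 1.0: the rows form a probability distribution
print(prob(E="Hi", FE="Lo", M="Hi", W="Hi"))  # a single row of the table
print(prob(W="Hi"))                           # marginal P(W=Hi)
print(prob(E="Hi", W="Hi") / prob(E="Hi"))    # conditional P(W=Hi | E=Hi)
```

With n binary variables the table has 2^n rows (2^n - 1 free numbers), which grows quickly; that growth is one motivation for letting a graph constrain the distribution.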
Structural Equation Models (SEMs)


All variables are continuous/real-valued

Represent quantitative causation using systems of linear equations
• For example:
  E = a1·FE + a2·M + a3·W + ε_E
  FE = b1·E + b2·M + b3·W + ε_FE
  etc.
Connecting the pieces

DAG-based graphical model:
• Qualitative: a DAG over X1, X2, X3, X4 [figure]
• Quantitative: P(X) = P(X1) P(X2 | X1) P(X3 | X1) P(X4 | X1, X2)
• Connection between the two: ???
Connecting the pieces

Causal Markov assumption:
• Variables are independent of their non-effects conditional on their direct causes
• Uses the qualitative graph to constrain the quantitative relationships
• Encodes the intuition of “screening off”
  - Given the values of the direct causes, learning the value of a non-effect doesn’t help me predict the variable
Connecting the pieces

Markov assumption for Bayes nets ⇒
• Markov factorization of P(X1, X2, …): P(X1, …, Xn) = ∏i P(Xi | Parents(Xi))
• Example: [Figure: Exercise → Food Eaten, Exercise → Metabolism, Food Eaten → Weight, Metabolism → Weight]
  ⇒ P(E, FE, W, M) = P(E) * P(FE | E) * P(M | E) * P(W | M, FE)
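A numerical check of the factorization, assuming binary (0/1, i.e. Lo/Hi) versions of the four variables and invented conditional probability tables. The sketch builds the joint from the four factors, confirms it sums to 1, and tests the screening-off consequence of the Markov assumption: once FE and M (the direct causes of W) are fixed, also learning E leaves the prediction for W unchanged.

```python
import itertools
from typing import Dict, Tuple

# Hypothetical CPTs for the DAG E -> FE, E -> M, FE -> W, M -> W (values 0/1).
p_e1 = 0.4                          # P(E=1)
p_fe1 = {0: 0.5, 1: 0.7}            # P(FE=1 | E)
p_m1 = {0: 0.3, 1: 0.6}             # P(M=1 | E)
p_w1 = {(0, 0): 0.5, (0, 1): 0.2,   # P(W=1 | FE, M)
        (1, 0): 0.8, (1, 1): 0.5}


def bern(p1: float, x: int) -> float:
    """P(X=x) for a binary variable with P(X=1) = p1."""
    return p1 if x == 1 else 1.0 - p1


# Markov factorization: P(E, FE, M, W) = P(E) P(FE | E) P(M | E) P(W | FE, M).
joint: Dict[Tuple[int, int, int, int], float] = {}
for e, fe, m, w in itertools.product((0, 1), repeat=4):
    joint[(e, fe, m, w)] = (bern(p_e1, e) * bern(p_fe1[e], fe)
                            * bern(p_m1[e], m) * bern(p_w1[(fe, m)], w))


def cond_w1(e: int, fe: int, m: int) -> float:
    """P(W=1 | E=e, FE=fe, M=m), computed from the joint table."""
    num = joint[(e, fe, m, 1)]
    return num / (num + joint[(e, fe, m, 0)])


print(round(sum(joint.values()), 10))  # 1.0
# Screening off: with FE and M fixed, the value of E makes no difference.
for fe, m in itertools.product((0, 1), repeat=2):
    print(fe, m, round(cond_w1(0, fe, m), 10), round(cond_w1(1, fe, m), 10))
```

The factorized model needs only 1 + 2 + 2 + 4 = 9 numbers, versus 15 for an unrestricted joint over four binary variables.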
Connecting the pieces

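Markov assumption for SEMs ⇒
• Markov factorization of the joint probability density: f(X1, …, Xn) = ∏i f(Xi | Parents(Xi)), which holds when each variable depends only on its direct causes plus its own noise term and the noise terms are mutually independent
• Example: [Figure: Exercise → Food Eaten, Exercise → Metabolism, Food Eaten → Weight, Metabolism → Weight]
  ⇒ E = ε_E
    FE = a1·E + ε_FE
    M = b1·E + ε_M
    W = c1·FE + c2·M + ε_W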
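The linear SEM can be simulated directly. A sketch under stated assumptions: hypothetical coefficients a1 = 0.5, b1 = -0.8, c1 = 0.6, c2 = 0.9, independent standard-normal noise terms, and a seeded sample of 100,000 draws. It compares sample covariances with the values the equations imply and shows one Markov consequence: FE and M covary only through E, so their residuals after regressing each on E are (nearly) uncorrelated.

```python
import random
from typing import List

random.seed(42)
n = 100_000
a1, b1, c1, c2 = 0.5, -0.8, 0.6, 0.9  # hypothetical coefficients


def cov(xs: List[float], ys: List[float]) -> float:
    """Sample covariance."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def residuals(ys: List[float], xs: List[float]) -> List[float]:
    """Residuals from a least-squares regression of ys on xs (with intercept)."""
    slope = cov(xs, ys) / cov(xs, xs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


# Each variable = linear function of its direct causes + its own independent noise.
E = [random.gauss(0, 1) for _ in range(n)]
FE = [a1 * e + random.gauss(0, 1) for e in E]
M = [b1 * e + random.gauss(0, 1) for e in E]
W = [c1 * fe + c2 * m + random.gauss(0, 1) for fe, m in zip(FE, M)]

# FE and M share the common cause E, so they covary (implied value a1*b1).
print(f"cov(FE, M) = {cov(FE, M):.3f}, implied {a1 * b1:.3f}")
# Screening off: after removing E's linear influence, FE and M are uncorrelated.
print(f"residual cov(FE, M) given E = {cov(residuals(FE, E), residuals(M, E)):.3f}")
# E affects W along two paths; the implied covariance sums them: a1*c1 + b1*c2.
print(f"cov(E, W) = {cov(E, W):.3f}, implied {a1 * c1 + b1 * c2:.3f}")
```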
Connecting the pieces

Causal Faithfulness assumption
• The only independencies are those predicted by the Markov assumption
• Uses the quantitative relations to constrain the qualitative graph
• Implication: No exactly counter-balancing causal paths
  - Exercise → Food Eaten → Weight and Exercise → Metabolism → Weight do not exactly offset one another
• Implication: No perfectly deterministic relationships
  - In particular, no variable is a mathematical function of others
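The counter-balancing case can be computed exactly rather than simulated. A minimal sketch, assuming a linear SEM with the same graph, independent unit-variance noise, and hypothetical coefficients chosen so that a1·c1 + b1·c2 = 0 (say, exercise raises both food eaten and metabolism, which push weight in opposite directions). Each variable is written as a weighted sum of noise terms; the implied covariance of E and W is then exactly 0 even though E causes W, an independence the Markov assumption alone does not predict, so faithfulness fails.

```python
from typing import Dict, List, Tuple

# Linear SEM in causal (topological) order: variable -> [(parent, coefficient)].
# Hypothetical coefficients chosen so that a1*c1 + b1*c2 == 0.
a1, b1, c1, c2 = 1.0, 2.0, 0.5, -0.25
sem: Dict[str, List[Tuple[str, float]]] = {
    "E": [],
    "FE": [("E", a1)],
    "M": [("E", b1)],
    "W": [("FE", c1), ("M", c2)],
}

# Write each variable as a weighted sum of independent, unit-variance noise terms.
loadings: Dict[str, Dict[str, float]] = {}
for var, parents in sem.items():  # dicts keep insertion order, so parents come first
    row = {var: 1.0}              # the variable's own noise term
    for parent, coef in parents:
        for noise, weight in loadings[parent].items():
            row[noise] = row.get(noise, 0.0) + coef * weight
    loadings[var] = row


def cov(x: str, y: str) -> float:
    """Implied covariance: sum over shared noise terms of the product of weights."""
    return sum(w * loadings[y].get(noise, 0.0) for noise, w in loadings[x].items())


print(cov("E", "FE"), cov("E", "M"))  # 1.0 2.0 (E causes both)
print(cov("E", "W"))                  # 0.0 (the two paths cancel exactly)
```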
Causal vs. statistical models

Bayes nets and SEMs are not inherently causal models
• Markov and Faithfulness assumptions can be expressed purely as constraints linking the graph to the quantitative (probabilistic) component

Assuming a non-causal version of the assumptions ⇒ a purely statistical model
• I.e., a compact representation of statistical independencies among some set of variables
Causation and intervention

Causal claims support counterfactuals
• In particular, those about interventions
  - “If I had flipped the switch, the light would have turned on”
  - “If she hadn’t dropped the plate, then it would not have broken”
  - Etc.
Causation and intervention

One of the central causal asymmetries
• Interventions on a cause lead to changes in the effect
  - Flipping the switch turns off the light
• In contrast, interventions on an effect do not lead to changes in the cause
  - Breaking the light bulb doesn’t flip the switch

Some have argued that this is the paradigmatic feature of causation (Woodward, Hausman)
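The asymmetry can be sketched with a toy structural model of the switch example, where an intervention replaces the intervened variable’s own equation with a fixed value and leaves every other equation untouched (the “surgery” view of intervention used with causal graphical models). The model, the variable names, and the deterministic link are hypothetical simplifications for illustration.

```python
from typing import Callable, Dict, Optional

Values = Dict[str, bool]
Equation = Callable[[Values], bool]

# Toy structural model (hypothetical), equations listed in causal order.
# The switch is set from outside the model; the light simply follows the switch.
model: Dict[str, Equation] = {
    "switch_on": lambda v: True,
    "light_on": lambda v: v["switch_on"],
}


def run(model: Dict[str, Equation], do: Optional[Values] = None) -> Values:
    """Evaluate each equation in order; a variable named in `do` is intervened on:
    its equation is replaced by the given constant, all other equations are kept."""
    do = do or {}
    values: Values = {}
    for var, equation in model.items():
        values[var] = do[var] if var in do else equation(values)
    return values


print(run(model))                        # {'switch_on': True, 'light_on': True}
print(run(model, {"switch_on": False}))  # {'switch_on': False, 'light_on': False}
print(run(model, {"light_on": False}))   # {'switch_on': True, 'light_on': False}
```

By contrast, merely observing the light off would be evidence that the switch is off; it is the intervention, not the observation, that leaves the cause untouched.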
Looking ahead…


Have: Basic formal representation for causation
Need:
• Fundamental causal asymmetry (of intervention)
• Inference & reasoning methods
• Search & causal discovery methods