Theory of Big Data 2 Conference
Big Data Institute, University College London
Causal Inference from Multivariate Time Series:
Principles and Problems
Michael Eichler
Department of Quantitative Economics
Maastricht University
http://researchers-sbe.unimaas.nl/michaeleichler
6 January 2016
Outline
• Causality concepts
• Graphical representation
  • Definition
  • Markov properties
  • Extension: systems with latent variables
• Causal learning
  • Basic principles
  • Identification from empirical relationships
• Non-Markovian constraints
  • Trek-separation in graphs
  • Tetrad representation theorem
  • Testing for tetrad constraints
• Open problems and conclusions
Concepts of causality for time series

We consider two variables X and Y measured at discrete times t ∈ ℤ:

    X = (Xt)t∈ℤ,  Y = (Yt)t∈ℤ.

Question: When is it justified to say that X causes Y?

Various approaches:
• intervention causality (Pearl 1993; Eichler & Didelez 2007, 2010)
• structural causality (White & Lu 2010)
• Granger causality (Granger 1969, 1980, 1988)
• Sims causality (Sims 1972)
Granger causality

Two fundamental principles:
• The cause precedes its effect in time.
• The causal series contains special information about the series being caused that is not available otherwise.

This leads us to consider two information sets:
• F*(t) - all information in the universe up to time t
• F*−X(t) - this information except for the values of X

Granger's definition of causality (Granger 1969, 1980)
We say that X causes Y if the probability distributions of
• Yt+1 given F*(t) and
• Yt+1 given F*−X(t)
are different.
Granger causality

Problem: The definition cannot be used with actual data.

Suppose the data consist of a multivariate time series V = (X, Y, Z) and let
• X^t denote the information given by X up to time t,
• and similarly Y^t and Z^t for Y and Z.

Definition: Granger non-causality
• X is Granger-noncausal for Y with respect to V if
    Yt+1 ⊥⊥ X^t | Y^t, Z^t.
• Otherwise we say that X Granger-causes Y with respect to V.

Additionally:
• X and Y are said to be contemporaneously independent with respect to V if
    Xt+1 ⊥⊥ Yt+1 | V^t.
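In practice this definition reduces, under a finite-order VAR approximation, to a coefficient test. A minimal sketch, assuming statsmodels is available; the simulated system and the lag order 4 are illustrative choices, not part of the original material:

```python
# Sketch: testing Granger non-causality of X for Y with respect to
# V = (X, Y, Z) via a finite-order VAR approximation.
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(0)
T = 500
x = rng.normal(size=T)
z = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):                      # X feeds into Y with one lag
    y[t] = 0.5 * y[t - 1] + 0.4 * x[t - 1] + 0.2 * z[t - 1] + rng.normal()

res = VAR(pd.DataFrame({"x": x, "y": y, "z": z})).fit(maxlags=4)
# H0: all VAR coefficients from lagged x to y are zero, i.e. X is
# Granger-noncausal for Y with respect to (X, Y, Z).
print(res.test_causality(caused="y", causing=["x"], kind="f").summary())
```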
Sims causality

Definition: Sims non-causality
X does not Sims-cause Y with respect to V = (X, Y, Z) if
    {Yt′ : t′ > t} ⊥⊥ Xt | X^{t−1}, Y^t, Z^t.

Note:
• Granger causality is a concept of direct causality.
• Sims causality is a concept of total causality (direct and indirect pathways).

The following statistics are measures of Sims causality:
• impulse response function (time and frequency domain)
• directed transfer function (DTF)
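The impulse responses can be read off the moving-average representation of a fitted VAR. A short sketch reusing the fitted model `res` from the previous snippet (statsmodels again assumed):

```python
# Sketch: impulse responses as a measure of (total) Sims causality.
# In the moving-average representation, res.irf(h).irfs stacks B_0, ..., B_h.
irf = res.irf(10)
b = irf.irfs                                      # shape (11, 3, 3)
i, j = res.names.index("y"), res.names.index("x")
# X does not Sims-cause Y (w.r.t. the full system) iff the responses of y
# to an x innovation vanish at every horizon k >= 1 (B_0 is the identity).
print(b[1:, i, j])
```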
Vector autoregressive processes

Let X be a multivariate stationary Gaussian time series with vector autoregressive representation

    Xt = Σ_{k=1}^∞ Ak Xt−k + εt = Σ_{k=0}^∞ Bk εt−k.

Granger non-causality in VAR models:
The following are equivalent:
• Xb does not Granger-cause Xa with respect to X;
• Aab,k = 0 for all k ∈ ℕ.

Sims non-causality in VAR models:
The following are equivalent:
• Xb does not Sims-cause Xa with respect to X;
• Bab,k = 0 for all k ∈ ℕ.
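Both equivalences can be checked numerically, since the MA coefficients follow from the AR coefficients via the standard recursion B_0 = I, B_k = Σ_{j=1}^{min(k,p)} A_j B_{k−j}. A minimal sketch; the toy coefficient matrix is illustrative:

```python
# Sketch: moving-average coefficients from autoregressive coefficients,
# and the zero patterns encoding Granger/Sims non-causality.
import numpy as np

def ma_from_ar(A, horizon):
    """A: list of (d, d) matrices A_1, ..., A_p of a VAR(p)."""
    d = A[0].shape[0]
    B = [np.eye(d)]
    for k in range(1, horizon + 1):
        B.append(sum(A[j - 1] @ B[k - j]
                     for j in range(1, min(k, len(A)) + 1)))
    return B

# Variable 0 drives variable 1, with no feedback:
# A_{01,k} = 0 for all k  (X_1 does not Granger-cause X_0), and hence
# B_{01,k} = 0 for all k  (X_1 does not Sims-cause X_0 either).
A1 = np.array([[0.5, 0.0],
               [0.4, 0.3]])
print([round(float(Bk[0, 1]), 4) for Bk in ma_from_ar([A1], 5)])  # all 0.0
```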
Graphical models for time series

Basic idea: use graphs to encode conditional independences among variables
• nodes/vertices represent variables
• a missing edge between two nodes implies conditional independence of the two variables

Application to time series:
• treat each variable at each time separately (⇒ time series chain graphs)
• treat each series as one variable (only one node per series in the graph)
Graphical models for time series

Granger causality graphs (Eichler 2007)
Idea: represent the Granger-causal relations in X by a mixed graph G:
• vertices v ∈ V represent the variables (time series) Xv;
• directed edges between the vertices indicate Granger-causal relationships;
• additionally, undirected (dashed) edges indicate contemporaneous associations.
Graphical models for time series

Granger causality graphs (Eichler 2007)
Example: consider the five-dimensional autoregressive process XV

    Xt = f(Xt−1) + εt

[Path diagram on vertices 1, ..., 5]

with
• X1,t = f1(X3,t−1) + ε1,t
• X2,t = f2(X4,t−1) + ε2,t
• X3,t = f3(X1,t−1, X2,t−1) + ε3,t
• X4,t = f4(X3,t−1, X5,t−1) + ε4,t
• X5,t = f5(X3,t−1) + ε5,t
• (ε1,t, ε2,t, ε3,t) ⊥⊥ (ε4,t, ε5,t) and ε4,t ⊥⊥ ε5,t
Markov properties

Objective: derive Granger-causal relationships for XS, S ⊆ V
Idea: characterize pathways that induce associations
Tool: concepts of separation in graphs
• DAGs: d-separation (Pearl 1988)
• mixed graphs: d-separation (Spirtes et al. 1998; Koster 1999) or m-separation (Richardson 2003)
Markov properties

Chain 1 → 2 → 3:
    p(x) = p(x3|x2) p(x2|x1) p(x1)  ⇒  X3 ⊥⊥ X1 | X2

Fork 1 ← 2 → 3:
    p(x) = p(x1|x2) p(x3|x2) p(x2)  ⇒  X3 ⊥⊥ X1 | X2

Collider 1 → 2 ← 3:
    p(x) = p(x2|x1, x3) p(x3) p(x1)  ⇏  X3 ⊥⊥ X1 | X2
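The three factorizations can be checked numerically: for Gaussian variables, X3 ⊥⊥ X1 | X2 holds exactly when the partial correlation of X1 and X3 given X2 vanishes. A minimal sketch; the simulation coefficients are arbitrary:

```python
# Sketch: partial correlation of X1 and X3 given X2 in the three cases,
# estimated from the precision (inverse correlation) matrix.
import numpy as np

def partial_corr_13_given_2(x1, x2, x3):
    X = np.column_stack([x1, x2, x3])
    P = np.linalg.inv(np.corrcoef(X, rowvar=False))  # precision matrix
    return -P[0, 2] / np.sqrt(P[0, 0] * P[2, 2])

rng = np.random.default_rng(1)
n = 100_000
e = rng.normal(size=(n, 3))

x1 = e[:, 0]; x2 = 0.8 * x1 + e[:, 1]; x3 = 0.8 * x2 + e[:, 2]  # chain
print("chain   :", round(partial_corr_13_given_2(x1, x2, x3), 3))  # ~ 0

x2 = e[:, 1]; x1 = 0.8 * x2 + e[:, 0]; x3 = 0.8 * x2 + e[:, 2]   # fork
print("fork    :", round(partial_corr_13_given_2(x1, x2, x3), 3))  # ~ 0

x1 = e[:, 0]; x3 = e[:, 2]; x2 = 0.8 * x1 + 0.8 * x3 + e[:, 1]   # collider
print("collider:", round(partial_corr_13_given_2(x1, x2, x3), 3))  # != 0
```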
Global Granger-causal Markov property

Separation in mixed graphs
Question: What type of paths induces Granger-causal relations between variables?
Note: Granger (non-)causality is not symmetric.
Idea: consider only paths ending with a directed edge.

Example: the graph 1 → 2 ← 3 → 4 entails
• X1 does not Granger-cause X4 with respect to X1, X4
• X1 does not Granger-cause X4 with respect to X1, X3, X4
• X1 does not Granger-cause X4 with respect to X1, X2, X3, X4
but not
• X1 does not Granger-cause X4 with respect to X1, X2, X4
(conditioning on the collider 2 without 3 activates the path from 1 to 4).
Principles of causal inference

Objective: identify the causal structure of the process X
Question: What should we use in practice?
• Granger causality or Sims causality?
• bivariate or fully multivariate analysis?

Answer: For causal inference ... all and more.
Principles of identification

An example of indirect causality: the chain

    1 → 2 → 3

implies for the bivariate submodel on (X1, X3) the edge 1 → 3.
Principles of identification

An example of spurious causality:

[Path diagram on vertices 1, 2, 3 and a latent variable L]

which implies for the trivariate and bivariate submodels:

[Induced diagrams on vertices 1, 2, 3 and on vertices 1, 3]
Principles of identification

Inverse problem: What can we say about the full system based on the Granger-noncausal relations observed for the (sub)process?

Suppose
• Xa → Xc [XS] for all {a, c} ⊆ S ⊆ V
• Xc → Xb [XS] for all {c, b} ⊆ S ⊆ V

Rules of causal inference (a code sketch follows)
• Indirect causality rule: Xa truly causes Xb if
    Xa ↛ Xb [XS] for some S ⊆ V with c ∈ S.
• Spurious causality rule: Xa is a spurious cause of Xb if
    Xa ↛ Xb [XS] for some S ⊆ V with c ∉ S.
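A sketch of how the two rules could be applied mechanically to a collection of test results; the data structure and all names here are hypothetical, chosen only to illustrate the logic of the slide:

```python
# Sketch (hypothetical data structure): noncausal[(a, b)] lists the
# conditioning sets S for which X_a was found Granger-noncausal for X_b.
def classify(a, b, c, noncausal):
    """Classify the a -> b relation given the mediating candidate c."""
    sets = noncausal.get((a, b), [])
    if any(c in S for S in sets):           # indirect causality rule
        return f"indirect: {a} causes {b} only via {c}"
    if any(c not in S for S in sets):       # spurious causality rule
        return f"spurious: {a} -> {b} vanishes without {c}"
    return "undecided"

# X1 -> X3 disappears once X2 is conditioned on: indirect causality.
noncausal = {("X1", "X3"): [{"X1", "X2", "X3"}]}
print(classify("X1", "X3", "X2", noncausal))
```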
Principles of causal inference

Example: system of variables Z, U, Y, X

[Figure: estimates of bivariate Granger causality AYX(h), trivariate Granger causality AYX(h), and trivariate Sims causality BYX(h), plotted against lag h]
Principles of causal inference

Example: system of variables Z, U, V, Y, X

[Figure: estimates of bivariate Granger causality AYX(h), trivariate Granger causality AYX(h), and trivariate Sims causality BYX(h), plotted against lag h]
Identification of causal structure

Algorithm: identification of adjacencies
• insert a dashed edge between a and b whenever Xa and Xb are not contemporaneously independent
• insert a directed edge between a and b whenever
  • Xb → Xa [XS] for all S ⊆ V with a, b ∈ S;
  • Xa(t − k) ⊥̸⊥ Xb(t + 1) | FS1(t) ∨ FS2(t − k) ∨ Fa(t − k − 1)
    for all k ∈ ℕ, t ∈ ℤ, and all disjoint S1, S2 ⊆ V with b ∈ S1 and a ∉ S1 ∪ S2.
Identification of causal structure

Algorithm: identification of tails
• colliders: if a – c – b in G and Xa ↛ Xb [XS] for some S such that c ∉ S,
  orient both edges with an arrowhead at c.
• non-colliders: if a – c – b in G and Xa ↛ Xb [XS] for some S such that c ∈ S,
  orient the edge between c and b with a tail at c.
• ancestors: if there is a directed path a → ... → b in G,
  orient the edge between a and b as a → b.
• discriminating paths: see, e.g., Ali et al. (2004).
Identification of causal structure

Example: application to neural spike train data

[Figure: spike trains of neurons 1-10 over 0-8 seconds, and estimated partial directed coherences pdc(1 → 2), pdc(1 → 3), pdc(1 → 4), pdc(2 → 3), pdc(2 → 4), pdc(3 → 4) plotted against lag]
Identification of causal structure

Example:

[Figure: panels (a)-(k) showing the step-by-step orientation of the edges among vertices 1, 2, 3, 4]

Result:

[Figure: the resulting mixed graph on vertices 1, 2, 3, 4]
Problem

Example: a single latent variable L pointing to each of the vertices 1, 2, 3, 4

• X1, X2, X3, X4 are conditionally independent given L
• there are no conditional independences among X1, ..., X4 alone.
Trek separation

Problem:
• conditional independences are not sufficient to describe processes that involve latent variables
• identification of such structures relies on sparsity that is often not given

Approach: Sullivant et al. (2011) for multivariate Gaussian distributions
• new concept of separation in graphs
• encodes rank constraints on minors of the covariance matrix
• generalizes other concepts of separation (special case: conditional independences)
Trek separation

A trek between nodes i and j is a path π = (πL, πM, πR) such that
• πL is a directed path from some node kL to i;
• πR is a directed path from some node kR to j;
• πM is an undirected edge kL – kR or a path of length zero (kL = kR).

Examples: i ← kL – kR → j,  i ← v ← k → j,  i ← v → j,  i ← j

Definition (trek separation)
(CL, CR) t-separates the sets A and B if for every trek (πL, πM, πR) between A and B
• πL contains a vertex in CL, or
• πR contains a vertex in CR.
Trek separation

Let X be a stationary Gaussian process with spectral matrix Σ(ω) satisfying

    Σ(ω) = (1/2π) Σ_{u=−∞}^∞ cov(Xt, Xt−u) e^{−iuω}.

Theorem
Let X be G-Markov. Then the following are equivalent:
• rank(ΣAB(ω)) ≤ r for all ω ∈ [−π, π];
• A and B are t-separated by some (CL, CR) with |CL| + |CR| ≤ r.
Trek separation

Corollaries:
Let X be a stationary Gaussian process. Then

    XA ⊥⊥ XB | XC  ⇔  rank(ΣA∪C,B∪C) = |C|.

Furthermore, the following are equivalent:
• XA ⊥⊥ XB | XC for all G-Markov processes X;
• (CA, CB) t-separates A ∪ C and B ∪ C for some partition C = CA ∪ CB.
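The first corollary suggests a direct numerical check. A minimal sketch on a covariance (or fixed-frequency spectral) matrix; the matrix below is the population covariance of the Gaussian chain 1 → 2 → 3 used earlier (coefficients 0.8, unit noise variances, an illustrative choice), for which X1 ⊥⊥ X3 | X2:

```python
# Sketch: checking rank(Σ_{A∪C, B∪C}) = |C| on a submatrix.
import numpy as np

def rank_condition(Sigma, A, B, C, tol=1e-8):
    rows = sorted(set(A) | set(C))
    cols = sorted(set(B) | set(C))
    sub = Sigma[np.ix_(rows, cols)]
    return np.linalg.matrix_rank(sub, tol=tol) == len(C)

Sigma = np.array([[1.0,   0.8,   0.64  ],
                  [0.8,   1.64,  1.312 ],
                  [0.64,  1.312, 2.0496]])
print(rank_condition(Sigma, A=[0], B=[2], C=[1]))   # True:  X1 ⊥⊥ X3 | X2
print(rank_condition(Sigma, A=[0], B=[1], C=[2]))   # False: X1 ⊥̸⊥ X2 | X3
```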
Tetrad representation theorem

Consider the class M(G) of all G-Markov stationary Gaussian processes.

Proposition
The following are equivalent:
• the spectral matrices Σ(·) of processes in M(G) satisfy
    Σik(ω) Σjl(ω) − Σil(ω) Σjk(ω) = 0;
• {i, j} and {k, l} are t-separated by (c, ∅) or (∅, c) for some node c in G.
Tetrad representation theorem

If the spectral matrix Σ(ω) satisfies the tetrad constraints
    Σik(ω) Σjl(ω) − Σil(ω) Σjk(ω) = 0
    Σij(ω) Σkl(ω) − Σil(ω) Σkj(ω) = 0
    Σik(ω) Σlj(ω) − Σij(ω) Σlk(ω) = 0
then there exists a node P such that Xi, Xj, Xk, and Xl are mutually conditionally independent given XP.

[Diagram: P with directed edges to each of 1, 2, 3, 4]

Note: If no such XP is among the observed variables, XP must be a latent factor.
Testing tetrad constraints

Approach: nonparametric test (Eichler 2008)

Null hypothesis: ψ(Σ(ω)) ≡ 0, where ψ(Z) = zik zjl − zil zjk

Test statistic:
    S_T = ∫ |ψ(Σ̂(ω))|² dω,
where Σ̂(ω) is a kernel spectral estimator with bandwidth bT.

Theorem. Under the null hypothesis,
    bT^{1/2} T S_T − bT^{−1/2} μ →D N(0, σ²),
where
    μ = Ch Cw,2 ∫ tr( ∇ψ(Σ(ω))′ Σ(ω) ∇ψ(Σ(−ω)) Σ(ω) ) dω,
    σ² = 4π Ch² Cw,4 ∫ | tr( ∇ψ(Σ(ω))′ ΣAA(ω) ∇ψ(Σ(−ω)) ΣBB(ω) ) |² dω.
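A minimal sketch of the statistic S_T, with scipy's Welch cross-spectral estimator standing in for the kernel estimator of Eichler (2008) (so nperseg plays the role of the bandwidth bT); no normalization or critical values are computed, and the one-factor toy data are illustrative:

```python
# Sketch: S_T = ∫ |ψ(Σ̂(ω))|² dω, discretized over estimated frequencies.
import numpy as np
from scipy.signal import csd

def spectral_matrix(X, nperseg=128):
    """X: (T, d) array; returns an (F, d, d) cross-spectral array."""
    T, d = X.shape
    nfreq = len(csd(X[:, 0], X[:, 0], nperseg=nperseg)[0])
    S = np.empty((nfreq, d, d), dtype=complex)
    for a in range(d):
        for b in range(d):
            S[:, a, b] = csd(X[:, a], X[:, b], nperseg=nperseg)[1]
    return S

def tetrad_statistic(S, i, j, k, l):
    psi = S[:, i, k] * S[:, j, l] - S[:, i, l] * S[:, j, k]
    return float((np.abs(psi) ** 2).mean())   # discretized integral

# One-factor toy data: all four series load on a common AR(1) factor, so
# the population tetrad constraint holds and the statistic should be small.
rng = np.random.default_rng(2)
T = 2048
f = np.zeros(T)
for t in range(1, T):
    f[t] = 0.7 * f[t - 1] + rng.normal()
X = np.outer(f, [1.0, 0.8, 0.6, 0.9]) + rng.normal(size=(T, 4))
print(tetrad_statistic(spectral_matrix(X), 0, 1, 2, 3))
```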
Latent variable models

Common identifiability constraint for factor models: factors are uncorrelated/independent.

But: in many applications (e.g. in neuroscience), we think of latent variables that are causally connected:
• EEG recordings measure neural activity in close cortical regions
• fMRI recordings measure hemodynamic responses which depend on the underlying neural activity

Objective: recover the latent processes and the interrelations among them
Latent variable models

Suppose that Y(t) can be partitioned into YI1(t), ..., YIr(t) such that

    YIj(t) = Λj Xj(t) + εIj(t)

and X(t) is a VAR(p) process.

Then the model can be fitted by the following steps (sketched in code below):
• identify clusters of variables that depend on one latent variable (based on the tetrad rules)
• use PCA to determine the latent variable processes Xj(t)
• fit a VAR model to all latent variable processes jointly
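A sketch of the last two steps, assuming the clusters have already been found via the tetrad rules; scikit-learn's PCA and statsmodels' VAR are generic stand-ins, not the method's prescribed implementation:

```python
# Sketch: extract one latent process per cluster by PCA, then fit a
# joint VAR(p) to the latent processes.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from statsmodels.tsa.api import VAR

def fit_latent_var(Y, clusters, p=2):
    """Y: (T, N) observations; clusters: list of column-index lists."""
    factors = []
    for cols in clusters:
        # Step 2: first principal component of each cluster's series.
        factors.append(PCA(n_components=1).fit_transform(Y[:, cols]).ravel())
    F = pd.DataFrame(np.column_stack(factors),
                     columns=[f"X{j + 1}" for j in range(len(clusters))])
    # Step 3: joint VAR(p) model for the latent processes.
    return F, VAR(F).fit(p)

# Illustrative use with two hypothetical clusters of observed series:
# factors, res = fit_latent_var(Y, clusters=[[0, 1, 2], [3, 4]], p=2)
# print(res.summary())
```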
Latent variable models

Example:

[Figure: simulated series X(1), ..., X(5), each plotted over 1000 time points]
Latent variable models

Example: tetrad test statistics S for each split of the five series into two pairs

Set {1, 2} with  {3, 5}: S = −0.31   {3, 4}: S = −0.98   {4, 5}: S = −1.4
Set {1, 3} with  {2, 5}: S = 0.76    {2, 4}: S = −1.37   {4, 5}: S = −0.44
Set {1, 4} with  {2, 5}: S = 6.54    {2, 3}: S = −1.19   {3, 5}: S = 6.55
Set {1, 5} with  {2, 4}: S = 5.43    {2, 3}: S = −1.22   {3, 4}: S = 5.77
Set {2, 3} with  {1, 5}: S = −1.21   {1, 4}: S = −1.18   {4, 5}: S = −1.58
Set {2, 4} with  {3, 5}: S = 5.43    {3, 4}: S = −1.36   {4, 5}: S = 5.66
Set {2, 5} with  {1, 4}: S = 6.55    {1, 3}: S = 0.76    {3, 4}: S = 5.73
Set {3, 4} with  {1, 5}: S = 5.77    {1, 2}: S = −0.98   {2, 5}: S = 5.73
Set {3, 5} with  {1, 4}: S = 6.54    {1, 2}: S = −0.31   {2, 4}: S = 5.66
Set {4, 5} with  {1, 3}: S = −0.44   {1, 2}: S = −1.41   {2, 3}: S = −1.58

[Figures: plots of the absolute residuals |Res[m, ]| for the three splits of each set]
Latent variable models

Example: identified structure

[Diagram: two latent variables P and Q over the observed vertices 1, ..., 5]
Latent variable models

Example:

[Diagram: three latent variables L1, L2, L3 over the observed vertices 1, ..., 6]
Conclusion

Causal inference is a complex task:
• it requires modelling at all levels (bivariate to fully multivariate)
• it requires Granger causality as well as other measures (e.g. Sims causality)
• definite results may be sparse without further assumptions
• latent variables induce further (non-Markovian) constraints on the distribution

Open problems:
• merging of information about latent variables; development of algorithms for latent variables
• uncertainty in the identification of Granger-causal relationships
• instantaneous causality
• aggregation over time (distortion of identification; identification only possible up to Markov equivalence)
• non-stationarity and non-linearity
References
• Eichler, M. (2007). Granger causality and path diagrams for multivariate time series. Journal of Econometrics 137, 334-353.
• Eichler, M. (2008). Testing nonparametric and semiparametric hypotheses in vector stationary processes. Journal of Multivariate Analysis 99, 968-1009.
• Eichler, M. (2009). Causal inference from time series: what can be learned from Granger causality? In: C. Glymour, W. Wang, D. Westerståhl (eds), Proceedings of the 13th International Congress of Logic, Methodology and Philosophy of Science. College Publications, London.
• Eichler, M. (2010). Graphical modelling of multivariate time series with latent variables. Journal of Machine Learning Research W&CP 9.
• Eichler, M. (2012). Graphical modelling of multivariate time series. Probability Theory and Related Fields 153, 233-268.
• Eichler, M. (2012). Causal inference in time series analysis. In: C. Berzuini, A.P. Dawid, L. Bernardinelli (eds), Causality: Statistical Perspectives and Applications. Wiley, Chichester.
• Eichler, M. (2013). Causal inference with multiple time series: principles and problems. Philosophical Transactions of the Royal Society A 371, 20110613.