Theory of Big Data 2 Conference
Big Data Institute, University College London

Causal Inference from Multivariate Time Series: Principles and Problems

Michael Eichler
Department of Quantitative Economics, Maastricht University
http://researchers-sbe.unimaas.nl/michaeleichler
6 January 2016

Outline
• Causality concepts
• Graphical representation
  • Definition
  • Markov properties
  • Extension: systems with latent variables
• Causal learning
  • Basic principles
  • Identification from empirical relationships
• Non-Markovian constraints
  • Trek-separation in graphs
  • Tetrad representation theorem
  • Testing for tetrad constraints
• Open problems and conclusions

Concepts of causality for time series
We consider two variables X and Y measured at discrete times $t \in \mathbb{Z}$: $X = (X_t)_{t \in \mathbb{Z}}$, $Y = (Y_t)_{t \in \mathbb{Z}}$.
Question: When is it justified to say that X causes Y?
Various approaches:
• Intervention causality (Pearl, 1993; Eichler & Didelez 2007, 2010)
• Structural causality (White and Lu, 2010)
• Granger causality (Granger, 1969, 1980, 1988)
• Sims causality (Sims, 1972)

Granger causality
Two fundamental principles:
• The cause precedes its effect in time.
• The causal series contains special information about the series being caused that is not available otherwise.
This leads us to consider two information sets:
• $\mathcal{F}^*(t)$: all information in the universe up to time t
• $\mathcal{F}^*_{-X}(t)$: this information except the values of X

Granger's definition of causality (Granger 1969, 1980)
We say that X causes Y if the probability distributions of
• $Y_{t+1}$ given $\mathcal{F}^*(t)$ and
• $Y_{t+1}$ given $\mathcal{F}^*_{-X}(t)$
are different.

Problem: The definition cannot be used with actual data.
Suppose data consist of multivariate time series V = (X, Y, Z) and let
• $\{X^t\}$: information given by X up to time t
• similarly for Y and Z

Definition: Granger non-causality
• X is Granger-noncausal for Y with respect to V if $Y_{t+1} \perp\!\!\!\perp X^t \mid Y^t, Z^t$.
• Otherwise we say that X Granger-causes Y with respect to V.
Additionally:
• X and Y are said to be contemporaneously independent w.r.t. V if $X_{t+1} \perp\!\!\!\perp Y_{t+1} \mid V^t$.

Sims causality
Definition: Sims non-causality
X does not Sims-cause Y with respect to V = (X, Y, Z) if
$\{Y_{t'} \mid t' > t\} \perp\!\!\!\perp X_t \mid X^{t-1}, Y^t, Z^t$.
Note:
• Granger causality is a concept of direct causality
• Sims causality is a concept of total causality (direct and indirect pathways)
The following statistics are measures for Sims causality:
• impulse response function (time and frequency domain)
• directed transfer function (DTF)

Vector autoregressive processes
Let X be a multivariate stationary Gaussian time series with vector autoregressive representation
$X_t = \sum_{k=1}^{\infty} A_k X_{t-k} + \epsilon_t = \sum_{k=0}^{\infty} B_k \epsilon_{t-k}$
Granger non-causality in VAR models: The following are equivalent:
• $X_b$ does not Granger-cause $X_a$ with respect to X;
• $A_{ab,k} = 0$ for all $k \in \mathbb{N}$.
Sims non-causality in VAR models: The following are equivalent:
• $X_b$ does not Sims-cause $X_a$ with respect to X;
• $B_{ab,k} = 0$ for all $k \in \mathbb{N}$.
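The two coefficient conditions can differ even in very small systems. The following is a minimal sketch, assuming a hypothetical three-variable VAR(1) chain X3 → X2 → X1 with made-up coefficients; the helper names `matmul` and `ma_coefficients` are illustrative and not from the talk. For a VAR(1) the moving-average coefficients are simply $B_k = A_1^k$.

```python
# Hypothetical VAR(1) chain X3 -> X2 -> X1; all numbers are illustrative assumptions.

def matmul(P, Q):
    """Multiply two square matrices given as nested lists."""
    n = len(P)
    return [[sum(P[i][m] * Q[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def ma_coefficients(A, max_lag):
    """Return [B_0, ..., B_max_lag] for X_t = A X_{t-1} + eps_t, where B_k = A^k."""
    n = len(A)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    B = [identity]
    for _ in range(max_lag):
        B.append(matmul(A, B[-1]))
    return B

# A[a][b] is the coefficient of X_b(t-1) in the equation for X_a(t) (0-based indices).
A = [[0.5, 0.4, 0.0],   # X1 depends on lagged X1 and X2
     [0.0, 0.5, 0.3],   # X2 depends on lagged X2 and X3
     [0.0, 0.0, 0.5]]   # X3 depends on lagged X3 only

B = ma_coefficients(A, max_lag=3)

a, b = 0, 2                                       # a = X1, b = X3
granger_zero = (A[a][b] == 0.0)                   # A_{ab,k} = 0 (only k = 1 exists here)
sims_coeffs = [B[k][a][b] for k in range(1, 4)]   # B_{ab,k} for k = 1, 2, 3

print("A_ab is zero:", granger_zero)
print("B_ab,k for k = 1, 2, 3:", sims_coeffs)
```

In this sketch X3 is Granger-noncausal for X1 with respect to the full process, while $B_{13,k}$ is nonzero for k ≥ 2, reflecting the indirect pathway through X2.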
Graphical models for time series
Basic idea: use graphs to encode conditional independences among variables
• nodes/vertices represent variables
• missing edge between two nodes implies conditional independence of the two variables
Application to time series:
• treat each variable at each time separately (time series chain graphs)
• treat each series as one variable (only one node in the graph)

Granger causality graphs (Eichler 2007)
Idea: represent Granger-causal relations in X by a mixed graph G:
• vertices $v \in V$ represent the variables (time series) $X_v$;
• directed edges between the vertices indicate Granger-causal relationships;
• additionally, undirected (dashed) edges indicate contemporaneous associations.
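For a Gaussian VAR model, such a mixed graph can be read off the parameters. The sketch below is an illustration under stated assumptions, not the construction used in the talk: it draws b → a when some coefficient $A_{ab,k}$ is nonzero (the VAR criterion for Granger causality) and a --- b when the innovations of a and b have nonzero covariance (for Gaussian innovations this corresponds to contemporaneous dependence). The function name `granger_causality_graph`, the tolerance, and the example numbers are hypothetical.

```python
def granger_causality_graph(A_list, sigma, tol=1e-12):
    """Return (directed, undirected) edge sets of a mixed graph.

    A_list: VAR coefficient matrices [A_1, ..., A_p] as nested lists.
    sigma:  innovation covariance matrix as nested lists.
    A pair (b, a) in `directed` stands for b -> a; a frozenset {a, b}
    in `undirected` stands for a --- b.
    """
    n = len(sigma)
    directed, undirected = set(), set()
    for a in range(n):
        for b in range(n):
            if a == b:
                continue
            # b -> a if X_b enters the equation of X_a at some lag
            if any(abs(A[a][b]) > tol for A in A_list):
                directed.add((b, a))
            # a --- b if the innovations are correlated (checked once per pair)
            if a < b and abs(sigma[a][b]) > tol:
                undirected.add(frozenset((a, b)))
    return directed, undirected

# Illustrative VAR(2) with three series (0-based vertex labels).
A1 = [[0.3, 0.2, 0.0],
      [0.0, 0.3, 0.0],
      [0.0, 0.0, 0.3]]
A2 = [[0.0, 0.0, 0.0],
      [0.0, 0.0, 0.1],
      [0.0, 0.0, 0.0]]
sigma = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.5],
         [0.0, 0.5, 1.0]]

directed, undirected = granger_causality_graph([A1, A2], sigma)
print("directed edges (b -> a):", sorted(directed))
print("undirected edges:", sorted(tuple(sorted(e)) for e in undirected))
```

With estimated rather than known parameters, the exact-zero checks would have to be replaced by statistical tests.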
Example: consider a five-dimensional autoregressive process $X_V$
$X_t = f(X_{t-1}) + \epsilon_t$
with
• $X_{1,t} = f_1(X_{3,t-1}) + \epsilon_{1,t}$
• $X_{2,t} = f_2(X_{4,t-1}) + \epsilon_{2,t}$
• $X_{3,t} = f_3(X_{1,t-1}, X_{2,t-1}) + \epsilon_{3,t}$
• $X_{4,t} = f_4(X_{3,t-1}, X_{5,t-1}) + \epsilon_{4,t}$
• $X_{5,t} = f_5(X_{3,t-1}) + \epsilon_{5,t}$
• $(\epsilon_{1,t}, \epsilon_{2,t}, \epsilon_{3,t}) \perp\!\!\!\perp (\epsilon_{4,t}, \epsilon_{5,t})$ and $\epsilon_{4,t} \perp\!\!\!\perp \epsilon_{5,t}$
[Figure: Granger causality graph on vertices 1, 2, 3, 4, 5]

Markov properties
Objective: derive Granger-causal relationships for $X_S$, $S \subseteq V$
Idea: characterize pathways that induce associations
Tool: concepts of separation in graphs
• DAGs: d-separation (Pearl 1988)
• mixed graphs: d-separation (Spirtes et al. 1998, Koster 1999) or m-separation (Richardson 2003)

Markov properties
[Figure: three graphs on vertices 1, 2, 3, one for each factorization]
• $p(x) = p(x_3 \mid x_2)\, p(x_2 \mid x_1)\, p(x_1)$ ⇒ $X_3 \perp\!\!\!\perp X_1 \mid X_2$ (chain 1 → 2 → 3)
• $p(x) = p(x_1 \mid x_2)\, p(x_3 \mid x_2)\, p(x_2)$ ⇒ $X_3 \perp\!\!\!\perp X_1 \mid X_2$ (fork 1 ← 2 → 3)
• $p(x) = p(x_2 \mid x_1, x_3)\, p(x_3)\, p(x_1)$ ⇏ $X_3 \perp\!\!\!\perp X_1 \mid X_2$ (collider 1 → 2 ← 3)

Global Granger-causal Markov property
Separation in mixed graphs
Question: What type of paths induce Granger-causal relations between variables?
Note: Granger (non)causality is not symmetric
Idea: consider only paths ending with a directed edge
Examples: [Figure: mixed graph on vertices 1, 2, 3, 4] entails
• $X_1$ does not Granger-cause $X_4$ with respect to $(X_1, X_4)$
• $X_1$ does not Granger-cause $X_4$ with respect to $(X_1, X_3, X_4)$
• $X_1$ does not Granger-cause $X_4$ with respect to $(X_1, X_2, X_3, X_4)$
but not
• $X_1$ does not Granger-cause $X_4$ with respect to $(X_1, X_2, X_4)$

Principles of causal inference
Objective: identify causal structure of process X
Question: What to use in practice?
• Granger causality or Sims causality
• bivariate or fully multivariate analysis
Answer: For causal inference . . . all and more.

Principles of identification
An example of indirect causality: [Figure: graph on vertices 1, 2, 3] implies for the bivariate submodel [Figure: graph on vertices 1, 3]
An example of spurious causality: [Figure: graph on vertices 1, 2, 3 with latent variable L] implies for the trivariate and bivariate submodels [Figure: graphs on vertices 1, 2, 3 and on vertices 1, 3]

Principles of identification
Inverse problem: What can we say about the full system based on observed Granger-noncausal relations for the observed (sub)process?
Suppose
• $X_a \to X_c\ [X_S]$ for all $\{a, c\} \subseteq S \subseteq V$
• $X_c \to X_b\ [X_S]$ for all $\{c, b\} \subseteq S \subseteq V$
Rules of causal inference
• Indirect causality rule: $X_a$ truly causes $X_b$ if $X_a \nrightarrow X_b\ [X_S]$ for some $S \subseteq V$ with $c \in S$
• Spurious causality rule: $X_a$ is a spurious cause of $X_b$ if $X_a \nrightarrow X_b\ [X_S]$ for some $S \subseteq V$ with $c \notin S$

Principles of causal inference
[Figure: graph on Z, U, Y, X; panels "bivariate Granger", "trivariate Granger" and "trivariate Sims" showing $A_{YX}(h)$ and $B_{YX}(h)$ against lag h]
[Figure: graph on Z, U, V, Y, X; panels "bivariate Granger", "trivariate Granger" and "trivariate Sims" showing $A_{YX}(h)$ and $B_{YX}(h)$ against lag h]

Identification of causal structure
Algorithm: identification of adjacencies
• insert an undirected edge between a and b whenever $X_a$ and $X_b$ are not contemporaneously independent
• insert an edge between a and b whenever
  • $X_b \to X_a\ [X_S]$ for all $S \subseteq V$ with $a, b \in S$;
  • $X_a(t-k) \not\perp\!\!\!\perp X_b(t+1) \mid \mathcal{F}_{S_1}(t) \vee \mathcal{F}_{S_2}(t-k) \vee \mathcal{F}_a(t-k-1)$ for all $k \in \mathbb{N}$, $t \in \mathbb{Z}$, for all disjoint $S_1, S_2 \subseteq V$ with $b \in S_1$ and $a \notin S_1 \cup S_2$.
Identification of causal structure
Algorithm: identification of tails
• colliders: a, c, b form a path in G and $X_a \nrightarrow X_b\ [X_S]$ for some S such that $c \notin S$ ⇒ update the edge between c and b
• non-colliders: a, c, b form a path in G and $X_a \nrightarrow X_b\ [X_S]$ for some S such that $c \in S$ ⇒ update the edge between c and b
• ancestors: a path a . . . b in G ⇒ update the edge between a and b
• discriminating paths: e.g. Ali et al. (2004)

Identification of causal structure
Example: application to neural spike train data
[Figure: spike trains of Neuron 1 to Neuron 10 against time [sec]; panels pdc(1 → 2), pdc(1 → 3), pdc(1 → 4), pdc(2 → 3), pdc(2 → 4), pdc(3 → 4) against lag]

Identification of causal structure
Example: [Figure: panels (a) to (j) showing graphs on vertices 1, 2, 3, 4]
Result: [Figure: panel (k), graph on vertices 1, 2, 3, 4]

Problem
Example: [Figure: graph with latent variable L and observed vertices 1, 2, 3, 4]
• $X_1, X_2, X_3, X_4$ are conditionally independent given L
• no conditional independences among $X_1, \ldots, X_4$.
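A static Gaussian analogue makes both statements easy to check numerically. The sketch below assumes a one-factor model $X_i = \lambda_i L + e_i$ with Var(L) = 1 and independent noise; this setup, the loadings, and the helper names `joint_cov_with_factor` and `partial_cov` are illustrative assumptions, not part of the talk. In a Gaussian model a zero partial covariance is equivalent to conditional independence.

```python
def joint_cov_with_factor(loadings, noise_var):
    """Covariance of (X_1, ..., X_n, L) for X_i = loadings[i] * L + e_i, Var(L) = 1."""
    n = len(loadings)
    S = [[loadings[i] * loadings[j] + (noise_var[i] if i == j else 0.0)
          for j in range(n)] + [loadings[i]] for i in range(n)]
    S.append(list(loadings) + [1.0])   # row for the latent factor L
    return S

def partial_cov(S, i, j, cond):
    """Partial covariance of variables i and j given the variables in cond (at most two)."""
    if len(cond) == 0:
        return S[i][j]
    if len(cond) == 1:
        c = cond[0]
        return S[i][j] - S[i][c] * S[c][j] / S[c][c]
    c, d = cond
    det = S[c][c] * S[d][d] - S[c][d] * S[d][c]
    inv = [[S[d][d] / det, -S[c][d] / det],      # inverse of the 2x2 block S[cond, cond]
           [-S[d][c] / det, S[c][c] / det]]
    left, right = [S[i][c], S[i][d]], [S[c][j], S[d][j]]
    adjust = sum(left[p] * inv[p][q] * right[q] for p in range(2) for q in range(2))
    return S[i][j] - adjust

# Indices 0..3 are X1..X4, index 4 is the latent factor L.
S = joint_cov_with_factor([1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0])

given_L = partial_cov(S, 0, 1, [4])           # X1, X2 given L
marginal = partial_cov(S, 0, 1, [])           # X1, X2 unconditionally
given_X3 = partial_cov(S, 0, 1, [2])          # X1, X2 given X3
given_X3_X4 = partial_cov(S, 0, 1, [2, 3])    # X1, X2 given X3 and X4

print(given_L, marginal, given_X3, given_X3_X4)
```

Conditioning on L removes the association between X1 and X2, whereas conditioning on any subset of the other observed variables leaves a nonzero partial covariance.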
Trek separation
Problem:
• conditional independences are not sufficient to describe processes that involve latent variables
• identification of such structures relies on sparsity that is often not given
Approach: Sullivant et al. (2011) for multivariate Gaussian distributions
• new concept of separation in graphs
• encodes rank constraints on minors of covariance matrix
• generalizes other concepts of separation
• special case: conditional independences

Trek separation
A trek between nodes i and j is a path $\pi = (\pi_L, \pi_M, \pi_R)$ such that
• $\pi_L$ is a directed path from some node $k_L$ to i;
• $\pi_R$ is a directed path from some node $k_R$ to j;
• $\pi_M$ is an undirected edge between $k_L$ and $k_R$ or a path of length zero ($k_L = k_R$).
Examples: [Figure: four treks between i and j: through $k_L$ and $k_R$; through v and k; through v; a single edge between i and j]
Definition (trek separation)
$(C_L, C_R)$ t-separates sets A and B if for every trek $(\pi_L, \pi_M, \pi_R)$ between a node in A and a node in B
• $\pi_L$ contains a vertex in $C_L$ or
• $\pi_R$ contains a vertex in $C_R$.

Trek separation
Let X be a stationary Gaussian process with spectral matrix $\Sigma(\omega)$ satisfying
$\Sigma(\omega) = \frac{1}{2\pi} \sum_{u=-\infty}^{\infty} \mathrm{cov}(X_t, X_{t-u})\, e^{-iu\omega}$.
Theorem
Let X be G-Markov. Then the following are equivalent:
• $\mathrm{rank}(\Sigma_{AB}(\omega)) \le r$ for all $\omega \in [-\pi, \pi]$
• A and B are t-separated by some $(C_L, C_R)$ with $|C_L| + |C_R| \le r$.

Trek separation
Corollaries:
Let X be a Gaussian stationary process. Then
$X_A \perp\!\!\!\perp X_B \mid X_C \iff \mathrm{rank}(\Sigma_{A \cup C, B \cup C}) = |C|$.
Furthermore the following are equivalent:
• $X_A \perp\!\!\!\perp X_B \mid X_C$ for all G-Markov processes X;
• $(C_A, C_B)$ t-separates $A \cup C$ and $B \cup C$ for some partition $C = C_A \cup C_B$.
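The rank statement can be illustrated with a static covariance matrix standing in for $\Sigma(\omega)$ at a single frequency; this simplification, the loadings, and the helper names `covariance` and `minor_2x2` are assumptions made for the sketch. With one common latent parent c, the sets A = {1, 2} and B = {3, 4} are t-separated by ({c}, ∅), so the 2×2 block $\Sigma_{AB}$ should have rank at most 1, that is, a vanishing determinant. With two independent latent parents the same block generally has full rank.

```python
def covariance(loadings, noise_var):
    """Sigma = Lambda Lambda' + diag(noise_var) for independent unit-variance factors."""
    n, r = len(loadings), len(loadings[0])
    return [[sum(loadings[i][f] * loadings[j][f] for f in range(r))
             + (noise_var[i] if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def minor_2x2(S, rows, cols):
    """Determinant of the block S[rows, cols] for two rows and two columns."""
    (i, j), (k, l) = rows, cols
    return S[i][k] * S[j][l] - S[i][l] * S[j][k]

noise = [1.0, 1.0, 1.0, 1.0]

# One latent factor loading on all four observed variables (0-based indices).
one_factor = covariance([[0.9], [0.8], [0.7], [0.6]], noise)

# Two independent factors: the first drives X1 and X3, the second drives X2 and X4.
two_factor = covariance([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]], noise)

A, B = (0, 1), (2, 3)
det_one = minor_2x2(one_factor, A, B)   # rank(Sigma_AB) <= 1 expected
det_two = minor_2x2(two_factor, A, B)   # rank(Sigma_AB) = 2 expected

print("one factor:", det_one)
print("two factors:", det_two)
```

The quantity computed by `minor_2x2` has the form $\Sigma_{ik}\Sigma_{jl} - \Sigma_{il}\Sigma_{jk}$; with estimated covariances or spectra it would be tested statistically rather than compared with zero.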
Tetrad representation theorem
Consider the class M(G) of all G-Markov stationary Gaussian processes.
Proposition
The following are equivalent:
• The spectral matrices $\Sigma(\cdot)$ of processes in M(G) satisfy $\Sigma_{ik}(\omega)\,\Sigma_{jl}(\omega) - \Sigma_{il}(\omega)\,\Sigma_{jk}(\omega) = 0$;
• {i, j} and {k, l} are t-separated by (c, ∅) or (∅, c) for some node c in G

Tetrad representation theorem
If the spectral matrix $\Sigma(\omega)$ satisfies the tetrad constraints
$\Sigma_{ik}(\omega)\Sigma_{jl}(\omega) - \Sigma_{il}(\omega)\Sigma_{jk}(\omega) = 0$
$\Sigma_{ij}(\omega)\Sigma_{kl}(\omega) - \Sigma_{il}(\omega)\Sigma_{kj}(\omega) = 0$
$\Sigma_{ik}(\omega)\Sigma_{lj}(\omega) - \Sigma_{ij}(\omega)\Sigma_{lk}(\omega) = 0$
then there exists a node P such that $X_i$, $X_j$, $X_k$, and $X_l$ are mutually conditionally independent given $X_P$.
[Figure: graph with node P and vertices 1, 2, 3, 4]
Note: If no such $X_P$ is among the observed variables, $X_P$ must be a latent factor.

Testing tetrad constraints
Approach: nonparametric test (Eichler 2008)
Null hypothesis: $\psi(\Sigma(\omega)) \equiv 0$ where $\psi(Z) = z_{ik} z_{jl} - z_{il} z_{jk}$
Test statistic:
$S_T = \int |\psi(\hat\Sigma(\omega))|^2 \, d\omega$,
where $\hat\Sigma(\omega)$ is a kernel spectral estimator with bandwidth $b_T$
Theorem
Under the null hypothesis
$b_T^{1/2}\, T\, S_T - b_T^{-1/2}\, \mu \xrightarrow{D} N(0, \sigma^2)$,
where
$\mu = C_h C_{w,2} \int \mathrm{tr}\big[\nabla\psi(\Sigma(\omega))'\, \Sigma(\omega)\, \nabla\psi(\Sigma(-\omega))\, \Sigma(\omega)\big] \, d\omega$,
$\sigma^2 = 4\pi\, C_h^2 C_{w,4} \int \big|\mathrm{tr}\big[\nabla\psi(\Sigma(\omega))'\, \Sigma_{AA}(\omega)\, \nabla\psi(\Sigma(-\omega))\, \Sigma_{BB}(\omega)\big]\big|^2 \, d\omega$.

Latent variable models
Common identifiability constraint for factor models: factors are uncorrelated/independent
But: in many applications (e.g. in neuroscience), we think of latent variables that are causally connected.
• EEG recordings measure neural activity in close cortical regions
• fMRI recordings measure hemodynamic responses which depend on underlying neural activity
Objective: recover latent processes and interrelations among them

Latent variable models
Suppose that Y(t) can be partitioned into $Y_{I_1}(t)$, . . .
, $Y_{I_r}(t)$ such that $Y_{I_j}(t) = \Lambda_j X_j(t) + \epsilon_{I_j}(t)$ and X(t) is a VAR(p) process.
Then the model can be fitted by the following steps:
• identify clusters of variables depending on one latent variable (based on tetrad rules)
• use PCA to determine latent variable processes $X_j(t)$
• fit VAR model to all latent variable processes jointly

Latent variable models
Example
[Figure: time series X(1), X(2), X(3), X(4), X(5) plotted against time, 0 to 1000]

Latent variable models
Example
[Figure for each set below: three panels of abs(Res[m, ]) against Index, 0 to 1000]
Set {1, 2} with:
• {3, 5}: S = −0.31
• {3, 4}: S = −0.98
• {4, 5}: S = −1.4
Set {1, 3} with:
• {2, 5}: S = 0.76
• {2, 4}: S = −1.37
• {4, 5}: S = −0.44
Set {1, 4} with:
• {2, 5}: S = 6.54
• {2, 3}: S = −1.19
• {3, 5}: S = 6.55
Set {1, 5} with:
• {2, 4}: S = 5.43
• {2, 3}: S = −1.22
• {3, 4}: S = 5.77
Set {2, 3} with:
• {1, 5}: S = −1.21
• {1, 4}: S = −1.18
• {4, 5}: S = −1.58
Set {2, 4} with:
• {3, 5}: S = 5.43
• {3, 4}: S = −1.36
• {4, 5}: S = 5.66
Set {2, 5} with:
• {1, 4}: S = 6.55
• {1, 3}: S = 0.76
• {3, 4}: S = 5.73
Set {3, 4} with:
• {1, 5}: S = 5.77
• {1, 2}: S = −0.98
• {2, 5}: S = 5.73
Set {3, 5} with:
• {1, 4}: S = 6.54
• {1, 2}: S = −0.31
• {2, 4}: S = 5.66
Set {4, 5} with:
• {1, 3}: S = −0.44
• {1, 2}: S = −1.41
• {2, 3}: S = −1.58

Latent variable models
Example: [Figure: graph with latent variables Q, P and observed vertices 1, 2, 3, 4, 5]
Example: [Figure: graph with latent variables L1, L2, L3 and observed vertices 1, 2, 3, 4, 5, 6]

Conclusion
Causal inference is a complex task
• requires modelling at all levels (bivariate
to fully multivariate)
• requires Granger causality as well as other measures (e.g. Sims causality)
• definite results may be sparse without further assumptions
• latent variables induce further (non-Markovian) constraints on the distribution
Open Problems:
• merging of information about latent variables; development of algorithms for latent variables
• uncertainty in identification of Granger-causal relationships
• instantaneous causality
• aggregation over time (distortion of identification only possible up to Markov equivalence)
• non-stationarity and non-linearity

References
• E. (2007), Granger-causality and path diagrams for multivariate time series. Journal of Econometrics 137, 334-353.
• E. (2008), Testing nonparametric and semiparametric hypotheses in vector stationary processes. Journal of Multivariate Analysis 99, 968-1009.
• E. (2009), Causal inference from time series: what can be learned from Granger causality? In: G. Glymour, W. Wang, D. Westerståhl (eds), Proceedings of the 13th International Congress of Logic, Methodology and Philosophy of Science, College Publications, London.
• E. (2010), Graphical modelling of multivariate time series with latent variables. Journal of Machine Learning Research W&CP 9.
• E. (2012), Graphical modelling of multivariate time series. Probability Theory and Related Fields 153, 233-268.
• E. (2012), Causal inference in time series analysis. In: C. Berzuini, A.P. Dawid, L. Bernardinelli (eds), Causality: Statistical Perspectives and Applications, Wiley, Chichester.
• E. (2013), Causal inference with multiple time series: principles and problems. Philosophical Transactions of the Royal Society A 371, 20110613.