Towards a Multi-Subject Analysis of Neural Connectivity

C. J. Oates (1), L. Costa (1), S. Mukherjee (2), T. Nichols (1), J. Q. Smith (1), J. Cussens (3)

(1) Department of Statistics, University of Warwick, CV4 7AL, UK.
(2) MRC Biostatistics Unit and CRUK Cambridge Institute, University of Cambridge, CB2 0SR, UK.
(3) Department of Computer Science and York Centre for Complex Systems Analysis, University of York, YO10 5GE, UK.

3rd September 2014

Workshop on Statistical Systems Biology, 9th-11th December 2014, warwick.ac.uk/SSB

The increasing sophistication of high-throughput molecular technologies offers exciting possibilities for systems-level analysis of biological systems. Yet such novel and diverse data are frequently accompanied by significant statistical and computational challenges. This meeting aims to bring together researchers operating at the interface of systems biology, mathematical modelling and statistics, in order to better leverage emerging data using state-of-the-art statistical methodologies.

Keynote Speakers:
Prof. David Balding, Chair of the UCL Genetics Institute.
Dr. Clive Bowsher, School of Mathematics, Bristol.
Prof. Mustafa Khammash, Control Theory and Systems Biology, ETH Zurich.
Prof. Walter Kolch, Director of the Conway Institute and Systems Biology Ireland, University College Dublin.
Prof. John Lygeros, Head of the Automatic Control Laboratory, ETH Zurich.
Dr. Sach Mukherjee, Group Leader at the Netherlands Cancer Institute.
Prof. Simon Tavaré FRS, Director of Cancer Research UK Cambridge Institute.
Prof. Darren Wilkinson, School of Mathematics and Statistics, Newcastle.

Organisers: Chris Oates, Mark Girolami, David Rand, Michael Stumpf. Contact: c.oates (at) warwick.ac.uk.

Call for Papers, Presentations and Posters: In partnership with Statistical Applications in Genetics and Molecular Biology (De Gruyter) we are soliciting high-quality research for joint journal submission and presentation at the workshop. In addition, we are encouraging the submission of contributed presentations and posters in relevant research areas.

Motivation: Uncovering neural connectivity

How are these brain regions interacting?

Multiregression dynamical models (MDMs; Catriona Queen and Jim Smith, 1993):

[Diagram: a two-slice state-space DAG. Observations Y1(t), Y2(t), Y3(t), ... at time t are driven by time-varying parameters θ1(t), θ2(t), θ3(t), each of which evolves to θ1(t+1), θ2(t+1), θ3(t+1) at the next time step.]

Figure: fMRI data; replicate data from the same subject (Subject 1, Reps 1-4). DAGs estimated from time series data using MDMs, shown for several values of the regularity parameter λ.

Node  Symmetry        Summary
 1    Bilateral       Motor: hand/face
 2    Bilateral       Sensory: all-but-face
 3    Bilateral       Motor: all-but-face
 4    Bilateral       UNKNOWN
 5    Left dominant   Sensorimotor: L hand+arms
 6    Right dominant  Sensorimotor: R hand+arms
 7    Bilateral       Sensory: trunk-to-feet
 8    Bilateral       Sensory: face
 9    Bilateral       Auditory
10    Bilateral       Sensorimotor: all-but-face - Sensory: face

An ideal algorithm

Figure: fMRI data; joint learning of all DAGs simultaneously (Subject 1, Reps 1-4; λ ranging from 0 to 20). [λ is a "regularity" parameter.]
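The MDM structure sketched in the diagram above can be made concrete with a short simulation: each node regresses on the current values of its parents, with regression coefficients θ_i(t) evolving as a random walk. This is a minimal illustrative sketch; the 3-node DAG, dimensions and variances below are invented for illustration and are not taken from the talk or the authors' code.

```python
import numpy as np

# Minimal MDM sketch in the spirit of Queen & Smith (1993):
# Y_i(t) = intercept + sum over parents j of theta_{ij}(t) * Y_j(t) + noise,
# with theta_i(t) following a slow Gaussian random walk.
rng = np.random.default_rng(0)
T = 200                                  # number of time points (illustrative)
parents = {0: [], 1: [], 2: [0, 1]}      # toy DAG: 0 -> 2 <- 1

Y = np.zeros((T, 3))
# One coefficient per parent, plus an intercept, for each node.
theta = {i: rng.normal(size=len(parents[i]) + 1) for i in range(3)}

for t in range(T):
    for i in range(3):                   # nodes visited in topological order
        # State evolution: random walk on the regression coefficients.
        theta[i] = theta[i] + 0.01 * rng.normal(size=theta[i].shape)
        # Observation equation: intercept + parent contributions + noise.
        z = np.concatenate(([1.0], Y[t, parents[i]]))
        Y[t, i] = z @ theta[i] + 0.1 * rng.normal()
```

Estimating the DAG then amounts to scoring which parent sets best explain each node's series, which is the inferential problem the rest of the talk addresses.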
But how might this work? Seems challenging...

Joint statistical model for multiple DAGs

Figure: A Bayesian hierarchical model for multiple DAGs. A population structure N sits over K subject-specific DAGs G^(1), G^(2), ..., G^(K), each with parameters θ^(k) and data Y^(k).

Joint prior over multiple DAGs:

p(G^(1:K) | N) ∝ [ ∏_{(k,l)∈N} r(G^(k), G^(l)) ] × [ ∏_{k=1}^K m(G^(k)) ]

where the first factor is a regularity term and the second a multiplicity correction.

Structural Hamming distance:

log r(G^(k), G^(l)) = -λ ∑_{(j,i)} I{ j ∈ G_i^(k) Δ G_i^(l) }.

Binomial multiplicity correction:

m(G^(k)) = ∏_{i=1}^P C(P, |G_i^(k)|)^{-1} I{ |G_i^(k)| ≤ dmax }

where C(n, r) denotes the binomial coefficient.

Joint prior over DAGs and the network N:

p(G^(1:K), N) ∝ p(G^(1:K) | N) p(N), where log p(N) = η ||N|| + C and η controls the density of the network N.

Then interest is in the "doubly joint" MAP

(Ĝ^(1:K), N̂) := arg max_{G^(1:K), N} p(Y^(1:K) | G^(1:K), N) p(G^(1:K), N).
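The joint prior above is straightforward to evaluate for toy graphs. The sketch below is a reconstruction directly from the displayed formulas; the function name and the example graphs are invented, and this is not the authors' implementation.

```python
from math import comb, inf, log

def log_joint_prior(dags, N, lam, eta, P, dmax):
    """Unnormalised log p(G^(1:K), N), following the formulas above.

    dags -- list of K graphs, each a dict: node i -> frozenset of parents G_i
    N    -- set of pairs (k, l) with k < l linking subjects judged similar
    """
    lp = eta * len(N)                    # log p(N) = eta * ||N|| + constant
    for G in dags:                       # multiplicity correction m(G^(k))
        for i in range(P):
            if len(G[i]) > dmax:
                return -inf              # parent sets larger than dmax excluded
            lp -= log(comb(P, len(G[i])))
    for k, l in N:                       # structural Hamming regularity term
        shd = sum(len(dags[k][i] ^ dags[l][i]) for i in range(P))
        lp -= lam * shd
    return lp

# Toy example (invented): two subjects whose DAGs differ by a single edge.
G1 = {0: frozenset(), 1: frozenset({0}), 2: frozenset({0, 1})}
G2 = {0: frozenset(), 1: frozenset({0}), 2: frozenset({0})}
same = log_joint_prior([G1, G1], {(0, 1)}, lam=2.0, eta=1.0, P=3, dmax=2)
diff = log_joint_prior([G1, G2], {(0, 1)}, lam=2.0, eta=1.0, P=3, dmax=2)
```

As expected, the identical pair receives the higher prior, and in this example the log gap equals λ times the one-edge structural Hamming distance.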
Why are multiple DAGs challenging? Acyclicity.

Design a local move that encourages more similar DAGs:
- Pick an edge on which the two DAGs differ.
- Propose to add/remove this edge in the other DAG.
- Check no cycles are created.
- If cycles are created, delete an edge from each cycle.

But these new DAGs are as different as when we started! Clearly a different approach is needed.

Integer linear programs for MAP DAGs

Inference for single DAGs is "easy":

Consider a Bayesian network Y with respect to a directed acyclic graph (DAG) model G, i.e.

p_Y(y | G) = ∏_{i=1}^P p(y_i | y_{G_i}, G_i).

There is interest in the maximum a posteriori (MAP) estimate Ĝ := arg max_G p_Y(y | G) p(G).

Choose a "nice" prior p(G) = ∏_{i=1}^P p_{G_i}(G_i).

Cussens '10 and Jaakkola et al. '10 cast the MAP estimator in a DAG model as an integer linear program (ILP):

max f^T x subject to Ax ≤ b, Cx = d, x ∈ Z^d.

Objective function:

f^T x = log[ p_Y(y | G) p(G) ]
      = ∑_{i=1}^P log[ p(y_i | y_{G_i}, G_i) p_{G_i}(G_i) ]
      = ∑_{i=1}^P ∑_{π⊆{1:P}} log[ p(y_i | y_π, π) p_{G_i}(π) ] x_{i,π}

where x_{i,π} = I{G_i = π} and e.g. x = (0, 0, 1, 0, 0, 0, ..., 1, 0, 0).

Q: How to ensure x corresponds to a well-defined DAG?

Convexity: ∑_{π⊆{1:P}} x_{i,π} = 1 for all i ∈ {1:P}.

No self-loops: x_{i,π} = 0 whenever i ∈ π.

Acyclicity (version of Jaakkola et al. '10): ∑_{i∈C} ∑_{π⊆{1:P}, π∩C=∅} x_{i,π} ≥ 1 for all ∅ ≠ C ⊆ {1:P}.

These constraints together exactly characterise the space of DAGs.

Integer linear programs for multiple DAGs

Claim: (Ĝ^(1:K), N̂) is characterised by an extended ILP.
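The claim that the convexity, no-self-loop and cluster (acyclicity) constraints above exactly characterise DAGs can be sanity-checked by brute force for small P. For P = 3 the number of labelled DAGs is known to be 25, which the enumeration below recovers. This is an illustrative verification sketch, not the authors' ILP implementation.

```python
from itertools import combinations, product

P = 3
nodes = range(P)

def subsets(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

# Candidate parent sets per node: restricting to subsets of the *other*
# nodes enforces the no-self-loop constraint x_{i,pi} = 0 whenever i is in pi.
choices = [subsets(set(nodes) - {i}) for i in nodes]

def satisfies_cluster_constraints(parent_sets):
    # Jaakkola-style acyclicity: every nonempty cluster C must contain at
    # least one node whose parent set lies entirely outside C.
    for C in subsets(nodes):
        if C and not any(parent_sets[i] & C == frozenset() for i in C):
            return False
    return True

# Convexity (exactly one parent set per node) is enforced by the product.
dags = [ps for ps in product(*choices) if satisfies_cluster_constraints(ps)]
print(len(dags))  # number of labelled DAGs on 3 nodes
```

In the actual ILP these constraints are linear inequalities over the indicator vector x; the enumeration simply confirms their feasible 0/1 points are exactly the DAGs.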
Sketch: Need to encode N in the state vector x:

E^(k,l) := I{ (k,l) ∈ N } for all k, l with k < l.

Now construct binary indicators of individual edges and of disagreement between edges:

e_{j,i}^(k) = ∑_{π⊆{1:P}, j∈π} x_{i,π} for all i, j, k

d_{j,i}^(k,l) = I{ j ∈ G_i^(k) Δ G_i^(l) } for all i, j, k, l with k < l

D_{j,i}^(k,l) = I{ j ∈ G_i^(k) Δ G_i^(l) and (k,l) ∈ N } for all i, j, k, l.

XOR and AND constraints ≡ integer linear inequalities.

Example: AND constraint A = AND(B, C):

+A - B ≤ 0
+A - C ≤ 0
-A + B + C ≤ 1

This linearisation of AND is optimal, in the sense that it describes all facets of the convex hull of feasible solutions for the AND constraint.

Some preliminary results...

Results: Simulation study (P = 5, K = 5)

Figure: Simulated data; fixed N = complete, varying λ (Independent, λ = 0.1, 0.5, 1, 2, 100), with performance measured by MCC. [MCC = Matthews' correlation coefficient.]

Intuition: a modest amount of regularisation should help, but too much can lead to artefacts.

Results: Group analysis of fMRI data

Figure: fMRI data on two subjects (Reps 1-4 each); learning λ, shown for λ = 0, 4, 8, 12, 16, 20.

Figure: fMRI data on six subjects; learning N, shown for density hyperparameter η = 30, 40, 50, 60, 70, 80. [λ = 4]

Summary

To do:
- Large-scale empirical study
- Informative group priors (e.g. based on demographic covariates or genealogy)
- Causal semantics for transfer learning

References:
- CJO, Lilia Costa, Tom Nichols (2014) Towards a Multi-Subject Analysis of Neural Connectivity. To appear in Neural Computation. arXiv:1404.1239.
- CJO, Jim Smith, Sach Mukherjee, James Cussens (2014) Exact Estimation of Multiple Directed Acyclic Graphs. arXiv:1404.1238.
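Finally, the AND linearisation shown above can be verified exhaustively: the 0/1 solutions of the three inequalities are exactly the triples with A = AND(B, C). A small check, not part of the talk:

```python
from itertools import product

def feasible(a, b, c):
    # The three inequalities from the AND linearisation above.
    return (a - b <= 0) and (a - c <= 0) and (-a + b + c <= 1)

# All binary triples satisfying the inequalities...
solutions = {(a, b, c) for a, b, c in product((0, 1), repeat=3)
             if feasible(a, b, c)}
# ...versus the truth table of A = AND(B, C).
truth = {(b & c, b, c) for b, c in product((0, 1), repeat=2)}
print(solutions == truth)  # the two sets coincide
```

The facet-optimality claim is stronger (it concerns the convex hull over real-valued relaxations), but this at least confirms the integer feasible set is correct.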