Towards a Multi-Subject Analysis of Neural Connectivity

C. J. Oates¹, L. Costa¹, S. Mukherjee², T. Nichols¹, J. Q. Smith¹, J. Cussens³

¹ Department of Statistics, University of Warwick, CV4 7AL, UK.
² MRC Biostatistics Unit and CRUK Cambridge Institute, University of Cambridge, CB2 0SR, UK.
³ Department of Computer Science and York Centre for Complex Systems Analysis, University of York, YO10 5GE, UK.

3rd September 2014
Motivation: Uncovering neural connectivity
How are these brain regions interacting?
Multiregression dynamical models (MDMs; Catriona Queen and Jim Smith, 1993):

[Figure: a dynamic graphical model with observation nodes Y1(t), Y2(t), Y3(t), ... and time-varying parameter nodes θ1(t), θ2(t), θ3(t), ..., each linked to its counterpart at time t + 1.]
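To make the picture concrete, here is a minimal simulation sketch of an MDM-style model. It assumes a linear-Gaussian form in which each node is regressed on its parents' current observations with random-walk coefficients; the function and parameter names are illustrative, not taken from the authors' code.

```python
import random

def simulate_mdm(parents, T, sigma_theta=0.1, sigma_y=0.5, seed=0):
    """Simulate one replicate from a linear-Gaussian MDM-style model:
    each node's observation at time t is a regression on its parents'
    observations at time t, with coefficients theta_i(t) drifting as a
    random walk from t to t + 1 (cf. the theta_i(t) -> theta_i(t+1) arrows)."""
    rng = random.Random(seed)
    nodes = sorted(parents)  # assumed to be a topological order of the DAG
    theta = {i: [rng.gauss(0, 1) for _ in parents[i]] for i in nodes}
    series = {i: [] for i in nodes}
    for t in range(T):
        for i in nodes:
            mean = sum(th * series[p][t] for th, p in zip(theta[i], parents[i]))
            series[i].append(mean + rng.gauss(0, sigma_y))
        for i in nodes:  # parameter evolution: theta_i(t+1) = theta_i(t) + noise
            theta[i] = [th + rng.gauss(0, sigma_theta) for th in theta[i]]
    return series

# Hypothetical 3-node DAG: 1 -> 2, 1 -> 3, 2 -> 3
sim = simulate_mdm({1: (), 2: (1,), 3: (1, 2)}, T=50)
```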
Motivation: Uncovering neural connectivity

[Figure: fMRI data; replicate data (Reps 1-4) from the same subject. DAGs estimated from time series data using MDMs, at several values of λ.]

Node Number | Symmetry       | Summary
1           | Bilateral      | Motor: hand/face
2           | Bilateral      | Sensory: all-but-face
3           | Bilateral      | Motor: all-but-face
4           | Bilateral      | UNKNOWN
5           | Left Dominant  | Sensorimotor: L hand+arms
6           | Right Dominant | Sensorimotor: R hand+arms
7           | Bilateral      | Sensory: trunk-to-feet
8           | Bilateral      | Sensory: face
9           | Bilateral      | Auditory
10          | Bilateral      | Sensorimotor: all-but-face - Sensory: face
An ideal algorithm

[Figure: fMRI data (Subject 1, Reps 1-4); joint learning of all DAGs simultaneously, for λ = 0, 2, ..., 20. λ is a "regularity" parameter.]
But how might this work? Seems challenging...
Joint statistical model for multiple DAGs

[Figure: A Bayesian hierarchical model for multiple DAGs. A population structure N connects the subject-specific DAGs G^(1), ..., G^(K); each DAG G^(k) has associated parameters θ^(k) and subject data Y^(k).]
Joint statistical model for multiple DAGs

Joint prior over multiple DAGs:

  p(G^(1:K) | N) ∝ [ ∏_{(k,l) ∈ N} r(G^(k), G^(l)) ] × [ ∏_{k=1}^{K} m(G^(k)) ]

where the first factor enforces regularity and the second is a multiplicity correction.

Regularity via the structural Hamming distance:

  log r(G^(k), G^(l)) = −λ ∑_{(j,i)} I{ j ∈ G_i^(k) Δ G_i^(l) }.

Binomial multiplicity correction:

  m(G^(k)) = ∏_{i=1}^{P} (P choose |G_i^(k)|)^{-1} I{ |G_i^(k)| ≤ d_max }.
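The two factors of the prior can be evaluated directly for toy DAGs. The following is a minimal sketch, with DAGs encoded as parent-set dictionaries; the function names are illustrative, not from the authors' implementation.

```python
from math import comb, log

def shd(G_k, G_l):
    """Structural Hamming distance: number of (parent j, child i) pairs in the
    symmetric difference of the parent sets, summed over children i."""
    children = set(G_k) | set(G_l)
    return sum(len(set(G_k.get(i, ())) ^ set(G_l.get(i, ()))) for i in children)

def log_joint_prior(Gs, N, lam, d_max, P):
    """Unnormalised log p(G^(1:K) | N): DAGs are dicts node -> tuple of
    parents, N is a set of pairs (k, l) of DAG indices."""
    # regularity term: -lambda * SHD for each pair linked in N
    regularity = -lam * sum(shd(Gs[k], Gs[l]) for (k, l) in N)
    # multiplicity correction: log m(G^(k)) = -sum_i log C(P, |G_i^(k)|),
    # with prior mass zero if any node exceeds d_max parents
    multiplicity = 0.0
    for G in Gs:
        for i in range(1, P + 1):
            n_par = len(G.get(i, ()))
            if n_par > d_max:
                return float("-inf")
            multiplicity -= log(comb(P, n_par))
    return regularity + multiplicity

G1 = {1: (), 2: (1,), 3: (1,)}
G2 = {1: (), 2: (1,), 3: ()}      # differs from G1 only on the edge 1 -> 3
lp = log_joint_prior([G1, G2], {(0, 1)}, lam=2.0, d_max=3, P=3)
```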
Joint statistical model for multiple DAGs

Joint prior over DAGs and the network N:

  p(G^(1:K), N) ∝ p(G^(1:K) | N) p(N)

where η controls the density of the network N and

  log p(N) = η ||N|| + C.

Then interest is in the "doubly joint" MAP

  (Ĝ^(1:K), N̂) := arg max_{G^(1:K), N} p(Y^(1:K) | G^(1:K), N) p(G^(1:K), N).
Why are multiple DAGs challenging? Acyclicity.

Design a local move that encourages more similar DAGs...
1. Pick an edge on which the two DAGs differ.
2. Propose to add/remove this edge to the other DAG.
3. Check no cycles are created.
4. Delete an edge from each cycle.

But these new DAGs are as different as when we started!
Clearly a different approach is needed.
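The failure mode is easy to reproduce on a tiny example: copying the disagreeing edge into the other DAG can itself create a cycle, forcing a further deletion. A minimal sketch (the acyclicity check is a standard Kahn-style peeling, not code from the talk):

```python
def is_dag(nodes, edges):
    """A directed graph is acyclic iff it can be fully peeled by
    repeatedly removing nodes with no incoming edges (Kahn's algorithm)."""
    remaining = set(nodes)
    while remaining:
        sources = {v for v in remaining
                   if not any((u, v) in edges for u in remaining)}
        if not sources:
            return False  # every remaining node has an incoming edge: a cycle
        remaining -= sources
    return True

nodes = {1, 2, 3}
G1 = {(1, 2), (2, 3)}   # DAG 1: 1 -> 2 -> 3
G2 = {(3, 1)}           # DAG 2: 3 -> 1; the two DAGs differ on the edge 3 -> 1
assert is_dag(nodes, G1) and is_dag(nodes, G2)
# the local move copies the disagreeing edge 3 -> 1 into DAG 1 ...
assert not is_dag(nodes, G1 | {(3, 1)})  # ... creating the cycle 1 -> 2 -> 3 -> 1
```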
Integer linear programs for MAP DAGs

Inference for single DAGs is "easy":

Consider a Bayesian network Y with respect to a directed acyclic graph (DAG) model G, i.e.

  p_Y(y | G) = ∏_{i=1}^{P} p(y_i | y_{G_i}, G_i).

There is interest in the maximum a posteriori (MAP) estimate

  Ĝ := arg max_G p_Y(y | G) p(G).

Choose a "nice" prior p(G) = ∏_{i=1}^{P} p_{G_i}(G_i).
Integer linear programs for MAP DAGs

Cussens '10 and Jaakkola et al. '10 cast the MAP estimator in a DAG model as an integer linear program (ILP):

  max f^T x subject to Ax ≤ b, Cx = d, x ∈ Z^d

Objective function:

  f^T x = log[p_Y(y | G) p(G)] = ∑_{i=1}^{P} log[p(y_i | y_{G_i}, G_i) p_{G_i}(G_i)]
        = ∑_{i=1}^{P} ∑_{π ⊆ {1:P}} log[p(y_i | y_π, π) p_{G_i}(π)] x_{i,π}

where x_{i,π} = I{G_i = π} and e.g. x = (0, 0, 1, 0, 0, 0, ..., 1, 0, 0).

Q: How to ensure x corresponds to a well-defined DAG?
Integer linear programs for MAP DAGs

Convexity:

  ∑_{π ⊆ {1:P}} x_{i,π} = 1   ∀ i ∈ {1:P}

No self-loops:

  x_{i,π} = 0   ∀ π with i ∈ π

Acyclicity (version of Jaakkola et al. '10):

  ∑_{i ∈ C} ∑_{π ⊆ {1:P}, π ∩ C = ∅} x_{i,π} ≥ 1   ∀ ∅ ≠ C ⊆ {1:P}.

These constraints together exactly characterise the space of DAGs.
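The "exactly characterise" claim can be verified exhaustively for small P. The sketch below enforces convexity and no-self-loops by construction (each node picks exactly one admissible parent set) and then checks that the cluster constraints hold precisely for the acyclic assignments; it is an illustrative brute-force check, not part of any ILP solver.

```python
from itertools import chain, combinations, product

P = 3
nodes = tuple(range(1, P + 1))

def subsets(s):
    s = tuple(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

# Convexity and no-self-loops hold by construction: each node i picks exactly
# one parent set pi with i not in pi (i.e. x_{i,pi} = 1 for that pi).
parent_choices = {i: [frozenset(p) for p in subsets(set(nodes) - {i})]
                  for i in nodes}

def cluster_constraints_hold(assignment):
    """Acyclicity constraints: every nonempty cluster C must contain at least
    one node whose chosen parent set lies entirely outside C."""
    return all(any(not (assignment[i] & set(C)) for i in C)
               for C in subsets(nodes) if C)

def is_acyclic(assignment):
    """Direct check by iteratively peeling nodes with no remaining parents."""
    remaining = set(nodes)
    while remaining:
        free = {i for i in remaining if not (assignment[i] & remaining)}
        if not free:
            return False
        remaining -= free
    return True

# exhaustive check over all 4^3 = 64 parent-set assignments for P = 3
for choice in product(*(parent_choices[i] for i in nodes)):
    assignment = dict(zip(nodes, choice))
    assert cluster_constraints_hold(assignment) == is_acyclic(assignment)
```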
Integer linear programs for multiple DAGs

Claim: (Ĝ^(1:K), N̂) is characterised by an extended ILP.

Sketch: Need to encode N in the state vector x:

  E^(k,l) := I{(k, l) ∈ N}   ∀ k, l with k < l

Now construct binary indicators of individual edges and disagreement between edges:

  e_{j,i}^(k) = ∑_{π ⊆ {1:P}, j ∈ π} x_{i,π}   ∀ i, j, k

  d_{j,i}^(k,l) = I{ j ∈ G_i^(k) Δ G_i^(l) }   ∀ i, j, k, l with k < l

  D_{j,i}^(k,l) = I{ j ∈ G_i^(k) Δ G_i^(l) and (k, l) ∈ N }   ∀ i, j, k, l.

XOR and AND constraints ≡ integer linear inequalities.
Example: AND constraint A = AND(B, C)

  +A − B      ≤ 0
  +A     − C  ≤ 0
  −A + B + C  ≤ 1

This linearisation of AND is optimal, in the sense that it describes all facets of the convex hull of feasible solutions for the AND constraint.
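One can confirm by enumeration that, over binary points, the three inequalities are satisfied exactly when A = AND(B, C); a quick sanity check (illustrative only):

```python
from itertools import product

def and_linearisation_feasible(a, b, c):
    """The three facet-defining inequalities of the AND linearisation above."""
    return (a - b <= 0) and (a - c <= 0) and (-a + b + c <= 1)

# over all binary points, feasibility holds exactly when A = AND(B, C)
for a, b, c in product((0, 1), repeat=3):
    assert and_linearisation_feasible(a, b, c) == (a == (b and c))
```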
Some preliminary results...
Results: Simulation study

[Figure: Simulated data (P = 5, K = 5); fixed N = complete, comparing independent estimation with joint estimation at λ = 0.1, 0.5, 1, 2, 100. Performance measured by MCC = Matthews' correlation coefficient.]

Intuition: A modest amount of regularisation should help, but too much can lead to artefacts.
Results: Group analysis of fMRI data

[Figure: fMRI data on two subjects (Reps 1-4 each); learning λ, shown for λ = 0, 4, 8, 12, 16, 20.]
Results: Group analysis of fMRI data

[Figure: fMRI data on six subjects; learning N, shown for density hyperparameter η = 30, 40, 50, 60, 70, 80. λ = 4.]
Summary

To do:
- Large-scale empirical study
- Informative group priors (e.g. based on demographic covariates or genealogy)
- Causal semantics for transfer learning

References:
- CJO, Lilia Costa, Tom Nichols (2014) Towards a Multi-Subject Analysis of Neural Connectivity. To appear in Neural Computation. arXiv:1404.1239.
- CJO, Jim Smith, Sach Mukherjee, James Cussens (2014) Exact Estimation of Multiple Directed Acyclic Graphs. arXiv:1404.1238.