Erdös-Rényi Mixture Model for Finding Community
Structure in Brain Networks.
D. Pavlovic1,3, P. Vertes2, M. Rubinov2, E. Bullmore3, T. Nichols1
1University of Warwick, Dept. of Statistics, Coventry, United Kingdom; 2University of Cambridge, Cambridge, United Kingdom; 3GlaxoSmithKline, Clinical Unit Cambridge, Addenbrooke’s Hospital, Cambridge, United Kingdom.
Introduction
Dorsal ganglion: CEPDL, URXL, CEPDR, URXR, ALA, RID
Lateral ganglion: AIBL, AINL, AIZL, AVAL, AVBL, AVDL, AVEL, AVHL, AVJL, RIAL, RIBL, RICL, SAAVL, SIBDL, RIML, RMDL, RMDVL, SMDVL, RIVL, ADFL, ADLL, AFDL, ASEL, ASGL, ASHL, ASIL, ASJL, ASKL, AUAL, AWAL, AWBL, AWCL, AIBR, AINR, AIZR, AVAR, AVBR, AVDR, AVER, AVHR, AVJR, RIAR, RIBR, RICR, SAAVR, SIBDR, RIMR, RMDR, RMDVR, SMDVR, RIVR, ADFR, ADLR, AFDR, ASER, ASGR, ASHR, ASIR, ASJR, ASKR, AUAR, AWAR, AWBR, AWCR
Ventral ganglion: AIAL, AIML, AIYL, AVKL, SAADL, SIADL, SIAVL, SIBVL, RMDDL, RMFL, RMHL, SMBDL, SMBVL, SMDDL, AIAR, AIMR, AIYR, AVKR, SAADR, SIADR, SIAVR, SIBVR, RMDDR, RMFR, RMHR, SMBDR, SMBVR, SMDDR, RIH, RIR, RIS, AVL
Retrovesicular ganglion: ADAL, AVFL, RIFL, RIGL, SABVL, RMGL, ADEL, FLPL, ADAR, AVFR, RIFR, RIGR, SABVR, RMGR, ADER, FLPR, SABD, AS01, DA01, DB01, DB02, DD01, VA01, VB01, VB02, VD01, VD02, AQR, AVG
Ventral cord neuron group: AS02, AS03, AS04, AS05, AS06, AS07, AS08, AS09, AS10, DA02, DA03, DA04, DA05, DA06, DA07, DB03, DB04, DB05, DB06, DB07, DD02, DD03, DD04, DD05, VA02, VA03, VA04, VA05, VA06, VA07, VA08, VA09, VA10, VA11, VB03, VB04, VB05, VB06, VB07, VB08, VB09, VB10, VB11, VC01, VC02, VC03, VC04, VC05, VD03, VD04, VD05, VD06, VD07, VD08, VD09, VD10, VD11
Pre-anal ganglion: PVPL, PVPR, PVT, AS11, DA08, DA09, DD06, PDA, PDB, VA12, VD12, VD13
Dorsorectal ganglion: DVA, DVC, DVB
Lumbar ganglion: LUAL, PVCL, PVNL, PVQL, PVWL, ALNL, PHAL, PHBL, PHCL, PLML, PLNL, LUAR, PVCR, PVNR, PVQR, PVWR, ALNR, PHAR, PHBR, PHCR, PLMR, PLNR, PQR, PVR
Fig 1 (above): Degree distribution.
Fig 2 (below): Block matrix.
Results
For C. elegans, ICL was maximized at Q=9. Fig. 1 shows the observed and fitted degree distributions, which agree closely. Table (a) shows the estimated partition relative to the Ga classification; some ERMM classes fall into only 2 or 3 Ga partitions, suggesting an informative result. Fig. 2 shows the reorganized adjacency matrix. Table (b) shows the ARI values comparing the ERMM partition with the Ga and Fu partitions: the optimal Q=9 fit performs similarly to N-M, while a 6-partition fit shows better agreement than N-M (which is itself a 6-partition result).
For the human data, the 36 ARI values are summarized with a boxplot showing the difference between the ERMM and hierarchical clustering results (Fig. 3). While the median delta-ARI is worse with ERMM than with hierarchical clustering, some subjects were better with ERMM. The fit for subject 1 again shows a sensible segmentation of areas (Fig. 4).

Table b: ARI scores
              Ganglion   Functional
ERMM Q=9      0.24       0.09
ERMM Q=6      0.30       0.19
N-M           0.25       0.09

[Figure panel labels: Community 1, Community 2; block labels: Q4-5, Q6]

Conclusions
We have fit a stochastic community model to graphs based on C. elegans and human data. While gold-standard references are always difficult to find, using the Adjusted Rand Index we found respectable overlap of the estimated communities with a reference partition for C. elegans (ARI>0.24), as well as decent reproducibility of the communities found in human data (similar to, if slightly worse than, hierarchical clustering).
Fig 3: ARI results for ERMM versus hierarchical clustering show that ERMM is less stable; however, the ERMM provides a more informative fit.
Fig 4 (below): Illustrative results for subject 1.
Table a: Classification of neurons
[Table b row labels: ERMM Q=9, N-M. Fig 1 x-axis: Degrees. Fig 2 title: Reorganized adjacency matrix.]
Posterolateral ganglion: BDUL, PVDL, SDQL, HSNL, ALML, PDEL, BDUR, PVDR, SDQR, HSNR, ALMR, PDER, AVM, PVM
[Table a row labels: Dorsorectal ganglion, Pre-anal ganglion]
[Fig 1 legend: Observed. Class legend: Class 1 to Class 9. Block labels: Q1, Q2.]
Methods
Model. For an N-node adjacency matrix, the ERMM finds a partition into Q non-overlapping classes. Both Q and the class memberships are estimated via a variational approach combined with an Integrated Classification Likelihood (ICL) criterion, where the degree distribution is modelled as a Poisson mixture.
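As a rough illustration of the Poisson-mixture view of the degree distribution, the sketch below computes the mixture density implied by a set of class proportions and connection probabilities. The names `alpha`, `pi`, `class_mean_degrees` and `degree_mixture_pmf`, and the two-class parameter values, are assumptions made for this example and do not come from the poster. The mean degree of class q is taken to be (N-1) * sum_l alpha[l] * pi[q][l], following the mixture formulation in [2].

```python
import math


def class_mean_degrees(alpha, pi, n_nodes):
    """Expected degree of a node in each class: (N-1) * sum_l alpha[l] * pi[q][l]."""
    return [(n_nodes - 1) * sum(a * p for a, p in zip(alpha, row)) for row in pi]


def poisson_pmf(k, lam):
    """Poisson probability of degree k with mean lam, computed in log space."""
    if lam == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def degree_mixture_pmf(k, alpha, pi, n_nodes):
    """Mixture density: sum over classes of alpha[q] * Poisson(k; lambda_q)."""
    lams = class_mean_degrees(alpha, pi, n_nodes)
    return sum(a * poisson_pmf(k, lam) for a, lam in zip(alpha, lams))


# Hypothetical two-class parameters, chosen only for illustration.
alpha = [0.7, 0.3]            # class proportions (sum to 1)
pi = [[0.02, 0.05],
      [0.05, 0.30]]           # symmetric connection probabilities
n_nodes = 279                 # number of vertices in the C. elegans network

# Mixture density evaluated at degrees 0..149, e.g. to overlay on a histogram.
density = [degree_mixture_pmf(k, alpha, pi, n_nodes) for k in range(150)]
```

A curve of this kind, evaluated at the fitted parameters, is what one would overlay on the observed degree histogram to judge the fit.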
We evaluate the accuracy of our method with clustering accuracy metrics [4], specifically the adjusted Rand index (ARI), which measures how often two clusterings agree, corrected for chance. We apply the ERMM to the C. elegans neuronal connection network with 279 vertices [5], and compare it with a modularity clustering (N-M) [6, Table S1], which produced a 6-class partition. As reference partitions standing in for ground truth, we used a 10-class ganglion partition ("Ga") and a 6-class functional partition ("Fu") [5].
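For readers unfamiliar with the ARI, the following is a minimal pure-Python sketch of the standard pair-counting formula. The function name `adjusted_rand_index` and the toy label vectors are illustrative assumptions; this is not the code used for the poster.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same items")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the partitions agree trivially

    # Pairs of items placed together in both partitions (contingency table cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs  # agreement expected by chance
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions have a single class
    return (together_both - expected) / (maximum - expected)


# Toy example: identical partitions up to relabelling give ARI = 1.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

An ARI of 1 indicates identical partitions, values near 0 indicate chance-level agreement, and negative values are possible when agreement is below chance.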
We also use resting-state fMRI data from 36 human subjects, each with 176 scans (TR=2s), where the full image data are reduced to 150 time series using a high-dimensional temporally concatenated ICA. Each subject's 150x150 correlation matrix is binarized at P=0.0001 and submitted to the ERMM. For each subject, the remaining 35 subjects serve as a 'gold standard': their FFX (fixed-effects) population-average correlation matrix is also thresholded at 0.0001. To compare with another clustering method, agglomerative hierarchical clustering (average linkage) was used to create a partition, with the number of partitions chosen to match the estimated Q. For each subject, ARI could then be computed twice, once for ERMM and once for hierarchical clustering.
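The binarization step can be sketched as below. The poster does not say how the P-value of each correlation was obtained, so this example makes several assumptions: a two-sided test of zero correlation via the Fisher z-transform with a normal approximation, scans treated as independent samples (temporal autocorrelation is ignored), and hypothetical function names `correlation_p_value` and `binarize`.

```python
import math


def correlation_p_value(r, n_samples):
    """Two-sided P-value for zero correlation (Fisher z, normal approximation)."""
    r = max(min(r, 0.999999), -0.999999)         # keep atanh finite
    z = math.atanh(r) * math.sqrt(n_samples - 3)
    return math.erfc(abs(z) / math.sqrt(2))      # equals 2 * (1 - Phi(|z|))


def binarize(corr, n_samples, p_threshold=1e-4):
    """Return a symmetric 0/1 adjacency matrix with an edge where P < p_threshold."""
    n = len(corr)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):                # off-diagonal pairs only
            if correlation_p_value(corr[i][j], n_samples) < p_threshold:
                adjacency[i][j] = adjacency[j][i] = 1
    return adjacency


# Toy 3x3 correlation matrix (made-up values), with 176 scans per subject.
toy_corr = [[1.0, 0.5, 0.1],
            [0.5, 1.0, 0.0],
            [0.1, 0.0, 1.0]]
toy_adjacency = binarize(toy_corr, n_samples=176)
```

Under these assumptions strongly negative correlations also become edges; a one-sided test would be the alternative if only positive coupling should count.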
Anterior ganglion: RIPL, RMEL, IL1DL, IL1L, IL1VL, URADL, URAVL, BAGL, CEPVL, IL2DL, IL2L, IL2VL, OLLL, OLQDL, OLQVL, URBL, URYDL, URYVL, RIPR, RMER, IL1DR, IL1R, IL1VR, URADR, URAVR, BAGR, CEPVR, IL2DR, IL2R, IL2VR, OLLR, OLQDR, OLQVR, URBR, URYDR, URYVR, RMED, RMEV
[Fig 1 title: Degree Distribution Q=9; y-axis: Density. Table a column headings: ERMM Classification Q=9 classes, Ganglion Classification classes. Block labels: Q3, Q6, Q7, Q8, Q9.]
While there has been much interest in using graph theory metrics to summarize functional and structural MRI networks, there have been no previous attempts to fit a stochastic model to estimate the community structure often present in the data. While the Exponential Random Graph Model provides a likelihood on the space of graphs, that likelihood is a function of global graph statistics and cannot be used to estimate communities. The Erdös-Rényi Mixture Model (ERMM) [1,2] allows for structured networks, based on communities that have homogeneous internal connection probabilities and common connection probabilities between communities (see also [3]). Unlike traditional blockmodels, it does not assume that communities are tightly connected, only that the connection probability is common within a community and between any given pair of communities.
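In symbols, the mixture formulation described in [2] can be written as follows for an undirected graph with adjacency entries $X_{ij}$. The symbols $Z_i$, $\alpha_q$ and $\pi_{ql}$ are notation introduced here for illustration and do not appear on the poster.

```latex
\begin{align*}
P(Z_i = q) &= \alpha_q, \qquad q = 1, \dots, Q, \qquad \sum_{q=1}^{Q} \alpha_q = 1, \\
X_{ij} \mid \{Z_i = q,\ Z_j = l\} &\sim \mathrm{Bernoulli}(\pi_{ql}), \qquad i < j, \qquad \pi_{ql} = \pi_{lq}.
\end{align*}
```

Here $Z_i$ is the unobserved class of node $i$, $\alpha_q$ is the proportion of class $q$, and $\pi_{ql}$ is the connection probability between classes $q$ and $l$. Nothing forces $\pi_{qq}$ to exceed $\pi_{ql}$, which is why communities need not be tightly connected.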
In this work we demonstrate the ERMM on a model organism (C. elegans) and on human resting-state network (RSN) fMRI data.
[Figure panel labels: Community 3, Community 4, Community 8]
This suggests that community estimation methods
based on stochastic models can find and fit
structure in networks based on brain connectivity
data, and may provide greater insight than simple
scalar graph measures.
References
1. Nowicki, Snijders (2001). JASA, 96(455), 1077-1087.
2. Daudin, Picard, Robin (2007). Stat. Comput., 18(2), 173-183.
3. Newman, Leicht (2007). PNAS, 104(23), 9564-9569.
4. Handl, Knowles, Kell (2005). Bioinformatics, 21(15), 3201-3212.
5. Worm Atlas. http://www.wormatlas.org/neuronalwiring.html
6. Pan, Chatterjee, Sinha (2010). PloS one, 5(2), e9240.
http://go.warwick.ac.uk/tenichols