Erdős-Rényi Mixture Model for Finding Community Structure in Brain Networks

D. Pavlovic (1,3), P. Vertes (2), M. Rubinov (2), E. Bullmore (3), T. Nichols (1)
(1) University of Warwick, Dept. of Statistics, Coventry, United Kingdom
(2) University of Cambridge, Cambridge, United Kingdom
(3) GlaxoSmithKline, Clinical Unit Cambridge, Addenbrooke's Hospital, Cambridge, United Kingdom

Results

For C. elegans, the ICL was maximized at Q=9. Fig. 1 shows the observed and fitted degree distributions, showing an excellent fit. Table (a) shows the estimated partition relative to the Ga classification; some ERMM classes fall into only 2 or 3 Ga partitions, suggesting an informative result. Fig. 2 shows the reorganized adjacency matrix. Table (b) shows the ARI values comparing the estimated partition with partitions Ga and Fu: while the optimal Q=9 fit performs similarly to N-M, a 6-partition fit shows better agreement than N-M (which is itself a 6-partition result).

For the human data, the 36 ARI values are summarized with a boxplot showing the difference between the ERMM and hierarchical clustering results (Fig. 3). While the median delta-ARI is worse with the ERMM than with hierarchical clustering, some subjects did better with the ERMM. The fit for subject 1 again shows a sensible segmentation of areas (Fig. 4).

Conclusions

We have fit a stochastic community model to graphs based on C. elegans and human data. While gold-standard references are always difficult to find, using the Adjusted Rand Index we found respectable overlap of the estimated communities with a reference on C. elegans (ARI>0.24), as well as decent reproducibility of the communities found on human data (similar to, if slightly worse than, hierarchical clustering).

Table b: ARI scores

              Ganglion   Functional
ERMM Q=9      0.24       0.09
ERMM Q=6      0.30       0.19
N-M           0.25       0.09

Fig 1. Above: Degree distribution (legend includes Fit).
Fig 2. Below: Block matrix (blocks labelled Q1 to Q9).
Fig 3. ARI results for ERMM versus hierarchical clustering show that the ERMM is less stable; however, the ERMM provides a more informative fit.
Fig 4. Below: Illustrative results for subject 1 (panels labelled by community).

Ganglion classification (neuron lists by ganglion)

Dorsal ganglion: CEPDL, URXL, CEPDR, URXR, ALA, RID

Lateral ganglion: AIBL, AINL, AIZL, AVAL, AVBL, AVDL, AVEL, AVHL, AVJL, RIAL, RIBL, RICL, SAAVL, SIBDL, RIML, RMDL, RMDVL, SMDVL, RIVL, ADFL, ADLL, AFDL, ASEL, ASGL, ASHL, ASIL, ASJL, ASKL, AUAL, AWAL, AWBL, AWCL, AIBR, AINR, AIZR, AVAR, AVBR, AVDR, AVER, AVHR, AVJR, RIAR, RIBR, RICR, SAAVR, SIBDR, RIMR, RMDR, RMDVR, SMDVR, RIVR, ADFR, ADLR, AFDR, ASER, ASGR, ASHR, ASIR, ASJR, ASKR, AUAR, AWAR, AWBR, AWCR

Ventral ganglion: AIAL, AIML, AIYL, AVKL, SAADL, SIADL, SIAVL, SIBVL, RMDDL, RMFL, RMHL, SMBDL, SMBVL, SMDDL, AIAR, AIMR, AIYR, AVKR, SAADR, SIADR, SIAVR, SIBVR, RMDDR, RMFR, RMHR, SMBDR, SMBVR, SMDDR, RIH, RIR, RIS, AVL

Retrovesicular ganglion: ADAL, AVFL, RIFL, RIGL, SABVL, RMGL, ADEL, FLPL, ADAR, AVFR, RIFR, RIGR, SABVR, RMGR, ADER, FLPR, SABD, AS01, DA01, DB01, DB02, DD01, VA01, VB01, VB02, VD01, VD02, AQR, AVG

Ventral cord neuron group: AS02, AS03, AS04, AS05, AS06, AS07, AS08, AS09, AS10, DA02, DA03, DA04, DA05, DA06, DA07, DB03, DB04, DB05, DB06, DB07, DD02, DD03, DD04, DD05, VA02, VA03, VA04, VA05, VA06, VA07, VA08, VA09, VA10, VA11, VB03, VB04, VB05, VB06, VB07, VB08, VB09, VB10, VB11, VC01, VC02, VC03, VC04, VC05, VD03, VD04, VD05, VD06, VD07, VD08, VD09, VD10, VD11

Pre-anal ganglion: PVPL, PVPR, PVT, AS11, DA08, DA09, DD06, PDA, PDB, VA12, VD12, VD13

Dorsorectal ganglion: DVA, DVC, DVB

Lumbar ganglion: LUAL, PVCL, PVNL, PVQL, PVWL, ALNL, PHAL, PHBL, PHCL, PLML, PLNL, LUAR, PVCR, PVNR, PVQR, PVWR, ALNR, PHAR, PHBR, PHCR, PLMR, PLNR, PQR, PVR
Table a: Classification of neurons

Posterolateral ganglion: BDUL, PVDL, SDQL, HSNL, ALML, PDEL, BDUR, PVDR, SDQR, HSNR, ALMR, PDER, AVM, PVM

Further ganglion row labels (neuron lists appear elsewhere on the poster): Dorsorectal ganglion, Pre-anal ganglion.

Table and figure labels: ERMM Q=9; N-M. Degree distribution plot: x-axis Degrees, legend entry Observed. Panel title: Reorganized Adjacency matrix, with legend entries Class 1 to Class 9.

Methods

Model. For an N-node adjacency matrix, the ERMM finds a partition into Q non-overlapping classes. Both Q and the class memberships are estimated via a variational approach combined with an Integrated Classification Likelihood (ICL) criterion, where the degree distribution is modelled as a Poisson mixture. We evaluate the accuracy of our method with clustering accuracy metrics [4], specifically the adjusted Rand index (ARI), which measures how often two clusterings agree on whether a pair of nodes belongs together, corrected for chance agreement.

We apply the ERMM to the C. elegans network of neuron connections, which has 279 vertices [5], and compare it to a modularity clustering [6, Table S1] that produced a 6-class partition. As a measure of ground truth we used a 10-class ganglion partition ("Ga") and a 6-class functional partition ("Fu") [5].

We also use resting-state fMRI data from 36 human subjects, each with 176 scans (TR = 2 s), where the full image data are reduced to 150 time series using a high-dimensional temporally concatenated ICA. Each subject's 150x150 correlation matrix is binarized at P=0.0001 and submitted to the ERMM. For each subject, the remaining 35 subjects serve as a 'gold standard': their fixed-effects (FFX) population-average correlation matrix is computed and also thresholded at 0.0001. To compare with another clustering method, agglomerative hierarchical clustering (average linkage) was used to create a partition, with the number of clusters chosen to match the estimated Q. For each subject, the ARI was then computed twice, once for the ERMM and once for hierarchical clustering.
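The ARI described in the Methods can be computed from the contingency table of two label vectors. The following is a minimal Python sketch, assuming the standard pair-counting (Hubert-Arabie) form of the index; the poster does not state which variant or software was used, and the function name `adjusted_rand_index` and the toy label vectors are illustrative only.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Pair-counting adjusted Rand index between two partitions.

    Assumption: this is the standard Hubert-Arabie form; it is a sketch,
    not the implementation used for the poster.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same nodes")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no node pairs to disagree on

    # Contingency counts: nodes per (class in A, class in B) cell,
    # plus the marginal class sizes of each partition.
    cells = Counter(zip(labels_a, labels_b))
    sizes_a = Counter(labels_a)
    sizes_b = Counter(labels_b)

    # Number of node pairs placed together in both, in A, and in B.
    together_both = sum(comb(c, 2) for c in cells.values())
    together_a = sum(comb(c, 2) for c in sizes_a.values())
    together_b = sum(comb(c, 2) for c in sizes_b.values())

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions have one class
    return (together_both - expected) / (maximum - expected)


# Toy example (hypothetical labels): same grouping under different names.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Toy example: the second partition cuts across the first one.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same, crossed)
```

Because the index depends only on which pairs of nodes are grouped together, it is unaffected by how the classes are numbered, which is why it can compare an ERMM partition with a ganglion or functional partition that has a different number of classes.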
Anterior ganglion: RIPL, RMEL, IL1DL, IL1L, IL1VL, URADL, URAVL, BAGL, CEPVL, IL2DL, IL2L, IL2VL, OLLL, OLQDL, OLQVL, URBL, URYDL, URYVL, RIPR, RMER, IL1DR, IL1R, IL1VR, URADR, URAVR, BAGR, CEPVR, IL2DR, IL2R, IL2VR, OLLR, OLQDR, OLQVR, URBR, URYDR, URYVR, RMED, RMEV

Panel titles: Degree Distribution Q=9 (y-axis: Density); ERMM Classification Q=9; Ganglion Classification. Panel labels for the human-data figure: Community 3, Community 4, Community 8.

Introduction

While there has been much interest in using graph theory metrics to summarize functional and structural MRI networks, there have been no previous attempts to fit a stochastic model to estimate the community structure often present in such data. While the Exponential Random Graph Model provides a likelihood on the space of graphs, that likelihood is a function of global graph statistics and cannot be used to estimate communities. The Erdős-Rényi Mixture Model (ERMM) [1,2] allows for structured networks built from communities that have homogeneous internal connection probabilities and common connection probabilities between communities (see also [3]). Unlike traditional blockmodels, it does not assume that communities are tightly connected, only that connection prevalence is common within a community or between any given pair of communities. In this work we demonstrate the ERMM on a model organism and on human resting-state network (RSN) fMRI data.

Conclusions (continued)

This suggests that community estimation methods based on stochastic models can find and fit structure in networks based on brain connectivity data, and may provide greater insight than simple scalar graph measures.

References

1. Nowicki, Snijders (2001). JASA, 96(455), 1077-1087.
2. Daudin, Picard, Robin (2007). Stats & Comp, 18(2), 173-183.
3. Newman, Leicht (2007). PNAS, 104(23), 9564-9569.
4. Handl, Knowles, Kell (2005). Bioinform, 21(15), 3201-3212.
5. Worm Atlas. http://www.wormatlas.org/neuronalwiring.html
6. Pan, Chatterjee, Sinha (2010). PLoS ONE, 5(2), e9240.

http://go.warwick.ac.uk/tenichols