Rich Block: Rich Club Analysis with the Erdos-Renyi Mixture Model Soroosh Afyouni1, Dragana Pavlovic1, Emma Towlson2, Petra Vertes3, Theo Arvanitis1, Edward Bullmore3,4, and Thomas Nichols1 1University of Warwick 2 Northeastern University 1. Introduction 3 4 University of Cambridge GlaxoSmithKline 4. Evaluation and Validation en BI d y tit BI d 0.5 0.5 tit B en BI d R R N BI d od en al R tit B al R od y 0 y 0 0.8 0.3 0.2 Emprical RC ERMM-RC (q=10) ERMM-RC (q=20) ERMM-RC (q=40) ERMM-RC (q=70) ERMM-RC (q=100) 0.1 0 0 10 20 30 40 50 60 70 80 90 100 de Degree/Lambda Figure3 5. Application: Resting State fMRI Connectivity in Schizophrenia Rich Block was used to detect the differenceresting fMRI connectivity between controls (n=13) and schizophrenic subjects (n=12) (Lynall etal 2010). Wavelet correlation transformation (WCT) (scale 2) was used to form a network, after binarisation to give 10% edge density, GL-SBM with patient/control, age and movement covariates found Q=31 latent blocks. Fig4 Illustrates degree distribution of estimated models for control and patient versus degree distribution of their null models, sorted by descending λ. Despite detectable difference between degree distribution of estimated models and null models in earlier RBs, they significantly overlaps in further RB until they become identical in the last RB (degree distribution of last RB (RB31) is degree distribution of whole network. For sake of visualisation, we just showed 16 RBs below.) RB 1 RB 2 0.3 RB 3 0.2 RB 4 0.2 0.1 0 0 5 10 RB 7 0.15 0 0 10 0.1 20 RB 8 0.15 0 RB 5 0.15 RB 6 0.15 0.1 0.1 0.05 0.05 0 0 0.1 0 10 20 RB 9 0.1 0 0 20 40 RB 10 0.1 0 50 RB 11 0.1 0 50 RB 12 0.1 0.1 0.05 0.05 0 0 0 50 100 RB 13 0.1 0 50 0.05 0.05 0.05 0.05 0 0 0 0 100 RB 14 0.1 0 50 100 RB 15 0.1 0 0.05 0.05 0.05 0 0 0 0 100 0 50 100 0 100 200 0 100 200 0 100 200 0 100 200 RB 16 0.1 0.05 50 Control RB Control Null RB Patient RB Patient Null RB 14 12 BI Fig 3 shows an example with the C.Elegans connectome (N=279 nodes, E=2287 edges, [Towlson et al, 2013]), with RC’s Φ with thick black line, and RB’s Φ for various Q’s. There is close agreement that improves with Q. 0.4 300 250 R 0.5 0.1 Simulated Network nt it B 0.6 0.5 Matrix of Parameters od al R Figure2 N Rich Club Coefficient 0.7 0.2 Simulated networks were formed from synthetic probability matrices, Πs (Fig1B). Regardless of network size, 25% of each probability matrix were assigned higher probabilites with their intra-connections also inflated to ensure that Rich Club/Block will be formed. Twelve different types of networks were used to conduct the simulations: Networks of size 60, 120 and 300 were selected. For each category, four Q numbers (6,15,30,60) were investigated. All blocks assumed to have the same size. Three categories were discarded due to low (≥ 4) ratio of node to block Example for N=300, Q=15 y 0 1 3. Simulation Methods Q= 60 1 0.5 Erdos Renyi Mixture Model (ERMM): The ERMM decomposes a N × N binary P network G(N, E) into Q latent blocks. Prevalence of block ` ∈ {1, ..., Q} is α` , such that ` α` = 1. Edges occur between block q and ` as a Bernoulli random variable with parameter πq` . where Π = ((πql))1≤q,`≤Q . The degree distribution is then approximately Poisson mixture with parameter λ. Mixing weights αandλ PQ gives the expected degree of each block:λq = (n − 1) l=1 αlπql Rich Block (RB): We propose defining a new version of Rich Club in terms of the ERMM, describing how a set of Erdos-Renyi blocks with expected degree of k or larger interact with each other. Fundamental to RB is how the ERMM groups nodes with common a pattern of connectivity. Hence, all nodes in a given block have the same fitted degree, and thus we treat these ’clubs’ as the basic units of the RC. The ERMM implies the statistical distribution of RB degree (again, a Binomial mixture); with the GL-SBM, Π and α can be found for group data, not just single subjects. normalisation and inference of RB coefficients is done with a null model. In contrast to RC, we can estimate null Π while holding fixed the node-to-block assignments fixed. This allows computationally efficient sampling from the null, as the Binomial mixture can be directly sampled from the fit to one random sample. Q= 30 R 1 0.9 Generalised Linear Stochastic Block Model (GL-SBM): Fitting the ERMM to multi subjects will produce different block structures in each subject. In [Poster 3861 WT-PM] we introduce the GL-SBM that finds a common set of Q blocks, allowing subject connectivity to differ as per covariates in a logistic regression. Q= 15 R en en BI d od N 1 Non-normalised Rich Club of Hermaphrodite CElegans - 1k restars for each Q 1 Rich Club (RC): Rich Club measures how densly nodes of degree k are connected to other nodes with degree k or larger. Normalised Rich Club coefficient, Φk measures the density of club k divided by the average density of a null network’s club k. Null samples also permit computation of Pvalues and selection of significant RCs, and are obtained by random but degree-preserving rewiring (Maslove & Sneppen, 2002). Q= 6 y tit B al R tit B al R tit en BI d R od en R 0 R 0 B 0 0.5 al R 0.5 y 0.5 1 y 1 B BI d od N BI d R 1 al R tit B al R tit en al R od Results suggest that although distance between Nodal RB and Nodal RC remain respectively the same over different number of nodes and blocks, RB/RC identity values become closer to each other as number of blocks increase. 0 N 0 od 0 0.5 N 0.5 y 0.5 N= 300 1 y 1 N • Node-wise RB/RC labelling (Identity). Node i is given a value of 1 if it belongs to the largest RB/RC found significant at level 5%, 0 otherwise. We compare RB Identity to RC Identity with Adjusted Rand Index (ARI). 2. Methods 1 B • Node-wise RB/RC coefficient (Nodal RB/RC). The nodal coefficient for node i is the Φk of the smallest RC or RB containing i. We compare NodalRB to NodalRC with R 2. N= 120 N • Forming clubs (block) that not only share the same degree but also share the same pattern of connectivity. We use two approaches to compare RB with traditional RC: od • Provide an approach to multi-subject analysis of Rich Club. N= 60 N Rich Club (RC) analysis is one way to gain insight to the structure of the human connectome. One impediment to general use of the RC is difficulty with multi-subject data. In other work we have shown that stochastic block models like the Erdos-Renyi Mixture Model (ERMM) are useful for capturing of biological relevant group structure,in single networks (Pavlovic et al 2014), and in recent work we have extended this to find common network structure in multiple subjects (General Linearised Stochastic Block Model), GL-SBM, (Pavlovic etal, 2015 - See poster OHBM2015 3861 WT-PM). Here we use the GL-SBM to develop a new metric called Rich Block (RB), a stochastic block model implementation of RC, with two main goals: 0 100 200 200 10 Figure 4 8 150 4 50 2 2 4 6 8 10 12 14 50 Monte Carlo Test 100 150 200 250 300 Kullback-Leibler Divergence 5 14 RB Detected RB 4.5 12 4 10 Distance from Null Model -log10 p-values 3.5 3 2.5 2 1.5 10 15 0 0 5 0 5 10 15 20 25 30 Blocks 60 40 20 0 0 5 10 15 Blocks 1.2 1 Blocks Patient Expected Degrees 80 1.4 20 25 30 2 1.8 1.6 1.4 1.2 1 Blocks Figure 6 10 15 Rich Blocks Conclusions References Heuvel and Sporns. Journal of Neuroscience, 2011 — Lynall et al, Journal of Neuroscience, 2010 — Maslov and Sneppen, Science, 2002 — Pavlovic et al, PloS One, 2014 — Towlson et al, Journal of Neuroscience, 2013 S.Afyouni@Warwick.ac.uk 0 1.6 Fig6 shows block-wise RB coefficients, for patients and controls. A Monte Carlo (parametric Boostrap) test was used to compare Φ between populations. The blocks which are significantly larger was assigned by a red circle at the top of it. 2 Rich Blocks 20 1.8 6 0.5 5 40 Figure 5 4 0 Rich Block 60 2 8 1 0 Control Expected Degrees 80 Control Block-wise RB 100 Patient Block-wise RB 6 Expected Degree Expected Degree Fig5 shows detected RBs in red on unsorted expected degree bar plots for patients and controls. http://warwick.ac.uk/tenichols/ohbm In this study we have proposed a new metric called Rich Block which is combination of stochastic block model and empirical rich block. This approach simultaneously identifies a simplified or ’compressed’ network structure while identifying the ’richly connected’ components of this structure, and facilitating group comparisons.