Estimation of Common Community Structures in Multi-Subject Brain Networks

Dragana M. Pavlovic 1, Emma K. Towlson 2, Petra E. Vértes 3, Edward T. Bullmore 3,4, Thomas E. Nichols 1

1 University of Warwick, Dept. of Statistics, Coventry, UK; 2 University of Cambridge, Dept. of Physics, Cavendish Laboratory, Cambridge, UK; 3 University of Cambridge, Brain Mapping Unit, Dept. of Psychiatry, Cambridge, UK; 4 GlaxoSmithKline, Clinical Unit Cambridge, Addenbrooke's Hospital, Cambridge, UK.

Introduction

There is great interest in community estimation in functional and structural brain networks. However, most existing work focuses on single subjects or group-pooled networks. To address this limitation, we propose the ERMM-GLM, a combination of the Erdős-Rényi Mixture Model (ERMM) [1, 4] and the generalised linear model (GLM), for binary (or binarised) network data. Our approach allows the inclusion of subject-specific covariates (e.g., age, patient/control) to model the variability of connectivity between subjects, while estimating the most probable common community structure in the multi-subject networks. The covariates can be regarded as nuisance variables, or we can conduct hypothesis tests on the estimated GLM coefficients.

Methods

The original ERMM is for a single binary network with n nodes and estimates the number (Q) of latent, non-overlapping node groups. Nodes in one group share similar intra- and inter-group connectivity patterns, and each group pair is treated as a mini Erdős-Rényi network (i.e., its edges follow a Bernoulli distribution with a group-pair-specific rate). Notably, this is a richer model than that implied by the usual modularity algorithms [6]. For K subjects, we use the ERMM to estimate a common community structure of Q groups. For the k-th subject, each of the Q(Q + 1)/2 group rates is modelled by P regressors (GLM). The model parameters are estimated using a variational approximation, yielding an estimate of Q and the assignment of nodes to groups for the most probable common community structure. Within the variational algorithm, we estimate the logistic regression coefficients with the Newton-Raphson algorithm, using Firth's modified score procedure [2] to reduce the bias and control the variance of the regression coefficient estimates.

We simulated data for 10 subjects sharing a common network structure but with age-varying connection strengths. Fixing the connectivity rates (π, or PI) and group prevalences (α, or alpha), we generated the common community structure for Q = 3 (see Fig. 1).

Figure 1: Network simulation parameters. [Panels: connectivity-rate settings PI1 to PI8.]

The intercept is set from the connectivity rates, while the age effect is set to 0.05 for all subjects. Using these parameters, we generated 100 realisations of undirected, binary networks on 50, 100 and 500 nodes. For the resulting estimates, we calculate the root mean square error (RMSE).

Simulation Results

Figure 2: RMSE of group size (alpha). [Panels: RMSE of alpha1, alpha2 and alpha3 against settings PI1 to PI8; rows for n = 50, 100 and 500 nodes, each with a 0 and a 0.05 setting.] Community sizes are reasonably well estimated for n = 100 and 500 nodes, except in the very challenging (nearly homogeneous) settings PI1, PI2 and PI5.

Figure 3: RMSE of regression coefficients. [Panels: RMSE of the intercept (int) and slope for β11, β12, β13, β22, β23 and β33 against settings PI1 to PI8; rows for n = 50, 100 and 500 nodes, each with a 0 and a 0.05 setting.] Intercept estimates are variable for n = 50 and n = 100 (top four rows), but the estimates are accurate in the remaining settings.

Real Data Results

We next considered real data from a drug study with control and patient groups [3] and three treatment arms; we consider only the aripiprazole arm. Functional connectivity for a 297-ROI atlas was computed from resting-state fMRI data, and binary networks were obtained by keeping the 10% strongest edges for each subject.

Figure 5: Reorganised combined adjacency matrices for the common fit. While many connectivity patterns are similar, some appreciable differences are present.
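The GLM part of the model described in the Methods can be written out explicitly. The equations below are a sketch in our own notation (Z_i for the group of node i, X^{(k)} for subject k's adjacency matrix, x_{kp} for its p-th covariate, β_{ql,p} for the coefficients), assuming a logit link, which is consistent with the logistic regression used inside the variational algorithm:

```latex
\[
\begin{aligned}
P(Z_i = q) &= \alpha_q, \qquad \sum_{q=1}^{Q} \alpha_q = 1,\\
X^{(k)}_{ij} \mid Z_i = q,\, Z_j = l &\sim \operatorname{Bernoulli}\bigl(\pi^{(k)}_{ql}\bigr), \qquad i < j,\\
\operatorname{logit} \pi^{(k)}_{ql} &= \sum_{p=1}^{P} x_{kp}\, \beta_{ql,p}, \qquad 1 \le q \le l \le Q .
\end{aligned}
\]
```

With an intercept and an age slope (P = 2) and Q = 3, this gives P · Q(Q + 1)/2 = 2 × 6 = 12 coefficients, i.e. an intercept and a slope for each of β11, β12, β13, β22, β23 and β33, the quantities whose RMSE is shown in Figure 3.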
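The simulation design can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `simulate_ermm_glm`, the prevalences, the rate matrix and the centred ages are all hypothetical, since the PI1 to PI8 and alpha values of Figure 1 are not recoverable here. We read "the intercept is set from the connectivity rates" as β0 = logit(π), and use the stated age effect of 0.05 for every group pair.

```python
import numpy as np


def simulate_ermm_glm(n, alpha, pi, age_slope, ages, rng):
    """Simulate K undirected binary networks that share one node partition.

    Hypothetical helper: logit(pi_ql^(k)) = logit(pi[q, l]) + age_slope * ages[k].
    """
    alpha = np.asarray(alpha, dtype=float)
    pi = np.asarray(pi, dtype=float)
    beta0 = np.log(pi / (1.0 - pi))              # intercepts set from the rates
    z = rng.choice(len(alpha), size=n, p=alpha)  # common group labels
    networks = []
    for age in ages:
        rate = 1.0 / (1.0 + np.exp(-(beta0 + age_slope * age)))  # Q x Q rates
        probs = rate[z[:, None], z[None, :]]     # n x n edge probabilities
        upper = np.triu(rng.random((n, n)) < probs, k=1)  # draw pairs i < j only
        networks.append((upper | upper.T).astype(int))    # symmetric, zero diagonal
    return z, np.stack(networks)


rng = np.random.default_rng(0)
alpha = [0.5, 0.3, 0.2]                          # hypothetical prevalences
pi = [[0.6, 0.1, 0.1],
      [0.1, 0.5, 0.2],
      [0.1, 0.2, 0.4]]                           # hypothetical group-pair rates
ages = np.linspace(-10, 10, 10)                  # hypothetical centred ages, 10 subjects
z, nets = simulate_ermm_glm(100, alpha, pi, 0.05, ages, rng)
print(nets.shape)  # (10, 100, 100)
```

Only the upper triangle is sampled and then mirrored, so each undirected edge is a single Bernoulli draw, matching the mini Erdős-Rényi view of each group pair.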
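Firth's modified score [2] for logistic regression replaces the ordinary score Xᵀ(y - p) with Xᵀ(y - p + h ⊙ (1/2 - p)), where h is the diagonal of the weighted hat matrix; unlike ordinary maximum likelihood, it yields finite estimates even under complete separation. The sketch below solves this with Newton-Raphson-type (Fisher-scoring) updates for a single unweighted design. The function name and the step cap are our own choices; inside the ERMM-GLM the responses would presumably also be weighted by the variational group-membership probabilities, which this sketch omits.

```python
import numpy as np


def firth_logistic(X, y, max_iter=100, tol=1e-8, max_step=5.0):
    """Firth-penalised logistic regression via Newton-Raphson-type updates.

    Solves the modified score X^T (y - p + h * (0.5 - p)) = 0, where h is the
    diagonal of W^(1/2) X (X^T W X)^(-1) X^T W^(1/2), with W = diag(p(1 - p)).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        info_inv = np.linalg.inv(X.T @ (X * w[:, None]))   # (X^T W X)^(-1)
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)   # hat-matrix diagonal
        step = info_inv @ (X.T @ (y - p + h * (0.5 - p)))  # Fisher-scoring step
        largest = np.max(np.abs(step))
        if largest > max_step:                 # cap the step to stabilise early iterations
            step *= max_step / largest
        beta = beta + step
        if largest < tol:
            break
    return beta


# Perfectly separated data: ordinary maximum likelihood has no finite solution.
X = np.column_stack([np.ones(20), np.linspace(-2.0, 2.0, 20)])
y = (X[:, 1] > 0).astype(float)
beta_sep = firth_logistic(X, y)
print(beta_sep)  # finite intercept and slope
```

Using the expected (Fisher) information rather than the exact Jacobian of the modified score is the usual simplification for Firth logistic regression; it keeps each update a weighted least-squares-style solve.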
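Because mixture-model labels are arbitrary, estimated groups must be matched to the true groups before an RMSE for alpha can be computed; this is the ambiguity noted in the Conclusions. The poster does not state how the matching was done. The sketch below assumes one common choice, the label permutation that maximises node-level agreement, found by brute force over all Q! permutations (adequate for Q = 3; for larger Q the Hungarian algorithm, e.g. `scipy.optimize.linear_sum_assignment`, is the usual replacement). The names `align_labels` and `rmse` are hypothetical.

```python
import itertools

import numpy as np


def align_labels(z_hat, z_true, Q):
    """Relabel estimated groups to best agree with the true labels (brute force)."""
    z_hat = np.asarray(z_hat)
    z_true = np.asarray(z_true)
    overlap = np.zeros((Q, Q), dtype=int)
    np.add.at(overlap, (z_hat, z_true), 1)     # overlap[a, b]: nodes with est. a, true b
    best = max(itertools.permutations(range(Q)),
               key=lambda perm: sum(overlap[a, perm[a]] for a in range(Q)))
    return np.array(best)[z_hat]               # map each estimated label to a true label


def rmse(estimates, truth):
    """Root mean square error over replicates (rows), per parameter (column)."""
    estimates = np.asarray(estimates, dtype=float)
    return np.sqrt(np.mean((estimates - np.asarray(truth, dtype=float)) ** 2, axis=0))


z_true = np.array([0, 0, 1, 1, 2, 2])
z_hat = np.array([2, 2, 0, 0, 1, 1])           # same partition, permuted labels
print(align_labels(z_hat, z_true, 3))          # [0 0 1 1 2 2]
alpha_hats = [np.bincount(align_labels(z_hat, z_true, 3), minlength=3) / 6]
print(rmse(alpha_hats, [1 / 3, 1 / 3, 1 / 3]))
```

In a full simulation study each of the 100 realisations would contribute one aligned row of estimates to `rmse`.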
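Figure 4 scores community recovery with the adjusted Rand index (ARI) [5]. It compares two partitions through their contingency table and is invariant to relabelling, so it avoids the group-matching problem altogether. A self-contained sketch of the standard Hubert-Arabie formula follows; scikit-learn's `adjusted_rand_score` computes the same quantity, and the function name below is our own.

```python
from math import comb

import numpy as np


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index (Hubert and Arabie) between two partitions."""
    _, a_idx = np.unique(np.asarray(labels_a), return_inverse=True)
    _, b_idx = np.unique(np.asarray(labels_b), return_inverse=True)
    table = np.zeros((a_idx.max() + 1, b_idx.max() + 1), dtype=np.int64)
    np.add.at(table, (a_idx, b_idx), 1)        # contingency table of co-assignments
    sum_cells = sum(comb(int(c), 2) for c in table.ravel())
    sum_rows = sum(comb(int(c), 2) for c in table.sum(axis=1))
    sum_cols = sum(comb(int(c), 2) for c in table.sum(axis=0))
    expected = sum_rows * sum_cols / comb(len(a_idx), 2)  # chance-level agreement
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:                  # degenerate case, e.g. both trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: identical up to labels
print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # about 0.571
```

An ARI of 1 means the fitted partition matches the true one exactly up to relabelling, while values near 0 indicate chance-level agreement.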
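The binarisation used for the real data (keeping the 10% strongest edges for each subject) can be sketched as below. This assumes that "strongest" means the largest connectivity values rather than the largest absolute values, and that exactly round(0.1 × number of node pairs) edges are kept, with ties broken arbitrarily; the poster specifies neither detail, and the function name and the random time series are hypothetical.

```python
import numpy as np


def binarise_top_fraction(conn, density=0.10):
    """Keep the strongest `density` fraction of edges of a symmetric matrix."""
    conn = np.asarray(conn, dtype=float)
    n = conn.shape[0]
    rows, cols = np.triu_indices(n, k=1)       # each node pair once
    weights = conn[rows, cols]
    k = int(round(density * weights.size))     # number of edges to keep
    keep = np.argsort(weights)[::-1][:k]       # indices of the k largest weights
    adj = np.zeros((n, n), dtype=int)
    adj[rows[keep], cols[keep]] = 1
    return adj + adj.T                         # undirected, zero diagonal


rng = np.random.default_rng(0)
ts = rng.standard_normal((200, 297))           # hypothetical ROI time series
conn = np.corrcoef(ts, rowvar=False)           # 297 x 297 correlation matrix
adj = binarise_top_fraction(conn)
print(adj.sum() // 2)                          # 4396 of 43956 node pairs
```

Fixing the density per subject, rather than a common correlation threshold, gives every subject the same number of edges, so between-subject differences in the GLM reflect where edges fall rather than how many there are.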
Conclusions

We have proposed an extension of the ERMM, the ERMM-GLM, which allows us to model connectivity for a group of subjects while accounting for systematic differences between subjects expressed in a GLM. Our simulations show reasonable performance, though evaluation is very difficult in this (or any mixture-model) setting, as there is ambiguity as to which estimated group corresponds to which "true" group. The real data analysis found a common set of 36 groups, with many similar connectivity patterns as well as some appreciable differences. Our future work will investigate this model in practical settings.

Figure 4: Average similarity between true and fitted communities, evaluated with adjusted Rand index (ARI) scores [5]. [Panels: ARI (0 to 1) against settings PI1 to PI8 for alpha1, alpha2 and alpha3; rows n50_0, n50_005, n100_0, n100_005, n500_0, n500_005.] The ERMM-GLM accurately estimates the "true" communities, apart from the expected challenging cases (PI1 and PI5).

References

[1] Daudin et al., Statistics and Computing (2008).
[2] Firth, Biometrika 80(1) (1993): 27-38.
[3] Lynall et al., The Journal of Neuroscience (2010).
[4] Mariadassou et al., The Annals of Applied Statistics (2010): 715-742.
[5] Hubert and Arabie, Journal of Classification, vol. 2 (1985).
[6] Pavlovic et al., PLOS ONE (2014).