Estimation of Common Community Structures in Multi-Subject Brain Networks Dragana M. Pavlovic

Dragana M. Pavlovic (1), Emma K. Towlson (2), Petra E. Vértes (3), Edward T. Bullmore (3,4), Thomas E. Nichols (1)
(1) University of Warwick, Dept. of Statistics, Coventry, UK;
(2) University of Cambridge, Dept. of Physics, Cavendish Laboratory, Cambridge, UK;
(3) University of Cambridge, Brain Mapping Unit, Dept. of Psychiatry, Cambridge, UK;
(4) GlaxoSmithKline, Clinical Unit Cambridge, Addenbrooke’s Hospital, Cambridge, UK.
Introduction
There is great interest in community estimation in functional and structural brain networks. However, most existing work focuses on single subjects or group-pooled networks. To address this limitation, we propose a combination of the Erdős-Rényi Mixture Model (ERMM) [1, 4] and the generalised linear model (ERMM-GLM) for binary (or binarised) network data. Our approach allows the inclusion of subject-specific covariates (e.g., age, patient/control status) to model the variability of connectivity between subjects, while estimating the most probable common community structure in the multi-subject networks; the covariates can be regarded as nuisance variables, or we can conduct hypothesis tests on the estimated GLM coefficients.

Methods
The original ERMM is for a single binary network with n nodes, and estimates the number (Q) of latent non-overlapping node groups; nodes in one group share similar intra- and inter-group connectivity patterns, and each group pair is seen as a mini Erdős-Rényi network (i.e. its edges follow a Bernoulli distribution with a specific group rate). In particular, this is a richer model than implied by the usual modularity algorithms [6]. For K subjects, we use the ERMM to estimate a common community structure of Q groups. For the k-th subject, each of the Q(Q + 1)/2 group rates is modelled by P regressors (GLM). The model parameters are estimated using the variational approximation, yielding an estimate of Q and the assignment of nodes into groups, for the most probable common community structure.
Within the variational algorithm, we estimate the logistic regression coefficients with the Newton-Raphson algorithm, using Firth’s modified score procedure [2] to control the variance of the regression coefficients. We simulated data for 10 subjects with a common network structure but age-varying connection strengths. Fixing the different connectivity rates (π or PI) and group prevalences (α or alpha), the common community structure is generated for Q = 3 (see Fig. 1).

Figure 2: RMSE of group size (alpha). The community sizes are estimated reasonably well for n = 100 and n = 500 nodes, except in the very challenging (nearly homogeneous) settings PI1, PI2 and PI5. (Panels: alpha1, alpha2 and alpha3 for the settings labelled n50, n100 and n500, each with 0 or 0.05; x-axis: PI1 to PI8; y-axis: RMSE.)
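As a rough illustration of the Newton-Raphson update with Firth's modified score [2] mentioned in the Methods, the sketch below fits a single logistic regression in isolation. It is not the authors' implementation and leaves out the variational steps entirely; the function name `firth_logistic`, the step cap `max_step` and the toy data are assumptions made for illustration.

```python
import numpy as np


def firth_logistic(X, y, max_iter=100, tol=1e-10, max_step=5.0):
    """Logistic regression with Firth's modified score (illustrative sketch).

    X: (n, p) design matrix that already contains an intercept column.
    y: (n,) array of 0/1 responses.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        info = X.T @ (X * w[:, None])            # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Diagonal of the hat matrix W^(1/2) X (X'WX)^(-1) X' W^(1/2).
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Modified score: usual score plus the Firth correction h * (1/2 - p).
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score
        # Cap the step length for stability (an assumption of this sketch).
        largest = np.max(np.abs(step))
        if largest > max_step:
            step = step * (max_step / largest)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Toy data: an intercept-only model with no successes. The ordinary maximum
# likelihood estimate diverges here, while the Firth estimate stays finite.
y_toy = np.array([0, 0, 0, 0])
X_toy = np.ones((4, 1))
beta_toy = firth_logistic(X_toy, y_toy)
```

For an intercept-only model the modified score has a closed-form root, with fitted probability (s + 0.5) / (n + 1) for s successes out of n, which gives a convenient check on the iteration.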
Simulation Results
Figure 3: RMSE of regression coefficients. The regression coefficient estimates are variable for the intercept with n = 100 and n = 50 (top 4 rows) but accurate for the remaining settings. (Legend: PI1 to PI8; x-axis: β11, β12, β13, β22, β23, β33.)
Figure 1: Network simulation parameters.
The intercept is set from the connectivity rates, while the age effect is set to 0.05 for all subjects. Using these parameters, we generated 100 realisations of undirected, binary networks on 50, 100 and 500 nodes. For the resulting estimates, we calculate the Root Mean Square Error (RMSE).
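The simulation design can be sketched as follows, assuming hypothetical values for the group prevalences, connectivity rates and subject ages (the actual settings PI1 to PI8 belong to Fig. 1 and are not reproduced here). All subjects share the same node labels, and the rate for each group pair comes from a logistic model with an intercept and an age slope. The names `simulate_subject`, `rmse`, `alpha`, `pi` and `ages` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)


def simulate_subject(z, intercept, slope, age, rng):
    """Draw one undirected binary network given shared group labels z."""
    # Group-pair rates for this subject through a logistic (GLM) link.
    rates = 1.0 / (1.0 + np.exp(-(intercept + slope * age)))
    prob = rates[z][:, z]                    # (n, n) edge probabilities
    upper = np.triu(rng.random(prob.shape) < prob, k=1)
    return (upper | upper.T).astype(int)     # symmetric, zero diagonal


def rmse(estimates, truth):
    """Root mean square error of estimates around the true value."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((estimates - truth) ** 2)))


n, Q = 50, 3
alpha = np.array([0.5, 0.3, 0.2])            # hypothetical group prevalences
pi = np.array([[0.30, 0.05, 0.05],           # hypothetical connectivity rates
               [0.05, 0.25, 0.05],
               [0.05, 0.05, 0.20]])
intercept = np.log(pi / (1.0 - pi))          # intercepts set from the rates
slope = np.full((Q, Q), 0.05)                # age effect
ages = np.linspace(-2.0, 2.0, 10)            # hypothetical centred ages, 10 subjects

z = rng.choice(Q, size=n, p=alpha)           # common community labels
networks = [simulate_subject(z, intercept, slope, a, rng) for a in ages]
```

Repeating the last two lines 100 times and passing each parameter's estimates to `rmse` together with its true value would mirror the evaluation described above, under these assumed settings.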
Real Data Results
We next considered real data from a drug study [3] with control and patient groups and 3 treatment arms (we consider only the aripiprazole arm). Functional connectivity for a 297-ROI atlas was computed from resting-state fMRI data. Binary networks were obtained by keeping the 10% strongest edges for each subject.
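A minimal sketch of this binarisation step, assuming each subject's functional connectivity is a symmetric matrix and that "strongest" means the largest values (whether absolute values were used is not stated here). The function name `binarise_top_edges` and the random toy time series are hypothetical.

```python
import numpy as np


def binarise_top_edges(fc, density=0.10):
    """Keep the strongest `density` fraction of edges of a symmetric matrix."""
    fc = np.asarray(fc, dtype=float)
    n = fc.shape[0]
    iu = np.triu_indices(n, k=1)             # each undirected edge once
    weights = fc[iu]
    n_keep = int(round(density * weights.size))
    adj = np.zeros((n, n), dtype=int)
    if n_keep > 0:
        # Indices of the n_keep largest weights (ties broken arbitrarily).
        top = np.argpartition(weights, -n_keep)[-n_keep:]
        adj[iu[0][top], iu[1][top]] = 1
    return adj + adj.T                       # symmetric, zero diagonal


# Toy example: correlation matrix of random time series over 297 regions.
toy_rng = np.random.default_rng(1)
ts = toy_rng.standard_normal((200, 297))
fc = np.corrcoef(ts, rowvar=False)
adj = binarise_top_edges(fc)
```

Thresholding on a fixed proportion of edges gives every subject the same network density, so differences between subjects reflect where the edges fall and not how many there are.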
Figure 5: Reorganised combined adjacency matrices for the common fit. While there are many similar patterns of connectivity, some appreciable differences are present.
Conclusions
We have proposed an extension to the ERMM, the ERMM-GLM, which allows us to model connectivity for a group of subjects while accounting for systematic differences between subjects expressed in a GLM. Our simulations show reasonable performance, though evaluation is very difficult in this (or any mixture model) setting, as there is ambiguity as to which estimated group corresponds to which "true" group. The real data analysis found a common set of 36 groups, with many similar connectivity patterns as well as some appreciable differences. Our future work will investigate this model in practical settings.
Figure 4: Average similarity between true and fitted communities, evaluated with Adjusted Rand Index (ARI) scores [5]. The ERMM-GLM accurately estimates the "true" communities, apart from the expected challenging cases (PI1 & PI5). (x-axis: PI1 to PI8.)
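The ARI does not depend on how groups are labelled, so it can compare fitted and true communities without first deciding which estimated group matches which true group. A self-contained sketch from the pair-counting definition follows; the function name `adjusted_rand_index` and the toy labels are illustrative, and at least two nodes are assumed.

```python
import numpy as np


def adjusted_rand_index(labels_true, labels_pred):
    """ARI computed from the contingency table of two partitions."""
    _, t = np.unique(np.asarray(labels_true).ravel(), return_inverse=True)
    _, p = np.unique(np.asarray(labels_pred).ravel(), return_inverse=True)
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(table, (t, p), 1)              # count co-occurrences

    def pairs(x):
        return x * (x - 1) / 2.0             # number of unordered pairs

    sum_cells = pairs(table).sum()
    sum_rows = pairs(table.sum(axis=1)).sum()
    sum_cols = pairs(table.sum(axis=0)).sum()
    total = pairs(len(t))
    expected = sum_rows * sum_cols / total
    maximum = 0.5 * (sum_rows + sum_cols)
    if maximum == expected:                  # degenerate case, e.g. one group each
        return 1.0
    return float((sum_cells - expected) / (maximum - expected))


# The same partition under a different labelling, and a partly wrong partition.
same = adjusted_rand_index([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1])
partial = adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])
```

A score of 1 indicates identical partitions up to relabelling, while values near 0 indicate agreement no better than chance.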
References
[1] Daudin et al., Statistics and Computing (2008).
[2] Firth, Biometrika 80.1 (1993): 27-38.
[3] Lynall et al., The Journal of Neuroscience (2010).
[4] Mariadassou et al., The Annals of Applied Statistics (2010): 715-742.
[5] Hubert et al., Journal of Classification, vol. 2 (1985).
[6] Pavlovic et al., PLOS ONE (2014).