A Collective NMF Method for Detecting Protein Functional Module from Multiple Data Sources

Yuan Zhang*, Nan Du+, Liang Ge+, Kebin Jia*, Aidong Zhang+
* Beijing University of Technology, Beijing, 100124, China. zhangyuan@emails.bjut.edu.cn, kebinj@bjut.edu.cn
+ State University of New York at Buffalo, Buffalo, 14260, U.S.A. nandu, liangge, azhang@buffalo.edu

ABSTRACT

Detecting functional modules from protein-protein interaction (PPI) networks is an active research area with many practical applications. However, two concerns persist: the false interactions introduced by high-throughput experiments, and the unsatisfactory results obtained from a single PPI network that suffers from severe information insufficiency. To address this problem, we propose a Collective Non-negative Matrix Factorization (CoNMF) based soft clustering method which efficiently integrates Gene Ontology (GO) annotations, gene expression data and PPI networks. In our method, the three data sources are combined into two graphs with similarity adjacency matrices, and these graphs are jointly approximated by a matrix factorization with a common factor that provides a straightforward interpretation of the clustering results. Extensive experiments show that integrating multiple biological data sources improves module detection performance and that CoNMF yields superior results compared to other multi-source fusion methods, identifying a larger number of more precise protein modules with actual biological meaning and a certain degree of overlap.

Categories and Subject Descriptors

H.2.8 [Database Management]: Database Applications – Data Mining; J.3 [Life and Medical Sciences]: Biology and genetics

General Terms

Algorithms

1. INTRODUCTION

The techniques of identifying communities from networks, such as online social networks [17], mobile phone networks [18], scientific collaboration networks and biological networks [10], are of great use in helping us understand and further exploit these networks [27]. Previous studies have shown that the proteins belonging to the same module are densely connected with each other but sparsely connected with the other proteins in the network [22]. Protein functional modules can be understood as relatively independent sub-networks: proteins in the same module interact more frequently and show stronger functional dependencies in the cellular system. It is a common practice to adopt clustering methods to detect protein functional modules from PPI networks [1, 4]. These clustering methods can be broadly characterized as distance-based or graph-based [15]. Distance-based clustering methods use classic clustering techniques and focus on the definition of the distance between proteins. Graph-based clustering approaches consider the topology of PPI networks and partition the graph based on criteria such as maximizing the density of subgraphs or minimizing the cost of the cut that separates the graph.
However, these clustering methods have not provided satisfactory results because of the complex, incomplete and noisy nature of PPI networks. It is well known that a PPI network contains a great deal of noise and error: the rate of false positive links can be as high as 50% [25]. Solving this problem goes beyond what a single data source can provide and thus requires the integration of multiple information sources. Multiple high-throughput techniques, such as microarray expression profiling and mass spectrometry experiments, now provide assistance in dealing with false information. In addition, the Gene Ontology project, which aims at assigning functional annotations derived from small-scale experiments or published literature to genes and gene products, has matured considerably [21]. The challenging task therefore shifts to integrating these data sources in a manner that leads to more reliable and valid functional modules.

The other challenge we face is the overlap of functional modules in the PPI network. Since some proteins perform different cellular functions, such multifunctional proteins are expected to interact with distinct sets of partners, either simultaneously or not, depending on the function performed. Although the overlapping nature of protein functional modules has long been recognized, most existing clustering methods cannot handle overlapping clusters.

To address these challenges, we propose a multiple-graph clustering method based on collective symmetric non-negative matrix factorization (NMF). NMF is a popular matrix decomposition method which factorizes an input non-negative matrix into two non-negative matrices of lower rank via a multiplicative update algorithm [6]. NMF has proved useful for dimension reduction of image, text and signal data, and has also been applied in unsupervised settings in natural language processing, such as document clustering [14]. More recently, NMF was successfully utilized to find co-expressed genes in gene expression data, directly exploiting the dimensionality reduction nature of classic NMF by finding optimal low-rank factorizations of the high-dimensional data [6]. In our problem, however, we adopt multiple biological data sources, including gene expression data, gene ontology and the PPI network, and formulate them into two similarity-based graphs representing the relationships between pairwise proteins. Our objective is to find a consistent partition across all the graphs, i.e., the common factor of the graphs.

In this paper, we develop a graph clustering objective function based on symmetric NMF which simultaneously analyzes multiple similarity-based graphs. In our method, the clustering of multiple similarity-based graphs reduces to a multiplicative update algorithm which achieves locally optimal solutions. Since the desired cluster indicator is sparse, containing a large number of zeros, a 1-norm penalty on the matrix factor is introduced to achieve a sparser solution. Moreover, by setting an experimental threshold on the optimized matrix factor, we obtain overlapping clusters of the graphs.

The rest of the paper is organized as follows. In Section 2, we briefly introduce the related work on clustering from multiple data sources. In Section 3, we introduce the construction of the similarity-based graphs which integrate GO and gene expression data with the PPI network, respectively. In Section 4, the collective NMF method is proposed. Extensive experimental studies are carried out in Section 5, which show the improvement achieved by our CoNMF method. Finally, further discussions and conclusions are presented.

2. RELATED WORK

For mining clusters from multiple data sources, there are mainly three approaches: weighted summation of the original data, summation of kernels, and cluster ensemble methods. Given multiple similarity adjacency matrices A^{(m)} \in \mathbb{R}_+^{N \times N}, m = 1, 2, \ldots, M, derived from the data sources, we summarize the three approaches as follows.

Weighted Summation of Original Data. By a linear combination of all the similarity adjacency matrices with appropriate weights, the integrated similarity matrix A = \sum_{m=1}^{M} A^{(m)} is constructed. With this new matrix A, classical clustering methods such as spectral clustering [19] or Markov clustering (MCL) [13] can be used to find the clusters.
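As a concrete illustration of this weighted-summation baseline, the following sketch, in Python with NumPy and scikit-learn assumed to be available and with hypothetical variable names, sums the (optionally weighted) similarity matrices and hands the integrated matrix to an off-the-shelf clustering routine; MCL could be substituted for the spectral clustering step.

import numpy as np
from sklearn.cluster import SpectralClustering  # MCL could be used here instead

def weighted_summation_clustering(adj_list, weights=None, n_clusters=300):
    """Sum the (weighted) similarity adjacency matrices A^(m) and cluster the result.

    adj_list : list of N x N non-negative similarity matrices (hypothetical inputs).
    weights  : optional per-source weights; uniform weights give A = sum_m A^(m).
    """
    if weights is None:
        weights = np.ones(len(adj_list))
    A = sum(w * A_m for w, A_m in zip(weights, adj_list))
    # Cluster the integrated similarity matrix directly, treating it as an affinity.
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed").fit_predict(A)
    return labels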
Summation of Kernels. Given the original data sets, which are the multiple graphs in our problem, kernel-based clustering methods first map the data into some feature space F by means of a mapping Φ, and then group patterns in the feature space according to a similarity or dissimilarity criterion, where clusters are sets of similar patterns. One of the most relevant aspects of kernel-based clustering is that the distances between nodes can be computed in the kernel space without knowing the mapping Φ explicitly, by applying the so-called distance kernel trick. For the multiple-graph partition problem, the common approach is to sum the kernels of the individual graphs as K = \sum_{m=1}^{M} K^{(m)}. One particular example for graph partitioning is the spectral kernel [23]:

K^{(m)} = \sum_{k=1}^{d} v_k^{(m)} (v_k^{(m)})^T,    (1)

where v_k^{(m)} is the eigenvector corresponding to the k-th smallest eigenvalue of the graph Laplacian L^{(m)}, and d \ll N is the number of eigenvectors used per individual graph. Clustering can then be obtained by performing kernel K-means on K (a small sketch of this baseline is given at the end of this section).

Clustering Ensemble. Cluster ensembles have been studied by many researchers in the machine learning community. Strehl and Ghosh [24] proposed two approaches, an instance-based and a cluster-based approach, to formulating graph partitioning problems for cluster ensembles. The instance-based approach, denoted INENS, models each instance as a vertex in a graph and computes the similarity between a pair of instances according to how frequently they are clustered into the same clusters. The cluster-based approach, denoted CLENS, takes each cluster from all the clustering partitions as a vertex and uses the similarity between clusters, based on the percentage of instances they share, as the weight of each edge. Fern and Brodley [8] developed the Hybrid Bipartite Graph Formulation (HBGF), which constructs a bipartite graph over the instances and clusters and then applies a graph partitioning method to obtain the ensemble result. It is one of the most stable and effective approaches for combining cluster partitions.
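The sketch below, assuming NumPy and scikit-learn and using hypothetical names, illustrates the kernel-summation baseline of Equation 1: it builds each spectral kernel from the eigenvectors of an (unnormalized) graph Laplacian, sums the kernels, and then approximates kernel K-means by running ordinary K-means on the eigen-embedding of the summed kernel, which is equivalent for a positive semi-definite K.

import numpy as np
from sklearn.cluster import KMeans

def spectral_kernel(A, d):
    """Spectral kernel of one graph (Eq. 1): outer products of the d Laplacian
    eigenvectors associated with the smallest eigenvalues."""
    L = np.diag(A.sum(axis=1)) - A        # unnormalized Laplacian (an assumption)
    eigvals, eigvecs = np.linalg.eigh(L)  # eigh returns eigenvalues in ascending order
    V = eigvecs[:, :d]                    # the d smallest
    return V @ V.T

def kernel_summation_clustering(adj_list, d=50, n_clusters=300):
    K = sum(spectral_kernel(A, d) for A in adj_list)   # K = sum_m K^(m)
    # Kernel K-means via the eigen-embedding of the summed (PSD) kernel.
    w, U = np.linalg.eigh(K)
    X = U * np.sqrt(np.clip(w, 0.0, None))             # rows are embedded nodes
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)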
3. CONSTRUCTION OF RELATIONSHIP GRAPHS

In this section, we construct two relationship graphs by integrating co-expression correlation and GO functional similarity with the PPI network, respectively. By integrating multiple biological information sources we are able to alleviate the effect of false information in the PPI network.

3.1 Co-expression Correlation

Gene expression data has been used to enhance the reliability of PPI networks and to detect co-expressed protein modules in many clustering algorithms [4]. We use the Pearson correlation coefficient, normalized to the range of 0 to 1, to calculate the co-expression correlation, denoted CoExp. The co-expression correlation graph A^{(1)} is constructed by combining it with the PPI network as follows:

A^{(1)}(p_i, p_j) = \mathrm{CoExp}(p_i, p_j) \times Ppi(p_i, p_j).    (2)

3.2 GO-driven Similarity

The GO-driven similarity is also referred to as semantic similarity. It measures the similarity between two terms by quantifying the information the two terms share in common. One concept commonly used to quantify the information of terms is Information Content (IC), which gives a measure of how specific and informative a term is. Let C be a set of terms in the GO, and let p(c) be the probability of finding a child c ∈ C in the annotation structure. The IC of a term c can be quantified as the negative log likelihood, −log(p(c)). If c is the root term of the taxonomy, −log(p(c)) equals 0. One important model was proposed by Lin [16]; it can be seen as a normalized version of Resnik's model:

sim(c_1, c_2) = \frac{2 \times \max_{c \in S(c_1, c_2)} [\log(p(c))]}{\log(p(c_1)) + \log(p(c_2))},    (3)

where S(c_1, c_2) is the set of ancestor terms shared by c_1 and c_2.

The GO-driven similarity between two proteins, Sim(p_i, p_j), is calculated by aggregating the maximum inter-set similarity values over their GO term sets C_i and C_j (with m = |C_i| and n = |C_j|) as follows:

Sim(p_i, p_j) = \frac{1}{m \times n} \Big[ \sum_{k \in C_i} \max_{p \in C_j} sim(c_k, c_p) + \sum_{p \in C_j} \max_{k \in C_i} sim(c_k, c_p) \Big].    (4)

The GO-driven similarity is combined with the PPI network as in Equation 5, where Ppi is the adjacency matrix of the PPI network:

A^{(2)}(p_i, p_j) = Sim(p_i, p_j) \times Ppi(p_i, p_j).    (5)
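A minimal sketch of the graph construction in Equations 2 to 5 is given below, assuming a binary or weighted PPI adjacency matrix, a genes-by-conditions expression matrix, per-protein GO term sets, and a precomputed term-term similarity table from Equation 3; the rescaling of the Pearson correlation to [0, 1] is one possible choice, and all names are hypothetical.

import numpy as np

def coexpression_graph(expr, ppi):
    """Eq. 2: A^(1) = CoExp * Ppi, with the Pearson correlation rescaled to [0, 1]."""
    corr = np.corrcoef(expr)            # Pearson correlation between gene expression profiles
    coexp = (corr + 1.0) / 2.0          # one way to normalize to [0, 1] (an assumption)
    return coexp * ppi                   # keep only pairs that interact in the PPI network

def go_similarity(terms_i, terms_j, term_sim):
    """Eq. 4: aggregate the maximum inter-set term similarities (Lin measure, Eq. 3)."""
    S = np.array([[term_sim[a][b] for b in terms_j] for a in terms_i])
    m, n = S.shape
    return (S.max(axis=1).sum() + S.max(axis=0).sum()) / (m * n)

def go_graph(annotations, ppi, term_sim):
    """Eq. 5: A^(2) = Sim * Ppi, evaluated only on interacting, annotated protein pairs."""
    N = ppi.shape[0]
    A2 = np.zeros((N, N))
    for i, j in zip(*np.nonzero(ppi)):
        if annotations[i] and annotations[j]:
            A2[i, j] = go_similarity(annotations[i], annotations[j], term_sim) * ppi[i, j]
    return A2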
4. CONSENSUS CLUSTERING BASED ON COLLECTIVE SYMMETRIC NMF

NMF was first introduced as a dimension reduction method for pattern analysis. It has attracted great attention from researchers in many fields because of its straightforward interpretability: each observation can be explained as an additive linear combination of nonnegative basis vectors. In many real-world pattern analysis problems, non-negativity constraints are prevalent; for example, image pixel values, chemical compound concentrations and signal intensities are always nonnegative. Recently, NMF has received increasing attention for its application to data clustering. Previous work has shown the direct relationship between NMF and kernel K-means [7, 12]. NMF can model widely varying data distributions and accomplish both hard and soft clustering simultaneously [3].

In our problem, the data are formulated into similarity-based graphs as discussed in Section 3. In these graphs, each node corresponds to a protein and each edge corresponds to the similarity or relationship between a pair of proteins. When a similarity matrix is constructed for a graph, the factorization of this similarity matrix generates a clustering assignment matrix that is nonnegative and captures the cluster structure inherent in the graph representation. Different from the traditional NMF discussed below, similarity-based graph clustering is formulated as an alternative, symmetric NMF. In this section, in order to integrate multiple similarity relationships, we propose a new collective symmetric NMF method.

4.1 Symmetric NMF for Graph Clustering

NMF performs a lower-rank approximation by minimizing the distance between a non-negative matrix A and the product of its factor matrices. The typical NMF can be formulated as the following optimization problem [11]:

\min_{W, H \ge 0} \| A - W H^T \|_F^2,    (6)

where A \in \mathbb{R}_+^{m \times n}, W \in \mathbb{R}_+^{m \times k}, H \in \mathbb{R}_+^{n \times k}, \mathbb{R}_+ denotes the set of nonnegative real numbers, \| \cdot \|_F denotes the Frobenius norm, and k \ll \min\{m, n\}. As shown in Figure 1(a), the interpretation of NMF in a clustering problem can be explained in this way: the rows of W provide the clustering indicators for the rows of A, and the rows of H provide the clustering indicators for the columns of A.

Intuitively, the symmetric NMF defined in Equation 7 is more suitable for clustering a graph based on its similarity matrix, as illustrated in Figure 1(b). Since the similarity matrix of a graph is symmetric, the clustering indicators for its rows and columns are transposes of each other:

\min_{H \ge 0} \| A - H H^T \|_F^2.    (7)

Figure 1: W and H in (a) are the clustering indicator matrices for the non-negative matrix A; H in (b) is the clustering indicator matrix for the symmetric non-negative matrix A.

The nonnegativity constraint on H is crucial for its success, since the entry h_{ij} of H represents the real-valued membership weight of protein i in cluster j.

4.2 Collective Symmetric NMF

To simultaneously analyze multiple graphs and extract the consensus clusters, we construct a collective symmetric NMF model which aims at finding the common factor of all the graphs. Suppose we are given M graphs whose adjacency matrices are A^{(m)}, m = 1, 2, \ldots, M, each of size N × N and corresponding to the same nodes, i.e., the same proteins. The modified formulation is given as:

F = \min_{H} \frac{1}{2} \sum_{m=1}^{M} \| A^{(m)} - H H^T \|_F^2 + \eta \| H \|_F^2 + \beta \| H \|_1, \quad \text{s.t. } 0 \le H \le 1,    (8)

where H is the cluster indicator matrix. The regularization terms on H are added to improve numerical stability and avoid overfitting. Moreover, to achieve sparsity of the cluster indicator matrix, we include a 1-norm regularization term, which has been successfully utilized in a variety of sparse optimization problems [12]. Each row of the cluster indicator matrix H can be seen as a vector representing the probability of a protein occurring in each cluster. Hence, we impose the constraint that the elements of H must fall between 0 and 1, and we are then able to obtain overlapping clusters by setting an experimental threshold on it.

We minimize the cost function in Equation 8 by extending gradient descent and employing multiplicative update rules. Taking the partial derivative of the objective function F with respect to H yields, up to a constant factor:

-\frac{1}{2} \frac{\partial F}{\partial H} = \sum_{m=1}^{M} A^{(m)} H - \sum_{m=1}^{M} H H^T H - \eta H - \frac{\beta}{2}.    (9)

Given the objective function and its partial derivative, one can solve for H using a gradient descent approach. Here, we develop the following iterative update rule for the unknown factor, obtained by splitting the gradient into its positive and negative parts:

H \leftarrow H \circ \frac{\sum_{m=1}^{M} A^{(m)} H}{\sum_{m=1}^{M} H H^T H + \eta H + \beta / 2},    (10)

where ◦ stands for element-wise matrix multiplication and the division is also element-wise.
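A minimal NumPy sketch of the update rule in Equation 10 follows. The random initialization, the fixed iteration count, the small constant added to the denominator, and the clipping used to enforce 0 ≤ H ≤ 1 are illustrative assumptions rather than the authors' exact procedure; the final thresholding turns the soft memberships into possibly overlapping modules, as discussed in Section 5.4.

import numpy as np

def conmf(adj_list, k, eta=0.4, beta=0.04, n_iter=500, eps=1e-9, seed=0):
    """Collective symmetric NMF (Eq. 8) solved with the multiplicative rule of Eq. 10."""
    N = adj_list[0].shape[0]
    M = len(adj_list)
    rng = np.random.default_rng(seed)
    H = rng.uniform(0.0, 1.0, size=(N, k))     # random start in [0, 1] (an assumption)
    for _ in range(n_iter):
        numer = sum(A @ H for A in adj_list)    # sum_m A^(m) H
        denom = M * (H @ (H.T @ H)) + eta * H + beta / 2.0 + eps
        H *= numer / denom                      # element-wise multiplicative update
        H = np.clip(H, 0.0, 1.0)                # keep 0 <= H <= 1
    return H

def overlapping_modules(H, lam=0.72):
    """Threshold the soft memberships to obtain (possibly overlapping) modules."""
    return [np.flatnonzero(H[:, j] >= lam) for j in range(H.shape[1])]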
5. EXPERIMENT AND RESULTS

5.1 Experiment Setup

Our PPI data set is from Gavin [9]; it contains 2551 proteins, annotated with their GO function terms, and 21413 interactions retrieved by mass spectrometry of tandem affinity purification (TAP) data. We downloaded the GO slim file from http://www.geneontology.org/. Since we want to build a conservative similarity correlation matrix based only on experimentally determined annotations, avoiding any bias in our data set due to electronic annotation systems, we included the annotations under the evidence codes IDA, IEP, IGI, IMP, IPI, RCA and TAS, and excluded the codes IC, IEA, ISS, NAS and ND, because the latter are either inferred from electronic annotations or assigned by automated methods without curatorial judgment [21]. We used the January 14, 2012 version of the GO slim mapping file and chose the Biological Process (BP) hierarchy to calculate the GO-driven similarity of proteins. The gene expression data is from Brem [2], in which each gene is described by 131 expression values associated with 131 time courses during certain cell processes. To check whether the extracted modules correspond to real complexes, we compared our results with the CYC2008 benchmark dataset [20], a comprehensive catalogue of manually curated protein complexes in S. cerevisiae, reliably backed by small-scale experiments from the literature.

5.2 Comparison Criteria

Known complexes are available from those catalogued in the CYC2008 database, which serves as the gold-standard data. The gold-standard complexes Gc and the predicted clusters Pc are expected to match as closely as possible. Thus, the overlapping score OL(Pc, Gc) is used to find the matched complexes:

OL(P_c, G_c) = \frac{|V_{P_c} \cap V_{G_c}|^2}{|V_{P_c}| \times |V_{G_c}|},    (11)

where |V_{P_c}| is the size of the predicted cluster, |V_{G_c}| is the size of the known complex, and |V_{P_c} \cap V_{G_c}| is the size of their intersection. Pc and Gc are considered matched if their OL score is larger than a threshold δ, which is typically chosen as 0.2 [5, 26].

Based on the number of matched clusters, the F-measure is used to evaluate the matching results by taking both precision and recall into account. Precision is defined as Prec = TP/(TP + FP), where TP (true positive) is the number of predicted clusters matched with known complexes by OL ≥ δ and FP (false positive) is the total number of predicted clusters minus TP. Recall is defined as Rec = TP/(TP + FN), where FN (false negative) is the number of known complexes that are not matched by any predicted cluster. The F-measure is:

F\text{-measure} = \frac{2 \times Prec \times Rec}{Prec + Rec}.    (12)
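The evaluation in Equations 11 and 12 can be computed as in the short sketch below, where predicted clusters and known complexes are represented as sets of protein identifiers; the helper names are hypothetical.

def overlap_score(pred, known):
    """Eq. 11: OL(Pc, Gc) = |Vp ∩ Vg|^2 / (|Vp| * |Vg|)."""
    inter = len(pred & known)
    return inter * inter / (len(pred) * len(known))

def f_measure(predicted, complexes, delta=0.2):
    """Eq. 12, with TP, FP and FN counted from the OL >= delta matches."""
    tp = sum(1 for p in predicted if any(overlap_score(p, g) >= delta for g in complexes))
    fp = len(predicted) - tp
    fn = sum(1 for g in complexes if not any(overlap_score(p, g) >= delta for p in predicted))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0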
5.3 Comparison Result

We compared the performance of our method with the baseline methods introduced in Section 2. We implemented the MCL method, with its best parameter setting, on the weighted summation of the original data; this method is denoted WeiSum. We also ran traditional NMF on the GO-similarity graph and on the co-expression similarity graph, denoted NMFonGO and NMFonCo, respectively. For the cluster ensemble methods, K-means, hierarchical clustering and spectral clustering were implemented separately on the A^{(i)} matrices to obtain the base clusterings. In the clustering methods where an a priori cluster number k is needed, we set it to 300, and we set stopping rules for the hierarchical clustering method so that it also yields k clusters. For CoNMF, the parameters in Equation 8 were set to η = 0.4 and β = 0.04 according to extensive experiments. The comparison results are presented in Figure 2.

Figure 2: Comparison with baseline methods.

Moreover, we extracted all the matched modules from these clustering methods and counted the number of proteins they covered and the modules they detected. The results are shown in Table 1, where "Cover proteins" denotes the number of unique proteins detected by each method.

Table 1: The matched modules from the base clustering methods and the proposed method.

                   WeiSum  KerSpe  INENS  CLENS  HBGF  NMFGO  NMFCo  CoNMF
Matched modules       201      97    104    147   181    176    195    205
Average size         5.97    7.19   8.16   8.34  9.14   7.85  10.94  17.52
Cover proteins       1199     699    849   1226  1654   1496   2178   2277
Overlap proteins        -       -      -      -     -    220    354    752

From Figure 2, we can see that our method outperforms the other approaches, obtaining higher rates on all three evaluation criteria. Our method not only finds more matched modules from the PPI network but also obtains more precise cluster results than the others. The significant improvement is partly because the GO-driven similarity and the co-expression correlation measurement enhance the accuracy and completeness of the PPI network, as the comparison with NMFonGO and NMFonCo shows. In addition, the new objective function in our method extracts the consolidated cluster structure of both graphs more effectively than the other clustering methods that deal with multiple data sources. The poor performance of the kernel summation method arises because the structural patterns of the graphs do not necessarily combine linearly in the kernel space, and because finding the proper kernel for specific data is always a challenging problem. The cluster ensemble methods find the consensus clusters based on the meta-clusters produced from different clustering methods and data sources, but none of them is able to recover the overlapping proteins in the graph, which clearly affects their performance. The straightforward summation of the original datasets also achieves a relatively high F-measure, but its clusters tend to be smaller than ours, as Table 1 shows, and it cannot handle the overlapping problem either.

5.4 Parameter Sensitivity Analysis

We obtain overlapping clusters by setting a threshold λ on the factor matrix H. This parameter also affects the performance of CoNMF. We study the effect of the parameter choice and present the change of the F-measure and the overlapping rate in Figure 3. The F-measure generally increases at first as λ rises, reaching its highest value, F-measure = 0.617, around λ = 0.72, and then decreases slightly and gradually, while the overlapping rate (the number of proteins belonging to more than one module divided by the number of proteins covered by our method) decreases as λ increases. At λ = 0.72, the overlapping rate of the modules is 0.330. This behavior is consistent with the functional overlap of proteins. With a lower threshold, the clusters tend to absorb proteins that do not really belong to them, and the precision of the method drops. As λ increases, proteins are assigned to clusters more strictly; although the precision does not fall dramatically, the overlapping rate does.

Figure 3: Parameter effect on F-measure and overlapping rate.
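The parameter study above can be reproduced with a simple sweep such as the sketch below, which reuses the hypothetical overlapping_modules and f_measure helpers from the earlier sketches and computes the overlapping rate exactly as defined in this subsection; the id_to_protein mapping and the grid of λ values are assumptions.

import numpy as np

def overlapping_rate(modules):
    """Fraction of covered proteins that belong to more than one module."""
    counts = {}
    for module in modules:
        for protein in module:
            counts[protein] = counts.get(protein, 0) + 1
    covered = len(counts)
    multi = sum(1 for c in counts.values() if c > 1)
    return multi / covered if covered else 0.0

def sweep_lambda(H, complexes, id_to_protein, lambdas=np.arange(0.4, 1.001, 0.04)):
    """Trace F-measure and overlapping rate as the threshold lambda varies (cf. Figure 3)."""
    results = []
    for lam in lambdas:
        modules = [set(id_to_protein[i] for i in idx)
                   for idx in overlapping_modules(H, lam) if len(idx) > 0]
        results.append((lam, f_measure(modules, complexes), overlapping_rate(modules)))
    return results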
6. CONCLUSION

We have presented a novel approach, CoNMF, which adopts multiple non-negative matrix factorization models to detect protein functional modules. This method makes better use of multiple biological information sources. We applied our method to the TAP data and evaluated the results against the hand-curated complexes from the CYC2008 benchmark dataset. Compared with other clustering methods that deal with multiple sources, our methodology is superior in finding more modules with actual biological meaning, and it naturally tackles the overlap detection problem. In the present work, we integrate gene expression data and GO functional annotations with the PPI network, but the model can be extended to more similarity-based data sources and to other multiple-graph clustering problems.

7. ACKNOWLEDGMENTS

This research work is supported by the National Natural Science Foundation of China under Grant No. 30970780 and the Ph.D. Programs Foundation of the Ministry of Education of China under Grant No. 20091103110005.

8. REFERENCES

[1] Gary D. Bader and Christopher W. V. Hogue. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics, 4(1):2, 2003.
[2] Rachel B. Brem and Leonid Kruglyak. The landscape of genetic complexity across 5,700 gene expression traits in yeast. Proceedings of the National Academy of Sciences of the United States of America, 102(5):1572–1577, 2005.
[3] Yanhua Chen, Manjeet Rege, Ming Dong, and Jing Hua. Non-negative matrix factorization for semi-supervised data clustering. Knowledge and Information Systems, 17(3):355–379, 2008.
[4] Young-Rae Cho, Woochang Hwang, and Aidong Zhang. Efficient modularization of weighted protein interaction networks using k-hop graph reduction. In Sixth IEEE Symposium on BioInformatics and BioEngineering (BIBE 2006), pages 289–298, 2006.
[5] Young-Rae Cho, Lei Shi, and Aidong Zhang. FlowNet: Flow-based approach for efficient analysis of complex biological networks. In 2009 Ninth IEEE International Conference on Data Mining, pages 91–100, 2009.
[6] Karthik Devarajan. Nonnegative matrix factorization: An analytical and interpretive tool in computational biology. PLoS Computational Biology, 4(7), 2008.
[7] Chris Ding, Xiaofeng He, and Horst D. Simon. On the equivalence of nonnegative matrix factorization and spectral clustering. In Proceedings of the SIAM International Conference on Data Mining, pages 606–610, 2005.
[8] Xiaoli Zhang Fern and Carla E. Brodley. Solving cluster ensemble problems by bipartite graph partitioning. In Proceedings of the Twenty-First International Conference on Machine Learning (ICML '04), pages 36–41, 2004.
[9] A. C. Gavin, P. Aloy, P. Grandi, R. Krause, M. Boesche, M. Marzioch, C. Rau, L. J. Jensen, S. Bastuck, B. Dümpelfeld, et al. Proteome survey reveals modularity of the yeast cell machinery. Nature, 440(7084):631–636, 2006.
[10] M. Girvan and M. E. J. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America, 99(12):7821–7826, 2002.
[11] Hyunsoo Kim and Haesun Park. Nonnegative matrix factorization based on alternating nonnegativity constrained least squares and active set method. SIAM Journal on Matrix Analysis and Applications, 30(2):713, 2008.
[12] Jingu Kim and Haesun Park. Sparse nonnegative matrix factorization for clustering. Technical report, Georgia Institute of Technology, 2008.
[13] Nevan J. Krogan, Gerard Cagney, Haiyuan Yu, Gouqing Zhong, Xinghua Guo, Alexandr Ignatchenko, Joyce Li, Shuye Pu, Nira Datta, Aaron P. Tikuisis, et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature, 440(7084):637–643, 2006.
[14] D. D. Lee and H. S. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791, 1999.
[15] Chuan Lin, Young-Rae Cho, Woo-Chang Hwang, Pengjun Pei, and Aidong Zhang. Clustering methods in protein-protein interaction network. In Knowledge Discovery in Bioinformatics: Techniques, Methods, and Applications, 2006.
[16] Dekang Lin. An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning (ICML), pages 296–304, 1998.
[17] M. E. J. Newman. Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 74(3):036104, 2006.
[18] J.-P. Onnela, J. Saramäki, J. Hyvönen, G. Szabó, D. Lazer, K. Kaski, J. Kertész, and A.-L. Barabási. Structure and tie strengths in mobile communication networks. Proceedings of the National Academy of Sciences of the United States of America, 104(18):7332–7336, 2007.
[19] Ulrike von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.
[20] Shuye Pu, Jessica Wong, Brian Turner, Emerson Cho, and Shoshana J. Wodak. Up-to-date catalogues of yeast protein complexes. Nucleic Acids Research, 37(3):825–831, 2009.
[21] Seung Yon Rhee, Valerie Wood, Kara Dolinski, and Sorin Draghici. Use and misuse of the gene ontology annotations. Nature Reviews Genetics, 9(7):509–515, 2008.
[22] B. Schwikowski, P. Uetz, and S. Fields. A network of protein-protein interactions in yeast. Nature Biotechnology, 18(12):1257–1261, 2000.
[23] Alexander J. Smola and Risi Kondor. Kernels and regularization on graphs. In Learning Theory and Kernel Machines, volume 2777 of Lecture Notes in Computer Science, 2003.
[24] Alexander Strehl and Joydeep Ghosh. Cluster ensembles: a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3:583–617, 2003.
[25] Christian von Mering, Roland Krause, Berend Snel, Michael Cornell, Stephen G. Oliver, Stanley Fields, and Peer Bork. Comparative assessment of large-scale data sets of protein-protein interactions. Nature, 417(6887):399–403, 2002.
[26] Jianxin Wang, Min Li, Jianer Chen, and Yi Pan. A fast hierarchical clustering algorithm for functional modules discovery in protein interaction networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8(3):607–620, 2011.
[27] R. Wang, S. Zhang, Y. Wang, X. Zhang, and L. Chen. Clustering complex networks and biological networks by nonnegative matrix factorization with various similarity measures. Neurocomputing, 72(1-3):134–141, 2008.