Detecting Mutual Functional Gene Clusters from Multiple Related Diseases

Nan Du∗, Xiaoyi Li∗, Yuan Zhang† and Aidong Zhang∗
∗ Computer Science and Engineering Department, State University of New York at Buffalo, Buffalo, U.S.A.
{nandu,xiaoyili,azhang}@buffalo.edu
† College of Electronic Information and Control Engineering, Beijing University of Technology, Beijing, China
zhangyuan@emails.bjut.edu.cn

Abstract—Discovering functional gene clusters from gene expression data is a widely used method that offers a tremendous opportunity for understanding the functional genomics of a specific disease. Because this method can organize and interpret very large numbers of genes, many studies have detected and analyzed gene clusters for various diseases. However, a growing body of evidence suggests that human diseases are not isolated from each other. It is therefore significant and interesting to detect the common functional gene clusters that drive the core mechanisms shared among multiple related diseases. This task poses two main challenges: first, the gene expression data from each disease may contain noise; second, the common factors underlying multiple diseases are hard to detect. To address these challenges, we propose a novel deep architecture that discovers the mutual functional gene clusters across multiple types of diseases. To demonstrate that the proposed method can discover precise and meaningful gene clusters that are not directly obtainable with traditional methods, we perform extensive experimental studies on both synthetic and real datasets, the latter being public gene expression data of three types of cancer. Experimental results show that the proposed approach is highly effective in discovering mutual functional gene clusters.

I. INTRODUCTION

Gene cluster detection based on gene expression data is a widely used method that has proven helpful for understanding gene function, gene regulation, cellular processes, and subtypes of cells.
Genes with similar expression patterns (co-expressed genes) are grouped together into a gene cluster, and these genes are likely to be involved in the same cellular processes [11]. Such gene clusters can help us understand the functions of genes for which no information has previously been available [5]. A variety of methods have been proposed or used in the microarray literature to detect gene clusters for various kinds of diseases, but all of them cluster genes under a single disease or a single condition. Increasing evidence, however, shows that some diseases are very likely to be related to each other. A report from the National Cancer Institute [10] shows that mutations in BRCA1 and BRCA2, two well-known breast cancer genes, are associated with a significantly increased risk of ovarian cancer; [18] reported that women with uterine, ovarian, or breast cancer are at a statistically significant increased risk of colorectal cancer; moreover, women with breast cancer have been reported to have a significantly increased risk of developing a subsequent lung cancer [16]. We therefore believe that some core mechanisms or hidden factors may influence multiple related diseases simultaneously. Learning the mutual gene clusters shared among these diseases would provide not only a global view of human diseases, but also potentially new insights into their etiology and into the design of novel therapeutic interventions. To date, few unified mathematical models have been proposed to detect mutual functional gene clusters across multiple diseases.
The most straightforward way to find mutual functional gene clusters across various types of diseases is to use clustering ensemble methods, whose key idea is to detect gene clusters independently from each data source and then combine the multiple clustering results into a single consensus clustering. However, gene expression data usually contain noise arising from sample contamination, experimental design, or measurement errors, so a clustering result computed directly from any single dataset is often unreliable, let alone an ensemble of such results. In addition, among the many factors that affect a specific disease, only a few are mutual factors shared with the other diseases, and separating these mutual factors from the disease-exclusive ones is very challenging in an unsupervised setting. Recently, much effort has been devoted to developing learning algorithms for deep architectures such as Deep Belief Networks and Stacked Autoencoders, with impressive results in application domains such as computer vision and natural language processing [6]. The strong representational power of these models makes them suitable for our task. We therefore propose a novel deep architecture that effectively discovers the mutual functional gene clusters among multiple diseases. Our work differs from existing deep learning approaches in that it detects mutual clusters across multiple sources, whereas existing work focuses on a single data source and targets different problems. The proposed architecture has three layers, each with a specific responsibility: the first layer discovers the exclusive hidden factors that best represent each specific disease; the second layer extracts the mutual hidden factors shared by the diseases; and the third layer groups the genes into clusters based on the mutual hidden factors.
The overall structure of the proposed deep architecture is shown in Fig. 1. Since the major goal of the proposed approach is to detect the mutual gene clusters across multiple related diseases, we refer to it as Mutual Gene Cluster Detection (MGCD).

Fig. 1: Illustration of Overall Structure (1st layer: visible units $v^1, \ldots, v^C$ and hidden units $h^1, \ldots, h^C$; 2nd layer: mutual hidden units $m$; 3rd layer: cluster units).

In summary, this paper makes three main contributions:
• We investigate the problem of detecting the mutual gene clusters across multiple related diseases.
• We propose a novel deep architecture, MGCD, which is represented as a multilayer network.
• Experiments on synthetic datasets show that the proposed method readily detects mutual clusters from multiple sources. On real cancer datasets, meaningful mutual functional gene clusters are detected, and enrichment analysis suggests that the detected mutual gene clusters may reflect core mechanisms of cancers.

II. METHODOLOGY

In this section, we present our deep architecture MGCD for discovering the mutual gene clusters across multiple related diseases.

A. Problem Setting

We consider the problem of discovering the mutual gene clusters across multiple related diseases. To address this problem, we propose a deep architecture whose goal is to group the genes into clusters. Our task is summarized as follows. Suppose we have a set of gene expression data $\mathcal{W} = \{W^1, \ldots, W^C\}$ from $C$ different types of diseases. Each gene expression dataset $W^c$ ($1 \le c \le C$) is represented as an $N \times S^c$ expression matrix, where $N$ denotes the number of genes, $S^c$ denotes the number of samples for the $c$-th disease, and each entry $w^c_{ij}$ of $W^c$ is the measured expression level of the $i$-th gene in the $j$-th sample of the $c$-th disease. Note that although we assume the $N$ genes are the same across diseases, the samples of each specific disease ($S^c$, $1 \le c \le C$) can differ. Based on this set of gene expression matrices, we aim to find $K$ gene clusters shared among the diseases.

B. Single Disease Representation

As mentioned, the gene expression profile of each disease is very likely to be influenced by some hidden factors. We therefore discover the hidden factors of each specific disease in the first layer of our model. To represent a specific disease (say, the $c$-th disease) via hidden factors, we use a Restricted Boltzmann Machine (RBM), which consists of a set of visible units $v^c$ and a set of hidden units $h^c$. In our case, the visible unit vector $v^c \in \mathbb{R}^{1 \times V^c}$ holds the expression levels of one gene across the samples of the $c$-th disease, and the hidden unit vector $h^c \in \mathbb{R}^{1 \times H^c}$ represents the hidden factors we want to learn. Intuitively, the goal of learning in the first layer is to estimate the significance of all hidden factors given the observed data, so that the learned factors come as close as possible to the true hidden factors. The joint distribution over $v^c$ and $h^c$ is defined through the following energy function:

$$E(v^c, h^c) = -\sum_{i=1}^{V^c}\sum_{j=1}^{H^c} v^c_i w^c_{ij} h^c_j - \sum_{i=1}^{V^c} b^c_i v^c_i - \sum_{j=1}^{H^c} a^c_j h^c_j, \qquad (1)$$

where $V^c$ denotes the number of visible units, $H^c$ denotes the number of hidden units, $a^c \in \mathbb{R}^{1 \times H^c}$ represents the biases of the hidden units, and $b^c \in \mathbb{R}^{1 \times V^c}$ represents the biases of the visible units. Based on this energy function, the joint probability distribution of the visible and hidden units is

$$p(v^c, h^c) = \frac{\exp\big(-E(v^c, h^c)\big)}{Z^c}, \qquad (2)$$

where $Z^c = \sum_{v^c, h^c} \exp\big(-E(v^c, h^c)\big)$ is a normalization constant obtained by summing over all pairs of visible and hidden configurations. Because the RBM is a bipartite graph (Fig. 2), there are no direct connections between hidden units, and a hidden unit is activated with probability

$$p(h^c_j = 1 \mid v^c) = \sigma\Big(a^c_j + \sum_{i=1}^{V^c} v^c_i w^c_{ij}\Big), \qquad (3)$$

where $\sigma(x) = 1/(1 + \exp(-x))$ is the logistic sigmoid function. Since $h^c_j$ is the $j$-th hidden factor for the expression data of the $c$-th disease, the activation probability $p(h^c_j = 1 \mid v^c)$ indicates how significantly this factor affects the observed data. Using binary states for the hidden units helps avoid unnecessary sampling noise. Similarly, the state of a visible unit can be expressed in terms of the hidden units as

$$p(v^c_i = 1 \mid h^c) = \sigma\Big(b^c_i + \sum_{j=1}^{H^c} w^c_{ij} h^c_j\Big). \qquad (4)$$

Denoting the model parameters by $\theta = \{a^c, b^c, W^c\}$, the RBM can be trained by minimizing the negative log-likelihood with respect to $\theta$:

$$L(\theta) = -\sum_{v^c} \log \sum_{h^c} p(v^c, h^c), \qquad (5)$$

where the outer sum runs over the training vectors (i.e., the genes).

Fig. 2: Illustration of the First Layer's Network Structure (visible units $v^c$ connected to hidden units $h^c$ through the weights $W^c$).

C. Mutual Hidden Units Learning

Although genes may behave very differently in different diseases, there are subtle common hidden factors shared among the diseases that can help us accurately discover the mutual gene clusters. These common factors may be informative genes, core mechanisms, genetic mutations, or external stimuli that are not yet known. From the first layer, we have learned the hidden factors of each specific disease. Based on these disease-specific factors, we now want to find the mutual factors shared across the diseases. To identify the valid hidden factors shared among the diseases in an unsupervised setting, we train the second layer in a feature-selection fashion. Let $h \in \mathbb{R}^{1 \times H}$ be the concatenation of all hidden units from the first layer, where $H = H^1 + H^2 + \cdots + H^C$. Instead of directly mapping $h$ to a mutual hidden representation $m \in \mathbb{R}^{1 \times M}$ (Fig. 3), we first corrupt $h$ into a partially destroyed version $\tilde{h}$ by means of a stochastic mapping [21] and then use $\tilde{h}$ to train the model. Specifically, a fixed percentage $e$ of the units of $h$ is chosen at random and forced to zero, while the others are left untouched. The idea behind randomly zeroing some factors is to guess at random that these factors are not mutual factors. After $\tilde{h}$ is obtained from $h$, the mutual hidden units $m$ are computed from $\tilde{h}$ as

$$m = f_\theta(\tilde{h}) = \sigma(\tilde{h}\hat{W} + p), \qquad (6)$$

parametrized by $\theta = \{\hat{W}, p\}$, where $\hat{W}$ denotes the weight matrix of the second layer and $p$ represents the biases of the mutual hidden units. The resulting $m$ is then mapped back to a reconstructed vector $z \in \mathbb{R}^{1 \times H}$:

$$z = g_{\theta'}(m) = \sigma(m\hat{W}' + p'), \qquad (7)$$

parametrized by $\theta' = \{\hat{W}', p'\}$. The parameters of this model are optimized to minimize the average reconstruction error:

$$\theta^{*}, \theta'^{*} = \arg\min_{\theta, \theta'} \frac{1}{N}\sum_{i=1}^{N} L\big(h^{[i]}, z^{[i]}\big) = \arg\min_{\theta, \theta'} \frac{1}{N}\sum_{i=1}^{N} L\Big(h^{[i]}, g_{\theta'}\big(f_\theta(\tilde{h}^{[i]})\big)\Big). \qquad (8)$$

In Eq. 8, $L$ denotes the reconstruction cross-entropy, which treats $h$ and $z$ as vectors of bit probabilities (Bernoulli parameters):

$$L(h, z) = -\sum_{j=1}^{H} \big[h_j \log z_j + (1 - h_j)\log(1 - z_j)\big]. \qquad (9)$$

The mutual hidden units $m$, which can be viewed as a lossy compression of $h$, thus form a distributed representation that captures the common factors shared among multiple diseases.

Fig. 3: Illustration of the Second Layer's Network Structure (the $H$ concatenated hidden units $h$ connected to the $M$ mutual hidden units $m$ through $\hat{W}$).

D. Gene Cluster Detection

In the second layer, we have learned the mutual hidden factors underlying multiple diseases. In the third layer, we group the genes into clusters based on these mutual factors via a competitive learning network. To cluster the genes into $K$ clusters, we add $K$ cluster units. Each cluster unit corresponds to a specific cluster, and the cluster units are fully connected to the mutual hidden units through the weight matrix $\tilde{W}$. The network structure of the third layer is shown in Fig. 4. The third layer's network is trained with a competitive rule: when a gene, represented by its mutual hidden vector $m = (m_1, \ldots, m_M)$, is presented, the cluster unit $x$ with the smallest Euclidean distance to $m$ is selected:

$$x = \arg\min_{k \in \{1, \ldots, K\}} \sqrt{\sum_{j=1}^{M} \big(m_j - \tilde{w}_{kj}\big)^2}. \qquad (10)$$

The weights connecting this cluster unit $x$ are then updated as

$$\tilde{w}_{ij}(t+1) = \begin{cases} \tilde{w}_{ij}(t) + \eta\,\big(m_j - \tilde{w}_{ij}(t)\big), & \text{if } i = x,\\ \tilde{w}_{ij}(t), & \text{if } i \neq x, \end{cases} \qquad (11)$$

where $\eta \in (0, 1]$ is a learning rate that controls the speed of learning. On the one hand, the learning rate should be large enough to allow fast learning; on the other hand, it should be small enough to keep the updates stable. We therefore use the adaptive learning rate method [17], which automatically updates $\eta$, increasing it when the current estimate is expected to be far from the optimum and decreasing it when the distance is uncertain. The update rule shows that the cluster units compete with each other: only the "winner" (i.e., the cluster unit closest to the input) is updated, and the weights of the remaining units stay the same.

Fig. 4: Illustration of the Third Layer's Network Structure (the $M$ mutual hidden units connected to the $K$ cluster units through $\tilde{W}$).

Using a competitive network to detect the gene clusters has two main advantages: first, owing to the learning-rate adaptation, convergence is faster; second, as a network-based method, it makes it easy to map the detected cluster units back to the units of every previous layer. This provides not only important insights for understanding and validating the gene clusters, but also the opportunity to examine the specificity of each gene cluster for a certain disease. The cluster data reconstructed from $p(h^c \mid m, k)$ can be viewed as the centroid of the corresponding gene cluster.
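The competitive rule of Eqs. 10 and 11 can be illustrated with a minimal pure-Python sketch. It assumes each gene has already been mapped to its mutual hidden vector $m$. The farthest-point initialization, the fixed learning rate (MGCD instead adapts $\eta$ with the method of [17]), the epoch count, and all function names are illustrative assumptions rather than part of the published method.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (the quantity minimized in Eq. 10)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def winner(weights, m):
    """Eq. 10: index of the cluster unit whose weight row is closest to the mutual vector m."""
    return min(range(len(weights)), key=lambda k: euclidean(m, weights[k]))


def competitive_step(weights, m, eta):
    """Eq. 11: move only the winning row toward m; every other row is left unchanged."""
    x = winner(weights, m)
    weights[x] = [w + eta * (mj - w) for w, mj in zip(weights[x], m)]
    return x


def farthest_point_init(mutual, k):
    """Hypothetical initialization (not specified in the paper): greedy farthest-point seeding."""
    centers = [list(mutual[0])]
    while len(centers) < k:
        far = max(mutual, key=lambda m: min(euclidean(m, c) for c in centers))
        centers.append(list(far))
    return centers


def cluster_genes(mutual, k, eta=0.01, epochs=50, seed=0):
    """Assign each gene, given as its mutual hidden vector, to one of k cluster units.

    eta is held fixed here; this is a simplification of the adaptive rate used by MGCD.
    """
    rng = random.Random(seed)
    weights = farthest_point_init(mutual, k)
    order = list(range(len(mutual)))
    for _ in range(epochs):
        rng.shuffle(order)  # present the genes in a random order each epoch
        for g in order:
            competitive_step(weights, mutual[g], eta)
    labels = [winner(weights, m) for m in mutual]
    return labels, weights
```

For example, `cluster_genes([[0.10, 0.10], [0.12, 0.09], [0.90, 0.90], [0.88, 0.92]], k=2, eta=0.1)` places the first two toy genes in one cluster and the last two in the other. Only the winning row moves per input, which is what makes the rule competitive.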
Each gene cluster can be represented as a binary vector $k_j$ ($1 \le j \le K$) over the cluster units, in which the element corresponding to cluster $j$ is 1 and the remaining elements are 0. When this cluster unit backpropagates its value through the network, we learn how the cluster is represented by the units of each layer: $k_j\tilde{W}$ and $k_j\tilde{W}\hat{W}^{T}$ denote the $j$-th gene cluster's representation at the mutual hidden units and at the exclusive hidden units, respectively. Moreover, when the cluster unit backpropagates to the visible units of each disease, we obtain the standard gene expression representation of this gene cluster in each disease. Therefore, for a given gene cluster $i$, we know both its member genes' original expression profiles and its reconstructed centroid, and we can compute the distance from its members to the reconstructed centroid via the Root Mean Square Error (RMSE) for each type of disease. If a cluster's members are far from the reconstructed centroid in a certain disease, the gene cluster is likely to be less enriched in that disease. Although the detected gene clusters are believed to be shared among the diseases, some gene clusters may be more active in some diseases than in others. Knowing the specificity of each gene cluster for each disease therefore helps us better understand human diseases.

III. EXPERIMENTS ON SYNTHETIC DATASET

In this section, we conduct experiments on synthetic data to analyze the proposed method quantitatively.

A. Data Generation and Evaluation Metric

The synthetic data are generated under the assumption that some common clusters are shared across multiple data sources. We generate 200 objects (i.e., genes), divided into five clusters of 40 objects each, and these objects are measured by a number of features (i.e., samples). Objects in the same cluster have expression profiles drawn from the same distribution. Following this rule, we generate four sources, each containing the same genes but a different number of samples. Note that the distribution of a cluster may differ across sources. Furthermore, α percent of the samples are randomly chosen to receive unreliable expression profiles: their expression values are randomly shuffled and noise is added to them. This procedure is intended to simulate real gene expression data, in which some samples are unreliable.

Because of the way the synthetic data are generated, we know the label of each gene and can directly assess the quality of the results by measuring the discrepancy between the predicted mutual clustering and the ground-truth labels. We use the Normalized Mutual Information (NMI) criterion [20] as the evaluation metric. NMI is close to 0 for a random partition and equals 1 when the clustering result partitions the data exactly as the ground truth does.

B. Performance Comparison

To demonstrate the effectiveness of the proposed model, we first introduce several baselines. As discussed in Section I, an intuitive way to find the mutual gene clusters shared by various diseases is clustering ensemble. Our first baseline is therefore the Instance-Based Graph Formulation (IBGF) [20], which first constructs a graph measuring the pairwise similarity among objects based on the clustering result from each source, and then partitions this graph. [20] also proposed the Cluster-Based Graph Formulation (CBGF), which we use as the second baseline. Instead of measuring the similarity among objects, CBGF measures the similarity among the clusters of a given ensemble and partitions the resulting graph into groups so that the clusters in the same group correspond to one another. The third baseline, the Hybrid Bipartite Graph Formulation (HBGF) proposed by [7], models both the objects and the clusters of the ensemble as vertices of a bipartite graph and partitions the objects and clusters simultaneously. Besides these three graph-based clustering ensemble methods, we use as the fourth baseline the voting-based clustering ensemble method Adaptive Cumulative Voting (Ada-cvote) proposed by Ayad et al. [1]. The fifth baseline is Nonnegative Matrix Factorization (NMF), which has also been used to find common clusters from multiple sources [12].

To demonstrate MGCD's robustness to noise, we increase the noise rate α from 20% to 70% in steps of 10%. In this experiment, we initialize the learning rate to η = 0.01 and set the corruption parameter e = 10%. The NMI results of the proposed method and the baselines are shown in Fig. 5; each reported result is the average over 50 runs. For every method, the performance of detecting the mutual gene clusters drops as the noise rate increases. As the figure shows, the proposed method consistently outperforms the other methods at every noise rate, which demonstrates its strong robustness to noise. In addition, MGCD performs well at detecting the underlying mutual clusters among multiple sources, which we attribute to the ability of the second layer to extract the common factors effectively.

Fig. 5: Comparison with Different Noise Rates on the Synthetic Dataset (NMI versus noise rate α from 0.2 to 0.7 for IBGF, CBGF, HBGF, Ada-cvote, NMF, and MGCD).

C. Parameter Sensitivity

In this part, we show how the proposed method performs in various learning scenarios by tuning two variables: the number of hidden units for each disease and the number of mutual hidden units.
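All scores in Figs. 5 and 6 are NMI values [20]. As a reference for how such a score can be computed, the following is a minimal sketch using the normalization $I(A;B)/\sqrt{H(A)H(B)}$ of Strehl and Ghosh; the function name and the convention chosen for single-cluster partitions are our own assumptions.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """NMI(A, B) = I(A; B) / sqrt(H(A) * H(B)) for two labelings of the same objects."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # Mutual information from the contingency counts n_ab, n_a, n_b.
    mi = sum(
        (nab / n) * math.log(n * nab / (count_a[a] * count_b[b]))
        for (a, b), nab in joint.items()
    )
    h_a = -sum((c / n) * math.log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * math.log(c / n) for c in count_b.values())
    if h_a == 0 or h_b == 0:
        # Assumed convention: two single-cluster partitions agree fully; otherwise no information.
        return 1.0 if h_a == h_b else 0.0
    return mi / math.sqrt(h_a * h_b)
```

Because NMI depends only on which objects are grouped together, relabeling the clusters does not change the score; for instance, `nmi([0, 0, 1, 1], [1, 1, 0, 0])` is 1 up to rounding.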
First, we fix the number of mutual hidden units at 200 and vary the number of hidden units for each disease over 10, 40, 100, 200, and 400. Although the number of hidden units can differ across diseases, for simplicity we set them equal here. Fig. 6(a) shows the performance of the proposed method as a function of the number of hidden units. When the number of hidden units is extremely small (10), the performance is poor and the deviation is relatively large. As the number of hidden units increases further, the performance stabilizes and the deviation decreases. This suggests that each synthetic source can be represented well with 40 hidden units. Because real-life data are more complicated than the synthetic data, a larger number of hidden units should be chosen for them. Similarly, to examine the influence of the number of mutual hidden units, we fix the number of hidden units at 40 and vary the number of mutual hidden units over 10, 40, 100, 200, and 400. Fig. 6(b) shows the performance of the proposed method as a function of the number of mutual hidden units. The performance is rather stable, indicating that 40 mutual hidden units are sufficient for the multiple synthetic sources. Real disease gene expression data, however, may require more mutual hidden units.

Fig. 6: Performance of MGCD in terms of Varying Number of Hidden and Mutual Hidden Units. (a) NMI versus the number of hidden units for each disease; (b) NMI versus the number of mutual hidden units (both varied over 10, 40, 100, 200, and 400).

IV. EXPERIMENTS ON REAL DATASET

In Section III, we demonstrated that the proposed approach is effective in discovering the mutual clusters shared by multiple sources.
In this section, we apply the proposed method to real cancer datasets and show the meaningful mutual gene clusters it detects.

A. Data Set

Microarray gene expression data were collected for three cancer types: breast, prostate, and lung cancer. The breast cancer dataset was collected from 24 primary breast tumor patients, who were divided into two diagnostic categories (sensitive or resistant) based on their response to neoadjuvant treatment [3]; the prostate dataset [19] includes gene expression measurements for 52 prostate tumor patients; and the lung dataset [2] contains gene expression measurements for 186 lung tissue samples. We chose these three cancers as the target diseases because previous studies have shown that they are related to each other [16], [8]. Systematic analysis of mutual gene clusters can provide important insights into the cellular defects of cancer, so it is important and interesting to uncover the core mechanisms underlying these three related diseases.

B. Results

Based on these three cancer datasets, we detected 50 mutual gene clusters (i.e., K = 50) with the proposed approach. After filtering out the clusters with fewer than three members, 48 gene clusters remain, with sizes ranging from 3 to 276 genes. We then calculate the average distance between each gene cluster's members (i.e., their raw gene expression profiles) and the corresponding reconstructed disease-related pattern (i.e., the reconstructed gene expression profile described in Section II-D), and sort the clusters in ascending order of this average distance to obtain a ranking list. The average distance measures how faithfully a gene cluster reflects a specific disease: the higher a gene cluster ranks in the list, the more confident we are that it is a mutual gene cluster shared among the three cancers.
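The filtering and ranking steps above can be sketched as follows. The sketch assumes (this is not specified in the text) that a cluster's average distance is the mean per-gene RMSE over all diseases, and that each cluster is stored as a dictionary holding, for every disease, its members' profiles and its reconstructed centroid. The data layout and all names are hypothetical.

```python
import math


def rmse(profile, centroid):
    """Root mean square error between one gene's expression profile and a reconstructed centroid."""
    return math.sqrt(sum((x - c) ** 2 for x, c in zip(profile, centroid)) / len(profile))


def average_distance(members, centroids):
    """Mean RMSE of a cluster's members to its reconstructed centroid, pooled over all diseases.

    members[c] lists the member genes' profiles in disease c; centroids[c] is the
    cluster's reconstructed profile in disease c (hypothetical layout).
    """
    dists = [rmse(g, centroids[c]) for c in range(len(centroids)) for g in members[c]]
    return sum(dists) / len(dists)


def rank_clusters(clusters, min_size=3):
    """Drop clusters with fewer than min_size genes, then sort by ascending average distance."""
    kept = {name: d for name, d in clusters.items() if len(d["members"][0]) >= min_size}
    return sorted(kept, key=lambda name: average_distance(kept[name]["members"], kept[name]["centroids"]))
```

Since the $N$ genes are shared across diseases, the sketch reads the cluster size from the first disease's member list; the earliest names in the returned list correspond to the most confidently shared clusters.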
A widely used way to analyze gene clusters is to subdivide them into functional categories for biological interpretation, most commonly using Gene Ontology (GO) categories. GO provides biologists with gene annotations that can be used to infer the common biological functions of a group of genes instead of investigating each gene individually. Applying GO to the mutual gene clusters reveals the strongest and most significant gene functions that influence these types of cancer. Due to the page limit, Table I shows only the top 10 significant Biological Process (BP) annotations, i.e., those with the lowest p-values, for the top three stable gene clusters detected by the proposed approach. All of these p-values were computed by DAVID [9], and all are below 5.74E-03, far below the general threshold of 0.05. Thus, all the listed annotations of the detected mutual gene clusters are considered strongly enriched. Besides the p-values, we have also found evidence relating these annotations to the corresponding type of cancer. Three information sources are mainly considered:
1) The Genes-to-Systems Breast Cancer (G2SBC) Database [14] is a bioinformatics resource that collects and integrates data about genes, transcripts, proteins, and ontologies that have been reported in the literature to be altered in breast cancer cells;
2) The cancer miRNA regulatory network (CMRN) [15] was constructed by inferring miRNA-mediated regulation from different cancer transcriptome profiling studies, and it also provides the gene enrichment detected for cancers in previous studies;
3) Brig Mecham et al. used the Gene Information Content statistic [13] to measure the amount of activity of each gene in several prostate datasets and, based on their findings, listed 500 GO terms that are likely to be related to prostate cancer.
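DAVID [9] reports a modified Fisher exact p-value (the EASE score). To illustrate what such an enrichment p-value measures, the sketch below computes the unmodified one-sided hypergeometric (Fisher exact) tail probability of observing at least k annotated genes in a cluster. It is not DAVID's exact procedure, and the counts in the usage line are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    k: genes in the cluster annotated with the GO term
    n: cluster size
    K: genes in the background annotated with the term
    N: background size
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts must satisfy 0 <= k <= n <= N and 0 <= K <= N")
    upper = min(n, K)
    # Sum the probabilities of every overlap at least as large as the observed one.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```

For a hypothetical cluster of 20 genes in which 6 carry a term that annotates 200 of 10,000 background genes, `enrichment_p_value(6, 20, 200, 10000)` falls far below 0.05, because only 0.4 annotated genes would be expected by chance. Exact integer arithmetic from `math.comb` avoids overflow for realistic background sizes.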
For each annotation, Table I also lists the evidence relating it to the target cancers. The table shows that each detected gene cluster has annotations that have been reported to be related to each of the three cancers. In other words, at least from the viewpoint of gene annotations, the detected mutual gene clusters are very likely to be truly shared by the three related cancers.

TABLE I: Functional Description of Three Mutual Gene Clusters (GO term, p-value)
Cluster 1: GO:0019226 (6.16E-04), GO:0048878 (6.89E-04), GO:0007268 (9.46E-04), GO:0006873 (9.86E-04), GO:0055082 (1.1E-03)
Cluster 2: GO:0008284 (1.55E-04), GO:0042127 (2.07E-04), GO:0035150 (2.15E-04), GO:0050880 (2.15E-04), GO:0051270 (3.33E-04), GO:0003018 (3.54E-04), GO:0030334 (4.60E-04), GO:0042311 (7.81E-04)
Cluster 3: GO:0007423 (1.04E-05), GO:0035295 (5.42E-04), GO:0060429 (6.39E-04), GO:0001568 (9.52E-04), GO:0001944 (1.07E-03)
The evidence columns (Breast, Lung, Prostate) cite [13], [14], and [15].

V. RELATED WORK

Broadly speaking, our work is related to clustering ensemble methods with various strategies, such as graph-based methods [20], [7] and voting-based methods [1]. All of these studies focus on combining multiple base clustering results to generate a stable and robust consensus clustering. However, as discussed in Section I, the performance of these methods depends heavily on the base clustering results. In our case, the raw gene expression data contain noise, so the base clustering results are also unreliable. In addition, the synthetic experiments in Section III showed that the clustering-ensemble-based methods perform poorly at discovering the underlying mutual gene clusters under high noise rates. Besides the topics mentioned above, Xu et al. [22] conducted a comparative study identifying genetic and serum markers for multiple cancer types based on microarray gene expression data, and Chen et al. [4] estimated expression pattern similarities between several different tumor tissues and their corresponding normal tissues. Unlike these studies, which detect mutual information at the gene level (i.e., informative genes, biomarkers, or differentially expressed genes), we focus on discovering mutual information at the cluster level.

VI. CONCLUSION

In this paper, we introduced the new problem of detecting mutual gene clusters from the gene expression data of several related diseases. Such mutual gene clusters reflect core mechanisms or hidden factors that may influence multiple related diseases. To handle this problem, we proposed a novel deep architecture, MGCD, which is represented as a multilayer network. In this network, the exclusive hidden factors of each disease are discovered in the first layer, the mutual hidden factors are extracted from the exclusive factors in the second layer, and the mutual gene clusters are detected in the third layer. Our extensive experimental analysis demonstrated that the proposed method is effective on synthetic datasets, and case studies on three real cancer datasets showed that it can reveal meaningful and interesting mutual gene clusters.

VII. ACKNOWLEDGMENTS

The materials published in this paper are partially supported by the National Science Foundation under Grants No. 1218393, No. 1016929, and No. 0101244.

REFERENCES

[1] H. G. Ayad and M. S. Kamel. Cumulative voting consensus method for partitions with variable number of clusters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(1):160–173, 2008.
[2] A. Bhattacharjee, W. G. Richards, J. Staunton, et al. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proceedings of the National Academy of Sciences of the United States of America, 98(24):13790–13795, 2001.
[3] J. C. Chang, E. C. Wooten, A. Tsimelzon, et al. Gene expression profiling for the prediction of therapeutic response to docetaxel in patients with breast cancer. The Lancet, 362(9381):362–369, 2003.
[4] M. Chen, J. Xiao, Z. Zhang, J. Liu, J. Wu, and J. Yu. Identification of human HK genes and gene expression regulation study in cancer from transcriptomics data analysis. PLoS ONE, 8:e54082, Jan. 2013.
[5] M. B. Eisen, P. T. Spellman, P. O. Brown, and D. Botstein. Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences of the United States of America, 95(25):14863–14868, 1998.
[6] D. Erhan, A. Courville, and P. Vincent. Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, 11:625–660, 2010.
[7] X. Z. Fern and C. E. Brodley. Solving cluster ensemble problems by bipartite graph partitioning. In Proceedings of the Twenty-First International Conference on Machine Learning (ICML '04), pages 36–, New York, NY, USA, 2004. ACM.
[8] J. H. Hankin, L. P. Zhao, L. R. Wilkens, and L. N. Kolonel. Attributable risk of breast, prostate, and lung cancer in Hawaii due to saturated fat. Cancer Causes & Control, 3(1):17–23, 1992.
[9] D. W. Huang, B. T. Sherman, and R. A. Lempicki. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature Protocols, 4(1):44–57, 2009.
[10] National Cancer Institute. Genetics of breast and ovarian cancer: Peutz-Jeghers syndrome, Aug. 2006.
[11] D. Jiang, C. Tang, and A. Zhang. Cluster analysis for gene expression data: A survey. IEEE Transactions on Knowledge and Data Engineering, 16:1370–1386, 2004.
[12] C. M. Lee, M. A. V. Mudaliar, et al. Simultaneous non-negative matrix factorization for multiple large scale gene expression datasets in toxicology. PLoS ONE, 7(12):e48238, 2012.
[13] B. Mecham. Top500 GO terms of prostate cancer, Feb. 2011.
[14] E. Mosca, R. Alfieri, I. Merelli, F. Viti, A. Calabria, and L. Milanesi. A multilevel data integration resource for breast cancer study. BMC Systems Biology, 4(1):1–11, 2010.
[15] C. L. Plaisier, M. Pan, and N. S. Baliga. A miRNA-regulatory network explains how dysregulated miRNAs perturb oncogenic processes across diverse cancers. Genome Research, (206):gr.133991.111–, 2012.
[16] M. Prochazka et al. Lung cancer risks in women with previous breast cancer. Annals of Oncology: Official Journal of the European Society for Medical Oncology (ESMO), 285(1):3090–3091, 2002.
[17] R. Ranganath, C. Wang, D. Blei, and E. Xing. An adaptive learning rate for stochastic variational inference. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), volume 28, pages 298–306, May 2013.
[18] R. E. Schoen, J. L. Weissfeld, and L. H. Kuller. Are women with breast, endometrial, or ovarian cancer at increased risk for colorectal cancer? American Journal of Gastroenterology, 89(6):835–842, 1994.
[19] D. Singh, P. G. Febbo, K. Ross, D. G. Jackson, J. Manola, C. Ladd, et al. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 1(2):203–209, 2002.
[20] A. Strehl and J. Ghosh. Cluster ensembles: a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3:583–617, Mar. 2003.
[21] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, pages 1096–1103, 2008.
[22] K. Xu, J. Cui, V. Olman, Q. Yang, D. Puett, and Y. Xu. A comparative analysis of gene-expression data of multiple cancer types. PLoS ONE, 5:e13696, Oct. 2010.