Deep Neural Network for Supervised Inference of Gene Regulatory Network

Meroua Daoudi¹ (✉) and Souham Meshoul²

¹ Computer Science Dept, Constantine 2 University, Ali Mendjeli, Algeria
meroua.daoudi@univ-constantine2.dz
² MISC Laboratory, Constantine 2 University, Ali Mendjeli, Algeria
souham.mesoul@univ-constantine2.dz

Abstract. Inferring gene regulatory networks from gene expression data is a challenging task in systems biology. Elucidating the structure of these networks can be cast as a machine-learning problem. Several approaches have been proposed to address this challenge using unsupervised, semi-supervised and supervised methods. Semi-supervised and supervised methods rely primarily on support vector machines (SVM). Most supervised approaches infer local models, where each local model is associated with one transcription factor (TF). In this work, we propose a global model to infer gene regulatory networks from experimental data using a deep neural network architecture. We evaluate our method on the DREAM4 multifactorial datasets. The obtained results show that the prediction accuracy of the deep neural network exceeds that of SVM on all tested datasets.

Keywords: Deep neural network · Gene regulatory network · Machine learning · Supervised learning · SVM

1 Introduction

Inferring gene regulatory networks from gene expression data is an active field of research. A current aim of molecular biology is to deepen our knowledge of genomes. One way to understand an organism is to know the function of each gene as well as the interactions between genes. The characterization of gene regulatory networks (GRN) is now strongly supported by the scaling up of experimental methods in molecular biology. In particular, microarray technology measures changes in the expression of thousands of genes simultaneously. Inferring GRNs from expression data makes it possible to understand the relationships between transcription factors (TF) and target genes [1]. Learning the structure of gene regulatory networks from expression data is a machine-learning problem [2].
© Springer Nature Switzerland AG 2019
S. Chikhi et al. (Eds.): MISC 2018, LNNS 64, pp. 149–157, 2019.
https://doi.org/10.1007/978-3-030-05481-6_11

Several approaches have been proposed, including supervised, unsupervised and semi-supervised techniques. Four main categories of unsupervised models can be distinguished: Boolean models, Bayesian models, differential equation models and information-theoretic models. The Boolean network is one of the simplest: edges are represented by Boolean functions that model the interactions between genes, and the activity state of a gene is represented by a binary variable, active if the expression level of the gene is above a certain threshold and inactive otherwise. The REVerse Engineering ALgorithm (REVEAL) [3] uses the Boolean model to infer gene regulatory networks from expression data. Bayesian models are among the most effective for inferring GRNs; they make use of Bayes' rule and treat gene expression as a random variable. For example, Friedman et al. [4] introduced a framework for discovering interactions between genes using Bayesian networks. Differential equations are the most widely used class of dynamical models; they take into account the change in the concentration of metabolites over time [5]. Among information-theoretic models, most proposed approaches use mutual information to capture complex regulatory relations, including Relevance Networks [6], Context Likelihood of Relatedness (CLR) [7] and ARACNE, an algorithm for the reconstruction of gene regulatory networks [8]. The advantage of unsupervised methods is that they do not need any prior information about the system, but they tend to be less accurate. Recently, owing to the identification of a large number of interactions between transcription factors and target genes, several supervised and semi-supervised approaches have been proposed to infer gene regulatory networks. There are two approaches in semi-supervised learning.
The first approach learns from positive data only, and the second learns from both positive and unlabeled data. Using only positive data, a semi-supervised approach was proposed by Cerulo et al. [9]. This method works under the assumption that all the positive examples are randomly sampled from a uniform distribution. On the other hand, the authors of [1, 10, 11] propose semi-supervised approaches that learn from positive and unlabeled data. Patel and Wang [1] propose an iterative approach using random forest (RF) and SVM to predict the regulations of each TF with self-training. In [10, 11], the authors use clustering techniques to extract reliable negative examples from unlabeled data.

The proposed supervised approaches rely primarily on SVM, such as SIRENE [12] and CompareSVM [13]. SIRENE decomposes the problem of gene regulatory network inference into a large number of binary classification problems, each subproblem being associated with one TF, and uses SVM to predict the GRN. CompareSVM is a tool that compares four SVM kernel functions (linear, Gaussian, sigmoid and polynomial) in three steps: optimization, comparison and prediction. Supervised methods require gene expression data and known regulatory relationships between genes, but they are more accurate than unsupervised and semi-supervised methods. They can be trained efficiently even when only a portion of the interactions is known, as shown by Maetschke et al. [14]. Figure 1 shows the classification of the methods proposed in the literature according to the learning approach used.

The SVM classifier is widely used owing to the quality of the results it obtains on gene regulatory networks [12, 13]. On the other hand, the deep neural network is a powerful model inspired by the neural networks of the brain, and it has shown high performance in a wide range of applications [15]. Deep learning has been applied successfully to solve several prediction problems in bioinformatics [16].
Most supervised approaches infer local models, where each local model is associated with one TF. In this work, we propose a global model to infer gene regulatory networks. In this context, we propose a deep neural network architecture to infer gene regulatory networks from the DREAM4 multifactorial subchallenge datasets, which are designed for inferring the structure of large-scale GRNs. The results show that the proposed model outperforms SVM on all tested datasets.

Fig. 1. Classification of the methods proposed in the literature according to the learning approach used

2 System and Method

2.1 Data Classification

The main purpose of classification is to define rules for assigning objects to classes based on qualitative or quantitative variables characterizing these objects [17]. Initially, we have learning samples whose classification is known. These samples are used to learn the classification rules. However, before these rules are applied, they must be evaluated, and for this task a second independent sample, called the validation or test sample, is often used. Finally, the rules are used to classify new or unknown objects. In other words, classification is a predictive technique in which the classes are predetermined. The process consists of analyzing the characteristics of a newly presented element in order to assign it to a class from a predefined set. Classification has been applied to numerous fields and applications such as medical diagnosis, image processing, agriculture, chemistry, geology and automatic document processing. There are several methods and algorithms used for classification, such as linear and logistic regression, naïve Bayes, decision trees, the k-nearest neighbour algorithm, support vector machines and artificial neural networks.

2.2 Deep Neural Network

Deep learning is a branch of machine learning, which is itself a subfield of artificial intelligence.
Deep learning architectures are based on artificial neural networks, which are inspired by the neurons of the human brain. They are composed of several artificial neurons connected to each other [18]. The more layers of neurons a network has, the deeper it is. The power of today's computers and the explosion of accessible data have increased the effectiveness of deep learning. Over the last decade, deep learning has made significant progress in many areas such as image recognition, speech processing and natural language processing. A deep neural network consists of a succession of fully connected layers: an input layer, more than one hidden layer and an output layer. Its parameters are optimized by minimizing the misclassification error on the training dataset [15]. Each layer contains a set of neurons that apply an activation function, often nonlinear, to their input to produce an output.

2.3 DNN for Gene Regulatory Network Inference

The inference of gene regulatory networks using a supervised model aims to develop a model M that predicts the relationship between a transcription factor and its target genes. The main idea of supervised methods is that if two genes are known to interact, then any other two genes with similar expression profiles are also likely to interact with each other [12]. Following this principle, we concatenate the feature vectors of the TF and the target gene to construct a new feature vector that contains twice the number of features of a single gene. Most supervised approaches infer local models, where each local model is associated with one TF. In this work, we propose a global model to infer gene regulatory networks. The proposed approach is outlined in the algorithm below.
Algorithm: DNN for GRN inference
Input: Exp, Regul
For each pair (Gi, Gj) in Regul:
    Extract expression profile Exp(Gi)
    Extract expression profile Exp(Gj)
    Exp(Gi, Gj) = concatenate(Exp(Gi), Exp(Gj))
    If Regul(Gi, Gj) is positive then Exp(Gi, Gj) is labelled as 1
    Else Exp(Gi, Gj) is labelled as 0
auc = TrainDNN(Exp, labels)
Output: auc

The algorithm receives as input the expression file Exp and the regulation file Regul. For each pair (Gi, Gj) in Regul, the expression profiles of Gi and Gj are extracted from the expression file Exp, and a new feature vector is created by concatenating them. For example, let Exp1 be the expression profile of gene G1 and Exp2 the expression profile of gene G2; the concatenated vector of the regulation G1-G2 is shown in Fig. 2.

[Figure: Exp1 = (f1,1, f1,2, f1,3, f1,4) and Exp2 = (f2,1, f2,2, f2,3, f2,4) are concatenated into the new feature vector (f1,1, f1,2, f1,3, f1,4, f2,1, f2,2, f2,3, f2,4)]
Fig. 2. Resulting feature vector of regulation G1-G2

Note that the direction of the regulation is important: the resulting feature vector represents gene G1 regulating gene G2, not gene G2 regulating gene G1. Next, if the regulation is labelled as positive, the resulting vector is labelled 1, otherwise 0. A deep neural network is then trained to differentiate between positive and negative regulations. The proposed model is shown in Fig. 3.

[Figure: input layer, three hidden layers with ReLU activation, and a sigmoid output layer that separates interacting from non-interacting pairs]
Fig. 3. The architecture of the proposed deep neural network for gene regulatory network inference

As shown in Fig. 3, we propose a fully connected neural network with three hidden layers. Each neuron is connected to all neurons in the previous layer. The number of neurons in the input layer equals the number of features in the concatenated vector, that is, twice the number of features of a single gene. In this case, each concatenated vector contains 200 features, since each gene is measured under 100 conditions.
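The pair-construction step of the algorithm above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `build_pair_dataset`, the dictionary layout of the expression data and the (regulator, target, label) triples standing in for the regulation file are all hypothetical, and the toy profiles use 4 conditions per gene instead of the 100 found in DREAM4.

```python
from typing import Dict, List, Tuple

Profile = List[float]


def build_pair_dataset(
    exp: Dict[str, Profile],
    regul: List[Tuple[str, str, int]],
) -> Tuple[List[Profile], List[int]]:
    """Turn labelled (regulator, target) pairs into concatenated feature vectors.

    `exp` maps a gene name to its expression profile (assumed layout).
    `regul` lists (regulator, target, label) triples with label 1 or 0
    (assumed layout of the regulation file).
    """
    features: List[Profile] = []
    labels: List[int] = []
    for regulator, target, is_positive in regul:
        # Order matters: the regulator's profile comes first, the target's
        # second, so (G1, G2) and (G2, G1) give different feature vectors.
        features.append(exp[regulator] + exp[target])
        labels.append(1 if is_positive else 0)
    return features, labels


# Toy data: 3 genes measured under 4 conditions each.
exp = {
    "G1": [0.1, 0.2, 0.3, 0.4],
    "G2": [0.5, 0.6, 0.7, 0.8],
    "G3": [0.9, 1.0, 1.1, 1.2],
}
regul = [("G1", "G2", 1), ("G2", "G1", 0), ("G1", "G3", 0)]

X, y = build_pair_dataset(exp, regul)
print(len(X[0]))  # 8, i.e. twice the 4 conditions per gene
```

With 100 conditions per gene, the same function would yield the 200-dimensional input vectors described above.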
Each hidden layer also contains 200 neurons. We use the ReLU function as the activation function of the hidden layers, defined as:

$$\mathrm{ReLU}(x) = \max(x, 0)$$

ReLU is one of the most widely used activation functions in neural networks. As the classification here is binary, we use the sigmoid function in the output layer, because it predicts a probability, which ranges between zero and one. The sigmoid function is defined as:

$$\mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}}$$

Once the global model is trained, a new regulation Gi-Gj can be predicted as positive if there is a relationship between gene Gi and gene Gj, and as negative if there is no regulation.

3 Experimental Study

3.1 Datasets

We evaluated the proposed approach using the DREAM4 multifactorial datasets. This challenge aims to infer five networks from multifactorial perturbation data, each containing 100 genes and 100 samples. In the multifactorial experiments, a small number of genes are perturbed simultaneously. The data were simulated with GeneNetWeaver [19]. The topology of these networks was derived from the transcriptional regulatory systems of E. coli and S. cerevisiae.

3.2 Results

To evaluate the effectiveness of the developed inference method, tests are performed on the five benchmark networks of the DREAM4 multifactorial subchallenge, which are designed for inferring the structure of large-scale GRNs. We measure the prediction accuracy by the area under the receiver operating characteristic (ROC) curve (AUC). We adopt a cross-validation procedure to make sure that the performance of the model is measured on predictions for data not used in training. We compare our proposed model with SVM using the same procedure. We use 5-fold cross-validation, and the results of all folds are averaged. For SVM, we use the Gaussian kernel, which was found to be the best option for predicting GRNs from microarray data, with high accuracy and low standard deviation, as shown by Gillani et al. [13].
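The evaluation procedure described above can be sketched as follows. This is an illustrative outline rather than the code used in the experiments. It makes several assumptions: the folds are stratified by label (the text does not say how folds were drawn), the AUC is computed from its pairwise-comparison definition, and `fit_predict` is a hypothetical callback standing for any classifier (DNN or SVM) that is trained on one part of the data and returns scores for the rest. The toy scorer at the end is only a stand-in for a real model.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]
FitPredict = Callable[[List[Vector], List[int], List[Vector]], List[float]]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def stratified_folds(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Split indices into k folds, spreading each class evenly (assumption)."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(members)
        for position, index in enumerate(members):
            folds[position % k].append(index)
    return folds


def cross_validated_auc(
    X: List[Vector], y: List[int], fit_predict: FitPredict, k: int = 5, seed: int = 0
) -> float:
    """Train on k-1 folds, score the held-out fold, and average the k AUCs."""
    aucs = []
    for held_out in stratified_folds(y, k, seed):
        held_out_set = set(held_out)
        train = [i for i in range(len(y)) if i not in held_out_set]
        scores = fit_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in held_out]
        )
        aucs.append(roc_auc([y[i] for i in held_out], scores))
    return sum(aucs) / len(aucs)


def first_feature_scorer(X_train, y_train, X_test):
    """Stand-in 'model': ignores the training data, scores by the first feature."""
    return [x[0] for x in X_test]


# Toy, perfectly separable data: 5 positive and 5 negative pairs.
y = [1, 0] * 5
X = [[float(label), 0.0] for label in y]
mean_auc = cross_validated_auc(X, y, first_feature_scorer)
```

Passing the same folds and the same `roc_auc` function to both classifiers keeps the comparison between the two models on an equal footing.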
Table 1 shows the results obtained for the proposed DNN model and SVM. In addition, to assess the effectiveness of the proposed method, the ROC curves obtained by the proposed DNN and SVM are shown in Figs. 4, 5, 6, 7 and 8. These figures show the ROC curves together with the AUC obtained in each fold of the cross-validation procedure and the AUC averaged over all folds.

Table 1. AUROC for DREAM4 multifactorial challenge

Network      Proposed DNN   SVM
Network 1    0.82           0.75
Network 2    0.87           0.81
Network 3    0.70           0.67
Network 4    0.76           0.73
Network 5    0.78           0.77

The obtained results show that the prediction accuracy of the proposed DNN, in terms of averaged AUC, is better than that obtained using SVM, which means that the prediction of regulations between genes is more accurate with the proposed method than with SVM.

Fig. 4. Network 1 (panels: SVM, DNN)
Fig. 5. Network 2 (panels: SVM, DNN)
Fig. 6. Network 3 (panels: SVM, DNN)
Fig. 7. Network 4 (panels: DNN, SVM)
Fig. 8. Network 5 (panels: DNN, SVM)

4 Conclusion

Several approaches have been proposed to infer gene regulatory networks, including unsupervised, semi-supervised and supervised methods. Most of the proposed supervised models are based on SVM. In this work, we propose a DNN model to infer GRNs, with the aim of predicting new regulations from similar expression profiles and differentiating between negative and positive regulations. The method was compared with SVM, and the obtained results show that the DNN outperforms SVM even on small datasets. As ongoing work, we intend to integrate other types of data provided in the same challenge and to test the proposed model on larger datasets.

References

1. Patel, N., Wang, J.T.L.: Semi-supervised prediction of gene regulatory networks using machine learning algorithms. J. Biosci. 40(4), 731–740 (2015)
2. Ristevski, B.: A survey of models for inference of gene regulatory networks. Nonlinear Anal. Model. Control 18(4), 444–465 (2013)
3. Liang, S., Fuhrman, S., Somogyi, R.: REVEAL, a general reverse engineering algorithm for inference of genetic network architectures. In: Pacific Symposium on Biocomputing, vol. 3, pp. 18–29 (1998)
4. Friedman, N., et al.: Using Bayesian networks to analyze expression data. J. Comput. Biol. 7, 601–620 (2000)
5. Cao, J., et al.: Modeling gene regulation networks using ordinary differential equations. In: Next Generation Microarray Bioinformatics, pp. 185–197 (2012)
6. Butte, A.J., Kohane, I.S.: Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. In: Pacific Symposium on Biocomputing, pp. 418–429 (2000)
7. Meyer, P.E., Kontos, K., Lafitte, F., Bontempi, G.: Information-theoretic inference of large transcriptional regulatory networks. EURASIP J. Bioinform. Syst. Biol. (2007)
8. Margolin, A.A., et al.: ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinform. 7(1), S7 (2006)
9. Cerulo, L., et al.: Learning gene regulatory networks from only positive and unlabeled data. BMC Bioinform. (2010)
10. Jisha, A., Jereech, A.S.: Gene regulatory network: a semi supervised approach. In: International Conference on Electronics, Communication and Aerospace Technology (ICECA) (2017)
11. Sasmita, R., et al.: Handling unlabeled data in gene regulatory network. In: Proceedings of International Conference on Frontiers of Intelligent Computing, AISC 199, pp. 113–120. Springer, Heidelberg (2013)
12. Mordelet, F., Vert, J.P.: SIRENE: supervised inference of regulatory networks. Bioinformatics 24, i76–i82 (2008)
13. Gillani, Z., et al.: CompareSVM: supervised, support vector machine (SVM) inference of gene regularity networks. BMC Bioinform. (2014)
14. Maetschke, S.R., et al.: Supervised, semi-supervised and unsupervised inference of gene regulatory networks. Brief. Bioinform. 15(2), 195–211 (2013)
15. Buduma, N.: Fundamentals of Deep Learning: Designing Next-Generation Machine Intelligence Algorithms. O'Reilly Media, Boston (2017)
16. Min, S., Lee, B., Yoon, S.: Deep learning in bioinformatics. Brief. Bioinform. 18(5), 851–869 (2017)
17. Bramer, M.: Principles of Data Mining. Springer, London (2016)
18. Schmidhuber, J.: Deep learning in neural networks: an overview. Neural Netw. 61, 85–117 (2015)
19. Schaffter, T., et al.: GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods. Bioinformatics 2263–2270 (2011)