Deep Neural Network for Supervised Inference
of Gene Regulatory Network
Meroua Daoudi1 (corresponding author) and Souham Meshoul2
1 Computer Science Dept, Constantine 2 University, Ali Mendjeli, Algeria
meroua.daoudi@univ-constantine2.dz
2 MISC Laboratory, Constantine 2 University, Ali Mendjeli, Algeria
souham.mesoul@univ-constantine2.dz
Abstract. Inferring gene regulatory networks from gene expression data is a
challenging task in systems biology. Elucidating the structure of these networks
is a machine-learning problem. Several approaches have been proposed to
address this challenge using unsupervised, semi-supervised and supervised
methods. Semi-supervised and supervised methods rely primarily on SVM. Most
supervised approaches infer local models, where each local model is associated
with one TF. In this work, we propose a global model to infer gene regulatory
networks from experimental data using a deep neural network architecture. We
evaluate our method on the DREAM4 multifactorial datasets. The obtained results
show that the prediction accuracy of the deep neural network exceeds that of
SVM on all tested data.
Keywords: Deep neural network · Gene regulatory network · Machine learning ·
Supervised learning · SVM
1 Introduction
Inferring gene regulatory networks from gene expression data is an active field of
research. A current interest of molecular biology is to deepen the knowledge of
genomes. One way to understand an organism is to know the function of each gene as
well as the interactions between genes. The characterization of gene regulatory
networks (GRNs) is now strongly supported by the scaling up of experimental methods in
molecular biology. In particular, microarray technology measures changes in the
expression of thousands of genes simultaneously. Inferring GRNs from expression
data makes it possible to understand the relationships between transcription factors
(TFs) and target genes [1].
Learning the structure of gene regulatory networks from expression data is a
machine-learning problem [2]. Several approaches have been proposed, including
supervised, unsupervised and semi-supervised techniques. Four main categories of
unsupervised models can be distinguished: Boolean models, Bayesian models,
differential equation models and information-theoretic models. The Boolean network is
one of the simplest: edges are represented by Boolean functions that model the
interactions between genes, and the activity state of a gene is represented by a binary
variable, which is active if the expression level of the gene is above a certain
threshold and inactive otherwise. The REVerse Engineering ALgorithm (REVEAL) [3] uses
the Boolean model to infer gene regulatory networks from expression data. Bayesian
models are among the most effective for inferring GRNs; they make use of Bayes' rule
and consider gene expression as a random variable. For example, Friedman et al. [4]
introduced a framework for discovering the interactions between genes using Bayesian
networks. Differential equations are the most widely used class of dynamical models;
they take into consideration the change in concentration of metabolites over time [5].
In information-theoretic models, most proposed approaches use mutual information to
capture complex regulatory relationships, including Relevance Networks [6], Context
Likelihood of Relatedness (CLR) [7] and ARACNE, an algorithm for the reconstruction of
gene regulatory networks [8]. The advantage of unsupervised methods is that they do
not need any prior information about the system, but they are less effective.
© Springer Nature Switzerland AG 2019
S. Chikhi et al. (Eds.): MISC 2018, LNNS 64, pp. 149–157, 2019.
https://doi.org/10.1007/978-3-030-05481-6_11
Recently, owing to the identification of a large number of interactions between
transcription factors and target genes, several supervised and semi-supervised
approaches have been proposed to infer gene regulatory networks. There are two
approaches in semi-supervised learning. The first learns from positive data only, and
the second learns from both positive and unlabeled data. Using only positive
data, a semi-supervised approach was proposed by Cerulo et al. [9]. This method works
under the assumption that all the positive examples are randomly sampled from a
uniform distribution. On the other hand, the authors of [1, 10, 11] propose
semi-supervised approaches that learn from positive and unlabeled data. Patel and
Wang [1] propose an iterative approach using Random Forest (RF) and SVM to predict the
regulations of each TF with self-training. In [10, 11], the authors use clustering
techniques to extract reliable negative examples from unlabeled data.
Proposed supervised approaches, such as SIRENE [12] and CompareSVM [13], rely
primarily on SVM. SIRENE decomposes the problem of gene regulatory network
inference into a large number of binary classification problems, where each subproblem
is associated with one TF, and SVM is used to predict the GRN. CompareSVM is a tool
that compares four SVM kernel functions (linear, Gaussian, sigmoid and polynomial)
in three steps: optimization, comparison and prediction. Supervised
methods require gene expression data and known regulatory relationships between genes,
but they are more accurate than unsupervised and semi-supervised methods. They can be
trained efficiently even when only a portion of the interactions is known, as shown by
Maetschke et al. [14]. Figure 1 shows the classification of the methods proposed in
the literature according to the learning approach used.
SVM classifiers are widely used owing to the quality of the results they obtain for
gene regulatory networks [12, 13]. On the other hand, the deep neural network is a
powerful model, inspired by the neural networks of the brain, that has achieved high
performance in a wide range of applications [15]. Deep learning has been applied
successfully to solve several prediction problems in bioinformatics [16]. Most
supervised approaches infer local models, where each local model is associated with
one TF. In this work, we propose a global model to infer gene regulatory networks. In
this context, we propose a deep neural network architecture to infer gene regulatory
networks from the DREAM4 multifactorial subchallenge datasets, which are designed for
inferring the structure of large-scale GRNs. The results show that the proposed model
outperforms SVM on all tested data.
Fig. 1. Classification of proposed methods in literature according to the learning approach used
2 System and Method
2.1 Data Classification
The main purpose of classification is to define rules for assigning objects to classes
based on qualitative or quantitative variables characterizing these objects [17]. Initially,
we have learning samples whose classification is known. These samples are used for
learning classification rules. However, before these classification rules are applied, they
must be evaluated, and for this task a second independent sample, called the validation
or test sample, is often used. Finally, these rules are used to classify new or unknown
objects. In other words, classification is one of the predictive techniques in which the
classes are predetermined. The process consists in analyzing the characteristics of a
newly presented element in order to assign it to a class from a predefined set.
Classification has been applied in numerous fields and applications such as medical
diagnosis, image processing, agriculture, chemistry, geology, and automatic document
processing. Several methods and algorithms are used for classification,
such as linear and logistic regression, naïve Bayes, decision trees, the k-nearest
neighbour algorithm, support vector machines, and artificial neural networks.
2.2 Deep Neural Network
Deep learning is a form of artificial intelligence derived from machine learning.
Deep learning architectures are based on artificial neural networks, which are
inspired by the neurons of the human brain. They are composed of several artificial
neurons connected to each other [18]. The higher the number of layers of neurons, the
deeper the network. The power of today's computers and the explosion of accessible data
have increased the effectiveness of deep learning. Over the last decade, deep learning
has made significant progress in many areas such as image recognition, speech processing
and natural language processing. A deep neural network consists of a
succession of fully connected layers: an input layer, more than one
hidden layer and an output layer. Its parameters are optimized by minimizing the
misclassification error on the training datasets [15]. Each layer contains a set of
neurons that apply an activation function, often nonlinear, to their input to produce an
output.
2.3 DNN for Gene Regulatory Network Inference
The inference of gene regulatory networks using a supervised method aims to develop a
model M that helps to predict the relationships between a transcription factor and its
target genes. The underlying principle of supervised methods is that if two genes have
a relationship, then any other two genes with similar expression profiles are also
likely to interact with each other [12]. Under this principle, we concatenate the
feature vectors of the TF and the target gene to construct a new feature vector that
contains twice the number of features of a single gene. Most supervised approaches
infer local models, where each local model is associated with one TF. In this work, we
propose a global model to infer gene regulatory networks. The proposed approach is
outlined by the algorithm below.
Algorithm: DNN for GRN inference
Input: Exp, Regul
For each gene pair (Gi, Gj) in Regul:
    Extract expression profile Exp(Gi)
    Extract expression profile Exp(Gj)
    Exp(Gi, Gj) = concatenate(Exp(Gi), Exp(Gj))
    If Regul(Gi, Gj) is positive then
        Exp(Gi, Gj) is labeled as 1
    Else
        Exp(Gi, Gj) is labeled as 0
auc = TrainDNN(Exp, labels)
Output: auc
The algorithm receives as input the expression file Exp and the regulation file
Regul. For each pair (Gi, Gj) belonging to Regul, the expression profiles of both Gi
and Gj are extracted from the expression file Exp, and a new feature vector is then
created. For example, let Exp1 be the expression profile of gene G1 and Exp2 the
expression profile of gene G2; the concatenated vector of regulation G1-G2 is shown in
Fig. 2.
[Figure: Exp1 = (f1,1, f1,2, f1,3, f1,4) and Exp2 = (f2,1, f2,2, f2,3, f2,4) are
concatenated into the new feature vector (f1,1, f1,2, f1,3, f1,4, f2,1, f2,2, f2,3, f2,4)]
Fig. 2. Resulting feature vector of regulation G1-G2
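The construction of labelled, concatenated feature vectors described in this section can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the function name build_pair_dataset, the dictionary-based expression table and the toy values are hypothetical and do not come from the original implementation.

```python
from typing import Dict, List, Set, Tuple


def build_pair_dataset(
    exp: Dict[str, List[float]],
    candidate_pairs: List[Tuple[str, str]],
    positives: Set[Tuple[str, str]],
) -> Tuple[List[List[float]], List[int]]:
    """Concatenate the two expression profiles of each ordered pair and label it."""
    features: List[List[float]] = []
    labels: List[int] = []
    for regulator, target in candidate_pairs:
        # Order matters: (regulator, target) encodes "regulator regulates target".
        features.append(exp[regulator] + exp[target])
        # Label 1 for a known (positive) regulation, 0 otherwise.
        labels.append(1 if (regulator, target) in positives else 0)
    return features, labels


# Toy data (hypothetical): three genes measured under four conditions.
exp = {
    "G1": [0.1, 0.2, 0.3, 0.4],
    "G2": [0.5, 0.6, 0.7, 0.8],
    "G3": [0.9, 1.0, 1.1, 1.2],
}
pairs = [("G1", "G2"), ("G2", "G1"), ("G1", "G3")]
positives = {("G1", "G2")}

X, y = build_pair_dataset(exp, pairs, positives)
```

In this toy case each gene has four features, so every concatenated vector has eight; the pairs (G1, G2) and (G2, G1) yield different vectors and may carry different labels.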
Note that the direction of the regulation is important: the resulting feature vector
means that gene G1 regulates gene G2, and not that gene G2 regulates gene G1. Next,
if the regulation is labelled as positive, the resulting vector is labelled 1,
otherwise 0. A deep neural network is then trained to differentiate between positive
and negative regulations. The proposed model is shown in Fig. 3.
[Figure: input layer, three hidden layers with ReLU activations, and a sigmoid output
layer that classifies each pair as interacting or non-interacting]
Fig. 3. The architecture of the proposed deep neural network for gene regulatory network
inference
As shown in Fig. 3, we propose a fully connected neural network with three hidden
layers. Each neuron is connected to all neurons in the previous layer. The number of
neurons in the input layer equals the number of features in the concatenated vector,
that is, twice the number of features of a single gene. In this case, each concatenated
vector contains 200 features, since each gene is measured under 100 conditions. The
hidden layers also contain the same number of neurons, namely 200.
As activation function, we use the ReLU function, defined as:
$$\mathrm{ReLU}(x) = \max(x, 0)$$
ReLU is one of the most widely used activation functions in neural networks. As the
classification here is binary, we use the sigmoid function in the output layer, because
we predict the probability of an output, which ranges between zero and one. The
sigmoid function is defined as:
$$\mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}}$$
Once the global model is trained, a new regulation Gi-Gj can be predicted as positive
if there is a relationship between gene Gi and gene Gj, and as negative if
there is no regulation.
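As an illustration of the architecture described in this section, the sketch below runs a single forward pass through a 200-200-200-200-1 fully connected network with ReLU hidden layers and a sigmoid output. It is a sketch under stated assumptions, not the authors' implementation: the weights are random and untrained, the He-style initialization and the helper names (dense, init_layer, forward) are hypothetical choices, and training is omitted.

```python
import math
import random
from typing import Callable, List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def relu(x: float) -> float:
    return max(x, 0.0)


def sigmoid(x: float) -> float:
    # Two branches avoid overflow in exp() for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dense(inputs: List[float], layer: Layer,
          activation: Callable[[float], float]) -> List[float]:
    """One fully connected layer: every neuron sees every input."""
    weights, biases = layer
    return [activation(sum(w * v for w, v in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    # Assumption: He-style random initialization; the paper does not specify one.
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x: List[float], layers: List[Layer]) -> float:
    """ReLU on the hidden layers, sigmoid on the single output neuron."""
    *hidden, output = layers
    for layer in hidden:
        x = dense(x, layer, relu)
    return dense(x, output, sigmoid)[0]


rng = random.Random(0)
sizes = [200, 200, 200, 200, 1]  # input, three hidden layers, output
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
x = [rng.random() for _ in range(200)]  # stand-in for one concatenated vector
p = forward(x, layers)  # value in [0, 1], read as P(interacting)
```

Because the weights are untrained, the value of p carries no biological meaning; the sketch only shows how the layer sizes and activations fit together.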
3 Experimental Study
3.1 Datasets
We evaluated the proposed approach using the DREAM4 multifactorial datasets. This
challenge aims to infer five networks from multifactorial perturbation data, where each
network contains 100 genes and 100 samples. In the multifactorial experiment, a small
number of genes are perturbed simultaneously. The data was simulated with
GeneNetWeaver [19]. The topology of these networks was derived from the transcriptional
regulatory systems of E. coli and S. cerevisiae.
3.2 Results
To evaluate the effectiveness of the developed inference method, tests are performed on
the five benchmark networks of the DREAM4 multifactorial subchallenge, which are
designed for inferring the structure of large-scale GRNs. We measured the prediction
accuracy by the area under the Receiver Operating Characteristic (ROC) curve (AUC). We
adopt a cross-validation procedure to make sure that the performance of the model is
measured on predictions for held-out data. We compare our proposed model with SVM using
the same procedure. We use 5-fold cross-validation, and the results of all folds are
averaged. For SVM, we use the Gaussian kernel, which was found to be the best option
for predicting GRNs from microarray data, with high accuracy and a lower standard
deviation, as shown by Gillani et al. [13]. Table 1
shows the obtained results for the proposed DNN model and SVM.
Table 1. AUROC for DREAM4 multifactorial challenge
Network      Proposed DNN   SVM
Network 1    0.82           0.75
Network 2    0.87           0.81
Network 3    0.70           0.67
Network 4    0.76           0.73
Network 5    0.78           0.77
In addition, to assess the effectiveness of the proposed method, the ROC curves obtained
by the proposed DNN and by SVM are shown in Figs. 4, 5, 6, 7 and 8. These figures plot
the ROC curve averaged over all folds, with its averaged AUC, together with the ROC
curves and AUC values obtained in each fold of the cross-validation procedure. The
obtained results show that the prediction accuracy, in terms of averaged AUC, of the
proposed DNN is better than that obtained using SVM, which means that the prediction
of regulations between genes is more accurate with the proposed method than with SVM.
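The evaluation measure used here, AUC averaged over cross-validation folds, can be sketched as follows. This is a minimal illustration, not the evaluation code used in the paper: the helper names are hypothetical, the folds are contiguous rather than shuffled or stratified, and the scores in the toy example are invented. The AUC is computed as the fraction of (positive, negative) pairs that are ranked correctly, with ties counted as one half.

```python
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive example outscores a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples: int, k: int) -> List[List[int]]:
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    folds: List[List[int]] = []
    start = 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def mean_auc(fold_results: List[Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Average the AUC over folds; each item holds one held-out fold's
    (true labels, predicted scores)."""
    aucs = [roc_auc(labels, scores) for labels, scores in fold_results]
    return sum(aucs) / len(aucs)


# Toy example (invented scores): two held-out folds.
fold_results = [
    ([1, 0], [0.8, 0.3]),
    ([1, 0, 0], [0.6, 0.7, 0.2]),
]
averaged = mean_auc(fold_results)
```

In a full 5-fold procedure, each fold returned by k_fold_indices(n, 5) would serve once as the held-out set while a model is trained on the remaining four, and the scores passed to mean_auc would come from those held-out predictions.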
Fig. 4. Network 1 (panels: SVM, DNN)
Fig. 5. Network 2 (panels: SVM, DNN)
Fig. 6. Network 3 (panels: SVM, DNN)
Fig. 7. Network 4 (panels: DNN, SVM)
Fig. 8. Network 5 (panels: DNN, SVM)
4 Conclusion
Several approaches have been proposed to infer gene regulatory networks, including
unsupervised, semi-supervised and supervised methods. Most of the proposed
supervised models are based on SVM. In this work, we propose a DNN model to infer
GRNs. The aim is to predict new regulations between genes with similar expression
profiles and to differentiate between negative and positive regulations. The method
was compared with SVM, and the obtained results show that the DNN outperforms SVM
even on small datasets. As ongoing work, we intend to integrate other types of data
provided in the same challenge and to test the proposed model on large datasets.
References
1. Patel, N., Wang, J.T.L.: Semi-supervised prediction of gene regulatory networks using
machine learning algorithms. J. Biosci. 40(4), 731–740 (2015)
2. Ristevski, B.: A survey of models for inference of gene regulatory networks. Nonlinear Anal.
Model. Control. 18(4), 444–465 (2013)
3. Liang, S., Fuhrman, S., Somogyi, R.: REVEAL, a general reverse engineering algorithm for
inference of genetic network architectures. In: Pacific Symposium on Biocomputing, vol. 3,
pp. 18–19 (1998)
4. Friedman, N., et al.: Using Bayesian networks to analyze expression data. J. Comput.
Biol. 7, 601–620 (2000)
5. Cao, J., et al.: Modeling gene regulation network using ordinary differential equation. In:
Next Generation Microarray Bioinformatics, pp. 185–197 (2012)
6. Butte, A.J., Kohane, I.S.: Mutual information relevance networks, functional genomic
clustering using pairwise entropy measurements. In: Pacific Symposium on Biocomputing,
pp. 418–429 (2000)
7. Meyer, PE., Kontos, K., Lafitte, F., Bontempi, G.: Information-theoretic inference of large
transcriptional regulatory networks. EURASIP J. Bioinform. Syst. Biology (2007)
8. Margolin, A.A., et al.: ARACNE: an algorithm for the reconstruction of gene regulatory
networks in a mammalian cellular context. BMC Bioinform. 7(1), S7 (2006)
9. Cerulo, L., et al.: Learning gene regulatory networks from only positive and unlabeled data.
BMC Bioinform. (2010)
10. Jisha, A., Jereech, A.S.: Gene regulatory network: a semi supervised approach. In:
International Conference on Electronics Communication and Aerospace Technology ICECA
(2017)
11. Sasmita, R., et al.: Handling unlabeled data in gene regulatory network. In: Proceeding of
International Conference on Frontiers of Intelligence Computing AISC 199, pp. 113–120.
Springer, Heidelberg (2013)
12. Mordelet, F., Vert, J.P.: SIRENE: supervised inference of regulatory network. Bioinformatics 24, i76–i82 (2008)
13. Gillani, Z., et al.: Compare SVM: supervised support vector machine (SVM) inference of
gene regularity network. BMC Bioinform. (2014)
14. Maetschke, S.R., et al.: Supervised, semi supervised and unsupervised inference of gene
regulatory networks. Brief. Bioinform. 15(2), 195–211 (2013)
15. Buduma, N.: Fundamentals of Deep Learning: Designing Next-Generation Machine
Intelligence Algorithms. O’Reilly Media, Boston (2017)
16. Min, S., Lee, B., Yoon, S.: Deep learning in bioinformatics. Brief. Bioinform. 18(5), 851–
869 (2017)
17. Bramer, M.: Principles of Data Mining. Springer, London (2016)
18. Schmidhuber, J.: Deep learning in neural networks: An overview. Neural Netw. 61, 85–117
(2015)
19. Schaffter, T., et al.: GeneNetWeaver: in silico benchmark generation and performance
profiling of network inference methods. Bioinformatics 27(16), 2263–2270 (2011)