Evolutionary Analysis of Functional Modules in Dynamic
PPI Networks
Nan Du
Computer Science and Engineering Department
SUNY at Buffalo
Buffalo, 14260, U.S.A.
nandu@buffalo.edu

Jing Gao
Computer Science and Engineering Department
SUNY at Buffalo
Buffalo, 14260, U.S.A.
jing@buffalo.edu

Yuan Zhang
College of Electronic Information and Control Engineering
Beijing University of Technology
Beijing, 100124, China
zhangyuan@emails.bjut.edu.cn

Kang Li
Computer Science and Engineering Department
SUNY at Buffalo
Buffalo, 14260, U.S.A.
kli22@buffalo.edu

Bindukumar B Nair
Department of Medicine
SUNY at Buffalo
Buffalo, 14260, U.S.A.
bnair@buffalo.edu

Supriya D Mahajan
Department of Medicine
SUNY at Buffalo
Buffalo, 14260, U.S.A.
smahajan@buffalo.edu

Stanley A. Schwartz
Department of Medicine
SUNY at Buffalo
Buffalo, 14260, U.S.A.
sasimmun@buffalo.edu

Aidong Zhang
Computer Science and Engineering Department
SUNY at Buffalo
Buffalo, 14260, U.S.A.
azhang@buffalo.edu

ABSTRACT
Functional module detection in Protein-Protein Interaction (PPI) networks is essential to
understanding the organization, evolution and interaction of cellular systems. In recent years,
most research has focused on detecting functional modules from static PPI networks. However,
the structure of a PPI network sometimes changes in response to stimuli, which changes both the
composition and the functionality of these modules. These changes occur gradually and can be
thought of as an evolution of the functional modules. In our opinion, the evolutionary analysis
of functional modules is key to gaining insight into their underlying behavior, particularly
when targeting complex living systems.
In this paper, we propose a novel computational framework which integrates a PPI network with
multiple dynamic gene coexpression networks to categorize and track the evolutionary patterns of
functional modules over consecutive timestamps. We first propose a method to construct dynamic
PPI networks, and then design a new functional influence based algorithm to detect the
functional modules from these dynamic PPI networks. Based on the results of this approach, we
provide a simple but effective method to characterize and track the evolutionary patterns of
dynamic modules, which involves detecting evolutionary events between modules found at
consecutive timestamps. Extensive experiments on the fermentation process dataset of
S. cerevisiae show that the proposed framework not only outperforms previous functional module
detection methods, but also efficiently tracks the evolutionary patterns of functional modules.

Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
ACM-BCB’12, October 7-10, 2012, Orlando, FL, USA
Copyright 2012 ACM 978-1-4503-1670-5/12/10 ...$15.00.
ACM-BCB 2012
Categories and Subject Descriptors
J.3 [Life And Medical Sciences]: Biology and Genetics
General Terms
ALGORITHMS
1.
INTRODUCTION
Protein-Protein Interaction (PPI) networks help us systematically analyze the structure of a
large living system and also allow us to understand principles such as essentiality, protein
interactions, functional modules and cellular pathways. The identification of functional modules
in PPI networks is of great interest, as it often reveals unknown functional ties between
proteins and thus helps in predicting the functions of unknown genes.
However, traditional functional module detection approaches treat the PPI network as a static
graph, derived either from data fixed at a certain timestamp or from data aggregated over a
period. These approaches ignore the temporal evolution of the functional modules, which can
offer biologists valuable insights. Without capturing the inherent dynamic characteristics of
PPI networks, one may miss the opportunity to capture the evolutionary patterns of functional
modules.
Protein-protein interactions are often subject to external stimuli, which change the structure
of the network during development. These dynamically varying interactions, sometimes referred to
as transient interactions, are caused by stimuli that may be either reactive (caused by
exogenous factors, such as a response to an environmental stimulus) or programmed (due to
endogenous signals, such as cell-cycle dynamics or developmental processes) [23]. Also, the
functional modules detected at each timestamp may evolve as the protein interactions change
dynamically over time. Specifically, detecting functional module evolution, that is, how a
module's functions change over time, provides insights into the underlying behavior of the
molecular system. For example, network dynamics can describe how cells respond to environmental
cues or how an interaction network changes during development. The temporal evolution of
functional modules can also be very useful for monitoring the development and outcome of chronic
and genetic diseases. Thus we believe that it is promising to track the evolution of functional
modules and proteins in dynamic PPI networks.
In this paper, we propose a framework to categorize and track the evolutionary patterns of
functional modules over consecutive timestamps. We begin by constructing a series of dynamic PPI
networks based on both the PPI network and the dynamic gene coexpression networks at the various
timestamps. We then solve the functional module detection problem with a novel functional
influence based algorithm which quantifies the influence of one biological component on another.
In addition, the proposed functional module detection method maintains a certain level of module
equivalence between consecutive timestamps, the detailed definition of which is discussed in
Section 2.2. Finally, we capture complex evolutionary patterns of functional modules over time
by analyzing the key evolutionary events among modules in consecutive timestamps.
In summary, there are three main contributions of our paper: (i) we propose a novel method to
construct the dynamic PPI networks by integrating the static PPI network with the dynamic gene
coexpression networks; (ii) we propose a new functional influence based functional module
detection algorithm in which the detected functional modules are allowed to overlap and do not
change dramatically over a short time; (iii) we provide a model for tracking the evolutionary
process of functional modules over time.
To the best of our knowledge, this is the first work to analyze the evolutionary patterns of
functional modules over consecutive timestamps. The rest of the paper is organized as follows.
The proposed approach is presented in Section 2. Extensive experimental results are shown in
Section 3. Finally, we conclude our work in Section 4.
2.
METHOD
We begin by introducing the method of constructing the
dynamic PPI networks in Section 2.1. In Section 2.2, we
will present the functional influence based algorithm used
for detecting the functional modules. Finally, the model we
used for tracking the evolution of the functional modules is
presented in Section 2.3.
2.1
Dynamic PPI Network Construction
Several researchers have worked on integrating static data
with dynamic data to discover the temporal evolution of
protein interaction networks. Han et al. integrated the PPI
networks with gene expression data and suggested that some
modules are active at specific times and locations [8]. Qi et
al. further noted that the integration of a variety of datasets,
including binary interactions, protein complexes and expression profiles, enables the identification of subnetworks that
are active under certain conditions [17].
In order to discover the temporal evolution of functional modules, we integrate the static PPI
network with a series of dynamic gene coexpression networks. Given a PPI network $P = (V, E)$,
where $V$ is a set of proteins and $E$ is a set of interactions between these proteins, let
$M^1, M^2, \ldots, M^T$ be a set of $|V| \times n$ gene expression matrices, where $T$ is the
number of timestamps and $n$ is the number of samples (replicates) in the experiments. Our goal
is to construct $T$ dynamic PPI networks $D^1, D^2, \ldots, D^T$, each of which is a
$|V| \times |V|$ matrix. Note that each gene expression matrix $M^i$ ($1 \le i \le T$) and
dynamic PPI network $D^i$ ($1 \le i \le T$) corresponds to a specific timestamp $i$.
Before constructing the dynamic PPI networks, we first need to construct a series of gene
coexpression networks $G^1, G^2, \ldots, G^T$. Gene coexpression networks have been used to
demonstrate that functionally related genes are frequently coexpressed across multiple datasets
and across different organisms [10], and to estimate the underlying regulatory relationships
between genes under various experimental conditions [1]. By constructing a specific gene
coexpression network at each timestamp, e.g., at the early, intermediate and terminal stages of
a certain disease, it is possible to identify disease-mediated changes in the network
connectivity patterns.
For each gene pair, the absolute Pearson correlation coefficient of their expression profiles
across samples is calculated, and the output is a $|V| \times |V|$ correlation matrix, which
represents the expression similarity between each gene pair. Based on these correlation
matrices, we construct the gene coexpression network, where each node is a gene and each edge
indicates that the correlation between two genes is greater than a cutoff threshold. This cutoff
threshold is used to remove all but the most likely biologically significant relationships, and
we choose the cutoff threshold for each correlation matrix based on its average correlation
similarity.
Combining a static PPI network with time course gene expression data leads to a better
understanding of protein or gene function and reveals global changes in network topology that
hint at higher level cellular organizational principles and functions [16]. Furthermore, we can
regulate the changes of proteins’ relationships and also track the evolutionary process of the
functional modules by integrating the static PPI network with time course gene expression data.
After we obtain the gene coexpression networks $G^1, G^2, \ldots, G^T$, we integrate them with
the PPI network $P$ by the following rule: if an interaction exists in both the PPI network $P$
and the $i$-th dynamic gene coexpression network $G^i$, it is added to the $i$-th dynamic PPI
network $D^i$. Otherwise, we consider that there is no interaction between this protein pair at
this timestamp. An example of constructing dynamic PPI networks is presented in Figure 1.
Figure 1: An example of constructing dynamic PPI networks at five timestamps.
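The construction rule of Section 2.1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`pearson`, `coexpression_edges`, `dynamic_ppi`) and the toy expression values are hypothetical, and the cutoff is taken to be exactly the mean absolute correlation, which is one reading of "based on the average correlation similarity".

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def coexpression_edges(expr):
    """expr maps gene -> list of sample values at one timestamp.
    Returns the gene pairs whose |r| exceeds the cutoff."""
    genes = sorted(expr)
    corr = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            corr[(g, h)] = abs(pearson(expr[g], expr[h]))
    if not corr:
        return set()
    # Assumption: the cutoff equals the mean absolute correlation of this matrix.
    cutoff = sum(corr.values()) / len(corr)
    return {pair for pair, r in corr.items() if r > cutoff}

def dynamic_ppi(ppi_edges, expr):
    """Keep a PPI edge at this timestamp only if it is also a coexpression edge."""
    ppi = {tuple(sorted(e)) for e in ppi_edges}
    return ppi & coexpression_edges(expr)

# Toy data (hypothetical): three replicates per gene at one timestamp.
ppi = {("A", "B"), ("B", "C"), ("A", "D")}
expr = {"A": [1, 2, 3], "B": [2, 4, 6.5], "C": [3, 1, 2], "D": [1, 2, 3.1]}
d_t = dynamic_ppi(ppi, expr)
```

In this toy case the edge B-C is in the PPI network but its expression profiles are only weakly correlated, so it is absent from the dynamic network at this timestamp.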
2.2
Functional Influence based Functional Module Detection
In recent years, many methods have been developed to detect functional modules in a PPI network,
such as Markov Clustering (MCL) [5], a fast stochastic flow based clustering algorithm for
graphs, hierarchical clustering [7] and spectral clustering [24]. Furthermore, two of our
previous algorithms based on functional influence efficiently analyzed large, complex PPI
networks [3, 20]. The functional influence algorithm was first proposed by Nabieva et al. [13].
Its basic idea is that influence is propagated from the source proteins to the surrounding
neighborhoods, and this process is repeated for each protein until each protein in the graph has
an influence score. This influence score represents the amount of functional influence received
by the protein for a given function.
However, since these approaches are not designed for dynamic graph clustering, they do not
consider the temporal characteristics of dynamic PPI networks, where the interactions between
proteins continuously evolve. Therefore, we design a novel functional influence based method
which can effectively identify the protein functional modules that reflect the temporal
evolution over consecutive timestamps. Our method also allows modules to overlap and can
automatically estimate the optimal number of modules at each timestamp.
The Principle of Module Equivalence.
Since living systems are subject to external stimuli, the interactions between proteins also
evolve with time, which raises a new challenge for traditional clustering algorithms. In our
case the clusters evolve continuously, unlike the cases that traditional clustering algorithms
usually handle, so some new considerations are needed. On one hand, we expect to detect
functional modules that depend on the current PPI network; on the other hand, we also expect
that the detected functional modules do not deviate too dramatically from the previous
timestamp’s PPI network. Similar principles have also been used in [2]. In other words, since
the living system is more likely to change gradually than dramatically, we expect a certain
level of module equivalence between functional modules detected in consecutive timestamps.
Moreover, in many cases, a dramatic change of functional modules over a short time could be due
to noise, which may come from sample contamination, experimental design or the clustering
method. Fulfilling module equivalence can also help in generating more robust results that are
not sensitive to noise; this is validated in the experiments.
Figure 2: An example of illustrating module equivalence. (a) the clustering results evolve gradually;
(b) the clustering results change dramatically.
Consider the simple example shown in Figure 2. There are two clustering results, (a) and (b), of
7 proteins over 3 timestamps, where each node is a protein and the nodes enclosed together
denote a cluster. The proteins partitioned into the same cluster are stable in result (a), where
each cluster changes gradually over time. In contrast, the proteins partitioned together in
result (b) change dramatically. Therefore, according to the principle of module equivalence, (a)
should be preferred. It is also easier and more reasonable to track the evolutionary patterns of
functional modules in (a) than in (b).
To achieve a certain level of module equivalence between functional modules in consecutive
timestamps, we propose a method to construct a series of weighted dynamic PPI networks, which
takes the PPI network from the previous timestamp into account and guarantees that the modules
change smoothly in consecutive timestamps. Given the unweighted dynamic PPI networks
$D^1, D^2, \ldots, D^T$ of the $T$ timestamps, which have been introduced in Section 2.1, we aim
at constructing $T$ weighted dynamic PPI networks $WD^1, WD^2, \ldots, WD^T$, where each dynamic
PPI network can be represented as $WD^i = (V^i, E^i)$. The weight between proteins $u$ and $v$
in $WD^i$ is defined as:
\[
WD^{i}_{uv} =
\begin{cases}
\alpha, & \text{if } D^{i-1}_{uv} = 1 \text{ xor } D^{i}_{uv} = 1,\\
\beta, & \text{if } D^{i-1}_{uv} = 1 \text{ and } D^{i}_{uv} = 1,\\
0, & \text{otherwise,}
\end{cases}
\tag{1}
\]
where α and β are pre-set weights, and 0 ≤ α < β ≤ 1. The assumption is that the weight of an
interaction between proteins $u$ and $v$ at the $i$-th timestamp is based on both unweighted
dynamic PPI networks $D^{i-1}$ and $D^i$. If a particular interaction exists at both of these
consecutive timestamps, we have high confidence that this interaction is reliable and stable,
and thus it is assigned a high weight β. If the interaction exists at only one of the two
consecutive timestamps, we are less confident that it does not come from noise, and thus it is
assigned a relatively low weight α. In other words, we use the previous PPI network as evidence
to weigh the current network. In addition, when $i = 1$ there is no previous timestamp, thus
$WD^1_{uv} = \alpha$ if there is an interaction between proteins $u$ and $v$ in $D^1$. In our
experiments, we set α = 0.1 and β = 0.2.
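The weighting rule of Eq. 1 translates directly into set operations. The following is a minimal sketch, assuming each unweighted network is given as a collection of protein pairs; the function name `weighted_dynamic_ppi` and the toy edges are hypothetical, while the defaults α = 0.1 and β = 0.2 are the values reported above.

```python
def weighted_dynamic_ppi(prev_edges, curr_edges, alpha=0.1, beta=0.2):
    """Weight edges as in Eq. 1.

    prev_edges / curr_edges are iterables of protein pairs for timestamps
    i-1 and i. Pass prev_edges=None for the first timestamp, where every
    current edge receives alpha. Pairs absent from both networks have
    weight 0 and are simply not stored.
    """
    prev = set() if prev_edges is None else {frozenset(e) for e in prev_edges}
    curr = {frozenset(e) for e in curr_edges}
    weights = {}
    for edge in prev | curr:
        # Present at both timestamps -> beta; present at exactly one -> alpha.
        weights[edge] = beta if (edge in prev and edge in curr) else alpha
    return weights

# Toy example (hypothetical edges).
d1 = {("A", "B"), ("B", "C")}
d2 = {("A", "B"), ("C", "D")}
wd1 = weighted_dynamic_ppi(None, d1)
wd2 = weighted_dynamic_ppi(d1, d2)
```

Note that, read literally, the xor case of Eq. 1 also assigns α to an interaction that was present only at the previous timestamp (B-C in the toy example), which is what lets the previous network smooth the current one.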
Functional Flow Model.
Based on the weighted dynamic PPI networks $WD^i$ ($1 \le i \le T$), we design a modified
influence based functional module detection algorithm. We first select, based on the weighted
degrees of the proteins, a source protein set $S$ whose members are the starting points from
which influence is propagated. Previous research [9] has observed that the connectivity of nodes
in biological networks plays a crucial role in cellular functions. The weighted degree of
protein $u$, denoted $d(u)$, is the sum of the weights between $u$ and its neighbors, as shown
in Eq. 2, where $N(u)$ is the set of neighbors of protein $u$ and $w_{uv}$ is the weight of the
edge between proteins $u$ and $v$.
\[
d(u) = \sum_{v \in N(u)} w_{uv}. \tag{2}
\]
Secondly, we assign an initial influence weight to each source protein $s$ ($s \in S$) and
propagate the weight to its neighbors $x$. The initial flow $f(s \to x)$ from $s$ to $x$ is
computed as:
\[
f(s \to x) = \frac{w_{sx}}{\sum_{z \in N(s)} w_{sz}} \times F(s), \tag{3}
\]
where $F(s)$ is the initial influence score of the source protein, which we set to the constant
value 1, and $w_{sx}$, normalized by the total weight of the edges at $s$, is the weight of the
edge between $s$ and $x$. The influence score of $x$ is then updated by summing all incoming
flows from its neighbors, as shown in Eq. 4.
\[
F(x) = \sum_{u \in N(x)} f_s(u \to x). \tag{4}
\]
After updating its influence weight, $x$ propagates the influence weight to its neighbors. This
process is defined as:
\[
f(x \to y) = \frac{w_{xy}}{\sum_{z \in N(x)} w_{xz}} \times F(x). \tag{5}
\]
The flow $f(x \to y)$ is removed if it is less than a threshold $\theta_{flow}$. Eq. 4 and Eq. 5
are repeated until there is no more flow in the network. By the end of the flow simulation, we
obtain a flow pattern which is a $|S| \times |V|$ matrix, where each row holds the cumulative
quantities of functional influence that a particular source protein $s$ exerts over all the
proteins. The functional influence profile of a protein is a vector where each item reflects the
functional influence received from a source protein in the network. In the flow pattern, for
each source protein, all the proteins that have a functional influence score higher than the
threshold $\theta_{flow}$ are grouped into a functional module.
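The flow simulation of Eqs. 2 to 5 can be sketched as a round-based propagation. This is an illustrative reading rather than the authors' code, and several points are assumptions: source proteins are taken to be the top-`k` proteins by weighted degree (`k` is a hypothetical parameter), the source itself is included in its own preliminary module, and a `max_rounds` cap is added because on a small toy graph the normalized weights conserve the total influence, so the flow never falls below the threshold on its own. Edge weights are expected in the form `frozenset({u, v}) -> weight`, without self-loops.

```python
from collections import defaultdict

def build_adjacency(weights):
    """weights maps frozenset({u, v}) -> edge weight; returns adj[u][v] = weight."""
    adj = defaultdict(dict)
    for edge, w in weights.items():
        if w > 0:
            u, v = tuple(edge)  # assumes no self-loops
            adj[u][v] = w
            adj[v][u] = w
    return adj

def select_sources(adj, k):
    """Top-k proteins by weighted degree d(u) (Eq. 2); k is a hypothetical parameter."""
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    return sorted(degree, key=lambda u: (-degree[u], u))[:k]

def simulate_flow(adj, source, theta=0.02, max_rounds=50):
    """Cumulative influence each protein receives from one source (Eqs. 3-5)."""
    received = defaultdict(float)
    scores = {source: 1.0}                      # F(s) = 1 for the source protein
    for _ in range(max_rounds):                 # assumption: cap to guarantee termination
        incoming = defaultdict(float)
        for x, fx in scores.items():
            total = sum(adj[x].values())        # weighted degree of x
            for y, w in adj[x].items():
                flow = w / total * fx           # Eq. 3 (first round) and Eq. 5
                if flow >= theta:               # flows below the threshold are removed
                    incoming[y] += flow         # Eq. 4: sum of incoming flows
        if not incoming:
            break
        for y, f in incoming.items():
            received[y] += f
        scores = incoming
    return dict(received)

def preliminary_module(adj, source, theta=0.02, max_rounds=50):
    """Proteins whose cumulative influence from `source` exceeds theta."""
    received = simulate_flow(adj, source, theta, max_rounds)
    # Assumption: the source protein belongs to its own preliminary module.
    return {source} | {p for p, score in received.items() if score > theta}

# Toy weighted network (hypothetical).
toy = {frozenset(("A", "B")): 0.2, frozenset(("A", "C")): 0.1,
       frozenset(("B", "C")): 0.2, frozenset(("C", "D")): 0.2}
adj = build_adjacency(toy)
first_round = simulate_flow(adj, "A", max_rounds=1)
```

After one round from source A, protein B has received two thirds of the influence and C one third, in proportion to the edge weights 0.2 and 0.1.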
Merging Preliminary Modules.
Note that the preliminary modules extracted from the flow pattern typically overlap, since a
protein may receive a high functional influence from multiple source proteins. However, the
quality of these preliminary modules mainly depends on the source protein selection. By merging
similar preliminary modules, which share a large fraction of their members, we obtain final
modules with higher accuracy. Merging similar preliminary modules is therefore an important step
in generating the final modules [6]. Since the final modules are merged from overlapping
preliminary modules, they also overlap. Real functional modules are likely to overlap, since a
molecule may generally take part in different biological processes or functions in different
environments [26]. In our work, we set $\theta_{flow} = 0.02$.
In our case, we use a hierarchical clustering algorithm to merge the preliminary modules based
on the Jaccard index between modules [25]. However, one difficult issue in functional module
detection is determining the number of clusters. Classic hierarchical clustering algorithms
suffer from the limitation that the number of clusters must be specified by the user. It is
impractical to expect sufficient domain knowledge to determine the number of modules for each
timestamp, and it is unreasonable to assume that the number of clusters is the same at each
timestamp. Therefore, in our work, we use the method of [19], an L-curve method which
automatically estimates the optimal number of clusters by locating the knee of the evaluation
graph, to identify the appropriate number of functional modules. Consequently, in our method,
the number of clusters is unbounded, and an optimal number is determined automatically.
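The merging step can be illustrated with a small agglomerative procedure over the Jaccard index. One substitution should be stated plainly: the paper selects the number of clusters with the method of [19], whereas this sketch stops merging at a fixed similarity cutoff, `min_similarity`, which is a hypothetical parameter introduced only to keep the example short. The function names and toy modules are likewise illustrative.

```python
def jaccard(a, b):
    """Jaccard index of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def merge_modules(modules, min_similarity=0.5):
    """Agglomerative merging: repeatedly fuse the most similar pair of modules
    until no pair reaches min_similarity (stand-in for the cluster-number choice)."""
    merged = [set(m) for m in modules]
    while len(merged) > 1:
        best, pair = 0.0, None
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                s = jaccard(merged[i], merged[j])
                if s > best:
                    best, pair = s, (i, j)
        if pair is None or best < min_similarity:
            break
        i, j = pair
        merged[i] |= merged[j]   # fuse module j into module i
        del merged[j]
    return merged

# Toy preliminary modules (hypothetical protein labels).
preliminary = [{"A", "B", "C"}, {"A", "B", "C", "D"}, {"X", "Y"}, {"C", "X"}]
final = merge_modules(preliminary)
```

Because merging never removes shared members, the resulting modules may still overlap: in the toy example protein X remains in two final modules.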
2.3
Evolutionary Events
Recently, a few approaches have been proposed to characterize the evolution of clusters over
consecutive timestamps in social networks. Takaffoli et al. [22] described an event-based
framework to track the transitions between clusters at consecutive timestamps, and they improved
the event formulae to track the entire observation time in a later work [21]. All these works
use a two-stage approach in which the clusters are first detected independently at each
timestamp, and then matched to determine the critical evolutionary events. As mentioned before,
each of our functional modules is influenced by two consecutive timestamps simultaneously, which
makes our framework different. We believe that analyzing the evolutionary patterns of the
functional modules detected at each timestamp, including form, dissolve, continue, merge and
split, can help us discover underlying evolutionary trends or behaviors of different diseases or
species.
We state the problem of characterizing the evolutionary patterns of the functional modules in
dynamic PPI networks in the following way. At a particular timestamp $i$, we detect $k_i$
functional modules from the weighted dynamic PPI network $WD^i$ described in the previous
section, denoted as $C^i = \{C_1^i, C_2^i, \ldots, C_{k_i}^i\}$. Note that the modules generated
by our method may overlap. The evolutionary patterns of functional modules can be represented as
a sequence of key evolutionary events (changes) in consecutive timestamps. These key
evolutionary events cover the evolution of functional modules and can be further formulated as a
set of rules. We use the definition of transitionary events from [21], but we only focus on
tracking the informative events between consecutive timestamps instead of over the entire
observation period.
Given a module $C_x^i$ from the $i$-th timestamp, the metric which tracks the optimal module,
i.e. the module at the $(i+1)$-th timestamp with the highest similarity to $C_x^i$, is defined
as:
\[
\mathrm{track}(C_x^i, i+1) = C_y^{i+1} \quad \text{iff} \quad
C_y^{i+1} = \arg\max_{C_z^{i+1} \in C^{i+1}}
\left\{ \frac{|V_x^i \cap V_z^{i+1}|}{\max(|V_x^i|, |V_z^{i+1}|)} \right\} \geq \alpha,
\tag{6}
\]
where $V_x^i$ is the set of proteins of $C_x^i$, and the overlap threshold α defines whether two
modules are matched; it is also used in the definitions of the evolutionary events below (this α
is distinct from the edge weight α of Eq. 1). So $\mathrm{track}(C_x^i, i+1)$ denotes the
optimal matching module for $C_x^i$ at the $(i+1)$-th timestamp. If none of the modules in
$C^{i+1}$ has an overlap ratio of at least α, then $\mathrm{track}(C_x^i, i+1) = \emptyset$
($\emptyset$ denotes an empty matching result). It is worth mentioning that this metric can also
be used in the reverse direction with a simple revision. The five evolutionary events are
formally defined as follows:
Form.
A particular functional module $C_x^i$ is marked as form if it did not exist in the previous
timestamp. To be more specific, a form indicates that it is the first time a set of proteins are
grouped together to perform some function; examples are modules $C_1^1$, $C_2^1$ and $C_4^2$ in
Figure 3. Thus module $C_x^i$ is formed in the $i$-th timestamp iff:
\[
\mathrm{track}(C_x^i, i-1) = \emptyset. \tag{7}
\]
Dissolve.
A dissolve occurs for a particular functional module $C_x^i$ if no similar module exists in the
next timestamp. Specifically, a dissolve indicates that it is the last time a set of proteins
are grouped together to perform some function; an example is module $C_3^1$ in Figure 3.
Formally, a module $C_x^i$ in the $i$-th timestamp is defined as dissolve iff:
\[
\mathrm{track}(C_x^i, i+1) = \emptyset. \tag{8}
\]
Continue.
A continue occurs if there is a particular functional module $C_y^{i+1}$ detected in timestamp
$i+1$ that is close to a module $C_x^i$ in the previous timestamp $i$. We then say $C_y^{i+1}$
is the continuation of $C_x^i$ in the next timestamp. It can also be considered as a module
which continues its existence in the consecutive timestamps. Note that we do not require the two
modules to be exactly the same. In Figure 3, module $C_3^2$ is the continuation of module
$C_2^1$. Formally, a module $C_x^i$ in the $i$-th timestamp continues its existence to the
$(i+1)$-th timestamp iff:
\[
\exists C_y^{i+1} \in C^{i+1}: \ \mathrm{track}(C_x^i, i+1) = C_y^{i+1}. \tag{9}
\]
Split.
If a particular functional module $C_x^i$ in the $i$-th timestamp is matched to a set of modules
$C_*^{i+1} = \{C_1^{i+1}, C_2^{i+1}, \ldots, C_k^{i+1}\}$ in the following $(i+1)$-th timestamp,
then we say $C_x^i$ is split into $C_1^{i+1}, C_2^{i+1}, \ldots, C_k^{i+1}$; note that
$C_*^{i+1} \subseteq C^{i+1}$. For example, in Figure 3, module $C_1^1$ is split into two
modules, $C_1^2$ and $C_2^2$, in the next timestamp. Formally, a module $C_x^i$ in the $i$-th
timestamp is split into a set of modules $C_1^{i+1}, C_2^{i+1}, \ldots, C_k^{i+1}$ in the
$(i+1)$-th timestamp iff:
\[
\exists C_*^{i+1} = \{C_1^{i+1}, C_2^{i+1}, \ldots, C_k^{i+1}\} \subseteq C^{i+1}: \
\forall C_y^{i+1} \in C_*^{i+1}: \
\frac{|V_x^i \cap V_y^{i+1}|}{|V_y^{i+1}|} \geq \alpha. \tag{10}
\]
Merge.
If a particular functional module $C_x^{i+1}$ in the $(i+1)$-th timestamp is matched to a set of
modules $C_*^i = \{C_1^i, C_2^i, \ldots, C_k^i\}$ in the previous $i$-th timestamp, then we say
$C_x^{i+1}$ is merged from $C_1^i, C_2^i, \ldots, C_k^i$, where $C_*^i \subseteq C^i$. For
example, in Figure 3, module $C_2^3$ is merged from three modules, $C_2^2$, $C_3^2$ and $C_4^2$,
in the previous timestamp. Formally, a set of modules $C_1^i, C_2^i, \ldots, C_k^i$ in the
$i$-th timestamp is merged into a module $C_x^{i+1}$ in the $(i+1)$-th timestamp iff:
\[
\exists C_x^{i+1}: \ \forall C_y^i \in C_*^i: \
\frac{|V_y^i \cap V_x^{i+1}|}{|V_y^i|} \geq \alpha. \tag{11}
\]
Figure 3: An example of functional modules evolution over three timestamps, where five evolutionary
events: form, dissolve, continue, split and merge are
included.
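The tracking metric (Eq. 6) and the five event definitions (Eqs. 7 to 11) can be sketched with plain set arithmetic. This is an illustrative reading, not the authors' implementation: modules are represented as frozensets of protein labels, the function names are hypothetical, and a split or merge is reported only when at least two modules satisfy the condition, which is an assumption about what "a set of modules" means. As defined, the events are not mutually exclusive, so a module that splits can also be labeled as dissolved when no single successor reaches the overlap threshold.

```python
def overlap(a, b):
    """Overlap ratio used in Eq. 6: |a & b| / max(|a|, |b|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / max(len(a), len(b))

def coverage(a, b):
    """Fraction of module b contained in module a (Eqs. 10 and 11)."""
    return len(a & b) / len(b) if b else 0.0

def track(module, candidates, alpha):
    """Eq. 6: the best-matching candidate if its overlap reaches alpha, else None."""
    best = max(candidates, key=lambda c: overlap(module, c), default=None)
    if best is None or overlap(module, best) < alpha:
        return None
    return best

def events(prev_modules, next_modules, alpha):
    """Label the events between two consecutive timestamps."""
    out = {"form": [], "dissolve": [], "continue": [], "split": [], "merge": []}
    for d in next_modules:
        if track(d, prev_modules, alpha) is None:            # Eq. 7
            out["form"].append(d)
    for c in prev_modules:
        match = track(c, next_modules, alpha)
        if match is None:                                     # Eq. 8
            out["dissolve"].append(c)
        else:                                                 # Eq. 9
            out["continue"].append((c, match))
        parts = [d for d in next_modules if coverage(c, d) >= alpha]
        if len(parts) >= 2:                                   # Eq. 10 (assumes k >= 2)
            out["split"].append((c, parts))
    for d in next_modules:
        parts = [c for c in prev_modules if coverage(d, c) >= alpha]
        if len(parts) >= 2:                                   # Eq. 11 (assumes k >= 2)
            out["merge"].append((parts, d))
    return out

# Toy modules (hypothetical); each letter is one protein.
prev_t = [frozenset("ABCDEF"), frozenset("XY"), frozenset("MNO")]
next_t = [frozenset("ABC"), frozenset("DEF"), frozenset("RS"), frozenset("MNOP")]
ev = events(prev_t, next_t, alpha=0.6)
```

With α = 0.6, module MNO continues as MNOP (overlap 0.75), ABCDEF splits into ABC and DEF, and XY dissolves.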
3.
EXPERIMENTS
In this section, we show the experimental results of our
proposed framework.
3.1
Dataset
To construct the dynamic PPI networks, we used two data sources: the static PPI network and the
time course gene expression data.
Time Course Gene Expression Data.
We use a time course gene expression dataset which represents the response of S. cerevisiae in a
15-day wine fermentation, the process in which S. cerevisiae turns the sugar of crushed grapes
into alcohol. The dataset consists of seven timestamps (0, 12, 24, 48, 60, 120, and 340 hours,
which correspond to different ethanol concentrations), and a gene expression matrix is created
at each timestamp. In order to have a high coverage of the PPI network, we used the top 1285
genes which have the most known interactions in DIP’s PPI dataset¹. For each of the 1285 genes,
the primary data consist of three independent biological samples at each of the seven
timestamps. The raw microarray data were published on Apr. 17, 2008 and are available at the
National Center for Biotechnology Information (NCBI) database² under the accession number
GSE8536 [12]. In our experiments, we set the cutoff thresholds for the seven timestamps’
correlation matrices to 0.76, 0.76, 0.83, 0.79, 0.73, 0.76 and 0.70, respectively, corresponding
to their average correlation similarity.
PPI Network.
We used the S. cerevisiae data from the Database of Interacting Proteins³ (DIP), which was
updated on Feb. 28, 2012. The S. cerevisiae PPI dataset contains 22,418 interactions in total.
3.2
Similarity between Functional Modules over
Timestamps
As mentioned before, in the real world the cellular system evolves gradually over time; thus we
believe that the functional modules detected at each timestamp should change smoothly rather
than dramatically. We assessed the functional modules’ similarity across the timestamps by
comparing the proposed method with some classical clustering methods: K-means, Hierarchical
clustering, Fuzzy c-means clustering (FCM) and Spectral clustering. Since these baseline
algorithms require the cluster number K to be preset, we tested each algorithm with both K = 15
and K = 30. Note that among these baseline algorithms, K-means, Hierarchical clustering and
Spectral clustering are non-overlapping clustering algorithms, and Fuzzy c-means is an
overlapping clustering algorithm in which each node has a membership value for each cluster. In
our experiments, if the membership value of a node $x$ for a cluster $C_j^i$ is larger than 0.1,
we assign $x$ to $C_j^i$. We also show the performance of our proposed method without
considering module equivalence across consecutive timestamps.
¹ As listed at www.acsu.buffalo.edu/ nandu/GeneNames.docx
² www.ncbi.nlm.nih.gov/
³ http://dip.doe-mbi.ucla.edu/dip/
To measure the similarity between the functional modules, we use the Jaccard index, which is
defined as:
\[
J(C_x^i, C_y^{i+1}) = \frac{|V_x^i \cap V_y^{i+1}|}{|V_x^i \cup V_y^{i+1}|}, \tag{12}
\]
which is between 0 and 1. For each module at a given timestamp, we took its maximal Jaccard
value with the modules at the next timestamp, and then averaged these maxima over all modules to
obtain the final result; a high value indicates that the modules detected at the two timestamps
are similar, and a low value that they are dissimilar. The results of all the methods are shown
in Table 1. As can be seen, our proposed method shows higher module similarity over all
timestamps than the other methods, since the baseline algorithms only consider the PPI network
at the current timestamp. This demonstrates that our proposed framework properly handles the
smooth evolution of the functional modules.
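The similarity score reported per pair of consecutive timestamps can be sketched as the mean of best-match Jaccard values (Eq. 12). The function names and toy modules below are hypothetical, and returning 0.0 when either timestamp has no modules is an assumption made only so that the function is total.

```python
def jaccard(a, b):
    """Jaccard index of two protein sets (Eq. 12); 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def timestamp_similarity(modules_t, modules_next):
    """Average, over the modules at timestamp t, of each module's maximal
    Jaccard value with the modules at timestamp t+1."""
    if not modules_t or not modules_next:
        return 0.0  # assumption: undefined cases are scored as 0
    best = [max(jaccard(m, n) for n in modules_next) for m in modules_t]
    return sum(best) / len(best)

# Toy modules at two consecutive timestamps (hypothetical).
at_t = [{"A", "B", "C"}, {"D", "E"}]
at_next = [{"A", "B"}, {"D", "E", "F"}, {"G"}]
score = timestamp_similarity(at_t, at_next)
```

Here both modules at the first timestamp find a best match with Jaccard value 2/3, so the averaged score is 2/3.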
3.3
Functional Module Identification
To evaluate the effectiveness of our proposed framework, we used the functional annotations of
the MIPS Functional Catalogue (FunCat) [18], an annotation scheme for the functional description
of proteins of prokaryotic and eukaryotic origin; we used the top four levels of FunCat for
validation. For the statistical evaluation of the detected modules, we used the p-value from the
hypergeometric distribution, which is defined as:
\[
p = 1 - \sum_{i=0}^{m-1} \frac{\binom{|X|}{i} \binom{|V| - |X|}{n - i}}{\binom{|V|}{n}},
\tag{13}
\]
where $|V|$ is the number of proteins in the PPI network, $|X|$ is the number of proteins in a
reference function, $n$ is the size of the module, and $m$ is the number of proteins in common
between the function and the module. It is the probability that at least $m$ proteins in a
module of size $n$ are included in a reference function of size $|X|$. A low p-value indicates
that the module closely corresponds to the function, since it is unlikely that the network would
produce the module by chance.
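Eq. 13 can be evaluated exactly with integer binomial coefficients. The sketch below is illustrative; the function and parameter names are hypothetical, and the numbers in the example are a toy case rather than values from the experiments.

```python
from math import comb

def enrichment_p_value(total, reference, module_size, common):
    """Eq. 13: probability that at least `common` of the `module_size` proteins
    in a module fall into a reference function of size `reference`, out of
    `total` proteins in the network."""
    below = sum(
        comb(reference, i) * comb(total - reference, module_size - i)
        for i in range(common)          # i = 0 .. m-1
    )
    return 1 - below / comb(total, module_size)

# Toy case: |V| = 10, |X| = 4, n = 3, m = 2.
p = enrichment_p_value(total=10, reference=4, module_size=3, common=2)
```

For the toy case the sum covers i = 0 and i = 1, giving (20 + 60) / 120, so p = 1/3.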
Similarly, we assessed the performance of the proposed algorithm by comparing it with the
baseline algorithms described in Section 3.2. The results are shown in Table 2. As the table
shows, our proposed framework outperforms the baseline algorithms at nearly every timestamp; the
only exception is t = 24, where FCM (K=15) is marginally higher (9.09 versus 9.03). This result
indicates two things: 1) by following the principle of module equivalence, our functional
influence based method provides more robust functional modules which are not sensitive to noise;
and 2) our functional influence based overlapping functional module detection algorithm is more
effective.
3.4
Informative Module Identification
In this part, we used the evolutionary events defined in Section 2.3 to track the informative
behavioral patterns in the evolving graph. We define a core-module as the intersection of a
series of modules that are linked into a connected graph by the evolutionary events at different
timestamps; it represents the evolution of its constituent modules, ordered by time, over all
timestamps. To be more specific, the core-module is denoted as
$M = \{C_{k_1}^{t_1} \cap C_{k_2}^{t_2} \cap \ldots \cap C_{k_m}^{t_m}\}$, where
$t_1 < t_2 < \ldots < t_m$.
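The core-module is simply the intersection of the protein sets along a chain of linked modules. A minimal sketch, with a hypothetical function name and toy protein labels:

```python
def core_module(chain):
    """Intersect the protein sets of modules linked across timestamps.

    `chain` is a time-ordered list of modules, one per timestamp, that are
    connected by evolutionary events.
    """
    if not chain:
        return set()
    return set.intersection(*(set(m) for m in chain))

# Toy chain over three timestamps (hypothetical protein labels).
chain = [{"P1", "P2", "P3", "P4"}, {"P1", "P2", "P3"}, {"P2", "P3", "P5"}]
core = core_module(chain)
```

Only proteins present in every module of the chain survive, which is why a stricter overlap threshold tends to leave a smaller but more stable core.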
By tracking the critical evolutionary events between timestamps, we found some interesting results. Figure 4 shows the
evolving graphs for four α values: 0.6, 0.7, 0.8 and 0.9. In the evolving graph, each node is a functional
module detected at a particular timestamp, and each edge
is an interaction (event) between modules at two consecutive timestamps. Figure 4 shows that, as α
increases, the number of detected evolutionary events decreases. Also, the backbone of the evolution
becomes clearer. Finally, when α = 0.9, we can detect a
module that is consistent over all timestamps. For clarity,
we extracted this module and drew it with
dashed lines in Figure 4(d). The core-module is $M^* = \{C_1^1 \cap C_2^2 \cap C_2^3 \cap C_1^4 \cap C_1^5 \cap C_3^6 \cap C_1^7\}$, which
includes 25 core proteins: POL30, RAD1, PIN3,
RAD23, HRT1, YOL087C, RAD7, UBA1, MET30, MGT1,
RVS167, HSE1, CDC48, SAN1, PRP8, RPL40A, SNF1,
CLB2, KSS1, SWD1, RPL40B, MUS81, SWI5, GRR1 and
GPA1.
This consistency shows that the proteins in the core-module interact strongly over the entire observation period. This is not surprising, since this
functional module is involved in cell growth and
cell death, as well as in changes of ethanol concentration. Such
a consistent evolutionary pattern may provide clues about how proteins respond to external stimuli
as wine fermentation progresses. Table 3 lists the top 10 biological process annotations of the core-module M∗, all with
very low p-values, as calculated
by BiNGO [11]. Functional keywords such as protein ubiquitination, protein conjugation, post-translational modification,
response to stimulus and catabolic process have been shown
to play an important role in S. cerevisiae fermentation [15, 14, 4].

Table 1: Comparison of modules' similarity across timestamps
Method                               t=0-12  t=12-24  t=24-48  t=48-60  t=60-120  t=120-340  Ave
Evolution Flow                       0.49    0.53     0.55     0.53     0.51      0.51       0.52
Evolution Flow (Without Smoothness)  0.24    0.29     0.32     0.29     0.3       0.28       0.28
K-means (K=15)                       0.10    0.13     0.07     0.09     0.09      0.10       0.10
K-means (K=30)                       0.19    0.23     0.24     0.21     0.21      0.2        0.21
FCM (K=15)                           0.22    0.21     0.22     0.22     0.22      0.25       0.21
FCM (K=30)                           0.16    0.15     0.22     0.24     0.14      0.15       0.17
Spectral Clustering (K=15)           0.24    0.27     0.30     0.30     0.26      0.21       0.26
Spectral Clustering (K=30)           0.2     0.16     0.21     0.17     0.17      0.22       0.18

Table 2: Comparison of −log(p-value)
Method                      t=0    t=12   t=24   t=48   t=60   t=120  t=340  Ave
Evolution Flow              7.51   10.64  9.03   11.71  8.99   9.56   10.46  9.7
K-means (K=15)              4.66   3.64   3.79   4.63   4.48   3.92   4.21   4.19
K-means (K=30)              4.21   4.34   4.13   3.84   3.82   4.01   3.64   3.99
FCM (K=15)                  6.69   8.26   9.09   6.77   8.03   5.43   8.4    7.52
FCM (K=30)                  5.18   6.8    6.5    5.79   6.27   5.52   7.39   6.85
Spectral Clustering (K=15)  6.25   7.97   8.57   9.57   8.14   8.17   7.21   7.98
Spectral Clustering (K=30)  5.56   5.33   5.52   5.29   5.32   4.75   5.31   5.29

ACM-BCB 2012

4. CONCLUSIONS
In this paper, we proposed a framework for analyzing
the evolutionary patterns of functional modules in dynamic
PPI networks. Because the framework accounts for the inherent dynamics of PPI networks, it
may provide novel insights into the underlying behavior of
the molecular system. To the best of our knowledge, this is the
first evolutionary analysis of functional modules in dynamic
PPI networks. Using the S. cerevisiae wine fermentation
dataset over consecutive timestamps, we demonstrated with high confidence the
gene annotation enrichment of the identified functional modules, that is, of the sets of proteins that participate in the same biological function. Also, the results of the
experiment in Section 3.4 lead to the conclusion that the
proposed framework can effectively categorize and track the evolutionary events of functional modules, and can identify
an informative functional module that plays an important
role over the entire observation period. By analyzing in depth the gene annotations of functional modules whose
evolutionary patterns are distinctive, we may gain important insights into various diseases or organisms.
Table 3: Top 10 biological process annotations for the core-module M∗
GO-ID  p-value   Description
16567  7.92E-10  protein ubiquitination
32446  5.48E-09  protein modification by small protein conjugation
70647  3.42E-08  protein modification by small protein conjugation or removal
43687  4.13E-07  post-translational protein modification
51716  8.25E-07  cellular response to stimulus
43412  1.15E-06  macromolecule modification
42787  1.17E-06  protein ubiquitination involved in ubiquitin-dependent protein catabolic process
6974   1.78E-06  response to DNA damage stimulus
6464   1.88E-06  protein modification process
50896  3.64E-06  response to stimulus

Figure 4: Plot of evolving graph with varying α values.

5. REFERENCES
[1] K. Basso et al. Reverse engineering of regulatory networks in human B cells. Nature Genetics, 37(4):382–390, 2005.
[2] Y. Chi et al. On evolutionary spectral clustering. ACM Transactions on Knowledge Discovery from Data, 3(4):1–30, 2009.
[3] Y.-R. Cho, L. Shi, and A. Zhang. flownet: Flow-based approach for efficient analysis of complex biological networks. 2009 Ninth IEEE International Conference on Data Mining, pages 91–100, 2009.
[4] J. Ding et al. Tolerance and stress response to ethanol in the yeast Saccharomyces cerevisiae. Applied Microbiology and Biotechnology, 74(2):253–263, 2010.
[5] A. J. Enright, S. Van Dongen, and C. A. Ouzounis. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Research, 30(7):1575–1584, 2002.
[6] L. Getoor and C. P. Diehl. Link mining: a survey. SIGKDD Explor. Newsl., 7(2):3–12, Dec. 2005.
[7] M. Girvan and M. E. J. Newman. Community structure in social and biological networks. PNAS, pages 1–9, 2002.
[8] J.-D. J. Han et al. Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature, 430(6995):88–93, 2004.
[9] H. Jeong, S. P. Mason, A. L. Barabási, and Z. N. Oltvai. Lethality and centrality in protein networks. Nature, 411(6833):41–42, 2001.
[10] H. K. Lee et al. Coexpression analysis of human genes across many microarray data sets. Genome Research, 14(6):1085–1094, 2004.
[11] S. Maere, K. Heymans, and M. Kuiper. BiNGO: a Cytoscape plugin to assess overrepresentation of Gene Ontology categories in biological networks. Bioinformatics, 21(16):3448–3449, 2005.
[12] V. Marks et al. Dynamics of the yeast transcriptome during wine fermentation reveals a novel fermentation stress response. FEMS Yeast Research, 8(1):35–52, 2008.
[13] E. Nabieva et al. Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps. Bioinformatics, 21 Suppl 1:302–310, 2005.
[14] S. Ostergaard, L. Olsson, and J. Nielsen. Metabolic engineering of Saccharomyces cerevisiae. Microbiology and Molecular Biology Reviews, 64(1):34–50, 2000.
[15] N. Piggott, M. Cook, M. Tyers, and V. Measday. Genome-wide fitness profiles reveal a requirement for autophagy during yeast fermentation. G3 (Bethesda), 1(5):353–67, 2011.
[16] T. M. Przytycka, M. Singh, and D. K. Slonim. Toward the dynamic interactome: it's about time. Access, 11(1), 2010.
[17] Y. Qi and H. Ge. Modularity and dynamics of cellular networks. PLoS Computational Biology, 2(12):9, 2006.
[18] A. Ruepp et al. The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes. Nucleic Acids Research, 32(18):5539–5545, 2004.
[19] S. Salvador and P. Chan. Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms. 16th IEEE International Conference on Tools with Artificial Intelligence, 1(Ictai):576–584, 2004.
[20] L. Shi, Y.-R. Cho, and A. Zhang. Functional flow simulation based analysis of protein interaction network. BIBE '10, pages 144–149, 2010.
[21] M. Takaffoli, F. Sangi, J. Fagnan, and O. R. Za. Modec modeling and detecting evolutions of communities. Artificial Intelligence, pages 626–629, 2010.
[22] M. Takaffoli, F. Sangi, J. Fagnan, and O. R. Zaiane. A framework for analyzing dynamic social networks. Science, 2010.
[23] X. Tang, J. Wang, B. Liu, M. Li, G. Chen, and Y. Pan. A comparison of the functional modules identified from time course and static PPI network data. BMC Bioinformatics, 12(1):339, 2011.
[24] S. White and P. Smyth. A spectral clustering approach to finding communities in graphs. Proceedings of the fifth SIAM international conference on data mining, 119:274, 2005.
[25] A. Zhang. Protein Interaction Networks: Computational Analysis. 2009.
[26] S. Zhang, H.-W. Liu, X.-M. Ning, and X.-S. Zhang. A hybrid graph-theoretic method for mining overlapping functional modules in large sparse protein interaction networks. International journal of data mining and bioinformatics, 3(1):68–84, 2009.