Protein Complexes Enriched with Cancer Proteins

Advances in Environment, Computational Chemistry and Bioscience
Chien-Hung Huang
Department of Computer Science and Information Engineering, National Formosa University, Yun-Lin, 632, Taiwan
chhuang@nfu.edu.tw
Szu-Yu Chou
Department of Computer Science and Information Engineering, National Formosa University, Yun-Lin, 632, Taiwan
19966157@gm.nfu.edu.tw
Ka-Lok Ng*
Department of Biomedical Informatics, Asia University, Taichung, 413, Taiwan
ppiddi@gmail.com
Abstract: Proteins participate in many biological processes within an organism, but they rarely function in isolation. Most proteins achieve a particular function by interacting with other proteins to form protein complexes. In this work, enrichment analysis is used to study protein complexes enriched with cancer proteins, which include oncoproteins (OCP), tumor suppressor proteins (TSP), and cancer proteins that act as both OCP and TSP. Furthermore, we construct the protein complex interaction network and study its pattern of interaction.
A total of 1818 human protein complexes were retrieved from the MIPS database. Cancer protein data were obtained from the Tumor Associated Gene (TAG) database and the Memorial Sloan-Kettering Cancer Center (MSKCC). Protein-protein interaction (PPI) data were derived from the BioGRID database. We used MATLAB to pre-process the data, identify PPIs, conduct the statistical analysis and present the visualization by animation. Hypergeometric analysis indicated that 248 and 36 protein complexes are enriched with cancer proteins at significance levels of 0.05 and 0.01, respectively. Complexes consisting of cancer proteins tend not to interact with each other: the ratio of the interaction probability between cancer-related protein complex (CRPC) pairs to that between CRPC and non-CRPC pairs is 0.124. This suggests that a CRPC tends to interact with non-CRPCs.
Understanding the biological significance of complexes enriched with known cancer proteins, and of their interactions, may play a crucial role in studying the cause of cancer and in biomedicine. It is expected that the present work can lead to a better understanding of cell growth, differentiation and apoptosis. A web service related to this work has been set up and can be accessed at http://bioinfo.csie.nfu.edu.tw:8080/ProteinComplex/Default.aspx.
Key-Words: protein complex; cancer; oncoprotein; tumor suppressor protein; protein-protein interaction; enrichment analysis
1 Background
The cause of cancer is closely related to the gain of function of an oncoprotein (OCP) or the loss of function of a tumor suppressor protein (TSP), but the relationships among these cancer-related proteins are still largely unknown and uninvestigated. A disease is often associated with many proteins, and these proteins are likely to regulate one another in their biological functions. Several studies have suggested that two proteins participating in the same protein-protein interaction (PPI) have highly similar biological functions; therefore, if a protein is related to a disease, its PPI partners are also likely to be connected to the same disease [2, 6, 10].
ISBN: 978-1-61804-147-0
Recent experimental studies showed that most proteins do not work alone in biological processes within an organism; instead, they have temporary or stable interactions with other proteins. Cellular functions are performed by protein complexes, which are generally composed of many proteins. Protein complexes play critical roles in integrating individual gene products to carry out many cellular functions. For example, the α3β1-tetraspanin protein complex is of vital importance in regulating protruding activity in tumor cells [13], and complexes containing PDZ proteins are critical in constructing cell-cell adhesions and in epithelial cell polarity processes [11]. Hence, identifying the members of a protein complex is an elementary step toward understanding various biological processes.
To further analyze the cancer-related protein complexes (CRPC) and the interactions between them, this article employs PPI data to construct a protein complex interaction network. The findings provide useful information to cancer researchers, such as: (1) identifying the CRPCs that comprise a significantly higher number of OCP or TSP, and (2) characterizing the interaction pattern of the CRPC network.
This study calculated the probability of a protein complex having at least one cancer protein; this result could be used to predict new CRPCs. Furthermore, the biological functions in which CRPCs participate are highly related to cell growth, differentiation and activity. Therefore, building a model to quantify the corresponding correlation factor will provide useful information for annotating CRPCs of unknown function.
Owing to the availability of massive PPI data in recent years, large-scale investigation has become possible. In this research, we propose to integrate the PPI data and cancer protein data to determine which cancer proteins often appear in a CRPC.
In a previous work, Oti et al. applied PPI data to predict disease genes [10]. They collected 10894 human proteins (only 6005 of them came from true human proteins; the others were inferred from orthologs) and 72940 PPIs. The authors also collected 432 loci of 383 diseases. A protein whose interaction partner lies near one of these loci is then predicted to be a possible disease-causative protein. However, this method has some drawbacks. First, PPI data from yeast two-hybrid (Y2H) experiments suffer from high false-positive and false-negative rates; second, the accuracy of PPI data inferred from cross-species orthologs is questionable; third, the disease gene loci from the OMIM Morbid Map differ from those from Ensembl.
Kar et al. used PPI data and protein structure information to analyze cancer proteins [6]. The authors integrated the human cancer PPI network and protein structure data to derive properties of cancer-related proteins. They found that, compared with non-cancer proteins, cancer-related proteins have four features: smaller binding sites, greater flatness, higher electric charge and lower hydrophobicity.
Goh et al. discussed the relation between the human disease network and PPI data [2]. The authors constructed the human disease network and found that the disease-causative proteins of the same disease have a higher probability of interacting with one another and have higher transcript expression, which suggests the existence of disease-specific functional modules. They also indicated that the majority of human essential genes encode hub proteins expressed in many organs, whereas most disease-causative genes are not essential genes and are located in the functional periphery of the network.
A protein is composed of many functional domains; therefore, domain-domain interaction (DDI) is used in many studies to examine PPI. Jonsson and Bates employed PPI data to analyze cancer-related proteins and domains [5]. They used graph theory to analyze the cancer-related PPI network, and found that cancer-related proteins are involved in more interactions than non-cancer-related proteins. The authors also listed the twenty most frequent domains, many of which are significantly related to DNA manipulation and repair, such as the zinc-finger, PHD-finger, BRCT and paired-box domains.
Chan established a weighted scoring of functional regions for cancer-related proteins [1]. Chan's study suggested that some domains have a higher tumor-stimulating weight, such as the protein kinase domain, tyrosine protein kinase, SH2, SH3 and pleckstrin-like domains, while others have a higher tumor-suppressing weight, such as the armadillo-like helical, ankyrin, cullin and exostosin-like domains. By using the weighted score to match complete cDNA sequences in a human database, the author identified some novel cancer-related proteins.
Schuster-Böckler and Bateman applied PPI data to analyze DDI [12]. They computed the distribution of DDIs, obtained from iPfam, in several PPI databases, such as HPRD [9], MPact [4], BioGRID, DIP [14] and IntAct. The results indicated that the majority of PPIs can be explained by a few DDI combinations. In addition, quite a few DDI combinations also exist across species, which shows that DDIs are well conserved.
Lee et al. integrated heterogeneous data to predict DDI [8]. The authors integrated the PPI data of yeast, C. elegans, D. melanogaster and H. sapiens from the DIP database with domain fusion and domain function information. A Bayesian method was used to integrate these data, and a scoring function was then constructed to derive the highly correlated DDIs. The authors listed ten common sets of DDIs, and the predicted results were compared with the iPfam database; the comparison showed that this method has decent accuracy.
Guimarães et al. proposed a scheme based on PPI data to predict DDI [3]. The authors employed the PPI network and the parsimony principle to predict DDIs, and then used a linear programming optimization method to estimate the reliability of the predicted DDIs.
Krycer et al. employed PPI and DDI data to infer the protein complex network [7]. The authors showed that PPI and DDI data can be used to interpret the core-module mechanism in protein complexes.
2 Method
2.1 Data source
The OCP and TSP data are derived from the following three databases: (1) the Tumor Associated Gene database of National Cheng Kung University, Taiwan (http://www.binfo.ncku.edu.tw/TAG/), (2) Memorial Sloan-Kettering Cancer Center and (3) National Yang Ming University. The PPI and protein domain data are obtained from BioGRID and Pfam. This research collected 536 oncogenes (OCG) and 139 OCP, and 900 tumor suppressor genes (TSG) and 422 TSP. The numbers of OCP and TSP are smaller than those of OCG and TSG, respectively, because some genes have no UniProt accession numbers. The above data are integrated with the PPI data to derive PPI data for OCP and TSP. A total of 1818 protein complexes are analyzed in the present study.
2.2 Measure the probability of cancer protein appearing in protein complex
The probability of a protein complex containing at least one cancer protein can be computed by hypergeometric distribution analysis. Let p(x, y) denote the probability that a protein complex consists of x cancer proteins and y non-cancer proteins; then the probability is given by:
$$p(x, y) = \frac{C_x^n \, C_y^{N-n}}{C_{x+y}^N} \qquad (1)$$
where $C_x^N = \frac{N!}{(N-x)!\,x!}$ is the binomial coefficient, N and n represent the total number of proteins and the number of cancer proteins in the protein complexes, respectively, and x + y is the total number of proteins in a given protein complex.
2.3 Measure the interaction probability between CRPC and non-CRPC
A subunit (protein) in a protein complex may interact with a subunit in another protein complex. If we treat a protein as a node and an interaction between subunits of different complexes as an edge, then we can construct a protein complex network. Let p(C-C) denote the interaction probability between CRPC pairs and p(C-X) that between CRPC and non-CRPC pairs (this quantity includes both p(C-X) and p(X-C)); then:
$$R_{exp} = \frac{p(C-C)}{p(C-X)} = \frac{C_2^a}{C_1^a \, C_1^b} \qquad (2)$$
where $R_{exp}$, a and b are the expected ratio of p(C-C) to p(C-X), the total number of CRPCs and the total number of non-CRPCs, respectively. We define the observed ratio, $R_{obs}$, as the ratio of the observed numbers of C-C and C-X interactions in the protein complex interaction network. Thus, if the ratio of $R_{obs}$ to $R_{exp}$ is less than one, the observed pattern of interaction is suppressed relative to the expected value.
2.4 Protein complex interaction network
A protein interaction network can be treated as a simple undirected graph, where each protein is mapped to a node and each interaction between two proteins is mapped to an edge. The visualization graph proposed in this work clearly shows the interaction between each protein pair and the protein attributes, and it can be further combined with various graph clustering algorithms to predict protein attributes and protein complexes.
3 Results
We performed statistical analyses of cancer-related proteins in protein complexes; the calculations were carried out using MATLAB.
3.1 Measure the probability of a cancer protein appearing in protein complex
Hypergeometric distribution analysis indicated that 248 and 36 protein complexes are enriched with cancer proteins, i.e. more than 50% of the complex's subunits are cancer proteins, at significance levels of 0.05 and 0.01, respectively. It would be interesting to further explore the relationship between these cancer-related protein complexes and the formation of cancers in the future.
3.2 Measure the interaction probability among cancer-related protein complexes
The computed values of p(C-C) and p(C-X) are 0.021 and 0.166, respectively. Therefore, the ratio of p(C-C) to p(C-X) is estimated to be 0.124, which means that complexes consisting of cancer proteins tend to interact with non-CRPCs.
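The enrichment test of Sections 2.2 and 3.1 can be illustrated with a short sketch. The Python code below is not the authors' MATLAB code; it evaluates the hypergeometric probability of Eq. (1) with `math.comb` and sums the upper tail to obtain a p-value for observing at least x cancer proteins in a complex of a given size. The function names, the upper-tail convention for the p-value and the numbers in the usage note are illustrative assumptions rather than details taken from the study.

```python
from math import comb


def complex_probability(x: int, y: int, N: int, n: int) -> float:
    """Eq. (1): probability that a complex of x + y subunits, drawn from N
    proteins of which n are cancer proteins, holds exactly x cancer proteins."""
    return comb(n, x) * comb(N - n, y) / comb(N, x + y)


def enrichment_p_value(x: int, size: int, N: int, n: int) -> float:
    """Upper-tail p-value (an assumed convention): probability of seeing
    x or more cancer proteins in a complex with `size` subunits."""
    upper = min(size, n)
    return sum(complex_probability(k, size - k, N, n) for k in range(x, upper + 1))


def at_least_one_cancer_protein(size: int, N: int, n: int) -> float:
    """Probability that a complex of `size` subunits has at least one cancer protein."""
    return 1.0 - complex_probability(0, size, N, n)
```

For a hypothetical pool of N = 20 proteins of which n = 5 are cancer proteins, `enrichment_p_value(3, 4, 20, 5)` is about 0.032, so a four-subunit complex with three cancer proteins would count as enriched at the 0.05 level but not at the 0.01 level under this convention.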
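The ratio comparison of Sections 2.3 and 3.2 can be sketched in the same way. The Python code below (again, not the authors' MATLAB code) lifts a set of PPIs to links between complexes, counts C-C and C-X links, and evaluates Eq. (2). Several choices are assumptions made only for illustration: complexes are used as the nodes of the complex-level network, two complexes are linked when any subunit of one interacts with a different subunit of the other, interactions inside a single complex are ignored, and all function names and data are hypothetical.

```python
from itertools import combinations
from math import comb


def complex_links(complexes, ppis):
    """Link two complexes when a subunit of one interacts with a subunit of the other.

    complexes: dict mapping a complex name to the set of its subunit IDs.
    ppis: set of frozensets, each holding the two interacting protein IDs.
    """
    links = set()
    for c1, c2 in combinations(sorted(complexes), 2):
        if any(frozenset((p, q)) in ppis
               for p in complexes[c1] for q in complexes[c2] if p != q):
            links.add(frozenset((c1, c2)))
    return links


def expected_ratio(a: int, b: int) -> float:
    """Eq. (2): R_exp = C(a, 2) / (C(a, 1) * C(b, 1)) for a CRPCs and b non-CRPCs."""
    return comb(a, 2) / (comb(a, 1) * comb(b, 1))


def observed_ratio(links, crpc) -> float:
    """R_obs: number of observed C-C links divided by the number of C-X links."""
    cc = sum(1 for link in links if link <= crpc)           # both ends are CRPCs
    cx = sum(1 for link in links if len(link & crpc) == 1)  # exactly one end is a CRPC
    if cx == 0:
        raise ValueError("no C-X links observed; the ratio is undefined")
    return cc / cx
```

With three hypothetical CRPCs and two non-CRPCs, `expected_ratio(3, 2)` is 0.5; if one C-C link and three C-X links are observed, `observed_ratio` returns 1/3, so the ratio of R_obs to R_exp is about 0.67, which would indicate that C-C links are rarer than expected.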
3.3 Protein complex interaction network
The powerful graphics capability of MATLAB makes it possible to visualize the cancer protein complex network. In this network, proteins and interactions are drawn as nodes and edges, respectively. OCP, TSP and non-cancer proteins are represented by different shapes and presented by animation.
We labeled each cancer class in the PPI network with a different color and shape. In addition, recent experimental studies indicated that a protein complex can be viewed as a unit composed of cores, modules and attachments. Core proteins are proteins that have relatively more interactions among themselves. For clarity, the visualization graph is divided into two parts, with PPI and without PPI. The results are displayed by animation, and a web service has been set up which can be accessed at http://bioinfo.csie.nfu.edu.tw:8080/ProteinComplex/Default.aspx.
Fig. 1 A cancer-related protein complex with PPI among its subunits.
Fig. 1 shows a cancer-related protein complex with PPI among its subunits. The different types of proteins are displayed by different node shapes; that is, TSP, OCP and non-cancer-related proteins are represented by ovals, squares and circles, respectively. The number inside a node is the UniProt ID of the corresponding protein.
Fig. 2 shows a non-cancer-related protein complex without PPI. In this case, there is no interaction among the subunits; therefore, no connecting line is shown in the graph.
Fig. 2 A non-cancer-related protein complex without PPI.
4 Conclusion
The results indicate that 248 (13.6%) and 36 (1.98%) protein complexes are enriched with cancer proteins at significance levels of 0.05 and 0.01, respectively. It is also found that CRPC pairs seldom interact with one another: compared with the interactions between CRPC and non-CRPC pairs, the ratio is 0.124, which also means that the observed pattern of interaction is suppressed relative to the expected value. The complexes enriched with cancer proteins, and the interactions between CRPCs, are worth further investigation in relation to the mechanisms of cancer formation.
Acknowledgement
The work of Chien-Hung Huang and Ka-Lok Ng is supported by the National Science Council of Taiwan under grants NSC 101-2221-E-150-088-MY2 and NSC 100-2221-E-468-013, respectively.
References:
[1] Chan HH, Identification of novel tumor-associated gene (TAG) by bioinformatics analysis, MSc. Thesis, National Cheng Kung University, Taiwan.
[2] Goh KI, Cusick ME, Valle D, Childs B, Vidal M and Barabási AL, The human disease network, Proc Natl Acad Sci U S A, Vol. 104, No. 21, 2007, pp. 8685-8690.
[3] Guimarães KS, Jothi R, Zotenko E and Przytycka TM, Predicting domain-domain interactions using a parsimony approach, Genome Biol, Vol. 7, No. 11, 2006, pp. R104.
[4] Güldener U, Münsterkötter M, Oesterheld M, Pagel P, Ruepp A, Mewes HW and Stümpflen V, MPact: the MIPS protein interaction resource on yeast, Nucleic Acids Res, 34, 2006, pp. D436-441.
[5] Jonsson PF and Bates PA, Global topological features of cancer proteins in the human interactome, Bioinformatics, Vol. 22, No. 18, 2006, pp. 2291-2297.
[6] Kar G, Gursoy A and Keskin O, Human cancer protein-protein interaction network: a structural perspective, PLoS Comput Biol, Vol. 5, No. 12, 2009, pp. e1000601.
[7] Krycer JR, Pang CN and Wilkins MR, High throughput protein-protein interaction data: clues for the architecture of protein complexes, Proteome Sci, 6:32, 2008.
[8] Lee H, Deng M, Sun F and Chen T, An integrated approach to the prediction of domain-domain interactions, BMC Bioinformatics, 7:269, 2006.
[9] Mishra GR, Suresh M and Kumaran K et al., Human protein reference database--2006 update, Nucleic Acids Res, 34, 2006, pp. D411-414.
[10] Oti M, Snel B, Huynen MA and Brunner HG, Predicting disease genes using protein-protein interactions, J Med Genet, Vol. 43, No. 8, 2006, pp. 691-698.
[11] Roh MH and Margolis B, Composition and function of PDZ protein complexes during cell polarization, Am J Physiol Renal Physiol, Vol. 285, No. 3, 2003, pp. F377-F387.
[12] Schuster-Böckler B and Bateman A, Reuse of structural domain-domain interactions in protein networks, BMC Bioinformatics, 8:259, 2007.
[13] Sugiura T and Berditchevski F, Function of alpha3beta1-tetraspanin protein complexes in tumor cell invasion. Evidence for the role of the complexes in production of matrix metalloproteinase 2 (MMP-2), J Cell Biol, Vol. 146, No. 6, 1999, pp. 1375-1389.
[14] Xenarios I, Rice DW, Salwinski L, Baron MK, Marcotte EM and Eisenberg D, DIP: the database of interacting proteins, Nucleic Acids Res, Vol. 28, No. 1, 2000, pp. 289-291.