Protein-protein interaction and pathway databases, a graphical review

Presentation for Shamir group meeting
Interactome under construction:
protein-protein interaction and pathway databases
5/1/2011
Based on the papers:
Protein-protein interactions: Interactome under construction.
Bonetta L. Nature. (PMID: 21150998, Dec 2010)
Protein-protein interaction and pathway databases, a graphical
review. Klingström T, Plewczynski D. Brief Bioinform. (PMID:
20851835, Sep 2010)
Protein-protein Interaction (PPI) and Biological Pathways (BP)
databases

There are two kinds of databases, each concentrating on one of
two aspects of biochemical/biological data:

(i) Protein-protein interaction (PPI) databases gather data on
the physical interactions between proteins.

(ii) Biological pathway (BP) databases, covering metabolic
and transport pathways, signaling cascades, and regulatory
networks, gather data on the biological meaning of PPIs and of
other possible interactions between gene products.

Most databases of both kinds support visualization and the
production of maps showing a selected group of interactions.

In this presentation I will concentrate mainly on PPI
databases, and on BP databases that use PPIs.
STRING: ATM, 73 interactors (confidence score > 0.700)
STRING: ATM-100
ATM signaling network
MAPK Cascade
KEGG map for the development of melanoma
Methods for detecting protein-protein interactions

There are two main approaches for detecting interacting proteins:
techniques that measure direct physical interactions between protein pairs
— binary approaches — and those that measure interactions among groups
of proteins that may not form physical contacts — co-complex methods.

The two main binary methods for measuring direct physical interactions
between protein pairs are:
Yeast two-hybrid (Y2H)
Luminescence-based mammalian interactome mapping (LUMIER)



The most common co-complex method is co-immunoprecipitation (co-IP)
coupled with mass spectrometry (MS)

In addition to these empirical methods, researchers have used
computational techniques to predict interactions on the basis of factors such
as amino-acid sequence and structural information.

The most frequently used binary method is the yeast two-hybrid (Y2H)
system. It has variations involving different reagents, and has been adapted
to high-throughput screening. The strategy interrogates two proteins, called
bait and prey, coupled to two halves of a transcription factor and expressed
in yeast. If the proteins make contact, they reconstitute a transcription factor
that activates a reporter gene.

LUMIER (luminescence-based mammalian interactome mapping) is a
method for identifying binary interactions. This strategy fuses the Renilla
luciferase (RL) enzyme, which catalyses a light-emitting reaction, to a bait
protein, which is expressed in a mammalian cell along with candidate protein
partners tagged with a polypeptide called Flag. Researchers use a Flag
antibody to immunoprecipitate all proteins with the Flag tag, along with any
that interact with them. Interactions between the RL-fused bait and the Flag-tagged prey are detected when light is emitted.

The most common co-complex method is co-immunoprecipitation (co-IP)
coupled with mass spectrometry (MS). In this approach, a protein bait is
tagged with a molecular marker. Techniques that recognize the tag are used
to fish the bait protein out of the cell lysate, bringing with it any interacting
proteins. These proteins are then identified by mass spectrometry (MS).
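One common way to reduce non-specific co-IP hits is to compare what MS finds in the bait pull-down with what it finds in a control pull-down (for example, the tag alone). The sketch below assumes that approach and a simple spectral-count fold cutoff; the protein names, counts and the 3-fold threshold are all hypothetical, and this is not the scoring used by any particular study.

```python
# Minimal sketch: keep co-IP/MS hits that are enriched over a control pull-down.
# Protein names, spectral counts and the 3-fold cutoff are hypothetical.

def enriched_preys(bait_ip, control_ip, min_fold=3.0, pseudocount=1.0):
    """Return {protein: fold enrichment} for proteins whose spectral count
    in the bait IP is at least `min_fold` times that in the control IP.

    bait_ip, control_ip: dicts mapping protein name -> spectral count.
    The pseudocount avoids division by zero for proteins absent from the control.
    """
    hits = {}
    for protein, count in bait_ip.items():
        fold = (count + pseudocount) / (control_ip.get(protein, 0) + pseudocount)
        if fold >= min_fold:
            hits[protein] = round(fold, 2)
    return hits


# Spectral counts from the bait pull-down and from a tag-only control.
bait_ip = {"PREY_A": 40, "PREY_B": 12, "BACKGROUND_1": 30, "BACKGROUND_2": 25}
control_ip = {"PREY_B": 5, "BACKGROUND_1": 28, "BACKGROUND_2": 20}

print(enriched_preys(bait_ip, control_ip))  # {'PREY_A': 41.0}
```

Note that such a filter only removes background binders; it cannot tell a direct partner from another member of the same complex, which is the main limitation of co-IP discussed later.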

Binary methods for measuring direct physical interactions:

The yeast two-hybrid system
A plasmid carrying the DNA-binding domain of a
transcription factor, fused to the DNA encoding the
"target" protein (the protein whose possible partners we
wish to identify, i.e. the bait), is introduced into yeast cells
of mating type a. The transcription factor is needed to turn
on expression of a "reporter gene" such as the lacZ gene
(which encodes the enzyme β-galactosidase). Into a second
population of yeast cells, of mating type α, a plasmid is
introduced that carries the activation domain of the
transcription factor fused to the DNA encoding a possible
partner ("prey") protein. The α cells are then mated with
the a cells. If the prey fusion protein can bind the fusion
protein containing the target, the two domains of the
transcription factor are brought together and turn on
expression of the reporter gene (lacZ in our case).
Grown on an indicator substrate, these colonies turn
blue. The DNA in these colonies can then be isolated and
sequenced. The result: identification of the proteins that
can associate with the target.
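The logic of a mating-based screen can be modelled as a toy simulation: each a-type strain expresses a target fused to the DNA-binding domain (DBD), each α-type strain a candidate partner fused to the activation domain (AD), and a diploid colony turns blue only if the two fusions interact. The sketch also includes one common control, discarding targets that switch on the reporter with the empty AD vector (auto-activators), a well-known source of Y2H false positives. All protein names and the "true" interaction set are hypothetical.

```python
from itertools import product

# Toy model of a mating-based yeast two-hybrid screen. All protein names,
# the "true" interactions and the auto-activating target are hypothetical.
TRUE_INTERACTIONS = {frozenset({"TARGET_1", "PARTNER_X"}),
                     frozenset({"TARGET_2", "PARTNER_Y"})}
AUTOACTIVATORS = {"TARGET_3"}  # DBD fusions that switch on lacZ by themselves
EMPTY = None                   # activation domain expressed with no fused protein


def reporter_on(target, partner):
    """True if the diploid expresses lacZ (blue colony): the DBD-target and
    AD-partner fusions interact, or the DBD-target auto-activates."""
    if target in AUTOACTIVATORS:
        return True
    if partner is EMPTY:
        return False
    return frozenset({target, partner}) in TRUE_INTERACTIONS


def y2h_screen(targets, partners):
    """Mate every a-type target strain with every alpha-type partner strain and
    return (target, partner) pairs from blue colonies, after discarding targets
    that are already blue with the empty activation-domain vector."""
    usable = [t for t in targets if not reporter_on(t, EMPTY)]
    return [(t, p) for t, p in product(usable, partners) if reporter_on(t, p)]


targets = ["TARGET_1", "TARGET_2", "TARGET_3"]
partners = ["PARTNER_X", "PARTNER_Y", "PARTNER_Z"]
print(y2h_screen(targets, partners))
# [('TARGET_1', 'PARTNER_X'), ('TARGET_2', 'PARTNER_Y')]
```

Without the empty-vector control, TARGET_3 would appear to bind every partner, which illustrates why Y2H results need independent confirmation.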
LUMIER (luminescence-based mammalian interactome mapping)

LUMIER (luminescence-based mammalian interactome mapping) is a high-throughput
method for identifying binary interactions. This strategy fuses the Renilla
luciferase (RL) enzyme, which catalyses a light-emitting reaction, to a bait
protein (A in the picture), which is expressed in a mammalian cell
along with candidate protein partners tagged with a polypeptide called Flag.
Researchers use a Flag antibody to immunoprecipitate all proteins with the Flag tag,
along with any that interact with them. Interactions between the RL-fused bait and
the Flag-tagged prey are detected when light is emitted.
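Each Flag immunoprecipitate gives a luciferase reading, so calling interactions reduces to deciding which readings stand out from background. A minimal sketch, assuming a single background reading (the RL-bait with a non-interacting Flag control) and a hypothetical log2 fold cutoff; published LUMIER screens use their own normalization and statistics, and the readings below are invented.

```python
import math

# Sketch: call LUMIER interactions from luciferase readings of Flag
# immunoprecipitates. Readings, names and the cutoff are hypothetical.

def lumier_calls(readings, background, min_log2_fold=1.5):
    """readings: dict Flag-tagged prey -> relative light units (RLU) measured
    after Flag IP from cells co-expressing the RL-fused bait.
    background: RLU from the same bait with a non-interacting Flag control.
    Returns {prey: log2 fold over background} for preys at or above the cutoff."""
    if background <= 0:
        raise ValueError("background RLU must be positive")
    calls = {}
    for prey, rlu in readings.items():
        if rlu <= 0:  # no light detected: no evidence of interaction
            continue
        log2_fold = math.log2(rlu / background)
        if log2_fold >= min_log2_fold:
            calls[prey] = round(log2_fold, 2)
    return calls


readings = {"PREY_A": 8000, "PREY_B": 1200, "PREY_C": 4100}
print(lumier_calls(readings, background=1000))
```

Working on a log scale makes a 2-fold increase and a 2-fold decrease symmetric, which is convenient when readings span orders of magnitude.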
The problem of false-positive PPI reports





The reliability of Y2H results is relatively low.
The reliability of co-IP experiments is also low, partly because some non-specific
partners are included in a reported PPI, but mainly because the method identifies
proteins that belong to the same complex rather than direct partners.
Possible solutions:
1) Using at least two different methods when analyzing specific PPIs.
2) The interaction data obtained in an experiment can also be combined with
data available in public databases, providing a more complete picture
(for example, using known PPI networks of other organisms, co-expression
data, and bioinformatics tools that identify sequences in the proteins
that promote specific interactions between them).
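Both solutions amount to counting independent lines of evidence per interaction. A minimal sketch, assuming each source is just a list of protein pairs; the source labels and pairs are hypothetical. Pairs are stored unordered so that A-B reported by one method matches B-A reported by another.

```python
# Sketch: keep only PPIs reported by at least two independent sources.
# Source names and protein pairs are hypothetical.

def as_ppis(pairs):
    """Store each PPI as an unordered pair, so (A, B) and (B, A) match."""
    return {frozenset(pair) for pair in pairs}


def supported_by(evidence, min_sources=2):
    """evidence: dict source name -> iterable of (protein, protein) pairs.
    Returns the set of PPIs reported by at least `min_sources` sources."""
    counts = {}
    for pairs in evidence.values():
        for ppi in as_ppis(pairs):  # de-duplicate within each source first
            counts[ppi] = counts.get(ppi, 0) + 1
    return {ppi for ppi, n in counts.items() if n >= min_sources}


evidence = {
    "Y2H": [("P1", "P2"), ("P1", "P3"), ("P4", "P5")],
    "co-IP/MS": [("P2", "P1"), ("P4", "P6")],
    "public database": [("P5", "P4")],
}
print(sorted(sorted(ppi) for ppi in supported_by(evidence)))
# [['P1', 'P2'], ['P4', 'P5']]
```

Raising `min_sources` trades false positives for false negatives, which is the tension discussed in the next slide.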
The false-negative problem

One challenge in defining protein–protein interaction networks is that unlike
the genome, the interactome is dynamic. Many interactions are transient,
and others occur only in certain cellular contexts or at particular times in
development. Interactions vary depending on the type of cell and the
cellular environment.

In the paper “Interactome under construction”, the protein–protein
interaction network of TGF-β, a growth factor that regulates cell functions,
was given as an example of this complexity. Two proteins that pass on the
factor's signals inside the cell, Smad2 and Smad4, interact with one another
only when the cells are stimulated with TGF-β. If the cells are not stimulated,
these two proteins don’t come into contact. It seems that after stimulation
the contact can form because of a change in the environment of the proteins,
and/or because specific post-translational modifications are added to these
proteins.
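One way to represent this dynamic behaviour is to annotate each edge with the cellular contexts in which it has been observed and to query the network per context. In the sketch, the Smad2-Smad4 annotation follows the TGF-β example from the paper; PROTEIN_X/PROTEIN_Y is a hypothetical constitutive pair, and the context labels are illustrative.

```python
# Sketch: a PPI network whose edges are annotated with the cellular contexts
# in which they were observed. The Smad2-Smad4 annotation follows the TGF-beta
# example in the text; PROTEIN_X/PROTEIN_Y is a hypothetical constitutive pair.
EDGES = [
    ("SMAD2", "SMAD4", {"TGF-beta stimulated"}),
    ("PROTEIN_X", "PROTEIN_Y", {"unstimulated", "TGF-beta stimulated"}),
]


def network_in_context(edges, context):
    """Return the (protein, protein) edges present in the given context."""
    return [(a, b) for a, b, contexts in edges if context in contexts]


print(network_in_context(EDGES, "unstimulated"))
# [('PROTEIN_X', 'PROTEIN_Y')]
print(network_in_context(EDGES, "TGF-beta stimulated"))
# [('SMAD2', 'SMAD4'), ('PROTEIN_X', 'PROTEIN_Y')]
```

A screen run only on unstimulated cells would miss the Smad2-Smad4 edge entirely, which is exactly a false negative.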
New methods for identifying interactome changes
during disease

AQUA (multiplex absolute quantification) is a new method that aims to
look at dynamic changes in protein interaction networks. AQUA uses
synthetic peptides that contain stable isotopes as internal standards for the
native peptides that are produced when proteins from a cell lysate are
digested. Using tandem MS, researchers can compare the levels of native
and synthetic peptides in a sample to obtain a measure of the amount of native
protein present. Synthetic peptides can also be prepared with
modifications. This method can provide an accurate and sensitive measure
of how the stoichiometry of the components within the complexes that make up a
network is altered in response to a stimulus.
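The quantification rests on a simple ratio: a known amount of heavy-isotope peptide is spiked into the digest, native and heavy forms behave alike in the MS, so the native amount equals the spiked amount times the ratio of their peak areas. Dividing the amounts of two complex components then gives their stoichiometry. A minimal sketch with hypothetical peak areas and spike amounts, assuming one quantified peptide per protein and complete digestion.

```python
# Sketch of the AQUA calculation. Peak areas and spike amounts are hypothetical;
# it assumes one quantified peptide per protein and complete digestion.

def aqua_amount(native_area, heavy_area, heavy_fmol):
    """Absolute amount (fmol) of a native peptide: the known amount of the
    co-eluting heavy-isotope standard times the native/heavy peak-area ratio."""
    return heavy_fmol * native_area / heavy_area


def stoichiometry(amount_a, amount_b):
    """Molar ratio of component B to component A within a complex."""
    return amount_b / amount_a


# 50 fmol of each heavy peptide is spiked into the digest in both conditions.
before = {"A": aqua_amount(2.0e6, 1.0e6, 50), "B": aqua_amount(1.0e6, 1.0e6, 50)}
after = {"A": aqua_amount(2.0e6, 1.0e6, 50), "B": aqua_amount(3.0e6, 1.5e6, 50)}

for label, amounts in (("before stimulus", before), ("after stimulus", after)):
    ratio = stoichiometry(amounts["A"], amounts["B"])
    print(f"{label}: A = {amounts['A']:.0f} fmol, B = {amounts['B']:.0f} fmol, B:A = {ratio:.2f}")
```

In this invented example the B:A ratio doubles from 0.50 to 1.00 after the stimulus, the kind of change in complex composition the method is meant to detect.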

KAYAK (kinase activity assay for kinome profiling) is another approach to
developing diagnostic tools for cancer, based on the functional
consequences of the interaction between a protein, in this case a kinase,
and its substrate. In this method, up to 90 peptide substrates for kinases
are used to simultaneously measure the addition of phosphate groups to
these peptides by the kinases in a cell lysate, in essence providing a ‘phosphorylation
signature’ for that particular cell.
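Because each lysate yields one phosphorylation measurement per peptide substrate, a "phosphorylation signature" can be treated as a vector indexed by peptide, and two cells can be compared by aligning and correlating their vectors. Peptide names and values below are invented, and Pearson correlation is one simple choice of similarity measure, not necessarily the analysis used with KAYAK.

```python
import math

# Sketch: compare two hypothetical KAYAK-style phosphorylation signatures.
# Peptide names and values are invented; Pearson correlation is one simple
# choice of similarity measure, not necessarily the one used by KAYAK itself.

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def compare_signatures(sig_1, sig_2):
    """Align two signatures (dict peptide -> phosphorylation level) on their
    shared peptides; return the correlation and the most changed peptide."""
    peptides = sorted(sig_1.keys() & sig_2.keys())
    x = [sig_1[p] for p in peptides]
    y = [sig_2[p] for p in peptides]
    most_changed = max(peptides, key=lambda p: abs(sig_2[p] - sig_1[p]))
    return pearson(x, y), most_changed


control_cells = {"PEP01": 0.10, "PEP02": 0.40, "PEP03": 0.05, "PEP04": 0.30}
tumor_cells = {"PEP01": 0.80, "PEP02": 0.35, "PEP03": 0.60, "PEP04": 0.25}
r, peptide = compare_signatures(control_cells, tumor_cells)
print(f"r = {r:.2f}; largest change at {peptide}")
```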
Some examples of PPI and biological pathway databases








PPI databases
BIOGRID (http://thebiogrid.org)
STRING (http://string-db.org)
DIP (http://dip.doe-mbi.ucla.edu/)
MINT (http://mint.bio.uniroma2.it/mint/Welcome.do)
IntAct (http://www.ebi.ac.uk/intact/main.xhtml)
HPRD (http://hprd.org/)
BIND (http://bind.ca/)




Biological pathway databases
SPIKE (http://www.cs.tau.ac.il/~spike/ and http://spike.cs.tau.ac.il/spike2/ )
REACTOME (http://www.reactome.org/)
KEGG (http://www.genome.jp/kegg/)
GeneMANIA (http://genemania.org/)

CYTOSCAPE (http://www.cytoscape.org/) is open-source software for
network visualization. It is the most important tool for visualizing data from PPI and
biological pathway databases.
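Cytoscape can import networks from plain-text Simple Interaction Format (SIF) files, in which each line names a source node, an interaction type and a target node ("pp" is the conventional type for protein-protein). The sketch generates such text from a list of pairs; the two ATM edges are well-known ATM substrates used here only as an illustration, not data taken from these slides.

```python
# Sketch: write interactions in Cytoscape's Simple Interaction Format (SIF),
# one "source<TAB>interaction type<TAB>target" line per edge, with "pp" for
# protein-protein. The two ATM edges are illustrative examples only.

def to_sif(pairs, interaction_type="pp"):
    """Return SIF text for a list of (source, target) protein pairs."""
    return "".join(f"{a}\t{interaction_type}\t{b}\n" for a, b in pairs)


pairs = [("ATM", "TP53"), ("ATM", "CHEK2")]
sif_text = to_sif(pairs)
print(sif_text, end="")
```

Saving this text to a file with a `.sif` extension gives a network file that Cytoscape can load; node attributes such as confidence scores would go into a separate attribute table.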
NCBI Gene (http://www.ncbi.nlm.nih.gov/gene/) is the data source for
human genes, but it also gathers relevant data on the interactions and
regulation of genes and gene products.
SPIKE imported PPI data from the IntAct PPI database and from the
REACTOME and KEGG biological pathway databases.
Categorization and characteristics of PPI databases

Stand-alone databases: BIND, DIP, HPRD, IntAct and MINT do not
incorporate data from other databases. BioGRID imported the HPRD and
FlyBase databases in 2006, but has not added any more data from other
databases since then.

Topical databases: DroID (PPIs in Drosophila melanogaster), MatrixDB
(extracellular PPIs), InnateDB (PPIs in the immune system) and MPIDB
(PPIs in microbes) combine data mining of other source databases with
their own curation efforts.

Metamining databases: APID, MiMI and UniHI aim to unify source
databases into a single comprehensive meta-database.

Predictive interaction databases: HAPPI, STRING, STITCH and
Scansite. STRING combines known interaction data from the interaction
databases BIND, BioGRID, DIP, IntAct, MINT and HPRD with interactions
from the pathway databases PID, Reactome, KEGG and EcoCyc.
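Integrating several evidence channels into one score can be sketched by treating the channel scores as independent probabilities: the combined score is 1 − Π(1 − s_i), the chance that at least one channel is right. This is a simplified version; STRING's published method additionally corrects each channel for a prior probability, which is omitted here. Partner names and channel scores are hypothetical; the 0.700 cutoff is the threshold used for the ATM network earlier in the slides.

```python
from math import prod

# Simplified score integration in the spirit of STRING: channel scores are
# treated as independent probabilities. STRING's own method also corrects for
# a prior; that step is omitted. Partner names and scores are hypothetical.

def combined_score(channel_scores):
    """1 - product over channels of (1 - s): the chance that at least one
    channel is right, assuming the channels are independent."""
    return 1.0 - prod(1.0 - s for s in channel_scores)


CUTOFF = 0.700  # threshold used for the ATM network earlier in the slides

candidates = {
    "PARTNER_1": [0.50, 0.50],        # e.g. an experimental and a co-expression channel
    "PARTNER_2": [0.30],
    "PARTNER_3": [0.60, 0.50, 0.20],
}
for name, scores in candidates.items():
    s = combined_score(scores)
    print(f"{name}: {s:.3f} {'kept' if s > CUTOFF else 'dropped'}")
```

Two moderate, independent channels (0.50 each) together exceed the cutoff that neither reaches alone, which is why integrating sources raises coverage.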
Inconsistencies in the definition of protein “interaction”

Databases use three different classes of protein interactions, sometimes
without distinguishing between them: binary physical interactions, membership
in the same complex (indirect interactions), and non-physical functional interactions.

Because of these inconsistencies in the definition of “interaction”, there is confusion
regarding the size of the human interactome: Venkatesan et al. estimate it
at 130,000 interactions, Hart et al. at 154,000–369,000 interactions and Stumpf et
al. at 650,000 interactions.

Closer inspection reveals that each team has defined its own search space as the
human interactome. Venkatesan et al. use the most restrictive definition and only
include binary physical interactions. Hart et al. use in-house experimental data
obtained by IP-MS to create their source networks, which means that proteins
belonging to the same protein complex are also considered to be interacting, thus
increasing the size of their defined interactome. Stumpf et al. rely on a
combination of yeast two-hybrid (Y2H) derived data sets and literature-curated
data from DIP and IntAct. Some literature-curated databases use a more flexible
definition of “interaction”: some of the papers they curate also consider non-physical
functional interactions to be a form of interaction. This definition significantly
enlarges the number of interactions.
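Why counting co-complex membership inflates the interactome can be made concrete. If a pull-down of one bait recovers several other proteins, the "spoke" convention records only bait-prey pairs, while the "matrix" convention treats every pair in the complex as interacting. For a complex of n proteins that is n − 1 versus n(n − 1)/2 interactions. The complex below is hypothetical.

```python
from itertools import combinations

# Sketch: two common ways to turn one co-complex pull-down into pairwise
# "interactions". The bait and prey names are hypothetical.

def spoke_model(bait, preys):
    """Bait paired with each co-purified protein: n - 1 pairs."""
    return {frozenset({bait, p}) for p in preys if p != bait}


def matrix_model(bait, preys):
    """Every pair of proteins in the purified complex: n(n - 1)/2 pairs."""
    members = {bait, *preys}
    return {frozenset(pair) for pair in combinations(sorted(members), 2)}


bait, preys = "BAIT", ["P1", "P2", "P3", "P4", "P5"]
print(len(spoke_model(bait, preys)))   # 5
print(len(matrix_model(bait, preys)))  # 15
```

The same six-protein complex contributes 5 or 15 "interactions" depending on the convention, and none of them need be a direct binary contact.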

With current technologies, about 35,000 human PPIs are known, only about 1/4 of
the estimated number of interactions. The central problem in the construction of
the interactome is therefore the false-negative problem: the known interactions are
just the tip of the iceberg, and a huge number of PPIs still needs to be identified.
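The "about 1/4" figure depends on which estimate is used as the denominator. The worked arithmetic below divides the ~35,000 known human PPIs by each published estimate quoted in these slides; only those numbers are used.

```python
# Worked arithmetic: what fraction of each published estimate of the human
# interactome do ~35,000 known PPIs represent? Figures are those in the slides.
KNOWN_HUMAN_PPIS = 35_000

ESTIMATES = {  # study -> (lowest, highest) estimated number of interactions
    "Venkatesan et al.": (130_000, 130_000),
    "Hart et al.": (154_000, 369_000),
    "Stumpf et al.": (650_000, 650_000),
}


def coverage(known, estimate):
    """Fraction of an estimated interactome that is already known."""
    return known / estimate


for study, (low, high) in ESTIMATES.items():
    # Coverage is smallest against the highest estimate, largest against the lowest.
    lo_cov, hi_cov = coverage(KNOWN_HUMAN_PPIS, high), coverage(KNOWN_HUMAN_PPIS, low)
    print(f"{study}: {lo_cov:.0%} to {hi_cov:.0%}")
# Venkatesan et al.: 27% to 27%
# Hart et al.: 9% to 23%
# Stumpf et al.: 5% to 5%
```

So roughly a quarter holds against the lower estimates, while against the broadest definition only about 5% of interactions are known.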
Some examples of the organisms covered by PPI and BP databases, and of their size



BIOGRID (http://thebiogrid.org/)
50 model organism species
Version 3.1.71 of the online interaction repository
includes 362,355 raw protein and genetic interactions from major model organism
species.
STRING (http://string-db.org)
STRING is a database of known and predicted protein interactions.
The interactions include direct (physical) and indirect (functional) associations.
STRING quantitatively integrates interaction data from several sources for a large
number of organisms, and transfers information between these organisms where
applicable. The database currently covers 2,590,259 proteins from 630 organisms.
GeneMANIA (http://genemania.org/)
Indexing 817 association networks containing 185,324,281 interactions mapped to
135,148 genes from 6 organisms.
HPRD (http://hprd.org/)
39,194 Protein-Protein Interactions (human)






DIP (http://dip.doe-mbi.ucla.edu/)
More than 80 genomes, but for human only 2,529 proteins and 3,376
interactions
IntAct (http://www.ebi.ac.uk/intact/main.xhtml)
Contains 234,147 binary interactions and 69,669 proteins.
MINT (http://mint.bio.uniroma2.it/mint/Welcome.do)
30 organisms, 90,503 interactions (21,938 human)
SPIKE (http://www.cs.tau.ac.il/~spike/ and http://spike.cs.tau.ac.il/spike2/)
34,266 interactions (human only)
STRING: ATM interactors
Interaction databases
Metamining and predictive PPI databases

The PPI community has been characterized by a wide and open
distribution of proteomic data through the collection of PPI and
pathway databases. The ability to distribute and share data
between various research groups has resulted in a large number of
different source databases. However, the general overlap between
PPI databases is limited, which means that a common procedure
for researchers is to unify these diverse data sets to support their
own work. Several metamining databases have been created that
perform such unification. This has led to the spontaneous
development of a network of data exchange between literature
curated databases, metamining databases and databases
generating predicted PPIs.
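The core of metamining is a union over sources that remembers provenance, and the limited overlap between sources can be quantified, for example with the Jaccard index (shared PPIs divided by all PPIs of the two sources). A minimal sketch; the database names and pairs are hypothetical, and identifiers are assumed to be already mapped to a common namespace, which in practice is a major part of the work.

```python
# Sketch of metamining: merge several source databases into one PPI set and
# measure how much two sources overlap. Database names and pairs are hypothetical,
# and identifiers are assumed to be already mapped to a common namespace.

def as_ppis(pairs):
    """Unordered pairs, so (A, B) and (B, A) count as the same PPI."""
    return {frozenset(pair) for pair in pairs}


def merge_sources(sources):
    """Union of all sources: PPI -> set of source names that report it."""
    merged = {}
    for name, pairs in sources.items():
        for ppi in as_ppis(pairs):
            merged.setdefault(ppi, set()).add(name)
    return merged


def jaccard(pairs_1, pairs_2):
    """Shared PPIs divided by all PPIs of two sources (0 = no overlap, 1 = identical)."""
    a, b = as_ppis(pairs_1), as_ppis(pairs_2)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


sources = {
    "DB_1": [("A", "B"), ("A", "C"), ("B", "D")],
    "DB_2": [("B", "A"), ("C", "E")],
    "DB_3": [("D", "B"), ("E", "F")],
}
merged = merge_sources(sources)
print(len(merged), "distinct PPIs in the merged set")  # 5 distinct PPIs in the merged set
print(jaccard(sources["DB_1"], sources["DB_2"]))       # 0.25
```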

The exchange of information is supported by three major data
exchange formats: BioPAX, PSI-MI and SBML.
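Of these formats, the tabular variant of PSI-MI (PSI-MI TAB, or MITAB) is the simplest to illustrate: each line is one interaction with tab-separated columns. The sketch assumes the 15-column MITAB 2.5 layout, in which columns 1 and 2 hold the identifiers of interactors A and B and column 7 the interaction detection method; the sample record uses placeholder identifiers, and a real parser would also handle the remaining columns and later MITAB versions.

```python
# Sketch: read interactor pairs from PSI-MI TAB (MITAB 2.5) text. This assumes
# the 15-column 2.5 layout, where columns 1 and 2 are the interactor identifiers
# and column 7 the detection method. The sample record uses placeholder IDs.
ID_A, ID_B, DETECTION_METHOD = 0, 1, 6  # 0-based indices of columns 1, 2 and 7


def read_mitab(text):
    """Yield (interactor A, interactor B, detection method) per data line,
    skipping blank lines and header lines starting with '#'."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        yield fields[ID_A], fields[ID_B], fields[DETECTION_METHOD]


record = ["uniprotkb:EXAMPLE_1", "uniprotkb:EXAMPLE_2", "-", "-", "-", "-",
          'psi-mi:"MI:0018"(two hybrid)'] + ["-"] * 8  # "-" marks an empty column
sample = "# header line\n" + "\t".join(record) + "\n"
print(list(read_mitab(sample)))
```

Splitting on tabs rather than using a CSV reader keeps the double quotes inside MITAB fields intact.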
Predictive interaction databases
Metamining databases
Pathway databases