15.04.2020
Molecular Interactions – the IntAct Database
Sandra Orchard
EMBL-EBI
EBI is an Outstation of the European Molecular Biology Laboratory.
• Proteins are the workhorses of the cell, and all their activities are controlled through interactions with other molecules.
• To understand the biology of a single protein, you have to study its interacting partners
• One way to predict protein function is through identification of binding partners: guilt by association. If the function of at least one of the components with which the protein interacts is known, that should let us assign the protein's likely function(s) and pathway(s)
• Hence, through the intricate network of these interactions we can map cellular pathways, their interconnectivities and their dynamic regulation
Why are there so many issues with interaction data?
1. Wide variety of methods for demonstrating molecular interactions – all have their strengths and weaknesses
2. No single method accurately defines an interaction as being a true binary interaction observed under physiological conditions
Why do we need interaction databases?
• All interaction data have issues: a true picture can only be built up by combining data derived using multiple techniques in multiple laboratories
• This is problematic for any bench researcher to do: data formats, molecular identifiers and the sheer volume of data all get in the way
• Molecular interaction databases are publicly funded to collect these data and annotate them in the format most useful to researchers
Interaction Databases
Deep Curation
IntAct – active curation, broad species coverage, all molecule types
MINT – active curation, broad species coverage, PPIs
DIP – active curation, broad species coverage, PPIs
MPACT - ? curation, limited species coverage, PPIs
MatrixDB – active curation, extracellular matrix molecules only
BIND – ceased curating 2006/7, broad species coverage, all molecule types – information becoming dated
Shallow curation
BioGRID – active curation, limited number of model organisms
HPRD – active curation, human-centric, modelled interactions
MPIDB – active curation, microbial interactions
Engineering 1850
• Nuts and bolts fit perfectly together, but only if they originate from the same factory
• Standardisation proposal in 1864 by William Sellers
• It took until after WWII for it to be generally accepted, though …
Proteomics 2003
• Proteomics data are perfectly compatible, but only if they are from the same lab / database / software
• “Publish and vanish” by data producers
• Collecting all publicly available data requires huge effort
• Urgent need for standardisation
What constitutes a PSI standard
• Documents that make up each individual standard
• Minimal reporting requirements => MIAPE document
• XML Data exchange format
• Domain-specific controlled vocabulary
MIMIx
PSI-MI XML format
• Community standard for Molecular Interactions
• XML schema and detailed controlled vocabularies
• Jointly developed by major data providers:
BIND, CellZome, DIP, GSK, HPRD, Hybrigenics, IntAct, MINT, MIPS, Serono, U. Bielefeld, U. Bordeaux, U. Cambridge, and others
• Version 1.0 published in February 2004
The HUPO PSI Molecular Interaction Format - A community standard for the representation of protein interaction data.
Henning Hermjakob et al, Nature Biotechnology 2004, 22, 176-183.
• Version 2.5 published in October 2007
Broadening the Horizon – Level 2.5 of the HUPO-PSI Format for Molecular Interactions;
Samuel Kerrien et al. BioMed Central. 2007.
PSI-MI XML benefits
• Collecting and combining data from different sources has become easier
• Standardized annotation through PSI-MI ontologies
• Tools from different organizations can be chained, e.g. analysis of IntAct data in Cytoscape.
Home page: http://www.psidev.info/MI
Controlled vocabularies: www.ebi.ac.uk/ols
Additional benefits
• MITAB format – released 2007 by popular demand. Tab-delimited organisation of data.
• PSICQUIC – query access that runs across all interaction databases using PSI formats
• PSISCORE – common scoring mechanism in development
• Access to R Bioconductor statistics packages
• Growth industry in “composite” databases – do no new curation but merge the output of resources producing data in PSI format.
• IMEx
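To make the tab-delimited organisation concrete, here is a minimal sketch that parses a single MITAB line into named fields. It assumes the 15-column MITAB 2.5 layout; the short column labels, the function names and the sample row (including its accessions and identifiers) are invented for illustration and are not taken from the slides.

```python
# Shorthand labels for the 15 columns of a MITAB 2.5 row (labels are our own).
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_id_a", "alt_id_b", "alias_a", "alias_b",
    "detection_method", "first_author", "publication_id",
    "taxid_a", "taxid_b", "interaction_type", "source_db",
    "interaction_id", "confidence",
]


def split_field(field):
    """Split a cell into its '|'-separated values; '-' marks an empty cell.

    Naive split: it does not handle a '|' inside quoted text.
    """
    return [] if field == "-" else field.split("|")


def parse_mitab_line(line):
    """Return a dict mapping column label -> list of raw values."""
    cells = line.rstrip("\n").split("\t")
    if len(cells) < len(MITAB25_COLUMNS):
        raise ValueError("expected at least 15 tab-separated columns")
    return {name: split_field(cell) for name, cell in zip(MITAB25_COLUMNS, cells)}


# Illustrative row only: accessions, PubMed ID and score are placeholders.
example = "\t".join([
    "uniprotkb:P00001", "uniprotkb:P00002", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "Doe et al. (2010)", "pubmed:00000000",
    "taxid:9606(human)", "taxid:9606(human)",
    'psi-mi:"MI:0915"(physical association)', 'psi-mi:"MI:0469"(IntAct)',
    "intact:EBI-0000000", "intact-miscore:0.40",
])
record = parse_mitab_line(example)
print(record["id_a"], record["detection_method"])
```

Because every resource exports the same columns, the same small parser can in principle be reused across files from different databases, which is what makes merging sources straightforward.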
IMEx
• Consortium of molecular interaction databases dedicated to producing high quality, annotated data, curated to the same standards
• Data will be curated once at a single centre then exchanged between partners
• Users need only go to a single site to obtain all data
IntAct goals & achievements
1. Publicly available repository of molecular interactions (mainly PPIs): ~300K binary interactions taken from >5,300 publications (May 2012)
2. Data is standards-compliant and available via our website, for download at our FTP site, or via PSICQUIC:
http://www.ebi.ac.uk/intact
ftp://ftp.ebi.ac.uk/pub/databases/intact
www.ebi.ac.uk/Tools/webservices/psicquic/view/main.xhtml
3. Provide open-access versions of the software to allow installation of local IntAct nodes.
IntAct Curation
“Lifecycle of an Interaction”
[Figure: curation workflow diagram. Recoverable labels: publication (full text); curation manual; CVs; curator annotates (exp, I, p1, p2); super curator accepts or rejects; nightly sanity checks with report to curator; public web site; FTP site; IMEx partners MatrixDB, MINT and DIP]
UniProt Knowledge Base
Interactions can be mapped to the canonical sequence …
… to splice variants …
… or to post-processed chains
Relationship with UniProtKB
[Figure: data exchange between IntAct and UniProtKB. Recoverable labels: interaction curation; protein sequence; data filters; high-confidence PPIs; other DBs; other IMEx databases; status markers "In place" and "Early 2012"]
Data model
• Support for detailed features, i.e. definition of the interacting interface
Interacting domains
Overlay of Ranges on sequence:
How to deal with Complexes
• Some experimental protocols generate complex data,
e.g. tandem affinity purification (TAP)
• One may want to convert these complexes into sets of binary interactions. Two algorithms are available:
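The names of the two algorithms did not survive in the slide text. The two expansion methods commonly used for this purpose are spoke expansion and matrix expansion, so the sketch below assumes those are the ones meant. Function names and the example complex are hypothetical.

```python
from itertools import combinations


def spoke_expansion(bait, preys):
    """Pair the bait with each prey: n preys give n binary interactions."""
    return [(bait, prey) for prey in preys]


def matrix_expansion(participants):
    """Pair every participant with every other: n members give n*(n-1)/2 pairs."""
    return list(combinations(participants, 2))


# Hypothetical TAP result: bait A co-purified with B, C and D.
spoke = spoke_expansion("A", ["B", "C", "D"])
matrix = matrix_expansion(["A", "B", "C", "D"])
print(spoke)
print(matrix)
```

Spoke expansion yields fewer pairs because it only links the bait to each prey, whereas matrix expansion also asserts prey-prey pairs that the experiment did not test directly. Either way, the resulting binary pairs are inferred rather than observed, which is why the expansion method is worth recording alongside them.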
Performing and visualising a Simple Search
Data, Standards and Tools
EBI Walkthrough
May 2009
EBI
IntAct – Home Page
Performing a Simple Search
Visualizing - networkView
Extend and Visualise your Search
Visualizing networkView
Cytoscape Web
• Cytoscape Web - web-based network visualization tool
• Modeled after Cytoscape – open-source, interactive, customizable and easily integrated into web sites.
• Contains none of the plugin architecture functionality of Cytoscape
Visualization
Cytoscape Plugins
Exploring a single interaction in more depth
Interaction detail
Details of interaction
Choice of UniProtKB or Dasty View UniProt
Taxonomy
PubMed/IMEx ID
Detail of interaction
Details of interaction
Interaction
Score
Expansion method
Interaction Score
• All evidence for Protein A interacting with Protein B is clustered.
• Each piece of evidence is scored according to:
a. Interaction detection method
b. Interaction type
c. Number of publications in which the interaction has been observed
• The score is normalised to a 0-1 scale
Low score – low confidence interaction
High score – high confidence interaction
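The slides do not give the scoring formula, so the following is only a toy sketch of how the three criteria could be combined into a 0-1 score. The weight tables, the default weight, the `cap` parameter and the simple averaging are all assumptions made for illustration; they are not the values or the formula IntAct uses.

```python
import math

# Hypothetical weights in [0, 1]; these are NOT the values IntAct uses.
METHOD_WEIGHTS = {"biophysical": 1.0, "two hybrid": 0.66, "genetic": 0.10}
TYPE_WEIGHTS = {"direct interaction": 1.0, "physical association": 0.66,
                "colocalization": 0.33}
DEFAULT_WEIGHT = 0.05  # fallback for terms missing from the tables above


def saturating(total, cap):
    """Map a non-negative total onto 0-1 on a log scale, reaching 1 at `cap`."""
    return min(1.0, math.log(total + 1) / math.log(cap + 1))


def interaction_score(evidences, cap=5.0):
    """Score one A-B pair from (method, interaction_type, publication_id) tuples."""
    method_total = sum(METHOD_WEIGHTS.get(m, DEFAULT_WEIGHT) for m, _, _ in evidences)
    type_total = sum(TYPE_WEIGHTS.get(t, DEFAULT_WEIGHT) for _, t, _ in evidences)
    n_pubs = len({pub for _, _, pub in evidences})  # distinct publications only
    parts = (saturating(method_total, cap),
             saturating(type_total, cap),
             saturating(n_pubs, cap))
    return sum(parts) / len(parts)  # unweighted mean of the three sub-scores


# Placeholder evidence; publication identifiers are invented.
weak = [("genetic", "colocalization", "pub1")]
strong = weak + [("biophysical", "direct interaction", "pub2"),
                 ("two hybrid", "physical association", "pub3")]
print(round(interaction_score(weak), 2), round(interaction_score(strong), 2))
```

The log scale is a design choice in this sketch: each additional piece of evidence raises the score, but by less than the previous one, so a pair cannot reach a high score from one weak method repeated many times in a single paper.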
Changing the tabular view
Participant information
Interaction detail
Details of interaction
Viewing Interaction Details
Additional information
Interaction Details
IntAct – Home Page-Quick Search
Advanced search
Filtering options
Add more filtering options
Ontology search
Searching with MIQL
• Using the Molecular Interaction Query Language (MIQL), one can also build complex queries
• List of terms one can query on:
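The list of terms itself is missing from the slide text. As a rough illustration of what a complex query looks like, the sketch below combines two `field:value` terms with `AND` and URL-encodes the result. The field names `species` and `detmethod` are taken from MIQL as used by PSICQUIC; the helper functions are hypothetical, and the REST base URL is an assumption that should be checked against the current service before use. No network request is made.

```python
from urllib.parse import quote

# Assumed PSICQUIC REST endpoint for IntAct; verify before relying on it.
BASE_URL = ("http://www.ebi.ac.uk/Tools/webservices/psicquic/"
            "intact/webservices/current/search/query/")


def miql_term(field, value):
    """Render one field:value term, quoting values that contain spaces."""
    if " " in value:
        value = '"' + value + '"'
    return field + ":" + value


def miql_and(*terms):
    """Combine terms with the MIQL AND operator."""
    return " AND ".join(terms)


query = miql_and(
    miql_term("species", "human"),
    miql_term("detmethod", "two hybrid"),
)
url = BASE_URL + quote(query)  # percent-encode spaces, colons and quotes
print(query)
print(url)
```

The same query string can be typed directly into the search box; building it programmatically is mainly useful when the same filter is run repeatedly or against several PSICQUIC services.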
Browsing – Molecule View
Browsing – extending your search
http://www.ebi.ac.uk/training/online/
Interactions, Pathways and Networks
Network analysis
Koh GC, Porras P, Aranda B, Hermjakob H, Orchard SE. Analyzing protein-protein interaction networks. J Proteome Res 2012; 11: 2014-31. PMID: 22385417