Macromolecular complexes – A new
Online Portal (under construction!)
Birgit Meldal
(IntAct)
Overview
• Aims & Definitions
• Data Sources
• Issues and Challenges:
• Nomenclature
• Sets
• ‘Transient’ complexes
• GO
• Confidence scores
• Inference
• Visualisation
• Search Parameters and Filters
• Status quo
Project Aim
• To design an Online Portal to search and visualise protein complexes
• Including cross-referencing to source databases and beyond
• Export to interested parties in a format of their choice
• Incorporate the data into network analysis tools
• To curate a ‘starter set’ of protein complexes for 4 major model
organisms, chosen to span the taxonomic range –
• Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae,
Escherichia coli
• Which will be expanded to a second set of organisms –
• Mus musculus, Caenorhabditis elegans, Drosophila melanogaster,
Schizosaccharomyces pombe
• IntAct provides the data structure
Long-term Strategy
• Create stable complex identifiers
• Joint curation effort
→ Benefit to all collaborating databases:
• Resource sharing
• Elimination of redundancies
→ Benefit to user:
• One central resource that links to all source databases
Definition: stable protein complexes
A stable set (2 or more) of interacting protein molecules which
• can be co-purified and
• have been shown to exist as a functional unit in vivo.
Non-protein molecules (e.g. small molecules, nucleic acids) may also be
present in the complex.
What is not a stable complex?
• Enzyme/substrate or any similar transient interaction
• Two proteins associated in a pulldown / coimmunoprecipitation
with no functional link
Source Databases
• Reactome – human (EBI), Gramene – Arabidopsis, Microme – bacteria (EBI)
• PDBe (EBI) – mainly human
• ChEMBL (EBI)
• MatrixDB (Sylvie Ricard-Blum)
• Mining UniProt – yeast (Bernd Roechert, SIB – manually)
• Unmaintained web resources – CYGD (yeast), CORUM (human), E. coli website, 3D Complexes (Sarah Teichmann, EBI)
• Manual curation from IMEx DBs & the literature (Sandra & Birgit)
Issues
• Currently, complexes are shoe-horned into an interaction
which is part of a dummy publication and dummy
experiment
• New, complex-specific functionality, parameters and tools
are needed
Issues - Nomenclature
• Most complexes have no ‘common’ name, or the
‘common’ name is defined differently depending on
authors or host organism.
• One name can describe multiple complexes (e.g. AP1
describes ~25 different homo/heterodimers)
• Reactome makes a string of all components by gene
name but this can become too long for our short-label.
• We will need both a ‘recommended’ and a ‘systematic’ name.
• List of synonyms already available as free-text.
• Collaboration with GO, Reactome, HGNC
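The tension between component-based names and a limited short-label can be sketched as follows. This is only an illustration: the function name, the ":" separator, the sorting rule and the max_length value are all assumptions, not IntAct or Reactome conventions.

```python
def systematic_name(gene_names, max_length=20):
    """Build a name from component gene names, shortening it if too long.

    max_length is a hypothetical short-label limit chosen for illustration.
    """
    # Sort case-insensitively so the name does not depend on input order.
    full = ":".join(sorted(gene_names, key=str.lower))
    if len(full) <= max_length:
        return full
    # Too long: keep a truncated prefix and append the component count.
    suffix = f"...({len(gene_names)})"
    return full[: max_length - len(suffix)] + suffix


print(systematic_name(["JUN", "FOS"]))
```

A truncated name is no longer unique, which is one reason a separate stable identifier and a curated ‘recommended’ name would still be needed.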
Issues – open/fuzzy sets
• Complexes where the identity of one or more participants
is unknown, i.e. participant(s) are only identified to a set
of (related) proteins
• Stoichiometry: often not known or ‘average’ (e.g. ion
channel pore proteins)
• Only sub-set of a given complex curated because
functional assays often focus on interactions between
catalytic subunits
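One way to picture open sets and unknown stoichiometry is a participant that holds a set of candidate proteins instead of a single one. The sketch below is a hypothetical data model, not the IntAct schema; the class names, field names and accession strings are placeholders.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Participant:
    # One or more candidate accessions; more than one means the exact
    # protein is only identified to a set of (related) proteins.
    candidates: FrozenSet[str]
    # None when the stoichiometry is not known.
    stoichiometry: Optional[int] = None

    @property
    def is_open_set(self) -> bool:
        return len(self.candidates) > 1


@dataclass
class Complex:
    name: str
    participants: List[Participant]
    # False when only a sub-set of the complex has been curated.
    fully_curated: bool = True

    def has_open_sets(self) -> bool:
        return any(p.is_open_set for p in self.participants)


# Placeholder accessions, for illustration only.
example = Complex(
    name="example complex",
    participants=[
        Participant(frozenset({"ACC_A"}), stoichiometry=2),
        Participant(frozenset({"ACC_B1", "ACC_B2"})),  # open set
    ],
    fully_curated=False,
)
print(example.has_open_sets())
```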
Issues – indirect activation & transient complexes
• Complexes that are activated without direct ligand interaction
− e.g. through change of pH
− transient interactions
• Kim van Roey, Heidelberg: cooperative interactions
→ Different complex? Same participants!
Issues - Gene Ontology
• Currently, complexes are mostly children of GO:0043234
protein complex (> 400), lacking hierarchical structure
• Collaboration with GO to provide structured annotation
• New terms should capture all potential complexes from
all species for which a parental term is appropriate
• E.g. DNA Polymerase complex
• Needs to allow for (open) sets of proteins / protein
families
Issues - Confidence
• We need to define confidence scores:
• Do we know all participants of the complex?
• Do we have (open) sets of participants?
• How do we indicate the depth of data available, i.e. compare
Reactome import vs. manual curation?
• e.g. using Evidence Code Ontology (ECO)
• → only qualitative description
• Need a quantitative identifier
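To make the idea of a quantitative identifier concrete, the questions above could be turned into a simple additive score. The slides do not define any scoring scheme, so the function, its three criteria and the equal weighting below are purely illustrative assumptions.

```python
def confidence_score(all_participants_known, has_open_sets, manually_curated):
    """Toy score from 0 to 3: one point per favourable criterion.

    The criteria mirror the questions on the slide; the equal weights
    are an assumption made for this sketch.
    """
    score = 0
    if all_participants_known:
        score += 1
    if not has_open_sets:
        score += 1
    if manually_curated:
        score += 1
    return score


# A manually curated complex with all participants identified.
print(confidence_score(True, False, True))
# An imported complex with unknown participants and open sets.
print(confidence_score(False, True, False))
```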
Issues – Inference data
• Do we use inference/modelling data (e.g. Compara)?
• Where is the cut-off for ‘model organisms’?
• e.g. function remains but participants change
Issues – Visualisation
• Flexible display of 2D and 3D options to capture complexity
• The majority of complexes has 5 participants, average size 2.3
• For large complexes it needs to be dynamic:
• use zoom-in/-out functionality on demand,
• display only main participants or subcomplexes by default and expand on
demand,
• This might be achieved by assigning confidence scores to different levels
of the complex by which it collapses/expands…
• Most biological network packages, e.g. Cytoscape, are not up to it
• BioLayout 3D, ONDEX
• For crystal structures link to PDB (e.g. BioJS widget)
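The collapse/expand idea could work roughly like the sketch below, where each participant carries a score and the display shows only those at or above a threshold. The function name, the score values and the participant labels are hypothetical.

```python
def visible_participants(participants, threshold):
    """Return the names of participants whose score meets the threshold."""
    return [name for name, score in participants if score >= threshold]


# Hypothetical scores: core subunits score high, peripheral ones low.
example = [
    ("core subunit 1", 0.9),
    ("core subunit 2", 0.8),
    ("peripheral subunit", 0.3),
]

collapsed = visible_participants(example, threshold=0.5)  # default view
expanded = visible_participants(example, threshold=0.0)   # expand on demand
print(collapsed)
print(expanded)
```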
[Figure: Bubble diagram of a complex. Ix = Interaction, Cx = Complex. Proteins A, B, C and D and a small molecule are drawn as bubbles connected by Ix edges. Annotations: gene name in bubble with hyperlink to UniProtKB; weak evidence of Ix; strong evidence of Ix; unknown which participant is direct interactor (?); hyperlink to IMEx Ix AC (*); hyperlink to binding site (IMEx/InterPro); search for all Ix or Cx containing one or more of these participants.]
* Need to query hyperlinks from whole database on the fly rather than having a static link to just one Ix
Issues – Search Parameters
Simple Search:
• UniProtKB ID / protein name
• Gene ID / name
• Small molecule ID / name
• Complex ID / name
• Drug
Advanced Search Filters:
• Stoichiometry
• Binding sites
• Biological role
• Source DB
• InterPro Domain
• Host organism
• GO term
• Interactor type (protein, small mol., NA)
• PMID
• ECO
• Process/Pathway
• Stable vs. transient
• Confidence score
• Orthology
• Disease
• No. of participants
Legend (colour-coded on the slide): Already searchable; New search parameters; Most important new search parameter!
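How such filters might combine can be sketched with a small matching function. The field names, the example records and the range convention below are invented for illustration and do not reflect the portal's actual query interface.

```python
def matches(record, **filters):
    """Return True if the record satisfies every given filter.

    A scalar filter value must equal the record value; a (low, high)
    tuple is treated as an inclusive numeric range.
    """
    for key, wanted in filters.items():
        value = record.get(key)
        if isinstance(wanted, tuple):
            low, high = wanted
            if value is None or not (low <= value <= high):
                return False
        elif value != wanted:
            return False
    return True


# Made-up records standing in for curated complexes.
complexes = [
    {"name": "complex-1", "host_organism": "Homo sapiens", "n_participants": 2},
    {"name": "complex-2", "host_organism": "Homo sapiens", "n_participants": 12},
    {"name": "complex-3", "host_organism": "Saccharomyces cerevisiae", "n_participants": 3},
]

hits = [
    c["name"]
    for c in complexes
    if matches(c, host_organism="Homo sapiens", n_participants=(2, 5))
]
print(hits)
```

Filters are combined with a logical AND here; whether the portal should also offer OR combinations is left open.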
Status quo?
• > 550 complexes already curated (Sandra, Bernd, Birgit),
many imported (e.g. MatrixDB from Sylvie)
• Exporter for Reactome working (David Croft)
• PDB export under construction (Jose Dana)
• ChEMBL xref list available (Yvonne Light)
• Not all necessary features incorporated into Editor → breaks release!
• e.g. complexes can’t be participants
• JAMI under construction (Marine!)
• It’s a complex project → which needs collaboration!!!
Acknowledgements
Proteomics Services
IntAct
• Sandra Orchard
• Marine Dumousseau
• Noemi del Toro Ayllón
• Rafael Jimenez
• Pablo Porras
• Margaret Duesbury
• Henning Hermjakob
Reactome
• Steve Jupe
• David Croft
ChEMBL
• Anna Gaulton
• Yvonne Light
SIB
• Bernd Roechert
MatrixDB
• Sylvie Ricard-Blum
PDBe
• Sameer Velankar
• Jose Dana
GO
• Jane Lomax
• Rachel Huntley
• Heiko Dietze