IntAct Presentation

15.04.2020

Molecular Interactions – the IntAct Database

Sandra Orchard

EMBL-EBI

EBI is an Outstation of the European Molecular Biology Laboratory.

Why is it useful to study protein-protein interactions (PPIs) and networks?

• Proteins are the workhorses of the cell – and all activities are controlled through interactions with other molecules.

• To understand the biology of a single protein, you have to study its interacting partners

• One way to predict protein function is through identification of binding partners (guilt by association). If the function of at least one of the components with which the protein interacts is known, that should help us assign the protein's function(s) and pathway(s)

• Hence, through the intricate network of these interactions we can map cellular pathways, their interconnectivities and their dynamic regulation

Why are there so many issues with interaction data?

1. Wide variety of methods for demonstrating molecular interactions – all have their strengths and weaknesses

2. No single method accurately defines an interaction as being a true binary interaction observed under physiological conditions

Why do we need interaction databases?

• Issues with all interaction data – true picture can only be built up by combining data derived using multiple techniques, multiple laboratories

• Problematic for any bench researcher to do – issues with data formats, molecular identifiers, sheer volume of data

• Molecular interaction databases are publicly funded to collect these data and annotate them in the format most useful to researchers

Interaction Databases

Deep Curation

IntAct – active curation, broad species coverage, all molecule types

MINT – active curation, broad species coverage, PPIs

DIP – active curation, broad species coverage, PPIs

MPACT - ? curation, limited species coverage, PPIs

MatrixDB – active curation, extracellular matrix molecules only

BIND – ceased curating 2006/7, broad species coverage, all molecule types – information becoming dated

Shallow curation

BioGRID – active curation, limited number of model organisms

HPRD – active curation, human-centric, modelled interactions

MPIDB – active curation, microbial interactions

Engineering 1850

• Nuts and bolts fit perfectly together, but only if they originate from the same factory

• Standardisation proposal in 1864 by William Sellers

• It took until after WWII for it to be generally accepted, though …

Proteomics 2003

Proteomics data are perfectly compatible, but only if they are from the same lab / database / software

• “Publish and vanish” by data producers

Collecting all publicly available data requires huge effort

• Urgent need for standardisation

What constitutes a PSI standard

• Documents that make up each individual standard

• Minimal reporting requirements => MIAPE document

• XML Data exchange format

• Domain-specific controlled vocabulary

MIMIx

PSI-MI XML format

• Community standard for Molecular Interactions

• XML schema and detailed controlled vocabularies

• Jointly developed by major data providers:

BIND, CellZome, DIP, GSK, HPRD, Hybrigenics, IntAct, MINT, MIPS, Serono, U. Bielefeld, U. Bordeaux, U. Cambridge, and others

• Version 1.0 published in February 2004

The HUPO PSI Molecular Interaction Format - A community standard for the representation of protein interaction data.

Henning Hermjakob et al, Nature Biotechnology 2004, 22, 176-183.

• Version 2.5 published in October 2007

Broadening the Horizon – Level 2.5 of the HUPO-PSI Format for Molecular Interactions;

Samuel Kerrien et al. BioMed Central. 2007.

PSI-MI XML benefits

• Collecting and combining data from different sources has become easier

• Standardized annotation through PSI-MI ontologies

• Tools from different organizations can be chained, e.g. analysis of IntAct data in Cytoscape.

Home page: http://www.psidev.info/MI

Controlled vocabularies: www.ebi.ac.uk/ols

Additional benefits

• MITAB format – released 2007 by popular demand. Tab-delimited organisation of data.

• PSICQUIC – query access that runs across all interaction databases using PSI formats

• PSISCORE – common scoring mechanism in development

• Access to R Bioconductor statistics packages

• Growth industry in “composite” databases – do no new curation but merge the output of resources producing data in PSI format.

• IMEx
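As a minimal sketch of what "tab-delimited organisation" means in practice, the snippet below splits one MITAB row into named columns. It assumes the 15-column MITAB 2.5 layout, with `-` marking an empty field and `|` separating multiple values. The column names and the function `parse_mitab_line` are illustrative choices, not part of any official library, and the example row is invented.

```python
from typing import Dict, List

# Assumed order of the 15 MITAB 2.5 columns (names are our own labels).
MITAB25_COLUMNS = (
    "id_a", "id_b", "alt_ids_a", "alt_ids_b", "aliases_a", "aliases_b",
    "detection_methods", "first_authors", "publication_ids",
    "taxid_a", "taxid_b", "interaction_types", "source_databases",
    "interaction_ids", "confidence_scores",
)


def parse_mitab_line(line: str) -> Dict[str, List[str]]:
    """Split one MITAB row into a dict of column name -> list of values.

    Simplification: values are split on every '|', so a quoted value that
    itself contains '|' would be split incorrectly.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(MITAB25_COLUMNS):
        raise ValueError(f"expected at least 15 columns, got {len(fields)}")
    record = {}
    for name, raw in zip(MITAB25_COLUMNS, fields):
        record[name] = [] if raw == "-" else raw.split("|")
    return record


# Invented row for illustration only; the identifiers are placeholders.
example_row = "\t".join([
    "uniprotkb:P12345", "uniprotkb:Q67890", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "Doe et al. (2000)", "pubmed:0000000",
    "taxid:9606(human)", "taxid:9606(human)",
    'psi-mi:"MI:0915"(physical association)', 'psi-mi:"MI:0469"(IntAct)',
    "intact:EBI-0000000", "intact-miscore:0.40",
])
record = parse_mitab_line(example_row)
```

Keeping every column as a list, even when it holds a single value, avoids special-casing the multi-valued fields later on.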

IMEx

• Consortium of molecular interaction databases dedicated to producing high quality, annotated data, curated to the same standards

• Data will be curated once at a single centre then exchanged between partners

• Users need only go to a single site to obtain all data

IntAct goals & achievements

1. Publicly available repository of molecular interactions (mainly PPIs): ~300K binary interactions taken from >5,300 publications (May 2012)

2. Data is standards-compliant and available via our website, for download at our FTP site, or via PSICQUIC:
   http://www.ebi.ac.uk/intact
   ftp://ftp.ebi.ac.uk/pub/databases/intact
   www.ebi.ac.uk/Tools/webservices/psicquic/view/main.xhtml

3. Provide open-access versions of the software to allow installation of local IntAct nodes.

IntAct Curation

“Lifecycle of an Interaction”

[Figure: curation workflow. Labels: Publication (full text); Curation manual; CVs; curator annotates experiment and participants; Sanity checks (nightly), report to curator; Super curator accepts or rejects; Public web site; FTP site; IMEx (MatrixDB, MINT, DIP).]

UniProt Knowledge Base

Interactions can be mapped to the canonical sequence, to splice variants, or to post-processed chains.

Relationship with UniProtKB

[Figure: relationship between interaction curation and UniProtKB. Labels: Interaction curation; Protein sequence; High confidence PPIs; Data filters; Other DBs; Other IMEx databases; In place; Early 2012.]

Data model

• Support for detailed features i.e. definition of interacting interface

Interacting domains

Overlay of Ranges on sequence:

How to deal with Complexes

• Some experimental protocols generate complex data, e.g. tandem affinity purification (TAP)

• One may want to convert these complexes into sets of binary interactions; two algorithms are available.
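A minimal sketch of two common expansion strategies, assuming the algorithms meant here are spoke expansion (the bait is paired with each prey) and matrix expansion (every participant is paired with every other). The function names and the participant labels are illustrative only.

```python
from itertools import combinations
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]


def spoke_expansion(bait: str, preys: Iterable[str]) -> List[Pair]:
    """Pair the bait with each prey: n preys give n binary interactions."""
    return [(bait, prey) for prey in preys]


def matrix_expansion(participants: Iterable[str]) -> List[Pair]:
    """Pair every participant with every other: n members give n*(n-1)/2 pairs."""
    return list(combinations(participants, 2))


# Hypothetical TAP result: one tagged bait co-purified with three preys.
bait = "bait"
preys = ["prey1", "prey2", "prey3"]

spoke_pairs = spoke_expansion(bait, preys)             # 3 pairs
matrix_pairs = matrix_expansion([bait] + preys)        # 6 pairs
```

Spoke expansion is the more conservative choice because it asserts fewer pairs. Either way, the resulting binary interactions are inferred from co-purification rather than observed directly, which is why recording the expansion method alongside each pair is useful.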

Performing and visualising a Simple Search

Data, Standards and Tools

EBI Walkthrough

May 2009

IntAct – Home Page

Performing a Simple Search

Visualizing - networkView

Extend and Visualise your Search

Visualizing networkView

Cytoscape Web

• Cytoscape Web - web-based network visualization tool

• Modeled after Cytoscape – open-source, interactive, customizable and easily integrated into web sites.

• Contains none of the plugin architecture functionality of Cytoscape

Visualization

Cytoscape Plugins

Exploring a single interaction in more depth

Interaction detail

[Screenshot: details of interaction. Callouts: choice of UniProtKB or Dasty view; UniProt taxonomy; PubMed/IMEx ID.]

Detail of interaction

[Screenshot: details of interaction. Callouts: interaction score; expansion method.]

Interaction Score

• All evidences of Protein A interacting with Protein B are clustered.

• Evidences are scored according to:
  a. Interaction detection method
  b. Interaction type
  c. Number of publications the interaction has been observed in

Score is normalised on 0-1 scale

Low score – low confidence interaction

High score – high confidence interaction
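To make the idea of a normalised score concrete, here is a toy sketch that combines the three criteria into a value between 0 and 1. It is not IntAct's actual scoring formula: the weight tables, the saturation point for publications, and the plain averaging are all assumptions made for illustration.

```python
import math
from typing import Iterable, Tuple

# Hypothetical weights in [0, 1]; real scoring schemes define their own values.
METHOD_WEIGHT = {"biophysical": 1.0, "protein complementation": 0.66, "unknown": 0.1}
TYPE_WEIGHT = {"direct interaction": 1.0, "physical association": 0.66, "association": 0.33}
DEFAULT_WEIGHT = 0.1  # used for terms missing from the tables above


def publication_score(n_publications: int, saturation: int = 7) -> float:
    """Grow logarithmically with publication count, reaching 1.0 at `saturation`."""
    return min(1.0, math.log(n_publications + 1) / math.log(saturation + 1))


def interaction_score(evidences: Iterable[Tuple[str, str]], n_publications: int) -> float:
    """Score clustered evidence for one protein pair.

    `evidences` holds (detection_method, interaction_type) tuples. The best
    method weight and best type weight are averaged with the publication score.
    """
    evidences = list(evidences)
    if not evidences:
        return 0.0
    method = max(METHOD_WEIGHT.get(m, DEFAULT_WEIGHT) for m, _ in evidences)
    itype = max(TYPE_WEIGHT.get(t, DEFAULT_WEIGHT) for _, t in evidences)
    return (method + itype + publication_score(n_publications)) / 3


low = interaction_score([("unknown", "association")], n_publications=1)
high = interaction_score([("biophysical", "direct interaction")], n_publications=7)
```

Because each component stays within 0 to 1, their average does too, so scores from different protein pairs can be compared on the same scale.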

Changing the tabular view

Participant information

Interaction detail

Details of interaction

Viewing Interaction Details

Additional information

Interaction Details

IntAct – Home Page-Quick Search

Advanced search

Filtering options

Add more filtering options

Ontology search

Searching with MIQL

• Using the Molecular Interaction Query Language (MIQL), one can also build complex queries

• List of terms one can query on:
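A minimal sketch of composing a MIQL query as a string and URL-encoding it for a REST call. The field names `species` and `detmethod`, and the PSICQUIC base URL, are assumptions based on the commonly documented PSICQUIC conventions and should be checked against the current service documentation. The helper names are hypothetical, and no network request is made.

```python
from urllib.parse import quote


def miql_term(field: str, value: str) -> str:
    """Build one `field:value` term, quoting values that contain special characters."""
    if any(ch in value for ch in ' :"()'):
        value = '"' + value.replace('"', '\\"') + '"'
    return f"{field}:{value}"


def miql_and(*terms: str) -> str:
    """Join terms so that all of them must match."""
    return " AND ".join(terms)


# Assumed field names: `species` and `detmethod`.
query = miql_and(
    miql_term("species", "human"),
    miql_term("detmethod", "two hybrid"),
)

# Assumed REST endpoint layout for the IntAct PSICQUIC service.
PSICQUIC_BASE = (
    "https://www.ebi.ac.uk/Tools/webservices/psicquic/intact/"
    "webservices/current/search/query/"
)
url = PSICQUIC_BASE + quote(query)
```

Building the query from small helper functions keeps quoting rules in one place, which matters once values contain spaces or colons.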

Browsing – Molecule View

Browsing – extending your search

http://www.ebi.ac.uk/training/online/

Interactions, Pathways and Networks

Network analysis

Koh GC, Porras P, Aranda B, Hermjakob H, Orchard SE. Analyzing protein-protein interaction networks. J Proteome Res. 2012;11:2014-31. PMID:22385417
