13.04.2015

Molecular Interactions and Pathways 5
Sandra Orchard
EMBL-EBI
EBI is an Outstation of the European Molecular Biology Laboratory.

Why is it useful to study protein-protein interactions (PPIs), networks and pathways?
• Proteins are the workhorses of the cell, and all their activities are controlled through interactions with other molecules.
• To understand the biology of a single protein, you have to study its interacting partners.
• Network/pathway analysis is increasingly used as a tool to annotate large data sets: proteins involved in a common process tend to cluster and be present in the same pathway.

Why are there so many issues with interaction data?
1. A wide variety of methods exists for demonstrating molecular interactions; all have their strengths and weaknesses.
2. No single method accurately defines an interaction as being a true binary interaction observed under physiological conditions.

Why do we need interaction databases?
• There are issues with all interaction data: a true picture can only be built up by combining data derived using multiple techniques in multiple laboratories.
• This is problematic for any bench researcher to do, because of data formats, molecular identifiers and the sheer volume of data.
• Molecular interaction databases are publicly funded to collect these data and annotate them in the format most useful to researchers.

Why are data standards essential?
• Prior to 2003, many databases meant many formats; users had to reformat when merging data.
• File conversion inevitably leads to data loss.
• Many formats compromised tool development: each tool developed tended to be database-specific.

PSI-MI XML format
• Community standard for molecular interactions
• XML schema and detailed controlled vocabularies
• Jointly developed by major data providers: BIND, CellZome, DIP, GSK, HPRD, Hybrigenics, IntAct, MINT, MIPS, Serono, U. Bielefeld, U. Bordeaux, U.
Cambridge, and others
• Version 1.0 published in February 2004: The HUPO PSI Molecular Interaction Format - A community standard for the representation of protein interaction data. Henning Hermjakob et al., Nature Biotechnology 2004, 22, 176-183.
• Version 2.5 published in October 2007: Broadening the Horizon – Level 2.5 of the HUPO-PSI Format for Molecular Interactions. Samuel Kerrien et al., BioMed Central, 2007.

PSI-MI XML benefits
• Collecting and combining data from different sources has become easier
• Standardized annotation through PSI-MI ontologies
• Tools from different organizations can be chained, e.g. analysis of IntAct data in Cytoscape
Home page: http://www.psidev.info/MI

Controlled vocabularies
www.ebi.ac.uk/ols

IMEx
• Consortium of 9 molecular interaction databases dedicated to producing high quality, annotated data, curated to the same standards
• Data is curated once at a single centre then exchanged between partners
• Users need only go to a single site to obtain all data
• www.imexconsortium.org

IntAct goals & achievements
1. Publicly available repository of molecular interactions (mainly PPIs): ~305K binary interactions taken from >6,200 publications (December 2012)
2. Data is standards-compliant and available via our website, for download at our FTP site, or via PSICQUIC:
   http://www.ebi.ac.uk/intact
   ftp://ftp.ebi.ac.uk/pub/databases/intact
   www.ebi.ac.uk/Tools/webservices/psicquic/view/main.xhtml
3. Provide open-access versions of the software to allow installation of local IntAct nodes.

IntAct Curation: “Lifecycle of an Interaction”
[Figure: curation workflow diagram. Labels recoverable from the slide: publication (full text), curator, curation manual, annotate, check CVs, super curator, accept, reject, sanity checks (nightly), report, public web site, FTP site, IMEx, MatrixDB, Mint, DIP.]

UniProt Knowledge Base
http://www.ebi.uniprot.org/
Interactions can be mapped to the canonical sequence, to splice variants,
or to postprocessed chains

Data model
• Support for detailed features, i.e. definition of the interacting interface
[Figure: interacting domains; overlay of ranges on sequence]

How to deal with complexes
• Some experimental protocols generate complex data, e.g. tandem affinity purification (TAP)
• One may want to convert these complexes into sets of binary interactions; 2 algorithms are available.

IntAct – Home Page
http://www.ebi.ac.uk/intact

Ontology search

Interaction detail
• Choice of UniProtKB or Dasty View

Viewing Interaction Details
• PubMed/IMEx ID
• Details of interaction
• Additional information

Interaction Details

Visualizing - networkView

Visualization
• Applying a better graph layout…

Cytoscape Plugins

Reactome is…
• A database of human biological pathways
• Extensively cross-referenced
• Tools for data analysis: Pathway Analysis, Expression Overlay, Species Comparison, BioMart…
• Used to infer orthologous events in 20 other species

Using model organism data to build pathways: inferred pathway events
[Figure: pathway events annotated with their supporting evidence. Labels recoverable from the slide: direct evidence, indirect evidence, human, mouse, cow, PMID:5555, PMID:4444, PMID:8976, PMID:1234.]

Theory - Reactions
Pathway steps = the “units” of Reactome = events in biology
• Binding
• Dissociation
• Degradation
• Phosphorylation
• Dephosphorylation
• Transport
• Classic biochemical

Reactions Connect into Pathways
[Figure: three reactions, each labelled with INPUT, OUTPUT and CATALYST]

Species Selection

Data Expansion: Projecting to Other Species
[Figure: the human reaction A + ATP → A-P + ADP, drawn with a second protein B, projected to mouse and Drosophila. The mouse panel shows the same participants. The Drosophila panel carries the labels “No orthologue - Protein not inferred” and “Reaction not inferred”.]

The Pathway Browser
• Species selector
• Diagram key
• Sidebar
• Zoom/move toolbar
• Pathway diagram panel
• Details panel (hidden)
• Thumbnail

The Details Panel

Pathway Analysis

Pathway Analysis: Overrepresentation
• P-val
• Reveal next level
• ‘Top-level’

Species Comparison I

Species Comparison II
• Yellow = human/rat
• Blue = human only
• Grey = not relevant
• Black = complex

Expression Analysis I
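The overrepresentation slide reports a P-value per pathway. The slides do not say which statistical test is used, so the sketch below assumes a one-sided hypergeometric test, a common choice for this kind of analysis. The function name, parameter names and the numbers in the usage lines are all hypothetical and are not taken from Reactome.

```python
from math import comb


def overrepresentation_pvalue(universe: int, in_pathway: int,
                              submitted: int, hits: int) -> float:
    """Probability of seeing at least `hits` pathway members by chance.

    Assumed model (not stated in the slides): `submitted` identifiers are
    drawn without replacement from a `universe` of identifiers, of which
    `in_pathway` belong to the pathway under test.
    """
    if not 0 <= in_pathway <= universe:
        raise ValueError("in_pathway must lie between 0 and universe")
    if not 0 <= submitted <= universe:
        raise ValueError("submitted must lie between 0 and universe")

    lowest = max(hits, 0)
    highest = min(in_pathway, submitted)

    # Count the draws that contain i pathway members, for every i >= hits.
    favourable = sum(
        comb(in_pathway, i) * comb(universe - in_pathway, submitted - i)
        for i in range(lowest, highest + 1)
    )
    return favourable / comb(universe, submitted)


if __name__ == "__main__":
    # Hypothetical numbers: 5 of 10 known proteins are in the pathway,
    # 5 proteins were submitted, and all 5 are pathway members.
    print(overrepresentation_pvalue(10, 5, 5, 5))
```

A small P-value means the submitted list contains more members of the pathway than random sampling would be expected to produce. In practice the values would also need correcting for the number of pathways tested.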
Expression Analysis II
• Step through Data columns
• ‘Hot’ = high
• ‘Cold’ = low

Summary
Network and pathway analysis enable the researcher to:
1. Identify clusters of proteins: these may share the same function (stable complex), process or subcellular location
2. Identify proteins involved in the same pathway, i.e. in the same process (only works for those proteins which can be placed in pathways)
3. Add biological meaning to a list of gene/transcript/protein identifiers

http://www.ebi.ac.uk/training/online/
Interactions, Pathways and Networks

Analyzing protein-protein interaction networks. Koh GC, Porras P, Aranda B, Hermjakob H, Orchard SE. J Proteome Res 2012, 11, 2014-31. PMID:22385417

Current IntAct support: European Commission grants PSIMEx (FP7-HEALTH-2007-223411), APO-SYS (FP7-HEALTH-2007-200767), Affinomics (241481)

The development of Reactome is supported by a grant from the US National Institutes of Health (P41 HG003751), EU grant LSHGCT-2005-518254 "ENFIN", Ontario Research Fund, and the EBI Industry Programme.
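The slide on complexes says that co-purified complexes (e.g. from TAP) can be converted into sets of binary interactions by two algorithms, but the extracted text does not name them. The sketch below assumes they are the two expansions commonly used for this purpose, spoke and matrix expansion. The function names and protein labels are hypothetical.

```python
from itertools import combinations


def spoke_expansion(bait: str, preys: list[str]) -> list[tuple[str, str]]:
    """Pair the bait with every prey; preys are not paired with each other."""
    return [(bait, prey) for prey in preys]


def matrix_expansion(members: list[str]) -> list[tuple[str, str]]:
    """Pair every member of the complex with every other member."""
    return list(combinations(members, 2))


if __name__ == "__main__":
    # Hypothetical purification: bait A pulls down B, C and D.
    bait, preys = "A", ["B", "C", "D"]
    print(spoke_expansion(bait, preys))
    print(matrix_expansion([bait, *preys]))
```

Spoke expansion yields one pair per prey, whereas matrix expansion yields n(n-1)/2 pairs for n members. Matrix expansion is therefore more likely to report pairs that never touch directly. Either way the expanded pairs are inferred rather than observed binary interactions.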