Session outline 1. Standards and the problem of data integration • 2. Example: PSICQUIC and the PSICQUIC game Introduction to ontologies. • Exploring the Gene Ontology -Coffee break- 3. Introduction to protein-protein interactions (PPIs) • IntAct: the molecular interactions database at the EBI -Lunch break- 4. Introduction to pathways • Reactome: a database of human biological pathways -Coffee break- 5. Network representation and analysis: strategies and limitations • Network generation and analysis through Cytoscape and PSICQUIC Introduction to protein-protein interactions (PPIs) A couple of definitions… Protein-protein interactions (PPIs): physical and selective contacts that happen between pairs of proteins, in certain molecular regions and in a defined biological context. Interactome: the totality of PPIs that happen in a cell / in an organism / in a specific biological context... Protein-protein interaction network: Graphical representation of a group of PPIs in which proteins are represented as nodes and interactions as edges. EMBL-EBI Why protein-protein interactions? Gene level DNA Protein level RNA 1 protein = 1 protein n functions = = function n1networks! WRONG! 1. To predict a protein biological function • “guilt by association” • proteins with similar functions should cluster together 2. To improve characterization of protein complexes and pathways • interaction networks work as a draft map that brings detail to biological processes and pathways EMBL-EBI Guilt by association Cyclin-dependent kinase Role in transcription regulation Proto-oncogene Transcription factor Histone methyltransferase Role in early development and hematopoiesis Cyclin Regulation of cyclin kinase, transcription regulation and cell cycle control Mediator of RNA polymerase transcription activity Role in transcription regulation Transcriptional modulator Transcription regulation dependent on kinases Something to do with transcription and cell cycle control Putative oncoprotein Involved in transcription regulation Putative oncoprotein Transcription activation Receptor-activated transcription modulator Transcription modulation, signal transduction, activated by kinases. Role in inhibition of wound healing EMBL-EBI Characterization of protein complexes and pathways Hook, B. and Schagat, T. [Internet] 2011. Available from: www.promega.com/resources/articles/pu bhub/functional-proteomics-techniquesto-isolate-and-characterize-the-humanproteasome/ Glaab et al., BMC Bioinformatics, PMID: 21144022. EMBL-EBI Protein-protein interaction detection methods Lowthroughput High-throughput Yeast-two hybrid (Y2H) Tandem affinity purification+ mass spectrometry (TAP-MS) No single method can accurately reproduce a true binary interaction observed under physiological conditions – every interaction detected experimentally is fundamentally artefactual. X-ray diffraction studies EMBL-EBI Protein-protein interaction detection methods Braun et al., Nat Methods, 2008, PMID: 19060903. EMBL-EBI Types of protein-protein interactions • Binary interactions – Two participants • Co-localization – Two or more participants closely located • N-ary ints. (associations) – Complex purification • Functional / direct ints. – E. g. enzymatic interactions EMBL-EBI Binary interactions 1 0 EMBL-EBI Colocalization 1 1 EMBL-EBI N-ary interactions (association) Schleiff et al., Nat Rev Mol Cell Biol. 2011. PMID: 211396380 EMBL-EBI Direct / functional interactions • Usually in vitro assays • Participants IDs are known in advance (predetermined) • Examples – enzymatic assays, SPR, crystallography, methods using purified protein… Images from Wikipedia: http://en.wikipedia.org/wiki/X-ray_crystallography http://en.wikipedia.org/wiki/Surface_plasmon_resonance http://en.wikipedia.org/wiki/Protein_adsorption_in_the_food_industry EMBL-EBI Representing PPIs: interaction domains interaction domains Overlap in sequence ranges: EMBL-EBI Representing PPIs: The problem with complexes • Some experimental methods generate complex data: E. g. Tandem affinity purification (TAP) • There are two algorithms to transform this information into binary data: EMBL-EBI Interactions databases: types De Las Rivas & Fontanillo, PLoS Computational biology, PMID: 20589078. EMBL-EBI Databases: curation levels SHALLOW CURATION DEEP CURATION EMBL-EBI Databases: curation levels Shallow curation BioGRID – active curation, limited number of model organisms HPRD – active curation, human focus, predicted interactions MPIDB – curation stopped, data re-located to IntAct, interactions in microorganisms InnateDB – active curation, interactions related with innate immunity Deep curation IntAct – active curation, wide species coverage, all types of molecules MINT – active curation, wide species coverage, PPIs only DIP – active curation, wide species coverage, PPIs only MPACT – curation currently stopped, limited species coverage, PPIs only MatrixDB – active curation, extracellular matrix molecules only BIND – curation stopped in 2006/7, wide species coverage, all types of molecules – information getting outdated I2D – active curation, PPIs involved in cancer EMBL-EBI Primary databases: coverage Human PPIs coverage in the main public primary databases (Dec 2009) De Las Rivas & Fontanillo, PLoS Computational biology, PMID: 20589078. EMBL-EBI A standard for PPIs representation: the IMEx consortium www.imexconsortium.org Orchard et al., Nature Methods, PMID: 22453911. EMBL-EBI IntAct: The molecular interactions database at the EBI IntAct goals & achievements 1. Publicly available repository of molecular interactions (mainly PPIs) - >430K binary interactions taken from >12,000 publications (October 2013) 2. Data is standards-compliant and available via our website, for download at our ftp site or via PSICQUIC www.ebi.ac.uk/intact ftp://ftp.ebi.ac.uk/pub/databases/intact www.ebi.ac.uk/Tools/webservices/psicquic/view/main.xhtml 3. Provide open-access versions of the software to allow installation of local IntAct nodes. EMBL-EBI IntAct: Data storage schema Entry [A] Publication level (entry) Publication Experiment 1 Experiment 2 [B] Experiment level [C] Interaction level Interaction 1 Interaction 2 … [D] Participant level [E] Feature level Participant 1 Participant 2 Features Features Interaction 3 Interaction 4 … … EMBL-EBI UniProt Knowledge Base Interactions can be mapped to the canonical sequence… ... to splice variants... ... or to postprocessed chains www.uniprot.org EMBL-EBI IntAct: PSI-MI ontology EMBL-EBI IntAct Curation pipeline “Lifecycle of an Interaction” Sanity Checks (nightly) reject Public web site Publication (full text) . accept p2 I p1 Curation manual report exp FTP site check CVs annotate IMEx report MatrixDB curator Mint DIP Super curator EMBL-EBI IntAct: the role of the curator DIRECT SUBMISSIONS LARGE DATASETS FROM HIGH-THROUGHPUT PROJECTS PUBLISHED MOLECULAR INTERACTIONS DATA CURATION EMBL-EBI DIRECT SUBMISSION LARGE DATASETS FROM HIGH-THROUGHPUT PROJECTS Gene Ontology FUNCTION UniProtKB PROTEIN SEQUENCES PUBLISHED MOLECULAR INTERACTIONS DATA CROSS-REFERENCES CURATION ChEBI SMALL MOLECULES Ensembl GENOME SEQUENCES InterPro FAMILIES AND DOMAINS Others STRUCTURES, ORGANISM, TISSUE... EMBL-EBI IntAct as a common curation platform General curation, domain int. General curation, large scale UniProt entry related Extracellular matrix Model organisms Immune system Commercial curation Cellular mechanics Regulatory interactions Specific curation focus/expertise Host – pathogen interactions Cardiovascular proteins Other DBs Common curation platform Specific Data Dissemination Platforms EMBL-EBI www.ebi.ac.uk/intact IntAct – Home Page EMBL-EBI IntAct webpage-based search EMBL-EBI IntAct webpage-based search Details of interaction Choice of UniProtKB or Dasty View EMBL-EBI IntAct: changing the layout EMBL-EBI IntAct: download formats EMBL-EBI PSIMITAB Columns MITAB 2.5 Standard columns (15): ID(s) interactor A & B • Alt. ID(s) interactor A & B • Alias(es) interactor A & B • Interaction detection method(s) • Publication 1st author(s) • Publication Identifier(s) • Taxid interactor A & B • Interaction type(s) • Source database(s) • Interaction identifier(s) • Confidence value(s) • MITAB 2.7 specific columns (+27): • Expansion method(s) • Biological role(s) of interactors • Experimental role(s) of interactors • Type(s) of interactors • Properties (CrossReference) of interactors / interaction • Annotation(s) of interactors / interaction • HostOrganism(s) • Parameters of interaction • Creation and update dates • Checksum(s) of interactors / interaction • Negative • Feature(s) interactors • Stoichiometry(s) interactors • Participant(s) identification method(s) EMBL-EBI Interaction detail in IntAct EMBL-EBI Detailed participant information: Dasty view EMBL-EBI IntAct: filtering results EMBL-EBI IntAct: visualizing results as a network EMBL-EBI IntAct: using lists EMBL-EBI IntAct: browse menu EMBL-EBI IntAct: advanced search ... EMBL-EBI IntAct: MIQL syntax search EMBL-EBI Searching and visualizing EMBL-EBI Using lists EMBL-EBI Checking details EMBL-EBI Advanced search EMBL-EBI Advanced search: using MIQL EMBL-EBI Ontology search EMBL-EBI Link to other PSICQUIC services EMBL-EBI More about IntAct: “on-line” EBI courses www.ebi.ac.uk/training/online/course/intact-molecular-interactions-ebi EMBL-EBI The PSICQUIC client: A unified gate to interactomics data Unified query client: PSICQUIC www.ebi.ac.uk/Tools/webservices/psicquic/view/main.xhtml PSI-MI MIQL Interactions Query input output PSICQUIC PSICQUIC Service A PSICQUIC Service B PSICQUIC Service C PSICQUIC Registry EMBL-EBI Unified query client interface: PSICQUIC view www.ebi.ac.uk/Tools/webservices/psicquic/view/main.xhtml EMBL-EBI Acknowledgements Group leader Henning Hermjakob Coordinator Sandra Orchard Curation team Margaret Duesbury Birgit Meldal Developing team Rafael Jiménez Marine Dumousseau Noemí del Toro EMBL-EBI