Small Molecules
EBI Bioinformatics Roadshow
Gareth Owen, ChEBI group
The Jackson Laboratory
October 18th 2012
Services | Research | Training | Industry

Course Objectives
In this course you will learn…
• How small molecules are stored in databases
• How data related to small molecules is stored in ChEBI and ChEMBL, and how to query these databases
• How to understand the ChEBI ontology
• How to access and query enzyme resources at the EBI using the Enzyme Portal, with a closer look at individual resources such as IntEnz and Rhea
• How the MetaboLights database can be used for storing information about metabolomics experiments

Exercises
• Separate exercise sheets for each resource discussed.
• Help reinforce learning.
• Work alone or in teams.
• Solutions will be shown in a run-through before the start of the next session.

Questions
• Please feel free to ask questions at any time.
• If you are confused, you are probably not alone.
• I am happy to answer all questions, provided you will allow the following responses:
  • “We’ll be discussing that later”
  • “I don’t know”
• Please do not deal with emails, etc. during the sessions.
• Please turn off mobiles, or set them to vibrate.

EBI Metabolomics and Bioinformatics Resources training workshop
The Jackson Laboratory
Thursday 18th October 2012

Time          Subject
09.00-09.30   Introduction to EBI and EBI search
09.30-10.30   Introduction to ChEBI
              Exercises
10.30-11.00   Tea & Coffee break
11.00-12.30   ChEBI: Searching and the ChEBI Ontology
              Exercises
12.30-13.15   Lunch
13.15-14.30   The Enzyme Portal, IntEnz and Rhea
              Exercises
14.30-15.00   Introduction to MetaboLights
15.00-15.30   Tea & Coffee break
15.30-16.00   Small molecules and PDBe
16.00-17.00   Introduction to ChEMBL
              Exercises
              Course Feedback

The EMBL-European Bioinformatics Institute
A whistlestop tour

What is bioinformatics?
• The science of storing, retrieving and analysing large amounts of biological information
• An interdisciplinary science, involving biologists, computer scientists and mathematicians
• At the heart of modern biology

Biology is changing
• Data explosion
• New types of data
• High-throughput biology
• Emphasis on systems, not reductionism
• Growth of applied biology
  • molecular medicine
  • agriculture
  • food
  • environmental sciences…
[Figure: Growth of raw storage at EMBL-EBI (in terabytes); axes: Disks (TB) against Year]

New types of data
• Literature
• Genomes
• Protein sequence
• Proteomes
• Nucleotide sequence
• Protein structure
• Gene expression
• Protein families, domains and motifs
• Chemical entities
• Protein-protein interactions
• Pathways
• Systems

What is EMBL-EBI?
• Bioinformatics research and services institute
• Non-profit organisation
• ~500 staff
• Part of the European Molecular Biology Laboratory

The five branches of EMBL
• Heidelberg: basic research in molecular biology, administration, EMBO
• Hamburg: structural biology
• Grenoble: structural biology
• Hinxton: bioinformatics
• Monterotondo: mouse biology
1500 staff, >60 nationalities

EMBL member states
Austria, Belgium, Croatia, Denmark, Finland, France, Germany, Greece, Iceland, Ireland, Israel, Italy, Luxembourg, the Netherlands, Norway, Portugal, Spain, Sweden, Switzerland and the United Kingdom
Associate member state: Australia

The Wellcome Trust Genome Campus
[Photograph © John Freebury; labels: Sanger Institute, Sulston building, Data centre, Sanger labs / informatics, Cairns Pavilion (shared), EMBL-EBI]

EMBL-EBI’s Mission
• To provide freely available data and bioinformatics services to all facets of the scientific community in ways that promote scientific progress
• To contribute to the advancement of biology through basic investigator-driven research in bioinformatics
• To provide advanced bioinformatics training to scientists at all levels, from PhD students to independent investigators
• To help disseminate
cutting-edge technologies to industry
• To coordinate biological data provision across Europe

EMBL-EBI external funding
Sources of external funding for the year as of December 2010. The Wellcome Trust also supports us through provision of our buildings. The UK’s Biotechnology and Biological Sciences Research Council (BBSRC) awarded a further €11.4m in August 2009 in support of EMBL-EBI’s planned role as the central hub of ELIXIR.

Services
www.ebi.ac.uk/services

Key facts about services
• European node for globally coordinated data collection and dissemination projects
• Core databases produced in collaboration with other world leaders, including NCBI (US), National Institute of Genetics (Japan), Swiss Institute of Bioinformatics, Cold Spring Harbor Laboratory (US)
• The world’s most comprehensive collection of molecular databases

Principles of service provision
• Accessibility – all data and tools freely available without restriction, apart from information that could be used to identify individuals
• Compatibility – we develop and promote the use of standards in bioinformatics
• Comprehensive data sets – agreements with other data providers ensure that our resources contain comprehensive and up-to-date data; agreements with publishers ensure that published data are placed in a public repository at the earliest opportunity
• Portability – data and software can be downloaded and installed locally
• Quality – our databases are enhanced through annotation and cross-referencing

Databases: molecules to systems
• Genomes: Ensembl, Ensembl Genomes, EGA
• Nucleotide sequence: ENA
• Functional genomics: ArrayExpress, Expression Atlas
• Literature and ontologies: CiteXplore, GO
• Protein families, motifs and domains: InterPro
• Macromolecular: PDBe
• Protein activity: IntAct, PRIDE
• Pathways: Reactome
• Protein sequences: UniProt
• Chemical entities: ChEBI
• Chemogenomics: ChEMBL
• Systems: BioModels, BioSamples

Database collaborations

Standards development
– international collaborations
• Genomics Standards Consortium (GSC): http://gensc.org
• Genome annotation: www.geneontology.org
• Protein sequence: www.uniprot.org
• Nucleotide sequence: www.insdc.org
• Functional Genomics Data Society: www.fged.org
• Cheminformatics: www.ebi.ac.uk/chebi
• HUPO Proteomics Standards Initiative (PSI): www.psidev.info/
• Pathways: www.reactome.org, www.biopax.org
• Metabolomics Standards Initiative (MSI): www.metabolomicssociety.org
• Protein structure: www.wwpdb.org
• Systems modelling standards: www.sbml.org

New search service
• Access from the EBI’s homepage
• Species selector allows for easy comparison
• Data organised according to:
  • gene
  • expression
  • protein
  • structure
  • literature
• Explore data, return easily to your results

Goals of the new EBI Search
• Relevant to ‘wet-lab’ biologists
• Organises information based around a single gene (or a small number of genes)
• User-expectation centric (not database centric)
• Smooth transition to the detailed information in many of EBI’s core databases
• NOT for bioinformaticians: does not provide programmatic access

User support
• E-mail support – www.ebi.ac.uk/support
• Online help pages – www.ebi.ac.uk/help
• eLearning Portal – coming soon

Research
www.ebi.ac.uk/groups

Key facts about research at EMBL-EBI
• A unique environment for bioinformatics research
• Nine dedicated research groups
• Seven services teams also carry out R&D
• Research and services are mutually supportive

Training
www.ebi.ac.uk/training

Pre- and postdocs at EMBL-EBI
• EMBL International PhD Programme
• Postdoctoral fellowships:
  • EIPOD – EMBL-sponsored interdisciplinary fellowships
  • ESPOD – EBI–Sanger combined experimental and computational fellowships

For further information go to: http://www.ebi.ac.uk/Information/Brochures/
• EBI in a Nutshell
• Guide to data resources
• Research at a Glance 2012