EBI as a research infrastructure
Graham Cameron, EBI

EMBL
• Heidelberg
• Grenoble
• Hamburg
• Monterotondo
• EBI (Hinxton): Service, Research, Training, Industry

Member States of EMBL
• Austria
• Belgium
• Denmark
• Finland
• France
• Germany
• Greece
• Israel
• Italy
• The Netherlands
• Norway
• Portugal
• Spain
• Sweden
• Switzerland
• United Kingdom

~€3.8 billion
• Wellcome Trust
• Medical Research Council
• Council for the Central Laboratory of the Research Councils
• Biotechnology & Biological Sciences Research Council
• Arts & Humanities Research Council
• Natural Environment Research Council
• Engineering & Physical Sciences Research Council
• Economic & Social Research Council
• Particle Physics & Astronomy Research Council

We have amassed a wealth of knowledge about the molecular processes of living systems
• Biomacromolecules
• Biologically active molecules
• The behaviour and interactions of these molecules
• The phenotypic effects of molecular changes
  • Mutations
  • Drugs
  • Nutrients
• The molecular adjuncts of phenotypic changes
  • Disease
  • Aging

• Databases
• Web access
• Tools to explore the information
• Systems to capture the information
• Service centres

[Figure labels: DNA and protein sequences, expression, structures, how molecules interact, pathways. Structure shown: PDB code 1DIF, HIV-1 protease/inhibitor complex A79285 (difluoroketone)]

• EMBL-Bank: DNA sequences
• UniProt: protein sequences
• Ensembl: genome annotation
• ArrayExpress: microarray expression data
• IntAct: protein interactions
• EMSD: macromolecular structure data
• Reactome: pathways

Usage
• Basic research
• Industry
  • Pharma
  • Diagnostics
  • Medical device research
  • Personal care
  • Nutrition
  • Agriculture
  • Forestries
  • Fishery
• Patent searching and provenance

Using the information
[Figure: contrasting pairs: healthy vs. diseased, high yield vs. low yield, disease resistant vs. disease prone, salt tolerant vs. not salt tolerant]
• Suppose a gene’s variation seems important
• Look in databases for similar genes and for their products, functions, structures, interactions and expression patterns, as well as the processes in which they are involved
• Can we influence the processes in which they are involved?
  • Working out in the lab what a gene does could easily be a year’s work
  • Searching databases can do it in half an hour

[Figure: Nucleotide Sequence Database Growth. Megabases (axis 0 to 120,000) by date, June 1982 to June 2005]

[Figure: Average Web Hits per Day, by quarter, 1st quarter 1999 to 3rd quarter 2005 (axis 0 to 2,500,000), with a series including Ensembl]
Note: Ensembl is a joint project with the Wellcome Trust Sanger Institute. Equivalent usage data have only been available since 2004.

European Context
• BioSapiens
• EMBRACE
• ENFIN
• (and many others)

BioSapiens
• European Molecular Biology Laboratory - European Bioinformatics Institute, Hinxton, Cambridge, UK
• European Molecular Biology Laboratory, Heidelberg, Germany
• German National Centre for Environment and Health, Neuherberg, Munich, Germany
• Université Libre de Bruxelles, Brussels, Belgium
• Consejo Superior de Investigaciones Científicas, Madrid, Spain
• Institut Municipal d'Assistència Sanitària, Barcelona, Spain
• Genome Research Ltd, Hinxton, Cambridge, UK
• Max-Planck Institute for Informatics, Saarbrücken, Germany
• The Hebrew University of Jerusalem, Givat Ram, Israel
• Department of Biochemical Sciences, University of Rome "La Sapienza", Rome, Italy
• University of Stockholm, Stockholm, Sweden
• University of Oxford, Oxford, UK
• University College London, London, UK
• Radboud University Nijmegen, Nijmegen, The Netherlands
• Swiss Institute of Bioinformatics, Geneva, Switzerland
• Technical University of Denmark, Lyngby, Denmark
• University of Helsinki, Helsinki, Finland
• University of Geneva, Geneva, Switzerland
• Institute of Enzymology, Hungarian Academy of Sciences, Budapest, Hungary
• University of Cologne, Cologne, Germany
• Institut Pasteur, Paris, France
• BioInfo Bank Institute, Poznan, Poland
• Max Planck Institute for Molecular Genetics, Berlin, Germany
• Genoscope, Evry, France
• University of Bologna, Bologna, Italy

EMBRACE
• European Molecular Biology Laboratory - European Bioinformatics Institute, Hinxton, Cambridge, UK
• European Molecular Biology Laboratory, Heidelberg, Germany
• Institute of Biomedical Technologies, Section Bari, CNR, Bari, Italy
• University of Manchester, UK
• Swiss Institute of Bioinformatics, Geneva, Switzerland
• Swedish University of Agricultural Sciences, The Linnaeus Centre for Bioinformatics, Sweden
• Centre National de la Recherche Scientifique, Clermont-Ferrand and Lyon, France
• Centre for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
• Centro Nacional de Biotecnologia/Consejo Superior de Investigaciones Científicas, Madrid, Spain
• University of Stockholm, Stockholm Bioinformatics Centre, Sweden
• Institut National de la Recherche Agronomique, Toulouse, France
• Max Planck Institute for Molecular Genetics, Berlin, Germany
• CSC, the Finnish IT Center for Science, Espoo, Finland
• University College London, London, UK
• The Weizmann Institute, Rehovot, Israel
• Centre for Molecular and Biomolecular Informatics, University of Nijmegen, The Netherlands
• Carretera de Ajalvir, km. 4, 28850 Torrejon de Ardoz, Madrid

ENFIN
• The European Bioinformatics Institute / The European Molecular Biology Laboratory, Europe
• The University of Dundee, UK
• Technical University of Denmark
• University of Rome Tor Vergata, Italy
• Medical Research Council Mammalian Genetics Unit (MRCMGU), UK
• Ludwig Institute for Cancer Research, Uppsala (LICR-UPP), Sweden
• The Max Planck Institute, Germany
• University of Helsinki (UH), Finland
• University College London (UCL), UK
• National Center for Research and Technology, Hellas (CERTH), Greece
• Universitaet zu Koeln (UNIK), Germany
• Weizmann Institute (Weizmann), Israel
• Egeen (EGEEN), Estonia
• Serono Pharmaceutical Research Institute (SPRI), Switzerland
• Consejo Superior de Investigaciones Científicas (CSIC), Spain
• Centre for Integrative Bioinformatics VU (IBIVU), Netherlands

Global Picture
• DNA: tripartite international collaboration (including patent data acquisition)
• Protein sequences: UniProt collaboration
• Macromolecular structures: tripartite international collaboration
• IntAct: international agreements
• Reactome: USA-Europe collaboration
• Etc.
[Diagram: core biomolecular resources shown alongside
• Large resources in related disciplines: medical data resources, biodiversity data resources, chemical data resources
• Specialist biomolecular data resource examples: BRENDA, IMGT, Pasteur DBs
• Model organism resource examples: SGD, FlyBase, Eumorphia/Phenotypes, MGD, Mutants, Mouse Atlas]

[Figure: Web Hits by country. Segments: Denmark, Israel, Australia, Austria, Other, Finland, Belgium, Switzerland, Norway, Taiwan, Netherlands, Sweden, USA, Canada, Spain, Italy, Japan, France, Germany, UK]

EBI Total Running Budget 2005 = €26 million
• EMBL: 50%
• EU: 22%
• USA: 8%
• Wellcome Trust: 7%
• UK Research Councils: 7%
• Industry: 3%
• Other: 3%
Projected budget 2011 = €43 million

[Figure: budgets in millions of euros (axis €0 to €60) for NCBI 2004/5 + PDB, EBI 2005 and EBI 2011]

[Figure: "Cost of the data" compared with NCBI 2004/5 + PDB, EBI 2005 and EBI 2011, in millions of euros (axis €0 to €3,000)]

Read-only or dynamic
• There’s nothing particularly difficult about archiving unchanging data
• But most aren’t
• Today’s best bet
  • E.g., Ensembl
• Provenance
  • E.g., patent searching
• N.B. Versioning (complex!)
• Citation

How much data
• Canonical vs. episodic
  • Genomes, expression profiles
• Raw vs. processed
  • Sequence traces
  • Structure factors

Custodianship, acquisition and ownership
• Widely accepted obligation to deposit data
• Depend on the goodwill of the community
• Add “organisation”
• Add “services”
• Add “value”

Annotation as added value
• First/second/third party annotation
• Computational vs. experimental
• Bundled vs. distributed
  • (DAS)

Openness
• We approve of it
• Data must be made available as soon as they are discussed in a publication
• Data from “community” projects should be made available immediately
• Confidentiality issues must be addressed

Federation
• Monolithic solutions fail
• Centralisation yields more than the sum of the parts
• Aggregation of institutional repositories is essential

Slice it vertically or horizontally?
• E.g., the EBI and AstroGrid are domain specific
• Would it be better if they were jointly managed by data experts?
• Standardisation
  • Mixed success

Supporting the electronic record of science
• This is more like libraries than research projects
• Needs long-term commitment
  • With accountability
• Current funding structures are not well adapted to the task
• Pitting the information providers against their research community is damaging

Bioinformatics Infrastructure
• Has captured the data from several billion euros’ worth of science
• Serves a community of perhaps a million users
• Supports science on which the UK alone spends €3-4 billion a year
• Cuts years of lab work down to hours of computer work
• Is crucial to human well-being, from medicine to agriculture
• Sees data volume and usage growing exponentially
• Might cost a few tens of millions (at most a couple of percent of the cost of the science it supports)
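
The closing cost claim can be checked with rough arithmetic using only figures quoted on the slides: the EBI running budget (€26 million in 2005, €43 million projected for 2011), the 2005 funding shares, and the €3-4 billion a year of UK science spending. The sketch below is illustrative only. Comparing the EBI budget with UK spending alone is an assumption made here, and all function and variable names are hypothetical.

```python
"""Back-of-envelope check of the cost claim, using figures quoted on the slides."""

# EBI running budgets in millions of euros, as quoted on the budget slide.
EBI_BUDGET_MEUR = {"2005": 26.0, "2011 (projected)": 43.0}

# "Supports science on which the UK alone spends €3-4 billion a year",
# expressed in millions of euros as a (low, high) range.
# Assumption: UK spending alone is used as the comparison base.
UK_SCIENCE_SPEND_MEUR = (3000.0, 4000.0)

# Sources of the 2005 running budget, in percent, from the budget slide.
FUNDING_SHARES_2005 = {
    "EMBL": 50, "EU": 22, "USA": 8, "Wellcome Trust": 7,
    "UK Research Councils": 7, "Industry": 3, "Other": 3,
}


def share_percent(cost_meur, supported_meur):
    """Return a cost as a percentage of the spending it is compared with."""
    return 100.0 * cost_meur / supported_meur


def budget_vs_science(budgets, spend_range):
    """Map each budget label to its (min, max) percentage of the spend range."""
    low_spend, high_spend = spend_range
    return {
        label: (share_percent(b, high_spend), share_percent(b, low_spend))
        for label, b in budgets.items()
    }


def funder_contributions(total_meur, shares):
    """Convert percentage shares of a budget into millions of euros."""
    return {funder: total_meur * pct / 100.0 for funder, pct in shares.items()}


if __name__ == "__main__":
    ranges = budget_vs_science(EBI_BUDGET_MEUR, UK_SCIENCE_SPEND_MEUR)
    for label, (lo, hi) in ranges.items():
        print(f"EBI budget {label}: {lo:.2f}% to {hi:.2f}% of annual UK spend")
    contributions = funder_contributions(EBI_BUDGET_MEUR["2005"], FUNDING_SHARES_2005)
    for funder, meur in contributions.items():
        print(f"{funder}: about €{meur:.1f} million")
```

Under these assumptions both the 2005 and the projected 2011 budgets stay below 2% of annual UK spending, which is consistent with "at most a couple of percent".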