EBI as a research infrastructure
Graham Cameron, EBI
EMBL sites: Heidelberg, Grenoble, Hamburg, Monterotondo, Hinxton (EBI)
EBI activities: Service, Research, Training, Industry
Member States of EMBL
• Austria
• Belgium
• Denmark
• Finland
• France
• Germany
• Greece
• Israel
• Italy
• The Netherlands
• Norway
• Portugal
• Spain
• Sweden
• Switzerland
• United Kingdom
EBI, Hinxton: Service, Research, Training, Industry
~ €3.8 billion
Wellcome Trust
Medical Research Council
Council for the Central Laboratory of the Research Councils
Biotechnology & Biological Sciences Research Council
Arts & Humanities Research Council
Natural Environment Research Council
Engineering & Physical Sciences Research Council
Economic & Social Research Council
Particle Physics & Astronomy Research Council
We have amassed a wealth of knowledge about the molecular processes of living systems
• Biomacromolecules
• Biologically active molecules
• The behaviour and interactions of these molecules
• The phenotypic effects of molecular changes
• Mutations
• Drugs
• Nutrients
• The molecular adjuncts of phenotypic changes
• Disease
• Aging
• Databases
• Web access
• Tools to explore the information
• Systems to capture the information
• Service centres
[Figure: types of data. Labels: DNA, protein sequences, expression, structures, how molecules interact, pathways. Structure example: PDB code 1DIF, HIV-1 protease/inhibitor complex with A79285 (difluoroketone)]
EMBL-Bank: DNA sequences
Reactome: pathways
ArrayExpress: microarray expression data
UniProt: protein sequences
Ensembl: genome annotation
IntAct: protein interactions
EMSD: macromolecular structure data
Usage
• Basic research
• Industry
• Pharma
• Diagnostics
• Medical device research
• Personal care
• Nutrition
• Agriculture
• Forestry
• Fisheries
• Patent searching and provenance
Using the information
Healthy vs. diseased
High yield vs. low yield
Disease resistant vs. disease prone
Salt tolerant vs. not salt tolerant
Suppose a gene’s variation seems important
Look in databases for similar genes and their products, functions, structures, interactions and expression patterns, and for the processes in which they are involved.
Can we influence the processes in which they are involved?
• Working out in the lab what a gene does could easily be a year’s work
• Searching databases can do it in half an hour
[Figure: Nucleotide Sequence Database Growth. x-axis: date (Jun-82 to Jun-05); y-axis: megabases (0 to 120,000)]
[Figure: Average Web Hits per Day, including Ensembl. x-axis: quarter and year (1st quarter 1999 to 3rd quarter 2005); y-axis: average hits per day (0 to 2,500,000)]
Note: Ensembl is a joint project with The Wellcome Trust Sanger Institute. Equivalent usage data have only been available since 2004.
European Context
• BioSapiens
• EMBRACE
• ENFIN
• (and many others)
BioSapiens
• European Molecular Biology Laboratory - European Bioinformatics Institute, Hinxton, Cambridge, UK
• European Molecular Biology Laboratory, Heidelberg, Germany
• German National Centre for Environment and Health, Neuherberg, Munich, Germany
• Université Libre de Bruxelles, Brussels, Belgium
• Consejo Superior de Investigaciones Científicas, Madrid, Spain
• Institut Municipal d'Assistència Sanitària, Barcelona, Spain
• Genome Research Ltd, Hinxton, Cambridge, UK
• Max-Planck Institute for Informatics, Saarbrücken, Germany
• The Hebrew University of Jerusalem, Givat Ram, Israel
• Department of Biochemical Sciences, University of Rome "La Sapienza", Rome, Italy
• University of Stockholm, Stockholm, Sweden
• University of Oxford, Oxford, UK
• University College London, London, UK
• Radboud University Nijmegen, Nijmegen, The Netherlands
• Swiss Institute of Bioinformatics, Geneva, Switzerland
• Technical University of Denmark, Lyngby, Denmark
• University of Helsinki, Helsinki, Finland
• University of Geneva, Geneva, Switzerland
• Institute of Enzymology, Hungarian Academy of Sciences, Budapest, Hungary
• University of Cologne, Cologne, Germany
• Institut Pasteur, Paris, France
• BioInfo Bank Institute, Poznan, Poland
• Max Planck Institute for Molecular Genetics, Berlin, Germany
• Genoscope, Evry, France
• University of Bologna, Bologna, Italy
• European Molecular Biology Laboratory - European Bioinformatics Institute, Hinxton, Cambridge, UK
EMBRACE
• European Molecular Biology Laboratory - European Bioinformatics Institute, Hinxton, Cambridge, UK
• European Molecular Biology Laboratory, Heidelberg, Germany
• Institute of Biomedical Technologies, Section Bari, CNR, Bari, Italy
• University of Manchester, UK
• Swiss Institute of Bioinformatics, Geneva, Switzerland
• Swedish University of Agricultural Sciences, The Linnaeus Centre for Bioinformatics, Sweden
• Centre National de la Recherche Scientifique, Clermont-Ferrand and Lyon, France
• Centre for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
• Centro Nacional de Biotecnologia/Consejo Superior de Investigaciones Cientificas, Madrid, Spain
• University of Stockholm, Stockholm Bioinformatics Centre, Sweden
• Institut National de la Recherche Agronomique, Toulouse, France
• Max Planck Institute for Molecular Genetics, Berlin, Germany
• CSC, the Finnish IT Center for Science, Espoo, Finland
• University College London, London, UK
• The Weizmann Institute, Rehovot, Israel
• Centre for Molecular and Biomolecular Informatics, University of Nijmegen, The Netherlands
• Carretera de Ajalvir, km. 4, 28850 Torrejon de Ardoz, Madrid
ENFIN
• The European Bioinformatics Institute / The European Molecular Biology Laboratory, Europe
• The University of Dundee, UK
• Technical University of Denmark
• University of Rome Tor Vergata, Italy
• Medical Research Council Mammalian Genetics Unit (MRCMGU), UK
• Ludwig Institute for Cancer Research, Uppsala (LICR-UPP), Sweden
• The Max Planck Institute, Germany
• University of Helsinki (UH), Finland
• University College London (UCL), UK
• National Center for Research and Technology, Hellas (CERTH), Greece
• Universitaet zu Koeln (UNIK), Germany
• Weizmann Institute (Weizmann), Israel
• Egeen (EGEEN), Estonia
• Serono Pharmaceutical Research Institute (SPRI), Switzerland
• Consejo Superior de Investigaciones Científicas (CSIC), Spain
• Centre for Integrative Bioinformatics VU (IBIVU), Netherlands
Global Picture
• DNA – tripartite international collaboration (including patent data acquisition)
• Protein sequences – UniProt collaboration
• Macromolecular structures – tripartite international collaboration
• IntAct – international agreements
• Reactome – USA-Europe collaboration
• Etc.
[Diagram: core biomolecular resources and the resources around them]
Core biomolecular resources
Large resources in related disciplines: medical data resources, biodiversity data resources, chemical data resources
Specialist biomolecular data resource examples: BRENDA, IMGT, Pasteur DBs
Model organism resource examples: SGD, Flybase, MGD, Mouse Atlas, Eumorphia/Phenotypes, Mutants
[Figure: Web Hits by country. Labels: Denmark, Israel, Australia, Austria, Other, Finland, Belgium, Switzerland, Norway, Taiwan, Netherlands, Sweden, USA, Canada, Spain, Italy, Japan, France, Germany, UK]
EBI Total Running Budget 2005 = €26 million
• EMBL: 50%
• EU: 22%
• USA: 8%
• Wellcome Trust: 7%
• UK Research Councils: 7%
• Industry: 3%
• Other: 3%
Projected budget 2011 = €43 million
[Figure: bar chart of budgets in millions of euros (axis €0 to €60). Bars: NCBI 2004/5 + PDB, EBI 2005, EBI 2011]
[Figure: bar chart in millions of euros (axis €0 to €3,000). Bars: cost of the data, NCBI 2004/5 + PDB, EBI 2005, EBI 2011]
Read-only or dynamic
• There’s nothing particularly difficult about archiving unchanging data
• But most data aren’t unchanging
• Today’s best bet
• E.g., Ensembl
• Provenance
• E.g., patent searching
• N.B. Versioning (complex!)
• Citation
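Why versioning matters for citation and provenance can be illustrated with a small sketch. It assumes an append-only store in which every update to a record creates a new numbered version, so that a citation of the form accession.version always resolves to the content that was actually cited. The class, its method names and the accession X00001 are hypothetical and do not describe any EBI system.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedArchive:
    """Hypothetical append-only store: updates add versions, nothing is overwritten."""
    _history: dict = field(default_factory=dict)

    def deposit(self, accession, content):
        """Store content as the next version and return a citable identifier."""
        versions = self._history.setdefault(accession, [])
        versions.append(content)
        return f"{accession}.{len(versions)}"

    def latest(self, accession):
        """Today's best bet: the most recent version of a record."""
        return self._history[accession][-1]

    def resolve(self, citation):
        """Return exactly the content that a versioned citation refers to."""
        accession, version = citation.rsplit(".", 1)
        return self._history[accession][int(version) - 1]


archive = VersionedArchive()
# "X00001" is a made-up accession used only for illustration.
first = archive.deposit("X00001", "gene model, first annotation")
second = archive.deposit("X00001", "gene model, revised annotation")
print(first, "->", archive.resolve(first))
print("latest ->", archive.latest("X00001"))
```

Keeping every version is what lets a publication or a patent search point at the state of a record on a given date, while ordinary users still see the latest view.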
How much data
• Canonical vs. episodic
• Genomes, expression profiles
• Raw vs. processed
• Sequence traces
• Structure factors
Custodianship, acquisition and ownership
• Widely accepted obligation to deposit data
• Depend on the goodwill of the community
• Add “organisation”
• Add “services”
• Add “value”
Annotation as added value
• First/second/third party annotation
• Computational vs. experimental
• Bundled vs. distributed
• (DAS)
Openness
• We approve of it
• Data must be made available as soon as they are discussed in a publication
• Data from “community” projects should be made available immediately
• Confidentiality issues must be addressed
Federation
• Monolithic solutions fail
• Centralisation yields more than the sum of the parts
• Aggregation of institutional repositories is essential
Slice it vertically or horizontally?
• E.g., the EBI and AstroGrid are domain specific
• Would it be better if they were jointly managed by data experts?
• Standardisation
• Mixed success
Supporting the electronic record of science
• This is more like libraries than research projects
• Needs long term commitment
• With accountability
• Current funding structures are not well adapted to the task
• Pitching the information providers in competition with their research community is damaging.
Bioinformatics Infrastructure
• Has captured the data from several billion euros’ worth of science
• Serves a community of perhaps a million users
• Supports science on which the UK alone spends €3-4 billion a year
• Cuts years of lab work down to hours of computer work
• Is crucial to human well-being from medicine to agriculture
• Sees data volume and usage growing exponentially
• Might cost a few tens of millions (at most a couple of percent of the cost of the science it supports).
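The closing proportion can be checked with a little arithmetic. The sketch below uses only figures quoted in this talk (the €26 million budget for 2005, the €43 million projection for 2011, and the €3-4 billion the UK spends each year). The function and variable names are illustrative, and comparing against UK spending alone is an assumption that overstates the real share, since the infrastructure serves a much wider community.

```python
def share_percent(infrastructure_meur, science_meur):
    """Infrastructure cost as a percentage of the science spending it supports."""
    return 100.0 * infrastructure_meur / science_meur


# Figures quoted in the talk, in millions of euros.
EBI_BUDGET_MEUR = {"2005": 26.0, "2011 (projected)": 43.0}
UK_SCIENCE_MEUR = (3000.0, 4000.0)  # the "€3-4 billion a year" range

for label, budget in EBI_BUDGET_MEUR.items():
    low = share_percent(budget, max(UK_SCIENCE_MEUR))
    high = share_percent(budget, min(UK_SCIENCE_MEUR))
    print(f"EBI {label}: {low:.2f}% to {high:.2f}% of annual UK spending")
```

On these figures the share stays below two percent in every case, which is consistent with the "at most a couple of percent" estimate.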