PowerPoint Presentation - European Bioinformatics Institute

Short Introduction To EMBL-EBI
Vicky Schneider,
EMBL-EBI Training Programme Project leader
vicky@ebi.ac.uk
What is EMBL-EBI?
• Based on the Wellcome
Trust Genome Campus near
Cambridge, UK
• Part of the European
Molecular Biology
Laboratory
• Non-profit organisation
13.04.2015
The five branches of EMBL
Heidelberg: Basic research in molecular biology, Administration, EMBO
Hamburg: Structural biology
Grenoble: Structural biology
Hinxton: Bioinformatics
Monterotondo: Mouse biology
• 1500 staff
• >60 nationalities
EMBL member states
Austria, Belgium, Croatia,
Denmark, Finland, France,
Germany, Greece, Iceland, Ireland,
Israel, Italy, Luxembourg, the
Netherlands, Norway, Portugal,
Spain, Sweden, Switzerland and
the United Kingdom
Associate member state: Australia
How is EMBL-EBI funded?
• In 2010 it cost €41 million to run EMBL-EBI.
• EU (€7.4 M)
• EMBL member states (€22.4 M)
• Charity (€4.1 M)
• US Govt (€2.9 M)
• UK Research Councils (€2.5 M)
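To put the funding figures in proportion, the short sketch below computes each source's share of the €41 million total. It uses only the amounts listed above; the variable and function names are illustrative. The listed sources add up to slightly less than the total, and the sketch simply reports the difference without assuming where it comes from.

```python
# Funding figures for 2010 as listed above, in millions of euros.
TOTAL_COST = 41.0
sources = {
    "EU": 7.4,
    "EMBL member states": 22.4,
    "Charity": 4.1,
    "US Govt": 2.9,
    "UK Research Councils": 2.5,
}


def funding_shares(sources, total):
    """Return each source's share of the total as a percentage (1 decimal place)."""
    return {name: round(100 * amount / total, 1) for name, amount in sources.items()}


shares = funding_shares(sources, TOTAL_COST)
listed = round(sum(sources.values()), 1)      # sum of the itemised sources
unlisted = round(TOTAL_COST - listed, 1)      # part of the total not itemised

for name, pct in shares.items():
    print(f"{name}: {pct}%")
print(f"Listed sources: €{listed} M; not itemised: €{unlisted} M")
```

On these figures the EMBL member states cover a little over half of the running cost.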
What is bioinformatics?
• Storing, retrieving and analysing
• Interdisciplinary
• Heart of modern biology
Biology is changing
• Data explosion
• New types of data
• High-throughput biology
• Growth of applied biology
  • molecular medicine
  • agriculture
  • food
  • environmental sciences…
• Emphasis on systems, not reductionism
[Figure: Growth of raw storage at EMBL-EBI (in terabytes). Y-axis: Disks (TB), 0 to 12000. X-axis: Year.]
The molecules of life
Nature’s ingredients
Small molecules provide building blocks, messengers and helpers:
• Amino acids: the building blocks of proteins
• Nucleotides and sugars: the building blocks of DNA and RNA
• Co-enzymes: pigments such as chlorophyll and haem help important processes such as photosynthesis and respiration
• Hormones: small molecules such as adrenalin and testosterone send important messages from cell to cell
The ‘book of life’
DNA contains the
information needed to
build an organism
The interpreter
RNA translates
the DNA code
into protein
Molecular machines
Proteins carry out the functions of life:
Catalysts: enzymes enable reactions to
occur at body temperature
Structural support: keratin and collagen
give structure to our tissues
Transport: carrier proteins move molecules
into and out of cells
Defense: antibodies protect us from
disease-causing organisms
Movement: myosin in muscles enables
them to contract
Bioinformatics underpins life-science research
1 Genomes contain genes
2 Genes are transcribed
3 Transcripts are translated into protein sequences
4 Proteins form three-dimensional structures
5 Proteins interact with each other and with small molecules to form pathways
6 Pathways combine to build systems
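Steps 2 and 3 above (transcription and translation) can be sketched in a few lines of code. This is a minimal illustration, not a production tool: the codon table contains only a small subset of the 64 codons, the example sequence is made up, and the function names are illustrative.

```python
# A small subset of the standard genetic code, enough for the example below.
# A complete table has 64 codons.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the start codon
    "GCC": "A",  # alanine
    "UGG": "W",  # tryptophan
    "AAA": "K",  # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(dna_coding_strand: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return dna_coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCCTGGAAATAA"      # made-up example gene
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna)     # AUGGCCUGGAAAUAA
print(protein)  # MAWK
```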
From molecules to medicine
[Figure with three stages: Molecular components, Integration, Translation. Labels shown: Genomes, Nucleotides, Transcripts, Proteins, Domains, Structures, Small molecules, Complexes, Pathways, Cells, Tissues and organs, Human individuals, Human populations, Biobanks, Therapies, Disease prevention, Early diagnosis.]
Examples of the importance of biological
information to all of us
Genome-wide analysis of crop plants
• Population growth and climate
change are major challenges to
food security.
• Traditional routes to crop
improvement are too slow to
keep up with this increase in
demand.
• Understanding plant genomes
helps us identify which species
will be most tolerant to drought,
salt and pests while still
providing optimum nutrition.
Matching the treatment to the cancer
• One in ten women in
the EU-27 will develop
breast cancer before
the age of 80.
• If we can identify
patterns of genes that
are active in different
tumours, we can
diagnose and treat
cancers earlier.
Tracking the source of infectious disease
• Methicillin-resistant Staphylococcus aureus (MRSA)
infection is a global problem.
• Transmission of individual
clones can be tracked using
small variations in DNA
sequence.
• This technology can be used
to identify the source of new
outbreaks across continents
and within wards.
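The idea of tracking clones through small sequence variations can be illustrated by counting the positions at which aligned sequences differ: isolates separated by fewer differences are more likely to belong to the same transmission chain. The sketch below assumes the sequences are already aligned and of equal length; the isolate names and sequences are invented for illustration, and real analyses work on whole genomes with dedicated tools.

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


# Made-up aligned fragments for three hypothetical isolates.
isolates = {
    "ward_A_patient_1": "ACGTACGTAC",
    "ward_A_patient_2": "ACGTACGTAT",
    "other_hospital": "ACCTACGAAT",
}

# Pairwise distances between all isolates.
distances = {
    (x, y): snp_distance(isolates[x], isolates[y])
    for x, y in combinations(isolates, 2)
}

# The pair with the fewest differences is the most closely related.
closest_pair = min(distances, key=distances.get)
for pair, d in distances.items():
    print(pair, d)
print("Most closely related:", closest_pair)
```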
Barcoding life
• DNA barcodes are short
sections of DNA that we use to
identify an organism.
• The Barcode of Life Initiative is
developing DNA barcoding as a
global standard for identifying
species.
• Applications include:
  • Protection of endangered species
  • Sustaining natural resources through pest control
  • Food labelling
Repurposing drugs for neglected diseases
• Schistosomiasis is a parasitic
infection that affects 210
million people in 76 countries.
• Resistance is developing to
the one available drug.
• We look at the Schistosome
genome to identify the targets
of existing drugs.
• Candidates can be tested for
anti-schistosomal activity or
used as leads for further
optimisation.
Lots of data and new types of data
Literature
Genomes
Protein sequence
Proteomes
Nucleotide sequence
Protein structure
Gene expression
Protein families,
domains and motifs
Chemical entities
Protein-protein
interactions
Pathways
Systems
EMBL-EBI’s mission statement
• To provide freely available data and bioinformatics services
to all facets of the scientific community in ways that promote
scientific progress
• To contribute to the advancement of biology through basic
investigator-driven research in bioinformatics
• To provide advanced bioinformatics training to scientists at all
levels, from PhD students to independent investigators
• To help disseminate cutting-edge technologies to industry
• To coordinate biological data provision across Europe
Services
www.ebi.ac.uk/services
Principles of service provision
@ Patrick Hoesly
Accessibility
Compatibility
Portability
Comprehensive
Quality
Databases: molecules to systems
Genomes: Ensembl, Ensembl Genomes, EGA
Nucleotide sequence: ENA
Functional genomics: ArrayExpress, Expression Atlas
Literature and ontologies: CiteXplore, GO
Protein families, motifs and domains: InterPro
Macromolecular: PDBe
Protein activity: IntAct, PRIDE
Pathways: Reactome
Protein sequences: UniProt
Chemical entities: ChEBI
Chemogenomics: ChEMBL
Systems: BioModels, BioSamples
Database collaborations
Standards development – international collaborations
Genomics Standards Consortium (GSC): http://gensc.org
Genome annotation: www.geneontology.org
Protein sequence: www.uniprot.org
Nucleotide sequence: www.insdc.org
Functional Genomics Data Society: www.fged.org
Cheminformatics: www.ebi.ac.uk/chebi
HUPO Proteomics Standards Initiative (PSI): www.psidev.info/
Pathways: www.reactome.org, www.biopax.org
Metabolomics Standards Initiative (MSI): www.metabolomicssociety.org
Protein structure: www.wwpdb.org
Systems modelling standards: www.sbml.org
[Figure: data resources grouped by category.]
Categories: Genomes, Nucleotide Sequences, Protein Sequences, Macromolecular Structures, Small Molecules, Gene Expression, Molecular Interactions, Reactions & Pathways, Enzymes, Protein Families (Diagnostic), Pattern & Motif Search (Diagnostic), Sequence Similarity & Analysis, Structure Analysis, Proteomics, Literature, Ontologies
Resources shown: ArrayExpress, Atlas, BioModels, BLAST, CATH, ChEBI, CiteXplore, DDBJ, ENA, Ensembl, FASTA, Flybase, GenBank, Gene3D, GEO, GO, GOA, Gramene, IntAct, IntEnz, InterPro, InterProScan, MACiE, PDB, PDBsum, Pfam, PRIDE, PRINTS, ProFunc, PubChem, Pubmed, Reactome, RefSeq, SCOP, STRING, UCSC Genome Browser, UniProt, VAST
New search service
Access from the
EBI’s homepage
Species selector
allows for easy
comparison
Data organised
according to:
• gene
• expression
• protein
• structure
• literature
Explore data,
return easily to
your results
Goals of the new EBI Search
• Relevant to ‘wet-lab’ biologists
• Organises information based around a single gene
(or a small number of genes)
• User-expectation centric (not database centric)
• Smooth transition to the detailed information in
many of EBI’s core databases
• NOT for bioinformaticians:
does not provide programmatic access
Quick databases tour
Genomes 1: Ensembl
Chromosomes
Genes
Genomic alignments
Pick a genome
Synteny
Variations
Variation Effect
Predictor
Gene trees
Gene families
User
Upload
Genomes 2: Ensembl Genomes
Genome portals for the five
kingdoms of life
Interface uses
Ensembl technology
Variation data for
plant, metazoan
and fungal
species
Multi-way comparison
of whole bacterial
chromosomes
Pan-taxonomic
comparative analysis
Nucleotides: European Nucleotide Archive
(ENA)
The ENA has a three-tiered data
architecture.
It consolidates information from
EMBL-Bank, the European Trace
Archive (containing raw data from
electrophoresis-based sequencing
machines) and the Sequence Read
Archive (containing raw data from
next-generation sequencing
platforms).
Figure adapted from: Cochrane, G. et al. Public Data
Resources as the Foundation for a Worldwide
Metagenomics Data Infrastructure. In: Metagenomics:
Theory, Methods and Applications (Chapter 5), Caister
Academic Press, Universidad Nacional de Cordoba,
Argentina. Ed. D. Marco (2010).
Transcriptomes: ArrayExpress
Expand results
ArrayExpress Archive:
browse experiments
Search by keyword
Spreadsheets describing
the sample properties
Transcriptomes: Gene Expression Atlas
Atlas: browse
changes in gene
expression
Gene
page
Experiment page
Search by gene or
biological condition
Some data sources for annotation
Input sources for UniProtKB
GO: Functional info
PRIDE: Protein identification data
InterPro: Protein families and domains
IntAct: Molecular interactions
IntEnz: Enzymes
HAMAP: Microbial protein families
RESID: Post-translational modifications
• Manual curation
• Literature-based annotation
• Sequence analysis
InterPro
classification
Signal
prediction
UniProt
• Automated annotation
Transmembrane
prediction
Other
predictions
Protein
classification
Protein families, motifs and domains: InterPro
Powerful tool for protein
classification, integrating several
methods into one resource
Compare methods of protein
signature prediction
Visualise the taxonomic range
for a protein signature
View architectures of proteins
containing a signature
Proteomics services
PRIDE: protein identifications
from proteomics experiments
IntAct: molecular interactions
ChEBI: small molecules
IntEnz: enzyme classification
Structures: PDBe
Chemogenomics: ChEMBL
ChEMBL database
Neglected Tropical Disease (NTD) archive
ChEMBL
Browse targets
Target search
Compound search
Search results
Kinase SARfari
GPCR SARfari
Pathways: Reactome
Compare events in
different species
View expression values
overlaid on a pathway
Link to source
databases
Interaction overlay on a
pathway diagram
Export pathway
to your favourite
modelling
software
Data management
• Over 4M web requests per day – over 4.6M if Ensembl is included
• Over 280,000 unique hosts served per month, excluding Ensembl
• Total disk space: 10 petabytes in 2010
• Leased two new data centres (with €11.4M from UK Research Councils)
• Over 800 million cross-references in the databases we serve
User support
• E-mail support – www.ebi.ac.uk/support
• Online help pages – www.ebi.ac.uk/help
• 2Can bioinformatics user support – www.ebi.ac.uk/2Can
• eLearning Portal – coming soon (elearning@ebi.ac.uk)
Research
www.ebi.ac.uk/groups
Key facts about research
• The EBI provides a unique environment for bioinformatics
research
• Eight dedicated research groups aim to understand
biology through new approaches to interpreting biological
data
• Services teams also carry out R&D to enhance existing
services and develop new ones
• Research programme complements services and the two
are mutually supportive
Curiosity-driven research
Genomes
Transcriptomes
Proteins
Ewan Birney
Alvis Brazma
Janet Thornton
Nicolas Le Novère
Paul Flicek
Anton Enright
Rolf Apweiler
Nick Luscombe
Nick Goldman
John Marioni
Gerard Kleywegt
Paul Bertone
Text mining
Dietrich Rebholz-Schuhmann
Chemistry
Christoph Steinbeck
Pathways and systems
John Overington
Julio Saez-Rodriguez
biology/medicine
chemistry/chem engineering
maths
physics
Training
www.ebi.ac.uk/training
Hands-on training for all levels of experience
• Interactive training in our purpose-built IT training suite at
EMBL-EBI, Hinxton, Cambridge
• Learn from the EBI’s experts through a combination of
talks and practical exercises
• Take a tour of all our core data resources, or focus in on
specific data types
• Full programme at www.ebi.ac.uk/training/handson
Predoc and postdoc training
• Open Days for bioinformatics
early-stage researchers
www.ebi.ac.uk/training/openday
• PhD studentships through EMBL
International PhD Programme
www.ebi.ac.uk/training/Studentships
• EIPOD interdisciplinary post-doc
fellowship programme
www.embl.de/training/postdocs/eipod
• EBI–Sanger postdoc programme
www.ebi.ac.uk/training/postdoc/ESPOD