Other_biol_databases - National Bioinformatics Training

Other biological databases
Biological systems
Sequence data
Protein folding and 3D structure
Taxonomic data
Literature
Pathways and networks
Protein families and domains
Small molecules
Whole genome data
Ontologies - GO
Biological systems
Other Biological Databases
• Transcription factor binding sites - TRANSFAC
• Protein structure databases - PDB, SCOP, CATH
• Protein family databases - Pfam, Prints, PROSITE etc.
• Chemicals and small molecules - ChEBI
• Gene expression databases - GEO, ArrayExpress
• Metabolic pathways - Reactome, KEGG
• Genome databases - Ensembl, FlyBase, WormBase etc.
• Human genetics-related databases - HapMap, dbSNP
Transcription factor binding sites
• TRANSFAC - database of eukaryotic transcription
factors: http://www.gene-regulation.com/pub/databases.html#transfac
• TESS - Transcription Element Search System, for
predicting transcription factor binding sites; uses
TRANSFAC: http://www.cbi.upenn.edu/tess
• TFsearch - for searching transcription factor binding
sites: http://www.cbrc.jp/research/db/TFSEARCH.html
Protein structure databases
• Main resource is Protein Data Bank (PDB):
http://www.rcsb.org/pdb/
• Contains the atomic coordinates of macromolecules
whose 3D structures have been determined by
X-ray crystallography or NMR
• Proteins represent more than 90% of available
structures (others are DNA, RNA, sugars, viruses,
protein/DNA complexes…)
• Can search by PDB code
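To make "spatial coordinates of macromolecule atoms" concrete, here is a minimal sketch of reading atom records from a PDB-format text file. It assumes the classic fixed-column PDB layout (x, y, z in columns 31-54). The names `Atom`, `parse_atoms`, `distance` and the two-atom `SAMPLE` with its coordinates are hypothetical and made up for illustration; they are not taken from a real entry.

```python
import math
from typing import List, NamedTuple


class Atom(NamedTuple):
    name: str      # atom name, e.g. "CA"
    res_name: str  # residue name, e.g. "MET"
    chain: str     # chain identifier
    res_seq: int   # residue sequence number
    x: float
    y: float
    z: float


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Extract ATOM/HETATM records from fixed-column PDB text."""
    atoms = []
    for line in pdb_text.splitlines():
        # Only coordinate records are of interest; skip HEADER, END, etc.
        if not line.startswith(("ATOM  ", "HETATM")) or len(line) < 54:
            continue
        atoms.append(Atom(
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain=line[21],                # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        ))
    return atoms


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms, in the file's units."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


# Hypothetical two-atom sample, built with a format string so that
# every field lands in the correct column.
_FMT = "ATOM  {:>5d} {:<4s} {:>3s} {:1s}{:>4d}    {:8.3f}{:8.3f}{:8.3f}"
SAMPLE = "\n".join([
    "HEADER    HYPOTHETICAL EXAMPLE",
    _FMT.format(1, "N", "MET", "A", 1, 27.340, 24.430, 2.614),
    _FMT.format(2, "CA", "MET", "A", 1, 26.266, 25.413, 2.842),
    "END",
])
```

Calling `parse_atoms(SAMPLE)` returns two `Atom` tuples, and `distance` can then be applied to any pair of them. For real work an established parser is a safer choice than hand-rolled column slicing.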
Searching MSD (Macromolecular Structure Database)
http://www.ebi.ac.uk/msd - search by PDB code
Protein structure-related databases
• Structural family databases based on PDB –
SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/)
and CATH
(http://www.biochem.ucl.ac.uk/bsm/cath/)
• Predicted structures in SWISS-MODEL
(http://swissmodel.expasy.org//SWISSMODEL.html)
Protein family databases
• Databases that produce signatures for identifying
protein families or domains
• Used for functional classification of proteins
• E.g. Pfam, PROSITE, Prints, SMART,
TIGRFAMs etc.
• Integrated into single resource InterPro
(http://www.ebi.ac.uk/interpro)
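As an illustration of what a "signature" can look like, the sketch below converts a PROSITE-style pattern into a Python regular expression and scans a sequence with it. It covers only the basic pattern syntax (`x`, `[..]`, `{..}`, repeat counts, and `<` / `>` anchors outside brackets). The function names and the test sequence are hypothetical; the pattern `N-{P}-[ST]-{P}` is the commonly cited N-glycosylation site pattern. Other member databases use different signature types (for example profiles or hidden Markov models), which this sketch does not address.

```python
import re
from typing import List, Tuple

# One pattern element: x, a residue, [allowed] or {forbidden},
# optionally followed by a repeat count (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a regex string."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":                  # any residue
            core = "."
        elif core.startswith("{"):       # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_sites(pattern: str, sequence: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched text) for every hit, overlaps included."""
    regex = "(?=(" + prosite_to_regex(pattern) + "))"  # lookahead keeps overlaps
    return [(m.start(), m.group(1)) for m in re.finditer(regex, sequence)]
```

For example, `find_sites("N-{P}-[ST]-{P}.", "AANGSAANPTA")` reports the single site starting at position 2, because the second `N` is followed by a proline.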
InterProScan sequence search (stand-alone version available)
InterPro text search: search by keyword, protein accession or InterPro accession
Results for a protein accession
Example InterPro entry
Chemicals and small molecules
• Chemical Abstracts - http://www.cas.org/
• ChEBI - http://www.ebi.ac.uk/chebi
• KEGG - part of it includes chemicals
http://www.genome.jp/kegg
• ChemIDplus - chemicals cited in NLM databases
http://chem2.sis.nlm.nih.gov/chemidplus/chemidlite.jsp
• MSD-Chem - ligands and chemicals in MSD
ChEBI example entry
Hierarchy for chemicals
Gene expression databases
• NCBI Gene Expression Omnibus (GEO)
http://www.ncbi.nlm.nih.gov/geo/
• ArrayExpress http://www.ebi.ac.uk/arrayexpress
• Stanford Microarray Database http://genome-www5.stanford.edu/
• Can usually search for experiments or particular
expression profiles
GEO search page
Profiles search results
Specific entry and experiment info
ArrayExpress search results
What does the data look like?
• Info on the experiment, array used, etc.
• Raw or processed tab-delimited file containing
spots and their intensities (Cy3/Cy5 ratios) across
different samples
• Files with metadata, e.g. sample info, annotation
and coordinates of each spot on the array
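A tab-delimited intensity file of the kind described above could be processed roughly as follows. This is a sketch under assumptions: the column names `spot_id`, `cy3` and `cy5` and the sample values are hypothetical, and real files from GEO or ArrayExpress will have their own headers and layouts.

```python
import csv
import io
import math
from typing import Dict


def log2_ratios(tsv_text: str) -> Dict[str, float]:
    """Compute log2(Cy5/Cy3) per spot from tab-delimited text.

    Assumes a header row with hypothetical columns spot_id, cy3, cy5.
    Spots with a non-positive intensity are skipped, since the
    log ratio is undefined for them.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    ratios = {}
    for row in reader:
        cy3 = float(row["cy3"])
        cy5 = float(row["cy5"])
        if cy3 <= 0 or cy5 <= 0:
            continue
        ratios[row["spot_id"]] = math.log2(cy5 / cy3)
    return ratios


# Made-up intensities for three spots.
SAMPLE_TSV = (
    "spot_id\tcy3\tcy5\n"
    "S1\t100\t400\n"
    "S2\t200\t100\n"
    "S3\t0\t50\n"
)
```

With `SAMPLE_TSV`, spot `S1` has a log2 ratio of 2.0 (four-fold higher Cy5 signal), `S2` has -1.0, and `S3` is dropped because its Cy3 intensity is zero. Working in log2 space makes up- and down-regulation symmetric around zero.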
Proteomics: SWISS-2DPAGE
Enzymes and metabolic pathways
• Contain information describing enzymes,
biochemical reactions and metabolic pathways;
• ENZYME and BRENDA: nomenclature databases
that store information on enzyme names and
reactions;
• IntEnz: Integrated relational Enzyme database
Enzyme nomenclature
• E.C. (Enzyme Commission) numbers are assigned to
enzymes based on the reactions they catalyze
• Hierarchy, high-level groups:
– EC 1: Oxidoreductases
– EC 2: Transferases
– EC 3: Hydrolases
– EC 4: Lyases
– EC 5: Isomerases
– EC 6: Ligases
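The hierarchy can be read directly off an EC number: the first field gives the top-level class and each further field narrows the reaction type. A small sketch, with hypothetical helper names `parse_ec` and `ec_class`, restricted to the six classes listed above:

```python
from typing import Optional, Tuple

# Top-level classes exactly as listed above.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def parse_ec(ec: str) -> Tuple[Optional[int], ...]:
    """Split an EC number such as 'EC 1.1.1.1' into its numeric levels.

    A '-' field (partial classification, e.g. '2.7.-.-') becomes None.
    """
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    fields = text.split(".")
    if not 1 <= len(fields) <= 4:
        raise ValueError(f"expected 1 to 4 fields, got: {ec!r}")
    return tuple(None if f == "-" else int(f) for f in fields)


def ec_class(ec: str) -> str:
    """Return the top-level class name for an EC number."""
    top = parse_ec(ec)[0]
    if top not in EC_CLASSES:
        raise ValueError(f"class not in the six listed groups: {ec!r}")
    return EC_CLASSES[top]
```

For instance, `ec_class("EC 1.1.1.1")` (alcohol dehydrogenase) gives "Oxidoreductases" and `ec_class("3.4.21.4")` (trypsin) gives "Hydrolases".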
EC example
Metabolic Pathway databases
• Pathguide - lists more than 200 pathway resources
• KEGG (Kyoto encyclopedia of genes and genomes):
http://www.genome.jp/kegg -includes:
– Database of chemicals, genes and networks (metabolic,
regulatory etc.)
– Well-curated and quite specific
• EcoCyc (Encyclopedia of E. coli K12 genes and
metabolism): http://ecocyc.org - curation of the entire
genome
• Reactome –curated biological pathways:
http://www.reactome.org/
• GenMAPP - pathways contributed by users
http://www.genmapp.org
Different pathways in different species -> comparison
Pathway in Reactome
Example of a pathway in BioCyc
Protein-protein interaction databases
• Protein-protein interaction databases store pairwise
interactions or complexes
• A single publication can report from 1 to more than 20,000 interactions
• IntAct http://www.ebi.ac.uk/intact
• DIP (Database of Interacting Proteins) http://dip.doe-mbi.ucla.edu/
• BIND (Biomolecular Interaction Network Database)
http://submit.bind.ca:8080/bind/
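Pairwise interactions of the kind these databases store map naturally onto an undirected graph. The sketch below builds one from a list of protein pairs; the identifiers `P1` to `P4` and the function names are hypothetical, and real exports from IntAct or DIP carry much more detail (detection method, publication, confidence) than this illustration keeps.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_network(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from pairwise interactions.

    Duplicate pairs (the same interaction reported more than once)
    collapse automatically because neighbours are stored in sets.
    """
    network: Dict[str, Set[str]] = defaultdict(set)
    for a, b in pairs:
        network[a].add(b)
        network[b].add(a)
    return dict(network)


def by_degree(network: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """List proteins with their partner counts, most connected first."""
    return sorted(
        ((protein, len(partners)) for protein, partners in network.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical interactions; ("P1", "P2") appears twice on purpose.
PAIRS = [("P1", "P2"), ("P2", "P3"), ("P1", "P2"), ("P2", "P4")]
```

Here `build_network(PAIRS)` gives `P2` three partners, and `by_degree` places it first. Complexes with more than two members would need a different representation, for example one node per complex linked to each of its members.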
Protein-protein interactions in IntAct
Integrated functional interactions in STRING
Genome browsers
• Integrate sequence & functional data for a genome
• Ensembl –genome browser for major eukaryotic genomes,
e.g. human, mouse etc. http://www.ensembl.org
• UCSC browser - http://genome.ucsc.edu/
• FlyBase –Drosophila genome database:
http://www.ebi.ac.uk/flybase
• WormBase –C. elegans: http://www.wormbase.org
• PlasmoDB –Plasmodium (malaria): http://plasmodb.org
• Etc.
Ensembl genome browser
Ensembl gene view 1
Ensembl gene view 2
Gene within context on chromosome
Human genetics databases
• GeneCards (http://www.genecards.org/)
• HapMap (http://hapmap.ncbi.nlm.nih.gov/)
• OMIM (http://www.ncbi.nlm.nih.gov/omim)
• HGDP - Human Genome Diversity Project
(http://hagsc.org/hgdp/files.html)
Mutation/polymorphism databases
Most of the databases are disease- or gene-centric,
e.g. p53
dbSNP
http://www.ncbi.nlm.nih.gov/SNP/
Repository of known mutations
(human and other organisms)
Where to find the databases
• Table of addresses for major databases and tools
• Nucleic Acids Research Database issue, January
each year
• Nucleic Acids Research Software issue - new
• Expasy list of tools:
http://ca.expasy.org/links.html
Large scale data retrieval
• Programmatic access to many databases
• MySQL access to some
• BioMart access - public and private
• FTP sites - large data downloads
Other tutorials
• http://www.ensembl.org/info/website/tutorials/index.html
• http://www.ebi.ac.uk/training/online/
• http://www.ebi.ac.uk/2can/home.html