Other biological databases

Biological systems (diagram labels): sequence data; protein folding and 3D structure; taxonomic data; literature; pathways and networks; protein families and domains; small molecules; whole genome data; ontologies (GO)

Other Biological Databases
• Transcription factor binding sites – TRANSFAC
• Protein structure databases – PDB, SCOP, CATH
• Protein family databases – Pfam, Prints, PROSITE etc.
• Chemicals and small molecules – ChEBI
• Gene expression databases – GEO, ArrayExpress
• Metabolic pathways – Reactome, KEGG
• Genome databases – Ensembl, FlyBase, WormBase etc.
• Human genetics-related databases – HapMap, dbSNP

Transcription factor binding sites
• TRANSFAC – database of eukaryotic transcription factors: http://www.gene-regulation.com/pub/databases.html#transfac
• TESS – Transcription Element Search System, for predicting transcription factor binding sites; uses TRANSFAC: http://www.cbi.upenn.edu/tess
• TFsearch – for searching transcription factor binding sites: http://www.cbrc.jp/research/db/TFSEARCH.html

Protein structure databases
• Main resource is the Protein Data Bank (PDB): http://www.rcsb.org/pdb/
• Contains the spatial coordinates of the atoms of macromolecules whose 3D structure has been determined by X-ray crystallography or NMR
• Proteins represent more than 90% of available structures (others are DNA, RNA, sugars, viruses, protein/DNA complexes…)
• Can search by PDB code

Searching MSD
• http://www.ebi.ac.uk/msd
• Search by PDB code

Protein structure-related databases
• Structural family databases based on PDB: SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/) and CATH (http://www.biochem.ucl.ac.uk/bsm/cath/)
• Predicted structures in SWISS-MODEL (http://swissmodel.expasy.org//SWISS-MODEL.html)

Protein family databases
• Databases that produce signatures for identifying protein families or domains
• Used for functional classification of proteins
• E.g. Pfam, PROSITE, Prints, SMART, TIGRFAMs etc.
• Integrated into a single resource, InterPro (http://www.ebi.ac.uk/interpro)

InterProScan sequence search
• Stand-alone version available

InterPro text search
• Search by keyword, protein accession or InterPro accession

Results for protein accession

Example InterPro entry

Chemicals and small molecules
• Chemical Abstracts – http://www.cas.org/
• ChEBI – http://www.ebi.ac.uk/chebi
• KEGG – part of it covers chemicals: http://www.genome.jp/kegg
• ChemIDplus – chemicals cited in NLM databases: http://chem2.sis.nlm.nih.gov/chemidplus/chemidlite.jsp
• MSD-Chem – ligands and chemicals in MSD

ChEBI example entry

Hierarchy for chemicals

Gene expression databases
• NCBI Gene Expression Omnibus (GEO): http://www.ncbi.nlm.nih.gov/geo/
• ArrayExpress: http://www.ebi.ac.uk/arrayexpress
• Stanford Microarray Database: http://genome-www5.stanford.edu/
• Can usually search for experiments or for particular expression profiles

GEO search page

Profiles search results

Specific entry and experiment info

ArrayExpress search results

What does the data look like?
• Info on the experiment, the array used, etc.
• Raw or processed tab-delimited file containing spots and their intensities (Cy3/Cy5 ratios) across different samples
• Files with metadata, e.g. sample info, annotation and coordinates of each spot on the array

Proteomics: SWISS-2DPAGE

Enzymes and metabolic pathways
• Contain information describing enzymes, biochemical reactions and metabolic pathways
• ENZYME and BRENDA: nomenclature databases that store information on enzyme names and reactions
• IntEnz: Integrated relational Enzyme database

Enzyme nomenclature
• E.C. (Enzyme Commission) numbers are assigned to enzymes based on the reactions they catalyze
• Hierarchy, high-level groups:
  – EC 1 – Oxidoreductases
  – EC 2 – Transferases
  – EC 3 – Hydrolases
  – EC 4 – Lyases
  – EC 5 – Isomerases
  – EC 6 – Ligases

EC example

Metabolic pathway databases
• PATHGUIDE – lists more than 200 pathway resources
• KEGG (Kyoto Encyclopedia of Genes and Genomes): http://www.genome.jp/kegg, which includes:
  – Database of chemicals, genes and networks (metabolic, regulatory etc.)
  – Well-curated and quite specific
• EcoCyc (Encyclopedia of E. coli K12 genes and metabolism): http://ecocyc.org – curation of the entire genome
• Reactome – curated biological pathways: http://www.reactome.org/
• GenMAPP – pathways contributed by users: http://www.genmapp.org

Different pathways in different species: comparison

Pathway in Reactome

Example of a pathway in BioCyc

Protein-protein interaction databases
• Store pairwise interactions or complexes
• A single publication can contribute anywhere from 1 to more than 20,000 interactions
• IntAct: http://www.ebi.ac.uk/intact
• DIP (Database of Interacting Proteins): http://dip.doe-mbi.ucla.edu/
• BIND (Biomolecular Interaction Network Database): http://submit.bind.ca:8080/bind/

Protein-protein interactions in IntAct

Integrated functional interactions in STRING

Genome browsers
• Integrate sequence and functional data for a genome
• Ensembl – genome browser for major eukaryotic genomes, e.g. human, mouse etc.: http://www.ensembl.org
• UCSC browser – http://genome.ucsc.edu/
• FlyBase – Drosophila genome database: http://www.ebi.ac.uk/flybase
• WormBase – C. elegans: http://www.wormbase.org
• PlasmoDB – Plasmodium (malaria): http://plasmodb.org
• Etc.
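
Returning to the enzyme nomenclature slide: because the first field of an E.C. number names its top-level group, the hierarchy can be read off with a simple lookup. The following is a minimal sketch, not part of the original slides. The function name `ec_top_level_class` and the input handling (optional "EC" prefix, partial numbers such as "3.4.-.-") are illustrative assumptions, and the table holds only the six classes listed on the slide.

```python
# Top-level EC classes exactly as listed on the enzyme nomenclature slide.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def ec_top_level_class(ec_number):
    """Return the top-level class name for an EC number such as '1.1.1.1'.

    Hypothetical helper for illustration. Accepts an optional 'EC' prefix
    and partial numbers such as '3.4.-.-'. Raises ValueError when the
    first field is not one of the classes in EC_CLASSES.
    """
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        # Drop the prefix plus any separating spaces or colon.
        text = text[2:].lstrip(" :")
    first_field = text.split(".")[0]
    if not first_field.isdigit() or int(first_field) not in EC_CLASSES:
        raise ValueError(f"unrecognised EC number: {ec_number!r}")
    return EC_CLASSES[int(first_field)]


if __name__ == "__main__":
    print(ec_top_level_class("EC 1.1.1.1"))  # alcohol dehydrogenase
    print(ec_top_level_class("3.4.21.4"))    # trypsin
```

Only the first field is interpreted here; the remaining three fields (subclass, sub-subclass, serial number) would need the full nomenclature tables from a resource such as ENZYME or IntEnz.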
Ensembl genome browser

Ensembl gene view 1

Ensembl gene view 2

Gene within context on chromosome

Human genetics databases
• GeneCards (http://www.genecards.org/)
• HapMap (http://hapmap.ncbi.nlm.nih.gov/)
• OMIM (http://www.ncbi.nlm.nih.gov/omim)
• HGDP, Human Genome Diversity Project (http://hagsc.org/hgdp/files.html)

Mutation/polymorphism databases
• Most of the databases are disease- or gene-centric, e.g. p53
• dbSNP: http://www.ncbi.nlm.nih.gov/SNP/ – repository of known mutations (human and other organisms)

Where to find the databases
• Table of addresses for major databases and tools
• Nucleic Acids Research Database issue, January each year
• Nucleic Acids Research Software issue (new)
• Expasy list of tools: http://ca.expasy.org/links.html

Large scale data retrieval
• Programmatic access to many databases
• MySQL access to some
• BioMart access – public and private
• FTP sites – large data downloads

Other tutorials
• http://www.ensembl.org/info/website/tutorials/index.html
• http://www.ebi.ac.uk/training/online/
• http://www.ebi.ac.uk/2can/home.html
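
Bulk downloads from the gene expression databases described earlier typically arrive as tab-delimited text of spots and intensities, so a first processing step might look like the sketch below. It is an illustration only: the column names (`spot_id`, `cy3`, `cy5`) are hypothetical, real files from GEO or ArrayExpress use their own headers, and reporting the ratio as log2(Cy5/Cy3) is an assumed convention rather than something stated on the slides.

```python
import csv
import io
import math


def log2_ratios(tsv_text, id_column="spot_id", cy3_column="cy3", cy5_column="cy5"):
    """Map each spot ID to log2(Cy5 / Cy3) from tab-delimited text.

    Column names are hypothetical defaults. Rows with a missing,
    non-numeric or non-positive intensity are skipped.
    """
    ratios = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        try:
            spot = row[id_column]
            cy3 = float(row[cy3_column])
            cy5 = float(row[cy5_column])
        except (KeyError, TypeError, ValueError):
            continue  # missing column or unparsable intensity
        if cy3 <= 0 or cy5 <= 0:
            continue  # the logarithm is undefined for these values
        ratios[spot] = math.log2(cy5 / cy3)
    return ratios


# Made-up example data, not taken from any real experiment.
EXAMPLE = (
    "spot_id\tcy3\tcy5\n"
    "spot_1\t100\t400\n"
    "spot_2\t250\t250\n"
    "spot_3\t0\t90\n"
)

if __name__ == "__main__":
    for spot, value in log2_ratios(EXAMPLE).items():
        print(spot, value)
```

On the made-up example, `spot_1` comes out at 2.0 (four-fold higher Cy5), `spot_2` at 0.0 (equal intensities), and `spot_3` is dropped because its Cy3 intensity is zero.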