METABOLIC PATHWAYS & GENOMICS

COMPARATIVE GENOMICS WORKSHOP
‘RESSOURCEMENT’
BASIC RESOURCES
● General
NCBI http://www.ncbi.nlm.nih.gov/ Entrez nucleotide and protein databases; BLAST similarity search
programs.
TAIR http://www.arabidopsis.org/ The Arabidopsis information resource.
TIGR Gene Indices http://compbio.dfci.harvard.edu/tgi/ Annotated Arabidopsis, rice, etc. genomes. TIGR
Gene Indices: analysis of public EST data (contig assembly, analysis of expression patterns).
PlantGDB EST Assemblies http://www.plantgdb.org/search/misc/
EcoliHub http://ecolihub.org/ & EcoliWiki http://ecoliwiki.net/colipedia/index.php/Welcome_to_EcoliWiki
SGD http://www.yeastgenome.org/ Saccharomyces genome database
ExPASy Translate Tool http://www.expasy.ch/tools/dna.html Translates a DNA sequence in all 6 frames
ExPASy Compute pI/Mol Wt Tool http://www.expasy.ch/tools/pi_tool.html
ExPASy AACompIdent Tool http://www.expasy.ch/tools/aacomp/ Identification of a protein from its
amino acid composition.
Primer3 http://primer3.ut.ee Primer design site
NEBcutter http://tools.neb.com/NEBcutter2/index.php Molecular Biology restriction digests site
Seq Massager http://www.attotron.com/cybertory/analysis/seqMassager.htm Cleaning up sequences for
Bioinformatics platforms
ABIM http://sites.univ-provence.fr/~wabim/english/logligne.html ABIM online sequence analysis tools listing
GOLD http://www.genomesonline.org/cgi-bin/GOLD/index.cgi Resource for centralized monitoring of
genome and metagenome projects
DiArk http://www.diark.org/diark/ Compilation of eukaryotic genome and EST sequencing projects
Bionumbers http://bionumbers.hms.harvard.edu/default.aspx
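The ExPASy Translate Tool entry above mentions translating a DNA sequence in all 6 frames. As a rough offline illustration of what that involves, here is a minimal Python sketch. It assumes the standard genetic code and an unambiguous A/C/G/T sequence; the function names (reverse_complement, translate, six_frames) are hypothetical and are not part of any ExPASy interface.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (third base varies fastest). Assumption: standard code only.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    seq = seq.upper()
    rev = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frames("ATGGCC").items():
        print(label, protein)
```

Stop codons are shown as '*', and trailing bases that do not form a full codon are ignored.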
● Multiple sequence alignment, phylogeny, Venn diagrams
Computational Approaches in Comparative Genomics
http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=sef.TOC&depth=1 On-line textbook by EV Koonin & MY
Galperin.
Multalin Sequence Alignment http://multalin.toulouse.inra.fr/multalin/ Aligns sequences (output in color)
and makes phylogenetic trees.
ClustalOmega and Phylogenetic trees http://www.ebi.ac.uk/Tools/msa/clustalo/ Aligns protein sequences
and makes phylogenetic trees.
T-Coffee & M-Coffee http://tcoffee.vital-it.ch/apps/tcoffee/index.html Tools for combining and comparing
multiple sequence alignments.
WebLogos http://weblogo.berkeley.edu/ Creates Logos from multiple alignments
MEGA http://www.megasoftware.net/ The MEGA phylogeny program, downloads and manual.
Phylogeny.fr www.phylogeny.fr/ Very good phylogenetic tree platform for beginners; has a great tool,
BlastExplorer, to collect sequences for alignments and trees
iTOL Interactive tree of life
VENNY - http://bioinfogp.cnb.csic.es/tools/venny/index.html - Interactive tool for comparing lists with Venn
Diagrams.
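The kind of list comparison that a two-set Venn diagram displays can also be done with plain set operations. The sketch below is only an illustration; the function name compare_lists and the gene identifiers are made up for the example and have nothing to do with how VENNY is implemented.

```python
def compare_lists(list_a, list_b):
    """Split two lists of identifiers into the three regions of a two-set Venn diagram."""
    set_a, set_b = set(list_a), set(list_b)
    return {
        "only_a": sorted(set_a - set_b),   # present in A only
        "shared": sorted(set_a & set_b),   # present in both lists
        "only_b": sorted(set_b - set_a),   # present in B only
    }


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    regions = compare_lists(["geneA", "geneB", "geneC"], ["geneB", "geneC", "geneD"])
    for name, members in regions.items():
        print(name, len(members), members)
```

Duplicates within a list are collapsed, as they would be when counting members of a Venn region.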
● Transmembrane and organellar targeting predictions
TMHMM http://www.cbs.dtu.dk/services/TMHMM/ Prediction of transmembrane helices.
TargetP http://www.cbs.dtu.dk/services/TargetP/ Prediction of protein localization.
Predotar http://urgi.versailles.inra.fr/predotar/predotar.html Prediction of protein localization.
iPSORT http://hc.ims.u-tokyo.ac.jp/iPSORT/ Prediction of protein localization.
WoLF PSORT http://www.genscript.com/psort/wolf_psort.html Prediction of protein localization.
Signal-3L http://www.csbio.sjtu.edu.cn/bioinf/Signal-3L/# Signal peptide prediction.
COSMOSS Ambiguous Targeting Predictor http://www.cosmoss.org/bm/ATP
● Long-range homology searches/structure modeling
PSI-BLAST (Position-Specific Iterated BLAST)
Phyre2 http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index Protein Homology/analogY
Recognition Engine V 2.0
PSIPRED GenTHREADER http://bioinf.cs.ucl.ac.uk/psipred/ Protein structure prediction server.
MESSA http://prodata.swmed.edu/MESSA/MESSA.cgi MEta-Server for protein Sequence Analysis provides predictions of local sequence features, spatial structure, domain architecture and function for a
protein sequence
FFAS http://ffas.sanfordburnham.org/ffas-cgi/cgi/ffas.pl Fold & Function Assignment System.
COMPASS http://prodata.swmed.edu/compass/compass.php COmparison of Multiple Protein sequence
Alignments with assessment of Statistical Significance.
I-TASSER http://zhanglab.ccmb.med.umich.edu/I-TASSER/ On-line platform for protein structure and
function predictions (use Google Chrome Browser) Password required IT_0dfdz
The same group has modeled all E. coli structures: http://zhanglab.ccmb.med.umich.edu/Ecoli/
FINDSITECOMB http://cssb.biology.gatech.edu/FINDSITE-COMB Threading/Structure-Based, Proteomic-Scale Virtual Ligand Screening tool
● Protein structures
PDB and PDBe http://www.pdb.org/ and http://www.ebi.ac.uk/pdbe/ PDB is the main structure database;
PDBe has user-friendly features
TargetTrack http://sbkb.org/tt/ Gives experimental progress and status of targets selected for structure
determination.
MMDB http://www.ncbi.nlm.nih.gov/sites/entrez?db=structure Molecular Modeling DataBase with >40,000
structures, linked to the rest of the NCBI databases.
● Conserved domains, motifs, and protein families
COGs http://www.ncbi.nlm.nih.gov/COG/ Clusters of Orthologous Groups (COGs), delineated by
comparing protein sequences encoded in many complete genomes representing 30 major phylogenetic
lineages. Each COG consists of proteins from at least 3 lineages and thus corresponds to an ancient
conserved domain.
CDD http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml NCBI Conserved Domain Database
NCBI PSSM viewer http://www.ncbi.nlm.nih.gov/Class/Structure/pssm/pssm_viewer.cgi? Position-Specific
Scoring Matrix data used for the Conserved Domain Database
PFAM http://pfam.janelia.org/ Protein FAMily database
Superfamily database http://supfam.cs.bris.ac.uk/SUPERFAMILY/index.html
PRODOM http://prodom.prabi.fr/prodom/current/html/home.php PROtein DOMain families database
Seq2Ref http://prodata.swmed.edu/seq2ref/ Performs BLAST search for a query protein and retrieves the
reference proteins (= experimentally studied or manually curated proteins) from NCBI, PDB and Swiss-Prot
ENZYME & METABOLIC PATHWAY RESOURCES
Swiss-Prot Enzyme http://ca.expasy.org/enzyme/ Enzyme nomenclature database (linked to the Swiss-Prot protein database, BRENDA, KEGG, etc.)
IntEnz http://www.ebi.ac.uk/intenz/ Integrated relational Enzyme database
BRENDA http://www.brenda-enzymes.info/ Comprehensive enzyme database.
KEGG http://www.genome.ad.jp/kegg/ The Kyoto Encyclopedia of Genes and Genomes. Includes
metabolic pathways, and compound structures that can be captured.
IUBMB http://www.chem.qmul.ac.uk/iubmb/ and the subsection on Reaction schemes
http://www.chem.qmul.ac.uk/iubmb/enzyme/reaction/ The website of the International Union of
Biochemistry and Molecular Biology. Searchable database on enzymes and enzyme nomenclature; some
high-quality information on pathways, etc.
Thermodynamics of Enzyme-Catalyzed Reactions http://xpdb.nist.gov/enzyme_thermodynamics/
EcoSal http://www.ecosal.org/ EcoSal, a new, continually updated Web resource based on the ASM Press
publication Escherichia coli and Salmonella: Cellular and Molecular Biology. EcoSal is a comprehensive
archive of knowledge on the enteric bacterial cell and a good source of the latest knowledge of metabolic
pathways.
BioCyc, EcoCyc & MetaCyc http://BioCyc.org/ EcoCyc - Encyclopedia of E. coli Genes and Metabolism;
MetaCyc - Metabolic Encyclopedia. Also computationally-derived pathway/genome databases.
AraCyc http://www.arabidopsis.org/biocyc/index.jsp Similar to BioCyc, for Arabidopsis. Software allows
querying, graphical representation of pathways, and overlay of expression data on the biochemical pathway
overview diagram.
MetaCrop http://metacrop.ipk-gatersleben.de/apex/f?p=269:111: Summarizes diverse information about
around 40 metabolic pathways in crop plants
HMDB http://www.hmdb.ca/ Human metabolome database
COMPARATIVE GENOMICS RESOURCES
General integration platforms (genome browsers, genome comparisons (pathways or
synteny), phylogenetic distribution queries, physical clustering etc.)
SEED http://www.theseed.org/wiki/Main_Page Database containing hundreds of genomes and many
valuable tools.
OpenSEED http://open.theseed.org/
PlantSEED http://plantseed.theseed.org
Accessing the SEED Clearinghouse: From the old SEED Front page at http://pubseed.theseed.org/index.cgi
follow the link to "Peer-to-peer Updates".
Alternatively, from ANY peg page in PubSEED, follow the blue link [to old protein page] and from there use
the same "Peer-to-peer Updates" link.
Patric http://www.patricbrc.org/ Emphasis on Pathogenic bacteria (but contains all sequenced bacterial
genomes), multiple tools
MicroScope https://www.genoscope.cns.fr/agc/microscope/home/index.php Microbial genome annotation
platform (Strong on metabolism)
IMG http://img.jgi.doe.gov/ Integrated Microbial genomes data analysis system (most up to date in terms of
genomes)
MBGD http://mbgd.genome.ad.jp/ Microbial Genome DataBase for comparative genomics.
NMPDR http://www.nmpdr.org/cur/FIG/wiki/view.cgi/Main/WebHome National Microbial Pathogen Data
Resource.
EFI http://enzymefunction.org/ The Enzyme Function Initiative (EFI) is developing a robust sequence /
structure based strategy for facilitating discovery of in vitro enzymatic and in vivo metabolic / physiological
functions of unknown enzymes discovered in genome projects.
To classify a sequence into a superfamily, subgroup, or family using Hidden Markov Models or a BLAST
search:
1) From the front page, click on EFI - Informatics (SFLD)
2) Click on the Search by Enzyme tab
3) Paste in the sequence and select HMM or Blast
Multiple Associations platforms
STRING http://string.embl.de/ Database of known and predicted protein-protein relationships, derived from
genomic context (fusions, conserved gene clusters, co-occurrence), high-throughput experiments
(co-expression), and the literature. STRING quantitatively integrates data from bacteria and other organisms.
STITCH http://stitch.embl.de/ STRING incorporating small molecules
eNet http://ecoli.med.utoronto.ca/ E. coli gene function prediction database; integrates microarray and
protein interaction data.
AraNet http://www.functionalnet.org/aranet/search.html Gene associations in Arabidopsis.
DIP http://dip.doe-mbi.ucla.edu/dip/Main.cgi database of interacting proteins from different organisms.
Genome Projector http://www.g-language.org/GenomeProjector/
Protein-protein interactions
BioGRID http://thebiogrid.org/ Protein and genetic interactions from major model organisms
Arabidopsis Interactions Viewer http://bar.utoronto.ca/interactions/cgi-bin/arabidopsis_interactions_viewer.cgi
Queries a database of 70944 predicted and 29777 confirmed
Arabidopsis interacting proteins. The predicted interactions (interologs) are from Interactome 2.0.
Specific for Algae
AFAT http://pathways.mcdb.ucla.edu/algal/index.html Algal Functional Annotation Tool
Phylogenetic distribution tools
JGI Phylogenetic Profiler https://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=PhylogenProfiler&page=phyloProfileForm Phylogenetic Profiler for Single Genes.
Account required (Login name: A_D_Hanson Password: nosnaH_D_A)
MicroScope Phyloprofile Exploration https://www.genoscope.cns.fr/agc/microscope/compgenomics/phyloprofil.php?
MicrobesOnline Phyletic Pattern http://www.microbesonline.org/cgi-bin/matchphyloprofile.cgi
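The tools in this group answer queries of the form "which genes are present in these genomes and absent from those?". As a rough illustration of that idea, assuming a presence/absence table is already available, a Python sketch might look like the following. The function name match_profile, the table layout, and all gene and genome names are hypothetical and do not reflect any of the listed platforms.

```python
def match_profile(presence, required, excluded):
    """Return genes found in every 'required' genome and in no 'excluded' genome.

    presence: dict mapping a gene (or family) name to the collection of
              genomes in which a homolog was detected.
    """
    required, excluded = set(required), set(excluded)
    hits = []
    for gene, genomes in presence.items():
        genomes = set(genomes)
        if required <= genomes and not (excluded & genomes):
            hits.append(gene)
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical presence/absence data, for illustration only.
    table = {
        "famA": {"genome1", "genome2", "genome3"},
        "famB": {"genome1", "genome2"},
        "famC": {"genome2", "genome3"},
    }
    print(match_profile(table, required=["genome1", "genome2"], excluded=["genome3"]))
```

How homologs are detected (and with what cutoffs) is the hard part in practice and is left out here.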
Regulatory sites prediction and analysis in bacteria
RegPrecise http://regprecise.lbl.gov/RegPrecise/ Regulon database (is integrated in MicrobesOnline)
RegPredict http://regpredict.lbl.gov/regpredict/ Platform to discover regulator binding sites
RegTransbase http://regtransbase.lbl.gov/cgi-bin/regtransbase?page=main Knowledge database on
bacterial regulators
Promoter prediction in bacteria
http://nucleix.mbu.iisc.ernet.in/prombase/
In general, the MEME Suite is great for identifying both DNA and protein motifs:
MEME Suite http://meme.nbcr.net/meme/intro.html
GLAM2, for example, is very powerful
Associations based on phenotypes
E. coli Phenotypic Landscape http://ecoliwiki.net/tools/chemgen/ Profiling of all viable E. coli mutants on
hundreds of chemicals
Yeast Fitness database http://fitdb.stanford.edu/ Profiling of all S. cerevisiae mutants on hundreds of
chemicals
MICROARRAY & RNASeq DATABASES AND ANALYSIS RESOURCES
GEO http://www.ncbi.nlm.nih.gov/geo/ Gene Expression Omnibus
Golm Transcriptome database http://csbdb.mpimp-golm.mpg.de/csbdb/dbxp/ath/ath_xpmgq.html
Good tools for getting an overview of gene expression in Arabidopsis, and for finding co-responses.
Golm yeast co-response database http://csbdb.mpimp-golm.mpg.de/csbdb/dbcor/sce.html
ATTED http://atted.jp/ A simple site to use to look for co-expression patterns in Arabidopsis; it shows gene
networks, not just lists of correlated genes.
COXPRESdb http://coxpresdb.jp Co-expression in yeasts and animals.
COXPRESdb provides data for both S. cerevisiae and S. pombe, while Golm covers S. cerevisiae only.
COXPRESdb displays co-expression data for orthologs (when they exist) in invertebrates and vertebrates.
Such similar patterns of co-expression across species can generate very strong predictions.
FungiDB http://fungidb.org/fungidb/
GeneCAT http://genecat.mpg.de/ GeneCAT Gene Co-expression Analysis Toolbox for Arabidopsis, rice,
poplar, and barley
Diurnal http://diurnal.mocklerlab.org/ Circadian/Diurnal gene expression data for an individual or set of
Arabidopsis, rice, or poplar genes
Translatome eFP http://efp.ucr.edu/ Transcriptome profiling of 13 discrete Arabidopsis cell populations
PRIMe http://prime.psc.riken.jp/ Server for metabolomics and transcriptomics, tools for metabolomics,
transcriptomics and integrated analysis of different omics data.
PLEXdb http://www.plexdb.org/ Plant Expression Database
Botany Array Resource http://bbc.botany.utoronto.ca/ Tools for finding co-responses, electronic
Northerns.
CyanoExpress http://cyanoexpress.sysbiolab.eu Co-expression database for Synechocystis sp. PCC
6803. Instructions:
1) On the homepage, click 'gene expression'
2) Click 'all perturbations'
3) In the blank field 'search for' on the upper left part of the page paste your gene id (e.g. sll0635), click go
4) In the new page, click the microarray picture (a single band). A larger microarray image will appear; there
is a column on the right that lists your top 20 co-regulated genes (most of them functionally annotated from
the poorly maintained and obsolete Cyanobase).
MetaOmGraph http://metnetdb.org/MetNet_MetaOmGraph.htm Tool to plot and analyze large datasets
qteller http://qteller.com/ RNAseq data for maize, sorghum, rice. Simple tools for expression in various
organs, correlation of expression of two genes.
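Several of the resources above report co-expression or the correlation of expression between two genes. As a rough illustration of the underlying calculation, here is a minimal Python sketch of a Pearson correlation between two expression profiles. It assumes both genes were measured across the same ordered set of samples; the function name pearson and the example values are hypothetical, and the listed databases may use other measures or normalizations.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2 samples)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical expression values for two genes across five samples.
    gene1 = [2.1, 3.4, 5.0, 4.2, 6.3]
    gene2 = [1.0, 1.9, 3.1, 2.4, 3.8]
    print(round(pearson(gene1, gene2), 3))
```

Values near +1 or -1 indicate strongly correlated or anti-correlated profiles; values near 0 indicate little linear relationship.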
● Bacteria
MicrobesOnline http://www.microbesonline.org/ A comprehensive database that includes correlated gene
expression in E. coli and other bacteria
EcoGene http://ecogene.org/ A rich resource on E. coli that includes Microarray data on the major
changes in gene expression observed in various experiments.
GenExpDB http://chase.ou.edu/oubcf/ E. coli Community Gene Expression DataBase
Porteco http://expression.porteco.org/ E. coli microarray analysis; also includes analysis of
phenotype data
● Yeast
SPELL http://spell.yeastgenome.org/ Co-response search tool for yeast
● Mammals
BioGPS http://biogps.org/#goto=welcome
Comparing bacterial genomes
MicroScope https://www.genoscope.cns.fr/agc/microscope/home/index.php
Seedviewer http://pubseed.theseed.org/seedviewer.cgi
IMG http://img.jgi.doe.gov/
All have genome synteny viewers
ESSENTIAL GENES DATABASES (Pro- and Eukaryote)
OGEE http://ogeedb.embl.de/#summary Online GEne Essentiality database
DEG http://tubic.tju.edu.cn/deg/ or http://www.essentialgene.org/ Database of Essential Genes
PLANT PHENOME DATABASES
RAPID http://rarge.gsc.riken.jp/phenome/ RIKEN Arabidopsis Phenome Information Database,
phenotypic data in transposon-insertional mutants.
SeedGenes http://www.seedgenes.org/ Genes that give a seed phenotype when disrupted by mutation.
Chloroplast2010 http://www.plastid.msu.edu/ Large set of phenotypic data for homozygous mutants of
chloroplast genes.
BAPDB http://bioweb.ucr.edu/bapdb/ Bioassay And Phenotype DataBase
PLANT METABOLOME DATABASE
PlantMetabolomics http://tht.vrac.iastate.edu:81/ Consortium profiling the metabolome of specific T-DNA knockout alleles for targeted genes
PLANT PROTEOME DATABASES
PPDB http://ppdb.tc.cornell.edu/ The Plant Proteome DataBase
SUBA3 http://suba.plantenergy.uwa.edu.au/ SUB-cellular location database for Arabidopsis proteins
(includes GFP and MS-MS data)
pep2pro http://fgcz-pep2pro.uzh.ch/ Organ-specific characterisation of the Arabidopsis proteome
containing 14,522 identified proteins
NBrowse http://www.arabidopsis.org/tools/nbrowse.jsp Arabidopsis protein-protein interaction database
UNKNOWN GENE/ENZYME DATABASES
ORENZA http://www.orenza.u-psud.fr/ ORphan ENZyme Activities database (lists 1,200 orphan
enzymes)
ADOMETA http://vitkuplab.cu-genome.org/html/adometa/adometa.html ADoption of Orphan METabolic
Activities (Orphan enzyme activities in E. coli, B. subtilis, and S. cerevisiae).
Orphan enzymes project http://www.orphanenzymes.org/
GREP http://bisscat.org/GREP/ Generator of Reaction Equations & Pathways. Looks for reported and putative
enzyme reaction equations; especially designed for finding metabolic pathways for orphan metabolites
(compounds known to be present in at least one living organism, but whose synthetic/degradation pathways
are unknown).
POND http://bioweb.ucr.edu/scripts/unknownsDisplay.pl Plant Unknown-eome DB (POND) – Arabidopsis
Unknown-eome (may be defunct)
LITERATURE MINING RESOURCES
PubMed Central http://www.ncbi.nlm.nih.gov/sites/entrez?db=pmc
HighWire Press http://highwire.stanford.edu/
Google Scholar http://scholar.google.com/
eTBlast http://etest.vbi.vt.edu/etblast3/
iHOP http://www.ihop-net.org/UniPub/iHOP/ information Hyperlinked Over Proteins
MAIZE GENOME RESOURCES
Maizesequence.org http://www.maizesequence.org/index.html Browser providing the latest sequence
and annotation of the maize genome from the Maize Genome Sequencing Project
MaizeGDB http://www.maizegdb.org/ Maize genetics and genomics database
Gramene http://www.gramene.org/ Curated, open-source, data resource for comparative genome
analysis of grasses
Updated 12/11/2014