COMPARATIVE GENOMICS WORKSHOP ‘RESSOURCEMENT’

BASIC RESOURCES

● General
NCBI - http://www.ncbi.nlm.nih.gov/ - Entrez nucleotide and protein databases; BLAST similarity search programs.
TAIR - http://www.arabidopsis.org/ - The Arabidopsis Information Resource.
TIGR Gene Indices - http://compbio.dfci.harvard.edu/tgi/ - Annotated Arabidopsis, rice, etc. genomes; analysis of public EST data (contig assembly, analysis of expression patterns).
PlantGDB EST Assemblies - http://www.plantgdb.org/search/misc/
EcoliHub - http://ecolihub.org/ and EcoliWiki - http://ecoliwiki.net/colipedia/index.php/Welcome_to_EcoliWiki
SGD - http://www.yeastgenome.org/ - Saccharomyces Genome Database.
ExPASy Translate Tool - http://www.expasy.ch/tools/dna.html - Translates a DNA sequence in all 6 frames.
ExPASy Compute pI/Mol Wt Tool - http://www.expasy.ch/tools/pi_tool.html - Computes the theoretical isoelectric point (pI) and molecular weight of a protein.
ExPASy AACompIdent Tool - http://www.expasy.ch/tools/aacomp/ - Identification of a protein from its amino acid composition.
Primer3 - http://primer3.ut.ee - Primer design.
NEBcutter - http://tools.neb.com/NEBcutter2/index.php - Restriction digest analysis of DNA sequences.
Seq Massager - http://www.attotron.com/cybertory/analysis/seqMassager.htm - Cleans up sequences for use in bioinformatics platforms.
ABIM listing - http://sites.univ-provence.fr/~wabim/english/logligne.html - List of ABIM online sequence analysis tools.
GOLD - http://www.genomesonline.org/cgi-bin/GOLD/index.cgi - Genome and metagenome projects.
DiArk - http://www.diark.org/diark/ - Compilation and centralized monitoring of eukaryotic genome and EST sequencing projects.
Bionumbers - http://bionumbers.hms.harvard.edu/default.aspx

● Multiple sequence alignment, phylogeny, Venn diagrams
Computational Approaches in Comparative Genomics - http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=sef.TOC&depth=1 - Online textbook by E.V. Koonin & M.Y. Galperin.
Multalin Sequence Alignment - http://multalin.toulouse.inra.fr/multalin/ - Aligns sequences (output in color) and makes phylogenetic trees.
Clustal Omega and phylogenetic trees - http://www.ebi.ac.uk/Tools/msa/clustalo/ - Aligns protein sequences and makes phylogenetic trees.
T-Coffee & M-Coffee - http://tcoffee.vital-it.ch/apps/tcoffee/index.html - Tools for combining and comparing multiple sequence alignments.
WebLogo - http://weblogo.berkeley.edu/ - Creates sequence logos from multiple alignments.
MEGA - http://www.megasoftware.net/ - The MEGA phylogeny program; downloads and manual.
Phylogeny.fr - www.phylogeny.fr/ - Very good phylogenetic tree platform for beginners; its BlastExplorer tool is great for collecting sequences for alignments and trees.
iTOL - Interactive Tree Of Life.
VENNY - http://bioinfogp.cnb.csic.es/tools/venny/index.html - Interactive tool for comparing lists with Venn diagrams.

● Transmembrane and organellar targeting predictions
TMHMM - http://www.cbs.dtu.dk/services/TMHMM/ - Prediction of transmembrane helices.
TargetP - http://www.cbs.dtu.dk/services/TargetP/ - Prediction of protein localization.
Predotar - http://urgi.versailles.inra.fr/predotar/predotar.html - Prediction of protein localization.
iPSORT - http://hc.ims.u-tokyo.ac.jp/iPSORT/ - Prediction of protein localization.
WoLF PSORT - http://www.genscript.com/psort/wolf_psort.html - Prediction of protein localization.
Signal-3L - http://www.csbio.sjtu.edu.cn/bioinf/Signal-3L/# - Signal peptide prediction.
COSMOSS Ambiguous Targeting Predictor - http://www.cosmoss.org/bm/ATP

● Long-range homology searches/structure modeling
PSI-BLAST - Position-Specific Iterated BLAST.
Phyre2 - http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index - Protein Homology/analogY Recognition Engine V 2.0.
PSIPRED/GenTHREADER - http://bioinf.cs.ucl.ac.uk/psipred/ - Protein structure prediction server.
MESSA - http://prodata.swmed.edu/MESSA/MESSA.cgi - MEta-Server for protein Sequence Analysis; provides predictions of local sequence features, spatial structure, domain architecture and function for a protein sequence.
FFAS - http://ffas.sanfordburnham.org/ffas-cgi/cgi/ffas.pl - Fold & Function Assignment System.
COMPASS - http://prodata.swmed.edu/compass/compass.php - COmparison of Multiple Protein sequence Alignments with assessment of Statistical Significance.
I-TASSER - http://zhanglab.ccmb.med.umich.edu/I-TASSER/ - Online platform for protein structure and function predictions (use the Google Chrome browser). Password required: IT_0dfdz. The same group has modeled all E. coli structures: http://zhanglab.ccmb.med.umich.edu/Ecoli/
FINDSITE-comb - http://cssb.biology.gatech.edu/FINDSITE-COMB - Threading/structure-based, proteomic-scale virtual ligand screening tool.

● Protein structures
PDB and PDBe - http://www.pdb.org/ and http://www.ebi.ac.uk/pdbe/ - PDB is the main structure database; PDBe has user-friendly features.
TargetTrack - http://sbkb.org/tt/ - Experimental progress and status of targets selected for structure determination.
MMDB - http://www.ncbi.nlm.nih.gov/sites/entrez?db=structure - Molecular Modeling DataBase with >40,000 structures, linked to the rest of the NCBI databases.

● Conserved domains, motifs, and protein families
COGs - http://www.ncbi.nlm.nih.gov/COG/ - Clusters of Orthologous Groups, delineated by comparing protein sequences encoded in many complete genomes representing 30 major phylogenetic lineages. Each COG consists of proteins from at least 3 lineages and thus corresponds to an ancient conserved domain.
CDD - http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml - NCBI Conserved Domain Database.
NCBI PSSM viewer - http://www.ncbi.nlm.nih.gov/Class/Structure/pssm/pssm_viewer.cgi? - Position-Specific Scoring Matrix data used for the Conserved Domain Database.
Pfam - http://pfam.janelia.org/ - Protein FAMily database.
SUPERFAMILY database - http://supfam.cs.bris.ac.uk/SUPERFAMILY/index.html
ProDom - http://prodom.prabi.fr/prodom/current/html/home.php - PROtein DOMain families database.
Seq2Ref - http://prodata.swmed.edu/seq2ref/ - Performs a BLAST search for a query protein and retrieves the reference proteins (experimentally studied or manually curated proteins) from NCBI, PDB and Swiss-Prot.

ENZYME & METABOLIC PATHWAY RESOURCES
Swiss-Prot Enzyme - http://ca.expasy.org/enzyme/ - Enzyme nomenclature database (linked to the Swiss-Prot protein database, BRENDA, KEGG, etc.).
IntEnz - http://www.ebi.ac.uk/intenz/ - Integrated relational Enzyme database.
BRENDA - http://www.brenda-enzymes.info/ - Comprehensive enzyme database.
KEGG - http://www.genome.ad.jp/kegg/ - The Kyoto Encyclopedia of Genes and Genomes. Includes metabolic pathways and compound structures that can be captured.
IUBMB - http://www.chem.qmul.ac.uk/iubmb/ and the subsection on reaction schemes http://www.chem.qmul.ac.uk/iubmb/enzyme/reaction/ - The website of the International Union of Biochemistry and Molecular Biology: searchable database on enzymes and enzyme nomenclature; some high-quality information on pathways, etc.
Thermodynamics of Enzyme-Catalyzed Reactions - http://xpdb.nist.gov/enzyme_thermodynamics/
EcoSal - http://www.ecosal.org/ - A new, continually updated web resource based on the ASM Press publication Escherichia coli and Salmonella: Cellular and Molecular Biology. EcoSal is a comprehensive archive of knowledge on the enteric bacterial cell and a good source of the latest knowledge on metabolic pathways.
BioCyc, EcoCyc & MetaCyc - http://BioCyc.org/ - EcoCyc: Encyclopedia of E. coli Genes and Metabolism; MetaCyc: Metabolic Encyclopedia. Also computationally derived pathway/genome databases.
AraCyc - http://www.arabidopsis.org/biocyc/index.jsp - Similar to BioCyc, for Arabidopsis. The software allows querying, graphical representation of pathways, and overlay of expression data on the biochemical pathway overview diagram.
MetaCrop - http://metacrop.ipk-gatersleben.de/apex/f?p=269:111: - Summarizes diverse information about around 40 metabolic pathways in crop plants.
HMDB - http://www.hmdb.ca/ - Human Metabolome Database.

COMPARATIVE GENOMICS RESOURCES

● General integration platforms (genome browsers, genome comparisons (pathways or synteny), phylogenetic distribution queries, physical clustering, etc.)
SEED - http://www.theseed.org/wiki/Main_Page - Database containing hundreds of genomes and many valuable tools.
OpenSEED - http://open.theseed.org/
PlantSEED - http://plantseed.theseed.org
Accessing the SEED Clearinghouse: from the old SEED front page at http://pubseed.theseed.org/index.cgi, follow the link "Peer-to-peer Updates". Alternatively, from ANY peg page in PubSEED, follow the blue link [to old protein page], and from there use the same "Peer-to-peer Updates" link.
PATRIC - http://www.patricbrc.org/ - Emphasis on pathogenic bacteria (but contains all sequenced bacterial genomes); multiple tools.
MicroScope - https://www.genoscope.cns.fr/agc/microscope/home/index.php - Microbial genome annotation platform (strong on metabolism).
IMG - http://img.jgi.doe.gov/ - Integrated Microbial Genomes data analysis system (most up to date in terms of genomes).
MBGD - http://mbgd.genome.ad.jp/ - Microbial Genome DataBase for comparative genomics.
NMPDR - http://www.nmpdr.org/cur/FIG/wiki/view.cgi/Main/WebHome - National Microbial Pathogen Data Resource.
EFI - http://enzymefunction.org/ - The Enzyme Function Initiative (EFI) is developing a robust sequence/structure-based strategy for discovering the in vitro enzymatic and in vivo metabolic/physiological functions of unknown enzymes found in genome projects.
To classify a sequence into a superfamily, subgroup, or family using Hidden Markov Models or a BLAST search: (1) from the front page, click on EFI - Informatics (SFLD); (2) click on the Search by Enzyme tab; (3) paste in the sequence and select HMM or BLAST.

● Multiple association platforms
STRING - http://string.embl.de/ - Database of known and predicted protein-protein relationships, derived from genomic context (fusions, conserved gene clusters, co-occurrence), high-throughput experiments (coexpression), and the literature. STRING quantitatively integrates data from bacteria and other organisms.
STITCH - http://stitch.embl.de/ - STRING incorporating small molecules.
eNet - http://ecoli.med.utoronto.ca/ - E. coli gene function prediction database integrating microarray and protein interaction data.
AraNet - http://www.functionalnet.org/aranet/search.html - Gene associations in Arabidopsis.
DIP - http://dip.doe-mbi.ucla.edu/dip/Main.cgi - Database of Interacting Proteins from different organisms.
Genome Projector - http://www.g-language.org/GenomeProjector/ - Protein-protein interactions.
BioGRID - http://thebiogrid.org/ - Protein and genetic interactions from major model organisms.
Arabidopsis Interactions Viewer - http://bar.utoronto.ca/interactions/cgi-bin/arabidopsis_interactions_viewer.cgi - Queries a database of 70,944 predicted and 29,777 confirmed Arabidopsis interacting proteins. The predicted interactions (interologs) are from Interactome 2.0.

● Specific for algae
AFAT - http://pathways.mcdb.ucla.edu/algal/index.html - Algal Functional Annotation Tool.

● Phylogenetic distribution tools
JGI Phylogenetic Profiler - https://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=PhylogenProfiler&page=phyloProfileForm - Phylogenetic Profiler for Single Genes. Account required (login name: A_D_Hanson, password: nosnaH_D_A).
MicroScope Phyloprofile Exploration - https://www.genoscope.cns.fr/agc/microscope/compgenomics/phyloprofil.php?
MicrobesOnline Phyletic Pattern - http://www.microbesonline.org/cgi-bin/matchphyloprofile.cgi

● Regulatory site prediction and analysis in bacteria
RegPrecise - http://regprecise.lbl.gov/RegPrecise/ - Regulon database (integrated in MicrobesOnline).
RegPredict - http://regpredict.lbl.gov/regpredict/ - Platform to discover regulator binding sites.
RegTransBase - http://regtransbase.lbl.gov/cgi-bin/regtransbase?page=main - Knowledge database on bacterial regulators.
Promoter prediction in bacteria - http://nucleix.mbu.iisc.ernet.in/prombase/
In general, the MEME Suite is great for identifying both DNA and protein motifs:
MEME Suite - http://meme.nbcr.net/meme/intro.html - GLAM2, for example, is very powerful.

● Associations based on phenotypes
E. coli Phenotypic Landscape - http://ecoliwiki.net/tools/chemgen/ - Profiling of all viable E. coli mutants on hundreds of chemicals.
Yeast Fitness Database - http://fitdb.stanford.edu/ - Profiling of all S. cerevisiae mutants on hundreds of chemicals.

MICROARRAY & RNA-SEQ DATABASES AND ANALYSIS RESOURCES
GEO - http://www.ncbi.nlm.nih.gov/geo/ - Gene Expression Omnibus.
Golm Transcriptome Database - http://csbdb.mpimp-golm.mpg.de/csbdb/dbxp/ath/ath_xpmgq.html - Good tools for getting an overview of gene expression in Arabidopsis and for finding co-responses.
Golm yeast co-response database - http://csbdb.mpimp-golm.mpg.de/csbdb/dbcor/sce.html
ATTED - http://atted.jp/ - A simple site for looking at co-expression patterns in Arabidopsis; it shows gene networks, not just lists of correlated genes.
COXPRESdb - http://coxpresdb.jp - Co-expression in yeasts and animals. COXPRESdb provides data for both S. cerevisiae and S. pombe, while Golm covers S. cerevisiae only. COXPRESdb displays co-expression data for orthologs, when they exist, in invertebrates and vertebrates. Similar co-expression patterns across species can generate very strong predictions.
FungiDB - http://fungidb.org/fungidb/
GeneCAT - http://genecat.mpg.de/ - Gene Co-expression Analysis Toolbox for Arabidopsis, rice, poplar, and barley.
Diurnal - http://diurnal.mocklerlab.org/ - Circadian/diurnal gene expression data for an individual Arabidopsis, rice, or poplar gene or a set of genes.
Translatome eFP - http://efp.ucr.edu/ - Transcriptome profiling of 13 discrete Arabidopsis cell populations.
PRIMe - http://prime.psc.riken.jp/ - Server for metabolomics and transcriptomics, with tools for metabolomics, transcriptomics and integrated analysis of different omics data.
PLEXdb - http://www.plexdb.org/ - Plant Expression Database.
Botany Array Resource - http://bbc.botany.utoronto.ca/ - Tools for finding co-responses; electronic Northerns.
CyanoExpress - http://cyanoexpress.sysbiolab.eu - Co-expression database for Synechocystis sp. PCC 6803. Instructions:
1) On the home page, click 'gene expression'.
2) Click 'all perturbations'.
3) Paste your gene ID (e.g. sll0635) into the blank 'search for' field at the upper left of the page and click go.
4) On the new page, click the microarray picture (a single band). A larger microarray image will appear, with a column on the right listing your top 20 co-regulated genes (most of them functionally annotated from the poorly maintained and obsolete Cyanobase).
MetaOmGraph - http://metnetdb.org/MetNet_MetaOmGraph.htm - Tool to plot and analyze large datasets.
qTeller - http://qteller.com/ - RNA-seq data for maize, sorghum and rice. Simple tools for viewing expression in various organs and for correlating the expression of two genes.

● Bacteria
MicrobesOnline - http://www.microbesonline.org/ - A comprehensive database that includes correlated gene expression in E. coli and other bacteria.
EcoGene - http://ecogene.org/ - A rich resource on E. coli that includes microarray data on the major changes in gene expression observed in various experiments.
GenExpDB - http://chase.ou.edu/oubcf/ - E. coli Community Gene Expression DataBase.
Porteco - http://expression.porteco.org/ - E. coli microarray analysis; also includes analysis of the phenotype data.

● Yeast
SPELL - http://spell.yeastgenome.org/ - Co-response search tool for yeast.

● Mammals
BioGPS - http://biogps.org/#goto=welcome

● Comparing bacterial genomes
MicroScope, SEED Viewer and IMG all have genome synteny viewers:
MicroScope - https://www.genoscope.cns.fr/agc/microscope/home/index.php
SEED Viewer - http://pubseed.theseed.org/seedviewer.cgi
IMG - http://img.jgi.doe.gov/

ESSENTIAL GENES DATABASES (PROKARYOTES AND EUKARYOTES)
OGEE - http://ogeedb.embl.de/#summary - Online GEne Essentiality database.
DEG - http://tubic.tju.edu.cn/deg/ or http://www.essentialgene.org/ - Database of Essential Genes.

PLANT PHENOME DATABASES
RAPID - http://rarge.gsc.riken.jp/phenome/ - RIKEN Arabidopsis Phenome Information Database; phenotypic data on transposon insertion mutants.
SeedGenes - http://www.seedgenes.org/ - Genes that give a seed phenotype when disrupted by mutation.
Chloroplast2010 - http://www.plastid.msu.edu/ - Large set of phenotypic data for homozygous mutants of chloroplast genes.
BAPDB - http://bioweb.ucr.edu/bapdb/ - Bioassay And Phenotype DataBase.

PLANT METABOLOME DATABASE
PlantMetabolomics - http://tht.vrac.iastate.edu:81/ - Consortium profiling the metabolome of specific T-DNA knockout alleles of targeted genes.

PLANT PROTEOME DATABASES
PPDB - http://ppdb.tc.cornell.edu/ - The Plant Proteome DataBase.
SUBA3 - http://suba.plantenergy.uwa.edu.au/ - SUBcellular location database for Arabidopsis proteins (includes GFP and MS/MS data).
pep2pro - http://fgcz-pep2pro.uzh.ch/ - Organ-specific characterisation of the Arabidopsis proteome, containing 14,522 identified proteins.
NBrowse - http://www.arabidopsis.org/tools/nbrowse.jsp - Arabidopsis protein-protein interaction database.

UNKNOWN GENE/ENZYME DATABASES
ORENZA - http://www.orenza.u-psud.fr/ - ORphan ENZyme Activities database (lists 1,200 orphan enzymes).
ADOMETA - http://vitkuplab.cu-genome.org/html/adometa/adometa.html - ADoption of Orphan METabolic Activities (orphan enzyme activities in E. coli, B. subtilis, and S. cerevisiae).
Orphan enzymes project - http://www.orphanenzymes.org/
GREP - http://bisscat.org/GREP/ - Generator of Reaction Equations & Pathways: looks for reported and putative enzyme reaction equations; especially designed for finding metabolic pathways for orphan metabolites (compounds known to be present in at least one living organism, but whose synthesis/degradation pathways are unknown).
POND - http://bioweb.ucr.edu/scripts/unknownsDisplay.pl - Plant Unknown-eome DB (POND): the Arabidopsis unknown-eome (may be defunct).

LITERATURE MINING RESOURCES
PubMed Central - http://www.ncbi.nlm.nih.gov/sites/entrez?db=pmc
HighWire Press - http://highwire.stanford.edu/
Google Scholar - http://scholar.google.com/
eTBLAST - http://etest.vbi.vt.edu/etblast3/
iHOP - http://www.ihop-net.org/UniPub/iHOP/ - information Hyperlinked Over Proteins.

MAIZE GENOME RESOURCES
Maizesequence.org - http://www.maizesequence.org/index.html - Browser providing the latest sequence and annotation of the maize genome from the Maize Genome Sequencing Project.
MaizeGDB - http://www.maizegdb.org/ - Maize genetics and genomics database.
Gramene - http://www.gramene.org/ - Curated, open-source data resource for comparative genome analysis of grasses.

Updated 12/11/2014
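Several of the expression resources in this list (for example ATTED, COXPRESdb, GeneCAT, SPELL, and CyanoExpress with its "top 20 co-regulated genes" column) rank genes by how closely their expression profiles track a query gene. The sketch below illustrates that idea only; it assumes Pearson correlation across conditions as the similarity measure, and the gene names and expression values are hypothetical. The individual sites may use their own normalizations and scoring metrics, so this is not their method.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0:
        return 0.0  # a perfectly flat profile has no defined correlation; treat as uncorrelated
    return num / den

def top_coexpressed(query, profiles, n=20):
    """Rank every other gene by its correlation with the query gene (highest first)."""
    q = profiles[query]
    scores = [(gene, pearson(q, prof)) for gene, prof in profiles.items() if gene != query]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:n]

# Hypothetical expression values for four genes across five conditions
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],  # rises together with geneA
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],  # mirror image of geneA
    "geneD": [3.0, 3.1, 2.9, 3.0, 3.2],  # nearly flat
}

for gene, r in top_coexpressed("geneA", profiles, n=3):
    print(f"{gene}\t{r:+.3f}")
```

Positively correlated genes (here geneB) come first and anti-correlated genes (geneC) last; real co-expression tools compute such scores over hundreds of experiments, which is what makes the resulting gene lists informative.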