COMPARATIVE GENOMICS WORKSHOP ‘RESSOURCEMENT’

BASIC RESOURCES
● General
NCBI http://www.ncbi.nlm.nih.gov/ Entrez nucleotide and protein databases; BLAST similarity search programs.
TAIR http://www.arabidopsis.org/ The Arabidopsis Information Resource.
TIGR http://plantta.jcvi.org/index.shtml Annotated Arabidopsis, rice and other genomes. TIGR Gene Indices (analysis of public EST data: contig assembly and analysis of expression patterns).
EcoliHub http://ecolihub.org/ & EcoliWiki http://ecoliwiki.net/colipedia/index.php/Welcome_to_EcoliWiki
SGD http://www.yeastgenome.org/ Saccharomyces Genome Database.
ExPASy Translate Tool http://www.expasy.ch/tools/dna.html Translates a DNA sequence in all 6 frames.
ExPASy Compute pI/Mw Tool http://www.expasy.ch/tools/pi_tool.html
ExPASy AACompIdent Tool http://www.expasy.ch/tools/aacomp/ Identification of a protein from its amino acid composition.
Primer3 http://frodo.wi.mit.edu/primer3/ Primer design site.
NEBcutter http://tools.neb.com/NEBcutter2/index.php Restriction digest analysis site.
Seq Massager http://www.attotron.com/cybertory/analysis/seqMassager.htm Cleans up sequences for bioinformatics platforms.
ABIM listing http://sites.univ-provence.fr/~wabim/english/logligne.html ABIM online sequence analysis tools.
GOLD http://www.genomesonline.org/cgi-bin/GOLD/index.cgi Genomes OnLine Database: genome and metagenome projects.
DiArk http://www.diark.org/diark/ Resource for centralized monitoring and compilation of eukaryotic genome and EST sequencing projects.

● Multiple sequence alignment, phylogeny, Venn diagrams
Computational Approaches in Comparative Genomics http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=sef.TOC&depth=1 Online textbook by E.V. Koonin & M.Y. Galperin.
Multalin Sequence Alignment http://multalin.toulouse.inra.fr/multalin/ Aligns sequences (output in color) and makes phylogenetic trees.
Clustal Omega and Phylogenetic trees http://www.ebi.ac.uk/Tools/msa/clustalo/ Aligns protein sequences and makes phylogenetic trees.
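As an illustration of what the ExPASy Translate Tool computes, here is a minimal Python sketch of six-frame translation with the standard genetic code (NCBI table 1). It assumes plain nucleotide input, optionally preceded by a FASTA header. The cleanup step only discards non-A/C/G/T characters, which is far cruder than Seq Massager; ambiguity codes such as N are dropped, which would shift the reading frame in real data. All function names are hypothetical, and this is not the ExPASy implementation.

```python
import re

BASES = "TCAG"
# Standard genetic code (NCBI table 1): first, second and third codon positions
# each cycle through T, C, A, G in this order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean_dna(raw: str) -> str:
    """Drop FASTA header lines, read U as T, and keep only A/C/G/T."""
    body = "".join(line for line in raw.splitlines() if not line.startswith(">"))
    return re.sub(r"[^ACGT]", "", body.upper().replace("U", "T"))


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def six_frames(raw: str) -> dict[str, str]:
    """Return translations keyed '+1'..'+3' (forward) and '-1'..'-3' (reverse complement)."""
    seq = clean_dna(raw)
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

For example, `six_frames("ATGGCCTAA")["+1"]` returns `"MA*"` (ATG, GCC, then the TAA stop codon).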
T-Coffee & M-Coffee http://tcoffee.vital-it.ch/cgi-bin/Tcoffee/tcoffee_cgi/index.cgi Tools for combining and comparing multiple sequence alignments.
WebLogo http://weblogo.berkeley.edu/ Creates sequence logos from multiple alignments.
MEGA http://www.megasoftware.net/ The MEGA phylogeny program: downloads and manual.
Phylogeny.fr www.phylogeny.fr/ Very good phylogenetic tree platform for beginners; includes a great tool, BlastExplorer, for collecting sequences for alignments and trees.
iTOL Interactive Tree Of Life
VENNY http://bioinfogp.cnb.csic.es/tools/venny/index.html Interactive tool for comparing lists with Venn diagrams.

● Transmembrane and organellar targeting predictions
TMHMM http://www.cbs.dtu.dk/services/TMHMM/ Prediction of transmembrane helices.
TargetP http://www.cbs.dtu.dk/services/ Prediction of protein localization.
Predotar http://urgi.versailles.inra.fr/predotar/predotar.html Prediction of protein localization.
iPSORT http://hc.ims.u-tokyo.ac.jp/iPSORT/ Prediction of protein localization.
WoLF PSORT http://wolfpsort.org/ Prediction of protein localization.
Signal-3L http://www.csbio.sjtu.edu.cn/bioinf/Signal-3L/# Signal peptide prediction.
COSMOSS Ambiguous Targeting Predictor http://www.cosmoss.org/bm/ATP

● Long-range homology searches
PSI-BLAST (Position-Specific Iterated BLAST)
Phyre2 http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index Protein Homology/analogY Recognition Engine V 2.0.
PSIPRED GenTHREADER http://bioinf.cs.ucl.ac.uk/psipred/ Protein structure prediction server.
MESSA http://prodata.swmed.edu/MESSA/MESSA.cgi MEta-Server for protein Sequence Analysis; provides predictions of local sequence features, spatial structure, domain architecture and function for a protein sequence.
FFAS03 http://ffas.ljcrf.edu/ffas-cgi/cgi/ffas.pl Fold & Function Assignment System.
COMPASS http://prodata.swmed.edu/compass/compass.php COmparison of Multiple Protein sequence Alignments with assessment of Statistical Significance.
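The list comparison that VENNY performs interactively reduces to set operations. Below is a minimal Python sketch for three lists, assuming the identifiers are already in a consistent format (they are compared case-sensitively after trimming whitespace). The function names and the identifiers g1 to g5 are hypothetical placeholders, not VENNY's code or real data.

```python
def normalize(items: list[str]) -> set[str]:
    """Strip whitespace and drop blank entries; identifiers are compared case-sensitively."""
    return {item.strip() for item in items if item.strip()}


def venn_regions(a: list[str], b: list[str], c: list[str]) -> dict[str, set[str]]:
    """Split three identifier lists into the seven disjoint regions of a Venn diagram."""
    A, B, C = normalize(a), normalize(b), normalize(c)
    return {
        "A only": A - B - C,
        "B only": B - A - C,
        "C only": C - A - B,
        "A and B only": (A & B) - C,
        "A and C only": (A & C) - B,
        "B and C only": (B & C) - A,
        "A, B and C": A & B & C,
    }


# Hypothetical identifiers, e.g. hits from three different screens.
regions = venn_regions(["g1", "g2", "g3"], ["g2", "g3", "g4"], ["g3", "g4", "g5 "])
for name, members in regions.items():
    print(f"{name}: {len(members)} {sorted(members)}")
```

Because the seven regions are disjoint, their counts always add up to the number of distinct identifiers across all three lists.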
● Protein structures
PDB and PDBe http://www.pdb.org/ and http://www.ebi.ac.uk/pdbe/ PDB is the main structure database; PDBe has user-friendly features.
TargetTrack http://sbkb.org/tt/ Gives the experimental progress and status of targets selected for structure determination.
MMDB http://www.ncbi.nlm.nih.gov/sites/entrez?db=structure Molecular Modeling DataBase with >40,000 structures, linked to the rest of the NCBI databases.

● Conserved domains, motifs, and protein families
COGs http://www.ncbi.nlm.nih.gov/COG/ Clusters of Orthologous Groups (COGs), delineated by comparing protein sequences encoded in many complete genomes representing 30 major phylogenetic lineages. Each COG consists of proteins from at least 3 lineages and thus corresponds to an ancient conserved domain.
CDD http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml NCBI Conserved Domain Database.
NCBI PSSM viewer http://www.ncbi.nlm.nih.gov/Class/Structure/pssm/pssm_viewer.cgi? Position-Specific Scoring Matrix data used for the Conserved Domain Database.
Pfam http://pfam.janelia.org/ Protein FAMily database.
SUPERFAMILY database http://supfam.cs.bris.ac.uk/SUPERFAMILY/index.html
ProDom http://prodom.prabi.fr/prodom/current/html/home.php PROtein DOMain families database.
Seq2Ref http://prodata.swmed.edu/seq2ref/ Performs a BLAST search for a query protein and retrieves the reference proteins (experimentally studied or manually curated proteins) from NCBI, PDB and Swiss-Prot.

ENZYME & METABOLIC PATHWAY RESOURCES
Swiss-Prot Enzyme http://ca.expasy.org/enzyme/ Enzyme nomenclature database (linked to the Swiss-Prot protein database, BRENDA, KEGG, etc.).
IntEnz http://www.ebi.ac.uk/intenz/ Integrated relational Enzyme database.
BRENDA http://www.brenda-enzymes.info/ Comprehensive enzyme database.
KEGG http://www.genome.ad.jp/kegg/ The Kyoto Encyclopedia of Genes and Genomes. Includes metabolic pathways and compound structures that can be captured.
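The COG membership rule quoted above (proteins from at least 3 lineages) can be stated as a short check, together with the presence/absence ("phyletic") pattern of a group across lineages. This is a minimal Python sketch that only tests the criterion on an already-assembled group; it does not reproduce how COGs are actually delineated. The function names, protein IDs and lineage labels are hypothetical.

```python
def lineages_covered(members: dict[str, str]) -> set[str]:
    """members maps protein ID -> lineage label; return the distinct lineages."""
    return set(members.values())


def meets_cog_lineage_rule(members: dict[str, str], min_lineages: int = 3) -> bool:
    """True if the group spans at least `min_lineages` distinct lineages."""
    return len(lineages_covered(members)) >= min_lineages


def phyletic_pattern(members: dict[str, str], lineage_order: list[str]) -> str:
    """Presence (1) / absence (0) string over a fixed list of lineages."""
    present = lineages_covered(members)
    return "".join("1" if lineage in present else "0" for lineage in lineage_order)


# Hypothetical group: protein IDs and lineage assignments are made up.
group = {"p1": "Proteobacteria", "p2": "Proteobacteria", "p3": "Firmicutes", "p4": "Euryarchaeota"}
order = ["Proteobacteria", "Firmicutes", "Actinobacteria", "Euryarchaeota"]
print(meets_cog_lineage_rule(group), phyletic_pattern(group, order))
```

Two paralogs from the same lineage (p1 and p2 here) count once, which is why the rule is phrased in terms of lineages rather than proteins.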
IUBMB http://www.chem.qmul.ac.uk/iubmb/ and the subsection on reaction schemes http://www.chem.qmul.ac.uk/iubmb/enzyme/reaction/ The website of the International Union of Biochemistry and Molecular Biology: searchable database on enzymes and enzyme nomenclature; some high-quality information on pathways, etc.
Thermodynamics of Enzyme-Catalyzed Reactions http://xpdb.nist.gov/enzyme_thermodynamics/
EcoSal http://www.ecosal.org/ A new, continually updated web resource based on the ASM Press publication Escherichia coli and Salmonella: Cellular and Molecular Biology. EcoSal is a comprehensive archive of knowledge on the enteric bacterial cell and a good source of the latest knowledge on metabolic pathways.
BioCyc, EcoCyc & MetaCyc http://BioCyc.org/ EcoCyc: Encyclopedia of E. coli Genes and Metabolism; MetaCyc: Metabolic Encyclopedia. Also computationally derived pathway/genome databases.
AraCyc http://www.arabidopsis.org/biocyc/index.jsp Similar to BioCyc, for Arabidopsis. The software allows querying, graphical representation of pathways, and overlay of expression data on the biochemical pathway overview diagram.
MetaCrop http://pgrc-35.ipk-gatersleben.de/pls/htmldb_pgrc/f?p=269:111: Summarizes diverse information about around 40 metabolic pathways in crop plants.
HMDB http://www.hmdb.ca/ Human Metabolome Database.

COMPARATIVE GENOMICS (‘PHYLOGENOMICS’) RESOURCES
General integration platforms (genome browsers, pathway or synteny comparisons between genomes, phylogenetic distribution queries, physical clustering, etc.)
SEED http://www.theseed.org/wiki/Main_Page Database containing hundreds of genomes and many valuable tools.
PATRIC http://www.patricbrc.org/ Emphasis on pathogenic bacteria (but contains all sequenced bacterial genomes); multiple tools.
MicroScope https://www.genoscope.cns.fr/agc/microscope/home/index.php Microbial genome annotation platform (strong on metabolism).
IMG http://img.jgi.doe.gov/ Integrated Microbial Genomes data analysis system (most up to date in terms of genomes).
MBGD http://mbgd.genome.ad.jp/ Microbial Genome DataBase for comparative genomics.
NMPDR http://www.nmpdr.org/cur/FIG/wiki/view.cgi/Main/WebHome National Microbial Pathogen Data Resource.
EFI http://enzymefunction.org/ The Enzyme Function Initiative (EFI) is developing a robust sequence/structure-based strategy for facilitating discovery of the in vitro enzymatic and in vivo metabolic/physiological functions of unknown enzymes discovered in genome projects. To classify a sequence into a superfamily, subgroup, or family using Hidden Markov Models or a BLAST search:
* From the front page, click on EFI - Informatics (SFLD).
* Click on the Search by Enzyme tab.
* Paste in the sequence and select HMM or BLAST.

Multiple association platforms
STRING http://string.embl.de/ Database of known and predicted protein-protein relationships, derived from genomic context (fusions, conserved gene clusters, co-occurrence), high-throughput experiments (coexpression), and the literature. STRING quantitatively integrates data from bacteria and other organisms.
STITCH http://stitch.embl.de/ STRING extended to incorporate small molecules.
eNet http://ecoli.med.utoronto.ca/ E. coli gene function prediction database integrating microarray and protein interaction data.
AraNet http://www.functionalnet.org/aranet/search.html Gene associations in Arabidopsis.
DIP http://dip.doe-mbi.ucla.edu/dip/Main.cgi Database of Interacting Proteins from different organisms.
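To give a feel for how STRING can "quantitatively integrate" several evidence channels, here is a minimal Python sketch of the simple independence-based combination S = 1 - ∏(1 - S_i), where each S_i is a channel score between 0 and 1. This is a simplification under stated assumptions: later STRING versions also correct for a prior probability of a random association, which is omitted here, so the output will not match STRING's published combined scores. The function name, channel names and values are illustrative, not real STRING data.

```python
from math import prod


def combined_score(channel_scores: dict[str, float]) -> float:
    """Naive combination S = 1 - prod(1 - S_i), assuming independent evidence channels.

    Simplified: no prior correction is applied.
    """
    for channel, score in channel_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{channel} score must be in [0, 1], got {score}")
    return 1.0 - prod(1.0 - score for score in channel_scores.values())


# Illustrative channel scores for one protein pair (not real STRING data).
scores = {"neighborhood": 0.5, "coexpression": 0.5}
print(round(combined_score(scores), 3))  # 0.75
```

The design point is that two moderate, independent lines of evidence add up to a stronger combined score than either alone, while a single channel scoring 0 leaves the others unchanged.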
Genome Projector http://www.g-language.org/GenomeProjector/
The following two are a bit outdated but can still be useful:
PHYDBAC http://igs-server.cnrs-mrs.fr/phydbac/ PHYDBAC displays phylogenomic profiles (fusions, co-occurrence, co-localization in the genome) of bacterial protein sequences. Analyzing the annotation of a protein’s phylogenomic neighbors helps generate hypothetical functions for the query protein(s).
FusionDB http://igs-server.cnrs-mrs.fr/FusionDB/main.html FusionDB is a database of bacterial and archaeal gene fusion events.

Specific for algae
AFAT http://pathways.mcdb.ucla.edu/algal/index.html Algal Functional Annotation Tool.

Phylogenetic distribution tools
JGI Phylogenetic Profiler http://img.jgi.doe.gov/cgibin/w/main.cgi?section=PhylogenProfiler&page=phyloProfileForm Phylogenetic Profiler for Single Genes.
MicroScope Phyloprofile Exploration https://www.genoscope.cns.fr/agc/microscope/compgenomics/phyloprofil.php?
MicrobesOnline Phyletic Pattern http://www.microbesonline.org/cgi-bin/matchphyloprofile.cgi

Regulatory sites prediction and analysis in bacteria
RegPrecise http://regprecise.lbl.gov/RegPrecise/ Regulon database (integrated in MicrobesOnline).
RegPredict http://regpredict.lbl.gov/regpredict/ Platform to discover regulator binding sites.
RegTransBase http://regtransbase.lbl.gov/cgi-bin/regtransbase?page=main Knowledge database on bacterial regulators.
Promoter prediction in bacteria http://nucleix.mbu.iisc.ernet.in/prombase/
In general, the MEME Suite is excellent for identifying both DNA and protein motifs:
MEME Suite http://meme.nbcr.net/meme/intro.html GLAM2, for example, is very powerful.

Associations based on phenotypes
E. coli Phenotypic Landscape http://ecoliwiki.net/tools/chemgen/ Profiling of all viable E. coli mutants on hundreds of chemicals.
Yeast Fitness database http://fitdb.stanford.edu/ Profiling of all S. cerevisiae mutants on hundreds of chemicals.

MICROARRAY – RNASeq DATABASES AND ANALYSIS RESOURCES
● General
GEO http://www.ncbi.nlm.nih.gov/geo/ Gene Expression Omnibus.

● Plants
Golm Transcriptome database http://csbdb.mpimp-golm.mpg.de/csbdb/dbxp/ath/ath_xpmgq.html Good tools for getting an overview of gene expression in Arabidopsis and for finding co-responses.
ATTED http://atted.jp/ A simple site for looking up co-expression patterns in Arabidopsis; it shows gene networks, not just lists of correlated genes.
COXPRESdb http://coxpresdb.jp Co-expression in yeasts and animals.
GeneCAT http://genecat.mpg.de/ Gene Co-expression Analysis Toolbox for Arabidopsis, rice, poplar and barley.
Diurnal http://diurnal.mocklerlab.org/ Circadian/diurnal gene expression data for an individual gene or a set of Arabidopsis, rice or poplar genes.
Translatome eFP http://efp.ucr.edu/ Transcriptome profiling of 13 discrete Arabidopsis cell populations.
PRIMe http://prime.psc.riken.jp/ Server for metabolomics and transcriptomics: tools for metabolomics, transcriptomics and integrated analysis of different omics data.
PLEXdb http://www.plexdb.org/ Plant Expression Database.
Botany Array Resource http://bbc.botany.utoronto.ca/ Tools for finding co-responses; electronic Northerns.
MetaOmGraph http://metnetdb.org/MetNet_MetaOmGraph.htm Tool to plot and analyze large datasets.
qTeller http://qteller.com/ RNA-seq data for maize, sorghum and rice. Simple tools for viewing expression in various organs and for correlating the expression of two genes.

● Bacteria
MicrobesOnline http://www.microbesonline.org/ A comprehensive database that includes correlated gene expression in E. coli and other bacteria.
EcoGene http://ecogene.org/ A rich resource on E. coli that includes microarray data on the major changes in gene expression observed in various experiments.
GenExpDB http://chase.ou.edu/oubcf/ E. coli Community Gene Expression DataBase.
Porteco http://expression.porteco.org/ E. coli microarray analysis; also includes analysis of the phenotype data.

● Yeast
SPELL http://imperio.princeton.edu:3000/yeast Co-response search tool for yeast.

● Mammals
BioGPS http://biogps.gnf.org/#goto=welcome

Comparing bacterial genomes
MicroScope https://www.genoscope.cns.fr/agc/microscope/home/index.php
SEED Viewer http://pubseed.theseed.org/seedviewer.cgi
IMG http://img.jgi.doe.gov/
All three have genome synteny viewers.

ESSENTIAL GENES DATABASES (Prokaryotes and Eukaryotes)
OGEE http://ogeedb.embl.de/#summary Online GEne Essentiality database.
DEG http://tubic.tju.edu.cn/deg/ Database of Essential Genes.

PLANT PHENOME DATABASES
RAPID http://rarge.gsc.riken.jp/phenome/ RIKEN Arabidopsis Phenome Information Database: phenotypic data on transposon-insertion mutants.
SeedGenes http://www.seedgenes.org/ Genes that give a seed phenotype when disrupted by mutation.
Chloroplast2010 http://www.plastid.msu.edu/ Large set of phenotypic data for homozygous mutants of chloroplast genes.
BAPDB http://bioweb.ucr.edu/bapdb/ Bioassay And Phenotype DataBase.

PLANT METABOLOME DATABASE
PlantMetabolomics http://tht.vrac.iastate.edu:81/ Consortium profiling the metabolome of specific T-DNA knockout alleles of targeted genes.

PLANT PROTEOME DATABASES
PPDB http://ppdb.tc.cornell.edu/ The Plant Proteome DataBase.
SUBA3 http://suba.plantenergy.uwa.edu.au/ SUB-cellular location database for Arabidopsis proteins (includes GFP and MS/MS data).
pep2pro http://fgcz-pep2pro.uzh.ch/ Organ-specific characterisation of the Arabidopsis proteome, containing 14,522 identified proteins.
AT_CHLORO http://www.grenoble.prabi.fr/at_chloro/ Stores information on proteins that have been identified in the stroma, thylakoid and envelope fractions of Arabidopsis chloroplasts.
NBrowse http://www.arabidopsis.org/tools/nbrowse.jsp Arabidopsis protein-protein interaction database.

UNKNOWN GENE/ENZYME DATABASES
POND http://bioweb.ucr.edu/scripts/unknownsDisplay.pl Plant Unknown-eome DB (POND): the Arabidopsis Unknown-eome.
ORENZA http://www.orenza.u-psud.fr/ ORphan ENZyme Activities database (lists 1,200 orphan enzymes).
ADOMETA http://vitkuplab.cu-genome.org/html/adometa/adometa.html ADoption of Orphan METabolic Activities (orphan enzyme activities in E. coli, B. subtilis and S. cerevisiae).
GREP http://bisscat.org/GREP/ Generator of Reaction Equations & Pathways: looks for reported and putative enzyme reaction equations; especially designed for finding metabolic pathways for orphan metabolites (compounds known to be present in at least one living organism, but whose synthetic/degradation pathways are unknown).

LITERATURE MINING RESOURCES
PubMed Central http://www.ncbi.nlm.nih.gov/sites/entrez?db=pmc
HighWire Press http://highwire.stanford.edu/
Google Scholar http://scholar.google.com/
eTBLAST http://etest.vbi.vt.edu/etblast3/
iHOP http://www.ihop-net.org/UniPub/iHOP/ Information Hyperlinked Over Proteins.

MAIZE GENOME RESOURCES
Maizesequence.org http://www.maizesequence.org/index.html Browser providing the latest sequence and annotation of the maize genome from the Maize Genome Sequencing Project.
MaizeGDB http://www.maizegdb.org/ Maize genetics and genomics database.
Gramene http://www.gramene.org/ Curated, open-source data resource for comparative genome analysis of grasses.

Updated 5/2/13. Links verified 5/2/13.