Biological Concepts

Genomes

A genome is an organism’s entire complement of DNA. DNA is a directional molecule composed of two anti-parallel strands. The genetic code is read in the 5’ to 3’ direction, which refers to the 5’ and 3’ carbons of deoxyribose.

Eukaryotic genomes contain large amounts of repetitive DNA, including simple repeats and transposons. Transposons can be located in intergenic regions (between genes) or in introns (within genes). Genes and transposons are directional and can be encoded on either DNA strand. Simple repeats are non-directional and, in effect, occur on both strands. Transposons can mutate like any other DNA sequence.

Genes

Protein-coding information in DNA and RNA begins with a start codon, continues with a series of codons, and ends with a stop codon. Codons in mRNA (5’-AUG-3’, etc.) have sequence equivalents in DNA (5’-ATG-3’, etc.). The DNA strand that is equivalent to the mRNA is called the “coding strand.” The complementary strand is called the “template strand” because it serves as the template for synthesizing mRNA.

Non-spliced genes, which are characteristic of prokaryotes, are also found in eukaryotes. Even in a spliced gene, the protein-coding information may be organized as an open reading frame (ORF). Most eukaryotic genes are spliced: intervening segments (introns) are removed and the remaining segments (exons) are joined together. Splice sites (exon-intron boundaries) have sequence patterns that are recognized by the splicing apparatus (the spliceosome). Gene prediction programs use consensus sequences around splice sites to predict exon-intron boundaries. Over 90% of eukaryotic introns have “canonical splice sites,” meaning that the intron begins with GT (pre-mRNA: GU) and ends with AG (pre-mRNA: AG).

The protein-coding sequence of a eukaryotic mRNA (or gene) is flanked by 5’- and 3’-untranslated regions (UTRs); introns can be located in UTRs. In most eukaryotic genes, transcripts are alternatively spliced, yielding different mRNAs and proteins.
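
The strand and splice-site conventions above can be illustrated with a short Python sketch. It is a teaching aid only, not how any particular gene prediction program works. All function names are hypothetical, sequences are assumed to be plain DNA strings written 5’ to 3’, and the stop codons (TAA, TAG, TGA) are those of the standard genetic code, which the text above does not list.

```python
# Illustrative sketch only; all function names here are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def reverse_complement(dna: str) -> str:
    """Return the opposite (anti-parallel) strand, written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def transcribe(coding_strand: str) -> str:
    """The mRNA is equivalent to the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def has_canonical_splice_sites(intron: str) -> bool:
    """True if an intron (coding-strand DNA) begins with GT and ends with AG."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def find_orf(dna: str) -> str:
    """Return the first ATG-to-stop reading frame on this strand, or ""."""
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        # Walk codon by codon, in frame with the start codon.
        for i in range(start + 3, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return dna[start:i + 3]
        start = dna.find("ATG", start + 1)
    return ""


coding = "ATGGCCTAA"
template = reverse_complement(coding)   # the template strand, 5' to 3'
mrna = transcribe(coding)               # same sequence as the coding strand, with U
```

Because genes can be encoded on either strand, a search for ORFs would normally call `find_orf` on both the sequence and its reverse complement.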
UTRs carry information that determines the half-lives of mRNAs and serves other regulatory purposes.

Gene > mRNA > CDS. CDS = nucleotides that encode the amino acid sequence. In mRNA: CDS = ORF.

BLAST Searches

The Basic Local Alignment Search Tool (BLAST) searches databases for matches to a query DNA or protein sequence. Gene or protein homologs share sequence similarities due to descent from a common ancestor.

Biological evidence is needed to edit and confirm gene models predicted by computer algorithms. This evidence is most often derived from mRNA transcripts (ESTs, cDNAs, RNA-seq). Protein sequence data are also available but are much less common. Many ESTs and cDNAs are interrupted by “introns” when they are aligned against genomic DNA. ESTs and cDNAs may be incomplete.

The BLAST algorithm does not resolve intron/exon boundaries. It is not restricted to detecting sequences that fully match a query (“global” matches); it also matches subsequences of the query (“local” matches). The BLAST algorithm matches sequences to the fullest extent possible and often realigns the same sequence twice.

Web Resources

A. Major Plant Genome Hubs:
DOE JGI’s Phytozome: http://www.phytozome.net
University of Iowa: http://www.plantgdb.org/
CSHL: http://www.gramene.org/
ENSEMBL: http://plants.ensembl.org/index.html
NCBI: http://www.ncbi.nlm.nih.gov/genomes/PLANTS/PlantList.html
NCBI: http://www.ncbi.nlm.nih.gov/mapview/

B. Some Plant Genome Portals:
Arabidopsis, TAIR: http://www.arabidopsis.org/
Corn: http://www.maizesequence.org/index.html
Grape: http://www.cns.fr/externe/GenomeBrowser/Vitis/
Poplar: http://genome.jgi-psf.org/poplar/poplar.home.html
Rice: http://rice.plantbiology.msu.edu/
Tomato: http://solgenomics.net/about/tomato_sequencing.pl

C. Browsers:
Ensembl: http://www.ensembl.org
GBrowse: http://gmod.org/wiki/GBrowse
JBrowse: http://jbrowse.org/
UCSC Browser: http://genome.ucsc.edu
xGDB: http://brendelgroup.org/bioinformatics2go/bioinformatics2go.php

D. Annotation Tools:
Apollo: http://apollo.berkeleybop.org/current/index.html
Artemis: http://www.sanger.ac.uk/resources/software/artemis/
yrGATE: http://brendelgroup.org/bioinformatics2go/bioinformatics2go.php

E. Other Resources:
Course download site: http://gfx.dnalc.org/files/evidence
DynamicGene: http://www.sanger.ac.uk/resources/software/artemis/
GeneBoy: http://www.dnai.org/geneboy/
BioServers: http://www.bioservers.org/bioserver/
mRNA/gDNA: http://www.ncbi.nlm.nih.gov/spidey/
mRNA/gDNA: http://pbil.univ-lyon1.fr/sim4.php
Splice site predictor: http://www.fruitfly.org/seq_tools/splice.html
Promoter predictor: http://www.fruitfly.org/seq_tools/promoter.html
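
The distinction between “local” and “global” matches described under BLAST Searches can be made concrete with a small Python sketch. This is not the BLAST algorithm, which relies on a faster heuristic. It uses the textbook Smith-Waterman (local) and Needleman-Wunsch (global) dynamic programs instead, and it computes scores only. The function names and the scoring values (match +2, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
# Illustrative sketch only; not BLAST. Names and scores are hypothetical.

def local_alignment_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Best score over all pairs of subsequences (Smith-Waterman)."""
    prev = [0] * (len(subject) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            # Flooring at 0 lets an alignment start anywhere.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def global_alignment_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Score for aligning both sequences end to end (Needleman-Wunsch)."""
    prev = [j * gap for j in range(len(subject) + 1)]
    for i, q in enumerate(query, start=1):
        curr = [i * gap]
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


query = "ACGT"
subject = "TTACGTTT"  # contains the query as a subsequence
local_score = local_alignment_score(query, subject)
global_score = global_alignment_score(query, subject)
```

With these assumed scores, the local alignment rewards the four matching bases without penalty, whereas the global alignment must also pay for the four unmatched flanking bases. This is why a short EST fragment can still produce a strong local match against a long genomic sequence.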