Concepts and Links

Biological Concepts
Genomes
• A genome is an organism’s entire complement of DNA.
• DNA is a directional molecule composed of two anti-parallel strands.
• The genetic code is read in the 5’ to 3’ direction; 5’ and 3’ refer to the 5’ and 3’ carbons of deoxyribose.
• Eukaryotic genomes contain large amounts of repetitive DNA, including simple repeats and transposons.
• Transposons can be located in intergenic regions (between genes) or in introns (within genes).
• Genes and transposons are directional and can be encoded on either DNA strand.
• Simple repeats are non-directional and, in effect, occur on both strands.
• Transposons can mutate like any other DNA sequence.
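The strand relationships above can be illustrated with a minimal Python sketch. It assumes sequences are written 5’ to 3’ as plain strings; the function name and the example sequences are hypothetical, chosen only for illustration.

```python
# Minimal sketch: both strands are written 5'->3', so the partner of a
# sequence is its reverse complement (assumes plain A/C/G/T strings).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of seq, also written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


# A directional feature (hypothetical gene fragment) reads differently
# depending on the strand it is encoded on.
top_strand = "ATGGCC"
print(reverse_complement(top_strand))  # GGCCAT

# Some simple repeats, such as (AT)n, read the same on both strands.
print(reverse_complement("ATATAT"))  # ATATAT
```

Not every simple repeat is its own reverse complement, but every repeat on one strand has a corresponding repeat on the other, which is why repeats are treated as non-directional.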
Genes
• Protein-coding information in DNA and RNA begins with a start codon, continues through a series of codons, and ends with a stop codon.
• Codons in mRNA (5’-AUG-3’, etc.) have sequence equivalents in DNA (5’-ATG-3’, etc.).
• The DNA strand that is equivalent to mRNA is called the “coding strand.” The complementary strand is called the “template strand,” because it serves as the template for synthesizing mRNA.
• Non-spliced genes, which are characteristic of prokaryotes, are also found in eukaryotes.
• Even in a spliced gene, the protein-coding information may be organized as an Open Reading Frame (ORF).
• Most eukaryotic genes are spliced: intervening segments (introns) are removed and the remaining segments (exons) are joined together.
• Splice sites (exon-intron boundaries) have sequence patterns that are recognized by the splicing apparatus (spliceosome).
• Gene prediction programs use consensus sequences around splice sites to predict exon-intron boundaries.
• Over 90% of eukaryotic introns have “canonical splice sites”: the intron begins with GT (pre-mRNA: GU) and ends with AG (pre-mRNA: AG).
• The protein-coding sequence of a eukaryotic mRNA (or gene) is flanked by 5’- and 3’-untranslated regions (UTRs); introns can be located in UTRs.
• In many eukaryotic genes, transcripts are alternatively spliced, yielding different mRNAs and proteins.
• UTRs carry information that affects mRNA half-life and serves other regulatory purposes.
• Gene > mRNA > CDS.
• CDS = nucleotides that encode amino acid sequence.
• In mRNA: CDS = ORF.
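The relationship between gene, mRNA, and CDS can be sketched in Python with a toy example. Everything here is an assumption made for illustration: the sequences are invented, the function names are hypothetical, and the ORF search only considers the first ATG, which is far simpler than what gene prediction programs do.

```python
from typing import List, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_canonical_intron(intron: str) -> bool:
    """True if the intron starts with GT and ends with AG (DNA letters)."""
    intron = intron.upper()
    return intron.startswith("GT") and intron.endswith("AG")


def splice(gene: str, introns: List[Tuple[int, int]]) -> str:
    """Remove introns given as sorted, 0-based, end-exclusive (start, end)."""
    exons, pos = [], 0
    for start, end in introns:
        exons.append(gene[pos:start])
        pos = end
    exons.append(gene[pos:])
    return "".join(exons)


def find_cds(mrna_as_dna: str) -> Optional[str]:
    """Return the ORF from the first ATG to the first in-frame stop codon."""
    seq = mrna_as_dna.upper()
    start = seq.find("ATG")
    if start == -1:
        return None
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return seq[start:i + 3]
    return None


def transcribe(coding_strand: str) -> str:
    """mRNA equivalent of a coding-strand sequence (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


# Toy gene (invented): 5' UTR + coding part 1 + intron + coding part 2 + 3' UTR
utr5, part1, intron, part2, utr3 = "CC", "ATGGCT", "GTAAGTTTTCAG", "GAATAA", "GG"
gene = utr5 + part1 + intron + part2 + utr3
intron_start = len(utr5 + part1)
intron_end = intron_start + len(intron)

mrna = splice(gene, [(intron_start, intron_end)])
cds = find_cds(mrna)

print(is_canonical_intron(gene[intron_start:intron_end]))  # True
print(mrna)             # CCATGGCTGAATAAGG
print(cds)              # ATGGCTGAATAA
print(transcribe(cds))  # AUGGCUGAAUAA
print(len(gene), len(mrna), len(cds))  # 28 16 12
```

The printed lengths show Gene > mRNA > CDS for this toy case: the intron is absent from the mRNA, and the UTRs are absent from the CDS.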
BLAST Searches
• The Basic Local Alignment Search Tool (BLAST) searches databases for matches to a query DNA or protein sequence.
• Gene or protein homologs share sequence similarities due to descent from a common ancestor.
• Biological evidence is needed to edit and confirm gene models predicted by computer algorithms.
• Biological evidence is most often derived from mRNA transcripts (ESTs, cDNAs, RNA-seq). Protein sequence data are available too, but are much less common.
• Many ESTs and cDNAs appear disrupted by “introns” when they are aligned against genomic DNA.
• ESTs and cDNAs may be incomplete.
• The BLAST algorithm does not resolve intron/exon boundaries.
• The BLAST algorithm is not restricted to detecting sequences that fully match a query (“global” matches); it also matches subsequences of the query (“local” matches).
• The BLAST algorithm matches sequences to the fullest extent possible and often aligns the same stretch of sequence twice.
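To make the idea of a “local” match concrete, here is a minimal Python sketch of the Smith-Waterman local alignment method. This is not how BLAST works internally: BLAST uses a faster heuristic, and Smith-Waterman is substituted here only because it is short and finds the best-scoring local match exactly. The scoring values and the example sequences are assumptions chosen for illustration.

```python
def local_alignment(query, subject, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman sketch: return (score, aligned_query, aligned_subject)."""
    rows, cols = len(query) + 1, len(subject) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def pair_score(i, j):
        return match if query[i - 1] == subject[j - 1] else mismatch

    # Fill the score matrix; a cell never drops below 0, which is what
    # allows an alignment to start and end anywhere (a "local" match).
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + pair_score(i, j),
                H[i - 1][j] + gap,
                H[i][j - 1] + gap,
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score falls to 0.
    i, j = best_pos
    q_aln, s_aln = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + pair_score(i, j):
            q_aln.append(query[i - 1])
            s_aln.append(subject[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            q_aln.append(query[i - 1])
            s_aln.append("-")
            i -= 1
        else:
            q_aln.append("-")
            s_aln.append(subject[j - 1])
            j -= 1
    return best, "".join(reversed(q_aln)), "".join(reversed(s_aln))


# Invented example: a cDNA-like query whose trailing A's have no
# counterpart in the genomic-like subject.
score, q_aln, s_aln = local_alignment("GGCTGAAAAA", "TTATGGCTGACC")
print(score)  # 12
print(q_aln)  # GGCTGA
print(s_aln)  # GGCTGA
```

Only the part of the query that matches is reported; the unmatched tail is simply left out, which is the behavior a global alignment would not allow.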
Web Resources
A. Major Plant Genome Hubs:
DOE JGI: http://www.phytozome.net
Iowa State University: http://www.plantgdb.org/
CSHL: http://www.gramene.org/
ENSEMBL: http://plants.ensembl.org/index.html
NCBI: http://www.ncbi.nlm.nih.gov/genomes/PLANTS/PlantList.html
NCBI: http://www.ncbi.nlm.nih.gov/mapview/
B. Some Plant Genome Portals:
Arabidopsis, TAIR: http://www.arabidopsis.org/
Corn: http://www.maizesequence.org/index.html
Grape: http://www.cns.fr/externe/GenomeBrowser/Vitis/
Poplar: http://genome.jgi-psf.org/poplar/poplar.home.html
Rice: http://rice.plantbiology.msu.edu/
Tomato: http://solgenomics.net/about/tomato_sequencing.pl
C. Browsers:
Ensembl: http://www.ensembl.org
GBrowse: http://gmod.org/wiki/GBrowse
JBrowse: http://jbrowse.org/
UCSC Browser: http://genome.ucsc.edu
xGDB: http://brendelgroup.org/bioinformatics2go/bioinformatics2go.php
D. Annotation Tools:
Apollo: http://apollo.berkeleybop.org/current/index.html
Artemis: http://www.sanger.ac.uk/resources/software/artemis/
yrGATE: http://brendelgroup.org/bioinformatics2go/bioinformatics2go.php
E. Other Resources:
Course download site: http://gfx.dnalc.org/files/evidence
DynamicGene: http://www.sanger.ac.uk/resources/software/artemis/
GeneBoy: http://www.dnai.org/geneboy/
BioServers: http://www.bioservers.org/bioserver/
mRNA/gDNA alignment (Spidey): http://www.ncbi.nlm.nih.gov/spidey/
mRNA/gDNA alignment (Sim4): http://pbil.univ-lyon1.fr/sim4.php
Splice site predictor: http://www.fruitfly.org/seq_tools/splice.html
Promoter predictor: http://www.fruitfly.org/seq_tools/promoter.html