Yellow Line Walk-through

A. Examining Transposons
Example Sequences:
mPing Mite Element, Ping Transposase Gene, Ping Transposase Protein
Tool(s):
Yellow Line TARGeT
Concept(s):
Mobile genetic elements (transposons), Non-autonomous
TARGeT: TARGeT (Tree Analysis of Related Genes and Transposons) uses either a DNA or amino acid ‘seed’ query to: (i) automatically identify and retrieve gene family homologs from a genomic database, (ii) characterize gene structure and (iii) perform phylogenetic analysis. Due to its high speed, TARGeT is also able to characterize very large gene families, including transposable elements (TEs). (From the abstract of the TARGeT paper, doi: 10.1093/nar/gkp295.)
Transposons (DNA, Retroviral, LINEs): Genetic elements which have the ability to be amplified and redistributed within a genome.
Non-autonomous transposons: Transposons which lack an active transposase gene, thus requiring help from another transposon to move.
Autonomous transposons: Transposons which have a functional transposase and can move within the genome.
I. Create Project
1. Log in to DNA Subway (dnasubway.iplantcollaborative.org)
2. Click ‘Prospect Genomes using TARGeT’ (Yellow Square)
3. Select sample: mPing Mite Element (Oryza sativa/ Rice)
4. Provide your project with a title, then Click ‘Continue’
II. Search the O. sativa genome using TARGeT
1. Click ‘Oryza sativa japonica’ in the ‘Select Genomes’ stop
2. Click ‘Run’ to search the genome.
III. Identify the number of mPing elements in the O. sativa genome
1. Click ‘Alignment Viewer’ to see results returned.
[Figure: Key to results naming in Alignment Viewer (genome name, hit #, project #)]
*Double clicking the hit name opens the sequence and location in new browser tab.
2. Record the number of hits in the table below.
IV. Identify the number of Ping transposons (using DNA sequence and protein)
Repeat the steps above (Sections I-III) using the Ping transposase gene and the Ping transposase protein, then collect the following data and answer the following questions.
                            | mPing mite element | Ping Transposon (DNA) | Ping Transposon (Protein)
Number of hits in O. sativa | 52                 |                       |
Hit number 1 – locus        | Chr: 6             |                       |
Hit number 2 – locus        |                    |                       |
Hit number 3 – locus        |                    |                       |
Hit number 4 – locus        |                    |                       |
Hit number 5 – locus        |                    |                       |
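The data table above can also be kept in a spreadsheet-friendly form while you work through the three queries. The following is only a minimal sketch: the field names and the helper functions `new_row` and `to_csv` are hypothetical and are not part of DNA Subway or TARGeT. The only values filled in are the ones already given in the table; everything else stays blank until you record it.

```python
import csv
import io

# Hypothetical column names mirroring the worksheet table (one row per query).
FIELDS = ["query", "hits_in_O_sativa"] + [f"hit_{n}_locus" for n in range(1, 6)]


def new_row(query, hits=None, loci=()):
    """Build one table row; unknown values are left as empty strings."""
    row = {"query": query, "hits_in_O_sativa": "" if hits is None else hits}
    for n in range(1, 6):
        row[f"hit_{n}_locus"] = loci[n - 1] if n <= len(loci) else ""
    return row


def to_csv(rows):
    """Serialize the rows as CSV text that any spreadsheet program can open."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = [
    # Values taken from the worksheet's example row.
    new_row("mPing mite element", hits=52, loci=["Chr: 6"]),
    # Left blank on purpose: fill these in from your own searches.
    new_row("Ping Transposon (DNA)"),
    new_row("Ping Transposon (Protein)"),
]

if __name__ == "__main__":
    print(to_csv(rows))
```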
Advanced Yellow Line Example
Prospecting example: Finding and analyzing DNA transposons (Ping - DNA transposon in rice)
Background Reading: http://www.nature.com/nature/journal/v421/n6919/full/nature01214.html
Example:
1. Open DNA Subway and start a new project in the yellow line, selecting the Ping Transposon from the sample sequences.
2. Enter a project title and click ‘Continue.’
3. In the ‘Search Genomes’ stop, select Oryza sativa japonica and click ‘Run.’
   a. Click ‘Alignment Viewer’ to view the results of your search. This will open up two screens, one displaying a tree and another displaying sequence alignments. How many matches did the search yield? What is the relationship between the match and the query?
   b. Close all viewers and return to DNA Subway.
4. Create a new project, this time querying rice with the Ping transposase Gene [ORF] as query.
   a. How many matches did this search yield? (Again, use the alignment screen to count.)
   b. To view details about a match, double-click its ID (left-most column in Alignment Viewer; enable pop-ups in your browser). This screen also has a link to open Phytozome at the location of the match.
   c. Using the tree, determine the relationships among the hits. As the query sequence originates in the rice genome, you can identify the match that’s identical to the query sequence.
   d. Close all viewers and return to DNA Subway.
5. Create a new project, this time querying rice with the Ping Transposase Protein.
   a. How many matches did this search yield? Explain the differences in the number of results for the three queries.
   b. In the alignment screen, find the row for the query (ID=Ping), click its ID field once (left-most column), then bring the tree screen to the foreground and find Ping among the matches displayed.
   c. All matches constitute sequences that are contained in the genome of the rice plant that was sequenced to determine the sequence of the entire rice genome. What do the lengths of tree branches indicate?
   d. Transposable elements that diverged from a common ancestor more recently will differ from each other less than they would differ from those that diverged in the more distant past. How many groups of transposons contain matches that seem to have diverged from each other more recently? What would you be looking for in order to answer this question?
6. Repeat the different kinds of searches and analyses in other genomes. To date only rice, maize, and Arabidopsis have been exhaustively studied for TEs. Prospecting other genomes will reveal new information about these organisms.
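The questions about branch lengths and recently diverged groups come down to one idea: sequences that differ at fewer aligned positions are likely to have shared an ancestor more recently. The sketch below illustrates that idea with a simple proportion-of-differences measure (p-distance) and a threshold-based grouping. It is not how TARGeT builds its tree; the function names, the threshold, and the toy sequences are all made up for illustration.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip alignment gaps
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


def recent_groups(aligned: dict[str, str], threshold: float) -> list[set[str]]:
    """Group sequences linked by at least one pair at or below the threshold.

    This is simple single-linkage grouping; singletons are not reported.
    """
    groups = {name: {name} for name in aligned}
    for a, b in combinations(aligned, 2):
        if p_distance(aligned[a], aligned[b]) <= threshold:
            merged = groups[a] | groups[b]
            for name in merged:
                groups[name] = merged
    unique: list[set[str]] = []
    for group in groups.values():
        if group not in unique:
            unique.append(group)
    return [g for g in unique if len(g) > 1]


# Toy, invented alignment: NOT real Ping or mPing sequence.
toy_alignment = {
    "hit_1": "ATGCATGCAT",
    "hit_2": "ATGCATGCAA",
    "hit_3": "TTGGATCCAT",
    "hit_4": "TTGGATCCTT",
}

if __name__ == "__main__":
    for group in recent_groups(toy_alignment, threshold=0.15):
        print(sorted(group))
```

In a tree view, the same pattern shows up as clusters of hits joined by short branches, separated from other clusters by longer branches.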
Biological Concepts
Genomes, Genes and Transposons
• A genome is an organism’s entire complement of DNA.
• DNA is a directional molecule composed of two anti-parallel strands.
• The genetic code is read in a 5’ to 3’ direction, referring to the 5’ and 3’ carbons of deoxyribose.
• Eukaryotic genomes contain large amounts of repetitive DNA, including simple repeats and transposons.
• Transposons can be located in intergenic regions (between genes) or in introns (within genes).
• Genes and transposons are directional, and can be encoded on either DNA strand.
• Repeats are non-directional, and, in effect, do occur on both strands.
• Transposons can mutate like any other DNA sequence.
• Protein-coding information in DNA and RNA begins with a start codon, is followed by codons, and ends with a stop codon.
• Codons in mRNA (5’-AUG-3’, etc.) have sequence equivalents in DNA (5’-ATG-3’, etc.).
• The DNA strand that is equivalent to mRNA is called the “coding strand.” The complementary strand is called the “template strand,” because it serves as the template for synthesizing mRNA.
• Non-spliced genes, which are characteristic of prokaryotes, are also found in eukaryotes.
• Even in a spliced gene, the protein-coding information may be organized as an Open Reading Frame (ORF).
• Most eukaryotic genes are spliced, whereby intervening segments (introns) are removed and the remaining segments (exons) are spliced together.
• Splice sites (exon-intron boundaries) have sequence patterns that are recognized by the splicing apparatus (spliceosome).
• Gene prediction programs use consensus sequences around splice sites to predict exon-intron boundaries.
• Over 90% of eukaryotic introns have “canonical splice sites,” whereby introns begin with GT (mRNA: GU) and end in AG (mRNA: AG).
• The protein coding sequence of a eukaryotic mRNA (or gene) is flanked by 5’- and 3’-untranslated regions (UTRs); introns can be located in UTRs.
• In most eukaryotic genes, transcripts are alternatively spliced, yielding different mRNAs and proteins.
• UTRs hold information for the half-lives of mRNAs and for regulatory purposes.
• Gene > mRNA > CDS.
• CDS = nucleotides that encode amino acid sequence.
• In mRNA: CDS = ORF.
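Several of the concepts above (coding versus template strand, start and stop codons, ORFs, canonical GT...AG introns) can be made concrete with a few lines of code. The following is a minimal sketch for illustration only: the function names are hypothetical, the ORF scan is deliberately naive (ATG to the first in-frame stop, one strand at a time), and real gene prediction programs use far more evidence than this.

```python
# Complement table for DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def to_mrna(coding_strand: str) -> str:
    """The mRNA matches the coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int, str]]:
    """Scan the three reading frames of the given strand for ATG...stop ORFs.

    Returns (start, end, sequence) with 0-based start and exclusive end.
    min_codons counts the start and stop codons as well.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None
    return orfs


def is_canonical_intron(intron: str) -> bool:
    """True if the intron begins with GT and ends with AG (DNA notation)."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


if __name__ == "__main__":
    toy = "CCATGAAATAGGG"  # invented sequence, not from any genome
    print(find_orfs(toy))
    print(find_orfs(reverse_complement(toy)))  # search the other strand too
    print(to_mrna("ATG"))
```

Because genes and transposons can be encoded on either strand, a scan of one strand is only half the search; running `find_orfs` on the reverse complement covers the other half.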
BLAST Searches
• Basic Local Alignment Search Tool (BLAST) searches databases for matches to a query DNA or protein sequence.
• Gene or protein homologs share sequence similarities due to descent from a common ancestor.
• Biological evidence is needed to edit and confirm gene models predicted by computer algorithms.
• Biological evidence is most often derived from mRNA transcripts (ESTs, cDNAs, RNAseq). Protein sequence data are available, too, but much less common.
• Many ESTs and cDNAs are disrupted by “introns” when they are aligned against genomic DNA.
• ESTs & cDNAs may be incomplete.
• The BLAST algorithm does not resolve intron/exon boundaries.
• The BLAST algorithm is not restricted to detecting sequences that fully match a query (“global” matches) but, instead, matches query subsequences as well (“local” matches).
• The BLAST algorithm matches sequences to the fullest extent possible and, often, realigns the same sequence
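The difference between “global” and “local” matches can be illustrated with a small local-alignment scorer. The sketch below uses the classic Smith-Waterman dynamic-programming approach, which is not what BLAST itself runs (BLAST uses a faster seed-and-extend heuristic); the scoring values and the function name are arbitrary choices made for this illustration.

```python
def local_align_score(query: str, subject: str,
                      match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best_score, (query_end, subject_end)) of the best local alignment.

    End positions are 1-based. A score of 0 means no local match was found.
    """
    rows, cols = len(query) + 1, len(subject) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_end = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if query[i - 1] == subject[j - 1] else mismatch
            score[i][j] = max(
                0,                        # start a fresh local alignment here
                score[i - 1][j - 1] + step,  # extend with a match or mismatch
                score[i - 1][j] + gap,       # gap in the subject
                score[i][j - 1] + gap,       # gap in the query
            )
            if score[i][j] > best:
                best, best_end = score[i][j], (i, j)
    return best, best_end


if __name__ == "__main__":
    # Invented toy sequences: the query matches only part of the subject.
    print(local_align_score("GATTACA", "CCCGATTACACCC"))
```

Because each cell may reset to zero, the query is free to match just a subsequence of the subject, which is the “local” behavior described above.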
Web Resources
A. Major Plant Genome Hubs:
DOE JGI’s Phytozome: http://www.phytozome.net
University of Iowa: http://www.plantgdb.org/
CSHL: http://www.gramene.org/
ENSEMBL: http://plants.ensembl.org/index.html
NCBI: http://www.ncbi.nlm.nih.gov/genomes/PLANTS/PlantList.html
NCBI: http://www.ncbi.nlm.nih.gov/mapview/
B. Some Plant Genome Portals:
Arabidopsis, TAIR: http://www.arabidopsis.org/
Corn: http://www.maizesequence.org/index.html
Grape: http://www.cns.fr/externe/GenomeBrowser/Vitis/
Poplar: http://genome.jgi-psf.org/poplar/poplar.home.html
Rice: http://rice.plantbiology.msu.edu/
Tomato: http://solgenomics.net/about/tomato_sequencing.pl
C. Browsers:
Ensembl: http://www.ensembl.org
GBrowse: http://gmod.org/wiki/GBrowse
JBrowse: http://jbrowse.org/
UCSC Browser: http://genome.ucsc.edu
xGDB: http://brendelgroup.org/bioinformatics2go/bioinformatics2go.php
D. Other Resources:
Course download site: http://gfx.dnalc.org/files/evidence
DynamicGene: http://www.sanger.ac.uk/resources/software/artemis/
GeneBoy: http://www.dnai.org/geneboy/
BioServers: http://www.bioservers.org/bioserver/
mRNA/gDNA: http://www.ncbi.nlm.nih.gov/spidey/
mRNA/gDNA: http://pbil.univ-lyon1.fr/sim4.php
Splice site predictor: http://www.fruitfly.org/seq_tools/splice.html
Promoter predictor: http://www.fruitfly.org/seq_tools/promoter.html