91.350/580 LAB: GENOME ANNOTATION 2/25/15 Reference on

COMP.350/580.202
LAB: GENOME ANNOTATION
2/3/16
Reference on Annotation (www.cs.uml.edu/~kim/580/review_Annotation.pdf).
LAB Experiments
DUE 2/4 (Th) 5:00 PM
Write answers to the bold-faced questions through Experiment 3. If you can
complete any other questions after question 12 in Experiment 3, include them
in the answers. Email the answers to kim@cs.uml.edu.
Experiment 1: Find Repeats in DNA
Concept: Genomes consist, to a greater or lesser extent, of various types of repetitive
DNA.
I. Create a Project
1. Go to http://www.dnasubway.org and sign up as a guest.
2. In DNA Subway, click the red square to annotate a genomic sequence.
3. Under ‘Select Organism type,’ select ‘Plant’ and ‘Dicotyledon.’
4. Select ‘Select a sample sequence,’ and pick Arabidopsis thaliana (mouse-ear
cress) Synthetic Contig.
5. Provide a title (required), a project description (optional) and click
Continue.
II. Identify and Mask Repeats
1. Click RepeatMasker.
2. Once the bullet has finished blinking, click RepeatMasker again to view a
listing of repetitive DNA sequences RepeatMasker has identified and
masked.
3. How many and which types of repetitive DNA did RepeatMasker
identify?
What are their lengths?
Can you identify any association between types and length ranges?
4. Close the table to return to DNA Subway.
5. Click Local Browser to view the results in a graphical interface.
6. Maximize the browser window.
7. Change Show 10 kb to Show 25 kbp.
8. How many and which types of repetitive DNA does the browser
display?
9. Which of the two views, table or graphics, would you find easier to
work with?
10. Close the Local Browser screen to return to DNA Subway.
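To look for an association between repeat types and length ranges, the rows of the RepeatMasker table can be grouped by type outside DNA Subway. The following is a minimal sketch, not part of the lab. The rows are hypothetical placeholders, not the actual output for the sample contig, and the function name `summarize` is made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows (repeat type, start, end). These are NOT the actual
# RepeatMasker results for the sample contig; replace them with the rows
# from your own table.
repeats = [
    ("Simple_repeat", 120, 165),
    ("Low_complexity", 900, 940),
    ("LTR/Copia", 5000, 9800),
    ("Simple_repeat", 12000, 12030),
]

def summarize(rows):
    """Group repeat lengths by type; coordinates are 1-based and inclusive."""
    lengths = defaultdict(list)
    for rtype, start, end in rows:
        lengths[rtype].append(end - start + 1)
    # For each type: (number of repeats, shortest length, longest length)
    return {t: (len(v), min(v), max(v)) for t, v in lengths.items()}

summary = summarize(repeats)
for rtype, (count, shortest, longest) in sorted(summary.items()):
    print(f"{rtype}: n={count}, length range {shortest}-{longest} bp")
```

Sorting the summary by the longest length instead of by name makes it easier to see whether some types cluster in a particular length range.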
Experiment 2: Predict Genes in DNA
Genes can be identified by their characteristics – where do gene predictors “see” genes?
1. Click Augustus.
2. Once Augustus has finished, click FGenesH. Then, click SNAP. Finally, click
tRNA Scan. (The Augustus, FGenesH and SNAP algorithms predict protein-coding
genes; tRNA Scan identifies tRNA genes.)
3. Which program runs significantly longer than any other?
4. Again, view the results in the table view and the Local Browser. How many
genes did the gene predictors predict?
What kind of structures can you identify in the browser?
What do the different structure elements symbolize?
5. Do the different programs predict the same genes?
Can you identify differences among the predictions?
Which do you think got it right?
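One way to check whether two programs predict the same genes is to test whether their predicted gene intervals overlap. The sketch below assumes you have copied the start and end coordinates from the table view. The coordinates are hypothetical placeholders, and the helpers `overlaps` and `shared_loci` are illustrative names, not part of DNA Subway.

```python
def overlaps(a, b):
    """True if two 1-based inclusive (start, end) intervals share a base."""
    return a[0] <= b[1] and b[0] <= a[1]

def shared_loci(genes_a, genes_b):
    """Return every pair of predictions (one per program) that overlap."""
    return [(a, b) for a in genes_a for b in genes_b if overlaps(a, b)]

# Hypothetical gene coordinates, NOT the real predictions for the contig.
predictions = {
    "ProgramA": [(100, 900), (1500, 4000)],
    "ProgramB": [(150, 850), (1600, 3900), (5000, 5400)],
}

pairs = shared_loci(predictions["ProgramA"], predictions["ProgramB"])
for a, b in pairs:
    print(f"ProgramA {a} overlaps ProgramB {b}")
```

Overlap only shows that two programs place a gene at the same locus. The exon boundaries inside that locus can still differ, which is what the browser view helps you compare.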
Experiment 3: Insert a Start Codon into a Gene
Genes have a beginning and an end.
1. Click Apollo.
2. Click Tiers and select Expand Tiers to view all of the available evidence.
(Apollo initially collapses each evidence type onto a single line,
regardless of how many pieces of evidence are available for each position.)
3. Describe how gene features are displayed by Apollo; does Apollo use the
same or different graphical elements than the browser?
4. Compare and contrast the predicted gene models for the four locations.
Zoom, pan and scroll to nucleotide position 600-1,600 until you can
comfortably view details for a gene on the forward strand in this location.
5. Compare the predictions with each other – what similarities and
differences can you identify?
6. Discrepancies between the gene predictions and biological evidence consist
of inaccurate transcriptional start and termination sites and therefore
inaccurate 5’- and 3’-untranslated regions (caused by difficulties predicting
first and last exons, because transcriptional start and termination sites do
not follow easily discernible patterns).
7. Double-click the FGenesH prediction and move it onto the workspace.
8. What is the meaning of the green and red lines that appeared at the
ends of this prediction upon moving it onto the workspace?
Zoom into the beginning of the gene until you can discern the nucleotide
triplet in the position to the left.
What does the green highlight indicate?
9. Zoom and pan to the end of the gene to examine the meaning of the red
highlight. What nucleotide triplet do you find? What is its meaning?
10. Do these findings agree with what you know about molecular biology?
Explain how a G on DNA ends up being a G on mRNA instead of a C.
11. Zoom out to view the region from position 600 to position 1600.
12. Double-click and move the Augustus prediction onto the workspace.
What structures can you identify in this model?
Zoom into the model until you can discern the individual letters of the
sequences.
What does the filled box indicate? What about the open part of the box?
13. The Augustus-predicted model does not seem to include a start codon. In order
to fix this, move your cursor to the top of the Apollo screen where you should
be able to identify three rows of green and three rows of red ticks. What do
you think these represent? (Hint: zoom into the locations for a few of these
ticks and check the sequence that is associated with each of them.)
14. Drag the first green tick that is located within the boundaries of the Augustus
predicted gene model onto the model in your workspace and let go. Describe
the result of this action.
15. The FGenesH prediction and the Augustus prediction for this gene are not
mutually exclusive; explain why this is so. What parts of genes do you think
FGenesH is programmed to predict? How about Augustus?
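To follow the hint in step 13 offline, you can scan a sequence for candidate start and stop codons in each of the three forward reading frames. This is a minimal sketch under the assumption that the codons of interest are ATG and the three standard stop codons; it does not reproduce how Apollo computes its display. The toy sequence is made up, and `codon_ticks` is a hypothetical helper name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def codon_ticks(seq):
    """Return {frame: {"start": [...], "stop": [...]}} holding the 1-based
    position of the first base of each ATG or stop codon (forward strand)."""
    seq = seq.upper()
    ticks = {frame: {"start": [], "stop": []} for frame in (1, 2, 3)}
    for i in range(len(seq) - 2):
        codon = seq[i:i + 3]
        frame = i % 3 + 1  # frame 1 begins at the first base of seq
        if codon == "ATG":
            ticks[frame]["start"].append(i + 1)
        elif codon in STOP_CODONS:
            ticks[frame]["stop"].append(i + 1)
    return ticks

# Made-up toy sequence, not taken from the sample contig.
toy = "CCATGAAATAGGATGTTTTGA"
ticks = codon_ticks(toy)
for frame in (1, 2, 3):
    print(frame, ticks[frame])
```

Comparing the positions in one frame with the boundaries of a gene model shows which candidate start codons share the model's reading frame.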
Experiment 4: Examine Spliced Genes
What sequence patterns signify splice sites?
1. Zoom, pan and scroll to nucleotide position 2,000-5,600 until you can
comfortably view details for a gene on the forward strand in this location.
2. Compare the predictions with each other – what similarities and differences
can you identify?
3. What would you need in order to decide which of the predictions is correct?
4. Double-click each of the three predictions and move them onto the
workspace.
5. Determine the pattern that signifies the borders between exons and introns
(splice sites):
a. zoom into the first exon-intron border for the Augustus-derived model
(position 2387/2388) until you can read the nucleotide sequence;
b. record the last three nucleotides of the exon and the first three
nucleotides of the successive intron;
c. pan to the next exon-intron border (position 3017/3018) and repeat;
d. repeat again for the last exon-intron border at position 4353/4354;
e. pan to the first intron-exon border (position 2694/2695) and record the
last three nucleotides for the intron and the first three nucleotides for
the successive exon;
f. repeat for the intron-exon borders in positions 3723/3724 and
4445/4446.
g. Determine the nucleotide sequence pattern for exon-intron and the
pattern for intron-exon borders;
h. refine your findings by conducting the same analysis for the exon-intron
border of the Augustus-derived gene model in position 6,400-9,200.
6. Zoom out to view the region from position 2,000 to position 5,600 again.
7. What differentiates the Augustus-predicted model from the FGenesH-predicted
model? Which of the two does SNAP emulate?
8. Move on to the other two locations that contain predicted genes and
determine the differences between the models predicted by the three
different algorithms Augustus, FGenesH and SNAP.
9. If you find different predictions leading to conflicting models, explain what
would be required to be able to decide which gene prediction got it right.
10. To conclude your work, click the File menu and select Upload to DNA Subway.
11. Close Apollo to return to DNA Subway.
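The border analysis in step 5 can also be done programmatically if you have the contig sequence as a string. The sketch below extracts the three nucleotides on each side of a border written as "p/p+1" and tallies the results. The sequence and border positions are arbitrary placeholders chosen so that the slicing is easy to verify; they carry no biological signal. `border_triplets` is a hypothetical helper name.

```python
from collections import Counter

def border_triplets(seq, left_pos):
    """For a 1-based border 'left_pos/left_pos+1', return a tuple of
    (last three bases ending at left_pos, first three bases after it)."""
    if left_pos < 3 or left_pos + 3 > len(seq):
        raise ValueError("border is too close to the end of the sequence")
    before = seq[left_pos - 3:left_pos].upper()
    after = seq[left_pos:left_pos + 3].upper()
    return before, after

# Placeholder sequence and borders, NOT the contig or its real splice sites.
toy = "AAACCCGGGTTTAAACCC"
borders = [6, 12]

tally = Counter(border_triplets(toy, p) for p in borders)
for (before, after), n in tally.items():
    print(f"{before}|{after}: seen {n} time(s)")
```

With the real sequence, you would pass the exon-intron positions from step 5 in one run and the intron-exon positions in a second run, then compare which bases recur in each tally.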
Experiment 5: Identify Biological Evidence
Protein-coding genes are transcribed into RNA, which is processed into mRNA, which is
translated into proteins – where can one find material evidence for the genes in this
contig?
1. Click the BLAST buttons to search databases of known genes and transcripts
such as cDNAs or ESTs (BLASTN) and proteins (BLASTX) for sequences that
match the genomic DNA sequence. (To brush up on how mRNA is isolated
and converted into expressed sequence tags (ESTs) and complementary
DNA (cDNA), click the Background button at the bottom of the DNA Subway
screen.)
2. View BLASTN and BLASTX matches in the table view and the Local Browser.
3. For how many predicted genes did BLAST generate biological evidence?