HW Assignment 2

BCB 444/544 Fall 07 Sept5
BCB 444/544
Homework 2 (20 pts)
Due Fri Sept 14
Name _________________________________________
(please bring hard copy to class or deliver to MBB 106 by 5 PM)
Note: You may work with other students to solve these problems, but each student must submit separate answers
in his/her own words. If necessary, use additional paper for your answers.
Objectives:
1. Understand how to use dot plots and interpret their results
2. Understand how dynamic programming works
3. Gain experience using different types of substitution matrices
Problem 1 - (3 pts total)
1. Suppose we are given two identical 100 kb DNA sequences A and B. A dot plot comparing these two sequences
would have the pattern shown below:
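[Figure: example dot plot; one axis is labeled A and runs from 1 to 100 kb, the other is labeled B and runs from 1 to 100 kb]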
Draw simple diagrams of dot plots that would result from each of the following comparisons.
Be sure to label both axes, as in the example shown above.
1a) (1 pt) A 120 kb DNA sequence, A, and a 100 kb DNA sequence, B, identical to A,
except that A has a 20 kb segment duplicated near the beginning, relative to B.
1b) (1 pt) Two 100 kb DNA sequences, A and B, identical except that B has a 40 kb
segment inverted relative to A, near the center.
1c) (1 pt) A 10 kb gene sequence, A, and the 8 kb mRNA, B, produced from it.
There is a single 2 kb intron located 1 kb from the beginning of gene A (i.e.,
from the 5' end).
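Optional illustration (not part of the graded problems): the idea behind a dot plot can be sketched in a few lines of Python. This is a minimal sketch that assumes exact word matching with a fixed window size; the function names `dot_plot` and `render` and the short example sequence are hypothetical, chosen only for illustration, and real dot plot programs typically add scoring thresholds and filtering.

```python
def dot_plot(a, b, window=1):
    """Return the set of (i, j) pairs where the word of length `window`
    starting at a[i] is identical to the word starting at b[j]."""
    dots = set()
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            if a[i:i + window] == b[j:j + window]:
                dots.add((i, j))
    return dots


def render(a, b, window=1):
    """Draw the dot plot as text: sequence a runs left to right,
    sequence b runs top to bottom, '*' marks a matching word."""
    dots = dot_plot(a, b, window)
    width = len(a) - window + 1
    height = len(b) - window + 1
    rows = []
    for j in range(height):
        rows.append("".join("*" if (i, j) in dots else "." for i in range(width)))
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical toy sequence compared against itself.
    print(render("GATTACA", "GATTACA", window=3))
```

Comparing a sequence with itself marks every position on the main diagonal. A larger window suppresses the scattered off-diagonal dots that chance single-base matches would otherwise produce.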
Problem 2 - (5 pts total)
2. Re-read pages 41-49 of your textbook re: substitution matrices and statistical significance of alignment.
Recall that to evaluate similarities between divergent proteins, a PAM matrix with a larger "index" but a BLOSUM
matrix with a smaller "index" should be used. In this exercise, you will compare different matrices to test whether
they really give different results in a BLAST search. Your task is to determine whether any yeasts have an
enzyme similar to the human telomerase enzyme:
>gi|109633031|ref|NP_937983.2| telomerase reverse transcriptase isoform 1 [Homo sapiens]
MPRAPRCRAVRSLLRSHYREVLPLATFVRRLGPQGWRLVQRGDPAAFRALVAQCLVCVPWDARPPPAAPSFRQVSCLKELVARVLQRLCERGAKNVLAF
GFALLDGARGGPPEAFTTSVRSYLPNTVTDALRGSGAWGLLLRRVGDDVLVHLLARCALFVLVAPSCAYQVCGPPLYQLGAATQARPPPHASGPRRRLG
CERAWNHSVREAGVPLGLPAPGARRRGGSASRSLPLPKRPRRGAAPEPERTPVGQGSWAHPGRTRGPSDRGFCVVSPARPAEEATSLEGALSGTRHSHP
SVGRQHHAGPPSTSRPPRPWDTPCPPVYAETKHFLYSSGDKEQLRPSFLLSSLRPSLTGARRLVETIFLGSRPWMPGTPRRLPRLPQRYWQMRPLFLEL
LGNHAQCPYGVLLKTHCPLRAAVTPAAGVCAREKPQGSVAAPEEEDTDPRRLVQLLRQHSSPWQVYGFVRACLRRLVPPGLWGSRHNERRFLRNTKKFI
SLGKHAKLSLQELTWKMSVRDCAWLRRSPGVGCVPAAEHRLREEILAKFLHWLMSVYVVELLRSFFYVTETTFQKNRLFFYRKSVWSKLQSIGIRQHLK
RVQLRELSEAEVRQHREARPALLTSRLRFIPKPDGLRPIVNMDYVVGARTFRREKRAERLTSRVKALFSVLNYERARRPGLLGASVLGLDDIHRAWRTF
VLRVRAQDPPPELYFVKVDVTGAYDTIPQDRLTEVIASIIKPQNTYCVRRYAVVQKAAHGHVRKAFKSHVSTLTDLQPYMRQFVAHLQETSPLRDAVVI
EQSSSLNEASSGLFDVFLRFMCHHAVRIRGKSYVQCQGIPQGSILSTLLCSLCYGDMENKLFAGIRRDGLLLRLVDDFLLVTPHLTHAKTFLRTLVRGV
PEYGCVVNLRKTVVNFPVEDEALGGTAFVQMPAHGLFPWCGLLLDTRTLEVQSDYSSYARTSIRASLTFNRGFKAGRNMRRKLFGVLRLKCHSLFLDLQ
VNSLQTVCTNIYKILLLQAYRFHACVLQLPFHQQVWKNPTFFLRVISDTASLCYSILKAKNAGMSLGAKGAAGPLPSEAVQWLCHQAFLLKLTRHRVTY
VPLLGSLRTAQTQLSRKLPGTTLTALEAAANPALPSDFKTILD
First, use the sequence above as the Query sequence in a BLASTp search (http://www.ncbi.nlm.nih.gov/BLAST/),
using default parameters to search the non-redundant protein sequences database (nr) for similar sequences
in yeasts.
Hints:
1- Give this search the JobTitle "Default" (or "BLOSUM62")
2- Set Organism = "yeast taxid:4932"
Next, run 4 additional BLAST searches using each of the 4 alternative substitution matrices listed in 2a.
More Hints:
3- Click on Algorithm Parameters to reveal choices for Substitution Matrices
4- Click on Edit and Resubmit link at top of output page to change parameters
5- Use the Recent Results tab to review & compare your results
2a) (1 pt) How many hits did you obtain with each matrix?
Matrix:          BLOSUM62 (default)   BLOSUM45   BLOSUM80   PAM30      PAM70
Number of hits:  ________             ________   ________   ________   ________
2b) (1 pt) Describe & explain differences you observe in results obtained with BLOSUM45 vs BLOSUM80.
2c) (1 pt) Describe & explain differences you observe in results obtained with PAM30 vs PAM70.
2d) (1 pt) Taken together, do these results make sense, given what you've learned about PAM and
BLOSUM matrices? Explain.
2e) (1 pt) Do yeasts have a telomerase enzyme? Explain.
Is the default BLOSUM62 matrix "the best" for answering this question? Explain.
Problem 3 - (12 pts total)
3. Consider the following sequences for a "toy" alignment problem in which you will perform & score both a
global & a local alignment:
x = ACCTT
y = ACTTG
3a) (4 pts) Complete the Global Alignment Dynamic Programming matrix below (with initial values already
entered). Use the following scoring scheme:
Reward for matches: +10
Mismatch penalty: -2
Space penalty: -5
            A     C     T     T     G
       0   -5   -10   -15   -20   -25
  A   -5
  C  -10
  C  -15
  T  -20
  T  -25
3b) (1 pt) What is the score of the optimal global alignment(s)?
3c) (1 pt) Draw the alignment(s) that give this score.
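Optional illustration (not part of the graded problems): the matrix-filling rule for global alignment can be sketched in Python as below. This is a minimal sketch of the standard Needleman-Wunsch recurrence with a linear space penalty, using the scoring scheme from 3a as default arguments. The function names and the example sequences are hypothetical and deliberately differ from x and y above.

```python
def global_alignment_matrix(x, y, match=10, mismatch=-2, space=-5):
    """Fill the global alignment DP matrix F, where F[i][j] is the best
    score for aligning the first i letters of x with the first j of y."""
    rows, cols = len(x) + 1, len(y) + 1
    F = [[0] * cols for _ in range(rows)]
    # First column and first row: a prefix aligned against spaces only.
    for i in range(1, rows):
        F[i][0] = i * space
    for j in range(1, cols):
        F[0][j] = j * space
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = F[i - 1][j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)
            up = F[i - 1][j] + space      # letter of x aligned with a space
            left = F[i][j - 1] + space    # letter of y aligned with a space
            F[i][j] = max(diagonal, up, left)
    return F


def global_score(x, y, **scores):
    """The optimal global score is the bottom-right cell of the matrix."""
    return global_alignment_matrix(x, y, **scores)[-1][-1]


if __name__ == "__main__":
    # Hypothetical toy sequences, not the ones from the assignment.
    for row in global_alignment_matrix("AGT", "AT"):
        print(row)
    print("optimal global score:", global_score("AGT", "AT"))
```

To recover an alignment rather than only its score, trace back from the bottom-right cell, at each step moving to whichever neighboring cell produced the current value.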
3d) (4 pts) Complete the Local Alignment DP matrix below (with initial values already entered).
Use the following scoring scheme:
Match: +2
Mismatch and space: -1
            A     C     T     T     G
       0    0     0     0     0     0
  A    0
  C    0
  C    0
  T    0
  T    0
3e) (1 pt) What is the score of the optimal local alignment(s)?
3f) (1 pt) Draw the alignment(s) that give this score.
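Optional illustration (not part of the graded problems): local alignment differs from global alignment in two ways, both visible in the sketch below. No cell may drop below 0, and the optimal score is the largest value anywhere in the matrix. This is a minimal sketch of the standard Smith-Waterman recurrence with the scoring scheme from 3d as default arguments. The function names and the example sequences are hypothetical and deliberately differ from x and y above.

```python
def local_alignment_matrix(x, y, match=2, mismatch=-1, space=-1):
    """Fill the local alignment DP matrix H, where H[i][j] is the best
    score of any local alignment ending at x[i-1] and y[j-1]."""
    rows, cols = len(x) + 1, len(y) + 1
    # First row and first column stay 0: a local alignment may start anywhere.
    H = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = H[i - 1][j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)
            up = H[i - 1][j] + space
            left = H[i][j - 1] + space
            # The extra 0 lets an alignment restart instead of going negative.
            H[i][j] = max(0, diagonal, up, left)
    return H


def local_score(x, y, **scores):
    """The optimal local score is the maximum over all cells."""
    return max(max(row) for row in local_alignment_matrix(x, y, **scores))


if __name__ == "__main__":
    # Hypothetical toy sequences, not the ones from the assignment.
    for row in local_alignment_matrix("TTACGTT", "GGACGCC"):
        print(row)
    print("optimal local score:", local_score("TTACGTT", "GGACGCC"))
```

The traceback for a local alignment starts at a cell holding the maximum value and stops at the first cell whose value is 0.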