BCB 444/544 Fall 07 Sept5

BCB 444/544 Homework 2 (20 pts)
Due Fri Sept 14

Name _________________________________________

(Please bring a hard copy to class or deliver it to MBB 106 by 5 PM.)

Note: You may work with other students to solve these problems, but each student must submit separate answers in his/her own words. If necessary, use additional paper for your answers.

Objectives:
1. Understand how to use dot plots and interpret their results
2. Understand how dynamic programming works
3. Gain experience using different types of substitution matrices

Problem 1 - (3 pts total)

1. Suppose we are given two identical 100 kb DNA sequences, A and B. A dot plot comparing these two sequences would have the pattern shown below:

[Figure: dot plot with sequence A (1 to 100 kb) on one axis and sequence B (1 to 100 kb) on the other]

Draw simple diagrams of the dot plots that would result from each of the following comparisons. Be sure to label both axes, as in the example shown above.

1a) (1 pt) A 120 kb DNA sequence, A, and a 100 kb DNA sequence, B, identical to A except that A has a 20 kb segment duplicated near the beginning, relative to B.

1b) (1 pt) Two 100 kb DNA sequences, A and B, identical except that B has a 40 kb segment near the center inverted relative to A.

1c) (1 pt) A 10 kb gene sequence, A, and the 8 kb mRNA, B, produced from it. There is a single 2 kb intron located 1 kb from the beginning of gene A (i.e., from the 5' end).

Problem 2 - (5 pts total)

2. Re-read pages 41-49 of your textbook on substitution matrices and the statistical significance of alignments. Recall that to evaluate similarities between divergent proteins, you should use a PAM matrix with a larger "index" but a BLOSUM matrix with a smaller "index". In this exercise, you will compare different matrices to test whether they really give different results in a BLAST search.
Your task is to determine whether any yeasts have an enzyme similar to the human telomerase enzyme:

>gi|109633031|ref|NP_937983.2| telomerase reverse transcriptase isoform 1 [Homo sapiens]
MPRAPRCRAVRSLLRSHYREVLPLATFVRRLGPQGWRLVQRGDPAAFRALVAQCLVCVPWDARPPPAAPSFRQVSCLKELVARVLQRLCERGAKNVLAF
GFALLDGARGGPPEAFTTSVRSYLPNTVTDALRGSGAWGLLLRRVGDDVLVHLLARCALFVLVAPSCAYQVCGPPLYQLGAATQARPPPHASGPRRRLG
CERAWNHSVREAGVPLGLPAPGARRRGGSASRSLPLPKRPRRGAAPEPERTPVGQGSWAHPGRTRGPSDRGFCVVSPARPAEEATSLEGALSGTRHSHP
SVGRQHHAGPPSTSRPPRPWDTPCPPVYAETKHFLYSSGDKEQLRPSFLLSSLRPSLTGARRLVETIFLGSRPWMPGTPRRLPRLPQRYWQMRPLFLEL
LGNHAQCPYGVLLKTHCPLRAAVTPAAGVCAREKPQGSVAAPEEEDTDPRRLVQLLRQHSSPWQVYGFVRACLRRLVPPGLWGSRHNERRFLRNTKKFI
SLGKHAKLSLQELTWKMSVRDCAWLRRSPGVGCVPAAEHRLREEILAKFLHWLMSVYVVELLRSFFYVTETTFQKNRLFFYRKSVWSKLQSIGIRQHLK
RVQLRELSEAEVRQHREARPALLTSRLRFIPKPDGLRPIVNMDYVVGARTFRREKRAERLTSRVKALFSVLNYERARRPGLLGASVLGLDDIHRAWRTF
VLRVRAQDPPPELYFVKVDVTGAYDTIPQDRLTEVIASIIKPQNTYCVRRYAVVQKAAHGHVRKAFKSHVSTLTDLQPYMRQFVAHLQETSPLRDAVVI
EQSSSLNEASSGLFDVFLRFMCHHAVRIRGKSYVQCQGIPQGSILSTLLCSLCYGDMENKLFAGIRRDGLLLRLVDDFLLVTPHLTHAKTFLRTLVRGV
PEYGCVVNLRKTVVNFPVEDEALGGTAFVQMPAHGLFPWCGLLLDTRTLEVQSDYSSYARTSIRASLTFNRGFKAGRNMRRKLFGVLRLKCHSLFLDLQ
VNSLQTVCTNIYKILLLQAYRFHACVLQLPFHQQVWKNPTFFLRVISDTASLCYSILKAKNAGMSLGAKGAAGPLPSEAVQWLCHQAFLLKLTRHRVTY
VPLLGSLRTAQTQLSRKLPGTTLTALEAAANPALPSDFKTILD

First, use the sequence above as the query sequence in a BLASTp search (http://www.ncbi.nlm.nih.gov/BLAST/), using default parameters to search the non-redundant protein sequences database (nr) for similar sequences in yeasts.

Hints:
1- Give this search the JobTitle "Default" (or "BLOSUM62").
2- Set Organism = "yeast taxid:4932".

Next, run 4 additional BLAST searches, one with each of the 4 alternative substitution matrices (BLOSUM45, BLOSUM80, PAM30, PAM70).
More Hints:
3- Click on Algorithm Parameters to reveal the choices for Substitution Matrices.
4- Click on the Edit and Resubmit link at the top of the output page to change parameters.
5- Use the Recent Results tab to review and compare your results.

Substitution matrices to compare:
BLOSUM62 (default)
BLOSUM45
BLOSUM80
PAM30
PAM70

2a) (1 pt) How many hits did you obtain with each matrix?

2b) (1 pt) Describe and explain the differences you observe between the results obtained with BLOSUM45 and with BLOSUM80.

2c) (1 pt) Describe and explain the differences you observe between the results obtained with PAM30 and with PAM70.

2d) (1 pt) Taken together, do these results make sense, given what you've learned about PAM and BLOSUM matrices? Explain.

2e) (1 pt) Do yeasts have a telomerase enzyme? Explain. Is the default BLOSUM62 matrix "the best" for answering this question? Explain.

Problem 3 - (12 pts total)

3. Consider the following sequences for a "toy" alignment problem in which you will perform and score both a global and a local alignment:

x = ACCTT
y = ACTTG

3a) (4 pts) Complete the global alignment dynamic programming (DP) matrix below (initial values are already entered). Use the following scoring scheme:
Reward for matches: +10
Mismatch penalty: -2
Space penalty: -5

               A     C     T     T     G
         0    -5   -10   -15   -20   -25
   A    -5
   C   -10
   C   -15
   T   -20
   T   -25

3b) (1 pt) What is the score of the optimal global alignment(s)?

3c) (1 pt) Draw the alignment(s) that give this score.

3d) (4 pts) Complete the local alignment DP matrix below (initial values are already entered). Use the following scoring scheme:
Match: +2
Mismatch and space: -1

               A     C     T     T     G
         0     0     0     0     0     0
   A     0
   C     0
   C     0
   T     0
   T     0

3e) (1 pt) What is the score of the optimal local alignment(s)?

3f) (1 pt) Draw the alignment(s) that give this score.
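To check a hand-filled matrix for Problem 3, the same recurrences can be run in a short script. This is a minimal Python sketch, assuming a linear space penalty (every space scored independently, as in both schemes above); the names dp_matrix, global_score, local_score and print_matrix are hypothetical helpers, not from the textbook. It fills the matrices and reports the optimal scores only; it does not trace back the alignments, so 3c and 3f still have to be worked out by hand.

```python
# Hypothetical checker: global (Needleman-Wunsch) and local (Smith-Waterman)
# DP matrices with a linear space penalty, using the Problem 3 schemes.

def dp_matrix(x, y, match, mismatch, space, local=False):
    """Return the (len(x)+1) x (len(y)+1) DP score matrix.

    Rows follow x, columns follow y. With local=True every cell is floored
    at 0 (Smith-Waterman); otherwise the first row and column accumulate
    space penalties (Needleman-Wunsch).
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    if not local:
        for i in range(1, n + 1):
            F[i][0] = i * space
        for j in range(1, m + 1):
            F[0][j] = j * space
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            best = max(F[i - 1][j - 1] + s,   # align x[i-1] with y[j-1]
                       F[i - 1][j] + space,   # x[i-1] against a space
                       F[i][j - 1] + space)   # y[j-1] against a space
            F[i][j] = max(best, 0) if local else best
    return F


def global_score(x, y, match=10, mismatch=-2, space=-5):
    # The optimal global score sits in the bottom-right cell.
    return dp_matrix(x, y, match, mismatch, space)[-1][-1]


def local_score(x, y, match=2, mismatch=-1, space=-1):
    # The optimal local score is the largest cell anywhere in the matrix.
    F = dp_matrix(x, y, match, mismatch, space, local=True)
    return max(max(row) for row in F)


def print_matrix(F, x, y):
    # Columns are labelled by y, rows by x; "-" marks the empty prefix.
    print("     " + "".join(f"{c:>5}" for c in "-" + y))
    for label, row in zip("-" + x, F):
        print(f"{label:>5}" + "".join(f"{v:>5}" for v in row))


if __name__ == "__main__":
    x, y = "ACCTT", "ACTTG"
    print_matrix(dp_matrix(x, y, 10, -2, -5), x, y)
    print("global:", global_score(x, y))
    print_matrix(dp_matrix(x, y, 2, -1, -1, local=True), x, y)
    print("local:", local_score(x, y))
```

The only difference between the two modes is the floor at 0 and where the answer is read: the bottom-right cell for the global matrix, the maximum cell for the local one.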