BIT150 – Fall 2009 – Homework 2
Due on Thursday, October 8th, by email to the TA (rlnitcher@ucdavis.edu) as Hwk2_Lastname BEFORE the Lab.

1. 15 points
Using the DNA sequence presented below:
>DNA_Tm43J16
CAGTATGCCATGGATGTTCCCTCTATGGAAGCGTTAGCACTTTTGAAAGGACTGCGATTGGCAGAGCATTTGGGTATTTATGCACTGGTTGTGGAGT
CCGACTCACAGGAGATAGTACAAGCTATGAAGGATCCTTCGGAGTACCGTCCTTCGGCTGCGGTGTTGAGGGATGATTGCCGACAGGTGTTGAGTTC
CTTCGGCAGAGCGACGATTGTTCATTGTGCTACGGAGAGTAATGGGGTAGCGCATTGGCTAGTCAGGACTAGTTATAGAAGGAAGTTAAATGAGGTC
TGGCAAGAGCATCCCCCTGATTTCTTAATTCCCTCCCTTGTATCTGATATGATTATTTTGTAATAAAGGTCGCCCAAGTTCAGAAAAAACTCTGTTT
TACTGTTTAGTAATAGGATAAAAAAACTGACACAACGGCAGGAAATCTGGCTATTGCTTTTGGCTGTTAGGTAAATATAAAATATTTTACAGAAATA
CTAACGCTCATACGTGTGGCAACATTGTAACTCACCCACATGTCTGGATCATCGTCCAGTTAAGTTTGCACGAATCTTGACACACCGAGGCAGTTTT
CGCGTGCCACGTAGGATAAGGCCGGTGTGTG
1.1. Do a blastn search against the nucleotide collection database.
- Report the lowest E value and calculate the probability of finding an alignment with this E value by chance (P = 1 - e^(-E)).
- Can you conclude that your finding is NOT just a random alignment?
ANSWER 1.1
The lowest E value is 0.63. This makes P = 1 - e^(-0.63) = 1 - 0.5326 = 0.4674. With this high probability, I can NOT conclude that my finding is NOT just a random alignment. E = 0.63 is the expected number of alignments with scores equivalent to or better than the Bit Score that occur by chance in the database search, which is relatively high; as a consequence, the probability P = 0.4674 of finding at least one high-scoring segment pair with this Bit Score by chance is also high. (Remember: the suggested BLAST cutoff for nucleotide searches is E < 1e-10 to consider that your finding is NOT a random alignment.)
1.2. Repeat your blastn search but now against the est_others database.
- Report the lowest E value. Can you conclude now that your finding is NOT just a random alignment?
- Are all these EST sequences present in the nucleotide collection database?
- Click on the link ‘Distance tree of results’ that appears at the top of your table of ‘Sequences producing significant alignments’. Using Shift+PrintScreen, include the picture of the tree in your homework. How can this tool be used to understand the BLAST results?
ANSWER 1.2
The lowest E value is 5e-14. We can now conclude that the finding is NOT just a random alignment (5e-14 < 1e-10). These EST sequences from the EST database are NOT present in the nucleotide collection database.
Distance tree of results: In cases where many results appear to be closely related to your query sequence, you can use phylogenetic relationships to correctly identify your sequence.
1.3. This sequence is from cultivated diploid wheat, Triticum monococcum L., a species that belongs to the Triticeae tribe within the Poaceae (grass) family. Repeat your blastn search using the tribe as the limit by Organism.
- How many alignments did you find? Open the flat files of the first three entries and indicate the NCBI division to which they belong. Report any accession.version numbers which you can conclude are not a random alignment.
- Report the lowest E value. Is this a lower or a higher E value than the one obtained in 1.2.?
ANSWER 1.3
- With the tribe limit, 30 accessions were found. All of the first three accessions (GH732400.1, CJ928818.1, and CK158097.1) are from the EST division, as the search was done against the est_others database. Only one accession (GH732400.1) has an E value low enough to conclude that it is not just a random alignment.
- The E value for this search is lower than the one found in 1.2.

2.
40 points Using the DNA sequence presented below: >DNA TaFDL2_D ATGGCAGGCCCTTTCATGGCAAGTCATTTGGGTCCTCAGCCATTGTCTGTTGCTACAGGTGCTATCATGGAACCAATTTACCCGGATGGCCAGATTA CGTCACCAATGCTCGATGCACTTTCTGATCCTCAGACACCTAGGCGCAAGCGTGGTGCCTCAGATGGTGTAACTGATAAAGTCGTAGAAAGAAGGCA GAAGAGAATGATAAAAAACAGGGAATCAGCCGCACGCTCCAGAGCGAGGAAGCAGGTCTACAGCTATTGTCCTGTCTATTCCTGTTTTCCTACGCAT TGGTTATGTCCTGTTATAATGTTCTGATTGCAGCATCAAAGCTTTGTGAAAAGTTATTGTTGAATCTTGAATGTATTGATTTAGGCATTTCTGTTAC ACTTTTTAGCCTCAAATTGCAAGAATATCATTAGGTACTCCCTTCCGTTTCTTTTACTTCGCATATAAGATTTGTTTGAAGTCAAACTTCATAAAGT TTGAGCAAATTTATATTAAAAATATCAACATCTACAACACTAAAGTTATACAATATGAAAATTAATTTCATGATGCCTCTAATGATATTGATTTCGT ATTGCGAATGTTGATATTTTTTTCCTATAAAGTTGGTCAAACTTTATAAAGTTTGACTTCAGACAAAACTTATATGCAGAGTAAAAAGAAACGACTA TGCCACTGGAGTAGCCTATTGACTTATTCTCATTCCATGTCATTCTACTCTATTAATAAAAATATAGCAATAACCTAGTGCATTCTACTTGGCAGGC TTACACTAATGAGCTTGAAAACAAGGTGTCTCGTCTAGAAGAGGAGAACGAGAGGTTGAAGAAGCAGAAGGTACTTTGTCTTCTGTTACTGTAACAC TGTTACTTTTCGCCCCTTCTCTTCTTTTTGGTGGAGGGGATATTTAACTACAGCATTCCACTATGCATTGTAGTGAACCCAGGTTATTGTAACTATG TTTTTATTCAGGTAGTAAGCTATTTTGTTTCACACTAGTCGATCGTTCAACTTCTTTTTACTTAGGCTCAATAATATCATGTGTAACAAGTGCACTA TACTGATGTGATCATGTTAGTTAATTCCATAGTACAACCACCCACACAACTCATTTAGTATGATTTGCAGACCTGTATCTATATTGTCACTGCTGAA CCCCATATGATCATTGCTACTTTTTTTTGTCAAGTATTGTTGAGAAAACTATGTTAGTTACCATGCTGACAGACACATGCAGAGATCTCATTGTTTG TTTTTGATGTTCAAATATCTATTGGAGGCTATGTAATGAAAGTGGAGTCTCAAAGTTATTTTTCTTGCTTACACTAACTAAGTTTTGGCATTGACTT TGTGCTACACTTTTTTTACAAGGCCAAACATACCATAATGTATATAAACCAACTGAATGGGGCTAATTTTTGTTAACTTGAACTTAGTAATTTTTGG AAATATAAGCAGCTGTTCATTCGAATTTAGAGGAAACTCTTATGAAGTTCTTGAGCATCTTATTTAATTGAATATACCCTACTGAATGTATTGACCG ATGTACTTAGGCTATGCTTCTCAATATTTTGCATGGTGAAGTTGATATGTGCATGGCTACAACAATTTTAATGATATGGAAGCAGAAAAACTGCTAT TTAATGACTTGTAGTGGGGTTTGTCTATATAAGATATATAGTTGCAAGCCCATATGTTGTACTCTATTGTTTCGGTTGCTTCTTAGTAGTTCTTATT AGTTAACAATGCGAGTGATTTCAGCACTACATGCTATTCCTAATAATCAGTTTGTACGAAGAACTGCAGCATTAATTCATAGAACTTCATTGCCCTT 
CTCTACAGTGTAGCAATTCAGTAAGATGCTTGGCGCCCTTACGGCATTTGCTTTGGTTGATCAAAACTAAGTTGCTGCTGAGAATTGAATATATAGA TAGGTTATCTTGATGGACTTTTGTTTGAATCAATGAATCATGTCTCTATACTTGGCGAGTTAGCAGTGTAATAGAATTGCTTCCCCGTAGTCCTTTA TACAATTGACAAACCTGGAGTTACCCAACAGATTGCAGTCCTTCATGAAAGTAAAATAAACACATATCATGAACATTGACAGGAACCACACTTAACA AACAGATCACCATTGGAAATGAATACATTAGACTTTGTACTCACTAAATCGTAATCACGCGTGAAGCAACAGAGCACCGGGCAAAAGAAAACCACAG AGAGCATCTATGCTTGTGTTTTCTGTATTTTGGGTGCAATTCCATTTGGTCCTTGCAATGGGTGCCTGCCATTTATCTGTTAAGACCACATCACCTG AAGATGATGCTGGTGATATTAATCATATCTAGCAATATTTGCAGTCCTATTCTTTTTTAGCCTAACACAATGATACTGAGATTTCATTGGTCATTAC TTTATGGCTGTAAATAATTTTTTGTCAGCATGATGTAATTGACTCTTGTTCTTCAATGCAACAAACATTATCCTCACACTTCTTTTGGATGCTAAGT AGCTTATATTATGTTATATTCATGTCATGTCTGACGCCTTCCATTTCATGATAATTTCAGGAGTTGGACATGATGATAACCTCCGCGCCTCCCCCAG AACCCAAGTATCAACTCCGGAGAACAAGTTCTGCCCCTGTTTGA
2.1. Use BLAST2Sequences to predict if any repetitive elements are present. Paste an image of your alignment in the homework and state what kind of repeat it is. Highlight the repeat region in your sequence in grey.
ANSWER 2.1
There is a MITE (miniature inverted-repeat transposable element) present in the first intron.
2.2. Do a blastn search against the est_others database with a limit to the organisms in the Triticeae tribe.
- Report the lowest E value and the number of alignments you find with this E value.
- Indicate the Maximum score and the Total score. How are these two measurements obtained?
- What % of the sequence is covered by the aligned bases?
- A gene is present in this DNA sequence. From your blastn search against the est_others database, how many exons would you predict this gene has?
- Highlight with different colors in the DNA sequence the exons of the gene, defining their borders based on your best alignments.
- In which tissues is it transcribed based on the EST profile viewer?
ANSWER 2.2
There are three alignments with an E value of 2e-123. These alignments have Max score = 448, Total score = 734 (the sum of 3 HSPs), and Query coverage = 15%. I would predict this gene has 3 exons.
Exon1, Exon2 and Exon3. >DNA TaFDL2_D ATGGCAGGCCCTTTCATGGCAAGTCATTTGGGTCCTCAGCCATTGTCTGTTGCTACAGGTGCTATCATGGAACCAATTTACCCGGATGGCCAGATTA CGTCACCAATGCTCGATGCACTTTCTGATCCTCAGACACCTAGGCGCAAGCGTGGTGCCTCAGATGGTGTAACTGATAAAGTCGTAGAAAGAAGGCA GAAGAGAATGATAAAAAACAGGGAATCAGCCGCACGCTCCAGAGCGAGGAAGCAGGTCTACAGCTATTGTCCTGTCTATTCCTGTTTTCCTACGCAT TGGTTATGTCCTGTTATAATGTTCTGATTGCAGCATCAAAGCTTTGTGAAAAGTTATTGTTGAATCTTGAATGTATTGATTTAGGCATTTCTGTTAC ACTTTTTAGCCTCAAATTGCAAGAATATCATTAGGTACTCCCTTCCGTTTCTTTTACTTCGCATATAAGATTTGTTTGAAGTCAAACTTCATAAAGT TTGAGCAAATTTATATTAAAAATATCAACATCTACAACACTAAAGTTATACAATATGAAAATTAATTTCATGATGCCTCTAATGATATTGATTTCGT ATTGCGAATGTTGATATTTTTTTCCTATAAAGTTGGTCAAACTTTATAAAGTTTGACTTCAGACAAAACTTATATGCAGAGTAAAAAGAAACGACTA TGCCACTGGAGTAGCCTATTGACTTATTCTCATTCCATGTCATTCTACTCTATTAATAAAAATATAGCAATAACCTAGTGCATTCTACTTGGCAGGC TTACACTAATGAGCTTGAAAACAAGGTGTCTCGTCTAGAAGAGGAGAACGAGAGGTTGAAGAAGCAGAAGGTACTTTGTCTTCTGTTACTGTAACAC TGTTACTTTTCGCCCCTTCTCTTCTTTTTGGTGGAGGGGATATTTAACTACAGCATTCCACTATGCATTGTAGTGAACCCAGGTTATTGTAACTATG TTTTTATTCAGGTAGTAAGCTATTTTGTTTCACACTAGTCGATCGTTCAACTTCTTTTTACTTAGGCTCAATAATATCATGTGTAACAAGTGCACTA TACTGATGTGATCATGTTAGTTAATTCCATAGTACAACCACCCACACAACTCATTTAGTATGATTTGCAGACCTGTATCTATATTGTCACTGCTGAA CCCCATATGATCATTGCTACTTTTTTTTGTCAAGTATTGTTGAGAAAACTATGTTAGTTACCATGCTGACAGACACATGCAGAGATCTCATTGTTTG TTTTTGATGTTCAAATATCTATTGGAGGCTATGTAATGAAAGTGGAGTCTCAAAGTTATTTTTCTTGCTTACACTAACTAAGTTTTGGCATTGACTT TGTGCTACACTTTTTTTACAAGGCCAAACATACCATAATGTATATAAACCAACTGAATGGGGCTAATTTTTGTTAACTTGAACTTAGTAATTTTTGG AAATATAAGCAGCTGTTCATTCGAATTTAGAGGAAACTCTTATGAAGTTCTTGAGCATCTTATTTAATTGAATATACCCTACTGAATGTATTGACCG ATGTACTTAGGCTATGCTTCTCAATATTTTGCATGGTGAAGTTGATATGTGCATGGCTACAACAATTTTAATGATATGGAAGCAGAAAAACTGCTAT TTAATGACTTGTAGTGGGGTTTGTCTATATAAGATATATAGTTGCAAGCCCATATGTTGTACTCTATTGTTTCGGTTGCTTCTTAGTAGTTCTTATT AGTTAACAATGCGAGTGATTTCAGCACTACATGCTATTCCTAATAATCAGTTTGTACGAAGAACTGCAGCATTAATTCATAGAACTTCATTGCCCTT CTCTACAGTGTAGCAATTCAGTAAGATGCTTGGCGCCCTTACGGCATTTGCTTTGGTTGATCAAAACTAAGTTGCTGCTGAGAATTGAATATATAGA 
TAGGTTATCTTGATGGACTTTTGTTTGAATCAATGAATCATGTCTCTATACTTGGCGAGTTAGCAGTGTAATAGAATTGCTTCCCCGTAGTCCTTTA TACAATTGACAAACCTGGAGTTACCCAACAGATTGCAGTCCTTCATGAAAGTAAAATAAACACATATCATGAACATTGACAGGAACCACACTTAACA AACAGATCACCATTGGAAATGAATACATTAGACTTTGTACTCACTAAATCGTAATCACGCGTGAAGCAACAGAGCACCGGGCAAAAGAAAACCACAG AGAGCATCTATGCTTGTGTTTTCTGTATTTTGGGTGCAATTCCATTTGGTCCTTGCAATGGGTGCCTGCCATTTATCTGTTAAGACCACATCACCTG AAGATGATGCTGGTGATATTAATCATATCTAGCAATATTTGCAGTCCTATTCTTTTTTAGCCTAACACAATGATACTGAGATTTCATTGGTCATTAC TTTATGGCTGTAAATAATTTTTTGTCAGCATGATGTAATTGACTCTTGTTCTTCAATGCAACAAACATTATCCTCACACTTCTTTTGGATGCTAAGT AGCTTATATTATGTTATATTCATGTCATGTCTGACGCCTTCCATTTCATGATAATTTCAGGAGTTGGACATGATGATAACCTCCGCGCCTCCCCCAG AACCCAAGTATCAACTCCGGAGAACAAGTTCTGCCCCTGTTTGA
FDL2 is expressed in the inflorescence (flower), the root, and the seed.
2.3. The following is the protein sequence of the gene present in the DNA sequence provided above.
>Protein TaFDL2
MAGPFMASHLGPQPLSVATGAIMEPIYPDGQITSPMLDALSDPQTPRRKRGASDGVTDKVVERRQKRMIKNRESAARSRARKQAYTNELENKVSRLE
EENERLKKQKELDMMITSAPPPEPKYQLRRTSSAPV*
- Use the appropriate blast program to perform an alignment between the DNA sequence and the protein sequence. Can you confirm the number of exons you had predicted in 2.2.? Improve the borders of the exons defined in 2.2. based on your alignment.
- Find the START codon (ATG), the STOP codon (TGA), and the splicing sites (5’ GT and 3’ AG) of the gene, and indicate them in the DNA sequence with bold red letters (the gene is in the 5’ -> 3’ orientation). Obtain the cDNA of the gene from the START codon to the STOP codon after eliminating the introns according to the splicing sites. Present the cDNA sequence in your homework. (Hint: to check if you correctly annotated the cDNA, try translating it. Your protein should match the one provided above.)
ANSWER 2.3
Yes, 3 exons are found by tblastn.
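The hint in 2.3 (remove the introns, translate the cDNA, and compare the result with the given protein) can be sketched in a few lines of Python. This is only an illustration: the function names `splice` and `translate` and the toy gene are hypothetical, the exon coordinates of the toy gene are made up, and the codon table is the standard genetic code. The two real fragments are the first 30 and the last 12 nucleotides of the TaFDL2_D sequence above.

```python
# Standard genetic code, built from the usual TCAG ordering of the codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def splice(genomic, exons):
    """Join exons given as 1-based, inclusive (start, end) coordinates."""
    return "".join(genomic[start - 1:end] for start, end in exons)


def translate(cdna):
    """Translate an in-frame coding sequence; '*' marks a stop codon."""
    cdna = cdna.upper()
    if len(cdna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cdna[i:i + 3]] for i in range(0, len(cdna), 3))


# Hypothetical toy gene: exon 1, an intron with GT...AG borders, exon 2.
TOY_GENE = "ATGGCA" + "GTAAGTTTTCAG" + "GGCTGA"
TOY_EXONS = [(1, 6), (19, 24)]  # made-up coordinates for the toy gene only

# Real fragments: start and end of the TaFDL2_D coding sequence.
CDS_START = "ATGGCAGGCCCTTTCATGGCAAGTCATTTG"
CDS_END = "GCCCCTGTTTGA"

if __name__ == "__main__":
    print(translate(splice(TOY_GENE, TOY_EXONS)))  # toy protein
    print(translate(CDS_START))  # should match the start of Protein TaFDL2
    print(translate(CDS_END))    # should match the end of Protein TaFDL2
```

With the real exon coordinates from the tblastn alignment in place of the toy ones, the same two calls would check the full cDNA against the TaFDL2 protein.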
6 ATGGCAGGCCCTTTCATGGCAAGTCATTTGGGTCCTCAGCCATTGTCTGTTGCTACAGGTGCTATCATGGAACCAATTTACCCGGATGGCCAGATTA CGTCACCAATGCTCGATGCACTTTCTGATCCTCAGACACCTAGGCGCAAGCGTGGTGCCTCAGATGGTGTAACTGATAAAGTCGTAGAAAGAAGGCA GAAGAGAATGATAAAAAACAGGGAATCAGCCGCACGCTCCAGAGCGAGGAAGCAGGTCTACAGCTATTGTCCTGTCTATTCCTGTTTTCCTACGCAT TGGTTATGTCCTGTTATAATGTTCTGATTGCAGCATCAAAGCTTTGTGAAAAGTTATTGTTGAATCTTGAATGTATTGATTTAGGCATTTCTGTTAC ACTTTTTAGCCTCAAATTGCAAGAATATCATTAGGTACTCCCTTCCGTTTCTTTTACTTCGCATATAAGATTTGTTTGAAGTCAAACTTCATAAAGT TTGAGCAAATTTATATTAAAAATATCAACATCTACAACACTAAAGTTATACAATATGAAAATTAATTTCATGATGCCTCTAATGATATTGATTTCGT ATTGCGAATGTTGATATTTTTTTCCTATAAAGTTGGTCAAACTTTATAAAGTTTGACTTCAGACAAAACTTATATGCAGAGTAAAAAGAAACGACTA TGCCACTGGAGTAGCCTATTGACTTATTCTCATTCCATGTCATTCTACTCTATTAATAAAAATATAGCAATAACCTAGTGCATTCTACTTGGCAGGC TTACACTAATGAGCTTGAAAACAAGGTGTCTCGTCTAGAAGAGGAGAACGAGAGGTTGAAGAAGCAGAAGGTACTTTGTCTTCTGTTACTGTAACAC TGTTACTTTTCGCCCCTTCTCTTCTTTTTGGTGGAGGGGATATTTAACTACAGCATTCCACTATGCATTGTAGTGAACCCAGGTTATTGTAACTATG TTTTTATTCAGGTAGTAAGCTATTTTGTTTCACACTAGTCGATCGTTCAACTTCTTTTTACTTAGGCTCAATAATATCATGTGTAACAAGTGCACTA TACTGATGTGATCATGTTAGTTAATTCCATAGTACAACCACCCACACAACTCATTTAGTATGATTTGCAGACCTGTATCTATATTGTCACTGCTGAA CCCCATATGATCATTGCTACTTTTTTTTGTCAAGTATTGTTGAGAAAACTATGTTAGTTACCATGCTGACAGACACATGCAGAGATCTCATTGTTTG TTTTTGATGTTCAAATATCTATTGGAGGCTATGTAATGAAAGTGGAGTCTCAAAGTTATTTTTCTTGCTTACACTAACTAAGTTTTGGCATTGACTT TGTGCTACACTTTTTTTACAAGGCCAAACATACCATAATGTATATAAACCAACTGAATGGGGCTAATTTTTGTTAACTTGAACTTAGTAATTTTTGG AAATATAAGCAGCTGTTCATTCGAATTTAGAGGAAACTCTTATGAAGTTCTTGAGCATCTTATTTAATTGAATATACCCTACTGAATGTATTGACCG ATGTACTTAGGCTATGCTTCTCAATATTTTGCATGGTGAAGTTGATATGTGCATGGCTACAACAATTTTAATGATATGGAAGCAGAAAAACTGCTAT TTAATGACTTGTAGTGGGGTTTGTCTATATAAGATATATAGTTGCAAGCCCATATGTTGTACTCTATTGTTTCGGTTGCTTCTTAGTAGTTCTTATT AGTTAACAATGCGAGTGATTTCAGCACTACATGCTATTCCTAATAATCAGTTTGTACGAAGAACTGCAGCATTAATTCATAGAACTTCATTGCCCTT CTCTACAGTGTAGCAATTCAGTAAGATGCTTGGCGCCCTTACGGCATTTGCTTTGGTTGATCAAAACTAAGTTGCTGCTGAGAATTGAATATATAGA 
TAGGTTATCTTGATGGACTTTTGTTTGAATCAATGAATCATGTCTCTATACTTGGCGAGTTAGCAGTGTAATAGAATTGCTTCCCCGTAGTCCTTTA TACAATTGACAAACCTGGAGTTACCCAACAGATTGCAGTCCTTCATGAAAGTAAAATAAACACATATCATGAACATTGACAGGAACCACACTTAACA AACAGATCACCATTGGAAATGAATACATTAGACTTTGTACTCACTAAATCGTAATCACGCGTGAAGCAACAGAGCACCGGGCAAAAGAAAACCACAG AGAGCATCTATGCTTGTGTTTTCTGTATTTTGGGTGCAATTCCATTTGGTCCTTGCAATGGGTGCCTGCCATTTATCTGTTAAGACCACATCACCTG AAGATGATGCTGGTGATATTAATCATATCTAGCAATATTTGCAGTCCTATTCTTTTTTAGCCTAACACAATGATACTGAGATTTCATTGGTCATTAC TTTATGGCTGTAAATAATTTTTTGTCAGCATGATGTAATTGACTCTTGTTCTTCAATGCAACAAACATTATCCTCACACTTCTTTTGGATGCTAAGT AGCTTATATTATGTTATATTCATGTCATGTCTGACGCCTTCCATTTCATGATAATTTCAGGAGTTGGACATGATGATAACCTCCGCGCCTCCCCCAG AACCCAAGTATCAACTCCGGAGAACAAGTTCTGCCCCTGTTTGA
>FDL2_cDNA
ATGGCAGGCCCTTTCATGGCAAGTCATTTGGGTCCTCAGCCATTGTCTGTTGCTACAGGTGCTATCATGGAACCAATTTACCCGGATGGCCAGATTA
CGTCACCAATGCTCGATGCACTTTCTGATCCTCAGACACCTAGGCGCAAGCGTGGTGCCTCAGATGGTGTAACTGATAAAGTCGTAGAAAGAAGGCA
GAAGAGAATGATAAAAAACAGGGAATCAGCCGCACGCTCCAGAGCGAGGAAGCAGGCTTACACTAATGAGCTTGAAAACAAGGTGTCTCGTCTAGAA
GAGGAGAACGAGAGGTTGAAGAAGCAGAAGGAGTTGGACATGATGATAACCTCCGCGCCTCCCCCAGAACCCAAGTATCAACTCCGGAGAACAAGTT
CTGCCCCTGTTTGA
2.4. Using the protein sequence of this gene, perform a blastp search.
- Report the lowest E value and the number of alignments you find with this E value.
- What is this gene? Which percentages of identity between your query protein sequence and the aligned proteins from the database support your conclusion? Report the accession.version numbers of the aligned proteins from the database.
- What is the conserved domain present in this protein? Using Shift+PrintScreen, present a picture of it in your homework.
ANSWER 2.4
The lowest E value is 3e-70 (Triticum aestivum).
This gene is a bZIP transcription factor.
Percent identity:
Triticum aestivum 99%
Rice 80%
Sorghum bicolor 82%
Zea mays 78%
The protein contains a conserved bZIP domain.

3.
10 points
The following is the protein sequence of the corn (Zea mays) orthologue of the wheat gene presented in 2:
>protein ZmFD
mssqtgggkesdagpgqhrqmqslarqgslynltldevqnhlgepllsmnfdellksvfpdgvdsdgavtgkpdrts
slqrqgsilmppqlskktvdevwkgiqggpetstvvdglqrrerhptlgemtledflvkagvvteglvkdsadfpsn
mdtagssvvvaaasslnpgaqwlqqyqqqvlgsqqlslagsymasqlrpqplsiatgatldsiysddqitspsfgal
sdpqtpgrkrgalgevvdkvverrqkrmiknresaarsrarkqaytnelenkvfrleeenkrlkkqqeldeilssap
ppepkyqlrrtgsaaf
3.1. Use the appropriate blast program to perform an alignment between these two protein sequences, FDL2 from wheat and ZmFD from corn.
- What is the percentage of identity between them?
ANSWER 3.1
78% identity.
3.2. Use one of the dynamic-programming methods shown in both Lecture2 and Lab2 to align both protein sequences.
- Would you perform a global or a local alignment?
- Which BLOSUM matrix would you use?
- Answer these questions and based on your answers run the alignment. Present your results reporting the length of the sequence aligned, similarity, identity, number of gaps, and final score.
ANSWER 3.2
Local. The proteins have different lengths, making a local alignment more appropriate in this case.
BLOSUM62, because the sequences share more than 30% identity.
Triticum vs Zea mays, local alignment: Identity 78.6%, Similarity 86.3%, Gaps 0%, Score 513.
4. 5 points
The following two sequences correspond to the same gene in both wheat and maize (60 million years of divergence). The sequences highlighted in green correspond to the exons and the ones in black text to the introns. START and STOP codons are bolded and highlighted. Splicing sites are bolded and in red.
>GeneA_Wheat GGAAGGGGAAATGGCCGGTAGGGATAGGGACCCGCTGGTGGTTGGCAGGGTTGTGGGGGACGTGCTGGACCCCTTCGTCCGGACCA CCAACCTCAGGGTGACCTTCGGGAACAGGACCGTGTCCAACGGCTGCGAGCTCAAGCCGTCCATGGTCGCCCAGCAGCCCAGGGTT GAGGTGGGCGGCAATGAGATGAGGACCTTCTACACACTCGTACGTACACAGTCACTATCTAATGCCAATTTATCTCTGAAAGTGCT CACCACACGCACATGATCGATCGAGCTCGATCTATAGTACGTGAGGGAAATTGATTTTCGATGCTTCTGTTCACATGTTTGCCTCA GCAAGCACATGACTAATGCTCCATCTTGCATATGTCTCTGTGCCCTCTGGTGTTGATCATGATTTTTCTATGCTTCTTCTATGTTC GGGGAGCATTTATTTTTTATGCTTCTCTTGACATGTTTCATGTTTGTCCTAGCAAGCACACGAGTAATTAAAGCTCGATCTTAAAT ACTCTCTCCGTCCGAATAAATGTACTTCTAGCTTTTGTCTTAAGTCAAAGTTTTAAAATTTTGACCAACTTTATAGGAAAAAGTAG CAGCATTTATGACACTAAATTAGTATCACTAGATTCGTTTTGAAATGTATTTTCATAATATATCAATTTGATATTATATATGTTAC TACTTATTTGTATATAGTTGGTCAAAGTTTTAAAACTTTGACTTAGGATAAAAACTAGAAGTACACTTATTCGTGGACGGAGGGAG TATATGCTTATGTAGGTAGTACTCTCTACTTTGATCATGATGTGCACGCGTTTACTGCCCGCAGGTGATGGTAGACCCAGATGCTC CAAGTCCAAGCGATCCCAACCTTAGGGAGTATCTCCACTTGAAAGTACTAAA >GeneA_ZeaMays ATAGATCGACATGGCCGGCAGGGACAGGGAGCCGCTGGTGGTTGGTAGGGTGGTCGGCGACGTGCTGGACCCCTTCGTCCGGACCA CCAACCTCAGGGTCAGCTACGGGGCCAGGACCGTGTCCAACGGCTGCGAGCTCAAGCCGTCCATGGTGGTGCACCAGCCAAGGGTC GAGGTCGGGGGACCTGACATGAGGACCTTCTACACCCTCGTACGTGTATATATATATATATATATACACGTCGTCGTTTTACTTCT CTTCATTGGCTAGCTACTTAGCTAGCTAGCTAGCTTATAATGATAGCCCGTTGATCCATCGAGATATGATCGTACGTGATGCCATG CATGCTTCGTTCTTCAGGTGATGGTGGACCCAGATGCTCCGAGCCCAAGCGACCCGAACCTTAGGGAGTACCTACACTTGAAAAAT CTATA 9 4.1. Use Dotter and BLAST2SEQUENCES to align both gene sequences. - Using Shift+PrintScreen, present a plot of the alignments in your homework. - Report the Dotter parameters used (window size and stringency). - What are the conserved parts between the wheat and rice genes? ANSWER 4.1 The exons are conserved. DOTTER: Window Size 22, Stringency: 40/100 BLAST2SEQUENCES 10 5. 10 points Using the following sequence: >Hv_DNA tgaggcctaagagtaccaaatcccttctgaaaattgtggtttgagaagtgggttttagagaagaagaagaagatgat gaaaagtcccttctgtgttcggatttggtttggttaaggattacctttgtactgttgttgttgttgttgttgttgtt cctgatgaga 5.1. 
Use Dotter to align the sequence with itself.
- Using Shift+PrintScreen, present a plot of the alignment in your homework.
- Report the Dotter parameters used (window size and stringency).
- What kind of repeats are you observing? Indicate their approximate coordinates.
- What is this sequence? (Look at the coordinates of the best alignment against the database).
ANSWER 5.1
Stringency: 38/68, Window: 21.
Microsatellite direct repeats at approximately 55-80 bp and 120-150 bp.
Accession DQ865365, Hordeum vulgare subsp. vulgare clone B microsatellite CrtSSR47 sequence.
6. 15 points
Using the two following scoring matrices, calculate manually the scores for the following alignments:
6.1. Scoring matrix A: Match 2, mismatch -1, open gap -7, extended gap -1 (affine gap penalty)
6.2. Scoring matrix B: Match 2, mismatch -1, gap -2 per each bp (linear gap penalty)
- Which alignment is better under each scoring matrix?
- What is the effect of affine versus linear gap penalties on the number of gaps introduced in an alignment?
Alignment 1
CAGTCCGATGTGGCGG
|| ||   | | || |
CACTCA--TTTCGC-G
Alignment 2
CAGTCCGATGTGGCGG
||| || |   ||| |
CAGACCAA---GGCAG
ANSWER 6.1
Alignment 1: 2+2-1+2+2-1-7-1+2-1+2-1+2+2-7+2 = -1
Alignment 2: 2+2+2-1+2+2-1+2-7-1-1+2+2+2-1+2 = 8
Alignment 2 is better.
ANSWER 6.2
Alignment 1: 2+2-1+2+2-1-2-2+2-1+2-1+2+2-2+2 = 8
Alignment 2: 2+2+2-1+2+2-1+2-2-2-2+2+2+2-1+2 = 11
Alignment 2 is better.
Affine gap penalties make opening a gap much more expensive than extending one, so they favor fewer, longer gaps: the margin in favor of Alignment 2 (one gap) over Alignment 1 (two gaps) grows from 3 points under the linear penalty to 9 points under the affine penalty.
7. 5 points
Using Boolean operators perform the following ENTREZ searches and report the number of Nucleotides and ESTs found:
- Containing ‘bZip’: 6701 Nucleotides, 939 ESTs
- Containing ‘bZip’ and related family words (using truncation, *): 6741 Nucleotides, 952 ESTs
- Containing both ‘bZip’ and ‘transcription factor’: 5980 Nucleotides, 598 ESTs
- Containing either ‘bZip’ or ‘transcription factor’: 69499 Nucleotides, 56341 ESTs
- Containing both ‘bZip’ and ‘transcription factor’ in rice: 305 Nucleotides, 11 ESTs
- Containing both ‘bZip’ and ‘transcription factor’ in rice but not in Arabidopsis: 139 Nucleotides, 8 ESTs
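The manual scores in Answers 6.1 and 6.2 can be checked with a short Python script. This is a minimal sketch: the function name `score_alignment` and its parameter names are hypothetical, and it assumes, as the manual sums do, that the first position of each gap costs the gap-opening penalty and every further position costs the extension penalty. A linear penalty is the special case where both values are equal.

```python
def score_alignment(top, bottom, match=2, mismatch=-1,
                    gap_open=-2, gap_extend=-2):
    """Score a fixed pairwise alignment column by column.

    The first column of a gap run costs gap_open and each following
    column costs gap_extend. For simplicity a run is any stretch of
    consecutive gap columns, regardless of which sequence has the gap.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    in_gap = False
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap_extend if in_gap else gap_open
            in_gap = True
        else:
            total += match if a == b else mismatch
            in_gap = False
    return total


QUERY = "CAGTCCGATGTGGCGG"
ALIGNMENTS = {
    "Alignment 1": "CACTCA--TTTCGC-G",
    "Alignment 2": "CAGACCAA---GGCAG",
}

if __name__ == "__main__":
    for name, subject in ALIGNMENTS.items():
        # Scoring matrix A: affine penalty (open -7, extend -1).
        affine = score_alignment(QUERY, subject, gap_open=-7, gap_extend=-1)
        # Scoring matrix B: linear penalty (-2 per gap position).
        linear = score_alignment(QUERY, subject, gap_open=-2, gap_extend=-2)
        print(name, "affine:", affine, "linear:", linear)
```

Under these assumptions the script gives -1 and 8 for scoring matrix A (the terms listed for Alignment 1 in Answer 6.1 add up to -1) and 8 and 11 for scoring matrix B, so Alignment 2 scores higher in both cases.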