BIT150 – Fall 2008 – Homework 2 KEY
Due on Thursday October 9th by email to TA: mfaricelli@ucdavis.edu as Hwk2_Lastname BEFORE the Lab

1. 15 points
Using the DNA sequence presented below:
>DNA Tm322N9
CGGAATTATTATTTAATTGGTCAGATTTATTGTTTCTATTCAGACAGATGGTTTCAGCAATACTTTTTGTGTGACTTTTTTGCATGTGATGACACCG
TCTCCGAGGGCCGTCACCACCCCCAGACTCCTAGAGTAGAAGTCACCTGCAAGATACCTGGGTGTCAGTTATGTGCACGTGAACTGAGATGCTTGCA
GTCAAAAGAGATGAGTGTTGCCAGTTGATGCTTATTCTGACACCGGCAACGAGATGATTCACAACCTGCAAGCATTCAATCAAGAAGAGTAAACAGG
TATGGAACCGTGAACACTGCAAAAACAATTATGTTTTCTCATTAATGTATGATAAACTGATGCTATGAGATATTTTCTTGCTGTCTGATTACCATTT
GATGGAACCTTCACTATTATCAGTTGGGAAACAAACCTGTTGTTTACGTCACTTTGAGGCTGGAAACTGGAGTTGTGAGCTGCATAGTCGATGCAGT
TGATGCTTATTCTGACACCGGCAACGACATGATTCACCACCTGCAAGCATTCATTCAAGAAGAGTAAAGAATTTGGGGATGACAAATCGACCTAAAC
AGGTATTGGGTGCTCCGTTGTAAAATTCATTGTTCTCCGTC

1.1. Do a blastn search against the nucleotide collection database.
- Report the lowest E value and calculate the probability of finding an alignment with this E value by chance (P = 1 - e^(-E)).
- Can you conclude that your finding is NOT just a random alignment?

ANSWER 1.1.
The lowest E value is 0.16
P = 1 - e^(-0.16)
P = 1 - 0.852
P = 0.148
With this probability, I can NOT conclude that my finding is NOT just a random alignment. The lowest E value (E = 0.16) is the expected number of alignments with scores equivalent to or better than the Bit Score occurring by chance in the database search, and it is relatively high. As a consequence, the probability of finding at least one high-scoring segment pair with the Bit Score by chance (P = 0.148) is also high. (Remember: the suggested BLAST cutoff for nucleotide searches is E < 1e-10 to consider that your finding is NOT a random alignment).

1.2. Repeat your blastn search but now against the est_others database.
- Report the lowest E value. Can you conclude now that your finding is NOT just a random alignment?
- Are all these EST sequences present in the nucleotide collection database?
- Click on the ‘Distance tree of results’ link that appears at the top of your table of ‘Sequences producing significant alignments’. Using Shift+PrintScreen, include the picture of the tree in your homework.

ANSWER 1.2.
The lowest E value is 8e-18
With this expected value, I can now conclude that my finding is NOT just a random alignment (8e-18 < 1e-10).
These EST sequences from the EST database are NOT present in the nucleotide collection database.
Distance Tree of results:

1.3. This sequence is from cultivated diploid wheat, Triticum monococcum L., a species that belongs to the Triticeae tribe within the Poaceae (grass) family. Repeat your blastn search using the tribe as the limit by Organism.
- How many alignments did you find? Report their accession.version numbers. Open their flat files and indicate the NCBI division to which they belong.
- Report the lowest E value. Is this a lower or a higher E value than the one obtained in 1.2.?

ANSWER 1.3.
I found only 3 alignments:
CJ653306.1
BE515575.1
BE444040.1
Since the search was done against the est_others database, all three of them belong to the EST NCBI division.
The lowest E value is 3e-19, and this is a LOWER value than the one obtained in 1.2. (8e-18).

2.
30 points Using the DNA sequence presented below: >DNA Fop1 Tm322N9 TTCCATCGCGCCACCAACTGATGTGAATCGTTTACCTGTATTTATGTGCATGCGCCCATATTTATGCGCAATCGGCCACACACTGCACTGCACAATA CTCCTACCTGCAACAAACAAAGAAACCTAGTAGCAGCTAACCAAACCATGGACCACAGCGTGCTTCTCCTGCTCGCCTCCTTGGCCGCAGTCGCCGT CGCGGCTGTCTGGCACCTCCGAAGCCATGGCAGACGAACAAAGCTGCCTCTGCCGCCGGGGCCGAGGGGTTGGCCGGTGCTGGGCAACCTGCCGCAG CTAGGGGCCATGCCGCATCACACCATGGCTGCTCTCGCCCGCCAGCATGGCCCCCTCTTCCGCCTCCGCTTCGGCAGCGTCGAGGTCGTCGTCGCAG CGTCGGCCAAGGTCGCCCGCAGCTTCCTCCGCGCGCACGACGCCAACTTCAGCGACCGCCCGCCTACCTCCGGCGCCGAGCACCTCGCCTACAACTA CCAGGACCTCGTCTTCGCGCCCTATGGCGCCCGCTGGCGCGCCCTCCGCAAGCTCTGCGCGCTCCACCTCTTCTCCGCCCGTGCCCTCGACGCCCTC CGCACCATACGGCAGGACGAGGCCCGACTCATGGTCACGCACTTGCTCTCTTCCTCCTCGCCGGCCGGGGTGGCGGTCAACCTGTGCGCCATCAACG TGTGTGCTACCAACGCGCTGGCACGGGCCGCCATCGGGAGGCGCATGTTCGGCGACGGCGTCGGCGAGGGTGCCAGGGAGTTCAAGGACATGGTGGT CGAGCTCATGCAGCTCGCCGGCGTCCTCAATATCGGCGACTTCGTGCCCGCGCTCCGCTGGCTTGACCCGCAGGGCGTCGTCGCCAAGATGAAGAGG CTGCACCGCCGCTACGACCGCATGATGGACGGCTTCATCAGCGAGAGGGGCCAGCATGCCGGAGAGATGGAAGGGAACGACCTGCTGAGCGTGATGC TGGCGACGATGCGGTGGCAGTCGCCCGCAGATGCCGGCGAAGAGGACGGGATCAAGTTCACCGAGATTGACATCAAGGCTCTCCTCCTGGTATGCAC AAATTGTTACATGCCCATTTGTTTGGCCATTCATATTTTGTACGTCTAGGTAAGGTATTTGTTGATGTCAAGTCAAAGATTTTGGATTGTCATAGCT ATATTTTTCATTTTAATTAATGGGATACAAATATTGGTTCTTTTAGAATTTATTCACGGCCGGGACAGACACGACGTCGAGCACAGTGGAGTGGGCG CTGGCAGAGCTCATACGAGACCCTTGCATCCTCAAGCAGCTGCAGCACGAGCTCGATGGCGTAGTGGGAAATGACCGTCTTGTCACGGAAGCCGACC TGCCACGCCTCACTTTCCTCGCCGCCGTCATCAAGGAGACATTCCGTCTACACCCGGCAACGCCGCTCTCCCTTCCCCGGGTGGCCGCTGAGGACTG CGAGGTAGACGGCTACCATGTTTCCAAGGGCACCACCCTCATCATGAACGTGTGGGCCATCGCCCGTGACCCGGCCTCATGGGGCCCCGACCCATTG GAGTTCCGGCCGGTCCGCTTCCTCCCGGGCGGATTGCATGAGAGCGCGGATGTGAAGGGGGGCGACTATGAGCTCATCCCGTTTGGGGCGGGTCGGA GGATATGCGCAGGCCTCGGCTGGGGCCTTCGGATGGTGACACTCATGACTGCCATGCTGGTGCACGCATTCGACTGGTCCTTGGTTGATGGAACGAC GCCCGAAAAACTTAACATGGAGGAGGCCTATGGTCAGACCCTGCAAAGGGCCGTGCCTCTAGTGGTTCAGCCTGTGCCTAGGTTGTTGTCGTCGGCG TACACAGTGTGACGCATGTTTTATCA 2.1. 
Do a blastn search against the est_others database.
- Report the lowest E value and the number of alignments you find with this E value.
- A gene is present in this DNA sequence. From your blastn search against the est_others database, how many exons would you predict this gene has?
- Highlight with different colors in the DNA sequence the exons of the gene, defining their borders based on your best alignments.

ANSWER 2.1.
The lowest E value is 0, and there are 7 alignments with an E value = 0.
I would predict this gene has 2 exons: Exon1 and Exon2.

2.2. The following is the protein sequence of the gene present in the DNA sequence provided above.
>Protein Fop1 Tm322N9
MDHSVLLLLASLAAVAVAAVWHLRSHGRRTKLPLPPGPRGWPVLGNLPQLGAMPHHTMAALARQHGPLFRLRFGSVEVVVAASAKVARSFLRAHDAN
FSDRPPTSGAEHLAYNYQDLVFAPYGARWRALRKLCALHLFSARALDALRTIRQDEARLMVTHLLSSSSPAGVAVNLCAINVCATNALARAAIGRRM
FGDGVGEGAREFKDMVVELMQLAGVLNIGDFVPALRWLDPQGVVAKMKRLHRRYDRMMDGFISERGQHAGEMEGNDLLSVMLATMRWQSPADAGEED
GIKFTEIDIKALLLNLFTAGTDTTSSTVEWALAELIRDPCILKQLQHELDGVVGNDRLVTEADLPRLTFLAAVIKETFRLHPATPLSLPRVAAEDCE
VDGYHVSKGTTLIMNVWAIARDPASWGPDPLEFRPVRFLPGGLHESADVKGGDYELIPFGAGRRICAGLGWGLRMVTLMTAMLVHAFDWSLVDGTTP
EKLNMEEAYGQTLQRAVPLVVQPVPRLLSSAYTV*
- Use the appropriate blast program to perform an alignment between the DNA sequence and the protein sequence. Can you confirm the number of exons you had predicted the gene has in 2.1.? Improve the borders of the exons defined in 2.1. based on your alignment.
- Find the START codon (ATG), the STOP codon (TGA), and the splicing sites (5’ GT and 3’ AG) of the gene, and indicate them in the DNA sequence with bold red letters (the gene is in the 5’ -> 3’ orientation). Obtain the cDNA of the gene from the START codon to the STOP codon after eliminating the introns according to the splicing sites. Present the cDNA sequence in your homework.

ANSWER 2.2.
I used the program blastx to align the DNA sequence with the protein sequence.
I can confirm that the number of exons the gene has is 2.
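The cDNA requested in 2.2. is obtained by joining the exons once the splice sites are known. The following is a minimal Python sketch of that step, not part of the original key. The function name splice_cdna, the toy sequence, and the exon coordinates are all made up for illustration; they are not the Fop1 gene or its real coordinates.

```python
def splice_cdna(genomic, exons):
    """Join exons given as 1-based, inclusive (start, end) pairs.

    Raises ValueError if the intron between two consecutive exons does not
    start with GT and end with AG (the splice sites named in question 2.2).
    """
    genomic = genomic.upper()
    # Each intron lies between the end of one exon and the start of the next.
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        intron = genomic[prev_end:next_start - 1]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            raise ValueError(
                f"intron {prev_end + 1}-{next_start - 1} is not GT...AG"
            )
    return "".join(genomic[start - 1:end] for start, end in exons)


# Hypothetical toy gene: 5' flank, exon 1, intron (GT...AG), exon 2, 3' flank.
toy_genomic = "CC" + "ATGGCTCTG" + "GTATGCTTTTAG" + "AATTTATGA" + "CC"
toy_exons = [(3, 11), (24, 32)]  # hand-picked for the toy sequence only

toy_cdna = splice_cdna(toy_genomic, toy_exons)
print(toy_cdna)  # ATGGCTCTGAATTTATGA (starts with ATG, ends with TGA)
```

For the real gene, the exon coordinates would come from the blastx alignment, with exon 1 starting at the ATG and exon 2 ending at the TGA.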
4 >DNA Fop1 Tm322N9 TTCCATCGCGCCACCAACTGATGTGAATCGTTTACCTGTATTTATGTGCATGCGCCCATATTTATGCGCAATCGGCCACACACTGCACTGCACAATA CTCCTACCTGCAACAAACAAAGAAACCTAGTAGCAGCTAACCAAACCATGGACCACAGCGTGCTTCTCCTGCTCGCCTCCTTGGCCGCAGTCGCCGT CGCGGCTGTCTGGCACCTCCGAAGCCATGGCAGACGAACAAAGCTGCCTCTGCCGCCGGGGCCGAGGGGTTGGCCGGTGCTGGGCAACCTGCCGCAG CTAGGGGCCATGCCGCATCACACCATGGCTGCTCTCGCCCGCCAGCATGGCCCCCTCTTCCGCCTCCGCTTCGGCAGCGTCGAGGTCGTCGTCGCAG CGTCGGCCAAGGTCGCCCGCAGCTTCCTCCGCGCGCACGACGCCAACTTCAGCGACCGCCCGCCTACCTCCGGCGCCGAGCACCTCGCCTACAACTA CCAGGACCTCGTCTTCGCGCCCTATGGCGCCCGCTGGCGCGCCCTCCGCAAGCTCTGCGCGCTCCACCTCTTCTCCGCCCGTGCCCTCGACGCCCTC CGCACCATACGGCAGGACGAGGCCCGACTCATGGTCACGCACTTGCTCTCTTCCTCCTCGCCGGCCGGGGTGGCGGTCAACCTGTGCGCCATCAACG TGTGTGCTACCAACGCGCTGGCACGGGCCGCCATCGGGAGGCGCATGTTCGGCGACGGCGTCGGCGAGGGTGCCAGGGAGTTCAAGGACATGGTGGT CGAGCTCATGCAGCTCGCCGGCGTCCTCAATATCGGCGACTTCGTGCCCGCGCTCCGCTGGCTTGACCCGCAGGGCGTCGTCGCCAAGATGAAGAGG CTGCACCGCCGCTACGACCGCATGATGGACGGCTTCATCAGCGAGAGGGGCCAGCATGCCGGAGAGATGGAAGGGAACGACCTGCTGAGCGTGATGC TGGCGACGATGCGGTGGCAGTCGCCCGCAGATGCCGGCGAAGAGGACGGGATCAAGTTCACCGAGATTGACATCAAGGCTCTCCTCCTGGTATGCAC AAATTGTTACATGCCCATTTGTTTGGCCATTCATATTTTGTACGTCTAGGTAAGGTATTTGTTGATGTCAAGTCAAAGATTTTGGATTGTCATAGCT ATATTTTTCATTTTAATTAATGGGATACAAATATTGGTTCTTTTAGAATTTATTCACGGCCGGGACAGACACGACGTCGAGCACAGTGGAGTGGGCG CTGGCAGAGCTCATACGAGACCCTTGCATCCTCAAGCAGCTGCAGCACGAGCTCGATGGCGTAGTGGGAAATGACCGTCTTGTCACGGAAGCCGACC TGCCACGCCTCACTTTCCTCGCCGCCGTCATCAAGGAGACATTCCGTCTACACCCGGCAACGCCGCTCTCCCTTCCCCGGGTGGCCGCTGAGGACTG CGAGGTAGACGGCTACCATGTTTCCAAGGGCACCACCCTCATCATGAACGTGTGGGCCATCGCCCGTGACCCGGCCTCATGGGGCCCCGACCCATTG GAGTTCCGGCCGGTCCGCTTCCTCCCGGGCGGATTGCATGAGAGCGCGGATGTGAAGGGGGGCGACTATGAGCTCATCCCGTTTGGGGCGGGTCGGA GGATATGCGCAGGCCTCGGCTGGGGCCTTCGGATGGTGACACTCATGACTGCCATGCTGGTGCACGCATTCGACTGGTCCTTGGTTGATGGAACGAC GCCCGAAAAACTTAACATGGAGGAGGCCTATGGTCAGACCCTGCAAAGGGCCGTGCCTCTAGTGGTTCAGCCTGTGCCTAGGTTGTTGTCGTCGGCG TACACAGTGTGACGCATGTTTTATCA >cDNA Fop1 Tm322N9 
ATGGACCACAGCGTGCTTCTCCTGCTCGCCTCCTTGGCCGCAGTCGCCGTCGCGGCTGTCTGGCACCTCCGAAGCCATGGCAGACGAACAAAGCTGC CTCTGCCGCCGGGGCCGAGGGGTTGGCCGGTGCTGGGCAACCTGCCGCAGCTAGGGGCCATGCCGCATCACACCATGGCTGCTCTCGCCCGCCAGCA TGGCCCCCTCTTCCGCCTCCGCTTCGGCAGCGTCGAGGTCGTCGTCGCAGCGTCGGCCAAGGTCGCCCGCAGCTTCCTCCGCGCGCACGACGCCAAC TTCAGCGACCGCCCGCCTACCTCCGGCGCCGAGCACCTCGCCTACAACTACCAGGACCTCGTCTTCGCGCCCTATGGCGCCCGCTGGCGCGCCCTCC GCAAGCTCTGCGCGCTCCACCTCTTCTCCGCCCGTGCCCTCGACGCCCTCCGCACCATACGGCAGGACGAGGCCCGACTCATGGTCACGCACTTGCT CTCTTCCTCCTCGCCGGCCGGGGTGGCGGTCAACCTGTGCGCCATCAACGTGTGTGCTACCAACGCGCTGGCACGGGCCGCCATCGGGAGGCGCATG TTCGGCGACGGCGTCGGCGAGGGTGCCAGGGAGTTCAAGGACATGGTGGTCGAGCTCATGCAGCTCGCCGGCGTCCTCAATATCGGCGACTTCGTGC CCGCGCTCCGCTGGCTTGACCCGCAGGGCGTCGTCGCCAAGATGAAGAGGCTGCACCGCCGCTACGACCGCATGATGGACGGCTTCATCAGCGAGAG GGGCCAGCATGCCGGAGAGATGGAAGGGAACGACCTGCTGAGCGTGATGCTGGCGACGATGCGGTGGCAGTCGCCCGCAGATGCCGGCGAAGAGGAC GGGATCAAGTTCACCGAGATTGACATCAAGGCTCTCCTCCTGAATTTATTCACGGCCGGGACAGACACGACGTCGAGCACAGTGGAGTGGGCGCTGG CAGAGCTCATACGAGACCCTTGCATCCTCAAGCAGCTGCAGCACGAGCTCGATGGCGTAGTGGGAAATGACCGTCTTGTCACGGAAGCCGACCTGCC ACGCCTCACTTTCCTCGCCGCCGTCATCAAGGAGACATTCCGTCTACACCCGGCAACGCCGCTCTCCCTTCCCCGGGTGGCCGCTGAGGACTGCGAG GTAGACGGCTACCATGTTTCCAAGGGCACCACCCTCATCATGAACGTGTGGGCCATCGCCCGTGACCCGGCCTCATGGGGCCCCGACCCATTGGAGT TCCGGCCGGTCCGCTTCCTCCCGGGCGGATTGCATGAGAGCGCGGATGTGAAGGGGGGCGACTATGAGCTCATCCCGTTTGGGGCGGGTCGGAGGAT ATGCGCAGGCCTCGGCTGGGGCCTTCGGATGGTGACACTCATGACTGCCATGCTGGTGCACGCATTCGACTGGTCCTTGGTTGATGGAACGACGCCC GAAAAACTTAACATGGAGGAGGCCTATGGTCAGACCCTGCAAAGGGCCGTGCCTCTAGTGGTTCAGCCTGTGCCTAGGTTGTTGTCGTCGGCGTACA CAGTGTGA 2.3.Using the protein sequence of this gene, perform a blastp search. - Report the lowest E value and the number of alignments you find with this E value. - What is this gene? Which percentages of identity between your query protein sequence and the aligned proteins from the database support your conclusion? Report the accession.version numbers of the aligned proteins from the database. 
- What is the conserved domain present in this protein? Using Shift+PrintScreen, present a picture of it in your homework.

ANSWER 2.3.
The lowest E value is 0, and there are 6 alignments with an E = 0.
This gene is likely a flavonoid 3’-hydroxylase.
NP_001064338.1 AAM00948.1|AC021892_12 AAN04937.1 AAP52914.1 BAF26252.1 EAY78007.1 EAZ15637.1: 72% identity, Rice
ABG54319.1: 71% identity, Sorghum
ABG54321.1: 70% identity, Sorghum
ABG54320.1 AAV74194.1 AAV74195.1: 69% identity, Sorghum
ACF85998.1: 73% identity, Maize
The conserved domain present in the protein is Cytochrome P450.

3. 10 points
The following is the protein sequence of the rice (Oryza sativa L.) orthologue of the wheat gene presented in 2.:
>Protein Fop1 Rice
MDVVPLPLLLGSLAVSAAVWYLVYFLRGGSGGDAARKRRPLPPGPRGWPVLGNLPQLGDKPHHTMCALARQYGPLFRLRFGCAEVVVAASAPVAAQF
LRGHDANFSNRPPNSGAEHVAYNYQDLVFAPYGARWRALRKLCALHLFSAKALDDLRAVREGEVALMVRNLARQQAASVALGQEANVCATNTLARAT
IGHRVFAVDGGEGAREFKEMVVELMQLAGVFNVGDFVPALRWLDPQGVVAKMKRLHRRYDNMMNGFINERKAGAQPDGVAAGEHGNDLLSVLLARMQ
EEQKLDGDGEKITETDIKALLLNLFTAGTDTTSSTVEWALAELIRHPDVLKEAQHELDTVVGRGRLVSESDLPRLPYLTAVIKETFRLHPSTPLSLP
REAAEECEVDGYRIPKGATLLVNVWAIARDPTQWPDPLQYQPSRFLPGRMHADVDVKGADFGLIPFGAGRRICAGLSWGLRMVTLMTATLVHGFDWT
LANGATPDKLNMEEAYGLTLQRAVPLMVQPVPRLLPSAYGV*

3.1. Use the appropriate blast program to perform an alignment between these two protein sequences, Fop1 from wheat and Fop1 from rice.
- What is the percentage of identity between them?

ANSWER 3.1.
I used the program blastp to align both protein sequences. They are 72% identical.

3.2. Use one of the dynamic-programming methods shown in both Lecture2 and Lab2 to align both protein sequences.
- Would you perform a global or a local alignment?
- Which BLOSUM matrix would you use?
Answer these questions and based on your answers run the alignment. Present your results reporting the length of the sequence aligned, similarity, identity, number of gaps, and final score.

ANSWER 3.2.
I would perform a global (needle) alignment.
I would use the BLOSUM62 matrix, since the proteins are 72% identical. Including gaps, 535 amino acids were included in the global alignment between both proteins. They resulted to be 79.1% similar, and 72.0% identical, and to have 25 (4.7%) gaps, and a final score of 1903.0. 4. 10 points The following two sequences correspond to the same gene in both wheat and rice (50 million years of divergence). The sequences in pink correspond to the exons and the ones in black to the introns. START and STOP codons are bolded and highlighted. Splicing sites are bolded and in red. >CyB5 Wheat CGAGAGCGAGATGCCGACGCTGACGAAGCTGTACAGCATGAAGGAGGCCGCCCTCCACAACACCCCCGACGACTGCTGGATCGTCGTCGACGGCAAG GTAGCGCCTCCCTCATACCCCTCGCCGCCGATCTGGCTTCAGCAATACTGCCCCTAACATCGGTAGGTAGGTAGGTAGGGTGTATGGACGCGCTTCG TCGTTGCTAGTTGGGCTTCGACCCCCGCCCGTAGCCTGTTCGACCGAATGCCTGGGAGATCCTGCGCTCGCTGTGTTAGTGAGAAGGCCGCAGAAAT CGAAACCTGCTAGTCTAGGCACCAACGCTAAGGTTTGATCCTCGTGGGACAACTGTGCTGGGGTATCCTGTTTGTGGAGGTTGTGCTTGAAAGCAAC TACAGCAGATGCCTCATACTGAGGGCTTTGAATCAAATAGAATTTGTGTCAGCAGAGAGTAGATGCGCATTGCAGTACTCCTACTTGGCAATATGTT CCACTATTCTGATTGTGTGGAGATCTCATGCCGTGTTGATGGATACATTGCAGATTTATGATGTGACTGCGTATTTGGACGACCATCCTGGGGGTGC TGATGTTCTCCTTGGGGTGACCGGTACTTCTTCTCTCCGCTTCTTTTCATGTTCTTGTTCAGCACATTTTATTCTCTCTTAGGCTGAATGCTCATGT ATGATAATCCGTTTGAAGGTATGGATGGCACCGAGGAATTTGAAGATGCAGGCCACAGCAAGGATGCCAAGGAGTTGATGAAGGATTACTTCATTGG GGAGTTGGACTTGGACGAAACACCTGACATGCCTGAGATGGAGGTTTTCAGGAAAGAGCAGGACAAGGACTTCGCCAGCAAGCTGGCGGCTTATGCT GTGCAGTACTGGGCCATTCCGGTAGCAGCAGTCGGGATATCAGCCGTGGTTGCCATATTGTATGCACGAAGGAAGTGA >Cyb5 Rice GGAGGAGGAGATGCCGACGCTGACGAAGCTGTACAGCTTGGAGGACGCGGCGCGCCACAACACCGCCGACGACTGCTGGGTCGTCGTCGACGGCAAG GTAAGCTTTCCCCATCTTAGCTCTCCTCCGTTCCTTCGCTCCCCATCTTAGCTCTCCTCGTTGCTGCTGAAGTAGCAGTAGCACGTGTAACGGTGTA AGGTCGGGAGATAGATGGGTGGGTGGATTGGTAGGGGGTGCGACCGTGCGAAGCTCGCTGCTCGCTCGGTCAAGATGTCGCCCGTAACCTGTTCGAC GGAATGGCTACTAGATCGCGTGCTCGATTTCTTTGTGCTAAACTGCAATTTACCATCTTGCGATGCAGTAGTGGTATTTGTTGTCAGGCGACTAGTC 
AGGAGTAGTGATTTAATGCGCTGTGGTTATAGTGCGGGCTATCATTCTTTCTTGTGGAAACCCGTCGTATTTACCTGCATTGAACTATTGAAGGCTA
TGGTCAAATTGTTTGCTAGGGTCACTAAAGAATTAGAGATCTGATGCATGGCTACATGTTACGTTGTTCTTACCTACTATTCAGACAAGTTCATGCT
GTGTCAATGAATGCGCTGCAGATTTATGATGTCACCAAGTATCTGGACGACCATCCTGGGGGTGCTGATGTTCTGCTCGAAGTGACCGGTACTGATA
ACCCTCCATTAATCTTATGTTTCTTTTTTCAGTAATACCTAGTTTATTTAGGTGGACTGATCATATCTGATTGTCTGTTATAAGGTAAGGATGCCAA
GGAGGAATTTGATGATGCGGGGCACAGCGAGAGTGCCAAGGAGCTAATGCAAGATTATTTCATTGGGGAGTTGGATCCAACACCCAACATCCCTGAG
ATGGAGGTTTTCAGGAAGGAGCAGGATGTGAACTTCGCAAGCAAGCTGATGGCCAATGCAGCACAGTACTGGCCCATTCCAGCGACAGTAGTCGGGA
TATCAGTCGTTATTGCTGTACTGTATGCACGCCAGAAGTGATAATC

4.1. Use both Dotter and Blast2sequences to align both gene sequences.
- Using Shift+PrintScreen, present a plot of the alignments in your homework.
- Report the Dotter parameters (window size and stringency) and the Blast2sequences scoring system used.
- What are the conserved parts between the wheat and rice genes?

ANSWER 4.1.
Dotter
Window size: 22
Stringency: 40/100
Blast2sequences
Match: 1
Mismatch: -2
Gap open: 5
Gap extension: 2
The conserved parts between the wheat and rice sequences are the exons of the gene.

5. 10 points
Using the following sequence:
>Tm67B4
TCATCTTTGGCAAACATGTCCTTAGAGCATCTCCAGCCGTTCAGCCCATAGGACGCCGAAGAAGAGCCGCTTGGGGCTGAACCGACGCTTGCTTGGC
GCGTGGGGGCGACTATGTTCCCAGTCGATGCCCCCAGGTCGCCGTCAAAATCGCGCGAATTCAGCCATATTCCAAACAAATTTGTAGAAACTCGGCG
ATATTTCATTGAAATTTATACAAAAACATAAAAACATGCAAACTACGCTAAACTACGCCTATCCCTGCTACACCGTGGCCACCGCCCACCATCTACA
TGCCGAGAAGCCTGTAGAAACGGGTGTAGTCGCCGCCGCCGCCGCCATCGTCGTCGTCGCGCCGGAGCCGCCGTTGTCCCTGCTGCACTCCTGGCCA
GTGGCACCGACGCGCGGTGGGGTATTGGACGGGCCGGCCTCGTCCTCCTCGTCGTTGTTGAGGGCGATGACTGTTGGAAATATGCCCTAGAGACAAT
AATAAATTGATTATTATTATATTTCCTTGTTCATGATAATCGTTTATTA

5.1. Use Dotter to align the sequence with itself.
- Using Shift+PrintScreen, present a plot of the alignment in your homework.
- Report the Dotter parameters used (window size and stringency).
- What kind of repeats can you observe?
Indicate their approximate coordinates.
- What is this sequence? (Look at the coordinates of the best alignment against the database).

ANSWER 5.1.
Dotter
Window size: 21
Stringency: 40/80
I can observe direct repeats. The coordinates of three of them are: (200;170), (325;345), (430;450).
This sequence is a mobile element: Xusag-1.

6. 20 points
Using the two following scoring matrices, calculate manually the scores for the following alignments:
6.1. Scoring matrix A: Match 2, mismatch -1, open gap -5, extended gap -1 (affine gap penalty)
6.2. Scoring matrix B: Match 2, mismatch -1, gap -2 per each bp (linear gap penalty)
- Which alignment is better under each scoring matrix? What is the effect of affine versus linear gap penalties on the number of gaps introduced in an alignment?

Alignment 1
ACAAAGATACTATTAAT
|| | |   |||  ||
ACGA-GC--CTACAAAC

Alignment 2
ACAAAGATACTACTAAT
||||     |||| ||
ACAA---GCCTACAAAC

ANSWERS 6.1. and 6.2.
Alignment 1:
A. 18-5-5-5-1 = 2 (9 matches, 5 mismatches, 2 gap openings, 1 gap extension)
B. 18-5-6 = 7 (9 matches, 5 mismatches, 3 gap positions)
Alignment 2:
A. 20-4-5-2 = 9 (10 matches, 4 mismatches, 1 gap opening, 2 gap extensions)
B. 20-4-6 = 10 (10 matches, 4 mismatches, 3 gap positions)
Under both scoring matrices, alignment 2 is better since it has higher scores than alignment 1.
Affine gap penalties charge a large penalty for opening a gap and a smaller one for each position that extends it, resulting in lower scores (scoring matrix A). Because every new gap pays the "expensive" opening penalty, the number of separate gaps introduced in the alignment is smaller: both alignments contain 3 gap positions, but alignment 1 splits them into two gaps and loses 4 more points than alignment 2 for it. Linear gap penalties, on the other hand, apply the same penalty to every gap position regardless of how the positions are grouped and, therefore, the resulting scores are higher (scoring matrix B). In other words, it is "cheaper" to open a new gap, and this is the reason why the number of gaps introduced in the alignment can be bigger.

7.
5 points
Using Boolean operators perform the following ENTREZ searches and report the number of Nucleotides found:
- Containing ‘flavonoid’: 1872
- Containing ‘flavonoid’ and related family words (using truncation, *): 2240
- Containing both ‘flavonoid’ and ‘hydroxylase’: 356
- Containing either ‘flavonoid’ or ‘hydroxylase’: 18607
- Containing both ‘flavonoid’ and ‘hydroxylase’ in rice: 21
- Containing both ‘flavonoid’ and ‘hydroxylase’ in rice but not in Arabidopsis: 5
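
The manual scores in question 6 can be cross-checked with a short script. The following is a minimal Python sketch, not part of the original key; the function name score_alignment and its parameter names are assumptions chosen for illustration. It follows the convention used in the key's arithmetic: the first position of each gap pays the opening penalty and each further position pays the extension penalty. As a simplification, any run of consecutive gap columns is treated as a single gap, which is enough here because all gaps are in the lower sequence.

```python
def score_alignment(top, bottom, match=2, mismatch=-1,
                    gap_open=-5, gap_extend=-1):
    """Score a pairwise alignment given as two equal-length gapped strings.

    The first column of every gap run costs gap_open and each further
    column costs gap_extend. Passing gap_open == gap_extend gives a
    linear gap penalty.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    in_gap = False  # True while inside a run of gap columns
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score += gap_extend if in_gap else gap_open
            in_gap = True
        else:
            score += match if a == b else mismatch
            in_gap = False
    return score


# The two alignments from question 6, exactly as printed there.
aln1 = ("ACAAAGATACTATTAAT", "ACGA-GC--CTACAAAC")
aln2 = ("ACAAAGATACTACTAAT", "ACAA---GCCTACAAAC")

for name, (top, bottom) in (("Alignment 1", aln1), ("Alignment 2", aln2)):
    affine = score_alignment(top, bottom)  # scoring matrix A
    linear = score_alignment(top, bottom, gap_open=-2, gap_extend=-2)  # B
    print(name, "A (affine):", affine, "B (linear):", linear)
# Alignment 1 A (affine): 2 B (linear): 7
# Alignment 2 A (affine): 9 B (linear): 10
```

The printed values agree with the manual answers: alignment 2 scores higher under both scoring matrices, and the difference between the two alignments is larger under the affine penalty.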