BIT150 – Fall 2007

BIT150 – Fall 2009 – Homework 2
Due on Thursday October 8th by email to TA: rlnitcher@ucdavis.edu as Hwk2_Lastname
BEFORE the Lab
1. 15 points Using the DNA sequence presented below:
>DNA_Tm43J16
CAGTATGCCATGGATGTTCCCTCTATGGAAGCGTTAGCACTTTTGAAAGGACTGCGATTGGCAGAGCATTTGGGTATTTATGCACTGGTTGTGGAGT
CCGACTCACAGGAGATAGTACAAGCTATGAAGGATCCTTCGGAGTACCGTCCTTCGGCTGCGGTGTTGAGGGATGATTGCCGACAGGTGTTGAGTTC
CTTCGGCAGAGCGACGATTGTTCATTGTGCTACGGAGAGTAATGGGGTAGCGCATTGGCTAGTCAGGACTAGTTATAGAAGGAAGTTAAATGAGGTC
TGGCAAGAGCATCCCCCTGATTTCTTAATTCCCTCCCTTGTATCTGATATGATTATTTTGTAATAAAGGTCGCCCAAGTTCAGAAAAAACTCTGTTT
TACTGTTTAGTAATAGGATAAAAAAACTGACACAACGGCAGGAAATCTGGCTATTGCTTTTGGCTGTTAGGTAAATATAAAATATTTTACAGAAATA
CTAACGCTCATACGTGTGGCAACATTGTAACTCACCCACATGTCTGGATCATCGTCCAGTTAAGTTTGCACGAATCTTGACACACCGAGGCAGTTTT
CGCGTGCCACGTAGGATAAGGCCGGTGTGTG
1.1. Do a blastn search against the nucleotide collection database.
- Report the lowest E value and calculate the probability of finding an alignment with this
E value by chance (P = 1 - e^(-E)).
- Can you conclude that your finding is NOT just a random alignment?
ANSWER 1.1
The lowest E value is 0.63. This makes P = 0.4674 (P = 1 - e^(-0.63) = 1 - 0.5326).
With this high probability, I can NOT conclude that my finding is NOT just a random alignment. The lowest
expected value, E = 0.63, is the number of alignments with scores equal to or better than the bit score that are
expected to occur by chance in the database search, and it is relatively high. As a consequence, the probability
P = 0.4674 of finding at least one high-scoring segment pair with this bit score by chance is also high. (Remember:
the suggested BLAST cutoff for nucleotide searches is E < 1e-10 to consider that your finding is NOT a random alignment.)
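The conversion from E value to P can be checked with a few lines of Python. This is only a sketch of the formula P = 1 - e^(-E) from the question; the function name is made up for illustration.

```python
import math

def chance_probability(e_value):
    """Probability of at least one chance alignment: P = 1 - e^(-E)."""
    # expm1 keeps precision when E is very small (P is then close to E).
    return -math.expm1(-e_value)

# E values reported in answers 1.1 and 1.2
for e in (0.63, 5e-14):
    print(f"E = {e:g} -> P = {chance_probability(e):.4g}")
```

For very small E values, P is practically equal to E, which is why E is usually reported directly.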
1.2. Repeat your blastn search but now against the est_others database.
- Report the lowest E value. Can you conclude now that your finding is NOT just a random
alignment?
- Are all these EST sequences present in the nucleotide collection database?
- Click in the link ‘Distance tree of results’ that appears on top of your table of ‘Sequences
producing significant alignments’. Using Shift+PrintScreen, include the picture of the
tree in your homework. How can this tool be used to understand the BLAST results?
ANSWER 1.2
The lowest E value is 5e-14. We can now conclude that the finding is NOT just a random alignment (5e-14 < 1e-10).
All these EST sequences from the EST database are NOT present in the nucleotide collection database.
Distance Tree of results:
In cases where many results appear to be closely related to your query sequence, you can use phylogenetic
relationships to correctly identify your sequence.
1.3. This sequence is from cultivated diploid wheat, Triticum monococcum L., a species that
belongs to the Triticeae tribe within the Poaceae (grass) family. Repeat your blastn search
using the tribe as the limit by Organism.
- How many alignments did you find? Open the flat files of the first three entries and
indicate the NCBI division to which they belong. Report any accession.version numbers
which you can conclude are not a random alignment.
- Report the lowest E value. Is this a lower or a higher E value than the one obtained in
1.2.?
ANSWER 1.3
- With the tribe limit, 30 accessions were found. All of the first three accessions (GH732400.1, CJ928818.1, and
CK158097.1) belong to the EST division, as the search was done against the est_others database. Only one accession
(GH732400.1) has an E value low enough to conclude that it is not just a random alignment.
- The lowest E value for this search is lower than the one found in 1.2.
2. 40 points Using the DNA sequence presented below:
>DNA TaFDL2_D
ATGGCAGGCCCTTTCATGGCAAGTCATTTGGGTCCTCAGCCATTGTCTGTTGCTACAGGTGCTATCATGGAACCAATTTACCCGGATGGCCAGATTA
CGTCACCAATGCTCGATGCACTTTCTGATCCTCAGACACCTAGGCGCAAGCGTGGTGCCTCAGATGGTGTAACTGATAAAGTCGTAGAAAGAAGGCA
GAAGAGAATGATAAAAAACAGGGAATCAGCCGCACGCTCCAGAGCGAGGAAGCAGGTCTACAGCTATTGTCCTGTCTATTCCTGTTTTCCTACGCAT
TGGTTATGTCCTGTTATAATGTTCTGATTGCAGCATCAAAGCTTTGTGAAAAGTTATTGTTGAATCTTGAATGTATTGATTTAGGCATTTCTGTTAC
ACTTTTTAGCCTCAAATTGCAAGAATATCATTAGGTACTCCCTTCCGTTTCTTTTACTTCGCATATAAGATTTGTTTGAAGTCAAACTTCATAAAGT
TTGAGCAAATTTATATTAAAAATATCAACATCTACAACACTAAAGTTATACAATATGAAAATTAATTTCATGATGCCTCTAATGATATTGATTTCGT
ATTGCGAATGTTGATATTTTTTTCCTATAAAGTTGGTCAAACTTTATAAAGTTTGACTTCAGACAAAACTTATATGCAGAGTAAAAAGAAACGACTA
TGCCACTGGAGTAGCCTATTGACTTATTCTCATTCCATGTCATTCTACTCTATTAATAAAAATATAGCAATAACCTAGTGCATTCTACTTGGCAGGC
TTACACTAATGAGCTTGAAAACAAGGTGTCTCGTCTAGAAGAGGAGAACGAGAGGTTGAAGAAGCAGAAGGTACTTTGTCTTCTGTTACTGTAACAC
TGTTACTTTTCGCCCCTTCTCTTCTTTTTGGTGGAGGGGATATTTAACTACAGCATTCCACTATGCATTGTAGTGAACCCAGGTTATTGTAACTATG
TTTTTATTCAGGTAGTAAGCTATTTTGTTTCACACTAGTCGATCGTTCAACTTCTTTTTACTTAGGCTCAATAATATCATGTGTAACAAGTGCACTA
TACTGATGTGATCATGTTAGTTAATTCCATAGTACAACCACCCACACAACTCATTTAGTATGATTTGCAGACCTGTATCTATATTGTCACTGCTGAA
CCCCATATGATCATTGCTACTTTTTTTTGTCAAGTATTGTTGAGAAAACTATGTTAGTTACCATGCTGACAGACACATGCAGAGATCTCATTGTTTG
TTTTTGATGTTCAAATATCTATTGGAGGCTATGTAATGAAAGTGGAGTCTCAAAGTTATTTTTCTTGCTTACACTAACTAAGTTTTGGCATTGACTT
TGTGCTACACTTTTTTTACAAGGCCAAACATACCATAATGTATATAAACCAACTGAATGGGGCTAATTTTTGTTAACTTGAACTTAGTAATTTTTGG
AAATATAAGCAGCTGTTCATTCGAATTTAGAGGAAACTCTTATGAAGTTCTTGAGCATCTTATTTAATTGAATATACCCTACTGAATGTATTGACCG
ATGTACTTAGGCTATGCTTCTCAATATTTTGCATGGTGAAGTTGATATGTGCATGGCTACAACAATTTTAATGATATGGAAGCAGAAAAACTGCTAT
TTAATGACTTGTAGTGGGGTTTGTCTATATAAGATATATAGTTGCAAGCCCATATGTTGTACTCTATTGTTTCGGTTGCTTCTTAGTAGTTCTTATT
AGTTAACAATGCGAGTGATTTCAGCACTACATGCTATTCCTAATAATCAGTTTGTACGAAGAACTGCAGCATTAATTCATAGAACTTCATTGCCCTT
CTCTACAGTGTAGCAATTCAGTAAGATGCTTGGCGCCCTTACGGCATTTGCTTTGGTTGATCAAAACTAAGTTGCTGCTGAGAATTGAATATATAGA
TAGGTTATCTTGATGGACTTTTGTTTGAATCAATGAATCATGTCTCTATACTTGGCGAGTTAGCAGTGTAATAGAATTGCTTCCCCGTAGTCCTTTA
TACAATTGACAAACCTGGAGTTACCCAACAGATTGCAGTCCTTCATGAAAGTAAAATAAACACATATCATGAACATTGACAGGAACCACACTTAACA
AACAGATCACCATTGGAAATGAATACATTAGACTTTGTACTCACTAAATCGTAATCACGCGTGAAGCAACAGAGCACCGGGCAAAAGAAAACCACAG
AGAGCATCTATGCTTGTGTTTTCTGTATTTTGGGTGCAATTCCATTTGGTCCTTGCAATGGGTGCCTGCCATTTATCTGTTAAGACCACATCACCTG
AAGATGATGCTGGTGATATTAATCATATCTAGCAATATTTGCAGTCCTATTCTTTTTTAGCCTAACACAATGATACTGAGATTTCATTGGTCATTAC
TTTATGGCTGTAAATAATTTTTTGTCAGCATGATGTAATTGACTCTTGTTCTTCAATGCAACAAACATTATCCTCACACTTCTTTTGGATGCTAAGT
AGCTTATATTATGTTATATTCATGTCATGTCTGACGCCTTCCATTTCATGATAATTTCAGGAGTTGGACATGATGATAACCTCCGCGCCTCCCCCAG
AACCCAAGTATCAACTCCGGAGAACAAGTTCTGCCCCTGTTTGA
2.1. Use BLAST2Sequences to predict if any repetitive elements are present. Paste an image of
your alignment in the homework and state what kind of repeat it is. Highlight the repeat
region in your sequence in Grey.
ANSWER 2.1
There is a MITE present in the first intron.
2.2. Do a blastn search against the est_others database with a limit to the organisms in the
Triticeae tribe.
- Report the lowest E value and the number of alignments you find with this E value.
- Indicate the Maximum score and the Total score? How are these two measurements
obtained?
- Which % of the sequence is covered by the aligned bases?
- A gene is present in this DNA sequence. From your blastn search against the est_others
database, how many exons would you predict this gene has?
- Highlight with different colors in the DNA sequence the exons of the gene defining their
borders based on your best alignments.
- In which tissues it is transcribed based on the EST profile viewer?
ANSWER 2.2
There are three alignments with an E value of 2e-123. These alignments have a Max Score = 448,
Total Score = 734 (the sum of 3 HSPs), and Query coverage = 15%.
I would predict this gene has 3 exons.
Exon1, Exon2 and Exon3.
>DNA TaFDL2_D
ATGGCAGGCCCTTTCATGGCAAGTCATTTGGGTCCTCAGCCATTGTCTGTTGCTACAGGTGCTATCATGGAACCAATTTACCCGGATGGCCAGATTA
CGTCACCAATGCTCGATGCACTTTCTGATCCTCAGACACCTAGGCGCAAGCGTGGTGCCTCAGATGGTGTAACTGATAAAGTCGTAGAAAGAAGGCA
GAAGAGAATGATAAAAAACAGGGAATCAGCCGCACGCTCCAGAGCGAGGAAGCAGGTCTACAGCTATTGTCCTGTCTATTCCTGTTTTCCTACGCAT
TGGTTATGTCCTGTTATAATGTTCTGATTGCAGCATCAAAGCTTTGTGAAAAGTTATTGTTGAATCTTGAATGTATTGATTTAGGCATTTCTGTTAC
ACTTTTTAGCCTCAAATTGCAAGAATATCATTAGGTACTCCCTTCCGTTTCTTTTACTTCGCATATAAGATTTGTTTGAAGTCAAACTTCATAAAGT
TTGAGCAAATTTATATTAAAAATATCAACATCTACAACACTAAAGTTATACAATATGAAAATTAATTTCATGATGCCTCTAATGATATTGATTTCGT
ATTGCGAATGTTGATATTTTTTTCCTATAAAGTTGGTCAAACTTTATAAAGTTTGACTTCAGACAAAACTTATATGCAGAGTAAAAAGAAACGACTA
TGCCACTGGAGTAGCCTATTGACTTATTCTCATTCCATGTCATTCTACTCTATTAATAAAAATATAGCAATAACCTAGTGCATTCTACTTGGCAGGC
TTACACTAATGAGCTTGAAAACAAGGTGTCTCGTCTAGAAGAGGAGAACGAGAGGTTGAAGAAGCAGAAGGTACTTTGTCTTCTGTTACTGTAACAC
TGTTACTTTTCGCCCCTTCTCTTCTTTTTGGTGGAGGGGATATTTAACTACAGCATTCCACTATGCATTGTAGTGAACCCAGGTTATTGTAACTATG
TTTTTATTCAGGTAGTAAGCTATTTTGTTTCACACTAGTCGATCGTTCAACTTCTTTTTACTTAGGCTCAATAATATCATGTGTAACAAGTGCACTA
TACTGATGTGATCATGTTAGTTAATTCCATAGTACAACCACCCACACAACTCATTTAGTATGATTTGCAGACCTGTATCTATATTGTCACTGCTGAA
CCCCATATGATCATTGCTACTTTTTTTTGTCAAGTATTGTTGAGAAAACTATGTTAGTTACCATGCTGACAGACACATGCAGAGATCTCATTGTTTG
TTTTTGATGTTCAAATATCTATTGGAGGCTATGTAATGAAAGTGGAGTCTCAAAGTTATTTTTCTTGCTTACACTAACTAAGTTTTGGCATTGACTT
TGTGCTACACTTTTTTTACAAGGCCAAACATACCATAATGTATATAAACCAACTGAATGGGGCTAATTTTTGTTAACTTGAACTTAGTAATTTTTGG
AAATATAAGCAGCTGTTCATTCGAATTTAGAGGAAACTCTTATGAAGTTCTTGAGCATCTTATTTAATTGAATATACCCTACTGAATGTATTGACCG
ATGTACTTAGGCTATGCTTCTCAATATTTTGCATGGTGAAGTTGATATGTGCATGGCTACAACAATTTTAATGATATGGAAGCAGAAAAACTGCTAT
TTAATGACTTGTAGTGGGGTTTGTCTATATAAGATATATAGTTGCAAGCCCATATGTTGTACTCTATTGTTTCGGTTGCTTCTTAGTAGTTCTTATT
AGTTAACAATGCGAGTGATTTCAGCACTACATGCTATTCCTAATAATCAGTTTGTACGAAGAACTGCAGCATTAATTCATAGAACTTCATTGCCCTT
CTCTACAGTGTAGCAATTCAGTAAGATGCTTGGCGCCCTTACGGCATTTGCTTTGGTTGATCAAAACTAAGTTGCTGCTGAGAATTGAATATATAGA
TAGGTTATCTTGATGGACTTTTGTTTGAATCAATGAATCATGTCTCTATACTTGGCGAGTTAGCAGTGTAATAGAATTGCTTCCCCGTAGTCCTTTA
TACAATTGACAAACCTGGAGTTACCCAACAGATTGCAGTCCTTCATGAAAGTAAAATAAACACATATCATGAACATTGACAGGAACCACACTTAACA
AACAGATCACCATTGGAAATGAATACATTAGACTTTGTACTCACTAAATCGTAATCACGCGTGAAGCAACAGAGCACCGGGCAAAAGAAAACCACAG
AGAGCATCTATGCTTGTGTTTTCTGTATTTTGGGTGCAATTCCATTTGGTCCTTGCAATGGGTGCCTGCCATTTATCTGTTAAGACCACATCACCTG
AAGATGATGCTGGTGATATTAATCATATCTAGCAATATTTGCAGTCCTATTCTTTTTTAGCCTAACACAATGATACTGAGATTTCATTGGTCATTAC
TTTATGGCTGTAAATAATTTTTTGTCAGCATGATGTAATTGACTCTTGTTCTTCAATGCAACAAACATTATCCTCACACTTCTTTTGGATGCTAAGT
AGCTTATATTATGTTATATTCATGTCATGTCTGACGCCTTCCATTTCATGATAATTTCAGGAGTTGGACATGATGATAACCTCCGCGCCTCCCCCAG
AACCCAAGTATCAACTCCGGAGAACAAGTTCTGCCCCTGTTTGA
FDL2 is expressed in the inflorescence (flower), the root, and the seed.
2.3. The following is the protein sequence of the gene present in the DNA sequence provided
above.
>Protein TaFDL2
MAGPFMASHLGPQPLSVATGAIMEPIYPDGQITSPMLDALSDPQTPRRKRGASDGVTDKVVERRQKRMIKNRESAARSRARKQAYTNELENKVSRLE
EENERLKKQKELDMMITSAPPPEPKYQLRRTSSAPV*
- Use the appropriate blast program to perform an alignment between the DNA sequence
and the protein sequence. Can you confirm the number of exons you had predicted in
2.2.?
- Improve the borders of the exons defined in 2.2. based on your alignment. Find the
START codon (ATG), the STOP codon (TGA), and the splicing sites (5’ GT and 3’ AG)
of the gene, and indicate them in the DNA sequence with bold red letters (the gene is in
the 5’ -> 3’ orientation).
- Obtain the cDNA of the gene from the START codon to the STOP codon after
eliminating the introns according to the splicing sites. Present the cDNA sequence in your
homework. (Hint: to check if you correctly annotated the cDNA, try translating it. Your
protein should match the one provided above.)
ANSWER 2.3
Yes, 3 exons are found by tblastn.
ATGGCAGGCCCTTTCATGGCAAGTCATTTGGGTCCTCAGCCATTGTCTGTTGCTACAGGTGCTATCATGGAACCAATTTACCCGGATGGCCAGATTA
CGTCACCAATGCTCGATGCACTTTCTGATCCTCAGACACCTAGGCGCAAGCGTGGTGCCTCAGATGGTGTAACTGATAAAGTCGTAGAAAGAAGGCA
GAAGAGAATGATAAAAAACAGGGAATCAGCCGCACGCTCCAGAGCGAGGAAGCAGGTCTACAGCTATTGTCCTGTCTATTCCTGTTTTCCTACGCAT
TGGTTATGTCCTGTTATAATGTTCTGATTGCAGCATCAAAGCTTTGTGAAAAGTTATTGTTGAATCTTGAATGTATTGATTTAGGCATTTCTGTTAC
ACTTTTTAGCCTCAAATTGCAAGAATATCATTAGGTACTCCCTTCCGTTTCTTTTACTTCGCATATAAGATTTGTTTGAAGTCAAACTTCATAAAGT
TTGAGCAAATTTATATTAAAAATATCAACATCTACAACACTAAAGTTATACAATATGAAAATTAATTTCATGATGCCTCTAATGATATTGATTTCGT
ATTGCGAATGTTGATATTTTTTTCCTATAAAGTTGGTCAAACTTTATAAAGTTTGACTTCAGACAAAACTTATATGCAGAGTAAAAAGAAACGACTA
TGCCACTGGAGTAGCCTATTGACTTATTCTCATTCCATGTCATTCTACTCTATTAATAAAAATATAGCAATAACCTAGTGCATTCTACTTGGCAGGC
TTACACTAATGAGCTTGAAAACAAGGTGTCTCGTCTAGAAGAGGAGAACGAGAGGTTGAAGAAGCAGAAGGTACTTTGTCTTCTGTTACTGTAACAC
TGTTACTTTTCGCCCCTTCTCTTCTTTTTGGTGGAGGGGATATTTAACTACAGCATTCCACTATGCATTGTAGTGAACCCAGGTTATTGTAACTATG
TTTTTATTCAGGTAGTAAGCTATTTTGTTTCACACTAGTCGATCGTTCAACTTCTTTTTACTTAGGCTCAATAATATCATGTGTAACAAGTGCACTA
TACTGATGTGATCATGTTAGTTAATTCCATAGTACAACCACCCACACAACTCATTTAGTATGATTTGCAGACCTGTATCTATATTGTCACTGCTGAA
CCCCATATGATCATTGCTACTTTTTTTTGTCAAGTATTGTTGAGAAAACTATGTTAGTTACCATGCTGACAGACACATGCAGAGATCTCATTGTTTG
TTTTTGATGTTCAAATATCTATTGGAGGCTATGTAATGAAAGTGGAGTCTCAAAGTTATTTTTCTTGCTTACACTAACTAAGTTTTGGCATTGACTT
TGTGCTACACTTTTTTTACAAGGCCAAACATACCATAATGTATATAAACCAACTGAATGGGGCTAATTTTTGTTAACTTGAACTTAGTAATTTTTGG
AAATATAAGCAGCTGTTCATTCGAATTTAGAGGAAACTCTTATGAAGTTCTTGAGCATCTTATTTAATTGAATATACCCTACTGAATGTATTGACCG
ATGTACTTAGGCTATGCTTCTCAATATTTTGCATGGTGAAGTTGATATGTGCATGGCTACAACAATTTTAATGATATGGAAGCAGAAAAACTGCTAT
TTAATGACTTGTAGTGGGGTTTGTCTATATAAGATATATAGTTGCAAGCCCATATGTTGTACTCTATTGTTTCGGTTGCTTCTTAGTAGTTCTTATT
AGTTAACAATGCGAGTGATTTCAGCACTACATGCTATTCCTAATAATCAGTTTGTACGAAGAACTGCAGCATTAATTCATAGAACTTCATTGCCCTT
CTCTACAGTGTAGCAATTCAGTAAGATGCTTGGCGCCCTTACGGCATTTGCTTTGGTTGATCAAAACTAAGTTGCTGCTGAGAATTGAATATATAGA
TAGGTTATCTTGATGGACTTTTGTTTGAATCAATGAATCATGTCTCTATACTTGGCGAGTTAGCAGTGTAATAGAATTGCTTCCCCGTAGTCCTTTA
TACAATTGACAAACCTGGAGTTACCCAACAGATTGCAGTCCTTCATGAAAGTAAAATAAACACATATCATGAACATTGACAGGAACCACACTTAACA
AACAGATCACCATTGGAAATGAATACATTAGACTTTGTACTCACTAAATCGTAATCACGCGTGAAGCAACAGAGCACCGGGCAAAAGAAAACCACAG
AGAGCATCTATGCTTGTGTTTTCTGTATTTTGGGTGCAATTCCATTTGGTCCTTGCAATGGGTGCCTGCCATTTATCTGTTAAGACCACATCACCTG
AAGATGATGCTGGTGATATTAATCATATCTAGCAATATTTGCAGTCCTATTCTTTTTTAGCCTAACACAATGATACTGAGATTTCATTGGTCATTAC
TTTATGGCTGTAAATAATTTTTTGTCAGCATGATGTAATTGACTCTTGTTCTTCAATGCAACAAACATTATCCTCACACTTCTTTTGGATGCTAAGT
AGCTTATATTATGTTATATTCATGTCATGTCTGACGCCTTCCATTTCATGATAATTTCAGGAGTTGGACATGATGATAACCTCCGCGCCTCCCCCAG
AACCCAAGTATCAACTCCGGAGAACAAGTTCTGCCCCTGTTTGA
>FDL2_cDNA
ATGGCAGGCCCTTTCATGGCAAGTCATTTGGGTCCTCAGCCATTGTCTGTTGCTACAGGTGCTATCATGGAACCAATTTACCCGGATGGCCAGATTA
CGTCACCAATGCTCGATGCACTTTCTGATCCTCAGACACCTAGGCGCAAGCGTGGTGCCTCAGATGGTGTAACTGATAAAGTCGTAGAAAGAAGGCA
GAAGAGAATGATAAAAAACAGGGAATCAGCCGCACGCTCCAGAGCGAGGAAGCAGGCTTACACTAATGAGCTTGAAAACAAGGTGTCTCGTCTAGAA
GAGGAGAACGAGAGGTTGAAGAAGCAGAAGGAGTTGGACATGATGATAACCTCCGCGCCTCCCCCAGAACCCAAGTATCAACTCCGGAGAACAAGTT
CTGCCCCTGTTTGA
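The hint in 2.3 (remove the introns, then translate) can be sketched in Python. This is a minimal illustration, not the method used to produce the answer: the function names, the toy gene, and its exon coordinates are made up, and the codon table is the standard genetic code. Only the first 30 bases of FDL2_cDNA are taken from the sequence above.

```python
# Standard genetic code, built in the usual T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def splice(genomic, exons):
    """Join exons given as 0-based, end-exclusive (start, end) pairs."""
    genomic = genomic.upper()
    # Each intron between consecutive exons should start with GT and end with AG.
    for (_, left_end), (right_start, _) in zip(exons, exons[1:]):
        intron = genomic[left_end:right_start]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            raise ValueError(f"intron {left_end}-{right_start} lacks GT...AG borders")
    return "".join(genomic[start:end] for start, end in exons)

def translate(cdna):
    """Translate complete codons from the first base; '*' marks a STOP codon."""
    cdna = cdna.upper()
    usable = len(cdna) - len(cdna) % 3
    return "".join(CODON_TABLE[cdna[i:i + 3]] for i in range(0, usable, 3))

# Toy gene (hypothetical): exon, 12-base intron with GT...AG borders, exon.
toy_gene = "ATGGCA" + "GTAAGTTTTCAG" + "GGCTGA"
toy_cdna = splice(toy_gene, [(0, 6), (18, 24)])
print(toy_cdna, translate(toy_cdna))

# First 30 bases of FDL2_cDNA; the result should match the start of Protein TaFDL2.
print(translate("ATGGCAGGCCCTTTCATGGCAAGTCATTTG"))
```

Running `translate` on the full FDL2_cDNA sequence and comparing the result with Protein TaFDL2 is the check suggested by the hint.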
2.4. Using the protein sequence of this gene, perform a blastp search.
- Report the lowest E value and the number of alignments you find with this E value.
- What is this gene? Which percentages of identity between your query protein sequence
and the aligned proteins from the database support your conclusion? Report the
accession.version numbers of the aligned proteins from the database.
- What is the conserved domain present in this protein? Using Shift+PrintScreen, present a
picture of it in your homework.
ANSWER 2.4
The lowest E value is 3e-70 – Triticum aestivum
This gene is a bZip transcription factor.
Percent identity:
Triticum aestivum: 99%
Rice: 80%
Sorghum bicolor: 82%
Zea mays: 78%
The protein contains a conserved bZIP domain.
3. 10 points The following is the protein sequence of the corn (Zea mays) orthologue of the
wheat gene presented in 2:
>protein ZmFD
mssqtgggkesdagpgqhrqmqslarqgslynltldevqnhlgepllsmnfdellksvfpdgvdsdgavtgkpdrts
slqrqgsilmppqlskktvdevwkgiqggpetstvvdglqrrerhptlgemtledflvkagvvteglvkdsadfpsn
mdtagssvvvaaasslnpgaqwlqqyqqqvlgsqqlslagsymasqlrpqplsiatgatldsiysddqitspsfgal
sdpqtpgrkrgalgevvdkvverrqkrmiknresaarsrarkqaytnelenkvfrleeenkrlkkqqeldeilssap
ppepkyqlrrtgsaaf
3.1. Use the appropriate blast program to perform an alignment between these two protein
sequences, FDL2 from wheat and ZmFD from corn.
- What is the percentage of identity between them? 78% Identity
3.2. Use one of the dynamic-programming methods shown in both Lecture2 and Lab2 to align
both protein sequences.
- Would you perform a global or a local alignment? Local. The proteins are different
lengths, making Local alignment more appropriate in this case.
- Which BLOSUM matrix would you use? BLOSUM62 (greater than 30% identity).
- Answer these questions and based on your answers run the alignment. Present your
results reporting the length of the sequence aligned, similarity, identity, number of gaps,
and final score.
Triticum vs Zea mays, local alignment:
Identity: 78.6%
Similarity: 86.3%
Gaps: 0%
Score: 513
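The local alignment in 3.2 is a dynamic-programming (Smith-Waterman) calculation. The sketch below shows only the scoring recurrence, under simplifying assumptions: it uses a flat match/mismatch score and a linear gap penalty instead of BLOSUM62, so it will not reproduce the score of 513 reported above. The function name and the short test sequences are illustrative.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets an alignment restart anywhere, which makes it local.
            h[i][j] = max(0, diagonal, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best

print(smith_waterman_score("ACACACTA", "AGCACACA"))
```

A global (Needleman-Wunsch) alignment differs mainly in dropping the 0 option and penalizing the unaligned ends, which is why unequal sequence lengths favor the local method here.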
4. 5 points The following two sequences correspond to the same gene in both wheat and maize
(60 million years of divergence). The sequences highlighted in green correspond to the exons
and the ones in black text to the introns. START and STOP codons are bolded and highlighted.
Splicing sites are bolded and in red.
>GeneA_Wheat
GGAAGGGGAAATGGCCGGTAGGGATAGGGACCCGCTGGTGGTTGGCAGGGTTGTGGGGGACGTGCTGGACCCCTTCGTCCGGACCA
CCAACCTCAGGGTGACCTTCGGGAACAGGACCGTGTCCAACGGCTGCGAGCTCAAGCCGTCCATGGTCGCCCAGCAGCCCAGGGTT
GAGGTGGGCGGCAATGAGATGAGGACCTTCTACACACTCGTACGTACACAGTCACTATCTAATGCCAATTTATCTCTGAAAGTGCT
CACCACACGCACATGATCGATCGAGCTCGATCTATAGTACGTGAGGGAAATTGATTTTCGATGCTTCTGTTCACATGTTTGCCTCA
GCAAGCACATGACTAATGCTCCATCTTGCATATGTCTCTGTGCCCTCTGGTGTTGATCATGATTTTTCTATGCTTCTTCTATGTTC
GGGGAGCATTTATTTTTTATGCTTCTCTTGACATGTTTCATGTTTGTCCTAGCAAGCACACGAGTAATTAAAGCTCGATCTTAAAT
ACTCTCTCCGTCCGAATAAATGTACTTCTAGCTTTTGTCTTAAGTCAAAGTTTTAAAATTTTGACCAACTTTATAGGAAAAAGTAG
CAGCATTTATGACACTAAATTAGTATCACTAGATTCGTTTTGAAATGTATTTTCATAATATATCAATTTGATATTATATATGTTAC
TACTTATTTGTATATAGTTGGTCAAAGTTTTAAAACTTTGACTTAGGATAAAAACTAGAAGTACACTTATTCGTGGACGGAGGGAG
TATATGCTTATGTAGGTAGTACTCTCTACTTTGATCATGATGTGCACGCGTTTACTGCCCGCAGGTGATGGTAGACCCAGATGCTC
CAAGTCCAAGCGATCCCAACCTTAGGGAGTATCTCCACTTGAAAGTACTAAA
>GeneA_ZeaMays
ATAGATCGACATGGCCGGCAGGGACAGGGAGCCGCTGGTGGTTGGTAGGGTGGTCGGCGACGTGCTGGACCCCTTCGTCCGGACCA
CCAACCTCAGGGTCAGCTACGGGGCCAGGACCGTGTCCAACGGCTGCGAGCTCAAGCCGTCCATGGTGGTGCACCAGCCAAGGGTC
GAGGTCGGGGGACCTGACATGAGGACCTTCTACACCCTCGTACGTGTATATATATATATATATATACACGTCGTCGTTTTACTTCT
CTTCATTGGCTAGCTACTTAGCTAGCTAGCTAGCTTATAATGATAGCCCGTTGATCCATCGAGATATGATCGTACGTGATGCCATG
CATGCTTCGTTCTTCAGGTGATGGTGGACCCAGATGCTCCGAGCCCAAGCGACCCGAACCTTAGGGAGTACCTACACTTGAAAAAT
CTATA
4.1. Use Dotter and BLAST2SEQUENCES to align both gene sequences.
- Using Shift+PrintScreen, present a plot of the alignments in your homework.
- Report the Dotter parameters used (window size and stringency).
- What are the conserved parts between the wheat and maize genes?
ANSWER 4.1
The exons are conserved.
DOTTER: Window Size 22, Stringency: 40/100
BLAST2SEQUENCES
5. 10 points Using the following sequence:
>Hv_DNA
tgaggcctaagagtaccaaatcccttctgaaaattgtggtttgagaagtgggttttagagaagaagaagaagatgat
gaaaagtcccttctgtgttcggatttggtttggttaaggattacctttgtactgttgttgttgttgttgttgttgtt
cctgatgaga
5.1. Use Dotter to align the sequence with itself.
- Using Shift+PrintScreen, present a plot of the alignment in your homework.
- Report the Dotter parameters used (window size and stringency).
- What kind of repeats are you observing? Indicate their approximate coordinates.
- What is this sequence? (Look at the coordinates of the best alignment against the
database).
ANSWER 5.1
Stringency: 38/68, Window: 21. Microsatellite direct repeats at approximately 55-80 and 120-150 bp.
Accession DQ865365, Hordeum vulgare subsp. vulgare clone B microsatellite CrtSSR47 sequence.
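As a rough cross-check of the repeat coordinates, the Hv_DNA sequence can be scanned for tandem trinucleotide repeats with a regular expression. This is not how Dotter works; it is a substitute technique. The function name and the thresholds (unit length 3, at least 4 copies) are assumptions chosen for illustration.

```python
import re

# Hv_DNA sequence from question 5, joined into one string.
HV_DNA = (
    "tgaggcctaagagtaccaaatcccttctgaaaattgtggtttgagaagtgggttttagagaagaagaagaagatgat"
    "gaaaagtcccttctgtgttcggatttggtttggttaaggattacctttgtactgttgttgttgttgttgttgttgtt"
    "cctgatgaga"
)

def find_tandem_repeats(seq, unit_len=3, min_copies=4):
    """Return (unit, start, end) for each tandem repeat; coordinates are 1-based, inclusive."""
    # A unit of unit_len bases followed by at least (min_copies - 1) exact copies of itself.
    pattern = re.compile(r"([acgt]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    return [(m.group(1), m.start() + 1, m.end()) for m in pattern.finditer(seq.lower())]

for unit, start, end in find_tandem_repeats(HV_DNA):
    print(f"({unit})n repeat at {start}-{end}")
```

The scan should report one trinucleotide repeat inside each of the two regions given in the answer, which is consistent with a microsatellite (SSR) sequence.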
6. 15 points Using the two following scoring matrices, calculate manually the scores for the
following alignments:
6.1. Scoring matrix A: Match 2, mismatch -1, open gap -7, extended gap -1 (affine gap penalty)
6.2. Scoring matrix B: Match 2, mismatch -1, gap -2 per each bp (linear gap penalty)
- Which alignment is better under each scoring matrix?
- What is the effect of affine versus linear gap penalties in the number of gaps introduced
in an alignment?
Alignment 1
CAGTCCGATGTGGCGG
|| ||   | | || |
CACTCA--TTTCGC-G
Alignment 2
CAGTCCGATGTGGCGG
||| || |   ||| |
CAGACCAA---GGCAG
ANSWER 6.1
Alignment 1: 2+2-1+2+2-1-7-1+2-1+2-1+2+2-7+2 = -1
Alignment 2: 2+2+2-1+2+2-1+2-7-1-1+2+2+2-1+2 = 8
Alignment 2 is better.
ANSWER 6.2
Alignment 1: 2+2-1+2+2-1-2-2+2-1+2-1+2+2-2+2 = 8
Alignment 2: 2+2+2-1+2+2-1+2-2-2-2+2+2+2-1+2 = 11
Alignment 2 is better.
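The manual sums can be verified with a short Python sketch. It assumes the convention used in the sums above: the first position of a gap costs the opening penalty (-7) and each further position in the same gap costs the extension penalty (-1). The linear scheme is the special case where both are -2. The function name is illustrative.

```python
def score_alignment(top, bottom, match=2, mismatch=-1, gap_open=-7, gap_extend=-1):
    """Score a pairwise alignment column by column ('-' marks a gap)."""
    score = 0
    in_gap = False
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            # First gap position pays gap_open, later ones pay gap_extend.
            score += gap_extend if in_gap else gap_open
            in_gap = True
        else:
            score += match if a == b else mismatch
            in_gap = False
    return score

alignments = {
    "Alignment 1": ("CAGTCCGATGTGGCGG", "CACTCA--TTTCGC-G"),
    "Alignment 2": ("CAGTCCGATGTGGCGG", "CAGACCAA---GGCAG"),
}
for name, (top, bottom) in alignments.items():
    affine = score_alignment(top, bottom)                              # matrix A
    linear = score_alignment(top, bottom, gap_open=-2, gap_extend=-2)  # matrix B
    print(f"{name}: affine = {affine}, linear = {linear}")
```

Both alignments contain three gap positions, so the linear scheme charges them the same (-6). The affine scheme charges Alignment 1 more (-15 for two separate gaps) than Alignment 2 (-9 for one gap), which shows how affine penalties favor fewer, longer gaps.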
7. 5 points Using Boolean operators perform the following ENTREZ searches and report the
number of Nucleotides and ESTs found:
- Containing ‘bZip’: 6701 Nucleotides, 939 ESTs
- Containing ‘bZip’ and related family words (using truncation, *): 6741 Nucleotides, 952 ESTs
- Containing both ‘bZip’ and ‘transcription factor’: 5980 Nucleotides, 598 ESTs
- Containing either ‘bZip’ or ‘transcription factor’: 69499 Nucleotides, 56341 ESTs
- Containing both ‘bZip’ and ‘transcription factor’ in rice: 305 Nucleotides, 11 ESTs
- Containing both ‘bZip’ and ‘transcription factor’ in rice but not in Arabidopsis: 139 Nucleotides, 8 ESTs
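The exact query strings behind these counts are not given, so the ones below are an assumption about how the six searches could be written with the Boolean operators AND, OR, and NOT and the truncation symbol *. The organism names are entered as plain search words here; restricting by organism field may give different counts.

```python
# Hypothetical Entrez query strings, one per search in question 7.
queries = {
    "bZip": "bZip",
    "bZip family (truncation)": "bZip*",
    "bZip and transcription factor": 'bZip AND "transcription factor"',
    "bZip or transcription factor": 'bZip OR "transcription factor"',
    "both, in rice": 'bZip AND "transcription factor" AND rice',
    "both, in rice, not Arabidopsis": 'bZip AND "transcription factor" AND rice NOT Arabidopsis',
}

for label, query in queries.items():
    print(f"{label}: {query}")
```

Counts obtained today will differ from those reported above, since the databases have grown since the homework was written.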