Hwk2 KEY - Plant Sciences

BIT150 – Fall 2008 –
Homework 2 KEY
Due on Thursday October 9th by email to TA: mfaricelli@ucdavis.edu as Hwk2_Lastname
BEFORE the Lab
1. 15 points Using the DNA sequence presented below:
>DNA Tm322N9
CGGAATTATTATTTAATTGGTCAGATTTATTGTTTCTATTCAGACAGATGGTTTCAGCAATACTTTTTGTGTGACTTTTTTGCATGTGATGACACCG
TCTCCGAGGGCCGTCACCACCCCCAGACTCCTAGAGTAGAAGTCACCTGCAAGATACCTGGGTGTCAGTTATGTGCACGTGAACTGAGATGCTTGCA
GTCAAAAGAGATGAGTGTTGCCAGTTGATGCTTATTCTGACACCGGCAACGAGATGATTCACAACCTGCAAGCATTCAATCAAGAAGAGTAAACAGG
TATGGAACCGTGAACACTGCAAAAACAATTATGTTTTCTCATTAATGTATGATAAACTGATGCTATGAGATATTTTCTTGCTGTCTGATTACCATTT
GATGGAACCTTCACTATTATCAGTTGGGAAACAAACCTGTTGTTTACGTCACTTTGAGGCTGGAAACTGGAGTTGTGAGCTGCATAGTCGATGCAGT
TGATGCTTATTCTGACACCGGCAACGACATGATTCACCACCTGCAAGCATTCATTCAAGAAGAGTAAAGAATTTGGGGATGACAAATCGACCTAAAC
AGGTATTGGGTGCTCCGTTGTAAAATTCATTGTTCTCCGTC
1.1. Do a blastn search against the nucleotide collection database.
- Report the lowest E value and calculate the probability of finding an alignment with this
E value by chance (P = 1 - e^(-E)).
- Can you conclude that your finding is NOT just a random alignment?
ANSWER 1.1.
The lowest E value is 0.16
P = 1 - e^(-0.16)
P = 1 - 0.852
P = 0.148
With this E value, I can NOT conclude that my finding is NOT just a random alignment.
The lowest expected value, E = 0.16, is the number of alignments with a Bit Score equal to or
better than the observed one that are expected to occur by chance in the database search, and it
is relatively high. As a consequence, the probability P = 0.148 of finding at least one
high-scoring segment pair with this Bit Score by chance is also high. (Remember: the suggested
BLAST cutoff for nucleotide searches is E < 1e-10 to consider that your finding is NOT a random
alignment.)
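The arithmetic in Answer 1.1 can be checked with a short Python sketch. The function name `blast_p_value` is only illustrative and is not part of BLAST; it simply evaluates P = 1 - e^(-E).

```python
import math

def blast_p_value(e_value):
    """Probability of at least one chance alignment: P = 1 - exp(-E)."""
    # -expm1(-E) equals 1 - exp(-E) but keeps precision when E is very small.
    return -math.expm1(-e_value)

print(round(blast_p_value(0.16), 3))  # 0.148
```

For very small E values, P is practically equal to E, which is why the two are often used interchangeably for strong hits.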
1.2. Repeat your blastn search but now against the est_others database.
- Report the lowest E value. Can you conclude now that your finding is NOT just a random
alignment?
- Are all these EST sequences present in the nucleotide collection database?
- Click on the ‘Distance tree of results’ link that appears above your table of ‘Sequences
producing significant alignments’. Using Shift+PrintScreen, include the picture of the
tree in your homework.
ANSWER 1.2.
The lowest E value is 8e-18
With this expected value, I can now conclude that my finding is NOT just a random alignment
(8e-18 < 1e-10).
None of these EST sequences from the EST database are present in the nucleotide collection
database.
Distance Tree of results:
1.3. This sequence is from cultivated diploid wheat, Triticum monococcum L., a species that
belongs to the Triticeae tribe within the Poaceae (grass) family. Repeat your blastn search
using the tribe as the limit by Organism.
- How many alignments did you find? Report their accession.version numbers. Open their
flat files and indicate the NCBI division to which they belong.
- Report the lowest E value. Is this a lower or a higher E value than the one obtained in
1.2.?
ANSWER 1.3.
I found only 3 alignments:
CJ653306.1
BE515575.1
BE444040.1
Since the search was done against the est_others database, all three of them belong to the EST
NCBI division.
The lowest E value is 3e-19, and this is a LOWER value than the one obtained in 1.2. (8e-18).
2. 30 points Using the DNA sequence presented below:
>DNA Fop1 Tm322N9
TTCCATCGCGCCACCAACTGATGTGAATCGTTTACCTGTATTTATGTGCATGCGCCCATATTTATGCGCAATCGGCCACACACTGCACTGCACAATA
CTCCTACCTGCAACAAACAAAGAAACCTAGTAGCAGCTAACCAAACCATGGACCACAGCGTGCTTCTCCTGCTCGCCTCCTTGGCCGCAGTCGCCGT
CGCGGCTGTCTGGCACCTCCGAAGCCATGGCAGACGAACAAAGCTGCCTCTGCCGCCGGGGCCGAGGGGTTGGCCGGTGCTGGGCAACCTGCCGCAG
CTAGGGGCCATGCCGCATCACACCATGGCTGCTCTCGCCCGCCAGCATGGCCCCCTCTTCCGCCTCCGCTTCGGCAGCGTCGAGGTCGTCGTCGCAG
CGTCGGCCAAGGTCGCCCGCAGCTTCCTCCGCGCGCACGACGCCAACTTCAGCGACCGCCCGCCTACCTCCGGCGCCGAGCACCTCGCCTACAACTA
CCAGGACCTCGTCTTCGCGCCCTATGGCGCCCGCTGGCGCGCCCTCCGCAAGCTCTGCGCGCTCCACCTCTTCTCCGCCCGTGCCCTCGACGCCCTC
CGCACCATACGGCAGGACGAGGCCCGACTCATGGTCACGCACTTGCTCTCTTCCTCCTCGCCGGCCGGGGTGGCGGTCAACCTGTGCGCCATCAACG
TGTGTGCTACCAACGCGCTGGCACGGGCCGCCATCGGGAGGCGCATGTTCGGCGACGGCGTCGGCGAGGGTGCCAGGGAGTTCAAGGACATGGTGGT
CGAGCTCATGCAGCTCGCCGGCGTCCTCAATATCGGCGACTTCGTGCCCGCGCTCCGCTGGCTTGACCCGCAGGGCGTCGTCGCCAAGATGAAGAGG
CTGCACCGCCGCTACGACCGCATGATGGACGGCTTCATCAGCGAGAGGGGCCAGCATGCCGGAGAGATGGAAGGGAACGACCTGCTGAGCGTGATGC
TGGCGACGATGCGGTGGCAGTCGCCCGCAGATGCCGGCGAAGAGGACGGGATCAAGTTCACCGAGATTGACATCAAGGCTCTCCTCCTGGTATGCAC
AAATTGTTACATGCCCATTTGTTTGGCCATTCATATTTTGTACGTCTAGGTAAGGTATTTGTTGATGTCAAGTCAAAGATTTTGGATTGTCATAGCT
ATATTTTTCATTTTAATTAATGGGATACAAATATTGGTTCTTTTAGAATTTATTCACGGCCGGGACAGACACGACGTCGAGCACAGTGGAGTGGGCG
CTGGCAGAGCTCATACGAGACCCTTGCATCCTCAAGCAGCTGCAGCACGAGCTCGATGGCGTAGTGGGAAATGACCGTCTTGTCACGGAAGCCGACC
TGCCACGCCTCACTTTCCTCGCCGCCGTCATCAAGGAGACATTCCGTCTACACCCGGCAACGCCGCTCTCCCTTCCCCGGGTGGCCGCTGAGGACTG
CGAGGTAGACGGCTACCATGTTTCCAAGGGCACCACCCTCATCATGAACGTGTGGGCCATCGCCCGTGACCCGGCCTCATGGGGCCCCGACCCATTG
GAGTTCCGGCCGGTCCGCTTCCTCCCGGGCGGATTGCATGAGAGCGCGGATGTGAAGGGGGGCGACTATGAGCTCATCCCGTTTGGGGCGGGTCGGA
GGATATGCGCAGGCCTCGGCTGGGGCCTTCGGATGGTGACACTCATGACTGCCATGCTGGTGCACGCATTCGACTGGTCCTTGGTTGATGGAACGAC
GCCCGAAAAACTTAACATGGAGGAGGCCTATGGTCAGACCCTGCAAAGGGCCGTGCCTCTAGTGGTTCAGCCTGTGCCTAGGTTGTTGTCGTCGGCG
TACACAGTGTGACGCATGTTTTATCA
2.1. Do a blastn search against the est_others database.
- Report the lowest E value and the number of alignments you find with this E value.
- A gene is present in this DNA sequence. From your blastn search against the est_others
database, how many exons would you predict this gene has?
- Highlight with different colors in the DNA sequence the exons of the gene defining their
borders based on your best alignments.
ANSWER 2.1.
The lowest E value is 0, and there are 7 alignments with an E value = 0.
I would predict this gene has 2 exons.
Exon1 and Exon2.
2.2. The following is the protein sequence of the gene present in the DNA sequence provided
above.
>Protein Fop1 Tm322N9
MDHSVLLLLASLAAVAVAAVWHLRSHGRRTKLPLPPGPRGWPVLGNLPQLGAMPHHTMAALARQHGPLFRLRFGSVEVVVAASAKVARSFLRAHDAN
FSDRPPTSGAEHLAYNYQDLVFAPYGARWRALRKLCALHLFSARALDALRTIRQDEARLMVTHLLSSSSPAGVAVNLCAINVCATNALARAAIGRRM
FGDGVGEGAREFKDMVVELMQLAGVLNIGDFVPALRWLDPQGVVAKMKRLHRRYDRMMDGFISERGQHAGEMEGNDLLSVMLATMRWQSPADAGEED
GIKFTEIDIKALLLNLFTAGTDTTSSTVEWALAELIRDPCILKQLQHELDGVVGNDRLVTEADLPRLTFLAAVIKETFRLHPATPLSLPRVAAEDCE
VDGYHVSKGTTLIMNVWAIARDPASWGPDPLEFRPVRFLPGGLHESADVKGGDYELIPFGAGRRICAGLGWGLRMVTLMTAMLVHAFDWSLVDGTTP
EKLNMEEAYGQTLQRAVPLVVQPVPRLLSSAYTV*
- Use the appropriate blast program to perform an alignment between the DNA sequence
and the protein sequence. Can you confirm the number of exons you had predicted the
gene has in 2.1.?
- Improve the borders of the exons defined in 2.1. based on your alignment. Find the
START codon (ATG), the STOP codon (TGA), and the splicing sites (5’ GT and 3’ AG)
of the gene, and indicate them in the DNA sequence with bold red letters (the gene is in
the 5’ -> 3’ orientation).
- Obtain the cDNA of the gene from the START codon to the STOP codon after
eliminating the introns according to the splicing sites. Present the cDNA sequence in your
homework.
ANSWER 2.2.
I used the program blastx to align the DNA sequence with the protein sequence.
I can confirm that the number of exons the gene has is 2.
>DNA Fop1 Tm322N9
TTCCATCGCGCCACCAACTGATGTGAATCGTTTACCTGTATTTATGTGCATGCGCCCATATTTATGCGCAATCGGCCACACACTGCACTGCACAATA
CTCCTACCTGCAACAAACAAAGAAACCTAGTAGCAGCTAACCAAACCATGGACCACAGCGTGCTTCTCCTGCTCGCCTCCTTGGCCGCAGTCGCCGT
CGCGGCTGTCTGGCACCTCCGAAGCCATGGCAGACGAACAAAGCTGCCTCTGCCGCCGGGGCCGAGGGGTTGGCCGGTGCTGGGCAACCTGCCGCAG
CTAGGGGCCATGCCGCATCACACCATGGCTGCTCTCGCCCGCCAGCATGGCCCCCTCTTCCGCCTCCGCTTCGGCAGCGTCGAGGTCGTCGTCGCAG
CGTCGGCCAAGGTCGCCCGCAGCTTCCTCCGCGCGCACGACGCCAACTTCAGCGACCGCCCGCCTACCTCCGGCGCCGAGCACCTCGCCTACAACTA
CCAGGACCTCGTCTTCGCGCCCTATGGCGCCCGCTGGCGCGCCCTCCGCAAGCTCTGCGCGCTCCACCTCTTCTCCGCCCGTGCCCTCGACGCCCTC
CGCACCATACGGCAGGACGAGGCCCGACTCATGGTCACGCACTTGCTCTCTTCCTCCTCGCCGGCCGGGGTGGCGGTCAACCTGTGCGCCATCAACG
TGTGTGCTACCAACGCGCTGGCACGGGCCGCCATCGGGAGGCGCATGTTCGGCGACGGCGTCGGCGAGGGTGCCAGGGAGTTCAAGGACATGGTGGT
CGAGCTCATGCAGCTCGCCGGCGTCCTCAATATCGGCGACTTCGTGCCCGCGCTCCGCTGGCTTGACCCGCAGGGCGTCGTCGCCAAGATGAAGAGG
CTGCACCGCCGCTACGACCGCATGATGGACGGCTTCATCAGCGAGAGGGGCCAGCATGCCGGAGAGATGGAAGGGAACGACCTGCTGAGCGTGATGC
TGGCGACGATGCGGTGGCAGTCGCCCGCAGATGCCGGCGAAGAGGACGGGATCAAGTTCACCGAGATTGACATCAAGGCTCTCCTCCTGGTATGCAC
AAATTGTTACATGCCCATTTGTTTGGCCATTCATATTTTGTACGTCTAGGTAAGGTATTTGTTGATGTCAAGTCAAAGATTTTGGATTGTCATAGCT
ATATTTTTCATTTTAATTAATGGGATACAAATATTGGTTCTTTTAGAATTTATTCACGGCCGGGACAGACACGACGTCGAGCACAGTGGAGTGGGCG
CTGGCAGAGCTCATACGAGACCCTTGCATCCTCAAGCAGCTGCAGCACGAGCTCGATGGCGTAGTGGGAAATGACCGTCTTGTCACGGAAGCCGACC
TGCCACGCCTCACTTTCCTCGCCGCCGTCATCAAGGAGACATTCCGTCTACACCCGGCAACGCCGCTCTCCCTTCCCCGGGTGGCCGCTGAGGACTG
CGAGGTAGACGGCTACCATGTTTCCAAGGGCACCACCCTCATCATGAACGTGTGGGCCATCGCCCGTGACCCGGCCTCATGGGGCCCCGACCCATTG
GAGTTCCGGCCGGTCCGCTTCCTCCCGGGCGGATTGCATGAGAGCGCGGATGTGAAGGGGGGCGACTATGAGCTCATCCCGTTTGGGGCGGGTCGGA
GGATATGCGCAGGCCTCGGCTGGGGCCTTCGGATGGTGACACTCATGACTGCCATGCTGGTGCACGCATTCGACTGGTCCTTGGTTGATGGAACGAC
GCCCGAAAAACTTAACATGGAGGAGGCCTATGGTCAGACCCTGCAAAGGGCCGTGCCTCTAGTGGTTCAGCCTGTGCCTAGGTTGTTGTCGTCGGCG
TACACAGTGTGACGCATGTTTTATCA
>cDNA Fop1 Tm322N9
ATGGACCACAGCGTGCTTCTCCTGCTCGCCTCCTTGGCCGCAGTCGCCGTCGCGGCTGTCTGGCACCTCCGAAGCCATGGCAGACGAACAAAGCTGC
CTCTGCCGCCGGGGCCGAGGGGTTGGCCGGTGCTGGGCAACCTGCCGCAGCTAGGGGCCATGCCGCATCACACCATGGCTGCTCTCGCCCGCCAGCA
TGGCCCCCTCTTCCGCCTCCGCTTCGGCAGCGTCGAGGTCGTCGTCGCAGCGTCGGCCAAGGTCGCCCGCAGCTTCCTCCGCGCGCACGACGCCAAC
TTCAGCGACCGCCCGCCTACCTCCGGCGCCGAGCACCTCGCCTACAACTACCAGGACCTCGTCTTCGCGCCCTATGGCGCCCGCTGGCGCGCCCTCC
GCAAGCTCTGCGCGCTCCACCTCTTCTCCGCCCGTGCCCTCGACGCCCTCCGCACCATACGGCAGGACGAGGCCCGACTCATGGTCACGCACTTGCT
CTCTTCCTCCTCGCCGGCCGGGGTGGCGGTCAACCTGTGCGCCATCAACGTGTGTGCTACCAACGCGCTGGCACGGGCCGCCATCGGGAGGCGCATG
TTCGGCGACGGCGTCGGCGAGGGTGCCAGGGAGTTCAAGGACATGGTGGTCGAGCTCATGCAGCTCGCCGGCGTCCTCAATATCGGCGACTTCGTGC
CCGCGCTCCGCTGGCTTGACCCGCAGGGCGTCGTCGCCAAGATGAAGAGGCTGCACCGCCGCTACGACCGCATGATGGACGGCTTCATCAGCGAGAG
GGGCCAGCATGCCGGAGAGATGGAAGGGAACGACCTGCTGAGCGTGATGCTGGCGACGATGCGGTGGCAGTCGCCCGCAGATGCCGGCGAAGAGGAC
GGGATCAAGTTCACCGAGATTGACATCAAGGCTCTCCTCCTGAATTTATTCACGGCCGGGACAGACACGACGTCGAGCACAGTGGAGTGGGCGCTGG
CAGAGCTCATACGAGACCCTTGCATCCTCAAGCAGCTGCAGCACGAGCTCGATGGCGTAGTGGGAAATGACCGTCTTGTCACGGAAGCCGACCTGCC
ACGCCTCACTTTCCTCGCCGCCGTCATCAAGGAGACATTCCGTCTACACCCGGCAACGCCGCTCTCCCTTCCCCGGGTGGCCGCTGAGGACTGCGAG
GTAGACGGCTACCATGTTTCCAAGGGCACCACCCTCATCATGAACGTGTGGGCCATCGCCCGTGACCCGGCCTCATGGGGCCCCGACCCATTGGAGT
TCCGGCCGGTCCGCTTCCTCCCGGGCGGATTGCATGAGAGCGCGGATGTGAAGGGGGGCGACTATGAGCTCATCCCGTTTGGGGCGGGTCGGAGGAT
ATGCGCAGGCCTCGGCTGGGGCCTTCGGATGGTGACACTCATGACTGCCATGCTGGTGCACGCATTCGACTGGTCCTTGGTTGATGGAACGACGCCC
GAAAAACTTAACATGGAGGAGGCCTATGGTCAGACCCTGCAAAGGGCCGTGCCTCTAGTGGTTCAGCCTGTGCCTAGGTTGTTGTCGTCGGCGTACA
CAGTGTGA
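The intron-removal step in Answer 2.2 can be sketched in Python. This is a minimal illustration on a made-up toy sequence, not on the Fop1 gene itself; the function name `splice` and the exon coordinates are hypothetical.

```python
def splice(genomic, exons):
    """Join exons given as 0-based, end-exclusive (start, end) pairs."""
    # Each intron between consecutive exons must start with GT and end with AG.
    for (_, end1), (start2, _) in zip(exons, exons[1:]):
        intron = genomic[end1:start2]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            raise ValueError(f"intron {end1}-{start2} lacks GT...AG boundaries")
    return "".join(genomic[start:end] for start, end in exons)

# Toy sequence (hypothetical): 2 flanking bases, exon 1, a GT...AG intron,
# exon 2, 2 flanking bases.
toy = "CC" + "ATGGCTCTCCTG" + "GTATGCACTTTTAG" + "AATTTATTCTGA" + "CG"
exons = [(2, 14), (28, 40)]
cdna = splice(toy, exons)
print(cdna)  # ATGGCTCTCCTGAATTTATTCTGA
```

A quick sanity check on any cDNA obtained this way is that it starts with ATG, ends with a STOP codon, and has a length that is a multiple of 3.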
2.3. Using the protein sequence of this gene, perform a blastp search.
- Report the lowest E value and the number of alignments you find with this E value.
- What is this gene? Which percentages of identity between your query protein sequence
and the aligned proteins from the database support your conclusion? Report the
accession.version numbers of the aligned proteins from the database.
- What is the conserved domain present in this protein? Using Shift+PrintScreen, present a
picture of it in your homework.
ANSWER 2.3.
The lowest E value is 0, and there are 6 alignments with an E = 0.
This gene is likely a flavonoid 3’-hydroxylase.
Rice, 72% identity: NP_001064338.1, AAM00948.1|AC021892_12, AAN04937.1, AAP52914.1,
BAF26252.1, EAY78007.1, EAZ15637.1
Sorghum, 71% identity: ABG54319.1
Sorghum, 70% identity: ABG54321.1
Sorghum, 69% identity: ABG54320.1, AAV74194.1, AAV74195.1
Maize, 73% identity: ACF85998.1
The conserved domain present in the protein is Cytochrome P450.
3. 10 points The following is the protein sequence of the rice (Oryza sativa L.) orthologue of
the wheat gene presented in 2.:
>Protein Fop1 Rice
MDVVPLPLLLGSLAVSAAVWYLVYFLRGGSGGDAARKRRPLPPGPRGWPVLGNLPQLGDKPHHTMCALARQYGPLFRLRFGCAEVVVAASAPVAAQF
LRGHDANFSNRPPNSGAEHVAYNYQDLVFAPYGARWRALRKLCALHLFSAKALDDLRAVREGEVALMVRNLARQQAASVALGQEANVCATNTLARAT
IGHRVFAVDGGEGAREFKEMVVELMQLAGVFNVGDFVPALRWLDPQGVVAKMKRLHRRYDNMMNGFINERKAGAQPDGVAAGEHGNDLLSVLLARMQ
EEQKLDGDGEKITETDIKALLLNLFTAGTDTTSSTVEWALAELIRHPDVLKEAQHELDTVVGRGRLVSESDLPRLPYLTAVIKETFRLHPSTPLSLP
REAAEECEVDGYRIPKGATLLVNVWAIARDPTQWPDPLQYQPSRFLPGRMHADVDVKGADFGLIPFGAGRRICAGLSWGLRMVTLMTATLVHGFDWT
LANGATPDKLNMEEAYGLTLQRAVPLMVQPVPRLLPSAYGV*
3.1. Use the appropriate blast program to perform an alignment between these two protein
sequences, Fop1 from wheat and Fop1 from rice.
- What is the percentage of identity between them?
ANSWER 3.1.
I used the program blastp to align both protein sequences. They are 72% identical.
3.2. Use one of the dynamic-programming methods shown in both Lecture2 and Lab2 to align
both protein sequences.
- Would you perform a global or a local alignment?
- Which BLOSUM matrix would you use?
Answer these questions and based on your answers run the alignment. Present your
results reporting the length of the sequence aligned, similarity, identity, number of gaps,
and final score.
ANSWER 3.2.
I would perform a global (needle) alignment.
I would use the BLOSUM62 matrix, since the proteins are 72% identical.
Including gaps, the global alignment between both proteins spans 535 positions.
The proteins are 79.1% similar and 72.0% identical, with 25 gaps (4.7%) and a final
score of 1903.0.
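For reference, the dynamic-programming idea behind a global alignment can be sketched in a few lines of Python. This is a simplified Needleman-Wunsch score calculation that assumes a plain match/mismatch score and a linear gap penalty; the needle program used in the answer applies a substitution matrix (BLOSUM62 here) and affine gap penalties, so this sketch will not reproduce the score of 1903.0. The function name and default scores are illustrative.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of a and b (linear gap penalty)."""
    # prev holds the previous row of the DP matrix; row 0 is all gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: a[:i] aligned against gaps only
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]

print(needleman_wunsch_score("ACGT", "AGT"))  # 2
```

A global alignment forces every residue of both sequences into the alignment, which suits two full-length orthologous proteins of similar size.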
4. 10 points The following two sequences correspond to the same gene in both wheat and rice
(50 million years of divergence). The sequences in pink correspond to the exons and the ones in
black to the introns. START and STOP codons are bolded and highlighted. Splicing sites are
bolded and in red.
>CyB5 Wheat
CGAGAGCGAGATGCCGACGCTGACGAAGCTGTACAGCATGAAGGAGGCCGCCCTCCACAACACCCCCGACGACTGCTGGATCGTCGTCGACGGCAAG
GTAGCGCCTCCCTCATACCCCTCGCCGCCGATCTGGCTTCAGCAATACTGCCCCTAACATCGGTAGGTAGGTAGGTAGGGTGTATGGACGCGCTTCG
TCGTTGCTAGTTGGGCTTCGACCCCCGCCCGTAGCCTGTTCGACCGAATGCCTGGGAGATCCTGCGCTCGCTGTGTTAGTGAGAAGGCCGCAGAAAT
CGAAACCTGCTAGTCTAGGCACCAACGCTAAGGTTTGATCCTCGTGGGACAACTGTGCTGGGGTATCCTGTTTGTGGAGGTTGTGCTTGAAAGCAAC
TACAGCAGATGCCTCATACTGAGGGCTTTGAATCAAATAGAATTTGTGTCAGCAGAGAGTAGATGCGCATTGCAGTACTCCTACTTGGCAATATGTT
CCACTATTCTGATTGTGTGGAGATCTCATGCCGTGTTGATGGATACATTGCAGATTTATGATGTGACTGCGTATTTGGACGACCATCCTGGGGGTGC
TGATGTTCTCCTTGGGGTGACCGGTACTTCTTCTCTCCGCTTCTTTTCATGTTCTTGTTCAGCACATTTTATTCTCTCTTAGGCTGAATGCTCATGT
ATGATAATCCGTTTGAAGGTATGGATGGCACCGAGGAATTTGAAGATGCAGGCCACAGCAAGGATGCCAAGGAGTTGATGAAGGATTACTTCATTGG
GGAGTTGGACTTGGACGAAACACCTGACATGCCTGAGATGGAGGTTTTCAGGAAAGAGCAGGACAAGGACTTCGCCAGCAAGCTGGCGGCTTATGCT
GTGCAGTACTGGGCCATTCCGGTAGCAGCAGTCGGGATATCAGCCGTGGTTGCCATATTGTATGCACGAAGGAAGTGA
>Cyb5 Rice
GGAGGAGGAGATGCCGACGCTGACGAAGCTGTACAGCTTGGAGGACGCGGCGCGCCACAACACCGCCGACGACTGCTGGGTCGTCGTCGACGGCAAG
GTAAGCTTTCCCCATCTTAGCTCTCCTCCGTTCCTTCGCTCCCCATCTTAGCTCTCCTCGTTGCTGCTGAAGTAGCAGTAGCACGTGTAACGGTGTA
AGGTCGGGAGATAGATGGGTGGGTGGATTGGTAGGGGGTGCGACCGTGCGAAGCTCGCTGCTCGCTCGGTCAAGATGTCGCCCGTAACCTGTTCGAC
GGAATGGCTACTAGATCGCGTGCTCGATTTCTTTGTGCTAAACTGCAATTTACCATCTTGCGATGCAGTAGTGGTATTTGTTGTCAGGCGACTAGTC
AGGAGTAGTGATTTAATGCGCTGTGGTTATAGTGCGGGCTATCATTCTTTCTTGTGGAAACCCGTCGTATTTACCTGCATTGAACTATTGAAGGCTA
TGGTCAAATTGTTTGCTAGGGTCACTAAAGAATTAGAGATCTGATGCATGGCTACATGTTACGTTGTTCTTACCTACTATTCAGACAAGTTCATGCT
GTGTCAATGAATGCGCTGCAGATTTATGATGTCACCAAGTATCTGGACGACCATCCTGGGGGTGCTGATGTTCTGCTCGAAGTGACCGGTACTGATA
ACCCTCCATTAATCTTATGTTTCTTTTTTCAGTAATACCTAGTTTATTTAGGTGGACTGATCATATCTGATTGTCTGTTATAAGGTAAGGATGCCAA
GGAGGAATTTGATGATGCGGGGCACAGCGAGAGTGCCAAGGAGCTAATGCAAGATTATTTCATTGGGGAGTTGGATCCAACACCCAACATCCCTGAG
ATGGAGGTTTTCAGGAAGGAGCAGGATGTGAACTTCGCAAGCAAGCTGATGGCCAATGCAGCACAGTACTGGCCCATTCCAGCGACAGTAGTCGGGA
TATCAGTCGTTATTGCTGTACTGTATGCACGCCAGAAGTGATAATC
4.1. Use both Dotter and Blast2sequences to align both gene sequences.
- Using Shift+PrintScreen, present a plot of the alignments in your homework.
- Report the Dotter parameters (window size and stringency) and the Blast2sequences
scoring system used.
- What are the conserved parts between the wheat and rice genes?
ANSWER 4.1.
Dotter
Window size: 22
Stringency: 40/100
Blast2sequences
Match: 1
Mismatch: -2
Gap open: 5
Gap extension: 2
The conserved parts between the wheat and rice sequences are the exons of the gene.
5. 10 points Using the following sequence:
>Tm67B4
TCATCTTTGGCAAACATGTCCTTAGAGCATCTCCAGCCGTTCAGCCCATAGGACGCCGAAGAAGAGCCGCTTGGGGCTGAACCGACGCTTGCTTGGC
GCGTGGGGGCGACTATGTTCCCAGTCGATGCCCCCAGGTCGCCGTCAAAATCGCGCGAATTCAGCCATATTCCAAACAAATTTGTAGAAACTCGGCG
ATATTTCATTGAAATTTATACAAAAACATAAAAACATGCAAACTACGCTAAACTACGCCTATCCCTGCTACACCGTGGCCACCGCCCACCATCTACA
TGCCGAGAAGCCTGTAGAAACGGGTGTAGTCGCCGCCGCCGCCGCCATCGTCGTCGTCGCGCCGGAGCCGCCGTTGTCCCTGCTGCACTCCTGGCCA
GTGGCACCGACGCGCGGTGGGGTATTGGACGGGCCGGCCTCGTCCTCCTCGTCGTTGTTGAGGGCGATGACTGTTGGAAATATGCCCTAGAGACAAT
AATAAATTGATTATTATTATATTTCCTTGTTCATGATAATCGTTTATTA
5.1. Use Dotter to align the sequence with itself.
- Using Shift+PrintScreen, present a plot of the alignment in your homework.
- Report the Dotter parameters used (window size and stringency).
- What kind of repeats can you observe? Indicate their approximate coordinates.
- What is this sequence? (Look at the coordinates of the best alignment against the
database).
ANSWER 5.1.
Dotter
Window size: 21
Stringency: 40/80
I can observe direct repeats. The coordinates of three of them are: (200;170), (325;345),
(430;450).
This sequence is a mobile element: Xusag-1.
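The idea of a self-comparison dot plot can be illustrated with a small Python sketch. It is a simplification that counts identical positions in a sliding window, and it is not Dotter's actual scoring or display method; the function name, the toy sequence, and the parameters are hypothetical.

```python
def dot_plot(seq1, seq2, window, min_matches):
    """Return (i, j) pairs where the windows starting at i in seq1 and at j
    in seq2 share at least min_matches identical positions."""
    hits = []
    for i in range(len(seq1) - window + 1):
        for j in range(len(seq2) - window + 1):
            matches = sum(a == b for a, b in zip(seq1[i:i + window],
                                                 seq2[j:j + window]))
            if matches >= min_matches:
                hits.append((i, j))
    return hits

toy = "ACGTTTACGT"  # toy sequence with ACGT repeated at positions 0 and 6
hits = dot_plot(toy, toy, window=4, min_matches=4)
# Keep only hits above the main diagonal; these correspond to direct repeats.
off_diagonal = [(i, j) for i, j in hits if i < j]
print(off_diagonal)  # [(0, 6)]
```

In a self-comparison the main diagonal is always present, so repeats show up as shorter lines parallel to it.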
6. 20 points Using the two following scoring matrices, calculate manually the scores for the
following alignments:
6.1. Scoring matrix A: Match 2, mismatch -1, open gap -5, extended gap -1 (affine gap penalty)
6.2. Scoring matrix B: Match 2, mismatch -1, gap -2 per each bp (linear gap penalty)
- Which alignment is better under each scoring matrix?
- What is the effect of affine versus linear gap penalties in the number of gaps introduced
in an alignment?
Alignment 1
ACAAAGATACTATTAAT
|| | |   |||  ||
ACGA-GC--CTACAAAC
Alignment 2
ACAAAGATACTACTAAT
||||     |||| ||
ACAA---GCCTACAAAC
ANSWERS 6.1. and 6.2.
Alignment 1:
A. 18-5-5-5-1 = 2
B. 18-5-6 = 7
Alignment 2:
A. 20-4-5-2 = 9
B. 20-4-6 = 10
Under both scoring matrices, alignment 2 is better, since it scores higher than alignment 1
(9 vs. 2 under scoring matrix A, and 10 vs. 7 under scoring matrix B).
Affine gap penalties charge a large penalty for opening a gap and a smaller one for each
position that extends it, which here results in lower scores (scoring matrix A). Because opening
a gap is “expensive”, an alignment with a few long gaps scores better than one with many short
gaps, and the number of gaps introduced in the alignment is smaller.
On the other hand, linear gap penalties charge the same amount for every gapped position,
whether it opens or extends a gap, and the resulting scores are higher (scoring matrix B). In
other words, it is “cheaper” to open a new gap, and this is the reason why the number of gaps
introduced in the alignment can be bigger.
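The manual scores above can be reproduced with a short Python sketch. It assumes, as in the answers, that the first position of each gap pays the open penalty and every further position of the same gap pays the extension penalty; the function name `score_alignment` is illustrative.

```python
def score_alignment(top, bottom, match=2, mismatch=-1, gap_open=-5, gap_extend=-1):
    """Score a pairwise alignment given as two equal-length gapped strings."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    score = 0
    gap_in = None  # which string currently carries an open gap, if any
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            side = "top" if a == "-" else "bottom"
            # The first position of a gap pays gap_open; later ones pay gap_extend.
            score += gap_extend if gap_in == side else gap_open
            gap_in = side
        else:
            score += match if a == b else mismatch
            gap_in = None
    return score

ALIGNMENTS = {
    1: ("ACAAAGATACTATTAAT", "ACGA-GC--CTACAAAC"),
    2: ("ACAAAGATACTACTAAT", "ACAA---GCCTACAAAC"),
}
for name, (top, bottom) in ALIGNMENTS.items():
    affine = score_alignment(top, bottom)                              # matrix A
    linear = score_alignment(top, bottom, gap_open=-2, gap_extend=-2)  # matrix B
    print(name, affine, linear)  # prints "1 2 7" then "2 9 10"
```

Setting the open and extension penalties to the same value turns the affine scheme into the linear one, so a single function covers both scoring matrices.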
7. 5 points Using Boolean operators perform the following ENTREZ searches and report the
number of Nucleotides found:
- Containing ‘flavonoid’
1872
- Containing ‘flavonoid’ and related family words (using truncation, *)
2240
- Containing both ‘flavonoid’ and ‘hydroxylase’
356
- Containing either ‘flavonoid’ or ‘hydroxylase’
18607
- Containing both ‘flavonoid’ and ‘hydroxylase’ in rice
21
- Containing both ‘flavonoid’ and ‘hydroxylase’ in rice but not in Arabidopsis
5
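The six searches can be written as Entrez query strings, sketched here in Python. The key does not record the exact strings that were typed, so these are assumptions about one reasonable way to phrase them; the hit counts reported above date from 2008 and will differ today.

```python
# Entrez Boolean operators are written in uppercase: AND, OR, NOT.
# An asterisk truncates a term so that related words are also matched.
# These query strings are assumed phrasings, not taken from the key.
queries = {
    "flavonoid": "flavonoid",
    "flavonoid family": "flavonoid*",
    "both": "flavonoid AND hydroxylase",
    "either": "flavonoid OR hydroxylase",
    "both in rice": "flavonoid AND hydroxylase AND rice",
    "rice, not Arabidopsis": "flavonoid AND hydroxylase AND rice NOT arabidopsis",
}

for label, query in queries.items():
    print(f"{label}: {query}")
```

AND narrows a search, OR widens it, and NOT excludes records, which matches the way the reported counts shrink or grow from one search to the next.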