BIT150 – Fall 2007 – Department of Plant Sciences

BIT150 – Fall 2008 –
Homework 2
Due on Thursday, October 9th, BEFORE the lab, by email to the TA (mfaricelli@ucdavis.edu) as Hwk2_Lastname
1. 15 points Using the DNA sequence presented below:
>DNA Tm322N9
CGGAATTATTATTTAATTGGTCAGATTTATTGTTTCTATTCAGACAGATGGTTTCAGCAATACTTTTTGTGTGACTTTTTTGCATGTGATGACACCG
TCTCCGAGGGCCGTCACCACCCCCAGACTCCTAGAGTAGAAGTCACCTGCAAGATACCTGGGTGTCAGTTATGTGCACGTGAACTGAGATGCTTGCA
GTCAAAAGAGATGAGTGTTGCCAGTTGATGCTTATTCTGACACCGGCAACGAGATGATTCACAACCTGCAAGCATTCAATCAAGAAGAGTAAACAGG
TATGGAACCGTGAACACTGCAAAAACAATTATGTTTTCTCATTAATGTATGATAAACTGATGCTATGAGATATTTTCTTGCTGTCTGATTACCATTT
GATGGAACCTTCACTATTATCAGTTGGGAAACAAACCTGTTGTTTACGTCACTTTGAGGCTGGAAACTGGAGTTGTGAGCTGCATAGTCGATGCAGT
TGATGCTTATTCTGACACCGGCAACGACATGATTCACCACCTGCAAGCATTCATTCAAGAAGAGTAAAGAATTTGGGGATGACAAATCGACCTAAAC
AGGTATTGGGTGCTCCGTTGTAAAATTCATTGTTCTCCGTC
1.1. Do a blastn search against the nucleotide collection database.
- Report the lowest E value and calculate the probability of finding an alignment with this
E value by chance (P = 1 - e^(-E)).
- Can you conclude that your finding is NOT just a random alignment?
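The conversion from E value to probability in 1.1 can be sketched in a few lines of Python. The function name p_from_e and the E values in the loop are illustrative assumptions, not BLAST output.

```python
import math

def p_from_e(e_value: float) -> float:
    """Probability of at least one chance alignment this good: P = 1 - e^(-E)."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate when E is very small.
    return -math.expm1(-e_value)

# Illustrative E values only (hypothetical, not taken from a real search).
for e in (10.0, 1.0, 0.01, 1e-50):
    print(f"E = {e:g}  ->  P = {p_from_e(e):.3g}")
```

For small E values, P is practically equal to E, while for large E values P approaches 1.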
1.2. Repeat your blastn search but now against the est_others database.
- Report the lowest E value. Can you conclude now that your finding is NOT just a random
alignment?
- Are all these EST sequences present in the nucleotide collection database?
- Click on the link ‘Distance tree of results’ that appears above your table of ‘Sequences
producing significant alignments’. Using Shift+PrintScreen, include the picture of the
tree in your homework.
1.3. This sequence is from cultivated diploid wheat, Triticum monococcum L., a species that
belongs to the Triticeae tribe within the Poaceae (grass) family. Repeat your blastn search
using the tribe as the limit by Organism.
- How many alignments did you find? Report their accession.version numbers. Open their
flat files and indicate the NCBI division to which they belong.
- Report the lowest E value. Is this a lower or a higher E value than the one obtained in
1.2.?
2. 30 points Using the DNA sequence presented below:
>DNA Fop1 Tm322N9
TTCCATCGCGCCACCAACTGATGTGAATCGTTTACCTGTATTTATGTGCATGCGCCCATATTTATGCGCAATCGGCCACACACTGCACTGCACAATA
CTCCTACCTGCAACAAACAAAGAAACCTAGTAGCAGCTAACCAAACCATGGACCACAGCGTGCTTCTCCTGCTCGCCTCCTTGGCCGCAGTCGCCGT
CGCGGCTGTCTGGCACCTCCGAAGCCATGGCAGACGAACAAAGCTGCCTCTGCCGCCGGGGCCGAGGGGTTGGCCGGTGCTGGGCAACCTGCCGCAG
CTAGGGGCCATGCCGCATCACACCATGGCTGCTCTCGCCCGCCAGCATGGCCCCCTCTTCCGCCTCCGCTTCGGCAGCGTCGAGGTCGTCGTCGCAG
CGTCGGCCAAGGTCGCCCGCAGCTTCCTCCGCGCGCACGACGCCAACTTCAGCGACCGCCCGCCTACCTCCGGCGCCGAGCACCTCGCCTACAACTA
CCAGGACCTCGTCTTCGCGCCCTATGGCGCCCGCTGGCGCGCCCTCCGCAAGCTCTGCGCGCTCCACCTCTTCTCCGCCCGTGCCCTCGACGCCCTC
CGCACCATACGGCAGGACGAGGCCCGACTCATGGTCACGCACTTGCTCTCTTCCTCCTCGCCGGCCGGGGTGGCGGTCAACCTGTGCGCCATCAACG
TGTGTGCTACCAACGCGCTGGCACGGGCCGCCATCGGGAGGCGCATGTTCGGCGACGGCGTCGGCGAGGGTGCCAGGGAGTTCAAGGACATGGTGGT
CGAGCTCATGCAGCTCGCCGGCGTCCTCAATATCGGCGACTTCGTGCCCGCGCTCCGCTGGCTTGACCCGCAGGGCGTCGTCGCCAAGATGAAGAGG
CTGCACCGCCGCTACGACCGCATGATGGACGGCTTCATCAGCGAGAGGGGCCAGCATGCCGGAGAGATGGAAGGGAACGACCTGCTGAGCGTGATGC
TGGCGACGATGCGGTGGCAGTCGCCCGCAGATGCCGGCGAAGAGGACGGGATCAAGTTCACCGAGATTGACATCAAGGCTCTCCTCCTGGTATGCAC
AAATTGTTACATGCCCATTTGTTTGGCCATTCATATTTTGTACGTCTAGGTAAGGTATTTGTTGATGTCAAGTCAAAGATTTTGGATTGTCATAGCT
ATATTTTTCATTTTAATTAATGGGATACAAATATTGGTTCTTTTAGAATTTATTCACGGCCGGGACAGACACGACGTCGAGCACAGTGGAGTGGGCG
CTGGCAGAGCTCATACGAGACCCTTGCATCCTCAAGCAGCTGCAGCACGAGCTCGATGGCGTAGTGGGAAATGACCGTCTTGTCACGGAAGCCGACC
TGCCACGCCTCACTTTCCTCGCCGCCGTCATCAAGGAGACATTCCGTCTACACCCGGCAACGCCGCTCTCCCTTCCCCGGGTGGCCGCTGAGGACTG
CGAGGTAGACGGCTACCATGTTTCCAAGGGCACCACCCTCATCATGAACGTGTGGGCCATCGCCCGTGACCCGGCCTCATGGGGCCCCGACCCATTG
GAGTTCCGGCCGGTCCGCTTCCTCCCGGGCGGATTGCATGAGAGCGCGGATGTGAAGGGGGGCGACTATGAGCTCATCCCGTTTGGGGCGGGTCGGA
GGATATGCGCAGGCCTCGGCTGGGGCCTTCGGATGGTGACACTCATGACTGCCATGCTGGTGCACGCATTCGACTGGTCCTTGGTTGATGGAACGAC
GCCCGAAAAACTTAACATGGAGGAGGCCTATGGTCAGACCCTGCAAAGGGCCGTGCCTCTAGTGGTTCAGCCTGTGCCTAGGTTGTTGTCGTCGGCG
TACACAGTGTGACGCATGTTTTATCA
2.1. Do a blastn search against the est_others database.
- Report the lowest E value and the number of alignments you find with this E value.
- A gene is present in this DNA sequence. From your blastn search against the est_others
database, how many exons would you predict this gene has?
- In the DNA sequence, highlight the exons of the gene with different colors, defining their
borders based on your best alignments.
2.2. The following is the protein sequence of the gene present in the DNA sequence provided
above.
>Protein Fop1 Tm322N9
MDHSVLLLLASLAAVAVAAVWHLRSHGRRTKLPLPPGPRGWPVLGNLPQLGAMPHHTMAALARQHGPLFRLRFGSVEVVVAASAKVARSFLRAHDAN
FSDRPPTSGAEHLAYNYQDLVFAPYGARWRALRKLCALHLFSARALDALRTIRQDEARLMVTHLLSSSSPAGVAVNLCAINVCATNALARAAIGRRM
FGDGVGEGAREFKDMVVELMQLAGVLNIGDFVPALRWLDPQGVVAKMKRLHRRYDRMMDGFISERGQHAGEMEGNDLLSVMLATMRWQSPADAGEED
GIKFTEIDIKALLLNLFTAGTDTTSSTVEWALAELIRDPCILKQLQHELDGVVGNDRLVTEADLPRLTFLAAVIKETFRLHPATPLSLPRVAAEDCE
VDGYHVSKGTTLIMNVWAIARDPASWGPDPLEFRPVRFLPGGLHESADVKGGDYELIPFGAGRRICAGLGWGLRMVTLMTAMLVHAFDWSLVDGTTP
EKLNMEEAYGQTLQRAVPLVVQPVPRLLSSAYTV*
- Use the appropriate blast program to perform an alignment between the DNA sequence
and the protein sequence. Can you confirm the number of exons you had predicted the
gene has in 2.1.?
- Improve the borders of the exons defined in 2.1. based on your alignment. Find the
START codon (ATG), the STOP codon (TGA), and the splicing sites (5’ GT and 3’ AG)
of the gene, and indicate them in the DNA sequence with bold red letters (the gene is in
the 5’ -> 3’ orientation).
- Obtain the cDNA of the gene from the START codon to the STOP codon after
eliminating the introns according to the splicing sites. Present the cDNA sequence in your
homework.
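The splicing step in 2.2 can be checked with a short Python sketch. Everything here is an illustrative assumption: the toy genomic sequence, the exon coordinates, and the helper names splice, intron_is_canonical, and translate are made up for the example and are not the Fop1 answer.

```python
def splice(genomic, exons):
    """Join exons given as 1-based, inclusive (start, end) coordinates."""
    return "".join(genomic[start - 1:end] for start, end in exons)

def intron_is_canonical(genomic, exon_a, exon_b):
    """True if the intron between two exons starts with GT and ends with AG."""
    intron = genomic[exon_a[1]:exon_b[0] - 1]
    return intron.startswith("GT") and intron.endswith("AG")

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODONS = {a + b + c: AMINO[16 * i + 4 * j + k]
          for i, a in enumerate(BASES)
          for j, b in enumerate(BASES)
          for k, c in enumerate(BASES)}

def translate(cdna):
    """Translate a cDNA in frame 1; '*' marks a stop codon."""
    return "".join(CODONS[cdna[i:i + 3]] for i in range(0, len(cdna) - 2, 3))

# Toy gene (hypothetical): 5' flank + exon 1 + intron + exon 2 + 3' flank.
genomic = "CC" + "ATGGCT" + "GTAAGTTTCAG" + "GATTGA" + "CC"
exons = [(3, 8), (20, 25)]

cdna = splice(genomic, exons)
print(cdna)                                      # spliced cDNA
print(translate(cdna))                           # protein, ending in '*'
print(intron_is_canonical(genomic, *exons))      # GT...AG check
```

Translating the spliced cDNA and comparing it with the given protein sequence is a quick way to confirm that the exon borders were placed correctly.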
2.3. Using the protein sequence of this gene, perform a blastp search.
- Report the lowest E value and the number of alignments you find with this E value.
- What is this gene? Which percentages of identity between your query protein sequence
and the aligned proteins from the database support your conclusion? Report the
accession.version numbers of the aligned proteins from the database.
- What is the conserved domain present in this protein? Using Shift+PrintScreen, present a
picture of it in your homework.
3. 10 points The following is the protein sequence of the rice (Oryza sativa L.) orthologue of the
wheat gene presented in 2.:
>Protein Fop1 Rice
MDVVPLPLLLGSLAVSAAVWYLVYFLRGGSGGDAARKRRPLPPGPRGWPVLGNLPQLGDKPHHTMCALARQYGPLFRLRFGCAEVVVAASAPVAAQF
LRGHDANFSNRPPNSGAEHVAYNYQDLVFAPYGARWRALRKLCALHLFSAKALDDLRAVREGEVALMVRNLARQQAASVALGQEANVCATNTLARAT
IGHRVFAVDGGEGAREFKEMVVELMQLAGVFNVGDFVPALRWLDPQGVVAKMKRLHRRYDNMMNGFINERKAGAQPDGVAAGEHGNDLLSVLLARMQ
EEQKLDGDGEKITETDIKALLLNLFTAGTDTTSSTVEWALAELIRHPDVLKEAQHELDTVVGRGRLVSESDLPRLPYLTAVIKETFRLHPSTPLSLP
REAAEECEVDGYRIPKGATLLVNVWAIARDPTQWPDPLQYQPSRFLPGRMHADVDVKGADFGLIPFGAGRRICAGLSWGLRMVTLMTATLVHGFDWT
LANGATPDKLNMEEAYGLTLQRAVPLMVQPVPRLLPSAYGV
3.1. Use the appropriate blast program to perform an alignment between these two protein
sequences, Fop1 from wheat and Fop1 from rice.
- What is the percentage of identity between them?
3.2. Use one of the dynamic-programming methods shown in both Lecture2 and Lab2 to align
both protein sequences.
- Would you perform a global or a local alignment?
- Which BLOSUM matrix would you use?
- Answer these questions and based on your answers run the alignment. Present your
results reporting the length of the sequence aligned, similarity, identity, number of gaps,
and final score.
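As a reminder of how the dynamic-programming methods in 3.2 work, here is a minimal sketch of a global (Needleman-Wunsch) alignment in Python. It assumes a simple match/mismatch score and a linear gap penalty instead of a BLOSUM matrix with affine gaps, so its scores will not match those of the lab tools. The function names and the toy sequences are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty; returns (score, top, bottom)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # first column: a aligned against gaps
        score[i][0] = i * gap
    for j in range(1, m + 1):          # first row: b aligned against gaps
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align a[i] with b[j]
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))

def percent_identity(top, bottom):
    """Identical columns as a percentage of the alignment length."""
    same = sum(x == y and x != "-" for x, y in zip(top, bottom))
    return 100.0 * same / len(top)

s, top, bottom = needleman_wunsch("ACGT", "AGT")   # toy sequences
print(s, top, bottom, percent_identity(top, bottom))
```

A local (Smith-Waterman) alignment differs mainly in that cell scores are never allowed to drop below zero and the traceback starts from the highest-scoring cell.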
4. 10 points The following two sequences correspond to the same gene in both wheat and rice
(50 million years of divergence). The sequences in pink correspond to the exons and the ones in
black to the introns. START and STOP codons are bolded and highlighted. Splicing sites are
bolded and in red.
>CyB5 Wheat
CGAGAGCGAGATGCCGACGCTGACGAAGCTGTACAGCATGAAGGAGGCCGCCCTCCACAACACCCCCGACGACTGCTGGATCGTCGTCGACGGCAAG
GTAGCGCCTCCCTCATACCCCTCGCCGCCGATCTGGCTTCAGCAATACTGCCCCTAACATCGGTAGGTAGGTAGGTAGGGTGTATGGACGCGCTTCG
TCGTTGCTAGTTGGGCTTCGACCCCCGCCCGTAGCCTGTTCGACCGAATGCCTGGGAGATCCTGCGCTCGCTGTGTTAGTGAGAAGGCCGCAGAAAT
CGAAACCTGCTAGTCTAGGCACCAACGCTAAGGTTTGATCCTCGTGGGACAACTGTGCTGGGGTATCCTGTTTGTGGAGGTTGTGCTTGAAAGCAAC
TACAGCAGATGCCTCATACTGAGGGCTTTGAATCAAATAGAATTTGTGTCAGCAGAGAGTAGATGCGCATTGCAGTACTCCTACTTGGCAATATGTT
CCACTATTCTGATTGTGTGGAGATCTCATGCCGTGTTGATGGATACATTGCAGATTTATGATGTGACTGCGTATTTGGACGACCATCCTGGGGGTGC
TGATGTTCTCCTTGGGGTGACCGGTACTTCTTCTCTCCGCTTCTTTTCATGTTCTTGTTCAGCACATTTTATTCTCTCTTAGGCTGAATGCTCATGT
ATGATAATCCGTTTGAAGGTATGGATGGCACCGAGGAATTTGAAGATGCAGGCCACAGCAAGGATGCCAAGGAGTTGATGAAGGATTACTTCATTGG
GGAGTTGGACTTGGACGAAACACCTGACATGCCTGAGATGGAGGTTTTCAGGAAAGAGCAGGACAAGGACTTCGCCAGCAAGCTGGCGGCTTATGCT
GTGCAGTACTGGGCCATTCCGGTAGCAGCAGTCGGGATATCAGCCGTGGTTGCCATATTGTATGCACGAAGGAAGTGA
>Cyb5 Rice
GGAGGAGGAGATGCCGACGCTGACGAAGCTGTACAGCTTGGAGGACGCGGCGCGCCACAACACCGCCGACGACTGCTGGGTCGTCGTCGACGGCAAG
GTAAGCTTTCCCCATCTTAGCTCTCCTCCGTTCCTTCGCTCCCCATCTTAGCTCTCCTCGTTGCTGCTGAAGTAGCAGTAGCACGTGTAACGGTGTA
AGGTCGGGAGATAGATGGGTGGGTGGATTGGTAGGGGGTGCGACCGTGCGAAGCTCGCTGCTCGCTCGGTCAAGATGTCGCCCGTAACCTGTTCGAC
GGAATGGCTACTAGATCGCGTGCTCGATTTCTTTGTGCTAAACTGCAATTTACCATCTTGCGATGCAGTAGTGGTATTTGTTGTCAGGCGACTAGTC
AGGAGTAGTGATTTAATGCGCTGTGGTTATAGTGCGGGCTATCATTCTTTCTTGTGGAAACCCGTCGTATTTACCTGCATTGAACTATTGAAGGCTA
TGGTCAAATTGTTTGCTAGGGTCACTAAAGAATTAGAGATCTGATGCATGGCTACATGTTACGTTGTTCTTACCTACTATTCAGACAAGTTCATGCT
GTGTCAATGAATGCGCTGCAGATTTATGATGTCACCAAGTATCTGGACGACCATCCTGGGGGTGCTGATGTTCTGCTCGAAGTGACCGGTACTGATA
ACCCTCCATTAATCTTATGTTTCTTTTTTCAGTAATACCTAGTTTATTTAGGTGGACTGATCATATCTGATTGTCTGTTATAAGGTAAGGATGCCAA
GGAGGAATTTGATGATGCGGGGCACAGCGAGAGTGCCAAGGAGCTAATGCAAGATTATTTCATTGGGGAGTTGGATCCAACACCCAACATCCCTGAG
ATGGAGGTTTTCAGGAAGGAGCAGGATGTGAACTTCGCAAGCAAGCTGATGGCCAATGCAGCACAGTACTGGCCCATTCCAGCGACAGTAGTCGGGA
TATCAGTCGTTATTGCTGTACTGTATGCACGCCAGAAGTGATAATC
4.1. Use Dotter to align both gene sequences.
- Using Shift+PrintScreen, present a plot of the alignment in your homework.
- Report the Dotter parameters used (window size and stringency).
- What are the conserved parts between the wheat and rice genes?
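To see how window size and stringency shape a dot plot, the idea can be sketched in Python. This is a generic dot-plot sketch and not Dotter's actual algorithm; the function name dot_plot, the parameter defaults, and the toy sequence are assumptions made for the example.

```python
def dot_plot(seq_a, seq_b, window=5, stringency=4):
    """Return 0-based (i, j) window starts where at least `stringency`
    of the `window` aligned positions are identical."""
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(x == y for x, y in
                          zip(seq_a[i:i + window], seq_b[j:j + window]))
            if matches >= stringency:
                hits.append((i, j))
    return hits

# Toy self-comparison (hypothetical sequence containing a direct repeat).
seq = "ACGTACGT"
hits = dot_plot(seq, seq, window=4, stringency=4)
for i in range(len(seq) - 3):
    print("".join("*" if (i, j) in hits else "." for j in range(len(seq) - 3)))
```

In the printed grid the main diagonal is the sequence matching itself, and the off-diagonal dots mark the repeat. A larger window or a higher stringency removes random dots but can also hide short or diverged conserved regions.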
5. 10 points Using the following sequence:
>Tm67B4
TCATCTTTGGCAAACATGTCCTTAGAGCATCTCCAGCCGTTCAGCCCATAGGACGCCGAAGAAGAGCCGCTTGGGGCTGAACCGACGCTTGCTTGGC
GCGTGGGGGCGACTATGTTCCCAGTCGATGCCCCCAGGTCGCCGTCAAAATCGCGCGAATTCAGCCATATTCCAAACAAATTTGTAGAAACTCGGCG
ATATTTCATTGAAATTTATACAAAAACATAAAAACATGCAAACTACGCTAAACTACGCCTATCCCTGCTACACCGTGGCCACCGCCCACCATCTACA
TGCCGAGAAGCCTGTAGAAACGGGTGTAGTCGCCGCCGCCGCCGCCATCGTCGTCGTCGCGCCGGAGCCGCCGTTGTCCCTGCTGCACTCCTGGCCA
GTGGCACCGACGCGCGGTGGGGTATTGGACGGGCCGGCCTCGTCCTCCTCGTCGTTGTTGAGGGCGATGACTGTTGGAAATATGCCCTAGAGACAAT
AATAAATTGATTATTATTATATTTCCTTGTTCATGATAATCGTTTATTA
5.1. Use Dotter to align the sequence with itself.
- Using Shift+PrintScreen, present a plot of the alignment in your homework.
- Report the Dotter parameters used (window size and stringency).
- What kind of repeats are you observing? Indicate their approximate coordinates.
- What is this sequence? (Look at the coordinates of the best alignment against the
database).
6. 20 points Using the two following scoring matrices, calculate manually the scores for the
following alignments:
6.1. Scoring matrix A: Match 2, mismatch -1, open gap -5, extended gap -1 (affine gap penalty)
6.2. Scoring matrix B: Match 2, mismatch -1, gap -2 per each bp (linear gap penalty)
- Which alignment is better under each scoring matrix?
- What is the effect of affine versus linear gap penalties on the number of gaps introduced
in an alignment?
Alignment 1
ACAAAGATACTATTAAT
|| | |   |||  ||
ACGA-GC--CTACAAAC

Alignment 2
ACAAAGATACTACTAAT
||||     |||| ||
ACAA---GCCTACAAAC
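A manual score can be double-checked with a small Python sketch. It assumes one common affine convention, in which the first position of a gap costs the opening penalty and each additional position costs the extension penalty. Some programs instead charge the opening penalty plus an extension for every position, so check which convention the lecture used. The function name and the toy alignment are hypothetical and are not the alignments of this question.

```python
def score_alignment(top, bottom, match, mismatch, gap_open, gap_extend):
    """Score a pairwise alignment column by column.

    Assumed convention: the first column of a gap run costs gap_open and
    every further column of the same run costs gap_extend.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    in_gap = False
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            score += gap_extend if in_gap else gap_open
            in_gap = True
        else:
            score += match if x == y else mismatch
            in_gap = False
    return score

# Toy alignment: 4 matches, 1 mismatch, one gap of length 2.
top, bottom = "GATTACA", "GA--ACT"
affine = score_alignment(top, bottom, 2, -1, gap_open=-5, gap_extend=-1)
linear = score_alignment(top, bottom, 2, -1, gap_open=-2, gap_extend=-2)
print(affine, linear)
```

A linear penalty is simply the special case in which opening and extending a gap cost the same.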
7. 5 points Using Boolean operators perform the following ENTREZ searches and report the
number of Nucleotides found:
- Containing ‘flavonoid’
- Containing ‘flavonoid’ and related family words (using truncation, *)
- Containing both ‘flavonoid’ and ‘hydroxylase’
- Containing either ‘flavonoid’ or ‘hydroxylase’
- Containing both ‘flavonoid’ and ‘hydroxylase’ in rice
- Containing both ‘flavonoid’ and ‘hydroxylase’ in rice but not in Arabidopsis
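For reference, Boolean searches like these can also be expressed as query strings for the NCBI E-utilities esearch service. The Python sketch below only builds the URLs and does not contact NCBI. The [Organism] field tags and the exact query wording are assumptions about how to restrict a search by species, and the helper name esearch_url is hypothetical.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_url(term, db="nuccore"):
    """Build an esearch URL that asks only for the number of matching records."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "rettype": "count"})

# Assumed query strings; Entrez applies Boolean operators from left to right.
queries = [
    "flavonoid",
    "flavonoid*",
    "flavonoid AND hydroxylase",
    "flavonoid OR hydroxylase",
    "flavonoid AND hydroxylase AND Oryza sativa[Organism]",
    "flavonoid AND hydroxylase AND Oryza sativa[Organism] NOT Arabidopsis[Organism]",
]

for q in queries:
    print(esearch_url(q))
```

The same query strings can be typed directly into the Entrez search box with the Nucleotide database selected.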