STOR072: Assignment #4
1. Use the BLAST webpage to find the best match for the following
sequences in the Nucleotide collection database.
mystery sequence #1
ATGTGCCATG
CCAGCTGCGC AACAGAATAT
TCCTGCTCTC
AGCTGTAGCT CTAAGTATCA
GTGTGAAGCA
AGAGTGAAGC GACCCATGAA
CGCATTCATT
GCTCTAGAGA ATCCCAAAAT
GCGAAACTCA
AAAATCCTTA CCGAAGCCGA
TAAATGGCCA
ATGCATAGAG AGAAATACCC
GAATTATAAG
CAAAACAGTT GCAGTTTGCT
TCCGGCAGAT
AACAACAGGT TGTACAGGGA
TGACTGTACC
TTAGTCCACT TACCGCCCAT
CAACACAGCC
CACTCGATTC CAATCATATG
CCAAAGCTGT
TTAAGCGTAT TTAACACTGA TGATTACAGT
CGGAGAAGCT CTTCCTTCAT TTGCGCTGAA
GGAGAAAACA GTAAAGGCAG CGTCCAGGAT
GTGTGGTCTC GGATCAGCAG GCGCAAGATG
GAGATCAGCA AGCAGCTGGG ATACCAGTGG
TTCTTCCAGG AGGCACAGAA ACTACAGGCC
TATCGACCTC GTCGGAAGGC GAAGATGCTG
CCCTCTTCGG TCCCTGCCAG AGAAGTGTAC
AAAGCCACGC ACTCAAGAAT GCAGCACCAG
AGCTCACCGC AGCAACGGGA CCGCTACAGC
AG
Give the organism (common name) and the name of the protein. Find the 6 errors
in the best match (scroll down to the Alignments section), giving the
location number of each error and whether it is a mismatch, a gap in the
databank sequence, or a gap in your sequence.
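
One way to keep the three error types straight is to walk the two aligned rows column by column. The following is only a minimal sketch, not part of the BLAST site: the function name `classify_alignment` and the toy rows are hypothetical, and it assumes you have copied the Query row and the Sbjct row of an alignment as two equal-length strings, with "-" marking a gap.

```python
def classify_alignment(query, subject, query_start=1):
    """List (query_position, kind) for every column that is not a match.

    query and subject are the two aligned rows (equal length, '-' = gap).
    For a gap in your sequence, the position reported is that of the last
    query base before the gap.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    errors = []
    pos = query_start - 1  # position of the last query base seen so far
    for q, s in zip(query.upper(), subject.upper()):
        if q != "-":
            pos += 1
        if q == "-" and s == "-":
            continue  # should not occur in a pairwise alignment
        if q == "-":
            errors.append((pos, "gap in your sequence"))
        elif s == "-":
            errors.append((pos, "gap in databank sequence"))
        elif q != s:
            errors.append((pos, "mismatch"))
    return errors


# Toy rows, made up for illustration (not taken from the assignment).
toy_query = "ACGT-ACGTA"
toy_subject = "ACCTTAC-TA"
for position, kind in classify_alignment(toy_query, toy_subject):
    print(position, kind)
```

Positions here are counted along your (query) sequence; if you prefer databank coordinates, the same loop could count the non-gap characters of the subject row instead.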
mystery sequence #2: In Michael Crichton's Jurassic
Park (p. 103), a putative dinosaur DNA sequence is given.
Search for this sequence in the Nucleotide collection database.
GCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAA
TCGACGC
GGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGC
TCCCTCG
TGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAA
GCGTGGC
TGCTCACGCTGTACCTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGG
CTGTGTG
CCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAAC
CCGGTAA
AGTAGGACAGGTGCCGGCAGCGCTCTGGGTCATTTTCGGCGAGGACCGCTTTC
GCTGGAG
ATCGGCCTGTCGCTTGCGGTATTCGGAATCTTGCACGCCCTCGCTCAAGCCTT
CGTCACT
CCAAACGTTTCGGCGAGAAGCAGGCCATTATCGCCGGCATGGCGGCCGACGCG
CTGGGCT
GGCGTTCGCGACGCGAGGCTGGATGGCCTTCCCCATTATGATTCTTCTCGCTT
CCGGCGG
CCCGCGTTGCAGGCCATGCTGTCCAGGCAGGTAGATGACGACCATCAGGGACA
GCTTCAA
CGGCTCTTACCAGCCTAACTTCGATCACTGGACCGCTGATCGTCACGGCGATT
TATGCCG
CACATGGACGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCA
TCACAAA
CAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCC
CCTGGAA
GCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCC
CTTCGGG
CTTTCTCAATGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTC
CAAGCTG
ACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTT
GAGTCCA
ACACGACTTAACGGGTTGGCATGGATTGTAGGCGCCGCCCTATACCTTGTCTG
CCTCCCC
GCGGTGCATGGAGCCGGGCCACCTCGACCTGAATGGAAGCCGGCGGCACCTCG
CTAACGG
CCAAGAATTGGAGCCAATCAATTCTTGCGGAGAACTGTGAATGCGCAAACCAA
CCCTTGG
CCATCGCGTCCGCCATCTCCAGCAGCCGCACGCGGCGCATCTCGGGCAGCGTT
GGGTCCT
Is Crichton pulling one over on us? Why or why not? Scroll
down to the Alignments section and note the odd pattern in the
matching alignment. Extra credit for the correct interpretation of
the odd pattern in the match. (Hint: The sequence is given above
exactly the way it appears in the book. Further, the error has
nothing to do with biology or BLAST.)
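
The sequences in this assignment are printed with line breaks, and sometimes with spaces between blocks of bases. If that gets in the way when copying them into the search box, a small helper can join the pieces into one unbroken string. This is just a convenience sketch; the name `join_wrapped_sequence` is hypothetical, and it assumes the text contains only the letters A, C, G, T (or N) plus whitespace and digits.

```python
def join_wrapped_sequence(text, alphabet="ACGTN"):
    """Join a wrapped nucleotide sequence into one uppercase string.

    Whitespace and digits are dropped; any other character outside
    `alphabet` raises an error so that typos are not silently removed.
    """
    bases = []
    for ch in text.upper():
        if ch.isspace() or ch.isdigit():
            continue
        if ch not in alphabet:
            raise ValueError(f"unexpected character in sequence: {ch!r}")
        bases.append(ch)
    return "".join(bases)


# The first lines of mystery sequence #1, as printed in this handout.
wrapped = """ATGTGCCATG
CCAGCTGCGC AACAGAATAT
"""
joined = join_wrapped_sequence(wrapped)
print(joined, len(joined))
```

The helper keeps the lines in the order they are given and does not rearrange anything.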
3. Search for the following protein sequence in the Non-redundant protein
sequences database (this is under the blastp tab):
MAHETSFNDA
HPEQPNVDGQ
LEDTNNNNNS
PPPTDVNTTT
VAAAAARITA
PALQLIDMDN
EGNDSSLFGE
TTSKVAEDDF
IIGNVFVIAA
ADLFVACLVM
IWTSCDVLCC
IDYIHSRTSN
QFGWKDPDYL
TCCTFYVPLL
RPRPVDAAVN
LGRFSTAKSK
DGNSTNTVNT
APSTSGNQIA
AAVNGMAPSG
EDQDEQVGPQ
KANGVEVLED
GGGGASTSNA
IAAAAAGPMT
ESPSTPEPRS
QQQLSSIANP
KTLAIITGAF
QISDSVASLF
EFRQAFKRIL
LDYIYIANSM
DQDDAELEEL
KRYYSSGKRR
TTAGSPLATA
KAAHRALTTK
NYTNVAVGLG
MLANRSGQLD
TQLLRMAVTS
IILERNLQNV
PLGAVYEISQ
TASILHLVAI
RVFMMIFCVW
QRIEQQKCMV
VILALYWKIY
NNQPDGGAAT
TGSAVGVSGP
VEDTEFSSSN
TVSHLVALAK
RQEDDGQRPE
PTTATSAMTA
PQLQQQLEQV
TTITSISALS
AKTSTLTSCN
RQPTTPQQQP
MQKVNKRKET
VVCWLPFFVM
LWLGYFNSTL
FGGHRPVHYR
NDRAFLIAEP
DDMAVTDDGQ
ADFIGSLALK
ALAAAAASAS
QDATSSPASS
AMLLNDTLLL
LINGTGGLNV
VLLGLMILVT
ANYLVASLAV
GWILGPELCD
AVDRYWAVTN
TAAVIVSLAP
SQDVSYQVFA
QTARKRIHRR
DTKLHRLRLR
ASGGRALGLV
VDSKSRAGVE
QQGKSTAKSS
HGEQEDREEL
AGTNESEDQC
QQLQKSVKSG
PQTPTSQGVG
QSHPLCGTAN
HQQAHQQQQQ
LEAKRERKAA
ALTMPLCAAC
NPVIYTIFSP
SGKL
a. What is this protein, and what organism (common name) does it come
from? Only give sequences that come from real organisms! Also give the
E-value of the match.
b. Do the same thing searching just vertebrates, and just chimpanzees.
(This is done by typing in the group in the Organism box of the Choose
Search Set section.)