Bio-Informatics Unknown III

Normal Variant:
>gi|32880123|gb|AAP88892.1|
MGPWGWKLRWTVALLLAAAGTAVGDRCERNEFQCQDGKCISYKWVCDGSAECQDG
SDESQETCLSVTCKSGDFSCGGRVNRCIPQFWRCDGQVDCDNGSDEQGCPPKTCS
QDEFRCHDGKCISRQFVCDSDRDCLDGSDE
ASCPVLTCGPASFQCNSSTCIPQLWACDNDPDCEDGSDEWPQRCRGLYVFQGDSS
PCSAFEFHCLSGECIHSSWRCDGGPDCKDKSDEENCAVATCRPDEFQCSDGNCIG
SRQCDREYDCKDMSDEVGCVNVTLCEGPN
KFKCHSGECITLDKVCNMARDCRDWSDEPIKECGTNECLDNNGGCSHVCNDLKIG
YECLCPDGFQLVAQRRCEDIDECQDPDTCSQLCVNLEGGYKCQCEEGFQLDPHTK
ACKAVGSIAYLFFTNRHEVRKMTLDRSEYTSLIPNLRNVVALDTEVASNRIYWSD
LSQRMICSTQLDRAHGVSSYDTVISRDIQAPDGLAVDWIHSNIYWTDSVLGTVSV
ADTKGVKRKTLFRENGSKPRAIVVDPVHGFMYWTDWGTPAKIKKGGLNGVDIYSL
VTENIQWPNGITLDLLSGRLYWVDSKLHSISSIDVNGGNRKTILEDEKRLAHPFS
LAVFEDKVFWTDIINEAIFSANRLTGSDVNLLAENLLSPEDMVLFHNLTQPRGVN
WCERTTLSNGGCQYLCLPAPQINPHSPKFTCACPD
GMLLARDMRSCLTEAEAAVATQETSTVRLKVSSTAVRTQHTTTRPVPDTSRLPGA
TPGLTTVEIVTMSHQALGDVAGRGNEKKPSSVRALSIVLPIVLLVFLCLGVFLLW
KNWRLKNINSINFDNPVYQKTTEDEVHICHNQDGYSYPSRQMVSLEDDVA
Disease Variant:
>gi|62088398|dbj|BAD92646.1|
SGSGHCLAEAASMGPWGWKLRWTVALLLAAAGTAVGDRCERNEFQCQDGKCISYK
WVCDGSAECQDGSDESQETCLSVTCKSGDFSCGGRVNRCIPQFWRCDGQVDCDNG
SDEQGCPPKTCSQDEFRCHDGKCISRQFVC
DSDRDCLDGSDEASCPVLTCGPASFQCNSSTCIPQLWACDNDPDCEDGSDEWPQR
CRGLYVFQGDSSPCSAFEFHCLSGECIHSSWRCDGGPDCKDKSDEENCAVATCRP
DEFQCSDGNCIHGSRQCDREYDCKDMSDEV
GCVNVTLCEGPNKFKCHSGECITLDKVCNMARDCRDWSDEPIKECGTNECLDNNG
GCSHVCNDLKIGYECLCPDGFQLVAQRRCEDIDECQDPDTCSQLCVNLEGGYKCQ
CEEGFQLDPHTKACKAVGSIAYLFFTNRHE
VRKMTLDRSEYTSLIPNLRNVVALDTEVASNRIYWSDLSQRMICSTQLDRAHGVS
SYDTVISRDIQAPDGLAVDWIHSNIYWTDSVLGTVSVADTKGVKRKTLFRENGSK
PRAIVVDPVHGFMYWTDWGTPAKIKKGGLN
GVDIYSLVTENIQWPNGITLDLLSGRLYWVDSKLHSISSIDVNGGNRKTILEDEK
RLAHPFSLAVFEDKVFWTDIINEAIFSANRLTGSDVNLLAENLLSPEDMVLFHNL
TQPRGVNWCERTTLSNGGCQYLCLPAPQIN
PHSPKFTCACPDGMLLARDMRSCLTEAEAAVATQETSTVRLKVSSTAVRTQHTTT
RPVPDTSRLPGATPGLTTVEIVTMSHQALGDVAGRGNEKKPSSVRALSIVLPIVL
LVFLCLGVFLLWKNWRLKNINSINFDNPVYQKTTEDEVHICHNQDGYSYPSMVSL
EDDVA
Above, you’ve been given FASTA sequences of an unknown protein and of the same unknown
protein with a change in its amino acid sequence. The unchanged sequence is the normal
variant in the population (wild-type). The sequence with the amino acid change(s) is a
variant of the protein that does not function properly. These sequences will be used in
part 2 of your Proteomics Laboratory.
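If you want to check the sequences outside the web tools, a FASTA record can be read with a few lines of Python. The sketch below uses a hypothetical helper, `parse_fasta` (not part of any tool used in this lab), that joins a record’s sequence lines whatever their length. That matters here because the lines above are wrapped unevenly. Paste only the `>` header lines and the sequence lines: a label such as "Disease Variant" sitting between two records would otherwise be read as part of the sequence.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples.

    Sequence lines of any length are joined, so unevenly wrapped records
    are reassembled correctly. Lines before the first '>' are ignored.
    """
    records = []
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]  # drop the leading '>'
            chunks = []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up example records, for illustration only.
example = """>seq1 hypothetical example
MGPWGWKLRW
TVALL
>seq2 hypothetical example
MGPW
"""
for name, seq in parse_fasta(example):
    print(name, len(seq))
```

Joining the lines before doing anything else means the residue count (and any position you report later) does not depend on how the sequence happened to be wrapped.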
This protein has an important function in humans; humans could not survive without it.
Furthermore, individuals who have a variant that does not function properly are very sick.
These patients have a disease in which thick mucus builds up in the lungs and blocks the
patient’s ability to breathe properly. These patients often die at a very young age.
Follow the questions below to determine which protein we are studying, where the amino
acid change is located, and which disease affects individuals who carry the variant that
does not function properly.
Now let’s work through the questions below to figure out which protein we are studying.
1. Paste your unknown sequence (normal variant) into protein BLAST (blastp). The first
hit should be your protein of interest. Write down the name of the protein.
2. Look at the list of hits returned by your BLAST search. How many of these proteins
can be declared homologs? (Hint: think about E-values.)
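The E-value estimates how many hits with at least this score you would expect to find by chance in a database of this size, so a smaller E-value means a more significant match (BLAST shows extremely small values as 0.0). The sketch below filters a hand-copied list of hits by an E-value cutoff. The helper `filter_by_evalue`, the hit values, and the cutoff of 1e-5 are all assumptions for illustration: the cutoff is a commonly used convention rather than a fixed rule, and the hits are not results from your search.

```python
def filter_by_evalue(hits, cutoff=1e-5):
    """Return the (description, e_value) pairs whose E-value is <= cutoff."""
    return [(desc, e) for desc, e in hits if e <= cutoff]


# Hypothetical values for illustration only; use the numbers from your own search.
hits = [
    ("hit A", 0.0),
    ("hit B", 3e-120),
    ("hit C", 2e-4),
    ("hit D", 1.5),
]
kept = filter_by_evalue(hits)
print(len(kept), "of", len(hits), "hits pass the cutoff")  # 2 of 4 hits pass the cutoff
```

Changing `cutoff` shows how sensitive the homolog count is to the threshold you choose, which is worth mentioning in your answer.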
3. From your list of homologs, choose five from species other than Homo sapiens. Note the
identities and similarities (positives) for each in the space below. Also note the Latin
name and the common name of each species; for instance, Mus musculus is the mouse.
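BLAST reports identities as the number of identical positions over the length of the alignment, gaps included. The sketch below reproduces that count from two aligned rows (gaps written as `-`). `identity_stats` is a hypothetical helper and the short fragments are made up. Positives are not computed here, because BLAST derives them from a substitution matrix (BLOSUM62 by default for blastp), which this sketch does not include.

```python
def identity_stats(aligned_a, aligned_b):
    """Return (identical, gaps, length, percent_identity) for two aligned rows.

    Gaps are written as '-'. As in BLAST's "Identities" line, the denominator
    is the whole alignment length, gap columns included.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("need two non-empty rows of equal length")
    pairs = list(zip(aligned_a, aligned_b))
    identical = sum(1 for x, y in pairs if x == y and x != "-")
    gaps = sum(1 for x, y in pairs if x == "-" or y == "-")
    length = len(pairs)
    return identical, gaps, length, 100.0 * identical / length


# Made-up aligned fragments, for illustration only.
ident, gaps, length, pct = identity_stats("MGPW-GWKL", "MGPWSGWRL")
print(f"Identities = {ident}/{length} ({pct:.0f}%), Gaps = {gaps}/{length}")
# Identities = 7/9 (78%), Gaps = 1/9
```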
4. For the five homologs chosen above, find their sequences as you did last week and put
them in FASTA format. Then copy and paste them into a Microsoft Word file, as you did last
week. We will use these sequences later.
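A FASTA record is a header line starting with `>` followed by the sequence on one or more lines. The sketch below builds such a record from a header and a sequence string. `to_fasta` is a hypothetical helper, and wrapping at 60 residues per line is an assumed choice; FASTA readers generally accept other line lengths too.

```python
def to_fasta(header, sequence, width=60):
    """Return one record as FASTA text: a '>' header line, then the
    sequence wrapped to `width` residues per line."""
    seq = "".join(sequence.split()).upper()  # drop spaces and line breaks
    lines = [">" + header]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


# Hypothetical record; paste the header and sequence you retrieved instead.
print(to_fasta("hypothetical_homolog [Mus musculus]", "MGPWGWKLRW " * 7))
```

Here the 70-residue example is printed as one line of 60 residues and one of 10, under a single header line.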
5. Now let’s find some information about your protein using the NCBI Protein database (the
first tool you used last week). Type the name of your protein in the search box at the top
of the page and search. The search will return many hits; find the first or second entry
for Homo sapiens.
6. Each of these entries should contain a summary paragraph describing the protein. In the
space provided below, describe the function of the unknown protein as summarized in that
paragraph.
7. In the space provided below, note which disease a patient might have if they can only
make a non-functional variant of your protein.
8. Now let’s find some information about this disease. From the summary paragraph, write
down the information given about the disease. Then use the web to search for further
information about it. Note in detail the symptoms, which people are most likely to be
affected, where the disease is most commonly found, the life span of patients, and so on.
9. Now let’s do some sequence alignments. Go to ClustalW and align your normal human
sequence with the five homologs (be sure to place your human sequence first). Note the
alignment scores between the human sequence and the sequence of each homolog.
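ClustalW starts by aligning pairs of sequences and scoring each pair. The sketch below shows the basic idea behind a pairwise alignment score with the Needleman-Wunsch global alignment algorithm, using an assumed flat scoring scheme (match +1, mismatch -1, gap -2). ClustalW uses substitution matrices and separate gap-opening and gap-extension penalties, so its numbers will not match this sketch. Treat it only as an illustration of how an alignment score arises. `needleman_wunsch` is a hypothetical helper name.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b; return (score, aligned_a, aligned_b).

    Uses an assumed flat scoring scheme and a linear gap penalty.
    """
    def pair(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # align two residues
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACDEFG", "ACDFG"))  # (3, 'ACDEFG', 'ACD-FG')
```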
10. Using ClustalW, make a cladogram from the normal human sequence and the five homologs.
What conclusions can you draw from your cladogram?
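A cladogram groups sequences by how similar they are. ClustalW’s own trees are normally built by neighbor-joining. The sketch below swaps in UPGMA, a simpler clustering method, to show how a tree can be built from pairwise distances, such as 1 minus the fraction of identical positions from question 9. `upgma` is a hypothetical helper and the distances are made up. Because UPGMA assumes every lineage changes at the same rate, its tree can differ from the one ClustalW draws.

```python
def upgma(names, dist):
    """Cluster sequences with UPGMA and return a Newick tree string.

    names: sequence labels; dist: symmetric matrix (list of lists) of pairwise
    distances, e.g. 1 - percent_identity / 100. Branch lengths are omitted.
    """
    # cluster id -> (Newick label, number of sequences in the cluster)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # distances keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair (first one found on ties)
        label_i, size_i = clusters.pop(i)
        label_j, size_j = clusters.pop(j)
        del d[(i, j)]
        # UPGMA rule: distance to the merged cluster is the size-weighted
        # average of the distances to its two parts.
        for k in clusters:
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (d_ik * size_i + d_jk * size_j) / (size_i + size_j)
        clusters[next_id] = (f"({label_i},{label_j})", size_i + size_j)
        next_id += 1
    (label, _), = clusters.values()
    return label + ";"


# Hypothetical distances for illustration only.
names = ["human", "mouse", "rat", "chicken"]
dist = [
    [0.0, 0.2, 0.2, 0.4],
    [0.2, 0.0, 0.1, 0.4],
    [0.2, 0.1, 0.0, 0.4],
    [0.4, 0.4, 0.4, 0.0],
]
print(upgma(names, dist))  # (chicken,(human,(mouse,rat)));
```

Read the nesting from the inside out: the innermost pair is the most similar, and each enclosing pair of parentheses joins the next most similar group.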
11. Now let’s figure out how your disease variant differs from your normal variant. In
ClustalW, place the normal variant and the disease variant in the sequence box and align
them. In the space below, state in detail the amino acid number and the change in amino
acid sequence for each difference.
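Scanning a long alignment by eye for differences is error-prone. The sketch below lists every difference between two gapped rows copied from a pairwise alignment (gaps written as `-`). `list_changes` is a hypothetical helper. Numbering positions along the normal (wild-type) sequence is an assumed convention; you could number along the disease variant instead, as long as you say which one you used. The example rows are made up.

```python
def list_changes(aligned_normal, aligned_variant):
    """Describe differences between two aligned rows ('-' = gap).

    Positions are numbered along the normal sequence, starting at 1.
    An insertion in the variant is reported after the preceding normal residue.
    """
    if len(aligned_normal) != len(aligned_variant):
        raise ValueError("aligned sequences must have the same length")
    changes = []
    pos = 0  # number of normal residues seen so far
    for n_res, v_res in zip(aligned_normal, aligned_variant):
        if n_res != "-":
            pos += 1
        if n_res == v_res:
            continue  # identical column (or a gap in both rows)
        if n_res == "-":
            changes.append(f"insertion of {v_res} after position {pos}")
        elif v_res == "-":
            changes.append(f"deletion of {n_res}{pos}")
        else:
            changes.append(f"{n_res}{pos}{v_res}")  # substitution, e.g. K2R
    return changes


print(list_changes("ACDEFG", "ACD-FG"))  # ['deletion of E4']
print(list_changes("MKT-LV", "MRTALV"))  # ['K2R', 'insertion of A after position 3']
```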