Bioinformatics 2 Assignment

Assignment 2 (50 points)
This assignment is dedicated to protein analysis.
The goals of this exercise:
* To learn about the basic protein structure prediction tools available on the WWW.
You will take a protein sequence, and predict features related to its primary, secondary
and tertiary structures.
* To get experience with phylogenetic programs.
* To use some of the important modeling WWW servers.
Part-1: Sequence analysis (Sub-cellular localization)
1. From the NCBI or SwissProt databases, pick proteins that let you test predictions of the
following features:
-Signal peptide (secretory pathway)
-Chloroplast
-Mitochondria
-Peroxisome
-N-glycosylation
-Transmembrane domains
-GPI-anchoring
-Hydrophobicity
2. Use the websites from today’s lecture.
3. Submit:
* Graphical presentations whenever possible.
* Discuss and interpret your data and indicate which program you used.
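As a rough cross-check on the web servers' output, two of the features above can be approximated locally. The sketch below is only an illustration, not a replacement for the lecture tools: it computes a sliding-window hydropathy profile using the Kyte-Doolittle scale and lists candidate N-glycosylation sequons (N-X-S/T, where X is not proline). The function names and the default window size of 9 are assumptions made for this example.

```python
import re

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Return the mean Kyte-Doolittle score of every full window along seq."""
    scores = [KD[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def n_glyc_sequons(seq):
    """Return 1-based positions of Asn residues in N-X-[S/T] motifs (X != Pro)."""
    # A lookahead is used so that overlapping motifs are all reported.
    return [m.start() + 1 for m in re.finditer(r"N(?=[^P][ST])", seq.upper())]


if __name__ == "__main__":
    example = "MKTLLLAVAVLLAANGSQPNPTWNVT"  # made-up sequence for illustration
    print(hydropathy_profile(example, window=9))
    print(n_glyc_sequons(example))
```

A sequon match only marks a candidate site; whether it is actually glycosylated depends on context that a dedicated predictor models and this pattern does not.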
Part-2: Phylogeny analysis.
1. Pick a protein sequence. Use your favorite protein sequence, or pick any random
sequence, or reuse the one from Assignment 1. Save it in FASTA format.
2. Search Swiss-Prot or NCBI using the BLAST search program.
3. Randomly pick between 10 and 15 sequences (proteins) from the results. Submit the list of
these sequences (not the sequences themselves).
4. Using this web-based program (http://align.genome.jp/), build several trees.
-Submit three types of trees.
-Answer:
* What is the difference between the methods that are used to create the trees?
* Discuss your trees.
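When comparing tree-building methods, it may help to remember that distance-based methods such as UPGMA and neighbor joining start from a matrix of pairwise distances computed from the alignment. The sketch below shows one simple distance, the p-distance (the fraction of gap-free aligned positions that differ). It assumes the sequences are already aligned to equal length with "-" as the gap character; the function names are made up for this example, and the web server may use a different, corrected distance.

```python
def p_distance(a, b):
    """Fraction of gap-free aligned columns at which sequences a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only the columns where neither sequence has a gap.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Return {(name1, name2): p-distance} for every ordered pair of sequences."""
    names = list(aligned)
    return {
        (m, n): p_distance(aligned[m], aligned[n])
        for m in names
        for n in names
    }


if __name__ == "__main__":
    toy = {"s1": "ACDEFG", "s2": "ACDQFG", "s3": "AC-EYG"}  # toy alignment
    for pair, dist in distance_matrix(toy).items():
        print(pair, round(dist, 3))
```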
Part-3: 3-D fold prediction.
We will be visiting these websites. The easiest one to work with is the first (Phyre), but I
encourage you to try the other two sites.
* Protein Homology/analogY Recognition Engine (Phyre2, NEW!) for structure modeling
(http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index)
* PredictProtein for secondary structure prediction
(http://www.predictprotein.org/newwebsite/submit.html)
* UCLA Fold Recognition (http://www.doe-mbi.ucla.edu/Services/FOLD/)
1. Fetch these two protein sequences from NCBI: AAV51943 and At3g46550, or use your own protein.
* Report some details about the proteins.
* For further processing, save the protein sequences on your computer in
FASTA format.
* Alternatively, you can use the sequences listed below without creating a FASTA file.
2. Create a 3-D model for your protein sequences.
* Go to the Phyre website and submit your sequences.
* You will receive a link to the modeling results by email,
or you can wait and get them directly from the interactive web page.
* Report the top template information.
* How many residues of the sequence have been modeled? With what confidence?
* Submit a picture of the model for each protein, which you can view with the Jmol program.
* What secondary structure elements can you identify? At what positions? Can you
change the representation of the molecule?
Sequences you can use:
>alb8
VTGLRAGHRRVNENAWDVRTPVHLGSSFYDVPSVRAGRCTLGERELTLAGDVGGARLLHLQCHFGLDTLSWARRGARATGVDFSRAAVTA
ARELSAELGVPAVFHRADVQDLPAELSGFDLAVTTYGVTCWLEDLSAWAASVHGALRPGGRFLLVEFHPLLELALPGAVSGHGSYFGSPDPPP
TATSGTYTDPDAPIFYEEYRWQHPVGDVVNALIGAGFELTGLGEYPDSPVPLFDERLAGSPLAPAPRSYSITARRKS
>alb7
SSGLVPRGSGMKETAAAKFERQHMDSPDLGTGGGSGIEGRMAALFGALGRDQERARATLNLVPSENVLSPLARVPFALDAYARYFFDHKRM
FGAWSFFGGTGAGAIEQETLLPLLRDQAQAPFVNPQPISGLNCMTAAMSALASPGDTVVLIPTDAGGHMSTAGVARRLGLHVLTLPMADAHT
VDHEALGALLRSERPALVYLDQSTVLFPLDCAPLREVIDRESPRTLLHFDSRHLNGLILSKALANPLDRGADTFGGSTHKTLAGPHKGFLATRR
EDLSERIDASTADLVSHHHPAEVLSLAVTLLELRDRDGAGYGAAILANARALAARLHERGAAVAAADRGFTGCHQVWLDTRSADEGVAMA
DRLYAAGVAVNRVGVPGVRGAAFRLSSAEVTRCGATEADSTELADIIADVVVDGAPTDRVASRAAALRARLYRPRYCFEDDALEDPAVPEW
LRELAAAVGRGVYGEDR
>alb4
NSYFEHPSIAVLDRDEILFAVEDERFTGIKHGRTYSPYQTYLPVASLYHGLAAVDATVDDIDEIGYSYHRWTHLRSLAGCFTGKRVSGFREELT
AFLSLVNLRQAMRSGYDIPRRYRDRIFPEKLARVPFREYHHHLAHAASAFHCSDFEEALVVVADGAGERSATSVYRGRGGQLERIGGVDLPNS
LGIFYSMITAHLGFEPFSDEFKVMGLAAYGEPAHRQACSRILRLGPDGSYVLDLAALRSLDTLLGPARRPGEPLAQRHKDIARSVQDRLTEALH
HVLGHWLGRTGLRNVCLAGGTFLNCVANGSLARDPRIEGIFVQPAAHDAGTAIGAAALSAVRRGGGPKVVFRSAALGTSHTAAACEKACAA
AEVPHVRPAPEDMIDAVARRLADGEVVGVFRGRMEFGPRALGMRSLLASPADPAMRDRLNRIKGREDFRPVAPIVLREHFDTYFDGQPNRYM
LFTTRALERTVREAPSAVHVDGTARVQCVQEDEDPWLHALITRFAELTGLPMVINTSLNVRGKPIVESPAEALACLGSTAMNLLVLEDVLAGP
GAPDAVRQAVGSAGSGVAEGTA
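
If you prefer to handle the sequences with a script, the following is a minimal sketch of reading FASTA text whose sequence lines are wrapped, as above, and of building a download link for a protein accession through the NCBI E-utilities efetch endpoint. The function names are made up for this example, and the link assumes the identifier is one that efetch accepts directly for the protein database (for instance AAV51943); other identifiers may need to be looked up through the NCBI website first.

```python
from urllib.parse import urlencode

# NCBI E-utilities endpoint for retrieving records.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession):
    """Build an efetch URL that returns a protein record as plain-text FASTA."""
    params = {
        "db": "protein",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def parse_fasta(text):
    """Parse FASTA text into {name: sequence}, joining wrapped sequence lines."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


if __name__ == "__main__":
    print(efetch_fasta_url("AAV51943"))
    toy = ">demo example record\nMKTAYIAK\nQRQISFVK\n"  # made-up record
    for rec_name, seq in parse_fasta(toy).items():
        print(rec_name, len(seq))
```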