KAW_MolGen

Introducing Database Mining to
Molecular Genetics Students
(Juniors & Seniors)
Karl Wilson
Objectives:
• Introduce students to online protein and
nucleotide databases (via GenBank at the NCBI
website).
• Specific operations:
– Use of BLAST to find similar sequences (protein &
nucleotide)
– Downloading and saving sequences
– Comparison of sequences and alignment with
ClustalW
– Interpretation of phylogenetic data.
• The “test” protein sequence:
AAA92063. cysteinyl endopep...[gi:1223922]
LOCUS AAA92063 362 aa linear PLN 22-AUG-2002
DEFINITION cysteinyl endopeptidase [Vigna radiata].
ACCESSION AAA92063
VERSION AAA92063.1 GI:1223922
DBSOURCE locus VRU49445 accession U49445.1
KEYWORDS .
SOURCE Vigna radiata
ORGANISM Vigna radiata
Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; rosids; eurosids I;
Fabales; Fabaceae; Papilionoideae; Phaseoleae; Vigna.
REFERENCE 1 (residues 1 to 362)
AUTHORS Lee,K., Tan-Wilson,A.L. and Wilson,K.A.
TITLE Direct Submission
JOURNAL Submitted (16-FEB-1996) K. Lee, Department of Biological Sciences,
State University of New York at Binghamton, P.O. Box 6000, Binghamton, NY
13902-6000, USA
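The record above is in the GenPept flat-file format. As a minimal sketch, assuming the protein-record LOCUS layout shown here (seven whitespace-separated tokens; nucleotide records carry an extra molecule-type field such as "mRNA"), the LOCUS line and the semicolon-separated lineage can be split into fields like this. The function names are illustrative, not part of any NCBI tool.

```python
def parse_locus(line):
    """Split a GenPept (protein) LOCUS line into named fields.

    Assumes the protein layout: name, length, 'aa', topology, division, date.
    Nucleotide LOCUS lines add a molecule-type field and would need adjusting.
    """
    tokens = line.split()
    if len(tokens) != 7 or tokens[0] != "LOCUS":
        raise ValueError("unexpected LOCUS line: " + line)
    name, length, unit, topology, division, date = tokens[1:]
    return {
        "name": name,
        "length": int(length),
        "unit": unit,
        "topology": topology,
        "division": division,  # PLN = plant, fungal and algal sequences
        "date": date,
    }


def parse_lineage(text):
    """Turn a semicolon-separated taxonomy string into a list of taxa."""
    return [t.strip() for t in text.strip().rstrip(".").split(";") if t.strip()]


record = parse_locus("LOCUS AAA92063 362 aa linear PLN 22-AUG-2002")
print(record["name"], record["length"], record["unit"])  # AAA92063 362 aa
print(parse_lineage("Fabales; Fabaceae; Papilionoideae; Phaseoleae; Vigna.")[-1])  # Vigna
```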
Student is given the VRU49445 sequence (only)
via e-mail or Blackboard
Find the sequence via Entrez and
download it in FASTA format
VRU49445 sequence
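Outside the Entrez web page, the same FASTA download can be scripted through NCBI's E-utilities. The sketch below only builds an EFetch URL (`db`, `id`, `rettype=fasta`, `retmode=text`) and parses FASTA text; the helper names are hypothetical, and using `db="nuccore"` with U49445 would fetch the nucleotide record instead of the protein.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, db="protein"):
    """Build an E-utilities EFetch URL that returns one record as FASTA text."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def parse_fasta(text):
    """Return {header: sequence} for every '>' record in a FASTA string."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence lines may be wrapped
    return {h: "".join(chunks) for h, chunks in records.items()}


print(efetch_fasta_url("AAA92063"))
```

To actually download, the URL can be passed to `urllib.request.urlopen(...)` and the response decoded before calling `parse_fasta`; NCBI's usage guidelines limit how often scripts may send requests.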
Submit to Protein-Protein
BLAST (BLASTP)
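For reference, BLASTP can also be driven through NCBI's BLAST URL API: a `CMD=Put` request submits the query and returns a request ID (RID), and a later `CMD=Get` request with that RID retrieves the report. This sketch only constructs the two URLs; the choice of the `nr` database and of text output are assumptions to adjust, and polling for completion is left out.

```python
from urllib.parse import urlencode

BLAST_CGI = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blastp_put_url(query, database="nr"):
    """URL that submits a protein query (accession or sequence) to BLASTP."""
    params = {"CMD": "Put", "PROGRAM": "blastp", "DATABASE": database, "QUERY": query}
    return BLAST_CGI + "?" + urlencode(params)


def blast_get_url(rid, format_type="Text"):
    """URL that retrieves the finished report for a request ID (RID)."""
    params = {"CMD": "Get", "RID": rid, "FORMAT_TYPE": format_type}
    return BLAST_CGI + "?" + urlencode(params)


print(blastp_put_url("AAA92063"))
```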
BLASTP results – related sequences
Sequences producing significant alignments: Score (bits) E Value
gi|1223922|gb|AAA92063.1| cysteinyl endopeptidase [Vigna ra... 705 0.0
gi|118158|sp|P12412|CYSP_VIGMU Vignain precursor (Bean endo... 686 0.0
gi|445927|prf||1910332A Cys endopeptidase 684 0.0
gi|7435774|pir||S22502 cysteine proteinase (EC 3.4.22.-) - ... 677 0.0
gi|544129|sp|P25803|CYSP_PHAVU Vignain precursor (Bean endo... 674 0.0
gi|1345573|emb|CAA40073.1| endopeptidase (EP-C1) [Phaseolus... 673 0.0
gi|31559530|dbj|BAC77523.1| cysteine proteinase [Glycine ma... 657 0.0
gi|31559526|dbj|BAC77521.1| cysteine proteinase [Glycine ma... 653 0.0
gi|7435817|pir||T08122 cysteine endopeptidase (EC 3.4.22.-)... 580 e-164
gi|600111|emb|CAA84378.1| cysteine proteinase [Vicia sativa] 540 e-152
gi|3688528|emb|CAA06243.1| pre-pro-TPE4A protein [Pisum sat... 539 e-152
gi|18423124|ref|NP_568722.1| cysteine proteinase [Arabidops... 521 e-147
gi|30141021|dbj|BAC75924.1| cysteine protease-2 [Helianthus... 516 e-145
gi|1076552|pir||S49166 cysteine proteinase (EC 3.4.22.-) pr... 510 e-143
gi|7435811|pir||T06708 cysteine proteinase (EC 3.4.22.-) T2... 490 e-137
gi|1169186|sp|P43156|CYSP_HEMSP Thiol protease SEN102 precu... 490 e-137
gi|25289998|pir||JC7787 carrot seed cysteine proteinase (EC... 485 e-136
gi|18408616|ref|NP_566901.1| cysteine proteinase, putative ... 483 e-135
gi|1173630|gb|AAB37233.1| cysteine proteinase 470 e-131
gi|4731374|gb|AAD28477.1|AF133839_1 papain-like cysteine pr... 462 e-129
gi|22331686|ref|NP_680113.1| cysteine proteinase, putative ... 462 e-129
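Each summary line ends with a bit score and an E-value, and BLAST abbreviates very small E-values (for example, e-164 means 1e-164). A minimal parser, assuming one hit per line as in the table above; the 1e-150 cut-off is an arbitrary example, not a recommended threshold.

```python
import re

# One summary line: identifier, description, bit score, E-value.
HIT = re.compile(r"^(\S+)\s+(.*?)\s+(\d+)\s+(\S+)$")


def parse_evalue(text):
    """BLAST prints tiny E-values as e.g. 'e-164', meaning 1e-164."""
    return float("1" + text) if text.startswith("e") else float(text)


def parse_hits(lines):
    """Turn BLAST summary lines into (id, description, bits, evalue) tuples."""
    hits = []
    for line in lines:
        m = HIT.match(line.strip())
        if m:  # header lines and blank lines simply do not match
            ident, desc, bits, evalue = m.groups()
            hits.append((ident, desc, int(bits), parse_evalue(evalue)))
    return hits


summary = [
    "gi|1223922|gb|AAA92063.1| cysteinyl endopeptidase [Vigna ra... 705 0.0",
    "gi|7435817|pir||T08122 cysteine endopeptidase (EC 3.4.22.-)... 580 e-164",
    "gi|22331686|ref|NP_680113.1| cysteine proteinase, putative ... 462 e-129",
]
for ident, desc, bits, evalue in parse_hits(summary):
    if evalue <= 1e-150:  # hypothetical cut-off for "most similar"
        print(ident, bits, evalue)
```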
Copy the most similar sequences
(in FASTA format)
Protein sequences deduced from cDNAs of P. vulgaris, V. mungo,
G. max, V. sativa, etc.
Submit the sequences to ClustalW at the Biology
Workbench website.
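The sequence names in the ClustalW alignment look like the FASTA headers with "|" and spaces turned into underscores and cut to 28 characters. The sketch below assumes that rule (it is inferred from the output, not documented Biology Workbench behavior) and writes the selected sequences as a single multi-FASTA input; `workbench_name` and `to_multifasta` are hypothetical helpers.

```python
import re


def workbench_name(header, width=28):
    """Hypothetical renaming rule inferred from the alignment labels:
    '|' and whitespace become '_', then the name is cut to `width` characters."""
    return re.sub(r"[|\s]", "_", header)[:width]


def to_multifasta(records, line_width=60):
    """Write (header, sequence) pairs as one multi-FASTA string."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + line_width] for i in range(0, len(seq), line_width))
    return "\n".join(lines) + "\n"


print(workbench_name("gi|118158|sp|P12412|CYSP_VIGMU Vignain precursor"))
```

Because truncation can make two names identical, it is worth checking that the shortened names are still unique before submitting.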
Alignment of the Cysteine Proteases from
Vigna, Phaseolus, Glycine, and Vicia.
gi_118158_sp_P12412_CYSP_VIG    MAMKKLLWVVLSLSLVLGVANSFDFHEKDLESEESLWDLYERWRSHHTVS
gi_1223922_gb_AAA92063.1__cy    MAMKKLLWVVLSLSLVLGVANSFDFHEKDLASEESLWDLYERWRSHHTVS
gi_31559526_dbj_BAC77521.1__    MAMKKLLWVVLSLSLVLGSANSFDFHDKDLASEESFWDLYERWRSHHTVS
gi_31559530_dbj_BAC77523.1__    MAMKKFLWVVLSLSLVLGVANSFDFHDKDLESEESLWDLYERWRSHHTVS
gi_600111_emb_CAA84378.1__cy    MEMKKLLFISLSLALIFTVANTFDFNEHDLESEKSLWNLYERWRSHHTVT

gi_118158_sp_P12412_CYSP_VIG    RSLGEKHKRFNVFKANVMHVHNTNKMDKPYKLKLNKFADMTNHEFRSTYA
gi_1223922_gb_AAA92063.1__cy    RSLTEKHKRFNVFKENVMHVHNTNKMDKPYKLKLNKFADMTNHEFRSTYA
gi_31559526_dbj_BAC77521.1__    RSLGDKHKRFNVFKANVMHVHNTNKMDKPYKLKLNKFADMTNHEFRSTYA
gi_31559530_dbj_BAC77523.1__    RSLGDKHKRFNVFKANMMHVHNTNKMDKPYKLKLNKFADMTNHEFRSTYA
gi_600111_emb_CAA84378.1__cy    RNLDEKHNRFNVFKANVMHVHNTNKLDKPYKLKLNKFGDMTNYEFRRIYA

gi_118158_sp_P12412_CYSP_VIG    GSKVNHHKMFRGSQHGSGTFMYEKVGSVPASVDWRKKGAVTDVKDQGQCG
gi_1223922_gb_AAA92063.1__cy    GSKVNHHKMFRGTQHGNGTFMYEKVGSVPASVDWRKKGAVTDVKDQGQCG
gi_31559526_dbj_BAC77521.1__    GSKVNHHRMFQGTPRGNGTFMYEKVGSVPPSVDWRKNGAVTGVKDQGQCG
gi_31559530_dbj_BAC77523.1__    GSKVNHHRMFRDMPRGNGTFMYEKVGSVPASVDWRKKGAVTDVKDQGHCG
gi_600111_emb_CAA84378.1__cy    DSKISHHRMFRGMSHENGTFMYENAVDVPSSIDWRNKGAVTGVKDQGQCG
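Percent identity between aligned rows is a quick way to put numbers on what the alignment shows. A minimal sketch, assuming identity is counted only over columns where neither sequence has a gap (other conventions divide by the full alignment length), applied to the first 50-column block; the short labels are abbreviations of the alignment names, in the same row order.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100 * matches / len(pairs)


# First alignment block (one 50-column segment per sequence).
block = {
    "P12412": "MAMKKLLWVVLSLSLVLGVANSFDFHEKDLESEESLWDLYERWRSHHTVS",
    "AAA92063": "MAMKKLLWVVLSLSLVLGVANSFDFHEKDLASEESLWDLYERWRSHHTVS",
    "BAC77521": "MAMKKLLWVVLSLSLVLGSANSFDFHDKDLASEESFWDLYERWRSHHTVS",
    "BAC77523": "MAMKKFLWVVLSLSLVLGVANSFDFHDKDLESEESLWDLYERWRSHHTVS",
    "CAA84378": "MEMKKLLFISLSLALIFTVANTFDFNEHDLESEKSLWNLYERWRSHHTVT",
}
for (n1, s1), (n2, s2) in combinations(block.items(), 2):
    print(f"{n1:>9} vs {n2:<9} {percent_identity(s1, s2):5.1f}%")
```

Subtracting the identity fraction from 1 gives the uncorrected p-distance, the kind of pairwise distance that tree-building methods start from.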
Unrooted Phylogenetic Tree
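ClustalW builds its trees by neighbor joining, which turns pairwise distances into an unrooted tree. The following is a compact sketch of the neighbor-joining algorithm itself, not ClustalW's code: it skips ClustalW's distance corrections, and the four-taxon matrix is a made-up additive example whose true tree is known.

```python
from itertools import combinations


def neighbor_joining(names, matrix):
    """Build an unrooted tree (Newick string) from a symmetric distance matrix."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    # D[a][b] is the current distance between nodes a and b; each node is
    # keyed by its Newick text, so joined nodes carry their own subtree.
    D = {a: {b: float(matrix[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(D[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D[p[0]][p[1]] - r[p[0]] - r[p[1]])
        li = D[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = D[i][j] - li
        u = f"({i}:{li:g},{j}:{lj:g})"
        D[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                D[u][k] = D[k][u] = (D[i][k] + D[j][k] - D[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Three nodes left: attach them to one central node (unrooted tree).
    a, b, c = nodes
    la = (D[a][b] + D[a][c] - D[b][c]) / 2
    lb = (D[a][b] + D[b][c] - D[a][c]) / 2
    lc = (D[a][c] + D[b][c] - D[a][b]) / 2
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"


# Hypothetical additive distances for the tree ((A:1,B:2):1,(C:1,D:3)).
taxa = ["A", "B", "C", "D"]
dist = [[0, 3, 3, 5],
        [3, 0, 4, 6],
        [3, 4, 0, 4],
        [5, 6, 4, 0]]
print(neighbor_joining(taxa, dist))  # (C:1,D:3,(A:1,B:2):1);
```

Ties in Q are broken by input order here, and with real, non-additive distances neighbor joining can produce small negative branch lengths, which are usually reported as zero.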
Possible Additions:
• Add more sequences (e.g., from nonlegumes) and see how the tree changes.
• Repeat all of the above, but this time with the
nucleotide (cDNA) sequences encoding the same
proteins. Compare the results with those from the protein sequences.
• Where available, compare the nucleotide sequences of
cDNA and gene pairs: can exons and introns be identified?
ACGTGTGACGAATCAAAGGTGCATGTTAGGCCAAACATATTTTCCAATGA
ACGTGTGACGAATCAAAGGTG----------------------------
ACCTGTGATGCATCAAAGGTGCATGTTCGGCCAAACTTTTTTTTTTTT-
ACCTGTGATGCATCAAAGGTG----------------------------

AACCACTATAATTAATAGATAACTTGAGAAACT--AAAGTGCCAAAAATC
--------------------------------------------------
TTTAATGAAACCAATA--TAACTTGAGAAATCTAAAATTGCCAAAAATC
-------------------------------------------------

TTTCATGTGGTAGGTGAATGACCTAGCTGTGTCAATTGATGGTCATGAAA
----------------AATGACCTAGCTGTGTCAATTGATGGTCATGAAA
TTGCATGTGGTAGGTGAATGACCTAGCTGTGTCAATTGATGGCCATGAGA
----------------AATGACCTAGCTGTGTCAATTGATGGCCATGAGA
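In gene/cDNA alignments like these, long runs of gaps in a cDNA row opposite genomic sequence mark candidate introns. A sketch, assuming the two rows are already aligned and using the canonical GT...AG splice-site rule as a rough check; the 20-column minimum and the helper names are hypothetical choices, and the test sequences are made up.

```python
def gap_runs(row):
    """Return half-open (start, end) column ranges where `row` is all gaps."""
    runs, start = [], None
    for i, ch in enumerate(row):
        if ch == "-" and start is None:
            start = i
        elif ch != "-" and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(row)))
    return runs


def candidate_introns(gene_row, cdna_row, min_len=20):
    """List long cDNA gaps opposite genomic sequence as possible introns.

    Each hit is (start, end, gt_ag): gt_ag is True when the skipped genomic
    stretch begins with GT and ends with AG (the canonical splice signals).
    """
    if len(gene_row) != len(cdna_row):
        raise ValueError("rows must come from the same alignment")
    hits = []
    for start, end in gap_runs(cdna_row):
        if end - start < min_len:
            continue  # short gaps are more likely indels than introns
        skipped = gene_row[start:end].replace("-", "")
        hits.append((start, end, skipped.startswith("GT") and skipped.endswith("AG")))
    return hits


gene = "AAA" + "GT" + "C" * 20 + "AG" + "TTT"
cdna = "AAA" + "-" * 24 + "TTT"
print(candidate_introns(gene, cdna))  # [(3, 27, True)]
```

Aligners may place a gap anywhere within a stretch that matches equally well on either side, so a False flag does not rule out a canonical intron; sliding the boundaries a few bases is a sensible follow-up check.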
• Examine the subcellular targeting of the cysteine protease,
e.g., with TargetP or PSORT.
PSORT : http://psort.ims.u-tokyo.ac.jp/
With AAA92063 (Vigna radiata cysteine protease):
endoplasmic reticulum (lumen) --- Certainty= 0.910(Affirmative)
outside --- Certainty= 0.719(Affirmative)
lysosome (lumen) --- Certainty= 0.190(Affirmative)
endoplasmic reticulum (membrane) --- Certainty= 0.100(Affirmative)
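PSORT reports each candidate location with a certainty score. To compare predictions across several sequences, the report lines can be parsed and ranked; this sketch assumes the `location --- Certainty= value(status)` layout shown above and is not an official PSORT parser.

```python
import re

# e.g. "endoplasmic reticulum (lumen) --- Certainty= 0.910(Affirmative)"
PSORT_LINE = re.compile(r"^(.*?)\s*---\s*Certainty=\s*([0-9.]+)\s*\((\w+)\)")


def parse_psort(text):
    """Return (location, certainty, status) tuples, highest certainty first."""
    hits = []
    for line in text.splitlines():
        m = PSORT_LINE.match(line.strip())
        if m:
            hits.append((m.group(1), float(m.group(2)), m.group(3)))
    return sorted(hits, key=lambda h: h[1], reverse=True)


report = """\
endoplasmic reticulum (lumen) --- Certainty= 0.910(Affirmative)
outside --- Certainty= 0.719(Affirmative)
lysosome (lumen) --- Certainty= 0.190(Affirmative)
endoplasmic reticulum (membrane) --- Certainty= 0.100(Affirmative)
"""
best = parse_psort(report)[0]
print(best[0], best[1])  # endoplasmic reticulum (lumen) 0.91
```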