Introducing Database Mining to Molecular Genetics Students (Juniors & Seniors)
Karl Wilson

Objectives:
• Introduce students to online protein and nucleotide databases (via GenBank at the NCBI website).
• Specific operations:
  – Use of BLAST to find similar sequences (protein & nucleotide)
  – Downloading and saving sequences
  – Comparison of sequences and alignment with ClustalW
  – Interpretation of phylogenetic data
• The “test” protein sequence: AAA92063. cysteinyl endopep...[gi:1223922]

LOCUS       AAA92063     362 aa     linear   PLN 22-AUG-2002
DEFINITION  cysteinyl endopeptidase [Vigna radiata].
ACCESSION   AAA92063
VERSION     AAA92063.1  GI:1223922
DBSOURCE    locus VRU49445 accession U49445.1
KEYWORDS    .
SOURCE      Vigna radiata
  ORGANISM  Vigna radiata
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; rosids;
            eurosids I; Fabales; Fabaceae; Papilionoideae; Phaseoleae; Vigna.
REFERENCE   1  (residues 1 to 362)
  AUTHORS   Lee,K., Tan-Wilson,A.L. and Wilson,K.A.
  TITLE     Direct Submission
  JOURNAL   Submitted (16-FEB-1996) K. Lee, Department of Biological Sciences,
            State University of New York at Binghamton, P.O. Box 6000,
            Binghamton, NY 13902-6000, USA

Student given VRU49445 sequence (only) via e-mail or Blackboard
Find sequence via Entrez, download in FASTA format
VRU49445 sequence
Submit to Protein-Protein BLAST (BLASTP)

BLASTP results – related sequences

                                                                Score  E
Sequences producing significant alignments:                     (bits) Value
gi|1223922|gb|AAA92063.1| cysteinyl endopeptidase [Vigna ra...   705   0.0
gi|118158|sp|P12412|CYSP_VIGMU Vignain precursor (Bean endo...   686   0.0
gi|445927|prf||1910332A Cys endopeptidase                        684   0.0
gi|7435774|pir||S22502 cysteine proteinase (EC 3.4.22.-) - ...   677   0.0
gi|544129|sp|P25803|CYSP_PHAVU Vignain precursor (Bean endo...   674   0.0
gi|1345573|emb|CAA40073.1| endopeptidase (EP-C1) [Phaseolus...   673   0.0
gi|31559530|dbj|BAC77523.1| cysteine proteinase [Glycine ma...   657   0.0
gi|31559526|dbj|BAC77521.1| cysteine proteinase [Glycine ma...   653   0.0
gi|7435817|pir||T08122 cysteine endopeptidase (EC 3.4.22.-)...   580   e-164
gi|600111|emb|CAA84378.1| cysteine proteinase [Vicia sativa]     540   e-152
gi|3688528|emb|CAA06243.1| pre-pro-TPE4A protein [Pisum sat...   539   e-152
gi|18423124|ref|NP_568722.1| cysteine proteinase [Arabidops...   521   e-147
gi|30141021|dbj|BAC75924.1| cysteine protease-2 [Helianthus...   516   e-145
gi|1076552|pir||S49166 cysteine proteinase (EC 3.4.22.-) pr...   510   e-143
gi|7435811|pir||T06708 cysteine proteinase (EC 3.4.22.-) T2...   490   e-137
gi|1169186|sp|P43156|CYSP_HEMSP Thiol protease SEN102 precu...   490   e-137
gi|25289998|pir||JC7787 carrot seed cysteine proteinase (EC...   485   e-136
gi|18408616|ref|NP_566901.1| cysteine proteinase, putative ...   483   e-135
gi|1173630|gb|AAB37233.1| cysteine proteinase                    470   e-131
gi|4731374|gb|AAD28477.1|AF133839_1 papain-like cysteine pr...   462   e-129
gi|22331686|ref|NP_680113.1| cysteine proteinase, putative ...   462   e-129

Copy most similar cDNA sequences (in FASTA format)
cDNA sequences from P. vulgaris, V. mungo, G. max, V. sativa, etc.
Submit sequences to CLUSTALW at Biology Workbench website.

Alignment of the Cysteine Proteases from Vigna, Phaseolus, Glycine, and Vicia.
gi_118158_sp_P12412_CYSP_VIG  MAMKKLLWVVLSLSLVLGVANSFDFHEKDLESEESLWDLYERWRSHHTVS
gi_1223922_gb_AAA92063.1__cy  MAMKKLLWVVLSLSLVLGVANSFDFHEKDLASEESLWDLYERWRSHHTVS
gi_31559526_dbj_BAC77521.1__  MAMKKLLWVVLSLSLVLGSANSFDFHDKDLASEESFWDLYERWRSHHTVS
gi_31559530_dbj_BAC77523.1__  MAMKKFLWVVLSLSLVLGVANSFDFHDKDLESEESLWDLYERWRSHHTVS
gi_600111_emb_CAA84378.1__cy  MEMKKLLFISLSLALIFTVANTFDFNEHDLESEKSLWNLYERWRSHHTVT

gi_118158_sp_P12412_CYSP_VIG  RSLGEKHKRFNVFKANVMHVHNTNKMDKPYKLKLNKFADMTNHEFRSTYA
gi_1223922_gb_AAA92063.1__cy  RSLTEKHKRFNVFKENVMHVHNTNKMDKPYKLKLNKFADMTNHEFRSTYA
gi_31559526_dbj_BAC77521.1__  RSLGDKHKRFNVFKANVMHVHNTNKMDKPYKLKLNKFADMTNHEFRSTYA
gi_31559530_dbj_BAC77523.1__  RSLGDKHKRFNVFKANMMHVHNTNKMDKPYKLKLNKFADMTNHEFRSTYA
gi_600111_emb_CAA84378.1__cy  RNLDEKHNRFNVFKANVMHVHNTNKLDKPYKLKLNKFGDMTNYEFRRIYA

gi_118158_sp_P12412_CYSP_VIG  GSKVNHHKMFRGSQHGSGTFMYEKVGSVPASVDWRKKGAVTDVKDQGQCG
gi_1223922_gb_AAA92063.1__cy  GSKVNHHKMFRGTQHGNGTFMYEKVGSVPASVDWRKKGAVTDVKDQGQCG
gi_31559526_dbj_BAC77521.1__  GSKVNHHRMFQGTPRGNGTFMYEKVGSVPPSVDWRKNGAVTGVKDQGQCG
gi_31559530_dbj_BAC77523.1__  GSKVNHHRMFRDMPRGNGTFMYEKVGSVPASVDWRKKGAVTDVKDQGHCG
gi_600111_emb_CAA84378.1__cy  DSKISHHRMFRGMSHENGTFMYENAVDVPSSIDWRNKGAVTGVKDQGQCG

Unrooted Phylogenetic Tree

Possible Additions:
• Add more sequences (e.g., from nonlegumes) and see how the tree changes.
• Repeat all of the above, this time with the nucleotide (cDNA) sequences of the same proteins. Compare the results with those from the protein sequences.
• Compare the nucleotide sequences of the cDNA and gene pairs where available: exons/introns?
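The cDNA-versus-gene comparison in the last item can be sketched in a few lines of Python: in a pairwise alignment, a long run of gap characters in the cDNA row opposite gene sequence is a candidate intron. This is only an illustrative sketch, not part of the original exercise. The function name `intron_candidates` and the `min_len` threshold are hypothetical, and the two rows are assembled from the first pair of sequences in the nucleotide alignment that follows, on the assumption that the rows alternate gene / cDNA and that each alignment block is 50 columns wide.

```python
import re


def intron_candidates(gene_row, cdna_row, min_len=20):
    """Return (start, end, sequence) for each long gap run in the cDNA row.

    start and end are 0-based, end-exclusive coordinates in the ungapped
    gene sequence; sequence is the gene segment opposite the gap run.
    """
    if len(gene_row) != len(cdna_row):
        raise ValueError("aligned rows must have the same length")
    candidates = []
    for match in re.finditer(r"-+", cdna_row):
        # Gene residues that sit opposite the gap run (gene-row gaps dropped).
        segment = gene_row[match.start():match.end()].replace("-", "")
        if len(segment) < min_len:
            continue  # short gaps are more likely indels than introns
        # Convert the alignment column to a gene coordinate.
        start = len(gene_row[:match.start()].replace("-", ""))
        candidates.append((start, start + len(segment), segment))
    return candidates


# Assumed gene row: three 50-column blocks joined end to end.
GENE_ROW = (
    "ACGTGTGACGAATCAAAGGTGCATGTTAGGCCAAACATATTTTCCAATGA"
    "AACCACTATAATTAATAGATAACTTGAGAAACT--AAAGTGCCAAAAATC"
    "TTTCATGTGGTAGGTGAATGACCTAGCTGTGTCAATTGATGGTCATGAAA"
)
# Assumed cDNA row: the gap run is padded so both rows have equal length.
CDNA_ROW = (
    "ACGTGTGACGAATCAAAGGTG"
    + "-" * 95
    + "AATGACCTAGCTGTGTCAATTGATGGTCATGAAA"
)

for start, end, seq in intron_candidates(GENE_ROW, CDNA_ROW):
    print(f"candidate intron at gene positions {start}..{end} ({len(seq)} nt)")
    print(f"  starts {seq[:10]}..., ends ...{seq[-10:]}")
```

The reported boundaries depend on where the alignment program placed the gap, so students should check them by eye, for example by looking for the usual GT...AG intron ends within a few bases of each reported boundary.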
ACGTGTGACGAATCAAAGGTGCATGTTAGGCCAAACATATTTTCCAATGA
ACGTGTGACGAATCAAAGGTG-----------------------------
ACCTGTGATGCATCAAAGGTGCATGTTCGGCCAAACTTTTTTTTTTTT--
ACCTGTGATGCATCAAAGGTG-----------------------------

AACCACTATAATTAATAGATAACTTGAGAAACT--AAAGTGCCAAAAATC
--------------------------------------------------
-TTTAATGAAACCAATA--TAACTTGAGAAATCTAAAATTGCCAAAAATC
--------------------------------------------------

TTTCATGTGGTAGGTGAATGACCTAGCTGTGTCAATTGATGGTCATGAAA
----------------AATGACCTAGCTGTGTCAATTGATGGTCATGAAA
TTGCATGTGGTAGGTGAATGACCTAGCTGTGTCAATTGATGGCCATGAGA
----------------AATGACCTAGCTGTGTCAATTGATGGCCATGAGA

• Examine targeting of cysteine protease – e.g. with TargetP or PSORT.

PSORT: http://psort.ims.u-tokyo.ac.jp/
With AAA92063 (Vigna radiata cysteine protease):

endoplasmic reticulum (lumen)     --- Certainty= 0.910(Affirmative)
outside                           --- Certainty= 0.719(Affirmative)
lysosome (lumen)                  --- Certainty= 0.190(Affirmative)
endoplasmic reticulum (membrane)  --- Certainty= 0.100(Affirmative)
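If students run several proteins through PSORT, it can help to pull the predictions into a ranked list for comparison. The following is a minimal sketch that assumes each result line has the form "location --- Certainty= score(call)", as in the output above. The function name `parse_psort` is hypothetical, and the sample text pairs each location with a certainty value in the order listed above, which is itself an assumption about how the original slide was laid out.

```python
import re

# One PSORT result line: location, dashes, certainty score, call in parentheses.
PSORT_LINE = re.compile(
    r"^\s*(?P<site>.+?)\s*-+\s*Certainty=\s*(?P<score>\d*\.\d+)\((?P<call>[^)]*)\)"
)


def parse_psort(text):
    """Return (location, certainty, call) tuples, highest certainty first."""
    results = []
    for line in text.splitlines():
        match = PSORT_LINE.match(line)
        if match:
            results.append(
                (match.group("site"), float(match.group("score")), match.group("call"))
            )
    return sorted(results, key=lambda item: item[1], reverse=True)


# Sample text transcribed from the results above (pairing assumed).
PSORT_TEXT = """\
endoplasmic reticulum (lumen) --- Certainty= 0.910(Affirmative)
outside --- Certainty= 0.719(Affirmative)
lysosome (lumen) --- Certainty= 0.190(Affirmative)
endoplasmic reticulum (membrane) --- Certainty= 0.100(Affirmative)
"""

for site, certainty, call in parse_psort(PSORT_TEXT):
    print(f"{certainty:.3f}  {site}  ({call})")
```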