bi6103-20feb04.ppt

Recognition of Protein Features
Limsoon Wong
Institute for Infocomm Research
BI6103 guest lecture on ?? March 2004
Copyright 2003 limsoon wong

Lecture Plan
• Membrane proteins
• Subcellular localization

Recognition of Transmembrane Helices

Eukaryotic Cells
• Eukaryotic cells have membrane-bound compartments with specialized functions

Lipids & Membrane
• A membrane is a double layer of lipids and associated proteins that defines a subcellular compartment or encloses the cell
• Lipids consist of a “polar head group” and long-chain fatty acids
• This dual nature promotes formation of lipid bilayers
• “Hydrophobic tails” are shielded from the aqueous environment
• Water-soluble (i.e., charged or polar) molecules can't pass through this impermeable barrier
• Permeability across the bilayer is regulated by membrane proteins that span the bilayer and function like channels or pores

Membrane Proteins
• Two types of membrane proteins: integral vs peripheral
• Two types of integral membrane proteins: all-α vs β-barrel
[Figure labels: all-α, β-barrel]

Topography & Topology
• Topography: predict location of transmembrane segments
• Topology: predict location of N- and C-termini wrt lipid bilayer
• We focus on topography prediction for all-α membrane proteins
[Figure label: lipid molecules]

Datasets
• Jayasinghe et al., Protein Sci, 10:455-458, 2001
  – 59 high resolution membrane proteins
  – www.biocomp.unibo.it/gigi/ENSEMBLE
• Moller et al., Bioinformatics, 16:1159-1160, 2000
  – 151 low resolution membrane proteins
• Jones et al., Biochem., 33(10):3038-3049, 1994
  – 38 multi-spanning and 45 single-spanning membrane proteins
  – topologies experimentally determined
• Sonnhammer et al., ISMB, 6:175-182, 1998
  – 108 multi-spanning and 52 single-spanning membrane proteins
  – most topologies experimentally determined, but less reliably than in Jones et al.

Monne et al., JMB, 288:141-145, 1999: Turn Propensity Scale for TM Helices
• E. coli Lep protein contains two TM domains (H1, H2) and a C-terminal domain P2
• Translocation of P2 to the lumenal side is easy to test by glycosylation
• Replace H2 by the 40-residue poly-L segment LIK4L21XL7VL10Q3P
• The poly-L segment can form either one long TM helix or 2 closely spaced TM helices, depending on what is substituted for X
[Figure label: ER]

Monne et al., JMB, 288:141-145, 1999: Turn Propensity Scale for TM Helices
• Using the poly-L segment, measure the “turn” propensity of the 20 amino acids by substituting each of them for the X in the poly-L segment
• Hydrophobic residues (I, V, L, F, C, M, A) do not induce a turn
• Charged and polar residues (except S & T) induce a turn
• Exercise:
  – What are the charged/polar residues?
  – What could be the reason for S & T not inducing a turn?
[Figure labels: glycosylated, non-glycosylated]
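
The compact notation LIK4L21XL7VL10Q3P on the Monne et al. slides can be unpacked with a few lines of Python. This is only a sketch: it assumes the digits are repeat counts (subscripts on the original slide) and that the 40-residue poly-L stretch is the L21XL7VL10 part, which is an inference from the arithmetic rather than something the slides state. The helper names `expand` and `substitute_x` are hypothetical.

```python
import re

def expand(notation: str) -> str:
    """Expand compact notation, e.g. 'K4L21' -> 'KKKK' + 'L' * 21.

    Assumption: a number after a residue letter is a repeat count;
    a letter with no number occurs once.
    """
    parts = re.findall(r"([A-Z])(\d*)", notation)
    return "".join(residue * int(count or 1) for residue, count in parts)

def substitute_x(notation: str, residue: str) -> str:
    """Return the expanded segment with the placeholder X replaced by one residue."""
    return expand(notation).replace("X", residue)

SEGMENT = "LIK4L21XL7VL10Q3P"   # as written on the slide
POLY_L = "L21XL7VL10"           # assumed to be the 40-residue poly-L stretch

print(len(expand(SEGMENT)), len(expand(POLY_L)))  # 50 40
print(substitute_x(SEGMENT, "P"))                 # test segment with proline at X
```

Looping `substitute_x` over the 20 amino acids gives the set of test segments whose turn propensity the slides describe measuring.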
Copyright 2003 limsoon wong Monne et al., JMB, 288:141--145, 1999: Turn Propensity Scale for TM Helices • In all- membrane proteins, – hydrophobic residues prefer membrane env and have low turn propensity – charged & polar residues induce turn formation to avoid membrane interior  prediction of TM helix  distinction of 1 long TM helix vs 2 closely spaced TM helices Monne et al., JMB, 288:141--145, 1999 Copyright 2003 limsoon wong Wiess et al, ISMB, 1:420--421, 1993 Hydrophobicity Approach • Inside of cellular membrane is hydrophobic • Segment of protein that spans membrane is expected to contain many hydrophobic amino acids  Locate segments that have high average “hydrophobicity” score Monne et al., JMB, 288:141--145, 1999 Copyright 2003 limsoon wong Wiess et al, ISMB, 1:420--421, 1993 Hydrophobicity Approach • Caveats: – may be unable to distinguish hydrophobic core of nonmembrane proteins vs. transmembrane regions – what are the right thresholds? • • • • find a segment of 10 to 70aa with hp > 0.71 expand to longer segment with hp > 0.35 mark this segment as TM repeat above starting from position after previous segment Adjustable thresholds Copyright 2003 limsoon wong An Example: Bacteriorhodopsin 1 gigtllmlig tfyfiargwg vtdkkareyy aitilvpgia saaylsmffg iglttvevag 61 maepleiyya ryadwlfttp lllldlalla nadrttigtl igvdalmivt gligalshtp 121 larytwwlfs tiaflfvlyy lltvlrsaaa elsedvqttf ntltalvavl wtaypilwii 181 gtegagvvgl gvetlafmvl dvta 7 transmembrane helices http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=protein&list_uids=461610&dopt=GenPept&term=bacteriorhodopsin&qty=1 Copyright 2003 limsoon wong An Example: Bacteriorhodopsin • After applying hydrophobicity scale... 
1 gigtllmlig tfyfiargwg vtdkkareyy aitilvpgia saaylsmffg iglttvevag 61 maepleiyya ryadwlfttp lllldlalla nadrttigtl igvdalmivt gligalshtp 121 larytwwlfs tiaflfvlyy lltvlrsaaa elsedvqttf ntltalvavl wtaypilwii 181 gtegagvvgl gvetlafmvl dvta Copyright 2003 limsoon wong An Example: Bacteriorhodopsin • Compute hydrophobicity score, hp > 7 1 gigtllmlig tfyfiargwg vtdkkareyy aitilvpgia saaylsmffg iglttvevag 61 maepleiyya ryadwlfttp lllldlalla nadrttigtl igvdalmivt gligalshtp 121 larytwwlfs tiaflfvlyy lltvlrsaaa elsedvqttf ntltalvavl wtaypilwii 181 gtegagvvgl gvetlafmvl dvta TM identified: 6/7, TM FP: 0 TM residue identified: 62/117, TM residue FP: 4 Copyright 2003 limsoon wong An Example: Bacteriorhodopsin • Expand segment, maintain hp > 5, avoid low hydrophobicity 1 gigtllmlig tfyfiargwg vtdkkareyy aitilvpgia saaylsmffg iglttvevag 61 maepleiyya ryadwlfttp lllldlalla nadrttigtl igvdalmivt gligalshtp 121 larytwwlfs tiaflfvlyy lltvlrsaaa elsedvqttf ntltalvavl wtaypilwii 181 gtegagvvgl gvetlafmvl dvta TM identified: 6/7, TM FP: 0 TM residue identified: 100/117, TM residue FP:15 Copyright 2003 limsoon wong Sonnhammer et al., ISMB, 6:175-182, 1998: TMHMM, A HMM Approach • There are 3 main locations of a residue: – TM helix core (viz., in hydrophobic tail of membrane – TM helix cap (viz., in head of membrane) • cytoplasmic vs • non-cytoplasmic side of the helix core cyto – loops • cytoplasimc vs • non-cytoplasmic (short) vs • non-cytoplasmic (long) non-cyto  So needs HMM with 7 states • Exercise: What is the 7th state for? Copyright 2003 limsoon wong Sonnhammer et al., ISMB, 6:175-182, 1998: TMHMM, Architecture cyto non-cyto Each state has an associated probability distribution over the 20 amino acids characterizing the variability of amino acids in the region it models Copyright 2003 limsoon wong Sonnhammer et al., ISMB, 6:175-182, 1998: TMHMM, Architecture • The first 3 and last 2 core states have to be traversed. But all other core states can be bypassed. 
• This models core regions of 5--25 residues Copyright 2003 limsoon wong Sonnhammer et al., ISMB, 6:175-182, 1998: TMHMM, Architecture To model neutral amino acid distribution To model bias in amino acid usage near cap • The states of globular, loop, & cap regions. • The caps are 5 residues each. Since core is 5--25 residues, this allows for helices 15--35 residues long Copyright 2003 limsoon wong Sonnhammer et al., ISMB, 6:175-182, 1998: TMHMM, Training the HMM • Stage 1: Baum-Welch is used for maximum likelihood estimation from “diluted” labeled training data. As precise end of TM is only approximately known, we “dilute” by unlabeling 3 residues on each side of a helix boundary to accommodate this • Stage 2: Baum-Welch is used for maximum likelihood estimation from “relabeled” training data. The original training data are diluted as by unlabeling 5 residues on each side of a helix boundary. Model from Stage 1 is used to produce “relabeled training data” by relabeling this part under constraints of remaining labels • Stage 3: Model from Stage 2 is further tuned by a method for “discriminative” training, to maximize probability of correct prediction (Krogh, ISMB, 5:179--186, 1997) Copyright 2003 limsoon wong Krogh, ISMB, 5:179--186, 1997: Discriminative HMM Training Copyright 2003 limsoon wong Sonnhammer et al., ISMB, 6:175-182, 1998: TMHMM, Example Non-cytoplasmic TM segment Cytoplasmic Datasets • Jones et al., Biochem., 33(10):3038--3049, 1994 • Sonnhammer et al., ISMB, 6:175-182, 1998 Copyright 2003 limsoon wong Sonnhammer et al., ISMB, 6:175-182, 1998: TMHMM, Accuracy (10-CV) All TM segments & their orientation correctly predicted All TM segments correctly predicted, ignoring orientation precision Copyright 2003 limsoon wong Martelli et al. 
Bioinformatics, 19:i205--i211, 2003 ENSEMBLE NN HMM1 HMM2 ENSEMBLE Copyright 2003 limsoon wong ENSEMBLE: The Neural Network Part 1 h1 17 * 20 input units h2 HMM Input layer 17*2 inputs LOOP 15 hidden units 17 h5 Feed-forward back-propagation neural network • The NN part is a cascade shown above, a la Rost et al., Protein Science, 1995 Copyright 2003 limsoon wong ENSEMBLE: The HMM1 Part • HMM1 models the hydrophobic nature of most TM helices, a la Krogh et al. JMB 2001 & Sonnhammer et al., ISMB 1998 Copyright 2003 limsoon wong ENSEMBLE: The HMM2 Part • HMM2 models TM helices that are mix of hydrophobic and hydrophilic residues, ala Martelli et al., Bioinformatics 2002. Copyright 2003 limsoon wong ENSEMBLE: Predicting if a residue is in TM NN helix • • • • HMM1 HMM2 ENSEMBLE loop (inner I, outer O) NN(p,i) = NN(H,p,i)  NN(L,p,i) HMM1(p,i) = AP1(H,p,i)  AP1(I,p,i)  AP1(O,p,i) HMM2(p,i) = AP2(H,p,i)  AP2(I,p,i)  AP2(O,p,i) E(p,i) = (NN(p,i) + HMM1(p,i) + HMM2(p,i)) / 3 position E(p,i) > 0 means residue i of protein p is in TM helix Copyright 2003 limsoon wong Ensemble: Topography Prediction Fariselli et al., Bioinformatics, 2003 TM helix found by MaxSubSeq but would be missed w/o it NN HMM1 ENSEMBLE HMM2 MaxSubSeq This path is taken means positions m to j form a helix Copyright 2003 limsoon wong Ensemble: Topography Prediction Results 90% 85% 80% 75% 70% 65% 60% Jayasinghe (CV) Moller NN HMM1 HMM2 ENSEMBLE TMHMM2.0 MEMSAT PHD HMMTOP A prediction is considered correct if (a) the number of TM segments is correct and (b) the overlap between a predicted and a real TM segment > 8aa Copyright 2003 limsoon wong Topology Prediction: Postive-Inside Gavel et al., FEBS, 282:41--46, 1991 Rule • Positivelycharged residues (Lys and Arg) are enriched more than 2 fold in stromal vs luminal loops Copyright 2003 limsoon wong Topology Prediction: Ensemble “positive-inside” rule Copyright 2003 limsoon wong Ensemble: Topology Prediction Results 80% 75% 70% 65% 60% 55% 
50% 45% 40% ENSEMBLE (rule 4) TMHMM2.0 MEMSAT PHD HMMTOP Jayasinghe (CV) Moller ENSEMBLE (rule 1) Copyright 2003 limsoon wong Short Break Copyright 2003 limsoon wong Subcellular Localization Copyright 2003 limsoon wong Compartments and Sorting • Eukaryotic cells requires proteins be targeted to their subcellular destinations • Protein sorting is determined by specific amino acid sequences, or “signals”, within the protein • Secretory pathway targets proteins to plasma membrane, some membranebound organelles such as lysosomes, or to export proteins from the cell Copyright 2003 limsoon wong Secretory Pathway • The secretory pathway consists of the endoplasmic reticulum (ER), Golgi apparatus and transport vesicles • The transport vesicles carry proteins from one compartment to the other • Exocytosis is mediated by fusion of secretory vesicles with the plasma membrane. • Endocytosis is the opposite of exocytosis and involves the uptake of extracellular material by pinching off vesicles from the plasma membrane • The contents of the endocytic vesicles are delivered to the lysosomes by membrane fusion • Lysosomes contain hydrolytic enzymes that breakdown macromolecules into the smaller subunits which can be utilized by the cell for its own biosynthesis Copyright 2003 limsoon wong Datasets • Reinhartdt & Hubbard, NAR, 26:2230--2236, 1998 – 2427 eukaryotic proteins for 4 locations (cytoplasmic, extracellular, nuclear,& mitochondrial) – 997 prokaryotic proteins for 3 locations (cytoplasmic, extracellular, & periplasmic) • Park & Kanehisa, Bioinformatics, 19:1656--1663, 2003 – 7589 eukaryotic proteins from 709 organisms for 12 locations (chloroplast, cytoplasmic, cytoskeleton, ER, extracellular, golgi, lysosomal, mitochondrial, nuclear, peroxisomal, plasma membrane, vacuolar) • Chou & Cai, JBC., 277:45765--45769, 2002 – 2191 proteins for 12 locations • Emanuelsson et al., JMB, 300:1005--1016, 2000 • Gardy et al., NAR, 31:3613--3617, 2003 Copyright 2003 limsoon wong 

Common Eukaryotic Protein Sorting Signals
For a comprehensive list of cellular localization sites, see http://mendel.imp.univie.ac.at/CELL_LOC/index.html

Schematic View of Sorting Signals
[Figure labels: ~25aa, cleavage site]

Sequence Logos of SP, mTP, & cTP
SP: signal peptide; mTP: mitochondrial transfer peptide; cTP: chloroplast transit peptide

Neural Network Approach: TargetP
Emanuelsson et al., JMB, 300:1005-1016, 2000
• cTP, mTP, SP
  – 4 hidden units
  – feedforward NNs
  – input windows:
      • 55aa (cTP), 35aa (mTP), 27aa (SP)
      • sparsely encoded
• Integrating Network
  – 0 hidden units
  – feedforward NN
  – input is taken from the outputs of the cTP, mTP, SP networks over 100aa at the N-terminal
cTP: chloroplast transit peptide; mTP: mitochondrial transfer peptide; SP: signal peptide

TargetP: Performance
Dataset: Emanuelsson et al., JMB, 2000

Expert System Approach: PSORT
Horton & Nakai, ISMB, 1997
[Figure: a simplified version of the decision tree that PSORT uses to check and reason over various sorting signals]

A Refinement: PSORT-B
Gardy et al., NAR, 31:3613-3617, 2003
• Sites considered
  – cytoplasm
  – inner membrane
  – periplasm
  – outer membrane
  – extracellular space
[Figure: the modules SCL-BLAST, Motifs, HMMTOP, Outer Membrane Protein, SubLocC, and Signal Peptides feed a Bayesian Network, which outputs localization sites or “unknown”]

PSORT-B: SCL-BLAST
• Homology to a protein of known localization is a good indicator of a protein's actual localization site
⇒ BLAST target protein against a database of proteins whose localization sites are known
⇒ Return localization sites of hits at E-value of 10e-10 over 80% of length

PSORT-B: Motifs
• Some motifs in PROSITE may be able to identify subcellular localization with 100% precision
⇒ Scan target protein against a database of such motifs (28 such 100%-precision motifs are known)
⇒ Return localization sites corresponding to the motif hits

PSORT-B: HMMTOP
• An α-helical transmembrane region is a reliable indicator of localization to the inner membrane
⇒ Scan target protein for transmembrane α helices using HMMTOP
⇒ Return localization site as “inner membrane” if >2 α helices found

PSORT-B: Outer Membrane Proteins
• Outer-membrane proteins have a characteristic β-barrel structure
⇒ Identify freq seq occurring only in β-barrel proteins (279 such freq seq known)
⇒ Scan target protein for these freq seq
⇒ Return localization site as “outer membrane” if >2 such freq seq found

PSORT-B: SubLocC
• Overall amino acid composition is useful for recognizing cytoplasmic proteins
⇒ Train an SVM on overall amino acid composition to predict cytoplasmic vs non-cytoplasmic, as in SubLoc
⇒ Analyze target protein's amino acid composition using this SVM

PSORT-B: Signal Peptides
• Presence of a signal peptide at the N-terminal means the protein is not cytoplasmic
⇒ Train HMM and SVM to recognize signal peptides and their cleavage sites
⇒ If a high-confidence cleavage site is found by the HMM in the first 70aa of the target protein, then “non-cytoplasmic”
⇒ If a low-confidence cleavage site is found, pass the candidate signal peptide to the SVM to confirm
⇒ If confirmed, then “non-cytoplasmic”
⇒ Otherwise, “unknown”

PSORT-B: Bayesian Network
• The Bayesian Network integrates results from the 6 modules
• Produces a score for each of the 5 possible localization sites
• If a site scores >7.5, then it is predicted as a localization site of the target protein
• If no site scores >7.5, then no prediction is made

PSORT-B: Performance of Individual Modules
Dataset: Gardy et al., NAR, 2003

PSORT-B: Performance wrt Localization Sites
PSORT-B is a considerable improvement over the original PSORT
Dataset: Gardy et al., NAR, 2003
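
The two explicit decision rules on the PSORT-B slides (the signal-peptide cascade and the final 7.5 score cutoff) can be written down as a small Python sketch. It does not implement the Bayesian network or any of the real modules: the per-site scores are assumed to be given, and the function names, the "high"/"low"/"none" confidence labels, and the example scores are all hypothetical.

```python
from typing import Dict, List

# The five localization sites listed on the PSORT-B slide.
SITES = ["cytoplasm", "inner membrane", "periplasm",
         "outer membrane", "extracellular space"]
CUTOFF = 7.5  # score cutoff quoted on the Bayesian Network slide

def signal_peptide_call(hmm_confidence: str, svm_confirms: bool = False) -> str:
    """Signal-peptide module logic as described on the slide.

    hmm_confidence is a hypothetical label: "high", "low", or "none",
    for a cleavage site found by the HMM in the first 70aa.
    """
    if hmm_confidence == "high":
        return "non-cytoplasmic"
    if hmm_confidence == "low" and svm_confirms:
        return "non-cytoplasmic"
    return "unknown"

def predict_sites(scores: Dict[str, float], cutoff: float = CUTOFF) -> List[str]:
    """Return every site scoring above the cutoff; an empty list means no prediction."""
    return [site for site in SITES if scores.get(site, 0.0) > cutoff]

# Hypothetical per-site scores for one protein.
example = {"periplasm": 9.1, "cytoplasm": 0.4, "outer membrane": 0.5}
print(predict_sites(example))             # ['periplasm']
print(signal_peptide_call("low", False))  # unknown
```

Returning a list rather than a single site leaves room for more than one site to exceed the cutoff, which the slide wording does not rule out.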

PSORT vs PSORT-B: Some Remarks
• PSORT considers various signals/features in a top-down way, driven by its reasoning tree
• PSORT-B generates all signals/features in a bottom-up way, then integrates them for decision making using a Bayesian Network
• Machine learning “beats” human expert? Probably the features/rules needed are too numerous and too complicated

Amino acid compositions of proteins residing in different sites are different

Amino Acid Composition Differences
• Each cellular location has its own characteristic physio-chemical environment
• Proteins in each location have adapted thru evolution to that environment
• This is thus reflected in the protein structure and amino acid composition
• If the above is true, the amino acid composition differences wrt cellular location sites should be more pronounced on protein surfaces than in the protein interior
• Exercise: Why?

Adaptation of Protein Surfaces
Andrade et al., JMB, 1998
• To test the theory of adaptation of protein surfaces to subcellular localization, we plot 3 types of composition vectors along their first two principal components
[Figure callout: proportion of jth amino acid type in ith protein]

Adaptation of Protein Surfaces
Andrade et al., JMB, 1998
• Clearly, total & surface composition vectors show better separation than interior composition vectors
[Figure panels: total amino acid composition vector; surface amino acid composition vector; interior amino acid composition vector]

Amino Acid Composition
• This means we can use amino acid composition vectors, especially those from protein surfaces, to predict subcellular localization!
• Let's see how this turns out...
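
The total amino acid composition vector mentioned above is simply the fraction of each of the 20 amino acid types in a sequence. A minimal Python sketch follows; the function name `composition`, the alphabetical ordering of the 20 letters, and the choice to skip non-standard letters are assumptions made for illustration, not details taken from the cited papers.

```python
from collections import Counter
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

def composition(sequence: str) -> List[float]:
    """Return the 20-dimensional composition vector of a protein sequence.

    Entry j is the proportion of the j-th amino acid type in the sequence.
    Letters outside the 20 standard codes are ignored (an assumption).
    """
    counts = Counter(c for c in sequence.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]

# First 10 residues of the bacteriorhodopsin sequence shown earlier in the lecture.
vec = composition("gigtllmlig")
for aa, frac in zip(AMINO_ACIDS, vec):
    if frac > 0:
        print(aa, frac)
```

Surface and interior composition vectors would need structural information to decide which residues are exposed, so they are outside the scope of this sketch.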
Copyright 2003 limsoon wong Neural Networks: NNPSL Reinhardt & Hubbard, NAR, 26:2230--2236, 1998 Input1 fraction of each amino acid in the input protein cytoplasmic extracellular mitochodrial nuclear Input20 Copyright 2003 limsoon wong NNPSL: Performance • Outputs NNPSL have values 0 to 1. The difference () between the highest and the next highest nodes can be used as a reliability index 0 <  < 0.2 0.2 <  < 0.4 0.4 <  < 0.6 0.6 <  < 0.8 0.8 <  < 1 Dataset: Reinhardt & Hubbard, NAR, 1998 Copyright 2003 limsoon wong Performance Emanuelsson, BIB, 3:361--376, 2002 (940 proteins) (2738 proteins) Dataset: Emanuelsson et al., JMB, 2000 Copyright 2003 limsoon wong Markov Chain Yuan, FEBS Letters, 451:23--26, 1999 Why? Copyright 2003 limsoon wong Markov Chain: Performance (Eukaryotic) NNPSL 4th Order Markov Dataset: Reinhardt & Hubbard, NAR, 1998 Copyright 2003 limsoon wong Support Vector Machines: SubLoc Hua & Sun, Bioinformatics, 17:721--728, 2001 SVM nuclear vs rest 20-dimensional vector giving amino acid composition of the input protein SVM mitochondrial vs rest SVM extracellular vs rest SVM cytoplasmic vs rest ArgmaxX X-vs-rest The SVMs use • polynomial kernel with d = 9 (prokaryotic), K(Xi,Xj) = (Xi ·Xj + 1)d • RBF kernel with =16 (eukaryotic), K(Xi, Xj) = exp(-  |Xi - Xj|2 Copyright 2003 limsoon wong SubLoc: Performance NNPSL SubLoc (Eukaryotic) Dataset: Reinhardt & Hubbard, NAR, 1998 Copyright 2003 limsoon wong SubLoc: Robustness of Amino Acid Composition Approach • Amazingly, accuracy of SubLoc is virtually unaffected when the first 10, 20, 30, & 40 amino acids in a protein are deleted • Amino acid composition is a robust indicator of subcellular localization, and is insensitive to errors in N-terminal sequences Copyright 2003 limsoon wong Amino Acid Composition: Taking it Further • How about pairs of consecutive amino acids? (a.k.a 2-grams) How about 3grams, …, k-grams? • How about pseudo amino acid composition? 
• How about presence of entire functional domains? (I.e. think of the presence/absence of a functional domain as a summary of amino acid sequence info...) Copyright 2003 limsoon wong Functional Domain Composition Chou & Cai, JBC, 277:45765--45769, 2002 Training seqs of various localization sites Train SVM using these vectors xi = 1 means ith domain is present BLAST against db of known functional domains (SBASE-A) + amino acid composition Copyright 2003 limsoon wong Functional Domain Composition: Performance Dataset: Reinhardt & Hubbard, NAR, 1998 • Not so good • Why? Number of known domains in SBASE-A too small  Need to handle situation where a protein has no hit in known domains Copyright 2003 limsoon wong Functional Domain Composition Cai & Chou, BBRC, 305:407--411, 2003 If a protein got a hit in Interpro, use NN-5875D; else use NN-40D Training seqs of various localization sites BLAST against db of known functional domains (Interpro) NN-5875D: NN-40D: Train k-NN (k=1) using these vectors Train k-NN (k=1) using these vectors or, if no hit found Amino acid composition Pseudo amino acid composition Copyright 2003 limsoon wong Functional Domain Composition: Performance Dataset: Reinhardt & Hubbard, NAR, 1998 Copyright 2003 limsoon wong Notes Copyright 2003 limsoon wong References (Transmembrane) • Wiess et al. “Transmembrane segment prediction from protein sequence data”, ISMB, 420--421, 1993 • Gavel et al. “The positive-inside rule applies to thylakoid membrane proteins”, FEBS 282:41--46, 1991 • Monne et al. “A turn propensity scale for transmembrane helices”, JMB, 288:141--145, 1999 • Sonnhammer et al. “A hidden Markov model for predicting transmembrane helices in protein sequences”, ISMB, 6:175--182, 1998 • Martelli et al. “An ENSEMBLE machine learning approach for the prediction of all-alpha membrane proteins”, Bioinformatics, 19(suppl):i205--i211, 2003 Copyright 2003 limsoon wong References (Transmembrane) • Von Heijne. 
“Membrane protein structure prediction”, JMB, 225: 487--494, 1992 • Jacoboni et al. “Prediction of the transmembrane regions of beta-barrel membrane proteins with a neural networkbased predictor”, Protein Sci., 10:779--787, 2001 • Martelli et al. “a sequence-profile-based HMM for predicting and discriminating beta barrel membrane proteins”, Bioinformatics, 18:S46--S53, 2002 • Moller et al. “Evaluation of methods for the prediction of membrane spanning regions”, Bioinformatics, 17:646--653, 2001 • Fariselli et al. “MaxSubSeq: an algorithm for segmentlength optimization. The case study of the transmembrane spanning segments”, Bioinformatics, 19:500--505, 2003 Copyright 2003 limsoon wong References (Transmembrane) • Rost et al. “Transmembrane helices predicted at 95% accuracy”, Protein Sci., 4:521--533, 1995 • Krogh et al. “Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes”, JMB, 305:567--580, 2001 • Andersson et al. “Different positively charged amino acids have similar effectson the topology of a polytopic transmembrane protein in E. 
coli”, JBC, 267:1491--1495, 1992 Copyright 2003 limsoon wong References (Subcellular Localization) • Horton & Nakai, “Better prediction of protein cellular localization sites with the k-nearest neighbours classifier”, ISMB, 5:147--152, 1997 • Gardy et al., “PSORT-B: Improving protein subcellular localization for Gram-negative bacteria”, NAR, 31:3613--3617, 2003 • Emanuelsson, “Predicting protein subcellular localization from amino acid sequence information”, BIB, 3:361--376, 2002 • Andrade et al., “Adaptation of protein surfaces to subcellular location”, JMB, 276:517--525, 1998 • Yuan, “Prediction of protein subcellular locations using Markov chain models”, FEBS Letters, 451:23--26, 1999 Copyright 2003 limsoon wong References (Subcellular Localization) • Emanuelsson et al., “ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites”, Protein Sci., 8:978--984, 1999 • Emanuelsson et al., "Predicting subcellular localization of proteins based on their N-terminal amino acid sequence", JMB, 300:1005-1016, 2000 • Hua & Sun, “Support vector machine approach for protein subcellular localization prediction”, Bioinformatics, 17:721--728, 2001 • Reinhardt & Hubbard, “Using neural networks for prediction of the subcellular location of proteins”, NAR, 26:2230--2236, 1998 Copyright 2003 limsoon wong References (Subcellular Localization) • Cai & Chou, “Nearest neighbour algorithm for predicting protein subcellular location by combining functional domain composition and pseudo-amino acid composition”, BBRC, 305:407--411, 2003 • Chou & Cai, “Using functional domain composition and support vector machines for prediction of protein subcellular location”, JBC, 277:45765--45769, 2002 • Park & Kanehisa, “Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs”, Bioinformatics, 19:1656--1663, 2003 Copyright 2003 limsoon wong
