PREDICTING PROTEIN SECONDARY STRUCTURE USING ARTIFICIAL NEURAL NETWORKS
Sudhakar Reddy, Patrick Shih, Chrissy Oriol, Lydia Shih

Proteins And Secondary Structure
Sudhakar Reddy

Project Goals
To predict the secondary structure of a protein using artificial neural networks.

STRUCTURES
Primary structure: linear arrangement of the amino acid (a.a.) residues that constitute the polypeptide chain.

SECONDARY STRUCTURE
• Localized organization of parts of a polypeptide chain through hydrogen bonds between different residues.
• Without any stabilizing interactions, a polypeptide assumes a random coil structure.
• When stabilizing hydrogen bonds form, the polypeptide backbone folds into one of the following regular arrangements:
  – ALPHA HELIX
  – BETA SHEET
  – U-TURNS

ALPHA HELIX
• The polypeptide backbone is folded into a spiral that is held in place by hydrogen bonds between backbone oxygen atoms and hydrogen atoms.
• The carbonyl oxygen of each peptide bond is hydrogen bonded to the amide hydrogen of the a.a. 4 residues toward the C-terminus.
• Each alpha helix has 3.6 a.a. per turn.
• Side chains point outward from the backbone.
• The hydrophobic/hydrophilic quality of the helix is determined entirely by the side chains, because the polar groups of the peptide backbone are already involved in H-bonding within the helix and thus cannot affect its hydrophobicity or hydrophilicity.

THE BETA SHEET
• Consists of laterally packed beta strands.
• Each beta strand is a short (5-8 residues), nearly fully extended polypeptide chain.
• Hydrogen bonding between backbone atoms in adjacent beta strands, within either the same or different polypeptide chains, forms a beta sheet.
• Orientation can be either parallel or anti-parallel. In both arrangements, side chains project from both faces of the sheet.

TURNS
• Composed of 3-4 residues, turns are compact, U-shaped secondary structures stabilized by H-bonds between their end residues.
• Located on the surface of the protein, turns form a sharp bend that redirects the polypeptide backbone back toward the interior.
• Glycine and proline are commonly present.
• Without these turns, a protein would be large, extended and loosely packed.

MOTIFS
Motifs are regular combinations of secondary structure:
– Coiled-coil motif
– Helix-loop-helix (Ca2+)
– Zinc-finger motif
[Figures: coiled-coil motif; helix-loop-helix (Ca2+); zinc-finger motif]

FUTURE
• Protein structure identification is key to understanding biological function and its role in health and disease.
• Characterizing a protein's structure helps in the development of new agents and devices to treat disease.
• The challenge of unraveling the structure lies in developing methods that accurately and reliably capture the relationship between sequence and structure.
• Most current protein structures have been characterized by NMR and X-ray diffraction.
• The revolution in sequencing studies has produced a growing database, but only 3000 known structures.

ADVANTAGE
Because very few conformations of a protein are possible, and structure and sequence are directly related to each other, we can unravel the secondary structure by developing an efficient algorithm that compares new sequences with the ones already available, and apply the results in the health care industry.

WHY SECONDARY STRUCTURE?
• Prediction of secondary structure is an essential intermediate step on the way to predicting the full 3-D structure of a protein.
• If the secondary structure of a protein is known, it is possible to derive a comparatively small number of possible tertiary structures using knowledge about the ways that secondary structural elements pack.

Artificial Neural Network (ANN)
Peichung Shih

[Figure: Biological Neural Network]

Artificial Neural Network
• $X_{1k}$: input from $X_1$
• $X_{2k}$: input from $X_2$
• $W_{1k}$: weight of $X_1$
• $W_{2k}$: weight of $X_2$
• $X_{0k}$: bias term
• $W_{0k}$: weight of bias term
• $\Theta$: threshold
• $F$: nonlinear function
• $q_k$: output of node $k$

Artificial Neural Network - Example
$X_0 = 1$, $X_1 = 1$, $X_2 = 2$; $W_0 = 2$, $W_1 = 1$, $W_2 = 2$; $\Theta = 6$

$$\sum_{i=0}^{2} X_i W_i = 1 \cdot 2 + 1 \cdot 1 + 2 \cdot 2 = 7$$

If $7 > \Theta$, the node outputs $F\left(\sum X_i W_i\right)$; otherwise it outputs 0.

$$F(x) = \left(1 + e^{-x}\right)^{-1}, \qquad F(7) = \frac{1}{1 + e^{-7}} = 0.9991$$

Output = 0.9991

Paradigms of ANN - Overview

Learning \ Topology   Feedback                                   Feedforward
Unsupervised          Binary Adaptive Resonance Theory (ART1)    Fuzzy Associative Memory (FAM)
                      Analog Adaptive Resonance Theory (ART2)    Learning Vector Quantization (LVQ)
Supervised            Brain-State-in-a-Box (BSB)                 Perceptron
                      Fuzzy Cognitive Map (FCM)                  Adaline & Madaline
                                                                 Backpropagation (BP)

[Figures: the same grid, highlighting in turn the Feedforward, Feedback, Supervised, and Unsupervised paradigms]

Perceptron
One
of the earliest learning networks was proposed by Rosenblatt in the late 1950s.
RULE: $net = w_1 I_1 + w_2 I_2$; if $net > \Theta$ then output = 1, otherwise output = 0.
MODEL: [Figure: perceptron model]

Perceptron Example: AND Operation
Training flowchart (O = output, T = target, I = input on the weight's connection):
• Output correct? Yes: W = W, Θ = Θ.
• No, with O = 1 and T = 0: Θ = Θ + 1; W = W where I = 0, W = W - 1 where I = 1.
• No, with O = 0 and T = 1: Θ = Θ - 1; W = W where I = 0, W = W + 1 where I = 1.

Initial network: Θ = 1.5, W1 = 0.5, W2 = 0.5.

Training patterns, in the order presented:

Input I1   Input I2   Target
1          1          1
1          0          0
0          1          0
0          0          0
1          1          1
1          0          0
0          1          0

[Figures: one slide per training pattern, showing the flowchart and the network's weights and threshold at that step]

Hidden Layer
[Figure: the input points (0, 0), (0, 1), (1, 0) and (1, 1) plotted for the XOR, OR and AND functions. OR and AND can each be separated by a single line; XOR cannot.]

Hidden Layer

Input I1   Input I2   Target (XOR)
1          1          0
1          0          1
0          1          1
0          0          0

[Figure: a single perceptron with unknown weights (??)]
[Figure: a network with one hidden node that solves XOR; values shown in the figure include 0.5, 1.5, 1 and -2]

How Many Hidden Nodes?
• We have indicated the number of layers needed.
• However, no indication is provided as to the optimal number of nodes per layer.
• There is no formal method to determine this optimal number; typically, one uses trial and error.

Hidden Units   Q3 (%)
0              62.50
5              61.60
10             61.50
15             62.60
20             62.30
30             62.50
40             62.70
60             61.40

JNET AND JPRED
CHRISSY ORIOL

JNET
• Multiple alignment
• Neural network
• Consensus of methods

TRAINING AND TESTS
• 480 proteins for training (1996 PDB)
• 406 proteins for testing (2000 PDB)
• Blind test
• 7-fold cross-validation test

MULTIPLE ALIGNMENTS
• Multiple sequence alignment constructed
• Generation of profiles:
  o Frequency counts of each residue / total residues in the column (expressed as a percentage)
  o Each residue scored by its value from BLOSUM62, with the scores averaged over the number of sequences in that column
  o Profile HMM generated by HMMER2
  o PSI-BLAST (Position Specific Iterative Basic Local Alignment Search Tool): frequency of residue; PSSM (Position Specific Scoring Matrix)

HMM PROFILE
• Uses: statistical descriptions of a sequence family's consensus; position-specific scores for residues, insertions and deletions
• Profiles: capture important information about the degree of conservation at different positions, and the varying degree to which gaps, insertions and deletions are permitted

PSI-BLAST PROFILE
• Full-length sequences
from the initial PSI-BLAST search are extracted from the database and ordered by p-value
• Align [a] and [b]
• Remove gaps in [a] and the columns below the gaps to form a restrained profile that better represents sequence [a]
• Align [c] to the profile of [a] and [b]
• Iterate, adding each sequence from the PSI-BLAST search until all are aligned
• The alignment profile is based on the query sequence to be predicted

PSI-BLAST PROFILE
• Iterative searching: low-complexity sequences polluted the search profile
• Filtered the database to "mask" out:
  o Low-complexity sequences (SEG)
  o Coiled-coil regions (HELIXFILT)
  o Transmembrane helices (HELIXFILT)

NEURAL NETWORK
• Two neural networks used
  1st
  o Sliding window of 17 residues
  o 9 hidden nodes
  o 3 outputs
  2nd
  o Sliding window of 19 residues
  o 9 hidden nodes
  o 3 outputs

CONSENSUS COMBINATION OF PREDICTION METHODS
• "Jury Agreement": identical predictions by all methods (Q3 = 82%)
• "No Jury" (Q3 = 76.4%): trained another neural network

ASSESSMENT OF ACCURACY
$$Q_3 = \frac{\sum_{i \in \{H, E, C\}} \mathrm{predicted}_i}{\mathrm{observed}} \times 100$$

Segment overlap:
$$Sov = 100 \times \frac{1}{N} \sum_{s} \frac{\mathrm{minov}(s_{obs}; s_{pred}) + \delta}{\mathrm{maxov}(s_{obs}; s_{pred})} \times \mathrm{len}(s_1)$$

$$\mathrm{Confidence} = 10 \times (out_{max} - out_{next})$$

RIBONUCLEASE A
KEY: "H" = helix; "E" = strand; "B" = buried residue; "-" = exposed residue; "*" = no jury

JNET OUTPUT
Aligned sequences, in order: YourSeq, YA60_PYRHO, TF19_HUMAN, Q9VUZ8, YRGK_CAEEL, Y691_METJA, YK68_ARCFU, YF69_SCHPO, YMW4_YEAST
YourSeq    : MRQQLEMQKKQIMMQILTPEARSRLANLRLTRPDFVEQIELQLIQLAQMGRVRSKITDEQLKELLKRVAGKKREIKISRK
YA60_PYRHO : ERALIEAQIQAILRKILTPEARERLARVKLVRPELARQVELILVQLYQAGQITERIDDAKLKRILAQIEAKRREFRIKW.
TF19_HUMAN : ..KHREAEMRSILAQVLDQSARARLSNLALVKPEKTKAVENYLIQMARYGQLSEKVSEQGLIEILKKVSQQEKTTTVKFN
Q9VUZ8     : ..MRAQEEMKSILSQVLDQQARARLNTLKVSKPEKAQMFENMVIRMAQMGQVRGKLDDAQFVSILESVNAQQSKSSVKYD
YRGK_CAEEL : ARAENQETAKGMISQILDQAAMQRLSNLAVAKPEKAQMVEAALINMARRGQLSGKMTDDGLKALMERVSAQQKATSVKFD
Y691_METJA : ..ALLEAEMQALLRKILTPEARERLERIRLARPEFAEAVEVQLIQLAQLGRLPIPLSDEDFKALLERISALKRKREIKIV
YK68_ARCFU : MRRQVEAQKKAILRAILEPEAKERLSRLKLAHPEIAEAVENQLIYLAQAGRIQSKITDKMLVEILKRVQPKKRETRIIRK
YF69_SCHPO : ..QEVQDEMRNLLSQILEHPARDRLRRIALVRKDRAEAVEELLLRMAKTGQISHKISEPELIELLEKISGEKRNETKIVI
YMW4_YEAST : .AGGGENSAPAAIANFLEPQALERLSRVALVRRDRAQAVETYLKKLIATNNVTHKITEAEIVSILNGIAKQQNNSKIIFE
           : 1---------11--------21--------31--------41--------51--------61--------71--------
OrigSeq    : MRQQLEMQKKQIMMQILTPEARSRLANLRLTRPDFVEQIELQLIQLAQMGRVRSKITDEQLKELLKRVAGKKREIKISRK
Jpred      : -HHHHHHHHHHHHHHHHHHHHHHHHHHHHH---HHHHHHHHHHHHHHHH--------HHHHHHHHHHHH-----EEEE--
MCoil, MCoilDI, MCoilTRI, Lupas 21, Lupas 14, Lupas 28 : "-" at every position
Jnet Rel   : 79889998888998643697888849188454657899999999988626987657778999999986007883747728
[Rows also shown in the output: Jnet_25, Jnet_5 and Jnet_0 (B = buried, - = exposed), and the per-method predictions jalign, jfreq, jhmm, jnet and jpssm]

JPRED SERVER
Consensus web server
• JNET – default method
• PREDATOR
  o Neural network focused on predicting hydrogen bonds
• PHD - PredictProtein
  o Neural network focused on predicting hydrogen bonds

JPRED SERVER cont.
• NNSSP – Nearest-neighbor SS prediction
• DSC – Discrimination of protein Secondary structure Class
  o Divides secondary structure prediction into its basic concepts, then uses simple linear statistical methods to combine the concepts for prediction
• ZPRED
  o Physicochemical information
• MULPRED
  o Single-sequence method combination

[JPRED output for the same query; the alignment rows YourSeq to YMW4_YEAST are identical to those in the JNET output]
consv        : --3-273433568336-522-43--25838573836556-2384484316682-37581274298238323542-3422-
             : 1---------11--------21--------31--------41--------51--------61--------71--------
OrigSeq      : MRQQLEMQKKQIMMQILTPEARSRLANLRLTRPDFVEQIELQLIQLAQMGRVRSKITDEQLKELLKRVAGKKREIKISRK
Jpred        : -HHHHHHHHHHHHHHHHHHHHHHHHHHHHH---HHHHHHHHHHHHHHHH--------HHHHHHHHHHHHH----EEEE--
PHDHtm, MCoil, MCoilDI, MCoilTRI, Lupas 21, Lupas 14, Lupas 28 : "-" at every position
PHD Rel      : 97527999999999999899999999986315269999999999999964332235649999999999962356225319
Predator Rel : 00777700999990990609990999886606668099999999009677787757768989909999957077777000
Jnet Rel     : 79889998888998643697888849188454657899999999988626987657778999999986007883747728
[Rows also shown in the output: PHDacc, Jnet_25, Jnet_5 and Jnet_0, and the per-method predictions jalign, jfreq, jhmm, jnet, jpssm, mul, nnssp, phd, pred and zpred]

Accuracy Evaluation
By Liang-Yu Shih

Methods
• Per-residue accuracy
  o Q3 measurement: traditional way
  o Matthews correlation coefficient
• Per-segment accuracy
  o SOV measurement: CASP2
• Subcategorizing the incorrect predictions
  o Over: predict alpha/beta when it is coil
  o Under: predict coil when it is alpha/beta
  o Wrong: predict alpha when it is beta, or vice versa

How to measure Q3
Q index (Qhelix, Qstrand and Qcoil), for a single conformational state:
Qi = [(number of residues correctly predicted in state i)/(number of residues observed in state i)] x 100
Q3, for all three states:
Q3 = [(number of residues correctly predicted)/(number of all residues)] x 100

How to measure Matthews coefficients
[Equation: Matthews correlation coefficient]

Problems in per-residue accuracy
1. It does not reflect 3D structure. Example: assigning the entire myoglobin chain as a single helix gives a Q3 score of 80.
2. Conformational variation is observed at secondary structure segment ends. Example: a prediction with a low Q3 value can still predict the fold well.

Q: What is a good measure?
A: A structurally oriented measure.

A structurally oriented measure considers the following:
1. Type and position of secondary structure segments rather than a per-residue assignment of conformational state.
2. Natural variation of segment boundaries among families of homologous proteins.

How to measure SOV

SOV Example
Observed (S1):  CCEEECCCCCCEEEEEECCC
Predicted (S2): CCCCCCCEEEEECCCEECCC
[Figure: minov and maxov marked on the overlapping E segments]

SOV Example Cont.
$$Sov(E) = 100 \times \frac{1}{6 + 6 + 3} \times \left( \frac{1 + 1}{10} + \frac{2 + 2}{6} \right) \times 6 \approx 34.7$$
Each term in parentheses is [minov(s1, s2) + delta(s1, s2)] / maxov(s1, s2), with:
Delta(s1,s2) = min[(10-1); (1); (15/2); (10/2)] = 1
Delta(s1,s2) = min[(6-2); (2); (15/2); (10/2)] = 2
[Figure: observed segments S(E) and predicted segments S(E') aligned]

Evaluation-Step 1 (query sequence)
Hypothetical protein:
MRQQLEMQKKQIMMQILTPEARSRLANLRLTRPDFVEQIELQLIQLAQMGRVRSKITDEQLKELLKRVAGKKREIKISRK
• 80 residues
• Methanothermobacter thermautotrophicus
• Structure solved by NMR
• Christendat, D., et al. Nat. Struct. Biol.
7 (10), 903-909 (2000)

Evaluation-Step 2 (programs)
Generations:
• First Generation: information is from a single residue, of a single sequence
• Second Generation: local interactions
• Third Generation: information is from homologous sequences
Approaches: explicit rules; nearest neighbors; neural-network-based prediction; PSI profile / HMM
Methods shown: Lim 1974; Levin et al. 1986; Nishikawa and Ooi 1986; Holley and Karplus 1989; Qian and Sejnowski 1988; PREDATOR 1996; APSSP 1995; PHD 1993; SAM-T99sec; Jpred 1999; PROFsec 2000; SSPRO2

Servers
1. APSSP: http://imtech.ernet.in/raghava/apssp/
2. JPred: http://jura.ebi.ac.uk:8888/
3. PHD: http://cubic.bioc.columbia.edu/predictprotein
4. PROFsec: http://cubic.bioc.columbia.edu/predictprotein
5. PSIpred: http://insulin.brunel.ac.uk/psiform.html
6. SAM-T99sec: http://www.cse.ucsc.edu/research/compbio/HMM-apps/T99-query.html

Evaluation-Step 3
Conversion of DSSP secondary structure from 8 states to 3 states:

DSSP   USED
H      H
G      H
I      H
E      E
B      E
T      L
S      L
' '    L

H: alpha helix; E: beta strand; L: coil (others)

Evaluation-Step 4
• First column: protein sequence (AA) in one-letter code
• Second column: observed (OSEC) secondary structure
• Third column: predicted (PSEC) secondary structure
http://predictioncenter.llnl.gov/local/sov/sov.html

Evaluation-Result

Method     Measurement   ALL    HELIX   STRAND   COIL
Jpred      Q3            73.8   100.0   100.0    47.5
           SOV           62.2    80.5   100.0    48.1
Apssp      Q3            72.5    97.5   100.0    47.5
           SOV           67.3    93.8   100.0    46.9
Sam-T99    Q3            72.5   100.0   100.0    45.0
           SOV           65.8    93.8   100.0    44.2
PHD        Q3            67.5    97.5   100.0    37.5
           SOV           56.5    80.0   100.0    38.5
Predator   Q3            70.0    95.5   100.0    45.0
           SOV           66.4    89.4   100.0    48.0
SSPRO      Q3            77.5   100.0   100.0    55.0
           SOV           69.1    94.0   100.0    50.0

EVA: Evaluation of Automatic protein structure prediction
http://cubic.bioc.columbia.edu/eva/sec/graph/common3.jpg

Conclusion
• Jpred is the pioneer of methods that give high Q3 and SOV scores.
• Secondary structure prediction using a jury of neural networks is one of the best methods.

REFERENCES
1. Cuff JA, Clamp ME, Siddiqui AS, Finlay M, Barton GJ.
Jpred: A consensus secondary structure prediction server. Bioinformatics 1998; 14:892-893.
2. Cuff JA, Barton GJ. Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. Proteins: Structure, Function, and Genetics 1999; 34:508-519.
3. Cuff JA, Barton GJ. Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins: Structure, Function, and Genetics 2000; 40:502-511.
4. Zemla et al. A modified definition of Sov, a segment-based measure for protein secondary structure prediction assessment. Proteins 1999; 34:220-223.
5. Defay T, Cohen F. Evaluation of current techniques for ab initio protein structure prediction. Proteins 1995; 23:431-445.
6. Barton GJ. Protein secondary structure prediction. Curr Opin Struct Biol 1995; 5:372-376.
7. Schulz GE. A critical evaluation of methods for prediction of secondary structures. Ann Rev Biophys Chem 1988; 17:1-21.
8. Zhu Z-Y. A new approach to the evaluation of protein secondary structure predictions at the level of the elements of secondary structure. Protein Eng 1995; 8:103-108.
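
Supplementary sketch: per-residue accuracy
The DSSP 8-to-3 state conversion and the Q3 / Qi measures from the evaluation slides can be illustrated with a few lines of Python. This is only a minimal sketch under stated assumptions, not the code used for the evaluation: the names `DSSP_TO_3`, `reduce_dssp`, `q3` and `q_state` are hypothetical, and mapping unknown DSSP symbols to coil is an assumption made here for convenience.

```python
from typing import Optional

# 8-state DSSP -> 3-state reduction as tabulated on the "Evaluation-Step 3" slide.
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H",
             "E": "E", "B": "E",
             "T": "L", "S": "L", " ": "L"}


def reduce_dssp(dssp: str) -> str:
    """Map an 8-state DSSP string to 3 states (H, E, L).

    Assumption: any symbol not in the table is treated as coil (L).
    """
    return "".join(DSSP_TO_3.get(ch, "L") for ch in dssp)


def q3(observed: str, predicted: str) -> float:
    """Q3 = residues correctly predicted / all residues x 100."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


def q_state(observed: str, predicted: str, state: str) -> Optional[float]:
    """Qi = residues correctly predicted in state i / residues observed in state i x 100.

    Returns None when the state never occurs in the observed string.
    """
    if len(observed) != len(predicted):
        raise ValueError("strings must be of equal length")
    n_observed = observed.count(state)
    if n_observed == 0:
        return None
    correct = sum(o == p == state for o, p in zip(observed, predicted))
    return 100.0 * correct / n_observed


if __name__ == "__main__":
    # Observed and predicted strings from the "SOV Example" slide.
    obs = "CCEEECCCCCCEEEEEECCC"
    pred = "CCCCCCCEEEEECCCEECCC"
    print(q3(obs, pred))            # 10 of 20 residues agree
    print(q_state(obs, pred, "E"))  # 3 of 9 observed strand residues agree
```

Applied to the two strings from the SOV example, `q3` gives 50.0, since 10 of the 20 positions agree. The functions compare symbols directly, so they work with either the "C" or the "L" coil label, provided the observed and predicted strings use the same one.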