Amit Rosner, Barak Raveh
12/05/03

Protein Tertiary Structure Prediction – Abstract

The 'Protein Folding Problem' is one of the greatest challenges in structural molecular biology today. Three approaches address the challenge of predicting a protein's fold from its primary sequence [1]:

Comparative Modeling - predicts the structure of a new protein by comparing its sequence with the sequences of homologous proteins of known structure.
Threading - uses more sophisticated scoring functions to align the target's sequence with the structures of known templates. When a similar fold is found, the target's sequence is "threaded" onto that fold.
Ab initio - predicts structure using only the information in the target sequence itself, without templates.

The development of rapid, simple, automatic methods for protein fold recognition raises interesting questions [2], which we address in our presentation:

What constitutes a baseline level of success for protein fold recognition methods?
Can simple methods that make use of secondary structure information assign folds more accurately?
Could these methods be used to construct viable hierarchical classifications?

GenTHREADER [3] is an example of an automated threading method that uses neural networks to assign folds to given primary sequences. GenTHREADER combines sequence comparison information with physical properties, such as pair interaction and solvation energies, to evaluate folds quickly and at high confidence levels. This makes rapid analysis of whole genomes possible. For instance, the whole genome of Mycoplasma genitalium was analyzed by this method in less than a day, assigning folds to 46% of the genome's ORFs, including matches that could not be recognized by sequence comparison alone.

Rosetta [4, 5, 6], a successful ab initio method for predicting folds, uses a different approach.
The method narrows the number of possible structure conformations from a practically infinite number down to a set of 200,000 decoy structures, thus reducing the problem of ab initio fold prediction to a manageable fold recognition problem. A scoring function based on Bayes' theorem separates sequence-dependent from sequence-independent factors. I-Sites [7], a method for predicting local structure elements, serves both as an example of a sequence-dependent factor and as an illustration of how the structural search space can be reduced.

[1] Orengo C.A., Jones D.T. and Thornton J.M. (2003) Bioinformatics: Genes, Proteins & Computers. BIOS Scientific Publishers Limited.
[2] McGuffin L.J., Bryson K. and Jones D.T. (2001) What are the baselines for protein fold recognition? BIOINFORMATICS, 17, 63-72
[3] Jones D.T. (1999) GenTHREADER: An Efficient and Reliable Protein Fold Recognition Method for Genomic Sequences. J. Mol. Biol., 287, 797-815
[4] Bonneau R., Tsai J., Ruczinski I., Chivian D., Rohl C., Strauss C.E.M. and Baker D. (2001) Rosetta in CASP4: Progress in Ab Initio Protein Structure Prediction. PROTEINS, 5, 119-126
[5] Simons K.T., Kooperberg C., Huang E. and Baker D. (1997) Assembly of Protein Tertiary Structures from Fragments with Similar Local Sequences using Simulated Annealing and Bayesian Scoring Functions. J. Mol. Biol., 268, 209-225
[6] Simons K.T., Ruczinski I., Kooperberg C., Fox B.A., Bystroff C. and Baker D. (1999) Improved recognition of native-like protein structures using a combination of sequence-dependent and sequence-independent features of proteins. PROTEINS, 34, 82-95
[7] Bystroff C. and Baker D. (1998) Prediction of local structure in proteins using a library of sequence-structure motifs. J. Mol. Biol., 281, 565-577