GenThreader is a neural-network-based threading method

Amit Rosner
Barak Raveh
12/05/03
Protein Tertiary Structure Prediction – Abstract
The 'Protein Folding Problem' is one of the greatest challenges in structural molecular biology today.
Three approaches address the challenge of predicting a protein's fold from its primary sequence [1]:
Comparative Modeling - predicts the structure of a new protein by comparing its sequence with the
sequences of homologous proteins of known structure.
Threading - aligns the target's sequence with the structures of known templates using more sophisticated
scoring functions. When a similar fold is found, the target's sequence is “threaded” onto that fold.
Ab initio - predicts structure using only the information in the target sequence itself, without templates.
The development of rapid, simple, automatic methods for protein fold recognition raises interesting
questions [2], which our presentation addresses: What constitutes a baseline level of success for protein fold
recognition methods? Can simple methods that make use of secondary structure information assign folds
more accurately? Could these methods be used to construct viable hierarchical classifications?
GenThreader [3] is an example of an automated threading method that uses neural networks
to assign folds to given primary sequences. GenThreader combines sequence comparison
information with physical properties, such as pair interaction and solvation energies, to evaluate folds
quickly and at high confidence levels. This speed makes the analysis of whole genomes practical. For
instance, the whole genome of Mycoplasma genitalium was analyzed with this method in less than a day,
assigning folds to 46% of the genome's ORFs, including matches that could not be recognized by sequence
comparison alone.
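To make the idea of combining these inputs concrete, the following is a minimal sketch of a small feed-forward network that maps a sequence alignment score, a pair interaction energy and a solvation energy to a single confidence value. It is an illustration only, not the published GenThreader network: the choice of three inputs, the layer sizes, the names `fold_confidence` and `rank_templates`, and all weights are hypothetical placeholders rather than trained values.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical placeholder weights (NOT trained values, NOT from the paper).
# Each hidden unit sees [alignment_score, pair_energy, solvation_energy].
# Alignment score gets positive weights; energies get negative weights,
# so lower (more favourable) energies raise the confidence.
HIDDEN_WEIGHTS = [
    [0.8, -0.5, -0.3],
    [0.4, -0.9, -0.6],
]
HIDDEN_BIASES = [-0.2, 0.1]
OUTPUT_WEIGHTS = [1.5, 1.2]
OUTPUT_BIAS = -1.0


def fold_confidence(alignment_score: float,
                    pair_energy: float,
                    solvation_energy: float) -> float:
    """Combine the three features into one confidence value in (0, 1)."""
    features = [alignment_score, pair_energy, solvation_energy]
    hidden = [
        sigmoid(sum(w * f for w, f in zip(row, features)) + bias)
        for row, bias in zip(HIDDEN_WEIGHTS, HIDDEN_BIASES)
    ]
    return sigmoid(sum(w * h for w, h in zip(OUTPUT_WEIGHTS, hidden)) + OUTPUT_BIAS)


def rank_templates(candidates: dict) -> list:
    """Sort candidate templates by confidence, best match first.

    `candidates` maps a template name to a tuple of
    (alignment_score, pair_energy, solvation_energy).
    """
    scored = [(name, fold_confidence(*feats)) for name, feats in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

In this sketch, a target sequence would be aligned against each template in a fold library, the three features computed per alignment, and `rank_templates` used to pick the most confident match. A single forward pass per template is cheap, which is consistent with the speed that makes genome-scale analysis feasible.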
Rosetta [4, 5, 6], a successful ab initio method for predicting folds, takes a different approach. The method
narrows the number of possible structural conformations from an effectively infinite number to a set of 200,000
decoy structures, thus reducing ab initio fold prediction to a manageable fold recognition
problem. A scoring function based on Bayes' theorem separates sequence-dependent from sequence-independent
factors. I-Sites [7], a method for predicting local structure elements, serves both as an example
of a sequence-dependent factor and as an illustration of how the structural search space can be reduced.
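The separation performed by the Bayesian scoring function can be written out as follows. This is the generic form of Bayes' theorem applied to a candidate structure and the target sequence; the labels under the braces are our annotation, and the specific terms Rosetta uses for each factor are not reproduced here.

```latex
\[
P(\text{structure} \mid \text{sequence})
  = \frac{P(\text{sequence} \mid \text{structure})\, P(\text{structure})}{P(\text{sequence})}
  \;\propto\;
  \underbrace{P(\text{sequence} \mid \text{structure})}_{\text{sequence-dependent}}
  \;\times\;
  \underbrace{P(\text{structure})}_{\text{sequence-independent}}
\]
```

Because P(sequence) is the same for every decoy generated for a given target, it can be dropped when decoys are compared with one another, leaving one factor that depends on the sequence and one that depends only on the structure.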
[1] Orengo C.A., Jones D.T. and Thornton J.M. (2003) Bioinformatics: Genes, Proteins & Computers, BIOS Scientific Publishers Limited.
[2] McGuffin L.J., Bryson K. and Jones D.T. (2001) What are the baselines for protein fold recognition? Bioinformatics, 17, 63-72.
[3] Jones D.T. (1999) GenTHREADER: An Efficient and Reliable Protein Fold Recognition Method for Genomic Sequences. J. Mol. Biol., 287, 797-815.
[4] Bonneau R., Tsai J., Ruczinski I., Chivian D., Rohl C., Strauss C.E.M. and Baker D. (2001) Rosetta in CASP4: Progress in Ab Initio Protein Structure Prediction. Proteins, 5, 119-126.
[5] Simons K.T., Kooperberg C., Huang E. and Baker D. (1997) Assembly of Protein Tertiary Structures from Fragments with Similar Local Sequences using Simulated Annealing and Bayesian Scoring Functions. J. Mol. Biol., 268, 209-225.
[6] Simons K.T., Ruczinski I., Kooperberg C., Fox B.A., Bystroff C. and Baker D. (1999) Improved recognition of native-like protein structures using a combination of sequence-dependent and sequence-independent features of proteins. Proteins, 34, 82-95.
[7] Bystroff C. and Baker D. (1998) Prediction of local structure in proteins using a library of sequence-structure motifs. J. Mol. Biol., 281, 565-577.