RAPTOR-CASP8

RAPTOR: protein structure prediction by multiple techniques
Jinbo Xu, Jian Peng and Feng Zhao
Toyota Technological Institute at Chicago
Corresponding author: Dr. Jinbo Xu (j3xu@tti-c.org)
Evolved from a pure threading software package, RAPTOR in CASP8 contains
the following modules: a threading module (i.e., the original RAPTOR), an
alignment-based model quality assessment program, a multiple sequence
alignment module and an ab initio folding program. The strategy used by
RAPTOR in CASP8 is as follows. First, a target is threaded to all the templates
and the quality of each sequence-template alignment is predicted by a model
quality prediction program. According to the predicted GDT score, different
methods are employed. If there are multiple templates with very good predicted
GDT scores, then we build a multiple sequence alignment between the target
and some selected templates, based on which a 3D model is built by
MODELLER. If all the templates have very low predicted GDT scores, then ab
initio folding is employed to build a 3D model. Otherwise, a threading-generated
3D model is submitted directly. The new RAPTOR was not ready for the first third
of the CASP8 targets and remained under development throughout the CASP8
season, so it has not yet been fully benchmarked.
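The selection strategy above can be sketched as a small decision function. This is only an illustration: the cutoffs HIGH_GDT and LOW_GDT, the function name choose_method, and the rule that "multiple" means at least two templates are assumptions, since the abstract does not state the actual values. GDT scores are taken to lie on a 0 to 1 scale.

```python
from typing import List

# Hypothetical cutoffs: the abstract does not give the values RAPTOR used.
HIGH_GDT = 0.70
LOW_GDT = 0.30


def choose_method(predicted_gdts: List[float],
                  high: float = HIGH_GDT,
                  low: float = LOW_GDT) -> str:
    """Pick a modeling route from the predicted GDT scores of all templates."""
    if not predicted_gdts:
        # Assumption: with no template at all, fall back to ab initio folding.
        return "ab_initio"
    n_good = sum(1 for g in predicted_gdts if g >= high)
    if n_good >= 2:
        # Several very good templates: multiple-template modeling.
        return "multi_template"
    if all(g < low for g in predicted_gdts):
        # Every template looks poor: ab initio folding.
        return "ab_initio"
    # Otherwise submit the model built from the best threading alignment.
    return "single_template_threading"
```

For example, choose_method([0.82, 0.75, 0.40]) selects the multiple-template route, while choose_method([0.10, 0.22]) selects ab initio folding.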
Quality assessment. Unlike many model quality assessment methods
that work directly on a 3D model, our quality prediction program takes as input
an alignment and generates a predicted GDT score. Tested on the alignments
generated by RAPTOR for the CASP6 and CASP7 targets, the average
prediction error of the GDT score is approximately 0.04, and the correlation
coefficient between the predicted and actual GDT scores is more than 0.9 over all
alignments and approximately 0.8 for low-quality ones. This quality prediction
method builds on an idea described in [1], which uses an SVM to predict the
number of correctly aligned positions in an alignment. The current implementation
uses better features to describe an alignment and employs an SVM variant to
predict the GDT score.
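The two figures quoted above (average prediction error and correlation coefficient) could be computed as in the following sketch. It assumes that "average prediction error" means mean absolute error and that the correlation is Pearson's r; the abstract does not specify either, and the function name evaluate is hypothetical.

```python
import math
from typing import Sequence, Tuple


def evaluate(predicted: Sequence[float],
             actual: Sequence[float]) -> Tuple[float, float]:
    """Return (mean absolute error, Pearson correlation) for GDT predictions."""
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    # Assumption: "average prediction error" is the mean absolute error.
    mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / n
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        raise ValueError("correlation is undefined for constant input")
    return mae, cov / math.sqrt(var_p * var_a)
```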
Multiple template method. The top two templates are always used and then
we enumerate all the possible combinations of the remaining good templates (at
most 5 templates are used in total). For a given set of multiple templates,
T-Coffee and TM-align are used to generate a multiple sequence alignment from
the pairwise alignments produced by threading. Then ProQ and DFIRE are
used to rank all the generated models and select the best combination of
multiple templates. MetaMQAP was also examined and appears to be highly
consistent with ProQ in model ranking.
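The enumeration of template sets described above can be sketched as follows. The function name template_sets is hypothetical, the input is assumed to be the list of good templates already sorted best first, and the sketch assumes that the top-two pair on its own also counts as a candidate set.

```python
from itertools import combinations
from typing import Iterator, Sequence, Tuple


def template_sets(ranked: Sequence[str],
                  max_total: int = 5) -> Iterator[Tuple[str, ...]]:
    """Yield candidate template sets: the top two plus any subset of the rest.

    `ranked` holds the good templates sorted by predicted GDT, best first.
    """
    if len(ranked) < 2:
        raise ValueError("the multiple-template route needs at least 2 templates")
    core = tuple(ranked[:2])   # the top two templates are always used
    rest = ranked[2:]
    max_extra = min(len(rest), max_total - 2)
    for k in range(max_extra + 1):
        for extra in combinations(rest, k):
            yield core + extra
```

With six good templates this yields 15 candidate sets (1 + 4 + 6 + 4), each of which would then be aligned, modeled and ranked.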
Ab initio folding method. We have developed a probabilistic graphical model
for ab initio folding, which employs Conditional Random Fields (CRFs) and
directional statistics to model the protein sequence-structure relationship.
Unlike the widely used fragment assembly and lattice model methods, our
graphical model can explore protein conformations in a continuous space
according to their probability. The probability of a protein
conformation reflects its stability and is estimated from the PSI-BLAST sequence
profile and PSIPRED-predicted secondary structure. Experimental results
indicate that this new method compares favorably with the fragment assembly
method (e.g., Rosetta) and the lattice model method (i.e., TOUCHSTONE II). Our
method performs well on some hard targets such as T0480, T0495_2 (by
Rosetta’s domain definition), T0496_1, T0496_2, and T0510_3. A preliminary
version of this method is described in [2], which covers only a first-order CRF
model for conformation sampling. The current implementation [3] employs a
second-order CRF model for sampling and drives conformation optimization with
a simple energy function consisting of Sali’s DOPE, Baker’s KMBhbond and a
simplified solvent accessibility potential, ESP.
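As a rough illustration of how an energy function made of several terms might be used to pick among sampled conformations, the sketch below combines named terms as a weighted sum. The weighted-sum form, the default weight of 1.0, and the names total_energy and best_conformation are assumptions; the abstract does not say how DOPE, KMBhbond and ESP are combined, and the term functions here are placeholders rather than real implementations of those potentials.

```python
from typing import Any, Callable, Mapping, Optional, Sequence

# Each term maps a conformation (any representation) to an energy value.
EnergyTerm = Callable[[Any], float]


def total_energy(conf: Any,
                 terms: Mapping[str, EnergyTerm],
                 weights: Optional[Mapping[str, float]] = None) -> float:
    """Weighted sum of energy terms; missing weights default to 1.0 (assumed)."""
    weights = weights or {}
    return sum(weights.get(name, 1.0) * term(conf)
               for name, term in terms.items())


def best_conformation(samples: Sequence[Any],
                      terms: Mapping[str, EnergyTerm],
                      weights: Optional[Mapping[str, float]] = None) -> Any:
    """Return the sampled conformation with the lowest total energy."""
    if not samples:
        raise ValueError("no conformations were sampled")
    return min(samples, key=lambda c: total_energy(c, terms, weights))
```

In use, terms would map names such as "dope", "hbond" and "esp" to functions that score a sampled conformation, and samples would come from the CRF-based sampler.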
Finally, a new threading program based on a probabilistic graphical model and
a boosting method was also developed late in the CASP8 season and tested on
only a few targets [4].
Acknowledgement. The authors are grateful to Xin Gao for setting up the
RAPTOR web server and running RAPTOR in the very early stage of
CASP8. The authors also thank the developers of ProQ, DFIRE,
MetaMQAP, T-Coffee, TM-align and the other tools used in this work.
1. Jinbo Xu. (2005) Fold recognition by predicted alignment accuracy.
IEEE/ACM Trans. on Comput Biol and Bioinfo. 2005 Apr-Jun;2(2):157-65.
2. Feng Zhao, Shuaicheng Li, Beckett Sterner and Jinbo Xu. (2008)
Discriminative learning for protein conformation sampling. Proteins. 2008
Oct;73(1):228-40.
3. Feng Zhao, Jian Peng, Joe DeBartolo, Karl F. Freed, Tobin R. Sosnick and
Jinbo Xu. (2008) A probabilistic graphical model for ab initio folding.
RECOMB 2009. In Press.
4. Jian Peng, Jinbo Xu. (2008) Boosting protein threading accuracy. RECOMB
2009. In Press.