3DSIG: STRUCTURAL BIOINFORMATICS AND COMPUTATIONAL BIOPHYSICS
PENG & XU, ID 046
VIENNA, JULY 15-16 2011

A MULTIPLE-TEMPLATE APPROACH TO PROTEIN THREADING

Jian Peng and Jinbo Xu,* Toyota Technological Institute at Chicago, jinboxu@gmail.com

We present a novel multiple-template protein threading method that corrects errors in pairwise sequence-template alignments by exploiting template structure similarity and enforcing consistency among all pairwise alignments. We achieve this by realigning a sequence simultaneously to multiple templates using a novel probabilistic-consistency algorithm. By contrast, other methods simply assemble pairwise sequence-template alignments into a sequence-to-multiple-template alignment and thus cannot correct alignment errors. The CASP9 evaluation indicates that our method generated the best alignments for the 50 hardest template-based modeling targets and that our multi-template models significantly outperform the models built from the best single templates. Our method was also voted one of the most interesting and innovative methods by the CASP9 community.

INTRODUCTION

As the PDB has grown, multi-template homology modeling has been proposed for protein 3D structure prediction. Two key challenges remain open for multi-template modeling. First, can we consistently improve alignment accuracy by utilizing multiple templates? Second, can we build a multi-template model that is better than the model built from the best single template? Previous multi-template methods simply assemble pairwise sequence-template alignments into a sequence-to-multiple-template alignment, using the sequence as an anchor, and then generate a 3D model from that alignment. These methods cannot correct pairwise alignment errors and very often produce a multi-template model no better than the best-single-template model, because of inconsistencies and errors in the final alignment.

METHODS

Overview.
Given a target sequence, we first run our single-template threading method to determine the top templates. We then realign the sequence simultaneously to the top templates (see below). The resulting sequence-to-multiple-template alignment is fed into MODELLER to produce a 3D model.

Realignment algorithm. Given two pairwise alignments S-T1 and S-T2, where S is the sequence and T1 and T2 are two templates, an alignment between T1 and T2 can be derived from S-T1 and S-T2 (using S as an anchor). Such a T1-T2 alignment should be consistent with the T1-T2 alignment generated by a structure alignment program. Otherwise, there may be errors in the S-T1 and S-T2 alignments, because template-template structure alignment is usually more accurate than sequence-template alignment. Therefore, we can use template-template structure alignments to improve sequence-template alignments by enforcing consistency among all pairwise alignments.

We achieve this by realigning a sequence simultaneously to multiple templates. To do so, we use a probabilistic alignment matrix (PAM) to represent all the possible alignments between a sequence and a template, so that the pairwise alignment is not fixed in advance and can be improved later. We adjust all the PAMs using a novel probabilistic-consistency algorithm to make them as consistent as possible with all the template-template structure alignments. After the PAM adjustment, we obtain better pairwise sequence-template alignments and the final sequence-to-multiple-template alignment, from which we build a 3D model using MODELLER.

Our probabilistic-consistency algorithm is significantly better than the one used by ProbCons for multiple sequence alignment, because ProbCons does not deal with gaps while ours does. Because of this, our algorithm can generate better alignments for distantly related proteins than ProbCons can.
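
The anchor-based consistency check described above can be illustrated with a small sketch. It is not our realignment algorithm: it works on fixed alignments rather than PAMs and does not adjust anything. It only shows how a T1-T2 alignment is derived from S-T1 and S-T2 through S, and how disagreement with a structure alignment points to sequence positions that may be misaligned. The function names, the dictionary representation of an alignment, and the toy residue indices are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Assumed representation: an alignment maps a residue index in the first
# protein to its aligned residue index in the second; gapped residues are absent.
Alignment = Dict[int, int]
PairSet = Set[Tuple[int, int]]


def derive_template_alignment(s_t1: Alignment, s_t2: Alignment) -> PairSet:
    """Derive T1-T2 residue pairs from S-T1 and S-T2, using S as the anchor."""
    pairs: PairSet = set()
    for s_pos, t1_pos in s_t1.items():
        t2_pos = s_t2.get(s_pos)
        if t2_pos is not None:  # S position is aligned to both templates
            pairs.add((t1_pos, t2_pos))
    return pairs


def consistency(derived: PairSet, structural: PairSet) -> float:
    """Fraction of derived T1-T2 pairs also present in the structure alignment."""
    if not derived:
        return 0.0
    return len(derived & structural) / len(derived)


def suspicious_positions(s_t1: Alignment, s_t2: Alignment,
                         structural: PairSet) -> List[int]:
    """Sequence positions whose induced T1-T2 pair contradicts the structure alignment."""
    flagged = []
    for s_pos in sorted(s_t1):
        if s_pos in s_t2 and (s_t1[s_pos], s_t2[s_pos]) not in structural:
            flagged.append(s_pos)
    return flagged


# Toy data (hypothetical): sequence position 3 is aligned inconsistently.
s_t1 = {0: 0, 1: 1, 2: 2, 3: 4}
s_t2 = {0: 1, 1: 2, 2: 3, 3: 4}
structural = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)}

derived = derive_template_alignment(s_t1, s_t2)
print(sorted(derived))
print(consistency(derived, structural))
print(suspicious_positions(s_t1, s_t2, structural))
```

In this toy case three of the four derived T1-T2 pairs agree with the structure alignment, and the remaining pair traces back to sequence position 3, which is the kind of position a realignment step would try to correct in S-T1 or S-T2.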
RESULTS & CONCLUSIONS

For evaluation, we use 51 CASP8 and 48 CASP9 targets, each with ≥2 reasonable templates, for a total of 327 templates. TM and GDT are two widely used metrics for model quality evaluation. The higher the TM and GDT of a model, the better the model.

Our method improves pairwise sequence-template alignment. The 327 pairwise alignments generated by our realignment method are better than those generated by our single-template threading method, with P-value 7.18E-05.

Our multi-template models outperform the best-single-template models. As shown in Table 1, the cumulative TM and GDT of our multiple-template models are 75.686 and 6585.7, respectively, exceeding those of the best-single-template models with P-values 3.32E-06 and 1.10E-08. In practice it is very challenging to identify the best template for a target, so the advantage of our multi-template method is larger in real use.

Our method outperforms many others. In Table 1, MAFFT, T-coffee, MUSCLE and ProbCons are multiple sequence alignment methods. PROMALS3D and M-coffee are multiple sequence/structure alignment methods. All these methods can be used to align a sequence to its templates. The baseline method assembles the pairwise alignments generated by our single-template threading into a sequence-to-multiple-template alignment, using the target as an anchor. The models derived from all these methods are even worse than our best-single-template models. This indicates the importance of a good realignment algorithm.

TABLE 1. Performance of various alignment methods on 99 CASP targets. The smaller the P-value, the more significant the difference between our method and others.
Methods                 Model Quality         P-value
                        TM        GDT         TM          GDT
This work               75.876    6598.2      -           -
Best-single-template    74.066    6381.5      3.32E-06    1.10E-08
MAFFT                   66.368    5715.9      3.69E-10    4.40E-10
T-coffee                67.697    5852.1      1.07E-07    9.19E-08
MUSCLE                  66.556    5715.3      1.91E-09    1.54E-09
ProbCons                67.193    5804.9      3.93E-08    3.89E-08
PROMALS3D               72.636    6309.2      1.47E-04    3.94E-04
M-coffee                73.721    6414.9      2.91E-04    1.69E-03
Baseline                73.386    6353.4      2.49E-08    2.38E-07

Our method also performs well in CASP9. Among ~80 servers, our method ranks only slightly behind Zhang-Server and QUARK of the Zhang group (P > 0.1), both of which are based upon a consensus method and also extensively refine threading models.

Conclusions. Multi-template threading can consistently improve alignment accuracy and build a model better than the best-single-template model.

REFERENCES

1. Peng, J. & Xu, J. A multiple-template approach to protein threading. Proteins (2011).