3DSIG: STRUCTURAL BIOINFORMATICS AND COMPUTATIONAL BIOPHYSICS
PENG & XU
ID 046
VIENNA, JULY 15-16 2011
A MULTIPLE-TEMPLATE APPROACH TO PROTEIN THREADING
Jian Peng and Jinbo Xu*
Toyota Technological Institute at Chicago, jinboxu@gmail.com
We present a novel multiple-template protein threading method that corrects errors in pairwise sequence-template
alignments by exploiting template structure similarity and enforcing consistency among all pairwise alignments.
We achieve this by realigning a sequence simultaneously to multiple templates using a novel probabilistic-consistency
algorithm. By contrast, other methods simply assemble pairwise sequence-template alignments into
a sequence-to-multiple-template alignment and thus cannot correct alignment errors. The CASP9 evaluation
indicates that our method generated the best alignments for the 50 hardest template-based modeling targets and
that our multi-template models significantly outperform the models built from the best single templates. Our method
was also voted one of the most interesting and innovative methods by the CASP9 community.
INTRODUCTION
As the PDB has grown, multi-template homology
modelling has been proposed for protein 3D structure
prediction. Two key challenges remain open for
multi-template modelling. First,
can we consistently improve alignment accuracy by
utilizing multiple templates? Second, can we build a
multi-template model better than the model built from
the best single template? Previous multi-template
methods simply assemble pairwise sequence-template
alignments into a sequence-to-multiple-template
alignment using the sequence as an anchor and then
generate a 3D model from such an alignment. These
methods cannot correct pairwise alignment errors and
very often produce a multi-template model no better
than the best-single-template model due to
inconsistency and errors in the final alignment.
METHODS
Overview. Given a target sequence, we first run our
single-template threading method to determine the top
templates. Then we realign the sequence
simultaneously to the top templates (see below). The
resultant sequence-to-multiple-template alignment is
fed into MODELLER to produce a 3D model.
Realignment algorithm. Given two pairwise
alignments S-T1 and S-T2 where S is the sequence
and T1 and T2 are two templates, an alignment
between T1 and T2 can be derived from S-T1 and
S-T2 (using S as an anchor). Such a T1-T2 alignment
should be consistent with the T1 to T2 alignment
generated by a structure alignment program.
Otherwise, there may be errors in the S-T1 and S-T2
alignments, because template-template structure
alignment is usually more accurate than
sequence-template alignment. Therefore, we can use
template-template structure alignment to improve
sequence-template alignment by enforcing consistency
among all pairwise alignments. We achieve this by realigning
a sequence simultaneously to multiple templates. To
do so, we use a probabilistic alignment matrix (PAM)
to represent all possible alignments between the
sequence and each template, so that no pairwise
alignment is fixed in advance and each can still be
improved later. We adjust all the PAMs using a
novel probabilistic-consistency algorithm to make
them as consistent as possible with all the
template-template structure alignments. After the PAM
adjustment, we obtain better pairwise
sequence-template alignments and the final
sequence-to-multiple-template alignment, from which we
build a 3D model using MODELLER. Our
probabilistic-consistency algorithm is significantly better than that
used by ProbCons for multiple sequence alignment
because ProbCons does not deal with gaps while ours
does. Because of this, our algorithm can generate
better alignments for distantly related proteins than
ProbCons can.
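The consistency idea can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes alignments are stored as position-to-position dictionaries, and the update in `relax_pam` is a simplified ProbCons-style averaging step that ignores gaps, which the method described above handles explicitly. All function names and the toy data are hypothetical.

```python
from typing import Dict, List

# An alignment maps a position in the first chain to its aligned position
# in the second chain; unaligned (gapped) positions are simply absent.
Alignment = Dict[int, int]
Matrix = List[List[float]]


def derive_template_alignment(s_t1: Alignment, s_t2: Alignment) -> Alignment:
    """Derive a T1-T2 alignment from S-T1 and S-T2, using S as the anchor."""
    return {s_t1[i]: s_t2[i] for i in s_t1 if i in s_t2}


def consistency(derived: Alignment, structural: Alignment) -> float:
    """Fraction of derived T1-T2 pairs that the structure alignment confirms."""
    if not derived:
        return 1.0
    agree = sum(1 for a, b in derived.items() if structural.get(a) == b)
    return agree / len(derived)


def relax_pam(p_s_t1: Matrix, p_s_t2: Matrix, t2_to_t1: Alignment) -> Matrix:
    """One simplified consistency update of the S-T1 matrix (assumption).

    Averages the direct S-T1 probabilities with the S-T2 probabilities
    transferred to T1 through the T2-T1 structure alignment.
    """
    updated = [[0.5 * p for p in row] for row in p_s_t1]
    for i, row in enumerate(p_s_t2):
        for k, j in t2_to_t1.items():
            updated[i][j] += 0.5 * row[k]
    return updated


# Toy data (illustrative only): S-T1 misplaces sequence position 1.
s_t1 = {0: 0, 1: 2}
s_t2 = {0: 0, 1: 1, 2: 2}
structural_t1_t2 = {0: 0, 1: 1, 2: 2}

derived = derive_template_alignment(s_t1, s_t2)
print(derived, consistency(derived, structural_t1_t2))

p_s_t1 = [[0.8, 0.1, 0.1], [0.1, 0.4, 0.5], [0.1, 0.2, 0.7]]
p_s_t2 = [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.0, 0.1, 0.9]]
relaxed = relax_pam(p_s_t1, p_s_t2, {0: 0, 1: 1, 2: 2})
best_for_position_1 = max(range(3), key=lambda j: relaxed[1][j])
print(best_for_position_1)
```

In the toy data, only half of the derived T1-T2 pairs agree with the structure alignment, which signals an error in one of the pairwise alignments. After the update, the most probable T1 partner of sequence position 1 moves from column 2 to column 1, the choice supported by the S-T2 matrix and the structure alignment.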
RESULTS & CONCLUSIONS
For evaluation, we use 51 CASP8 and 48 CASP9
targets (99 in total), each with ≥2 reasonable templates,
for 327 templates in total. TM and GDT are two widely used
metrics of model quality. The higher the TM
and GDT of a model, the better the model.
Our method improves pairwise sequence-template
alignment. The 327 pairwise alignments generated by
our realignment method are better than those from our
single-template threading method (P-value 7.18E-05).
Our multi-template models outperform the
best-single-template models. As shown in Table 1, the
cumulative TM and GDT of our multiple-template
models are 75.686 and 6585.7, respectively, exceeding
those of the best-single-template models with P-values
3.32E-06 and 1.10E-08. Empirically, it is very challenging to
identify the best template for a target, so our
multi-template method has a larger advantage in practice.
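The comparison above can be sketched in Python as follows. The sketch reads "cumulative" as the sum of per-target scores. The source does not state which statistical test produced its P-values, so the exact two-sided sign test used here is an assumption chosen for simplicity. The per-target scores are made-up illustrative values, not data from this work, and the function names are hypothetical.

```python
from math import comb
from typing import Sequence


def cumulative(scores: Sequence[float]) -> float:
    """Cumulative score: the sum of per-target scores."""
    return sum(scores)


def sign_test_p(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sided exact sign test on paired scores (ties are dropped).

    Assumption: the source does not name its test; this is one simple
    paired test, not necessarily the one used for Table 1.
    """
    wins = sum(1 for x, y in zip(a, b) if x > y)
    losses = sum(1 for x, y in zip(a, b) if x < y)
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Made-up per-target TM values for four hypothetical targets.
multi_template = [0.80, 0.65, 0.72, 0.55]
best_single = [0.78, 0.60, 0.72, 0.50]

print(cumulative(multi_template), cumulative(best_single))
print(sign_test_p(multi_template, best_single))
```

A paired test is the natural choice here because both methods are scored on the same targets, so per-target differences carry the signal rather than the two cumulative totals alone.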
Our method outperforms many others. In Table 1, MAFFT,
T-coffee, MUSCLE and ProbCons are multiple
sequence alignment methods. PROMALS3D and
M-coffee are multiple sequence/structure alignment
methods. All these methods can be used to align a
sequence to its templates. The baseline method
assembles pairwise alignments generated by our
single-template threading into a
sequence-to-multi-template alignment using the target as an anchor. The
models derived from all these methods have a lower
cumulative TM than our best-single-template models. This indicates
the importance of a good realignment algorithm.
TABLE 1. Performance of various alignment methods on 99 CASP
targets. The smaller the P-value, the more significant the
difference between our method and others.
Methods                 Model Quality         P-value
                        TM        GDT         TM          GDT
This work               75.876    6598.2      -           -
Best-single-template    74.066    6381.5      3.32E-06    1.10E-08
MAFFT                   66.368    5715.9      3.69E-10    4.40E-10
T-coffee                67.697    5852.1      1.07E-07    9.19E-08
MUSCLE                  66.556    5715.3      1.91E-09    1.54E-09
ProbCons                67.193    5804.9      3.93E-08    3.89E-08
PROMALS3D               72.636    6309.2      1.47E-04    3.94E-04
M-coffee                73.721    6414.9      2.91E-04    1.69E-03
Baseline                73.386    6353.4      2.49E-08    2.38E-07
Our method also performs well in CASP9.
Compared with ~80 servers, our method ranks only slightly
behind Zhang-Server and QUARK of the Zhang group
(P > 0.1), which are based upon a consensus method
and also extensively refine threading models.
Conclusions. Multi-template threading can
consistently improve alignment accuracy and build a
model better than the best-single-template model.
REFERENCES
1. Peng, J. & Xu, J. A multiple-template approach to protein
threading. Proteins (2011).