Topic 9

Application of Monte Carlo Simulation:
Removing averaging artifacts in protein
structure prediction
Dukka
Outline
• Background
• Protein Structure Prediction and CASP
• TASSER algorithm
• MCORE algorithm
Background
• Experimental and computational methods often output their results as an ensemble of protein structures.
– NMR, protein structure prediction, protein docking, RNA structure prediction
• A single representative structure is required for comparison or further analysis.
• Representative structure (consensus structure) = a centroid structure obtained by averaging the Cartesian coordinates of the ensemble of superimposed structures.
• The RMSD between the ‘averaged structure’ and any reference structure is always less than or equal to the average RMSD of the individual members (Zagrovic et al.).
• However, the centroid structure has averaging artifacts that render its bond angles and bond lengths unphysical.
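The two claims above can be checked on a toy ensemble. The sketch below is only an illustration under stated assumptions: the zigzag chains, the `zigzag` helper and the 3.8 Å bond length are hypothetical test data, the members are treated as already superimposed, and no re-superposition is done before computing the RMSD.

```python
import numpy as np

def rmsd(a, b):
    """RMSD between two (N, 3) coordinate arrays (no superposition is performed)."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

def centroid_structure(ensemble):
    """Average the Cartesian coordinates of an (M, N, 3) superimposed ensemble."""
    return np.mean(ensemble, axis=0)

def bond_lengths(ca):
    """Distances between consecutive C-alpha atoms."""
    return np.linalg.norm(np.diff(ca, axis=0), axis=1)

def zigzag(n, angle, bond=3.8):
    """Hypothetical planar zigzag chain: every bond has length `bond`."""
    signs = np.where(np.arange(n - 1) % 2 == 0, 1.0, -1.0)
    steps = bond * np.column_stack([
        np.full(n - 1, np.cos(angle)),
        signs * np.sin(angle),
        np.zeros(n - 1),
    ])
    return np.vstack([np.zeros(3), np.cumsum(steps, axis=0)])

# Four members with identical bond lengths but different shapes.
ensemble = np.stack([zigzag(20, a) for a in np.radians([10, 30, 50, 70])])
reference = zigzag(20, np.radians(40))

centroid = centroid_structure(ensemble)
rmsd_centroid = rmsd(centroid, reference)
mean_member_rmsd = float(np.mean([rmsd(m, reference) for m in ensemble]))

print("RMSD of centroid to reference:", rmsd_centroid)
print("mean RMSD of members to reference:", mean_member_rmsd)
print("centroid bond lengths:", bond_lengths(centroid)[:3])
```

Every member has 3.8 Å bonds, yet the averaged chain has shorter ones, which is the averaging artifact the rest of the talk sets out to repair.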
Protein Structure Prediction and CASP
• Critical Assessment of protein Structure Prediction (CASP) is a biennial contest in which different groups try to predict the structure of a protein whose structure has not yet been released to the outside world.
• One of the most popular and objective contests in the bioinformatics field.
• CASP8 has just finished.
• Major observations from CASP7:
– Methods are more or less mature
– Consensus servers usually outperform individual servers
– A lot of work still needs to be done in the refinement step
Refinement
• Given a set of conformations, obtain a conformation that is closest to the native structure.
• Molecular force fields such as AMBER and CHARMM can be used, but as we know they are not perfect.
• Furthermore, there is still no perfect definition of “closest”. Hence, CASP keeps introducing new measures of closeness to the native structure, such as the HB score.
• Often, the closest prediction is not ranked first. Hence, ‘refinement’ is getting a lot of attention.
TASSER algorithm
(Threading/ASSembly/Refinement)
Centroid Structure
Zhang & Skolnick, 2004
Problem Identification
• TASSER is one of the best prediction servers in both CASP7 and CASP8.
• A large number of conformations are generated after the assembly step. However, we can submit only a couple of models.
• Clustering is used, and the centroid of the largest cluster (Combo model) is taken as the output; this has proven to be successful.
• Artifacts in the ‘TASSER (combo) output’
– Unrealistic bond lengths and bond angles due to averaging artifacts
Scope
– To fix these unrealistic bond lengths and bond angles
C-alpha Space
Energy Minimization!
Combo and Closc Models
Fraction of clashes
COMBO model: the centroid structure of the densest cluster.
CLOSC model: the structure that is closest to the centroid of the densest cluster.
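The two model definitions can be sketched in a few lines. This is a minimal illustration, not TASSER's implementation: the random cluster is hypothetical data, the members are assumed to be superimposed already, and the clash definition (an atom clashes if any other C-alpha lies within a 3.6 Å cutoff) is an assumption made for this example.

```python
import numpy as np

def rmsd(a, b):
    """RMSD between two (N, 3) coordinate arrays (no superposition is performed)."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

def combo_model(cluster):
    """COMBO: coordinate average of an (M, N, 3) cluster of superimposed structures."""
    return np.mean(cluster, axis=0)

def closc_index(cluster, centroid):
    """CLOSC: index of the cluster member closest to the centroid."""
    return int(np.argmin([rmsd(member, centroid) for member in cluster]))

def clash_fraction(ca, cutoff=3.6):
    """Fraction of atoms with another atom closer than `cutoff` (assumed definition)."""
    d = np.linalg.norm(ca[:, None, :] - ca[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # ignore the distance of an atom to itself
    return float(np.mean(np.any(d < cutoff, axis=1)))

# Hypothetical cluster: a straight 10-atom chain with 3.8 A spacing plus noise.
rng = np.random.default_rng(0)
base = np.column_stack([3.8 * np.arange(10), np.zeros(10), np.zeros(10)])
cluster = base + rng.normal(scale=0.5, size=(5, 10, 3))

combo = combo_model(cluster)
idx = closc_index(cluster, combo)
closc = cluster[idx]

print("CLOSC member index:", idx)
print("clash fraction of the ideal chain:", clash_fraction(base))
print("clash fraction of a compressed chain:", clash_fraction(base * (3.0 / 3.8)))
```

COMBO is a synthetic average that never occurred in the simulation, whereas CLOSC is an actual member of the cluster, which is why only COMBO carries averaging artifacts.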
PULCHRA
• PULCHRA is based on steepest-descent minimization and a simple force field.
• Sometimes it cannot escape a kinetic trap.
• For a heavily distorted chain, the minimization procedure does not converge, or the optimized model still exhibits irregularities.
Rotkiewicz and Skolnick, 2008
MCORE
• Generate an extended structure based on the Combo model, or start from a ‘close-by model’
• Monte-Carlo minimization
• Output the best structure
Generation of Extended Structure
Using the distance distribution from the PDB, consecutive Cα-Cα distances fall mainly into three types: x-Pro = 3.77 Å, x-{ALA|ARG|ASN|LEU|LYS|MET} = 3.81 Å, and x-{ASP|CYS|GLU|GLY|HIS|ILE|PHE|SER|THR|TRP|TYR|VAL} = 3.80 Å
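A minimal sketch of how such an extended Cα chain could be built from a sequence with the three bond lengths above. Several things here are assumptions, not part of the slides: the bond length is keyed on the second residue of each pair, residues missing from both lists (GLN) fall back to 3.80 Å, the chain is a planar zigzag with a 120° virtual bond angle, and the Combo model is not used at all. The names `ca_bond_length` and `extended_chain` are hypothetical.

```python
import numpy as np

# Residues with a 3.81 A C-alpha distance, as listed on the slide.
GROUP_381 = {"ALA", "ARG", "ASN", "LEU", "LYS", "MET"}

def ca_bond_length(residue):
    """C-alpha distance to the preceding residue, keyed on this residue (assumption)."""
    if residue == "PRO":
        return 3.77
    if residue in GROUP_381:
        return 3.81
    return 3.80  # second slide group; also used as an assumed fallback (e.g. GLN)

def extended_chain(sequence, angle_deg=120.0):
    """Planar zigzag C-alpha trace with the requested virtual bond angle (assumption)."""
    tilt = np.radians((180.0 - angle_deg) / 2.0)
    coords = [np.zeros(3)]
    for i in range(1, len(sequence)):
        sign = 1.0 if i % 2 else -1.0  # alternate up/down to form the zigzag
        direction = np.array([np.cos(tilt), sign * np.sin(tilt), 0.0])
        coords.append(coords[-1] + ca_bond_length(sequence[i]) * direction)
    return np.array(coords)

sequence = ["MET", "ALA", "PRO", "GLY", "LEU"]
chain = extended_chain(sequence)
lengths = np.linalg.norm(np.diff(chain, axis=0), axis=1)
print("bond lengths:", lengths)
```

Because each bond is placed with its own length, the starting structure already satisfies the bond-length constraint that the move sets later preserve.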
Monte-Carlo
• Three major components of any Monte-Carlo approach
– Energy function
• Can be a generic force field or any combination of terms
– Move sets
• Critical to the performance of the algorithm; more of an art(?)
– Convergence criteria
• Naïve way (run for a certain number of steps)
• Introduce some criteria based on the generated conformations
Monte-Carlo: Metropolis Criterion
• Starting from a state A, make a change in the configuration to obtain a new (nearby) configuration B.
• Compute E_B.
• If E_B < E_A, accept the new configuration, since it lowers the energy.
• If E_B > E_A, calculate the probability p
$p = e^{-(E_B - E_A)/T}$
• Draw r from the uniform distribution on [0, 1]; if r < p, accept the new configuration B, otherwise reject it.
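The acceptance rule above can be written as a small function. This is a sketch using only the Python standard library; the function name `metropolis_accept` and the `rng` parameter are choices made for this example.

```python
import math
import random

def metropolis_accept(e_old, e_new, temperature, rng=random):
    """Return True if a move from energy e_old to e_new is accepted."""
    if e_new < e_old:
        return True  # downhill moves are always accepted
    p = math.exp(-(e_new - e_old) / temperature)
    return rng.random() < p  # uphill moves are accepted with probability p

# Empirical check: an uphill step of 1.0 at T = 1.0 should be accepted
# with a frequency close to exp(-1).
rng = random.Random(0)
trials = 20000
accepted = sum(metropolis_accept(0.0, 1.0, 1.0, rng) for _ in range(trials))
print("acceptance frequency:", accepted / trials)
print("expected probability:", math.exp(-1.0))
```

Raising the temperature makes uphill moves more likely, which helps the chain escape local minima at the cost of slower convergence.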

Move Sets
• Move Sets
– Global move-set
• Rest-all bead move
– Local move-set
• 1-bead move
• 2-bead move
• 3-bead move
• 4-bead move
• 5-bead move
– End-bond move
• 1,2,3-bead C-terminal end bond move
• 1,2,3-bead N-terminal end bond move
Move Sets
• Calculate the unit vector along the axis defined by i-1 and i+1
• Calculate the rotation matrix around this vector
• Calculate the new position of i
• The important thing is to preserve the bond length, i.e. to preserve the distance between consecutive Cα atoms.
[Figure: one-bead, two-bead, three-bead, four-bead, five-bead and rest-bead moves, showing beads i-1, i, i+1 and the axis of rotation]
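The one-bead move described above can be sketched as follows. The rotation is applied with Rodrigues' formula rather than an explicit rotation matrix, which is equivalent; the function names and the toy chain are hypothetical.

```python
import numpy as np

def rotate_about_axis(point, origin, axis, angle):
    """Rotate `point` by `angle` (radians) about the line through `origin` along `axis`."""
    k = axis / np.linalg.norm(axis)  # unit vector along the rotation axis
    v = point - origin
    rotated = (v * np.cos(angle)
               + np.cross(k, v) * np.sin(angle)
               + k * np.dot(k, v) * (1.0 - np.cos(angle)))
    return origin + rotated

def one_bead_move(ca, i, angle):
    """Rotate bead i about the axis defined by beads i-1 and i+1."""
    moved = ca.copy()
    moved[i] = rotate_about_axis(ca[i], ca[i - 1], ca[i + 1] - ca[i - 1], angle)
    return moved

# Toy planar zigzag chain (hypothetical coordinates, bonds of about 3.8 A).
chain = np.array([
    [0.00, 0.0, 0.0],
    [3.29, 1.9, 0.0],
    [6.58, 0.0, 0.0],
    [9.87, 1.9, 0.0],
    [13.16, 0.0, 0.0],
])
moved = one_bead_move(chain, 2, np.radians(40.0))

print("bond lengths before:", np.linalg.norm(np.diff(chain, axis=0), axis=1))
print("bond lengths after: ", np.linalg.norm(np.diff(moved, axis=0), axis=1))
```

Beads i-1 and i+1 lie on the rotation axis, so their distances to bead i cannot change, which is how the move preserves the bond lengths.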
End-bond Move Sets
[Figure: end-bond moves and their axis of rotation]
Energy Function
\[
E = k_{excl} \sum_{k=1}^{N-2} \sum_{l=k+2}^{N} (r_{kl} - r_{o\_excl})^2
  + k_{ang} \sum_{i=1}^{N-2} (\theta_{i,i+1,i+2} - \theta_{o\_ang})^2
  + k_{clos} \sum_{k=1}^{N} (d_{kk_t} - d_{o\_clo})^2
\]
• Excluded volume: penalize if the distance is less than 4.0 Å
• Bond angle: penalize if the angle is not between 70 and 150
• Closeness to target: penalize if the difference in C-alpha position between the target and starting structure is not within a certain cutoff
$k_{excl} = 2.9$, $k_{ang} = 0.075$, $k_{clos} = 5.4$, $r_{o\_excl} = 4.0$, $d_{o\_clo} = 0.001$
$\theta_{o\_ang} = 70$ if $\theta_{i,i+1,i+2} < 70$, $150$ if $\theta_{i,i+1,i+2} > 150$, and $\theta_{i,i+1,i+2}$ otherwise
N: Number of C-alpha atoms
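A hedged NumPy sketch of the three energy terms with the constants listed above. The reading of the slide is an assumption in several places: angles are taken in degrees, the excluded-volume term is applied only to pairs closer than 4.0 Å, the closeness term only to atoms displaced by more than the cutoff, and the current structure is assumed to be superimposed on the target already. The function names are hypothetical.

```python
import numpy as np

K_EXCL, K_ANG, K_CLOS = 2.9, 0.075, 5.4
R0_EXCL, D0_CLO = 4.0, 0.001
ANG_MIN, ANG_MAX = 70.0, 150.0  # degrees (assumed unit)

def bond_angles_deg(ca):
    """Virtual bond angles at atoms 1..N-2 of an (N, 3) C-alpha trace, in degrees."""
    a = ca[:-2] - ca[1:-1]
    b = ca[2:] - ca[1:-1]
    cosang = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

def mcore_energy(ca, target):
    """Excluded volume + bond angle + closeness-to-target energy (sketch)."""
    # Excluded volume over pairs (k, l) with l >= k + 2, penalized below R0_EXCL.
    d = np.linalg.norm(ca[:, None, :] - ca[None, :, :], axis=-1)
    k_idx, l_idx = np.triu_indices(len(ca), k=2)
    r = d[k_idx, l_idx]
    e_excl = K_EXCL * np.sum(np.where(r < R0_EXCL, (r - R0_EXCL) ** 2, 0.0))

    # Bond angle: the reference is the angle clamped to [70, 150], so angles
    # inside the allowed range contribute nothing.
    theta = bond_angles_deg(ca)
    theta0 = np.clip(theta, ANG_MIN, ANG_MAX)
    e_ang = K_ANG * np.sum((theta - theta0) ** 2)

    # Closeness: per-atom displacement from the target beyond the cutoff.
    dev = np.linalg.norm(ca - target, axis=1)
    e_clos = K_CLOS * np.sum(np.where(dev > D0_CLO, (dev - D0_CLO) ** 2, 0.0))
    return float(e_excl + e_ang + e_clos)

# A straight three-atom chain has a 180 degree angle, 30 degrees above the limit.
straight = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]])
print("energy of straight chain against itself:", mcore_energy(straight, straight))
```

In this reading the closeness term pulls the model towards the target, while the other two terms resist exactly the distortions that averaging introduces.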
Assessment of Move Sets and Energy Function
• Before doing the actual computation, we have to test whether the move sets and energy function work properly.
• So we have to design some test cases. A positive test case is to drive an extended structure to the native structure.
– Desired result: the extended structure should be driven ‘very close’ to the native structure in a relatively small number of steps
Data Set
• 1363 proteins with fewer than 200 residues and a combo RMSD to native of less than 6.5 Å.
• 1363 Centroid structures (COMBO models)
• 1363 CLOSC models
• 1363 Close-by structures (CLOSC models + Pulchra Refinement)
• 1363 Native structures.
Driving Extended to Native
[Figure: average RMSD to NATIVE (Å) and average energy plotted against steps; 10000 steps RMSD = 0.039; other values marked on the plots: 0.045, 0.06, 0.041]
[Figure: Driving Extended to Native, Ext-refined vs CA, 0.033 Å]
Convergence criteria
| rmsd_diff(i − l) | < tolerance, where l = i + j, j = 1, …, L
Tried different values of L; L = 49 with a tolerance of 0.005 seems reasonable.
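One possible reading of this criterion, sketched in Python. The interpretation is an assumption: `rmsd_diff` is taken to be the difference between the RMSD recorded at step i and the RMSD recorded at each of the following L steps, and the run counts as converged when all L differences stay below the tolerance. The function name `has_converged` is hypothetical.

```python
def has_converged(rmsd_history, window=49, tolerance=0.005):
    """True if the last `window` RMSD values all stay within `tolerance`
    of the value recorded just before them (assumed reading of the slide)."""
    if len(rmsd_history) < window + 1:
        return False  # not enough history to compare against yet
    reference = rmsd_history[-(window + 1)]
    return all(abs(value - reference) < tolerance
               for value in rmsd_history[-window:])

# Toy trajectory (hypothetical numbers): RMSD falls, then plateaus at 1.0.
falling = [5.0 - 0.1 * k for k in range(40)]
plateau = [1.0] * 50
print("while still falling:", has_converged(falling))
print("after the plateau:  ", has_converged(falling + plateau))
```

Compared with running a fixed number of steps, a check like this stops early once the structure no longer changes and keeps going when it still does.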
Two proposed algorithms
• MCORE: start from a ‘close-by model’ and drive it towards the COMBO model.
– CLOSC models serve as the close-by models.
– For when a close-by model is readily available
• MCORE-EXT: start from an extended structure and drive it towards the COMBO model.
– For when a close-by model is not readily available
MCORE: Driving Close-by Models to COMBO
[Figure: average energy plotted against steps]
[Figure; axis labels: Fraction of Atoms Clashing in MCORE, Fraction of Atoms Clashing in COMBO]
Why can MCORE not get much closer to COMBO?
[Figure; axis label: RMSD of MCORE to COMBO (Å)]
MCORE vs Combo
[Figure; axis labels: RMSD of MCORE to NATIVE (Å), RMSD of COMBO to NATIVE]
• 38 proteins had an even lower RMSD than the respective combo model
Comparison of Different Models
[Figure: comparison of four models. RMSD to Native (Å): 3.35, 3.36, 3.54, 3.28. Fraction of Atoms in Clashes: 0.010, 0.065, 0.000, 0.63. TM-score of four models: 0.770, 0.746, 0.747, 0.754]
Results
Model                      Avg. RMSD to Native (Å)   Avg clash < 1.9   Avg clash < 3.6
Combo                      3.28                      0.03              0.630
Closc                      3.54                      0                 0.614
MCORE                      3.35                      0                 0.010
Pulchra (Closc)            3.54                      0                 0
MCORE (EXT, 2000 steps)    3.35                      0                 0.011
Pulchra (Combo)            3.36                      0.005             0.065
Some Examples
[Figure: 1akhA superpositions. refined vs native: 0.78 Å; com vs refined: 0.354 Å; com vs native: 0.674 Å; pulchra vs native: 0.68 Å; annotation: 12 clashes]
[Figure: 3bbn_ superpositions. refined vs combo: 0.852 Å; combo vs native: 2.948 Å; pulchra vs native: 2.918 Å; refined vs native: 3.099 Å]
All-atom model reconstruction
• Built the main-chain atoms of the refined Cα trace.
• Rebuilt side chains using two methods
– Pulchra (-c)
– Scwrl
[Figure: reconstructed all-atom models; values shown: 3.92 (all), 2.868 (Cα), 3.95 (all)]
Conclusion
• Designed an algorithm to remove averaging artifacts and applied it to refine the combo model.
• Acknowledgments
– Dr. Jeff Skolnick and all the members of the Skolnick Lab,
especially Lila, Shashi, Hongyi, Seung Yup,…….
– Dr. Dennis Livesay
• Future Work
– Refinement in All-atom space