Robotics Algorithms for the Study of Protein Structure and Motion Jean-Claude Latombe

Jean-Claude Latombe
Computer Science Department, Stanford University

Protein
• Long sequence of amino acids (dozens to thousands), drawn from a dictionary of 20 distinct amino acids

Central Dogma of Molecular Biology
• Physiological conditions: aqueous solution, 37°C, pH 7, atmospheric pressure

Why Proteins?
• They are the workhorses of living organisms
  - They perform many vital functions, e.g., catalysis of reactions, storage of energy, transmission of signals, building blocks of muscles
• They raise challenging computational issues
  - Large molecules (100s to several 1000s of atoms)
  - Made of building blocks drawn from a small "dictionary"
  - Unusual kinematic structure
• They are associated with many critical problems
  - Folded structure determination
  - Global and local structural similarities
  - Prediction of folding and binding motions

φ-ψ Kinematic Linkage Model
[Figure: peptide group, side-chain group]

Molecule and Robot

Two Problems
• Structure determination from electron density maps
  - Inverse kinematics techniques [Itay Lotan, Henry van den Bedem, Ashley Deacon (Joint Center for Structural Genomics)]
• Energy maintenance during Monte Carlo simulation
  - Collision detection techniques [Itay Lotan, Fabian Schwarzer, and Danny Halperin (Tel Aviv University)]

Structure Determination/Prediction
• Experimental tools: X-ray crystallography, NMR spectrometry
• Computational tools: homology, threading; molecular dynamics

Protein Data Bank
• Structures have been determined for only about 10% of known protein sequences

Protein Structure Initiative (PSI)
[Timeline figure: 1990, 1999, 2000, 2004; annotations: 250 new structures, 2,500 new structures, >20,000 structures total, ~30,000 structures total]

X-Ray Crystallography

Automated Model Building
• Software systems: RESOLVE, TEXTAL, ARP/wARP, MAID
• 1.0Å < d < 2.3Å: ~90% completeness
• 2.3Å ≤ d < 3.0Å: ~67% completeness (varies widely)¹
• JCSG: 43% of data sets have d ≥ 2.3Å
• Manually completing a model:
  - is labor intensive and time consuming
  - relies on existing tools that are highly interactive
• Model completion is the high-throughput bottleneck

¹ Badger (2003), Acta Cryst. D59

The Completion Problem
Input:
• Electron-density map
• Partial structure
• Two anchor residues
• Amino-acid sequence of the missing fragment (typically 4 to 15 residues long)
[Figure: main part of the protein (folded), anchor 1 (3 atoms), anchor 2 (3 atoms), protein fragment (fuzzy map)]
Output:
• A few candidate conformations of the fragment that
  - respect the closure constraint (IK)
  - maximize the match with the electron-density map

IK Problem
Input:
• Closed kinematic chain with n > 6 degrees of freedom
• Relative positions/orientations X of the end frames
• Target function $T(Q) \to \mathbb{R}$
Output:
• Joint angles Q that
  - achieve closure
  - optimize T

Related Work
Robotics/Computer Science:
• Exact IK solvers: Manocha & Canny ’94; Manocha et al. ’95
• Optimization IK solvers: Wang & Chen ’91
• Redundant manipulators: Khatib ’87; Burdick ’89
• Motion planning for closed loops: Han & Amato ’00; Yakey et al. ’01; Cortes et al. ’02, ’04
Biology/Crystallography:
• Exact IK solvers: Wedemeyer & Scheraga ’99; Coutsias et al. ’04
• Optimization IK solvers: Fine et al. ’86; Canutescu & Dunbrack Jr. ’03
• Database search loop closure: Jones & Thirup ’86; Van Vlijmen & Karplus ’97
• Ab-initio loop closure: Fiser et al. ’00; Kolodny et al. ’03
• Semi-automatic tools: Jones & Kjeldgaard ’97; Oldfield ’01

Two-Stage IK Method
1. Candidate generation → closed fragments
2. Candidate refinement → optimize fit with the EDM

Stage 1: Candidate Generation
1. Generate a random conformation of the fragment (only one end attached to an anchor)
2. Close the fragment (i.e., bring its other end to the second anchor) using Cyclic Coordinate Descent (CCD) (Wang & Chen ’91, Canutescu & Dunbrack ’03)

Closure Distance
$S = \|N - N'\|^2 + \|C_\alpha - C_\alpha'\|^2 + \|C - C'\|^2$
where N, Cα, C are the backbone atoms of the moving end and N', Cα', C' the corresponding atoms of the fixed end. CCD drives S toward 0: it cycles through the torsion angles and, for each q_i, computes the value that minimizes S with all other angles held fixed, with a bias toward the EDM and avoidance of steric clashes.

A.A. Canutescu and R.L. Dunbrack Jr. Cyclic coordinate descent: A robotics algorithm for protein loop closure. Prot. Sci. 12:963-972, 2003.
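The CCD closure step above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the software described in these slides. The fragment is a plain array of atom positions. Torsion j rotates every atom after `chain[j + 1]` about the bond `chain[j] -> chain[j + 1]`. The last three positions stand in for the moving-end N, Cα, C atoms. The EDM bias and steric-clash checks are omitted.

For a single torsion, the minimizing angle has a closed form. Write each rotated moving-end atom as r∥ + cos θ r⊥ + sin θ (u × r), with f the target positions relative to the axis. S is then minimized at θ = atan2(b, a), where a = Σ f·r⊥ and b = Σ f·(u × r). The helper names are hypothetical.

```python
import numpy as np


def rotate_about_axis(points, origin, axis, theta):
    """Rotate `points` by `theta` about the line through `origin` with unit direction `axis` (Rodrigues)."""
    v = points - origin
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    rotated = (v * cos_t
               + np.cross(axis, v) * sin_t
               + np.outer(v @ axis, axis) * (1.0 - cos_t))
    return rotated + origin


def closure_distance(moving_end, fixed_end):
    """S: sum of squared distances between the moving-end atoms and the fixed-end (anchor) atoms."""
    return float(np.sum((moving_end - fixed_end) ** 2))


def ccd_close(chain, targets, sweeps=200, tol=1e-8):
    """Bring the last three atoms of `chain` onto `targets` (3 x 3) by cyclic coordinate descent.

    Torsion j rotates every atom after chain[j + 1] about the bond chain[j] -> chain[j + 1].
    Hypothetical sketch: the EDM bias and steric-clash avoidance mentioned in the slides are omitted.
    """
    chain = np.array(chain, dtype=float)          # work on a copy
    for _ in range(sweeps):
        for j in range(len(chain) - 3):
            p = chain[j + 1]
            u = (chain[j + 1] - chain[j]) / np.linalg.norm(chain[j + 1] - chain[j])
            r = chain[-3:] - p                    # moving-end atoms relative to the axis point
            f = targets - p                       # fixed-end targets relative to the axis point
            r_perp = r - np.outer(r @ u, u)       # parts of r perpendicular to the axis
            a = np.sum(f * r_perp)                # coefficient of cos(theta) in sum f . R(theta) r
            b = np.sum(f * np.cross(u, r))        # coefficient of sin(theta)
            theta = np.arctan2(b, a)              # maximizes a cos + b sin, i.e. minimizes S
            chain[j + 2:] = rotate_about_axis(chain[j + 2:], p, u, theta)
        if closure_distance(chain[-3:], targets) < tol:
            break
    return chain
```

Each torsion update is the exact minimizer of S over that single angle, and θ = 0 is always a candidate, so S never increases from one update to the next. Like any coordinate descent, however, CCD can stall in a local minimum.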
Stage 2: Candidate Refinement
• Target function T(Q) measuring the quality of the fit with the EDM
• Minimize T while retaining closure
• Closed conformations lie on a self-motion manifold of lower dimension
[Figure: a 1-D self-motion manifold in (q1, q2, q3) space and its null space, with directions dq1, dq2, dq3]

Closure and Null Space
• $dX = J\,dQ$, where J is the $6 \times n$ Jacobian matrix (n > 6)
• The null space $\{dQ \mid J\,dQ = 0\}$ has dimension n - 6
• N: orthonormal basis of the null space
• Pseudo-inverse $J^+$ such that $J J^+ = I$
• $dQ = J^+ dX + N N^T y$, with $y = -\nabla T(Q)$ so that the null-space component decreases T

Computation of J+ and N
• SVD of J: $J = U S V^T$, with U of size 6 × 6, S = [diag(s_1, ..., s_6) | 0] of size 6 × n, and V of size n × n
• $N^T$: the last n - 6 rows of $V^T$, giving a basis N of the null space (Gram-Schmidt orthogonalization)
• $J^+ = V S^+ U^T$, where $S^+ = \mathrm{diag}[1/s_i]$

Refinement Procedure
Repeat until a minimum is reached:
• Compute J, J+ and N at the current Q
• Compute ∇T at the current Q (analytical expression of T + linear-time recursive computation [Abe et al., Comput. Chem., 1984])
• Move along $dQ = J^+ dX - N N^T \nabla T$ until a minimum is reached or closure is broken
Monte Carlo + simulated annealing protocol to deal with local minima

Monte Carlo Optimization
Repeat:
1. Perform a random move of the fragment:
   - either by picking a random direction in the null space
   - or by using an exact IK solver over 6 dofs [Coutsias et al., 2004] (big jumps)
2. Minimize T(Q)
3. Accept the move with the Metropolis-criterion probability ~$\exp(-\Delta T / \mathit{Temp})$

Tests #1: Artificial Gaps
• TM1621 (234 residues) and TM0423 (376 residues), SCOP classification α/β
• Complete structures (gold standard) resolved with EDM at 1.6Å resolution
• Compute EDM at 2, 2.5, and 2.8Å resolution
• Remove fragments and rebuild

103 fragments from TM1621 at 2.5Å
• Short fragments: 100% < 1.0Å aaRMSD
• Long fragments: 12 residues: 96% < 1.0Å aaRMSD; 15 residues: 88% < 1.0Å aaRMSD
(Produced by H. van den Bedem)
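The Stage 2 linear algebra can be sketched with NumPy. This covers the SVD of J, the pseudo-inverse J⁺, the null-space basis N, a refinement step, and a random null-space move accepted by the Metropolis criterion. It is a minimal sketch, not the authors' code. It assumes J is a full-rank 6 × n Jacobian. The step size `step`, the Gaussian random null-space direction, and the function names are illustrative, hypothetical choices.

```python
import numpy as np


def pseudo_inverse_and_null_space(J):
    """Return (J_plus, N) for a full-rank 6 x n Jacobian J (n > 6) using the SVD J = U S V^T."""
    U, s, Vt = np.linalg.svd(J)                  # U: 6 x 6, s: 6 singular values, Vt: n x n
    V = Vt.T
    J_plus = V[:, :6] @ np.diag(1.0 / s) @ U.T   # J+ = V S+ U^T with S+ = diag[1 / s_i]
    N = V[:, 6:]                                 # orthonormal basis of the (n - 6)-dim null space
    return J_plus, N


def refinement_step(J, grad_T, dX=None, step=1e-2):
    """dQ = J+ dX - step * N N^T grad_T: fix the closure error dX, descend T inside the null space."""
    J_plus, N = pseudo_inverse_and_null_space(J)
    if dX is None:
        dX = np.zeros(J.shape[0])
    return J_plus @ dX - step * (N @ (N.T @ grad_T))


def random_null_space_move(J, rng, scale=0.1):
    """Random direction in the null space: to first order it leaves the end frame (closure) unchanged."""
    _, N = pseudo_inverse_and_null_space(J)
    return scale * (N @ rng.normal(size=N.shape[1]))


def metropolis_accept(delta_T, temperature, rng):
    """Metropolis criterion: accept with probability min(1, exp(-delta_T / temperature))."""
    return delta_T <= 0 or rng.random() < np.exp(-delta_T / temperature)
```

J depends on Q, so moving along the null space preserves closure only to first order. This is why the refinement procedure recomputes J, J⁺, and N at every iteration and stops when closure is broken.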
Comparison Across Resolutions
[Figure panels: resolution = 2.0Å, 2.5Å, 2.8Å]

Example: TM0423
• PDB: 1KQ3, 376 residues
• 2.0Å resolution
• 12-residue gap
• Best: 0.3Å aaRMSD

Tests #2: True Gaps
• Structure computed by RESOLVE
• Gaps completed independently (gold standard)

Example: TM1742 (271 residues), 2.4Å resolution; 5 gaps left by RESOLVE
Length (residues) | Top scorer | Lowest error
4                 | 0.22Å      | 0.22Å
5                 | 0.78Å      | 0.78Å
5                 | 0.36Å      | 0.36Å
7                 | 0.72Å      | 0.66Å
10                | 0.43Å      | 0.43Å
(Produced by H. van den Bedem)

TM0813
• PDB: 1J5X, 342 residues
• 2.8Å resolution
• 12-residue gap between GLU-83 and GLY-96
• Best: 0.6Å aaRMSD

TM1621
• Green: manually completed conformation
• Cyan: conformation computed by stage 1
• Magenta: conformation computed by stage 2
• The aaRMSD improved by 2.4Å to 0.31Å

Alr1529: D72-D78
• Resolution: 2.0Å
• Initial model: ARP/wARP
• Contour: 1.0σ
• PDB: 1VJG
• aaRMSD: 0.33Å

TM0542
• Top-scoring fragment in cyan
• Manually completed fragment in green
• Residues A259 and A260 are flipped

Current/Future Work
• Software actively being used at the JCSG
• What about multi-modal loops?
  - TM0755: data at 1.8Å
  - 8-residue fragment crystallized in 2 conformations (A and B)
  - Overlapping density: difficult to interpret manually [figure labels: A316 Ser, A323 His]
  - The algorithm successfully identified and built both conformations
• Fuzziness in the EDM can then be exploited: use the EDM to infer a probability measure over the conformation space of the loop

Amylosucrase
J. Cortés, T. Siméon, M. Renaud-Siméon, and V. Tran. J. Comp. Chemistry, 25:956-967, 2004.
Energy maintenance during Monte Carlo simulation
Joint work with Itay Lotan, Fabian Schwarzer, and Dan Halperin¹
¹ Computer Science Department, Tel Aviv University

Monte Carlo Simulation (MCS)
• Random walk through conformation space
• At each attempted step:
  - Perturb the current conformation at random
  - Accept the step with probability $P(\text{accept}) = \min\left(1, e^{-\Delta E / k_B T}\right)$
• The conformations generated by an arbitrarily long MCS are Boltzmann distributed, i.e., $\#\text{conformations in } V \sim \int_V e^{-E / k_B T}\, dV$
• Used to:
  - sample meaningful distributions of conformations
  - generate energetically plausible motion pathways
• A simulation run may consist of millions of steps, so the energy must be evaluated frequently
Problem: How to maintain the energy efficiently?

Energy Function
• E = Σ bonded terms + Σ non-bonded terms + Σ solvation terms
• Bonded terms: O(n)
• Non-bonded terms:
  - e.g., van der Waals and electrostatic
  - depend on distances between pairs of atoms
  - O(n²), expensive to compute
• Solvation terms:
  - may require computing the molecular surface

Non-Bonded Terms
• Energy terms go to 0 as distance increases, so a cutoff distance (6 to 12Å) is used
• vdW forces prevent atoms from bunching up
• Hence there are only O(n) interacting pairs [Halperin & Overmars 98]
Problem: How to find interacting pairs without enumerating all atom pairs?

Grid Method
• Subdivide 3-space into cubic cells of side d_cutoff
• Compute the cell that contains each atom center
• Represent the grid as a hashtable
• Θ(n) time to build the grid
• O(1) time to find the interacting pairs of each atom
• Θ(n) time to find all interacting pairs of atoms [Halperin & Overmars, 98]
• Asymptotically optimal in the worst case
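The grid method above can be sketched in a few lines of Python. This is an illustration under assumptions, not the code used in this work. Cells have a side equal to the cutoff, and the hashtable is a Python `dict` keyed by integer cell coordinates. `build_grid` and `interacting_pairs` are hypothetical names.

With that cell size, two atoms within the cutoff always lie in the same or adjacent cells, so each atom only needs to inspect 27 cells. The O(1)-per-atom bound additionally relies on atoms not bunching up, so that every cell holds a bounded number of atoms.

```python
import itertools
from collections import defaultdict

import numpy as np


def build_grid(coords, cell_size):
    """Hash each atom index into the cubic cell containing its center (Theta(n) time)."""
    grid = defaultdict(list)
    for i, c in enumerate(coords):
        grid[tuple(np.floor(c / cell_size).astype(int))].append(i)
    return grid


def interacting_pairs(coords, cutoff):
    """Return all pairs (i, j), i < j, whose atom centers are within `cutoff`.

    With cell side = cutoff, any interacting partner lies in the same cell or one of its
    26 neighbors, so each atom inspects only 27 cells instead of all other atoms.
    """
    grid = build_grid(coords, cutoff)
    offsets = list(itertools.product((-1, 0, 1), repeat=3))
    pairs = set()
    for cell, members in grid.items():
        for dx, dy, dz in offsets:
            neighbor = grid.get((cell[0] + dx, cell[1] + dy, cell[2] + dz))  # .get adds no empty cells
            if not neighbor:
                continue
            for i in members:
                for j in neighbor:
                    if i < j and np.linalg.norm(coords[i] - coords[j]) <= cutoff:
                        pairs.add((i, j))
    return pairs
```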
Can we do better on average?
• Few DOFs are changed at each MC step
[Figure: number k of DOF changes per step (axis 0 to 30), simulation of 100,000 attempted steps]
• Proteins are long kinematic chains
• Long sub-chains stay rigid at each step
• Many partial energy sums remain constant
Problem: How to retrieve the unchanged partial sums?

Hierarchical Collision Checking
• Widely used technique in robotics/graphics to approximate distances between objects
• Pre-computation of a bounding-volume hierarchy
• How to update this hierarchy if the objects deform?

Two New Data Structures
1. ChainTree: fast detection of interacting atom pairs
2. EnergyTree: retrieval of unchanged partial energy sums

ChainTree (Twofold Hierarchy: BVs + Transforms)
[Figure: links and joints at the leaves; internal nodes hold bounding volumes and shortcut transforms such as T_AB, T_JK, T_NO]

Updating the ChainTree
Update the path to the root:
- Recompute transforms that "shortcut" the DOF change
- Recompute BVs that contain the DOF change
- O(k log(n/k)) work for k changes

Finding Interacting Pairs
• Do not search inside rigid sub-chains (unmarked nodes)
• Do not test two nodes with no marked node between them
[Figure: new interacting pairs]

EnergyTree
[Figure: tree of partial energy sums such as E(N,N), E(J,L), E(K,L), E(L,L), E(M,M)]

Complexity
• n: total number of DOFs
• k: number of DOF changes at each MCS step, k << n
• Updating the ChainTree: O(k log(n/k))
• Finding interacting pairs: O(n^{4/3}) in the worst case, but performs much better in practice!
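The transform half of the ChainTree can be sketched as a balanced binary tree over the links. Each internal node stores the composed transform of its sub-chain, playing the role of the "shortcut" transforms such as T_AB. This is a simplified sketch under assumptions, not the authors' data structure:

- The chain is planar, using 3 × 3 homogeneous transforms instead of 3-D rigid transforms.
- The bounding-volume half of the hierarchy is omitted.
- `link_transform` and `TransformTree` are hypothetical names.

Changing one joint angle recomputes only the transforms on the leaf-to-root path. That is O(log n) work, which matches the O(k log(n/k)) bound for k = 1.

```python
import numpy as np


def link_transform(q, length):
    """Planar homogeneous transform for one link: rotate by joint angle q, then translate along x."""
    c, s = np.cos(q), np.sin(q)
    return np.array([[c, -s, c * length],
                     [s, c, s * length],
                     [0.0, 0.0, 1.0]])


class TransformTree:
    """Balanced binary tree; node i stores the ordered product of the link transforms below it.

    Hypothetical sketch of the transform hierarchy only (no bounding volumes).
    """

    def __init__(self, angles, lengths):
        self.n = len(angles)
        self.size = 1
        while self.size < self.n:
            self.size *= 2
        self.lengths = list(lengths)
        self.nodes = [np.eye(3) for _ in range(2 * self.size)]  # padding leaves stay identity
        for i, (q, l) in enumerate(zip(angles, lengths)):
            self.nodes[self.size + i] = link_transform(q, l)
        for i in range(self.size - 1, 0, -1):
            self.nodes[i] = self.nodes[2 * i] @ self.nodes[2 * i + 1]  # left sub-chain first

    def set_angle(self, i, q):
        """Change joint i and refresh only the shortcut transforms on its path to the root."""
        node = self.size + i
        self.nodes[node] = link_transform(q, self.lengths[i])
        node //= 2
        while node >= 1:
            self.nodes[node] = self.nodes[2 * node] @ self.nodes[2 * node + 1]
            node //= 2

    def end_pose(self):
        """Transform from the base frame to the frame at the end of the chain (the root node)."""
        return self.nodes[1]
```

A tree with one node per link keeps every sub-chain transform available. A query about the relative position of two distant links can therefore combine a few stored products instead of walking the whole chain. This is the property that lets the ChainTree skip rigid sub-chains.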
Experimental Setup
• Energy function:
  - van der Waals
  - electrostatic
  - attraction between native contacts
  - cutoff at 12Å
• 300,000-step MCS with Grid and with ChainTree
• Steps are the same with both methods
• Early rejection for large vdW terms

Results: 1-DOF change
[Bar chart: speedup vs. number of amino acids (68, 144, 374, 755); values shown: 12.5, 7.8, 5.8, 3.5]

Results: 5-DOF change
[Bar chart: speedup vs. number of amino acids (68, 144, 374, 755); values shown: 5.9, 4.5, 3.4, 2.2]

Two-Pass ChainTree (ChainTree+)
• 1st pass: small cutoff distance to detect steric clashes
• 2nd pass: normal cutoff distance
[Chart: tests around the native state; value shown: >5]

Interaction with Solvent
• Explicit solvent models: 100s or 1000s of discrete solvent molecules
• Implicit solvent models: solvent as a continuous medium; the interface is the solvent-accessible surface
E. Eyal, D. Halperin. Dynamic Maintenance of Molecular Surfaces under Conformational Changes. http://www.give.nl/movie/publications/telaviv/EH04.pdf

Summary
• Inverse kinematics techniques → improve structure determination from fuzzy electron density maps
• Collision detection techniques → speed up energy maintenance during Monte Carlo simulation

About Computational Biology
• Computational biology is more than using computers to solve biological problems or mimicking nature (e.g., performing MD simulation)
• One of its goals is to achieve algorithmic efficiency by exploiting properties of molecules, e.g.:
  - proteins are long kinematic chains
  - atoms cannot bunch up together
  - forces have relatively short ranges
