CONFORMATIONAL OPTIMIZATION AND SAMPLING ALONG NATURAL COORDINATES
Peter Minary
Computational Structural Biology Group & Bio-X Center
Stanford University, Stanford, CA 94305

TALK OUTLINE
– Obstacles for Deciphering the Central Dogma of Molecular Biology
– Challenges for Optimization & Sampling Algorithms
– Natural Coordinates for Biological Macromolecules
– Chain Closure Algorithms: Obstacles & Solutions
– An Atomic-Level Insight into the Central Dogma
  • Nucleosome Positioning / Large-Scale Optimization
  • Structure Space of RNA Junctions and Fractals
  • Interpretation & Refinement of Experimental Data

CENTRAL DOGMA OF MOLECULAR BIOLOGY
[Figure: the central dogma after F. H. C. Crick(1), with Translation, Folding, Post-Transcriptional Regulation and Motion leading to FUNCTION]
"If you want to understand function, study structure." F. H. C. Crick
(1) F. H. C. Crick, Nature 227, 561-563 (1970).

TRANSCRIPTIONAL REGULATION
[Figure: DNA ...GTCCAGTTACGAATTGCGCGC... bound by a transcription factor (TF) and nucleosomes; nucleosome structure and nucleosome positioning; 3D structure ~ ...GTCCAGTTACGAATTGCGCGC...; DNA in chromatin; scanning DNA with an energy E(X_i) for the TF site ...GTGAATGCCCAG...]
– Grand Challenges for CSB
  • Structure-Based Prediction of Nucleosome Positions
  • Structure-Based Prediction of Transcription Factor Binding Sites
  • Requires All-Atom Representation & Rapid Optimization
  • Simultaneously Explore Sequence and Structure Space
  • Need Conceptually Novel Optimization/Sampling Tools

POST-TRANSCRIPTIONAL REGULATION
– Grand Challenges for CSB
  • Prediction of RNA Tertiary Structure & Transport Protein Binding Sites
  • Need a Novel Optimization/Sampling Approach
EXAMPLE: mRNA TRANSPORT IN NEURONS

PROTEIN MOTION
[Figure: EM images of a molecular complex, Fatty Acid Synthase (FAS)]
– Current Trend: Experimentally Measured Structures Are Getting
  • Larger in Size
  • Higher in Flexibility
  • Lower in Resolution
– Current Refinement Methods Model Atomic Motions As
  • Independent
  • Isotropic
  • Harmonic
– To Follow the Trend, Atomic Motion in Refinement Methods Should Be
  • Collective
  • Anisotropic
  • Anharmonic
– Demand for Novel Optimization Methods for Structure Refinement

CHALLENGES FOR OPTIMIZATION & SAMPLING ALGORITHMS
– Roughness of the Objective Function, E(X)
  • Leads to rare events in Markov chain Monte Carlo(1)
  • Solutions
    – Multiple Markov chains in the temperature(2) or energy domain(3, 4)
    – Transformation of variables(5) and/or use of extra dimensions(6)
– Large Number of Degrees of Freedom, $N_d$
  • The number of energy basins grows non-polynomially with $N_d$
  • Solutions
    – Local or global torsional degrees of freedom(4, 7)
    – Arbitrary/most relevant/natural degrees of freedom(9)
(1) Metropolis et al., J. Chem. Phys. 21, 1087-1091 (1953).
(2) Geyer, Proceedings of the 23rd Symposium on the Interface, 156-163 (1991).
(3) Kou et al., Annals of Statistics 34, 1581-1619 (2006).
(4) Minary et al., Annals of Statistics 34, 1638-1642 (2006).
(5) Minary et al., SIAM Journal on Scientific Computing 30, 2055-2083 (2008).
(6) Minary et al., J. Chem. Phys. 118, 2510-2525 (2003).
(7) Minary & Levitt, J. Mol. Biol. 375, 920-933 (2008).
(8) Dodd et al., Mol. Phys. 78, 961-996 (1993).
(9) Minary & Levitt, J. Comp. Biol. 17(8), 993-1010 (2010).

NATURAL DEGREES OF FREEDOM FOR NUCLEIC ACIDS
[Figure: base-pair reference frames (x, y, z axes) and backbone atoms P, O5′, C5′, C4′, O3′, O1′, N, RC]
– Base-pair step parameters: Shift (Dx), Slide (Dy), Rise (Dz), Tilt (τ), Roll (ρ), Twist (ω)
– Base-pair parameters: Shear (Sx), Stretch (Sy), Stagger (Sz), Buckle (κ), Propeller (π), Opening (σ)
– dof: 10 per nucleotide (4 + 12 × ½)
– Moves break the chain!

NATURAL DEGREES OF FREEDOM FOR PROTEINS
β-SHEET & α-HELIX
– Shear (Sx), Stretch (Sy), Stagger (Sz), Buckle (κ), Propeller (π), Opening (σ)
– Moves break the chain!

CHAIN CLOSURE ALGORITHMS
– Analytical multi-atom closure algorithms(1)
  • $N_{cd}$ non-linear equations in $N_{cd}$ unknowns, where $N_{cd}$ is the number of closure dof
  • $N_{cd} = 6$ is the practical limit, given that the complexity $O(f_{NP}(N_{cd}))$ is non-polynomial in $N_{cd}$
– Single-atom Deterministic Full Closure (DFC)(2)
  • Cost efficient
  • Two solutions or no solution
– Single-atom Stochastic Partial Closure (SPC)(3)
  • Cost efficient
  • A solution always exists, for any size of chain break
(1) Dodd et al., Mol. Phys. 78, 961-996 (1993).
(2) Sklenar et al., J. Comp. Chem. 27, 309-315 (2006).
(3) Minary & Levitt, J. Comp. Biol. 17(8), 993-1010 (2010).

RECURSIVE STOCHASTIC CLOSURE
1 cycle of RSC = DFC[ SPC[ SPC[ SPC[ … ] ] ] ]
[Figure: the molten zone after the 1st cycle and after m cycles]
• One SPC step
  – Restores 4-5, breaks 3-4
• Multiple SPC steps
  – Propagate the chain break
  – Narrow the closure gap
• DFC: final full closure
• Algorithmic complexity: $O(N_{cd}) \ll O(f_{NP}(N_{cd}))$, with $N_{cd} = 2N_m + 5$
Minary & Levitt, J. Comp. Biol. 17(8), 993-1010 (2010).

MONTE CARLO RECURSIVE STOCHASTIC CLOSURE-I
[Figure: molten zone (C4′ … O3′)]
Minary & Levitt, J. Comp. Biol. 17(8), 993-1010 (2010).

MONTE CARLO RECURSIVE STOCHASTIC CLOSURE-II
• Monte Carlo Minimization (MCM)(1) is Monte Carlo on the locally minimized energy $\tilde{E}(X) = \min_{X} E(X)$
• MCRSC(2) is Monte Carlo on $\tilde{E}(X_i, X_d) = \min_{X_d} E(X_i, X_d)$, where $X_i$ are the invariant dof and $X_d$ are relaxed by N cycles of RSC

                      MCM           MCRSC
  minimization        BFGS, CG      N cycles of RSC
  invariant DOF       none          X_i
  DOF (X)             cart/tors     arbitrary
  E evaluations       ~10-1000      1

(1) Wales, D. J., Scheraga, H. A., Science 285, 1368-1372 (1999).
(2) Minary, P., Levitt, M., J. Comp. Biol. 17(8), 993-1010 (2010).
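To make the MCM column of the table concrete, here is a minimal Python sketch of Monte Carlo minimization in the basin-hopping sense of Wales & Scheraga: every random move is followed by a local minimization, and the Metropolis test is applied to the minimized energy $\tilde{E}$ only. It is a toy under stated assumptions, not the MOSAICS or MCRSC code: the one-dimensional rough energy, the plain steepest descent standing in for BFGS/CG, and all function names (`energy`, `gradient`, `local_minimize`, `monte_carlo_minimization`) are hypothetical. In MCRSC the inner minimization would instead be N cycles of RSC acting on $X_d$.

```python
import math
import random


def energy(x):
    """Hypothetical rough 1-D energy: a broad quadratic well plus fast ripples."""
    return 0.1 * x * x + 2.0 * math.sin(5.0 * x)


def gradient(x):
    """Analytical derivative of energy(x)."""
    return 0.2 * x + 10.0 * math.cos(5.0 * x)


def local_minimize(x, step=0.01, max_iter=2000, tol=1e-8):
    """Steepest descent to the bottom of the current basin (stand-in for BFGS/CG)."""
    for _ in range(max_iter):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x


def monte_carlo_minimization(x0, n_steps=500, max_move=1.0, kT=1.0, seed=0):
    """Metropolis Monte Carlo on the minimized energy E~(x) = E(local_minimize(x))."""
    rng = random.Random(seed)
    x = local_minimize(x0)
    e = energy(x)
    best_x, best_e = x, e
    for _ in range(n_steps):
        # One MCM trial: random perturbation, then minimization into a basin.
        trial = local_minimize(x + rng.uniform(-max_move, max_move))
        e_trial = energy(trial)
        delta = e_trial - e
        # Metropolis acceptance applied to minimized energies only.
        if delta <= 0.0 or rng.random() < math.exp(-delta / kT):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


best_x, best_e = monte_carlo_minimization(2.0)
print(f"lowest minimum found: x = {best_x:.4f}, E = {best_e:.4f}")
```

Because barriers between ripples are removed by the minimization, the walk only has to compare basin bottoms; on this toy surface the global minimum lies near x ≈ -0.313 with E ≈ -1.99.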
RECURSIVE STOCHASTIC vs DETERMINISTIC FULL CLOSURE IN MONTE CARLO: B-DNA
[Figure: natural dof Dx, Dy, Dz, Sx, Sy, Sz (dof: 6) on E2-binding DNA 5′-ACCGAATTCGGT-3′; force field: AMBER99-bsc0]
• RSC works with move sizes an order of magnitude larger than DFC
• RSC is like a wire: pull the system and it deforms to follow the change
Minary & Levitt, J. Comp. Biol. 17(8), 993-1010 (2010).

RECURSIVE STOCHASTIC CLOSURE vs LOOP TORSIONAL SAMPLING IN MONTE CARLO: AN α+β PROTEIN
• $N_{cd} = 19$(1)
• SCOP id: d1div_2, 55-residue domain(2)
(1) Minary & Levitt, J. Comp. Biol. 17(8), 993-1010 (2010).
(2) Minary & Levitt, J. Mol. Biol. 375, 920-933 (2008).

APPLICATIONS

THE METHOD: GENERAL PIPELINE (IN SILICO NUCLEOSOME POSITIONING)

APPLICATION TO CHROMOSOME 14 (IN SILICO NUCLEOSOME POSITIONING)
• Yeast Chromosome 14
  – 187k-189k from SGD(1)
  – Experimental data(2)
• Nucleosome template
  – 1.9 Å resolution
  – PDB code 1kx3(3)
• Slide the nucleosome along the DNA
  – Slide a 147 bp window
  – Design the template
• Run MCRSC(4) on all structures
  – Force field: AMBER99-bsc0(5)
  – Software: MOSAICS(6)
• Get the probability profile, ab initio vs in vitro
  – $P(i) \propto \exp(-\beta \langle E(i) \rangle)$
[Figure: P(i) ab initio and in vitro along positions 187k-207k]
(1) Cherry, J. M. et al., Nucleic Acids Res. 26, 73-79 (1998).
(2) Kaplan, N. et al., Nature 458, 362-366 (2009).
(3) Davey, C. A. et al., J. Mol. Biol. 319, 1097-1113 (2002).
(4) Minary & Levitt, J. Comp. Biol. 17(8), 993-1010 (2010).
(5) Perez et al., Biophys. J. 92, 3817-3827 (2007).
(6) Minary (2010).
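The last step of the pipeline turns per-window energies into a positioning profile. A minimal Python sketch of that bookkeeping follows, assuming one ensemble-averaged energy ⟨E(i)⟩ per 147 bp window start i has already been obtained (for example from MCRSC runs). The helper names `window_starts` and `positioning_profile`, the toy energies, and the normalization over all windows are assumptions for illustration; the slide only states the proportionality.

```python
import math

NUCLEOSOME_BP = 147  # DNA length wrapped on the nucleosome template


def window_starts(segment_len, window=NUCLEOSOME_BP):
    """Start index of every window obtained by sliding the nucleosome 1 bp at a time."""
    if segment_len < window:
        return []
    return list(range(segment_len - window + 1))


def positioning_profile(mean_energies, beta):
    """Return P(i) proportional to exp(-beta * <E(i)>), normalized to sum to 1.

    The largest exponent is subtracted first (log-sum-exp trick) so that
    energies of large magnitude do not overflow or underflow math.exp.
    """
    exponents = [-beta * e for e in mean_energies]
    shift = max(exponents)
    weights = [math.exp(x - shift) for x in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical averaged energies for three neighbouring windows (arbitrary units).
profile = positioning_profile([-10.0, -12.0, -10.0], beta=1.0)
print([round(p, 4) for p in profile])
```

Lower averaged energy means higher occupancy probability: in this toy case the middle window is favored over each neighbour by a factor of e² ≈ 7.4. A 2,000 bp segment, for instance, yields 1,854 window starts.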
NUCLEOSOME OCCUPANCY (IN SILICO NUCLEOSOME POSITIONING)
[Figure: P(i) in vitro, ab initio and in vivo along Yeast Chromosome 14, positions i = 187000-207000, with a close-up of i = 191000-199000]
Minary & Levitt

HIERARCHICAL NATURAL DOFs/MOVES (HNM) (EXPLORING RNA STRUCTURE SPACE)
[Figure: hierarchy levels L1, L2, L3, L4]

RNA 4-WAY JUNCTION: SAMPLING METHODS (EXPLORING RNA STRUCTURE SPACE)
– Sampling methods
  • MCRSC(1)
  • NM-MC(1,3): move set L1
  • HNM-MC(1,2,3): move set L1 – L4
– Move sets(1,2,3): L1 + L2 + … gives L1 – L2, L1 – L3, L1 – L4
  • Plus user-defined move sets (Medicine/Physics, Chemistry/Biology)
(1) Minary, P., Levitt, M., J. Comp. Biol. 17(8), 993-1010 (2010).
(2) Sim, A., Levitt, M., Minary, P., to be submitted.
(3) Minary, P., MOSAICS: http://csb.stanford.edu/minary/MOSAICS

RNA 4-WAY JUNCTION (EXPLORING RNA STRUCTURE SPACE)
[Figure: sampled structures from (a) NM-MC(1,5), L1; (b) FA-MC-Sym(2); (c) FA-Rosetta(3); (d) HNM-MC(1,4,5), L1 – L4]
• Necessary condition for unbiased sampling
  – Symmetric RNA → distributions coincide
• Easy to improve with a field-specific move set
  – RNA: relative arrangement of stem loops
• Compared with fragment assembly (FA)
  – FA sampling is biased and non-continuous
  – FA depends on fragment libraries
(1) Minary, P., Levitt, M., J. Comp. Biol. 17(8), 993-1010 (2010).
(2) Parisien and Major, Nature 452, 51 (2008).
(3) Das, R., Karanicolas, J., Baker, D., Nat. Methods 7(4), 291 (2010).
(4) Sim, A., Levitt, M., Minary, P., to be submitted.
(5) Minary, P., MOSAICS: http://csb.stanford.edu/minary/MOSAICS

FRACTAL RNA: BEYOND CURRENT METHODS (EXPLORING RNA STRUCTURE SPACE)
[Figure: error(i) versus i (×10^4) for move sets L1 – L4 and L1 – L7]
• Necessary condition for unbiased sampling
  – Symmetric RNA → arm-end distributions coincide
• Further improvement by adding L5, L6, L7
  – No limitation on improvement
• Benchmark with different move sets
  – Accuracy converges by L7(1,2,3)
(1) Minary, P., Levitt, M., J. Comp. Biol. 17(8), 993-1010 (2010).
(2) Sim, A., Levitt, M., Minary, P., to be submitted.
(3) Minary, P., MOSAICS: http://csb.stanford.edu/minary/MOSAICS

FRACTAL RNA: WHY/HOW DOES IT WORK? (EXPLORING RNA STRUCTURE SPACE)
• Use embedded subspaces, one per hierarchy level (L3, L2, L1)
• In particular
  – L3: 6 DOFs per main arm(2)
  – L2: 6 DOFs per arm of an arm(2)
  – L1: 10 DOFs per nucleotide(1)
• Low-cost method to approximate $\int dL\, \rho(L)\, f(L)$
• Multi-scale integration(3)
  – along L3
  – along L2, around all L3
  – along L1, around all L2
(1) Minary, P., Levitt, M., J. Comp. Biol. 17(8), 993-1010 (2010).
(2) Sim, A., Levitt, M., Minary, P., to be submitted.
(3) Minary, P., MOSAICS: http://csb.stanford.edu/minary/MOSAICS

OBJECTIVE (CRYO-EM REFINEMENT)
[Figure: EM images of a molecular complex, Fatty Acid Synthase (FAS); initial model, EM image, refined model]

VALIDATION I (CRYO-EM REFINEMENT)
[Figure: optimization(1)-(3) along natural dof takes the initial structure (18 Å rmsd) to the refined structure (2 Å rmsd), guided by the target projection of the target structure]
(1) Zhang, Minary, Levitt, in preparation.
(2) Minary & Levitt, J. Comp. Biol. 17(8), 993-1010 (2010).
(3) Minary, P., MOSAICS: http://csb.stanford.edu/minary/MOSAICS

VALIDATION II: CROSS CORRELATION OF MAPS (CRYO-EM REFINEMENT)
[Figure: cross correlation (cc) versus projection angle, lysozyme]

THE PROTOCOL (CRYO-EM REFINEMENT)
$E_{\text{total}} = \text{Weight} \cdot E_{\text{EM}} + E_{\text{molecule}}$
[Figure: lysozyme]

REFINEMENT (CRYO-EM REFINEMENT)

DOMAIN FLEXIBILITY (CRYO-EM REFINEMENT)
[Figure: refinement results(1)-(3) and data(4)]
(1) Zhang, Minary, Levitt, in preparation.
(2) Minary & Levitt, J. Comp. Biol. 17(8), 993-1010 (2010).
(3) Minary, P., MOSAICS: http://csb.stanford.edu/minary/MOSAICS
(4) Courtesy of Steve Ludtke, Baylor College, Texas.
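The protocol's objective and the map cross correlation used for validation can be sketched in a few lines of Python. This is an illustration under explicit assumptions, not the refinement code itself: maps are flat lists of voxel (or pixel) values on a common grid, cc is the normalized (Pearson) cross correlation, and the fit term is assumed to be E_EM = -cc so that a better fit lowers the total energy. The names `cross_correlation` and `total_energy` and the example maps are hypothetical.

```python
import math


def cross_correlation(map_a, map_b):
    """Normalized (Pearson) cross correlation of two maps sampled on the same grid.

    Returns a value in [-1, 1]; 1 means the maps agree up to a positive linear
    rescaling of density.
    """
    if len(map_a) != len(map_b) or not map_a:
        raise ValueError("maps must be non-empty and share the same grid")
    n = len(map_a)
    mean_a = sum(map_a) / n
    mean_b = sum(map_b) / n
    dev_a = [a - mean_a for a in map_a]
    dev_b = [b - mean_b for b in map_b]
    numerator = sum(x * y for x, y in zip(dev_a, dev_b))
    denominator = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    if denominator == 0.0:
        raise ValueError("cross correlation is undefined for a constant map")
    return numerator / denominator


def total_energy(model_map, target_map, e_molecule, weight):
    """E_total = weight * E_EM + E_molecule, with the assumed fit term E_EM = -cc."""
    e_em = -cross_correlation(model_map, target_map)
    return weight * e_em + e_molecule


# Hypothetical 1-D density profiles standing in for a model map and a target map.
target = [0.0, 1.0, 3.0, 1.0, 0.0]
model = [0.1, 0.9, 2.5, 1.2, 0.0]
print(round(cross_correlation(model, target), 4))
print(total_energy(model, target, e_molecule=-120.0, weight=50.0))
```

The weight sets the balance between fitting the map and keeping the molecule physically sensible; a larger weight pulls harder toward the EM density at the expense of E_molecule.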
CONCLUSION
• CSB Has Limited Impact due to Inefficient Conformational Sampling
• Novel Algorithms Supporting Natural DOF May Offer the Solution
• Our Novel Approach May Open New Avenues
  – In the Refinement and Interpretation of Experimental Data
  – In the Use of Structural Information in Molecular Biology
• Atomic-Level Understanding of the Central Dogma of Molecular Biology (CDMB) May Become a Reality with Natural Coordinates (NC)
"If the code does indeed have some logical foundation then it is legitimate to consider all the evidence, both good and bad, in any attempt to deduce it." F. H. C. Crick

ACKNOWLEDGEMENTS
– Michael Levitt, Computer Sci. & Structural Biology, Stanford, US
– Jernej Ule, Molecular Biology/MRC, Cambridge, UK
– Peter Lukavsky, Molecular Biology/MRC, Cambridge, UK
– Sebastian Doniach, Physics, Stanford, US
– Zev Bryant, Bioengineering, Stanford, US
– Wing H Wong, Statistics, Stanford, US
– Wah Chiu, Baylor College, Texas, US
– Adelene Sim, Physics, Stanford, US (graduate student)
– Gaurav Chopra, Mathematics, Stanford, US (graduate student)
– Junjie Zhang, Baylor College and Stanford, US (postdoc)
– Anatole von Lilienfeld and the Workshop Organizing Committee