Peter Minary Computational Structural Biology Group & Bio-X Center Stanford University

CONFORMATIONAL OPTIMIZATION AND SAMPLING
ALONG NATURAL COORDINATES
Peter Minary
Computational Structural Biology Group & Bio-X Center
Stanford University
Stanford, CA 94305
1
TALK OUTLINE
– Obstacles for Deciphering the Central Dogma of MB
– Challenges for Optimization & Sampling Algorithms
– Natural Coordinates for Biological Macromolecules
– Chain Closure Algorithms, Obstacles & Solutions
– An Atomic Level Insight into the Central Dogma
• Nucleosome Positioning/Large Scale Optimization
• Structure Space of RNA Junctions and Fractals
• Interpretation & Refinement of Experimental Data
2
CENTRAL DOGMA OF MOLECULAR BIOLOGY
Translation
Folding
Post
Transcriptional
Regulation
(1) F. H. C. Crick et al. Nature 227 561-563 (1970).
Motion
F. H. Crick(1)
FUNCTION
“If you want to understand function,
study structure.” F. H. C. Crick
3
CENTRAL DOGMA OF MOLECULAR BIOLOGY
Translation
Folding
Post
Transcriptional
Regulation
(1) F. H. C. Crick et al. Nature 227 561-563 (1970).
Motion
F. H. Crick(1)
4
FUNCTION
TRANSCRIPTIONAL REGULATION
[Figure: DNA sequence ...GTCCAGTTACGAATTGCGCGC… and its 3D structure; nucleosome structure and nucleosome positioning for DNA in chromatin; a TF scanning DNA]
– Grand Challenges for CSB
• Structure Based Prediction of Nucleosome Positions
• Structure Based Prediction of Transcription Factor (TF) Binding Sites
• Requires All Atom Representation & Rapid Optimization
• Simultaneously Explore Sequence and Structure Space
• Need Conceptually Novel Optimization/Sampling Tools
[Figure: TF on the sequence …..GTGAATGCCCAG….. with energy E(Xi)]
5
CENTRAL DOGMA OF MOLECULAR BIOLOGY
Translation
Folding
Post
Transcriptional
Regulation
(1) F. H. C. Crick et al. Nature 227 561-563 (1970).
Motion
F. H. Crick(1)
6
FUNCTION
POST TRANSCRIPTIONAL REGULATION
– Grand Challenges for CSB
• Prediction of RNA Tertiary Structure & Transport Protein Binding Sites
• Need a Novel Optimization/Sampling (O/S) Approach
EXAMPLE: mRNA TRANSPORT IN NEURONS
CENTRAL DOGMA OF MOLECULAR BIOLOGY
Translation
Folding
Post
Transcriptional
Regulation
(1) F. H. C. Crick et al. Nature 227 561-563 (1970).
Motion
F. H. Crick(1)
8
FUNCTION
PROTEIN MOTION
EM images of Molecular Complex
[Figure label: FAS, Fatty Acid Synthase]
– In Current Trend: Experimentally Measured Structures Are Getting
• Larger in Size
• Higher in Flexibility
• Lower in Resolution
– In Current Refinement Methods Atomic Motions Are Modeled As
• Independent
• Isotropic
• Harmonic
– To Follow the Trend Atomic Motion in Refinement Methods Should Be
• Collective
• Anisotropic
• Anharmonic
– Demand for Novel Optimization Methods for Structure Refinement
9
CHALLENGES FOR OPTIMIZATION & SAMPLING ALGORITHMS
– Roughness of the objective function, E(X)
• Leads to rare events in Markov Chain MC(1)
• Solutions
– Multiple Markov Chains in Temperature(2)/Energy Domain(3, 4)
– Transformation of Variables(5) and/or using Extra Dimensions(6)
– Large number of degrees of freedom, Nd
• Number of energy basins is non-polynomial in Nd
• Solutions
– Local or Global Torsional Degrees of Freedom(4,7)
– Arbitrary/Most Relevant/Natural Degrees of Freedom(9)
(1) Metropolis, et al. J. Chem. Phys. 21, 1087-1091 (1953).
(2) Geyer, et al. Proceedings of the 23rd Symposium on the Interface, 156-163 (1991).
(3) Kou, et al. Annals of Statistics 34 1581-1619 (2006).
(4) Minary et al. Annals of Statistics 34 1638-1642 (2006).
(5) Minary et al. SIAM Journal of Scientific Computing 30 2055-2083 (2008).
(6) Minary et al. J. Chem. Phys. 118 2510-2525 (2003).
(7) Minary et al. J. Mol. Biol. 25 920-933 (2008).
(8) Dodd et al. Mol. Phys. 78 961-996 (1993).
(9) Minary & Levitt J. Comp. Biol. 17(8) 993-1010 (2010).
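A minimal sketch of the first remedy on this slide, Metropolis sampling(1) with multiple Markov chains in the temperature domain(2). The 1-D energy, the ladder of inverse temperatures, and all function names are hypothetical choices made for illustration; they are not taken from the cited papers.

```python
import math
import random

def energy(x):
    """Toy rough 1-D objective (an assumption): quadratic well plus a ripple."""
    return 0.1 * x * x + math.sin(5.0 * x)

def metropolis_step(x, beta, step, rng):
    """One Metropolis move: accept the trial with probability min(1, exp(-beta * dE))."""
    trial = x + rng.uniform(-step, step)
    d_e = energy(trial) - energy(x)
    if d_e <= 0.0 or rng.random() < math.exp(-beta * d_e):
        return trial
    return x

def swap_accept(beta_a, beta_b, e_a, e_b, rng):
    """Replica-exchange test: accept with min(1, exp((beta_a - beta_b) * (e_a - e_b)))."""
    log_ratio = (beta_a - beta_b) * (e_a - e_b)
    return log_ratio >= 0.0 or rng.random() < math.exp(log_ratio)

def parallel_tempering(betas, n_sweeps, step=0.5, seed=0):
    """Run one chain per inverse temperature and return the lowest-energy x seen."""
    rng = random.Random(seed)
    xs = [3.0 for _ in betas]
    best = min(xs, key=energy)
    for _ in range(n_sweeps):
        xs = [metropolis_step(x, b, step, rng) for x, b in zip(xs, betas)]
        k = rng.randrange(len(betas) - 1)          # pick a neighbouring pair
        if swap_accept(betas[k], betas[k + 1], energy(xs[k]), energy(xs[k + 1]), rng):
            xs[k], xs[k + 1] = xs[k + 1], xs[k]
        best = min([best] + xs, key=energy)
    return best

best_x = parallel_tempering([4.0, 1.0, 0.25], n_sweeps=500)
print(f"lowest energy found: {energy(best_x):.3f}")
```

The hot chains cross barriers that the cold chain rarely crosses, and the swap step passes those configurations down the ladder.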
10
NATURAL DEGREES of FREEDOM for NUCLEIC ACIDS
Rigid-body translations and rotations:
• Dx Shift, Dy Slide, Dz Rise
• τ Tilt, ρ Roll, ω Twist
Base-pair parameters:
• Sx Shear, Sy Stretch, Sz Stagger
• κ Buckle, π Propeller, σ Opening
dof: 10 (4+12x½)
Moves break the chain!
[Figure: each parameter drawn in its local x, y, z frame; labeled atoms O3′, O1′, P, O5′, C5′, C4′, N, RC]
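A minimal sketch of why a rigid-body move along these coordinates breaks the chain. The rotation order Rz(twist)·Ry(roll)·Rx(tilt), the toy coordinates, and the helper names are assumptions made for illustration; the slide does not specify a frame convention.

```python
import math

def matmul(m, n):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(m[i][k] * n[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def rotation(tilt_deg, roll_deg, twist_deg):
    """Rz(twist) * Ry(roll) * Rx(tilt); this composition order is an assumed convention."""
    a, b, c = (math.radians(v) for v in (tilt_deg, roll_deg, twist_deg))
    rx = [[1, 0, 0], [0, math.cos(a), -math.sin(a)], [0, math.sin(a), math.cos(a)]]
    ry = [[math.cos(b), 0, math.sin(b)], [0, 1, 0], [-math.sin(b), 0, math.cos(b)]]
    rz = [[math.cos(c), -math.sin(c), 0], [math.sin(c), math.cos(c), 0], [0, 0, 1]]
    return matmul(rz, matmul(ry, rx))

def rigid_move(point, origin, shift_slide_rise, tilt_roll_twist):
    """Rotate `point` about `origin`, then translate by (Dx, Dy, Dz)."""
    rot = rotation(*tilt_roll_twist)
    local = [p - o for p, o in zip(point, origin)]
    turned = [sum(rot[i][k] * local[k] for k in range(3)) for i in range(3)]
    return [t + o + d for t, o, d in zip(turned, origin, shift_slide_rise)]

def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Toy geometry (hypothetical): O3' stays on the fixed nucleotide, P rides on the moved one.
o3_prime = [0.0, 0.0, 0.0]
phosphorus = [1.6, 0.0, 0.0]
frame_origin = [0.0, 5.0, 0.0]

bond_before = distance(o3_prime, phosphorus)
moved_p = rigid_move(phosphorus, frame_origin, (0.0, 0.0, 1.0), (0.0, 0.0, 10.0))
bond_after = distance(o3_prime, moved_p)
print(f"O3'-P distance before: {bond_before:.2f}, after: {bond_after:.2f}")
```

A 1 Å rise combined with a 10° twist stretches the O3′-P link well past its bonded length, so a chain closure step must follow every such move.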
11
NATURAL DEGREES of FREEDOM for PROTEINS
β-SHEET & α-HELIX
• Sx Shear, Sy Stretch, Sz Stagger
• κ Buckle, π Propeller, σ Opening
Moves break the chain!
[Figure: the parameters drawn in a local x, y, z frame for β-sheet and α-helix elements]
12
CHAIN CLOSURE ALGORITHMS
– Analytical multi atom closure algorithms(1)
• Ncd non-linear equations in Ncd unknowns, where Ncd is the number of closure dof
• Ncd = 6 is the practical limit, given that the complexity is O(fNP(Ncd)), with fNP non-polynomial in Ncd
– Single atom Deterministic Full Closure (DFC)(2)
• Cost efficient
• Two solutions or No solution
– Single atom Stochastic Partial Closure (SPC)(3)
• Cost efficient
• A solution always exists for any size of the chain break
(1) Dodd et al. Mol. Phys. 78 961-996 (1993).
(2) Sklenar et al. J. Comp Chem. 27 309-315 (2005).
(3) Minary & Levitt J. Comp. Biol. 17(8) 993-1010 (2010).
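A minimal sketch of why a single-atom closure can have "two solutions or no solution". It places one atom at fixed distances from three anchors (three-sphere intersection). This is a generic geometric stand-in, not the DFC algorithm of ref. (2); the function names and the choice of three distance constraints are assumptions.

```python
import math

def sub(a, b): return [x - y for x, y in zip(a, b)]
def add(a, b): return [x + y for x, y in zip(a, b)]
def scale(a, s): return [x * s for x in a]
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def norm(a): return math.sqrt(dot(a, a))
def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def place_atom(c1, c2, c3, r1, r2, r3):
    """Positions at distances r1, r2, r3 from anchors c1, c2, c3 (anchors not collinear).

    Returns two mirror-image positions, or an empty list when the constraints
    cannot be met.
    """
    d = norm(sub(c2, c1))
    ex = scale(sub(c2, c1), 1.0 / d)               # local x axis along c1 -> c2
    v13 = sub(c3, c1)
    i = dot(ex, v13)
    ey_raw = sub(v13, scale(ex, i))
    j = norm(ey_raw)
    ey = scale(ey_raw, 1.0 / j)                    # local y axis in the anchor plane
    ez = cross(ex, ey)                             # normal to the anchor plane
    x = (r1 * r1 - r2 * r2 + d * d) / (2.0 * d)
    y = (r1 * r1 - r3 * r3 + i * i + j * j) / (2.0 * j) - (i / j) * x
    z_sq = r1 * r1 - x * x - y * y
    if z_sq < 0.0:
        return []                                  # spheres do not intersect
    base = add(c1, add(scale(ex, x), scale(ey, y)))
    offset = scale(ez, math.sqrt(z_sq))
    return [add(base, offset), sub(base, offset)]

anchors = ([0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0])
print(len(place_atom(*anchors, 3 ** 0.5, 3 ** 0.5, 3 ** 0.5)))   # reachable target
print(len(place_atom(*anchors, 0.5, 0.5, 0.5)))                  # gap too wide
```

When the gap is too wide the square root has no real value, which is the "no solution" case that limits the move size a deterministic closure can tolerate.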
13
RECURSIVE STOCHASTIC CLOSURE
1 cycle of RSC = DFC[ SPC[ SPC[ SPC[…] ] ] ]
• One SPC step
– Restores 4-5, breaks 3-4
• DFC
• Multiple SPC steps
– Propagate the chain break
– Narrow the closure gap
• AC = O(Ncd) << O(fNP(Ncd))
– Ncd = 2 Nm + 5
[Figure: molten zone after the 1st cycle and after m cycles]
Minary & Levitt J. Comp. Biol. 17(8) 993-1010 (2010).
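A toy sketch of the "restore one bond, break the previous one" behaviour described on this slide. Each step re-places one atom at the ideal bond length from its already-fixed neighbour, in the style of the FABRIK inverse-kinematics heuristic, with optional random jitter. This stand-in is an assumption: the actual SPC and DFC geometry of Minary & Levitt is not spelled out on the slide, and the bead chain, function names, and parameters are hypothetical.

```python
import math
import random

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def place_at_bond_length(moving, target, bond, jitter, rng):
    """Put `moving` exactly `bond` away from `target`, roughly toward its old position."""
    direction = []
    for m, t in zip(moving, target):
        noise = rng.gauss(0.0, jitter) if jitter > 0.0 else 0.0
        direction.append(m - t + noise)
    length = math.sqrt(sum(c * c for c in direction))
    return [t + bond * c / length for t, c in zip(target, direction)]

def closure_sweep(chain, break_index, anchor_index, bond, jitter=0.0, seed=0):
    """Sweep from the broken bond toward the fixed anchor atom.

    Returns (new_chain, trace); trace[k] is the bond-length violation that
    remains after k re-placements.
    """
    rng = random.Random(seed)
    pts = [list(p) for p in chain]
    trace = [abs(dist(pts[break_index], pts[break_index + 1]) - bond)]
    for j in range(break_index, anchor_index, -1):
        pts[j] = place_at_bond_length(pts[j], pts[j + 1], bond, jitter, rng)  # restores j/j+1
        trace.append(abs(dist(pts[j - 1], pts[j]) - bond))                    # breaks j-1/j
    return pts, trace

bond = 1.0
chain = [[0.8 * i, 0.6 * (i % 2), 0.0] for i in range(8)]    # zigzag with unit bonds
for k in (6, 7):                                             # a rigid move displaces the tail
    chain[k] = [c + d for c, d in zip(chain[k], (0.5, 0.3, 0.4))]
closed, trace = closure_sweep(chain, break_index=5, anchor_index=0, bond=bond)
print([round(g, 3) for g in trace])
```

With jitter set to zero the residual violation cannot grow from one step to the next (triangle inequality), which mirrors the slide's claim that repeated steps propagate the break and narrow the gap before a final exact closure.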
14
MONTE CARLO RECURSIVE STOCHASTIC CLOSURE-I
Molten zone (C4’….O3’)
Minary & Levitt J. Comp. Biol. 17(8) 993-1010 (2010).
15
MONTE CARLO RECURSIVE STOCHASTIC CLOSURE-II
• Monte Carlo Minimization(1) (MCM) is Monte Carlo on
$\tilde{E}(X) = \min_{X} E(X)$
• MCRSC(2) is Monte Carlo on
$\tilde{E}(X_i \oplus X_d) = \min_{X_d} E(X_i \oplus X_d)$

Method   minimization      invariant DOF   X           E evaluation
MCM      BFGS, CG          none            cart/tors   ~10-1000
MCRSC    N cycles of RSC   Xi              arbitrary   1

(1) Wales, D. J., Scheraga, H. A. Science 285 1368-1372 (1999).
(2) Minary, P., Levitt, M. J. Comp. Biol. 17(8) 993-1010 (2010).
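A minimal sketch of Monte Carlo Minimization(1): every trial move is followed by a local minimization, and the Metropolis test is applied to the minimized energies. Plain gradient descent stands in for BFGS/CG here, and the 1-D energy, step sizes, and function names are assumptions made for illustration.

```python
import math
import random

def energy(x):
    """Toy rough objective (an assumption): shallow parabola plus a cosine ripple."""
    return 0.05 * x * x + math.cos(3.0 * x)

def gradient(x):
    return 0.1 * x - 3.0 * math.sin(3.0 * x)

def local_minimize(x, rate=0.02, n_steps=500):
    """Fixed-step gradient descent; each call costs n_steps gradient evaluations."""
    for _ in range(n_steps):
        x -= rate * gradient(x)
    return x

def monte_carlo_minimization(x0, n_moves, beta=2.0, step=1.5, seed=0):
    """Metropolis walk over local minima; returns the best (x, energy) found."""
    rng = random.Random(seed)
    x = local_minimize(x0)
    e = energy(x)
    best_x, best_e = x, e
    for _ in range(n_moves):
        trial = local_minimize(x + rng.uniform(-step, step))
        e_trial = energy(trial)
        if e_trial <= e or rng.random() < math.exp(-beta * (e_trial - e)):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

best_x, best_e = monte_carlo_minimization(8.0, n_moves=50)
print(f"best minimum found at x = {best_x:.3f}, E = {best_e:.3f}")
```

The inner loop is where MCM spends its ~10-1000 energy evaluations per move; in the table above, MCRSC replaces that loop with N cycles of RSC and a single energy evaluation.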
16
RECURSIVE STOCHASTIC vs DETERMINISTIC FULL CLOSURE
in MONTE CARLO: a B-DNA
[Figure: moves along Dx, Dy, Dz, Sx, Sy, Sz, each drawn in its local x, y, z frame]
dof: 6
E2 binding DNA: 5’-ACCGAATTCGGT-3’
Force Field: amber99-bs0
• RSC works with an order of magnitude larger move sizes than DFC
• RSC acts like a wire: you pull on the system and it deforms to follow the change
17
Minary & Levitt J. Comp. Biol. 17(8) 993-1010 (2010).
RECURSIVE STOCHASTIC CLOSURE vs LOOP TORSIONAL
SAMPLING in MONTE CARLO: an α+β PROTEIN
[Figure: recursive stochastic closure(1), Ncd = 19, compared with loop torsional sampling(2)]
SCOP id: d1div_2, 55 residue domain
(1) Minary & Levitt J. Comp. Biol. 17(8) 993-1010 (2010).
(2) Minary & Levitt J. Mol. Biol. 25 920-933 (2008).
18
APPLICATIONS
19
THE METHOD: GENERAL PIPELINE
IN SILICO NUCLEOSOME POSITIONING
20
APPLICATION TO CHROMOSOME 14
IN SILICO NUCLEOSOME POSITIONING
• Yeast Chromosome 14
– 187k-189k from SGD(1)
– Experimental Data(2)
• Nucleosome template
– 1.9 Å resolution
– pdb code (1kx3)(3)
• Slide nucleosome along DNA
– Slide a 147 bp window
– Design template
• Run MCRSC on all structures
– Force field: AMBER99-bs0(5)
– Software: MOSAICS(6)
• Get probability profile
– P(i) ~ exp(-β <E(i)>)
[Figure: ab initio and in vitro profiles P(i) versus position i, axis ticks from 187k to 207k; Minary & Levitt]
(1) Cherry, J. M. et al., Nucleic Acids Res. 26, 73-79 (1998).
(2) Kaplan, N. et al., Nature 458, 362-366 (2009).
(3) Davey, C. A. et al., J. Mol. Biol. 319 1097-1113 (2002).
(4) Minary & Levitt J. Comp. Biol. 17(8) 993-1010 (2010).
(5) Perez et al., Biophys. J. 92 3817-3827 (2007).
(6) Minary (2010).
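A minimal sketch of the last pipeline step, turning the mean energy of each window position i into a profile P(i) ~ exp(-β <E(i)>). Normalizing the weights to sum to one, the energy values, and the value of β are assumptions made for illustration.

```python
import math

def occupancy_profile(mean_energies, beta):
    """Boltzmann-weighted profile from mean energies <E(i)>, normalized to sum to 1.

    The minimum energy is subtracted before exponentiating to avoid overflow;
    this does not change the normalized result.
    """
    e_min = min(mean_energies)
    weights = [math.exp(-beta * (e - e_min)) for e in mean_energies]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical mean energies (kcal/mol) for five window start positions.
mean_energies = [-1204.0, -1206.5, -1203.2, -1207.1, -1205.0]
beta = 1.0 / 0.596          # 1/(kB T) near 300 K in kcal/mol, an assumed temperature
profile = occupancy_profile(mean_energies, beta)
print([round(p, 3) for p in profile])
```

Because the weights are exponential in the energy, positions a few kcal/mol above the best window contribute very little to the profile.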
21
NUCLEOSOME OCCUPANCY
IN SILICO NUCLEOSOME POSITIONING
Yeast Chromosome 14
[Figure: nucleosome occupancy P(i) versus position i, comparing in vitro, ab initio, and in vivo profiles over 187000-207000, with a second panel over 191000-199000; Minary & Levitt]
22
HIERARCHICAL NATURAL DOFs/MOVES (HNM)
EXPLORING RNA STRUCTURE SPACE
[Figure: hierarchy of move levels L1, L2, L3, L4]
23
RNA 4 WAY JUNCTION: SAMPLING METHODS
EXPLORING RNA STRUCTURE SPACE
Sampling Methods and Move Sets(1,2,3)
• MCRSC(1)
• NM-MC(1,3): L1
• HNM-MC(1,2,3): L1 + L2 + ... = L1 – L2, L1 – L3, L1 – L4
• + User Defined Move Sets (Medicine/Physics) (Chemistry/Biology)
(1) Minary, P., Levitt, M. J. Comp. Biol. 17(8) 993-1010 (2010).
(2) Sim, A., Levitt, M., Minary, P. To be submitted.
(3) Minary, P., MOSAICS: http://csb.stanford.edu/minary/MOSAICS
24
RNA 4 WAY JUNCTION
EXPLORING RNA STRUCTURE SPACE
[Figure panels: (a) NM-MC(1,5), L1; (b) FA-MC-Sym(2); (c) FA-Rosetta(3); (d) HNM-MC(1,4,5), L1 - L4]
• Necessary condition for unbiased sampling
– Symmetric RNA -> distributions coincide
• Easy to improve by field specific move set
– RNA : relative arrangement of stem loops
• Comparing to Fragment Assembly
– Biased and non-continuous sampling
– Dependence on fragment libraries
(1) Minary, P., Levitt, M. J. Comp. Biol. 17(8) 993-1010 (2010).
(2) Parisien and Major, Nature, 452, 51 (2008).
(3) R. Das, J. Karanicolas, and D. Baker, Nat. Methods 7 (4), 291 (2010).
(4) Sim, A., Levitt, M., Minary, P., To be submitted.
(5) Minary, P. MOSAICS: http://csb.stanford.edu/minary/MOSAICS
25
FRACTAL RNA: BEYOND CURRENT METHODS
EXPLORING RNA STRUCTURE SPACE
[Figure: error(i) versus i (x 10^4) for HNM-MC(1,2,3) with move sets L1 – L4 and L1 – L7]
• Necessary condition for unbiased sampling
– Symmetric RNA -> arm-end distributions coincide
• Further improvement by L5, L6, L7
– No limitation on improvement
• Benchmark with different move sets
– Accuracy converges by L7(1,2,3)
(1) Minary, P., Levitt, M. J. Comp. Biol. 17(8) 993-1010 (2010).
(2) Sim, A., Levitt, M., Minary, P., To be submitted.
(3) Minary, P. MOSAICS: http://csb.stanford.edu/minary/MOSAICS
26
FRACTAL RNA: WHY/HOW DOES IT WORK?
EXPLORING RNA STRUCTURE SPACE
• Use embedded subspaces
$\Omega_3 \subset \Omega_2 \subset \Omega_1 \subset \Omega$
• In particular
– $\Omega_3$ : 6 DOFs / main arms(2)
– $\Omega_2$ : 6 DOFs / arms of arms(2)
– $\Omega_1$ : 10 DOFs / nucleotides(1)
• Low cost method to approximate
$\int_{\Omega} dL \, \rho(L) f(L), \quad f : \Omega \to \mathbb{R}$
• Multi scale integration(3) along
– $L_3 \in \Omega_3$
– $L_2 \in \Omega_2$ around all $L_3$
– $L_1 \in \Omega_1$ around all $L_2$
(1) Minary, P., Levitt, M. J. Comp. Biol. 17(8) 993-1010 (2010).
(2) Sim, A., Levitt, M., Minary, P., To be submitted.
(3) Minary, P. MOSAICS: http://csb.stanford.edu/minary/MOSAICS
27
CRYO-EM REFINEMENT
OBJECTIVE
[Figure: EM images of a molecular complex, Fatty Acid Synthase (FAS); initial model + EM image -> refined model]
28
CRYO-EM REFINEMENT
VALIDATION I
optimization(1)-(3) along natural dof
[Figure: initial structure (18 Å rmsd) -> refined structure (2 Å rmsd), shown with the target structure and target projection]
(1) Zhang, Minary, Levitt In preparation.
(2) Minary & Levitt J. Comp. Biol. 17(8) 993-1010 (2010).
(3) Minary, P. MOSAICS: http://csb.stanford.edu/minary/MOSAICS
29
VALIDATION II: CROSS CORRELATION OF MAPS
CRYO-EM REFINEMENT
[Figure: cross correlation (cc) versus projection angle for lysozyme]
THE PROTOCOL
CRYO-EM REFINEMENT
$E_{total} = Weight \cdot E_{EM} + E_{molecule}$
[Figure: lysozyme]
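A minimal sketch of the combined refinement objective, E_total = Weight * E_EM + E_molecule. The slides do not define E_EM; here it is assumed to be 1 - cc, where cc is the normalized cross correlation between the model density and the target map. The flattened toy maps and the function names are hypothetical.

```python
import math

def cross_correlation(map_a, map_b):
    """Normalized (mean-subtracted) cross correlation of two equally sized maps."""
    mean_a = sum(map_a) / len(map_a)
    mean_b = sum(map_b) / len(map_b)
    da = [a - mean_a for a in map_a]
    db = [b - mean_b for b in map_b]
    denom = math.sqrt(sum(a * a for a in da) * sum(b * b for b in db))
    if denom == 0.0:
        raise ValueError("cross correlation is undefined for a constant map")
    return sum(a * b for a, b in zip(da, db)) / denom

def total_energy(weight, model_map, target_map, e_molecule):
    """E_total = weight * E_EM + E_molecule, with E_EM = 1 - cc (an assumption)."""
    e_em = 1.0 - cross_correlation(model_map, target_map)
    return weight * e_em + e_molecule

target = [0.0, 1.0, 4.0, 1.0, 0.0]       # toy 1-D "density", flattened
shifted = [1.0, 4.0, 1.0, 0.0, 0.0]      # same shape, misplaced by one voxel
print(total_energy(10.0, target, target, -3.0))
print(total_energy(10.0, shifted, target, -3.0))
```

The weight sets how strongly the map fit competes with the molecular force field; a perfectly fitting model leaves only the molecular term.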
31
REFINEMENT
CRYO-EM REFINEMENT
32
CRYO-EM REFINEMENT
DOMAIN FLEXIBILITY
[Figure: domain flexibility; panels attributed to refs (1)-(3) and (4)]
(1) Zhang, Minary, Levitt In preparation.
(2) Minary & Levitt J. Comp. Biol. 17(8) 993-1010 (2010).
(3) Minary, P. MOSAICS: http://csb.stanford.edu/minary/MOSAICS
(4) Courtesy of Steve Ludtke, Baylor College, Texas.
33
CONCLUSION
• CSB has Limited Impact due to Inefficient Conformational Sampling
• Novel Algorithms Supporting Natural DOF May Offer The Solution
• Our Novel Approach May Open New Avenues
– In The Refinement and Interpretation of Experimental Data
– In The Use of Structural Information in Molecular Biology
• Atomic Level Understanding of the CDMB may be a reality with NC
[Figure: CDMB -> FUNCTION]
“If the code does indeed have some logical foundation then it is legitimate to consider all the evidence, both good and bad, in any attempt to deduce it.” F. H. C. Crick
34
ACKNOWLEDGEMENTS
– Michael Levitt: Computer Sci. & Structural Biology, Stanford, US
– Jernej Ule: Molecular Biology/MRC, Cambridge, UK
– Peter Lukavsky: Molecular Biology/MRC, Cambridge, UK
– Sebastian Doniach: Physics, Stanford, US
– Zev Bryant: Bioengineering, Stanford, US
– Wing H Wong: Statistics, Stanford, US
– Wah Chiu: Baylor College, Texas, US
– Adelene Sim: Physics, Stanford, US (graduate student)
– Gaurav Chopra: Mathematics, Stanford, US (graduate student)
– Junjie Zhang: Baylor College and Stanford, US (postdoc)
– Anatole von Lilienfeld & Workshop Organizing Committee
35