Conformation Networks: an Application to Protein Folding Zoltán Toroczkai

Zoltán Toroczkai, Erzsébet Ravasz (Center for Nonlinear Studies)
Gnana Gnanakaran ((T-10) Theoretical Biology and Biophysics)
Los Alamos National Laboratory

Proteins
• the most complex molecules in nature
• globular or fibrous
• basic functional units of a cell
• chains of amino acids (50 to $10^{3}$)
• peptide bonds link the backbone

Native state
• unique 3D structure (native physiological conditions)
• biological function
• fold in nanoseconds to minutes
• about 1000 known 3D structures: X-ray crystallography, NMR

Myoglobin
• 153 residues, molecular weight 17,181 Da, 1260 atoms
• Main function: primary oxygen storage and carrier in muscle tissue
• It contains a heme (iron-containing porphyrin) group in the center: C34H32N4O4FeHO

Protein conformations
• defined by dihedral angles
  - 2 angles per amino acid, each with 2-3 local minima of the torsion energy
  - N monomers → about $10^{N}$ different conformations

Levinthal's paradox
• Anfinsen: thermodynamic hypothesis
  - the native state is at the global minimum of the free energy
  - Epstein, Goldberger & Anfinsen, Cold Spring Harbor Symp. Quant. Biol. 28, 439 (1963)
• Levinthal's paradox, 1968
  - finding the native state by random sampling is not possible
  - 40-monomer polypeptide: about $10^{40}$ conformations; at $10^{13}$ conf/s it takes $3 \times 10^{19}$ years to sample all
  - the universe is ~ $2 \times 10^{10}$ years old
  - Levinthal, J. Chim. Phys. 65, 44-45 (1968)
• nucleation, folding pathways
  - Wetlaufer, P.N.A.S. 70, 691 (1973)

Free energy landscapes
• Bryngelson & Wolynes, 1987
  - free energy landscape
  - a random hetero-polymer typically does NOT fold
  - Bryngelson & Wolynes, P.N.A.S. 84, 7524 (1987)
• Experiment:
  - random sequences
  - GLU, ARG, LEU
  - 80-100 amino acids
  - ~ 95% did not fold in a stable manner
  - Davidson & Sauer, P.N.A.S. 91, 2146 (1994)

Funnels
• Leopold, Montal & Onuchic, 1992
  - energy funnels
  - Leopold, Montal & Onuchic, P.N.A.S. 89, 8721 (1992)

Given any amino-acid sequence: can we tell if it is a good folder?
• experiments (X-ray, NMR)
• molecular dynamics simulations
• homology modeling
• many folding pathways
Difficult and slow

Molecular dynamics
• State of the art
  - supercomputer (LANL): ribosome in explicit solvent
    - targeted MD
    - $2.64 \times 10^{6}$ atoms ($2.5 \times 10^{5}$ + water)
    - Q machine, 768 processors
    - 260 days of simulation (event: 2 ns)
    - ~ $10^{16}$ times slower than the real event
    - Sanbonmatsu, Joseph & Tung, P.N.A.S. 102, 15854 (2005)
  - distributed computing (Stanford, Folding@home)
    - more than 100,000 CPUs
    - simulation of a complete folding event: BBA5, 23 residues, implicit water
    - 10,000 CPU days per folding event (~1 μs)
    - Shirts & Pande, Science 290, 1903 (2000)
    - Snow, Nguyen, Pande, Gruebele, Nature 420, 102 (2002)

Configuration networks
• Protein conformations
  - dihedral angles have few preferred values: helix, sheet, other
  - [Figure: Ramachandran map of PDB structures]
  - Ramachandran & Sasisekharan, J. Mol. Biol. 7, 95 (1963)
• NODE ↔ configuration
• LINK ↔ change of one degree of freedom (angle)
• refinement of angle values → continuous case

Why networks?
• VERY LARGE: 100 monomers → $10^{100}$ nodes.
• However: generic features of folding are determined by STATISTICAL properties of the configuration network
• toolkit from network research
  - captures the high dimensionality
  - degree distribution
  - average distance
  - clustering
  - degree correlations
  - Albert & Barabási, Rev. Mod. Phys. 74, 67 (2002); Newman, SIAM Rev. 45, 167 (2003)
• faster algorithms to simulate folding events
• pre-screening synthetic proteins
• insights into misfolding

A real example
• The Protein Folding Network: F. Rao, A. Caflisch, J. Mol. Biol. 342, 299 (2004)
  - beta3s: 20 monomers, antiparallel beta sheets
  - MD simulation, implicit water
  - 330 K, equilibrium folded ⇌ random coil
• NODE: 8 letters / AA (local secondary structure)
• LINK: 2 ps transition
• Its native conformation has been studied by NMR experiments: De Alba et al., Prot. Sci. 8, 854 (1999).
Beta3s in aqueous solution forms a monomeric triple-stranded antiparallel beta sheet in equilibrium with the denatured state.
• Simulations at 330 K
• The average folding time from the denatured state: ~ 83 ns
• The average unfolding time: ~ 83 ns
• Simulation time: ~ 12.6 μs
• Coordinates saved every 20 ps ($5 \times 10^{5}$ snapshots in 10 μs)
• Secondary structures: H, G, I, E, B, T, S, - (α-helix, $3_{10}$ helix, π-helix, extended, isolated β-bridge, hydrogen-bonded turn, bend and unstructured).
• The native state: -EEEESSEEEEEESSEEEE
• There are approx. $8^{18} \approx 10^{16}$ conformations.
• Nodes: conformations; links: transitions.

Scale-free network
[Figure: degree distributions of the beta3s network and of the randomized chain]

Many real-world networks are scale free: $P(k) \sim k^{-\gamma}$ → hubs
Barabási & Albert, Science 286, 509 (1999)
• co-authorship (γ = 1 - 2.5)
• citations (γ = 3)
• sexual contacts (γ = 3.4)
• movie actors (γ = 2.3)
• Internet (γ = 2.4)
• World Wide Web (γ = 2.1/2.5)
• genetic regulation (γ = 1.3)
• protein-protein interactions (γ = 2.4)
• metabolic pathways (γ = 2.2)
• food webs (γ = 1.1)
Many reasons behind SF topology:
• Why is the protein network scale free?
• Why does the randomized chain have a similar degree distribution?
• Why is the exponent -2?

Robot arm networks
[Figure: configuration networks of a robot arm for n = 0, 1, 2; nodes are labeled by strings over the angle states 0, 1, 2 (0, 1, 2; 00 ... 22; 000, 010, 020, 100, 200, ...)]
• n-dimensional hypercube
• Steric constraints?
  - missing nodes
  - missing links
  - "Swiss cheese"
• binomial degree distribution: homogeneous

A bead-chain model
• Beads on a chain in 3D: robot arm model
  - similar to Cα protein models
  - rod-rod angle θ
  - 3 positions around the axis
  - Honeycutt & Thirumalai, Biopolymers 32, 695 (1992)
• Example: N = 18, θ = 120°, configuration 2212112212111122
• N = 6, θ = 90° → homogeneous network
• Another example: L = 7, θ = 75°, r = 0.25
  [Figure: the "00100" state shown once as an allowed state and once as a forbidden state]

Adding monomers not only increases the number of nodes in the network but also its dimensionality! The combined effect is small-world.
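The robot-arm picture (nodes are strings of angle states, links change a single angle) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the model used in the talk: every angle takes 3 states, any change of one angle counts as a link, and steric constraints are mimicked by deleting a random quarter of the configurations. The function names `configuration_network` and `degree_histogram` are hypothetical.

```python
import itertools
import random
from collections import Counter


def configuration_network(n_angles, n_states=3, forbidden=frozenset()):
    """Adjacency dict of the configuration network.

    Nodes are tuples of angle states; two nodes are linked when they
    differ in exactly one angle. Configurations in `forbidden` are
    left out (a stand-in for sterically excluded states).
    """
    nodes = [c for c in itertools.product(range(n_states), repeat=n_angles)
             if c not in forbidden]
    node_set = set(nodes)
    adj = {c: set() for c in nodes}
    for c in nodes:
        for pos in range(n_angles):
            for s in range(n_states):
                if s == c[pos]:
                    continue
                nb = c[:pos] + (s,) + c[pos + 1:]
                if nb in node_set:  # skip links into missing nodes
                    adj[c].add(nb)
    return adj


def degree_histogram(adj):
    """Map degree -> number of nodes with that degree."""
    return Counter(len(nbs) for nbs in adj.values())


# Unconstrained arm with 4 angles: a regular, hypercube-like lattice.
full = configuration_network(4)

# ASSUMPTION: remove 25% of the configurations at random to imitate
# steric constraints ("Swiss cheese").
rng = random.Random(0)
all_nodes = list(full)
forbidden = frozenset(rng.sample(all_nodes, len(all_nodes) // 4))
holes = configuration_network(4, forbidden=forbidden)

print("no constraints:", dict(degree_histogram(full)))
print("with holes:    ", sorted(degree_histogram(holes).items()))
```

Without constraints every node has the same degree; with random holes the degrees spread over a narrow, bounded range, which is the homogeneous (not scale-free) behaviour the slides describe.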
Shortcuts in Folding Space

The "dilemma"
• HOMOGENEOUS: from studies of conformation networks
  - bead chain
  - robot arm
• SCALE FREE: from polypeptide MD simulations
  - beta3s
  - randomized version
?

Gradient Networks
Gradients of a scalar (temperature, concentration, potential, etc.) induce flows (heat, particles, currents, etc.). Naturally, gradients will induce flows on networks as well.
Ex.: Load balancing in parallel computation and packet routing on the internet.
Y. Rabani, A. Sinclair and R. Wanka, "Local Divergence of Markov Chains and the Analysis of Iterative Load-balancing Schemes", Proc. 39th Symp. on Foundations of Computer Science (FOCS), 1998.
References:
• Z. Toroczkai and K.E. Bassler, "Jamming is Limited in Scale-free Networks", Nature 428, 716 (2004)
• Z. Toroczkai, B. Kozma, K.E. Bassler, N.W. Hengartner and G. Korniss, "Gradient Networks", http://www.arxiv.org/cond-mat/0408262

Setup:
Let $G = G(V, E)$ be an undirected graph, which we call the substrate network.
The vertex set: $V = \{x_0, x_1, \ldots, x_{N-1}\} \equiv \{0, 1, 2, \ldots, N-1\}$
The edge set: $E \subset V \times V$, $e \in E$, $e = (x_i, x_j) \equiv (i, j)$, $(x, x) \notin E$ (no self-loops)
A simple representation of $E$ is via the $N \times N$ adjacency (or incidence) matrix $A$:

$$A(x_i, x_j) = a_{ij} = \begin{cases} 1 & \text{if } (i, j) \in E \\ 0 & \text{if } (i, j) \notin E \end{cases} \qquad (1)$$

Let us consider a scalar field $\{h\} : V \to \mathbb{R}$.
Set of nearest-neighbor nodes on $G$ of $i$: $S_i^{(1)}$

Definition 1
The gradient $\nabla h(i)$ of the field $\{h\}$ in node $i$ is a directed edge

$$\nabla h(i) = (i, \mu(i)) \qquad (2)$$

which points from $i$ to that nearest neighbor $\mu \in S_i^{(1)} \cup \{i\}$ on $G$ for which the increase in the scalar is the largest, i.e.:

$$\mu(i) = \arg\max_{j \in S_i^{(1)} \cup \{i\}} (h_j) \qquad (3)$$

The weight associated with edge $(i, \mu)$ is given by $|\nabla h(i)| = h_\mu - h_i$.
If $\mu(i) = i$ then $\nabla h(i) = (i, i) \equiv 0(i)$. The self-loop $0(i)$ is a loop through $i$ with zero weight.

Definition 2
The set $F$ of directed gradient edges on $G$ together with the vertex set $V$ forms the gradient network: $\nabla G = \nabla G(V, F)$

If (3) admits more than one solution, then the gradient in $i$ is degenerate.
In the following we will only consider scalar fields with non-degenerate gradients. This means: $\mathrm{Prob}\{h_i = h_j \text{ if } (i, j) \in E\} = 0$

Theorem 1
Non-degenerate gradient networks form forests.
Proof (sketch): along any directed path of gradient edges $h$ strictly increases, so a path can never return to a node it has left; hence there are no cycles.

Theorem 2
The number of trees in this forest = number of local maxima of $\{h\}$ on $G$.

[Figure: example network with scalar values on the nodes]

For Erdős-Rényi random graph substrates with i.i.d. random numbers as scalars, the in-degree distribution $R_N(l)$ in the limit $p \to 0$, $N \to \infty$, $z = Np = \text{const.}$, $z \gg 1$ is:

$$R_N(l) \approx \frac{1}{z\,l}, \qquad 1 \le l \le z = Np$$

The Configuration model
A. Clauset, C. Moore, Z. Toroczkai, E. Lopez, to be published.
Generating functions:

$$g(z) = \sum_k p_k z^k, \qquad R(z) = \int_0^1 dx \; g\!\left(1 - (1 - z)\,\frac{x\,g'(x)}{g'(1)}\right)$$

K-th Power of a Ring

$$R^{(2K)}(l) = \begin{cases}
\dfrac{4\,(3 + 9K + 4K^2 + 2Kl)}{(2K+l)(2K+l+1)(2K+l+2)(2K+l+3)}, & 1 \le l \le K-1 \\[2ex]
\dfrac{6\,(2 + 7K + 7K^2)}{3K(3K+1)(3K+2)(3K+3)}, & l = K \\[2ex]
\dfrac{4\,(2K+1)}{(2K+l+1)(2K+l+2)(2K+l+3)}, & K+1 \le l \le 2K-1 \\[2ex]
\dfrac{1}{4K+1}, & l = 2K
\end{cases}$$

Power law with exponent -3 in $2K + l$.

The energy landscape
• Energy associated with each node (configuration)
• the gradient network
  - most favorable transitions
  - T = 0 backbone of the flow
• MD simulation
  - tracks the flow network
  - biased walk close to the gradient network
• trees ↔ basins of local minima
What generates the exponent -2? The REM generates an exponent of -1.
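Definitions 1-2 and Theorems 1-2 can be checked numerically on a small example. The sketch below is an illustration only: it assumes an Erdős-Rényi substrate with N = 1000 and z = Np = 10, uniform random scalars (ties have negligible probability, so the gradients are non-degenerate), and hypothetical helper names (`erdos_renyi`, `gradient_network`, `is_forest`, `count_trees`, `in_degrees`).

```python
import random
from collections import Counter


def erdos_renyi(n, p, rng):
    """Undirected G(n, p) substrate as an adjacency dict, no self-loops."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def gradient_network(adj, h):
    """mu[i] = argmax of h over i and its nearest neighbors (Eq. 3)."""
    return {i: max(adj[i] | {i}, key=h.__getitem__) for i in adj}


def count_trees(mu):
    """Each tree has exactly one root: a node with a self-loop mu[i] == i."""
    return sum(1 for i, m in mu.items() if m == i)


def is_forest(mu):
    """Following gradient edges from any node must end in a self-loop."""
    n = len(mu)
    for start in mu:
        node, steps = start, 0
        while mu[node] != node:
            node = mu[node]
            steps += 1
            if steps > n:  # would only happen if there were a cycle
                return False
    return True


def in_degrees(mu):
    """Number of gradient edges pointing into each node (self-loops excluded)."""
    indeg = {i: 0 for i in mu}
    for i, m in mu.items():
        if m != i:
            indeg[m] += 1
    return indeg


rng = random.Random(1)
adj = erdos_renyi(1000, 0.01, rng)          # ASSUMED sizes: N = 1000, z = 10
h = [rng.random() for _ in range(len(adj))]  # i.i.d. scalars on the nodes
mu = gradient_network(adj, h)
indeg = in_degrees(mu)

# Local maxima of h on G, computed independently of the gradient network.
n_local_maxima = sum(all(h[i] > h[j] for j in adj[i]) for i in adj)

print("forest:", is_forest(mu))
print("trees:", count_trees(mu), "local maxima:", n_local_maxima)
print("in-degree histogram:", sorted(Counter(indeg.values()).items()))
```

The in-degree histogram printed at the end is the empirical counterpart of $R_N(l)$; comparing it with $1/(z\,l)$ requires larger N and averaging over many realizations.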
Model ingredients
• A network model of configuration spaces
  - network topology: homogeneous
  - degree correlations
• how to associate energies
  - constrained (folded): small $k_{\mathrm{conf}}$, lower energy
  - loose (random coil): large $k_{\mathrm{conf}}$, higher energy
  - k and E increase from the folded state to the random coil

Random geometric graph
• random geometric graph: R = 0.113, <k> = 20
  - Dall & Christensen, Phys. Rev. E 66, 026121 (2002)
  - in higher D: similar to a hypercube with holes
  - degree correlations
• Energy proportional to connectivity: $E \propto k$
• N = 30000, <k> = 1000, d = 2. Exponent is -2.
2 essential ingredients:
1) k1-k2 correlations
2) <E> monotonic with k

Bead-chain model
• more realistic model: bead chain
  - configuration network
  - excluded volume
  - energy: Lennard-Jones
• [Figure: Lennard-Jones potential, with repulsive and attractive regions]
• L = 30, θ = 75°

The case of the α-helix
AKA peptide
• ALA: orange
• LYS: blue
• TYR: green
MD simulations, no water.

The MD traced network
T = 400

More than one simulation
3 different runs: yellow, red and green

The role of temperature

Conclusions
• A network approach was introduced to study sterically constrained conformations of ball-chain-like objects.
• This network approach is based on the "statistical dogma", which states that generic features must be the result of statistical properties of the networks and should not depend on details.
• Protein conformation dynamics happens in high-dimensional spaces that are not adequately described by simplistic reaction coordinates.
• The dynamics performs a locally biased sampling of the full conformational network. For low enough temperatures the sampled network is a gradient graph, which is typically a scale-free structure.
• The -2 degree exponent appears at and below the temperature where the basins of the local energy minima become kinetically disconnected.
• Understanding the protein folding network has the potential of leading to faster simulation algorithms, towards closing the gap between nature's speed and ours.
Coming up: conditions on side chain distributions for the existence of funneled energy landscapes.
