H-bonds D-H … A Assignment: if D…A < 3.7 Å in crystal structure. (normally 2.7-3.1 Å). Energy of stablization: -12~-40 kJ/mol) Tends to be linear. Only weakly stabilize proteins. (!?) A survey over H-bonds in globular proteins (J. Mol. Biol. (1992) 226, 1143) Local H-bonds? The authors made this conclusion: “Most H-bonds are local.” Should be more critically reviewed. Most H-bonds are between beckbone atoms. Source: K. Schulten Group University of Illinois Urbana-Champaign Protein folding, dynamics and structural evolution Chapter 9 Questions How does a peptide sequence find its native, functional conformation? Is there a set of fundamental principles? We discussed several factors determining protein structure. Can we *predict* the structure, from sequence information yet? Determinants of Protein Folding We will discuss the following factors, as listed in V&V chapter 9. Space Packing. Directed mainly by internal residues Protein structures are hierarchically organized. Protein structures are highly adaptable. Secondary structure can be context dependent. Changing the fold of a protein. Still, keep in mind that most of them are based from observations and deductions. Exceptions are possible. Whether the statistical methods/criteria are acceptable is another question. Compactness QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture. QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture. Proteins are like liquids and glasses, instead of crystalline solids. The reverse statement is, does compactness serve as a factor determining protein structure? QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture. QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture. Compactness helps, but not enough. QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture. Folding is directed mainly by internal residues Mutations that change surface residues are accepted more frequently and are less likely to affect protein conformations than are changes of internal residues. This is consistent with the idea of Hydrophobic force-driven folding. Determinants of Protein Folding Space Packing. Directed mainly by internal residues Protein structures are hierarchically organized. Protein structures are highly adaptable. Secondary structure can be context dependent. Changing the fold of a protein. Determinants of Protein Folding Space Packing. Directed mainly by internal residues Protein structures are hierarchically organized. Protein structures are highly adaptable. Secondary structure can be context dependent. Changing the fold of a protein. Protein structures are quite “resistant” to mutations A large number of single residue mutations do not yield a very different structure. A complete study was done in phage T4 lysozyme by B. W. Matthews. Homologous proteins comes with some sequence identity and they are often structurally similar. QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture. QuickTime™ and a TIFF (LZW) decompressor are needed to see this picture. Determinants of Protein Folding Space Packing. Directed mainly by internal residues Protein structures are hierarchically organized. Protein structures are highly adaptable. Secondary structure can be context dependent. Changing the fold of a protein. Voet Biochemistry 3e Page 300 © 2004 John Wiley & Sons, Inc. Table 9-1 Propensities and Classifications of Amino Acid Residues for a Helical and b Sheet Conformations. Secondary structure Secondary structure prediction can be done with more sophisticated algorithms. Artificial intelligence such as neuronal networks or support vector machines. Basically look at a local sequence and recongnize its pattern. Usually such methods need a training set. I.e. knowledge-based methods. Voet Biochemistry 3e Page 280 © 2004 John Wiley & Sons, Inc. • Green: residues 23-33. • Cyan: residues 42-53. • Chm-alpha: a new sequence replaces green part. • Chm-beta: the same new sequence replaced cyan part. • Both are structurally similar to native GB1. • The same sequence can be either an alpha helix or a beta sheet structure, depending on their context. Figure 9-6 NMR structure of protein GB1. Determinants of Protein Folding Space Packing. Directed mainly by internal residues Protein structures are hierarchically organized. Protein structures are highly adaptable. Secondary structure can be context dependent. Changing the fold of a protein. Homology Proteins that share some sequence identity may be structurally similar. One evidence that support evolution. Proteins with as little as 20% sequence identity may have similar structure. How much should be changed for a protein to assume a different structure? Voet Biochemistry 3e Page 281 © 2004 John Wiley & Sons, Inc. • GB1 and Rop are structurally different. • 50% of the residues of GB1 is changed, yielding a new polypeptide that assumes Rop-like structure. • This new peptide has 41% sequence identity with native Rop. • The idea of Protein design and engineering. Figure 9-7 X-Ray structure of Rop protein, a homodimer of aa motifs that associate to form a 4-helix bundle Protein Folding Levinthal’s paradox If for each residue there are only two degrees of freedom (,). Assume each can have only 3 stable values. This leads to 32n possible conformations. If a protein can explore 1013 conformation per second. (10 per picosecond). Still requires an astronomical amount of time to fold a protein. This is impossible. So protein must fold in a way that does not randomly explore each possible conformations. Molten Globule Much of the secondary structure that is present in a native proteins forms within a few milliseconds. This is called hydrophobic collapse. Something called “Molten Globule”. Slightly (5-15% in radius) larger than native conformation. Significant amount of secondary structure formed. Side chains are still not ordered/packed. Structure fluctuation is much larger. Not very thermodynamically stable. Are proteins sticky tapes? Are they simply hetereopolymers that like to form H-bond, hydrophobic interactions with each other? Proteins are not any random hetero-polymers • By observation: – every protein has a very stable native structure, – while polymers are usually random in their conformation. • Interesting observation for simple models: the “designability”. In the following materials are from: – R. Helling et al., “The designability of protein structures”, J. Mol. Graphics and Modelling, 19, 157, (2001). – J. Miller et al.,“Emergence of highly designable protein-backbone conformations in an off-lattice model” Proteins, 47, 506 (2002). – Steven S. Plotkin and Jose N. Onuchic, “Understanding protein folding with energy landscape theory Part I : Basic concepts” Quart. Rev. of Biophys., 35, 2 (2002), 111. A 3D lattice HP model • Assuming only two kinds of residule H and P. • Well-studied before. EHH=−2.3, EHP=−1, EPP=0 • Enumerate all 227 possible sequences. • Each sequence has a lowest energy structure. • Some sequences share the same structure. Count the number of sequence per unique structure NS. • Plot the distribution of NS. • (a) Histogram of NS for the 3 × 3 × 3 system. (b) Average energy gap between the ground state and the first excited state versus NS for the 3 × 3 × 3 system. 3794 different sequences share ONE structure! On the average, these structure have large energy gaps between the lowest energy structure and the next lowest one. NS: Number of seq. corresponding to a structure S Some structures are very “popular” for many different sequences 3794 different sequences share ONE structure! On the average, these structure have large energy gaps between the lowest energy structure and the next lowest one. NS: Number of seq. corresponding to a structure S Such a property does not depend on the model used • Very similar behavior are seen in 2D 6×6 HP model and in 2D or 3D models with 20 different amino acids. Off the lattice: 23mer 3 state model Zinc finger of 1PSV Off-lattice model: results • a: Backbone configuration of the 11th most designable 23-mer structure • b: Backbone configuration of the zinc finger 1NC8, truncated to 23 amino acids. What does it mean? Sequence Structure The energy landscape Proteins are in a special subset of heteropolymers • Such that the number of possible structures are greatly reduced. • Evolution! • Therefore protein structure prediction is not as hard as it appears. (still a hard problem though..) • That also explains why knowledge-based methods works. • Nevertheless, the tools developed offers valuable clues for the structure of a new protein. Computer Simulation • Goals: 1) 2) • • Structure prediction: From primary sequences to tertiary structures (so that we can infer its function) Known structure (from X-Ray or NMR or another simulation). Want: dynamics (how it moves at room temperature, with a ligand, or with a mutation). (1) above is difficult but do-able. We will discuss about some of the methods. (2) is often done with the same methods developed for (1). Computers • • • Deals with numbers and logical operations. Needs some “principles” (written in mathematical equations). For protein simulations there are different approaches: 1. Physics-based 2. Knowledge-based Physics • Laws for particles moving and interaction – Classical Mechanics (Newton’s Equation of motion) F ma – Quantum Mechanics (Schrödinger’s Equation) (r, t ) Hˆ (r, t ) t Time - Independen t : Hˆ (r ) E (r ) Time - Dependent : i • Many developments in physical chemistry can be used. Physics-Based protein simulation • All quantum mechanics (QM) calculation is not feasible. • QM can be applied to a small set of atoms. – Modeling of an active site (other atoms: not treated or treated as dielectric continuum) – Can get total energies (binding vs. nonbinding, pKa etc.), wave function (charge distribution). – QM/MM simulations (other atoms: treated with Molecular Mechanics) An example of using QM (Case et al., J. Biol. Inorg. Chem. 2002, 7, 632) • Rieske iron-sulfur protein in bc-type cytochromes • Calculations based on density functional theory (DFT) performed. • pKa and redox potentials can be obtained from total energies of several states. • Change of pKa (proton-binding) and redox potential (electronbinding) are strongly coupled, as observed in experiments. Using classical mechanics for protein structure and dynamics • Ignore electrons, assigning (empirical) force fields for atoms (or clusters of atoms). • A very simple potential: Force fields: bond stretching and bending A. R. Leach, “Molecular Modelling”, 1996 Torsional potential A. R. Leach, “Molecular Modelling”, 1996 3 point charges N2 molecules: Known to have an Electric quadrupole moment 5 point charges Ab initio QM results Polarization: many-body effect Physics based: methods • Energy Minimization – Steepest descent – Conjugated gradient • Monte Carlo Simulation – Random sampling – Stimulated annealing • Molecular Dynamics – Compute conformational changes. Energy surface of two torsional angles • Very shallow valleys. • Similar in energy. • Determining the (ψ,φ) conformations of peptide backbones is even more complicated. Trapping at a local minimum • Standard practice: use Monte Carlo (random sampling) with stimulated annealing techniques. Using classical mechanics for protein structure and dynamics • With a Force field ( V(rN) ), for lowest energy structure • Find the structure that gives energy minimum. Hopefully this is done within finite amount of computer resources. And hopefully this energy minimum gives the desired native protein structure. • For protein dynamics: calculate trajectories (Newton’s eq.) at thermal condition and find the averaged physical quantities. Questions to ask: • Is the energy function correct? – Precise enough to discriminate other nonnative structure. – Yet simple enough for computers to carry out efficiently. • Is the conformational search good enough to cover the global minimum? Take-home messages: Physics-based methods • Protein folding without any prior knowledge about protein structure is a difficult task. • Protein structure prediction is often quoted as an “N-P complete problem”, i.e. the complexity of the problem grows exponentially as the number of residues increases. • Structures of small proteins (~101 - 102 a.a.) can be solved in principle.