ELECTROSTATIC COUPLING AND CONFORMATIONAL FLUCTUATIONS AS DETERMINANTS OF PKA VALUES IN PROTEINS

by Brian Doctrow

A dissertation submitted to Johns Hopkins University in conformity with the requirements for the degree of Doctor of Philosophy

Baltimore, Maryland
March, 2014

© 2014 Brian Doctrow
All Rights Reserved

Abstract

Electrostatic effects, particularly proton binding and transfer, govern many essential biological functions of proteins. Relating protein structure to function therefore requires understanding the molecular determinants of pKa values in proteins. Various factors influence these pKa values, including hydration, hydrogen bonding, and Coulomb interactions. Resolving the contributions of these factors requires structure-based calculations of electrostatic energies. To be useful, such calculations must be able to reproduce experimental data, which current structure-based pKa calculations are unable to do. This work examined two problems where experimental insight was necessary to improve structure-based electrostatics calculations.

Enzyme active sites typically contain clusters of ionizable residues, leading to strong electrostatic interactions and complex coupling between the pKa values of the residues involved. To better characterize these interactions, the ionizable residues clustered in the active site of staphylococcal nuclease (SNase) were systematically neutralized by mutagenesis, and the effect on the pKa values of the other ionizable groups was measured using NMR spectroscopy. One of the residues in the active site, Asp-19, has a depressed intrinsic pKa due to accepting a hydrogen bond, and is therefore insensitive to repulsive Coulomb interactions. Meanwhile, Asp-21 has an elevated intrinsic pKa due to acting as a hydrogen bond donor and therefore absorbs most of the repulsive interaction energy in the cluster.
Therefore in systems with strong coupling between ionizable groups, small structural variations can lead to large differences in pKa values. Crystal structures may not be sufficiently accurate to capture these variations.

It is believed that one reason for the failure of structure-based pKa calculations is that they do not explicitly include the effects of backbone reorganization. To show that backbone reorganization has a significant effect on pKa values, the pKa values of carboxylic groups in SNase were measured in the presence of glycine substitutions that perturbed the local stability of the protein backbone. Significant changes in pKa values were observed that could not be reproduced with calculations that treat the protein backbone as static. This suggests that structure-based electrostatics calculations need to account for backbone reorganization explicitly.

Thesis Committee
Bertrand García-Moreno E., Ph.D. (Advisor, Reader)
Juliette Lecomte, Ph.D. (Second Reader)
Mario Amzel, Ph.D.
Doug Barrick, Ph.D.
Vincent Hilser, Ph.D.

Acknowledgements

First and foremost, thanks to Dr. Bertrand García-Moreno for his support and encouragement throughout my graduate training. I am also grateful to my fellow BGME lab members, past and present, for their assistance and companionship. I especially want to thank Dr. Carlos Castañeda (for teaching me more about NMR than I ever thought I’d learn), Dr. Carolyn Fitch (for explaining calculations and for bringing doughnuts), Dr. Jamie Schlessman (for helping me navigate the harrowing path of X-ray crystallography), Dr. Mike Harms (for showing me what computers can do in the right hands), Dr. Aaron Robinson (for keeping the place nerdy), Erika Wheeler (for always being happy and for watching my cat while I was out of town), and Dan Richman (for many stimulating discussions).

Second, thanks to Dr. Ananya Majumdar, the NMR facility director. Much of this work could not have been done without his commitment and support.
I would also like to thank the members of my thesis committee who have guided me throughout my research: Dr. Juliette Lecomte, Dr. Doug Barrick, Dr. Vince Hilser, and Dr. Mario Amzel.

Third, I want to thank my friends and fellow graduate students who have shared my experiences. I thank Matt Preimesberger for sharing my love of classic rock and for joining me for many concerts and baseball games. I thank Jackson Buss for helping me take time out from research to play golf. I thank Dr. Helen Jun for starting a turducken tradition and for showing me that life after graduate school is possible. I thank Mike Lee-Thompson for many nights of trivia fun and for introducing me to many new beers. And thanks to Thuy Dao, for her excellent cooking.

I am grateful to my parents for always supporting me no matter what. Thanks for always being there when I didn’t know where else to turn. Thanks also for periodically getting me away from the lab and taking me on vacation.

Finally, I want to thank Amber Hill. Meeting her was the best thing to happen to me during grad school. I could not have made it through the last couple of years without her support, encouragement, and love. I hope I can be as good to her as she has been to me, and I hope we have many more years and experiences to share.
Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures

1 INTRODUCTION
1.1 Importance of protein electrostatics in biology
1.2 pKa values of ionizable groups in proteins
1.3 Physical model of the determinants of pKa values of ionizable groups in proteins
1.4 Measurement of pKa values by NMR spectroscopy
1.5 Structure-based pKa calculations
1.6 pKa values of His, Asp, and Glu in staphylococcal nuclease
1.7 Overview of the contents of this dissertation

2 ELECTROSTATIC COUPLING IN A CLUSTER OF CARBOXYLIC GROUPS IN THE ACTIVE SITE OF AN ENZYME
2.1 Abstract
2.2 Introduction
2.3 Results
2.3.1 Coulomb interactions in the active site cluster
2.3.2 pKa of Asp-21
2.3.3 pKa values at high ionic strength
2.3.4 Influence of Arg-35
2.4 Discussion
2.4.1 Determinants of the intrinsic pKa of Asp-21
2.4.2 Role of intrinsic binding affinities in partitioning of cooperative energy
2.4.3 Implications for structure-based pKa calculations
2.5 Conclusions
2.6 Materials and methods
2.6.1 Protein expression and purification
2.6.2 NMR spectroscopy
2.6.3 pKa values
2.6.4 Comparison of SNase structures
2.6.5 Crystal structure of ∆+PHS/D21N
2.6.6 Structure-based continuum electrostatic calculations
2.7 References

3 CONFORMATIONAL REORGANIZATION OF THE BACKBONE INFLUENCES THE PKA VALUES OF IONIZABLE GROUPS IN PROTEINS
3.1 Abstract
3.2 Introduction
3.3.1 pKa values measured by NMR spectroscopy
3.3.2 Thermodynamic stability
3.3.3 Crystal structures
3.3.4 Hydrogen exchange in Gly variants
3.3.5 15N NMR relaxation measurements
3.3.6 Structure-based pKa calculations
3.3.7 COREX calculations
3.4 Discussion
3.5 Conclusion
3.6 Materials and methods
3.6.1 Site directed mutagenesis and protein purification
3.6.2 Equilibrium thermodynamics
3.6.3 NMR spectroscopy
3.6.4 X-ray crystallography
3.6.5 Calculations
3.7 References
APPENDIX A SUPPLEMENTARY INFORMATION FOR CHAPTER 2, “ELECTROSTATIC COUPLING IN A CLUSTER OF CARBOXYLIC GROUPS IN THE ACTIVE SITE OF AN ENZYME”
A.1 References

APPENDIX B SUPPLEMENTARY INFORMATION FOR CHAPTER 3, “CONFORMATIONAL REORGANIZATION OF THE BACKBONE INFLUENCES THE PKA VALUES OF IONIZABLE GROUPS IN PROTEINS”

Vita

List of Tables

Table 2.1. pKa values of Asp and Glu residues in or near the active site of SNase measured at 100 mM KCl
Table 2.2. pKa values of Asp and Glu residues in or near the active site of SNase in 1 M KCl
Table 2.3. List of expected NOE interactions involving Arg-35-Hε for both the NVIAGA and ∆+PHS crystal structures
Table 3.1. pKa values of select Asp and Glu residues measured by NMR spectroscopy
Table 3.2. Stability measured by acid- and GdmCl-induced denaturation
Table A.1. pKa values of all Asp and Glu residues in all SNase variants from this study measured at 100 mM KCl
Table A.2. pKa values for all carboxylic groups in ∆+PHS and ∆+PHS/D19N/D40N/E43Q measured at 1 M KCl
Table A.3. X-ray data collection and refinement statistics for ∆+PHS/D21N
Table B.1. pKa values of select Asp and Glu residues measured by NMR spectroscopy
Table B.2. Crystallographic statistics for ∆+PHS/M98G and ∆+PHS/A69G
Table B.3. RMSD of Gly variant crystal structures relative to ∆+PHS
Table B.4. Hydrogen exchange rates measured in ∆+PHS and Gly variants

List of Figures

Figure 1.1. Examples of titration curves of Asp residues measured by NMR spectroscopy
Figure 2.1. Structures of the active sites of ∆+PHS and NVIAGA SNase variants
Figure 2.2. Asp-21 titration curves in the presence of charge-removal mutations
Figure 2.3. Titration curves for active site groups in ∆+PHS and R35Q
Figure 2.4. Histograms of distances from Arg-35 to Asp-19 or Asp-21 in SNase crystal structures
Figure 2.5. NOEs involving Arg-35-Hε
Figure 2.6. Simulated titration curves for two interacting carboxylic groups
Figure 2.7. Effect of Asp-19 and Asp-21 on each other’s pKa
Figure 2.8. FDPB calculations for active site carboxylic groups
Figure 2.9. pKa values calculated during MD trajectories
Figure 3.1. Locations of Gly substitutions in SNase
Figure 3.2. pKa shifts caused by Gly substitutions in SNase
Figure 3.3. Effects of Gly substitutions on global stability
Figure 3.4. Alignment of Cα traces of ∆+PHS SNase and Gly variants
Figure 3.5. HX changes due to A69G and M98G substitutions
Figure 3.6. Backbone 15N relaxation parameters in ∆+PHS and Gly variants
Figure 3.7. Correlation between measured and calculated pKa values and shifts
Figure 3.8. Changes in COREX folding and protection constants due to Gly substitutions
Figure A.1. Structure of the active site of ∆+PHS/D21N
Figure B.1. pKa shifts in M98A and double Gly variants

1 Introduction

1.1 Importance of protein electrostatics in biology

Many of the essential biological functions of proteins involve the transfer of charge (e.g. protons (H+), electrons (e-), or ions such as Na+, K+, Ca2+, Mg2+, Cl-, etc.) either between different compartments in a cell, between protein and solvent, between protein and another molecule, or between different sites within the protein. Examples of such functions include processes central to biological energy transduction, such as H+ transport1–3 and e- transfer,4 ion homeostasis,5 and catalysis.6 Because the energy of charge transfer depends on the electrostatic potential difference between the start and end points, the ability of proteins to perform charge transfer functions is governed by those properties that govern electrostatic effects.

Electrostatic interactions also govern the pH-dependence of biochemical processes.
For example, the pH-dependence of the equilibrium properties of proteins arises from the differential proton affinities of different conformational states of the protein. Classic examples of pH-dependent biological processes include the modulation of the affinity of hemoglobin for oxygen7 and of the assembly of many virus capsids8 by pH. Since protons (H+) are charged, the relevant differences in binding affinity involve differences in electrostatic interactions between the charged species of weak acids and bases and the different conformations of the protein.

For all of these interesting biological processes, detailed physical understanding of the relationship between structure and function requires knowing the magnitude of electrostatic effects and understanding the factors that determine them. It is well recognized that electrostatic energy is singularly valuable for correlation of structure and function in biochemical processes in general.9

1.2 pKa values of ionizable groups in proteins

In proteins, the binding, release, and transfer of H+ involve primarily the weak acids and bases of the ionizable moieties of Lys, His, Arg, Asp, and Glu. The energetics of H+ binding and release are described by the pKa values of these groups, which describe the equilibrium between the neutral and charged species of the ionizable group. The pKa of an ionizable residue in water describes the energetic balance between the proton-side chain bond and the proton-water bond. This is a complicated balance, governed partly by quantum mechanical effects. Since a change in the protonation state of an ionizable group involves a change in charge state, the pKa of an ionizable residue in a protein will also be influenced by the electrostatic properties of its milieu.
Specifically, it will depend on the electrostatic potential at the binding site, which is a complex function determined by the geometry of the charges from other ionizable groups, by the influence of permanent dipoles, and by the dielectric properties of the protein, which are different from those of water. In general, the pKa can be expressed in terms of the group’s pKa in a model compound in water plus the difference in electrostatic energy between the protein and model compound states:10

\[ pK_{a,i} = pK_{a,i}^{model} - \frac{z_i \, \Delta\Delta G_{elec,i}}{2.303RT} \tag{1.1} \]

The energetics of e- transfer are described by redox potentials, which reflect a similar equilibrium between charge states. Therefore the same electrostatic properties that influence pKa values also influence redox potentials. At many levels and for many important problems in biochemistry, the problem of relating protein structure to functions governed by electrostatics involves understanding the molecular determinants of the pKa values (redox potentials) of the ionizable residues (redox centers) involved.

1.3 Physical model of the determinants of pKa values of ionizable groups in proteins

The pKa value of an ionizable group i is a measure of the Gibbs free energy required to protonate (or deprotonate) that group. Invoking the additivity of the Gibbs free energy function, the pKa values can be parsed into contributions from different physical factors according to the following scheme:11,12

\[ pK_{a,i} = pK_{model} + \Delta pK_{Born} + \Delta pK_{background} + \Delta pK_{ij} \tag{1.2} \]

The term pKmodel refers to the pKa value of the group in a model compound in water. This is a term that is meant to be determined empirically. It cannot be calculated with precision because, as mentioned previously, it is a thermodynamic parameter that is governed by a complicated balance between the energetics of H+ binding to water versus the weak acid or weak base, which involves quantum effects.
Such calculations are beyond the reach of even the most sophisticated quantum mechanical methods, primarily owing to uncertainties about the nature of the H+ in its interaction with water. These pKmodel values have been experimentally determined by a variety of approaches and under a variety of conditions. At 298 K and 0.1 M ionic strength, using peptides of various lengths with blocked N- and C-termini as model compounds, the pKmodel values are 3.9, 4.4, and 6.5 for Asp, Glu, and His, respectively,13 10.4 for Lys,14 12.0 for Arg, 10.0 for Tyr, and 9.0 for Cys.15

1.3.1 Hydration

The hydration of charged species is one of the strongest forces in biology. An ionizable side chain in the charged state in bulk water is considered to be fully hydrated and this hydration is reflected in the pKmodel values. In the protein, even at the protein-water interface, the ionizable groups can be partially dehydrated. The Born term, ∆pKBorn, reflects the difference in the hydration energy of the charged form of the group in water and in the protein interior. In a primitive continuum electrostatics model the Born free energy can be described in terms of the free energy for transferring a unit charge of radius r (in Å) between water and the protein:

\[ \Delta\Delta G_{Born} = \frac{332 q^2}{2r} \left( \frac{1}{\varepsilon_{in}} - \frac{1}{\varepsilon_w} \right) \tag{1.3} \]

Here εin and εw are the dielectric constants of the protein and water, respectively. The factor 332 converts the value of ∆G into units of kcal/mol. The free energy of ionization is related to the pKa according to ∆G = 1.36*pKa (at 298 K with ∆G in kcal/mol). Because the protein interior is usually less polar and less polarizable than water, εin will always be smaller than εw, and hence ∆∆GBorn will always be unfavorable for the ionizable group in the protein relative to the ionizable group in water. This will shift the pKa in the direction that favors the neutral state.

1.3.2 Coulomb Interactions

An ionizable group in a protein can experience two types of Coulomb effects.
The ∆pKbackground term in equation 1.2 reflects interactions between the ionizable group and permanent dipoles within the protein. ∆pKij reflects interactions with other charged ionizable groups. In a primitive continuum model with atomic detail permanent dipoles are modeled as partial charges,11 so these two types of interactions both follow Coulomb’s law:

\[ \Delta G_{ij} = \frac{332 q_i q_j}{r_{ij}} \tag{1.4} \]

where qi and qj are the charges on groups i and j, and rij is the distance in Å between groups i and j. Because the charged states of ionizable groups vary with pH, ∆pKij is pH-dependent; the other terms in equation 1.2 are not. The sum of the pH-independent terms (pKmodel, ∆pKBorn, ∆pKbackground) is referred to as the intrinsic pKa (pKint). It represents the pKa that the group would have if all of the other ionizable groups in the protein were neutral. Save for Coulomb interactions with the charges of other ionizable groups, pKint includes all effects on the pKa related to the ionizable group being in a protein environment as opposed to bulk water.

1.3.3 pKa values in proteins are useful to examine the accuracy of structure-based calculations

pKa values in proteins can be measured using NMR spectroscopy.16,17 Therefore, in principle, by comparing pKa values measured in proteins and pKa values measured in model compounds, it is possible to determine the magnitudes of electrostatic energies in proteins. The pKa values of ionizable groups within proteins can vary considerably. Groups at the protein surface tend to have pKa values similar to those of model compounds.17,18 On the other hand, ionizable groups buried in the interior of a protein can have highly anomalous pKa values quite different from those of model compounds. The reason for this is that the dielectric effect inside a protein is much smaller than that of water; therefore the Born energies can be very large and uncompensated by background or Coulomb effects.
For example, both Glu and Lys have been substituted systematically at 25 internal positions in staphylococcal nuclease. At 23 of these positions, Glu has a significantly elevated pKa compared to its model compound value of 4.4.19 Similarly, the pKa of Lys is significantly depressed at 19 of these positions compared to its model compound value of 10.4.20 In both cases, the pKa values range from 5.2 to 9.4, corresponding to shifts of 1-5 pH units from the corresponding model compound values.

One of the problems with attempting to understand the physical and structural origins of electrostatic effects is that pKa measurements alone cannot identify how the different terms in equation 1.2 contribute to a pKa. What is desired is a correlation between the electrostatic energy and how the protein conformation and dynamics are affected by a change in the charge of one group. The similarity between pKa values of surface residues and the pKa values of model compounds may indicate lack of interactions with the protein, or strong favorable interactions that are canceled out by equally strong unfavorable interactions. To distinguish between these two possibilities, structure-based pKa calculations with methods based on physical principles are needed. With these methods one could attempt to calculate the Born, background, and Coulomb contributions to pKa values starting from the protein structure and principles from classical electrostatics and statistical thermodynamics. For such calculations to be useful, they must be able to reproduce experimental data to prove that they capture all of the relevant factors contributing to the pKa.10 Current methods for structure-based pKa calculations (described in section 1.5) are not able to reproduce experimental data well enough to have predictive power.21 This suggests that our understanding of the physics governing electrostatics in proteins is incomplete.
The development of more accurate computational methods for structure-based calculations of electrostatic effects remains one of the important goals in the area of structural biochemistry.

The experiments described in this dissertation examine two problems where experiments are needed to obtain the detailed physical insight needed to guide the development and improvement of accurate methods for structure-based calculation of electrostatic energies.

1.4 Measurement of pKa values by NMR spectroscopy

The most useful way of accessing information about electrostatic effects in proteins is by measurement of pKa values with NMR spectroscopy. Often a single NMR spectroscopy experiment is sufficient to measure the pKa values of all the ionizable groups of a given type simultaneously, sometimes unambiguously.16,17 The work described in this dissertation uses the pKa values of Asp and Glu residues measured by NMR as probes of electrostatic properties of a protein. For Asp and Glu, the carboxyl carbon (Cγ or Cδ) resonance is the best reporter of the group’s protonation state for two reasons: (1) it exhibits large changes in chemical shift upon protonation (typically 3-4 ppm);22,23 and (2) its chemical shift is relatively insensitive to spurious pH-dependent effects that may complicate the titration curves.22,23 For surface carboxyl residues, the carboxyl carbon chemical shifts typically fall in the range of 175-180 ppm for Asp and 180-185 ppm for Glu. In proteins with a large number of carboxylic groups, these chemical shifts can be measured with a two-dimensional experiment that correlates the carboxyl carbon (Cγ or Cδ) chemical shift with that of the neighboring aliphatic carbon (Cβ or Cγ). This allows a large number of Asp and Glu resonances to be resolved.
For most residues, the carboxyl carbon resonance follows the titration of only that residue, and the pH-dependence of the chemical shift has a characteristic sigmoid shape, with the midpoint of the curve corresponding to the apparent pKa value (Figure 1.1(a)). Apparent pKa values can be obtained by fitting a modified Hill equation to the data:

\[ \delta_{obs}(pH) = \frac{\delta_{AH} + \delta_{A^-} \, 10^{n(pH - pK_a)}}{1 + 10^{n(pH - pK_a)}} \tag{1.5} \]

where δAH and δA- are the chemical shifts of the fully protonated and fully deprotonated forms, respectively, and n is the Hill coefficient, which reflects the slope of the titration curve. Using this method, the pKa values for all Asp and Glu residues in a protein can be measured simultaneously. The pKa value obtained in this manner is an apparent pKa that is independent of pH, as opposed to the microscopic pKa of equation 1.2 which is pH-dependent, as explained in section 1.3.2. The apparent pKa corresponds to the point where pKa,i(pH) = pH.

In certain cases, the titration curve does not follow the characteristic sigmoid shape described by equation 1.5. Such complexity may arise from strong electrostatic interactions between two carboxylic groups that cause their resonances to report on each other’s titration as well as their own. In these cases, the modified Hill equation can be generalized to three-state binding to fit the data better (Figure 1.1(b)).

Equation 1.5 assumes that proton binding and release occur in the fast exchange regime, so that δobs is a weighted average of the chemical shifts of the protonated and deprotonated states. In some cases severe line broadening occurs during the titration (see Figure 1.1(b)), indicating that the fast exchange condition is no longer met. This is more likely to occur for residues with higher pKa values, which titrate at lower [H+], and thus have slower exchange between protonation states (kex = kon[H+] + koff). In such cases the pKa values determined using equation 1.5 will be less accurate. The degree of inaccuracy will depend primarily on how far outside of the fast exchange regime the protonation rate is, which depends upon both the exchange rate and the chemical shift difference between the protonated and deprotonated species.24,25

Figure 1.1. Examples of titration curves of Asp residues measured by NMR spectroscopy. (a) Example of a single-site titration curve and the fit line to equation 1.5. (b) Example of a curve that exhibits two titration events, and the fit to a three state version of equation 1.5. (c) Example of a residue that titrates below the pH where the protein unfolds, therefore no titration event is visible. In this case, the pKa cannot be determined.

Although NMR-monitored pH titrations provide an accurate way to measure multiple pKa values within a protein, they are not without limitations. Chief among these is that the protein has to remain folded during the titration of the residue(s) of interest. In an unfolded protein, most residues’ resonances cannot be resolved because the residues have lost their distinct chemical environments. Even if a residue could still be resolved in the unfolded state, it will still be in a different electrostatic environment from the one it experiences in the folded protein. Thus the resulting titration curve will reflect a different pKa from the one the residue would have in the folded state. This means that for Asp and Glu residues the protein should remain folded at acidic pH (< 4), and even then the H+ titration curve of residues with significantly depressed pKa values may not be measurable (Figure 1.1(c)). Therefore, proteins that fold only within a narrow pH range will not be amenable to these types of measurements. Furthermore, the limitation of experimental pKa values mentioned in the previous section applies even to pKa values measured with exquisite accuracy and precision by NMR spectroscopy: the NMR experiments yield little direct insight into the determinants of the pKa values.
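As an illustration of how an apparent pKa might be extracted from a single-site titration curve, the following minimal Python sketch evaluates the modified Hill equation (equation 1.5) and recovers the pKa by a coarse grid search. It is not the fitting procedure used in this work: the function names, the plateau shifts (176 and 180 ppm), the pH grid, and the choice to hold the plateaus and Hill coefficient fixed are all assumptions made for the example, and the data are synthetic and noise-free.

```python
import math


def hill_shift(ph, pka, n, delta_ah, delta_a):
    """Observed chemical shift (ppm) predicted by the modified Hill equation (eq. 1.5)."""
    x = 10.0 ** (n * (ph - pka))
    return (delta_ah + delta_a * x) / (1.0 + x)


def fit_pka(ph_values, shifts, delta_ah, delta_a, n=1.0):
    """Grid search for the apparent pKa that minimizes the sum of squared residuals.

    The plateau shifts and Hill coefficient are held fixed (an assumption made
    to keep the sketch short); a real fit would normally float them as well.
    """
    best_pka, best_sse = None, math.inf
    for step in range(1401):  # candidate pKa values from 0.00 to 14.00 in 0.01 steps
        pka = step * 0.01
        sse = sum(
            (hill_shift(ph, pka, n, delta_ah, delta_a) - s) ** 2
            for ph, s in zip(ph_values, shifts)
        )
        if sse < best_sse:
            best_pka, best_sse = pka, sse
    return best_pka


# Synthetic titration of a hypothetical Asp with pKa 3.9 and n = 1.
ph_points = [2.0 + 0.5 * i for i in range(11)]  # pH 2.0 to 7.0
observed = [hill_shift(ph, 3.9, 1.0, 176.0, 180.0) for ph in ph_points]
print(round(fit_pka(ph_points, observed, 176.0, 180.0), 2))
```

With real data, a nonlinear least-squares routine that also floats δAH, δA-, and n would normally replace the grid search, and the quality of the fit would degrade outside the fast exchange regime as described above.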
1.5 Structure-based pKa calculations

One of the goals of studying protein electrostatics is to understand how structure determines pKa values. In fact, one of the goals of this thesis is to test two specific hypotheses with the aim of contributing both the physical insight needed to guide the development of computational algorithms for structure-based pKa calculations, and the data necessary to benchmark these methods.

Various methods exist for calculating pKa values in proteins based on structure and physical principles. These methods differ in the amount of atomic detail that is treated explicitly, and in whether the protein structure is treated as static or dynamic. At the most extreme level of detail are microscopic models, in which all of the protein and solvent atoms and their motions are treated explicitly. Such a model can provide the most rigorous insight into the physical origins of pKa values. Unfortunately, these methods suffer from practical problems that make them generally unsuitable for pKa calculations. These problems include difficulty converging, improper treatment of long-range interactions, and artifacts resulting from the treatment of the system boundary.10,26 Models based on the continuum approximation are more useful. In these models, the polarizability of protein and solvent are treated implicitly by assigning dielectric constants to these regions. Electrostatic energies are then scaled according to these constants (see the equations in section 1.3). Different continuum models vary in how much microscopic detail of the protein is retained (e.g. whether or not protein dipoles are treated explicitly). In addition, some models account for protein motions explicitly whereas others do not. The use of dielectric constants greatly simplifies pKa calculations. However, it also obscures the physical basis for the calculated pKa values, since the dielectric constant subsumes a variety of processes that can influence the pKa value.
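To make the scaling by dielectric constants concrete, the following Python sketch evaluates the Born expression of equation 1.3 for several values of εin and converts the result to a pKa shift magnitude using ∆G = 1.36*pKa. It is only a rough illustration of the primitive continuum model: the function names and the 2 Å ionic radius are assumptions chosen for the example, and the numbers describe complete transfer of a unit charge from water into a uniform medium, not any residue studied in this work.

```python
def born_energy(radius, eps_in, eps_w=78.5, q=1.0):
    """Born transfer free energy (kcal/mol) from equation 1.3.

    radius is in angstroms; eps_w defaults to the dielectric constant of bulk water.
    """
    return 332.0 * q ** 2 / (2.0 * radius) * (1.0 / eps_in - 1.0 / eps_w)


def pka_shift_magnitude(ddg):
    """Magnitude of the pKa shift (pH units) for a free energy in kcal/mol at 298 K.

    The shift favors the neutral state: upward for acids, downward for bases.
    """
    return ddg / 1.36


HYPOTHETICAL_RADIUS = 2.0  # angstroms; an assumed value for illustration only

for eps_in in (4.0, 10.0, 20.0):
    ddg = born_energy(HYPOTHETICAL_RADIUS, eps_in)
    print(f"eps_in = {eps_in:4.1f}  ddG_Born = {ddg:5.2f} kcal/mol  "
          f"|dpKa| = {pka_shift_magnitude(ddg):5.2f}")
```

The computed penalty falls steeply as εin increases, which is why the choice of εin has such a strong effect on calculated pKa values.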
1.5.1 FDPB calculations

One of the most popular methods for pKa calculations is based on the numerical solution of the linearized Poisson-Boltzmann equation by the method of finite-differences (FDPB).11 In this model, the protein is represented as a set of stationary charges embedded in a medium with a uniform dielectric constant, εin. Partial charges are used to represent permanent dipoles, which are assumed to have a fixed orientation. In the simplest FDPB implementation, the protein is represented by a single, static structure. The solvent is represented as a continuous medium with the dielectric constant of water, εw, and the concentrations of mobile ions in solution are assumed to follow a Boltzmann distribution around the protein. Solution of the Poisson-Boltzmann equation yields the electrostatic potential, Φij, at site j due to a unit positive test charge at site i. Electrostatic energy is calculated as the product of charge times potential, therefore the free energy of ionizing a group at site i in the protein can be calculated using an expression such as:10

(1.6)

where the index j runs over all of the partial charges in the protein, and the index k runs over all of the ionizable sites. The superscript ° designates the neutral state of the group at i. The first two terms correspond to ∆pKBorn, the middle two to ∆pKbackground, and the last two to ∆pKij. An analogous expression can be used to calculate the ionization energy in the model compound, ∆G_i^model. The pKa can then be calculated as:

(1.7)

Implementation of FDPB calculations requires a number of parameters to be specified by the user. First is a set of atomic coordinates, including H atoms (which must be added computationally if their positions are not known experimentally). A set of partial atomic charges must be provided, as well as the values of pKmodel for each residue type. The temperature and ionic strength must be specified.
Finally, the values of the dielectric constants εw and εin must be assigned. εw is generally assigned the value of the measured dielectric constant for bulk water (εw = 78.5). The appropriate value of εin, on the other hand, is a matter of considerable debate.

Experimental measurements of the dielectric constant of dry protein powders, εprot, give values in the range of 2-4.27 This value is comparable across many different types of proteins, and is consistent with theoretical considerations.28 However, using εin = εprot in standard FDPB calculations exaggerates the magnitude of electrostatic effects. Ionizable groups at the surface of the protein tend to have measured pKa values close to their model compound values,17,18 but the calculations with εin = 4 predict pKa values that are considerably shifted from their model compound pKa values. Agreement between calculated and measured pKa values of surface residues can be improved by setting εin = 20, thereby artificially attenuating electrostatic interactions.29

The reason that FDPB calculations with static structures fail when εin = εprot lies in the fact that the physical meaning of these two parameters is not the same. The measured parameter εprot reflects a fundamental property of proteins, namely, the bulk dielectric response of the protein molecules to an external electric field. By contrast, what determines the pKa values of ionizable residues is the dielectric response of a single protein molecule to a charge within that molecule. In this sense, εin is not a true dielectric constant; rather, it is a scaling parameter meant to account for all contributions to electrostatic interactions that the protein model does not treat explicitly.26,30 Its physical meaning depends entirely on the way the protein is modeled in the calculations. In a fully microscopic simulation, all contributions to electrostatic interactions are treated explicitly, therefore εin = 1.
If electronic polarizability is not treated explicitly, then εin has to account for its effect implicitly, resulting in εin ≈ 2. As the protein model gets less detailed, more effects get subsumed into εin, causing the value of εin to increase further. Since a standard FDPB calculation represents the protein with a static structure, it does not account for dynamic contributions to electrostatic effects explicitly. These can range from fluctuations of charged side chains and reorientation of dipoles to large-scale structural reorganization and even global unfolding in the most extreme cases.26 Supposedly εin = 20 accounts implicitly for the effects of these dynamics on the pKa values of surface residues.29

Unfortunately, the ability to reproduce the pKa values of surface residues alone is insufficient to show that a calculation captures the correct physical determinants of pKa values. As noted in section 1.3, surface groups tend to have pKa values similar to those of model compounds. Any calculation with a high εin will predict weak interactions between residues and, consequently, small shifts from model compound pKa values,26 even if these small shifts do not actually reflect weak interactions. Internal ionizable residues, whose pKa values are very different from those of model compounds, provide a more stringent test for identifying physically realistic models. Although using εin = 20 in FDPB calculations with static structures can give reasonable results for surface ionizable residues,29 the pKa values of internal ionizable residues are better reproduced using εin = 10,30 which does not reproduce the pKa values of surface residues. Thus a protein model that treats dynamic effects implicitly via εin is a poor model, because there is no value of εin that can self-consistently reproduce the pKa values of surface and internal ionizable groups simultaneously. This reflects the fact that the protein interior is a heterogeneous environment.
Different atoms within the protein will not be equally polarizable, and there is no compelling reason to assume that all parts of the protein will exhibit the same structural and dynamic response to ionization. Therefore the same value of εin may not be valid for all ionizable groups, and there is no way to know a priori what value to use for any given group.10 In order for structure-based calculations to self-consistently reproduce the pKa values of all ionizable residues simultaneously, structural reorganization must be treated explicitly.

1.5.2 Structural reorganization by molecular dynamics

One way to model structural reorganization explicitly is by using molecular dynamics (MD) simulations. In the simplest implementation, an ensemble of conformations is generated from an MD simulation, and an average pKa is computed from the ensemble. Early calculations of this sort on bacteriorhodopsin31 and cytochrome c32 showed that averaging had a significant effect on the calculated pKa values. However, it was difficult to judge whether the conformational averaging improved the accuracy over calculations with static structures because experimental pKa values for most residues in these proteins were either unavailable or poorly determined.

The first study to compare pKa values calculated from MD ensembles with measured pKa values was by van Vlijmen et al.33 These authors calculated pKa values in BPTI and lysozyme from MD-generated ensembles using three different approaches to conformational averaging. They also performed single-structure calculations using two different crystal structures of each protein. They found that for both proteins, pKa values calculated from MD ensembles were as good as or better than those calculated from a single crystal structure, depending on which crystal structure was used. However, even with conformational averaging, the calculated pKa values were more accurate using εin = 20 than using εin = 4.
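To give a sense of scale for the difference between εin = 4 and εin = 20, the following minimal sketch evaluates Coulomb's law for two unit charges in a uniform dielectric and converts the resulting energy into pKa units. It is not an FDPB calculation: it ignores the dielectric boundary between protein and solvent as well as ionic screening. The 5 Å separation, the temperature of 298.15 K, and the function names are assumptions chosen only for illustration.

```python
import math

COULOMB_KCAL = 332.06   # Coulomb constant in kcal*Angstrom/(mol*e^2)
R_KCAL = 1.987204e-3    # gas constant in kcal/(mol*K)


def coulomb_energy(q1, q2, r_angstrom, eps):
    """Interaction energy (kcal/mol) of two point charges in a uniform dielectric."""
    return COULOMB_KCAL * q1 * q2 / (eps * r_angstrom)


def energy_to_pk_units(dg_kcal, temperature=298.15):
    """Convert a free energy (kcal/mol) to pK units by dividing by RT ln(10)."""
    return dg_kcal / (R_KCAL * temperature * math.log(10.0))


# Two like unit charges (e.g. two carboxylates) 5 Angstrom apart (hypothetical geometry).
for eps in (4.0, 20.0, 78.5):
    dg = coulomb_energy(-1, -1, 5.0, eps)
    print(f"eps = {eps:5.1f}: {dg:6.2f} kcal/mol = {energy_to_pk_units(dg):5.2f} pK units")
```

Because the energy scales as 1/ε in this simple picture, raising the dielectric constant from 4 to 20 reduces every pairwise interaction by a factor of five, regardless of whether the original interaction was physically realistic.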
The limitations of using MD simulations to account for conformational reorganization have been discussed previously. Bashford and Gerwert pointed out the inability of classical MD to account for conformational changes linked to titration.31 If a residue is assumed to be in the charged state (the standard protonation state at pH 7 for Arg, Lys, Asp, and Glu), then the MD simulation will be biased towards conformations that stabilize the charged state. Consequently, favorable Coulomb interactions will be exaggerated and the pKa value will be shifted too far in the direction favoring the charged state. Use of εin = 20 may compensate for this bias by attenuating favorable interactions, resulting in the improved accuracy seen by van Vlijmen et al.33 One way to include coupled titration and conformational reorganization is to run two simulations, one with the residue of interest protonated and one with the residue deprotonated, and then average the results of the two simulations using a linear response approximation.26 For calculations of the pKa values of all ionizable residues in a protein, this becomes problematic because the large number of possible protonation states requires generating an equally large number of MD trajectories.

Another difficulty with MD simulations is that they are limited in the range of timescales that can be sampled. Large changes, such as local unfolding or water penetration, may have a substantial impact on pKa values yet cannot be sampled adequately in the timescales accessible to current MD simulations.21,34 Consequently, a high value of εin may still be necessary to account for the effects of processes that are slow compared to the timescale of the simulations.

1.5.3 Constant-pH molecular dynamics

Recently, constant-pH molecular dynamics (CPHMD) methods have been developed to address some of the aforementioned issues with classical MD.
In these methods, coupling between conformational dynamics and protonation is explicitly modeled using one of two approaches: (1) continuous titration coordinates are propagated alongside the spatial coordinates using λ dynamics,35 or (2) discrete protonation states are sampled throughout the simulation via Monte Carlo.36 In such an approach, while explicit solvent can be used in calculation of the conformational state, it is not practical to calculate protonation states using explicit solvent because of the lengthy simulation times required to compute solvation forces accurately.35 Instead, most of these methods calculate protonation states using a generalized Born implicit solvent model. Unfortunately, this model is known to underestimate effective Born radii for buried atoms, which leads to overestimation of solvation energies and underestimation of Coulomb interactions.37 CPHMD methods also suffer from problems with slow convergence,37 although the introduction of enhanced sampling techniques such as replica exchange35 and accelerated MD38 may alleviate some of these problems.

1.5.3 Multi-conformation continuum electrostatics

Another method that accounts for conformational reorganization explicitly is the multi-conformation continuum electrostatics (MCCE) method.39,40 In this method, each side chain in the protein is allowed to adopt multiple rotameric and tautomeric states (conformers) with energies calculated using a continuum approach. The populations of the conformers at specific pH values are then determined using Monte Carlo sampling. Thus the coupling between structural reorganization and ionization is treated explicitly for side chains, as well as for site-bound ions and internal water molecules.
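The idea of determining state populations at a given pH by Monte Carlo sampling can be illustrated with a deliberately reduced toy model. The sketch below is not the MCCE algorithm: it samples only the protonation states of a few acidic sites, with no rotamers or tautomers, and it takes the intrinsic pKa values and the pairwise repulsions (in pK units) as given inputs instead of computing them with a continuum method. All function names and numerical values are hypothetical.

```python
import math
import random

LN10 = math.log(10.0)


def microstate_energy(state, ph, pk_int, w):
    """Energy (in units of kT) of a protonation microstate of acidic sites.

    state[i] is 1 if site i is protonated (neutral) and 0 if deprotonated (charged).
    w[i][j] is the repulsion, in pK units, present when sites i and j are both charged.
    """
    n = len(state)
    g = sum(state[i] * (ph - pk_int[i]) for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            g += w[i][j] * (1 - state[i]) * (1 - state[j])
    return LN10 * g


def sample_protonation(ph, pk_int, w, n_steps=50000, seed=0):
    """Metropolis sampling; returns the average protonation of each site."""
    rng = random.Random(seed)
    n = len(pk_int)
    state = [1] * n
    energy = microstate_energy(state, ph, pk_int, w)
    totals = [0] * n
    for _ in range(n_steps):
        i = rng.randrange(n)          # pick one site and propose flipping it
        trial = list(state)
        trial[i] = 1 - trial[i]
        trial_energy = microstate_energy(trial, ph, pk_int, w)
        delta = trial_energy - energy
        if delta <= 0 or rng.random() < math.exp(-delta):
            state, energy = trial, trial_energy
        for k in range(n):
            totals[k] += state[k]
    return [t / n_steps for t in totals]


# Two hypothetical carboxylic sites that repel each other by 1.5 pK units when both are charged.
coupling = [[0.0, 1.5], [1.5, 0.0]]
print(sample_protonation(5.0, [3.0, 5.0], coupling))
```

Repeating the sampling over a range of pH values yields a titration curve for each site, which is the quantity that is compared with experiment.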
MCCE improves the accuracy of pKa calculations when a value of εin lower than 20 is used, illustrating just how sensitive the calculated pKa values can be to small changes in the local microenvironment.40 However, arbitrary adjustments to εin are still necessary to get the best accuracy.41 Calculations on different crystal structures of the same protein give different results, and pKa values averaged over multiple crystal structures tend to be more accurate than values calculated from a single structure.40 Furthermore, MCCE does a poor job of reproducing the pKa values of residues whose ionization is coupled to unfolding.42 These results imply that backbone reorganization, which is not treated explicitly in the MCCE calculations, can have a significant impact on pKa values.

1.5.4 Other ensemble-modulated continuum electrostatics

Backbone reorganization can be treated explicitly with the ensemble-modulated continuum electrostatics (EMCE) method.43 This method is based on the COREX algorithm,44,45 which generates a Boltzmann-weighted ensemble of partially unfolded microstates from a single input structure. Within each microstate, ionizable groups that are in folded regions of the protein and sufficiently protected from solvent are assigned microscopic pKa values calculated from a single structure with a continuum method. Groups that are in unfolded regions or exposed to solvent are assigned model compound pKa values. The protonation state of each residue is then averaged over the entire ensemble at each pH to obtain titration curves. Each residue’s overall pKa will then be a population-weighted average:

$$ pK_a = P_f \, pK_{a,f} + P_u \, pK_{a,u} \qquad (1.8) $$

where pKa,f and pKa,u are the pKa values in the folded and unfolded states, respectively, and Pf and Pu are the populations of those states.
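As a small numerical illustration of equation 1.8, the sketch below evaluates the population-weighted average for a single residue. It assumes a strictly two-state description in which Pu = 1 − Pf. The function name and the example values (a carboxylic group with a pKa of 2.0 in the folded state and 4.0 in the unfolded state) are hypothetical and are not taken from the EMCE calculations.

```python
def ensemble_pka(p_folded, pka_folded, pka_unfolded):
    """Population-weighted pKa of one residue (equation 1.8), assuming Pu = 1 - Pf."""
    if not 0.0 <= p_folded <= 1.0:
        raise ValueError("p_folded must be between 0 and 1")
    p_unfolded = 1.0 - p_folded
    return p_folded * pka_folded + p_unfolded * pka_unfolded


# Lowering the local stability (smaller Pf) moves the pKa toward the unfolded-state value.
for p_f in (1.0, 0.9, 0.5):
    print(p_f, ensemble_pka(p_f, pka_folded=2.0, pka_unfolded=4.0))
```

In this picture, any perturbation that lowers Pf for the region containing a residue pulls its pKa toward the model compound value, even if the folded-state structure itself is unchanged.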
Residues that are in less stable regions of the protein, and thus more prone to local unfolding, will have more normal pKa values in the ensemble calculations compared with the static structure calculations. Furthermore, because groups can have different pKa values in different microstates, the populations of the microstates will be pH-dependent. Specifically, as the pH decreases, microstates with higher pKa values will become more favorable. This method has been shown to reproduce correctly the acid-unfolding behavior of staphylococcal nuclease,43 but its ability to reproduce individual pKa values has yet to be tested. This method also ignores the effects of side chain reorganization within folded regions of the protein. Thus the EMCE and MCCE techniques are complementary to each other. MCCE only treats side chain reorganization explicitly, while EMCE only treats local and global unfolding explicitly. However, neither technique on its own provides a complete description of the protein response to ionization.

1.6 pKa values of His, Asp, and Glu in staphylococcal nuclease

The studies described in the following chapters used staphylococcal nuclease (SNase) as a model system for probing electrostatic effects in proteins in detail. SNase is extremely useful as a model protein owing to its relatively small size (149 residues), its high solubility, which makes it highly amenable to experimental analysis in general and to NMR studies in particular, its high stability, and the ease with which it can be crystallized and manipulated. It contains a large number of ionizable residues: 23 Lys, 12 Glu, 8 Asp, 5 Arg, and 4 His, which provides the possibility of many electrostatic interactions at the protein surface.
The pKa values of all 20 Asp and Glu residues have been measured previously by NMR,46 as have those of the His residues.47,48 FDPB calculations can reproduce the pKa values of the His residues48 but not those of the Asp and Glu residues,46 even when the protein is treated with εin = εw. Reasonable agreement between calculated and measured pKa values of Asp and Glu residues is obtained only when the calculations are carried out at an ionic strength of 1 M. This suggests that the calculations overestimate the magnitude of the ∆pKij term in equation 1.2, since raising the ionic strength increases screening of medium- and long-range Coulomb interactions. For carboxylic residues, these interactions are predominantly attractive; thus overestimation of these interactions leads to calculated pKa values that are too depressed. This leads the calculations to overestimate the number of protons taken up during unfolding and to predict the protein to unfold at a much higher pH than is observed experimentally.21 Furthermore, these calculations are unable to reproduce the anomalous pKa values of the active site residues. Thus SNase is useful as a model system in which to demonstrate the limitations of structure-based electrostatics calculations. Studying the determinants of pKa values in SNase experimentally can provide insight into why these calculations fail, and hopefully lead to improvements in the calculations as well as in our general understanding of the relationship between protein structure and electrostatic energy.

1.7 Overview of the contents of this dissertation

The studies described in this dissertation examine two situations in which pKa values are particularly difficult for structure-based electrostatics calculations to reproduce. The first is when multiple ionizable groups come together to form a cluster, such as might be found in an enzyme active site.
The second is when ionizable residues are under the influence of significant backbone conformational reorganization. To understand why the calculations fail in these instances, experimental data that provide physical insight into the determinants of the pKa values are needed. Measuring how pKa values in a protein shift in response to specific mutations can provide this insight, if we also know how those mutations affect other physical properties of the protein. The resulting data can be used to evaluate the physical accuracy of existing methods for structure-based pKa calculations, as well as to guide the development of more accurate methods.

Clusters of ionizable residues are typically found in the active sites of enzymes, where ionizable residues facilitate catalysis by acting as general acids/bases or as nucleophiles, or by stabilizing transition states through electrostatic interactions.6,49 Having multiple ionizable residues in close proximity creates strong electrostatic interactions, which give rise to highly shifted pKa values and complex titration curves for the clustered residues.49–51 Because of these interactions, the pKa values of the clustered residues are strongly coupled to each other, and reflect a precise balance between the different interactions represented by the terms in equation 1.2. Small changes to any one of these interactions can affect all of the residues in the cluster significantly. Consequently, the pKa values in such a cluster are sensitive to small variations in structure, making structure-based pKa calculations extremely difficult. To our knowledge, nobody has yet attempted to dissect the interactions in such a cluster experimentally.

Chapter 2 of this dissertation comprises a detailed characterization of electrostatic interactions in the active site of SNase. The active site of SNase contains a cluster of four carboxylic groups with very different pKa values.
One pKa is elevated, one is depressed, and the other two are unchanged relative to model compound pKa values. FDPB calculations consistently fail to reproduce these pKa shifts. To understand why participation in the cluster has such different effects on each group’s pKa, the ionizable residues in and around this cluster were systematically neutralized by mutagenesis, and the pKa values in these variants were measured by NMR spectroscopy. This made it possible to dissect the detailed network of interactions present in this cluster and to separate intrinsic (pKint) from cooperative (pKij) contributions to each residue’s pKa.

Chapter 3 explores the hypothesis that backbone reorganization influences pKa values. As outlined in section 1.5, the high values of εin required for continuum calculations to reproduce the pKa values of surface groups are believed to account implicitly for the effects of conformational reorganization.29 In some cases, however, even using εin = εw is not sufficient to reproduce experimental pKa values. Such is the case with the Asp and Glu residues in SNase.46 This implies that the single protein structure used in the calculations is an inadequate representation of the protein in solution, which exists in an ensemble of conformations. Support for this view comes from calculations using the EMCE method described in section 1.5, which are better at reproducing the acid unfolding of SNase than FDPB calculations with a single structure.43 The success of the EMCE model, which treats conformational reorganization as a local unfolding process, suggests that pKa values are coupled to local conformational stability. This hypothesis was tested using amino acid substitutions intended to perturb the local stability of the protein backbone without affecting the overall, global structure.
Changes to the measured pKa values in these variants are strong evidence that local stability has a significant influence on pKa values and must be treated explicitly in electrostatics calculations. The data from this chapter also provide a useful test of the accuracy of methods that explicitly model conformational reorganization.

1.7 References

1. Burykin, A. & Warshel, A. (2003). What Really Prevents Proton Transport through Aquaporin? Charge Self-Energy versus Proton Wire Proposals. Biophysical Journal 85, 3696–3706
2. Burykin, A. & Warshel, A. (2004). On the origin of the electrostatic barrier for proton transport in aquaporin. FEBS Letters 570, 41–46
3. Braun-Sand, S., Strajbl, M. & Warshel, A. (2004). Studies of Proton Translocations in Biological Systems: Simulating Proton Transport in Carbonic Anhydrase by EVB-Based Models. Biophysical Journal 87, 2221–2239
4. Gunner, M.R. & Alexov, E. (2000). A pragmatic approach to structure based calculation of coupled proton and electron transfer in proteins. Biochimica et Biophysica Acta (BBA) - Bioenergetics 1458, 63–87
5. Burykin, A., Kato, M. & Warshel, A. (2003). Exploring the origin of the ion selectivity of the KcsA potassium channel. Proteins: Structure, Function, and Bioinformatics 52, 412–426
6. Warshel, A. (2003). COMPUTER SIMULATIONS OF ENZYME CATALYSIS: Methods, Progress, and Insights. Annual Review of Biophysics and Biomolecular Structure 32, 425–443
7. Chu, A.H., Turner, B.W. & Ackers, G.K. (1984). Effects of protons on the oxygenation-linked subunit assembly in human hemoglobin. Biochemistry 23, 604–617
8. Ehrlich, L.S., Liu, T., Scarlata, S., Chu, B. & Carter, C.A. (2001). HIV-1 Capsid Protein Forms Spherical (Immature-Like) and Tubular (Mature-Like) Particles in Vitro: Structure Switching by pH-induced Conformational Changes. Biophysical Journal 81, 586–594
9. Warshel, A. & Levitt, M. (1976). Theoretical studies of enzymic reactions: Dielectric, electrostatic and steric stabilization of the carbonium ion in the reaction of lysozyme. Journal of Molecular Biology 103, 227–249
10. García-Moreno E, B. & Fitch, C.A. (2004). Structural Interpretation of pH and Salt-Dependent Processes in Proteins with Computational Methods. Methods in Enzymology Volume 380, 20–51
11. Bashford, D. & Karplus, M. (1990). pKa’s of ionizable groups in proteins: atomic detail from a continuum electrostatic model. Biochemistry 29, 10219–10225
12. Warshel, A. (1981). Calculations of enzymic reactions: calculations of pKa, proton transfer reactions, and general acid catalysis reactions in enzymes. Biochemistry 20, 3167–3177
13. Castaneda, C.A. (2009). Determinants of electrostatic energies and pKa values in proteins. at <http://search.proquest.com/docview/304907798?accountid=11752>
14. Keim, P., Vigna, R.A., Nigen, A.M., Morrow, J.S. & Gurd, F.R.N. (1974). Carbon 13 Nuclear Magnetic Resonance of Pentapeptides of Glycine Containing Central Residues of Methionine, Proline, Arginine, and Lysine. J. Biol. Chem. 249, 4149–4156
15. Matthew, J.B., Gurd, F.R.N., Garcia-Moreno, B.E., Flanagan, M.A., March, K.L. & Shire, S.J. (1985). pH-Dependent Processes in Protein. Critical Reviews in Biochemistry and Molecular Biology 18, 91–197
16. Markley, J.L. (1975). Observation of histidine residues in proteins by nuclear magnetic resonance spectroscopy. Accounts of Chemical Research 8, 70–80
17. Forsyth, W.R., Antosiewicz, J.M. & Robertson, A.D. (2002). Empirical relationships between protein structure and carboxyl pKa values in proteins. Proteins: Structure, Function, and Genetics 48, 388–403
18. Edgcomb, S.P. & Murphy, K.P. (2002). Variability in the pKa of histidine side-chains correlates with burial within proteins. Proteins: Structure, Function, and Bioinformatics 49, 1–6
19. Isom, D.G., Castañeda, C.A., Cannon, B.R., Velu, P.D. & García-Moreno E., B. (2010).
Charges in the hydrophobic interior of proteins. Proceedings of the National Academy of Sciences of the United States of America 107, 16096–16100
20. Isom, D.G., Castañeda, C.A., Cannon, B.R. & García-Moreno E., B. (2011). Large shifts in pKa values of lysine residues buried inside a protein. Proceedings of the National Academy of Sciences of the United States of America 108, 5260–5265
21. Fitch, C.A., Whitten, S.T., Hilser, V.J. & García-Moreno E., B. (2006). Molecular mechanisms of pH-driven conformational transitions of proteins: Insights from continuum electrostatics calculations of acid unfolding. Proteins: Structure, Function, and Bioinformatics 63, 113–126
22. Oda, Y., Yamazaki, T., Nagayama, K., Kanaya, S., Kuroda, Y. & Nakamura, H. (1994). Individual Ionization Constants of All the Carboxyl Groups in Ribonuclease HI from Escherichia coli Determined by NMR. Biochemistry 33, 5275–5284
23. Chen, H.A., Pfuhl, M., McAlister, M.S.B. & Driscoll, P.C. (2000). Determination of pKa Values of Carboxyl Groups in the N-Terminal Domain of Rat CD2: Anomalous pKa of a Glutamate on the Ligand-Binding Surface. Biochemistry 39, 6814–6824
24. Feeney, J., Batchelor, J.G., Albrand, J.P. & Roberts, G.C.K. (1979). The effects of intermediate exchange processes on the estimation of equilibrium constants by NMR. Journal of Magnetic Resonance (1969) 33, 519–529
25. Sudmeier, J.L., Evelhoch, J.L. & Jonsson, N.B.-H. (1980). Dependence of NMR lineshape analysis upon chemical rates and mechanisms: Implications for enzyme histidine titrations. Journal of Magnetic Resonance (1969) 40, 377–390
26. Schutz, C.N. & Warshel, A. (2001). What are the dielectric “constants” of proteins and how to validate electrostatic models? Proteins: Structure, Function, and Bioinformatics 44, 400–417
27. Takashima, S. & Schwan, H.P. (1965). Dielectric Dispersion of Crystalline Powders of Amino Acids, Peptides, and Proteins. The Journal of Physical Chemistry 69, 4176–4182
28. Gilson, M.K. & Honig, B.H. (1986). The dielectric constant of a folded protein. Biopolymers 25, 2097–2119
29. Antosiewicz, J., McCammon, J.A. & Gilson, M.K. (1994). Prediction of pH-dependent Properties of Proteins. Journal of Molecular Biology 238, 415–436
30. Fitch, C.A., Karp, D.A., Lee, K.K., Stites, W.E., Lattman, E.E. & García-Moreno, E.B. (2002). Experimental pKa Values of Buried Residues: Analysis with Continuum Methods and Role of Water Penetration. Biophysical Journal 82, 3289–3304
31. Bashford, D. & Gerwert, K. (1992). Electrostatic calculations of the pKa values of ionizable groups in bacteriorhodopsin. Journal of Molecular Biology 224, 473–486
32. Zhou, H.-X. & Vijayakumar, M. (1997). Modeling of protein conformational fluctuations in pKa predictions. Journal of Molecular Biology 267, 1002–1011
33. Van Vlijmen, H.W.T., Schaefer, M. & Karplus, M. (1998). Improving the accuracy of protein pKa calculations: Conformational averaging versus the average structure. Proteins: Structure, Function, and Bioinformatics 33, 145–158
34. Warshel, A., Sharma, P.K., Kato, M. & Parson, W.W. (2006). Modeling electrostatic effects in proteins. Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics 1764, 1647–1676
35. Wallace, J.A. & Shen, J.K. (2011). Continuous Constant pH Molecular Dynamics in Explicit Solvent with pH-Based Replica Exchange. Journal of Chemical Theory and Computation 7, 2617–2629
36. Mongan, J., Case, D.A. & McCammon, J.A. (2004). Constant pH molecular dynamics in generalized Born implicit solvent. Journal of Computational Chemistry 25, 2038–2048
37. Alexov, E., Mehler, E.L., Baker, N., M. Baptista, A., Huang, Y., Milletti, F., Erik Nielsen, J., Farrell, D., Carstensen, T., Olsson, M.H.M., Shen, J.K., Warwicker, J., Williams, S. & Word, J.M. (2011). Progress in the prediction of pKa values in proteins. Proteins: Structure, Function, and Bioinformatics 79, 3260–3275
38. Williams, S.L., de Oliveira, C.A.F. & McCammon, J.A. (2010).
Coupling Constant pH Molecular Dynamics with Accelerated Molecular Dynamics. Journal of Chemical Theory and Computation 6, 560–568
39. Alexov, E.G. & Gunner, M.R. (1997). Incorporating protein conformational flexibility into the calculation of pH-dependent protein properties. Biophysical Journal 72, 2075–2093
40. Georgescu, R.E., Alexov, E.G. & Gunner, M.R. (2002). Combining conformational flexibility and continuum electrostatics for calculating pKas in proteins. Biophysical Journal 83, 1731–1748
41. Gunner, M.R., Zhu, X. & Klein, M.C. (2011). MCCE analysis of the pKas of introduced buried acids and bases in staphylococcal nuclease. Proteins: Structure, Function, and Bioinformatics 79, 3306–3319
42. Song, Y., Mao, J. & Gunner, M.R. (2009). MCCE2: Improving protein pKa calculations with extensive side chain rotamer sampling. Journal of Computational Chemistry 30, 2231–2247
43. Whitten, S.T., García-Moreno E., B. & Hilser, V.J. (2005). Local conformational fluctuations can modulate the coupling between proton binding and global structural transitions in proteins. Proceedings of the National Academy of Sciences of the United States of America 102, 4282–4287
44. Hilser, V.J. & Freire, E. (1996). Structure-based Calculation of the Equilibrium Folding Pathway of Proteins. Correlation with Hydrogen Exchange Protection Factors. Journal of Molecular Biology 262, 756–772
45. Hilser, V.J. & Freire, E. (1997). Predicting the equilibrium protein folding pathway: Structure-based analysis of staphylococcal nuclease. Proteins: Structure, Function, and Bioinformatics 27, 171–183
46. Castañeda, C.A., Fitch, C.A., Majumdar, A., Khangulov, V., Schlessman, J.L. & García-Moreno, B.E. (2009). Molecular determinants of the pKa values of Asp and Glu residues in staphylococcal nuclease. Proteins: Structure, Function, and Bioinformatics 77, 570–588
47. Lee, K.K., Fitch, C.A., Lecomte, J.T.J. & García-Moreno E, B. (2002). Electrostatic Effects in Highly Charged Proteins: Salt Sensitivity of pKa Values of Histidines in Staphylococcal Nuclease. Biochemistry 41, 5656–5667
48. Lee, K.K., Fitch, C.A. & García-Moreno E., B. (2002). Distance dependence and salt sensitivity of pairwise, coulombic interactions in a protein. Protein Science 11, 1004–1016
49. Søndergaard, C.R., McIntosh, L.P., Pollastri, G. & Nielsen, J.E. (2008). Determination of electrostatic interaction energies and protonation state populations in enzyme active sites. Journal of Molecular Biology 376, 269–287
50. McIntosh, L.P., Hand, G., Johnson, P.E., Joshi, M.D., Körner, M., Plesniak, L.A., Ziser, L., Wakarchuk, W.W. & Withers, S.G. (1996). The pKa of the general acid/base carboxyl group of a glycosidase cycles during catalysis: a 13C-NMR study of Bacillus circulans xylanase. Biochemistry 35, 9958–9966
51. Ondrechen, M.J., Clifton, J.G. & Ringe, D. (2001). THEMATICS: A simple computational predictor of enzyme function from structure. Proceedings of the National Academy of Sciences of the United States of America 98, 12473–12478

2 Electrostatic Coupling in a Cluster of Carboxylic Groups in the Active Site of an Enzyme

(to be submitted in a slightly different form to Journal of Molecular Biology under the authorship of Brian M. Doctrow, Carlos A. Castañeda, Carolyn A. Fitch, Ananya Majumdar, Maja Cieplak, Jamie L. Schlessman, and Bertrand García-Moreno E.)

2.1 Abstract

Clusters of ionizable groups in active sites of enzymes are useful to examine the partitioning of Gibbs free energy in cooperative and allosteric ligand binding systems. The active site cluster of staphylococcal nuclease consists of two carboxylic groups (Asp-40, Glu-43) with near normal pKa values (3.9 and 4.3, respectively), one (Asp-19) with a low value of 2.2, and one (Asp-21) with an anomalous value of 6.5. Typical of active sites and other such clusters, FDPB calculations cannot reproduce the measured pKa values using εin = 20.
NMR spectroscopy was used to examine the partitioning of cooperative interaction free energy between these four H+ binding sites. H+ titration curves of all carboxylic groups were measured for variants in which the charge of the cluster was modified. The data suggest that Asp-19 is insensitive to repulsive Coulomb interactions because it has a low intrinsic pKa value, a consequence of accepting a hydrogen bond from the backbone amide of Asp-21 and of favorable Coulomb interactions not observed in the crystal structure. Asp-21 absorbs most of the repulsive interaction energy in the cluster because it has an elevated intrinsic pKa, the result of its acting as a hydrogen bond donor to the backbone carbonyl of Val-39. This cluster of carboxylic groups exhibits the amplification of small perturbations that is the hallmark of cooperative systems. These results illustrate problems inherent to structure-based calculations of ligand binding energy in cooperative ligand binding systems, where small and unavoidable inaccuracies of the models used in the calculations and of the crystal structures used by the models are amplified, thereby compromising the accuracy of the calculations. Implications of these findings for structure-based pKa calculations are discussed.

2.2 Introduction

Biological function in many proteins results from energetically coupling processes occurring in different parts of the molecule. Electrostatic forces govern biological function in enzymes, proton pumps, viruses, and other systems where cooperative H+ binding interactions are essential.
Enzymes are especially noteworthy in this respect, as they often have clusters of ionizable groups at their active sites, and strong Coulomb interactions and structural reorganization coupled to ionization events can lead to anomalous pKa values and complex titration curves.1,2 The active site of xylanase, for example, contains two acidic residues: Glu-78, which titrates with a near normal pKa of 4.6, and Glu-172, which titrates with an anomalous pKa of 6.7.3 Understanding how coupling free energy can be distributed in a network of ligand binding sites,4 and specifically how the pKa values of individual H+ binding sites are affected by cooperative interactions within clusters of ionizable groups, is essential to understanding the relationship between protein structure and function.

The active site of staphylococcal nuclease (SNase), with a cluster of four carboxylic residues (Asp-19, Asp-21, Asp-40, and Glu-43), is well suited for in-depth studies of coupling energies in charged clusters (Figure 2.1(a)). Previous studies5–7 have shown that all of these residues except Asp-19 are crucial for catalysis. The distribution of pKa values in this cluster is striking: Asp-21 has a pKa = 6.5, Asp-19 has a pKa = 2.2, and Asp-40 and Glu-43 have normal values of 3.9 and 4.3, respectively.8 The molecular determinants of these pKa values are not obvious in SNase crystal structures, which appear not to reflect the conformational state of this region of the protein in solution.

Figure 2.1. (a) Structure of the active site of ∆+PHS SNase at 1.80 Å (PDB accession code: 3BDC)8 showing the side chains of Asp-19, Asp-21, Arg-35, Asp-40, and Glu-43, as labeled. Shortest distances between the ionizable moieties are indicated. Also shown in purple is an apparent hydrogen bond between Asp-21 and the backbone amide of Thr-41. The green sphere indicates where Ca2+ is bound. (b) Structure of the active site of the NVIAGA/E75A variant of SNase at 2.01 Å (PDB accession code 2RDF)9 showing the side chains of Asp-19, Asn-21, Arg-35, and Asp-40, as labeled (Glu-43 is disordered). Shortest distances from Arg-35 to Asp-19 or Asn-21 are indicated. Also shown in purple is the apparent hydrogen bond between Asn-21 and the backbone carbonyl of Val-39.

The carboxylic groups in this cluster are known to interact with each other. For example, titration curves obtained from the pH dependence of the chemical shifts of the Cγ and Hβ of Asp-19 measured with NMR spectroscopy report on the titration of Asp-21, and vice versa.8 Similarly, the Cγ and Hβ chemical shifts of Asp-40 also appear to report on the protonation state of Asp-21. Furthermore, Asp-40 and Glu-43 have low Hill coefficients, even in the presence of 1 M KCl, suggesting that their titration is under the influence of other groups with comparable pKa values.8 The pKa values suggest that the free energy from cooperative interactions is not partitioned evenly among the cluster elements, and that Asp-21 absorbs all of the repulsive interactions in the cluster. Crystal structures do not provide insight into the structural and physical origins of this apparent asymmetry.

The H+ titration curves described by the pH-dependence of chemical shifts measured with NMR spectroscopy represent individual-site binding isotherms.4 They describe the free energy for binding H+ to one specific site while binding can occur simultaneously at all other sites. In charged systems the cooperative interactions between ligand binding events at different individual H+ binding sites can be mediated directly through Coulomb interactions, and indirectly through conformational changes that shift the equilibrium of the H+ binding site between charged and neutral states. The individual-site binding isotherms thus represent a convolution of numerous processes and interactions.
In scenarios more complex than a two- or three-site ligand binding system, it would be impossible to deconvolute analytically the intrinsic, microscopic ligand binding constants for each site from the cooperative interaction energies.4 However, because the intrinsic ligand binding properties of the different ionizable sites can be studied independently by measurement of pKa values of model compounds in water (e.g. pKa = 4.0 or 4.5 for Asp or Glu in water, respectively), it is possible to measure cooperative interactions in individual-site binding isotherms empirically. Ackers and co-workers demonstrated analytically that the free energy of cooperative interactions in a two-site ligand binding system need not be distributed symmetrically.4 Instead the cooperative interaction will have a larger effect on the binding site with the weaker intrinsic binding affinity. In the case of two carboxylic groups with cooperative interactions mediated by Coulomb forces, the largest share of the cooperative interaction will be allocated to the site with the higher intrinsic pKa, defined as the pKa that the group would have in the absence of other ionizable groups. When the intrinsic pKa values of the two sites are very different, the one with the higher intrinsic pKa will be neutral in the pH range where the site with the lower intrinsic pKa titrates. Therefore the site with low intrinsic pKa titrates as if the other site is not there.

Figure 2.1(a) depicts the active site of SNase as observed in atomic coordinates (PDB accession code 3BDC)8 that can be considered representative of published structures of SNase. The apparent hydrogen bond between Asp-21 and the backbone amide of Thr-41 should lower the pKa of Asp-21. The inferred ion pair between Asp-21 and Arg-35 should compensate for repulsive Coulomb interactions with other members of the cluster and perhaps also lower its pKa. Instead, the pKa of Asp-21 is 6.5, significantly higher than the Asp model compound pKa of 4.0.
Burial of a carboxyl oxygen could lead to an elevated pKa because dehydration will destabilize the charged state. However, the average solvent accessibility of the carboxyl oxygen atoms of Asp-21 is similar to that of Asp-19 (16% vs. 19%), so differential dehydration is unlikely to explain the difference between their pKa values. It is not obvious from the crystal structure why the pKa of Asp-21 is high and that of Asp-19 is low.

This project sought to examine the nature of cooperative interactions in the charge cluster. Charges in or near the cluster were removed systematically with site-directed mutagenesis, and H+ binding isotherms for all carboxylic groups in the variants were measured with NMR spectroscopy.8 The crystal structure of SNase variant ∆+PHS/D21N was determined, and the conformations of the active site cluster in 119 crystal structures of SNase were compared. Nuclear Overhauser effects (NOEs) were used to examine specific atomic contacts in the active site region of the protein in solution. After the interactions in the cluster were understood in sufficient detail to explain why the pKa values of Asp-19 and Asp-21 are so different, pKa values were calculated with a variety of continuum electrostatics methods to illustrate the inherent difficulties with structure-based calculation of pKa values in clusters of ionizable groups where strong cooperative interactions are present. In these cases, precisely because of the cooperative nature of the system, the accuracy of energy calculations is compromised by the unavoidable amplification either of inaccuracies inherent to the electrostatic models or of artifacts in the crystal structures.
2.3 Results

2.3.1 Coulomb interactions in the active site cluster

To determine the contributions from pairwise Coulomb interactions to the pKa values of carboxylic groups in the active site cluster, variants of the highly stable ∆+PHS form of SNase were created in which the ionizable groups in the cluster (Asp-19, Asp-21, Arg-35, Asp-40, and Glu-43) were replaced with the non-ionizable analogues Asn or Gln. The ∆+PHS form of SNase was used as the background protein to ensure it remained folded in the pH range where Asp and Glu residues titrate, so that their pKa values could be measured. Of the 8 Asp and 12 Glu residues in SNase, most were insensitive to these substitutions and exhibited no shifts in pKa (Appendix A).

The D19N, D40N, and E43Q substitutions lowered pKa values in the cluster, consistent with removing repulsive Coulomb interactions. Significant repulsive interactions were apparent between Asp-19 and Asp-21, Asp-19 and Glu-43, Asp-40 and Glu-43, and Asp-21 and Asp-40, but not between Asp-19 and Asp-40 (Table 2.1), which is not surprising because Asp-19 and Asp-40 are on opposite ends of the cluster (Figure 2.1(a)) and farther apart than any other pair of carboxylic groups in the cluster. Glu-43 also appeared to interact with Glu-52, whose ionizable moiety is roughly 6 Å from that of Glu-43, but which does not interact with any of the other groups in the cluster.

The pKa of Asp-19 increased slightly in the E43Q variant. Similarly, the D21N substitution caused the pKa values of the other groups in the cluster to increase. Neither effect can be explained in terms of Coulomb interactions, as the mutated groups are neutral in the range where the affected carboxylic groups titrate. These observations suggest the D21N and E43Q substitutions must somehow perturb conformational equilibria in the cluster.

Table 2.1. pKa values of Asp and Glu residues in or near the active site of SNase measured at 100 mM KCl. Entries marked -- have no reported value.

Protein / Residue               pKa (a)               ∆pKa (b)          n (a)
∆+PHS (c)
  Asp-19                        2.12 ± 0.05 (d,e)     --                0.81 ± 0.01 (d,e)
  Asp-21                        6.54 ± 0.01 (e)       --                1.03 ± 0.02 (e)
  Asp-40                        3.83 ± 0.05 (e)       --                0.65 ± 0.01 (e)
  Glu-43                        4.32 ± 0.03           --                0.69 ± 0.01
  Glu-52                        3.93 ± 0.05           --                0.65 ± 0.02
∆+PHS/D19N
  Asp-19                        --                    --                --
  Asp-21                        5.75 ± 0.02 (e)       -0.79 ± 0.02      0.94 ± 0.02 (e)
  Asp-40                        3.80 ± 0.03           -0.03 ± 0.06      0.56 ± 0.02
  Glu-43                        3.79 ± 0.01           -0.52 ± 0.03      0.68 ± 0.01
  Glu-52                        3.85 ± 0.02           -0.08 ± 0.05      0.69 ± 0.01
∆+PHS/D21N
  Asp-19                        2.60 ± 0.01 (d)       0.48 ± 0.05       0.82 ± 0.02
  Asp-21                        --                    --                --
  Asp-40                        3.94 ± 0.01           0.11 ± 0.05       0.68 ± 0.01
  Glu-43                        4.46 ± 0.02           0.14 ± 0.05       0.67 ± 0.02
  Glu-52                        4.10 ± 0.03           0.17 ± 0.06       0.66 ± 0.02
∆+PHS/D40N
  Asp-19                        2.19 ± 0.01 (d,e)     0.07 ± 0.05       0.93 ± 0.02 (d,e)
  Asp-21                        6.18 ± 0.01 (e)       -0.36 ± 0.01      0.97 ± 0.01 (e)
  Asp-40                        --                    --                --
  Glu-43                        4.11 ± 0.01           -0.21 ± 0.03      0.73 ± 0.01
  Glu-52                        3.77 ± 0.02           -0.16 ± 0.05      0.75 ± 0.02
∆+PHS/E43Q
  Asp-19                        2.34 ± 0.01 (d,e)     0.22 ± 0.05       0.81 ± 0.03 (d,e)
  Asp-21                        6.16 ± 0.01 (e)       -0.38 ± 0.01      0.93 ± 0.01 (e)
  Asp-40                        3.69 ± 0.01 (e)       -0.14 ± 0.05      0.83 ± 0.01 (e)
  Glu-43                        --                    --                --
  Glu-52                        3.65 ± 0.01           -0.28 ± 0.05      0.75 ± 0.01
∆+PHS/D19N/D40N/E43Q
  Asp-19                        --                    --                --
  Asp-21                        4.57 ± 0.01           -1.97 ± 0.01      0.93 ± 0.02
  Asp-40                        --                    --                --
  Glu-43                        --                    --                --
  Glu-52                        3.55 ± 0.01           -0.38 ± 0.05      0.87 ± 0.02
∆+PHS/R35Q
  Asp-19                        3.06 ± 0.01 (d,e)     0.94 ± 0.05       0.88 ± 0.02 (d,e)
  Asp-21                        6.05 ± 0.01 (e)       -0.49 ± 0.01      0.89 ± 0.02 (e)
  Asp-40                        4.27 ± 0.01           0.44 ± 0.05       0.64 ± 0.01
  Glu-43                        4.45 ± 0.02           0.13 ± 0.04       0.63 ± 0.02
  Glu-52                        3.89 ± 0.02           -0.04 ± 0.02      0.61 ± 0.01
∆+PHS/D19N/R35Q/D40N/E43Q
  Asp-19                        --                    --                --
  Asp-21                        4.65 ± 0.01           -1.89 ± 0.01      0.90 ± 0.02
  Asp-40                        --                    --                --
  Glu-43                        --                    --                --
  Glu-52                        3.67 ± 0.01           -0.26 ± 0.05      0.85 ± 0.02
∆+PHS/D21N/R35Q/D40N/E43Q
  Asp-19                        3.46 ± 0.03           1.34 ± 0.06       0.96 ± 0.04
  Asp-21                        --                    --                --
  Asp-40                        --                    --                --
  Glu-43                        --                    --                --
  Glu-52                        3.76 ± 0.02           -0.17 ± 0.05      0.86 ± 0.02
∆+PHS/R35Q/D40N/E43Q
  Asp-19                        3.10 ± 0.05 (e)       0.98 ± 0.07       0.87 ± 0.08 (e)
  Asp-21                        5.70 ± 0.01           -0.84 ± 0.01      0.99 ± 0.02
  Asp-40                        --                    --                --
  Glu-43                        --                    --                --
  Glu-52                        3.78 ± 0.01           -0.15 ± 0.05      0.91 ± 0.01
PHS
  Asp-19                        2.05 ± 0.05 (d,e)     -0.07 ± 0.07      0.63 ± 0.06 (d,e)
  Asp-21                        6.12 ± 0.05           -0.42 ± 0.05      0.81 ± 0.06
  Asp-40                        3.73 ± 0.02 (e)       -0.10 ± 0.05      0.84 ± 0.03 (e)
  Glu-43                        3.74 ± 0.03 (e)       -0.58 ± 0.04      0.76 ± 0.07 (e)
  Glu-52                        3.90 ± 0.02           -0.03 ± 0.05      0.79 ± 0.02

(a) pKa values and Hill coefficients (n) obtained by fitting the modified Hill equation (equation (2.2)) to the pH-dependence of the Cγ/Cδ chemical shift, unless otherwise indicated. Titrations were performed at 298 K and 100 mM KCl. Values reported are those from a single titration experiment with corresponding errors of fit, unless otherwise indicated.
(b) Change in pKa relative to ∆+PHS at 100 mM KCl: ∆pKa = pKa(variant) – pKa(∆+PHS).
(c) Values reported for ∆+PHS at 100 mM KCl are means and standard errors over 3 independent titration experiments, using the data from Castañeda et al.8
(d) pKa and Hill coefficient determined by fixing the amplitude (∆δ) of the transition to the ∆δ obtained from the fit for the same residue in ∆+PHS at 1 M KCl.
(e) pKa and Hill coefficient obtained by fitting a two-site model (equation (2.3)) to the pH-dependence of the Cγ/Cδ chemical shift. Only the values corresponding to the larger of the two transitions are reported.

The cluster of carboxylic groups forms a binding site for Ca2+ and the crystal structures of ∆+PHS and many other SNase variants show a site-bound Ca2+ ion within this cluster. However, previous work has demonstrated that treating SNase with EDTA does not affect the pKa values of the carboxylic groups measured with NMR spectroscopy, suggesting that trace amounts of Ca2+ do not affect these pKa values.8 pKa values of carboxylic groups in ∆+PHS were also measured when Ca2+ and the inhibitor thymine 3’-5’-diphosphate (pdTp) were added at the same time. The resulting changes were minimal (< 0.2 pH units) for all residues except Asp-21, whose pKa decreased to 5.36 (data not shown).
In addition, certain residues showed a second titration event having about the same pKa as Asp-21, which presumably corresponds to the pH-dependent dissociation of Ca2+ and pdTp. Thus, it seems likely that even when they are present in the sample, Ca2+ and pdTp are not bound to a significant extent below pH 5, where most carboxylic groups titrate.

2.3.2 pKa of Asp-21

The titration curves of Asp-21 show that replacement of any carboxylic group in the cluster with a neutral group lowers the pKa of Asp-21 relative to its value of 6.54 ± 0.01 in the ∆+PHS protein (Figure 2.2(a)). This is fully consistent with the presence of unfavorable Coulomb interactions between Asp-21 and all other carboxylic groups in the cluster. Note that even when all three carboxylic groups were replaced, the pKa of Asp-21 was 4.57, which is 0.67 pH units higher than the normal pKa of 3.9 for Asp in water.8 The effects of these substitutions were not additive: the ∆pKa of Asp-21 when all three carboxylic groups were replaced was −1.97 ± 0.01 pH units, whereas the ∆pKa values for the single-residue variants sum to only −1.53 ± 0.02.

Figure 2.2. (a) Titration curves for Asp-21 in ∆+PHS SNase (black circles) and in the D19N (cyan), D40N (red), E43Q (green), D19N/D40N/E43Q (orange), and D19N/R35Q/D40N/E43Q (violet) variants, all at 0.1 M KCl. (b) Titration curves for Asp-21 in ∆+PHS SNase (black) and in the D19N/D40N/E43Q variant (orange) at 0.1 M KCl (filled circles) and at 1 M KCl (open circles). Lines represent fits of a modified Hill equation to the data. Except for D19N/D40N/E43Q and D19N/R35Q/D40N/E43Q at 0.1 M KCl, a two-site modified Hill equation (equation (2.3)) was used, where the Hill coefficient of the smaller transition was fixed arbitrarily to be 1.0. This smaller transition reflects the titration of either Asp-19 (in ∆+PHS, D40N, and E43Q), Asp-40 (in D19N), or Asp-83 (in D19N/D40N/E43Q at 1 M KCl). A one-site modified Hill equation (equation (2.2)) was fitted to the data for D19N/D40N/E43Q and D19N/R35Q/D40N/E43Q at 0.1 M KCl.

In ∆+PHS at 0.1 M KCl, the resonance of Asp-21 broadens significantly at pH values near its pKa.8 Such broadening of Asp-21 was observed to some extent in all of the variants, and probably reflects exchange between protonated and deprotonated forms. Since the rate of this exchange is pH-dependent (kex = koff + kon[H+]), a higher pKa corresponds to slower exchange during titration, possibly resulting in intermediate as opposed to fast exchange.10 In variants where the pKa of Asp-21 is lower than 6, the broadening is less pronounced, and the peak remains visible throughout the titration (Figure 2.2), consistent with faster exchange at lower pH.

Wild-type SNase contains a disordered loop spanning residues 44-49 that is excised in ∆+PHS SNase. This loop is close to the active site and contains four basic residues (Lys-45, His-46, Lys-48, and Lys-49). To test whether the pKa of Asp-21 was affected by the presence of this loop, the pKa values of carboxylic groups were also measured in the SNase variant PHS, in which the 44-49 loop is present and which has Gly-50 and Val-51 as opposed to Phe-50 and Asn-51 in ∆+PHS.11 The measured pKa of Asp-21 in PHS SNase was 6.12, which is 0.42 pH units lower than in ∆+PHS (Table 2.1). The pKa of Glu-43 in PHS SNase was 0.58 pH units lower than in ∆+PHS. These changes are consistent with the presence of favorable Coulomb interactions between Asp-21 and the basic residues in the loop in the wild type protein. However, even in the presence of these interactions, the pKa of Asp-21 is still significantly higher than the pKa of Asp in water.
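The non-additivity of the substitution effects on Asp-21 described earlier in this section can be checked directly from the ∆pKa values in Table 2.1. The short sketch below sums the single-substitution shifts, compares the sum with the shift in the triple variant, and converts the difference to a free energy with ∆∆G = 2.303RT∆pKa at 298 K. The variable and function names are illustrative only, and the conversion assumes simple two-state proton binding.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)
T = 298.0             # titration temperature in K (Table 2.1)

# Delta-pKa of Asp-21 relative to the reference protein at 100 mM KCl (Table 2.1)
single_shifts = {"D19N": -0.79, "D40N": -0.36, "E43Q": -0.38}
triple_shift = -1.97  # D19N/D40N/E43Q variant

def dpka_to_kcal(dpka, temperature=T):
    """Convert a pKa shift (pH units) into a free energy change (kcal/mol)."""
    return math.log(10) * R_KCAL * temperature * dpka

additive_sum = sum(single_shifts.values())
non_additivity = triple_shift - additive_sum

print(f"sum of single-site shifts: {additive_sum:.2f} pH units")
print(f"triple-variant shift:      {triple_shift:.2f} pH units")
print(f"non-additive remainder:    {non_additivity:.2f} pH units "
      f"({dpka_to_kcal(non_additivity):.2f} kcal/mol)")
```

The remainder is the part of the triple-variant shift that the single substitutions do not account for when treated as independent.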
2.3.3 pKa values at high ionic strength

The pKa values of carboxylic groups in ∆+PHS were measured previously at both 0.1 M and 1 M KCl to estimate the overall contribution of Coulomb interactions to the pKa values of these groups.8 This assumes that because salt screens Coulomb interactions, all contributions from Coulomb effects to pKa values are eliminated at high salt. Upon increasing from 0.1 M to 1 M KCl, the pKa of Asp-21 decreases by 0.52 units, which is only 20% of the total difference between the Asp-21 pKa at 0.1 M KCl (6.54 ± 0.01) and the normal pKa of Asp in water (3.90 ± 0.01) (Table 2.2 and Figure 2.2(b)).8

The pKa of Asp-21 was elevated even when the other carboxylic groups in the active site were removed (Table 2.1). To determine if this was caused by long-range repulsive Coulomb interactions with residues outside of the active site, the pKa values of Asp and Glu residues in the D19N/D40N/E43Q variant of ∆+PHS were measured in 1 M KCl. For most residues, the differences between 0.1 and 1 M KCl were comparable in the variant and in ∆+PHS (Appendix A). The pKa shift of Glu-52 due to increased KCl was more than twice as large in the variant as in the reference protein (Table 2.2). This can be explained if one assumes that in the background protein the simultaneous screening of repulsive Coulomb interactions offsets the increase in pKa from screening of attractive Coulomb interactions. As noted earlier, the pKa values indicate a repulsive Coulomb interaction between Glu-43 and Glu-52. In the D19N/D40N/E43Q variant this repulsive interaction has been removed, leaving only attractive interactions to be screened by salt. This effect was even more pronounced in the change in the pKa of Asp-21. Whereas in ∆+PHS increasing the salt concentration lowered the pKa of Asp-21, in the D19N/D40N/E43Q variant increasing the salt concentration elevated the pKa of Asp-21 to a value of 5.01 (Table 2.2 and Figure 2.2(b)). These results indicate that in the absence of Asp-19, Asp-40, and Glu-43, the net Coulomb interactions sensed by Asp-21 are attractive, and that Asp-21 is not under the influence of significant Coulomb interactions with negatively charged groups outside the active-site region.

Table 2.2. pKa values of Asp and Glu residues in or near the active site of SNase in 1 M KCl. Entries marked -- have no reported value.

Protein / Residue               pKa (a)               ∆pKa (b)          n (a)
∆+PHS (c)
  Asp-19                        2.88 ± 0.02 (d)       0.76 ± 0.05       0.83 ± 0.04 (d)
  Asp-21                        6.02 ± 0.01 (d)       -0.52 ± 0.01      0.94 ± 0.02 (d)
  Asp-40                        4.28 ± 0.01           0.45 ± 0.05       0.81 ± 0.01
  Glu-43                        4.40 ± 0.01           0.08 ± 0.03       0.81 ± 0.02
  Glu-52                        4.08 ± 0.02           0.15 ± 0.05       0.84 ± 0.02
∆+PHS/D19N/D40N/E43Q
  Asp-19                        --                    --                --
  Asp-21                        5.01 ± 0.01 (d)       0.44 ± 0.01       0.95 ± 0.02 (d)
  Asp-40                        --                    --                --
  Glu-43                        --                    --                --
  Glu-52                        3.91 ± 0.01           0.36 ± 0.01       0.94 ± 0.02

(a) pKa values and Hill coefficients (n) obtained by fitting the modified Hill equation (equation (2.2)) to the pH-dependence of the Cγ/Cδ chemical shift, unless otherwise indicated. Titrations were performed at 298 K and 1 M KCl. Values reported are those from a single titration experiment with corresponding errors of fit.
(b) Change in pKa relative to the same variant at 100 mM KCl: ∆pKa = pKa(1 M) – pKa(100 mM).
(c) pKa values obtained using the data from Castañeda et al.8
(d) pKa and Hill coefficient obtained by fitting a two-site model (equation (2.3)) to the pH-dependence of the Cγ/Cδ chemical shift. Only the values corresponding to the larger of the two transitions are reported.
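The assumption that salt screens Coulomb interactions can be illustrated with a simple Debye-Hückel estimate. The sketch below is not the method used in this work: it treats two point charges in a uniform medium with the dielectric constant of water and a 1:1 electrolyte at 298 K, and every constant and function name in it is an assumption introduced for illustration.

```python
import math

COULOMB_KCAL = 332.06  # Coulomb constant in kcal*Angstrom/(mol*e^2)
EPS_WATER = 78.5       # assumed uniform dielectric constant (water, 298 K)
KCAL_PER_PH = 1.364    # 2.303*RT at 298 K, kcal/mol per pH unit

def debye_length(ionic_strength):
    """Debye length in Angstrom for a 1:1 electrolyte in water at 298 K."""
    return 3.04 / math.sqrt(ionic_strength)

def screened_pka_shift(distance, ionic_strength, q_site=-1.0, q_other=-1.0):
    """Estimated pKa shift of an acidic site caused by a second charge.

    A repulsive interaction (like charges) destabilizes the charged form
    of the acid and raises its pKa; an attractive one lowers it.
    """
    kappa_r = distance / debye_length(ionic_strength)
    energy = COULOMB_KCAL * q_site * q_other / (EPS_WATER * distance)
    return energy * math.exp(-kappa_r) / KCAL_PER_PH

for salt in (0.1, 1.0):
    shift = screened_pka_shift(distance=5.0, ionic_strength=salt)
    print(f"{salt:.1f} M: Debye length {debye_length(salt):.1f} A, "
          f"shift from a carboxylate 5 A away {shift:+.2f} pH units")
```

In this crude picture the repulsion between two carboxylates 5 Å apart falls roughly threefold between 0.1 M and 1 M salt but does not vanish, which is one reason to treat the high-salt pKa as only an approximation of the intrinsic pKa.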
2.3.4 Influence of Arg-35

In the crystal structure of ∆+PHS SNase (Figure 2.1(a)), the carboxyl group of Asp-21 is within 4 Å of the Arg-35 guanidino group, suggesting a favorable Coulomb interaction between these two groups. To test this, pKa values were measured in the R35Q variant (Table 2.1). Because this substitution removes an apparent favorable Coulomb interaction with Asp-21, the pKa of Asp-21 was expected to shift up relative to the background. However, the pKa of Asp-21 in the R35Q variant was 0.49 units lower than in the background (Figure 2.3). One possible explanation is that Gln-35 could form a hydrogen bond to Asp-21 that compensates for the loss of the favorable Coulomb interaction with Arg-35.

In the R35Q variant the pKa value of Asp-19 was elevated by 0.94 units relative to the background, consistent with the loss of a strong Coulomb attraction between Asp-19 and Arg-35. This is surprising given that in the crystal structure, these groups are relatively far apart (~7 Å) and both groups are exposed to bulk water (Figure 2.1(a)). The pKa of Asp-40, which is closer (~5.5 Å) to Arg-35 in the crystal structure, was elevated by only 0.44 units in the R35Q variant and that of Glu-43 by only 0.13 units, entirely consistent with the large (~8 Å) distance between Arg-35 and Glu-43. In the absence of the other carboxylic groups, Arg-35 has no significant effect on the pKa of Asp-21 (Table 2.1 and Figure 2.2(a)).

Figure 2.3. Titration curves for carboxylic groups in the active site in both ∆+PHS (black) and the R35Q variant (red). Lines represent fits of equation (2.2) or (2.3) to the data. Equation (2.2) was used to fit Asp-40 in the R35Q variant and Glu-43 in both variants, whereas equation (2.3) was used to fit Asp-19 and Asp-21 in both variants, and Asp-40 in ∆+PHS. Dashed lines indicate that the fit was performed using a fixed value of ∆δ, as described in Materials and Methods.
To explain the apparent strong Coulomb interaction between Arg-35 and Asp-19, the conformations observed for this region of the protein in 119 crystal structures of SNase in the Protein Data Bank12 were compared. Histograms of the distances between residues 19 and 35 (Figure 2.4(a)) and between 21 and 35 (Figure 2.4(b)) in 119 crystal structures are both bimodal. The two clusters in each distribution correspond to cases with or without Ca2+ bound at the active site. In nearly all cases Asp-19 is at least 7 Å away from Arg-35 and Asp-21 is always within 5 Å of Arg-35. However, two outliers (PDB accession codes 2QDB and 2RDF)9 were also identified in which Asp-19 and Arg-35 are closer (< 6 Å). Both of these outliers were variants of the NVIAGA form of SNase, so called after the six substitutions used to make it (D21N, T33V, T41I, S59A, P117G, S128A).9 The D21N substitution changes the hydrogen-bonding capacity of this residue. The crystal structure shows Asn-21 replacing Arg-35 as a hydrogen-bonding partner to the backbone carbonyl of residue 39 (Figure 2.1(b)). Arg-35 in turn adopts a conformation that puts it closer to Asp-19. A similar conformation for Arg-35 has been observed in other NVIAGA variants (Doctrow et al., in preparation). This conformation is more consistent with the apparent strong Coulomb interaction between Asp-19 and Arg-35 (Figure 2.3), suggesting that this conformation is more representative of the structure of SNase in solution at the pH where Asp-19 titrates.

Figure 2.4. (a) Histogram of distances between the side chains of residues Asp-19 and Arg-35 in the SNase structures listed in Materials and Methods. The peaks of the clusters that correspond to the presence and absence of Ca2+ are indicated. (b) Same as (a) for the distances between residues Asp-21 and Arg-35.
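The distance comparison described above can be sketched as a small script that reads ATOM records in the fixed-column PDB format and reports the shortest distance between the carboxyl oxygens of one residue and the guanidino nitrogens of another. This is a minimal illustration, not the procedure used for Figure 2.4: it assumes a single chain, the coordinates in the demo are made up, and `atom_line` is a hypothetical helper that only exists to build that demo input.

```python
import math

# Standard PDB atom names for the charged moieties.
CARBOXYL_O = ("OD1", "OD2", "OE1", "OE2")
GUANIDINO_N = ("NE", "NH1", "NH2")

def parse_atoms(pdb_text):
    """Map (residue number, atom name) -> (x, y, z) from ATOM records.

    Assumes a single chain; keeps blank or 'A' alternate locations only.
    """
    atoms = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 54:
            continue
        if line[16] not in " A":
            continue
        key = (int(line[22:26]), line[12:16].strip())
        atoms[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return atoms

def shortest_distance(atoms, res_a, names_a, res_b, names_b):
    """Shortest distance (Angstrom) between named atoms of two residues, or None."""
    a = [xyz for (num, name), xyz in atoms.items() if num == res_a and name in names_a]
    b = [xyz for (num, name), xyz in atoms.items() if num == res_b and name in names_b]
    if not a or not b:
        return None
    return min(math.dist(p, q) for p in a for q in b)

def atom_line(serial, name, resname, resseq, x, y, z):
    """Format one made-up ATOM record (hypothetical helper for the demo)."""
    return (f"ATOM  {serial:5d}  {name:<3s} {resname:3s} A{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

# Made-up coordinates, for illustration only (not taken from any SNase structure).
demo_pdb = "\n".join([
    atom_line(1, "OD1", "ASP", 19, 0.0, 0.0, 0.0),
    atom_line(2, "OD2", "ASP", 19, 2.0, 0.0, 0.0),
    atom_line(3, "NE", "ARG", 35, 0.0, 7.0, 0.0),
    atom_line(4, "NH1", "ARG", 35, 5.0, 9.0, 0.0),
    atom_line(5, "NH2", "ARG", 35, 0.0, 10.0, 0.0),
])
atoms = parse_atoms(demo_pdb)
d_19_35 = shortest_distance(atoms, 19, CARBOXYL_O, 35, GUANIDINO_N)
print(f"Asp-19 to Arg-35 shortest distance: {d_19_35:.2f} A")
```

Applied to each structure in a set, the returned distances could be binned into a histogram like those in Figure 2.4; structures with several chains or alternate conformations would need extra handling.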
To determine whether the crystal structure of ∆+PHS or NVIAGA better represents the conformation of Arg-35 in solution at low pH, a 15N-NOESY-HSQC spectrum was collected for ∆+PHS at pH 4.68. Table 2.3 lists all of the expected NOE interactions involving Arg-35-Hε for both the NVIAGA and ∆+PHS crystal structures using a cutoff distance of 4 Å. Many more NOEs are expected for the NVIAGA conformation than for the ∆+PHS conformation, allowing the two conformations to be distinguished in solution. All but one of the observed NOEs involving Arg-35-Hε are consistent with the NVIAGA structure, but several of them are inconsistent with the ∆+PHS crystal structure (Figure 2.5, colored peaks and atoms). The observed NOE involving a Leu-36-Hδ is not consistent with either structure. However, the distance between Arg-35-Hε and Leu-36-Hδ1, which is the closer of the two leucine methyl groups, is much shorter in the NVIAGA structure than in the ∆+PHS structure. Taken together, these NOEs indicate that the conformation of Arg-35 in solution at low pH is more similar to the NVIAGA crystal structure than to the ∆+PHS structure, and the small remaining discrepancies may be attributed to crystal packing interactions. Therefore, pKa calculations based solely on the ∆+PHS crystal structure will not be correct because the crystal structure does not reflect the relevant conformation in solution.

Table 2.3. List of expected NOE interactions involving Arg-35-Hε for both the NVIAGA and ∆+PHS crystal structures.

                  Distance from Arg-35 (Å)      NOE expected?
Atom (a)          ∆+PHS (b)    NVIAGA (c)       ∆+PHS    NVIAGA
Thr-22-Hα         6.38         3.09             No       Yes
Thr-22-Hγ1        6.3          3.13             No       Yes
Thr-22-Hγ2        4.17         2.53             No       Yes
Arg-35-Hα         4.85         2.52             No       Yes
Arg-35-Hβ2        4.37         2.69             No       Yes
Arg-35-Hγ2        2.87         3.5              Yes      Yes
Arg-35-Hγ3        3.53         3.94             Yes      Yes
Arg-35-Hδ2        2.94         2.94             Yes      Yes
Arg-35-Hδ3        2.35         2.35             Yes      Yes
Leu-36-HN         5.14         3.82             No       Yes
Leu-36-Hδ1        7.68         5.33             No       No
Arg-87-Hδ2        5.51         3.98             No       Yes
Arg-87-HH12       2.77         5.03             Yes      No
Arg-87-HH22       2.99         5.42             Yes      No

(a) Hydrogen coordinates added to structures using the MolProbity server (molprobity.biochem.duke.edu).13,14
(b) Distances obtained from the crystal structure of holo-∆+PHS (PDB accession code 3BDC).
(c) Distances obtained from the crystal structure of NVIAGA/E75Q (PDB accession code 2QDB).

Figure 2.5. (a) F2(1H)-F3(1H) slice from the 15N-NOESY-HSQC spectrum of ∆+PHS centered on the Arg-35-Nε and Hε resonances in F1 and F3, respectively. NOESY peaks that are not expected based on the ∆+PHS crystal structure are colored by residue: green = Arg-35, cyan = Thr-22, magenta = Leu-36. Structure of the environment of Arg-35 in (b) NVIAGA (PDB accession code 2QDB)9 and (c) ∆+PHS (PDB accession code 3BDC).8 Atoms involved in the observed NOE interactions are colored as in (a).

2.3.5 Crystal structure of the D21N variant

To determine if the presence of Asn-21 altered the conformation of Arg-35, the crystal structure of ∆+PHS/D21N was determined. The conformation of Arg-35 is similar to that in ∆+PHS, and not to that in either of the NVIAGA structures (Appendix A). However, the ∆+PHS/D21N crystal structure includes the bound inhibitor pdTp, which is not present in the NVIAGA structures. One of the phosphate groups of this inhibitor interacts directly with Arg-35, and may serve to stabilize the side chain and bias the conformation of the active site.

There are almost no intermolecular interactions in the ∆+PHS/D21N crystal structure involving the cluster residues that could bias the conformation of these residues. No polar atoms from neighboring molecules come within hydrogen bonding distance of residue 21. Residues Arg-105, Lys-127, and Lys-134 from a neighboring molecule are all within 7 Å of Glu-43 and may therefore form favorable Coulomb interactions with Glu-43, but not with any of the other cluster residues.
Similar interactions are observed in the ∆+PHS crystal structure. In addition, in the ∆+PHS/D21N crystal structure the N-terminus of a neighboring molecule is within 6 Å of residues 19, 35 and 43, and therefore there may be Coulomb interactions between the N-terminus and these groups. The closest intermolecular Coulomb contact is 4.5 Å, between Asp-19-Oδ2 and the N-terminus of a neighboring molecule. The N-terminus is not ordered in most SNase crystal structures, including ∆+PHS. The structure confirms that for the purposes of pKa calculations, crystal structures alone may not reliably represent the state of the protein in solution. Certain states may crystallize more easily than others, and those states may not correspond to the dominant states in solution.

2.4 Discussion

2.4.1 Determinants of the intrinsic pKa of Asp-21

A protonated Asp-21 acting as a hydrogen bond donor should stabilize the protonated state and lead to a high intrinsic pKa.15 In the NVIAGA crystal structures Asn-21 replaces Arg-35 as a hydrogen bond donor to the backbone carbonyl of Val-39, and this may be the reason for the difference in the conformation of Arg-35 between NVIAGA and other SNase variants. In variants with Asp at position 21, this hydrogen bond could form only when Asp-21 is protonated under conditions of pH lower than those under which the crystals were grown. If this hydrogen bond does determine the conformation of Arg-35, this would render the conformations of Asp-21 and Arg-35 pH-sensitive. Since the crystal structures of ∆+PHS and all other SNase variants were obtained from crystals grown at a pH above the pKa of Asp-21, the crystal structures are likely to show Arg-35 in the conformation present when Asp-21 is charged.
However, all of the other carboxylic groups in SNase titrate at pH values well below that of Asp-21 and so their titrations will be influenced by Arg-35 in the conformation corresponding to Asp-21 neutral.8 This is supported by the effect of the R35Q substitution on the pKa of Asp-19 (Figure 2.3), and by the NOEs involving Arg-35-Nε (Figure 2.5). Both of these results are more consistent with the conformation observed in the NVIAGA crystal structure than in the ∆+PHS crystal structure. Chemical shifts of atoms up to 12 Å away from Asp-21 report on its titration,8 which could indicate a conformational change linked to the titration of Asp-21.

In contrast to Asp-21, the pKa of Asp-19 is depressed. Even when all other ionizable groups in the active site (including Arg-35) are neutralized, Asp-19 has a pKa of 3.46, which is 0.44 pH units lower than the normal pKa of 3.9 for Asp in water.8 This suggests a depressed intrinsic pKa for Asp-19. This could result from Asp-19 acting as a hydrogen bond acceptor, which would stabilize the deprotonated state. Inspection of the crystal structures of several nuclease variants reveals multiple potential donors within hydrogen bonding distance of Asp-19.8 In addition, the Asp-21 HN chemical shift appears to follow the titration of Asp-19, which suggests a hydrogen bond between these two groups.8

2.4.2 Role of intrinsic binding affinities in partitioning of cooperative energy

For any given pair of ionizable residues A and B, the Coulomb interaction between them will contribute to the pKa of A only when some fraction of B is in the charged state while A is titrating, and vice versa. Thus the pKa of site A will depend on the extent of binding at B, giving rise to a cooperative interaction between the two binding sites and a pH-dependence for the Coulomb interaction.
This situation is described by the following thermodynamic cycle:

[Scheme: thermodynamic cycle for H+ binding to sites A and B]

Here ∆GAint and ∆GBint refer to the intrinsic proton binding affinities of groups A and B, respectively, and ∆GAB represents the energy of the cooperative (Coulomb) interaction between them. Titration curves for the individual sites derived from this cycle are shown in Figure 2.6. Dashed lines are simulated titration curves for two non-interacting sites (∆GAB = 0) with different intrinsic pKa values. Solid lines are the titration curves for the same groups when they are assumed to have a Coulomb interaction between them. It is evident from the curves in Figure 2.6 that the titration of the group with the lower intrinsic pKa (black) is largely unaffected by the presence of the Coulomb interaction, whereas the titration of the group with the higher intrinsic pKa (red) is shifted to a significantly higher pH. This is consistent with the analysis of Ackers et al.4

Figure 2.6. Simulated titration curves (fraction charged versus pH) for two carboxylic groups in the absence of any interaction (∆GAB = 0, dashed lines) and in the presence of a Coulomb interaction between them (solid lines). The black lines correspond to a group with intrinsic pKa of 3.8 and the red lines to an intrinsic pKa of 4.6. The energy of the Coulomb interaction (∆GAB) has been arbitrarily set to 3.1 kcal/mol to emphasize the fact that when there is an interaction it is the weaker of the two binding sites that is most affected and therefore shifted the most.

Titration curves resembling the solid lines in Figure 2.6 have been observed for residues in the active sites of various enzymes.1,3 These active sites tend to have multiple ionizable groups carrying like charge in close proximity and protected from solvent, resulting in strong Coulomb interactions. In fact, titration curves resembling those in Figure 2.6 have been used as diagnostics to identify ionizable groups in enzyme active sites.2 It has been demonstrated that such strong cooperative interactions result in a free energy of protonation that is relatively constant over a wide pH range. Such a mechanism may help catalytic residues to retain the proper protonation state over a wider environmental pH range.16

The pKa of Asp-21 in the D19N/D40N/E43Q variant in 1 M KCl is 5.0 (Table 2.2), which is 1.1 pH units higher than the pKa of Asp in water. Assuming that high salt screens Coulomb interactions with other charges without affecting dehydration energies or short-range interactions with other polar groups, the pKa of Asp-21 in this variant at high salt provides evidence that its intrinsic pKa is also elevated significantly. Therefore, it appears that Asp-21 behaves like the red group in Figure 2.6 and that in the presence of repulsive Coulomb interactions its pKa is shifted even further, whereas Asp-21 has little or no influence on the pKa values of the other carboxylic groups.

To better demonstrate this we measured titration curves for Asp-19 and Asp-21 in variants with all other ionizable groups in the active site neutralized (∆+PHS/R35Q/D40N/E43Q, ∆+PHS/D19N/R35Q/D40N/E43Q, and ∆+PHS/D21N/R35Q/D40N/E43Q, Table 2.1 and Figure 2.7). This allowed us to examine in greater detail the interactions between two carboxylic groups within the context of a protein. The pKa of Asp-19 in ∆+PHS/D21N/R35Q/D40N/E43Q approximates the intrinsic pKa of Asp-19 and has a value of 3.46. This is lower than the normal pKa for Asp in water, and presumably reflects the ability of Asp-19 to form a favorable hydrogen bond with the Asp-21 HN.
Similarly, the pKa of Asp-21 in ∆+PHS/D19N/R35Q/D40N/E43Q is 4.65, which is higher than the pKa of Asp in water.

Figure 2.7. Titration curves (chemical shift in ppm versus pH) for Asp-19 (top) and Asp-21 (bottom) in ∆+PHS/D21N/R35Q/D40N/E43Q (red), ∆+PHS/D19N/R35Q/D40N/E43Q (blue), and ∆+PHS/R35Q/D40N/E43Q (green). Lines represent fits of equations (2.2) or (2.3) to the data. Equation (2.3) was used to fit Asp-19 in ∆+PHS/R35Q/D40N/E43Q; equation (2.2) was used to fit all other curves.

According to the thermodynamic cycle described above, a Coulomb interaction between Asp-19 and Asp-21 should result in a large increase in the pKa of Asp-21, because it has the higher intrinsic pKa, and a small, if any, increase in the pKa of Asp-19. Consistent with this model, when both of these groups are present (∆+PHS/R35Q/D40N/E43Q), the pKa of Asp-21 is 5.70, an increase of about one pH unit relative to its pKa in the absence of Asp-19. On the other hand, Asp-19 has a pKa of 3.10, which is 0.36 pH units lower than its pKa in the absence of Asp-21 (Figure 2.7). This difference cannot be explained in terms of the Coulomb interaction alone, and suggests that the pKa of Asp-19 in the presence of Asn-21 is not exactly equal to the “true” intrinsic pKa of Asp-19 (and perhaps likewise for the intrinsic pKa of Asp-21). Because of this discrepancy we do not know how much of the measured pKa shifts are due to Coulomb interactions versus other effects, and thus we cannot determine the energy of the Coulomb interaction ∆GAB from the measured pKa shifts alone. Nevertheless, the results are in qualitative agreement with the above model: the group with the higher intrinsic pKa (Asp-21) exhibits a large response to the Coulomb interaction with Asp-19, whereas Asp-19, having the lower intrinsic pKa, does not appear to sense the Coulomb interaction at all.
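The two-site thermodynamic cycle discussed in this section can be explored numerically. The sketch below is one possible reading of that cycle, not the code used to generate Figure 2.6: it assumes four protonation states, takes the doubly protonated state as the reference, and applies the interaction energy ∆GAB only when both sites are charged. The function names and the bisection search for the titration midpoint are illustrative choices.

```python
RT_LN10 = 1.364  # 2.303*RT at 298 K, kcal/mol per pH unit

def fractions_charged(ph, pka_a, pka_b, dg_ab):
    """Fraction of site A and of site B in the charged (deprotonated) state."""
    w = dg_ab / RT_LN10                        # interaction energy in pH units
    only_a = 10 ** (ph - pka_a)                # weight: A charged, B neutral
    only_b = 10 ** (ph - pka_b)                # weight: B charged, A neutral
    both = 10 ** (2 * ph - pka_a - pka_b - w)  # weight: both charged, repelling
    z = 1.0 + only_a + only_b + both           # 1.0 is the doubly protonated state
    return (only_a + both) / z, (only_b + both) / z

def midpoint(site, pka_a, pka_b, dg_ab, lo=0.0, hi=14.0):
    """pH at which the chosen site (0 = A, 1 = B) is half charged (bisection)."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if fractions_charged(mid, pka_a, pka_b, dg_ab)[site] < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Parameters quoted in the caption of Figure 2.6
PKA_A, PKA_B, DG_AB = 3.8, 4.6, 3.1

for label, site, intrinsic in (("A", 0, PKA_A), ("B", 1, PKA_B)):
    apparent = midpoint(site, PKA_A, PKA_B, DG_AB)
    print(f"site {label}: intrinsic pKa {intrinsic:.2f}, "
          f"midpoint with interaction {apparent:.2f}")
```

With these parameters the midpoint of site A stays close to its intrinsic pKa, whereas the midpoint of site B moves up by roughly two pH units, in line with the asymmetric partitioning described in the text.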
2.4.3 Implications for structure-based pKa calculations

In FDPB calculations, the determinants of the pKa for a group are separated into contributions from hydration, dipoles, and other ionizable groups according to the following formalism:

$$pK_{a,i} = pK_a^{W} + \Delta pK_a^{Born} + \Delta pK_a^{bg} + \sum_{j \neq i} \Delta pK_{a,ij} \qquad (2.1)$$

where pKaW is the pKa of model compounds in water, ∆pKaBorn is the shift in pKa caused by changes in hydration (Born energy), ∆pKabg is the background energy arising from the interaction of the charge with the permanent dipoles of the protein, and ∆pKa,ij is the contribution from the Coulomb interaction between groups i and j. The self-energy, ∆pKa,ii, is the sum of the Born and background terms, and the intrinsic pKa (pKaint) is the sum of the model compound pKa and the self-energy. FDPB calculations applied to static structures exaggerate the magnitude of Coulomb interactions between carboxylic groups and basic groups, and hence the calculated pKa values of carboxylic groups tend to be more depressed than the measured ones.8,17 What is not well appreciated is that when cooperative interactions are present, if the intrinsic pKa values, which are typically calculated first, are not calculated correctly, the resulting pKa values will not be correct even if the magnitudes of the Coulomb interaction energies are calculated perfectly. To examine the difficulties structure-based pKa calculations have in simultaneously calculating self- and Coulomb energies, calculations were performed on ∆+PHS SNase using different implementations of continuum electrostatics with the finite difference Poisson-Boltzmann algorithm (FDPB) using static crystal structures and MD-relaxed structures. In FDPB calculations using εin = 20 and the default tautomeric state for the cluster residues, the intrinsic pKa values are 4.50 for Asp-21, 3.52 for Asp-19, 4.06 for Asp-40 and 4.19 for Glu-43.
This underestimates the intrinsic pKa of Asp-21 of 5.01 determined from the D19N/D40N/E43Q variant at 1 M KCl, conditions under which Coulomb contributions to the pKa of Asp-21 were minimized. These small errors in the calculations of intrinsic pKa values are sufficient to preclude accurate estimation of the shifts in pKa values in the cluster. In general, the best agreement between calculated and measured pKa values is obtained when the protein is treated with εin = 20.18,19 However, this approach fails when applied to active sites and other ionizable clusters. In SNase, using εin = 20 cannot reproduce the pKa values of the carboxylic groups in the cluster, regardless of which crystal structure is used.8,17 In fact, for Asp-21, increasing the value of the protein dielectric constant above 10 worsens agreement with experiment (Figure 2.8). The problem is that high values of εin improve the treatment of Coulomb interactions by attenuating them, but they also attenuate the self-energy, which may reduce the accuracy of the calculated intrinsic pKa values and hence offset that improvement. Note that the pKa values of Asp-19 and Asp-21, which according to the calculations are under the influence of a significant, destabilizing Born energy, are only reproduced with effective dielectric constants less than 20, and Asp-21 only with a dielectric constant between 10 and 12 (Figure 2.8). Meanwhile, the pKa value of Asp-40 is insensitive to εin above 8, and that of Glu-43 is independent of the value of εin. The calculations show that the particular value of the protein dielectric constant needed to reproduce the pKa of a given ionizable group may depend on the extent to which the different terms in equation (2.1) contribute to its pKa.

Figure 2.8. FDPB pKa calculations shown as a function of εin for the four carboxylic groups in the active site cluster. Solid circles refer to pKa values calculated with the structure of ∆+PHS (PDB accession code 3BDC) with the color indicating the tautomeric state of the neutral form of the group: black, protonated Oδ2 or Oε2 (state 1); blue, protonated Oδ1 or Oε1 (state 2); red, protonated Oδ1 for Asp-19 only (state 3); green, protonated Oδ1 for Asp-21 only (state 4). Open circles describe contributions to the calculated pKa shifts from Coulomb (black), self (blue), Born (red), and background (green) interactions (using the protonated Oδ1/Oε1 states for all residues, which gives the best agreement with the experimentally measured values at εin = 10-12). The dotted black line is the model compound pKa, and the solid line is the experimentally measured value from Table 2.1. The dashed vertical lines denote the range of εin values that reproduce the experimental pKa of Asp-21 to within 0.5 pH units.

As others have suggested, different values of the dielectric constant appear to be needed for the self-energy and for the Coulomb interactions.20 The data in Figure 2.9 suggest that relaxing structures with classical MD simulations is not always a reliable way to improve pKa calculations, at least not in ionizable clusters or other cases where proton binding is highly cooperative. Simulations in which the charge state of a single residue differs can lead to significantly different conformations and to incorrect pKa values. Two 10 ns MD simulations were performed, one with Asp-21 charged and one with Asp-21 neutral (all other ionizable groups were assigned the charge states of their respective model compounds at pH 7). pKa values were calculated at 50 ps intervals using εin = 10. In the trajectory with Asp-21 charged (Figure 2.9(a)), the simulation settled into a state with a very depressed pKa for Asp-21 (average value 0.9), mostly from forming a strong apparent ion pair with Arg-35.
On the other hand, in the trajectory with neutral Asp-21 (Figure 2.9(b)) the simulation settled into a state with a high pKa for Asp-21 (average value 5.9), coincident with a slight increase in both the Born and background terms. Notably there was an abrupt decrease in the distance between Asp-21 and the carbonyl oxygen of Val-39 upon shifting to the high pKa state, consistent with the formation of a hydrogen bond like the one seen in the NVIAGA crystal structures (Figure 2.1(b)). Using longer MD runs would not eliminate the problem, given that the conformation of the active site appears to depend on the protonation state of Asp-21. Since classical MD simulations have fixed protonation states, a single MD run would be unlikely to sample both conformations regardless of the simulation length.

Figure 2.9. pKa values calculated in structures sampled along MD trajectories performed with Asp-21 in the charged (a) or neutral (b) states. The calculated pKa values (black) and contributions to the shift in pKa from Coulomb (green), Born (red) and background (blue) energies are shown. The solid horizontal lines correspond to the same values calculated with the static crystal structure in the default tautomeric state using εin = 10 (Figure 2.8).

In the case of clustered groups with similar intrinsic pKa values and strong Coulomb interactions, small errors in the calculated intrinsic pKa values can be amplified dramatically in the partitioning of the Coulomb interactions (Figure 2.6). This is a serious problem because calculations of both the Born and background terms that constitute the intrinsic pKa in equation (2.1) depend on the fine details of the protein structure, especially at the low values of εin required to weight these terms appropriately. At εin = 4 the difference between calculations with different tautomeric states was as large as 2 pH units for Asp-19 and Asp-21 (Figure 2.8).
Crystal structures can be biased by crystal packing interactions that are not present in solution.21 For instance, in many crystal structures of SNase the side chain of Lys-70 from an adjacent molecule inserts into the active site and comes within hydrogen-bonding distance of Asp-19, Asp-21, and Glu-43. Similarly, the structure can be subtly dependent on pH as we suspect is the case for the conformations of Arg-35 and Asp-21. These potential problems with the conformation of the native state as described by crystal structures can reduce the accuracy of intrinsic pKa calculations. It is difficult to calculate pKa values in cooperative clusters of ionizable groups with similar intrinsic pKa values precisely because the capacity to amplify small signals is the hallmark of cooperativity; small and unavoidable inaccuracies in the calculations will be amplified as well.

2.5 Conclusions

The elevated pKa of Asp-21 in SNase is not merely the result of repulsive Coulomb interactions with the other carboxylic groups in the active-site cluster. Even when all of the other residues in the cluster are neutralized, the intrinsic pKa of Asp-21 is still higher than the normal pKa of Asp in water. Longer-range Coulomb interactions cannot account for this effect, as screening of Coulomb interactions with 1 M salt was not sufficient to reduce the pKa of Asp-21 to a normal value. The intrinsic pKa of Asp-21 is elevated; consequently, Asp-21 senses repulsive interactions with Asp-19, Asp-40, and Glu-43. Conversely, the pKa values of the other carboxylic groups are not affected by the charge state of Asp-21 because they all titrate in a pH range where Asp-21 is neutral. Asp-19 has a depressed pKa, due to a combination of its acting as a hydrogen bond acceptor and a strong Coulomb attraction to Arg-35. Asp-40 and Glu-43 have near-normal pKa values, which likely reflect an equal balance of favorable and unfavorable interactions.
The reasons for the elevated intrinsic pKa of Asp-21 are not obvious from SNase crystal structures. The experimental data suggest that the microenvironment around Asp-21 in the crystal structure may differ from that in solution; the strong Coulomb interaction between Asp-19 and Arg-35 suggested by the increased pKa of Asp-19 in R35Q is inconsistent with the relative positions of these groups in the crystal structure. Crystal structures of variants with an Asn residue replacing Asp-21 show Arg-35 in an altered conformation that places it much closer to Asp-19. These same variants have Asn-21 donating a hydrogen bond to a backbone carbonyl. If Asp-21 were similarly capable of donating a hydrogen bond in the protonated state, then such an interaction would favor the protonated state of Asp-21 and lead to an upward shift in its intrinsic pKa. The ability of hydrogen bonds to depress the pKa values of Asp and Glu residues acting as hydrogen bond acceptors is well documented.15,21 The effect on the pKa when a protonated Asp or Glu acts as a hydrogen bond donor has not been well established, but it seems reasonable to expect that such an interaction would favor the protonated state and elevate the pKa. Since protein hydrogen atoms are not generally visible by x-ray crystallography, assumptions must be made about the protonation states of Asp and Glu residues when performing calculations on x-ray structures, which may not necessarily reflect the actual protonation states of those residues. Therefore it will be very difficult for calculations based on crystal structures to predict the consequences of hydrogen bonds formed by these residues without some means of determining the correct protonation states.
An algorithm for placing hydrogens in crystal structures based on global optimization of the hydrogen bond network for each protonation state has been shown to improve the accuracy of pKa calculations for active site residues of Bacillus circulans xylanase and hen egg white lysozyme.21,22 The active site cluster of SNase may provide a useful benchmark for further calibrating such a method.

2.6 Materials and methods

2.6.1 Protein expression and purification

All experiments were performed with a highly stable, acid-resistant variant of SNase known as ∆+PHS to ensure that the protein remained folded throughout the titration of most carboxylic groups.8 Site-directed mutagenesis was performed using the QuikChange kit (Stratagene). For NMR titrations, all variants were uniformly 13C/15N labeled and expressed as described previously.8 All proteins were purified according to the procedure of Shortle and Meeker.23

2.6.2 NMR spectroscopy

For pH titrations, protein samples were prepared by exchanging from H2O into aqueous solution containing 100 mM KCl (or 1 M KCl), 10% D2O, and 0.5 mM NaN3. Exchange was conducted by successive dilutions in Amicon Ultra-4 tubes (Millipore). Final protein concentrations ranged from 0.8 to 1.1 mM, as measured by absorbance at 280 nm using an extinction coefficient of 0.93 (mg/mL)⁻¹. A 1.4 mL volume of sample was prepared and divided into two equal fractions, one to be initially titrated with acid and the other with base. Titrations were carried out as described previously.8 The chemical shifts of the carboxyl carbons of Asp and Glu side chains, as well as those of the adjacent methylene carbons, were monitored using a two-dimensional 13C-detect CBCGCO experiment.8 Full backbone assignments, as well as Asp and Glu side chain 13C assignments, in ∆+PHS SNase were determined previously.8 The resonances in the CBCGCO spectra for the variants could be assigned by comparison with the spectrum for ∆+PHS.
Side chain proton assignments were obtained using standard triple-resonance HBHA(CO)NH and H(CCCO)NH-TOCSY experiments. The sample buffer consisted of 25 mM potassium acetate, 100 mM KCl, 10% D2O, and 0.5 mM NaN3. The final protein concentration was 0.7 mM, and the final sample pH was 4.68. The same sample was used to collect a 3D 15N-NOESY-HSQC with a mixing time of 75 ms. Titrations, HBHA(CO)NH, and H(CCCO)NH-TOCSY experiments were collected on a Bruker Avance II 600 equipped with a cryogenic-TCI probe (with a cryocooled 13C preamplifier), whereas the NOESY was collected on a Bruker Avance 600 equipped with a cryogenic-TXI probe. All experiments were collected at 298 K and in the absence of Ca2+ or pdTp.

2.6.3 pKa values

pKa values were obtained from the pH dependence of the Cγ (Asp) or Cδ (Glu) chemical shifts by fitting a modified Hill equation24 to the data:

$$\delta_{obs}(pH) = \frac{\delta_{AH} + \delta_{A^-}\,10^{n(pH - pK_a)}}{1 + 10^{n(pH - pK_a)}} \qquad (2.2)$$

δAH and δA− are the chemical shifts of the protonated and deprotonated forms of the residue, respectively. The parameter n is the Hill coefficient that describes the slope of the titration curve in the transition region, and reflects the degree of cooperativity in binding. This analysis assumes that the protonated and deprotonated states are in fast exchange. Non-linear least squares fitting was performed with the nlme library in the R statistics package.25 In cases where two titration events were evident in the curve, a two-site modified Hill equation was used:26

$$\delta_{obs}(pH) = \frac{\delta_{AH_2} + \delta_{AH}\,10^{n_1(pH - pK_{a1})} + \delta_{A}\,10^{n_1(pH - pK_{a1}) + n_2(pH - pK_{a2})}}{1 + 10^{n_1(pH - pK_{a1})} + 10^{n_1(pH - pK_{a1}) + n_2(pH - pK_{a2})}} \qquad (2.3)$$

This model differs from the two-site model used by Castañeda et al., which did not include Hill coefficients.8 The two pKa values and Hill coefficients correspond to the separate titration events.
To minimize the number of fitting parameters, the Hill coefficient of the smaller titration event was fixed to be 1, since in all cases the amplitude of the smaller titration was too small for the Hill coefficient to be fit accurately. pH values were not corrected for deuterium isotope effects. For residues for which complete titration curves could not be obtained because of protein unfolding at low pH, the amplitude (δA− − δAH) of the titration was fixed to the value determined for the same residue at 1 M KCl, as described previously.8 In the case of Glu-75, whose resonance broadened beyond detection below pH 4 at 1 M KCl, the pKa was obtained by fixing the amplitude of the titration to that obtained at 100 mM KCl. For Asp-77 and Asp-83, pKa values could not be obtained because these groups titrate at a pH value below the acid-unfolding midpoint of the protein in all variants.8 However, for the Asp-77 pKa in most variants an upper bound was assigned by fitting equation (2.3) with the Hill coefficients fixed to 1 and the amplitude ∆δ fixed to the smallest value observed for other Asp residues. For the reference protein, ∆+PHS, we applied the fitting procedure described above to the data from Castañeda et al.8 These data at 100 mM KCl comprise three separate titration experiments, and the means and standard errors of pKa values and Hill coefficients over these three experiments are reported in Table 2.1 and Appendix A. Small discrepancies between the results from Castañeda et al. and those reported here result from using a different model for two-site fits, as explained above, and using the two-site model for residues other than Asp-19 and Asp-21. The largest standard error observed was 0.06 pH units, which is of comparable magnitude to the errors of each individual fit. Therefore, for the other variants and conditions, a single titration experiment was performed and the pKa values and Hill coefficients with fitting errors from this experiment are reported.
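The one-site and two-site modified Hill models of equations (2.2) and (2.3) can be written out as a short sketch. The fits in this work were performed with the nlme library in R; the example below instead uses a coarse grid search over the pKa and Hill coefficient with fixed endpoint shifts, purely to show how the model is evaluated. The function names, the grid ranges, and the synthetic chemical shift values are all assumptions made for illustration and are not measured data.

```python
def one_site_shift(ph, d_ah, d_a, pka, n=1.0):
    """Equation (2.2): observed shift for a single titration event."""
    t = 10.0 ** (n * (ph - pka))
    return (d_ah + d_a * t) / (1.0 + t)

def two_site_shift(ph, d_ah2, d_ah, d_a, pka1, pka2, n1=1.0, n2=1.0):
    """Equation (2.3): observed shift for two sequential titration events."""
    t1 = 10.0 ** (n1 * (ph - pka1))
    t12 = 10.0 ** (n1 * (ph - pka1) + n2 * (ph - pka2))
    return (d_ah2 + d_ah * t1 + d_a * t12) / (1.0 + t1 + t12)

def fit_one_site(ph_values, shifts, d_ah, d_a):
    """Grid search for (pKa, n) minimizing the sum of squared residuals.

    Endpoint shifts d_ah and d_a are held fixed (an assumption of this
    sketch; a real fit would usually float them as well).
    """
    best = None
    for i in range(200, 801):            # pKa from 2.00 to 8.00, step 0.01
        pka = i / 100.0
        for j in range(5, 16):           # n from 0.5 to 1.5, step 0.1
            n = j / 10.0
            sse = sum((one_site_shift(p, d_ah, d_a, pka, n) - s) ** 2
                      for p, s in zip(ph_values, shifts))
            if best is None or sse < best[0]:
                best = (sse, pka, n)
    return best[1], best[2]

# Synthetic, noise-free titration curve (invented values, in ppm).
ph_values = [2.0 + 0.5 * k for k in range(13)]          # pH 2.0 to 8.0
shifts = [one_site_shift(p, 177.0, 180.5, 4.65, 0.9) for p in ph_values]

pka_fit, n_fit = fit_one_site(ph_values, shifts, 177.0, 180.5)
print(f"fitted pKa = {pka_fit:.2f}, Hill coefficient = {n_fit:.1f}")
```

A grid search keeps the sketch free of external dependencies; for real data a non-linear least squares routine that also reports parameter uncertainties would be the more appropriate choice.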
2.6.4 Comparison of SNase structures

Crystal structures of 119 SNase variants from the PDB were analyzed (1A2T, 1A2U, 1A3T, 1A3U, 1A3V, 1AEX, 1ENA, 1ENC, 1EQV, 1EY0, 1EY4, 1EY5, 1EY6, 1EY7, 1EY8, 1EY9, 1EYA, 1EYC, 1EYD, 1EZ6, 1EZ8, 1F2M, 1F2Y, 1F2Z, 1IHZ, 1II3, 1KAA, 1KAB, 1KDA, 1KDB, 1KDC, 1NSN, 1NUC, 1SNC, 1SND, 1SNM, 1SNO, 1SNP, 1SNQ, 1STA, 1STB, 1STG, 1STH, 1STN, 1STY, 1SYB, 1SYC, 1SYD, 1SYE, 1SYF, 1SYG, 1TQO, 1TR5, 1TT2, 1U9R, 2ENB, 2EXZ, 2EY1, 2EY2, 2EY5, 2EY6, 2EYF, 2EYH, 2EYJ, 2EYL, 2EYM, 2EYO, 2EYP, 2F0D, 2F0E, 2F0F, 2F0G, 2F0H, 2F0I, 2F0J, 2F0K, 2F0L, 2F0M, 2F0N, 2F0O, 2F0P, 2F0Q, 2F0S, 2F0T, 2F0U, 2F0V, 2F0W, 2NUC, 2OEO, 2OF1, 2OXP, 2PW5, 2PW7, 2PYK, 2PZT, 2PZU, 2PZW, 2QDB, 2RBM, 2RDF, 2RKS, 2SNM, 2SNS, 3BDC, 3C1E, 3C1F, 3D4D, 3D4W, 3D6C, 3D8G, 3DHQ, 3DMU, 3E5S, 3EJI, 3ERO, 3ERQ, 3EVQ, 3NUC, 5NUC). All structures have resolutions between 1.5 and 2.8 Å. The distances from Arg-35 to Asp-19 and Asp-21 were measured in each structure using as endpoints the Arg-35 Cζ and the midpoint between the Oδ atoms of Asp-19 or Asp-21. Histograms of the resulting values for both Asp-19 and Asp-21 were generated using R.25

2.6.5 Crystal structure of ∆+PHS/D21N

Crystals were grown at 277 K using hanging drop vapor diffusion methods. The reservoir solution contained 38% (v/v) 2-methyl-2,4-pentanediol (MPD) (Sigma-Aldrich) and 25 mM potassium phosphate buffer, pH 6.0. Protein at a concentration of 23 mg/mL was combined with CaCl2 and thymidine 3’,5’-diphosphate (pdTp) in a 1:3:2 molar ratio prior to mixing equal volumes with the reservoir solution and equilibrating at 4 °C. Crystals were mounted in nylon loops on a copper base (CryoLoops™ and CrystalCap Copper Magnetic™, Hampton Research), flash-cooled in liquid nitrogen and stored at 78 K. Diffraction data were collected from a single crystal using beamline X-25 at the National Synchrotron Light Source.
Data were indexed, integrated, and scaled using HKL2000.27 Initial phases were obtained by molecular replacement using the program Phaser28 within the CCP4 suite.29 The search model was the coordinates of ∆+PHS (PDB accession code 3BDC) with heteroatoms removed, B-factors set to 20 Å², and residue 21 truncated to Ala. Several alternating rounds of structure refinement using Refmac530 and model building using Coot31 resulted in a final model that contained residues 1-142, one pdTp molecule, one phosphate ion, and 143 water molecules. Only water molecules for which both 2Fo-Fc and Fo-Fc electron density (contoured to 1.2σ and 3.0σ, respectively) were observed and that were within 3.5 Å of a likely hydrogen-bonding partner were incorporated into the model. Data collection and refinement statistics are summarized in Appendix A.

2.6.6 Structure-based continuum electrostatic calculations

Calculations were performed with the structure of ∆+PHS SNase (3BDC).8 pKa values were calculated using the FDPB method with the linearized form of the Poisson-Boltzmann equation, as implemented in the University of Houston Brownian Dynamics package of McCammon and co-workers.18,19,32,33 Details of FDPB calculations with SNase have been presented elsewhere.17,34 Energies were computed using the distributed charge scheme for the ionizable form of each titratable residue.19,35–38 Calculations with both the PARSE parameter set39 and the CHARMm version 22 polar hydrogen-only parameter set40,41 were compared (data not shown); the reported results are those for the PARSE parameter set. The default placement of hydrogen atoms was on Oδ2 for all Asp residues and Oε2 for all Glu residues, unless noted otherwise. FDPB pKa calculations were also performed on snapshots taken at 50 ps intervals from 10 ns MD trajectories.
The trajectories were computed using the program NAMD242 and the CHARMm v27 forcefield.41 The system was simulated with explicit solvent (TIP3 water) and periodic boundary conditions, using the particle mesh Ewald method with a real-space interaction cutoff of 10 Å.

2.7 References

1. Søndergaard, C.R., McIntosh, L.P., Pollastri, G. & Nielsen, J.E. (2008). Determination of electrostatic interaction energies and protonation state populations in enzyme active sites. Journal of Molecular Biology 376, 269–287
2. Ondrechen, M.J., Clifton, J.G. & Ringe, D. (2001). THEMATICS: A simple computational predictor of enzyme function from structure. Proceedings of the National Academy of Sciences of the United States of America 98, 12473–12478
3. McIntosh, L.P., Hand, G., Johnson, P.E., Joshi, M.D., Körner, M., Plesniak, L.A., et al. (1996). The pKa of the general acid/base carboxyl group of a glycosidase cycles during catalysis: a 13C-NMR study of Bacillus circulans xylanase. Biochemistry 35, 9958–9966
4. Ackers, G.K., Shea, M.A. & Smith, F.R. (1983). Free energy coupling within macromolecules: The chemical work of ligand binding at the individual sites in co-operative systems. Journal of Molecular Biology 170, 223–242
5. Weber, D.J., Gittis, A.G., Mullen, G.P., Abeygunawardana, C., Lattman, E.E. & Mildvan, A.S. (1992). NMR docking of a substrate into the X-ray structure of staphylococcal nuclease. Proteins: Structure, Function, and Genetics 13, 275–287
6. Serpersu, E.H., Shortle, D. & Mildvan, A.S. (1987). Kinetic and magnetic resonance studies of active-site mutants of staphylococcal nuclease: factors contributing to catalysis. Biochemistry 26, 1289–1300
7. Serpersu, E.H., Hibler, D.W., Gerlt, J.A. & Mildvan, A.S. (1989). Kinetic and magnetic resonance studies of the glutamate-43 to serine mutant of staphylococcal nuclease. Biochemistry 28, 1539–1548
8. Castañeda, C.A., Fitch, C.A., Majumdar, A., Khangulov, V., Schlessman, J.L. & García‐Moreno, B.E. (2009).
Molecular determinants of the pKa values of Asp and Glu residues in staphylococcal nuclease. Proteins: Structure, Function, and Bioinformatics 77, 570–588
9. Baran, K.L., Chimenti, M.S., Schlessman, J.L., Fitch, C.A., Herbst, K.J. & García-Moreno, B. (2008). Electrostatic effects in a network of polar and ionizable groups in staphylococcal nuclease. Journal of Molecular Biology 379, 1045–1062
10. Rule, G.S. & Hitchens, T.K. (2006). Fundamentals of Protein NMR Spectroscopy. Springer, Dordrecht.
11. Chen, J., Lu, Z., Sakon, J. & Stites, W.E. (2000). Increasing the thermostability of staphylococcal nuclease: implications for the origin of protein thermostability. Journal of Molecular Biology 303, 125–130
12. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., et al. (2000). The Protein Data Bank. Nucleic Acids Research 28, 235–242
13. Chen, V.B., Arendall, W.B., Headd, J.J., Keedy, D.A., Immormino, R.M., Kapral, G.J., et al. (2009). MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallographica Section D-Biological Crystallography 66, 12–21
14. Davis, I.W., Leaver-Fay, A., Chen, V.B., Block, J.N., Kapral, G.J., Wang, X., et al. (2007). MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Research 35 Suppl 2, W375–W383
15. Forsyth, W.R., Antosiewicz, J.M. & Robertson, A.D. (2002). Empirical relationships between protein structure and carboxyl pKa values in proteins. Proteins: Structure, Function, and Genetics 48, 388–403
16. Bombarda, E. & Ullmann, G.M. (2010). pH-dependent pKa values in proteins—a theoretical analysis of protonation energies with practical consequences for enzymatic reactions. The Journal of Physical Chemistry B 114, 1994–2003
17. Fitch, C.A., Whitten, S.T., Hilser, V.J. & García‐Moreno E., B. (2006).
Molecular mechanisms of pH‐driven conformational transitions of proteins: Insights from continuum electrostatics calculations of acid unfolding. Proteins: Structure, Function, and Bioinformatics 63, 113–126
18. Antosiewicz, J., McCammon, J.A. & Gilson, M.K. (1994). Prediction of pH-dependent Properties of Proteins. Journal of Molecular Biology 238, 415–436
19. Antosiewicz, J., McCammon, J.A. & Gilson, M.K. (1996). The determinants of pKas in proteins. Biochemistry 35, 7819–7833
20. Schutz, C.N. & Warshel, A. (2001). What are the dielectric “constants” of proteins and how to validate electrostatic models? Proteins: Structure, Function, and Bioinformatics 44, 400–417
21. Nielsen, J.E. & Vriend, G. (2001). Optimizing the hydrogen-bond network in Poisson-Boltzmann equation-based pKa calculations. Proteins: Structure, Function, and Genetics 43, 403–412
22. Nielsen, J.E., Andersen, K.V., Honig, B., Hooft, R.W.W., Klebe, G., Vriend, G., et al. (1999). Improving macromolecular electrostatics calculations. Protein Engineering 12, 657–662
23. Shortle, D. & Meeker, A.K. (1986). Mutant forms of staphylococcal nuclease with altered patterns of guanidine hydrochloride and urea denaturation. Proteins: Structure, Function, and Genetics 1, 81–89
24. Markley, J.L. (1975). Observation of histidine residues in proteins by nuclear magnetic resonance spectroscopy. Accounts of Chemical Research 8, 70–80
25. R Development Core Team (2009). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. at <http://www.R-project.org>
26. Pérez-Cañadillas, J.M., Campos-Olivas, R., Lacadena, J., Martínez del Pozo, A., Gavilanes, J.G., Santoro, J., et al. (1998). Characterization of pKa values and titration shifts in the cytotoxic ribonuclease alpha-sarcin by NMR. Relationship between electrostatic interactions, structure, and catalytic function. Biochemistry 37, 15865–15876
27. Otwinowski, Z. & Minor, W. (1997).
Processing of X-ray diffraction data collected in oscillation mode. Methods in Enzymology 276, 307–326
28. McCoy, A.J., Grosse-Kunstleve, R.W., Storoni, L.C. & Read, R.J. (2005). Likelihood-enhanced fast translation functions. Acta Crystallographica Section D-Biological Crystallography 61, 458–464
29. Bailey, S. (1994). The CCP4 suite: programs for protein crystallography. Acta Crystallographica Section D-Biological Crystallography 50, 760–763
30. Murshudov, G.N., Vagin, A.A. & Dodson, E.J. (1997). Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallographica Section D-Biological Crystallography 53, 240–255
31. Emsley, P. & Cowtan, K. (2004). Coot: model-building tools for molecular graphics. Acta Crystallographica Section D-Biological Crystallography 60, 2126–2132
32. Davis, M.E., Madura, J.D., Luty, B.A. & McCammon, J.A. (1991). Electrostatics and diffusion of molecules in solution: simulations with the University of Houston Brownian dynamics program. Computer Physics Communications 62, 187–197
33. Madura, J.D., Briggs, J.M., Wade, R.C., Davis, M.E., Luty, B.A., Ilin, A., et al. (1995). Electrostatics and diffusion of molecules in solution: simulations with the University of Houston Brownian Dynamics program. Computer Physics Communications 91, 57–95
34. Fitch, C.A., Karp, D.A., Lee, K.K., Stites, W.E., Lattman, E.E. & García-Moreno, E.B. (2002). Experimental pKa Values of Buried Residues: Analysis with Continuum Methods and Role of Water Penetration. Biophysical Journal 82, 3289–3304
35. Bashford, D. & Gerwert, K. (1992). Electrostatic calculations of the pKa values of ionizable groups in bacteriorhodopsin. Journal of Molecular Biology 224, 473–486
36. Yang, A.-S. & Honig, B. (1993). On the pH dependence of protein stability. Journal of Molecular Biology 231, 459–474
37. Antosiewicz, J., Briggs, J.M., Elcock, A.H., Gilson, M.K. & McCammon, J.A. (1996). Computing ionization states of proteins with a detailed charge model.
Journal of Computational Chemistry 17, 1633–1644
38. Trylska, J., Antosiewicz, J., Geller, M., Hodge, C.N., Klabe, R.M., Head, M.S., et al. (1999). Thermodynamic linkage between the binding of protons and inhibitors to HIV-1 protease. Protein Science 8, 180–195
39. Sitkoff, D., Sharp, K.A. & Honig, B. (1994). Accurate calculation of hydration free energies using macroscopic solvent models. The Journal of Physical Chemistry 98, 1978–1988
40. Brooks, B.R., Bruccoleri, R.E., Olafson, B.D., States, D.J., Swaminathan, S. & Karplus, M. (1983). CHARMM: A program for macromolecular energy, minimization, and dynamics calculations. Journal of Computational Chemistry 4, 187–217
41. MacKerell, A.D., Bashford, D., Bellott, M., Dunbrack, R.L., Evanseck, J.D., Field, M.J., et al. (1998). All-atom empirical potential for molecular modeling and dynamics studies of proteins. The Journal of Physical Chemistry B 102, 3586–3616
42. Kalé, L., Skeel, R., Bhandarkar, M., Brunner, R., Gursoy, A., Krawetz, N., et al. (1999). NAMD2: Greater scalability for parallel molecular dynamics. Journal of Computational Physics 151, 283–312

3 Conformational Reorganization of the Backbone Influences the pKa Values of Ionizable Groups in Proteins

(to be submitted in a slightly different form to Journal of the American Chemical Society under the authorship of Brian M. Doctrow, Jamie L. Schlessman, Ananya Majumdar, and Bertrand García-Moreno E.)

3.1 Abstract

The pKa values of ionizable residues at protein surfaces are usually similar to those of ionizable residues in water. However, electrostatics calculations with static structures tend to predict large shifts in pKa values. These problems can be minimized by using artificially high dielectric constants, but sometimes the discrepancy persists even when the protein interior is treated with the dielectric constant of water.
This suggests that the conformational dynamics of the protein, which are not reflected in the crystal structure, are reflected in pKa values. Molecular dynamics or Monte Carlo simulations can attempt to reproduce these dynamic effects, but there are no useful data for testing this approach. To examine the role of backbone conformational heterogeneity and reorganization in determining pKa values, NMR spectroscopy was used to measure the pKa values of all 20 Asp and Glu residues in variants of staphylococcal nuclease (SNase) with Gly substitutions at select locations. These substitutions were intended to enhance backbone reorganization without affecting the overall conformation of the protein. Some of the Gly substitutions tested shifted the pKa values of nearby carboxylic groups without having a significant effect on the crystal structure. Hydrogen exchange measurements indicated increased propensity for local unfolding near the residues whose pKa values were affected by the Gly substitutions. Calculations with continuum electrostatic methods using crystal structures of the variants do not reproduce the effects of Gly substitutions on pKa values. Our results suggest that the high apparent polarizability of proteins might be the result of subtle structural reorganization of the backbone that is difficult to reproduce computationally.

3.2 Introduction

Ionizable groups in proteins play essential roles in processes such as catalysis1, H+ transport2–4, and the regulation of function by pH5,6. To understand how ionizable groups contribute to biochemical processes, it is necessary to know their pKa values and to understand the factors that govern them. To identify the molecular determinants of the pKa values we usually depend on structure-based pKa calculations that attempt to reproduce pKa values solely from protein structure and based on physical principles7–14.
The success of these methods at reproducing experimental pKa values has been limited15, suggesting that the physical determinants of pKa values are still not fully understood. Here we examine how the conformational heterogeneity of the backbone and its high capacity for reorganization affect pKa values. This effect is very difficult to treat computationally and has not been examined systematically before.

The pKa values of surface ionizable groups tend to be similar to the normal values of model compounds in water16–18. This is difficult to reproduce using calculations with static structures, which tend to predict pKa values significantly shifted from their model compound values19. The discrepancy can be minimized by using an arbitrarily high protein dielectric constant, presumably because the high dielectric constant implicitly accounts for protein reorganization11. However, in many cases the calculated pKa values do not match the measured values even when the protein is treated with the dielectric constant of water15,18. This implies that the relative positions of ionizable groups in the crystal structure do not match the average positions of the ensemble in solution. The conformational properties of the ensemble in solution must be treated explicitly in calculations to improve their accuracy.

Structure-based calculations can treat conformational reorganization explicitly with molecular dynamics (MD) simulations20 or Monte Carlo (MC) methods12,21. Although these methods can improve the accuracy of pKa calculations, often this requires arbitrary adjustments to the dielectric constant to achieve acceptable accuracy. In some cases the calculations fail. Neither MD nor MC methods sample backbone conformational changes adequately: the former because the timescale of reorganization of the backbone can be long, and the latter because the conformational space is too large.
If the inherent heterogeneity of the backbone and its ability to reorganize can affect pKa values, then neither MD nor MC is likely to be an effective approach to the improvement of structure-based calculations.

Previously, an ensemble model based on the COREX algorithm22 was able to reproduce the acid unfolding profile of staphylococcal nuclease (SNase) better than electrostatic calculations with a static structure. It was also able to identify the carboxylic residues responsible for acid unfolding23. In this model the protein is treated as a Boltzmann-weighted ensemble of partially unfolded structures, and ionizable groups are assigned different pKa values depending on whether they are in folded or unfolded regions of the protein. The overall pKa of a residue then reflects an average over all of the states:

$$\mathrm{p}K_a = P_1\,\mathrm{p}K_a^{1} + P_2\,\mathrm{p}K_a^{2} + \cdots = \sum_i P_i\,\mathrm{p}K_a^{i} \qquad (1)$$

where Pi is the population of state i, and pKai the pKa of the residue in state i. In this model, the more likely a residue is to be unfolded, the more normal its pKa will be. The model essentially captures the contributions of local unfolding and backbone reorganization to pKa values. The success of this model in reproducing the acid unfolding of SNase suggests that the conformational heterogeneity and stability of a carboxylic group’s microenvironment can significantly influence its pKa. This remains to be demonstrated experimentally.

To examine how conformational heterogeneity and reorganization of the protein backbone can affect pKa values of ionizable groups, we measured the pKa values of Asp and Glu residues in SNase in variants with Gly substitutions. Glycine is known to promote fluctuations and local unfolding of the protein backbone24,25. Our results show that glycine substitutions can affect pKa values significantly, without altering the charges or polarity near the affected ionizable groups and without detectable changes to the protein conformation in the crystal structure.
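The population-weighted average in Equation 1 can be illustrated with a minimal sketch. The function name and the two-state populations below are hypothetical and chosen only for illustration; the unfolded-state value of 3.9 is the Asp model compound pKa quoted later in this chapter, and the folded-state value of 2.0 is an assumed number, not a measurement.

```python
def ensemble_pka(populations, state_pkas):
    """Population-weighted average pKa over conformational states (Equation 1).

    populations: relative populations P_i (normalized here if they do not sum to 1)
    state_pkas:  pKa of the residue in each state i
    """
    if len(populations) != len(state_pkas):
        raise ValueError("need one pKa per state")
    total = sum(populations)
    if total <= 0:
        raise ValueError("populations must sum to a positive number")
    return sum(p * pk for p, pk in zip(populations, state_pkas)) / total


# Hypothetical two-state example: a folded state with a depressed pKa (2.0, assumed)
# and a locally unfolded state with the Asp model compound pKa (3.9).
for unfolded_population in (0.0, 0.1, 0.3):
    pka = ensemble_pka([1.0 - unfolded_population, unfolded_population], [2.0, 3.9])
    print(f"P(unfolded) = {unfolded_population:.1f}  ensemble pKa = {pka:.2f}")
```

In this simple picture, raising the population of the locally unfolded state moves the averaged pKa toward the model compound value, which is the behavior the Gly substitutions were designed to test.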
pKa calculations that represent the protein with a single structure are unable to reproduce the shifts in pKa values caused by the substitutions with Gly. Hydrogen-deuterium exchange suggests that the pKa shifts are associated with increased backbone fluctuations, consistent with the view that conformational reorganization can modulate the electrostatic microenvironment of an ionizable group and act as an important determinant of pKa values.

3.3 Results

Figure 3.1 shows the locations of the different Gly substitution sites and of Asp and Glu side chains. The initial hypothesis was that increased fluctuations promoted by Gly substitutions would allow ionizable groups to sample more solvent-exposed environments, and that the substitutions should therefore tend to normalize pKa values. The glycine substitutions were thus chosen to target residues whose pKa is significantly depressed in ∆+PHS18. Only hydrophobic residues were substituted to ensure that any observed pKa shifts were not caused by direct removal of polar or Coulomb interactions. Substitutions of residues within the hydrophobic core were largely avoided so as not to disrupt the overall folding of the protein.

3.3.1 pKa values measured by NMR spectroscopy

The pKa values of Asp and Glu residues were determined from the pH dependence of the carboxyl carbon chemical shifts measured with a CBCGCO experiment. The resulting titration curves are shown in Figure 3.2(a) for a subset of the Asp and Glu residues. The corresponding pKa values are listed in Table 3.1. Using the stable form of SNase known as ∆+PHS as the reference state ensured that the protein remained folded over as wide a pH range as possible. Nevertheless, there were several instances where the low-pH baseline for a titration curve could not be determined because the protein unfolded before the residue was fully protonated. In such cases, we followed the procedure of Castañeda et al.
by assuming that the amplitude of the titration did not vary with salt concentration and fixing the lower baseline to match the amplitudes measured at 1 M KCl18.

Figure 3.1: Cα trace of ∆+PHS SNase (PDB accession code 3BDC). Asp and Glu side chains are shown as ball-and-stick representation. Colored spheres indicate locations of Gly substitutions: Pro-11 (white), Ala-60 (blue), Ala-69 (red), Met-98 (green), and Ala-130 (gray). The color scheme will be maintained in subsequent figures.

Figure 3.2: (a) Plots of carboxyl carbon chemical shift vs. pH for the subset of carboxylic residues indicated in Figure 3.1 in each Gly-substituted variant. Lines represent fits of the data to a two- or three-state modified Hill equation, as described in the text. Dashed lines indicate that the fit was performed using a fixed value for the lower baseline, as described in Castañeda et al.18 (b) Bar graphs indicating the shift in pKa relative to the pKa measured in ∆+PHS. For Asp-77, the differences in the upper limits for the pKa between the variant and ∆+PHS are shown.

Table 3.1. pKa values of select Asp and Glu residues measured by NMR spectroscopy. (a)

Protein               Residue    pKa (b)                ∆pKa (c)
∆+PHS (d)             Asp-77     ≤ 1.7 (e)
                      Asp-95     2.16 ± 0.04 (f)
                      Glu-57     3.49 ± 0.05
                      Glu-75     3.30 ± 0.02 (g)
                      Glu-101    3.81 ± 0.06
∆+PHS/P11G            Asp-77     ≤ 1.6 (e)
                      Asp-95     2.15 ± 0.01 (f)        -0.01 ± 0.04
                      Glu-57     3.45 ± 0.01            -0.04 ± 0.05
                      Glu-75     3.17 ± 0.05 (g)        -0.13 ± 0.05
                      Glu-101    3.84 ± 0.01            0.03 ± 0.06
∆+PHS/A60G            Asp-77     ≤ 1.7 (e)
                      Asp-95     2.38 ± 0.01 (f)        0.22 ± 0.04
                      Glu-57     3.67 ± 0.01            0.18 ± 0.05
                      Glu-75     3.45 ± 0.01 (f,g)      0.15 ± 0.02
                      Glu-101    3.98 ± 0.01            0.17 ± 0.06
∆+PHS/A58G/A60G       Asp-77     ≤ 1.8 (e)
                      Asp-95     1.93 ± 0.05 (f)        -0.23 ± 0.06
                      Glu-57     3.61 ± 0.01 (f)        0.12 ± 0.05
                      Glu-75     3.19 ± 0.01 (f,g)      -0.11 ± 0.02
                      Glu-101    3.83 ± 0.05            0.02 ± 0.08
∆+PHS/A69G            Asp-77     ≤ 1.6 (e)
                      Asp-95     2.77 ± 0.02 (f)        0.61 ± 0.04
                      Glu-57     3.52 ± 0.02            0.03 ± 0.05
                      Glu-75     3.27 ± 0.04 (g)        -0.03 ± 0.04
                      Glu-101    3.77 ± 0.04            -0.04 ± 0.07
∆+PHS/M98G            Asp-77     ≤ 2.5 (e)
                      Asp-95     2.25 ± 0.05 (f)        0.09 ± 0.06
                      Glu-57     3.46 ± 0.07            -0.03 ± 0.09
                      Glu-75     3.92 ± 0.05            0.62 ± 0.05
                      Glu-101    3.32 ± 0.01 (f)        -0.49 ± 0.06
∆+PHS/M98A            Asp-77     ≤ 2.7 (e)
                      Asp-95     2.13 ± 0.06 (f)        -0.03 ± 0.07
                      Glu-57     3.43 ± 0.05            -0.06 ± 0.07
                      Glu-75     3.91 ± 0.16 (f)        0.60 ± 0.16
                      Glu-101    3.27 ± 0.02 (f)        -0.54 ± 0.06
∆+PHS/A130G           Asp-77     ≤ 1.6 (e)
                      Asp-95     2.37 ± 0.01 (f)        0.21 ± 0.04
                      Glu-57     3.61 ± 0.01            0.12 ± 0.05
                      Glu-75     3.50 ± 0.03 (g)        0.20 ± 0.04
                      Glu-101    4.02 ± 0.01            0.21 ± 0.06
∆+PHS/A128G/A130G     Asp-77     ≤ 1.8 (e)
                      Asp-95     2.13 ± 0.04 (f)        -0.03 ± 0.06
                      Glu-57     3.67 ± 0.02            0.18 ± 0.05
                      Glu-75     3.31 ± 0.02 (f,g)      0.00 ± 0.03
                      Glu-101    3.97 ± 0.02 (h)        0.16 ± 0.06

(a) Measurements were performed at 298 K and 100 mM KCl.
(b) pKa values were obtained by fitting a single-site modified Hill equation to the data, unless otherwise indicated. Values reported are from a single titration experiment with corresponding goodness of fit, unless otherwise indicated.
(c) Change in pKa relative to ∆+PHS.
(d) pKa values for ∆+PHS are means and standard errors over 3 independent titration experiments, using the data from Castañeda et al.18
(e) Upper limit for Asp-77 pKa obtained by fitting the data to a two-site modified Hill equation with a fixed Hill coefficient of 1 and a fixed ∆δ of 1.85 ppm.
(f) Fit performed by fixing the amplitude of the titration (∆δ) to the value obtained from the titration of the same residue in ∆+PHS at 1 M KCl18 or at 100 mM (for Glu-75).
(g) pKa values obtained by fitting a two-site modified Hill equation to the data. Only values corresponding to the larger of the two transitions are reported.
(h) Fit performed by fixing ∆δ to the largest value obtained for titration of other Glu residues (4.45 ppm).

The effects of Gly substitutions on the pKa values of Glu-57 are representative of most of the Asp and Glu residues in SNase; the pKa is shifted from the ∆+PHS value by no more than 0.25 pH units (corresponding to a change in protonation free energy of less than 0.34 kcal/mol). This is the case for all of the Asp and Glu residues not shown in Figure 3.2 (Table B.1). The fact that the majority of pKa values in each Gly-containing variant are minimally perturbed suggests that the glycine substitutions are not causing any global structural perturbations. As can be seen from Figure 3.2(b), only such minimal shifts were observed in three of the five variants (P11G, A60G, A130G). In two variants (A69G and M98G), the pKa of at least one residue was shifted by 0.5 pH units or more relative to the reference protein. The pKa of Asp-95 increases by 0.6 pH units in A69G, while that of Glu-75 increases by the same amount in M98G (Table 3.1 and Figure 3.2(b)). Both of these residues have pKa values that are depressed in ∆+PHS relative to their model compound values (3.9 for Asp, 4.4 for Glu). Thus, a positive ∆pKa corresponds to a shift towards a more normal pKa value, consistent with the idea that the Gly substitutions are promoting states in which these residues are more solvent exposed and not forming hydrogen bonds or Coulomb interactions with the rest of the protein.
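For readers unfamiliar with the fits behind Table 3.1 and Figure 3.2, the following is a minimal sketch of a single-site modified Hill titration curve. The functional form is a commonly used one and is assumed here, since the exact expression used for the fits is not reproduced in this section. The function name, the low-pH baseline of 180.0 ppm, and the amplitude of 4.0 ppm are hypothetical; only the pKa of 3.49 (Glu-57 in ∆+PHS) is taken from Table 3.1.

```python
def hill_shift(ph, pka, delta_low, amplitude, n=1.0):
    """Chemical shift (ppm) of a single titrating site at a given pH.

    Assumed form: delta(pH) = delta_low + amplitude / (1 + 10**(n * (pKa - pH)))
    delta_low: shift of the fully protonated (low-pH) state
    amplitude: total change in shift on deprotonation (the fitted amplitude)
    n:         Hill coefficient (1 for an ideal, uncoupled site)
    """
    return delta_low + amplitude / (1.0 + 10.0 ** (n * (pka - ph)))


# Hypothetical baseline and amplitude; pKa of Glu-57 in the reference protein.
for ph in (2.0, 3.49, 5.0, 7.0):
    print(f"pH {ph:4.2f}  shift = {hill_shift(ph, 3.49, 180.0, 4.0):.2f} ppm")
```

In this form the shift at pH = pKa lies exactly halfway between the two baselines, which is why an unobservable low-pH baseline (as for residues that titrate near the acid unfolding transition) makes the fitted pKa depend on the assumed amplitude.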
In contrast, the pKa of Glu-101 in M98G shifts away from the model compound pKa by 0.5 pH units, suggesting that the Gly substitution allows this residue to form favorable interactions that are not accessible to it in ∆+PHS.

The case of Asp-77 merits special mention. This residue’s pKa is below the pH at which the protein unfolds; its pKa cannot be measured even in ∆+PHS. However, enough of the beginning of the transition is observable to allow a reasonable estimate of an upper limit for the pKa. The differences in these estimated upper limits are plotted in Figure 3.2(b). The upper limit estimated in M98G is 0.8 pH units higher than that in ∆+PHS, suggesting that Asp-77 begins to titrate at a significantly higher pH in M98G than in ∆+PHS (Table 3.1). This in turn suggests that the pKa of Asp-77 is higher in M98G than in ∆+PHS, even though a quantitative measurement of pKa values is not possible for this residue.

The M98G substitution involves the removal of a bulky methionine side chain. It is therefore possible that this substitution causes significant changes to side chain packing and that this may be the cause of the increases in pKa rather than an increase in backbone fluctuations. To test this, pKa values were also measured in the M98A variant. The M98A substitution should have a similar effect on side chain packing as M98G, but should not increase backbone fluctuations appreciably since the substitution retains the Cβ. The same large shifts in pKa measured in M98G were also observed in M98A (Figure B.1 and Table B.1). Thus the large pKa shifts in the M98G variant appear to be a response to changes in side chain packing interactions resulting from the removal of the Met-98 side chain.

Residues Ala-60 and Ala-130, where Gly substitution did not produce any pKa shifts, are both located in the middle of helices.
One might thus expect the backbone at these positions to be relatively rigid regardless of the amino acid type, and thus a single Gly substitution might not be enough to significantly increase backbone fluctuations. Therefore, a second Gly substitution was introduced to each of the A60G and A130G variants near the original substitution site to see if further perturbations could produce a measurable response. Neither of the double-glycine variants showed significant changes in pKa values (Figure B.1 and Table B.1). No pKa shifts larger than 0.25 pH units were observed in the A58G/A60G variant compared to ∆+PHS. In the A128G/A130G variant, the pKa of one residue (Glu-135) shifted towards the model compound value by 0.33 pH units, which is larger than the shifts observed in the corresponding single variant (A130G) but still much smaller than the shifts observed in either A69G or M98G. Therefore, the effect of the Gly substitutions on pKa values is highly position-specific; at some positions, a single Gly substitution can cause large pKa shifts, whereas at others even two Gly substitutions have only minimal effects.

3.3.2 Thermodynamic stability

The Gly variants were subjected to chemical and acid denaturation to determine the effects of the substitutions on global stability. Figure 3.3 shows guanidinium chloride (GdmCl) and acid denaturation curves of ∆+PHS and the Gly-substituted variants. Table 3.2 lists the values of ∆G°, pHmid, and m obtained from these curves. All of the substitutions were destabilizing, by an amount ranging from 1.2 to 5 kcal/mol. However, all of the variants had stability of 6.8 kcal/mol or greater. The loss in stability measured by GdmCl denaturation paralleled the increase in the pHmid for acid unfolding as expected26. However, the pKa shifts due to the Gly substitutions do not correlate with either ∆G° or with pHmid. Even in the variants that showed significant pKa shifts, the majority of Asp and Glu pKa values were unaffected (Figure 3.3(c)).
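As an illustration of how ∆G° and m relate to a chemical denaturation curve, the sketch below uses the two-state linear extrapolation model, ∆G([GdmCl]) = ∆G° - m[GdmCl]. This model and the function names are assumptions made for illustration (the fits themselves follow the procedures cited for SNase), and the m-value is assumed to be in kcal/mol per molar denaturant. The numerical inputs are the ∆+PHS and M98G entries of Table 3.2.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def unfolding_free_energy(dg_water, m_value, denaturant):
    """Linear extrapolation: dG(unfolding) at a given denaturant concentration (M)."""
    return dg_water - m_value * denaturant


def fraction_unfolded(dg_water, m_value, denaturant, temperature=298.0):
    """Two-state fraction of unfolded protein at a given denaturant concentration."""
    dg = unfolding_free_energy(dg_water, m_value, denaturant)
    return 1.0 / (1.0 + math.exp(dg / (R_KCAL * temperature)))


def midpoint(dg_water, m_value):
    """Denaturant concentration at which half of the protein is unfolded."""
    return dg_water / m_value


# dG (kcal/mol) and m-values taken from Table 3.2; units of m are assumed.
for name, dg, m in (("∆+PHS", 11.8, 4.9), ("∆+PHS/M98G", 7.8, 5.4)):
    print(f"{name}: midpoint = {midpoint(dg, m):.2f} M GdmCl")
```

Under these assumptions a lower ∆G° at a similar m-value simply moves the unfolding midpoint to lower denaturant concentration, which is the sense in which the variants are globally destabilized.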
Furthermore, the double-glycine variants, which were among the least stable variants, showed no appreciable shifts in pKa (Figure B.1). This is consistent with previous work that found no correlation between global stability and the pKa values of surface histidine residues28.

Table 3.2: Stability measured by acid- and GdmCl-induced denaturation.

Protein               pHmid (a)      ∆G° (kcal/mol) (b)    m-value (b)
∆+PHS                 2.25 ± 0.01    11.8 ± 0.05           4.9 ± 0.02
∆+PHS/P11G            2.33 ± 0.01    10.6 ± 0.11           4.7 ± 0.05
∆+PHS/A60G            2.36 ± 0.01    10.1 ± 0.08           4.7 ± 0.04
∆+PHS/A69G            2.61 ± 0.01    9.6 ± 0.18            4.8 ± 0.11
∆+PHS/M98G            2.87 ± 0.01    7.8 ± 0.06            5.4 ± 0.03
∆+PHS/A130G           2.27 ± 0.01    10.9 ± 0.13           4.9 ± 0.06
∆+PHS/A58G/A60G       -              6.8 ± 0.2             4.6 ± 0.14
∆+PHS/A128G/A130G     -              8.5 ± 0.13            5.2 ± 0.08

(a) Acid denaturation was monitored by Trp fluorescence at 298 K and 100 mM KCl.
(b) ∆G° and m-values measured with guanidine hydrochloride denaturation monitored by Trp fluorescence at pH 7.0, 298 K, and 100 mM KCl.

Figure 3.3: (a) GdmCl and (b) acid denaturation curves for the proteins used in this study. Denaturation was monitored by intrinsic Trp fluorescence. Colors represent the unfolding of ∆+PHS (black), P11G (white), A60G (blue), A69G (red), M98G (green), and A130G (gray). Lines represent fits to two-state (for GdmCl) or three-state (for acid) models as described previously for SNase.26,27 (c) Global destabilization of ∆+PHS (bars) and pKa shifts (circles) caused by Gly substitutions. The figure incorporates data from this study and a separately published study (Doctrow et al., in preparation).

3.3.3 Crystal structures

Crystal structures were determined for both the A69G and M98G variants, in both the unliganded forms and in complex with Ca2+ and the inhibitor pdTp. Crystals of the two ternary complexes were isomorphous with that of ∆+PHS. The unliganded M98G crystal had a different space group from ∆+PHS, whereas the unliganded A69G crystal had the same space group as ∆+PHS but different unit cell dimensions (Table B.2).
The aligned Cα traces of ∆+PHS and the A69G and M98G variants are shown in Figure 3.4. No significant structural changes are visible in any of the variant structures that could explain the observed pKa shifts. In the ternary complexes, no significant differences in the protein backbone were observed between the two glycine variants, or between the glycine variants and ∆+PHS. Nor were there any differences between the unliganded structures of the two variants. The conformation of the loop spanning residues 113-117 differed between the structures of the unliganded and ternary complex forms. This difference is consistently observed between structures of nuclease with and without Ca2+ and pdTp. Therefore, it is likely that this conformational change results from the presence or absence of ligands and not from the glycine substitution. The RMSD of all structures from the ∆+PHS structure was < 0.3 Å for Cα and < 0.4 Å for all heavy atoms (Table B.3). Except for surface lysine residues, for which electron density is often not visible over much of the side chain, no significant differences in side chain conformations were seen in the vicinity of the Gly substitutions either. The Gly substitutions do not appear to affect pKa values by changing the overall protein structure, or at least not by causing a change detectable in a crystal structure.

Figure 3.4: Alignment of Cα traces of ∆+PHS (3BDC, white), A69G with (3SR1, magenta) and without (3T13, yellow) Ca2+ and pdTp, and M98G with (3S9W, green) and without (3SK8, cyan) Ca2+ and pdTp. Only chain A from 3T13 is shown.

3.3.4 Hydrogen exchange in Gly variants

Amide hydrogen exchange (HX) rates were measured by NMR in ∆+PHS and in the A69G and M98G variants to determine the effects of the Gly substitutions on local conformational fluctuations in solution.
These rates are usually interpreted in terms of the Linderstrøm-Lang scheme29:

$$\text{Closed} \underset{k_{cl}}{\overset{k_{op}}{\rightleftharpoons}} \text{Open} \xrightarrow{k_{ch}} \text{Exchanged}$$

Under conditions where kch is rate limiting (EX2 regime), the protection factor Pf = kch/kobs reflects the equilibrium between the closed and open states (lower Pf = greater population of open states). Residues that are more likely to be in open states will exchange at rates close to kch and thus will have protection factors close to 1, whereas residues that are likely to be in closed states will have protection factors much greater than 1. Therefore, a reduction in the Pf of a residue indicates an increased population of open states for that residue.

Figure 3.5: Significant changes in HX rates observed in the (a) A69G and (b) M98G variants relative to ∆+PHS. The Gly-substituted residues are shown as spheres. Residues with measurable exchange rates are colored red, with darker shades indicating larger increases in rate upon Gly substitution. Black denotes residues that exchanged within the dead time of the experiment in both the reference and variant proteins. Residues in white showed no exchange during the course of the experiment. Gray indicates prolines and any residue whose resonance was not visible in the HSQC spectrum. Dashed magenta lines indicate hydrogen bonds whose disruption is consistent with either the observed changes in pKa or the observed changes in HX rates.

In the A69G variant, significant decreases in Pf are seen in residues 68-71, comprising the C-terminus of helix 1 and the short loop connecting it to β-strand 4 (α1/β4 loop, Figure 3.5(a)). Asp-95, whose amide forms a hydrogen bond to the carbonyl of Lys-71, also is less protected in the variant. A previous study of HX rates in SNase suggests that residues in this loop exchange through concerted unfolding of the loop30.
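The protection factor defined above can be turned into an apparent free energy of opening with a short sketch. In the EX2 limit, kobs = Kop·kch with Kop = kop/kcl, so Pf = 1/Kop and ∆Gop = RT ln Pf; this is the standard interpretation and is stated here as background, not as the analysis pipeline used in this work. The function names and the rate constants in the example are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def protection_factor(k_ch, k_obs):
    """Pf = k_ch / k_obs, with both rates in the same units."""
    if k_ch <= 0 or k_obs <= 0:
        raise ValueError("rates must be positive")
    return k_ch / k_obs


def opening_free_energy(pf, temperature=298.0):
    """Apparent free energy of the opening reaction (kcal/mol), EX2 limit only."""
    return R_KCAL * temperature * math.log(pf)


# Hypothetical rates: intrinsic exchange 10 s^-1, observed 0.01 s^-1 (reference)
# versus 0.1 s^-1 (variant). A tenfold faster observed rate lowers Pf tenfold.
for label, k_obs in (("reference", 0.01), ("variant", 0.1)):
    pf = protection_factor(10.0, k_obs)
    print(f"{label}: Pf = {pf:.0f}, dG_op = {opening_free_energy(pf):.2f} kcal/mol")
```

On this scale each tenfold decrease in Pf corresponds to a loss of about 1.36 kcal/mol of local stability at 298 K, which gives a sense of the size of the changes shown in Figure 3.5.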
The reduced protection factors for these groups in the variant suggest that the A69G substitution may promote the unfolding of this loop, thereby favoring the open state for these residues relative to ∆+PHS.

In the M98G variant the changes in HX are more widespread than in the A69G variant (Figure 3.5(b)). Lower protection factors are observed in three areas of the protein: residues 100 and 101 in helix 2 (near the substitution site), residues 125 and 126 in helix 3, and several residues in the large loop spanning residues 76-89 (β4/β5 loop). Residues 113-121 and 123 exchanged within the dead time of the experiment in all variants. This suggests that the Gly substitution promotes unfolding of the N-termini of helices 2 and 3 and of the β4/β5 loop. The effect on the loop was unexpected, given its distance from the substitution (shortest distance 9.8 Å between Met-98-Cγ and Phe-76-HN). The fact that a single substitution affects the stability of all of these regions indicates that these regions are thermodynamically coupled to one another. The stability of one region depends on whether the others are folded, and vice versa.

As noted above, most of the observed pKa shifts are toward the model compound value. Since open states resemble model compounds in their high solvent exposure and lack of hydrogen bonds, our hypothesis would predict that the residues with normalized pKa values should coincide with regions of reduced Pf. In both of these variants, the residues with large shifts in pKa were located in or near regions of reduced Pf (Figure 3.5). Thus, the data are consistent with our hypothesis that the Gly substitutions are shifting the pKa by altering the conformational ensemble in solution. The exception to this pattern is Glu-101, whose pKa is more depressed in M98G than in ∆+PHS, even though its Pf is reduced. One possible explanation is that the substitution might make it more favorable for Glu-101 to form a hydrogen bond (see Discussion).
3.3.5 15N NMR relaxation measurements

To further probe changes in the dynamics of the protein backbone caused by Gly substitution, 15N longitudinal (R1) and transverse (R2) relaxation rates and 1H-15N heteronuclear NOE were measured in ∆+PHS and the A69G and M98G variants. In general, all three parameters were unchanged among the three proteins (Figure 3.6). R1 values tend to be slightly higher across the board in both variants relative to ∆+PHS, suggesting a change in the overall correlation time of the molecule. Therefore, the Gly substitutions do not appear to affect motions on the timescales that affect the relaxation parameters (ps-ns, µs-ms).

Figure 3.6: Backbone 15N R1 (top), R2 (middle), and 1H-15N NOE (bottom) as a function of residue number for ∆+PHS and the A69G and M98G variants.

3.3.6 Structure-based pKa calculations

pKa values for Asp and Glu residues in ∆+PHS and in the A69G and M98G variants were calculated using several different methods (Figure 3.7(a)). The overall correlation between calculated and measured pKa values was comparable between the variants, but was not particularly good in any of them. Calculations using PROPKA tended to underestimate the shifts in pKa relative to model compound values, whereas in the FDPB calculations many such shifts were overestimated. Figure 3.7(b) plots the calculated pKa shifts between the Gly variants and the background against the measured differences. No correlation is observed for any of the methods used. The FDPB calculations do a particularly poor job of reproducing these shifts, with some residues showing a discrepancy of >2 pH units between calculations and measurements.

Figure 3.7: (a) Plot of calculated versus measured pKa values in ∆+PHS (black), A69G (red), and M98G (green). (b) Plot of calculated versus measured changes in pKa relative to ∆+PHS for A69G and M98G (same colors as in (a)). Calculations performed using FDPB (circles), PROPKA (squares), and MCCE (diamonds) are included.
Both the FDPB and PROPKA calculations are based solely on the static crystal structure; the failure of these methods implies that the determinants of the pKa shifts due to Gly substitution are not being captured in the crystal structure. MCCE explicitly models the variability and pH-dependent reorganization of the side chain conformations, but not the backbone. The failure of MCCE to capture the Gly substitution effects implies that the Gly substitutions do not affect pKa values only by altering side chain reorganization. Therefore the pKa shifts in the Gly variants must reflect changes in the backbone reorganization.

3.3.7 COREX calculations

The COREX algorithm31,32 was used to analyze changes in local backbone stability due to the glycine substitutions. For each variant, changes in the natural log of the single-residue stability constants relative to ∆+PHS (∆lnKf) were calculated at pH 7.0. All variants showed significant destabilization (∆lnKf < -1) in the immediate vicinity of the substitution site (Figure 3.8, top). In addition, residues 36 and 37 were destabilized to varying extents in all variants, with ∆lnKf ranging from ~-0.13 in A130G to ~-3.2 in M98G. The ∆lnKf for these residues correlated roughly with the change in the measured global stability of the Gly variants, suggesting that the stability constants of these residues may reflect the global stability of the protein.

Figure 3.8: ∆lnKf (top) and ∆lnKp (bottom) relative to ∆+PHS, calculated using COREX for P11G (black), A60G (blue), A69G (red), M98G (green), and A130G (gray) at pH 7.0, 298 K. Horizontal dotted lines correspond to ∆lnK = -1, equivalent to ∆∆G = RT.

Although all of the variants showed significant changes in Kf, only the variants exhibiting significant pKa shifts showed significant changes in Kp (Figure 3.8, bottom). This constant, which reflects the probability of a residue being buried vs. exposed to solvent, determines a residue’s pKa in the COREX algorithm.
The lower the protection constant, the greater the probability of being in a solvent-exposed environment, and hence the more normal the pKa will be. Thus the calculations are consistent with our observation that P11G, A60G, and A130G have minimal effects on pKa values.

For the A69G variant, aside from residues 36-37, significant destabilization was observed in residues 61-74 (Figure 3.8, top). This range includes the residues for which increased HX was observed (68-69 and 71) (Figure 3.5(a)). The other residues in this region exchanged with rates outside the measurable range in both background and variant. Hence, changes in stability could not be experimentally determined for these residues. Nevertheless, the calculated changes in local stability and the location of the affected region are qualitatively consistent with the HX measurements. In addition, there is a large decrease in the calculated protection constant (Kp) of residue 95 (Figure 3.8, bottom). Thus the large decrease in Kp for Asp-95 is consistent with the more normal pKa measured for Asp-95 in this variant (Figure 3.2).

M98G showed the largest destabilizations (∆lnKf ~ -3.2) of any of the variants. Besides residues 36-37, large destabilization was seen in residues 91-106. This includes the residues with measurable increases in HX rate (92, 100, 101), but also includes residues whose HX rate does not change measurably (95, 97). Nor is any large destabilization seen in the other regions with increased HX (residues 76-89 and 125-126) (Figure 3.5(b)). Furthermore, there are no large changes in Kp for any Asp or Glu residues. The carboxylic group with the largest change in Kp is Asp-95 (∆lnKp = -0.41), whose pKa is unchanged in this variant. The only residue with |∆lnKp| > 1 is Tyr-93. Thus although COREX does a good job of capturing the effects of the A69G substitution on the conformational ensemble, it does not adequately capture the effects of M98G.
3.4 Discussion

The data from this study are consistent with the idea that pKa values of surface ionizable residues in proteins are governed partly by the inherent heterogeneity of the protein. Even though Gly substitutions do not add or remove charges or polar groups, they induced shifts in pKa values similar in magnitude to what would result from removing a charge 5-10 Å from an ionizable group33. Crystal structures of the Gly variants showed no evidence that the Gly substitutions promoted significant conformational changes.

It has been shown previously that Gly substitutions destabilize proteins locally, resulting in increased hydrogen exchange in residues around the substitution site34,35. In a random coil, Gly can access more of Φ,Ψ space compared to other amino acids, and thus has greater conformational entropy. Thus, all other factors being equal, Gly will gain more conformational entropy than other amino acids upon unfolding. Therefore Gly will favor unfolding more than other amino acids36. Local unfolding may change the electrostatic environment of an ionizable residue by increasing its solvent exposure, or by removing hydrogen bonds or Coulomb interactions. Thus, a Gly substitution may shift the pKa of an ionizable group by increasing the populations of locally unfolded states. Consistent with this idea, the changes in pKa values seen upon Gly substitution were accompanied by localized increases in hydrogen exchange rates (Figure 3.5). Interestingly, the changes in protonation energy reflected in the larger pKa shifts (∆∆G = 1.36*∆pKa = 0.83-0.84 kcal/mol) are comparable to the measured value for T*∆∆Sconf upon substitution of alanine with glycine (0.73 ± 0.06 kcal/mol at T=298 K)36. 15N relaxation measurements gave no indication that the Gly substitutions increased backbone motions on the µs-ms timescale, in contrast to what Beeser et al. observed in BPTI24.
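The factor of 1.36 used above to convert a pKa shift into a protonation free energy is ln(10)·RT at 298 K. A minimal sketch of that conversion follows; the function name is hypothetical, and the ∆pKa inputs are the two large shifts reported in Table 3.1 together with the 0.25 pH unit threshold used in the Results.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def ddg_from_dpka(delta_pka, temperature=298.0):
    """Change in protonation free energy (kcal/mol) for a given pKa shift."""
    return math.log(10.0) * R_KCAL * temperature * delta_pka


# Asp-95 in A69G (0.61), Glu-75 in M98G (0.62), and the 0.25 pH unit threshold.
for delta_pka in (0.61, 0.62, 0.25):
    print(f"dpKa = {delta_pka:.2f}  ->  ddG = {ddg_from_dpka(delta_pka):.2f} kcal/mol")
```

Because the conversion factor scales linearly with temperature, the quoted values apply only at the 298 K used for the titrations.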
By considering the hydrogen exchange results and the pKa shifts together, we can understand how the same dynamic processes that give rise to the observed changes in HX can also produce the observed pKa shifts. In the crystal structure of ∆+PHS, Asp-95 forms hydrogen bonds to the backbone amide groups of Lys-70 and Lys-71 (Figure 3.5(a)), which depress the pKa of Asp-9518. In the A69G variant, the pKa of Asp-95 is normalized (Figure 3.2). As explained in Results, the reduced protection factors in this variant suggest that the Gly substitution promotes unfolding of the α1/β4 loop that includes residues 70 and 71. This unfolding ought to break the hydrogen bonds between Asp-95 and the backbone. Therefore, by increasing the unfolded population of the α1/β4 loop, the Gly substitution should also increase the population of states in which Asp-95 has a more normal pKa. Thus, the ensemble-averaged pKa should be more normal in A69G compared to ∆+PHS, which is exactly what is observed experimentally (Figure 3.2).

The case of M98G is more complex. The pH titration data indicate that the pKa values of Glu-75 and Asp-77 are both normalized in the M98G variant (Figure 3.2). Both of these residues participate in hydrogen bonds to residues in helix 3 that could depress their pKa values: Glu-75 with the His-121 side chain37 and Asp-77 with the Thr-120 side chain and backbone (Figure 3.5(b)). In addition, prior work has shown that the pKa of His-121, which caps helix 3 and has a depressed pKa in the background protein37,38, normalizes in the M98G variant28. The pKa shifts of these three residues (Glu-75, Asp-77, and His-121) are consistent with the increased unfolding of the helix 3 N-terminus suggested by the HX data. His-121 is likely to be more solvent exposed when the N-terminus of helix 3 unfolds, and thus will have a more normal pKa. Unfolding of helix 3 would also break the hydrogen bond between Glu-75 and His-121, thereby normalizing the pKa of Glu-75.
Assuming that Thr-120 also participates in the unfolding of helix 3, the unfolding would break the hydrogen bonds between Thr-120 and Asp-77 and normalize the pKa of Asp-77 as well. Thus by increasing the unfolded population of helix 3, as suggested by the HX data, the M98G substitution could normalize the pKa values of Glu-75, Asp-77, and His-121 as observed.
In using Gly substitutions to perturb the pKa values, we assumed that Gly would promote populations of locally unfolded states in which ionizable residues would have increased exposure to water, and thus more normal pKa values. The behavior of Glu-101, whose pKa became further depressed in M98G, is at odds with this interpretation. The data suggest that the M98G substitution allows the ionized Glu to form favorable interactions that are not accessible to it in the background protein. One possibility is that in the variant, the Glu side chain can adopt a conformation that allows a hydrogen bond between the side chain and its own backbone amide. Such a conformation would not be possible in the background protein because of a steric clash between the Glu-101 and Met-98 side chains. The increased fluctuations of residues 100-101 indicated by the HX data may facilitate this interaction by further reducing steric and geometric constraints. The Glu-101 backbone amide does not have a hydrogen-bonding partner in the background protein. The crystal structure of M98G shows Glu-101 in the same conformation as in the background protein, suggesting that the dominant conformation of Glu-101 is the same in both proteins. However, the proposed hydrogen-bonded conformation may still be present in a small but significant population in M98G. As long as this population is larger than in ∆+PHS, the ensemble-averaged pKa will be more depressed in the variant than in the background.
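The ensemble-averaging argument used above can be illustrated with a two-state sketch. It is not the model used in this work: it assumes a single ionizable site that exchanges between one folded and one locally unfolded state, each with its own pKa. The function name, the parameter k_open (the unfolded/folded population ratio of the deprotonated form), and all numerical values are hypothetical.

```python
import math

def apparent_pka(pka_folded, pka_unfolded, k_open):
    """Apparent pKa of one site exchanging between two conformational states.

    k_open is the unfolded/folded population ratio of the deprotonated form.
    """
    # Ratio of total protonated to total deprotonated species, scaled to pH 0:
    # each state contributes 10**pKa, weighted by its deprotonated population.
    numerator = 10 ** pka_folded + k_open * 10 ** pka_unfolded
    return math.log10(numerator / (1.0 + k_open))

# Hypothetical values: a depressed pKa of 2.2 in the folded state and a
# normal pKa of 4.0 in the locally unfolded state.
for k_open in (0.0, 0.01, 0.05, 0.2):
    print(k_open, round(apparent_pka(2.2, 4.0, k_open), 2))
```

In this sketch, raising the population of the locally unfolded state moves the apparent pKa from the folded-state value toward the unfolded-state value. The same expression gives a more depressed apparent pKa if the minor state has the lower pKa, as proposed above for Glu-101.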
Our results demonstrate that the pKa value of an ionizable group can be shifted by as much as 0.6 pH units (and possibly more in the case of Asp-77) by a single amino acid substitution that does not affect either the polarity or the number of charges in the protein. The results also suggest that the effect involves increased exposure of the ionizable moiety to bulk water and that this is a consequence of the increase in the propensity for fluctuation of the backbone into locally unfolded states.
These results have important implications for structure-based pKa calculations. The calculated pKa values in Figure 3.7 were obtained using various methods (FDPB, PROPKA, MCCE). None of these methods is capable of reproducing the consequences of Gly substitutions on the pKa of carboxyl groups. All of the calculations predict large pKa shifts for residues that are unaffected by the substitution, while predicting no pKa shifts for the residues that are affected. Even in cases where the calculations correctly identify residues whose pKa values are shifted by the Gly substitution, the predicted shift is usually in the wrong direction. Because the error in the calculated ∆pKa is not systematic, the agreement between calculated and measured ∆pKa values cannot be improved by arbitrarily adjusting the protein dielectric constant. Two of the pKa calculation methods tested here (FDPB and PROPKA) use a static structure to represent the protein. The third, MCCE, includes side chain heterogeneity explicitly while keeping the backbone static. None of these methods is capable of explicitly reproducing the effects of the Gly substitution on backbone reorganization or local unfolding, and each must rely on implicit treatment of these effects through the dielectric constant. However, the experimental data suggest that the Gly substitutions have a significant effect on backbone reorganization, and that this effect is not uniform throughout the protein.
Therefore it is impossible to treat implicitly the effect of the Gly substitutions using a single dielectric constant for the entire protein. The backbone dynamics, and local unfolding in particular, must be treated explicitly in calculations. This has been attempted using molecular dynamics (MD) simulations²⁰, but these cannot adequately sample the timescales relevant to backbone reorganization¹⁴,¹⁵.
The COREX algorithm provides another way to treat backbone dynamics in calculations. Because COREX generates its ensemble through enumeration of states, rather than dynamic sampling, it is not limited in the range of timescales that it can model. The coupling between local unfolding and side chain ionization is treated explicitly by assigning separate pKa values to unfolded versus folded residues, and then averaging the protonation state over the entire ensemble (equation (1)). This model was able to accurately reproduce the acid unfolding of SNase, and to identify the carboxylic residues responsible for this unfolding²³. Figure 3.8 shows that COREX is able to reproduce the measured effects of the A69G substitution on local stability (HX) and on the pKa of Asp-95. This suggests that the COREX model adequately describes the effect of this substitution on the protein, and that the shift in the pKa of Asp-95 in this variant results from an increased population of states in which Asp-95 titrates with a more normal pKa.
However, COREX does not reproduce the effects of the M98G substitution on either local stability or pKa values. In this variant, the pKa of Glu-101 becomes more depressed. For COREX to reproduce this, the residue’s environment would have to become more stable in the variant. This conflicts with the HX results, which show that the M98G substitution destabilizes Glu-101. We proposed above that the pKa of Glu-101 is more depressed in M98G because it can form a hydrogen bond that is not allowed in the native state.
If this is correct, then COREX cannot reproduce the pKa shift of Glu-101, as COREX does not permit non-native hydrogen bonds to form. COREX also does not reproduce the measured pKa shifts of Glu-75, Asp-77, and His-121. These residues are all part of a network of hydrogen bonds³⁷, which could energetically couple regions at opposite ends of the network. Because of this coupling, if one of these regions (e.g. helix 2) is destabilized by the M98G substitution, all other regions connected to this network (e.g. helix 3, β4/β5 loop) will be destabilized³⁹. Consistent with this interpretation, the HX data show that M98G destabilizes helix 3 and the β4/β5 loop in addition to helix 2 (Figure 3.5(b)). Apparently COREX is unable to reproduce the energetic coupling between these regions, and therefore cannot reproduce the effects of the M98G substitution on local stabilities. Most likely COREX would have an equally difficult time with any other substitution that perturbs this hydrogen bond network.
3.5 Conclusion
This study provides strong experimental evidence that local backbone reorganization influences pKa values in proteins. Substitutions to Gly that did not change the charge or polarity of amino acids shifted the pKa values of Asp and Glu residues in SNase. The substitutions did not affect the crystal structure of the protein; this is clear evidence that crystal structures do not reflect all of the determinants of pKa values. However, the substitutions with Gly did change HX rates, consistent with the substitutions promoting local unfolding. Local unfolding can change an ionizable residue’s average environment, thus shifting the pKa. The changes in HX occurred in the same parts of the protein where the pKa shifts occurred, further suggesting a connection between pKa values and local unfolding.
Many commonly used algorithms for structure-based pKa calculations represent the protein by the crystal structure alone.
The results presented here illustrate one way in which such a representation is inadequate; calculations based on a single, static protein structure were unable to reproduce the pKa shifts caused by Gly substitutions. Calculations that only allow the side chains to reorganize were similarly ineffective. These results, combined with our experimental measurements, indicate that structure-based pKa calculations must take into account the full ensemble of backbone conformations in solution in order to be accurate and useful. Developing an algorithm that can adequately sample this ensemble within a practical amount of computation time remains a challenge⁴⁰. Considerable effort has been put into developing constant-pH molecular dynamics (CPHMD) methods, which explicitly couple backbone dynamics with changes in protonation states⁴¹–⁵¹. If successful, such methods would represent a major advance in the field of protein electrostatics and could make great contributions to understanding protein evolution and protein engineering. The effects of Gly substitutions on pKa values would provide a useful benchmark for further calibrating these methods.
3.6 Materials and methods
3.6.1 Site directed mutagenesis and protein purification
All mutations were made in a highly stable variant of SNase known as ∆+PHS, which differs from wild-type SNase by five substitutions (G50F, V51N, P117G, H124L, and S128A) and a truncation (residues 44-49). Throughout this paper, residue numbers refer to the position in the wild-type sequence. Mutations were engineered in the pET24a+ plasmid and expressed in E. coli BL21(DE3) cells. Proteins were expressed and purified according to the procedure of Shortle and Meeker⁵². For NMR experiments, uniformly 15N- or 13C/15N-labeled protein was made by growing the cells in M9 minimal media with 15NH4Cl, or with 15NH4Cl and 13C6-D-glucose, as described previously¹⁸,³⁷, and purified according to the same procedure as for the unlabeled protein.
3.6.2 Equilibrium thermodynamics
The thermodynamic stability of the Gly-substituted variants was determined by denaturation with guanidinium chloride (GdmCl) and with acid, monitored by the intrinsic fluorescence of Trp-140 (λex = 296 nm, λem = 326 nm) as described previously²⁶. Experiments were performed with an AVIV ATF-105 or 107 automated fluorometer (Aviv Biomedical Inc., Lakewood, NJ) equipped with a Hamilton MicroLab 531C automated titration pump. All data were collected at 298 K and 100 mM KCl. GdmCl denaturation was carried out at pH 7.0. GdmCl denaturation experiments were analyzed by non-linear least squares fitting to a two-state model to obtain the standard Gibbs energy of unfolding (∆G°H2O), as described previously²⁶. Acid denaturation experiments were analyzed by non-linear least squares fitting to a three-state model to obtain the midpoint of the unfolding transition (pHmid), as described previously²⁷. For acid unfolding experiments, only the pHmid corresponding to global unfolding is reported.
3.6.3 NMR spectroscopy
Uniformly 15N- or 13C/15N-labeled samples were prepared by exchanging from H2O into aqueous buffer containing 100 mM KCl, 0.5 mM NaN3, and 10% D2O (v/v) by successive dilution in Amicon Ultra-4 tubes (Millipore). Final protein concentration ranged from 0.7 to 1.1 mM. All NMR experiments were collected at 298 K on either a Bruker Avance or Avance II 600 MHz spectrometer. 1HN and 15N resonances in the A69G and M98G variants were assigned using standard HNCACB⁵³, CBCACONH⁵⁴, and C-C TOCSY (CO)NH⁵⁵ triple-resonance NMR experiments. The sample buffer for assignments contained 25 mM acetate buffer (pH 4.7) in addition to the components listed above. Acetate buffer was prepared by mixing potassium acetate and glacial acetic acid in amounts calculated to yield the desired pH according to the Henderson-Hasselbalch equation. The final sample pH was 4.76 for A69G and 4.69 for M98G.
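The Henderson-Hasselbalch calculation mentioned above for the acetate buffer can be sketched as follows. This is an illustration only: it assumes the textbook pKa of 4.76 for acetic acid at 25 °C and ideal behavior (no correction for ionic strength), and the function names are hypothetical.

```python
def acetate_to_acid_ratio(target_ph, pka=4.76):
    """[acetate]/[acetic acid] ratio from the Henderson-Hasselbalch equation."""
    return 10 ** (target_ph - pka)

def buffer_composition(total_mm, target_ph, pka=4.76):
    """Split a total buffer concentration (mM) into acetate and acetic acid."""
    ratio = acetate_to_acid_ratio(target_ph, pka)
    acid = total_mm / (1.0 + ratio)
    acetate = total_mm - acid
    return acetate, acid

# 25 mM total acetate buffer at the nominal pH of 4.7 used for assignments.
acetate_mm, acid_mm = buffer_composition(25.0, 4.7)
print(round(acetate_mm, 1), round(acid_mm, 1))
```

Because the target pH is slightly below the assumed pKa, the sketch returns slightly more acetic acid than acetate.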
All spectra were processed using NMRPipe⁵⁶ and analyzed using Sparky⁵⁷.
Asp and Glu pKa values were measured on a Bruker Avance or Avance II 600 MHz NMR spectrometer with a cryogenic-TCI probe (with a cryocooled 13C preamplifier). The Cγ and Cδ resonances were used to monitor the titration of Asp and Glu residues, respectively. These resonances have been assigned previously in ∆+PHS¹⁸ and were measured using a 13C-detected CBCGCO experiment¹⁸. The resonances in the spectra of the variants could be assigned by comparison with the ∆+PHS spectrum. pKa values were obtained from least squares fitting to a one- or two-site modified Hill equation, as previously described in Section 2.6.3.
Samples for HX measurements were prepared by lyophilizing uniformly 15N-labeled protein. 1H to 2H exchange was initiated by dissolving lyophilized protein in D2O buffer containing 25 mM acetate (pH* 4.8), 100 mM KCl, and 0.5 mM NaN3, to a final protein concentration of 1 mM. The uncorrected sample pH, measured at the end of the exchange measurements, ranged from 5.05 to 5.13. Exchange was monitored in real time by sequential 2D HSQC experiments, separated by progressively longer intervals. HX rates were obtained by fitting a single-exponential decay function to the peak heights as a function of time, using the R statistics package⁵⁸. The peaks of residues 90 and 122 overlap in the spectra of ∆+PHS and A69G, as do residues 66 and 86 in A69G. However, in both cases the two overlapping residues exchange at very different rates, resulting in a bi-exponential decay profile for the single observed peak. By fitting a bi-exponential decay function to this peak, the exchange rates for both residues can be resolved. Based on the behavior of the neighboring residues, residues 122 and 86 were assigned the faster rates (Table B.4).
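The single-exponential model fitted to the HX peak heights can be illustrated with a short sketch. The original fits were performed in R. The Python code below instead uses a log-linear least-squares estimate, a simpler stand-in that assumes peak heights decay to zero and carry little noise. The function name and the synthetic data are hypothetical.

```python
import math

def fit_exchange_rate(times, heights):
    """Estimate k and I0 in I(t) = I0 * exp(-k * t) by regression on ln(I)."""
    logs = [math.log(h) for h in heights]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    # The slope of ln(I) versus t is -k; the intercept is ln(I0).
    return -slope, math.exp(intercept)

# Synthetic, noise-free data: I0 = 100 (arbitrary units), k = 0.05 per hour.
times = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
heights = [100.0 * math.exp(-0.05 * t) for t in times]
k_obs, i0 = fit_exchange_rate(times, heights)
print(round(k_obs, 3), round(i0, 1))
```

A log-linear fit weights late, low-intensity points more heavily than a direct nonlinear fit does, so it is best treated as a quick check or as a source of starting values rather than as a replacement for the nonlinear fit. It does not apply to the overlapped peaks described above, which require a bi-exponential model.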
Protection factors (Pf) were calculated as the ratio kch/kobs, where kobs is the rate obtained from the exponential fit, and kch is the intrinsic rate calculated from the sequence and the experimental conditions according to the method of Bai et al.⁵⁹ Following Skinner et al.³⁰, the reference acid- and base-catalyzed rate constants for Asp and Glu residues were increased by a factor of 2.5 to account for their known systematic deviation from the values predicted by Bai et al.⁶⁰
Many residues showed minimal exchange over the duration of the experiment, which ranged from 45 to 93 hours. For these residues, the fitted rate constants were used only if the fitting program returned a p-value less than 0.0005. Otherwise, kobs was assigned an upper limit equal to –ln(0.9)/tmax, where tmax is the last measured time point. This corresponds to the rate at which a residue would have exchanged by 10% by the end of the experiment. A lower limit for Pf was then calculated from this upper limit and the calculated value of kch. Protection factors for residues that completely exchanged within the first three time points were assigned upper limits, using the fastest measured rate in the variant as the lower limit for kobs. This was only done for residues whose exchange rate could be measured in at least one variant. Residues whose exchange rate was too fast to measure in all proteins were excluded from analysis.
Experiments to measure R1, R2, and the heteronuclear NOE of backbone 15N atoms in the Gly variants were conducted using established two-dimensional heteronuclear correlation experiments⁶¹,⁶². The pulse sequence for R2 measurements incorporated duty-cycle heating compensation⁶³. Samples of uniformly 15N-labeled protein were prepared in the same manner as for the assignment experiments. For R1 measurements, the maximum relaxation delay was 1.1 s (A69G) or 1.5 s (M98G). For R2 measurements, the maximum delay was 106 ms (A69G) or 144 ms (M98G).
Six to nine time points were collected for all measurements. One randomly selected time point from each measurement was collected twice for error estimation. Relaxation rates were obtained by fitting 1H-15N cross-peak heights versus time to an exponential decay function using the CurveFit program from the laboratory of Dr. Arthur Palmer III (www.palmer.hs.columbia.edu). Steady-state NOEs were calculated as the ratio of the 1H-15N cross-peak heights in the presence and absence of proton presaturation. The peak heights with and without presaturation were each calculated as the average over three experiments.
3.6.4 X-ray crystallography
Crystals of the A69G and M98G variants were grown at 277 K using the hanging-drop, vapor diffusion method. Crystals were grown both in the presence and absence of Ca2+ and the inhibitor thymidine-3’,5’-bisphosphate (pdTp). Crystals of the M98G variant with Ca2+ and pdTp were grown in a solution containing 20% (v/v) 2-methyl-2,4-pentanediol (MPD) and 25 mM potassium phosphate buffer, pH 7.0, whereas crystals without Ca2+ and pdTp were grown in a solution containing 34% (v/v) MPD and 25 mM potassium phosphate buffer, pH 7.0. Crystals of the A69G variant with Ca2+ and pdTp were grown in a solution containing 25% (v/v) MPD and 25 mM potassium phosphate buffer, pH 9.0, whereas crystals without Ca2+ and pdTp were grown in a solution containing 41% (v/v) MPD and 25 mM potassium phosphate buffer, pH 7.0. The initial protein concentration was 11.8 mg/mL for M98G and 13.0 mg/mL for A69G. For crystallization with Ca2+ and pdTp, the protein was mixed with 3 molar equivalents of CaCl2 and 2 molar equivalents of pdTp. The protein or protein/CaCl2/pdTp mixture was then mixed with the reservoir solution in a 1:1 ratio to form the hanging drop. For data collection, crystals were suspended with mother liquor in a cryoloop and flash-cooled in liquid nitrogen.
All proteins crystallized in space group P21, except for M98G in the absence of Ca2+ and pdTp, which crystallized in space group P41.
Data were collected from a single crystal of each variant on beamline X-25 at the National Synchrotron Light Source at Brookhaven National Laboratory. Reflections were indexed, integrated, scaled and merged using HKL2000⁶⁴. Phases for all structures were determined by molecular replacement using the Phaser⁶⁵ program within the CCP4 suite⁶⁶. The structure of ∆+PHS (PDB accession code 3BDC) with the mutated residue truncated to glycine, all waters removed, and all B-factors set to 20.0 Å² was used as a search model. Alternating rounds of structure refinement with Refmac5⁶⁷ and model building with Coot⁶⁸ yielded the final models. TLS refinement⁶⁹,⁷⁰ was used during the later rounds of refinement. Geometries of the final models were evaluated using the MolProbity server⁷¹,⁷².
Both of the M98G models and the A69G model with Ca2+ and pdTp include residues 7-141 and have one protein per asymmetric unit. The A69G model without Ca2+ and pdTp has two proteins per asymmetric unit, one of which includes residues 6-141 and the other of which includes residues 7-141. The models with Ca2+ and pdTp contained one Ca2+ and one pdTp each. The model of M98G without Ca2+ and pdTp contained two phosphates and one MPD molecule, whereas the model of A69G without Ca2+ and pdTp contained three phosphates and four MPD molecules. The latter model also contains one Ca2+, even though no CaCl2 was added to the protein used to grow the crystal. The electron density associated with this Ca2+ is too strong to be from a water molecule, and is located where Ca2+ normally binds to the protein. The Ca2+ may have come from trace calcium contamination in the well where the crystal was grown. No Ca2+ appears to be bound to the second molecule in the asymmetric unit. Data collection and refinement statistics are summarized in Table B.1.
3.6.5 Calculations
All structure-based pKa calculations were performed using the crystal structures of ∆+PHS (PDB accession code 3BDC), ∆+PHS/A69G (3SR1), or ∆+PHS/M98G (3S9W). Ca2+ and pdTp are bound to the protein in all of these structures, but were deleted from the structure prior to running the calculations. Structures were not relaxed prior to pKa calculations. All calculations were performed at 298 K.
Finite difference Poisson-Boltzmann (FDPB) calculations with a static structure were performed with the University of Houston Brownian Dynamics package⁷³,⁷⁴ using the full-charge implementation, as described previously for SNase¹⁵. Calculations used a protein dielectric constant of 10 and 100 mM ionic strength. Charges from the PARSE parameter set were used⁷⁵. For the histidines, hydrogen atoms were placed on Nε2 of His-8 and Nδ1 of His-121. This gave the best agreement with experimental pKa values.
pKa values were also calculated with PROPKA version 3.1 on the PROPKA web interface (propka.ki.ku.dk)¹³,⁷⁶,⁷⁷.
The effects of side chain conformational variability were also explored using the multi-conformation continuum electrostatics (MCCE) algorithm¹²,²¹,⁷⁸, version 2.5. The parameters of the calculations were those used for previous calculations on SNase⁷⁹. The FULL method of conformer generation was used to explore all possible side chain rotamers. Electrostatic interactions were calculated using a dielectric constant of 10 and a salt concentration of 0.15 M. Lennard-Jones interactions were scaled by a factor of 0.25.
COREX calculations were performed using source code provided by Dr. Steven Whitten (Texas State University-San Marcos)³¹. Calculations were performed on 3BDC and models of the A69G and M98G variants generated from 3BDC by truncating residue 69 or 98 to glycine in silico.
Models of the variants were used instead of crystal structures to ensure that any calculated changes were solely the result of the Gly substitution, and not of any small changes in the coordinates of the rest of the protein. The size of the folding units was 8 residues, which resulted in ~6.5 × 10⁵ microstates in the ensemble for each protein. The pKa values calculated for ∆+PHS using the FDPB procedure described above were used as the native state pKa values in all COREX calculations. The solvent accessibility cutoff for assignment of native versus unfolded pKa values was 0.45 for histidines and 0.31 for all other residues. These are the same values used previously for wild-type SNase calculations²³,⁸⁰. Entropy-scaling factors were chosen to reproduce the experimentally measured free energy of unfolding of the SNase variants at pH 7.0, and ranged from 0.97-0.981.
3.7 References
1. Warshel, A. (2003). Computer Simulations of Enzyme Catalysis: Methods, Progress, and Insights. Annual Review of Biophysics and Biomolecular Structure 32, 425–443
2. Burykin, A. & Warshel, A. (2003). What Really Prevents Proton Transport through Aquaporin? Charge Self-Energy versus Proton Wire Proposals. Biophysical Journal 85, 3696–3706
3. Burykin, A. & Warshel, A. (2004). On the origin of the electrostatic barrier for proton transport in aquaporin. FEBS Letters 570, 41–46
4. Braun-Sand, S., Strajbl, M. & Warshel, A. (2004). Studies of Proton Translocations in Biological Systems: Simulating Proton Transport in Carbonic Anhydrase by EVB-Based Models. Biophysical Journal 87, 2221–2239
5. Chu, A.H., Turner, B.W. & Ackers, G.K. (1984). Effects of protons on the oxygenation-linked subunit assembly in human hemoglobin. Biochemistry 23, 604–617
6. Ehrlich, L.S., Liu, T., Scarlata, S., Chu, B. & Carter, C.A. (2001). HIV-1 Capsid Protein Forms Spherical (Immature-Like) and Tubular (Mature-Like) Particles in Vitro: Structure Switching by pH-induced Conformational Changes.
Biophysical Journal 81, 586–594
7. Tanford, C. & Kirkwood, J.G. (1957). Theory of Protein Titration Curves. I. General Equations for Impenetrable Spheres. J. Am. Chem. Soc. 79, 5333–5339
8. Matthew, J.B., Gurd, F.R.N., Garcia-Moreno, B.E., Flanagan, M.A., March, K.L. & Shire, S.J. (1985). pH-Dependent Processes in Proteins. Critical Reviews in Biochemistry and Molecular Biology 18, 91–197
9. Klapper, I., Hagstrom, R., Fine, R., Sharp, K. & Honig, B. (1986). Focusing of electric fields in the active site of Cu-Zn superoxide dismutase: Effects of ionic strength and amino-acid modification. Proteins: Structure, Function, and Bioinformatics 1, 47–59
10. Bashford, D. & Karplus, M. (1990). pKa’s of ionizable groups in proteins: atomic detail from a continuum electrostatic model. Biochemistry 29, 10219–10225
11. Antosiewicz, J., McCammon, J.A. & Gilson, M.K. (1994). Prediction of pH-dependent Properties of Proteins. Journal of Molecular Biology 238, 415–436
12. Alexov, E.G. & Gunner, M.R. (1997). Incorporating protein conformational flexibility into the calculation of pH-dependent protein properties. Biophysical Journal 72, 2075–2093
13. Li, H., Robertson, A.D. & Jensen, J.H. (2005). Very fast empirical prediction and rationalization of protein pKa values. Proteins: Structure, Function, and Bioinformatics 61, 704–721
14. Warshel, A., Sharma, P.K., Kato, M. & Parson, W.W. (2006). Modeling electrostatic effects in proteins. Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics 1764, 1647–1676
15. Fitch, C.A., Whitten, S.T., Hilser, V.J. & García-Moreno E., B. (2006). Molecular mechanisms of pH-driven conformational transitions of proteins: Insights from continuum electrostatics calculations of acid unfolding. Proteins: Structure, Function, and Bioinformatics 63, 113–126
16. Forsyth, W.R., Antosiewicz, J.M. & Robertson, A.D. (2002). Empirical relationships between protein structure and carboxyl pKa values in proteins.
Proteins: Structure, Function, and Genetics 48, 388–403
17. Edgcomb, S.P. & Murphy, K.P. (2002). Variability in the pKa of histidine sidechains correlates with burial within proteins. Proteins: Structure, Function, and Bioinformatics 49, 1–6
18. Castañeda, C.A., Fitch, C.A., Majumdar, A., Khangulov, V., Schlessman, J.L. & García-Moreno, B.E. (2009). Molecular determinants of the pKa values of Asp and Glu residues in staphylococcal nuclease. Proteins: Structure, Function, and Bioinformatics 77, 570–588
19. García-Moreno E., B. & Fitch, C.A. (2004). Structural Interpretation of pH and Salt-Dependent Processes in Proteins with Computational Methods. Methods in Enzymology 380, 20–51
20. Van Vlijmen, H.W.T., Schaefer, M. & Karplus, M. (1998). Improving the accuracy of protein pKa calculations: Conformational averaging versus the average structure. Proteins: Structure, Function, and Bioinformatics 33, 145–158
21. Georgescu, R.E., Alexov, E.G. & Gunner, M.R. (2002). Combining conformational flexibility and continuum electrostatics for calculating pKas in proteins. Biophysical Journal 83, 1731–1748
22. Hilser, V.J., García-Moreno E., B., Oas, T.G., Kapp, G. & Whitten, S.T. (2006). A Statistical Thermodynamic Model of the Protein Ensemble. Chemical Reviews 106, 1545–1558
23. Whitten, S.T., García-Moreno E., B. & Hilser, V.J. (2005). Local conformational fluctuations can modulate the coupling between proton binding and global structural transitions in proteins. Proceedings of the National Academy of Sciences of the United States of America 102, 4282–4287
24. Beeser, S.A., Goldenberg, D.P. & Oas, T.G. (1997). Enhanced protein flexibility caused by a destabilizing amino acid replacement in BPTI. Journal of Molecular Biology 269, 154–164
25. Maity, H., Rumbley, J.N. & Englander, S.W. (2006). Functional role of a protein foldon: An Ω-loop foldon controls the alkaline transition in ferricytochrome c. Proteins: Structure, Function, and Bioinformatics 63, 349–355
26. Whitten, S.T.
& García-Moreno E., B. (2000). pH dependence of stability of staphylococcal nuclease: Evidence of substantial electrostatic interactions in the denatured state. Biochemistry 39, 14292–14304
27. Karp, D.A., Gittis, A.G., Stahley, M.R., Fitch, C.A., Stites, W.E. & García-Moreno E., B. (2007). High Apparent Dielectric Constant Inside a Protein Reflects Structural Reorganization Coupled to the Ionization of an Internal Asp. Biophysical Journal 92, 2041–2053
28. Doctrow, B.D., Baran, K.L., Chimenti, M.S., Herbst, K.J., Fitch, C.A., Majumdar, A. & García-Moreno E., B. Local flexibility as a determinant of pKa values of surface ionizable groups in proteins. In preparation
29. Hvidt, A. & Nielsen, S.O. (1966). Hydrogen Exchange in Proteins. Advances in Protein Chemistry 21, 287–386
30. Skinner, J.J., Lim, W.K., Bédard, S., Black, B.E. & Englander, S.W. (2012). Protein dynamics viewed by hydrogen exchange. Protein Science 21, 996–1005
31. Hilser, V.J. & Freire, E. (1996). Structure-based Calculation of the Equilibrium Folding Pathway of Proteins. Correlation with Hydrogen Exchange Protection Factors. Journal of Molecular Biology 262, 756–772
32. Hilser, V.J. & Freire, E. (1997). Predicting the equilibrium protein folding pathway: Structure-based analysis of staphylococcal nuclease. Proteins: Structure, Function, and Bioinformatics 27, 171–183
33. Lee, K.K., Fitch, C.A. & García-Moreno E., B. (2002). Distance dependence and salt sensitivity of pairwise, coulombic interactions in a protein. Protein Science 11, 1004–1016
34. Huyghues-Despointes, B.M.P., Langhorst, U., Steyaert, J., Pace, C.N. & Scholtz, J.M. (1999). Hydrogen-Exchange Stabilities of RNase T1 and Variants with Buried and Solvent-Exposed Ala → Gly Mutations in the Helix. Biochemistry 38, 16481–16490
35. Maity, H., Lim, W.K., Rumbley, J.N. & Englander, S.W. (2003). Protein hydrogen exchange mechanism: Local fluctuations. Protein Science 12, 153–160
36.
D’Aquino, J.A., Gómez, J., Hilser, V.J., Lee, K.H., Amzel, L.M. & Freire, E. (1996). The magnitude of the backbone conformational entropy change in protein folding. Proteins: Structure, Function, and Bioinformatics 25, 143–156
37. Baran, K.L., Chimenti, M.S., Schlessman, J.L., Fitch, C.A., Herbst, K.J. & García-Moreno, B. (2008). Electrostatic effects in a network of polar and ionizable groups in staphylococcal nuclease. Journal of Molecular Biology 379, 1045–1062
38. Castaneda, C.A. (2009). Determinants of electrostatic energies and pKa values in proteins. at <http://search.proquest.com/docview/304907798?accountid=11752>
39. Hilser, V.J. & Thompson, E.B. (2007). Intrinsic disorder as a mechanism to optimize allosteric coupling in proteins. Proceedings of the National Academy of Sciences of the United States of America 104, 8311–8315
40. Alexov, E., Mehler, E.L., Baker, N., M. Baptista, A., Huang, Y., Milletti, F., Erik Nielsen, J., Farrell, D., Carstensen, T., Olsson, M.H.M., Shen, J.K., Warwicker, J., Williams, S. & Word, J.M. (2011). Progress in the prediction of pKa values in proteins. Proteins: Structure, Function, and Bioinformatics 79, 3260–3275
41. Mongan, J. & Case, D.A. (2005). Biomolecular simulations at constant pH. Current Opinion in Structural Biology 15, 157–163
42. Mongan, J., Case, D.A. & McCammon, J.A. (2004). Constant pH molecular dynamics in generalized Born implicit solvent. Journal of Computational Chemistry 25, 2038–2048
43. Baptista, A.M., Martel, P.J. & Petersen, S.B. (1997). Simulation of protein conformational freedom as a function of pH: constant-pH molecular dynamics using implicit titration. Proteins: Structure, Function, and Bioinformatics 27, 523–544
44. Lee, M.S., Salsbury, F.R. & Brooks, C.L. (2004). Constant-pH molecular dynamics using continuous titration coordinates. Proteins: Structure, Function, and Bioinformatics 56, 738–752
45. Donnini, S., Tegeler, F., Groenhof, G. & Grubmüller, H. (2011).
Constant pH Molecular Dynamics in Explicit Solvent with λ-Dynamics. Journal of Chemical Theory and Computation 7, 1962–1978
46. Khandogin, J. & Brooks, C.L. (2006). Toward the Accurate First-Principles Prediction of Ionization Equilibria in Proteins. Biochemistry 45, 9363–9373
47. Meng, Y. & Roitberg, A.E. (2010). Constant pH Replica Exchange Molecular Dynamics in Biomolecules Using a Discrete Protonation Model. Journal of Chemical Theory and Computation 6, 1401–1412
48. Williams, S.L., de Oliveira, C.A.F. & McCammon, J.A. (2010). Coupling Constant pH Molecular Dynamics with Accelerated Molecular Dynamics. Journal of Chemical Theory and Computation 6, 560–568
49. Itoh, S.G., Damjanović, A. & Brooks, B.R. (2011). pH replica-exchange method based on discrete protonation states. Proteins: Structure, Function, and Bioinformatics 79, 3420–3436
50. Wallace, J.A. & Shen, J.K. (2011). Continuous Constant pH Molecular Dynamics in Explicit Solvent with pH-Based Replica Exchange. Journal of Chemical Theory and Computation 7, 2617–2629
51. Goh, G.B., Hulbert, B.S., Zhou, H. & Brooks, C.L. (2014). Constant pH molecular dynamics of proteins in explicit solvent with proton tautomerism. Proteins: Structure, Function, and Bioinformatics [Online early access], http://onlinelibrary.wiley.com/doi/10.1002/prot.24499/abstract
52. Shortle, D. & Meeker, A.K. (1986). Mutant forms of staphylococcal nuclease with altered patterns of guanidine hydrochloride and urea denaturation. Proteins: Structure, Function, and Genetics 1, 81–89
53. Wittekind, M. & Mueller, L. (1993). HNCACB, a High-Sensitivity 3D NMR Experiment to Correlate Amide-Proton and Nitrogen Resonances with the Alpha- and Beta-Carbon Resonances in Proteins. Journal of Magnetic Resonance, Series B 101, 201–205
54. Grzesiek, S. & Bax, A. (1992). Correlating backbone amide and side chain resonances in larger proteins by multiple relayed triple resonance NMR. J. Am. Chem. Soc. 114, 6291–6293
55. Grzesiek, S., Anglister, J. & Bax, A.
(1993). Correlation of Backbone Amide and Aliphatic Side-Chain Resonances in 13C/15N-Enriched Proteins by Isotropic Mixing of 13C Magnetization. Journal of Magnetic Resonance, Series B 101, 114– 119 56. Delaglio, F., Grzesiek, S., Vuister, G.W., Zhu, G., Pfeifer, J. & Bax, A. (1995). NMRPipe: A multidimensional spectral processing system based on UNIX pipes. Journal of Biomolecular NMR 6, 277–293 57. Goddard, T.D. & Kneller, D.G. (University of California, San Francisco: ).SPARKY 3. 58. R Development Core Team (R Foundation for Statistical Computing: Vienna, Austria, 2009). R: A language and environment for statistical computing. at <http://www.R-project.org> 127 59. Bai, Y., Milne, J.S., Mayne, L. & Englander, S.W. (1993). Primary structure effects on peptide group hydrogen exchange. Proteins: Structure, Function, and Bioinformatics 17, 75–86 60. Mori, S., van Zijl, P.C.M. & Shortle, D. (1997). Measurement of water–amide proton exchange rates in the denatured state of staphylococcal nuclease by a magnetization transfer technique. Proteins: Structure, Function, and Bioinformatics 28, 325–332 61. Kay, L.E., Torchia, D.A. & Bax, A. (1989). Backbone dynamics of proteins as studied by nitrogen-15 inverse detected heteronuclear NMR spectroscopy: application to staphylococcal nuclease. Biochemistry 28, 8972–8979 62. Loria, J.P., Rance, M. & Palmer, A.G. (1999). A Relaxation-Compensated Carr−Purcell−Meiboom−Gill Sequence for Characterizing Chemical Exchange by NMR Spectroscopy. J. Am. Chem. Soc. 121, 2331–2332 63. Yip, G.N.B. & Zuiderweg, E.R.P. (2005). Improvement of duty-cycle heating compensation in NMR spin relaxation experiments. Journal of Magnetic Resonance 176, 171–178 64. Otwinowski, Z. & Minor, W. (1997). Processing of X-ray diffraction data collected in oscillation mode. Methods in Enzymology 276, 307–326 65. McCoy, A.J., Grosse-Kunstleve, R.W., Storoni, L.C. & Read, R.J. (2005). Likelihoodenhanced fast translation functions. 
Acta Crystallographica Section D-Biological Crystallography 61, 458–464 66. Bailey, S. (1994). The CCP4 suite: programs for protein crystallography. Acta Crystallographica Section D-Biological Crystallography 50, 760–763 67. Murshudov, G.N., Vagin, A.A. & Dodson, E.J. (1997). Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallographica Section D-Biological Crystallography 53, 240–255 68. Emsley, P. & Cowtan, K. (2004). Coot: model-building tools for molecular graphics. Acta Crystallographica Section D-Biological Crystallography 60, 2126– 2132 69. Painter, J. & Merritt, E.A. (2006). Optimal description of a protein structure in terms of multiple groups undergoing TLS motion. Acta Crystallographica Section D-Biological Crystallography 62, 439–450 70. Painter, J. & Merritt, E.A. (2006). TLSMD web server for the generation of multigroup TLS models. Journal of Applied Crystallography 39, 109–111 71. Chen, V.B., Arendall, W.B., Headd, J.J., Keedy, D.A., Immormino, R.M., Kapral, G.J., Murray, L.W., Richardson, J.S. & Richardson, D.C. (2009). MolProbity: all-atom structure validation for macromolecular crystallography. Acta Crystallographica Section D-Biological Crystallography 66, 12–21 72. Davis, I.W., Leaver-Fay, A., Chen, V.B., Block, J.N., Kapral, G.J., Wang, X., Murray, L.W., Arendall, W.B., Snoeyink, J., Richardson, J.S. & Richardson, D.C. (2007). MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Research 35 Suppl 2, W375–W383 73. Davis, M.E., Madura, J.D., Luty, B.A. & McCammon, J.A. (1991). Electrostatics and diffusion of molecules in solution: simulations with the University of Houston Brownian dynamics program. Computer Physics Communications 62, 187–197 128 74. Madura, J.D., Briggs, J.M., Wade, R.C., Davis, M.E., Luty, B.A., Ilin, A., Antosiewicz, J., Gilson, M.K., Bagheri, B., Scott, L.R. & McCammon, J.A. (1995). 
Electrostatics and diffusion of molecules in solution: simulations with the University of Houston Brownian Dynamics program. Computer Physics Communications 91, 57–95 75. Sitkoff, D., Sharp, K.A. & Honig, B. (1994). Accurate calculation of hydration free energies using macroscopic solvent models. The Journal of Physical Chemistry 98, 1978–1988 76. Olsson, M.H.M., Søndergaard, C.R., Rostkowski, M. & Jensen, J.H. (2011). PROPKA3: Consistent treatment of internal and surface residues in empirical pKa predictions. Journal of Chemical Theory and Computation 7, 525–537 77. Søndergaard, C.R., Olsson, M.H.M., Rostkowski, M. & Jensen, J.H. (2011). Improved Treatment of Ligands and Coupling Effects in Empirical Calculation and Rationalization of pKa Values. Journal of Chemical Theory and Computation 7, 2284–2295 78. Song, Y., Mao, J. & Gunner, M.R. (2009). MCCE2: Improving protein pKa calculations with extensive side chain rotamer sampling. Journal of Computational Chemistry 30, 2231–2247 79. Gunner, M.R., Zhu, X. & Klein, M.C. (2011). MCCE analysis of the pKas of introduced buried acids and bases in staphylococcal nuclease. Proteins: Structure, Function, and Bioinformatics 79, 3306–3319 80. Whitten, S.T., García‐ Moreno, B.E. & Hilser, V.J. (2008). Ligand Effects on the Protein Ensemble: Unifying the Descriptions of Ligand Binding, Local Conformational Fluctuations, and Protein Stability. Methods in Cell Biology 84, 871–891 129 Appendix A Supplementary information for Chapter 2, “Electrostatic Coupling in a Cluster of Carboxylic Groups in the Active Site of an Enzyme” 130 Table A.1. 
pKa values of all Asp and Glu residues in all SNase variants from this study measured at 100 mM KCl Protein ∆+PHSc ∆+PHS/D19N Residue Asp-19 Asp-21 Asp-40 Asp-77 Asp-83 Asp-95 Asp-143 Asp-146 Glu-10 Glu-43 Glu-52 Glu-57 Glu-67 Glu-73 Glu-75 Glu-101 Glu-122 Glu-129 Glu-135 Glu-142 Asp-19 Asp-21 Asp-40 Asp-77 Asp-83 pKaa 2.12 ± 0.05d,e 6.54 ± 0.01e 3.83 ± 0.05e ≤ 1.7 2.16 ± 0.04d 3.81 ± 0.06 3.86 ± 0.03 2.83 ± 0.05d,e 4.32 ± 0.03 3.93 ± 0.05 3.49 ± 0.05 3.76 ± 0.04 3.31 ± 0.01 3.30 ± 0.02e 3.81 ± 0.06 3.89 ± 0.05 3.75 ± 0.06 3.75 ± 0.05 4.49 ± 0.02 5.75 ± 0.02e 3.80 ± 0.03 ≤ 1.7 - 131 ∆pKab -0.79 ± 0.02 -0.03 ± 0.06 - na 0.81 ± 0.01d,e 1.03 ± 0.02e 0.65 ± 0.01e 0.87 ± 0.01d 0.77 ± 0.03 0.76 ± 0.01 0.94 ± 0.01d,e 0.69 ± 0.01 0.65 ± 0.02 0.83 ± 0.02 0.99 ± 0.02 0.92 ± 0.01 0.88 ± 0.03e 0.82 ± 0.01 0.78 ± 0.02 0.66 ± 0.01 0.82 ± 0.01 0.85 ± 0.01 0.94 ± 0.02e 0.56 ± 0.02 - ∆+PHS/D21N Asp-95 Asp-143 Asp-146 Glu-10 Glu-43 Glu-52 Glu-57 Glu-67 Glu-73 Glu-75 Glu-101 Glu-122 Glu-129 Glu-135 Glu-142 Asp-19 Asp-21 Asp-40 Asp-77 Asp-83 Asp-95 Asp-143 Asp-146 Glu-10 Glu-43 Glu-52 Glu-57 Glu-67 2.22 ± 0.005d 3.77 ± 0.01 3.86 ± 0.01 2.87 ± 0.01d,e 3.79 ± 0.01 3.85 ± 0.02 3.49 ± 0.01 3.73 ± 0.01 3.24 ± 0.01 3.19 ± 0.05e 3.83 ± 0.01 3.92 ± 0.02 3.79 ± 0.01 3.77 ± 0.01 4.50 ± 0.01 2.60 ± 0.01d 3.94 ± 0.01 ≤ 1.6 2.26 ± 0.01d 3.93 ± 0.01 3.95 ± 0.01 2.93 ± 0.02d,e 4.46 ± 0.02 4.10 ± 0.03 3.63 ± 0.01 3.85 ± 0.01 132 0.06 ± 0.04 -0.04 ± 0.06 0.00 ± 0.03 0.04 ± 0.05 -0.53 ± 0.03 -0.08 ± 0.05 0.00 ± 0.05 -0.03 ± 0.04 -0.07 ± 0.01 -0.11 ± 0.05 0.02 ± 0.06 0.03 ± 0.05 0.04 ± 0.06 0.02 ± 0.05 0.01 ± 0.02 0.48 ± 0.05 0.11 ± 0.05 0.10 ± 0.04 0.12 ± 0.06 0.09 ± 0.03 0.10 ± 0.05 0.14 ± 0.04 0.17 ± 0.06 0.14 ± 0.05 0.09 ± 0.04 0.88 ± 0.01d 0.76 ± 0.01 0.76 ± 0.01 0.98 ± 0.01d,e 0.68 ± 0.01 0.69 ± 0.01 0.86 ± 0.01 0.98 ± 0.01 0.93 ± 0.01 0.85 ± 0.05e 0.84 ± 0.01 0.81 ± 0.03 0.68 ± 0.01 0.83 ± 0.01 0.83 ± 0.02 0.82 ± 0.02d 0.68 ± 0.01 0.91 ± 0.02d 0.83 ± 0.02 0.77 ± 0.02 0.96 ± 0.03d,e 0.67 ± 0.02 
0.66 ± 0.02 0.88 ± 0.01 0.98 ± 0.02 ∆+PHS/D40N ∆+PHS/E43Q Glu-73 Glu-75 Glu-101 Glu-122 Glu-129 Glu-135 Glu-142 Asp-19 Asp-21 Asp-40 Asp-77 Asp-83 Asp-95 Asp-143 Asp-146 Glu-10 Glu-43 Glu-52 Glu-57 Glu-67 Glu-73 Glu-75 Glu-101 Glu-122 Glu-129 Glu-135 Glu-142 Asp-19 3.31 ± 0.04 3.28 ± 0.02e,f 3.93 ± 0.01 3.98 ± 0.02 3.89 ± 0.02 3.84 ± 0.02 4.56 ± 0.02 2.19 ± 0.01d,e 6.18 ± 0.01e ≤ 1.7 2.21 ± 0.005d 3.74 ± 0.01 3.75 ± 0.01 2.88 ± 0.01d,e 4.11 ± 0.01 3.77 ± 0.02 3.46 ± 0.01 3.71 ± 0.005 3.22 ± 0.01 3.22 ± 0.02e,f 3.80 ± 0.01 3.89 ± 0.02 3.67 ± 0.01 3.73 ± 0.01 4.40 ± 0.01 2.34 ± 0.01d,e 133 0.00 ± 0.04 -0.02 ± 0.03 0.12 ± 0.06 0.09 ± 0.05 0.14 ± 0.06 0.09 ± 0.05 0.07 ± 0.03 0.07 ± 0.05 -0.36 ± 0.01 0.05 ± 0.04 -0.07 ± 0.06 -0.11 ± 0.03 0.05 ± 0.05 -0.21 ± 0.03 -0.16 ± 0.04 -0.03 ± 0.05 -0.05 ± 0.04 -0.09 ± 0.01 -0.08 ± 0.03 -0.01 ± 0.06 0.00 ± 0.05 -0.08 ± 0.01 -0.02 ± 0.05 -0.09 ± 0.02 0.22 ± 0.05 0.90 ± 0.04 0.84 ± 0.03e,f 0.85 ± 0.02 0.86 ± 0.03 0.73 ± 0.02 0.86 ± 0.02 0.81 ± 0.02 0.93 ± 0.02d,e 0.97 ± 0.01e 0.90 ± 0.01d 0.80 ± 0.01 0.76 ± 0.01 1.01 ± 0.01d,e 0.73 ± 0.01 0.75 ± 0.02 0.86 ± 0.01 1.00 ± 0.01 0.94 ± 0.01 0.89 ± 0.04e,f 0.87 ± 0.01 0.86 ± 0.02 0.74 ± 0.01 0.87 ± 0.01 0.83 ± 0.01 0.81 ± 0.03d,e ∆+PHS/D19N/D40N/E43Q Asp-21 Asp-40 Asp-77 Asp-83 Asp-95 Asp-143 Asp-146 Glu-10 Glu-43 Glu-52 Glu-57 Glu-67 Glu-73 Glu-75 Glu-101 Glu-122 Glu-129 Glu-135 Glu-142 Asp-19 Asp-21 Asp-40 Asp-77 Asp-83 Asp-95 Asp-143 Asp-146 Glu-10 6.16 ± 0.01e 3.69 ± 0.01e ≤ 1.7 2.21 ± 0.01d 3.76 ± 0.02 3.77 ± 0.01 2.88 ± 0.01d,e 3.65 ± 0.01 3.47 ± 0.01 3.70 ± 0.01 3.23 ± 0.01 3.27 ± 0.03e 3.80 ± 0.01 3.86 ± 0.03 3.73 ± 0.01 3.71 ± 0.01 4.42 ± 0.01 4.57 ± 0.01 ≤ 1.7 2.21 ± 0.01d 3.75 ± 0.01 3.76 ± 0.01 2.90 ± 0.02d,e 134 -0.38 ± 0.01 -0.14 ± 0.01 0.05 ± 0.04 -0.05 ± 0.06 -0.09 ± 0.03 0.05 ± 0.05 -0.28 ± 0.05 -0.02 ± 0.05 -0.06 ± 0.04 -0.08 ± 0.01 -0.03 ± 0.04 -0.01 ± 0.06 -0.03 ± 0.06 -0.02 ± 0.06 -0.04 ± 0.05 -0.07 ± 0.02 -1.97 ± 0.01 -0.05 ± 0.04 -0.06 ± 0.06 -0.10 ± 0.03 0.07 ± 
0.05 0.93 ± 0.01e 0.83 ± 0.01e 0.91 ± 0.01d 0.79 ± 0.02 0.75 ± 0.01 1.01 ± 0.02d,e 0.75 ± 0.01 0.89 ± 0.01 1.03 ± 0.01 0.95 ± 0.02 0.99 ± 0.04e 0.87 ± 0.01 0.82 ± 0.03 0.69 ± 0.01 0.87 ± 0.01 0.81 ± 0.01 0.93 ± 0.02 0.90 ± 0.01d 0.83 ± 0.01 0.79 ± 0.01 1.01 ± 0.03d,e ∆+PHS/R35Q Glu-43 Glu-52 Glu-57 Glu-67 Glu-73 Glu-75 Glu-101 Glu-122 Glu-129 Glu-135 Glu-142 Asp-19 Asp-21 Asp-40 Asp-77 Asp-83 Asp-95 Asp-143 Asp-146 Glu-10 Glu-43 Glu-52 Glu-57 Glu-67 Glu-73 Glu-75 Glu-101 Glu-122 3.55 ± 0.01 3.49 ± 0.01 3.70 ± 0.01 3.24 ± 0.01 3.29 ± 0.01e,f 3.83 ± 0.01 3.91 ± 0.02 3.75 ± 0.02 3.74 ± 0.01 4.40 ± 0.01 3.06 ± 0.01d,e 6.05 ± 0.01e 4.27 ± 0.01 ≤ 1.9 2.28 ± 0.01d 3.74 ± 0.01 3.78 ± 0.01 2.94 ± 0.01d,e 4.45 ± 0.02 3.89 ± 0.02 3.50 ± 0.02 3.73 ± 0.01 3.26 ± 0.02 3.31 ± 0.01e,f 3.87 ± 0.01 3.95 ± 0.03 135 -0.38 ± 0.05 0.00 ± 0.05 -0.06 ± 0.04 -0.07 ± 0.01 -0.01 ± 0.02 0.02 ± 0.06 0.02 ± 0.05 0.00 ± 0.06 -0.01 ± 0.05 -0.09 ± 0.02 0.94 ± 0.05 -0.49 ± 0.01 0.44 ± 0.05 0.12 ± 0.04 -0.07 ± 0.06 -0.08 ± 0.03 0.11 ± 0.03 0.13 ± 0.04 -0.04 ± 0.05 0.01 ± 0.05 -0.03 ± 0.04 -0.05 ± 0.02 0.01 ± 0.02 0.06 ± 0.06 0.06 ± 0.06 0.87 ± 0.02 0.89 ± 0.02 0.99 ± 0.01 0.94 ± 0.02 0.99 ± 0.03e,f 0.86 ± 0.01 0.85 ± 0.03 0.79 ± 0.02 0.87 ± 0.02 0.84 ± 0.01 0.88 ± 0.02d,e 0.89 ± 0.02e 0.64 ± 0.01 0.91 ± 0.01d 0.77 ± 0.01 0.75 ± 0.01 1.03 ± 0.01d,e 0.63 ± 0.02 0.61 ± 0.01 0.87 ± 0.02 0.99 ± 0.01 0.95 ± 0.02 0.96 ± 0.02e,f 0.85 ± 0.01 0.81 ± 0.03 ∆+PHS/D19N/R35Q/D40N/E43Q ∆+PHS/D21N/R35Q/D40N/E43Q Glu-129 Glu-135 Glu-142 Asp-19 Asp-21 Asp-40 Asp-77 Asp-83 Asp-95 Asp-143 Asp-146 Glu-10 Glu-43 Glu-52 Glu-57 Glu-67 Glu-73 Glu-75 Glu-101 Glu-122 Glu-129 Glu-135 Glu-142 Asp-19 Asp-21 Asp-40 Asp-77 Asp-83 3.77 ± 0.01 3.75 ± 0.01 4.44 ± 0.01 4.65 ± 0.01 ≤ 1.9 2.22 ± 0.01d 3.85 ± 0.01 3.85 ± 0.02 2.93 ± 0.02d,e 3.67 ± 0.01 3.55 ± 0.01 3.75 ± 0.01 3.19 ± 0.03 3.29 ± 0.01e,f 3.86 ± 0.02 3.99 ± 0.03 3.81 ± 0.03 3.78 ± 0.01 4.44 ± 0.02 3.46 ± 0.03 ≤ 1.9 - 136 0.02 ± 0.06 0.00 ± 0.05 -0.05 ± 0.02 -1.89 ± 0.01 0.06 
± 0.04 0.04 ± 0.06 -0.01 ± 0.04 0.10 ± 0.05 -0.26 ± 0.05 0.06 ± 0.05 -0.01 ± 0.04 -0.12 ± 0.03 -0.01 ± 0.02 0.05 ± 0.06 0.10 ± 0.06 0.06 ± 0.06 0.03 ± 0.05 -0.05 ± 0.03 1.34 ± 0.06 - 0.67 ± 0.01 0.84 ± 0.01 0.81 ± 0.02 0.90 ± 0.02 0.88 ± 0.01d 0.87 ± 0.01 0.81 ± 0.02 1.02 ± 0.03d,e 0.85 ± 0.02 0.89 ± 0.01 1.03 ± 0.02 0.88 ± 0.02 0.97 ± 0.03e,f 0.87 ± 0.02 0.89 ± 0.05 0.84 ± 0.04 0.90 ± 0.01 0.91 ± 0.03 0.96 ± 0.04 - ∆+PHS/R35Q/D40N/E43Q Asp-95 Asp-143 Asp-146 Glu-10 Glu-43 Glu-52 Glu-57 Glu-67 Glu-73 Glu-75 Glu-101 Glu-122 Glu-129 Glu-135 Glu-142 Asp-19 Asp-21 Asp-40 Asp-77 Asp-83 Asp-95 Asp-143 Asp-146 Glu-10 Glu-43 Glu-52 Glu-57 Glu-67 2.27 ± 0.01d 3.80 ± 0.02 3.81 ± 0.02 2.98 ± 0.01d,e 3.76 ± 0.02 3.55 ± 0.02 3.78 ± 0.01 3.33 ± 0.03d 3.25 ± 0.03e,f 3.90 ± 0.02 4.01 ± 0.02 3.77 ± 0.02 3.78 ± 0.04 4.46 ± 0.02 3.10 ± 0.05e 5.70 ± 0.01 ≤ 1.90 2.31 ± 0.01d 3.82 ± 0.01 3.86 ± 0.01 2.98 ± 0.01d,e 3.78 ± 0.01 3.56 ± 0.01 3.81 ± 0.01 137 0.11 ± 0.04 -0.01 ± 0.06 -0.05 ± 0.04 0.15 ± 0.05 -0.17 ± 0.05 0.06 ± 0.05 0.02 ± 0.04 0.02 ± 0.03 -0.05 ± 0.04 0.09 ± 0.06 0.12 ± 0.05 0.02 ± 0.06 0.03 ± 0.06 -0.03 ± 0.03 0.98 ± 0.07 -0.84 ± 0.01 0.15 ± 0.04 0.01 ± 0.06 0.00 ± 0.03 0.15 ± 0.05 -0.15 ± 0.05 0.07 ± 0.05 0.05 ± 0.04 0.89 ± 0.02d 0.82 ± 0.02 0.77 ± 0.02 1.04 ± 0.03d,e 0.86 ± 0.02 0.87 ± 0.02 0.99 ± 0.02 1.18 ± 0.09d 0.90 ± 0.08e,f 0.87 ± 0.02 0.87 ± 0.03 0.79 ± 0.02 0.88 ± 0.05 0.86 ± 0.03 0.87 ± 0.08e 0.99 ± 0.02 0.89 ± 0.02d 0.83 ± 0.01 0.80 ± 0.01 1.00 ± 0.02d,e 0.91 ± 0.01 0.89 ± 0.01 1.05 ± 0.01 PHS Glu-73 Glu-75 Glu-101 Glu-122 Glu-129 Glu-135 Glu-142 Asp-19 Asp-21 Asp-40 Asp-77 Asp-83 Asp-95 Asp-143 Asp-146 Glu-10 Glu-43 Glu-52 Glu-57 Glu-67 Glu-73 Glu-75 Glu-101 Glu-122 Glu-129 Glu-135 Glu-142 3.32 ± 0.03 3.37 ± 0.02e,f 3.94 ± 0.01 4.00 ± 0.02 3.79 ± 0.01 3.82 ± 0.01 4.46 ± 0.01 2.05 ± 0.05d,e 6.12 ± 0.05 3.73 ± 0.02e ≤ 1.8 2.16 ± 0.03d 3.70 ± 0.01 3.68 ± 0.01 2.90 ± 0.01d,e 3.74 ± 0.03e 3.90 ± 0.02 3.76 ± 0.01 3.34 ± 0.02 3.33 ± 0.01e,f 3.91 ± 0.01 3.91 ± 0.02 
3.76 ± 0.01 3.74 ± 0.01 4.16 ± 0.02 0.01 ± 0.03 0.07 ± 0.03 0.13 ± 0.06 0.11 ± 0.05 0.04 ± 0.06 0.07 ± 0.05 -0.03 ± 0.02 -0.07 ± 0.07 -0.42 ± 0.05 -0.10 ± 0.05 0.00 ± 0.05 -0.11 ± 0.06 -0.18 ± 0.03 0.07 ± 0.05 -0.58 ± 0.04 -0.03 ± 0.05 0.00 ± 0.04 0.03 ± 0.02 0.03 ± 0.02 0.10 ± 0.06 0.02 ± 0.05 0.01 ± 0.06 -0.01 ± 0.05 -0.33 ± 0.03 0.94 ± 0.03 0.91 ± 0.03e,f 0.92 ± 0.01 0.90 ± 0.03 0.78 ± 0.01 0.89 ± 0.01 0.85 ± 0.01 0.63 ± 0.06d,e 0.81 ± 0.06 0.84 ± 0.03e 0.81 ± 0.03d 0.81 ± 0.01 0.77 ± 0.01 0.88 ± 0.02d,e 0.76 ± 0.07e 0.79 ± 0.02 1.05 ± 0.02 1.00 ± 0.03 0.93 ± 0.03e,f 0.91 ± 0.02 0.83 ± 0.02 0.76 ± 0.01 0.87 ± 0.01 0.75 ± 0.02

a pKa values and Hill coefficients obtained by fitting the modified Hill equation (Equation (2.2)) to the pH-dependence of the Cγ/Cδ chemical shift, unless otherwise indicated. Titrations were performed at 298 K and 100 mM KCl. Values reported are those from a single titration experiment with corresponding errors of fit, unless otherwise indicated.
b Change in pKa relative to ∆+PHS at 100 mM KCl: ∆pKa = pKa(variant) – pKa(∆+PHS)
c pKa values obtained using the data from Castañeda et al.¹ Except for Glu-73 and Glu-75, reported values are means and standard errors over 3 independent titration experiments. Reasons for discrepancies between that paper and the data presented here are explained in Chapter 2, Materials and Methods.
d pKa and Hill coefficient determined by fixing the amplitude (∆δ) of the transition to the ∆δ obtained from the fit for the same residue in ∆+PHS at 1 M KCl
e pKa and Hill coefficient obtained by fitting a two-site model (Equation (2.3)) to the pH-dependence of the Cγ/Cδ chemical shift. Only the values corresponding to the larger of the two transitions are reported.
f pKa and Hill coefficient determined by fixing the amplitude (∆δ) of the transition to the ∆δ obtained from the fit for the same residue in ∆+PHS at 0.1 M KCl

Table A.2.
pKa values for all carboxylic groups in ∆+PHS and ∆+PHS/D19N/D40N/E43Q measured at 1 M KCl

Protein                 Residue    pKaᵃ               ∆pKaᵇ            nᵃ
∆+PHSᶜ                  Asp-19     2.88 ± 0.02ᵈ       0.76 ± 0.05      0.83 ± 0.04ᵈ
                        Asp-21     6.02 ± 0.01ᵈ       -0.52 ± 0.01     0.94 ± 0.02ᵈ
                        Asp-40     4.28 ± 0.01        0.45 ± 0.05      0.81 ± 0.01
                        Asp-77     ≤ 2.1
                        Asp-83
                        Asp-95     2.71 ± 0.02        0.55 ± 0.04      0.90 ± 0.03
                        Asp-143    3.94 ± 0.01        0.13 ± 0.06      0.96 ± 0.01
                        Asp-146    3.93 ± 0.01        0.07 ± 0.03      0.92 ± 0.01
                        Glu-10     3.43 ± 0.01ᵈ       0.60 ± 0.05      1.01 ± 0.02ᵈ
                        Glu-43     4.40 ± 0.01        0.08 ± 0.03      0.81 ± 0.02
                        Glu-52     4.08 ± 0.02        0.15 ± 0.05      0.84 ± 0.02
                        Glu-57     3.90 ± 0.01        0.41 ± 0.05      0.98 ± 0.02
                        Glu-67     4.16 ± 0.003       0.4 ± 0.04       1.03 ± 0.01
                        Glu-73     3.80 ± 0.01        0.49 ± 0.01      0.91 ± 0.02
                        Glu-75     4.04 ± 0.01ᵈ,ᵉ     0.74 ± 0.02      1.00 ± 0.02ᵈ,ᵉ
                        Glu-101    4.41 ± 0.01        0.60 ± 0.06      0.89 ± 0.03
                        Glu-122    4.28 ± 0.04ᵈ       0.39 ± 0.06      0.76 ± 0.04ᵈ
                        Glu-129    4.32 ± 0.01        0.57 ± 0.06      0.83 ± 0.01
                        Glu-135    4.08 ± 0.01        0.33 ± 0.05      0.95 ± 0.01
                        Glu-142    4.45 ± 0.01        -0.04 ± 0.02     0.88 ± 0.01
∆+PHS/D19N/D40N/E43Q    Asp-19
                        Asp-21     5.01 ± 0.01        0.44 ± 0.01      0.95 ± 0.02
                        Asp-40
                        Asp-77
                        Asp-83
                        Asp-95     2.64 ± 0.01        0.43 ± 0.01      0.93 ± 0.01
                        Asp-143    3.87 ± 0.01        0.12 ± 0.01      0.90 ± 0.01
                        Asp-146    3.89 ± 0.01        0.13 ± 0.01      0.88 ± 0.01
                        Glu-10     3.36 ± 0.01        0.46 ± 0.02      0.95 ± 0.02
                        Glu-43
                        Glu-52     3.91 ± 0.01        0.36 ± 0.01      0.94 ± 0.02
                        Glu-57     3.87 ± 0.01        0.38 ± 0.01      0.97 ± 0.02
                        Glu-67     4.120 ± 0.004      0.42 ± 0.01      1.00 ± 0.01
                        Glu-73     3.73 ± 0.01        0.49 ± 0.01      0.88 ± 0.02
                        Glu-75     4.10 ± 0.12        0.81 ± 0.12      1.1 ± 0.2
                        Glu-101    4.40 ± 0.01        0.57 ± 0.01      0.90 ± 0.01
                        Glu-122    4.28 ± 0.02        0.37 ± 0.03      0.80 ± 0.02
                        Glu-129    4.26 ± 0.02        0.51 ± 0.03      0.83 ± 0.02
                        Glu-135    4.06 ± 0.01        0.32 ± 0.01      0.90 ± 0.01
                        Glu-142    4.49 ± 0.01        0.09 ± 0.01      0.94 ± 0.01

a pKa values and Hill coefficients obtained by fitting the modified Hill equation (Equation (2.2)) to the pH-dependence of the Cγ/Cδ chemical shift, unless otherwise indicated. Titrations were performed at 298 K and 1 M KCl. Values reported are those from a single titration experiment with corresponding errors of fit.
b Change in pKa relative to the same variant at 100 mM KCl: ∆pKa = pKa(1 M) – pKa(100 mM)
c pKa values obtained using the data from Castañeda et al.¹ Reasons for discrepancies between that paper and the data presented here are explained in Materials and Methods.
d pKa and Hill coefficient obtained by fitting a two-site model (Equation (2.3)) to the pH-dependence of the Cγ/Cδ chemical shift. Only the values corresponding to the larger of the two transitions are reported.
e pKa and Hill coefficient determined by fixing the amplitude (∆δ) of the transition to the ∆δ obtained from the fit for the same residue in ∆+PHS at 0.1 M KCl

Table A.3. X-Ray data collection and refinement statistics for ∆+PHS/D21N

PDB accession code                               3LX0
Data collection
  Space group                                    P2₁
  Cell dimensions
    a (Å)                                        30.75
    b (Å)                                        60.38
    c (Å)                                        34.53
    β (°)                                        97.76
  Wavelength (Å)                                 0.9795
  Temperature (K)                                100
  Resolutionᵃ (Å)                                27.2–1.45 (1.48-1.45)
  Rmergeᵃ,ᵇ (%)                                  6.7 (30.0)
  I/σ(I)ᵃ                                        16.4 (6.9)
  Redundancyᵃ                                    6.7 (5.8)
  No. unique reflectionsᵃ                        21,820 (1013)
  Completenessᵃ (%)                              98.3 (93.4)
  Wilson B-factor (Å²)                           24.0
Refinement
  Resolutionᵃ (Å)                                24.45–1.50 (1.54-1.50)
  No. reflectionsᵃ                               19,778 (1307)
  Rworkᵃ,ᶜ (%)                                   17.9 (21.9)
  Rfreeᵃ,ᶜ (%)                                   22.2 (25.2)
  No. molecules per asymmetric unit              1
  No. atoms
    Protein                                      1140
    Ligand                                       30
    Water                                        142
  Average B-factors (Å²)
    Protein atoms                                17.8
    Ligand                                       18.8
    Water                                        27.6
  R.M.S.D.
    Bond lengths (Å)                             0.02
    Bond angles (°)                              1.9
  Ramachandran plot
    No. in most favored regions (%)              105 (86.8)
    No. in additionally allowed regions (%)      15 (12.4)
    No. in generously allowed regions (%)        0 (0)
    No. in disallowed regions (%)                1 (0.8)
    Total No. Non-Gly, Non-Pro Residues          121

a The value in parentheses is for the highest resolution shell.
b $R_{\mathrm{merge}} = \sum_{hkl}\sum_{j} \lvert I_{hkl,j} - \langle I_{hkl}\rangle \rvert \,/\, \sum_{hkl}\sum_{j} I_{hkl,j}$, where $I_{hkl,j}$ represents the jth observation of the intensity of a unique set of indices hkl, and $\langle I_{hkl}\rangle$ is the mean intensity for this set of indices.
c $R_{\mathrm{work}} = \sum_{hkl} \bigl\lvert \lvert F_{\mathrm{obs}}\rvert - \lvert F_{\mathrm{calc}}\rvert \bigr\rvert \,/\, \sum_{hkl} \lvert F_{\mathrm{obs}}\rvert$, calculated using 95% of reflections, while Rfree reports the same calculation using the remaining 5% of reflections. Rfree was calculated using the same set of reflections used to calculate Rfree for the molecular replacement model.

Figure A.1. (a) Overlay of the structures of ∆+PHS (PDB accession code 3BDC,¹ white) and ∆+PHS/D21N (PDB ID 3LX0, green), showing the ionizable groups in the active site. (b) Same for NVIAGA/E75A (PDB ID 2RDF,² white) and ∆+PHS/D21N (green).

A.1 References

1. Castañeda, C.A., Fitch, C.A., Majumdar, A., Khangulov, V., Schlessman, J.L. & García-Moreno, B.E. (2009). Molecular determinants of the pKa values of Asp and Glu residues in staphylococcal nuclease. Proteins: Structure, Function, and Bioinformatics 77, 570–588
2. Baran, K.L., Chimenti, M.S., Schlessman, J.L., Fitch, C.A., Herbst, K.J. & García-Moreno, B. (2008). Electrostatic effects in a network of polar and ionizable groups in staphylococcal nuclease.
Journal of Molecular Biology 379, 1045–1062 146 Appendix B Supplementary information for Chapter 3, “Conformational Reorganization of the Backbone Influences the pKa Values of Ionizable Groups in Proteins” 147 Table B.1: pKa values of select Asp & Glu residues measured by NMR spectroscopy.a Protein Residue pKab ∆pKac Asp-19 2.12 ± 0.05f,g Asp-21 6.54 ± 0.01g g Asp-40 3.83 ± 0.05 Asp-77 ≤ 1.7e Asp-83 Asp-95 2.16 ± 0.04f Asp-143 3.81 ± 0.06 Asp-146 3.86 ± 0.03 f,g Glu-10 2.83 ± 0.05 Glu-43 4.32 ± 0.03 ∆+PHSd Glu-52 3.93 ± 0.05 Glu-57 3.49 ± 0.05 Glu-67 3.76 ± 0.04 Glu-73 3.31 ± 0.01 Glu-75 3.30 ± 0.02g Glu-101 3.81 ± 0.06 Glu-122 3.89 ± 0.05 Glu-129 3.75 ± 0.06 Glu-135 3.75 ± 0.05 Glu-142 4.49 ± 0.02 Asp-19 2.18 ± 0.01f,g 0.06 ± 0.05 Asp-21 6.52 ± 0.05g -0.02 ± 0.05 Asp-40 3.79 ± 0.02g -0.04 ± 0.05 Asp-77 ≤ 1.6e Asp-83 f Asp-95 2.15 ± 0.01 -0.01 ± 0.04 Asp-143 3.75 ± 0.01 -0.06 ± 0.06 Asp-146 3.76 ± 0.01 -0.10 ± 0.03 Glu-10 2.90 ± 0.01f,g 0.07 ± 0.05 Glu-43 4.23 ± 0.01 -0.09 ± 0.03 ∆+PHS/P11G Glu-52 3.85 ± 0.02 -0.08 ± 0.05 Glu-57 3.45 ± 0.01 -0.04 ± 0.05 Glu-67 3.71 ± 0.01 -0.05 ± 0.04 Glu-73 3.20 ± 0.02 -0.11 ± 0.02 g Glu-75 3.17 ± 0.05 -0.13 ± 0.05 Glu-101 3.84 ± 0.01 0.03 ± 0.06 Glu-122 3.89 ± 0.02 0.00 ± 0.05 Glu-129 3.73 ± 0.01 -0.02 ± 0.06 Glu-135 3.73 ± 0.01 -0.02 ± 0.05 Glu-142 4.40 ± 0.01 -0.09 ± 0.02 f,g Asp-19 2.31 ± 0.03 0.19 ± 0.06 ∆+PHS/A60G Asp-21 6.59 ± 0.01g 0.05 ± 0.01 g Asp-40 3.94 ± 0.01 0.11 ± 0.05 148 ∆+PHS/A69G ∆+PHS/M98G Asp-77 Asp-83 Asp-95 Asp-143 Asp-146 Glu-10 Glu-43 Glu-52 Glu-57 Glu-67 Glu-73 Glu-75 Glu-101 Glu-122 Glu-129 Glu-135 Glu-142 Asp-19 Asp-21 Asp-40 Asp-77 Asp-83 Asp-95 Asp-143 Asp-146 Glu-10 Glu-43 Glu-52 Glu-57 Glu-67 Glu-73 Glu-75 Glu-101 Glu-122 Glu-129 Glu-135 Glu-142 Asp-19 Asp-21 Asp-40 Asp-77 Asp-83 Asp-95 Asp-143 Asp-146 ≤ 1.7e 2.38 ± 0.01f 3.87 ± 0.01 3.86 ± 0.01 3.02 ± 0.01f,g 4.35 ± 0.01 3.94 ± 0.02 3.67 ± 0.01 3.85 ± 0.01 3.38 ± 0.02 3.45 ± 0.01f,g 3.98 ± 0.01 4.01 ± 0.02 3.92 ± 0.01 3.86 ± 0.01 4.48 ± 0.01 2.18 ± 
0.03f,g 6.52 ± 0.03g 3.79 ± 0.04g ≤ 1.6e 2.77 ± 0.02f 3.82 ± 0.02 3.86 ± 0.02 2.86 ± 0.01f,g 4.33 ± 0.02 3.98 ± 0.04 3.52 ± 0.02 3.79 ± 0.02 3.20 ± 0.05 3.27 ± 0.04g 3.77 ± 0.04 3.80 ± 0.05 3.66 ± 0.04 3.75 ± 0.02 4.51 ± 0.01 2.38 ± 0.08f,g 6.54 ± 0.02g 3.76 ± 0.07g ≤ 2.5e 2.25 ± 0.05f 3.74 ± 0.02 3.74 ± 0.01 149 0.22 ± 0.04 0.06 ± 0.06 0.00 ± 0.03 0.19 ± 0.05 0.03 ± 0.03 0.01 ± 0.05 0.18 ± 0.05 0.09 ± 0.04 0.07 ± 0.02 0.15 ± 0.02 0.17 ± 0.06 0.12 ± 0.05 0.17 ± 0.06 0.11 ± 0.05 -0.01 ± 0.02 0.06 ± 0.06 -0.02 ± 0.03 -0.04 ± 0.06 0.61 ± 0.04 0.01 ± 0.06 0.00 ± 0.04 0.03 ± 0.05 0.01 ± 0.04 0.05 ± 0.06 0.03 ± 0.05 0.03 ± 0.04 -0.11 ± 0.05 -0.03 ± 0.04 -0.04 ± 0.07 -0.09 ± 0.07 -0.09 ± 0.07 0.00 ± 0.05 0.02 ± 0.02 0.26 ± 0.09 0.00 ± 0.02 -0.07 ± 0.09 0.09 ± 0.06 -0.07 ± 0.06 -0.12 ± 0.03 ∆+PHS/A130G ∆+PHS/M98A Glu-10 Glu-43 Glu-52 Glu-57 Glu-67 Glu-73 Glu-75 Glu-101 Glu-122 Glu-129 Glu-135 Glu-142 Asp-19 Asp-21 Asp-40 Asp-77 Asp-83 Asp-95 Asp-143 Asp-146 Glu-10 Glu-43 Glu-52 Glu-57 Glu-67 Glu-73 Glu-75 Glu-101 Glu-122 Glu-129 Glu-135 Glu-142 Asp-19 Asp-21 Asp-40 Asp-77 Asp-83 Asp-95 Asp-143 Asp-146 Glu-10 Glu-43 Glu-52 Glu-57 Glu-67 2.94 ± 0.02f,g 4.24 + 0.02 3.80 ± 0.05 3.46 ± 0.07 3.73 ± 0.02 3.24 ± 0.01f 3.92 ± 0.05 3.32 ± 0.01f 3.86 ± 0.06 3.66 ± 0.08 3.83 ± 0.03 4.37 ± 0.01 2.37 ± 0.01f,g 6.56 ± 0.01g 3.96 ± 0.01g ≤ 1.6e 2.37 ± 0.01f 3.87 ± 0.01 3.85 ± 0.01 3.04 ± 0.01f,g 4.36 ± 0.01 3.96 ± 0.01 3.61 ± 0.01 3.85 ± 0.01 3.39 ± 0.02 3.50 ± 0.03g 4.02 ± 0.01 4.05 ± 0.02 3.86 ± 0.02 3.83 ± 0.02 4.49 ± 0.01 2.32 ± 0.07f,g 6.55 ± 0.02g 3.71 ± 0.08g ≤ 2.7e 2.13 ± 0.06f 3.77 ± 0.02 3.80 ± 0.03 2.81 ± 0.01f,g 4.31 ± 0.03 3.94 ± 0.03 3.43 ± 0.05 3.68 ± 0.04 150 0.11 ± 0.05 -0.08 ± 0.04 -0.13 ± 0.07 -0.03 ± 0.09 -0.03 ± 0.04 -0.07 ± 0.01 0.62 ± 0.05 -0.49 ± 0.06 -0.03 ± 0.08 -0.09 ± 0.10 0.08 ± 0.06 -0.12 ± 0.02 0.25 ± 0.05 0.02 ± 0.01 0.13 ± 0.05 0.21 ± 0.04 0.06 ± 0.06 -0.01 ± 0.03 0.21 ± 0.05 0.04 ± 0.03 0.03 ± 0.05 0.12 ± 0.05 0.09 ± 0.04 0.08 ± 0.02 0.20 ± 0.04 0.21 ± 
0.06 0.16 ± 0.05 0.11 ± 0.06 0.08 ± 0.05 0.00 ± 0.02 0.2 ± 0.09 0.01 ± 0.02 -0.12 ± 0.09 -0.03 ± 0.07 -0.04 ± 0.06 -0.06 ± 0.04 -0.02 ± 0.05 -0.01 ± 0.04 0.01 ± 0.06 -0.06 ± 0.07 -0.08 ± 0.06 Glu-73 Glu-75 Glu-101 Glu-122 Glu-129 Glu-135 Glu-142 Asp-19 Asp-21 Asp-40 Asp-77 Asp-83 Asp-95 Asp-143 Asp-146 Glu-10 Glu-43 ∆+PHS/A58G/A60G Glu-52 Glu-57 Glu-67 Glu-73 Glu-75 Glu-101 Glu-122 Glu-129 Glu-135 Glu-142 Asp-19 Asp-21 Asp-40 Asp-77 Asp-83 Asp-95 Asp-143 Asp-146 Glu-10 ∆+PHS/A128G/A130G Glu-43 Glu-52 Glu-57 Glu-67 Glu-73 Glu-75 Glu-101 Glu-122 Glu-129 3.11 ± 0.01f 3.91 ± 0.16f 3.27 ± 0.02f 3.93 ± 0.06 3.75 ± 0.10 3.86 ± 0.03 4.43 ± 0.01 2.18 ± 0.11f,g 6.33 ± 0.01g 3.88 ± 0.05g ≤ 1.8e 1.93 ± 0.05f 3.92 ± 0.03 3.99 ± 0.03 2.80 ± 0.01f,g 4.36 ± 0.01 4.02 ± 0.01f 3.61 ± 0.01f 3.78 ± 0.02 3.20 ± 0.01f 3.19 ± 0.01f,g 3.83 ± 0.05 3.73 ± 0.08 3.84 ± 0.05 3.81 ± 0.03 4.48 ± 0.01 2.22 ± 0.08f,g 6.62 ± 0.02g 3.93 ± 0.03g ≤ 1.8e 2.13 ± 0.04f 3.93 ± 0.03 4.00 ± 0.02 2.88 ± 0.01f,g 4.36 ± 0.01 4.09 ± 0.01 3.67 ± 0.02 3.84 ± 0.02 3.28 ± 0.01f 3.31 ± 0.02f,g 3.97 ± 0.02h 3.89 ± 0.07 3.69 ± 0.05 151 -0.20 ± 0.01 0.60 ± 0.16 -0.54 ± 0.06 0.04 ± 0.07 0.00 ± 0.12 0.11 ± 0.06 -0.06 ± 0.02 0.06 ± 0.12 -0.21 ± 0.01 0.05 ± 0.07 -0.23 ±0.06 -0.11 ± 0.07 0.13 ± 0.04 -0.03 ± 0.05 0.04 ± 0.03 0.09 ± 0.05 0.12 ± 0.05 0.02 ± 0.04 -0.11 ± 0.01 -0.11 ± 0.02 0.02 ± 0.08 -0.16 ± 0.09 0.09 ± 0.08 0.06 ± 0.06 -0.01 ± 0.02 0.10 ± 0.09 0.08 ± 0.02 0.10 ± 0.06 -0.03 ± 0.06 0.12 ± 0.07 0.14 ± 0.04 0.05 ± 0.05 0.04 ± 0.03 0.16 ± 0.05 0.18 ± 0.05 0.08 ± 0.04 -0.03 ± 0.01 0.00 ± 0.03 0.16 ± 0.06 0.00 ± 0.09 -0.06 ± 0.08 Glu-135 4.08 ± 0.02 0.33 ± 0.05 Glu-142 4.52 ± 0.01 0.03 ± 0.02 a Measurements were performed at 298 K and 100 mM KCl. b pKa values were obtained by fitting a single-site modified Hill equation to the data, unless otherwise indicated. Values reported are from a single titration experiment with corresponding goodness of fit, unless otherwise indicated. 
c Change in pKa relative to ∆+PHS
d pKa values for ∆+PHS are means & standard errors over 3 independent titration experiments, using the data from Castañeda et al.¹
e Upper limit for Asp-77 pKa obtained by fitting the data to a two-site modified Hill equation with a fixed Hill coefficient of 1 and a fixed ∆δ of 1.85 ppm for the low-pH transition
f Fit performed by fixing the amplitude of the titration (∆δ) to the value obtained from the titration of the same residue in ∆+PHS at 1 M KCl¹ or at 100 mM KCl (for Glu-73 & Glu-75)
g pKa values obtained by fitting a two-site modified Hill equation to the data. Only values corresponding to the larger of the two transitions are reported.
h Fit performed by fixing ∆δ to the largest value obtained for titration of other Glu residues (4.45)

Table B.2: Crystallographic statistics for ∆+PHS/M98G and ∆+PHS/A69G

Variant                   ∆+PHS/M98G               ∆+PHS/M98G               ∆+PHS/A69G               ∆+PHS/A69G
Ca2+ & pdTp present?      Yes                      No                       Yes                      No
PDB accession code        3S9W                     3SK8                     3SR1                     3T13
Data collection:
Space Group               P2₁                      P4₁                      P2₁                      P2₁
Unit cell dimensions:
  a (Å)                   31.21                    48.37                    31.22                    46.6
  b (Å)                   60.53                    48.37                    60.65                    63.76
  c (Å)                   38.35                    63.45                    38.09                    49.87
  β (°)                   93.13                    90                       93.69                    91.92
Wilson B-factor           30.3                     34.3                     22.6                     27.6
Resolution (Å)            50.0-1.9 (1.93-1.90)ᵃ    50.0-1.9 (1.93-1.9)      50-1.45 (1.48-1.45)      50-1.80 (1.83-1.80)
Completeness (%)          98.9 (90.4)              99.7 (99.3)              99.7 (99.7)              99.9 (99.3)
Rmerge (%)                5.3 (24.7)               7.3 (28.3)               4.9 (27.9)               8.5 (25.9)
I/σ(I)                    14.9 (5.9)               13.3 (10.2)              17.7 (6.4)               10.5 (7.2)
Redundancy                3.9 (3.5)                13.5 (13.1)              7.1 (6.2)                7.3 (6.9)
# unique reflections      11311 (519)              11646 (581)              25258 (1230)             27161 (1331)

Refinement: # reflections # reflections in Rfree set Resolution (Å) Rwork (%) Rfree (%) # molecules/asymmetric unit # atoms: Protein Water 11284 (816) 538 (43) 38.3-1.90 (1.94-1.90) 17.12 (21.10) 21.15 (24.40) 1 11629 (857) 555 (52) 48.37-1.90 (1.95-1.90) 16.5 (18.1) 21.0 (24.9) 1 25078 (1860) 2487 (195) 30.33-1.45 (1.49-1.45) 16.4 (22.6) 20.2 (26.1) 1 27144 (1961) 1363 (99) 49.85-1.80 (1.84-1.80) 15.6 (20.4) 19.9 (25.5)
2 1029 109 1029 96 1032 153 2073 266 153
Ligand 25 18
Ions 1 0
Average B-factors:
  Protein atoms 26.7 30
  Water 20.1 37
  Ligand 19.6 31.7
RMSD from ideal:
  Bond Lengths (Å) 0.019 0.02
  Bond Angles (°) 1.642 1.704
# of TLS groups 5 7
Molprobity validation:
  Rotamer outliers (%) 0 1.89
  Ramachandran outliers (%) 0 0
  Ramachandran favored (%) 95.2 97.64
48 1 17.5 28.3 11.8 19.6 31 26.3 0.019 1.861 7 0.018 1.623 9 (2/7) 0 0 95.28 1.86 0 97.65
25 1

a Values in parentheses refer to the highest-resolution shell.
b $R_{merge} = \sum_{hkl}\sum_{i=1}^{n} |I_i(hkl) - \bar{I}(hkl)| \,/\, \sum_{hkl}\sum_{i=1}^{n} \bar{I}(hkl)$
c $R_{work} = \sum_{hkl} |F_o(hkl) - F_c(hkl)| \,/\, \sum_{hkl} F_o(hkl)$, calculated using the 95% of reflections used in model building, while Rfree reports the same value calculated using the remaining 5% of reflections. For A69G with Ca2+ and pdTp, Rfree was calculated using the same set of reflections used to calculate Rfree for the molecular replacement model; otherwise the Rfree set was chosen randomly.

Table B.3: RMSD of Gly variant crystal structures relative to ∆+PHS

PDB ID          Cα RMSD   All-atom RMSD
3S9W            0.116     0.162
3SR1            0.123     0.165
3SK8            0.283     0.357
3T13, chain A   0.271     0.368
3T13, chain B   0.300     0.387

Table B.4: Hydrogen exchange rates measured in ∆+PHS and Gly variants.^a

         ∆+PHS              ∆+PHS/A69G                           ∆+PHS/M98G
Residue  ∆Gex (kcal/mol)^b  ∆Gex (kcal/mol)^b  ∆∆Gex^c           ∆Gex (kcal/mol)^b  ∆∆Gex^c
10       4.25 ± 0.02        4.35 ± 0.02        -0.1 ± 0.03       3.94 ± 0.02        0.31 ± 0.03
12       5.88 ± 0.02        5.78 ± 0.04        0.1 ± 0.04        5.312 ± 0.003      0.56 ± 0.02
13       5.468 ± 0.007      5.431 ± 0.005      0.038 ± 0.008     5.273 ± 0.003      0.195 ± 0.007
14       1.82 ± 0.03        1.94 ± 0.02        -0.12 ± 0.03      1.83 ± 0.1         -0.01 ± 0.11
15       > 6.1^d            > 5.7                                6.92 ± 0.15        > -0.8
16       6.73 ± 0.02        6.55 ± 0.03        0.19 ± 0.04       6.216 ± 0.005      0.52 ± 0.02
18       4.92 ± 0.011       4.831 ± 0.008      0.089 ± 0.013     4.833 ± 0.009      0.087 ± 0.014
19       5.448 ± 0.004      5.392 ± 0.007      0.056 ± 0.008     5.367 ± 0.004      0.082 ± 0.006
21       7.503 ± 0.006      7.362 ± 0.007      0.14 ± 0.009      7.274 ± 0.004      0.228 ± 0.007
22       > 8.1              > 7.7                                > 8.2
23       7.19 ± 0.09        > 6.3              < 0.9             > 6.8              < 0.4
24       > 7.1              > 6.8                                > 7.2
25       > 6.8              > 6.4                                > 6.8
26       > 7.1              > 6.7                                > 7.2
27       5.81 ± 0.008       5.75 ± 0.012       0.06 ± 0.015      5.844 ± 0.006      -0.035 ± 0.01
30       6.347 ± 0.008      6.328 ± 0.008      0.019 ± 0.011     6.432 ± 0.004      -0.085 ± 0.009
32       7.05 ± 0.04        7.29 ± 0.1         -0.24 ± 0.11      7.09 ± 0.04        -0.04 ± 0.05
34       7.99 ± 0.14        > 7                < 1               > 7.4              < 0.6
35       > 7.6              > 7.2                                > 7.7
36       7.32 ± 0.06        > 6.5              < 0.8             > 7                < 0.3
37       > 6.3              > 5.9                                > 6.4
38       2.125 ± 0.004      2.094 ± 0.009      0.03 ± 0.01       2.254 ± 0.008      -0.13 ± 0.009
39       > 6.1              > 5.8                                > 6.2
40       5.241 ± 0.003      5.187 ± 0.006      0.054 ± 0.007     4.923 ± 0.004      0.317 ± 0.005
51       < 3.2^e            3.04 ± 0.21        < 0.2             < 3.24
52       3.14 ± 0.03        3.07 ± 0.02        0.07 ± 0.04       3.27 ± 0.04        -0.13 ± 0.05
54       < 2.2              < 2.2                                < 2.29
55       6.828 ± 0.013      6.84 ± 0.02        -0.01 ± 0.02      6.944 ± 0.014      -0.12 ± 0.02
57       < 2                < 1.9                                < 2.01
58       7.05 ± 0.04        > 6.8              < 0.2             7.43 ± 0.06        -0.38 ± 0.07
59       7.27 ± 0.03        7.31 ± 0.12        -0.04 ± 0.12      7.509 ± 0.011      -0.23 ± 0.03
60       7.6 ± 0.02         7.9 ± 0.09         -0.3 ± 0.1        7.82 ± 0.02        -0.21 ± 0.03
61       7.7 ± 0.1          > 6.7              < 1               > 7.1              < 0.5
62       7.82 ± 0.1         > 7                < 0.8             > 7.5              < 0.4
63       8 ± 0.07           > 7.2              < 0.8             8.42 ± 0.07        -0.42 ± 0.1
64       8.05 ± 0.08        > 7.1              < 0.9             8.73 ± 0.1         -0.68 ± 0.13
65       7.79 ± 0.05        > 7.2              < 0.6             > 7.6              < 0.2
66       6.94 ± 0.13        > 6.2              < 0.7             > 6.7              < 0.3
67       > 7                > 6.7                                > 7.1
68       3.108 ± 0.014      < 2.8              > 0.3             2.99 ± 0.03        0.12 ± 0.03
69       4.732 ± 0.012      < 3.2              > 1.5             4.658 ± 0.01       0.07 ± 0.02
70       < 2.4              < 2.6                                < 2.44
71       4.386 ± 0.01       3.488 ± 0.006      0.898 ± 0.011     4.385 ± 0.008      0.001 ± 0.012
72       1.81 ± 0.02        1.69 ± 0.02        0.12 ± 0.03       1.756 ± 0.014      0.05 ± 0.02
73       > 6.9              > 6.5                                > 7
74       > 6.2              > 5.9                                > 6.3
75       > 7                > 6.7                                > 7.1
76       3.61 ± 0.009       3.685 ± 0.006      -0.076 ± 0.01     < 1.96             > 1.7
77       5.146 ± 0.006      5.159 ± 0.008      -0.013 ± 0.01     3.493 ± 0.01       1.654 ± 0.012
78       2.891 ± 0.009      2.879 ± 0.004      0.012 ± 0.01      < 2.19             > 0.7
82       3.083 ± 0.005      3.073 ± 0.006      0.01 ± 0.008      2.609 ± 0.012      0.474 ± 0.013
83       4.87 ± 0.02        4.85 ± 0.02        0.02 ± 0.03       3.658 ± 0.014      1.21 ± 0.02
85       3.38 ± 0.02        3.363 ± 0.013      0.02 ± 0.02       3.088 ± 0.012      0.29 ± 0.02
86       3.12 ± 0.03        3.06 ± 0.02        0.07 ± 0.04       3.112 ± 0.01       0.01 ± 0.03
87       5.71 ± 0.02        5.7 ± 0.03         0.01 ± 0.03       4.579 ± 0.015      1.13 ± 0.02
88       6.248 ± 0.007      6.239 ± 0.009      0.009 ± 0.011     4.999 ± 0.007      1.249 ± 0.01
89       6.22 ± 0.02        6.41 ± 0.05        -0.19 ± 0.05      5.182 ± 0.006      1.04 ± 0.02
90       7.55 ± 0.07        > 6.7              < 0.8             > 7.2              < 0.4
91       6.59 ± 0.11        6.63 ± 0.12        -0.05 ± 0.16      6.59 ± 0.08        0 ± 0.13
92       > 6.5              > 6.1                                6.04 ± 0.05        > 0.4
93       7.43 ± 0.14        > 6.3              < 1.1             > 6.8              < 0.6
94       > 7.5              > 7.1                                > 7.5
95       4.174 ± 0.008      3.009 ± 0.014      1.17 ± 0.02       3.877 ± 0.005      0.298 ± 0.01
97       6.833 ± 0.01       7.26 ± 0.03        -0.42 ± 0.03      7.118 ± 0.01       -0.286 ± 0.014
99       > 6.6              > 6.2                                > 6.8
100      8.35 ± 0.11        > 7.5              < 0.9             6.388 ± 0.005      1.96 ± 0.11
101      > 7.7              > 7.3                                5.467 ± 0.006      > 2.2
102      7.44 ± 0.06        > 6.8              < 0.6             > 7.3              < 0.2
103      6.86 ± 0.07        > 6.2              < 0.6             > 6.7              < 0.2
104      6.74 ± 0.15        > 5.8              < 1               > 6.2              < 0.5
105      7.63 ± 0.06        > 6.9              < 0.7             > 7.4              < 0.2
106      > 7.8              > 7.4                                > 7.9
107      > 8                > 7.7                                > 8.1
108      > 6.8              > 6.5                                > 6.9
109      7.66 ± 0.07        > 6.7              < 0.9             > 7.2              < 0.5
110      > 7.3              > 7                                  > 7.4
111      4.447 ± 0.01       4.409 ± 0.011      0.038 ± 0.015     4.495 ± 0.007      -0.048 ± 0.012
112      3.835 ± 0.008      3.77 ± 0.005       0.065 ± 0.009     3.769 ± 0.005      0.066 ± 0.009
122      4.082 ± 0.008      4.037 ± 0.012      0.045 ± 0.014     4.05 ± 0.03        0.03 ± 0.03
125      6.55 ± 0.05        > 5.9              < 0.6             4.967 ± 0.008      1.59 ± 0.05
126      > 7.2              > 6.8                                6.6 ± 0.04         > 0.6
127      5.236 ± 0.003      5.232 ± 0.003      0.003 ± 0.004     5.067 ± 0.01       0.168 ± 0.011
128      7 ± 0.06           7.47 ± 0.12        -0.47 ± 0.13      6.79 ± 0.04        0.21 ± 0.07
129      > 7.2              > 6.9                                > 7.3
130      > 7.2              > 6.8                                > 7.3
131      5.386 ± 0.003      5.351 ± 0.003      0.035 ± 0.005     5.341 ± 0.004      0.045 ± 0.005
132      > 7.7              > 7.3                                > 7.7
133      > 7.3              > 7                                  > 7.4
134      7.29 ± 0.04        > 7.1              < 0.2             7.58 ± 0.03        -0.29 ± 0.05
135      5.871 ± 0.007      5.823 ± 0.007      0.049 ± 0.01      5.962 ± 0.007      -0.09 ± 0.01
136      6.26 ± 0.011       6.34 ± 0.02        -0.08 ± 0.02      6.317 ± 0.008      -0.058 ± 0.014
137      7.22 ± 0.12        > 6.4              < 0.8             > 6.8              < 0.4
138      3.301 ± 0.013      3.293 ± 0.013      0.01 ± 0.02       3.331 ± 0.014      -0.03 ± 0.02
139      7.05 ± 0.09        > 6.5              < 0.6             7.44 ± 0.11        -0.4 ± 0.15
140      > 6.5              > 6.1                                > 6.6
141      6.397 ± 0.008      6.371 ± 0.005      0.026 ± 0.009     6.45 ± 0.003       -0.054 ± 0.008

a Measurements were performed at 298 K, 100 mM KCl, pH* 5.05-5.15.
b Free energy of exchange (∆Gex) was calculated as RT ln(kint/kex). Values for kint were calculated based on sequence and experimental conditions as described in Materials and Methods.
c Difference in ∆Gex between ∆+PHS and the Gly variant (∆Gex of ∆+PHS minus ∆Gex of the variant).
d Values of ∆Gex for residues exhibiting less than 20% exchange are given as lower limits, as described in Materials and Methods.
e Values of ∆Gex for residues that exchange within the dead time of the experiment are given as upper limits, as described in Materials and Methods.

Figure B.1: Shifts in the pKa values of Asp and Glu residues in variants M98G, M98A, A58G/A60G, and A128G/A130G relative to ∆+PHS.

B.1 References

1. Castañeda, C.A., Fitch, C.A., Majumdar, A., Khangulov, V., Schlessman, J.L. & García-Moreno, B.E. (2009). Molecular determinants of the pKa values of Asp and Glu residues in staphylococcal nuclease. Proteins: Structure, Function, and Bioinformatics 77, 570–588.

Vita

Brian M. Doctrow was born on October 10, 1983 in Baltimore, Maryland, where he lived until going to college. From the time he could first read, he had a passion for learning and a natural curiosity about the world around him. His interest in science matured in high school, thanks in large part to the enthusiasm of his tenth-grade biology teacher. In the fall of 2002, Brian entered Rice University in Houston, Texas, where he majored in physics with a biophysics concentration. He graduated with a B.S. in physics in May of 2006. Following graduation, he spent a year as a post-baccalaureate research fellow at the National Institute on Alcohol Abuse and Alcoholism in Rockville, Maryland. There he worked with Drake Mitchell, studying how the properties of rhodopsin are affected by different methods of sample preparation. Brian chose to study biophysics to combine his interest in the quantitative aspects of physics with the problems of chemistry and biology.
He enrolled in the graduate Program in Molecular Biophysics at Johns Hopkins University, matriculating in August 2007. Under the guidance of his advisor, Bertrand García-Moreno, Brian studied electrostatic effects in proteins from both experimental and computational perspectives. After receiving his Ph.D., Brian hopes to become a science journalist and writer to help raise understanding and awareness of science in the general public.