Structure and Determinants of the Alpha-Lactalbumin Molten Globule

by Lawren Chialun Wu

Submitted to the Department of Biology in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY in Biology at the Massachusetts Institute of Technology

February 1998

© 1998 Massachusetts Institute of Technology. All rights reserved.

Signature of Author: Department of Biology, November 27, 1997
Certified by: Peter S. Kim, Professor, Department of Biology, Thesis Supervisor
Accepted by: Frank Solomon, Chairman, Biology Graduate Committee

Dedication

To Mom and Dad, Thank you for encouraging my interest and pursuit of science and for providing me with endless love and support.

Acknowledgments

I am fortunate to have known and been influenced by many people whose scientific expertise and outlooks on life have shaped and inspired my own pursuits. First, I would like to thank those teachers who played a major role early in my career. Thomas Twilley and Ron Klopfer, my high school biology teachers, introduced me to the world of biology and made science classes exciting and interesting. In college, Professor Clarence Schutt introduced me to the protein folding problem, an area of research which I immediately found fascinating. I am indebted to Professor Jannette Carey, my undergraduate research adviser, in whose lab I first tinkered with protein folding and who has given me invaluable guidance and support. I would also like to thank Professor Heinrich Roder, with whom I developed a fruitful collaboration during my time in the Carey lab. I am glad to have known many graduate students and postdocs, each of whom has left a mark on me. Teresa Lavoie, Dale Lewis, Mitch Gittelman, and Maria Luisa Tasayco were great teachers and colleagues during my time in the Carey lab. In graduate school, I have been surrounded by a large group of dynamic scientists.
I thank Jamie McKnight and Peter Petillo for helping me get acquainted with the Kim lab and keeping me from getting lost. Steve Blacklow, Chave Carr, Andrea Cochran, David Lockhart, Kevin Lumb, and Masaru Ueno were helpful resources who made lab life fun. Pehr Harbury and Dan Minor were fountains of protein folding knowledge and offered thoughtful criticism and advice. Yutaka Kuroda and Yoshi Hagihara have also been good protein folding colleagues. I enjoyed the company of Rachel Fennell-Fezzie and Christine Sonu, two undergraduates with whom I worked. I am grateful for the generosity, honesty, and companionship of my classmate Debbie Fass. I would like to acknowledge David Akey, David Chan, Debbie Ehrgott, John Newman, Michael Root, Brian Schneider, and Ethan Wolf, who joined the Kim lab towards the end of my stay. I appreciate their advice, support, patience, and humor. I am grateful for the support and help of several technicians, Mike Burgess, Mike Milhollen, James Pang, and Rheba Rutkowski, as well as administrative assistants Susan Doka and Pam Dvorak. They are the backbone of the Kim lab and have made my graduate years go by smoothly. I am especially indebted to three colleagues from the Kim lab, Zheng-yu Peng, Brenda Schulman, and Martha Oakley. Peng introduced me to α-lactalbumin and was a valuable and gracious mentor, allowing me to share in the system that he developed. Brenda also worked on α-lactalbumin, forming a small lab subgroup with Peng and myself. I benefitted greatly from Peng's and Brenda's experience, insight, and countless discussions. In addition, I am grateful for Brenda's generosity and kindness. Martha Oakley was my baymate for much of my time in the Kim lab. Her infinite support, advice, and humor are greatly appreciated, and I will forever value her friendship. I would like to thank the members of my thesis committee, Professors Bob Sauer and Jamie Williamson, for thinking critically about my work and offering advice and direction.
I also thank Professor Michael Hecht for being an outside reader and member of my thesis committee. Professor Peter Kim has been my graduate school adviser and mentor. I am particularly grateful for his support and guidance. Peter has put together a vibrant group from which I have benefited greatly. I appreciate his constructive criticism, thoroughness, and talent for identifying key issues and providing penetrating insight. I am indebted to him for taking me into his lab, guiding me through thick and thin, and providing me with unwavering support. Finally, I would like to thank Cindy Bogden, who has been a great friend and dependable companion. She has listened, cheered, sacrificed, and encouraged, and I am lucky to share a relationship with her.

Structure and Determinants of the Alpha-Lactalbumin Molten Globule

by Lawren C. Wu

Submitted to the Department of Biology on November 27, 1997 in partial fulfillment of the requirements for the Degree of Doctor of Philosophy in Biology

Abstract

The structure and determinants of the molten globule folding intermediate of α-lactalbumin (α-LA) are investigated using a combination of site-directed mutagenesis, disulfide chemistry, and spectroscopy. The α-LA molten globule folding intermediate may best be described as a bipartite structure consisting of a molten globule helical domain containing helices arranged in a native-like topology and a largely unstructured beta sheet domain (Chapter 2). Analysis of thiol-disulfide effective concentrations for native and non-native cysteine pairs in the helical domain of the α-LA molten globule reveals local preferences for native-like structure (Chapter 3). Analysis of hydrophobic core mutants defines a stabilizing subdomain containing specific native-like sidechain interactions (Chapter 4).
Finally, simultaneous mutation of all hydrophobic sidechains to a representative hydrophobic residue, leucine, in a recombinant model of the isolated helical domain suggests that specific hydrophobic sidechain diversity may not be necessary to establish the native-like fold of the α-LA molten globule (Chapter 5). The appendix describes studies aimed at clarifying the requirements for forming fixed tertiary interactions from the molten globule state. It is shown that acquisition of specific tertiary interactions requires calcium-dependent organization of the beta sheet domain and the interdomain regions of α-LA.

Thesis Supervisor: Peter S. Kim
Title: Professor of Biology

Table of Contents

Dedication
Acknowledgments
Abstract
Table of Contents

Chapter 1
Introduction: Protein Folding and Molten Globule Structures

Chapter 2
Chapter 2 has been published as L. C. Wu, Z.-y. Peng, and P. S. Kim, "Bipartite Structure of the α-Lactalbumin Molten Globule." Nature Structural Biology 2: 281-286 (1995). © Macmillan Magazines Ltd.

Chapter 3
Chapter 3 has been published as Z.-y. Peng, L. C. Wu, and P. S. Kim, "Local Structural Preferences in the α-Lactalbumin Molten Globule." Biochemistry 34: 3248-3252 (1995). © American Chemical Society.

Chapter 4
A Specific Hydrophobic Core in the α-Lactalbumin Molten Globule

Chapter 5
Hydrophobic Sequence Minimization of the α-Lactalbumin Molten Globule

Appendix
The appendix has been published as L. C. Wu, B. A. Schulman, Z.-y. Peng, and P. S. Kim, "Disulfide Determinants of Calcium-Induced Packing in α-Lactalbumin." Biochemistry 35: 859-863 (1996). © American Chemical Society.

Biographical Note

CHAPTER 1

Introduction: Protein Folding and Molten Globule Structures

The process by which a linear chain of amino acids adopts a specific three-dimensional structure is known as protein folding and remains a major unsolved problem in biology.
Understanding protein folding is complicated by the multiple factors, such as van der Waals interactions, the hydrophobic effect, hydrogen bonding, and electrostatic interactions, that each contribute energetically to the folding process (1). Moreover, these stabilizing forces yield a net difference in the free energy of folding that is small in comparison to the total energies favoring and opposing folding. Furthermore, folding occurs quickly (milliseconds to hours) compared to the time it would take even a small protein to randomly search the extraordinarily large number of conformations accessible to the polypeptide chain (2, 3). The discrepancy between the timescale required for a random search of all conformations and the observed timescale of protein folding has led to the belief that proteins must fold via distinct pathways and intermediate structures. These pathways and intermediates reduce the conformational space to be searched and guide the polypeptide chain towards the native state (4). Much effort has therefore been invested in isolating and characterizing intermediates in the folding process, in the hope of elucidating the rules of protein folding. Schemes for identifying folding intermediates include analysis of unfolding and refolding kinetics to identify distinct folding phases associated with intermediates, and subjection of proteins to conditions that promote formation of equilibrium intermediates. Analysis of folding pathways and intermediates by experimental and theoretical approaches has led to several hypotheses of protein folding which place varying degrees of importance on global and local interactions early in the folding process. The framework hypothesis suggests that folding is driven by local interactions. Short stretches of secondary structure are thought to form early and subsequently assemble hierarchically into subdomains and whole domains (4, 5). 
An alternate theory, which stresses the early formation of global interactions, postulates that non-specific hydrophobic collapse restricts conformational space and induces secondary structure formation and assembly (6). A third hypothesis, termed nucleation condensation, envisions a small number of critical residues in disparate parts of the polypeptide chain that must come together to form a specific folding nucleus that drives structural organization (7). This hypothesis also emphasizes early formation of non-local interactions, but envisions a specific folding nucleus, as opposed to non-specific collapse.

Cooperatively Formed Subdomains

A number of small proteins have been used as model systems to study protein folding. Study of the oxidative refolding of bovine pancreatic trypsin inhibitor (BPTI), a small single domain protein containing three disulfide bonds, led to the identification of several intermediates containing subsets of the native disulfide bonds and structured subdomains (8, 9, 10). These subdomains are native-like, cooperatively folded, and contain extensive tertiary interactions and well-packed hydrophobic interfaces. Moreover, the unstructured regions can be eliminated or replaced with generic polyglycine linkers (11). Interestingly, the BPTI intermediates are kinetically trapped species, since continued oxidative folding appears to require unfolding of the stable subdomain (12).

Molten Globules

Another type of intermediate, first observed in the protein α-lactalbumin (α-LA), differs from well-packed subdomains in that it lacks tightly packed, cooperatively formed structure. This intermediate, termed the molten globule, has been identified in a large number of proteins and is thought to be a general folding intermediate (13, 14). Molten globules are characterized by several distinct traits. First, they are compact collapsed structures, not extended chains.
Physical measurements indicate that the radius of gyration of a molten globule is expanded by 20-30% compared to that of the native protein, while that of an unfolded polypeptide is expanded by more than 100% (13, 14). Second, molten globules have high amounts of secondary structure. Moreover, this secondary structure is native-like, as assessed by hydrogen exchange NMR. Third, molten globules lack extensive fixed tertiary interactions, appearing instead to be highly dynamic, fluctuating species. This is reflected in the lack of significant near-UV circular dichroism (CD) signal and in the broad linewidths and poor chemical shift dispersion observed in NMR spectra. Finally, molten globules are folded noncooperatively, as judged by denaturation transitions and, recently, by proline scanning mutagenesis (15).

A number of questions regarding the structure and determinants of the molten globule are addressed in this thesis. First, it is important to determine the overall fold of the polypeptide in the molten globule state. One can imagine two extreme possibilities. The molten globule could be either a randomly collapsed structural ensemble or a specific structure with a defined native-like fold. These two possibilities have very different implications for the role of the molten globule in protein folding. A second important issue is to understand the extent of specific sidechain interactions that are present in the molten globule and their role in determining molten globule structure. Since the molten globule is a highly dynamic species, it is unclear whether specific sidechain interactions are present or play a necessary structural role. Finally, it is important to clarify how a unique native structure is acquired from the molten globule state.

Model Systems for the Study of Molten Globules

The molten globule forms of five proteins have been characterized extensively and have contributed significantly to the understanding of molten globule species.
These proteins, cytochrome c, apomyoglobin, staphylococcal nuclease, ribonuclease H, and α-LA/lysozyme, will each be discussed briefly.

Cytochrome c

Equine cytochrome c is a 104 residue single domain protein, containing a covalently attached heme group and three major helices. The three major helices are all present in the equilibrium molten globule of cytochrome c, formed at low pH (16, 17). Electrostatic repulsion may influence the stability of the cytochrome c molten globule. Chemical modification of positively charged lysine residues and studies of the salt dependence of molten globule formation suggest that electrostatic repulsion must be screened in order for stable molten globule structure to be formed (18, 19). Site-directed mutagenesis at the packing interface of the N- and C-terminal helices suggests that specific tertiary interactions stabilize the cytochrome c molten globule (20, 21). Analysis of folding and unfolding rate constants is consistent with the molten globule being an on-pathway folding intermediate (21, 22). Moreover, mutations that destabilize the molten globule lead to slower folding rates. Interestingly, hydrogen exchange NMR studies of the kinetics of cytochrome c folding indicate that the terminal helices fold and dock concomitantly early in the folding process, forming a structure that shares some similarities to that of the equilibrium molten globule (17, 23). It should be noted, however, that this early kinetic intermediate has also been proposed to be an off-pathway species resulting from misfolding associated with improper heme ligation (24). Thus, the precise relationship between the kinetic folding intermediate and the equilibrium molten globule of cytochrome c is still in debate.

Apomyoglobin

Apomyoglobin is a helical single domain protein formed upon removal of heme from myoglobin.
Apomyoglobin can form two molten globule forms, one at low pH and low salt ("low pH form"), and another at low pH and moderate concentrations of trichloroacetate ("trichloroacetate form"). The low pH molten globule of apomyoglobin has been studied extensively and consists of a subdomain comprising the A, G, and H helices of the native structure (25, 26). Addition of trichloroacetate to the low pH form causes folding and recruitment of the B helix to the AGH subdomain (27). Kinetic intermediates in the folding of apomyoglobin have been studied in detail using hydrogen exchange NMR and small angle X-ray scattering (28, 29). Significantly, these studies indicate that the equilibrium low pH and trichloroacetate molten globules correspond to two consecutive kinetic folding intermediates. Studies of a collection of site-directed mutants at the packing interface of the A, G, and H helices indicate that specific packing interactions, as opposed to nonspecific hydrophobic interactions, stabilize the low pH molten globule of apomyoglobin (30).

Staphylococcal Nuclease

Staphylococcal nuclease is a small protein consisting of a helical subdomain and a beta barrel subdomain separated by an active site cleft. Staphylococcal nuclease forms a molten globule at low pH and high salt concentrations, as judged by global structural properties (31). High resolution information at the individual residue level, such as that for cytochrome c, apomyoglobin, and α-LA, is not available. Calorimetric studies of the molten globules of wildtype staphylococcal nuclease and a collection of point mutants lead to the conclusion that some specific packing interactions are present in the molten globule (32). Although denaturation of wildtype staphylococcal nuclease is two-state, a stable intermediate is populated upon denaturation of variants in which residues at the interface of the two subdomains or in the core of the protein are changed.
This intermediate is thought to consist of an unfolded helical subdomain and a partially structured beta barrel subdomain (33, 34). Fragments of staphylococcal nuclease show evidence for residual structure in the beta barrel subdomain at physiologic conditions, similar to that observed in the unfolding intermediate. Whether the residual structure corresponds to a molten globule is in debate (33).

Ribonuclease H

Ribonuclease H (RNase H) is a protein containing both alpha-helical and beta-sheet structures. RNase H forms an equilibrium molten globule at low pH, in which structure is confined to a helical subdomain of the molecule (35, 36). Interestingly, NMR studies of rare partially folded forms of RNase H present under native conditions indicate that the structures of these marginally stable species are similar to the acid-induced molten globule state (37). Furthermore, studies indicate that a kinetic folding intermediate of RNase H also resembles the acid molten globule (38). These studies suggest that the first portion of RNase H to fold is the thermodynamically most stable region of the molecule. Moreover, this substructure is a high energy conformation present at equilibrium in the native state, and likely forms a core scaffold for further hierarchical folding.

α-Lactalbumin/Lysozyme

α-Lactalbumin is a calcium binding protein comprising a helical domain and a beta sheet domain. The protein contains eight cysteine residues that form four disulfide bonds, two in each subdomain. The molten globule of α-LA was the first discovered molten globule and can be formed under a wide variety of conditions, including moderate concentrations of denaturant, removal of the calcium ion coupled with reduction of one or more disulfide bonds, and low pH (39). Studies of lysozyme, a protein that is structurally homologous to α-LA, complement studies of α-LA.
Interestingly, hen egg white lysozyme, which lacks a calcium binding site, does not form an equilibrium molten globule, but folds through a kinetic intermediate with structural features similar to those of the equilibrium molten globule of α-LA (40, 41). In the molten globule intermediates of α-LA and lysozyme, structure is confined to the helical domain, although the beta sheet domain is collapsed (42, 43, 44). Direct NMR spectroscopy during refolding of α-LA indicates that a kinetic intermediate shares properties similar to the equilibrium molten globule (45, 46). Partially denatured intermediate states can range in orderliness from highly ordered molten globules, whose structures can be solved by NMR (47, 48), to classic molten globules, whose thermal denaturation transitions are noncooperative. The molten globule of α-LA is a classic molten globule, characterized by NMR spectra with broad linewidths and very little chemical shift dispersion, as well as a linear, non-cooperative thermal denaturation. Moreover, proline scanning mutagenesis indicates that the overall fold in the molten globule of α-LA is assembled noncooperatively (15). Interestingly, the thermal denaturation of the α-LA molten globule appears to occur with no excess heat absorption as measured by calorimetry, suggesting that the hydrophobic core of the molten globule may be solvent-exposed, although this is in debate (49, 50, 51).

Physiologic Significance of Molten Globules

The biological relevance of molten globules has been a recent area of investigation. It is thought that molten globules are populated in vivo and that these forms of proteins may be relevant to some physiologic processes. Evidence indicates that molecular chaperones, proteins involved in preventing unproductive aggregation reactions in vivo, may recognize and bind to molten globules (52, 53). It is also thought that molten globules may be involved in membrane translocation (54).
Interestingly, unliganded steroid hormone receptors, which are thought to be molten globules, are associated with molecular chaperones in the cell cytoplasm (55). A human lysozyme variant that forms amyloid fibrils similar to those seen in Alzheimer's disease is thought to aggregate via an intermediate similar to a molten globule (56). Finally, small molecule compounds that may enhance and stabilize molten globule formation are being developed and tested as drug delivery agents (57). It should be noted that the structural evidence linking the above cited phenomena with the molten globule state is limited. Further investigation may better elucidate in vivo roles of molten globules.

Protein Design

Protein design has been a useful tool with which to test the understanding of protein folding. Much effort has been put into designing a four helix bundle, a relatively simple symmetrical motif that nonetheless occurs often in natural proteins of biological importance (58, 59). Interestingly, protein design has been only partially successful, creating approximate structures that have the high conformational flexibility of molten globules. The de novo design of a well-packed, fully native protein has been elusive, despite numerous attempts using rational design or screening combinatorial libraries (60). Recently, helical bundles with more native-like characteristics have been identified (61, 62). These bundles are often stabilized by diverse sidechain packing complementarity and specific tertiary interactions. Thus, a remaining challenge is to understand the details required for fixed tertiary packing, a requisite for proper protein function. It is interesting that molten globule structures can be encoded by a large number of protein sequences, including those of limited amino acid diversity (63, 64). This suggests that the rules for adopting "low resolution" native-like structure may be relatively simple and has strong implications for protein fold recognition and prediction.
One interesting hypothesis is that a binary patterning of hydrophobic and hydrophilic amino acid types is sufficient to determine protein folds. This has been tested experimentally by constructing libraries of patterned protein sequences designed to form four helix bundles (65). Interestingly, a large percentage of these sequences (60%) appear to form four helix bundles, albeit with molten globule properties. Further analysis of the molten globules of naturally occurring proteins, combined with protein design, may elucidate the rules for a protein folding hierarchy, from molten globules to well-packed native structures.

Recent Issues in Protein Folding

Three issues in protein folding have arisen recently and will be described briefly. These are (i) the debate whether intermediates are on or off the folding pathway, (ii) the observation that some small proteins fold with very fast kinetics and no detectable intermediates, and (iii) the development of theoretical folding models based on structural ensembles and funnels. It has been assumed that intermediates assist folding by reducing the conformational space to be searched by the polypeptide chain and guiding a protein towards the final native structure. Intermediates have been shown to be kinetically competent for on-pathway folding (24, 66, 67). Moreover, analysis of kinetic refolding data for cytochrome c and ubiquitin suggests that a model incorporating an on-pathway intermediate better describes the experimental data than one with an off-pathway intermediate (22, 68). On the other hand, it has been suggested that intermediates are actually off-pathway, non-productive traps. This stems from the observation that the early kinetic intermediate of cytochrome c, which consists of interacting terminal helices, contains a nonnative misligated heme group (67). Misfolding around the misligated heme is thought to lead to an off-pathway intermediate which must be unfolded in order for productive folding to proceed (24).
By extension, it is argued that many intermediates that have been detected experimentally may result from misfolding to a stable but off-pathway species. Even if intermediates are off-pathway, studying their structure and determinants may still be useful in elucidating the protein folding code. It will be important to understand what factors determine properly folded and misfolded structure and how partitioning between productive and non-productive folding is achieved. The observation that small single domain proteins can fold with very fast two-state kinetics has been cited in support of the hypothesis that detectable intermediates are off-pathway and retard folding. Several small proteins have been shown to fold in microseconds with no observable kinetic or equilibrium intermediates (69, 70, 71). However, the lack of detectable intermediates does not imply that these proteins do not fold via pathways. Indeed, it would seem that very fast folding emphasizes the importance of pathways or other means of restricting conformational space during the folding process. For BPTI, the formation of a stable subdomain intermediate leads to a kinetically trapped metastable species that must be unfolded for further oxidative folding to proceed. This is consistent with the idea that intermediates slow down folding. On the other hand, mutations that destabilize folding intermediates of T4 lysozyme and cytochrome c slow the rate of folding, suggesting that these intermediates accelerate folding (21, 72). It should be noted that all documented fast folding proteins are small single domain species. Larger proteins may consist of multiple fast folding subdomains that dock and assemble in slower kinetic phases. Classical theories of protein folding envision folding similar to small molecule chemical reactions, with defined intermediate structures and transition states.
A new "folding funnel" view has emerged from the consideration of statistical thermodynamics and structural ensembles (73, 74). In the new view, a polypeptide chain samples all of conformational space, partitioned based on energetics. A major difference between folding funnels and classical pathway views of folding is that in funnels there are no discrete species except the final native state. Thus, intermediates and transition states are described as structurally heterogeneous ensembles, as opposed to single species. An unfolded polypeptide can adopt a large number of isoenergetic structures. As folding proceeds, the polypeptide chain is funneled through structures of progressively lower free energy. At one extreme, folding proceeds through narrow energetic "troughs" leading to distinct intermediates at local energetic minima, similar to classical views of protein folding. At the other extreme, folding follows a smooth "downhill" reaction in which all paths lead to the final structure. In this case, a classical pathway view cannot describe the folding process.

The Structure and Determinants of the α-Lactalbumin Molten Globule

This thesis addresses a number of questions regarding the structure and determinants of molten globules, using the protein α-LA as a model system. A key issue regarding the structure of molten globules involves the organization of the secondary structural elements. While the original theoretical description of molten globules envisioned a native-like backbone topology, experiments on the α-LA molten globule have led to conflicting conclusions (75, 76). This issue is addressed and resolved by investigating the topology of each subdomain of α-LA independently in the molten globule form (Chapter 2). Further work defines local interactions important for establishing a native-like fold and assesses the extent of specific sidechain packing in the core of the α-LA molten globule (Chapters 3 and 4).
An intriguing hypothesis regarding the minimal sequence information necessary to establish molten globule structure, namely patterning of amino acid types, is investigated in Chapter 5. Finally, the appendix describes experiments aimed at clarifying the structural features necessary for formation of rigid sidechain packing and fixed tertiary interactions in α-LA, subsequent to molten globule formation.

References

1. Dill, K. A. (1990) Biochemistry 29, 7133-7155.
2. Dill, K. A. (1985) Biochemistry 24, 1501-1509.
3. Levinthal, C. (1968) J. Chim. Phys. 65, 44-45.
4. Kim, P. S., and Baldwin, R.L. (1990) Annu. Rev. Biochem. 59, 631-660.
5. Karplus, M., and Weaver, D.L. (1994) Protein Sci. 3, 650-668.
6. Dill, K. A., Bromberg, S., Yue, K.Z., Fiebig, K.M., Yee, D.P., Thomas, P.D., and Chan, H.S. (1995) Protein Sci. 4, 561-602.
7. Fersht, A. R. (1997) Curr. Opin. Struct. Biol. 7, 3-9.
8. Creighton, T. E. (1985) J. Phys. Chem. 89, 2452-2459.
9. Weissman, J. S., and Kim, P.S. (1991) Science 253, 1386-1393.
10. Staley, J. P., and Kim, P.S. (1990) Nature 344, 685-688.
11. Oas, T. G., and Kim, P.S. (1988) Nature 336, 42-48.
12. Weissman, J. S., and Kim, P.S. (1995) Nat. Struct. Biol. 2, 1123-1130.
13. Ptitsyn, O. B. (1992) in Protein Folding, ed. Creighton, T. E. (W.H. Freeman and Co., New York), pp. 243-300.
14. Kuwajima, K. (1989a) Proteins: Struct., Funct., Genet. 6, 87-103.
15. Schulman, B. A., and Kim, P.S. (1996) Nat. Struct. Biol. 3, 682-687.
16. Ohgushi, M., and Wada, A. (1983) FEBS Lett. 164, 21-24.
17. Jeng, M.-F., Englander, S.W., Elove, G.A., Wand, A.J., and Roder, H. (1990) Biochemistry 29, 10433-10437.
18. Goto, Y., and Nishikiori, S. (1991) J. Mol. Biol. 222, 679-686.
19. Goto, Y., Calciano, L.J., and Fink, A.L. (1990) Proc. Natl. Acad. Sci. USA 87, 573-577.
20. Marmorino, J. L., and Pielak, G.J. (1995) Biochemistry 34, 3140-3143.
21. Colon, W., Elove, G.A., Wakem, L.P., Sherman, F., and Roder, H. (1996a) Biochemistry 35, 5538-5549.
22. Colon, W., and Roder, H. (1996b) Nat. Struct.
Biol. 3, 1019-1025.
23. Roder, H., Elove, G.A., and Englander, S.W. (1988) Nature 335, 700-704.
24. Sosnick, T. R., Mayne, L., Hiller, R., and Englander, S.W. (1994) Nat. Struct. Biol. 1, 149-156.
25. Hughson, F. M., Wright, P.E., and Baldwin, R.L. (1990) Science 249, 1544-1548.
26. Barrick, D., and Baldwin, R.L. (1993) Protein Sci. 2, 869-876.
27. Loh, S. N., Kay, M.S., and Baldwin, R.L. (1995) Proc. Natl. Acad. Sci. USA 92, 5446-5450.
28. Jennings, P. A., and Wright, P.E. (1993) Science 262, 892-896.
29. Eliezer, D., Jennings, P.A., Wright, P.E., Doniach, S., Hodgson, K.O., and Tsuruta, H. (1995) Science 270, 487-488.
30. Kay, M. S., and Baldwin, R.L. (1996) Nat. Struct. Biol. 3, 439-445.
31. Fink, A. L., Calciano, L.J., Goto, Y., Nishimura, M., and Swedberg, S.A. (1993) Protein Sci. 2, 1155-1160.
32. Carra, J. H., Anderson, E.A., and Privalov, P.L. (1994) Protein Sci. 3, 952-959.
33. Carra, J. H., and Privalov, P.L. (1996) FASEB J. 10, 67-74.
34. Walkenhorst, W. F., Green, S.M., and Roder, H. (1997) Biochemistry 36, 5795-5805.
35. Dabora, J. M., and Marqusee, S. (1994) Protein Sci. 3, 1401-1408.
36. Dabora, J. M., Pelton, J.G., and Marqusee, S. (1996) Biochemistry 35, 11951-11958.
37. Chamberlain, A. K., Handel, T.M., and Marqusee, S. (1996) Nat. Struct. Biol. 3, 782-787.
38. Raschke, T. M., and Marqusee, S. (1997) Nat. Struct. Biol. 4, 298-304.
39. Kuwajima, K. (1996) FASEB J. 10, 102-109.
40. Radford, S. E., Dobson, C.M., and Evans, P.A. (1992) Nature 358, 302-307.
41. Miranker, A., Radford, S.E., Karplus, M., and Dobson, C.M. (1991) Nature 349, 633-636.
42. Wu, L. C., Peng, Z.-y., and Kim, P.S. (1995) Nat. Struct. Biol. 2, 281-286.
43. Schulman, B. A., Kim, P.S., Dobson, C.M., and Redfield, C. (1997) Nat. Struct. Biol. 4, 630-634.
44. Alexandrescu, A. T., Evans, P.A., Pitkeathly, M., Baum, J., and Dobson, C.M. (1993) Biochemistry 32, 1707-1718.
45. Balbach, J., Forge, V., van Nuland, N.A.J., Winder, S., Hore, P.J., and Dobson, C.M. (1995) Nat. Struct. Biol.
2, 865-870.
46. Arai, M., and Kuwajima, K. (1996) Folding & Design 1, 275-287.
47. Redfield, C., Smith, R.A.G., and Dobson, C.M. (1994) Nat. Struct. Biol. 1, 23-29.
48. Feng, Y., Sligar, S.G., and Wand, A.J. (1994) Nat. Struct. Biol. 1, 30-35.
49. Xie, D., Bhakuni, V., and Freire, E. (1991) Biochemistry 30, 10673-10678.
50. Yutani, K., Ogasahara, K., and Kuwajima, K. (1992) J. Mol. Biol. 228, 347-350.
51. Griko, Y. V., Freire, E., and Privalov, P.L. (1994) Biochemistry 33, 1889-1899.
52. Martin, J., Langer, T., Boteva, R., Schramel, A., Horwich, A.L., and Hartl, F.-U. (1991) Nature 352, 36.
53. Martin, J., Horwich, A.L., and Hartl, F.-U. (1992) Science 258, 995.
54. Bychkova, V. E., and Ptitsyn, O.B. (1993) Chemtracts - Biochem. and Molec. Biol. 4, 133-163.
55. Katzenellenbogen, J. A., and Katzenellenbogen, D.S. (1996) Chem. Biol. 3, 529-536.
56. Booth, D. R., Sunde, M., Bellotti, V., Robinson, C.V., Hutchinson, W.L., Fraser, P.E., Hawkins, P.N., Dobson, C.M., Radford, S.E., Blake, C.C.F., and Pepys, M.B. (1997) Nature 385, 787-793.
57. Milstein, S. J., Leipold, H., Sarubbi, D., Leone-Bay, A., Mlynek, G.M., Robinson, J.R., Kasimova, M., and Freire, E. submitted.
58. DeGrado, W. F., Raleigh, D.P., and Handel, T. (1991) Curr. Opin. Struct. Biol. 1, 984-993.
59. Kamtekar, S., and Hecht, M.H. (1995) FASEB J. 9, 1013-1022.
60. Betz, S. F., Raleigh, D.P., and DeGrado, W.F. (1993) Curr. Opin. Struct. Biol. 3, 601-610.
61. Betz, S. F., Bryson, J.W., and DeGrado, W.F. (1995) Curr. Opin. Struct. Biol. 5, 457-463.
62. Roy, S., Helmer, K.J., and Hecht, M.H. (1997) Fold. Des. 2, 89-92.
63. Davidson, A. R., and Sauer, R.T. (1994) Proc. Natl. Acad. Sci. USA 91, 2146-2150.
64. Davidson, A. R., Lumb, K.J., and Sauer, R.T. (1995) Nat. Struct. Biol. 2, 856-864.
65. Kamtekar, S., Schiffer, J.M., Xiong, H., Babik, J.M., and Hecht, M.H. (1993) Science 262, 1680-1685.
66. Kuwajima, K., Mitani, M., and Sugai, S. (1989b) J. Mol. Biol. 206, 547-561.
67. Elove, G.
A., Bhuyan, A.K., and Roder, H. (1994) Biochemistry 33, 6925-6935.
68. Roder, H., and Colon, W. (1997) Curr. Opin. Struct. Biol. 7, 15-28.
69. Jackson, S. E., and Fersht, A.R. (1991) Biochemistry 30, 10428-10435.
70. Schindler, T. S., Herrler, M., Marahiel, M.A., and Schmid, F.X. (1995) Nat. Struct. Biol. 2, 663-673.
71. Huang, G. S., and Oas, T.G. (1995) Proc. Natl. Acad. Sci. USA 92, 6878-6882.
72. Lu, J., and Dahlquist, F.W. (1992) Biochemistry 31, 4749-4756.
73. Wolynes, P. G., Luthey-Schulten, Z., and Onuchic, J.N. (1996) Chem. Biol. 3, 425-432.
74. Dill, K. A., and Chan, H.S. (1997) Nat. Struct. Biol. 4, 10-19.
75. Ewbank, J. J., and Creighton, T.E. (1991) Nature 350, 518-520.
76. Peng, Z.-y., and Kim, P.S. (1994) Biochemistry 33, 2136-2141.

CHAPTER 2

Bipartite Structure of the α-Lactalbumin Molten Globule

Molten globules are thought to be general intermediates in protein folding. Apparently conflicting studies fail to clarify whether the best characterized molten globule, that of α-lactalbumin, resembles an expanded native-like protein or a nonspecific collapsed polypeptide. Here we find that the molten globule of α-lactalbumin consists of two conformationally distinct domains. The α-helical domain forms a helical structure with a native-like tertiary fold, while the β-sheet domain is largely unstructured. Thus, molten globules possess a native-like backbone topology, but do not necessarily encompass the entire polypeptide chain. Our studies indicate that molten globules provide an approximate solution and vast simplification of the protein folding problem.

Molten globules are partially folded forms of proteins that are thought to be general protein folding intermediates and may also be important in the folding and processing of proteins in vivo (refs 1-6). Despite numerous studies, the structure of molten globules and their significance to protein folding remain unclear. The best characterized molten globule is that of α-lactalbumin (α-LA; refs 7-15), a two-domain protein.
Studies of the molten globule of α-LA fail to resolve whether molten globules resemble expanded native-like proteins or nonspecific collapsed polypeptides. A model of the isolated α-helical domain in the molten globule form (α-Domain) strongly prefers the native disulfide pairings (ref. 15), indicating that molten globules have a native-like backbone topology. However, the molten globule of intact α-LA with all eight cysteines fails to show a preference for the native disulfide pairings (ref. 14), leading to the contradictory conclusion that molten globules have no preferred backbone topology.

In order to clarify this apparent discrepancy, we have produced and studied two variants of human α-LA (Fig. 1a), allowing us to probe the conformational preferences of each domain in the molten globule form. The species denoted α-LA(α) contains the two disulfide bonds in the α-helical domain, 6-120 and 28-111, but lacks the β-sheet domain and interdomain cysteines, which are replaced by alanines. Conversely, α-LA(β) contains the 61-77 disulfide bond in the β-sheet domain and the interdomain 73-91 disulfide bond, with the cysteines in the α-helical domain replaced by alanines.

Structural Characterization

Both α-LA(α) and α-LA(β) are molten globules, even at neutral pH. Each variant is a monomer (to within ±5%) at concentrations below 100 µM, as determined by sedimentation equilibrium at pH 8.5 (see Fig. 1 legend). The far-UV (Fig. 1b) and near-UV (Fig. 1c) CD spectra of each variant resemble those of the pH 2 molten globule of α-LA (A-state), and differ significantly from those of native α-LA. Thermal denaturation of each variant lacks a cooperative transition, resembling instead the transition observed for the A-state of α-LA (Fig. 1d). Moreover, the near-UV CD signals of α-LA(α) and α-LA(β) are weak compared to those of the native protein, indicating an absence of extensive, tight side-chain packing.
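The helix contents quoted for these variants (Table 1) follow from the mean residue ellipticity at 222 nm. The following is a minimal sketch of that conversion, assuming a simple linear scaling against the 100%-helix value of 38,700 deg cm² dmol⁻¹ given in the Methods. The function name and the absence of any baseline correction are assumptions of the sketch, not a statement of the exact procedure of Chen et al.

```python
# Magnitude of [theta]222 taken to represent 100% helix, as quoted in the
# Methods (deg cm^2 dmol^-1).
FULL_HELIX_THETA_222 = 38_700.0


def helix_percent(theta_222: float) -> float:
    """Percent helix from the mean residue ellipticity at 222 nm.

    Assumes a linear scaling with no baseline correction, which is an
    illustrative simplification (hypothetical helper, not from the thesis).
    """
    return 100.0 * (-theta_222) / FULL_HELIX_THETA_222


# [theta]222 values listed in Table 1.
table_1 = {
    "alpha-LA(alpha), native [6-120; 28-111]": -10_300,
    "alpha-LA(alpha), reduced": -7_400,
    "alpha-LA(beta), native [61-77; 73-91]": -10_800,
    "alpha-LA(beta), reduced": -9_300,
}

for species, theta in table_1.items():
    print(f"{species}: {helix_percent(theta):.0f}% helix")
```

With this scaling, the four ellipticities above round to 27%, 19%, 28%, and 24% helix, in line with the values reported in Table 1.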
Backbone Topology

In order to assess the tertiary topology of the α-helical and β-sheet domains in the molten globule of α-LA, we performed disulfide exchange studies. At equilibrium, the populations of disulfide species reflect the probability of forming specific disulfide pairings and thus reflect the backbone topology of the domain. In our experiments, identical populations were achieved regardless of the starting disulfide isomer, indicating that equilibrium was established. Control experiments carried out under denaturing conditions yield equilibrium populations that are in good agreement with a random walk model (Fig. 2).

Under native conditions, rearrangement of α-LA(α) yields an equilibrium population containing predominantly the native disulfide species (Fig. 2a). This suggests that the α-helical domain of α-LA strongly prefers a native tertiary fold in the molten globule, consistent with previous studies of α-Domain (ref. 15), a model for the isolated α-helical domain in the molten globule form. The preference of α-Domain for native disulfide pairings is not a result of the three glycine residues in the model that substitute for the β-sheet domain, since α-LA(α) shows the same equilibrium preferences.

In contrast, rearrangement of α-LA(β) under native conditions yields significant amounts of non-native species (Fig. 2b). This strongly suggests that the β-sheet domain is predominantly disordered, with only a slight preference for the native disulfide pairings compared with a random walk model (see Fig. 2 legend).

These conclusions regarding the tertiary topology of each domain are confirmed by CD spectroscopy of the non-native two-disulfide species. The non-native isomers in the α-helical domain have significantly reduced helical content (Table 1), as observed previously for α-Domain (ref. 15).
Moreover, the helical content of each non-native disulfide species is less than that of reduced α-LA(α), indicating that formation of non-native backbone topologies is unfavorable. In contrast, the native and non-native disulfide isomers in the β-sheet domain have identical helical content (Table 1), and the far-UV CD spectra, near-UV CD spectra, and thermal denaturation transitions of all β-sheet domain variants are superimposable. These results are consistent with a largely unstructured β-sheet domain.

Bipartite Structure

Our results, taken together with previous work (refs 10, 11, 15), (i) lead to a bipartite picture of the molten globule of α-LA in which the α-helical domain resembles an expanded native-like protein and the β-sheet domain is predominantly unfolded, and (ii) provide a likely resolution to the apparent discrepancy between the studies of the α-Domain (ref. 15) and intact α-LA (refs 13, 14) molten globules. Although the α-helical domain of α-LA strongly prefers native disulfide pairings, this preference will likely be obscured by distribution amongst several equally favorable β-sheet domain disulfide pairings in the molten globule of intact α-LA with all eight cysteines. In addition, disulfide exchange involving the disordered β-sheet domain could further obscure a preference for native disulfide pairings in the α-helical domain.

Unlike the NMR spectra of highly ordered molten globules, which are amenable to structure determination (refs 16, 17), the NMR spectra of the A-state of α-LA (refs 1, 9, 10) and our α-LA variants (data not shown) contain broad resonances and poor chemical shift dispersion. These characteristics indicate an absence of extensive, tight packing interactions. It seems likely that molten globules such as the A-state of α-LA correspond to earlier folding intermediates than highly ordered molten globules. Our results suggest that proteins can adopt a native-like tertiary fold even in the absence of extensive specific tertiary interactions and side-chain packing.
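The random walk baseline used for the denaturing controls in the disulfide exchange experiments can be sketched numerically. The code below is a hedged illustration, not the calculation as performed in this work: it assumes ideal Gaussian chain statistics, under which the probability of closing two loops simultaneously scales as det(C)^(-3/2), where C holds the two loop lengths on its diagonal and the length of chain shared by both loops off the diagonal. Excluded volume and the geometry of the disulfide bond itself are ignored, and all function names are hypothetical.

```python
def shared_length(loop_a, loop_b):
    """Number of residues of chain spanned by both loops (0 if disjoint)."""
    lo = max(loop_a[0], loop_b[0])
    hi = min(loop_a[1], loop_b[1])
    return max(0, hi - lo)


def closure_weight(pairing):
    """Relative probability that a Gaussian chain closes both loops at once.

    For two loops the covariance determinant is n_a * n_b - n_ab**2, and the
    joint closure probability scales as det ** (-3/2) (assumed model).
    """
    loop_a, loop_b = pairing
    n_a = loop_a[1] - loop_a[0]
    n_b = loop_b[1] - loop_b[0]
    n_ab = shared_length(loop_a, loop_b)
    det = n_a * n_b - n_ab ** 2
    return det ** -1.5


def pairings(cysteines):
    """The three ways of pairing four cysteines into two disulfides."""
    c1, c2, c3, c4 = sorted(cysteines)
    return [
        ((c1, c2), (c3, c4)),
        ((c1, c3), (c2, c4)),
        ((c1, c4), (c2, c3)),
    ]


def populations(cysteines):
    """Normalized equilibrium populations of the two-disulfide isomers."""
    weights = {p: closure_weight(p) for p in pairings(cysteines)}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}


# Cysteines of the alpha-helical domain variant, alpha-LA(alpha).
for pairing, fraction in populations([6, 28, 111, 120]).items():
    print(pairing, f"{100 * fraction:.0f}%")
```

For the four cysteines of α-LA(α), this model gives approximately 96% for [6-28; 111-120] and 2% each for [6-120; 28-111] and [6-111; 28-120], close to the expected values quoted in parentheses in Figure 2.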
Hydrogen exchange NMR studies of lysozyme indicate that the protein contains two structural domains that fold on kinetically distinct time scales, leading to an intermediate containing a predominantly folded α-helical domain and a relatively unstructured β-sheet domain (refs 18, 19). A similar process may occur in the folding of α-LA. We propose that the A-state of α-LA and the corresponding kinetic folding intermediate (refs 20-22) contain a molten globule α-helical domain and an unstructured β-sheet domain.

Implications for Protein Folding

Our results indicate that molten globule properties need not encompass the entire polypeptide chain, but can be attained independently by individual domains. This complements the observation that individual domains of multidomain proteins can fold independently (refs 23, 24).

One of the major questions of protein folding is how a polypeptide chain can fold quickly and efficiently to a unique structural solution (ref. 25). Stepwise folding to the molten globule may deter global misfolding. By adopting a native-like backbone topology, molten globules achieve most of the information transfer of the folding process, providing an approximate solution to the protein folding problem. This vastly simplifies the search for a unique folded conformation and reduces the energy barriers for minor structural rearrangements (Fig. 3). Our results suggest that a key to understanding protein folding is to resolve how a polypeptide chain acquires a native-like backbone topology in the absence of extensive specific tertiary contacts.

METHODS

Production of α-LA Variants. α-Lactalbumin variants were produced by oligonucleotide-directed mutagenesis (ref. 26) of pALA, an expression plasmid for human α-LA in the cloning vector pAED4 (ref. 27). The gene was synthesized in eight segments on an Applied Biosystems DNA synthesizer, and ligated into the BamHI/NdeI restriction site in pAED4. Mutations were verified by sequencing the entire α-LA gene.
Each protein species was expressed in E. coli BL21 (DE3) pLysS (ref. 28) and purified as described for α-Domain (ref. 15). Protein identity was confirmed by laser desorption mass spectroscopy (Finnigan Lasermat), and purity was assessed by analytical reverse-phase HPLC. All proteins were completely oxidized, as assayed by the lack of reaction with 5,5'-dithio-bis(2-nitrobenzoic acid) (ref. 29) in 6 M GuHCl. Disulfide bonds were assigned by digestion with pepsin (ref. 15), followed by analytical reverse-phase HPLC. Peptide fragments with altered elution times upon reduction with DTT were purified and identified by mass spectroscopy, N-terminal sequencing, and amino acid analysis. The following fragments were identified for each species.

α-LA(α): (i) Phe3-Leu11/Trp118-Leu123 (disulfide 6-120) (ii) Ile27-Thr33/Trp104-Cys111 (disulfide 28-111).
α-LA(β): (i) Trp60-Asn71/Ile75-Leu81 (disulfide 61-77) (ii) Trp60-Asp82/Ile89-Ile95 (disulfides 61-77 and 73-91).

All non-native two-disulfide species except α-LA [6-111; 28-120] were prepared from the native two-disulfide species by disulfide exchange in 6 M GuHCl, 10 mM Tris, 0.5 mM EDTA, pH 8.5, using 25 µM protein, 226 µM reduced glutathione, and 50 µM oxidized glutathione. α-LA [6-111; 28-120] was prepared by disulfide exchange in 1 M GuHCl, 10 mM Tris, 0.5 mM EDTA, pH 8.5. Each species was isolated and purified by reverse-phase HPLC. The following disulfide fragments were identified for each non-native isomer.

[61-73; 77-91]: (i) Trp60-Asp74 (disulfide 61-73) (ii) Ile75-Leu81/Ile89-Ile95 (disulfide 77-91).
[61-91; 73-77]: (i) Trp60-Asn71/Ile89-Asp97 (disulfide 61-91) (ii) Ile72-Phe80 (disulfide 73-77).
[6-28; 111-120]: (i) Thr4-Leu11/Ile27-Phe31 (disulfide 6-28) (ii) Leu105-Leu123 (disulfide 111-120).
[6-111; 28-120]: (i) Phe3-Leu8/Trp104-Lys114 (disulfide 6-111) (ii) Ile27-Phe31/Leu119-Leu123 (disulfide 28-120).

Circular Dichroism (CD) Spectroscopy.
CD spectroscopy was performed on an Aviv 62DS spectrometer equipped with a thermoelectric temperature controller. Samples were dissolved in 10 mM Tris, 0.5 mM EDTA, pH 8.5, and spectra were taken at 0 °C. Reduced α-LA(α) and α-LA(β) were dissolved in 10 mM Tris, 0.5 mM EDTA, 1 mM DTT, pH 8.5. Protein concentrations were determined by absorbance in 6 M GuHCl, 20 mM sodium phosphate, pH 6.5, using an extinction coefficient at 280 nm of 23,150 (ref. 30). Far-UV studies were performed in a 1 mm path length cuvette with a 1.5 nm bandwidth. Near-UV and thermal denaturation studies were performed in a 1 cm path length cuvette with a 3.0 nm bandwidth. Data for thermal denaturations were obtained at intervals of 2 °C, allowing 1.5 minutes equilibration time between measurements, and were >95% reversible with no hysteresis. Helix content was calculated by the method of Chen et al. (ref. 31), with a [θ]222 of 38,700 for 100% helix.

Sedimentation Equilibrium. Sedimentation equilibrium was performed on a Beckman XL-A analytical ultracentrifuge using an An-60 Ti rotor. Protein solutions were dialyzed overnight against 10 mM Tris, 0.5 mM EDTA, pH 8.5. Protein concentrations of 100, 40, and 15 µM were analyzed in Beckman 6-sector cells at 4 °C. Data were collected at 23 and 27 krpm and processed using the XL-A analysis program Origin (Beckman) with a partial specific volume of 0.733, calculated from the amino acid composition (ref. 32). The data for both α-LA(α) and α-LA(β) fit well to a model for an ideal monomer, with no systematic deviation of the residuals. There is no concentration dependence of the observed molecular weight for either variant.

Disulfide Exchange. Disulfide exchange studies were performed at room temperature in an anaerobic chamber (Coy Laboratory Products). Native buffer consisted of 10 mM Tris, 0.5 mM EDTA, pH 8.5. Denaturing buffer consisted of 6 M GuHCl, 10 mM Tris, 0.5 mM EDTA, pH 8.5.
Exchange was initiated in buffers containing 100 µM reduced glutathione, 10 µM oxidized glutathione, and 5 µM protein. Samples were quenched with 10% formic acid (v/v) after approximately 24 hours of equilibration and analyzed by reverse-phase HPLC on a C18 column (Vydac) with a linear H2O-acetonitrile gradient (0.09% per minute) containing 0.1% TFA. All peak areas are reproducible to within ±2% and are identical regardless of the starting species.

References

1. Ptitsyn, O.B. The Molten Globule State. in Protein Folding (Creighton, T.E., Ed.) pp. 243-300. (W.H. Freeman and Co., New York, 1992).
2. Bychkova, V.E., and Ptitsyn, O.B. The Molten Globule in Vitro and in Vivo. Chemtracts - Biochem. Mol. Biol. 4, 133-163 (1993).
3. Kuwajima, K. The Molten Globule State as a Clue for Understanding the Folding and Cooperativity of Globular-Protein Structure. Proteins: Struc., Func., Genet. 6, 87-103 (1989).
4. Haynie, D.T., and Freire, E. Structural Energetics of the Molten Globule State. Proteins: Struc., Func., Genet. 16, 115-140 (1993).
5. Christensen, H., and Pain, R.H. Molten Globule Intermediates and Protein Folding. Eur. Biophys. J. 19, 221-229 (1991).
6. Dobson, C.M. Unfolded Proteins, Compact States, and Molten Globules. Curr. Opin. Struct. Biol. 2, 6-12 (1992).
7. Kuwajima, K., Nitta, K., Yoneyama, M., and Sugai, S. Three-state Denaturation of α-Lactalbumin by Guanidine Hydrochloride. J. Mol. Biol. 106, 359-373 (1976).
8. Dolgikh, D.A., Gilmanshin, R.I., Brazhnikov, E.V., Bychkova, V.E., Semisotnov, G.V., Venyaminov, S.Y., and Ptitsyn, O.B. α-Lactalbumin: Compact State with Fluctuating Tertiary Structure. FEBS Letters 136, 311-315 (1981).
9. Dolgikh, D.A., Abaturov, L.V., Bolotina, I.A., Brazhnikov, E.V., Bychkova, V.E., Gilmanshin, R.I., Lebedev, Y.O., Semisotnov, G.V., Tiktopulo, E.I., and Ptitsyn, O.B. Compact State of a Protein Molecule with Pronounced Small-Scale Mobility: Bovine α-Lactalbumin. Eur. Biophys. J. 13, 109-121 (1985).
10.
Baum, J., Dobson, C.M., Evans, P.A., and Hanley, C. Characterization of a Partly Folded Protein by NMR Methods: Studies on the Molten Globule State of Guinea Pig α-Lactalbumin. Biochemistry 28, 7-13 (1989).
11. Alexandrescu, A.T., Evans, P.A., Pitkeathly, M., Baum, J., and Dobson, C.M. Structure and Dynamics of the Acid-Denatured Molten Globule State of α-Lactalbumin: A Two-Dimensional NMR Study. Biochemistry 32, 1707-1718 (1993).
12. Xie, D., Bhakuni, V., and Freire, E. Calorimetric Determination of the Energetics of the Molten Globule Intermediate in Protein Folding: Apo-α-Lactalbumin. Biochemistry 30, 10673-10678 (1991).
13. Ewbank, J.J., and Creighton, T.E. The Molten Globule Protein Conformation Probed by Disulphide Bonds. Nature 350, 518-520 (1991).
14. Creighton, T.E., and Ewbank, J.J. Disulfide-Rearranged Molten Globule State of α-Lactalbumin. Biochemistry 33, 1534-1538 (1994).
15. Peng, Z.-Y., and Kim, P.S. A Protein Dissection Study of a Molten Globule. Biochemistry 33, 2136-2141 (1994).
16. Redfield, C., Smith, R.A.G., and Dobson, C.M. Structural Characterization of a Highly-Ordered "Molten Globule" at Low pH. Nature Struc. Biol. 1, 23-29 (1994).
17. Feng, Y., Sligar, S.G., and Wand, A.J. Solution Structure of Apocytochrome B562. Nature Struc. Biol. 1, 30-35 (1994).
18. Miranker, A., Radford, S.E., Karplus, M., and Dobson, C.M. Demonstration by NMR of Folding Domains in Lysozyme. Nature 349, 633-636 (1991).
19. Radford, S.E., Dobson, C.M., and Evans, P.A. The Folding of Hen Lysozyme Involves Partially Structured Intermediates and Multiple Pathways. Nature 358, 302-307 (1992).
20. Kuwajima, K., Hiraoka, Y., Ikeguchi, M., and Sugai, S. Comparison of the Transient Folding Intermediates in Lysozyme and α-Lactalbumin. Biochemistry 24, 874-881 (1985).
21. Ikeguchi, M., Kuwajima, K., Mitani, M., and Sugai, S.
Evidence for Identity Between the Equilibrium Unfolding Intermediate and a Transient Folding Intermediate: A Comparative Study of the Folding Reactions of α-Lactalbumin and Lysozyme. Biochemistry 25, 6965-6972 (1986).
22. Gilmanshin, R.I., and Ptitsyn, O.B. An Early Intermediate of Refolding α-Lactalbumin Forms Within 2 ms. FEBS Letters 223, 327-329 (1987).
23. Wetlaufer, D.B. Nucleation, Rapid Folding, and Globular Intrachain Regions in Proteins. Proc. Natl. Acad. Sci. 70, 697-701 (1973).
24. Jaenicke, R. Protein Folding: Local Structures, Domains, Subunits, and Assemblies. Biochemistry 30, 3147-3161 (1991).
25. Levinthal, C. Are There Pathways for Protein Folding? J. Chim. Phys. 65, 44-45 (1968).
26. Kunkel, T.A., Roberts, J.D., and Zakour, R.A. Rapid and Efficient Site-Specific Mutagenesis Without Phenotypic Selection. Methods in Enzymology 154, 367-382 (1987).
27. Doering, D.S. Functional and Structural Studies of a Small F-actin Binding Protein, (Massachusetts Institute of Technology Press, Cambridge, 1992).
28. Studier, F.W., Rosenberg, A.H., Dunn, J.J., and Dubendorff, J.W. Use of T7 RNA Polymerase to Direct Expression of Cloned Genes. Methods in Enzymology 185, 60-89 (1990).
29. Ellman, G.L. Tissue Sulfhydryl Groups. Arch. Biochem. Biophys. 82, 70-77 (1959).
30. Edelhoch, H. Spectroscopic Determination of Tryptophan and Tyrosine in Proteins. Biochemistry 6, 1948-1954 (1967).
31. Chen, Y.-H., Yang, J.T., and Chau, K.H. Determination of the Helix and β Form of Proteins in Aqueous Solution by Circular Dichroism. Biochemistry 13, 3350-3359 (1974).
32. Laue, T.M., Shah, B.D., Ridgeway, T.M., and Pelletier, S.L. Computer-Aided Interpretation of Analytical Sedimentation Data For Proteins. in Analytical Ultracentrifugation in Biochemistry and Polymer Science (Harding, S.E., et al., Eds.) pp. 90-125. (The Royal Society of Chemistry, Cambridge, 1992).
33. Acharya, K.R., Ren, J., Stuart, D.I., Phillips, D.C., and Fenna, R.E.
Crystal Structure of Human α-Lactalbumin at 1.7 Å Resolution. J. Mol. Biol. 221, 571-581 (1991).
34. Priestle, J.P. RIBBON: A Stereo Cartoon Drawing Program for Proteins. J. Appl. Crystallogr. 21, 572-576 (1988).
35. Kauzmann, W. in Sulfur in Proteins (Benesch, R., et al., Eds.) pp. 93-108. (Academic Press, New York, 1959).

Acknowledgements

We thank Jun Sakai of the Whitehead Institute and Richard Cook of the MIT Biopolymers Laboratory for peptide sequencing and amino acid analysis. We also thank Pehr Harbury, Dan Minor, Martha Oakley, Brenda Schulman, and members of the Kim lab for helpful advice and discussion. This work was supported by grant GM41307 from the National Institutes of Health.

Table 1: α-Helix content of native and non-native disulfide isomers of α-LA(α) and α-LA(β) as determined by CD spectroscopy. The numbers in brackets denote the disulfide-bonded residues in each species.

α-LA Species                   [θ]222 (deg cm² dmol⁻¹)   Helix Content
α-LA(α)
  [6-120; 28-111] (Native)     -10,300                   27%
  [6-111; 28-120]              -7,100                    18%
  [6-28; 111-120]              -4,900                    13%
  Reduced α-LA(α)              -7,400                    19%
α-LA(β)
  [61-77; 73-91] (Native)      -10,800                   28%
  [61-73; 77-91]               -10,800                   28%
  [61-91; 73-77]               -10,900                   28%
  Reduced α-LA(β)              -9,300                    24%

Figure 1: α-LA(α) and α-LA(β) resemble the molten globule A-state of α-LA and differ from native α-LA. (a) Schematic representation of human α-LA (ref. 33) produced with the program RIBBON (ref. 34). The α-helical domain (white) consists of residues 1-37 and 86-123, and contains the four α-helices. The β-sheet domain (shaded) consists of residues 38-85, and contains two short β-strands and several loop structures. α-LA(α) contains the two disulfide bonds in the α-helical domain (white), with the β-sheet domain and interdomain cysteines replaced by alanines. α-LA(β) contains the disulfide bond in the β-sheet domain and the interdomain disulfide bond (black), with the cysteines in the α-helical domain replaced by alanines.
(b) Far-UV circular dichroism (CD) spectra. The CD spectra indicate that each variant contains approximately 30% helix (see Table 1 legend), close to the value expected if the helices in the α-helical domain are structured. (c) Near-UV CD spectra. (d) Thermal denaturation monitored at 222 nm.

Figure 2: Disulfide exchange studies of α-LA(α) and α-LA(β). (a) α-LA(α) under native and denaturing conditions. (b) α-LA(β) under native and denaturing conditions. The expected equilibrium populations (calculated with a random walk model; ref. 35) for α-LA(α) and α-LA(β) under denaturing conditions are given in parentheses in the figure. The numbers in brackets denote the disulfide-bonded residues in each species.

Figure 3: Schematic diagrams of the free energy landscape for an unfolded polypeptide, the molten globule, and the native protein. Formation of the molten globule with a native tertiary fold vastly simplifies the conformational search for the native state and reduces the energy barriers for minor structural rearrangements.

[Figure 1: panels (a)-(d). Panels (b) and (c) plot CD signal against wavelength (nm) for α-LA(α), α-LA(β), α-LA at pH 2, and native α-LA; panel (d) plots the signal against temperature (°C).]

[Figure 2: reverse-phase HPLC traces, time (min). (a) α-LA(α). Native conditions: native [6-120; 28-111] 89%; [6-28; 111-120] 7%; [6-111; 28-120] 4%. Denaturing conditions: [6-28; 111-120] 95% (96%); native [6-120; 28-111] 3% (2%); [6-111; 28-120] 2% (2%). (b) α-LA(β).]

[Figure 3: energy landscapes over conformational space for the unfolded, molten globule, and native states.]

CHAPTER 3

Local Structural Preferences in the α-Lactalbumin Molten Globule

Molten globules have been proposed to be general intermediates in protein folding. Despite numerous studies, a detailed description of the structure of a molten globule remains elusive.
Recently, we showed that the molten globule formed by the helical domain of α-lactalbumin (α-LA) has a native-like backbone topology. Here we probe local structural preferences in the helical domain of the α-LA molten globule by analyzing a set of native and non-native single disulfide bond variants using a combination of circular dichroism spectroscopy and determination of the equilibrium constant for disulfide bond formation. We find that the region surrounding the 28-111 disulfide bond has a high preference to adopt a native-like structure. Formation of other native or non-native disulfide bonds is significantly less favorable. Our results suggest that molten globules contain regions with varying degrees of specificity for native-like structure and that the core region surrounding the 28-111 disulfide bond plays an important role in α-LA folding by stabilizing the molten globule intermediate.

INTRODUCTION

Many proteins fold via molten globule intermediates that are characterized by compactness, near-native levels of secondary structure, an absence of rigid, specific side-chain packing, and a non-cooperative thermal denaturation (for reviews, see Ptitsyn, 1987; Kuwajima, 1989; Christensen & Pain, 1991; Ptitsyn, 1992; Haynie & Freire, 1993). Classic molten globules, such as those formed by α-lactalbumin (α-LA)¹, carbonic anhydrase, and β-lactamase, have high conformational mobility and a low degree of side-chain ordering that preclude structure determination at atomic resolution².

The best studied molten globule is that of α-LA (Kuwajima et al., 1976; Nozaka et al., 1978; Dolgikh et al., 1981; 1985; Kuwajima et al., 1985; Ikeguchi et al., 1986; Baum et al., 1989; Ewbank & Creighton, 1991; Xie et al., 1991; Alexandrescu et al., 1993; Peng & Kim, 1994). α-LA is a two-domain protein, consisting of an α-helical domain and a β-sheet domain (Figure 1).
Recently, we showed that a model of the isolated α-helical domain of α-LA (called α-Domain) forms a molten globule with a native-like tertiary fold (Peng & Kim, 1994). Subsequent studies show that the molten globule of intact α-LA has a bipartite structure. The α-helical domain adopts a native-like backbone topology, whereas the β-sheet domain remains largely unstructured (Wu et al., 1994). These results indicate that, even though it lacks extensive side-chain packing interactions, the molten globule formed by the α-helical domain of α-LA retains much of the native protein's structural specificity (i.e., the ability of the polypeptide chain to distinguish the native structure from numerous non-native structures).

¹ Abbreviations: α-LA, α-lactalbumin; CD, circular dichroism; Ceff, effective concentration; HPLC, high performance liquid chromatography; DTT, dithiothreitol; GuHCl, guanidine hydrochloride; GSH, reduced glutathione; GSSG, oxidized glutathione; [θ]222, mean residue ellipticity at 222 nm.

² Molten globules have not been crystallized successfully. The NMR spectra of classic molten globules are broad and lack chemical shift dispersion, inhibiting direct assignment. In contrast, "highly ordered molten globules" are partially folded species that yield high resolution NMR spectra. The structures of two highly ordered molten globules have been solved recently (Feng et al., 1994; Redfield et al., 1994) and closely resemble that of the corresponding native protein.

To investigate the origin of this specificity, we have produced six single-disulfide variants of α-LA, corresponding to all possible native and non-native disulfide bonds in the α-helical domain. Each single-disulfide variant is characterized by circular dichroism (CD) spectroscopy and determination of the effective concentration (Ceff) for disulfide bond formation.
Ceff is the ratio of equilibrium constants for intra- and intermolecular disulfide reactions and reflects the extent to which specific interactions within the polypeptide chain favor the formation of a particular disulfide bond (Page & Jencks, 1971; Creighton, 1983; Lin & Kim, 1989; 1991). Thus, the structural specificity of local regions surrounding specific disulfide bonds can be quantitated and compared by Ceff measurements.

MATERIALS AND METHODS

Cloning, protein expression and purification. Human α-LA was expressed from a plasmid denoted pALA, which contains a synthetic gene using the most frequently occurring E. coli codons cloned into the T7-polymerase-based expression vector pAED4 (Studier et al., 1990; Doering, 1992). Single disulfide variants of α-LA were produced by oligonucleotide-directed mutagenesis of the wild-type gene (Kunkel et al., 1987). Mutations were confirmed by dideoxynucleotide sequencing. The recombinant α-LA has an additional N-terminal methionine, as determined by amino acid sequencing and laser desorption mass spectrometry. Wild-type recombinant α-LA retains full activity in stimulating the galactose transfer reaction (Fitzgerald et al., 1970) and gives the same CD spectra as commercial human α-LA obtained from Sigma (Peng and Kim, unpublished results).

Proteins were expressed in E. coli BL21 (DE3) pLysS (Studier et al., 1990) and purified from inclusion bodies by ion exchange chromatography as described previously (Peng & Kim, 1994). Briefly, fractions containing α-LA were combined, reduced by DTT, and dialyzed against 5% acetic acid. Reduced α-LA was purified from the dialyzed material by reversed-phase HPLC on a Vydac C18 column with an H2O-acetonitrile gradient. Reduced, lyophilized α-LA was oxidized in 4 M GuHCl, 0.2 M Tris, pH 8.5, at room temperature for 48-72 hours. The oxidized α-LA was purified further by reversed-phase HPLC.

Circular dichroism and equilibrium sedimentation.
CD studies were performed on an AVIV 62DS circular dichroism spectrometer equipped with a thermoelectric temperature controller. Far-UV CD spectra were taken at 0 °C in 10 mM Tris, 0.5 mM EDTA, pH 8.5 with 20 µM protein, a 1 mm path length cuvette, and a 1.5 nm bandwidth. For near-UV studies, 40 µM protein, a 1 cm path length cuvette, and a 5 nm bandwidth were used. Thermal denaturations were monitored at 222 nm in a 1 cm path length cuvette. All denaturation curves were greater than 90% reversible. Protein concentrations were determined by the absorbance at 280 nm in 6 M GuHCl (Edelhoch, 1967).

Equilibrium sedimentation was performed in a Beckman XL-A analytical ultracentrifuge. Protein solutions were dialyzed extensively against 10 mM Tris, 0.5 mM EDTA, pH 8.5. Three concentrations (~7 µM, ~20 µM, and ~60 µM) were analyzed at 4 °C and the data were collected at three wavelengths (chosen between 230 and 285 nm) for two rotor speeds of 25,000 and 30,000 rpm. The apparent molecular weights were determined using the program NONLIN (courtesy of M. L. Johnson and J. Lary, University of Connecticut) with a partial specific volume of 0.729 for human α-LA (Laue et al., 1992).

Effective concentration. Effective concentration (Ceff) measurements were performed in an anaerobic chamber (Coy Laboratory Products). Native buffer contains 10 mM Tris, 1 mM EDTA, pH 8.5; denaturing buffer contains 10 mM Tris, 1 mM EDTA, 7.5 M GuHCl, pH 8.5. All buffers were degassed and stored under anaerobic conditions. Solutions of GSSG (sodium salt), GSH, and lyophilized proteins were made fresh in native buffer. The concentration of GSSG was determined spectroscopically at 248 nm using an extinction coefficient of 382 M⁻¹ cm⁻¹ (Huyghues-Despointes & Nelson, 1990). The concentration of GSH was determined by reaction with Ellman's reagent followed by measurement of the absorbance at 412 nm, using an extinction coefficient of 14,150 M⁻¹ cm⁻¹ (Ellman, 1959; Riddles et al., 1983).
The Ceff measurements were initiated by diluting the protein stock solution to a 5 μM final concentration in redox buffer containing GSSG and GSH. The final reaction mix contains 10 mM Tris, 1 mM EDTA, and, for denaturing conditions, 6 M GuHCl. After equilibration periods of 7-8 and 22-24 hours, aliquots of the sample were quenched by adding an equal volume of 10% acetic acid and were analyzed by reversed-phase HPLC. For each variant, 4-6 independent measurements were made under at least two redox conditions. In each case, the results were the same within experimental error starting from either oxidized or reduced proteins, confirming that the reactions were at equilibrium.

RESULTS

Sedimentation equilibrium studies indicate that all of the α-LA variants studied here are monomers at concentrations between 7 and 60 μM at pH 8.5 with no additional salt. The apparent molecular weights of our α-LA variants measured at 7 μM and 20 μM were between 13.3 and 15.3 kDa, in close agreement with the expected molecular weight of the α-LA monomer (14.2 kDa). At higher concentration (60 μM), the apparent molecular weight typically decreases by 8-15% due to the interaction with the counterion gradient. All subsequent experiments were carried out under these conditions.

Far-UV CD spectroscopy indicates that our single-disulfide variants of α-LA have different degrees of secondary structure (Figure 2A). As judged by near-UV CD spectroscopy, however, all variants lack the rigid, extensive side chain packing characteristic of native α-LA (Figure 2B). In addition, unlike native α-LA, all variants exhibit non-cooperative thermal denaturation (data not shown). We used far-UV CD spectroscopy to examine the effect on secondary structure of introducing native or non-native disulfide bonds in the α-helical domain of α-LA (Figure 2A; see footnote 3).
Only one variant, which contains the native 28-111 disulfide, exhibits a higher level of helix content than the reference protein, "all-Ala α-LA" (a variant in which all cysteines are replaced by alanines). Conversely, formation of a non-native disulfide bond with Cys 28 is particularly disruptive to the secondary structure. These results suggest that the 28-111 disulfide bond is important for maintaining the structure of the α-LA molten globule. All other single-disulfide variants, including the variant with the native 6-120 disulfide bond, have levels of secondary structure below that of all-Ala α-LA. Curiously, the variant with the non-native 6-111 disulfide exhibits a slightly higher [θ]222 than the variant with the native 6-120 disulfide, possibly because of strain associated with the 6-120 disulfide bond (see below).

[Footnote 3: Both aromatic side chains and disulfide bonds can produce CD signals in the far-UV region, although the contribution from disulfide bonds is usually small (Woody, 1985; Manning, 1989). The absence of substantial near-UV CD signals in our variants strongly suggests that the aromatic and disulfide contribution to the far-UV CD signal will be minimal. We therefore assume that the dominant contribution to [θ]222 is from secondary structure.]

In order to investigate the specificity of local regions for native-like structure, we measured the Ceff for disulfide bond formation, with glutathione as the reference thiol. Control measurements under strongly denaturing conditions (Table 1, third column) reflect the intrinsic probability for disulfide bond formation in the unfolded α-LA chain. These values agree well (Figure 3) with those predicted by the random walk theory of an unstructured polymer chain (Kauzmann, 1959), indicating that the Cys to Ala substitutions do not significantly perturb the unfolded state. Similar results have been observed previously for other unfolded proteins (Muthukrishnan & Nall, 1991).
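The random walk comparison can be reproduced from the numbers given in this chapter. The sketch below uses the single-parameter form from the Figure 3 legend, Ceff = a n^(-3/2) with a = 694 mM, and the denaturing-buffer values from Table 1. It assumes that n equals the sequence separation between the two cysteines, which follows from the legend's definition if every intervening residue is a non-cysteine. The function and variable names are illustrative.

```python
def random_walk_ceff(n, a_mM=694.0):
    """Predicted Ceff (mM) for closing a loop in an unfolded chain: a * n**(-3/2)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return a_mM * n ** -1.5


# Measured Ceff in denaturing buffer (mM), copied from Table 1.
measured_mM = {
    (6, 28): 9.1,
    (6, 111): 0.48,
    (6, 120): 0.54,
    (28, 111): 0.94,
    (28, 120): 0.90,
    (111, 120): 23.1,
}

for (i, j), observed in sorted(measured_mM.items(), key=lambda kv: kv[0][1] - kv[0][0]):
    n = j - i  # assumption: n - 1 intervening residues, none of them cysteine
    predicted = random_walk_ceff(n)
    print(f"[{i}-{j}]  n = {n:3d}  predicted = {predicted:6.2f} mM  measured = {observed:6.2f} mM")
```

The steep n^(-3/2) dependence is why the short 111-120 loop has a much larger unfolded-state Ceff than the long-range pairs.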
Under native conditions, the Ceff for forming the native disulfide bond, 28-111, is at least an order of magnitude greater than the Ceff for forming any other native or non-native disulfide bond (Table 1, second column). In contrast, the Ceff for forming the native 6-120 disulfide bond is in the same range as for forming a non-native disulfide bond. This apparent lack of preference for forming the native 6-120 disulfide bond is consistent with the observation that the 6-120 disulfide bond is geometrically strained in the native α-LA structure and is hypersensitive towards reduction (Shechter et al., 1973; Kuwajima et al., 1990).

It is particularly interesting that the Ceff of 28-111 is more than tenfold higher than the Ceff of any other disulfide bond, including 28-120 and 6-111. Remarkably, the structural specificity of the polypeptide chain is sufficiently high to allow Cys 28 to distinguish Cys 111 from Cys 120, even though these cysteines are separated by only nine amino acid residues. In addition, Cys 28 and Cys 111 are located far apart in the primary sequence, residing in different secondary structure elements and separated by the β-sheet domain. Thus, the high Ceff of 28-111 must result from interactions between residues that are distant in the sequence, rather than local structural propensities.

The ratio of Ceff's under native and denaturing conditions gives, by thermodynamic linkage, the free energy of stabilization between the oxidized and reduced states (Creighton, 1983; Lin & Kim, 1991). Figure 4 shows the ratios for all six single-disulfide α-LA variants. The ratio for the native 28-111 disulfide bond is greater than 1000, strongly suggesting that this disulfide is stabilized by specific intramolecular interactions. On the other hand, the ratios for all other native and non-native disulfide bonds are much smaller.
The enhancement of the Ceff under native conditions (50-100 fold) for those disulfide bonds that are far apart in the primary sequence (6-120, 6-111, and 28-120) may result from the formation of a compact hydrophobically collapsed state, which brings distant cysteines closer together than in the unfolded random coil.

DISCUSSION

The striking result of our studies is that the Ceff for formation of the native 28-111 disulfide bond is more than ten times higher than the Ceff for formation of any other native or non-native disulfide bond. The α-LA variant with the 28-111 disulfide bond is the only variant with substantially more secondary structure than the all-Ala reference protein. On the other hand, all α-LA variants with non-native disulfide bonds involving Cys 28 have significantly less secondary structure than all-Ala α-LA. Taken together, these results suggest that the local region surrounding the 28-111 disulfide bond has a high preference for adopting a native-like structure in the α-LA molten globule.

Unlike that of the native 28-111 disulfide, the Ceff for formation of the other native disulfide, 6-120, is within the same range as that of the non-native disulfide bonds. This indicates that the 6-120 disulfide bond is not stabilized by specific interactions or that the stabilizing interactions are offset by unfavorable interactions such as disulfide bond strain. (The 6-120 disulfide bond can still have a non-specific stabilization effect on the molten globule by reducing the chain entropy of the unfolded state; see Ikeguchi et al., 1992). Thus, molten globules can be heterogeneous, containing regions which have varying degrees of specificity towards native-like structure.

A rough quantitation of the local structural specificity around the 28-111 disulfide bond in the α-LA molten globule can be made.
The Ceff's for each of the non-native disulfide bonds in the helical domain of α-LA fall within a narrow range (50 ± 20 mM), which can be taken as a representative Ceff value in the non-specific hydrophobically collapsed protein. The Ceff for forming the native 28-111 disulfide bond is, however, twenty times higher than the average value for forming a non-native disulfide bond. Assuming, to a first approximation, that the ratio of Ceff between native and non-native disulfide bonds reflects the difference in free energy between native and non-native structures, our results suggest that formation of the native-like structure around the 28-111 disulfide bond contributes ~2 kcal/mole of free energy to the stability of the molten globule relative to species with non-native structures.

The single-disulfide variant of α-LA with the native 28-111 disulfide bond (denoted α-LA[28-111]) is likely to be a useful model system for future studies of molten globules. Previous studies of the α-LA molten globule are complicated by the presence of multiple disulfide isomers and disulfide exchange reactions. α-LA[28-111] exhibits the characteristics of a molten globule under a broad range of conditions, including near neutral pH. The single disulfide bond simplifies protein production and folding, and can be used as a reporter of backbone topology in Ceff studies.

Two observations may explain why the 28-111 disulfide bond has a significantly higher Ceff than other disulfide bonds. First, the 28-111 disulfide bond is encompassed by regions that have a high content of helical secondary structure (helices B, C, and D are all nearby). Second, this disulfide is buried and is located near many hydrophobic residues. Our results suggest that hydrophobic interactions between regions with high local secondary structure propensities may play an important role in protein folding, even in the absence of extensive side chain packing interactions, by stabilizing molten globule intermediates.
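The rough free energy estimate given earlier in this Discussion can be checked with ΔΔG = RT ln(ratio). The sketch below is a minimal illustration, assuming a temperature of 25°C (the text does not state the temperature of the Ceff measurements) and using the twenty-fold ratio and the Table 1 value of 1.07 × 10³ mM against the representative 50 mM. The function name is illustrative.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def ddg_kcal_per_mol(ceff_ratio, temperature_K=298.15):
    """Free energy difference implied by a ratio of effective concentrations.

    Uses ddG = R * T * ln(ratio); the default temperature of 25 C is an assumption.
    """
    if ceff_ratio <= 0:
        raise ValueError("ratio must be positive")
    return R_KCAL_PER_MOL_K * temperature_K * math.log(ceff_ratio)


# "Twenty times higher" than the representative non-native value.
print(f"ratio 20   -> {ddg_kcal_per_mol(20.0):.2f} kcal/mol")

# Same estimate from Table 1: native 28-111 (1.07e3 mM) over the representative 50 mM.
print(f"ratio {1.07e3 / 50.0:.1f} -> {ddg_kcal_per_mol(1.07e3 / 50.0):.2f} kcal/mol")
```

Both inputs give a value a little under 2 kcal/mol at the assumed temperature, in line with the rounded figure in the text.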
ACKNOWLEDGMENTS

We thank B. A. Schulman for many helpful suggestions and discussions and members of the Kim group for comments on the manuscript.

REFERENCES

Acharya, K. R., Ren, J., Stuart, D. I. & Phillips, D. C. (1991) J. Mol. Biol. 221, 571-581.
Acharya, K. R., Stuart, D. I., Walker, N. P. C., Lewis, M. & Phillips, D. C. (1989) J. Mol. Biol. 208, 99-127.
Alexandrescu, A. T., Evans, P. A., Pitkeathly, M., Baum, J. & Dobson, C. M. (1993) Biochemistry 32, 1707-1718.
Baum, J., Dobson, C. M., Evans, P. A. & Hanley, C. (1989) Biochemistry 28, 7-13.
Christensen, H. & Pain, R. H. (1991) Eur. Biophys. J. 19, 221-229.
Creighton, T. E. (1983) Biopolymers 22, 49-58.
Doering, D. S. (1992) Functional and Structural Studies of a Small F-actin Binding Domain, Massachusetts Institute of Technology.
Dolgikh, D. A., Abaturov, L. V., Bolotina, I. A., Brazhnikov, E. V., Bychkova, V. E., Gilmanshin, R. I., Lebedev, Y. O., Semisotnov, G. V., Tiktopulo, E. I. & Ptitsyn, O. B. (1985) Eur. Biophys. J. 13, 109-121.
Dolgikh, D. A., Gilmanshin, R. I., Brazhnikov, E. V., Bychkova, V. E., Semisotnov, G. V., Venyaminov, S. Y. & Ptitsyn, O. B. (1981) FEBS Lett. 311-315.
Edelhoch, H. (1967) Biochemistry 6, 1948-1954.
Ellman, G. L. (1959) Arch. Biochem. Biophys. 82, 70-77.
Ewbank, J. J. & Creighton, T. E. (1991) Nature 350, 518-520.
Feng, Y.-q., Sligar, S. G. & Wand, A. J. (1994) Nature Structural Biology 1, 30-35.
Fitzgerald, D. K., Colvin, B., Mawal, R. & Ebner, K. E. (1970) Analytical Biochemistry 36, 43-61.
Haynie, D. T. & Freire, E. (1993) Proteins Struct. Funct. Genet. 16, 115-140.
Huyghues-Despointes, B. M. P. & Nelson, J. W. (1990) in Current Research in Protein Chemistry pp 457-466, Academic Press, New York.
Ikeguchi, M., Kuwajima, K., Mitani, M. & Sugai, S. (1986) Biochemistry 25, 6965-6972.
Ikeguchi, M., Sugai, S., Fujino, M., Sugawara, T. & Kuwajima, K. (1992) Biochemistry 31, 12695-12700.
Kauzmann, W.
(1959) in Sulfur in Proteins (Benesch, R., Benesch, R.E., Boyer, P.D., Klotz, I.M., Middlebrook, W.R., Szent-Gyorgyi, A.G. & Schwarz, D.R., Eds.) pp 93-108, Academic Press, New York.
Kunkel, T. A., Roberts, J. D. & Zakour, R. A. (1987) Methods in Enzymology 154, 367-382.
Kuwajima, K. (1989) Proteins Struct. Funct. Genet. 6, 87-103.
Kuwajima, K., Hiraoka, Y., Ikeguchi, M. & Sugai, S. (1985) Biochemistry 24, 874-881.
Kuwajima, K., Ikeguchi, M., Sugawara, T., Hiraoka, Y. & Sugai, S. (1990) Biochemistry 29, 8240-8249.
Kuwajima, K., Nitta, K., Yoneyama, M. & Sugai, S. (1976) J. Mol. Biol. 106, 359-373.
Laue, T. M., Shah, B. D., Ridgeway, T. M. & Pelletier, S. L. (1992) in Analytical Ultracentrifugation in Biochemistry and Polymer Science (Harding, S.E., Rowe, A.J. & Horton, J.C., Eds.) pp 90-125, The Royal Society of Chemistry, Cambridge.
Lin, T.-Y. & Kim, P. S. (1989) Biochemistry 28, 5282-5287.
Lin, T.-Y. & Kim, P. S. (1991) Proc. Natl. Acad. Sci. USA 88, 10573-10577.
Manning, M. C. (1989) J. Pharm. Biomed. Anal. 7, 1103-1119.
Muthukrishnan, K. & Nall, B. T. (1991) Biochemistry 30, 4706-4710.
Nozaka, M., Kuwajima, K., Nitta, K. & Sugai, S. (1978) Biochemistry 17, 3753-3758.
Page, M. I. & Jencks, W. P. (1971) Proc. Natl. Acad. Sci. USA 68, 1678-1683.
Peng, Z.-y. & Kim, P. S. (1994) Biochemistry 33, 2136-2141.
Priestle, J. P. (1988) J. Appl. Cryst. 21, 572-576.
Ptitsyn, O. B. (1987) J. Protein Chem. 6, 273-293.
Ptitsyn, O. B. (1992) in Protein Folding (Creighton, T.E., Ed.) pp 243-300, W. H. Freeman and Co., New York.
Redfield, C., Smith, R. A. G. & Dobson, C. M. (1994) Nature Structural Biology 1, 23-29.
Riddles, P. W., Blakeley, R. L. & Zerner, B. (1983) Methods in Enzymology 91, 49-60.
Shechter, Y., Patchornik, A. & Burstein, Y. (1973) Biochemistry 12, 3407-3413.
Studier, F. W., Rosenberg, A. H., Dunn, J. J. & Dubendorff, J. W. (1990) Methods in Enzymology 185, 60-89.
Woody, R. W. (1985) in The Peptides: Analysis, Synthesis, Structure (Hruby, V.J., Ed.)
pp 15-114, Academic Press, Orlando.
Wu, L. C., Peng, Z.-y. & Kim, P. S. (1994) Submitted.
Xie, D., Bhakuni, V. & Freire, E. (1991) Biochemistry 30, 10673-10678.

TABLE 1. Effective concentrations of disulfide bond formation in the helical domain of α-LA†

α-LA variant    Ceff in native buffer (mM)    Ceff in denaturing buffer (mM)
[28-111]        (1.07 ± 0.11) × 10³           0.94 ± 0.13
[6-120]         46.7 ± 4.1                    0.54 ± 0.08
[6-28]          62.4 ± 3.3                    9.1 ± 0.7
[6-111]         29.1 ± 4.0                    0.48 ± 0.13
[28-120]        71.7 ± 1.7                    0.90 ± 0.09
[111-120]       40.0 ± 1.7                    23.1 ± 1.3

†Each data point consists of 4-6 independent measurements, taken after 7-8 hrs of equilibration, starting from both oxidized and reduced proteins, for at least two different redox conditions. The Ceff's obtained after 22-24 hrs equilibration are typically 10-15% higher; otherwise, the pattern of Ceff's remains the same.

FIGURE LEGENDS

Figure 1. Schematic representation of α-LA (Priestle, 1988). The α-helical domain (residues 1-37 and 85-123) and the two disulfide bonds in the α-helical domain (28-111 and 6-120) are shaded (Acharya et al., 1989; 1991).

Figure 2. CD spectra of six single disulfide variants. (A) Far-UV CD spectra of [6-28] (open circles), [6-111] (open squares), [6-120] (filled squares), [28-111] (filled circles), [28-120] (open diamonds), and [111-120] (open triangles). Filled symbols represent variants with a native disulfide, whereas open symbols represent variants with a non-native disulfide. The reference for comparison of secondary structure is "all-Ala α-LA", an α-LA variant in which all cysteines are replaced by alanines (pluses). (B) Near-UV CD spectra of the same set of proteins. All variants show much weaker near-UV CD signals as compared to native α-LA (inverted filled triangles), indicating disrupted side-chain packing.
Figure 3. The effective concentration for formation of an intramolecular disulfide bond in the helical domain of α-LA under denaturing conditions, plotted as a function of the number of amino acid residues in the intervening loop. The line represents a single-parameter fit to the random walk model of an unfolded polypeptide chain (Kauzmann, 1959), where $C_{\mathrm{eff}} = a\,n^{-3/2}$ and n-1 is the number of non-cysteine residues in the loop. The best fit is obtained when a = 694 mM.

Figure 4. The ratio of Ceff between native and denaturing conditions (Lin & Kim, 1991) for each of the six possible disulfides in the helical domain of α-LA. The large ratio for the 28-111 disulfide bond suggests that the interactions between amino acid residues in the vicinity of this disulfide are largely responsible for stabilizing the molten globule of α-LA.

[Figure 1: schematic of α-LA; labels: N terminus and disulfide bonds 6-120, 28-111, 61-77, and 73-91.]
[Figure 2: panels A and B, CD signal versus wavelength (nm).]
[Figure 3: plot image not recoverable.]
[Figure 4: bar chart of the Ceff ratio by disulfide bond ([28-111], [6-120], [28-120], [6-111], [6-28], [111-120]); labeled values 1.14 × 10³, 6.9, and 1.7.]

CHAPTER 4

A Specific Hydrophobic Core in the α-Lactalbumin Molten Globule

ABSTRACT

Molten globules are partially structured protein folding intermediates that adopt a native-like overall backbone topology in the absence of extensive detectable tertiary interactions. It is important to determine the extent of specific tertiary structure present in molten globules and to understand the role of specific sidechain packing in stabilizing and specifying molten globule structure.
Previous studies indicate that a small degree of specific core sidechain packing stabilizes the structures of the cytochrome c, apomyoglobin, and staphylococcal nuclease molten globules. Here we investigate the extent of specific sidechain packing in the molten globule of α-lactalbumin (α-LA), a highly fluctuating, non-cooperatively formed molten globule. By analyzing a set of point mutations in the helical domain of α-LA, we have identified a stabilizing hydrophobic core. Moreover, this core corresponds to a previously identified structural subdomain and contains some native-like packing interactions. Our results indicate that native-like packing of core amino acids helps stabilize molten globules and suggest that some specific interactions can exist in even highly dynamic, fluctuating species.

INTRODUCTION

Molten globules are compact, partially structured forms of proteins thought to be general intermediates in protein folding (1-3). Molten globules have high levels of native-like secondary structure arranged in an overall native-like fold, but lack rigid sidechain packing and extensive detectable tertiary interactions. As such, molten globules can be viewed as a "low-resolution solution" to the protein folding problem.

It is important to understand how an overall native-like fold can be established in the apparent absence of extensive specific tertiary interactions. One extreme possibility is that molten globule structure is formed by non-specific collapse around a core of dynamic, loosely interacting hydrophobic residues. In this case, specific packing interactions may be minimal or non-existent, and molten globule structure may be determined by the global distribution of hydrophobic and hydrophilic amino acids, as opposed to specific amino acid identity. Another extreme possibility is that undetected, highly specific tertiary interactions stabilize molten globules. In this case, only a few key residues may determine molten globule structure.
The extent of specific native packing that exists in the molten globules of apomyoglobin, cytochrome c, and staphylococcal nuclease has been assessed by studying point mutants of hydrophobic core residues (4-9). In these studies, the effects of mutations on the stabilities of the molten globule states are small, suggesting that tertiary interactions are only partially formed in the molten globule. Nonetheless, the effects of point mutations on the stabilities of the native and molten globule states are correlated. Moreover, even conservative mutations can have measurable effects. These results suggest that a small degree of specific native-like packing stabilizes and helps determine the structure of the molten globule.

The molten globule folding intermediate of α-lactalbumin (α-LA), a two-domain protein containing four disulfide bonds, consists of a native-like helical domain and a largely unstructured beta sheet domain (10-13). Molten globules can range in orderliness from highly dynamic species with poor NMR chemical shift dispersion and noncooperatively formed structure, to highly ordered species with substantial NMR chemical shift dispersion and cooperatively formed structure. The α-LA molten globule is highly dynamic, yielding NMR spectra with broad linewidths and poor chemical shift dispersion. Moreover, formation of the α-LA molten globule is non-cooperative, as judged by proline scanning mutagenesis and denaturation transitions monitored at global and residue-specific levels (14-16). Furthermore, thermal denaturation of the α-LA molten globule is accompanied by little excess heat absorption, suggesting that the core of the molten globule may be solvent exposed, although this is in debate (17-19). The extent of specific packing interactions in such a dynamic fluctuating species is unclear, but may be elucidated by systematic mutagenesis in the core of the molten globule.

There are two hydrophobic cores in the structure of native α-LA (Fig. 1; 20).
One, called the hydrophobic box, comprises residues from the C- and D-helices and the beta sheet domain (20). Another comprises residues from the A-, B-, and 3₁₀-helices. Measurements of the equilibrium constants for formation of native and non-native disulfide bonds in the helical domain of the α-LA molten globule indicate that the region around the 28-111 disulfide bond plays an important stabilizing role (21). Although the 28-111 disulfide bond connects the B- and D-helices, which lie near the hydrophobic box, no direct evidence indicates that the hydrophobic box forms the core of the molten globule. On the other hand, NMR studies delineate a structural subdomain in the α-LA molten globule, comprising the A-, B-, and 3₁₀-helices (15).

Here we analyze a collection of point mutants in the helical domain of α-LA, using the equilibrium constant for formation of the 28-111 disulfide bond to monitor effects on the stability of the molten globule. We find that residues from the A/B/3₁₀ subdomain, as opposed to the hydrophobic box, form the major stabilizing hydrophobic core. Moreover, this core contains specific native-like packing interactions that may help specify the native-like structure of the α-LA molten globule.

METHODS

Protein Production. Full length α-LA28-111 and variants thereof were produced as described previously (21). In summary, mutations were introduced by single-stranded mutagenesis and verified by DNA sequencing. Inclusion bodies of expressed proteins were washed with sucrose and Triton buffers, solubilized and reduced in urea/DTT, and purified by anion exchange chromatography and reverse phase HPLC. Reduced proteins were oxidized in 4 M GdnHCl, 0.1 M Tris, pH 8.8, for 48 hours at room temperature and further purified by reverse phase HPLC. Protein identity was confirmed by mass spectrometry.

Sedimentation Equilibrium. Sedimentation equilibrium experiments were performed on a Beckman XL-A analytical ultracentrifuge as described previously (21).
Protein solutions were dialyzed overnight against 10 mM Tris, 0.5 mM EDTA, pH 8.8, loaded at an initial concentration of 20 μM, and analyzed at 23 and 27 krpm. Data were acquired at three wavelengths per rotor speed and processed simultaneously using a non-linear least squares fitting routine (Nonlin; 22). Solvent density and protein partial specific volume were calculated according to solvent and protein composition, respectively (23). The data fit well to a model for an ideal monomer (±5%), with no systematic deviation of the residuals.

Circular Dichroism (CD) Spectroscopy. CD spectroscopy was performed with an Aviv 62 DS spectrometer as described previously (21). Proteins were dissolved in 10 mM Tris, 0.5 mM EDTA, pH 8.8, to a concentration of 10 μM, and spectra were acquired at 0°C. Protein concentrations were determined by absorbance at 280 nm in 6 M GuHCl, 20 mM sodium phosphate, pH 6.5, using an extinction coefficient calculated based on tryptophan, tyrosine, and cystine content (24).

Effective Concentration Measurements. Effective concentration measurements (Ceff) were performed in an anaerobic chamber (Coy Laboratory Products). Native buffer consists of 10 mM Tris, 0.5 mM EDTA, pH 8.8. Denaturing buffer consists of 6 M GdnHCl, 10 mM Tris, 0.5 mM EDTA, pH 8.8. All buffers were degassed and stored under anaerobic conditions. Solutions of oxidized glutathione (GSSG), reduced glutathione (GSH), and protein were prepared fresh in buffer. The concentration of GSSG was determined spectroscopically at 248 nm using an extinction coefficient of 382 M⁻¹ cm⁻¹ (25). The concentration of GSH was determined by reaction with Ellman's reagent, followed by measurement of the absorbance at 412 nm, using an extinction coefficient of 14,150 M⁻¹ cm⁻¹ (26). Samples of 5 μM protein in native or denaturing buffer were equilibrated for 28, 72, and 96 hours (native buffer) or 12 and 24 hours (denaturing buffer), starting from both oxidized and reduced proteins.
Native and denaturing buffers contained GSH and GSSG in concentrations that established a glutathione reference redox of 1 M and 0.5 M (native buffer) or 2 5 mM (denaturing buffer), where the reference redox is defined as [GSH]²/[GSSG]. Reactions were quenched by addition of 20% (v/v) acetic acid and analyzed by reverse phase HPLC. The amounts of oxidized and reduced proteins were determined by integrating peak areas.

RESULTS

Mutagenesis Scheme

We examined a collection of point mutations in and around the hydrophobic cores of α-LA, using the native structure of α-LA as a guide to probe the molten globule state (Fig. 1). One hydrophobic core, the hydrophobic box, contains residues from the C- and D-helices of the helical domain and parts of the beta-sheet domain. Two central residues in the hydrophobic box are I95 and W104, which lie in the C-helix and just proximal to the D-helix, respectively. Both residues are fully buried in the native structure and make extensive contacts with each other. The other hydrophobic core in α-LA consists of residues from the A-, B-, and 3₁₀-helices, associated with the A/B/3₁₀ subdomain. Three central residues in this core are L8, I27, and W118, which lie in the A-, B-, and 3₁₀-helices, respectively. Thus, we examined a series of mutants at these and other nearby positions (Q10, M30, and E116). Our reference molecule is α-LA28-111, a single disulfide variant of α-LA containing only the 28-111 disulfide bond, with all other cysteines changed to alanine. This variant has been characterized previously and forms a molten globule with properties similar to those of the wildtype α-LA molten globule with all four disulfide bonds (21).

Global Structure

Since molten globules are prone to aggregation, we determined the oligomerization state of α-LA28-111 and its variants using sedimentation equilibrium. Under native buffer conditions, all species studied here are monomeric (Table 1).
Therefore, the effects of mutations in α-LA28-111 do not result from aggregation.

We examined the global structure of α-LA28-111 mutants by determining and comparing far-UV CD spectra. All mutants of α-LA28-111 have substantial helix content (Fig. 2). Some variants have small decreases in helix content as compared to the α-LA28-111 reference, indicated by decreases in the magnitude of the CD signal at 222 nm (Table 1). Since the α-LA molten globule is formed non-cooperatively, it is unclear whether the decreased helix content results from global destabilization of the molecule or local helix unfolding. Proline scanning mutagenesis and NMR studies indicate that single helices in the α-LA molten globule can unfold independently of structure in the remainder of the molecule (14, 15). The point mutations examined here may have similar effects. Interestingly, for a series of mutants at a given site, changes in helix content are roughly correlated with destabilizing effects assessed by the effective concentration for formation of the 28-111 disulfide bond.

Effective Concentration Delineated Core Structure

We investigated the effects of mutations on the stability of the α-LA molten globule by measuring the effective concentration for formation of the 28-111 disulfide bond (Ceff), using glutathione as a reference thiol. Ceff is the ratio of equilibrium constants for intra- and inter-molecular disulfide reactions and reflects the extent to which specific interactions in the polypeptide chain favor the formation of the disulfide bond (27-30). Previous studies indicate that native-like structure in the α-LA molten globule significantly enhances the Ceff of 28-111 (21). Thus, we assessed the effects of mutations in the α-LA molten globule by quantitating and comparing Ceff measurements.

It is important to establish that effects on the Ceff of 28-111 due to mutations do not arise from changes in the unfolded state.
Control experiments carried out under denaturing conditions indicate that the Ceff of the 28-111 disulfide bond is unchanged in the unfolded state, regardless of mutations (Table 2).

Under native conditions, mutations in the hydrophobic cores of the α-LA molten globule have markedly different effects (Table 2). Mutations of I95 and W104 in the hydrophobic box cause small decreases (up to 20%) in the Ceff of 28-111, despite significant changes in sidechain size and hydrophobicity. In contrast, mutations of residues L8, I27, and W118 in the A/B/3₁₀ subdomain cause substantial decreases in the Ceff of 28-111 (up to 80%). Importantly, mutations of surface residues Q10 and E116, which lie on the outside of the A/B/3₁₀ subdomain (A- and 3₁₀-helices, respectively), do not destabilize the molten globule. Thus, the A/B/3₁₀ core plays the major role in stabilizing the α-LA molten globule, while the hydrophobic box may be largely unstructured. Moreover, buried hydrophobic interactions, as opposed to surface interactions, stabilize molten globule structure. Interestingly, mutations of M30, a fully buried B-helix residue that makes contacts with both the hydrophobic box and the A/B/3₁₀ subdomain in the native α-LA structure, cause significant decreases in the Ceff of 28-111, suggesting that M30 is associated with the A/B/3₁₀ subdomain.

Specific Packing Interactions

We assessed the specificity of packing interactions in the A/B/3₁₀ core by systematically mutating L8, I27, M30, and W118 to larger, smaller, and similarly sized hydrophobic amino acids. If the A/B/3₁₀ core consists of highly dynamic, loose hydrophobic interactions, then the core might be expected to accommodate considerable structural variation. Instead, neither larger nor smaller hydrophobic amino acids are well tolerated at residue I27, causing substantial decreases in the Ceff of 28-111.
Moreover, even a relatively conservative mutation of I27 to leucine causes a significant decrease (28%) in the Ceff of 28-111. Similar effects are observed at residues L8 and M30, where neither larger, smaller, nor similarly sized amino acids are well accommodated. Thus, only wildtype residues comprising the A/B/3₁₀ hydrophobic core are well tolerated by the molten globule structure, suggesting that the core of the α-LA molten globule contains specific native-like packing interactions.

Interestingly, mutation of I95 in the hydrophobic box to either larger or smaller amino acids has negligible effects on the stability of the α-LA molten globule. The I95F mutation may even slightly stabilize the molten globule. These effects are consistent with an unstructured or loosely organized hydrophobic box. In addition, an I27A, W118A double mutant shows nearly additive effects of each single mutation (the Ceff of the double mutant is nearly the product of the Ceff's for each single mutant, and the decrease in CD signal at 222 nm for the double mutant is nearly the sum of the decreases for each single mutant). This is consistent with non-cooperative formation of the hydrophobic core.

DISCUSSION

Hydrophobic Core Location

We have investigated the hydrophobic core structure of the α-LA molten globule by systematic mutagenesis of specific amino acids, using the Ceff of the 28-111 disulfide bond to assess effects on stability. Of the two hydrophobic cores in the helical domain of α-LA, the one associated with the A/B/3₁₀ subdomain plays the predominant role in stabilizing the molten globule. Mutations of residues in this core cause substantial decreases in the Ceff of the 28-111 disulfide bond. In contrast, mutations in the hydrophobic box have little or no impact on the Ceff of 28-111.

Our results are consistent with other studies that assess core structure in the molten globules of α-LA and lysozyme, a protein homologous to α-LA.
In one study, alanine scanning mutagenesis of all hydrophobic residues in the helical domain of the α-LA molten globule, using Ceff of the 28-111 disulfide as a measure of stability, describes a core structure similar to that deduced here (31). In another study, NMR measurements of backbone amide hydrogen exchange in a lysozyme molten globule delineate a protected native-like hydrophobic core similar to that observed in this study (32).

Our results complement a residue-specific NMR study of the denaturation of the wildtype α-LA molten globule (with all four disulfide bonds) (15). In that study, the regions of structure most resistant to unfolding correspond to the A/B/3₁₀ subdomain and encompass L8, I27, M30, and W118, the residues shown here to play a substantial role in stabilizing the α-LA molten globule. In contrast, residues I95 and W104 (i.e., the hydrophobic box) are not part of the structural subdomain defined by NMR, since they unfold at lower concentrations of denaturant prior to disruption of the A/B/3₁₀ subdomain. We find that I95 and W104 tolerate a wide variety of amino acid substitutions with little or no effect on the stability of the α-LA molten globule. Taken together, the NMR studies and our Ceff studies indicate that the hydrophobic box is poorly formed in the α-LA molten globule, resulting in loosely associated and marginally stable structure.

Native-like Topology of the Molten Globule

Notably, our mutagenesis is based on the native structure of α-LA and involves residues from distant parts of the polypeptide chain brought together by the tertiary fold. It is striking that our mutagenesis studies are consistent with this structural model. This extends to the observation that buried core residues, as opposed to surface residues, play the dominant stabilizing role in the molten globule, as has been observed in native proteins (33, 34).
Previous studies indicate that the helical domain in the α-LA molten globule contains native-like helices arranged in a native-like fold (10-12). Our results provide additional evidence that molten globules have a native-like fold and suggest that native-like structure helps stabilize the molten globule.

Specific Native-like Packing

Our results indicate that specific native-like packing exists in the core of the α-LA molten globule. Neither larger nor smaller residues are well tolerated in the A/B/3₁₀ core. Moreover, even conservative amino acid mutations cause significant destabilization of the molten globule. Specific native-like packing has also been observed in the molten globules of apomyoglobin, cytochrome c, and staphylococcal nuclease (4-9). Significantly, the α-LA molten globule is the most dynamic of these species, showing evidence for highly fluctuating structure and non-cooperative folding. It is interesting that specific native-like packing and a native-like fold can exist in even a highly fluctuating molten globule such as that of α-LA, which may correspond to an early folding intermediate in which hydrophobic collapse, but not formation of extensive fixed tertiary structure, has occurred. Our studies support structural models of molten globules in which secondary structure comprising the core scaffold of the protein is formed in the molten globule state, while loop and peripheral regions are flexible and disordered (1, 2).

Energetics

For a non-cooperatively folded molten globule, traditional methods of assessing stability, which model denaturation transitions assuming two-state unfolding, are not useful. Ceff measurements can circumvent this problem since, by thermodynamic linkage, the ratio of Ceff in the native versus unfolded protein is related to the difference in the free energy of folding between the oxidized and reduced states, regardless of the nature of the unfolding transition (27, 28).
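As a rough numerical sketch of this linkage relationship, a ratio of Ceff values can be converted to a free energy difference through the standard relation -RT ln(ratio). The function name, the assumed temperature of 298.15 K, and the choice of kcal/mole units are illustrative assumptions rather than part of the original analysis; the two Ceff values are the native-buffer entries for wildtype and I27A in Table 2.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def ddg_lower_limit(ceff_mutant_mM, ceff_wildtype_mM, temp_K=298.15):
    """Convert a Ceff ratio into a free energy difference (kcal/mole).

    Assumes the mutation leaves Ceff in the unfolded state unchanged, so
    the native-buffer ratio alone carries the linkage information.
    The default temperature is an assumption for illustration.
    """
    ratio = ceff_mutant_mM / ceff_wildtype_mM
    return -R_KCAL * temp_K * math.log(ratio)


# Native-buffer Ceff values (mM) taken from Table 2.
CEFF_WILDTYPE = 1100.0
CEFF_I27A = 210.0

# Prints a value close to 1 kcal/mole for the I27A mutation.
print(round(ddg_lower_limit(CEFF_I27A, CEFF_WILDTYPE), 2))
```

A mutant with the same Ceff as wildtype gives zero by construction, and smaller Ceff values give larger positive differences.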
Since the mutations studied here have no effect on Ceff in the unfolded state, one can obtain what is likely a lower limit for the effect of a particular mutation on the folding of the oxidized protein by comparing the Ceff of the wildtype and mutant proteins under native conditions (Fig. 3). Using this approximation, the most destabilizing point mutation observed here (I27A, which yields an approximately 80% decrease in Ceff of 28-111) corresponds to ~1 kcal/mole loss in stabilization between the oxidized and reduced states, a lower limit for the effect of the mutation on the oxidized state. For comparison, a similar mutation of a fully buried core hydrophobic residue in a native protein can yield a three to five kcal/mole loss in stabilization (34). Thus, the majority of the native-like packing interactions that we observe in the α-LA molten globule, which yield significantly smaller decreases in Ceff of 28-111 than I27A, may be relatively small in magnitude.

ACKNOWLEDGEMENTS

We thank Zheng-yu Peng, Brenda Schulman, Debra Ehrgott, Michael Root, and members of the Kim lab for helpful discussions.

REFERENCES

1. Ptitsyn, O. B. 1992. The Molten Globule State. In Protein Folding, ed. T. E. Creighton, 243-300. New York: W.H. Freeman and Co.
2. Kuwajima, K. 1989. The Molten Globule State in Protein Folding. Proteins: Struct., Func., and Genet. 6: 87-103.
3. Dobson, C. M. 1992. Unfolded Proteins, Compact States, and Molten Globules. Curr. Opin. Struct. Biol. 2: 6-12.
4. Colon, W., and Roder, H. 1996. Kinetic Intermediates in the Formation of the Cytochrome c Molten Globule. Nat. Struct. Biol. 3: 1019-1025.
5. Colon, W., Elove, G.A., Wakem, L.P., Sherman, F., and Roder, H. 1996. Side Chain Packing of the N- and C-Terminal Helices Plays a Critical Role in the Kinetics of Cytochrome c Folding. Biochemistry. 35: 5538-5549.
6. Kay, M. S., and Baldwin, R.L. 1996. Packing Interactions in the Apomyoglobin Folding Intermediate. Nat. Struct. Biol. 3: 439-445.
7. Marmorino, J.
L., and Pielak, G.J. 1995. A Native Tertiary Interaction Stabilizes the A State of Cytochrome c. Biochemistry. 34: 3140-3143.
8. Lin, L., Pinker, R.J., Forde, K., Rose, G.D., and Kallenbach, N.R. 1994. Molten Globular Characteristics of the Native State of Apomyoglobin. Nat. Struct. Biol. 1: 447-452.
9. Carra, J. H., Anderson, E.A., and Privalov, P.L. 1994. Thermodynamics of Staphylococcal Nuclease Denaturation. II. The A-State. Protein Sci. 3: 952-959.
10. Wu, L. C., Peng, Z.-y., and Kim, P.S. 1995. Bipartite Structure of the α-Lactalbumin Molten Globule. Nat. Struct. Biol. 2: 281-286.
11. Schulman, B. A., Redfield, C., Peng, Z.-y., Dobson, C.M., and Kim, P.S. 1995. Different Subdomains are Most Protected From Hydrogen Exchange in the Molten Globule and Native States of Human α-Lactalbumin. J. Mol. Biol. 253: 651-657.
12. Alexandrescu, A. T., Evans, P.A., Pitkeathly, M., Baum, J., and Dobson, C.M. 1993. Structure and Dynamics of the Acid-Denatured Molten Globule State of α-Lactalbumin: A Two-Dimensional NMR Study. Biochemistry. 32: 1707-1718.
13. Kuwajima, K. 1996. The Molten Globule State of α-Lactalbumin. FASEB J. 10: 102-109.
14. Schulman, B. A., and Kim, P.S. 1996. Proline Scanning Mutagenesis of a Molten Globule Reveals Non-Cooperative Formation of a Protein's Overall Topology. Nat. Struct. Biol. 3: 682-687.
15. Schulman, B. A., Kim, P.S., Dobson, C.M., and Redfield, C. 1997. A Residue-Specific NMR View of the Non-Cooperative Unfolding of a Molten Globule. Nat. Struct. Biol. 4: 630-634.
16. Shimizu, A., Ikeguchi, M., and Sugai, S. 1993. Unfolding of the Molten Globule State of α-Lactalbumin Studied by ¹H NMR. Biochemistry. 32: 13198-13203.
17. Pfeil, W., Bychkova, V.E., and Ptitsyn, O.B. 1986. Physical Nature of the Phase Transition in Globular Proteins: Calorimetric Study of Human α-Lactalbumin. FEBS Lett. 198: 287-291.
18. Xie, D., Bhakuni, V., and Freire, E. 1991.
Calorimetric Determination of the Energetics of the Molten Globule Intermediate in Protein Folding: Apo-α-Lactalbumin. Biochemistry. 30: 10673-10678.
19. Yutani, K., Ogasahara, K., and Kuwajima, K. 1992. Absence of the Thermal Transition in Apo-α-Lactalbumin in the Molten Globule State. J. Mol. Biol. 228: 347-350.
20. Acharya, K. R., Ren, J., Stuart, D.I., Phillips, D.C., and Fenna, R.E. 1991. Crystal Structure of Human α-Lactalbumin at 1.7 Å Resolution. J. Mol. Biol. 221: 571-581.
21. Peng, Z.-y., Wu, L.C., and Kim, P.S. 1995. Local Structural Preferences in the α-Lactalbumin Molten Globule. Biochemistry. 34: 3248-3252.
22. Johnson, M. L., Correia, J.J., Yphantis, D.A., and Halvorson, H.R. 1981. Analysis of Data From the Analytical Ultracentrifuge by Nonlinear Least-Squares Techniques. Biophys. J. 36: 575-588.
23. Laue, T. M., Shah, B.D., Ridgeway, T.M., and Pelletier, S.L. 1992. Computer-Aided Interpretation of Analytical Sedimentation Data for Proteins. In Analytical Ultracentrifugation in Biochemistry and Polymer Science, 90-125. Cambridge: The Royal Society of Chemistry.
24. Edelhoch, H. 1967. Spectroscopic Determination of Tryptophan and Tyrosine in Proteins. Biochemistry. 6: 1948-1954.
25. Chau, M.-H., and Nelson, J.W. 1991. Direct Measurement of the Equilibrium Between Glutathione and Dithiothreitol by High Performance Liquid Chromatography. FEBS Lett. 291: 296-298.
26. Ellman, G. L. 1959. Tissue Sulfhydryl Groups. Arch. Biochem. Biophys. 82: 70-77.
27. Lin, T.-Y., and Kim, P.S. 1989. Urea Dependence of Thiol-Disulfide Equilibria in Thioredoxin: Confirmation of the Linkage Relationship and a Sensitive Assay for Structure. Biochemistry. 28: 5282-5287.
28. Lin, T.-Y., and Kim, P.S. 1991. Evaluating the Effects of a Single Amino Acid Substitution on Both the Native and Denatured States of a Protein. Proc. Natl. Acad. Sci. USA. 88: 10573-10577.
29. Creighton, T. E. 1983. An Empirical Approach to Protein Conformation Stability and Flexibility. Biopolymers.
22: 44-58.
30. Page, M. I., and Jencks, W.P. 1971. Entropic Contributions to Rate Accelerations in Enzymic and Intramolecular Reactions and the Chelate Effect. Proc. Natl. Acad. Sci. USA. 68: 1678-1683.
31. Song, J., Bai, P., Luo, L., and Peng, Z.-y. Determinants of the Native Backbone Topology in a Protein Folding Intermediate. Submitted.
32. Morozova, L. A., Haynie, D.T., Arico-Muendel, C., Van Dael, H., and Dobson, C.M. 1995. Structural Basis of the Stability of a Lysozyme Molten Globule. Nat. Struct. Biol. 2: 871-875.
33. Bowie, J. U., Reidhaar-Olson, J.R., Lim, W.A., and Sauer, R.T. 1990. Deciphering the Message in Protein Sequences: Tolerance to Amino Acid Substitutions. Science. 247: 1306-1310.
34. Matthews, B. W. 1995. Studies on Protein Stability with T4 Lysozyme. Adv. Protein Chem. 46: 249-278.

TABLE 1
Structural Characterization of α-LA28-111 and Point Mutants

α-LA28-111 Variant    Aggregation State†    -[θ]222 (deg cm² dmol⁻¹)
Wildtype              1.02                  13,000
L8F                   0.98                  12,200
L8I                   1.02                  10,900
L8A                   1.03                  10,100
I27W                  1.06                  11,200
I27F                  0.96                  10,700
I27L                  0.96                  13,000
I27A                  0.95                  11,800
W118L                 0.96                  11,100
W118A                 0.95                  12,700
I27A + W118A          1.03                  10,800
Q10A                  1.00                  12,200
E116A                 0.97                  13,100
M30F                  1.07                  12,600
M30A                  0.98                  12,100
W104A                 1.02                  13,000
I95F                  0.98                  12,400
I95A                  1.03                  12,500

†Aggregation state is determined by sedimentation equilibrium and is denoted as the ratio of the observed to the expected molecular weight.
TABLE 2
Effective Concentrations of Disulfide Bond Formation in α-LA28-111

α-LA28-111 Variant    Ceff in Native Buffer (mM)    Ceff in Denaturing Buffer (mM)
Wildtype              1100 ± 80                     1.09 ± 0.08
L8F                   740 ± 100                     1.13 ± 0.07
L8I                   990 ± 80                      1.07 ± 0.05
L8A                   530 ± 20                      1.11 ± 0.06
I27W                  420 ± 30                      1.07 ± 0.01
I27F                  580 ± 50                      1.01 ± 0.03
I27L                  790 ± 50                      1.01 ± 0.08
I27A                  210 ± 20                      1.01 ± 0.02
W118L                 650 ± 30                      0.93 ± 0.04
W118A                 560 ± 30                      1.11 ± 0.04
I27A + W118A          180 ± 10                      0.95 ± 0.02
Q10A                  1130 ± 80                     1.06 ± 0.08
E116A                 1380 ± 100                    1.11 ± 0.06
M30F                  850 ± 70                      1.19 ± 0.05
M30A                  630 ± 30                      0.94 ± 0.04
W104A                 940 ± 40                      1.01 ± 0.13
I95F                  1230 ± 90                     1.06 ± 0.11
I95A                  1100 ± 50                     0.97 ± 0.13

FIGURE LEGENDS

FIGURE 1: (a) Schematic representation of the structure of α-lactalbumin showing the polypeptide backbone and the 28-111 disulfide bond. The five major helices in the helical domain are indicated. (b) Depiction of the wildtype residues in α-LA that are mutated in this study. The molecule is oriented as in (a) with the sidechains of residues in the hydrophobic box (I95, W104; red) and A/B/3₁₀ subdomain (L8, Q10, I27, M30, E116, W118; green) shown as space-filling representations.

FIGURE 2: Circular dichroism spectra of wildtype α-LA28-111 and selected point mutants indicate that all molecules have substantial helix content, although some mutations cause a small decrease in overall helix content. Shown are α-LA28-111 (plus signs), W104A (open circles), W118A (filled circles), I27A (open squares), and the double mutant I27A+W118A (filled squares). W104 is in the hydrophobic box, while I27 and W118 are in the A/B/3₁₀ subdomain.

FIGURE 3: A thermodynamic box showing the linkage relationship between Ceff of wildtype and mutant proteins in the native and unfolded states, and the folding free energies of the oxidized and reduced wildtype and mutant proteins. The front thermodynamic cycle is that of the wildtype protein, and the rear thermodynamic cycle (denoted with superscript primes) is that of the mutant protein.
K1 and K2 are the equilibrium constants for formation of a disulfide bond in the native and unfolded proteins, respectively. K3 and K4 are the equilibrium constants for folding of the reduced and oxidized proteins, respectively. Since Ceff under denaturing conditions is unchanged between wildtype and mutant proteins, the ratio of Ceff of the mutant protein to Ceff of the wildtype protein under native conditions is related to the difference in folding free energies of the oxidized and reduced states between the mutant and wildtype proteins. That is,

$$
\frac{C_{eff}^{nat,mutant}}{C_{eff}^{nat,wildtype}}
= \frac{C_{eff}^{nat,mutant}/C_{eff}^{den,mutant}}{C_{eff}^{nat,wildtype}/C_{eff}^{den,wildtype}}
= \frac{K_1'/K_2'}{K_1/K_2}
= \frac{K_4'/K_3'}{K_4/K_3}
\sim \Delta G_{folding(ox-red)}^{mutant} - \Delta G_{folding(ox-red)}^{wildtype}
= \Delta G_{folding(mutant-wildtype)}^{ox} - \Delta G_{folding(mutant-wildtype)}^{red}
$$

Thus, Ceff,mutant/Ceff,wildtype in native buffer is a lower limit for the destabilizing effect of a mutation on the oxidized protein, assuming that folding and disulfide bond formation are coupled such that the two processes can be treated as linked interactions (29). In the case of linked interactions, folding of the oxidized protein will be more favorable than folding of the reduced protein (K4 > K3). A key assumption is that mutations will have similar, proportional effects on the oxidized and reduced states, such that greater structural destabilization yields greater decreases of Ceff in the native state. This is likely to be true if mutations do not preferentially interact with the disulfide in either the oxidized or reduced state, such as by forming specific interactions with reduced cysteine thiols or physically obstructing disulfide bond formation (see also discussion of multiple cooperative interactions in 29). Note that a rigorous analysis of Ceff values in native buffer requires that both the oxidized and reduced proteins be fully folded in native buffer.
If the oxidized and/or reduced protein in native buffer is an equilibrium of folded and unfolded conformations, then Ceff in native buffer is a combination of Ceff,native and Ceff,denatured. In this case, calculated changes in free energy, using Ceff in native buffer as Ceff,native, will be lower limits of actual effects.

Figure 1

Figure 2 (x-axis: Wavelength (nm), 200-250)

Figure 3 (thermodynamic box; $C_{eff} = [P_{S}^{S}][GSH]^2 / ([P_{SH}^{SH}][GSSG])$ and $K_1/K_2 = C_{eff}^{N}/C_{eff}^{U} = K_4/K_3$)

CHAPTER 5

Hydrophobic Sequence Minimization of the α-Lactalbumin Molten Globule

ABSTRACT

The molten globule, a widespread protein folding intermediate, can attain a native-like backbone topology, even in the apparent absence of rigid sidechain packing. Nonetheless, mutagenesis studies suggest that molten globules are stabilized by some degree of sidechain packing amongst specific hydrophobic residues. Here we investigate the importance of hydrophobic sidechain diversity in determining the overall fold of the α-lactalbumin molten globule. We have replaced all of the hydrophobic amino acids in the sequence of the helical domain with a representative amino acid, leucine. Remarkably, the minimized molecule forms a molten globule that retains many structural features characteristic of a native α-lactalbumin fold. Thus, non-specific hydrophobic interactions may be sufficient to determine the global fold of a protein.

The molten globule is a general protein folding intermediate characterized by compact but dynamic structure (1-3) arranged in a native-like tertiary fold (4). The molten globule can be considered a low resolution protein structure in which the global fold, but not unique native structure, has been established. As such, the molten globule represents a system in which to study the determinants of a protein's overall fold, uncoupled from the determinants of fixed tertiary packing.
A key issue is to understand how the molten globule can distinguish between native and non-native backbone topologies in the apparent absence of extensive fixed tertiary interactions. Hydrophobic interactions are thought to play a dominant role in protein folding, and the close complementary packing of hydrophobic sidechains in a protein's core may help specify a protein's fold (5-7). Mutagenesis studies indicate that specific packing of core sidechain residues exists in the molten globules of cytochrome c, apomyoglobin, and staphylococcal nuclease (8-13). In these studies, a correlation is observed between mutational effects on the stabilities of the native and molten globule states. However, the effects of mutations on the molten globule are small in comparison to that on the native state, suggesting that packing interactions may be only partially formed in the molten globule. These results have been interpreted to indicate that some specific native-like packing, as opposed to non-specific hydrophobic interactions, stabilizes the structure of molten globules (14). In contrast, other studies suggest that non-specific hydrophobic interactions may be sufficient to produce tertiary folds. Computer lattice models and structure assessment algorithms based solely on a non-specific hydrophobic interaction can model some aspects of protein structure, stability, and folding kinetics and can distinguish between native and non-native folds (15-18). In addition, proteins composed of only glutamine, leucine, and arginine can form helical oligomers, although their detailed structures are unknown (19, 20). Interestingly, proteins with cores of predominantly one amino acid form folded structures with varying degrees of order (21-24). In one case a well-packed native structure is formed, of which a high resolution structure has been determined by X-ray crystallography (23). 
Here we use the molten globule formed by the helical domain of α-lactalbumin (α-LA) as a model system to investigate the role of specific hydrophobic amino acids in determining backbone topology. α-LA is a two-domain protein that forms one of the most widely studied molten globules (25-32). Significantly, the α-LA molten globule is highly dynamic and non-cooperatively folded and may represent an earlier-formed intermediate than other well-studied molten globules. In particular, NMR spectra of the α-LA molten globule lack significant chemical shift dispersion (26, 32), suggesting that rigid sidechain packing is minimal. Previous experiments indicate that the molten globule of α-LA is a bipartite structure in which the helical domain adopts a native-like tertiary fold, while the β-sheet domain is largely unstructured (26, 30). A recombinant model of the isolated helical domain (α-Domain) is a molten globule that retains a native-like fold (28).

To test the importance of hydrophobic sidechain diversity in determining a protein's fold, we have replaced all of the hydrophobic amino acids in the sequence of α-Domain with the representative residue, leucine. Remarkably, we find that our minimized hydrophobic construct (MinLeu) has many features characteristic of a native-like fold. Thus, non-specific hydrophobic interactions may be sufficient to establish the native-like architecture of molten globules.

MATERIALS AND METHODS

Protein Production. Full length α-LA and variants thereof were produced as described previously (30). Mutations were introduced by single-stranded mutagenesis and verified by DNA sequencing. Inclusion bodies of expressed proteins were washed with sucrose and Triton buffers, solubilized and reduced in urea/DTT, and purified as described previously (28, 30). Protein identity was confirmed by mass spectrometry.

Identification of Disulfide Connectivities.
Disulfide bonds were assigned by digestion with the specific protease trypsin, followed by mass spectrometry of the proteolysis mixture, using a slight modification of the procedure of Schulman and Kim (31). Briefly, 100 µg of protein at a concentration of 1 µg/µl was digested with trypsin at a ratio of 1:20 by weight in 10 mM Tris, pH 8.8, 37°C. Aliquots at 0, 5, 10, and 24 hours were quenched by addition of PMSF to a final concentration of 3 mM. DTT was added to half of each time point sample to reduce all disulfide-linked fragments. The quenched digestion mixtures (oxidized and reduced) were dried and resuspended in α-cyano cinnamic acid matrix for analysis by mass spectrometry on a Voyager Elite MALDI-TOF mass spectrometer (PerSeptive Biosystems). Characteristic peptide fragments were identified in the mass spectrum of each disulfide species, thereby facilitating unequivocal disulfide connectivity assignments.

Sedimentation Equilibrium. Sedimentation equilibrium experiments were performed on a Beckman XL-A analytical ultracentrifuge as described previously (28, 30). MinLeu solutions were dialyzed overnight against 10 mM Tris, 0.5 mM EDTA, pH 8.8, loaded at initial concentrations of 200, 100, 40, 15 and 5 µM, and analyzed at 28 and 34 krpm. Data were acquired at three wavelengths per rotor speed and processed simultaneously using a non-linear least squares fitting routine (Nonlin; 33). Solvent density and protein partial specific volume were calculated according to solvent and protein composition, respectively (34). The data fit well to a model for an ideal monomer (±10%), with no systematic deviation of the residuals. There is no concentration dependence of the observed molecular weight.

Circular Dichroism (CD) Spectroscopy. CD spectroscopy was performed with an Aviv 62 DS spectrometer as described previously (30). Proteins were dissolved in 10 mM Tris, 0.5 mM EDTA, pH 8.8, to a concentration of 10 µM, and spectra were acquired at 0°C.
Protein concentrations were determined by absorbance at 280 nm in 6 M GuHCl, 20 mM sodium phosphate, pH 6.5, using an extinction coefficient calculated based on tryptophan, tyrosine, and cystine content (35).

Disulfide Exchange. Disulfide exchange studies were performed at room temperature (25-27°C) as described previously (30). Native buffer consisted of 10 mM Tris, 0.5 mM EDTA, pH 8.8, and denaturing buffer consisted of 6 M guanidinium isothiocyanate, 10 mM Tris, 0.5 mM EDTA, pH 8.8.

Proteolysis Mapping of Secondary Structure. Protein samples were subjected to limited proteolysis at room temperature with the non-specific proteases elastase, proteinase K, and subtilisin. Protein samples were dissolved in 10 mM Tris, 0.5 mM EDTA, pH 8.8, to a concentration of 2 mg/mL. Protease was added at a weight ratio of 1:1000, and proteolysis was stopped at 1, 2, 5, 10, and 20 minutes by the addition of PMSF to 3 mM. Samples were subsequently reduced by incubation with 100 mM DTT. Proteolysis samples were analyzed by reverse phase HPLC on a microbore C18 column (Vydac), followed by direct injection into an electrospray mass spectrometer (Finnigan MAT LC-Q). Fragments were assigned by matching masses with possible fragments predicted using a computer algorithm (E. Wolf and P.S.K., unpublished). Early time points were analyzed to ensure that initial cuts were observed, and complementary fragments were identified. This assignment procedure was verified by N-terminal sequence analysis for some proteolysis samples.

Solvent Accessibility Calculations. Solvent accessibility of sidechain atoms was determined with the areaimol routine in the CCP4 suite of programs (36), using a solvent probe of radius 1.4 Å. Fully buried, partially buried, and exposed methylene groups are defined as those with less than 5 Å², between 5 and 18 Å², and greater than 18 Å² of accessible surface area, respectively.
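The burial categories defined above can be written as a small classification function. This is only a sketch of the stated thresholds: the function and constant names are hypothetical, and treating the boundary values of exactly 5 and 18 Å² as "partially buried" is an assumption, since the text does not say how boundary cases were assigned.

```python
FULLY_BURIED_BELOW = 5.0   # square angstroms; threshold taken from the text
EXPOSED_ABOVE = 18.0       # square angstroms; threshold taken from the text


def classify_methylene(accessible_area):
    """Assign a burial category from accessible surface area (in A^2).

    Boundary values (exactly 5 or 18) are assumed to count as
    partially buried.
    """
    if accessible_area < 0:
        raise ValueError("accessible surface area cannot be negative")
    if accessible_area < FULLY_BURIED_BELOW:
        return "fully buried"
    if accessible_area <= EXPOSED_ABOVE:
        return "partially buried"
    return "exposed"


# Hypothetical areas, chosen only to exercise each category.
for area in (0.0, 4.9, 12.0, 18.0, 25.3):
    print(area, classify_methylene(area))
```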
RESULTS

Minimization of the Helical Domain of α-LA

The helical domain of α-LA consists of residues 1-39 and 81-123 of the wildtype protein. A recombinant model of the isolated helical domain (α-Domain), consisting of residues 1-39 and 81-123 joined by a linker of three glycine residues, has a native-like fold (28). We minimized the hydrophobic amino acid diversity of α-Domain by following a sequence-based rule. An arbitrary cutoff was implemented between histidine and threonine in the hydrophobicity scale of Eisenberg and McLachlan (37). The following residues were thus deemed hydrophobic and were changed to leucine in the sequence of the helical domain: W, M, F, I, L, Y, V, and H. The one proline in the sequence of the helical domain (Pro 24), although a hydrophobic residue by our rule, was left unchanged, since its major structural influence may be through local helix capping effects rather than sidechain packing (38, 39). A single tryptophan was introduced in the tri-glycine linker in α-Domain to facilitate protein concentration determination, thereby changing the linker sequence from G-G-G to G-W-G. The resulting minimized molecule is called MinLeu. For convenience, we refer to the amino acids in MinLeu by their positions in the context of full length α-LA.

MinLeu corresponds to the helical domain of α-LA minimized with respect to amino acid type, regardless of residue location in the three-dimensional structure of the molecule (Fig. 1). Thus, we have removed sequence-specific hydrophobic interactions from the helical domain of α-LA and replaced them with general hydrophobic interactions mediated by leucines. Notably, our minimization scheme is not biased by decisions based on the native structure. A total of 31 out of the 86 amino acid residues in MinLeu are leucine (36%), of which 12 are leucine in the wildtype α-LA sequence.

Lack of Aggregation

Molten globules are prone to aggregation.
Moreover, the substantial changes introduced by hydrophobic minimization may yield molecules that associate non-specifically. It is therefore important to determine whether MinLeu is monomeric under the conditions used in this study. We measured the oligomerization state of MinLeu and its variants by sedimentation equilibrium. Significantly, all species studied here are monomeric (Fig. 2). Thus, the structural properties determined in this study are not artifacts of aggregation.

Backbone Topology

We determined whether the minimized hydrophobic sequence adopts a native-like tertiary fold (backbone topology) by performing disulfide exchange experiments (Fig. 3). The four cysteine residues in MinLeu can pair in three possible ways, corresponding to one native and two non-native structures. The equilibrium distribution of the three disulfide variants of MinLeu reflects the intrinsic propensity of the chain to sample native and non-native folds. Importantly, the same ratios of disulfide isomers are obtained at extended time intervals and starting from different initial species, confirming that equilibrium was established.

Remarkably, 97% of MinLeu adopts native disulfide pairings. This suggests that the minimized sequence has a strong preference for a native-like fold. For comparison, 90% of the wildtype sequence (α-Domain) adopts native disulfide pairings (28). In contrast, under denaturing conditions only 9% of MinLeu adopts native disulfide pairings. The disulfide distribution of fully denatured MinLeu can be predicted using a random walk model for an unfolded polypeptide chain (40, 41), and our experimental results are in agreement with the predicted distribution. Thus, the preference of MinLeu for native disulfide bonds under native conditions is the result of structure, as opposed to intrinsic differences in thiol reactivity.
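The count of three possible pairings for four cysteines can be checked by direct enumeration. The sketch below is illustrative only: the function name is hypothetical, and the residue numbers 6, 28, 111, and 120, along with the native pairing 6-120 and 28-111, are assumptions based on the native α-LA structure (only the 28-111 disulfide is named explicitly in this text). It does not attempt the random walk prediction of references 40 and 41.

```python
def disulfide_pairings(cysteines):
    """Return every way to pair all cysteines into disulfide bonds."""
    if not cysteines:
        return [[]]
    first, rest = cysteines[0], cysteines[1:]
    pairings = []
    for i, partner in enumerate(rest):
        # Bond the first cysteine to this partner, then pair the remainder.
        remaining = rest[:i] + rest[i + 1:]
        for sub in disulfide_pairings(remaining):
            pairings.append([(first, partner)] + sub)
    return pairings


# Assumed cysteine positions and native pairing (see the note above).
CYSTEINES = [6, 28, 111, 120]
NATIVE = {(6, 120), (28, 111)}

for pairing in disulfide_pairings(CYSTEINES):
    label = "native" if set(pairing) == NATIVE else "non-native"
    print(pairing, label)
```

The enumeration yields one native and two non-native isomers, matching the three disulfide variants described above.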
Overall Helix Content

We assessed the helix content of MinLeu using CD spectroscopy to determine whether sequence minimization yields a molecule that still contains secondary structure consistent with a native-like fold (Fig. 4). Native MinLeu (i.e., with native α-LA disulfide pairings) has substantial helix content. The native forms of MinLeu and the full length α-LA molten globule, which consists of a native-like helical domain and an unstructured beta-sheet domain, have similar helix content when normalized for the number of residues in each protein. Non-native folds, imposed on MinLeu by formation of non-native disulfide pairings, result in lower overall helix content than in native MinLeu, indicating that the high level of secondary structure in MinLeu is linked to formation of a native-like fold. The spectra of non-native MinLeu disulfide isomers have striking similarities to those of the corresponding non-native isomers of the full length α-LA molten globule (Fig. 4).

Localization of Secondary Structure

To investigate the location of helices and loops in MinLeu, we performed limited proteolysis on the native and non-native disulfide isomers of MinLeu. Proteolysis is a useful tool with which to assess the location of exposed flexible segments in a protein (42). Three proteases that non-specifically cleave peptide bonds (elastase, subtilisin, and proteinase K) were chosen to avoid sequence-specific cleavage bias. We identified cleavage sites by reverse phase HPLC separation and mass spectrometry of proteolysis products, limiting our analysis to initial cuts of the polypeptide chain (Fig. 5). By analyzing initial cleavage sites, we identified regions of the polypeptide chain that are likely to be disordered in the native and non-native variants of MinLeu.
Proteolysis of native MinLeu occurs in regions of the polypeptide chain corresponding to loops in the native α-LA structure, except for one cleavage site that lies in the region corresponding to the C-helix (Fig. 6). Thus, proteolysis is consistent with the presence of native-like secondary structure, except for the C-helix. Significantly, in the wildtype α-LA molten globule, the A-, B-, D-, and 3₁₀-helices, but not the C-helix, are structured, as assessed by NMR measurements of hydrogen exchange and proline scanning mutagenesis studies (26, 31, 43). Thus, one would predict that, even for the wildtype α-Domain sequence, cleavage may occur in the region corresponding to the C-helix.

In contrast, the proteolysis profiles of the non-native variants of MinLeu show additional cuts inconsistent with native-like helices, specifically in regions of the polypeptide chain encompassing, in the native structure, the B- and 3₁₀-helices (both non-native species) and the D-helix (one non-native species). Notably, the B-helix is substantially buried in the native structure of α-LA. Proteolysis in the middle of this helix, as observed in one non-native species, is highly inconsistent with a native-like fold. It is striking that regions corresponding to native helices are cleaved in non-native variants of MinLeu but not in the native species. Thus, in native MinLeu the lack of proteolysis in these regions is likely due to the presence of protective structure, as opposed to the lack of a cleavage site. Our results indicate that proteolysis can differentiate between native and non-native structures and confirm that formation of non-native disulfide pairings disrupts native-like secondary structure.

Structure in the Absence of Disulfide Constraints

To examine the structure of the MinLeu sequence in the absence of topological constraints imposed by disulfide bonds, we constructed a variant of MinLeu in which all four cysteines are changed to alanine.
This alanine variant can freely sample all possible structures in the absence of disulfide crosslinks. Importantly, the CD spectrum of the alanine variant is very similar to that of native MinLeu, as opposed to non-native structures (Fig. 4), indicating that native-like helix content is intrinsic to the MinLeu sequence and is not imposed by disulfide bonds. Moreover, the alanine variant of MinLeu has a proteolysis profile similar to that of native MinLeu (Fig. 5). Thus, the native-like structure in MinLeu exists even in the absence of disulfide bonds.

DISCUSSION

Hydrophobic Amino Acid Diversity

Our results suggest that specific hydrophobic amino acid diversity may not be necessary to determine a protein's overall fold, as represented by the molten globule. Although we have substituted all of the hydrophobic amino acids in the helical domain of α-LA with leucine, the resulting molecule retains many features consistent with a native-like fold. MinLeu has a strong preference for a native-like backbone topology, as determined by disulfide exchange studies. In addition, MinLeu is monomeric and has a helix content similar to that present in the helical domain of the full length α-LA molten globule (both MinLeu and the α-LA molten globule lack a well-formed C-helix). Non-native forms of MinLeu have CD spectra similar to the analogous non-native forms of the full length α-LA molten globule. Finally, proteolysis studies of MinLeu show protection patterns consistent with native-like secondary structure. Thus, nonspecific hydrophobic interactions may play a significant role in determining the global fold of a protein.

Given the fact that MinLeu is a molten globule, our characterization of MinLeu is limited to low resolution structural assays. Thus, we have not obtained high resolution NMR or x-ray crystallographic structural data which would describe the structure of MinLeu more definitively.
Our proteolysis data indicate that some elements of native structure are not present in MinLeu, since the carboxy-terminal half of the region corresponding to the C-helix in the native structure is sensitive to protease digestion. In addition, MinLeu, while similar in helix content to that expected based on the spectrum of the full length α-LA molten globule, is more helical than the isolated wildtype α-Domain. It should be noted, however, that α-Domain is less structured in isolation than in the context of the full length α-LA molten globule (see Fig. 4 legend). Additional experiments, including the possibility of obtaining high resolution structural data by multidimensional NMR techniques as was recently achieved for the α-LA molten globule (32), may further establish the degree of native structure that is present in MinLeu.

Specific Sidechain Packing

Partially folded structures can range in orderliness from species with noncooperative thermal denaturations and poor NMR chemical shift dispersion, such as the α-LA molten globule, to highly ordered structures that have cooperative thermal denaturations and substantial NMR chemical shift dispersion (44, 45). The role of hydrophobic sidechain diversity in stabilizing and specifying this range of structures may differ. Our results suggest that specific packing interactions are not required for the formation of a native overall topology by molten globules. Thus, non-specific hydrophobic interactions may help determine a protein's overall fold early in the folding process. Subsequently, specific sidechain packing may order loosely formed native-like structure arising from non-specific hydrophobic interactions.

Our studies do not preclude the possibility that molten globules contain specific packing interactions. Indeed, studies of the apomyoglobin, cytochrome c, and staphylococcal nuclease molten globules indicate that specific packing of core hydrophobic residues exists in these molten globules (8-13).
In addition, we have obtained evidence that some native-like core hydrophobic packing may exist in the α-LA molten globule (Chapter 4). One might envision that the multiple leucine residues in the core of MinLeu may adopt a specific packing arrangement, as has been observed in variants of T4 lysozyme (23) and the src SH3 domain (46) that consist of well-packed but simplified hydrophobic cores lacking significant amino acid diversity. Importantly, our leucine mutations have not created any unsatisfied buried polar interactions. However, our leucine mutations have resulted in the net loss of twenty-five sidechain methylene groups, of which seven are fully buried, twelve are partially buried, and six are exposed, as determined by solvent accessible surface area calculations (see Methods). Such a large loss in buried nonpolar surface area is expected to significantly destabilize well-packed native structure to the extent that MinLeu is not expected to fold to a stable native structure, even in the context of full length α-LA (47, 48). Thus, it is unlikely that there is extensive specific packing of the leucine sidechains in the MinLeu molten globule.

Interestingly, the molten globule formed by MinLeu is more stable than that formed by wildtype α-Domain. MinLeu must be subjected to 6 M guanidinium isothiocyanate in order to be fully unfolded, as observed in disulfide exchange assays. In addition, the effective concentration for formation of the 28-111 disulfide bond is higher in MinLeu than in wildtype α-Domain under native conditions (MinLeu: 3.67 ± 0.24 M; α-Domain: 0.78 ± 0.03 M, Zheng-yu Peng, personal communication). Other proteins with cores comprised of only leucine have been observed to be highly stable molten globule-like structures (21, 22).
It is possible that our sequence minimization has removed unfavorable interactions present in the wildtype α-LA molten globule or has increased the stability of the molten globule, for example by increasing overall helix propensity.

Binary Patterning of Amino Acid Types

It is thought that a binary pattern of hydrophobic and hydrophilic amino acid types may be sufficient to determine a protein's overall fold (7, 49). Studies of libraries of protein sequences indicate that a large combination of appropriately patterned hydrophobic and hydrophilic residues can be tolerated by the four-helix bundle fold (49). Our studies of MinLeu also support the binary patterning hypothesis. If patterning of amino acid types is sufficient to determine a protein's fold, our sequence minimization approach could be extended successfully to other residues, resulting in a minimized model of the helical domain of α-LA consisting of very few amino acid types, representing perhaps hydrophobic, polar, and charged amino acids. In this light, it is interesting that a substantial portion of protein libraries composed of only glutamine, leucine, and arginine can form helical structures with properties similar to natural proteins containing significant amino acid diversity (19, 20) and that the src SH3 domain fold can be encoded largely by only five different amino acids (46).

Alternatively, it is possible that the determinants of a protein's fold lie in specific polar and/or charged amino acids. Our leucine substitutions have not affected either buried polar residues or a substantial portion of the hydrophilic protein surface. Polar residues buried in the cores of proteins have been found to contribute to structural uniqueness and specificity (50, 51), although sometimes they can be replaced by hydrophobic residues without significant structural effects (52).
Recent experiments in which multiple residues in the core of the phage 434 cro protein have been replaced by leucine without significant perturbation of the protein structure have been interpreted to indicate that surface residues may play a more active role in protein conformation than assumed traditionally (24), although a large body of data indicates that surface residues play a minor role in determining protein structure (48, 53).

The Role of the Molten Globule in Protein Folding

Our results emphasize how formation of the molten globule can assist protein folding (1-4). Hydrophobic collapse of the polypeptide chain yields a low-resolution native-like structure, thereby greatly reducing the conformational space to be searched by the polypeptide chain. The driving force for such native-like structure may be the distribution of hydrophobic amino acids along the polypeptide chain. Specific packing interactions resulting from hydrophobic sidechain diversity and packing complementarity may stabilize loosely ordered structure but do not appear to be necessary for establishing an overall native-like fold.

ACKNOWLEDGEMENTS

We thank James Pang for help with mass spectrometry and Zheng-yu Peng, Brenda Schulman, Pehr Harbury, Martha Oakley, and members of the Kim lab for helpful discussions and comments on the manuscript. This work was supported by the Howard Hughes Medical Institute.

REFERENCES

1. Ptitsyn, O. B. 1992. The Molten Globule State. In Protein Folding, ed. T. E. Creighton, 243-300. New York: W.H. Freeman and Co.
2. Kuwajima, K. 1989. The Molten Globule State in Protein Folding. Proteins: Struct., Func., and Genet. 6: 87-103
3. Dobson, C. M. 1992. Unfolded Proteins, Compact States, and Molten Globules. Curr. Opin. Struct. Biol. 2: 6-12
4. Peng, Z.-y., Wu, L.C., Schulman, B.A., and Kim, P.S. 1995. Does the Molten Globule Have a Native-Like Tertiary Fold? Phil. Trans. R. Soc. Lond. B. 348: 43-47
5. Ponder, J. W., and Richards, F.M. 1987. Tertiary Templates for Proteins: Use of Packing Criteria in the Enumeration of Allowed Sequences for Different Structural Classes. J. Mol. Biol. 193: 775-791
6. Richards, F. M., and Lim, W.A. 1993. An Analysis of Packing in the Protein Folding Problem. Q. Rev. Biophys. 26: 423-498
7. Cordes, M. H. J., Davidson, A.R., and Sauer, R.T. 1996. Sequence Space, Folding, and Protein Design. Curr. Opin. Struct. Biol. 6: 3-10
8. Marmorino, J. L., and Pielak, G.J. 1995. A Native Tertiary Interaction Stabilizes the A State of Cytochrome c. Biochemistry. 34: 3140-3143
9. Colon, W., and Roder, H. 1996. Kinetic Intermediates in the Formation of the Cytochrome c Molten Globule. Nat. Struct. Biol. 3: 1019-1025
10. Colon, W., Elove, G.A., Wakem, L.P., Sherman, F., and Roder, H. 1996. Side Chain Packing of the N- and C-Terminal Helices Plays a Critical Role in the Kinetics of Cytochrome c Folding. Biochemistry. 35: 5538-5549
11. Kay, M. S., and Baldwin, R.L. 1996. Packing Interactions in the Apomyoglobin Folding Intermediate. Nat. Struct. Biol. 3: 439-445
12. Lin, L., Pinker, R.J., Forde, K., Rose, G.D., and Kallenbach, N.R. 1994. Molten Globular Characteristics of the Native State of Apomyoglobin. Nat. Struct. Biol. 1: 447-452
13. Carra, J. H., Anderson, E.A., and Privalov, P.L. 1994. Thermodynamics of Staphylococcal Nuclease Denaturation. II. The A-State. Protein Sci. 3: 952-959
14. Ptitsyn, O. B. 1996. How Molten is the Molten Globule? Nat. Struct. Biol. 3: 488-490
15. Behe, M. J., Lattman, E.E., and Rose, G.D. 1991. The Protein-Folding Problem: The Native Fold Determines Packing, But Does Packing Determine the Native Fold? Proc. Natl. Acad. Sci. USA. 88: 4195-4199
16. Dill, K. A., Bromberg, S., Yue, K., Fiebig, K.M., Yee, D.P., Thomas, P.D., and Chan, H.S. 1995. Principles of Protein Folding - A Perspective From Simple Exact Models. Protein Sci. 4: 561-602
17. Huang, E. S., Subbiah, S., and Levitt, M. 1995. Recognizing Native Folds by the Arrangement of Hydrophobic and Polar Residues.
J. Mol. Biol. 252: 709-720
18. West, M. W., and Hecht, M.H. 1995. Binary Patterning of Polar and Nonpolar Amino Acids in the Sequences and Structures of Native Proteins. Protein Sci. 4: 2032-2039
19. Davidson, A. R., and Sauer, R.T. 1994. Folded Proteins Occur Frequently in Libraries of Random Amino Acid Sequences. Proc. Natl. Acad. Sci. USA. 91: 2146-2150
20. Davidson, A. R., Lumb, K.J., and Sauer, R.T. 1995. Cooperatively Folded Proteins in Random Sequence Libraries. Nat. Struct. Biol. 2: 856-864
21. Regan, L., and DeGrado, W.F. 1988. Characterization of a Helical Protein Designed From First Principles. Science. 241: 976-978
22. Munson, M., Balasubramanian, S., Fleming, K.G., Nagi, A.D., O'Brien, R., Sturtevant, J.M., and Regan, L. 1996. What Makes a Protein a Protein? Hydrophobic Core Designs That Specify Stability and Structural Properties. Protein Sci. 5: 1584-1593
23. Gassner, N. C., Baase, W.A., and Matthews, B.W. 1996. A Test of the "Jigsaw Puzzle" Model for Protein Folding by Multiple Methionine Substitutions Within the Core of T4 Lysozyme. Proc. Natl. Acad. Sci. USA. 93: 12155-12158
24. Desjarlais, J. R., and Handel, T.M. 1995. De Novo Design of the Hydrophobic Cores of Proteins. Protein Sci. 4: 2006-2018
25. Dolgikh, D. A., Abaturov, L.V., Bolotina, I.A., Brazhnikov, E.V., Bychkova, V.E., Gilmanshin, R.I., Lebedev, Y.O., Semisotnov, G.V., Tikupulo, E.I., and Ptitsyn, O.B. 1985. Compact State of a Protein Molecule with Pronounced Small-Scale Mobility: Bovine α-Lactalbumin. Eur. Biophys. J. 13: 109-121
26. Alexandrescu, A. T., Evans, P.A., Pitkeathly, M., Baum, J., and Dobson, C.M. 1993. Structure and Dynamics of the Acid-Denatured Molten Globule State of α-Lactalbumin: A Two-Dimensional NMR Study. Biochemistry. 32: 1707-1718
27. Xie, D., Bhakuni, V., and Freire, E. 1991. Calorimetric Determination of the Energetics of the Molten Globule Intermediate in Protein Folding: Apo-α-Lactalbumin. Biochemistry. 30: 10673-10678
28. Peng, Z.-y., and Kim, P.S. 1994.
A Protein Dissection Study of a Molten Globule. Biochemistry. 33: 2136-2141
29. Kuwajima, K. 1996. The Molten Globule State of α-Lactalbumin. FASEB J. 10: 102-109
30. Wu, L. C., Peng, Z.-y., and Kim, P.S. 1995. Bipartite Structure of the α-Lactalbumin Molten Globule. Nat. Struct. Biol. 2: 281-286
31. Schulman, B. A., and Kim, P.S. 1996. Proline Scanning Mutagenesis of a Molten Globule Reveals Non-Cooperative Formation of a Protein's Overall Topology. Nat. Struct. Biol. 3: 682-687
32. Schulman, B. A., Kim, P.S., Dobson, C.M., and Redfield, C. 1997. A Residue-Specific NMR View of the Non-Cooperative Unfolding of a Molten Globule. Nat. Struct. Biol. 4: 630-634
33. Johnson, M. L., Correia, J.J., Yphantis, D.A., and Halvorson, H.R. 1981. Analysis of Data From the Analytical Ultracentrifuge by Nonlinear Least-Squares Techniques. Biophys. J. 36: 575-588
34. Laue, T. M., Shah, B.D., Ridgeway, T.M., and Pelletier, S.L. 1992. Computer-Aided Interpretation of Analytical Sedimentation Data for Proteins. In Analytical Ultracentrifugation in Biochemistry and Polymer Science, ed. S. E. Harding, et al., 90-125. Cambridge: The Royal Society of Chemistry
35. Edelhoch, H. 1967. Spectroscopic Determination of Tryptophan and Tyrosine in Proteins. Biochemistry. 6: 1948-1954
36. CCP4. 1994. Collaborative Computational Project, Number 4. Acta Cryst. D50: 760-763
37. Eisenberg, D., and McLachlan, A.D. 1986. Solvation Energy in Protein Folding and Binding. Nature. 319: 199-203
38. Richardson, J. S., and Richardson, D.C. 1988. Amino Acid Preferences for Specific Locations at the Ends of Alpha Helices. Science. 240: 1648-1652
39. Presta, L. G., and Rose, G.D. 1988. Helix Signals in Proteins. Science. 240: 1632-1641
40. Snyder, G. H. 1987. Biochemistry. 26: 688-694
41. Kauzman, W. 1959. In Sulfur in Proteins, ed. R. Benesch, et al., 93-108. New York: Academic Press
42. Fontana, A., de Laureto, P.P., De Felippis, V., Scaramella, E., and Zambonin, M. 1997.
Probing the Partly Folded States of Proteins by Limited Proteolysis. Folding & Design. 2: R17-R26
43. Schulman, B. A., Redfield, C., Peng, Z.-y., Dobson, C.M., and Kim, P.S. 1995. Different Subdomains are Most Protected From Hydrogen Exchange in the Molten Globule and Native States of Human α-Lactalbumin. J. Mol. Biol. 253: 651-657
44. Redfield, C., Smith, R.A.G., and Dobson, C.M. 1994. Structural Characterization of a Highly-Ordered "Molten Globule" at Low pH. Nat. Struct. Biol. 1: 23-29
45. Feng, Y., Sligar, S.G., and Wand, A.J. 1994. Solution Structure of Apocytochrome B562. Nat. Struct. Biol. 1: 30-35
46. Riddle, D. S., Santiago, J.V., Bray-Hall, S.T., Doshi, N., Grantcharova, V.P., Qian, Y., and Baker, D. 1997. Functional Rapidly Folding Proteins From Simplified Amino Acid Sequences. Nat. Struct. Biol. 4: 805-809
47. Lim, W. A., and Sauer, R.T. 1991. The Role of Internal Packing Interactions in Determining the Structure and Stability of a Protein. J. Mol. Biol. 219: 359-376
48. Matthews, B. W. 1995. Studies on Protein Stability with T4 Lysozyme. Adv. Protein Chem. 46: 249-278
49. Kamtekar, S., Schiffer, J.M., Xiong, H., Babik, J.M., and Hecht, M.H. 1993. Protein Design by Binary Patterning of Polar and Nonpolar Amino Acids. Science. 262: 1680-1685
50. Lumb, K. J., and Kim, P.S. 1995. A Buried Polar Interaction Imparts Structural Uniqueness in a Designed Heterodimeric Coiled Coil. Biochemistry. 34: 8642-8648
51. Hendsch, Z. S., and Tidor, B. 1994. Do Salt Bridges Stabilize Proteins? A Continuum Electrostatic Analysis. Protein Sci. 3: 211-226
52. Waldburger, C. D., Schildbach, J.F., and Sauer, R.T. 1995. Are Buried Salt Bridges Important for Protein Stability and Conformational Specificity? Nat. Struct. Biol. 2: 122-128
53. Bowie, J. U., Reidhaar-Olson, J.R., Lim, W.A., and Sauer, R.T. 1990. Deciphering the Message in Protein Sequences: Tolerance to Amino Acid Substitutions. Science. 247: 1306-1310
54. Acharya, K.
R., Ren, J., Stuart, D.I., Phillips, D.C., and Fenna, R.E. 1991. Crystal Structure of Human α-Lactalbumin at 1.7 Å Resolution. J. Mol. Biol. 221: 571-581

FIGURE LEGENDS

FIGURE 1: Hydrophobic sequence minimization of the helical domain of α-lactalbumin. (a) Schematic representation of the helical domain of α-LA based on the structure of full length human α-LA (54), with helices and disulfide bonds indicated. (b) Hydrophobic sidechains in the helical domain of α-LA are highlighted by a space-filling representation. The domain is oriented as in (a), with wildtype leucines shown in green and all other hydrophobic sidechains, mutated to leucine in the current study, shown in red. (c) Amino acid sequences of the wildtype and minimized helical domains. Residues are numbered in the context of full length α-LA and colored as in (b). Solid red lines align those hydrophobic amino acids in α-LA that are changed to leucine in MinLeu, and dashed green lines align wildtype leucines.

FIGURE 2: Sedimentation equilibrium studies of MinLeu and its variants indicate that all species are monomeric. Representative data for native MinLeu (native disulfide bonds) are plotted as ln(absorbance) against the square of the radius from the axis of rotation. The slope is proportional to the molecular weight: dashed lines indicate calculated slopes for monomeric and dimeric species. Deviations from the calculated values are plotted as residuals (top). The composite ratio (of all data sets for a given species) of observed to expected molecular weights for native MinLeu, non-native disulfide isomer [6-28; 111-120], non-native isomer [6-111; 28-120], and the alanine variant are 1.07, 1.08, 1.10, and 1.10, respectively.

FIGURE 3: Disulfide exchange studies indicate that MinLeu has a strong preference for a native-like fold.
Reverse phase HPLC chromatograms of equilibrium disulfide populations under native and denaturing (6 M guanidinium isothiocyanate) conditions are shown, with each disulfide species indicated. The relative populations of disulfide species under native and denaturing conditions, as well as that calculated for an unfolded polypeptide chain using a random walk model, are indicated. The same results (±4%) were obtained at 4 °C as at room temperature.

FIGURE 4: Circular dichroism spectra of native, non-native, and alanine variants of MinLeu (left) and the full length α-LA molten globule (right). The native species (native disulfide bonds) is depicted by large filled circles. The two non-native disulfide variants [6-111; 28-120] and [6-28; 111-120] are depicted by small filled diamonds and open squares, respectively. The alanine variant of MinLeu is depicted by small open circles. CD comparisons are made between analogous MinLeu and full length α-LA species because the isolated wildtype helical domain (α-Domain) is less structured than the helical domain in the context of the full length molten globule. [θ]222 of α-Domain is less than that expected from the full length α-LA molten globule, normalized to the length of α-Domain (28, 30).

FIGURE 5: Proteolysis of the alanine variant of MinLeu with proteinase K. Products of an early digestion time point, separated by reverse phase HPLC, are shown. The region of the HPLC chromatogram from 45 to 75 minutes is expanded, with the full chromatogram shown in the inset. Initial cuts were identified as those cleavage sites that create two complementary digestion products spanning the entire MinLeu sequence. It is possible that complementary fragments are generated from secondary digestion of larger initial fragments. This is unlikely, however, since we have analyzed very early digestion time points in which we do not observe other secondary digestion producing fragments without a complementary product.
The identities of individual species, their predicted masses, and their masses measured by mass spectrometry, are indicated. Amino acid residues are numbered as in the context of full length α-LA, with the N-terminal methionine designated as residue zero, and the linker residues designated as Link1, 2, and 3.

FIGURE 6: Summary of proteolysis patterns for the native, non-native, and alanine variants of MinLeu. The polypeptide chain is represented schematically, with the locations of each native helix highlighted in yellow. Proteolysis sites for each variant of MinLeu are indicated by rows of arrows beneath the polypeptide chain. Elastase, subtilisin, and proteinase K cuts are shown by blue, red, and green arrows, respectively (top to bottom).

Figure 1 (c): [Aligned amino acid sequences of wildtype α-LA helical domain and MinLeu, with residue positions 1, 6, 28, 39, 81, 111, and 120 marked; sequences not recoverable from the extracted text.]

Figure 2: [Sedimentation equilibrium plot of ln(absorbance) against r² (cm²), with residuals in the upper panel.]

Figure 3: [Reverse phase HPLC chromatograms in native buffer and denaturing buffer, plotted against Time (min), with peaks for NN2 [6-111; 28-120], NATIVE [6-120; 28-111], and NN1 [6-28; 111-120]; the table of relative populations (native buffer, denaturing buffer, random walk) is not recoverable from the extracted text.]

Figure 4: [Two panels of CD spectra plotted against Wavelength (nm), 205 to 255 nm.]

Figure 5: [HPLC chromatogram plotted against Time (min), with peaks A1, A2, B1, B2, C1, C2, D, and E1 labeled.]

Peak   Fragment    Predicted Mass   Observed Mass
A2     D82-End     4605             4604
B2     Link3-End   4775             4775
C2     Link1-End   5018             5018
C1     M0-Q39      4223             4223
B1     M0-Link2    4467             4466
A1     M0-D81      4637             4636
D      Uncut       9223             9223
E1     M0-E121     8979             8980

Figure 6: [Schematic of the polypeptide chain with native helices labeled and proteolysis sites shown for the NATIVE, ALANINE, NN2, and NN1 variants.]
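The criterion for an initial cut described in the Figure 5 legend (two complementary products spanning the entire sequence) can be checked arithmetically: because hydrolysis of one peptide bond adds one water, the two fragment masses should sum to the uncut mass plus about 18. The sketch below applies this to the predicted masses listed for Figure 5. The pairing of fragments by name, the nominal water mass of 18, and the tolerance of 2 mass units are assumptions made for illustration.

```python
WATER = 18  # nominal mass added when one peptide bond is hydrolyzed

UNCUT = 9223  # predicted mass of the uncut chain (peak D)

# (N-terminal fragment, predicted mass), (C-terminal fragment, predicted mass)
FRAGMENT_PAIRS = [
    (("M0-D81", 4637), ("D82-End", 4605)),
    (("M0-Link2", 4467), ("Link3-End", 4775)),
    (("M0-Q39", 4223), ("Link1-End", 5018)),
]


def is_complementary(n_mass, c_mass, uncut_mass, tolerance=2):
    """True if two fragments together account for the whole chain plus one water."""
    return abs(n_mass + c_mass - (uncut_mass + WATER)) <= tolerance


def check_pairs(pairs, uncut_mass):
    """Map each 'N-fragment + C-fragment' label to whether the pair is complementary."""
    return {
        f"{n_name} + {c_name}": is_complementary(n_mass, c_mass, uncut_mass)
        for (n_name, n_mass), (c_name, c_mass) in pairs
    }


results = check_pairs(FRAGMENT_PAIRS, UNCUT)
for label, ok in results.items():
    print(label, "complementary" if ok else "not complementary")
```

A pair that fails this check would suggest that at least one of the two fragments arose from a secondary cut rather than from a single initial cleavage.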
APPENDIX

Disulfide Determinants of Calcium-Induced Packing in α-Lactalbumin

α-Lactalbumin (α-LA) is a two-domain calcium-binding protein that folds through a molten globule intermediate. Calcium binding to the wild-type α-LA molten globule induces a transition to the native state. Here we assess the calcium-binding properties of the α-LA molten globule by studying two disulfide variants. α-LA(α) contains only the two disulfide bonds in the α-helical domain of α-LA, while α-LA(β) contains only the β-sheet domain and interdomain disulfide bonds. We find that only α-LA(β) binds calcium, leading to the cooperative formation of substantial tertiary interactions. In addition, the β-sheet domain acquires a native-like backbone topology. Thus, specific interactions within α-LA imposed by the β-sheet domain and interdomain disulfide bonds, as opposed to the two α-helical domain disulfides, are necessary for the calcium-induced progression from the molten globule towards more native-like structure. Our results suggest that organization of the β-sheet domain, coupled with calcium binding, comprises the locking step in the folding of α-LA from the molten globule to the native state.

INTRODUCTION

Native proteins are distinguished from partially folded forms by rigid side chain packing and extensive tertiary interactions. Many proteins fold through the molten globule, a compact but highly dynamic species with a native-like backbone topology (Kuwajima, 1989; Dobson, 1992; Ptitsyn, 1992; Ptitsyn, 1995). Native packing and tertiary interactions are subsequently acquired from the molten globule (Matthews, 1993), but our understanding of the structural and mechanistic bases of these events is poor.

α-Lactalbumin (α-LA)¹ is a widely studied calcium-binding protein composed of two structural domains. Folding of α-LA proceeds in two major steps (Ikeguchi et al., 1986; Kuwajima et al., 1990; Ptitsyn, 1992; Dobson et al., 1994; Peng & Kim, 1994; Wu et al., 1995).
Collapse of the polypeptide chain yields a molten globule intermediate, which then acquires rigid side chain packing and tertiary interactions to form the native protein. We previously assessed the structure of the α-LA molten globule by studying the properties of two disulfide variants, α-LA(α) and α-LA(β) (Wu et al., 1995). Our studies indicated that the molten globule of α-LA is a bipartite structure in which the α-helical domain adopts a native-like backbone topology, while the β-sheet domain is largely unstructured (Peng & Kim, 1994; Wu et al., 1995).

Formation of the α-LA molten globule is independent of calcium (Ikeguchi et al., 1986; Kuwajima, 1989). In addition, the equilibrium molten globule of α-LA is studied in the absence of calcium (for reviews, see Kuwajima, 1989; Ptitsyn, 1992). Calcium binding to α-LA appears to induce native structure from the molten globule and affects the rate of folding from the molten globule to the native state (Dolgikh et al., 1981; Hiraoka & Sugai, 1984; Ikeguchi et al., 1986; Kuwajima et al., 1990; Ewbank & Creighton, 1993a; Ewbank & Creighton, 1993b). Furthermore, calorimetric and structural analyses of equine lysozyme, a calcium-binding lysozyme structurally homologous to α-LA, indicate that the protein unfolds in two stages, the first of which is calcium-dependent and results in the loss of specific tertiary interactions and side chain packing (Van Dael et al., 1993; Griko et al., 1995). Here we study the calcium-binding properties of α-LA(α) and α-LA(β), in order to localize the disulfide determinants of native packing and tertiary interactions in α-LA.

¹ ABBREVIATIONS: α-LA, α-lactalbumin; α-LA(α), human α-lactalbumin with cysteines 61, 73, 77, and 91 replaced by alanines, leaving the α-helical domain disulfide bonds intact; α-LA(β), human α-lactalbumin with cysteines 6, 28, 111, and 120 replaced by alanines, leaving the β-sheet domain and interdomain disulfide bonds intact; CD, circular dichroism; NMR, nuclear magnetic resonance; DTT, dithiothreitol; EDTA, (ethylenediamine)-tetraacetic acid; GuHCl, guanidine hydrochloride; TMSP, (trimethylsilyl)-propionate.

MATERIALS AND METHODS

Production of α-LA Variants. α-Lactalbumin variants were produced and purified as described previously from expression plasmids pALA-A2 and pALA-B2 for α-LA(α) and α-LA(β), respectively (Wu et al., 1995).

Sedimentation Equilibrium. Sedimentation equilibrium experiments were performed on a Beckman XL-A analytical ultracentrifuge as described previously (Wu et al., 1995). Our data indicate that slight aggregation of α-LA(α) and α-LA(β) occurs at high concentrations of CaCl2 (~15% aggregation at 1 mM CaCl2). Based on this observation and the results of the calcium titration (see below), a concentration of 200 µM CaCl2 was used for all further studies of α-LA(α) and α-LA(β) in the presence of calcium. Protein solutions were dialyzed overnight against 10 mM Tris, pH 8.5, with either 0.5 mM EDTA or 200 µM CaCl2. Initial protein concentrations of 100, 40 and 15 µM were analyzed at 23 and 27 krpm. The data for α-LA(α) and α-LA(β) in both the presence and absence of calcium fit well (within 8%) to a model for an ideal monomer, with no systematic deviation of the residuals. There is no concentration dependence of the observed molecular weight for either variant.

Circular Dichroism (CD) Spectroscopy. CD spectroscopy was performed with an Aviv 62 DS spectrometer as described previously (Wu et al., 1995). Samples were dissolved in 10 mM Tris, pH 8.5, with either 0.5 mM EDTA or 200 µM CaCl2. The spectra of native α-LA were taken in 10 mM Tris, 1 mM CaCl2, pH 8.5. Thermal denaturation data were collected at 208 nm at 2° intervals, allowing 1.5 minutes equilibration time and 60 seconds data averaging, using samples dissolved to a concentration of 2.5 µM in 10 mM Tris, pH 8.5, with either 0.5 mM EDTA or 200 µM CaCl2.
Denaturation curves were smoothed by least squares fitting to a third-order polynomial, using a window of ten data points. All thermal melts are >90% reversible with no hysteresis. Protein concentrations were determined by absorbance in 6 M GuHCl, 20 mM sodium phosphate, pH 6.5, using an extinction coefficient at 280 nm of 22,430 (Edelhoch, 1967).

Calcium Titration. Changes in the far-UV CD signal of α-LA(α) and α-LA(β) upon addition of CaCl2 were monitored at 208 nm. Samples were dissolved to a concentration of 4 µM in 10 mM Tris, pH 8.5, pre-treated with a chelating agent (Chelex, Bio-Rad). CaCl2 was added in small aliquots from 300 µM, 3 mM, 30 mM, 300 mM, and 3 M stocks, and CD signal was normalized for volume changes. The titration data for α-LA(β) were fit to a model for a single binding site with a non-linear least squares fitting program (Kaleidagraph, Abelbeck Software) to yield the dissociation constant.

Disulfide Exchange. Disulfide exchange studies were performed at 4 °C as described previously (Wu et al., 1995). Native buffer consisted of 10 mM Tris, pH 8.5, with either 0.5 mM EDTA or 200 µM CaCl2. Denaturing buffer consisted of 6 M GuHCl, 10 mM Tris, 200 µM CaCl2, pH 8.5.

Nuclear Magnetic Resonance (NMR) Spectroscopy. Protein samples were dissolved in D2O to a concentration of ~100 µM at pH 8.5 (uncorrected for isotope effects), and either 0.5 mM deuterated EDTA or 200 µM CaCl2. Both α-LA(α) and α-LA(β) are monomers under these conditions, as determined by sedimentation equilibrium. ¹H NMR spectroscopy was performed at 500.1 MHz on a Bruker AMX spectrometer. 1D spectra were acquired at 4 °C using a spectral width of 7812.5 Hz, 4096 complex points, 32,768 transients, and a recycle delay of 1.5 seconds. The residual water peak was suppressed by mild presaturation. Chemical shifts were referenced to 0 ppm with internal (trimethylsilyl)propionate (TMSP).
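The ideal-monomer test used in the sedimentation equilibrium analysis rests on the relation that, for a single ideal species, ln(absorbance) is linear in r² with slope M(1 − v̄ρ)ω²/(2RT). The sketch below illustrates that relation only; it is not the fitting software used in this work. The partial specific volume (0.73 mL/g), solvent density (1.0 g/mL), temperature (298.15 K), the 14,000 g/mol test mass, and the synthetic data are all assumed values chosen for illustration; only the 27 krpm rotor speed is taken from the text.

```python
import math

GAS_CONSTANT = 8.314e7  # erg mol^-1 K^-1 (CGS units, so radii stay in cm)


def expected_slope(molar_mass, rpm, temp_k, vbar=0.73, rho=1.0):
    """Slope of ln(absorbance) versus r^2 (cm^-2) for an ideal single species."""
    omega = 2.0 * math.pi * rpm / 60.0  # angular velocity in rad/s
    return molar_mass * (1.0 - vbar * rho) * omega ** 2 / (2.0 * GAS_CONSTANT * temp_k)


def mass_from_slope(slope, rpm, temp_k, vbar=0.73, rho=1.0):
    """Invert expected_slope to obtain the apparent molar mass (g/mol)."""
    omega = 2.0 * math.pi * rpm / 60.0
    return slope * 2.0 * GAS_CONSTANT * temp_k / ((1.0 - vbar * rho) * omega ** 2)


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic, noise-free data for a hypothetical 14,000 g/mol monomer at 27 krpm.
EXPECTED_MASS = 14000.0
r_squared = [35.0 + 0.25 * i for i in range(11)]
true_slope = expected_slope(EXPECTED_MASS, 27000, 298.15)
ln_abs = [-1.5 + true_slope * (x - 35.0) for x in r_squared]

observed_mass = mass_from_slope(fit_slope(r_squared, ln_abs), 27000, 298.15)
ratio = observed_mass / EXPECTED_MASS
print(round(ratio, 3))
```

With real data, a ratio of observed to expected molecular weight near 1 and residuals without systematic curvature are what support a monomer model; a dimer would give a slope about twice as steep.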
RESULTS

α-LA(α) corresponds to human α-LA, with the β-sheet domain and interdomain cysteines replaced by alanines, leaving the native 6-120 and 28-111 disulfide bonds intact (Fig. 1a). Conversely, α-LA(β) corresponds to human α-LA, with the α-helical domain cysteines replaced by alanines, leaving the native 61-77 and 73-91 disulfide bonds intact. In the absence of calcium, α-LA(α) and α-LA(β) are both molten globules with structural properties very similar to the widely studied pH 2 molten globule (A-state) of α-LA (Wu et al., 1995).

Calcium does not affect the structural properties of α-LA(α), as judged by far-UV CD titrations of up to 1 mM calcium (Fig. 1b). Indeed, the far-UV and near-UV CD spectra (Fig. 2a) of α-LA(α) in the presence of calcium are superimposable on those of the calcium-free molten globule of α-LA(α). Moreover, the ¹H NMR spectra of α-LA(α) are identical in the presence and absence of calcium, and resemble closely that of the A-state molten globule of α-LA (Fig. 3). Finally, disulfide exchange studies (Peng & Kim, 1994; Wu et al., 1995) under native conditions in both the presence and absence of calcium give identical results (data not shown).

On the other hand, α-LA(β) binds calcium with a Kd of 6.6 ± 0.3 µM (Fig. 1b). For comparison, the Kd of wild-type α-LA is 2-10 nM, depending on solution conditions² (Permyakov et al., 1981; Segawa & Sugai, 1983; Hamano et al., 1986; Mitani et al., 1986). The far-UV CD spectrum of calcium-bound α-LA(β) resembles closely that of native α-LA (Fig. 2b). The near-UV CD of calcium-bound α-LA(β) is more intense than that of the α-LA(β) molten globule, indicative of tighter side chain packing, but it is substantially less intense than that of native α-LA (Fig. 2b). These results suggest that, upon addition of calcium, α-LA(β) acquires more native-like structure, but does not become fully native. Thermal denaturation studies indicate that the formation of this structure is cooperative (Fig. 4).
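A single-site binding model of the kind used to obtain the Kd can be sketched as follows. This is a hedged illustration, not the Kaleidagraph procedure used in this work: it assumes the CD signal has already been converted to fraction bound (that is, the free and saturated endpoints are known), it accounts for ligand depletion with the standard quadratic solution, and it estimates Kd with a simple golden-section search rather than a general non-linear least squares routine. The titration points are synthetic, generated from the reported Kd (6.6 µM) and protein concentration (4 µM); the function names are hypothetical.

```python
import math


def fraction_bound(total_ligand, total_protein, kd):
    """Fraction of protein with ligand bound for a 1:1 site (all concentrations in M)."""
    b = total_protein + total_ligand + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * total_protein * total_ligand)) / 2.0
    return complex_conc / total_protein


def fit_kd(ligand_totals, fractions, total_protein, lo=1e-9, hi=1e-2, iters=100):
    """Estimate Kd by minimizing the sum of squared residuals over log10(Kd)."""
    def sse(log_kd):
        kd = 10.0 ** log_kd
        return sum(
            (fraction_bound(lig, total_protein, kd) - frac) ** 2
            for lig, frac in zip(ligand_totals, fractions)
        )

    a, b = math.log10(lo), math.log10(hi)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    for _ in range(iters):
        # Keep the sub-interval that contains the smaller residual.
        if sse(c) < sse(d):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 10.0 ** ((a + b) / 2.0)


# Synthetic, noise-free titration generated from the values quoted in the text.
PROTEIN = 4.0e-6
TRUE_KD = 6.6e-6
calcium = [1e-6, 3e-6, 1e-5, 3e-5, 1e-4, 1e-3]
signal = [fraction_bound(ca, PROTEIN, TRUE_KD) for ca in calcium]

fitted_kd = fit_kd(calcium, signal, PROTEIN)
print(f"{fitted_kd:.2e}")
```

Because the protein concentration is comparable to the Kd, total and free calcium differ appreciably over the titration, which is why the quadratic form is used here instead of the simpler hyperbolic expression.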
Footnote 2: We determined the Kd of calcium binding to α-LA(β), a variant of human α-LA, in 10 mM Tris, pH 8.5, 0 °C. Reported Kd values for wild-type bovine α-LA are 1.7 × 10^-9 M, measured in 5 mM Tris, pH 8.04, 20 °C (Permyakov et al., 1981), and 1 × 10^-8 M, measured in 60 mM Tris, pH 8.4, 25 °C (Hamano et al., 1986). The Kd of human α-LA is ~20% greater than that of bovine α-LA under identical conditions (Segawa & Sugai, 1983).

In the molten globule of α-LA, the β-sheet domain is largely unstructured, lacking a marked preference for native disulfide pairings in disulfide exchange assays, although the α-helical domain has a native-like tertiary fold (Peng & Kim, 1994; Wu et al., 1995). Under denaturing conditions (6 M GuHCl) in the presence of calcium, only 18% of the α-LA(β) molecules assume the native disulfide pairings, in agreement with the behavior predicted for an unfolded polypeptide, using a random-walk model (Kauzmann, 1959; Snyder, 1987). Strikingly, addition of calcium to α-LA(β) under native conditions permits 90% of the α-LA(β) molecules to assume the native disulfide pairings (Fig. 5a), in sharp contrast to the molten globule (Fig. 5b) and unfolded (Fig. 5c) forms of α-LA(β).

Although the CD and disulfide exchange studies indicate that calcium binding induces more native structure in the β-sheet domain, the lack of a significant increase in near-UV CD intensity in calcium-bound α-LA(β) suggests that the molecule is still flexible. The NMR spectrum of α-LA(β) in the absence of calcium is broad, lacks chemical shift dispersion, and closely resembles the 1H NMR spectrum of the A-state molten globule of α-LA (Fig. 3). However, the NMR spectrum of calcium-bound α-LA(β) (Fig. 3) contains extensive chemical shift dispersion, including resonances shifted upfield of TMSP, indicative of substantial tertiary interactions. These features suggest that a significant amount of folded structure exists in calcium-bound α-LA(β), albeit less than in native α-LA.
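The random-walk prediction for unfolded α-LA(β) mentioned above can be illustrated with a simplified Python sketch. It assumes that the probability of closing a loop scales as n^(-3/2), where n is the sequence separation of the two cysteines, and that the two loops of a two-disulfide species form independently, so their weights multiply. Both choices, and the function names, are assumptions made for illustration; the calculation actually used in the work (Kauzmann, 1959; Snyder, 1987) may treat loop size or the second loop differently, so this sketch should only be expected to give the same rank order of species, not the same percentages.

```python
def disulfide_pairings(cysteines):
    """Enumerate every way of pairing an even number of cysteine positions."""
    if not cysteines:
        return [[]]
    first, rest = cysteines[0], cysteines[1:]
    pairings = []
    for index, partner in enumerate(rest):
        remaining = rest[:index] + rest[index + 1:]
        for sub_pairing in disulfide_pairings(remaining):
            pairings.append([(first, partner)] + sub_pairing)
    return pairings


def random_walk_populations(cysteines, exponent=-1.5):
    """Relative populations of disulfide isomers for an unfolded chain.

    Assumption: each loop contributes (sequence separation) ** exponent and
    loops are independent, so the weight of an isomer is the product over
    its disulfide bonds.
    """
    weights = {}
    for pairing in disulfide_pairings(sorted(cysteines)):
        weight = 1.0
        for i, j in pairing:
            weight *= (j - i) ** exponent
        weights[tuple(pairing)] = weight
    total = sum(weights.values())
    return {pairing: weight / total for pairing, weight in weights.items()}


if __name__ == "__main__":
    # The four cysteines retained in alpha-LA(beta).
    for pairing, population in random_walk_populations([61, 73, 77, 91]).items():
        print(pairing, round(100.0 * population, 1))
```

Under these assumptions the native pairing [61-77; 73-91] is the least favored of the three isomers, because it requires two loops of intermediate size, while [61-91; 73-77] is the most favored because of its very short 73-77 loop.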
DISCUSSION

Previous studies indicate that in the molten globule of α-LA, the α-helical domain is a dynamic, native-like structure, while the β-sheet domain is largely unstructured (Wu et al., 1995). Calcium induces the transition between the molten globule and the native state of α-LA. We find that calcium binding to α-LA(β) introduces specific structure, while α-LA(α) remains a molten globule in the presence of calcium. Thus, the β-sheet domain (61-77) and interdomain (73-91) disulfide bonds, as opposed to the α-helical domain disulfides, are crucial for the calcium-induced progression from the α-LA molten globule towards native structure.

The folding of lysozyme, a protein homologous to α-LA, proceeds via a kinetic molten globule intermediate in which the α-helical domain is native-like, while the β-sheet domain remains largely disordered (Radford et al., 1992; Miranker et al., 1993; Dobson et al., 1994; Balbach et al., 1995). Acquisition of near-UV CD signal and enzymatic activity are single kinetic events late in the folding pathway, with rates identical to the rate of folding of the β-sheet domain determined by hydrogen-exchange NMR (Radford et al., 1992; Dobson et al., 1994; Itzhaki et al., 1994). Thus, in both α-LA and lysozyme, although the α-helical domain folds first, interactions outside of the α-helical domain are important for the acquisition of native packing and tertiary interactions.

Our studies, taken together with previous work (Radford et al., 1992; Miranker et al., 1993; Dobson et al., 1994; Peng & Kim, 1994; Wu et al., 1995), suggest the following pathway for the folding of the structurally homologous α-lactalbumins and lysozymes. Initial formation of the molten globule yields a species in which the α-helical domain is native-like, while the β-sheet domain is predominantly unfolded. Subsequently, a locking step requiring organization of the β-sheet domain, and calcium binding in α-LA, yields the unique native structure.
It is interesting that the NMR spectrum of calcium-bound α-LA(β) suggests significant tertiary contacts, while the near-UV (aromatic) CD spectrum lacks much of the intensity of native α-LA. Our studies indicate that the β-sheet domain has a native-like fold in calcium-bound α-LA(β). However, the detailed structure of calcium-bound α-LA(β) remains unclear. Many possibilities are apparent, in which the individual domains of α-LA have varying degrees of native structure.

At one extreme is the possibility that calcium binding to α-LA(β) yields slightly more structure throughout the entire molecule, converting α-LA(β) from a molten globule to a "highly ordered molten globule", a flexible, partially folded species in which the native secondary structures are largely formed, but loop regions are disordered (Feng et al., 1994; Redfield et al., 1994). At the other extreme is the possibility that calcium binding to α-LA(β) causes the β-sheet domain to fold entirely, while the α-helical domain remains dynamic; the relatively small near-UV CD signal may result from the fact that only one of three tryptophans and one of four tyrosines in α-LA are present in the β-sheet domain. This second possibility is consistent with the previous identification of a stable two-disulfide species of α-LA involving the β-sheet domain and interdomain disulfide bonds, which is transiently populated during reduction of α-LA in the presence of calcium (Ewbank & Creighton, 1993a; Ewbank & Creighton, 1993b). High-resolution structural characterization or protein dissection of α-LA(β) in the presence of calcium should resolve this issue.

ACKNOWLEDGMENTS

We thank members of the Kim lab for helpful comments regarding the manuscript.

REFERENCES

Acharya, K. R., Ren, J. S., Stuart, D. I., Phillips, D. C. & Fenna, R. E. (1991) J. Mol. Biol. 221, 571-581.
Acharya, K. R., Stuart, D. I., Walker, N. P. C., Lewis, M. & Phillips, D. C. (1989) J. Mol. Biol. 208, 99-127.
Balbach, J., Forge, V., van Nuland, N.
A. J., Winder, S. L., Hore, P. J. & Dobson, C. M. (1995) Nature Struct. Biol. 2, 865-869.
Dobson, C. M. (1992) Curr. Opin. Struct. Biol. 2, 6-12.
Dobson, C. M., Evans, P. A. & Radford, S. E. (1994) Trends Biochem. Sci. 19, 31-37.
Dolgikh, D. A., Gilmanshin, R. I., Brazhnikov, E. V., Bychkova, V. E., Semisotnov, G. V., Venyaminov, S. Y. & Ptitsyn, O. B. (1981) FEBS Lett. 136, 311-315.
Edelhoch, H. (1967) Biochemistry 6, 1948-1954.
Ewbank, J. J. & Creighton, T. E. (1993a) Biochemistry 32, 3677-3693.
Ewbank, J. J. & Creighton, T. E. (1993b) Biochemistry 32, 3694-3707.
Feng, Y., Sligar, S. G. & Wand, A. J. (1994) Nature Struct. Biol. 1, 30-35.
Griko, Y. V., Freire, E., Privalov, G., Van Dael, H. & Privalov, P. L. (1995) submitted.
Hamano, M., Nitta, K., Kuwajima, K. & Sugai, S. (1986) J. Biochem. (Tokyo) 100, 1617-1622.
Hiraoka, Y. & Sugai, S. (1984) Int. J. Peptide Protein Res. 23, 535-542.
Ikeguchi, M., Kuwajima, K. & Sugai, S. (1986) J. Biochem. (Tokyo) 99, 1191-1201.
Itzhaki, L. S., Evans, P. A., Dobson, C. M. & Radford, S. E. (1994) Biochemistry 33, 5212-5220.
Kauzmann, W. (1959) in Sulphur in Proteins (Benesch, R., Benesch, R. E., Boyer, P. D., Klotz, I. M., Middlebrook, W. R., Szent-Gyorgyi, A. G. & Schwarz, D. R., Eds.) pp 93-108, Academic Press, New York.
Kuwajima, K. (1989) Proteins 6, 87-103.
Kuwajima, K., Ikeguchi, M., Sugawara, T., Hiraoka, Y. & Sugai, S. (1990) Biochemistry 29, 8240-8249.
Matthews, C. R. (1993) Annu. Rev. Biochem. 62, 653-683.
Miranker, A., Robinson, C. V., Radford, S. E., Aplin, R. T. & Dobson, C. M. (1993) Science 262, 896-900.
Mitani, M., Harushima, Y., Kuwajima, K., Ikeguchi, M. & Sugai, S. (1986) J. Biol. Chem. 261, 8824-8829.
Peng, Z. Y. & Kim, P. S. (1994) Biochemistry 33, 2136-2141.
Permyakov, E. A., Yarmolenko, V. V., Kalinichenko, L. P., Morozova, L. A. & Burstein, E. A. (1981) Biochem. Biophys. Res. Commun. 100, 191-197.
Priestle, J. P. (1988) J. Appl. Crystallogr. 21, 572-576.
Ptitsyn, O. B.
(1992) in Protein Folding (Creighton, T. E., Ed.) pp 243-300, W. H. Freeman and Co., New York.
Ptitsyn, O. B. (1995) Curr. Opin. Struct. Biol. 5, 74-78.
Radford, S. E., Dobson, C. M. & Evans, P. A. (1992) Nature 358, 302-307.
Redfield, C., Smith, R. A. G. & Dobson, C. M. (1994) Nature Struct. Biol. 1, 23-29.
Segawa, T. & Sugai, S. (1983) J. Biochem. (Tokyo) 93, 1321-1328.
Snyder, G. H. (1987) Biochemistry 26, 688-694.
Van Dael, H., Haezebrouck, P., Morozova, L., Arico-Muendel, C. & Dobson, C. M. (1993) Biochemistry 32, 11886-11894.
Wu, L. C., Peng, Z.-y. & Kim, P. S. (1995) Nature Struct. Biol. 2, 281-286.

FIGURE LEGENDS

FIGURE 1: Calcium induces structural changes in α-LA(β), but not α-LA(α). (a) Schematic representation of human α-LA (Acharya et al., 1989; Acharya et al., 1991) produced with the program RIBBON (Priestle, 1988). The α-helical domain (white) contains the four α-helices. The β-sheet domain (shaded) contains two short β-strands and several loop structures. A single calcium ion (black ball) binds to the calcium-binding loop, comprising residues 78-89. α-LA(α) contains the two disulfide bonds in the α-helical domain (6-120 and 28-111; white), with the β-sheet domain and interdomain cysteines replaced by alanines. α-LA(β) contains the disulfide bond in the β-sheet domain (61-77; black) and the interdomain disulfide bond (73-91; black), with the cysteines in the α-helical domain replaced by alanines. (b) Calcium titration of α-LA(α) (open circles) and α-LA(β) (filled circles) at 4 °C, pH 8.5, monitored by the CD signal at 208 nm.

FIGURE 2: Far-UV and near-UV CD spectra of α-LA(α) and α-LA(β) in the presence (filled circles) and absence (open circles) of calcium at 4 °C, pH 8.5. Spectra of native α-LA (pluses) are also shown for comparison. (a) α-LA(α). (b) α-LA(β). The spectra of native α-LA and of α-LA(α) and α-LA(β) in the absence of calcium have been published previously (Wu et al., 1995).
FIGURE 3: 1H NMR spectra of α-LA(α) and α-LA(β) in the presence and absence of calcium at 4 °C, pH 8.5. For comparison, spectra of the pH 2 A-state and native α-LA (Peng & Kim, 1994) are also shown.

FIGURE 4: Thermal denaturation studies of α-LA(β) in the presence of calcium indicate that formation of native-like structure is cooperative. The CD signals of α-LA(β) in the presence (filled circles) and absence (open circles) of calcium are shown.

FIGURE 5: Disulfide exchange studies of α-LA(β) at 4 °C, pH 8.5. (a) Native conditions with calcium. (b) Native conditions without calcium. (c) Denaturing conditions. The expected equilibrium populations (calculated with a random-walk model) for α-LA(β) under denaturing conditions are given in parentheses in (c). The numbers in brackets denote the disulfide-bonded residues in each species.

[Figure 1: (a) schematic of human α-LA; (b) calcium titration, CD signal versus [CaCl2] (µM).]

[Figure 2: far-UV and near-UV CD spectra versus wavelength (nm). (a) α-LA(α). (b) α-LA(β).]

[Figure 3: 1H NMR spectra (ppm). Panels: α-LA(α); α-LA(α) + calcium; α-LA(β); α-LA(β) + calcium; A-state (molten globule); Native.]

[Figure 4: CD signal versus temperature (°C).]

[Figure 5: disulfide exchange versus time (min). (a) α-LA(β) with calcium: native [61-77; 73-91] 89%; [61-91; 73-77] 6%; [61-73; 77-91] 5%. (b) α-LA(β) without calcium: [61-91; 73-77] 32%; [61-73; 77-91] 32%; native [61-77; 73-91] 36%. (c) Unfolded α-LA(β): [61-91; 73-77] 56% (53%); [61-73; 77-91] 25% (31%); native [61-77; 73-91] 18% (16%).]

Biographical Note

Lawren C.
Wu

Education
Present: Doctoral Candidate in Biology, Massachusetts Institute of Technology, Cambridge, MA
1992: A.B., Chemistry, Summa Cum Laude, Princeton University, Princeton, NJ

Honors and Awards
1992: Phi Beta Kappa
1992: Sigma Xi
1992: American Institute of Chemists Student Award
1992: Robert Thornton McCay Prize in Physical Chemistry

Publications
Wu, L.C., Laub, P.B., Elöve, G.A., Carey, J., and Roder, H. "A Noncovalent Peptide Complex as a Model for an Early Folding Intermediate of Cytochrome c." Biochemistry (1993) 32: 10271-10276.
Wu, L.C., Grandori, R., and Carey, J. "Autonomous Subdomains in Protein Folding." Protein Science (1994) 3: 369-371.
Wu, L.C., Peng, Z.-y., and Kim, P.S. "Bipartite Structure of the α-Lactalbumin Molten Globule." Nature Structural Biology (1995) 2: 281-285.
Peng, Z.-y., Wu, L.C., and Kim, P.S. "Local Structural Preferences in the α-Lactalbumin Molten Globule." Biochemistry (1995) 34: 3248-3252.
Peng, Z.-y., Wu, L.C., Schulman, B.A., and Kim, P.S. "Does the Molten Globule Have a Native-like Tertiary Fold?" Philosophical Transactions of the Royal Society of London (1995) 348: 43-47.
Wu, L.C., Schulman, B.A., Peng, Z.-y., and Kim, P.S. "Disulfide Determinants of Calcium-Induced Packing in α-Lactalbumin." Biochemistry (1996) 35: 859-863.
Wu, L.C., and Kim, P.S. "Hydrophobic Sequence Minimization of the α-Lactalbumin Molten Globule." Proceedings of the National Academy of Sciences USA, in press.
Wu, L.C., and Kim, P.S. "A Specific Hydrophobic Core in the α-Lactalbumin Molten Globule." In preparation.

Personal
Date of birth: October 11, 1970
Place of birth: Wilmington, Delaware
Citizenship: USA