Article No. mb981881
J. Mol. Biol. (1998) 280, 623–638

X-ray Crystallographic Determination of a Collagen-like Peptide with the Repeating Sequence (Pro-Pro-Gly)

Rachel Z. Kramer1, Luigi Vitagliano4, Jordi Bella1, Rita Berisio4, Lelio Mazzarella4, Barbara Brodsky3, Adriana Zagari4 and Helen M. Berman1,2*

1 Department of Chemistry, Rutgers University, 610 Taylor Rd, Piscataway, NJ 08854-8087, USA
2 Waksman Institute, Piscataway, NJ 08855, USA
3 Department of Biochemistry, Robert Wood Johnson Medical School, Piscataway, NJ 08855, USA
4 Centro di Studio di Biocristallografia, CNR and Dipartimento di Chimica, Università di Napoli, via Mezzocannone 4, 80134 Napoli, Italy

The crystal structure of the triple-helical peptide (Pro-Pro-Gly)10 has been re-determined to obtain a more accurate description for this widely studied collagen model and to provide a comparison with the recent high-resolution crystal structure of a collagen-like peptide containing Pro-Hyp-Gly regions. This structure demonstrated that hydroxyproline participates extensively in a repetitive hydrogen-bonded assembly between the peptide and the solvent molecules. Two separate structural studies of the peptide (Pro-Pro-Gly)10 were performed with different crystallization conditions, data collection temperatures, and X-ray sources. The polymer-like structure of one triple-helical repeat of Pro-Pro-Gly has been determined to 2.0 Å resolution in one case and 1.7 Å resolution in the other. The solvent structures of the two peptides were independently determined specifically for validation purposes. The two structures display a reverse chain trace compared with the original structure determination. In comparison with the Hyp-containing peptide, the two Pro-Pro-Gly structures demonstrate very similar molecular conformation and analogous hydration patterns involving carbonyl groups, but have different crystal packing.
This difference in crystal packing indicates that the involvement of hydroxyproline in an extended hydration network is critical for the lateral assembly and supermolecular structure of collagen.

© 1998 Academic Press

*Corresponding author

Keywords: collagen; triple helix; hydration; supermolecular structure; hydroxyproline

R.Z.K. and L.V. contributed equally to this work.

Present address: J. Bella, Purdue University, Department of Biological Sciences, West Lafayette, IN 47907, USA

Abbreviations used: PPG 0, structure of (Pro-Pro-Gly)10 determined by Okuyama et al. (1981); PPG 1, structure of (Pro-Pro-Gly)10 determined to 2.0 Å; PPG 2, structure of (Pro-Pro-Gly)10 determined to 1.7 Å resolution; PPG, the PPG 1 and PPG 2 structures collectively; Gly→Ala, structure determined by Bella et al. (1994); Hyp, hydroxyproline; rms deviation, root-mean-square deviation.

0022–2836/98/290623–16 $30.00/0

Introduction

The triple helix is the primary structural element in collagen and is an important component of various proteins such as the serum complement protein C1q and the macrophage scavenger receptor. In both cases a triple-helical domain has been found to be the site responsible for binding interactions (Acton et al., 1993; Doi et al., 1993; Hoppe & Reid, 1994). Much of what is currently known about the structure of the triple helix is the result of fiber diffraction studies on actual collagen and collagen-like peptides (Fraser et al., 1979; Rich & Crick, 1961; Yonath & Traub, 1969). The similarity between fiber diffraction patterns from synthetic polypeptides and those from native collagen confirmed their utility as good models for collagen. Recent crystal structures of collagen-like peptides (Bella et al., 1994; Okuyama et al., 1981) have corroborated and expanded what was known from fiber models.

The collagen molecule is known to be a triple-helical coiled-coil, in which each of the three strands has a left-handed, extended polyproline II helical conformation.
The three strands then wrap around a common helical axis in a right-handed fashion. The three strands are held together with interchain hydrogen bonds in the Rich & Crick (1961) collagen II pattern, with a one residue stagger between adjacent chains. The extended, close-packed nature of the triple helix requires a glycine residue in every third position. This creates the repetitive sequence (Gly-X-Y), in which the X and Y positions are frequently occupied by the imino acids proline and hydroxyproline, respectively. Hydroxyproline is formed by the post-translational modification of proline by prolyl hydroxylase, which places a hydroxyl group at the 4-position. Hydroxyproline is common in collagens and in proteins containing collagen-like regions, though it is a rare amino acid in proteins overall. In general, only proline residues on the amino side of a glycine residue are hydroxylated.

Hydroxyproline appears to play an important role in the stability of the triple helix. Melting studies of the two triple-helical synthetic peptides (Pro-Pro-Gly)10 and (Pro-Hyp-Gly)10 demonstrated a distinct disparity in their melting temperatures† (tm). The tm for (Pro-Hyp-Gly)10 is about 58 °C in aqueous solution with 10% (v/v) acetic acid, while (Pro-Pro-Gly)10 has a tm value of about 24 °C (Sakakibara et al., 1973). Analogous experiments using collagen yielded similar results. Rosenbloom et al. (1973) demonstrated that collagen lacking hydroxyproline is unstable at biological temperatures. The disease scurvy is the result of the improper functioning of prolyl hydroxylase in the absence of its cofactor ascorbate.

The repetitive nature of the collagen sequence, (Gly-X-Y)n, has offered unique opportunities for fiber diffraction, model peptide, and theoretical studies.
Over the years, (Pro-Pro-Gly)-based collagen models have been widely investigated as the simplest models to describe collagen triple helices, neglecting the subtle effects introduced by Hyp residues. Yonath & Traub (1969) reported a detailed conformational analysis of the sequential polypeptide poly(Pro-Pro-Gly). Its fiber diffraction pattern was quite similar to that observed for collagen, and showed significantly higher definition that allowed for an improved structure. This model exhibited the basic characteristics of the model previously proposed for collagen by Rich & Crick (1961), and for several years was considered to be the best available for the triple-helical conformation of collagen. Theoretical studies of poly(Pro-Pro-Gly) by Miller & Scheraga (1976) and Némethy et al. (1992) indicated good agreement between the lowest-energy model and experimental data.

X-ray Structure of Repeating (Pro-Pro-Gly) Peptide

Sakakibara et al. (1968) utilized solid-phase synthesis methods to obtain collagen-like oligopeptides of defined molecular mass. Single crystals of (Pro-Pro-Gly)10 were obtained (Sakakibara et al., 1972), and the crystal structure was determined (Okuyama et al., 1981). Although the end-to-end stacking of the molecules in the crystal structure of (Pro-Pro-Gly)10 was not well determined, the structure exhibited many of the main features of the commonly accepted model for collagen (Fraser et al., 1979). The structure did, however, display subtle differences in helical parameters yielding a triple helix with 7₅ screw symmetry in contrast to the 10₇ screw symmetry observed for native collagen‡.

† Specifically, this is the midpoint of the thermal denaturation from the triple-helical state to the non-triple-helical state.

‡ The use of the 7₅ and 10₇ notation is intended to be consistent with crystallographic screw symmetry nomenclature, which indicates the handedness of the helix. The 7₅ and 10₇ helices are equivalent to 7/2 and 10/3 helices, respectively.
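The difference between the two helical symmetries can be made concrete with a short calculation. The following is a minimal sketch, not part of the original study: it assumes the usual fiber-diffraction reading of an n/m helix as n units in m turns of a left-handed basic helix, with three such units separating successive Gly-X-Y triplets of the same chain. The function name and sign convention are illustrative assumptions.

```python
def triplet_twist_deg(units: int, turns: int) -> float:
    """Rotation (degrees, right-handed positive) relating successive
    Gly-X-Y triplets of one chain in an assumed n/m triple helix."""
    # Rotation per unit of the basic helix; negative because the basic
    # helix is taken to be left-handed (assumed sign convention).
    per_unit = -360.0 * turns / units
    # Three units bring us back to the same chain; reduce to [0, 360).
    return (3.0 * per_unit) % 360.0


if __name__ == "__main__":
    for label, (n, m) in {"7/2": (7, 2), "10/3": (10, 3)}.items():
        twist = triplet_twist_deg(n, m)
        print(f"{label} helix: {twist:.1f} deg per triplet, "
              f"{360.0 / twist:.0f} triplets per full turn")
```

Under these assumptions the 7/2 helix gives a right-handed rotation of about 51.4° per triplet (seven triplets per turn) and the 10/3 helix gives 36° (ten triplets per turn), which is the sense in which the peptide is more tightly twisted than the fiber model of native collagen.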
Helical twist discrepancies notwithstanding, the crystal structure of (Pro-Pro-Gly)10 has been considered by many to be a good high-resolution picture of the molecular conformation of the collagen triple helix.

A recent crystal structure determination of a triple-helical designed peptide of sequence (Pro-Hyp-Gly)4-Pro-Hyp-Ala-(Pro-Hyp-Gly)5, termed Gly→Ala (Bella et al., 1994, 1995), has provided a high-resolution picture of a triple helix. This structure displays regular triple-helical conformation at the Pro-Hyp-Gly repeats at both ends and a bulging in the center of the molecule where one alanine residue in each chain is substituted for a glycine residue. This results in the untwisting of one end of the helix with respect to the other. The Gly→Ala structure demonstrated that there is a delicate and repetitive hydrogen-bonded assembly between the triple-helical peptide molecules and the solvent molecules surrounding them, and that Hyp residues participate extensively in the building of the water network providing extra anchoring points for hydrogen bonding to or from the peptide surface. Analysis of the hydration patterns in (Pro-Pro-Gly)10 in the absence of Hyp residues can provide clues to understand why and how these residues contribute to triple-helical stability and what role they may play in the proper assembly of collagen in vivo.

The packing of collagen-like peptides in crystals is directly analogous to the native biological situation in which one molecule must interact with others. This is demonstrated by the similarity of the pseudo-hexagonal lateral packing of the Gly→Ala structure (Bella et al., 1994) to that observed for collagen (Fraser et al., 1983).

With this consideration, a re-determination of the crystal structure of the peptide (Pro-Pro-Gly)10 was undertaken.
The increased resolution of the data, 1.7 Å, allows for the comparison of the hydration patterns of this peptide with those observed in the Gly→Ala structure that does contain Hyp residues.

Results and Discussion

Two independent crystallization experiments (PPG 1 and PPG 2) were performed with the peptide (Pro-Pro-Gly)10. The structure of the 21 residue asymmetric unit (Figure 1(a)) was determined using molecular replacement and an idealized 7-fold triple helix. The reduced size of the asymmetric unit compared with the entire molecule is a consequence of translational disorder along the triple-helical axis, which leads to a molecule that behaves as a quasi-infinite chain. The crystallographic asymmetric unit consists of 21 residues arranged in three chains of different length; one with sequence Pro-Pro-Gly-Pro-Pro-Gly-Pro-Pro-Gly and the other two with the shorter sequence Pro-Pro-Gly-Pro-Pro-Gly. Because of this particular arrangement, a given continuous peptide chain runs throughout symmetry-generated mates of all three chains in the asymmetric unit (Figure 1(b)). The final PPG 1 model, refined to a resolution of 2.0 Å, contains 21 peptide residues, 37 water molecules, and two acetic acid molecules. The final PPG 2 model, refined to a resolution of 1.7 Å, contains 21 peptide residues and 40 water molecules.

Figure 1. (a) The 21 residue asymmetric unit of (Pro-Pro-Gly)10. One chain has a length of nine residues with sequence Pro-Pro-Gly-Pro-Pro-Gly-Pro-Pro-Gly (dark gray). The other two chains are each six residues long with the sequence Pro-Pro-Gly-Pro-Pro-Gly (medium and light gray). This results in a model with the length of one triple-helical repeat. Interchain hydrogen bonds are shown with broken lines. It should be noted that this representation of the asymmetric unit is arbitrary. The repeating unit could be similarly represented by three chains of seven residues each or by one chain of 21 residues. Whichever representation is chosen, the c-axis unit translation generates the entire triple helix. The Figure was generated with MOLSCRIPT (Kraulis, 1991). (b) Line diagram showing the numbering scheme of the molecular replacement model. The first chain is numbered from one to nine, the second from 31 to 36, and the third from 61 to 66. A cylindrical projection is shown with the first chain repeated on the right-hand side of the diagram for clarity. Due to the quasi-infinite nature of the triple helix, covalent bonds are necessary to join the molecule with its symmetry mates both above it and below it along the helical axis. These connections are displayed as well (symmetry-related residues are indicated with #). For example, the N terminus of the first chain is contiguous with the C terminus of the second chain of a symmetry-related molecule. This connects residue 1 with residue #36. Interchain hydrogen bonds are shown with thick diagonal lines. (c) A 2Fo − Fc electron density map with one Pro-Pro-Gly tripeptide displayed from the PPG 2 structure. The map is contoured at 1σ and was generated with SETOR (Evans, 1993). Hydrogen atoms are not shown.

Table 1. Data collection parameters and refinement statistics

                                          PPG 0a           PPG 1                 PPG 2
A. Data collection
Data detection device                     Diffractometer   CAD4 diffractometer   CCD detector
Data collection temp. (°C)                12               −14                   20
High resolution limit (Å)                 2.2              1.97                  1.6
No. of unique reflections                 787              1136                  1836
Overall completeness (%)                  94               100                   86
Completeness (top shell) (%)              –                98 (2.2–1.97 Å)       60 (1.8–1.6 Å)
Rmerge (based on I) (%)c                  –                –                     4.9d
Space group                               P2₁2₁2₁          P2₁2₁2₁               P2₁2₁2₁
Unit cell dimensions (Å)
  a                                       26.93            26.82                 27.01
  b                                       26.42            26.29                 26.42
  c                                       20.08            20.18                 20.42
B. Refinement
Resolution (Å)                            Up to 2.2        8–1.97                8–1.6
No. of reflections (F > 2σF)              401b             861                   1736
Rcryst (%)                                30               18.1                  21.3
Peptide non-hydrogen atoms                126              126                   126
Water sites                               21               37                    40
Acetic acid molecules                     0                2                     0
rms deviations from standard geometries
  Bonds (Å)                               –                0.011                 0.009
  Angles (deg.)                           –                2.07                  1.81
  Impropers (deg.)                        –                2.11                  1.99
Average temperature factors (Å²)
  All atoms                               –                15.90                 21.10
  Peptide atoms                           –                13.20                 15.62
  Solvent                                 –                23.44                 38.34

a Okuyama et al. (1981).
b A different criterion for reflection selection (F ≥ 90) was used.
c Rmerge = Σ|Iobs − ⟨I⟩| / ΣI.
d Rmerge between PPG 1 and PPG 2 data sets is 15.9% for 967 reflections between 8 and 1.97 Å. Rmerge = Σ|I1 − I2| / ΣI2.

Table 1 gives overall statistics for both models. The two models, obtained following significantly different crystallization and data collection conditions, possess very similar molecular conformation and will be referred to collectively as PPG. The final models show good agreement with data, have R-factors of 18.1% and 21.3% for the PPG 1 and PPG 2 models, respectively, and fit electron density maps well (Figure 1(c)). In addition, the two hydration networks are very similar, though they were determined completely independently. The structure of (Pro-Pro-Gly)10 determined by Okuyama et al. (1981) will be referred to as PPG 0.

Structural description

Because of the greater number of reflections used in the structure determination and refinement of PPG 1 and PPG 2, it was possible to remove 7-fold symmetric non-crystallographic restraints and thus obtain a more accurate structure than was obtained for PPG 0. The PPG 1 and PPG 2 structures differ from that of PPG 0 in the direction of the chain trace (Figure 2). The intrinsic symmetry of the Pro-Pro-Gly sequence and the quasi-infinite nature of the helix yield two very similar possible models, differing primarily by the directionality along the helical axis.
By using higher-resolution data and a less-constrained model, discrimination between the two was possible. The reverse model (similar to PPG 0) was investigated but was found to be incorrect (see Materials and Methods).

Figure 2. PPG 1 and PPG 2 (a) differ from PPG 0 (b) in the directionality of the chain trace. Considering the molecule in the lower left-hand corner of the unit cell as a reference, in the PPG 1 and PPG 2 structures, this molecule is oriented C→N towards the positive crystallographic c axis. In the PPG 0 structure this molecule is oriented in a reverse way with N→C along the positive c axis. The Figure was generated with MOLSCRIPT (Kraulis, 1991).

Table 2. The rms deviations (Å) between the PPG 1, PPG 2, PPG 0, Gly→Ala and PPG COMP triple-helical structures

                 PPG 1         PPG 0a,b      Gly→Alac      PPG COMPd
A. Triple helix (21 residues)
PPG 2            0.23 (0.21)   0.38 (0.28)   0.29 (0.26)   0.51 (0.41)
PPG 1            –             0.43 (0.35)   0.33 (0.29)   0.56 (0.43)
PPG 0            –             –             0.45 (0.35)   0.64 (0.49)
Gly→Ala          –             –             –             0.52 (0.43)
B. One chain (three Pro-Pro-Gly repeat units)
PPG 2            0.23 (0.21)   0.34 (0.27)   0.29 (0.26)   0.29 (0.20)
PPG 1            –             0.41 (0.37)   0.36 (0.31)   0.39 (0.29)
PPG 0            –             –             0.35 (0.30)   0.37 (0.25)
Gly→Ala          –             –             –             0.33 (0.20)
C. One Pro-Pro-Gly repeat unit
PPG 2            0.16 (0.17)   0.23 (0.18)   0.15 (0.12)   0.24 (0.08)
PPG 1            –             0.30 (0.28)   0.22 (0.22)   0.33 (0.19)
PPG 0            –             –             0.14 (0.12)   0.31 (0.21)
Gly→Ala          –             –             –             0.30 (0.14)

The rms deviations have been computed using all non-hydrogen atoms of the structures. The rms deviations computed on backbone atoms (N, Cα, C and O) alone are given in parentheses.
a Okuyama et al. (1981).
b The PPG 0 model was transformed to the correct chain directionality prior to computation of rms deviations.
c Bella et al. (1994), a segment from the regular, Pro-Hyp-Gly portion of the molecule was used in the calculations.
d Computationally derived model of Némethy et al. (1992).
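The comparison between the two chain directions rests on the crystallographic R-factor. The text does not spell out its definition, so the sketch below assumes the conventional residual, R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|, with the calculated amplitudes already placed on the scale of the observed ones. The function name and the amplitudes in the usage example are invented for illustration and are not data from this study.

```python
def r_factor(f_obs, f_calc):
    """Conventional crystallographic R-factor (assumed definition):
    sum of |  |Fobs| - |Fcalc|  | divided by the sum of |Fobs|.
    Scaling of Fcalc to Fobs is assumed to have been done beforehand."""
    if len(f_obs) != len(f_calc):
        raise ValueError("observed and calculated lists differ in length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical amplitudes, for illustration only.
    observed = [100.0, 80.0, 60.0, 40.0]
    model_a = [90.0, 85.0, 50.0, 45.0]
    model_b = [70.0, 95.0, 45.0, 55.0]
    print(f"model A: R = {100 * r_factor(observed, model_a):.1f}%")
    print(f"model B: R = {100 * r_factor(observed, model_b):.1f}%")
```

When two candidate models are nearly related by symmetry, as is the case for the forward and reversed chain traces, most calculated amplitudes are similar and the two R-factors are expected to differ by only a few percentage points.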
Rigid-body refinement trials using the PPG 0 model (not including water molecules) and the PPG 0 model with a reverse chain trace against PPG 0 data gave further evidence that the reversed model is correct. After several cycles of rigid body refinement using X-PLOR (Brünger, 1992) and the data selection criteria of the original determination, i.e. reflections up to 2.2 Å with F ≥ 90 (399 reflections), the R-factors for the PPG 0 model and the reversed PPG 0 model are 42.0% and 39.1%, respectively. This pattern is observed also if weak data (697 reflections) are included; rigid body refinement yields R-factors of 46.6% for the original model and 42.6% for the reversed model. Given the similarities between the two models, a larger disparity in R-factors would not be expected.

The main conformational characteristics of the polymer-crystal model for (Pro-Pro-Gly)10 are very similar to those reported by Okuyama et al. (1981) and to those exhibited by the (Pro-Hyp-Gly)n regions of the crystal structure of the Gly→Ala peptide (Bella et al., 1994). The rms deviations among various structures are given in Table 2. In the polymer-crystal model, three identical chains in polyproline II conformation are aligned in parallel and wrap around the triple-helical axis with a stagger of one residue between adjacent chains. The three chains are held together through hydrogen bonds following the Rich & Crick II pattern (Rich & Crick, 1961) between glycyl N-H groups and CO groups of the proline residues in the X position of the neighboring chain, as was observed in the PPG 0 structure (Figure 1(b)).

Table 3. Averaged values of PPG main chain dihedral angles. The values are compared with those of Gly→Ala, PPG 0 and native collagen structures. Standard deviations are given in parentheses.

Torsion angle   PPG 0a    PPG 1 (this work)   PPG 2 (this work)   Gly→Alab        Collagen fiberc
ω ProX          178.2     178.6 (0.6)         177.8 (0.7)         179.9 (1.8)     180.0
φ ProX          −75.5     −73.1 (8.8)         −75.0 (2.7)         −72.6 (7.6)     −72.1
ψ ProX          152.0     159.7 (3.7)         161.4 (3.1)         163.8 (8.8)     164.3
ω ProY          −176.8    179.1 (1.2)         176.7 (2.1)         178.5 (1.5)     180.0
φ ProY          −62.6     −58.7 (8.1)         −61.2 (1.1)         −59.6 (7.3)     −75.0
ψ ProY          147.2     161.0 (12.4)        153.3 (2.2)         149.8 (8.8)     155.8
ω Gly           178.2     179.6 (0.5)         −179.9 (0.2)        177.3 (3.1)     180.0
φ Gly           −70.2     −83.7 (11.2)        −75.8 (2.0)         −71.9 (9.6)     −67.6
ψ Gly           175.4     179.8 (5.8)         179.5 (3.5)         174.1 (11.9)    151.4

a Okuyama et al. (1981).
b Bella et al. (1994), Ala residues are classified with Gly residues.
c Fraser et al. (1979).

Table 4. Helical parameters of PPG 1, PPG 2 and PPG 0, Gly→Ala and native collagen (standard deviations in parentheses)

                              PPG 0a   PPG 1b (this work)   PPG 2b (this work)   Gly→Alac   Collagen fiberd
Helix twist height D (Å)      8.6      8.65 (0.08)          8.75 (0.03)          8.4        8.6
Helix twist angle θ (deg.)    51.4     51.4 (2.6)           51.4 (1.9)           60         36

a Okuyama et al. (1981).
b Calculated by placing the triple-helical axis parallel with the crystallographic c axis and measuring the rotation (θ) and translation (D) needed to superimpose each triplet with the next along the chain.
c Bella et al. (1994); values include the alanine substitution zone as well as the Pro-Hyp-Gly regions of the structure.
d Fraser et al. (1979).

The φ and ψ conformational angles of the final structures are typical of a polyproline II conformation, and are very close to those reported in previous studies of model polypeptides with collagen-like sequences (Table 3). The helical symmetry of the models is almost exactly 7₅. Helical twist parameters of the PPG 1 and PPG 2 structures are given in Table 4. Although non-crystallographic symmetry restraints were applied during the first part of the refinement, their removal in later stages did not produce significant changes in the agreement with the X-ray data, indicating that the final models do not deviate significantly from the symmetrical one.
Accordingly, the rms deviations between the final unrestrained models and an idealized 7-fold model are 0.32 Å (0.24 Å) and 0.23 Å (0.18 Å) for PPG 1 and PPG 2, respectively; rms deviations computed on backbone atoms alone are given in parentheses.

The average geometrical parameters for the interchain hydrogen bonds are similar to those observed for the Gly→Ala peptide. The average Gly N to Pro CO distances are 3.01 Å and 2.96 Å for the PPG 1 and PPG 2 structures, respectively. These compare well with 2.94 Å for the Gly→Ala structure. The average N···O=C angles are 165° and 166° for the PPG 1 and PPG 2 structures, respectively. Again, these compare well with the Gly→Ala value of 163°. In addition to the Rich & Crick II interchain hydrogen bonds, the Gly→Ala structure showed evidence of Cα-H···O=C hydrogen bonds (Bella & Berman, 1996). With the exception of the central disruption zone of the molecule, both Hα1 and Hα2 of Gly residues interact with the Gly CO group of a neighboring chain, creating a bifurcated hydrogen bond. In addition, Hα1 makes a hydrogen bonded interaction with the Pro CO, thus forming a three-centered hydrogen bond. An additional pattern was observed involving the Hα from the Hyp residue, which interacts with the Pro CO. Hydrogen atoms from the Pro residues are directed into the solvent and were not considered. The PPG 1 and PPG 2 structures show analogous patterns. Hydrogen bonding distances and angles are given in Table 5.

The puckering of the imino acid rings is dependent on the position of the Pro residue. In general, those in the X position show downward puckering, whereas those in the Y position display upward puckering. The geometries of the upward and downward conformations were described by Momany et al. (1975). This puckering behavior was previously reported for (Pro-Pro-Gly)10 (Okuyama et al., 1981), although the resolution of the data did not allow for discrimination between these two conformations other than by R-factor.
This pattern of downward puckering in the X position and upward puckering in the Y position has been described for collagen or collagen-like peptides in various other experiments: the fiber X-ray diffraction patterns of native collagen (Fraser et al., 1979); the 2D-NMR measurements of the collagen-like peptide (Pro-Pro-Gly)10 (Li et al., 1993); and the crystal structure of the Gly→Ala peptide (Bella et al., 1994). In contrast to Hyp-containing triple helices, where direct water interactions may also play a role in the conformation of the Y position imino acid, the pattern observed for (Pro-Pro-Gly)10 must necessarily arise from either conformational effects derived from the different favored values of the backbone and/or side-chain torsion angles for Pro in the X or Y position, from local steric effects, or from indirect hydration effects.

In the PPG 1 structure, there appears to be one exception to the general puckering preference. This occurs at residue 65. This Y position proline ring puckers in the down conformation rather than in the expected up conformation. Efforts to model this ring with the reverse pucker only raised the R-factor and the model refined back to the original down conformation. Electron density maps also seem to confirm that this residue is in the down orientation. The proline ring at the same position in the PPG 2 structure is puckered in the up conformation. The observation of two different puckering conformations in two similar structures indicates the potential flexibility of the proline ring. Recent theoretical studies corroborate this observation (Némethy et al., 1992). Their results suggest that ring puckering is not immutable and that such interchanges may be more readily accomplished in the Y position than in the X position.

Table 5. Average selected hydrogen bonding parameters for PPG 1 and PPG 2 compared with Gly→Ala (Bella & Berman, 1996)

Interatomic distances (Å)       PPG 1         PPG 2         Gly→Alaa
A. N-H···O=C hydrogen bonds
HN Gly···O Pro X                2.14 (0.11)   2.05 (0.07)   2.06 (0.07)
N Gly···O Pro X                 3.01 (0.05)   2.96 (0.07)   2.94 (0.08)
B. Cα-H···O=C hydrogen bonds
Hα1 Gly···O Gly                 2.65 (0.14)   2.56 (0.05)   2.63 (0.20)
Hα2 Gly···O Gly                 2.91 (0.14)   2.85 (0.08)   2.79 (0.16)
Cα Gly···O Gly                  3.21 (0.12)   3.13 (0.05)   3.15 (0.15)
Hα1 Gly···O Pro X               2.51 (0.17)   2.45 (0.07)   2.41 (0.18)
Cα Gly···O Pro X                3.55 (0.16)   3.49 (0.07)   3.46 (0.18)
Hα Pro Y···O Pro X              2.48 (0.10)   2.39 (0.04)   2.52 (0.19)
Cα Pro Y···O Pro X              3.36 (0.12)   3.29 (0.04)   3.41 (0.16)

Interatomic angles (deg.)       PPG 1         PPG 2         Gly→Alaa
A. N-H···O=C hydrogen bonds
N-H Gly···O Pro X               147 (9)       153 (4)       150 (4)
H Gly···O=C Pro X               156 (7)       157 (3)       154 (5)
N Gly···O=C Pro X               165 (5)       166 (2)       163 (5)
B. Cα-H···O=C hydrogen bonds
Cα-Hα1 Gly···O Gly              112 (6)       112 (3)       109 (7)
Cα-Hα2 Gly···O Gly              96 (5)        94 (2)        100 (8)
Hα1 Gly···O=C Gly               95 (5)        97 (1)        91 (5)
Hα2 Gly···O=C Gly               116 (6)       118 (1)       110 (6)
Cα Gly···O=C Gly                103 (5)       104 (1)       99 (5)
Cα-Hα1 Gly···O Pro X            164 (6)       161 (3)       165 (6)
Hα1 Gly···O=C Pro X             111 (6)       109 (3)       113 (8)
Cα Gly···O=C Pro X              115 (5)       114 (3)       117 (8)
Cα-Hα Pro Y···O Pro X           138 (6)       140 (1)       140 (5)
Hα Pro Y···O=C Pro X            129 (4)       130 (2)       126 (5)
Cα Pro Y···O=C Pro X            139 (5)       140 (2)       136 (5)

Hydrogen atoms have been placed based on the crystal coordinates of the heavier atoms, using X-PLOR default parameters (Brünger, 1991). Standard deviations are shown in parentheses.
a The proline residue in the Y position is hydroxyproline.

Crystal packing

The distribution of the triple helices follows a pattern that can be envisioned by the position of the intersections of their helical axes with the 001 plane. The intersection points display a tiling made of regular squares and triangles as was first suggested by Okuyama et al. (1981).
In this fashion, every helix is five-coordinated and two different kinds of clusters appear: a square cluster in which helices placed diagonally run in parallel and are antiparallel with those of the other diagonal, and a triangle cluster in which two helices run in parallel with and opposite to the third one, no matter which triangle or square is considered (Figure 2(a)). Because of this mixed-parallel nature of the molecules, there is only quasi-tetragonal symmetry and the structure falls instead into the P2₁2₁2₁ space group with 2-fold rather than 4-fold symmetry. Aperiodic lattices made of squares and triangles have been invoked to account for the pseudo-hexagonal pattern of lateral packing between collagen triple helices (Sasisekharan & Bansal, 1990). This tiling provides a useful classification tool for the analysis of the water distribution.

Hydration analysis

From the early stages of the refinement, electron density maps displayed a considerable number of maxima that by their shape, distance to main-chain atoms and orientation, were good candidates for water molecules. Non-crystallographic symmetry restraints have not been applied to the water molecules, in contrast to the procedure utilized for the PPG 0 structure. As solvent molecules are more dependent on local environments and the true crystallographic packing symmetry is incompatible with 7-fold symmetry, water molecules must be distributed in a non-symmetric way that is dependent on the packing arrangement of the triple-helical molecules.

Prior to the addition of any water molecules, significant density appeared in the Fourier maps in the "triangle" regions, but very little density was apparent in the "square" regions. After 28 water molecules had been included in the PPG 1 model, density for reasonably well-defined water molecules in the square regions became evident (Figure 3(a)). In this way it is clear that the addition of the initial water molecules enhances the phasing of the entire structure.
The situation was similar for the PPG 2 structure, in which an automated water-picking procedure was used and the first water molecules chosen were in the triangle regions. In both models, the density in the square regions remained much more diffuse than in the regions of closer intermolecular contact; water molecules in the square regions correspond to lower peaks in the electron density maps and do not participate as readily in discernible interwater links. This suggests that the water in the square regions is somewhat less ordered than that in other areas, leading possibly to a bulk solvent channel. This may lead to increased disorder of the solvent structure in this area as well as of the portion of the molecule contacting this region. For example, the reverse-puckered prolyl ring at position 65 in the PPG 1 structure occurs in the square region of the packing.

The final PPG 1 polymer-crystal model for (Pro-Pro-Gly)10 contains 37 water molecules and two acetic acid molecules. The PPG 2 model contains 40 water molecules. In both structures these represent average solvation positions along the extended unit cell. Because of the differences in the crystallization conditions, data collection techniques and resolution, small differences in the solvent distribution would be expected between the two models. The majority of the water molecules participate in extensive hydrogen bonding with peptide carbonyl groups and/or other water molecules, in a way that is clearly reminiscent of what has been observed for the Gly→Ala peptide (Bella et al., 1995) and comprises a coherent water network around the triple helix that can be divided into multiple hydration shells.

Figure 3. (a) Molecular packing looking down the helical axis of PPG. The packing can be viewed as having two general regions, one triangular and the other square (shown with broken purple lines). The water molecules in the square regions appear to be less ordered and more diffuse. The hydration pattern in the center of the square regions maintains a pseudo-tetragonal distribution. Helices are surrounded by five nearest-neighbors that vary in distance from 13.5 Å to 13.9 Å. Two other neighbors, across the square regions, are 19.4 Å and 19.6 Å away. The unit cell is shown with thin, broken lines. (b) The packing of Gly→Ala is hexagonally closest packed with the six generally similar interhelical distances ranging from 14.0 Å to 14.9 Å. The molecules of Gly→Ala appear thicker than those of PPG because of the additional length of the molecule (90 amino acid residues compared with 21) and the unwinding of one end of the Gly→Ala helix with respect to the other. All distances are in Å. The Figure was generated with CHAIN (Sack, 1988).

The first hydration shell contains 20 water molecules that are directly bound to the peptide chain. On average, the water molecules are positioned 2.97 Å and 2.81 Å from carbonyl groups for PPG 1 and PPG 2, respectively. This shell is characterized by a repetitive pattern in which one water molecule is bound to the glycyl carbonyl group (Figure 4(a)) and two are bound to the prolyl carbonyl group in the Y position (Figure 4(b)). As was observed in the Gly→Ala structure, the Gly carbonyl group points slightly more towards the molecule than the Pro carbonyl group in the Y position and the second water position is occupied by the Cα of a glycine residue from a neighboring chain. As a result, Hα1 and Hα2 from the neighboring Cα make hydrogen bonding contacts to the Gly carbonyl group (Table 5). The two positions on the Y prolyl carbonyl group can be termed WN and WA according to their proximity to the nitrogen and the α-carbon, respectively†.
These water bridges satisfy all backbone polar groups, since the carbonyl groups of the proline residues in the X position participate in interchain hydrogen bonds with glycine N–H groups. In this way, the first shell hydration pattern involving carbonyl groups is identical with that reported in the Gly→Ala structure. This demonstrates that this portion of the pattern is sequence/hydroxyproline-independent and can be a general feature of the triple-helical motif.

† A numbering scheme has been developed for these water molecules in which those attached to the first chain are numbered 101 to 109, the second chain 111 to 119 and third chain 121 to 129. They then form groups of three in which the first is in the WA position on the prolyl carbonyl group, the second is that in the WN position, and the third is attached to the glycyl carbonyl group. All second and third shell water molecules are given numbers in the 200s.

The sole exception in this pattern occurs in both the PPG 1 and the PPG 2 models at residue 35. Here, at this Y position proline residue, one of the two expected water molecules is missing; the water molecule in the WN position is present and that in the WA position is absent. This can be explained by the proximity of the carbonyl group to the less-dense square region of the molecular packing (Figure 3(a)). A water molecule in this position would fall into the square region.

The repetitive regularity of the first hydration shell is further demonstrated by the similarity between the PPG 1 and PPG 2 structures. The rms deviation between the water molecules of the first hydration shell of these two structures is 0.85 Å. The long distance between one of these water molecules in the PPG 1 structure (water 121) and the carbonyl group can be explained by its proximity to the square region, where the water seems to be generally much less ordered.
As a result, the position of this water molecule may not be well defined. The uniformity between the two structures is particularly interesting in that these water molecules represent average positions along the triple helix (because of the reduced size of the asymmetric unit), indicating that these positions are extremely well conserved.

Proline residues in the Y position do not contain a hydrophilic hydroxyl group, but still are surrounded by ordered water that would fall within a reasonable hydrogen bonding distance from a simulated hydroxyl group (Figure 4(c)). When the averaged water positions surrounding such a hypothetical hydroxyl group are superimposed with those found for the Gly→Ala peptide, the averaged positions from PPG are close but not exactly coincident with those of Gly→Ala. However, diffuse portions of the PPG averaged water density fall on both the WB and WD2 positions (Bella et al., 1995) of the Gly→Ala structure. It can be proposed that the water surrounding the Y position proline, once in the presence of a hydroxyproline residue, becomes better localized and is thereby shifted into the correct bridge-building geometry. This would be accomplished without much loss in entropy, since the water molecules already occupy ordered positions, but with a gain in enthalpy through the formation of hydrogen bonded contacts. These observations indicate that the existence of localized water surrounding the Y position imino acid is itself not dependent on hydroxyproline, but the presence of hydroxyproline induces the formation of additional water bridges, greater localization, and a more extensive hydration network. X position proline rings are not similarly surrounded by ordered water positions.

Additional water molecules form a second hydration shell, that is, water molecules that are bound to those bound to the peptide chain.
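The shell definitions used in the text (first shell: water bound directly to the peptide; second shell: water bound to first-shell water) can be illustrated with a minimal distance-based sketch. This is not the procedure used in the paper: the function name, the coordinates and the single cutoff are assumptions made for illustration, with the 3.25 Å value simply borrowed from the cutoff quoted for the water distribution diagrams.

```python
import math


def classify_hydration_shells(carbonyl_oxygens, waters, cutoff=3.25):
    """Assign each water to shell 1, shell 2, or None (unassigned).

    Shell 1: within `cutoff` (Å) of any carbonyl oxygen.
    Shell 2: not in shell 1, but within `cutoff` of a shell-1 water.
    The cutoff is an illustrative assumption, not the paper's criterion.
    """
    def close(p, q):
        return math.dist(p, q) <= cutoff

    shells = {}
    # First pass: waters in direct contact with the peptide.
    for name, pos in waters.items():
        if any(close(pos, o) for o in carbonyl_oxygens):
            shells[name] = 1
    first_shell_positions = [waters[name] for name in shells]
    # Second pass: waters in contact with a first-shell water.
    for name, pos in waters.items():
        if name not in shells:
            bound = any(close(pos, w) for w in first_shell_positions)
            shells[name] = 2 if bound else None
    return shells


# Hypothetical coordinates in Å, for illustration only.
oxygens = [(0.0, 0.0, 0.0)]
waters = {
    "W101": (2.8, 0.0, 0.0),   # close to the carbonyl oxygen
    "W201": (5.6, 0.0, 0.0),   # close to W101 only
    "W301": (12.0, 0.0, 0.0),  # close to nothing
}
print(classify_hydration_shells(oxygens, waters))
```

With real coordinates the same idea would need symmetry-related molecules to be generated first, since a water in the second shell of one helix may sit in the first shell of a neighbor.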
These water molecules form repetitive bridges that are similar to the α (intrachain), β (interchain) and ω (intermolecular) bridges connecting carbonyl groups as described for the Gly→Ala structure (Bella et al., 1995; and see Figure 5(a)). In the PPG 1 structure, acetic acid molecules take the positions of water molecules and participate in bridges. The second hydration shell, in general, forms three-water molecule intra- and interchain bridges (α3 and β3 bridges)‡ connecting water molecules of the first hydration shell. In some cases, a water molecule from the first hydration shell of one peptide molecule is part of the second hydration shell of another symmetry-related peptide molecule. These bridges are quite repetitive and fundamentally pentagonal in shape, with carbonyl groups occupying the additional two apices of the pentagon (Figure 5(b)). Five of the six potential hydration positions occupied by acetic acid in the PPG 1 structure are occupied by water molecules in the PPG 2 structure. The second hydration shells of PPG 1 and PPG 2 have 18 hydration positions in common, including the five positions from acetic acid molecules. The rms deviation of these 18 water positions between the PPG 1 and PPG 2 structures is 0.86 Å. Included in these 18 water molecules are the pseudo 4-fold water positions that can be seen in the center of the square region in Figure 3(a). Thus, even within this region of decreased order there is still a large degree of structural similarity.

‡ When considering bridges, the usual requirements for hydrogen bonding distances have been taken rather loosely because of the averaged nature of the structure. Long and short distances can be considered to be an effect of the averaged nature of the structure. In general, the overall appearance of the bridge was considered.

Figure 4. Water distribution diagrams around the carbonyl groups of (a) glycine and (b) the Y position proline residues of PPG. Water molecules were chosen using a 3.25 Å cutoff from the carbonyl group. The method of Schneider et al. (1993) was used to calculate three-dimensional contours. These positions are very similar to those shown for the Gly→Ala peptide by Bella et al. (1995), wherein the glycine carbonyl group has one water bonding position and the proline carbonyl group has two. The water positions are labeled WN or WA according to their proximity to the N or Cα atoms, respectively. (c) Water positions surrounding the ring of proline in the Y position. A hydroxyl group (shown in red) was modeled at the Cγ position of the proline ring. Water molecules within 3.5 Å of the simulated Oδ were selected (red) and contours were calculated with these water molecules (also shown in red). The resulting contours and averaged water positions were superimposed on those from hydroxyproline from the Gly→Ala structure. The contours and water molecules from the Gly→Ala structure are shown in blue, and are labeled WD1, WD2 and WB according to their respective proximity to Cδ and Cβ. The superimposition demonstrates that while the general positions between the two structures are different, it is conceivable that a hydroxyproline residue could direct the water molecules around the Y proline residue into the positions from the Gly→Ala structure. The Figure was generated with CHAIN (Sack, 1988).

Figure 5. Examples of hydration structure in the (a) Gly→Ala structure, (b) PPG 1 or PPG 2 and (c) PPG 0 structures. In (a) and (b) two water molecules are bound to the carbonyl group of the Y position proline residue (WA and WN) and one water molecule is bound to the glycine carbonyl group (WN). In (a) two additional water molecules are bound to Oδ of the hydroxyproline residue. Inter- and intrachain water bridges are then formed by interconnecting water molecules. In general, the water structure of PPG shows repetitive pentagonal-like inter- and intrachain bridges between carbonyl groups. In (c) two water molecules, W1 and W2, are bound to the carbonyl group of the Y position proline residue. W1 is also bound to the glycine carbonyl group, forming a one water molecule intrachain bridge. W1 and W2 are connected by W3, forming a three water molecule interchain bridge similar to that seen in (a) and (b). The Figure was generated with MOLSCRIPT (Kraulis, 1991).

The α bridges (Figure 6(a)) connect the Y position prolyl carbonyl group with the immediately following glycyl carbonyl group, utilizing the water molecule in the WA position on the proline carbonyl group. The β bridges (Figure 6(b)) connect the glycyl carbonyl group in one chain with the Y position prolyl carbonyl group in the adjacent chain, utilizing the WN position of the proline carbonyl group. Thus, the water molecule that is attached to the glycyl carbonyl group participates in two bridges (one inter- and one intrachain). Interchain β3 bridges are made between the same chains, as are the interchain hydrogen bonds, thus reinforcing the triple-helical structure. In a few cases, the α-bridge pentagons are distorted by proline rings from neighboring helices that occupy the position in which the water molecule would seem most naturally to fit, and two-water bridges can be envisioned. In other cases, interchain distances are bridged by four-water β bridges or have one particularly long leg. These perturbations can be seen as a consequence of the proximity of the bridge to the square region or an interfering proline ring.

A variety of ω bridges are also observed, i.e. bridges between different neighboring helices. Figure 7 shows an example of an ω bridge formed by the intersection of two interchain bridges. The length of these bridges is dependent on their location with regard to interhelical packing, and may include two, three or four water molecules.
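The rms deviations quoted for the hydration shells compare paired water positions from the two models. A minimal sketch of that calculation is given here, assuming the two coordinate sets are already expressed in the same frame and listed in matching order; the function name and the coordinates are hypothetical, and no superposition step is included.

```python
import math


def rms_deviation(coords_a, coords_b):
    """RMS deviation between paired positions (same units as the input).

    Assumes both lists are in the same reference frame and that the
    i-th entries of each list describe equivalent hydration sites.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical paired water positions in Å, for illustration only.
model_1 = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
model_2 = [(1.0, 0.0, 0.0), (3.0, 1.0, 0.0)]
print(rms_deviation(model_1, model_2))  # each pair is 1 Å apart, so 1.0
```

Because the pairing is explicit, positions present in only one model (such as the few that differ between PPG 1 and PPG 2) would have to be excluded before calling the function.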
As in the Gly→Ala structure, there are no direct contacts between the peptide molecules themselves. Any intermolecular interactions occur through water molecules and ω bridges.

In contrast to Gly→Ala, few water molecules in a third hydration shell are seen. In the regions of close packing (triangular and interhelical areas), the peptide molecules are too close to allow a third layer. In this region, water molecules in the second hydration shell from one helix may become the first shell water molecules of an adjacent symmetry-related helix or there may be interaction between second shell water molecules making the third shell water molecules from one helix the second shell water molecules from another. In the regions of less dense packing (square areas) the distances between helices are greater. While ordered water molecules do appear in these regions, there are overall fewer and the bridging patterns are less distinct. Hence most of the ordered water molecules lie in the triangular or interhelical regions. The PPG 1 and PPG 2 determinations differ essentially by four water positions in the PPG 1 structure and two in the PPG 2 structure. Three of the four water molecules from the PPG 1 structure lie in the square region.

Figure 6. Water bridging patterns connecting carbonyl groups along the chain. (a) Intrachain α bridges may utilize either two or three water molecules. In one case an α bridge is missing due to the absence of a water molecule in the WA position on residue 35. (b) Interchain β bridges may incorporate three or four water molecules. In the PPG 1 structure an acetate molecule participates in the β bridge connecting residue 33 with residue 62, occupying two of the four hydration positions. Long bridging distances are marked with an asterisk (*). These bridges were included as they give the general appearance of a β bridge. Glycine Cα atoms have been omitted for clarity.

Comparison of packing with that of the Gly→Ala structure

While the Gly→Ala and PPG structures have similar molecular conformation (in terms of φ/ψ angles, puckering, and interchain hydrogen bonding) and first hydration shell patterns where carbonyl groups are involved, the supermolecular arrangements of the molecules are different. In the Gly→Ala structure, the triple helices pack in a way that is reminiscent of the putative quasi-hexagonal closest packing of collagen fibrils (Figure 3(b)). In PPG, the distribution of the molecules is different (Figure 3(a)). As mentioned above, the packing can be described by a series of intersecting triangles and squares in which each triple helix is surrounded by five close neighbors and two neighbors that are further away across the square region. In this way, the molecules are not as equivalently or symmetrically placed as they are in the Gly→Ala structure. Comparing interhelical distances, the molecules of Gly→Ala are all separated by about 14 to 15 Å, producing six essentially equivalent interactions for any one helix, whereas in PPG, the interhelical distances are more varied. In the triangle regions, the five helices are about 13 to 14 Å apart, slightly smaller, but similar to Gly→Ala. However, across the square region the helices are about 19 Å apart.

These two distinctly different forms of packing indicate that the extensive water network that hydroxyproline induces is related to the determination of lateral molecular packing and therefore supermolecular structure. As collagen is required to form fibrils and other higher-order structures, the interaction with other molecules is critical.
Comparison with the PPG 0 structure

Although the interchain hydrogen bonds in PPG 0 maintain a normal distance, 2.86 Å between glycyl N–H groups and CO groups of the proline residues in the X position, the average N···O=C angles for the PPG 1 and PPG 2 structures (165° and 166°, respectively) are different when compared with those of PPG 0 (152°). Overall, the rms deviations demonstrate that PPG 1 and PPG 2 are closer in form to Gly→Ala than to PPG 0 (Table 2).

While the first hydration shell that is observed in PPG 1 and PPG 2 is very similar to that of Gly→Ala, it is significantly different from that reported for PPG 0 (Okuyama et al., 1981), in which one water molecule (W1) links the carbonyl group of the Y proline residue and the following glycine residue (Figure 5(c)). It was proposed that this water molecule could stabilize the glycine conformation and therefore the triple-helical structure. The angle that the hydrogen bonds make in this case is 68°. This angle seems possibly unstable and it is likely that there is a lower-energy way to make an intrachain water bridge. It is unlikely that a water molecule with such a small hydrogen bonding angle would provide this stabilization. The PPG 1 and PPG 2 structures demonstrate that this direct link does not exist, and the stabilization of the triple helix must arise from more extensive water bridges. A second water molecule (W2) in PPG 0 attached to the Y prolyl carbonyl group is analogous to the water molecule that has been observed on the Y position proline residue in the WN position. An interchain water bridge was observed involving three water molecules wherein W1 and W2 are connected by a third water molecule (W3). This bridge is nearly identical with the interchain β3 bridges observed in the PPG 1 and PPG 2 structures.

Figure 7. An example of interhelical ω bridges. The symmetry-related helix is shown in dark gray; symmetry-related water molecules are marked with an (*). This bridge pattern can be viewed as having ω2 (through water molecules 106 and *102) and several ω4 (through water molecules 115, 201, *204 and *109, for example) connections. The pattern is formed by the intersection of two β3 bridges (the first through water molecules 115, 201 and 106, and the second through water molecules *109, *204 and *102). This bridge occurs in an interhelical triangular region. This Figure demonstrates how interchain bridges span about one-seventh of the way around the helix. Water molecules are shown in light gray and carbonyl oxygen atoms involved in the bridges are shown in black. The Figure was generated with MOLSCRIPT (Kraulis, 1991).

Conclusions

We present two high-resolution structures of a long-studied collagen-like polypeptide that improve a previous structural determination (Okuyama et al., 1981). In comparison, the structures presented here display a reversal in chain direction and a different hydration pattern. Although the molecules in the crystals retain a polymer-like organization, a high-resolution averaged model for a triple-helical structure with a Pro-Pro-Gly sequence can be described. Two separate structural determinations were made using different crystallization conditions, data collection temperatures, and X-ray sources; yet the results are essentially the same. This serves to demonstrate that the findings are not condition-dependent, but rather are representative of the sequence. The failure to get completely ordered crystals, despite varied conditions, shows that the non-specific packing observed can be a consequence of the regularity of this sequence. This can be indirect evidence of the importance of some sequence variety and specificity for correct lateral assembly in native collagen.

In the two determinations, the peptide structures are quite similar to each other and show close agreement with the first atomic resolution structure of a triple helix (Bella et al., 1994).
The present model shows a clear pattern for the puckering of the imino acids (Pro X down, Pro Y up) consistent with previous findings (Fraser et al., 1979) and is similar to that observed in the Pro-Hyp-Gly regions of Gly→Ala but demonstrates that the Y position has the potential to be flexible, and may adopt the alternative pucker. The close similarity of molecular structure between these two high-resolution structures confirms that the presence of hydroxyproline does not directly affect the molecular structure in an imino acid-rich region of collagen and therefore the structural stability of the triple helix related to hydroxyproline arises solely from protein-water interactions.

The first hydration shells of PPG 1 and PPG 2 also display a high level of agreement with each other and with the Gly→Ala structure. Differences among the structures occur primarily in the extended water structure. Further, the ordered hydration found around the proline ring in the Y position demonstrates that even in strongly hydrophobic regions, the triple helix maintains extensive hydration. This indicates that while hydroxyproline is not necessary for hydration, its presence adds stability and interconnectivity to the water network that may be necessary for the functioning of native collagen. This involvement of hydroxyproline was suggested by the Gly→Ala structure (Bella et al., 1994, 1995) but the dissimilarity of its packing with that of the (Pro-Pro-Gly)10 structures demonstrates that this role for the extended hydration network induced by hydroxyproline is more extensive and directly related to lateral assembly and supermolecular structure.

Materials and Methods

Crystallization experiments

Two separate, independent sets of crystallization experiments were performed using the peptide (Pro-Pro-Gly)10, yielding crystals grown under different conditions.
In both sets of trials (PPG 1 and PPG 2) the hanging-drop vapor diffusion technique was employed. In the first set of experiments (PPG 1), (Pro-Pro-Gly)10 was purchased from Peptides International. X-ray diffraction quality crystals were obtained at 4 °C from 10 μl drops containing initial concentrations of 4.0 mg/ml of peptide dissolved in 10% (v/v) acetic acid, 0.1% (w/v) sodium azide, and 3.0% (w/v) PEG 400, equilibrated against a reservoir containing 1 ml of 6.0% PEG 400. The crystals were orthorhombic in shape with typical dimensions of approximately 0.2 mm × 0.2 mm × 0.1 mm. In this particular setting, acetic acid migration from the drop to the reservoir produced a gradual increase in the pH value of the drop, which resulted in nucleation processes and eventually in the appearance of single crystals.

The second set of crystallization experiments (PPG 2) utilized peptide purchased from Peninsula Laboratories Europe LTD. Small square plates of lengths ranging from 0.01 to 0.20 mm were grown within one to two weeks. Drops (10 μl) containing 7.5 mg/ml peptide (dissolved in 5% (v/v) aqueous acetic acid) and 0.05 M sodium acetate were equilibrated at room temperature against 1.0 ml reservoirs of 0.1 M acetate buffer at pH 5.5. A mass spectroscopic analysis of dissolved crystals indicated that they were composed entirely of chains that were ten triplets long.

Diffraction experiments on a PPG 1 crystal with a maximum dimension of 0.2 mm were carried out at −14 °C on an Enraf Nonius CAD4 diffractometer using Cu Kα radiation. Data up to a maximum resolution of 1.97 Å were collected. The majority of the observed diffraction data could be indexed in an orthorhombic unit cell with dimensions a = 26.82 Å, b = 26.29 Å and c = 20.18 Å. The space group was determined to be P2₁2₁2₁, with one triple-helical molecule in the asymmetric unit. Intensity measurements were corrected for Lorentz-polarization and absorption with MOLEN (Fair, 1990; and see Table 1).
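The PPG 1 cell dimensions fix the spacing of every reflection through the standard orthorhombic relation 1/d² = h²/a² + k²/b² + l²/c². The sketch below, with a hypothetical helper name, applies it to the (0 1 7) reflection that the paper later relates to the helical rise, and also divides c by seven, assuming (as the paper goes on to argue) that c spans one turn of a 7-fold triple helix.

```python
import math


def d_spacing_orthorhombic(h, k, l, a, b, c):
    """Interplanar spacing d (Å) of reflection (h k l) in an orthorhombic cell."""
    inv_d_squared = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    if inv_d_squared == 0:
        raise ValueError("the (0 0 0) reflection has no finite d-spacing")
    return 1.0 / math.sqrt(inv_d_squared)


# PPG 1 subcell dimensions in Å, as reported in the text.
a, b, c = 26.82, 26.29, 20.18

volume = a * b * c                                # cell volume in Å^3
d_017 = d_spacing_orthorhombic(0, 1, 7, a, b, c)  # spacing of (0 1 7)
rise = c / 7                                      # assumed 7 units per c repeat

print(f"cell volume  = {volume:.1f} Å^3")
print(f"d(0 1 7)     = {d_017:.2f} Å")
print(f"c / 7        = {rise:.2f} Å")
```

Both d(0 1 7) and c/7 come out close to 2.9 Å, which is why a strong reflection near that position is read as the signature of the triple-helical rise.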
Intensity data from a PPG 2 crystal were collected at room temperature at the A1 beamline of the Cornell High Energy Synchrotron Source (CHESS) using the oscillation method and a wavelength of 0.91 Å. In all, 118 images were recorded on a CCD detector at a distance of 47 mm, with an oscillation angle of 1° from a single crystal with a maximum length of 0.05 mm. The effective resolution of the data was 1.7 Å, although partially complete data up to 1.6 Å were also collected and used in refinement. The images were indexed and integrated using DENZO (Otwinowski, 1993) and merged with SCALEPACK (Minor, 1993). The overall Rmerge (on I) is 0.049 with a completeness of 86% and a mosaicity of 0.8°. Again, the space group was determined to be P2₁2₁2₁ with cell dimensions a = 27.01 Å, b = 26.42 Å and c = 20.42 Å (Table 1). A complete data set was also collected on a flash-frozen crystal. However, the freezing increased the crystal disorder. These data were therefore not used.

Figure 8. The subcell of (Pro-Pro-Gly)10. The entire 86 Å model is shown in dark gray and the seven residue model used in molecular replacement is shown in light gray. The subcell is clearly not large enough to hold the entire molecule and is exactly the height of one triple-helical repeat. The Figure was generated with MOLSCRIPT (Kraulis, 1991).

The predicted length of a 30 residue collagen triple helix is about 86 Å. Thus, the observed unit cell represents a subcell (Figure 8). Both diffraction sets also show evidence of a longer unit cell with identical a and b axes, and a c axis five times as long: c′ = 5c = 100.9 Å (PPG 1 set) or 102.1 Å (PPG 2 set). Identical findings have been reported previously for this peptide (Okuyama et al., 1981). The data of the dominant subcell correspond to those reflections with l′ = 5n. Reflections with l′ = 5n + m (m = 1, 2, 3, 4) were observed as well, especially with the synchrotron data, but their average intensities were much lower than those from the subcell.
Of the 1193 5n + 4 reflections collected, only 58% had intensities greater than 1σ(I). The rise per tripeptide in collagen triple helices is known to be 2.9 Å, and strong reflections (0 1 7) corresponding to that spacing appear in the PPG 1 data near the c axis. These reflections were not measured in the PPG 2 data collection. The 20 Å dimension of the short c axis corresponds to a complete turn of a 7-fold triple helix aligned along the c axis. The dominance of this reduced cell can be interpreted in terms of a structure partially disordered along the helical axis, in which the peptide molecules stack on top of each other to form a columnar structure. Because of crystalline disorder, this columnar structure resembles an infinite chain in which an individual triple helix cannot be discriminated from that above or below it. During the course of the work reported here, several attempts were made to model this disorder using the PPG 2 data from the l′ = 5n + m reflections, but the data proved to be insufficient to discriminate among individual molecules along the helical axis in the extended cell. Consequently, the average crystal structure corresponding to an infinite polymer crystal, in which the c axis is the helical repeat of a 7-fold collagen triple helix, was solved and refined using an asymmetric unit of 21 residues.

Figure 9. Comparing model A with model B from initial molecular replacement. (a) Model A. Presumably, the carbonyl group of the proline residue in the X position should make a hydrogen bonding contact with the N atom of the glycine residue in the neighboring chain. Model A displays "normal" N(Gly) to O(Pro)X hydrogen bonds in terms of length and orientation (broken single line). The distance to the Cα of the third chain is longer and the orientation is not as appropriate for a hydrogen bond (broken double line). (b) Model B. Interchain hydrogen bonding geometry is perturbed. In model B, the oxygen atom appears to be somewhat pointed toward the Cα (broken double line), rather than toward the N(Gly), and this length is reasonable for a hydrogen bond while the distance to the N(Gly) becomes longer (broken single line). (c) The high-symmetry sequence and quasi-infinite helical nature of Pro-Pro-Gly imply that an end-to-end rotation of the peptide chain would give analogous models with only the N and Cα positions transposed. Parts (a) and (b) of the Figure were generated with MOLSCRIPT (Kraulis, 1991).

† Given the small number of reflections, the discriminatory power of the free R-factor was significantly reduced, so other validation criteria were taken into account subsequently.

Structure determination and refinement

A simplified molecular replacement search was performed using the LALS program (Campbell-Smith & Arnott, 1978) and an idealized 21 residue fragment of a 7-fold triple helix. The polymer-crystal nature of this structure reduces the number of search variables from six to four. Since the helix axis can be aligned with the crystallographic c axis, the orientation search is reduced to the azimuth angle μ. Three translational variables, u, v and w, are still required to place the model correctly with respect to the unit cell origin. To simplify the search further, a first solution for the (u, v, μ) variables was obtained based on the 31 equatorial-like hk0 reflections up to 4 Å. Then, the search was extended (using all the data up to 4 Å, 127 reflections) into the third dimension by varying the vertical displacement w and the azimuth angle μ. Two independent solutions (models A and B) were found differing in the u translation and the μ rotation, which showed similar agreement with the X-ray data; R = 40.75% and 41.15% after rigid-body refinement. Torsion-angle refinement of both models in parallel using LALS (Campbell-Smith & Arnott, 1978) produced non-discriminative results.
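The agreement figures quoted for models A and B are crystallographic R-factors. As a reminder of what such a number measures, here is a minimal sketch of the conventional definition, R = Σ| |Fo| − |Fc| | / Σ|Fo|. It assumes the calculated amplitudes are already on the scale of the observed ones and uses made-up amplitudes; it does not reproduce the scaling, weighting or cutoffs applied by LALS or X-PLOR.

```python
def r_factor(f_obs, f_calc):
    """Conventional R-factor between observed and calculated amplitudes.

    R = sum(| |Fo| - |Fc| |) / sum(|Fo|), returned as a fraction.
    Assumes f_calc has already been scaled to f_obs.
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty amplitude lists of equal length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("observed amplitudes sum to zero")
    return numerator / denominator


# Hypothetical structure-factor amplitudes, for illustration only.
f_obs = [100.0, 50.0, 25.0, 25.0]
f_calc = [90.0, 60.0, 20.0, 30.0]
print(f"R = {100 * r_factor(f_obs, f_calc):.1f}%")  # 30 / 200, so 15.0%
```

Two models whose R values differ by less than half a percentage point on 127 reflections, as here, cannot be ranked on this statistic alone, which is consistent with the additional criteria the authors go on to use.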
Positional and overall B-factor refinements along with simulated annealing were performed on both models with X-PLOR (Brünger, 1991). Data between 8 and 2.0 Å were used and 7-fold non-crystallographic symmetry restraints were maintained. As the refinements proceeded, model B appeared to give somewhat higher R and R-free values†, higher rms deviations against standard geometries, as well as somewhat higher constraint energies (bonds and angles). However, neither set of Fourier maps (2Fo − Fc) was significantly better or worse in terms of chain continuity or coverage.

An investigation of the hydrogen bonding geometries in both models provided the final answer. While model A has interchain N–H···O=C hydrogen bonds with reasonable geometry, model B does not. In model B, interchain N···O distances are longer than Cα···O distances, as if the primary hydrogen bond donors were the alpha carbon instead of amide nitrogen atoms (Figure 9(a) and (b)). This is consistent with the effect of an end-to-end rotation of the triple helix. Because there is so much inherent symmetry in the Pro-Pro-Gly sequence and the quasi-infinite helix eliminates end-effects, an end-to-end rotation of the model leads to differences in just a few places. While the carbonyl oxygen atoms and proline rings remain in the same location, only Cα is substituted for N and vice versa (Figure 9(c)). Inverted modeling of the peptide helix imposes incorrect stereochemical restraints and yields inverted non-bonded geometry between the N–H···O=C hydrogen bonds and the Cα···O=C non-bonded interactions. A review of the 2Fo − Fc maps at this point further confirmed this notion, and consequently only model A was kept for subsequent rounds of refinement.

Several constraints were imposed in X-PLOR (Brünger, 1991) to ensure the matching of the end of one helix with the beginning of the next.
The nature of the infinite triple helix requires covalent bonds among symmetry molecules related along the helical axis. Non-crystallographic symmetry restrictions were removed and the resulting model underwent many rounds of simulated annealing, positional refinement, and manual water molecule fitting.

At a point near the end of manual water fitting using the PPG 1 data set, the coordinates were used to refine against the PPG 2 set. The initial R-factor upon placing the model (not including water molecules) against the PPG 2 set was about 27% for data between 8 and 1.8 Å. This and the preservation of electron density connectivity along the c axis indicated consistency between the two data sets. Refinement of the hydration structure continued against the two sets in parallel, including positional refinement, simulated annealing, as well as group (PPG 1) and individual (PPG 2) B-factor refinement. Independent refinement of the PPG 2 structure was performed using both X-PLOR (Brünger, 1991) and PROLSQ (Hendrickson & Konnert, 1981). At the start of the refinement of the PPG 2 structure, only the peptide portion of the PPG 1 structure was used; the Fourier was investigated independently for hydration peaks.

The final PPG 1 model contains 37 water and two acetic acid molecules and the final PPG 2 model contains 40 water molecules. The final R-factor of the PPG 1 model against the PPG 1 set of data (for reflections in the 8.0 to 1.97 Å range using a 2σ on F cutoff) is 18.1% and for the PPG 2 model against the PPG 2 set is 21.3% (for reflections in the 8 to 1.6 Å range using a 2σ on F cutoff; Table 1). The final models fit the 2Fo − Fc maps well and show no significant chain discontinuities. The rms deviations for both models against standard geometries are given in Table 1. Coordinates for both the PPG 1 model and PPG 2 model have been deposited in the Brookhaven Protein Data Bank as 1a3i and 1a3j respectively.
Structure factors have been deposited as well with the codes r1a3isf and r1a3jsf for PPG 1 and PPG 2, respectively. Acknowledgments Overall support for this project was received from grants GM 21589 to H.M.B. and AR19626 to B.B. from the National Institutes of Health as well as a grant from the Pittsburgh Supercomputing Center. The research of L.V. has been partially supported by an International Exchange Program award from the University of Naples ``Federico II''. Financial support was provided by the Italian CNR (National Research Council, Progetto Strategico ``Biologia Strutturale'') and ASI (Italian Space Agency). Computers and graphic facilities were made available by Ceinge (Naples). The research of R.Z.K. has been supported by the National Institutes of Health Molecular Biophysics Training Grant and the Department of Education's Graduate Assistance in Areas of National Need Grant. A.Z. is grateful to Professor H. A. Scheraga for his interest and encouragement. We are indebted to the CHESS staff and particularly to R. Walter for his constant support and assistance during data collection, and to G. Sorrentino and P. Occorsio for their technical assistance. R.Z.K. and L.V. contributed equally to this work. References Acton, S., Resnick, D., Freeman, M., Ekkel, Y., Ashkenas, J. & Krieger, M. (1993). The collagenous domains of macrophage scavenger receptors and complement component C1q mediate their similar, but not identical, binding speci®cities for polyanionic ligands. J. Biol. Chem. 268, 3530± 3537. Bella, J. & Berman, H. M. (1996). Crystallographic evidence for Ca-H OC hydrogen bonds in a collagen triple helix. J. Mol. Biol. 264, 734± 742. Bella, J., Eaton, M., Brodsky, B. & Berman, H. M. (1994). Crystal and molecular structure of a collagen-like Ê resolution. Science, 266, 75 ± 81. peptide at 1.9 A Bella, J., Brodsky, B. & Berman, H. M. (1995). Hydration structure of a collagen peptide. Structure, 3, 893± 906. BruÈnger, A. T. (1992). 
X-PLOR, Version 3.1, A system for X-ray Crystallography and NMR. Yale University Press, New Haven, CT.
Campbell-Smith, P. J. & Arnott, S. (1978). LALS: a linked-atom least-squares reciprocal-space refinement system incorporating stereochemical restraints to supplement sparse diffraction data. Acta Crystallog. sect. A, 34, 3–11.
Doi, T., Higashino, K.-i., Kurihara, Y., Wada, Y., Miyazaki, T., Nakamura, H., Uesugi, S., Imanishi, T., Kawabe, Y. & Itakura, H. (1993). Charged collagen structure mediates the recognition of negatively charged macromolecules by macrophage scavenger receptors. J. Biol. Chem. 268, 2126–2133.
Evans, S. V. (1993). SETOR: hardware lighted three-dimensional solid model representations of macromolecules. J. Mol. Graph. 11, 134–138.
Fair, C. K. (1992). MOLEN: An Interactive Structure Solution Procedure. Enraf-Nonius, Delft, Netherlands.
Fraser, R. D. B., MacRae, T. P. & Suzuki, E. (1979). Chain conformation in the collagen molecule. J. Mol. Biol. 129, 463–481.
Fraser, R. D. B., MacRae, T. P., Miller, A. & Suzuki, E. (1983). Molecular conformation and packing in collagen fibrils. J. Mol. Biol. 167, 497–521.
Hendrickson, W. A. & Konnert, J. H. (1981). PROLSQ. In Biomolecular Structure, Conformation, Function and Evolution. (Srinivasan, R., Subramanian, E. & Yathindra, N., eds.), pp. 43–57, Pergamon Press, Oxford.
Hoppe, H.-J. & Reid, K. B. M. (1994). Collectins: soluble proteins containing collagenous regions and lectin domains and their roles in innate immunity. Protein Sci. 3, 1143–1158.
Kraulis, P. (1991). MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J. Appl. Crystallog. 24, 946–950.
Li, M.-H., Fan, P., Brodsky, B. & Baum, J. (1993). Two-dimensional NMR assignments and conformation of (Pro-Hyp-Gly)10 and a designed triple helical peptide. Biochemistry, 32, 7377–7387.
Miller, M. H. & Scheraga, H. A. (1976).
Calculation of the structures of collagen models. Role of interchain interactions in determining the triple-helical coiled-coil conformation. I. Poly(glycyl-prolyl-prolyl). J. Polym. Sci. Symp. 54, 171–200.
Minor, W. (1993). XDISPLAYF program. Purdue University.
Momany, F. A., McGuire, R. F., Burgess, A. W. & Scheraga, H. A. (1975). Energy parameters in polypeptides. VII. Geometric parameters, partial atomic charges, nonbonded interactions, hydrogen bond interactions, and intrinsic torsional potentials for the naturally occurring amino acids. J. Phys. Chem. 79, 2361–2381.
Némethy, G., Gibson, K. D., Palmer, K. A., Yoon, C. N., Paterlini, G., Zagari, A., Rumsey, S. & Scheraga, H. A. (1992). Energy parameters in polypeptides. 10. Improved geometrical parameters and nonbonded interactions for use in the ECEPP/3 algorithm, with application to proline-containing peptides. J. Phys. Chem. 96, 6472.
Okuyama, K., Okuyama, K., Arnott, S., Takayanagi, M. & Kakudo, M. (1981). Crystal and molecular structure of a collagen-like polypeptide (Pro-Pro-Gly)10. J. Mol. Biol. 152, 427–443.
Otwinowski, Z. (1993). Oscillation data reduction program. In Proceedings of the CCP4 Study Weekend: Data collection and Processing. (Sawyer, L., Isaacs, N. & Bailey, S., eds.), pp. 56–62, Warrington, UK, SERC Daresbury Laboratory.
Rich, A. & Crick, F. H. C. (1961). The molecular structure of collagen. J. Mol. Biol. 3, 483–506.
Rosenbloom, J., Harsch, M. & Jimenez, S. (1973). Hydroxyproline content determines the denaturation temperature of chick tendon collagen. Arch. Biochem. Biophys. 158, 478–484.
Sack, J. S. (1988). CHAIN: a crystallographic modeling program. J. Mol. Graph. 6, 224–225.
Sakakibara, S., Kishida, Y., Kikuchi, Y., Sakai, R. & Kakiuchi, K. (1968). Synthesis of poly-(L-prolyl-L-prolylglycyl) of defined molecular weights. Bull. Chem. Soc. Jpn. 41, 1273.
Sakakibara, S., Kishida, Y., Okuyama, K., Tanaka, N., Ashida, T. & Kakudo, M. (1972).
Single crystals of (Pro-Pro-Gly)10, a synthetic polypeptide model of collagen. J. Mol. Biol. 65, 371–373.
Sakakibara, S., Inouye, K., Shudo, K., Kishida, Y., Kobayashi, Y. & Prockop, D. J. (1973). Synthesis of (Pro-Hyp-Gly)n of defined molecular weights. Evidence for the stabilization of collagen triple helix by hydroxyproline. Biochim. Biophys. Acta, 303, 198–202.
Sasisekharan, V. & Bansal, M. (1990). Self-similarity and the assembly of collagen molecules. Curr. Sci. 1990, 863–866.
Schneider, B., Cohen, D. M., Schleifer, L., Srinivasan, A. R., Olson, W. K. & Berman, H. M. (1993). A systematic method for studying the spatial distribution of water molecules around nucleic acid bases. Biophys. J. 65, 2291–2303.
Yonath, A. & Traub, W. (1969). Polymers of tripeptides as collagen models. J. Mol. Biol. 43, 461–477.

Edited by D. Rees

(Received 23 January 1998; received in revised form 8 April 1998; accepted 9 April 1998)