Crystallographic Studies of the Insulin-Linked Polymorphic Region

by

Qingfei Zhang

Submitted to the Department of Chemistry in partial fulfillment of the requirements for the degree of Master of Science in Chemistry at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

February 1998

© Massachusetts Institute of Technology, 1998. All Rights Reserved.

Author: Department of Chemistry, September 2, 1997

Certified by: Alexander Rich, William Thompson Sedgwick Professor of Biophysics, Department of Biology

Accepted by: Dietmar Seyferth, Robert T. Haslam and Bradley Dewey Professor of Chemistry, Chairman, Departmental Committee on Graduate Students

Crystallographic Studies of the Insulin-Linked Polymorphic Region

by Qingfei Zhang

Submitted to the Department of Chemistry on September 2, 1997, in partial fulfillment of the requirements for the degree of Master of Science in Chemistry

Abstract

The insulin-linked polymorphic region (ILPR) contains a 14 base-pair tandem repeat of 5'-ACAGGGGTGTGGGG-3' located 363 base pairs upstream of the human insulin gene. Genetic studies have identified the ILPR as a locus for insulin-dependent diabetes mellitus (IDDM). Biochemical studies have shown the existence of a G-quartet structure in this region. To investigate the G-quartet structure in detail, the sequence GGGGTGTGGGG was crystallized with a 24-condition screening matrix. X-ray diffraction data of the DNA crystal were collected. Molecular replacement was attempted with the G-quartet model of TGGGGT. To solve the phase problem, the iodouracil (IU) heavy atom derivatives GGGG(IU)GTGGGG and GGGGTG(IU)GGGG were used for isomorphous replacement. An anomalous Patterson peak search was also attempted. However, the phases of the structure factors could not be determined because of the difficulty of locating the heavy atom peaks.
However, the structure of GGGGTGTGGGG solved by NMR shows the existence of an antiparallel hairpin G-quartet structure. The detailed structure is described and the connection between G-quartet formation and transcriptional regulation is discussed.

Thesis Supervisor: Alexander Rich
Title: William Thompson Sedgwick Professor of Biophysics

Acknowledgments

"It is a very great thing to be able to think as you like; but, after all, an important question remains: what you think."
Matthew Arnold

Not unlike many other business endeavors, scientific research can either engender serendipitous discoveries and grand rewards or evolve into a Kafkaesque nightmare filled with disillusionment and dejection. Fortunately or unfortunately, I have experienced neither.

Thanks to my parents, who offered unflinching support for my career endeavors, perspicacious judgement for my career decisions and comforting solace during my trying times, for which I am eternally grateful.

Many thanks to Dr. Rich, who is arguably the most skillful Zen master of not only the science, but also the art of life. Little do I know about the limits of his knowledge or the boundaries of his wisdom, and much remains to be learned from him.

I am also greatly indebted to Dr. Liqing Chen, who has shown the true marks of a crystallography guru, possessing the technical brilliance and professional experience to offer needed assistance to his fledgling apprentice. To enumerate his good deeds in detail would be a Herculean effort. Most importantly, his teachings are not merely cerebral, but contain an element of the heart.

Thanks to Sarah, for rescuing me from the solitude of my kingdom.

Last but not least, I offer my heartfelt gratitude to many others, for leaving me with the memories of unfaded dreams and grand aspirations, and for the obviation of doubts that otherwise might have haunted me. Without their existence my work would certainly be an insular and myopic effort.
A human being is an intrinsically complex animal. He synthesizes happiness from numerous sources, such as professional success, intellectual satisfaction, emotional fulfillment and financial freedom. However, I have come to believe that the true reward of scientific research lies not in societal or pecuniary rewards, nor in the invidious display of power and academic prestige, but in the pure and unadulterated joys of a curious and inquisitive mind.

Table of Contents

Abstract
Acknowledgments
Chapter 1: Introduction
1.1 The Insulin-Linked Polymorphic Region
1.2 X-ray Crystallography
Chapter 2: Experimental
2.1 Materials
2.2 Crystal Growth
2.3 X-Ray Diffraction
2.4 Molecular Replacement
2.5 Isomorphous Replacement
Chapter 3: Results and Discussion
3.1 Diffraction Analysis
3.2 Molecular replacement
3.3 Isomorphous replacement
3.4 Anomalous scattering
3.5 The structure of G4TGTG4
3.6 Correlation between G-quartet structure and transcriptional activity
References
Appendix A X-PLOR Cell Symmetry File
Appendix B X-PLOR Data File
Appendix C X-PLOR Input File for Rotation Search
Appendix D X-PLOR Input File for PC-Refinement
Appendix E X-PLOR Input File for Translation Search
Appendix F X-PLOR Input File for Rigid Body Refinement
Appendix G X-PLOR Input File for Simulated Annealing

Chapter 1: Introduction

1.1 The Insulin-Linked Polymorphic Region

Diabetes mellitus is a disorder of carbohydrate metabolism resulting from insufficient production of, or reduced sensitivity to, insulin. One variety of this disease is insulin-dependent diabetes mellitus (IDDM), in which insulin is not secreted by the pancreas and hence must be taken by injection (Tisch and McDevitt, 1996). The basis of this disease is multifactorial: susceptibility is determined by both environmental and genetic factors. Inheritance is polygenic and is influenced by the genotype of the class II major histocompatibility complex (MHC). While the MHC class II genotype is one of the strongest factors determining susceptibility to IDDM, microsatellite analysis of genome-wide polymorphisms in IDDM families has shown that many other genetic regions also influence susceptibility (Vyse and Todd, 1996). The first non-MHC genetic region implicated in IDDM is a polymorphic minisatellite 5' of the human insulin gene on chromosome 11 (Kennedy, et al., 1995).
Minisatellites are highly repetitive DNA sequences found in mammalian genomes and vary in length from a few to several thousand base pairs. They range from simple di- and trinucleotide repeats to more complex repetitive elements. Some trinucleotide repeats have been implicated in human diseases as varied as fragile X syndrome, myotonic dystrophy and Kennedy's disease. The insulin-linked polymorphic region (ILPR) is composed of a variable number of 14 base-pair tandem repeats of the sequence ACAGGGGTGTGGGG, located 363 base pairs upstream of the transcription start site. The ILPR is unique because of its high degree of polymorphism in the human population. The polymorphism is generated by variation in the number of tandem repeats within a given ILPR and by minor nucleotide sequence heterogeneity within the individual repeats. Overall, fourteen closely related repeats have been discovered and ten of them have been associated with IDDM. Nine of them consist of single base-pair substitutions in the repeat sequence, and one is polymorphic in the repeat number (variable number tandem repeat, VNTR). The ILPR is classified into three main classes on the basis of length: class I, II and III alleles have lengths of 40, 85 and 157 repeats, respectively. Recently it has been shown that different classes have different levels of transcriptional activity (Kennedy, et al., 1995), with a long ILPR having more transcriptional activity than a short ILPR.

How these two characteristics of the ILPR, namely genetic susceptibility to IDDM and transcriptional activity, are related is not clear. However, because the short ILPR is preferentially associated with IDDM, it can be inferred that decreased insulin transcription is related to IDDM and that higher levels of insulin transcription from the long ILPR protect against IDDM. Therefore, factors that cause the difference in transcriptional activity could very well account for the genetic linkage between the ILPR and IDDM.
To investigate the basis for the difference in transcription, it is helpful to consider another feature of the ILPR, namely its guanine-rich nature. Work based on the effect of alkali metal ions on the mobility of oligonucleotides (Sen & Gilbert, 1990) containing the ILPR consensus sequence in non-denaturing gels has shown the existence of quadruplex structures (Hammond-Kosack, et al., 1992, 1993). Quadruplex structures are formed only by oligomers containing one or several runs of guanine residues. These structures are stabilized by Hoogsteen base pairing, involving the N-7 positions of the contributing G residues (Figure 1.1). The exact nature of the quadruplex formed depends on the number of G runs within the oligomers and on environmental conditions such as salt concentration, temperature and torsional stress. Based on analysis of gel electrophoretic patterns, chemical modifications, ultraviolet-induced crosslinks and, recently, x-ray crystallography, three main quadruplex structures have been identified: four-stranded parallel quadruplexes (G4-DNA) (Laughlan, G. et al., 1994), unimolecular antiparallel quadruplexes (Wang, Y. & Patel, D.J., 1993) and bimolecular quadruplexes (Kang, C. et al., 1992). In all of the reported four-stranded structures the DNA strands are aligned parallel to each other, with each of the nucleotides in the anti conformation. However, the folded strands must have some antiparallel alignment by nature, which requires some of the guanine nucleotides to adopt the syn conformation in order to form a G-quartet. More interesting is the class of bimolecular quadruplexes, in which the two DNA strands can adopt three different models: the antiparallel-stranded edge-looped model, the diagonal-looped model and the alternative diagonal-looped model (Figure 1.2).
The x-ray structure of four-stranded Oxytricha telomeric DNA G4T4G4 shows an edge-looped quartet (Figure 1.3) in which the thymine groups are aligned along the edge of the neighboring G-quartets (Kang et al., 1992). However, NMR studies showed that the same DNA sequence forms a diagonal-looped quadruplex in solution, with the thymine loops aligned diagonally across the neighboring G-quartets (Smith and Feigon, 1992). The difference in topology could be due to the flexibility of G-rich DNA to form quadruplexes under different ionic conditions.

The existence of a G-tetraplex in the ILPR has provided a new possibility for explaining the difference in transcriptional activity between the different ILPR sequences. To study the structure of the G-tetraplex in detail, x-ray crystallographic studies on the ILPR consensus sequence GGGGTGTGGGG were carried out and the data were analyzed to gain some insight into the structure of the G-tetraplex in the crystal.

Figure 1.1: Hydrogen-bonding structure of the G-tetrad.

Figure 1.2: Three models of bimolecular quadruplexes. (A) Antiparallel-stranded edge-looped model. (B) Diagonal-looped model. (C) Alternative diagonal-looped model.

Figure 1.3: Four views of the crystal structure of G4T4G4.

1.2 X-ray Crystallography

At present, the most powerful method for determining molecular structures to atomic resolution is x-ray crystallography. More than 1,000 protein structures have been determined by this method. The underlying approach is to interpret the diffraction of x-rays from many identical molecules in an ordered array such as a crystal. To achieve the final goal, which is to obtain an atomic-resolution picture of the molecular structure, high quality crystals of the molecule must be grown, the directions and intensities of x-ray beams diffracted from the crystals must be measured, and computer methods must be used to interpret the data and reconstruct a three-dimensional image of the crystal content.
Finally, the electron density image must be interpreted by building a molecular model that is consistent with the image.

Like small molecules, many macromolecules such as nucleic acids and proteins solidify to form crystals under certain conditions. During crystallization, each macromolecule adopts one of a few orientations. The result is an orderly packing of molecules in three-dimensional arrays. The smallest volume in the crystal that can be repeated by translation is the unit cell. In crystallography, the content of the unit cell is determined as an electron density distribution, which in turn is used to locate individual atoms in the cell.

The most important step in x-ray crystallography is to obtain high quality crystals suitable for x-ray diffraction. Crystallization methods such as vapor diffusion, dialysis and seeding have been developed (Ducruix and Giege, 1992). Vapor diffusion is a widely used method in which precipitant is added to an aqueous solution of the macromolecule until the precipitant concentration is just below that required to precipitate the molecule. Water is then allowed to evaporate slowly, which gently raises the concentration of both molecule and precipitant until precipitation occurs. Whether the molecule forms crystals or a disordered precipitate depends on the concentration of the molecule, the solution pH, the temperature and the salt conditions. Finding the right conditions for growing the perfect crystal requires many careful trials and is often more of an art than a science. In the usual vapor diffusion setup, the solution is allowed to equilibrate in a closed container with a larger aqueous reservoir of optimal precipitant concentration. In this study, the vapor diffusion method was used for all crystallizations, and a 24-condition matrix was used to screen for the optimal crystallization condition (Berger, I., et al., 1996). The matrix was designed based on the identification of factors that enhanced DNA crystal growth.
These factors include pH and the concentrations of monovalent cations, magnesium ions, other divalent cations, polyamines and cobalt hexammine. MPD was used as the precipitant at a concentration from 10% to 30%.

After crystals of sufficiently high quality are grown, x-ray diffraction analysis can be performed. The central problem in x-ray crystallography is to determine $\rho(x, y, z)$, the electron density distribution inside the unit cell of volume $V$:

$$\rho(x, y, z) = \frac{1}{V} \sum_{hkl} |F_{hkl}|\, e^{i\varphi(hkl)}\, e^{-2\pi i (hx + ky + lz)}$$

The diffraction pattern contains information about the structure factor $F$ at each position $(h, k, l)$ in reciprocal space, which is the Fourier transform of the electron density $\rho$:

$$F_{hkl} = \int_{x}\int_{y}\int_{z} \rho(x, y, z)\, e^{2\pi i (hx + ky + lz)}\, dx\, dy\, dz$$

The magnitude of the structure factor at $(h, k, l)$ is proportional to the square root of the measured intensity $I$ at that position:

$$|F_{hkl}| \propto \sqrt{I_{hkl}}$$

While it is relatively easy to determine the magnitudes of the structure factors, it is much more difficult to obtain their phases $\varphi(hkl)$. This can be accomplished by techniques such as direct methods, isomorphous replacement and anomalous scattering. If the structure of a similar molecule is available, molecular replacement can also be used to determine the phases.

The direct methods are a set of analytical techniques for deriving an approximate set of phases from which a first approximation to the electron density map can be calculated. Interpretation of this map gives a suitable trial structure of the molecule. Direct methods make use of mathematical relationships that exist among certain combinations of phases. From these relationships, enough initial phase estimates can be obtained to begin converging toward a complete set of phases. Direct methods work when the number of reflections is small.

When the molecule is large enough that a heavy atom does not change its structure significantly, isomorphous replacement can be used.
In this method, a heavy atom is incorporated into the molecule and the slight perturbation of the diffraction pattern caused by the added atom is used to obtain initial estimates of the phases. Because the diffractive contributions of atoms are additive vectors, the structure factor of the heavy atom derivative, $\mathbf{F}_{HP}$, is the vector sum of the structure factors of the heavy atom ($\mathbf{F}_{H}$) and of the protein ($\mathbf{F}_{P}$):

$$\mathbf{F}_{HP} = \mathbf{F}_{H} + \mathbf{F}_{P}$$

Once the heavy atom is located in the unit cell, $\mathbf{F}_{H}$ is known and phase information can be obtained by representing the above equation in the complex plane. Phase ambiguities can be eliminated by incorporating a second heavy atom that binds to a different site from the first.

To locate the heavy atom in the unit cell, the relatively simple diffraction signature of the heavy atom is extracted from the far more complicated diffraction pattern of the heavy atom derivative. The standard technique for determining the heavy atom location employs the Patterson function $P(u, v, w)$, which is a variation of the Fourier series used to compute the electron density $\rho(x, y, z)$:

$$P(u, v, w) = \frac{1}{V} \sum_{hkl} |F_{hkl}|^{2}\, e^{-2\pi i (hu + kv + lw)}$$

A difference Patterson function, $\Delta P(u, v, w)$, with amplitudes $(\Delta F)^{2} = (|F_{HP}| - |F_{P}|)^{2}$, can be used to search for the heavy atom in the derivative crystal:

$$\Delta P(u, v, w) = \frac{1}{V} \sum_{hkl} (\Delta F_{hkl})^{2}\, e^{-2\pi i (hu + kv + lw)}$$

The Patterson map, which is defined on the coordinates $(u, v, w)$, is a contour map of $P(u, v, w)$ that displays peaks at locations corresponding to vectors between atoms. Through a trial-and-error process, these peaks can be used to identify the location of the heavy atom.

Anomalous scattering is another method for phase determination. It takes advantage of the heavy atom's capacity to absorb x-rays of a specific wavelength. The heavy atom appreciably absorbs the x-rays used, and there is a phase change for the x-rays scattered by that atom relative to the phase of the x-rays scattered by a nonabsorbing atom at the same site.
As a result, the intensities of the symmetry-related reflections (Friedel pairs) $hkl$ and $\bar{h}\bar{k}\bar{l}$ are not equal, which leads to a difference between $|F_{hkl}|^{2}$ and $|F_{\bar{h}\bar{k}\bar{l}}|^{2}$. The phase of a reflection in the heavy atom derivative data can then be calculated, which in turn gives the phase of the corresponding reflection in the native data.

If heavy atom derivatives suitable for diffraction analysis cannot be obtained, the method of molecular replacement can be used to determine the structure from a single native data set. A model of a known molecule is placed in the unit cell of the new molecule, and the phases from the structure factors of the known molecule are used as initial estimates of the phases for the new one. If the phasing model and the new molecule are isomorphous, the phases from the model molecule can be used directly to compute $\rho(x, y, z)$ from the native intensities of the new molecule:

$$\rho(x, y, z) = \frac{1}{V} \sum_{hkl} |F^{new}_{hkl}|\, e^{i\varphi^{model}(hkl)}\, e^{-2\pi i (hx + ky + lz)}$$

In this equation, $|F^{new}_{hkl}|$ is obtained from the native intensities of the new molecule, and the phases $\varphi^{model}(hkl)$ are taken from the model molecule. Iterative phase refinement can then change the phases from those of the old model to those of the new molecule, thus giving the new structure.

Often the phasing model is not isomorphous with the desired structure but is related to it by a rotation and a translation:

$$X_{2} = [C]\,X_{1} + d$$

where $X_{1}$ and $X_{2}$ are the position vectors of the model molecule and the new molecule, respectively, $[C]$ is a rotation matrix and $d$ is a vector defining the translation. A search in three rotational and three translational degrees of freedom must then be performed on the phasing model to position it identically to the new molecule in the unit cell.
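The relation between the model coordinates X1 and the positioned coordinates X2 can be illustrated with a short Python sketch. It is only an illustration of applying a rotation matrix [C] and a translation vector d to a list of atomic positions, not the X-PLOR procedure used in this work. The function names `rotation_about_z` and `place_model` are hypothetical, and a single rotation about the z axis stands in for the general three-angle rotation.

```python
import math


def rotation_about_z(angle_deg):
    """Return a 3x3 rotation matrix [C] for a rotation about the z axis."""
    t = math.radians(angle_deg)
    return [
        [math.cos(t), -math.sin(t), 0.0],
        [math.sin(t), math.cos(t), 0.0],
        [0.0, 0.0, 1.0],
    ]


def place_model(coords, rotation, translation):
    """Apply X2 = [C] X1 + d to every position vector X1 in coords."""
    placed = []
    for x1 in coords:
        x2 = tuple(
            sum(rotation[i][j] * x1[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        placed.append(x2)
    return placed


# Hypothetical two-atom model, rotated by 90 degrees and shifted along z.
model = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
positioned = place_model(model, rotation_about_z(90.0), (0.0, 0.0, 5.0))
```

In a molecular replacement search the six parameters (three angles defining [C] and three components of d) are the unknowns; the sketch only shows how a trial set of parameters maps the model into the new cell.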
In the first step, a rotation function $R$ is used to search for the relative orientation (rotation) of the molecule in the unit cell:

$$R = \int P(X_{2})\, P(X_{1})\, dX_{1}$$

where $P(X_{1})$ and $P(X_{2})$ are the Patterson functions for molecules 1 and 2. $R$ has a maximum value when the two self-vector sets are equivalently oriented. Having determined the orientation, the position of the molecule can be determined by maximizing the translation function $T$:

$$T(\Delta) = \int_{|u| > 2r} P(u + \Delta)\, P([C]u - \Delta)\, du$$

where $\Delta$ is a translation vector which is independent of the origin of the rotation axis and relates the centers of the two molecules (Brünger, A., 1992). The condition $|u| > 2r$ is used to remove the self-vector set.

The structure factors of the properly positioned model can then be calculated. The computed phases serve as initial estimates of the desired phases and can be refined by the use of any available noncrystallographic symmetry or by density modification and solvent flattening. When noncrystallographic symmetry is present, the electron densities of the noncrystallographically related units are averaged and back-transformed. The resulting calculated phases are then applied to the observed structure factors to compute a new, improved electron density map. The part of the structure outside the molecular envelope is usually flattened to represent solvent. The process is repeated for many cycles until convergence is achieved.

Simple methods such as rigid body refinement can then be used to improve the model by minimizing the crystallographic residual factor (R-factor), which is defined as

$$R = \frac{\sum \big|\, |F_{obs}| - |F_{calc}| \,\big|}{\sum |F_{obs}|}$$

where $|F_{obs}|$ is derived from a measured reflection intensity and $|F_{calc}|$ is the amplitude of the corresponding structure factor calculated from the model. The entire molecule or a group of atoms is treated as a rigid body and moved inside the asymmetric unit to obtain orientations and positions having a lower R-factor.
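The R-factor defined above is simple to evaluate once observed and calculated amplitudes are available. The following Python sketch is a minimal illustration under stated assumptions: the function name `r_factor` is hypothetical, the two lists are assumed to hold matching reflections in the same order, and no scale factor between observed and calculated amplitudes is applied (refinement programs normally apply one).

```python
def r_factor(f_obs, f_calc):
    """Return sum(| |F_obs| - |F_calc| |) / sum(|F_obs|) over all reflections.

    f_obs  : observed structure factor amplitudes
    f_calc : calculated structure factors (real amplitudes or complex values)
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must list the same reflections")
    # abs() of a complex number gives its amplitude, so complex F_calc works too.
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


# Hypothetical amplitudes for three reflections.
observed = [10.0, 20.0, 30.0]
calculated = [12.0, 18.0, 30.0]
residual = r_factor(observed, calculated)  # (2 + 2 + 0) / 60
```

A perfect model gives a residual of zero, and the value grows as the calculated amplitudes depart from the observed ones.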
More complicated methods such as conjugate gradient minimization or simulated annealing minimize the hybrid energy function:

$$E_{total} = E_{empirical} + E_{effective}$$

$E_{empirical}$ describes the energy of the molecule through an empirical energy function, which is a sum of energy terms describing bond stretching, bond bending, dihedral angles, improper angles, hydrogen bonding, van der Waals and electrostatic interactions. $E_{effective}$ is an effective potential energy function that brings the experimental data into the calculation. It describes the difference between the observed structure factor amplitudes and those calculated from the atomic model.

The simulated annealing method employs molecular dynamics, which is an attempt to simulate the movement of molecules by solving Newton's laws of motion for atoms moving within force fields that represent the effects of covalent and noncovalent bonding. The model is allowed to move as if at high temperature, in the hope of lifting it out of local energy minima. It is then cooled slowly to find the lowest-energy conformation at the temperature of diffraction data collection.

Many cycles of model building and structural refinement are required for the model to converge with the data. The primary measure of convergence is the R-factor. Values of R range from 0, for perfect agreement of calculated and observed intensities, to about 0.6, for a set of randomly calculated intensities. An R-factor greater than 0.5 implies a poor model for which structural refinement will not be useful. On the other hand, an R-factor less than 0.2 implies a reliable model.

Chapter 2: Experimental

2.1 Materials

The DNA oligonucleotide GGGGTGTGGGG (G4TGTG4) and the heavy atom derivatives GGGG-6-iodo-uracil-GTGGGG (G4(IU)GTG4) and GGGGTG-6-iodo-uracil-GGGG (G4TG(IU)G4) were synthesized by the solid phase phosphoramidite method on an Applied Biosystems DNA synthesizer and purified by passage through a Sephadex column followed by several cycles of ethanol precipitation and lyophilization.
The DNA was then dissolved in 600 µl of distilled water and stored frozen.

Table 2.1 shows the 24-condition matrix used for the crystallization of G4TGTG4. It was prepared from the following stock solutions:

Buffers: cacodylate buffer pH 7.0, cacodylate buffer pH 6.0, cacodylate buffer pH 5.5.
Polyamines: spermine tetrahydrochloride, cobalt hexammine chloride Co(NH3)6Cl3.
Monovalent ions: LiCl, NaCl, KCl.
Divalent ions: MgCl2, SrCl2, BaCl2.
Precipitant: 2-methyl-2,4-pentanediol (MPD).

Table 2.1: 24-condition matrix composition

Condition  pH   Polyamine             Monovalent ion            Divalent ion
1          7.0  12 mM spermine        80 mM KCl
2          7.0  12 mM spermine        80 mM KCl
3          7.0  12 mM spermine        80 mM NaCl
4          7.0  12 mM spermine        80 mM NaCl
                                                                20 mM MgCl2
                                                                20 mM MgCl2
                                                                20 mM MgCl2
5          7.0  12 mM spermine        80 mM NaCl, 12 mM KCl
6          7.0  12 mM spermine        12 mM NaCl, 80 mM KCl
7          6.0  12 mM spermine        80 mM KCl
8          6.0  12 mM spermine        80 mM KCl
9          6.0  12 mM spermine        80 mM KCl
10         6.0  12 mM spermine        80 mM NaCl
11         6.0  12 mM spermine        80 mM NaCl, 12 mM KCl
12         6.0  12 mM spermine        12 mM NaCl, 80 mM KCl
13         7.0  12 mM spermine        80 mM NaCl                20 mM BaCl2
14         7.0  12 mM spermine        80 mM KCl                 20 mM BaCl2
15         6.0  12 mM spermine        80 mM NaCl                20 mM BaCl2
16         6.0  12 mM spermine        80 mM KCl                 20 mM BaCl2
17         7.0  12 mM spermine        40 mM LiCl                80 mM SrCl2, 20 mM MgCl2
18         7.0  12 mM spermine        40 mM LiCl                80 mM SrCl2
                                                                20 mM MgCl2
                                                                20 mM MgCl2
                                                                20 mM MgCl2
19         7.0  12 mM spermine                                  80 mM SrCl2, 20 mM MgCl2
20         6.0  12 mM spermine                                  80 mM SrCl2
21         5.5  20 mM Co(NH3)6Cl3     80 mM NaCl                20 mM MgCl2
22         5.5  20 mM Co(NH3)6Cl3     80 mM KCl                 20 mM MgCl2
23         5.5  20 mM Co(NH3)6Cl3     12 mM NaCl, 80 mM KCl
24         5.5  20 mM Co(NH3)6Cl3     40 mM LiCl                20 mM MgCl2

All conditions contain 40 mM cacodylate buffer and 10% (v/v) of the precipitant MPD.

2.2 Crystal Growth

The DNA stock solutions were screened in the
24-condition screening matrix using the vapor diffusion method. Hanging drops were used for G4TGTG4 and G4TG(IU)G4. Sitting drops were used for G4(IU)GTG4. The reservoirs were MPD solutions with concentrations ranging from 10% (v/v) to 30% (v/v). After the condition that produced the best quality crystal forms, judged by size and regularity, had been identified, plates were prepared with 24 reproductions of this particular condition. The best crystals in the reproductions were selected for x-ray analysis. All crystallizations were carried out at room temperature.

Table 2.2: Best Conditions for Crystal Growth

DNA          Stock DNA used  Best condition  Volume of condition  Reservoir  Time of crystallization
G4TGTG4      1 µl            12              2 µl                 30% MPD    two weeks
G4(IU)GTG4   1 µl            22              2 µl                 20% MPD    two days
G4TG(IU)G4   1 µl            2               2 µl                 30% MPD    overnight

2.3 X-Ray Diffraction

Each crystal selected for diffraction study was mounted in a capillary tube and placed on a goniometer for data collection. X-rays from a monochromatic Cu source (1.54 Å) were used for diffraction of the crystals. The diffraction patterns were recorded on a Rigaku RAXIS IIc imaging plate. Some of the diffraction data were collected under a 277 K nitrogen cold stream; others were collected at room temperature. Three still images were taken first to determine the unit cell, and this information was then used in the collection of oscillation images. Figure 2.1 shows a sample oscillation image of G4(IU)GTG4.

Figure 2.1: Sample oscillation image of G4(IU)GTG4.

Space group determination was carried out with the RAXIS software and the program Denzo. Diffraction intensities were integrated from image plate files using Denzo and RAXIS.
The following unit cell information was obtained after processing data from the still images:

Table 2.3: Unit Cell Data

Crystal              Space group  Dimensions (a, b, c)        Resolution
G4TGTG4              C222         24.2 Å, 54.7 Å, 40.9 Å      2.3 Å
Os soaked G4TGTG4    C2221        24.2 Å, 53.8 Å, 40.6 Å      2.3 Å
G4(IU)GTG4           C222         24.7 Å, 53.0 Å, 40.0 Å      2.5 Å
G4TG(IU)G4           P222         unindexable                 2.5 Å

2.4 Molecular Replacement

The phasing model used for molecular replacement is the G-tetraplex crystal structure of the DNA hexanucleotide TGGGGT (Laughlan, G. et al., 1994). The crystal belongs to space group P1, with cell dimensions a = 28.76 Å, b = 35.47 Å, c = 56.77 Å and α = 74.39°, β = 77.64°, γ = 89.73°. Each asymmetric unit contains four parallel-stranded tetraplexes (Figure 2.2).

Figure 2.2: Crystal structure of the TGGGGT tetraplex. Four views of one of the tetraplexes in the asymmetric unit.

Molecular replacement was carried out with the computer program X-PLOR, a powerful package for x-ray crystallography (Brünger, A., 1992). The origin of the model was shifted and each G-tetraplex layer was rotated 180° to adapt to the new symmetry of the C222 space group.

A cross rotation search was performed in Patterson space. The stationary Patterson map P2 was computed from the observed intensities by Fast Fourier Transform. The Patterson map to be rotated, P1, was computed from the TGGGGT model. The strongest Patterson vectors in P1 were used for the rotation search, which can be carried out in Eulerian angles (Rossmann & Blow, 1962), pseudo-orthogonal Eulerian angles (Lattman, 1985) or spherical polar angles. The values of the Patterson map P2 at the positions of the rotated Patterson vectors of map P1 were computed by linear eight-point interpolation. For each sampled orientation $\Omega$, the cross rotation function

$$RF(\Omega) = \langle P_{obs}\, P_{model}(\Omega) \rangle$$

between the rotated vectors of P1 and the interpolated values of the Patterson map P2 was computed.
All sampled orientations were then sorted according to their RF values and a simple peak search was carried out.

Patterson correlation (PC) refinement was then performed on the highest peaks of the rotation function (Brünger, A., 1992). The target function for PC refinement is proportional to the negative correlation coefficient between the squared amplitudes of the observed and the calculated normalized structure factors. The correct orientation was identified as the one with the lowest value of the target function after refinement.

The translation search was subsequently carried out on orientations with high RF values by computing the target function $E_{XRAY}/W_{A}$. The search routine computes the structure factors $F_{calc}$ of the translated primary molecule and the symmetry-related molecules by applying appropriate phase shift operators in reciprocal space to the calculated structure factors of the original molecule and its symmetry mates, which are defined by the space group operators.

Rigid body refinement was then carried out on the translated model, followed by simulated annealing. In the preparative stage of simulated annealing refinement, 40 cycles of minimization were performed to relieve strain and bad contacts in the structure. A slow-cooling protocol (Brünger, A., 1990) was then used. The Newtonian equations of motion were solved numerically by the Verlet algorithm. The initial velocities were assigned from a Maxwellian distribution at the appropriate temperature. Velocity scaling, Langevin dynamics and T coupling were used to control the temperature during the molecular dynamics simulation. The following effective energy function was used:

$$E_{effective} = E_{XRAY} + E_{NOE} + E_{HARM} + E_{CDIH} + E_{NCS} + E_{DG} + E_{REL}$$

which consists of restraining energy terms that use experimental information. Descriptions of these energy terms can be found in the X-PLOR manual.

2.5 Isomorphous Replacement

The native crystals of G4TGTG4 were soaked in various heavy atom solutions.
The first soaking solution contained 10 mM platinum (ethylenediamine) dichloride and 50% (v/v) MPD. After soaking for 15 hours, the crystal was mounted in a capillary tube and analyzed by x-ray diffraction. Analysis of the still images showed that the heavy atom had not entered the unit cell. Another native crystal was soaked in 10 mM methyl mercuric chloride and 50% (v/v) MPD for 51 hours, but the diffraction data were of low quality and could not be indexed. A further native crystal was therefore soaked in a solution containing 10 mM mercuric chloride (HgCl2) and 50% (v/v) MPD. Diffraction data were collected after 22 hours of soaking, but they too were of poor quality. Lastly, osmium was used as the heavy metal. By pipetting the heavy atom solution directly into the hanging drop, good diffraction images were obtained after 1.0 hour of soaking. Diffraction data on two soaked crystals were then collected and processed to produce difference Patterson maps. For one crystal the R factor was lower than 0.15 (0.1201, Table 2.4), indicating the absence of heavy atom in the unit cell. For the second crystal the R factor was larger than 0.15 (0.2887), which indicates that the derivative data differed from the native data, but the Harker sections (Table 2.5) of the Patterson map contained too many small peaks and were uninterpretable.

Table 2.4: Merge and Scale Data of the Osmium Derivative

Resolution range   Independent reflections   R merge§   RMS dev. from linearity   R factor
20 Å - 2.3 Å       1080                      4.50%      --                        0.1201
20 Å - 2.3 Å       1246                      4.31%      0.072                     0.2887

§: R merge is the agreement R-factor between symmetry-related observations.
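The R factor in the last column of Table 2.4 measures how much the soaked-crystal amplitudes differ from the native ones. The thesis does not state the formula used by the scaling program, so the sketch below assumes the conventional definition, R = Σ | |F_PH| - |F_P| | / Σ |F_P|, summed over reflections common to both data sets. The function names and the dictionary layout (Miller index tuple mapped to amplitude) are hypothetical; only the 0.15 threshold comes from the text.

```python
# Threshold quoted in the text: below this, the soak is taken to have failed.
DERIVATIVE_THRESHOLD = 0.15


def isomorphous_r_factor(native, derivative):
    """Assumed definition: sum(| |F_PH| - |F_P| |) / sum(|F_P|) over common hkl.

    native, derivative: dicts mapping (h, k, l) tuples to scaled amplitudes.
    """
    common = native.keys() & derivative.keys()
    if not common:
        raise ValueError("no common reflections between the two data sets")
    numerator = sum(abs(derivative[hkl] - native[hkl]) for hkl in common)
    denominator = sum(native[hkl] for hkl in common)
    return numerator / denominator


def looks_derivatized(r_factor, threshold=DERIVATIVE_THRESHOLD):
    """Apply the rule of thumb used in the text to a single R factor."""
    return r_factor > threshold
```

Applied to the two values in Table 2.4, `looks_derivatized(0.1201)` is false and `looks_derivatized(0.2887)` is true, matching the interpretation given above.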
Table 2.5: Harker Vectors for Space Group C2221

C2221            X, Y, Z          X, -Y, -Z        -X, -Y, 1/2+Z    -X, Y, 1/2-Z
X, Y, Z          0                0, 2Y, 2Z        2X, 2Y, 1/2      2X, 0, 1/2+2Z
X, -Y, -Z        0, 2Y, 2Z        0                2X, 0, 1/2+2Z    2X, 2Y, 1/2
-X, -Y, 1/2+Z    2X, 2Y, 1/2      2X, 0, 1/2+2Z    0                0, 2Y, 2Z
-X, Y, 1/2-Z     2X, 0, 1/2+2Z    2X, 2Y, 1/2      0, 2Y, 2Z        0

Therefore, instead of further heavy atom soaking, attempts were made to crystallize iodinated derivatives of the native sequence, G4IUGTG4 and G4TGIUG4. Crystals were obtained for both derivatives, but only one of them, G4IUGTG4, gave good diffraction. A total of 52 oscillation images were collected to a resolution of 2.5 Å, with a 4° oscillation angle and a crystal-to-plate distance of 120 mm. The diffraction intensities were indexed, refined and scaled with the program Denzo.

Chapter 3: Results and Discussion

3.1 Diffraction Analysis

The diffraction pattern of the native crystal G4TGTG4 was analyzed. The lattice type was orthorhombic with unit cell dimensions of a = 24.2 Å, b = 54.7 Å, c = 40.9 Å. The space group was determined to be C222 by the indexing program. A unit cell with the above dimensions has a volume of 5.41 × 10^4 Å^3. Assuming that half of the crystal volume is occupied by solvent, the volume of the DNA is approximately 2.7 × 10^4 Å^3. In the space group C222 there are eight asymmetric units per unit cell, so the DNA volume in one asymmetric unit is 1/8 of 2.7 × 10^4 Å^3, or 3.4 × 10^3 Å^3 (3.4 × 10^-21 cm^3). Since the specific volume of DNA is 0.50 cm^3/g (Cantor, 1980), the molecular mass in one asymmetric unit can be obtained by dividing this volume by the specific volume and converting to molar units with Avogadro's number: $m = N_A V_{asu} / \bar{v} = (6.02 \times 10^{23}\ \mathrm{mol^{-1}})(3.4 \times 10^{-21}\ \mathrm{cm^3}) / (0.50\ \mathrm{cm^3/g}) \approx 4.1 \times 10^{3}\ \mathrm{g/mol}$, or 4.1 kD. The molecular mass of G4TGTG4 calculated from its formula is 3.6 kD, which is close to the mass estimated for one asymmetric unit. Therefore, it is likely that there is only one G4TGTG4 molecule per asymmetric unit.

3.2 Molecular replacement

PC refinement of the orientations selected by the rotation function was carried out.
However, the refined PC coefficients failed to show any major peaks that could indicate promising orientations (Figure 3.1). Therefore, a few orientations with the highest rotation function values were selected for positioning by the translation function, followed by rigid-body minimization. The output structure had a high R-factor (53%). Consequently, simulated annealing was used to refine the structure. However, it failed to improve the R-factor significantly: the lowest R-factor obtained after simulated annealing was 47%.

Figure 3.1: PC refinement (plotted against RF peak index). No orientation with significantly higher PC coefficient was found.

3.3 Isomorphous replacement

Table 3.1 shows that the diffraction intensities of G4IUGTG4 have an overall R-factor of 0.107. A Patterson map was subsequently produced to locate the heavy atom. Unfortunately the map was uninterpretable due to the absence of any major peaks that could be attributed to the iodine atom.

Table 3.1: Summary of Diffraction Intensities of G4IUGTG4

Lower resolution (Å)   Upper resolution (Å)   Average intensity   Average error   Norm χ2   R-factor
25.00                  4.39                   8022.6              476.5           1.992     0.078
4.39                   3.49                   5617.5              386.1           1.233     0.098
3.49                   3.05                   3727.8              319.8           1.327     0.120
3.05                   2.77                   1537.8              182.4           0.710     0.148
2.77                   2.57                   1176.5              161.9           0.733     0.160
2.57                   2.42                   602.7               134.8           0.535     0.212
2.42                   2.30                   386.2               125.2           0.505     0.262
2.30                   2.20                   247.4               114.3           0.372     0.332
All reflections                               2961.3              252.3           0.997     0.107

3.4 Anomalous scattering

The single wavelength anomalous scattering method (Wang, 1985) was used in another attempt to locate the iodine atom. The I+ and I- reflections were processed separately and compared. As Table 3.2 shows, the χ2 value for all reflections is less than 1, which indicates that the differences between the I+ and I- intensities are smaller than their estimated errors, so no useful anomalous signal can be detected. One more effort to locate the anomalous scatterer used difference Patterson analysis (Rossmann, 1961).
Anomalous Patterson maps were plotted from the data (Figure 3.2). However, they failed to show any heavy atom peaks.

Table 3.2: Summary of Anomalous Signal Detection

Lower resolution (Å)   Upper resolution (Å)   Average intensity   Average error   Norm χ2   R-factor
99.00                  4.40                   8370.5              2065.9          0.229     0.054
4.40                   3.49                   5367.8              1552.3          0.171     0.060
3.49                   3.05                   3073.8              1074.1          0.094     0.069
3.05                   2.77                   1487.2              725.2           0.080     0.106
2.77                   2.57                   1255.8              706.5           0.079     0.106
2.57                   2.42                   645.7               569.5           0.048     0.115
2.42                   2.30                   421.9               494.3           0.046     0.206
2.30                   2.20                   224.6               436.1           0.038     0.186
All reflections                               2991.2              1033.9          0.105     0.069

Figure 3.2: Anomalous Patterson maps of the G4IUGTG4 crystal.

3.5 The structure of G4TGTG4

Although the various crystallographic attempts have failed to produce a refined structure of G4TGTG4, other techniques such as gel electrophoresis, circular dichroism (CD) and, recently, one- and two-dimensional nuclear magnetic resonance spectroscopy (1D and 2D NMR) have been used to investigate the nature of its chain folding, the stacking interaction of the G-tetraplexes in the stem, and the interactions of the bases in the loops (Catasti, et al., 1996). Non-denaturing 20% polyacrylamide gel studies on G4TGTG4, as well as on two other oligonucleotide sequences capable of forming G-tetraplexes, G4ACAG4 and G4TGTG4ACAG4TGTG4, show mobilities consistent with G-quartet folding (Figure 3.3).

Figure 3.3: Non-denaturing polyacrylamide gel of different ILPR fragments at different ionic strengths.
Lanes 1 to 3 contain the (G4ACAG4)2 fragment at 50, 150 and 250 mM NaCl, respectively. Similarly, lanes 4 to 6 contain (G4TGTG4)2, and lanes 7 to 9 contain G4TGTG4ACAG4TGTG4, at 50, 150 and 250 mM NaCl, respectively. Lane 10 contains (G4)4 as a control.

The circular dichroism spectrum of G4TGTG4 shows one positive band at 295 nm, but the band at 262 nm is absent, in contrast to the CD spectra of G4ACAG4 and G4TGTG4ACAG4TGTG4 (Figure 3.4). This could be explained by a difference in the topology of these structures (Figure 3.5). The band at 262 nm could indicate the CD effects of the 5'-Gsyn-Ganti-Gsyn-Ganti-3' arm orientations, while the band at 295 nm indicates the CD effect of the ACA or TGT loop, since it is absent in the spectrum of G4.

Figure 3.4: CD spectra at room temperature for (G4)4, (G4TGTG4)2, (G4ACAG4)2 and G4TGTG4ACAG4TGTG4.

Figure 3.5: Topological models for the interpretation of CD data.

An imino proton exchange experiment monitored by one-dimensional 1H NMR shows the accessibilities of the various guanine imino protons in the sequence G4TGTG4. The presence of proton signals at G2, G3, G9 and G10 after two days of incubation with 2H2O indicates that these guanines are inaccessible to solvent and buried inside the G-quartet. Two-dimensional nuclear Overhauser effect (NOE) spectroscopy of exchangeable and nonexchangeable protons supports the hairpin G-quartet model shown in Figure 3.6. In this model, the glycosyl torsions of the GGGG residues alternate as 5'-Gsyn-Ganti-Gsyn-Ganti-3', while the sugar puckers for all four residues are C2'-endo. The (T5-G6-T7) loop connects G4anti and G8syn along the wide edge of the (G4anti-G8syn-G11anti-G1syn) quartet. The two hairpins are anti-parallel to each other.
Intra-nucleotide NOEs suggest that T5, G6 and T7 adopt the (C2'-endo, anti) conformation, with G6 shifted more toward (C2'-endo, high anti). The presence of inter-hairpin NOEs such as H1'(T7B)-H8(G11A), N1H(G1A)-H1'(T7B) and N1H(G1A)-H4'(T7B) is consistent only with the anti-parallel arrangement of the two symmetric hairpins (A and B).

Figure 3.6: Map of the G-quartet folding schematics.

Four layers of G-quartets were built according to the topological chain-folding pattern shown in Figure 3.6. Restrained molecular dynamics and energy minimization on this initial structure gave an average minimized structure of G4TGTG4, shown in Figure 3.7. The internal G-quartets are quite planar, whereas G4, in the external layer, is tilted out of plane. The structure is also stabilized by intra- and inter-strand interactions at the GantipGsyn steps. In each 5'-G-G-G-G-3' arm, the glycosyl torsions of the G residues alternate as 5'-Gsyn-Ganti-Gsyn-Ganti-3', while the sugar puckers of all four G residues are C2'-endo. The T5-G6-T7 loop connects G4anti and G8syn along the wide edge of the G4anti-G8syn-G11anti-G1syn tetrad. The two hairpins are anti-parallel to each other. T7 in the loop is stacked with G11 on the opposite strand, and G6 is stabilized by a strong interaction with G1 on the opposite strand. Important non-bonding interactions of G6 and T7 with G4 offer additional loop stability. T5 is not stacked; it is locked in the narrow edge between G8 and G4 and is stabilized mostly by electrostatic interactions with the backbone. Figure 3.7(A) shows the strong GantipGsyn stacking between the two internal layers of quartets. Figure 3.7(B) shows the strong vertical stacking of G6 in the loop with G1 on the opposite strand, and of T7 with G11 on the opposite strand. Important NOEs were observed between N1H-G6 and H8-G1, and between T7(H1') and G11(H8), that justify the observed stacking.
T5 does not show any substantial interaction with any of the bases, either in the loop or in the stem. Figure 3.7(A) shows that the two symmetric T5-G6-T7 loops are disposed on two opposite sides of the G-quartet. Each T5-G6-T7 loop connects G4 and G8 along the edge of the G4-G8-G11-G1 tetrad.

Figure 3.7: (A) The structure of the hairpin G-quartet of (G4TGTG4)2 after 2500 conjugate gradient steps of minimization. (B) Close-up view of the local structure of the (T5-G6-T7) loop.

3.6 Correlation between G-quartet structure and transcriptional activity

It is known that different ILPR sequences have different transcriptional activities. To explain the difference, one could look at the differences in their abilities to form the hairpin G-quartet structure. Figure 3.8 shows the enhancer, ILPR, promoter and transcription start site of the human insulin gene. The transcriptional activities of the ILPR consensus sequence and a few mutants are also shown. A single G→A mutation from G4TGTG4 to G4TGTGAGG lowers the transcriptional activity to less than 50% of that of the consensus sequence. A G→C mutation together with a G→A mutation lowers the activity to only 1% of that of the consensus sequence. These mutations also destabilize the cyclic hydrogen bonding and stacking in the hairpin G-quartet structure. A single G→C mutation in the hairpin loop gives only about one third of the original transcriptional activity, and the same mutation is known to disrupt the loop-loop and loop-tetrad interactions. These observations can be seen as evidence for a positive connection between the hairpin G-quartet structure of the ILPR and the transcriptional activity of the insulin gene.

Further support for the correlation between hairpin G-quartet formation and transcriptional regulation comes from the existence of telomere-like G/C-rich regions in the genes of insulin-like growth factors and their receptor (Allander, et al., 1994) and in the human mucin gene MUC-1 (Hareuveni, et al., 1990).
Although they are not tandemly repeated as in the ILPR of the insulin gene, these sequences are similar to the telomere sequence and the ILPR sequence. Thus they are capable of forming G-quartets. Other folded structures such as triple helix, cruciform and H-DNA are also capable of transcriptional regulation if formed upstream of a gene.

[Figure 3.8 labels: enhancer, ILPR, promoter, INSULIN; consensus ILPR sequence ACA GGGG TGT GGGG with its single and double mutants; transcriptional activities 100% (consensus), 41%, 8%, 32%, 44%, 1%.]

Figure 3.8: Schematic representation of the human insulin gene located on the short arm of chromosome 11. The consensus sequence and the single and double mutations are shown together with their transcriptional activities.

References

Allander, S.V., Larsson, C., Ehrenborg, E., Suwanichkul, A., Weber, G., Morris, S.L., Bajalica, S., Kiefer, M.C., Luthman, H. & Powell, D.R. Characterization of the chromosomal gene and promoter for human insulin-like growth factor binding protein-5. J. Biol. Chem. 269, 10891-10898 (1994).
Berger, I., Kang, C., Sinha, N., Wolters, M. & Rich, A. A highly efficient 24-condition matrix for the crystallization of nucleic acid fragments. Acta Cryst. D 52, 465-468 (1996).
Brünger, A.T. & Krukowski, A. Slow-cooling protocols for crystallographic refinement by simulated annealing. Acta Cryst. A 46, 585-593 (1990).
Brünger, A.T. X-PLOR Version 3.0: A system for crystallography and NMR (1992).
Cantor, C. & Schimmel, P. Biophysical Chemistry. W. H. Freeman Co. (1980).
Catasti, P., Chen, X., Moyzis, R.K., Bradbury, E.M. & Gupta, G. Structure-function correlations of the insulin-linked polymorphic region. J. Mol. Biol. 264, 534-545 (1996).
Ducruix, A. & Giege, R. Crystallization of Nucleic Acids and Proteins, A Practical Approach. IRL Press (1992).
Hammond-Kosack, M.C.U. & Docherty, K. A consensus repeat sequence from the human insulin gene linked polymorphic region adopts multiple quadriplex DNA structures in vitro. FEBS Lett.
301, 79-82 (1992).
Hammond-Kosack, M.C.U., Kilpatrick, M.W. & Docherty, K. The human insulin gene-linked polymorphic region adopts a G-quartet structure in chromatin assembled in vitro. J. Mol. Endocrin. 10, 121-126 (1993).
Hareuveni, M., Tsarfaty, I., Zaretsky, J., Kotkes, P., Horev, J., Zrihan, S., Weiss, M., Green, S., Lathe, R., Keydar, I. & Wreschner, D.H. A transcribed gene, containing a variable number of tandem repeats, codes for a human epithelial tumor antigen. Eur. J. Biochem. 189, 475-486 (1990).
Kang, C., Zhang, X., Ratliff, R., Moyzis, R. & Rich, A. Crystal structure of four-stranded Oxytricha telomeric DNA. Nature 356, 126-131 (1992).
Kennedy, G.C., German, M.S. & Rutter, W.J. The minisatellite in the diabetes susceptibility locus IDDM2 regulates insulin transcription. Nature Genetics 9, 293-298 (1995).
Laughlan, G., Murchie, A.I.H., Norman, D.G., Moore, M.H., Moody, P.C.E., Lilley, D.M.J. & Luisi, B. The high resolution crystal structure of a parallel-stranded guanine tetraplex. Science 265, 520-524 (1994).
Lattman, E.E. Use of the rotation and translation functions. Methods Enzymol. 115, 55-77 (1985).
Rossmann, M.G. Acta Cryst. A 14, 383 (1961).
Rossmann, M.G. & Blow, D.M. Acta Cryst. A 15, 25-31 (1962).
Sen, D. & Gilbert, W. A sodium-potassium switch in the formation of four-stranded G4-DNA. Nature 344, 410-414 (1990).
Smith, F.W. & Feigon, J. Quadruplex structure of Oxytricha telomeric DNA oligonucleotides. Nature 356, 164-168 (1992).
Tisch, R. & McDevitt, H. Insulin-dependent diabetes mellitus. Cell 85, 291-297 (1996).
Vyse, T. & Todd, J. Genetic analysis of autoimmune disease. Cell 85, 311-318 (1996).
Wang, B.-C. Resolution of phase ambiguity in macromolecular crystallography. Methods Enzymol. 115, 90-112 (1985).
Wang, Y. & Patel, D.J. Solution structure of the human telomeric repeat d[AG3(T2AG3)3] G-tetraplex. Structure 1, 263-282 (1993).
Appendix A X-PLOR Cell Symmetry File remarks unit cell parameters ING C222 a=24.24 b=54.67 c=40.90 alpha=90.0 beta=90.00 gamma=90.0 {* spacegroup=C222 NO. * } symmetry=( X,Y,Z) symmetry=( -X,-Y,Z) symmetry=( X,-Y,-Z) symmetry=( -X,Y,-Z) symmetry=( 1/2+X,1/2+Y,Z) symmetry=( 1/2-X,1/2-Y,Z) symmetry=( 1/2+X,1/2-Y,-Z) symmetry=( 1/2-X,1/2+Y,-Z) Appendix B X-PLOR Data File remarks file ingprepare.inp remarks preparation of various data structure @generate.psf end {* read structure file *} parameter @paraml l.dna {* read empirical potential *} {* parameter file *} {* append parameters for waters * } BOND HT OT 450.0 0.9572 ANGLE HT OT HT 55.0 104.52 {* for solute-water interactions * } 0.1591 2.8509 0.1591 2.8509 NONBONDED OT NONBONDED HT 0.0498 1.4254 0.0498 1.4254 {* for water-water interactions * } -------- A14------BB--------14 ----------A---------------nbfix ot ot 581980.4948 595.0436396 581980.4948 595.0436396 nbfix ht ht 3.085665E-06 7.533363E-04 3.085665E-06 7.533363E-04 nbfix ht ot 327.8404792 10.47230620 327.8404792 10.47230620 {* append parameters for pt * I nonbonded PT 0.1000 2.0000 0.1000 2.0000 ! platinum nbonds {* this statement specifies * } atom cdie shift eps=1.0 el4fac=0.4 {* the nonbonded interaction * } cutnb=7.5 ctonnb=6.0 ctofnb=6.5 {* energy options. Note the *} nbxmod=5 vswitch {* reduced nonbonding cutoff * } end {* to save some CPU time *} end flags {* in addition to the empirical potential *} include pele pvdw xref {* energy terms which are turned on initially. * } ? {* This statement turns on the crystallographic * } end {* residual term and packing term. *} xrefine @ing.cel {* this invokes the crystallographic data parser * } {* unitcell and * } {* symmetry operators for space group P22121* } *} {* notation is as in Int. Tables @scatter.sct {* approximation is used. Atoms are selected based on* } {* chemical atom type. 
Note the use of wildcards in the selection * } nref= 15000 {* this will allocate space for the reflection list; specify a *} {* number >= the actual no. of reflections* } {*fwindow=5.0 10000.0 this will select reflections based on the size of Fob* reflection @ing.fob end {* here we read in the diffraction data, *} {* a typical line in the file may look like this: *} {* FOBS= -32 1 5.958 WEIG= 1.0 PHASe=46. FOM=0.4 {* everything is free-field, if you don't specify something *} f * it'll be set to a reasonable default value *} method=FFT *} {* use the FFT method instead of direct summation * } fft memory= 1000000 {* this tells the FFT routine how much physical memory * } end {* is available, the number refers to DOUBLE COMPLEX * } S*words, the memory is allocated from the HEAP *} ? * this prints the current status *} end {* this terminates the diffraction data parser *} Appendix C X-PLOR Input File for Rotation Search remarks file xtalmr/rotation.inp -- cross rotation function (model P1 vs crystal) {===> structure @generate.psf end {===> } coor @generate.pdb { read structure file } { read coordinates } { specify location of Patterson map files } evaluate ( $pl_map="p l_map.dat" ) evaluate ( $p2_map="p2_map.dat" ) {===> } {===> } evaluate ( $max_vector=-20.) 
{ maximum Patterson vector to be searched } evaluate( $m_max_vector=-$max_vector ) xrefin { make Patterson P1 map of model in P1 box } {===>) { the P1 box has to be larger than twice the } { the extend of the model in each direction } a=80.0 b=120.0 c=100.0 alpha=90.0 beta=90.0 gamma=90.0 symmetry=(x,y,z) SCATter ( chemical C* ) 2.31000 20.8439 1.02000 10.2075 SCATter ( chemical N* ) 12.2126 .005700 3.13220 9.89330 SCATter ( chemical O* ) 3.04850 13.2771 2.28680 5.70110 SCATter ( chemical S* ) 6.90530 1.46790 5.20340 22.2151 SCATter ( chemical P* ) 6.43450 1.90670 4.17910 27.1570 SCATter ( chemical FE* ) 11.1764 4.61470 7.38630 0.30050 1.58860 .568700 .865000 51.6512 .215600 2.01250 28.9975 1.16630 .582600 -11.529 1.54630 .323900 .867000 32.9089 .250800 1.43790 .253600 1.58630 56.1720 .866900 1.78000 0.52600 1.49080 68.1645 1.11490 3.39480 11.6729 0.07240 38.5566 0.97070 { allocate sufficient space for the reflections of the P1 box } {===>} nreflections=200000 {===> } resolution 8.0 3.0 { resolution range for P1 box i generate method=fft fft grid=0.25 end { generate reflections for P1 box I { sampling grid for FFT and Patterson map (1/4 high resol.) } update { compute Fcalcs for model in P1 box } do amplitude (fcalc=fcalcA2) do phase (fcalc=0.0) map { compute IFcalcl^2 and store in Fcalc I { compute Patterson map P 1 (which will be rotated) } { we write a hemisphere of Patterson vectors with } extend=box { lengths less than $max_vector. } xmin=$m_max_vector xmax=$max_vector ymin=$m_max_vector ymax=$max_vector zmin=0.0 zmax=$max_vector automatic=true formatted=false output=$p l_map end end xrefin { use automatic scaling of map } { write an unformatted map file } { make Patterson map P2 of crystal I { unit cell for crystal I {===>} a=24.24 b=54.67 c=40.90 alpha=90. beta=90. gamma=90. 
I ===> } symmetry=(x,y,z) { operators for Patterson symmetry of crystal P22121 symmetry=(-x,-y,z) symmetry=(x,-y,-z) symmetry=(-x,y,-z) {===>} nreflections=300000 reflection @ing.fob end { read reflections } {===>} resolution 8.0 3.0 { resolution range } reduce do amplitude ( fobs = fobs * heavy(fobs - 2.0*sigma)) fwind=0. 1= 100000 method=fft fft grid=0.25 end { sigma cutoff } { sampling grid for Patterson maps (1/4 high resol.) } do amplitude (fcalc=fobs^2) do phase (fcalc=0.0) map extend=unit automatic=true formatted=false output=$p2_map end end xrefin nrefl=10 search rotation plinput=$pl_map p2input=$p2_map { c( ompute IFobsl^2 and store in Fcalc } { cornLpute Patterson map P2 I { us(e automatic scaling of map } { release some memory } formatted=false (===> } range=5.0 $max_vector threshold=0.0 npeaks= 15000 { Patterson vector selection for map P1 } { use 15000 largest vectors of map P1 } {===>} tmmin=0.0 tmmax= 180. { Lattman angle grid. Specify asymmetric } t2min=0.0 t2max=90. { unit for rotation function here. See } tpmin=0.0 tpmax=720. { Rao, S.N. et al. (1980). Acta Cryst. A36} { 878--884. I delta=2.5 { Roughly, delta should be less than ArcSin[ high resol / (3*$max_vector)]. I list=rotation 1.rf nlist=6000 epsilon=0.25 end end stop { output file for cluster analysis } { analyse highest 6000 peaks of rotation function } { matrix norm for cluster analysis I Appendix D X-PLOR Input File for PC-Refinement remarks file xtalmr/filter.inp -- pc-refinement of rotation function peaks {===> } parameter @paraml 1.dna end {===> structure @generate.psf end {===>) coor @generate.pdb { read parameters } { read structure file } { read coordinates }) evaluate ($wa= 10000.) 
{ this is the weight for the XREF energy term { in this case it is arbitrary since we're not } { combining it with other energy term } xrefin {===>) @ing.cel { unit cel l for crystal }) SCATter ( chemical C* ) 2.31000 20.8439 1.02000 10.2075 1.58860 .568700.865000 51.6512 .215600 SCATter ( chemical N* ) 12.2126 .005700 3.13220 9.89330 2.01250 28.9975 1.16630 .582600 -11.529 SCATter ( chemical O* ) 3.04850 13.2771 2.28680 5.70110 1.54630 .323900 .867000 32.9089 .250800 SCATter ( chemical S*) 6.90530 1.46790 5.20340 22.2151 1.43790 .253600 1.58630 56.1720 .866900 SCATter ( chemical P* ) 6.43450 1.90670 4.17910 27.1570 1.78000 0.52600 1.49080 68.1645 1.11490 SCATter ( chemical FE*) 11.1764 4.61470 7.38630 0.30050 3.39480 11.6729 0.07240 38.5566 0.97070 {===>} nreflections=30000 reflection @ing.fob end { re;id reflections {===>I resolution 8.0 3.0 { reso lution range } reduce do amplitude ( fobs = fobs * heavy(fobs - 2.0*sigma)) { sigma cutoff I fwind=0. 1= 100000 {===>} method=fft fft memory=2000000 end { fft method with memory statement I wa=$wa target=E2E2 mbins=20 { specify target } { number of bins used for E calculation } tolerance=0. lookup=false { this makes the minimizer happy } { expand data to a P1 hemisphere: this sequence of } { statements first applies the crystal symmetry ops } { to the current reflections. In the second step I { Friedel mates or other redundancies are removed. } { This is necessary since the application of the } { symmetry operators can produce Friedel mates } { under special conditions. 
I hermitian=false expand hermitian=true symmetry reset reduce end flags exclude * include xref end I only use XREF energy term I I write the results of the refinement } {===>} set display=filterl.list end { to a file called "filter.list" } set precision=5 end set message=off end set echo=off end { turn off messages and echo to reduce) { output } evaluate ($number=-0) evaluate ($counter=0) { loop over all orientations as specified ) { in file rotation.rf (conventional rf) } for $1 in ( @rotationl.rf) loop main { this series of statements } evaluate ($counter=--$counter+1) if ($counter=l1) then evaluate($index= $1) { assigns the information of} elseif ($counter=2) then evaluate($tl=$1) { a single line in file I elseif ($counter=3) then evaluate($t2=$1) { rotation.rf to the approp. } elseif ($counter=4) then evaluate($t3=$1) { variables. A single line } elseif ($counter=-5) then { contains } evaluate ($rf=$1) { $index $tl $2 $t3 $rf. } evaluate ($counter-0O) evaluate ($number=-$number+1) coor copy end { save current coordinates } coor rotate euler=-( $tl $t2 $t3 ) end { and then rotate them { according to the orientation } { specified by $t l, $t2, $t3 } energy end evaluate ($pc 1= 1.0-$xref/$wa) { compute initial energy } { and store in $pc 1 minimize rigid rigid body minimization of the nstep= 15 { c)rientation of the molecule } drop= 10. 
end evaluate ($pc2=1.0-$xref/$wa) coor swap end { fit coordinates to starting structure in } vector do (vx=x) ( all ) { order to measure the orientation of the } vector do (vy=y) ( all ) { PC-refined structure } { the arrays vx, vy, vz are used as temporary } vector do (vz=z) ( all) { stores in order to keep the starting } coor fit end { coordinates vector do (x=vx) ( all) } { the COOR FIT statement stores the angles vector do (y=vy) ( all) vector do (z=vz) ( all ) { in the symbol $thetal, $theta2, $theta3 } { print information: orientation of rotation I { function peak ($tl, $t2, $t3), orientation } { after PC-refinement ($thetal, $theta2, } { $theta3), index of the rotation function, } { rotation function value, PCs for initial, } { rigid body and domain refined structures. } display $tl $t2 $t3 end if end loop main stop $thetal $theta2 $theta3 $index $rf $pcl $pc2 Appendix E X-PLOR Input File for Translation Search remarks file xtalmr/translation.inp -- PC-refinement followed by translation search { The first part of this job is similar to the PC-refinement I { job (filter.inp). We actually have to repeat the refinement) { for the selected orientation since we did not store the } { refined coordinates. } {===> parameter @paraml l.dna end {===> } structure @generate.psf end {===>} coor @generate.pdb { read parameters } { read structure file } { read coordinates } evaluate ($wa= 10000.) 
{ this is the weight for the XREF energy term { in this case it is arbitrary since we're not } { combining it with other energy term } xrefin {===>} @ing.cel SCATter ( chemical C* ) 2.31000 20.8439 1.02000 10.2075 SCATter ( chemical N* ) 12.2126 .0057003.13220 9.89330 SCATter ( chemical O* ) 3.04850 13.2771 2.28680 5.70110 SCATter ( chemical S* ) 6.90530 1.46790 5.20340 22.2151 SCATter ( chemical P* ) 6.43450 1.90670 4.17910 27.1570 SCATter ( chemical FE* ) 11.1764 4.61470 7.38630 0.30050 {===> } nreflections=30000 { unit cell for crystal } 1.58860 .568700 .865000 51.6512 .215600 2.01250 28.9975 1.16630 .582600 -11.529 1.54630 .323900 .867000 32.9089 .250800 1.43790 .253600 1.58630 56.1720 .866900 1.78000 0.52600 1.49080 68.1645 1.11490 3.39480 11.6729 0.07240 38.5566 0.97070 reflection @ing.fob end ({read reflections I {===> resolution 8.0 3.0 { resolution range } reduce do amplitude ( fobs = fobs * heavy(fobs - 2.0*sigma)) { sigma cutoff } fwind=0. 1= 100000 {===>} method=fft fift memory=2000000 end wa=$wa target=E2E2 mbins=20 { fft method with memory statement I { specify target used for both PC-refinement} { and translation search } { number of bins used for E calculation } tolerance=0. lookup=false { this makes the minimizer happy }) { expand data to a P1 hemisphere I hermitian=false expand hermitian=true symmetry reset reduce end flags exclude * include xref end { only use XREF energy term } {===> } coor rotate euler=(213.57 10 93.566) end { rotate the structure according to the selected } { orientation. Note: use the orientation that } { comes out of the rotation function (first three { numbers in file "filter.list". } {===> } minimize rigid nstep=15 drop=10. 
{ repeat the refinement steps of job filter.inp }) end { now we have to turn the crystal symmetry on } { in order to carry out the translation search I xrefin {===> } @ing.cel reduce end {now we get ready for the translation search I xrefin { set the grid size for the translation search } { should be less than 1/3 high-resolution limit} evaluate ( $gridx=1./40. ) evaluate ( $gridy=1./50. ) evaluate ( $gridz=1./80. ) evaluate ( $grid=min($gridx,$gridz)) search translation mode=fractional xgrid=0.0 $grid 0.5 ygrid=0. 0.02 0.5 zgrid=0. $grid 0.5 nlist=1000 { we only have to search in x,z in this } { space group. In general we have to } { specify an asymmetric unit for the } translation function. N.B.: This is } NOT necessarily identical to an } asymmetric unit of the space group!! i { list the 1000 best solutions the list is returned in the standard } output file. } output=translation 1.3dmatrix { output matrix for plotting. } { this can be verbose for 3d } { translation functions!! } end end write coordinates output=translation.pdb end { the translation function { returns the coordinates of } { best solution. } xrefin resolution 8. 3.0 target=residual update print rfactor end stop { analyse the R factor distribution } { of the best solution. 
}

Appendix F X-PLOR Input File for Rigid Body Refinement

remarks FILE RIGID.INP
remarks rigid-body refinement

@ingprepare.inp                    {* read various standard data sets *}

coordinates @generate.pdb          {* read in initial model *}
coordinates copy end               {* copy to comparison set *}

{* include only R-factor in energy function *}
flags
   exclude bond angl dihe impr vdw elec pvdw pele
   include xref
end

xrefin
   resolutionlimits=8.0 2.5
   tolerance=0.0
   update-fcalc
   print R-factor
   wa=1300.0                       {* arbitrary value, since only XREF carries weight *}
   wp=0.0
end

minimize rigid
   nstep=40
   drop=10.0
   group = ( resid 2:15 )
end

minimize rigid
   nstep=40
   drop=10.0
   group = ( resid 2:5 )
   group = ( resid 12:15 )
end

minimize rigid
   nstep=40
   drop=10.0
   group = ( resid 2 )
   group = ( resid 3 )
   group = ( resid 4 )
   group = ( resid 5 )
   group = ( resid 12 )
   group = ( resid 13 )
   group = ( resid 14 )
   group = ( resid 15 )
end

write coordinates output=rigid.pdb end
coordinates rms end                {* print out rms to initial coordinates *}
stop

Appendix G X-PLOR Input File for Simulated Annealing

remarks file xtalrefine/slowcool.inp
remarks crystallographic SA-refinement (slow-cooling method)

{===>} parameter @param11.dna end           { read parameters }
{===>} structure @generate.psf end          { read structure file }
{===>} coor @prepingl.pdb                   { read coordinates }

vector do ( charge=0.0 ) ( resname LYS and
   ( name ce or name nz or name hz* ) )     { Turn off charges on LYS }
vector do ( charge=0.0 ) ( resname GLU and
   ( name cg or name cd or name oe* ) )     { Turn off charges on GLU }
vector do ( charge=0.0 ) ( resname ASP and
   ( name cb or name cg or name od* ) )     { Turn off charges on ASP }
vector do ( charge=0.0 ) ( resname ARG and
   ( name cd or name *E or name cz or name NH* or name HH* ) )
                                            { Turn off charges on ARG }

flags
   include pele pvdw xref
end

xrefine
{===>}
   a=24.24 b=54.67 c=40.90 alpha=90.0 beta=90.00 gamma=90.0
{===>}
   symmetry=(x,y,z)
   symmetry=(-x,-y,z)
   symmetry=(x,-y,-z)
   symmetry=(-x,y,-z)
   symmetry=(1/2+x,1/2+y,z)
   symmetry=(1/2-x,1/2-y,z)
   symmetry=(1/2+x,1/2-y,-z)
   symmetry=(1/2-x,1/2+y,-z)

   SCATter ( chemical C* )  2.31000 20.8439 1.02000 10.2075 1.58860 .568700 .865000 51.6512 .215600
   SCATter ( chemical N* )  12.2126 .005700 3.13220 9.89330 2.01250 28.9975 1.16630 .582600 -11.529
   SCATter ( chemical O* )  3.04850 13.2771 2.28680 5.70110 1.54630 .323900 .867000 32.9089 .250800
   SCATter ( chemical S* )  6.90530 1.46790 5.20340 22.2151 1.43790 .253600 1.58630 56.1720 .866900
   SCATter ( chemical P* )  6.43450 1.90670 4.17910 27.1570 1.78000 0.52600 1.49080 68.1645 1.11490
   SCATter ( chemical FE* ) 11.1764 4.61470 7.38630 0.30050 3.39480 11.6729 0.07240 38.5566 0.97070

{===>}
   nreflections=15000
   reflection @ing.fob end                  { read reflections }
{===>}
   resolution 8.0 2.5                       { resolution range }
   reduce
   do amplitude ( fobs = fobs * heavy(fobs - 2.0*sigma))   { sigma cutoff }
   fwind=0.1=100000
   method=FFT
   fft
      memory=1000000
   end
   tolerance=0.2                            { tolerance for dynamics }
{===>}
   wa=6500                                  { weight from job "check.inp" }
end

set seed=432324368 end     { set the initial random seed for the v-assignment }

{===>} evaluate ($init_temp=3000.)          { starting temperature }

vector do (vx=maxwell($init_temp)) ( all )
vector do (vy=maxwell($init_temp)) ( all )
vector do (vz=maxwell($init_temp)) ( all )

vector do (fbeta=100.) ( all )

evaluate ($1=$init_temp)
while ($1 > 300.0) loop main
   dynamics verlet
      timestep=0.0005 nstep=50
      iasvel=current
      nprint=5 iprfrq=0
      tcoupling=true tbath=$1
   end
   evaluate ($1=$1-25)
end loop main

xrefin
   tolerance=0.0 lookup=false               { this makes the minimizer happy }
end

minimize powell                             { final minimization }
   nstep=200
   drop=10.0
end

write coordinates output=slowing.pdb end    { Write coordinates }
stop
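
The slow-cooling loop in Appendix G starts at 3000 K, runs 50 Verlet steps of 0.0005 ps at each bath temperature, and lowers the bath by 25 K until it is no longer above 300 K. The short Python sketch below simply replays that loop to count the stages and the total simulated time. The function name and its return layout are hypothetical; the default numbers are the ones in the script.

```python
def slow_cool_schedule(t_start=3000.0, t_end=300.0, t_step=25.0,
                       steps_per_stage=50, timestep_ps=0.0005):
    """Replay the 'while ($1 > 300.0)' loop and summarize the schedule.

    Returns (bath temperatures, total dynamics steps, total time in ps).
    """
    temperatures = []
    bath = t_start
    while bath > t_end:           # same stopping test as the X-PLOR loop
        temperatures.append(bath)
        bath -= t_step            # corresponds to: evaluate ($1=$1-25)
    total_steps = len(temperatures) * steps_per_stage
    return temperatures, total_steps, total_steps * timestep_ps


if __name__ == "__main__":
    temps, steps, time_ps = slow_cool_schedule()
    print(len(temps), "stages,", steps, "steps,", round(time_ps, 3), "ps")
```

With the values from the script this gives 108 temperature stages, the last one at 325 K, for 5400 dynamics steps or about 2.7 ps of simulated time before the final Powell minimization.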