Crystallographic Studies of the Insulin-Linked Polymorphic Region

Crystallographic Studies of the Insulin-Linked
Polymorphic Region
by
Qingfei Zhang
Submitted to the Department of Chemistry
in partial fulfillment of the requirements for the degree of
Master of Science in Chemistry
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
February 1998
© Massachusetts Institute of Technology, 1998. All Rights Reserved.
Author ............................................................
Department of Chemistry
September 2, 1997
Certified by ............................................................
Alexander Rich
William Thompson Sedgwick Professor of Biophysics
Department of Biology
Accepted by ............................................................
Dietmar Seyferth
Robert T. Haslam and Bradley Dewey Professor of Chemistry
Chairman, Departmental Committee on Graduate Students
Crystallographic Studies of the Insulin-Linked
Polymorphic Region
by
Qingfei Zhang
Submitted to the Department of Chemistry on September 2, 1997,
in partial fulfillment of the requirements for the degree of Master of
Science in Chemistry
Abstract
The insulin-linked polymorphic region (ILPR) contains a 14 base-pair long tandem
repeat of 5'-ACAGGGGTGTGGGG-3' located 363 base-pairs upstream of the human
insulin gene. Genetic studies have identified the ILPR as a locus for insulin-dependent diabetes mellitus (IDDM). Biochemical studies have shown the existence of a G-quartet
structure in this region. To investigate the G-quartet structure in detail, the sequence
GGGGTGTGGGG was crystallized with a 24-condition screening matrix. X-ray diffraction data of the DNA crystal were collected. Molecular replacement was attempted with
the G-quartet model of TGGGGT. To solve the phase problem, the heavy atom derivatives
GGGGIUGTGGGG and GGGGTGIUGGGG (IU = iodouracil) were used for isomorphous replacement.
An anomalous Patterson peak search was also attempted. However, the phases of the structure
factors could not be determined because of the difficulty of locating heavy atom peaks.
Nevertheless, the structure of GGGGTGTGGGG solved by NMR shows the existence
of an antiparallel hairpin G-quartet structure. The detailed structure is described and the
connection between G-quartet formation and transcriptional regulation is discussed.
Thesis Supervisor: Alexander Rich
Title: William Thompson Sedgwick Professor of Biophysics
Acknowledgments
"It is a very great thing to be able to think as you like; but, after all, an important
question remains: what you think."
Matthew Arnold
Not unlike many other business endeavors, scientific research can either engender
serendipitous discoveries and grand rewards or evolve into a Kafkaesque nightmare filled
with disillusionment and dejections. Fortunately or unfortunately, I have experienced neither. Thanks to my parents, who offered unflinching support for my career endeavors, perspicacious judgement for my career decisions and comforting solace during my trying
times. For which I am eternally grateful.
Many thanks to Dr. Rich, who is arguably the most skillful Zen master of not only
the science, but also the art of life. Little do I know about the limits of his knowledge or
the boundaries of his wisdom. From whom much remains to be learned.
I am also greatly indebted to Dr. Liqing Chen, who has shown the true marks of a
crystallography guru, who possesses the technical brilliance and professional experience
to offer needed assistance to his fledgling apprentice. To enumerate his good deeds in
detail would be a Herculean effort. Most importantly, his teachings are not merely cerebral, but contain an element of the heart.
Thanks to Sarah, for rescuing me from the solitude of my kingdom.
Last but not least, I offer my heartfelt gratitude to many others, for leaving me with
the memories of unfaded dreams and grand aspirations, and for the obviation of doubts
that otherwise might have haunted me. Without their existence my work would certainly
be an insular and myopic effort.
A human being is an intrinsically complex animal. He synthesizes happiness from
numerous sources, such as professional success, intellectual satisfaction, emotional fulfillment and financial freedom. However, I have come to believe that the true reward of scientific research lies not in societal or pecuniary rewards, nor in the invidious display of
power and academic prestige, but in the pure and unadulterated joys of a curious and
inquisitive mind.
Table of Contents
Abstract
Acknowledgments
Chapter 1: Introduction
1.1 The Insulin-Linked Polymorphic Region
1.2 X-ray Crystallography
Chapter 2: Experimental
2.1 Materials
2.2 Crystal Growth
2.3 X-Ray Diffraction
2.4 Molecular Replacement
2.5 Isomorphous Replacement
Chapter 3: Results and Discussion
3.1 Diffraction Analysis
3.2 Molecular replacement
3.3 Isomorphous replacement
3.4 Anomalous scattering
3.5 The structure of G4TGTG4
3.6 Correlation between G-quartet structure and transcriptional activity
References
Appendix A X-PLOR Cell Symmetry File
Appendix B X-PLOR Data File
Appendix C X-PLOR Input File for Rotation Search
Appendix D X-PLOR Input File for PC-Refinement
Appendix E X-PLOR Input File for Translation Search
Appendix F X-PLOR Input File for Rigid Body Refinement
Appendix G X-PLOR Input File for Simulated Annealing
Chapter 1: Introduction
1.1 The Insulin-Linked Polymorphic Region
Diabetes mellitus is a disorder of carbohydrate metabolism resulting from insufficient production of or reduced sensitivity to insulin. One variety of this disease is insulin-dependent diabetes mellitus (IDDM), in which insulin is not secreted by the pancreas and
hence must be taken by injection (Tisch and McDevitt, 1996). The genetic basis of this
disease is multifactorial and susceptibility is determined by environmental and genetic
factors. Inheritance is polygenic and is influenced by the genotype of the class II major
histocompatibility complex (MHC). While MHC class II genotype is one of the strongest
factors determining susceptibility to IDDM, it has been discovered, through microsatellite
analysis of genome-wide polymorphisms in IDDM families, that many other genetic
regions also influence susceptibility (Vyse and Todd, 1996). The first non-MHC genetic
region implicated in IDDM is a polymorphic minisatellite 5' of the human insulin gene on
chromosome 11 (Kennedy et al., 1995). Minisatellites are highly repetitive DNA
sequences found in mammalian genomes and vary in length from a few to several thousand base pairs. They range from simple di- and trinucleotide repeats to more complex
repetitive elements. Some trinucleotide repeats have been implicated in human diseases as
varied as fragile X syndrome, myotonic dystrophy and Kennedy's disease. The insulin-linked polymorphic region (ILPR) is composed of a variable number of 14-base-pair tandem repeats of the sequence ACAGGGGTGTGGGG located 363 base pairs upstream of the
transcription start site.
The ILPR is unique due to its high degree of polymorphism in the human population. The polymorphism is generated by variation in the number of tandem repeats within
a given ILPR and by minor nucleotide sequence heterogeneity within the individual
repeats. Overall, fourteen closely related repeats have been discovered and ten of them
have been associated with IDDM. Nine of them consist of single-base pair substitutions in
the repeat sequence, and one is polymorphic in the repeat number (variable number tandem repeat, VNTR). Classification of the ILPR into three main classes is based on length
differences. Class I, II and III alleles have lengths of 40, 85 and 157 repeats, respectively. Recently
it has been shown that different classes have different levels of transcriptional activity
(Kennedy et al., 1995), with a long ILPR having more transcriptional activity than a short
ILPR.
How the above two characteristics of the ILPR, namely, genetic susceptibility to
IDDM and transcriptional activity, are related is not clear. However, because the short
ILPR is preferentially associated with IDDM, it could be inferred that decreased insulin
transcription is related to IDDM and that higher levels of insulin transcription from the long
ILPR protect against IDDM. Therefore, factors that cause the difference in transcriptional
activity could very well account for the genetic linkage between the ILPR and IDDM.
To investigate the basis for the difference in transcription, it is helpful to consider
another feature of the ILPR, namely its guanine-rich nature. Studies of the effect of
alkali metal ions on the mobility of oligonucleotides (Sen & Gilbert, 1990) containing the
ILPR consensus sequence in non-denaturing gels have shown the existence of quadruplex
structures (Hammond-Kosack et al., 1992, 1993). Quadruplex structures are formed only
by oligomers containing one or several runs of guanine residues. These structures are stabilized by Hoogsteen base pairing, involving the N-7 positions of the contributing G residues (Figure 1.1). The exact nature of the quadruplex formed depends on the number of
G runs within the oligomers and on environmental conditions such as salt concentration, temperature and torsional stress.
Based on analysis of gel electrophoretic patterns, chemical modifications, ultraviolet-induced crosslinks and, recently, x-ray crystallography, three main quadruplex structures have been identified. They are four-stranded parallel quadruplexes (G4-DNA)
(Laughlan, G. et al., 1994), unimolecular antiparallel quadruplexes (Wang, Y. & Patel, D.J.,
1993) and bimolecular quadruplexes (Kang, C. et al., 1992). In all of the reported four-stranded
structures the DNA strands are aligned parallel to each other, with each of the nucleotides
in the anti conformation. However, the folded strands must have some antiparallel alignment by nature, which requires some of the guanine nucleotides to adopt the syn conformation in order to form a G-quartet. More interesting is the class of bimolecular
quadruplexes, in which the two DNA strands can adopt three different models: the antiparallel-stranded edge-looped model, the diagonal-looped model and the alternative diagonal-looped
model (Figure 1.2). The x-ray structure of four-stranded Oxytricha telomeric DNA
G4T4G4 shows an edge-looped quadruplex (Figure 1.3) in which the thymine groups are
aligned along the edge of the neighboring G-quartets (Kang et al., 1992). However, NMR
studies showed that the same DNA sequence forms a diagonal-looped quadruplex in solution with the thymine loops aligned diagonally across the neighboring G-quartets (Smith
and Feigon, 1992). The difference in topology could be due to the flexibility of G-rich
DNA to form quadruplexes under different ionic conditions.
The existence of a G-tetraplex in the ILPR has provided a new possibility for explaining
the difference in transcriptional activity between the different ILPR sequences. To study
the structure of the G-tetraplex in detail, x-ray crystallographic studies on the ILPR consensus sequence GGGGTGTGGGG were carried out and the data were analyzed to gain
some insight into the structure of the G-tetraplex in the crystal.
Figure 1.1: Hydrogen-bonding structure of the G-tetrad.
Figure 1.2: Three models of bimolecular quadruplexes. (A) Antiparallel-stranded edge-looped model. (B) Diagonal-looped model. (C) Alternative diagonal-looped model.
Figure 1.3: Four views of the crystal structure of G4T4G4.
1.2 X-ray Crystallography
At present, the most powerful method for determining molecular structures to
atomic resolution is x-ray crystallography. More than 1,000 protein structures have been
determined by this method. The underlying approach in the method is to interpret the diffraction of x-rays from many identical molecules in an ordered array such as a crystal. To
achieve the final goal, which is to obtain an atomic-resolution picture of the molecular
structure, high quality crystals of the molecule must be grown, the directions and intensities of x-ray beams diffracted from the crystals must be measured, and computer methods
must be used to interpret the data and reconstruct a three-dimensional image of the crystal
content. Finally, the electron density image must be interpreted by building a molecular
model that is consistent with the image.
Like small molecules, many macromolecules such as nucleic acids and proteins
solidify to form crystals under certain conditions. During crystallization, each macromolecule adopts one of a few orientations. The result is an orderly packing of molecules in three-dimensional
arrays. The smallest volume in the crystal that can be repeated by translation
is the unit cell. In crystallography, the content of the unit cell is determined as an electron
density distribution which in turn is used to locate individual atoms in the cell.
The most important aspect of x-ray crystallography is to obtain high quality crystals suitable for x-ray diffraction. Crystallization methods such as vapor diffusion, dialysis and seeding have been developed (Ducruix and Giege, 1992). In a typical crystallization,
precipitant is added to an aqueous solution of the macromolecule until the precipitant concentration is just below that required to precipitate the molecule. Water is then allowed to evaporate slowly, which gently raises the concentration of
both molecule and precipitant until precipitation occurs. Whether the molecule forms crystals
or disordered precipitates depends on the molecule's concentration, solution pH, temperature
and salt conditions. Finding the right conditions for growing the perfect crystal requires
many careful trials and is often more of an art than a science. One prevalent crystallization
method is vapor diffusion, in which the solution is allowed to equilibrate in a closed container with a larger aqueous reservoir of optimal precipitant concentration. In this study, the
vapor diffusion method was used for all crystallizations, and a 24-condition matrix was used to screen for the optimal crystallization condition (Berger, I., et al., 1996). The matrix
was designed based on the identification of factors that enhance DNA crystal growth.
Some of the factors are pH and the concentrations of monovalent cations, magnesium ions, other
divalent cations, polyamines and cobalt hexaammine. MPD was used as the precipitant at a
concentration from 10% to 30%.
After crystals of sufficiently high quality are grown, x-ray diffraction analysis can
be performed. The central problem in x-ray crystallography is to determine ρ(x, y, z), the
electron density distribution inside the unit cell:
$$\rho(x, y, z) = \frac{1}{V} \sum_{hkl} |F_{hkl}| \, e^{i\varphi(hkl)} \, e^{-2\pi i (hx + ky + lz)}$$
where V is the volume of the unit cell and φ(hkl) is the phase of the structure factor.
The diffraction pattern contains information about the structure factor F at each
position (h, k, l) in reciprocal space, which is the Fourier transform of the electron density ρ:
$$F_{hkl} = \int_{xyz} \rho(x, y, z) \, e^{2\pi i (hx + ky + lz)} \, dx \, dy \, dz$$
The magnitude of the structure factor at (h, k, l) is proportional to the square root
of the measured intensity I at that position:
$$|F_{hkl}| \propto \sqrt{I_{hkl}}$$
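The Fourier relationship between structure factors and electron density can be illustrated with a minimal one-dimensional sketch. The two point atoms, their weights and the unit cell of length 1 are made-up values chosen for illustration, not data from this study. The sketch computes F_h from the atoms, then rebuilds the density once with the correct phases and once with amplitudes only, which shows why the measured magnitudes alone are not sufficient.

```python
import numpy as np

def structure_factors(positions, weights, h_max):
    """F_h = sum_j w_j * exp(2*pi*i*h*x_j) for h = -h_max..h_max (1D point atoms)."""
    h = np.arange(-h_max, h_max + 1)
    phase_factors = np.exp(2j * np.pi * np.outer(h, positions))
    return h, phase_factors @ np.asarray(weights, dtype=float)

def density(h, F, x):
    """rho(x) = sum_h F_h * exp(-2*pi*i*h*x) for a unit cell of length 1."""
    return np.real(np.exp(-2j * np.pi * np.outer(x, h)) @ F)

# Hypothetical 1D "crystal": two atoms at fractional coordinates 0.20 and 0.55.
positions = [0.20, 0.55]
weights = [6.0, 8.0]

h, F = structure_factors(positions, weights, h_max=20)
x = np.linspace(0.0, 1.0, 200, endpoint=False)

rho = density(h, F, x)                   # amplitudes and phases: atoms are recovered
rho_no_phase = density(h, np.abs(F), x)  # amplitudes only: density piles up at the origin

peak = x[np.argmax(rho)]                 # position of the heavier atom
```

With the phases included, the strongest density peak falls on the heavier atom; with all phases set to zero, the map is dominated by a spurious peak at the origin.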
While it is relatively easy to determine the magnitudes of structure factors, it is
much more difficult to obtain their phases. Phase determination can be accomplished by
techniques such as direct methods, isomorphous replacement and anomalous
scattering methods. If the structure of a similar molecule is available, molecular replacement can also be used to determine the phases.
Direct methods are a set of analytical techniques for deriving an approximate
set of phases from which a first approximation to the electron density map can be calculated. Interpretation of this map gives a suitable trial structure of the molecule. These methods make
use of the existence of mathematical relationships among certain combinations of phases.
From these relationships, enough initial phase estimates can be obtained to begin converging toward a complete set of phases.
Direct methods work when the number of reflections is small. When the molecule
is large enough that a heavy atom does not change its structure significantly, isomorphous
replacement can be used. In this method, a heavy atom is incorporated into the molecule
and the slight perturbation of diffraction patterns caused by the added atom can be used to
obtain initial estimates of phases. Because the diffractive contributions of atoms are additive vectors, the structure factor of the heavy atom derivative, F_HP, is the vector sum of the
structure factors for the heavy atom (F_H) and for the protein (F_P):
$$\mathbf{F}_{HP} = \mathbf{F}_{H} + \mathbf{F}_{P}$$
Once the heavy atom is located in the unit cell, FH is known and phase information
can be obtained by representing the above equation in the complex plane. Phase ambiguities can be eliminated by incorporating a second heavy atom that binds to a different site
from the first.
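The phase ambiguity of a single derivative, and its removal by a second one, can be sketched with the cosine rule applied to the vector sum F_HP = F_H + F_P. All amplitudes and phases below are invented for illustration, and the function names are hypothetical. Each derivative yields two candidate phases for F_P; the candidate common to both derivatives is taken as the estimate.

```python
import cmath
import math

def candidate_phases(amp_p, amp_hp, f_h):
    """Two phases of F_P consistent with |F_P|, |F_HP| and a known complex F_H."""
    amp_h, phi_h = abs(f_h), cmath.phase(f_h)
    # |F_HP|^2 = |F_P|^2 + |F_H|^2 + 2 |F_P| |F_H| cos(phi_P - phi_H)
    cos_term = (amp_hp**2 - amp_p**2 - amp_h**2) / (2.0 * amp_p * amp_h)
    cos_term = max(-1.0, min(1.0, cos_term))  # guard against rounding
    delta = math.acos(cos_term)
    return (phi_h + delta, phi_h - delta)

def angular_difference(a, b):
    """Smallest absolute difference between two angles in radians."""
    return abs((a - b + math.pi) % (2.0 * math.pi) - math.pi)

# Made-up "true" native structure factor and two heavy-atom contributions.
f_p = cmath.rect(10.0, math.radians(40.0))
f_h1 = cmath.rect(3.0, math.radians(110.0))
f_h2 = cmath.rect(2.5, math.radians(-20.0))

# Only amplitudes of native and derivative data are treated as measured.
cands1 = candidate_phases(abs(f_p), abs(f_p + f_h1), f_h1)
cands2 = candidate_phases(abs(f_p), abs(f_p + f_h2), f_h2)

# The pair of candidates that agree between the two derivatives gives the phase.
best = min(((a, b) for a in cands1 for b in cands2),
           key=lambda pair: angular_difference(*pair))
phase_estimate_deg = math.degrees(best[0]) % 360.0
```

In this toy case the first derivative allows 40° or 180° and the second allows 40° or 280°, so the shared value of 40° is selected.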
To locate the heavy atom in the unit cell, the relatively simple diffraction signature of
the heavy atom is extracted from the far more complicated diffraction pattern of the heavy
atom derivative. The standard technique for determining the heavy atom location employs
the Patterson function P(u, v, w), which is a variation of the Fourier transform used to
compute the electron density ρ(x, y, z):
$$P(u, v, w) = \frac{1}{V} \sum_{hkl} |F_{hkl}|^2 \, e^{-2\pi i (hu + kv + lw)}$$
A difference Patterson function, ΔP(u, v, w), with amplitudes (ΔF)² = (|F_HP| − |F_P|)², can be used to search for the heavy atom in the derivative crystal:
$$\Delta P(u, v, w) = \frac{1}{V} \sum_{hkl} (\Delta F_{hkl})^2 \, e^{-2\pi i (hu + kv + lw)}$$
The Patterson map, which consists of coordinates (u, v, w), is a contour map of
P(u, v, w) that displays peaks at locations corresponding to vectors between atoms.
Through a trial-and-error process, these peaks can be used to identify the location of the
heavy atom.
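A one-dimensional sketch may help to show why Patterson peaks correspond to interatomic vectors. It uses the same kind of made-up two-atom cell of length 1 as an illustration only; no phases enter the calculation, only squared amplitudes.

```python
import numpy as np

def patterson_1d(positions, weights, h_max, n_grid):
    """P(u) = sum_h |F_h|^2 * exp(-2*pi*i*h*u) for a 1D cell of length 1."""
    h = np.arange(-h_max, h_max + 1)
    F = np.exp(2j * np.pi * np.outer(h, positions)) @ np.asarray(weights, dtype=float)
    u = np.arange(n_grid) / n_grid
    P = np.real(np.exp(-2j * np.pi * np.outer(u, h)) @ (np.abs(F) ** 2))
    return u, P

# Hypothetical atoms at x = 0.20 and x = 0.55; their separation is 0.35.
u, P = patterson_1d([0.20, 0.55], [6.0, 8.0], h_max=20, n_grid=200)

# Skip the origin peak (all self vectors) and search the unique half of the map.
mask = (u > 0.05) & (u <= 0.5)
cross_peak = u[mask][np.argmax(P[mask])]
```

The strongest non-origin peak appears at u = 0.35, the vector between the two atoms, and the map is centrosymmetric, so an equivalent peak sits at u = 0.65.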
Anomalous scattering is another method for phase determination. It takes advantage of the heavy atom's capacity to absorb x-rays of specified wavelength. The heavy
atom absorbs the x-rays appreciably, and there is a phase change for the x-rays scattered by that atom relative to the phase of the x-rays scattered by a nonabsorbing atom at
the same site. As a result, the intensities of the symmetry-related reflections (Friedel pairs) hkl and -h-k-l are not equal, which leads to a difference between the squared structure factor magnitudes
$|F_{hkl}|^2$ and $|F_{\bar{h}\bar{k}\bar{l}}|^2$. The phase of a reflection in the heavy-atom derivative data can then be
calculated, which in turn gives the phase of the corresponding reflection in the native data.
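The breakdown of Friedel's law can be sketched in one dimension by giving one atom a complex scattering factor. The positions and scattering factors are invented, and the imaginary component of 8 for the heavy atom is an arbitrary illustrative value rather than a tabulated one.

```python
import numpy as np

def structure_factor(h, positions, factors):
    """F_h = sum_j f_j * exp(2*pi*i*h*x_j) for 1D atoms with (possibly complex) f_j."""
    return np.sum(np.asarray(factors) * np.exp(2j * np.pi * h * np.asarray(positions)))

positions = [0.10, 0.37, 0.62]
normal = [6.0, 7.0, 30.0]            # all atoms scatter without absorption
anomalous = [6.0, 7.0, 30.0 + 8.0j]  # hypothetical absorbing heavy atom

def friedel_difference(h, factors):
    """|F(h)| - |F(-h)| for one Friedel pair."""
    return abs(structure_factor(h, positions, factors)) - \
        abs(structure_factor(-h, positions, factors))

d_normal = friedel_difference(3, normal)
d_anom = friedel_difference(3, anomalous)
```

With real scattering factors the two members of the pair have equal magnitude, whereas the absorbing atom makes them differ measurably.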
If heavy atom derivatives suitable for diffraction analysis cannot be obtained, the
method of molecular replacement can be used to determine the structure from a single
native data set. A model of a known molecule can be placed in the unit cell of the new
molecule and the phases from structure factors of the known molecule can be used as initial estimates of phases for the new one. If the phasing model and the new molecule are
isomorphous, then the phases from the model molecule can be used directly to compute
ρ(x, y, z) from native intensities of the new molecule:
$$\rho(x, y, z) = \frac{1}{V} \sum_{hkl} |F^{new}_{hkl}| \, e^{i\varphi^{model}(hkl)} \, e^{-2\pi i (hx + ky + lz)}$$
In this equation, |F^new| can be obtained from the native intensities of the new
molecule, and the phases φ^model are from the model molecule. The process of iterative
phase refinement can change the phases from those of the old model to those of the new
molecule, thus giving the new structure. Often the phasing model is not isomorphous with
the desired structure but is related to it by a rotation and a translation, as in:
$$X_2 = [C] X_1 + d$$
where X1 and X2 are the position vectors of the model molecule and the new molecule,
respectively, [C] is a rotation matrix and d is a vector defining the translation. A
search in three rotational degrees of freedom and three translational degrees of freedom
then needs to be performed on the phasing model to position it identically to the new molecule
in the unit cell. In the first step, a rotation function R is used to search for the relative orientation (rotation) of the molecule in the unit cell:
$$R = \int P(X_2) \, P(X_1) \, dX_1$$
where P(X1) and P(X2) are the Patterson functions for molecules 1 and 2. It has a maximum value when the two self-vector sets are equivalently oriented. Having determined the
orientation, the position of the molecule can be determined by maximizing the translation
function T:
$$T(\Delta) = \int_{|u| > 2r} P(u + \Delta) \, P([C]u - \Delta) \, du$$
where Δ is a translation vector which is independent of the origin of the rotation axis and
relates the centers of the two molecules (Brünger, A., 1992). The condition |u| > 2r is used to
remove the self-vector set.
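The rigid-body relation X2 = [C]X1 + d that underlies the rotation and translation searches can be sketched as follows. The coordinates, the 90° rotation about z and the translation vector are arbitrary illustrative values, and the function names are hypothetical.

```python
import numpy as np

def rotation_z(angle_deg):
    """Rotation matrix [C] for a rotation about the z axis."""
    t = np.radians(angle_deg)
    return np.array([[np.cos(t), -np.sin(t), 0.0],
                     [np.sin(t),  np.cos(t), 0.0],
                     [0.0,        0.0,       1.0]])

def place_model(x1, C, d):
    """Apply X2 = [C] X1 + d to an (N, 3) array of model coordinates."""
    return x1 @ C.T + d

# Three made-up atom positions of a search model.
x1 = np.array([[1.0, 0.0, 0.0],
               [0.0, 2.0, 0.0],
               [0.0, 0.0, 3.0]])
C = rotation_z(90.0)
d = np.array([5.0, 0.0, -1.0])

x2 = place_model(x1, C, d)

# A rigid-body move leaves interatomic distances unchanged.
dist_before = np.linalg.norm(x1[0] - x1[1])
dist_after = np.linalg.norm(x2[0] - x2[1])
```

Because the intramolecular vectors only rotate and do not change length, the rotation can be determined from the self-vector set before the translation is known, which is why the search is split into two three-dimensional steps.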
The structure factors of the properly positioned model can then be calculated and
the computed phases can be the initial estimates of the desired phases and can be refined
by the use of any available noncrystallographic symmetry or by density modification and
solvent flattening. When noncrystallographic symmetry is present, electron densities of the
noncrystallographically related units are averaged and back-transformed. The resulting
calculated phases are then applied to the observed structure factors to compute a new
improved electron density map. The part of the structure outside the molecular envelope is
usually flattened to represent solvent. The process is then repeated for many cycles until
convergence has been achieved.
Simple methods such as rigid body refinement can then be used to improve the
model by minimizing the crystallographic residual factor (R-factor), which is defined as
$$R = \frac{\sum_{hkl} \left| \, |F_{obs}| - |F_{calc}| \, \right|}{\sum_{hkl} |F_{obs}|}$$
where F_obs is derived from a measured reflection intensity and F_calc is the amplitude of the
corresponding structure factor calculated from the model. The entire molecule or a group
of atoms is treated as a rigid body and moved inside the asymmetric unit to obtain orientations and positions having a lower R-factor.
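The R-factor defined above can be computed directly from two lists of amplitudes. This is a minimal sketch with made-up amplitudes; it assumes that the calculated amplitudes are already on the same scale as the observed ones, so no scale factor is applied.

```python
import numpy as np

def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|), with no scale factor applied."""
    f_obs = np.asarray(f_obs, dtype=float)
    f_calc = np.asarray(f_calc, dtype=float)
    return float(np.sum(np.abs(f_obs - f_calc)) / np.sum(f_obs))

# Hypothetical observed and calculated structure factor amplitudes.
f_obs = [10.0, 20.0, 30.0, 40.0]
r_perfect = r_factor(f_obs, f_obs)
r_model = r_factor(f_obs, [12.0, 18.0, 33.0, 37.0])
```

Here a model whose amplitudes deviate by 2 to 3 units per reflection gives R = 0.10, and a perfect model gives R = 0.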
More complicated methods such as conjugate gradient minimization or simulated annealing
minimize the hybrid energy function:
$$E_{total} = E_{empirical} + E_{effective}$$
E_empirical describes the energy of the molecule through an empirical energy function,
which is a sum of energy terms describing bond stretching, bond bending, dihedral
angles, improper angles, hydrogen bonding, van der Waals and electrostatic interactions.
E_effective is an effective potential energy function that incorporates molecular dynamics
into the energy function. It describes the difference between the observed structure factor
amplitudes and those calculated from the atomic model.
The simulated annealing method employs molecular dynamics, which is an attempt to
simulate the movement of molecules by solving Newton's laws of motion for atoms moving within force fields that represent the effects of covalent and noncovalent bonding. The
model is allowed to move as if at high temperature, in hopes of lifting it out of local
energy minima. It is then cooled slowly to find the lowest-energy conformation at the temperature of diffraction data collection.
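The idea of heating a model out of a local minimum and then cooling it slowly can be illustrated with a toy one-dimensional example. Note that this sketch substitutes a Metropolis Monte Carlo scheme for the molecular dynamics described above, purely to keep the example short; the double-well energy function, the cooling schedule and all parameter values are assumptions made for illustration.

```python
import math
import random

def energy(x):
    """Toy double-well energy: local minimum near x = +1, lower minimum near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def anneal(x0, t_start=2.0, t_end=0.01, n_steps=5000, step=0.5, seed=0):
    """Metropolis simulated annealing with a geometric cooling schedule."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for i in range(n_steps):
        # Temperature decreases geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (i / (n_steps - 1))
        x_new = x + rng.uniform(-step, step)
        e_new = energy(x_new)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

# Start in the higher-energy well, where plain minimization would remain trapped.
best_x, best_e = anneal(x0=1.0)
```

At high temperature the barrier between the wells is crossed frequently, and as the temperature falls the walker is increasingly confined to whichever well it occupies, which is the behaviour the slow-cooling protocol relies on.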
Many cycles of model building and structural refinement are required to converge
the model with data. The primary measure of convergence is the R-factor. Values of R
range from 0, for perfect agreement of calculated and observed intensities, to about 0.6,
for a set of randomly calculated intensities. An R-factor greater than 0.5 implies a poor
model, for which structural refinement will not be useful. On the other hand, an R-factor less
than 0.2 implies a reliable model.
Chapter 2: Experimental
2.1 Materials
The DNA oligonucleotide GGGGTGTGGGG (G4TGTG4) and the heavy atom
derivatives GGGG-6-iodo-uracil-GTGGGG (G4IUGTG4) and GGGGTG-6-iodo-uracil-GGGG (G4TGIUG4) were synthesized by the solid phase phosphoramidite method on an
Applied Biosystems DNA synthesizer and purified by passing through a Sephadex column
followed by several cycles of ethanol precipitation and lyophilization. The DNA was then dissolved in 600 µl of distilled water and stored frozen.
Table 2.1 shows a 24-condition matrix for the crystallization of G4TGTG4. It was
prepared from the following stock solutions:
Buffers: Cacodylate buffer pH 7.0, cacodylate buffer pH 6.0, cacodylate buffer pH 5.5.
Polyamines: Spermine tetrahydrochloride, cobalt hexaammine chloride Co(NH3)6Cl3.
Monovalent ions: LiCl, NaCl, KCl.
Divalent ions: MgCl2, SrCl2, BaCl2.
Precipitant: 2-methyl-2,4-pentanediol (MPD).
Table 2.1: 24-condition matrix composition
Condition  pH   Polyamine            Monovalent ion           Divalent ion
1          7.0  12 mM spermine       80 mM KCl                see note
2          7.0  12 mM spermine       80 mM KCl                see note
3          7.0  12 mM spermine       80 mM NaCl               see note
4          7.0  12 mM spermine       80 mM NaCl               see note
5          7.0  12 mM spermine       80 mM NaCl, 12 mM KCl    see note
6          7.0  12 mM spermine       12 mM NaCl, 80 mM KCl    see note
7          6.0  12 mM spermine       80 mM KCl                see note
8          6.0  12 mM spermine       80 mM KCl                see note
9          6.0  12 mM spermine       80 mM KCl                see note
10         6.0  12 mM spermine       80 mM NaCl               see note
11         6.0  12 mM spermine       80 mM NaCl, 12 mM KCl    see note
12         6.0  12 mM spermine       12 mM NaCl, 80 mM KCl    see note
13         7.0  12 mM spermine       80 mM NaCl               20 mM BaCl2
14         7.0  12 mM spermine       80 mM KCl                20 mM BaCl2
15         6.0  12 mM spermine       80 mM NaCl               20 mM BaCl2
16         6.0  12 mM spermine       80 mM KCl                20 mM BaCl2
17         7.0  12 mM spermine       40 mM LiCl               80 mM SrCl2, 20 mM MgCl2
18         7.0  12 mM spermine       40 mM LiCl               80 mM SrCl2 (see note)
19         7.0  12 mM spermine       -                        80 mM SrCl2, 20 mM MgCl2
20         6.0  12 mM spermine       -                        80 mM SrCl2
21         5.5  20 mM Co(NH3)6Cl3    80 mM NaCl               20 mM MgCl2
22         5.5  20 mM Co(NH3)6Cl3    80 mM KCl                20 mM MgCl2
23         5.5  20 mM Co(NH3)6Cl3    12 mM NaCl, 80 mM KCl    -
24         5.5  20 mM Co(NH3)6Cl3    40 mM LiCl               20 mM MgCl2
All conditions contain 40 mM cacodylate buffer and 10% (v/v) of the precipitant MPD.
Note: the divalent ion column also lists 20 mM MgCl2 for two of conditions 1-4 and for four of conditions 5-12 and 18; the rows to which these entries belong cannot be assigned.
2.2 Crystal Growth
The DNA stock solutions were screened in the 24-condition screening matrix
using the vapor diffusion method. Hanging drops were used for G4TGTG4 and
G4TGIUG4. Sitting drops were used for G4IUGTG4. The reservoirs were MPD solutions
with concentrations ranging from 10% (v/v) to 30% (v/v). After identifying the condition
which produced the best quality crystal forms based on size and regularity, plates were prepared
with 24 reproductions of this particular condition. The best crystals in the reproductions
were selected for x-ray analysis. All crystallizations were carried out at room temperature.
Table 2.2: Best Conditions for Crystal Growth
DNA        Stock DNA used  Best condition  Volume of condition  Reservoir  Time of crystallization
G4TGTG4    1 µl            12              2 µl                 30% MPD    two weeks
G4IUGTG4   1 µl            22              2 µl                 20% MPD    two days
G4TGIUG4   1 µl            2               2 µl                 30% MPD    overnight
2.3 X-Ray Diffraction
Each crystal selected for diffraction study was mounted in a capillary tube and
placed on a goniometer for data collection.
X-rays from a monochromatic Cu source (1.54 Å) were used for diffraction of the
crystals. The diffraction patterns were recorded on a Rigaku RAXIS IIc imaging plate.
Some of the diffraction data were collected under a 277 K nitrogen cold stream. Others
were collected at room temperature. Three still images were taken first to determine the
unit cell and the information was then used in the collection of oscillation images. Figure
2.1 shows a sample oscillation image of G4IUGTG4.
Figure 2.1: Sample oscillation image of G4IUGTG4
Space group determination was carried out with the RAXIS software and the program Denzo. Diffraction intensities were integrated from image plate files using Denzo
and RAXIS. The following unit cell information was obtained after processing data from
the still images:
Table 2.3: Unit Cell Data
Crystal             Space group  Dimensions (a, b, c)      Resolution
G4TGTG4             C222         24.2 Å, 54.7 Å, 40.9 Å    2.3 Å
Os-soaked G4TGTG4   C2221        24.2 Å, 53.8 Å, 40.6 Å    2.3 Å
G4IUGTG4            C222         24.7 Å, 53.0 Å, 40.0 Å    2.5 Å
G4TGIUG4            P222         unindexable               2.5 Å
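As a small worked example, the native cell dimensions in Table 2.3 can be turned into a unit cell volume, and the resolution limit into a scattering angle. The sketch assumes an orthorhombic cell (all angles 90°, as implied by space group C222) and uses Bragg's law, λ = 2d sin θ, which is a standard relation not stated elsewhere in this chapter.

```python
import math

def orthorhombic_volume(a, b, c):
    """Unit cell volume in cubic angstroms for a cell with all angles equal to 90 degrees."""
    return a * b * c

def bragg_angle_deg(wavelength, d_spacing):
    """Bragg angle theta (degrees) from lambda = 2 d sin(theta)."""
    return math.degrees(math.asin(wavelength / (2.0 * d_spacing)))

# Native G4TGTG4 crystal: a, b, c from Table 2.3; Cu radiation of 1.54 angstroms.
volume = orthorhombic_volume(24.2, 54.7, 40.9)
theta = bragg_angle_deg(1.54, 2.3)
```

The cell volume is about 54,141 cubic angstroms, and reflections at the 2.3 Å limit are recorded at a Bragg angle between 19° and 20°.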
2.4 Molecular Replacement
The phasing model used for molecular replacement is the G-tetraplex crystal structure of a DNA hexanucleotide TGGGGT (Laughlan, G. et al., 1994). The crystal belongs
to space group P1, with cell dimensions a = 28.76 Å, b = 35.47 Å, c = 56.77 Å and α = 74.39°, β = 77.64°, γ = 89.73°. Each asymmetric unit contains four parallel-stranded tetraplexes (Figure 2.2).
Figure 2.2: Crystal structure of TGGGGT tetraplex. Four views of one of the tetraplexes
in the asymmetric unit.
Molecular replacement was carried out with the computer program X-PLOR, a
powerful package for x-ray crystallography (Brünger, A., 1992). The origin of the model
was shifted and each G-tetraplex layer was rotated 180° to adapt to the new symmetry of
the C222 space group. A cross rotation search was performed in Patterson space. The stationary Patterson map P2 was computed from observed intensities by fast Fourier transform. The Patterson map to be rotated, P1, was computed from the TGGGGT model. The
strongest Patterson vectors in P1 were used for the rotation search using Eulerian angles
(Rossmann & Blow, 1962), pseudo-orthogonal Eulerian angles (Lattman, 1985) or spherical polar angles. The values of the Patterson map P2 at the positions of the rotated Patterson vectors of map P1 were computed by linear eight-point interpolation. For each
sampled orientation Ω the cross rotation function
$$RF(\Omega) = \langle P_{obs} \, P_{model}(\Omega) \rangle$$
between the rotated vectors of P1 and the interpolated values of the Patterson map P2 was
computed. All sampled orientations were then sorted according to their RF values and a
simple peak search was carried out.
Patterson correlation (PC) refinement was then performed on the highest peaks of
the rotation function (Brünger, A., 1992). The target function for PC refinement is proportional to the negative correlation coefficient between the squared amplitudes of the
observed and the calculated normalized structure factors. The correct orientation was
identified by having the lowest value of the target function after refinement.
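The form of the PC refinement target can be sketched as a negative linear correlation coefficient between squared normalized amplitudes. This is only an illustration of the quantity described above, not the X-PLOR implementation; the function name and the sample amplitudes are hypothetical, and the proportionality constant is taken as 1.

```python
import numpy as np

def pc_target(e_obs, e_calc):
    """Negative correlation coefficient between |E_obs|^2 and |E_calc|^2."""
    a = np.abs(np.asarray(e_obs, dtype=float)) ** 2
    b = np.abs(np.asarray(e_calc, dtype=float)) ** 2
    a = a - a.mean()
    b = b - b.mean()
    return float(-np.sum(a * b) / np.sqrt(np.sum(a * a) * np.sum(b * b)))

# Made-up normalized amplitudes for three reflections.
e_obs = [1.0, 2.0, 3.0]
target_good = pc_target(e_obs, [1.0, 2.0, 3.0])  # model reproduces the data
target_bad = pc_target(e_obs, [3.0, 2.0, 1.0])   # model is anticorrelated with the data
```

A well-oriented model drives the target towards -1, while a poorly oriented one gives a higher value, which is why the correct orientation is the one with the lowest target after refinement.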
The translation search was subsequently carried out on orientations with high RF
values by computing the target function
$$E_{XRAY} / W_A$$
The search routine computes the structure factors Fcalc of the translated primary molecule
and the symmetry related molecules by applying appropriate phase shift operators in
reciprocal space to the calculated structure factors of the original molecule and its symmetry mates, which are defined by the space group operators.
Rigid body refinement was then carried out on the translated model, followed by
simulated annealing. In the preparative stage of simulated annealing refinement, 40 cycles
of minimization were performed to relieve strain or bad contacts in the structure. A slow-cooling protocol (Brünger, A., 1990) was then used. The Newtonian equations of motion
were solved numerically by the Verlet algorithm. The initial velocities were assigned from a
Maxwellian distribution at the appropriate temperature. Velocity scaling, Langevin
dynamics and T coupling were used to control the temperature during the molecular-dynamics simulation. The following effective energy function was used:
$$E_{effective} = E_{XRAY} + E_{NOE} + E_{HARM} + E_{CDIH} + E_{NCS} + E_{DG} + E_{REL}$$
which consists of restraining energy terms that use experimental information. Descriptions
of these energy terms can be found in the X-PLOR manual.
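For readers unfamiliar with the Verlet algorithm mentioned above, the following sketch integrates a one-dimensional harmonic oscillator. The oscillator, the time step and the function name are assumptions chosen for illustration. In the actual refinement the same update is applied to every atom, with forces derived from the effective energy function.

```python
def verlet_trajectory(x0, v0, acceleration, dt, n_steps):
    """Position Verlet: x(t+dt) = 2 x(t) - x(t-dt) + a(t) dt^2."""
    # The position one step before the start is estimated from the initial velocity.
    x_prev = x0 - v0 * dt + 0.5 * acceleration(x0) * dt * dt
    x = x0
    for _ in range(n_steps):
        x_next = 2.0 * x - x_prev + acceleration(x) * dt * dt
        x_prev, x = x, x_next
    return x

# Unit mass and unit force constant: a(x) = -x, period 2*pi.
# Integrating for roughly one period should bring the particle back near x = 1.
final_x = verlet_trajectory(x0=1.0, v0=0.0, acceleration=lambda x: -x, dt=0.001, n_steps=6283)
```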
2.5 Isomorphous Replacement
The native crystals of G4TGTG4 were soaked in various heavy atom solutions. The
first soaking solution contained 10 mM platinum (ethylenediamine) dichloride and
50% (v/v) MPD. After soaking for 15 hours, the crystal was mounted in a capillary tube
and analyzed by X-ray diffraction. Analysis of the still images showed that the heavy atom had
not entered the unit cell. Another native crystal was soaked in 10 mM methyl mercuric chloride and 50% (v/v) MPD for 51 hours, but the diffraction data were of low
quality and could not be indexed. Therefore another native crystal was soaked in a solution
containing 10 mM mercuric chloride (HgCl2) and 50% (v/v) MPD. Diffraction data were
collected after 22 hours of soaking, but were again of poor quality. Lastly, osmium was
used as the heavy metal. With the heavy atom solution pipetted directly into the hanging drop,
good diffraction images were obtained after 1.0 hour of soaking. Diffraction data on two
soaked crystals were then collected and processed to produce difference Patterson maps.
The R-factor of one data set was lower than 0.15 (Table 2.4), indicating the
absence of heavy atom in the unit cell. Although the R-factor of the second data set was larger
than 0.15, which indicates that the derivative data differed from those of the native, the
Harker sections (Table 2.5) in the Patterson map contained too many small peaks and were
uninterpretable.
Table 2.4: Merge and Scale Data of the Osmium Derivative

Resolution range   Number of independ.   R merge§   RMS dev. from   R factor
                   reflections                      linearity
20 Å - 2.3 Å       1080                  4.50%      --              0.1201
20 Å - 2.3 Å       1246                  4.31%      0.072           0.2887

§: R merge is the agreement R-factor between symmetry-related observations.
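The 0.15 criterion applied to the R factor in Table 2.4 can be written as a small check. The thesis does not state the formula used for this R factor, so the conventional isomorphous-difference definition, sum(|F_deriv - F_native|) / sum(F_native), is assumed here, and the function names and sample amplitudes are hypothetical.

```python
def isomorphous_r_factor(f_native, f_derivative):
    """Assumed definition: sum of absolute amplitude differences over sum of native amplitudes."""
    numerator = sum(abs(fd - fn) for fn, fd in zip(f_native, f_derivative))
    return numerator / sum(f_native)

def suggests_heavy_atom(r_factor, threshold=0.15):
    """Apply the 0.15 cutoff used in the text."""
    return r_factor > threshold

# Toy amplitudes on a common scale.
r_toy = isomorphous_r_factor([100.0, 200.0, 300.0], [110.0, 180.0, 330.0])

# The two osmium data sets of Table 2.4.
first_crystal = suggests_heavy_atom(0.1201)
second_crystal = suggests_heavy_atom(0.2887)
```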
Table 2.5: Harker Vectors for Space Group C222₁

C222₁            X, Y, Z          X, -Y, -Z        -X, -Y, 1/2+Z    -X, Y, 1/2-Z
X, Y, Z          0                0, 2Y, 2Z        2X, 2Y, 1/2      2X, 0, 1/2+2Z
X, -Y, -Z        0, 2Y, 2Z        0                2X, 0, 1/2+2Z    2X, 2Y, 1/2
-X, -Y, 1/2+Z    2X, 2Y, 1/2      2X, 0, 1/2+2Z    0                0, 2Y, 2Z
-X, Y, 1/2-Z     2X, 0, 1/2+2Z    2X, 2Y, 1/2      0, 2Y, 2Z        0
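The entries of Table 2.5 are differences between symmetry-related positions, and the first row can be reproduced with a short sketch. The representation of each operator as a sign and a shift per axis, and the function names, are assumptions made for this example; the operators themselves are the four general positions heading the table.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
ZERO = Fraction(0)

# Each operator maps (X, Y, Z) to sign * coordinate + shift on every axis.
OPERATORS = [
    ((1, 1, 1), (ZERO, ZERO, ZERO)),      # X, Y, Z
    ((1, -1, -1), (ZERO, ZERO, ZERO)),    # X, -Y, -Z
    ((-1, -1, 1), (ZERO, ZERO, HALF)),    # -X, -Y, 1/2+Z
    ((-1, 1, -1), (ZERO, ZERO, HALF)),    # -X, Y, 1/2-Z
]

def format_component(coefficient, constant, axis):
    """Render one component such as '2X', '1/2' or '1/2+2Z'."""
    term = f"{abs(coefficient)}{axis}" if coefficient else ""
    if constant and term:
        return f"{constant}{'-' if coefficient < 0 else '+'}{term}"
    if term:
        return ("-" if coefficient < 0 else "") + term
    return str(constant)

def harker_vector(op_a, op_b):
    """Difference (position under op_a) - (position under op_b), shifts taken modulo 1."""
    (signs_a, shift_a), (signs_b, shift_b) = op_a, op_b
    parts = []
    for axis, sa, sb, ta, tb in zip("XYZ", signs_a, signs_b, shift_a, shift_b):
        parts.append(format_component(sa - sb, (ta - tb) % 1, axis))
    return ", ".join(parts)

first_row = [harker_vector(OPERATORS[0], other) for other in OPERATORS[1:]]
```

The sketch prints signs explicitly, so entries in the lower rows may differ from the table by a sign. Such vectors are equivalent under the symmetry of the Patterson function.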
Therefore, instead of further heavy atom soaking, attempts were made to crystallize iodinated derivatives of the
native sequence, G4(IU)GTG4 and G4TG(IU)G4. Crystals were obtained for both derivatives, but only those of one derivative, G4(IU)GTG4,
gave good diffraction. A total of 52 oscillation images were collected to a resolution of 2.5 Å, with
a 4° oscillation angle and a crystal-to-plate distance of 120 mm. The diffraction
intensities were indexed, refined and scaled with the program Denzo.
Chapter 3: Results and Discussion
3.1 Diffraction Analysis
The diffraction pattern of the native crystal G4TGTG4 was analyzed. The lattice
type was orthorhombic with unit cell dimensions of a = 24.2 Å, b = 54.7 Å, c = 40.9 Å. The
space group was determined to be C222 by the indexing program. A unit cell with the
above dimensions has a volume of 5.41 x 10^4 Å^3. Assuming half of the crystal volume is
occupied by solvent, the volume of the DNA is approximately 2.7 x 10^4 Å^3. In the
space group C222, there are eight asymmetric units per unit cell, so the DNA volume in one
asymmetric unit is 1/8 of 2.7 x 10^4 Å^3, or 3.4 x 10^3 Å^3. Since the specific volume of DNA
is 0.50 cm^3/g (Cantor, 1980), the molecular mass in one asymmetric unit can be obtained
by dividing this volume by the specific volume and multiplying by Avogadro's number. In this case, it is 4.1
kD. The molecular mass of G4TGTG4 as calculated from its formula is 3.6 kD, which is
close to the mass in one asymmetric unit. Therefore, it is likely that there is only one
G4TGTG4 molecule per asymmetric unit.
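The estimate above can be reproduced with a few lines of Python. The function name and argument names are hypothetical, and Avogadro's number (6.022 x 10^23 per mole) is supplied here because the conversion from grams to daltons requires it. All other values are taken from the text.

```python
AVOGADRO = 6.022e23          # molecules per mole
CM3_PER_CUBIC_ANGSTROM = 1e-24

def mass_per_asymmetric_unit_kd(a, b, c, solvent_fraction, n_asymmetric_units,
                                specific_volume_cm3_per_g):
    """Estimate the macromolecular mass (kD) in one asymmetric unit of an orthorhombic cell."""
    cell_volume = a * b * c                                   # cubic angstroms
    dna_volume = cell_volume * (1.0 - solvent_fraction) / n_asymmetric_units
    grams = dna_volume * CM3_PER_CUBIC_ANGSTROM / specific_volume_cm3_per_g
    return grams * AVOGADRO / 1000.0

mass_kd = mass_per_asymmetric_unit_kd(24.2, 54.7, 40.9, solvent_fraction=0.5,
                                      n_asymmetric_units=8,
                                      specific_volume_cm3_per_g=0.50)
copies = round(mass_kd / 3.6)   # 3.6 kD is the formula mass of G4TGTG4 quoted in the text
```

With unrounded intermediate values the estimate is slightly below the 4.1 kD quoted in the text, and the conclusion of one molecule per asymmetric unit is unchanged.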
3.2 Molecular replacement
PC refinement of the orientations selected by the rotation function was carried out.
However, the refined PC coefficients failed to show any major peaks that could indicate
promising orientations (Figure 3.1). Therefore, a few orientations with the highest rotation
function values were selected for positioning by the translation function, followed by
rigid-body minimization. The output structure had a high R-factor (53%). Consequently,
simulated annealing was used to refine the structure. However, it failed to improve the R-factor significantly: the lowest R-factor obtained after simulated annealing was 47%.
[Plot: PC-refinement coefficient (vertical axis, 0.0 to 0.4) versus RF peak index (horizontal axis, 0 to 100).]
Figure 3.1: PC refinement. No orientation with significantly higher PC coefficient was
found.
3.3 Isomorphous replacement
Table 3.1 shows that the diffraction intensities of G4(IU)GTG4 have an overall R-factor of
0.107. A Patterson map was subsequently produced to locate the heavy atom. Unfortunately the map was uninterpretable due to the absence of any major peaks that could be
attributed to the iodine atom.
Table 3.1: Summary of Diffraction Intensities of G4(IU)GTG4

Lower            Upper            Average     Average   Norm χ²   R-factor
resolution (Å)   resolution (Å)   intensity   error
25.00            4.39             8022.6      476.5     1.992     0.078
4.39             3.49             5617.5      386.1     1.233     0.098
3.49             3.05             3727.8      319.8     1.327     0.120
3.05             2.77             1537.8      182.4     0.710     0.148
2.77             2.57             1176.5      161.9     0.733     0.160
2.57             2.42              602.7      134.8     0.535     0.212
2.42             2.30              386.2      125.2     0.505     0.262
2.30             2.20              247.4      114.3     0.372     0.332
All reflections                   2961.3      252.3     0.997     0.107
3.4 Anomalous scattering
The single wavelength anomalous scattering method (Wang, 1985) was used in
another attempt to locate the iodine atom. The I+ and I- reflections were processed separately and compared. As Table 3.2 shows, the normalized χ² value for all reflections is less than 1,
which indicates that the differences are smaller than the estimated errors, so that no useful anomalous signal can
be detected. One more effort to locate the anomalous scatterer used difference Patterson
analysis (Rossmann, 1961). Anomalous Patterson maps were plotted from the data (Figure
3.2). However, they failed to show any heavy atom peaks.
Table 3.2: Summary of Anomalous Signal Detection

Lower            Upper            Average     Average   Norm χ²   R-factor
resolution (Å)   resolution (Å)   intensity   error
99.00            4.40             8370.5      2065.9    0.229     0.054
4.40             3.49             5367.8      1552.3    0.171     0.060
3.49             3.05             3073.8      1074.1    0.094     0.069
3.05             2.77             1487.2       725.2    0.080     0.106
2.77             2.57             1255.8       706.5    0.079     0.106
2.57             2.42              645.7       569.5    0.048     0.115
2.42             2.30              421.9       494.3    0.046     0.206
2.30             2.20              224.6       436.1    0.038     0.186
All reflections                   2991.2      1033.9    0.105     0.069
[Contour plots: sections of the anomalous Patterson map.]
Figure 3.2: Anomalous Patterson maps of the G4(IU)GTG4 crystal.
3.5 The structure of G4TGTG4
Although the various crystallographic attempts have failed to produce a refined
structure of G4TGTG4, other techniques such as gel electrophoresis, circular dichroism (CD) and, recently, one- and two-dimensional nuclear magnetic resonance spectroscopy
(1D and 2D NMR) have been used to investigate the nature of its chain folding, the stacking interactions of the G-tetraplexes in the stem, and the interactions of the bases in the
loops (Catasti, et al., 1996).
Non-denaturing 20% polyacrylamide gel studies on G4TGTG4 as well as two other
oligonucleotide sequences capable of forming G-tetraplexes, G4ACAG4 and
G4TGTG4ACAG4TGTG4, show mobilities consistent with G-quartet folding (Figure 3.3).
[Gel image: lanes 1 to 10 and marker lane M, with size markers at 20 bp and 30 bp.]
Figure 3.3: Non-denaturing polyacrylamide gel of different ILPR fragments at different ionic strengths. Lanes 1 to 3 contain the (G4ACAG4)2 fragment at 50,
150 and 250 mM NaCl, respectively. Similarly, lanes 4 to 6 contain (G4TGTG4)2, and lanes 7 to 9 contain
G4TGTG4ACAG4TGTG4, at 50, 150 and 250 mM NaCl, respectively. Lane 10 contains (G4)4
as a control.
The circular dichroism spectrum of G4TGTG4 shows one positive band at 295 nm, but
the band at 262 nm is absent, in contrast to the CD spectra of G4ACAG4 and
G4TGTG4ACAG4TGTG4 (Figure 3.4). This could be explained by a difference in the
topology of these structures (Figure 3.5). The band at 262 nm could indicate the CD
effects of the 5'-Gsyn-Ganti-Gsyn-Ganti-3' arm orientations, while the band at 295 nm indicates the CD effect of the ACA or TGT loop, since it is absent in the spectrum of G4.
[Plot: CD spectra.]
Figure 3.4: CD spectra at room temperature for (G4)4, (G4TGTG4)2, (G4ACAG4)2 and
G4TGTG4ACAG4TGTG4.
[Schematic: folding topologies of [d(G4)]4, [d(G4ACAG4)]2, [d(G4TGTG4)]2 and [d(G4TGTG4ACAG4TGTG4)], annotated with the CD bands at 262 nm and 295 nm.]
Figure 3.5: Topological models for the interpretation of CD data.
An imino proton exchange experiment monitored by one-dimensional 1H NMR shows
the accessibilities of the various guanine imino protons in the sequence G4TGTG4. The presence of proton signals at G2, G3, G9 and G10 after two days of incubation with 2H2O
indicates that these guanines are inaccessible to solvent and buried inside the G-quartet.
Two-dimensional nuclear Overhauser effect (NOE) spectroscopy of exchangeable
and nonexchangeable protons supports the hairpin G-quartet model shown in Figure 3.6. In this model, the glycosyl torsions of the GGGG residues alternate as 5'-Gsyn-Ganti-Gsyn-Ganti-3',
while the sugar puckers for all four residues are C2'-endo. The (T5-G6-T7)
loop connects G4anti and G8syn along the wide edge of the (G4anti-G8syn-G11anti-G1syn) quartet. The two hairpins are anti-parallel to each other. Intra-nucleotide NOEs suggest
that T5, G6, and T7 adopt the (C2'-endo, anti) conformation, with G6 shifted more toward
(C2'-endo, high anti). The presence of inter-hairpin NOEs such as H1'(T7B)-H8(G11A),
N1H(G1A)-H1'(T7B), N1H(G1A)-H4'(T7B), etc., is consistent only with the anti-parallel
arrangement of the two symmetric hairpins (A and B).
Figure 3.6: Map of the G-quartet folding schematics.
Four layers of G-quartets were built according to the topological chain-folding pattern shown in Figure 3.6. Restrained molecular dynamics and energy minimization on this
initial structure gave an average minimized structure of G4TGTG4, as shown in Figure 3.7.
The internal G-quartets are quite planar, whereas G4, in the external layer, is tilted out of
plane. The structure is also stabilized by intra- and inter-strand interactions at the Ganti-p-Gsyn steps. In each 5'-G-G-G-G-3' arm, the glycosyl torsions of the G residues alternate as
5'-Gsyn-Ganti-Gsyn-Ganti-3',
while the sugar puckers for all four G residues are C2'-endo.
The T5-G6-T7 loop connects G4anti and G8syn along the wide edge of the G4anti-G8syn-G11anti-G1syn tetrad. The two hairpins are anti-parallel to each other. T7 in the loop is
stacked with G11 on the opposite strand, and G6 is stabilized by a strong interaction with
G1 on the opposite strand. Important non-bonding interactions of G6 and T7 with
G4 offer additional loop stability. T5 is not stacked and is locked in the narrow edge
between G8 and G4, being stabilized mostly by electrostatic interactions with the backbone. Figure 3.7(A) shows the strong Ganti-p-Gsyn stacking between the two internal layers
of quartets. Figure 3.7(B) shows the strong vertical stacking of G6 in the loop with G1 on
the opposite strand, and of T7 with G11 on the opposite strand. Important NOEs were
observed between N1H-G6 and H8-G1, and between T7(H1') and G11(H8), that justify
the observed stacking. T5 does not show any substantial interaction with any of the bases,
either in the loop or in the stem. Figure 3.7(A) also shows that the two symmetric T5-G6-T7 loops
are disposed on opposite sides of the G-quartet. Each T5-G6-T7 loop connects G4 and
G8 along the edge of the G4-G8-G11-G1 tetrad.
Figure 3.7: (A) The structure of the hairpin G-quartet of (G4TGTG4)2 after 2500 conjugate gradient steps of minimization. (B) Close-up view of the local structure of the (T5-G6-T7) loop.
3.6 Correlation between G-quartet structure and transcriptional activity
It is known that different ILPR sequences have different transcriptional activities. To
explain the difference, one can look at the differences in their abilities to form the hairpin G-quartet structures. Figure 3.8 shows the enhancer, ILPR, promoter, and the transcription start site of the human insulin gene. The transcriptional activities of the ILPR
consensus sequence and a few mutations are also shown. A single G→A mutation from G4TGTG4 to
G4TGTGAGG lowers the transcriptional activity to less than 50% of that of the consensus
sequence. A G→C mutation together with a G→A mutation lowers the activity to only
1% of that of the consensus sequence. These mutations also destabilize the cyclic hydrogen bonding and
stacking in the hairpin G-quartet structure. A single mutation in the hairpin loop (G→C)
gives only about 1/3 of the original transcriptional activity and, at the same time, is
known to disrupt the loop-loop and loop-tetrad interactions. These observations can be seen
as evidence for a positive connection between the hairpin G-quartet structure of the ILPR
and the transcriptional activity of the insulin gene.
Further support for the correlation between hairpin G-quartet formation and transcriptional regulation comes from the existence of telomere-like G/C-rich regions in
the genes of insulin-like growth factors and their receptor (Allander, et al., 1994) and in the
human mucin MUC-1 gene (Hareuveni, et al., 1990). Although they are not tandemly
repeated as in the ILPR of the insulin gene, these sequences are similar to the telomere
sequence and the ILPR sequence. Thus they are capable of forming G-quartets. Other
folded structures such as triple helix, cruciform and H-DNA are also capable of transcriptional regulation if formed upstream of a gene.
[Schematic: enhancer, ILPR, promoter and insulin gene. The consensus ILPR sequence ACA GGGG TGT GGGG is shown with its single and double mutations. Transcriptional activities listed: 100% (consensus), 41%, 8%, 32%, 44%, 1%.]
Figure 3.8: Schematic representation of the human insulin gene located on the short arm
of chromosome 11. The consensus sequence, single and double mutations are shown
together with their transcriptional activities.
References
Allander, S.V., Larsson, C., Ehrenborg, E., Suwanichkul, A., Weber, G., Morris, S.L.,
Bajalica, S., Kiefer, M.C., Luthman, H. & Powell, D.R. Characterization of the chromosomal gene and promoter for human insulin-like growth factor binding protein-5. J. Biol.
Chem. 269, 10891-10898 (1994).
Berger, I., Kang, C., Sinha, N., Wolters, M. and Rich, A. A highly efficient 24-condition
matrix for the crystallization of nucleic acid fragments. Acta Cryst. D 52, 465-468 (1996).
Brünger, A. T. and Krukowski, A. Slow-cooling protocols for crystallographic refinement by simulated annealing. Acta Cryst. A 46, 585-593 (1990).
Brünger, A. T. X-PLOR Version 3.0: A system for crystallography and NMR (1992).
Cantor, C. and Schimmel, P. Biophysical chemistry. W. H. Freeman Co. (1980)
Catasti, P., Chen, X., Moyzis, R.K., Bradbury, E.M. and Gupta G. Structure-function correlations of the insulin-linked polymorphic region. J. Mol. Bio. 264, 534-545 (1996).
Ducruix, A. and Giege, R. Crystallization of Nucleic Acids and Proteins, A Practical
Approach. IRL Press (1992)
Hammond-Kosack, M.C.U., Docherty, K. A consensus repeat sequence from the human
insulin gene linked polymorphic region adopts multiple quadriplex DNA structures in
vitro. FEBS Letts. 301, 79-82 (1992)
Hammond-Kosack, M.C.U., Kilpatrick, M.W. & Docherty, K. The human insulin gene-linked polymorphic region adopts a G-quartet structure in chromatin assembled in vitro. J.
Mol. Endocrin. 10, 121-126 (1993)
Hareuveni, M., Tsarfaty, I., Zaretsky, J., Kotkes, P., Horev, J., Zrihan, S., Weiss, M.,
Green, S., Lathe, R., Keydar, I., Wreschner, D.H. A transcribed gene, containing a variable
number of tandem repeats, codes for a human epithelial tumor antigen. Euro. J. Biochem.
189, 475-486 (1990).
Kang, C., Zhang, X., Ratliff, R., Moyzis, R. and Rich, A. Crystal structure of four-stranded Oxytricha telomeric DNA. Nature 356, 126-131 (1992).
Kennedy, G.C., German, M.S. and Rutter, W.J., The minisatellite in the diabetes susceptibility locus IDDM2 regulates insulin transcription. Nature Genetics 9, 293-298 (1995).
Laughlan, G., Murchie, A. I. H., Norman, D. G., Moore, M. H., Moody, P. C. E., Lilley, D.
M. J. and Luisi, B. The high resolution crystal structure of a parallel-stranded guanine tetraplex. Science 265, 520-524 (1994).
Lattman, E.E., Use of the rotation and translation functions. Methods Enzymol. 115, 55-77
(1985).
Rossmann, M.G. Acta Cryst. A 14, 383 (1961).
Rossmann, M.G., & Blow, D.M., Acta Cryst. A 15, 25-31 (1962).
Sen, D. & Gilbert, W. A sodium-potassium switch in the formation of four-stranded G4-DNA. Nature 344, 410-414 (1990).
Smith, F. W. and Feigon, J. Quadruplex structure of Oxytricha telomeric DNA oligonucleotides. Nature 356, 164-168 (1992).
Tisch, R. and McDevitt, H., Insulin-dependent diabetes mellitus. Cell 85, 291-297 (1996).
Vyse, T. and Todd, J., Genetic analysis of autoimmune disease, Cell 85, 311-318 (1996).
Wang, B.-C. Resolution of phase ambiguity in macromolecular crystallography. Methods
Enzymol. 115, 90-112 (1985).
Wang, Y. & Patel, D.J., Solution Structure of the human telomeric repeat
d[AG3(T2AG3)3] G-tetraplex, Structure 1, 263-282 (1993).
Appendix A
X-PLOR Cell Symmetry File
remarks unit cell parameters ING C222
a=24.24 b=54.67 c=40.90 alpha=90.0 beta=90.00 gamma=90.0
{* spacegroup=C222 NO. * }
symmetry=( X,Y,Z)
symmetry=( -X,-Y,Z)
symmetry=( X,-Y,-Z)
symmetry=( -X,Y,-Z)
symmetry=( 1/2+X,1/2+Y,Z)
symmetry=( 1/2-X,1/2-Y,Z)
symmetry=( 1/2+X,1/2-Y,-Z)
symmetry=( 1/2-X,1/2+Y,-Z)
Appendix B
X-PLOR Data File
remarks file ingprepare.inp
remarks preparation of various data

structure @generate.psf end            {* read structure file *}

parameter
   @param11.dna                        {* read empirical potential parameter file *}

   {* append parameters for waters *}
   BOND HT OT 450.0 0.9572
   ANGLE HT OT HT 55.0 104.52

   {* for solute-water interactions *}
   NONBONDED OT 0.1591 2.8509 0.1591 2.8509
   NONBONDED HT 0.0498 1.4254 0.0498 1.4254

   {* for water-water interactions *}
   nbfix ot ot 581980.4948 595.0436396 581980.4948 595.0436396
   nbfix ht ht 3.085665E-06 7.533363E-04 3.085665E-06 7.533363E-04
   nbfix ht ot 327.8404792 10.47230620 327.8404792 10.47230620

   {* append parameters for pt *}
   nonbonded PT 0.1000 2.0000 0.1000 2.0000    ! platinum

   nbonds                              {* this statement specifies the nonbonded interaction *}
      atom cdie shift eps=1.0 e14fac=0.4     {* energy options. Note the reduced nonbonding *}
      cutnb=7.5 ctonnb=6.0 ctofnb=6.5        {* cutoff to save some CPU time *}
      nbxmod=5 vswitch
   end
end

flags                                  {* in addition to the empirical potential energy terms *}
   include pele pvdw xref              {* which are turned on initially, this statement turns *}
   ?                                   {* on the crystallographic residual term and packing term. *}
end

xrefine                                {* this invokes the crystallographic data parser *}
   @ing.cel                            {* unitcell and symmetry operators for space group P22121 *}
                                       {* notation is as in Int. Tables *}
   @scatter.sct                        {* scattering tables: an approximation is used. Atoms are selected based on *}
                                       {* chemical atom type. Note the use of wildcards in the selection *}
   nref=15000                          {* this will allocate space for the reflection list; specify a *}
                                       {* number >= the actual no. of reflections *}
   {* fwindow=5.0 10000.0   this will select reflections based on the size of Fobs *}
   reflection @ing.fob end             {* here we read in the diffraction data, *}
                                       {* a typical line in the file may look like this: *}
                                       {* FOBS= -32 1 5.958 WEIG= 1.0 PHASe=46. FOM=0.4 *}
                                       {* everything is free-field, if you don't specify something *}
                                       {* it'll be set to a reasonable default value *}
   method=FFT                          {* use the FFT method instead of direct summation *}
   fft
      memory=1000000                   {* this tells the FFT routine how much physical memory *}
   end                                 {* is available, the number refers to DOUBLE COMPLEX *}
                                       {* words, the memory is allocated from the HEAP *}
   ?                                   {* this prints the current status *}
end                                    {* this terminates the diffraction data parser *}
Appendix C
X-PLOR Input File for Rotation Search
remarks file xtalmr/rotation.inp -- cross rotation function (model P1 vs crystal)

{===>} structure @generate.psf end                   { read structure file }
{===>} coor @generate.pdb                            { read coordinates }

{ specify location of Patterson map files }
{===>} evaluate ( $p1_map="p1_map.dat" )
{===>} evaluate ( $p2_map="p2_map.dat" )

{===>} evaluate ( $max_vector=20. )                  { maximum Patterson vector to be searched }
evaluate ( $m_max_vector=-$max_vector )

xrefin                                               { make Patterson P1 map of model in P1 box }
{===>}                                               { the P1 box has to be larger than twice the }
   a=80.0 b=120.0 c=100.0 alpha=90.0 beta=90.0 gamma=90.0   { extent of the model in each direction }
   symmetry=(x,y,z)

   SCATter ( chemical C* )
      2.31000 20.8439 1.02000 10.2075 1.58860 .568700 .865000 51.6512 .215600
   SCATter ( chemical N* )
      12.2126 .005700 3.13220 9.89330 2.01250 28.9975 1.16630 .582600 -11.529
   SCATter ( chemical O* )
      3.04850 13.2771 2.28680 5.70110 1.54630 .323900 .867000 32.9089 .250800
   SCATter ( chemical S* )
      6.90530 1.46790 5.20340 22.2151 1.43790 .253600 1.58630 56.1720 .866900
   SCATter ( chemical P* )
      6.43450 1.90670 4.17910 27.1570 1.78000 0.52600 1.49080 68.1645 1.11490
   SCATter ( chemical FE* )
      11.1764 4.61470 7.38630 0.30050 3.39480 11.6729 0.07240 38.5566 0.97070

{===>}
   nreflections=200000                               { allocate sufficient space for the reflections of the P1 box }
{===>}
   resolution 8.0 3.0                                { resolution range for P1 box }
   generate                                          { generate reflections for P1 box }
   method=fft
   fft
      grid=0.25                                      { sampling grid for FFT and Patterson map (1/4 high resol.) }
   end
   update                                            { compute Fcalcs for model in P1 box }
   do amplitude ( fcalc=fcalc^2 )                    { compute |Fcalc|^2 and store in Fcalc }
   do phase ( fcalc=0.0 )
   map                                               { compute Patterson map P1 (which will be rotated) }
      extend=box                                     { we write a hemisphere of Patterson vectors with }
      xmin=$m_max_vector xmax=$max_vector            { lengths less than $max_vector. }
      ymin=$m_max_vector ymax=$max_vector
      zmin=0.0 zmax=$max_vector
      automatic=true                                 { use automatic scaling of map }
      formatted=false                                { write an unformatted map file }
      output=$p1_map
   end
end

xrefin                                               { make Patterson map P2 of crystal }
{===>}
   a=24.24 b=54.67 c=40.90 alpha=90. beta=90. gamma=90.   { unit cell for crystal }
{===>}
   symmetry=(x,y,z)                                  { operators for Patterson symmetry of crystal P22121 }
   symmetry=(-x,-y,z)
   symmetry=(x,-y,-z)
   symmetry=(-x,y,-z)
{===>}
   nreflections=300000
   reflection @ing.fob end                           { read reflections }
{===>}
   resolution 8.0 3.0                                { resolution range }
   reduce
   do amplitude ( fobs = fobs * heavy(fobs - 2.0*sigma) )   { sigma cutoff }
   fwind=0.1=100000
   method=fft
   fft
      grid=0.25                                      { sampling grid for Patterson maps (1/4 high resol.) }
   end
   do amplitude ( fcalc=fobs^2 )                     { compute |Fobs|^2 and store in Fcalc }
   do phase ( fcalc=0.0 )
   map                                               { compute Patterson map P2 }
      extend=unit
      automatic=true                                 { use automatic scaling of map }
      formatted=false
      output=$p2_map
   end
end

xrefin
   nrefl=10                                          { release some memory }
   search rotation
      p1input=$p1_map
      p2input=$p2_map
      formatted=false
{===>}
      range=5.0 $max_vector                          { Patterson vector selection for map P1 }
      threshold=0.0
      npeaks=15000                                   { use 15000 largest vectors of map P1 }
{===>}
      tmmin=0.0 tmmax=180.                           { Lattman angle grid. Specify asymmetric }
      t2min=0.0 t2max=90.                            { unit for rotation function here. See }
      tpmin=0.0 tpmax=720.                           { Rao, S.N. et al. (1980). Acta Cryst. A36, 878--884. }
      delta=2.5                                      { Roughly, delta should be less than ArcSin[ high resol / (3*$max_vector)]. }
      list=rotation1.rf                              { output file for cluster analysis }
      nlist=6000                                     { analyse highest 6000 peaks of rotation function }
      epsilon=0.25                                   { matrix norm for cluster analysis }
   end
end
stop
Appendix D
X-PLOR Input File for PC-Refinement
remarks file xtalmr/filter.inp -- pc-refinement of rotation function peaks

{===>} parameter @param11.dna end                    { read parameters }
{===>} structure @generate.psf end                   { read structure file }
{===>} coor @generate.pdb                            { read coordinates }

evaluate ($wa=10000.)            { this is the weight for the XREF energy term }
                                 { in this case it is arbitrary since we're not }
                                 { combining it with other energy term }
xrefin
{===>}
   @ing.cel                                          { unit cell for crystal }

   SCATter ( chemical C* )
      2.31000 20.8439 1.02000 10.2075 1.58860 .568700 .865000 51.6512 .215600
   SCATter ( chemical N* )
      12.2126 .005700 3.13220 9.89330 2.01250 28.9975 1.16630 .582600 -11.529
   SCATter ( chemical O* )
      3.04850 13.2771 2.28680 5.70110 1.54630 .323900 .867000 32.9089 .250800
   SCATter ( chemical S* )
      6.90530 1.46790 5.20340 22.2151 1.43790 .253600 1.58630 56.1720 .866900
   SCATter ( chemical P* )
      6.43450 1.90670 4.17910 27.1570 1.78000 0.52600 1.49080 68.1645 1.11490
   SCATter ( chemical FE* )
      11.1764 4.61470 7.38630 0.30050 3.39480 11.6729 0.07240 38.5566 0.97070

{===>}
   nreflections=30000
   reflection @ing.fob end                           { read reflections }
{===>}
   resolution 8.0 3.0                                { resolution range }
   reduce
   do amplitude ( fobs = fobs * heavy(fobs - 2.0*sigma) )   { sigma cutoff }
   fwind=0.1=100000
{===>}
   method=fft
   fft
      memory=2000000                                 { fft method with memory statement }
   end
   wa=$wa
   target=E2E2                                       { specify target }
   mbins=20                                          { number of bins used for E calculation }
   tolerance=0. lookup=false                         { this makes the minimizer happy }

   { expand data to a P1 hemisphere: this sequence of }
   { statements first applies the crystal symmetry ops }
   { to the current reflections. In the second step }
   { Friedel mates or other redundancies are removed. }
   { This is necessary since the application of the }
   { symmetry operators can produce Friedel mates }
   { under special conditions. }
   hermitian=false expand hermitian=true symmetry reset reduce
end

flags exclude * include xref end                     { only use XREF energy term }

{===>} set display=filter1.list end                  { write the results of the refinement }
                                                     { to a file called "filter.list" }
set precision=5 end
set message=off end                                  { turn off messages and echo to reduce }
set echo=off end                                     { output }

evaluate ($number=0)
evaluate ($counter=0)

{ loop over all orientations as specified }
{ in file rotation.rf (conventional rf) }
for $1 in ( @rotation1.rf ) loop main
   { this series of statements assigns the information of }
   { a single line in file rotation.rf to the approp. }
   { variables. A single line contains }
   { $index $t1 $t2 $t3 $rf. }
   evaluate ($counter=$counter+1)
   if ($counter=1) then evaluate ($index=$1)
   elseif ($counter=2) then evaluate ($t1=$1)
   elseif ($counter=3) then evaluate ($t2=$1)
   elseif ($counter=4) then evaluate ($t3=$1)
   elseif ($counter=5) then
      evaluate ($rf=$1)
      evaluate ($counter=0)
      evaluate ($number=$number+1)

      coor copy end                                  { save current coordinates }
      coor rotate euler=( $t1 $t2 $t3 ) end          { and then rotate them }
                                                     { according to the orientation }
                                                     { specified by $t1, $t2, $t3 }
      energy end                                     { compute initial energy }
      evaluate ($pc1=1.0-$xref/$wa)                  { and store in $pc1 }

      minimize rigid                                 { rigid body minimization of the }
         nstep=15                                    { orientation of the molecule }
         drop=10.
      end
      evaluate ($pc2=1.0-$xref/$wa)

      coor swap end                                  { fit coordinates to starting structure in }
      vector do (vx=x) ( all )                       { order to measure the orientation of the }
      vector do (vy=y) ( all )                       { PC-refined structure }
      vector do (vz=z) ( all )                       { the arrays vx, vy, vz are used as temporary }
      coor fit end                                   { stores in order to keep the starting }
      vector do (x=vx) ( all )                       { coordinates }
      vector do (y=vy) ( all )                       { the COOR FIT statement stores the angles }
      vector do (z=vz) ( all )                       { in the symbol $theta1, $theta2, $theta3 }

      { print information: orientation of rotation }
      { function peak ($t1, $t2, $t3), orientation }
      { after PC-refinement ($theta1, $theta2, }
      { $theta3), index of the rotation function, }
      { rotation function value, PCs for initial, }
      { rigid body and domain refined structures. }
      display $t1 $t2 $t3 $theta1 $theta2 $theta3 $index $rf $pc1 $pc2
   end if
end loop main
stop
Appendix E
X-PLOR Input File for Translation Search
remarks file xtalmr/translation.inp -- PC-refinement followed by translation search

{ The first part of this job is similar to the PC-refinement }
{ job (filter.inp). We actually have to repeat the refinement }
{ for the selected orientation since we did not store the }
{ refined coordinates. }

{===>} parameter @param11.dna end                    { read parameters }
{===>} structure @generate.psf end                   { read structure file }
{===>} coor @generate.pdb                            { read coordinates }

evaluate ($wa=10000.)            { this is the weight for the XREF energy term }
                                 { in this case it is arbitrary since we're not }
                                 { combining it with other energy term }
xrefin
{===>}
   @ing.cel                                          { unit cell for crystal }

   SCATter ( chemical C* )
      2.31000 20.8439 1.02000 10.2075 1.58860 .568700 .865000 51.6512 .215600
   SCATter ( chemical N* )
      12.2126 .005700 3.13220 9.89330 2.01250 28.9975 1.16630 .582600 -11.529
   SCATter ( chemical O* )
      3.04850 13.2771 2.28680 5.70110 1.54630 .323900 .867000 32.9089 .250800
   SCATter ( chemical S* )
      6.90530 1.46790 5.20340 22.2151 1.43790 .253600 1.58630 56.1720 .866900
   SCATter ( chemical P* )
      6.43450 1.90670 4.17910 27.1570 1.78000 0.52600 1.49080 68.1645 1.11490
   SCATter ( chemical FE* )
      11.1764 4.61470 7.38630 0.30050 3.39480 11.6729 0.07240 38.5566 0.97070

{===>}
   nreflections=30000
   reflection @ing.fob end                           { read reflections }
{===>}
   resolution 8.0 3.0                                { resolution range }
   reduce
   do amplitude ( fobs = fobs * heavy(fobs - 2.0*sigma) )   { sigma cutoff }
   fwind=0.1=100000
{===>}
   method=fft
   fft
      memory=2000000                                 { fft method with memory statement }
   end
   wa=$wa
   target=E2E2                                       { specify target used for both PC-refinement }
                                                     { and translation search }
   mbins=20                                          { number of bins used for E calculation }
   tolerance=0. lookup=false                         { this makes the minimizer happy }

   { expand data to a P1 hemisphere }
   hermitian=false expand hermitian=true symmetry reset reduce
end

flags exclude * include xref end                     { only use XREF energy term }

{===>}
coor rotate euler=(213.57 10 93.566) end
                                 { rotate the structure according to the selected }
                                 { orientation. Note: use the orientation that }
                                 { comes out of the rotation function (first three }
                                 { numbers in file "filter.list"). }
{===>}
minimize rigid                                       { repeat the refinement steps of job filter.inp }
   nstep=15
   drop=10.
end

{ now we have to turn the crystal symmetry on }
{ in order to carry out the translation search }
xrefin
{===>}
   @ing.cel
   reduce
end

{ now we get ready for the translation search }
xrefin
   { set the grid size for the translation search }
   { should be less than 1/3 high-resolution limit }
   evaluate ( $gridx=1./40. )
   evaluate ( $gridy=1./50. )
   evaluate ( $gridz=1./80. )
   evaluate ( $grid=min($gridx,$gridz) )

   search translation
      mode=fractional
      xgrid=0.0 $grid 0.5                            { we only have to search in x,z in this }
      ygrid=0. 0.02 0.5                              { space group. In general we have to }
      zgrid=0. $grid 0.5                             { specify an asymmetric unit for the }
                                                     { translation function. N.B.: This is }
                                                     { NOT necessarily identical to an }
                                                     { asymmetric unit of the space group!! }
      nlist=1000                                     { list the 1000 best solutions }
                                                     { the list is returned in the standard }
                                                     { output file. }
      output=translation1.3dmatrix                   { output matrix for plotting. }
                                                     { this can be verbose for 3d }
                                                     { translation functions!! }
   end
end

write coordinates output=translation.pdb end         { the translation function }
                                                     { returns the coordinates of }
                                                     { best solution. }
xrefin                                               { analyse the R factor distribution }
   resolution 8. 3.0                                 { of the best solution. }
   target=residual
   update
   print rfactor
end
stop
Appendix F
X-PLOR Input File for Rigid Body Refinement
remarks FILE RIGID.INP
remarks rigid-body refinement
@ingprepare.inp
{*read various standard data sets* }
coordinates @generate.pdb
{*read in initial model*}
coordinates copy end{ * copy to comparison set* }
{* include only R-factor in energy function * }
flags
exclude bond angl dihe impr vdw elec pvdw pele
include xref
end
xrefin
resolutionlimits=8.0 2.5
tolerance=0.0
update-fcalc
print R-factor
wa=1300.0{*arbitrary value, since only XREF carries weight* }
wp=0.0
end
minimize rigid
nstep=40
drop= 10.0
group = (resid 2:15 )
end
minimize rigid
nstep=40
drop= 10.0
group = (resid 2:5 )
group = (resid 12:15)
end
minimize rigid
nstep=40
drop= 10.0
group = ( resid 2 )
group = ( resid 3 )
group = ( resid 4 )
group = ( resid 5 )
group = ( resid 12 )
group = ( resid 13 )
group = ( resid 14 )
group = ( resid 15 )
end
write coordinates output=rigid.pdb end
coordinates rms end {* print out rms to initial coordinates *}
stop
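The last statement of the rigid-body input reports the rms difference between the refined coordinates and the comparison set saved earlier by "coordinates copy". The following is a minimal Python sketch of such an rms difference. It assumes an unweighted rms over matched atoms with no prior superposition, which is an assumption about the quantity being reported and not a statement of how X-PLOR computes it. The function name is hypothetical.

```python
import math


def rms_deviation(coords_a, coords_b):
    """Unweighted rms distance between two equally ordered lists of
    (x, y, z) coordinates. No superposition is applied."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must contain the same number of atoms")
    if not coords_a:
        raise ValueError("coordinate sets must not be empty")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Two atoms, each shifted by 1 Angstrom along z.
    initial = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    refined = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rms_deviation(initial, refined))
```

A small rms after the three refinement stages would indicate that the model moved little from the molecular replacement solution; a large value would indicate a substantial rigid-body shift.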
Appendix G
X-PLOR Input File for Simulated Annealing
remarks file xtalrefine/slowcool.inp
remarks crystallographic SA-refinement (slow-cooling method)
{===>} parameter @param11.dna end { read parameters }
{===>} structure @generate.psf end { read structure file }
{===>} coor @prepingl.pdb { read coordinates }
vector do ( charge=0.0 ) ( resname LYS and
( name ce or name nz or name hz* ) ) { Turn off charges on LYS }
vector do ( charge=0.0 ) ( resname GLU and
( name cg or name cd or name oe* ) ) { Turn off charges on GLU }
vector do ( charge=0.0 ) ( resname ASP and
( name cb or name cg or name od* ) ) { Turn off charges on ASP }
vector do ( charge=0.0 ) ( resname ARG and
( name cd or name *E or name cz or name NH* or name HH* ) )
{ Turn off charges on ARG }
flags
include pele pvdw xref
end
xrefine
{===> }
a=24.24 b=54.67 c=40.90 alpha=90.0 beta=90.00 gamma=90.0
{===> }
symmetry=(x,y,z)
symmetry=(-x,-y,z)
symmetry=(x,-y,-z)
symmetry=(-x,y,-z)
symmetry=(1/2+x,1/2+y,z)
symmetry=(1/2-x,1/2-y,z)
symmetry=(1/2+x,1/2-y,-z)
symmetry=(1/2-x,1/2+y,-z)
SCATter ( chemical C* )
2.31000 20.8439 1.02000 10.2075 1.58860 .568700 .865000 51.6512 .215600
SCATter ( chemical N* )
12.2126 .005700 3.13220 9.89330 2.01250 28.9975 1.16630 .582600 -11.529
SCATter ( chemical O* )
3.04850 13.2771 2.28680 5.70110 1.54630 .323900 .867000 32.9089 .250800
SCATter ( chemical S* )
6.90530 1.46790 5.20340 22.2151 1.43790 .253600 1.58630 56.1720 .866900
SCATter ( chemical P* )
6.43450 1.90670 4.17910 27.1570 1.78000 0.52600 1.49080 68.1645 1.11490
SCATter ( chemical FE* )
11.1764 4.61470 7.38630 0.30050 3.39480 11.6729 0.07240 38.5566 0.97070
{===>}
nreflections=15000
reflection @ing.fob end
{ read reflections }
{===>}
resolution 8.0 2.5 { resolution range }
reduce
do amplitude ( fobs = fobs * heavy(fobs - 2.0*sigma)) { sigma cutoff }
fwind=0.1=100000
method=FFT
fft
memory=1000000
end
tolerance=0.2 { tolerance for dynamics }
{===>}
wa=6500 { weight from job "check.inp" }
end
set seed=432324368 end
{ set the initial random seed for the v-assignment }
{ ===> }
evaluate ($init_temp=3000.)
vector do (vx=maxwell($init_temp)) ( all )
vector do (vy=maxwell($init_temp)) ( all )
vector do (vz=maxwell($init_temp)) ( all)
{ starting temperature }
vector do (fbeta=100.) ( all)
evaluate ($1=$init_temp)
while ($1 > 300.0) loop main
dynamics verlet
timestep=0.0005
nstep=50
iasvel=current
nprint=5 iprfrq=0
tcoupling=true tbath=$1
end
evaluate ($1=$1-25)
end loop main
xrefin
tolerance=0.0
lookup=false
{ this makes the minimizer happy }
end
{ final minimization }
minimize powell
nstep=200
drop= 10.0
end
write coordinates output=slowing.pdb end { Write coordinates }
stop
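The slow-cooling loop in the input above starts at 3000 K, runs 50 dynamics steps of 0.0005 ps at each bath temperature, and lowers the temperature by 25 K until it is no longer above 300 K. The following Python sketch reproduces only that schedule, to make the number of stages and the total simulated time explicit. It does not perform any dynamics, and the function names are hypothetical.

```python
def cooling_schedule(t_start, t_end, t_step):
    """Bath temperatures visited by a loop of the form
    'while (T > t_end): run dynamics at T; T = T - t_step'."""
    temperatures = []
    temperature = t_start
    while temperature > t_end:
        temperatures.append(temperature)
        temperature -= t_step
    return temperatures


def total_time_ps(n_stages, steps_per_stage, timestep_ps):
    """Total simulated time in picoseconds over all temperature stages."""
    return n_stages * steps_per_stage * timestep_ps


if __name__ == "__main__":
    # Values copied from the slow-cooling input file.
    schedule = cooling_schedule(3000.0, 300.0, 25.0)
    print("stages:", len(schedule))
    print("first and last bath temperature:", schedule[0], schedule[-1])
    print("total time (ps):", total_time_ps(len(schedule), 50, 0.0005))
```

With the values in the input file the loop visits 108 bath temperatures, from 3000 K down to 325 K, for a total of 5400 dynamics steps, or about 2.7 ps of simulated time, before the final Powell minimization.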