SHELX Workshop
St. Paul ACA Meeting 22nd July 2000
Contents
1. Workshop program and aims
2. Introduction to SHELX
3. SHELXD – integrated direct and Patterson methods (beta-test)
4. Guide to SHELX for macromolecules: Phasing
5. Guide to SHELX for macromolecules: Refinement
6. Frequently asked questions (by biocrystallographers)
7. References
8. Further useful sources of information
1. Workshop Program
The Workshop is divided into four sessions, with a discussion period after each session. Each
discussion is led by a panel consisting of the session chair and the speakers for that session.
A. Introduction, phasing etc. Chair: Duncan McRee
8:30 – 8:45 George Sheldrick: Historical introduction to SHELX
8:45 – 9:10 George Sheldrick: Dual-space ab initio direct methods in SHELXD
9:10 – 9:35 Thomas Schneider: MAD phasing
9:35 – 10:00 Louis Farrugia: The WinGX user interface
10:00 – 10:20 Discussion
10:20 – 10:35 Coffee/tea
B. Structure refinement. Chair: Ethan Merritt
10:35 – 11:00 Dale Tronrud: Introduction to refinement, solvent model
11:00 – 11:20 George Sheldrick: Restraints and constraints
11:20 – 11:45 Bill Clegg: Weak data, disorder and other problems in small molecules
11:45 – 12:10 Thomas Schneider: Disorder in macromolecules
12:10 – 12:30 Discussion
12:30 – 13:30 Buffet lunch
C. Twinning. Chair: George Sheldrick
13:30 – 13:45 Regine Herbst-Irmer: Racemic twinning and the Flack parameter
13:45 – 14:10 Regine Herbst-Irmer: Merohedral twins
14:10 – 14:35 Victor Young: Non-merohedral twins
14:35 – 15:00 Thomas Schneider: Twinning in macromolecules
15:00 – 15:20 Discussion
15:20 – 15:35 Coffee/tea
D. Errors, validation and anisotropic refinement. Chair: Bill Clegg
15:35 – 16:00 Ton Spek: Small-molecule validation
16:00 – 16:20 George Sheldrick: Estimation of parameter errors
16:20 – 16:45 Ethan Merritt: Anisotropic refinement of macromolecules
16:45 – 17:10 Duncan McRee: Validation of error estimates for metalloproteins
17:10 – 17:30 Discussion
1.1 Aims and organization of the Workshop
Although centered on a particular program system, it is intended that the Workshop should be
educational; previous experience of the SHELX programs should not be essential (though it will
clearly help). Applications to small molecules and macromolecules have been mixed up as
thoroughly as possible; an exchange of ideas must surely be beneficial to both groups of
crystallographers. The gap between the two approaches has long since disappeared. Large small
molecules that are bigger than small proteins are now being solved by molecular replacement or
anomalous dispersion methods, and small proteins are being solved by direct methods.
Anisotropic refinement with full-matrix estimation of standard deviations is now practicable for
macromolecules that diffract to high resolution, and the techniques used to model disordered
solvent in small molecule structures often now involve restraints developed first for
macromolecular refinements.
With these notes, we have tried to provide an introduction to the theory and application of the
new structure solution program SHELXD, which is proving very useful for the ab initio solution
of larger small molecules given data to atomic resolution, as well as for the location of heavier
atoms or anomalous scatterers from MAD, SIR, SIRAS and SAS data at much lower resolution.
We have also tried to provide a simple introduction to the SHELX system for biological
crystallographers using the programs for the first time, e.g. for the refinement of proteins at high
resolution or the refinement of twinned macromolecules at any resolution. No attempt has been
made to deal with routine small-molecule applications since these are well covered by the
existing documentation.
In order to maximize the information content, the Workshop will consist of talks and discussions
rather than computer demonstrations. There is a generous allocation of time for discussions and
participants are encouraged to make good use of this to ask awkward questions. The SHELX
Workshops in Göttingen have covered similar ground in about a week, so the program is
intensive and will require good teamwork from the speakers. Computers will be available during
exhibit hours at the ACA Meeting, so participants who would like to try out some of the
programs on their own data should contact the appropriate speakers.
A useful byproduct of the Workshop is the production of tutorials, documentation and examples
that have been made generally available on the Internet (via links from the SHELX homepage at
http://shelx.uni-ac.gwdg.de/SHELX/ ).
2. Introduction to SHELX
2.1 History
The original version of SHELX consisted of about 5000 lines of FORTRAN written around 1970
for the solution and refinement of small-molecule and inorganic structures from single crystal
diffraction data. Starting in 1976, this version was distributed in compressed form so that the
program and test data fitted into one box of ca. 2000 punched cards. SHELX76 was restricted to
160 atoms because of computer limitations! A separate structure solution program SHELXS was
released in 1986 to accommodate advances in direct methods, and in 1993 SHELXL replaced the
structure refinement part of SHELX76. There was a major update of both SHELXS and SHELXL
in 1997 and these are still the current versions. The SHELX97 system includes a program
CIFTAB for processing CIF format files that can be used for archiving structural data, and
programs SHELXPRO and SHELXWAT designed more specifically for macromolecular
applications.
At this Workshop a beta-test version of a new integrated Patterson and direct methods program
SHELXD is being released; it is proving particularly useful in MAD phasing of macromolecules
as well as for ab initio solution of structures in the range 200-2000 unique atoms, i.e. between
small molecules and macromolecules. The SHELX system consists purely of programs that input
and output text files. Several excellent graphical interfaces are available from other authors. At the
Workshop three such interfaces (PLATON and WinGX for small molecules and XtalView for
macromolecules), which like SHELX are available free to academics, will be introduced by their
authors. SHELXTL, a commercial version of SHELX incorporating the interactive graphics
programs XPREP (reciprocal space exploration) and XP (real space calculations and display), is
available from Bruker-AXS.
2.2 Program organization and philosophy
SHELX is written in a simple subset of FORTRAN-77 that has proved to be extremely portable.
The programs SHELXS (structure solution) and SHELXL (refinement) both require only two
input files: a reflection file (name.hkl) and a file (name.ins) that contains crystal data, atoms (if
any), and instructions in the form of keywords followed by free-format numbers, etc. These
programs write a listing file name.lst and a file, name.res, that can be renamed or edited to
name.ins for the next refinement. The common first part of the filename is read from the
command line by typing, e.g., ‘shelxl name’. The programs are executed independently without
the use of any hidden files, environment variables, etc.
The programs are general for all space-groups in conventional settings or otherwise and make
extensive use of default settings to keep user input and confusion to a minimum. Particular care
has been taken to test the programs thoroughly on as many computer systems and
crystallographic problems as possible before they were released, a process that often requires
several years!
2.3 Distribution of the programs
The programs are provided as sources as well as precompiled executables for common computer
systems, and may be downloaded by ftp or using a browser (CDROMs are also available). The
programs are free of charge for academics but a modest license fee (currently $2499) is required
for for-profit institutions. This license covers the use of the programs for an unlimited time on an
unlimited number of computers at one geographical location. This fee is necessary to cover the
costs of distribution and support for all users; we do not make a profit, but the university requires
us to cover our costs. When there is a major new release, a new license fee is required for the new
version. There will be no additional license fee for the beta-test of SHELXD, but the final version
of this program will be released at the same time as the next major SHELX update in 2001 or
2002 and so will require a license fee. To encourage for-profit users to switch to the new version
and to prevent a bug-ridden version remaining in circulation, the beta-test is provided in compiled
form only and has a built-in expiry date. The final version will be made available as usual in
source form without an expiry date. All users are required to fill in and sign an application form
before they are given the password for downloading the programs from the SHELX ftp server;
this form may be printed from the SHELX homepage.
2.4 Documentation and support
Information about new developments in the SHELX programs, workshops, related programs,
frequently asked questions and other sources of information are posted on the SHELX homepage
at: http://shelx.uni-ac.gwdg.de/SHELX/ which should be checked at regular intervals. A detailed
SHELX manual may be downloaded from the SHELX ftp server in Microsoft Word or in
Postscript format. This was written with small molecule users in mind and contains a full
explanation of the test structures that are provided with the programs. Since macromolecular
users may be unfamiliar with these examples these notes include a separate guide for
macromolecular Workshop participants. The author is happy to answer questions (email only
please, gsheldr@shelx.uni-ac.gwdg.de) provided that the questions are not in the lists of
‘frequently asked questions’!
3. SHELXD – integrated Patterson and direct methods (beta-test)
3.1 Introduction
Although the solution of the crystallographic phase problem is proving more elusive than
Fermat’s last theorem, in practice the large majority of small molecule structures are solved in
minutes (or even seconds) by conventional direct methods. However, the phase probability
distributions on which these methods are based become weaker as the number of atoms increases,
and few structures with more than about 200 unique equal atoms have been solved in this way.
After more than a decade in which little progress was made in solving larger structures, the
introduction of the dual-space (also known as Shake & Bake) philosophy by the Buffalo group
(Miller et al., 1993) proved to be a significant improvement, increasing the size of structure that
could be solved by nearly an order of magnitude (Figure 3.1).
Figure 3.1 A general view of dual-space direct methods. The phase refinement (in reciprocal
space) is usually performed using the tangent formula (Karle & Hauptman, 1956) or minimal
function (Miller et al., 1993); the atomic model in real space may simply involve picking the
highest N peaks or may be more sophisticated.
This procedure, which was implemented in the computer programs SnB (Miller et al., 1994) and
later in SHELXD, was of necessity based on the strongest normalized structure factors E,
corresponding typically to the largest 15 to 20% of the observed structure factors F in each
resolution shell, because the probability formulae only provide significant phase information for
the strongest E-values. The number of unique non-hydrogen atoms N is assumed to be
approximately known. The dual-space recycling is typically performed for several hundred or
more sets of N random starting atoms, with typically 2N cycles for each. In SHELXD, potential
solutions are identified by high values of the correlation coefficient CC (Fujinaga & Read, 1987):
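$$
\mathrm{CC} = \frac{100\left[\sum w E_o^2 E_c^2 \cdot \sum w \;-\; \sum w E_o^2 \cdot \sum w E_c^2\right]}
{\left\{\left[\sum w E_o^4 \cdot \sum w - \left(\sum w E_o^2\right)^2\right] \cdot \left[\sum w E_c^4 \cdot \sum w - \left(\sum w E_c^2\right)^2\right]\right\}^{1/2}}
$$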
These potential solutions can be improved and extended by means of peaklist optimization
(Sheldrick & Gould, 1995) that finds the set of potential atoms that maximizes CC for all
reflections.
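The correlation coefficient defined above is a weighted Pearson correlation between the observed and calculated E² values, expressed as a percentage. The following is a minimal sketch only, not SHELXD's own code; the function name, the argument names and the default of unit weights are assumptions made for illustration.

```python
def correlation_coefficient(eo_sq, ec_sq, weights=None):
    """Weighted correlation (in percent) between observed and calculated E^2.

    eo_sq, ec_sq: sequences of Eo^2 and Ec^2 values for the same reflections.
    weights: optional per-reflection weights w (assumed to be 1 if omitted).
    """
    if len(eo_sq) != len(ec_sq):
        raise ValueError("eo_sq and ec_sq must have the same length")
    if weights is None:
        weights = [1.0] * len(eo_sq)

    sw = sum(weights)
    so = sum(w * o for w, o in zip(weights, eo_sq))               # sum of w*Eo^2
    sc = sum(w * c for w, c in zip(weights, ec_sq))               # sum of w*Ec^2
    soc = sum(w * o * c for w, o, c in zip(weights, eo_sq, ec_sq))  # sum of w*Eo^2*Ec^2
    soo = sum(w * o * o for w, o in zip(weights, eo_sq))          # sum of w*Eo^4
    scc = sum(w * c * c for w, c in zip(weights, ec_sq))          # sum of w*Ec^4

    numerator = soc * sw - so * sc
    denominator = ((soo * sw - so * so) * (scc * sw - sc * sc)) ** 0.5
    return 100.0 * numerator / denominator
```

Identical observed and calculated values give a CC of 100, and values that are unrelated give a CC close to zero.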
The structure solution, as monitored by the mean phase error, tends to happen quite suddenly
over a small number of cycles. Although there is little indication of an impending solution, a
single dominant peak in real space typically indicates that the phase refinement is locked in a
false minimum (Xu et al., 2000).
3.2 Random omit maps
In the course of testing SHELXD, it was discovered by accident that a very effective procedure is
to leave out about 30% of the peaks at random when calculating phases for the next cycle. In
retrospect it is possible to understand why this is an effective search strategy, by analogy with the
omit maps frequently used in macromolecular crystallography. If the deleted atoms are part of an
essentially correct solution, they will probably be regenerated; if not, they will be replaced by
different, and possibly better, potential atoms. The effectiveness of this random omit procedure is
illustrated in Figure 3.2 using gramicidin A (NS = 317; P212121) as a test structure; gramicidin A
was probably the most difficult structure solved by conventional direct methods (Langs, 1988).
At least for this structure, the most effective approach involved the combination of the tangent
formula in reciprocal space with random omit maps in real space; other attempts at modifying the
peak list were much less successful. Note that line (c) corresponds to the original Shake & Bake
procedure. A surprising observation in Figure 3.2 is that the combination (d) [no phase
refinement/random omit] is able to solve this structure (albeit less efficiently) although no phase
probability relations have been employed! This is important because, unlike the random omit
maps, the probabilities become weaker as the structure becomes larger. This provides the
important clue that for much larger structures, it might be more efficient to discard the
probabilistic approach to direct methods completely!
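The random omit step described in this section can be sketched as follows. This is an illustration under stated assumptions rather than the SHELXD implementation: peaks are represented as hypothetical (height, coordinates) tuples, and the function name and the default omit fraction of 0.3 (the "about 30%" quoted above) are chosen for the example.

```python
import random


def random_omit(peaks, n_atoms, omit_fraction=0.3, rng=random):
    """Keep the highest n_atoms peaks, then discard a random fraction of them.

    peaks: list of (height, (x, y, z)) tuples from a peak search.
    Returns the peaks retained for the next phase calculation.
    """
    # Take the highest n_atoms peaks from the map.
    top = sorted(peaks, key=lambda p: p[0], reverse=True)[:n_atoms]
    # Leave out about omit_fraction of them, chosen at random.
    n_keep = len(top) - round(omit_fraction * len(top))
    return rng.sample(top, n_keep)
```

If the omitted peaks belong to an essentially correct solution they should reappear in the next map; otherwise other candidate atoms can take their place.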
Figure 3.2 Percentage of correct solutions P against cycle number for gramicidin A using
various combinations of phase refinement and real space processing: (a) tangent / random omit;
(b) minimal function / random omit; (c) minimal function / top N Peaks; (d) no phase refinement
/ random omit. In the random omit procedure, the highest N peaks were found and 30% of them
omitted at random.
It should be emphasized that direct methods are almost entirely a phase searching problem;
phase refinement plays a minor role. There are much better ways of refining phases than the
tangent formula or the minimal function. For example, Sayre (1974) showed that it was possible
to refine the phases of the small protein rubredoxin with 1.5Å data by a least-squares fit to his
squaring equation:
$$
F_h = Q_h \sum_k F_k F_{h-k}
$$

Qh is a constant, assuming equal atoms and equal isotropic displacement parameters. This
equation equates amplitudes as well as phases, compared with just phases in the case of the
tangent formula, and is equally valid for large and small structures, whereas probability formulas
become weaker as the size of the structure increases. On the other hand the use of all the data
rather than just a small subset of the strongest E-magnitudes probably makes it less suitable for
searching phase space.
Table 3.1. Some previously unsolved structures first solved using SHELXD. SG = space group, N
is the number of unique non-hydrogen atoms excluding solvent, NS the number including solvent
atoms. HA lists the unique atoms heavier than oxygen, if any, and dmin is the limiting resolution to
which data were processed.
Compound         SG        N      NS     HA    dmin(Å)
Hirustasin       P43212    402    467    10S   1.20
Cyclodextrin     P21       448    467    -     1.00
Cyclodextrin     P1        483    562    -     0.88
Decaplanin       P21       448    635    4Cl   1.00
Amylose CA26     P1        624    771    -     1.10
Mersacidin       P32       750    826    24S   1.04
rc-WT Cv HiPIP   P212121   1264   1599   8Fe   1.20
Cytochrome c3    P31       2024   2208   8Fe   1.20
3.3 Application to unknown structures
The best demonstration of the power of a new method is its ability to solve previously unsolved
structures; Table 3.1 shows some examples of this. It should be noted that the presence of heavier
atoms definitely improves the chances of success, and reduces the computer time needed per
solution, but is not essential. It should also be noted that these successes are limited to structures
for which data were available to atomic resolution (ca. 1.2 Å) or better. The only exception is
hirustasin (Usón et al., 1999) which could be solved using either the 1.2 Å (low temperature) or
the 1.4 Å (room temperature) data, even if the data were truncated to 1.55 Å.
3.4 Integration with other approaches
The extension of these algorithms to lower resolution and to larger structures is the subject of
intensive current research by the groups in Bari, Buffalo, Göttingen and York. An obvious
extension is to search for small groups of atoms (e.g. the five atoms of a peptide group) rather
than for individual atoms, but unfortunately this is very computer time intensive. Peak-picking is
after all an extreme form of density modification, and the low density elimination procedure of
Woolfson and co-workers (Shiono & Woolfson, 1992; Refaat & Woolfson, 1993) may provide a
good compromise between peak picking and techniques normally applied to improve maps at
lower resolution. Solvent boundaries have apparently not yet been included in direct methods
programs. A promising (but complicated) alternative would be to incorporate the wARP approach
(Perrakis, Lamzin et al., 1997, 1999) of refining the positions and B-values of potential atoms,
adding new atoms that correspond to high difference density and make chemical sense. The peak
positions from dual space direct methods are relatively precise, and simply refining B-values
against all data can significantly improve map interpretability (Usón et al., 1999; Parisini et al.,
1999), as shown in Figure 3.3:
Figure 3.3 (a) Part of the electron density map produced by dual-space recycling followed by
peaklist optimisation in the ab initio solution of a HiPIP protein (Parisini et al., 1999) and (b) The
same region of a sigma-A weighted 2mFo-DFc map (Read, 1985) after B-value refinement.
Although the atom positions were held fixed in this refinement, density appears at the sites of the
missing atoms.
3.5 Direct methods for the location of anomalous scatterers
In principle the MAD approach (Hendrickson, 1991; Smith, 1998), in which data are collected at
two or more wavelengths for which the f′ and f″ anomalous scattering factors are non-zero for at
least one of the elements present, determines experimental phases directly. There is however a
hidden phase problem: it is still necessary to find the positions of the anomalous scatterers in
order to calculate reference phases. Without these reference phases the protein phases cannot be
found. Although conventional direct methods and Patterson interpretation programs such as
SHELXS-97 can be misused to find a small number of anomalous scatterers from the MAD
estimates of the structure factors for these atoms alone (FA) or from the SAS (sometimes referred
to as SAD) anomalous differences ΔF = F+ – F-, the number of sites that can be found in this way
is limited to about 20. The main problem is that the data are noisy since they are based on
differences of observed structure factors; the best antidote is to collect highly redundant data. On
the other hand the resolution and completeness of the FA data are not critical; 3.5 Å is adequate,
since the anomalous atoms are more than 3.5 Å apart, and the problem is still highly overdetermined. Although higher resolution and completeness are not required to find the anomalous
scatterers, they do have a major influence on the quality of the resulting electron density maps
(Brodersen et al., 2000).
Before attempting to use MAD or SAS data to locate the anomalous scatterers, a critical decision
is to which resolution the data should be truncated. If data are used to a higher resolution than
there is significant dispersive and anomalous information, the effect will be to add noise. Since
direct methods are based on normalised structure factors, which emphasise the high resolution
data, they are particularly sensitive to this. Since there is some anomalous signal at all the
wavelengths in the MAD experiment, a good test is to calculate the correlation coefficient
between the signed anomalous differences ΔF at different wavelengths as a function of the
resolution. A good general rule is to truncate the data where this correlation coefficient falls
below about 25 to 30%. Table 3.2 illustrates three very different cases. This procedure can also
indicate if there is a problem with the wavelength. For SAS data collected at a single wavelength
it is still possible to use the correlation coefficient between the anomalous differences collected
from two crystals, or from one crystal in two orientations, before merging the two data-sets.
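The truncation rule described above can be sketched as a small helper that scans the resolution shells from low to high resolution and stops at the first shell whose correlation coefficient falls below the chosen threshold. The function name and the convention of returning the low-resolution edge of the first failing shell are assumptions for illustration; the numbers in the usage example are the peak (pk) row for case (b) of Table 3.2.

```python
def suggest_cutoff(shell_edges, cc_percent, threshold=30.0):
    """Suggest a high-resolution cutoff (in Å) for heavy-atom location.

    shell_edges: resolution shell boundaries in Å, from low to high resolution
                 (one more entry than cc_percent).
    cc_percent:  correlation coefficient (in percent) between the anomalous
                 differences at two wavelengths, one value per shell.
    Returns the low-resolution edge of the first shell whose CC is below the
    threshold, or the last edge if every shell passes.
    """
    if len(shell_edges) != len(cc_percent) + 1:
        raise ValueError("need one more shell edge than CC values")
    for i, cc in enumerate(cc_percent):
        if cc < threshold:
            return shell_edges[i]
    return shell_edges[-1]


# Shell boundaries and pk correlation coefficients from Table 3.2 (b).
edges = [float("inf"), 8.0, 6.0, 5.0, 4.6, 4.4, 4.2, 4.0, 3.8, 3.6, 3.4, 3.2, 3.0]
cc_pk = [69.3, 73.1, 62.2, 56.9, 49.6, 45.6, 48.6, 29.6, 20.6, 24.6, 20.1, 14.2]

print(suggest_cutoff(edges, cc_pk, threshold=30.0))  # 4.0
print(suggest_cutoff(edges, cc_pk, threshold=25.0))  # 3.8
```

With thresholds of 30% and 25% the helper brackets the cutoff between 4.0 and 3.8 Å, which is consistent with the truncation to about 3.9 Å reported for this data set.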
Table 3.2. Correlation coefficients expressed as percentages between the high energy remote data
and the two or three other wavelengths collected in MAD experiments on three different proteins.
In (a) the high values involving the peak (pk) and inflection point (ip) data show that it is not
necessary to truncate the data; there is significant MAD information up to the highest resolution
collected. A poorer correlation would be expected with the low energy remote data (lrm) which
has a much smaller anomalous signal. In (b) it would be advisable to truncate the data to about
3.9Å (which indeed led to a successful solution using SHELXD). (c) is clearly hopeless and in
fact could not be solved.
(a) Apical Domain (Walsh et al., 1999) 1 x (3 Se-Met in 144aa) C2221
Shell (Å)      pk     ip     lrm
Inf - 8.0     91.2   89.7   48.5
8.0 - 6.0     93.9   90.0   52.8
6.0 - 5.0     93.9   87.0   52.9
5.0 - 4.0     89.6   84.4   38.0
4.0 - 3.6     88.6   79.8   28.4
3.6 - 3.4     89.4   78.9   34.6
3.4 - 3.2     89.4   79.4   14.2
3.2 - 3.0     83.9   74.7   21.1
3.0 - 2.8     76.9   71.1   24.7
2.8 - 2.6     65.7   54.3    9.1
2.6 - 2.4     57.0   47.2    5.4
2.4 - 2.2     44.8   39.2   -3.7
(b) RRF (Selmer et al., 1999) 1 x (4 Se-Met in 185aa) P43212
Shell (Å)      pk     ip
Inf - 8.0     69.3   59.4
8.0 - 6.0     73.1   58.3
6.0 - 5.0     62.2   41.9
5.0 - 4.6     56.9   43.3
4.6 - 4.4     49.6   40.7
4.4 - 4.2     45.6   50.4
4.2 - 4.0     48.6   34.6
4.0 - 3.8     29.6   24.7
3.8 - 3.6     20.6   17.5
3.6 - 3.4     24.6   16.6
3.4 - 3.2     20.1    8.1
3.2 - 3.0     14.2    3.9
(c) Unknown Protein 4 x (4 Se-Met in 350aa) P21
Shell (Å)      pk     ip
Inf - 8.0     33.2   37.6
8.0 - 6.0     29.5   38.9
6.0 - 5.0     19.9   37.8
5.0 - 4.6     10.6   26.5
4.6 - 4.4      7.7   13.5
4.4 - 4.2     17.4   24.0
4.2 - 4.0      7.6   14.2
4.0 - 3.8      9.8   27.3
3.8 - 3.6      9.3   25.9
3.6 - 3.4     13.4   23.1
3.4 - 3.2      6.0   24.3
3.2 - 3.0      2.8   22.8
3.6 Integration of direct and Patterson methods
The original dual-space algorithm is an effective way of locating a specified number of
anomalous scatterers from MAD FA or SAS ΔF data. The efficiency can however be improved by
at least an order of magnitude by using starting atoms consistent with the Patterson function
rather than random starting atoms. In addition, the Patterson provides a reliable relative (but not
absolute) indication of the correctness of the solution and also of which atoms are probably
correct.
The location of possible starting atoms makes extensive use of a special form of the Patterson
minimum function (PMF) which is calculated as follows (Nordman, 1966). Place two atoms in a
unit-cell and generate all their symmetry equivalents. Look up the Patterson function values
corresponding to the unique vectors between all these atoms and sort them in ascending order,
then find the mean value of the lowest (say) 30% of the values in this list. Since it is unlikely that
this PMF will have a high value for wrong atom positions, especially when the symmetry is high
and there are many vectors, it may be used as a criterion for a translational search for a two-atom
fragment.
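The PMF calculation described above reduces to sorting the Patterson values sampled at the interatomic vectors and averaging the lowest part of the list. The following minimal sketch assumes that the Patterson values have already been looked up; the function name and the rounding used to decide how many values make up "the lowest 30%" are assumptions for illustration.

```python
def patterson_minimum_function(values, fraction=0.3):
    """Mean of the lowest `fraction` of the Patterson values at the unique vectors.

    values: Patterson function values looked up at the unique vectors between
            the trial atoms and all their symmetry equivalents.
    """
    if not values:
        raise ValueError("at least one Patterson value is required")
    ordered = sorted(values)                      # ascending order
    n_low = max(1, round(fraction * len(ordered)))
    lowest = ordered[:n_low]
    return sum(lowest) / len(lowest)
```

Because only the lowest values are averaged, a single vector that falls in an empty region of the Patterson is enough to pull the PMF down, which is why wrong atom positions rarely score well.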
Each strong general Patterson peak is in principle a suitable two-atom ‘fragment’ for this
translational search, because it may well correspond to a vector between two heavy atoms! Since
we are only interested in generating many different sets of atom co-ordinates consistent with the
Patterson function, there is no need to determine the global maximum PMF, indeed often this
does not give good starting atoms for the dual-space recycling. A simple and effective approach
is to try a fixed number (usually in the range 10000 to 99999) of random translations for a vector,
and retain the one with the highest PMF. A random selection of vectors from the Patterson peaklist (excluding Harker peaks), biased so that the high peaks are chosen more often, is an effective
way to pick the two-atom search fragment. If the atoms are expected to have lower than average
B-values, as is the case for iron atoms in heme groups or iron-sulfur clusters, it is advantageous to
sharpen the Patterson, e.g. by using coefficients (E³F)^½ rather than F².
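The search strategy described above (a biased random choice of a Patterson vector, followed by a fixed number of random translations) can be sketched as below. This is an illustration only: the peak representation, both function names and the `score` callback, which stands in for a PMF evaluation of the two-atom fragment at a given translation, are hypothetical. The bias towards high peaks is obtained here by drawing several peaks at random and keeping the highest, which is one simple way to achieve it.

```python
import random


def pick_vector(peaks, nf=5, rng=random):
    """Draw nf Patterson peaks at random and return the highest of them.

    peaks: list of (height, (u, v, w)) tuples, Harker peaks already excluded.
    Larger nf biases the choice more strongly towards high peaks.
    """
    candidates = [rng.choice(peaks) for _ in range(nf)]
    return max(candidates, key=lambda p: p[0])


def best_translation(score, ntrials=10000, rng=random):
    """Try ntrials random fractional translations and keep the best-scoring one.

    score: callable taking a translation (x, y, z) and returning a figure of
           merit such as the PMF of the translated two-atom fragment.
    """
    best_t, best_s = None, float("-inf")
    for _ in range(ntrials):
        t = (rng.random(), rng.random(), rng.random())
        s = score(t)
        if s > best_s:
            best_t, best_s = t, s
    return best_t, best_s
```

Since the aim is only to produce many different starting positions consistent with the Patterson, the search deliberately stops after a fixed number of trials instead of looking for the global maximum.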
Table 3.3 Application of integrated Patterson / direct methods to the location of the anomalous
scatterers from MAD data. In the number of sites column, the first number is the number found
and the second is the total number that should be present. aa = amino acid, SG = space group,
dmin is the limiting resolution to which the data were processed, and Soln/hr is the number of
solutions per hour on a 500 MHz Pentium PC. PAT-ratio is the speedup obtained by using starting
atoms consistent with the Patterson function.
Protein   No. of sites   No. of aa   SG       dmin[Å]   CC[%]   PAT-ratio   Soln/hr
Api-dm    3/3 Se         144         C2221    2.2       45      16          256
RRF       3/4 Se         185         P43212   4.0       60      1.4         283
ModE      6/6 Se         524         P21212   3.0       66      7.3         163
9hem      18/18 Fe       584         P21      2.9       73      4.0         240
X1        32/32 Se       ~1600       C2       3.5       49      5.1         9
Cyanase   40/40 Se       1560        P1       2.4       57      0.95        66
X2        51/60 Se       ~1500       P21      2.5       52      2.8         13
X3        66/66 Se       2160        P21      2.6       60      12.5        24
Before the first dual-space cycle, the two starting atoms need to be extended to N atoms. A
difference Fourier synthesis would be effective for a small number of heavy atoms, but a better
technique for a large number is to calculate a full-symmetry Patterson superposition minimum
function (PSMF) (Buerger, 1959). First all symmetry equivalents are generated for the two
starting atoms. Each pixel of the PSMF map is assigned a value equal to the PMF for all vectors
between these atoms and a dummy atom placed at the pixel. Peaks are then obtained by map
interpolation and sorted in the usual way.
By applying this procedure before each run through the dual-space recycling, it is possible to
generate an unlimited number of different sets of starting atoms, all more or less consistent with
the Patterson function. Our tests have shown that this combination of direct and Patterson
methods produces more complete and precise solutions than just using the Patterson methods
alone. It appears that iterative Patterson-only procedures suffer from an accumulation of atomic
co-ordinate errors each time a new atom is added. Because it includes phase refinement, the
dual-space approach does not suffer from this degradation as the number of atoms increases. Table 3.3
shows some results using this integrated Patterson / dual space recycling procedure on typical
MAD problems; note the efficiency in terms of solutions per hour and the completeness of the
solutions.
Table 3.4 Crossword table for location of the 8 iron atoms (two Fe4S4 clusters) in a HiPIP from
SAS ΔF data collected with Cu-Kα radiation (Rayment et al., 1992). Each entry in the table links
the atom forming the row with the atom forming the column; the top number of each pair is the
minimum distance between the two atoms, taking symmetry into account, and the bottom number
is the corresponding PMF. It is easy to find the two clusters by looking for Fe…Fe distances of
about 2.8Å, and – despite the weakness of the anomalous signal – the PMF values for the 8
correct atoms are in general higher than those involving spurious atoms.
Peak    x       y       z      self   cross-vectors
99.9  0.9201  0.0784  0.1133   27.7
                               26.6
88.4  0.9719  0.1047  0.1356   27.4    2.4
                               39.7    5.5
85.5  0.9043  0.1258  0.0884   27.7    2.6   3.0
                               27.3   23.3   5.5
82.7  0.9546  0.0950  0.0503   26.7    2.3   2.5   2.7
                               15.2   28.4  43.5  26.4
81.1  0.3542  0.5285  0.2615   31.2   14.6  16.6  14.4  14.6
                               20.9   41.4  14.8   9.5  21.5
80.5  0.4316  0.5144  0.2451   30.0   16.5  18.7  16.4  16.8   3.0
                               25.5   24.6  20.0  21.2   8.9   0.0
80.4  0.3942  0.5575  0.1995   29.6   14.4  16.4  13.9  14.6   2.7   2.9
                                0.0   31.4   7.7  22.6  33.8  26.6  19.4
73.9  0.3920  0.5023  0.1694   29.1   14.3  16.6  14.5  14.8   3.2   2.6   3.0
                               26.1   22.3  16.0  24.5  18.3  10.9   0.0  17.5
-------------------------------------------------------------------
63.8
0.4025 0.4641 0.2218
29.9
18.4
58.9
0.9655 0.0517 0.0945
26.9
45.9
16.1 18.4 16.4 16.5
17.0 13.1 0.0 4.5
2.2 3.0
7.3 15.8
4.5
7.8
4.0
0.0
2.9
5.4
5.0
0.0
2.6 15.2 17.3 15.4
5.3 0.0 0.0 6.1
The Patterson superposition function is also the basis of the above crossword table that provides
a convenient way to assess which of the heavy atom sites are correct, and also in some cases to
recognize the presence of non-crystallographic symmetry. In this table the rows and columns
correspond to the potential atoms. For each pair of atoms the top number is the minimum distance
between them, taking the space group symmetry into account, and the bottom number is the PMF
calculated from all vectors between the two atoms, also taking symmetry into account. The first
vertical column is based on the self-vectors, i.e. between one atom and its symmetry equivalents.
In general wrong sites can be recognized in this table by the presence of several zero PMF values
(negative values are replaced by zero). The mean PMF value for a specified number of atoms
provides a figure of merit PATFOM, useful for selecting the best solution, though the absolute
value depends on the structure in question. Almost always the correct solution has the largest CC
and the largest PATFOM; this was the case for all the examples in Table 3.3. Table 3.4 illustrates
a typical crossword table. It is fortuitous that all four iron atoms in one cluster appear before
those in the other in this table, but one of the two independent molecules did indeed have higher
B-values than the other.
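The way a crossword table is read can be sketched with two small helpers: one counts the zero PMF entries that involve each site, and one averages the PMF values for the first few sites. This is a hedged illustration, not the SHELXD code: the lower-triangle data layout, the function names and the choice to average the self and cross PMF values of the first n sites are all assumptions, and the numbers in the example are invented.

```python
def zero_counts(pmf_rows):
    """Count zero PMF entries involving each site of a crossword table.

    pmf_rows[i] = [self PMF of site i, cross PMF with site 0, ..., with site i-1],
    i.e. the lower triangle of the table, one row per site.
    """
    counts = [0] * len(pmf_rows)
    for i, row in enumerate(pmf_rows):
        if row[0] == 0:
            counts[i] += 1
        for j, value in enumerate(row[1:]):
            if value == 0:
                # A zero cross-vector PMF counts against both sites involved.
                counts[i] += 1
                counts[j] += 1
    return counts


def mean_pmf(pmf_rows, n_atoms):
    """Mean of all self and cross PMF values among the first n_atoms sites."""
    values = [v for row in pmf_rows[:n_atoms] for v in row]
    return sum(values) / len(values)


# Invented three-site example: the third site has two zero entries.
rows = [[20.0], [30.0, 10.0], [0.0, 0.0, 5.0]]
print(zero_counts(rows))   # [1, 0, 2]
print(mean_pmf(rows, 2))   # 20.0
```

A site that accumulates several zeros is a candidate for rejection, while the mean over the accepted sites plays the role of a figure of merit for comparing solutions of the same structure.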
3.7 The .ins file instructions for SHELXD
SHELXD expects ONE and only one source of starting atoms. This can take the form:
A: Input atoms in normal SHELX format for expansion using PLOP
B: PATS to generate ‘slightly better than random’ atoms consistent with the Patterson
C: GROP and a PDB-format model fragment
D: Random atoms (used if none of the above apply)
The reflection data consist of an .hkl file containing F² (HKLF 4) or F-values (HKLF 3). These
may correspond to either native data for ab initio structure solution or structure expansion, or
MAD, SAD, SIR or SIRAS FA or ΔF values for heavy or anomalous atom location.
Dual-space recycling, using the largest E-values (FIND) is followed by peaklist optimization
(PLOP); one or both of these commands must be present. In the case of structure expansion only
PLOP can be used and the program then stops. When the starting atoms are generated randomly
or by PATS or GROP, the calculations are repeated for a new set of starting atoms. The total
number of such tries may be specified with NTRY, otherwise the program runs for ever; however
when the job is running the calculation may be terminated at the end of the current try by creating
a name.fin file in the current working directory.
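Stopping a running job in this way can be scripted. The sketch below creates the name.fin file in the working directory; the function name is hypothetical, and it is assumed here that an empty file is sufficient, since only the creation of the file is described above.

```python
from pathlib import Path


def request_stop(job_name, workdir="."):
    """Create job_name.fin in workdir so that the job stops after the current try."""
    fin_file = Path(workdir) / f"{job_name}.fin"
    fin_file.touch()  # an empty file; its contents are assumed not to matter
    return fin_file
```

For a job started as 'shelxd name' in the current directory, calling request_stop("name") creates name.fin there.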
In the following examples, TITL...UNIT in the normal SHELX format is assumed at the start of
the .ins file and HKLF 4 (or 3) followed by END at the end of the file. The cell contents defined
by SFAC and UNIT are only used by PLOP; in the FIND stage the atoms are assumed to be of
the same type but with occupancies proportional to the square root of the peak height.
1. To solve an approximately equal-atom structure using native data to atomic resolution (1.2Å or
better) the middle of the .ins file (between UNIT and HKLF) might be as follows (for 500 unique
non-hydrogen atoms):
FIND 400
PLOP 500 600
2. To solve the same structure by first locating a disulfide bond (PATS with a super-sharp
Patterson) then expanding to the complete structure (FIND/PLOP):
PATS -2.06
PSMF -4
FIND 400
PLOP 500 600
3. To locate 30 selenium atoms from MAD data:
PATS
FIND 30
MIND -3.5
[the .hkl file could contain h, k, l, FA and σ(FA) in FORMAT(3I4,2F8.2)].
4. To solve a cyclodextrin structure with four beta-cyclodextrins in the asymmetric unit and with
data barely to atomic resolution, the following could be tried:
GROP -1.8
FIND 240
PLOP 320 400
GEOM 4
ATOM      1  C41 MOL     1      -3.859   4.863   7.904 1.000 10.00
ATOM      2  C31 MOL     1      -5.081   4.209   8.524 1.000 10.00
ATOM      3  C21 MOL     1      -5.211   2.740   8.155 1.000 10.00
.... diglucose fragment in PDB format (see test provided) ....
ATOM     21  C52 MOL     1      -0.292   4.714   7.025 1.000 10.00
ATOM     22  O52 MOL     1      -0.642   5.837   6.253 1.000 10.00
SHELXD is started with the command line:
shelxd name
and expects to find both input files name.ins and name.hkl in the current directory. It writes a
summary to the current window (standard output) and creates the files name.lst (more extensive
listing file) and name.res (SHELX format atoms, crystal coordinates).
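The start and stop conventions described above can be sketched in Python as follows. Only the command line 'shelxd name', the two input files and the name.fin stop file come from the text; the helper names (shelxd_command, run_shelxd, request_stop) and the use of the subprocess module are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def shelxd_command(name):
    """Command line given in the text: 'shelxd name' (no file extension)."""
    return ["shelxd", name]


def run_shelxd(name, workdir="."):
    """Start SHELXD in workdir, where name.ins and name.hkl must both exist."""
    workdir = Path(workdir)
    for ext in (".ins", ".hkl"):
        if not (workdir / (name + ext)).is_file():
            raise FileNotFoundError(f"missing input file {name}{ext}")
    # Popen returns at once, so the job can be stopped later with request_stop().
    return subprocess.Popen(shelxd_command(name), cwd=workdir)


def request_stop(name, workdir="."):
    """Create name.fin; SHELXD then terminates at the end of the current try."""
    stop_file = Path(workdir) / (name + ".fin")
    stop_file.touch()
    return stop_file
```

A job started without NTRY would typically be left running and stopped with request_stop("name") once a convincing solution has appeared in the output.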
The following instructions may be included in the .ins file. Default values are given in
square brackets; the # sign indicates that the default depends on other instructions:
TITL, CELL, ZERR, LATT, SYMM, SFAC and UNIT as usual (see the SHELX manual).
TRIC (or TRIK)
Flags expansion to non-centrosymmetric triclinic.
SHEL dmax [infinity], dmin [0]
Resolution limits in Å for all calculations.
NTRY ntry [0]
Number of global tries if starting from random atoms, PATS or GROP. If ntry is zero or absent,
the program runs until it is interrupted by writing a name.fin file in the current working directory.
PATS +np or –dis [100], npt [#], nf [5]
Calculates and stores Patterson. Using top np peaks or a random orientation vector of length
|dis|, tries npt random translations, selecting the one with the best Patterson minimum function
PMF (see PSMF). When selecting a vector from the list of unique Patterson peaks, special
vectors are ignored and the highest vector is chosen from nf random selections. This favors the
highest peaks but (if nf is not too large) also allows lower peaks a chance. For example, with the
default np = 100 and nf = 5, the chance is 39.5% that one of the first 10 vectors will be chosen
and 91.9% that one of the first 50 will be chosen.
If the first parameter is negative, nf random oriented vectors of length |dis| are compared on the
basis of their heights in the Patterson and the 'best' used for the translation search.
If PATS is used together with a second FIND parameter ncy greater than zero (or FIND followed
by only one number) a full-symmetry Patterson superposition minimum function (i.e. a
superposition based on the two peaks and all their symmetry equivalents) is used to locate the
atoms in the first FIND cycle. PATS and GROP are mutually exclusive.
GROP +ZZ or -Egr [0], +/-ngt [99], nor [99], ntr [9999]
6D Patterson search for small rigid group. If the first parameter is positive, the search is
performed using the Patterson minimum function PMF (see PSMF), using interatomic vectors for
which the product of the two atomic numbers is greater than ZZ. For each of |ngt| attempts, nor
random orientations are generated. The orientation with the best PMF (based on intramolecular
vectors only) for each attempt is subject to ntr translations. The solution with the best PMF in the
translational search (using both intra- and intermolecular vectors) in all the |ngt| attempts is used
to generate the starting atoms for the next stage (usually FIND). If the first parameter is negative,
an analogous procedure is employed but the function maximized is the sum of Ec²(Eo²–1) for
reflections with E > |Egr| and resolution d > dlim (see ESEL).
If the second parameter ngt is negative, the above procedure is used for the rotation and
translation search, but then a correlation coefficient (CC20) between Eo² and Ec² is calculated for
each 'best' rotation/translation combination using 20% of all reflections up to the limiting
resolution of dlim (20% rather than 100% is used to speed up the calculation). Thus one CC20
value is calculated for each of the |ngt| attempts. The solution with the highest CC provides the
starting atoms for the next stage. This is slower but almost always better than the other criteria.
The search model is read from PDB-format ATOM or HETATM records in the .ins file. All other
PDB records should be removed. The atomic number is deduced from the atom name applying
PDB rules. The PMF search is recommended for searching for a heavy-atom cluster (e.g. from
SAS or MAD data) whereas the (slower) structure-factor based search is suitable for equal-atom
fragments such as a short piece of alpha-helix (for solving small proteins) or a diglucose fragment
(for solving cyclodextrins).
PSMF pres [4.0], psfac [0.34]
pres is the resolution of the Patterson in terms of the minimum ratio of the number of grid points
along an axis and the maximum reflection index along that axis. If pres is negative a 'super-sharp'
Patterson with coefficients √(E³F) is calculated, otherwise a normal F² Patterson is used. psfac is
the fraction of the lowest values in the sorted list of Patterson heights that is summed to get the
PMF.
FRES res [3.0]
Resolution of all Fourier syntheses (including the PSMF but excluding the Patterson itself) in
terms of the minimum ratio of the number of grid points along an axis and the maximum
reflection index used along that axis.
ESEL Emin [#], dlim [1.0]
Minimum E and high-resolution limit for FIND and TANG. The E² values are normalized to 1
in resolution shells, then smoothed. Emin defaults to 1.2 for ab initio structure solution and to 1.5
for heavy atom location (the absolute value of the first MIND parameter is used to distinguish
between these two cases depending on whether it is less than 1.6 or not).
FIND na [0], ncy [#]
Search for na atoms in ncy internal loop cycles (tangent formula + E-Fourier). ncy defaults to 20
(for heavy-atom location) or the maximum of 20 or na (for ab initio direct methods, distinguished
using MIND mdis). The highest na / ( 1 – fr ) peaks are selected, where fr is the WEED
parameter. The effect is that approximately na peaks remain after the random omit procedure
(WEED). The occupancy is made proportional to the square root of the peak height in the FIND stage.
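The arithmetic behind the FIND defaults can be sketched as below. This is an illustration only: the function name find_settings is hypothetical, the reading that |mdis| below 1.6 means ab initio mode (and 1.6 or above means heavy-atom location) is inferred from the MIND and ESEL descriptions, and whether SHELXD rounds the peak count up or down is not stated, so the value is left unrounded.

```python
def find_settings(na, mdis=1.0, fr=0.3, ncy=None):
    """Sketch of the FIND defaults described in the text.

    na   : number of atoms sought (first FIND parameter)
    mdis : first MIND parameter; |mdis| >= 1.6 is taken as heavy-atom mode
    fr   : WEED fraction of atoms omitted at random
    ncy  : number of FIND cycles, or None to use the documented default
    """
    heavy_atom_mode = abs(mdis) >= 1.6
    if ncy is None:
        # 20 for heavy-atom location, max(20, na) for ab initio direct methods
        ncy = 20 if heavy_atom_mode else max(20, na)
    # The highest na / (1 - fr) peaks are selected, so that about na
    # peaks remain after the random omit step.
    peaks_selected = na / (1.0 - fr)
    return {
        "heavy_atom_mode": heavy_atom_mode,
        "ncy": ncy,
        "peaks_selected": peaks_selected,
    }


selenium = find_settings(30, mdis=-3.5)   # example 3: FIND 30, MIND -3.5
ab_initio = find_settings(400)            # example 1: FIND 400, default MIND
print(selenium["ncy"], round(selenium["peaks_selected"], 1))
print(ab_initio["ncy"], round(ab_initio["peaks_selected"], 1))
```

With the default WEED fraction of 0.3, roughly 43 peaks are picked when 30 sites are sought, and 30% of them are then omitted at random in each cycle except the last.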
TANG ftan [0.9], fex [0.4]
Fraction ftan of the ncy FIND cycles are performed using the tangent formula, the rest using a
Sim-weighted E-map. fex is the fraction of reflections with the largest Ecalc values to hold fixed
when doing tangent expansion to find the remaining phases.
NTPR ntpr [100]
Maximum number of (largest) TPR per reflection; negative for output of mean phase errors (if
phases were input).
MIND mdis [1.0], mdeq [2.2]
|mdis| is the shortest distance allowed between atoms for PATS and FIND. If mdis is negative
PATFOM is calculated, and the crossword table for the best PATFOM value so far is output to
the .lst file. In this case the solution is passed on to the PLOP stage if either the CC is the best so
far or the PATFOM is the best so far. mdeq is the minimum distance between symmetry
equivalents for FIND (for PATS the |mdis| distance is used). Thus the default setting of mdeq
prevents FIND from placing atoms on special positions. This is usually desirable because it helps
to avoid pseudo-solutions such as the 'uranium atom solution' that are incorrect but fit the
tangent formula, but it might be better to change this setting to -0.1 to allow special positions
when looking for e.g. metal ions. For PLOP the PREJ instruction can be used to control whether
peaks on special positions are selected. Note also that a |mdis| threshold of 1.6Å is used to decide
between all-atom ab initio and heavy atom location for the purpose of setting various defaults for
other parameters.
SKIP min2 [0.5]
During FIND, if the second peak height is less than min2 times the first, the first peak is rejected
(before applying WEED to reject other peaks). This is sometimes useful to suppress 'uranium
atom' solutions. In fact, for large equal-atom structures in space group P1 it is a good idea to
specify ‘SKIP 0.999’ so that the first peak is ALWAYS rejected!
WEED fr [0.3]
Randomly OMIT fraction fr of atoms in FIND stage (except in the last cycle). Does not apply to
PLOP.
GEOM ngm [0], ndwt [1.0], nha [0], d13 [2.45], dd [0.3]
After the peaksearch in the FIND and PLOP routines, ngm cycles (typically 2 to 5) of geometry
optimization are performed so that distances within dd of d13 are brought closer to d13. In
addition, all peak heights after the highest nha (heavy atoms) are multiplied by ndwt (typically
0.7; 1.0 for no action) if the peaks have no other atoms or peaks within the distance range
(d13+dd) to (d13–dd). This instruction is an attempt to build in a little chemical information and
it is hoped that it will enable the resolution requirement to be relaxed a little.
TEST CCmin [#], delCC [#]
Go on to PLOP if CC > CCmin or CC is within delCC of best CC value so far. CCmin is
reduced by 0.1% each cycle until a solution passes this test. The defaults are 45 and 1 resp. for ab
initio solutions, and 10 and 5 resp. for heavy atom location (MIND mdis test).
KEEP nh [0]
Number of (heavy) atoms to retain during PLOP expansion.
PLOP followed by up to 10 numbers
PLOP specifies the number of peaks to start with in each cycle of the 'peaklist optimization'
algorithm of Sheldrick & Gould (1995). Peaks are then eliminated one at a time until either the
correlation coefficient cannot be increased any more or 50% of the peaks have been eliminated.
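The stopping rule of the peaklist optimization can be illustrated with a small greedy loop. This is a simplified sketch, not the algorithm of Sheldrick & Gould (1995) itself: the function name peaklist_optimize is hypothetical, the correlation coefficient is supplied by the caller as an arbitrary scoring function, and the choice of removing the peak whose deletion raises the score most is an assumption, since the text only states that peaks are eliminated one at a time.

```python
def peaklist_optimize(peaks, cc, max_removed_fraction=0.5):
    """Greedy elimination of peaks, one at a time.

    peaks : list of candidate peaks (any objects)
    cc    : callable returning a score (e.g. a correlation coefficient)
            for a list of peaks; supplied by the caller
    Stops when no single deletion increases the score or when the
    allowed fraction of the starting peaks has been eliminated.
    """
    current = list(peaks)
    best = cc(current)
    max_removed = int(len(current) * max_removed_fraction)
    removed = 0
    while removed < max_removed and len(current) > 1:
        # Score every list obtained by leaving out exactly one peak.
        trials = [(cc(current[:k] + current[k + 1:]), k)
                  for k in range(len(current))]
        score, k = max(trials)
        if score <= best:
            break  # the score cannot be increased any more
        del current[k]
        best = score
        removed += 1
    return current, best


# Toy scoring function (an assumption): +1 per true peak, -1 per false peak.
true_peaks = {"a", "b", "c"}
toy_cc = lambda p: sum(1 if q in true_peaks else -1 for q in p)
print(peaklist_optimize(["a", "b", "x", "c", "y"], toy_cc))
```

In a real application the scoring function would require a structure-factor calculation for each trial list, which is why the cost of this stage grows quickly with the number of peaks.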
PREJ maxb [3], dsp [-0.01], mf [1]
maxb is the maximum number of bonds to atoms or higher peaks; a peak with more bonds than this
is deleted. Peaks are also deleted if they lie less than dsp from their symmetry equivalents (PLOP
only; FIND uses the second MIND parameter). Atoms are not output to the final .res file if their
'molecule' contains fewer than mf atoms.
SEED nrand [0]
Set random number seed so that exactly the same results are generated if the job is repeated;
each integer nrand defines a different sequence of random numbers. If nrand is omitted or zero,
the seed is randomized so a different sequence is always generated.
MOVE dx [0], dy [0], dz [0], sign [1]
Shift following coordinates (not ATOM/HETATM).
ATOM and HETATM
PDB format atoms for GROP
HKLF m
m = 4 for F² in .hkl file, m = 3 for F (or FA or ΔF)
END
4. A guide to SHELX for macromolecules: Phasing
4.1 Introduction
Since small-molecule direct methods and Patterson interpretation algorithms can be used to
locate a small number of heavy atoms or anomalous scatterers, SHELXS has been used by
macromolecular crystallographers for a number of years, and SHELXD is designed both for the
ab initio solution of small proteins given data to atomic resolution (1.2Å or better) and for the
location of the anomalous scatterers from MAD or SAS (also known as SAD or OAS) data.
4.2 Heavy atom location using SHELXS and SHELXD
One might expect that a small-molecule direct methods program, such as SHELXS (Sheldrick,
1990), that routinely solves structures with 20-100 unique atoms in a few minutes or even
seconds of computer time, would have no difficulty in locating a handful of heavy atom sites
from isomorphous or anomalous ΔF data. However, such data can be very noisy, and a single
seriously aberrant reflection can invalidate a large number of probabilistic phase relations. The
most important direct methods formula is still the tangent formula of Karle & Hauptman (1956);
most modern direct methods programs (e.g., Busetta et al., 1980; Debaerdemaeker, Tate &
Woolfson, 1985; Sheldrick, 1990) use versions of the tangent formula that have been modified to
incorporate information from weak reflections as well as strong reflections, which helps to avoid
pseudo-solutions with translationally displaced molecules or a single dominant peak (the so-called
uranium atom solution). Isomorphous and anomalous ΔF values represent lower limits on
the structure factors for the heavy atom substructure and so do not give reliable estimates of weak
reflections; thus, the improvements brought to direct methods by the use of weak
reflections are largely irrelevant when they are applied to ΔF data. This does not apply
when FA values are derived from a MAD experiment, since these are true estimates of the heavy
atom structure factors; however, aberrant large and small FA estimates are difficult to avoid and
often upset the phase determination process. A further problem in applying direct methods to ΔF
data is that it is not always clear what the effective number of atoms in the cell should be for use
in the probability formulas, especially when it is not known in advance how many heavy atom
sites are present.
4.3 The Patterson interpretation algorithm in SHELXS
Space-group general automatic Patterson interpretation was introduced in the program SHELXS-86
(Sheldrick, 1985); completely different algorithms are employed in the current version of
SHELXS, based on the Patterson superposition minimum function (Buerger, 1959, 1964;
Richardson & Jacobson, 1987; Sheldrick, 1991, 1998a; Sheldrick et al., 1993). The algorithm
used in SHELXS-97 is as follows:
1. A single Patterson peak, v, is selected automatically (or input by the user) and used as a
superposition vector. A sharpened Patterson (with coefficients √(E³F) instead of F², where E is a
normalized structure factor) is calculated twice, once with the origin shifted to –v/2 and once
with the origin shifted to +v/2. At each grid point, the minimum of the two Patterson values is
stored, and this superposition minimum function is searched for peaks. If a true single-weight
heavy atom-to-heavy atom vector has been chosen as the superposition vector, this function
should consist ideally of one image of the heavy-atom structure and one inverted image, with two
atoms (the ones corresponding to the superposition vector) in common. There are thus about 2N
peaks in the map, compared with N² in the original Patterson, a considerable simplification. The
only symmetry element of the superposition function is the inversion center at the origin relating
the two images.
2. Possible origin shifts are found so that the full space-group symmetry is obeyed by one of the
two images, i.e., for about half the peaks, most of the symmetry equivalents are present in the
map. This enables the peaks belonging to the other image to be eliminated and, in principle,
solves the heavy-atom substructure. In the space-group P1, the double image cannot be resolved
in this way.
3. For each plausible origin shift, the potential atoms are displayed as a triangular table that gives
the minimum distance and the Patterson superposition minimum function value for all vectors
linking each pair of atoms, taking all symmetry equivalents into account. This ‘crossword’ table
enables spurious atoms to be eliminated and occupancies to be estimated and also in some cases
reveals the presence of non-crystallographic symmetry.
4. The whole procedure is then repeated for further superposition vectors as required. The
program gives preference to general vectors (multiple vectors will lead to multiple images), and it
is advisable to specify a minimum distance of (say) 8Å for the superposition vector (3.5Å for
selenomethionine MAD data) to increase the chance of finding a true heavy atom-to-heavy atom
vector.
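Step 1 of the procedure above can be illustrated with a one-dimensional toy model in Python (NumPy). This is a sketch under stated assumptions, not the SHELXS implementation: the four point atoms on a 256-point periodic grid are invented, a plain (unsharpened) Patterson is used, and the minimum is taken of P(u) and P(u - v) rather than of the two maps shifted by -v/2 and +v/2, which is the same function translated by v/2 and avoids half-grid shifts. The function names are hypothetical.

```python
import numpy as np


def patterson_1d(rho):
    """Periodic autocorrelation P(u) = sum_x rho(x) * rho(x + u), via FFT."""
    f = np.fft.fft(rho)
    return np.real(np.fft.ifft(np.abs(f) ** 2))


def superposition_minimum(patterson, v):
    """Pointwise minimum of P(u) and P(u - v) on the periodic grid."""
    # np.roll(p, v)[u] equals p[u - v]
    return np.minimum(patterson, np.roll(patterson, v))


def peak_positions(function, threshold=0.5):
    """Grid indices at which the function exceeds the threshold."""
    return set(np.flatnonzero(function > threshold).tolist())


GRID = 256
ATOMS = [1, 4, 16, 64]        # invented equal-atom 1D structure
rho = np.zeros(GRID)
rho[ATOMS] = 1.0

P = patterson_1d(rho)
v = ATOMS[2] - ATOMS[0]       # a true atom-to-atom vector used for superposition
S = superposition_minimum(P, v)

print(len(peak_positions(P)), "distinct Patterson peaks")
print(sorted(peak_positions(S)), "peaks in the superposition minimum function")
```

For this model the minimum function retains 2N - 2 = 6 distinct peaks: one image of the structure with the first atom at the origin, one inverted image, and the two peaks (0 and v) that they share.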
4.4 Examples of heavy-atom location with SHELXS
First we will consider a very straightforward example, using SIR ΔF-data for the protein barnase,
that is provided as a test job for SHELXS-97. SHELXS (and SHELXL and SHELXD) requires
two standard text input files that in this case are called barnase.ins and barnase.hkl. The .ins file
contains the crystal data (cell, space group and contents) followed by specific instructions; the
.hkl file is simply a list of h, k, l, ΔF and σ(ΔF) in fixed format. In this particular case the very
short .ins file was created by hand using a text editor and the .hkl file was output by the CCP4
program mtz2various, but there are a number of possible graphical interfaces (e.g. the XPREP
program in the Bruker SHELXTL system) that could have set up both files with a minimum of
user effort. The barnase.ins file takes the following form:
TITL Barnase Au del(F) in P3(2)
REM Isomorphous delta-F data for Au derivative (3 sites)
REM kindly donated by Eleanor Dodson, University of York, UK
CELL 1.54178 58.970 58.970 81.580 90.00 90.00 120.00
ZERR 1.00 0.008 0.008 0.016 0.00 0.00 0.00
LATT -1
SYMM -Y, X-Y, .66667+Z
SYMM -X+Y, -X, .33333+Z
SFAC N AU
UNIT 200 9    ! fudge unit-cell contents for delta-F data
PATT 2        ! PATT -4 or PATT -10 for more difficult cases
HKLF 3
END
It will be seen that SHELX instructions consist of a four letter keyword followed by further
information in free format on the same line. In the days of punched-card input this was
revolutionary, now it looks antique but is still practicable and easy to transfer between different
operating systems etc. Comments can be added with the keyword ‘REM’ or may follow ‘!’ on a
single line. Some of the information here (e.g. the wavelength and cell esds) will not be used by
SHELXS but is required for consistency with SHELXL. ‘LATT -1’ specifies a non-centrosymmetric
primitive lattice and is followed by two of the three symmetry operators that
define the space group P3₂ (the operator X, Y, Z is omitted since it is common to all space
groups). The last operator could also have been input as ‘SYMM y-x, -x, 1/3+z’; in general lower
and upper case letters are equivalent. The SFAC and UNIT instructions define the cell contents,
but for heavy-atom location from ΔF-data the contents need to be fudged (square root of the
number of light atoms followed by the expected number of heavy atoms in the cell); this is only
important for direct methods. HKLF 3 is used to flag ΔF rather than (ΔF)² (HKLF 4).
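The statement that ‘-X+Y, -X, .33333+Z’ and ‘y-x, -x, 1/3+z’ denote the same operator can be checked with a small parser. The sketch below is an illustration only: it handles just the simple forms that occur in this example (signed x, y, z terms and a decimal or fractional translation), the function names are hypothetical, and the tolerance of 0.0001 used for comparing translations is an assumption, not a SHELX rule.

```python
import re


def parse_symm(operator):
    """Turn a SYMM operator string into three rows (cx, cy, cz, shift)."""
    rows = []
    for component in operator.lower().replace(" ", "").split(","):
        coeff = {"x": 0, "y": 0, "z": 0}
        shift = 0.0
        # Split the component into optionally signed terms, e.g. '-x', '+y', '1/3'.
        for sign, term in re.findall(r"([+-]?)([^+-]+)", component):
            factor = -1 if sign == "-" else 1
            if term in coeff:
                coeff[term] += factor
            elif "/" in term:
                numerator, denominator = term.split("/")
                shift += factor * float(numerator) / float(denominator)
            else:
                shift += factor * float(term)
        rows.append((coeff["x"], coeff["y"], coeff["z"], shift))
    return rows


def same_operator(a, b, tol=1e-4):
    """True if two operator strings agree in rotation and translation."""
    return all(
        abs(p - q) <= tol
        for row_a, row_b in zip(parse_symm(a), parse_symm(b))
        for p, q in zip(row_a, row_b)
    )


print(parse_symm("-Y, X-Y, .66667+Z"))
print(same_operator("-X+Y, -X, .33333+Z", "y-x, -x, 1/3+z"))
```

The comparison is case-insensitive because the strings are lowered before parsing, which mirrors the remark that lower and upper case letters are equivalent.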
‘PATT 2’ specifies Patterson solution attempts using two different superposition vectors; for a
difficult problem ‘PATT –10’ (10 trial vectors, the minus sign reduces the thresholds so that
more peaks are tested etc., i.e. the program tries harder) would be more appropriate. Note that only
this line needs to be changed (to TREF for an easy problem or TREF 5000 for a difficult one) to
use the SHELXS direct methods to find the gold sites. In both cases the resulting heavy atom
positions are written to the file barnase.res; in the PATT case the barnase.lst listing file includes
the crossword table:
Name  At.No.     x       y       z    s.o.f.  Min. distances / PATSMF
AU1    84.4   0.1318  0.0494  0.5458   1.00   29.64
                                              48.4
AU2    80.0   0.2831  0.5398  0.6667   1.00   29.45  27.48
                                              80.1   62.3
AU3    48.2  -0.2260  0.3630  0.6308   1.00   28.91  35.00  35.46
                                              40.8   34.5   41.1
AU4    21.8   0.0246  0.0134  0.6418   1.00   27.28   9.61  37.97  30.80
                                              12.2    0.0   12.7    0.0
Understanding the crossword tables produced by SHELXS and SHELXD is the key to successful
heavy atom location. The names and atomic numbers are invented by the program and need not
be taken seriously. They are followed by the crystal co-ordinates of the heavy atoms and their
site occupancies (always 1.0 except for atoms on special positions, in which case the value is less
than one). Special positions should be treated with suspicion for heavy atom derivatives but can
happen; they can be eliminated by making the second PATT parameter (the minimum intersite
distance) negative. All the remaining items in the table are double entries; the top value is the
minimum distance between one atom and its symmetry equivalents (first column) or between the
atom marking the row and the atom marking the column (and all its symmetry equivalents;
remaining columns). The bottom number is the corresponding Patterson superposition minimum
function for all the vectors between one atom and its equivalents (first column) or between one
atom and another, including all equivalents of the latter (remaining columns). Thus 40.8 is the
Patterson minimum function for vectors between Au3 and its symmetry equivalents and 35.46 is
the minimum distance in Angstroms between Au2 and Au3, taking symmetry into account. Au4
is clearly a spurious atom (or possibly an additional low-occupancy site) because of the low
Patterson minimum function value involving it. In this example the distance information is not
useful (except as a check that the sites are not too close to one another); formation of a trimer
with equal gold-gold distances would not be obvious if an intertrimer Au...Au distance were
shorter than the intratrimer distance.
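The way the barnase crossword table singles out Au4 can be mimicked numerically. The sketch below is only an illustration of the reasoning: the PATSMF values are those printed in the table, but the scoring rule (mean of all PATSMF values involving a site, flagged when below half the median over all sites) and the function names are assumptions introduced here, not a criterion used by SHELXS.

```python
from statistics import mean, median

# PATSMF values (bottom number of each pair) from the barnase crossword table.
SELF = {"AU1": 48.4, "AU2": 80.1, "AU3": 40.8, "AU4": 12.2}
CROSS = {
    ("AU2", "AU1"): 62.3,
    ("AU3", "AU1"): 34.5, ("AU3", "AU2"): 41.1,
    ("AU4", "AU1"): 0.0, ("AU4", "AU2"): 12.7, ("AU4", "AU3"): 0.0,
}


def site_scores(self_values, cross_values):
    """Mean PATSMF over the self vector and all cross vectors of each site."""
    scores = {}
    for site, value in self_values.items():
        involved = [value] + [v for pair, v in cross_values.items() if site in pair]
        scores[site] = mean(involved)
    return scores


def flag_spurious(self_values, cross_values, fraction=0.5):
    """Sites whose score is below fraction * median score (hypothetical rule)."""
    scores = site_scores(self_values, cross_values)
    cutoff = fraction * median(scores.values())
    return [site for site, score in scores.items() if score < cutoff]


print(site_scores(SELF, CROSS))
print(flag_spurious(SELF, CROSS))
```

With these numbers only Au4 falls below the cutoff, in line with the interpretation given above; for real problems the table should still be inspected by eye, since a genuine low-occupancy site would be flagged in the same way.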
4.5 Integrated Patterson and direct methods: SHELXD
SHELXD is designed both for the ab initio solution of macromolecular structures from atomic
resolution native data alone and for the location of heavy-atom sites from ΔF or FA values at
much lower resolution, in particular for the location of larger numbers of anomalous scatterers
from MAD data. The dual-space approach of SHELXD was inspired by the Shake and Bake
philosophy of Miller et al. (1993, 1994) but differs in many details, in particular in the extensive
use it makes of the Patterson function that proves very effective in the applications involving ΔF
or FA data. An advantage of the Patterson is that it provides a good noise filter for the ΔF or FA
data: negative regions of the Patterson can simply be ignored. On the other hand, the direct
methods approach is efficient at handling a large number of sites, whereas the number of
Patterson peaks to analyze increases with the square of the number of atoms. Thus, for reasons of
efficiency, the Patterson function is employed at two stages in SHELXD: at the beginning to
obtain starting atom positions (otherwise random starting atoms would be employed) and at the
end, in the form of the triangular crossword table as used in SHELXS, to recognize which atoms
are correct. In between, several cycles of real/reciprocal space alternation are employed as in the
ab initio structure solution, alternating between tangent refinement, E-map calculation, and peaksearch, and possibly random omit maps, in which a specified fraction of the potential atoms are
left out at random. Further details of the algorithms used in SHELXD are given by Sheldrick
(1998b), Usón & Sheldrick (1999) and in the previous chapter.
From the user’s point of view, the input to SHELXD for the location of heavy atom sites is
extremely similar to that for SHELXS. The PATT (or TREF) instruction is replaced by FIND
followed by the expected number of heavy atom sites, and the minimum allowed distance
between two sites is given after MIND, with a negative sign to indicate that a crossword table
should be calculated for the best solutions. Thus for barnase the .ins file would contain
TITL...UNIT as before followed by:
PATS
FIND 3
MIND -8
NTRY 100
HKLF 3
END
This would start with atoms consistent with the Patterson, which results in about 90% of
solutions being correct. Leaving out ‘PATS’, i.e. starting instead from random atoms, reduces
this percentage to about 20%. The NTRY instruction specifies the number of tries; if this
instruction is missing the program runs for ever (which can sometimes be convenient; the job can
be interrupted, e.g. by creating a file barnase.fin, when it looks as though the structure has been
solved).
4.6 Practical considerations for locating heavy atoms
Since the input files for the direct and Patterson methods in SHELXS and the integrated method
in SHELXD are so similar, it is easy to try all three methods for a difficult problem. The
Patterson interpretation in SHELXS is a good choice if the heavy atoms have variable
occupancies and it is not known how many heavy-atom sites need to be found; the direct methods
approaches work best with equal atoms. In general, the conventional direct methods in SHELXS
will tend to perform best in non-polar space groups that do not possess special positions.
However, for more than about a dozen sites, the integrated approach in SHELXD is likely to
prove the most effective; the SHELXD algorithm works best when the number of sites is known,
at least approximately.
Especially for the MAD method, the quality of the data is decisive; it is essential to collect data
with a high redundancy to optimize the signal to noise ratio and eliminate outliers. SHELX does
not include a program to extract FA-values from MAD data but the Bruker XPREP program can
be used for this. In general, a resolution of 3.5Å is adequate for the location of heavy-atom sites.
A critical decision is the resolution limit at which to cut the data. XPREP calculates a correlation
coefficient between the signed anomalous differences for each pair of wavelengths as a function
of the resolution; in practice the correlation coefficient becomes smaller at high resolution.
Experience indicates that the data should be truncated at the resolution at which the correlation
coefficient falls below 30%. If this requires throwing away all the data, the anomalous signal is
probably too weak for structure solution!
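The truncation rule can be written down in a few lines of Python. This is a sketch, not part of XPREP: the function name is hypothetical, the input is assumed to be a list of (high-resolution limit in Å, correlation coefficient in %) pairs ordered from low to high resolution, and the shell values in the example are invented.

```python
def truncation_resolution(shells, cc_threshold=30.0):
    """Return the resolution limit of the last shell before the CC drops
    below cc_threshold, or None if even the first shell fails the test.

    shells : list of (d_min in Angstrom, CC in percent), ordered from
             low resolution (large d) to high resolution (small d)
    """
    cutoff = None
    for d_min, cc in shells:
        if cc < cc_threshold:
            break  # everything beyond this point is discarded
        cutoff = d_min
    return cutoff


# Invented example: anomalous-difference CC between two wavelengths per shell.
example_shells = [(8.0, 92.0), (5.0, 81.0), (4.0, 60.0),
                  (3.5, 41.0), (3.2, 27.0), (3.0, 12.0)]
print(truncation_resolution(example_shells))
```

A return value of None corresponds to the case in which all the data would have to be thrown away, i.e. the anomalous signal is probably too weak.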
When data have been collected at three or more different wavelengths without major problems
such as crystal decay, icing, wavelength drift etc., SHELXS and SHELXD tend to perform better
with FA-values than with anomalous ΔF-values. It is however debatable as to whether it is better
to collect highly redundant data at a single wavelength corresponding to a significant f″-value for
an SAS experiment or to spend the same time measuring less precise data at several wavelengths
for a MAD experiment.
SHELXS and SHELXD only provide a method of locating the heavy atoms or anomalous
scatterers. They do not include facilities for the further calculations necessary to obtain maps.
The programs SHARP (de la Fortelle & Bricogne, 1997) and DM (Cowtan, 1999) are
recommended for this purpose. Experience indicates that it is only necessary to refine the
B-values of the heavy atoms using other programs; their coordinates are already rather precise.
4.7 A selenomethionine MAD example
The following example of GPATase, kindly provided as a test by Janet Smith and Joe Krahn,
illustrates the application of SHELXD to a selenomethionine MAD problem. 22 unique selenium
atoms were expected, but only 20 can be found, probably because the two N-terminal
selenomethionines exhibit high thermal motion. Unusually, in this test the solution with the
highest correlation coefficient (CC) is not correct, but the correct solution can easily be identified
on the basis of the PATFOM figure of merit and by inspection of the crossword table. First the
crossword table for the solution with the highest CC is shown; this is clearly wrong because there
are many low and zero Patterson values (the bottom number of each pair):
Solution 20 (false)     Initial CC 42.6     PATFOM 2.59

  N    self   cross-vectors
  1    19.0
        0.0
  2    12.8    4.0
        0.0    0.0
  3    50.4   31.8  33.3
        0.2    0.0   0.0
  4    50.2   16.1  18.7  33.3
        0.0    0.0   0.0   8.7
  5    51.5   43.9  43.7  31.2  35.0
       15.3    0.0   0.0   0.4   0.0
  6    51.1   36.8  35.2  31.9  35.0  18.1
        0.0    0.0   0.0   0.0  10.3   0.0
  7    34.5   13.6  13.3  36.4  12.2  31.8  26.5
        0.0    0.0   0.0   1.1   0.0   0.0   0.0
  8    56.0   21.2  23.6  37.8   5.6  31.2  33.8  15.1
        1.1    0.0   0.6   0.8   0.0   0.1   0.0   0.0
  ...etc...
The correct solution (truncated) was as follows:
Solution 1 (correct)    Initial CC 33.1     PATFOM 19.63

  N    self   cross-vectors
  1    31.4
       10.7
  2    50.7   33.4
       20.7   11.5
  3    51.2   35.0  31.2
       19.3   11.5  13.1
  4    40.8    5.7  35.7  31.2
        5.1    7.5   7.1   8.2
  5    31.5   53.6  35.4  36.2  49.8
        8.0    8.9  15.5  11.6   8.6
  6    29.2   14.4  31.5  34.9  16.4  40.9
        9.6    9.3   8.9   8.0   7.6   5.7
  7    52.6   13.3  36.5  26.5   9.2  46.8  15.4
        7.9    9.2  11.7   9.0   6.6   9.1   8.5
  8    41.6   45.3  26.7  37.7  44.2  13.3  40.3  41.9
        8.8    9.4   8.3   6.5   7.0   5.4   6.0   8.6
  ...etc...
 18    35.0   38.7  33.7  27.9  41.8  16.8  25.9  40.0
        0.0    3.5   1.8   2.7   2.1   1.7   2.0   2.5
 19    37.8   16.6  25.9  32.3  18.6  39.8   6.1  17.1
        0.0    0.0   1.8   1.5   1.8   0.0   2.3   3.6
 20    29.3   22.8  27.6  31.8  26.0  31.9  10.7  25.3
        2.8    1.5   1.3   0.4   0.0   0.8   2.4   0.0
 ============================================
 21    16.3   20.6  38.8  32.8  22.8  34.5  11.1  22.4
        0.2    0.0   0.0   0.0   0.0   0.0   0.0   0.0
 22    37.4   33.7  26.4  29.7  30.8  46.3  27.3  22.2
        0.0    1.4   0.0   0.2   0.0   0.0   0.0   0.0
The sites 21 and 22 are incorrect, as indicated by the many Patterson zeros!
5. A guide to SHELX for macromolecules: Refinement
5.1 Refinement of macromolecules with SHELXL
Recently, improvements in cryocrystallography, area detectors, and synchrotron data collection
have led to a rapid increase in the number of high resolution (<2Å) macromolecular data-sets.
The enormous increase in available computer power makes it feasible to refine these structures
using algorithms incorporated in SHELXL that were initially designed for small molecules.
These algorithms are generally slower but make fewer approximations (e.g., conventional
structure factor summation rather than FFT) and include features, such as anisotropic refinement,
modeling of complicated disorder and twinning, estimation of standard uncertainties by inverting
the normal matrix, etc., that are routine in small-molecule crystallography but are not widely
implemented in programs written specifically for macromolecular structure refinement.
SHELXL is a very general refinement program that is equally suitable for the refinement of
minerals, organometallic structures, oligonucleotides, or proteins (or any mixture thereof) against
X-ray or neutron single (or twinned!) crystal data. It has even been used with diffraction data
from powders, fibers, and two-dimensional crystals. For refinement against Laue data, it is
possible to specify a different wavelength and hence dispersion terms for each reflection. The
price of this generality is that it is somewhat slower than programs specifically written only for
protein structure refinement. Any protein- (or DNA-) specific information must be input to
SHELXL by the user in the form of refinement restraints, etc. Refinement of macromolecules
using SHELXL has been discussed by Sheldrick & Schneider (1997).
Despite this generality, it must be emphasized that SHELXL is not suitable for refinements at
resolutions lower than about 2.5Å because, unlike CNS and X-PLOR, it does not include general
energy terms, and that a least-squares refinement program such as SHELXL will suffer more
from model bias than a program based on maximum likelihood. Thus almost always the initial
refinement will have been performed with another program and SHELXL will be used for the
final refinement, perhaps involving extension to very high resolution, modeling of disorder,
anisotropic refinement and the least-squares estimation of parameter errors. Thus the starting
point for a SHELXL refinement will usually be a PDB format file from the previous refinement.
Even when SHELXL is used for the refinement of a twinned structure at lower resolution, the
starting model is likely to be in the form of a PDB file from a molecular replacement solution.
5.2 Input and output files for SHELXL
SHELXL, like SHELXS and SHELXD, usually requires two input files: an .ins file containing
crystal data, instructions and atoms, and an .hkl file containing h, k, l, F² and σ(F²) in fixed
‘HKLF 4’ format [F and σ(F) may also be used and require the instruction ‘HKLF 3’].
The .ins file will usually be generated from a PDB format file using the ‘I’ option in
SHELXPRO. This sets up the TITL...UNIT instructions as for input to SHELXS or SHELXD,
followed by standard refinement instructions, restraints, instructions for generating hydrogen
atoms (commented out until needed) and atoms in crystal coordinates. For residues other than the
20 standard amino-acids, suitable restraints (see below) must be added by hand (DNA and RNA
restraints are provided in files on the SHELX ftp site). The ‘I’ option in SHELXPRO provides a
way of renumbering the residues; since SHELXL does not (currently) recognize chain identifiers,
chains must be emulated by (for example) adding 1000, 2000 etc. to the residue numbers.
SHELXPRO can also perform the reverse operation when preparing a PDB file for deposition
(the ‘B’ option). After each refinement job, the output .res file is edited or renamed to a new .ins
file that serves as the input for the next refinement job. The updating of the .res file to .ins may
also be performed by the ‘U’ option in SHELXPRO, incorporating changes generated with the help
of the graphics display program XtalView (McRee, 1992). Sometimes hand editing of these text
files may be required too; this is the normal procedure for small molecule refinements. Annotated
extracts from a typical .ins file are shown at the end of this chapter and should be looked at
before reading further.
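The chain emulation by residue-number offsets described above can be sketched in a few lines of Python. The function names are hypothetical, and the choice of giving the first chain no offset (rather than 1000) is an assumption made for the example.

```python
def shelxl_residue_number(chain_index, residue_number, offset=1000):
    """Emulate chains by adding a per-chain offset to the residue number.

    chain_index is 0 for the first chain, 1 for the second, and so on
    (an assumption; the first chain could equally be given offset 1000).
    """
    if not 0 < residue_number < offset:
        raise ValueError("residue number must lie between 1 and offset - 1")
    return chain_index * offset + residue_number


def split_residue_number(shelxl_number, offset=1000):
    """Reverse operation: recover (chain_index, residue_number)."""
    return divmod(shelxl_number, offset)


print(shelxl_residue_number(2, 15))   # residue 15 of the third chain
print(split_residue_number(2015))
```

The reverse mapping is what is needed when a PDB file with chain identifiers is regenerated for deposition.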
The .hkl file may be generated directly by the data reduction programs or by CCP4. It is not
necessary to sort the data, eliminate systematic absences or merge equivalents; SHELXL can do
this anyway. If it is desired to refine (using complex scattering factors) against separate F²-values
for h,k,l and -h,-k,-l, some care is needed; there are problems using data processing software (such
as CCP4) that does not keep these measurements separate, and ‘MERG 2’ must be specified in
the .ins file to prevent SHELXL from merging the Friedel opposites (and setting all f″ values to
zero). A further problem on continuing a refinement started with another program is to ensure
consistent flagging of the Rfree reflections (Brünger, 1992). SHELXPRO enables X-PLOR and
CNS files to be converted to SHELX format, retaining the Rfree flags; mtz2various can write an
.hkl file including Rfree flags; and the Bruker XPREP program provides general facilities for
setting Rfree flags and for transferring and extending Rfree flags consistently from one reflection
file to another, taking space group symmetry into account. When twinning or NCS is present, it
is better to flag thin resolution shells; otherwise random reflections should be flagged.
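The two flagging strategies mentioned above can be sketched as follows, assuming that each reflection comes with its d-spacing in Å. The function names, the 5% free fraction and the shell counts are illustrative choices, not SHELX or XPREP defaults.

```python
import random


def flag_random(n_reflections, fraction=0.05, seed=0):
    """Flag a random subset of reflections as the free set."""
    rng = random.Random(seed)
    n_free = round(n_reflections * fraction)
    free = set(rng.sample(range(n_reflections), n_free))
    return [i in free for i in range(n_reflections)]


def flag_thin_shells(d_spacings, n_shells=100, every=20):
    """Flag whole thin resolution shells (equal steps in 1/d^3).

    Reflections related by twinning or NCS have (nearly) the same
    resolution, so they end up on the same side of the free/work split.
    """
    s3 = [1.0 / d ** 3 for d in d_spacings]
    lo, hi = min(s3), max(s3)
    width = (hi - lo) / n_shells or 1.0   # avoid a zero width for one shell
    flags = []
    for s in s3:
        shell = min(int((s - lo) / width), n_shells - 1)
        flags.append(shell % every == 0)
    return flags


print(sum(flag_random(1000)))
print(flag_thin_shells([2.0, 2.0, 1.0]))
```

With thin shells, two reflections of equal resolution always receive the same flag, which is the property that matters when twin- or NCS-related reflections would otherwise leak information from the working set into the free set.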
SHELXL also writes an output .fcf file containing phased reflection data in CIF format. This may
be read directly into XtalView to create maps or into SHELXPRO to produce statistics.
SHELXPRO can also generate maps for O (Jones et al., 1991) and (Turbo)Frodo (Jones, 1978).
This .fcf file, since it is in CIF format, is suitable for direct deposition with the RCSB/PDB.
XtalView is recommended as the graphical interface for adjusting the model between SHELXL
refinements of macromolecules, because it can read the .pdb and .fcf output files written by
SHELXL directly, and is able to handle disorder and anisotropic displacement parameters
correctly.
5.3 Constraints and restraints
In refining macromolecular structures, it is almost always necessary to supplement the diffraction
data with chemical information in the form of restraints. A typical restraint is the condition that a
bond length should approximate to a target value with a given estimated standard deviation;
restraints are treated as extra experimental data items. Even if the crystal diffracts to 1.0Å, there
may well be poorly defined disordered regions for which restraints are essential to obtain a
chemically sensible model (the same can be true of small molecules too!). SHELXL is generally
not suitable for refinements at resolutions lower than about 2.5Å because it cannot handle general
potential energy functions, e.g., for torsion angles or hydrogen bonds; if non-crystallographic
symmetry restraints can be employed, this limit can be relaxed a little.
For some purposes (e.g., riding hydrogen atoms, rigid group refinement, or occupancies of atoms
in disordered side-chains), constraints, exact conditions that lead to a reduction in the number of
variable parameters, may be more appropriate than restraints; SHELXL allows such constraints
and restraints to be mixed freely, i.e. an atom may be simultaneously subject to several different
constraints and restraints. Riding hydrogen atoms (set using HFIX or AFIX instructions) are
defined such that the C-H vector remains constant in magnitude and direction, but the carbon
atom is free to move; the same shifts are applied to both atoms, and both atoms contribute to the
least-squares derivative sums. This model may be combined with anti-bumping restraints that
involve hydrogen atoms, which helps to avoid unfavorable side-chain conformations. SHELXL
also provides, e.g., methyl groups that can rotate about their local three-fold axes; for small
molecules the initial torsion angle may be found using a difference electron density synthesis
calculated around the circle of possible hydrogen positions (HFIX 137). In macromolecules,
methyl groups are rarely so well defined, so a staggered riding model is usually better (HFIX 33).
Restraints and constraints provide good examples of the way in which individual residues can be
referenced by SHELXL. For example, in the .ins file given at the end of this chapter:
ANIS_* FE SG SD
makes atoms called FE, SD and SG in any residue anisotropic;
DFIX_1 C1 N 1.329
restrains a specific bond length (for the N-terminal formyl group). Note that when no esd is
given, the default (here 0.02Å from the DEFS instruction) is assumed;
DFIX_ALA 1.525 C CA
restrains the C-CA bond in all alanine residues;
SADI_54 0.04 FE SG_6 FE SG_9 FE SG_39 FE SG_42
restrains the bond lengths in the FeS4 unit to be equal, but without a target value, with an esd
of 0.04Å. The central iron atom is in residue number 54 and the four cysteine sulfurs are all in
different residues;
FLAT_* 0.3 O_- CA_- N C_- CA
restrains N and CA of each amino-acid and O, CA and C of the preceding residue to lie in a
plane with a relatively large esd (0.3) (peptide planarity).
5.4 Least-squares refinement algebra
The original SHELX refinement algorithms were modeled closely on those described by
Cruickshank (1970). For macromolecular refinement, an alternative to (blocked) full-matrix
refinement is provided by the conjugate-gradient solution of the least-squares normal equations
as described by Hendrickson & Konnert (1980), including preconditioning of the normal matrix
that enables positional and displacement parameters to be refined in the same cycle. The structure
factor derivatives contribute only to the diagonal elements of the normal matrix, but all restraints
contribute fully to both the diagonal and non-diagonal elements, although neither the Jacobian
nor the normal matrix itself is ever generated by SHELXL. The parameter shifts are modified
by comparison with those in the previous cycle to accelerate convergence whilst reducing
oscillations. Thus, a larger shift is applied to a parameter when the current shift is similar to the
previous shift, and a smaller shift is applied when the current and previous shifts have opposite
signs.
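The shift-modification idea in the last two sentences can be illustrated with a minimal Python sketch. The multipliers and the "similar shift" window below are invented for illustration; the text does not give the values that SHELXL actually uses.

```python
def modified_shift(current, previous, boost=1.5, damp=0.5):
    """Scale a parameter shift by comparing it with the previous cycle's shift.

    boost and damp are illustrative values, not those used by SHELXL.
    """
    if previous == 0.0:
        return current                 # nothing to compare with
    if current * previous < 0.0:
        return damp * current          # opposite signs: oscillation, so shrink
    ratio = current / previous
    if 0.5 <= ratio <= 2.0:
        return boost * current         # similar shift: accelerate
    return current                     # same sign but dissimilar: leave alone


print(modified_shift(0.10, 0.10))
print(modified_shift(0.10, -0.20))
```

A parameter that keeps moving in the same direction is pushed further, while one that overshoots and reverses is damped, which is the behaviour the paragraph describes.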
SHELXL refines against F² rather than F, which enables all data to be used in the refinement
with weights that include contributions from the experimental uncertainties, rather than having to
reject F-values below a preset threshold; there is a choice of appropriate weighting schemes.
Provided that reasonable estimates of σ(F²) are available, this enables more experimental
information to be employed in the refinement; it also facilitates refinement against data from
twinned crystals.
5.5 Full-matrix estimates of standard uncertainties
Inversion of the full normal matrix (or of large matrix blocks, e.g., for all positional parameters)
enables the precision of individual parameters to be estimated (Rollett, 1970), either with or
without the inclusion of the restraints in the matrix. The standard uncertainties in dependent
quantities (e.g., torsion angles or distances from mean planes) are calculated in SHELXL using
the full least-squares correlation matrix. These standard uncertainties reflect the
data-to-parameter ratio, i.e., the resolution and completeness of the data and the percentage of solvent,
and the quality of the agreement between the observed and calculated F²-values (and the
agreement of restrained quantities with their target values when restraints are included).
If high resolution data are available (there must be appreciably more data than parameters!) and
the structure is not too large, it may be possible to obtain rigorous esds by matrix inversion. The
structure should first be refined to convergence with CGLS setting the second parameter to –1 to
calculate Rfree, then a further refinement should be performed against all data by deleting the
second CGLS parameter, and finally a single full-matrix cycle should be performed (‘L.S. 1’)
with zero damping and a zero shift multiplier (‘DAMP 0 0’) in which all restraints have been
removed. Often ‘BLOC 1’ will be used so that the (anisotropic) displacement parameters are
fixed in this final cycle, which makes the matrix appreciably smaller and more stable on
inversion, but still allows the estimation of realistic standard deviations on all geometrical
parameters. BOND, RTAB, HTAB and MPLA instructions may be needed to define the
dependent parameters for which esds are required.
Full-matrix refinement is also useful when domains are refined as rigid groups in the early stages
of refinement (e.g., after structure solution by molecular replacement), since the total number of
parameters is small and the correlation between parameters may be large. To set up such a rigid
group refinement, simply add ‘AFIX 6’ before the first atom of each rigid group and ‘AFIX 0’
after the last atom in each group. ‘BLOC 1’ can be used to hold the temperature factors constant.
The AFIX instructions can easily be removed later to remove the rigid group constraints. There is
no need to remove any restraints made redundant by the rigid groups, and restraints linking atoms
in different rigid groups will continue to be useful. The BUMP instruction should be commented
out (with REM) for this rigid group refinement.
5.6 Refinement of anisotropic displacement parameters
The motion of macromolecules is clearly anisotropic, but the data-to-parameter ratio rarely
permits the refinement of the six independent anisotropic displacement parameters (ADPs) per
atom; even for small molecules and data to atomic resolution, the anisotropic refinement of
disordered regions requires the use of restraints. SHELXL employs three types of ADP-restraint
(Sheldrick 1993; Sheldrick & Schneider, 1997). The rigid bond restraint, first suggested by
Rollett (1970), assumes that the components of the ADPs of two atoms connected via one (or
two) chemical bonds are equal within a specified standard deviation. This has been shown to hold
accurately (Hirshfeld, 1976; Trueblood & Dunitz, 1983) for precise structures of small
molecules, so it can be applied as a ‘hard’ restraint with a small estimated standard deviation. The
similar ADP restraint assumes that atoms that are spatially close (but not necessarily bonded
because they may be different components of a disordered group) have similar Uij components.
An approximately isotropic restraint is useful for isolated solvent molecules. The last two restraints
are only approximate and so should be applied with low weights, i.e., high estimated standard
deviations.
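The rigid-bond condition described above can be checked numerically. The sketch below assumes that the atomic positions and the 3x3 displacement tensors U are already expressed in a common Cartesian frame (Å and Å²); the function names are hypothetical.

```python
import math


def msd_along(u_cart, direction):
    """Mean-square displacement d^T U d along a direction (U is 3x3, in Å^2)."""
    norm = math.sqrt(sum(c * c for c in direction))
    d = [c / norm for c in direction]
    return sum(d[i] * u_cart[i][j] * d[j] for i in range(3) for j in range(3))


def rigid_bond_difference(u1, u2, pos1, pos2):
    """Difference of the two atoms' displacement components along their bond.

    A rigid-bond restraint drives this difference towards zero.
    """
    bond = [b - a for a, b in zip(pos1, pos2)]
    return msd_along(u1, bond) - msd_along(u2, bond)


u_a = [[0.04, 0.0, 0.0], [0.0, 0.05, 0.0], [0.0, 0.0, 0.06]]
u_b = [[0.05, 0.0, 0.0], [0.0, 0.09, 0.0], [0.0, 0.0, 0.02]]
print(rigid_bond_difference(u_a, u_b, (0.0, 0.0, 0.0), (1.5, 0.0, 0.0)))
```

Only the component along the bond is compared, so two atoms may have quite different ellipsoids perpendicular to the bond and still satisfy the restraint.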
The transition from isotropic to anisotropic roughly doubles the number of parameters and almost
always results in an appreciable reduction in the R-factor. However, this represents an
improvement in the model only when it is accompanied by a significant reduction in the free
R-factor (Brünger, 1992). Since the free R-factor is itself subject to uncertainty because of the small
sample used, a drop of at least 1% is needed to justify anisotropic refinement. There should also
be a reduction in the goodness of fit, and the resulting thermal ellipsoids should make chemical
sense and not be ‘non-positive-definite’!
In the .ins file at the end of this chapter, since the resolution is borderline for an anisotropic
refinement of all atoms, one could try ‘ANIS’ (all atoms assumed) instead of ‘ANIS_* FE SD
SG’ but add an ‘approximately isotropic’ restraint with a tight esd (0.03) on all atoms except iron
and sulfur (which are better defined because of their larger number of electrons) by:
ISOR 0.03 $C_* $N_* $O_*
5.7 Similar geometry and NCS restraints
When there are several identical chemical moieties in the asymmetric unit, a very effective
restraint is to assume that the chemically equivalent 1,2- and 1,3-distances are the same, but
unknown. This technique is easy to apply using SHELXL and is often employed for
small-molecule structures and, in particular, for oligosaccharides. Similarly, the terminal P-O bond
lengths in DNA structures can be assumed to be the same (but without a target value), i.e., it is
assumed that the whole crystal is at the same pH. For proteins, the method is less suitable because
of the different abundance of the different amino-acids, and, in any case, good target distances
are available (Engh & Huber, 1991).
Local non-crystallographic-symmetry (NCS) restraints (Usón et al., 1999) may be applied to
restrain corresponding 1,4-distances and isotropic displacement parameters to be the same when
there are several identical macromolecular domains in the asymmetric unit; usually, the 1,2- and
1,3-distances are restrained to standard values in such cases and so do not require NCS restraints.
Such local NCS restraints are more flexible than global NCS constraints and – unlike the latter –
do not require the specification of a transformation matrix and mask. The NCSY instruction
requires the user to define a set of atoms and the difference(s) in residue numbers between the
NCS related units (often 1000, if this offset has been used to represent different chains). It should
be noted that SHELXPRO provides a variety of ways of representing NCS differences
graphically.
5.8 Modeling disorder
There are many ways of modeling disorder using SHELXL, but for macromolecules the most
convenient is to retain the same atom and residue names for the two or more components and
assign a different ‘part number’ (analogous to the PDB alternative site flag) to each component.
With this technique, no change is required to the input restraints, etc. Atoms in the same
component will normally have a common occupancy that is assigned to a free variable (fv). The
starting values for the free variables are given, in order, on the FVAR instruction; note that there
is no free variable number 1 (adding 10 fixes a parameter); the first FVAR parameter is the
overall scale factor. Residues Glu_12 and Cys_38 have disordered side-chains in the example;
their occupancies are tied to fv(2) (for the atoms in component [PART] 1) and to 1-fv(2) for the
atoms in component 2 for Glu_12, and similarly fv(4) and 1-fv(4) for Cys_38. This ensures that
the sum of occupancies for both components is held at unity. ‘21.0’ is interpreted as 1.0 times
fv(2), and ‘-21.0’ as 1.0 times [1-fv(2)].
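The coding of occupancies by free variables can be made concrete with a short Python sketch. It generalizes the two cases given above (‘21.0’ and ‘-21.0’) to the rule "coded value = 10m + p"; this generalization, and the function name, are assumptions made for illustration.

```python
def decode_coded_value(coded, fvar):
    """Decode a SHELXL-style coded parameter 10*m + p.

    fvar is the list of values on the FVAR instruction, so fvar[0] is the
    overall scale factor and fvar[1] is free variable 2, written fv(2).
    Assumed rule: m = 0 gives p (refined freely), m = 1 gives p (fixed),
    m >= 2 gives p * fv(m), and m <= -2 gives p * [fv(-m) - 1].
    """
    m = round(coded / 10.0)
    p = coded - 10.0 * m
    if m in (0, 1):
        return p
    if m >= 2:
        return p * fvar[m - 1]
    if m <= -2:
        return p * (fvar[-m - 1] - 1.0)
    raise ValueError("coded value %r is not covered by this sketch" % coded)


fvar = [1.0, 0.6, 0.5, 0.3]             # scale, fv(2), fv(3), fv(4)
print(decode_coded_value(21.0, fvar))   # occupancy of component 1
print(decode_coded_value(-21.0, fvar))  # occupancy of component 2
```

For -21.0 the sketch finds m = -2 and p = -1, so the value is -1 times [fv(2) - 1], i.e. 1 - fv(2); the two components therefore always sum to unity whatever value fv(2) refines to.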
This notation is not very intuitive, but it is concise and very flexible. Free variables may also be
used in DFIX and CHIV restraints. Thus ‘CHIV_PRO 31 CA’ would cause the chiral volumes of
all proline CA atoms to be restrained to free variable number 3, which itself is allowed to refine.
In this way reasonable geometrical restraints can be applied even when the target values are
unknown. By restraining distances to be equal to a free variable using DFIX, a standard deviation
of the mean distance may be calculated rigorously using full-matrix least-squares algebra.
If there are three or more disorder components, then each of the common occupancies must be
assigned to a separate free variable (e.g. as 51, 61 and 71), and their sum can be restrained to
unity by the use of a SUMP restraint (e.g. ‘SUMP 1 0.01 1 5 1 6 1 7’).
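The SUMP example can be read as "target, esd, then pairs of multiplier and free-variable number". The following Python sketch evaluates how far a set of free variables is from satisfying such a restraint, in units of the esd; the function name is hypothetical, and the reading of the fields is inferred from the example above.

```python
def sump_residual(instruction, fvar):
    """Return (weighted sum - target) / esd for a SUMP-style restraint.

    fvar[0] is the overall scale factor, so free variable n is fvar[n - 1].
    """
    tokens = instruction.split()
    if tokens[0] != "SUMP":
        raise ValueError("not a SUMP instruction")
    target, esd = float(tokens[1]), float(tokens[2])
    pairs = tokens[3:]
    total = sum(float(pairs[i]) * fvar[int(pairs[i + 1]) - 1]
                for i in range(0, len(pairs), 2))
    return (total - target) / esd


fvar = [1.0, 0.5, 0.5, 0.5, 0.5, 0.3, 0.2]   # fv(5), fv(6), fv(7) = 0.5, 0.3, 0.2
print(sump_residual("SUMP 1 0.01 1 5 1 6 1 7", fvar))
```

Unlike the two-component case, where one free variable and its complement guarantee a sum of exactly one, the three occupancies here are independent variables and the sum is only restrained.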
5.9 The bulk solvent correction and water divining
Modeling the low resolution data is always difficult in macromolecular refinement. Of course
these data could be left out (using the SHEL instruction) but then the electron density maps
would suffer considerably. The mean electron density of the solvent is only slightly less than that
of protein or DNA, but the model usually contains no atoms in the middle of the solvent regions
because the solvent density is so featureless. The result is that the observed diffracted intensities
tend on average to be much smaller than those calculated from the model at very low resolution.
SHELX, in common with several other programs, uses Babinet’s principle to define a bulk
solvent model with two refinable parameters (Moews & Kretsinger, 1975). In addition, global
anisotropic scaling (Usón et al., 1999) may be applied using a parameterization proposed by
Parkin, Moezzi & Hope (1995).
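To give a feel for how a two-parameter Babinet correction acts mainly on the low-resolution data, the sketch below multiplies a calculated F² by 1 - g·exp[-8π²U(sinθ/λ)²]. This functional form, the parameter names g and U and the default values are assumptions made for illustration; the text above does not spell out the expression.

```python
import math


def babinet_scale(d_spacing, g=0.4, u=2.0):
    """Illustrative Babinet-type factor applied to a calculated F^2.

    d_spacing is the resolution in Å; g and u are the two refinable
    parameters of the assumed model (illustrative starting values).
    """
    s2 = 1.0 / (4.0 * d_spacing ** 2)          # (sin(theta)/lambda)^2
    return 1.0 - g * math.exp(-8.0 * math.pi ** 2 * u * s2)


for d in (20.0, 10.0, 5.0, 3.0, 1.5):
    print(d, round(babinet_scale(d), 4))
```

The factor is well below one at very low resolution, where the calculated intensities from an atoms-only model are too large, and tends to one at high resolution, where the featureless solvent contributes nothing.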
An auxiliary program, SHELXWAT, allows automatic water divining by iterative least-squares
refinement, rejection of waters with high displacement parameters, difference electron density
calculation, and a peak-search for potential water molecules that make at least one good hydrogen
bond and no bad contacts; this is a highly simplified version of the ARP procedure of Perrakis,
Lamzin et al. (1997, 1999).
5.10 Twinned crystals
SHELXL provides facilities for refining against data from merohedral, pseudo-merohedral, and
non-merohedral twins (Herbst-Irmer & Sheldrick, 1998). Refinement against data from
merohedrally twinned crystals is particularly straightforward, requiring only the twin law (a 3x3
matrix) and starting values for the volume fractions of the twin components. Failure to recognize
such twinning not only results in high R-factors and poor quality maps, it can also lead to
incorrect biochemical conclusions (Luecke, Richter & Lanyi, 1998). Twinning can often be
detected by statistical tests (Yeates & Fam, 1999), and it is probably much more widespread in
macromolecular crystals than is generally appreciated!
No changes are needed to the .hkl file for merohedral twinning (but the data should be merged in
the lower of the two relevant Laue groups). For non-merohedral twinning a special (‘HKLF 5’)
format is required; probably the only general program currently able to generate this format is the
Bruker GEMINI program, though several SHELX users have written special programs for
individual cases.
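For a merohedral twin with two components, the twin law simply maps each reflection onto the index of the reflection that overlaps it. The Python sketch below applies a 3x3 twin law and forms the combined intensity as a volume-fraction-weighted sum; the example matrix, the dictionary of calculated F² values and the function names are all illustrative assumptions.

```python
def apply_twin_law(hkl, law):
    """Return the indices generated from hkl by a 3x3 twin law (row-wise)."""
    h, k, l = hkl
    return tuple(row[0] * h + row[1] * k + row[2] * l for row in law)


def twinned_intensity(hkl, fc2, law, fraction):
    """Combine the two overlapping calculated intensities.

    fraction is the volume fraction of the second twin component.
    """
    mate = apply_twin_law(hkl, law)
    return (1.0 - fraction) * fc2[hkl] + fraction * fc2[mate]


law = [[0, 1, 0], [1, 0, 0], [0, 0, -1]]          # example: (h,k,l) -> (k,h,-l)
fc2 = {(1, 2, 3): 100.0, (2, 1, -3): 50.0}        # made-up calculated F^2 values
print(apply_twin_law((1, 2, 3), law))
print(twinned_intensity((1, 2, 3), fc2, law, 0.3))
```

Because the model is compared with the observed data only after this summation, the untwinned structure is refined directly against the twinned measurements, which is why no change to the reflection file is needed.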
5.11 The radius of convergence
Least-squares refinement as implemented in SHELXL and other programs is appropriate for
structural models that are relatively complete, but when an appreciable fraction of the structure is
still to be located, maximum-likelihood refinement (Bricogne, 1991; Pannu & Read, 1996;
Murshudov, Vagin & Dodson, 1997) is likely to be more effective, especially when experimental
phase information can be incorporated (Pannu et al., 1998). Within the least-squares framework,
there are still several possible ways of improving the radius of convergence. SHELXL provides
the option of gradually extending the resolution of the data during the refinement; a similar effect
may be achieved by a resolution-dependent weighting scheme (Terwilliger & Berendzen, 1996).
Unimodal restraints, such as target distances, are less likely to result in local minima than are
multimodal restraints, such as torsion angles; multimodal functions are better used as validation
criteria. It is fortunate that validation programs, such as PROCHECK (Laskowski et al., 1993),
make good use of multimodal functions, such as torsion angles and hydrogen bonding patterns
that are not employed as restraints in SHELXL refinements.
5.12 Unstable refinements and other problems
However much care is taken in setting up a refinement, it can happen that the refinement
becomes unstable and diverges. Usually the program detects this in time but in extreme cases,
especially when full-matrix refinement is performed with a poorly conditioned matrix, it can
crash. It is much more difficult to identify the cause of such a problem when a large number of
changes have been made in updating a .res file to the .ins file for the next job, so it is often more
effective to improve the model in small steps. The .lst file contains a great deal of useful
diagnostic information (which can be increased by using MORE 3); however the best place to
start looking for problems is the list of ‘disagreeable restraints’; these often pinpoint the atoms or
restraints that need changing. Also the presence of unrestrained atoms (which are commented on
by the program) is a common cause of instability. In general, the more parameters that are
refined, the less stable the refinement becomes; typical examples are the inclusion of dubious
solvent water molecules or making all atoms anisotropic when there are not enough data.
Anti-bumping restraints are very useful in maintaining a chemically sensible structure, especially
at lower resolution, but can also set traps for the unwary. For example if two atoms that should
be bonded are too far apart for the program to include them automatically in the connectivity
array, an anti-bumping restraint may be generated automatically to push them apart and this will
fight against a DFIX or DANG restraint that is trying to bring them together! The remedy is to
join the two atoms by hand so that they are bonded in the connectivity array, e.g.
BIND CB_23 CG_23
Even if the side-chain of residue 23 in this example is disordered and the bond is only broken in
one component, this will have the desired effect. An incorrect connectivity can also affect the
operation of a CHIV instruction (which requires the specified atom to be bonded to three and
only three non-hydrogen atoms) and the automatic generation of hydrogen atoms (HFIX).
Superfluous bonds may be removed from the connectivity array using e.g.
FREE CB_23 CD_23
Usually if the connectivity array (included in the .lst file except for MORE 0) is correct, the
restraints will ensure that a sensible geometry is obtained during the refinement.
5.13 Example of an .ins file for SHELXL refinement
The following extracts from the file 6rxn.ins (provided together with 6rxn.hkl on the SHELX ftp
site) illustrate a number of points that are annotated by comments. The structure was determined
by Stenkamp, Sieker & Jensen (1990), who have kindly given permission for it to be used in this
way. As usual in .ins files, comments may be included as REM instructions or after exclamation
marks. The resolution of 1.5Å does not quite justify refinement of all non-hydrogen atoms
anisotropically ('ANIS' before the first atom would specify this), but the iron and sulfur atoms can
be made anisotropic as shown below:
TITL Rubredoxin in P1 (from 6RXN in PDB)
CELL 1.54178 24.920 17.790 19.720 101.00 83.40 104.50  ! Lambda & cell
ZERR 1 0.025 0.018 0.020 0.05 0.05 0.05                ! Z & cell esds
LATT -1                                                ! Space group P1
SFAC C H N O S FE                                      ! Scattering factor types and
UNIT 224 498 55 136 6 1                                ! unit-cell contents
DEFS 0.02 0.2 0.01 0.05      ! Global default restraint esds
CGLS 10 -1                   ! 10 Conjugate gradient cycles, calculate Rfree
SHEL 999 0.1                 ! Do not truncate resolution
FMAP 2                       ! Difference Fourier
PLAN 200 2.3                 ! Peak-search and identification of potential waters
LIST 6                       ! Output phased reflection file to generate maps etc.
WPDB                         ! Write PDB output file
HTAB                         ! Output analysis of hydrogen bonds (requires H-atoms !)
DELU $C_* $N_* $O_* $S_*     ! Rigid bond restraints - ignored for isotropic
SIMU 0.1 $C_* $N_* $O_* $S_* ! Similar U restraints - iso. or anis.
                             ! Esd should be changed to ca. 0.05 if whole structure is anis.
ISOR 0.1 O_201 > LAST        ! Approximate isotropic restraints for waters;
                             ! ignored for isotropic
ANIS_* FE SD SG              ! Make iron and all sulfur atoms anisotropic
CONN 0 O_201 > LAST          ! Don't include water in connectivity array and
BUMP                         ! generate antibumping restraints automatically
SWAT                         ! Bulk solvent model
REM HOPE                     ! Anisotropic scaling not included
MERG 4 ! Remove MERG 4 if Friedel opposites should not be merged
MORE 1 ! MORE 0 for minimum, 2 or 3 for more output for diagnostics
REM Special restraints etc. specific to this structure follow:
REM HFIX 43 C1_1
DFIX C1_1 N_1 1.329       ! O=C(H)- (formyl) on N-terminus
DFIX C1_1 O1_1 1.231      ! incorporated into residue 1
DANG N_1 O1_1 2.250
DANG C1_1 CA_1 2.435
DFIX_52 C OT1 C OT2 1.249
DANG_52 CA OT1 CA OT2 2.379   ! Ionized carboxyl at C-terminus
DANG_52 OT1 OT2 2.194
SADI_54 0.04 FE SG_6 FE SG_9 FE SG_39 FE SG_42 ! Equal but unknown Fe-S
SADI_54 0.08 FE CB_6 FE CB_9 FE CB_39 FE CB_42 ! distances around Fe
REM HFIX 83 SG_38 SG_138  ! -SH for remaining cysteine (disordered)
DFIX C_18 N_26 1.329      ! Patch break in numbering - residues
DANG O_18 N_26 2.250      ! 18 and 26 are bonded but there is a
DANG CA_18 N_26 2.425     ! gap in numbering for compatibility
DANG C_18 CA_26 2.435     ! with other rubredoxins that have an
FLAT 0.3 O_18 CA_18 N_26 C_18 CA_26 ! extra loop
RTAB Omeg CA_18 C_18 N_26 CA_26
RTAB Phi C_18 N_26 CA_26 C_26
RTAB Psi N_18 CA_18 C_18 N_26
REM DFIX from CSD and R.A.Engh & R.Huber, Acta Cryst. A47 (1991) 392.
REM Remove 'REM ' before HFIX to activate H-atom generation
REM HFIX_ALA 43 N
REM HFIX_ALA 13 CA
REM HFIX_ALA 33 CB
REM HFIX_ASN 43 N
REM HFIX_ASN 13 CA
REM HFIX_ASN 23 CB
REM HFIX_ASN 93 ND2
REM HFIX_ASP 43 N
REM HFIX_ASP 13 CA
REM HFIX_ASP 23 CB
... etc ...
REM HFIX_VAL 43 N
REM HFIX_VAL 13 CA CB
REM HFIX_VAL 33 CG1 CG2
REM Peptide standard torsion angles and restraints
RTAB_* Omeg CA C N_+ CA_+
RTAB_* Phi C_- N CA C
RTAB_* Psi N CA C N_+
RTAB_* Cvol CA
DFIX_* 1.329 C_- N
DANG_* 2.425 CA_- N
DANG_* 2.250 O_- N
DANG_* 2.435 C_- CA
FLAT_* 0.3 O_- CA_- N C_- CA
REM Standard amino-acid restraints etc.
CHIV_ALA C
CHIV_ALA 2.477 CA
DFIX_ALA 1.231 C O
DFIX_ALA 1.525 C CA
DFIX_ALA 1.521 CA CB
DFIX_ALA 1.458 N CA
DANG_ALA 2.462 C N
DANG_ALA 2.401 O CA
DANG_ALA 2.503 C CB
DANG_ALA 2.446 CB N
RTAB_ASN Chi N CA CB CG
CHIV_ASN C CG
CHIV_ASN 2.503 CA
DFIX_ASN 1.231 C O CG OD1
DFIX_ASN 1.525 C CA
DFIX_ASN 1.458 N CA
DFIX_ASN 1.530 CA CB
DFIX_ASN 1.516 CB CG
DFIX_ASN 1.328 CG ND2
DANG_ASN 2.401 O CA
DANG_ASN 2.462 C N
DANG_ASN 2.455 CB N
DANG_ASN 2.504 C CB
DANG_ASN 2.534 CA CG
DANG_ASN 2.393 CB OD1
DANG_ASN 2.419 CB ND2
DANG_ASN 2.245 OD1 ND2
RTAB_ASP Chi N CA CB CG
CHIV_ASP C CG
CHIV_ASP 2.503 CA
DFIX_ASP 1.231 C O
DFIX_ASP 1.525 C CA
DFIX_ASP 1.530 CA CB
DFIX_ASP 1.516 CB CG
DFIX_ASP 1.458 CA N
DFIX_ASP 1.249 CG OD1 CG OD2
DANG_ASP 2.401 O CA
DANG_ASP 2.462 C N
DANG_ASP 2.455 CB N
DANG_ASP 2.504 C CB
DANG_ASP 2.534 CA CG
DANG_ASP 2.379 CB OD1 CB OD2
DANG_ASP 2.194 OD1 OD2
RTAB_CYS Chi N CA CB SG
CHIV_CYS C
CHIV_CYS 2.503 CA
DFIX_CYS 1.231 C O
DFIX_CYS 1.525 C CA
DFIX_CYS 1.458 N CA
DFIX_CYS 1.530 CA CB
DFIX_CYS 1.808 CB SG
DANG_CYS 2.401 O CA
DANG_CYS 2.504 C CB
DANG_CYS 2.455 CB N
DANG_CYS 2.462 C N
DANG_CYS 2.810 CA SG
... etc ...
RTAB_VAL Chi N CA CB CG1
RTAB_VAL Chi N CA CB CG2
CHIV_VAL C
CHIV_VAL 2.516 CA
DFIX_VAL 1.231 C O
DFIX_VAL 1.458 N CA
DFIX_VAL 1.525 C CA
DFIX_VAL 1.540 CA CB
DFIX_VAL 1.521 CB CG2 CB CG1
DANG_VAL 2.401 O CA
DANG_VAL 2.462 C N
DANG_VAL 2.497 C CB
DANG_VAL 2.515 CA CG1 CA CG2
DANG_VAL 2.479 N CB
DANG_VAL 2.504 CG1 CG2
WGHT 0.100000
FVAR 1.00000 0.5 0.5 0.5 0.5
RESI 1 MET
C1   1  -0.01633  0.35547  0.44703  11.00000  0.11817
O1   4   0.01012  0.32681  0.48491  11.00000  0.17896
N    3   0.00712  0.35446  0.37983  11.00000  0.11863
CA   1   0.05947  0.33273  0.35391  11.00000  0.06229
CB   1   0.07411  0.33732  0.27909  11.00000  0.15678
CG   1   0.03196  0.28864  0.22872  11.00000  0.14569
SD   5   0.04907  0.31846  0.14359  11.00000  0.23570
CE   1   0.11380  0.29170  0.12261  11.00000  0.21476
C    1   0.10634  0.38738  0.39766  11.00000  0.09178
O    4   0.10329  0.45513  0.41972  11.00000  0.16480
RESI 2 GLN
N    3   0.14741  0.35678  0.40741  11.00000  0.08599
CA   1   0.18940  0.39931  0.45565  11.00000  0.09291
CB   1   0.22933  0.34643  0.45886  11.00000  0.13253
CG   1   0.27354  0.38674  0.51173  11.00000  0.09866
CD   1   0.24547  0.38838  0.58387  11.00000  0.05748
OE1  4   0.22482  0.32772  0.60689  11.00000  0.16301
NE2  3   0.24704  0.46053  0.62045  11.00000  0.10164
C    1   0.22198  0.47895  0.43826  11.00000  0.08193
O    4   0.25019  0.48377  0.38408  11.00000  0.10402
RESI 3 LYS
N    3   0.21781  0.54034  0.48673  11.00000  0.07413
CA   1   0.25088  0.62006  0.47934  11.00000  0.05181
CB   1   0.21991  0.68311  0.51795  11.00000  0.09646
CG   1   0.16130  0.66288  0.49255  11.00000  0.10455
CD   1   0.12843  0.72146  0.52924  11.00000  0.22324
CE   1   0.10532  0.70085  0.60053  11.00000  0.26354
NZ   3   0.05943  0.74195  0.62796  11.00000  0.40338
C    1   0.30678  0.63497  0.50917  11.00000  0.05714
O    4   0.31462  0.59598  0.55179  11.00000  0.07986
... etc ...
RESI 12 GLU
N    3   0.41413  1.09215  0.48246  11.00000  0.06790
CA   1   0.37955  1.01183  0.48195  11.00000  0.05761
PART 1
CB   1   0.32666  1.01321  0.52971  21.00000  0.12219
CG   1   0.29679  0.93111  0.54638  21.00000  0.15333
CD   1   0.25357  0.93709  0.60700  21.00000  0.20272
OE1  4   0.24346  1.00278  0.63210  21.00000  0.26315
OE2  4   0.23012  0.87537  0.63031  21.00000  0.21375
PART 2
CB   1   0.32549  1.01718  0.52772 -21.00000  0.12065
CG   1   0.27756  0.94582  0.50954 -21.00000  0.15928
CD   1   0.22547  0.95184  0.55635 -21.00000  0.20457
OE1  4   0.20774  0.90241  0.59575 -21.00000  0.22329
OE2  4   0.20259  1.00588  0.55325 -21.00000  0.31441
PART 0
C    1   0.36477  0.97439  0.40859  11.00000  0.04768
O    4   0.34317  1.00861  0.37369  11.00000  0.06890
... etc ...
RESI 38 CYS
N    3   0.77141  0.92674  0.00625  11.00000  0.10936
CA   1   0.78873  0.97402  0.07449  11.00000  0.13706
PART 1
CB   1   0.83868  1.04271  0.05517  41.00000  0.11889
SG   5   0.89948  1.00271  0.02305  41.00000  0.18205
PART 2
CB   1   0.84149  1.03666  0.06538 -41.00000  0.14933
SG   5   0.83686  1.10360  0.01026 -41.00000  0.17328
PART 0
C    1   0.74143  1.01670  0.10383  11.00000  0.08401
O    4   0.70724  1.02319  0.06903  11.00000  0.10188
RESI 39 CYS
N    3   0.74699  1.04547  0.17051  11.00000  0.08888
CA   1   0.70682  1.09027  0.20876  11.00000  0.06869
CB   1   0.72588  1.11964  0.28230  11.00000  0.04269
SG   5   0.67932  1.17560  0.33481  11.00000  0.08016
C    1   0.70922  1.16093  0.17333  11.00000  0.06208
O    4   0.75427  1.20325  0.15858  11.00000  0.07437
... etc ...
RESI 52 ALA
N    3   0.33596  0.63469  0.69557  11.00000  0.04662
CA   1   0.30961  0.68882  0.74487  11.00000  0.08939
CB   1   0.34040  0.77357  0.74194  11.00000  0.13277
C    1   0.24852  0.67507  0.73435  11.00000  0.09032
OT1  4   0.22236  0.72170  0.77321  11.00000  0.11368
OT2  4   0.22682  0.61667  0.69191  11.00000  0.08341
RESI 54 FE
FE   6   0.72017  1.22290  0.43784  11.00000  0.07929
REM Only the waters with high occupancies and low U's have been
REM retained, and all the occupancies have been reset to 1, with
REM a view to running the automatic water divining. Water
REM residue numbers have been changed to start at 201.
RESI 201 HOH
O    4   0.13450  0.53192  0.60802  11.00000  0.13132
RESI 202 HOH
O    4   0.84795  0.53873  0.69488  11.00000  0.15273
RESI 203 HOH
O    4   0.27771  0.95750  0.25086  11.00000  0.11315
RESI 204 HOH
O    4   0.37066  0.71872  0.90376  11.00000  0.10854
... etc ...
RESI 233 HOH
O    4   0.27813  1.38725  0.25914  11.00000  0.10698
HKLF 3
END
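Because the atoms in an .ins file are given in crystal (fractional) coordinates, bond lengths have to be computed with the metric tensor of the cell. As a plausibility check on the listing, the Python sketch below does this for CA and C of residue 1, using the CELL line and the coordinates as read from the example; the function names are hypothetical.

```python
import math

# a, b, c in Å and alpha, beta, gamma in degrees, from the CELL instruction.
CELL = (24.920, 17.790, 19.720, 101.00, 83.40, 104.50)


def metric_tensor(cell):
    """Return the 3x3 metric tensor G of the unit cell."""
    a, b, c, al, be, ga = cell
    ca, cb, cg = (math.cos(math.radians(x)) for x in (al, be, ga))
    return [[a * a, a * b * cg, a * c * cb],
            [a * b * cg, b * b, b * c * ca],
            [a * c * cb, b * c * ca, c * c]]


def distance(frac1, frac2, cell=CELL):
    """Distance in Å between two points given in fractional coordinates."""
    g = metric_tensor(cell)
    d = [q - p for p, q in zip(frac1, frac2)]
    return math.sqrt(sum(d[i] * g[i][j] * d[j]
                         for i in range(3) for j in range(3)))


ca_met1 = (0.05947, 0.33273, 0.35391)   # CA of residue 1
c_met1 = (0.10634, 0.38738, 0.39766)    # C of residue 1
print(round(distance(ca_met1, c_met1), 3))
```

The result is about 1.55 Å, close to the 1.525 Å C-CA target used for the amino-acid restraints in the listing, which suggests that the coordinates are being read in the right order.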
6. Frequently asked questions (by biocrystallographers)
Q1: Where is the manual?
A: Postscript and MSWord versions of the manual can be downloaded from the SHELX ftp site.
However, this manual was written for small-molecule crystallographers; you will still need it as a
reference book (it even has an index), but you should start by reading the material provided for the
Workshop. There is also a lot of useful information on the SHELX homepage or accessible via
links from it, including tutorials for which test data are available.
Q2: How do I transfer my data, including Rfree flags, from X-PLOR or other programs to
SHELX?
A: Use the ‘Y’ option in SHELXPRO to convert the .fob file to .hkl, and the ‘I’ option to convert
.pdb to .ins. Although SHELXL prefers intensities, for macromolecules it is OK to continue to
use F-values if you were using them in X-PLOR. In CCP4, the mtz2various program can write
SHELX format files. The Bruker XPREP program provides a space-group general option for
transferring Rfree flags from one data-set to another, taking equivalent reflections into account.
Q3: I have a non-standard ligand, how do I make the topology file?
A: SHELXL doesn’t have a topology file; the restraints etc. are all included in the .ins file. One
good way to generate these restraints is to find a suitable fragment in the CSD, then use the ‘J’
option in SHELXPRO. If it’s not in the CSD, you could do a quick small-molecule structure
(using SHELX) and feed that into SHELXPRO.
Q4: Why are the R-factors different from X-PLOR etc.?
A: Check that you are using the same data (F or F², resolution cutoffs, Rfree flags?) and that the
bulk solvent model is not causing problems (it tends to interact with the B-values, so it might be
best to do a few refinement cycles first to sort this out).
Q5: After using SHELXPRO to prepare the .ins file from a PDB file and then running
SHELXL, I get the message: ‘** No match for two atoms in DFIX **’ but otherwise
everything seems OK.
A: This message probably refers to the fact that SHELXPRO labels the oxygens of the
carboxy-terminus OT1 and OT2 so that different bond length restraints can be applied than to the same
type of amino-acid when it is in a peptide chain. This is normal and can be safely ignored. Other
such messages should always be investigated carefully; they may indicate missing or bad
restraints or bad initial connectivity (which can be corrected using BIND and FREE).
Q6: I can solve the structure by molecular replacement in space group P3₂ but the R-factors
are high and the Rsym for P3₂21 was not much higher than for P3₂. What should I do?
A: Your structure may well be merohedrally twinned, but don’t panic! The E-statistics can be
calculated using e.g. SHELXS, SHELXD or XPREP; <|E²–1|> << 0.736 would also suggest
twinning. All you need to do in this particular case is to include the two instructions:
TWIN 0 1 0 1 0 0 0 0 -1
BASF 0.3
in your .ins file and repeat the refinement job! If the BASF parameter (you can find it in the .lst
or .res file) refines to a value intermediate between 0 and 1, and R1 and Rfree drop significantly,
you are winning. No other special action is needed; SHELXPRO and XtalView can be used in the
usual way because the .fcf file is effectively ‘detwinned’.
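The two ideas in this answer can be sketched in a few lines of Python. This is a minimal illustration, not what SHELXL does internally. It assumes ideal acentric Wilson statistics (normalised intensities drawn from an exponential distribution), uses one overall mean instead of resolution shells, and treats the two twin-related intensities as independent. All function names are hypothetical. Under these assumptions <|E²–1|> is 2/e (about 0.736) for untwinned data and 4/e² (about 0.541) for a perfect twin.

```python
import math
import random

# Matrix from the instruction 'TWIN 0 1 0 1 0 0 0 0 -1', written row by row.
TWIN_LAW = ((0, 1, 0), (1, 0, 0), (0, 0, -1))


def apply_twin_law(hkl, matrix=TWIN_LAW):
    """Return the indices of the reflection related to hkl by the twin law."""
    return tuple(sum(row[j] * hkl[j] for j in range(3)) for row in matrix)


def twinned_intensity(i_first, i_second, basf):
    """Mix two twin-related intensities; basf is the fraction of the second domain."""
    return (1.0 - basf) * i_first + basf * i_second


def mean_abs_e2_minus_1(intensities):
    """<|E^2 - 1|> using one overall mean intensity (a simplification)."""
    mean_i = sum(intensities) / len(intensities)
    return sum(abs(i / mean_i - 1.0) for i in intensities) / len(intensities)


def simulate(basf, n=50000, seed=0):
    """Simulate <|E^2 - 1|> for a given twin fraction under the stated assumptions."""
    rng = random.Random(seed)
    data = [
        twinned_intensity(rng.expovariate(1.0), rng.expovariate(1.0), basf)
        for _ in range(n)
    ]
    return mean_abs_e2_minus_1(data)


print(apply_twin_law((1, 2, 3)))  # (2, 1, -3): h and k swap, l changes sign
print("untwinned    :", simulate(0.0), "theory", 2 / math.e)
print("BASF 0.3     :", simulate(0.3))
print("perfect twin :", simulate(0.5), "theory", 4 / math.e ** 2)
```

The simulated value falls steadily as the twin fraction approaches 0.5, which is why a value well below 0.736 is a warning sign. Real data should of course be analysed with SHELXS, SHELXD or XPREP as described above.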
Q7: When is it justified to refine anisotropically?
A: In general if the resolution is worse than about 1.5Å it is unlikely to be worth trying, but it
depends on the completeness and quality of the data and the percentage of solvent. A drop of Rfree
of about 1% or more might be considered to justify anisotropic refinement. In borderline cases
tighter restraints (including possibly ISOR for all atoms) might be needed.
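A crude way to see why resolution matters is to compare the number of unique reflections with the number of refined parameters: an isotropic atom has 4 parameters (x, y, z, U), whereas an anisotropic atom has 9 (x, y, z and six Uij). The Python sketch below does this count. It deliberately ignores restraints (which act as extra observations), hydrogens, occupancies and scale factors. The atom and reflection counts are made-up illustrative numbers, not taken from any structure discussed here.

```python
def data_to_parameter_ratio(n_reflections, n_atoms, anisotropic):
    """Unique reflections per refined parameter, ignoring restraints.

    4 parameters per isotropic atom (x, y, z, U),
    9 per anisotropic atom (x, y, z and six Uij).
    """
    params_per_atom = 9 if anisotropic else 4
    return n_reflections / (n_atoms * params_per_atom)


# Hypothetical case: 1000 non-hydrogen atoms, 36000 unique reflections.
print(data_to_parameter_ratio(36000, 1000, anisotropic=False))  # 9.0
print(data_to_parameter_ratio(36000, 1000, anisotropic=True))   # 4.0
```

The ratio is only a first indication. The drop in Rfree remains the practical test of whether the extra parameters are justified.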
Q8: When should I add hydrogen atoms?
A: As late as possible because they cost computer time, although adding them usually brings a drop
in Rfree of between 0.5 and 1.0%. In many cases it is probably more trouble than it is worth to include the
OH groups; these hydrogens tend to have higher B-values and are more difficult to position
automatically. If one is unlucky, the program will put two OH hydrogens along the same
hydrogen bond, and the combination of anti-bumping restraints and the riding model can then
distort the rest of the structure.
Q9: SHELXL complains that it does not have enough memory, what should I do?
A: Use the larger version SHELXH. If even this is not large enough, you will have to change the
dimensions of the arrays A and B and recompile the program. This is explained in the comments
at the start of the source shelxl.f and compiling instructions are given on the SHELX homepage.
Q10: What does ‘nan’ mean?
A: ‘Not a number’. Something has gone seriously wrong with the calculation. Check the .lst file
for other warning messages and in particular the list of ‘disagreeable restraints’ for indications of
the source of the error. Perhaps you are simply trying to refine more parameters than the data can
support. It is also a good idea to introduce changes in small steps rather than to change everything
at once.
Q11: What citations should be quoted when I write up a structure solved or refined with
SHELX?
A: One or more of the following are suggested, depending on which programs were used:
SHELXL (refinement): Sheldrick, G. M. & Schneider, T. R. (1997). SHELXL: high resolution
refinement. Methods in Enzymology, 277, edited by C. W. Carter, Jr. & R. M. Sweet,
pp. 319–343. San Diego: Academic Press.
SHELXS (direct methods): Sheldrick, G. M. (1990). Phase annealing in SHELX-90: direct
methods for larger structures. Acta Cryst. A46, 467–473.
SHELXS (Patterson): Sheldrick, G. M., Dauter, Z., Wilson, K. S., Hope, H. & Sieker, L. C.
(1993). The application of direct methods and Patterson interpretation to high-resolution native
protein data. Acta Cryst. D49, 18–23.
SHELXD: Usón, I. & Sheldrick, G. M. (1999). Advances in direct methods for protein
crystallography. Curr. Opinion in Struct. Biol. 9, 643–648.
Q12: I have a lot more questions ...
A: Further ‘frequently asked questions’ may be found on the SHELX homepage and in Thomas
Schneider’s FAQ list: http://shelx.uni-ac.gwdg.de/~trs/shelxl_faq/shelxfaq_index.html The latter
contains a great deal of useful information and is primarily intended for macromolecular
crystallographers using SHELXL. When you have really exhausted these sources of information
you may try an email to gsheldr@shelx.uni-ac.gwdg.de, uson@shelx.uni-ac.gwdg.de or
trs@shelx.uni-ac.gwdg.de; questions on twinning may be sent to rherbst@shelx.uni-ac.gwdg.de
7. References
Bricogne, G. (1991). A multisolution method of phase determination by combined maximization
of entropy and likelihood III. Extension to powder diffraction data. Acta Cryst. A47, 803–829.
Brodersen, D. E., de La Fortelle, E., Vonrhein, C., Bricogne, G., Nyborg, J. & Kjeldgaard, M.
(2000). Application of single-wavelength anomalous dispersion at high and atomic resolution.
Acta Cryst. D56, 431–441.
Brünger, A. T. (1992). Free R value: a novel statistical quantity for assessing the accuracy of
crystal structures. Nature (London) 355, 472–475.
Buerger, M. J. (1959). Vector space and its application in crystal structure investigation. New
York: Wiley.
Buerger, M. J. (1964). Image methods in crystal structure analysis. In Advanced methods of
crystallography, edited by G. N. Ramachandran, pp. 1–24. Orlando, Florida: Academic Press.
Busetta, B., Giacovazzo, C., Burla, M. C., Nunzi, A., Polidori, G. & Viterbo, D. (1980). The SIR
program. I. Use of negative quartets. Acta Cryst. A36, 68–74.
Cowtan, K. (1999). Error estimation and bias correction in phase-improvement calculations.
Acta Cryst. D55, 1555–1567.
Cruickshank, D. W. J. (1970). Least-squares refinement of atomic parameters. In
Crystallographic computing, edited by F. R. Ahmed, S. R. Hall & C. P. Huber, pp. 187–197.
Copenhagen: Munksgaard.
Debaerdemaeker, T., Tate, C. & Woolfson, M. M. (1985). On the application of phase relations
to complex structures. XXIV. The Sayre tangent formula. Acta Cryst. A41, 286–290.
De La Fortelle, E. & Bricogne, G. (1997). Maximum-Likelihood Heavy-Atom Parameter
Refinement in the MIR and MAD Methods. Methods in Enzymology 276, edited by C. W. Carter,
Jr. & R. M. Sweet, pp. 472–494. San Diego: Academic Press.
Engh, R. A. & Huber, R. (1991). Accurate bond and angle parameters for X-ray protein structure
refinement. Acta Cryst. A47, 392–400.
Fujinaga, M. & Read, R. J. (1987). Experiences with a new translation-function program. J. Appl.
Cryst. 20, 517–521.
Hendrickson, W. A. (1991). Determination of macromolecular structures from anomalous
diffraction of synchrotron radiation. Science 254, 51–58.
Hendrickson, W. A. & Konnert, J. H. (1980). Incorporating stereochemical information into
crystallographic refinement. In Computing in crystallography, edited by R. Diamond, S.
Ramaseshan & K. Venkatesan, pp. 13.01–13.23. Bangalore: Indian Academy of Sciences.
Herbst-Irmer, R. & Sheldrick, G. M. (1998). Refinement of twinned structures with SHELX97.
Acta Cryst. B54, 443–449.
Hirshfeld, F. L. (1976). Can X-ray data distinguish bonding effects from vibrational smearing?
Acta Cryst. A32, 239–244.
Jones, T. A. (1978). A graphics model-building and refinement system for macromolecules. J.
Appl. Cryst. 11, 268–272.
Jones, T. A., Zou, J. Y., Cowan, S. W. & Kjeldgaard, M. (1991). Improved methods for building
protein models in electron density maps and the location of errors in these models. Acta Cryst.
A47, 110–119.
Karle, J. & Hauptman, H. (1956). A theory of phase determination for the four types of
non-centrosymmetric space groups 1P222, 2P22, 3P12, 3P22. Acta Cryst. 9, 635–651.
Langs, D. A. (1988). Three-dimensional structure at 0.86Å of uncomplexed form of the
transmembrane ion channel peptide gramicidin A. Science 241, 188–191.
Laskowski, R. A., MacArthur, M. W., Moss, D. S. & Thornton, J. M. (1993). PROCHECK, a
program to check the stereochemical quality of protein structures. J. Appl. Cryst. 26, 283–291.
Luecke, H., Richter, H. T. & Lanyi, J. K. (1998). Proton transfer pathways in bacteriorhodopsin
at 2.3Å resolution. Science 280, 1934–1937.
McRee, D. E. (1992). A visual protein crystallographic software system for X11/Xview. J. Mol.
Graph. 10, 44–46.
Miller, R., DeTitta, G. T., Jones, R., Langs, D. A., Weeks, C. M. & Hauptman, H. A. (1993). On
the application of the minimal principle to solve unknown structures. Science 259, 1430–1433.
Miller, R., Gallo, S. M., Khalak, H. G. & Weeks, C. M. (1994). SnB: crystal structure
determination via Shake-and-Bake. J. Appl. Cryst. 27, 613–621.
Moews, P. C. & Kretsinger, R. H. (1975). Refinement of carp muscle parvalbumin by model
building and difference Fourier analysis. J. Mol. Biol. 91, 201–228.
Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Refinement of structures by the
maximum-likelihood method. Acta Cryst. D53, 240–255.
Nordman, C. E. (1966). Vector space search and refinement procedures. Trans. Am. Cryst.
Assoc. 2, 29–38.
Parisini, E., Capozzi, F., Lubini, P., Lamzin, V. S., Luchinat, C. & Sheldrick, G. M. (1999). Ab initio
solution and refinement of two high potential iron protein structures at atomic resolution. Acta
Cryst. D55, 1773–1784.
Parkin, S., Moezzi, B. & Hope, H. (1995). XABS2: an empirical absorption correction program.
J. Appl. Cryst. 28, 53–56.
Pannu, N. S., Murshudov, G. N., Dodson, E. J. & Read, R. J. (1998). Incorporation of prior
phase information strengthens maximum-likelihood structure refinement. Acta Cryst. D54,
1285–1294.
Pannu, N. S. & Read, R. J. (1996). Improved structure refinement through maximum likelihood.
Acta Cryst. A52, 659–668.
Perrakis, A., Morris, R. & Lamzin, V. S. (1999). Automated protein model building combined
with iterative structure refinement. Nature Struct. Biol. 6, 458–463.
Perrakis, A., Sixma, T. K., Wilson, K. S. & Lamzin, V. S. (1997). wARP: Improvement and
extension of crystallographic phases by weighted averaging of multiple-refined dummy atomic
models. Acta Cryst. D53, 448–455.
Rayment, I., Wesenberg, G., Meyer, T. E., Cusanovich, M. A. & Holden, H. M. (1992).
Three-dimensional structure of the high-potential iron-sulfur protein isolated from the purple
phototrophic bacterium Rhodocyclus tenuis determined and refined at 1.5Å resolution. J. Mol.
Biol. 228, 672–686.
Read, R. J. (1985). Improved Fourier coefficients for maps using phases from partial structures
with errors. Acta Cryst. A42, 140–149.
Refaat, L. S. & Woolfson, M. M. (1993). Direct-space methods in phase extension and phase
determination. II. Developments of low-density elimination. Acta Cryst. D49, 367–371.
Richardson, J. W. & Jacobson, R. A. (1987). Computer-aided analysis of multi-solution
Patterson superpositions. In Patterson and pattersons, edited by J. P. Glusker, B. Patterson & M.
Rossi, pp. 311–317. Oxford: I.U.Cr. & O.U.P.
Rollett, J. S. (1970). Least-squares procedures in crystal structure analysis. In Crystallographic
computing, edited by F. R. Ahmed, S. R. Hall & C. P. Huber, pp. 167–181. Copenhagen:
Munksgaard.
Sayre, D. (1974). Least-squares phase refinement. II. High-resolution phasing of a small protein.
Acta Cryst. A30, 180–184.
Selmer, M., Al-Karadaghi, S., Hirokawa, G., Kaji, A. & Liljas, A. (1999). Crystal Structure of
Thermotoga maritima ribosome recycling factor: A tRNA mimic. Science 286, 2349–2352.
Sheldrick, G. M. (1985). Computing aspects of crystal structure determination. J. Mol. Struct.
130, 9–16.
Sheldrick, G. M. (1990). Phase annealing in SHELX-90: direct methods for larger structures.
Acta Cryst. A46, 467–473.
Sheldrick, G. M. (1991). Tutorial on automated Patterson methods to find heavy atoms. In
Crystallographic computing 5, edited by D. Moras, A. D. Podjarny & J. C. Thierry, pp. 145–157.
Oxford: I.U.Cr. & O.U.P.
Sheldrick, G. M. (1993). Refinement of large small-molecule structures using SHELXL-92. In
Crystallographic computing 6, edited by H. D. Flack, L. Párkányi & K. Simon, pp. 111–122.
Oxford: I.U.Cr. & O.U.P.
Sheldrick, G. M. (1998a). Location of heavy atoms by automated Patterson interpretation. In
Direct methods for solving macromolecular structures, edited by S. Fortier, pp. 131–141.
Dordrecht: Kluwer Academic Publishers.
Sheldrick, G. M. (1998b). SHELX: applications to macromolecules. In Direct methods for
solving macromolecular structures, edited by S. Fortier, pp. 401–411. Dordrecht: Kluwer
Academic Publishers.
Sheldrick, G. M., Dauter, Z., Wilson, K. S., Hope, H. & Sieker, L. C. (1993). The application of
direct methods and Patterson interpretation to high-resolution native protein data. Acta Cryst.
D49, 18–23.
Sheldrick, G. M. & Gould, R. O. (1995). Structure solution by iterative peaklist optimization and
tangent expansion in space group P1. Acta Cryst. B51, 423–431.
Sheldrick, G. M. & Schneider, T. R. (1997). SHELXL: high resolution refinement. Methods in
Enzymology, 277, edited by C. W. Carter, Jr. & R. M. Sweet, pp. 319–343. San Diego: Academic
Press.
Shiono, M. & Woolfson, M. M. (1992). Direct-space methods in phase extension and phase
determination. I. Low-density elimination. Acta Cryst. A48, 451–456.
Smith, J. L. (1998). Multiwavelength anomalous diffraction in macromolecular crystallography.
In Direct methods for solving macromolecular structures, edited by S. Fortier, pp. 211–225.
Dordrecht: Kluwer Academic Publishers.
Stenkamp, R. E., Sieker, L. C. & Jensen, L. H. (1990). The structure of rubredoxin from
Desulfovibrio desulfuricans strain 27774 at 1.5Å resolution. Proteins, Struct. Funct. Genet. 8,
352–364.
Terwilliger, T. C. & Berendzen, J. (1996). Bayesian weighting for macromolecular
crystallographic refinement. Acta Cryst. D52, 743–748.
Trueblood, K. N. & Dunitz, J. D. (1983). Internal motion in crystals. The estimation of force
constants, frequencies and barriers from diffraction data. A feasibility study. Acta Cryst. B39,
120–133.
Usón, I., Pohl, E., Schneider, T. R., Dauter, Z., Schmidt, A., Fritz, H.-J. & Sheldrick, G. M.
(1999). 1.7 Å Structure of the stabilised REIv mutant T39K. Application of local NCS restraints.
Acta Cryst. D55, 1158–1167.
Usón, I. & Sheldrick, G. M. (1999). Advances in direct methods for protein crystallography.
Curr. Opinion in Struct. Biol. 9, 643–648.
Usón, I., Sheldrick, G. M., de la Fortelle, E., Bricogne, G., di Marco, S., Priestle, J. P., Grütter, M. G.
& Mittl, P. R. E. (1999). The 1.2Å crystal structure of hirustasin reveals the intrinsic flexibility of
a family of highly disulphide bridged inhibitors. Structure 7, 55–63.
Walsh, M. A., Dementieva, I., Evans, G., Sanishvili, R. & Joachimiak, A. (1999). Taking MAD to
the extreme: ultrafast protein structure determination. Acta Cryst. D55, 1168–1173.
Xu, H., Weeks, C. M., Deacon, A. M., Miller, R. & Hauptman, H. A. (2000). Ill-conditioned
Shake-and-Bake: the trap of the false solution. Acta Cryst. A56, 112–118.
Yeates, T. O. & Fam, B. C. (1999). Protein crystals and their evil twins. Structure 7, R25–R29.
8. Other useful sources of information
The following internet addresses may be consulted for programs discussed during the Workshop:
SHELX (George Sheldrick):
http://shelx.uni-ac.gwdg.de/SHELX/
PLATON (Ton Spek): http://www.cryst.chem.uu.nl/platon/
WinGX (Louis Farrugia): http://www.chem.gla.ac.uk/~louis/wingx/
XtalView (Duncan McRee): http://www.scripps.edu/pub/dem-web/toc.html
Raster3D (Ethan Merritt): http://www.bmsc.washington.edu/raster3d/
Parvati (Ethan Merritt): http://www.bmsc.washington.edu/parvati/
and other speakers’ homepages are as follows:
Bill Clegg: http://www.staff.ncl.ac.uk/w.clegg/
Regine Herbst-Irmer: http://shelx.uni-ac.gwdg.de/~rherbst/
Thomas Schneider: http://shelx.uni-ac.gwdg.de/~trs/
Dale Tronrud: http://www.uoxray.uoregon.edu/dale/welcome.html
Victor Young: http://www.chem.umn.edu/services/xraylab/
Hartmut Luecke had to withdraw as a speaker at short notice, but some of the material for his
intended talk on the perils of ignoring twinning in macromolecules can be found at:
http://anx12.bio.uci.edu/~hudel/br/twinning/
In addition, the strongly recommended books “Crystal Structure Determination” by Werner
Massa, translated into English by Robert O. Gould (Springer, 2000) and “Practical Protein
Crystallography”, 2nd Edition, by Duncan McRee (Academic Press, 1999) contain detailed
accounts of the use of SHELX (for small molecules and macromolecules, respectively).