SHELX Workshop
St. Paul ACA Meeting 22nd July 2000

Contents

1. Workshop program and aims
2. Introduction to SHELX
3. SHELXD – integrated direct and Patterson methods (beta-test)
4. Guide to SHELX for macromolecules: Phasing
5. Guide to SHELX for macromolecules: Refinement
6. Frequently asked questions (by biocrystallographers)
7. References
8. Further useful sources of information

1. Workshop Program

The Workshop is divided into four sessions, with a discussion period after each session. Each discussion is led by a panel consisting of the session chair and the speakers for that session.

A. Introduction, phasing etc. Chair: Duncan McRee

 8:30 –  8:45  George Sheldrick      Historical introduction to SHELX
 8:45 –  9:10  George Sheldrick      Dual-space ab initio direct methods in SHELXD
 9:10 –  9:35  Thomas Schneider      MAD phasing
 9:35 – 10:00  Louis Farrugia        The WinGX user interface
10:00 – 10:20  Discussion
10:20 – 10:35  Coffee/tea

B. Structure refinement. Chair: Ethan Merritt

10:35 – 11:00  Dale Tronrud          Introduction to refinement, solvent model
11:00 – 11:20  George Sheldrick      Restraints and constraints
11:20 – 11:45  Bill Clegg            Weak data, disorder and other problems in small molecules
11:45 – 12:10  Thomas Schneider      Disorder in macromolecules
12:10 – 12:30  Discussion
12:30 – 13:30  Buffet lunch

C. Twinning. Chair: George Sheldrick

13:30 – 13:45  Regine Herbst-Irmer   Racemic twinning and the Flack parameter
13:45 – 14:10  Regine Herbst-Irmer   Merohedral twins
14:10 – 14:35  Victor Young          Non-merohedral twins
14:35 – 15:00  Thomas Schneider      Twinning in macromolecules
15:00 – 15:20  Discussion
15:20 – 15:35  Coffee/Tea

D. Errors, validation and anisotropic refinement. Chair: Bill Clegg

15:35 – 16:00  Ton Spek              Small-molecule validation
16:00 – 16:20  George Sheldrick      Estimation of parameter errors
16:20 – 16:45  Ethan Merritt         Anisotropic refinement of macromolecules
16:45 – 17:10  Duncan McRee          Validation of error estimates for metalloproteins
17:10 – 17:30  Discussion

1.1 Aims and organization of the Workshop

Although centered on a particular program system, it is intended that the Workshop should be educational; previous experience of the SHELX programs should not be essential (though it will clearly help). Applications to small molecules and macromolecules have been mixed up as thoroughly as possible; an exchange of ideas must surely be beneficial to both groups of crystallographers. The gap between the two approaches has long since disappeared. Large small molecules that are bigger than small proteins are now being solved by molecular replacement or anomalous dispersion methods, and small proteins are being solved by direct methods. Anisotropic refinement with full-matrix estimation of standard deviations is now practicable for macromolecules that diffract to high resolution, and the techniques used to model disordered solvent in small molecule structures often now involve restraints developed first for macromolecular refinements.

With these notes, we have tried to provide an introduction to the theory and application of the new structure solution program SHELXD, which is proving very useful for the ab initio solution of larger small molecules given data to atomic resolution, as well as for the location of heavier atoms or anomalous scatterers from MAD, SIR, SIRAS and SAS data at much lower resolution. We have also tried to provide a simple introduction to the SHELX system for biological crystallographers using the programs for the first time, e.g. for the refinement of proteins at high resolution or the refinement of twinned macromolecules at any resolution.
No attempt has been made to deal with routine small-molecule applications since these are well covered by the existing documentation.

In order to maximize the information content, the Workshop will consist of talks and discussions rather than computer demonstrations. There is a generous allocation of time for discussions and participants are encouraged to make good use of this to ask awkward questions. The SHELX Workshops in Göttingen have covered similar ground in about a week, so the program is intensive and will require good teamwork from the speakers. Computers will be available during exhibit hours at the ACA Meeting, so participants who would like to try out some of the programs on their own data should contact the appropriate speakers. A useful byproduct of the Workshop is the production of tutorials, documentation and examples that have been made generally available on the Internet (via links from the SHELX homepage at http://shelx.uni-ac.gwdg.de/SHELX/).

2. Introduction to SHELX

2.1 History

The original version of SHELX consisted of about 5000 lines of FORTRAN written around 1970 for the solution and refinement of small-molecule and inorganic structures from single crystal diffraction data. Starting in 1976, this version was distributed in compressed form so that the program and test data fitted into one box of ca. 2000 punched cards. SHELX76 was restricted to 160 atoms because of computer limitations! A separate structure solution program SHELXS was released in 1986 to accommodate advances in direct methods, and in 1993 SHELXL replaced the structure refinement part of SHELX76. There was a major update of both SHELXS and SHELXL in 1997 and these are still the current versions. The SHELX97 system includes a program CIFTAB for processing CIF format files that can be used for archiving structural data, and programs SHELXPRO and SHELXWAT designed more specifically for macromolecular applications.
At this Workshop a beta-test version of a new integrated Patterson and direct methods program SHELXD is being released; it is proving particularly useful in MAD phasing of macromolecules as well as for the ab initio solution of structures intermediate in size between small molecules and macromolecules (in the range 200-2000 unique atoms).

The SHELX system consists purely of programs that input and output text files. Several excellent graphical interfaces are available from other authors. At the Workshop three such interfaces (PLATON and WinGX for small molecules and XtalView for macromolecules), which like SHELX are available free to academics, will be introduced by their authors. SHELXTL, a commercial version of SHELX incorporating the interactive graphics programs XPREP (reciprocal space exploration) and XP (real space calculations and display), is available from Bruker-AXS.

2.2 Program organization and philosophy

SHELX is written in a simple subset of FORTRAN-77 that has proved to be extremely portable. The programs SHELXS (structure solution) and SHELXL (refinement) both require only two input files: a reflection file (name.hkl) and a file (name.ins) that contains crystal data, atoms (if any), and instructions in the form of keywords followed by free-format numbers, etc. These programs write a listing file name.lst and a file, name.res, that can be renamed or edited to name.ins for the next refinement. The common first part of the filename is read from the command line by typing, e.g., ‘shelxl name’. The programs are executed independently without the use of any hidden files, environment variables, etc.

The programs are general for all space-groups in conventional settings or otherwise and make extensive use of default settings to keep user input and confusion to a minimum. Particular care has been taken to test the programs thoroughly on as many computer systems and crystallographic problems as possible before they were released, a process that often requires several years!

2.3 Distribution of the programs

The programs are provided as sources as well as precompiled executables for common computer systems, and may be downloaded by ftp or using a browser (CDROMs are also available). The programs are free of charge for academics but a modest license fee (currently $2499) is required for for-profit institutions. This license covers the use of the programs for an unlimited time on an unlimited number of computers at one geographical location. This fee is necessary to cover the costs of distribution and support for all users; we do not make a profit, but the university requires us to cover our costs. When there is a major new release a new license fee is required for the new version. There will be no additional license fee for the beta-test of SHELXD, but the final version of this program will be released at the same time as the next major SHELX update in 2001 or 2002 and so will require a license fee. To encourage for-profit users to switch to the new version and to prevent a bug-ridden version remaining in circulation, the beta-test is provided in compiled form only and has a built-in expiry date. The final version will be made available as usual in source form without an expiry date.

All users are required to fill in and sign an application form before they are given the password for downloading the programs from the SHELX ftp server; this form may be printed from the SHELX homepage.

2.4 Documentation and support

Information about new developments in the SHELX programs, workshops, related programs, frequently asked questions and other sources of information is posted on the SHELX homepage at http://shelx.uni-ac.gwdg.de/SHELX/, which should be checked at regular intervals. A detailed SHELX manual may be downloaded from the SHELX ftp server in Microsoft Word or in Postscript format. This was written with small molecule users in mind and contains a full explanation of the test structures that are provided with the programs.
Since macromolecular users may be unfamiliar with these examples, these notes include a separate guide for macromolecular Workshop participants. The author is happy to answer questions (email only please, gsheldr@shelx.uni-ac.gwdg.de) provided that the questions are not in the lists of ‘frequently asked questions’!

3. SHELXD – integrated Patterson and direct methods (beta-test)

3.1 Introduction

Although the solution of the crystallographic phase problem is proving more elusive than Fermat’s last theorem, in practice the large majority of small molecule structures are solved in minutes (or even seconds) by conventional direct methods. However, the phase probability distributions on which these methods are based become weaker as the number of atoms increases, and few structures with more than about 200 unique equal atoms have been solved in this way. After more than a decade in which little progress was made in solving larger structures, the introduction of the dual-space (also known as Shake & Bake) philosophy by the Buffalo group (Miller et al., 1993) proved to be a significant improvement, increasing the size of structure that could be solved by nearly an order of magnitude (Figure 3.1).

Figure 3.1 A general view of dual-space direct methods. The phase refinement (in reciprocal space) is usually performed using the tangent formula (Karle & Hauptman, 1956) or minimal function (Miller et al., 1993); the atomic model in real space may simply involve picking the highest N peaks or may be more sophisticated.

This procedure, which was implemented in the computer programs SnB (Miller et al., 1994) and later in SHELXD, was of necessity based on the strongest normalized structure factors E, corresponding typically to the largest 15 to 20% of the observed structure factors F in each resolution shell, because the probability formulae only provide significant phase information for the strongest E-values.
The number of unique non-hydrogen atoms N is assumed to be approximately known. The dual-space recycling is typically performed for several hundred or more sets of N random starting atoms, with typically 2N cycles for each. In SHELXD, potential solutions are identified by high values of the correlation coefficient CC (Fujinaga & Read, 1987):

$$\mathrm{CC} = \frac{100\,\left[\sum w E_o^2 E_c^2 \cdot \sum w - \sum w E_o^2 \cdot \sum w E_c^2\right]}{\left\{\left[\sum w E_o^4 \cdot \sum w - \left(\sum w E_o^2\right)^2\right] \cdot \left[\sum w E_c^4 \cdot \sum w - \left(\sum w E_c^2\right)^2\right]\right\}^{1/2}}$$

These potential solutions can be improved and extended by means of peaklist optimization (Sheldrick & Gould, 1995) that finds the set of potential atoms that maximizes CC for all reflections. The structure solution, as monitored by the mean phase error, tends to happen quite suddenly over a small number of cycles. Although there is little indication of an impending solution, a single dominant peak in real space typically indicates that the phase refinement is locked in a false minimum (Xu et al., 2000).

3.2 Random omit maps

In the course of testing SHELXD, it was discovered by accident that a very effective procedure is to leave out about 30% of the peaks at random when calculating phases for the next cycle. In retrospect it is possible to understand why this is an effective search strategy, by analogy with the omit maps frequently used in macromolecular crystallography. If the deleted atoms are part of an essentially correct solution, they will probably be regenerated; if not, they will be replaced by different, and possibly better, potential atoms. The effectiveness of this random omit procedure is illustrated in Figure 3.2 using gramicidin A (NS = 317; P212121) as a test structure; gramicidin A was probably the most difficult structure solved by conventional direct methods (Langs, 1988). At least for this structure, the most effective approach involved the combination of the tangent formula in reciprocal space with random omit maps in real space; other attempts at modifying the peak list were much less successful.
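The correlation coefficient and the random omit step described above can be sketched in a few lines of Python. This is only an illustration and not the SHELXD source: the function names `correlation_coefficient` and `random_omit`, the list-based data layout, the default of unit weights and the representation of a peak as a `(height, x, y, z)` tuple are all assumptions made here.

```python
import math
import random


def correlation_coefficient(e_obs, e_calc, weights=None):
    """Weighted CC (in percent) between Eo^2 and Ec^2, term by term as in the formula."""
    if weights is None:
        weights = [1.0] * len(e_obs)  # assumption: unit weights when none are given
    sw = swo = swc = swoc = swoo = swcc = 0.0
    for w, eo, ec in zip(weights, e_obs, e_calc):
        o, c = eo * eo, ec * ec
        sw += w              # sum of w
        swo += w * o         # sum of w Eo^2
        swc += w * c         # sum of w Ec^2
        swoc += w * o * c    # sum of w Eo^2 Ec^2
        swoo += w * o * o    # sum of w Eo^4
        swcc += w * c * c    # sum of w Ec^4
    numerator = swoc * sw - swo * swc
    denominator = math.sqrt((swoo * sw - swo ** 2) * (swcc * sw - swc ** 2))
    return 100.0 * numerator / denominator


def random_omit(peaks, n_atoms, omit_fraction=0.3, rng=random):
    """Keep the highest n_atoms peaks, then leave out omit_fraction of them at random.

    peaks is a list of (height, x, y, z) tuples (hypothetical layout).
    """
    top = sorted(peaks, key=lambda p: p[0], reverse=True)[:n_atoms]
    n_keep = len(top) - round(omit_fraction * len(top))
    return rng.sample(top, n_keep)
```

The atoms returned by `random_omit` would be the ones used to calculate phases for the next cycle; the omitted 30% are free to reappear, or to be replaced, in the next map.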
Note that line (c) corresponds to the original Shake & Bake procedure. A surprising observation in Figure 3.2 is that the combination (d) [no phase refinement/random omit] is able to solve this structure (albeit less efficiently) although no phase probability relations have been employed! This is important because, unlike the random omit maps, the probabilities become weaker as the structure becomes larger. This provides the important clue that for much larger structures, it might be more efficient to discard the probabilistic approach to direct methods completely!

Figure 3.2 Percentage of correct solutions P against cycle number for gramicidin A using various combinations of phase refinement and real space processing: (a) tangent / random omit; (b) minimal function / random omit; (c) minimal function / top N peaks; (d) no phase refinement / random omit. In the random omit procedure, the highest N peaks were found and 30% of them omitted at random.

It should be emphasized that direct methods are almost entirely a phase searching problem; phase refinement plays a minor role. There are much better ways of refining phases than the tangent formula or the minimal function. For example, Sayre (1974) showed that it was possible to refine the phases of the small protein rubredoxin with 1.5 Å data by a least-squares fit to his squaring equation:

$$F_h = Q_h \sum_k F_k F_{h-k}$$

where $Q_h$ is a constant, assuming equal atoms and equal isotropic displacement parameters. This equation equates amplitudes as well as phases, compared with just phases in the case of the tangent formula, and is equally valid for large and small structures, whereas probability formulas become weaker as the size of the structure increases. On the other hand the use of all the data rather than just a small subset of the strongest E-magnitudes probably makes it less suitable for searching phase space.

Table 3.1 Some previously unsolved structures first solved using SHELXD. SG = space group, N is the number of unique non-hydrogen atoms excluding solvent, NS the number including solvent atoms. HA lists the unique atoms heavier than oxygen, if any, and dmin is the limiting resolution to which data were processed.

Compound         SG        N     NS    HA    dmin(Å)
Hirustasin       P43212    402   467   10S   1.20
Cyclodextrin     P21       448   467   -     1.00
Cyclodextrin     P1        483   562   -     0.88
Decaplanin       P21       448   635   4Cl   1.00
Amylose CA26     P1        624   771   -     1.10
Mersacidin       P32       750   826   24S   1.04
rc-WT Cv HiPIP   P212121   1264  1599  8Fe   1.20
Cytochrome c3    P31       2024  2208  8Fe   1.20

3.3 Application to unknown structures

The best demonstration of the power of a new method is its ability to solve previously unsolved structures; Table 3.1 shows some examples of this. It should be noted that the presence of heavier atoms definitely improves the chances of success, and reduces the computer time needed per solution, but is not essential. It should also be noted that these successes are limited to structures for which data were available to atomic resolution (ca. 1.2 Å) or better. The only exception is hirustasin (Usón et al., 1999) which could be solved using either the 1.2 Å (low temperature) or the 1.4 Å (room temperature) data, even if the data were truncated to 1.55 Å.

3.4 Integration with other approaches

The extension of these algorithms to lower resolution and to larger structures is the subject of intensive current research by the groups in Bari, Buffalo, Göttingen and York. An obvious extension is to search for small groups of atoms (e.g. the five atoms of a peptide group) rather than for individual atoms, but unfortunately this is very computer time intensive. Peak-picking is after all an extreme form of density modification, and the low density elimination procedure of Woolfson and co-workers (Shiono & Woolfson, 1992; Refaat & Woolfson, 1993) may provide a good compromise between peak picking and techniques normally applied to improve maps at lower resolution.
Solvent boundaries have apparently not yet been included in direct methods programs. A promising (but complicated) alternative would be to incorporate the wARP approach (Perrakis, Lamzin et al., 1997, 1999) of refining the positions and B-values of potential atoms, adding new atoms that correspond to high difference density and make chemical sense. The peak positions from dual space direct methods are relatively precise, and simply refining B-values against all data can significantly improve map interpretability (Usón et al., 1999; Parisini et al., 1999), as shown in Figure 3.3.

Figure 3.3 (a) Part of the electron density map produced by dual-space recycling followed by peaklist optimisation in the ab initio solution of a HiPIP protein (Parisini et al., 1999) and (b) the same region of a sigma-A weighted 2mFo-DFc map (Read, 1985) after B-value refinement. Although the atom positions were held fixed in this refinement, density appears at the sites of the missing atoms.

3.5 Direct methods for the location of anomalous scatterers

In principle the MAD approach (Hendrickson, 1991; Smith, 1998), in which data are collected at two or more wavelengths for which the f′ and f″ anomalous scattering factors are non-zero for at least one of the elements present, determines experimental phases directly. There is however a hidden phase problem: it is still necessary to find the positions of the anomalous scatterers in order to calculate reference phases. Without these reference phases the protein phases cannot be found. Although conventional direct methods and Patterson interpretation programs such as SHELXS-97 can be misused to find a small number of anomalous scatterers from the MAD estimates of the structure factors for these atoms alone (FA) or from the SAS (sometimes referred to as SAD) anomalous differences ΔF = F+ – F-, the number of sites that can be found in this way is limited to about 20.
The main problem is that the data are noisy since they are based on differences of observed structure factors; the best antidote is to collect highly redundant data. On the other hand the resolution and completeness of the FA data are not critical; 3.5 Å is adequate, since the anomalous atoms are more than 3.5 Å apart, and the problem is still highly overdetermined. Although higher resolution and completeness are not required to find the anomalous scatterers, they do have a major influence on the quality of the resulting electron density maps (Brodersen et al., 2000).

Before attempting to use MAD or SAS data to locate the anomalous scatterers, a critical decision is to which resolution the data should be truncated. If data are used to a higher resolution than there is significant dispersive and anomalous information, the effect will be to add noise. Since direct methods are based on normalised structure factors, which emphasise the high resolution data, they are particularly sensitive to this. Since there is some anomalous signal at all the wavelengths in the MAD experiment, a good test is to calculate the correlation coefficient between the signed anomalous differences ΔF at different wavelengths as a function of the resolution. A good general rule is to truncate the data where this correlation coefficient falls below about 25 to 30%. Table 3.2 illustrates three very different cases. This procedure can also indicate if there is a problem with the wavelength. For SAS data collected at a single wavelength it is still possible to use the correlation coefficient between the anomalous differences collected from two crystals, or from one crystal in two orientations, before merging the two data-sets.

Table 3.2 Correlation coefficients expressed as percentages between the high energy remote data and the two or three other wavelengths collected in MAD experiments on three different proteins. In (a) the high values involving the peak (pk) and inflection point (ip) data show that it is not necessary to truncate the data; there is significant MAD information up to the highest resolution collected. A poorer correlation would be expected with the low energy remote data (lrm) which has a much smaller anomalous signal. In (b) it would be advisable to truncate the data to about 3.9 Å (which indeed led to a successful solution using SHELXD). (c) is clearly hopeless and in fact could not be solved.

(a) Apical Domain (Walsh et al., 1999), 1 x (3 Se-Met in 144aa), C2221

Shell (Å)     pk     ip     lrm
Inf - 8.0    91.2   89.7   48.5
8.0 - 6.0    93.9   90.0   52.8
6.0 - 5.0    93.9   87.0   52.9
5.0 - 4.0    89.6   84.4   38.0
4.0 - 3.6    88.6   79.8   28.4
3.6 - 3.4    89.4   78.9   34.6
3.4 - 3.2    89.4   79.4   14.2
3.2 - 3.0    83.9   74.7   21.1
3.0 - 2.8    76.9   71.1   24.7
2.8 - 2.6    65.7   54.3    9.1
2.6 - 2.4    57.0   47.2    5.4
2.4 - 2.2    44.8   39.2   -3.7

(b) RRF (Selmer et al., 1999), 1 x (4 Se-Met in 185aa), P43212

Shell (Å)     pk     ip
Inf - 8.0    69.3   59.4
8.0 - 6.0    73.1   58.3
6.0 - 5.0    62.2   41.9
5.0 - 4.6    56.9   43.3
4.6 - 4.4    49.6   40.7
4.4 - 4.2    45.6   50.4
4.2 - 4.0    48.6   34.6
4.0 - 3.8    29.6   24.7
3.8 - 3.6    20.6   17.5
3.6 - 3.4    24.6   16.6
3.4 - 3.2    20.1    8.1
3.2 - 3.0    14.2    3.9

(c) Unknown Protein, 4 x (4 Se-Met in 350aa), P21

Shell (Å)     pk     ip
Inf - 8.0    33.2   37.6
8.0 - 6.0    29.5   38.9
6.0 - 5.0    19.9   37.8
5.0 - 4.6    10.6   26.5
4.6 - 4.4     7.7   13.5
4.4 - 4.2    17.4   24.0
4.2 - 4.0     7.6   14.2
4.0 - 3.8     9.8   27.3
3.8 - 3.6     9.3   25.9
3.6 - 3.4    13.4   23.1
3.4 - 3.2     6.0   24.3
3.2 - 3.0     2.8   22.8

3.6 Integration of direct and Patterson methods

The original dual-space algorithm is an effective way of locating a specified number of anomalous scatterers from MAD FA or SAS ΔF data. The efficiency can however be improved by at least an order of magnitude by using starting atoms consistent with the Patterson function rather than random starting atoms. In addition, the Patterson provides a reliable relative (but not absolute) indication of the correctness of the solution and also of which atoms are probably correct.
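As an aside on the truncation test of section 3.5, the rule of cutting the data where the correlation coefficient falls below about 25 to 30% can be applied to the shell values of Table 3.2 with a few lines of Python. This is a sketch under stated assumptions: the function names, the 30% default threshold and the choice to stop at the first shell below the threshold are illustrative and are not taken from any SHELX program.

```python
import math


def pearson_percent(x, y):
    """Correlation coefficient (in percent) between two lists of signed differences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return 100.0 * sxy / math.sqrt(sxx * syy)


def suggest_cutoff(shell_edges, cc_percent, threshold=30.0):
    """High-resolution edge of the last shell before CC drops below the threshold.

    shell_edges has one more entry than cc_percent and runs from low to high
    resolution (the first entry may be infinity). Returns None if even the
    lowest-resolution shell is below the threshold.
    """
    cutoff = None
    for i, cc in enumerate(cc_percent):
        if cc < threshold:
            break  # assumption: stop at the first shell that fails the test
        cutoff = shell_edges[i + 1]
    return cutoff


# Peak (pk) column for RRF, Table 3.2(b)
edges_b = [float("inf"), 8.0, 6.0, 5.0, 4.6, 4.4, 4.2, 4.0, 3.8, 3.6, 3.4, 3.2, 3.0]
pk_b = [69.3, 73.1, 62.2, 56.9, 49.6, 45.6, 48.6, 29.6, 20.6, 24.6, 20.1, 14.2]
print(suggest_cutoff(edges_b, pk_b))  # prints 4.0
```

For the RRF peak data this suggests a limit of 4.0 Å, close to the value of about 3.9 Å quoted in the caption of Table 3.2. `pearson_percent` would be applied per resolution shell to the matched ΔF values from two wavelengths (or two crystals) to produce the `cc_percent` list.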

The location of possible starting atoms makes extensive use of a special form of the Patterson minimum function (PMF) which is calculated as follows (Nordman, 1966). Place two atoms in a unit-cell and generate all their symmetry equivalents. Look up the Patterson function values corresponding to the unique vectors between all these atoms and sort them in ascending order, then find the mean value of the lowest (say) 30% of the values in this list. Since it is unlikely that this PMF will have a high value for wrong atom positions, especially when the symmetry is high and there are many vectors, it may be used as a criterion for a translational search for a two-atom fragment. Each strong general Patterson peak is in principle a suitable two-atom ‘fragment’ for this translational search, because it may well correspond to a vector between two heavy atoms!

Since we are only interested in generating many different sets of atom co-ordinates consistent with the Patterson function, there is no need to determine the global maximum PMF; indeed often this does not give good starting atoms for the dual-space recycling. A simple and effective approach is to try a fixed number (usually in the range 10000 to 99999) of random translations for a vector, and retain the one with the highest PMF. A random selection of vectors from the Patterson peaklist (excluding Harker peaks), biased so that the high peaks are chosen more often, is an effective way to pick the two-atom search fragment. If the atoms are expected to have lower than average B-values, as is the case for iron atoms in heme groups or iron-sulfur clusters, it is advantageous to sharpen the Patterson, e.g. by using coefficients $(E^3F)^{1/2}$ rather than $F^2$.

Table 3.3 Application of integrated Patterson / direct methods to the location of the anomalous scatterers from MAD data. In the number of sites column, the first number is the number found and the second is the total number that should be present. aa = amino acid, SG = space group, dmin is the limiting resolution to which the data were processed, and Soln/hr is the number of solutions per hour on a 500 MHz Pentium PC. PAT-ratio is the speedup obtained by using starting atoms consistent with the Patterson function.

Protein   No. of sites   No. of aa   SG       dmin[Å]   CC[%]   PAT-ratio   Soln/hr
Api-dm    3/3 Se         144         C2221    2.2       45      16          256
RRF       3/4 Se         185         P43212   4.0       60      1.4         283
ModE      6/6 Se         524         P21212   3.0       66      7.3         163
9hem      18/18 Fe       584         P21      2.9       73      4.0         240
X1        32/32 Se       ~1600       C2       3.5       49      5.1         9
Cyanase   40/40 Se       1560        P1       2.4       57      0.95        66
X2        51/60 Se       ~1500       P21      2.5       52      2.8         13
X3        66/66 Se       2160        P21      2.6       60      12.5        24

Before the first dual-space cycle, the two starting atoms need to be extended to N atoms. A difference Fourier synthesis would be effective for a small number of heavy atoms, but a better technique for a large number is to calculate a full-symmetry Patterson superposition minimum function (PSMF) (Buerger, 1959). First all symmetry equivalents are generated for the two starting atoms. Each pixel of the PSMF map is assigned a value equal to the PMF for all vectors between these atoms and a dummy atom placed at the pixel. Peaks are then obtained by map interpolation and sorted in the usual way. By applying this procedure before each run through the dual-space recycling, it is possible to generate an unlimited number of different sets of starting atoms, all more or less consistent with the Patterson function.

Our tests have shown that this combination of direct and Patterson methods produces more complete and precise solutions than just using the Patterson methods alone. It appears that iterative Patterson-only procedures suffer from an accumulation of atomic co-ordinate errors each time a new atom is added. Because it includes phase refinement, the dual-space approach does not suffer from this degradation as the number of atoms increases.
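
The Patterson minimum function and the random translational search for a two-atom fragment described in this section can be sketched as follows. This is a simplified illustration, not the SHELXD implementation: `patterson` is a hypothetical callable that returns the Patterson value at a fractional vector, symmetry operators are passed as plain Python functions on fractional coordinates, and duplicate vectors are not merged.

```python
import random


def pmf(patterson_values, fraction=0.3):
    """Mean of the lowest `fraction` of the Patterson values (at least one value)."""
    values = sorted(patterson_values)  # ascending order
    n_low = max(1, round(fraction * len(values)))
    return sum(values[:n_low]) / n_low


def pair_vectors(atoms, symops):
    """Vectors between all pairs of symmetry equivalents, reduced to the unit cell."""
    equivalents = [op(atom) for atom in atoms for op in symops]
    vectors = []
    for i, p in enumerate(equivalents):
        for q in equivalents[i + 1:]:
            vectors.append(tuple((b - a) % 1.0 for a, b in zip(p, q)))
    return vectors


def translation_search(vector, patterson, symops, n_trials=10000, seed=0):
    """Try random positions for a two-atom fragment and keep the highest PMF.

    vector is the (fractional) Patterson vector joining the two atoms.
    Returns (best PMF, atom 1, atom 2).
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        atom1 = (rng.random(), rng.random(), rng.random())
        atom2 = tuple((x + v) % 1.0 for x, v in zip(atom1, vector))
        score = pmf(patterson(v) for v in pair_vectors([atom1, atom2], symops))
        if best is None or score > best[0]:
            best = (score, atom1, atom2)
    return best
```

Because only the best of a fixed number of random trials is retained, repeated calls with different seeds give different starting pairs, which is the intention stated above: many sets of atoms consistent with the Patterson rather than the global maximum of the PMF.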
Table 3.3 shows some results using this integrated Patterson / dual space recycling procedure on typical MAD problems; note the efficiency in terms of solutions per hour and the completeness of the solutions.

Table 3.4 Crossword table for location of the 8 iron atoms (two Fe4S4 clusters) in a HiPIP from SAS ΔF data collected with Cu-Kα radiation (Rayment et al., 1992). Each entry in the table links the atom forming the row with the atom forming the column; the top number of each pair is the minimum distance between the two atoms, taking symmetry into account, and the bottom number is the corresponding PMF. It is easy to find the two clusters by looking for Fe…Fe distances of about 2.8 Å, and, despite the weakness of the anomalous signal, the PMF values for the 8 correct atoms are in general higher than those involving spurious atoms.

Peak x y z self cross-vectors
99.9 0.9201 0.0784 0.1133 27.7 26.6
88.4 0.9719 0.1047 0.1356 27.4 39.7 2.4 5.5
85.5 0.9043 0.1258 0.0884 27.7 27.3 2.6 23.3
82.7 0.9546 0.0950 0.0503 26.7 15.2 2.3 2.5 2.7 28.4 43.5 26.4
81.1 0.3542 0.5285 0.2615 31.2 20.9 14.6 16.6 14.4 14.6 41.4 14.8 9.5 21.5
80.5 0.4316 0.5144 0.2451 30.0 25.5 16.5 18.7 16.4 16.8 24.6 20.0 21.2 8.9
80.4 0.3942 0.5575 0.1995 29.6 0.0 14.4 16.4 13.9 14.6 2.7 2.9 31.4 7.7 22.6 33.8 26.6 19.4
73.9 0.3920 0.5023 0.1694 29.1 26.1 14.3 16.6 14.5 14.8 3.2 22.3 16.0 24.5 18.3 10.9 3.0 5.5 3.0 0.0 2.6 3.0 0.0 17.5
--------------------------------------------------------------------
63.8 0.4025 0.4641 0.2218 29.9 18.4
58.9 0.9655 0.0517 0.0945 26.9 45.9 16.1 18.4 16.4 16.5 17.0 13.1 0.0 4.5 2.2 3.0 7.3 15.8 4.5 7.8 4.0 0.0 2.9 5.4 5.0 0.0 2.6 15.2 17.3 15.4 5.3 0.0 0.0 6.1

The Patterson superposition function is also the basis of the above crossword table that provides a convenient way to assess which of the heavy atom sites are correct, and also in some cases to recognize the presence of non-crystallographic symmetry.
In this table the rows and columns correspond to the potential atoms. For each pair of atoms the top number is the minimum distance between them, taking the space group symmetry into account, and the bottom number is the PMF calculated from all vectors between the two atoms, also taking symmetry into account. The first vertical column is based on the self-vectors, i.e. between one atom and its symmetry equivalents. In general wrong sites can be recognized in this table by the presence of several zero PMF values (negative values are replaced by zero). The mean PMF value for a specified number of atoms provides a figure of merit PATFOM, useful for selecting the best solution, though the absolute value depends on the structure in question. Almost always the correct solution has the largest CC and the largest PATFOM; this was the case for all the examples in Table 3.3. Table 3.4 illustrates a typical crossword table. It is fortuitous that all four iron atoms in one cluster appear before those in the other in this table, but one of the two independent molecules did indeed have higher B-values than the other.

3.7 The .ins file instructions for SHELXD

SHELXD expects ONE and only one source of starting atoms. This can take the form:

A: Input atoms in normal SHELX format for expansion using PLOP
B: PATS to generate ‘slightly better than random’ atoms consistent with the Patterson
C: GROP and a PDB-format model fragment
D: Random atoms (used if none of the above apply)

The reflection data consist of an .hkl file containing $F^2$ (HKLF 4) or F-values (HKLF 3). These may correspond to either native data for ab initio structure solution or structure expansion, or MAD, SAD, SIR or SIRAS FA or ΔF values for heavy or anomalous atom location. Dual-space recycling using the largest E-values (FIND) is followed by peaklist optimization (PLOP); one or both of these commands must be present. In the case of structure expansion only PLOP can be used and the program then stops.
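
The rules just stated (exactly one source of starting atoms, at least one of FIND and PLOP, and PLOP only for structure expansion) can be expressed as a small consistency check. The helper below is hypothetical and is not part of SHELXD; it takes the set of keywords found in an .ins file plus a flag saying whether input atoms are present, and treating FIND together with input atoms as an error is one reading of "only PLOP can be used".

```python
def starting_atom_source(keywords, has_input_atoms=False):
    """Return 'atoms', 'PATS', 'GROP' or 'random' for a set of .ins keywords.

    Raises ValueError if the rules described in the text are violated.
    """
    keywords = {k.upper() for k in keywords}
    candidates = (
        ("atoms", has_input_atoms),      # case A: atoms for expansion with PLOP
        ("PATS", "PATS" in keywords),    # case B: Patterson-consistent atoms
        ("GROP", "GROP" in keywords),    # case C: PDB-format fragment search
    )
    sources = [name for name, present in candidates if present]
    if len(sources) > 1:
        raise ValueError("more than one source of starting atoms: " + ", ".join(sources))
    if not keywords & {"FIND", "PLOP"}:
        raise ValueError("at least one of FIND and PLOP must be present")
    source = sources[0] if sources else "random"  # case D: random atoms
    if source == "atoms" and "FIND" in keywords:
        raise ValueError("structure expansion from input atoms uses PLOP only")
    return source
```

For example, `starting_atom_source({"PATS", "FIND"})` returns `'PATS'`, while a file containing both PATS and GROP is rejected.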

When the starting atoms are generated randomly or by PATS or GROP, the calculations are repeated for a new set of starting atoms. The total number of such tries may be specified with NTRY, otherwise the program runs indefinitely; however, while the job is running the calculation may be terminated at the end of the current try by creating a name.fin file in the current working directory.

In the following examples, TITL...UNIT in the normal SHELX format is assumed at the start of the .ins file and HKLF 4 (or 3) followed by END at the end of the file. The cell contents defined by SFAC and UNIT are only used by PLOP; in the FIND stage the atoms are assumed to be of the same type but with occupancies proportional to the square root of the peak height.

1. To solve an approximately equal-atom structure using native data to atomic resolution (1.2 Å or better) the middle of the .ins file (between UNIT and HKLF) might be as follows (for 500 unique non-hydrogen atoms):

    FIND 400
    PLOP 500 600

2. To solve the same structure by first locating a disulfide bond (PATS with a super-sharp Patterson) then expanding to the complete structure (FIND/PLOP):

    PATS -2.06
    PSMF -4
    FIND 400
    PLOP 500 600

3. To locate 30 selenium atoms from MAD data:

    PATS
    FIND 30
    MIND -3.5

[the .hkl file could contain h, k, l, FA and σ(FA) in FORMAT(3I4,2F8.2)].

4. To solve a cyclodextrin structure with four beta-cyclodextrins in the asymmetric unit and with data barely to atomic resolution, the following could be tried:

    GROP -1.8
    FIND 240
    PLOP 320 400
    GEOM 4
    ATOM      1  C41 MOL     1      -3.859   4.863   7.904  1.000 10.00
    ATOM      2  C31 MOL     1      -5.081   4.209   8.524  1.000 10.00
    ATOM      3  C21 MOL     1      -5.211   2.740   8.155  1.000 10.00
    .... diglucose fragment in PDB format (see test provided) ...
    ATOM     21  C52 MOL     1      -0.292   4.714   7.025  1.000 10.00
    ATOM     22  O52 MOL     1      -0.642   5.837   6.253  1.000 10.00

SHELXD is started with the command line:

    shelxd name

and expects to find both input files name.ins and name.hkl in the current directory.
It writes a summary to the current window (standard output) and creates the files name.lst (more extensive listing file) and name.res (SHELX format atoms, crystal coordinates).

The following instructions may be included in the .ins file. Default values are given in square brackets; the # sign indicates that the default depends on other instructions:

TITL, CELL, ZERR, LATT, SYMM, SFAC and UNIT
As usual (see the SHELX manual).

TRIC (or TRIK)
Flags expansion to non-centrosymmetric triclinic.

SHEL dmax [infinity], dmin [0]
Resolution limits in Å for all calculations.

NTRY ntry [0]
Number of global tries if starting from random atoms, PATS or GROP. If ntry is zero or absent, the program runs until it is interrupted by writing a name.fin file in the current working directory.

PATS +np or -dis [100], npt [#], nf [5]
Calculates and stores the Patterson. Using the top np peaks or a random orientation vector of length |dis|, the program tries npt random translations, selecting the one with the best Patterson minimum function PMF (see PSMF). When selecting a vector from the list of unique Patterson peaks, special vectors are ignored and the highest vector is chosen from nf random selections. This favors the highest peaks but (if nf is not too large) also allows lower peaks a chance. For example, with the default np = 100 and nf = 5, the chance is 39.5% that one of the first 10 vectors will be chosen and 91.9% that one of the first 50 will be chosen. If the first parameter is negative, nf randomly oriented vectors of length |dis| are compared on the basis of their heights in the Patterson and the 'best' is used for the translation search.
If PATS is used together with a second FIND parameter ncy greater than zero (or FIND followed by only one number), a full-symmetry Patterson superposition minimum function (i.e. a superposition based on the two peaks and all their symmetry equivalents) is used to locate the atoms in the first FIND cycle. PATS and GROP are mutually exclusive.
GROP +ZZ or -Egr [0], +/-ngt [99], nor [99], ntr [9999]
6D Patterson search for a small rigid group. If the first parameter is positive, the search is performed using the Patterson minimum function PMF (see PSMF), using interatomic vectors for which the product of the two atomic numbers is greater than ZZ. For each of |ngt| attempts, nor random orientations are generated. The orientation with the best PMF (based on intramolecular vectors only) for each attempt is subjected to ntr translations. The solution with the best PMF in the translational search (using both intra- and intermolecular vectors) in all the |ngt| attempts is used to generate the starting atoms for the next stage (usually FIND).
If the first parameter is negative, an analogous procedure is employed but the function maximized is the sum of Ec²(Eo² - 1) for reflections with E > |Egr| and resolution d > dlim (see ESEL).
If the second parameter ngt is negative, the above procedure is used for the rotation and translation search, but then a correlation coefficient (CC20) between Eo² and Ec² is calculated for each 'best' rotation/translation combination using 20% of all reflections up to the limiting resolution of dlim (20% rather than 100% is used to speed up the calculation). Thus one CC20 value is calculated for each of the |ngt| attempts. The solution with the highest CC20 provides the starting atoms for the next stage. This is slower but almost always better than the other criteria.
The search model is read from PDB-format ATOM or HETATM records in the .ins file. All other PDB records should be removed. The atomic number is deduced from the atom name applying PDB rules. The PMF search is recommended for searching for a heavy-atom cluster (e.g. from SAS or MAD data) whereas the (slower) structure-factor based search is suitable for equal-atom fragments such as a short piece of alpha-helix (for solving small proteins) or a diglucose fragment (for solving cyclodextrins).
PSMF pres [4.0], psfac [0.34]
|pres| is the resolution of the Patterson in terms of the minimum ratio of the number of grid points along an axis to the maximum reflection index along that axis. If pres is negative a 'super-sharp' Patterson with coefficients √(E³F) is calculated, otherwise a normal F² Patterson is used. psfac is the fraction of the lowest values in the sorted list of Patterson heights that is summed to get the PMF.

FRES res [3.0]
Resolution of all Fourier syntheses (including the PSMF but excluding the Patterson itself) in terms of the minimum ratio of the number of grid points along an axis to the maximum reflection index used along that axis.

ESEL Emin [#], dlim [1.0]
Minimum E and high-resolution limit for FIND and TANG. The E² values are normalized to 1 in resolution shells, then smoothed. Emin defaults to 1.2 for ab initio structure solution and to 1.5 for heavy atom location (the absolute value of the first MIND parameter is used to distinguish between these two cases, depending on whether it is less than 1.6 or not).

FIND na [0], ncy [#]
Search for na atoms in ncy internal loop cycles (tangent formula + E-Fourier). ncy defaults to 20 (for heavy-atom location) or the larger of 20 and na (for ab initio direct methods, distinguished using MIND mdis). The highest na / (1 - fr) peaks are selected, where fr is the WEED parameter. The effect is that approximately na peaks remain after the random omit procedure (WEED). The occupancy is made proportional to the square root of the peak height in the FIND stage.

TANG ftan [0.9], fex [0.4]
Fraction ftan of the ncy FIND cycles are performed using the tangent formula, the rest using a Sim-weighted E-map. fex is the fraction of reflections with the largest Ecalc values to hold fixed when doing tangent expansion to find the remaining phases.

NTPR ntpr [100]
Maximum number of (largest) triple phase relations (TPR) per reflection; negative for output of mean phase errors (if phases were input).
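The way the ESEL and FIND defaults depend on the first MIND parameter can be summarized in a few lines. This is a hedged sketch only: the function name find_defaults is hypothetical, the WEED fraction fr is taken as 0.3 (the default given in the WEED entry), and rounding na / (1 - fr) to the nearest integer is an assumption, since the text does not say how the program rounds.

```python
def find_defaults(mdis, na, fr=0.3):
    """Sketch of the ESEL/FIND defaults described in the text.

    mdis: first MIND parameter; |mdis| >= 1.6 is taken to mean heavy-atom
          location, otherwise ab initio solution.
    na:   number of atoms requested with FIND.
    fr:   WEED fraction of atoms omitted at random.
    """
    heavy_atom_location = abs(mdis) >= 1.6
    emin = 1.5 if heavy_atom_location else 1.2
    ncy = 20 if heavy_atom_location else max(20, na)
    # Assumption: nearest-integer rounding of na / (1 - fr).
    peaks_selected = round(na / (1.0 - fr))
    return {"Emin": emin, "ncy": ncy, "peaks_selected": peaks_selected}


if __name__ == "__main__":
    print(find_defaults(-3.5, 30))   # selenium substructure, MIND -3.5
    print(find_defaults(1.0, 400))   # ab initio case with default MIND
```

With FIND 30 and the default WEED fraction, about 43 peaks are picked so that roughly 30 survive the random omit step.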
MIND mdis [1.0], mdeq [2.2]
|mdis| is the shortest distance allowed between atoms for PATS and FIND. If mdis is negative PATFOM is calculated, and the crossword table for the best PATFOM value so far is output to the .lst file. In this case the solution is passed on to the PLOP stage if either the CC is the best so far or the PATFOM is the best so far. mdeq is the minimum distance between symmetry equivalents for FIND (for PATS the |mdis| distance is used). Thus the default setting of mdeq prevents FIND from placing atoms on special positions. This is usually desirable because it helps to avoid pseudo-solutions such as the 'uranium atom solution' that are incorrect but fit the tangent formula, but it might be better to change this setting to -0.1 to allow special positions when looking for e.g. metal ions. For PLOP the PREJ instruction can be used to control whether peaks on special positions are selected. Note also that a |mdis| threshold of 1.6Å is used to decide between all-atom ab initio solution and heavy atom location for the purpose of setting various defaults for other parameters.

SKIP min2 [0.5]
During FIND, if the second peak height is less than min2 times the first, the first peak is rejected (before applying WEED to reject other peaks). This is sometimes useful to suppress 'uranium atom' solutions. In fact, for large equal-atom structures in space group P1 it is a good idea to specify ‘SKIP 0.999’ so that the first peak is ALWAYS rejected!

WEED fr [0.3]
Randomly OMIT fraction fr of atoms in the FIND stage (except in the last cycle). Does not apply to PLOP.

GEOM ngm [0], ndwt [1.0], nha [0], d13 [2.45], dd [0.3]
After the peaksearch in the FIND and PLOP routines, ngm cycles (typically 2 to 5) of geometry optimization are performed so that distances within dd of d13 are brought closer to d13.
In addition, all peak heights after the highest nha (heavy atoms) are multiplied by ndwt (typically 0.7; 1.0 for no action) if the peaks have no other atoms or peaks within the distance range (d13-dd) to (d13+dd). This instruction is an attempt to build in a little chemical information and it is hoped that it will enable the resolution requirement to be relaxed a little.

TEST CCmin [#], delCC [#]
Go on to PLOP if CC > CCmin or CC is within delCC of the best CC value so far. CCmin is reduced by 0.1% each cycle until a solution passes this test. The defaults are 45 and 1 respectively for ab initio solutions, and 10 and 5 respectively for heavy atom location (MIND mdis test).

KEEP nh [0]
Number of (heavy) atoms to retain during PLOP expansion.

PLOP followed by up to 10 numbers
Specifies the number of peaks to start with in each cycle of the 'peaklist optimization' algorithm of Sheldrick & Gould (1995). Peaks are then eliminated one at a time until either the correlation coefficient cannot be increased any more or 50% of the peaks have been eliminated.

PREJ maxb [3], dsp [-0.01], mf [1]
maxb is the maximum number of bonds to atoms or higher peaks; the peak is deleted if there are more. Peaks are also deleted if they are less than dsp from their equivalents (PLOP only; FIND uses the second MIND parameter). Atoms are not output to the final .res file if there are fewer than mf atoms in the 'molecule'.

SEED nrand [0]
Sets the random number seed so that exactly the same results are generated if the job is repeated; each integer nrand defines a different sequence of random numbers. If nrand is omitted or zero, the seed is randomized so that a different sequence is always generated.

MOVE dx [0], dy [0], dz [0], sign [1]
Shift following coordinates (not ATOM/HETATM).

ATOM and HETATM
PDB format atoms for GROP.

HKLF m
m = 4 for F² in the .hkl file, m = 3 for F (or FA or ΔF).

END

4.
A guide to SHELX for macromolecules: Phasing

4.1 Introduction

Since small-molecule direct methods and Patterson interpretation algorithms can be used to locate a small number of heavy atoms or anomalous scatterers, SHELXS has been used by macromolecular crystallographers for a number of years. SHELXD is designed both for the ab initio solution of small proteins given data to atomic resolution (1.2Å or better) and for the location of the anomalous scatterers from MAD or SAS (also known as SAD or OAS) data.

4.2 Heavy atom location using SHELXS and SHELXD

One might expect that a small-molecule direct methods program, such as SHELXS (Sheldrick, 1990), that routinely solves structures with 20-100 unique atoms in a few minutes or even seconds of computer time, would have no difficulty in locating a handful of heavy atom sites from isomorphous or anomalous ΔF data. However, such data can be very noisy, and a single seriously aberrant reflection can invalidate a large number of probabilistic phase relations. The most important direct methods formula is still the tangent formula of Karle & Hauptman (1956); most modern direct methods programs (e.g., Busetta et al., 1980; Debaerdemaeker, Tate & Woolfson, 1985; Sheldrick, 1990) use versions of the tangent formula that have been modified to incorporate information from weak reflections as well as strong reflections, which helps to avoid pseudo-solutions with translationally displaced molecules or a single dominant peak (the so-called uranium atom solution). Isomorphous and anomalous ΔF values represent lower limits on the structure factors for the heavy atom substructure and so do not give reliable estimates of weak reflections; thus, the improvements brought to direct methods by the use of the weak reflections are largely irrelevant when they are applied to ΔF data.
This does not apply when FA values are derived from a MAD experiment, since these are true estimates of the heavy atom structure factors; however, aberrant large and small FA estimates are difficult to avoid and often upset the phase determination process. A further problem in applying direct methods to ΔF data is that it is not always clear what the effective number of atoms in the cell should be for use in the probability formulas, especially when it is not known in advance how many heavy atom sites are present.

4.3 The Patterson interpretation algorithm in SHELXS

Space-group general automatic Patterson interpretation was introduced in the program SHELXS86 (Sheldrick, 1985); completely different algorithms are employed in the current version of SHELXS, based on the Patterson superposition minimum function (Buerger, 1959, 1964; Richardson & Jacobson, 1987; Sheldrick, 1991, 1998a; Sheldrick et al., 1993). The algorithm used in SHELXS-97 is as follows:

1. A single Patterson peak, v, is selected automatically (or input by the user) and used as a superposition vector. A sharpened Patterson (with coefficients √(E³F) instead of F², where E is a normalized structure factor) is calculated twice, once with the origin shifted to -v/2 and once with the origin shifted to +v/2. At each grid point, the minimum of the two Patterson values is stored, and this superposition minimum function is searched for peaks. If a true single-weight heavy atom-to-heavy atom vector has been chosen as the superposition vector, this function should consist ideally of one image of the heavy-atom structure and one inverted image, with two atoms (the ones corresponding to the superposition vector) in common. There are thus about 2N peaks in the map, compared with N² in the original Patterson, a considerable simplification. The only symmetry element of the superposition function is the inversion center at the origin relating the two images.

2.
Possible origin shifts are found so that the full space-group symmetry is obeyed by one of the two images, i.e., for about half the peaks, most of the symmetry equivalents are present in the map. This enables the peaks belonging to the other image to be eliminated and, in principle, solves the heavy-atom substructure. In the space group P1, the double image cannot be resolved in this way.

3. For each plausible origin shift, the potential atoms are displayed as a triangular table that gives the minimum distance and the Patterson superposition minimum function value for all vectors linking each pair of atoms, taking all symmetry equivalents into account. This ‘crossword’ table enables spurious atoms to be eliminated and occupancies to be estimated and also in some cases reveals the presence of non-crystallographic symmetry.

4. The whole procedure is then repeated for further superposition vectors as required. The program gives preference to general vectors (multiple vectors will lead to multiple images), and it is advisable to specify a minimum distance of (say) 8Å for the superposition vector (3.5Å for selenomethionine MAD data) to increase the chance of finding a true heavy atom-to-heavy atom vector.

4.4 Examples of heavy-atom location with SHELXS

First we will consider a very straightforward example, using SIR ΔF-data for the protein barnase, that is provided as a test job for SHELXS-97. SHELXS (and SHELXL and SHELXD) requires two standard text input files that in this case are called barnase.ins and barnase.hkl. The .ins file contains the crystal data (cell, space group and contents) followed by specific instructions; the .hkl file is simply a list of h, k, l, ΔF and σ(ΔF) in fixed format. In this particular case the very short .ins file was created by hand using a text editor and the .hkl file was output by the CCP4 program mtz2various, but there are a number of possible graphical interfaces (e.g.
the XPREP program in the Bruker SHELXTL system) that could have set up both files with a minimum of user effort. The barnase.ins file takes the following form:

    TITL Barnase Au del(F) in P3(2)
    REM Isomorphous delta-F data for Au derivative (3 sites) kindly
    REM donated by Eleanor Dodson, University of York, UK
    CELL 1.54178 58.970 58.970 81.580 90.00 90.00 120.00
    ZERR 1.00 0.008 0.008 0.016 0.00 0.00 0.00
    LATT -1
    SYMM -Y, X-Y, .66667+Z
    SYMM -X+Y, -X, .33333+Z
    SFAC N AU
    UNIT 200 9    ! fudge unit-cell contents for delta-F data
    PATT 2        ! PATT -4 or PATT -10 for more difficult cases
    HKLF 3
    END

It will be seen that SHELX instructions consist of a four letter keyword followed by further information in free format on the same line. In the days of punched-card input this was revolutionary; now it looks antique but is still practicable and easy to transfer between different operating systems etc. Comments can be added with the keyword ‘REM’ or may follow ‘!’ on a single line. Some of the information here (e.g. the wavelength and cell esds) will not be used by SHELXS but is required for consistency with SHELXL. ‘LATT -1’ specifies a noncentrosymmetric primitive lattice and is followed by two of the three symmetry operators that define the space group P3(2) (the operator X, Y, Z is omitted since it is common to all space groups). The last operator could also have been input as ‘SYMM y-x, -x, 1/3+z’; in general lower and upper case letters are equivalent. The SFAC and UNIT instructions define the cell contents, but for heavy-atom location from ΔF-data the contents need to be fudged (square root of the number of light atoms followed by the expected number of heavy atoms in the cell); this is only important for direct methods. HKLF 3 is used to flag ΔF rather than (ΔF)² (HKLF 4).
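The equivalence of the two ways of writing the last symmetry operator can be checked numerically. The sketch below is not how SHELX parses SYMM instructions: apply_symm is a hypothetical helper that lower-cases the operator and evaluates each component with Python's eval, which is acceptable only for trusted input such as a hand-written .ins file. The test coordinates are those of the first gold site in the listing discussed in this section.

```python
def apply_symm(operator, xyz):
    """Apply a SHELX-style SYMM operator string to fractional coordinates."""
    x, y, z = xyz
    names = {"x": x, "y": y, "z": z}
    parts = [part.strip().lower() for part in operator.split(",")]
    if len(parts) != 3:
        raise ValueError("a SYMM operator needs three comma-separated components")
    # eval is used for brevity only; builtins are disabled and input is trusted.
    return tuple(eval(part, {"__builtins__": {}}, names) for part in parts)


if __name__ == "__main__":
    site = (0.1318, 0.0494, 0.5458)
    print(apply_symm("-X+Y, -X, .33333+Z", site))
    print(apply_symm("y-x, -x, 1/3+z", site))
```

The two results differ only in the sixth decimal place of z, because .33333 is a truncated form of 1/3.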
‘PATT 2’ specifies Patterson solution attempts using two different superposition vectors; for a difficult problem ‘PATT -10’ (10 trial vectors; the minus sign reduces the thresholds so that more peaks are tested etc., i.e. the program tries harder) would be more appropriate. Note that only this line needs to be changed (to TREF for an easy problem or TREF 5000 for a difficult one) to use the SHELXS direct methods to find the gold sites. In both cases the resulting heavy atom positions are written to the file barnase.res; in the PATT case the barnase.lst listing file includes the crossword table:

    Name  At.No.     x        y        z     s.o.f.   Min. distances / PATSMF

    AU1    84.4    0.1318   0.0494   0.5458   1.00    29.64
                                                       48.4

    AU2    80.0    0.2831   0.5398   0.6667   1.00    29.45   27.48
                                                       80.1    62.3

    AU3    48.2   -0.2260   0.3630   0.6308   1.00    28.91   35.00   35.46
                                                       40.8    34.5    41.1

    AU4    21.8    0.0246   0.0134   0.6418   1.00    27.28    9.61   37.97   30.80
                                                       12.2     0.0    12.7     0.0

Understanding the crossword tables produced by SHELXS and SHELXD is the key to successful heavy atom location. The names and atomic numbers are invented by the program and need not be taken seriously. They are followed by the crystal coordinates of the heavy atoms and their site occupancies (always 1.0 except for atoms on special positions, in which case the value is less than one). Special positions should be treated with suspicion for heavy atom derivatives but can happen; they can be eliminated by making the second PATT parameter (the minimum intersite distance) negative. All the remaining items in the table are double entries; the top value is the minimum distance between one atom and its symmetry equivalents (first column) or between the atom marking the row and the atom marking the column (and all its symmetry equivalents; remaining columns).
The bottom number is the corresponding Patterson superposition minimum function for all the vectors between one atom and its equivalents (first column) or between one atom and another, including all equivalents of the latter (remaining columns). Thus 40.8 is the Patterson minimum function for vectors between Au3 and its symmetry equivalents and 35.46 is the minimum distance in Å between Au2 and Au3, taking symmetry into account. Au4 is clearly a spurious atom (or possibly an additional low-occupancy site) because of the low Patterson minimum function values involving it. In this example the distance information is not useful (except as a check that the sites are not too close to one another); formation of a trimer with equal gold-gold distances would not be obvious if an intertrimer Au...Au distance were shorter than the intratrimer distance.

4.5 Integrated Patterson and direct methods: SHELXD

SHELXD is designed both for the ab initio solution of macromolecular structures from atomic resolution native data alone and for the location of heavy-atom sites from ΔF or FA values at much lower resolution, in particular for the location of larger numbers of anomalous scatterers from MAD data. The dual-space approach of SHELXD was inspired by the Shake and Bake philosophy of Miller et al. (1993, 1994) but differs in many details, in particular in the extensive use it makes of the Patterson function, which proves very effective in the applications involving ΔF or FA data. An advantage of the Patterson is that it provides a good noise filter for the ΔF or FA data: negative regions of the Patterson can simply be ignored. On the other hand, the direct methods approach is efficient at handling a large number of sites, whereas the number of Patterson peaks to analyze increases with the square of the number of atoms.
Thus, for reasons of efficiency, the Patterson function is employed at two stages in SHELXD: at the beginning to obtain starting atom positions (otherwise random starting atoms would be employed) and at the end, in the form of the triangular crossword table as used in SHELXS, to recognize which atoms are correct. In between, several cycles of real/reciprocal space alternation are employed as in the ab initio structure solution, alternating between tangent refinement, E-map calculation and peaksearch, possibly with random omit maps, in which a specified fraction of the potential atoms are left out at random. Further details of the algorithms used in SHELXD are given by Sheldrick (1998b), Usón & Sheldrick (1999) and in the previous chapter.

From the user’s point of view, the input to SHELXD for the location of heavy atom sites is extremely similar to that for SHELXS. The PATT (or TREF) instruction is replaced by FIND followed by the expected number of heavy atom sites, and the minimum allowed distance between two sites is given after MIND, with a negative sign to indicate that a crossword table should be calculated for the best solutions. Thus for barnase the .ins file would contain TITL...UNIT as before followed by:

    PATS
    FIND 3
    MIND -8
    NTRY 100
    HKLF 3
    END

This would start with atoms consistent with the Patterson, which results in about 90% of solutions being correct. Leaving out ‘PATS’, i.e. starting instead from random atoms, reduces this percentage to about 20%. The NTRY instruction specifies the number of tries; if this instruction is missing the program runs for ever (which can sometimes be convenient; the job can be interrupted, e.g. by creating a file barnase.fin, when it looks as though the structure has been solved).

4.6 Practical considerations for locating heavy atoms

Since the input files for the direct and Patterson methods in SHELXS and the integrated method in SHELXD are so similar, it is easy to try all three methods for a difficult problem.
The Patterson interpretation in SHELXS is a good choice if the heavy atoms have variable occupancies and it is not known how many heavy-atom sites need to be found; the direct methods approaches work best with equal atoms. In general, the conventional direct methods in SHELXS will tend to perform best in non-polar space groups that do not possess special positions. However, for more than about a dozen sites, the integrated approach in SHELXD is likely to prove the most effective; the SHELXD algorithm works best when the number of sites is known, at least approximately.

Especially for the MAD method, the quality of the data is decisive; it is essential to collect data with a high redundancy to optimize the signal to noise ratio and eliminate outliers. SHELX does not include a program to extract FA-values from MAD data but the Bruker XPREP program can be used for this. In general, a resolution of 3.5Å is adequate for the location of heavy-atom sites. A critical decision is the resolution limit at which to cut the data. XPREP calculates a correlation coefficient between the signed anomalous differences for each pair of wavelengths as a function of the resolution; in practice the correlation coefficient becomes smaller at high resolution. Experience indicates that the data should be truncated at the resolution at which the correlation coefficient falls below 30%. If this requires throwing away all the data, the anomalous signal is probably too weak for structure solution! When data have been collected at three or more different wavelengths without major problems such as crystal decay, icing, wavelength drift etc., SHELXS and SHELXD tend to perform better with FA-values than with anomalous ΔF-values.
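The 30% rule of thumb for truncating the data can be expressed as a short helper. This is only an illustrative sketch: the function name choose_resolution_cutoff is hypothetical, the shell limits and correlation coefficients below are invented numbers rather than XPREP output, and the shells are assumed to be listed from low to high resolution.

```python
def choose_resolution_cutoff(shells, threshold=30.0):
    """Return the d_min (in Å) of the last shell before the CC drops below threshold.

    shells: list of (d_min, cc_percent) tuples ordered from low to high resolution.
    Returns None if even the first shell is below the threshold.
    """
    cutoff = None
    for d_min, cc in shells:
        if cc < threshold:
            break          # everything beyond this point is discarded
        cutoff = d_min
    return cutoff


if __name__ == "__main__":
    # Invented example values for the anomalous-difference correlation per shell.
    shells = [(6.0, 85.0), (4.5, 70.0), (3.8, 52.0), (3.4, 35.0), (3.1, 22.0), (2.9, 12.0)]
    print(choose_resolution_cutoff(shells))
```

A return value of None corresponds to the case described in the text where all the data would have to be thrown away, i.e. the anomalous signal is probably too weak.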
It is, however, debatable whether it is better to collect highly redundant data at a single wavelength corresponding to a significant f″-value for an SAS experiment or to spend the same time measuring less precise data at several wavelengths for a MAD experiment.

SHELXS and SHELXD only provide a method of locating the heavy atoms or anomalous scatterers. They do not include facilities for the further calculations necessary to obtain maps. The programs SHARP (de la Fortelle & Bricogne, 1997) and DM (Cowtan, 1999) are recommended for this purpose. Experience indicates that it is only necessary to refine the B-values of the heavy atoms using other programs; their coordinates are already rather precise.

4.7 A selenomethionine MAD example

The following example of GPATase, kindly provided as a test by Janet Smith and Joe Krahn, illustrates the application of SHELXD to a selenomethionine MAD problem. 22 unique selenium atoms were expected, but only 20 can be found, probably because the two N-terminal selenomethionines exhibit high thermal motion. Unusually, in this test the solution with the highest correlation coefficient (CC) is not correct, but the correct solution can easily be identified on the basis of the PATFOM figure of merit and by inspection of the crossword table. First the crossword table for the solution with the highest CC is shown; this is clearly wrong because there are many low and zero Patterson values (the bottom number of each pair):

    Solution 20 (false)    Initial CC 42.6    PATFOM 2.59

    N   self   cross-vectors

    1   19.0
         0.0
    2   12.8    4.0
         0.0    0.0
    3   50.4   31.8   33.3
         0.2    0.0    0.0
    4   50.2   16.1   18.7   33.3
         0.0    0.0    0.0    8.7
    5   51.5   43.9   43.7   31.2   35.0
        15.3    0.0    0.0    0.4    0.0
    6   51.1   36.8   35.2   31.9   35.0   18.1
         0.0    0.0    0.0    0.0   10.3    0.0
    7   34.5   13.6   13.3   36.4   12.2   31.8   26.5
         0.0    0.0    0.0    1.1    0.0    0.0    0.0
    8   56.0   21.2   23.6   37.8    5.6   31.2   33.8   15.1
         1.1    0.0    0.6    0.8    0.0    0.1    0.0    0.0
    ...etc...
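Counting the zero entries in a row of such a table gives a crude numerical version of the visual inspection described here. The sketch below is not the PATFOM calculation performed by SHELXD; zero_fraction is a hypothetical helper, and the lists are the cross-vector Patterson values (the bottom numbers) of rows 5, 6 and 7 of the false solution shown above.

```python
def zero_fraction(pmf_values):
    """Fraction of Patterson minimum function values in a row that are zero."""
    if not pmf_values:
        raise ValueError("at least one value is required")
    return sum(1 for value in pmf_values if value == 0.0) / len(pmf_values)


if __name__ == "__main__":
    # Cross-vector PMF values of rows 5, 6 and 7 of the false solution.
    rows = {
        5: [0.0, 0.0, 0.4, 0.0],
        6: [0.0, 0.0, 0.0, 10.3, 0.0],
        7: [0.0, 0.0, 1.1, 0.0, 0.0, 0.0],
    }
    for number, values in rows.items():
        print(number, zero_fraction(values))
```

Rows in which most cross-vector values are zero indicate sites that are not supported by the Patterson, which is the pattern that marks this solution as wrong.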
The correct solution (truncated) was as follows:

    Solution 1 (correct)    Initial CC 33.1    PATFOM 19.63

    N   self   cross-vectors

    1   31.4
        10.7
    2   50.7   33.4
        20.7   11.5
    3   51.2   35.0   31.2
        19.3   11.5   13.1
    4   40.8    5.7   35.7   31.2
         5.1    7.5    7.1    8.2
    5   31.5   53.6   35.4   36.2   49.8
         8.0    8.9   15.5   11.6    8.6
    6   29.2   14.4   31.5   34.9   16.4   40.9
         9.6    9.3    8.9    8.0    7.6    5.7
    7   52.6   13.3   36.5   26.5    9.2   46.8   15.4
         7.9    9.2   11.7    9.0    6.6    9.1    8.5
    8   41.6   45.3   26.7   37.7   44.2   13.3   40.3   41.9
         8.8    9.4    8.3    6.5    7.0    5.4    6.0    8.6
    ...etc...
    18  35.0   38.7   33.7   27.9   41.8   16.8   25.9   40.0
         0.0    3.5    1.8    2.7    2.1    1.7    2.0    2.5
    19  37.8   16.6   25.9   32.3   18.6   39.8    6.1   17.1
         0.0    0.0    1.8    1.5    1.8    0.0    2.3    3.6
    20  29.3   22.8   27.6   31.8   26.0   31.9   10.7   25.3
         2.8    1.5    1.3    0.4    0.0    0.8    2.4    0.0
    ============================================
    21  16.3   20.6   38.8   32.8   22.8   34.5   11.1   22.4
         0.2    0.0    0.0    0.0    0.0    0.0    0.0    0.0
    22  37.4   33.7   26.4   29.7   30.8   46.3   27.3   22.2
         0.0    1.4    0.0    0.2    0.0    0.0    0.0    0.0

The sites 21 and 22 are incorrect, as indicated by the many Patterson zeros!

5. A guide to SHELX for macromolecules: Refinement

5.1 Refinement of macromolecules with SHELXL

Recently, improvements in cryocrystallography, area detectors, and synchrotron data collection have led to a rapid increase in the number of high resolution (<2Å) macromolecular data-sets. The enormous increase in available computer power makes it feasible to refine these structures using algorithms incorporated in SHELXL that were initially designed for small molecules. These algorithms are generally slower but make fewer approximations (e.g., conventional structure factor summation rather than FFT) and include features, such as anisotropic refinement, modeling of complicated disorder and twinning, estimation of standard uncertainties by inverting the normal matrix, etc., that are routine in small-molecule crystallography but are not widely implemented in programs written specifically for macromolecular structure refinement.
SHELXL is a very general refinement program that is equally suitable for the refinement of minerals, organometallic structures, oligonucleotides, or proteins (or any mixture thereof) against X-ray or neutron single (or twinned!) crystal data. It has even been used with diffraction data from powders, fibers, and two-dimensional crystals. For refinement against Laue data, it is possible to specify a different wavelength and hence dispersion terms for each reflection. The price of this generality is that it is somewhat slower than programs specifically written only for protein structure refinement. Any protein- (or DNA-) specific information must be input to SHELXL by the user in the form of refinement restraints, etc. Refinement of macromolecules using SHELXL has been discussed by Sheldrick & Schneider (1997). Despite this generality, it must be emphasized that SHELXL is not suitable for refinements at resolutions lower than about 2.5Å because, unlike CNS and X-PLOR, it does not include general energy terms, and that a least-squares refinement program such as SHELXL will suffer more from model bias than a program based on maximum likelihood. Thus almost always the initial refinement will have been performed with another program and SHELXL will be used for the final refinement, perhaps involving extension to very high resolution, modeling of disorder, anisotropic refinement and the least-squares estimation of parameter errors. Thus the starting point for a SHELXL refinement will usually be a PDB format file from the previous refinement. Even when SHELXL is used for the refinement of a twinned structure at lower resolution, the starting model is likely to be in the form of a PDB file from a molecular replacement solution. 
5.2 Input and output files for SHELXL

SHELXL, like SHELXS and SHELXD, usually requires two input files: an .ins file containing crystal data, instructions and atoms, and an .hkl file containing h, k, l, F² and σ(F²) in fixed ‘HKLF 4’ format [F and σ(F) may also be used and require the instruction ‘HKLF 3’]. The .ins file will usually be generated from a PDB format file using the ‘I’ option in SHELXPRO. This sets up the TITL...UNIT instructions as for input to SHELXS or SHELXD, followed by standard refinement instructions, restraints, instructions for generating hydrogen atoms (commented out until needed) and atoms in crystal coordinates. For residues other than the 20 standard amino-acids, suitable restraints (see below) must be added by hand (DNA and RNA restraints are provided in files on the SHELX ftp site). The ‘I’ option in SHELXPRO provides a way of renumbering the residues; since SHELXL does not (currently) recognize chain identifiers, chains must be emulated by (for example) adding 1000, 2000 etc. to the residue numbers. SHELXPRO can also perform the reverse operation when preparing a PDB file for deposition (the ‘B’ option).

After each refinement job, the output .res file is edited or renamed to a new .ins file that serves as the input for the next refinement job. The updating of the .res file to .ins may also be performed by the ‘U’ option in SHELXPRO, incorporating changes generated with the help of the graphics display program XtalView (McRee, 1992). Sometimes hand editing of these text files may be required too; this is the normal procedure for small molecule refinements. Annotated extracts from a typical .ins file are shown at the end of this chapter and should be looked at before reading further.

The .hkl file may be generated directly by the data reduction programs or by CCP4. It is not necessary to sort the data, eliminate systematic absences or merge equivalents; SHELXL can do this anyway.
If it is desired to refine (using complex scattering factors) against separate F²-values for h,k,l and -h,-k,-l some care is needed; there are problems using data processing software (such as CCP4) that does not keep these measurements separate, and ‘MERG 2’ must be specified in the .ins file to prevent SHELXL from merging the Friedel opposites (and setting all f″ values to zero). A further problem on continuing a refinement started with another program is to ensure consistent flagging of the Rfree reflections (Brünger, 1992). SHELXPRO enables X-PLOR and CNS files to be converted to SHELX format, retaining the Rfree flags; mtz2various can write an .hkl file including Rfree flags; and the Bruker XPREP program provides general facilities for setting Rfree flags and for transferring and extending Rfree flags consistently from one reflection file to another, taking space group symmetry into account. When twinning or NCS is present, it is better to flag thin resolution shells; otherwise random reflections should be flagged.

SHELXL also writes an output .fcf file containing phased reflection data in CIF format. This may be read directly into XtalView to create maps or into SHELXPRO to produce statistics. SHELXPRO can also generate maps for O (Jones et al., 1991) and (Turbo)Frodo (Jones, 1978). This .fcf file, since it is in CIF format, is suitable for direct deposition with the RCSB/PDB. XtalView is recommended as the graphical interface for adjusting the model between SHELXL refinements of macromolecules, because it can read the .pdb and .fcf output files written by SHELXL directly, and is able to handle disorder and anisotropic displacement parameters correctly.

5.3 Constraints and restraints

In refining macromolecular structures, it is almost always necessary to supplement the diffraction data with chemical information in the form of restraints.
A typical restraint is the condition that a bond length should approximate to a target value with a given estimated standard deviation; restraints are treated as extra experimental data items. Even if the crystal diffracts to 1.0Å, there may well be poorly defined disordered regions for which restraints are essential to obtain a chemically sensible model (the same can be true of small molecules too!). SHELXL is generally not suitable for refinements at resolutions lower than about 2.5Å because it cannot handle general potential energy functions, e.g., for torsion angles or hydrogen bonds; if non-crystallographic symmetry restraints can be employed, this limit can be relaxed a little.

For some purposes (e.g., riding hydrogen atoms, rigid group refinement, or occupancies of atoms in disordered side-chains), constraints, i.e. exact conditions that lead to a reduction in the number of variable parameters, may be more appropriate than restraints; SHELXL allows such constraints and restraints to be mixed freely, i.e. an atom may be simultaneously subject to several different constraints and restraints. Riding hydrogen atoms (set using HFIX or AFIX instructions) are defined such that the C-H vector remains constant in magnitude and direction, but the carbon atom is free to move; the same shifts are applied to both atoms, and both atoms contribute to the least-squares derivative sums. This model may be combined with anti-bumping restraints that involve hydrogen atoms, which helps to avoid unfavorable side-chain conformations. SHELXL also provides, e.g., methyl groups that can rotate about their local three-fold axes; for small molecules the initial torsion angle may be found using a difference electron density synthesis calculated around the circle of possible hydrogen positions (HFIX 137). In macromolecules, methyl groups are rarely so well defined, so a staggered riding model is usually better (HFIX 33).
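The statement that restraints are treated as extra experimental data items can be made concrete with a small sketch. It assumes the simplest possible form: each reflection and each distance restraint contributes a weighted squared residual with weight 1/esd², and any overall scaling between the diffraction and restraint terms that a real program applies is ignored. The function names are hypothetical and coordinates are taken to be Cartesian, in Å.

```python
import math


def data_term(fo_sq, fc_sq, sigma):
    """w * (Fo^2 - Fc^2)^2 for one reflection, with the simplified weight w = 1/sigma^2."""
    return ((fo_sq - fc_sq) / sigma) ** 2


def distance_restraint_term(target, esd, atom_a, atom_b):
    """w * (target - d)^2 for one distance restraint, with w = 1/esd^2.

    atom_a and atom_b are Cartesian coordinates in angstroms (an assumption
    of this sketch; refinement programs normally work in crystal coordinates).
    """
    d = math.dist(atom_a, atom_b)
    return ((target - d) / esd) ** 2


def target_function(reflections, restraints):
    """Sum reflection terms and restraint terms into the single quantity to be minimized."""
    total = sum(data_term(*r) for r in reflections)
    total += sum(distance_restraint_term(*r) for r in restraints)
    return total


# One reflection (Fo^2, Fc^2, sigma) and one restraint (target, esd, atom A, atom B).
reflections = [(100.0, 90.0, 5.0)]
restraints = [(1.329, 0.02, (0.0, 0.0, 0.0), (1.349, 0.0, 0.0))]
total = target_function(reflections, restraints)
```

A smaller esd makes the same geometric deviation more expensive, which is how a ‘hard’ restraint differs from a ‘soft’ one in this picture.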
Restraints and constraints provide good examples of the way in which individual residues can be referenced by SHELXL. For example, in the .ins file given at the end of this chapter:

ANIS_* FE SG SD
makes atoms called FE, SD and SG in any residue anisotropic;

DFIX_1 C1 N 1.329
restrains a specific bond length (for the N-terminal formyl group). Note that when no esd is given, the default (here 0.02Å from the DEFS instruction) is assumed.

DFIX_ALA 1.525 C CA
restrains the C-CA bond in all alanine residues.

SADI_54 0.04 FE SG_6 FE SG_9 FE SG_39 FE SG_42
restrains the bond lengths in the FeS4 unit to be equal, but without a target value, with an esd of 0.04Å. The central iron atom is in residue number 54 and the four cysteine sulfurs are all in different residues.

FLAT_* 0.3 O_- CA_- N C_- CA
restrains N and CA of each amino-acid and O, CA and C of the preceding residue to lie in a plane with a relatively large esd (0.3) (peptide planarity).

5.4 Least-squares refinement algebra

The original SHELX refinement algorithms were modeled closely on those described by Cruickshank (1970). For macromolecular refinement, an alternative to (blocked) full-matrix refinement is provided by the conjugate-gradient solution of the least-squares normal equations as described by Hendrickson & Konnert (1980), including preconditioning of the normal matrix that enables positional and displacement parameters to be refined in the same cycle. The structure factor derivatives contribute only to the diagonal elements of the normal matrix, but all restraints contribute fully to both the diagonal and non-diagonal elements, although neither the Jacobian nor the normal matrix itself is ever generated by SHELXL. The parameter shifts are modified by comparison with those in the previous cycle to accelerate convergence whilst reducing oscillations.
Thus, a larger shift is applied to a parameter when the current shift is similar to the previous shift, and a smaller shift is applied when the current and previous shifts have opposite signs.

SHELXL refines against F² rather than F, which enables all data to be used in the refinement with weights that include contributions from the experimental uncertainties, rather than having to reject F-values below a preset threshold; there is a choice of appropriate weighting schemes. Provided that reasonable estimates of σ(F²) are available, this enables more experimental information to be employed in the refinement; it also facilitates refinement against data from twinned crystals.

5.5 Full-matrix estimates of standard uncertainties

Inversion of the full normal matrix (or of large matrix blocks, e.g., for all positional parameters) enables the precision of individual parameters to be estimated (Rollett, 1970), either with or without the inclusion of the restraints in the matrix. The standard uncertainties in dependent quantities (e.g., torsion angles or distances from mean planes) are calculated in SHELXL using the full least-squares correlation matrix. These standard uncertainties reflect the data-to-parameter ratio, i.e., the resolution and completeness of the data and the percentage of solvent, and the quality of the agreement between the observed and calculated F²-values (and the agreement of restrained quantities with their target values when restraints are included).

If high resolution data are available (there must be appreciably more data than parameters!) and the structure is not too large, it may be possible to obtain rigorous esds by matrix inversion. The structure should first be refined to convergence with CGLS, setting the second parameter to –1 to calculate Rfree; then a further refinement should be performed against all data by deleting the second CGLS parameter; and finally a single full-matrix cycle should be performed (‘L.S.
1’) with zero damping and a zero shift multiplier (‘DAMP 0 0’) in which all restraints have been removed. Often ‘BLOC 1’ will be used so that the (anisotropic) displacement parameters are fixed in this final cycle, which makes the matrix appreciably smaller and more stable on inversion, but still allows the estimation of realistic standard deviations on all geometrical parameters. BOND, RTAB, HTAB and MPLA instructions may be needed to define the dependent parameters for which esds are required.

Full-matrix refinement is also useful when domains are refined as rigid groups in the early stages of refinement (e.g., after structure solution by molecular replacement), since the total number of parameters is small and the correlation between parameters may be large. To set up such a rigid group refinement, simply add ‘AFIX 6’ before the first atom of each rigid group and ‘AFIX 0’ after the last atom in each group. ‘BLOC 1’ can be used to hold the temperature factors constant. The AFIX instructions can easily be removed later to remove the rigid group constraints. There is no need to remove any restraints made redundant by the rigid groups, and restraints linking atoms in different rigid groups will continue to be useful. The BUMP instruction should be commented out (with REM) for this rigid group refinement.

5.6 Refinement of anisotropic displacement parameters

The motion of macromolecules is clearly anisotropic, but the data-to-parameter ratio rarely permits the refinement of the six independent anisotropic displacement parameters (ADPs) per atom; even for small molecules and data to atomic resolution, the anisotropic refinement of disordered regions requires the use of restraints. SHELXL employs three types of ADP restraint (Sheldrick, 1993; Sheldrick & Schneider, 1997).
The rigid bond restraint, first suggested by Rollett (1970), assumes that the components of the ADPs of two atoms connected via one (or two) chemical bonds are equal along the line joining them, within a specified standard deviation. This has been shown to hold accurately (Hirshfeld, 1976; Trueblood & Dunitz, 1983) for precise structures of small molecules, so it can be applied as a ‘hard’ restraint with small estimated standard deviation. The similar ADP restraint assumes that atoms that are spatially close (but not necessarily bonded because they may be different components of a disordered group) have similar Uij components. An approximately isotropic restraint is useful for isolated solvent molecules. These two restraints are only approximate and so should be applied with low weights, i.e., high estimated standard deviations.

The transition from isotropic to anisotropic roughly doubles the number of parameters and almost always results in an appreciable reduction in the R-factor. However, this represents an improvement in the model only when it is accompanied by a significant reduction in the free R-factor (Brünger, 1992). Since the free R-factor is itself subject to uncertainty because of the small sample used, a drop of at least 1% is needed to justify anisotropic refinement. There should also be a reduction in the goodness of fit, and the resulting thermal ellipsoids should make chemical sense and not be ‘non-positive-definite’!
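The quantity that a rigid bond restraint acts on can be sketched as follows. This is an illustration only: it assumes that the Uij values are expressed on Cartesian axes in Å² and are listed in the order U11 U22 U33 U23 U13 U12, and the function names are hypothetical. For each atom the mean-square displacement along the bond is d·U·d, where d is the unit vector along the bond; the restraint asks the two values to be equal.

```python
import math


def msd_along(u, direction):
    """Mean-square displacement d.U.d along the unit vector `direction`.

    u = (U11, U22, U33, U23, U13, U12), assumed to refer to Cartesian axes.
    """
    u11, u22, u33, u23, u13, u12 = u
    x, y, z = direction
    return (u11 * x * x + u22 * y * y + u33 * z * z
            + 2.0 * (u23 * y * z + u13 * x * z + u12 * x * y))


def rigid_bond_difference(pos_a, u_a, pos_b, u_b):
    """Difference in mean-square displacement of atoms A and B along the A-B vector."""
    bond = [b - a for a, b in zip(pos_a, pos_b)]
    length = math.sqrt(sum(c * c for c in bond))
    unit = [c / length for c in bond]
    return msd_along(u_a, unit) - msd_along(u_b, unit)


# Two atoms bonded along x: the ellipsoids differ, but not along the bond.
u_a = (0.05, 0.02, 0.02, 0.0, 0.0, 0.0)
u_b = (0.05, 0.08, 0.03, 0.0, 0.0, 0.0)
delta = rigid_bond_difference((0.0, 0.0, 0.0), u_a, (1.5, 0.0, 0.0), u_b)
```

In this example `delta` is zero even though the two atoms have quite different ellipsoids, which shows why the rigid bond condition alone does not make neighbouring atoms look alike and why the similar ADP restraint is a separate, softer restraint.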
In the .ins file at the end of this chapter, since the resolution is borderline for an anisotropic refinement of all atoms, one could try ‘ANIS’ (all atoms assumed) instead of ‘ANIS_* FE SD SG’ but add an ‘approximately isotropic’ restraint with a tight esd (0.03) on all atoms except iron and sulfur (which are better defined because of their larger number of electrons) by:

ISOR 0.03 $C_* $N_* $O_*

5.7 Similar geometry and NCS restraints

When there are several identical chemical moieties in the asymmetric unit, a very effective restraint is to assume that the chemically equivalent 1,2- and 1,3-distances are the same, but unknown. This technique is easy to apply using SHELXL and is often employed for small-molecule structures and, in particular, for oligosaccharides. Similarly, the terminal P-O bond lengths in DNA structures can be assumed to be the same (but without a target value), i.e., it is assumed that the whole crystal is at the same pH. For proteins, the method is less suitable because of the different abundance of the different amino-acids, and, in any case, good target distances are available (Engh & Huber, 1991).

Local non-crystallographic-symmetry (NCS) restraints (Usón et al., 1999) may be applied to restrain corresponding 1,4-distances and isotropic displacement parameters to be the same when there are several identical macromolecular domains in the asymmetric unit; usually, the 1,2- and 1,3-distances are restrained to standard values in such cases and so do not require NCS restraints. Such local NCS restraints are more flexible than global NCS constraints and – unlike the latter – do not require the specification of a transformation matrix and mask. The NCSY instruction requires the user to define a set of atoms and the difference(s) in residue numbers between the NCS related units (often 1000, if this offset has been used to represent different chains).
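The way a residue-number offset identifies NCS-related atoms can be sketched in a few lines. This is not how SHELXL implements NCSY; it is only an illustration, with hypothetical names, of pairing each atom with the atom of the same name whose residue number is larger by the offset and then comparing corresponding interatomic distances. Coordinates are assumed to be Cartesian, in Å.

```python
import math


def ncs_distance_differences(coords, atom_pairs, offset):
    """Compare each distance with the corresponding distance in the NCS copy.

    coords maps (residue_number, atom_name) to Cartesian coordinates.
    atom_pairs lists pairs of such keys in the first copy (for example
    1,4-related atoms). The NCS copy is found by adding `offset` to the
    residue numbers. Pairs with atoms missing in either copy are skipped.
    """
    differences = {}
    for a, b in atom_pairs:
        a2 = (a[0] + offset, a[1])
        b2 = (b[0] + offset, b[1])
        if all(key in coords for key in (a, b, a2, b2)):
            d1 = math.dist(coords[a], coords[b])
            d2 = math.dist(coords[a2], coords[b2])
            differences[(a, b)] = d1 - d2
    return differences


coords = {
    (5, "N"): (0.0, 0.0, 0.0),
    (6, "N"): (3.0, 0.0, 0.0),
    (1005, "N"): (10.0, 0.0, 0.0),
    (1006, "N"): (10.0, 3.5, 0.0),
}
pairs = [((5, "N"), (6, "N")), ((6, "N"), (7, "N"))]
diffs = ncs_distance_differences(coords, pairs, offset=1000)
```

Because only internal distances are compared, no transformation matrix relating the two copies is needed, which is the practical advantage of local restraints mentioned above.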
It should be noted that SHELXPRO provides a variety of ways of representing NCS differences graphically.

5.8 Modeling disorder

There are many ways of modeling disorder using SHELXL, but for macromolecules the most convenient is to retain the same atom and residue names for the two or more components and assign a different ‘part number’ (analogous to the PDB alternative site flag) to each component. With this technique, no change is required to the input restraints, etc. Atoms in the same component will normally have a common occupancy that is assigned to a free variable (fv). The starting values for the free variables are given, in order, on the FVAR instruction; note that there is no free variable number 1 (adding 10 fixes a parameter); the first FVAR parameter is the overall scale factor. Residues Glu_12 and Cys_38 have disordered side-chains in the example; their occupancies are tied to fv(2) (for the atoms in component [PART] 1) and to 1-fv(2) for the atoms in component 2 for Glu_12, and similarly fv(4) and 1-fv(4) for Cys_38. This ensures that the sum of occupancies for both components is held at unity. ‘21.0’ is interpreted as 1.0 times fv(2), and ‘–21.0’ as 1.0 times [1-fv(2)]. This notation is not very intuitive, but it is concise and very flexible.

Free variables may also be used in DFIX and CHIV restraints. Thus ‘CHIV_PRO 31 CA’ would cause the chiral volumes of all proline CA atoms to be restrained to free variable number 3, which itself is allowed to refine. In this way reasonable geometrical restraints can be applied even when the target values are unknown. By restraining distances to be equal to a free variable using DFIX, a standard deviation of the mean distance may be calculated rigorously using full-matrix least-squares algebra. If there are three or more disorder components, then each of the common occupancies must be assigned to a separate free variable (e.g.
as 51, 61 and 71), and their sum can be restrained to unity by the use of a SUMP restraint (e.g. ‘SUMP 1 0.01 1 5 1 6 1 7’).

5.9 The bulk solvent correction and water divining

Modeling the low resolution data is always difficult in macromolecular refinement. Of course these data could be left out (using the SHEL instruction) but then the electron density maps would suffer considerably. The mean electron density of the solvent is only slightly less than that of protein or DNA, but the model usually contains no atoms in the middle of the solvent regions because the solvent density is so featureless. The result is that the observed diffracted intensities tend on average to be much smaller than those calculated from the model at very low resolution. SHELX, in common with several other programs, uses Babinet’s principle to define a bulk solvent model with two refinable parameters (Moews & Kretsinger, 1975). In addition, global anisotropic scaling (Usón et al., 1999) may be applied using a parameterization proposed by Parkin, Moezzi & Hope (1995). An auxiliary program, SHELXWAT, allows automatic water divining by iterative least-squares refinement, rejection of waters with high displacement parameters, difference electron density calculation, and a peak-search for potential water molecules that make at least one good hydrogen bond and no bad contacts; this is a highly simplified version of the ARP procedure of Perrakis, Lamzin et al. (1997, 1999).

5.10 Twinned crystals

SHELXL provides facilities for refining against data from merohedral, pseudo-merohedral, and non-merohedral twins (Herbst-Irmer & Sheldrick, 1998). Refinement against data from merohedrally twinned crystals is particularly straightforward, requiring only the twin law (a 3x3 matrix) and starting values for the volume fractions of the twin components.
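What the twin law and the volume fraction do to the calculated intensities can be illustrated with a short sketch for a twin with two components. It assumes that the nine numbers of the twin law are the rows of a matrix that multiplies the column vector (h, k, l), and that the calculated intensity is the volume-fraction-weighted sum of the F² values of the two twin-related reflections; the function names and the numbers in the example are hypothetical. The twin law used is the one quoted in the FAQ section of these notes.

```python
def apply_twin_law(hkl, law):
    """Return the twin-related indices R.(h, k, l), with R given row by row as 9 numbers."""
    return tuple(
        round(sum(law[3 * row + col] * hkl[col] for col in range(3)))
        for row in range(3)
    )


def twinned_intensity(f_sq, hkl, law, fraction):
    """Calculated intensity for a two-component merohedral twin.

    f_sq maps indices to the untwinned calculated F^2; `fraction` is the
    volume fraction of the second twin component (0 <= fraction <= 1).
    """
    return (1.0 - fraction) * f_sq[hkl] + fraction * f_sq[apply_twin_law(hkl, law)]


twin_law = (0, 1, 0, 1, 0, 0, 0, 0, -1)  # exchanges h and k, inverts l
f_sq = {(1, 2, 3): 100.0, (2, 1, -3): 40.0}
intensity = twinned_intensity(f_sq, (1, 2, 3), twin_law, 0.3)
```

With a fraction of 0 the untwinned value is recovered, and with a fraction of 0.5 the two twin-related reflections receive identical intensities, which is the situation in which the data appear to have higher symmetry than the structure.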
Failure to recognize such twinning not only results in high R-factors and poor quality maps, it can also lead to incorrect biochemical conclusions (Luecke, Richter & Lanyi, 1998). Twinning can often be detected by statistical tests (Yeates & Fam, 1999), and it is probably much more widespread in macromolecular crystals than is generally appreciated! No changes are needed to the .hkl file for merohedral twinning, but the data should be merged in the lower of the two relevant Laue groups. For non-merohedral twinning a special (‘HKLF 5’) format is required; probably the only general program currently able to generate this format is the Bruker GEMINI program, though several SHELX users have written special programs for individual cases.

5.11 The radius of convergence

Least-squares refinement as implemented in SHELXL and other programs is appropriate for structural models that are relatively complete, but when an appreciable fraction of the structure is still to be located, maximum-likelihood refinement (Bricogne, 1991; Pannu & Read, 1996; Murshudov, Vagin & Dodson, 1997) is likely to be more effective, especially when experimental phase information can be incorporated (Pannu et al., 1998). Within the least-squares framework, there are still several possible ways of improving the radius of convergence. SHELXL provides the option of gradually extending the resolution of the data during the refinement; a similar effect may be achieved by a resolution-dependent weighting scheme (Terwilliger & Berendzen, 1996). Unimodal restraints, such as target distances, are less likely to result in local minima than are multimodal restraints, such as torsion angles; multimodal functions are better used as validation criteria. It is fortunate that validation programs, such as PROCHECK (Laskowski et al., 1993), make good use of multimodal functions, such as torsion angles and hydrogen bonding patterns, that are not employed as restraints in SHELXL refinements.
5.12 Unstable refinements and other problems

However much care is taken in setting up a refinement, it can happen that the refinement becomes unstable and diverges. Usually the program detects this in time, but in extreme cases, especially when full-matrix refinement is performed with a poorly conditioned matrix, it can crash. It is much more difficult to identify the cause of such a problem when a large number of changes have been made in updating a .res file to the .ins file for the next job, so it is often more effective to improve the model in small steps. The .lst file contains a great deal of useful diagnostic information (which can be increased by using MORE 3); however, the best place to start looking for problems is the list of ‘disagreeable restraints’; these often pinpoint the atoms or restraints that need changing. Also the presence of unrestrained atoms (which are commented on by the program) is a common cause of instability. In general, the more parameters that are refined, the less stable the refinement becomes; typical examples are the inclusion of dubious solvent water molecules or making all atoms anisotropic when there are not enough data.

Anti-bumping restraints are very useful in maintaining a chemically sensible structure, especially at lower resolution, but can also set traps for the unwary. For example, if two atoms that should be bonded are too far apart for the program to include them automatically in the connectivity array, an anti-bumping restraint may be generated automatically to push them apart and this will fight against a DFIX or DANG restraint that is trying to bring them together! The remedy is to join the two atoms by hand so that they are bonded in the connectivity array, e.g.

BIND CB_23 CG_23

Even if the side-chain of residue 23 in this example is disordered and the bond is only broken in one component, this will have the desired effect.
An incorrect connectivity can also affect the operation of a CHIV instruction (which requires the specified atom to be bonded to three and only three non-hydrogen atoms) and the automatic generation of hydrogen atoms (HFIX). Superfluous bonds may be removed from the connectivity array using e.g.

FREE CB_23 CD_23

Usually if the connectivity array (included in the .lst file except for MORE 0) is correct, the restraints will ensure that a sensible geometry is obtained during the refinement.

5.13 Example of an .ins file for SHELXL refinement

The following extracts from the file 6rxn.ins (provided together with 6rxn.hkl on the SHELX ftp site) illustrate a number of points that are annotated by comments. The structure was determined by Stenkamp, Sieker & Jensen (1990), who have kindly given permission for it to be used in this way. As usual in .ins files, comments may be included as REM instructions or after exclamation marks. The resolution of 1.5Å does not quite justify refinement of all non-hydrogen atoms anisotropically ('ANIS' before the first atom would specify this), but the iron and sulfur atoms can be made anisotropic as shown below.

TITL Rubredoxin in P1 (from 6RXN in PDB)
CELL 1.54178 24.920 17.790 19.720 101.00 83.40 104.50   ! Lambda & cell
ZERR 1 0.025 0.018 0.020 0.05 0.05 0.05                 ! Z & cell esds
LATT -1                                                 ! Space group P1
SFAC C H N O S FE                                       ! Scattering factor types and
UNIT 224 498 55 136 6 1                                 ! unit-cell contents

DEFS 0.02 0.2 0.01 0.05          ! Global default restraint esds
CGLS 10 -1                       ! 10 Conjugate gradient cycles, calculate Rfree
SHEL 999 0.1                     ! Do not truncate resolution
FMAP 2                           ! Difference Fourier
PLAN 200 2.3                     ! Peak-search and identification of potential waters
LIST 6                           ! Output phased reflection file to generate maps etc.
WPDB                             ! Write PDB output file
HTAB                             ! Output analysis of hydrogen bonds (requires H-atoms !)

DELU $C_* $N_* $O_* $S_*         ! Rigid bond restraints - ignored for isotropic
SIMU 0.1 $C_* $N_* $O_* $S_*     ! Similar U restraints - iso. or anis.
                                 ! Esd should be changed to ca. 0.05 if whole structure is anis.
ISOR 0.1 O_201 > LAST            ! Approximate isotropic restraints for waters;
                                 ! ignored for isotropic
ANIS_* FE SD SG                  ! Make iron and all sulfur atoms anisotropic
CONN 0 O_201 > LAST              ! Don't include water in connectivity array and
BUMP                             ! generate antibumping restraints automatically
SWAT                             ! Bulk solvent model
REM HOPE                         ! Anisotropic scaling not included
MERG 4                           ! Remove MERG 4 if Friedel opposites should not be merged
MORE 1                           ! MORE 0 for minimum, 2 or 3 for more output for diagnostics

REM Special restraints etc. specific to this structure follow:

REM HFIX 43 C1_1
DFIX C1_1 N_1 1.329              ! O=C(H)- (formyl) on N-terminus
DFIX C1_1 O1_1 1.231             ! incorporated into residue 1
DANG N_1 O1_1 2.250
DANG C1_1 CA_1 2.435

DFIX_52 C OT1 C OT2 1.249
DANG_52 CA OT1 CA OT2 2.379      ! Ionized carboxyl at C-terminus
DANG_52 OT1 OT2 2.194

SADI_54 0.04 FE SG_6 FE SG_9 FE SG_39 FE SG_42   ! Equal but unknown Fe-S
SADI_54 0.08 FE CB_6 FE CB_9 FE CB_39 FE CB_42   ! distances around Fe

REM HFIX 83 SG_38 SG_138         ! -SH for remaining cysteine (disordered)

DFIX C_18 N_26 1.329                     ! Patch break in numbering - residues
DANG O_18 N_26 2.250                     ! 18 and 26 are bonded but there is a
DANG CA_18 N_26 2.425                    ! gap in numbering for compatibility
DANG C_18 CA_26 2.435                    ! with other rubredoxins that have an
FLAT 0.3 O_18 CA_18 N_26 C_18 CA_26      ! extra loop
RTAB Omeg CA_18 C_18 N_26 CA_26
RTAB Phi C_18 N_26 CA_26 C_26
RTAB Psi N_18 CA_18 C_18 N_26

REM DFIX from CSD and R.A.Engh & R.Huber, Acta Cryst. A47 (1991) 392.

REM Remove 'REM ' before HFIX to activate H-atom generation

REM HFIX_ALA 43 N
REM HFIX_ALA 13 CA
REM HFIX_ALA 33 CB

REM HFIX_ASN 43 N
REM HFIX_ASN 13 CA
REM HFIX_ASN 23 CB
REM HFIX_ASN 93 ND2

REM HFIX_ASP 43 N
REM HFIX_ASP 13 CA
REM HFIX_ASP 23 CB

... etc ...
REM HFIX_VAL 43 N
REM HFIX_VAL 13 CA CB
REM HFIX_VAL 33 CG1 CG2

REM Peptide standard torsion angles and restraints

RTAB_* Omeg CA C N_+ CA_+
RTAB_* Phi C_- N CA C
RTAB_* Psi N CA C N_+
RTAB_* Cvol CA

DFIX_* 1.329 C_- N
DANG_* 2.425 CA_- N
DANG_* 2.250 O_- N
DANG_* 2.435 C_- CA
FLAT_* 0.3 O_- CA_- N C_- CA

REM Standard amino-acid restraints etc.

CHIV_ALA C
CHIV_ALA 2.477 CA
DFIX_ALA 1.231 C O
DFIX_ALA 1.525 C CA
DFIX_ALA 1.521 CA CB
DFIX_ALA 1.458 N CA
DANG_ALA 2.462 C N
DANG_ALA 2.401 O CA
DANG_ALA 2.503 C CB
DANG_ALA 2.446 CB N

RTAB_ASN Chi N CA CB CG
CHIV_ASN C CG
CHIV_ASN 2.503 CA
DFIX_ASN 1.231 C O CG OD1
DFIX_ASN 1.525 C CA
DFIX_ASN 1.458 N CA
DFIX_ASN 1.530 CA CB
DFIX_ASN 1.516 CB CG
DFIX_ASN 1.328 CG ND2
DANG_ASN 2.401 O CA
DANG_ASN 2.462 C N
DANG_ASN 2.455 CB N
DANG_ASN 2.504 C CB
DANG_ASN 2.534 CA CG
DANG_ASN 2.393 CB OD1
DANG_ASN 2.419 CB ND2
DANG_ASN 2.245 OD1 ND2

RTAB_ASP Chi N CA CB CG
CHIV_ASP C CG
CHIV_ASP 2.503 CA
DFIX_ASP 1.231 C O
DFIX_ASP 1.525 C CA
DFIX_ASP 1.530 CA CB
DFIX_ASP 1.516 CB CG
DFIX_ASP 1.458 CA N
DFIX_ASP 1.249 CG OD1 CG OD2
DANG_ASP 2.401 O CA
DANG_ASP 2.462 C N
DANG_ASP 2.455 CB N
DANG_ASP 2.504 C CB
DANG_ASP 2.534 CA CG
DANG_ASP 2.379 CB OD1 CB OD2
DANG_ASP 2.194 OD1 OD2

RTAB_CYS Chi N CA CB SG
CHIV_CYS C
CHIV_CYS 2.503 CA
DFIX_CYS 1.231 C O
DFIX_CYS 1.525 C CA
DFIX_CYS 1.458 N CA
DFIX_CYS 1.530 CA CB
DFIX_CYS 1.808 CB SG
DANG_CYS 2.401 O CA
DANG_CYS 2.504 C CB
DANG_CYS 2.455 CB N
DANG_CYS 2.462 C N
DANG_CYS 2.810 CA SG

... etc ...
RTAB_VAL Chi N CA CB CG1 RTAB_VAL Chi N CA CB CG2 CHIV_VAL C CHIV_VAL 2.516 CA DFIX_VAL DFIX_VAL DFIX_VAL DFIX_VAL DFIX_VAL DANG_VAL DANG_VAL DANG_VAL DANG_VAL DANG_VAL DANG_VAL WGHT FVAR RESI C1 O1 N CA CB CG SD CE C O RESI N CA CB CG CD OE1 1.231 1.458 1.525 1.540 1.521 2.401 2.462 2.497 2.515 2.479 2.504 C O N CA C CA CA CB CB CG2 CB CG1 O CA C N C CB CA CG1 CA CG2 N CB CG1 CG2 0.100000 1.00000 1 1 4 3 1 1 1 5 1 1 4 2 3 1 1 1 1 4 MET -0.01633 0.01012 0.00712 0.05947 0.07411 0.03196 0.04907 0.11380 0.10634 0.10329 GLN 0.14741 0.18940 0.22933 0.27354 0.24547 0.22482 0.5 0.5 0.5 0.5 0.35547 0.32681 0.44703 0.48491 11.00000 11.00000 0.11817 0.17896 0.35446 0.33273 0.33732 0.28864 0.31846 0.29170 0.38738 0.45513 0.37983 0.35391 0.27909 0.22872 0.14359 0.12261 0.39766 0.41972 11.00000 11.00000 11.00000 11.00000 11.00000 11.00000 11.00000 11.00000 0.11863 0.06229 0.15678 0.14569 0.23570 0.21476 0.09178 0.16480 0.35678 0.39931 0.34643 0.38674 0.38838 0.32772 0.40741 0.45565 0.45886 0.51173 0.58387 0.60689 11.00000 11.00000 11.00000 11.00000 11.00000 11.00000 0.08599 0.09291 0.13253 0.09866 0.05748 0.16301 42 NE2 C O RESI N CA CB CG CD CE NZ C O 3 1 4 0.24704 0.22198 0.25019 0.46053 0.47895 0.48377 0.62045 0.43826 0.38408 11.00000 11.00000 11.00000 0.10164 0.08193 0.10402 3 1 1 1 1 1 3 1 4 LYS 0.21781 0.25088 0.21991 0.16130 0.12843 0.10532 0.05943 0.30678 0.31462 0.54034 0.62006 0.68311 0.66288 0.72146 0.70085 0.74195 0.63497 0.59598 0.48673 0.47934 0.51795 0.49255 0.52924 0.60053 0.62796 0.50917 0.55179 11.00000 11.00000 11.00000 11.00000 11.00000 11.00000 11.00000 11.00000 11.00000 0.07413 0.05181 0.09646 0.10455 0.22324 0.26354 0.40338 0.05714 0.07986 3 ... etc ... 
RESI N CA PART CB CG CD OE1 OE2 PART CB CG CD OE1 OE2 PART C O 12 3 1 GLU 0.41413 0.37955 1.09215 1.01183 0.48246 0.48195 11.00000 11.00000 0.06790 0.05761 1 1 1 4 4 0.32666 0.29679 0.25357 0.24346 0.23012 1.01321 0.93111 0.93709 1.00278 0.87537 0.52971 0.54638 0.60700 0.63210 0.63031 21.00000 21.00000 21.00000 21.00000 21.00000 0.12219 0.15333 0.20272 0.26315 0.21375 1 1 1 4 4 0.32549 0.27756 0.22547 0.20774 0.20259 1.01718 0.94582 0.95184 0.90241 1.00588 0.52772 0.50954 0.55635 0.59575 0.55325 -21.00000 -21.00000 -21.00000 -21.00000 -21.00000 0.12065 0.15928 0.20457 0.22329 0.31441 1 4 0.36477 0.34317 0.97439 1.00861 0.40859 0.37369 11.00000 11.00000 0.04768 0.06890 1 2 0 ... etc ... RESI N CA PART CB SG PART CB SG PART C O 38 RESI N CA CB SG C O 39 3 1 CYS 0.77141 0.78873 0.92674 0.97402 0.00625 0.07449 11.00000 11.00000 0.10936 0.13706 1 5 0.83868 0.89948 1.04271 1.00271 0.05517 0.02305 41.00000 41.00000 0.11889 0.18205 1 5 0.84149 0.83686 1.03666 1.10360 0.06538 -41.00000 0.01026 -41.00000 0.14933 0.17328 1 4 0.74143 0.70724 1.01670 1.02319 0.10383 0.06903 11.00000 11.00000 0.08401 0.10188 3 1 1 5 1 4 CYS 0.74699 0.70682 0.72588 0.67932 0.70922 0.75427 1.04547 1.09027 1.11964 1.17560 1.16093 1.20325 0.17051 0.20876 0.28230 0.33481 0.17333 0.15858 11.00000 11.00000 11.00000 11.00000 11.00000 11.00000 0.08888 0.06869 0.04269 0.08016 0.06208 0.07437 1 2 0 43 ... etc ... RESI N CA CB C OT1 OT2 52 RESI FE 54 REM REM REM REM 3 1 1 1 4 4 ALA 0.33596 0.30961 0.34040 0.24852 0.22236 0.22682 0.63469 0.68882 0.77357 0.67507 0.72170 0.61667 0.69557 0.74487 0.74194 0.73435 0.77321 0.69191 11.00000 11.00000 11.00000 11.00000 11.00000 11.00000 0.04662 0.08939 0.13277 0.09032 0.11368 0.08341 1.22290 0.43784 11.00000 0.07929 FE 6 0.72017 Only the waters with high occupancies and low U's have been retained, and all the occupancies have been reset to 1, with a view to running the automatic water divining. Water residue numbers have been changed to start at 201. 
RESI 201 HOH
O 4 0.13450 0.53192 0.60802 11.00000 0.13132
RESI 202 HOH
O 4 0.84795 0.53873 0.69488 11.00000 0.15273
RESI 203 HOH
O 4 0.27771 0.95750 0.25086 11.00000 0.11315
RESI 204 HOH
O 4 0.37066 0.71872 0.90376 11.00000 0.10854

... etc ...

RESI 233 HOH
O 4 0.27813 1.38725 0.25914 11.00000 0.10698

HKLF 3
END

6. Frequently asked questions (by biocrystallographers)

Q1: Where is the manual?
A: Postscript and MSWord versions of the manual can be downloaded from the SHELX ftp site. However, this manual was written for small-molecule crystallographers; you will still need it as a reference book (it even has an index), but you should start by reading the material provided for the Workshop. There is also a lot of useful information on the SHELX homepage or accessible via links from it, including tutorials for which test data are available.

Q2: How do I transfer my data, including Rfree flags, from X-PLOR or other programs to SHELX?
A: Use the ‘Y’ option in SHELXPRO to convert the .fob file to .hkl, and the ‘I’ option to convert .pdb to .ins. Although SHELXL prefers intensities, for macromolecules it is OK to continue to use F-values if you were using them in X-PLOR. In CCP4, the mtz2various program can write SHELX format files. The Bruker XPREP program provides a space-group general option for transferring Rfree flags from one data-set to another, taking equivalent reflections into account.

Q3: I have a non-standard ligand, how do I make the topology file?
A: SHELXL doesn’t have a topology file; the restraints etc. are all included in the .ins file. One good way to generate these restraints is to find a suitable fragment in the CSD, then use the ‘J’ option in SHELXPRO. If it’s not in the CSD, you could do a quick small-molecule structure (using SHELX) and feed that into SHELXPRO.

Q4: Why are the R-factors different from X-PLOR etc.?
A: Check that you are using the same data (F or F², resolution cutoffs, Rfree flags?)
and that the bulk solvent model is not causing problems (it tends to interact with the B-values, so it might be best to do a few refinement cycles first to sort this out).

Q5: After using SHELXPRO to prepare the .ins file from a PDB file and then running SHELXL, I get the message: ‘** No match for two atoms in DFIX **’ but otherwise everything seems OK.
A: This message probably refers to the fact that SHELXPRO labels the oxygens of the carboxy-terminus OT1 and OT2 so that different bond length restraints can be applied than to the same type of amino-acid when it is in a peptide chain. This is normal and can be safely ignored. Other such messages should always be investigated carefully; they may indicate missing or bad restraints or bad initial connectivity (which can be corrected using BIND and FREE).

Q6: I can solve the structure by molecular replacement in space group P32 but the R-factors are high and the Rsym for P3221 was not much higher than for P32. What should I do?
A: Your structure may well be merohedrally twinned, but don’t panic! The E-statistics can be calculated using e.g. SHELXS, SHELXD or XPREP; <|E²–1|> << 0.736 would also suggest twinning. All you need to do in this particular case is to include the two instructions:

TWIN 0 1 0 1 0 0 0 0 –1
BASF 0.3

in your .ins file and repeat the refinement job! If the BASF parameter (you can find it in the .lst or .res file) refines to a value intermediate between 0 and 1, and R1 and Rfree drop significantly, you are winning. No other special action is needed; SHELXPRO and XtalView can be used in the usual way because the .fcf file is effectively ‘detwinned’.

Q7: When is it justified to refine anisotropically?
A: In general if the resolution is worse than about 1.5Å it is unlikely to be worth trying, but it depends on the completeness and quality of the data and the percentage of solvent. A drop of Rfree of about 1% or more might be considered to justify anisotropic refinement.
In borderline cases tighter restraints (including possibly ISOR for all atoms) might be needed.

Q8: When should I add hydrogen atoms?
A: As late as possible because they cost computer time, though adding them usually brings a drop in Rfree of between 0.5 and 1.0%. In many cases it is probably more trouble than it is worth to include the OH groups; these hydrogens tend to have higher B-values and are more difficult to position automatically. If one is unlucky, the program will put two OH hydrogens along the same hydrogen bond, and the combination of anti-bumping restraints and the riding model can then distort the rest of the structure.

Q9: SHELXL complains that it does not have enough memory, what should I do?
A: Use the larger version SHELXH. If even this is not large enough, you will have to change the dimensions of the arrays A and B and recompile the program. This is explained in the comments at the start of the source shelxl.f and compiling instructions are given on the SHELX homepage.

Q10: What does ‘nan’ mean?
A: ‘Not a number’. Something has gone seriously wrong with the calculation. Check the .lst file for other warning messages and in particular the list of ‘disagreeable restraints’ for indications of the source of the error. Perhaps you are simply trying to refine more parameters than the data can support. It is also a good idea to introduce changes in small steps rather than to change everything at once.

Q11: What citations should be quoted when I write up a structure solved or refined with SHELX?
A: One or more of the following are suggested, depending on which programs were used:

SHELXL (refinement): Sheldrick, G. M. & Schneider, T. R. (1997). SHELXL: high resolution refinement. Methods in Enzymology, 277, edited by C. W. Carter, Jr. & R. M. Sweet, pp. 319–343. San Diego: Academic Press.

SHELXS (direct methods): Sheldrick, G. M. (1990). Phase annealing in SHELX-90: direct methods for larger structures. Acta Cryst. A46, 467–473.
SHELXS (Patterson): Sheldrick, G. M., Dauter, Z., Wilson, K. S., Hope, H. & Sieker, L. C. (1993). The application of direct methods and Patterson interpretation to high-resolution native protein data. Acta Cryst. D49, 18–23.

SHELXD: Usón, I. & Sheldrick, G. M. (1999). Advances in direct methods for protein crystallography. Curr. Opinion in Struct. Biol. 9, 643–648.

Q12: I have a lot more questions ...

A: Further ‘frequently asked questions’ may be found on the SHELX homepage and in Thomas Schneider’s FAQ list:

http://shelx.uni-ac.gwdg.de/~trs/shelxl_faq/shelxfaq_index.html

The latter contains a great deal of useful information and is primarily intended for macromolecular crystallographers using SHELXL. When you have really exhausted these sources of information you may try an email to gsheldr@shelx.uni-ac.gwdg.de, uson@shelx.uni-ac.gwdg.de or trs@shelx.uni-ac.gwdg.de; questions on twinning may be sent to rherbst@shelx.uni-ac.gwdg.de.

7. References

Bricogne, G. (1991). A multisolution method of phase determination by combined maximization of entropy and likelihood III. Extension to powder diffraction data. Acta Cryst. A47, 803–829.

Brodersen, D. E., de La Fortelle, E., Vonrhein, C., Bricogne, G., Nyborg, J. & Kjeldgaard, M. (2000). Application of single-wavelength anomalous dispersion at high and atomic resolution. Acta Cryst. D56, 431–441.

Brünger, A. T. (1992). Free R value: a novel statistical quantity for assessing the accuracy of crystal structures. Nature (London) 355, 472–475.

Buerger, M. J. (1959). Vector space and its application in crystal structure investigation. New York: Wiley.

Buerger, M. J. (1964). Image methods in crystal structure analysis. In Advanced methods of crystallography, edited by G. N. Ramachandran, pp. 1–24. Orlando, Florida: Academic Press.

Busetta, B., Giacovazzo, C., Burla, M. C., Nunzi, A., Polidori, G. & Viterbo, D. (1980). The SIR program. I. Use of negative quartets. Acta Cryst. A36, 68–74.

Cowtan, K. (1999).
Error estimation and bias correction in phase-improvement calculations. Acta Cryst. D55, 1555–1567.

Cruickshank, D. W. J. (1970). Least-squares refinement of atomic parameters. In Crystallographic computing, edited by F. R. Ahmed, S. R. Hall & C. P. Huber, pp. 187–197. Copenhagen: Munksgaard.

Debaerdemaeker, T., Tate, C. & Woolfson, M. M. (1985). On the application of phase relations to complex structures. XXIV. The Sayre tangent formula. Acta Cryst. A41, 286–290.

De La Fortelle, E. & Bricogne, G. (1997). Maximum-Likelihood Heavy-Atom Parameter Refinement in the MIR and MAD Methods. Methods in Enzymology 276, edited by C. W. Carter, Jr. & R. M. Sweet, pp. 472–494. San Diego: Academic Press.

Engh, R. A. & Huber, R. (1991). Accurate bond and angle parameters for X-ray protein structure refinement. Acta Cryst. A47, 392–400.

Fujinaga, M. & Read, R. J. (1987). Experiences with a new translation-function program. J. Appl. Cryst. 20, 517–521.

Hendrickson, W. A. (1991). Determination of macromolecular structures from anomalous diffraction of synchrotron radiation. Science 254, 51–58.

Hendrickson, W. A. & Konnert, J. H. (1980). Incorporating stereochemical information into crystallographic refinement. In Computing in crystallography, edited by R. Diamond, S. Ramaseshan & K. Venkatesan, pp. 13.01–13.23. Bangalore: Indian Academy of Sciences.

Herbst-Irmer, R. & Sheldrick, G. M. (1998). Refinement of twinned structures with SHELX97. Acta Cryst. B54, 443–449.

Hirshfeld, F. L. (1976). Can X-ray data distinguish bonding effects from vibrational smearing? Acta Cryst. A32, 239–244.

Jones, T. A. (1978). A graphics model-building and refinement system for macromolecules. J. Appl. Cryst. 11, 268–272.

Jones, T. A., Zou, J. Y., Cowan, S. W. & Kjeldgaard, M. (1991). Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Cryst. A47, 110–119.

Karle, J. & Hauptman, H. (1956).
A theory of phase determination for the four types of noncentrosymmetric space groups 1P222, 2P22, 3P12, 3P22. Acta Cryst. 9, 635–651.

Langs, D. A. (1988). Three-dimensional structure at 0.86 Å of uncomplexed form of the transmembrane ion channel peptide gramicidin A. Science 241, 188–191.

Laskowski, R. A., MacArthur, M. W., Moss, D. S. & Thornton, J. M. (1993). PROCHECK, a program to check the stereochemical quality of protein structures. J. Appl. Cryst. 26, 283–291.

Luecke, H., Richter, H. T. & Lanyi, J. K. (1998). Proton transfer pathways in bacteriorhodopsin at 2.3 Å resolution. Science 280, 1934–1937.

McRee, D. E. (1992). A visual protein crystallographic software system for X11/Xview. J. Mol. Graph. 10, 44–46.

Miller, R., DeTitta, G. T., Jones, R., Langs, D. A., Weeks, C. M. & Hauptman, H. A. (1993). On the application of the minimal principle to solve unknown structures. Science 259, 1430–1433.

Miller, R., Gallo, S. M., Khalak, H. G. & Weeks, C. M. (1994). SnB: crystal structure determination via Shake-and-Bake. J. Appl. Cryst. 27, 613–621.

Moews, P. C. & Kretsinger, R. H. (1975). Refinement of carp muscle parvalbumin by model building and difference Fourier analysis. J. Mol. Biol. 91, 201–228.

Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Refinement of structures by the maximum-likelihood method. Acta Cryst. D53, 240–255.

Nordman, C. E. (1966). Vector space search and refinement procedures. Trans. Am. Cryst. Assoc. 2, 29–38.

Parisini, E., Capozzi, F., Lubini, P., Lamzin, V. S., Luchinat, C. & Sheldrick, G. M. (1999). Ab initio solution and refinement of two high potential iron protein structures at atomic resolution. Acta Cryst. D55, 1773–1784.

Parkin, S., Moezzi, B. & Hope, H. (1995). XABS2: an empirical absorption correction program. J. Appl. Cryst. 28, 53–56.

Pannu, N. S., Murshudov, G. N., Dodson, E. J. & Read, R. J. (1998). Incorporation of prior phase information strengthens maximum-likelihood structure refinement. Acta Cryst. D54, 1285–1294.
Pannu, N. S. & Read, R. J. (1996). Improved structure refinement through maximum likelihood. Acta Cryst. A52, 659–668.

Perrakis, A., Morris, R. & Lamzin, V. S. (1999). Automated protein model building combined with iterative structure refinement. Nature Struct. Biol. 6, 458–463.

Perrakis, A., Sixma, T. K., Wilson, K. S. & Lamzin, V. S. (1997). wARP: Improvement and extension of crystallographic phases by weighted averaging of multiple-refined dummy atomic models. Acta Cryst. D53, 448–455.

Rayment, I., Wesenberg, G., Meyer, T. E., Cusanovich, M. A. & Holden, H. M. (1992). Three-dimensional structure of the high-potential iron-sulfur protein isolated from the purple phototrophic bacterium Rhodocyclus tenuis determined and refined at 1.5 Å resolution. J. Mol. Biol. 228, 672–686.

Read, R. J. (1985). Improved Fourier coefficients for maps using phases from partial structures with errors. Acta Cryst. A42, 140–149.

Refaat, L. S. & Woolfson, M. M. (1993). Direct-space methods in phase extension and phase determination. II. Developments of low-density elimination. Acta Cryst. D49, 367–371.

Richardson, J. W. & Jacobson, R. A. (1987). Computer-aided analysis of multi-solution Patterson superpositions. In Patterson and pattersons, edited by J. P. Glusker, B. Patterson & M. Rossi, pp. 311–317. Oxford: I.U.Cr. & O.U.P.

Rollett, J. S. (1970). Least-squares procedures in crystal structure analysis. In Crystallographic computing, edited by F. R. Ahmed, S. R. Hall & C. P. Huber, pp. 167–181. Copenhagen: Munksgaard.

Sayre, D. (1974). Least-squares phase refinement. II. High-resolution phasing of a small protein. Acta Cryst. A30, 180–184.

Selmer, M., Al-Karadaghi, S., Hirokawa, G., Kaji, A. & Liljas, A. (1999). Crystal Structure of Thermotoga maritima ribosome recycling factor: A tRNA mimic. Science 286, 2349–2352.

Sheldrick, G. M. (1985). Computing aspects of crystal structure determination. J. Mol. Struct. 130, 9–16.

Sheldrick, G. M. (1990).
Phase annealing in SHELX-90: direct methods for larger structures. Acta Cryst. A46, 467–473.

Sheldrick, G. M. (1991). Tutorial on automated Patterson methods to find heavy atoms. In Crystallographic computing 5, edited by D. Moras, A. D. Podjarny & J. C. Thierry, pp. 145–157. Oxford: I.U.Cr. & O.U.P.

Sheldrick, G. M. (1993). Refinement of large small-molecule structures using SHELXL-92. In Crystallographic computing 6, edited by H. D. Flack, L. Párkányi & K. Simon, pp. 111–122. Oxford: I.U.Cr. & O.U.P.

Sheldrick, G. M. (1998a). Location of heavy atoms by automated Patterson interpretation. In Direct methods for solving macromolecular structures, edited by S. Fortier, pp. 131–141. Dordrecht: Kluwer Academic Publishers.

Sheldrick, G. M. (1998b). SHELX: applications to macromolecules. In Direct methods for solving macromolecular structures, edited by S. Fortier, pp. 401–411. Dordrecht: Kluwer Academic Publishers.

Sheldrick, G. M., Dauter, Z., Wilson, K. S., Hope, H. & Sieker, L. C. (1993). The application of direct methods and Patterson interpretation to high-resolution native protein data. Acta Cryst. D49, 18–23.

Sheldrick, G. M. & Gould, R. O. (1995). Structure solution by iterative peaklist optimization and tangent expansion in space group P1. Acta Cryst. B51, 423–431.

Sheldrick, G. M. & Schneider, T. R. (1997). SHELXL: high resolution refinement. Methods in Enzymology, 277, edited by C. W. Carter, Jr. & R. M. Sweet, pp. 319–343. San Diego: Academic Press.

Shiono, M. & Woolfson, M. M. (1992). Direct-space methods in phase extension and phase determination. I. Low-density elimination. Acta Cryst. A48, 451–456.

Smith, J. L. (1998). Multiwavelength anomalous diffraction in macromolecular crystallography. In Direct methods for solving macromolecular structures, edited by S. Fortier, pp. 211–225. Dordrecht: Kluwer Academic Publishers.

Stenkamp, R. E., Sieker, L. C. & Jensen, L. H. (1990).
The structure of rubredoxin from Desulfovibrio desulfuricans strain 27774 at 1.5 Å resolution. Proteins, Struct. Funct. Genet. 8, 352–364.

Terwilliger, T. C. & Berendzen, J. (1996). Bayesian weighting for macromolecular crystallographic refinement. Acta Cryst. D52, 743–748.

Trueblood, K. N. & Dunitz, J. D. (1983). Internal motion in crystals. The estimation of force constants, frequencies and barriers from diffraction data. A feasibility study. Acta Cryst. B39, 120–133.

Usón, I., Pohl, E., Schneider, T. R., Dauter, Z., Schmidt, A., Fritz, H.-J. & Sheldrick, G. M. (1999). 1.7 Å Structure of the stabilised REIv mutant T39K. Application of local NCS restraints. Acta Cryst. D55, 1158–1167.

Usón, I. & Sheldrick, G. M. (1999). Advances in direct methods for protein crystallography. Curr. Opinion in Struct. Biol. 9, 643–648.

Usón, I., Sheldrick, G. M., de la Fortelle, E., Bricogne, G., di Marco, S., Priestle, J. P., Grütter, M. G. & Mittl, P. R. E. (1999). The 1.2 Å crystal structure of hirustasin reveals the intrinsic flexibility of a family of highly disulphide bridged inhibitors. Structure 7, 55–63.

Walsh, M. A., Dementieva, I., Evans, G., Sanishvili, R. & Joachimiak, A. (1999). Taking MAD to the extreme: ultrafast protein structure determination. Acta Cryst. D55, 1168–1173.

Xu, H., Weeks, C. M., Deacon, A. M., Miller, R. & Hauptman, H. A. (2000). Ill-conditioned Shake-and-Bake: the trap of the false solution. Acta Cryst. A56, 112–118.

Yeates, T. O. & Fam, B. C. (1999). Protein crystals and their evil twins. Structure 7, R25–R29.

8. Other useful sources of information

The following internet addresses may be consulted for programs discussed during the Workshop:

SHELX (George Sheldrick): http://shelx.uni-ac.gwdg.de/SHELX/
PLATON (Ton Spek): http://www.cryst.chem.uu.nl/platon/
WinGX (Louis Farrugia): http://www.chem.gla.ac.uk/~louis/wingx/
XtalView (Duncan McRee): http://www.scripps.edu/pub/dem-web/toc.html
Raster3D (Ethan Merritt): http://www.bmsc.washington.edu/raster3d/
Parvati (Ethan Merritt): http://www.bmsc.washington.edu/parvati/

and other speakers’ homepages are as follows:

Bill Clegg: http://www.staff.ncl.ac.uk/w.clegg/
Regine Herbst-Irmer: http://shelx.uni-ac.gwdg.de/~rherbst/
Thomas Schneider: http://shelx.uni-ac.gwdg.de/~trs/
Dale Tronrud: http://www.uoxray.uoregon.edu/dale/welcome.html
Victor Young: http://www.chem.umn.edu/services/xraylab/

Hartmut Luecke had to withdraw as a speaker at short notice, but some of the material for his intended talk on the perils of ignoring twinning in macromolecules can be found at:

http://anx12.bio.uci.edu/~hudel/br/twinning/

In addition, the strongly recommended books “Crystal Structure Determination” by Werner Massa, translated into English by Robert O. Gould (Springer, 2000) and “Practical Protein Crystallography”, 2nd Edition, by Duncan McRee (Academic Press, 1999) contain detailed accounts of the use of SHELX (for small molecules and macromolecules, respectively).