Phasing by multiple isomorphous replacement
Mike Lawrence
Walter & Eliza Hall Institute of Medical Research, Parkville, Melbourne

Lecture 1 - Heavy atom derivatives and how to make them

Some history...
• Bernal (1934) - showed that pepsin crystals diffracted, suggesting that it might be possible to determine the atomic structure of proteins.
• Patterson (1934) - showed that the distances between "heavy" atoms in a molecule could be computed from diffraction intensities alone, without the need for phases.
• Kendrew, J. et al. (1958) A three-dimensional model of the myoglobin molecule obtained by x-ray analysis. Nature 181, 662-666.
• Muirhead, H. & Perutz, M.F. (1963) Structure of haemoglobin. A three-dimensional Fourier synthesis of reduced human haemoglobin at 5.5 Å resolution. Nature 199, 633-638.

Why heavy atoms can be used to yield phases
Consider a protein crystal doped with heavy atoms. The structure factors then obey the equation

FPH = FP + FH

where FPH is the structure factor for the derivatized crystal, FP is the structure factor for the native protein crystal, and FH is the structure factor for the heavy atoms alone, provided that the structure of the protein and the basic assembly of the crystal are not changed by the addition of the heavy atoms (i.e. the derivative is isomorphous!).

Expected differences in diffraction intensities due to heavy atoms
It can be shown that for acentric reflections, the magnitude of the heavy atom intensity differences ΔI = |IPH - IP| is related to the cumulative atomic numbers of the protein and heavy atoms by

√<ΔI²> / <IP> ≈ √(2<IH> / <IP>), where <I> = Σi fi².

Hence for a protein with one mercury atom (at sin θ/λ = 0):

MW (kDa)   √<ΔI²>/<IP>
28         0.36
56         0.25
112        0.18
224        0.13
448        0.09

i.e. a significant change in intensity results from the addition of even a single heavy atom.

Why heavy atoms work
We have FPH = FP + FH. We can measure the amplitudes of FP and FPH.
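The expected-difference estimate √(2<IH>/<IP>) above can be checked numerically. This is a sketch only: the function name is illustrative, and it assumes an average protein residue of ~110 Da containing roughly 5 C, 1.35 N and 1.5 O non-hydrogen atoms, which reproduces the table to within the rounding of that assumed composition.

```python
import math

def expected_intensity_change(mw_kda, z_heavy=80, n_heavy=1):
    """r.m.s. fractional intensity change sqrt(<dI^2>)/<IP> expected from
    n_heavy heavy atoms (acentric case, sin(theta)/lambda = 0, so f = Z).
    Assumes an average ~110 Da residue with roughly 5 C, 1.35 N and
    1.5 O non-hydrogen atoms -- an illustrative composition only."""
    residues = mw_kda * 1000.0 / 110.0
    sum_f2_per_residue = 5 * 6 ** 2 + 1.35 * 7 ** 2 + 1.5 * 8 ** 2
    i_p = residues * sum_f2_per_residue        # <IP> = sum of fi^2 (protein)
    i_h = n_heavy * z_heavy ** 2               # <IH> for the heavy atoms
    return math.sqrt(2.0 * i_h / i_p)

for mw in (28, 56, 112, 224, 448):
    print(mw, round(expected_intensity_change(mw), 2))
```

The 1/√MW fall-off of the signal is immediately visible: doubling the protein size reduces the expected fractional change by √2.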
Patterson methods allow us to compute FH as a vector quantity (I'll show you this later...). Simple geometry then shows that there are, in general, two possibilities for the phase of FP, assuming that there is no anomalous signal. A second (hence "multiple"!) derivative will yield another two possibilities for the phase of FP, and the phase that is in common should then be the correct phase for FP (QED!).

[Figure: phase choice 1 / phase choice 2]

Where the problems lie...
1. Not all derivatives diffract, or diffract as well as the native.
2. Not all derivatives are isomorphous to the native.
3. Determination of the heavy atom positions may not be possible.
4. Errors in |FP| and |FPH| and lack of isomorphism can make the theory difficult to apply, i.e. the phases generated can be hopelessly inaccurate.
Powerful algorithms have thus been developed to handle these error issues, yielding carefully weighted phase sets and unbiased determination of the heavy atom model.

Making heavy atom derivatives
There are four fundamental strategies for making heavy atom derivatized crystals:
1. Soak the crystal in a solution containing heavy atoms.
2. Co-crystallize the protein with a heavy atom containing solution.
3. Simply use a heavy atom already in the protein (e.g. a metalloprotease).
4. Covalently modify the protein to include a heavy atom.

Crystal soaking
Advantages: Easy to do. Simply make up a solution of the heavy atom in the crystallization solution and soak the crystal in it (hours to days).
Disadvantages: Not easy to control the derivatization. May destroy the crystal or reduce its resolution.

Co-crystallization
Advantages: Controlled stoichiometry. Avoids steric hindrance problems.
Disadvantages: Protein may no longer crystallize, or may no longer crystallize in the same space group and unit cell (i.e. no longer isomorphous).
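The two-phase-choice geometry above follows from the cosine rule applied to the triangle FPH = FP + FH, and can be made concrete in code. A sketch under simulated data; the function name and numbers are illustrative.

```python
import cmath
import math

def phase_choices(f_p_amp, f_ph_amp, f_h):
    """Given the measured amplitudes |FP| and |FPH| and the complex
    heavy-atom structure factor FH, return the two possible phases of FP
    implied by FPH = FP + FH (no anomalous signal assumed). From
    |FPH|^2 = |FP|^2 + |FH|^2 + 2|FP||FH|cos(aP - aH)."""
    f_h_amp, a_h = abs(f_h), cmath.phase(f_h)
    cos_term = (f_ph_amp ** 2 - f_p_amp ** 2 - f_h_amp ** 2) / (2 * f_p_amp * f_h_amp)
    delta = math.acos(max(-1.0, min(1.0, cos_term)))   # clamp for safety
    return a_h + delta, a_h - delta                    # the two candidates

# Simulated example: a 'true' FP and FH, from which |FPH| is derived.
f_p_true = cmath.rect(100.0, math.radians(40))
f_h = cmath.rect(30.0, math.radians(110))
f_ph_amp = abs(f_p_true + f_h)
choices = phase_choices(abs(f_p_true), f_ph_amp, f_h)
```

One of the two candidates recovers the true phase of FP; a single derivative cannot say which, hence the need for a second derivative (or an anomalous signal).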
Use heavy atoms already within the protein
Advantages: Particularly suitable for MAD (multi-wavelength anomalous dispersion) techniques.
Disadvantages: There may not be sufficient anomalous signal at the wavelengths available. Requires synchrotron radiation.

Covalently modify the protein to include a heavy atom
Advantages: Suitable for MAD and MIR. Should yield good derivatives.
Disadvantages: Requires additional work at the protein production stage. May not be suitable for all proteins. May change the crystallization conditions.

Which heavy metal do I use?
There are some rules, in particular relating to the pH of the precipitant solution, the chemical composition of the precipitant solution and the sequence of the native protein. However, these are guides only and provide no guarantee of success. For example:
- platinum binds preferentially to histidine residues;
- platinum derivatization may not work in ammonium sulphate solutions.
Native gels may also be used to pre-evaluate heavy atom binding.
http://www.doembi.ucla.edu/~sawaya/m230d/Crystallization/crystallization.html

Making up heavy metal solutions
These are toxic. Care must be taken, particularly with methyl mercury derivatives, with uranium salts, and with some explosive mixtures. Study the MSDS and read the literature. Clean up after yourself and avoid contaminating tips, the lab, other people's protein, etc. Work with as small a volume as possible.
Start with, say, a 2 mM solution and soak overnight. The solubility of the heavy atom in the precipitant may not be high; organic solvents such as DMSO may help. Transfer the crystal to the heavy atom solution. Monitor the crystal for signs of decay during soaking: check for dissolving, cracking or colour change. Leave it to soak overnight. Collect some diffraction frames. Check that the cell is the same. Check the resolution. If the diffraction is strong, process these frames and compare the data with the native data.
Survey of successful derivatives
Some useful sites:
http://www.sbg.bio.ic.ac.uk/had/
http://hatodas.harima.riken.go.jp/
http://eagle.mmid.med.ualberta.ca/tutorials/HA/
http://www-bio3d-igbmc.u-strasbg.fr/~romier/heavyatom.html
Platinum derivatives have proven the most popular, in particular K2PtCl4, K2Pt(NO2)4 and K2PtCl6. Mercury, gold, tungsten and uranium derivatives are also popular. The di-platinum-di-iodo derivative PIP is very useful for large proteins. Xenon is also particularly interesting.

Lecture 2 - Anomalous scattering & the collection and processing of diffraction data from derivatized crystals

X-ray scattering
X-ray scattering by atoms involves the following process:
1. The electromagnetic field of the incident X-ray beam exerts a force on the electrons within the atoms.
2. To a first approximation the electrons can be regarded as free electrons which oscillate at the same frequency as the incident X-rays.
3. These electrons hence emit X-rays of the same wavelength as the incident X-rays.
4. The phase of the scattered X-rays differs by 180° from that of the incident X-rays.
This is called the free-electron approximation to X-ray scattering.

Anomalous X-ray scattering
The free-electron approximation breaks down for "high" Z atoms. The inner electrons are more tightly bound to the nucleus than the outer electrons, and at certain X-ray wavelengths these electrons may be ejected from an inner shell (say K) into the continuous energy region. This happens if the incident wavelength is close to what is termed the absorption edge. The electron then re-emits an X-ray photon as it falls back into a lower energy shell (say L). The emitted photon does not necessarily differ in phase by 180° from the incident photon.

Anomalous X-ray scattering
This diagram shows the total scattering by an atom. It consists of the free-electron component f and an anomalous component Δf + if". It is the magnitude of f" that is of most importance in phase determination.
The anomalous component is a function of both Z and wavelength, and tables of Δf and f" exist for each wavelength and Z.

ftotal = fiso + Δf + if" = fiso + fanom

Friedel's law
Friedel's law states that the reflections FP(+) and FP(-) are symmetrically located with respect to the horizontal (real) axis, i.e. the phase of the reflection (-h,-k,-l) is opposite in sign to the phase of the reflection (h,k,l). The same applies to FH(+) and FH(-). Provided that there is no anomalous scattering!

Breakdown of Friedel's law
Friedel's law breaks down if the heavy atom scatters anomalously, and there is then no simple phase or amplitude relationship between FPH(+) and FPH(-).

Anomalous scattering
The consequences of anomalous scattering are as follows:
1. The Friedel pairs no longer have the same intensity, nor do they have related phases.
2. The magnitude of the anomalous scattering for a particular atom varies considerably as a function of incident wavelength - it is strongest near the so-called absorption edges.
3. The anomalous signal can be used to break the phase ambiguity inherent in SIR (SIRAS).
4. The anomalous scattering can be used to "generate" multiple derivatives by changing the incident wavelength (MAD).
5. The incident wavelength can be tuned to match scatterers within the protein itself (SAD).

Centric and acentric reflections
Space group symmetry sometimes constrains the phases of certain reflections to a limited, finite number of possibilities. For example, in orthorhombic space groups, any reflection of the type (hk0), (h0l) or (0kl) may only have a phase of 0° or 180°. The axial planes in reciprocal space are thus referred to as centric zones, and the reflections within these planes are referred to as centric reflections. Reflections with unrestricted phases are referred to as acentric reflections.
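The breakdown of Friedel's law described above can be demonstrated numerically: with only real scattering factors, F(-h) is the complex conjugate of F(h) and the intensities match; adding an imaginary component if" to one atom destroys that relationship. A one-dimensional sketch with illustrative scattering factors (not tabulated values).

```python
import cmath
import math

def sf(atoms, h):
    """Structure factor of point atoms in 1-D: F(h) = sum_j f_j exp(2*pi*i*h*x_j).
    Each atom is a (scattering factor, fractional coordinate) pair; the
    scattering factor may be complex (f0 + df' + i*f")."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for f, x in atoms)

normal = (6.0, 0.10)                        # a carbon-like atom
for f2 in (0.0, 4.0):                       # f" = 0, then f" > 0
    anom = (80.0 - 5.0 + 1j * f2, 0.37)     # heavy atom: f0 + df' + i*f"
    atoms = [normal, anom]
    i_plus = abs(sf(atoms, 1)) ** 2         # I(h)
    i_minus = abs(sf(atoms, -1)) ** 2       # I(-h)
    print(f2, round(i_plus, 1), round(i_minus, 1))
```

With f" = 0 the two intensities are identical (Friedel's law holds); with f" > 0 the Friedel pair intensities differ, and it is exactly this difference that SIRAS, MAD and SAD exploit.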
The probability distributions associated with these two classes of reflections are different, and in any phasing program statistics are often reported separately for the two classes. It is in general "easier" to determine the phase angle of centric reflections.

Processing heavy atom data

Collecting derivative data
Derivative data are collected and processed in much the same way as native data, except that the Friedel pairs are not merged with each other (as they are not equivalent if anomalous scattering is present). This implies that a greater range of oscillation data will have to be collected to achieve a complete data set, or at least a data set of multiplicity equal to that of the native. Once the first derivative oscillation image has been collected, it should be processed to ensure that the derivative unit cell is the same as that of the native and that the crystal diffracts to adequate resolution. If the derivative and native unit cells are the "same" and the crystal exhibits a useful level of resolution, then it is worth collecting more data. Once a reasonable amount of data has been collected, it can be compared with the native to check for isomorphous derivatization.

Scaling the derivative data to the native data
Comparison of the derivative data with the native requires first that they be placed on the same scale. There are a variety of ways of doing this; we will consider only the least squares approach in SCALEIT. Minimize the sum of weighted squares of isomorphous differences

Σ w (k|FPH| - |FP|)²

with respect to the scale factor k, where the weight w is the reciprocal variance of the isomorphous difference:

w = 1 / ((kσPH)² + σP²)

The assumption is that |FH| can be ignored; this introduces an error of 5-10% in the scale factor.

Scaling the derivative data to the native data
In CCP4 this is done with the program SCALEIT. SCALEIT provides a variety of very useful features: (i) The derivative data is scaled w.r.t.
the native data by means of a single scale factor plus an anisotropic temperature factor matrix. (ii) The scaling can be determined from a selected, reliable subset of reflections and then applied to all reflections (this avoids using poor data in the scaling exercise). (iii) Anomalous data can be treated separately. (iv) Various measures of the isomorphous differences are reported. These can be of great value in assessing whether to continue with data collection, whether to re-soak for a different time, whether to move on to another condition, and whether the derivative is isomorphous.

Assessing derivatization
Some rules of thumb for looking at SCALEIT output:
1. Calculate the isomorphous R-factor on F's. This should be in the range 12% - 25% overall.
2. The isomorphous R-factor will in all likelihood be higher at very low resolution (due to problems in scaling) and higher at high resolution (due to data inaccuracy). Nevertheless it should exhibit a large "flat" regime across the useful resolution range.
3. The weighted R-factor is also a useful number. This should more or less match the R-factor. If it is much lower, then most of the differences are due to noise in the data rather than to isomorphous signal.

Assessing derivatization
Some rules of thumb (continued):
4. The scale factor should be fairly uniform across the resolution range.
5. There should not be many outliers. These should be rejected and SCALEIT re-run.
6. The derivative data should not be significantly anisotropic, or there may be problems in scaling it to the native.

How many heavy atom sites?
Having established that the derivative shows significant isomorphous differences with respect to the native, the next questions are:
- how many heavy atom sites are there in the crystal?
- what are their coordinates?
- in what stoichiometric ratio are the heavy atoms bound at these sites?
- what are their thermal (B) values?
Remember, in soaking experiments not all sites are necessarily 100% occupied.
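The single-scale part of the least-squares scaling, and the isomorphous R-factor used in the rules of thumb above, can be sketched in a few lines. A sketch only: SCALEIT additionally refines an anisotropic temperature-factor matrix and applies outlier rejection, and the function name here is illustrative.

```python
def scale_and_riso(f_p, f_ph, w=None):
    """Least-squares scale k minimizing sum w (k|FPH| - |FP|)^2 (closed
    form: k = sum w|FP||FPH| / sum w|FPH|^2), plus the isomorphous
    R-factor on F's, Riso = sum|k|FPH| - |FP|| / sum|FP|."""
    if w is None:
        w = [1.0] * len(f_p)
    k = (sum(wi * p * ph for wi, p, ph in zip(w, f_p, f_ph))
         / sum(wi * ph * ph for wi, ph in zip(w, f_ph)))
    riso = sum(abs(k * ph - p) for p, ph in zip(f_p, f_ph)) / sum(f_p)
    return k, riso

# Noise-free example: derivative measured on half the native scale.
k, riso = scale_and_riso([100.0, 50.0, 80.0], [50.0, 25.0, 40.0])
# Here k recovers 2.0 exactly and Riso is zero; real data would give a
# nonzero Riso reflecting the heavy atom signal plus noise.
```

In practice the weights would be set from the measured σ's as in the formula above, and the scaling would be examined in resolution bins.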
Ideally, one wants as few sites as possible commensurate with the overall Riso that one seeks to achieve.

Lecture 3 - Patterson functions

Patterson functions
The Patterson function is the auto-correlation function of the electron density ρ(x) of the structure. The key points are:
(i) The Patterson function is independent of the structure factor phases and can thus be computed directly from the reduced intensity data for the reflections.
(ii) The peaks in the Patterson function of the difference electron density (derivative minus native) correspond approximately to the interatomic vectors between the heavy atoms themselves, provided that the derivative is isomorphous and that the differences in structure factor amplitudes are small (i.e. in the Riso range of about 10-28%).

Calculation of the Patterson function
The Patterson (auto-correlation) function P of ρ(x) is

P(x) = ∫ ρ(u) ρ(u-x) du

Peaks in P(x) correspond to the vectors between peaks in ρ(x). P(x) is easily computed using the fact that correlation in real space corresponds to multiplication in Fourier space. Remember, no phases are needed!

What does a Patterson function look like?
[Figure: a simple 2D Patterson function for a set of three atoms. Real space: the blue dots are atoms, linked by their inter-atomic vectors. Patterson space: the red dots plot the inter-atomic vectors.]

Properties of the Patterson function:
1. It is centrosymmetric, i.e. for every pair of heavy atoms the displacement between them can be considered in either the positive or negative sense. Hence, for a given heavy atom set within the crystal, the Patterson cannot distinguish between the set and its mirror image.
2. The height of a peak in the Patterson is proportional to the product of the atomic numbers of the atoms responsible for the peak.
3. The symmetry of the Patterson function is that of the Laue group of the diffraction pattern (i.e. screw axes become non-screw axes and a centre of symmetry is added).
4.
Symmetry elements in real space give rise to peaks in special planes or lines in Patterson space, termed Harker planes or lines.
5. The Patterson function has a large origin peak (why?).

How do we deconvolute the Patterson function?
Deconvolution has traditionally been done by hand, starting with an inspection of the Harker sections of the Patterson function. Peaks in these sections arise from vectors between a particular heavy atom and its symmetry-related mates, and allow coordinates to be assigned, with limited ambiguity, to all the heavy atoms. However, peaks between one heavy atom and another (not a symmetry mate) are not constrained to lie in any special position. Interpretation of these peaks allows one to check the coordinates assigned from the Harker sections and to resolve the ambiguity in the peak coordinates.

Simple Harker sections
Consider the space group P222 with a heavy atom at (x,y,z). The symmetry-related heavy atoms are at (-x,y,-z), (x,-y,-z) and (-x,-y,z). The interatomic vectors are therefore of the form (2x,0,2z), (2x,2y,0) and (0,2y,2z), i.e. they all lie on the axial planes. The axial planes are termed the Harker planes corresponding to P222, and one need look only in these planes for the vectors between atoms and their symmetry-related partners. Given peak coordinates of the form (u,v,w) within the Harker planes, simple algebra allows the determination of (x,y,z). But note that one can always add or subtract 1/2 from any one or more of the final coordinates! Why?

How do we deconvolute the Patterson function?
Thus the manual search process is as follows:
1. List all the peaks in the Harker sections. Try to find self-consistent sets that yield trial coordinates for each and every heavy atom. Note that in practice the peaks may be of varying height, some peaks may be entirely absent, and some peaks may be the superposition of more than one vector. Note further that there can be both a hand and an origin ambiguity associated with the vectors.
Try to account for all the peaks present.
2. Then take the possible coordinates of each pair of atoms and search for all the cross peaks, in an endeavour to resolve the ambiguities and to check the assignment.

In practice...
Even once the ambiguity in the heavy atom coordinates has been resolved, one may still be left with an overall ambiguity in the hand of the space group or in the hand of the heavy atom cluster. Clearly the process is complicated, but it can be automated to some extent. A particularly useful semi-automated search program in CCP4 is RSPS. An automated Patterson search does not start from the list of Harker peaks. Instead it considers every possible coordinate (x,y,z) in the crystallographic asymmetric unit, computes the corresponding Harker coordinates (u,v,w) and then evaluates a score function over the Patterson values at these positions (say, the sum of the Patterson values). The (x,y,z) coordinates with the highest score function in Patterson space are then retained as potential heavy atom sites.

In practice...
One may then generate all the possible cross-vector sets, allowing for ambiguity, and use a score function to check these, finally taking the set of heavy atom positions that has the highest overall score. Alternatively, one may perform cross-vector searches directly: from a starting set of heavy atom positions one may search for a further site by simply computing the coordinates of all possible cross-vectors between it and the starting set, and scoring these positions in Patterson space. This technique is sometimes a better way to proceed than Harker searches, as there are more vectors involved and that leads to better averaging of scores.

Lecture 4 - How do we calculate the phases?

The heavy atom model
Assume we have m derivatives.
We then have the following measurements:
The native data: |FP|, σP
The derivative data sets: |FPHn|, σPHn (n = 1, ..., m)
We are seeking to describe the differences between these by means of a heavy atom model made up of rn heavy atoms for each derivative n:
Coordinates xi,n, yi,n, zi,n
Temperature factors bi,n
Occupancies oi,n
Anomalous scattering Δf'i,n, f"i,n
Scale factor Sn and anisotropic temperature factor matrix [B]n
(i = 1, ..., rn)

The heavy atom model
i.e. we are seeking to use these few tens of numbers to describe the differences between a few tens of thousands of reflections! Furthermore, both the native and derivative data themselves may be quite inaccurate. So inevitably we cannot expect the model to succeed in accounting for all the differences between each native reflection amplitude and each corresponding derivative reflection amplitude. Thus there are going to be errors. We start by minimizing

Σ w [Δ|Fiso| - |FH|calc]²

over all reflections as a function of the model, where w is some weight. This gives preliminary values to the model parameters. Remember that this is only an approximation!

The heavy atom model
Now we need to compute the phases. To do this we need to realize that there will be what is termed lack of closure, i.e. the phase triangles do not quite close. So how does one compute the phase and its error? The standard way is to use what are called phase probabilities. These are based on assuming that the lack of closure Δ itself has a Gaussian error distribution with standard deviation E = <|Δhkl|> (i.e. an average for a particular resolution range). Then

P(αhkl) = N exp(-Δ²(αhkl) / 2E²)

where P(αhkl) is the probability that reflection (hkl) has phase αhkl.

Lack of closure
The next strategy used to refine the heavy atom parameters is to minimize, using least squares, the lack of closure across the entire data set.
Minimize

Σhkl mhkl Δ²hkl

where m is a weight (termed the figure of merit) indicating the reliability of the protein phase angle, based on its probability distribution. So this means that we iterate a procedure of refining heavy atom parameters, computing phase angle distributions, refining heavy atom parameters, computing phase angle distributions... until convergence.

Problems, problems, problems
All of this is rather complicated, and leads to a number of fundamental problems: poor convergence and incorrect, biased results. In particular, the least squares minimization assumes that the native data are error free. Furthermore, the native data are used for every derivative, which means that there is considerable overweighting of the native data (the "tyranny of the native"). Furthermore, the phase angles themselves are functions of the parameters being refined, so we are assuming we know the very quantities we are trying to determine!

Maximum likelihood
Maximum likelihood is a concept that underlies much of modern refinement procedure in protein crystallography, including molecular replacement (BEAST), crystallographic refinement (CNS, Refmac), isomorphous replacement (SHARP, MLPHARE) and solvent flattening (RESOLVE). The idea is quite simple but highly powerful, and gets rid of many of the problems in traditional heavy atom refinement. It is based on so-called Bayesian statistics, wherein one assumes that one can calculate the probability of a particular observation occurring as a function of a set of underlying parameters. One then seeks the set of parameter values that has the maximum likelihood of resulting in that particular observation.

Maximum likelihood
These principles are increasingly applied to crystallography. The most well-defined formulation of a maximum likelihood treatment of the MIR problem is embedded in the program SHARP. SHARP treats the native protein itself as a derivative with no heavy atoms and hence does not favour it above the others.
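The idea of choosing parameters that maximize the probability of the observations can be illustrated with a toy example, far simpler than SHARP's formulation: a single parameter observed several times with Gaussian errors, estimated by grid-searching the likelihood.

```python
import math

def log_likelihood(theta, observations, sigma):
    """Log-probability of the observations given parameter theta,
    assuming independent Gaussian errors of standard deviation sigma."""
    return sum(-0.5 * ((o - theta) / sigma) ** 2
               - math.log(sigma * math.sqrt(2 * math.pi))
               for o in observations)

obs = [101.0, 98.5, 100.5, 99.0]
# Grid-search for the theta most likely to have produced these data.
best = max((t / 100 for t in range(9000, 11000)),
           key=lambda t: log_likelihood(t, obs, sigma=1.0))
```

For Gaussian errors the maximum likelihood estimate coincides with the sample mean (99.75 here); the point of the ML formulation in SHARP is that it extends naturally to the much messier error model of MIR, where no such closed form exists.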
The formulae involved are quite complicated and lead to probability expressions that are functions of the heavy atom coordinates, occupancies, B-values and anomalous scattering, as well as of the scale factors and the lack-of-isomorphism error. I will not be describing SHARP further.

SOLVE
SOLVE, on the other hand, is not a maximum likelihood program. However, it has some other advantages which make it valuable: it solves the Patterson functions automatically, refines the heavy atom positions by standard (non-ML) techniques and then evaluates the solutions according to a number of criteria:
(i) a Patterson score
(ii) a difference Fourier score
(iii) a figure of merit score
(iv) an evaluation of protein/solvent partitioning in the native Fourier
SOLVE keeps detailed lists of all the solutions it builds up and tries to work out the best one based on the overall scores.

Monitoring the success of MIR
There are a number of statistics which are of value in MIR.

RCullis = Σhkl Σα P(α) ||FP(obs) + FH(calc)| - FPH(obs)| / Σhkl |FPH(obs) - FP(obs)|

or <probability-weighted lack of closure> / <isomorphous difference>.
RCullis(centrics) < 0.6 excellent, < 0.9 usable.

RCullis(ano) = Σhkl Σα P(α) |ano(obs) - ano(calc)| / Σhkl |ano(obs)|

or <probability-weighted |observed - calculated anomalous difference|> / <|observed anomalous difference|>.
For anomalous data any RCullis < 1 is useful.

Monitoring the success of MIR
Phasing power (PP) = <|FH|> / <|Δ|>

or <heavy-atom amplitude> / <probability-weighted lack of closure error>.
PP > 1.5 excellent, > 1 good, > 0.5 usable.
The Cullis R-factor and the phasing power are the two most useful statistics for individual derivatives.

Monitoring the success of MIR
Mean Figure of Merit (FOM)
FOM = mean of <cos(Δα)>
FOM is a measure of the precision of the "best" phase. The FOM can be computed both for single derivatives and for the overall phase set. All statistics should be computed for a range of resolution shells.
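The figure of merit above has a simple geometric reading: it is the length of the probability-weighted mean phase vector for a reflection, which equals <cos(Δα)> about the centroid ("best") phase. A sketch with an illustrative function name and made-up phase probability distributions:

```python
import cmath
import math

def figure_of_merit(phase_prob):
    """FOM for one reflection from (phase in degrees, probability) pairs:
    m = |sum P(a) exp(ia)| / sum P(a), the length of the weighted mean
    phase vector; also returns the centroid ('best') phase in degrees."""
    total = sum(p * cmath.exp(1j * math.radians(a)) for a, p in phase_prob)
    m = abs(total) / sum(p for _, p in phase_prob)
    best = math.degrees(cmath.phase(total)) % 360
    return m, best

# A distribution sharply peaked near 60 degrees versus a flat one.
sharp = [(a, math.exp(-(((a - 60 + 180) % 360 - 180) ** 2) / 200.0))
         for a in range(0, 360, 5)]
flat = [(a, 1.0) for a in range(0, 360, 5)]
```

A sharply peaked distribution gives m close to 1 (well-determined phase); a flat distribution gives m close to 0 (no phase information), which is why the mean FOM falls off at high resolution as the phasing signal weakens.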
What is normally true is that good statistics can be obtained at low resolution, and that these become progressively poorer at high resolution. The final overall values will depend on the resolution cutoff applied.

What do MIR maps look like?
Maps calculated from heavy atom phases can vary considerably in quality, depending on the number of derivatives, the degree of anomalous scattering and the resolution. Sometimes it may not even be possible to see much beyond the differentiation of protein and solvent. Nevertheless, density modification can be used to improve these maps further. Density modification involves systematic alteration of the map in order to force it to obey certain constraints - the most powerful of these are non-crystallographic symmetry averaging and solvent flattening.

Lecture 5 - Density modification

Density modification (DM) is one of the most powerful phasing techniques. Essentially it involves applying known constraints to an electron density map in order to improve the phases.

The basic idea behind density modification is as follows:
i) Calculate an (|Fobs|, αcalc) map. Call this ρcalc.
ii) Apply the density constraints to the map to generate a new map ρcalc'.
iii) Calculate the Fourier transform of this map to generate a new set of structure factors (F'calc, α'calc).
iv) Return to step (i) and calculate the map (|Fobs|, α'calc).
v) Repeat this until the procedure converges.

Density modification in practice
The algorithm should in general lead to an improved map, provided that the density modification procedure is correct. However, there is always a bias in the calculation towards the starting phases, and much of the implementation effort is directed towards breaking this bias, particularly if the constraints are relatively weak.
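The cycle of steps i) to v) above can be sketched in one dimension using FFTs, with solvent flattening as the density constraint. A toy sketch only: real programs work in three dimensions with masks, solvent fraction estimation and weighted phase recombination, and the function name here is illustrative.

```python
import numpy as np

def density_modify(f_obs, phases, solvent_mask, n_cycles=20):
    """One-dimensional sketch of the density-modification cycle:
    compute the map from (|Fobs|, current phases); flatten the solvent
    region; back-transform; keep the new phases with the observed
    amplitudes; iterate."""
    for _ in range(n_cycles):
        f = f_obs * np.exp(1j * phases)
        rho = np.fft.ifft(f).real                     # (i) compute the map
        rho[solvent_mask] = rho[solvent_mask].mean()  # (ii) flatten solvent
        phases = np.angle(np.fft.fft(rho))            # (iii)-(iv) new phases
    return phases
```

Applied to a toy one-dimensional "cell" whose solvent region is truly flat, this loop pulls randomly perturbed starting phases back towards the true ones, and it improves more strongly the larger the solvent fraction, mirroring the behaviour described for solvent flattening below.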
Solvent flattening
A particularly powerful algorithm is solvent flattening, wherein one brings in the prior knowledge that a large percentage of the unit cell consists of solvent, which is amorphous and consequently has uniform electron density. If one can determine which portion of the unit cell consists of solvent, then one can modify the calculated electron density so that this is indeed the case. As a consequence, around 50% of the electron density to be determined is 'automatically known'. The trick is how to determine the protein-solvent boundary. Various algorithms exist, and in common implementations of solvent flattening the molecular envelope itself can be iteratively improved as the phases improve. This technique is particularly powerful when the solvent content is high, and less powerful when it is low. RESOLVE is a maximum likelihood solvent flattening and auto-building program.

Non-crystallographic symmetry averaging
This technique employs, in addition to solvent flattening, the constraint that the asymmetric unit may contain identical copies of the same molecule. In this case, one has the constraint that the electron density associated with these regions of the unit cell should be identical, whilst all volume outside the protomers is solvent and should have uniform electron density. This constraint is enormously powerful and, in the case of high copy number (e.g. viruses), can be used to improve even extremely poor starting phases. The technique can also be applied to phase extension, whereby one progressively generates higher resolution phases.

Histogram matching
This technique is not very powerful, but is easily implemented and can lead to some phase improvement. The idea is that the electron density distributions of all proteins are more or less identical. If the electron density distribution of a calculated map does not match that expected, histogram equalization techniques can be applied to force the experimental e.d.
distribution to match that expected. Do not expect much gain in phase accuracy from this technique.

Other DM techniques
Powerful high resolution techniques include skeletonization (wherein one attempts to discern the backbone and then force it to have a tube-like form) and atomicity constraints (wherein one forces the distribution to have an atom-like appearance, through the implementation of a Sayre's equation constraint). These techniques are not particularly useful for MIR, as the resolution of MIR maps is in general rather low; if the resolution is high, then auto-building techniques can be employed instead.
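The histogram matching step described above can be implemented in its simplest possible form by rank: replace the n-th largest density value in the map by the n-th largest value of the reference distribution. A sketch with illustrative names; production implementations match cumulative distributions in resolution-dependent fashion rather than sorting raw grid values.

```python
import numpy as np

def histogram_match(rho, rho_reference):
    """Force the value distribution of a map to match a reference
    distribution by rank: the map value of rank n is replaced by the
    reference value of the same rank, preserving the spatial ordering
    of the density while imposing the reference histogram exactly."""
    order = np.argsort(rho)          # grid points from lowest to highest
    target = np.sort(rho_reference)  # reference values, same order
    out = np.empty_like(rho)
    out[order] = target
    return out
```

After matching, the map has exactly the reference histogram while every grid point keeps its rank, which is why the method perturbs the phases only gently and yields only modest gains.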