Patterson Map of a Crystal
[Figure: a crystal (real space) and its Patterson function (vector space)]

Solving the Phase Problem
Perturbing the X-ray Scattering in a Predictable Way
• Isomorphous replacement with heavy atoms.
• Anomalous scattering of X-rays by endogenous or added scatterers.
  • Inelastic scattering of X-rays causes a shift in the phases of the scattered rays.
  • Extremely useful in conjunction with tunable (synchrotron) radiation.
Guessing the Phases
• Molecular replacement using a model of a related object.
• Direct methods: phase relationships for triplets of reflections.

Locating Heavy Atoms
Patterson vectors pile up (generate strong density) in peaks resulting from superposition of molecules by rotational symmetry.
Peaks resulting from crystallographic symmetry are located on the Harker sections specified for each space group (except P1 = triclinic).
Patterson space is CENTROSYMMETRIC, reflecting the contributions of pairs of vectors (a→b, b→a) for all atoms.
Translational components of symmetry are absent in Patterson vector space. For example, space groups P2 and P2₁ have identical symmetry in Patterson space.

A 2D Patterson

Finding Heavy Atoms
A protein was crystallized in space group 19 (P2₁2₁2₁) with the following symmetry operators:
1. x, y, z
2. -x+1/2, -y, z+1/2
3. x+1/2, -y+1/2, -z
4. -x, y+1/2, -z+1/2
Harker vector equations:
1.-2. = 2x+1/2, 2y, 1/2
1.-3. = 1/2, 2y+1/2, 2z
1.-4. = 2x, 1/2, 2z+1/2

Finding Heavy Atoms
A heavy atom derivative was prepared and :
1. x, y, z
2. -x+1/2, -y, z+1/2
3. x+1/2, -y+1/2, -z
4. -x, y+1/2, -z+1/2
Harker vector equations (the same vectors as on the previous slide, because +1/2 and -1/2 differ by a full lattice translation):
1.-2. = 2x-1/2, 2y, 1/2
1.-3. = 1/2, 2y-1/2, 2z
1.-4. = 2x, 1/2, 2z-1/2

Finding Heavy Atoms
What are the real space coordinates of the heavy atom(s)?
Harker vector equations:
(0.5, 0.25, 0.6) = 1/2, 2y-1/2, 2z ; y = (+/-) 0.125, z = (+/-) 0.3
(0.25, 0.5, 0.1) = 2x, 1/2, 2z-1/2 ; x = (+/-) 0.125, z = (+/-) 0.3

Working with Experimental Phases
Anomalous scattering is another source of phase information that we won't have time to discuss.
The phases calculated from each heavy atom derivative are improved by heavy atom parameter refinement (x, y, z, occupancy, and B-factor). We are refining the heavy atom model, taking into account the protein phases estimated from multiple sources.
Our phase estimates contain errors, causing incomplete closure of the "phase triangle." The FIGURE OF MERIT corresponds to the cosine of the lack-of-closure error. Typically, an experimentally phased electron density map is calculated with each reflection weighted according to its figure of merit.

Phase Improvement
Density Modification: change the calculated density in sensible ways, then back transform (Fourier synthesis) to obtain modified (more accurate) phases that can subsequently be applied to the observed F(h)'s to improve the electron density map.
• Add definition to the boundary between protein and solvent; remove spurious density in the solvent region.
• Modify density values within the protein envelope to reflect values typical of the solvent content (%) and resolution.
• Calculate the average density of multiple independent copies of the protein: apply noncrystallographic symmetry to superimpose the molecules, then calculate average values.

Molecular Replacement
Definition: using phases from a known structure as the initial estimates to phase an unknown protein structure. We are guessing/hoping that the unknown resembles the known model protein.
Procedure: position/orient the known protein in the unit cell of the unknown to best match the experimental diffraction data. Improve the model by adding missing pieces and refining atomic parameters to better agree with experiment.
What can go wrong?
Errors in the MR model are perfectly correlated with the calculated starting phases. The result is model bias that is hard to detect and correct.
In contrast, building a model into experimentally phased (heavy atom method) electron density results in errors that are uncorrelated with the starting phases. Improvements in the protein geometry and its fit to the density will result in a model with more accurate phases that can be combined with experimental phases to increase the accuracy of the electron density.
Conclusion: experimental phases are always preferable to MR phases.

Molecular Replacement
How similar must the unknown/known proteins be in order for MR to succeed?
• Inaccurately placed atoms become more evident at high resolution. At low resolution, the MR model may be reasonably accurate even though it fails to recapitulate high resolution features of the unknown protein.
• Missing atoms contribute equally to error across all resolutions. This missing information contributes to noise that obscures the "signal" of a correctly placed/oriented MR model.
• A reasonable MR model might result in an Rcryst = 0.45-0.48 prior to model refinement (recall that fully refined models typically have an Rcryst = 0.20-0.26). A successful model typically includes an accurate representation of >70% of the unknown structure.

Model Refinement
The initial molecular replacement solution is refined against the experimental data (Fhkl's) to improve model accuracy. New features of the unknown protein (additional side chains, missing segments) will appear in the electron density if the phases are improving. This is the same principle as the difference Fourier used to find "missing" heavy atoms in the isomorphous replacement method.
Full atom refinement may fail if the initial model is rough (inaccurate, poorly placed). In this case, the model is far from the true minimum, and the small random changes in atomic positions made during model refinement do not reach the correct solution.
Rigid body refinement of the initial MR solution may provide a more accurate starting point for full atom refinement. Rigid body refinement optimizes only 3 translational and 3 rotational parameters, because we are treating the model as one rigid object. The model can be further divided into domains that are refined as independent bodies (these can be linked by "springs" = geometric constraints).

Model Refinement
During model refinement, we are comparing |Fobs| (containing experimental errors and contributions from solvent scattering) to |Fcalc| (Fourier amplitudes of the "perfect protein" in a vacuum).
Solvent scattering/contrast is most evident at low resolution (Fobs ~12-9 Å), whereas model inaccuracy (Fcalc) is increasingly evident when comparing higher resolution terms.
We can add a "solvent mask" term to the Fcalc's to improve agreement with Fobs at low resolution. This improves the scaling of Fcalc to Fobs.

Placing the MR Model in the Unit Cell of the Unknown Protein
Goal: superimpose each domain of the MR model protein onto the homologous domains of the unknown protein.
Test: all possible orientations/positions of the protein in the unit cell.
Target function: calculate the agreement between Fobs and Fcalc as the model is rotated/translated. Use the simple difference |Fobs - Fcalc| or a correlation function between observed and calculated (MR model) structure factor amplitudes.

Placing the MR Model in the Unit Cell of the Unknown Protein
Practical: we usually need to break the problem into 2 steps. A rotation function (Patterson-based vector superposition) sets the orientation, followed by a translation function to position the model in the unit cell (recall that the Patterson function superimposes interatomic vectors on a single origin, so translations are lost).
Big problem: more than 1 protein molecule in the asymmetric unit of the unknown crystal. There are too many combinations to test all orientations/positions of multiple molecules in a global search.
Modeling this unit cell with a single protein MR model may result in too many "missing atoms" and failure to identify the correct solution.

Placing the MR Model in the Unit Cell of the Unknown Protein
How finely must all possible orientations/translations be sampled?
Fcalc and Fobs must be correlated in the highest resolution shell that is sampled by the MR calculations. At 4 Å resolution, a 1 Å error in atomic coordinates causes a ¼ wave (90 deg.) error in the phases!
For a globular protein having a ~10 Å radius, a rotational error of 5 deg. corresponds to about 1 Å in the placement of atoms on and around the protein's outer surface. Thus, candidate rotational orientations must be sampled in 5 deg. increments to obtain a correct solution with <90 deg. phase error for peripheral atoms of the MR model.
It would be computationally (too) expensive to do this fine rotational sampling simultaneously with all possible translations (in <1 Å increments). Simultaneous full rotation/translation searches are only practical if we are searching a small region of space that we know contains the correct solution.

The Rotation Function
The Patterson function is a map of all interatomic vectors in the crystal. A spherical region of the Patterson centered on the origin includes short interatomic vectors and excludes longer vectors relating atoms in different molecules in the crystal.
The large origin peak (self vectors) of the Patterson function can be subtracted to improve contrast in the remaining regions (a better signal-to-noise ratio).
Idea: if 2 structures have some domain in common, then at some resolution, their spherically-cut, origin-subtracted Patterson maps should have a subset of vectors in common when the structures are properly oriented.
[Equation not recovered; in the original slide the parameters to be optimized are underlined.]

The Rotation Function
Self-rotation function: both copies of the spherically-cut Patterson function come from the unknown crystal.
The idea is to see if there are multiple NCS-related copies of a protein inside the unit cell (the largest peaks are caused by crystallographic symmetry).
Cross-rotation function: sample the Patterson of the known model in different orientations against the Patterson of the unknown crystal in an attempt to find the corresponding orientation of the search model.
• Put 1 copy of the atomic model into an empty box at least 2x the size of the model, in order to avoid overlap with models in neighboring boxes of the "crystal."
• Fourier transform of model => Fcalc => square to obtain Icalc => inverse Fourier transform => Pcalc(u) => spherical cut => rotate model and repeat.

The Rotation Function
Sampling of (α,β,γ) during the RF depends on the size of the molecule and the resolution. It is common to work at 10-4 Å resolution to determine the global orientation without requiring extremely fine sampling of (α,β,γ).
Peaks of the Patterson function are about 2x wider than Fourier peaks, making RF solutions inaccurate. RF solutions can be refined by Patterson Correlation (PC) refinement (see Brunger et al.).
A "correct" RF solution is sometimes distinguished by its high value, but it is common for the correct solution to be further down the list of candidate solutions. It is customary to evaluate several RF solutions in subsequent calculations.
High crystallographic symmetry makes the RF noisy because the single molecule used as the search object represents a smaller fraction of the total interatomic vectors in the unknown crystal.

Translation Function
Assume that we have a list of candidate RF solutions that includes the correct answer. For each candidate RF solution, apply translations to generate every possible position of the search molecule (on an appropriately fine grid):
• Generate neighboring molecules by applying crystal symmetry.
• Fourier transform the ensemble (calculate Fcalc).
• Evaluate the TF:
$$TF = \sum_{hkl} I_{obs}(h)\, I_{calc}(h)$$
• Maximizing this sum, calculated over all (hkl)s in the resolution range, corresponds to minimizing the least squares residual between Iobs and Icalc.
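The translation search described above can be illustrated with a toy calculation. The following is a minimal sketch, not the procedure of any particular MR program: it assumes a hypothetical one-dimensional "crystal" with a unit cell of length 1 and a single symmetry operator x -> -x, a made-up three-atom search model, and "observed" intensities simulated from a known shift. The names MODEL, H_RANGE, TRUE_SHIFT and the normalized score are illustrative assumptions; the normalization is added here so that the score peaks at 1 regardless of the overall size of Icalc.

```python
import cmath
import math

# Hypothetical search model: (offset within the molecule, scattering weight).
MODEL = [(0.00, 6.0), (0.07, 8.0), (0.19, 7.0)]
H_RANGE = range(1, 21)  # reflections h = 1..20 (assumed resolution range)

def intensities(shift):
    """Place the model at `shift`, add its symmetry mate, return I(h) = |F(h)|^2."""
    atoms = [(shift + r, f) for r, f in MODEL]
    atoms += [(-x, f) for x, f in atoms]  # apply the crystal symmetry x -> -x
    result = []
    for h in H_RANGE:
        F = sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)
        result.append(abs(F) ** 2)
    return result

def tf_product(i_obs, i_calc):
    """TF = sum over h of Iobs(h) * Icalc(h), as written on the slide."""
    return sum(o * c for o, c in zip(i_obs, i_calc))

def tf_normalized(i_obs, i_calc):
    """Product sum divided by the norms of both lists (an added assumption)."""
    norm = math.sqrt(tf_product(i_obs, i_obs) * tf_product(i_calc, i_calc))
    return tf_product(i_obs, i_calc) / norm

def scan(i_obs, steps=100):
    """Score every trial shift on a regular grid; return (best_shift, best_score)."""
    trials = [(k / steps, tf_normalized(i_obs, intensities(k / steps)))
              for k in range(steps)]
    return max(trials, key=lambda pair: pair[1])

TRUE_SHIFT = 0.23
I_OBS = intensities(TRUE_SHIFT)  # stands in for measured intensities
BEST_SHIFT, BEST_SCORE = scan(I_OBS)
```

In this toy cell the positions t and t + 1/2 give identical intensities (an alternative choice of origin), so the scan should return either the true shift or its equivalent half a cell away. Real translation functions work in three dimensions and are usually evaluated with FFTs rather than a brute-force loop.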
A Tail of Two Cats
Fourier amplitudes recorded without phases
http://www.ysbl.york.ac.uk/~cowtan/fourier/fourier.html
A Manx Cat (incomplete model for molecular replacement)
Apply Manx Phases to Cat Amplitudes
F.T. reveals the new information that was not in the model phases

MAD Phasing
Collection of anomalous scattering data at specific wavelengths where heavy atoms scatter strongly. This is a Multiwavelength Anomalous Diffraction experiment.
For anomalous scattering, isomorphism is perfect. However, the anomalous signal is small and requires accurate intensity measurements.
Anomalous signal increases with resolution, but diffraction intensity decreases, resulting in lower accuracy measurements at high angles of diffraction.

Judging the Quality of X-ray Structures
X-ray Data Quality
• Rsym: the error in measured intensities of equivalent reflections (typically ranging from 3% at low resolution to 35% at the high resolution limit).
• Resolution, signal-to-noise ratio (I/sigma > 3-4 for useful data).
Crystallographic Model Quality
• A crystallographic model is constructed to represent the electron density obtained from the diffraction experiment.
• Rcryst: the error in agreement between the model and the experimental structure factor amplitudes (typically ranging from 16% for a high resolution structure to 28% for a lower resolution structure).
• Free R-factor (Rfree): a crystallographic R-factor calculated from a small set (5-10%) of reflections that are reserved and not used during model refinement (Rfree is typically 2-4% larger than Rcryst). Over-refinement causes an artificial decrease in Rcryst with little or no change in Rfree.

Judging the Quality of X-ray Structures
Crystallographic Model Quality (cont)
• Agreement between the model and known structures.
• Ramachandran plot.
• Deviation from standard geometry (bond angles, lengths, etc.).
• Fold recognition: does the model look like any other proteins in the Protein Data Bank?
Does the model satisfy other experimental constraints/data?
• Locations of functionally important residues.
• Shape consistent with known function(s).

"Table 1": A Standard for Crystal Structure Papers

X-ray Scattering Basics
• Recall that X-ray diffraction results from the interaction of waves (X-rays) with matter (electrons bound to the atoms of our protein).
• Electromagnetic waves have electrical and magnetic components oriented perpendicular to one another and to the direction of travel.
• A wave can be described by a cosine function with an amplitude and period (wavelength): A·cos(2πνt) or A·cos(2πx/λ)
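The cosine description of a wave can be made concrete with a few lines of code. This is a minimal sketch under stated assumptions: the function names `wave` and `phase_shift_deg` are hypothetical, and the 1.54 Å wavelength is only an example value. It evaluates A·cos(2πx/λ) and converts a displacement into a phase angle, which shows why a shift of one quarter of a period corresponds to 90 degrees.

```python
import math

def wave(amplitude, x, wavelength):
    """Evaluate A*cos(2*pi*x/lambda) at position x (same length units as wavelength)."""
    return amplitude * math.cos(2 * math.pi * x / wavelength)

def phase_shift_deg(displacement, period):
    """Phase change in degrees for a displacement expressed in the same units as the period."""
    return 360.0 * displacement / period

WAVELENGTH = 1.54  # Angstrom; an example value, not taken from the slides
crest = wave(2.0, 0.0, WAVELENGTH)               # full amplitude at x = 0
quarter = wave(2.0, WAVELENGTH / 4, WAVELENGTH)  # essentially zero a quarter period later
quarter_phase = phase_shift_deg(1.0, 4.0)        # 1 unit of shift against a period of 4
```

With a period of 4 Å, a 1 Å displacement gives `quarter_phase` of 90 degrees, the same quarter-wave figure quoted in the sampling discussion for 4 Å resolution data.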