Patterson Map of a Crystal
Crystal (real space)
Patterson function (vector space)
Solving the Phase Problem
Perturbing the X-ray Scattering in a Predictable Way
• Isomorphous replacement with heavy atoms.
• Anomalous scattering of x-rays by endogenous or added scatterers.
• inelastic scattering of x-rays causes shift in phases of scattered rays.
• extremely useful in conjunction with tunable (synchrotron) radiation
Guessing the Phases
• Molecular replacement using a model of a related object.
• Direct methods – phase relationships for triplets of reflections.
Locating Heavy Atoms
Patterson vectors pile up (generate strong density) in peaks resulting
from superposition of molecules by rotational symmetry. Peaks
resulting from crystallographic symmetry are located on the Harker
sections specified for each space group (except P1=Triclinic).
Patterson space is CENTROSYMMETRIC, reflecting the contributions of
pairs of vectors (a→b, b→a) for all atoms.
Translational components of symmetry are absent in Patterson vector
space. For example, space groups P2 and P21 have identical symmetry
in Patterson space.
A 2D Patterson:
Finding Heavy Atoms
A protein was crystallized in space group 19 (P212121) with
the following symmetry operators:
1. x, y, z
2. -x+1/2, -y, z+1/2
3. x+1/2, -y+1/2, -z
4. -x, y+1/2, -z+1/2
Harker vector equations:
1.-2. = 2x+1/2, 2y, 1/2
1.-3. = 1/2, 2y+1/2, 2z
1.-4. = 2x, 1/2, 2z+1/2
Finding Heavy Atoms
A heavy atom derivative was prepared and :
1. x, y, z
2. -x+1/2, -y, z+1/2
3. x+1/2, -y+1/2, -z
4. -x, y+1/2, -z+1/2
Harker vector equations:
1.-2. = 2x-1/2, 2y, 1/2
1.-3. = 1/2, 2y-1/2, 2z
1.-4. = 2x, 1/2, 2z-1/2
Finding Heavy Atoms
What are the real space coordinates of the heavy atom(s)?
Harker vector equations:
(0.5, 0.25, 0.6) = 1/2, 2y-1/2, 2z ; y = ±0.125, z = ±0.3
(0.25, 0.5, 0.1) = 2x, 1/2, 2z-1/2 ; x = ±0.125, z = ±0.3
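The back-calculation can be checked numerically. The sketch below is only an illustration: it assumes the four P212121 operators listed earlier, picks one sign combination (x = 1/8, y = -1/8, z = 3/10) out of the ± ambiguities, and uses hypothetical helper names (apply_op, harker_vector). Other sign choices give peaks related to these by the mirror symmetry of the Patterson map.

```python
from fractions import Fraction as Fr

# Symmetry operators of P212121, written per axis as (sign, shift):
# each coordinate c maps to sign * c + shift.
OPS = [
    ((1, 0), (1, 0), (1, 0)),                    # 1. x, y, z
    ((-1, Fr(1, 2)), (-1, 0), (1, Fr(1, 2))),    # 2. -x+1/2, -y, z+1/2
    ((1, Fr(1, 2)), (-1, Fr(1, 2)), (-1, 0)),    # 3. x+1/2, -y+1/2, -z
    ((-1, 0), (1, Fr(1, 2)), (-1, Fr(1, 2))),    # 4. -x, y+1/2, -z+1/2
]

def apply_op(op, xyz):
    """Apply one symmetry operator to fractional coordinates, wrapped into [0, 1)."""
    return tuple((sign * c + shift) % 1 for (sign, shift), c in zip(op, xyz))

def harker_vector(xyz, j):
    """Operator 1 minus operator j (0-based index), wrapped into [0, 1)."""
    mate = apply_op(OPS[j], xyz)
    return tuple((a - b) % 1 for a, b in zip(xyz, mate))

# One sign choice consistent with both Harker peaks quoted above.
atom = (Fr(1, 8), Fr(-1, 8), Fr(3, 10))
peak_1_minus_3 = harker_vector(atom, 2)   # compare with (0.5, 0.25, 0.6)
peak_1_minus_4 = harker_vector(atom, 3)   # compare with (0.25, 0.5, 0.1)
print([float(v) for v in peak_1_minus_3])
print([float(v) for v in peak_1_minus_4])
```

Exact fractions are used so that the modulo-1 wrapping of unit-cell coordinates does not suffer from rounding.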
Working with Experimental Phases
Anomalous scattering is another source of phase
information that we won’t have time to discuss.
The phases calculated from each heavy atom derivative are
improved by heavy atom parameter refinement (x,y,z,
occupancy, and B-factor). We are refining the heavy
atom model, taking into account the protein phases
estimated from multiple sources.
Our phase estimates contain errors causing incomplete
closure of the “phase triangle.” The FIGURE OF MERIT
corresponds to the cosine of the lack of closure error.
Typically, an experimentally phased electron density map is
calculated with each reflection weighted according to its
figure of merit.
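As a rough illustration of the weighting, the sketch below uses the common definition of the figure of merit as the length of the centroid of a phase probability distribution, which equals the probability-weighted mean cosine of the phase error. The function name and the sampled distributions are hypothetical and do not come from any particular phasing program.

```python
import cmath
import math

def figure_of_merit(prob):
    """Return (figure of merit, best phase in degrees).

    prob: relative probabilities sampled at evenly spaced phases
    covering 0..360 degrees (assumed input format).
    """
    n = len(prob)
    total = sum(prob)
    # Centroid of the distribution on the unit circle.
    centroid = sum(p * cmath.exp(2j * math.pi * k / n)
                   for k, p in enumerate(prob)) / total
    best_phase = math.degrees(cmath.phase(centroid)) % 360.0
    return abs(centroid), best_phase

# Bimodal case (two equally likely phases, as from a single derivative).
bimodal = [0.0] * 360
bimodal[60] = bimodal[120] = 1.0
fom, phase = figure_of_merit(bimodal)

# Completely uninformative distribution: every phase equally likely.
fom_flat, _ = figure_of_merit([1.0] * 360)
print(round(fom, 3), round(phase, 1), round(fom_flat, 3))
```

In the bimodal case the best phase falls midway between the two alternatives, and the weight is the cosine of the 30 degree error made for either one. A map coefficient would then be the figure of merit times |Fobs| with the best phase.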
Phase Improvement
Density Modification: change the calculated density in
sensible ways then back transform (Fourier synthesis) to
obtain modified (more accurate) phases that can be
subsequently applied to observed F(h)’s to improve the
electron density map.
• Add definition to boundary between protein and solvent, remove
spurious density in solvent region.
• Modify density values assigned as protein envelope to reflect
values typical of the %solvent and resolution.
• Calculate average density of multiple independent copies of the
protein—apply noncrystallographic symmetry to superimpose
molecules, then calculate average values.
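The real-space part of two of these modifications can be sketched on a toy density array. This is a minimal illustration that assumes the protein/solvent mask and the superposition of the copies are already known. The function names are hypothetical, and the back-transform to new phases is not shown.

```python
def flatten_solvent(density, is_protein):
    """Replace every solvent grid point by the mean solvent density."""
    solvent = [d for d, inside in zip(density, is_protein) if not inside]
    mean_solvent = sum(solvent) / len(solvent)
    return [d if inside else mean_solvent
            for d, inside in zip(density, is_protein)]

def average_ncs(copies):
    """Average density over already-superimposed copies, point by point."""
    return [sum(values) / len(values) for values in zip(*copies)]

density = [0.9, 1.1, 0.2, -0.1, 0.3, 0.0]
mask = [True, True, False, False, False, False]
flattened = flatten_solvent(density, mask)

averaged = average_ncs([[1.0, 2.0], [3.0, 4.0]])
print(flattened, averaged)
```

Protein grid points are left untouched by the flattening step. Only the noisy solvent values are replaced.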
Molecular Replacement
Definition: Using phases from a known structure as the initial estimates to
phase an unknown protein structure.
We are guessing/hoping that the unknown resembles the known model
protein.
Procedure: position/orient known protein in unit cell of unknown to best match
experimental diffraction data. Improve model by adding missing pieces and
refining atomic parameters to better agree with experiment.
What can go wrong? Errors in the MR model are perfectly correlated with
calculated starting phases = model bias that is hard to detect and correct.
In contrast, building a model into experimentally phased (heavy atom method)
electron density results in errors that are uncorrelated with the starting
phases. Improvements in the protein geometry and its fit to the density
will result in a model with more accurate phases that can be combined with
experimental phases to increase the accuracy of the electron density.
Conclusion: experimental phases are always preferable to MR phases.
Molecular Replacement
How similar must the unknown/known proteins be in order
for MR to succeed?
• Inaccurately placed atoms become more evident at high
resolution. At low resolution, the MR model may be reasonably
accurate even though it fails to recapitulate high resolution
features of the unknown protein.
• Missing atoms contribute equally to error across all resolutions.
This missing information contributes to noise that obscures the
“signal” of a correctly placed/oriented MR model.
• A reasonable MR model might result in an Rcryst = 0.45-0.48 prior
to model refinement (recall that fully refined models typically
have an Rcryst = 0.20-0.26). A successful model typically includes
an accurate representation of >70% of the unknown structure.
Model Refinement
The initial molecular replacement solution is refined against the
experimental data (Fhkl’s) to improve model accuracy. New features of
the unknown protein (additional side chains, missing segments) will
appear in the electron density if the phases are improving. This is the
same principle as the difference Fourier used to find “missing” heavy
atoms in the isomorphous replacement method.
Full atom refinement may fail if initial model is rough (inaccurate, poorly
placed). In this case, the model is far from the true minimum and small
random changes in atomic positions sampled during model refinement
do not sample the correct solution.
Rigid body refinement of the initial MR solution may provide a more
accurate starting point for full atom refinement. Rigid body refinement
consists of 3 translational and 3 rotational parameters. We’re treating
the model as one rigid object. The model can be further divided into
domains that are refined as independent bodies (can be linked by
“springs” = geometric constraints).
Model Refinement
During model refinement, we are comparing |Fobs| (containing
experimental errors, contributions from solvent scattering)
to |Fcalc| (Fourier amplitudes of the “perfect protein” in a
vacuum).
Solvent scattering/contrast is most evident at low resolution
(Fobs ~12-9Å), whereas model inaccuracy (Fcalc) is
increasingly evident when comparing higher resolution
terms.
Can add a “solvent mask” term to Fcalc’s to improve agreement
with Fobs at low resolution. This improves scaling of Fcalc to
Fobs.
Placing the MR Model in the Unit
Cell of the Unknown Protein
Goal: superimpose each domain of the MR model protein
onto homologous domains of the unknown protein.
Test: all possible orientations/positions of the protein in the
unit cell.
Target function: calculate agreement between Fobs and Fcalc
as model is rotated/translated. Use simple difference
|Fobs - Fcalc| or correlation function between observed
and calculated (MR model) structure factor amplitudes.
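Both target functions can be written in a few lines. The sketch below is illustrative only: the amplitude lists are made up, the function names are hypothetical, and a single overall scale factor is assumed before taking differences.

```python
import math

def summed_difference(f_obs, f_calc):
    """Sum of |Fobs - k*Fcalc| with one overall scale factor k (assumed)."""
    scale = sum(f_obs) / sum(f_calc)
    return sum(abs(o - scale * c) for o, c in zip(f_obs, f_calc))

def correlation(f_obs, f_calc):
    """Linear correlation coefficient between two amplitude lists."""
    n = len(f_obs)
    mean_o = sum(f_obs) / n
    mean_c = sum(f_calc) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(f_obs, f_calc))
    var_o = sum((o - mean_o) ** 2 for o in f_obs)
    var_c = sum((c - mean_c) ** 2 for c in f_calc)
    return cov / math.sqrt(var_o * var_c)

f_obs = [10.0, 20.0, 30.0, 40.0]
good_model = [5.0, 10.0, 15.0, 20.0]   # same pattern, different scale
poor_model = [20.0, 5.0, 15.0, 10.0]   # wrong orientation/position

print(summed_difference(f_obs, good_model), correlation(f_obs, good_model))
print(summed_difference(f_obs, poor_model), correlation(f_obs, poor_model))
```

The correlation coefficient does not depend on the relative scale of Fobs and Fcalc, which is one reason it is convenient during a search.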
Placing the MR Model in the Unit
Cell of the Unknown Protein
Practical: usually need to break the problem into 2 steps.
Rotation function (Patterson based vector superposition)
sets orientation, followed by a translation function to
position the model in the unit cell (recall that the
Patterson function superimposes interatomic vectors on
a single origin, so translations are lost).
Big problem: more than 1 protein molecule in asymmetric
unit of unknown crystal. Too many combinations to test
all orientations/positions of multiple molecules in a global
search. Modeling this unit cell with a single protein MR
model may result in too many “missing atoms” and
failure to identify the correct solution.
Placing the MR Model in the Unit
Cell of the Unknown Protein
How finely must all possible orientations/translations be sampled?
Fcalc and Fobs must be correlated in the highest resolution shell that is sampled
by MR calculations.
At 4 Å resolution, a 1 Å error in atomic coordinates causes a ¼ wave (90 deg.)
error in the phases!
For a globular protein having a ~10 Å radius, a rotational error of 5 deg. would
correspond to 1 Å in placement of atoms on and around the protein’s outer
surface.
Thus, candidate rotational orientations must be sampled in 5 deg. increments to
obtain a correct solution with <90 deg. phase error for peripheral atoms of
the MR model.
It would be computationally (too) expensive to do this fine rotational sampling
simultaneously with all possible translations (in <1 Å increments). Full
rotation/translation searches (simultaneously) are only practical if we’re
searching a small region of space that we know contains the correct
solution.
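The numbers on this slide follow from two small formulas, sketched below. The function names are hypothetical, and the phase estimate assumes the coordinate error lies along the scattering vector.

```python
import math

def phase_error_deg(coordinate_error, resolution):
    """Phase shift (degrees) for a shift along the scattering vector."""
    return 360.0 * coordinate_error / resolution

def arc_displacement(radius, rotation_deg):
    """Displacement of an atom at a given radius for a small rotation."""
    return radius * math.radians(rotation_deg)

def max_rotation_step_deg(allowed_shift, radius):
    """Largest rotation keeping surface atoms within allowed_shift."""
    return math.degrees(allowed_shift / radius)

print(phase_error_deg(1.0, 4.0))         # 1 A error at 4 A resolution
print(arc_displacement(10.0, 5.0))       # 5 deg rotation at 10 A radius
print(max_rotation_step_deg(1.0, 10.0))  # step that moves the surface by 1 A
```

A 5 degree rotation moves atoms at a 10 Å radius by a little under 1 Å, consistent with the rounded figure quoted above.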
The Rotation Function
The Patterson function is a map of all interatomic vectors in the crystal.
A spherical region of the Patterson centered on the origin includes
short interatomic vectors, and excludes longer vectors relating
atoms in different molecules in the crystal.
The large origin peak (self vectors) of the Patterson function can be
subtracted to improve contrast in the remaining regions, giving a better
signal-to-noise ratio.
Idea: if 2 structures have some domain in common, then at some
resolution, their spherically-cut, origin-subtracted Patterson maps
should have a subset of vectors in common when the structures are
properly oriented. (parameters to be optimized are underlined)
The Rotation Function
Self-rotation function: both copies of spherically-cut Patterson
function come from unknown crystal. The idea is to see if there are
multiple NCS-related copies of a protein inside unit cell (largest
peaks are caused by crystallographic symmetry).
Cross-rotation function: sample Patterson of known model in
different orientations against Patterson of unknown crystal in an
attempt to find corresponding orientation of search model.
• Put 1 copy of atomic model into empty box at least 2x the size of the
model, in order to avoid overlap with models in neighboring boxes of
the “crystal.”
• Fourier transform of model => Fcalc => square to obtain Icalc
=> Fourier inverse => Pcalc(u) => spherical cut => rotate model and
repeat.
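The Fcalc => Icalc => Pcalc(u) chain can be followed on a one-dimensional toy "crystal". This is a minimal sketch using a slow direct Fourier sum. The grid size, atom positions, and weights are made up, and the spherical cut and rotation steps are not shown.

```python
import cmath
import math

def structure_factors(rho):
    """Discrete Fourier transform of a 1D density sampled on a grid."""
    n = len(rho)
    return [sum(r * cmath.exp(2j * math.pi * h * j / n)
                for j, r in enumerate(rho))
            for h in range(n)]

def patterson(rho):
    """Inverse transform of the intensities |F(h)|^2, with no phases used."""
    n = len(rho)
    intensities = [abs(f) ** 2 for f in structure_factors(rho)]
    return [sum(i * cmath.exp(-2j * math.pi * h * u / n)
                for h, i in enumerate(intensities)).real / n
            for u in range(n)]

# Two point "atoms" in an otherwise empty box of 20 grid points.
rho = [0.0] * 20
rho[2] = 1.0
rho[7] = 2.0
p = patterson(rho)
print([round(v, 6) for v in p])
```

The map has an origin peak of weight 1² + 2² and a pair of peaks at u = +5 and u = -5 (grid point 15) of weight 1 × 2. These are the interatomic vectors, arranged centrosymmetrically and independent of where the atoms sit in the box.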
The Rotation Function
Sampling of (α,β,γ) during RF depends on size of molecule and
resolution. It is common to work at 10-4 Å resolution to determine
global orientation without requiring extremely fine sampling of
(α,β,γ).
Peaks of Patterson function are about 2x wider than Fourier peaks,
making RF solutions inaccurate. Can refine RF solutions by
Patterson Correlation (PC) refinement (see Brunger et al.)
A “correct” RF is sometimes distinguished by its high value, but it is
common for correct solution to be further down the list of candidate
solutions.
Customary to evaluate several RF solutions in subsequent calculations.
High crystallographic symmetry makes the RF noisy because single
molecule used as search object represents smaller fraction of total
interatomic vectors in unknown crystal.
Translation Function
Assume that we have a list of candidate RF solutions including the
correct answer.
For each candidate RF, apply translations to generate every possible
position of search molecule (on appropriately fine grid):
• Generate neighboring molecules by applying crystal symmetry.
• Fourier transform the ensemble (calculate Fcalc).
• Evaluate the TF:
TF = Σhkl Iobs(h) · Icalc(h)
• Maximizing this sum, calculated over all (hkl)s in the resolution range,
minimizes the least squares residual between Iobs and Icalc.
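Why a large TF corresponds to a small residual can be seen by expanding the square. The derivation below is a sketch that assumes Iobs and Icalc are on the same scale and that the sum of Icalc² changes little as the model is translated. Neither assumption is stated in the source.

```latex
\begin{aligned}
\sum_{hkl}\bigl[I_{obs}(h)-I_{calc}(h)\bigr]^2
 &= \sum_{hkl} I_{obs}(h)^2 + \sum_{hkl} I_{calc}(h)^2
    - 2\sum_{hkl} I_{obs}(h)\,I_{calc}(h) \\
 &= \text{const} + \sum_{hkl} I_{calc}(h)^2 - 2\,TF
\end{aligned}
```

The first term is fixed by the data, so under these assumptions the position with the largest TF is also the position with the smallest residual.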
A Tail of Two Cats
Fourier amplitudes
recorded without phases
http://www.ysbl.york.ac.uk/~cowtan/fourier/fourier.html
A Manx Cat
(incomplete model
for molecular
replacement)
Apply Manx Phases
to Cat Amplitudes
F.T. reveals the new
information that was not
in model phases
MAD Phasing
Collection of anomalous scattering data at specific
wavelengths where heavy atoms scatter strongly. This
is a Multiwavelength Anomalous Diffraction experiment.
For anomalous scattering, isomorphism is perfect.
However, anomalous signal is small and requires
accurate intensity measurements.
Anomalous signal increases with resolution, but diffraction
intensity decreases, resulting in lower accuracy
measurements at high angles of diffraction.
Judging the Quality of X-ray Structures
X-ray Data Quality
• Rsym – the error in measured intensities of equivalent reflections
(typically ranging from 3% at low resolution to 35% at the high
resolution limit).
• Resolution, signal-to-noise ratio (I/sigma > 3-4 for useful data)
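A minimal sketch of how Rsym is commonly computed, assuming the usual definition: the summed absolute deviation of each measurement from the mean of its symmetry-equivalent group, divided by the summed intensity. The intensities and the function name are made up for illustration.

```python
def r_sym(groups):
    """groups: one list of repeated intensity measurements per unique reflection."""
    numerator = 0.0
    denominator = 0.0
    for observations in groups:
        mean = sum(observations) / len(observations)
        numerator += sum(abs(i - mean) for i in observations)
        denominator += sum(observations)
    return numerator / denominator

# Two unique reflections, measured 2 and 3 times respectively.
measurements = [[100.0, 110.0], [50.0, 40.0, 45.0]]
print(round(r_sym(measurements), 4))
```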
Crystallographic Model Quality
• A crystallographic model is constructed to represent the electron
density obtained from the diffraction experiment.
• Rcryst – the error in agreement between the model and
experimental structure factor amplitudes (typically ranging from
16% (high resolution structure) to 28% (lower resolution)).
• Free R-factor (Rfree) – a crystallographic R-factor calculated from
a small set (5-10%) of reflections that are reserved and not used
during model refinement (Rfree is typically 2-4% larger than
Rcryst). Over-refinement causes an artificial decrease in Rcryst with
little or no change in Rfree.
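The two R-factors differ only in which reflections enter the sums. The sketch below assumes that Fobs and Fcalc are already on the same scale and that a flag marks the reserved reflections. The amplitudes, flags, and function names are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """Sum |Fobs - Fcalc| / sum Fobs, with amplitudes assumed pre-scaled."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)

def r_cryst_and_free(f_obs, f_calc, is_free):
    """Split reflections by the free flag and compute both R-factors."""
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, is_free) if not free]
    free = [(o, c) for o, c, free in zip(f_obs, f_calc, is_free) if free]
    r_cryst = r_factor(*zip(*work))
    r_free = r_factor(*zip(*free))
    return r_cryst, r_free

f_obs = [100.0, 80.0, 60.0, 40.0, 20.0]
f_calc = [90.0, 85.0, 55.0, 45.0, 25.0]
is_free = [False, False, False, False, True]   # reserved before refinement
r_cryst, r_free = r_cryst_and_free(f_obs, f_calc, is_free)
print(round(r_cryst, 4), round(r_free, 4))
```

Because the flagged reflections never enter refinement, Rfree is not driven down artificially by fitting extra parameters, whereas Rcryst is.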
Judging the Quality of X-ray Structures
Crystallographic Model Quality (cont)
• Agreement between the model and known structures.
• Ramachandran plot.
• Deviation from standard geometry (bond
angles, lengths, etc.).
• Fold recognition – does the model look like
any other proteins in the protein data bank?
Does the model satisfy other experimental constraints/data?
• Locations of functionally important residues.
• Shape consistent with known function(s).
“Table 1” : A Standard for Crystal Structure Papers
X-ray Scattering Basics
• Recall that X-ray diffraction results from the interaction of
waves (x-rays) with matter (electrons bound to atoms of our
protein).
• Electromagnetic waves have electrical and magnetic components
oriented perpendicular to one another and to the direction of
travel.
• A wave can be described by a cosine function with an amplitude
and period (wavelength):
A•cos(2πνt)
or
A•cos(2πx/λ)