Tutorial 2: Phasing by isomorphous replacement
Mike Lawrence, September 2007
You will have already completed Tutorial 1, the aim of which was to teach you how to process
oscillation data using the HKL suite and, as an example, to process the native data set for
pneumococcal surface antigen A (PsaA).
The aim of this tutorial is to teach you
(i) how to process a heavy atom derivative data set
(ii) how to compare this data set with the native data set to see whether or not the heavy atom is
bound in a useful fashion
(iii) how to inspect heavy atom Patterson maps
(iv) how to compute heavy atom phases and improve them by solvent flattening
(v) how to inspect and interpret heavy atom maps
In order to do this tutorial you will need to have access to a computer containing the platinum
nitrate derivative data set for PsaA, the processed native data set for PsaA from Tutorial 1, and a
variety of software and their manuals.
You should also have completed as much as possible of the isomorphous phasing pages on the
Birkbeck course.
The theory associated with heavy-atom phasing and the practical detail of heavy atom
derivatization will be dealt with in the lectures.
Once the heavy atom data sets have been obtained, there is an increasing variety of pathways to
obtaining heavy atom maps, depending on the software package being used. These pathways are
more or less automated, again depending on the package.
This tutorial will focus on one pathway – use HKL to process the data, use CCP4i (the CCP4 gui)
to scale and merge the heavy atom data with the native data set, use CCP4i to compute difference
Pattersons as appropriate, use SOLVE for determining the heavy atom positions and RESOLVE for
solvent flattening.
Other useful pathways involve the programs RSPS for manual determination of the heavy atom
positions, MLPHARE for heavy atom phasing, SHARP for heavy atom phasing and solvent
flattening, and DM for solvent flattening. Undoubtedly more are emerging.
Part 1. Processing the derivative data set
Background
The K2Pt(NO3)4 data set consists of 202 one-degree oscillation images collected on a Rigaku R-axis
IV image plate detector mounted on a laboratory rotating anode X-ray source. The data set is stored
in the directory $PSAA_DATA/ptnit and the respective file names are of the form psa40###.osc,
where ### is a three-digit number ranging from 001 to 202.
This data set was obtained by soaking the crystal in 12 mM platinum nitrate overnight and then
mounting it in the conventional way for cryo-cooling.
Heavy atom data set processing with HKL is done in the same fashion as native data set processing,
except that the Friedel pairs must be kept separate to allow for anomalous scattering by the heavy
atom.
Exercise
Process the derivative data set using HKL. Processing of derivative data within HKL is identical to
that of native data. The only difference is that in the scalepack command file an extra command is
needed, namely
anomalous on
This instructs scalepack to separate the anomalous pairs in the output. You may also use the
scale anomalous
command, which keeps the Friedel pairs separate during both scaling and the computation of
statistics. This is worthwhile if the data are of good quality and have adequate redundancy.
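To see why keeping the Friedel pairs apart matters, here is a minimal Python sketch. It merges a few hypothetical observations in point group 222 (the point group of P2₁2₁2₁) with and without the Friedel pairs separated, and computes the merging R-factor asked for in question 1.1. In 222 the equivalents of (h, k, l) change the sign of exactly two indices. The I(+) set therefore has an even number of negative indices and the I(−) set an odd number. The function names are hypothetical, and scalepack's own merging also applies scale factors, error models and outlier rejection.

```python
from collections import defaultdict

# Hypothetical helpers for illustration; not part of HKL or scalepack.


def unique_key(h, k, l, anomalous):
    """Key of the unique reflection that (h, k, l) belongs to in point group 222.

    With anomalous=True, acentric Friedel mates get separate "+" and "-" keys.
    Any zero index makes the reflection centric in 222, so I(+) == I(-) there.
    """
    key = (abs(h), abs(k), abs(l))
    if not anomalous or 0 in (h, k, l):
        return key
    n_negative = sum(1 for index in (h, k, l) if index < 0)
    return key + ("+" if n_negative % 2 == 0 else "-",)


def r_merge(observations, anomalous=False):
    """R_merge = sum|I_i - <I>| / sum I_i over reflections measured more than once."""
    groups = defaultdict(list)
    for h, k, l, intensity in observations:
        groups[unique_key(h, k, l, anomalous)].append(intensity)
    numerator = denominator = 0.0
    for intensities in groups.values():
        if len(intensities) < 2:
            continue  # singletons say nothing about agreement between measurements
        mean = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


# Hypothetical observations: two Bijvoet-related pairs differing by about 10 units
obs = [(1, 2, 3, 100.0), (-1, -2, 3, 102.0), (-1, -2, -3, 90.0), (1, 2, -3, 92.0)]
print(r_merge(obs))                  # Friedel pairs merged: 20/384, about 0.052
print(r_merge(obs, anomalous=True))  # Friedel pairs separate: 4/384, about 0.010
```

When a genuine anomalous signal is present, separating I(+) and I(−) lowers R_merge. The Bijvoet differences are then no longer counted as measurement error.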
Questions
1.1 What are the merging R-factor, the completeness, the maximum resolution, the mean
redundancy, and the mean signal-to-noise for this data set ?
1.2 How do these statistics compare with those of the native?
1.3 Is the space group for this data set the same as that of the native ?
1.4 How does the post-refined cell compare to that of the native data set ?
1.5 Examine the output reflection file from scalepack. Comment on why the number of columns
varies down the file. What values are contained in these columns ?
Part 2. Merging the heavy atom and native data sets
Background
The CCP4 suite of programs has a very convenient way of handling multiple heavy atom and
native data sets, namely via a file format that has labelled columns for the data (*.mtz files). Each
column within the file carries a particular data item for each reflection and can be referred to by a
user-defined name. Use of that particular column of values by a CCP4 program requires that that
column be associated with a program column name. Likewise, new columns of data associated with
each reflection can be created by the program and the user can assign a name to each output
column. This is particularly simple within the CCP4 gui; within CCP4 script files this is done via
the LABIN and LABOUT assignment commands.
Two CCP4 programs are useful here: scalepack2mtz and cad. The former converts a scalepack
output file to mtz format (should be done for both the derivative and native data sets) and the latter
merges the two mtz files into a single mtz file. The truncate procedure is also invoked to convert the
intensities to amplitudes with a systematic handling of negative intensities. Note that multiple derivatives can be progressively
merged into a single file, and their column names are conveniently distinguished. cad also does
some other useful things, such as checking that the files are sorted in the same order and use the same
convention for the Laue group asymmetric unit. A free R-factor column is also added and dummy
reflection records are also created for missing reflections.
Useful checks on the import include
(a) a check of the Wilson plot, to see that it is protein-like
(b) a check on whether or not there is systematic variation in reflection amplitude with respect to
index parity
(c) a check for excessive anisotropy
(d) a check for crystal twinning
cad itself should run without much informative output, but a quick scan of the log file for errors can
do no harm!
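Check (b) can be sketched in a few lines of Python. The sketch assumes the amplitudes are already available as (h, k, l, F) tuples, for example exported from the mtz file with a CCP4 utility such as mtzdump (the parsing step is not shown). It compares the mean amplitude in each of the eight index-parity classes. A class markedly stronger or weaker than the rest can point to pseudo-centring or translational non-crystallographic symmetry. The function name is hypothetical.

```python
from collections import defaultdict

# Hypothetical helper for illustration; not a CCP4 program.


def parity_means(reflections):
    """Mean amplitude in each index-parity class (h mod 2, k mod 2, l mod 2).

    reflections: iterable of (h, k, l, F) tuples. Systematically absent axial
    reflections (e.g. h00 with h odd in P2(1)2(1)2(1)) should be left out first.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for h, k, l, amplitude in reflections:
        parity = (h % 2, k % 2, l % 2)  # Python's % is non-negative for negative h
        sums[parity] += amplitude
        counts[parity] += 1
    return {parity: sums[parity] / counts[parity] for parity in sums}


data = [(1, 1, 1, 10.0), (1, 3, 1, 20.0), (2, 2, 2, 30.0), (2, 4, 2, 10.0)]
print(parity_means(data))  # {(1, 1, 1): 15.0, (0, 0, 0): 20.0}
```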
Exercise
Use the CCP4 gui to import the native and ptnit data sets into CCP4 mtz format using
scalepack2mtz, and then merge these into a single file using cad.
Questions
2.1 Perform each of the above checks on the import of both the native and the ptnit derivative data
and discuss the relevant numbers (twinning is not possible in P2₁2₁2₁, so ignore that check for now).
Part 3. Scaling the heavy atom data set to the native data set
Background
Heavy atom phasing is dependent on obtaining accurate values for the differences in amplitude of
(i) the native and derivative reflections and (ii) the Friedel mates of the derivative (if anomalous
scattering is present). This is dependent upon having the native and derivative data sets on the same
scale.
There are two approaches – either place both data sets on the same absolute scale, or place the data
sets on a common relative scale. The latter approach is usually better, as it is not always clear what
the best absolute (Wilson) scale is. The basis of relative scaling is much the same as scaling
individual images to each other in scalepack, i.e. computing an overall scale factor and a
“temperature factor”. The only additional refinement is to allow for anisotropic scaling, i.e. the temperature
factor is described by a set of components which describe its variation as a function of direction in
reciprocal space. This can be further elaborated into “local scaling”, but the latter form of scaling is
not widely used.
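The isotropic part of relative scaling can be treated as a straight-line fit. Suppose the derivative is multiplied by k·exp(−B s²), with s = sinθ/λ. Then in each resolution shell ln(⟨F_nat²⟩/⟨F_der²⟩) = 2 ln k − 2B s². The Python sketch below assumes the shell averages have already been computed. It is not scaleit's algorithm, which refines the anisotropic terms as well, and programs differ in their B-factor conventions. The function name is hypothetical.

```python
import math

# Hypothetical helper for illustration; scaleit also refines anisotropic terms.


def relative_scale(shells):
    """Fit ln(<F_nat^2>/<F_der^2>) = 2 ln k - 2 B s^2 by least squares.

    shells: (s_squared, mean_fnat_squared, mean_fder_squared) for each resolution
    shell, with s = sin(theta)/lambda. Returns (k, B) such that
    k * exp(-B * s^2) * F_der is on the scale of F_nat.
    """
    xs = [s2 for s2, _, _ in shells]
    ys = [math.log(fn2 / fd2) for _, fn2, fd2 in shells]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept / 2), -slope / 2


# Synthetic shells generated with k = 1.2 and B = 5 Å^2; the fit should recover them
shells = [(s2, 1000.0 * 1.44 * math.exp(-10.0 * s2), 1000.0)
          for s2 in (0.005, 0.01, 0.02, 0.04)]
print(relative_scale(shells))
```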
Effective scaling will ensure that mean native and derivative amplitudes are “as similar as possible”
across the resolution range being considered. Often it is not possible to get good scaling across the
entire resolution range, and it is thus useful to omit the low-resolution data (say, data with d-spacings above 20 Å or
15 Å). Care must also be taken to exclude outlier data that may erroneously affect the
scaling. This may require a few rounds of scaling to get the best possible result.
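Both precautions can be sketched together in Python. The first step is a low-resolution cut-off based on the d-spacing in an orthorhombic cell. The second is a few rounds of rejecting reflections whose |F_der − F_nat| exceeds a multiple of the rms difference. The 4 × rms threshold and the function names are hypothetical choices for illustration. scaleit applies its own rejection criteria, controlled by its keywords.

```python
import math

# Hypothetical helpers for illustration; not scaleit's actual rejection rule.


def d_spacing(h, k, l, a, b, c):
    """Resolution (Å) of reflection hkl in an orthorhombic cell: 1/d^2 = (h/a)^2 + (k/b)^2 + (l/c)^2."""
    return 1.0 / math.sqrt((h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2)


def select_for_scaling(reflections, cell, d_max=20.0, n_rms=4.0):
    """Drop data with d >= d_max, then iteratively reject large |F_der - F_nat| outliers.

    reflections: (h, k, l, f_nat, f_der) tuples, excluding (0, 0, 0); cell: (a, b, c) in Å.
    """
    kept = [r for r in reflections if d_spacing(*r[:3], *cell) < d_max]
    while kept:
        rms = math.sqrt(sum((r[4] - r[3]) ** 2 for r in kept) / len(kept))
        survivors = [r for r in kept if abs(r[4] - r[3]) <= n_rms * rms]
        if len(survivors) == len(kept):
            break  # no further rejections: converged
        kept = survivors  # the rms shrinks once outliers are gone, so go round again
    return kept
```

Repeating the rejection mirrors the "few rounds of scaling" above. A gross outlier inflates the rms on the first pass, and a smaller one may only stand out on a later pass.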
Derivative scaling in CCP4 is done with the program scaleit. scaleit does all of the above and
produces many useful statistics which can be used to assess the extent to which useful isomorphous
derivatization has occurred.
Useful checks of the scaleit output file include the following:
(a) Check the anisotropy scaling matrix to see whether or not the degree of anisotropy of the data is
significantly different between the native and derivative.
(b) Are there reflections with exceptionally high isomorphous or anomalous differences, and should
these be excluded from scaling by repeating the run, or deleted entirely from the data set? Beware:
this procedure could delete entirely valid data!
(c) An assessment of the differences in terms of a normal error model. The numbers to check here
are the gradient (should be >> 1) and the intercept (should be ≈ 0). Check the original paper if in
doubt (Howell and Smith, J. Appl. Cryst. (1992), v25, 81-86).
(d) The matrix of differences vs resolution. These should look normal in distribution without being
significantly skewed.
(e) The most useful table is probably the final one, which provides the scale factors that would be
appropriate to each resolution shell (should all be ≈ 1), Riso, RF_I and the weighted R-factor, as
well as the size of the anomalous signal compared to the isomorphous signal. These values should all
correspond to those expected for an isomorphous derivative (see lectures).
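Two of the statistics above can be sketched in a few lines of Python. The first is one common form of R_iso. The second is a generic normal probability fit in the spirit of Howell and Smith: the sorted normalised differences δ = (F_PH − F_P)/σ are regressed against the expected normal quantiles. The sketch relies on assumptions. scaleit's exact choice of plotting positions and weighting may differ, and the function names are hypothetical.

```python
from statistics import NormalDist

# Hypothetical helpers for illustration; scaleit's implementation may differ.


def r_iso(pairs):
    """One common form of R_iso = sum |F_PH - F_P| / sum F_P over (F_P, F_PH) pairs."""
    return sum(abs(fph - fp) for fp, fph in pairs) / sum(fp for fp, _ in pairs)


def normal_probability_fit(deltas):
    """Least-squares gradient and intercept of sorted deltas vs expected normal quantiles.

    deltas: normalised differences (F_PH - F_P) / sigma(F_PH - F_P).
    Plotting positions (i + 0.5) / n are one common choice among several.
    """
    n = len(deltas)
    observed = sorted(deltas)
    expected = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    mean_x, mean_y = sum(expected) / n, sum(observed) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, observed))
    sxx = sum((x - mean_x) ** 2 for x in expected)
    gradient = sxy / sxx
    return gradient, mean_y - gradient * mean_x
```

If the differences arise only from errors with correctly estimated σ, the gradient is close to 1. A gradient well above 1 indicates differences larger than the errors, i.e. a real isomorphous signal.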
Exercise
Use scaleit within the CCP4 gui to scale the ptnit data set to the native data set.
Questions
3.1 Examine the scaleit log file, perform each of the above checks and comment on whether or
not they suggest the derivative will be useful.
Part 4. Plotting Harker sections for the ptnit derivative and determining the heavy atom
coordinates
Background
A Patterson function computed with coefficients |ΔFiso|² will show peaks at positions corresponding
to the inter-atomic heavy atom vectors. Space group symmetry often causes the vectors relating a
heavy atom to its symmetry mates to lie in certain planes within the Patterson (the so-called Harker
planes). Analysis of the positions of these vectors can lead to heavy atom coordinates.
The traditional way of doing this is to print out and view the Harker planes.
Exercise
Calculate a ptnit difference Patterson map using the CCP4i gui. Plot the Harker sections using pltdev (as far as I
am aware this program has to be accessed via the command line rather than via the gui).
Questions
4.1 Comment on the Harker section plots. How many platinum atoms do you think are bound to the
protein?
4.2 Compute the fractional coordinates of the heavy atom(s) directly from the Patterson.
Part 5. Using SOLVE for heavy atom phasing
Background
SOLVE is an extremely powerful heavy atom search and refinement program. It is capable of
scaling multiple derivatives, determining their heavy atom locations and determining a phase set.
(Whilst this is an attractive "black box", analysis of the derivative/native scaling and visual
assessment of the Harker planes remain invaluable before investing effort in SOLVE.
Manual assessment of derivatives is also important in deciding whether or not to
continue data collection.)
SOLVE searches for the heavy atom coordinates progressively and automatically,
adding in as many sites as it can and using trees to keep track of all the combinations tried. It then
uses a variety of measures to assess the quality of the phases obtained. These include
1) a score of the overall heights of all vectors in the Patterson
2) a score of the overall heights of peaks in difference Fouriers
3) the figure of merit
4) the partitioning between protein and solvent in the heavy atom map.
It is also capable of determining the hand of the heavy atom cluster and of the space group via an
analysis of the final map.
We will start by using SOLVE for the ptnit derivative only.
Exercise
Set up a SOLVE run for the native / ptnit data. For simplicity start with the already merged data in
the scaleit output file. Use the SOLVE man pages to work out how to set this up.
Questions
5.1 List the heavy atom sites obtained by SOLVE. Are these the same as you determined by hand?
5.2 List the temperature and occupancy associated with each heavy atom. What do these suggest ?
5.3 Examine the Cullis R-factor, the figure of merit and the phasing power of the solution. What
do these suggest?
5.4 Examine the four scores produced by SOLVE for this solution. What do they suggest?
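Question 5.3 refers to three standard phasing statistics. The minimal Python sketch below gives one common set of textbook definitions as a reminder. SOLVE's own formulae may differ in detail, e.g. in weighting, and all function names are hypothetical. The Cullis R shown is simplified: it ignores the sign cross-overs that can occur for centric reflections when F_H is large.

```python
import cmath
import math

# Hypothetical helpers for illustration; not SOLVE's internal formulae.


def lack_of_closure(fp, fph, alpha_p, fh):
    """E = |F_PH| - |F_P exp(i alpha_P) + F_H| for one reflection (fh is complex)."""
    return fph - abs(fp * cmath.exp(1j * alpha_p) + fh)


def phasing_power(reflections):
    """rms |F_H| / rms E over (fp, fph, alpha_p, fh) tuples."""
    n = len(reflections)
    rms_fh = math.sqrt(sum(abs(r[3]) ** 2 for r in reflections) / n)
    rms_e = math.sqrt(sum(lack_of_closure(*r) ** 2 for r in reflections) / n)
    return rms_fh / rms_e


def cullis_r_centric(reflections):
    """sum ||F_PH - F_P| - |F_H|| / sum |F_PH - F_P| over centric (fp, fph, fh) tuples."""
    numerator = sum(abs(abs(fph - fp) - abs(fh)) for fp, fph, fh in reflections)
    denominator = sum(abs(fph - fp) for fp, fph, _ in reflections)
    return numerator / denominator


def figure_of_merit(probabilities, phases):
    """m = |sum P(a) exp(i a)| / sum P(a) for a phase distribution sampled at `phases`."""
    centroid = sum(p * cmath.exp(1j * a) for p, a in zip(probabilities, phases))
    return abs(centroid) / sum(probabilities)
```

A sharply peaked phase distribution gives m close to 1, and a flat one gives m close to 0. A large phasing power means the heavy atom contribution is big compared with the lack of closure.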
Part 6. Solvent flattening
Background
The phases determined by MIR or SIR are often not very good (MAD on the other hand can yield
spectacularly good maps). They seldom lead to a map that is completely traceable from one
terminus to the next. In some instances they may only yield phases to low resolution (poorer than,
say, 6 Å). However, a variety of density modification (DM) techniques can be used to improve the
situation. MIR maps inevitably do not meet certain expected properties of a protein map. These
properties include flat solvent, non-crystallographic symmetry if present, an ideal histogram for the
electron density and protein-like structure. DM seeks to improve the phases by iteratively enforcing
these conditions onto the map and then recombining the altered structure factors with the observed
amplitudes.
Solvent flattening starts by determining an envelope for the protein within the MIR map, and then
flattens all density outside of that (i.e. sets it to a uniform level). Protein density is left alone, or
sometimes made non-negative. The modified map is Fourier transformed, new phases calculated,
which are then combined with the original Fobs values to generate a new map. The procedure is
iterated, with new envelopes being computed every few cycles.
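The flatten-and-recombine cycle just described can be illustrated on a toy one-dimensional map. The Python sketch below makes several simplifying assumptions. It picks the envelope by smoothing and thresholding the map, in the spirit of Wang's envelope method. It replaces the modified amplitudes by Fobs while keeping the modified phases. It uses a slow direct Fourier transform. RESOLVE itself works in three dimensions and uses a maximum-likelihood treatment rather than this simple substitution. The function names are hypothetical.

```python
import cmath

# Hypothetical toy illustration of one solvent-flattening cycle; not RESOLVE's method.


def dft(values, sign):
    """Naive discrete Fourier transform: sign=-1 forward, sign=+1 inverse (unnormalised)."""
    n = len(values)
    return [sum(v * cmath.exp(sign * 2j * cmath.pi * h * j / n) for j, v in enumerate(values))
            for h in range(n)]


def flatten_cycle(rho, f_obs, solvent_fraction, window=2):
    """One solvent-flattening cycle on a 1-D map sampled at len(rho) points.

    rho: current density; f_obs: observed amplitude for each DFT index, with
    f_obs[h] == f_obs[-h]; solvent_fraction: fraction of the cell taken as solvent (< 1).
    """
    n = len(rho)
    # 1. Envelope: smooth the map, then call the lowest-density points solvent
    smoothed = [sum(rho[(i + d) % n] for d in range(-window, window + 1)) / (2 * window + 1)
                for i in range(n)]
    cutoff = sorted(smoothed)[int(solvent_fraction * n)]
    is_solvent = [s <= cutoff for s in smoothed]
    # 2. Flatten: set the solvent density to its mean value
    solvent_mean = sum(r for r, s in zip(rho, is_solvent) if s) / sum(is_solvent)
    flattened = [solvent_mean if s else r for r, s in zip(rho, is_solvent)]
    # 3. Transform, keep the modified phases, restore the observed amplitudes
    f_mod = dft(flattened, -1)
    f_new = [fo * cmath.exp(1j * cmath.phase(f)) for fo, f in zip(f_obs, f_mod)]
    # 4. Inverse transform gives the new map (the imaginary part is rounding noise only)
    return [v.real / n for v in dft(f_new, +1)]
```

If the input map already has perfectly flat solvent and correct phases, one cycle returns it unchanged. This is a useful sanity check on the sign and normalisation conventions.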
We will seek to carry out solvent flattening by means of RESOLVE. RESOLVE is a new and very
powerful program, capable not only of solvent flattening but also of attempting to build the structure straight
into the map. RESOLVE uses maximum-likelihood methods.
To run RESOLVE do the following
i) Create a file called resolve.in with the following lines in it
line 1: solvent_content ### (where ### is the fractional solvent content appropriate to PsaA)
line 2: no_build
ii) Then type the following at the command prompt (bash syntax):
export SYMOP=/usr/local/ccp4/lib/data/symop.lib
(you can place this in your .bashrc file if you like)
iii) Then run RESOLVE from within the same directory in which you ran SOLVE, as follows:
resolve < resolve.in > resolve.log
Exercise
Start by computing the solvent content of the PsaA crystal. This can be computed within the CCP4
gui (assume say 300 residues for PsaA). Then use RESOLVE to carry out solvent flattening of the
SOLVE electron density map.
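The solvent content is usually estimated from the Matthews coefficient, V_M = V_cell / (Z × M), with solvent fraction ≈ 1 − 1.23/V_M. Here is a minimal Python sketch. It assumes an orthorhombic cell, one molecule per asymmetric unit (so Z = 4 in P2₁2₁2₁) and an average residue mass of about 110 Da. The cell edges are whatever your scalepack log reports, and the function name is hypothetical.

```python
# Hypothetical helper for illustration; the CCP4 gui performs the same estimate.


def matthews(a, b, c, n_residues, z=4, mass_per_residue=110.0):
    """Return (V_M in Å^3/Da, fractional solvent content) for an orthorhombic cell.

    a, b, c: cell edges in Å; z: molecules per unit cell (4 for one molecule per
    asymmetric unit in P2(1)2(1)2(1)); mass_per_residue: assumed average in Da.
    The constant 1.23 corresponds to a protein density of about 1.35 g/cm^3.
    """
    molecular_mass = n_residues * mass_per_residue
    vm = a * b * c / (z * molecular_mass)
    return vm, 1.0 - 1.23 / vm


# Usage (a, b, c taken from your own processing):
# vm, solvent = matthews(a, b, c, n_residues=300)
```

The fraction returned is the kind of value expected on the solvent_content line of resolve.in.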
Questions
6.1 What percentage do you get for the solvent content? How does this compare with the average
protein crystal?
6.2 Which (amplitude, phase) combination is used from the SOLVE output as input to RESOLVE,
and why? (Look in the log file.) Discuss.
6.3 What would happen during this procedure if the SOLVE phases were junk?
Part 7. Compute and examine an electron density map
Background
The solvent-flattened phases from RESOLVE are about as good as one can get from a single heavy-atom
derivative with significant anomalous signal. These phases can now be used to compute an
electron density map, which may be either an (Fcalc, φcalc) map or an (mFobs, φcalc) map, where m is the
figure of merit. Each of these has certain advantages. The former effectively assumes that the Fcalc values
are more accurate than the weighted Fobs values, whilst the latter assumes that the error resides in the
phases and downweights the Fobs values appropriately.
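Here is a minimal one-dimensional Fourier synthesis in Python that shows concretely how the two kinds of map coefficient differ. It assumes the convention ρ(x) ∝ F₀ + 2 Σ_{h>0} |F_h| cos(2πhx − φ_h), with the Friedel mates implied, and it omits the 1/V factor. The function names are hypothetical. Real map calculations, such as those behind the CCP4 gui, work in three dimensions with FFTs.

```python
import math

# Hypothetical 1-D illustration; real map programs use 3-D FFTs.


def weighted_coefficients(f_obs, fom, phases):
    """(m * Fobs, phase) map coefficients; all arguments are dicts keyed by index h."""
    return {h: (fom[h] * f_obs[h], phases[h]) for h in f_obs}


def map_1d(coefficients, n_points):
    """rho(x) = F_0 + 2 * sum_{h>0} |F_h| cos(2 pi h x - phi_h), sampled at x = j/n_points.

    coefficients: {h: (amplitude, phase_in_radians)} for h >= 0 only; the Friedel
    mates F(-h) = F(h)* are implied. The 1/V scale factor is omitted.
    """
    rho = []
    for j in range(n_points):
        x = j / n_points
        value = 0.0
        for h, (amplitude, phi) in coefficients.items():
            term = amplitude * math.cos(2 * math.pi * h * x - phi)
            value += term if h == 0 else 2 * term
        rho.append(value)
    return rho
```

An (Fcalc, φcalc) map passes the calculated amplitudes straight to `map_1d`. An (mFobs, φcalc) map passes `weighted_coefficients(f_obs, fom, phases)` instead, so reflections with poorly determined phases (small m) contribute less.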
At this stage we do not know where in the unit cell the protein lies, so it is probably best to
compute a large map volume (say, a whole unit cell). Map calculation is readily performed using the ccp4
gui, and the map can be written directly in O format.
Exercise
Compute an (mFobs,calc) electron density map in both CCP4 format and O format covering the
entire unit cell using the CCP4 gui. Load the O formatted file into O and display it using the fm_
commands
Questions
7.1 What is your overall impression of the SOLVE map? Does it differentiate protein from solvent, and
why? What is your overall impression of the RESOLVE map? Does the protein density look like
protein, and why? Can you see any features?
7.2 Use the map peak search routine within the CCP4 gui to find the highest point in this map. How
high is it? What are its coordinates? Inspect this in the electron density map and comment on what
you see. What is the likely source of this peak? (Read up something about PsaA from the Protein
Data Bank.)
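A map peak search reports the highest grid point of the map, usually as a height in units of the map rms. Here is a minimal Python sketch of the same idea. It assumes the map is available as a nested list sampling the whole orthorhombic cell. Reading the CCP4 map file is not shown, and no interpolation between grid points is attempted. The function name is hypothetical.

```python
import math

# Hypothetical helper for illustration; not the CCP4 peak-search program.


def highest_peak(grid, cell):
    """Highest grid point: (height in rms units above the mean, fractional xyz, orthogonal xyz in Å).

    grid[i][j][k] samples the whole cell at x = i/nx, y = j/ny, z = k/nz;
    cell = (a, b, c) in Å for an orthorhombic cell.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    values = [v for plane in grid for row in plane for v in row]
    mean = sum(values) / len(values)
    rms = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    peak, i, j, k = max((grid[i][j][k], i, j, k)
                        for i in range(nx) for j in range(ny) for k in range(nz))
    frac = (i / nx, j / ny, k / nz)
    orth = tuple(f * edge for f, edge in zip(frac, cell))  # valid for orthogonal axes only
    return (peak - mean) / rms, frac, orth
```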