PHASING BY MULTIPLE ISOMORPHOUS REPLACEMENT (MIR)
We will now use the native data you collected on lysozyme (or data just like it), as well as two
derivative data sets we collected for you, to obtain phases for lysozyme using MIR. The derivatives
are a uranium derivative (from a K3UO2F5 soak) and a mercury derivative (from co-crystallization
of lysozyme with PCMS).
The flow chart of what we will be doing is as follows, with the program used for each step in parentheses:
1. Convert the data to CCP4 format (denzo2mtz)
2. Combine these three data sets into one file (cad)
3. Scale the data sets together (scaleit)
4. Calculate a difference Patterson for the U derivative (dp)
5. View the Harker sections of the difference Patterson map (npo)
6. Solve for a consistent set of heavy atom positions for U (rsps)
7. Use the U phases to solve for the Hg positions (ddf)
8. Refine the heavy atom positions (mlphare)
9. Calculate phases (mlphare)
10. Calculate a map (fft)
11. Improve the map using density modification (dm)
12. View the map (O)
To accomplish each of these steps (except the last), we will use the CCP4 suite of programs. There
are numerous program packages that will do this - XtalView, SOLVE, CNS, even CCP4I (an
interactive, GUI-based version of CCP4). We will use CCP4 because it is the most transparent and
least "black-boxed". Once you understand the individual steps from using CCP4, the other programs
will be easy to use.
General information on CCP4
The Collaborative Computational Project No. 4 (CCP4) is a collection of crystallographic programs
written by people all over the world and maintained by a group of scientists in the UK, largely at
York and Daresbury.
The information you will need to know about CCP4 is available from their web site:
http://www.dl.ac.uk/CCP/CCP4/main.html
For example, if you want documentation on any of the programs or routines we will be using, you
can go from this main page to -Documentation-, then -Program Documentation-, then scroll down
to SUPPORTED for a list of each program they have (the final web site is
http://www.dl.ac.uk/CCP/CCP4/dist/html/INDEX.html ). We will use several of them, either in
separate command files or in combinations in single command files. Information on the
theory and the primary references is also available from each of these program links.
O. GETTING STARTED
Copy the three data files we have created for you from the following location on deka:
/deka/people/raxis/student
Don't use your own native data set of lysozyme - it is probably great, but we know the three files
here work.
Also, when you create command files (.com) to run the CCP4 routines, you may have to
execute the following command:
chmod +x denzo2mtz.com
to tell the computer that the file is executable so you can run it.
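Once you have created several command files, you can mark them all executable at once; this is just a one-line variation on the command above (adjust the pattern if your files are named differently):
chmod +x *.com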
I. DIFFERENCE PATTERSONS
1. Convert the native and derivative data files from SCALEPACK output format (.hkl, ascii) to CCP4 format (.mtz, binary).
The current data files, from SCALEPACK, are as follows:
deriv_Hg_lyso.hkl deriv_U_lyso.hkl native_lyso.hkl
We will convert these to the CCP4 convention using this command file, which contains two
separate programs:
denzo2mtz.com
$CCP4/bin/f2mtz HKLIN native_lyso.hkl \
HKLOUT temp.mtz <<EOF-F> denzo2mtz.log
TITLE
CELL 78.547 78.547 36.836 90.000 90.000 90.000
SYMM p43212
LABOUT H K L IMEAN SIGIMEAN
CTYP H H H J Q
SKIP 3
END
EOF-F
#
# Conversion from Is to Fs
#
truncate HKLIN temp.mtz \
HKLOUT native_lyso.mtz <<EOF-T> truncate.log
TITLE
LABOUT F=FP SIGF=SIGFP
WILSON
NRESID 129
RESOL 20 1.7
RSCALE 3.0 1.7
ANOMAL NO
TRUNCA YES
EOF-T
#
rm -f temp.mtz
Run this command file by typing 'denzo2mtz.com', after editing it with the appropriate
information.
Two programs are used: f2mtz and truncate. F2mtz takes F's (or intensities, as in our case) in any
ASCII format and converts them to .mtz format, which is binary and what CCP4 uses. The advantage
of .mtz files is that they take up less space; the disadvantage is that you cannot open them and look
at them easily.
mtzdump
CCP4 has a small program to look at mtz's called mtzdump. To use it to look at an mtz, type:
mtzdump HKLIN deriv_U_lyso.mtz
then type 'go' (if you want to see all the reflections dumped to the screen, type 'nref=-1' before
you type 'go'; if you want to dump this to an ASCII file instead of the screen, add ' > test.out ' to the
end of the 'mtzdump ...' command line listed above).
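If you would rather not type the keywords at the prompt, the same commands can go in a small command file in the style of the other .com files in this section (a sketch; the script name mtzcheck.com and the output name test.out are just illustrative):
mtzcheck.com
mtzdump HKLIN deriv_U_lyso.mtz > test.out <<EOF
nref=-1
go
EOF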
You can see that the .mtz contains a lot of info other than just h, k, l, F and sigma.
CCP4 requires each column in an mtz to have a label. We will label our columns H, K, L, IMEAN
and SIGIMEAN. Each column also needs a type associated with it, so the programs will know
what to expect there. The types of our columns are H, J and Q: H means that the column
contains an index (h, k, or l); J means an intensity is there; Q is the sigma (error) on the intensity.
We will also skip the first three lines of our input .hkl file because they contain things other than
H, K, L, I or SIGI.
Notice that the output file from f2mtz is called temp.mtz - this is a temporary file. We delete it
after the next step, truncate.
Truncate converts our intensities, which are what we read from the image plate and are indexed,
processed and scaled by Denzo/Scalepack, to structure factors or F's. The simplest way to do this
is to take the square root of the I; however, it has been shown that several other corrections should
be made during this conversion. That is what truncate does. To get the detailed background on
truncate, check its documentation. Briefly, truncate takes our I's and SigI's from f2mtz, puts them
on an absolute scale (e.g. corrects for the unmeasurable |F(0,0,0)| reflection, the value of which
can be calculated) and applies a Wilson B-factor (named after the first user of this analysis).
LABOUT adds two new columns to the output .mtz: FP and SIGFP, which are of type F and Q.
NRESID is the number of amino acids in the protein, necessary for the truncate routine.
RESOL is the complete resolution range of the data.
RSCALE is the resolution range used for Wilson scaling: only data of higher resolution than 3 Å are used.
ANOMAL is no because we are not including anomalous data.
TRUNCA is yes because we want to put the data on an absolute scale and do Wilson scaling.
Notice that each program writes a log file: truncate.log in this case. You can change the name of the
log file for each data set you run through this (e.g. truncate_native.log for the native data).
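Since the same two-program conversion must be run on all three .hkl files, you could also parameterize denzo2mtz.com so the file stem is passed in as an argument. The sketch below is just the command file above with a $stem variable substituted in (the wrapper name and the per-data-set log names are my own invention; the f2mtz and truncate keywords are unchanged):
denzo2mtz_any.com
#!/bin/sh
# Usage: denzo2mtz_any.com native_lyso  (converts native_lyso.hkl to native_lyso.mtz)
stem=$1
$CCP4/bin/f2mtz HKLIN ${stem}.hkl \
HKLOUT temp.mtz <<EOF-F> denzo2mtz_${stem}.log
TITLE
CELL 78.547 78.547 36.836 90.000 90.000 90.000
SYMM p43212
LABOUT H K L IMEAN SIGIMEAN
CTYP H H H J Q
SKIP 3
END
EOF-F
truncate HKLIN temp.mtz \
HKLOUT ${stem}.mtz <<EOF-T> truncate_${stem}.log
TITLE
LABOUT F=FP SIGF=SIGFP
WILSON
NRESID 129
RESOL 20 1.7
RSCALE 3.0 1.7
ANOMAL NO
TRUNCA YES
EOF-T
rm -f temp.mtz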
Look at the log file from truncate. It contains a wealth of info, including the actual plot used to
calculate the Wilson B and scale factors (all data are plotted, but only the 3 - 1.7 Å data are used). An
examination of this log file (with the help of the documentation) is itself a crash course in
crystallography.
After running denzo2mtz.com three times, we have now converted our .hkl files into three separate
.mtz files:
deriv_Hg_lyso.mtz deriv_U_lyso.mtz native_lyso.mtz
2. Combine the three data sets (one native, two derivative) into one file.
The following command file combines the three .mtz files we created into one.
cad.com
cad HKLIN1 native_lyso.mtz \
HKLIN2 deriv_U_lyso.mtz \
HKLIN3 deriv_Hg_lyso.mtz \
HKLOUT lyso_nat_U_Hg.mtz <<EOF-F> cad.log
RESO FILE_NUMBER 1 20 1.7
LABI FILE 1 E1=FP E2=SIGFP
CTYP FILE 1 E1=F E2=Q
RESO FILE_NUMBER 2 20 1.7
LABI FILE 2 E1=FPU E2=SIGFPU
CTYP FILE 2 E1=F E2=Q
RESO FILE_NUMBER 3 20 1.7
LABI FILE 3 E1=FPHg E2=SIGFPHg
CTYP FILE 3 E1=F E2=Q
END
EOF-F
There are three input files (HKLIN1, HKLIN2, HKLIN3) and one output file (HKLOUT). For
each input file, the proper names and types for each column have to be read in (FP, SIGFP, FPU,
SIGFPU, and so on). The three are combined into one output file.
Run this by typing 'cad.com'.
A portion of this output .mtz file is shown here, obtained with mtzdump:
OVERALL FILE STATISTICS for resolution range 0.003 - 0.346
=======================
 Col Sort   Min     Max    Num      %      Mean    Mean  Resolution  Type Column
 num order                Missing complete         abs.  Low   High       label
  1  ASC    0       46       0   100.00    24.3    24.3 19.64  1.70   H   H
  2  NONE   0       32       0   100.00    10.0    10.0 19.64  1.70   H   K
  3  NONE   0       21       0   100.00     7.9     7.9 19.64  1.70   H   L
  4  NONE   5.2   1325.8   136    98.97  146.17  146.17 19.05  1.70   F   FP
  5  NONE   0.8     27.8   136    98.97    3.48    3.48 19.05  1.70   Q   SIGFP
  6  NONE   5.1   1437.3    10    99.92  161.75  161.75 19.64  1.70   F   FPU
  7  NONE   0.8     31.3    10    99.92    2.81    2.81 19.64  1.70   Q   SIGFPU
  8  NONE   5.6   1344.7  2005    84.76  185.44  185.44 19.05  1.79   F   FPHg
  9  NONE   0.7     26.8  2005    84.76    3.22    3.22 19.05  1.79   Q   SIGFPHg
No. of reflections used in FILE STATISTICS 13153
LIST OF REFLECTIONS
===================
   H   K   L      FP  SIGFP     FPU SIGFPU    FPHg SIGFPHg
   1   0  13       ?      ?  278.51   6.12       ?      ?
   1   0  14       ?      ?   31.61   3.31       ?      ?
   1   0  15   52.78   4.04   39.74   2.56       ?      ?
   1   0  16  359.78  11.99  260.82   4.10       ?      ?
   1   0  17  152.85   3.62  196.61   3.59       ?      ?
   1   0  18  206.38   4.69  208.55   4.76  181.19   6.29
The native data (FP and SIGFP, the structure factors and sigmas of the Protein) are 98.97%
complete, the U derivative is ~100% complete, and the Hg is less complete. A few of the first
reflections are listed; a '?' indicates that the reflection was not measured in that data set.
Now we have our native and two derivative data sets in one .mtz file.
3. Scale the derivative data sets to the native data set.
Just as in data processing, when the data from different images needed to be scaled together (usually
with the first image used as the standard), each of these complete data sets needs to be corrected by
scaling. In this case, we will scale the two derivative data sets to the native. The command file for
this program, scaleit, follows:
scaleit.com
$CCP4/bin/scaleit HKLIN lyso_nat_U_Hg.mtz HKLOUT lyso_nat_U_Hg_sc.mtz >
scale_lyso_nat_U_Hg.log << EOF
RESO 20.0 1.7
CONVERGE NCYC 20 ABS 0.000001 TOLERANCE 0.0000000001
EXCLUDE SIGFP 1.0 SIGFPH1 1.0 SIGFPH2 1.0
LABIN FP=FP
SIGFP=SIGFP -
FPH1=FPU SIGFPH1=SIGFPU FPH2=FPHg SIGFPH2=SIGFPHg
REFINE ANISOTROPIC
GRAPH H K L MODF
END
EOF
The .mtz we just made in cad.com is the input; new output and log file names are supplied. Again, the
labels must match those in the input .mtz file. We will use all data (20-1.7 Å), ask for 20 cycles of
refinement of the scale factors, and tell it that when the shifts in the scale factors are less than
0.000001 we are done (even if 20 cycles have not been reached). Next we exclude all reflections
that have F's less than 1.0*sigmaF (i.e. we throw out weak data).
Run this by typing 'scaleit.com'.
We have chosen to refine the scale and an anisotropic B-factor (REFINE ANISOTROPIC). You
can refine just a scale between data sets, a scale and isotropic B, or a scale and an anisotropic B, as
shown below from the documentation:
An overall scale (REFINE SCALE):
C
An isotropic temperature factor (REFINE ISOTROPIC):
C * exp(-B (sin theta/lambda)**2)
An anisotropic temperature factor (REFINE ANISOTROPIC) (default):
C * exp(-(h**2 B11 + k**2 B22 + l**2 B33 + 2hk B12 + 2hl B13 + 2kl B23))
An anisotropic B is essentially a B-factor that can vary depending on the direction in reciprocal
space. This is useful for crystals that do not diffract as well in one direction as in others. You can
think of an anisotropic B as being shaped like a three-dimensional ellipsoid - perhaps equal in two
dimensions but elongated in a third. The anisotropic B refines a matrix of on-diagonal (B11, B22,
B33) and off-diagonal (B12, B13, B23) values for B.
Again, the output file contains a wealth of information; we will consider only a small portion of it.
Some of the log file (you can use '/stringtosearchfor' in vi to look for the equivalent lines in
your log file):
Derivative: FP= FP FPH= FPU SIGFPU
Initial derivative scale factor (for F) = 0.9179 from 13007 reflections
Derivative: FP= FP FPH= FPHg SIGFPHg
Initial derivative scale factor (for F) = 0.8705 from 11049 reflections
For derivative : 1
beta matrix - array elements beta11 beta12 beta13
                             beta21 beta22 beta23
                             beta31 beta32 beta33
    1.9855   0.0000   0.0000
    0.0000   1.9855   0.0000
    0.0000   0.0000  -0.1937
For derivative : 2
beta matrix - array elements beta11 beta12 beta13
                             beta21 beta22 beta23
                             beta31 beta32 beta33
    2.0784   0.0000   0.0000
    0.0000   2.0784   0.0000
    0.0000   0.0000  -4.3380
These are our scales and anisotropic B's for each derivative (#1 = U, #2 = Hg) - notice that the
off-diagonal terms in the aniso-B matrices are 0. Thus, we could have used an isotropic B for these
lysozyme data sets (no harm in using aniso, though).
Isomorphous Differences
Derivative title: FP= FP FPH= FPU SIGFPU
Differences greater than 4.1281 * RMSDIF are unlikely,
ie acceptable differences are less than 95.15
Maximum difference 133.00

Isomorphous Differences
Derivative title: FP= FP FPH= FPHg SIGFPHg
Differences greater than 4.0908 * RMSDIF are unlikely,
ie acceptable differences are less than 202.47
Maximum difference 341.00
The 'acceptable differences' lines will be useful when we calculate difference Pattersons - large
differences between native and derivative data sets should be eliminated because they skew the
sensitive difference Patterson calculation. Scaleit gives you some useful numbers with which to
start eliminating large isomorphous differences. Thus, we will use 95.15 as the maximum difference
when we calculate our U difference Patterson.
Also, for each derivative, the R-factor versus resolution is examined. For the U derivative:
$TABLE:Analysis v resolution FP= FP FPH= FPU SIGFPU :
$GRAPHS :
Kraut_sc and RMS(FP/FHP) v resolution :N:1,5,6 :
: Rfac/Wted_R v resolution :N:1,7,9 :
: <diso> and Max(diso) v resolution :N:1,10,11:
: <dano> and Max(dano) v resolution :N:1,14,15:$$
1/resol^2
Resolution
Number_reflections
Mean_FP_squared
Kraut_scale
Sqrt(Mean_FP_squared/Mean_FPH_squared)
Rfactor
Rfactor_I
Rfactor_W
Mean_Abs(Diso)
Maximum_Diso
Number_reflections_with_anomalous_differences
Mean_Dano(=0?)
Mean_Abs(Dano)
Maximum_Dano
Mean_Diso/0.5*Mean_Dano(Kemp) $$
1/resol^2 Res NRef <FP**2> Sc_kraut SCALE RFAC RF_I Wted_R <diso> max(diso)
N(ano) <ano> <|ano|> max(ano) Kemp $$
 0.009 10.8    54  143351. 1.022 1.012 0.127 0.161 0.289  40.3  123   0  0.0  0.0   0  0.0000
 0.026  6.2   241   97123. 1.013 1.005 0.100 0.146 0.246  26.8   91   0  0.0  0.0   0  0.0000
 0.043  4.8   339   94205. 1.006 0.997 0.102 0.159 0.184  27.3  116   0  0.0  0.0   0  0.0000
 0.061  4.1   395  147343. 1.007 1.000 0.090 0.138 0.213  30.0  107   0  0.0  0.0   0  0.0000
 0.078  3.6   445  150639. 1.012 1.004 0.089 0.141 0.155  30.2  127   0  0.0  0.0   0  0.0000
 0.095  3.2   504  120067. 1.017 1.009 0.091 0.147 0.146  27.5  133   0  0.0  0.0   0  0.0000
 0.112  3.0   536   77148. 1.021 1.011 0.100 0.149 0.148  24.3  122   0  0.0  0.0   0  0.0000
 0.130  2.8   577   53511. 1.014 1.004 0.095 0.150 0.126  19.2  102   0  0.0  0.0   0  0.0000
 0.147  2.6   623   41177. 1.015 1.002 0.109 0.166 0.141  19.3   96   0  0.0  0.0   0  0.0000
 0.164  2.5   646   33609. 1.012 0.998 0.118 0.187 0.143  18.9  110   0  0.0  0.0   0  0.0000
 0.182  2.3   687   29327. 1.013 0.998 0.119 0.188 0.139  17.7   86   0  0.0  0.0   0  0.0000
 0.199  2.2   727   28052. 1.020 1.003 0.123 0.199 0.137  18.1  109   0  0.0  0.0   0  0.0000
 0.216  2.2   759   22836. 1.017 1.002 0.116 0.186 0.121  15.3   85   0  0.0  0.0   0  0.0000
 0.234  2.1   767   20323. 1.021 1.004 0.123 0.196 0.128  15.5   58   0  0.0  0.0   0  0.0000
 0.251  2.0   820   15048. 0.999 0.980 0.139 0.229 0.129  14.8   72   0  0.0  0.0   0  0.0000
 0.268  1.9   829   11484. 1.005 0.982 0.146 0.235 0.131  13.6   68   0  0.0  0.0   0  0.0000
 0.285  1.9   866    8988. 0.993 0.969 0.157 0.255 0.133  12.9   77   0  0.0  0.0   0  0.0000
 0.303  1.8   885    6511. 0.998 0.971 0.165 0.276 0.130  11.6   57   0  0.0  0.0   0  0.0000
 0.320  1.8   909    4904. 0.978 0.954 0.167 0.285 0.125  10.3   60   0  0.0  0.0   0  0.0000
 0.337  1.7  1398    3697. 0.967 0.941 0.182 0.306 0.131   9.6   47   0  0.0  0.0   0  0.0000
 THE TOTALS  13007  37435. 1.012 1.000 0.117 0.166 0.138  17.1  133.
 $$
The R-factor of the native vs. this U derivative varies from 12.7 to 18.2% depending on resolution, and
has an overall value of 11.7%. Compare this value to the Rmerge values for each data set alone
(~5%) - thus, there are differences between these data sets. Perhaps U is a derivative. We will know
when we calculate the difference Patterson and refine potential sites. A lot of people have
'guidelines' for deciding if a data set is a potential derivative. One is to look for a dip in the
R-factors at medium resolution. We have it here, where the R dips to ~9% between 4-3 Å resolution
and then rises again. For great data and a good derivative (like this case) you may see this. On the
other hand, you may not. Only at the difference Patterson and heavy atom refinement stages is a
derivative's potential merit revealed. You can look for the same values for the Hg derivative.
(This one is less isomorphous and does not show this dip characteristic.)
We now have the two derivative data sets scaled with the native in the file:
lyso_nat_U_Hg_sc.mtz
4. Calculate a difference Patterson for the U derivative.
We will now use the Fast Fourier Transform routine in CCP4 to calculate the difference Patterson
for the U derivative. The input file is:
fft_dp.com
fft HKLIN lyso_nat_U_Hg_sc.mtz \
MAPOUT lyso_nat_U_dp.map <<eof-f> fft_dp_U.log
TITLE difference Patterson
LABIN F1=FP SIG1=SIGFP F2=FPU SIG2=SIGFPU
RESO 10 2.5
xyzlim asu
PATTERSON
EXCLUDE sig1 3 sig2 3 diff 100
END
eof-f
We use our scaled .mtz as input and will create a map file containing the Patterson map. The input
labels are as we have kept them, and we tell the program that we want a Patterson map calculation.
The resolution should be lower than your maximum - error increases with resolution because the
data are weaker at higher resolution. The only exception to this is when you are calculating an
anomalous difference Patterson. For anomalous data, the signal does not diminish with resolution
but increases, so if you have good resolution you should do your anomalous DP's to higher
resolution (maybe 2 Å or higher in this case). For this isomorphous difference Patterson, we choose
2.5 Å as our maximum resolution.
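For orientation, what fft computes here is the textbook isomorphous difference Patterson (the general formula, not anything specific to fft; FPH is the derivative structure factor amplitude, FP the native):

P(u,v,w) = (1/V) * SUM over h,k,l of (|FPH| - |FP|)**2 * cos[2*pi*(hu + kv + lw)]

Because the coefficients are squared isomorphous differences, the peaks of P(u,v,w) fall at vectors between heavy atom positions - which is also why the native and derivative data had to be put on a common scale first.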
We also choose to exclude all data weaker than 3.0 sigma, and the reflection pairs where the
absolute difference between structure factors is greater than 100. Recall the output from scaleit that
suggested we do this (essentially because such differences are larger than ~4 sigma of the mean
difference). Difference Pattersons are sensitive calculations, and poor data or unlikely differences
can throw them off; thus, we throw 'bad' reflection pairs away.
Run this by typing 'fft_dp.com'.
The map file is binary, so we cannot look at it directly. First, we will search for peaks in it:
peakmax.com
$CCP4/bin/peakmax mapin lyso_nat_U_dp.map << eof_peak > lyso_nat_U_dp.peaks
output PEAKS
threshold rms 3.0
eof_peak
The output file (lyso_nat_U_dp.peaks) contains information about the map:
Number of columns, rows, sections ............... 49 49 21
Map mode ........................................ 2
Start and stop points on columns, rows, sections    0 48    0 48    0 20
Grid sampling on x, y, z ........................ 96 96 40
Cell dimensions ................................. 78.54700 78.54700 36.83600 90.00000 90.00000 90.00000
Fast, medium, slow axes ......................... Y X Z
Minimum density ................................. -24.31904
Maximum density ................................. 227.59644
Mean density .................................... 0.00812
Rms deviation from mean density ................. 2.27337
Space-group ..................................... 123
Number of titles ................................ 1
Notice that the rms deviation from the mean density is 2.27. This is the sigma level for the map; we
will need this later. The file also contains a list of peaks greater than 3 sigma for this map:
There are 16 peaks higher than the threshold 6.82012 ( 3.00000 *sigma)
These peaks are sorted into descending order of height, the top 12 are selected for output
The number of symmetry related peaks rejected for being too close to the map edge is 4
Peaks related by symmetry are assigned the same site number
Order No. Site Height/Rms   Grid        Fractional coordinates    Orthogonal coordinates
  1    1    1    100.11    0  0  0      0.0000 0.0000 0.0000       0.00  0.00  0.00
  2   16   11      6.23   16 35 20      0.1676 0.3643 0.5000      13.16 28.62 18.42
  3    5    4      5.23   23 23  3      0.2363 0.2363 0.0795      18.56 18.56  2.93
  4   14   10      4.74   38 38 17      0.4005 0.4005 0.4166      31.46 31.46 15.35
  5    7    5      4.41   13 48  7      0.1342 0.5000 0.1683      10.54 39.27  6.20
  6   11    8      3.94   48 32 13      0.5000 0.3334 0.3338      39.27 26.19 12.29
  7    8    6      3.70   25 10 10      0.2639 0.0996 0.2519      20.73  7.83  9.28
  8    9    6      3.70   10 25 10      0.0996 0.2639 0.2519       7.83 20.73  9.28
  9    2    2      3.37   40 40  0      0.4172 0.4172 0.0000      32.77 32.77  0.00
 10    3    3      3.33    3  0  3      0.0360 0.0000 0.0809       2.83  0.00  2.98
 11   10    7      3.26    0  0 12      0.0000 0.0000 0.3077       0.00  0.00 11.33
 12   13    9      3.10    0  0 14      0.0000 0.0000 0.3454       0.00  0.00 12.72
Notice that the peak at the origin is the largest in the map, as is expected from the Patterson
function. We will ignore this peak.
The Harker sections for this space group are w=0.25, 0.5, 0.75, u=0.5, v=0.5. Notice there is a
large peak (6.23 sigma) at u=0.16, v=0.36, w=0.5.
5. Visualising this difference Patterson.
We visualize this map using the following plotting command file:
npo.com
rm -f top.*
$CCP4/bin/npo mapin lyso_nat_U_dp.map plot top.plo << eof > npo.log
TITL Lysozyme DP U - Native (10-2.5 A)
CELL 78.54700 78.54700 36.83600 90.00000 90.00000 90.00000
MAP
CONTRS 4.6 TO 200.0 BY 2.27
GRID
SECTNS 0,21
COLOUR BLACK
SIZE 60.0 CHAR 2.5 SCALE 2.5
THICK 0.3
PLOT
eof
$CCP4/bin/pltdev -i top.plo -o top.ps -sca 1.0
This generates a postscript file containing your Patterson map sectioned from 1-21 along the slow
axis, which is Z in this case (take my word for it; if you want to check this, look in the fft log
file or in the peakmax log file). Remember the sigma level for the map, 2.27? Notice that in the
CONTRS line we are contouring our plot from 2 sigma (4.6) to a large value in steps of 2.27. In
other words, our map contours will start at 2 sigma and go up by 1 sigma from there.
The output is a postscript file called top.ps; view it by typing 'xpsview top.ps'. Or convert it to a
pdf using 'ps2pdf top.ps top.pdf' and view it with acroread.
The plots will be labeled X, Y and Z, but for a Patterson map these mean U, V and W, respectively.
From now on we will refer to them as X, Y and Z. This Patterson covers the following region of
Patterson space: 0 to 1 in X and Y, and 0 to 0.5 in Z.
6. Solving the Patterson for a consistent set of U positions in real space.
Using a combination of the peak list we got from peakmax and a direct view of the Patterson map,
we can begin to interpret what it means. Recall that there are Harker vectors (or peaks), which
arise only on the Harker sections and correspond directly to heavy atom sites. Also, there are
cross vectors (or peaks), which arise when heavy atom positions 'interact' with one another
in the Patterson function; these appear at places other than the Harker sections. The Harkers are
Z=0.25, 0.5, X=0.5, Y=0.5. So as we scan through the Patterson map (using 'page-up' or some
equivalent command in your viewer), we will look for peaks, starting with the 0 and 0.5 sections of Z:
Z=0 section (page 1):
  100 sigma origin peak
HARKER PEAKS (correspond directly to heavy-atom positions):
Z=0.5 section (page 21):
  6.2 sigma peaks for our putative U positions
  Corresponds to the (0.16, 0.36, 0.5) peak in the peak list
  Two peaks appear because of a diagonal mirror plane -- the other is at (0.36, 0.16, 0.5)
Z=0.17 (page 7):
  Two 4.4 sigma peaks, at (0.13, 0.5, 0.17) and (0.5, 0.13, 0.17)
  These are on the X and Y Harker sections; again, mirror symmetry in play
Z=0.32 (page 13):
  Two 3.9 sigma peaks, at (0.5, 0.33, 0.33) and (0.33, 0.5, 0.33)
  Also on the X and Y Harker sections, with mirror symmetry
Z=0.25 (page 10):
  Two 3.7 sigma peaks, at (0.26, 0.1, 0.25) and (0.1, 0.26, 0.25)
  On the Z Harker section, with mirror symmetry
CROSS PEAKS (correspond to heavy atom cross vectors):
Z=0.08 (page 3):
  One 5.2 sigma peak at (0.24, 0.24, 0.08)
Z=0.42 (page 17):
  One 4.7 sigma peak at (0.4, 0.4, 0.42)
We have interpreted all the peaks in the peak list from peakmax greater than 3.7 sigma.
Solving the Patterson by hand:
The symmetry operations for space group P43212 are:
Symmetry operation # 1:    X,       Y,       Z
Symmetry operation # 2:   -X,      -Y,       Z+1/2
Symmetry operation # 3:   -Y+1/2,   X+1/2,   Z+3/4
Symmetry operation # 4:    Y+1/2,  -X+1/2,   Z+1/4
Symmetry operation # 5:   -X+1/2,   Y+1/2,  -Z+3/4
Symmetry operation # 6:    X+1/2,  -Y+1/2,  -Z+1/4
Symmetry operation # 7:    Y,       X,      -Z
Symmetry operation # 8:   -Y,      -X,      -Z+1/2
For simple space groups like the common P21, which has only one Harker section (V=0.5), it's
possible to solve the Patterson function by hand and to account for all the cross vectors. In
complex space groups like this one, with five Harker sections, we are fortunate to have programs
to help us.
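Still, it is worth checking one Harker vector by hand. Symmetry operation # 2 relates an atom at (x, y, z) to one at (-x, -y, z+1/2), so the vector between the two copies is

(u, v, w) = (2x, 2y, -1/2),

i.e. a peak on the w = 1/2 Harker section. For the U site found below at (0.583, 0.818, 0.042): u = 2x = 1.166 = 0.166 (mod 1), and v = 2y = 1.636 = 0.636 (mod 1). Since the Patterson has mirror symmetry, (0.166, 0.636, 0.5) is equivalent to (0.166, 1 - 0.636, 0.5) = (0.166, 0.364, 0.5) - exactly the 6.2 sigma peak we found on the Z=0.5 section.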
Real Space Patterson Search
The following command file will search for real space heavy atom positions that satisfy our various
Harker and cross peaks:
rsps.com
rsps << eof > lyso_nat_U_dp.log
SCORE HARMONIC
BUMP 5.0
SPACEGROUP P43212
LOW 50
PATFILE lyso_nat_U_dp.map TRUNCATE 500.0 RESET 0 0 0 10.0 0.0
SCORFILE dp_harker.map
WEIGHT
SCAN 2 AU
PICK SCOREMAP 100
VLIST SITE 1 25
WRITE POSITIONS dp_harker.pdb
#
eof
rm -f dp_harker.map
rm -f dp_cross.map
rm -f TO
This program takes our input Patterson map (lyso_nat_U_dp.map) and some information such as
the space group, and searches for the top 25 real space heavy atom positions that satisfy our
Patterson map.
Again, the log file (lyso_nat_U_dp.log) is a wealth of crystallographic information, but we will
focus on its solution to our Patterson:
RSPS SINGLE ATOMS VLIST >>
Vectors with density less than 2.28 ( 1.0 sigma above mean) are counted as low
Scores are computed as 1./Sum(1./(Weight*Rho/Sigma))/Nvec where
  Weight is the multiplicity of the vector
  Rho is the density at a vector position
  Sigma is the rms deviation from the mean of the map
  Nvec is the number of vectors contributing to the sum
****************************************************************************
Harker vectors for a heavy atom position at 0.5830 0.8177 0.0416:
 Vec    U      V      W       Rho   Multiplicity
 ---  ------ ------ ------  -------  -----------
   1  0.1660 0.3645 0.5000    14.16       1
   2  0.0992 0.2653 0.2500     8.42       2
   3  0.3340 0.5000 0.3331     8.95       1
   4  0.5000 0.1355 0.1669    10.03       1
   5  0.2347 0.2347 0.0831    11.89       1
   6  0.4008 0.4008 0.4169    10.77       1
Score = 5.09 with 0 low peaks
****************************************************************************
****************************************************************************
Harker vectors for a heavy atom position at 0.0811 0.1771 0.0349:
 Vec    U      V      W       Rho   Multiplicity
 ---  ------ ------ ------  -------  -----------
   1  0.1622 0.3542 0.5000     9.71       1
   2  0.2418 0.4040 0.2500     1.52       2
   3  0.3378 0.5000 0.3198     8.95       1
   4  0.5000 0.1458 0.1802     5.74       1
   5  0.0960 0.0960 0.0698     3.50       1
   6  0.2582 0.2582 0.4302     1.64       1
Score = 1.64 with 3 low peaks
****************************************************************************
The top solution for a heavy atom position is (0.58, 0.81, 0.04). Note that for this solution, all the
Patterson vectors (denoted by U, V, W, naturally) exist and have high rho (peak height) values.
Also, there are no 'low peaks', i.e. expected Patterson peaks that are missing. It also accounts for
most of the peaks we observed in our inspection of this Patterson (the rest are related by
symmetry to those noted above). This is a good solution, and it gets a high overall score (score
defined above). The next solution is not: it matches some good Patterson peaks, but it is missing
three and gets a low score.
Thus, our position of the U heavy atom in this derivative of lysozyme is (0.58, 0.81, 0.04).
We now move on to refining this position and calculating initial SIR (single isomorphous
replacement) phases.
N.B. I will leave it to you to calculate and examine the difference Patterson map for the Hg
derivative. The command files will be the same, as are the Harker sections, etc. We will use the
phases from the U derivative to get the Hg positions directly in real space (i.e. not in Patterson
space).
7. Refining the U position and using it to find Hg sites.
Refinement of heavy atom positions, as well as phase calculation, will be done using the program
MLPHARE (Maximum Likelihood PHase REfinement). With only one derivative, we are
calculating SIR phases. The command file looks like this:
mlphare_sir.com
$CCP4/bin/mlphare HKLIN lyso_nat_U_Hg_sc.mtz \
HKLOUT lyso_nat_U_mlph.mtz <<eof-f> mlphare_nat_U.log
TITLE refining U position(s)
CYCLE 20
THRES 2.5 0.5
ANGLE 10
PRINT AVE AVF
LABIN FP=FP SIGFP=SIGFP FPH1=FPU SIGFPH1=SIGFPU
LABOUT ALLIN PHIB=PHIsir FOM=FOMsir
RESO 10 4.0
EXCLUDE SIGFP 3.0
HLOUT
DERIV U
DCYCLE PHASE ALL REFCYC ALL KBOV ALL
RESO 10 4.0
EXCLUDE DISO 100 SIGFPH1 3.0
ATOM U 0.5830 0.8177 0.0416 1.00 BFAC 40.000
ATREF X ALL Y ALL Z ALL OCC ALL B ALL
END
eof-f
Most of this file should be understandable now; details can be found in the documentation (on the
CCP4 web site). A few things: our output columns will be the same as the input (ALLIN), plus a
best phase (PHIsir) and a figure-of-merit (FOMsir) for each reflection. We will use data only from
10-4.0 Å at this stage. Our derivative is U, and we have one site (given on the line starting ATOM).
Note that our U position is inserted there, followed by its starting occupancy (1.0) and B (40.0). We
will refine the X, Y and Z positions, as well as the OCC and B, on all refinement cycles.
Run this by typing 'mlphare_sir.com'.
The output file (mlphare_nat_U.log) contains, yet again, an enormous amount of information. We
will look at a few areas that inform us as to how good a derivative this U atom is. At the bottom of
it are the refined parameters for this U:
DERIV U
DCYCLE PHASE ALL REFCYC ALL KBOV ALL
RESO 10.00 4.00
SCALE FPH1 0.9839 0.9602
ISOE 20.84 19.53 18.86 18.12 19.93 27.81 28.50 28.19
ATOM1 U 0.582 0.818 0.044 0.145 BFAC 9.209
The x, y and z positions have not changed much; the occupancy dropped, as did the B. These last
two parameters are highly correlated (perhaps they should not be refined together, but it is fine in
this case).
A bit above this are the values for the quality of this derivative for phasing:
Resolution(Angstroms)
Number_acentric_reflections
Isomorphous_difference_acentric
Lack_of_closure_acentric
Phasing_power_acentric
Cullis_R_acentric(<1.0)
Number_centric_reflections
Isomorphous_difference_centric
Lack_of_closure_centric
Phasing_power_centric
Cullis_R_centric(<1.0) $$
1/resol^2 Resol Nref_a DISO_a LOC_a PhP_a CullR_a Nref_c DISO_c LOC_c PhP_c CullR_c
$$
 0.014  8.42    60 25.6 15.1 1.98 0.59    59 40.0 19.9 1.56 0.50
 0.019  7.27    38 28.9 16.7 1.95 0.58    25 30.8 15.5 1.35 0.50
 0.024  6.40    58 21.9 14.2 2.08 0.65    35 35.1 17.2 1.77 0.49
 0.031  5.71    75 21.1 13.4 2.19 0.63    34 26.2 19.0 1.25 0.72
 0.038  5.16    90 24.2 15.3 1.77 0.63    36 38.5 18.5 1.46 0.48
 0.045  4.71   118 29.6 22.0 1.25 0.74    47 30.4 24.2 0.91 0.80
 0.053  4.32   134 31.1 24.2 1.11 0.78    51 28.5 22.5 0.94 0.79
 0.063  4.00   166 27.7 22.5 1.08 0.81    47 36.1 27.2 0.95 0.75
 TOTAL         739 27.0 19.4 1.42 0.72   334 33.6 21.1 1.21 0.63
 $$
Look at the centric reflections (those occurring in zones containing Friedel-related reflections, like
the h,k,0 zone); these are less biased in an SIR refinement. They are at the right, indicated by _c
following each parameter. First, the phasing power (PhP): anything above 1.0 is very good. Next,
the Cullis R-factor: anything 0.8 or below is good. Thus, this is a very good derivative.
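For reference, the conventional definitions behind these two statistics (the standard forms, quoted here for orientation rather than taken from this log) are:

Phasing power = <|FH(calc)|> / <lack of closure>, where the lack of closure is ||FPH(obs)| - |FPH(calc)||

Cullis R (centric) = SUM | |FPH +/- FP| - FH(calc) | / SUM |FPH +/- FP|

In words, the phasing power measures how large the calculated heavy atom contribution is relative to the residual error, and the Cullis R measures how well that contribution explains the observed isomorphous differences.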
Next, a bit farther up in the log file is information about Figure-of-Merit (FOM):
Number of Measurements phased - ACENTRIC (one column per resolution bin, as above)
    60     38     58     75     90    118    134    166     TOTAL  739
Mean Figure of Merit
 0.4840 0.4473 0.4005 0.4172 0.4318 0.3388 0.3353 0.3029   overall 0.3716
Number of Measurements phased - CENTRIC
    59     25     35     34     36     47     51     47     TOTAL  334
Mean Figure of Merit
 0.7833 0.5903 0.7464 0.6008 0.6493 0.4517 0.4238 0.5691   overall 0.6003
Number of Measurements phased - ALL
   119     63     93    109    126    165    185    213     TOTAL 1073
Mean Figure of Merit
 0.6324 0.5040 0.5307 0.4745 0.4939 0.3710 0.3597 0.3616   overall 0.4428
FOM is the cosine of the phase error; thus, 1.0 would be perfect phases, 0.0 would be random
phases. Again, look at the centrics - FOM of 0.6 is very good, especially for an SIR phasing
situation where it is difficult to break the phase ambiguity.
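More precisely, the figure of merit of a reflection is the magnitude of the weighted average over its phase probability distribution P(phi) (again the standard definition, given for orientation):

m = | SUM over phi of P(phi) * exp(i*phi) / SUM over phi of P(phi) |

which behaves like <cos(delta phi)>, the expected cosine of the phase error: m = 1 for a perfectly determined phase, and m near 0 for an essentially random one.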
So we have a good U derivative that provides us with our first set of experimental phases for this
structure. These phases are in lyso_nat_U_mlph.mtz. We can use these phases to find the Hg
atoms in that derivative.
Cross-Difference Fouriers
Using the SIR phases calculated from the U derivative, we can use a |FHg - Fnat|, PhiU Fourier map
to find the positions of Hg. This is a so-called cross-difference Fourier because phases from one
derivative are used to find sites in another.
fft_cdf.com
$CCP4/bin/fft HKLIN lyso_nat_U_mlph.mtz \
MAPOUT temp.map <<eof-f> fft_cdf.log
TITLE
LABIN F1=FPHg SIG1=SIGFPHg F2=FP SIG2=SIGFP PHI=PHIsir W=FOMsir
RESO 10 4.0
EXCLUDE sig1 3 sig2 3 diff 200
END
eof-f
$CCP4/bin/peakmax mapin temp.map << eof > lyso_nat_U_toget_Hg_cdf.peaks
threshold rms 3.0 #negatives
output peaks
eof
rm -f TO
#rm -f temp.map
We use the F's and SigF's from the Hg and native data sets, and the SIR phases and FOM from the
U derivative, and calculate an electron density map. Significant peaks in this map should be the
positions of Hg atoms in the Hg derivative data set. We do a peak search to find these peaks in the
same command file.
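In effect, the map being computed is a figure-of-merit-weighted difference Fourier (the standard construction, written with the column labels from our .mtz):

rho(x,y,z) = (1/V) * SUM over h,k,l of FOMsir * (|FPHg| - |FP|) * exp(i*PHIsir) * exp[-2*pi*i*(hx + ky + lz)]

The weight W=FOMsir in the LABIN line down-weights reflections whose SIR phases are poorly determined, which helps the genuine Hg peaks stand out from the noise.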
Here are the peaks from lyso_nat_U_toget_Hg_cdf.peaks:
Order No. Site Height/Rms   Grid        Fractional coordinates    Orthogonal coordinates
  1    3    2     12.00    55 19  1     0.9173 0.3128 0.0180      72.05 24.57  0.66
  2    4    1     12.00    35 49  1     0.5836 0.8168 0.0428      45.84 64.16  1.58
  3    7    5      6.59    41  5  3     0.6847 0.0843 0.0970      53.78  6.62  3.57
  4    5    3      4.04    37 59  1     0.6208 0.9833 0.0408      48.76 77.24  1.50
  5    6    4      3.19    15 52  2     0.2553 0.8626 0.0590      20.05 67.76  2.17
  6    9    6      3.06    11 37  4     0.1754 0.6226 0.1250      13.78 48.91  4.60
Right away we see a feature of cross-difference Fouriers computed with phases from derivative data:
the site of the phasing derivative reappears. Peak 2, at (0.584, 0.817, 0.043), is the U position itself,
carried into the map by the U-derived phases.