PHASING BY MULTIPLE ISOMORPHOUS REPLACEMENT (MIR)

We will now use the native data you collected on lysozyme (or data just like it), as well as two derivative data sets we collected for you, to obtain phases for lysozyme using MIR. The derivatives are a uranium derivative (from a K3UO2F5 soak) and a mercury derivative (from co-crystallization of lysozyme with PCMS). The flow chart of what we will be doing is as follows, with the program used at each step in parentheses:

1. Convert the data to CCP4 format (denzo2mtz)
2. Combine these three data sets into one file (cad)
3. Scale the data sets together (scaleit)
4. Calculate a difference Patterson for the U derivative (dp)
5. View the Harker sections of the difference Patterson map (npo)
6. Solve for a consistent set of heavy atom positions for U (rsps)
7. Use the U phases to solve for the Hg positions (ddf)
8. Refine the heavy atom positions (mlphare)
9. Calculate phases (mlphare)
10. Calculate a map (fft)
11. Improve the map using density modification (dm)
12. View the map (O)

To accomplish each of these steps (except the last), we will use the CCP4 suite of programs. There are numerous program packages that will do this - XtalView, SOLVE, CNS, even CCP4I (an interactive, GUI-based version of CCP4). We will use CCP4 because it is the most transparent and least "black-boxed". Once you understand the individual steps from using CCP4, the other programs will be easy to use.

General information on CCP4

The Collaborative Computing Project #4 (CCP4) is a collection of crystallographic programs written by people all over the world and maintained by a group of scientists in the UK, largely at York and Daresbury. The information you will need to know about CCP4 is available from their web site:

http://www.dl.ac.uk/CCP/CCP4/main.html

For example, if you want documentation on any of the programs or routines we will be using, you can go from this main page to "Documentation", then "Program Documentation", then scroll down to SUPPORTED for a list of each program they have (the final web site is http://www.dl.ac.uk/CCP/CCP4/dist/html/INDEX.html). We will use several of them, either in separate command files or in combinations in single command files. Information on the theory and the primary references is also available from each of these program links.

0. GETTING STARTED

Copy the three data files we have created for you from the following location on deka:

/deka/people/raxis/student

Don't use your native data set of lysozyme - it is probably great, but we know the three here work. Also, when you create the command files (.com) that run the CCP4 routines, you may have to execute the following command:

chmod +x denzo2mtz.com

to tell the computer that this file is an executable so you can run it.

I. DIFFERENCE PATTERSONS

1. Convert the native and derivative data files from SCALEPACK output format (.hkl, ascii) to CCP4 format (.mtz, binary).
The current data files, from SCALEPACK, are:

deriv_Hg_lyso.hkl
deriv_U_lyso.hkl
native_lyso.hkl

We will convert these to the CCP4 convention using this command file, which runs two separate programs:

denzo2mtz.com

$CCP4/bin/f2mtz HKLIN native_lyso.hkl \
HKLOUT temp.mtz <<EOF-F> denzo2mtz.log
TITLE
CELL 78.547 78.547 36.836 90.000 90.000 90.000
SYMM p43212
LABOUT H K L IMEAN SIGIMEAN
CTYP H H H J Q
SKIP 3
END
EOF-F
#
# Conversion from Is to Fs
#
truncate HKLIN temp.mtz \
HKLOUT native_lyso.mtz <<EOF-T> truncate.log
TITLE
LABOUT F=FP SIGF=SIGFP
WILSON
NRESID 129
RESOL 20 1.7
RSCALE 3.0 1.7
ANOMAL NO
TRUNCA YES
EOF-T
#
rm -f temp.mtz

Run this command file by typing "denzo2mtz.com", after editing it with the appropriate information. Two programs are used: f2mtz and truncate. F2mtz takes F's (or intensities, as in our case) in any ascii format and converts them to .mtz format, which is binary and what CCP4 uses. The advantage of .mtz files is that they take up less space; the disadvantage is that you cannot open them and look at them easily.

mtzdump

CCP4 has a small program to look at mtz's called mtzdump. To use it to look at an mtz, type:

mtzdump HKLIN deriv_U_lyso.mtz

then type "go" (if you want to see all the reflections dumped to the screen, type "nref=-1" before you type "go"; if you want to dump this to an ascii file instead of the screen, add "> test.out" to the end of the "mtzdump ..." command line listed above). You can see that the .mtz contains a lot of info other than just h, k, l, F and sigma. CCP4 requires each column in an mtz to have a label. We will label our columns H, K, L, IMEAN and SIGIMEAN. Each column also needs a type associated with it, so the programs will know what to expect there. The types of our columns are H, J and Q: H means that the column contains an index (h, k, or l); J means an intensity is there; Q is the sigma (error) on the intensity. We will also skip the first three lines of our input .hkl file because they contain things other than H, K, L, I or SIGI. Notice that the output file from f2mtz is called temp.mtz - this is a temporary file. We delete it after the next step, truncate.

Truncate converts our intensities, which are what we read from the image plate and are indexed, processed and scaled by Denzo/Scalepack, to structure factors, or F's. The simplest way to do this is to take the square root of the I; however, it has been shown that many other corrections should be made during this conversion. That is what truncate does. To get the detailed background on truncate, check its documentation. Briefly, truncate takes our I's and SigI's from f2mtz, puts them on an absolute scale (e.g. corrects for the unmeasurable |F(0,0,0)| reflection, the value of which can be calculated) and applies a Wilson B-factor (named after the first user of this analysis). The keywords:

LABOUT adds two new columns to the output .mtz, FP and SIGFP, which are of types F and Q
NRESID is the number of amino acids in the protein, needed by the truncate routine
RESOL is the complete resolution range of the data
RSCALE is the resolution range used for Wilson scaling: only data of higher resolution than 3 Å are used
ANOMAL is no because we are not including anomalous data
TRUNCA is yes because we want to put the data on an absolute scale and do Wilson scaling

Notice that each program writes a log file: truncate.log in this case. You can change the name of the log file for each data set you run through this (e.g. truncate_native.log for the native data). Look at the log file from truncate.
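Since the same command file must be run once per data set, here is one way to drive all three conversions (a sketch; it assumes you have made three copies of denzo2mtz.com, with the input, output and log file names edited for each data set - the names denzo2mtz_nat.com, denzo2mtz_U.com and denzo2mtz_Hg.com are our own, not part of CCP4):

# Sketch: the three .com files below are assumed to be copies of
# denzo2mtz.com, edited so HKLIN/HKLOUT (and the log file names) match
# native_lyso, deriv_U_lyso and deriv_Hg_lyso respectively.
chmod +x denzo2mtz_nat.com denzo2mtz_U.com denzo2mtz_Hg.com
./denzo2mtz_nat.com
./denzo2mtz_U.com
./denzo2mtz_Hg.com

# Spot-check one of the results; type "go" at the mtzdump prompt.
mtzdump HKLIN native_lyso.mtz

After all three runs, look at each truncate log.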
The truncate log contains a wealth of info, including the actual plot used to calculate the Wilson B and scale factors (all data are plotted, but only the 3 - 1.7 Å data are used). An examination of this log file (with the help of the documentation) is itself a crash course in crystallography. After running denzo2mtz.com three times, we have now converted our .hkl files into three separate .mtz files:

deriv_Hg_lyso.mtz
deriv_U_lyso.mtz
native_lyso.mtz

2. Combine the three data sets (one native, two derivative) into one file.

The following command file combines the three .mtz files we created into one.

cad.com

cad HKLIN1 native_lyso.mtz \
HKLIN2 deriv_U_lyso.mtz \
HKLIN3 deriv_Hg_lyso.mtz \
HKLOUT lyso_nat_U_Hg.mtz <<EOF-F> cad.log
RESO FILE_NUMBER 1 20 1.7
LABI FILE 1 E1=FP E2=SIGFP
CTYP FILE 1 E1=F E2=Q
RESO FILE_NUMBER 2 20 1.7
LABI FILE 2 E1=FPU E2=SIGFPU
CTYP FILE 2 E1=F E2=Q
RESO FILE_NUMBER 3 20 1.7
LABI FILE 3 E1=FPHg E2=SIGFPHg
CTYP FILE 3 E1=F E2=Q
END
EOF-F

There are three input files (HKLIN1, HKLIN2, HKLIN3) and one output file (HKLOUT). For each file, the proper names and types for each column have to be read in (FP, SIGFP, FPU, SIGFPU, and so on). They are combined into one output file. Run this by typing "cad.com". A portion of this output .mtz file, obtained with mtzdump, is shown here:

OVERALL FILE STATISTICS for resolution range 0.003 - 0.346
=======================

 Col  Sort   Min     Max     Num      %     Mean    Mean  Resolution  Type  Column
      order                Missing complete         abs.   Low  High        label
  1   ASC      0      46       0   100.00    24.3    24.3 19.64 1.70   H    H
  2   NONE     0      32       0   100.00    10.0    10.0 19.64 1.70   H    K
  3   NONE     0      21       0   100.00     7.9     7.9 19.64 1.70   H    L
  4   NONE   5.2  1325.8     136    98.97  146.17  146.17 19.05 1.70   F    FP
  5   NONE   0.8    27.8     136    98.97    3.48    3.48 19.05 1.70   Q    SIGFP
  6   NONE   5.1  1437.3      10    99.92  161.75  161.75 19.64 1.70   F    FPU
  7   NONE   0.8    31.3      10    99.92    2.81    2.81 19.64 1.70   Q    SIGFPU
  8   NONE   5.6  1344.7    2005    84.76  185.44  185.44 19.05 1.79   F    FPHg
  9   NONE   0.7    26.8    2005    84.76    3.22    3.22 19.05 1.79   Q    SIGFPHg

No. of reflections used in FILE STATISTICS 13153

LIST OF REFLECTIONS
===================

  0 0 16   567.44  18.31   283.89   9.04        ?      ?
  0 0 20        ?      ?   193.44   6.40        ?      ?
  1 0 11   270.11   8.25   342.10   7.45        ?      ?
  1 0 12        ?      ?   107.28   5.22        ?      ?
  1 0 13        ?      ?   278.51   6.12        ?      ?
  1 0 14        ?      ?    31.61   3.31        ?      ?
  1 0 15    52.78   4.04    39.74   2.56        ?      ?
  1 0 16   359.78  11.99   260.82   4.10        ?      ?
  1 0 17   152.85   3.62   196.61   3.59        ?      ?
  1 0 18   206.38   4.69   208.55   4.76   181.19   6.29

The native data (FP and SIGFP, the structure factors and sigmas of the Protein) are 98.97% complete, the U derivative is ~100% complete, and the Hg derivative is less complete. The first 10 reflections are listed; the "?" entries indicate that a reflection wasn't measured in that data set. Now we have our native and two derivative data sets in one .mtz file.

3. Scale the derivative data sets to the native data set.

Just as in data processing, when the data from different images needed to be scaled together (usually with the first image used as the standard), each of these complete data sets needs to be corrected by scaling. In this case, we will scale the two derivative data sets to the native. The command file for the program scaleit follows:

scaleit.com

$CCP4/bin/scaleit HKLIN lyso_nat_U_Hg.mtz \
HKLOUT lyso_nat_U_Hg_sc.mtz > scale_lyso_nat_U_Hg.log << EOF
RESO 20.0 1.7
CONVERGE NCYC 20 ABS 0.000001 TOLERANCE 0.0000000001
EXCLUDE SIGFP 1.0 SIGFPH1 1.0 SIGFPH2 1.0
LABIN FP=FP SIGFP=SIGFP -
   FPH1=FPU SIGFPH1=SIGFPU FPH2=FPHg SIGFPH2=SIGFPHg
REFINE ANISOTROPIC
GRAPH H K L MODF
END
EOF

The .mtz we just made with cad.com is the input; new output and log file names are supplied.
Again, the labels must match those in the input .mtz file. We will use all the data (20-1.7 Å), ask for 20 cycles of refinement of the scale factors, and tell the program that when the shifts in the scale factors are less than 0.000001 we are done (even if 20 cycles have not been reached). Next we exclude all reflections that have F's less than 1.0*sigmaF (i.e. we throw out weak data). Run this by typing "scaleit.com". We have chosen to refine a scale and an anisotropic B-factor (REFINE ANISOTROPIC). You can refine just a scale between data sets, a scale and an isotropic B, or a scale and an anisotropic B, as shown below from the documentation:

An overall scale (REFINE SCALE):
    C

An isotropic temperature factor (REFINE ISOTROPIC):
    C * exp(-B * (sin(theta)/lambda)**2)

An anisotropic temperature factor (REFINE ANISOTROPIC) (default):
    C * exp(-(h**2 B11 + k**2 B22 + l**2 B33 + 2hk B12 + 2hl B13 + 2kl B23))

An anisotropic B is essentially a B-factor that can vary with direction in reciprocal space. This is useful for crystals that may not diffract as well in one direction as in others. You can think of an anisotropic B as being shaped like a three-dimensional ellipsoid - perhaps equal in two dimensions but elongated in the third. The anisotropic B refines a matrix of on-diagonal (B11, B22, B33) and off-diagonal (B12, B13, B23) values for B. Again, the output file contains a wealth of information; we will consider only a small portion of it. Some of the log file (you can use "/stringtosearchfor" in vi to look for the equivalent lines in your log file):

Derivative: FP= FP FPH= FPU SIGFPU
Initial derivative scale factor (for F) = 0.9179 from 13007 reflections

Derivative: FP= FP FPH= FPHg SIGFPHg
Initial derivative scale factor (for F) = 0.8705 from 11049 reflections

For derivative : 1
beta matrix - array elements beta11 beta12 beta13, beta21 beta22 beta23, beta31 beta32 beta33
    1.9855   0.0000   0.0000
    0.0000   1.9855   0.0000
    0.0000   0.0000  -0.1937

For derivative : 2
beta matrix - array elements beta11 beta12 beta13, beta21 beta22 beta23, beta31 beta32 beta33
    2.0784   0.0000   0.0000
    0.0000   2.0784   0.0000
    0.0000   0.0000  -4.3380

These are our scale factors and anisotropic B's for each derivative (#1 = U, #2 = Hg) - notice that the off-diagonal terms in the aniso-B matrices are 0. Thus, we could have used an isotropic B for these lysozyme data sets (no harm in using aniso, though).

------------------------------------------------------------------------------
Isomorphous Differences
Derivative title: FP= FP FPH= FPU SIGFPU
Differences greater than 4.1281 * RMSDIF are unlikely,
ie acceptable differences are less than 95.15
Maximum difference 133.00
------------------------------------------------------------------------------
Isomorphous Differences
Derivative title: FP= FP FPH= FPHg SIGFPHg
Differences greater than 4.0908 * RMSDIF are unlikely,
ie acceptable differences are less than 202.47
Maximum difference 341.00
------------------------------------------------------------------------------

The "acceptable differences" lines will be useful when we calculate difference Pattersons - large differences between native and derivative data sets should be eliminated because they skew the sensitive difference Patterson calculation. Scaleit gives you some useful numbers with which to start eliminating large isomorphous differences. Thus, we will use 95.15 as the maximum difference when we calculate our U difference Patterson. Also, for each derivative, the R-factor versus resolution is examined.
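This R-factor measures the fractional disagreement between the scaled derivative and native amplitudes. In the usual textbook convention (an assumption on our part; see the scaleit documentation for the exact weighting it uses):

    R = sum_hkl | |FPH| - |FP| |  /  sum_hkl |FP|

summed over the reflections the two data sets have in common. For perfectly isomorphous, error-free data R would approach 0; the heavy atom signal (plus non-isomorphism and measurement error) is what makes it nonzero.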
For the U derivative:

$TABLE: Analysis v resolution FP= FP FPH= FPU SIGFPU :
$GRAPHS : Kraut_sc and RMS(FP/FHP) v resolution :N:1,5,6 :
        : Rfac/Wted_R v resolution :N:1,7,9 :
        : <diso> and Max(diso) v resolution :N:1,10,11:
        : <dano> and Max(dano) v resolution :N:1,14,15:
$$

The columns are: 1/resol^2, Resolution, Number_reflections, Mean_FP_squared, Kraut_scale, Sqrt(Mean_FP_squared/Mean_FPH_squared), Rfactor, Rfactor_I, Rfactor_W, Mean_Abs(Diso), Maximum_Diso, Number_reflections_with_anomalous_differences, Mean_Dano(=0?), Mean_Abs(Dano), Maximum_Dano, and Mean_Diso/0.5*Mean_Dano(Kemp):

1/resol^2  Res  NRef <FP**2>  Sc_kraut SCALE  RFAC  RF_I  Wted_R <diso> max(diso) N(ano) <ano> <|ano|> max(ano)  Kemp
  0.009   10.8    54  143351.   1.022  1.012  0.127 0.161  0.289  40.3    123        0    0.0    0.0     0    0.0000
  0.026    6.2   241   97123.   1.013  1.005  0.100 0.146  0.246  26.8     91        0    0.0    0.0     0    0.0000
  0.043    4.8   339   94205.   1.006  0.997  0.102 0.159  0.184  27.3    116        0    0.0    0.0     0    0.0000
  0.061    4.1   395  147343.   1.007  1.000  0.090 0.138  0.213  30.0    107        0    0.0    0.0     0    0.0000
  0.078    3.6   445  150639.   1.012  1.004  0.089 0.141  0.155  30.2    127        0    0.0    0.0     0    0.0000
  0.095    3.2   504  120067.   1.017  1.009  0.091 0.147  0.146  27.5    133        0    0.0    0.0     0    0.0000
  0.112    3.0   536   77148.   1.021  1.011  0.100 0.149  0.148  24.3    122        0    0.0    0.0     0    0.0000
  0.130    2.8   577   53511.   1.014  1.004  0.095 0.150  0.126  19.2    102        0    0.0    0.0     0    0.0000
  0.147    2.6   623   41177.   1.015  1.002  0.109 0.166  0.141  19.3     96        0    0.0    0.0     0    0.0000
  0.164    2.5   646   33609.   1.012  0.998  0.118 0.187  0.143  18.9    110        0    0.0    0.0     0    0.0000
  0.182    2.3   687   29327.   1.013  0.998  0.119 0.188  0.139  17.7     86        0    0.0    0.0     0    0.0000
  0.199    2.2   727   28052.   1.020  1.003  0.123 0.199  0.137  18.1    109        0    0.0    0.0     0    0.0000
  0.216    2.2   759   22836.   1.017  1.002  0.116 0.186  0.121  15.3     85        0    0.0    0.0     0    0.0000
  0.234    2.1   767   20323.   1.021  1.004  0.123 0.196  0.128  15.5     58        0    0.0    0.0     0    0.0000
  0.251    2.0   820   15048.   0.999  0.980  0.139 0.229  0.129  14.8     72        0    0.0    0.0     0    0.0000
  0.268    1.9   829   11484.   1.005  0.982  0.146 0.235  0.131  13.6     68        0    0.0    0.0     0    0.0000
  0.285    1.9   866    8988.   0.993  0.969  0.157 0.255  0.133  12.9     77        0    0.0    0.0     0    0.0000
  0.303    1.8   885    6511.   0.998  0.971  0.165 0.276  0.130  11.6     57        0    0.0    0.0     0    0.0000
  0.320    1.8   909    4904.   0.978  0.954  0.167 0.285  0.125  10.3     60        0    0.0    0.0     0    0.0000
  0.337    1.7  1398    3697.   0.967  0.941  0.182 0.306  0.131   9.6     47        0    0.0    0.0     0    0.0000
THE TOTALS     13007   37435.   1.012  1.000  0.117 0.166  0.138  17.1    133.
$$

The R-factor of native vs. this U derivative varies from 12.7 to 18.2% depending on resolution, and has an overall value of 11.7%. Compare this value to the Rmerge values for each data set alone (~5%) - thus, there are real differences between these data sets. Perhaps U is a derivative. We will know when we calculate the difference Patterson and refine potential sites. A lot of people have "guidelines" for deciding whether a data set is a potential derivative. One is to look for a dip in the R-factor at medium resolution. We have that here: the R dips to ~9% between 4-3 Å resolution and then rises again. For great data and a good derivative (like this case) you may see this. On the other hand, you may not. Only at the difference Patterson and heavy atom refinement stages is a derivative's potential merit revealed. You can look for the same values for the Hg derivative. (That one is less isomorphous and does not show this dip characteristic.) We now have the two derivative data sets scaled to the native in the file:

lyso_nat_U_Hg_sc.mtz

4. Calculate a difference Patterson for the U derivative.
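As a reminder of what we are about to compute (in the standard textbook form), a difference Patterson map uses the squared isomorphous differences as Fourier coefficients:

    P(u,v,w) = (1/V) * sum_hkl ( |FPH| - |FP| )**2 * cos[ 2*pi*(h*u + k*v + l*w) ]

Because |FPH| - |FP| is approximately the heavy atom contribution FH, this map is approximately the Patterson function of the heavy atoms alone: it has a peak at every interatomic vector (u,v,w) between heavy atom sites, including the symmetry-related copies of each site.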
We will now use the Fast Fourier Transform routine in CCP4 to calculate the difference Patterson for the U derivative. The command file is:

fft_dp.com

fft HKLIN lyso_nat_U_Hg_sc.mtz \
MAPOUT lyso_nat_U_dp.map <<eof-f> fft_dp_U.log
TITLE difference Patterson
LABIN F1=FP SIG1=SIGFP F2=FPU SIG2=SIGFPU
RESO 10 2.5
xyzlim asu
PATTERSON
EXCLUDE sig1 3 sig2 3 diff 100
END
eof-f

We use our scaled .mtz as input and will create a map file containing the Patterson map. The input labels are as we have kept them. We need to tell the program that we want a Patterson map calculation. The maximum resolution used should be lower than the maximum of your data - error increases with resolution because the data are weaker at higher resolution. The only exception to this is when you are calculating an anomalous difference Patterson. For anomalous data, the signal does not diminish with resolution but increases, so if you have good resolution you should do your anomalous DP's at higher resolution (maybe 2 Å or better in this case). For this isomorphous difference Patterson, we choose 2.5 Å as our maximum resolution. We also choose to exclude all data weaker than 3.0 sigma, and the reflection pairs where the absolute difference between structure factors is greater than 100. Recall the output from scaleit that suggested we do this (essentially because such differences are larger than ~4 sigma of the mean difference). Difference Pattersons are sensitive calculations, and poor data or unlikely differences can throw them off; thus, we throw "bad" reflection pairs away. Run this by typing "fft_dp.com".

The map file is a binary that we cannot look at directly. First, we will search for peaks in it:

peakmax.com

$CCP4/bin/peakmax mapin lyso_nat_U_dp.map << eof_peak > lyso_nat_U_dp.peaks
output PEAKS
threshold rms 3.0
eof_peak

The output file (lyso_nat_U_dp.peaks) contains information about the map:

Number of columns, rows, sections ............... 49 49 21
Map mode ........................................ 2
Start and stop points on columns, rows, sections  0 48 0 48 0 20
Grid sampling on x, y, z ........................ 96 96 40
Cell dimensions ................................. 78.54700 78.54700 36.83600 90.00000 90.00000 90.00000
Fast, medium, slow axes ......................... Y X Z
Minimum density ................................. -24.31904
Maximum density ................................. 227.59644
Mean density .................................... 0.00812
Rms deviation from mean density ................. 2.27337
Space-group ..................................... 123
Number of titles ................................ 1

Notice that the rms deviation from the mean density is 2.27. This is the sigma level for the map; we will need this later. (Also notice the space group: number 123 is P4/mmm, the symmetry of the Patterson function for a P43212 crystal.) The file also contains a list of peaks greater than 3 sigma for this map:

There are 16 peaks higher than the threshold 6.82012 ( 3.00000 *sigma)
These peaks are sorted into descending order of height, the top 12 are selected for output
The number of symmetry related peaks rejected for being too close to the map edge is 4
Peaks related by symmetry are assigned the same site number
Order No. Site Height/Rms    Grid       Fractional coordinates     Orthogonal coordinates
   1   1    1    100.11    0  0  0     0.0000 0.0000 0.0000       0.00  0.00  0.00
   2  16   11      6.23   16 35 20     0.1676 0.3643 0.5000      13.16 28.62 18.42
   3   5    4      5.23   23 23  3     0.2363 0.2363 0.0795      18.56 18.56  2.93
   4  14   10      4.74   38 38 17     0.4005 0.4005 0.4166      31.46 31.46 15.35
   5   7    5      4.41   13 48  7     0.1342 0.5000 0.1683      10.54 39.27  6.20
   6  11    8      3.94   48 32 13     0.5000 0.3334 0.3338      39.27 26.19 12.29
   7   8    6      3.70   25 10 10     0.2639 0.0996 0.2519      20.73  7.83  9.28
   8   9    6      3.70   10 25 10     0.0996 0.2639 0.2519       7.83 20.73  9.28
   9   2    2      3.37   40 40  0     0.4172 0.4172 0.0000      32.77 32.77  0.00
  10   3    3      3.33    3  0  3     0.0360 0.0000 0.0809       2.83  0.00  2.98
  11  10    7      3.26    0  0 12     0.0000 0.0000 0.3077       0.00  0.00 11.33
  12  13    9      3.10    0  0 14     0.0000 0.0000 0.3454       0.00  0.00 12.72

Notice that the peak at the origin is the largest in the map, as is expected from the Patterson function. We will ignore this peak. The Harker sections for this space group are w=0.25, 0.5, 0.75, u=0.5, and v=0.5. Notice there is a large peak (6.23 sigma) at u=0.16, v=0.36, w=0.5.

5. Visualising this difference Patterson.

We visualize this map using the following plotting command file:

npo.com

rm -f top.*
$CCP4/bin/npo mapin lyso_nat_U_dp.map plot top.plo << eof > npo.log
TITL Lysozyme DP U - Native (10-2.5 A)
CELL 78.54700 78.54700 36.83600 90.00000 90.00000 90.00000
MAP
CONTRS 4.6 TO 200.0 BY 2.27
GRID
SECTNS 0,21
COLOUR BLACK
SIZE 60.0 CHAR 2.5
SCALE 2.5
THICK 0.3
PLOT
eof
$CCP4/bin/pltdev -i top.plo -o top.ps -sca 1.0

This generates a postscript file containing your Patterson map sectioned from 1-21 along the slow axis, which is Z in this case (take my word for it; if you want to verify this, look in the fft log file or in the peakmax log file). Remember the sigma level for the map, 2.27? Notice that in the "CONTRS" line we are contouring our plot from 2 sigma (4.6) up to a large value in steps of 2.27. In other words, our map contours will start at 2 sigma and go up by 1 sigma from there. The output is a postscript file called top.ps; view this by typing "xpsview top.ps", or convert it to a pdf using "ps2pdf top.ps top.pdf" and view it with acroread. The plots will say X, Y and Z, but they mean U, V and W, respectively, for a Patterson map. From now on we will discuss them as X, Y and Z. This Patterson covers the following region of Patterson space: 0 to 1 in X and Y, and 0 to 0.5 in Z.

6. Solving the Patterson for a consistent set of U positions in real space.

Using a combination of the peak list we got from peakmax and direct viewing of the Patterson map, we can begin to interpret what it means. Recall that there are Harker vectors (or peaks), ones that arise only on the Harker sections and correspond directly to heavy atom sites. Also, there are cross vectors (or peaks), ones that arise when heavy atom positions "interact" with one another in the Patterson function; these appear at places other than the Harker sections. The Harkers are Z=0.25, 0.5, X=0.5, Y=0.5. So as we scan through the Patterson map (using "page-up"
or some equivalent command in your viewer), we will look for peaks on the sections from Z=0 to Z=0.5:

Z=0 section (page 1)
    100 sigma origin peak

HARKER PEAKS: correspond directly to heavy-atom positions

Z=0.5 section (page 21)
    6.2 sigma peaks for our putative U positions
    Corresponds to the (0.16, 0.36, 0.5) peak in the peak list
    Two peaks appear because of a diagonal mirror plane - the other is at (0.36, 0.16, 0.5)

Z=0.17 (page 7)
    Two 4.4 sigma peaks, at (0.13, 0.5, 0.17) and (0.5, 0.13, 0.17)
    These are on the X and Y Harker sections; again, mirror symmetry is in play

Z=0.32 (page 13)
    Two 3.9 sigma peaks, at (0.5, 0.33, 0.33) and (0.33, 0.5, 0.33)
    Also on the X and Y Harker sections, with mirror symmetry

Z=0.25 (page 10)
    Two 3.7 sigma peaks, at (0.26, 0.1, 0.25) and (0.1, 0.26, 0.25)
    On the Z Harker section, with mirror symmetry

CROSS PEAKS: correspond to heavy atom cross vectors

Z=0.08 (page 3)
    One 5.2 sigma peak at (0.24, 0.24, 0.08)

Z=0.42 (page 17)
    One 4.7 sigma peak at (0.4, 0.4, 0.42)

We have now interpreted all the peaks in the peak list from peakmax greater than 3.7 sigma.

Solving the Patterson by hand:

The symmetry operations for space group P43212 are:

Symmetry operation # 1:  X ,      Y ,      Z
Symmetry operation # 2: -X ,     -Y ,      Z+1/2
Symmetry operation # 3: -Y+1/2 ,  X+1/2 ,  Z+3/4
Symmetry operation # 4:  Y+1/2 , -X+1/2 ,  Z+1/4
Symmetry operation # 5: -X+1/2 ,  Y+1/2 , -Z+3/4
Symmetry operation # 6:  X+1/2 , -Y+1/2 , -Z+1/4
Symmetry operation # 7:  Y ,      X ,     -Z
Symmetry operation # 8: -Y ,     -X ,     -Z+1/2

For simple space groups like the common P21, which has only one Harker section (V=0.5), it's possible to solve the Patterson function by hand and to account for all the cross vectors. In complex space groups like this one, with five Harker sections, we are fortunate to have programs to help us.

Real Space Patterson Search

The following command file will search for real space heavy atom positions that satisfy our various Harker and cross peaks:

rsps.com

rsps << eof > lyso_nat_U_dp.log
SCORE HARMONIC
BUMP 5.0
SPACEGROUP P43212
LOW
PATFILE 50 lyso_nat_U_dp.map
TRUNCATE 500.0
RESET 0 0 0 10.0 0.0
SCORFILE dp_harker.map
WEIGHT
SCAN 2 AU
PICK SCOREMAP 100
VLIST SITE 1 25
WRITE POSITIONS dp_harker.pdb
#
eof
rm -f dp_harker.map
rm -f dp_cross.map
rm -f TO

This program takes our input Patterson map (lyso_nat_U_dp.map) and some information such as the space group, and searches for the top 25 real space heavy atom positions that satisfy our Patterson map.
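Before looking at the rsps output, it is worth seeing by hand where a Harker peak comes from (a worked example using symmetry operator #2; the numbers anticipate the solution rsps finds below). A site at (x, y, z) and its symmetry mate at (-x, -y, z+1/2) are separated by the vector

    (u, v, w) = (x, y, z) - (-x, -y, z+1/2) = (2x, 2y, -1/2)

so every heavy atom must give a peak on the w = 1/2 Harker section. For the site (0.583, 0.818, 0.042):

    (2x, 2y, 1/2) = (1.166, 1.635, 0.5) = (0.166, 0.635, 0.5)  (mod 1)

and applying one of the mirror operations of the Patterson symmetry P4/mmm, (u, -v, w), gives (0.166, 0.365, 0.5) - precisely the 6.2 sigma peak we flagged on the Z=0.5 section. rsps does this in reverse, for all eight operators at once.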
Again, the log file (lyso_nat_U_dp.log) is a wealth of crystallographic information, but we will focus on its solution to our Patterson:

RSPS SINGLE ATOMS VLIST
>> Vectors with density less than 2.28 ( 1.0 sigma above mean) are counted as low
Scores are computed as 1./Sum(1./(Weight*Rho/Sigma))/Nvec
where Weight is the multiplicity of the vector
      Rho is the density at a vector position
      Sigma is the rms deviation from the mean of the map
      Nvec is the number of vectors contributing to the sum

****************************************************************************
Harker vectors for a heavy atom position at 0.5830 0.8177 0.0416:
Vec     U      V      W       Rho   Multiplicity
--- ------ ------ ------   -------  ------------
  1 0.1660 0.3645 0.5000     14.16        1
  2 0.0992 0.2653 0.2500      8.42        2
  3 0.3340 0.5000 0.3331      8.95        1
  4 0.5000 0.1355 0.1669     10.03        1
  5 0.2347 0.2347 0.0831     11.89        1
  6 0.4008 0.4008 0.4169     10.77        1
Score = 5.09 with 0 low peaks
****************************************************************************
****************************************************************************
Harker vectors for a heavy atom position at 0.0811 0.1771 0.0349:
Vec     U      V      W       Rho   Multiplicity
--- ------ ------ ------   -------  ------------
  1 0.1622 0.3542 0.5000      9.71        1
  2 0.2418 0.4040 0.2500      1.52        2
  3 0.3378 0.5000 0.3198      8.95        1
  4 0.5000 0.1458 0.1802      5.74        1
  5 0.0960 0.0960 0.0698      3.50        1
  6 0.2582 0.2582 0.4302      1.64        1
Score = 1.64 with 3 low peaks
****************************************************************************

The top solution for a heavy atom position is (0.58, 0.81, 0.04). Note that for this solution, all the Patterson vectors (denoted by U, V, W, naturally) exist and have high rho (peak height) values. Also, there are no "low peaks", i.e. expected peaks that are missing from the Patterson. The solution also accounts for most of the peaks we observed in our inspection of this Patterson (the rest are related by symmetry to those noted above). This is a good solution; it gets a high overall score (score defined above). The next solution is not. It contains some good Patterson peaks, but is missing several and gets a low score. Thus, the position of the U heavy atom in this derivative of lysozyme is (0.58, 0.81, 0.04). We now move on to refining this position and calculating initial SIR (single isomorphous replacement) phases.

N.B. I will leave it to you to calculate and examine the Patterson map for the Hg derivative. The command files will be the same; so are the Harker sections, etc. We will use the phases from the U derivative to get the Hg positions directly in real space (i.e. not in Patterson space).

7. Refining the U position and using it to find Hg sites.

Refinement of heavy atom positions, as well as phase calculation, will be done using the program MLPHARE (Maximum Likelihood PHAse REfinement). With only one derivative, we are calculating SIR phases. The command file looks like this:

mlphare_sir.com

$CCP4/bin/mlphare HKLIN lyso_nat_U_Hg_sc.mtz \
HKLOUT lyso_nat_U_mlph.mtz <<eof-f> mlphare_nat_U.log
TITLE refining U position(s)
CYCLE 20
THRES 2.5 0.5
ANGLE 10
PRINT AVE AVF
LABIN FP=FP SIGFP=SIGFP FPH1=FPU SIGFPH1=SIGFPU
LABOUT ALLIN PHIB=PHIsir FOM=FOMsir
RESO 10 4.0
EXCLUDE SIGFP 3.0
HLOUT
DERIV U
DCYCLE PHASE ALL REFCYC ALL KBOV ALL
RESO 10 4.0
EXCLUDE DISO 100 SIGFPH1 3.0
ATOM U 0.5830 0.8177 0.0416 1.00 BFAC 40.000
ATREF X ALL Y ALL Z ALL OCC ALL B ALL
END
eof-f

Most of this file should be understandable now; details can be found in the documentation (on the CCP4 web site).
A few things: our output columns will be the same as the input, plus a phi-best (the SIR phase) per reflection and a figure-of-merit per reflection. We will use data only from 10-4.0 Å at this stage. Our derivative is a U, and we have one site (denoted on the line starting ATOM). Note that our U position is inserted there, followed by its starting occupancy (1.0) and B (40.0). We will refine the X, Y and Z positions, as well as the OCC and B, on all refinement cycles. Run this by typing "mlphare_sir.com". The output file (mlphare_nat_U.log) contains, yet again, an enormous amount of information. We will look at a few areas that tell us how good a derivative this U atom is. At the bottom of the log are the refined parameters for this U:

DERIV U
DCYCLE PHASE ALL REFCYC ALL KBOV ALL
RESO 10.00 4.00
SCALE FPH1 0.9839
ISOE 0.9602 20.84 19.53 18.86 18.12 19.93 27.81 28.50 28.19
ATOM1 U 0.582 0.818 0.044 0.145 BFAC 9.209

The x, y and z positions have not changed much; the occupancy dropped (to 0.145), as did the B (to 9.2). These last two parameters are highly correlated (perhaps they should not be refined together, but it is fine in this case). A bit above this are the values for the quality of this derivative for phasing. The columns are resolution, then, for the acentric (_a) and centric (_c) reflections separately: number of reflections, isomorphous difference, lack of closure, phasing power, and Cullis R-factor (should be <1.0):

1/resol^2  Resol  Nref_a DISO_a LOC_a PhP_a CullR_a   Nref_c DISO_c LOC_c PhP_c CullR_c
  0.014     8.42     60   25.6  15.1  1.98   0.59        59   40.0  19.9  1.56   0.50
  0.019     7.27     38   28.9  16.7  1.95   0.58        25   30.8  15.5  1.35   0.50
  0.024     6.40     58   21.9  14.2  2.08   0.65        35   35.1  17.2  1.77   0.49
  0.031     5.71     75   21.1  13.4  2.19   0.63        34   26.2  19.0  1.25   0.72
  0.038     5.16     90   24.2  15.3  1.77   0.63        36   38.5  18.5  1.46   0.48
  0.045     4.71    118   29.6  22.0  1.25   0.74        47   30.4  24.2  0.91   0.80
  0.053     4.32    134   31.1  24.2  1.11   0.78        51   28.5  22.5  0.94   0.79
  0.063     4.00    166   27.7  22.5  1.08   0.81        47   36.1  27.2  0.95   0.75
  TOTAL             739   27.0  19.4  1.42   0.72       334   33.6  21.1  1.21   0.63

Look at the centric reflections (those occurring in zones containing Friedel pairs, like the x,y,0 plane); these are less biased in an SIR refinement. They are at the right, indicated by _c following each parameter. First, the phasing power (PhP) - anything above 1.0 is very good. Next, the Cullis R-factor - anything 0.8 or below is good. Thus, this is a very good derivative. Next, a bit farther up in the log file is information about the Figure-of-Merit (FOM):

Number of Measurements phased - ACENTRIC
      60     38     58     75     90    118    134    166   TOTAL   739
Mean Figure of Merit
  0.4840 0.4473 0.4005 0.4172 0.4318 0.3388 0.3353 0.3029          0.3716

Number of Measurements phased - CENTRIC
      59     25     35     34     36     47     51     47   TOTAL   334
Mean Figure of Merit
  0.7833 0.5903 0.7464 0.6008 0.6493 0.4517 0.4238 0.5691          0.6003

Number of Measurements phased - ALL
     119     63     93    109    126    165    185    213   TOTAL  1073
Mean Figure of Merit
  0.6324 0.5040 0.5307 0.4745 0.4939 0.3710 0.3597 0.3616          0.4428

The FOM is the cosine of the phase error; thus, 1.0 would be perfect phases and 0.0 random phases. Again, look at the centrics - a FOM of 0.6 is very good, especially in an SIR phasing situation where it is difficult to break the phase ambiguity. So we have a good U derivative that provides us with our first set of experimental phases for this structure. These phases are in lyso_nat_U_mlph.mtz. We can use these phases to find the Hg atoms in that derivative.
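For reference, here are the standard definitions behind the three statistics we just used (textbook forms; mlphare's documentation gives its exact conventions):

    Phasing power = <|FH(calc)|> / <lack of closure>
        (rms calculated heavy atom amplitude over rms lack of closure)

    Cullis R (centric) = sum | |FPH +/- FP| - |FH(calc)| | / sum | |FPH| - |FP| |
        (lack of closure over isomorphous difference; values below 1.0 mean
        the heavy atom model actually explains part of the differences)

    FOM = <cos(delta phi)>, the mean cosine of the error in the phase angle

You will want to compute the same numbers when you judge the Hg derivative on your own.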
Cross-Difference Fouriers

Using the SIR phases calculated from the U derivative, we can use a |FHg - Fnat|, PhiU Fourier map to find the positions of the Hg atoms. This is a so-called cross-difference Fourier because phases from one derivative are used to find sites in another.

fft_cdf.com

$CCP4/bin/fft HKLIN lyso_nat_U_mlph.mtz \
MAPOUT temp.map <<eof-f> fft_cdf.log
TITLE
LABIN F1=FPHg SIG1=SIGFPHg F2=FP SIG2=SIGFP PHI=PHIsir W=FOMsir
RESO 10 4.0
EXCLUDE sig1 3 sig2 3 diff 200
END
eof-f
$CCP4/bin/peakmax mapin temp.map << eof > lyso_nat_U_toget_Hg_cdf.peaks
threshold rms 3.0
#negatives
output peaks
eof
rm -f TO
#rm -f temp.map

We use the F's and SigF's from the Hg and native data sets, and the SIR phases and FOM from the U derivative, and calculate an electron density map. Significant peaks in this map should be positions of Hg atoms in the Hg derivative data set. We do a peak search to find these peaks in the same command file. Here are the peaks from lyso_nat_U_toget_Hg_cdf.peaks:

Order No. Site Height/Rms    Grid       Fractional coordinates     Orthogonal coordinates
   1   3    2     12.00    55 19  1     0.9173 0.3128 0.0180      72.05 24.57  0.66
   2   4    1     12.00    35 49  1     0.5836 0.8168 0.0428      45.84 64.16  1.58
   3   7    5      6.59    41  5  3     0.6847 0.0843 0.0970      53.78  6.62  3.57
   4   5    3      4.04    37 59  1     0.6208 0.9833 0.0408      48.76 77.24  1.50
   5   6    4      3.19    15 52  2     0.2553 0.8626 0.0590      20.05 67.76  2.17
   6   9    6      3.06    11 37  4     0.1754 0.6226 0.1250      13.78 48.91  4.60

Right away we see a feature of cross-difference Fouriers using phases from derivative data - the