BIOSCI 737 X-ray Lab 1

BIOSCI 737: X-ray Diffraction
Laboratory 1: Indexing, integrating, scaling and merging X-ray data
Friday 21st March 2014

Introduction

So far in this module, you have been bombarded with X-ray crystallography theory, and now it's time to try some in practice. Hopefully this series of laboratories will enable you to place some of this theoretical information into a useful, practical context.

The data. The X-ray diffraction data provided here are from a crystal of a 187 amino acid protein which has been soaked in a 5mM solution of a platinum compound (hence the filename prefix 5mM_xxx). The data were collected on our laboratory X-ray source using the MAR345 image plate detector, and hence all the files have the .mar2300 file extension. There is a structure available for a homologous protein, which gives us the opportunity to use molecular replacement to solve the structure, and this is the method we'll use in this practical. The data are provided as 90 image files (5mM_257.mar2300 through to 5mM_347.mar2300) which are available to download as a zip archive from here: http://persephone.sbs.auckland.ac.nz/richard/local/teaching.html

Some theory. This tutorial assumes you have been able to grasp something about the theory of diffraction and symmetry from your lectures, and it will help to have your lecture notes with you. Other very good sources of information are David Blow's book, the two online courses listed below, and of course Bernhard Rupp’s masterpiece.
“Outline of Crystallography for Biologists” by David Blow (Oxford University Press): http://www.oup.co.uk/isbn/0-19-851051-9
University of Cambridge Course in Structural Medicine: http://www-structmed.cimr.cam.ac.uk/course.html
Bernhard Rupp’s Crystallography 101: http://www.ruppweb.org/Xray/101index.html
“Biomolecular Crystallography” by Bernhard Rupp (Garland Science)

The software

This tutorial will primarily use the program MOSFLM to index and integrate X-ray diffraction data, and its companion program SCALA to scale and merge the integrated data. These programs are part of what is known as the CCP4 (Collaborative Computational Project No. 4) suite of programs, which is one of the most commonly used sets of software for X-ray crystallography. The MOSFLM manual is very good on the practical considerations of dealing with X-ray data and well worth reading. Some starter notes (mosflm_intro.pdf) and the user guide (mosflm_user_guide.pdf) for MOSFLM have been posted on Cecil, as well as a tutorial for the iMosflm GUI (iMosflm_tutorial.pdf). In contrast, the SCALA manual is not readily interpretable by normal human beings. After scaling all the measured X-ray intensities (I), they are converted to structure factor amplitudes (F) using the program TRUNCATE.

Help and information can also be found online at the following places:
MOSFLM: http://www.mrc-lmb.cam.ac.uk/harry/mosflm/
iMosflm: http://www.mrc-lmb.cam.ac.uk/harry/imosflm/ver107/
CCP4: http://www.ccp4.ac.uk

Getting the software

We will be running this practical using OSX, but the software we are using is free for academic use and will run on Windows and Linux as well. Check out http://www.ccp4.ac.uk/download.php for instructions on downloading a version for the operating system of your preference.
Getting the data

All the X-ray diffraction data you will need for this practical have been posted as a .zip archive here: http://persephone.sbs.auckland.ac.nz/richard/local/teaching.html

The machines you are using will by default open new windows, save files etc. in /Users/student/Documents, so it is probably a good idea to make a new folder there called ‘737_xray_lab’ or similar. Drag the 5mM.zip file from the Downloads folder to that folder, and double-click to unzip it into a folder full of data files.

The first frame

To get started, go to the Applications folder, open the ccp4-6.3.0 folder and launch iMosflm by clicking on the icon:

You should see something like this:

You can index and integrate a set of diffraction images by working your way down the available options in the left-hand column of the GUI. MOSFLM will read the crystal-to-detector distance and φ angle (rotation angle of the crystal) from the header of the image files, so you don't need to worry about entering these yourself.

Autoindexing

The first step in processing the diffraction data is to index it, i.e. determine the Bravais lattice and unit cell dimensions, which allow us to predict the positions of all the spots in the diffraction images. Subsequent measurement of the intensities of each spot will allow us to determine the Laue symmetry (the point group symmetry of the diffraction pattern) and likely space group of the crystal.

In most cases the Bravais lattice and unit cell dimensions can be determined automatically from a single image (or for a more robust result, from two or more images, preferably well-spaced in φ), hence the term ‘autoindexing’. For this to work well, you must have a reasonably good idea of where the direct beam position is (i.e. the centre of the diffraction pattern).
You can’t measure that directly from your images because the incident X-ray beam is blocked by the protective beam stop. If using synchrotron data, this information will have been returned with the data. At home, the best way is to collect a wax image or to use a water ring to set the beam centre. For this set of data, we have none of the above available to us, so we are going to assume the direct beam is in the centre of the image. This is likely to be approximately true, although one should avoid relying on this assumption whenever possible! You can see that this is also the default assumption of the program by checking the values for ‘Beam X Position’ and ‘Beam Y Position’ at the top of the GUI, which are 172.5mm (i.e. half of the 345mm diameter of the MAR345 detector used to collect these data).

OK, let’s load the diffraction images into the program. Click on the ‘Add images’ button and navigate to the folder where you saved the image files:

Clicking on the first image file 5mM_257.mar2300 will by default load all of the available images into iMosflm, and your main GUI window should now look like this:

Additionally, a second window will now have opened, showing the actual image, like this:

One thing that should be done before we start is to mask off the area of the image hidden by the backstop. Click on the green square and then use the arrow tool to adjust the size and position of the backstop mask. You should end up with something like this:

Now we can attempt to autoindex the image by pressing the Indexing button on the left-hand side of the GUI. When you do this, a number of things happen, and the GUI should look something like this:

By default the program will locate the most intense spots on the image (I/σI>5) and will attempt to autoindex the pattern using spots with I/σI>20, i.e. just the strongest peaks.
The picked spots are marked as red crosses on the image window, and the predicted diffraction pattern is shown as blue and yellow squares: blue for ‘fully recorded’ and yellow for ‘partially recorded’, i.e. recorded over two or more adjacent images. Spots that can’t be measured because they overlap with their neighbours or because they have moved through too wide a φ angle are flagged in red and green respectively.

The table lists some possible crystal lattices and associated cell dimensions, sorted in order of their ‘penalty’, i.e. how much the ideal lattice would have to be distorted to fit the observed diffraction pattern. As per the lecture notes, the lattices are divided into “crystal systems” and the Mosflm notation is (from most symmetrical to least symmetrical): cubic (c), tetragonal (t), trigonal/hexagonal (h), orthorhombic (o), monoclinic (m) or triclinic (a), with the unit cell being Primitive (P), Face-centred in all faces (F), Face-centred in one face (C) or Body-centred (I). (See the table of the crystal systems from Lecture 2. Don’t worry about the “missing” Rhombohedral crystal system: Mosflm bundles that with the hexagonal system, an ongoing crystallographic argument which need not concern us.)

You should take as your starting point the highest symmetry option with an acceptably low penalty score, i.e. the one that needs the least distortion to make the predicted pattern fit the observed one. Often, as in this case, there will be a clear step-point that will guide your decision, and the program will also make a suggestion, highlighted in blue. In this case, an I-centred tetragonal cell is the clear choice, which gives us four possible space groups to choose from: I4, I41, I422 or I4122 (see the table of the 65 “Biological” space groups from Lecture 2). There is a priori no way of telling these apart at this point, and we need to systematically check out the possible options.
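To keep the four candidates straight, it can help to tabulate them. The short Python sketch below is only an illustration and is not part of MOSFLM or CCP4: the names CANDIDATES and allowed_00l are made up for this handout. It assumes the standard reflection conditions for these space groups along 00l (l = 2n from the I-centring alone, l = 4n when a 41 screw axis is present).

```python
# Hypothetical lookup table for the four I-centred tetragonal candidates.
# "laue" is the Laue group of the diffraction pattern; "screw_41" records
# whether the space group contains a 4(1) screw axis along c.
CANDIDATES = {
    "I4":    {"laue": "4/m",   "screw_41": False},
    "I41":   {"laue": "4/m",   "screw_41": True},
    "I422":  {"laue": "4/mmm", "screw_41": False},
    "I4122": {"laue": "4/mmm", "screw_41": True},
}


def allowed_00l(space_group, l):
    """Return True if the axial reflection (0, 0, l) is allowed.

    Assumed conditions: l = 2n from I-centring (h + k + l = 2n),
    tightened to l = 4n when a 4(1) screw axis is present.
    """
    step = 4 if CANDIDATES[space_group]["screw_41"] else 2
    return l % step == 0


if __name__ == "__main__":
    for name, info in CANDIDATES.items():
        present = [l for l in range(2, 25, 2) if allowed_00l(name, l)]
        print(f"{name:6s} Laue {info['laue']:6s} allowed 00l: {present}")
```

The table makes the strategy explicit: the Laue group splits the candidates into two pairs, and the axial reflections then separate the members of each pair.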
We have to measure the intensities of the diffraction maxima and determine the actual symmetry of the diffraction pattern. Systematic absences, resulting from the presence of screw rotation axes, will further narrow the possibilities. Specifically, spacegroups I4 and I41 will both give rise to 4/m symmetry in the diffraction pattern, whereas spacegroups I422 and I4122 will both show 4/mmm symmetry in the diffraction data (see the diagrams of the Laue groups, Lecture 2). The spacegroups with a screw axis (I41 and I4122) will show a pattern of systematic absences in the 00l reflections (i.e. along the l-axis), whereas the spacegroups without a screw axis (I4 and I422) won’t show that pattern (see the relevant table in Lecture 5).

As described in Richard's lecture notes, the best practice is to go with the lowest symmetry in this group (I4) and subsequently check the processed diffraction data for evidence of higher symmetry, screw axes etc. This is also what the program is suggesting, so let's follow its advice.

One of the key criteria for success of indexing is the final standard deviation of spot position after refinement, σ(x,y), i.e. how much on average everything had to move to fit the predicted spot pattern to the observed one. (Check out section 3.1.4.1 of the MOSFLM manual for details.) As a general rule of thumb, this value should be below ~0.3 mm. In this case you can see it is reassuringly low at <0.2 mm, which indicates we’re on the right track.

Visually, a key indication that we’ve made the right choice is whether the predicted diffraction pattern (blue and yellow boxes) matches the observed pattern of diffraction spots. Try choosing some ‘solutions’ with high penalty values (marked in yellow in the list of solutions in the GUI) and see whether the patterns fit.

The third GUI window estimates the mosaicity of the crystal. Roughly speaking, this is a measure of the imperfection of your crystal.
You can picture your crystal as being made up of a mosaic of many small constituent crystals, which are not perfectly aligned with each other. For the crystal to be useful, the mosaicity should be as low as possible, ideally much less than 1°.

Cell refinement

Having decided on a choice of cell via autoindexing, it is best to then refine the cell parameters to obtain accurate values for them. The number of images required for this to work well depends on the symmetry, but iMosflm will provide sensible defaults. Click on the ‘Cell refinement’ tab, and iMosflm suggests using two blocks of images ~90° apart: 257-260, 344-347. Click ‘Process’ and watch what happens…

The program will load in the images you’ve specified and optimise the predicted diffraction pattern against the observed pattern across all of those images. The main display will look something like this:

The upper right panel shows the ‘profiles’ of a selection of spots on the image, an indication of whether predicted and observed spots are coinciding as they should. What we are looking for here is for the dark squares to be inside the blue circle. As before, the key number to keep an eye on is the RMS residual in the bottom right panel, which should refine stably and which should be <0.3mm. The other results of the cell refinement should also be looked at by checking out the options in the bottom right-hand panel. If everything is working well, the value of YSCALE should stay extremely close to 1.000 and the detector-to-crystal distance should also vary by very little. The mosaicity calculated from this set of images will hopefully be a better estimate; in this case it has increased slightly. Here, everything looks good, so we can accept the refinement results and go on to actually integrate some images.

Image integration

We’re now getting close to actually integrating some spots and making some actual measurements.
It’s a good idea to start by integrating a block of just a few images (say 10) to make sure that everything is working as expected, and if that is successful, to then integrate the whole dataset. So, let’s begin by clicking (I hope you guessed it!) the ‘Integration’ tab. Let’s start with 10 images, so specify 257-266 as the range and press ‘Process’. The program will run through the images selected and write the integrated spot intensities and their standard deviations into the specified .mtz file, which by default is named to match the initial image. (Some information and warning messages are written out into the SUMMARY and mosflm.lp logfiles in your current directory, so you can go back and check them later.) The main window should look like this:

As before, you should check that the RMS residual is small and stable, that parameters such as YSCALE stay close to their expected values, and that the spot profiles are centred on actual diffraction spots. Additionally, the bottom layer of windows now tells you something about the strength of the data you are measuring: the <I/σ(I)> value in the high-resolution (HR) shell of data is our primary criterion at this stage for deciding on the effective ‘resolution’ of our X-ray diffraction data.

OK, it is finally time to actually measure and integrate the X-ray data! We have a total of 90 images in the data directory, from which we will be able to obtain a complete set of measurements. Click Process again, but this time include images 257 through to 347. Integrating the data from these 90 images will take a few minutes of computer time. Whilst the data are being integrated, it’s good practice to keep an eye on the predicted pattern and on the spot profiles to make sure everything is still working OK as the program works its way through the images. Assuming all goes well, MOSFLM will produce a single .mtz file named (by default) to match the filename of the first image, in this case 5mM_257.mtz.
(You could and probably should choose a more informative name as required.)

You should save all the parameters from the current run. This is usually a good idea, as it allows you to go back and reuse the settings you’ve made in the interactive session without having to re-enter them all. Click on the disc icon and save out a Mosflm session (.mos) file with a suitable name.

So, what have we done again?

To recap, we have indexed the diffraction pattern, meaning that we have assigned Miller indices (h,k,l) to each of the spots. In doing this we have determined the probable Bravais lattice and dimensions of the unit cell of the crystal. We have then measured (integrated) the intensity of the X-rays at each point in the diffraction pattern and stored those measured intensities as a multi-record .mtz file. Based on the lattice, we have taken a guess about the symmetry of the diffraction pattern (its Laue group, 4/m) and the space group of the crystal (I4). Now we’ll see how good that guess is by scaling and merging the data under the assumed symmetry.

The next step - scaling and merging

We will scale and merge all of the individual data measurements into a single composite dataset, consisting of a measurement of intensity (I) and an estimate of the error in the measurement (σI) for each Miller lattice point (h,k,l). We then convert the measured intensities (I) into structure factor amplitudes (F), ready to calculate electron density. The scaling and merging process uses the 4/m symmetry we have assumed. If this assumption is correct, the process will go well. If this assumption is incorrect, the process will go badly!!

The CCP4 suite of programs is conveniently run from a GUI called CCP4i. It’s probably a good idea to make a specific subdirectory in your directory in which to run subsequent programs.
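As a rough illustration of what the merging part of the scaling and merging step described above does, here is a minimal Python sketch. It is not how SCALA is implemented: SCALA also refines scale factors, rejects outliers and adjusts the error estimates, none of which is shown. The function name merge_observations and the input layout are assumptions made for this example, and the observations are assumed to be already scaled and already grouped by unique (h,k,l) under the chosen symmetry.

```python
import math


def merge_observations(observations):
    """Merge repeated measurements of each unique reflection.

    observations: dict mapping a unique (h, k, l) tuple to a list of
    (I, sigma) pairs, assumed already scaled and symmetry-reduced.
    Returns (merged, r_merge), where merged maps (h, k, l) to
    (mean I, sigma of the mean) and r_merge is
    sum |I_i - <I>| / sum I_i over reflections measured more than once.
    """
    merged = {}
    sum_abs_dev = 0.0
    sum_i = 0.0
    for hkl, obs in observations.items():
        # Inverse-variance weights: precise measurements count for more.
        weights = [1.0 / (sig * sig) for _, sig in obs]
        w_total = sum(weights)
        mean_i = sum(w * i for w, (i, _) in zip(weights, obs)) / w_total
        merged[hkl] = (mean_i, math.sqrt(1.0 / w_total))
        if len(obs) > 1:  # single measurements cannot disagree
            sum_abs_dev += sum(abs(i - mean_i) for i, _ in obs)
            sum_i += sum(i for i, _ in obs)
    r_merge = sum_abs_dev / sum_i if sum_i else float("nan")
    return merged, r_merge


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    demo = {(1, 0, 3): [(100.0, 10.0), (110.0, 10.0)],
            (0, 0, 4): [(50.0, 5.0)]}
    merged, r = merge_observations(demo)
    print(merged, round(r, 4))
```

If the assumed symmetry is wrong, reflections that are not truly equivalent end up in the same list, the individual values scatter widely about their mean, and the R factor rises. That is the sense in which the process "goes badly".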
Clicking on the CCP4 icon should bring up the following graphic window:

The individual programs and computational tasks available are listed in the left-hand column, arranged in functional themes or 'modules'. The right-hand column has some buttons for running the programs and performing various tasks to organise your data, and the centre panel will keep a record of the computing tasks performed and their outputs, which is a very handy way to keep track of what you have done previously.

In order to keep things in one place, let's set up a project directory: press the 'Directories & Project Dir' button at the top right and set up a Project as illustrated in the example below, though obviously you will choose to make your own folders in /Users/student/Documents. Set the project for use in this session, make sure that the TEMPORARY alias is set to a local directory such as /tmp and then click on 'Apply&Exit'. OK, now we're set to go in CCP4i!

The Matthews coefficient

Before we do anything else, we should calculate the Matthews coefficient, as some estimate of the contents of the asymmetric unit of the crystal is needed for the calculation of Fs from Is. In the left-hand column, change from the Data Reduction module to the Program List. In the list will be a program called Matthews_coef; click on it to open the setup window. Enter a meaningful job title, choose the .mtz file you produced from MOSFLM, and enter the number of amino acid residues in our protein: 187. Click Run Now, and a set of possible Matthews coefficients and a probability for each of them will be displayed.

Q1: What is the most likely number of molecules in the asymmetric unit from this calculation? What % solvent does this equate to, and is this reasonable? How many amino acid residues do we therefore expect to find in total in the asymmetric unit of the crystal?
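For reference, the arithmetic behind the Matthews coefficient can be sketched in a few lines of Python. This is a simplified illustration, not the Matthews_coef program itself (which also estimates a probability for each answer). The function names are invented for this example, the cell dimensions below are placeholders that you must replace with your own refined values, and the molecular weight is only approximated as about 110 Da per residue. The solvent estimate uses the common approximation 1 - 1.23/VM.

```python
def cell_volume_tetragonal(a, c):
    """Unit cell volume in cubic angstroms for a tetragonal cell (a = b, all angles 90 degrees)."""
    return a * a * c


def matthews_coefficient(cell_volume, n_symops, n_molecules, mol_weight):
    """V_M in cubic angstroms per dalton.

    n_symops: number of asymmetric units in the unit cell
    n_molecules: molecules assumed per asymmetric unit
    mol_weight: molecular weight of one molecule in daltons
    """
    return cell_volume / (n_symops * n_molecules * mol_weight)


def solvent_fraction(vm):
    """Approximate solvent fraction from V_M (assumes typical protein density)."""
    return 1.0 - 1.23 / vm


if __name__ == "__main__":
    # PLACEHOLDER cell dimensions: substitute the refined values from iMosflm.
    a, c = 100.0, 120.0
    # Assumption: about 110 Da per residue, so 187 residues is roughly 20.6 kDa.
    mol_weight = 187 * 110.0
    # Assumption: space group I4 has 8 asymmetric units per cell
    # (4 from the four-fold axis, doubled by the I-centring).
    n_symops = 8
    volume = cell_volume_tetragonal(a, c)
    for n in range(1, 5):
        vm = matthews_coefficient(volume, n_symops, n, mol_weight)
        print(f"{n} molecule(s): V_M = {vm:.2f}, solvent = {100 * solvent_fraction(vm):.0f}%")
```

Running this with your own cell dimensions should give values close to those reported by Matthews_coef; small differences are expected because the program uses its own molecular weight estimate.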
Scaling and merging

Next we'll do the scaling and merging, using SCALA, and convert the merged values of I to structure factor amplitudes, using TRUNCATE (use ‘Old Truncate’ not Ctruncate). Go to the Data Reduction module and choose the Scale and Merge Intensities task from the list. You will get a setup box like the one below. Choose ‘old Truncate’ rather than ‘Ctruncate’, and make sure that you have checked the boxes to include an Rfree set of data for subsequent refinement and that you have added the number of amino acid residues in the asymmetric unit that you have just calculated. It also makes sense to give useful names for the crystal and dataset you are using. Pull down Run Now to launch the computational task, which should take 3-4 minutes to run. You should be able to follow the progress of the job on the centre panel of the GUI.

When the task has finished, the final output will be another .mtz file, named as defined in the ‘MTZ out’ box, which by default will be the input name appended with '_scala1', and a set of log files and graphs which will let us assess how the calculations went. We can look at the latter outputs by clicking View Files from Job > View Log Graphs. There are a number of useful graphs here which we can use to figure out if the data reduction has been successful and determine the quality of the merged dataset.

First we need to figure out if our assumed symmetry is correct. There is a lot of information summarised in the log file and log graphs, and it can all be a little confusing. However, examining just a few of these plots will tell us if the data reduction has been successful. The Scales v rotation range graphs should vary smoothly, with no sudden discontinuities, and the number of data points rejected in each frame should be a small fraction of the total.
The Completeness, Multiplicity, Rmeas v resolution graphs should show the data agreeing very well at low resolution, with the agreement worsening at high resolution (where the spots are weaker). If the assumed symmetry was wrong, agreement of the data would be bad, even at low resolution. If you want to see the numbers in tabular form, you can directly inspect the SCALA log file. This is largely full of complex technical output, but by searching for the string 'summary' using the Find String function of the fileviewer, you can find a summary table of quality indicators like Rmerge, Rmeas and I/σI for the outer (weakest) and inner (strongest) shells of data, and for the dataset as a whole.

Q2: Assess the overall quality of your dataset with reference to the R factors, I/σI, completeness and multiplicity figures. Does it appear that the data have the 4/m symmetry we assumed? We have processed the data to an effective maximum resolution of ~2Å. Is this reasonable?

Next let’s check out the systematic absences, as they carry important information about the space group (and in fact are the only way to discriminate between some space groups from their diffraction pattern). Scroll down to the graph called Axial reflections, axis l, which shows the 00l reflections and should look something like this:

If you mouse over the graph points, you can see that reflections along the l axis of the reciprocal lattice when l=16, 20, 24 etc. are strong, with I/σI=25 or so, but that reflections when l=14, 18, 22 etc. are weak, with I/σI=5 or usually much less.

Q3: What is this pattern of strong and weak reflections telling us about our choice of spacegroup?

Finally, let's check out the data to see if there is any further symmetry. For this we'll use a program called ViewHKL, which will allow us to visualise an undistorted view of the reciprocal lattice (a ‘pseudo-precession photograph’) to make the task easier.
Click on the ViewHKL icon and you should get a GUI in the same style as the MOSFLM one we used earlier. Click on the 'Open' button to read in your scaled .mtz file, the one that was produced by SCALA. Click on the ‘HKL Zones’ button and then on the ‘hk0’ button and you should see something like this:

Here we are looking at a slice through the reciprocal lattice where l=0, looking down the l-axis. Hopefully the four-fold symmetry is obvious, and this is what we would expect from our tetragonal spacegroup. Now look at the 0kl and h0l sections: what symmetries are visible? You might also find it helps in trying to visualise what’s going on if you step through the levels of the lattice (1kl, 2kl etc.).

Q4: Is there any apparent symmetry in the diffraction pattern along the h and k axes? Do you think the Laue symmetry of the diffraction data is 4/m or 4/mmm? Combining this information with the pattern of systematic absences we saw in Q3, what should our final choice of spacegroup be?

Some diagrams to help you visualise what’s going on:

Here are the 4 and 422 point groups:

And here are the associated Laue symmetries, 4/m and 4/mmm:
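If you find the diagrams hard to translate into Miller indices, the following Python sketch lists which reflections each Laue group treats as equivalent. It is an illustration only: the function name equivalents is made up for this handout, the four-fold axis is taken to lie along l, and Friedel's law is assumed to hold (i.e. any anomalous differences are ignored).

```python
def equivalents(hkl, laue):
    """Return the set of indices equivalent to hkl under Laue group '4/m' or '4/mmm'.

    Assumes the four-fold axis lies along l and that Friedel's law holds.
    """
    h, k, l = hkl
    # The four rotations of the four-fold axis along l (point group 4).
    ops = [(h, k, l), (-k, h, l), (-h, -k, l), (k, -h, l)]
    if laue == "4/mmm":
        # Add the two-fold axes perpendicular to the four-fold (point group 422).
        ops += [(a, -b, -c) for a, b, c in ops]
    elif laue != "4/m":
        raise ValueError("laue must be '4/m' or '4/mmm'")
    # Friedel's law adds the inverse of every reflection.
    return set(ops) | {(-a, -b, -c) for a, b, c in ops}


if __name__ == "__main__":
    for laue in ("4/m", "4/mmm"):
        eq = equivalents((1, 2, 3), laue)
        print(laue, len(eq), "equivalents; (2,1,3) included:", (2, 1, 3) in eq)
```

For a general reflection this gives 8 equivalents in 4/m and 16 in 4/mmm. The practical difference is whether swapping h and k (for example 3,1,0 versus 1,3,0 in the hk0 zone) gives a reflection of the same intensity, which is what you are checking by eye in ViewHKL.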