BIOSCI 737 X-ray Lab 1
BIOSCI 737: X-ray Diffraction
Laboratory 1: Indexing, integrating, scaling and merging X-ray data
Friday 21st March 2014
Introduction
So far in this module, you have been bombarded with X-ray crystallography theory,
and now it's time to try some in practice. Hopefully this series of laboratories will
enable you to place some of this theoretical information into a useful, practical
context.
The data.
The X-ray diffraction data provided here are from a crystal of a 187 amino acid
protein which has been soaked in a 5mM solution of a platinum compound (hence
the filename prefix 5mM_xxx). The data were collected on our laboratory X-ray
source using the MAR345 image plate detector, and hence all the files have the
.mar2300 file extension. There is a structure available for a homologous protein,
which gives us the opportunity to use molecular replacement to solve the structure,
and this is the method we'll use in this practical. The data are provided as 90 image
files (5mM_257.mar2300 through to 5mM_347.mar2300) which are available to
download as a zip archive from here:
http://persephone.sbs.auckland.ac.nz/richard/local/teaching.html
Some theory.
This tutorial assumes you have been able to grasp something about the theory of
diffraction and symmetry from your lectures, and it will help to have your lecture
notes with you. Other very good sources of information are David Blow's book, the
two online courses listed below, and of course Bernhard Rupp’s masterpiece.
“Outline of Crystallography for Biologists” by David Blow (Oxford University Press)
(http://www.oup.co.uk/isbn/0-19-851051-9)
University of Cambridge Course in Structural Medicine:
http://www-structmed.cimr.cam.ac.uk/course.html
Bernhard Rupp’s Crystallography 101:
http://www.ruppweb.org/Xray/101index.html
“Biomolecular Crystallography” by Bernhard Rupp (Garland Science)
The software
This tutorial will primarily use the program MOSFLM to index and integrate X-ray
diffraction data, and its companion program SCALA to merge and scale the integrated
data. These programs are part of what is known as the CCP4 (Collaborative
Computing Project 4) suite of programs, which is one of the most commonly used sets
of software for X-ray crystallography. The MOSFLM manual is very good on the
practical considerations of dealing with X-ray data and well worth reading. Some
starter notes (mosflm_intro.pdf) and the user guide (mosflm_user_guide.pdf) for
MOSFLM have been posted on Cecil, as well as a tutorial for the iMosflm GUI
(iMosflm_tutorial.pdf). In contrast, the SCALA manual is not readily interpretable by
normal human beings. After scaling all the measured X-ray intensities (I), they are
converted to structure factor amplitudes (F) using the program TRUNCATE. Help
and information can also be found online at the following places:
MOSFLM: http://www.mrc-lmb.cam.ac.uk/harry/mosflm/
iMosflm: http://www.mrc-lmb.cam.ac.uk/harry/imosflm/ver107/
CCP4: http://www.ccp4.ac.uk
Getting the software
We will be running this practical using OSX, but the software we are using is free for
academic use and will run on Windows and Linux as well. Check out
http://www.ccp4.ac.uk/download.php for instructions on downloading a version for
the operating system of your preference.
Getting the data
All the X-ray diffraction data you will need for this practical have been posted as a .zip
archive here:
http://persephone.sbs.auckland.ac.nz/richard/local/teaching.html
The machines you are using will by default open new windows, save files etc. in
/Users/student/Documents so it is probably a good idea to make a new folder there
called ‘737_xray_lab’ or similar. Drag the 5mM.zip file from the Downloads folder to
that folder, and double-click to unzip it into a folder full of data files.
The first frame
To get started, go to the Applications folder, open the ccp4-6.3.0 folder and launch
iMosflm by clicking on the icon:
You should see something like this:
You can work your way through the process of indexing and integrating a set of
diffraction images by working your way down the available options in the left-hand
column of the GUI. MOSFLM will read the information regarding crystal-to-detector
distance and φ angle (rotation angle of the crystal) from the header of the image files,
so you don't need to worry about entering this yourself.
Autoindexing
The first step in processing the diffraction data is to index it, i.e. determine the
Bravais lattice and unit cell dimensions, which allow us to predict the positions of all
the spots in the diffraction images. Subsequent measurement of the intensities of each
spot will allow us to determine the Laue symmetry (the point group symmetry of the
diffraction pattern) and likely space group of the crystal. In most cases the Bravais
lattice and unit cell dimensions can be determined automatically from a single image
(or for a more robust result, from two or more images, preferably well-spaced in φ)
hence the term ‘autoindexing’. For this to work well, you must have a reasonably
good idea of where the direct beam position is (i.e. the centre of the diffraction
pattern). You can’t measure that directly from your images because the incident X-ray
beam is blocked by the protective beam stop. If using synchrotron data, this
information will have been returned with the data. At home, the best way is to
collect a wax image or to use a water ring to set the beam centre. For this set of data,
we have none of the above available to us, so we are going to assume the direct beam
is in the centre of the image. This is likely to be approximately true, although one
should not rely on this assumption if at all possible! However, you can see that this is
the default assumption of the program by checking out the values for ‘Beam X
Position’ and ‘Beam Y Position’ at the top of the GUI, which are 172.5mm (i.e. half
of the 345mm diameter of the MAR345 detector used to collect these data).
Ok, let’s load the diffraction images into the program. Click on the ‘Add images’
button and navigate to the folder where you saved the image files:
Clicking on the first image file 5mM_257.mar2300 will by default load all of the
available images into iMosflm, and your main GUI window should now look like this:
Additionally, a second window will now have opened, showing the actual image, like
this:
One thing that should be done before we start is to mask off the area of the image
hidden by the backstop. Click on the green square
and then use the arrow tool
to adjust the size and position of the backstop mask. You should end up with
something like this:
Now we can attempt to autoindex the image by pressing the Indexing button on the
left hand side of the GUI. When you do this, a number of things happen, and the GUI
should look something like this:
By default the program will locate the most intense spots on the image (I/σI>5) and
will attempt to autoindex the pattern using spots with I/σI>20, i.e. just the strongest
peaks. The picked spots are marked as red crosses on the image window, and the
predicted diffraction pattern is shown as blue and yellow squares: blue for ‘fully
recorded’ and yellow for ‘partially recorded’ i.e. recorded over two or more adjacent
images. Spots that can’t be measured due to being overlapped with their neighbours or
because they have moved through too wide a φ angle are flagged in red and green
respectively.
The table lists some possible crystal lattices and associated cell dimensions, sorted in
order of their ‘penalty’ i.e. how much the ideal lattice would have to be distorted to fit
the observed diffraction pattern. As per the lecture notes, the lattices are divided into
“crystal systems” and the Mosflm notation is (from most symmetrical to least
symmetrical): cubic (c), tetragonal (t), trigonal/hexagonal (h), orthorhombic (o),
monoclinic (m) or triclinic (a), with the unit cell being Primitive (P), Face-centred in
all faces (F), Face-centred in one face (C) or Body-centred (I). (See the table of the
crystal systems from Lecture 2. Don’t worry about the “missing” Rhombohedral
crystal system: Mosflm bundles that with the hexagonal system, an ongoing
crystallographic argument which need not concern us.) You should take as your
starting point the highest symmetry option with an acceptably low penalty score i.e.
the least distortion to make the observed fit the predicted pattern. Often, as in this
case, there will be a clear step-point that will guide your decision, and the program
will also make a suggestion – highlighted in blue.
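The selection rule just described can be written down as a minimal Python sketch. The penalty values and the cut-off below are invented for illustration and are not the numbers iMosflm reports for this dataset; the symmetry ranking simply follows the order of crystal systems given above.

```python
# Rank of each crystal system, keyed by the one-letter code used in the text
# (a = triclinic ... c = cubic). A higher number means higher symmetry.
SYMMETRY_RANK = {"a": 0, "m": 1, "o": 2, "h": 3, "t": 4, "c": 5}


def pick_lattice(solutions, max_penalty):
    """Return the highest-symmetry solution with an acceptably low penalty.

    solutions   -- list of (lattice, penalty) pairs, e.g. ("tI", 3)
    max_penalty -- hypothetical cut-off; in practice you look for the
                   step in the penalty column rather than a fixed number
    """
    acceptable = [s for s in solutions if s[1] <= max_penalty]
    if not acceptable:
        raise ValueError("no solution has an acceptably low penalty")
    # Prefer higher symmetry; break ties with the lower penalty.
    return max(acceptable, key=lambda s: (SYMMETRY_RANK[s[0][0]], -s[1]))


# Made-up penalties for illustration only.
example = [("aP", 0), ("mC", 1), ("oF", 2), ("oI", 2), ("tI", 3),
           ("hP", 160), ("cI", 210), ("cF", 250)]
print(pick_lattice(example, max_penalty=20))
```

With these toy numbers the penalties jump sharply after the tetragonal solution, which is the kind of step-point you should look for in the real table.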
In this case, an I-centred tetragonal cell is the clear choice, which gives us four
possible space groups to choose from: I4, I41, I422 or I4122 (see the table of the 65
“Biological” space groups from lecture 2). There is a priori no way of telling these
apart at this point, and we need to systematically check out the possible options. We
have to measure the intensities of the diffraction maxima and determine the actual
symmetry of the diffraction pattern. Systematic absences, resulting from the presence
of screw rotation axes, will further narrow the possibilities.
Specifically, spacegroups I4 and I41 will both give rise to 4/m symmetry in the
diffraction pattern, whereas spacegroups I422 and I4122 will both show 4/mmm
symmetry in the diffraction data (see the diagrams of the Laue groups, Lecture 2). The
spacegroups with a screw axis (I41 and I4122) will show a pattern of systematic
absences in the 00l reflections (i.e. along the l-axis), whereas the spacegroups without
a screw axis (I4 and I422) won’t show that pattern. (see the relevant table in lecture 5)
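The two tests described above (Laue symmetry, and presence or absence of the 00l systematic absences) can be summarised as a small lookup table. This is only a restatement of the paragraph in Python; the function name is a hypothetical one chosen for this sketch.

```python
# Candidate space groups for an I-centred tetragonal lattice, keyed by
# (Laue group, whether the 00l screw-axis absences are observed).
CANDIDATES = {
    ("4/m", False): "I4",
    ("4/m", True): "I41",
    ("4/mmm", False): "I422",
    ("4/mmm", True): "I4122",
}


def space_group(laue_group, screw_absences):
    """Look up the space group implied by the two observations."""
    return CANDIDATES[(laue_group, screw_absences)]


print(space_group("4/m", False))
```

You will be able to fill in both inputs for this dataset once the data have been scaled and merged.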
As described in Richard's lecture notes, the best practice is to go with the lowest
symmetry in this group (I4) and subsequently check the processed diffraction data for
evidence of higher symmetry, screw axes etc. This is also what the program is
suggesting, so let's follow its advice.
One of the key criteria for success of indexing is the final standard deviation of spot
position after refinement, σ(x,y) i.e. how much on average everything had to move to
fit the predicted spot pattern to the observed one. (Check out section 3.1.4.1 of the
MOSFLM manual for details). As a general rule of thumb, this value should be below
~0.3 mm. In this case you can see it is reassuringly low at <0.2 mm, which indicates
we’re on the right track.
Visually, a key indication that we’ve made the right choice is whether the predicted
diffraction pattern (blue and yellow boxes) matches the observed pattern of diffraction
spots. Try choosing some ‘solutions’ with high penalty values (marked in yellow in
the list of solutions in the GUI) and see whether the patterns fit.
The third GUI window estimates the mosaicity of the crystal. Roughly speaking, this is
a measure of the imperfection of your crystal. You can picture your crystal as being
made up of a mosaic of many small constituent crystals, which are not perfectly
aligned with each other. For the crystal to be useful, the mosaicity should be as low as
possible – much less than 1° if possible.
Cell refinement
Having decided on a choice of cell via autoindexing, it is best to then refine the cell
parameters to obtain accurate values for them. The number of images required for this
to work well depends on the symmetry, but iMosflm will provide sensible defaults.
Click on the ‘Cell refinement’ tab, and iMosflm suggests using two blocks of
images ~90° apart: 257-260, 344-347. Click ‘Process’ and watch what happens…
The program will load in the images you’ve specified and optimise the predicted
diffraction pattern against the observed pattern across all of those images.
The main display will look something like this:
The upper right panel is showing the ‘profiles’ of a selection of spots on the image –
an indication of whether predicted and observed spots are coinciding as they should.
What we are looking for here is for the dark squares to be inside the blue circle. As
before, the key number to keep an eye on is the RMS residual in the bottom right
panel, which should refine stably and which should be <0.3mm. The other results of
the cell refinement should also be looked at by checking out the options in the bottom
right-hand panel. If everything is working well, the value of YSCALE should stay
extremely close to 1.000 and the detector to crystal distance should also vary by very
little. The mosaicity calculated from this set of images will hopefully be a better
estimated value – in this case it has increased slightly. Here, everything looks good,
so we can accept the refinement results and go on to actually integrate some images.
Image integration
We’re now getting close to actually integrating some spots and making some actual
measurements. It’s a good idea to start by integrating a block of just a few images
(say 10) to make sure that everything is working as expected, and if that is successful,
to then integrate the whole dataset.
So, let’s begin by clicking (I hope you guessed it!) the ‘Integration’ tab.
Let’s start with 10 images, so specify 257-266 as the range and press ‘Process’.
The program will run through the images selected and write the integrated spot
intensities and their standard deviations into the specified .mtz file, which by default
is named to match the initial image. (Some information and warning messages are
written out into the SUMMARY and mosflm.lp logfiles in your current directory, so
you can go back and check them later.) The main window should look like this:
As before, you should check that the values of parameters such as YSCALE and RMS
residual are small and stable, and that the spot profiles are centred on actual
diffraction spots. Additionally, the bottom layer of windows now tells you something
about the strength of the data you are measuring: the <I/σ(I)> value in the
high-resolution (HR) shell of data is our primary criterion at this stage of deciding on the
effective ‘resolution’ of our X-ray diffraction data.
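To make the idea of <I/σ(I)> per resolution shell concrete, here is a minimal Python sketch. The reflection list and shell limits are invented toy values, and taking the plain mean of the individual I/σ(I) ratios is an assumption of this sketch; it may not match exactly how iMosflm computes the number it displays.

```python
from statistics import mean


def mean_i_over_sigma_by_shell(reflections, shell_edges):
    """Average I/sigma(I) in each resolution shell.

    reflections -- iterable of (d_spacing_in_angstrom, I, sigma_I)
    shell_edges -- descending d-spacing limits, e.g. [50.0, 4.0, 3.0, 2.0]
    Returns a list of (d_max, d_min, mean_ratio); mean_ratio is None
    for a shell that contains no reflections.
    """
    shells = []
    for d_max, d_min in zip(shell_edges, shell_edges[1:]):
        ratios = [i / s for d, i, s in reflections if d_min <= d < d_max]
        shells.append((d_max, d_min, mean(ratios) if ratios else None))
    return shells


# Toy reflections, for illustration only: (d-spacing, I, sigma_I).
toy = [(10.0, 900.0, 30.0), (5.0, 600.0, 30.0), (3.5, 200.0, 20.0),
       (3.2, 120.0, 20.0), (2.4, 45.0, 15.0), (2.1, 15.0, 15.0)]
for d_max, d_min, ratio in mean_i_over_sigma_by_shell(toy, [50.0, 4.0, 3.0, 2.0]):
    print(f"{d_max:5.1f} - {d_min:4.1f} A   <I/sigma> = {ratio}")
```

In the toy data the ratio falls steadily towards the high-resolution shell, which is the behaviour you should expect from real data as the spots get weaker.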
OK, it is finally time to actually measure and integrate the X-ray data! We have a total
of 90 images in the data directory, from which we will be able to obtain a complete set
of measurements. Click Process again, but this time include images 257 through to
347.
Integrating the data from these 90 images will take a few minutes of computer time.
Whilst the data are being integrated, it’s good practice to keep an eye on the predicted
pattern and on the spot profiles to make sure everything is still working ok as the
program works its way through the images.
Assuming all goes well, MOSFLM will produce a single .mtz file named (by default)
to match the filename of the first image, in this case 5mM_257.mtz. (You could and
probably should choose a more informative name as required.)
You should save all the parameters from the current run. This is usually a good idea,
as it allows you to go back and use the settings you’ve made in the interactive session
again, without having to re-enter them all. Click on the disc icon and save out a
Mosflm session (.mos) file with a suitable name.
So, what have we done again?
To recap, we have indexed the diffraction pattern, meaning that we have assigned
Miller Indices (h,k,l) to each of the spots. In doing this we have determined the
probable Bravais lattice and dimensions of the unit cell of the crystal. We have then
measured (integrated) the intensity of the X-rays at each point in the diffraction
pattern and stored those measured intensities as a multi-record .mtz file. Based on the
lattice, we have taken a guess about the symmetry of the diffraction pattern (4/m …
its Laue group), and the space group of the crystal (I4). Now we’ll see how good that
guess is by scaling and merging the data under the assumed symmetry.
The next step - scaling and merging
We will scale and merge all of the individual data measurements into a single
composite dataset, consisting of a measurement of intensity (I) and an estimate of the
error in the measurement (σI) for each Miller lattice point (h,k,l). We then convert the
measured intensities (I) into structure factors (F), ready to calculate electron density.
The scaling and merging process uses the 4/m symmetry we have assumed. If this
assumption is correct, the process will go well. If this assumption is incorrect, the
process will go badly!!
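As a rough illustration of what merging under 4/m symmetry means, the Python sketch below groups observations that are related by the four-fold axis along l or by Friedel's law, and averages each group with inverse-variance weights. It leaves out the scaling step entirely, and the weighting scheme is an assumption of this sketch rather than a description of what SCALA actually does. The observations are invented.

```python
from collections import defaultdict
from math import sqrt


def equivalents_4m(h, k, l):
    """Eight Laue group 4/m equivalents: four-fold about l, plus Friedel mates."""
    rotated = [(h, k, l), (-k, h, l), (-h, -k, l), (k, -h, l)]
    return rotated + [(-a, -b, -c) for a, b, c in rotated]


def merge(observations):
    """Merge symmetry-equivalent observations.

    observations -- iterable of ((h, k, l), I, sigma_I)
    Returns {representative_hkl: (merged_I, merged_sigma)}.
    """
    groups = defaultdict(list)
    for (h, k, l), intensity, sigma in observations:
        # Any consistent choice of representative works; here, the largest tuple.
        key = max(equivalents_4m(h, k, l))
        groups[key].append((intensity, sigma))
    merged = {}
    for key, obs in groups.items():
        weights = [1.0 / s ** 2 for _, s in obs]
        total = sum(weights)
        mean_i = sum(w * i for w, (i, _) in zip(weights, obs)) / total
        merged[key] = (mean_i, sqrt(1.0 / total))
    return merged


# Toy observations, for illustration only.
toy = [((1, 2, 3), 100.0, 10.0), ((-2, 1, 3), 110.0, 10.0),
       ((-1, -2, -3), 90.0, 10.0), ((2, 1, 3), 40.0, 5.0)]
for hkl, (i, sig) in merge(toy).items():
    print(hkl, round(i, 2), round(sig, 2))
```

Note that (2,1,3) is not merged with (1,2,3) here: those two reflections are only equivalent under the higher 4/mmm symmetry, which is why the choice of Laue group matters for merging.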
The CCP4 suite of programs is conveniently run from a GUI called CCP4i. It’s
probably a good idea to make a specific subdirectory in your directory in which to run
subsequent programs. Clicking on the CCP4 icon should bring up the following
graphic window:
The individual programs and computational tasks available are listed in the left-hand
column, arranged in functional themes or 'modules'. The right-hand column has some
buttons for running the programs and performing various tasks to organise your data,
and the centre panel will keep a record of the computing tasks performed and their
outputs, which is a very handy way to keep track of what you have done previously.
In order to keep things in one place, let's set up a project directory: press the
'Directories & Project Dir' button at the top right and set up a Project as
illustrated in the example below, though obviously you will choose to make your own
folders in /Users/student/Documents. Set the project for use in this session, make sure
that the TEMPORARY alias is set to a local directory such as /tmp and then click on
'Apply&Exit'.
OK, now we're set to go in CCP4i!
The Matthews coefficient
Before we do anything else, we should calculate the Matthews coefficient, as some
estimate of the contents of the asymmetric unit of the crystal is needed for the
calculation of Fs from Is.
In the left hand column, change from the Data Reduction module to the
Program List. In the list will be a program called Matthews_coef – click on it
to open the setup window. Enter a meaningful job title, choose the .mtz file you
produced from MOSFLM, and enter the number of amino acid residues in our protein:
187.
Click Run Now, and a set of possible Matthews numbers and a probability for each
of them will be displayed.
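If you would like to see the arithmetic behind these numbers, here is a minimal Python sketch of the Matthews calculation for a tetragonal cell. The cell dimensions are placeholders (use the refined values from iMosflm), the average residue mass of 110 Da is a common approximation, and the eight symmetry-equivalent positions assume space group I4. The program itself may use more refined values than this sketch.

```python
def matthews(a, c, n_residues, n_mol, n_sym=8, da_per_residue=110.0):
    """Matthews coefficient and solvent fraction for a tetragonal cell.

    a, c           -- cell edges in Angstrom (a = b in a tetragonal cell)
    n_residues     -- residues per protein molecule
    n_mol          -- molecules assumed per asymmetric unit
    n_sym          -- symmetry-equivalent positions (8 assumed for I4)
    da_per_residue -- approximate average residue mass in Da (assumption)
    """
    volume = a * a * c                       # cell volume in cubic Angstrom
    molecular_weight = n_residues * da_per_residue
    v_m = volume / (n_sym * n_mol * molecular_weight)
    solvent_fraction = 1.0 - 1.23 / v_m      # standard Matthews estimate
    return v_m, solvent_fraction


# Placeholder cell dimensions, NOT the values for this crystal.
for n_mol in (1, 2, 3):
    v_m, solvent = matthews(a=100.0, c=100.0, n_residues=187, n_mol=n_mol)
    print(f"{n_mol} molecule(s): V_M = {v_m:.2f} A^3/Da, solvent = {solvent:.0%}")
```

Running this with your own cell dimensions is a useful sanity check on the output of Matthews_coef.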
Q1: What is the most likely number of molecules in the asymmetric unit from
this calculation? What % solvent does this equate to, and is this reasonable?
How many amino acid residues do we therefore expect to find in total in the
asymmetric unit of the crystal?
Scaling and merging
Next we'll do the scaling and merging, using SCALA, and convert the merged values
of I to structure factors, using TRUNCATE (use ‘Old Truncate’ not Ctruncate). Go to
the Data Reduction module and choose the Scale and Merge Intensities task
from the list. You will get a setup box like the one below.
Choose ‘old Truncate’ rather than ‘Ctruncate’, and make sure that you have checked
the boxes to include an Rfree set of data for subsequent refinement and that you have
added the number of amino acid residues in the asymmetric unit that you have just
calculated. It also makes sense to give useful names for the crystal and dataset you are
using. Pull down Run Now to launch the computational task, which should take 3-4
minutes to run. You should be able to follow the progress of the job on the centre
panel of the GUI.
When the task has finished, the final output will be another .mtz file, named as
defined in the ‘MTZ out’ box, which by default will be the input name appended
with '_scala1', and a set of log files and graphs which will let us assess how the
calculations went. We can look at the latter outputs by clicking View Files from
Job > View Log Graphs. There are a number of useful graphs here which we
can use to figure out if the data reduction has been successful and determine the
quality of the merged dataset.
First we need to figure out if our assumed symmetry is correct. There is a lot of
information summarised in the log file and log graphs, and it can all be a little
confusing. However, examining just a few of these plots will tell us if the data
reduction has been successful. The Scales v rotation range graphs should
vary smoothly, with no sudden discontinuities, and the number of data points rejected
in each frame should be a small fraction of the total. The Completeness,
Multiplicity, Rmeas v resolution graphs should show the data agreeing
very well at low resolution, with the agreement worsening at high resolution (where
the spots are weaker). If the assumed symmetry was wrong, agreement of the data
would be bad, even at low resolution.
If you want to see the numbers in tabular form, you can directly inspect the SCALA
log file. This is largely full of complex technical output, but by searching for the
string 'summary' using the Find String function of the fileviewer, you can find a
summary table of quality indicators like Rmerge, Rmeasure and I/σI for the outer
(weakest) and inner (strongest) shells of data, and for the dataset as a whole.
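For reference, the sketch below shows how Rmerge and Rmeas are conventionally defined, written as a short Python function. The intensities are invented, and skipping reflections that were measured only once is a choice made for this sketch; SCALA's own bookkeeping may differ in such details.

```python
from math import sqrt


def r_factors(groups):
    """Return (Rmerge, Rmeas) for grouped intensity measurements.

    groups -- list of lists; each inner list holds all the intensities
              measured for one unique reflection (symmetry mates and repeats)
    """
    num_merge = num_meas = denom = 0.0
    for intensities in groups:
        n = len(intensities)
        if n < 2:
            continue  # a single observation says nothing about agreement
        mean_i = sum(intensities) / n
        deviation = sum(abs(i - mean_i) for i in intensities)
        num_merge += deviation
        # Rmeas corrects Rmerge for multiplicity with a sqrt(n / (n - 1)) factor.
        num_meas += sqrt(n / (n - 1)) * deviation
        denom += sum(intensities)
    return num_merge / denom, num_meas / denom


# Toy intensities, for illustration only.
toy_groups = [[100.0, 110.0, 90.0], [50.0, 50.0]]
r_merge, r_meas = r_factors(toy_groups)
print(f"Rmerge = {r_merge:.3f}, Rmeas = {r_meas:.3f}")
```

Because of the multiplicity factor, Rmeas is never smaller than Rmerge for the same data, so do not be alarmed that it looks slightly worse.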
Q2: Assess the overall quality of your dataset with reference to the R factors,
I/σI, completeness and multiplicity figures. Does it appear that the data have
the 4/m symmetry we assumed? We have processed the data to an effective
maximum resolution of ~2Å. Is this reasonable?
Next let’s check out the systematic absences, as that carries important information
about the space group (and in fact is the only way to discriminate between some space
groups from their diffraction pattern). Scroll down to the graph called Axial
reflections, axis l, which shows the 00l reflections and should look
something like this:
If you mouse over the graph points, you can see that reflections along the l axis of the
reciprocal lattice when l=16, 20, 24 etc are strong, with I/σI= 25 or so, but that
reflections when l=14, 18, 22 etc are weak, with I/σI=5 or usually much less.
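If you prefer numbers to mousing over graph points, the comparison can be tabulated with a few lines of Python. This sketch groups the 00l reflections by the remainder of l divided by 4 and averages I/σI within each class. The values are invented and only loosely echo the ones quoted above; type in your own from the log file.

```python
from collections import defaultdict
from statistics import mean


def axial_summary(axial):
    """Average I/sigma(I) of 00l reflections, grouped by l modulo 4.

    axial -- iterable of (l, I_over_sigma) pairs
    Returns {l % 4: mean I/sigma} so the classes of l can be compared.
    """
    classes = defaultdict(list)
    for l, ratio in axial:
        classes[l % 4].append(ratio)
    return {cls: mean(values) for cls, values in sorted(classes.items())}


# Invented values for illustration only.
toy_axial = [(14, 3.0), (16, 25.0), (18, 2.0), (20, 26.0), (22, 1.0), (24, 24.0)]
for cls, avg in axial_summary(toy_axial).items():
    print(f"l mod 4 = {cls}: mean I/sigma = {avg:.1f}")
```

Only even values of l appear in the list because the I-centred lattice already removes every reflection with h+k+l odd.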
Q3: What is this pattern of strong and weak reflections telling us about our
choice of spacegroup?
Finally, let's check out the data to see if there is any further symmetry. For this we'll
use a program called ViewHKL, which will allow us to visualise an undistorted view
of the reciprocal lattice (a ‘pseudo-precession photograph’) to make the task easier.
Click on the ViewHKL icon and you should get a GUI in the same style as the
MOSFLM one we used earlier. Click on the 'Open' button to read
in your scaled .mtz file, the one that was produced by SCALA. Click on the ‘HKL
Zones’ button and then on the ‘hk0’ button and you should see something like
this:
Here we are looking at a slice through the reciprocal lattice where l=0, looking down
the l-axis. Hopefully the four-fold symmetry is obvious, and this is what we would
expect from our tetragonal spacegroup. Now look at the 0kl and h0l sections – what
symmetries are visible? You might also find it helps in trying to visualise what’s
going on if you step through the levels of the lattice (1kl, 2kl etc.).
Q4: Is there any apparent symmetry in the diffraction pattern along the h and k
axes? Do you think the Laue symmetry of the diffraction data is 4/m or 4/mmm?
Combining this information with the pattern of systematic absences we saw in
Q3, what should our final choice of spacegroup be?
Some diagrams to help you visualise what’s going on:
Here are the 4 and 422 point groups:
And here are the associated Laue symmetries, 4/m and 4/mmm: