Introduction to Molecular Replacement

MOLECULAR REPLACEMENT
Basic approach
Thoughtful approach
Many many thanks to Airlie McCoy
What do we know? (1)
• The sequence of the molecule under investigation and
its molecular weight.
• Likely models can be gleaned from matching the
sequence to those of known 3-d structures. There are
many excellent tools for this – mostly web based
• The models indicate a likely fold
• They give some clues to the likely biological entity, eg
trimer? Tetramer?
What do we know? (2)
• The point group of the new crystal form, the volume of
the asymmetric unit and hence the likely number of
molecules in unit cell
Note: We often cannot be sure of the space group and will
have to test solution in several SGs.
[SG determination rests on observation of absences in
certain zones – eg Only l=4n seen on the 00l axis.
Is this a 4₁ screw axis? A 4₃ screw axis?
Or are there two molecules in the asymmetric unit in the
same orientation but separated by (x,y,1/4)?
Check native Patterson for this….]
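One common way to turn the asymmetric unit volume into a likely number of molecules is the Matthews coefficient, V_M = V_cell / (Z x n x MW), which for protein crystals usually falls roughly between 1.7 and 3.5 Å^3/Da. The slides do not name this method, so treat the following as an illustrative sketch: the cell volume, number of symmetry operators and molecular weight are made-up values, and the function names are hypothetical.

```python
def matthews_coefficient(cell_volume, n_symops, n_copies, mol_weight):
    """V_M in A^3/Da for n_copies molecules per asymmetric unit."""
    return cell_volume / (n_symops * n_copies * mol_weight)


def solvent_fraction(vm):
    """Approximate solvent fraction, assuming a typical protein density."""
    return 1.0 - 1.23 / vm


# Hypothetical crystal: 4 symmetry operators, 25 kDa protein.
cell_volume = 480000.0   # A^3 (assumed value)
mol_weight = 25000.0     # Da (assumed value)

estimates = {}
for n in range(1, 5):
    vm = matthews_coefficient(cell_volume, 4, n, mol_weight)
    estimates[n] = (vm, solvent_fraction(vm))
    print(f"{n} copies: V_M = {vm:.2f} A^3/Da, solvent = {100 * estimates[n][1]:.0f}%")

# Keep only the copy numbers whose V_M falls in the usual range.
plausible = [n for n, (vm, _) in estimates.items() if 1.7 <= vm <= 3.5]
print("plausible copies per asymmetric unit:", plausible)
```

The usual range is only a guide; unusually dense or loosely packed crystals do occur, so more than one copy number may need testing.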
What do we know? (3)
• The quality of the experimental
observations of intensities.
• Are they complete?
• Saturated at low resolution?
• Anisotropic?
• Are the intensity statistics reasonable?
Could the data be twinned?
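One widely used intensity statistic is the second moment <I^2>/<I>^2, which is expected to be about 2.0 for untwinned acentric data and about 1.5 for a perfect twin. The sketch below uses simulated intensities, not real data, and the function name is hypothetical; it only shows how the statistic separates the two cases.

```python
import random


def second_moment(intensities):
    """<I^2> / <I>^2 for a list of intensities."""
    n = len(intensities)
    mean = sum(intensities) / n
    return sum(i * i for i in intensities) / n / (mean * mean)


rng = random.Random(0)

# Simulated acentric intensities (exponential, i.e. Wilson, distribution).
untwinned = [rng.expovariate(1.0) for _ in range(20000)]

# Perfect twin: each measurement is the average of two unrelated intensities.
twinned = [0.5 * (rng.expovariate(1.0) + rng.expovariate(1.0))
           for _ in range(20000)]

m_untwinned = second_moment(untwinned)
m_twinned = second_moment(twinned)
print(f"untwinned: {m_untwinned:.2f}   perfect twin: {m_twinned:.2f}")
```

Real data should be treated more carefully: centric reflections should be excluded first, and anisotropy or a translational pseudo-symmetry can shift the statistic as well.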
What do we know? (4)
• We can calculate a native Patterson from the
intensity measurements alone.
• If there is more than one molecule in the
asymmetric unit it can show how the different
copies are related to each other.
• Is there a non-crystallographic translation vector
• What does a self rotation function show?
(More later about these..)
Digression: OVERVIEW OF PATTERSON METHOD
There are two very important points about the
Patterson method
The Patterson can be calculated from the diffraction
data of the crystal without knowing the phases. The
Patterson is the Fourier Transform using the
intensities as the amplitudes and setting all the
phases to zero.
The Patterson is a vector map of the structure.
These two things enable the Patterson Method to be
used for MR.
THE VECTOR MAP OF TWO ATOMS
What is the complete set of
vectors between two atoms?
There are four vectors, two
equal and opposite interatomic
vectors and two self vectors.
Vector atom 1 to atom 2
Vector atom 2 to atom 1
Vector atom 1 to atom 1
Vector atom 2 to atom 2
The vector map has a large
peak at the origin and two
lower peaks on either side of it,
separated from the origin by the
distance between the two
atoms.
Vector map
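The two-atom vector map can be reproduced numerically. The sketch below is a one-dimensional toy, assuming unit point atoms at made-up fractional positions: it calculates structure factors, discards the phases, and sums a Fourier series with the intensities as coefficients. The function names and the 40% peak cutoff are arbitrary choices for illustration.

```python
import cmath
import math


def structure_factors(xs, h_max):
    """F(h) for unit point atoms at fractional positions xs in a 1D cell."""
    return {h: sum(cmath.exp(2j * math.pi * h * x) for x in xs)
            for h in range(-h_max, h_max + 1)}


def patterson(intensities, n_grid):
    """Fourier sum with |F|^2 as coefficients and every phase set to zero."""
    return [sum(i_h * math.cos(2 * math.pi * h * k / n_grid)
                for h, i_h in intensities.items())
            for k in range(n_grid)]


atoms = [0.2, 0.5]                            # two atoms, 0.3 of a cell apart
F = structure_factors(atoms, h_max=20)
I = {h: abs(f) ** 2 for h, f in F.items()}    # the phases are discarded here
P = patterson(I, n_grid=100)

# Keep local maxima that reach at least 40% of the origin peak; the cutoff
# is only there to ignore series-termination ripples.
peaks = [k for k in range(100)
         if P[k] > P[k - 1] and P[k] > P[(k + 1) % 100] and P[k] >= 0.4 * P[0]]
print("peak positions (grid points out of 100):", peaks)
```

The peaks fall at the origin and at plus and minus the interatomic separation (the negative vector wraps around the cell edge), and the origin peak is the largest because it collects both self vectors.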
What do we know? (5)
• The interaction of biological and crystal symmetry leads to
horrendous complexity and we will avoid discussing it now!
• Crystal symmetry underpins the process of crystallisation;
what holds the loosely packed macromolecules together.
• Biology imposes its own symmetry on molecules, which is
sometimes expressed within the crystal symmetry;
sometimes independent of it.
• It can be a delight when you recognise this but can also be
a nightmare to disentangle.
THE PRINCIPLE OF MOLECULAR REPLACEMENT
The intensities of the reflections from the Bragg planes give us the electron
density of the protein in the crystal IF we can find phases for each of them.
[Figure: intensities + phases = electron density]
Molecular replacement is one of the two major types of methods for solving the
phase problem (the other is Multiple Isomorphous Replacement)
The basic principle of the method is to find a known structure that you think
looks something like the structure in the crystal and guess the phases of your
structure factors using the known structure.
MR WITH IDENTICAL CRYSTALS
If the known structure and the unknown structure crystallised in
exactly the same way, with the same space group, unit cell dimensions
and packing, it would be very easy to guess the phases: you would just
copy the phases from the known structure to the unknown structure.
Do NOT use MR!!!
[Figure: the phases φ are copied from the known structure to the unknown structure]
unknown structure
MGDKPIWEQIGSSFIQHYYQLFDNDRTQLGAI
YIDASCLTWEGQQFQGKAAIVEKLSSLPFQKI
QHSITAQDHQPTPDSCIISMVVGQLKADEDPI
MGFHQMFLLKNINDAWVCTNDMFRLALHNFG
H K L    F      φ (copied)
0 0 1   12.6   120
0 0 2    2.1    10
0 0 3   69.9   280
etc...
known structure
PSPLLVGREFVRQYYTLLNKAPEYLHRFYGRNS
SYVHGGVDASGKPQEAVYGQNDIHHKVLSLNFS
ECHTKIRHVDAHATLSDGVVVQVMGLLSNSGQP
ERKFMQTFVLAPEGSVPNKFYVHNDMFRYEDE
H K L    F      φ
0 0 1   10.4   120
0 0 2    3.1    10
0 0 3   52.2   280
etc...
MR WITH IDENTICAL CRYSTALS
• The phases copied from the known structure are a starting point for
refinement.
• Refinement is the process of improving the phases at the
end of structure solution.
• But BEWARE: the cell dimensions are usually slightly different and
you will need to do a few cycles of rigid body fitting.
• At the end of refinement the phases would not be the same as in the
known structure. They would be the correct phases for the unknown
structure.
Before refinement:
H K L    F      φ
0 0 1   12.6   120
0 0 2    2.1    10
0 0 3   69.9   280
etc...
After refinement:
H K L    F      φ
0 0 1   12.6   115
0 0 2    2.1   355
0 0 3   69.9   283
etc...
Molecular replacement with identical crystals is so simple, it is not
called molecular replacement. It is called the “Difference Fourier”
method, and is very often used for solving a special set of structures
where a single protein is crystallised with a series of drug compounds.
MR WITH NON-IDENTICAL CRYSTALS
However, if the known and unknown structures crystallise in different
ways, then finding the correct phases for your structure factors is
more complicated.
unknown structure
MGDKPIWEQIGSSFIQHYYQLFDNDRTQLGAIYIDA
SCLTWEGQQFQGKAAIVEKLSSLPFQKIQHSITAQD
HQPTPDSCIISMVVGQLKADEDPIMGFHQMFLLKNI
NDAWVCTNDMFRLALHNFG
H K L    F      φ
0 0 1    2.5    ?
0 0 2   72.1    ?
0 0 3   26.9    ?
etc...
known structure
PSPLLVGREFVRQYYTLLNKAPEYLHRFYGRNSSYVH
GGVDASGKPQEAVYGQNDIHHKVLSLNFSECHTKIRH
VDAHATLSDGVVVQVMGLLSNSGQPERKFMQTFVLAP
EGSVPNKFYVHNDMFRYEDE
H K L    F      φ
0 0 1   10.4   120
0 0 2    3.1    10
0 0 3   52.2   280
etc...
MR WITH NON-IDENTICAL CRYSTALS
If we can find the rotation and translation that puts the model in the
correct position in the crystal cell, THEN we can calculate phases.
unknown structure (origin o)
MGDKPIWEQIGSSFIQHYYQLFDNDRTQLGAIYIDA
SCLTWEGQQFQGKAAIVEKLSSLPFQKIQHSITAQD
HQPTPDSCIISMVVGQLKADEDPIMGFHQMFLLKNI
NDAWVCTNDMFRLALHNFG
H K L    F      φ
0 0 1    2.5    30
0 0 2   72.1    85
0 0 3   26.9   310
etc...
known structure (origin o)
PSPLLVGREFVRQYYTLLNKAPEYLHRFYGRNSSYVHG
GVDASGKPQEAVYGQNDIHHKVLSLNFSECHTKIRHVD
AHATLSDGVVVQVMGLLSNSGQPERKFMQTFVLAPEGS
VPNKFYVHNDMFRYEDE
H K L    F      φ
0 0 1   10.4   120
0 0 2    3.1    10
0 0 3   52.2   280
etc...
CRYSTALLOGRAPHIC SYMMETRY
[Figure: the known structure is placed on each symmetry-related copy of the
unknown structure by a ROTATION and a TRANSLATION, relative to the origin o]
unknown structure
MGDKPIWEQIGSSFIQHYYQLFDNDRTQLGAIYIDASC
LTWEGQQFQGKAAIVEKLSSLPFQKIQHSITAQDHQPT
PDSCIISMVVGQLKADEDPIMGFHQMFLLKNINDAWVC
TNDMFRLALHNFG
known structure
PSPLLVGREFVRQYYTLLNKAPEYLHRFYGRNSSYVHGG
VDASGKPQEAVYGQNDIHHKVLSLNFSECHTKIRHVDAH
ATLSDGVVVQVMGLLSNSGQPERKFMQTFVLAPEGSVPN
KFYVHNDMFRYEDE
When symmetry is present, we only
have to find one rotation and translation
operator; the other one is given by the
symmetry.
FINDING THE ROTATION AND TRANSLATION
The key part of molecular replacement is finding the rotation and
translation that puts your model structure on top of your new crystal
structure.
Is this a paradox?
Coordinates give phases of known structure,
which give the electron density of the known structure.
Fit this to the electron density of the unknown structure to find the rotation and translation?
X (that would need the phases of the unknown structure)
FINDING THE ROTATION AND TRANSLATION
The molecular replacement problem is not a paradox because we use the
Patterson function or Likelihood methods to find the rotation and
translation. We will only talk about Patterson methods.
Structure factors of known model
Generates
Patterson density of known
model
find rotation and translation to match Patterson of new unknown
structure
Then generate new coordinates and calculate approximate phases
for new unknown structure
THE VECTOR MAP OF SOME ATOMS
We can generate a vector map
of a molecule by putting each
atom in succession at the origin
molecule
Patterson
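The construction described above (each atom placed at the origin in turn) is the same as listing every interatomic difference vector. A minimal sketch, using a made-up three-atom "molecule" in two dimensions and a hypothetical function name:

```python
from collections import Counter


def vector_map(atoms):
    """All interatomic vectors (self vectors included) of a 2D atom list."""
    return Counter((round(bx - ax, 6), round(by - ay, 6))
                   for ax, ay in atoms for bx, by in atoms)


molecule = [(0.0, 0.0), (1.5, 0.0), (1.5, 2.0)]   # made-up coordinates in A
vectors = vector_map(molecule)

for vec, count in sorted(vectors.items()):
    print(vec, count)
```

For N atoms there are N x N vectors, N of which pile up at the origin, and every vector is accompanied by its inverse, which is why a Patterson map is always centrosymmetric.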
ROTATION AND TRANSLATION FUNCTIONS
Solving a structure by MR using the Patterson method is a two-step
process:
• First, find the orientation using the Rotation Function
• Second, find the translation using the Translation Function
ROTATION FUNCTION
First, consider the model Patterson
We put the model in a large P1 box and calculate the Patterson from the structure
factors of the model in the P1 box.
model in large P1 box
ROTATION?
“search” model
Patterson of
model in large P1
box
Clusters
separated by P1
cell dimensions
ROTATION FUNCTION
The Patterson of our model contains self-vectors and cross-vectors, but because the cell is large, the self-vectors and cross-vectors are
well separated from one another.
self vector
cross vector
ROTATION FUNCTION
Just as we generated the Patterson for our model in the first orientation, we can
generate the Patterson for the model in any orientation in any sized box.
model in same large P1 box
in different orientation
Patterson of model
in large P1 box
in different
orientation
ROTATION FUNCTION
When the models are in different orientations the Pattersons will not
match one another.
[Figure: the two Pattersons do not match]
ROTATION FUNCTION
However, when the second model is in the same orientation parts of the
Pattersons will match one another, and we can “solve” the rotation function
for the model.
[Figure: the two Pattersons match]
ROTATION FUNCTION
If the model were in a different sized box, the Patterson of the intramolecular
vectors, which are located in a sphere centred on the origin, can be overlaid.
We can cut out the peaks corresponding to the inter-molecular vectors from each
Patterson and just compare the central parts of the Pattersons.
ROTATION FUNCTION
Now, the Pattersons of the intra-molecular vectors will match when the
model is in the correct orientation. (This is done using spherical
harmonics to expand the Patterson; FFTs can be used.)
[Figure: central parts of the Pattersons compared in wrong and correct orientations]
CROSS-ROTATION FUNCTION
Now we will consider what happens when we compare the model to the crystal
Patterson. This is called the cross rotation function.
The most important difference between the model and crystal Patterson is that
in the crystal Patterson the molecule is not in a large P1 box. It is in whatever space
group and unit cell the protein crystallised in!
The Patterson for the crystal is therefore much more complicated than the
Patterson for the model.
unknown structure in crystal
Patterson of unknown
structure in crystal
CROSS-ROTATION FUNCTION
In the crystal Patterson, the intra- and inter-molecular vectors are not as well
separated from one another, and this is especially true if there is very little
solvent.
Cutting out a sphere of the Pattersons around the origin does not give a simple
Patterson of the structure in the crystal.
Nonetheless, if the sphere is small enough, most of the vectors near the origin
will be intra-molecular vectors.
CROSS-ROTATION FUNCTION
The centre of the Pattersons will match when the model is in the
correct orientation. We have “solved” the rotation function for our
crystal structure.
In this case, the rotation required was 90 degrees.
[Figure: centres of the model and crystal Pattersons match at ROTATION = 90 degrees]
CROSS-ROTATION FUNCTION
It is important that the box used to calculate the Patterson for the search
molecule is large enough that inter-molecular vectors are not present in the
search volume. Otherwise, both Pattersons are complicated by vectors that
hide the signal. Most programs select the box size automatically.
Patterson of model, P1 search box too small
Crystal Patterson
ROTATION FUNCTION WITH SYMMETRY
Symmetry adds complexity to the Patterson of the crystal. In general, the
higher the symmetry, the harder it is to find the rotational part of a
molecular replacement solution.
Unknown crystal structure
with two-fold symmetry
Crystal Patterson
ROTATION FUNCTION WITH SYMMETRY
Our search Patterson will overlay with the intra-molecular peaks of the
crystal Patterson in two orientations
180 degree rotation
Inter molecular peaks of
Crystal Patterson
We get two solutions in a full 360 degree
search, or alternatively, we only have to search
180 degrees to get a solution.
Symmetry defines the search volume required
to find a solution.
ROTATION FUNCTION TARGET
The overlap between the observed and calculated Pattersons is measured with
the product function i.e. corresponding positions in the observed and
calculated Patterson maps are multiplied. This gives a maximum when the two
Pattersons overlap.
The origin is omitted to avoid the large origin peak of the Patterson.
The sphere is limited in radius because the intra-molecular vectors are
concentrated near the origin.
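The product-function target can be illustrated with a brute-force search in two dimensions. This is only a sketch under several assumptions: the atoms are made up, the Patterson is modelled as a Gaussian on each interatomic vector rather than computed from data, and the search is a simple 5 degree scan rather than the spherical-harmonics approach mentioned earlier. All function names and parameters are hypothetical.

```python
import math


def vectors(atoms):
    """Non-origin interatomic vectors of a 2D atom list."""
    return [(bx - ax, by - ay) for i, (ax, ay) in enumerate(atoms)
            for j, (bx, by) in enumerate(atoms) if i != j]


def rotate(points, angle_deg):
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def patterson_value(u, peak_vectors, sigma=0.2):
    """Patterson density at u, modelled as a Gaussian on each vector."""
    return sum(math.exp(-((u[0] - vx) ** 2 + (u[1] - vy) ** 2) / (2 * sigma ** 2))
               for vx, vy in peak_vectors)


def rotation_function(model_vecs, crystal_vecs, angle_deg, r_min=0.5, r_max=4.0):
    """Product function over the shell r_min < |u| < r_max (origin omitted)."""
    score = 0.0
    for u in rotate(model_vecs, angle_deg):
        if r_min < math.hypot(*u) < r_max:
            score += patterson_value(u, crystal_vecs)   # model peak weight is 1
    return score


model = [(0.0, 0.0), (1.5, 0.3), (2.1, 1.9), (0.4, 3.2), (-1.2, 1.1), (3.0, -0.8)]
crystal = rotate(model, 90.0)     # the "unknown" is the model rotated by 90 degrees

model_vecs, crystal_vecs = vectors(model), vectors(crystal)
# A Patterson is centrosymmetric, so 0 to 180 degrees covers the 2D search.
scores = {a: rotation_function(model_vecs, crystal_vecs, a)
          for a in range(0, 180, 5)}
best_angle = max(scores, key=scores.get)
print("best rotation:", best_angle, "degrees")
```

The inner and outer radii play the roles described above: the inner limit removes the origin peak and the outer limit restricts the comparison to the region where intra-molecular vectors dominate.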
Translation section
TRANSLATION FUNCTION
Once we have a rotation function solution for our crystal, we must
now find where to place the structure relative to the crystal origin,
which is defined by the position of the symmetry operators.
We know the unit cell for the crystal, so we can put our oriented model
somewhere in the correct unit cell (rather than a large P1 box).
known structure in unit cell of
unknown crystal structure
Patterson of known
structure in unit cell of
unknown crystal
structure
TRANSLATION FUNCTION
Now let us put the known structure in a different position in the unit cell.
In the case of a P1 crystal structure, the Patterson is exactly the same.
This is because the only inter-molecular vectors are from one molecule
to the molecule in the next unit cell, and these do not change.
In P1 we can position the known structure anywhere in the unit cell, and
get the correct structure amplitudes.
known structure in unit cell of
unknown crystal structure
Patterson of known
structure in unit cell of
unknown crystal
structure
SYMMETRY AND THE TRANSLATION FUNCTION
Now let us consider the effects of symmetry. If we move our known
structure in the unit cell, the symmetry related copies will move according
to the symmetry operators.
The inter-molecular vectors change as the reference structure is moved
around the unit cell.
Unknown crystal structure
with two-fold symmetry
Unknown crystal structure
with two-fold symmetry, new position
Structure Factors
• We will check the agreement between the structure
factor magnitudes calculated from the repositioned
model, and the experimental ones.
• The magnitudes will be the same for any of the
symmetry operators and for any alternate origin.
• (See $CHTML/alternate_origins.html)
SYMMETRY AND THE TRANSLATION FUNCTION
Note that this translation function only tells us the component of the
translation perpendicular to the symmetry rotation axis.
For higher symmetry space groups, we can compute such translation
functions for each symmetry axis, which gives us all the components of
the translation vector.
TRANSLATION PRODUCT FUNCTION
The required translation, relative to the two-fold axis, can be found by
translating the intermolecular vectors over the observed Patterson map
and computing a product function. When the correct translation is
chosen, there should be a large peak because the vector sets will coincide.
The intermolecular vector set
TRANSLATION CORRELATION FUNCTION
Since, at each search position, we know both the orientation and position of
the search model, we can calculate the structure factors from the model in
that position in the unit cell.
We can calculate the correlation between the observed and calculated data,
and see how well they match. The correct translation should give a peak in
this map.
This method gives the translation vector directly, with all the symmetry
operations considered together. It generally has better signal to noise than
the product function. It is equivalent to the correlation coefficient between
the two origin-removed Patterson maps.
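A translation search of this kind can be sketched for a two-dimensional cell with a single two-fold (plane group p2). Everything here is illustrative: the model coordinates and the "true" position are invented, the "observed" amplitudes are calculated rather than measured, and the function names are hypothetical. The search covers only a quarter of the cell because positions differing by half a cell edge are alternate origins and give identical amplitudes.

```python
import cmath
import math


def amplitudes(atoms, shift, hkl):
    """|F(h,k)| for atoms at x + shift plus their two-fold mates at -(x + shift)."""
    placed = [(x + shift[0], y + shift[1]) for x, y in atoms]
    placed += [(-x, -y) for x, y in placed]          # symmetry operator of p2
    return [abs(sum(cmath.exp(2j * math.pi * (h * x + k * y)) for x, y in placed))
            for h, k in hkl]


def correlation(a, b):
    """Linear correlation coefficient between two amplitude lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


# Made-up oriented model (fractional coordinates) and a "true" position.
model = [(0.00, 0.00), (0.07, 0.02), (0.11, 0.09), (0.03, 0.13), (0.16, 0.05)]
true_shift = (0.15, 0.30)
hkl = [(h, k) for h in range(-4, 5) for k in range(-4, 5) if (h, k) != (0, 0)]
f_obs = amplitudes(model, true_shift, hkl)

n = 20    # grid step of 1/20 of a cell edge
scores = {(i / n, j / n): correlation(f_obs, amplitudes(model, (i / n, j / n), hkl))
          for i in range(n // 2) for j in range(n // 2)}
best = max(scores, key=scores.get)
print("best translation:", best, "CC =", round(scores[best], 3))

# The same model shifted by half a cell edge is an alternate-origin solution.
cc_alternate = correlation(f_obs, amplitudes(model, (0.65, 0.30), hkl))
```

With real data the peak is far weaker than in this noise-free toy, which is why the signal-to-noise advantage mentioned above matters.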
NON-CRYSTALLOGRAPHIC TRANSLATION VECTOR
If the asymmetric unit contains two molecules
related by a translation, then the native
Patterson will have a large peak at the position
representing this translation.
Unlike non-crystallographic rotations, non-crystallographic translations are not useful in
structure determination.
In fact, they introduce awkward structure
factor correlations not currently accounted for,
and can make structures difficult to refine.
If there is more than one molecule in the
asymmetric unit you should also check for
non-crystallographic translation.
[Figure: asymmetric unit of unknown crystal structure with
non-crystallographic translation. The crystal Patterson has an
origin-sized peak at the translation vector.]
SELF-ROTATION FUNCTION
If there is more than one molecule in the
asymmetric unit (and no NCT), then the rotation
function of the Patterson on itself will give a
peak at the angle corresponding to the relative
rotation between the two.
The self rotation function does not need a
model!
This is useful for confirming or determining how
many copies of the structure you have in the
asymmetric unit. It should therefore be one of
the first things you do with a new data set. If
non-crystallographic symmetry is present it is
extremely useful in MIR and density
modification.
[Figure: asymmetric unit of unknown crystal structure with two-fold
symmetry. The crystal Patterson has the same two-fold symmetry near
the origin (intra-molecular peaks only).]
THE SEARCH MODEL
Molecular replacement only works if you have a good search model.
HOMOLOGY
• The higher the sequence identity the easier MR.
• There is a rough inverse correlation between the rms deviation of
atomic positions and the percentage sequence identity.
• It is not possible to give a minimum value of homology at which
MR will work. Below 30% it is usually difficult, but it can be
challenging even with a 100% identical molecule. (See 1tj3)
COMPLETENESS
• The search model should represent a significant fraction of the
unknown structure. If it only represents 10% of the unknown
structure MR will be difficult.
FLEXIBILITY
• Some proteins (for instance antibodies) have domains that can
adopt different relative orientations with respect to each other.
• MR will be difficult if the search model does not have the same
relative orientation of these domains.
SEARCH ENSEMBLES
If you have several homology models, you can combine them for use in
molecular replacement using a method called ensembling.
The correct weights for each model are calculated from the expected rms
difference between the coordinates of the model and the coordinates of
the structures. The expected rms difference is a function of the sequence
homology between the model structure and the structure in the crystal.
A set of structure factors is then calculated from the combined,
correctly weighted models, and this set is used in the molecular
replacement instead of the structure factors calculated from a single
molecule.
Rotation Matrices
Coordinate transformations and rotations
An introduction to the mathematics of rotations
In molecular replacement, we need to move a model, described as a
list of orthogonal coordinates (usually in Å) into a new position and
orientation. We also must relate the orthogonal coordinate system to
the crystallographic system, which may not be orthogonal, and to
consider crystallographic symmetry, which applies to coordinates
expressed as fractions of the unit cell.
In the orthogonal frame, the transformation is most easily written as a
rotation matrix multiplying a set of vectors, plus the translation
component.
For each atom
$x'_i = R\,x_i + t$
For a set of coordinates
$(x'_1\ x'_2\ x'_3\ x'_4\ \dots) = R\,(x_1\ x_2\ x_3\ x_4\ \dots) + (t\ t\ t\ t\ \dots)$
This talk is principally concerned with the properties and description of
the rotation matrix R.
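Applying the transformation to a coordinate list is a direct transcription of the equation for each atom. A minimal sketch, with an arbitrary example operator (a 90 degree rotation about z plus a shift) and made-up coordinates; the function name is hypothetical:

```python
def transform(coords, R, t):
    """Apply x' = R x + t to every atom of a coordinate list (orthogonal A)."""
    return [tuple(sum(R[r][c] * x[c] for c in range(3)) + t[r] for r in range(3))
            for x in coords]


# Illustrative operator: 90 degree rotation about z followed by a shift.
R = [[0.0, -1.0, 0.0],
     [1.0,  0.0, 0.0],
     [0.0,  0.0, 1.0]]
t = (10.0, 0.0, 5.0)

atoms = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
moved = transform(atoms, R, t)
for before, after in zip(atoms, moved):
    print(before, "->", after)
```

The rotation is applied first and the translation second; reversing the order gives a different position unless the translation is also rotated.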
DO NOT ATTEMPT TO COPY DOWN MY EQUATIONS
Rotation matrices
• Properties of a Rotation matrix
• Determinant equals 1
• Inverse identical to transpose.
• It can be expressed as a function of
THREE rotation angles.
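These properties are easy to check numerically. The sketch below builds a matrix from three angles using a z-y-z Euler convention, which is an assumption made for illustration (it is only one of the conventions in use), and then verifies the determinant and the transpose property. All function names are hypothetical.

```python
import math


def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]


def det(A):
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))


def euler_zyz(alpha, beta, gamma):
    """R = Rz(alpha) Ry(beta) Rz(gamma), angles in degrees (assumed convention)."""
    a, b, g = (math.radians(v) for v in (alpha, beta, gamma))
    return matmul(rot_z(a), matmul(rot_y(b), rot_z(g)))


R = euler_zyz(30.0, 45.0, 60.0)
RRt = matmul(R, transpose(R))          # should be the identity matrix

print("determinant:", round(det(R), 12))
for row in RRt:
    print([round(v, 12) for v in row])
```

Because the inverse is just the transpose, undoing a rotation never requires a general matrix inversion.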
Rotation matrices and angles
A rotation is uniquely defined by a rotation matrix, so why do we need rotation angles?
The matrix is useful when we know the answer, but not useful when we are trying to
find it. The 9 elements of the 3 x 3 rotation matrix are not independent, but are related
to 3 independent parameters, rotation angles.
To find the answer by an exhaustive search method, we need to enumerate all possible
rotations, and to optimise a trial solution we need to refine the rotation values. For
these purposes, we need the angle parameters.
Unfortunately, a rotation matrix may be expressed as 3 angles in many ways, and
many of these ways have been used in different programs. This is not a problem within
any one system, but is confusing if you have to convert from one program to another.
The matrix remains unique, so conversion may be done via the matrix.
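Conversion via the matrix can be sketched as follows: three angles in one convention are turned into a matrix, and the matrix is then read out either as a rotation angle about an axis or back as the original angles. The z-y-z Euler convention and all function names are assumptions for illustration; real programs each document their own convention, which must be checked before converting.

```python
import math


def euler_zyz_matrix(alpha, beta, gamma):
    """R = Rz(alpha) Ry(beta) Rz(gamma), angles in degrees (assumed convention)."""
    ca, sa = math.cos(math.radians(alpha)), math.sin(math.radians(alpha))
    cb, sb = math.cos(math.radians(beta)), math.sin(math.radians(beta))
    cg, sg = math.cos(math.radians(gamma)), math.sin(math.radians(gamma))
    return [[ca * cb * cg - sa * sg, -ca * cb * sg - sa * cg, ca * sb],
            [sa * cb * cg + ca * sg, -sa * cb * sg + ca * cg, sa * sb],
            [-sb * cg, sb * sg, cb]]


def axis_angle(R):
    """Rotation angle kappa (degrees) and unit axis of a rotation matrix.

    Not valid for kappa = 0 or 180, where the axis needs special handling.
    """
    trace = R[0][0] + R[1][1] + R[2][2]
    kappa = math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))
    s = 2.0 * math.sin(kappa)
    axis = ((R[2][1] - R[1][2]) / s,
            (R[0][2] - R[2][0]) / s,
            (R[1][0] - R[0][1]) / s)
    return math.degrees(kappa), axis


def matrix_to_euler_zyz(R):
    """Recover (alpha, beta, gamma) in degrees; assumes 0 < beta < 180."""
    return (math.degrees(math.atan2(R[1][2], R[0][2])),
            math.degrees(math.acos(R[2][2])),
            math.degrees(math.atan2(R[2][1], -R[2][0])))


R = euler_zyz_matrix(30.0, 45.0, 60.0)
kappa, axis = axis_angle(R)
angles_back = matrix_to_euler_zyz(R)
print("kappa:", round(kappa, 2), "axis:", [round(a, 4) for a in axis])
print("Euler angles recovered:", [round(a, 6) for a in angles_back])
```

The special cases flagged in the comments (beta of 0 or 180, kappa of 0 or 180) are where angle descriptions become degenerate, which is another reason to pass through the matrix.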
Verifying Solution
• We can only do this by generating new
coordinates from the model, then testing
whether these generate structure amplitudes
which agree reasonably well with our
observed ones.
• Initial R values are always high (and correlation coefficients
low), but the correct solution will (usually!) be slightly better
than a random one, and will refine.
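The two agreement measures can be computed as below. This is a sketch with invented amplitude lists; the single overall scale factor is a simplifying assumption (real programs usually scale more carefully, for example as a function of resolution), and the function names are hypothetical.

```python
import math


def r_factor(f_obs, f_calc):
    """R = sum| |Fo| - k|Fc| | / sum|Fo|, with k scaling sum(Fc) to sum(Fo)."""
    k = sum(f_obs) / sum(f_calc)
    return sum(abs(o - k * c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def correlation(f_obs, f_calc):
    """Linear correlation coefficient between two amplitude lists."""
    mo, mc = sum(f_obs) / len(f_obs), sum(f_calc) / len(f_calc)
    cov = sum((o - mo) * (c - mc) for o, c in zip(f_obs, f_calc))
    var_o = sum((o - mo) ** 2 for o in f_obs)
    var_c = sum((c - mc) ** 2 for c in f_calc)
    return cov / math.sqrt(var_o * var_c)


# Invented amplitudes: observed data, a plausible solution, a wrong solution.
f_obs = [12.6, 2.1, 69.9, 30.0, 8.4]
f_good = [11.0, 4.0, 60.0, 33.0, 6.0]
f_bad = [40.0, 25.0, 10.0, 5.0, 35.0]

r_good, r_bad = r_factor(f_obs, f_good), r_factor(f_obs, f_bad)
cc_good, cc_bad = correlation(f_obs, f_good), correlation(f_obs, f_bad)
print(f"plausible solution: R = {r_good:.2f}, CC = {cc_good:.2f}")
print(f"wrong solution:     R = {r_bad:.2f}, CC = {cc_bad:.2f}")
```

The gap between the two cases is exaggerated here; with a real search model the correct solution is typically only a little better than the rest, as noted above.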
Verifying Solution
• The amplitudes will be the same,
irrespective of any crystallographic
symmetry operator applied to the solution,
or unit cell origin chosen, hence it can be
difficult to compare solutions from different
programs.
How to Refine and Eliminate Bias???
Tomorrow’s discussion
• By refinement followed by hand rebuilding into
map?
• ARP/wARP?
• Solve/Resolve?
• Buccaneer?
• Acorn – density modification?