MOLECULAR REPLACEMENT
Basic approach
Thoughtful approach
Many many thanks to Airlie McCoy

What do we know? (1)
• The sequence of the molecule under investigation and its molecular weight.
• Likely models can be found by matching the sequence to those of known 3-D structures. There are many excellent tools for this, mostly web based.
• The models indicate a likely fold.
• They give some clues to the likely biological entity, e.g. trimer? Tetramer?

What do we know? (2)
• The point group of the new crystal form, the volume of the asymmetric unit, and hence the likely number of molecules in the asymmetric unit.
Note: we often cannot be sure of the space group and will have to test the solution in several space groups. [Space-group determination rests on observing absences in certain zones, e.g. only l = 4n seen on the 00l axis. Is this a 4₁ screw axis? A 4₃ screw axis? Or are there two molecules in the asymmetric unit in the same orientation but separated by (x, y, 1/4)? Check the native Patterson for this.]

What do we know? (3)
• The quality of the experimental intensity measurements:
• Are they complete?
• Saturated at low resolution?
• Anisotropic?
• Are the intensity statistics reasonable? Could the data be twinned?

What do we know? (4)
• We can calculate a native Patterson from the intensity measurements alone.
• If there is more than one molecule in the asymmetric unit, it can show how the different copies are related to each other.
• Is there a non-crystallographic translation vector?
• What does a self-rotation function show? (More about these later.)

Digression: OVERVIEW OF THE PATTERSON METHOD
There are two very important points about the Patterson method:
• The Patterson can be calculated from the diffraction data of the crystal without knowing the phases. It is the Fourier transform computed with the intensities as the amplitudes and all the phases set to zero.
• The Patterson is a vector map of the structure.
These two properties are what allow the Patterson method to be used for MR.
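To make the two Patterson points concrete, here is a minimal 1D sketch, assuming unit point atoms, fractional coordinates and a made-up non-crystallographic translation T = 0.37. The Patterson is computed from the intensities alone (all phases set to zero), and a second copy of the molecule related by a pure translation shows up as a large non-origin peak at ±T (0.37 or 0.63 in this periodic 1D cell).

```python
import numpy as np

# Toy 1D "crystal": one molecule of 10 equal point atoms (fractional coords),
# plus a second copy shifted by a hypothetical non-crystallographic translation T.
rng = np.random.default_rng(0)
mol = rng.uniform(0.0, 0.25, size=10)
T = 0.37
atoms = np.concatenate([mol, mol + T]) % 1.0

# "Measured" intensities I(h) = |F(h)|^2 for h = -H..H; the phases are thrown away.
H = 40
h = np.arange(-H, H + 1)
F = np.exp(2j * np.pi * np.outer(h, atoms)).sum(axis=1)
I = np.abs(F) ** 2

# Patterson: Fourier sum with the intensities as coefficients and all phases zero.
# I(h) = I(-h), so the sum is real and reduces to cosines.
u = np.arange(400) / 400
P = (I[None, :] * np.cos(2 * np.pi * np.outer(u, h))).sum(axis=1)

# Ignore the region around the origin peak (u is periodic, so both ends).
near_origin = (u < 0.05) | (u > 0.95)
peak = np.argmax(np.where(near_origin, -np.inf, P))
ratio = P[peak] / P[0]
print(f"largest non-origin peak at u = {u[peak]:.3f}, {ratio:.0%} of the origin peak")
```

With two identical copies, the translation peak collects one vector per atom while the origin collects two, so it comes out at roughly half the origin height, far above the ordinary intramolecular peaks.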
THE VECTOR MAP OF TWO ATOMS
What is the complete set of vectors between two atoms? There are four vectors: two equal and opposite interatomic vectors and two self-vectors.
• Vector atom 1 to atom 2
• Vector atom 2 to atom 1
• Vector atom 1 to atom 1
• Vector atom 2 to atom 2
The vector map has a large peak at the origin (the two self-vectors) and two lower peaks on either side of it, each separated from the origin by the distance between the two atoms.
[Figure: vector map of two atoms]

What do we know? (5)
• The interaction of biological and crystal symmetry leads to horrendous complexity, and we will avoid discussing it now!
• Crystal symmetry underpins the process of crystallisation: it describes how the loosely packed macromolecules are held together in the lattice.
• Biology imposes its own symmetry on molecules, which is sometimes expressed within the crystal symmetry and sometimes independent of it.
• It can be a delight when you recognise this, but it can also be a nightmare to disentangle.

THE PRINCIPLE OF MOLECULAR REPLACEMENT
The intensities of the reflections from the Bragg planes give us the electron density of the protein in the crystal IF we can find phases for each of them.
[Figure: intensities + phases = electron density]
Molecular replacement is one of the two major types of methods for solving the phase problem (the other is Multiple Isomorphous Replacement).
The basic principle of the method is to find a known structure that you think looks something like the structure in the crystal, and to guess the phases of your structure factors using the known structure.

MR WITH IDENTICAL CRYSTALS
If the known structure and the unknown structure crystallised in exactly the same way, with the same space group, unit cell dimensions and packing, it would be very easy to guess the phases: you would just copy the phases from the known structure to the unknown structure. Do NOT use MR!!!
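A minimal 1D sketch of "intensities + phases = electron density" in the identical-crystal case, assuming unit point atoms and made-up coordinates: the observed amplitudes are combined with the phases calculated from the (identical) model, and the Fourier synthesis of |F_obs| exp(iφ_model) puts density peaks back on the atoms.

```python
import numpy as np

# Toy 1D unit cell with equal point atoms (fractional coordinates, made up).
true_atoms = np.array([0.10, 0.22, 0.47, 0.71])   # the "unknown" structure
model_atoms = true_atoms.copy()                     # identical crystal form

H = 30
h = np.arange(-H, H + 1)

def structure_factors(x):
    """F(h) = sum_j exp(2*pi*i*h*x_j) for unit point atoms."""
    return np.exp(2j * np.pi * np.outer(h, x)).sum(axis=1)

F_obs = np.abs(structure_factors(true_atoms))          # measured: amplitudes only
phi_model = np.angle(structure_factors(model_atoms))   # phases copied from the model

# Electron density: Fourier synthesis of |F_obs| * exp(i * phi_model).
x = np.arange(500) / 500
coeffs = F_obs * np.exp(1j * phi_model)
rho = (coeffs[None, :] * np.exp(-2j * np.pi * np.outer(x, h))).sum(axis=1).real

# The four highest local maxima should sit on the true atom positions.
is_peak = (rho > np.roll(rho, 1)) & (rho > np.roll(rho, -1))
top = np.sort(x[is_peak][np.argsort(rho[is_peak])[-4:]])
print(top)
```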
[Figure: unknown structure and known structure, each shown as its amino-acid sequence with a table of reflections; the phases of the known structure are copied to the unknown structure]

Unknown structure (phases copied from the known structure)
  h k l     F     φ (°)
  0 0 1   12.6    120
  0 0 2    2.1     10
  0 0 3   69.9    280
  etc.

Known structure
  h k l     F     φ (°)
  0 0 1   10.4    120
  0 0 2    3.1     10
  0 0 3   52.2    280
  etc.

MR WITH IDENTICAL CRYSTALS
• The phases copied from the known structure are a starting point for refinement.
• Refinement is the process of improving the phases at the end of structure solution.
• But BEWARE: the cell dimensions are usually slightly different, and you will need to do a few cycles of rigid-body fitting.
• At the end of refinement the phases will not be the same as in the known structure. They will be the correct phases for the unknown structure.

  Before refinement            After refinement
  h k l     F     φ (°)        h k l     F     φ (°)
  0 0 1   12.6    120          0 0 1   12.6    115
  0 0 2    2.1     10          0 0 2    2.1    355
  0 0 3   69.9    280          0 0 3   69.9    283
  etc.                         etc.

Molecular replacement with identical crystals is so simple that it is not called molecular replacement. It is called the “Difference Fourier” method, and it is very often used for solving a special set of structures in which a single protein is crystallised with a series of drug compounds.

MR WITH NON-IDENTICAL CRYSTALS
However, if the known and unknown structures crystallise in different ways, then finding the correct phases for your structure factors is more complicated.
[Figure: unknown structure and known structure, each shown as its amino-acid sequence with a table of reflections]

Unknown structure (phases unknown)
  h k l     F     φ (°)
  0 0 1    2.5     ?
  0 0 2   72.1     ?
  0 0 3   26.9     ?
  etc.

Known structure
  h k l     F     φ (°)
  0 0 1   10.4    120
  0 0 2    3.1     10
  0 0 3   52.2    280
  etc.
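The difference Fourier idea can be sketched in the same kind of 1D toy, assuming a hypothetical refined apo model and one extra "drug" atom bound in an otherwise identical crystal (all coordinates made up). Map coefficients (|F_complex| - |F_apo|) carrying the apo phases give a map whose highest peak marks the ligand.

```python
import numpy as np

H = 30
h = np.arange(-H, H + 1)
x = np.arange(500) / 500

def structure_factors(atoms):
    """F(h) = sum_j exp(2*pi*i*h*x_j) for unit point atoms (fractional coords)."""
    return np.exp(2j * np.pi * np.outer(h, atoms)).sum(axis=1)

protein = np.array([0.03, 0.12, 0.18, 0.29, 0.34, 0.51, 0.57, 0.68, 0.76, 0.90])
ligand = np.array([0.42])                      # hypothetical bound compound

F_apo = structure_factors(protein)             # known, refined apo structure
F_complex_obs = np.abs(structure_factors(np.concatenate([protein, ligand])))

# Difference Fourier: amplitudes |F_complex| - |F_apo|, phases from the apo model.
coeffs = (F_complex_obs - np.abs(F_apo)) * np.exp(1j * np.angle(F_apo))
diff_map = (coeffs[None, :] * np.exp(-2j * np.pi * np.outer(x, h))).sum(axis=1).real

print(f"highest difference peak at x = {x[np.argmax(diff_map)]:.3f}")
```

No rotation or translation search is involved: because the cell and packing are unchanged, the apo phases are already a good approximation for the complex.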
MR WITH NON-IDENTICAL CRYSTALS
If we can find the rotation and translation that put the model in the correct position in the crystal cell, THEN we can calculate phases.
[Figure: unknown structure and known structure, each drawn relative to its cell origin]

Unknown structure (after placing the model)
  h k l     F     φ (°)
  0 0 1    2.5     30
  0 0 2   72.1     85
  0 0 3   26.9    310
  etc.

Known structure
  h k l     F     φ (°)
  0 0 1   10.4    120
  0 0 2    3.1     10
  0 0 3   52.2    280
  etc.

CRYSTALLOGRAPHIC SYMMETRY
[Figure: known structure and unknown structure; each symmetry-related copy in the unknown crystal is reached from the origin by its own rotation and translation]
When symmetry is present, we only have to find one rotation and translation operator; the other one is given by the symmetry.

FINDING THE ROTATION AND TRANSLATION
The key part of molecular replacement is finding the rotation and translation that put your model structure on top of your new crystal structure.
Is this a paradox? The coordinates of the known structure give its phases, which give its electron density. But fitting that density to the electron density of the unknown structure, to find the rotation and translation, would need the phases of the unknown structure, which are exactly what we are missing.

FINDING THE ROTATION AND TRANSLATION
The molecular replacement problem is not a paradox, because we use the Patterson function or likelihood methods to find the rotation and translation. We will only talk about Patterson methods.
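Once the model has been rotated and translated, the symmetry operators generate every other copy in the cell, and structure factors, phases included, follow directly. A minimal sketch, assuming space group P2₁ (unique axis b), unit point atoms, and made-up fractional coordinates for a model that has already been placed. As a check, the 0k0 reflections with odd k come out systematically absent, the kind of absence used for space-group determination.

```python
import numpy as np

# Fractional coordinates of a model already rotated and translated into place
# (hypothetical values; 5 equal point atoms).
placed = np.array([
    [0.12, 0.05, 0.31],
    [0.20, 0.11, 0.27],
    [0.25, 0.02, 0.40],
    [0.08, 0.15, 0.36],
    [0.18, 0.09, 0.45],
])

# P2_1 (unique axis b): (x, y, z) and (-x, y + 1/2, -z), as (matrix, translation).
symops = [
    (np.diag([1.0, 1.0, 1.0]), np.array([0.0, 0.0, 0.0])),
    (np.diag([-1.0, 1.0, -1.0]), np.array([0.0, 0.5, 0.0])),
]

# Symmetry generates the second copy; nothing else has to be searched for.
cell_contents = np.vstack([placed @ S.T + s for S, s in symops])

def structure_factor(hkl):
    """F(hkl) = sum over all atoms in the cell of exp(2*pi*i * h.x)."""
    return np.exp(2j * np.pi * cell_contents @ np.asarray(hkl, dtype=float)).sum()

for hkl in [(0, 1, 0), (0, 2, 0), (1, 2, 3)]:
    F = structure_factor(hkl)
    print(hkl, f"|F| = {abs(F):6.3f}  phase = {np.degrees(np.angle(F)) % 360:6.1f} deg")
```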
[Flow chart] The structure factors of the known model generate the Patterson of the known model. We find the rotation and translation that make it match the Patterson of the new unknown structure. Then we generate new coordinates and calculate approximate phases for the new unknown structure.

THE VECTOR MAP OF SOME ATOMS
We can generate a vector map of a molecule by putting each atom in succession at the origin.
[Figure: molecule and its Patterson]

ROTATION AND TRANSLATION FUNCTIONS
Solving a structure by MR using the Patterson method is a two-step process:
• First, find the orientation using the rotation function.
• Second, find the translation using the translation function.

ROTATION FUNCTION
First, consider the model Patterson. We put the model in a large P1 box and calculate the Patterson from the structure factors of the model in the P1 box.
[Figure: "search" model in a large P1 box, and the Patterson of the model in the large P1 box; the vector clusters are separated by the P1 cell dimensions]

ROTATION FUNCTION
The Patterson of our model contains self-vectors (intramolecular) and cross-vectors (intermolecular), but because the box was large, the self-vectors and cross-vectors are well separated from one another.
[Figure: self-vectors and cross-vectors in the model Patterson]

ROTATION FUNCTION
Just as we generated the Patterson for our model in the first orientation, we can generate the Patterson for the model in any orientation, in a box of any size.
[Figure: model in the same large P1 box in a different orientation, and its Patterson]

ROTATION FUNCTION
When the models are in different orientations, the Pattersons will not match one another.
[Figure: the two Pattersons compared: no match]

ROTATION FUNCTION
However, when the second model is in the same orientation, parts of the Pattersons will match one another, and we can "solve" the rotation function for the model.
[Figure: the two Pattersons compared: match]

ROTATION FUNCTION
If the model were in a box of a different size, the Patterson of the intramolecular vectors, which lie within a sphere centred on the origin, could still be overlaid.
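Putting each atom at the origin in turn amounts to taking every difference x_j - x_i. A minimal sketch with a made-up 2D molecule: for N atoms it returns N² vectors, N of them (the self-vectors) at the origin, and every vector v is accompanied by -v, which is why a Patterson is centrosymmetric. For two atoms it reproduces the four vectors of the two-atom vector map.

```python
import numpy as np

def patterson_vectors(coords):
    """All interatomic vectors x_j - x_i: each atom i is placed at the origin in turn.
    Returns an (N*N, dim) array; the N self-vectors are zero (the origin peak)."""
    coords = np.asarray(coords, dtype=float)
    return (coords[None, :, :] - coords[:, None, :]).reshape(-1, coords.shape[1])

atoms = np.array([[0.0, 0.0], [3.0, 1.0], [1.0, 4.0]])   # toy 2D molecule (Å)
vecs = patterson_vectors(atoms)
n_self = int(np.sum(np.all(vecs == 0, axis=1)))
print(len(vecs), "vectors,", n_self, "at the origin")
```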
We can cut out the peaks corresponding to the intermolecular vectors from each Patterson and just compare the central parts of the Pattersons.
[Figure: central regions of the two Pattersons]

ROTATION FUNCTION
Now the Pattersons of the intramolecular vectors will match when the model is in the correct orientation. (This is done by expanding the Patterson in spherical harmonics; FFTs can be used.)
[Figure: central Patterson regions compared: no match in the wrong orientation, match in the correct orientation]

CROSS-ROTATION FUNCTION
Now we will consider what happens when we compare the model Patterson to the crystal Patterson. This is called the cross-rotation function.
The most important difference between the model and crystal Pattersons is that in the crystal the molecule is not in a large P1 box. It is in whatever space group and unit cell the protein crystallised in! The Patterson for the crystal is therefore much more complicated than the Patterson for the model.
[Figure: unknown structure in the crystal, and its Patterson]

CROSS-ROTATION FUNCTION
In the crystal Patterson, the intra- and intermolecular vectors are not as well separated from one another, and this is especially true if there is very little solvent. Cutting out a sphere of the Patterson around the origin does not give a simple Patterson of the structure in the crystal. Nonetheless, if the sphere is small enough, most of the vectors near the origin will be intramolecular vectors.

CROSS-ROTATION FUNCTION
The centres of the Pattersons will match when the model is in the correct orientation. We have "solved" the rotation function for our crystal structure. In this case, the rotation required was 90 degrees.
[Figure: centres of the model and crystal Pattersons compared: no match, then a match after a rotation of 90 degrees]

CROSS-ROTATION FUNCTION
It is important that the box used to calculate the Patterson for the search molecule is large enough that intermolecular vectors are not present in the search volume. Otherwise, both Pattersons are complicated by vectors that hide the signal. Most programs select the box size automatically.
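A toy 2D version of the cross-rotation search, assuming point atoms and a direct sum over Gaussian-smeared vectors in place of the spherical-harmonic/FFT machinery real programs use. Only intramolecular vectors shorter than r_max (the "sphere" around the origin) are compared, so the translation of the crystal molecule drops out. Because a Patterson is centrosymmetric and a 2D rotation by 180 degrees is the same as inversion, this toy cannot tell θ from θ + 180 degrees; that ambiguity is an artefact of working in 2D.

```python
import numpy as np

def rot2d(theta_deg):
    t = np.radians(theta_deg)
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

def intramolecular_vectors(coords, r_max):
    """Non-origin interatomic vectors shorter than r_max (the sphere around the origin)."""
    d = (coords[None, :, :] - coords[:, None, :]).reshape(-1, 2)
    length = np.linalg.norm(d, axis=1)
    return d[(length > 1e-9) & (length < r_max)]

def rotation_score(model_vecs, target_vecs, theta_deg, sigma=0.3):
    """Product-function style overlap of two Gaussian-smeared vector sets."""
    rotated = model_vecs @ rot2d(theta_deg).T
    d2 = ((rotated[:, None, :] - target_vecs[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma**2)).sum()

rng = np.random.default_rng(1)
model = rng.uniform(-5, 5, size=(8, 2))            # toy 2D search model (Å)
crystal_mol = model @ rot2d(90).T + [20.0, 7.0]    # same molecule, rotated and moved

r_max = 8.0
model_vecs = intramolecular_vectors(model, r_max)
target_vecs = intramolecular_vectors(crystal_mol, r_max)

angles = np.arange(360)
scores = np.array([rotation_score(model_vecs, target_vecs, a) for a in angles])
best = angles[np.argmax(scores)]
print("best rotation:", best, "degrees")
```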
[Figure: Patterson of the model calculated with a P1 search box that is too small, compared with the crystal Patterson]

ROTATION FUNCTION WITH SYMMETRY
Symmetry adds complexity to the Patterson of the crystal. In general, the higher the symmetry, the harder it is to find the rotational part of a molecular replacement solution.
[Figure: unknown crystal structure with two-fold symmetry, and its crystal Patterson]

ROTATION FUNCTION WITH SYMMETRY
Our search Patterson will overlay the intramolecular peaks of the crystal Patterson in two orientations, related by a 180 degree rotation.
[Figure: search Patterson overlaid on the crystal Patterson in both orientations, with the intermolecular peaks of the crystal Patterson marked]
We get two solutions in a full 360 degree search; alternatively, we only have to search 180 degrees to get a solution. Symmetry defines the search volume required to find a solution.

ROTATION FUNCTION TARGET
The overlap between the observed and calculated Pattersons is measured with the product function, i.e. the values at corresponding positions in the observed and calculated Patterson maps are multiplied and summed over the search volume. This gives a maximum when the two Pattersons overlap. The origin is omitted to avoid the large origin peak of the Patterson. The sphere is limited in radius because the intramolecular vectors are concentrated near the origin.

Translation section

TRANSLATION FUNCTION
Once we have a rotation function solution for our crystal, we must find where to place the structure relative to the crystal origin, which is defined by the positions of the symmetry operators. We know the unit cell of the crystal, so we can put our oriented model somewhere in the correct unit cell (rather than in a large P1 box).
[Figure: known structure in the unit cell of the unknown crystal structure, and its Patterson]

TRANSLATION FUNCTION
Now let us put the known structure in a different position in the unit cell. In the case of a P1 crystal structure, the Patterson is exactly the same. This is because the only intermolecular vectors are from one molecule to its copies in neighbouring unit cells, and these do not change.
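The P1 statement can be checked numerically: shifting every atom by the same vector t multiplies each F(hkl) by exp(2πi h·t), so the amplitudes, and therefore the Patterson, do not change. A sketch with random fractional coordinates and random Miller indices (both made up).

```python
import numpy as np

rng = np.random.default_rng(2)
model = rng.uniform(0, 1, size=(20, 3))                 # fractional coords in a P1 cell
hkl = rng.integers(-6, 7, size=(50, 3)).astype(float)   # a random set of reflections

def structure_factors(frac):
    """F(hkl) = sum_j exp(2*pi*i * h.x_j) for unit point atoms."""
    return np.exp(2j * np.pi * hkl @ frac.T).sum(axis=1)

t = np.array([0.31, -0.12, 0.58])        # arbitrary shift of the whole model
F0 = structure_factors(model)
F1 = structure_factors(model + t)

# Amplitudes are unchanged; each phase shifts by 2*pi*(h.t).
print(np.allclose(np.abs(F0), np.abs(F1)))
print(np.allclose(F1, F0 * np.exp(2j * np.pi * hkl @ t)))
```

This is exactly why a P1 crystal needs no translation search, and why the search only becomes meaningful once symmetry-related copies move relative to one another.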
In P1 we can position the known structure anywhere in the unit cell and get the correct structure amplitudes.
[Figure: known structure at a new position in the unit cell of the unknown crystal structure, and its unchanged Patterson]

SYMMETRY AND THE TRANSLATION FUNCTION
Now let us consider the effects of symmetry. If we move our known structure in the unit cell, the symmetry-related copies will move according to the symmetry operators. The intermolecular vectors change as the reference structure is moved around the unit cell.
[Figure: unknown crystal structure with two-fold symmetry, in the original position and in a new position]

Structure Factors
• We check the agreement between the structure factor magnitudes calculated from the repositioned model and the experimental ones.
• The magnitudes will be the same for any of the symmetry operators and for any alternate origin.
• (See $CHTML/alternate_origins.html)

SYMMETRY AND THE TRANSLATION FUNCTION
Note that this translation function only tells us the component of the translation perpendicular to the symmetry rotation axis. For higher-symmetry space groups, we can compute such translation functions for each symmetry axis, which gives us all the components of the translation vector.

TRANSLATION PRODUCT FUNCTION
The required translation, relative to the two-fold axis, can be found by translating the intermolecular vector set over the observed Patterson map and computing a product function. When the correct translation is chosen, there should be a large peak because the vector sets coincide.
[Figure: the intermolecular vector set]

TRANSLATION CORRELATION FUNCTION
Since at each search position we know both the orientation and the position of the search model, we can calculate the structure factors from the model in that position in the unit cell. We can then calculate the correlation between the observed and calculated data and see how well they match. The correct translation should give a peak in this map.
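A 1D toy of the translation correlation function, assuming a "space group" with operators x and -x (an inversion centre standing in for the two-fold), made-up model coordinates, and an orientation that is already known. For each trial position t, the model and its symmetry copy give calculated intensities, which are correlated with the "observed" ones. The peak appears at the true position and also at t + 1/2, because this cell has a second inversion centre at x = 1/2: an alternate origin that gives identical amplitudes.

```python
import numpy as np

u = np.array([0.00, 0.03, 0.07, 0.12, 0.16])   # hypothetical model atoms (internal coords)
h = np.arange(1, 31)

def intensities(t):
    """|F(h)|^2 for the model placed at t plus its symmetry copy at -(t + u)."""
    x = t + u
    F = (np.exp(2j * np.pi * np.outer(h, x)).sum(axis=1)
         + np.exp(-2j * np.pi * np.outer(h, x)).sum(axis=1))
    return np.abs(F) ** 2

t_true = 0.13
I_obs = intensities(t_true)          # stands in for the measured data

trial = np.arange(1000) / 1000       # search grid across the cell
cc = np.array([np.corrcoef(I_obs, intensities(t))[0, 1] for t in trial])
best = trial[np.nanargmax(cc)]
print(f"best t = {best:.3f}, CC = {np.nanmax(cc):.3f}")
```

Correlating intensities (rather than amplitudes) is the choice that corresponds to comparing origin-removed Pattersons; either works in this toy.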
This method gives the translation vector directly, with all the symmetry operations considered together. It generally has a better signal-to-noise ratio than the product function. It is equivalent to the correlation coefficient between the two origin-removed Patterson maps.

NON-CRYSTALLOGRAPHIC TRANSLATION VECTOR
If the asymmetric unit contains two molecules related by a pure translation, the native Patterson will have a large peak at the position corresponding to this translation.
Unlike non-crystallographic rotations, non-crystallographic translations are not useful in structure determination. In fact, they introduce awkward structure factor correlations that are not currently accounted for, and they can make structures difficult to refine.
If there is more than one molecule in the asymmetric unit, you should also check for a non-crystallographic translation.
[Figure: asymmetric unit of an unknown crystal structure with a non-crystallographic translation; the crystal Patterson has an origin-like peak at the translation vector]

SELF-ROTATION FUNCTION
If there is more than one molecule in the asymmetric unit (and no NCT), then the rotation function of the Patterson against itself will give a peak at the angle corresponding to the relative rotation between the copies.
The self-rotation function does not need a model! This is useful for confirming or determining how many copies of the structure you have in the asymmetric unit, so it should be one of the first things you do with a new data set. If non-crystallographic symmetry is present, it is extremely useful in MIR and density modification.
[Figure: asymmetric unit of an unknown crystal structure with two-fold symmetry; the crystal Patterson has the same two-fold symmetry near the origin (intramolecular peaks only)]

THE SEARCH MODEL
Molecular replacement only works if you have a good search model.

HOMOLOGY
• The higher the sequence identity, the easier MR is.
• There is a rough inverse correlation between the rms deviation of atomic positions and the percentage sequence identity.
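One widely quoted empirical form of this relationship is that of Chothia & Lesk (1986): rms ≈ 0.40 exp(1.87 H) Å for the main-chain atoms of the common core, where H is the fraction of residues that differ. A small sketch, treating the formula as an approximate trend rather than a prediction for any particular pair of proteins.

```python
import math

def chothia_lesk_rmsd(identity_fraction):
    """Approximate core main-chain rms deviation (Å) between two homologues,
    using the empirical fit rms = 0.40 * exp(1.87 * H), H = 1 - identity."""
    H = 1.0 - identity_fraction
    return 0.40 * math.exp(1.87 * H)

for pid in (100, 70, 50, 30, 20):
    print(f"{pid:3d}% identity -> about {chothia_lesk_rmsd(pid / 100):.2f} Å rms")
```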
It is not possible to give a minimum value of homology at which MR will work. Below 30% identity it is usually difficult, but it can be challenging even with a 100% identical molecule. (See 1tj3)

COMPLETENESS
• The search model should represent a significant fraction of the unknown structure. If it represents only 10% of the unknown structure, MR will be difficult.

FLEXIBILITY
• Some proteins (for instance antibodies) have domains that can adopt different relative orientations with respect to each other.
• MR will be difficult if the search model does not have the same relative orientation of these domains.

SEARCH ENSEMBLES
If you have several homology models, you can combine them for use in molecular replacement using a method called ensembling. The correct weight for each model is calculated from the expected rms difference between the coordinates of the model and those of the structure in the crystal. The expected rms difference is a function of the sequence identity between the model structure and the structure in the crystal. A structure factor set is then calculated from the combined, correctly weighted models, and this set is used in the molecular replacement instead of the structure factors calculated from a single model.

Rotation Matrices
Coordinate transformations and rotations
An introduction to the mathematics of rotations

In molecular replacement, we need to move a model, described as a list of orthogonal coordinates (usually in Å), into a new position and orientation. We must also relate the orthogonal coordinate system to the crystallographic system, which may not be orthogonal, and consider crystallographic symmetry, which applies to coordinates expressed as fractions of the unit cell.
In the orthogonal frame, the transformation is most easily written as a rotation matrix multiplying a set of vectors, plus the translation component. For each atom:
$\mathbf{x}'_i = \mathbf{R}\,\mathbf{x}_i + \mathbf{t}$
For a set of coordinates:
$(\mathbf{x}'_1\ \mathbf{x}'_2\ \mathbf{x}'_3\ \mathbf{x}'_4\ \cdots) = \mathbf{R}\,(\mathbf{x}_1\ \mathbf{x}_2\ \mathbf{x}_3\ \mathbf{x}_4\ \cdots) + (\mathbf{t}\ \mathbf{t}\ \mathbf{t}\ \mathbf{t}\ \cdots)$
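A minimal numpy sketch of the frames involved, assuming the common PDB/CCP4 orthogonalisation convention (a along X, c* along Z), a made-up monoclinic cell, and P2₁ symmetry. The model is moved with x' = Rx + t in the orthogonal frame, converted to fractional coordinates, a symmetry operator is applied there, and the copy is converted back to Å. Because symmetry operators are rigid motions, interatomic distances in the copy must be unchanged.

```python
import numpy as np

def orthogonalisation_matrix(a, b, c, alpha, beta, gamma):
    """Fractional -> orthogonal (Å), a along X and c* along Z; angles in degrees."""
    al, be, ga = np.radians([alpha, beta, gamma])
    v = np.sqrt(1 - np.cos(al)**2 - np.cos(be)**2 - np.cos(ga)**2
                + 2 * np.cos(al) * np.cos(be) * np.cos(ga))
    return np.array([
        [a, b * np.cos(ga), c * np.cos(be)],
        [0.0, b * np.sin(ga), c * (np.cos(al) - np.cos(be) * np.cos(ga)) / np.sin(ga)],
        [0.0, 0.0, c * v / np.sin(ga)],
    ])

M = orthogonalisation_matrix(60.0, 45.0, 80.0, 90.0, 105.0, 90.0)  # hypothetical cell
M_inv = np.linalg.inv(M)

# Model (orthogonal Å, one atom per row) and a trial rotation about Z plus translation.
model = np.array([[1.0, 2.0, 3.0], [2.5, 1.0, 4.0], [0.5, 3.5, 2.0]])
theta = np.radians(30.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([10.0, 5.0, 20.0])

placed = model @ R.T + t            # x'_i = R x_i + t for every atom
frac = placed @ M_inv.T             # orthogonal -> fractional

# Crystallographic symmetry acts on fractional coordinates: P2_1 (-x, y+1/2, -z).
S = np.diag([-1.0, 1.0, -1.0])
s = np.array([0.0, 0.5, 0.0])
sym_copy = (frac @ S.T + s) @ M.T   # apply operator, then back to orthogonal Å
print(np.round(sym_copy, 3))
```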
This talk is principally concerned with the properties and description of the rotation matrix R.
DO NOT ATTEMPT TO COPY DOWN MY EQUATIONS

Rotation matrices
• Properties of a rotation matrix:
• Determinant equals 1.
• Inverse identical to transpose.
• It can be expressed as a function of THREE rotation angles.

Rotation matrices and angles
A rotation is uniquely defined by a rotation matrix, so why do we need rotation angles? The matrix is useful when we know the answer, but not when we are trying to find it. The 9 elements of the 3 × 3 rotation matrix are not independent; they are determined by 3 independent parameters, the rotation angles. To find the answer by an exhaustive search, we need to enumerate all possible rotations, and to optimise a trial solution we need to refine the rotation. For both purposes we need the angle parameters.
Unfortunately, a rotation matrix may be expressed as 3 angles in many ways, and many of these ways have been used in different programs. This is not a problem within any one system, but it is confusing if you have to convert from one program to another. The matrix remains unique, so conversion may be done via the matrix.

Verifying Solution
• We can only do this by generating new coordinates from the model, then testing whether they generate structure amplitudes that agree reasonably well with our observed ones.
• Initial R values are always high (and correlation coefficients correspondingly modest), but the correct solution will (usually!) be slightly better than random, and will refine.

Verifying Solution
• The amplitudes will be the same irrespective of any crystallographic symmetry operator applied to the solution or the unit cell origin chosen, hence it can be difficult to compare solutions from different programs.

How to Refine and Eliminate Bias???
Tomorrow's discussion
• By refinement followed by hand rebuilding into the map?
• ARP/wARP?
• SOLVE/RESOLVE?
• Buccaneer?
• ACORN: density modification?
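A sketch of the angles/matrix relationship from the Rotation matrices slides, assuming one z-y-z Euler convention, R = Rz(α) Ry(β) Rz(γ). MR programs differ in their conventions, so this is illustrative only. It checks the two matrix properties listed in those slides (determinant 1, inverse equal to transpose), recovers the angles from the matrix, and shows that even within one convention a different angle triple, (α + 180°, -β, γ + 180°), gives the same matrix, which is one more reason to compare solutions via the matrix.

```python
import numpy as np

def rz(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def ry(b):
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def euler_zyz_to_matrix(alpha, beta, gamma):
    """R = Rz(alpha) Ry(beta) Rz(gamma); angles in degrees (one z-y-z convention)."""
    a, b, g = np.radians([alpha, beta, gamma])
    return rz(a) @ ry(b) @ rz(g)

def matrix_to_euler_zyz(R):
    """Inverse of euler_zyz_to_matrix for 0 < beta < 180 (gimbal lock not handled)."""
    beta = np.arccos(np.clip(R[2, 2], -1.0, 1.0))
    alpha = np.arctan2(R[1, 2], R[0, 2])
    gamma = np.arctan2(R[2, 1], -R[2, 0])
    return np.degrees([alpha, beta, gamma]) % 360

R = euler_zyz_to_matrix(30.0, 50.0, 120.0)
print(np.isclose(np.linalg.det(R), 1.0))      # determinant equals 1
print(np.allclose(R.T @ R, np.eye(3)))        # inverse equals transpose
print(matrix_to_euler_zyz(R))                  # recovers (30, 50, 120)

# A different angle triple giving the same matrix.
R2 = euler_zyz_to_matrix(210.0, -50.0, 300.0)
print(np.allclose(R, R2))
```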