Phase Improvement:

Phases are not perfect:
- SIR: phase ambiguity without a second heavy atom derivative.
- MIR can suffer from non-isomorphism.
- Calculating the heavy atom positions from the Patterson map, and from them |F_H| and α_H, can be inaccurate.
- MR uses phases from a similar but not identical protein. Rotation and translation are not perfect.

Procedure (step -> purpose):
- After MR: rigid body refinement -> improve the arrangement of a part of the protein
- After MIR or MAD: phase improvement by density modification (not always) -> improve density -> calculate better phases
- Electron density map calculation -> calculate the first map
- Model building -> interpret the map
- Model refinement, protein coordinates + overall B-factor -> fit the structure to your observations
- Add water, ions, ligands … more refinement
- Model refinement, protein coordinates + atomic B-factors
- Model refinement, multiple occupancy and anisotropic B-factors (if atomic resolution data)
- Model validation -> define accuracy

Rigid body refinement:
1. The model contains blocks of predictable structure, such as domains similar to known structures.
2. These structures are treated as rigid bodies.
3. You can refine stretches of secondary structure (α-helices or β-strands) or helical oligonucleotides, but also prosthetic groups.
4. With the techniques of MR you place these structures as well as possible in the electron density to give the best fit to the observed intensities.
5. For example, after MR you can treat the entire molecule as a rigid body and refine its position and orientation in the unit cell to place it more accurately.

Density modification:
MIR, MAD and MR yield phases that you can use to calculate an electron density map. In the best case the resulting map can be interpreted in terms of atomic positions, but often it raises many questions. Methods that improve an electron density map without any additional experimental data (further isomorphous or anomalously scattering derivatives) are known as density modification.
This step can be left out for good MR starting models.

o Solvent flattening:
Each crystal contains channels filled with solvent whose molecules are in a disordered liquid state and therefore present a uniform density. Recognizing these uniform solvent regions allows us to draw surfaces separating solvent from protein regions. Solvent flattening is one of two procedures to do so. It modifies the solvent region by setting it to a mean value and leaves the protein region unaltered. The resulting electron density is used to calculate structure factors with (we hope) improved phases. The observed structure factor amplitudes and the new phases then yield an improved electron density map with much flatter density in the solvent regions. The procedure can be repeated cyclically to remove remaining variations in the solvent region. Today computer techniques have minimized the labour of defining the molecular surface, but there is still a serious risk of assigning protein structure as solvent.

o Averaging:
There can be more than one copy of a structure in the asymmetric unit of a crystal (e.g. if a protein is an oligomer of identical subunits related by one of the point-group symmetries). If the internal symmetry between these copies applies only locally and does not show up in the crystal symmetry, we call it local or non-crystallographic symmetry, which is surprisingly common. The first step of averaging is to find out how the subunits are arranged. This problem is similar to rotation and translation in MR. Once the arrangement is known, the electron density at all symmetry-related points should be identical. We can force the values to be equal by replacing the density at each point with the average density over all equivalent points, and we can do so for every point in the subunit. Errors tend to cancel out. For this, a carefully defined volume must be chosen. Averaging makes structure determination of the capsids of spherical viruses a routine procedure.
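The two real-space operations described above can be sketched on a toy map. This is only an illustration under strong assumptions: the map is one-dimensional, the protein mask is already known, and the non-crystallographic operator is a pure shift by a whole number of grid points. The function names flatten_solvent and ncs_average are made up for this sketch and do not belong to any crystallographic program.

```python
import numpy as np

def flatten_solvent(rho, protein_mask):
    """Set every solvent grid point to the mean solvent density.

    The protein region (protein_mask == True) is left unaltered.
    """
    out = rho.copy()
    solvent = ~protein_mask
    out[solvent] = rho[solvent].mean()
    return out

def ncs_average(rho, mask_a, shift):
    """Average the density of copy A with its copy B = A displaced by `shift`.

    Assumption of this sketch: the NCS operator is a pure grid translation,
    so the point equivalent to index i in copy A is index i + shift in copy B.
    """
    b_on_a = np.roll(rho, -shift)        # b_on_a[i] = rho[i + shift]
    avg = 0.5 * (rho + b_on_a)           # average over the two equivalent points
    out = rho.copy()
    out[mask_a] = avg[mask_a]            # write the average into copy A
    mask_b = np.roll(mask_a, shift)      # region occupied by copy B
    out[mask_b] = np.roll(avg, shift)[mask_b]  # and into copy B
    return out

# Toy map: the same "subunit" (1, 3, 2) twice, each with different errors.
rho = np.zeros(12)
rho[1:4] = [1.2, 2.8, 2.0]
rho[7:10] = [0.8, 3.2, 2.0]
mask_a = np.zeros(12, dtype=bool)
mask_a[1:4] = True
averaged = ncs_average(rho, mask_a, shift=6)
```

In this toy case the errors of the two copies have opposite sign, so they cancel completely on averaging. With real data they only tend to cancel, and the modified map is transformed back to obtain improved phases before the cycle is repeated.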
o Skeletonization: It should be possible to make out a long connected path (the main chain).
o Proteins tend to have a similar frequency distribution of electron density values (this is the basis of histogram matching).

Electron density map calculation:
$\rho_{exp}(x) = V^{-1} \sum_h F_{exp}(h) \exp(-2\pi i\, h x)$ with $F_{exp}(h) = |F_{obs}(h)| \exp[i\alpha_{best}(h)]$

Difference map:
You want to check the validity of a newly interpreted model. To do so, you compare the electron density map calculated from the built model with the map from which it was built:
$\rho_{obs}(x) - \rho_{calc}(x) = V^{-1} \sum_h (F_{obs}(h) - F_{calc}(h)) \exp[-2\pi i\, h x]$
F_calc can be calculated from the coordinates of the built model.
- ρ(Fo-Fc) > 0 where features exist that are not adequately represented in the model
- ρ(Fo-Fc) < 0 where the model contains features that are not supported by the observations
=> In this way you can recognize missing or wrongly placed atoms.

Map after improvement of phases:
When you calculate a map with phases that derive from your model (α_calc), you create an electron density that tends to confirm this model. You cannot eliminate this bias, but you can reduce it by calculating a "2Fo-Fc map". Other variants such as a "3Fo-2Fc map" are also employed. These maps look like ordinary electron density maps of a protein. Model bias can be reduced further by a technique known as sigma-weighting (not covered in the lecture script).
$(2|F_{obs}(h)| - |F_{calc}(h)|) \exp[i\alpha_{calc}(h)] = |F_{obs}(h)| \exp[i\alpha_{calc}(h)] + (|F_{obs}(h)| - |F_{calc}(h)|) \exp[i\alpha_{calc}(h)]$
=> This map (native map plus difference map) should look like the correct model.

Omit map:
Sometimes the interpretation of some parts of the map remains doubtful. We can then delete the atoms of this part and calculate phases from the remainder. These phases are less accurate but give an unbiased estimate for the omitted volume. The omitted volume should be at most one-eighth. This technique can be applied to Fo-Fc maps as well as to 2Fo-Fc maps.
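The map types in this section (Fo-Fc, 2Fo-Fc, 3Fo-2Fc) differ only in two integer weights on |Fobs| and |Fcalc|, which a short numerical sketch can make explicit. It assumes a one-dimensional toy cell with three reflections and made-up amplitudes and phases. The names map_coefficients and synthesize are hypothetical helpers, not functions of a real crystallographic package.

```python
import numpy as np

def map_coefficients(f_obs, f_calc, alpha_calc, n_obs=2, n_calc=1):
    """Complex coefficients (n_obs*|Fobs| - n_calc*|Fcalc|) * exp(i*alpha_calc).

    (n_obs, n_calc) = (1, 0): native map with calculated phases,
                      (1, 1): Fo-Fc difference map,
                      (2, 1): 2Fo-Fc map, (3, 2): 3Fo-2Fc map.
    """
    return (n_obs * f_obs - n_calc * f_calc) * np.exp(1j * alpha_calc)

def synthesize(coeffs, h, x, volume=1.0):
    """rho(x) = V^-1 * sum_h F(h) exp(-2*pi*i*h*x) for a 1D toy cell.

    Only the real part is kept; with a complete set of Friedel pairs
    the imaginary part would cancel by itself.
    """
    waves = np.exp(-2j * np.pi * np.outer(x, h))
    return (waves @ coeffs).real / volume

# Made-up amplitudes and phases for three reflections.
f_obs = np.array([10.0, 5.0, 2.0])
f_calc = np.array([8.0, 6.0, 2.0])
alpha_calc = np.array([0.0, 1.0, 2.5])
h = np.array([1, 2, 3])
x = np.linspace(0.0, 1.0, 8, endpoint=False)

native = map_coefficients(f_obs, f_calc, alpha_calc, 1, 0)
diff = map_coefficients(f_obs, f_calc, alpha_calc, 1, 1)
two_fo_fc = map_coefficients(f_obs, f_calc, alpha_calc)
```

Because the Fourier synthesis is linear, the 2Fo-Fc density equals the native density plus the difference density at every grid point, which is exactly the identity stated above.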
Model building:
Computer programs assist by proposing possible main chain conformations, atomic positions found in similar main chain conformations, and the most frequently found side chain conformations.
o A hierarchical knowledge-based approach (library of well-refined structures):
1. Cα positions
2. Main chain atoms
3. Side chains in correct conformations
o A fully automatic approach: peaks in electron density -> atoms -> atom pattern is interpreted as protein structure

Steps of model building:
1. Produce a continuous trace approximately along the polypeptide (main) chain (skeletonised map).
2. Identify at least one point in the sequence with the help of the density or known markers. Easily recognizable points (sulfur atoms with high density, aromatic side chains, glycines) can help to align a known sequence (see step 5) to a 3D structure seen for the first time.
<- Recognizable amino acids: Se-Met, Hg-Cys
<- Further recognizable features: active site, prosthetic groups etc.
3. The direction of the main chain can be determined with the help of α-helices, whose side chains point toward the N-terminus (Christmas tree). You can use β-sheets for this purpose only when the resolution is high enough to see the carbonyl groups.
4. Place the Cα atoms.
<- 3.8 Å ruler: 3.8 Å is the distance between two consecutive Cα atoms. Search for this distance in the map to find the Cα atoms.
=> Now you can place all further main chain atoms and the Cβ atoms (side chain).
5. Assign the correct sequence number to the residues and identify them if possible. Build residues as poly-Ala where the position in the sequence cannot be determined; this way you will recognize them in the electron density map.
6. Fill in the other atoms (known stereochemistry, library of common backbone conformations).
7. Add side chains (common rotamer conformations; rotamers are isomers interconverted by rotation about a single bond).
8. Manual or (semi)automatic fitting: fit each residue to the density.
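The 3.8 Å ruler from step 4 can also serve as a quick check on a finished Cα trace. The following is a minimal sketch, assuming the Cα coordinates are available as an N x 3 array in Å in chain order. The tolerance of 0.2 Å and the function name ca_ruler_outliers are arbitrary choices for this illustration, not values from the lecture.

```python
import numpy as np

def ca_ruler_outliers(ca_coords, ideal=3.8, tol=0.2):
    """Return (distances, outliers) for consecutive C-alpha atoms.

    distances[i] is the distance between atom i and atom i + 1 in Angstrom;
    outliers lists every i whose distance deviates from `ideal` by more
    than `tol` (the tolerance is an assumption of this sketch).
    """
    ca = np.asarray(ca_coords, dtype=float)
    distances = np.linalg.norm(ca[1:] - ca[:-1], axis=1)
    outliers = np.nonzero(np.abs(distances - ideal) > tol)[0]
    return distances, outliers

# Four made-up C-alpha positions on a line; the last one is misplaced.
trace = [[0.0, 0.0, 0.0],
         [3.8, 0.0, 0.0],
         [7.6, 0.0, 0.0],
         [12.6, 0.0, 0.0]]
distances, outliers = ca_ruler_outliers(trace)
```

Here the pair starting at index 2 is flagged because its Cα-Cα distance is 5.0 Å. A flagged pair can mean a misplaced Cα atom or a genuine exception such as a chain break, so it needs a look at the map rather than an automatic correction.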
Possible errors:
- misplaced Cα atoms
- side chains not in the most common rotamer conformation

R-factor: quality of the model
This factor compares the observed structure factor amplitudes with those calculated from the current model in order to monitor the quality of the model. It indicates how well the model matches the observed data:
$R = \frac{\sum_h \bigl|\,|F_{obs}(h)| - |F_{calc}(h)|\,\bigr|}{\sum_h |F_{obs}(h)|}$
Because we know how the amplitudes change with a change in the coordinates or in the atomic B-factor, we can calculate the partial derivative of the R-factor with respect to each atomic position or to B. A 3D derivative is a gradient, and each atom is moved along this gradient.

Model refinement:
Structure refinement can begin only when most of the atoms are at their correct positions. That is why the preceding steps (density modification, model building) are needed. It is better to leave badly placed atoms out at first; further rounds of model building are therefore needed after refinement. The aim of structure refinement is to adjust a structure (that is, its parameters x, y, z and B) to give the best possible fit to the crystallographic observations:
=> Minimising $E = \sum_h w(h)\,(|F_{obs}(h)| - |F_{calc}(h)|)^2$ (least-squares method)
(The weighting term w(h) can be adjusted according to the accuracy of the observation.)
Refinement is a large-scale task that is possible only because fast computers are available.

Overdetermination:
The model is specified by a number of variables (the coordinates and the B-factors). Refinement procedures can work only when there are enough observations (= reflections: at least as many as the number of variables, but in practice more are needed). Imagine a diffraction experiment with 25,000 measured independent X-ray reflections (the number depends on the resolution: the better the resolution, i.e. the smaller d, the larger the number of reflections) and 2000 non-hydrogen atoms (hydrogen atoms are disregarded because they have only one electron and therefore scatter weakly). With x, y, z and B for each atom you have 8000 parameters.
25,000/8000 is about 3. This is a poor overdetermination, so we incorporate as many additional observations as possible (known bond lengths and angles, which also hold in the protein structure under investigation, solvent flattening, non-crystallographic symmetry, …).

Restraints (= extra information):
o Covalent:
- Torsion/dihedral angles: ω (controls the Cα-Cα distance), φ (controls the C'-C' distance), ψ (controls the N-N distance) and χ (side chains: χ1 up to at most χ5 in arginine)
- Bond angles
- Bond lengths
- Partial double bond: the partial double bond restrains the value of ω.
o Noncovalent:
- Electrostatic (attraction between opposite charges)
- Hydrogen bonding (N-H···O=C; weaker than electrostatic, stronger than van der Waals)
- Van der Waals (between permanent or induced dipoles)
o Chiral volume (the signed volume spanned by the interatomic vectors. Atom 1 is the chiral centre, atoms 2 to 5 are bound to it. Imagine atoms 2 to 4 in the plane of the picture: if atom 1 lies below this plane the chiral volume is positive, otherwise it is negative.)
o Planar groups (aromatic side chains, Glu, Asp, Arg)
o Atomic B-factor: atoms bonded to each other should not have large differences in B-factor.

Restraints are considered in terms of energy:
$E = E_{X-ray} + \sum_{bonds} k\,(d - d_0)^2 + \sum_{angles} k\,(\theta - \theta_0)^2 + \sum_{torsions} k\,[1 + \cos(n\omega + \omega_0)] + \sum_{i,j} \frac{q_i q_j}{4\pi\varepsilon r_{ij}} + \dots$
with $E_{X-ray} = \sum_h w(h)\,(|F_{obs}(h)| - |F_{calc}(h)|)^2$ (least-squares method).
d_0, θ_0 and ω_0 are ideal values, and k is the force constant for bond distortion, which imposes a certain flexibility instead of rigidity. Note that whenever, e.g., d differs from d_0, E becomes larger, which runs counter to the aim of refinement.

Radius of convergence:
In mathematics, the radius of convergence of a power series defines the interval within which the series converges toward a limit. In structure refinement it describes how far away from the truth (the limit) the starting model can be and still be refined to it.
- Increasing the amount of data reduces false local minima.
- These false local minima can be overcome by a good method.
Refinement can only find the nearest minimum. Starting at the red crosses, this can result in good or bad models.
[Figure: R-factor plotted against a parameter, e.g. the x-coordinate of atom 1; red crosses mark the starting points.]

To escape from local minima, programs alternate between two techniques:
- sessions of interactive graphics
- automatic model improvement
After a round of computer refinement, 2Fo-Fc maps (correct model) and Fo-Fc maps (best estimate of the difference) are calculated.
[Figure panels: negative difference density indicating a wrong rotamer; positive difference density indicating a wrong rotamer; 2Fo-Fc density showing the right conformation.]

More refinement with the help of waters, ions or ligands:
- Calculate a difference map and interpret positive peaks (features not present in the model) by adding waters, ions or ligands.
- Ad water: a water molecule contains two hydrogen atoms (only one electron each -> weak scattering) and one oxygen atom (eight electrons, six of them valence electrons -> good scattering). A well-ordered water molecule is more important than a poorly ordered part of the protein. You need high resolution to recognize waters: better than 2.8 Å-3 Å. You place only the oxygen at a positive peak, provided no other atom occupies the peak but there is an atom nearby. Also avoid putting waters into features that are better interpreted as ions, ligands, or unbuilt or misbuilt protein.

Ad B-factor:
- B(main chain) < B(side chain)
- The difference between maximum and minimum B is bigger in side chains (5-60 Å²) than in the main chain (5-35 Å²).
- As expected, the B-factors of disordered parts of the protein are higher.

Ad anisotropic B-factors:
Anisotropy = dependence on direction. The anisotropic B-factor allows the density distribution of an atom to show the particular direction in which the atom is more free to move. As a result, the atomic density cloud can be modelled by an ellipsoidal Gaussian with six parameters (the six B_ij). The total number of parameters is therefore no longer four (x, y, z, B) but nine per atom.
The refinement of anisotropic B-factors needs high resolution (as the refinement of atomic B-factors does) and a high-quality model.

Ad multiple occupancy:
Side chains of, e.g., tyrosine, serine or valine are flexible and can have more than one conformation, so their atoms have more than one location. Therefore we must define multiple alternative locations (multiple occupancies) for these atoms.

Validation:
When the best possible model has been built, we need some insight into its accuracy and reliability.

R-factor:
$R = \frac{\sum_h \bigl|\,|F_{obs}(h)| - |F_{calc}(h)|\,\bigr|}{\sum_h |F_{obs}(h)|}$
Our aim is a maximum R-factor of 20%.
- Proteins: 15%-20%
- Small molecules: 2%-6%
The ideal R-factor value would be zero. For comparison, randomly distributed atoms give:
- in a centrosymmetric space group: R = 0.83
- in a noncentrosymmetric space group: R = 0.59

Free R-factor:
Sometimes R-factors reach surprisingly low values for models that later turn out to be incorrect, for instance because the number of model parameters was chosen too high. Therefore Brünger (1992, 1993) introduced the free R-factor. He divided the observed reflections (cross-validation) into a working set and a test set, a random selection of 5-10% of the reflections that is not correlated with the reflections in the working set. While the working set is used for refinement, the test set is used to calculate the free R-factor:
$R_{free} = \frac{\sum_{hkl \in T} \bigl|\,|F_{obs}| - k\,|F_{calc}|\,\bigr|}{\sum_{hkl \in T} |F_{obs}|}$
for all reflections hkl belonging (∈) to the test set T.
The underlying idea is that if refinement genuinely improves a structure, R(working set) as well as the free R will decrease. But if R(working set) decreases because of fitting to noise, the free R will increase. The free R-factor is an unbiased monitor of structure refinement and helps to prevent over-fitting of the structure to the observations by asking how well the model predicts data it HAS NOT BEEN FIT TO (the test set).
- In practice the free R-factor is about 5 percentage points higher than the normal one.
- Low resolution (3 Å): Rfree = 30%
- High resolution (better than 1.5 Å): Rfree approaches R

RMSD:
Root mean square deviation (a frequently used measure of the differences between an actual value and its ideal) from ideal bond lengths, angles, planes, chiral volumes etc.
- bond lengths: 0.02 Å (C-C 1.54 Å, N-C 1.43 Å, O-H···O-H 2.8 Å)
- bond angles: 4°

Ramachandran plot:
The stereochemistry of the main chain can be investigated by plotting the dihedral angle ψ against the dihedral angle φ. The diagram distinguishes allowed, partially allowed and disallowed regions, depending on van der Waals distances and tetrahedral angles (109.5° for H-C-H, the tetrahedral shape of, e.g., methane). In the partially allowed regions the van der Waals radii are (only) slightly violated. As refinement proceeds and the structure improves, the distribution of the angles in the plot improves as well. For highly refined structures nearly all φ/ψ values therefore lie in the allowed regions. A Ramachandran plot is made after every session of manual fitting or automated refinement.

Luzzati plot:
Luzzati (1952) developed an indicator of the precision of the atomic coordinates of the molecular model. The precision depends on the resolution and quality (Rmerge) of the diffraction data. Best value: 1/5-1/10 of the maximum resolution.
Low precision is due to
- thermal motion* (atomic vibration around a mean position)
- model disorder (static* or dynamic: atoms or groups of atoms are not at equivalent positions in all molecules)
* not distinguishable, reflected in the thermal B-factor
=> For this reason a crystal structure determined by X-ray diffraction is a time and space average.
The underlying assumption of the Luzzati plot is that the difference between |Fobs| and |Fcalc| in the R-factor is due exclusively to errors in the positions of the atoms. Therefore the R-factor is plotted against the reciprocal of d. It is a function of $\overline{\Delta r} \cdot d^{-1}$, where $\overline{\Delta r}$ is the average error in the atomic coordinates of the molecular model.
Lines for different values of the average coordinate error are calculated, and the line closest to the experimental curve (of our protein) determines the value for our crystal structure.

The final result: 3D coordinates of the atoms of the molecule
PDB format (the Protein Data Bank file format is a textual file format. It describes the 3D structure of a protein and often of the waters, ions, ligands etc. crystallized together with the protein. It lists orthogonalised coordinates in Å, temperature factors and occupancies.)

We can calculate:
o Bond lengths (coordination distances)
o Angles
o Torsion angles (angles between the normals of two planes)
o H bonds and electrostatic interactions
o Planes and the angles between them

Programs:
o For stereochemistry
o For surfaces
o For comparison of two molecules
o For secondary structure

Ad surface programs:
o Molecular surface area (MSA, determined by atomic radii)
o Solvent accessible area (SAA, bigger than MSA, determined by van der Waals radii)
- Surface exposed to solvent and to other macromolecules, ligands, inhibitors, substrates etc.
- Macromolecule volume
- Drug targets: cavities and pockets
- Surface between two or more macromolecules, as in a dimer, between protein and DNA or between protein and ligand
- SAA: which amino acids are buried or exposed?
- Surface properties: electrostatic potential and amino acid conservation (as derived from the amino acid sequence alignment)

We can compare two structures:
Comparison is carried out by superposition of the two molecules (Mol1 = Mol2*Rot+Tra). We can then determine the average RMSD between equivalent atoms in Å and their individual distances, also in Å.
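The superposition Mol1 = Mol2*Rot+Tra and the resulting RMSD can be sketched as follows. The notes do not say how Rot and Tra are obtained; this sketch uses the Kabsch algorithm (a least-squares fit based on a singular value decomposition), which is one common choice. It assumes that both molecules are given as N x 3 arrays of equivalent atoms in the same order, and the function name superpose is made up for the illustration.

```python
import numpy as np

def superpose(mol1, mol2):
    """Least-squares fit of mol2 onto mol1: mol1 ~ mol2 @ rot + tra.

    Returns (rot, tra, rmsd, distances) for row-vector coordinates.
    Uses the Kabsch algorithm, an assumption of this sketch.
    """
    c1, c2 = mol1.mean(axis=0), mol2.mean(axis=0)
    # Covariance between the centred mobile (mol2) and target (mol1) atoms.
    cov = (mol2 - c2).T @ (mol1 - c1)
    u, _, vt = np.linalg.svd(cov)
    # Guard against an improper rotation (mirror image).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot_col = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # acts on column vectors
    rot = rot_col.T                                # row-vector convention
    tra = c1 - c2 @ rot
    distances = np.linalg.norm(mol2 @ rot + tra - mol1, axis=1)
    rmsd = np.sqrt((distances ** 2).mean())
    return rot, tra, rmsd, distances

# Made-up coordinates: mol2 is mol1 rotated by 30 degrees about z and shifted.
mol1 = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.2, 0.0],
                 [0.3, 1.0, 2.0], [2.2, 0.4, 1.1]])
a = np.radians(30.0)
rz = np.array([[np.cos(a), -np.sin(a), 0.0],
               [np.sin(a), np.cos(a), 0.0],
               [0.0, 0.0, 1.0]])
mol2 = mol1 @ rz.T + np.array([5.0, -3.0, 2.0])
rot, tra, rmsd, distances = superpose(mol1, mol2)
```

For this constructed pair the fit is exact, so the RMSD is zero within rounding. For two real structures the per-atom distances show where the molecules differ, while the RMSD condenses them into one number.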
Resolution:
http://www-structmed.cimr.cam.ac.uk/Course/Overview/Overview.html: “If the atoms were completely still, the molecules throughout the crystal were in identical conformations, and the crystal were perfectly ordered, then all the molecules would scatter in phase regardless of the angle of scattering and we would be able to collect diffraction data to a limit imposed only by the wavelength of the X-rays. The electron density map would have peaks at each of the atomic positions. But reality is rarely so favourable. Proteins are generally fairly flexible, and crystals have lattice disorder, i.e. the repeating units are not necessarily perfectly aligned throughout the crystal. So as we start to look at finer details by going to higher scattering angles, the diffraction pattern starts to cancel out. For this reason, most protein structures are limited to a level of detail where atoms are not resolved from one another. What we see is typically tubes of electron density for atoms that are bonded together.”

Limit of resolution: 0.707*d_min
Bragg's law: $n\lambda = 2d\sin\theta$ -> $d_{min} = \lambda / (2\sin\theta_{max})$
$\theta_{max} = \tfrac{1}{2}\arctan\left[\frac{L/2}{S}\right]$
[Figure: diffraction geometry with the labels 2θ, L/2 and S; presumably L/2 is half the detector width and S the crystal-to-detector distance.]
Small d means high resolution.

Interpretable density maps: resolution of 2.5 Å to 3 Å
This resolution is too low to resolve two covalently bound atoms. Nevertheless, we can build the model by using building blocks (groups of atoms instead of individual atoms) such as amino acid residues.
o Individual atoms can be fitted at a resolution of 1 Å.
o Alpha helices can be seen at 6 Å. Beta sheets cannot be seen at this resolution.
o Very low resolution: large portions have to be fitted at one time. At a resolution lower than 8 Å only whole molecules can be placed.
o High resolution amplitudes depend on B more than low resolution amplitudes do. Therefore we need high resolution (2.5 Å or better) to refine atomic B-factors.
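Bragg's law and the detector geometry given above can be combined into a small calculation of the best resolution an experimental setup can record. This sketch assumes first-order diffraction (n = 1), a flat detector centred on the direct beam, and that L and S are the detector width and the crystal-to-detector distance. That reading of the figure labels, the function names and the numbers in the example are assumptions.

```python
import math

def theta_max(detector_width, distance):
    """Largest Bragg angle (radians) reaching the detector edge.

    tan(2*theta_max) = (L/2) / S, with L = detector_width and
    S = distance, both in the same length unit.
    """
    return 0.5 * math.atan((detector_width / 2.0) / distance)

def d_min(wavelength, theta):
    """Smallest resolvable spacing from Bragg's law with n = 1."""
    return wavelength / (2.0 * math.sin(theta))

# Made-up setup: detector as wide as twice the distance, so 2*theta_max = 45 degrees.
theta = theta_max(detector_width=300.0, distance=150.0)
resolution = d_min(wavelength=1.0, theta=theta)   # in the unit of the wavelength
limit = 0.707 * resolution                        # "limit of resolution" from the notes
```

Moving the detector closer to the crystal (smaller S) or using a wider detector (larger L) increases θ_max and therefore lowers d_min, provided the crystal actually diffracts to that angle.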