R-Factor

Phase Improvement:
Phases are not perfect:
- SIR: phase ambiguity without a second heavy-atom derivative
- MIR can suffer from non-isomorphism
- Calculating the heavy-atom positions from the Patterson map, and from them |FH| and αH, can be
inaccurate.
- MR uses phases from a similar but not identical protein.
- Rotation and translation are not perfect.
Procedure:
1. After MR: rigid body refinement (improve the arrangement of a part of the protein)
2. After MIR or MAD: phase improvement by density modification, not always (improve the density
-> calculate better phases)
3. Electron density map calculation (calculate the first map)
4. Model building (interpret the map)
5. Model refinement, protein coordinates + overall B-factor (fit the structure to your
observations)
6. Add water, ions, ligands … more refinement
7. Model refinement, protein coordinates + atomic B-factors
8. Model refinement, multiple occupancy and anisotropic B-factors (if atomic resolution data)
9. Model validation (define accuracy)
Rigid body refinement:
1. The model contains blocks of predictable structure, such as domains similar to known
structures.
2. These structures are treated as rigid bodies.
3. You can refine stretches of secondary structure (α-helices or β-strands) or helical
oligonucleotides, but also prosthetic groups.
4. With techniques of MR you place these structures as well as possible in the electron density
to give the best fit to observed intensities.
5. For example after MR you can treat the entire molecule as a rigid body and refine its
position and orientation in the unit cell to arrange it more properly.
Density modification:
MIR, MAD and MR yield phases you can use to calculate an electron-density map. In the best case
the resulting map can be interpreted in terms of atomic positions. But often the resulting map
raises many questions. There are methods to improve an electron-density map without any more
experimental data (more isomorphous or anomalous scattering derivatives) which are known as
density modification. This step can be left out for good MR starting models.
Solvent flattening:
Each crystal contains channels filled with solvent, whose molecules are in a disordered liquid
state and therefore present a uniform density. Recognizing these uniform solvent regions allows us
to draw surfaces separating solvent from protein regions. Solvent flattening is one of two
procedures that make use of this.
It sets the density in the solvent region to its mean value and leaves the protein region unaltered.
The resulting electron density is used to calculate structure factors with improved phases (so
we hope). The observed structure factor and the new phases now yield an improved electron
density map with much flatter density in solvent regions. The procedure can be repeated
cyclically to remove remaining variations in the solvent region.
Today computer techniques have minimized the labour of defining the molecular surface. But
there is still a serious hazard of assigning structure as solvent.
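The flattening step itself can be sketched in a few lines of Python. This is only a minimal illustration, assuming the map is stored as a flat list of grid values and that the solvent mask is already known; the names flatten_solvent, density and is_solvent are made up for this example and are not taken from any crystallographic program.

```python
def flatten_solvent(density, is_solvent):
    """Set every solvent grid point to the mean solvent density.

    density    -- electron-density values on a grid, flattened to a list
    is_solvent -- booleans, True where the grid point lies in the solvent
    """
    solvent_values = [rho for rho, s in zip(density, is_solvent) if s]
    if not solvent_values:
        return list(density)  # no solvent points, nothing to flatten
    mean_solvent = sum(solvent_values) / len(solvent_values)
    # Protein points are left unaltered; solvent points become the mean.
    return [mean_solvent if s else rho for rho, s in zip(density, is_solvent)]


density = [0.9, 1.4, 0.2, 0.4, 0.3, 1.1]
mask = [False, False, True, True, True, False]
print(flatten_solvent(density, mask))
```

In a real cycle the flattened map would be back-transformed to obtain new phases, which are then combined with the observed amplitudes; that part is not shown here.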
Averaging:
There can be more than one copy of a structure in the asymmetric unit of a crystal (e.g. if a
protein is an oligomer that consists of identical subunits related by one of the point-group
symmetries). If the internal symmetry between these copies applies only locally and does not show
up in the crystal symmetry, we call it local or non-crystallographic symmetry, which is
surprisingly common.
The first step of averaging is to find out how the subunits are arranged. This problem is similar
to rotation and translation in MR. After this the electron density of all symmetry related points
should be identical. We can force them to be equal by replacing the density at some points by
the average density at all equivalent points. We can do so for every point in the subunit. Errors
tend to cancel out. For this a carefully defined volume must be chosen.
Averaging makes structure determination for the capsids of spherical viruses a routine
procedure.
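The averaging step can be illustrated with a small Python sketch. It assumes that the symmetry-related grid points have already been identified (the hard part, not shown); the function average_ncs and the point labels are hypothetical.

```python
def average_ncs(density, equivalent_points):
    """Replace the density at symmetry-equivalent points by their average.

    density           -- dict mapping a grid-point label to its density
    equivalent_points -- list of tuples; each tuple holds the labels of the
                         points related by the non-crystallographic symmetry
    """
    averaged = dict(density)  # points outside the averaged volume stay as they are
    for group in equivalent_points:
        mean = sum(density[p] for p in group) / len(group)
        for p in group:
            averaged[p] = mean
    return averaged


rho = {"a1": 1.0, "a2": 3.0, "b1": 0.5, "b2": 1.5, "solvent": 0.1}
print(average_ncs(rho, [("a1", "a2"), ("b1", "b2")]))
```

Random errors at equivalent points tend to cancel in the mean, which is why the averaged map is usually cleaner than the original one.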
Skeletonization:
It should be possible to make out a long connected path (the main chain) in the density.
Histogram matching:
Proteins tend to have a similar frequency distribution of electron density values.
Electron-density map calculation:
$\rho_{exp}(x) = V^{-1} \sum_h F_{exp}(h) \exp(-2\pi i\, h \cdot x)$
with $F_{exp}(h) = |F_{obs}(h)| \exp[i\alpha_{best}(h)]$
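To see how amplitudes and phases combine into a density, here is a one-dimensional Python sketch of the Fourier summation. It is a toy example under simplifying assumptions (one dimension, a handful of reflections); density_1d and its arguments are illustrative names only.

```python
import cmath
import math


def density_1d(x, amplitudes, phases, volume=1.0):
    """rho(x) = (1/V) * sum_h |F(h)| exp(i*alpha(h)) exp(-2*pi*i*h*x).

    amplitudes, phases -- dicts keyed by the integer index h (phases in radians).
    Both Friedel mates (h and -h, with opposite phases) must be present
    for the sum to be real.
    """
    total = 0j
    for h, amplitude in amplitudes.items():
        structure_factor = amplitude * cmath.exp(1j * phases[h])
        total += structure_factor * cmath.exp(-2j * math.pi * h * x)
    return total.real / volume


amps = {0: 2.0, 1: 1.0, -1: 1.0}
phis = {0: 0.0, 1: 0.0, -1: 0.0}
for x in (0.0, 0.25, 0.5):
    print(x, round(density_1d(x, amps, phis), 6))
```

Changing the phases while keeping the amplitudes fixed moves the density peaks, which is why phase errors matter so much for the map.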
Difference map:
You want to check the validity of a newly interpreted model. For this you compare the electron
density map of a built model with the map from which it has been built.
$\rho_{obs}(x) - \rho_{calc}(x) = V^{-1} \sum_h (F_{obs}(h) - F_{calc}(h)) \exp(-2\pi i\, h \cdot x)$
Fcalc can be calculated from the coordinates of the built model.
ρ(Fo-Fc) > 0 where features exist that are not adequately represented in the model
ρ(Fo-Fc) < 0 where the model contains features not supported by the observations
=> So you can recognize missing or wrongly placed atoms.
Map after improvement of phases:
When you use phases obtained from density modification you create an electron density which
tends to confirm your model. You cannot eliminate this bias, but you can reduce it by calculating
a “2Fo-Fc map”. Other variants such as a “3Fo-2Fc map” are also employed. These maps look like
ordinary electron density maps of a protein. Model bias can be further reduced by a technique
known as sigma-weighting (not mentioned in the script).
$(2|F_{obs}(h)| - |F_{calc}(h)|) \exp[i\alpha_{calc}(h)] = |F_{obs}(h)| \exp[i\alpha_{calc}(h)] + (|F_{obs}(h)| - |F_{calc}(h)|) \exp[i\alpha_{calc}(h)]$
=> This map (native map plus difference map) should look like the correct model.
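The decomposition into native map plus difference map can be checked numerically for a single reflection. The helper map_coefficient below is a hypothetical name; it simply builds (n|Fobs| - m|Fcalc|) exp(iα_calc), so n=2, m=1 gives the 2Fo-Fc coefficient and n=1, m=1 the Fo-Fc coefficient.

```python
import cmath


def map_coefficient(f_obs, f_calc, phase_calc, n=2, m=1):
    """Return (n*|Fobs| - m*|Fcalc|) * exp(i*alpha_calc) for one reflection."""
    return (n * f_obs - m * f_calc) * cmath.exp(1j * phase_calc)


f_obs, f_calc, alpha = 10.0, 8.0, 0.7  # made-up values for illustration
two_fo_fc = map_coefficient(f_obs, f_calc, alpha)             # 2Fo-Fc
native = map_coefficient(f_obs, f_calc, alpha, n=1, m=0)      # Fo with model phase
difference = map_coefficient(f_obs, f_calc, alpha, n=1, m=1)  # Fo-Fc
print(abs(two_fo_fc - (native + difference)) < 1e-12)
```

The same helper with n=3, m=2 would give the 3Fo-2Fc variant mentioned above.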
Omit map:
Sometimes the interpretation of some parts of the map is still doubtful. We can then delete the
atoms of this part and calculate the phases from the remainder. These phases are less accurate but
give an unbiased estimate for the omitted volume. The omitted volume should be at most one-eighth.
This technique can be applied to Fo-Fc as well as to 2Fo-Fc maps.
Model building:
Computer assistance proposes possible conformations of the main chain and atomic positions
found in similar main chain conformations and side chain conformations most frequently found.
o A hierarchical knowledge-based approach (using a library of well-refined structures):
1. Cα positions
2. Main chain atoms
3. Side chains in correct conformations
o A fully automatic approach:
peaks in electron density -> atoms -> atom pattern is interpreted as protein structure
Steps of model building:
1. Produce a continuous trace approximately along the polypeptide (main) chain
(skeletonised map)
2. Identify at least one point in the sequence with the help of the density or known
markers.
Easily recognizable points (sulfur atoms of high density, aromatic side chains, glycines)
can help align a known sequence (see step 5) to a 3D structure seen for the first
time.
<- recognizable amino acids: Se-Met, Hg-Cys
<- Further recognizable: active site, prosthetic groups etc.
3. The direction of the main chain can be determined with the help of α-helices, whose side
chains point toward the N-terminus (Christmas tree). You can use β-sheets for this
purpose only when your resolution is high enough to see the carbonyl groups.
4. Place the Cα atoms
<- 3.8Å ruler:
3.8Å is the distance between two consecutive Cα atoms.
Search for this distance in the map to find the Cα atoms.
=> Now you can place all further main chain atoms and the Cβ atoms (side chain).
5. Assign the correct sequence number to the residues and identify them if possible. Build
residues as poly-Ala where the position in the sequence cannot be determined.
In this way you will recognize them in the electron density map.
6. Fill in other atoms (<- known stereochemistry, library of common backbone
conformations)
7. Add side chains (<- common rotamer conformations: rotamers are isomers
interconverted by rotation about a single bond.)
8. Manual or (semi)automatic fitting: fit each residue to the density.
Possible errors:
- misplaced Cα atoms
- side chains not in the most common rotamer conformation
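The 3.8Å ruler from step 4 can also be turned around and used as a quick check for misplaced Cα atoms. The following Python sketch assumes the Cα coordinates are available as (x, y, z) tuples in Å; the function name and the tolerance of 0.3Å are arbitrary choices for illustration, not values from the lecture.

```python
import math

CA_CA_DISTANCE = 3.8  # Å, distance between two consecutive C-alpha atoms


def misplaced_ca_pairs(ca_coords, tolerance=0.3):
    """Return the indices i for which the distance between C-alpha i and
    C-alpha i+1 deviates from 3.8 Å by more than `tolerance` (illustrative cutoff)."""
    suspicious = []
    for i in range(len(ca_coords) - 1):
        distance = math.dist(ca_coords[i], ca_coords[i + 1])
        if abs(distance - CA_CA_DISTANCE) > tolerance:
            suspicious.append(i)
    return suspicious


trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (3.8, 3.8, 6.0)]
print(misplaced_ca_pairs(trace))
```

A flagged pair does not say which of the two atoms is wrong, only that the trace should be inspected there.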
R-Factor: Quality of the model
This factor compares the observed structure amplitudes to those calculated from the
current model to monitor the quality of the model. It indicates how well the model
matches the observed data:
$R = \frac{\sum_h \big|\,|F_{obs}| - |F_{calc}|\,\big|}{\sum_h |F_{obs}|}$
Because we know how the amplitudes change with a change in coordinates or in atomic B-factor,
we can calculate the partial derivative of the R-factor with respect to each atomic position
or to B. A 3D derivative is a gradient, and each atom is moved along this gradient.
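As a small numerical illustration, the R-factor can be computed directly from two lists of amplitudes. This Python sketch assumes the amplitudes are already on the same scale; r_factor is just an illustrative name and the numbers are made up.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over all reflections.

    f_obs, f_calc -- lists of (non-negative) structure-factor amplitudes
    """
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(f_obs)
    return numerator / denominator


f_obs = [100.0, 50.0, 25.0, 25.0]
f_calc = [90.0, 55.0, 30.0, 20.0]
print(r_factor(f_obs, f_calc))  # 25 / 200 = 0.125
```

A perfect model would give R = 0, because every calculated amplitude would equal the observed one.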
Model refinement:
Structure refinement can only begin when most of the atoms are at their correct position.
Therefore we need the preceding steps (density modifications, model building). It is better to leave
badly placed atoms out at first. Therefore further rounds of model building are needed after
refinement.
The aim of structure refinement is to adjust a structure (that means its parameters x, y, z and B)
to give the best possible fit to the crystallographic observations:
=> Minimising $E = \sum_h w(h)\,(|F_{obs}(h)| - |F_{calc}(h)|)^2$
(least-squares method)
(The weighting term w(h) can be adjusted according to the accuracy of the observation.)
Refinement is a large computation that is only feasible because fast computers are available.
Overdetermination:
The model is specified by a number of variables (the coordinates and the B-factors). Refinement
procedures can only work when there are enough observations (= reflections): at least as many as
the number of variables, and in practice more.
Imagine a diffraction experiment with 25,000 measured independent X-ray reflections (the number
depends on the resolution: the better the resolution (small d), the larger the number of
reflections) and 2000 non-hydrogen atoms (hydrogen atoms are disregarded because they have only one
electron and therefore scatter weakly). With four parameters per atom (x, y, z and B) you have
8000 parameters. 25,000/8000 is about 3.
This is a poor overdetermination, so we incorporate as many additional observations as possible
(known bond lengths and angles, which also hold in the protein structure under investigation,
solvent flattening, noncrystallographic symmetry, …).
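The arithmetic of this example is easy to reproduce. The Python sketch below uses a hypothetical helper, observations_per_parameter, and also shows how the ratio drops when nine parameters per atom are refined (x, y, z and six anisotropic Bij).

```python
def observations_per_parameter(n_reflections, n_atoms, params_per_atom=4):
    """Ratio of observations to refined parameters (x, y, z, B by default)."""
    return n_reflections / (n_atoms * params_per_atom)


print(observations_per_parameter(25_000, 2_000))      # 3.125, isotropic B
print(observations_per_parameter(25_000, 2_000, 9))   # about 1.39, anisotropic B
```

The second number illustrates why anisotropic refinement is only sensible with many more reflections, i.e. at much higher resolution.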
Restraints (= extra information):
o Covalent:
- Torsion/dihedral angles: ω (controls the Cα-Cα distance), φ (controls the C'-C' distance) and
ψ (controls the N-N distance); in side chains χ1 up to at most χ5 (in arginine)
- Bond angles
- Bond lengths
The partial double bond of the peptide bond restrains the value of ω.
o Noncovalent:
- Electrostatic (attraction between opposite charges)
- Hydrogen bonding (N-H ··· O=C, weaker than electrostatic, stronger than van der Waals)
- Van der Waals (between permanent or induced dipoles)
o Chiral volume
(volume of the cell spanned by the interatomic vectors; atom 1 is the chiral center, atoms 2 to 5
are bound to it. Imagine atoms 2 to 4 in the plane of the picture. If atom 1 lies below this plane
the chiral volume is positive, otherwise it is negative.)
o Planar groups (aromatic side chains, Glu, Asp, Arg)
o Atomic B-factor:
Atoms bonded to each other should not have large B-factor differences.
Restraints are considered in terms of energy:
$E = E_{X\text{-}ray} + \sum_{bonds} k\,(d - d_0)^2 + \sum_{angles} k\,(\theta - \theta_0)^2 + \sum_{torsion} k\,[1 + \cos(n w + w_0)] + \sum_i \sum_j \frac{q_i q_j}{4\pi\varepsilon r_{ij}} + \dots$
with $E_{X\text{-}ray} = \sum_h w(h)\,(|F_{obs}(h)| - |F_{calc}(h)|)^2$ (least-squares method)
d0, θ0 and w0 are ideal values and k is the force constant for bond distortion, which
imposes a certain flexibility instead of rigidity. Be aware that when, e.g., d differs from d0, E
becomes larger. That runs counter to the aim of refinement, which is to minimise E.
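To make the interplay of the X-ray term and a restraint term concrete, here is a reduced Python sketch that keeps only the X-ray term and the bond-length term of the energy. All function names, the force constant and the numbers are assumptions chosen for illustration.

```python
def xray_energy(f_obs, f_calc, weights):
    """E_X-ray = sum_h w(h) * (|Fobs(h)| - |Fcalc(h)|)^2"""
    return sum(w * (fo - fc) ** 2 for fo, fc, w in zip(f_obs, f_calc, weights))


def bond_energy(lengths, ideal_lengths, k=100.0):
    """sum over bonds of k * (d - d0)^2; k is an arbitrary force constant here."""
    return sum(k * (d - d0) ** 2 for d, d0 in zip(lengths, ideal_lengths))


def total_energy(f_obs, f_calc, weights, lengths, ideal_lengths, k=100.0):
    """Only two of the terms of the full energy are included in this sketch."""
    return xray_energy(f_obs, f_calc, weights) + bond_energy(lengths, ideal_lengths, k)


print(total_energy([10.0, 20.0], [9.0, 22.0], [1.0, 0.5],
                   [1.54, 1.60], [1.54, 1.54]))
```

A bond stretched from 1.54Å to 1.60Å adds to E even if it improves the fit to the data slightly, so the minimiser accepts such a distortion only when the X-ray term gains more than the restraint term loses.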
Radius of convergence:
In mathematics, the radius of convergence of a power series defines the interval in which the
series converges toward a limit.
In structure refinement it describes how far away from the truth (the limit) the model can be
while refinement still converges to it.
- Increasing the amount of data reduces the number of false local minima.
- These false local minima can be overcome by a good method.
Refinement can only find the nearest minimum. Depending on the starting point (red crosses in the
figure), this can result in a good or a bad model.
[Figure: R-factor plotted against one parameter, e.g. the x-coordinate of atom 1]
To escape from local minima programs use two alternating techniques:
- session of interactive graphics
- automatic model improvement
After a round of computer refinement 2Fo-Fc maps (correct model) and Fo-Fc maps (best
estimate of the difference) are calculated:
[Figure panels: negative difference density indicating a wrong rotamer; positive difference
density indicating a wrong rotamer; 2Fo-Fc density indicating the right conformation]
More refinement with the help of waters, ions or ligands:
- Calculate a difference map and interpret positive peaks (features not yet in the model) by
adding waters, ions or ligands.
- ad water: It contains two hydrogen atoms (only one electron each -> weak scattering) and one
oxygen atom (eight electrons, six of them valence electrons -> good scattering).
A well-ordered water molecule is more important than a poorly ordered part of the protein.
You need a high resolution to recognize waters: better than 2.8Å-3Å.
Place only the oxygen at a positive peak, and only if no other atom is located there but there is
an atom nearby. Also avoid putting waters into features better interpreted as ions, ligands, or
unbuilt or misbuilt parts of the protein.
Ad B-factor:
- B(main chain) < B(side chain)
- The range between minimum and maximum B is larger in side chains (5-60Å²) than in the main
chain (5-35Å²)
- As expected, the B-factors of disordered parts of the protein are higher.
Ad anisotropic B-factors:
Anisotropy = dependence on direction:
Anisotropic B-factors allow the density distribution of an atom to show the particular directions
in which it is freer to move. The atomic density cloud is then modeled by an ellipsoidal
Gaussian with six parameters (6 Bij). The total number of parameters per atom is therefore no
longer four (x, y, z, B) but nine (x, y, z and six Bij).
The refinement of anisotropic B-factors needs high resolution (as the refinement of atomic
B-factors does) and a high-quality model.
Ad multiple occupancy:
For example, side chains of tyrosine, serine or valine are flexible and can adopt more than one
conformation, so their atoms have more than one location. Therefore we must define multiple
alternative locations (multiple occupancies) for these atoms.
Validation:
When the creation of the best possible model is finished we need some insight into its accuracy and
reliability:
R-Factor:
$R = \frac{\sum_h \big|\,|F_{obs}| - |F_{calc}|\,\big|}{\sum_h |F_{obs}|}$
Our aim is an R-factor of at most 20%.
- Proteins: 15% - 20%
- Small molecules: 2% - 6%
The ideal R-factor value would be zero.
Compare: randomly distributed atoms
- in a centrosymmetric space group: R = 0.83
- in a noncentrosymmetric space group: R = 0.59
Free R-factor:
Sometimes R-factors reach surprisingly low values for models that later turn out to be incorrect,
for instance because the number of model parameters was chosen too high. Therefore Brünger (1992,
1993) introduced the free R-factor. He divided the observed reflections (cross-validation) into a
working set and a test set, a random selection of 5-10% of the reflections that is not correlated
with the reflections in the working set. While the working set is used for refinement, the test
set is used only for the calculation of the free R-factor.
$R_{free} = \frac{\sum_{hkl \in T} \big|\,|F_{obs}| - k|F_{calc}|\,\big|}{\sum_{hkl \in T} |F_{obs}|}$
for all reflections hkl belonging to (∈) the test set T; k is a scale factor
The underlying idea is that if a structure is genuinely improved by refinement, R(working set)
as well as the free R will decrease. But if R(working set) decreases only due to fitting to noise,
the free R will increase.
The free R-factor represents an unbiased monitor of structure refinement and helps to prevent
over-fitting of the structure to the observations by asking how well the model predicts data it
has not been fitted to (the test set).
In practice the free R-factor is about 5 percentage points higher than the conventional one.
Low resolution (3Å): Rfree = 30%
High resolution (better than 1.5Å): Rfree approaches R
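The cross-validation idea can be sketched in Python as follows. The sketch assumes reflections are addressed by list index and uses Python's random module for the selection; split_working_test and r_value are hypothetical names, and real programs store the test-set flags with the data so that the same set is used throughout refinement.

```python
import random


def split_working_test(n_reflections, test_fraction=0.05, seed=0):
    """Randomly assign reflection indices to a working set and a test set."""
    indices = list(range(n_reflections))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_reflections * test_fraction))
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def r_value(f_obs, f_calc, indices, k=1.0):
    """R over the given reflections; use the test set to obtain R_free."""
    numerator = sum(abs(f_obs[i] - k * f_calc[i]) for i in indices)
    denominator = sum(f_obs[i] for i in indices)
    return numerator / denominator


working, test = split_working_test(100, test_fraction=0.1)
print(len(working), len(test))  # 90 10
```

Only the working-set reflections would enter the refinement target; r_value evaluated on the test indices then gives the free R-factor.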
RMSD
Root mean square deviation (a frequently used measure of the differences between actual values
and their ideal values) from ideal bond lengths, angles, planes, chiral volumes etc.
- bond lengths: 0.02Å (C-C 1.54Å, N-C 1.43Å, O-H ··· O-H 2.8Å)
- bond angles: 4°
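As a worked example, the RMSD from ideal bond lengths can be computed like this in Python. The observed values are invented, and rmsd_from_ideal is only an illustrative name.

```python
import math


def rmsd_from_ideal(observed, ideal):
    """Root mean square deviation between observed values and their ideal values."""
    squared = [(o - i) ** 2 for o, i in zip(observed, ideal)]
    return math.sqrt(sum(squared) / len(squared))


observed_bonds = [1.56, 1.52, 1.45, 1.41]  # Å, made-up model values
ideal_bonds = [1.54, 1.54, 1.43, 1.43]     # Å, C-C and N-C ideal lengths from above
print(round(rmsd_from_ideal(observed_bonds, ideal_bonds), 3))  # 0.02
```

The same function works for angles, as long as observed and ideal values are given in the same unit.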
Ramachandran plot:
The stereochemistry of the main chain can be investigated by plotting the dihedral angle ψ
against the dihedral angle φ.
The diagram distinguishes allowed, partially allowed and disallowed regions, depending on van der
Waals distances and tetrahedral angles (109.5° for H-C-H, the tetrahedral shape of e.g. methane).
In the partially allowed region the van der Waals radii are (only) slightly distorted.
As refinement proceeds and the structure is improved, the distribution of the angles in the plot
improves as well. For highly refined structures nearly all φ/ψ values lie in the allowed
region.
A Ramachandran plot is made after every session of manual fitting or automated refinement.
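The angles that go into the plot are ordinary torsion angles between four consecutive atoms: φ is defined by C'(i-1), N, Cα, C' and ψ by N, Cα, C', N(i+1). The following Python sketch computes such an angle from four points; the helper names are made up and the coordinates in the example are artificial.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees (-180 to 180) defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / length, b1[1] / length, b1[2] / length)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))  # 90.0
```

Collecting the (φ, ψ) pair of every residue and plotting ψ against φ gives the Ramachandran plot.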
Luzzati plot:
Luzzati (1952) has developed an indicator of the precision of the atomic coordinates of the
molecular model.
The precision depends on the resolution and quality (Rmerge) of the diffraction data.
Best value: 1/5-1/10 of maximum resolution
Low precision is due to
- thermal motion* (atomic vibration around a mean position)
- model disorder (static* or dynamic: atoms/atom groups are not at equivalent positions in all
molecules)
* not distinguishable, reflected in the thermal B-factor:
> For this reason a crystal structure determined with the help of X-ray diffraction is a time and
space average.
The underlying assumption of the Luzzati plot is that the difference between |Fobs| and |Fcalc|
in the R-factor is due exclusively to errors in the positions of the atoms. Therefore the R-factor
is plotted against the reciprocal value of d. It is a function of Δr/d, where Δr is the average
value of the error in the atomic coordinates of the molecular model. Lines for different Δr values
are calculated, and the line closest to the experimental curve (of our protein) determines the Δr
value for our crystal structure.
The final result:
- 3D coordinates of the atoms of the molecule
- PDB format
(The Protein Data Bank file format is a textual file format. It describes the 3D structure
of a protein and often of the waters, ions, ligands etc. crystallized together with the
protein. It lists orthogonalised coordinates in Å, temperature factors and
occupancies.)
We can calculate:
o Bond lengths (coordination distances)
o Angles
o Torsion angles (angles between the normals of two planes)
o H bonds and electrostatic interactions
o Planes and the angles between them
Programs:
o For stereochemistry
o For surfaces
o For comparison of two molecules
o For secondary structure
ad surface programs:
o Molecular surface area (MSA, determined by atomic radii)
o Solvent accessible area (SAA, bigger than MSA, determined by van der Waals radii)
o Surface exposed to solvent and other macromolecules, ligands, inhibitors,
substrates etc.
o Macromolecule volume
o Drug targets: cavities and pockets
o Surface between two or more macromolecules, as in a dimer, between protein and
DNA or between protein and ligand
o SAA: Which amino acids are buried or exposed?
o Surface properties: electrostatic potential and amino acid conservation (as derived
from the amino acid sequence alignment)
We can compare two structures:
Comparison is carried out by superposition of the two molecules (Mol1 = Mol2*Rot+Tra).
We can then determine the RMSD between equivalent atoms and their individual
distances, both in Å.
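Once a rotation and translation are known, the comparison itself is simple, as the following Python sketch shows. It assumes Rot is given as a 3x3 matrix and Tra as a vector; finding the best Rot and Tra (a least-squares superposition) is the actual work of a comparison program and is not shown. The function names are illustrative.

```python
import math


def transform(coords, rot, tra):
    """Apply Rot and Tra to every point: new = Rot * old + Tra."""
    return [tuple(rot[r][0] * x + rot[r][1] * y + rot[r][2] * z + tra[r]
                  for r in range(3))
            for x, y, z in coords]


def rmsd(coords_a, coords_b):
    """Root mean square distance between equivalent atoms (same order, in Å)."""
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


mol2 = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
rot = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # 90° about z
tra = (1.0, 2.0, 3.0)
mol1 = [(1.0, 3.0, 3.0), (0.0, 2.0, 3.0), (1.0, 2.0, 4.0)]
print(rmsd(mol1, transform(mol2, rot, tra)))  # 0.0, the molecules superimpose exactly
```

For two real structures the RMSD after superposition is larger than zero and summarizes how much the equivalent atoms differ.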
Resolution:
http://www-structmed.cimr.cam.ac.uk/Course/Overview/Overview.html: “If the atoms were
completely still, the molecules throughout the crystal were in identical conformations, and
the crystal were perfectly ordered, then all the molecules would scatter in phase regardless
of the angle of scattering and we would be able to collect diffraction data to a limit imposed
only by the wavelength of the X-rays. The electron density map would have peaks at each of
the atomic positions. But reality is rarely so favourable. Proteins are generally fairly flexible,
and crystals have lattice disorder, i.e. the repeating units are not necessarily perfectly aligned
throughout the crystal. So as we start to look at finer details by going to higher scattering
angles, the diffraction pattern starts to cancel out. For this reason, most protein structures are
limited to a level of detail where atoms are not resolved from one another. What we see is
typically tubes of electron density for atoms that are bonded together.”
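Limit of resolution: ≈ 0.707*dmin
Bragg's law: $n\lambda = 2d\sin\theta$ -> $d_{min} = \lambda/(2\sin\theta_{max})$
$\theta_{max} = \arctan[(L/2)/S]\,/\,2$
[Figure: a beam scattered by the angle 2θ reaches the edge of a detector of width L at distance
S from the crystal]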
Small d means high resolution.
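The geometric resolution limit follows directly from Bragg's law. The Python sketch below assumes that L is the detector width and S the crystal-to-detector distance (both in the same unit) and that the direct beam hits the detector centre; the function names and the example numbers are illustrative.

```python
import math


def theta_max(detector_width, distance):
    """theta_max = arctan[(L/2)/S] / 2, returned in radians."""
    return math.atan((detector_width / 2) / distance) / 2


def d_min(wavelength, detector_width, distance):
    """Smallest measurable spacing from Bragg's law with n = 1."""
    return wavelength / (2 * math.sin(theta_max(detector_width, distance)))


# Example: 1.0 Å X-rays, 200 mm wide detector at 100 mm distance
print(round(math.degrees(theta_max(200.0, 100.0)), 2))  # 22.5
print(round(d_min(1.0, 200.0, 100.0), 3))               # 1.307
```

Moving the detector closer or using a shorter wavelength lowers d_min, provided the crystal actually diffracts that far.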
Interpretable density maps: resolution of 2.5Å to 3Å
This resolution is too low to resolve two covalently bound atoms. Nevertheless the model can be
built by using building blocks (groups of atoms instead of individual atoms) such as
amino acid residues.
o Individual atoms can be fitted at a resolution of 1Å.
o Alpha helices can be seen at 6Å.
Beta sheets cannot be seen at this resolution.
o Very low resolution: large portions have to be fitted at one time. At a resolution
lower than 8Å only whole molecules can be placed.
o High resolution amplitudes depend on B more than low resolution amplitudes do.
Therefore we need high resolution (2.5Å or better) to refine atomic B-factors.