Refmac_Erice_workshop - York Structural Biology Laboratory

advertisement
Erice CCP4 (refmac) wokshop
Garib N Murshudov
York Structural Laboratory
Chemistry Department
University of York
1
Contents
TWIN refinement in REFMAC and its extension
Rfactors: be careful
Problems of low resolution refinement
Some tools for low resolution: ncs, external restraints, B
value restraints and “jelly” body
5) Map sharpening: General approach and some applications
6) JLigand
7) Questions and answers
1)
2)
3)
4)
2
Available refinement programs
•
•
•
•
•
•
•
•
•
SHELXL
CNS
REFMAC5
TNT
BUSTER/TNT
Phenix.refine
RESTRAINT
MOPRO
MAIN
3
What can REFMAC do?
•
•
•
•
•
•
•
•
•
•
•
•
Simple maximum likelihood restrained refinement
Twin refinement
Phased refinement (with Hendrickson-Lattmann coefficients)
SAD/SIRAS refinement
Structure idealisation
Library for more than 8000 ligands (from the next version)
Covalent links between ligands and ligand-protein
Rigid body refinement
NCS local, restraints to external structures
TLS refinement
Map sharpening
etc
4
Twinning
merohedral and pseudo-merohedral twinning
Crystal symmetry:
Constrain:
Lattice symmetry *:
(rotations only)
Possible twinning:
P3
P622
P2
β = 90º
P222
P2
P2
merohedral
pseudo-merohedral
-
Domain 1
Twinning operator
-
Domain 2
Crystal lattice is invariant with respect to twinning operator.
The crystal is NOT invariant with respect to twinning operator.
6
The whole crystal: twin or polysynthetic twin?
A single crystal can be cut
out of the twin:
twin
polysynthetic
twin
yes
no
The shape of the crystal suggested that we dealt with polysynthetic OD-twin
Twinning
If we have only two domains related with twin operator then observed intensities will be
IT1 = (1-α)I1 + αI2
IT2 = (1-α)I2 + αI1
IT1 and IT2 are observed intensities, I1 and I2 are intensities from single crystals, α is proportion
of the second domain. α is between 0 and 0.5. When it is 0.5 then twin called perfect twin.
In principle these equations can be solved and I1 and I2 (intensities for single crystal) can be
calculated. It is called detwinning. It turns out that detwinning increases errors in intensities.
Also, completeness after detwinning can decrease substantially. Moreover when α=0.5 it is
impossible to detwin.
For some purposes (e.g. for phasing, sometimes for molecular replacement) detwinning may
give reasonable results. For refinement general rule is to avoid detwinning and use the data
directly. Almost all known refinement programs can handle twinning to a certain degree.
Effect on intensity statistics
Take a simple case. We have two intensities: weak and strong. When
we sum them we will have four options w+w, w+s, s+w, s+s. So we
will have one weak, two medium and one strong reflection.
As a results of twinning, proportion of weak and strong reflections
becomes small and the number of medium reflections increases. It has
effect on intensity statistics
In probabilistic terms: without twinning distribution of intensities is χ2
with degree of freedom 2 and after perfect twinning degree of freedom
increases and becomes 4. χ2 distributions with higher degree of
freedom behave like normal distribution
Example of effect of twinning on cumulative
distributions
Cumulative distribution is the proportion (more precisely
probability ) of data below given values - F(x) = P(X<x).
No twinning. Around 20% of
acentric reflections are less
than 0.2.
Perfect twinning. Only around 5%
of acentric reflections are less than
0.2.
Cumulative distribution of normalised structure factors
Red lines - of acentric reflections,
Blue lines - centric reflections
Twinning and Pseudo Rotation
In many cases twin symmetry is (almost) parallel to noncrystallographic symmetry. In these case we need to consider the effects
of two phenomena: twinning and NCS.
Because of NCS two related intensities (I1 and I2) will be similar (not
identical) to each other and because of twinning proportion of weak
intensities will be smaller. It has effect on cumulative intensity
distributions as well as on H-test
Perfect twinning
Cumulative intensity distribution with and without
interfering NCS
Partial twinning
H-test with and without interfering NCS
Twin refinement in REFMAC
Twin refinement in refmac (5.5 or later) is automatic.
– Identify “twin” operators
– Calculate “Rmerge” (Σ|Ih-<I>twin| /ΣΙh) for each operator. Ιf
Rmerge>0.50 keep it: Twin plus crystal symmetry operators
should form a group
– Refine twin fractions. Keep only “significant” domains (default
threshold is 5%): Twin plus symmetry operators should form a
group
Intensities can be used
If phases are available they can be used
Maximum likelihood refinement is used
12
Likelihood
The dimension of integration is in general twice the number of twin
related domains. Since the phases do not contribute to the first part of
the integrant the second part becomes Rice distribution.
The integration is carried out using Laplace approximation.
These equations are general enough to account for: non-merohedral
twinning (including allawtwin), unmerged data. A little bit
modification should allow handling of simultaneous twin and
SAD/MAD phasing, radiation damage
13
Electron density: likelihood based
Map coefficients
It seems to be working reasonable well. For unbiased map it is
necessary to integrate over errors in all parameters (observations as
well as refined parameters.
14
Electron density: 1jrg
Warning: Usually twin refinement reduces R factors but
electron density does not improve much
“non twin” map
“twin” map
15
Effect of twin on electron density:
Data provided by Ivan Campeotto
Space group:
Cell parameters:
Resolution:
Twin operators:
Twin fractions:
“Rmerge”:
Rmerge for calc:
R/Rfree no twin:
R/Rfee twin:
P21
54.63 142.77 84.37 90.00 108.76 90.00
1.8
-H, -K, H+L (or H, -K, -H-L)
0.46, 0.54
0.065
0.36
0.30/0.34
0.21/0.26
R/Rfree are final statistics after refinement and rebuilding
Note: Reindexing may be needed. In these cases REFMAC warns
16
that you may reconsider reindexing.
Effect of twin on electron density:
Data provided by Ivan Campeotto
Twin off (difficult rebuilding)
Twin on (final model)
17
Twinning: Warning
Rfactors
Random R factor in the presence of perfect twin with twinning modeled is
around 40. If twinning is not modeled then it is around 50. Be careful after
molecular replacement
Small twin fractions: Be careful with small twin fractions. Refmac removes
twin domains with fraction less than 5%
High symmetry:
If twin fractions are refined towards perfect twinning then space group may
be higher. Program zanuda from YSBL website may sort out some of the
space group uncertainty problems:
www.ysbl.york.ac.uk/YSBLPrograms/index.jsp
18
Twin: Few warnings about R factors
For acentric case only:
For random structure
Crystallographic R factors
No twinning
58%
For perfect twinning: twin modelled
40%
For perfect twinning without twin modelled
50%
R merges without experimental error
No twinning
50%
Along non twinned axes with another axis than twin
37.5%
Non twin
19
Twin
Effect of twinning on electron density
Using twinning in refinement programs is straightforward. It improves
statistics substantially (sometimes R-factors can go down by 10%).
However improvement of electron density is not very dramatic (just
like when you use TLS). It may improve electron density in weak parts
but in general do not expect miracles. Especially when twinning and
NCS are close then improvements are marginal.
20
Problems of low resolution refinement
1) Function to describe fit of the model into experiment: likelihood or similar
1) Data may come from very peculiar “crystals”: Twin, OD, multiple cell
2) Radiation damage
3) Converting I-s to |F| may not be valid
2) Limited and noisy data: use of available knowledge
1) Known structures
2) Internal patterns: NCS, secondary structure
3) Smeared electron density with vanishing side chains, secondary structures, domains:
High B values and series termination:
1) Filtering methods: Solve inverse problem with regulariser
2) Missing data problem: Data augmentation, bootstrap
21
Use of available knowledge
1) NCS local
2) Restraints to known structure(s)
3) Restraints to current inter-atomic distances (implicit normal modes or “jelly”
body)
4) Better restraints on B values
These are available from the version 5.6
Note
Buster/TNT has local NCS and restraints to known structures
CNS has restraints to known structures (they call it deformable elastic network)
Phenix has B-value restraints on non-bonded atom pairs and automatic global NCS
Local NCS (only for torsion angle related atom pairs) was available in SHELXL since the beginning
of time
22
Auto NCS: local and global
1. Align all chains with all chains using Needleman-Wunsh method
2. If alignment score is higher than predefined (e.g.80%) value then consider
them as similar
3.Find local RMS and if average local RMS is less than predefined value then
consider them aligned
4. Find correspondence between atoms
5. If global restraints (i.e. restraints based on RMS between atoms of aligned
chains) then identify domains
6.For local NCS make the list of corresponding interatomic distances (remove
bond and angle related atom pairs)
7.Design weights
The list of interatomic distance pairs is calculated at every cycle
23
Auto NCS
Aligned regions
Global RMS is calculated using all
aligned atoms.
Local RMS is calculated using k
(default is 5) residue sliding windows
and then averaging of the results
Chain A
Chain B
k(=5)
Nk 1
1
Ave(Rm sLoc) k 
 Rm sLoci
N  k 1 i1
RMS  Ave(Rm sLoc) N
24
Auto NCS: Neighbours
Water or ligand
After alignment, neighbours are analysed.
1) Each water, ligand is assigned to the chain they
are close to.
2) Neighbours included in restrains when possible
Shell 2
Shell 1
Chain A
Water or ligand
Chain B
Shell 2
Shell 1
25
Auto NCS: Iterative alignment
Example of alignment: 2vtu.
There are two chains similar to each other. There appears to be gene duplication
RMS – all aligned atoms
Ave(RmsLoc) – local RMS
********* Alignment results *********
------------------------------------------------------------------------------: N:
Chain 1 :
Chain 2 : No of aligned :Score :
RMS
:Ave(RmsLoc):
------------------------------------------------------------------------------: 1 : J( 131 - 256 ) : J(
3 - 128 ) :
126 : 1.0000 :
5.2409 :
1.6608 :
: 2 : J(
1 - 257 ) : L(
1 - 257 ) :
257 : 1.0000 :
4.8200 :
1.6694 :
: 3 : J( 131 - 256 ) : L(
3 - 128 ) :
126 : 1.0000 :
5.2092 :
1.6820 :
: 4 : J(
3 - 128 ) : L( 131 - 256 ) :
126 : 1.0000 :
3.0316 :
1.5414 :
: 5 : L( 131 - 256 ) : L(
3 - 128 ) :
126 : 1.0000 :
0.4515 :
0.0464 :
---------------------------------------------------------------------------------------------------------------------------------------------26
Auto NCS: Conformational changes
Domain 2
In many cases it could be expected that two or
more copies of the same molecule will have
(slightly) different conformation. For example if
there is a domain movement then internal
structures of domains will be same but between
domains distances will be different in two copies
of a molecule
Domain 1
Domain 2
Domain 1
27
Robust estimators
One class of robust (to outliers) estimators are
called M-estimators: maximum-likelihood like
estimators. One of the popular functions is
Geman-Mcclure.
Essentially when distances are similar then they
should be kept similar and when they are too
different they should be allowed to be different.
This function is used for NCS local restraints as
well as for restraints to external structures
Red line:
x2
Black line: x^2/(1+w x^2)
where x=(d1-d2)/σ, w=0.1
28
Restraints to external structures
It is done by Rob Nicholls
ProSmart
Compares Two Protein Chains
•
•
•
•
Conformation-invariant structural comparison
Residue-residue alignment
Superimposition
Residue-based and global similarity scores
Produces local atomic distance restraints
• Based on one or more aligned chains
• Possibility of multi-crystal refinement
29
ProSmart Restrain
structure to be refined
known similar structure
(prior)
xÅ
30
ProSmart Restrain
structure to be refined
known similar structure
(prior)
Remove bond and angle related pairs
31
To allow conformational changes, Geman-McClure type robust estimator
functions are used
32
Restraints to current distances
The term is added to the target function:
 w(| d |  | d
current
|) 2
pairs
Summation is over all pairs in the same chain and within given distance (default
4.2A). dcurrent is recalculated at every cycle. This function does not contribute to
gradients. It
only contributes to the second derivative matrix.
It is equivalent to adding springs between atom pairs. During refinement inter-atomic
distances are not changed very much. If all pairs would be used and weights would
be very large then it would be equivalent to rigid body refinement.
It could be called “implicit normal modes”, “soft” body or “jelly” body refinement.
33
B value restraints and TLS
Designing restraints on B values is much more difficult.
Current available options to deal with B values at low resolutions
1)Group B as implemented in CNS
2)TLS group refinement as implemented in refmac and phenix.refine
Both of them have some applications. TLS seems to work for wide range of cases but
unfortunately it is very often misused. One of the problems is discontinuity of B values.
Neighbouring atoms may end up having wildly different B values
In ideal world anisotropic U with good restraints should be used. But this world is far
far away yet. Only in some cases full aniso refinement at 3Å gives better R/Rfree than
TLS refinement. These cases are with extreme ansiotropic data.
TLS
2
TLS
1
loop
34
Parameters: B value restraints and TLS
Restraints on B values
1)Differences of projections of aniso U of atom on the bond
should be similar (rigid bond)
2)Kullback-Liblier (conditional entropy) divergence should
be small:
For isotropic atoms (for bonded and non-bonded atoms)
B1/B2+B2/B1-2
1)Local TLS: Neighboring atoms should be related as TLS
groups (not available yet)
35
Kullback-Leibler divergence
If there are two densities of distributions – p(x) and q(x) then symmetrised KullbackLeibler divergence between them is defined (it is distance between distributions)

1 
p(x)
q(x)
(  p(x)log(
)dx   p(x)log(
)dx)
2 
q(x)
p(x)

If both distributions are Gaussian with the same mean values and U1 and U2 variances
then this distance becomes:

1
1
tr(U1 U2  U2U1  2I)
And for isotropic case it becomes

B1 B2
(B1  B2 ) 2
3( 
 2)  3
B2 B1
B1B2
Restraints for bonded pairs have more weights more than for non-bonded pairs. For
nonbonded atoms weights depend on the distance between atoms.

This type of restraint is also applied for rigid bond restraints in anisotropic refinement
36
Example, after molecular replacement
3A resolution, data completeness 71%
Rfactors vs cycle
Black – simple refinement
Red – Global NCS
Blue – Local NCS
Green – “Jelly” body
Solid lines – Rfactor
Dashed lines - Rfree
37
Example: 4A resolution, data from pdb 2r6c
Rfactors vs cycle
Black – Simple refinement
Red – External restraints
Blue – “Jelly” body
Solid lines – Rfactor
Dashed lines - Rfree
38
Example: 5A resolution, data from pdb 2w6h
Rfactors vs cycle
Black – Simple refinement
Red – External restraints
Blue – “Jelly” + local NCS
Solid lines – Rfactor
Dashed lines - Rfree
39
MAP SHARPENING: INVERSE PROBLEM
We want to observe ρ0(x) but we observe ρ(x). These two entities are
related:
 K(x, y) (y)dy  (x)
0
If K is known, calculate ρ0. In general problem is easy: Discretise and
solve the linear equation. However these problems are ill-posed (small
perturbation in the input causes large deviation in the output). In practice:

by sharpening
signal as well as noise are amplified.
Regularisation may help:
|| K0   ||2 f () ==> min
0 = (KT K  L)1K T 
L is related with regularisation function. For L2 norm (value of ρ should
be small) L is identity and for Sobolev norm (ρ should be smooth) of first

order it is Laplace
operators
40
MAP SHARPENING: INVERSE PROBLEM
In general K is effect of such terms as TLS or smoothly varying blurring
function.
Noises in electron density: series termination, errors in phases and noises
in experimental data.
Very simple case: K is overall B value. Then the problem is solved using
FFT: Fourier transform of Gaussian is Gaussian, Fourier transform of
Laplace operator is square length of the reciprocal space vector.
eB|s| / 4
 2B|s| 2 / 4
F  K (s,B)F
2
e
 |s|
2
Fdeblurred

41
MAP SHARPENING: INVERSE PROBLEM
REGULARISATION PARAMETER.
One way of selecting regularisation parameter: Minimise predicted
error.
PE 
 (A 1)
2
| Fmap |2 2
observed
 A | F  F
map
|2   2
observed
 A | F | 
2
unobserved
Where Aα=K Kα
If we restore unobserved data with their expected values Fe then the
last term would be replaced by
(A 1) | F |  2  A | F  F | 
2
2
2
e
unobserved
e
unobserved
Restoring seems to give less predictive error (problem of bias towards
 remains)
error in phases
“Best” regularisation parameter is that that minimises PE. 42
MAP SHARPENING: 2R6C, 4Å RESOLUTION
Original
Sharpening, median B
α0
No sharpening
Sharpening, median B
α optimised
Top left and bottom:
After local NCS
refinement
43
Some of the other new features in
REFMAC
SAD refinement
SIRAS refinement
available from version 5.5
available from version 5.6
New and complete dictionary
Improved mask solvent
Jligand for ligand dictionary and link description
available from version 5.6
available from version 5.6
44
How to use new features
Download refmac from the website
www.ysbl.york.ac.uk/refmac/data/refmac_experimental/refmac5.6_linux.tar.gz
www.ysbl.york.ac.uk/refmac/data/refmac_experimental/refmac5.6_macintel.tar.gz
Download the dictionary:
www.ysbl.york.ac.uk/refmac/data/refmac_experimental/refmac5.6_dictionary_v5.18.gz
Change atom names using molprobity (optional: important if you have dna/rna)
http://molprobity.biochem.duke.edu/
Refmac refmac5 with the new one and you are ready for the new version.
45
Twin refinement (it works with older version also
46
Adding external keywords
• Add the following command to a file:
ncsr local
# automatic and local ncs
ridg dist sigm 0.05 # jelly body restraints
mapcalculate shar # regularised map sharpening
Save in a file (say keyw.dat)
47
Add external keywords file in refmac
interface
Browse files
48
Add external keywords file in refmac
interface
Select keywords
file 49
Add external keywords file in refmac
interface
Keywords file
50
JLigand
Download from:
http://www.ysbl.york.ac.uk/mxstat/JLigand/1.html
Jligand.jar as well as tutorials. Follow the instructions there.
51
Jligand
People involved: Andrey Lebedev and Paul Young (and ccp4)
A GUI to design ligands and covalent link descriptions.
Link
52
Conclusion
• Twin refinement improves statistics and occasionally electron
density
• Use of similar structures should improve reliability of the derived
model: Especially at low resolution
• NCS restraints must be done automatically: but conformational
flexibility must be accounted for
• “Jelly” body works better than I thought it should
• Regularised map sharpening looks promising. More work should
be done on series termination and general sharpening operators
53
Acknowledgment
York
Leiden
Alexei Vagin
Pavol Skubak
Andrey Lebedev
Raj Pannu
Rob Nocholls
Fei Long
CCP4, YSBL people
REFMAC is available from CCP4 or from York’s ftp site:
www.ysbl.york.ac.uk/refmac/latest_refmac.html
This and other presentations can be found on:
www.ysbl.york.ac.uk/refmac/Presentations/
54
Download