Erice CCP4 (refmac) wokshop Garib N Murshudov York Structural Laboratory Chemistry Department University of York 1 Contents TWIN refinement in REFMAC and its extension Rfactors: be careful Problems of low resolution refinement Some tools for low resolution: ncs, external restraints, B value restraints and “jelly” body 5) Map sharpening: General approach and some applications 6) JLigand 7) Questions and answers 1) 2) 3) 4) 2 Available refinement programs • • • • • • • • • SHELXL CNS REFMAC5 TNT BUSTER/TNT Phenix.refine RESTRAINT MOPRO MAIN 3 What can REFMAC do? • • • • • • • • • • • • Simple maximum likelihood restrained refinement Twin refinement Phased refinement (with Hendrickson-Lattmann coefficients) SAD/SIRAS refinement Structure idealisation Library for more than 8000 ligands (from the next version) Covalent links between ligands and ligand-protein Rigid body refinement NCS local, restraints to external structures TLS refinement Map sharpening etc 4 Twinning merohedral and pseudo-merohedral twinning Crystal symmetry: Constrain: Lattice symmetry *: (rotations only) Possible twinning: P3 P622 P2 β = 90º P222 P2 P2 merohedral pseudo-merohedral - Domain 1 Twinning operator - Domain 2 Crystal lattice is invariant with respect to twinning operator. The crystal is NOT invariant with respect to twinning operator. 6 The whole crystal: twin or polysynthetic twin? A single crystal can be cut out of the twin: twin polysynthetic twin yes no The shape of the crystal suggested that we dealt with polysynthetic OD-twin Twinning If we have only two domains related with twin operator then observed intensities will be IT1 = (1-α)I1 + αI2 IT2 = (1-α)I2 + αI1 IT1 and IT2 are observed intensities, I1 and I2 are intensities from single crystals, α is proportion of the second domain. α is between 0 and 0.5. When it is 0.5 then twin called perfect twin. In principle these equations can be solved and I1 and I2 (intensities for single crystal) can be calculated. It is called detwinning. It turns out that detwinning increases errors in intensities. Also, completeness after detwinning can decrease substantially. Moreover when α=0.5 it is impossible to detwin. For some purposes (e.g. for phasing, sometimes for molecular replacement) detwinning may give reasonable results. For refinement general rule is to avoid detwinning and use the data directly. Almost all known refinement programs can handle twinning to a certain degree. Effect on intensity statistics Take a simple case. We have two intensities: weak and strong. When we sum them we will have four options w+w, w+s, s+w, s+s. So we will have one weak, two medium and one strong reflection. As a results of twinning, proportion of weak and strong reflections becomes small and the number of medium reflections increases. It has effect on intensity statistics In probabilistic terms: without twinning distribution of intensities is χ2 with degree of freedom 2 and after perfect twinning degree of freedom increases and becomes 4. χ2 distributions with higher degree of freedom behave like normal distribution Example of effect of twinning on cumulative distributions Cumulative distribution is the proportion (more precisely probability ) of data below given values - F(x) = P(X<x). No twinning. Around 20% of acentric reflections are less than 0.2. Perfect twinning. Only around 5% of acentric reflections are less than 0.2. Cumulative distribution of normalised structure factors Red lines - of acentric reflections, Blue lines - centric reflections Twinning and Pseudo Rotation In many cases twin symmetry is (almost) parallel to noncrystallographic symmetry. In these case we need to consider the effects of two phenomena: twinning and NCS. Because of NCS two related intensities (I1 and I2) will be similar (not identical) to each other and because of twinning proportion of weak intensities will be smaller. It has effect on cumulative intensity distributions as well as on H-test Perfect twinning Cumulative intensity distribution with and without interfering NCS Partial twinning H-test with and without interfering NCS Twin refinement in REFMAC Twin refinement in refmac (5.5 or later) is automatic. – Identify “twin” operators – Calculate “Rmerge” (Σ|Ih-<I>twin| /ΣΙh) for each operator. Ιf Rmerge>0.50 keep it: Twin plus crystal symmetry operators should form a group – Refine twin fractions. Keep only “significant” domains (default threshold is 5%): Twin plus symmetry operators should form a group Intensities can be used If phases are available they can be used Maximum likelihood refinement is used 12 Likelihood The dimension of integration is in general twice the number of twin related domains. Since the phases do not contribute to the first part of the integrant the second part becomes Rice distribution. The integration is carried out using Laplace approximation. These equations are general enough to account for: non-merohedral twinning (including allawtwin), unmerged data. A little bit modification should allow handling of simultaneous twin and SAD/MAD phasing, radiation damage 13 Electron density: likelihood based Map coefficients It seems to be working reasonable well. For unbiased map it is necessary to integrate over errors in all parameters (observations as well as refined parameters. 14 Electron density: 1jrg Warning: Usually twin refinement reduces R factors but electron density does not improve much “non twin” map “twin” map 15 Effect of twin on electron density: Data provided by Ivan Campeotto Space group: Cell parameters: Resolution: Twin operators: Twin fractions: “Rmerge”: Rmerge for calc: R/Rfree no twin: R/Rfee twin: P21 54.63 142.77 84.37 90.00 108.76 90.00 1.8 -H, -K, H+L (or H, -K, -H-L) 0.46, 0.54 0.065 0.36 0.30/0.34 0.21/0.26 R/Rfree are final statistics after refinement and rebuilding Note: Reindexing may be needed. In these cases REFMAC warns 16 that you may reconsider reindexing. Effect of twin on electron density: Data provided by Ivan Campeotto Twin off (difficult rebuilding) Twin on (final model) 17 Twinning: Warning Rfactors Random R factor in the presence of perfect twin with twinning modeled is around 40. If twinning is not modeled then it is around 50. Be careful after molecular replacement Small twin fractions: Be careful with small twin fractions. Refmac removes twin domains with fraction less than 5% High symmetry: If twin fractions are refined towards perfect twinning then space group may be higher. Program zanuda from YSBL website may sort out some of the space group uncertainty problems: www.ysbl.york.ac.uk/YSBLPrograms/index.jsp 18 Twin: Few warnings about R factors For acentric case only: For random structure Crystallographic R factors No twinning 58% For perfect twinning: twin modelled 40% For perfect twinning without twin modelled 50% R merges without experimental error No twinning 50% Along non twinned axes with another axis than twin 37.5% Non twin 19 Twin Effect of twinning on electron density Using twinning in refinement programs is straightforward. It improves statistics substantially (sometimes R-factors can go down by 10%). However improvement of electron density is not very dramatic (just like when you use TLS). It may improve electron density in weak parts but in general do not expect miracles. Especially when twinning and NCS are close then improvements are marginal. 20 Problems of low resolution refinement 1) Function to describe fit of the model into experiment: likelihood or similar 1) Data may come from very peculiar “crystals”: Twin, OD, multiple cell 2) Radiation damage 3) Converting I-s to |F| may not be valid 2) Limited and noisy data: use of available knowledge 1) Known structures 2) Internal patterns: NCS, secondary structure 3) Smeared electron density with vanishing side chains, secondary structures, domains: High B values and series termination: 1) Filtering methods: Solve inverse problem with regulariser 2) Missing data problem: Data augmentation, bootstrap 21 Use of available knowledge 1) NCS local 2) Restraints to known structure(s) 3) Restraints to current inter-atomic distances (implicit normal modes or “jelly” body) 4) Better restraints on B values These are available from the version 5.6 Note Buster/TNT has local NCS and restraints to known structures CNS has restraints to known structures (they call it deformable elastic network) Phenix has B-value restraints on non-bonded atom pairs and automatic global NCS Local NCS (only for torsion angle related atom pairs) was available in SHELXL since the beginning of time 22 Auto NCS: local and global 1. Align all chains with all chains using Needleman-Wunsh method 2. If alignment score is higher than predefined (e.g.80%) value then consider them as similar 3.Find local RMS and if average local RMS is less than predefined value then consider them aligned 4. Find correspondence between atoms 5. If global restraints (i.e. restraints based on RMS between atoms of aligned chains) then identify domains 6.For local NCS make the list of corresponding interatomic distances (remove bond and angle related atom pairs) 7.Design weights The list of interatomic distance pairs is calculated at every cycle 23 Auto NCS Aligned regions Global RMS is calculated using all aligned atoms. Local RMS is calculated using k (default is 5) residue sliding windows and then averaging of the results Chain A Chain B k(=5) Nk 1 1 Ave(Rm sLoc) k Rm sLoci N k 1 i1 RMS Ave(Rm sLoc) N 24 Auto NCS: Neighbours Water or ligand After alignment, neighbours are analysed. 1) Each water, ligand is assigned to the chain they are close to. 2) Neighbours included in restrains when possible Shell 2 Shell 1 Chain A Water or ligand Chain B Shell 2 Shell 1 25 Auto NCS: Iterative alignment Example of alignment: 2vtu. There are two chains similar to each other. There appears to be gene duplication RMS – all aligned atoms Ave(RmsLoc) – local RMS ********* Alignment results ********* ------------------------------------------------------------------------------: N: Chain 1 : Chain 2 : No of aligned :Score : RMS :Ave(RmsLoc): ------------------------------------------------------------------------------: 1 : J( 131 - 256 ) : J( 3 - 128 ) : 126 : 1.0000 : 5.2409 : 1.6608 : : 2 : J( 1 - 257 ) : L( 1 - 257 ) : 257 : 1.0000 : 4.8200 : 1.6694 : : 3 : J( 131 - 256 ) : L( 3 - 128 ) : 126 : 1.0000 : 5.2092 : 1.6820 : : 4 : J( 3 - 128 ) : L( 131 - 256 ) : 126 : 1.0000 : 3.0316 : 1.5414 : : 5 : L( 131 - 256 ) : L( 3 - 128 ) : 126 : 1.0000 : 0.4515 : 0.0464 : ---------------------------------------------------------------------------------------------------------------------------------------------26 Auto NCS: Conformational changes Domain 2 In many cases it could be expected that two or more copies of the same molecule will have (slightly) different conformation. For example if there is a domain movement then internal structures of domains will be same but between domains distances will be different in two copies of a molecule Domain 1 Domain 2 Domain 1 27 Robust estimators One class of robust (to outliers) estimators are called M-estimators: maximum-likelihood like estimators. One of the popular functions is Geman-Mcclure. Essentially when distances are similar then they should be kept similar and when they are too different they should be allowed to be different. This function is used for NCS local restraints as well as for restraints to external structures Red line: x2 Black line: x^2/(1+w x^2) where x=(d1-d2)/σ, w=0.1 28 Restraints to external structures It is done by Rob Nicholls ProSmart Compares Two Protein Chains • • • • Conformation-invariant structural comparison Residue-residue alignment Superimposition Residue-based and global similarity scores Produces local atomic distance restraints • Based on one or more aligned chains • Possibility of multi-crystal refinement 29 ProSmart Restrain structure to be refined known similar structure (prior) xÅ 30 ProSmart Restrain structure to be refined known similar structure (prior) Remove bond and angle related pairs 31 To allow conformational changes, Geman-McClure type robust estimator functions are used 32 Restraints to current distances The term is added to the target function: w(| d | | d current |) 2 pairs Summation is over all pairs in the same chain and within given distance (default 4.2A). dcurrent is recalculated at every cycle. This function does not contribute to gradients. It only contributes to the second derivative matrix. It is equivalent to adding springs between atom pairs. During refinement inter-atomic distances are not changed very much. If all pairs would be used and weights would be very large then it would be equivalent to rigid body refinement. It could be called “implicit normal modes”, “soft” body or “jelly” body refinement. 33 B value restraints and TLS Designing restraints on B values is much more difficult. Current available options to deal with B values at low resolutions 1)Group B as implemented in CNS 2)TLS group refinement as implemented in refmac and phenix.refine Both of them have some applications. TLS seems to work for wide range of cases but unfortunately it is very often misused. One of the problems is discontinuity of B values. Neighbouring atoms may end up having wildly different B values In ideal world anisotropic U with good restraints should be used. But this world is far far away yet. Only in some cases full aniso refinement at 3Å gives better R/Rfree than TLS refinement. These cases are with extreme ansiotropic data. TLS 2 TLS 1 loop 34 Parameters: B value restraints and TLS Restraints on B values 1)Differences of projections of aniso U of atom on the bond should be similar (rigid bond) 2)Kullback-Liblier (conditional entropy) divergence should be small: For isotropic atoms (for bonded and non-bonded atoms) B1/B2+B2/B1-2 1)Local TLS: Neighboring atoms should be related as TLS groups (not available yet) 35 Kullback-Leibler divergence If there are two densities of distributions – p(x) and q(x) then symmetrised KullbackLeibler divergence between them is defined (it is distance between distributions) 1 p(x) q(x) ( p(x)log( )dx p(x)log( )dx) 2 q(x) p(x) If both distributions are Gaussian with the same mean values and U1 and U2 variances then this distance becomes: 1 1 tr(U1 U2 U2U1 2I) And for isotropic case it becomes B1 B2 (B1 B2 ) 2 3( 2) 3 B2 B1 B1B2 Restraints for bonded pairs have more weights more than for non-bonded pairs. For nonbonded atoms weights depend on the distance between atoms. This type of restraint is also applied for rigid bond restraints in anisotropic refinement 36 Example, after molecular replacement 3A resolution, data completeness 71% Rfactors vs cycle Black – simple refinement Red – Global NCS Blue – Local NCS Green – “Jelly” body Solid lines – Rfactor Dashed lines - Rfree 37 Example: 4A resolution, data from pdb 2r6c Rfactors vs cycle Black – Simple refinement Red – External restraints Blue – “Jelly” body Solid lines – Rfactor Dashed lines - Rfree 38 Example: 5A resolution, data from pdb 2w6h Rfactors vs cycle Black – Simple refinement Red – External restraints Blue – “Jelly” + local NCS Solid lines – Rfactor Dashed lines - Rfree 39 MAP SHARPENING: INVERSE PROBLEM We want to observe ρ0(x) but we observe ρ(x). These two entities are related: K(x, y) (y)dy (x) 0 If K is known, calculate ρ0. In general problem is easy: Discretise and solve the linear equation. However these problems are ill-posed (small perturbation in the input causes large deviation in the output). In practice: by sharpening signal as well as noise are amplified. Regularisation may help: || K0 ||2 f () ==> min 0 = (KT K L)1K T L is related with regularisation function. For L2 norm (value of ρ should be small) L is identity and for Sobolev norm (ρ should be smooth) of first order it is Laplace operators 40 MAP SHARPENING: INVERSE PROBLEM In general K is effect of such terms as TLS or smoothly varying blurring function. Noises in electron density: series termination, errors in phases and noises in experimental data. Very simple case: K is overall B value. Then the problem is solved using FFT: Fourier transform of Gaussian is Gaussian, Fourier transform of Laplace operator is square length of the reciprocal space vector. eB|s| / 4 2B|s| 2 / 4 F K (s,B)F 2 e |s| 2 Fdeblurred 41 MAP SHARPENING: INVERSE PROBLEM REGULARISATION PARAMETER. One way of selecting regularisation parameter: Minimise predicted error. PE (A 1) 2 | Fmap |2 2 observed A | F F map |2 2 observed A | F | 2 unobserved Where Aα=K Kα If we restore unobserved data with their expected values Fe then the last term would be replaced by (A 1) | F | 2 A | F F | 2 2 2 e unobserved e unobserved Restoring seems to give less predictive error (problem of bias towards remains) error in phases “Best” regularisation parameter is that that minimises PE. 42 MAP SHARPENING: 2R6C, 4Å RESOLUTION Original Sharpening, median B α0 No sharpening Sharpening, median B α optimised Top left and bottom: After local NCS refinement 43 Some of the other new features in REFMAC SAD refinement SIRAS refinement available from version 5.5 available from version 5.6 New and complete dictionary Improved mask solvent Jligand for ligand dictionary and link description available from version 5.6 available from version 5.6 44 How to use new features Download refmac from the website www.ysbl.york.ac.uk/refmac/data/refmac_experimental/refmac5.6_linux.tar.gz www.ysbl.york.ac.uk/refmac/data/refmac_experimental/refmac5.6_macintel.tar.gz Download the dictionary: www.ysbl.york.ac.uk/refmac/data/refmac_experimental/refmac5.6_dictionary_v5.18.gz Change atom names using molprobity (optional: important if you have dna/rna) http://molprobity.biochem.duke.edu/ Refmac refmac5 with the new one and you are ready for the new version. 45 Twin refinement (it works with older version also 46 Adding external keywords • Add the following command to a file: ncsr local # automatic and local ncs ridg dist sigm 0.05 # jelly body restraints mapcalculate shar # regularised map sharpening Save in a file (say keyw.dat) 47 Add external keywords file in refmac interface Browse files 48 Add external keywords file in refmac interface Select keywords file 49 Add external keywords file in refmac interface Keywords file 50 JLigand Download from: http://www.ysbl.york.ac.uk/mxstat/JLigand/1.html Jligand.jar as well as tutorials. Follow the instructions there. 51 Jligand People involved: Andrey Lebedev and Paul Young (and ccp4) A GUI to design ligands and covalent link descriptions. Link 52 Conclusion • Twin refinement improves statistics and occasionally electron density • Use of similar structures should improve reliability of the derived model: Especially at low resolution • NCS restraints must be done automatically: but conformational flexibility must be accounted for • “Jelly” body works better than I thought it should • Regularised map sharpening looks promising. More work should be done on series termination and general sharpening operators 53 Acknowledgment York Leiden Alexei Vagin Pavol Skubak Andrey Lebedev Raj Pannu Rob Nocholls Fei Long CCP4, YSBL people REFMAC is available from CCP4 or from York’s ftp site: www.ysbl.york.ac.uk/refmac/latest_refmac.html This and other presentations can be found on: www.ysbl.york.ac.uk/refmac/Presentations/ 54