Molecular Replacement in CCP4

Martyn Winn
CCP4 group, Daresbury Laboratory
Data analysis before MR
Matthews coefficient
Number copies in a.s.u.
Native Patterson
(translational NCS)
B factor analysis
Self RF
(rotational NCS)
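As a rough illustration of the Matthews coefficient check, the sketch below computes Vm and an approximate solvent fraction for different numbers of copies in the a.s.u. The function name, the example cell and the molecular weight are hypothetical; the solvent estimate uses the usual approximation 1 - 1.23/Vm for proteins.

```python
def matthews_coefficient(cell_volume_A3, n_symops, n_copies, mw_da):
    """Return (Vm in A^3/Da, approximate solvent fraction).

    Vm = cell volume / (symmetry operators * copies per a.s.u. * MW).
    The solvent fraction uses the common protein approximation 1 - 1.23/Vm.
    """
    vm = cell_volume_A3 / (n_symops * n_copies * mw_da)
    solvent = 1.0 - 1.23 / vm
    return vm, solvent


# Hypothetical crystal: orthorhombic cell, 4 symmetry operators, 30 kDa protein
cell_volume = 60.0 * 70.0 * 90.0  # A^3, valid for an orthogonal cell only
for n in (1, 2, 3):
    vm, solv = matthews_coefficient(cell_volume, 4, n, 30000.0)
    print(f"{n} copies: Vm = {vm:.2f} A^3/Da, solvent = {solv:.0%}")
```

Copy numbers that give a very low or negative solvent fraction can be ruled out before starting MR.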
Data analysis before MR
Interface to Sfcheck (currently in Validation&Deposition module)
completeness, anisotropy, Wilson B, twinning check,
pseudo-translation check
Finding search models
Need a PDB file for a structurally similar protein. This usually
means a homologous protein.
Either you already have one,
or you search the Protein Data Bank.
The search is based on sequence alignment between the target
protein and proteins in the PDB.
Several bioinformatics tools can help here:
OCA, MSDlite, MSDtarget - all use FASTA
www.ebi.ac.uk/msd
psiBLAST - iterative searching
www.ncbi.nlm.nih.gov/BLAST
FFAS - profile-profile alignment
ffas.ljcrf.edu/ffas-cgi/cgi/ffas.pl
Editing search models
Don’t use a raw PDB file for Molecular Replacement unless it is
very similar (e.g. same protein, different conditions, ligand, etc.)
Edit it to:
• remove residues that don’t occur in the target
• remove side chain atoms that don’t occur in the target
(these assume a known alignment from model to target)
• remove uncertain regions of model (check B factors,
occupancies)
• remove flexible loops
Note that we don’t add anything!! Homology modelling?
Consider use of individual domains and multimers
(see MrBUMP below)
Chainsaw
Norman Stein, Daresbury Lab.
MR model preparation: chainsaw
• Molecular replacement model preparation utility that edits a PDB
search model according to a sequence alignment.
• Features:
– Removes un-aligned residues from the model
– Prunes non-conserved residues back to the gamma atom
– Preserves more atoms than in polyalanine model
[Figure: unmodified template, Chainsaw template and polyalanine template compared.
Example of 1mr6 used as a template for 1tgx (38% sequence identity)]
Running Chainsaw:
[Screenshot of GUI. Inputs: complete PDB file, model-to-target alignment]
Alignment from:
original search tool (FASTA, psiBLAST, etc.)
multiple alignment (set of search models, protein family, etc.)
hand-created
Molrep
Alexei Vagin, York
http://www.ysbl.york.ac.uk/~alexei/molrep.html
Molrep: overview of functionality
Performs complete MR in single step:
Expt. data (MTZ) + Search model (PDB) -> Molrep -> Positioned search model
• Individual steps for more difficult cases: CRF, TF, rigid-body
• Multi-copy search: locked CRF, dyad search
• Self RF
• Phased TF, spherically-averaged phased TF
• Improve search model
• Other search models: electron density map, NMR models
• Fit model in electron density map / EM map
MR for straightforward case via GUI:
[Screenshot of Molrep GUI. Fields to set: title, mode, MTZ file, MTZ labels, search model. Then RUN IT!]
Other parameters
DEFAULTS ARE GOOD
Low resolution cut-off
Molrep uses soft cut-off, Boff (BOFF, COMPL, RESMIN)
High resolution cut-off
Molrep uses soft cut-off, Badd (BADD, SIM)
$|F|_{new} = |F|_{input} \exp(-B_{add} s^2)\,\bigl(1 - \exp(-B_{off} s^2)\bigr)$
Defaults estimated
High resolution limit
Absolute cut-off (RESMAX)
Default estimated
Radius of Patterson sphere for CRF
Default is twice radius of gyration of search model,
Keyword RAD, Infrequently Used Parameters in GUI
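To see how the two soft cut-offs act together, the sketch below evaluates the scale factor exp(-Badd*s^2)*(1-exp(-Boff*s^2)) at a few resolutions. The Badd and Boff values are made up for illustration, and the conversion s^2 = 1/(4d^2) is an assumption about the convention for s; check the Molrep documentation before relying on it.

```python
import math


def soft_cutoff_scale(s2, b_add, b_off):
    """Scale factor applied to |F| for a given value of s^2."""
    high_res_damping = math.exp(-b_add * s2)        # Badd term: damps high resolution
    low_res_damping = 1.0 - math.exp(-b_off * s2)   # Boff term: damps low resolution
    return high_res_damping * low_res_damping


def s_squared(d_spacing):
    """ASSUMPTION: s = sin(theta)/lambda = 1/(2d), so s^2 = 1/(4 d^2)."""
    return 1.0 / (4.0 * d_spacing ** 2)


# Hypothetical values, not Molrep defaults
B_ADD, B_OFF = 20.0, 400.0
for d in (20.0, 5.0, 2.0):
    scale = soft_cutoff_scale(s_squared(d), B_ADD, B_OFF)
    print(f"d = {d:5.1f} A: scale = {scale:.3f}")
```

Reflections at both resolution extremes are down-weighted smoothly rather than removed, which is what distinguishes the soft cut-offs from the absolute RESMAX limit.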
Cross Rotation Function
[Screenshot of Molrep output. Labelled items: polar angles, Euler angles (CCP4), R factor, list of top RF peaks]
Translation Function
[Screenshot of Molrep output. Labelled items: polar angles, fractional translation, list of solutions (top TF for each RF solution), contrast of solution, R factor, Score]
Identification of solutions
SCORE = product of the Correlation Coefficient and the maximal value of
the Packing Function
The Packing Function is integrated into the TF search: it removes solutions
with overlapping molecules
CONTRAST = ratio of top score to mean score:
>2.5 - definitely solution
<2.5 and > 1.8 - solution
<1.8 and > 1.5 - maybe solution
<1.5 and > 1.3 - maybe not solution, but program accepts it
<1.3 - probably not solution
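The contrast bands above can be written as a small lookup. This is only a sketch of the rule of thumb on the slide, not Molrep code; the function name is hypothetical, and values falling exactly on a boundary are (by assumption) placed in the lower band.

```python
def interpret_contrast(contrast):
    """Map a Molrep CONTRAST value to the verdict listed on the slide."""
    if contrast > 2.5:
        return "definitely solution"
    if contrast > 1.8:
        return "solution"
    if contrast > 1.5:
        return "maybe solution"
    if contrast > 1.3:
        return "maybe not solution, but program accepts it"
    return "probably not solution"


for value in (3.1, 2.0, 1.6, 1.4, 1.1):
    print(f"contrast {value}: {interpret_contrast(value)}")
```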
Finding more than one copy in the asu
By default, Molrep will estimate number of copies to find.
Override with NMON keyword
Program flow:
CRF
TF for first copy
Fix first copy
TF for second copy
Fix second copy
TF for third copy
.
.
.
Solving complexes
• Choose first component (largest, highest similarity)
• Solve for first component (probably need to specify NMON
explicitly)
• New Molrep job
Model in - second component
Fixed in - positioned first component
• Repeat for all other components
Possibility to use spherically-averaged phased TF using phases
from first component
Phaser
Randy Read, Airlie McCoy, Cambridge
Phaser website:
http://www-structmed.cimr.cam.ac.uk/phaser/
Performs complete MR in single step:
Expt. data (MTZ) + Search model (PDB) -> Phaser -> Positioned search model
Use “MODE MR_AUTO” or “automated search” in the GUI
• anisotropy correction
• fast rotation function
• fast translation function
• packing
• refinement and phasing
loop over models
More functionality ...
• All steps can be run separately
• Search over spacegroups:
MTZ spacegroup and enantiomorph
All spacegroups in MTZ point-group
Selected spacegroups
• Ensemble models (see later)
• Brute RF and TF - slow and accurate
• Normal mode analysis
Generates perturbed models
MR for straightforward case via GUI:
[Screenshot of Phaser GUI. Fields to set: mode, MTZ file, target details, search model, specify search. Then RUN IT!]
FRF
[Screenshot of Phaser output. Labelled items: Euler angles (CCP4), top LLG and Z-scores for FRF]
FTF
[Screenshot of Phaser output. Labelled items: fractional translation, FRF solution number, top LLG and Z-scores for FRF]
Packing
Phaser does packing check after FTF
Clashes = C atoms closer than 2Å
Default number of clashes = 0
Think about increasing to 2 or 5
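A packing check of this kind can be sketched as a simple distance count. The code below is an illustration only, not Phaser's implementation: it assumes a clash is counted once per atom of the candidate molecule that lies within the cut-off of any atom of an already placed molecule, and it ignores crystal symmetry. All names and coordinates are hypothetical.

```python
import math


def count_clashes(placed, candidate, cutoff=2.0):
    """Count candidate atoms closer than `cutoff` (Angstrom) to any placed atom."""
    clashes = 0
    for atom in candidate:
        if any(math.dist(atom, other) < cutoff for other in placed):
            clashes += 1
    return clashes


def passes_packing(placed, candidate, max_clashes=0, cutoff=2.0):
    """Accept the candidate if it has no more than `max_clashes` clashes."""
    return count_clashes(placed, candidate, cutoff) <= max_clashes


# Hypothetical trace-atom coordinates (x, y, z) in Angstrom
placed = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
candidate = [(1.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.5, 0.0, 0.0)]
print(count_clashes(placed, candidate))
print(passes_packing(placed, candidate, max_clashes=0))
print(passes_packing(placed, candidate, max_clashes=2))
```

Raising the allowed number of clashes, as suggested above, lets a correct solution survive when a few loops of an imperfect model overlap.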
Solution files:
.sol file produced at end of job
• Contains summary of all solutions
• Each solution contains rotations and usually translations (3DIM vs 6DIM)
• One line per model located
• .sol file can be read back into Phaser in later jobs
Z-score        Have I solved it?
less than 5    no
5-6            unlikely
6-7            possibly
7-8            probably
more than 8    definitely
RFZ = RF Z-score
TFZ = TF Z-score
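The Z-score table can be expressed as a small helper for scanning many results. This is a sketch of the rule of thumb only; the function name is hypothetical, and the handling of values exactly on a band edge (5, 6, 7 or 8) is an assumption, since the table does not specify it.

```python
def interpret_z_score(z):
    """Map a Z-score to the 'Have I solved it?' verdict from the table."""
    if z < 5:
        return "no"
    if z < 6:
        return "unlikely"
    if z < 7:
        return "possibly"
    if z <= 8:
        return "probably"
    return "definitely"


for z in (4.2, 5.5, 6.5, 7.5, 12.0):
    print(f"Z = {z}: {interpret_z_score(z)}")
```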
Ensemble models
Phaser refers to search models as “ensembles”
Often, ensemble contains single model, as in traditional MR
But Phaser can use an ensemble of > 1 models, which may work
better than any single model
Models in an ensemble must be superposed prior to use in Phaser
- use e.g. Superpose in CCP4
N.B. Phaser will complain if:
– MW of models in ensemble are too different
– RMS between models is too large
(In Molrep, construct ensemble as pseudo-NMR PDB file)
Finding more than one copy in the asu
Specify > 1 in Composition of the asymmetric unit
(keyword COMPOSITION ... NUMBER)
Specify > 1 in Number of copies to search for
(keyword SEARCH ... NUMBER)
Phaser will issue warnings if these numbers are wrong.
CRF
TF for first copy
Fix first copy (possibly multiple sets)
CRF for second copy
TF for second copy
Fix second copy (possibly multiple sets)
.
.
.
Complexes
As before, but:
• Define > 1 type of component
(Composition of the asymmetric unit: Define another component)
• Define > 1 ensemble
(Define ensembles: Add ensemble)
• Specify all searches
(Search details: Add another search)
E.g. beta-blip example in Phaser tutorial:
http://www-structmed.cimr.cam.ac.uk/phaser/tutorial/Phaser_MR_tute.html
MrBUMP
Ronan Keegan, Martyn Winn, Daresbury Lab.
The aim of MrBUMP
• An automation framework for Molecular Replacement.
• Particular emphasis on generating a variety of search models.
• Can be used to generate models only.
• Wraps Phaser and/or Molrep.
• Also uses a variety of helper applications (e.g. Chainsaw)
and bioinformatics tools (e.g. Fasta, Mafft)
• Uses on-line databases (e.g. PDB, Scop)
• In favourable cases, gives “one-button” solution
• In unfavourable cases, will suggest likely search models
for manual investigation (lead generation)
The Pipeline
[Diagram: Target MTZ & Sequence -> Target Details -> Template Search -> Model Preparation -> Molecular Replacement & Refinement -> Check scores and exit or select the next model]
Search for homologous proteins
FASTA search of PDB
• Sequence based search using sequence of target structure.
• Can be run locally if user has fasta34 program installed or remotely
using the OCA web-based service hosted by the EBI.
All of the resulting PDB id codes are added to a list.
These structures are called model templates.
Search for additional similar structures
• Additional structure-based search (optional)
– Top hit from the FASTA search is used as the template structure
for a secondary structure based search.
– Uses the SSM webservice provided by the EBI (a.k.a. MSDfold)
– Any new structures found are added to the list.
– Provides structural variation, not based on direct sequence
similarity to target
• Manual addition
– Can add additional PDB id codes to the list, e.g. from FFAS
or psiBLAST searches
– Can add local PDB files
Multiple Alignment
• After the set of PDB ids are collected in the FASTA and
SSM searches, their coordinate-based sequences are
collected and put through a multiple alignment with the
target sequence
• Aims:
– Score template structures in a consistent manner, in order to
prioritise them for subsequent steps
– Extract pairwise alignment between template and target for use
in Chainsaw step. Multiple alignment should give a better set of
alignments than the original pair-wise FASTA alignments
Multiple Alignment
[Figure: multiple alignment of target and model templates, with a pairwise alignment highlighted, shown in Jalview 2.08.1 (Barton group, Dundee)]
ClustalW or MAFFT are currently supported for multiple alignment
Template Model Scoring
• Alignment Scoring:
score = sequence identity × alignment quality
• Sequence identity:
– Ungapped sequence identity i.e. sequence identity of aligned target
residues
• Alignment quality:
– Dependent on the alignment length, the number of gaps created in the
template alignment and the extent of each of these gaps.
– The penalties given for gaps and the size of the gaps are biased so that
alignments that preserve domains of the structure, rather than spreading the
aligned residues out, score higher.
The top scoring models are then used for further processing
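The identity part of the score can be illustrated as follows. The sketch computes the ungapped sequence identity from a pairwise alignment and multiplies it by an alignment quality value. How MrBUMP derives alignment quality is not given here, so it is passed in as a plain number; the function names and the example sequences are hypothetical.

```python
def ungapped_identity(target_aln, template_aln, gap="-"):
    """Fraction of identical residues among positions aligned without a gap."""
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    aligned = [(a, b) for a, b in zip(target_aln, template_aln)
               if a != gap and b != gap]
    if not aligned:
        return 0.0
    matches = sum(1 for a, b in aligned if a == b)
    return matches / len(aligned)


def template_score(identity, alignment_quality):
    """score = sequence identity x alignment quality (quality supplied by caller)."""
    return identity * alignment_quality


# Hypothetical alignment rows taken from a multiple alignment
target = "MKT-AYIAK"
template = "MKSGA-IGK"
identity = ungapped_identity(target, template)
print(f"ungapped identity = {identity:.3f}")
print(f"score = {template_score(identity, 0.9):.3f}")  # 0.9 is a made-up quality
```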
Domains
• Suitable templates for target
domains may exist in isolation
in PDB, or in combination with
dissimilar domains
• In case of relative domain
motion, may want to solve
domains separately
Domains
• Domains search:
– Top scoring templates from multiple alignment are tested to see
if they contain any domains.
– Uses the SCOP database. This only lists domains that appear
more than once in the PDB.
– The database is scanned to see if domains exist for each of
the PDBs in the list of templates
– Domains are then extracted from the parent PDB structure file
and added to the list of template models as additional search
models for MR.
Multimers
• Multimer search:
– Search for quaternary structures that may be used as search
models.
– Better signal-to-noise ratio than monomer, if assembly is
correct for the target.
– Multimeric structures based on top templates are retrieved
using the PQS service at the EBI, and added to the list of
search models
– PQS will soon be replaced by the use of the PISA service
at the EBI (Eugene Krissinel)
[Figure: PQS results for 1n5a, 1n5b, 1n5c and 1n5d. Labels shown: SPLIT-ASU into 4 Oligomeric files of type TRIMERIC; SPLIT-ASU into 2 Oligomeric files of type DIMERIC; SYMMETRY-COMPLEX Oligomeric file of type DIMERIC (twice)]
Search Model Preparation
Search models prepared in four ways:
1. PDBclip
– original PDB with waters removed, hydrogens removed, most
probable conformations for side chains selected and chain IDs added
if missing.
2. Molrep
– Molrep contains a model preparation function which will align the
template sequence with the target sequence and prune the
non-conserved side chains accordingly.
3. Chainsaw
– Can be given any alignment between the target and template
sequences.
– Non-conserved residues are pruned back to the gamma atom.
4. Polyalanine
– Created by excluding all of the side chain atoms beyond the CB atom
using the Pdbset program
Also create an ensemble model for Phaser based on top 5 models
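The polyalanine preparation is simple enough to sketch directly. The code below is an illustration, not what Pdbset or MrBUMP actually run: it keeps only N, CA, C, O and CB from ATOM records, reads the atom name from columns 13-16 of the PDB format, and leaves residue names unchanged. The function name and the example records are hypothetical.

```python
BACKBONE_PLUS_CB = {"N", "CA", "C", "O", "CB"}


def to_polyalanine(pdb_lines):
    """Drop side chain atoms beyond CB from ATOM records; keep other lines."""
    kept = []
    for line in pdb_lines:
        if line.startswith("ATOM"):
            atom_name = line[12:16].strip()  # PDB columns 13-16
            if atom_name not in BACKBONE_PLUS_CB:
                continue
        kept.append(line)
    return kept


# Hypothetical serine residue
example = [
    "ATOM      1  N   SER A   1      11.104  13.207   2.100  1.00 20.00           N",
    "ATOM      2  CA  SER A   1      12.560  13.300   2.300  1.00 20.00           C",
    "ATOM      3  C   SER A   1      13.100  14.700   2.000  1.00 20.00           C",
    "ATOM      4  O   SER A   1      12.400  15.700   2.100  1.00 20.00           O",
    "ATOM      5  CB  SER A   1      13.250  12.300   1.400  1.00 20.00           C",
    "ATOM      6  OG  SER A   1      14.650  12.400   1.600  1.00 20.00           O",
]
for line in to_polyalanine(example):
    print(line)
```

A Chainsaw-style model would instead keep the full side chain for conserved residues and atoms up to the gamma position for non-conserved ones, which is why it preserves more atoms than this polyalanine model.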
Molecular Replacement and Refinement
• The search models can be processed with Molrep or Phaser or
both.
• The resulting models from molecular replacement are passed to
Refmac for restrained refinement.
• The change in the Rfree value during refinement is used as rough
estimate of how good the resulting model is.
“success”: final Rfree < 0.35, or final Rfree < 0.5 and dropped by 20%
“marginal”: final Rfree < 0.48, or final Rfree < 0.52 and dropped by 5%
“failure”: otherwise
• MR scores and un-refined models available for later inspection.
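The success criteria above translate into a short decision function. This is a sketch rather than MrBUMP's own code: the function name is hypothetical, and "dropped by 20%" is interpreted here as a relative drop, (initial - final) / initial, which is an assumption.

```python
def classify_refinement(rfree_initial, rfree_final):
    """Classify a refined MR result from its initial and final Rfree."""
    # ASSUMPTION: the drop is measured relative to the initial Rfree
    drop = (rfree_initial - rfree_final) / rfree_initial
    if rfree_final < 0.35 or (rfree_final < 0.50 and drop >= 0.20):
        return "success"
    if rfree_final < 0.48 or (rfree_final < 0.52 and drop >= 0.05):
        return "marginal"
    return "failure"


# Hypothetical (initial, final) Rfree pairs
for start, end in ((0.55, 0.33), (0.55, 0.43), (0.50, 0.47), (0.54, 0.53)):
    print(f"Rfree {start:.2f} -> {end:.2f}: {classify_refinement(start, end)}")
```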
MrBUMP on compute clusters
• MrBUMP can take advantage of a
compute cluster to farm out the
Molecular Replacement jobs.
• Currently Sun Grid Engine
enabled clusters are supported;
support for LSF, Condor and
other queuing systems will be
added if there is enough demand.
• All nodes terminate when one
finds a solution
Pre-release version of MrBUMP
• Pre-release made available in Jan 06
• Simple installation
• Currently runs on Linux and OSX.
• Windows version almost ready.
• Comes with CCP4 GUI.
• Can also be run from the command line
with keyword input
• First citation in Obiero et al., Acta Cryst.
(2006). F62, 757-760
• Regular updates (currently version 0.3.2)
http://www.ccp4.ac.uk/MrBUMP
A few observations ...
• In difficult cases, success in MrBUMP may depend on
particular template, chain and model preparation method
• Nevertheless, may get several putative solutions
• Ease of subsequent model re-building, model completion may
depend on choice of solution
• First solution or check everything?
• Expectation that quick solution required - in fact, most users
seem happy to let MrBUMP run for long time (hours, days)
• Worth checking “failed” solutions!