Evaluating the Solution from MrBUMP and Balbes

advertisement
Evaluating the Solution from
MrBUMP and Balbes
Ronan Keegan, Fei Long, Martyn Winn, Garib
Murshudov
STFC Daresbury Laboratory
&
York Structural Biology Laboratory, University of York
PDB Depositions (1999-2009)
1999
2009
What are MrBUMP and Balbes?
• Automated molecular replacement (MR) and
beyond....
• MrBUMP:
– Brute force approach - try a variety of search
models prepared in several different ways in MR
• Balbes:
– Identifies the best possible search model to be
used in MR by using a specially grafted version of
the PDB optimised for use in MR
MrBUMP
•An automation framework for Molecular Replacement.
•Particular emphasis on generating a variety of search
models.
• Wraps Phaser and/or Molrep.
• Uses a variety of helper applications (e.g. Chainsaw) and
bioinformatics tools (e.g. Fasta, Mafft) to generate search models
• Makes use of up-to-date on-line databases (e.g. PDB, Scop)
•In favourable cases, gives “one-button” solution
•In Complicated Cases, will suggest likely search models for
manual investigation (lead generation)
The Pipeline
Target MTZ
&
Sequence
Target
`
Details
Template
`
Search
N templates
Model
`
Preparation
Check scores and
exit or select the
next model
N x M models
Molecular Replacement
`
& Refinement
Phase Improvement
Search for model templates
• FASTA search of PDB
– Sequence based search using sequence of target structure
All of the resulting PDB id codes
are added to a list
These structures are called
model templates
• Other templates from:
o SSM search using top hit from the FASTA search
o Can add additional PDB id codes to the list
o Can add local PDB files
Multiple Alignment step
target
model
templates
Jalview 2.08.1 Barton group, Dundee
pairwise
alignment
(used in
Chainsaw)
currently support ClustalW, MAFFT, probcons or T-coffee for multiple alignment
Model template scoring: score = sequence identity X alignment quality
Domain 1
X
Domain 2
SCOP
Domains
e.g. if relative
domain motion
PQS/PISA
Multimers
template chains
superpose
Better signal-tonoise ratio than
monomer, if
assembly is correct
for the target.
Ensembles
Create ensemble of top search
models, for use in additional run of
Phaser.
Need to be similar in MW and rmsd
Search Model Preparation
Search models prepared in four ways:
PDBclip
–
original PDB with waters removed, most
probable conformations selected and format
tidied (e.g. chain ID added)
Molrep
–
Molrep model preparation function which
aligns the template sequence with the target
more side
sequence and prunes the non-conserved side
chain
chains accordingly.
truncation
Chainsaw
–
Can be given any alignment between the
target and template sequences. Nonconserved residues are pruned back to the
gamma atom.
Polyalanine
–
Created by excluding all of the side chain
atoms beyond the CB atom using the Pdbset
program
deal with
deletions
Molecular replacement step
Running MR
• For each search model, MR done with Molrep or
Phaser or both.
• MR programs run mostly with defaults
• MrBUMP provides LABIN columns, MW of target,
sequence identity of search model, number of copies
to search for, number of clashes tolerated
Molecular replacement step
• MR output
– MR scores and un-refined
models available for later
inspection
 assess quality of
solution, extent of
model bias
– MrBUMP doesn’t use MR
scores, but checks for
output file with positioned
model, and passes to
refinement step
Testing enantiomorphic
spacegroups
– 11 pairs of enantiomorphic spacegroups containing screw axes
of opposite handedness, e.g. P41 and P43)
– usually both need to be tested in MR
– correct spacegroup indicated by TF and packing
– MrBUMP can test both in Molrep and/or Phaser.
– For each search model, best MR results used to fix
spacegroup for subsequent steps.
– Discrimination good for good search model + correct MR
solution
Inclusion of fixed models
• MrBUMP will now accept one or more positioned models.
• These are included as fixed models in all MR jobs.
• Thus, solve complexes through consecutive runs of MrBUMP.
• Automation of this in progress ....
Restrained refinement step
• The resulting models from molecular replacement are passed to Refmac
for restrained refinement.
• The change in the Rfree value during refinement is used as rough estimate
of how good the resulting model is.
final Rfree < 0.35 or
final Rfree < 0.5 and dropped by 20%
final Rfree < 0.48 or
final Rfree < 0.52 and dropped by 5%
otherwise
conservative .....

“success”

“marginal”

“poor”
Refinement results
Summarised results..
Best search model so far
and file location for this
model
List of sorted
results so far
Example (thanks to Elien
Vandermarliere)
Target is an arabinofuranosidase
Data to 1.55Å in P212121
Small C domain (144 res) solved with 34% seq ident
model
(1w9t_B_MOLREP best out of 4 solutions)
With C domain solution fixed, large N domain (345 res) solved with 28% seq ident
model
(1gyh_C_CHNSAW best out of 7 solutions)
Acorn: CC increases from 0.04 to 0.18
This step part of MrBUMP
ARP/wARP then builds 457/493 residues to R/Rfree 0.185/0.225
MrBUMP output
Log file gives summary of models tried and
results of MR
• May get several putative solutions
• Ease of subsequent model re-building,
model completion may depend on choice of
solution
• Worth checking “poor” solutions
Top solution available from ccp4i
Using MrBUMP
• Part of CCP4 suite
• CCP4i GUI
(Balbes)
Fei Long, Alexei Vagin, Paul Young, Garib Murshudov
YSBL, York University
http://www.ysbl.york.ac.uk/~fei/balbes/index.html
Balbes
• Balbes uses a reorganised version of the PDB
database customised for use in MR.
• Has its own classification of domains most suitable
for use in molecular replacement
• Database also includes classification of possible
oligomers based on the basic structures.
• Built upon Molrep for molecular replacement and
Refmac for doing the refinement
Balbes Database
• Cut down version of the PDB keeping only single copies of
unique folds.
• Where 2 entries have sequence ID > 80 % and rmsd < 1
angstrom, the entry with the highest resolution is retained
• More than 20000 entries
• More than 23000 domains classified
• For each sequence additional information is stored including
secondary structure information, number of domains and
potential to form multimers
• Flexible loops are cut from the stored structures
• Multimers are classified with information from the EBI PISA
service
Balbes Workflow
Assemblies
• Fully automatic support for handling
assemblies
• Simply provide additional sequences in input
sequence file for each additional component
in the target data
• Balbes will first look for assemblies in its
database containing all of the sequences but if
not find it will look for subsets of those
sequences
Using Balbes
•
Through CCP4i:
– included in Linux and Mac OSX releases
– Simple CCP4i interface - provide sequence and structure
factor file (MTZ)
– Enter multiple sequences in input file for complexes
Using Balbes
• Through the YSBL
Web Server:
– Balbes is one of
several programs
available to use via
the web at the York
University YSBL
Web Server
– Create an account
for access
http://www.ysbl.york.ac.uk/YSBLPrograms/index.jsp
Balbes Web Server
• Enter MTZ and
sequence information
as a file or paste
sequence
• Option to check all
related space groups
• Option to submit
resulting map to
ARP/wARP server for
final model building
Web Server Results
• Summary of processing for
each spacegroup
• Final best result highlighted
• For each spacegroup log file
and all output files are made
available for download (5
days)
• If user opted to use
ARP/wARP server a link to the
ARP/wARP results is provided
Balbes Output
• Spacegroup specific
output:
– Download files
– Main summary file
showing results of
MR and refinement
for each template
model that was
used.
– Q value scoring
Balbes – the statistics
• In 2006, 67% of structures deposited in the PDB
where solved by MR
• Balbes improved this to 80% using
– A better organised database (for MR)
– A better choice of protocols for tackling the MR problem
– Improved algorithms in MR and refinement programs
Balbes Usage
• 90 structures in PDB referencing use of Balbes
as part of structure solution
• 35 citations
• More than 5000 users of web server version of
program
• ~ 150 users per month
Summary
• MrBUMP:
– Brute force approach, try everything
– May give lead in borderline cases as well automating
straight-forward cases
– All results are easily accessible and summarised
• Balbes:
– Efficient and quick at solving structures that have
reasonable homologues
– Deals with assemblies/complexes fully automatically
– Combined with ARP/wARP it can take structure nearly all
the way to completion
• In conclusion:
– Try Both!! Compute cycles are very cheap.
Acknowledgements
• Fei Long, Alexei Vagin, Paul Young, Andrey
Lebedev, Garib Murshudov – Balbes and
Refmac
• Martyn Winn, MrBUMP
• Airlie McCoy, Randy Read – Phaser
• Alexi Vagin – Molrep
• Norman Stein – Chainsaw and Ctruncate
• CCP4 Core team – Support and Testing
Downloading Balbes Database
– Balbes database available as a separate
download (1.6GB)
PDB Statistics
• The number of structures in the PDB is
increasing rapidly year on year
• By the end of 2009 there were > 62000
structures deposited
• 7449 structures were deposited in 2009 (12 %
of the total)
• MrBUMP and Balbes seek to exploit this
wealth of data to improve the success rate of
molecular replacement
Download