smasb_pdbeshape_instruct_2015 - CCP-EM

Rigid-body matching/fitting in intermediate/low-resolution density
Agnel Praveen Joseph
STFC, Harwell
Data organisation & annotation
(Work carried out alongside the data submission pipeline at EMDB/PDBe and as a development for CCP-EM)
Web-server
Database
User’s query map / New Entry
Sample classification
Taxonomy
Pfam domains (fitted models)
UniProt ID (text search/Pfam link)
User’s query PDB / New Entry
Alignment scores, transformation matrix
Methods
http://3dem.ucsd.edu/SI_Table1_ver2.pdf
Villa and Lasker, Finding the right fit: chiseling structures out of cryo-electron microscopy maps. Curr Opin Struct Biol. 2014
Density-based 6D search:
• Fast Translational Matching (COLORES): Fast Fourier Transform (FFT) driven translational search at each rotation (Chacon and Wriggers, 2002). Cross-correlation scores, with or without Laplacian filters, are used for matching.
  – Author test cases: simulated maps down to 34Å resolution were used, and correct alignments were obtained down to 25Å resolution using the Laplacian correlation filter. Components of a microtubule were also assembled successfully in a 20Å experimental map.
• Fast Rotational Matching (ADP-EM): spherical-harmonics-accelerated rotational search at each translation (Garzon et al. 2007, Kovacs et al. 2003). Cross-correlation scores, with or without Laplacian filters, are used for matching.
  – Author test cases: simulated maps down to 30Å were used for testing, and accurate results were obtained even at 30Å resolution using higher harmonic bandwidth values. Fitting components into a 23Å experimental map was also successful.
• Random Sampling (Chimera, ModEM): rotation and translation space are sampled randomly (Goddard et al. 2007, Topf et al. 2005). Overlap- and correlation-based scores are used for matching.
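The translational search that COLORES accelerates can be made concrete with a direct, unaccelerated version. The sketch below (hypothetical function names, plain Python, no Laplacian filter, unnormalised scores) evaluates the circular cross-correlation for every shift at a single fixed rotation; an FFT computes the same table of scores in one pass, which is where the speed-up comes from. It is an illustration of the quantity being maximised, not COLORES code.

```python
from itertools import product


def translational_scan(target, probe, shape):
    """Circular cross-correlation of `probe` against `target` for every shift.

    target, probe: dicts mapping every voxel (i, j, k) of `shape` to a density.
    Returns {shift: score} with score = sum_x probe[x] * target[x + shift],
    indices wrapping around the box (the quantity an FFT evaluates in one pass).
    """
    nx, ny, nz = shape
    voxels = list(product(range(nx), range(ny), range(nz)))
    occupied = [(v, d) for v, d in probe.items() if d != 0.0]  # skip empty probe voxels
    scores = {}
    for sx, sy, sz in voxels:
        scores[(sx, sy, sz)] = sum(
            d * target[((i + sx) % nx, (j + sy) % ny, (k + sz) % nz)]
            for (i, j, k), d in occupied
        )
    return scores


def best_shift(scores):
    """Shift with the highest correlation score."""
    return max(scores, key=scores.get)
```

Repeating the scan for each sampled rotation and keeping the best (rotation, shift) pair gives the 6D search described above; the direct loop costs O(N²) per rotation for N voxels, versus O(N log N) with an FFT.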
Methods
Reduced representation:
• Gaussian Mixture Model (GMFIT): the density is approximated as a linear combination of N 3D Gaussian functions (Kawabata, 2008). A Gaussian overlap metric is used to score the alignment.
  – Author test cases: with the right choice of the number of Gaussian distribution functions (GDFs), correct alignments were obtained using simulated maps down to 30Å resolution. Tests on a 23.5Å experimental map also gave a ‘near-native’ alignment with respect to a reference.
• FoldEM: local density gradients are described as rotationally invariant orthogonal feature vectors, each covering a set of neighboring grid points (Local Region Descriptors). Graphs are constructed with these descriptors as nodes, and a graph-matching technique is applied to detect the maximal common sub-graph. The size of the common sub-graph is returned as the score, along with an atom inclusion score from Chimera (Pettersen et al., 2004).
  – Author test cases: tests were carried out with simulated EM maps down to 20Å resolution, and correct fits were obtained at resolutions down to 15Å. Comparison of 11Å ribosome (experimental) maps in two conformational states also gave a good alignment.
GMfit
The number of 3D Gaussians used to approximate the density/model depends on the number of local features/components.
Both map and model need to be converted into Gaussian mixture models prior to matching.
EMD-1056, E. coli 70S, 9Å
EMD-2017, E. coli 30S, 13.5Å
Alignment of Gaussian mixtures
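As a rough illustration of how two Gaussian mixtures can be scored against each other, the sketch below assumes isotropic Gaussians (one width per component) and uses the closed-form overlap integral of two Gaussians, normalised so that identical mixtures score 1. GMfit's actual scoring function and its handling of full covariance matrices may differ; the function names are hypothetical.

```python
import math


def pair_overlap(g1, g2):
    """Overlap integral of two weighted isotropic 3D Gaussians.

    Each Gaussian is (weight, (x, y, z), sigma); the integral of the product
    is w1 * w2 * N(mu1; mu2, (s1^2 + s2^2) I).
    """
    w1, m1, s1 = g1
    w2, m2, s2 = g2
    var = s1 * s1 + s2 * s2
    d2 = sum((a - b) ** 2 for a, b in zip(m1, m2))
    return w1 * w2 * (2 * math.pi * var) ** -1.5 * math.exp(-d2 / (2 * var))


def gmm_overlap(a, b):
    """Sum of pairwise overlaps between every component of a and of b."""
    return sum(pair_overlap(g, h) for g in a for h in b)


def normalised_overlap(a, b):
    """Overlap scaled to 1 for identical mixtures."""
    return gmm_overlap(a, b) / math.sqrt(gmm_overlap(a, a) * gmm_overlap(b, b))


def translate(gmm, shift):
    """Rigidly shift every component centre by `shift` (Å)."""
    return [(w, tuple(c + s for c, s in zip(m, shift)), sig) for w, m, sig in gmm]
```

Because each mixture has only a handful of components, scoring a pose costs N×M Gaussian evaluations rather than a pass over every voxel, which is the point of the reduced representation.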
ADP-EM
Spherical harmonics are an infinite set of harmonic functions defined on the sphere. The basis functions are indexed by two integers, the degree l and the order m (−l ≤ m ≤ l). As the number of coefficients increases, higher-frequency signals can be approximated more accurately.
The two indices l and m break the family of polynomials into bands of functions: band l contains the 2l+1 functions with m = −l, …, l.
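The band structure can be enumerated directly. This sketch assumes the common convention in which a bandwidth B keeps bands l = 0, …, B−1, so a truncated expansion has 1 + 3 + 5 + … + (2B−1) = B² coefficients; whether ADP-EM counts bands exactly this way is an assumption, and the helper names are hypothetical.

```python
def sh_indices(bandwidth):
    """All (l, m) index pairs kept when truncating at bands l = 0 .. bandwidth-1."""
    return [(l, m) for l in range(bandwidth) for m in range(-l, l + 1)]


def band_sizes(bandwidth):
    """Number of functions in each band: 2l + 1."""
    return [2 * l + 1 for l in range(bandwidth)]
```

For example, `len(sh_indices(16))` is 256, so raising the bandwidth grows the number of coefficients quadratically.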
Optional Laplacian filter for gradient detection
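A Laplacian filter emphasises edges and gradients in the density rather than its absolute level. The sketch below applies the standard 7-point discrete Laplacian to a small 3D grid with zero padding; the exact kernel and boundary handling used by ADP-EM or COLORES are not specified here, so treat this as an illustration only (the function name is hypothetical).

```python
from itertools import product


def laplacian_3d(grid):
    """7-point discrete Laplacian of a nested-list grid[i][j][k].

    Voxels outside the box are treated as zero (zero padding).
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])

    def value(i, j, k):
        if 0 <= i < nx and 0 <= j < ny and 0 <= k < nz:
            return grid[i][j][k]
        return 0.0

    out = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for i, j, k in product(range(nx), range(ny), range(nz)):
        neighbours = (value(i + 1, j, k) + value(i - 1, j, k)
                      + value(i, j + 1, k) + value(i, j - 1, k)
                      + value(i, j, k + 1) + value(i, j, k - 1))
        out[i][j][k] = neighbours - 6.0 * grid[i][j][k]
    return out
```

A flat region gives zero response, while a density step or peak gives a strong signal, which is why the filtered maps match on contours rather than on overall intensity.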
Atomic structure modeling and processing
• If experimentally determined structures are not available: fold recognition and homology modeling
• Remove flexible loops. Separate compact domains connected by a hinge/flexible loop.
• Check if sub-complexes can be modelled
EMD-2631/PDB 4umm: 11.6Å cryo-EM structure of the palindromic DNA-bound USP/EcR nuclear receptor
Maletta et al. The palindromic DNA-bound USP/EcR nuclear receptor adopts an asymmetric organization with allosteric domain positioning. Nat Commun. 2014
A case: EMD-1534/PDB 2y7c, 2y7h
18Å EcoKI methyltransferase with a bound antirestriction protein
HsdM
T7 phage antirestriction protein ocr
HsdS-ocr complex modeled using guided multi-body docking by HADDOCK (http://haddock.science.uu.nl/services/HADDOCK2.2/haddock.php)
HsdS
ROSETTA (http://robetta.bakerlab.org/) ab initio model
Volume processing
• Shift the background peak to zero
• Some scores used to measure fit quality are sensitive to the scale of the data values, e.g. the overlap and correlation (not about mean) metrics in Chimera use the sum of products of density values.
• Select a contour threshold
• This is usually subjective, but it can be calculated from the molecular weight of the sample.
• Calculations on a subset of EMDB entries show a peak at 2 sigma
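One simple way to locate the background peak is to take the centre of the most populated bin of the density histogram, as sketched below (hypothetical helper names; the binning choice is an assumption, not a documented procedure). The `overlap` helper shows why the offset matters: a sum of products generally changes when a constant is added to either map.

```python
def background_peak(values, n_bins=100):
    """Centre of the most populated histogram bin (taken as the background level)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    best = max(range(n_bins), key=counts.__getitem__)
    return lo + (best + 0.5) * width


def shift_background_to_zero(values, n_bins=100):
    """Subtract the estimated background level from every voxel."""
    peak = background_peak(values, n_bins)
    return [v - peak for v in values]


def overlap(a, b):
    """Sum of products of density values (sensitive to offset and scale)."""
    return sum(x * y for x, y in zip(a, b))
```

The background estimate is only accurate to about half a bin width, so finer binning (or a smoothed histogram) gives a closer zero point.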
A higher contour may be considered for locally weak density regions.
Contour level suggested by the authors (EMD-5287, 26Å).
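A mass-based contour can be estimated by choosing the level that encloses the expected molecular volume. The sketch assumes an average protein density of about 1.21 Å³ per dalton (partial specific volume ≈ 0.73 cm³/g), which is less appropriate for nucleic-acid-rich samples, and includes a mean + nσ level for comparison with the 2σ observation above. Both helpers are hypothetical.

```python
import math
import statistics

# Assumption: typical protein, partial specific volume ~0.73 cm^3/g -> ~1.21 Å^3 per Da.
ANGSTROM3_PER_DALTON = 1.21


def threshold_from_mass(values, voxel_size, mass_da):
    """Contour level enclosing the expected molecular volume.

    values: flattened map densities; voxel_size in Å; mass_da in daltons.
    """
    target_volume = mass_da * ANGSTROM3_PER_DALTON
    n_voxels = math.ceil(target_volume / voxel_size ** 3)
    ranked = sorted(values, reverse=True)
    n_voxels = max(1, min(n_voxels, len(ranked)))  # keep the index inside the map
    return ranked[n_voxels - 1]


def threshold_from_sigma(values, n_sigma=2.0):
    """Mean + n_sigma standard deviations of the map values."""
    return statistics.fmean(values) + n_sigma * statistics.pstdev(values)
```

For example, a 1000 Da component on a 2 Å grid needs about 1210 Å³, i.e. the 152 highest-density voxels.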
Search space
• The probe/target size ratio should be considered unless you are using an exhaustive search method.
• The target map should be segmented to increase the probe/target (segment) size ratio.
Gmfit: random sampling (-I R), segmentation (-I SF) and symmetry-based search (-I Y) options are available.
adp_em: masking search (-s 2, default): the translational space is limited to positions at which the probe (atomic structure) dimensions roughly fit inside the experimental EM map.
radial search (-s 1): more uniform; useful for structures with holes.
exhaustive search (-s 0)
Chimera parameter: search N (number of random configurations)
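Random sampling of rigid-body poses needs rotations that are uniform over orientation space; drawing three Euler angles uniformly does not achieve this. The sketch below uses Shoemake's uniform random quaternion method and draws translations uniformly inside a box. It only generates the N starting configurations; any local optimisation performed from each start (as in Chimera's fit-in-map search) is not shown, and the names are hypothetical.

```python
import math
import random


def random_unit_quaternion(rng):
    """Uniformly distributed unit quaternion (w, x, y, z), Shoemake's method."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    a, b = math.sqrt(1.0 - u1), math.sqrt(u1)
    return (a * math.sin(2 * math.pi * u2), a * math.cos(2 * math.pi * u2),
            b * math.sin(2 * math.pi * u3), b * math.cos(2 * math.pi * u3))


def quaternion_to_matrix(q):
    """3x3 rotation matrix for a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def random_configurations(n, lower, upper, seed=0):
    """n random rigid-body poses (rotation matrix, translation vector).

    Translations are drawn uniformly inside the box [lower, upper] (in Å).
    """
    rng = random.Random(seed)
    poses = []
    for _ in range(n):
        rotation = quaternion_to_matrix(random_unit_quaternion(rng))
        translation = tuple(rng.uniform(lo, hi) for lo, hi in zip(lower, upper))
        poses.append((rotation, translation))
    return poses
```

A fixed seed makes a search reproducible; increasing N trades run time for a lower chance of missing the correct basin.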
Translation and rotation Sampling
• Depends on resolution/grid spacing: coarse sampling for low-resolution maps
• Size of probe: finer rotational sampling for a larger probe
• adp_em: a higher bandwidth gives finer rotational sampling and a more accurate harmonic description (default 16). A bandwidth B corresponds to a rotational step of 360/(2B) degrees.
Translational sampling in Å: -t (one/two voxel steps)
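The sampling rules above reduce to simple arithmetic. The helpers below (hypothetical names) convert an adp_em bandwidth into the rotational step quoted on this slide, 360/(2B) degrees, and a voxel step count into Å.

```python
def rotational_step_deg(bandwidth):
    """Rotational sampling step implied by a harmonic bandwidth B: 360/(2B) degrees."""
    return 360.0 / (2 * bandwidth)


def translational_step_angstrom(voxel_size, voxels_per_step):
    """Translational step in Å for a search that moves `voxels_per_step` voxels at a time."""
    return voxel_size * voxels_per_step


for b in (8, 16, 32):
    print(f"bandwidth {b:2d}: {rotational_step_deg(b):.3f} degree steps")
```

With the default bandwidth of 16 the step is 360/32 = 11.25°; doubling the bandwidth to 32 halves it to 5.625°.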
Try different methods: An example EMD-2631
TUTORIAL
For intermediate/low resolution fitting:
• Multiple methods might have to be tried
• Different solutions might have to be checked (for those that are structurally
stable, functionally relevant and supported by experiments)
• Multiple scores might be required to re-rank the solutions: surface/envelope matching scores are especially useful at low resolutions.
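The slide does not say how several scores are combined when re-ranking. One simple, hypothetical option is a mean-rank (Borda-style) consensus: rank the solutions under each score separately, noting whether higher or lower is better, and sort by the average rank.

```python
def consensus_rank(solutions, higher_is_better):
    """Re-rank fits by their mean rank across several scores.

    solutions: {fit_id: {score_name: value}}
    higher_is_better: {score_name: bool}, e.g. False for a distance score.
    Returns fit ids ordered from best to worst mean rank (ties broken by id).
    """
    n_scores = len(higher_is_better)
    mean_rank = {fit: 0.0 for fit in solutions}
    for name, higher in higher_is_better.items():
        ordered = sorted(solutions, key=lambda f: solutions[f][name], reverse=higher)
        for rank, fit in enumerate(ordered, start=1):
            mean_rank[fit] += rank / n_scores
    return sorted(mean_rank, key=lambda f: (mean_rank[f], f))
```

Averaging ranks rather than raw values avoids having to put scores with different units (correlations, Å distances, voxel fractions) on a common scale.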
/scratch/ccpem_tutorial/apj_tutorial_22_04/examples/2631
module load chimera
module load adp_em
module load gmfit
module load gmconvert
Volume matching pipeline
(ccp-EM/EMDB-PDBe)
Current version under development
• Database of EMDB volume alignments and PDB model to EMDB volume comparisons (fitting)
• Web service and software for volume-volume and model-volume alignments
• Web service to search EMDB with a user model/map for similar volumes, or for volumes from certain taxa and sample categories
• Web service and software for 3D analysis of alignments and map feature representations (Gaussian mixtures, points), and for calculating difference maps
• Annotation (sequence/interactions/taxonomy) of entries in
EMDB
Volume pre-processing
Contoured density
Segmentation
Gaussian mixture
Dusting
Feature points
Scores (TEMPy)
TEMPy (Farabella et al.) scores are used to re-rank the hits obtained.
Global scores are influenced by non-uniform noise, interpolation and padding effects. Local scores are therefore used, and maps need to be contoured prior to the calculations.
If the map resolution is too low for the density variation to be informative, envelope/surface-based scores may be useful.
Local cross-correlation, local mutual information, surface distance and overlap scores are used for re-ranking solutions.
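Local scores restrict the comparison to the contoured region. The sketch below assumes both maps are already on the same grid (flattened to lists) and uses two simple definitions chosen for illustration: a Pearson correlation over voxels above the contour in both maps, and an overlap score equal to the number of shared above-contour voxels divided by the smaller contoured volume. TEMPy's own definitions and API may differ; the function names are hypothetical.

```python
import math


def masked_ccc(a, b, thr_a, thr_b):
    """Pearson correlation over voxels above the contour in both maps."""
    pairs = [(x, y) for x, y in zip(a, b) if x > thr_a and y > thr_b]
    if len(pairs) < 2:
        return 0.0
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant region
    return sxy / math.sqrt(sxx * syy)


def overlap_fraction(a, b, thr_a, thr_b):
    """Shared above-contour voxels divided by the smaller contoured volume."""
    in_a = {i for i, x in enumerate(a) if x > thr_a}
    in_b = {i for i, y in enumerate(b) if y > thr_b}
    if not in_a or not in_b:
        return 0.0
    return len(in_a & in_b) / min(len(in_a), len(in_b))
```

Each map gets its own threshold because simulated and experimental maps are rarely on the same density scale.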
Examples
Human Echovirus 12: 16Å vs Human Coxsackievirus A21: 8Å
E. coli 70S ribosome: 9Å vs E. coli 30S ribosome: 13.5Å
Methanococcus maripaludis Mm-cpn chaperone: 4.9Å vs E. coli GroEL/GroES (mut) chaperone: 9.2Å
Surface definitions based on a given contour:
22.0Å Dengue virus
9Å 70S E. coli vs 6.6Å 80S yeast
Surface point definitions
Surface feature extraction
Partial surface overlap detection
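A surface for a given contour can be defined as the above-contour voxels that have at least one 6-connected neighbour below the contour (or outside the box), and two surfaces can then be compared by their mean nearest-neighbour distance. The sketch below uses these assumed definitions with hypothetical names; it is not the pipeline's surface extraction or partial-overlap detection.

```python
import math
from itertools import product

STEPS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def surface_points(grid, contour, voxel_size=1.0):
    """Coordinates (Å) of above-contour voxels with a 6-neighbour outside the contour."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])

    def inside(i, j, k):
        return (0 <= i < nx and 0 <= j < ny and 0 <= k < nz
                and grid[i][j][k] > contour)

    points = []
    for i, j, k in product(range(nx), range(ny), range(nz)):
        if inside(i, j, k) and any(not inside(i + di, j + dj, k + dk)
                                   for di, dj, dk in STEPS):
            points.append((i * voxel_size, j * voxel_size, k * voxel_size))
    return points


def mean_surface_distance(p, q):
    """Symmetric mean nearest-neighbour distance between two point sets."""
    def directed(src, dst):
        return sum(min(math.dist(s, d) for d in dst) for s in src) / len(src)
    return 0.5 * (directed(p, q) + directed(q, p))
```

The brute-force nearest-neighbour search is quadratic in the number of points; a k-d tree would be the usual replacement for full-size maps.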
Release
• A test version of the volume matching software is available at:
http://www.ccpem.ac.uk/download.php
– Not all functionality is included yet: multiple scores, user map input and more methods (adp_em, shapeEM) are still to be added.
– A fully functional version should be released in the next few months.
• The web service (EMDB/PDBe) will also be available in the next few months, with all functionality.
Thank You
Martyn Winn
Maya Topf
Ardan Patwardhan
Ingvar Lagerstedt