dimaio.ismb06.ppt

A Probabilistic Approach to Protein Backbone Tracing in Electron Density Maps Frank DiMaio, Jude Shavlik Computer Sciences Department George Phillips Biochemistry Department University of Wisconsin – Madison USA Presented at the Fourteenth Conference on Intelligent Systems for Molecular Biology (ISMB 2006), Fortaleza, Brazil, August 7, 2006 X-ray Crystallography X-ray beam FFT Protein Crystal Collection Plate Electron Density Map (“3D picture”) Given: Sequence + Density Map O O N O N N O N O N O N O N H Sequence + Electron Density Map Find: Each Atom’s Coordinates O O N O N N O N O N O N O N H Our Subtask: Backbone Trace Cα Cα Cα Cα The Unit Cell  3D density function ρ(x,y,z) provided over unit cell  Unit cell may contain multiple copies of the protein The Unit Cell  3D density function ρ(x,y,z) provided over unit cell  Unit cell may contain multiple copies of the protein Density Map Resolution 2Å 4Å 3Å ARP/wARP TEXTAL (Perrakis et al. 1997) (Ioerger et al. 1999) Resolve (Terwilliger 2002) Our focus Overview of ACMI (our method)  Local Match   Algorithm searches for sequence-specific 5-mers centered at each amino acid Many false positives  Global Consistency   Use probabilistic model to filter false positives Find most probable backbone trace 5-mer Lookup and Cluster …VKH V LVSPEKIEELIKGY… PDB Cluster 1 wt=0.67 Cluster 2 wt=0.33 NOTE: can be done in precompute step 5-mer Search  6D search (rotation + translation) for representative structures in density map  Compute “similarity” t ( x)    frag ( y) frag ( y)   map ( x  y)  2 y  Computed by Fourier convolution (Cowtan 2001)  Use tuneset to convert similarity score to probability Convert Scores to Probabilities NEG POS match to tuneset score distributions 5-mer representative Bayes’ rule probability distribution over unit cell P(5-mer at ui | Map) search density map scores ti (ui) In This Talk…  Where we are now For each amino acid in the protein, we have a probability distribution over the unit cell P(ui | Map)  Where we are headed Find the backbone layout maximizing    P(u | Map)    P(conformation {ui , u j })    i  AAs i  AA-pairs i, j   Pairwise Markov Field Models  A type of undirected graphical model  Represent joint probabilities as y product of vertex and edge potentials  Similar to (but more general than) u1 u2 u3 Bayesian networks p(U | y )   edges s t ψst (us ,ut )  vertices s s (us | y ) Protein Backbone Model ALA GLY LYS LEU  Each vertex is an amino acid  Each label ui  xi , qi is location + orientation  Evidence y is the electron density map  Each vertex (or observational) potential comes from the 5-mer matching  i (ui | y) Protein Backbone Model ALA GLY LYS LEU  Two types of edge (or structural) potentials  Adjacency constraints ensure adjacent amino acids are ~3.8Å apart and in the proper orientation Protein Backbone Model ALA GLY LYS LEU  Two types of structural (edge) potentials   Adjacency constraints ensure adjacent amino acids are ~3.8Å apart and in the proper orientation Occupancy constraints ensure nonadjacent amino acids do not occupy same 3D space Backbone Model Potential p(u | Map)   ψadj (ui ,u j )  adjacent AAs i j  ψocc (ui ,u j )  nonadjacent AAs i j   i (ui | Map) amino acid i Constraints between adjacent amino acids: ψadj (ui ,u j ) = px (|| xi  x j ||) x p (ui ,u j ) Backbone Model Potential p(u | Map)   adjacent AAs i j ψadj (ui ,u j )   ψocc (ui ,u j )  nonadjacent AAs i j   i (ui | Map) amino acid i Constraints between nonadjacent amino acids: 0 if || xi  x j || K ψocc  ui ,u j    1 otherwise Backbone Model Potential p(u | Map)   ψadj (ui ,u j )  adjacent AAs i j  nonadjacent AAs i j ψocc (ui ,u j )    i (ui | Map) amino acid i Observational (“amino-acid-finder”) probabilities ψi  ui | Map   Pr(5mer si -2 ...si  2 at ui ) Probabilistic Inference  Want to find backbone layout that maximizes  adjacent AAs i j ψadj (ui ,u j )   ψocc (ui ,u j )  nonadjacent AAs i j   i (ui | Map) amino acid i  Exact methods are intractable  Use belief propagation (BP) to approximate marginal distributions pi (ui | Map)    p(U | Map) uk , k  i Belief Propagation (BP)  Iterative, message-passing method (Pearl 1988)  A message, min j , from amino acid i to amino acid j indicates where i expects to find j  An approximation to the marginal (or belief) bin, is given as the product of incoming messages Belief Propagation Example ALA 02 bALA ( xALA | y ) GLY 2 m1GLY ALA GLY ALA ( xGLY ALA ) 10 bGLY ( xGLY | y ) Technical Challenges  Representation of potentials Store Fourier coefficients in Cartesian space  At each location x, store a single orientation r   Speeding X up O(N2X2) naïve implementation = the unit cell size (# Fourier coefficients)  N = the number of residues in the protein Speeding Up O(N2X2) Implementation  O(X2) computation for each occupancy message    O(N2) messages computed & stored    Each message must integrate over the unit cell O(X log X) as multiplication in Fourier space Approx N-3 occupancy messages with a single message O(N) messages using a message product accumulator Improved implementation O(NX log X) 1XMT at 3Å Resolution prob(AA at location) HIGH 0.82 0.17 LOW 1.12Å RMSd 100% coverage 1VMO at 4Å Resolution prob(AA at location) HIGH 0.25 0.02 LOW 3.63Å RMSd 72% coverage 1YDH at 3.5Å Resolution prob(AA at location) HIGH 0.27 0.02 LOW 1.47Å RMSd 90% coverage Experiments  Tested ACMI against other map interpretation algorithms: TEXTAL and Resolve  Used ten model-phased maps  Smoothly diminished reflection intensities yielding 2.5, 3.0, 3.5, 4.0 Å resolution maps RMS Deviation Cα RMS Deviation 12 ACMI ACMI ACMI Textal Textal Resolve Resolve 10 8 6 4 2 0 2.0 2.5 3.0 3.5 4.0 Density Map Resolution 4.5 Model Completeness % residues identified % chain traced 100% 80% 60% ACMI ACMI ACMI 40% Textal Textal 20% Resolv11e Resolve 0% 2.0 2.5 3.0 3.5 4.0 4.5 100% 80% 60% 40% 20% 0% 2.0 2.5 Density Map Resolution 3.0 3.5 4.0 4.5 16 14 12 10 8 2.5A 3.0A 3.5A 4.0A 6 4 2 0 0 2 4 6 8 10 12 14 16 Resolve RMS Error TEXTAL RMS Error Per-protein RMS Deviation 16 14 12 10 8 6 4 2 0 0 ACMI RMS Error 2 4 6 8 10 12 14 16 Conclusions  ACMI effectively combines weakly-matching templates to construct a full model  Produces an accurate trace even with poor-quality density map data  Reduces computational complexity from O(N2 X2) to O(N X log X)  Inference possible for even large unit cells Future Work  Improve “amino-acid-finding” algorithm  Incorporate sidechain placement / refinement  Manage missing data Disordered regions  Only exterior visible (e.g., in CryoEM)  Acknowledgements  Ameet Soni  Craig Bingman  NLM grants 1R01 LM008796 and 1T15 LM007359

dimaio.ismb06.ppt

Related documents

Products

Support

dimaio.ismb06.ppt

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib