A Probabilistic Approach to
Protein Backbone Tracing in
Electron Density Maps
Frank DiMaio, Jude Shavlik
Computer Sciences Department
George Phillips
Biochemistry Department
University of Wisconsin – Madison
USA
Presented at the Fourteenth Conference on Intelligent Systems for Molecular Biology (ISMB 2006),
Fortaleza, Brazil, August 7, 2006
X-ray Crystallography
[Figure: X-ray beam → protein crystal → collection plate → FFT → electron density map (“3D picture”)]
Given: Sequence + Density Map
[Figure: sequence + electron density map]
Find: Each Atom’s Coordinates
[Figure: atomic model with N, O, and H atoms placed in the density map]
Our Subtask: Backbone Trace
[Figure: backbone trace through the Cα positions]
The Unit Cell
• 3D density function ρ(x,y,z) provided over unit cell
• Unit cell may contain multiple copies of the protein
Density Map Resolution
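[Figure: example density maps at 2Å, 3Å, and 4Å resolution, annotated with existing methods ARP/wARP (Perrakis et al. 1997), TEXTAL (Ioerger et al. 1999), and Resolve (Terwilliger 2002), and with “Our focus”]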
Overview of ACMI (our method)
• Local Match
   - Algorithm searches for sequence-specific 5-mers centered at each amino acid
   - Many false positives
• Global Consistency
   - Use probabilistic model to filter false positives
   - Find most probable backbone trace
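5-mer Lookup and Cluster
[Figure: a 5-mer from the sequence …VKH V LVSPEKIEELIKGY… is looked up in the PDB and the matching structures are clustered; Cluster 1 has wt=0.67, Cluster 2 has wt=0.33]
NOTE: can be done in precompute step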
5-mer Search
• 6D search (rotation + translation) for representative structures in density map
• Compute “similarity”
\[ t(x) = \sum_{y} \epsilon_{frag}(y) \left( \rho_{frag}(y) - \rho_{map}(x - y) \right)^2 \]
• Computed by Fourier convolution (Cowtan 2001)
• Use tuneset to convert similarity score to probability
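The following is a minimal sketch of why a squared-difference score of this kind can be computed by Fourier convolution. It assumes a 1D periodic grid whose length is a power of two, and it assumes the score has the form t(x) = Σ_y ε(y) (ρ_frag(y) − ρ_map(x − y))², which expands into a constant plus two convolutions. All function names are hypothetical, and the real search is over a 3D map and over rotations as well.

```python
import cmath

def fft(a, inverse=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def circular_convolve(a, b):
    """(a * b)(x) = sum_y a(y) b(x - y), indices modulo n, computed via FFT."""
    n = len(a)
    prod = [p * q for p, q in zip(fft(a), fft(b))]
    return [v.real / n for v in fft(prod, inverse=True)]

def similarity_fft(eps, rho_frag, rho_map):
    """Assumed score t(x) = sum_y eps(y) * (rho_frag(y) - rho_map(x - y))**2,
    expanded as: const - 2 * conv(eps*rho_frag, rho_map) + conv(eps, rho_map**2)."""
    const = sum(e * f * f for e, f in zip(eps, rho_frag))
    cross = circular_convolve([e * f for e, f in zip(eps, rho_frag)], rho_map)
    square = circular_convolve(eps, [m * m for m in rho_map])
    return [const - 2 * c + s for c, s in zip(cross, square)]

def similarity_direct(eps, rho_frag, rho_map):
    """Same score by brute force, O(X^2); used only to check the FFT version."""
    n = len(rho_map)
    return [sum(eps[y] * (rho_frag[y] - rho_map[(x - y) % n]) ** 2
                for y in range(n))
            for x in range(n)]

# Toy data: a 3-point fragment (mask eps) searched in an 8-point periodic map.
eps = [1, 1, 1, 0, 0, 0, 0, 0]
rho_frag = [0.2, 1.0, 0.3, 0, 0, 0, 0, 0]
rho_map = [0.0, 0.1, 0.3, 1.0, 0.2, 0.0, 0.5, 0.1]
t_fast = similarity_fft(eps, rho_frag, rho_map)
t_slow = similarity_direct(eps, rho_frag, rho_map)
```

The FFT route costs O(X log X) per fragment orientation instead of O(X²), which is the same saving the talk later relies on for message passing.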
Convert Scores to Probabilities
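[Figure: the 5-mer representative is matched to the tuneset, giving score distributions for positive (POS) and negative (NEG) matches; searching the density map gives scores t_i(u_i); Bayes’ rule combines the two into a probability distribution over the unit cell, P(5-mer at u_i | Map)]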
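As a rough illustration of the score-to-probability step, the sketch below applies Bayes' rule to a single similarity score. It assumes, purely for illustration, that the positive and negative tuneset scores are each modeled by a Gaussian and that a prior probability of a true match is supplied; the slides do not say which distributions or prior ACMI uses, and all names and numbers here are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def fit_gaussian(scores):
    """Maximum-likelihood mean and standard deviation of a list of scores."""
    mean = sum(scores) / len(scores)
    var = sum((s - mean) ** 2 for s in scores) / len(scores)
    return mean, math.sqrt(var)

def match_probability(score, pos_scores, neg_scores, prior_pos):
    """P(match | score) by Bayes' rule, with Gaussian class-conditional
    densities fitted to tuneset scores (an assumption of this sketch)."""
    pos_mean, pos_std = fit_gaussian(pos_scores)
    neg_mean, neg_std = fit_gaussian(neg_scores)
    like_pos = gaussian_pdf(score, pos_mean, pos_std) * prior_pos
    like_neg = gaussian_pdf(score, neg_mean, neg_std) * (1.0 - prior_pos)
    return like_pos / (like_pos + like_neg)

# Made-up tuneset scores; a low squared-difference score means a better match.
pos_scores = [1.0, 1.5, 2.0, 1.2]
neg_scores = [4.0, 5.0, 6.0, 5.5]
p_good = match_probability(1.4, pos_scores, neg_scores, prior_pos=0.01)
p_bad = match_probability(5.0, pos_scores, neg_scores, prior_pos=0.01)
```

Applying `match_probability` to the score at every grid point would give an unnormalized map of where the 5-mer is likely to sit.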
In This Talk…
• Where we are now
   - For each amino acid in the protein, we have a probability distribution over the unit cell, $P(u_i \mid \text{Map})$
• Where we are headed
   - Find the backbone layout maximizing
\[ \prod_{\text{AAs } i} P(u_i \mid \text{Map}) \times \prod_{\text{AA-pairs } i,j} P(\text{conformation}\{u_i, u_j\}) \]
Pairwise Markov Field Models
• A type of undirected graphical model
• Represent joint probabilities as product of vertex and edge potentials
• Similar to (but more general than) Bayesian networks
[Figure: example graph over labels u1, u2, u3 with evidence y]
\[ p(U \mid y) \propto \prod_{\text{edges } s - t} \psi_{st}(u_s, u_t) \times \prod_{\text{vertices } s} \phi_s(u_s \mid y) \]
Protein Backbone Model
[Figure: chain of vertices ALA, GLY, LYS, LEU]
• Each vertex is an amino acid
• Each label $u_i = \langle x_i, q_i \rangle$ is location + orientation
• Evidence y is the electron density map
• Each vertex (or observational) potential $\phi_i(u_i \mid y)$ comes from the 5-mer matching
Protein Backbone Model
[Figure: chain of vertices ALA, GLY, LYS, LEU]
• Two types of structural (edge) potentials
   - Adjacency constraints ensure adjacent amino acids are ~3.8Å apart and in the proper orientation
   - Occupancy constraints ensure nonadjacent amino acids do not occupy same 3D space
Backbone Model Potential
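\[ p(\mathbf{u} \mid \text{Map}) \propto \prod_{\text{adjacent AAs } i,j} \psi_{adj}(u_i, u_j) \times \prod_{\text{nonadjacent AAs } i,j} \psi_{occ}(u_i, u_j) \times \prod_{\text{amino acid } i} \phi_i(u_i \mid \text{Map}) \]
Constraints between adjacent amino acids:
\[ \psi_{adj}(u_i, u_j) = p_x(\lVert x_i - x_j \rVert) \times p_\theta(u_i, u_j) \]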
Backbone Model Potential
Constraints between nonadjacent amino acids:
\[ \psi_{occ}(u_i, u_j) = \begin{cases} 0 & \text{if } \lVert x_i - x_j \rVert < K \\ 1 & \text{otherwise} \end{cases} \]
Backbone Model Potential
Observational (“amino-acid-finder”) probabilities:
\[ \phi_i(u_i \mid \text{Map}) = \Pr(\text{5-mer } s_{i-2} \ldots s_{i+2} \text{ at } u_i) \]
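To make the three kinds of potential concrete, the sketch below evaluates the unnormalized product of adjacency, occupancy, and observational terms for a toy chain of Cα positions. Several things are assumptions of this sketch rather than details from the talk: the distance term is a Gaussian centered on 3.8 Å with a made-up width, the orientation term is omitted, the occupancy radius K is set to 3.0 Å, and the observational values are passed in as plain numbers.

```python
import math

def distance(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def psi_adj(xi, xj, ideal=3.8, std=0.5):
    """Adjacency potential, distance term only (orientation term omitted).
    The Gaussian shape and std=0.5 are illustrative assumptions."""
    d = distance(xi, xj)
    return math.exp(-0.5 * ((d - ideal) / std) ** 2)

def psi_occ(xi, xj, k=3.0):
    """Occupancy potential: 0 if the two residues are closer than K, else 1.
    K = 3.0 is a placeholder value."""
    return 0.0 if distance(xi, xj) < k else 1.0

def backbone_potential(positions, phi):
    """Unnormalized p(u | Map) for a chain of C-alpha positions.
    phi[i] stands in for the observational potential of residue i."""
    total = 1.0
    n = len(positions)
    for i in range(n):
        total *= phi[i]
        for j in range(i + 1, n):
            if j == i + 1:
                total *= psi_adj(positions[i], positions[j])
            else:
                total *= psi_occ(positions[i], positions[j])
    return total

phi = [0.5, 0.5, 0.5]
extended = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
clashing = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (1.0, 0.0, 0.0)]
score_extended = backbone_potential(extended, phi)
score_clashing = backbone_potential(clashing, phi)
```

In the toy data, the extended chain keeps ideal spacing and scores the product of its observational terms, while the layout that folds residue 3 back onto residue 1 is ruled out by the occupancy term.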
Probabilistic Inference
• Want to find backbone layout that maximizes
\[ \prod_{\text{adjacent AAs } i,j} \psi_{adj}(u_i, u_j) \times \prod_{\text{nonadjacent AAs } i,j} \psi_{occ}(u_i, u_j) \times \prod_{\text{amino acid } i} \phi_i(u_i \mid \text{Map}) \]
• Exact methods are intractable
• Use belief propagation (BP) to approximate marginal distributions
\[ p_i(u_i \mid \text{Map}) = \int_{u_k,\, k \neq i} p(U \mid \text{Map}) \]
Belief Propagation (BP)
• Iterative, message-passing method (Pearl 1988)
• A message, $m^n_{i \to j}$, from amino acid i to amino acid j indicates where i expects to find j
• An approximation to the marginal (or belief), $b^n_i$, is given as the product of incoming messages
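The message-passing idea can be sketched on a very small discrete model. The code below runs sum-product BP on a three-node chain with two labels per node and compares the resulting beliefs with brute-force marginals; on a chain (a tree) BP is exact, so the two should agree. This is a generic textbook formulation, not ACMI's implementation: labels here are discrete rather than positions in the unit cell, each belief also includes the node's own vertex potential, and all names and numbers are illustrative.

```python
import itertools

def belief_propagation(phi, psi, edges, iterations=10):
    """Sum-product BP on a pairwise model with discrete labels.
    phi[s][a]         : vertex potential of node s taking label a
    psi[(s, t)][a][b] : edge potential for node s = a, node t = b
    edges             : list of (s, t) pairs, matching the keys of psi
    """
    n_labels = len(phi[0])
    neighbors = {s: [] for s in range(len(phi))}
    for s, t in edges:
        neighbors[s].append(t)
        neighbors[t].append(s)

    def edge_pot(s, t, a, b):
        return psi[(s, t)][a][b] if (s, t) in psi else psi[(t, s)][b][a]

    # Start with uniform messages in both directions along every edge.
    msgs = {(s, t): [1.0] * n_labels for s in neighbors for t in neighbors[s]}
    for _ in range(iterations):
        new = {}
        for (s, t) in msgs:
            out = []
            for b in range(n_labels):
                total = 0.0
                for a in range(n_labels):
                    incoming = 1.0
                    for k in neighbors[s]:
                        if k != t:
                            incoming *= msgs[(k, s)][a]
                    total += phi[s][a] * edge_pot(s, t, a, b) * incoming
                out.append(total)
            z = sum(out)
            new[(s, t)] = [v / z for v in out]
        msgs = new

    # Belief = vertex potential times the product of incoming messages.
    beliefs = []
    for s in range(len(phi)):
        b = list(phi[s])
        for k in neighbors[s]:
            b = [v * m for v, m in zip(b, msgs[(k, s)])]
        z = sum(b)
        beliefs.append([v / z for v in b])
    return beliefs

def exact_marginals(phi, psi, edges):
    """Brute-force marginals by enumerating every joint labeling."""
    n, n_labels = len(phi), len(phi[0])
    marg = [[0.0] * n_labels for _ in range(n)]
    for labels in itertools.product(range(n_labels), repeat=n):
        p = 1.0
        for s in range(n):
            p *= phi[s][labels[s]]
        for s, t in edges:
            p *= psi[(s, t)][labels[s]][labels[t]]
        for s in range(n):
            marg[s][labels[s]] += p
    return [[v / sum(row) for v in row] for row in marg]

phi = [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]]
same = [[0.9, 0.1], [0.1, 0.9]]
psi = {(0, 1): same, (1, 2): same}
edges = [(0, 1), (1, 2)]
bp = belief_propagation(phi, psi, edges)
exact = exact_marginals(phi, psi, edges)
```

The backbone model is not a tree, because occupancy edges connect every nonadjacent pair, so there BP gives only approximate marginals.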
Belief Propagation Example
[Figure: two-residue example (ALA, GLY) showing the beliefs b_ALA(x_ALA | y) and b_GLY(x_GLY | y) and the messages passed between ALA and GLY over successive iterations]
Technical Challenges
• Representation of potentials
   - Store Fourier coefficients in Cartesian space
   - At each location x, store a single orientation r
• Speeding up O(N²X²) naïve implementation
   - X = the unit cell size (# Fourier coefficients)
   - N = the number of residues in the protein
Speeding Up O(N²X²) Implementation
• O(X²) computation for each occupancy message
   - Each message must integrate over the unit cell
   - O(X log X) as multiplication in Fourier space
• O(N²) messages computed & stored
   - Approximate N−3 occupancy messages with a single message
   - O(N) messages using a message product accumulator
• Improved implementation: O(NX log X)
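One way to picture a message product accumulator is sketched below: rather than forming, for each residue, a fresh product of all the other messages, a single running product is kept and each message is divided back out when needed. The slides do not describe ACMI's accumulator in detail, so this is only an illustration of the general idea; it works in log space, assumes every message value is strictly positive, and uses hypothetical names.

```python
import math

def products_excluding_each(messages):
    """For each j, the pointwise product of all messages except messages[j].
    One accumulated log-product is built in O(N X); each result is then
    obtained by subtracting one message's log, instead of O(N^2 X) work."""
    n_cells = len(messages[0])
    log_acc = [0.0] * n_cells
    for m in messages:
        for x in range(n_cells):
            log_acc[x] += math.log(m[x])
    return [[math.exp(log_acc[x] - math.log(m[x])) for x in range(n_cells)]
            for m in messages]

def products_naive(messages):
    """Reference version: recompute every leave-one-out product directly."""
    result = []
    for j in range(len(messages)):
        prod = [1.0] * len(messages[0])
        for k, m in enumerate(messages):
            if k != j:
                prod = [p * v for p, v in zip(prod, m)]
        result.append(prod)
    return result

# Four toy messages over a three-cell "unit cell".
msgs = [[0.2, 0.5, 0.3], [0.6, 0.1, 0.3], [0.25, 0.25, 0.5], [0.4, 0.4, 0.2]]
fast = products_excluding_each(msgs)
slow = products_naive(msgs)
```

Because occupancy potentials can drive message values to zero, a real implementation would need to handle zeros explicitly rather than take logarithms blindly.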
1XMT at 3Å Resolution
[Figure: backbone trace colored by prob(AA at location), from LOW (0.17) to HIGH (0.82)]
1.12Å RMSd
100% coverage
1VMO at 4Å Resolution
[Figure: backbone trace colored by prob(AA at location), from LOW (0.02) to HIGH (0.25)]
3.63Å RMSd
72% coverage
1YDH at 3.5Å Resolution
[Figure: backbone trace colored by prob(AA at location), from LOW (0.02) to HIGH (0.27)]
1.47Å RMSd
90% coverage
Experiments
• Tested ACMI against other map interpretation algorithms: TEXTAL and Resolve
• Used ten model-phased maps
• Smoothly diminished reflection intensities, yielding 2.5, 3.0, 3.5, and 4.0 Å resolution maps
RMS Deviation
[Figure: Cα RMS deviation (y-axis, 0 to 12) versus density map resolution (x-axis, 2.0 to 4.5 Å) for ACMI, TEXTAL, and Resolve]
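For readers unfamiliar with the two reported metrics, the sketch below shows one plausible way to compute a Cα RMS deviation and a coverage fraction from a predicted and a reference trace. The exact definitions used in the experiments are not given in the slides; this sketch assumes both traces are already in the same coordinate frame (no superposition), that RMSD is taken over the residues that were actually traced, and that coverage is the fraction of reference residues with a prediction. All names and coordinates are hypothetical.

```python
import math

def ca_rmsd_and_coverage(predicted, reference):
    """predicted, reference: dicts mapping residue index -> (x, y, z) in Å.
    Residues missing from `predicted` count against coverage only."""
    squared = []
    for idx, ref in reference.items():
        pred = predicted.get(idx)
        if pred is not None:
            squared.append(sum((p - r) ** 2 for p, r in zip(pred, ref)))
    coverage = len(squared) / len(reference)
    rmsd = math.sqrt(sum(squared) / len(squared)) if squared else float("nan")
    return rmsd, coverage

reference = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0),
             3: (7.6, 0.0, 0.0), 4: (11.4, 0.0, 0.0)}
predicted = {1: (0.0, 0.0, 1.0), 2: (3.8, 1.0, 0.0), 3: (7.6, 0.0, 0.0)}
rmsd, coverage = ca_rmsd_and_coverage(predicted, reference)
```

With these toy inputs, three of four residues are traced, two of them 1 Å off, so coverage is 0.75 and the RMSD is sqrt(2/3) Å.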
Model Completeness
[Figure: two panels, % residues identified and % chain traced (y-axis, 0% to 100%), versus density map resolution (x-axis, 2.0 to 4.5 Å), for ACMI, TEXTAL, and Resolve]
Per-protein RMS Deviation
[Figure: two scatter plots comparing ACMI RMS error with Resolve RMS error and with TEXTAL RMS error for each protein (axes 0 to 16), with points grouped by map resolution: 2.5Å, 3.0Å, 3.5Å, 4.0Å]
Conclusions
• ACMI effectively combines weakly-matching templates to construct a full model
• Produces an accurate trace even with poor-quality density map data
• Reduces computational complexity from O(N²X²) to O(NX log X)
• Inference possible for even large unit cells
Future Work
• Improve “amino-acid-finding” algorithm
• Incorporate sidechain placement / refinement
• Manage missing data
   - Disordered regions
   - Only exterior visible (e.g., in CryoEM)
Acknowledgements
• Ameet Soni
• Craig Bingman
• NLM grants 1R01 LM008796 and 1T15 LM007359