Remote Homology Detection of Beta-Structural Motifs Using Random Fields ISMB 3Dsig 2010

Remote Homology Detection of
Beta-Structural Motifs Using
Random Fields
Matt Menke, Tufts
Bonnie Berger, MIT
Lenore Cowen, Tufts
ISMB 3Dsig 2010
July 10, 2010
Inferring structural similarity from homology is
hard at the SCOP superfamily/fold level
Profile HMMs
HMM is trained from Sequence
Alignment of Known Structures
But: cannot capture pairwise
long-range beta-sheet interactions!
HMMs cannot capture statistical preferences
between residues that are close in space but far
apart, at a variable distance, in sequence.
Pectate Lyase C (Yoder et al. 1993)
Look at Just Pairs or Generalize to
Markov Random Fields
Only look at pairs:
[Bradley, Cowen, Menke, King, Berger, PNAS, 2001, 98:26, 14,819-14,824;
Cowen, Bradley, Menke, King, Berger (2002), J Comp Biol, 9, 261-276]
Generalize to Markov Random Fields:
Liu et al. 2009
Zhao et al. 2010
Menke et al. 2010 (this work)
[Figure: beta-strand diagram with labels B1, B2, B3, T2]
Let’s look at what this would mean
for propeller folds
Goal: capture HMM sequence information and
pairwise information in beta-structural motifs at
the same time!
SCOP (http://scop.mrc-lmb.cam.ac.uk/scop)
SMURF: Structural Motifs Using Random
Fields
Can we get the benefit of pairwise correlations
without having to throw away all sequence info?
The template is learned from solved
structures in the PDB:
Aligned with Matt
Digression: the Matt structural
alignment program
Menke, Berger, Cowen
(PLoS Comput Biol 2008)
Specifically designed to align
more distant homologs
Chains aligned fragment pairs (AFPs) using dynamic
programming, allowing
“translations and twists”
(flexibility) between fragments
Two beta tables are learned from amphipathic beta sheets
in solved PDB structures that are not propellers.
Buried Residue
     A    C    D    E    F    G    H    I    K    L    M    N    P    Q    R    S    T    V    W    Y
A 0.78 0.18 0.14 0.15 0.59 0.70 0.06 1.06 0.07 1.19 0.17 0.12 0.05 0.11 0.08 0.22 0.25 1.53 0.17 0.27
C 0.18 0.24 0.03 0.06 0.12 0.14 0.05 0.28 0.03 0.34 0.07 0.02 0.01 0.03 0.02 0.05 0.08 0.39 0.10 0.10
D 0.14 0.03 0.03 0.06 0.10 0.15 0.02 0.11 0.01 0.16 0.05 0.07 0.01 0.05 0.08 0.07 0.11 0.16 0.03 0.03
E 0.15 0.06 0.06 0.05 0.26 0.18 0.14 0.40 0.10 0.57 0.08 0.10 0.02 0.08 0.15 0.19 0.25 0.57 0.05 0.18
F 0.59 0.12 0.10 0.26 0.66 0.61 0.10 1.06 0.05 1.19 0.24 0.08 0.05 0.15 0.08 0.13 0.22 1.35 0.13 0.43
G 0.70 0.14 0.15 0.18 0.61 0.58 0.10 0.77 0.07 1.13 0.11 0.23 0.07 0.17 0.09 0.24 0.31 1.27 0.18 0.48
H 0.06 0.05 0.02 0.14 0.10 0.10 0.04 0.13 0.02 0.13 0.04 0.05 0.01 0.01 0.02 0.06 0.09 0.23 0.03 0.07
I 1.06 0.28 0.11 0.40 1.06 0.77 0.13 2.27 0.10 2.21 0.38 0.14 0.05 0.29 0.13 0.26 0.45 2.56 0.18 0.42
K 0.07 0.03 0.01 0.10 0.05 0.07 0.02 0.10 0.03 0.16 0.03 0.04 0.00 0.05 0.01 0.05 0.05 0.17 0.02 0.10
L 1.19 0.34 0.16 0.57 1.19 1.13 0.13 2.21 0.16 2.96 0.48 0.18 0.06 0.33 0.18 0.29 0.36 2.64 0.25 0.50
M 0.17 0.07 0.05 0.08 0.24 0.11 0.04 0.38 0.03 0.48 0.10 0.01 0.01 0.03 0.04 0.06 0.07 0.49 0.08 0.06
N 0.12 0.02 0.07 0.10 0.08 0.23 0.05 0.14 0.04 0.18 0.01 0.05 0.01 0.05 0.06 0.12 0.16 0.18 0.04 0.08
P 0.05 0.01 0.01 0.02 0.05 0.07 0.01 0.05 0.00 0.06 0.01 0.01 0.01 0.01 0.01 0.02 0.02 0.09 0.02 0.04
Q 0.11 0.03 0.05 0.08 0.15 0.17 0.01 0.29 0.05 0.33 0.03 0.05 0.01 0.04 0.08 0.17 0.17 0.27 0.05 0.13
R 0.08 0.02 0.08 0.15 0.08 0.09 0.02 0.13 0.01 0.18 0.04 0.06 0.01 0.08 0.04 0.05 0.07 0.16 0.02 0.07
S 0.22 0.05 0.07 0.19 0.13 0.24 0.06 0.26 0.05 0.29 0.06 0.12 0.02 0.17 0.05 0.17 0.15 0.29 0.08 0.09
T 0.25 0.08 0.11 0.25 0.22 0.31 0.09 0.45 0.05 0.36 0.07 0.16 0.02 0.17 0.07 0.15 0.25 0.44 0.03 0.11
V 1.53 0.39 0.16 0.57 1.35 1.27 0.23 2.56 0.17 2.64 0.49 0.18 0.09 0.27 0.16 0.29 0.44 3.74 0.23 0.64
W 0.17 0.10 0.03 0.05 0.13 0.18 0.03 0.18 0.02 0.25 0.08 0.04 0.02 0.05 0.02 0.08 0.03 0.23 0.05 0.05
Y 0.27 0.10 0.03 0.18 0.43 0.48 0.07 0.42 0.10 0.50 0.06 0.08 0.04 0.13 0.07 0.09 0.11 0.64 0.05 0.10
Exposed Residue
     A    C    D    E    F    G    H    I    K    L    M    N    P    Q    R    S    T    V    W    Y
A 0.27 0.04 0.13 0.28 0.22 0.18 0.11 0.31 0.23 0.38 0.06 0.11 0.06 0.13 0.22 0.28 0.37 0.49 0.06 0.25
C 0.04 0.08 0.05 0.07 0.04 0.03 0.03 0.04 0.07 0.04 0.02 0.06 0.01 0.08 0.11 0.05 0.06 0.10 0.04 0.09
D 0.13 0.05 0.09 0.13 0.09 0.08 0.13 0.08 0.71 0.12 0.06 0.22 0.03 0.15 0.50 0.36 0.41 0.24 0.02 0.12
E 0.28 0.07 0.13 0.43 0.31 0.15 0.21 0.43 1.92 0.50 0.14 0.28 0.10 0.25 1.49 0.60 1.01 0.63 0.09 0.32
F 0.22 0.04 0.09 0.31 0.23 0.16 0.12 0.34 0.28 0.32 0.12 0.14 0.06 0.19 0.29 0.27 0.34 0.38 0.13 0.33
G 0.18 0.03 0.08 0.15 0.16 0.08 0.06 0.15 0.16 0.15 0.06 0.08 0.05 0.10 0.15 0.14 0.17 0.21 0.03 0.19
H 0.11 0.03 0.13 0.21 0.12 0.06 0.06 0.08 0.25 0.12 0.04 0.10 0.07 0.11 0.14 0.19 0.20 0.21 0.05 0.14
I 0.31 0.04 0.08 0.43 0.34 0.15 0.08 0.48 0.57 0.32 0.10 0.14 0.07 0.28 0.43 0.30 0.32 0.59 0.07 0.40
K 0.23 0.07 0.71 1.92 0.28 0.16 0.25 0.57 0.63 0.38 0.15 0.46 0.08 0.42 0.33 0.70 1.17 0.71 0.22 0.52
L 0.38 0.04 0.12 0.50 0.32 0.15 0.12 0.32 0.38 0.48 0.10 0.15 0.12 0.23 0.36 0.26 0.34 0.62 0.07 0.39
M 0.06 0.02 0.06 0.14 0.12 0.06 0.04 0.10 0.15 0.10 0.12 0.09 0.04 0.08 0.10 0.12 0.14 0.10 0.02 0.08
N 0.11 0.06 0.22 0.28 0.14 0.08 0.10 0.14 0.46 0.15 0.09 0.38 0.09 0.22 0.25 0.48 0.49 0.27 0.05 0.18
P 0.06 0.01 0.03 0.10 0.06 0.05 0.07 0.07 0.08 0.12 0.04 0.09 0.02 0.06 0.07 0.07 0.13 0.13 0.02 0.16
Q 0.13 0.08 0.15 0.25 0.19 0.10 0.11 0.28 0.42 0.23 0.08 0.22 0.06 0.24 0.32 0.28 0.48 0.26 0.03 0.16
R 0.22 0.11 0.50 1.49 0.29 0.15 0.14 0.43 0.33 0.36 0.10 0.25 0.07 0.32 0.36 0.47 0.68 0.72 0.11 0.30
S 0.28 0.05 0.36 0.60 0.27 0.14 0.19 0.30 0.70 0.26 0.12 0.48 0.07 0.28 0.47 0.91 0.88 0.50 0.06 0.27
T 0.37 0.06 0.41 1.01 0.34 0.17 0.20 0.32 1.17 0.34 0.14 0.49 0.13 0.48 0.68 0.88 1.60 0.82 0.07 0.27
V 0.49 0.10 0.24 0.63 0.38 0.21 0.21 0.59 0.71 0.62 0.10 0.27 0.13 0.26 0.72 0.50 0.82 0.87 0.21 0.64
W 0.06 0.04 0.02 0.09 0.13 0.03 0.05 0.07 0.22 0.07 0.02 0.05 0.02 0.03 0.11 0.06 0.07 0.21 0.02 0.13
Y 0.25 0.09 0.12 0.32 0.33 0.19 0.14 0.40 0.52 0.39 0.08 0.18 0.16 0.16 0.30 0.27 0.27 0.64 0.13 0.38
http://bcb.cs.tufts.edu/propellers/si/
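As a rough illustration of how a symmetric pairwise table of this shape could be tallied, here is a minimal Python sketch. It is not the SMURF training code. The input format (a list of residue pairs facing each other on adjacent strands), the function name `pair_frequency_table`, and the normalization to percentages are all assumptions made for the example; the slides do not state how the table entries are scaled.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pair_frequency_table(pairs):
    """Tally a symmetric 20x20 table of paired-residue frequencies.

    `pairs` is a hypothetical input: an iterable of (res_a, res_b) tuples,
    one per pair of residues that face each other on adjacent beta-strands.
    Entries are percentages over all ordered pairs (an assumed scaling).
    """
    counts = Counter()
    total = 0
    for a, b in pairs:
        if a not in AMINO_ACIDS or b not in AMINO_ACIDS:
            continue  # skip non-standard residue codes
        # Count both orientations so that table[(a, b)] == table[(b, a)].
        counts[(a, b)] += 1
        counts[(b, a)] += 1
        total += 2
    if total == 0:
        raise ValueError("no usable residue pairs")
    return {
        (a, b): 100.0 * counts[(a, b)] / total
        for a, b in product(AMINO_ACIDS, repeat=2)
    }


# Toy data only; real tables would be built separately from the buried
# and the exposed positions of the training sheets.
toy_pairs = [("V", "V"), ("V", "I"), ("I", "L"), ("K", "E")]
toy_table = pair_frequency_table(toy_pairs)
```

Calling the function once on pairs from buried positions and once on pairs from exposed positions would give two tables, matching the two-table layout above.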
Computing a Score
• Sequences are scored by computing their
best “threading” or “parse” against the
template as a sum of HMM(score) +
pairwise(score)
• No longer polynomial time (multidimensional dynamic programming)
• Tractable on propellers because paired
beta-strands don’t interleave too much
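The scoring rule in these bullets can be sketched for a single, already-chosen threading. The Python below is a simplified illustration, not the SMURF implementation: it omits HMM transition terms, it picks the best threading by brute force over an explicit candidate list rather than by multidimensional dynamic programming, and every name in it (`threading_score`, `best_threading`, the dictionary layouts) is hypothetical.

```python
def threading_score(residues, emission_logp, strand_pairs, pair_logp):
    """Score one fixed threading as HMM score + pairwise score.

    residues:      dict mapping template position -> residue letter
    emission_logp: dict mapping position -> {residue: log-probability}
    strand_pairs:  list of (pos_i, pos_j) template positions paired in a sheet
    pair_logp:     dict mapping (res_a, res_b) -> log pairwise score
    """
    # HMM part: one emission term per template position (transitions omitted).
    hmm_score = sum(emission_logp[pos][res] for pos, res in residues.items())
    # Pairwise part: one term per pair of template positions in contact.
    pairwise_score = sum(
        pair_logp[(residues[i], residues[j])] for i, j in strand_pairs
    )
    return hmm_score + pairwise_score


def best_threading(candidates, emission_logp, strand_pairs, pair_logp):
    """Return the highest-scoring threading from an explicit candidate list."""
    return max(
        candidates,
        key=lambda r: threading_score(r, emission_logp, strand_pairs, pair_logp),
    )


# Toy two-position template whose positions 0 and 1 are paired.
emission_logp = {0: {"V": -1.0, "K": -2.0}, 1: {"I": -1.0, "E": -3.0}}
strand_pairs = [(0, 1)]
pair_logp = {("V", "I"): -0.5, ("K", "E"): -0.25}
candidates = [{0: "V", 1: "I"}, {0: "K", 1: "E"}]
winner = best_threading(candidates, emission_logp, strand_pairs, pair_logp)
```

Because a pairwise term couples two positions that can be far apart in sequence, the best threading can no longer be found one column at a time, which is why the slide notes the move to multidimensional dynamic programming.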
Let’s look at what this would mean
for propeller folds
• Training set for HMM
score: leave-superfamily-out
cross-validation
• Training set for
pairwise score:
amphipathic beta-sheets from non-propellers
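A leave-superfamily-out split can be sketched as follows. This is a generic illustration of the cross-validation scheme named above, not the authors' pipeline; the input format (a list of `(protein_id, superfamily)` tuples) and the labels in the toy data are hypothetical.

```python
from collections import defaultdict


def leave_superfamily_out(proteins):
    """Yield (held_out_superfamily, train_ids, test_ids) splits.

    `proteins` is a hypothetical list of (protein_id, superfamily) tuples.
    Each superfamily is held out once; the template would be trained on
    every other superfamily and tested on the held-out one.
    """
    by_superfamily = defaultdict(list)
    for protein_id, superfamily in proteins:
        by_superfamily[superfamily].append(protein_id)

    for held_out in sorted(by_superfamily):
        test_ids = list(by_superfamily[held_out])
        train_ids = [
            protein_id
            for superfamily in sorted(by_superfamily)
            if superfamily != held_out
            for protein_id in by_superfamily[superfamily]
        ]
        yield held_out, train_ids, test_ids


# Toy labels only; "sfA", "sfB", "sfC" are placeholders, not SCOP identifiers.
toy = [("p1", "sfA"), ("p2", "sfA"), ("p3", "sfB"), ("p4", "sfC")]
splits = list(leave_superfamily_out(toy))
```

Holding out a whole superfamily at a time means no close relative of a test protein contributes to the HMM it is scored against, which is the point of a remote homology test.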
Results on Propellers
         6-bladed        7-bladed
TNeg   Hmmer  Smurf    Hmmer  Smurf
97%       52     80       80     87
96%       56     80       80     87
95%       64     80       87     93
94%       68     84       90     93
93%       68     84       90     93
92%       68     88       90     97
91%       68     92       90     97
90%       68     92       93    100
Results on Propellers
• Note that this is “6 (or 7)”-bladed propeller
versus non-propeller; distinguishing the
number of blades in the propeller seems to
be a much harder problem.
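One plausible reading of the results table is that each row fixes a score threshold rejecting the stated fraction of non-propellers (TNeg) and reports the percentage of true propellers still accepted. Under that assumption, which the slides do not spell out, such a number could be computed as in this Python sketch. The function name and the toy scores are hypothetical.

```python
import math


def true_positive_pct_at_tneg(pos_scores, neg_scores, tneg):
    """Percent of positives accepted at a threshold rejecting `tneg` of negatives.

    pos_scores: scores of known propellers (higher = more propeller-like)
    neg_scores: scores of known non-propellers
    tneg:       required true-negative rate as a fraction, e.g. 0.95
    """
    ranked = sorted(neg_scores)
    # Number of negatives that must fall at or below the threshold.
    k = math.ceil(tneg * len(ranked))
    threshold = ranked[k - 1] if k > 0 else -math.inf
    # A positive is accepted only if it scores strictly above the threshold.
    accepted = sum(1 for score in pos_scores if score > threshold)
    return 100.0 * accepted / len(pos_scores)


# Toy scores, not data from the talk.
toy_pct = true_positive_pct_at_tneg(
    pos_scores=[1.5, 2.5, 3.5, 5.0],
    neg_scores=[1.0, 2.0, 3.0, 4.0],
    tneg=0.5,
)
```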
Different propeller closures
1jof
2trc
So: what new sequences fold into
propellers?
• We predict a double propeller motif in the N-terminal region of a hybrid 2-component sensor
protein.
What are these proteins?
• First found in a benign bacterium in the human gut.
• May be involved in adapting to changes in
diet/efficiently processing different sugars
• Found in other bacterial species: help sense and
adapt to environmental changes.
• Big stretch (I am not a biologist): could they help in studying
the human obesity epidemic?
Popular Domains
• HisKA histidine kinase domain
• GGDEF adenylyl cyclase signalling domain
• SpoIIE sporulation domain
• Gaf domain
• PAS domain
• HATPase domain
Species distribution
Distinguishing Number of Blades
• The automatic SMURF consensus 7-bladed
template only learns 6 blades.
• Sequence motifs are similar: the same Pfam
motif occurs in propellers with different numbers
of blades
• The fix: throw out propellers with a “funky” 7th
blade by hand and build a new template. Now 6-bladed
propellers don’t like the 7-bladed
template
• Double propellers we found are probably 7-7
(but 7-6 is also plausible).
Predict propellers with Smurf!
• http://smurf.cs.tufts.edu
– Accepts sequences in FASTA format
– 6-, 7-, and 8-bladed templates, as well as all 9
double-propeller templates
• http://bcb.cs.tufts.edu/propellers/si
– pairwise tables
– long list of predicted propeller sequences
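Since the server takes sequences in FASTA format, a small reader can help check input before submitting it. The Python below is a generic FASTA parser sketch; it says nothing about the server's own interface, and the function name `read_fasta` and the example record are made up for illustration.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples.

    A record starts with a '>' header line; the sequence may span several
    lines, which are concatenated. Blank lines are ignored.
    """
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical record; the name and sequence are placeholders.
example = ">query1 example protein\nMKVLAT\nGGSE\n"
parsed = read_fasta(example)
```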
What’s Next for SMURF?
Long-range dependencies
Deeply interleaved β-strand pairs
Conclusions
• Combining an HMM score with a pairwise
score can help recognize beta-structures
• Computing this score exactly with a
random field is highly computationally
intensive
• We will begin to look at when it is feasible
and when we should use heuristics.
• Also: add side-chain packing, other model
refinements.
More Questions
• When should we over-weight the HMM
versus the pair portion of the score?
-- the case of 8-bladed propellers
• Are there other ways to incorporate
pairwise dependencies into HMMs?
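The weighting question can be explored with a simple convex combination of the two score components. The Python sketch below is one hypothetical way to set up that experiment, not something described in the talk: the weight parameter `hmm_weight`, the margin criterion, and the toy scores are all assumptions.

```python
def combined_score(hmm_score, pair_score, hmm_weight=0.5):
    """Blend the two components; hmm_weight=0.5 weights them equally."""
    return hmm_weight * hmm_score + (1.0 - hmm_weight) * pair_score


def best_weight(positives, negatives, weights):
    """Pick the weight with the largest gap between positives and negatives.

    positives, negatives: lists of (hmm_score, pair_score) tuples for
    known members and known non-members of the fold.
    The criterion (lowest positive minus highest negative) is an assumed,
    deliberately simple separation measure.
    """
    def margin(weight):
        lowest_pos = min(combined_score(h, p, weight) for h, p in positives)
        highest_neg = max(combined_score(h, p, weight) for h, p in negatives)
        return lowest_pos - highest_neg

    return max(weights, key=margin)


# Toy scores in which the pairwise component separates the classes better.
toy_pos = [(1.0, 5.0), (2.0, 4.0)]
toy_neg = [(3.0, 0.0), (2.0, 1.0)]
chosen = best_weight(toy_pos, toy_neg, weights=[0.0, 0.5, 1.0])
```

Running the same scan per fold (for example, separately for 8-bladed propellers) would show whether a single weight is adequate or whether some folds favor one component.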
An HMM is only as good as its
training data
• An HMM is only as good as its training
data. Or is it?
• Idea: we augment the training set, using
the simplest model of evolution!
• See Kumar and Cowen’s ISMB
proceedings paper!
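To make the augmentation idea concrete, here is a Python sketch that adds randomly mutated copies of each training sequence. The slides only say "the simplest model of evolution"; the model used here (independent point substitutions, uniform over the other 19 amino acids, no insertions or deletions) is an assumption for illustration and may differ from the one in the Kumar and Cowen paper. All names and parameters are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(sequence, rate, rng):
    """Return a copy of `sequence` with independent point substitutions.

    Each position is replaced, with probability `rate`, by a different
    amino acid chosen uniformly at random (assumed mutation model).
    """
    mutated = []
    for residue in sequence:
        if rng.random() < rate:
            choices = [aa for aa in AMINO_ACIDS if aa != residue]
            mutated.append(rng.choice(choices))
        else:
            mutated.append(residue)
    return "".join(mutated)


def augment(training_set, copies, rate, seed=0):
    """Return the original sequences plus `copies` mutated versions of each."""
    rng = random.Random(seed)  # fixed seed keeps the augmentation repeatable
    augmented = list(training_set)
    for sequence in training_set:
        augmented.extend(mutate(sequence, rate, rng) for _ in range(copies))
    return augmented


# Toy sequences, not real training data.
toy_augmented = augment(["ACDEFGHIK", "VVKKLLTT"], copies=3, rate=0.1)
```

An HMM would then be trained on the augmented set in place of the original alignment.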
Acknowledgements
• National Institutes of Health
Thank you!