Non-linear optimization

advertisement
correlating graph-theoretical centrality indices with interface residue propensity
or: where do things stick together?
Stefan Maetschke
Teasdale Group
1
…a bit more specific
Prediction of interface residues
• Protein-RNA interfaces
• Machine learning methods
• Structural information
• Graph-topological features

2
something for the visual cortex
Protein-RNA complex [JMol, 1R3E_A]
Binding site [Terribilini et al. 2006]
Contact graph [Jung Library]
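A contact graph of this kind can be sketched in a few lines. This is a minimal illustration, not the pipeline behind the slides: it assumes one representative coordinate per residue and a hypothetical distance cutoff of 8 Å, neither of which is stated in the talk.

```python
from itertools import combinations
from math import dist


def contact_graph(coords, cutoff=8.0):
    """Build an undirected contact graph as {residue_index: set_of_neighbours}.

    Two residues are linked when their representative points lie within
    `cutoff` (an assumed value, in angstroms) of each other.
    """
    graph = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if dist(coords[i], coords[j]) <= cutoff:
            graph[i].add(j)
            graph[j].add(i)
    return graph


# Toy coordinates (hypothetical), one 3D point per residue.
toy_coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
toy_graph = contact_graph(toy_coords)
```

In the toy example residues 0 and 1 and residues 1 and 2 are in contact, while residue 3 stays isolated.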
3
questions
Most predictors are sequence-based:

What impact does structural information have on prediction accuracy?

Which features are predictive of interface residues?
4
obvious features
An interface residue…
• is on the surface => accessible surface area
• has to bind => physico-chemical properties
• must be stabilized => contact graph topology
• prefers a flat surface => not really
• is conserved => maybe not that much
5
accessible surface area (ASA)
http://www.see.ed.ac.uk/~tduren/research/surface_area/
http://www.ysbl.york.ac.uk/~ccp4mg/ccp4mg_help/analysis.html
6
physico-chemical properties
Hydrophobicity

Inside/Outside

Conformation

Partition Coefficient
AAIndex database
approx. 400 indices
AUC over 144 protein chains
4304 binding and 27932 non-binding residues
sequence similarity < 30%
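For reference, AUC can be read as the probability that a randomly chosen binding residue receives a higher score than a randomly chosen non-binding one. The sketch below computes it by direct pairwise comparison; the function name and the toy scores are illustrative only, and a real evaluation over tens of thousands of residues would use a rank-based formula instead of this quadratic loop.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    `labels` holds 1 for binding and 0 for non-binding residues.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up scores): three of the four positive/negative
# pairs are ranked correctly.
toy_auc = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```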
7
patch types
8
patch type comparison
[Figure: AUC by patch type. Naïve Bayes on PSI-BLAST profiles, 5-fold cross-validation, RB144 data set]
9
features over patches
10
betweenness-centrality (BC)
[Figure: example graph with labelled nodes s, v, t]
http://en.wikipedia.org/wiki/Image:Graph_betweenness.svg
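The betweenness centrality of a node v sums, over all node pairs (s, t), the fraction of shortest s-t paths that pass through v. A minimal sketch using Brandes' algorithm on an unweighted, undirected graph follows; the adjacency-dict representation and the function name are assumptions made for illustration, not the Jung Library API used for the slides.

```python
from collections import deque


def betweenness_centrality(graph):
    """Unnormalized betweenness for an undirected, unweighted graph
    given as {node: set_of_neighbours} (Brandes' algorithm)."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = {v: 0 for v in graph}
        hops = {v: -1 for v in graph}
        preds = {v: [] for v in graph}
        sigma[s], hops[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if hops[w] < 0:
                    hops[w] = hops[v] + 1
                    queue.append(w)
                if hops[w] == hops[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair (s, t) was visited twice in an undirected graph.
    return {v: c / 2.0 for v, c in bc.items()}


# Path graph s - v - t: only v lies between another pair of nodes.
path = {"s": {"v"}, "v": {"s", "t"}, "t": {"v"}}
path_bc = betweenness_centrality(path)
```

On the three-node path, v gets a score of 1 and the two end nodes get 0.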
11
BC for contact graph
Histogram: binned BC over RB144




1FJG_K
AUC = 0.71
Red: interface residue
Size: betweenness centrality
12
combined features





WRC : distance-weighted retention coefficient
BC : betweenness centrality
ASA : accessible surface area
5-fold cross-validation, RB144
Patch sizes: sequential->11, topological->19, spatial->19
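The slides do not spell out how patches are built, so the following is only one plausible reading: a sequential patch is a window of neighbouring positions along the chain, and a topological patch collects the nearest residues in the contact graph by breadth-first search. Function names, the adjacency-dict graph format, and the tie-breaking by sorted index are all assumptions.

```python
from collections import deque


def sequential_patch(index, length, size=11):
    """Positions in a window of `size` centred on `index`, clipped to the chain."""
    half = size // 2
    return [i for i in range(index - half, index + half + 1) if 0 <= i < length]


def topological_patch(graph, start, size=19):
    """Up to `size` residues closest to `start` by hop count in the contact graph.

    `graph` is {residue: set_of_neighbours}; neighbours are visited in sorted
    order so the result is deterministic (an arbitrary tie-breaking choice).
    """
    seen = {start}
    patch = [start]
    queue = deque([start])
    while queue and len(patch) < size:
        v = queue.popleft()
        for w in sorted(graph[v]):
            if w not in seen and len(patch) < size:
                seen.add(w)
                patch.append(w)
                queue.append(w)
    return patch


# Toy chain of four residues, each in contact with its sequence neighbours.
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
window = sequential_patch(10, 20)           # full window of 11 positions
near_start = sequential_patch(2, 20)        # clipped at the chain start
neighbours = topological_patch(chain, 1, size=3)
```

A spatial patch could be built the same way by ranking residues on Euclidean distance instead of hop count.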
13
summary
• Patch size is critical for sequential patches
• Spatial/topological patches perform better
• Structural information helps – but not much: +5%
• Novelty: centrality indices as predictors
• SVM superior to NB
• Top prediction accuracy – as far as one can tell
• Accuracy in general is still low (MCC < 0.4)
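MCC here is the Matthews correlation coefficient, which ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect prediction) and stays informative on unbalanced data such as the binding/non-binding split. A minimal sketch from confusion-matrix counts follows; the function name and the zero-denominator convention are choices made for this illustration.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when a row or column of the confusion matrix is empty
    (a common convention, assumed here).
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


perfect = mcc(tp=5, tn=5, fp=0, fn=0)    # every residue classified correctly
chance = mcc(tp=1, tn=1, fp=1, fn=1)     # no better than guessing
inverted = mcc(tp=0, tn=0, fp=5, fn=5)   # every residue misclassified
```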
14
what’s next…
Prediction of disease-associated SNPs
• Graph-spectral methods
• Protein function prediction

15
acknowledgments

Zheng Yuan – Data sets and much more …

Karin Kassahn – Aminoacyl-tRNA synthetases
http://en.wikipedia.org/wiki/Aminoacyl_tRNA_synthetase
16
questions
17