correlating graph-theoretical centrality indices with interface residue propensity
or: where do things stick together?
Stefan Maetschke, Teasdale Group

…a bit more specific
  Prediction of interface residues
  Protein-RNA interfaces
  Machine learning methods
  Structural information
  Graph-topological features

something for the visual cortex
  Protein-RNA complex [JMol, 1R3E_A]
  Binding site [Terribilini et al. 2006]
  Contact graph [Jung Library]

questions
  Most predictors are sequence based:
  What impact does structural information have on prediction accuracy?
  Which features are predictive of interface residues?

obvious features
  Interface residue…
  is on surface         => Accessible surface area
  has to bind           => Physico-chemical prop.
  must be stabilized    => Contact graph topology
  prefers flat surface  => not really
  is conserved          => maybe not that much

accessible surface area (ASA)
  http://www.see.ed.ac.uk/~tduren/research/surface_area/
  http://www.ysbl.york.ac.uk/~ccp4mg/ccp4mg_help/analysis.html

physico-chemical properties
  Hydrophobicity
  Inside/Outside
  Conformation
  Partition Coefficient
  AAIndex database: approx. 400 indices
  AUC over 144 protein chains
  4304 binding and 27932 non-binding
  sequence similarity < 30%

patch types

patch type comparison
  Naïve Bayes
  PSI-BLAST profiles
  AUC
  5-fold cross-validation
  RB144 data set

features over patches

betweenness-centrality (BC)
  Figure: example graph with nodes s, v, t
  http://en.wikipedia.org/wiki/Image:Graph_betweenness.svg

BC for contact graph
  Histogram: binned BC over RB144
  1FJG_K
  AUC = 0.71
  Red: interface residue
  Size: betweenness centrality

combined features
  WRC: distance-weighted retention coefficient
  BC: betweenness centrality
  ASA: accessible surface area
  5-fold cross-validation, RB144
  Patch sizes: sequential: 11, topological: 19, spatial: 19

summary
  Patch size is critical for sequential patches
  Spatial/topological patches perform better
  Structural information helps, but not much: +5%
  Novelty: centrality indices as predictors
  SVM superior to NB
  Top prediction accuracy, as far as one can tell
  Accuracy in general is still low (MCC < 0.4)

what's next…
  Prediction of disease-associated SNPs
  Graph-spectral methods
  Protein function prediction

acknowledgments
  Zheng Yuan: data sets and much more …
  Karin Kassahn: Aminoacyl-tRNA synthetases
  http://en.wikipedia.org/wiki/Aminoacyl_tRNA_synthetase

questions
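
The betweenness-centrality feature computed on a residue contact graph can be sketched in plain Python. This is only an illustrative sketch, not the code behind the talk (which cites the Jung library for the contact graph). It assumes one representative 3D point per residue, a hypothetical contact cutoff of 6.0 Å, and unweighted shortest paths, and it uses Brandes' algorithm. The function names `contact_graph` and `betweenness_centrality` are made up for this example.

```python
import math
from collections import deque
from itertools import combinations


def contact_graph(coords, cutoff=6.0):
    """Build an undirected contact graph as an adjacency dict.

    Assumption: `coords` holds one representative 3D point per residue,
    and two residues are in contact if the points lie within `cutoff`
    (the 6.0 default is a hypothetical value, not taken from the talk).
    """
    graph = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def betweenness_centrality(graph):
    """Unnormalized betweenness for an unweighted, undirected graph
    (Brandes' algorithm: one BFS plus one back-propagation per source)."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        order = []                          # nodes in order of BFS discovery
        preds = {v: [] for v in graph}      # predecessors on shortest paths
        sigma = dict.fromkeys(graph, 0)     # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:           # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)   # dependency of s on each node
        while order:
            w = order.pop()                 # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2.0 for v, b in bc.items()}


# Toy example: four "residues" on a line, 5 units apart, form a chain 0-1-2-3.
chain = contact_graph([(0, 0, 0), (5, 0, 0), (10, 0, 0), (15, 0, 0)])
print(betweenness_centrality(chain))  # {0: 0.0, 1: 2.0, 2: 2.0, 3: 0.0}
```

The two inner residues of the chain lie on every shortest path between the ends and score highest. In a prediction setting, such a per-residue score would be one feature alongside ASA and the physico-chemical indices.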