Genome-scale enzyme-metabolite and drug-target interaction predictions using the signature molecular descriptor
Faulon, J. L., M. Misra, et al. (2008). Bioinformatics 24(2): 225-33.
05/02/2008, Jae Hyun Kim (jaekim@ku.edu)

Contents
• Terminology
• Motivation
• Method
  • Molecular Signature
  • Signature Kernel
  • Signature Product Kernel
• Results
• Conclusion

Terminology (1)
• Catalyst: increases the rate of a chemical reaction or biological process while itself remaining unchanged
• Enzyme: a biomolecule, usually a protein, that catalyzes chemical reactions
• Metabolite: an intermediate or product of metabolism; the term is usually restricted to small molecules
Reference: www.wikipedia.org

Terminology (2)
• Inhibitor: a molecule that decreases enzyme activity
  • Many inhibitors compete with substrates for the enzyme
  • Many drugs and poisons act as enzyme inhibitors
Reference: www.wikipedia.org

Enzyme Commission (EC) Number
• Numerical classification scheme for enzyme-catalyzed reactions
• Four levels of hierarchy
• Example: EC 3.4.11.4, tripeptide aminopeptidases
  • EC 3: hydrolases (enzymes that use water to break up other molecules)
  • EC 3.4: hydrolases that act on peptide bonds
  • EC 3.4.11: hydrolases that cleave off the amino-terminal amino acid from a polypeptide
  • EC 3.4.11.4: hydrolases that cleave off the amino-terminal end from a tripeptide
Reference: www.wikipedia.org

Motivation
• Genome scale: large-scale prediction
• Enzyme-metabolite and drug-target interaction: protein-chemical interaction
• Predictions: a machine-learning technique
• Using the signature molecular descriptor

Molecular Signature
• $G = (V, E)$: molecular graph
  • $V$: vertex (atom) set
  • $E$: edge (bond) set
• Atomic signature $^{h}\sigma_G(x)$: canonical representation of the subgraph of $G$ rooted at atom $x$, including all atoms and bonds up to a predefined distance (the height $h$)
• Molecular signature of $G$: $^{h}\sigma(G) = \sum_{x \in V} {}^{h}\sigma_G(x)$
• Heights used: 0 to 6 for chemicals; 6 to 18 for proteins (1 to 7 amino acid residues)

Molecular Signature: Example
[Figure: signatures of leucine, isoleucine, and glycine]
• Signatures are written by a depth-first search up to "height" deep
• '(' marks going down a level, ')' marks going back up
• Atom types: c_, n_ = sp3 carbon/nitrogen; c=, o= = sp2 (double-bonded) carbon/oxygen; h_ = hydrogen

Reaction Signature
• General form of an enzymatic reaction $R$: $s_1 S_1 + s_2 S_2 + \dots + s_n S_n \rightarrow p_1 P_1 + p_2 P_2 + \dots + p_m P_m$
• Height-$h$ signature of reaction $R$: $^{h}\sigma(R)$ [equation not recoverable]

Pairwise Kernel
• Used to predict or classify protein-protein interactions
• Measures the similarity between two pairs of proteins
• Kernel function: $K((X_1, X_2), (X'_1, X'_2))$
• How should the similarity between pairs be measured?

Kernel Types
• Pairwise similarity from component similarity: if $X_1 \sim X'_1$ and $X_2 \sim X'_2$, then $(X_1, X_2) \sim (X'_1, X'_2)$
• Direct assessment of the similarity between pairs
• Pairwise representation of $(X_1, X_2)$: $x_{12} = (x_{1i} x_{2j} + x_{2i} x_{1j})$
[Figure labels: similarity inside the pair; similarity between pairs]
From Ben-Hur, A. and W. S. Noble (2005). "Kernel methods for predicting protein-protein interactions." Bioinformatics 21 Suppl 1: i38-46.

Signature Kernel
• Definition: [equation not recoverable]
• Applies to chemicals, proteins, and reactions

Signature Product Kernel (1/2)
• P: protein, C: chemical
• Definition: signature of the complex PC [equation not recoverable]
• Two pairs of P-C interactions: (P, C) and (Q, D)

Signature Product Kernel (2/2)
• Similarly, [equation not recoverable]
• Therefore, [equation not recoverable]

Signature Kernel: Example (height 1)
[Figure: number of occurrences of each height-1 signature]

Signature Product Kernel: Example
[Figure]

Signature Similarity vs. Sequence Alignment Scores
• Signature similarity computed for every pair of amino acids
• Correlation: chemically similar amino acids have high BLOSUM62 scores

EC Number Classification
• Positive examples: downloaded from KEGG (more than 50, at most 500)
• Negative examples: an equal number, randomly selected
• Signature kernel, 5-fold cross-validation
[Figure panels: using only reactions; using only protein sequences]

EC Classification
• Using both sequences and reactions
• Signature product kernel
[Figure panels: Class 1, Class 1.1, Class 1.1.1, Class 1.1.1.1]

Comparison with Other Methods
• Accuracy = (TP+TN)/(TP+TN+FP+FN)
• AUC = area under the ROC curve
• Precision = TP/(TP+FP)
• Sensitivity = TP/(TP+FN)
• Specificity = TN/(TN+FP)
• Jaccard coefficient = TP/(TP+FP+FN)
• For every measure, a larger value indicates a better result

Predicting New Enzyme Interactions
• Test set: EC numbers accepted in September 2006
• Task: predict whether or not a given enzyme catalyzes a given reaction
• Signature product kernel

Predicting DrugBank Interactions Using KEGG
• Class I: both drug and target in the training set
• Class II: different partners
• Class III: only the target in the training set
• Class IV: only the drug in the training set
• Class V: neither in the training set
• Signature product kernel: area under ROC = 0.74

Conclusion
• A unified method for predicting protein-chemical interactions
• The atomistic structure representation of proteins encompasses the information stored in substitution matrices
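The four-level hierarchy on the Enzyme Commission (EC) Number slide, and the per-level classes (Class 1 down to Class 1.1.1.1) in the EC classification results, can be made concrete with a small helper. This is a minimal Python sketch; the function names `ec_levels` and `share_level` are hypothetical and do not come from the paper.

```python
def ec_levels(ec_number):
    """Return the EC prefixes from the top level (class) down to the full number."""
    parts = ec_number.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec_number!r}")
    # "3.4.11.4" -> ["3", "3.4", "3.4.11", "3.4.11.4"]
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def share_level(ec_a, ec_b, level):
    """True if two EC numbers agree on their first `level` fields (1 to 4)."""
    return ec_levels(ec_a)[level - 1] == ec_levels(ec_b)[level - 1]


# The tripeptide aminopeptidase hierarchy from the slide.
for prefix in ec_levels("3.4.11.4"):
    print("EC", prefix)
```

Classifying at level 3 versus level 4 then amounts to choosing which prefix serves as the class label.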
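The atomic and molecular signatures from the Molecular Signature slides can be illustrated with a short sketch. It is a simplified version, not the canonical signature algorithm from the paper. It assumes heavy atoms only, the atom-type labels from the example slide plus a hypothetical `o_` for an sp3 oxygen, and no bond-order symbols. It also assumes a depth-first expansion that never steps straight back along the edge it just used. Sorting the child strings makes each atomic signature independent of atom numbering, and the molecular signature is the occurrence count of each atomic signature, which is the sum over atoms.

```python
from collections import Counter


def atomic_signature(adj, labels, root, height):
    """Simplified height-h atomic signature rooted at `root`.

    adj: dict atom -> list of neighbouring atoms
    labels: dict atom -> atom-type string such as "c_" or "o="
    """

    def expand(atom, parent, depth):
        # At the maximum height, emit only the atom type.
        if depth == height:
            return labels[atom]
        # Depth-first expansion of every neighbour except the one we came from;
        # sorting the subtree strings makes the output canonical.
        children = sorted(
            expand(nb, atom, depth + 1) for nb in adj[atom] if nb != parent
        )
        if not children:
            return labels[atom]
        # '(' goes down one level, ')' comes back up.
        return labels[atom] + "(" + "".join(children) + ")"

    return expand(root, None, 0)


def molecular_signature(adj, labels, height):
    """Occurrence count of each atomic signature over all atoms of the graph."""
    return Counter(atomic_signature(adj, labels, x, height) for x in adj)


# Glycine heavy atoms: N - C(alpha) - C(=O) - OH
labels = {0: "n_", 1: "c_", 2: "c=", 3: "o=", 4: "o_"}
adj = {0: [1], 1: [0, 2], 2: [1, 3, 4], 3: [2], 4: [2]}

print(atomic_signature(adj, labels, 1, 1))  # c_(c=n_)
print(molecular_signature(adj, labels, 1))
```

Raising `height` grows each string by one nesting level, which mirrors the slide's point that proteins need larger heights than small chemicals.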
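The reaction-signature equation on the Reaction Signature slide did not survive extraction. One natural formulation is the stoichiometry-weighted difference between product and substrate signatures, $^{h}\sigma(R) = \sum_j p_j\,{}^{h}\sigma(P_j) - \sum_i s_i\,{}^{h}\sigma(S_i)$. It is assumed here rather than taken from the slide. The sketch uses Python `Counter` objects. Its toy example is hypothetical: the ethanol-to-acetaldehyde half of the alcohol dehydrogenase (EC 1.1.1.1) reaction, with NAD+ and hydrogens ignored and height-0 heavy-atom signatures only.

```python
from collections import Counter


def reaction_signature(substrates, products):
    """Assumed formulation: stoichiometry-weighted products minus substrates.

    substrates, products: lists of (coefficient, Counter of atomic signatures)
    """
    sig = Counter()
    for coeff, mol_sig in products:
        for s, n in mol_sig.items():
            sig[s] += coeff * n
    for coeff, mol_sig in substrates:
        for s, n in mol_sig.items():
            # Item-wise updates keep negative counts (binary Counter '-' would drop them).
            sig[s] -= coeff * n
    # Signatures whose counts cancel are unchanged by the reaction; drop them.
    return Counter({s: n for s, n in sig.items() if n != 0})


ethanol = Counter({"c_": 2, "o_": 1})
acetaldehyde = Counter({"c_": 1, "c=": 1, "o=": 1})
print(reaction_signature([(1, ethanol)], [(1, acetaldehyde)]))
```

Negative entries record atom environments consumed by the reaction and positive entries record those created, so a single vector encodes what the enzyme changes.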
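The pairwise representation $x_{12} = (x_{1i} x_{2j} + x_{2i} x_{1j})$ on the Kernel Types slide matches the symmetric tensor-product feature map behind the tensor product pairwise kernel of Ben-Hur and Noble (2005), $K((X_1,X_2),(X'_1,X'_2)) = K(X_1,X'_1)K(X_2,X'_2) + K(X_1,X'_2)K(X_2,X'_1)$. The sketch assumes plain Python lists and a linear base kernel `dot`. It checks that the explicit features and the kernel shortcut agree up to a constant factor of 2, and that the kernel ignores the order within a pair.

```python
def dot(u, v):
    """Linear base kernel between two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def tppk(x1, x2, y1, y2, k=dot):
    """Tensor product pairwise kernel between pairs (x1, x2) and (y1, y2)."""
    return k(x1, y1) * k(x2, y2) + k(x1, y2) * k(x2, y1)


def pair_features(x1, x2):
    """Explicit symmetric pair representation with entries x1[i]*x2[j] + x2[i]*x1[j]."""
    n = len(x1)
    return [x1[i] * x2[j] + x2[i] * x1[j] for i in range(n) for j in range(n)]


x1, x2 = [1, 0, 2], [0, 1, 1]
y1, y2 = [1, 1, 0], [2, 0, 1]
explicit = dot(pair_features(x1, x2), pair_features(y1, y2))
print(explicit, tppk(x1, x2, y1, y2))  # 10 5
```

The kernel form needs only base-kernel evaluations, while the explicit form grows quadratically with the feature dimension. This is why pairwise methods are usually run through the kernel.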
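The definition on the Signature Kernel slide was an image. The sketch below assumes, consistent with the "number of occurrences" example, that the signature kernel is the dot product of signature occurrence-count vectors. The cosine-normalized variant and the toy height-0 counts (ethanol and glycine, heavy atoms only) are illustrative additions, not taken from the slides.

```python
import math
from collections import Counter


def signature_kernel(sig_a, sig_b):
    """Dot product of two signature occurrence-count Counters."""
    # Loop over the smaller Counter; signatures absent from the other count as 0.
    if len(sig_a) > len(sig_b):
        sig_a, sig_b = sig_b, sig_a
    return sum(n * sig_b[s] for s, n in sig_a.items())


def normalized_signature_kernel(sig_a, sig_b):
    """Cosine-normalized kernel, in [0, 1] for non-negative counts."""
    denom = math.sqrt(signature_kernel(sig_a, sig_a) * signature_kernel(sig_b, sig_b))
    return signature_kernel(sig_a, sig_b) / denom if denom else 0.0


ethanol = Counter({"c_": 2, "o_": 1})
glycine = Counter({"n_": 1, "c_": 1, "c=": 1, "o=": 1, "o_": 1})
print(signature_kernel(ethanol, glycine))  # 3
print(normalized_signature_kernel(ethanol, glycine))  # 0.6
```

The same function applies unchanged to chemicals, proteins, or reactions, because all three are represented as signature count vectors.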
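The equations on the Signature Product Kernel slides were also images. A sketch follows, under the assumption that the signature of a complex PC is the tensor (outer) product of the protein and chemical signatures. Under that assumption the kernel between (P, C) and (Q, D) factorizes as K(P, Q) · K(C, D), so the large tensor never has to be built. The helper names and signature counts are hypothetical.

```python
from collections import Counter


def signature_kernel(sig_a, sig_b):
    """Dot product of two signature occurrence-count Counters."""
    return sum(n * sig_b[s] for s, n in sig_a.items())


def complex_signature(sig_p, sig_c):
    """Assumed signature of complex PC: one feature per (protein, chemical) signature pair."""
    return Counter({(s, t): n * m for s, n in sig_p.items() for t, m in sig_c.items()})


def signature_product_kernel(sig_p, sig_c, sig_q, sig_d):
    """K((P, C), (Q, D)) = K(P, Q) * K(C, D); the tensor is never materialized."""
    return signature_kernel(sig_p, sig_q) * signature_kernel(sig_c, sig_d)


# Hypothetical signature counts for two protein-chemical pairs.
P, Q = Counter({"a": 2, "b": 1}), Counter({"a": 1, "c": 3})
C, D = Counter({"x": 1, "y": 2}), Counter({"y": 1})
explicit = signature_kernel(complex_signature(P, C), complex_signature(Q, D))
print(explicit, signature_product_kernel(P, C, Q, D))  # 4 4
```

Because the product factorizes, a protein-protein kernel matrix and a chemical-chemical kernel matrix can be computed once and multiplied entry-wise for any set of candidate pairs.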
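The EC Number Classification setup can be sketched as below: positives from KEGG with more than 50 and at most 500 examples, an equal number of random negatives, and 5-fold cross-validation. The slide does not say where negatives come from. This sketch assumes they are drawn uniformly from a pool of non-member examples, and the helper names `balanced_examples` and `k_fold` are hypothetical.

```python
import random


def balanced_examples(class_members, other_members, min_pos=50, max_pos=500, seed=0):
    """Hypothetical helper: up to max_pos positives plus an equal number of random negatives."""
    if len(class_members) <= min_pos:
        return None  # class too small: the setup requires more than min_pos examples
    rng = random.Random(seed)
    positives = rng.sample(class_members, min(len(class_members), max_pos))
    negatives = rng.sample(other_members, len(positives))
    return [(x, 1) for x in positives] + [(x, 0) for x in negatives]


def k_fold(examples, k=5, seed=0):
    """Yield (train, test) splits; every example lands in exactly one test fold."""
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for test_idx in folds:
        test_set = set(test_idx)
        train = [examples[j] for j in idx if j not in test_set]
        test = [examples[j] for j in test_idx]
        yield train, test


data = balanced_examples(list(range(120)), list(range(1000, 1300)))
for train, test in k_fold(data):
    print(len(train), len(test))
```

Fixing the seed makes the random negative selection and the fold assignment reproducible between runs.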
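The measures on the Comparison with Other Methods slide follow directly from confusion-matrix counts. In the sketch below, AUC is computed through its rank interpretation: the probability that a randomly chosen positive scores above a randomly chosen negative, with ties counting one half. This equals the area under the ROC curve. The function names are hypothetical, and the ratios assume non-zero denominators.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Threshold-based measures from the slide; larger is better for all of them."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "jaccard": tp / (tp + fp + fn),
    }


def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


print(confusion_metrics(tp=40, tn=45, fp=5, fn=10))
print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3]))
```

Unlike the other measures, AUC needs the classifier's raw scores rather than thresholded predictions, which is why it is reported separately (as in the DrugBank result).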