Genome scale enzyme-metabolite and drug-target interaction predictions using the signature molecular descriptor

Faulon, J. L., M. Misra, et al. (2008). Bioinformatics 24(2): 225-33.
05/02/2008
Jae Hyun Kim (jaekim@ku.edu)

Contents
• Terminology
• Motivation
• Method
  - Molecular Signature
  - Signature Kernel
  - Signature Product Kernel
• Results
• Conclusion

Terminology (1)
• Catalyst
  - Increases the rate of a chemical reaction or biological process
  - Remains unchanged by the reaction
• Enzyme
  - Biomolecule that catalyzes chemical reactions
  - Usually a protein
• Metabolite
  - Intermediates and products of metabolism
  - Restricted to small molecules
Reference: www.wikipedia.org

Terminology (2)
• Inhibitor
  - Molecule that decreases enzyme activity
  - May compete with substrates
  - Many drugs and poisons are enzyme inhibitors
Reference: www.wikipedia.org

Enzyme Commission (EC) Number
• EC number: numerical classification scheme for enzyme-catalyzed reactions
• Four levels of hierarchy
• Example: EC 3.4.11.4, tripeptide aminopeptidases
  - EC 3: hydrolases (enzymes that use water to break up another molecule)
  - EC 3.4: hydrolases that act on peptide bonds
  - EC 3.4.11: hydrolases that cleave off the amino-terminal amino acid from a polypeptide
  - EC 3.4.11.4: hydrolases that cleave off the amino-terminal end from a tripeptide
Reference: www.wikipedia.org

Motivation
The title of the paper, phrase by phrase:
• "Genome scale": large-scale
• "enzyme-metabolite and drug-target interaction": protein-chemical interaction
• "predictions": machine-learning technique
• "using the signature molecular descriptor"

Molecular Signature
• $G = (V, E)$: molecular graph
  - $V$: vertex (atom) set
  - $E$: edge (bond) set
• Atomic signature
  - Canonical representation of the subgraph surrounding a particular atom
  - Includes atoms and bonds up to a predefined distance (the height)
• Molecular signature of $G$: ${}^{h}\sigma(G) = \sum_{x \in V} {}^{h}\sigma_G(x)$
  - ${}^{h}\sigma_G(x)$: atomic signature in $G$ rooted at $x$ of height $h$
• Height
  - Chemicals: 0 to 6
  - Proteins: 6 to 18 (1 to 7 amino acid residues)
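The idea of an atomic signature (a string describing the neighborhood of one atom up to a given height) can be sketched in Python. This is a simplified illustration under stated assumptions, not the canonical algorithm from the paper: it unfolds the neighborhood as a tree by depth-first search, never steps straight back to the parent atom, and canonicalizes by sorting the child strings, so ring closures are not handled the way a full implementation would handle them. The function names, the dictionary encoding of the molecule, and the methylamine toy molecule are all hypothetical choices made for this example.

```python
from collections import Counter

# Toy molecule (assumption for illustration): methylamine, CH3-NH2.
# Atom labels: c_ = sp3 carbon, n_ = sp3 nitrogen, h_ = hydrogen.
LABELS = {0: "c_", 1: "n_", 2: "h_", 3: "h_", 4: "h_", 5: "h_", 6: "h_"}
BONDS = {0: [1, 2, 3, 4], 1: [0, 5, 6], 2: [0], 3: [0], 4: [0], 5: [1], 6: [1]}


def atomic_signature(labels, bonds, root, height, parent=None):
    """Return a string for the neighborhood of `root` up to `height` bonds.

    Depth-first search: '(' opens the children of an atom, ')' closes them.
    Children are sorted so the string does not depend on atom numbering.
    """
    if height == 0:
        return labels[root]
    children = sorted(
        atomic_signature(labels, bonds, nbr, height - 1, parent=root)
        for nbr in bonds[root]
        if nbr != parent  # do not walk straight back up the tree
    )
    if not children:
        return labels[root]
    return labels[root] + "(" + " ".join(children) + ")"


def molecular_signature(labels, bonds, height):
    """Count how often each atomic signature occurs over all atoms."""
    return Counter(
        atomic_signature(labels, bonds, atom, height) for atom in labels
    )


if __name__ == "__main__":
    print(atomic_signature(LABELS, BONDS, 0, 1))
    print(molecular_signature(LABELS, BONDS, 1))
```

Storing the molecular signature as a `Counter` keeps it sparse: only the atomic signatures that actually occur in the molecule take up space, which matters once the height grows and the number of possible strings explodes.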
Molecular Signature: Example
[Figure: signature examples for leucine, isoleucine, and glycine]
• Depth-first search up to "height" deep
• '(' going down, ')' going back up
• Atom labels
  - c_, n_: sp3 carbon/nitrogen atom
  - c=, o=: sp2 (double-bond) carbon/oxygen atom
  - h_: hydrogen

Reaction Signature
• General form of enzymatic reaction R:
  $s_1 S_1 + s_2 S_2 + \dots + s_n S_n \rightarrow p_1 P_1 + p_2 P_2 + \dots + p_m P_m$
• Height h signature of reaction R: [equation not preserved]

Pairwise Kernel
• Used to predict/classify protein-protein interactions
  - Measures similarity between two pairs of proteins
  - Kernel function $K((X_1, X_2), (X'_1, X'_2))$
• How to measure similarity between pairs?

Kernel Types
• Pairwise similarity by component similarity
  - If $X_1 \sim X'_1$ and $X_2 \sim X'_2$, then $(X_1, X_2) \sim (X'_1, X'_2)$
• Assess similarity between pairs directly
  - $x_{12} = (x_{1i} x_{2j} + x_{2i} x_{1j})$: pairwise representation of $(X_1, X_2)$
• Similarity inside the pair
• Similarity between pairs
From Ben-Hur, A. and W. S. Noble (2005). "Kernel methods for predicting protein-protein interactions." Bioinformatics 21 Suppl 1: i38-46.

Signature Kernel
• Definition: [equation not preserved]
• Applies to chemicals, proteins, and reactions

Signature Product Kernel (1/2)
• P: protein, C: chemical
• Definition: signature of the complex PC [equation not preserved]
• Two pairs of P-C interactions, (P, C) and (Q, D): [equation not preserved]

Signature Product Kernel (2/2)
• Similarly, [equation not preserved]
• Therefore, [equation not preserved]

Signature Kernel: Example (height 1)
[Figure: number of occurrences of each atomic signature]

Signature Product Kernel: Example
[Figure: worked example]

Signature Similarity vs. Sequence Alignment Scores
• Computed for every pair of amino acids
• Correlation: chemically similar → high BLOSUM62 score

EC Number Classification
• Positive examples
  - Downloaded from KEGG
  - More than 50, at most 500
• Negative examples
  - Equal number, random selection
• Signature kernel, 5-fold cross-validation
[Figure panels: using only reactions; using only protein sequences]

EC Classification
• Using both sequences and reactions
• Signature product kernel
[Figure panels: Class 1; Class 1.1; Class 1.1.1; Class 1.1.1.1]

Comparison with Other Methods
• Accuracy = (TP + TN) / (TP + TN + FP + FN)
• AUC = area under the curve
• Precision = TP / (TP + FP)
• Sensitivity = TP / (TP + FN)
• Specificity = TN / (TN + FP)
• Jaccard coefficient = TP / (TP + FP + FN)
• A larger number indicates a better result

Predicting New Enzyme Interactions
• Prediction
  - Test set: EC numbers accepted in September 2006
  - Predict whether or not a given enzyme will catalyze a given reaction
  - Signature product kernel

Predict DrugBank Using KEGG
• Class I: both in training set
• Class II: different partners
• Class III: only target
• Class IV: only drug
• Class V: none
• Signature product kernel
• Area under ROC = 0.74

Conclusion
• Unified method for predicting protein-chemical interactions
• Atomistic structure representation of proteins encompasses the information stored in substitution matrices
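The signature kernel and the signature product kernel can be illustrated with a minimal Python sketch. The slide text does not spell out the definitions, so the sketch rests on three assumptions: the signature kernel is the dot product of two occurrence-count vectors, the signature of a protein-chemical complex is the tensor (outer) product of the protein and chemical signatures, and the kernel between two complexes is therefore the product of the component kernels. The function names and the toy occurrence counts are hypothetical and carry no chemical meaning.

```python
from collections import Counter


def signature_kernel(sig_a, sig_b):
    """Dot product of two signatures stored as sparse count vectors."""
    if len(sig_a) > len(sig_b):
        sig_a, sig_b = sig_b, sig_a  # iterate over the smaller one
    return sum(count * sig_b[key] for key, count in sig_a.items())


def complex_signature(protein_sig, chemical_sig):
    """Assumed complex signature: outer product of the two count vectors."""
    return Counter({
        (p_key, c_key): p_count * c_count
        for p_key, p_count in protein_sig.items()
        for c_key, c_count in chemical_sig.items()
    })


def signature_product_kernel(p, c, q, d):
    """Kernel between complexes (P, C) and (Q, D) as k(P, Q) * k(C, D)."""
    return signature_kernel(p, q) * signature_kernel(c, d)


# Toy occurrence counts (made up for illustration only).
P_SIG = Counter({"c_(c_ n_)": 2, "n_(c_)": 1})
Q_SIG = Counter({"c_(c_ n_)": 1, "o=(c=)": 3})
C_SIG = Counter({"c_(h_ o_)": 1, "h_(c_)": 3})
D_SIG = Counter({"h_(c_)": 2})

if __name__ == "__main__":
    shortcut = signature_product_kernel(P_SIG, C_SIG, Q_SIG, D_SIG)
    explicit = signature_kernel(
        complex_signature(P_SIG, C_SIG), complex_signature(Q_SIG, D_SIG)
    )
    print(shortcut, explicit)  # both routes give the same value
```

Under these assumptions the product form is the practical one: it never builds the complex signature, whose length is the product of the protein and chemical signature lengths.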

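The evaluation metrics from the method-comparison slide follow directly from the four confusion-matrix counts. The short Python sketch below computes them; the function name and the example counts are hypothetical, and the code assumes every denominator is nonzero. AUC is left out because it needs ranked prediction scores rather than counts.

```python
def classification_metrics(tp, tn, fp, fn):
    """Metrics from true/false positive and negative counts.

    Assumes all denominators are nonzero.
    """
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "jaccard": tp / (tp + fp + fn),
    }


if __name__ == "__main__":
    # Made-up counts for illustration only.
    for name, value in classification_metrics(tp=40, tn=45, fp=5, fn=10).items():
        print(f"{name}: {value:.3f}")
```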