Genome scale enzyme-metabolite and drug-target interaction predictions using the signature molecular descriptor

Faulon, J. L., M. Misra, et al. (2008), Bioinformatics 24(2): 225-33.
05/02/2008
Jae Hyun Kim
Contents
• Terminology
• Motivation
• Method
  • Molecular Signature
  • Signature Kernel
  • Signature Product Kernel
• Results
• Conclusion
jaekim@ku.edu
Terminology (1)
• Catalyst
  • Increases the rate of a chemical reaction / biological process
  • Remains unchanged
• Enzyme
  • Biomolecules that catalyze chemical reactions
  • Usually proteins
• Metabolite
  • Intermediates & products of metabolism
  • Restricted to small molecules
Reference: www.wikipedia.org
Terminology (2)
• Inhibitor
  • Molecules that decrease enzyme activity
  • Compete with substrates
  • Many drugs and poisons are inhibitors
Reference: www.wikipedia.org
Enzyme Commission (EC) Number
• EC Number
  • Numerical classification scheme for enzyme-catalyzed reactions
  • Four levels of hierarchy
• Example: EC 3.4.11.4 : tripeptide aminopeptidases
  • EC 3 : hydrolases (enzymes that use water to break up some other molecule)
  • EC 3.4 : hydrolases that act on peptide bonds
  • EC 3.4.11 : hydrolases that cleave off the amino-terminal amino acid from a polypeptide
  • EC 3.4.11.4 : hydrolases that cleave off the amino-terminal end from a tripeptide
Reference: www.wikipedia.org
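The four levels can be read directly off the dotted number. A minimal Python sketch of that idea; the function name `ec_levels` is illustrative and not from the slides:

```python
def ec_levels(ec_number):
    """Return the hierarchy prefixes of a dotted EC number, most general first."""
    parts = ec_number.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


# Each prefix names one level of the hierarchy shown on the slide.
for level in ec_levels("3.4.11.4"):
    print("EC", level)
```

Classifying at a coarser level (for example EC 3.4) then amounts to comparing prefixes.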


Motivation
The title, phrase by phrase:
• "Genome scale" : large-scale
• "enzyme-metabolite and drug-target interaction" : protein-chemical interaction
• "predictions" : machine-learning technique
• "using the signature molecular descriptor"
Molecular Signature
• G=(V,E) : Molecular Graph
  • V : vertex (atom) set
  • E : edge (bond) set
• Atomic Signature
  • Canonical representation of the subgraph surrounding a particular atom
  • Includes atoms and bonds up to a predefined distance (height)
• Molecular Signature of G : ${}^{h}\sigma(G)$
  • ${}^{h}\sigma_G(x)$ : atomic signature in G rooted at x of height h
• Height
  • Chemicals : 0~6
  • Protein : 6~18 (amino acid residue 1~7)
Molecular Signature: Example
[Figure: example signatures for Leucine, Isoleucine, and Glycine]
• Depth-first search up to "height" deep
• '(' going down, ')' going back up
• c_, n_ : sp3 carbon/nitrogen atom
• c=, o= : sp2 (double-bond) carbon/oxygen atom
• h_ : hydrogen
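The depth-first construction can be sketched in a few lines of Python. This is a simplification, not the authors' canonization algorithm: it assumes the neighborhood is unfolded as a tree without stepping back to the parent atom, orders branches by plain string sorting, and writes each branch as `(...)` after its parent label. The glycine-residue atom numbering, the helper names, and the choice to collect atomic signatures as occurrence counts are all assumptions made for illustration.

```python
from collections import Counter


def atomic_signature(atom_types, bonds, root, height):
    """Simplified signature of `root`: atom labels reached within `height` bonds."""
    def walk(atom, parent, depth):
        label = atom_types[atom]
        if depth == 0:
            return label
        # '(' going down into a neighbor, ')' coming back up; sorting the
        # branch strings gives an order independent of atom numbering.
        branches = sorted(
            walk(nbr, atom, depth - 1) for nbr in bonds[atom] if nbr != parent
        )
        return label + "".join("(" + branch + ")" for branch in branches)

    return walk(root, None, height)


def molecular_signature(atom_types, bonds, height):
    """Occurrence counts of the atomic signatures of every atom (assumed form)."""
    return Counter(
        atomic_signature(atom_types, bonds, atom, height) for atom in atom_types
    )


# Hypothetical numbering of a glycine residue: N(0)-CA(1)-C(2)=O(3), plus hydrogens.
types = {0: "n_", 1: "c_", 2: "c=", 3: "o=", 4: "h_", 5: "h_", 6: "h_"}
bonds = {0: [1, 4], 1: [0, 2, 5, 6], 2: [1, 3], 3: [2], 4: [0], 5: [1], 6: [1]}

print(atomic_signature(types, bonds, root=1, height=1))
print(molecular_signature(types, bonds, height=0))
```

Rings and bond orders would need extra care in a real implementation; this sketch only illustrates the "height" parameter and the parenthesis convention.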
Reaction Signature
• General form of enzymatic reaction R
  • $s_1 S_1 + s_2 S_2 + \dots + s_n S_n \rightarrow p_1 P_1 + p_2 P_2 + \dots + p_m P_m$
• Height h signature of reaction R
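The formula for the reaction signature is not reproduced in this text. One plausible reading, stated here as an assumption, is that it is the stoichiometry-weighted sum of the product signatures minus that of the substrate signatures. The Python sketch below follows that reading; the function name, the toy height-0 signatures, and the `o_` label for an sp3 oxygen are illustrative only.

```python
from collections import Counter


def reaction_signature(substrates, products):
    """Assumed form: sum_j p_j * sig(P_j) - sum_i s_i * sig(S_i).

    Both arguments are lists of (stoichiometric coefficient, signature counts).
    """
    total = Counter()
    for coeff, sig in products:
        for key, count in sig.items():
            total[key] += coeff * count
    for coeff, sig in substrates:
        for key, count in sig.items():
            total[key] -= coeff * count
    # Keep only the signatures whose counts actually change.
    return {key: value for key, value in total.items() if value != 0}


# Toy example at height 0: ethylene + water -> ethanol.
ethylene = {"c=": 2, "h_": 4}
water = {"o_": 1, "h_": 2}
ethanol = {"c_": 2, "o_": 1, "h_": 6}

print(reaction_signature([(1, ethylene), (1, water)], [(1, ethanol)]))
```

Under this reading, atoms untouched by the reaction cancel out, so only the changed environments remain in the signature.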
Pairwise Kernel
• To predict/classify protein-protein interactions
  • To measure similarity between two pairs of proteins
  • Kernel function K( (X1,X2), (X'1,X'2) )
• How to measure similarity between pairs?
Kernel Types
• Pairwise similarity by component similarity
  • If X1~X1' and X2~X2' then (X1,X2)~(X1',X2')
• Assess similarity between pairs directly
  • $x_{12} = (x_{1i} x_{2j} + x_{2i} x_{1j})$ : pairwise representation of (X1, X2)
  • Similarity inside the pair → similarity between pairs
From Ben-Hur, A. and W. S. Noble (2005). "Kernel methods for predicting protein-protein interactions." Bioinformatics 21 Suppl 1: i38-46.
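The kernel formulas themselves are not reproduced in this text. As a sketch, and assuming the slide refers to the tensor product pairwise kernel of Ben-Hur and Noble, the following Python compares that kernel with the inner product of the explicit pair representation given above. The vectors are made-up feature vectors, and a linear base kernel is assumed.

```python
def dot(u, v):
    """Linear base kernel on single proteins."""
    return sum(a * b for a, b in zip(u, v))


def pair_features(x1, x2):
    """Explicit pair representation with entries x1[i]*x2[j] + x2[i]*x1[j]."""
    n = len(x1)
    return [x1[i] * x2[j] + x2[i] * x1[j] for i in range(n) for j in range(n)]


def pairwise_kernel(x1, x2, y1, y2, base=dot):
    """K((X1,X2),(Y1,Y2)) = K'(X1,Y1)K'(X2,Y2) + K'(X1,Y2)K'(X2,Y1)."""
    return base(x1, y1) * base(x2, y2) + base(x1, y2) * base(x2, y1)


x1, x2 = [1, 0, 2], [0, 1, 1]
y1, y2 = [1, 1, 0], [2, 0, 1]

implicit = pairwise_kernel(x1, x2, y1, y2)
explicit = dot(pair_features(x1, x2), pair_features(y1, y2))
print(implicit, explicit)  # the explicit inner product is twice the kernel value
```

The kernel is symmetric in the order of proteins inside each pair, and it never has to build the quadratic-size explicit vector.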
Signature Kernel
• Definition
• Applies to chemicals, proteins, and reactions
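The definition is not reproduced in this text. A minimal Python sketch, assuming the kernel is the inner product of signature occurrence counts; the cosine normalization is an optional extra and is also an assumption. The height-0 counts for glycine and alanine residues are worked out by hand for illustration.

```python
import math


def signature_kernel(sig_a, sig_b):
    """Inner product of two signatures given as {signature string: count}."""
    return sum(count * sig_b[key] for key, count in sig_a.items() if key in sig_b)


def normalized_signature_kernel(sig_a, sig_b):
    """Cosine-normalized variant, so that k(A, A) = 1 (assumed, optional)."""
    norm = math.sqrt(signature_kernel(sig_a, sig_a) * signature_kernel(sig_b, sig_b))
    return signature_kernel(sig_a, sig_b) / norm


# Height-0 signatures (atom-type counts) of glycine and alanine residues.
glycine = {"n_": 1, "c_": 1, "c=": 1, "o=": 1, "h_": 3}
alanine = {"n_": 1, "c_": 2, "c=": 1, "o=": 1, "h_": 5}

print(signature_kernel(glycine, alanine))
print(normalized_signature_kernel(glycine, alanine))
```

Because the inputs are only count dictionaries, the same function works unchanged for chemicals, proteins, and reactions.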
Signature Product Kernel (1/2)
• P: Protein, C: Chemical
• Definition : Signature of complex PC
• Two P-C interaction pairs : (P,C) & (Q,D)
Signature Product Kernel (2/2)

Similarly,

Therefore,
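The derivation on these two slides is not reproduced in this text. A sketch of what a product kernel typically means, assuming the signature of the complex PC is the tensor (outer) product of the protein and chemical signatures: the inner product of two complex signatures then factorizes into a protein kernel times a chemical kernel. The vectors below are made-up count vectors over a shared signature vocabulary.

```python
def dot(u, v):
    """Inner product of two count vectors over the same vocabulary."""
    return sum(a * b for a, b in zip(u, v))


def complex_signature(protein, chemical):
    """Flattened outer product of a protein and a chemical signature (assumed form)."""
    return [p * c for p in protein for c in chemical]


def signature_product_kernel(p, c, q, d):
    """k((P,C),(Q,D)) = k(P,Q) * k(C,D)."""
    return dot(p, q) * dot(c, d)


P, C = [1, 2, 0], [3, 1]
Q, D = [0, 1, 1], [2, 2]

explicit = dot(complex_signature(P, C), complex_signature(Q, D))
factored = signature_product_kernel(P, C, Q, D)
print(explicit, factored)  # both routes give the same value
```

The factored form is the practical one, since the explicit complex signature grows with the product of the two vocabulary sizes.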
Signature Kernel : Example (height 1)
# of occurrences
Signature Product Kernel : Example
Signature Similarity vs. Sequence Alignment Scores
• Computed for every pair of amino acids
• Correlation : chemically similar → high BLOSUM62 score
EC Number Classification
• Positive Examples
  • Downloaded from KEGG
  • More than 50, max 500
• Negative Examples
  • Equal number, random selection
• Signature Kernel, 5-fold CV
[Figure panels: using only reactions; using only protein sequences]
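The sampling and cross-validation setup can be sketched as follows. This is a minimal Python illustration using only the standard library; the function name, the seed, and the placeholder example identifiers are assumptions, and the classifier itself is left out.

```python
import random


def balanced_folds(positives, candidates, k=5, seed=0):
    """Pair the positives with an equal number of randomly chosen negatives,
    shuffle, and split the labeled examples into k folds."""
    rng = random.Random(seed)
    negatives = rng.sample(candidates, len(positives))
    examples = [(x, 1) for x in positives] + [(x, 0) for x in negatives]
    rng.shuffle(examples)
    return [examples[i::k] for i in range(k)]


# Placeholder identifiers standing in for KEGG entries.
positives = ["pos%d" % i for i in range(10)]
candidates = ["neg%d" % i for i in range(30)]

folds = balanced_folds(positives, candidates)
for i, test_fold in enumerate(folds):
    # Train on the other four folds, evaluate on the held-out fold.
    train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
    print("fold", i, "train:", len(train), "test:", len(test_fold))
```

Fixing the seed keeps the random negative set reproducible between runs.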
EC Classification
• Using both sequences & reactions
• Signature Product Kernel
[Figure panels: Class 1, Class 1.1, Class 1.1.1, Class 1.1.1.1]
Comparison with other Methods
• Accuracy = (TP+TN)/(TP+TN+FP+FN)
• AUC = area under the ROC curve
• Precision = TP/(TP+FP)
• Sensitivity = TP/(TP+FN)
• Specificity = TN/(TN+FP)
• Jaccard coefficient = TP/(TP+FP+FN)
• A larger number indicates better results
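The count-based measures above translate directly into code. A small Python sketch; the function name and the confusion-matrix counts in the example are made up for illustration and are not results from the paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the count-based measures listed on the slide."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "jaccard": tp / (tp + fp + fn),
    }


# Hypothetical counts: 40 true positives, 45 true negatives,
# 5 false positives, 10 false negatives.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC is not included because it needs the ranked prediction scores rather than a single set of counts.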
Predicting New Enzyme Interactions
• Prediction
  • EC numbers accepted in September 2006 : test set
  • Predict whether or not a given enzyme will catalyze a given reaction
  • Signature Product Kernel
Predict DrugBank Using KEGG
• Class I : Both in training set
• Class II : Different partners
• Class III : Only target
• Class IV : Only drug
• Class V : None
• Signature Product Kernel
Area under ROC = 0.74
Conclusion
• Unified method for predicting protein-chemical interactions
• Atomistic structure representation of proteins encompasses information stored in substitution matrices.