An ILP Approach to Model and Classify Hexose Binding Sites

Houssam Nassif, Hassan Al-Ali, Sawsan Khuri, Walid
Keirouz, and David Page
Problem Description
 Hexoses play a key role in many cellular pathways
 Hexose binding properties are of great interest to biomedical
researchers
 Current protein-sugar computational models are based on
prior biochemical knowledge
 Goal: Investigate the empirical support for biochemical
findings by comparing ILP-induced rules to actual
biochemical results
Biochemical Review
 An amino acid consists of:
 a central carbon atom C
 an amino group NH2
 a carboxyl group COOH
 a hydrogen atom H
 and a side chain R
 Amino acids differ by their side chain R
 The side chain gives each amino acid its distinctive properties
 There are 20 different amino acids, also called residues
Biochemical Review
 A protein is a long chain of amino acids linked together
 Residue sequence determines protein shape and function
 Similar residues can be easily interchanged in a protein
Biochemical Review
 Hexoses are 6-carbon sugar molecules
 Hexoses consist of:
 A core pyranose ring
 Hydroxyl (-OH) groups sticking out
Biochemical Review
 Interaction forces between hexoses and residues are due to:
 Charge
 Hydrogen bond
 Hydrophobicity
 Pyranose ring:
 Apolar (no charge)
 Hydrophobic (water hating)
 Hydroxyl (-OH) groups:
 Polar (negative charge)
 Hydrophilic (water loving)
 Form hydrogen bonds
Prior Biochemical Findings
 Planar polar residues (Asn, Asp, Gln, Glu, Arg) are
frequently involved in hydrogen bonding.
 The aromatic residues (Trp, Tyr, Phe, His) stack
against the apolar surface of the sugar pyranose ring.
 Planar polar and aromatic residues are present at higher
frequencies in hexose binding sites.
 Ordered water molecules and metal ions are involved in
binding specificity and affinity.
 Hexoses and their binding sites are neither hydrophobic nor
hydrophilic. They exhibit both properties in a dual nature.
Predicting Sugar Binding Sites
 Prior biochemical findings have been incorporated in binding
site classifiers
 Black box models
 We take the opposite approach:
Given hexose binding site data, what biochemical rules
can we extract with no prior biochemical knowledge?
Dataset
 Mine Protein Data Bank for galactose, glucose and mannose
 Remove redundancies and keep proteins with a hexose
docked in binding site
 Get 80 protein-hexose binding sites (positive set)
 Extract 80 negatives: non-hexose binding sites and
non-binding surface grooves
 Total of 160 entries, equally divided.
Binding Site Representation
 Only the few atoms present at the binding site determine the
binding site affinity.
 We define the binding site as a sphere of radius 10 Å,
centered at the binding site center
 We extract all atoms in this sphere
 We ONLY consider atoms, not residues
 For every atom, we compute its charge, hydrogen bonding,
and hydrophobicity properties
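The sphere extraction described above can be sketched as follows. This is a minimal illustration only: the `Atom` type, the `atoms_in_sphere` helper, and the toy coordinates are assumptions made for this example, and the slides do not say how the binding site center is computed, so it is simply passed in.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    """Hypothetical atom record: name plus Cartesian coordinates in Å."""
    name: str
    x: float
    y: float
    z: float


def atoms_in_sphere(atoms, center, radius=10.0):
    """Return the atoms whose distance to `center` is at most `radius` Å."""
    return [a for a in atoms if math.dist((a.x, a.y, a.z), center) <= radius]


# Toy coordinates, invented for illustration only.
atoms = [
    Atom("ND1", 1.0, 2.0, 2.0),   # 3 Å from the origin
    Atom("OD1", 6.0, 7.0, 0.0),   # about 9.2 Å from the origin
    Atom("CG1", 9.0, 9.0, 9.0),   # about 15.6 Å from the origin
]

site_atoms = atoms_in_sphere(atoms, center=(0.0, 0.0, 0.0))
print([a.name for a in site_atoms])
```

Only the first two atoms fall inside the 10 Å sphere; the third is discarded before any property is computed.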
Problem Formulation
 Use Aleph heuristic search to learn first-order rules
 Estimate the performance using 10-fold cross-validation
 The consequent of any rule is bind(A), where A is predicted to
be a hexose binding site
 Restrict clause length to a maximum of 8 literals
 Tolerate a clause coverage of up to 5 training-set negatives
 Minimize the cost function:
cost = (# covered negatives) − (# covered positives)
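To make the constraints and the cost function concrete, here is a small sketch of how candidate clauses could be filtered and ranked. It is not Aleph's search procedure; the helper names and the candidate clauses with their coverage counts are hypothetical and serve only to show the arithmetic.

```python
MAX_LITERALS = 8    # maximum clause length, as stated on the slide
MAX_NEGATIVES = 5   # tolerated number of covered training-set negatives


def clause_cost(pos_covered, neg_covered):
    """cost = (# covered negatives) - (# covered positives)."""
    return neg_covered - pos_covered


def is_admissible(n_literals, neg_covered):
    """Check the clause-length and negative-coverage limits."""
    return n_literals <= MAX_LITERALS and neg_covered <= MAX_NEGATIVES


# Hypothetical candidates: (label, literal count, positives, negatives).
candidates = [
    ("clause_a", 4, 22, 4),
    ("clause_b", 5, 15, 0),
    ("clause_c", 3, 30, 9),   # rejected: covers more than 5 negatives
    ("clause_d", 9, 25, 1),   # rejected: longer than 8 literals
]

admissible = [c for c in candidates if is_admissible(c[1], c[3])]
best = min(admissible, key=lambda c: clause_cost(c[2], c[3]))
print(best[0], clause_cost(best[2], best[3]))
```

Minimizing this cost favors clauses that cover many positives and few negatives, while the two limits discard clauses that are too long or too noisy regardless of their cost.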
Literals
 Individual atom literal:
point(A,B,C,D,E,F,G,H,I,J)
 A: binding site
 B: atom number
 C, D, E: Cartesian coordinates
 F: charge value (nominal)
 G: hydrogen bonding value (nominal)
 H: hydrophobicity value (nominal)
 I, J: atomic element and its name
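As an illustration of this literal, the sketch below builds one atom record and renders it as a Prolog-style fact. The `Point` type, the `to_fact` helper, the two-decimal coordinate format, and the nominal labels (`neg`, `acceptor`, `hydrophilic`) are all assumptions for this example; the slides do not list the actual nominal values used.

```python
from typing import NamedTuple


class Point(NamedTuple):
    """Hypothetical mirror of point(A,B,C,D,E,F,G,H,I,J)."""
    site: str      # A: binding site
    atom: int      # B: atom number
    x: float       # C, D, E: Cartesian coordinates
    y: float
    z: float
    charge: str    # F: charge value (nominal)
    hbond: str     # G: hydrogen bonding value (nominal)
    hydro: str     # H: hydrophobicity value (nominal)
    element: str   # I: atomic element
    name: str      # J: atom name


def to_fact(p):
    """Render a Point as a ground fact string."""
    args = [p.site, str(p.atom), f"{p.x:.2f}", f"{p.y:.2f}", f"{p.z:.2f}",
            p.charge, p.hbond, p.hydro, p.element, p.name]
    return "point(" + ",".join(args) + ")."


# Invented values, for illustration only.
example = Point("site1", 12, 1.5, 2.0, -3.25,
                "neg", "acceptor", "hydrophilic", "o", "od1")
print(to_fact(example))
```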
Literals
 Distance between two atoms literal:
dist(A,B1,B2,M,N)
 A: binding site
 B1, B2: two atom numbers
 M: their Euclidean distance
 N: the error, resulting in M±N (set to 0.5 Å)
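The M±N tolerance can be read as an interval test on the Euclidean distance. The sketch below shows that reading; the `dist_matches` helper and the sample coordinates are hypothetical, and treating the interval bounds as inclusive is an assumption.

```python
import math

TOLERANCE = 0.5  # N in dist(A,B1,B2,M,N), in Å


def dist_matches(p1, p2, m, n=TOLERANCE):
    """True if the distance between p1 and p2 lies within [m - n, m + n]."""
    return abs(math.dist(p1, p2) - m) <= n


# Two invented atoms that are exactly 5 Å apart.
atom_1 = (0.0, 0.0, 0.0)
atom_2 = (3.0, 4.0, 0.0)

print(dist_matches(atom_1, atom_2, 5.24))  # 5.00 is within 5.24 ± 0.5
print(dist_matches(atom_1, atom_2, 3.41))  # 5.00 is outside 3.41 ± 0.5
```

With a 0.5 Å error, a rule that mentions a 5.24 Å distance is satisfied by any pair of atoms between 4.74 Å and 5.74 Å apart.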
Results
 32.5% error rate over 10 folds (p-value < 0.0002)
 comparable to other general sugar binding site classifiers
 Although we only consider atoms, we can infer valuable
information regarding amino acids
 Example: ND1 atoms are only present in His residues. A
rule requiring ND1 is actually requiring His.
 We present the rules’ English translations, with residue
substitution, sorted by coverage
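The residue substitution mentioned above amounts to a reverse lookup from atom name to the residues that contain it. The sketch below uses standard PDB side-chain atom names for a partial set of seven residues only, so its answers are relative to that subset; the table layout and the `residues_with_atom` helper are assumptions for this example.

```python
# Partial table of side-chain atom names (standard PDB naming).
# Only seven residues are listed; this is not the full set of 20.
SIDE_CHAIN_ATOMS = {
    "His": {"CB", "CG", "ND1", "CD2", "CE1", "NE2"},
    "Gln": {"CB", "CG", "CD", "OE1", "NE2"},
    "Glu": {"CB", "CG", "CD", "OE1", "OE2"},
    "Asp": {"CB", "CG", "OD1", "OD2"},
    "Asn": {"CB", "CG", "OD1", "ND2"},
    "Phe": {"CB", "CG", "CD1", "CD2", "CE1", "CE2", "CZ"},
    "Tyr": {"CB", "CG", "CD1", "CD2", "CE1", "CE2", "CZ", "OH"},
}


def residues_with_atom(atom_name):
    """Return the residues in the table whose side chain has this atom."""
    return sorted(res for res, names in SIDE_CHAIN_ATOMS.items()
                  if atom_name in names)


print(residues_with_atom("ND1"))
print(residues_with_atom("OD1"))
print(residues_with_atom("NE2"))
```

A rule that requires an ND1 atom maps to a single residue, while a rule that requires OD1 or NE2 maps to a small set of alternatives, which is why the translated rules use phrases such as "an Asp or Asn's OD1".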
A is a hexose-binding site if:
1. It has a Trp residue and a Glu with an OE1 atom that is
8.53 Å away from a negatively charged Oxygen.
[Pos cover = 22, Neg cover = 4]
2. It has a Phe or Tyr residue and an Asp with an OD1
atom that is 5.24 Å away from an Asp or Asn’s OD1.
[Pos cover = 21, Neg cover = 3]
3. It has a branching aliphatic residue (Leu, Val, Ile), an
Asp and an Asn. Asp and Asn’s OD1 atoms are 3.41 Å
away.
[Pos cover = 15, Neg cover = 0]
A is a hexose-binding site if:
4. It has a hydrophilic non-hydrogen bonding Nitrogen atom (Pro,
Arg, His) that is 7.95 Å away from a His ND1 nitrogen and
9.60 Å away from a branching aliphatic residue’s CG1.
[Pos cover = 10, Neg cover = 0]
5. It has a hydrophobic CD2 atom, a hydrophilic Pro backbone or His
ND1 nitrogen and two Glu (or two Gln) distant by 11.89 Å.
[Pos cover = 11, Neg cover = 2]
6. It has an Asp B, two identical atoms Q and X, and a hydrophilic
hydrogen-bonding atom K. Atoms K, Q and X have the same charge.
B’s ODE1 oxygen shares the same Y-coordinate with K and the same
Z-coordinate with Q. Atom X is 8.29 Å away from atom K.
[Pos cover = 8, Neg cover = 0]
A is a hexose-binding site if:
7. It has a Ser, and two Gln and/or His, with NE2 atoms
that are 3.88 Å apart.
[Pos cover = 8, Neg cover = 2]
8. It has an Asn and a Phe, Tyr or His residue, with a
CE1 atom that is 7.07 Å away from a Calcium.
[Pos cover = 5, Neg cover = 0]
9. It has a Lys or Arg, a Phe or Tyr, a Pro or His, and
a Sulfate or a Phosphate.
[Pos cover = 3, Neg cover = 0]
Discussion
 We infer most of the established biochemical information
 Rules 1 and 2, with the highest coverage, rely on the aromatic residues
Trp, Tyr, and Phe. The fourth aromatic residue, His, is
mentioned in many different rules.
This highlights the docking interaction between the hexose and
the aromatic residues.
 All rules require the presence of a planar polar residue (Asn,
Asp, Gln, Glu, Arg).
These residues are most frequently involved in hexose hydrogen bonding.
Discussion
 The residues mentioned most often in the rules are aromatic and
planar polar, which mirrors the fact that they are present at
higher frequencies in hexose binding sites
 Rule 5 requires both a hydrophobic and a hydrophilic
element. It reflects the dual nature of hexose docking.
 Rules 8 and 9 require the presence of different ions
(Calcium, Sulfate, Phosphate), confirming the relevance of
ions in hexose binding.
New discovery?
 Rule 2 suggests a dependency between Phe/Tyr and
Asn/Asp. Such a relation has been proven in lectins.
 Similarly, rule 1 suggests a dependency between Trp and
Glu, a link not previously identified in the literature
 Further investigation is needed to confirm this finding
Conclusion
 ILP achieves accuracy similar to that of other general sugar
black-box classifiers
 In addition, it offers insight into the discriminating process.
 Aleph was able to induce most of the known hexose-protein
interaction biochemical rules.
 ILP finds a previously unreported dependency between Trp
and Glu.