Protein Design with DEE/A* Algorithms for Drug Design 03/02/2011

General Protein Redesign Scheme
[Figure: redesign scheme. Inputs: rotamer library, input structure, energy function. Output: conformations ordered by energy (… < < < …).]
Benefits of Provable Methods
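• In-order enumeration of conformations, from low E to high E.
• Heuristic methods (MC, SCMF, GA): enumeration with gaps.
• Provable methods (A* enumeration): gap-free enumeration.
[Figure: ranked conformations returned by the heuristic and the provable approach.]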
Dead-End Elimination (DEE)
[Figure: energy E versus conformations for rotamers i_r and i_t.]
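The slide gives no formula for DEE, so the following is only a minimal sketch that assumes the original DEE criterion: rotamer i_r is dead-ending if some competitor i_t at the same position satisfies E(i_r) + sum_j min_s E(i_r,j_s) > E(i_t) + sum_j max_s E(i_t,j_s). The function names, the table layout and all energy values are hypothetical, chosen for illustration.

```python
def pair_energy(e_pair, i, r, j, s):
    # The pair table stores each unordered (position, rotamer) pair once.
    key = ((i, r), (j, s))
    return e_pair[key] if key in e_pair else e_pair[(key[1], key[0])]


def dee_prune(e_self, e_pair, rotamers):
    """Repeatedly remove rotamers that satisfy the assumed DEE criterion."""
    kept = {pos: list(rots) for pos, rots in rotamers.items()}
    changed = True
    while changed:
        changed = False
        for i in kept:
            others = [j for j in kept if j != i]

            def bound(r, pick):
                # pick=min gives the best case for r, pick=max the worst case.
                return e_self[(i, r)] + sum(
                    pick(pair_energy(e_pair, i, r, j, s) for s in kept[j])
                    for j in others
                )

            for r in list(kept[i]):
                # r is dead-ending if its best case is worse than the
                # worst case of some competing rotamer t at position i.
                if any(bound(r, min) > bound(t, max) for t in kept[i] if t != r):
                    kept[i].remove(r)
                    changed = True
    return kept


# Hypothetical energies for two positions with two rotamers each.
E_SELF = {("A", 1): 0.0, ("A", 2): 10.0, ("B", 1): 0.0, ("B", 2): 0.0}
E_PAIR = {
    (("A", 1), ("B", 1)): 1.0,
    (("A", 1), ("B", 2)): -1.0,
    (("A", 2), ("B", 1)): 0.5,
    (("A", 2), ("B", 2)): 0.0,
}

pruned = dee_prune(E_SELF, E_PAIR, {"A": [1, 2], "B": [1, 2]})
print(pruned)
```

In this toy input A2 is eliminated first, which then allows B1 to be eliminated on the next check. In general several rotamers may survive at a position, which is why an enumeration step follows DEE.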
Enumeration with A*
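• After DEE, more than one rotamer may remain at each
design position.
• The remaining conformations need to be evaluated.
• Goal: find the GMEC (global minimum energy conformation)
and a list of conformations ordered by energy.
• This is done with the A* search algorithm.
• First used for this purpose by Leach and Lemon, PROTEINS
1998.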
Slight Modification to DEE
Leach and Lemon, PROTEINS 33:227-239 (1998).
A* search
• Finds the least-cost path from the root node
to one or more goal nodes.
• Evaluation function – f*
• At any node n, f* = g* + h*
– g* = cost of reaching node n from the root node.
– h* = estimated cost of reaching the goal node
from n.
A* search continued
• Search maintains a priority queue, with nodes
ordered according to the value of f*.
• At each stage – node with minimum value of
f* is expanded and its successor nodes
calculated.
• Successor nodes entered in the queue,
maintaining the f* order.
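The queue-driven loop described above can be sketched as follows. This is a minimal illustration, not the implementation used in the cited work: the function `a_star`, its callback parameters and the toy `COSTS` table are all assumptions made for the example. The search is over a tree, so no visited set is kept.

```python
import heapq
from itertools import count


def a_star(root, successors, is_goal, g, h):
    """Yield (goal node, cost) pairs in non-decreasing order of f* = g* + h*."""
    tie = count()  # tie-breaker so the heap never has to compare nodes
    queue = [(g(root) + h(root), next(tie), root)]
    while queue:
        f, _, node = heapq.heappop(queue)  # node with minimum f*
        if is_goal(node):
            yield node, f
            continue
        for child in successors(node):
            # Successors enter the queue, which keeps the f* order.
            heapq.heappush(queue, (g(child) + h(child), next(tie), child))


# Toy problem with hypothetical numbers: choose one option per level.
COSTS = [[3, 1], [2, 5]]


def successors(node):
    return [node + (k,) for k in range(len(COSTS[len(node)]))]


def is_goal(node):
    return len(node) == len(COSTS)


def g(node):
    # Exact cost of the choices made so far.
    return sum(COSTS[level][k] for level, k in enumerate(node))


def h(node):
    # Cheapest possible completion: never overestimates the remaining cost.
    return sum(min(level_costs) for level_costs in COSTS[len(node):])


ranked = list(a_star((), successors, is_goal, g, h))
print(ranked)
```

Because the generator keeps popping after the first goal, it returns all complete assignments in order of cost, which is the behaviour needed for an ordered list of conformations.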
An example
• Design of a tripeptide.
• After DEE pruning,
– Residue A – 3 rotamers.
– Residue B – 3 rotamers.
– Residue C – 2 rotamers.
• Assume values of g* and h* are given at every
node.
Leach and Lemon, PROTEINS 33:227-239 (1998).
[Figure: search tree for the tripeptide, built up step by step. Numbers in parentheses in the queue are f* = g* + h*.]
Tree values read from the figure (edge cost; h*):
– A1 (100; 8), A2 (10; 11), A3 (200; 6).
– Under A2: B1 (4; 8), B2 (3; 8), B3 (3; 10).
– Under A2B2: C1 (12), C2 (8). Under A2B1: C1 (8), C2 (12).
Queue after each expansion:
• Expand root: A2(21), A1(108), A3(206).
• Expand A2: A2B2(21), A2B1(22), A2B3(23), A1(108), A3(206).
• Expand A2B2: A2B2C2(21), A2B1(22), A2B3(23), A2B2C1(25), A1(108), A3(206).
• A2B2C2 is a goal node at the head of the queue: Rank 1.
• Expand A2B1: A2B1C1(22), A2B3(23), A2B2C1(25), A2B1C2(26), A1(108), A3(206).
• A2B1C1 is a goal node at the head of the queue: Rank 2.
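The tripeptide walk-through can be replayed in a few lines. This is a sketch under stated assumptions: the (g*, h*) values below are my reading of the numbers in the slide figure, assigned to nodes so that they reproduce the f* values printed in the queue (for example A2: 10 + 11 = 21). Subtrees the slides never expand (A1, A3, A2B3) are left out, so only the first two ranks can be enumerated. The names `NODES`, `CHILDREN` and `enumerate_ranked` are hypothetical.

```python
import heapq

# (g*, h*) per node; g* is cumulative from the root, h* is 0 at goal nodes.
NODES = {
    "A1": (100, 8), "A2": (10, 11), "A3": (200, 6),
    "A2B1": (14, 8), "A2B2": (13, 8), "A2B3": (13, 10),
    "A2B2C1": (25, 0), "A2B2C2": (21, 0),
    "A2B1C1": (22, 0), "A2B1C2": (26, 0),
}
# Only the expansions shown on the slides; "" is the root.
CHILDREN = {
    "": ["A1", "A2", "A3"],
    "A2": ["A2B1", "A2B2", "A2B3"],
    "A2B2": ["A2B2C1", "A2B2C2"],
    "A2B1": ["A2B1C1", "A2B1C2"],
}


def f(node):
    g, h = NODES[node]
    return g + h


def is_goal(node):
    return len(node) == 6  # three residue/rotamer pairs, e.g. "A2B2C2"


def enumerate_ranked(n_goals):
    """Return the first n_goals conformations and the queue before each pop."""
    queue = [(f(child), child) for child in CHILDREN[""]]
    heapq.heapify(queue)
    ranked, snapshots = [], []
    while queue and len(ranked) < n_goals:
        snapshots.append(sorted(queue))
        cost, node = heapq.heappop(queue)
        if is_goal(node):
            ranked.append((node, cost))
            continue
        for child in CHILDREN[node]:
            heapq.heappush(queue, (f(child), child))
    return ranked, snapshots


ranked, snapshots = enumerate_ranked(2)
for queue_state in snapshots:
    print(", ".join(f"{name}({cost})" for cost, name in queue_state))
print(ranked)
```

Asking for a third rank would require the children of A2B3, which the figure does not give, so the sketch stops at two.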
Provable Guarantees with A*
• At any node n:
– g* is known exactly.
– h* is an estimate.
• h* should be admissible.
– If C* is the actual cost of reaching the goal from n, h* <= C*.
• A* is guaranteed never to overlook a
lower-cost path.
Proof
• A* returns a goal node when it is at the head
of the queue. Claim: its cost is minimum.
• Actual cost of head node <= the estimated
cost of other nodes (the queue is ordered by f*).
• Estimated cost <= actual cost (h* is admissible).
• Hence actual cost of head node <= actual cost of
other nodes.
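The argument above can be written as one chain of inequalities. The notation is an assumption made for this sketch: G is the goal node at the head of the queue, n is any other node still in the queue, C*(n) is the actual cost of the cheapest completion from n, and h*(G) = 0 is assumed because nothing remains to be assigned at a goal node.

```latex
\begin{align*}
\mathrm{cost}(G) &= g^*(G) = f^*(G)
  && \text{(goal node, } h^*(G) = 0\text{)} \\
&\le f^*(n) = g^*(n) + h^*(n)
  && \text{(} G \text{ is at the head of the queue)} \\
&\le g^*(n) + C^*(n)
  && \text{(admissibility, } h^*(n) \le C^*(n)\text{)}
\end{align*}
```

The right-hand side is the actual cost of the best goal reachable through n, so no goal still hidden below a queued node can be cheaper than G.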
Protein Design and A*
• Each node in the tree – partially assigned
conformation.
• g* = energy of the partially assigned
conformations.
• h* = minimum energy required to complete
the model – hence will never overestimate the
actual energy.
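[Figure: positions A and B, each with rotamers 1-3; A2 and B1 are assigned.]
g* = E(A2) + E(B1) + E(A2,B1)
[Figure: position C, with rotamers 1-2, is still unassigned.]
h* = min_{i=1,2} [ E(Ci) + E(A2,Ci) + E(B1,Ci) ]
[Figure: a further partial assignment over positions A, B and C.]
g*??, h*??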
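The g* and h* expressions for the A2, B1 example can be evaluated with a small sketch. All energy values and the names `E_SELF`, `E_PAIR`, `g_star` and `h_star_last_position` are hypothetical. The h* function covers only the case shown on the slides, where a single position is left unassigned; with several unassigned positions, the interactions among them would also have to be bounded from below, which this sketch does not attempt.

```python
# Hypothetical energies (arbitrary units); not taken from the slides.
E_SELF = {"A2": 1.0, "B1": 2.0, "C1": 0.5, "C2": 1.5}
E_PAIR = {
    ("A2", "B1"): -1.0,
    ("A2", "C1"): 3.0, ("A2", "C2"): 0.0,
    ("B1", "C1"): 0.5, ("B1", "C2"): 1.0,
}


def g_star(assigned):
    """Energy of the partially assigned conformation: self plus pairwise terms."""
    total = sum(E_SELF[r] for r in assigned)
    for idx, r in enumerate(assigned):
        for s in assigned[idx + 1:]:
            total += E_PAIR[(r, s)]
    return total


def h_star_last_position(assigned, candidates):
    """Minimum energy needed to complete the model when one position is left."""
    return min(
        E_SELF[c] + sum(E_PAIR[(r, c)] for r in assigned)
        for c in candidates
    )


assigned = ["A2", "B1"]
g = g_star(assigned)                              # E(A2) + E(B1) + E(A2,B1)
h = h_star_last_position(assigned, ["C1", "C2"])  # min over C1, C2
print(g, h, g + h)
```

Because h* takes the minimum over every rotamer still allowed at C, it cannot exceed the energy actually added by whichever rotamer is finally chosen, which is the admissibility property the proof relies on.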
Questions??