Protein Design with DEE/A*
Algorithms for Drug Design
03/02/2011

General Protein Redesign Scheme
[Figure: Rotamer Library, Input Structure, Energy Function: … < < < …]

Benefits of Provable Methods
[Figure: in-order enumeration of conformations from low E to high E. Heuristic methods (MC, SCMF, GA) give an enumeration with gaps; provable methods (A* enumeration) give a gap-free enumeration.]

Dead-End Elimination (DEE)
[Figure: energy E of rotamers i_t and i_r across conformations.]

Enumeration with A*
• After DEE, more than one possibility may remain at each design position.
• The remaining conformations need to be evaluated.
• Goal: find the GMEC and a list of ordered conformations.
• This is done with the A* search algorithm.
• First used by Leach and Lemon, PROTEINS 1998.

Slight Modification to DEE
[Figure from Leach et al., PROTEINS 33:227-239 (1998).]

A* search
• Finds the least-cost path from the root node to one or more goal nodes.
• Evaluation function: f*
• At any node n, f* = g* + h*
  – g* = cost of reaching node n from the root node.
  – h* = estimated cost of reaching the goal node from n.

A* search continued
• The search maintains a priority queue, with nodes ordered by their value of f*.
• At each stage, the node with the minimum value of f* is expanded and its successor nodes are calculated.
• Successor nodes are entered in the queue, maintaining the f* order.

An example
• Design of a tri-peptide.
• After DEE pruning:
  – Residue A: 3 rotamers.
  – Residue B: 3 rotamers.
  – Residue C: 2 rotamers.
• Assume values of g* and h* are given at every node.
Leach et al., PROTEINS 33:227-239 (1998).

[Figure: search tree over residues A, B, and C with the g* and h* values at each node. The number in parentheses after each queue entry below is its f* value.]
• Queue after expanding the root: A2(21), A1(108), A3(206).
• Queue after expanding A2: A2B2(21), A2B1(22), A2B3(23), A1(108), A3(206).
• Queue after expanding A2B2: A2B2C2(21), A2B1(22), A2B3(23), A2B2C1(25), A1(108), A3(206).
• Rank 1: A2B2C2 (21), a fully assigned conformation at the head of the queue.
[Figure: search tree over residues A, B, and C after the Rank 1 conformation has been returned and A2B1 has been expanded.]
• Queue after expanding A2B1: A2B1C1(22), A2B3(23), A2B2C1(25), A2B1C2(26), A1(108), A3(206).
• Rank 2: A2B1C1 (22), the next fully assigned conformation at the head of the queue.

Provable Guarantees with A*
• At any node n:
  – g* is known exactly.
  – h* is an estimate.
• h* should be admissible.
  – If C* is the actual cost of reaching the goal from n, then h* <= C*.
• A* is guaranteed never to overlook the possibility of a lower-cost path.

Proof
• A* returns a goal node when it is at the head of the queue => its cost is minimum.
• Actual cost of the head node <= estimated cost of the other nodes (the queue is ordered by f*).
• Estimated cost of the other nodes <= their actual cost (h* is admissible).
• Hence actual cost of the head node <= actual cost of the other nodes.

Protein Design and A*
• Each node in the tree is a partially assigned conformation.
• g* = energy of the partially assigned conformation.
• h* = minimum energy required to complete the model, hence it will never overestimate the actual energy.

[Figure: rotamers 1 to 3 at residues A and B, with A2 and B1 assigned; residue C has rotamers 1 and 2.]
g* = E(A2) + E(B1) + E(A2,B1)
h* = min over i = 1,2 of [ E(Ci) + E(A2,Ci) + E(B1,Ci) ]

[Figure: another partial assignment over residues A, B, and C.]
g*??, h*??

Questions?
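
The g* and h* definitions above can be sketched in Python as an A* enumeration over rotamer assignments. This is a minimal sketch under stated assumptions, not the implementation from Leach and Lemon: the energies in SELF_E and PAIR_E are made-up integers, the names astar_enumerate, h_star, and total_energy are hypothetical, and the h* used when more than one residue is unassigned is an assumed generalization of the slide formula (a sum of per-residue minima, which reduces to min over i of [ E(Ci) + E(A2,Ci) + E(B1,Ci) ] when only C is unassigned).

```python
import heapq

# Hypothetical energies for a tri-peptide: A and B have 3 rotamers, C has 2.
SELF_E = [[5, 1, 9], [2, 3, 4], [6, 2]]          # SELF_E[i][r]
PAIR_E = {                                       # PAIR_E[(i, j)][r][s], i < j
    (0, 1): [[1, 2, 3], [0, 1, 2], [4, 0, 1]],
    (0, 2): [[2, 1], [1, 3], [0, 2]],
    (1, 2): [[3, 0], [1, 1], [2, 4]],
}


def total_energy(conf):
    """Full energy of a completely assigned conformation."""
    n = len(conf)
    e = sum(SELF_E[i][r] for i, r in enumerate(conf))
    e += sum(PAIR_E[(i, j)][conf[i]][conf[j]]
             for i in range(n) for j in range(i + 1, n))
    return e


def astar_enumerate(self_e, pair_e):
    """Yield (energy, rotamer tuple) in nondecreasing order of energy."""
    n = len(self_e)

    def h_star(assigned):
        # Assumed lower bound: for each unassigned residue j, take the best
        # rotamer s given the assigned residues, plus the smallest possible
        # pair energy with every later unassigned residue k.
        total = 0
        for j in range(len(assigned), n):
            options = []
            for s in range(len(self_e[j])):
                e = self_e[j][s]
                e += sum(pair_e[(i, j)][r][s] for i, r in enumerate(assigned))
                e += sum(min(pair_e[(j, k)][s]) for k in range(j + 1, n))
                options.append(e)
            total += min(options)
        return total

    # Priority queue of (f*, g*, partial assignment), ordered by f*.
    queue = [(h_star(()), 0, ())]
    while queue:
        f, g, assigned = heapq.heappop(queue)
        j = len(assigned)
        if j == n:
            # Goal node at the head of the queue: g* is its exact energy.
            yield g, assigned
            continue
        # Expand the head node: assign each rotamer of the next residue.
        for s in range(len(self_e[j])):
            g_new = g + self_e[j][s]
            g_new += sum(pair_e[(i, j)][r][s] for i, r in enumerate(assigned))
            child = assigned + (s,)
            heapq.heappush(queue, (g_new + h_star(child), g_new, child))


if __name__ == "__main__":
    for rank, (energy, conf) in enumerate(astar_enumerate(SELF_E, PAIR_E), 1):
        print(rank, conf, energy)
```

The first item produced by the generator is the GMEC for these made-up energies; continuing to iterate gives the gap-free, energy-ordered list. Integer energies are used only so the ordering can be compared exactly against a brute-force enumeration.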