Dynamic Programming
• Optimization Problems
• Dynamic Programming Paradigm
• Example: Matrix multiplication
• Principle of Optimality
• Example: String Matching Problems

Optimization Problems
• In an optimization problem, there are typically many feasible solutions for any input instance I
• For each solution S, we have a cost or value function f(S)
• Typically, we wish to find a feasible solution S such that f(S) is either minimized or maximized
• Thus, when designing an algorithm to solve an optimization problem, we must prove the algorithm produces a best possible solution.

Example Problem
You have six hours to complete as many tasks as possible, all of which are equally important.
  Task A - 2 hours
  Task B - 4 hours
  Task C - 1/2 hour
  Task D - 3.5 hours
  Task E - 2 hours
  Task F - 1 hour
How many can you get done?
• Is this a minimization or a maximization problem?
• Give one example of a feasible but not optimal solution along with its associated value.
• Give an optimal solution and its associated value.

Dynamic Programming
• The key idea behind dynamic programming is that it is a divide-and-conquer technique at heart
• That is, we solve larger problems by patching together solutions to smaller problems
• However, dynamic programming is typically faster than plain recursion because we compute these solutions in a bottom-up fashion, storing each one so it is computed only once

Fibonacci numbers
• F(n) = F(n-1) + F(n-2)
  – F(0) = 0
  – F(1) = 1
• Top-down recursive computation is very inefficient
  – Many F(i) values are computed multiple times
• Bottom-up computation is much more efficient
  – Compute F(2), then F(3), then F(4), etc. using stored values for smaller F(i) values to compute next value
  – Each F(i) value is computed just once

Recursive Computation
F(n) = F(n-1) + F(n-2) ; F(0) = 0, F(1) = 1
Recursive Solution: F(6) = 8
[Figure: recursion tree for F(6); the subtrees for F(4), F(3), F(2), F(1) and F(0) appear repeatedly]

Bottom-up computation
We can calculate F(n) in linear time by storing small values.
    F[0] = 0
    F[1] = 1
    for i = 2 to n
        F[i] = F[i-1] + F[i-2]
    return F[n]
Moral: We can sometimes trade space for time.

Key implementation steps
• Identify subsolutions that may be useful in computing whole solution
  – Often need to introduce parameters
• Develop a recurrence relation (recursive solution)
  – Set up the table of values/costs to be computed
    • The dimensionality is typically determined by the number of parameters
    • The number of values should be polynomial
• Determine the order of computation of values
• Backtrack through the table to obtain complete solution (not just solution value)

Example: Matrix Multiplication
• Input
  – List of n matrices to be multiplied together using traditional matrix multiplication
  – The dimensions of the matrices are sufficient
• Task
  – Compute the optimal ordering of multiplications to minimize total number of scalar multiplications performed
• Observations:
  – Multiplying an X × Y matrix by a Y × Z matrix takes X · Y · Z scalar multiplications
  – Matrix multiplication is associative but not commutative

Example Input
• Input:
  – M1, M2, M3, M4
    • M1: 13 x 5
    • M2: 5 x 89
    • M3: 89 x 3
    • M4: 3 x 34
• Feasible solutions and their values
  – ((M1 M2) M3) M4: 10,582 scalar multiplications
  – (M1 M2) (M3 M4): 54,201 scalar multiplications
  – (M1 (M2 M3)) M4: 2,856 scalar multiplications
  – M1 ((M2 M3) M4): 4,055 scalar multiplications
  – M1 (M2 (M3 M4)): 26,418 scalar multiplications

Key implementation steps
• Identify subsolutions that may be useful in computing whole solution
  – Often need to introduce parameters
  – Define dimensions to be (d_0, d_1, …, d_n) where matrix M_i has dimensions d_{i-1} x d_i
  – Let M(i,j) be the matrix formed by multiplying matrices M_i through M_j
  – Define C(i,j) to be the minimum cost for computing M(i,j)

Key implementation steps
• Develop a recurrence relation (recursive solution)
  – Recurrence relation for C(i,j)
    • C(i,j) = min over i <= k <= j-1 of ( C(i,k) + C(k+1,j) + d_{i-1} d_k d_j )
      – The last multiplication is between matrices M(i,k) and M(k+1,j)
    • C(i,i) = 0
  – Set up the table of values/costs to be computed
    • The dimensionality is typically determined by the number of parameters
    • The number of values should be polynomial

    C    1    2    3    4
    1    0
    2         0
    3              0
    4                   0

Key implementation steps
• Determine the order of computation of values
  – In the table below, each entry shows the pass in which it is computed; an entry depends only on entries with smaller pass numbers

    C    1    2    3    4
    1    0    1    2    3
    2         0    1    2
    3              0    1
    4                   0

Computing actual ordering

    C    1    2       3       4
    1    0    5785    1530    2856
    2         0       1335    1845
    3                 0       9078
    4                         0

    P    1    2    3    4
    1    0    1    1    3
    2         0    2    3
    3              0    3
    4                   0

P(i,j) records the intermediate multiplication k used to compute M(i,j). That is, P(i,j) = k if the last multiplication was M(i,k) M(k+1,j).

Pseudocode
    int MatrixOrder()
        forall i, j: C[i, j] = 0
        for j = 2 to n
            for i = j-1 downto 1
                C[i, j] = min over i <= k <= j-1 of ( C[i, k] + C[k+1, j] + d_{i-1} d_k d_j )
                P[i, j] = the value of k that achieves this minimum
        return C[1, n]

Backtracking
    Procedure ShowOrder(i, j)
        if (i = j)
            write "M_i"
        else
            k = P[i, j]
            write "("
            ShowOrder(i, k)
            write " "
            ShowOrder(k+1, j)
            write ")"

Principle of Optimality
• In the book, this is termed “Optimal substructure”
• An optimal solution contains within it optimal solutions to subproblems.
• More detailed explanation
  – Suppose solution S is optimal for problem P.
  – Suppose we decompose P into P1 through Pk and that S can be decomposed into pieces S1 through Sk corresponding to the subproblems.
  – Then solution Si is an optimal solution for subproblem Pi

Example 1
• Matrix Multiplication
  – In our solution for computing matrix M(1,n), we have a final step of multiplying matrices M(1,k) and M(k+1,n).
  – Our subproblems then would be to compute M(1,k) and M(k+1,n)
  – Our solution uses optimal solutions for computing M(1,k) and M(k+1,n) as part of the overall solution.

Example 2
• Shortest Path Problem
  – Suppose a shortest path from s to t visits u
  – We can decompose the path into s-u and u-t.
  – The s-u path must be a shortest path from s to u, and the u-t path must be a shortest path from u to t
• Conclusion: dynamic programming can be used for computing shortest paths

Example 3
• Longest Path Problem
  – Suppose a longest path from s to t visits u
  – We can decompose the path into s-u and u-t.
  – Is it true that the s-u path must be a longest path from s to u?
• Conclusion?

Example 4: The Traveling Salesman Problem
What recurrence relation will return the optimal solution to the Traveling Salesman Problem?
If T(i) is the optimal tour on the first i points, will this help us in solving larger instances of the problem?
Can we set T(i+1) to be T(i) with the additional point inserted in the position that will result in the shortest path?
No!
[Figure: tours T(4) and T(5) compared with the shortest tour]

Summary of bad examples
• There almost always is a way to have the optimal substructure if you expand your subproblems enough
• For longest path and TSP, the number of subproblems grows to exponential size
• This is not useful as we do not want to compute an exponential number of solutions

When is dynamic programming effective?
• Dynamic programming works best on objects that are linearly ordered and cannot be rearranged
  – characters in a string
  – files in a filing cabinet
  – points around the boundary of a polygon
  – the left-to-right order of leaves in a search tree.
• Whenever your objects are ordered in a left-to-right way, dynamic programming must be considered.

Efficient Top-Down Implementation
• We can implement any dynamic programming solution top-down by storing computed values in the table
  – If all values need to be computed anyway, bottom-up is more efficient
  – If some do not need to be computed, top-down may be faster

Inexact Matching of Strings
• General Problem
  – Input
    • Strings S and T
  – Questions
    • How distant is S from T?
    • How similar is S to T?
• Solution Technique
  – Dynamic programming with cost/similarity/scoring matrix

Measuring Distance of S and T
• Consider S and T
• We can transform S into T using the following four operations
  – insertion of a character into S
  – deletion of a character from S
  – substitution (replacement) of a character in S by another character (typically in T)
  – matching (no operation)

Example
• S = vintner
• T = writers
• vintner
• wintner (Replace v with w)
• wrintner (Insert r)
• writner (Delete first n)
• writer (Delete second n)
• writers (Insert s)

Example
• Edit Transcript (or just transcript):
  – a string that describes the transformation of one string into the other
• Example

    R I M D M D M M I
    v - i n t n e r -
    w r i - t - e r s

Edit Distance
• Edit distance of strings S and T
  – The minimum number of edit operations (insertion, deletion, replacement) needed to transform string S into string T
  – Also called Levenshtein distance; Levenshtein appears to have been the first to define this concept
• Optimal transcript
  – An edit transcript of S and T that has the minimum number of edit operations
  – Co-optimal transcripts: distinct transcripts that all achieve this minimum

Alignment
• A global alignment of strings S and T is obtained
  – by inserting spaces (dashes) into S and T
    • they should have the same number of characters (including dashes) at the end
  – then placing the two strings over each other, matching one character (or dash) in S with a unique character (or dash) in T
  – Note ALL positions in both S and T are involved

Alignments and Edit transcripts
• Example Alignment

    v - i n t n e r -
    w r i - t - e r s

• Alignments and edit transcripts are interrelated
  – edit transcript: emphasizes process
    • the specific mutational events
  – alignment: emphasizes product
    • the relationship between the two strings
  – Alignments are often easier to work with and visualize
    • also generalize better to more than 2 strings

Edit Distance Problem
• Input
  – 2 strings S and T
• Task
  – Output edit distance of S and T
  – Output optimal edit transcript
  – Output optimal alignment
• Solution method
  – Dynamic Programming

Identifying Subproblems
• Let D(i,j) be the edit distance of S[1..i] and T[1..j]
  – The edit distance of the first i characters of S with the first j characters of T
  – Let |S| = n, |T| = m
• D(n,m) = edit distance of S and T
• We will compute D(i,j) for all i and j such that 0 <= i <= n, 0 <= j <= m

Recurrence Relation
• Base Case:
  – For 0 <= i <= n, D(i,0) = i
  – For 0 <= j <= m, D(0,j) = j
• Recursive Case:
  – 0 < i <= n, 0 < j <= m
  – D(i,j) = min
    • D(i-1,j) + 1 (what does this mean?)
    • D(i,j-1) + 1 (what does this mean?)
    • D(i-1,j-1) + d(i,j) (what does this mean?)
  – d(i,j) = 0 if S(i) = T(j) and is 1 otherwise

What the various cases mean
• D(i,j) = min
  – D(i-1,j) + 1:
    • Align S[1..i-1] with T[1..j] optimally
    • Match S(i) with a dash in T
  – D(i,j-1) + 1
    • Align S[1..i] with T[1..j-1] optimally
    • Match a dash in S with T(j)
  – D(i-1,j-1) + d(i,j)
    • Align S[1..i-1] with T[1..j-1] optimally
    • Match S(i) with T(j)

Computing D(i,j) values
(Rows are indexed by i over S = vintner; columns are indexed by j over T = writers.)

    D(i,j)        w   r   i   t   e   r   s
              0   1   2   3   4   5   6   7
        0
    v   1
    i   2
    n   3
    t   4
    n   5
    e   6
    r   7

Initialization: Base Case

    D(i,j)        w   r   i   t   e   r   s
              0   1   2   3   4   5   6   7
        0     0   1   2   3   4   5   6   7
    v   1     1
    i   2     2
    n   3     3
    t   4     4
    n   5     5
    e   6     6
    r   7     7

Row i=1

    D(i,j)        w   r   i   t   e   r   s
              0   1   2   3   4   5   6   7
        0     0   1   2   3   4   5   6   7
    v   1     1   1   2   3   4   5   6   7
    i   2     2
    n   3     3
    t   4     4
    n   5     5
    e   6     6
    r   7     7

Entry i=2, j=2

    D(i,j)        w   r   i   t   e   r   s
              0   1   2   3   4   5   6   7
        0     0   1   2   3   4   5   6   7
    v   1     1   1   2   3   4   5   6   7
    i   2     2   2   ?
    n   3     3
    t   4     4
    n   5     5
    e   6     6
    r   7     7

Entry i=2, j=3

    D(i,j)        w   r   i   t   e   r   s
              0   1   2   3   4   5   6   7
        0     0   1   2   3   4   5   6   7
    v   1     1   1   2   3   4   5   6   7
    i   2     2   2   2   ?
    n   3     3
    t   4     4
    n   5     5
    e   6     6
    r   7     7

Calculation methodologies
• Location of edit distance
  – D(n,m)
• Example was to calculate row by row
• Can also calculate column by column
• Can also use antidiagonals
• Key is to build from upper left corner

Traceback
• Using table to construct optimal transcript
• Pointers in cell D(i,j)
  – Set a pointer from cell (i,j) to
    • cell (i, j-1) if D(i,j) = D(i, j-1) + 1
    • cell (i-1,j) if D(i,j) = D(i-1,j) + 1
    • cell (i-1,j-1) if D(i,j) = D(i-1,j-1) + d(i,j)
  – Follow path of pointers from (n,m) back to (0,0)

What the pointers mean
• horizontal pointer: cell (i,j) to cell (i, j-1)
  – Align T(j) with a space in S
  – Insert T(j) into S
• vertical pointer: cell (i,j) to cell (i-1, j)
  – Align S(i) with a space in T
  – Delete S(i) from S
• diagonal pointer: cell (i,j) to cell (i-1, j-1)
  – Align S(i) with T(j)
  – Replace S(i) with T(j) (or match them if S(i) = T(j))

Table and transcripts
• The pointers represent all optimal transcripts
• Theorem:
  – Any path from (n,m) to (0,0) following the pointers specifies an optimal transcript.
  – Conversely, any optimal transcript is specified by such a path.
  – The correspondence between paths and transcripts is one to one.
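The table fill and traceback described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not code from the slides: the function names edit_distance_table and one_transcript are hypothetical, strings are 0-indexed (so S(i) is s[i-1]), and instead of storing pointers the traceback re-checks the recurrence at each cell. When several pointers leave a cell it arbitrarily prefers the diagonal one, so it returns just one of the possibly many optimal transcripts.

```python
def edit_distance_table(s, t):
    """Fill the (n+1) x (m+1) table where D[i][j] is the edit distance
    between the first i characters of s and the first j characters of t."""
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i  # base case: delete the first i characters of s
    for j in range(m + 1):
        D[0][j] = j  # base case: insert the first j characters of t
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = 0 if s[i - 1] == t[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,             # vertical: delete S(i)
                          D[i][j - 1] + 1,             # horizontal: insert T(j)
                          D[i - 1][j - 1] + mismatch)  # diagonal: match/replace
    return D


def one_transcript(s, t, D):
    """Follow pointers from (n, m) back to (0, 0) and return one optimal
    edit transcript over the letters M, R, I, D."""
    i, j = len(s), len(t)
    ops = []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = 0 if s[i - 1] == t[j - 1] else 1
            if D[i][j] == D[i - 1][j - 1] + mismatch:
                ops.append("M" if mismatch == 0 else "R")
                i, j = i - 1, j - 1
                continue
        if i > 0 and D[i][j] == D[i - 1][j] + 1:
            ops.append("D")  # vertical pointer
            i -= 1
        else:
            ops.append("I")  # horizontal pointer
            j -= 1
    return "".join(reversed(ops))


if __name__ == "__main__":
    table = edit_distance_table("vintner", "writers")
    print(table[-1][-1])  # edit distance D(n, m)
    print(one_transcript("vintner", "writers", table))
```

Filling the table takes O(nm) time and the traceback takes O(n+m) steps, since every step decreases i, j, or both.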
Running Time
• Initialization of table
  – O(n+m)
• Calculating table and pointers
  – O(nm)
• Traceback for one optimal transcript or optimal alignment
  – O(n+m)

Operation-Weight Edit Distance
• Consider S and T
• We can assign weights to the various operations
  – insertion/deletion of a character: cost d
  – substitution (replacement) of a character: cost r
  – matching: cost e
  – Previous case: d = r = 1, e = 0

Modified Recurrence Relation
• Base Case:
  – For 0 <= i <= n, D(i,0) = i · d
  – For 0 <= j <= m, D(0,j) = j · d
• Recursive Case:
  – 0 < i <= n, 0 < j <= m
  – D(i,j) = min
    • D(i-1,j) + d
    • D(i,j-1) + d
    • D(i-1,j-1) + d(i,j)
  – d(i,j) = e if S(i) = T(j) and is r otherwise

Alphabet-Weight Edit Distance
• Define weight of each possible substitution
  – r(a,b) where a is being replaced by b for all a,b in the alphabet
  – For example, with DNA, maybe r(A,T) > r(A,G)
  – Likewise, the insertion/deletion cost I(a) may vary by character
• Operation-weight edit distance is a special case of this variation
• Weighted edit distance refers to this alphabet-weight setting

Modified Recurrence Relation
• Base Case:
  – For 0 <= i <= n, D(i,0) = Σ_{1 <= k <= i} I(S(k))
  – For 0 <= j <= m, D(0,j) = Σ_{1 <= k <= j} I(T(k))
• Recursive Case:
  – 0 < i <= n, 0 < j <= m
  – D(i,j) = min
    • D(i-1,j) + I(S(i))
    • D(i,j-1) + I(T(j))
    • D(i-1,j-1) + d(i,j)
  – d(i,j) = r(S(i), T(j))

Measuring Similarity of S and T
• Definitions
  – Let Σ be the alphabet for strings S and T
  – Let Σ’ be the alphabet Σ with character - added
  – For any two characters x,y in Σ’, s(x,y) denotes the value (or score) obtained by aligning x with y
  – For a given alignment A of S and T, let S’ and T’ denote the strings after the chosen insertion of spaces and l their new length
  – The value of alignment A is Σ_{1 <= i <= l} s(S’(i),T’(i))

Example

    s    a    b    -
    a    1   -2    0
    b         2   -1
    -              0

    S’:  a  b  a  a  -  b  a  b
    T’:  a  a  a  a  a  b  -  b

• 1 - 2 + 1 + 1 + 0 + 2 + 0 + 2 = 5

String Similarity Problem
• Input
  – 2 strings S and T
  – Scoring matrix s for alphabet Σ’
• Task
  – Output optimal alignment value of S and T
    • The alignment of S and T with maximal, not minimal, value
  – Output this alignment

Modified Recurrence Relation
• Base Case:
  – For 0 <= i <= n, V(i,0) = Σ_{1 <= k <= i} s(S(k),-)
  – For 0 <= j <= m, V(0,j) = Σ_{1 <= k <= j} s(-,T(k))
• Recursive Case:
  – 0 < i <= n, 0 < j <= m
  – V(i,j) = max
    • V(i-1,j) + s(S(i),-)
    • V(i,j-1) + s(-,T(j))
    • V(i-1,j-1) + s(S(i), T(j))

Longest Common Subsequence Problem
• Given 2 strings S and T, a common subsequence is a subsequence that appears in both S and T.
• The longest common subsequence problem is to find a longest common subsequence (lcs) of S and T
  – subsequence: characters need not be contiguous
  – different from a substring
• Can you use dynamic programming to solve the longest common subsequence problem?

Computing alignments using linear space.
• Hirschberg [1977]
• Suppose we only need the maximum similarity/distance value of S and T without an alignment or transcript
• How can we conserve space?
  – Only save row i-1 when computing row i in the table

Illustration
[Figure: table with rows 0, 1, 2, ..., n and columns 0, 1, 2, …, m]

Linear space and an alignment
• Assume S has length 2n
• Sr and Tr denote the reversals of S and T
• Divide and conquer approach
  – Compute value of optimal alignment of S[1..n] with all prefixes of T
    • Store row n only at end along with pointer values of row n
  – Compute value of optimal alignment of Sr[1..n] with all prefixes of Tr
    • Store only values in row n
• Find k such that
  – V(S[1..n],T[1..k]) + V(Sr[1..n],Tr[1..m-k])
  – is maximized over 0 <= k <= m

Illustration
[Figures: for n = 6 and m = 18, each candidate split k pairs V(S[1..6], T[1..k]) with V(Sr[1..6], Tr[1..m-k]); shown for k = 0, 1, 2, 9, 18, where m-k = 18, 17, 16, 9, 0]

Recursive Step
• Let k* be the k that maximizes
  – V(S[1..n],T[1..k]) + V(Sr[1..n],Tr[1..m-k])
• Record all steps on row n including the one from n-1 and the one to n+1
• Recurse on the two subproblems
  – S[1..n-1] with T[1..j] where j <= k*
  – Sr[1..n] with Tr[1..q] where q <= m-k*

Illustration
[Figure: the table split at row n, with the two recursive subproblems in the upper-left and lower-right regions]

Time Required
• cmn time to get this answer so far
• Two subproblems have at most half the total size of this problem
  – At most the same cmn time to get the rest of the solution
    • cmn/2 + cmn/4 + cmn/8 + cmn/16 + … <= cmn, so the total is at most 2cmn
• Final result
  – Linear space with only twice as much time
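The divide-and-conquer scheme can be sketched in Python as below. This is a hedged illustration, not the slides' exact procedure: it uses the common simplified variant that splits S at its midpoint and recurses on the two halves against T[1..k*] and T[k*+1..m], rather than recording the row-n steps explicitly. The names last_row, align, alignment_value and unit_score are hypothetical, the gap character "-" is assumed not to occur in the inputs, and the scoring function passed in plays the role of s(x,y).

```python
def last_row(s, t, score):
    """Return V(len(s), j) for j = 0..len(t), keeping only two rows."""
    prev = [0] * (len(t) + 1)
    for j in range(1, len(t) + 1):
        prev[j] = prev[j - 1] + score("-", t[j - 1])
    for i in range(1, len(s) + 1):
        cur = [prev[0] + score(s[i - 1], "-")] + [0] * len(t)
        for j in range(1, len(t) + 1):
            cur[j] = max(prev[j] + score(s[i - 1], "-"),
                         cur[j - 1] + score("-", t[j - 1]),
                         prev[j - 1] + score(s[i - 1], t[j - 1]))
        prev = cur
    return prev


def align(s, t, score):
    """Return one optimal global alignment as a pair of padded strings."""
    if len(s) == 0:
        return "-" * len(t), t
    if len(t) == 0:
        return s, "-" * len(s)
    if len(s) == 1:
        # Base case: put the single character opposite a gap or some T(j).
        gaps = [score("-", c) for c in t]
        total = sum(gaps)
        best_j, best = None, score(s, "-") + total
        for j, c in enumerate(t):
            value = total - gaps[j] + score(s, c)
            if value > best:
                best_j, best = j, value
        if best_j is None:
            return s + "-" * len(t), "-" + t
        return "-" * best_j + s + "-" * (len(t) - best_j - 1), t
    mid, m = len(s) // 2, len(t)
    left = last_row(s[:mid], t, score)               # first half vs prefixes of t
    right = last_row(s[mid:][::-1], t[::-1], score)  # reversed second half
    k = max(range(m + 1), key=lambda q: left[q] + right[m - q])
    a1, b1 = align(s[:mid], t[:k], score)
    a2, b2 = align(s[mid:], t[k:], score)
    return a1 + a2, b1 + b2


def alignment_value(a, b, score):
    """Sum the column scores of an alignment."""
    return sum(score(x, y) for x, y in zip(a, b))


def unit_score(x, y):
    """Assumed example scoring: 0 for a match, -1 for anything else."""
    return 0 if x == y else -1


if __name__ == "__main__":
    a, b = align("vintner", "writers", unit_score)
    print(a)
    print(b)
    print(alignment_value(a, b, unit_score))
```

With unit_score, the optimal alignment value is the negated edit distance, which gives an easy check against the table-based method. The slicing here copies substrings for brevity; an index-based version would avoid the copies.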