Dynamic Programming

• Optimization Problems
• Dynamic Programming Paradigm
• Example: Matrix multiplication
• Principle of Optimality
• Example: String Matching Problems
Optimization Problems
• In an optimization problem, there are typically
many feasible solutions for any input instance I
• For each solution S, we have a cost or value
function f(S)
• Typically, we wish to find a feasible solution S
such that f(S) is either minimized or maximized
• Thus, when designing an algorithm to solve an
optimization problem, we must prove the
algorithm produces a best possible solution.
Example Problem
You have six hours to complete as many tasks as
possible, all of which are equally important.
Task A - 2 hours
Task B - 4 hours
Task C - 1/2 hour
Task D - 3.5 hours
Task E - 2 hours
Task F - 1 hour
How many can you get done?
• Is this a minimization or a maximization problem?
• Give one example of a feasible but not optimal
solution along with its associated value.
• Give an optimal solution and its associated value.
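One way to check an answer to the questions above is a small sketch that takes tasks in order of increasing duration while they still fit in the six hours. The function name `most_tasks` and the dictionary layout are illustrative assumptions, not part of the slides; shortest-first is used here because all tasks count equally.

```python
# Durations in hours, copied from the task list above.
tasks = {"A": 2, "B": 4, "C": 0.5, "D": 3.5, "E": 2, "F": 1}
budget = 6

def most_tasks(tasks, budget):
    """Greedy sketch: take tasks by increasing duration while they fit."""
    chosen, used = [], 0
    for name, hours in sorted(tasks.items(), key=lambda kv: kv[1]):
        if used + hours > budget:
            break  # every remaining task is at least this long
        chosen.append(name)
        used += hours
    return chosen, used

chosen, used = most_tasks(tasks, budget)
print(len(chosen), chosen, used)
```

The number of tasks chosen is the value f(S) of this solution; any other set of tasks that fits in six hours is feasible but may have a smaller value.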
Dynamic Programming
• The key idea behind dynamic programming is that it is
a divide-and-conquer technique at heart
• That is, we solve larger problems by patching
together solutions to smaller problems
• However, dynamic programming is typically
faster because we compute each subproblem solution
only once, in a bottom-up fashion, and reuse it
Fibonacci numbers
• F(n) = F(n-1) + F(n-2)
– F(0) = 0
– F(1) = 1
• Top-down recursive computation is very
inefficient
– Many F(i) values are computed multiple times
• Bottom-up computation is much more efficient
– Compute F(2), then F(3), then F(4), etc. using stored
values for smaller F(i) values to compute next value
– Each F(i) value is computed just once
Recursive Computation
F(n) = F(n-1) + F(n-2) ; F(0) = 0, F(1) = 1
Recursive Solution: F(6) = 8
[Figure: recursion tree for F(6). F(5) is computed once, F(4) twice, F(3) three times, F(2) five times, F(1) eight times, and F(0) five times.]
Bottom-up computation
We can calculate F(n) in linear time by storing the values
already computed for smaller arguments.
F[0] = 0
F[1] = 1
for i = 2 to n
    F[i] = F[i-1] + F[i-2]
return F[n]
Moral: We can sometimes trade space for time.
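The contrast between the two computations can be sketched in a few lines. The function names and the call counter are assumptions added for illustration; the bottom-up function follows the pseudocode above.

```python
def fib_recursive(n, counter):
    """Top-down recursion without storage; counter[0] tallies the calls."""
    counter[0] += 1
    if n < 2:
        return n
    return fib_recursive(n - 1, counter) + fib_recursive(n - 2, counter)

def fib_bottom_up(n):
    """Fill F[0..n] in increasing order, computing each value once."""
    F = [0, 1] + [0] * max(0, n - 1)
    for i in range(2, n + 1):
        F[i] = F[i - 1] + F[i - 2]
    return F[n]

calls = [0]
print(fib_recursive(6, calls), calls[0])  # 8 25
print(fib_bottom_up(6))                   # 8
```

The 25 calls match the node counts of the recursion tree for F(6), while the bottom-up loop performs only five additions.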
Key implementation steps
• Identify subsolutions that may be useful in
computing whole solution
– Often need to introduce parameters
• Develop a recurrence relation (recursive solution)
– Set up the table of values/costs to be computed
• The dimensionality is typically determined by the number of
parameters
• The number of values should be polynomial
• Determine the order of computation of values
• Backtrack through the table to obtain complete
solution (not just solution value)
Example: Matrix Multiplication
• Input
– List of n matrices to be multiplied together using traditional matrix
multiplication
– Only the dimensions of the matrices are needed; the entries do not affect the cost
• Task
– Compute the optimal ordering of multiplications to minimize total
number of scalar multiplications performed
• Observations:
– Multiplying an X × Y matrix by a Y × Z matrix takes X × Y × Z
scalar multiplications
– Matrix multiplication is associative but not commutative
Example Input
• Input:
– M1, M2, M3, M4
• M1: 13 x 5
• M2: 5 x 89
• M3: 89 x 3
• M4: 3 x 34
• Feasible solutions and their values
– ((M1 M2) M3) M4: 10,582 scalar multiplications
– (M1 M2) (M3 M4): 54,201 scalar multiplications
– (M1 (M2 M3)) M4: 2,856 scalar multiplications
– M1 ((M2 M3) M4): 4,055 scalar multiplications
– M1 (M2 (M3 M4)): 26,418 scalar multiplications
Key implementation steps
• Identify subsolutions that may be useful in
computing whole solution
– Often need to introduce parameters
– Define dimensions to be (d_0, d_1, …, d_n) where
matrix Mi has dimensions d_{i-1} x d_i
– Let M(i,j) be the matrix formed by multiplying
matrices Mi through Mj
– Define C(i,j) to be the minimum cost for
computing M(i,j)
Key implementation steps
• Develop a recurrence relation (recursive solution)
– Recurrence relation for C(i,j)
• C(i,j) = min over i <= k <= j-1 of ( C(i,k) + C(k+1,j) + d_{i-1} d_k d_j )
– The last multiplication is between matrices M(i,k) and M(k+1,j)
• C(i,i) = 0
– Set up the table of values/costs to be computed
• The dimensionality is typically determined by the number of parameters
• The number of values should be polynomial

C |  1  2  3  4
1 |  0
2 |     0
3 |        0
4 |           0
Key implementation steps
• Determine the order of computation of values
– Each entry below shows j - i; C(i,j) depends only on entries with smaller j - i

C |  1  2  3  4
1 |  0  1  2  3
2 |     0  1  2
3 |        0  1
4 |           0
Computing actual ordering

C |     1     2     3     4
1 |     0  5785  1530  2856
2 |           0  1335  1845
3 |                 0  9078
4 |                       0

P |  1  2  3  4
1 |  0  1  1  3
2 |     0  2  3
3 |        0  3
4 |           0

P(i,j) records the intermediate multiplication k used to compute
M(i,j). That is, P(i,j) = k if the last multiplication was M(i,k) M(k+1,j)
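Pseudocode
int MatrixOrder()
    forall i, j: C[i, j] = 0
    for j = 2 to n
        for i = j-1 downto 1
            C[i, j] = infinity
            for k = i to j-1
                cost = C[i, k] + C[k+1, j] + d[i-1] * d[k] * d[j]
                if cost < C[i, j]
                    C[i, j] = cost
                    P[i, j] = k    // best split point found so far
    return C[1, n]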
Backtracking
Procedure ShowOrder(i, j)
    if (i = j) write("M", i)
    else
        k = P[i, j]
        write("(")
        ShowOrder(i, k)
        write(" × ")
        ShowOrder(k+1, j)
        write(")")
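The table-filling and backtracking steps can be sketched as runnable code. This is a minimal version under the slides' definitions (1-based tables C and P, dimension list d_0..d_n); the function names `matrix_order` and `show_order` are illustrative.

```python
def matrix_order(d):
    """d = [d0, d1, ..., dn]; matrix Mi is d[i-1] x d[i]. Returns (C, P)."""
    n = len(d) - 1
    # 1-based tables; row and column 0 are unused, and C[i][i] = 0.
    C = [[0] * (n + 1) for _ in range(n + 1)]
    P = [[0] * (n + 1) for _ in range(n + 1)]
    for j in range(2, n + 1):
        for i in range(j - 1, 0, -1):
            C[i][j] = float("inf")
            for k in range(i, j):
                cost = C[i][k] + C[k + 1][j] + d[i - 1] * d[k] * d[j]
                if cost < C[i][j]:
                    C[i][j], P[i][j] = cost, k  # remember the best split
    return C, P

def show_order(P, i, j):
    """Rebuild the parenthesization by following the split points in P."""
    if i == j:
        return f"M{i}"
    k = P[i][j]
    return f"({show_order(P, i, k)} {show_order(P, k + 1, j)})"

C, P = matrix_order([13, 5, 89, 3, 34])
print(C[1][4], show_order(P, 1, 4))  # 2856 ((M1 (M2 M3)) M4)
```

The printed cost and ordering agree with the cheapest of the five feasible solutions listed for the example input.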
Principle of Optimality
• In the textbook, this is termed “optimal substructure”
• An optimal solution contains within it optimal
solutions to subproblems.
• More detailed explanation
– Suppose solution S is optimal for problem P.
– Suppose we decompose P into P1 through Pk and that S
can be decomposed into pieces S1 through Sk
corresponding to the subproblems.
– Then solution Si is an optimal solution for subproblem
Pi
Example 1
• Matrix Multiplication
– In our solution for computing matrix M(1,n),
we have a final step of multiplying matrices
M(1,k) and M(k+1,n).
– Our subproblems then would be to compute
M(1,k) and M(k+1,n)
– Our solution uses optimal solutions for
computing M(1,k) and M(k+1,n) as part of the
overall solution.
Example 2
• Shortest Path Problem
– Suppose a shortest path from s to t visits u
– We can decompose the path into s-u and u-t.
– The s-u path must be a shortest path from s to
u, and the u-t path must be a shortest path from
u to t
• Conclusion: dynamic programming can be
used for computing shortest paths
Example 3
• Longest Path Problem
– Suppose a longest path from s to t visits u
– We can decompose the path into s-u and u-t.
– Is it true that the s-u path must be a longest path
from s to u?
• Conclusion?
Example 4: The Traveling Salesman Problem
What recurrence relation will return the optimal solution
to the Traveling Salesman Problem?
If T(i) is the optimal tour on the first i points, will this
help us in solving larger instances of the problem?
Can we set T(i+1) to be T(i) with the additional point
inserted in the position that will result in the shortest
tour?
No!
[Figure: three tours on the same points, labeled T(4), T(5), and Shortest Tour.]
Summary of bad examples
• There almost always is a way to have the
optimal substructure if you expand your
subproblems enough
• For longest path and TSP, the number of
subproblems grows to exponential size
• This is not useful as we do not want to
compute an exponential number of solutions
When is dynamic programming effective?
• Dynamic programming works best on objects that
are linearly ordered and cannot be rearranged
– characters in a string
– files in a filing cabinet
– points around the boundary of a polygon
– the left-to-right order of leaves in a search tree
• Whenever your objects are ordered in a left-to-right way, dynamic programming must be
considered.
Efficient Top-Down Implementation
• We can implement any dynamic
programming solution top-down by storing
computed values in the table
– If all values need to be computed anyway,
bottom up is more efficient
– If some do not need to be computed, top-down
may be faster
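As a sketch of the top-down style, the matrix-ordering recurrence can be written recursively with a cache standing in for the table. Using `functools.lru_cache` as the table is a choice made for this example; the slides do not prescribe a mechanism.

```python
from functools import lru_cache

def min_cost_top_down(d):
    """Minimum scalar multiplications for matrices with dimensions d0..dn."""
    n = len(d) - 1

    @lru_cache(maxsize=None)  # each (i, j) pair is computed at most once
    def C(i, j):
        if i == j:
            return 0
        return min(C(i, k) + C(k + 1, j) + d[i - 1] * d[k] * d[j]
                   for k in range(i, j))

    return C(1, n)

print(min_cost_top_down([13, 5, 89, 3, 34]))  # 2856
```

Here every C(i,j) is needed, so the cache ends up holding the same values as the bottom-up table; the top-down form pays off only when some entries are never requested.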
Inexact Matching of Strings
• General Problem
– Input
• Strings S and T
– Questions
• How distant is S from T?
• How similar is S to T?
• Solution Technique
– Dynamic programming with
cost/similarity/scoring matrix
Measuring Distance of S and T
• Consider S and T
• We can transform S into T using the
following four operations
– insertion of a character into S
– deletion of a character from S
– substitution (replacement) of a character in S by
another character (typically in T)
– matching (no operation)
Example
• S = vintner
• T = writers
• vintner
• wintner (Replace v with w)
• wrintner (Insert r)
• writner (Delete first n)
• writer (Delete second n)
• writers (Insert s)
Example
• Edit Transcript (or just transcript):
– a string that describes the transformation of one
string into the other
• Example (R = replace, I = insert, D = delete, M = match)
– R I M D M D M M I
– v   i n t n e r
– w r i   t   e r s
Edit Distance
• Edit distance of strings S and T
– The minimum number of edit operations (insertion,
deletion, replacement) needed to transform string S into
string T
– Also called Levenshtein distance; Levenshtein appears to have been
the first to define this concept
• Optimal transcript
– An edit transcript of S and T that has the minimum
number of edit operations
– There may be several optimal transcripts; these are called cooptimal transcripts
Alignment
• A global alignment of strings S and T is obtained
– by inserting spaces (dashes) into S and T
• they should have the same number of characters (including
dashes) at the end
– then placing two strings over each other matching one
character (or dash) in S with a unique character (or
dash) in T
– Note ALL positions in both S and T are involved
Alignments and Edit transcripts
• Example Alignment
– v-intner-
– wri-t-ers
• Alignments and edit transcripts are interrelated
– edit transcript: emphasizes process
• the specific mutational events
– alignment: emphasizes product
• the relationship between the two strings
– Alignments are often easier to work with and visualize
• also generalize better to more than 2 strings
Edit Distance Problem
• Input
– 2 strings S and T
• Task
– Output edit distance of S and T
– Output optimal edit transcript
– Output optimal alignment
• Solution method
– Dynamic Programming
Identifying Subproblems
• Let D(i,j) be the edit distance of S[1..i] and
T[1..j]
– The edit distance of the first i characters of S
with the first j characters of T
– Let |S| = n, |T| = m
• D(n,m) = edit distance of S and T
• We will compute D(i,j) for all i and j such
that 0 <= i <= n, 0 <= j <= m
Recurrence Relation
• Base Case:
– For 0 <= i <= n, D(i,0) = i
– For 0 <= j <= m, D(0,j) = j
• Recursive Case:
– 0 < i <= n, 0 < j <= m
– D(i,j) = min
• D(i-1,j) + 1
(what does this mean?)
• D(i,j-1) + 1
(what does this mean?)
• D(i-1,j-1) + d(i,j)
(what does this mean?)
– d(i,j) = 0 if S(i) = T(j) and is 1 otherwise
What the various cases mean
• D(i,j) = min
– D(i-1,j) + 1:
• Align S[1..i-1] with T[1..j] optimally
• Match S(i) with a dash in T
– D(i,j-1) + 1
• Align S[1..i] with T[1..j-1] optimally
• Match a dash in S with T(j)
– D(i-1,j-1) + d(i,j)
• Align S[1..i-1] with T[1..j-1] optimally
• Match S(i) with T(j)
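The base cases and the three-way minimum translate directly into a table-filling routine. This is a minimal sketch with unit costs; the function name `edit_distance_table` is illustrative, and Python's 0-based strings mean S(i) is written `S[i - 1]`.

```python
def edit_distance_table(S, T):
    """Return the full table D, where D[i][j] = edit distance of S[1..i], T[1..j]."""
    n, m = len(S), len(T)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i                      # delete all i characters of S
    for j in range(m + 1):
        D[0][j] = j                      # insert all j characters of T
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = 0 if S[i - 1] == T[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,      # S(i) against a dash
                          D[i][j - 1] + 1,      # dash against T(j)
                          D[i - 1][j - 1] + d)  # S(i) against T(j)
    return D

D = edit_distance_table("vintner", "writers")
print(D[7][7])  # 5
```

The result agrees with the five edit operations used in the vintner/writers transformation shown earlier.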
Computing D(i,j) values
Rows: characters of S = vintner (index i). Columns: characters of T = writers (index j).

D(i,j)       w  r  i  t  e  r  s
          0  1  2  3  4  5  6  7
     0
v    1
i    2
n    3
t    4
n    5
e    6
r    7
Initialization: Base Case

D(i,j)       w  r  i  t  e  r  s
          0  1  2  3  4  5  6  7
     0    0  1  2  3  4  5  6  7
v    1    1
i    2    2
n    3    3
t    4    4
n    5    5
e    6    6
r    7    7
Row i=1

D(i,j)       w  r  i  t  e  r  s
          0  1  2  3  4  5  6  7
     0    0  1  2  3  4  5  6  7
v    1    1  1  2  3  4  5  6  7
i    2    2
n    3    3
t    4    4
n    5    5
e    6    6
r    7    7
Entry i=2, j=2

D(i,j)       w  r  i  t  e  r  s
          0  1  2  3  4  5  6  7
     0    0  1  2  3  4  5  6  7
v    1    1  1  2  3  4  5  6  7
i    2    2  2  ?
n    3    3
t    4    4
n    5    5
e    6    6
r    7    7
Entry i=2, j=3

D(i,j)       w  r  i  t  e  r  s
          0  1  2  3  4  5  6  7
     0    0  1  2  3  4  5  6  7
v    1    1  1  2  3  4  5  6  7
i    2    2  2  2  ?
n    3    3
t    4    4
n    5    5
e    6    6
r    7    7
Calculation methodologies
• Location of edit distance
– D(n,m)
• Example was to calculate row by row
• Can also calculate column by column
• Can also use antidiagonals
• Key is to build from upper left corner
Traceback
• Using table to construct optimal transcript
• Pointers in cell D(i,j)
– Set a pointer from cell (i,j) to
• cell (i, j-1) if D(i,j) = D(i, j-1) + 1
• cell (i-1,j) if D(i,j) = D(i-1,j) + 1
• cell (i-1,j-1) if D(i,j) = D(i-1,j-1) + d(i,j)
– Follow path of pointers from (n,m) back to (0,0)
What the pointers mean
• horizontal pointer: cell (i,j) to cell (i, j-1)
– Align T(j) with a space in S
– Insert T(j) into S
• vertical pointer: cell (i,j) to cell (i-1, j)
– Align S(i) with a space in T
– Delete S(i) from S
• diagonal pointer: cell (i,j) to cell (i-1, j-1)
– Align S(i) with T(j)
– Replace S(i) with T(j), or match if S(i) = T(j)
Table and transcripts
• The pointers represent all optimal transcripts
• Theorem:
– Any path from (n,m) to (0,0) following the
pointers specifies an optimal transcript.
– Conversely, any optimal transcript is specified
by such a path.
– The correspondence between paths and
transcripts is one to one.
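A traceback can be sketched without storing explicit pointers, by re-testing at each cell which of the three conditions holds. The function name `optimal_transcript` and the tie-breaking order (diagonal, then vertical, then horizontal) are assumptions of this example; a different order may return a different cooptimal transcript.

```python
def optimal_transcript(S, T):
    """Return one optimal edit transcript over the letters M, R, I, D."""
    n, m = len(S), len(T)
    # Base cases D(i,0) = i and D(0,j) = j; other cells are filled below.
    D = [[i + j if i == 0 or j == 0 else 0 for j in range(m + 1)]
         for i in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = 0 if S[i - 1] == T[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1, D[i][j - 1] + 1, D[i - 1][j - 1] + d)
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        d = 0 if i > 0 and j > 0 and S[i - 1] == T[j - 1] else 1
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + d:
            ops.append("M" if d == 0 else "R")   # diagonal pointer
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            ops.append("D")                      # vertical pointer
            i -= 1
        else:
            ops.append("I")                      # horizontal pointer
            j -= 1
    return "".join(reversed(ops))

print(optimal_transcript("vintner", "writers"))
```

The transcript printed for vintner/writers need not equal RIMDMDMMI, but it contains the same number of non-match operations, since both are optimal.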
Running Time
• Initialization of table
– O(n+m)
• Calculating table and pointers
– O(nm)
• Traceback for one optimal transcript or
optimal alignment
– O(n+m)
Operation-Weight Edit Distance
• Consider S and T
• We can assign weights to the various
operations
– insertion/deletion of a character: cost d
– substitution (replacement) of a character: cost r
– matching: cost e
– Previous case: d = r = 1, e = 0
Modified Recurrence Relation
• Base Case:
– For 0 <= i <= n, D(i,0) = i d
– For 0 <= j <= m, D(0,j) = j d
• Recursive Case:
– 0 < i <= n, 0 < j <= m
– D(i,j) = min
• D(i-1,j) + d
• D(i,j-1) + d
• D(i-1,j-1) + d(i,j)
– d(i,j) = e if S(i) = T(j) and is r otherwise
Alphabet-Weight Edit Distance
• Define weight of each possible substitution
– r(a,b) where a is being replaced by b for all a,b in the
alphabet
– For example, with DNA, maybe r(A,T) > r(A,G)
– Likewise, the insertion/deletion cost I(a) may vary by character
• Operation-weight edit distance is a special case of
this variation
• Weighted edit distance refers to this alphabet-weight setting
Modified Recurrence Relation
• Base Case:
– For 0 <= i <= n, D(i,0) = Σ_{1 <= k <= i} I(S(k))
– For 0 <= j <= m, D(0,j) = Σ_{1 <= k <= j} I(T(k))
• Recursive Case:
– 0 < i <= n, 0 < j <= m
– D(i,j) = min
• D(i-1,j) + I(S(i))
• D(i,j-1) + I(T(j))
• D(i-1,j-1) + d(i,j)
– d(i,j) = r(S(i), T(j))
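The alphabet-weight recurrence can be sketched by passing the cost functions in as parameters. The function name and the way `I` and `r` are supplied as Python callables are assumptions of this example; `r(a, a)` is expected to return the matching cost.

```python
def weighted_edit_distance(S, T, I, r):
    """I(a): insertion/deletion cost of a; r(a, b): cost of replacing a by b."""
    n, m = len(S), len(T)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + I(S[i - 1])   # running sum of deletion costs
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + I(T[j - 1])   # running sum of insertion costs
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j] + I(S[i - 1]),
                          D[i][j - 1] + I(T[j - 1]),
                          D[i - 1][j - 1] + r(S[i - 1], T[j - 1]))
    return D[n][m]

# Operation-weight special case d = r = 1, e = 0 gives plain edit distance.
unit = weighted_edit_distance("vintner", "writers",
                              lambda a: 1,
                              lambda a, b: 0 if a == b else 1)
print(unit)  # 5
```

Supplying constant functions recovers the operation-weight setting, which illustrates why that setting is a special case of this one.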
Measuring Similarity of S and T
• Definitions
– Let Σ be the alphabet for strings S and T
– Let Σ’ be the alphabet Σ with character - added
– For any two characters x,y in Σ’, s(x,y) denotes
the value (or score) obtained by aligning x with y
– For a given alignment A of S and T, let S’ and T’
denote the strings after the chosen insertion of
spaces and let l denote their common length
– The value of alignment A is Σ_{1 <= i <= l} s(S’(i),T’(i))
Example

s |  a   b   -
a |  1  -2   0
b |      2  -1
- |          0
(only the upper triangle is shown; s is symmetric)

• a b a a - b a b
• a a a a a b - b
• 1-2+1+1+0+2+0+2=5
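The value of this example alignment can be recomputed with a short sketch. The dictionary encoding of the scoring matrix and the helper names are assumptions for illustration; the scores themselves are the ones in the example, treated as symmetric.

```python
# Scoring matrix from the example; each unordered pair is stored once.
scores = {("a", "a"): 1, ("a", "b"): -2, ("a", "-"): 0,
          ("b", "b"): 2, ("b", "-"): -1, ("-", "-"): 0}

def s(x, y):
    """Look up s(x, y), using symmetry when only (y, x) is stored."""
    return scores[(x, y)] if (x, y) in scores else scores[(y, x)]

def alignment_value(S_aligned, T_aligned):
    """Sum the column scores of two equal-length strings with dashes."""
    assert len(S_aligned) == len(T_aligned)
    return sum(s(x, y) for x, y in zip(S_aligned, T_aligned))

print(alignment_value("abaa-bab", "aaaaab-b"))  # 5
```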
String Similarity Problem
• Input
– 2 strings S and T
– Scoring matrix s for alphabet Σ’
• Task
– Output optimal alignment value of S and T
• The alignment of S and T with maximal, not
minimal, value
– Output this alignment
Modified Recurrence Relation
• Base Case:
– For 0 <= i <= n, V(i,0) = Σ_{1 <= k <= i} s(S(k),-)
– For 0 <= j <= m, V(0,j) = Σ_{1 <= k <= j} s(-,T(k))
• Recursive Case:
– 0 < i <= n, 0 < j <= m
– V(i,j) = max
• V(i-1,j) + s(S(i),-)
• V(i,j-1) + s(-,T(j))
• V(i-1,j-1) + s(S(i), T(j))
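The similarity recurrence differs from the distance recurrence only in taking a maximum and in reading all costs from the scoring matrix. A minimal sketch follows; the function name `similarity` and the string-keyed encoding of the example scoring matrix are assumptions of this example.

```python
def s(x, y):
    """Example scoring matrix from the slides, treated as symmetric."""
    table = {"aa": 1, "ab": -2, "a-": 0, "bb": 2, "b-": -1, "--": 0}
    return table[x + y] if x + y in table else table[y + x]

def similarity(S, T, s):
    """Return V(n, m), the maximum alignment value of S and T under s."""
    n, m = len(S), len(T)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        V[i][0] = V[i - 1][0] + s(S[i - 1], "-")
    for j in range(1, m + 1):
        V[0][j] = V[0][j - 1] + s("-", T[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(V[i - 1][j] + s(S[i - 1], "-"),
                          V[i][j - 1] + s("-", T[j - 1]),
                          V[i - 1][j - 1] + s(S[i - 1], T[j - 1]))
    return V[n][m]

# The strings of the earlier example alignment, with dashes removed.
print(similarity("abaabab", "aaaaabb", s))
```

The printed value is at least 5, because the example alignment of these two strings is one feasible alignment and has value 5.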
Longest Common Subsequence Problem
• Given 2 strings S and T, a common subsequence is a
subsequence that appears in both S and T.
• The longest common subsequence problem is to find
a longest common subsequence (lcs) of S and T
– subsequence: characters need not be contiguous
– different than substring
• Can you use dynamic programming to solve the
longest common subsequence problem?
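One possible answer, offered as a sketch rather than the intended solution: let L(i,j) be the length of a longest common subsequence of S[1..i] and T[1..j], extend by one on a character match, and otherwise take the better of dropping S(i) or T(j). The table name `L` and the function name `lcs` are assumptions of this example.

```python
def lcs(S, T):
    """Return one longest common subsequence of S and T."""
    n, m = len(S), len(T)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if S[i - 1] == T[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # Trace back from (n, m) to recover the characters.
    out, i, j = [], n, m
    while i > 0 and j > 0:
        if S[i - 1] == T[j - 1]:
            out.append(S[i - 1])
            i, j = i - 1, j - 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

print(lcs("vintner", "writers"))  # iter
```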
Computing alignments using linear space
• Hirschberg [1977]
• Suppose we only need the maximum
similarity/distance value of S and T without
an alignment or transcript
• How can we conserve space?
– Only save row i-1 when computing row i in the
table
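The row-saving idea can be sketched for unit-cost edit distance by keeping only two lists. The function name is illustrative; the same pattern applies to the similarity recurrence.

```python
def edit_distance_linear_space(S, T):
    """Edit distance of S and T, keeping only row i-1 while computing row i."""
    prev = list(range(len(T) + 1))        # row 0: D(0, j) = j
    for i in range(1, len(S) + 1):
        curr = [i] + [0] * len(T)         # D(i, 0) = i
        for j in range(1, len(T) + 1):
            d = 0 if S[i - 1] == T[j - 1] else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + d)
        prev = curr                       # row i becomes the saved row
    return prev[len(T)]

print(edit_distance_linear_space("vintner", "writers"))  # 5
```

Only the value survives; the pointers needed for a traceback are discarded, which is the gap the divide-and-conquer approach addresses.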
Illustration
[Figure: the dynamic programming table, with rows labeled 0 through n and columns labeled 0 through m.]
Linear space and an alignment
• Assume S has length 2n
• Let S^r and T^r denote the reverses of S and T
• Divide and conquer approach
– Compute value of optimal alignment of S[1..n] with all
prefixes of T
• Store row n only at end along with pointer values of row n
– Compute value of optimal alignment of S^r[1..n] with all
prefixes of T^r
• Store only values in row n
• Find k, 0 <= k <= m, that maximizes
– V(S[1..n],T[1..k]) + V(S^r[1..n],T^r[1..m-k])
Illustration
[Figure sequence: S has length 12 (n = 6) and T has length 18 (m = 18). The upper table, rows 0 to 6 over columns 0 to 18, holds V(S[1..6], T[1..k]). The lower table, rows 6 down to 0 over columns 18 down to 0, is computed on the reversed strings and holds V(S^r[1..6], T^r[1..m-k]). Successive frames show the split at k = 0 (m-k = 18), k = 1 (m-k = 17), k = 2 (m-k = 16), k = 9 (m-k = 9), and k = 18 (m-k = 0), followed by a frame without a marked split.]
Recursive Step
• Let k* be the k that maximizes
– V(S[1..n],T[1..k]) + V(S^r[1..n],T^r[1..m-k])
• Record all steps on row n including the one
from n-1 and the one to n+1
• Recurse on the two subproblems
– S[1..n-1] with T[1..j] where j <= k*
– S^r[1..n] with T^r[1..q] where q <= m-k*
Illustration
[Figure: the two-part table from the previous illustrations, with rows 0 to 6 over columns 0 to 18 and the reversed rows 6 to 0 beneath.]
Time Required
• cmn time to get this answer so far
• Two subproblems have at most half the total
size of this problem
– At most the same cmn time to get the rest of the
solution
• cmn/2 + cmn/4 + cmn/8 + cmn/16 + … <= cmn
• Final result
– Linear space with only twice as much time
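The whole scheme can be sketched as a short recursive routine. This is a simplified variant, not the slides' exact bookkeeping: it splits S in half, combines the last rows of the forward and reversed computations to find k*, and recurses on both halves instead of recording pointer values on row n. The scoring function `s` (+1 match, -1 otherwise) and all function names are hypothetical choices for this example.

```python
def s(x, y):
    # Hypothetical scoring: +1 for a match, -1 for a mismatch or a dash.
    return 1 if x == y and x != "-" else -1

def last_row(S, T, s):
    """Values V(|S|, j) for all j, keeping only two rows at a time."""
    prev = [0] * (len(T) + 1)
    for j in range(1, len(T) + 1):
        prev[j] = prev[j - 1] + s("-", T[j - 1])
    for x in S:
        curr = [prev[0] + s(x, "-")]
        for j in range(1, len(T) + 1):
            curr.append(max(prev[j] + s(x, "-"),
                            curr[j - 1] + s("-", T[j - 1]),
                            prev[j - 1] + s(x, T[j - 1])))
        prev = curr
    return prev

def hirschberg(S, T, s):
    """Return one optimal global alignment (S', T') of S and T."""
    if len(S) == 0:
        return "-" * len(T), T
    if len(T) == 0:
        return S, "-" * len(S)
    if len(S) == 1:
        # Pair the single character with its best partner in T, or with a dash.
        gain = lambda j: s(S, T[j]) - s("-", T[j])
        j = max(range(len(T)), key=gain)
        if gain(j) >= s(S, "-"):
            return "-" * j + S + "-" * (len(T) - j - 1), T
        return S + "-" * len(T), "-" + T
    mid, m = len(S) // 2, len(T)
    top = last_row(S[:mid], T, s)                  # first half vs prefixes of T
    bottom = last_row(S[mid:][::-1], T[::-1], s)   # reversed second half vs prefixes of T^r
    k = max(range(m + 1), key=lambda k: top[k] + bottom[m - k])
    left = hirschberg(S[:mid], T[:k], s)
    right = hirschberg(S[mid:], T[k:], s)
    return left[0] + right[0], left[1] + right[1]

a, b = hirschberg("vintner", "writers", s)
print(a)
print(b)
```

The value of the returned alignment can be compared with the last entry of `last_row(S, T, s)`, which is the optimal value computed without any alignment.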