Chapter 6
Dynamic Programming
Algorithmic Paradigms
Greedy. Build up a solution incrementally, optimizing some local
criterion.
Divide-and-conquer. Break up a problem into sub-problems, solve each
sub-problem independently, and combine solutions to sub-problems to
form a solution to the original problem.
Dynamic programming. Break up a problem into a series of overlapping
sub-problems, and build up solutions to larger and larger sub-problems.
Dynamic Programming Applications
Areas.
Bioinformatics.
Control theory.
Information theory.
Operations research.
Computer science: theory, graphics, AI, compilers, systems, ….





Some famous dynamic programming algorithms.
Linux diff for comparing two files.
Smith-Waterman for genetic sequence alignment.
Bellman-Ford for shortest path routing in networks.
Cocke-Kasami-Younger for parsing context free grammars.




Knapsack Problem
Knapsack problem.
Given n objects and a "knapsack."
Item i weighs wi > 0 kilograms and has value vi > 0.
Knapsack has capacity of W kilograms.
Goal: fill knapsack so as to maximize total value.




Ex: { 3, 4 } has value 40.
W = 11

#    value    weight
1    1        1
2    6        2
3    18       5
4    22       6
5    28       7
Greedy: repeatedly add item with maximum ratio vi / wi.
Ex: { 5, 2, 1 } achieves only value = 35 ⇒ greedy not optimal.
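The gap between greedy and optimal on this instance can be checked with a short Python sketch. The names `ITEMS`, `greedy_by_ratio`, and `brute_force` are illustrative assumptions, and the sketch assumes greedy skips an item that no longer fits and moves on to the next best ratio.

```python
from itertools import combinations

# Items from the example: item number -> (value, weight). Capacity W = 11.
ITEMS = {1: (1, 1), 2: (6, 2), 3: (18, 5), 4: (22, 6), 5: (28, 7)}
W = 11


def greedy_by_ratio(items, capacity):
    """Add items in decreasing order of value/weight, skipping any that no longer fit."""
    chosen, total_value, remaining = [], 0, capacity
    by_ratio = sorted(items, key=lambda k: items[k][0] / items[k][1], reverse=True)
    for i in by_ratio:
        value, weight = items[i]
        if weight <= remaining:
            chosen.append(i)
            total_value += value
            remaining -= weight
    return chosen, total_value


def brute_force(items, capacity):
    """Try every subset of items; only practical for tiny inputs."""
    best_subset, best_value = [], 0
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            weight = sum(items[i][1] for i in subset)
            value = sum(items[i][0] for i in subset)
            if weight <= capacity and value > best_value:
                best_subset, best_value = list(subset), value
    return best_subset, best_value


print(greedy_by_ratio(ITEMS, W))  # ([5, 2, 1], 35)
print(brute_force(ITEMS, W))      # ([3, 4], 40)
```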
Dynamic Programming: False Start
Def. OPT(i) = max profit subset of items 1, …, i.


Case 1: OPT does not select item i.
– OPT selects best of { 1, 2, …, i-1 }
Case 2: OPT selects item i.
– accepting item i does not immediately imply that we will have to
reject other items
– without knowing what other items were selected before i,
we don't even know if we have enough room for i
Conclusion. Need more sub-problems!
Dynamic Programming: Adding a New Variable
Def. OPT(i, w) = max profit subset of items 1, …, i with weight limit w.


Case 1: OPT does not select item i.
– OPT selects best of { 1, 2, …, i-1 } using weight limit w
Case 2: OPT selects item i.
– new weight limit = w – wi
– OPT selects best of { 1, 2, …, i–1 } using this new weight limit

\[
\mathrm{OPT}(i, w) =
\begin{cases}
0 & \text{if } i = 0 \\
\mathrm{OPT}(i-1, w) & \text{if } w_i > w \\
\max\{\, \mathrm{OPT}(i-1, w),\; v_i + \mathrm{OPT}(i-1, w - w_i) \,\} & \text{otherwise}
\end{cases}
\]
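The recurrence for OPT(i, w) can be transcribed almost literally as a memoized recursion. This is a minimal Python sketch, not the bottom-up method of the slides; `knapsack_top_down` is a hypothetical name, and items are stored 0-based, so item i of the slides is `values[i - 1]`, `weights[i - 1]`.

```python
from functools import lru_cache


def knapsack_top_down(values, weights, capacity):
    """Evaluate OPT(n, W) by memoized recursion on the two-variable recurrence."""

    @lru_cache(maxsize=None)
    def opt(i, w):
        if i == 0:                       # no items available
            return 0
        v_i, w_i = values[i - 1], weights[i - 1]
        if w_i > w:                      # item i does not fit
            return opt(i - 1, w)
        # Case 1: skip item i. Case 2: take item i and shrink the weight limit.
        return max(opt(i - 1, w), v_i + opt(i - 1, w - w_i))

    return opt(len(values), capacity)


print(knapsack_top_down([1, 6, 18, 22, 28], [1, 2, 5, 6, 7], 11))  # 40
```

Memoization ensures each pair (i, w) is evaluated at most once, so the work is bounded by the number of distinct sub-problems.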

Knapsack Problem: Bottom-Up
Knapsack. Fill up an (n+1)-by-(W+1) array.
Input: n, W, w1,…,wn, v1,…,vn

for w = 0 to W
   M[0, w] = 0

for i = 1 to n
   for w = 0 to W
      if (wi > w)
         M[i, w] = M[i-1, w]
      else
         M[i, w] = max {M[i-1, w], vi + M[i-1, w-wi]}

return M[n, W]
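The bottom-up pseudocode translates directly to Python. The sketch below is one possible rendering: `knapsack_table` is a hypothetical name, it returns the whole table rather than only M[n, W], and it stores items 0-based.

```python
def knapsack_table(values, weights, capacity):
    """Return the table M with M[i][w] = OPT(i, w) for 0 <= i <= n, 0 <= w <= capacity."""
    n = len(values)
    # Row 0 (no items) is all zeros.
    M = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v_i, w_i = values[i - 1], weights[i - 1]
        for w in range(capacity + 1):
            if w_i > w:
                M[i][w] = M[i - 1][w]
            else:
                M[i][w] = max(M[i - 1][w], v_i + M[i - 1][w - w_i])
    return M


M = knapsack_table([1, 6, 18, 22, 28], [1, 2, 5, 6, 7], 11)
print(M[5][11])  # 40
```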
Knapsack Algorithm
Table M has n + 1 rows and W + 1 columns. Rows are item sets, columns are weight limits w.

                     0   1   2   3   4   5   6   7   8   9  10  11
∅                    0   0   0   0   0   0   0   0   0   0   0   0
{ 1 }                0   1   1   1   1   1   1   1   1   1   1   1
{ 1, 2 }             0   1   6   7   7   7   7   7   7   7   7   7
{ 1, 2, 3 }          0   1   6   7   7  18  19  24  25  25  25  25
{ 1, 2, 3, 4 }       0   1   6   7   7  18  22  24  28  29  29  40
{ 1, 2, 3, 4, 5 }    0   1   6   7   7  18  22  28  29  34  35  40

OPT: { 4, 3 }
value = 22 + 18 = 40

W = 11

Item    Value    Weight
1       1        1
2       6        2
3       18       5
4       22       6
5       28       7
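The optimal set itself can be read back from the filled table by walking from M[n, W] toward row 0: item i is in the solution exactly when M[i, w] differs from M[i-1, w]. The Python sketch below illustrates this traceback; `knapsack_items` is a hypothetical name and the traceback step is an addition, not part of the pseudocode above.

```python
def knapsack_items(values, weights, capacity):
    """Return (optimal value, list of chosen 1-based item numbers)."""
    n = len(values)
    M = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(capacity + 1):
            M[i][w] = M[i - 1][w]
            if weights[i - 1] <= w:
                M[i][w] = max(M[i][w], values[i - 1] + M[i - 1][w - weights[i - 1]])

    # Traceback: a change between row i-1 and row i means item i was taken.
    chosen, w = [], capacity
    for i in range(n, 0, -1):
        if M[i][w] != M[i - 1][w]:
            chosen.append(i)
            w -= weights[i - 1]
    return M[n][capacity], chosen


print(knapsack_items([1, 6, 18, 22, 28], [1, 2, 5, 6, 7], 11))  # (40, [4, 3])
```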
Knapsack Problem: Running Time
Running time. Θ(n W).
Not polynomial in input size!
"Pseudo-polynomial."
Decision version of Knapsack is NP-complete. [Chapter 8]



Knapsack approximation algorithm. There exists a poly-time algorithm
that produces a feasible solution that has value within 0.01% of
optimum. [Section 11.8]
String Similarity
How similar are two strings?


ocurrance
occurrence

o c u r r a n c e -
o c c u r r e n c e
6 mismatches, 1 gap

o c - u r r a n c e
o c c u r r e n c e
1 mismatch, 1 gap

o c - u r r - a n c e
o c c u r r e - n c e
0 mismatches, 3 gaps
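The mismatch and gap counts for the three alignments above can be verified mechanically. This is a small Python sketch; `count_mismatches_and_gaps` is a hypothetical helper, and it assumes the two rows have equal length with "-" marking a gap.

```python
def count_mismatches_and_gaps(top, bottom, gap="-"):
    """Count (mismatches, gaps) column by column in an alignment written as two rows."""
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    mismatches = gaps = 0
    for a, b in zip(top, bottom):
        if a == gap or b == gap:
            gaps += 1
        elif a != b:
            mismatches += 1
    return mismatches, gaps


print(count_mismatches_and_gaps("ocurrance-", "occurrence"))    # (6, 1)
print(count_mismatches_and_gaps("oc-urrance", "occurrence"))    # (1, 1)
print(count_mismatches_and_gaps("oc-urr-ance", "occurre-nce"))  # (0, 3)
```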
Edit Distance
Applications.
Basis for Linux diff.
Speech recognition.
Computational biology.



Edit distance. [Levenshtein 1966, Needleman-Wunsch 1970]
Gap penalty δ; mismatch penalty α_pq.
In general, 2δ ≥ α_pq.
Cost = sum of gap and mismatch penalties.



C T G A C C T A C C T
C C T G A C T A C A T
α_TC + α_GT + α_AG + 2α_CA

- C T G A C C T A C C T
C C T G A C - T A C A T
2δ + α_CA
Sequence Alignment
Goal: Given two strings X = x1 x2 . . . xm and Y = y1 y2 . . . yn of symbols,
find alignment of minimum cost.
Def. An alignment M is a set of ordered pairs xi-yj such that each
symbol occurs in at most one pair and there are no crossings. The number
of xi and yj that do not appear in M is the number of gaps.
Def. The pair xi-yj and xi'-yj' cross if i < i', but j > j'.
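\[
\mathrm{cost}(M) =
\underbrace{\sum_{(x_i, y_j) \in M} \alpha_{x_i y_j}}_{\text{mismatch}}
+ \underbrace{\sum_{i \,:\, x_i \text{ unmatched}} \delta + \sum_{j \,:\, y_j \text{ unmatched}} \delta}_{\text{gap}}
\]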
Ex: CTACCG vs. TACATG.
Sol: M = x2-y1, x3-y2, x4-y3, x5-y4, x6-y6.

x1  x2  x3  x4  x5      x6
C   T   A   C   C   -   G
-   T   A   C   A   T   G
    y1  y2  y3  y4  y5  y6
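The cost of a given alignment M follows directly from the definition: sum the mismatch penalties over the pairs and add δ for every unmatched symbol. The Python sketch below also rejects crossings. `alignment_cost` and `unit_alpha` are hypothetical names, and the unit penalties (δ = 1, α = 0 or 1) are an assumption made only for the example.

```python
def unit_alpha(p, q):
    """Assumed mismatch penalty for the example: 0 if the symbols agree, else 1."""
    return 0 if p == q else 1


def alignment_cost(x, y, pairs, delta, alpha):
    """Cost of alignment M given as 1-based index pairs (i, j) meaning xi-yj."""
    pairs = sorted(pairs)
    # After sorting by i, a valid alignment is strictly increasing in both i and j.
    for (i1, j1), (i2, j2) in zip(pairs, pairs[1:]):
        if i1 == i2 or j2 <= j1:
            raise ValueError("symbol used twice or pairs cross")
    mismatch = sum(alpha(x[i - 1], y[j - 1]) for i, j in pairs)
    unmatched = (len(x) - len(pairs)) + (len(y) - len(pairs))
    return mismatch + delta * unmatched


M = [(2, 1), (3, 2), (4, 3), (5, 4), (6, 6)]
print(alignment_cost("CTACCG", "TACATG", M, 1, unit_alpha))  # 3
```

With these unit penalties the example costs 3: one mismatch (x5-y4) plus two gaps (x1 and y5).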
Sequence Alignment: Problem Structure
Def. OPT(i, j) = min cost of aligning strings x1 x2 . . . xi and y1 y2 . . . yj.
Case 1: OPT matches xi-yj.
– pay mismatch for xi-yj + min cost of aligning two strings
x1 x2 . . . xi-1 and y1 y2 . . . yj-1
Case 2a: OPT leaves xi unmatched.
– pay gap for xi and min cost of aligning x1 x2 . . . xi-1 and y1 y2 . . . yj
Case 2b: OPT leaves yj unmatched.
– pay gap for yj and min cost of aligning x1 x2 . . . xi and y1 y2 . . . yj-1

\[
\mathrm{OPT}(i, j) =
\begin{cases}
j\delta & \text{if } i = 0 \\
\min
\begin{cases}
\alpha_{x_i y_j} + \mathrm{OPT}(i-1, j-1) \\
\delta + \mathrm{OPT}(i-1, j) \\
\delta + \mathrm{OPT}(i, j-1)
\end{cases}
& \text{otherwise} \\
i\delta & \text{if } j = 0
\end{cases}
\]
Sequence Alignment: Algorithm
Alignment(m, n, x1x2...xm, y1y2...yn, δ, α) {
   // A[0..m, 0..n]: int array
   for i = 0 to m
      A[i, 0] = iδ
   for j = 1 to n {
      A[0, j] = jδ
      for i = 1 to m
         A[i, j] = min(α[xi, yj] + A[i-1, j-1],
                       δ + A[i-1, j],
                       δ + A[i, j-1])
   }
   return A[m, n]
}
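A Python rendering of the Alignment pseudocode might look as follows. It is a sketch under assumptions: `alignment` and `unit_alpha` are hypothetical names, the mismatch penalty α is passed as a function rather than a table, and the example uses unit penalties.

```python
def unit_alpha(p, q):
    """Assumed mismatch penalty for the example: 0 if the symbols agree, else 1."""
    return 0 if p == q else 1


def alignment(x, y, delta, alpha):
    """Return the minimum alignment cost of strings x and y."""
    m, n = len(x), len(y)
    A = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        A[i][0] = i * delta              # x1..xi against the empty string
    for j in range(1, n + 1):
        A[0][j] = j * delta              # empty string against y1..yj
        for i in range(1, m + 1):
            A[i][j] = min(
                alpha(x[i - 1], y[j - 1]) + A[i - 1][j - 1],  # match xi-yj
                delta + A[i - 1][j],                          # xi unmatched
                delta + A[i][j - 1],                          # yj unmatched
            )
    return A[m][n]


print(alignment("ocurrance", "occurrence", 1, unit_alpha))  # 2
print(alignment("CTACCG", "TACATG", 1, unit_alpha))         # 3
```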
Analysis. Θ(mn) time and space.
English words or sentences: m, n ≤ 10.
Computational biology: m = n = 100,000.
10 billion ops OK, but a 10 GB array?
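When only the optimal cost is needed, the full array can be avoided because each row of A depends only on the row before it. The Python sketch below keeps two rows, so it uses O(n) space. `alignment_cost_two_rows` and `unit_alpha` are hypothetical names. Recovering the alignment itself in linear space takes more work (for example, a divide-and-conquer scheme such as Hirschberg's), which this sketch does not attempt.

```python
def unit_alpha(p, q):
    """Assumed mismatch penalty for the example: 0 if the symbols agree, else 1."""
    return 0 if p == q else 1


def alignment_cost_two_rows(x, y, delta, alpha):
    """Minimum alignment cost of x and y using two rows of the table instead of all m + 1."""
    m, n = len(x), len(y)
    prev = [j * delta for j in range(n + 1)]      # row 0 of the table
    for i in range(1, m + 1):
        curr = [i * delta] + [0] * n              # column 0 of row i
        for j in range(1, n + 1):
            curr[j] = min(
                alpha(x[i - 1], y[j - 1]) + prev[j - 1],
                delta + prev[j],
                delta + curr[j - 1],
            )
        prev = curr
    return prev[n]


print(alignment_cost_two_rows("ocurrance", "occurrence", 1, unit_alpha))  # 2
```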
Unit costs. In the Alignment algorithm above, assuming
δ = 1
α[xi, yj] = 0 if xi = yj
α[xi, yj] = 1 otherwise
the value A[m, n] is the minimum number of insertions, deletions, and
substitutions needed to turn X into Y.
Subsequence Alignment
Goal: Given two strings X = x1 x2 . . . xm and Y = y1 y2 . . . yn of symbols,
find alignment of X and a substring of Y with minimum cost.
Ex: CTACCG vs. TXYTACATGAH.
Sol: Substring is TACATG and M = x2-y4, x3-y5, x4-y6, x5-y7, x6-y9.
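\[
\mathrm{cost}(M) =
\underbrace{\sum_{(x_i, y_j) \in M} \alpha_{x_i y_j}}_{\text{mismatch}}
+ \underbrace{\sum_{i \,:\, x_i \text{ unmatched}} \delta + \sum_{j \,:\, y_j \text{ unmatched}} \delta}_{\text{gap}}
\]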
Sequence Alignment: Problem Structure
Def. OPT(i, j) = min cost of aligning strings x1 x2 . . . xi and y1 y2 . . . yj.
Case 1: OPT matches xi-yj.
– pay mismatch for xi-yj + min cost of aligning two strings
x1 x2 . . . xi-1 and y1 y2 . . . yj-1
Case 2a: OPT leaves xi unmatched.
– pay gap for xi and min cost of aligning x1 x2 . . . xi-1 and y1 y2 . . . yj
Case 2b: i < m and OPT leaves yj unmatched.
– pay gap for yj and min cost of aligning x1 x2 . . . xi and y1 y2 . . . yj-1
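Case 2c: i == m and OPT leaves yj unmatched.
– pay 0 for yj and min cost of aligning x1 x2 . . . xi and y1 y2 . . . yj-1

\[
\mathrm{OPT}(i, j) =
\begin{cases}
0 & \text{if } i = 0 \\
\min
\begin{cases}
\alpha_{x_i y_j} + \mathrm{OPT}(i-1, j-1) \\
\delta + \mathrm{OPT}(i-1, j) \\
\delta_i + \mathrm{OPT}(i, j-1)
\end{cases}
& \text{otherwise} \\
i\delta & \text{if } j = 0
\end{cases}
\qquad
\delta_i =
\begin{cases}
\delta & \text{if } i < m \\
0 & \text{if } i = m
\end{cases}
\]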
Subsequence Alignment: Algorithm
Alignment(m, n, x1x2...xm, y1y2...yn, δ, α) {
   // A[0..m, 0..n]: int array
   for i = 0 to m
      A[i, 0] = iδ
   for j = 1 to n {
      A[0, j] = 0                          // skipping a prefix of Y is free
      for i = 1 to m - 1
         A[i, j] = min(α[xi, yj] + A[i-1, j-1],
                       δ + A[i-1, j],
                       δ + A[i, j-1])
      A[m, j] = min(α[xm, yj] + A[m-1, j-1],
                    δ + A[m-1, j],
                    A[m, j-1])             // skipping a suffix of Y is free
   }
   return A[m, n]
}
Analysis. Θ(mn) time and space.
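The substring variant changes only two things relative to plain alignment: row 0 is all zeros, and in row m an unmatched yj costs nothing. A possible Python sketch, with hypothetical names `substring_alignment` and `unit_alpha` and unit penalties assumed for the example:

```python
def unit_alpha(p, q):
    """Assumed mismatch penalty for the example: 0 if the symbols agree, else 1."""
    return 0 if p == q else 1


def substring_alignment(x, y, delta, alpha):
    """Minimum cost of aligning all of x against some substring of y."""
    m, n = len(x), len(y)
    A = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        A[i][0] = i * delta
    for j in range(1, n + 1):
        A[0][j] = 0                               # a skipped prefix of y is free
        for i in range(1, m + 1):
            skip_y = 0 if i == m else delta       # a skipped suffix of y is free
            A[i][j] = min(
                alpha(x[i - 1], y[j - 1]) + A[i - 1][j - 1],
                delta + A[i - 1][j],
                skip_y + A[i][j - 1],
            )
    return A[m][n]


print(substring_alignment("CTACCG", "TXYTACATGAH", 1, unit_alpha))  # 3
```

The value 3 matches the solution shown for this example: x1 unmatched, x5-y7 mismatched, and y8 unmatched.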
Longest Common Subsequence
• The longest common subsequence (not substring) between
  “democrat” and “republican” is eca.
• A common subsequence is defined by all the identical character
  matches in an alignment of two strings.
• To maximize the number of such matches, we must prevent
  substitution of non-identical characters, that is, 2δ ≤ α_pq for p ≠ q.
• A[i, j] = min(α[xi, yj] + A[i-1, j-1],
                δ + A[i-1, j],
                δ + A[i, j-1])
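One way to see the connection between alignment cost and the longest common subsequence is to choose δ = 1 and α_pq = 2 for p ≠ q (so that 2δ ≤ α_pq). Every alignment then costs m + n minus twice its number of identical matches, so the LCS length is (m + n − cost) / 2. The Python sketch below uses that choice; the function name `lcs_length` and the specific penalties are assumptions for illustration.

```python
def lcs_length(x, y):
    """LCS length via min-cost alignment with delta = 1 and mismatch penalty 2."""
    m, n = len(x), len(y)
    delta = 1
    A = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        A[i][0] = i * delta
    for j in range(1, n + 1):
        A[0][j] = j * delta
        for i in range(1, m + 1):
            alpha = 0 if x[i - 1] == y[j - 1] else 2
            A[i][j] = min(alpha + A[i - 1][j - 1],
                          delta + A[i - 1][j],
                          delta + A[i][j - 1])
    # cost = m + n - 2 * (number of identical matches in an optimal alignment)
    return (m + n - A[m][n]) // 2


print(lcs_length("democrat", "republican"))  # 3
```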
Maximum Monotone Subsequence
• A numerical sequence is monotonically increasing if the ith element
  is at least as big as the (i - 1)st element.
• The maximum monotone subsequence problem seeks to delete the
  fewest number of elements from an input string S to leave a
  monotonically increasing subsequence.
• Ex: A longest increasing subsequence of “243519698” is “24569.”
• Let X be the input sequence and Y be the sorted input sequence.
  Then a longest increasing subsequence of X is also a longest common
  subsequence of X and Y, and vice versa.
• Using the previous idea, we can solve this problem in O(n²) space and
  time. Can we do better?
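The reduction to LCS can be sketched in a few lines of Python. For brevity the sketch uses the standard maximization form of the LCS table rather than the min-cost alignment form; `lcs_length` and `longest_monotone_via_lcs` are hypothetical names.

```python
def lcs_length(x, y):
    """Length of a longest common subsequence (standard maximization table)."""
    m, n = len(x), len(y)
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[m][n]


def longest_monotone_via_lcs(xs):
    """Length of a longest non-decreasing subsequence of xs, as LCS(xs, sorted(xs))."""
    return lcs_length(list(xs), sorted(xs))


print(longest_monotone_via_lcs([2, 4, 3, 5, 1, 9, 6, 9, 8]))  # 5
```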
Maximum Monotone Subsequence
• A numerical sequence is monotonically increasing if the ith element
  is at least as big as the (i - 1)st element. Given X = x1 x2 . . . xn find
  the longest monotonically increasing subsequence of X.
• Let OPT(i) be the length of the longest monotonically increasing
  subsequence ending with xi. Then
  OPT(1) = 1 and
  OPT(i) = max(1, max(OPT(j) + 1 : j < i and xj <= xi))

MonotoneSubsequence(x1x2...xn) {
   // A[1..n]: int array
   for i = 1 to n {
      A[i] = 1
      for j = 1 to i - 1
         if (xi >= xj) A[i] = max(A[i], A[j]+1)
   }
   return max(A[1..n])
}
// O(n) space, O(n²) time
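MonotoneSubsequence carries over to Python almost line for line. In this sketch `monotone_subsequence_length` is a hypothetical name, indices are 0-based, and an empty input returns 0, a case the pseudocode does not address.

```python
def monotone_subsequence_length(xs):
    """Length of a longest non-decreasing subsequence of xs in O(n^2) time, O(n) space."""
    n = len(xs)
    A = [1] * n                      # A[i]: best length of a subsequence ending at xs[i]
    for i in range(n):
        for j in range(i):
            if xs[i] >= xs[j]:
                A[i] = max(A[i], A[j] + 1)
    return max(A, default=0)


print(monotone_subsequence_length([2, 4, 3, 5, 1, 9, 6, 9, 8]))  # 5
```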
Maximum Monotone Subsequence
• MonotoneSubsequence returns the length of a maximum monotone
  subsequence. How do we return the subsequence itself?

MonotoneSubsequence2(x1x2...xn) {
   y = MonotoneSubsequence(x1x2...xn)    // also fills A[1..n]
   for k = 1 to n
      if (A[k] == y) i = k               // last position where a longest subsequence ends
   S = []
   while (i > 0) {
      S = xi + S                         // prepend xi
      for j = i - 1 downto 1
         if (xi >= xj && A[i] == A[j]+1) {
            i = j; break
         }
      if (j < 1) break                   // no predecessor: xi starts the subsequence
   }
   return S
}
// traceback takes O(n) time after the O(n²) call to MonotoneSubsequence
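The traceback can be sketched in Python as below. The function recomputes the array A itself so that it is self-contained; `monotone_subsequence` is a hypothetical name and indices are 0-based. Which longest subsequence is returned depends on tie-breaking, so the result may differ from other valid answers of the same length.

```python
def monotone_subsequence(xs):
    """Return one longest non-decreasing subsequence of xs."""
    n = len(xs)
    if n == 0:
        return []
    A = [1] * n
    for i in range(n):
        for j in range(i):
            if xs[i] >= xs[j]:
                A[i] = max(A[i], A[j] + 1)

    best = max(A)
    i = max(k for k in range(n) if A[k] == best)   # last index achieving the best length
    S = [xs[i]]
    while A[i] > 1:
        # Some earlier j with xs[j] <= xs[i] and A[j] == A[i] - 1 must exist.
        for j in range(i - 1, -1, -1):
            if xs[i] >= xs[j] and A[i] == A[j] + 1:
                S.append(xs[j])
                i = j
                break
    S.reverse()
    return S


print(monotone_subsequence([2, 4, 3, 5, 1, 9, 6, 9, 8]))
```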