Midterm: week 7, during the lecture (2 hours).
2016/5/29 chapter25

Recursive Algorithm:
Compute-Opt(j)
  if j = 0 then
    return 0
  else
    return max{ v_j + Compute-Opt(p(j)), Compute-Opt(j-1) }
Running time: > $2^{n/2}$. (not required)

Example:
Index j   1  2  3  4  5  6
v_j       2  4  4  7  2  1
p(j)      0  0  1  0  3  3

(not required)
[Figure: recursion tree of subproblems, rooted at OPT(6)]
The tree of subproblems grows very quickly, so the recursive algorithm may take exponential time.

(not required)
\[ T(n) = T(n-1) + T(n-2) > 2T(n-2) > 4T(n-4) > 8T(n-6) > \dots > 2^{n/2}\,T(1) \]

Weighted Interval Scheduling: Bottom-Up
Input: n, s_1, s_2, …, s_n, f_1, f_2, …, f_n, v_1, v_2, …, v_n
Sort jobs by finish times so that f_1 ≤ f_2 ≤ … ≤ f_n.
Compute p(1), p(2), …, p(n).
M[0] = 0;
for j = 1 to n do
  M[j] = max{ v_j + M[p(j)], M[j-1] }
  if (M[j] == M[j-1]) then B[j] = 0 else B[j] = 1   /* for backtracking */
m = n;   /* backtracking */
while (m ≠ 0) {
  if (B[m] == 1) then { print job m; m = p(m) }
  else m = m-1
}
B[j]=0 indicates that job j is not selected in the optimal solution for jobs 1..j.
B[j]=1 indicates that job j is selected in the optimal solution for jobs 1..j.

Example (same six jobs as above):
M[1] = v_1 + M[0] = 2+0 = 2;
M[2] = v_2 + M[0] = 4+0 = 4;
M[3] = v_3 + M[1] = 4+2 = 6;
M[4] = v_4 + M[0] = 7+0 = 7;
M[5] = v_5 + M[3] = 2+6 = 8;
M[6] = M[5] = 8, since v_6 + M[3] = 1+6 = 7 < 8.

j:  0  1  2  3  4  5  6
M:  0  2  4  6  7  8  8
B:  0  1  1  1  1  1  0
Backtracking: job 1, job 3, job 5 (found in the order 5, 3, 1).

Backtracking and time complexity
• Backtracking is used to get the schedule.
• The p()'s can be computed in O(n) time after sorting all the jobs based on the starting times.
• Time complexity
  – O(n) if the jobs are sorted and p() is computed.
  – Total time: O(n log n) including sorting.
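The bottom-up computation and the backtracking for the six-job example above can be sketched in Python as follows. This is only a minimal sketch: the function name `weighted_interval_schedule` and the convention of padding index 0 are assumptions made for illustration, and the values v_j and p(j) are taken directly from the example (the p()'s are assumed to be already computed).

```python
def weighted_interval_schedule(v, p):
    """Bottom-up weighted interval scheduling (hypothetical helper).

    v[j] and p[j] are given for j = 1..n; index 0 is padding.
    Jobs are assumed to be sorted by finish time already.
    Returns the tables M and B and the list of selected jobs.
    """
    n = len(v) - 1
    M = [0] * (n + 1)
    B = [0] * (n + 1)
    for j in range(1, n + 1):
        # Either take job j (plus the best solution compatible with it) or skip it.
        M[j] = max(v[j] + M[p[j]], M[j - 1])
        B[j] = 0 if M[j] == M[j - 1] else 1

    # Backtracking: walk from job n down to 0 using B and p.
    selected = []
    m = n
    while m != 0:
        if B[m] == 1:
            selected.append(m)
            m = p[m]
        else:
            m -= 1
    selected.reverse()  # report jobs in increasing index order
    return M, B, selected


# Values from the six-job example (index 0 is unused padding).
v = [0, 2, 4, 4, 7, 2, 1]
p = [0, 0, 0, 1, 0, 3, 3]
M, B, jobs = weighted_interval_schedule(v, p)
print(M)     # [0, 2, 4, 6, 7, 8, 8]
print(B)     # [0, 1, 1, 1, 1, 1, 0]
print(jobs)  # [1, 3, 5]
```

Both loops touch each index at most once, which matches the O(n) bound stated for the case where the jobs are sorted and p() is known.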

Computing p()'s in O(n) time
The p()'s can be computed in O(n) time using two sorted lists, one sorted by finish time (if two jobs have the same finish time, sort them based on starting time) and the other sorted by start time.
Start time:  b(0, 5), a(1, 3), e(3, 8), c(5, 6), d(6, 8)
Finish time: a(1, 3), b(0, 5), c(5, 6), d(6, 8), e(3, 8)
p(d)=c, p(c)=b, p(e)=a, p(a)=0, p(b)=0. (See demo7)

Example 2:
Start time:  b(0, 5), a(1, 3), e(3, 8), c(5, 6), d(6, 8)
Finish time: a(1, 3), b(0, 5), c(5, 6), d(6, 8), e(3, 8)
p(d)=c, p(c)=b, p(e)=a, p(a)=0, p(b)=0.
v(a)=2, v(b)=3, v(c)=5, v(d)=6, v(e)=8.8.
Solution:
M[0] = 0, M[a] = 2.
M[b] = max{2, 3+M[p(b)]} = 3.
M[c] = max{3, 5+M[p(c)]} = 5+M[b] = 8.
M[d] = max{8, 6+M[p(d)]} = 6+M[c] = 6+8 = 14.
M[e] = max{14, 8.8+M[p(e)]} = max{14, 8.8+M[a]} = max{14, 10.8} = 14.
Job: a  b  c  d  e
B:   1  1  1  1  0
Backtracking: b, c, d.

Longest common subsequence
• Definition 1: Given a sequence X=x_1x_2...x_m, another sequence Z=z_1z_2...z_k is a subsequence of X if there exists a strictly increasing sequence i_1, i_2, ..., i_k of indices of X such that for all j=1,2,...,k, we have x_{i_j} = z_j.
• Example 1: If X=abcdefg, Z=abdg is a subsequence of X.
    X = a b c d e f g
    Z = a b   d     g
• Definition 2: Given two sequences X and Y, a sequence Z is a common subsequence of X and Y if Z is a subsequence of both X and Y.
• Example 2: X=abcdefg and Y=aaadgfd. Z=adf is a common subsequence of X and Y.
• Definition 3: A longest common subsequence (LCS) of X and Y is a common subsequence of X and Y with the longest length. (The length of a sequence is the number of letters in the sequence.)
• A longest common subsequence may not be unique.
• Example: for abcd and acbd, both acd and abd are LCSs.

Longest common subsequence problem
• Input: Two sequences X=x_1x_2...x_m and Y=y_1y_2...y_n.
• Output: a longest common subsequence of X and Y.
• Applications:
  – Similarity of two lists. Given two lists L1: 1, 2, 3, 4, 5 and L2: 1, 3, 2, 4, 5, the length of the LCS is 4, which indicates the similarity of the two lists.
  – The Unix command "diff".

Longest common subsequence problem: a brute-force approach
• Suppose that m ≤ n. Try all subsequences of X (there are $2^m$ subsequences of X), test whether each one is also a subsequence of Y, and select the one with the longest length.

Characterizing a longest common subsequence
• Theorem (Optimal substructure of an LCS)
  Let X=x_1x_2...x_m and Y=y_1y_2...y_n be two sequences, and let Z=z_1z_2...z_k be any LCS of X and Y.
  1. If x_m = y_n, then z_k = x_m = y_n and Z[1..k-1] is an LCS of X[1..m-1] and Y[1..n-1].
  2. If x_m ≠ y_n, then z_k ≠ x_m implies that Z is an LCS of X[1..m-1] and Y.
  3. If x_m ≠ y_n, then z_k ≠ y_n implies that Z is an LCS of X and Y[1..n-1].

The recursive equation
• Let c[i,j] be the length of an LCS of X[1...i] and Y[1...j].
• c[i,j] can be computed as follows:
\[
c[i,j] =
\begin{cases}
0 & \text{if } i=0 \text{ or } j=0,\\
c[i-1,j-1]+1 & \text{if } i,j>0 \text{ and } x_i=y_j,\\
\max\{c[i,j-1],\ c[i-1,j]\} & \text{if } i,j>0 \text{ and } x_i\neq y_j.
\end{cases}
\]

Computing the length of an LCS
• There are only nm values c[i,j] (plus the boundary row and column), so we can compute all of them in a fixed order.

The algorithm to compute an LCS
for i=1 to m do
  c[i,0]=0;
for j=0 to n do
  c[0,j]=0;
for i=1 to m do
  for j=1 to n do
  {
    if x[i]==y[j] then
      { c[i,j]=c[i-1,j-1]+1; b[i,j]=1; }
    else if c[i-1,j]>=c[i,j-1] then
      { c[i,j]=c[i-1,j]; b[i,j]=2; }
    else
      { c[i,j]=c[i,j-1]; b[i,j]=3; }
  }

Example 3: X=BDCABA and Y=ABCBDAB.

Constructing an LCS (back-tracking)
• We can find an LCS using the b[i,j]'s.
• We start with b[m,n] and track back to some cell b[0,j] or b[i,0].
• The algorithm to construct an LCS (backtracking):
1. i=m
2. j=n;
3. if i==0 or j==0 then exit;
4. if b[i,j]==1 then { print x_i; i=i-1; j=j-1; }
5. else if b[i,j]==2 then i=i-1
6. else if b[i,j]==3 then j=j-1
7. Goto Step 3.
(The letters of the LCS are printed in reverse order.)
• The time complexity: O(nm).

Remarks on weighted interval scheduling
• It takes a long time to explain (50+13 minutes).
• Do not mention exponential time etc.
• For the first example, use the format of Example 2 to show the computation process (more clearly).

Shortest common supersequence
• Definition: Let X and Y be two sequences. A sequence Z is a supersequence of X and Y if both X and Y are subsequences of Z.
• Shortest common supersequence problem:
  Input: Two sequences X and Y.
  Output: a shortest common supersequence of X and Y.
• Example: X=abc and Y=abb. Both abbc and abcb are shortest common supersequences of X and Y.

Recursive Equation:
• Let c[i,j] be the length of an SCS of X[1...i] and Y[1...j].
• c[i,j] can be computed as follows:
\[
c[i,j] =
\begin{cases}
j & \text{if } i=0,\\
i & \text{if } j=0,\\
c[i-1,j-1]+1 & \text{if } i,j>0 \text{ and } x_i=y_j,\\
\min\{c[i,j-1]+1,\ c[i-1,j]+1\} & \text{if } i,j>0 \text{ and } x_i\neq y_j.
\end{cases}
\]

The pseudo-code (X has length m and Y has length n)
for i=0 to m do { c[i,0]=i; b[i,0]=2; }
for j=0 to n do { c[0,j]=j; b[0,j]=3; }
for i=1 to m do
  for j=1 to n do
    if (x_i == y_j) then { c[i,j]=c[i-1,j-1]+1; b[i,j]=1; }
    else {
      c[i,j]=min{c[i-1,j]+1, c[i,j-1]+1};
      if (c[i,j]==c[i-1,j]+1) then b[i,j]=2; else b[i,j]=3;
    }
p=m; q=n;   /* backtracking */
while (p≠0 or q≠0) {
  if (b[p,q]==1) then { print x[p]; p=p-1; q=q-1 }
  else if (b[p,q]==2) then { print x[p]; p=p-1 }
  else { print y[q]; q=q-1 }
}
(The boundary entries b[i,0]=2 and b[0,j]=3 let the backtracking finish when one sequence runs out. The supersequence is printed in reverse order.)

Exercises
• Exercise 1: For the weighted interval scheduling problem, there are eight jobs with starting time and finish time as follows: j1=(0, 8), j2=(2, 3), j3=(3, 6), j4=(5, 9), j5=(8, 12), j6=(9, 11), j7=(10, 13) and j8=(11, 16). The weight for each job is as follows: v1=3.5, v2=2.0, v3=3.0, v4=3.0, v5=6.5, v6=2.5, v7=12.0, and v8=8.0. Find a maximum weight subset of mutually compatible jobs. (Backtracking process is required.)
(You have to compute the p()'s. The process of computing the p()'s is NOT required.)
• Exercise 2: Let X=abbacab and Y=baabcbb. Find the longest common subsequence for X and Y. Backtracking process is required.

Summary of Week 6
• Understand the algorithms for the weighted interval scheduling problem, LCS and SCS.
• The "alignment of sequences" part is not taught.

Alignment of sequences
• An alignment:
  – inserting spaces into X and Y such that the two resulting sequences X' and Y' are of the same length;
  – every letter in X' is opposite to a unique letter in Y'.
Examples:
  o-currence     o-curr-ance     abbbaa--bbbbaab
  occurrence     o-curre-nce     ababaaabbbbba-b
• The alignment value:
\[ \sum_{i=1}^{n} s(X'[i],\, Y'[i]) \]
  – where n is the common length of X' and Y', X'[i] and Y'[i] are the two letters in column i of the alignment, and s(X'[i], Y'[i]) is the score (weight) of these opposing letters.
• There are several popular score schemes for DNA and protein sequences.

• Recursive equations:
  c[i,j] = max{ c[i-1,j-1] + s(X[i],Y[j]), c[i,j-1] + s(_,Y[j]), c[i-1,j] + s(X[i],_) }.
Similarity Score Scheme (max):
  – match: 1;
  – mismatch or insertion or deletion: 0.
Example (rows: ABCCAA, columns: ABBCAAA):
         A  B  B  C  A  A  A
      0  0  0  0  0  0  0  0
   A  0  1  1  1  1  1  1  1
   B  0  1  2  2  2  2  2  2
   C  0  1  2  2  3  3  3  3
   C  0  1  2  2  3  3  3  3
   A  0  1  2  2  3  4  4  4
   A  0  1  2  2  3  4  5  5
The same as LCS if we use the special similarity score and maximization.

• Recursive equations:
  c[i,j] = min{ c[i-1,j-1] + s(X[i],Y[j]), c[i,j-1] + s(_,Y[j]), c[i-1,j] + s(X[i],_) }.
Distance Score Scheme (min):
  – match: 1;
  – insertion and deletion: 1;
  – mismatch: 2.
Example (rows: ABCCAA, columns: ABBCAAA):
         A  B  B  C  A  A  A
      0  1  2  3  4  5  6  7
   A  1  1  2  3  4  5  6  7
   B  2  2  2  3  4  5  6  7
   C  3  3  3  4  4  5  6  7
   C  4  4  4  5  5  6  7  8
   A  5  5  5  6  6  6  7  8
   A  6  6  6  7  7  7  7  8
The same as SCS if we use the special distance score and minimization.

A score emphasizing A-A match (max):
  – A-A match: 1;
  – any other match or mismatch: 0.
Example (rows: ABCCAA, columns: ABBCAAA):
         A  B  B  C  A  A  A
      0  0  0  0  0  0  0  0
   A  0  1  1  1  1  1  1  1
   B  0  1  1  1  1  1  1  1
   C  0  1  1  1  1  1  1  1
   C  0  1  1  1  1  1  1  1
   A  0  1  1  1  1  2  2  2
   A  0  1  1  1  1  2  3  3
There are 3 A-A matches.

• Recursive equations:
  c[i,j] = min{ c[i-1,j-1] + s(X[i],Y[j]), c[i,j-1] + s(_,Y[j]), c[i-1,j] + s(X[i],_) }   (distance score)
  c[i,j] = max{ c[i-1,j-1] + s(X[i],Y[j]), c[i,j-1] + s(_,Y[j]), c[i-1,j] + s(X[i],_) }   (similarity score)
• Time and space complexity: both are O(nm), or $O(n^2)$ if both sequences have equal length n.
• Why? We have to compute c[i,j] (the cost) and b[i,j] (for backtracking). Each table has O(nm) entries and each entry takes constant time, so each takes $O(n^2)$ when both sequences have length n.
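The recursive equations above can be sketched as a single Python routine that fills the table c for any score scheme. This is a minimal sketch under stated assumptions: the function name `alignment_table` and its parameters are hypothetical, insertion and deletion share one `gap` score, and the distance scheme is taken as match 1, insertion or deletion 1, mismatch 2, an assumption chosen because it is the scheme whose final value equals the SCS length. Only the table of values is computed; the b table for backtracking is left out.

```python
def alignment_table(x, y, score, gap, best):
    """Fill c[i][j], the best alignment value of x[:i] and y[:j].

    score(a, b): value of letter a opposite letter b
    gap:         value of a letter opposite a space (insertion or deletion)
    best:        max for a similarity score, min for a distance score
    """
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    # Boundary: a prefix aligned against the empty sequence is all gaps.
    for i in range(1, m + 1):
        c[i][0] = c[i - 1][0] + gap
    for j in range(1, n + 1):
        c[0][j] = c[0][j - 1] + gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            c[i][j] = best(
                c[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # x_i opposite y_j
                c[i][j - 1] + gap,                            # space opposite y_j
                c[i - 1][j] + gap,                            # x_i opposite space
            )
    return c


X, Y = "ABCCAA", "ABBCAAA"
# Similarity score (max): match 1, everything else 0 -> LCS length.
lcs = alignment_table(X, Y, lambda a, b: 1 if a == b else 0, 0, max)
# Distance score (min): match 1, gap 1, mismatch 2 (assumed) -> SCS length.
scs = alignment_table(X, Y, lambda a, b: 1 if a == b else 2, 1, min)
# Score emphasizing A-A matches (max): A-A match 1, everything else 0.
aa = alignment_table(X, Y, lambda a, b: 1 if a == b == "A" else 0, 0, max)
print(lcs[-1][-1], scs[-1][-1], aa[-1][-1])  # 5 8 3
```

The two nested loops visit each of the (m+1)(n+1) cells once and do constant work per cell, which is where the O(nm) time and space bound comes from.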