Solutions 1

Exercises 1, solutions

1. Write the pseudocode of algorithms for finding the most frequent symbol in a string for a) ordered alphabets and b) integer alphabets, and analyze their time and space complexity.

Solution: the following algorithm computes the most frequent symbol in a string T of length n in the ordered alphabet case. It runs in O(n log σ_T) time and uses O(σ_T) space, where σ_T is the number of distinct symbols in T. It uses a balanced binary search tree, denoted by 𝒯 to distinguish it from the string T, which supports the following operation:

    increment(c): inserts symbol c with a counter equal to 0 if c is not present, then increments c's counter by one and returns the new counter value. Time: O(log σ_T).

    f_max ← 0
    for i ← 0 to |T| − 1 do
        C ← 𝒯.increment(T[i])
        if C > f_max then f_max ← C, s_max ← T[i]
    output(s_max)

The loop performs n increment operations of O(log σ_T) time each, and the tree stores one node per distinct symbol, which gives the stated bounds.

The following algorithm computes the most frequent symbol in T in the integer alphabet case. It runs in O(n + σ) time and uses O(σ) space: initializing the counter array C takes O(σ) time, and each of the n symbols is then processed in constant time.

    for c ∈ Σ do C[c] ← 0
    f_max ← 0
    for i ← 0 to |T| − 1 do
        C[T[i]] ← C[T[i]] + 1
        if C[T[i]] > f_max then f_max ← C[T[i]], s_max ← T[i]
    output(s_max)

2. Compute the Morris-Pratt and Knuth-Morris-Pratt π functions for the pattern ainainen.

Solution:

    j            1  2   3    4     5      6       7        8
    P[0..j−1]    a  ai  ain  aina  ainai  ainain  ainaine  ainainen
    π_MP(j)      0  0   0    1     2      3       0        0
    π_KMP(j)     0  0   0    0     0      3       0        0

3. Modify Algorithm 2.7 to compute the Knuth-Morris-Pratt π function.

Solution: let π_MP and π_KMP be the MP and KMP π functions, respectively. The following algorithm computes the π_KMP function:

    for i ← 1 to m do π(i) ← 0
    i ← 1, j ← 0
    while i < m do
        while i + j < m and P[i + j] = P[j] do
            j ← j + 1
            if i + j = m or P[i + j] ≠ P[j] then π(i + j) ← j
            else π(i + j) ← π(j)
        if j = 0 then i ← i + 1
        else i ← i + j − π(j), j ← π(j)

Its correctness is based on the fact that

\[
\pi_{\mathrm{KMP}}(i) =
\begin{cases}
j & \text{if } i = m \lor P[i] \neq P[j] \\
\pi_{\mathrm{KMP}}(j) & \text{otherwise}
\end{cases}
\]

where j = π_MP(i).

4. Simulate the Duels algorithm with P = acat and T = tagatacatt.
Modify T so that acat does not occur in T while 1^4 occurs in T′.

Solution: positions in T and P are 0-indexed. Witnesses are 1-based: W[d] = k ≥ 1 means that P[k − 1] ≠ P[k − 1 + d], and W[d] = 0 means that shift d has no witness. For P = acat, W[1] = 1, W[2] = 2, W[3] = 1. A duel between positions i1 < i2 with k = W[i2 − i1] compares T[i2 + k − 1] with P[k − 1]: if they are equal, i2 survives, otherwise i1 survives.

    stack = { }, push 6
    stack = { 6 }, push 5
    duel(5, 6), W[1] = 1, T[6 + 1 − 1] = c, P[1 − 1] = a, 5 survives
    stack = { 5 }, push 4
    duel(4, 5), W[1] = 1, T[5 + 1 − 1] = a, P[1 − 1] = a, 5 survives
    stack = { 5 }, push 3
    duel(3, 5), W[2] = 2, T[5 + 2 − 1] = c, P[2 − 1] = c, 5 survives
    stack = { 5 }, push 2
    duel(2, 5), W[3] = 1, T[5 + 1 − 1] = a, P[1 − 1] = a, 5 survives
    stack = { 5 }, push 1
    stack = { 1 5 }, push 0    (no duel between 1 and 5, since 5 − 1 ≥ m)
    duel(0, 1), W[1] = 1, T[1 + 1 − 1] = a, P[1 − 1] = a, 1 survives
    stack = { 1 5 }

    mark(1) = 1, T[1] = a, P[1 − 1] = a, T′[1] = 1
    mark(2) = 1, T[2] = g, P[2 − 1] = c, T′[2] = 0
    mark(3) = 1, T[3] = a, P[3 − 1] = a, T′[3] = 1
    mark(4) = 1, T[4] = t, P[4 − 1] = t, T′[4] = 1
    mark(5) = 5, T[5] = a, P[5 − 5] = a, T′[5] = 1
    mark(6) = 5, T[6] = c, P[6 − 5] = c, T′[6] = 1
    mark(7) = 5, T[7] = a, P[7 − 5] = a, T′[7] = 1
    mark(8) = 5, T[8] = t, P[8 − 5] = t, T′[8] = 1
    mark(9) = 5, T[9] = t, P[9 − 5] is undefined, T′[9] = 0

There is one occurrence of P, at starting position 5: position 5 is on the stack and T′[5 .. 8] = 1^4. If we replace symbol T[8] = t with c, the duels are unaffected and we have

    mark(8) = 5, T[8] = c, P[8 − 5] = t, T′[8] = 0

so that 1^4 still occurs in T′ (as T′[3 .. 6]) while P does not occur in T.

5. Write the pseudocode of the Duels Algorithm, excluding the computation of the witness array. The algorithm should run in O(n log n) time and should use constant space (in addition to the witness array and the stack).

Solution: the following code implements the first phase of the Duels algorithm. It computes the set S of all consistent positions of T with respect to P. It stores S into a list L which supports the operations push-front, pop-front and pop-back. Positions are pushed in decreasing order, and the inner loop stops as soon as the two front positions are consistent with each other, that is, when they are at least m apart or their distance has no witness.

    for i ← n − m downto 0 do
        L.push-front(i)
        while L.size() ≥ 2 do
            i1 ← L.pop-front()
            i2 ← L.pop-front()
            if i2 − i1 ≥ m or W[i2 − i1] = 0 then
                L.push-front(i2)
                L.push-front(i1)
                break
            k ← W[i2 − i1]
            if T[i2 + k − 1] = P[k − 1] then L.push-front(i2)
            else L.push-front(i1)

The following code implements the second phase of the Duels algorithm. It finds all the occurrences of P in T using the list L.
The algorithm iterates over all the positions i in T in decreasing order, excluding the positions smaller than the first position in the list. The algorithm maintains the invariants that mark(i) = j and that after processing position i, if ones ≥ 1, then ones = max{k | T′[i .. i + k − 1] = 1^k}. Hence, if ones ≥ m after processing position j ∈ L, P occurs at starting position j and the algorithm reports j. If L is implemented using a doubly-linked list, the list operations run in constant time and the algorithm runs in O(n) time and uses constant space in addition to the list. A position i with i − j ≥ m lies beyond the end of P and counts as a mismatch.

    b ← n − 1
    ones ← 0
    while L.size() > 0 do
        j ← L.pop-back()
        for i ← b downto j do
            if i − j ≥ m or P[i − j] ≠ T[i] then
                ones ← 0
            else
                ones ← ones + 1
        if ones ≥ m then output(j)
        b ← j − 1

6. Simulate the BNDM algorithm with P = acat and T = tagatacatt.

Solution:

    B[a] = 0101
    B[c] = 0010
    B[t] = 1000

    i = 3, T[0 .. 3] = taga
    j = 0, T[3 − 0] = a, D = 0101
    j = 1, T[3 − 1] = g, D = 0000
    shift = 3

    i = 6, T[3 .. 6] = atac
    j = 0, T[6 − 0] = c, D = 0010
    j = 1, T[6 − 1] = a, D = 0001
    j = 2, T[6 − 2] = t, D = 0000
    shift = 2

    i = 8, T[5 .. 8] = acat
    j = 0, T[8 − 0] = t, D = 1000
    j = 1, T[8 − 1] = a, D = 0100
    j = 2, T[8 − 2] = c, D = 0010
    j = 3, T[8 − 3] = a, D = 0001
    j = 4, T[8 − 4] = t, D = 0000
    shift = 4

In the third window the lowest bit of D is set after m = 4 symbols have been read (j = 3), so P occurs at starting position 5.
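
The BNDM simulation in exercise 6 can be cross-checked with a short Python sketch. This is only an illustration under stated assumptions, not the lecture's reference pseudocode: the function name bndm is hypothetical, and the bit convention is inferred from the trace (bit k of B[c], counted from the right, is set iff P[k] = c, and D is shifted right after each symbol). Python integers are unbounded, so the sketch has no machine-word limit on m. The inner loop stops as soon as D becomes 0, so it does not visit the final all-zero step shown in the trace.

```python
def bndm(pattern: str, text: str) -> list[int]:
    """Return the starting positions of pattern in text (BNDM-style sketch)."""
    m, n = len(pattern), len(text)
    if m == 0 or n < m:
        return []

    # Assumed convention: bit k of B[c] is set iff pattern[k] == c,
    # so for "acat" B['a'] = 0b0101, B['c'] = 0b0010, B['t'] = 0b1000.
    B: dict[str, int] = {}
    for k, c in enumerate(pattern):
        B[c] = B.get(c, 0) | (1 << k)

    occurrences = []
    i = m - 1  # index of the last symbol of the current window
    while i < n:
        D = (1 << m) - 1  # every pattern position is still a candidate
        shift = m
        j = 0
        while D != 0 and j < m:
            # After this step, bit k of D is set iff
            # pattern[k .. k + j] == text[i - j .. i].
            D &= B.get(text[i - j], 0)
            if D & 1:
                if j + 1 == m:
                    # The whole pattern matches the window.
                    occurrences.append(i - j)
                else:
                    # A proper prefix of length j + 1 ends the window:
                    # the next window may start where this prefix starts.
                    shift = m - (j + 1)
            D >>= 1
            j += 1
        i += shift
    return occurrences


if __name__ == "__main__":
    print(bndm("acat", "tagatacatt"))
```

On the input of exercise 6 the sketch visits the windows ending at i = 3, 6 and 8 with shifts 3, 2 and 4, as in the trace, and reports the single occurrence at starting position 5.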
