Solutions 1

Exercises 1, solutions
1. Write the pseudocode of algorithms for finding the most frequent symbol
in a string for a) ordered alphabets and b) integer alphabets, and analyze
their time and space complexity.
Solution: the following algorithm computes the most frequent symbol in
a string T of length n in the ordered alphabet case. It runs in O(n log σT)
time and uses O(σT) space, where σT is the number of distinct symbols
in T. It uses a balanced binary search tree, denoted by 𝒯, which
supports the following operation:
increment(c): adds symbol c with a counter equal to 0, if not present,
and increments c's counter by one. Returns c's counter. Time: O(log σT)

    fmax ← 0
    for i ← 0 to |T| − 1 do
        C ← 𝒯.increment(T[i])
        if C > fmax then fmax ← C, smax ← T[i]
    output(smax)
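Python's standard library has no balanced search tree, so the sketch below swaps in a different comparison-only technique: sort the symbols, then scan for the longest run of equal symbols. It is only an illustration under that assumption. It takes O(n log n) time and O(n) space rather than the O(n log σT) time and O(σT) space of the tree-based algorithm, and the function name is hypothetical.

```python
def most_frequent_ordered(text):
    """Return a most frequent symbol of text using comparisons only.

    Sort-and-scan substitute for the balanced-tree algorithm; returns None
    for an empty input. Ties go to the smallest symbol in sorted order.
    """
    best, best_count = None, 0
    run_symbol, run_length = None, 0
    for symbol in sorted(text):
        if run_length > 0 and symbol == run_symbol:
            run_length += 1          # same symbol: extend the current run
        else:
            run_symbol, run_length = symbol, 1   # new symbol: start a new run
        if run_length > best_count:
            best, best_count = run_symbol, run_length
    return best
```

For example, `most_frequent_ordered("ainainen")` scans the sorted symbols `a a e i i n n n` and settles on `n`.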
The following algorithm computes the most frequent symbol in T in the
integer alphabet case. It runs in O(n + σ) time and uses O(σ) space.

    for c ∈ Σ do C[c] ← 0
    fmax ← 0
    for i ← 0 to |T| − 1 do
        C[T[i]] ← C[T[i]] + 1
        if C[T[i]] > fmax then fmax ← C[T[i]], smax ← T[i]
    output(smax)
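The integer alphabet case translates almost line by line into Python. The sketch below assumes the symbols are integers in the range 0 .. σ − 1; the function name and the `sigma` parameter are hypothetical.

```python
def most_frequent_integer(text, sigma):
    """Return a most frequent symbol of text, a sequence of ints in [0, sigma).

    Runs in O(n + sigma) time and O(sigma) space; returns None if text is empty.
    """
    counts = [0] * sigma            # C[c] <- 0 for every symbol c
    fmax, smax = 0, None
    for symbol in text:
        counts[symbol] += 1
        if counts[symbol] > fmax:   # strictly greater: the first symbol to reach fmax wins ties
            fmax, smax = counts[symbol], symbol
    return smax
```

For example, `most_frequent_integer([2, 0, 1, 2, 2, 1], 3)` counts symbol 2 three times and returns 2.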
2. Compute the Morris-Pratt and Knuth-Morris-Pratt π functions for the
pattern ainainen.
Solution:

    j   P[0 .. j − 1]   πMP(j)   πKMP(j)
    1   a               0        0
    2   ai              0        0
    3   ain             0        0
    4   aina            1        0
    5   ainai           2        0
    6   ainain          3        3
    7   ainaine         0        0
    8   ainainen        0        0
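The table can be cross-checked by brute force. The sketch below assumes the usual definitions: πMP(j) is the length of the longest proper border of P[0 .. j − 1], and πKMP(j) is the length of the longest such border k that additionally satisfies j = m or P[k] ≠ P[j], with 0 used when no border qualifies (the convention the table follows). The function names are hypothetical and the code favors clarity over speed.

```python
def borders(s):
    """Lengths of all proper borders of s, longest first (the empty border is last)."""
    return [k for k in range(len(s) - 1, -1, -1) if s[:k] == s[len(s) - k:]]

def pi_mp(P):
    """List of pi_MP(1), ..., pi_MP(m)."""
    return [borders(P[:j])[0] for j in range(1, len(P) + 1)]

def pi_kmp(P):
    """List of pi_KMP(1), ..., pi_KMP(m), using 0 when no border qualifies."""
    m = len(P)
    result = []
    for j in range(1, m + 1):
        # keep only borders whose next symbol differs from P[j] (or any border if j = m)
        ok = [k for k in borders(P[:j]) if j == m or P[k] != P[j]]
        result.append(ok[0] if ok else 0)
    return result
```

Under these assumptions, `pi_mp("ainainen")` and `pi_kmp("ainainen")` reproduce the two columns of the table.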
3. Modify Algorithm 2.7 to compute the Knuth-Morris-Pratt π function.
Solution: let πMP and πKMP be the MP and KMP π functions, respectively. The following algorithm computes the πKMP function:

    for i ← 1 to m do π(i) ← 0
    i ← 1, j ← 0
    while i < m do
        while i + j < m and P[i + j] = P[j] do
            j ← j + 1
            if i + j = m or P[i + j] ≠ P[j] then
                π(i + j) ← j
            else π(i + j) ← π(j)
        if j = 0 then i ← i + 1
        else i ← i + j − π(j), j ← π(j)
Its correctness is based on the fact that
\[
\pi_{\mathrm{KMP}}(i) =
\begin{cases}
j & \text{if } i = m \lor P[i] \neq P[j] \\
\pi_{\mathrm{KMP}}(j) & \text{otherwise}
\end{cases}
\]
where j = πMP(i).
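As a sanity check, the pseudocode of this solution can be ported to Python directly. This is a sketch of one possible translation, not a reference implementation: the function name is hypothetical, and the returned list has an unused entry at index 0 so that `pi[i]` corresponds to π(i).

```python
def kmp_pi(P):
    """Compute pi_KMP(1..m) for pattern P; the returned list has length m + 1."""
    m = len(P)
    pi = [0] * (m + 1)               # pi[0] is unused padding
    i, j = 1, 0
    while i < m:
        # extend the match between P[i..] and the prefix P[0..]
        while i + j < m and P[i + j] == P[j]:
            j += 1
            if i + j == m or P[i + j] != P[j]:
                pi[i + j] = j        # border of length j is followed by a different symbol
            else:
                pi[i + j] = pi[j]    # same next symbol: inherit the value of the shorter border
        if j == 0:
            i += 1
        else:
            i, j = i + j - pi[j], pi[j]
    return pi
```

For the pattern of exercise 2, `kmp_pi("ainainen")[1:]` gives `[0, 0, 0, 0, 0, 3, 0, 0]`.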
4. Simulate the Duels algorithm with P = acat and T = tagatacatt. Modify
T so that acat does not occur in T while 1^4 occurs in T'.
Solution:
W[1] = 1, W[2] = 2, W[3] = 1

    stack = { }, push 6
    stack = { 6 }, push 5
    duel(5, 6), W[1] = 1, T[6 + 1] = a, P[1] = c
    stack = { 5 }, push 4
    duel(4, 5), W[1] = 1, T[5 + 1] = c, P[1] = c
    stack = { 5 }, push 3
    duel(3, 5), W[2] = 2, T[5 + 2] = a, P[2] = a
    stack = { 5 }, push 2
    duel(2, 5), W[3] = 1, T[5 + 1] = c, P[1] = c
    stack = { 5 }, push 1
    stack = { 1 5 }, push 0
    duel(0, 1), W[1] = 1, T[1 + 1] = g, P[1] = c
    stack = { 1 5 }

    mark(1) = 1, T[1] = a, P[1 - 1] = a, T'[1] = 1
    mark(2) = 1, T[2] = g, P[2 - 1] = c, T'[2] = 0
    mark(3) = 1, T[3] = a, P[3 - 1] = a, T'[3] = 1
    mark(4) = 1, T[4] = t, P[4 - 1] = t, T'[4] = 1
    mark(5) = 5, T[5] = a, P[5 - 5] = a, T'[5] = 1
    mark(6) = 5, T[6] = c, P[6 - 5] = c, T'[6] = 1
    mark(7) = 5, T[7] = a, P[7 - 5] = a, T'[7] = 1
    mark(8) = 5, T[8] = t, P[8 - 5] = t, T'[8] = 1
    mark(9) = 5, T[9] = t, P[9 - 5] = (none), T'[9] = 0

There is one occurrence of P at starting position 5. If we replace symbol
T[8] = t with c, then we have

    mark(8) = 5, T[8] = c, P[8 - 5] = t, T'[8] = 0

so that 1^4 occurs in T' (at positions 3 and 4) while P does not occur in T.
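The T' values can be recomputed mechanically. The sketch below assumes that the surviving positions are given as an ascending list, that mark(i) is the largest surviving position not exceeding i, and that a comparison past the end of P counts as a mismatch (as for position 9 in the simulation). The function name is hypothetical.

```python
def compare_marked(P, T, positions):
    """Return T' as a dict i -> 0/1 for every i >= positions[0].

    positions must be sorted in ascending order; each i is compared against
    the pattern copy starting at mark(i), the closest position at or before i.
    """
    m = len(P)
    t_prime = {}
    for idx, j in enumerate(positions):
        # mark(i) = j for all i from j up to the next surviving position
        end = positions[idx + 1] if idx + 1 < len(positions) else len(T)
        for i in range(j, end):
            t_prime[i] = 1 if i - j < m and T[i] == P[i - j] else 0
    return t_prime

original = compare_marked("acat", "tagatacatt", [1, 5])
modified = compare_marked("acat", "tagatacact", [1, 5])   # T[8] replaced by c
```

With the stack { 1 5 } from the simulation, `original` matches the T' column above, and `modified` contains a run of five 1s at positions 3 .. 7 even though acat no longer occurs in the text.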
5. Write the pseudocode of the Duels Algorithm, excluding the computation
of the witness array. The algorithm should run in O(n log n) time and
should use constant space (in addition to the witness array and the stack).
Solution: the following code implements the first phase of the Duels algorithm. It computes the set S of all consistent positions of T with respect
to P. It stores S in a list L which supports the operations push-front,
pop-front and pop-back.

    for i ← n − m downto 0 do
        L.push-front(i)
        while L.size() ≥ 2 do
            i1 ← L.pop-front()
            i2 ← L.pop-front()
            k ← W[i2 − i1]
            if i2 − i1 ≥ m ∨ k = 0 then
                L.push-front(i2)
                L.push-front(i1)
                break
            else if T[i2 + k] = P[k] then
                L.push-front(i2)
            else L.push-front(i1)
The following code implements the second phase of the Duels algorithm. It
finds all the occurrences of P in T using the list L. The algorithm iterates
over the positions i of T in decreasing order, excluding the positions
smaller than the first position in the list. It maintains the
invariants that mark(i) = j, where j is the position most recently removed
from L, and that after processing position i, if ones ≥
1, then ones = max{k | T'[i .. i + k − 1] = 1^k}. Hence, if ones ≥ m
after processing position j ∈ L, P occurs at starting position j and the
algorithm reports j. If L is implemented as a doubly-linked list, the
list operations run in constant time and the algorithm runs in O(n) time
and uses constant space in addition to the list.

    b ← n − 1
    ones ← 0
    while L.size() > 0 do
        j ← L.pop-back()
        for i ← b downto j do
            if i − j ≥ m or P[i − j] ≠ T[i] then
                ones ← 0
            else
                ones ← ones + 1
        if ones ≥ m then output(j)
        b ← j − 1
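A Python rendering of the second phase makes it easy to replay the example from exercise 4. This is a sketch under stated assumptions: the list of consistent positions is passed in ascending order, a `collections.deque` stands in for the doubly-linked list, and a comparison past the end of P is treated as a mismatch, as in the simulation where T'[9] = 0. The function name is hypothetical.

```python
from collections import deque

def duels_second_phase(P, T, consistent):
    """Report the occurrences of P in T given the consistent positions (ascending)."""
    m, n = len(P), len(T)
    L = deque(consistent)
    b = n - 1
    ones = 0                      # length of the run of matches starting at position i
    occurrences = []
    while L:
        j = L.pop()               # pop-back: largest remaining position
        for i in range(b, j - 1, -1):
            if i - j >= m or P[i - j] != T[i]:
                ones = 0          # mismatch (or past the end of P): the run is broken
            else:
                ones += 1
        if ones >= m:
            occurrences.append(j)
        b = j - 1
    return occurrences
```

With the stack { 1 5 } from exercise 4, `duels_second_phase("acat", "tagatacatt", [1, 5])` reports position 5, and the modified text tagatacact yields no occurrence. Positions are reported in decreasing order.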
6. Simulate the BNDM algorithm with P = acat and T = tagatacatt.
Solution:
B[a] = 0101
B[c] = 0010
B[t] = 1000

    i = 3, T[0 .. 3] = taga
    j = 0, T[3 - 0] = a, D = 0101
    j = 1, T[3 - 1] = g, D = 0000
    shift = 3

    i = 6, T[3 .. 6] = atac
    j = 0, T[6 - 0] = c, D = 0010
    j = 1, T[6 - 1] = a, D = 0001
    j = 2, T[6 - 2] = t, D = 0000
    shift = 2

    i = 8, T[5 .. 8] = acat
    j = 0, T[8 - 0] = t, D = 1000
    j = 1, T[8 - 1] = a, D = 0100
    j = 2, T[8 - 2] = c, D = 0010
    j = 3, T[8 - 3] = a, D = 0001
    j = 4, T[8 - 4] = t, D = 0000
    shift = 4
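The trace can be replayed with a short Python sketch of BNDM. It assumes the conventions that the trace appears to follow: bit k of B[c] is set when P[k] = c, the window ending at i is read from right to left, D is shifted right after each symbol, and a set lowest bit means that the suffix read so far is a prefix of P. Unlike the trace, the sketch stops reading a window as soon as D becomes zero or m symbols have been read. The function name is hypothetical, and Python's unbounded integers hide the usual requirement that m fits in a machine word.

```python
def bndm(P, T):
    """Return the starting positions of all occurrences of P in T."""
    m, n = len(P), len(T)
    B = {}
    for k, c in enumerate(P):
        B[c] = B.get(c, 0) | (1 << k)     # bit k set iff P[k] == c
    occurrences = []
    i = m - 1                             # index of the last symbol of the window
    while i < n:
        D = (1 << m) - 1                  # all factors of P are still possible
        shift = m
        j = 0
        while D != 0 and j < m:
            D &= B.get(T[i - j], 0)
            if D & 1:                     # the j + 1 symbols read form a prefix of P
                if j + 1 == m:
                    occurrences.append(i - m + 1)
                else:
                    shift = m - (j + 1)   # align the window with the longest prefix seen
            D >>= 1
            j += 1
        i += shift
    return occurrences
```

On the input of this exercise, `bndm("acat", "tagatacatt")` visits the windows ending at i = 3, 6 and 8 with shifts 3, 2 and 4, and reports the occurrence at position 5.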