R f n-2

advertisement
Simple Algorithm for Sorting the
Fibonacci String Rotations
Manolis Christodoulakis
King’s College London
Joint work with
Costas S. Iliopoulos
Yoan José Pinzón Ardila
Our Goal
 What makes Fibonacci strings a best
case input for the Burrows-Wheeler
Transform (BWT)?
 Relationship between different rotations
of a Fibonacci string
 What is their lexicographic order?
 Side effect: we can deduce the symbol
stored at any position of any Fibonacci
string in constant time (without using ,
provided that the fn values are known)
SOFSEM 2006
2
Fibonacci Strings & Numbers
 The n-th Fibonacci string
Fn = Fn-1Fn-2
n≥2
F0=b, F1=a
 The n-th Fibonacci number
fn = fn-1+fn-2
n≥2
f0=1, f1=1
F0 = b
f0 = 1
F1 = a
F2 = a b
f1 = 1
F3 = a b a
f3 = 3
F4 = a b a a b
f4 = 5
f2 = 2
SOFSEM 2006
3
Notation
 The i-th rotation of a string
x
= 0 1 … i-1 i … n-1
Ri(x) = 0 1 … i-1 i … n-1
where i is taken modulo n.
 rank(i,x) = the rank of Ri(x)
 rot(ρ,x) = the rotation whose rank is ρ
SOFSEM 2006
4
Burrows-Wheeler Transform (BWT)



M.Burrows and D.J.Wheeler. 1994
Purpose: to make a string more
compressible
BWT Algorithm:
1.
2.
3.
4.
Create list of all rotations
Sort them
Output last symbol of every rotation
Output the rank of the 0-th rotation
SOFSEM 2006
5
BWT on Fibonacci Strings
F5 = abaababa, f5 = 8
R70(F5)
R21(F55)
R25(F55)
R
R30(F
(F55))
R43(F55)
R65(F5)
R16(F5)
R74(F5)
=
=
=
=
=
=
=
=
=
a
b
a
a
a
a
b
a
a
b
b
a
b
a
b
a
a
b
b
b
a
b
a
b
a
a
a
b
a
b
b
a
a
a
b
a
a
a
b
SOFSEM 2006
a
b
a
a
b
a
a
b
a
b
b
a
a
b
a
b
b
a
b
a
a
b
a
a
a
b
b
a
a
a
a
b
a
a
a
b
b
a
b
a
a
b
b
a
b
a
b
b
a
a
a
b
a
b
b
a
a
a
a
b
a
b
a
6
Properties of Fibonacci Strings
 The number of ‘b’ in Fn is fn-2
 Proof: By induction.
 C.S.Iliopoulos, D.W.Moore and
W.F.Smyth. 1997
Fn = Fn-2Fn-3…F1un,
un = ba (n odd)
un = ab (n even)
Let’s call this the IMS formula.
SOFSEM 2006
7
Similarities in Rotations
 R0(Fn) differs from Rfn-2(Fn) in 2 symbols
 Proof:
R0(Fn)
Rfn-2(Fn)
R0(Fn)
=
=
=
=
Fn-2Fn-3…F1un
Fn-3…F1unFn-2
Fn-1Fn-2
Fn-3…F1un-1Fn-2
(1)
(2)
 Ri(Fn) differs from Ri+fn-2(Fn) in 2 symbols
 Proof:
Ri(Fn)
= Ri(R0(Fn))
Ri+fn-2(Fn) = Ri(Rfn-2(Fn))
SOFSEM 2006
8
Relative Order of Rotations
 Ri(Fn) < Ri+fn-2(Fn) for n odd, i  fn-1-1
 Proof:
R0(Fn) = Fn-3…F1un-1Fn-2 = Fn-3 … F1 ab Fn-2
Rfn-2(Fn) = Fn-3…F1un Fn-2 = Fn-3 … F1 ba Fn-2
For i=fn-1-1:
Ri(Fn)
= bFn-2Fn-3…F1a
Ri+fn-2(Fn)= aFn-2Fn-3…F1b
 Similarly,
Ri(Fn) > Ri+fn-2(Fn) for n even, i  fn-1-1
SOFSEM 2006
9
Sorted List of Rotations
 We proved (n odd):
Ri(Fn) < Ri+fn-2(Fn)
i  fn-1-1
(3)
 We will now prove that there is no j s.t.
Ri(Fn) < Rj(Fn) < Ri+fn-2(Fn)
 Proof: (constructive)
Start at i=fn-1 and construct the partial list
Ri Ri+fn-2 Ri+2fn-2 Ri+3fn-2 … Ri+kfn-2 …
for as long as
i+kfn-2  fn-1-1 (mod fn)  kfn-1
I.e. the list is complete!
SOFSEM 2006
10
Identify Rotation (i) by Rank (ρ)
 Therefore, for n odd:
rot(ρ,Fn) = ( fn-1+ρfn-2) mod fn
= (ρfn-2-1) mod fn
 Similarly, for n even, the sorted list is
constructed bottom-up giving
rot(ρ,Fn) = (-(ρ+1)fn-2-1) mod fn
SOFSEM 2006
11
Identify Rank (ρ) of a Rotation (i)
 This is simply the inverse of the previous
function
 n odd
rank(i,Fn) = ((i+1)fn-2) mod fn
 n even
rank(i,Fn) = ((i+1)fn-2-1) mod fn
SOFSEM 2006
12
Symbols of Fibonacci Strings
 Fn[i] = ?
 Observe that
Fn[i] = Ri(Fn)[0]
 In the sorted list of rotations, the first fn-1
rotations start with ‘a’, the rest with ‘b’
 Thus Fn[i] can be deduced from rank(i,Fn)
If rank(i,Fn) ≤ fn-1 then Fn[i]=a else b.
SOFSEM 2006
13
BWT & Fibonacci ― The Quick Way
 The first fn-2 symbols of BWT are ‘b’
 Proof: (n odd)
We proved the first fn-2 rotations have index
(ρ·fn-2-1)modfn
for 0 ≤ ρ < fn-2
The last symbol of these rotations is
Fn[ (ρ·fn-2-1 +fn-1)modfn ]
Which for 0 ≤ ρ < fn-2 is ‘b’
 The next fn-1 symbols of BWT are ‘a’
 Proof: Consequence of previous lemma
SOFSEM 2006
14
Download