PPTX - Computer Science and Engineering

advertisement
Approximate On-line Palindrome
Recognition, and Applications
Amihood Amir
Benny Porat
Moskva River
Confluence of 4 Streams
CPM 2014
Palindrome Recognition
- Voz'mi-ka slovo ropot, - govoril Cincinnatu ego shurin,
ostriak, -- I prochti obratno. A? Smeshno poluchaetsia?
Vladimir Nabokov, Invitation to a Beheading (1)
"Take the word ropot [murmur]," Cincinnatus' brother-in-law,
the wit, was saying to him, "and read it backwards. Eh?
Comes out funny, doesn't it?" [--› topor: the axe]
A palindrome is a string that is the same
whether read from right to left or from
left to right:
Examples: доход
A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal-Panama!
Palindrome Example
Ibn Ezra: Medieval Jewish philosopher,
poet, Biblical commentator, and
mathematician.
Was asked: "‫"אבי אל חי שמך למה מלך משיח לא יבא‬
[ My Father, the Living God, why does the king messiah not
arrive?]
His response:
"‫ שוב אשוב אליכם כי בא מועד‬,‫"דעו מאביכם כי לא בוש אבוש‬
[ Know you from your Father that I will not be delayed. I will return
to you when the time will come ]
Palindromes in Computer Science
Great programming exercise in CS 101.
Example of a problem that can be solved by
a RAM in linear time, but not by a 1-tape
Turing machine.
(Can be done in linear time by a 2-tape TM)
Palindrome Concatenation
We may be interested in finding out
whether a string is a concatenation of
palindromes of length > 1.
Example: ABCCBABBCCBCAACB
Why would we be interested in such a funny
problem?
– we’ll soon see
Exercise: Do this in linear time…
Stream 2 - Approximations
As in exact matching, there may be errors.
Find the minimum number of errors that,
if fixed, will give a string that is a
concatenation of palindromes of length > 1
ABCCBABBCCBCAACB
Example: ABCCBCBBCCBCABCB
For Hamming distance:
A-Porat [ISAAC 13]:
Algorithm of time O(n2)
Stream 3 - Reversals
Why is this funny problem interesting?
Sorting by reversals: In the evolutionary
process a substring may “detach” and
“reconnect” in reverse:
ABCABCDAABCBAD
ABCABCDAABCBAD
CBAADCB
Sorting by Reversals
What is the minimum number of reversals
that, when applied to string A, result in
string B?
History:
Introduced: Bafna & Pevzner [95]
NP-hard:
Carpara [97]
Approximations: Christie [98]
Berman, Hannenhalli, Karpinski [02]
Hartman [03]
Sorting by Reversals – Polynomial
time Relaxations
Signed reversals: Hannenhalli & Pevzner [99]
Kaplan, Shamir, Tarjan [00]
Tannier & Sagot [04]
. . .
Disjointness: Swap Matching Muthu [96]
Two constraints:
1. The length of the reversed substring is limited
to 2.
2. All swaps are disjoint.
Pattern Matching with Disjoint
Reversals
• Reversal Distance (RD):
– The RD between s1 and s2 is the minimum
number k, such that there exist s2’ , where
HAM(s1,s2’) =k, and s1 reversal match s2.
S1:
S2:
A
D
C
B
A
E
D
B
A
A
D
A
A
B
A
B
C
E
RD(S1,S2) = 2
Connection between Reversal
Matching and Palindrome
Matching
Interleave Strings:
S1:
S2:
A
D
C
B
A
E
D
B
A
C
D
A
A
B
A
B
D
E
A C D D C A B A A B E A D B B D A E
On-line Input
Suppose that we get the input a byte at a
time:
For the palindrome problem:
A C D D C ABAABEADBBDAE
On-line Input
Suppose that we get the input a byte at a
time:
For the reversal problem:
A
C
D
D
C
A
B
A
A
B
E
A
D
B
B
D
A
E
Main Idea – Palindrome Fingerprint
The Rabin Karp
Fingerprint
Φ(S)=r1s0+ r2s1+… rmsm-1 mod (p)
s0,s1,s2,…sm-1
ΦR(S)=r-1s0+ r-2s1+… r-msm-1 mod (p)
The Reversal
Fingerprint
If rm+1ΦR(S) = Φ(S) => S is a palindrome.
w.h.p.
Palindrome Fingerprint
If rm+1ΦR(S) = Φ(S) => S is a palindrome.
Φ(S)=r1s0+ r2s1+… rmsm-1 mod (p)
Example:
r6ΦR(S)=
ΦR(S)=r-1s0+ r-2s1+… r-msm-1 mod (p)
S=ABCBA
r6 (1/r A + 1/r2 B + 1/r3 C + 1/r4 B + 1/r5 A) =
r5 A + r4 B + r3 C + r2 B + r A = Φ(S)
Simple Online Algorithm for
Finding a Palindrome in a Text
t1,t2,t3, …
ti,ti+1,ti+2 ,…ti+m, ti+m+1
, … tn
Φ=r1ti+ r2ti+1+… rmti+m mod (p)
If rm+1ΦR =Φ =>
ΦR=r-1ti+ r-2ti+1+… r-mti+m mod (p) there is a palindrome starting in
the i-th position.
If not, then for the next position:
Φ= Φ +
rm+1t
i+m+1
mod (p)
ΦR=ΦR + r-(m+1)ti+m+1 mod (p)
Note: This algorithm finds online
whether the prefix of a text is a
permutation. For finding online whether
the text is a concatenation of
permutations, assume even-length
permutations, otherwise, every text is a
concatenation of length-1 permutations.
Palindrome with mismatches
Start with 1 mismatch case.
1-Mismatch
S=
…
s0,s1,s2,
sm-1
Choose l prime numbers q1,…,ql < m such
that
l
q
i 1
i
m
1-Mismatch
S=
S2,0=
S2,1=
s0,s2,s4 …
S3,0=
S3,1=
S3,2=
s0,s3,s6 …
s1s3,s5 …
s1,s4,s7 …
s2,s5,s8 …
…
s0,s1,s2,
mod 2
mod 3
sm-1
Examples: q1=2, q2=3
For each qi construct qi
subsequences of S as
follows: subsequence Sqi,j
is all elements of S whose
index is j mod qi.
Example
S=
S2,0=
s0,s2,s4
S2,1=
s1s3,s5
S3,0=
s0,s3
S3,1=
s1,s4
S3,2=
s2,s5
s0,s1,s2, s3,s4,s5
mod 2
mod 3
1-Mismatch
• We need to compare:
s0 , s1, s2,
sm-1, sm-2, sm-3
…
sm-2 ,sm-1
…
s1 , s0
• We prove that in the partitions strings:
Sq,j= SRq,(m-1-j)mod q
Example
S2,0=
s0,s2,s4
S2,1=
s1s3,s5
S3,0=
s0,s3
S3,1=
S3,2=
S=
s0,s1,s2,s3,s4,s5
SR=
s5,s4,s3,s2,s1,s0
S3,0=
s0,s3
SR3,2=
s5,s2
s1,s4
S3,1=
s1,s4
s2,s5
SR3,1=
s4,s1
S2,0=
s0,s2,s4
SR2,1=
s5s3,s1
Exact Matching
Lemma: S=SR  Sq,j = SRq,(m-1-j) mod
for all q and all 0 ≤ j ≤ q.
q
1-Mismatch
Lemma:
There is exactly one mismatch
There is exactly one subpattern in each
group that does not match.
C.R.T
Chinese Remainder Theorem
Let n and m two positive integers.
a mod n  b mod n
a mod m  b mod m
a mod nm  b mod nm
In our case: if two different indices, i and j, have an error, and only
one subsequence is erroneous, since the product of all q’s > m, it
means that i=j.
Complexity
There exists a constant c such that, for
any x<m, there are at least x/log m prime
numbers between x and cx.
log m
Therefore, choose log log m prime
numbers between log m and c log m.
Complexity
• For each qi we compute 2qi different
fingerprints:
 log 2 m 
• Overall space: l

2  q i  O 

log
log
m
io


• Each character participates in exactly
two fingerprints (the regular and the
reverse).
 log m 

2 l  O 
• Overall time:
 log log m 
Online
All fingerprint calculations can be done
online
We know the m at every input character,
to compute the comparisons.
Conclude: Our algorithm is online.
k-Mismatches
Use Group testing…
k-Mismatches
Group Testing
• Given n items with some positive ones,
identify all positive ones by a small
number of tests.
• Each test is on a subset of items.
• Test outcome is positive iff there is a
positive item in the subset.
k-Mismatch
• Group: partition of the text.
• Test: distinguish between: (using the 1-mismatch
algorithm)
– match
– 1-mismatch
– more then 1-mismatch
k-Mismatches
S=
S2,0=
S2,1=
s0,s2,s4 …
S3,0=
S3,1=
S3,2=
s0,s3,s6 …
s1s3,s5 …
s1,s4,s7 …
s2,s5,s8 …
…
s0,s1,s2,
mod 2
mod 3
sm-1
Each Sq,j is a
group in our
group testing
Similar to the 1-mismatch
algorithm just with more
prime numbers…
l
q 1 q 2 q 3 ... q l s.t
q
i 1
i
m
k
Our tests
• We define The reversal pair of
Sq,j to be SRq,(m-1-j)mod q
• Each partition is “tested against” its
reversal pair.
Correctness
s0,s1,s2,
…
sj
….
sm-1
i2
i9
i5
i7
i
For any group of k character i1,i2,..ik
There exists a partition where sj
appears alone
C.R.T
Correctness
s0,s1,s2,
…
sj
….
sm-1
i2
i9
i5
i7
i
If sj invokes a mismatch we will catch it.
Complexity
• Overall space:
 k log 2 m
O 
2
log
log m

• Overall time:




 k 2 log 4 m 

O 
2

log
log
m


Approximate Reversal Distance
Using the palindrome up to k-mismatches
algorithm, can be solved in
O n log k  nk
2
O n  k
2


time, and
space.
спасибо
Download