Longest Palindromic Substring

Yang Liu
Problem
• Given a string S
• Find the longest palindromic substring in S.
Example:
S=“abcbcbb”. The longest palindromic substring
is “bcbcb”.
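Simple Idea (Brute Force)
S=“abcbcbb”
Length=n(7) substring: “abcbcbb”---not palindromic
Length=n-1(6) substrings:
start=0, end=n-2(5): “abcbcb”---not palindromic
start=1, end=n-1(6): “bcbcbb”---not palindromic
Length=n-2(5) substrings:
start=0, end=n-3(4): “abcbc”---not palindromic
start=1, end=n-2(5): “bcbcb”---palindromic
Longest palindromic substring: “bcbcb”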
Simple Idea (Brute Force)
for(len=n to 2)
    for(start=0 to n-len)
        end=start+len-1
        if substring(start,end) is palindromic
            return substring(start,end)
return first character
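The brute-force loop above could be sketched in Python roughly as follows. The function name `longest_palindrome_brute_force` is made up for this illustration, and the palindrome test (comparing a slice with its reverse) is just one simple way to do the check.

```python
def longest_palindrome_brute_force(s: str) -> str:
    """Try every length from n down to 2 and return the first palindrome found."""
    n = len(s)
    for length in range(n, 1, -1):               # len = n down to 2
        for start in range(0, n - length + 1):   # start = 0 .. n-len
            candidate = s[start:start + length]
            if candidate == candidate[::-1]:     # O(len) palindrome check
                return candidate
    # No palindrome of length >= 2: fall back to the first character
    # (or the empty string when the input is empty).
    return s[:1]


if __name__ == "__main__":
    print(longest_palindrome_brute_force("abcbcbb"))
```

Because lengths are tried from longest to shortest, the first hit is already a longest palindromic substring.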
Complexity
$\sum_{len=2}^{n} \sum_{start=0}^{n-len} len = \sum_{len=2}^{n} (n-len+1)\,len = O(n^3)$
Dynamic Programming (DP)
• If substring(i,j) is palindromic, then substring(i+1,j-1) is palindromic
• P[i,j]=1 if substring(i,j) is palindromic
=0 otherwise
• When j-i is small (=0, 1), easy to know:
P[i,i]=1 and P[i,i+1]=(S[i]==S[i+1]) (base)
• Compute P[i,j] from small j-i to big j-i:
P[i,j]=P[i+1,j-1] && S[i]==S[j]
Example of DP
S=“abcbcbb”
P[i,i] (i=0..6): 1 1 1 1 1 1 1
Example of DP
S=“abcbcbb”
P[i,i] (i=0..6): 1 1 1 1 1 1 1
P[i,i+1] (i=0..5): 0 0 0 0 0 1
Example of DP
S=“abcbcbb”
P[i,i] (i=0..6): 1 1 1 1 1 1 1
P[i,i+1] (i=0..5): 0 0 0 0 0 1
P[i,i+2] (i=0..4): 0 1 1 1 0
P[i,i+3] (i=0..3): 0 0 0 0
Example of DP
S=“abcbcbb”
P[i,i] (i=0..6): 1 1 1 1 1 1 1
P[i,i+1] (i=0..5): 0 0 0 0 0 1
P[i,i+2] (i=0..4): 0 1 1 1 0
P[i,i+3] (i=0..3): 0 0 0 0
P[i,i+4] (i=0..2): 0 1 0
P[i,i+5] (i=0..1): 0 0
P[i,i+6] (i=0): 0
Example of DP
Max palindromic substring?
S=“abcbcbb”
for(len=n to 1)
    for(i=0 to n-len)
        if (P[i,i+len-1])
            return S[i..i+len-1]
The first hit is len=5, i=1: P[1,5]=1, so the answer is S[1..5]=“bcbcb”.
DP Algorithm
O(n^2) time and space
for(i=0 to n-1)
    P[i,i]=1;
for(i=0 to n-2)
    P[i,i+1]=(S[i]==S[i+1])?1:0;
for(len=3 to n)
    for(i=0 to n-len)
        P[i,i+len-1]=(P[i+1,i+len-2] && S[i]==S[i+len-1])?1:0;
for(len=n to 1)
    for(i=0 to n-len)
        if (P[i,i+len-1])
            return S[i..i+len-1]
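As a rough illustration, the DP table can be filled in Python as below. The function name `longest_palindrome_dp` and the list-of-lists representation of P are assumptions of this sketch, not part of the slides.

```python
def longest_palindrome_dp(s: str) -> str:
    """O(n^2) time and space: P[i][j] is True when s[i..j] is a palindrome."""
    n = len(s)
    if n == 0:
        return ""
    P = [[False] * n for _ in range(n)]
    for i in range(n):                    # base case: single characters
        P[i][i] = True
    for i in range(n - 1):                # base case: adjacent pairs
        P[i][i + 1] = s[i] == s[i + 1]
    for length in range(3, n + 1):        # fill from small j-i to big j-i
        for i in range(0, n - length + 1):
            j = i + length - 1
            P[i][j] = P[i + 1][j - 1] and s[i] == s[j]
    for length in range(n, 0, -1):        # scan from the longest length down
        for i in range(0, n - length + 1):
            if P[i][i + length - 1]:
                return s[i:i + length]
    return ""                             # not reached for non-empty input


if __name__ == "__main__":
    print(longest_palindrome_dp("abcbcbb"))
```

Filling by increasing length guarantees that P[i+1][j-1] is already known when P[i][j] is computed.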
Algorithm of O(1) Space and O(n^2) Time
• Given the center of a palindrome, easy to find the maximum substring with that center
– center at i: check S[i-dist]==S[i+dist]
S=“abcbcbb” center at 2(c)
S[2-1]==S[2+1]==b continue
S[2-2]!=S[2+2] stop
– center at i,i+1: check S[i-dist]==S[i+1+dist]
• Do this for all possible centers (n+n-1=2n-1)
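A minimal Python sketch of the center-expansion idea follows. The helper names `expand` and `longest_palindrome_centers` are hypothetical; only a few index variables are kept, so the extra space is O(1) apart from the returned slice.

```python
def expand(s: str, left: int, right: int) -> tuple[int, int]:
    """Grow outwards while the two ends match; return the half-open range [lo, hi)."""
    while left >= 0 and right < len(s) and s[left] == s[right]:
        left -= 1
        right += 1
    return left + 1, right


def longest_palindrome_centers(s: str) -> str:
    """Try all 2n-1 centers: n single characters and n-1 gaps between characters."""
    best_lo, best_hi = 0, 0
    for i in range(len(s)):
        # Center at i (odd length) and center at i,i+1 (even length).
        for lo, hi in (expand(s, i, i), expand(s, i, i + 1)):
            if hi - lo > best_hi - best_lo:
                best_lo, best_hi = lo, hi
    return s[best_lo:best_hi]


if __name__ == "__main__":
    print(expand("abcbcbb", 2, 2))   # center at 2(c), as in the example above
    print(longest_palindrome_centers("abcbcbb"))
```

Each expansion costs at most O(n) comparisons, which gives the O(n^2) total over the 2n-1 centers.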
Linear Time Algorithm
• The previous algorithm simply computes:
– an array P[0..n-1] where P[i] is the length of the longest palindromic substring centered at i.
– an array Q[0..n-2] where Q[i] is the length of the longest palindromic substring centered between i and i+1.
• Can we reduce the time to compute P[i] & Q[i] by using already computed P[j] & Q[j] (j<i)?
Compute P[i] & Q[i] Efficiently
S=“abbabbabbabbabbaba”
• “abbabbabbabbabbaba” P[6]=13
• “abbabbabbabbabbaba” Q[7]=16
P[9]? Q[9]?
– “abbabbabbabbabbaba” Shall we compare S[8] & S[10]?
– “abbabbabbabbabbaba” No! Its image P[3] w.r.t. S[6] and the rightmost edge of P[6] provide a lower bound.
– Similarly, “abbabbabbabbabbaba” implies a lower bound from P[6] and the rightmost edge of Q[7]
Lower Bound of P[i]
• Depends on the rightmost edge of palindromic substrings and the image of S[i] in the substring.
• Rightmost edge: rEdge
• Image: depends on the length of the substring
→ Can we make the length of palindromic substrings always odd?
Length Change of Palindromic Substrings
• Insert a special character in front of every character of the input string
o S=“abcbcbb”
→ S=“#a#b#c#b#c#b#b”
o S=“abccbab”
→ S=“#a#b#c#c#b#a#b”
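The transformation can be written in one line of Python. This is only a sketch: the function name `interleave` is made up here, and it assumes the separator character does not occur in the input.

```python
def interleave(s: str, sep: str = "#") -> str:
    """Put `sep` in front of every character: 'abc' -> '#a#b#c'.

    Assumes `sep` does not occur in `s`. The original character s[k]
    ends up at the odd index 2k+1 of the result.
    """
    return "".join(sep + ch for ch in s)


if __name__ == "__main__":
    print(interleave("abcbcbb"))
    print(interleave("abccbab"))
```

After the transformation, an even-length palindrome such as “cc” becomes “c#c”, which has a single center character, so only one kind of center needs to be handled.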
Center, Image, and Rightmost Edge
• “abbabbabbabbabbaba” P[19]=?
center=13
rEdge=26
“#a#b#b#a#b#b#a#b#b#a#b#b#a#b#b#a#b#a”
image=2*13-19=7
P[7]=15
P[19]>=P[7]=15
Center, Image, and Rightmost Edge
• “aababbabbabaaaba” P[19]=?
center=13
rEdge=26
“#a#a#b#a#b#b#a#b#b#a#b#a#a#a#b#a”
image=2*13-19=7
P[7]=7
P[19]>=P[7]=7
Center, Image, and Rightmost Edge
• “babbabbabbabbaababa” P[21]=?
center=15
rEdge=28
“#b#a#b#b#a#b#b#a#b#b#a#b#b#a#a#b#a#b#a”
image=2*15-21=9
P[9]=19
P[21]>=2(rEdge-i)+1
=2(28-21)+1=15
Center, Image, and Rightmost Edge
• “babbabbabbabbaababa” P[21]=?
center=15
rEdge=28
“#b#a#b#b#a#b#b#a#b#b#a#b#b#a#a#b#a#b#a”
In general, the palindromic substring centered at i can be extended to one side by at least min(P[image], rEdge-i+1) characters (P[i] now refers to the maximum number of characters on one side, including the center character at i)
Center, Image, and Rightmost Edge
• “abbabbabbabbabbaba” P[19]=?
center=13
rEdge=26
“#a#b#b#a#b#b#a#b#b#a#b#b#a#b#b#a#b#a”
image=2*13-19=7
P[7]=8
P[19]>=min(P[7],26-19+1)=8
Center, Image, and Rightmost Edge
• “aababbabbabaaaba” P[19]=?
center=13
rEdge=26
“#a#a#b#a#b#b#a#b#b#a#b#a#a#a#b#a”
image=2*13-19=7
P[7]=4
P[19]>=min(P[7],26-19+1)=4
Center, Image, and Rightmost Edge
• “babbabbabbabbaababa” P[21]=?
center=15
rEdge=28
“#b#a#b#b#a#b#b#a#b#b#a#b#b#a#a#b#a#b#a”
image=2*15-21=9
P[9]=10
P[21]>=min(P[9],28-21+1)=8
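O(n) Algorithm
for(i=0 to n-1)
    insert special character before S[i]
center=0;
rEdge=0;
P[0]=1;
for(i=1 to 2n-1)
    image=2*center-i;
    P[i]=(i<=rEdge)?min(P[image],rEdge-i+1):1;
    extend P[i] to its maximum;
    if(i+P[i]-1>rEdge)
        rEdge=i+P[i]-1;
        center=i;
Find the i in 1..2n-1 whose palindrome covers the most original characters.
Return the substring centered at i with 2P[i]-1 characters, with the special characters removed.
Why is the complexity O(n)? Every successful comparison in the extend step moves rEdge one position to the right, rEdge never exceeds 2n-1, and each i causes at most one failed comparison.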
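The linear-time procedure might look like this in Python. This is a sketch under one stated assumption that differs from the slides: the special character is also appended at the end of the transformed string, so that every maximal palindrome starts and ends on a special character and covers exactly P[i]-1 original characters. The function name `longest_palindrome_linear` is made up for the example.

```python
def longest_palindrome_linear(s: str, sep: str = "#") -> str:
    """Manacher-style O(n) search; assumes `sep` does not occur in `s`."""
    if not s:
        return ""
    # Assumption of this sketch: separator on BOTH ends, e.g. 'aba' -> '#a#b#a#'.
    t = sep + sep.join(s) + sep
    m = len(t)
    P = [1] * m              # P[i]: characters on one side of i, center included
    center = r_edge = 0      # r_edge: last index covered by the palindrome at `center`
    for i in range(1, m):
        if i <= r_edge:
            image = 2 * center - i
            P[i] = min(P[image], r_edge - i + 1)   # lower bound from the image
        # Extend P[i] to its maximum.
        while i - P[i] >= 0 and i + P[i] < m and t[i - P[i]] == t[i + P[i]]:
            P[i] += 1
        if i + P[i] - 1 > r_edge:                  # palindrome reaches further right
            r_edge = i + P[i] - 1
            center = i
    best = max(range(m), key=lambda i: P[i])
    length = P[best] - 1             # number of original characters covered
    start = (best - length) // 2     # index in s of the first covered character
    return s[start:start + length]


if __name__ == "__main__":
    print(longest_palindrome_linear("abcbcbb"))
```

With the trailing separator, the largest P[i] always belongs to a longest palindrome of the original string, so a plain maximum over P is enough.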
Exercise 1
• Find one of the longest palindromic subsequences.
Example:
S=“abbbcccabaa”
Longest palindromic subsequence: “abcccba”
from “abbbcccabaa”
Exercise 2
Determine whether an integer is a palindrome.
Do this without extra space.
Research Reference
“A New Linear-Time ‘On-Line’ Algorithm for
finding the smallest initial palindrome of a
string”, G. Manacher, JACM 22(3):346-351,
1975.