Longest common subsequence
• Definition 1: Given a sequence $X = x_1 x_2 \dots x_m$,
another sequence $Z = z_1 z_2 \dots z_k$ is a subsequence of
X if there exists a strictly increasing sequence
$i_1 < i_2 < \dots < i_k$ of indices of X such that for all $j = 1, 2, \dots, k$,
we have $x_{i_j} = z_j$.
• Example 1: If X = abcdefg, then Z = abdg is a
subsequence of X:
X = a b c d e f g
Z = a b   d     g
2016/5/29
chapter25
1
• Definition 2: Given two sequences X and
Y, a sequence Z is a common subsequence
of X and Y if Z is a subsequence of both X
and Y.
• Example 2: X = abcdefg and Y = aaadgfd.
Z = adf is a common subsequence of X and
Y:
X = a b c d e f g
Y = a a a d g f d
Z = a     d   f
• Definition 3: A longest common
subsequence (LCS) of X and Y is a common
subsequence of X and Y with the longest
length. (The length of a sequence is the
number of letters in the sequence.)
• A longest common subsequence may not
be unique.
• Example: X = abcd and Y = acbd.
Both acd and abd are LCSs of X and Y.
Longest common subsequence problem
• Input: Two sequences $X = x_1 x_2 \dots x_m$ and
$Y = y_1 y_2 \dots y_n$.
• Output: a longest common subsequence of X and Y.
• A brute-force approach
Suppose that $m \le n$. Try all subsequences of X
(there are $2^m$ subsequences of X), test whether each
one is also a subsequence of Y, and select
the longest one. Each test takes O(n) time, so this
approach takes $O(n \cdot 2^m)$ time, which is exponential in m.
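The brute-force approach can be sketched in Python as follows. The function names are hypothetical, and the greedy `is_subsequence` scan is one standard way (an assumption, not necessarily the slides' intended method) to carry out the "test whether it is also a subsequence of Y" step in O(n) time.

```python
from itertools import combinations


def is_subsequence(z: str, y: str) -> bool:
    """Greedy test: scan y once, matching the letters of z in order."""
    it = iter(y)
    # `ch in it` advances the iterator past the first match of ch.
    return all(ch in it for ch in z)


def lcs_brute_force(x: str, y: str) -> str:
    """Try subsequences of x from longest to shortest; return the first common one.

    x has 2**len(x) subsequences (one per subset of indices), so this is exponential.
    """
    for k in range(len(x), -1, -1):
        for idx in combinations(range(len(x)), k):   # strictly increasing indices
            z = "".join(x[i] for i in idx)
            if is_subsequence(z, y):
                return z
    return ""


print(lcs_brute_force("abcd", "acbd"))   # abd
```

Because candidates are tried from the longest length down, the first hit is guaranteed to be an LCS; which LCS is returned (abd rather than acd here) depends only on the enumeration order.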
LCS: Applications
• Comparing two versions of the source code of the
same program.
• The Unix command diff, which compares text files.
Characterizing a longest common
subsequence
• Theorem (Optimal substructure of an LCS)
• Let $X = x_1 x_2 \dots x_m$ and $Y = y_1 y_2 \dots y_n$ be two
sequences, and
• let $Z = z_1 z_2 \dots z_k$ be any LCS of X and Y.
• 1. If $x_m = y_n$, then $z_k = x_m = y_n$ and Z[1..k-1] is an LCS
of X[1..m-1] and Y[1..n-1].
• 2. If $x_m \ne y_n$, then $z_k \ne x_m$ implies that Z is an LCS
of X[1..m-1] and Y.
• 3. If $x_m \ne y_n$, then $z_k \ne y_n$ implies that Z is an LCS of
X and Y[1..n-1].
The recursive equation
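• Let c[i,j] be the length of an LCS of X[1...i] and
Y[1...j].
• c[i,j] can be computed as follows:
$$
c[i,j] = \begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0,\\
c[i-1,j-1] + 1 & \text{if } i,j > 0 \text{ and } x_i = y_j,\\
\max\{c[i,j-1],\ c[i-1,j]\} & \text{if } i,j > 0 \text{ and } x_i \ne y_j.
\end{cases}
$$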
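As a sketch of how the recurrence translates directly into code, the version below evaluates c[i,j] top-down and caches each value with Python's `functools.lru_cache`; the function name is hypothetical. Strings are 0-indexed, so `x[i - 1]` plays the role of $x_i$.

```python
from functools import lru_cache


def lcs_length_memo(x: str, y: str) -> int:
    """Length of an LCS of x and y, computed from the recurrence for c[i,j]."""

    @lru_cache(maxsize=None)
    def c(i: int, j: int) -> int:
        if i == 0 or j == 0:                     # base case
            return 0
        if x[i - 1] == y[j - 1]:                 # x_i = y_j
            return c(i - 1, j - 1) + 1
        return max(c(i, j - 1), c(i - 1, j))     # x_i != y_j

    return c(len(x), len(y))


print(lcs_length_memo("abcd", "acbd"))   # 3
```

Memoization makes each of the (m+1)(n+1) values be computed once, but the recursion depth can reach m + n, which is one reason the slides fill the table bottom-up instead.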
Computing the length of an LCS
• There are mn entries c[i,j] with i, j > 0, and each depends only on
c[i-1,j-1], c[i-1,j] and c[i,j-1]. So we can compute them in
a specific order (row by row, left to right) in O(mn) time.
The algorithm to compute an LCS
for i = 1 to m do
    c[i,0] = 0;
for j = 0 to n do
    c[0,j] = 0;
for i = 1 to m do
    for j = 1 to n do
    {
        if x[i] == y[j] then
        {
            c[i,j] = c[i-1,j-1] + 1;
            b[i,j] = 1;
        }
        else if c[i-1,j] >= c[i,j-1] then
        {
            c[i,j] = c[i-1,j];
            b[i,j] = 2;
        }
        else
        {
            c[i,j] = c[i,j-1];
            b[i,j] = 3;
        }
    }
b[i,j] stores the direction taken: 1 = diagonal, 2 = up, 3 = left.
Example 1: X=BDCABA and Y=ABCBDAB.
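The table-filling algorithm can be sketched in Python and run on Example 1. This is a minimal translation under the assumption that row and column 0 of the lists hold the base cases, so `x[i - 1]` stands for $x_i$; the function name is hypothetical.

```python
def lcs_table(x: str, y: str):
    """Fill c (LCS lengths) and b (directions) as in the pseudocode.

    b codes: 1 = diagonal, 2 = up, 3 = left; 0 marks an unused boundary cell.
    """
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]   # c[i][0] = c[0][j] = 0
    b = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = 1                      # diagonal
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = 2                      # up
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = 3                      # left
    return c, b


c, b = lcs_table("BDCABA", "ABCBDAB")
print(c[-1][-1])   # 4
```

The two nested loops touch each of the mn interior cells once with O(1) work, which is where the O(mn) bound comes from.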
Constructing an LCS (back-tracking)
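• We can find an LCS using the b[i,j]'s.
• We start at b[m,n] and trace back until we reach a cell b[i,j] with i = 0 or j = 0.
• The algorithm to construct an LCS
1. i = m;
2. j = n;
3. if i == 0 or j == 0 then exit;
4. if b[i,j] == 1 then
   {
       print x[i];
       i = i-1;
       j = j-1;
   }
5. else if b[i,j] == 2 then
       i = i-1;
6. else (b[i,j] == 3)
       j = j-1;
7. Goto Step 3.
• The letters are printed in reverse order, so the output must be reversed to obtain the LCS.
• The time complexity: O(m + n) for the back-tracking, since each step decreases i or j;
O(mn) in total, including the computation of the table.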
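A minimal Python sketch of the back-tracking step is shown below. To keep it self-contained it rebuilds the direction table b with a compact version of the table-filling loop; the function names are hypothetical. Letters are collected in reverse and reversed once at the end.

```python
def lcs_directions(x: str, y: str):
    """Direction table b as in the table-filling pseudocode (1 = diagonal, 2 = up, 3 = left)."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j], b[i][j] = c[i - 1][j - 1] + 1, 1
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j], b[i][j] = c[i - 1][j], 2
            else:
                c[i][j], b[i][j] = c[i][j - 1], 3
    return b


def construct_lcs(x: str, y: str) -> str:
    """Trace b back from cell (m, n) until i == 0 or j == 0."""
    b = lcs_directions(x, y)
    i, j = len(x), len(y)
    out = []
    while i > 0 and j > 0:
        if b[i][j] == 1:          # diagonal: x_i belongs to the LCS
            out.append(x[i - 1])
            i, j = i - 1, j - 1
        elif b[i][j] == 2:        # up
            i -= 1
        else:                     # left
            j -= 1
    return "".join(reversed(out))


print(construct_lcs("BDCABA", "ABCBDAB"))
```

Using `if/elif/else` means exactly one move is made per iteration, which is the behavior steps 4 to 6 are meant to have.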
Shortest common supersequence
• Definition: Let X and Y be two sequences. A
sequence Z is a common supersequence of X and Y if both
X and Y are subsequences of Z.
• Shortest common supersequence problem:
Input: Two sequences X and Y.
Output: a shortest common supersequence of X and Y.
• Example: X = abc and Y = abb. Both abbc and
abcb are shortest common supersequences of
X and Y.
Recursive Equation:
• Let c[i,j] be the length of a shortest common supersequence (SCS) of X[1...i]
and Y[1...j].
• c[i,j] can be computed as follows:
$$
c[i,j] = \begin{cases}
j & \text{if } i = 0,\\
i & \text{if } j = 0,\\
c[i-1,j-1] + 1 & \text{if } i,j > 0 \text{ and } x_i = y_j,\\
\min\{c[i,j-1] + 1,\ c[i-1,j] + 1\} & \text{if } i,j > 0 \text{ and } x_i \ne y_j.
\end{cases}
$$
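The SCS recurrence can be sketched in Python as a table fill followed by a back-track, mirroring the LCS case. The function name and the tie-breaking rule (prefer dropping $x_i$ when both choices give the same length) are assumptions; a different tie-break can return a different, equally short supersequence.

```python
def scs(x: str, y: str) -> str:
    """Shortest common supersequence; c[i][j] = length of an SCS of x[:i] and y[:j]."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        c[i][0] = i                  # only x[:i] is left
    for j in range(n + 1):
        c[0][j] = j                  # only y[:j] is left
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = min(c[i][j - 1] + 1, c[i - 1][j] + 1)

    # Back-track from (m, n); letters are collected in reverse order.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:             # shared letter, emitted once
            out.append(x[i - 1])
            i, j = i - 1, j - 1
        elif c[i - 1][j] <= c[i][j - 1]:     # emit x_i on its own
            out.append(x[i - 1])
            i -= 1
        else:                                # emit y_j on its own
            out.append(y[j - 1])
            j -= 1
    out.extend(reversed(x[:i]))              # leftover prefix of x (if any)
    out.extend(reversed(y[:j]))              # leftover prefix of y (if any)
    return "".join(reversed(out))


print(scs("abc", "abb"))   # abbc
```

The returned string always has length c[m][n], since every back-track step emits exactly one letter and decreases c by one.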
Assignment 3: (Due week 13, Monday at 7:30 pm)
Question 1: Write a program to compute the SCS of two
sequences. Use s1=abcdabbcabddabcd and
s2=abbcabbdacbdadbc as the test input.
Backtracking is required, i.e., the program MUST output the
shortest common supersequence, not just the length of
the SCS.
Question 2: Write a program to calculate the maximum degree
of a node in an undirected graph.
(1) Use an adjacency matrix to store the graph;
(2) Use an adjacency list to store the graph;
(3) Give the time complexity of the two programs. Which
one is better? Why?
You can use the graph in slide 22 as the test input.
Part-H1
Graphs
Graphs (§ 12.1)
• A graph is a pair (V, E), where
– V is a set of nodes, called vertices
– E is a collection of pairs of vertices, called edges
– Vertices and edges are positions and store elements
• Example:
– A vertex represents an airport and stores the three-letter airport code
– An edge represents a flight route between two airports and stores the mileage
of the route
[Figure: flight network with airports PVD, ORD, SFO, LGA, HNL, LAX, DFW and MIA as vertices]
Edge Types
• Directed edge
– ordered pair of vertices (u,v)
– first vertex u is the origin
– second vertex v is the destination
– e.g., a flight
• Undirected edge
– unordered pair of vertices (u,v)
– e.g., a flight route
[Figure: a directed edge labeled "flight AA 1206" and an undirected edge labeled "849 miles", both between ORD and PVD]
• Directed graph
– all the edges are directed
– e.g., route network
• Undirected graph
– all the edges are undirected
– e.g., flight network
Terminology
• End vertices (or endpoints) of an
edge
– U and V are the endpoints of a
• Edges incident on a vertex
– a, d, and b are incident on V
• Adjacent vertices
– U and V are adjacent
• Degree of a vertex
– X has degree 5
• Parallel edges
– h and i are parallel edges
• Self-loop
– j is a self-loop
[Figure: example graph with vertices U, V, W, X, Y, Z and edges a through j]
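The terminology can be checked in code. The sketch below assumes an edge list for the example graph that is reconstructed from the path and cycle examples on these slides (P1, P2, C1, C2), so it is hypothetical rather than copied from the figure; it also assumes the usual convention that a self-loop adds 2 to the degree of its vertex.

```python
from collections import Counter

# Hypothetical edge list, reconstructed from the slide examples:
# edge name -> (endpoint, endpoint).
EDGES = {
    "a": ("U", "V"), "b": ("V", "X"), "c": ("U", "W"), "d": ("W", "V"),
    "e": ("W", "X"), "f": ("W", "Y"), "g": ("X", "Y"), "h": ("X", "Z"),
    "i": ("X", "Z"), "j": ("Z", "Z"),
}


def degrees(edges: dict) -> Counter:
    """Degree = number of edge endpoints at a vertex."""
    deg = Counter()
    for u, v in edges.values():
        deg[u] += 1
        deg[v] += 1          # for a self-loop u == v, so that vertex gets +2
    return deg


def parallel_edges(edges: dict) -> list:
    """Groups of edge names that join the same unordered pair of endpoints."""
    groups = {}
    for name, (u, v) in edges.items():
        groups.setdefault(frozenset((u, v)), []).append(name)
    return [names for names in groups.values() if len(names) > 1]


deg = degrees(EDGES)
print(deg["X"])                               # 5
print(parallel_edges(EDGES))                  # [['h', 'i']]
print(sum(deg.values()) == 2 * len(EDGES))    # True
```

The last line checks the identity $\sum_v \deg(v) = 2m$ that the DFS analysis relies on: every edge, including a self-loop, contributes exactly two endpoints.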
Terminology (cont.)
• Path
– sequence of alternating vertices
and edges
– begins with a vertex
– ends with a vertex
– each edge is preceded and
followed by its endpoints
• Simple path
– path such that all its vertices and
edges are distinct
• Examples
– P1=(V,b,X,h,Z) is a simple path
– P2=(U,c,W,e,X,g,Y,f,W,d,V) is a
path that is not simple
[Figure: the example graph (vertices U, V, W, X, Y, Z; edges a through j) with paths P1 and P2 highlighted]
Terminology (cont.)
• Cycle
– circular sequence of alternating
vertices and edges
– each edge is preceded and followed
by its endpoints
• Simple cycle
– cycle such that all its vertices and
edges are distinct
• Examples
– C1=(V,b,X,g,Y,f,W,c,U,a,↵) is a
simple cycle
– C2=(U,c,W,e,X,g,Y,f,W,d,V,a,↵) is
a cycle that is not simple
– ↵ marks the return to the first vertex of the cycle
[Figure: the example graph with cycles C1 and C2 highlighted]
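The path and cycle definitions can be turned into simple checks. This sketch reuses a hypothetical edge list reconstructed from the examples P1, P2, C1 and C2, and writes a cycle with its first vertex repeated at the end in place of ↵ (a representation choice, not the slides' notation). Vertex names are upper case and edge names lower case, so they never collide in the distinctness test.

```python
# Hypothetical edge list reconstructed from the slide examples.
EDGES = {
    "a": ("U", "V"), "b": ("V", "X"), "c": ("U", "W"), "d": ("W", "V"),
    "e": ("W", "X"), "f": ("W", "Y"), "g": ("X", "Y"), "h": ("X", "Z"),
    "i": ("X", "Z"), "j": ("Z", "Z"),
}


def is_path(seq) -> bool:
    """Alternating vertex/edge sequence; each edge is preceded and followed by its endpoints."""
    if len(seq) % 2 == 0:                 # must begin and end with a vertex
        return False
    for k in range(1, len(seq), 2):
        u, e, v = seq[k - 1], seq[k], seq[k + 1]
        if e not in EDGES or set(EDGES[e]) != {u, v}:
            return False
    return True


def is_simple_path(seq) -> bool:
    """A path whose vertices and edges are all distinct."""
    return is_path(seq) and len(set(seq)) == len(seq)


def is_simple_cycle(seq) -> bool:
    """seq ends with its first vertex; apart from that repeat, all items are distinct."""
    return (len(seq) >= 3 and seq[0] == seq[-1] and is_path(seq)
            and len(set(seq[:-1])) == len(seq) - 1)


P1 = ["V", "b", "X", "h", "Z"]
P2 = ["U", "c", "W", "e", "X", "g", "Y", "f", "W", "d", "V"]
C1 = ["V", "b", "X", "g", "Y", "f", "W", "c", "U", "a", "V"]
C2 = ["U", "c", "W", "e", "X", "g", "Y", "f", "W", "d", "V", "a", "U"]
print(is_simple_path(P1), is_simple_path(P2))      # True False
print(is_simple_cycle(C1), is_simple_cycle(C2))    # True False
```

P2 and C2 fail only the distinctness test: both visit W twice.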
Adjacency List Structure
• Incidence sequence for each
vertex
– sequence of references to
edge objects of incident
edges
• Edge objects
– references to associated
positions in incidence
sequences of end vertices
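A minimal Python sketch of the adjacency list structure for an undirected graph follows. The class and method names are hypothetical; each vertex keeps a list of references to its incident edge objects, and each edge knows its two end vertices. As a simplification, edges do not store their positions inside the incidence sequences, which the full structure uses for O(1) edge removal.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)          # eq=False: compare and hash vertices by identity
class Vertex:
    element: object
    incidence: list = field(default_factory=list)   # references to incident Edge objects


@dataclass(eq=False)
class Edge:
    element: object
    u: Vertex
    v: Vertex


class AdjacencyListGraph:
    """Undirected graph in which every vertex stores its incidence sequence."""

    def __init__(self):
        self.vertices = []
        self.edges = []

    def insert_vertex(self, element) -> Vertex:
        vtx = Vertex(element)
        self.vertices.append(vtx)
        return vtx

    def insert_edge(self, u: Vertex, v: Vertex, element=None) -> Edge:
        e = Edge(element, u, v)
        self.edges.append(e)
        u.incidence.append(e)
        if v is not u:                  # store a self-loop only once
            v.incidence.append(e)
        return e

    def incident_edges(self, v: Vertex) -> list:
        return list(v.incidence)        # O(deg(v)) time

    def opposite(self, v: Vertex, e: Edge) -> Vertex:
        return e.v if e.u is v else e.u


g = AdjacencyListGraph()
ord_ = g.insert_vertex("ORD")
pvd = g.insert_vertex("PVD")
route = g.insert_edge(ord_, pvd, 849)    # edge element: mileage
print(g.opposite(ord_, route).element)   # PVD
```

Listing the edges incident on v costs O(deg(v)) here, which is the property the O(n + m) DFS bound depends on.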
Adjacency Matrix Structure
• Augmented vertex objects
– Integer key (index)
associated with vertex
• 2D-array adjacency array
– Reference to edge object for
adjacent vertices
– “Infinity” for nonadjacent vertices
• For a graph without weights, the matrix stores
1 for an edge and 0 for no edge
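The adjacency matrix structure can be sketched as follows for an undirected weighted graph. The names are hypothetical, vertices are identified by their integer indices 0..n-1, and the matrix stores edge weights directly rather than references to edge objects (an assumption made to keep the sketch short), with `INF` for nonadjacent pairs.

```python
INF = float("inf")


class AdjacencyMatrixGraph:
    """Undirected graph on vertices 0..n-1; A[i][j] is the edge weight or INF."""

    def __init__(self, n: int):
        self.n = n
        self.A = [[INF] * n for _ in range(n)]   # O(n^2) space

    def insert_edge(self, i: int, j: int, weight=1) -> None:
        self.A[i][j] = weight
        self.A[j][i] = weight                    # undirected: keep A symmetric

    def adjacent(self, i: int, j: int) -> bool:
        return self.A[i][j] != INF               # O(1) lookup

    def incident_weights(self, i: int) -> list:
        # Scanning a whole row costs O(n), even if vertex i has few neighbours.
        return [(j, w) for j, w in enumerate(self.A[i]) if w != INF]

    def to_01_matrix(self) -> list:
        """Unweighted form: 1 for an edge, 0 for no edge."""
        return [[0 if w == INF else 1 for w in row] for row in self.A]


g = AdjacencyMatrixGraph(3)
g.insert_edge(0, 1, 849)
print(g.to_01_matrix())   # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```

The trade-off against the adjacency list is visible in the comments: adjacency tests are O(1), but enumerating a vertex's neighbours always costs O(n).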
Part-H2
Depth-First Search
Depth-First Search (§ 12.3.1)
• Depth-first search (DFS)
is a general technique for
traversing a graph
• A DFS traversal of a graph G
– Visits all the vertices and
edges of G
– Determines whether G is
connected
– Computes the connected
components of G
– Computes a spanning forest
of G
• DFS on a graph with n
vertices and m edges takes
O(n + m) time
• DFS can be further
extended to solve other
graph problems
– Find and report a path
between two given vertices
– Find a cycle in the graph
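As an illustration of extending DFS to report a path between two given vertices, here is a minimal recursive sketch. It assumes the graph is given as a dictionary mapping each vertex to a list of its neighbours; the function name and the example graph are hypothetical. The list `path` always holds the vertices on the current recursion stack, so when the goal is reached it is exactly a path from the start.

```python
def find_path(adj: dict, start, goal):
    """Return a list of vertices from start to goal found by DFS, or None."""
    visited = set()
    path = []                          # vertices on the current recursion stack

    def dfs(v) -> bool:
        visited.add(v)
        path.append(v)
        if v == goal:
            return True
        for w in adj[v]:
            if w not in visited and dfs(w):
                return True
        path.pop()                     # dead end: v is not on the path to goal
        return False

    return path if dfs(start) else None


# Hypothetical undirected graph (each edge listed in both directions).
adj = {
    "A": ["B", "D", "E"], "B": ["A", "C"], "C": ["B", "D", "E"],
    "D": ["A", "C"], "E": ["A", "C"],
}
print(find_path(adj, "A", "C"))   # ['A', 'B', 'C']
```

The path found is not necessarily the shortest one; it is whichever path the DFS order reaches first.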
DFS Algorithm
• The algorithm uses a mechanism for
setting and getting “labels” of
vertices and edges

Algorithm DFS(G)
    Input graph G
    Output labeling of the edges of G
        as discovery edges and back edges
    for all u ∈ G.vertices()
        setLabel(u, UNEXPLORED)
    for all e ∈ G.edges()
        setLabel(e, UNEXPLORED)
    for all v ∈ G.vertices()
        if getLabel(v) = UNEXPLORED
            DFS(G, v)

Algorithm DFS(G, v)
    Input graph G and a start vertex v of G
    Output labeling of the edges of G
        in the connected component of v
        as discovery edges and back edges
    setLabel(v, VISITED)
    for all e ∈ G.incidentEdges(v)
        if getLabel(e) = UNEXPLORED
            w ← opposite(v,e)
            if getLabel(w) = UNEXPLORED
                setLabel(e, DISCOVERY)
                DFS(G, w)
            else
                setLabel(e, BACK)
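The two DFS procedures can be sketched in Python as below. The representation is an assumption: vertices are names, edges are (u, v) pairs identified by their index in a list, labels are plain strings, and an adjacency list of edge indices plays the role of `G.incidentEdges(v)`. The example graph is hypothetical, since the edges of the graph in the slide figure are not recoverable.

```python
def dfs_label_edges(vertices, edges):
    """Label each undirected edge DISCOVERY or BACK, following Algorithm DFS(G)."""
    incident = {v: [] for v in vertices}        # vertex -> indices of incident edges
    for k, (u, v) in enumerate(edges):
        incident[u].append(k)
        if v != u:
            incident[v].append(k)

    vlabel = {v: "UNEXPLORED" for v in incident}
    elabel = ["UNEXPLORED"] * len(edges)

    def opposite(v, k):
        u, w = edges[k]
        return w if u == v else u

    def dfs(v):                                 # Algorithm DFS(G, v)
        vlabel[v] = "VISITED"
        for k in incident[v]:
            if elabel[k] == "UNEXPLORED":
                w = opposite(v, k)
                if vlabel[w] == "UNEXPLORED":
                    elabel[k] = "DISCOVERY"
                    dfs(w)
                else:
                    elabel[k] = "BACK"

    for v in incident:                          # restart in every unexplored component
        if vlabel[v] == "UNEXPLORED":
            dfs(v)
    return elabel


edges = [("A", "B"), ("A", "D"), ("A", "E"), ("B", "C"), ("C", "D"), ("C", "E")]
print(dfs_label_edges("ABCDE", edges))
# ['DISCOVERY', 'BACK', 'BACK', 'DISCOVERY', 'DISCOVERY', 'DISCOVERY']
```

In a connected graph the DISCOVERY edges form a spanning tree (n - 1 of them, here 4), and every BACK edge closes a cycle, which is how DFS can be used to find a cycle.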
Example
[Figure: DFS traversal of a graph with vertices A, B, C, D, E, starting from A. Legend: unexplored vertex, visited vertex, unexplored edge, discovery edge, back edge]
Example (cont.)
[Figure: continuation of the DFS traversal on vertices A, B, C, D, E]
DFS and Maze Traversal
• The DFS algorithm is
similar to a classic
strategy for exploring a
maze
– We mark each
intersection, corner and
dead end (vertex) visited
– We mark each corridor
(edge) traversed
– We keep track of the
path back to the entrance
(start vertex) by means
of a rope (recursion stack)
Analysis of DFS
• Setting/getting a vertex/edge label takes O(1) time
• Each vertex is labeled twice
– once as UNEXPLORED
– once as VISITED
• Each edge is labeled twice
– once as UNEXPLORED
– once as DISCOVERY or BACK
• Method incidentEdges is called once for each vertex
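• DFS runs in O(n + m) time provided the graph is
represented by the adjacency list structure
– Recall that $\sum_v \deg(v) = 2m$
– With an adjacency list, incidentEdges(v) takes O(deg(v)) time,
so the total time is $O(n + \sum_v \deg(v)) = O(n + m)$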