Longest common subsequence
• Definition 1: Given a sequence X = x_1 x_2 ... x_m, another sequence Z = z_1 z_2 ... z_k is a subsequence of X if there exists a strictly increasing sequence i_1, i_2, ..., i_k of indices of X such that for all j = 1, 2, ..., k, we have x_{i_j} = z_j.
• Example 1: If X=abcdefg, Z=abdg is a subsequence of X.
    X = a b c d e f g
    Z = a b   d     g
2016/5/29 chapter25 1

• Definition 2: Given two sequences X and Y, a sequence Z is a common subsequence of X and Y if Z is a subsequence of both X and Y.
• Example 2: X=abcdefg and Y=aaadgfd. Z=adf is a common subsequence of X and Y.

• Definition 3: A longest common subsequence (LCS) of X and Y is a common subsequence of X and Y with the longest length. (The length of a sequence is the number of letters in the sequence.)
• A longest common subsequence may not be unique.
• Example: for abcd and acbd, both acd and abd are LCSs.

Longest common subsequence problem
• Input: Two sequences X = x_1 x_2 ... x_m and Y = y_1 y_2 ... y_n.
• Output: a longest common subsequence of X and Y.
• A brute-force approach: Suppose that m ≤ n. Try all subsequences of X (there are 2^m of them), test whether each one is also a subsequence of Y, and select the longest one that passes. Each test takes O(n) time, so the total is O(n·2^m).

LCS: Applications
• Compare two versions of the source code of the same program.
• Unix command: diff, for comparing text files.

Characterizing a longest common subsequence
• Theorem (Optimal substructure of an LCS): Let X = x_1 x_2 ... x_m and Y = y_1 y_2 ... y_n be two sequences, and let Z = z_1 z_2 ... z_k be any LCS of X and Y.
  1. If x_m = y_n, then z_k = x_m = y_n and Z[1..k-1] is an LCS of X[1..m-1] and Y[1..n-1].
  2. If x_m ≠ y_n, then z_k ≠ x_m implies that Z is an LCS of X[1..m-1] and Y.
  3. If x_m ≠ y_n, then z_k ≠ y_n implies that Z is an LCS of X and Y[1..n-1].

The recursive equation
• Let c[i,j] be the length of an LCS of X[1..i] and Y[1..j].
• c[i,j] can be computed as follows:

$$
c[i,j] =
\begin{cases}
0 & \text{if } i=0 \text{ or } j=0,\\
c[i-1,j-1]+1 & \text{if } i,j>0 \text{ and } x_i=y_j,\\
\max\{c[i,j-1],\,c[i-1,j]\} & \text{if } i,j>0 \text{ and } x_i\neq y_j.
\end{cases}
$$

Computing the length of an LCS
• There are nm entries c[i,j] with i,j>0. Each one depends only on c[i-1,j-1], c[i-1,j] and c[i,j-1], so we can compute them in a specific order: row by row, left to right.

The algorithm to compute an LCS
for i=1 to m do
    c[i,0]=0;
for j=0 to n do
    c[0,j]=0;
for i=1 to m do
    for j=1 to n do
    {
        if x[i]==y[j] then
            c[i,j]=c[i-1,j-1]+1;
            b[i,j]=1;
        else if c[i-1,j]>=c[i,j-1] then
            c[i,j]=c[i-1,j];
            b[i,j]=2;
        else
            c[i,j]=c[i,j-1];
            b[i,j]=3;
    }
b[i,j] stores the directions: 1 = diagonal, 2 = up, 3 = left.

Example 1: X=BDCABA and Y=ABCBDAB.

Constructing an LCS (back-tracking)
• We can find an LCS using the b[i,j]'s.
• We start at b[m,n] and track back until we reach row 0 or column 0 (i=0 or j=0).
• The algorithm to construct an LCS:
  1. i=m;
  2. j=n;
  3. if i==0 or j==0 then exit;
  4. if b[i,j]==1 then { print x[i]; i=i-1; j=j-1; }
  5. else if b[i,j]==2 then i=i-1;
  6. else j=j-1;
  7. Goto Step 3.
• The letters are printed in reverse order (the last letter of the LCS comes out first).
• The time complexity: back-tracking takes O(m+n) steps and filling the tables takes O(nm), so the whole algorithm runs in O(nm) time.

Shortest common supersequence
• Definition: Let X and Y be two sequences. A sequence Z is a common supersequence of X and Y if both X and Y are subsequences of Z.
• Shortest common supersequence (SCS) problem:
  Input: Two sequences X and Y.
  Output: a shortest common supersequence of X and Y.
• Example: X=abc and Y=abb. Both abbc and abcb are shortest common supersequences of X and Y.

Recursive Equation:
• Let c[i,j] be the length of an SCS of X[1..i] and Y[1..j].
• c[i,j] can be computed as follows:

$$
c[i,j] =
\begin{cases}
j & \text{if } i=0,\\
i & \text{if } j=0,\\
c[i-1,j-1]+1 & \text{if } i,j>0 \text{ and } x_i=y_j,\\
\min\{c[i,j-1]+1,\,c[i-1,j]+1\} & \text{if } i,j>0 \text{ and } x_i\neq y_j.
\end{cases}
$$

Assignment 3: (Due week 13, Monday at 7:30 pm)
Question 1: Write a program to compute the SCS for two sequences. Use s1=abcdabbcabddabcd and s2=abbcabbdacbdadbc as the test input.
Backtracking is required, i.e. the program MUST output the shortest common supersequence, not just the length of the SCS.
Question 2: Write a program to calculate the maximum degree of a node in an undirected graph. (1) Use an adjacency matrix to store the graph; (2) use an adjacency list to store the graph; (3) give the time complexity of the two programs. Which one is better? Why? You can use the graph in slide 22 as the test input.

Part-H1 Graphs

Graphs (§ 12.1)
• A graph is a pair (V, E), where
  – V is a set of nodes, called vertices
  – E is a collection of pairs of vertices, called edges
  – Vertices and edges are positions and store elements
• Example:
  – A vertex represents an airport and stores the three-letter airport code
  – An edge represents a flight route between two airports and stores the mileage of the route
(Figure: flight graph whose vertices are the airports PVD, ORD, SFO, LGA, HNL, LAX, DFW and MIA.)

Edge Types
• Directed edge
  – ordered pair of vertices (u,v)
  – first vertex u is the origin
  – second vertex v is the destination
  – e.g., a flight
• Undirected edge
  – unordered pair of vertices (u,v)
  – e.g., a flight route
• Directed graph
  – all the edges are directed
  – e.g., route network
• Undirected graph
  – all the edges are undirected
  – e.g., flight network
(Figure: a directed edge ORD → PVD labeled "flight AA 1206" and an undirected edge between ORD and PVD labeled "849 miles".)

Terminology
• End vertices (or endpoints) of an edge
  – U and V are the endpoints of a
• Edges incident on a vertex
  – a, d, and b are incident on V
• Adjacent vertices
  – U and V are adjacent
• Degree of a vertex
  – X has degree 5
• Parallel edges
  – h and i are parallel edges
• Self-loop
  – j is a self-loop
(Figure: example graph with vertices U, V, W, X, Y, Z and edges a through j.)

Terminology (cont.)
• Path
  – sequence of alternating vertices and edges
  – begins with a vertex
  – ends with a vertex
  – each edge is preceded and followed by its endpoints
• Simple path
  – path such that all its vertices and edges are distinct
• Examples
  – P1=(V,b,X,h,Z) is a simple path
  – P2=(U,c,W,e,X,g,Y,f,W,d,V) is a path that is not simple
(Figure: the example graph with paths P1 and P2 highlighted.)

Terminology (cont.)
• Cycle
  – circular sequence of alternating vertices and edges
  – each edge is preceded and followed by its endpoints
• Simple cycle
  – cycle such that all its vertices and edges are distinct
• Examples (↵ marks the return to the starting vertex)
  – C1=(V,b,X,g,Y,f,W,c,U,a,↵) is a simple cycle
  – C2=(U,c,W,e,X,g,Y,f,W,d,V,a,↵) is a cycle that is not simple
(Figure: the example graph with cycles C1 and C2 highlighted.)

Adjacency List Structure
• Incidence sequence for each vertex
  – sequence of references to edge objects of incident edges
• Edge objects
  – references to associated positions in incidence sequences of end vertices

Adjacency Matrix Structure (slide 22)
• Augmented vertex objects
  – Integer key (index) associated with vertex
• 2D-array adjacency array
  – Reference to edge object for adjacent vertices
  – "Infinity" for nonadjacent vertices
• A graph with no weights has 0 for no edge and 1 for an edge

Part-H2 Depth-First Search

Depth-First Search (§ 12.3.1)
• Depth-first search (DFS) is a general technique for traversing a graph
• A DFS traversal of a graph G
  – Visits all the vertices and edges of G
  – Determines whether G is connected
  – Computes the connected components of G
  – Computes a spanning forest of G
• DFS on a graph with n vertices and m edges takes O(n + m) time
• DFS can be further extended to solve other graph problems
  – Find and report a path between two given vertices
  – Find a cycle in the graph

DFS Algorithm
• The algorithm uses a mechanism for setting and getting "labels" of vertices and edges

Algorithm DFS(G)
  Input: graph G
  Output: labeling of the edges of G as discovery edges and back edges
  for all u ∈ G.vertices()
    setLabel(u, UNEXPLORED)
  for all e ∈ G.edges()
    setLabel(e, UNEXPLORED)
  for all v ∈ G.vertices()
    if getLabel(v) = UNEXPLORED
      DFS(G, v)

Algorithm DFS(G, v)
  Input: graph G and a start vertex v of G
  Output: labeling of the edges of G in the connected component of v as discovery edges and back edges
  setLabel(v, VISITED)
  for all e ∈ G.incidentEdges(v)
    if getLabel(e) = UNEXPLORED
      w ← opposite(v, e)
      if getLabel(w) = UNEXPLORED
        setLabel(e, DISCOVERY)
        DFS(G, w)
      else
        setLabel(e, BACK)

Example
(Figure: successive steps of DFS on a graph with vertices A, B, C, D, E. Legend: unexplored vertex, visited vertex, unexplored edge, discovery edge, back edge.)

Example (cont.)
(Figure: remaining steps of the same DFS run on vertices A, B, C, D, E.)

DFS and Maze Traversal
• The DFS algorithm is similar to a classic strategy for exploring a maze
  – We mark each intersection, corner and dead end (vertex) visited
  – We mark each corridor (edge) traversed
  – We keep track of the path back to the entrance (start vertex) by means of a rope (recursion stack)

Analysis of DFS
• Setting/getting a vertex/edge label takes O(1) time
• Each vertex is labeled twice
  – once as UNEXPLORED
  – once as VISITED
• Each edge is labeled twice
  – once as UNEXPLORED
  – once as DISCOVERY or BACK
• Method incidentEdges is called once for each vertex
• DFS runs in O(n + m) time provided the graph is represented by the adjacency list structure
  – Recall that Σ_v deg(v) = 2m
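
The labeling scheme of DFS(G) and DFS(G, v) above can be sketched in Python roughly as follows. This is a minimal sketch, not the slides' own implementation: the function name dfs_label, the edge-list input format and the six-vertex sample graph are assumptions made for illustration. Edges are identified by their index in the edge list, so parallel edges and self-loops get their own labels.

```python
from collections import defaultdict

UNEXPLORED, VISITED = "UNEXPLORED", "VISITED"
DISCOVERY, BACK = "DISCOVERY", "BACK"


def dfs_label(vertices, edges):
    """Label each edge of an undirected graph as DISCOVERY or BACK.

    vertices: list of hashable vertex names (hypothetical input format)
    edges:    list of (u, v) pairs; the list index identifies the edge
    Returns (edge_label, components), where components lists the vertices
    of each connected component in the order DFS visited them.
    """
    # Adjacency-list structure: for each vertex, the ids of its incident edges.
    incident = defaultdict(list)
    for e, (u, v) in enumerate(edges):
        incident[u].append(e)
        if v != u:  # store a self-loop only once
            incident[v].append(e)

    vertex_label = {u: UNEXPLORED for u in vertices}
    edge_label = [UNEXPLORED] * len(edges)

    def opposite(v, e):
        a, b = edges[e]
        return b if a == v else a

    def visit(v, component):
        # Corresponds to DFS(G, v) in the pseudocode.
        vertex_label[v] = VISITED
        component.append(v)
        for e in incident[v]:
            if edge_label[e] == UNEXPLORED:
                w = opposite(v, e)
                if vertex_label[w] == UNEXPLORED:
                    edge_label[e] = DISCOVERY
                    visit(w, component)
                else:
                    edge_label[e] = BACK

    # Corresponds to DFS(G): restart from every still-unexplored vertex.
    components = []
    for v in vertices:
        if vertex_label[v] == UNEXPLORED:
            component = []
            visit(v, component)
            components.append(component)
    return edge_label, components


# Illustrative graph (not the one from the slides): F is an isolated vertex.
sample_vertices = ["A", "B", "C", "D", "E", "F"]
sample_edges = [("A", "B"), ("A", "D"), ("A", "E"),
                ("B", "C"), ("C", "D"), ("C", "E")]
labels, comps = dfs_label(sample_vertices, sample_edges)
for edge, label in zip(sample_edges, labels):
    print(edge, label)
print(comps)
```

On this sample the discovery edges form a spanning tree of the component containing A (4 edges for 5 vertices), and F ends up in a component of its own. Because visit is recursive, very deep graphs may hit Python's recursion limit; an explicit stack would avoid that.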