UML CS 91.503 Midterm Exam Fall, 2003
MIDTERM EXAM SOLUTIONS

1: (6 points) What can you conclude?

1) $f_1(n) \in \Omega(n^3 \lg^4 n)$
2) $f_2(n) \in O(27^{\log_4 n})$
3) $f_3(n) \in \Theta(2^n)$

First, observe that $27^{\log_4 n} \le 27^{\log_3 n} = 3^{3 \log_3 n} = (3^{\log_3 n})^3 = n^3$. This implies: $f_2(n) \in O(n^3)$.

[Figure: growth-rate diagram relating $f_3(n)$, $f_1(n)$, $f_2(n)$ to $2^n$, $n^3 \lg^4 n$, $27^{\log_4 n}$, respectively.]

a) (3 points) Can we conclude from statements (1)-(3) that $f_1(n) \in O(f_3(n))$? Why or why not? Either prove or provide a counterexample.

SOLUTION: No. Counterexample: $f_1(n) = 3^n$, $f_3(n) = 2^n$. Here $f_1$ satisfies (1) and $f_3$ satisfies (3), but $3^n \notin O(2^n)$ because $(3/2)^n$ grows without bound.

b) (3 points) Can we conclude from statements (1)-(3) that $f_1(n) \in \Omega(f_2(n))$? Why or why not? Either prove or provide a counterexample.

SOLUTION: Yes. Proof: We observed above that $f_2(n) \in O(n^3)$. This implies that $n^3 \in \Omega(f_2(n))$. Now, $n^3 \lg^4 n \in \Omega(n^3)$, so by transitivity we have $n^3 \lg^4 n \in \Omega(f_2(n))$. This, together with $f_1(n) \in \Omega(n^3 \lg^4 n)$, implies transitively that $f_1(n) \in \Omega(f_2(n))$.

2: (24 points) This question involves the paper: "On calculating connected dominating set for efficient routing in ad hoc wireless networks." Consider the algorithm that consists of the following two steps:
1) dominating set formation, consisting of the marking process on p. 8;
2) creating all-pair shortest paths for the dominating set resulting from step 1.

a) (2 points) Give an example of a best-case input for this algorithm, where "best" relates to the size of the dominating set (not running time).

SOLUTION: Since the goal is to minimize the size of the dominating set, "best" here means smallest dominating set size. If we allow a complete graph as input, then a complete graph is a best-case input because the algorithm returns an empty dominating set in this case. To see this, observe that, since each pair of nodes is connected, no node has an unconnected pair of neighbors. If we do not allow a complete graph as input, then a best-case $n$-node input has one node of degree $n-1$ that is connected to each other node. Each other node has degree 1.
In this case, the algorithm returns a dominating set consisting of the node of degree $n-1$.

b) (2 points) Give an example of a worst-case input for this algorithm, where "worst" relates to the size of the dominating set (not running time).

SOLUTION: A worst-case $n$-node input produces a maximum-sized dominating set. Such an input consists of a single cycle of nodes. That is, each node has degree 2 and the nodes are connected in a ring shape. In this case, each node's neighbors are unconnected, so each node is part of the dominating set. The dominating set therefore has size $n$.

c) (20 points) Analyze the worst-case asymptotic running time of the algorithm (here "worst" refers to running time). You only need to give an upper bound. Give the smallest upper bound that you can. Wherever necessary, make suitable assumptions about representation so that an efficient implementation is used. Make sure that the variables in your running time formula are chosen to provide a meaningful description of the running time.

SOLUTION: For this analysis we represent the running time in terms of the sizes of the vertex set $V$ and the edge set $E$. We assume that, although the algorithm is distributed, we use a sequential model of computation so that the total running time is the sum of the running time at each vertex. We also assume an adjacency list graph representation so that each open neighbor list is actually an adjacency list. We analyze separately the running time for the marking process and the shortest-path routing:

1) Marking process: This consists of several steps:
a. Initialize marks to F. Time: $\Theta(|V|)$.
b. Initialize data structure $L_v$ for each vertex $v$ that will hold its neighbor lists. $L_v$ is initialized to null. Time: $\Theta(|V|)$.
c. Neighbor list exchange. For each vertex $v$ and for each vertex $u$ in $v$'s adjacency list, this copies the entries of $Adj[u]$ into $L_v$.
Note that because this is a distributed environment, we copy the entire contents of $Adj[u]$ instead of simply copying a pointer to it. In the worst case, after this step each vertex has the entire adjacency list. Time: $O(|V||E|)$.
d. Marking. For this step each vertex $v$ has an auxiliary boolean array $P_v$ of dimensions $|Adj[v]| \times |Adj[v]|$ that records whether or not each pair of vertices in its adjacency list is connected. This part of the algorithm steps through $L_v$, setting the appropriate entry of $P_v$ to 1 if $L_v$ shows that those vertices are connected. Once all of $L_v$ has been processed in this way, the algorithm steps through $P_v$ to see if some pair of vertices in $P_v$ is unconnected. If so, $v$ is marked T. Time: $O(|V||E| + |V|^3)$.

2) Shortest-path routing: For this step we can use either the Floyd-Warshall all-pairs shortest-paths algorithm or Johnson's algorithm. In the former case, the worst-case running time is in $O(|V|^3)$. In the latter case, the time is in $O(|V|^2 \lg |V| + |V||E|)$.

Total worst-case running time is in $O(|V| + |V||E| + |V|^3)$ in the Floyd-Warshall case and $O(|V| + |V||E| + |V|^3 + |V|^2 \lg |V| + |V||E|)$ in the Johnson case, where the $|V|^3$ term in the Johnson case comes from the marking step. Now, in the worst case $|E|$ is in $\Theta(|V|^2)$. The time therefore reduces to $O(|V|^3)$ in each case.

3: (15 points) Flow Networks

Given a flow network $G = (V, E)$ with source $s$ and sink $t$, a cut $(S,T)$ is a partition of $V$ into $S$ and $T = V - S$ such that $s \in S$, $t \in T$. Let $(S,T)$ be a cut of a flow network. Let $C$ be a subset of $T$ such that $t \in C$. Let $D = T - C$. Consider the following statement about the flow network:

$$f(S,C) + f(T,S) = f(D,S) + f(V,D)$$

where $f(X,Y) = \sum_{x \in X} \sum_{y \in Y} f(x,y)$ and $f(x,y)$ denotes the flow from vertex $x$ to vertex $y$. Either prove the statement or provide a counterexample.

SOLUTION: The statement is TRUE. Proof: First note that flow conservation implies that $\forall d \in D$, $f(d,V) = 0$, since neither $s$ nor $t$ is in $D$. Summing these 0 values yields: $f(D,V) = 0$.
By part 2 of Lemma 26.1 we therefore have: $f(V,D) = 0$. Now, observe that since $C$ and $D$ are disjoint, part 3 of Lemma 26.1 yields: $f(S,C) + f(S,D) = f(S,T)$. Applying part 2 of Lemma 26.1 to this (twice) gives: $f(S,C) - f(D,S) = -f(T,S)$. Since $f(V,D) = 0$, this is equivalent to: $f(S,C) - f(D,S) = -f(T,S) + f(V,D)$. Rearranging terms yields: $f(S,C) + f(T,S) = f(D,S) + f(V,D)$, thus completing the proof.

4: (15 points) Amortized Analysis. A sequence of $n$ operations is performed on a data structure. The $i$-th operation costs $i$ if $i$ is an exact power of 4, and 1 otherwise.

a) (8 points) Use aggregate analysis to determine the amortized cost per operation.

SOLUTION: The aggregate cost is found by summing the actual costs of the sequence of $n$ operations. We first observe that in a sequence of $n$ operations there are $\lfloor \log_4 n \rfloor + 1$ exact powers of 4. The sum of costs is in 2 parts:
1) the cost of the 1's, which is $\sum_{i=1}^{n} 1 - \sum_{i=0}^{\lfloor \log_4 n \rfloor} 1$, where the subtracted sum is needed because the cost of 1 is not paid for exact powers of 4;
2) the cost of the $\lfloor \log_4 n \rfloor + 1$ exact powers of 4, which is $\sum_{i=0}^{\lfloor \log_4 n \rfloor} 4^i$.

The aggregate cost is therefore

$$\sum_{i=1}^{n} c_i = n - (\lfloor \log_4 n \rfloor + 1) + \sum_{i=0}^{\lfloor \log_4 n \rfloor} 4^i \le n + \frac{4^{\lfloor \log_4 n \rfloor + 1} - 1}{3} \le n + \frac{4 \cdot 4^{\log_4 n} - 1}{3} = n + \frac{4n - 1}{3} \le 3n.$$

To find the amortized cost per operation we divide by $n$. This yields at most 3.

b) (7 points) Use the accounting method to determine the amortized cost per operation.

SOLUTION: Assign an amortized cost of 3 to each operation in the sequence. This includes 1 to pay for itself and 2 to help pay for the operations whose sequence numbers are a power of 4. Note that the problem did not specify different types of operations. As a result, the solution cannot assign different amortized costs to different operations in the sequence. To complete the solution we must show that an amortized cost of 3 guarantees that the credit will never be negative.
That is, we must show that: $\sum_{i=1}^{n} \hat{c}_i \ge \sum_{i=1}^{n} c_i$. This can be shown in 2 different ways.

The first way is to use the results of (a) as follows. If a cost of 3 is assigned to each operation, then $\sum_{i=1}^{n} \hat{c}_i = 3n$. Part (a) showed that $\sum_{i=1}^{n} c_i \le 3n$. The inequality $\sum_{i=1}^{n} \hat{c}_i \ge \sum_{i=1}^{n} c_i$ then follows from transitivity.

The second way is to calculate how much savings is needed in the gaps between powers of 4 in order to pay for the powers of 4, and then show that the amount saved is enough. The size of the gap between $4^{i-1}$ and $4^i$ is $4^i - 4^{i-1} = 4^{i-1}(4 - 1) = 3 \cdot 4^{i-1}$. Let $x$ be the amount of savings required for each item in the gap. Then the credit will be sufficient if: $3x \cdot 4^{i-1} \ge 4^i \iff 3x \ge 4 \iff x \ge \frac{4}{3}$. The savings of 2 for each item in the gap is therefore sufficient. As a result, the amortized cost of 3 suffices.

5: (40 points) Sequence Alignment

Consider two sequences of characters: $X = \langle x_1, x_2, \ldots, x_m \rangle$ and $Y = \langle y_1, y_2, \ldots, y_n \rangle$. The sequence alignment problem here asks for the optimal alignment of characters of $X$ with characters of $Y$, where optimality means maximal total cost. Gaps are allowed. The cost assumptions for an aligned pair of characters $x_i$ and $y_j$ are:
- if $x_i = y_j$, then this pair contributes cost c1
- if $x_i \ne y_j$, then this pair contributes cost c2
- if a character of one sequence is aligned with a gap in the other sequence, then this character contributes cost c3

Design an algorithm that solves the sequence alignment problem.
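To make the cost model concrete, the following minimal Python sketch scores one given alignment under the three rules above. The function name `alignment_cost`, the use of '-' as the gap symbol, and the sample values c1 = 2, c2 = -1, c3 = -2 are illustrative assumptions, not part of the exam.

```python
def alignment_cost(row_x, row_y, c1, c2, c3, gap="-"):
    """Total cost of an alignment given as two equal-length rows.

    row_x and row_y hold the characters of X and Y, with `gap` marking a
    gap (the '-' symbol is an assumption made for this sketch).
    """
    if len(row_x) != len(row_y):
        raise ValueError("alignment rows must have equal length")
    total = 0
    for a, b in zip(row_x, row_y):
        if a == gap and b == gap:
            raise ValueError("a column cannot pair a gap with a gap")
        if a == gap or b == gap:
            total += c3  # character aligned with a gap
        elif a == b:
            total += c1  # matching pair
        else:
            total += c2  # mismatching pair
    return total


# Hypothetical sample costs: reward matches, penalize mismatches and gaps.
# Columns: A/A, C/C, -/T, G/G, T/A  ->  2 + 2 - 2 + 2 - 1
print(alignment_cost("AC-GT", "ACTGA", c1=2, c2=-1, c3=-2))  # prints 3
```

The alignment problem then asks for the pair of rows that maximizes this total over all valid alignments of X and Y.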
a) (14 points) Pseudocode

SOLUTION:

SeqAlign(X, Y, c1, c2, c3)
    m ← length[X]
    n ← length[Y]
    for i ← 0 to m
        do c[i, 0] ← 0
    for j ← 0 to n
        do c[0, j] ← 0
    for i ← 1 to m
        do for j ← 1 to n
            do t1 ← c1 + c[i-1, j-1]
               t2 ← c2 + c[i-1, j-1]
               t3 ← c3 + c[i-1, j]
               t4 ← c3 + c[i, j-1]
               if xi = yj
                   then x ← max(t1, t3, t4)
                        if x = t1
                            then c[i, j] ← t1
                                 b[i, j] ← 1
                        else if x = t3
                            then c[i, j] ← t3
                                 b[i, j] ← 3
                        else if x = t4
                            then c[i, j] ← t4
                                 b[i, j] ← 4
                   else x ← max(t2, t3, t4)
                        if x = t2
                            then c[i, j] ← t2
                                 b[i, j] ← 2
                        else if x = t3
                            then c[i, j] ← t3
                                 b[i, j] ← 3
                        else if x = t4
                            then c[i, j] ← t4
                                 b[i, j] ← 4
    DumpSeqAlign(b, X, Y, m, n, alignX, alignY, k)

Note: To make the recursion work properly we assume that the parameter k is passed by reference to DumpSeqAlign.

DumpSeqAlign(b, X, Y, i, j, alignX, alignY, k)
    if i = 0 or j = 0
        then return
    if b[i, j] = 1 or b[i, j] = 2
        then DumpSeqAlign(b, X, Y, i-1, j-1, alignX, alignY, k)
             alignX[k] ← X[i]
             alignY[k] ← Y[j]
             k ← k + 1
    else if b[i, j] = 3
        then DumpSeqAlign(b, X, Y, i-1, j, alignX, alignY, k)
             alignX[k] ← X[i]
             alignY[k] ← ' '
             k ← k + 1
    else if b[i, j] = 4
        then DumpSeqAlign(b, X, Y, i, j-1, alignX, alignY, k)
             alignX[k] ← ' '
             alignY[k] ← Y[j]
             k ← k + 1

b) (13 points) Correctness

SOLUTION: The pseudocode is a dynamic programming algorithm that is very similar to the LCS algorithm on p. 353-355 of the text. As such, it inherits much of its correctness from the correctness of the LCS pseudocode. The main correctness task is to show that an optimal sequence alignment has optimal substructure. This can be established using a result similar to Theorem 15.1 on p. 351. This then leads to a recursive optimal cost calculation similar to Eq. 15.14 on p. 352.
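For readers who want to run the algorithm from part (a), a minimal Python sketch of the SeqAlign and DumpSeqAlign pseudocode follows. It keeps the pseudocode's choices (boundary cells initialized to 0, direction codes 1 to 4, the same tie-breaking order, traceback stopping when i = 0 or j = 0). Two things are assumptions of this sketch rather than the exam's solution: the traceback is written as a loop instead of a recursion with a by-reference k, and the gap is shown as '-' instead of a blank. The name `seq_align` is hypothetical.

```python
def seq_align(X, Y, c1, c2, c3, gap="-"):
    """Fill the cost table c and direction table b, then trace back.

    Returns (optimal cost, aligned X row, aligned Y row).
    """
    m, n = len(X), len(Y)
    # Row 0 and column 0 stay 0, as in the pseudocode.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            match = X[i - 1] == Y[j - 1]
            # Candidates in the pseudocode's tie-breaking order:
            # diagonal (1 = match, 2 = mismatch), up (3), left (4).
            candidates = [
                (c[i - 1][j - 1] + (c1 if match else c2), 1 if match else 2),
                (c[i - 1][j] + c3, 3),
                (c[i][j - 1] + c3, 4),
            ]
            best = max(value for value, _ in candidates)
            for value, direction in candidates:
                if value == best:
                    c[i][j], b[i][j] = value, direction
                    break

    # Iterative traceback; columns are collected back to front.
    row_x, row_y = [], []
    i, j = m, n
    while i > 0 and j > 0:
        if b[i][j] in (1, 2):
            row_x.append(X[i - 1])
            row_y.append(Y[j - 1])
            i, j = i - 1, j - 1
        elif b[i][j] == 3:
            row_x.append(X[i - 1])
            row_y.append(gap)
            i -= 1
        else:
            row_x.append(gap)
            row_y.append(Y[j - 1])
            j -= 1
    return c[m][n], "".join(reversed(row_x)), "".join(reversed(row_y))


# Hypothetical sample costs c1 = 2, c2 = -1, c3 = -1.
print(seq_align("AGT", "AT", 2, -1, -1))  # prints (3, 'AGT', 'A-T')
```

Because the traceback stops at the table boundary, any characters left over at the start of the longer sequence are simply not emitted, which matches the behavior of DumpSeqAlign as written.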
For the optimal substructure theorem, let $Z_k = \langle z_1, z_2, \ldots, z_k \rangle$ be an optimal solution for $X_m$, $Y_n$, where $X_m$, $Y_n$ denote sequences of length $m$ and $n$, respectively. Let $z_i = (x_s, y_t)$ represent a pair of characters, one from $X$ and one from $Y$.

Theorem:
1) If $z_k = (x_m, y_n)$ and (($x_m = y_n$ and c1 $\ge$ c3) or ($x_m \ne y_n$ and c2 $\ge$ c3)) then $Z_{k-1}$ is optimal for $X_{m-1}$ and $Y_{n-1}$.
2) If $z_k = (x_m, ' ')$ and (($x_m = y_n$ and c3 $\ge$ c1) or ($x_m \ne y_n$ and c3 $\ge$ c2)) then $Z_{k-1}$ is optimal for $X_{m-1}$ and $Y_n$.
3) If $z_k = (' ', y_n)$ and (($x_m = y_n$ and c3 $\ge$ c1) or ($x_m \ne y_n$ and c3 $\ge$ c2)) then $Z_{k-1}$ is optimal for $X_m$ and $Y_{n-1}$.

Proof: We prove (1), as the proofs of (2) and (3) are similar to that of (1). We establish (1) using a cut & paste proof by contradiction. By way of contradiction, suppose that $Z_{k-1}$ is not optimal for $X_{m-1}$ and $Y_{n-1}$. Then there exists some optimal $Z'$ for $X_{m-1}$ and $Y_{n-1}$ such that cost($Z'$) > cost($Z_{k-1}$). Now consider $Z'' = Z' + (x_m, y_n)$. If ($x_m = y_n$ and c1 $\ge$ c3), then cost($Z''$) = cost($Z'$) + c1 and cost($Z_k$) = cost($Z_{k-1}$) + c1. Since cost($Z'$) > cost($Z_{k-1}$), this implies that cost($Z''$) > cost($Z_k$), contradicting the optimality of $Z_k$. On the other hand, if ($x_m \ne y_n$ and c2 $\ge$ c3), then cost($Z''$) = cost($Z'$) + c2 and cost($Z_k$) = cost($Z_{k-1}$) + c2. Since cost($Z'$) > cost($Z_{k-1}$), this implies that cost($Z''$) > cost($Z_k$), again contradicting the optimality of $Z_k$. QED.

The Theorem leads to the following recursive cost calculation, where $c[i,j]$ denotes the optimal cost for $X_i$, $Y_j$:

$$c[i,j] = \begin{cases} \max(c[i-1,j-1] + c1,\ c[i-1,j] + c3,\ c[i,j-1] + c3) & \text{if } x_i = y_j \\ \max(c[i-1,j-1] + c2,\ c[i-1,j] + c3,\ c[i,j-1] + c3) & \text{if } x_i \ne y_j \end{cases}$$

SeqAlign() implements the recursive cost expression above. It therefore correctly calculates the optimal alignment cost. This, combined with its similarity to LCS-LENGTH, justifies its correctness.

Finally, the actual alignment $Z_k$ is created in two steps. First, SeqAlign() records the direction corresponding to each cost choice in the b array.
This is similar to the usage of the b array in LCS-LENGTH. Second, DumpSeqAlign() recursively constructs $Z_k$ by storing its characters in character arrays alignX and alignY. The structure of DumpSeqAlign() is similar to that of PRINT-LCS() on p. 355.

c) (13 points) Analysis: Provide as tight an upper bound on the worst-case asymptotic running time as you can.

SOLUTION: The modifications to LCS-LENGTH() to create SeqAlign() add only $\Theta(1)$ time to each iteration of the cost calculation loop. Thus, the worst-case running time of SeqAlign() is in $O(mn)$. Similarly, the modifications to PRINT-LCS() to create DumpSeqAlign() add only $\Theta(1)$ time to each level of the recursion, so the worst-case running time of DumpSeqAlign() is in $O(m+n)$. The total worst-case running time is therefore in $O(mn)$.
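The O(mn) bound can be illustrated by counting subproblems. The sketch below is not the bottom-up SeqAlign() from part (a); it swaps in a top-down, memoized evaluation of the same recurrence for c[i, j] (with boundary cells equal to 0, as in the pseudocode) so that Python's `functools.lru_cache` can report how many distinct (i, j) pairs were solved. The name `optimal_cost` and the sample inputs are hypothetical.

```python
from functools import lru_cache


def optimal_cost(X, Y, c1, c2, c3):
    """Evaluate the recurrence for c[i, j] top-down with memoization.

    Returns (optimal cost, number of distinct (i, j) subproblems solved).
    """
    @lru_cache(maxsize=None)
    def c(i, j):
        if i == 0 or j == 0:
            return 0  # boundary cells, as initialized in SeqAlign
        pair = c1 if X[i - 1] == Y[j - 1] else c2
        return max(c(i - 1, j - 1) + pair,
                   c(i - 1, j) + c3,
                   c(i, j - 1) + c3)

    cost = c(len(X), len(Y))
    return cost, c.cache_info().currsize


X, Y = "ACGTAC", "AGTC"  # m = 6, n = 4 (sample data for this sketch)
cost, solved = optimal_cost(X, Y, c1=2, c2=-1, c3=-2)
# Each cached entry costs constant work beyond its three lookups, and
# there are at most (m + 1)(n + 1) entries.
print(solved, (len(X) + 1) * (len(Y) + 1))
```

Each distinct subproblem is computed once with a constant number of table lookups, so the total work is proportional to the number of table cells, in agreement with the O(mn) bound for SeqAlign().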