UML CS 91.503 Midterm Exam Fall, 2008
MIDTERM EXAM SOLUTIONS

Stats: to be determined later (with ?? points added)
- Minimum:
- Maximum:
- Average:
- Standard Deviation:

1: (5 points) Asymptotic Growth of Functions

(a) (1 point) List the 4 functions below in nondecreasing asymptotic order of growth:

\( (\lg 32)^n \), \( n/3^n \), \( n^3 \lg\lg n \), \( (n \lg n)^2 \)

1) \( n/3^n \) (smallest)
2) \( (n \lg n)^2 \)
3) \( n^3 \lg\lg n \)
4) \( (\lg 32)^n \) (largest)

Rationale: \( \lim_{n \to \infty} n/3^n = 0 \), so \( n/3^n \) is the smallest. \( (n \lg n)^2 = n^2 \lg^2 n \in O(n^3 \lg\lg n) \) because \( \lg^2 n \in O(n \lg\lg n) \). \( (\lg 32)^n = 5^n \); this exponential function dominates the other 3 functions.

Parts (b) through (e) concern functions \( f_1, \ldots, f_4 \) with:

1) \( f_1(n) \in \Theta(n^2) \)
2) \( f_2(n) \in \Omega(n^3) \)
3) \( f_3(n) \in O((n \lg n)^2) \)
4) \( f_4(n) \quad (n/3^n) \)

[Figure: growth-rate scale marking \( n/3^n \), \( n^2 \), \( (n \lg n)^2 \), \( n^3 \), \( n^3 \lg\lg n \), and \( (\lg 32)^n \), with \( f_1(n) \) through \( f_4(n) \) placed on it.]

b) (1 point) \( f_1(n) \in O(f_2(n)) \)   Answer: TRUE

Proof: \( f_1(n) \in \Theta(n^2) \Rightarrow f_1(n) \in O(n^2) \) [1] by the definition of the \( \Theta \) operator. \( n^2 \in O(n^3) \) [2]. Applying transitivity to [1] and [2] yields \( f_1(n) \in O(n^3) \) [3]. Now, via transpose symmetry, from \( f_2(n) \in \Omega(n^3) \) we have \( n^3 \in O(f_2(n)) \) [4]. Applying transitivity to [3] and [4] yields \( f_1(n) \in O(f_2(n)) \).

c) (1 point) \( f_3(n) \quad (f_4(n)) \)   Answer: FALSE

Counter-example: \( f_3(n) = n \) and \( f_4(n) = n^3 \).

d) (1 point) \( f_3(n) \in O(n^3) \)   Answer: TRUE

Proof: \( f_3(n) \in O((n \lg n)^2) \) combined transitively with \( (n \lg n)^2 \in O(n^3) \) yields \( f_3(n) \in O(n^3) \).

e) (1 point) \( f_1(n) \quad (f_4(n)) \)   Answer: FALSE

Counter-example: \( f_1(n) = n^2 \) and \( f_4(n) = n^3 \).

2: (5 points) Recurrence

Find a tight upper bound on the closed-form solution for the following recurrence: \( T(n) = 3T(n-1) + n \), where \( T(n) \) is constant for sufficiently small \( n \). That is, find a function \( g(n) \) such that \( T(n) \in O(g(n)) \).

Solution: The Master Theorem does not apply here. A recursion tree can be used. The tree has \( n+1 \) levels if \( T(0) = 1 \). \( 3^i (n-i) \) work is done at the \( i \)th level, except for the bottom level, where \( 3^n T(0) = 3^n \) work is done (thanks to Jane for pointing out the work at the bottom level).
The total work is (using closed-form solutions to the summations):

\[ \sum_{i=0}^{n-1} 3^i (n-i) + 3^n = n \sum_{i=0}^{n-1} 3^i - \sum_{i=0}^{n-1} i\,3^i + 3^n = \frac{n(3^n - 1)}{2} - \frac{(n-1)3^{n+1} - n3^n + 3}{4} + 3^n = \frac{7}{4} 3^n - \frac{n}{2} - \frac{3}{4} \in O(3^n). \]

Thus, \( T(n) \in O(3^n) \).

3: (5 points) Analyze Pseudocode

Mystery1 has one argument: a positive integer value n.

    Mystery1(n)
        print "Mystery1 called with n" n
        if n ≤ 1
            then return
        for i ← 1 to 3
            do Mystery2(n/4)
        return

    Mystery2(n)
        print "Mystery2 called with n" n
        if n ≤ 1
            then return
        Mystery1(n/4)
        return

Derive a tight upper bound on Mystery1's worst-case asymptotic running time as a function of n.

Solution: In the worst case, let n be a power of 4. Each call to Mystery1 does constant work and makes 3 calls to Mystery2, each of which does constant work and calls Mystery1 on n/16:

\[ T(n) = \Theta(1) + 3(\Theta(1) + T(n/16)) = 3T(n/16) + \Theta(1). \]

Case 1 of the Master Theorem applies, yielding \( T(n) \in \Theta(n^{\log_{16} 3}) \). So, a tight upper bound is \( T(n) \in O(n^{\log_{16} 3}) \).

4: (35 points) Design an Algorithm: Shortest Paths

This problem is from Introduction to Algorithms: A Creative Approach, by Udi Manber.

Let G = (V, E) be an unweighted, directed graph. Let v and w be two vertices of G. Design an efficient algorithm that finds the number of different shortest paths (not necessarily vertex-disjoint) between v and w. Make sure that you provide pseudocode, correctness justification and running time analysis for your algorithm.

a) (12 points) Pseudocode: BFS-Count(G, s) is called once with s = v.

    BFS-Count(G, s)
    1   for each vertex u ∈ V[G] − {s}
    2       do color[u] ← WHITE
    3          nShortestPaths[u] ← 0
    4          d[u] ← ∞
    5   color[s] ← GRAY
    6   d[s] ← 0
    7   nShortestPaths[s] ← 0
    8   Q ← ∅
    9   ENQUEUE(Q, s)
    10  while Q ≠ ∅
    11      do u ← DEQUEUE(Q)
    12         for each vertex v ∈ Adj[u]
    13             do if color[v] = WHITE
    14                   then color[v] ← GRAY
    15                        d[v] ← d[u] + 1
    16                        nShortestPaths[v] ← 1
    17                        ENQUEUE(Q, v)
    18                   else if d[v] = d[u] + 1
    19                           then nShortestPaths[v] ← nShortestPaths[v] + nShortestPaths[u]
    20         color[u] ← BLACK

b) (12 points) Correctness

i) Mechanical: (4 points) The for loops in lines 1-4 and 12-19 terminate.
In lines 1-4 the loop visits each vertex (except the source) once. In lines 12-19 the loop visits each element of an adjacency list whose length is finite. The while loop in lines 10-20 terminates because each vertex is ENQUEUE'd only once and is eventually DEQUEUE'd. Arrays color, d, and nShortestPaths stay within bounds.

ii) "As Advertised": (8 points) BFS-Count(G, s) uses a modified breadth-first search starting at vertex s. (Note: the vertex v inside BFS-Count should not be confused with the v in the high-level call BFS-Count(G, v).) It is similar to the BFS procedure on p. 532 of our textbook, except that the predecessor array is not used and lines 3, 7, 16, 18, and 19 are introduced to keep track of the number of shortest paths.

Upon termination of BFS-Count(G, s), for each vertex \( x \ne s \), d[x] contains the length of a shortest path from s to x. That is true due to Theorem 22.5. We claim that, upon termination, nShortestPaths[x] contains the number of shortest paths from s to x. This can be shown by induction, where the inductive hypothesis is that, at the end of each iteration of the while loop, nShortestPaths[v] contains the number of shortest paths from s to v discovered by BFS so far, for each vertex v adjacent to u.

In lines 3 and 7, each element of nShortestPaths is initialized to 0. As a base case, at the end of the first iteration of the while loop, each vertex v adjacent to s, having been WHITE, will have nShortestPaths[v] = 1; this correctly represents the shortest path of length 1 from s to v.

For the inductive step we consider what occurs during some iteration of the while loop. When a vertex v is first discovered, this means that the first shortest path in the BFS tree from s to v has been identified; nShortestPaths[v] is therefore set to 1 in line 16.
If v has previously been encountered when we arrive at line 12, then we must check whether we have just discovered another shortest path; this is done with the test in line 18. If we have discovered another shortest path, then we perform the assignment in line 19. Note that our inductive hypothesis guarantees that nShortestPaths[v] contains the number of shortest paths discovered so far from s to v and that nShortestPaths[u] contains the number of shortest paths discovered so far from s to u. Thus, the addition in line 19 yields the correctly updated number of shortest paths discovered so far from s to v. This completes the induction.

Upon termination, nShortestPaths contains the number of shortest paths from s to each other vertex, which guarantees that we find the number of shortest paths from s to the original target vertex w.

c) (11 points) Analysis: Derive the tightest upper bound that you can on the worst-case asymptotic running time of your pseudocode.

The worst-case asymptotic running time of BFS-Count(G, s) is in \( O(|V| + |E|) \). This is because:
- the worst-case asymptotic running time of BFS(G, s) is in \( O(|V| + |E|) \);
- a constant number of operations has been removed;
- a constant number of constant-time operations have been inserted, without creating new loops, function calls, or any recursion.

5: (10 points) Amortized Analysis

This problem uses the journal paper "Fast Hierarchical Clustering and Other Applications of Dynamic Closest Pairs" by David Eppstein. Prove that the total potential (defined on p. 6 of the paper) cannot be negative, regardless of the sequence of insertion and deletion operations (and any resulting merge operations). That is, show that:

\[ n^2 \log n - \sum_{i} \Phi_i \ge 0 \]

(Note: Be sure to show that the initial total potential is also non-negative.)

Solution: When all the points are initially put into \( S_1 \) at the beginning, then \( k = 1 \) and \( |S_1| = n \). Thus, \( n |S_1| \log |S_1| = n^2 \log n \) and \( n^2 \log n - n^2 \log n = 0 \).
In general, we know that \( \sum_{i=1}^{k} |S_i| = n \), so \( \log |S_i| \le \log n \). This implies that

\[ \sum_{i=1}^{k} |S_i| \log |S_i| \le \sum_{i=1}^{k} |S_i| \log n = n \log n. \]

Transitively, \( \sum_{i=1}^{k} |S_i| \log |S_i| \le n \log n \), which means that \( \sum_{i=1}^{k} n |S_i| \log |S_i| \le n^2 \log n \). Thus,

\[ n^2 \log n - \sum_{i=1}^{k} \Phi_i = n^2 \log n - \sum_{i=1}^{k} n |S_i| \log |S_i| \ge n^2 \log n - n^2 \log n = 0. \]

Therefore the total potential is non-negative.

6: (10 points) Flows

This problem uses the flow network \( G' = (V', E') \) described in Section 26.3 on pp. 665-666 of our textbook for finding a maximum bipartite matching in an undirected bipartite graph \( G = (V, E) \). Here we are given an integer \( x \ge 1 \) and we modify \( G' \) to form the flow network \( G'' \). \( G'' \) is identical to \( G' \) except that each edge in \( G'' \) has capacity \( x \) instead of capacity 1. Does this change the results established in Section 26.3? Discuss if, how and why.

Solution: Even though the edge capacities are all equal integers, some of the results of Section 26.3 change or need additional explanation. In particular, the proof of Lemma 26.10 (converse direction) needs additional justification. In that proof, the fact that the flow was integer-valued and the capacity of each edge was 1 was used to argue that, for each vertex \( u \in L \), one unit of positive flow could enter on at most one edge and leave on at most one edge. This was critical to proving that the set of edges being considered was a matching.

Section 26.3 relies on the Ford-Fulkerson method, which, in its most general form (FORD-FULKERSON-METHOD on p. 651), just keeps finding an augmenting path and increasing flow accordingly. This method allows the following type of situation to occur, as illustrated in Justin's diagram below.
Here a maximum flow (although having the integrality property) splits the flow coming out of a vertex (vertex a):

[Figure: Justin's diagram of a flow network with source s, sink t, and vertices a, b, c, x, y. Every edge has capacity 2 and is labeled flow/capacity (1/2, 2/2, or 0/2); the flow leaving vertex a is split across two edges.]

As a result, Corollary 26.12, which showed that the cardinality of a maximum matching M in G equals the value of a maximum flow f in G', is no longer substantiated.

However, if we consider the more specialized FORD-FULKERSON procedure on p. 658, then we can say (this is Jane's point of view) that because FORD-FULKERSON forces the flow along an augmenting path to equal the minimum capacity on the path, and that capacity will be x, the flow will not be split across edges emanating from a vertex of L, and Lemma 26.10 will still hold. So, depending on your interpretation of "Ford-Fulkerson method", one could successfully argue either for or against the validity of this part of the converse direction of Lemma 26.10.

Regardless of one's interpretation, Jane correctly points out that the wording of both Lemma 26.10 and Corollary 26.12 needs to change slightly, because the value of the maximum flow is now \( |f| = x|M| \) rather than just \( |f| = |M| \). The cardinality of the matching is therefore \( |M| = |f| / x \). Note that because x is an integer, the integrality theorem (Theorem 26.11) still holds.

7: (30 points) Dynamic Programming

This problem is adapted from the book Research and Education Association Problem Solvers: Operations Research.

The task is to plan a production schedule for expensive wireless sensors over a 4-month period of time from November, 2008 through February, 2009. The goal is to meet demand while minimizing cost. The company has demand forecasts for each of the 4 months, given below:

    Month      Month Index   Demand (in thousands)
    November   1             4
    December   2             1
    January    3             3
    February   4             2

A schedule is represented by \( (x_1, x_2, x_3, x_4) \), where \( x_j \) represents the number of wireless sensors (in thousands) produced during month j.
A schedule is feasible if it meets demand. That is, the following constraints must be satisfied:

\[ x_1 \ge 4 \]
\[ x_1 + x_2 \ge 5 \]
\[ x_1 + x_2 + x_3 \ge 8 \]
\[ x_1 + x_2 + x_3 + x_4 = 10 \]

Note that the last constraint is an equality to avoid over-producing.

The costs are:
- $7,000 per sensor that is produced;
- $40,000 for each month that has a production run (set-up cost);
- $10,000 per month for each sensor that is produced during one month but shipped during a later month (carry-over or storage cost).

For example, for the schedule (5, 0, 3, 2) the total cost (in thousands of dollars) would be \( 5 \cdot 7 + 0 \cdot 7 + 3 \cdot 7 + 2 \cdot 7 = 70 \) for the units produced, plus \( 3 \cdot 40 = 120 \) for the 3 set-ups (November, January and February), plus \( 1 \cdot 10 = 10 \) for the surplus produced in November and shipped one month later, in December (December's demand is 1 and nothing is produced that month), for a total cost of $200,000.

Formulate a minimal cost expression recursively. Justify your answer by demonstrating optimal substructure. You do not need to solve your expression to obtain an optimal schedule for this particular problem instance.

Solution: Pseudocode was not requested, so we just set up the recursive cost formulation and demonstrate optimal substructure. All units here are expressed in thousands.

Let \( z_j \) be the number of units on hand at the start of the \( j \)th month. Then \( z_1 = 0 \), \( z_2 = x_1 - 4 \), \( z_3 = z_2 + x_2 - 1 \), \( z_4 = z_3 + x_3 - 3 \), and \( z_4 + x_4 - 2 = 0 \). (Note that we use the last equation to avoid over-producing at the end.) Let \( d_j \) be the demand for month j. Let \( P_j(z_j) \) be the cost, taking into account production decisions for months j, ..., 4. The recursive cost formulation is:

\[ P_j(z_j) = \min_{x_j} \{\, 7x_j + 10(z_j + x_j - d_j) + 40\,\delta(x_j) + P_{j+1}(z_j + x_j - d_j) \,\} \]

where

\[ \delta(x_j) = \begin{cases} 1 & \text{if } x_j > 0 \\ 0 & \text{otherwise} \end{cases} \]

and the following constraints are imposed for each j in order to satisfy demand and not over-produce:

\[ z_j + x_j \ge d_j \quad \text{and} \quad z_j + x_j \le d_j + d_{j+1} + \cdots + d_4. \]

(We assume that \( P_5(0) = 0 \).)

The book does not prove optimal substructure, but we do it here. To justify the cost formulation, we examine each part of it.
Since the goal is to minimize cost, we minimize the overall cost expression. (Note that, for \( P_j(z_j) \), we must choose a combination of values for \( x_j \) and \( z_j \) that minimizes the cost expression.) The \( 7x_j \) part is the "per-sensor" cost, and \( 40\,\delta(x_j) \) is the production set-up cost. \( 10(z_j + x_j - d_j) \) represents the storage charge. Finally, we discuss \( P_{j+1}(z_j + x_j - d_j) \), which we claim exhibits optimal substructure.

Assuming that we abide by the constraints, we can apply a cut-and-paste proof by contradiction to establish optimality for \( P_{j+1}(z_j + x_j - d_j) \). Let \( z'_j, x'_j \) be values that minimize the \( P_j(z_j) \) expression, so that \( P_j(z'_j) \) is optimal. By way of contradiction, suppose that there was a better way to make decisions for months (j+1), ..., 4 for \( z'_j + x'_j - d_j \); call this \( P'_{j+1}(z'_j + x'_j - d_j) \). This would yield

\[ 7x'_j + 10(z'_j + x'_j - d_j) + 40\,\delta(x'_j) + P'_{j+1}(z'_j + x'_j - d_j) < 7x'_j + 10(z'_j + x'_j - d_j) + 40\,\delta(x'_j) + P_{j+1}(z'_j + x'_j - d_j). \]

Since we are minimizing, this would produce a cost \( < P_j(z'_j) \), contradicting the optimality of \( P_j(z'_j) \).

Note that if \( P_j(z_j) \) relied on past months rather than future months, then it would not be clear how much extra to produce in the base case for the first month.
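The exam did not ask for the recursion to be solved, but evaluating it is a useful sanity check on the formulation. The following is a minimal Python sketch, not part of the original solution: it memoizes \( P_j(z_j) \) over integer production amounts (in thousands) and applies the two constraints on \( z_j + x_j \) stated in the formulation. The names `DEMAND`, `stage_cost`, `min_cost`, and `schedule_cost` are hypothetical, and restricting \( x_j \) to integers is an assumption made for illustration.

```python
from functools import lru_cache

# Problem data from the exam; all quantities are in thousands.
DEMAND = (4, 1, 3, 2)   # d_1..d_4
UNIT_COST = 7           # cost per unit produced
SETUP_COST = 40         # cost per month that has a production run
HOLD_COST = 10          # cost per unit carried into the next month


def stage_cost(x, carried):
    """Cost of one month: produce x units, carry `carried` units over."""
    setup = SETUP_COST if x > 0 else 0
    return UNIT_COST * x + HOLD_COST * carried + setup


@lru_cache(maxsize=None)
def min_cost(j, z):
    """Return (cost, schedule) for months j..4 with z units on hand.

    j is 1-based; min_cost(5, 0) is the base case P_5(0) = 0.
    Assumption: production amounts are integers.
    """
    if j > len(DEMAND):
        return (0, ())
    d = DEMAND[j - 1]
    remaining = sum(DEMAND[j - 1:])   # d_j + ... + d_4
    best = None
    # z + x >= d_j meets demand; z + x <= remaining avoids a final surplus.
    for x in range(max(0, d - z), remaining - z + 1):
        carried = z + x - d
        tail_cost, tail_plan = min_cost(j + 1, carried)
        total = stage_cost(x, carried) + tail_cost
        if best is None or total < best[0]:
            best = (total, (x,) + tail_plan)
    return best


def schedule_cost(schedule):
    """Cost of a given schedule (x_1, ..., x_4), for cross-checking."""
    z, total = 0, 0
    for x, d in zip(schedule, DEMAND):
        carried = z + x - d
        if carried < 0:
            raise ValueError("schedule does not meet demand")
        total += stage_cost(x, carried)
        z = carried
    return total


if __name__ == "__main__":
    print(schedule_cost((5, 0, 3, 2)))   # the exam's worked example
    print(min_cost(1, 0))                # (cost, schedule) from the recursion
```

Under these assumptions, `schedule_cost((5, 0, 3, 2))` reproduces the 200 (thousand dollars) of the worked example, and `min_cost(1, 0)` returns a cost of 180 with the schedule (5, 0, 5, 0): two production runs, with one unit carried from November to December and two carried from January to February.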