CS 415 ALGORITHM ANALYSIS SPRING 2011
Mid-term 1, March 22, 2011, 10 to 12 noon

Name ______________________________

Write the answers in the space provided. You are allowed to use any book/notes. You need not repeat any algorithm that is in the text or was presented in class; just make a reference to the page in the text or in the lecture slides. Clarity (in form as well as content) of presentation is a critical part of your grade. For algorithm design problems, efficient design is a requirement even if it is not explicitly stated.

Problem 1 Problem 2 Problem 3 Problem 4 Problem 5 Problem 6 TOTAL

1) A pipe, 200 meters long and carrying a chemical, is connected to the production plant at one end (the right end) and to a packaging unit at the other (left) end, where the chemical is measured into containers for packaging and sale. From time to time there is a blockage at a single location in the pipe, at some position k from the packaging unit. When this happens, the production unit’s pressure gauge triggers an alarm and the system shuts down. Your job is to locate the spot where the blockage has occurred. There are probes located at positions separated by 1 meter. You can open a probe, and if you detect overflowing chemical, you know that the blockage is to the left; else it is to the right. A probe that causes the chemical to overflow is expensive, since a clean-up must be performed following the test. Thus the management has instructed that the number of such probes be limited to no more than two. What is the minimum number of probes needed to locate the probe position closest to the blockage, subject to this constraint? Describe how you would locate the blockage with the minimum number of probes.

SOLUTION: The problem is essentially the same as minimizing the number of tests performed using 2 jars to find the highest step from which a jar can be dropped without breaking.
We make the following claim: the maximum number of stairs in a ladder for which the above problem can be solved using n tests is n(n + 1)/2. The proof is by induction on n.

Base case: with n = 1, the maximum number of stairs is 1. (Why? Suppose the ladder has 2 stairs. If your single test is from stair 1 and the jar does not break, then clearly you have not solved the problem. Similarly, if the test is from stair 2 and the jar breaks, then you have not solved the problem. A similar argument holds when the number of stairs is greater than 2.)

Induction step: assume that the claim is true for n – 1 tests. We will show it for n tests. Suppose the first test is on stair k. Then the number of stairs below k is at most n – 1. (For, suppose the first test breaks the jar; then the rest of the tests have to be done sequentially in the order 1, 2, 3, …, k – 1. Since you have at most n – 1 additional tests left, k – 1 must be <= n – 1.) Since we are trying to maximize the number of stairs, we should choose k – 1 = n – 1, that is, k = n. If the first test does not break the jar, both jars and n – 1 tests remain, so by the induction hypothesis the number of stairs above k is at most n(n – 1)/2. The maximum total is therefore n(n – 1)/2 + n = n(n + 1)/2.

Now, with n = 19, the maximum number of stairs for which we can solve the problem is 19 x 20/2 = 190, but our ladder has 200 stairs, so we need more than 19 tests. With n = 20, the maximum number of stairs for which we can solve the problem is 20 x 21/2 = 210. Since this number is above 200, 20 is the minimum number of tests needed to solve the problem.

Most of you gave your answer as 27. This answer is far from optimal. An answer of 2 sqrt(N) – 1, where N is the number of stairs, was acceptable for the HW problem. But here the question requires the MINIMUM number of tests, so you should analyze the problem more carefully. (Recall that there was a practice problem where I asked you to work out the minimum number of tests in the case of 100 stairs.)
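The bound and the probing strategy behind it can be checked with a short Python sketch. This is only an illustration under stated assumptions: the names `min_tests`, `locate`, and the oracle `breaks_at` are hypothetical, stairs are numbered 1 to `stairs`, and "the jar breaks" plays the role of "the probe overflows" in the pipe setting.

```python
def min_tests(stairs):
    """Smallest n with n(n + 1)/2 >= stairs (the bound from the claim)."""
    n = 0
    while n * (n + 1) // 2 < stairs:
        n += 1
    return n


def locate(stairs, breaks_at):
    """Find the highest safe stair using at most two breaking tests.

    breaks_at(p) is a hypothetical oracle: True if a drop from stair p
    breaks the jar. Returns (highest safe stair, number of tests used).
    """
    step = min_tests(stairs)   # first jump; later jumps shrink by one
    tests = 0
    safe = 0                   # highest stair known to be safe so far
    probe = step
    # Phase 1: jump by step, step - 1, step - 2, ... until the first break.
    # Because step(step + 1)/2 >= stairs, the top is reached before step hits 0.
    while safe < stairs:
        probe = min(probe, stairs)
        tests += 1
        if breaks_at(probe):
            # Phase 2: one jar left, so scan the gap sequentially.
            for p in range(safe + 1, probe):
                tests += 1
                if breaks_at(p):
                    return p - 1, tests
            return probe - 1, tests
        safe = probe
        step -= 1
        probe = safe + step
    return safe, tests
```

For instance, `locate(200, lambda p: p > 137)` models a case where stair 137 is the highest safe one. If the j-th jump is the one that breaks, the sequential scan needs at most 20 – j further tests, which is why the total never exceeds 20 for 200 stairs.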
2) Given an array A containing n numbers and a number T, you are to find if there exist two indices j and k, 0 <= j, k <= n – 1, such that A[j] + A[k] = T. Design an efficient algorithm to solve this problem. Describe your algorithm informally first, then present pseudo-code and illustrate how your algorithm works step by step with an example where n = 4. Finally, estimate the number of operations performed by your algorithm as a function of n and express the total complexity of the algorithm as a function of n using O notation.

Solution: This problem was discussed in class and a complete solution was presented. There are basically two algorithms.

Algorithm 1: Sort the array. Then for each key A[j], j = 0, 1, 2, …, n – 1, perform a binary search for T – A[j] in the sorted array. If T – A[j] is found in the array for some j, then output YES; if all n binary searches fail, output NO.

Correctness proof: Clearly, if the algorithm outputs YES, such a pair exists. Conversely, if such a pair (j, k) exists, then during the j-th iteration of the loop the binary search will succeed, and hence the program will output YES.

Pseudo-code:
Input: Array A of size n and a number T.
Output: YES if there exist j, k such that A[j] + A[k] = T, NO otherwise.
Step 1: Sort the array using Heap Sort.
Step 2: for j = 0, 1, 2, …, n – 1 do if BinarySearch(A, 0, n – 1, T – A[j]) then return YES;
Step 3: return NO.
Binary search and Heap Sort are standard algorithms, so we omit the details.

Complexity: Step 1 takes O(n log n) time. Step 2 involves n iterations of binary search, each of which takes O(log n) time, for a total of O(n log n). The overall complexity is O(n log n).

Algorithm 2: Outline: Sort array A as in Algorithm 1. Initialize two indices, low to 0 and high to n – 1. Compare A[low] + A[high] to T. If equal, then output YES and exit; if A[low] + A[high] > T, then high = high – 1, else low = low + 1. Repeat the process while low <= high; if the loop ends without finding a pair, output NO.
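Both algorithms described above can be sketched in Python as follows. This is a minimal illustration, not the class solution verbatim: the function names are hypothetical, Python's built-in `sorted` stands in for Heap Sort, and `bisect_left` from the standard library stands in for the binary search. As in the problem statement, j = k is allowed.

```python
from bisect import bisect_left


def has_pair_binary_search(a, t):
    """Algorithm 1: sort, then binary-search for t - x for every key x."""
    s = sorted(a)                      # stands in for Heap Sort, O(n log n)
    for x in s:
        i = bisect_left(s, t - x)      # leftmost position where t - x could be
        if i < len(s) and s[i] == t - x:
            return True
    return False


def has_pair_two_pointers(a, t):
    """Algorithm 2: sort, then move low and high toward each other."""
    s = sorted(a)
    low, high = 0, len(s) - 1
    while low <= high:                 # low == high covers the case j = k
        total = s[low] + s[high]
        if total == t:
            return True
        if total > t:
            high -= 1                  # s[high] is too large for any remaining partner
        else:
            low += 1                   # s[low] is too small for any remaining partner
    return False
```

Calling either function on A = [12, 8, 17, 5] with T = 21 returns False, and with T = 20 returns True (8 + 12).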
Proof of correctness: We have to argue that the rules for updating the low and high indices are correct. Suppose in one of the iterations A[low] + A[high] > T. Since the array is sorted, A[j] + A[high] is also > T for any j > low, so all the pairs of indices in which the larger index is high can be removed from consideration, which justifies the update rule high--. A similar argument holds for the case A[low] + A[high] < T.

Pseudo-code:
Input: Array A[0 .. n – 1], integer T.
Output: yes if there exist j, k (j <= k) such that A[j] + A[k] = T, no otherwise.
Step 1: Sort A using heap sort.
Step 2: low = 0; high = n – 1; while (low <= high) { if A[low] + A[high] = T return yes; else if A[low] + A[high] > T high--; else low++; }
Step 3: output(no).

Complexity: Step 1 takes O(n log n) time. Step 2 involves at most n iterations. (Initially low and high are separated by n – 1. Each iteration moves low or high and so reduces the distance between them by 1, and when the distance becomes –1, we exit. Thus, when no pair is found, the number of iterations is exactly n.) In each iteration of the loop, a constant number of operations is performed, so the total complexity of this step is O(n). Step 3 requires 1 operation. Thus the total complexity = O(n log n) + O(n) + 1 = O(n log n).

Example: Input: A = [12, 8, 17, 5], T = 21.
Step 1: A gets sorted as 5, 8, 12, 17.
Step 2: low = 0, high = 3.
Iteration 1: sum = 5 + 17 = 22, above T so high = 2.
Iteration 2: sum = 5 + 12 = 17, below T so low = 1.
Iteration 3: sum = 8 + 12 = 20, below T so low = 2.
Iteration 4: low = high = 2, sum = 12 + 12 = 24, above T so high = 1.
Now low > high, so we quit with the answer no. This is the correct answer for this input.

3) Consider a directed graph G = <V, E> in which each vertex is colored with one of two colors, Black (B) or Red (R). Given a start vertex s (colored B) and a finish vertex t (colored R), we want to know if there is a path from s to t in which the vertices are alternately colored B R B R … B R. Design an efficient algorithm to solve this problem.
State your algorithm clearly and obtain its worst-case time complexity as a function of n = |V| and m = |E|. (State clearly how the standard representation of a graph should be augmented to include the vertex color.)

Solution: There are two solutions. Both look similar but they are different in one fundamental way. We will discuss this at the end.

Solution 1: Recall that BFS_Search(G, s) starts with a graph G and a source vertex s and initializes a boolean array visited[] in which all the entries are initially false. When the search ends, visited[j] contains true if and only if there is a directed path from s to j. Since this function is standard (see the textbook or class notes), we will not repeat its pseudo-code. Note that the input graph G of BFS_Search is directed and has no colors on its vertices.

Outline of algorithm: Now we are given a graph G in which vertices are colored B or R. Assume that G is given in the form of adjacency lists plus a COLOR array of size n (n = the number of vertices in G) that gives the color of each vertex j = 0, 1, …, n – 1. We pre-process G to produce a graph G’ (in which vertices have no colors). Then we simply call BFS_Search(G’, s), and this solves the problem. The idea is simple. Since the colors have to alternate, we walk through the adjacency list of each vertex j in G that is colored R and remove from the list all the vertices that are colored R. Similarly, for each vertex colored B, we remove the vertices colored B from its adjacency list. Once this is done, we ignore the COLOR array. The resulting graph G’ is given as input to BFS_Search. If visited[t] is true, output yes; otherwise output no.

Pseudo-code:
Boolean colored_BFS (Input: G as adjacency lists, COLOR array, vertices s, t)
Output: yes if there is a path from s to t in which colors alternate, no otherwise.
Step 1: for every vertex u in G do walk through the adjacency list of u and remove all the vertices whose color is the same as the color of u; let G’ be the resulting graph.
Step 2: Call BFS_Search(G’, s).
Step 3: if visited[t] == true then output YES else output NO.

Time complexity: Let n = the number of vertices in G and m = the number of edges in G. It is easy to see that Step 1 takes O(n + m) time, since we walk through all the nodes in the adjacency lists once. Step 2 is the standard BFS, which takes O(n + m) steps. Step 3 takes O(1) time. Total time complexity = O(n + m).

The second algorithm will be discussed in class.

4) Consider the following problem: given as input a set of jobs J1, …, Jk, where for each job you are given (start time, finish time, revenue). You are to choose a subset of jobs that do not overlap in time such that the sum of the revenues (profits) of the selected jobs is as large as possible. Recall that when the revenue is the same for all the jobs, we showed that the greedy algorithm that sorts the jobs in increasing order of finish time is optimal. Give counterexamples to show that none of the following greedy algorithms is optimal: (a) most profitable job first (i.e., select the most profitable job J, remove the jobs that overlap with J, then repeat the process), (b) earliest deadline first, (c) fewest overlapping jobs first and (d) earliest starting job first. Design an efficient algorithm to solve the problem and obtain an estimate of its time complexity.

Solution: The counterexamples are easy to construct, so we skip them.

Outline of algorithm for optimally solving the problem: First we convert this problem to the problem of finding the longest path in an acyclic directed graph. One way to do this is as follows: each job represents a node, and if job J’s finish time comes before job K’s start time, then there is an edge from J to K. Assign a weight on the edge <J, K> = profit of job J.
Also add two extra nodes s and t, and add an edge from s to every vertex, with weight 0 on all these edges. Also add an edge from every job vertex j to t with weight equal to the profit of job j. Now it is easy to see that each path from s to t in this graph represents a selection of a disjoint set of jobs, and hence the longest path in this graph from s to t gives the optimal solution to the problem.

Algorithm for finding the longest path between s and t in an acyclic directed graph G: perform the following computation on each vertex, assigning a value to it, in the reverse order of a topological sorting of the vertices of G. For a vertex with no outgoing edge, assign value = 0. (For example, t will be assigned value = 0 since it has no outgoing edges.) Now we describe how to assign a value to a generic vertex v in the reverse topologically sorted order. Since we follow this order, we know that at this point, if there are outgoing edges from v to v1, v2, …, vk, then vertices v1, v2, …, vk have already been assigned a value. Call these values val(v1), …, val(vk). Also suppose the weight on the edge <v, vj> is wj. Then assign val(v) as

val(v) = max { val(vj) + wj }, j = 1, 2, …, k.

val(s) is the optimal profit for the problem.

Why does it work? First note that the graph created above is acyclic. We can show by induction that for any vertex j other than s and t, the length of the longest path from j to t is the maximum profit we can get from a set of non-overlapping jobs in which j is the earliest job. This shows that the algorithm is correct.

Pseudo-code:
Input: a sequence of jobs J1, …, Jk and for each job j a triple <sj, fj, pj>: the start time, finish time and profit.
Output: S, a subset of {1, 2, …, k} of jobs that are pairwise non-overlapping and whose total profit is as large as possible.
Step 1: Create a weighted, acyclic graph G on k+2 vertices with vertex labels {s, 1, 2, …, k, t}.
Add an edge from s to each other vertex with edge weight 0, and add an edge from every vertex j in 1, 2, …, k to t with weight pj. Add an edge from job i to job j with weight pi if sj > fi.
Step 2: Perform a topological sorting of graph G. Let SORT[0 .. k+1] be the topological sorting of the vertices of G. Here we have used 0 for s and k+1 for t.
Step 3: for i = k+1 down to 0 do { v = SORT[i]; if (v has no outgoing edges) then val[v] = 0 else { value = –infinity; for each vertex w in adjacency_list[v] do if (val[w] + weight of edge <v, w> > value) { value = val[w] + weight of edge <v, w>; link[v] = w; } end for; val[v] = value; } }
Step 4: output the jobs on the path from s to t by following the link pointers.

Complexity: It is easy to see that the time complexity of constructing the graph is O(k^2), where k is the number of jobs. Also, the number of edges in G is O(k^2), hence the time complexity of Steps 2 and 3 is O(k^2). Step 4 takes O(k) time. Total complexity = O(k^2).

5) A deadly disease is spreading through the population of a country A. In country B, where the infection did not exist until now, there is panic since (exactly) one of the N persons returning from a visit to A is reported to be infected. (This information is known from country A’s blood test at the port of exit, but A’s policy allows it to report only the number of infected people, not their identity.) The Center for Disease Control (CDC) of country B wants to quarantine this carrier. There is a highly reliable blood test which never produces false positives or false negatives. The test is very expensive but highly sensitive. A pool containing hundreds of blood samples mixed together will test positive if at least one of them is infected. The CDC is anxious to quarantine the patient with as few blood tests as possible. As an algorithm designer, you have been asked to devise a test plan that meets the above requirements. Describe your plan informally, express the number of blood tests conducted by your plan as a function of N, and prove that it works.
Solution: This problem can be solved by just using binary search. Collect blood samples from all N people. Form a group of floor(N/2) blood samples and test the pooled group. If it tests positive, continue testing within this group; else continue testing within the remaining group of ceiling(N/2) people. Keep repeating this until only one person is left; this person is the infected one. The plan works because exactly one person is infected and the test is never wrong, so after every test the infected person lies in the group that is kept. This is clearly similar to binary search, where the size of the search domain is halved after each test. In the worst case the group size goes from m to ceiling(m/2) with each test, so the total number of tests is at most ceiling(log2 N), which is at most 1 + floor(log2 N).

6) Shown below are a weighted directed graph, the content of the binary heap Q and the parent array (Pi) at the end of the second iteration of Dijkstra’s algorithm. Show all the steps involved in the next iteration. (The adjacency list is not shown here. You should exhibit this list. Assume that the adjacency list is sorted by vertex label.) Name each step clearly (such as DeleteMin, relaxation on edge such and such, etc.)