CS 415 ALGORITHM ANALYSIS SPRING 2011

Mid-term 1, March 22, 2011, 10 to 12 noon
Name ______________________________
Write the answers in the space provided.
You are allowed to use any book/notes.
You need not repeat any algorithm that is in the text or was presented in class. Just make a
reference to the page in the text or in the lecture slide.
Clarity (in the form as well as content) of presentation is a critical part of your grade.
For algorithm design problems, efficient design is a problem requirement even if it is not
explicitly stated.
Problem 1
Problem 2
Problem 3
Problem 4
Problem 5
Problem 6
TOTAL
1) A pipe which is 200 meters long carrying a chemical is connected to the production plant
at one end (right end) and a packaging unit at the other (left) end where the chemical is
measured into containers for packaging and sale. From time to time there is a blockage at a
single location in the pipe at some position k from the packaging unit. When this
happens, the production unit’s pressure gauge triggers an alarm and the system shuts
down. Your job is to locate the spot where the blockage has occurred. There are probes
located at positions separated by 1 meter. You can open a probe, and if you detect
overflowing chemical, you know that the block is to the left; else it is to the right. Clearly
a probe that causes the chemical to overflow is expensive since a
clean up should be performed following the test. Thus the management has instructed you to
limit the number of such probes to no more than two. What is the minimum number
of probes needed to locate the probe position closest to the blockage position, subject to the
constraint? Describe how you would locate the blockage with the minimum number of
probes.
SOLUTION: The problem is essentially the same as minimizing the number of tests
performed using 2 jars to find the highest stair from which a jar can be dropped without
breaking. We make the following claim:
The maximum number of stairs in a ladder for which the above problem can be
solved using n tests is n(n + 1)/2.
The proof is by induction on n. With n = 1, the max. number of stairs = 1. (Why?
Suppose the number of stairs in a ladder is 2. Then, if your first test is from stair 1
and the jar does not break then clearly you have not solved the problem. Similarly
if the first test is from stair 2 and jar breaks, then you have not solved the problem.
Similar argument holds for num stairs > 2.)
Assume that the above claim is true for n – 1. We will show it for n. Suppose
the first test is on stair k. Then, the number of stairs below k is at most n – 1. (For,
suppose the first test breaks the jar; then the rest of the tests have to be done
sequentially in the order 1, 2, 3, …, k – 1. Since you have at most n – 1 additional tests
left, k – 1 must be <= n – 1. Since we are trying to maximize the number of stairs, we
should choose k – 1 = n – 1.) By the induction hypothesis, the number of stairs above k is
at most n(n – 1)/2, so the maximum total is
n(n – 1)/2 + n = n(n + 1)/2. Now, we can check that with n = 19, the maximum number
of stairs for which we can successfully solve the problem is 19 x 20/2 = 190, but our
ladder has 200 stairs so we need more than 19. But with n = 20, the maximum
number of stairs on the ladder for which we can successfully solve the problem is
20 x 21/2 = 210. Since this number is above 200, 20 is the minimum number of tests
needed to solve the problem.
Most of you gave your answer as 27. This answer is far from optimal. A bound of
2 sqrt(n) – 1 was acceptable for the HW problem, but here the question requires the
MINIMUM number of tests, so you should analyze the problem more carefully.
(Recall that there was a practice problem where I asked you to work out the
minimum number of tests in the case of 100 stairs.)
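The bound n(n + 1)/2 and the resulting strategy can be checked mechanically. The Python
sketch below is an illustration only, not part of the original solution. The names
min_tests and locate are hypothetical, and the blockage is modelled in the jar language
used above: breaks_at is the lowest stair at which a jar breaks (stairs + 1 if it never
breaks), and a breaking test corresponds to a probe that overflows.

```python
def min_tests(stairs):
    """Smallest n with n(n + 1)/2 >= stairs (the claim proved above)."""
    n = 0
    while n * (n + 1) // 2 < stairs:
        n += 1
    return n


def locate(stairs, breaks_at):
    """Return (highest safe stair, number of tests), with at most two breaks.

    breaks_at is the lowest stair at which a jar breaks; use stairs + 1
    if the jar never breaks. This is a simulation, so the answer is known
    in advance and only the number of tests is of interest.
    """
    step = min_tests(stairs)  # size of the first jump
    base = 0                  # highest stair known to be safe
    tests = 0
    while base < stairs and step > 0:
        probe = min(base + step, stairs)
        tests += 1
        if probe >= breaks_at:
            # First jar broke: scan base+1 .. probe-1 one stair at a time.
            for s in range(base + 1, probe):
                tests += 1
                if s >= breaks_at:
                    return s - 1, tests  # second jar broke at stair s
            return probe - 1, tests
        base = probe
        step -= 1  # each safe jump uses one test, so shrink the next jump
    return base, tests


if __name__ == "__main__":
    print(min_tests(200))
    print(max(locate(200, b)[1] for b in range(1, 202)))
```

The jumps shrink by one each time (20, 19, 18, …) so that the jump test plus the linear
scan that follows a break never exceeds the budget of min_tests(stairs) tests.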
2) Given an array A containing n numbers and a number T, you are to find if there exist two
indices j and k, 0 <= j, k <= n – 1 such that A[j] + A[k] = T. Design an efficient algorithm to
solve this problem. Describe your algorithm informally first, then present pseudo-code
and illustrate how your algorithm works step by step with an example where n = 4. Finally,
estimate the number of operations performed by your algorithm as a function of n and
express the total complexity of the algorithm as a function of n using O notation.
Solution: This problem was discussed in class and a complete solution was
presented. There are basically two algorithms:
Algorithm 1: sort the array. Then for each key A[j], j = 0, 1, 2, …, n – 1, perform a
binary search for T – A[j] in the sorted array. If T – A[j] is found in the array for
some j, then output YES; if all n binary searches fail, output NO.
Correctness proof: Clearly, if the algorithm outputs YES, such a pair exists.
Conversely, if such a pair (j, k) exists, during the j-th iteration of the loop, the binary
search will succeed and hence the program will output YES.
Pseudo-code:
Input: Array A of size n and a number T,
Output: YES if there exist j, k such that A[j] + A[k] = T, NO else.
Algorithm:
Step 1: sort the array using Heap Sort.
Step 2: for j = 0, 1, 2, …, n – 1 do
if Binary search(A, 0, n – 1, T – A[j]) then return YES;
Step 3: return NO.
Binary search and Heap Sort are standard algorithms so we omit the details.
Complexity: Step 1 takes O(n log n) time, Step 2 involves n iterations of binary
search each of which take O(log n) time for a total of O(n log n). The overall
complexity is O(n log n).
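A minimal Python sketch of Algorithm 1 follows, for illustration only. The function name
has_pair_sum_bsearch is hypothetical, and Python's built-in sorted (not Heap Sort) and
the standard bisect module are swapped in for the sorting and binary search steps; the
O(n log n) bound is unchanged. As in the problem statement, j = k is allowed.

```python
from bisect import bisect_left


def has_pair_sum_bsearch(a, t):
    """Return True if a[j] + a[k] == t for some indices j, k (j == k allowed)."""
    s = sorted(a)  # Step 1: O(n log n); built-in sort used instead of Heap Sort
    for x in s:    # Step 2: n binary searches, O(log n) each
        i = bisect_left(s, t - x)          # leftmost position where t - x could be
        if i < len(s) and s[i] == t - x:   # found the partner value
            return True
    return False   # Step 3: every search failed


if __name__ == "__main__":
    print(has_pair_sum_bsearch([12, 8, 17, 5], 21))
    print(has_pair_sum_bsearch([12, 8, 17, 5], 20))
```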
Algorithm 2:
Outline: sort array A as in algorithm 1. Initialize two indices low to 0 and high to n
– 1. Compare A[low] + A[high] to T. If equal, then output YES and exit; if A[low] +
A[high] > T, then high = high – 1, else low = low + 1. Repeat the process as long as
low <= high; if the loop ends without finding a pair, output NO.
Proof of correctness: We have to argue that the rules for updating the low and
high indices are correct. Suppose in one of the iterations A[low] + A[high] > T.
Then, since A is sorted, A[j] + A[high] is also > T for any j >= low, so all the pairs of
indices in which the larger index is high can be removed from consideration, which
justifies the update rule high--. A similar argument holds for the case A[low] +
A[high] < T.
Pseudo-code:
Input: Array A[0 .. n – 1], integer T.
Output: yes if there exist j, k (j <= k) such that A[j] + A[k] = T, no else.
Step 1: Sort A using heap sort.
Step 2: low = 0; high = n – 1;
while (low <= high) {
if A[low] + A[high] = T return yes;
else if A[low] + A[high] > T high--;
else low++;
}
Step 3: output( no).
Complexity: Step 1 takes O(n log n) time. Step 2 involves at most n iterations. (Initially
low and high are separated by n – 1. Each iteration that does not return moves low or
high by one, so the distance between them shrinks by 1, and when the distance
becomes -1, we exit. Thus the number of iterations is at most n.) In each
iteration of the loop, a constant number of operations are performed, so the total
complexity of this step is O(n). Step 3 requires 1 operation.
Thus the total complexity = O(n log n) + O(n) + 1 = O(n log n).
Example:
Input: A = [ 12, 8, 17, 5], T = 21
Step 1: A gets sorted as 5, 8, 12, 17
Step 2:
low = 0, high = 3
Iteration 1: sum = 5 + 17 = 22, above T so high = 2.
Iteration 2: sum = 5 + 12 = 17, below T so low = 1.
Iteration 3: sum = 8 + 12 = 20, below T so low = 2.
Iteration 4: sum = 12 + 12 = 24, above T so high = 1.
Now (low > high) so we quit with the answer no. This is the correct answer for this
input.
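The two-index scan of Algorithm 2 can be sketched in Python as follows. This is an
illustration only; the function name has_pair_sum_two_pointer is hypothetical, and the
built-in sorted is used in place of heap sort.

```python
def has_pair_sum_two_pointer(a, t):
    """Return True if a[j] + a[k] == t for some indices j <= k."""
    s = sorted(a)                # Step 1: O(n log n)
    low, high = 0, len(s) - 1
    while low <= high:           # Step 2: at most n iterations
        total = s[low] + s[high]
        if total == t:
            return True
        if total > t:
            high -= 1            # s[high] is too large for every remaining partner
        else:
            low += 1             # s[low] is too small for every remaining partner
    return False                 # Step 3: indices crossed without a match


if __name__ == "__main__":
    # The worked example above: no pair sums to 21.
    print(has_pair_sum_two_pointer([12, 8, 17, 5], 21))
```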
3) Consider a directed graph G = <V, E> in which the vertices are colored with one of two
colors Black or Red. Given a start vertex s (colored B), and a finish vertex t (colored R), we
want to know if there is a path from s to t in which the vertices are alternately colored B R
B R … B R. Design an efficient algorithm to solve this problem. State your algorithm clearly
and obtain its worst-case time complexity as a function of n = |V| and m = |E|. (State
clearly how the standard representation of a graph should be augmented to include the
vertex color.)
Solution: There are two solutions. Both look similar but they are different in one
fundamental way. We will discuss this at the end.
Solution 1: Recall that BFS_Search(G, s) will start with a graph G and a source
vertex s and will initialize a boolean array visited[] in which all the entries are
initially false. When the search ends, visited[j] contains true if and only if there is a
directed path from s to j.
Since this function is standard (see the text book or class notes), we will not repeat
the pseudo-code for this. Note that BFS_Search’s input graph G is directed, and has
no colors on vertices.
Outline of algorithm: Now we are given a graph G in which vertices are colored
with B or R. Assume that G is given in the form of an adjacency list plus a COLOR
array of size n (n = the number of vertices in G) that gives the color of vertex j = 0, 1,
…, n – 1. We will do a pre-processing of graph G to produce a graph G’ (in which
vertices have no colors). Then, we will simply call BFS_Search(G’, s) and this will
solve the problem. The idea is simple. Since the colors have to alternate, what we
do is walk through the adjacency list of each vertex j in G that is colored R and
remove from the list all the vertices that are colored R. Similarly, for each vertex
colored B, we remove the B-colored vertices from its adjacency list. Once this is
done, we ignore the COLOR array. The resulting graph G’ is given as input to
BFS_Search. If visited[t] is true, then output yes; otherwise output no.
Pseudo-code:
Boolean colored_BFS (Input: G using adjacency list, COLOR array, vertices s, t)
Output: yes if there is a path from s to t in which colors alternate, no otherwise.
Step 1:
for every vertex u in G do
Walk through the adjacency list of u and remove all the vertices whose color is
the same as the color of u;
Let G’ be the resulting graph.
Step 2:
Call BFS_Search(G’, s).
Step 3: if visited[t] == true then output YES else output NO.
Time complexity: Let n = number of vertices in G, m = the number of edges in G. It
is easy to see that step 1 takes O(n + m) time since we walk through every
adjacency list exactly once.
Step 2 is the standard BFS that takes O(n + m) steps. Step 3 takes O(1) time.
Total time complexity = O(n + m).
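Solution 1 can be sketched in Python as below. This is an illustration under stated
assumptions: the graph is a list of adjacency lists indexed by vertex number, color is a
list of 'B'/'R' labels playing the role of the COLOR array, and the function name
alternating_path_exists is hypothetical. A filtered copy of the graph is built instead of
deleting entries in place.

```python
from collections import deque


def alternating_path_exists(adj, color, s, t):
    """Return True if some path from s to t alternates colors at every step."""
    n = len(adj)
    # Step 1: build G' by dropping every edge whose endpoints share a color.
    filtered = [[v for v in adj[u] if color[v] != color[u]] for u in range(n)]

    # Step 2: standard BFS on G' from s.
    visited = [False] * n
    visited[s] = True
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in filtered[u]:
            if not visited[v]:
                visited[v] = True
                queue.append(v)

    # Step 3: t is reachable in G' exactly when an alternating path exists.
    return visited[t]


if __name__ == "__main__":
    adj = [[1, 2], [3], [3], []]
    print(alternating_path_exists(adj, ["B", "B", "R", "R"], 0, 3))
```

In the usage example vertex 3 is reachable from vertex 0 in G, but every path repeats a
color, so no alternating path exists.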
The second algorithm will be discussed in class.
4) Consider the following problem: given as input a set of jobs J1, …,Jk where for each job, you
are given (start time, finish time, revenue). You are to choose a subset of jobs that are not
overlapping in time, and the sum of the profits of the selected jobs should be as large as
possible. Recall that when revenue is the same for all the jobs, we showed that the greedy
algorithm that sorts the jobs by increasing order of finish time is optimal. Give a
counterexample for each of the following greedy algorithms to show that it is not optimal: (a) most
profitable job first (i.e., select the most profitable job J, remove the jobs that overlap with
J, then repeat the process), (b) earliest deadline first, (c) fewest overlapping jobs first and
(d) earliest starting job first. Design an efficient algorithm to solve the problem and obtain
an estimate of its time complexity.
Solution: the counterexamples are easy to construct and so we skip this.
Outline of algorithm for optimally solving the problem: First we convert this
problem to the problem of finding the longest path in an acyclic directed graph.
One way to do this is as follows: each job represents a node, and if job J’s finish
time comes before job K’s start time, then there is an edge from J to K. Assign a
weight on the edge <J, K> = profit of job J. Also add two extra nodes s and t, and
add an edge from s to every vertex with weight 0 on all the edges. Also add an edge
from every vertex j to t with weight equal to the profit of job j. Now it is easy to see
that each path from s to t in this graph represents a selection of disjoint set of jobs
and hence the longest path in this graph from s to t gives the optimal solution to
the problem.
Algorithm for finding the longest path between s and t in an acyclic directed graph
G: perform the following computation on each vertex and assign a value to it in the
reverse order of topological sorting of vertices in G. For a vertex with no outgoing
edge, assign value = 0. (For example, t will be assigned value = 0 since it has no
outgoing edges.) Now we will describe how to assign value to a generic vertex v in
the reverse topologically sorted order. Since we follow this order, we know that at
this point, if there are outgoing edges from v to v1, v2, …, vk, then vertices v1, v2, …, vk
have been assigned a value. Call these values val(v1), …, val(vk). Also suppose the
weight on the edge <v, vj> is wj. Then, assign val(v) as
val(v) = max { val(vj) + wj : j = 1, 2, …, k }.
val(s) is the optimal profit for the problem.
Why does it work? First note that the graph created above is acyclic, since every edge
between jobs goes from a job to one that starts after it finishes. We can show
by induction that for any vertex j other than s and t, the length of the longest path
from j to t is the maximum profit we can get from a selection that includes job j and
otherwise uses only jobs that start after job j finishes. This shows that the algorithm is
correct.
Pseudo-code:
Input: a sequence of jobs J1, …, Jk and for each job j, <sj, fj, pj>: the start time, finish
time and profit.
Output: S, a subset of {1, 2, …, k} of jobs that are pairwise not overlapping and the
total profit of jobs in S is as large as possible.
Step 1: Create a weighted, acyclic graph G on k+2 vertices with vertex labels {s, 1, 2,
…, k, t}.
Add an edge from s to each other vertex with edge weight = 0 and also add an edge from
every vertex j in 1, 2, …, k to t with weight pj.
Add an edge from job i to job j with weight pi if sj > fi.
Step 2: Perform topological sorting of graph G. Let SORT[0 .. k+1] be the topological
sorting of vertices of G. Here s is in position 0 and t is in position k+1.
Step 3: for i = k+1 down to 0 do
v = SORT[i];
if (v has no outgoing edges) then val[v] = 0
else {
value = 0;
for each vertex u in adjacency_list[v] do
if (val[u] + weight of edge <v, u> > value) {
value = val[u] + weight of edge <v, u>;
link[v] = u; }
end for;
val[v] = value;
}
Step 4: output the path from s to t using link pointer.
Complexity: with k jobs, it is easy to see that the time complexity of constructing the
graph is O(k^2). Also the number of edges in G is O(k^2), hence the time complexity of steps 2
and 3 is O(k^2). Step 4 takes O(k) time. Total complexity = O(k^2).
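The longest-path computation can be sketched in Python as follows. This is an
illustration under stated assumptions, not the reference solution: jobs are given as
(start, finish, profit) tuples with start <= finish, the function name
max_profit_schedule is hypothetical, and instead of a general topological sort the jobs
are ordered by start time, which is a valid topological order here because every edge
goes from a job to one that starts after it finishes.

```python
def max_profit_schedule(jobs):
    """jobs: list of (start, finish, profit). Returns (best profit, chosen indices)."""
    k = len(jobs)
    S, T = k, k + 1  # extra source and sink nodes; jobs are nodes 0 .. k-1

    # Step 1: build the weighted DAG as adjacency lists of (target, weight).
    adj = {v: [] for v in range(k + 2)}
    for j, (sj, fj, pj) in enumerate(jobs):
        adj[S].append((j, 0))       # s -> job j, weight 0
        adj[j].append((T, pj))      # job j -> t, weight = profit of j
        for i, (si, _fi, _pi) in enumerate(jobs):
            if si > fj:             # job i starts after job j finishes
                adj[j].append((i, pj))
    adj[S].append((T, 0))           # assumption: s -> t covers the empty selection

    # Step 2: topological order (s, jobs by start time, t).
    order = [S] + sorted(range(k), key=lambda j: jobs[j][0]) + [T]

    # Step 3: assign val in reverse topological order, remembering the best edge.
    val, link = {}, {}
    for v in reversed(order):
        best, nxt = 0, None         # a vertex with no outgoing edge gets value 0
        for u, w in adj[v]:
            if nxt is None or val[u] + w > best:
                best, nxt = val[u] + w, u
        val[v], link[v] = best, nxt

    # Step 4: follow the link pointers from s to t to recover the chosen jobs.
    chosen, v = [], link[S]
    while v is not None and v != T:
        chosen.append(v)
        v = link[v]
    return val[S], chosen


if __name__ == "__main__":
    print(max_profit_schedule([(0, 3, 5), (4, 6, 4), (0, 6, 8), (7, 9, 2)]))
```

The nested loop in Step 1 examines every ordered pair of jobs, which is where the
quadratic bound comes from.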
5) A deadly disease is spreading through the population of a country A. In country B, where
the infection did not exist until now, there is panic since (exactly) one of the N persons
returning from a visit to A is reported to be infected. (This information is known from
country A’s blood test at the port of exit, but A’s policy allows it to only report the number
of infected people, not their identity.) Center for Disease Control (CDC) of country B
wants to quarantine this carrier. There is a highly reliable blood test which never produces
false positives or false negatives. The test is very expensive but highly sensitive. A pool
containing hundreds of blood samples mixed together will test positive if at least one of
them is infected. CDC is anxious to quarantine the patient with as few blood tests as
possible. As an algorithm designer, you have been asked to devise a test plan that meets
the above requirements. Describe your plan informally and express the number of blood
tests conducted by your plan as a function of N, and prove that it works.
Solution: This problem can be solved by just using binary search. Collect blood samples
from all N people. Form a pool of floor(N/2) blood samples, and test it. If it tests
positive, then continue testing within this group, else continue testing within the
remaining group of ceiling(N/2) people. Keep repeating this until only one
person is left; this person is the infected one. The plan works because exactly one
person is infected, so after each test the carrier is known to be in the group that is kept.
This is clearly similar to binary search, where the domain of search shrinks by a factor
of about 2 after each test. Thus the total number of tests is at most ceiling(log2 N),
which is at most 1 + floor(log2 N).
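The halving plan can be simulated with the short Python sketch below. It is an
illustration only: the function name find_infected is hypothetical, people are numbered
0 .. n – 1, and a pooled blood test is modelled by checking whether the infected index
lies in the pooled range. In this simulation the test count never exceeds
ceiling(log2 n), and hence never exceeds 1 + floor(log2 n).

```python
def find_infected(n, infected):
    """Return (index found, number of pooled tests) for n people, one infected."""
    lo, hi = 0, n          # the carrier is known to be in the range [lo, hi)
    tests = 0
    while hi - lo > 1:
        mid = lo + (hi - lo) // 2   # pool the first floor(size/2) samples
        tests += 1
        if lo <= infected < mid:    # simulated test: pool is positive
            hi = mid
        else:                       # pool is negative: carrier is in the rest
            lo = mid
    return lo, tests


if __name__ == "__main__":
    print(find_infected(200, 137))
```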
6) Shown below are a weighted directed graph, the content of the binary heap Q and the
parent array (Pi) at the end of second iteration of Dijkstra’s algorithm. Show all the steps
involved in the next iteration. (The adjacency list is not shown here. You should exhibit
this list. Assume that the adjacency list is sorted by vertex label.) Name each step clearly
(such as DeleteMin, relaxation on edge such and such, etc.)