503MidtermExamSolF08

UML CS
91.503 Midterm Exam
Fall, 2008
MIDTERM EXAM SOLUTIONS
Stats: to be determined later (with ?? points added)
- Minimum:
- Maximum:
- Average:
- Standard Deviation:
1: (5 points) Asymptotic Growth of Functions
(a) (1 point) List the 4 functions below in nondecreasing asymptotic order of growth:
lg 32 n
1) 3n 
n
3n  n
n 3 lg lg n
2)
n lg n 2
n lg n  2
3) n 3 lg lg n
lg 32 n
4)
smallest
largest
n
Rationale: lim n 3n  0 , so 3n  is the smallest.
n
n lg n 2 n2 lg 2 n  O(n3 lg lg n)
because lg 2 n  O(n lg lg n) . lg 32   5n ; this exponential function dominates the
other 3 functions.
1) $f_1(n) = \Theta(n^2)$
2) $f_2(n) = \Omega(n^3)$
3) $f_3(n) = O((n \lg n)^2)$
4) $f_4(n) = \omega(3n^{-n})$

[Figure: growth-rate scale showing where $f_1(n)$ through $f_4(n)$ can lie relative to $3n^{-n}$, $n^2$, $(n \lg n)^2$, $n^3$, $n^3 \lg\lg n$, and $(\lg 32)^n$.]
b) (1 point) $f_1(n) = O(f_2(n))$
TRUE
Proof: $f_1(n) = \Theta(n^2) \Rightarrow f_1(n) = O(n^2)$ [1] by the definition of the $\Theta$ operator. $n^2 = O(n^3)$ [2].
Applying transitivity to [1] and [2] yields $f_1(n) = O(n^3)$ [3]. Now, via transpose symmetry
from $f_2(n) = \Omega(n^3)$ we have $n^3 = O(f_2(n))$ [4]. Applying transitivity to [3] and [4] yields
$f_1(n) = O(f_2(n))$.
c) (1 point) f 3 (n)  ( f 4 (n))
FALSE
Counter-example: $f_3(n) = n$ and $f_4(n) = n^3$.
d) (1 point) $f_3(n) = O(n^3)$
TRUE
Proof: $f_3(n) = O((n \lg n)^2)$ combined transitively with $(n \lg n)^2 = O(n^3)$ yields $f_3(n) = O(n^3)$.
e) (1 point) f1 (n)  ( f 4 (n))
FALSE
Counter-example: $f_1(n) = n^2$ and $f_4(n) = n^3$.
2: (5 points) Recurrence
Find a tight upper bound on the closed-form solution for the following recurrence:
$T(n) = 3T(n-1) + n$
where T(n) is constant for sufficiently small n. That is, find a function $g(n)$ such that
$T(n) = O(g(n))$.
Solution: The Master Theorem does not apply here. A recursion tree can be used. The
tree has $n+1$ levels if T(0)=1. $3^i (n-i)$ work is done at the $i$th level, except for the bottom
level, where $3^n T(0) = 3^n$ work is done (thanks to Jane for pointing out the work at the
bottom level). The total work is:
$$\sum_{i=0}^{n-1} 3^i (n-i) + 3^n = \sum_{i=0}^{n-1} 3^i n - \sum_{i=0}^{n-1} 3^i i + 3^n = n \sum_{i=0}^{n-1} 3^i - \sum_{i=0}^{n-1} 3^i i + 3^n$$
(using closed-form solutions to the summations):
$$= \frac{n(3^n - 1)}{2} - \frac{(n-1)3^{n+1} - n3^n + 3}{4} + 3^n = \frac{7}{4} 3^n - \frac{n}{2} - \frac{3}{4} = O(3^n).$$
Thus, $T(n) = O(3^n)$.
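The closed form from the recursion tree, $\frac{7}{4} 3^n - \frac{n}{2} - \frac{3}{4}$, can be sanity-checked against the recurrence itself. The following is only an illustrative sketch: it assumes T(0) = 1 as in the solution, and the names `T` and `closed_form` are ours, not part of the exam.

```python
def T(n):
    """Evaluate T(n) = 3*T(n-1) + n iteratively, assuming T(0) = 1."""
    t = 1
    for k in range(1, n + 1):
        t = 3 * t + k
    return t


def closed_form(n):
    """(7/4)*3^n - n/2 - 3/4, written over the common denominator 4."""
    return (7 * 3**n - 2 * n - 3) // 4


if __name__ == "__main__":
    # Print n, the recurrence value, and the closed-form value side by side.
    for n in range(6):
        print(n, T(n), closed_form(n))
```

Since $T(n)/3^n = \frac{7}{4} - \frac{2n+3}{4 \cdot 3^n}$ tends to $\frac{7}{4}$, the check is consistent with $T(n) = O(3^n)$.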
3: (5 points) Analyze Pseudocode
Mystery1 has one argument: a positive integer value n.
Mystery1(n)
    print "Mystery1 called with n = " n
    if n ≤ 1
        then return
    for i ← 1 to 3
        do Mystery2(n/4)
    return

Mystery2(n)
    print "Mystery2 called with n = " n
    if n ≤ 1
        then return
    Mystery1(n/4)
    return
Derive a tight upper bound on Mystery1’s worst-case asymptotic running time as a
function of n.
Solution: In the worst case, let n be a power of 4.
$T(n) = \Theta(1) + 3(\Theta(1) + T(n/16)) = 3T(n/16) + \Theta(1)$. Case 1 of the Master Theorem applies,
yielding $T(n) = \Theta(n^{\log_{16} 3})$. So, a tight upper bound is $T(n) = O(n^{\log_{16} 3})$.
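The bound can be checked empirically by counting procedure calls. This is a rough sketch under two assumptions that the exam text does not spell out: the base-case test is taken to be n ≤ 1, and n/4 is treated as real (not integer) division. The print statements are replaced by a counter, and `count_calls` is a helper name introduced here.

```python
def mystery1(n, counter):
    counter[0] += 1              # stands in for the print statement
    if n <= 1:
        return
    for _ in range(3):
        mystery2(n / 4, counter)


def mystery2(n, counter):
    counter[0] += 1              # stands in for the print statement
    if n <= 1:
        return
    mystery1(n / 4, counter)


def count_calls(n):
    """Total number of Mystery1 and Mystery2 invocations triggered by Mystery1(n)."""
    counter = [0]
    mystery1(n, counter)
    return counter[0]
```

For $n = 16^k$ the count works out to $3^{k+1} - 2$, which is $\Theta(n^{\log_{16} 3})$ because $n^{\log_{16} 3} = 3^k$.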
4: (35 points) Design an Algorithm: Shortest Paths
This problem is from Introduction to Algorithms: A Creative Approach, by Udi Manber.
Let G = ( V, E ) be an unweighted, directed graph. Let v and w be two vertices of G.
Design an efficient algorithm that finds the number of different shortest paths (not
necessarily vertex-disjoint) between v and w.
Make sure that you provide pseudocode, correctness justification and running time analysis
for your algorithm.
a) (12 points) Pseudocode: BFS-Count(G, s) is called once with s=v.
BFS-Count(G, s)
1  for each vertex u ∈ V[G] − {s}
2      do color[u] ← WHITE
3         nShortestPaths[u] ← 0
4         d[u] ← ∞
5  color[s] ← GRAY
6  d[s] ← 0
7  nShortestPaths[s] ← 1
8  Q ← ∅
9  ENQUEUE(Q, s)
10 while Q ≠ ∅
11     do u ← DEQUEUE(Q)
12        for each vertex v ∈ Adj[u]
13            do if color[v] = WHITE
14                  then color[v] ← GRAY
15                       d[v] ← d[u] + 1
16                       nShortestPaths[v] ← nShortestPaths[u]
17                       ENQUEUE(Q, v)
18                  else if d[v] = d[u] + 1
19                       then nShortestPaths[v] ← nShortestPaths[v] + nShortestPaths[u]
20        color[u] ← BLACK
b) (12 points) Correctness
i) Mechanical: (4 points)
The for loops in lines 1-4 and 12-19 terminate. In lines 1-4 the loop visits each
vertex (except the source) once. In lines 12-19 the loop visits each element of an
adjacency list whose length is finite. The while loop in lines 10-20 terminates
because each vertex is ENQUEUE’d only once and is eventually DEQUEUE’d.
Arrays color, d, and nShortestPaths stay within bounds.
ii) “As Advertised”: (8 points)
BFS-Count(G, s) uses a modified Breadth-First-Search starting at vertex s. (Note:
Notationally, the vertex v inside BFS-Count should not be confused with the v in the
high-level call BFS-Count(G, v)). It is similar to the BFS procedure on p. 532 of our
textbook, except that the predecessor π array is not used and lines 3, 7, 16, 18, and
19 are introduced to keep track of the number of shortest paths. Upon termination of
BFS-Count(G, s), for each vertex x ≠ s, d[x] contains the length of the shortest path
from s to x. That is true due to Theorem 22.5.
We claim that, upon termination, nShortestPaths[x] contains the number of shortest
paths from s to x. This can be shown by induction, where the inductive hypothesis is
that, at the end of each iteration of the while loop, nShortestPaths[v] contains the
number of shortest paths from s to v discovered by BFS so far for each vertex v
adjacent to u. Line 3 initializes nShortestPaths[u] to 0 for every vertex u other than s,
and line 7 sets nShortestPaths[s] to 1 to count the single path of length 0 from s to
itself. As a base case, at the end of the first iteration of the while loop, each vertex v
adjacent to s, having been WHITE, will have nShortestPaths[v]=nShortestPaths[s]=1;
this correctly represents the shortest path of length 1 from s to v. For the inductive
step we consider what occurs during some iteration of the while loop. Because BFS
dequeues vertices in nondecreasing order of d, every vertex at distance d[u]-1 from s
was dequeued before u, so nShortestPaths[u] already holds the full number of shortest
paths from s to u. When a vertex v is first discovered, each shortest path from s to u
extends along edge (u, v) to a shortest path from s to v; nShortestPaths[v] is therefore
set to nShortestPaths[u] in line 16. If v has previously been encountered
when we arrive at line 12, then we must check if we have just discovered additional
shortest paths; this is done with the test in line 18. If we have, then we perform the
assignment in line 19. Note that our inductive hypothesis
guarantees that nShortestPaths[v] contains the number of shortest paths discovered
so far from s to v and that nShortestPaths[u] contains the number of shortest paths
from s to u. Thus, the addition in line 19 yields the correctly
updated number of shortest paths discovered so far from s to v. This completes the
induction. Upon termination, nShortestPaths contains the number of shortest paths
from s to each other vertex, which guarantees that we find the number of shortest
paths from s to the original target vertex w.
c) (11 points) Analysis: Derive the tightest upper bound that you can on the worst-case asymptotic running time of your pseudocode.
The worst-case asymptotic running time of BFS-Count(G, s) is in $O(|V| + |E|)$. This
is because:
- the worst-case asymptotic running time of BFS(G, s) is in $O(|V| + |E|)$
- a constant number of operations has been removed
- a constant number of constant-time operations have been inserted, without creating
new loops, function calls, or any recursion.
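A compact Python rendering of the counting BFS may help when testing the idea on small graphs. It is a sketch, not the exam's reference code: the graph is assumed to be a dict of adjacency lists, the function name `count_shortest_paths` is ours, and dictionary membership replaces the color array. Two choices are made explicit: the source starts with a count of 1 (the empty path), and a newly discovered vertex inherits the count of the vertex that discovered it.

```python
from collections import deque


def count_shortest_paths(adj, s, w):
    """Number of distinct shortest s-to-w paths in a directed, unweighted graph.

    adj maps each vertex to a list of its out-neighbors (assumed representation).
    Returns 0 if w is unreachable from s.
    """
    dist = {s: 0}
    count = {s: 1}                 # one path of length 0 from s to itself
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:                 # first discovery of v
                dist[v] = dist[u] + 1
                count[v] = count[u]           # every shortest path to u extends to v
                queue.append(v)
            elif dist[v] == dist[u] + 1:      # u lies on further shortest paths to v
                count[v] += count[u]
    return count.get(w, 0)
```

On a graph made of two diamonds in sequence (s to c through a or b, then c to f through d or e) the function should report 4 shortest paths from s to f, two choices in each diamond. Each vertex is enqueued at most once and each edge is examined once, so the running time stays $O(|V| + |E|)$.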
5: (10 points) Amortized Analysis
This problem uses the journal paper “Fast Hierarchical Clustering and Other Applications of
Dynamic Closest Pairs” by David Eppstein. Prove that the total potential (defined on p. 6
of the paper) cannot be negative, regardless of the sequence of insertion and deletion
operations (and any resulting merge operations). That is, show that:
$$\Phi = n^2 \log n - \sum_i \Phi_i \ge 0$$
(Note: Be sure to show that the initial total potential is also non-negative.)
Solution: When all the points are initially put into $S_1$ at the beginning, then k=1. Thus,
$\Phi_1 = n |S_1| \log |S_1| = n^2 \log n$ and $\Phi = n^2 \log n - n^2 \log n = 0$. In general, we know that
$\sum_{i=1}^{k} |S_i| = n$, so $\log |S_i| \le \log n$. This implies that
$\sum_{i=1}^{k} |S_i| \log |S_i| \le \sum_{i=1}^{k} |S_i| \log n = n \log n$. Transitively,
$\sum_{i=1}^{k} |S_i| \log |S_i| \le n \log n$, which means that $n \sum_{i=1}^{k} |S_i| \log |S_i| \le n^2 \log n$. Thus,
$\Phi = n^2 \log n - \sum_{i=1}^{k} \Phi_i = n^2 \log n - n \sum_{i=1}^{k} |S_i| \log |S_i| \ge n^2 \log n - n^2 \log n = 0$. Therefore the total potential
is non-negative.
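The inequality can also be spot-checked numerically. The sketch below is illustrative only and rests on assumptions not stated in the exam: logarithms are taken base 2, the subsets are described solely by their sizes, those sizes sum to n, and the helper names `total_potential` and `random_partition` are invented for this example.

```python
import math
import random


def total_potential(sizes):
    """Phi = n^2 log n - sum_i n |S_i| log |S_i|, for subset sizes that sum to n."""
    n = sum(sizes)
    return n * n * math.log2(n) - sum(n * s * math.log2(s) for s in sizes)


def random_partition(n, rng):
    """Split n points into subsets of random positive sizes."""
    sizes = []
    while n > 0:
        s = rng.randint(1, n)
        sizes.append(s)
        n -= s
    return sizes


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        sizes = random_partition(50, rng)
        print(sizes, total_potential(sizes))
```

With a single subset the potential is exactly 0, matching the initial configuration in the solution; splitting the points into more subsets only increases it.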
6: (10 points) Flows
This problem uses the flow network $G' = (V', E')$ described in Section 26.3 on pp. 665-666 of our textbook for finding a maximum bipartite matching in an undirected bipartite
graph $G = (V, E)$. Here we are given an integer $x > 1$ and we modify $G'$ to form the
flow network $G''$. $G''$ is identical to $G'$ except that each edge in $G''$ has capacity
$x$ instead of capacity 1.
Does this change the results established in Section 26.3? Discuss if, how and why.
Solution: Even though the edge capacities are all equal integers, some of the results of
Section 26.3 change or need additional explanation. In particular, the proof of Lemma
26.10 (converse direction) needs additional justification. In that proof the fact that the flow
was integer-valued and the capacity of each edge was 1 was used to argue that, for each
vertex $u \in L$, one unit of positive flow could enter on at most one edge and leave on at
most one edge. This was critical to proving that the set of edges being considered was a
matching. Section 26.3 relies on the Ford-Fulkerson method, which, in its most general
form (FORD-FULKERSON-METHOD on p. 651) just keeps finding an augmenting path and
increasing flow accordingly. This method allows the following type of situation to occur, as
illustrated in Justin’s diagram below. Here maximum flow (although having the integrality
property) splits flow coming out of a vertex (vertex a):
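[Figure: Justin's diagram of a flow network on vertices s, a, b, c, x, y, t in which every edge has capacity 2; edges are labeled flow/capacity (1/2, 2/2, or 0/2), and the flow leaving vertex a is split across two edges.]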
As a result, Corollary 26.12, which showed that the cardinality of a maximum matching M
equaled the value of a maximum flow f, is no longer substantiated.
However, if we consider the more specialized FORD-FULKERSON procedure on p. 658,
then we can say (this is Jane’s point of view) that because FORD-FULKERSON forces flow
along an augmenting path to equal the minimum capacity on the path, and that capacity will
be x, the flow will not be split across edges emanating from a vertex of L and Lemma 26.10
will still hold. So, depending on your interpretation of “Ford Fulkerson method” one could
successfully argue either for or against the validity of this part of the converse direction of
Lemma 26.10.
Regardless of one’s interpretation, Jane correctly points out that the wording of both
Lemma 26.10 and Corollary 26.12 needs to change slightly because the value of the
maximum flow is now $|f| = x|M|$ rather than just $|f| = |M|$. The cardinality of the
matching is therefore $|M| = |f| / x$.
Note that because x is an integer the integrality theorem (Theorem 26.11) still holds.
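The relation $|f| = x|M|$ can be observed on a small instance. The sketch below uses Edmonds-Karp (BFS augmenting paths) as one concrete instance of FORD-FULKERSON; that choice, the dict-of-dicts capacity representation, the reserved vertex names "s" and "t", and the helper names `max_flow` and `matching_network` are all assumptions made for illustration.

```python
from collections import deque


def max_flow(capacity, s, t):
    """Edmonds-Karp: augment along BFS paths in the residual network; returns |f|."""
    residual = {}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual.setdefault(u, {})[v] = c
            residual.setdefault(v, {}).setdefault(u, 0)   # reverse edge
    residual.setdefault(s, {})
    residual.setdefault(t, {})
    total = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:                # no augmenting path remains
            return total
        bottleneck, v = float("inf"), t
        while parent[v] is not None:       # minimum residual capacity on the path
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:       # push the bottleneck along the path
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


def matching_network(left, right, edges, x):
    """Build the network for bipartite graph (left, right, edges), every capacity x."""
    cap = {"s": {u: x for u in left}}
    for u, v in edges:
        cap.setdefault(u, {})[v] = x
    for v in right:
        cap.setdefault(v, {})["t"] = x
    return cap
```

For left vertices a, b, right vertices x, y, and edges (a, x), (a, y), (b, x), a maximum matching has 2 edges; with every capacity set to 2 the maximum flow value is 4, so dividing by the capacity recovers the matching size.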
7: (30 points) Dynamic Programming
This problem is adapted from the book Research and Education Association Problem
Solvers: Operations Research.
The task is to plan a production schedule for expensive wireless sensors over a 4-month
period of time from November, 2008 through February, 2009. The goal is to meet demand
while minimizing cost. The company has demand forecasts for each of the 4 months, given
below:
Month      Month Index   Demand (in thousands)
November   1             4
December   2             1
January    3             3
February   4             2
A schedule is represented by (x1, x2, x3, x4), where xj represents the number of wireless
sensors (in thousands) produced during month j. A schedule is feasible if it meets demand.
That is, the following constraints must be satisfied:
$x_1 \ge 4$
$x_1 + x_2 \ge 5$
$x_1 + x_2 + x_3 \ge 8$
$x_1 + x_2 + x_3 + x_4 = 10$
Note that the last constraint is an equality to avoid over-producing.
The costs are:
- $7,000 per sensor that is produced;
- $40,000 for each month that has a production run (set-up cost);
- $10,000 per month for each sensor that is produced during one month but shipped
during a later month (carry-over or storage cost).
For example, for the schedule (5, 0, 3, 2) the total cost (in thousands of dollars) would be
$5 \times 7 + 0 \times 7 + 3 \times 7 + 2 \times 7 = 70$ for the units produced,
plus $3 \times 40 = 120$ for the 3 setups (November, January and February),
plus $1 \times 10 = 10$ for the sensors produced in November and sold later on in December,
for a total cost of $200,000.
Formulate a minimal cost expression recursively. Justify your answer by demonstrating
optimal substructure. You do not need to solve your expression to obtain an optimal
schedule for this particular problem instance.
Solution: Pseudocode was not requested, so we just set up the recursive cost formulation
and demonstrate optimal substructure. All units here are expressed in thousands. Let $z_j$ be
the number of units on hand at the start of the $j$th month.
Then $z_1 = 0$, $z_2 = x_1 - 4$, $z_3 = z_2 + x_2 - 1$, $z_4 = z_3 + x_3 - 3$, and $z_4 + x_4 - 2 = 0$.
(Note that we use the last equation to avoid overproducing at the end.) Let $d_j$ be the
demand for month j. Let $P_j(z_j)$ be the cost, taking into account production decisions for
months j,…,4.
The recursive cost formulation is:
$$P_j(z_j) = \min_{x_j} \{ 7x_j + 10(z_j + x_j - d_j) + 40\,\delta(x_j) + P_{j+1}(z_j + x_j - d_j) \}$$
where
$$\delta(x_j) = \begin{cases} 1 & \text{if } x_j > 0 \\ 0 & \text{otherwise} \end{cases}$$
and the following constraints are imposed for each j in order to
satisfy demand and not over-produce:
$z_j + x_j \ge d_j$
and
$z_j + x_j \le d_j + d_{j+1} + \cdots + d_4$.
(We assume that $P_5(0) = 0$). The book does not prove optimal substructure, but we do it
here. To justify the cost formulation, we examine each part of it. Since the goal is to
minimize cost, we minimize the overall cost expression. (Note that, for $P_j(z_j)$, we must
choose a combination of values for $x_j$ and $z_j$ that minimizes the cost expression.) The $7x_j$
part is the “per-sensor” cost, and $40\,\delta(x_j)$ is the production set-up cost.
$10(z_j + x_j - d_j)$ represents the storage charge. Finally, we discuss $P_{j+1}(z_j + x_j - d_j)$, which
we claim exhibits optimal substructure. Assuming that we abide by the constraints, we can
apply a cut-and-paste proof by contradiction to establish optimality for $P_{j+1}(z_j + x_j - d_j)$. Let
$z'_j$, $x'_j$ be values that minimize the $P_j(z_j)$ expression, so that $P_j(z'_j)$ is optimal. By way of
contradiction, suppose that there was a better way to make decisions for months (j+1)…4
for $z'_j + x'_j - d_j$; call this $P'_{j+1}(z'_j + x'_j - d_j)$. This would yield
$$7x'_j + 10(z'_j + x'_j - d_j) + 40\,\delta(x'_j) + P'_{j+1}(z'_j + x'_j - d_j) < 7x'_j + 10(z'_j + x'_j - d_j) + 40\,\delta(x'_j) + P_{j+1}(z'_j + x'_j - d_j).$$
Since we are minimizing, this would produce a cost < $P_j(z'_j)$, contradicting the optimality
of $P_j(z'_j)$.
Note that if Pj(zj) relied on past months rather than future months, then it would not be clear
how much extra to produce in the base case for the first month.
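Although the exam does not require solving the recurrence, evaluating it on this instance is a useful check of the formulation. The sketch below is a memoized transcription of $P_j(z_j)$ under stated assumptions: production is restricted to whole units (thousands of sensors), months are indexed from 0 in the code, costs are in thousands of dollars, and the names `P` and `schedule_cost` are chosen here for illustration.

```python
from functools import lru_cache

DEMAND = (4, 1, 3, 2)            # d_1..d_4, in thousands of sensors
UNIT, SETUP, HOLD = 7, 40, 10    # costs in thousands of dollars


def schedule_cost(xs):
    """Cost of a feasible schedule, computed directly from the stated cost rules."""
    cost, on_hand = 0, 0
    for x, d in zip(xs, DEMAND):
        on_hand += x - d                      # units carried into the next month
        cost += UNIT * x + HOLD * on_hand + (SETUP if x > 0 else 0)
    return cost


@lru_cache(maxsize=None)
def P(j, z):
    """Return (minimum cost, schedule) for months j..3 (0-based) with z units on hand."""
    if j == len(DEMAND):
        return (0, ())                        # plays the role of P_5(0) = 0
    d = DEMAND[j]
    remaining = sum(DEMAND[j:])
    best = None
    # z + x >= d meets demand; z + x <= remaining avoids over-production.
    for x in range(max(0, d - z), remaining - z + 1):
        carry = z + x - d
        tail_cost, tail = P(j + 1, carry)
        cost = UNIT * x + HOLD * carry + (SETUP if x > 0 else 0) + tail_cost
        if best is None or cost < best[0]:
            best = (cost, (x,) + tail)
    return best
```

Calling `schedule_cost((5, 0, 3, 2))` reproduces the 200 from the worked example, and `P(0, 0)` returns a cost of 180 with schedule (5, 0, 5, 0) for this instance under the whole-unit assumption.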