UML CS
91.503 Midterm Exam
Fall, 2003
MIDTERM EXAM SOLUTIONS
1: (6 points) What can you conclude?
1) $f_1(n) = \Omega(n^3 \lg^4 n)$
2) $f_2(n) = O(27^{\log_4 n})$
3)
f 3 ( n )  ( 2 n )
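First, observe that $27^{\log_4 n} \le 27^{\log_3 n} = 3^{3 \log_3 n} = (3^{\log_3 n})^3 = n^3$. This implies: $f_2(n) = O(n^3)$.
[Figure: the functions $f_3(n)$, $f_1(n)$, $f_2(n)$ shown against their bounding functions $2^n$, $n^3 \lg^4 n$, $27^{\log_4 n}$.]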
a) (3 points) Can we conclude from statements (1)-(3) that
$f_1(n) = O(f_3(n))$ ?
Why or why not? Either prove or provide a counterexample.
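SOLUTION: No. Counterexample: $f_1(n) = 3^n$, $f_3(n) = 2^n$. Here $f_1$ satisfies statement (1), yet $3^n / 2^n = (3/2)^n$ grows without bound, so $f_1(n)$ is not in $O(f_3(n))$.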
b) (3 points) Can we conclude from statements (1)-(3) that
$f_1(n) = \Omega(f_2(n))$ ?
Why or why not? Either prove or provide a counterexample.
SOLUTION: Yes. Proof: We observed above that $f_2(n) = O(n^3)$. This implies
that $n^3 = \Omega(f_2(n))$. Now, $n^3 \lg^4 n = \Omega(n^3)$, so by transitivity we have $n^3 \lg^4 n = \Omega(f_2(n))$.
This, together with $f_1(n) = \Omega(n^3 \lg^4 n)$, implies transitively that $f_1(n) = \Omega(f_2(n))$.
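As a cross-check on the bound for $f_2$, the exponent can be worked out exactly. The short derivation below is an added illustration; the decimal value of the exponent is rounded.

```latex
\[
27^{\log_4 n} = \left(4^{\log_4 27}\right)^{\log_4 n} = n^{\log_4 27} = n^{3 \log_4 3} \approx n^{2.377} \le n^3 \qquad (n \ge 1).
\]
```

This agrees with the observation $f_2(n) = O(n^3)$ used in the proof of part (b).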
2: (24 points) This question involves the paper: “On calculating
connected dominating set for efficient routing in ad hoc wireless
networks.”
Consider the algorithm that consists of the following two steps:
1) dominating set formation, consisting of the marking process on
p. 8;
2) creating all-pair shortest paths for the dominating set resulting
from step 1.
a) (2 points) Give an example of a best-case input for this
algorithm, where “best” relates to the size of the dominating set
(not running time).
SOLUTION: Since the goal is to minimize the size of the dominating set,
“best” here means smallest dominating set size. If we allow a complete
graph as input, then a complete graph is a best-case input because the
algorithm returns an empty dominating set in this case. To see this,
observe that, since each pair of nodes is connected, no node has an
unconnected pair of neighbors. If we do not allow a complete graph as
input, then a best-case n-node input has one node of degree n-1 that is
connected to each other node. Each other node has degree 1. In this
case, the algorithm returns a dominating set consisting of the node of
degree n-1.
b) (2 points) Give an example of a worst-case input for this
algorithm, where “worst” relates to the size of the dominating
set (not running time).
SOLUTION: A worst-case n-node input produces a maximum-sized
dominating set. Such an input consists of a single cycle of nodes. That
is, each node has degree 2 and the nodes are connected in a ring
shape. In this case, each node’s neighbors are unconnected so each
node is part of the dominating set. The dominating set therefore has
size = n.
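The best-case and worst-case inputs of parts (a) and (b) can be checked with a minimal Python sketch of the marking rule as these solutions describe it: a vertex is marked when two of its neighbors are not connected. The function and graph-builder names are hypothetical, and the distributed neighbor-list exchange is replaced here by direct lookups in a shared adjacency dictionary.

```python
from itertools import combinations

def mark_vertices(adj):
    """Return the set of vertices marked T.

    adj maps each vertex to the set of its neighbors (undirected graph).
    A vertex is marked when some pair of its neighbors is not adjacent.
    """
    marked = set()
    for v, nbrs in adj.items():
        for a, b in combinations(sorted(nbrs), 2):
            if b not in adj[a]:
                marked.add(v)
                break
    return marked

def complete_graph(n):
    # Every vertex is adjacent to every other vertex.
    return {v: {u for u in range(n) if u != v} for v in range(n)}

def star_graph(n):
    # Vertex 0 has degree n-1; every other vertex has degree 1.
    adj = {0: set(range(1, n))}
    for v in range(1, n):
        adj[v] = {0}
    return adj

def cycle_graph(n):
    # Each vertex has degree 2; the vertices form a ring.
    return {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}

print(mark_vertices(complete_graph(5)))  # no vertex is marked
print(mark_vertices(star_graph(5)))      # only the center is marked
print(mark_vertices(cycle_graph(6)))     # every vertex is marked
```

The three calls correspond to the complete-graph case, the degree n-1 case, and the ring case discussed above.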
c) (20 points) Analyze the worst-case asymptotic running time of
the algorithm (here “worst” refers to running time). You only
need to give an upper bound. Give the smallest upper bound
that you can. Wherever necessary, make suitable assumptions
about representation so that an efficient implementation is used.
Make sure that the variables in your running time formula are
chosen to provide a meaningful description of the running time.
SOLUTION: For this analysis we represent the running time in terms of
the sizes of the vertex set V and the edge set E. We assume that,
although the algorithm is distributed, we use a sequential model of
computation so that the total running time is the sum of the running time
at each vertex. We also assume an adjacency list graph representation
so that each open neighbor list is actually an adjacency list. We analyze
separately the running time for the marking process and the shortest-path routing:
1) Marking process: This consists of several steps:
a. Initialize marks to F. Time: Θ(|V|).
b. Initialize data structure Lv for each vertex v that will hold its
neighbor lists. Lv is initialized to null. Time: Θ(|V|).
c. Neighbor list exchange. For each vertex v and for each vertex
u in v’s adjacency list, this copies the entries of Adj[u] into Lv.
Note that because this is a distributed environment we copy the
entire contents of Adj[u] instead of simply copying a pointer to
it. In the worst case, after this step each vertex has the entire
adjacency list. Time: O(|V||E|).
d. Marking. For this step each vertex v has an auxiliary boolean
array Pv of dimensions |adj[v]|x|adj[v]| that records whether or
not each pair of vertices in its adjacency list is connected. This
part of the algorithm steps through Lv, setting the appropriate
entry of Pv to 1 if Lv shows that those vertices are connected.
Once all of Lv has been processed in this way, the algorithm
steps through Pv to see if some pair of vertices in Pv is
unconnected. If so, v is marked T. Time: O(|V||E| + |V|^3).
2) Shortest-path routing: For this step we can use either the Floyd-Warshall all-pairs shortest-paths algorithm or that of Johnson. In the
former case, the worst-case running time is in O(|V|^3). In the latter
case, the time is in O(|V|^2 lg|V| + |V||E|).
Total worst-case running time is in O(|V| + |V||E| + |V|^3) in the Floyd-Warshall case and O(|V| + |V||E| + |V|^2 lg|V| + |V||E|) in the Johnson
case. Now, in the worst case |E| is in Θ(|V|^2). The time therefore
reduces to O(|V|^3) in each case.
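Step 2 can be illustrated with a short Floyd-Warshall sketch restricted to the dominating set. This is a sketch under stated assumptions, not the paper's routing procedure: every edge is assumed to have weight 1 (hop count), only edges between two dominating-set vertices are used, and the function name is hypothetical.

```python
import math

def floyd_warshall_hops(vertices, adj):
    """All-pairs hop counts inside the subgraph induced by `vertices`.

    adj maps each vertex of the whole graph to the set of its neighbors.
    Unit edge weights are an assumption made for this illustration.
    """
    vs = list(vertices)
    dist = {u: {v: (0 if u == v else math.inf) for v in vs} for u in vs}
    for u in vs:
        for v in adj[u]:
            if v in dist:          # keep only edges inside the induced subgraph
                dist[u][v] = 1
    # Triple loop over the dominating set: cubic in its size.
    for k in vs:
        for i in vs:
            for j in vs:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# Path 0-1-2-3-4: the marking process selects the interior vertices 1, 2, 3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
dist = floyd_warshall_hops({1, 2, 3}, path)
print(dist[1][3])
```

In the worst case the dominating set is all of V, which is where the O(|V|^3) term for this step comes from.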
3: (15 points) Flow Networks
Given a flow network G = (V, E) with source s and sink t, a cut
(S,T) is a partition of V into S and T = V – S such that
$s \in S$, $t \in T$. Let (S,T) be a cut of a flow network. Let C be a
subset of T such that $t \in C$. Let D = T – C.
Consider the following statement about the flow network:
$f(S, C) + f(T, S) = f(D, S) + f(V, D)$
where $f(X, Y) = \sum_{x \in X} \sum_{y \in Y} f(x, y)$ and $f(x, y)$ denotes the flow
from vertex x to vertex y.
Either prove the statement or provide a counterexample.
SOLUTION: The statement is TRUE. Proof: First note that flow
conservation implies that $\forall d \in D$, $f(d, V) = 0$ since neither s nor t is in D.
Summing these 0 values yields: $f(D, V) = 0$. By part 2 of Lemma 26.1 we
therefore have: $f(V, D) = 0$. Now, observe that since C and D are disjoint,
part 3 of Lemma 26.1 yields: $f(S, C) + f(S, D) = f(S, T)$. Applying part 2 of
Lemma 26.1 to this (twice) gives: $f(S, C) - f(D, S) = -f(T, S)$. Since $f(V, D) = 0$,
this is equivalent to: $f(S, C) - f(D, S) = -f(T, S) + f(V, D)$. Rearranging terms
yields: $f(S, C) + f(T, S) = f(D, S) + f(V, D)$, thus completing the proof.
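The identity can be sanity-checked numerically on a small example. The network below is hypothetical (it does not come from the exam), and the sketch assumes the net-flow convention f(u, v) = -f(v, u) on which part 2 of Lemma 26.1 relies.

```python
from itertools import product

# Hypothetical network: positive flow on each edge, conserved at a and b.
V = ["s", "a", "b", "t"]
positive_flow = {
    ("s", "a"): 2, ("s", "b"): 1,
    ("a", "b"): 1, ("a", "t"): 1,
    ("b", "t"): 2,
}

def f(x, y):
    """Net flow from x to y (skew symmetric by construction)."""
    return positive_flow.get((x, y), 0) - positive_flow.get((y, x), 0)

def F(X, Y):
    """Implicit summation notation: sum of f(x, y) over x in X, y in Y."""
    return sum(f(x, y) for x, y in product(X, Y))

S, T = ["s", "a"], ["b", "t"]   # a cut with s in S and t in T
C, D = ["t"], ["b"]             # C contains t, D = T - C

lhs = F(S, C) + F(T, S)
rhs = F(D, S) + F(V, D)
print(F(D, V), lhs, rhs)
```

On this example f(D, V) = 0 and both sides evaluate to the same value, as the proof predicts.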
4: (15 points) Amortized Analysis.
A sequence of n operations is performed on a data structure. The i-th
operation costs i if i is an exact power of 4, and 1 otherwise.
a) (8 points) Use aggregate analysis to determine the amortized cost
per operation.
SOLUTION: The aggregate cost is found by summing the actual cost of the
sequence of n operations. We first observe that in a sequence of n
operations there are $\lfloor \log_4 n \rfloor + 1$ exact powers of 4. The sum of costs is in 2
parts: 1) the cost of the 1's, which is $n - \sum_{i=0}^{\lfloor \log_4 n \rfloor} 1$, where the $-1$ sum is needed
because the cost of 1 is not paid for exact powers of 4, 2) the cost of the
exact powers of 4, which is $\sum_{i=0}^{\lfloor \log_4 n \rfloor} 4^i$. The aggregate cost is therefore:
$$n - \sum_{i=0}^{\lfloor \log_4 n \rfloor} 1 + \sum_{i=0}^{\lfloor \log_4 n \rfloor} 4^i = n - (\lfloor \log_4 n \rfloor + 1) + \frac{4^{\lfloor \log_4 n \rfloor + 1} - 1}{3} \le n + \frac{(4)4^{\lfloor \log_4 n \rfloor} - 1}{3} \le n + \frac{(4)n - 1}{3} \le 3n.$$
To find the amortized cost per
operation we divide by n. This yields at most 3.
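The aggregate bound can be checked by brute force. The following is a minimal Python sketch with hypothetical helper names; it sums the actual costs directly and compares them with 3n.

```python
def op_cost(i):
    """Actual cost of the i-th operation (1-indexed): i if i is an exact power of 4, else 1."""
    p = 1
    while p < i:
        p *= 4
    return i if p == i else 1

def aggregate_cost(n):
    """Total actual cost of the first n operations."""
    return sum(op_cost(i) for i in range(1, n + 1))

# For n = 16 the powers of 4 are 1, 4, 16 (total 21); the other 13 operations cost 1 each.
print(aggregate_cost(16))
print(all(aggregate_cost(n) <= 3 * n for n in range(1, 500)))
```

For n = 16 this matches the closed form: 16 - 3 + (4^3 - 1)/3 = 34.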
b) (7 points) Use the accounting method to determine the amortized
cost per operation.
SOLUTION: Assign amortized cost of 3 to each operation in the sequence.
This includes 1 to pay for itself and 2 to help pay for the operations whose
sequence numbers are a power of 4. Note that the problem did not specify
different types of operations. As a result, the solution cannot assign
different amortized costs to different operations in the sequence. To
complete the solution we must show that an amortized cost of 3 guarantees
that the credit will never be negative. That is, we must show that:
$\sum_{i=1}^{n} \hat{c}_i \ge \sum_{i=1}^{n} c_i$. This can be shown in 2 different ways. The first way is to
use the results of (a) as follows. If a cost of 3 is assigned to each
operation, then $\sum_{i=1}^{n} \hat{c}_i = 3n$. Part (a) showed that $\sum_{i=1}^{n} c_i \le 3n$.
$\sum_{i=1}^{n} \hat{c}_i \ge \sum_{i=1}^{n} c_i$ then follows from transitivity. The second way is to calculate
how much savings is needed in between gaps in powers of 4 in order to
pay for the powers of 4 and then show that the amount saved is enough.
The size of a gap between $4^i$ and $4^{i-1}$ is $4^i - 4^{i-1} = 4^{i-1}(4 - 1) = 4^{i-1}(3)$. Let x be
the amount of savings required for each item in the gap. Then the credit
will be sufficient if: $3x \cdot 4^{i-1} \ge 4^i \Rightarrow 3x \ge 4 \Rightarrow x \ge \frac{4}{3}$. The savings of 2 for each item
in the gap is therefore sufficient. As a result, the amortized cost of 3
suffices.
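The credit invariant can also be simulated. The sketch below (hypothetical function names) charges a fixed amortized cost per operation, pays the actual cost, and reports the lowest credit balance reached; a charge of 2 is included only to show that a per-item savings below 4/3 runs out.

```python
def is_power_of_4(i):
    """True when i (a positive integer) is an exact power of 4, including 4^0 = 1."""
    while i % 4 == 0:
        i //= 4
    return i == 1

def min_credit(n, amortized=3):
    """Lowest credit balance seen over the first n operations (balance starts at 0)."""
    credit = 0
    lowest = 0
    for i in range(1, n + 1):
        actual = i if is_power_of_4(i) else 1
        credit += amortized - actual
        lowest = min(lowest, credit)
    return lowest

print(min_credit(10_000, amortized=3))  # the balance never drops below its starting value
print(min_credit(16, amortized=2))      # goes negative when operation 16 is paid for
```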
5: (40 points) Sequence Alignment
Consider two sequences of characters:
X = < x1, x2, ..., xm >
and
Y = < y1, y2, ..., yn >.
The sequence alignment problem here asks for the optimal
alignment of characters of X with characters of Y, where optimality
means maximal total cost. Gaps are allowed. The cost
assumptions for an aligned pair of characters xi and yj are:
- if xi = yj, then this pair contributes cost c1
- if xi ≠ yj, then this pair contributes cost c2
- if a character of one sequence is aligned with a gap in the other
sequence, then this character contributes cost c3
Design an algorithm that solves the sequence alignment problem.
a) (14 points) Pseudocode
SOLUTION:
SeqAlign(X, Y, c1, c2, c3)
    m ← length[X]
    n ← length[Y]
    for i ← 0 to m
        do c[i, 0] ← 0
    for j ← 0 to n
        do c[0, j] ← 0
    for i ← 1 to m
        do for j ← 1 to n
            do t1 ← c1 + c[i-1, j-1]
               t2 ← c2 + c[i-1, j-1]
               t3 ← c3 + c[i-1, j]
               t4 ← c3 + c[i, j-1]
               if xi = yj
                   then x ← max(t1, t3, t4)
                        if x = t1
                            then c[i, j] ← t1
                                 b[i, j] ← 1
                        else if x = t3
                            then c[i, j] ← t3
                                 b[i, j] ← 3
                        else if x = t4
                            then c[i, j] ← t4
                                 b[i, j] ← 4
                   else x ← max(t2, t3, t4)
                        if x = t2
                            then c[i, j] ← t2
                                 b[i, j] ← 2
                        else if x = t3
                            then c[i, j] ← t3
                                 b[i, j] ← 3
                        else if x = t4
                            then c[i, j] ← t4
                                 b[i, j] ← 4
    DumpSeqAlign(b, X, Y, m, n, alignX, alignY, k)
Note: To make the recursion work properly we assume that the parameter
k is passed by reference to DumpSeqAlign.
DumpSeqAlign(b, X, Y, i, j, alignX, alignY, k)
    if i = 0 or j = 0
        then return
    if b[i, j] = 1 or b[i, j] = 2
        then DumpSeqAlign(b, X, Y, i-1, j-1, alignX, alignY, k)
             alignX[k] ← X[i]
             alignY[k] ← Y[j]
             k ← k + 1
    else if b[i, j] = 3
        then DumpSeqAlign(b, X, Y, i-1, j, alignX, alignY, k)
             alignX[k] ← X[i]
             alignY[k] ← ' '
             k ← k + 1
    else if b[i, j] = 4
        then DumpSeqAlign(b, X, Y, i, j-1, alignX, alignY, k)
             alignX[k] ← ' '
             alignY[k] ← Y[j]
             k ← k + 1
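For readers who want to run the algorithm, here is a minimal Python sketch that mirrors SeqAlign and DumpSeqAlign. It makes several assumptions beyond the pseudocode: strings are 0-indexed, the traceback is iterative rather than recursive (so no by-reference counter k is needed), the gap is printed as '-' instead of a blank, and the function names are hypothetical. Like the pseudocode, it initializes row 0 and column 0 of c to 0 and stops the traceback when either index reaches 0.

```python
GAP = "-"  # display character for a gap (a choice made here for readability)

def seq_align(X, Y, c1, c2, c3):
    """Fill the cost table c and direction table b as in SeqAlign."""
    m, n = len(X), len(Y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = X[i - 1] == Y[j - 1]
            diag = (c1 if same else c2) + c[i - 1][j - 1]  # t1 or t2
            up = c3 + c[i - 1][j]                          # t3
            left = c3 + c[i][j - 1]                        # t4
            best = max(diag, up, left)
            c[i][j] = best
            # Same tie-breaking order as the pseudocode: diagonal, then t3, then t4.
            if best == diag:
                b[i][j] = 1 if same else 2
            elif best == up:
                b[i][j] = 3
            else:
                b[i][j] = 4
    return c, b

def dump_seq_align(b, X, Y):
    """Walk b back from (m, n), then reverse to get the aligned strings."""
    i, j = len(X), len(Y)
    align_x, align_y = [], []
    while i > 0 and j > 0:
        if b[i][j] in (1, 2):
            align_x.append(X[i - 1]); align_y.append(Y[j - 1])
            i, j = i - 1, j - 1
        elif b[i][j] == 3:
            align_x.append(X[i - 1]); align_y.append(GAP)
            i -= 1
        else:
            align_x.append(GAP); align_y.append(Y[j - 1])
            j -= 1
    return "".join(reversed(align_x)), "".join(reversed(align_y))

c, b = seq_align("AGT", "AT", c1=2, c2=-1, c3=-1)
print(c[3][2])
print(dump_seq_align(b, "AGT", "AT"))
```

With c1 = 2, c2 = -1, c3 = -1 the example aligns A with A, G with a gap, and T with T, for a total cost of 2 - 1 + 2 = 3.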
b) (13 points) Correctness
SOLUTION:
The pseudocode is a dynamic programming algorithm that is very similar
to the LCS algorithm on p. 353-355 of the text. As such, it inherits much of
its correctness from the correctness of the LCS pseudocode. The main
correctness task is to show that an optimal sequence alignment has
optimal substructure. This can be established using a result similar to
Theorem 15.1 on p. 351. This then leads to a recursive optimal cost
calculation similar to Eq. 15.14 on p. 352.
For the optimal substructure theorem, let $Z_k = \langle z_1, z_2, \ldots, z_k \rangle$ be an optimal
solution for $X_m$, $Y_n$, where $X_m$, $Y_n$ denote sequences of length m and n,
respectively. Let $z_i = (x_s, y_t)$ represent a pair of characters, one from X and
one from Y.
Theorem:
1) If $z_k = (x_m, y_n)$ and (($x_m = y_n$ and $c1 \ge c3$) or ($x_m \ne y_n$ and $c2 \ge c3$)) then $Z_{k-1}$ is
optimal for $X_{m-1}$ and $Y_{n-1}$.
2) If $z_k = (x_m, \text{' '})$ and (($x_m = y_n$ and $c3 \ge c1$) or ($x_m \ne y_n$ and $c3 \ge c2$)) then $Z_{k-1}$ is optimal
for $X_{m-1}$ and $Y_n$.
3) If $z_k = (\text{' '}, y_n)$ and (($x_m = y_n$ and $c3 \ge c1$) or ($x_m \ne y_n$ and $c3 \ge c2$)) then $Z_{k-1}$ is optimal
for $X_m$ and $Y_{n-1}$.
Proof:
We prove (1), as the proofs of (2) and (3) are similar to that of (1). We
establish (1) using a cut & paste proof by contradiction. By way of
contradiction, suppose that $Z_{k-1}$ is not optimal for $X_{m-1}$ and $Y_{n-1}$. Then there
exists some optimal Z' for $X_{m-1}$ and $Y_{n-1}$ such that cost(Z') > cost($Z_{k-1}$). Now
consider Z'' = Z' + $(x_m, y_n)$. If ($x_m = y_n$ and $c1 \ge c3$) then cost(Z'') = cost(Z') + c1
and cost($Z_k$) = cost($Z_{k-1}$) + c1. Since cost(Z') > cost($Z_{k-1}$), this implies that
cost(Z'') > cost($Z_k$), contradicting the optimality of $Z_k$. On the other hand, if
($x_m \ne y_n$ and $c2 \ge c3$), then cost(Z'') = cost(Z') + c2 and cost($Z_k$) = cost($Z_{k-1}$) + c2.
Since cost(Z') > cost($Z_{k-1}$), this implies that cost(Z'') > cost($Z_k$), again
contradicting the optimality of $Z_k$. QED.
The Theorem leads to the following recursive cost calculation, where c[i,j]
denotes the optimal cost for Xi,Yj.
$$c[i, j] = \begin{cases} \max(c[i-1, j-1] + c1,\; c[i-1, j] + c3,\; c[i, j-1] + c3) & \text{if } x_i = y_j \\ \max(c[i-1, j-1] + c2,\; c[i-1, j] + c3,\; c[i, j-1] + c3) & \text{if } x_i \ne y_j \end{cases}$$
SeqAlign() implements the recursive cost expression above. It therefore
correctly calculates the optimal alignment cost. This, combined with its
similarity to LCS-LENGTH, justifies its correctness.
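The recursive cost expression can also be evaluated top-down, which gives an independent check on the table-filling version. This is a minimal memoized sketch in Python with a hypothetical function name; the boundary value 0 for i = 0 or j = 0 is taken from the initialization loops of SeqAlign, since the recurrence itself states no base case.

```python
from functools import lru_cache

def optimal_cost(X, Y, c1, c2, c3):
    """Evaluate c[m, n] directly from the recurrence, with memoization."""

    @lru_cache(maxsize=None)
    def c(i, j):
        if i == 0 or j == 0:
            return 0  # boundary value assumed from SeqAlign's initialization
        pair = c1 if X[i - 1] == Y[j - 1] else c2
        return max(c(i - 1, j - 1) + pair,
                   c(i - 1, j) + c3,
                   c(i, j - 1) + c3)

    return c(len(X), len(Y))

print(optimal_cost("AGT", "AT", 2, -1, -1))
```

Each (i, j) pair is evaluated once, so the top-down form does the same O(mn) work as the table. Python's default recursion limit makes this form suitable only for short sequences.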
Finally, the actual alignment Zk is created in two steps. First, SeqAlign()
records the direction corresponding to each cost choice in the b array. This
is similar to the usage of the b array in LCS-LENGTH. Second,
DumpSeqAlign() recursively constructs Zk by storing its characters in
character arrays alignX and alignY. The structure of DumpSeqAlign() is
similar to that of PRINT-LCS() on p. 355.
c) (13 points) Analysis: Provide as tight an upper bound on the worst-case asymptotic running time as you can.
SOLUTION:
The modifications to LCS-LENGTH() to create SeqAlign() add only Θ(1)
time to the cost calculation loop. Thus, the worst-case running time of
SeqAlign() is in O(mn). Similarly, the modifications to PRINT-LCS() to
create DumpSeqAlign() add only Θ(1) time to each level of the recursion,
so the worst-case running time of DumpSeqAlign() is in O(m+n). The total
worst-case running time is therefore in O(mn).