Algorithms and Data Structures 2010/11
Coursework 2
Issue date: Wednesday, 10th November 2010
The deadline for this coursework is NOON on Thursday 25th November, 2010. Please submit
your solutions electronically via submit. This is worth 50% of the coursework for A&DS.
In this coursework, we consider the “knapsack counting” problem. Your mission is to
understand, implement, and experiment with a suite of algorithms for this problem.
In the knapsack counting problem, we are given as input a list of non-negative integer
weights w1 , w2 , . . . , wn ∈ N, and an upper bound B ∈ N. We say that some specific set
S ⊆ {1, . . . , n} represents a feasible knapsack solution (wrt w1 , . . . , wn , B) if and only if
    Σ_{i∈S} wi ≤ B.

The total number of feasible knapsack solutions (which we wish to count) is

    count(n, B) = |{ S ⊆ {1, 2, . . . , n} : Σ_{i∈S} wi ≤ B }|.
Note we may assume wlog that wi ≤ B for all 1 ≤ i ≤ n (we can delete any wi > B).
§1 describes how to compute count(n, B) exactly via a recursive (exponential-time) algorithm. §2 describes a dynamic programming algorithm which computes count(n, B) exactly in Θ(nB)
time. §3 introduces a very basic approximation algorithm, which runs in Θ(n³) time on a
“rounded” version of the input, and returns an “approximate count” within a factor of (n + 1)
of count(n, B) (this is an (n + 1)-approximation algorithm). §3.2 then shows we can improve the
quality of this approximation (to within a factor of 1.25, say, or closer), by drawing uniform
samples from the feasible solutions of the “rounded problem”, and checking whether or not
they also are solutions to the original problem¹. Your task in this coursework is to implement all three algorithms, prove some relevant facts (see §3.1), and write a short report on
experimental results.
1 Exact Algorithm via Recursion
Given the input w1 , . . . , wn , B ∈ N, we define for every 0 ≤ k ≤ n and every 0 ≤ b ≤ B,

    count(k, b) = |{ S ⊆ {1, 2, . . . , k} : Σ_{i∈S} wi ≤ b }|.        (1)
Clearly count(0, b) = 1 for all 0 ≤ b ≤ B (the only feasible solution is to take S = ∅). Also,
count(k, 0) = 1 for all 0 ≤ k ≤ n (again, the only feasible solution is to take S = ∅). In all
other cases, when k ≥ 1 and b ≥ 1, we can partition the set of feasible solutions into solutions
1
Both the initial n-approximation algorithm we get from “rounding” wi → ai , and the betterapproximation algorithm we get by repetitively sampling from this set, are due to Martin Dyer [1].
1
with k ∈ S, and solutions with k 6∈ S. This gives the following recurrence, which can easily
be coded up into a natural recursive algorithm.

1
if k = 0

count(k − 1, b)
if k > 0, but wk > b .
count(k, b) =
(2)

count(k − 1, b) + count(k − 1, b − wk ) if k > 0 and wk ≤ b
To justify the recurrence, note that S with k ∈ S is a solution for w1 , . . . , wk with bound b if
and only if S\ {k} is a solution for w1 , . . . , wk−1 with bound b−wk (there are count(k−1, b−
wk ) such solutions in total). Also, S with k ∉ S is a solution for w1 , . . . , wk with bound b if
and only if S is a solution for w1 , . . . , wk−1 with the same bound b (there are count(k − 1, b)
of these solutions). Hence count(k, b) = count(k − 1, b) + count(k − 1, b − wk ). If wk > b,
then count(k − 1, b − wk ) = 0.
The recursive algorithm for counting knapsack solutions is an exact algorithm - it always
returns the exact value of count(n, B). However, it can take exponential time, even for
relatively small values of B (eg B ≤ n²). You will see this when you run experiments.
The first task of this coursework is to implement (2) directly as the Java method
public static int countKnapsackRecurse(int[] w, int B).
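For reference, recurrence (2) translates almost line-for-line into Java. The sketch below is one possible shape; the class name CountKnapsackSketch and the private helper recurse are our own naming, not part of the provided CountKnapsack.java skeleton.

```java
public class CountKnapsackSketch {

    // Direct translation of recurrence (2): count(k, b) is the number of
    // subsets of {1, ..., k} whose total weight is at most b.
    public static int countKnapsackRecurse(int[] w, int B) {
        return recurse(w, w.length, B);
    }

    private static int recurse(int[] w, int k, int b) {
        if (k == 0) return 1;                     // only the empty set is feasible
        int wk = w[k - 1];                        // w_k is stored at index k - 1
        if (wk > b) return recurse(w, k - 1, b);  // item k cannot fit
        return recurse(w, k - 1, b) + recurse(w, k - 1, b - wk);
    }
}
```

On the instance given in the Testing section (w = 3, 4, 5, 6, 8, 9 and B = 19) this returns 38; the two recursive calls per level are exactly what makes the running time grow exponentially with n.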
2 Exact Algorithm via Dynamic Programming
In presenting a DP algorithm we have three issues to address:
1. Describe the expanded set of subproblems and the recurrence which relates them;
2. Describe the table where we will store the solutions to these subproblems;
3. Finally, we must describe the order in which we fill in this table, by giving our algorithm.
In our DP algorithm, we will ask for the solution to count(k, b) (defined in (1)) for all
0 ≤ k ≤ n and all 0 ≤ b ≤ B, and we will work with the recurrence (2). We will store
all these solutions in a table (an integer array) C of dimensions (n + 1) × (B + 1). For every
0 ≤ k ≤ n, 0 ≤ b ≤ B, the cell C(k, b) will hold the value of count(k, b) once it has been
computed by the DP algorithm. The algorithm we use to build the table is given below.
Algorithm countKnapsackDP(w, B)
 1. n ← length(w).
 2. Define array C of dimensions (n + 1) × (B + 1)
 3. for b ← 0 to B do C[0, b] ← 1 od
 4. for k ← 1 to n do
 5.     for b ← 0 to B do
 6.         if wk > b then C[k, b] ← C[k − 1, b]
 7.         else C[k, b] ← C[k − 1, b] + C[k − 1, b − wk ] fi
 8.     od
 9. od
10. return C[n, B]
In practice (ie, when you implement this algorithm), the wi values will be stored in an array,
and instead of referring to wi you will refer to w[i − 1] (as Java arrays begin at 0).
Your second task of this coursework is to implement countKnapsackDP in Java for
general values of w1 , . . . , wn , B ∈ N, as the method
public static int countKnapsackDP(int[] w, int B)
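As a sketch, the pseudocode above might be implemented like this (with wk read from w[k − 1], as noted; the class name here is ours, chosen so the sketch is self-contained):

```java
public class CountKnapsackDPExample {

    // Bottom-up DP filling the (n + 1) x (B + 1) table C, where
    // C[k][b] holds count(k, b) from recurrence (2).
    public static int countKnapsackDP(int[] w, int B) {
        int n = w.length;
        int[][] C = new int[n + 1][B + 1];
        for (int b = 0; b <= B; b++) {
            C[0][b] = 1;                      // count(0, b) = 1: empty set only
        }
        for (int k = 1; k <= n; k++) {
            int wk = w[k - 1];                // w_k is stored at index k - 1
            for (int b = 0; b <= B; b++) {
                if (wk > b) {
                    C[k][b] = C[k - 1][b];    // item k cannot be taken
                } else {
                    C[k][b] = C[k - 1][b] + C[k - 1][b - wk];
                }
            }
        }
        return C[n][B];
    }
}
```

Each cell is filled in constant time, giving the Θ(nB) bound; on the Testing instance (w = 3, 4, 5, 6, 8, 9, B = 19) this agrees with the recursive algorithm and returns 38.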
In doing this, always check the wk > b case, so that you do not try to access undefined
cells of the table. When you run experiments on your implementation, you will find that it
runs much much faster than countKnapsackRecurse (from §1) if B is not too large. For large
values of B, countKnapsackDP may still require quite a large amount of time/space.
3 Approximation Algorithm via Dynamic Programming

3.1 Initial (n + 1)-approximation algorithm
Now we present an algorithm which for any input w1 , . . . , wn , B (regardless of the size of B,
or the values of the wi ), will, in Θ(n³) time, compute an estimate ĉount(n, n²) such that

    count(n, B) ≤ ĉount(n, n²) ≤ (n + 1) · count(n, B).        (3)

It will be your responsibility to prove (3), which then implies we have an (n + 1)-approximation
algorithm (even better, we only have one-sided error). To compute our estimate, we construct
a “rounded” version a1 , . . . , an of our input weights, where

    ai =def ⌊n² wi / B⌋,

and take n² as our new upper bound. We write ĉount(n, n²) to denote the number of
knapsack solutions for the rounded instance a1 , . . . , an , n² - the “hat” on ĉount(n, n²) is to
indicate we are working with rounded values. We compute ĉount(n, n²) as follows:

1. n ← length(w).
2. for k ← 1 to n do
3.     ak ← ⌊n² wk / B⌋ od
4. ĉount(n, n²) ← countKnapsackDP(a, n²)
5. return ĉount(n, n²)
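The rounding step is one line per weight; the only implementation wrinkle is that the product n²·wi can overflow int for large inputs, so it is safer to compute it in long arithmetic. A sketch (the helper name roundWeights is ours):

```java
public class RoundingExample {

    // Compute the rounded weights ai = floor(n^2 * wi / B).
    public static int[] roundWeights(int[] w, int B) {
        int n = w.length;
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            // long arithmetic avoids int overflow in n^2 * wi;
            // integer division gives the floor for non-negative operands
            a[i] = (int) (((long) n * n * w[i]) / B);
        }
        return a;
    }
}
```

For the Testing instance w = 3, 4, 5, 6, 8, 9 with B = 19 we have n² = 36, giving rounded weights 5, 7, 9, 11, 15, 17.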
Your third task is to prove this is a Θ(n³)-time (n + 1)-approximation algorithm:

• Step 1: count(n, B) ≤ ĉount(n, n²)
  (show that every element of count(n, B)’s set - this is {y ∈ {0, 1}ⁿ : Σ_{j=1..n} yj wj ≤ B} -
  also lies in ĉount(n, n²)’s set {x ∈ {0, 1}ⁿ : Σ_{j=1..n} xj aj ≤ n²}).

• Step 2: ĉount(n, n²) ≤ (n + 1) · count(n, B)
  (define a function f from ĉount(n, n²)’s set {x ∈ {0, 1}ⁿ : Σ_{j=1..n} xj aj ≤ n²} into
  count(n, B)’s set {y ∈ {0, 1}ⁿ : Σ_{j=1..n} yj wj ≤ B}, and show that no element in
  count(n, B)’s set is the image f(x) of more than (n + 1) different x).

• Step 3: Justify the Θ(n³) running time.
3.2 Refining the Approximation via Sampling
The basic approximation procedure described in §3.1 is easy to implement and very useful for
reducing the size of the DP table (in the case when B is large), but the estimate returned can
be n+1 times greater than the true value. We now show how to use this rough approximation,
plus the DP table (plus a bit of randomness) to come up with an estimate which with high
probability will lie within a factor of (1 ± 0.25) of count(n, B).
We define the sets of feasible solutions, for the initial instance and the rounded instance:

    K =def { S ⊆ {1, 2, . . . , n} : Σ_{i∈S} wi ≤ B }.

    K̂ =def { S ⊆ {1, 2, . . . , n} : Σ_{i∈S} ai ≤ n² }.

Observe that by definition, count(n, B) = |K| and that ĉount(n, n²) = |K̂|. Also recall
from §3.1 that we have a simple procedure (defined in terms of countKnapsackDP) to
compute ĉount(n, n²) exactly in O(n³) time. Clearly,

    count(n, B) = |K| = (|K| / |K̂|) · |K̂|,

but unfortunately we do not know the value of |K|/|K̂| either! However, we can estimate |K|/|K̂|
via a procedure known as random sampling. The idea is as follows: we know, from Step 1 of
the third task, that K ⊆ K̂ (the solutions to the original instance are all solutions of the
rounded version). Suppose we had an oracle drawRandomSample which we could ask to
supply us with a random element S ∈ K̂, where S would be chosen uniformly at random² from
the set K̂. We could run our magic procedure m times to obtain m samples S1 , S2 , . . . , Sm .
Next, we could test each Sj in turn to determine whether it is also an element of K or not
(this just involves checking Σ_{i∈Sj} wi against B), and hence evaluate

    p = |{Sj : 1 ≤ j ≤ m, Sj ∈ K}| / m.
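The estimator p is straightforward to compute once the samples are in hand. A sketch of the feasibility test and the ratio, representing each sample Sj as a boolean incidence vector over the indices 1, . . . , n (the class and method names are ours, and the sample data in main is made-up for illustration, not drawn from any real K̂):

```java
import java.util.Arrays;
import java.util.List;

public class EstimatePExample {

    // Fraction of the sampled subsets that are also feasible for the
    // ORIGINAL instance w1, ..., wn with bound B.
    public static double estimateP(List<boolean[]> samples, int[] w, int B) {
        int hits = 0;
        for (boolean[] s : samples) {
            long total = 0;                 // long: sums of many wi
            for (int i = 0; i < w.length; i++) {
                if (s[i]) total += w[i];
            }
            if (total <= B) hits++;         // this sample lies in K as well
        }
        return (double) hits / samples.size();
    }

    public static void main(String[] args) {
        int[] w = {3, 4};
        // four hypothetical samples: {1}, {1,2}, {}, {2}
        List<boolean[]> samples = Arrays.asList(
                new boolean[]{true, false},
                new boolean[]{true, true},
                new boolean[]{false, false},
                new boolean[]{false, true});
        System.out.println(estimateP(samples, w, 4));  // 3 of the 4 are feasible
    }
}
```

Each feasibility check is O(n), so evaluating p over m samples costs O(mn) on top of whatever the sampler itself costs.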
Observe that if m is large (ie, we take a large number of samples from K̂), then we should
have a reasonably good approximation of |K|/|K̂|. More specifically, it is possible to show, using
Chernoff bounds, that if we take about m = Ω(|K̂|/|K|) samples, we have

    (1 − 0.25) · |K|/|K̂| ≤ p ≤ (1 + 0.25) · |K|/|K̂|.

Step 2 of the third task of §3.1 is crucial here, because it implies that |K̂|/|K| ≤ (n + 1). If we
take about 10n samples from K̂ and evaluate p with these, then w.h.p. p lies within (1 ± 0.25)
of |K|/|K̂|³. A direct consequence is that we can then estimate count(n, B) = |K| to within a
factor of 1 ± 0.25 (w.h.p.) by taking p · ĉount(n, n²). Here is the pseudocode:
² Uniformly at random means that all elements of the set K̂ have the same chance of being taken.
³ Hardly any instances of w1 , . . . , wn , B have |K̂|/|K| ∼ n, so 10n samples from K̂ is usually more than we
need. In your experiments, try to include a case where |K̂|/|K| ∼ n (for reasonably large n).
Algorithm countKnapsackApprox(w, B)
 1. n ← length(w).
 2. m ← 10n.
 3. ℓ ← 0
 4. for i ← 1 to n do
 5.     ai ← ⌊n² wi / B⌋ od
 6. ĉount(n, n²) ← countKnapsackDP(a, n²)
 7. for k ← 1 to m do
 8.     S ← drawRandomSample(a, n²)
 9.     if (S is a feasible solution for w1 , . . . , wn , B)
10.         then ℓ ← ℓ + 1
11. od
12. p ← ℓ/m
13. return ⌊p · ĉount(n, n²)⌋
Observe that the time taken by this code-fragment is Θ(n³) + O(m · (n + TdrawRandomSample (n))),
where the Θ(n³) is from line 6 and the O(m · (n + TdrawRandomSample (n))) term comes from lines 7-11. We
can achieve TdrawRandomSample (n) = O(n), hence (with m = 10n) countKnapsackApprox is Θ(n³).
Note: When we mention with high probability, this probability is taken over the random
choices made by our algorithm, for any fixed input w1 , . . . , wn , B ∈ N. So this should hold
true (with high probability) for any fixed input w1 , . . . , wn , B. In most cases of w1 , . . . , wn , B,
we can get away with far fewer samples than 10n. It may be interesting to experiment, by
taking m = 10√n, say.
3.3 Generating a random sample from the set K̂
Our refined approximation of §3.2 is highly dependent on the existence of an oracle for
drawing samples from the set K̂ of rounded feasible solutions - we refer to this oracle as
drawRandomSample(a, n²) in the code-fragment in §3.2 for our refined approximation.
I now claim that there is a simple sampling algorithm which allows us to generate a
uniform random sample from K̂ in O(n) time, using the DP table that we have already built
for a1 , . . . , an , n². Recall that K̂ is the set of feasible solutions to our rounded instance
a1 , . . . , an , n². Also recall that the feasible solutions we consider are represented as subsets S
of the index set {1, . . . , n}.
It is your job to come up with the pseudocode for generating a uniform random
sample from K̂ in O(n) time (and your job to justify your algorithm in your report). The
sampling procedure does not have to be coded as a separate method drawRandomSample
- it may be better to include it as a code fragment within countKnapsackApprox, so you
can re-use the DP table for many samples.
Hint: You will need to make use of the DP table you will have built for a1 , . . . , an , n². The
recurrence (2) is also a key component in generating a sample. As an extra hint, I have drawn
the a1 , . . . , an , n² table below, showing the location of count(n, n²) (this is the value |K̂|),
and its two component values count(n − 1, n² − an ) and count(n − 1, n²). Note however,
you will need to exploit the recurrence recursively.
             0    1    2    3   · · ·       n² − an          · · ·        n²
    0        ·    ·    ·    ·   · · ·          ·             · · ·         ·
    1        ·    ·    ·    ·   · · ·          ·             · · ·   count(1, n²)
    ..       ..   ..   ..   ..                 ..                         ..
    n − 1    ·    ·    ·    ·   · · ·  count(n − 1, n² − an) · · ·   count(n − 1, n²)
    n        ·    ·    ·    ·   · · ·          ·             · · ·   count(n, n²)
4 Testing
For your implementation of countKnapsackRecurse and countKnapsackDP, you could start
by testing some examples.
Suppose that the input is w1 = 3, w2 = 4, w3 = 5, w4 = 6, w5 = 8, w6 = 9, B = 19.
Answers: count(5, B) = 27, count(5, B − 9) = 11, count(6, B) = 38.
5 Your tasks
Download the file CountKnapsack.java from the course webpage. This file contains declarations for the methods you are required to write.
1. Write a method which implements the recursive algorithm countKnapsackRecurse [5 marks]
to evaluate count(n, B), described in §1.
public static int countKnapsackRecurse(int[] w, int B)
2. Write a method which implements the Θ(nB) dynamic programming algorithm discussed in §2.
[10 marks]
public static int countKnapsackDP(int[] w, int B)
Note: In preparation for Task 4, it might help to write a method buildKnapsackTable which returns the entire DP table, rather than just count(n, B). countKnapsackDP could be written as a ‘wrapper’ around this. Or you could just duplicate your
table-building code into the body of countKnapsackApprox - not a big deal.
3. Prove the inequality (3) mentioned in §3.1 by showing each of Step 1 (2 marks) and [10 marks]
Step 2 (4 marks). Justify the Θ(n3 ) running-time of the algorithm of §3.1 (4 marks).
These proofs should be given in a file named task3.txt or task3.pdf.
4. Write a method to implement countKnapsackApprox, described in §3. Your main [10 marks]
challenge will be to write code to draw a uniform random sample from the set of solutions
for a1 , . . . , an , n². This is the only tricky bit. If you can’t work out pseudocode
for sampling, please ask someone (but they should only help with pseudocode, not the
coding)!!! Also, make sure to credit the person who helps on your submission.
public static int countKnapsackApprox(int[] w, int B)
5. Write a short report of about 2-3 pages. This must include a justification of your [15 marks]
sampling algorithm used in countKnapsackApprox, plus a discussion of your experimental results. You should run tests on at least 100 instances (and preferably more)
of the knapsack problem, varying the number of weights n and also the sizes of the wi
and of B. It may be helpful to generate test examples randomly. Issues to be addressed:
• Justification of the correctness and O(n) running time of the sampling code you
write for countKnapsackApprox (5 marks). If you had to ask someone for help
with this that is fine, but credit them here!
• Experimental comparison of the speed of countKnapsackRecurse against countKnapsackDP. The algorithms give identical answers, but their running times
vary wildly (the running time of countKnapsackRecurse will grow exponentially with n, unless Java’s compiler is better than I think!). (5 marks)
• Experimental comparison of your implementation of countKnapsackApprox (5
marks). Many possibilities here. Certain things I would like to see explored:
(i) for examples where B (and hence the wi ) is not much greater than n², a
comparison of count(n, B) (computed exactly by countKnapsackDP) against
the value returned by countKnapsackApprox.
(ii) Same test as (i), but using a variant of countKnapsackApprox which takes
only 10√n (or even 10) samples.
(iii) Tests with a particular example where |K̂|/|K| ∼ n.
(iv) For examples where B is very large, we cannot compute count(n, B) exactly,
but it would be nice to test how the answer given by countKnapsackApprox
varies depending on the number of samples used.
Implement all of your methods within CountKnapsack.java, available from the course
webpage. Write your report in a file called report.pdf or report.txt. Then submit as
follows:
submit cs3 ads cw2 CountKnapsack.java task3.??? report.???
(if you have extra files, please also include them.)
The DEADLINE is NOON, Thursday, November 25, 2010.
Warning: Before submitting, please do “more” files-to-be-submitted (acroread on the report) from your current directory, to check that you have the right versions to hand (the rules
are “what is marked, is what is submitted”).
References
[1] Martin Dyer. Approximate counting by dynamic programming. In Proceedings of the 35th
Annual ACM Symposium on Theory of Computing (STOC 2003), pages 693-699.
Mary Cryan