CSE 594: Combinatorial and Graph Algorithms SUNY at Buffalo, Fall 2006 Lecturer: Hung Q. Ngo Last update: November 14, 2006 Analyzing approximation algorithms with the dual-fitting method 1 A greedy algorithm for WEIGHTED SET COVER One of the best examples of combinatorial approximation algorithms is a greedy algorithm approximating the WEIGHTED SET COVER problem. An instance of the S ET C OVER problem consists of a universe set U of m elements, a family S of n subsets of U , where set S ∈ S is weighted with wS . We want to find a sub-family of S with minimum total weight such that the union of the sub-family is U (i.e. covers U ). Consider the following greedy algorithm. Algorithm 1.1. G REEDY-S ET-C OVER(U, S, w) 1: C ← ∅, A ← U 2: while A 6= ∅ do 3: Pick S ∈ S with the least cost per un-covered element, i.e. pick S such that 4: A←A−S 5: C ← C ∪ {S} 6: end while 7: return C wS |S∩A| is minimized. In this section, we analyze this algorithm combinatorially. Then, a linear programming based analysis will be derived in the next section. Without loss of generality, suppose the algorithm returns a collection {S1 , . . . , Sk } of k sets. Let Xi be the set of newly covered elements of U after the ith step. Let xi = |Xi |, and wi = wSi which is the weight of the ith set picked by the algorithm. Assign a cost c(u) = wi /xi to each element u ∈ Xi , for all i ≤ k. P For any set S ∈ S, we first estimate u∈S c(u). Let ai = |S ∩ Xi |. Then, it is easy to see the following: wS w1 ≥ a1 + · · · + ak x1 wS w2 ≥ a2 + · · · + ak x2 .. .. .. . . . wk wS ≥ . ak xk Hence, X u∈S c(u) = k X i=1 k wi X wS ai ≤ ai ≤ wS · H|S| , xi ai + · · · + ak i=1 where H|S| = 1 + 1/2 + · · · + 1/|S| is the |S|th harmonic number. Since |S| ≤ m for all S, we conclude that X c(u) ≤ Hm · wS , ∀S ∈ S. (1) u∈S 1 One may ask, what if ai +· · ·+ak = 0 for some i. This is not a problem. Since S 6= ∅, a1 +· · ·+ak 6= 0. If ai + · · · + ak = 0 for some i, then all the terms ai wxii , . . . , ak wxkk can be ignored. Let T be any optimal solution, then X XX X cost(C) = c(u) ≤ c(u) ≤ H|T | · wT ≤ Hm · cost(T ). u∈U T ∈T u∈T T ∈T We thus have proved the following theorem. Theorem 1.2. G REEDY-S ET-C OVER has approximation ratio Hm . Exercise 1. In the S ET M ULTICOVER problem, each element u is required to be covered mu times, where mu is a positive integer. Each set can be picked multiple times. The cost of picking S k times is kwS . Devise a greedy algorithm for S ET M ULTICOVER with approximation ratio Hm (and prove that!). Exercise 2. In the M AXIMUM C OVERAGE problem, we are given a universe U , a collection S of subsets of U , and a positive integer k. Each element u in the universe has a non-negative integer weight wu . The problem is to find k members of S whose union has the maximum total weight. Suppose we solve this problem by greedily pick the best set in each iteration until k sets are picked. (“Best” set is the set maximizing total weight of uncovered elements.) Prove that this strategy has k approximation ratio 1 − 1 − k1 . Exercise 3. Consider the WEIGHTED VERTEX wv > 0. Consider the following algorithm COVER problem in which each vertex v is weighted with Algorithm 1.3. LR V ERTEX C OVER(G, w) 1: C = ∅ 2: For each v ∈ V (G), let c(v) ≤ wv 3: while C is not a vertex cover do 4: Pick an uncovered edge (u, v), let ≤ min{c(u), c(v)} 5: c(u) ← c(u) − ; c(v) ← c(v) − 6: Add into C all vertices v having c(v) = 0. 7: end while 8: return C Prove that this is a 2-approximation algorithm. 2 Analyzing GREEDY SET COVER with dual-fitting It is natural to find out how Algorithm 1.1 relates to the integer programming formulation of S ET C OVER. The IP for S ET C OVER is X wS xS min S∈S X (2) subject to xS ≥ 1, ∀u ∈ U, S3u xS ∈ {0, 1}, ∀S ∈ S. The LP-relaxation is min subject to X S∈S X wS xS xS ≥ 1, ∀u ∈ U, S3u xS ≥ 0, ∀S ∈ S. 2 (3) And, the dual LP is X max subject to u∈U X yu yu ≤ wS , ∀S ∈ S, (4) u∈S yu ≥ 0, ∀u ∈ U. The dual constraints look very much like relation (1), except that we need to divide both sides of (1) by Hm . Thus, for each u ∈ U , if we set yu = c(u)/Hm , then y is a dual feasible solution. It follows that X cost(C) = c(u) = Hm cost(y) ≤ Hm · OPT. u∈U 3 More general covering problems The C ONSTRAINED S ET M ULTICOVER problem is a generalization of the S ET C OVER problem in which each elements u ∈ U needs to be covered mu times, where mu is a positive integer. The corresponding integer program can be written as X min wS xS S∈S X (5) subject to xS ≥ mu , ∀u ∈ U, S3u xS ∈ {0, 1}, ∀S ∈ S. When relaxing this program, it is no longer possible to remove the upper bounds xS ≤ 1 (otherwise an integral optimal solution to the LP may not be an optimal solution to the IP). The LP-relaxation is X min wS xS S∈S X subject to xS ≥ mu , ∀u ∈ U, (6) S3u −xS ≥ −1, ∀S ∈ S, xS ≥ 0, ∀S ∈ S. The dual linear program is now max subject to X u∈U X mu y u − X zS S∈S yu − zS ≤ wS , ∀S ∈ S, (7) u∈S yu , zS ≥ 0, ∀u ∈ U, ∀S ∈ S. We will try to devise a greedy algorithm to solve this problem and analyze it using the dual-fitting method. Algorithm 3.1. G REEDY-S ET-M ULTICOVER(U, S, w, m) 1: C = ∅; A ← U 2: // We call an element u ∈ U “alive” if mu > 0. Initially all of A are alive 3: while A 6= ∅ do wS is minimized. 4: Pick S such that |S∩A| 5: C = C ∪ {S} 3 mu ← mu − 1 for each u ∈ S ∩ A 7: Remove from A all elements u with mu = 0 8: end while 9: return C 6: The next P step is to write P the cost of C as a multiple of the cost of a feasible solution to (7), namely cost(C) = ρ( u mu yu − S zS ) for some feasible solution (y, z) of (7). For each element u ∈ U and each j ∈ [mu ], let c(u, j) be the cost of covering u for the jth time. If S covers u for the jth time, and wS AS is the set of alive elements before S was picked, then c(u, j) = |S∩A . If S was chosen before T , S| then AT ⊆ AS . Thus, wS wT wT ≤ ≤ . |S ∩ AS | |T ∩ AS | |T ∩ AT | Consequently, for any u we have c(u, 1) ≤ · · · ≤ c(u, mu ). The final cost is cost(C) = mu XX c(u, j). u∈U j=1 P P In order to write this sum as ρ( u∈U mu yu − S∈S zS ) (keeping in mind that yu , zS ≥ 0), it makes sense to try cost(C) = X mu c(u, mu ) − u∈U = X mu c(u, mu ) − u −1 X mX [c(u, mu ) − c(u, j)] u∈U j=1 mu XX [c(u, mu ) − c(u, j)] u∈U j=1 u∈U The P double sum (after the minus sign) is non-negative, which is good. We need to write it in the form ρ S∈S zS somehow. Note that, each time u is covered, a term c(u, mu ) − c(u, j) is added into the double sum. For each S ∈ C, suppose S covers u ∈ S ∩ AS the ju,S th time. Then, mu XX [c(u, mu ) − c(u, j)] = u∈U j=1 X X [c(u, mu ) − c(u, ju,S )] . S∈C u∈S∩AS Now, let ρ be a number to be determined. Define yu = zS = 1 c(u, mu ), ∀u ∈ U ρ 1 X [c(u, mu ) − c(u, ju,S )] S ∈ C ρ u∈S∩AS S∈ /C 0 For (y, z) to be dual-feasible, we would like to find ρ so that, for each S ∈ S, Consider any S ∈ / C. In this case, X 1X c(u, mu ). y u − zS = ρ P u∈S yu − zS ≤ wS . u∈S u∈S Let u1 , . . . , uk be the elements of S. Without loss of generality, assume that u1 was completely covered before u2 , and so on. Then, right before ui is completely covered, |AS | ≥ k − (i − 1). Hence, c(ui , mui ) ≤ wS /(k − i + 1). Consequently, k X u∈S y u − zS ≤ 1X wS Hm ≤ · wS . ρ k−i+1 ρ i=1 4 Now, consider any S ∈ C. In this case we have X y u − zS = u∈S = 1X 1 X c(u, mu ) − [c(u, mu ) − c(u, ju,S )] ρ ρ u∈S u∈S∩AS X X 1 c(u, mu ) + c(u, ju,S ) ρ u∈S∩AS u∈S\AS Let u1 , . . . , uk0P be elements in S \AS which were completely covered in that order. Note that 0 ≤ k 0 < k. Note also that u∈S∩AS c(u, ju,S ) = wS . Similar to the previous reasoning, we get 0 X u∈S 1 y u − zS = ρ k X i=1 wS + wS k−i+1 ! ≤ Hm · wS . ρ Hence, (y, z) would be a dual feasible solution if we pick ρ = Hm , which is also an approximation ratio for Algorithm 3.1. Exercise 4. Devise a greedy algorithm for S ET M ULTICOVER with approximation ratio Hm . Analyze your algorithm using the dual-fitting method. Exercise 5. In the M ULTISET M ULTICOVER problem, we are given a collection S of multisets of a universe U . For each S ∈ S, let M (S, u) be the multiplicity of u in S. Each element u needs to be covered mu times. We can assume M (S, u) ≤ mu for all S, u. Devise a greedy algorithm for M ULTISET M ULTICOVER with approximation ratio Hd , where d is the largest multiset size. The size of a multiset is the total multiplicity of its elements. Analyze your algorithm using the dual-fitting method. Exercise 6. Consider the integer program min{cT x | Ax ≥ b}, where A, b have non-negative integral entries, and x is required to be non-negative and integral also. This is called a covering integer program. Use scaling and rounding to reduce covering integer programs to M ULTISET M ULTICOVER, so that we can use the greedy algorithm for the M ULTISET M ULTICOVER instance to get a greedy algorithm for the C OVERING I NTEGER P ROGRAM instance with approximation ratio O(lg n), where n is the input size of the covering integer program. (Thus, the instance of M ULTISET M ULTICOVER must have size polynomial in n.) Exercise 7. Vazirani’s book. Problem 24.12, page 241. Historical Notes The greedy approximation algorithm for S ET C OVER is due to Johnson [5], Lovász [6], and Chvátal [2]. Feige [4] showed that approximating S ET C OVER to an asymptotically better ratio than ln m is NP-hard. The dual-fitting analysis for G REEDY S ET C OVER was given by Lovász [6]. Dobson [3] and Rajagopalan and Vazirani [8] studied approximation algorithms for covering integer programs. The dualfitting method has found applications in other places [1, 7]. References [1] P. C ARMI , T. E RLEBACH , AND Y. O KAMOTO, Greedy edge-disjoint paths in complete graphs, in Graph-theoretic concepts in computer science, vol. 2880 of Lecture Notes in Comput. Sci., Springer, Berlin, 2003, pp. 143–155. 5 [2] V. C HV ÁTAL, A greedy heuristic for the set-covering problem, Math. Oper. Res., 4 (1979), pp. 233–235. [3] G. D OBSON, Worst-case analysis of greedy heuristics for integer programming with nonnegative data, Math. Oper. Res., 7 (1982), pp. 515–531. [4] U. F EIGE, A threshold of ln n for approximating set cover (preliminary version), in Proceedings of the Twenty-eighth Annual ACM Symposium on the Theory of Computing (Philadelphia, PA, 1996), New York, 1996, ACM, pp. 314–318. [5] D. S. J OHNSON, Approximation algorithms for combinatorial problems, J. Comput. System Sci., 9 (1974), pp. 256–278. Fifth Annual ACM Symposium on the Theory of Computing (Austin, Tex., 1973). [6] L. L OV ÁSZ, On the ratio of optimal integral and fractional covers, Discrete Math., 13 (1975), pp. 383–390. [7] M. M AHDIAN , E. M ARKAKIS , A. S ABERI , AND V. VAZIRANI, A greedy facility location algorithm analyzed using dual fitting, in Approximation, randomization, and combinatorial optimization (Berkeley, CA, 2001), vol. 2129 of Lecture Notes in Comput. Sci., Springer, Berlin, 2001, pp. 127–137. [8] S. R AJAGOPALAN AND V. V. VAZIRANI, Primal-dual RNC approximation algorithms for set cover and covering integer programs, SIAM J. Comput., 28 (1999), pp. 525–540 (electronic). A preliminary version appeared in FOCS’93. 6