Approximating k-Median via Pseudo-Approximation
Shi Li (Princeton), Ola Svensson (EPFL)
04/20/2013

Wal-mart Stores in New Jersey
  Question: Suppose you have budget for 50 stores, how will you select 50 locations?

k-median
  F : potential facility locations
  C : set of clients
  k : number of facilities to open
  d : metric over F ∪ C
  find S ⊆ F, |S| = k
  minimize connection cost
  [Figure: facilities and clients]

k-median clustering

Known Results: k-median
  LP rounding:   6.667 [CGTS99], 3.25 [CL12]
  Primal-Dual:   6 [JV99], 4 [JMS03], 4 [CG99]
  Local Search:  3+ε [AGK+01]
  1+√3+ε ≈ 2.732 [LS13]
  (1+2/e)-hardness of approximation [JMS03]
  2 ≤ LP-GAP ≤ 3 (∃ exp. time algorithm)

Uncapacitated Facility Location (UFL)
  F : potential facility locations
  C : set of clients
  f_i, i ∈ F : cost for opening i
  d : metric over F ∪ C
  find S ⊆ F
  minimize facility cost + connection cost
  [Figure: facilities labeled with opening costs ($100, $100, $30, $20, ...) and clients]

Known Results: UFL
  Studied in 1960's in Operations Research
  3.16 [STA98]
  2.41 [GK99]
  3 [JV99]
  1.853 [CG99]
  1.728 [CG99]
  5+ε [Kor00]
  1.861 [MMSV01]
  1.736 [CS03]
  1.61 [JMS02]
  1.582 [Svi02]
  1.52 [MYZ02]
  1.50 [Byr07]
  1.488 [Li11]
  1.463-hardness of approximation [GK98]

(1+√3+ε)-approximation on k-median

k-median and UFL
  f = cost of a facility
  [Figure: number of open facilities as a function of f]
  Given a black-box α-approximation A for UFL
  Naïve try: find an f such that A opens k facilities
  α-approximation for k-median? No.
  Proof: α ≈ 1.488 for UFL, but α > 1.736 for k-median

k-median and UFL
  Naïve try: find an f such that A opens k facilities
  2 issues with naïve try:
  1. need strong α-approximation for UFL
       normal α-approximation:  F + C ≤ α OPT
       strong α-approximation:  αF + C ≤ α OPT
  2. cannot find f s.t. A opens exactly k facilities
       S1 : set of k1 < k facilities
       S2 : set of k2 > k facilities

bi-point solution
  S1, S2 with |S1| < k ≤ |S2|
  a, b : a|S1| + b|S2| = k, a + b = 1
  bi-point solution : aS1+bS2
  cost(aS1+bS2) = a cost(S1) + b cost(S2)

k-median and UFL
                strong approx. factor   bi-point → integral   final ratio for k-median
  [JV]          3                       ×2                    6
  [JMS]         2                       ×2                    4
  our result    2
  strong approx. factor 2: do not know how to improve this
  bi-point → integral: factor of 2 is tight!!

k-median and UFL
                strong approx. factor   bi-point → integral   bi-point → pseudo-integral   final ratio for k-median
  [JV]          3                       ×2                                                 6
  [JMS]         2                       ×2                                                 4
  our result    2                                             ×(1+√3+ε)/2                  1+√3+ε
  bi-point → integral: factor of 2 is tight!!
  Main Lemma 1: it suffices to give an α-approximate solution with k+O(1) facilities
  Main Lemma 2: bi-point solution of cost C ⇒ solution of cost ((1+√3+ε)/2)·C with k+O(1/ε) facilities

Proof of Lemma 1
  Main Lemma 1: it suffices to give an α-approximate solution with k+O(1) facilities
  clustering case: simpler proof due to anonymous reviewer
  k-median clustering is easy in practice
  reason: there is a "meaningful" clustering
  [Awasthi-Blum-Sheffet]: for constants ε, δ > 0, if OPT_{k-1} ≥ (1+δ) OPT_k, can find a (1+ε)-approx.

Proof of Lemma 1
  A : α-approx. with k + c facilities
  Apply A to (k-c, F, C, d): k centers, cost ≤ α OPT_{k-c}
  Case 1: OPT_{k-c} ≤ (1+ε) OPT_k, DONE!
  Case 2: OPT_{k-c} > (1+ε) OPT_k, apply [Awasthi-Blum-Sheffet]
  [Figure: OPT_{k-c}, ..., OPT_{k-i-1}, OPT_{k-i}, ..., OPT_k plotted against k-c, ..., k-i-1, k-i, ..., k]

Main Lemma 2: bi-point solution of cost C ⇒ solution of cost ((1+√3+ε)/2)·C with k+O(1/ε) facilities
  [JV]: bi-point solution of cost C ⇒ solution of cost 2C
  based on improving the [JV] algorithm

JV algorithm
  [Figure: sets S1 and S2 with a facility i ∈ S1; labels: prob. of opening a facility in S1, prob. of opening a facility in S2]
  τ_i = nearest facility of i in S2
  given: bi-point solution aS1+bS2
  select S'2 ⊆ S2, |S'2| = |S1| = k1
  with prob. a, open S1
  with prob. b, open S'2
  randomly open k-k1 facilities in S2 \ S'2
  guarantee: either i is open, or τ_i is open

Analysis of JV algorithm
  [Figure: client j with i1 at distance d1 and i2 at distance d2; edge label ≤ d1+d2; facility i3]
  i1 ∈ S1, i3 ∈ S'2; either i1 or i3 is open
  if i2 open:        j → i2
  else if i1 open:   j → i1
  else (i3 open):    j → i3
  E[cost of j] ≤ b × d2 + a² × d1 + ab × (2d1+d2)
               ≤ 2 × [cost of j in aS1+bS2]

Our Algorithm
  [Figure: client j, facilities i1, i2, i3; edge labels d1, d2, ≤ d1+d2, ≤ d1+d2]
  if i2 open:        j → i2
  else if i1 open:   j → i1
  else (i3 open):    j → i3
  E[cost of j] ≤ b × d2 + a² × d1 + ab × (d1+2d2)
  on average over clients (d1 >> d2): (1+√3)/2 × [cost of j in aS1+bS2]

Our Algorithm
  need to guarantee: either i is open, or τ_i is open
  for a star: either center open (with prob. a), or all leaves open (with prob. b)
  idea: open each star independently?
  may happen: #open facilities > k
  big stars: always open the center, open each leaf with prob. ≈ b
  group of small stars of the same size: dependent rounding
  each group: open 3 more facilities than expected

Summary
                strong approx. factor   bi-point → integral   bi-point → pseudo-integral   final ratio for k-median
  [JV]          3                       ×2                                                 6
  [JMS]         2                       ×2                                                 4
  our result    2                                             ×(1+√3+ε)/2                  1+√3+ε
  Main Lemma 1: it suffices to give an α-approximate solution with k+O(1) facilities
  Main Lemma 2: bi-point solution of cost C ⇒ solution of cost ((1+√3+ε)/2)·C with k+O(1/ε) facilities

Open Problems
  gap between integral solution with k+1 open facilities and LP value (with k open facilities)?
  tight analysis?
  algorithm works for k-means?

Questions?
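
Appendix sketch: bi-point rounding in code

The bi-point weights and the JV-style rounding step described in the talk can be illustrated roughly as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: S1 and S2 are disjoint lists of facility labels with |S1| < k ≤ |S2|, d is any distance function, and the function names (bipoint_weights, connection_cost, jv_round) as well as the arbitrary padding of S'2 up to size k1 are hypothetical choices made for illustration.

```python
import random


def bipoint_weights(k1, k2, k):
    """Return (a, b) with a*k1 + b*k2 = k and a + b = 1 (assumes k1 < k2)."""
    b = (k - k1) / (k2 - k1)
    return 1.0 - b, b


def connection_cost(clients, opened, d):
    """Sum over clients of the distance to the nearest open facility."""
    return sum(min(d(j, i) for i in opened) for j in clients)


def jv_round(S1, S2, k, d, rng):
    """JV-style rounding of the bi-point solution a*S1 + b*S2.

    Assumption (illustrative): S1 and S2 are disjoint lists and
    len(S1) < k <= len(S2). Returns (opened, tau).
    """
    k1 = len(S1)
    a, _b = bipoint_weights(k1, len(S2), k)

    # tau[i] = nearest facility of i in S2, for each i in S1.
    tau = {i: min(S2, key=lambda i2: d(i, i2)) for i in S1}

    # S2' contains every tau[i]; pad arbitrarily so that |S2'| = k1.
    S2_prime = set(tau.values())
    spare = [i for i in S2 if i not in S2_prime]
    while len(S2_prime) < k1:
        S2_prime.add(spare.pop())

    # With prob. a open S1, otherwise (prob. b) open S2'.
    opened = set(S1) if rng.random() < a else set(S2_prime)

    # Randomly open k - k1 further facilities in S2 \ S2'.
    rest = [i for i in S2 if i not in S2_prime]
    opened |= set(rng.sample(rest, k - k1))
    return opened, tau


if __name__ == "__main__":
    # Toy instance on a line (made-up data).
    dist = lambda x, y: abs(x - y)
    opened, tau = jv_round([0, 10], [1, 2, 9, 11], 3, dist, random.Random(0))
    print(sorted(opened), connection_cost([0, 3, 10], opened, dist))
```

Each call opens exactly k facilities, and for every i in S1 either i or tau[i] is open, which is the guarantee the analysis relies on. The sketch performs a single random draw; it does not implement the star-based dependent rounding of the improved algorithm.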