# long

```
Improved approximation for k-median
Shi Li
Department of Computer Science
Princeton University
Princeton, NJ, 08540
04/20/2013
Facility Location Problem
minimize maintenance cost + transportation cost
[Figure: example instance; cost labels \$20, \$10, \$50, \$100, \$130, \$30, \$30]
BALINSKI, M. L. 1966. On finding integer solutions to linear programs. In Proceedings of the IBM Scientific Computing Symposium on Combinatorial Problems. IBM, New York, pp. 225–248.
KUEHN, A. A., AND HAMBURGER, M. J. 1963. A heuristic program for locating warehouses.
STOLLSTEIMER, J. F. 1961. The effect of technical change and output expansion on the optimum number, size and location of pear marketing facilities in a California pear producing region. Ph.D. thesis, Univ. California at Berkeley, Berkeley, Calif.
STOLLSTEIMER, J. F. 1963. A working model for plant numbers and locations. J. Farm Econom. 45, 631–645.
Uncapacitated Facility Location (UFL)
F : potential facility locations
C : set of clients
f_i, i ∈ F : cost for opening i
d : metric over F ∪ C
find S ⊆ F,
minimize  Σ_{i ∈ S} f_i  +  Σ_{j ∈ C} d(j, S)
          (facility cost)   (connection cost)
[Figure: facilities and clients; cost labels \$100, \$100, \$30, \$20]
Wal-mart Stores in New Jersey
Question : Suppose you have a budget for 50 stores. How will you select the 50 locations?
k-median
F : potential facility locations
C : set of clients
k : number of facilities to open
d : metric over F ∪ C
find S ⊆ F, |S| = k
minimize  Σ_{j ∈ C} d(j, S)
k-median clustering
Known Results: UFL
• O(log n)-approximation [Hoc82]
• constant approximations
  • 3.16 [STA98]
  • 2.41 [GK99]
  • 3 [JV99]
  • 1.853 [CG99]
  • 1.728 [CG99]
  • 5+ε [Kor00]
  • 1.861 [MMSV01]
  • 1.736 [CS03]
  • 1.61 [JMS02]
  • 1.582 [Svi02]
  • 1.52 [MYZ02]
  • 1.50 [Byr07]
  • 1.488 [Li11]
• 1.463-hardness of approx. [GK98]
4 Deterministic rounding of linear programs
  4.5 The uncapacitated facility location problem
5 Random sampling and randomized rounding of linear programs
  5.8 The uncapacitated facility location problem
7 The primal-dual method
  7.6 The uncapacitated facility location problem
9 Further uses of greedy and local search algorithms
  9.1 A local search algorithm for the uncapacitated facility location problem
  9.4 A greedy algorithm for the uncapacitated facility location problem
12 Further uses of random sampling and randomized rounding of linear programs
  12.1 The uncapacitated facility location problem
Known Results: k-median
• pseudo-approximation
  • 1-approx. with O(k log n) facilities [Hoc82]
  • 2(1+ε)-approx. with (1+1/ε)k facilities [LV92]
• super-constant approximation
  • O(log n loglog n) [Bar96, Bar98]
  • O(log k loglog k) [CCGS98]
Known Results: k-median
• constant approximation
  LP rounding  : 6.667 [CGTS99], 3.25 [CL12]
  Primal-Dual  : 6 [JV99], 4 [JMS03], 4 [CG99]
  Local Search : 3+ε [AGK+01]
  1+√3+ε [LS13]
• (1+2/e)-hardness of approximation [JMS03]
Lloyd Algorithm [Lloyd82]
• k-means clustering : min total squared distances
• k-means vs k-median
  • clustering: k-means is more often used
  • Walmart example: k-median is more appropriate
  • approximation: k-median is “easier”
Local Search
• Can we improve the solution by p swaps?
  • No : stop
  • Yes : swap and repeat
• Approximation :
  • k-median : 3+2/p [AGK+01]
  • k-means : (3+2/p)^2 [KMN+02]
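LP for k-median
y_i : whether to open i
x_{i,j} : whether to connect j to i
minimize  Σ_{i ∈ F, j ∈ C} d(i, j) x_{i,j}
s.t.  Σ_{i ∈ F} y_i ≤ k                     (open at most k facilities)
      Σ_{i ∈ F} x_{i,j} = 1  for all j ∈ C   (client j must be connected)
      x_{i,j} ≤ y_i  for all i ∈ F, j ∈ C    (client j can only be connected to an open facility)
      x_{i,j}, y_i ≥ 0
integrality gap is at least 2
integrality gap is at most 3 (proof non-constructive)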
(1+√3+ε)-approximation on k-median
k-median and UFL
• f = cost of a facility
• f ↑  →  #open facilities ↓
Given a black-box α-approximation A for UFL
Naïve try : find an f such that A opens k facilities
α-approximation for k-median?
Proof : α ≈ 1.488 for UFL, α > 1.736 for k-median
k-median and UFL
Naïve try : find an f such that A opens k facilities
2 issues with naïve try :
1. need LMP α-approximation for UFL
   α-approximation :      F + C ≤ α·OPT
   LMP α-approximation :  α·F + C ≤ α·OPT
   (here F = facility cost and C = connection cost of the solution)
LMP = Lagrangian Multiplier Preserving
k-median and UFL
Naïve try : find an f such that A opens k facilities
2 issues with naïve try :
1. need LMP α-approximation for UFL
2. can not find f s.t. A opens exactly k facilities
   S1 : set of k1 < k facilities
   S2 : set of k2 > k facilities
   bi-point solution
k-median and UFL
2 issues with naïve try :
1. need LMP α-approximation for UFL
2. can not find f s.t. A opens exactly k facilities

              LMP approx. factor   bi-point → integral   final ratio for k-median
[JV]          3                    ×2                    6
[JMS]         2                    ×2                    4
our result    2

do not know how to improve
this factor of 2 is tight !!
bi-point solution
S1, S2 : k1 = |S1| < k ≤ |S2| = k2
a, b : ak1 + bk2 = k, a + b = 1
bi-point solution : aS1+bS2
cost(aS1+bS2) = a cost(S1) + b cost(S2)
gap-2 instance
S1 : k1 = 1, cost(S1) = k+1
S2 : k2 = k+1, cost(S2) = 0
(1/k) k1 + ((k-1)/k) k2 = k
cost(aS1+bS2) = (k+1)/k
cost of integral solution = 2
[Figure: gap-2 instance; labels 0 and 1]
k-median and UFL

              LMP approx. factor   bi-point → integral   bi-point → pseudo-integral   final ratio for k-median
[JV]          3                    ×2                                                  6
[JMS]         2                    ×2                                                  4
our result    2                                          ×(1+√3+ε)/2                   1+√3+ε

this factor of 2 is tight !!
Main Lemma 1 : suffice to give an α-approximate solution with k+O(1) facilities
Main Lemma 2 : bi-point solution of cost C → solution of cost (1+√3+ε)/2 · C with k+O(1/ε) facilities
Main Lemma 1
A : black-box α-approximation with k+c open facilities
A' : (α+ε)-approximation with k open facilities
A' calls A n^O(c/ε) times.
[Figure: instance where one set of open facilities has cost = 0, but k open facilities have huge cost]
Dense Facility
B_i : set of clients in a small ball around i
i is A-dense if the connection cost of B_i in OPT is ≥ A
this instance : i is A-dense for A ≈ opt
[Figure: facility i and its ball B_i]
Dense Facility
Reduction component works directly if there are no opt/t-dense facilities, t = O(c/ε)
can reduce to such an instance in n^O(t) time
[Figure: facility i and its ball B_i]
Lemma 1 from [ABS]
Main Lemma 1 : suffice to give an α-approximate solution with k+O(1) facilities
• k-median clustering is easy in practice
• reason : there is a “meaningful” clustering
[Awasthi-Blum-Sheffet] : ε, δ > 0 constants,
OPT_{k-1} ≥ (1+δ) OPT_k → can find (1+ε)-approximation
Lemma 1 from [ABS]
[ABS] OPT_{k-1} ≥ (1+δ) OPT_k → (1+ε)-approximation
A : α-approximation algorithm for k-median with k+c medians
Algorithm
• Apply A to (k-c, F, C, d) → solution with k facilities of cost ≤ α OPT_{k-c}
• Apply [ABS] to each (k-i, F, C, d) for i = 0, 1, 2, …, c-1
• Output the best of the c+1 solutions
Proof
• If OPT_{k-c} ≤ (1+ε) OPT_k, then done.
• Otherwise, consider the smallest i s.t. OPT_{k-i-1} ≥ (1+ε)^{1/c} OPT_{k-i}
• [ABS] on (k-i, F, C, d) → solution of cost (1+ε) OPT_{k-i} ≤ (1+ε)^2 OPT_k
Main Lemma 2 : bi-point solution of cost C → solution of cost (1+√3+ε)/2 · C with k+O(1/ε) facilities
[JV] bi-point solution of cost C → solution of cost 2C
• based on improving [JV] algorithm
JV algorithm
[Figure: facility sets S1 and S2; facility i]
τ_i = nearest facility of i
given : bi-point solution aS1+bS2
select S’2 ⊆ S2, |S’2| = |S1| = k1
with prob. a, open S1
with prob. b, open S’2
randomly open k-k1 facilities in S2 \ S’2
guarantee : either i is open, or τ_i is open
Analysis of JV algorithm
[Figure: client j at distance d1 from i1 and d2 from i2; i3 within distance ≤ d1+d2]
i1 ∈ S1, i3 ∈ S’2
either i1 or i3 is open
If i2 is open, connect j to i2
Otherwise, if i1 is open, connect j to i1
Otherwise connect j to i3
E[cost of j] ≤ 2 × [cost of j in aS1+bS2]
Our Algorithm
[Figure: client j at distance d1 from i1 and d2 from i2; i3 within distance ≤ d1+d2, shown for both constructions]
on average, d1 >> d2
before : d(j, i3) ≤ 2d1+d2
now : d(j, i3) ≤ d1+2d2
If i2 is open, connect j to i2
Otherwise, if i1 is open, connect j to i1
Otherwise connect j to i3
E[cost of j] ≤ (1+√3)/2 × [cost of j in aS1+bS2]
Our Algorithm
need to guarantee : either i is open, or τ_i is open
for a star, either the center is open, or all leaves are open
[Figure: a star; labels τ_i and i]
first try : open each star independently?
• with prob. a, open the center
• with prob. b, open the leaves
• problem : can not bound the number of open facilities
idea :
• big stars: always open the center, open each leaf with prob. ≈ b
• group small stars of the same size, dependent rounding
• for each group, open 3 more facilities than expected
small stars
small star : star of size ≤ 2/(abε)
M_h : set of stars of size h, m = |M_h|
Roughly,
  for am stars, open the center
  for bm stars, open the leaves
More accurately,
  permute the stars and the facilities
  open top ⌈am⌉+1 centers
  open bottom ⌈bhm⌉ leaves
big stars
size h > 2/(abε)
always open the center
randomly open ⌊a+bh⌋-1 leaves
⌊a+bh⌋-1 ≈ bh for big star
Lemma : we open at most k + 6/(abε) facilities.
for a big star of size h,
  FRAC : a+bh
  ALG : ⌊a+bh⌋ ≤ a+bh
for a group of m small stars of size h
  FRAC : m(a+bh)
  ALG : ⌈am⌉+1+⌈bhm⌉ ≤ m(a+bh)+3
there are at most 2/(abε) groups
Summary

              LMP approx. factor   bi-point → pseudo-integral   final ratio for k-median
[JV]          3                    ×2                           6
[JMS]         2                    ×2                           4
our result    2                    ×(1+√3+ε)/2                  1+√3+ε

Main Lemma 1 : suffice to give an α-approximate solution with k+O(1) facilities
Main Lemma 2 : bi-point solution of cost C → solution of cost (1+√3+ε)/2 · C with k+O(1/ε) facilities
Open Problems
• gap between integral solution with k+1 open facilities and LP value (with k open facilities)?
• tight analysis?
• algorithm works for k-means?
Questions?
```
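The bi-point weights and the small-star rounding step from the slides can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `bipoint_weights` and `round_small_star_group` are hypothetical, stars are represented only by their index and leaf count, and the reading of "open bottom ⌈bhm⌉ leaves" as "list the leaves star by star in the permuted order and open the last ⌈bhm⌉ of them" is an assumption. Exact fractions are used so the ceilings are not affected by floating-point error.

```python
import math
import random
from fractions import Fraction


def bipoint_weights(k1, k2, k):
    """Return (a, b) with a*k1 + b*k2 = k and a + b = 1, for k1 < k <= k2."""
    if not (k1 < k <= k2):
        raise ValueError("need k1 < k <= k2")
    b = Fraction(k - k1, k2 - k1)
    return 1 - b, b


def round_small_star_group(m, h, a, b, rng=random):
    """Round one group of m stars that all have h leaves (hypothetical helper).

    Returns (open_centers, open_leaves): a set of star indices whose center
    is opened, and a set of (star, leaf) pairs for the opened leaves.
    """
    order = list(range(m))
    rng.shuffle(order)  # random permutation of the stars

    # Open the centers of the top ceil(a*m) + 1 stars (capped at m).
    n_centers = min(m, math.ceil(a * m) + 1)
    open_centers = set(order[:n_centers])

    # Assumption: leaves are listed star by star in the permuted order,
    # with the leaves inside each star shuffled as well.
    leaves = []
    for star in order:
        within = list(range(h))
        rng.shuffle(within)
        leaves.extend((star, leaf) for leaf in within)

    # Open the bottom ceil(b*h*m) leaves of that list.
    n_leaves = min(len(leaves), math.ceil(b * h * m))
    open_leaves = set(leaves[len(leaves) - n_leaves:])
    return open_centers, open_leaves
```

For the gap-2 instance with k = 10, `bipoint_weights(1, 11, 10)` gives a = 1/10 and b = 9/10, matching the weights 1/k and (k-1)/k. Under the assumed leaf ordering, every star that is not among the top ⌈am⌉+1 stars has all of its leaves in the opened suffix, so each star has either its center or all of its leaves open, and the group opens at most m(a+bh)+3 facilities.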