>> Kostya Makarychev: So let’s start today’s talk. It is a great pleasure to introduce Nika
Haghtalab. She is a graduate student at Carnegie Mellon University advised by [indiscernible]
and [indiscernible]. Nika works on learning theory, game theory, algorithms and she already has
very exciting results and today she will tell us about the k-center problem.
>> Nika Haghtalab: Thank you. I am going to talk about the symmetric and asymmetric k-center problems and specifically how they behave under stability conditions. Let me first start by
introducing my co-authors. This is joint work with Nina Balcan and Colin White. I have tried
very hard to include a lot of pictures and figures in this talk since it is a clustering talk, but I
don’t think any of them are going to look as good as these pictures do, but let’s hope.
So I want to talk about k-center clustering. K-center clustering is motivated by a very simple and natural question: if we want, for example, to assign k locations in a city to fire stations, how can we do that and where should we put these fire stations? But we should first think about
what our goal is when we are putting in fire stations. We really want to reduce the travel time
for the worst location possible in the city. So if the worst location was going to be on fire we
want the closest fire station to be able to send a fire truck and resolve this issue very quickly.
That is what k-center clustering is going to do.
More formally, for a set of points S and a distance metric d, the k-center problem is the problem of choosing k of those points as centers and then assigning every point to its closest center, with the goal of minimizing the maximum radius of this cover. Even more formally, we want to choose a set of centers so that for the worst point in the set, the distance to its closest center is minimized. We call this the cost of the k-center clustering and we will refer to it as r*.
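As a compact restatement of the objective just described (my notation; C is the chosen set of centers, d the distance, and I use the center-to-point direction that is fixed a bit later in the talk):

    \[ r^{*} \;=\; \min_{C \subseteq S,\ |C| = k} \; \max_{p \in S} \; \min_{c \in C} \, d(c, p). \]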
As an example, let's have a bunch of points. We assign the centers as the middle points, and r* is really the cost of the largest edge here. The partition induced by assigning each point to its closest center is the k-center clustering. The distance, however, is not always symmetric; it can be different in the two directions. For example, when you are
driving you have one way roads, so the distance to and from a location can be very different. We
are going to consider k-center under two different notions of distance, one where it is symmetric and one where it is asymmetric. For the asymmetric setting we still have some axioms about the distance: the distance of any point to itself is 0 and the directed triangle inequality holds. In addition, in the symmetric setting you also have that for any two points the distances in either direction are the same.
Our objective stays the same. We want to minimize the distance from the center to the points. So it is this direction we care about; the backward direction we don't. It really doesn't matter which you pick; as long as we stick with a consistent convention everything is fine. So for this talk we will work with distances from the center to the points. There are well known worst case
approximation results for both symmetric and asymmetric k-center. For the symmetric k-center, in '85 Gonzalez showed a tight 2-approximation algorithm. It is very simple and you cannot do better than that: any (2 - epsilon)-approximation is hard. For the asymmetric version, in '96 and then later in 2001 there was an O(log* n) approximation algorithm. If we need to remind ourselves what log* is, it is a super-constant function: if you write n as a tower of powers of 2, the height of this tower is log* n. So it is a very slowly growing function, but nevertheless it is not constant. So there was an O(log* n) approximation algorithm and the guess at that point was that this could not be tight. It is not a very natural bound to come up with, but as it turned out it was tight: a matching log* n lower bound for asymmetric k-center was shown later.
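As a point of reference for the 2-approximation just mentioned, the Gonzalez algorithm is the farthest-first traversal. The sketch below is my own minimal Python rendering of it (the function and variable names are mine, and dist is assumed to be a symmetric metric given as a function):

    def gonzalez_k_center(points, dist, k):
        """Farthest-first traversal: a 2-approximation for symmetric k-center."""
        centers = [points[0]]                          # start from an arbitrary point
        # distance from every point to its closest chosen center so far
        closest = {p: dist(centers[0], p) for p in points}
        while len(centers) < k:
            # pick the point farthest from all chosen centers and make it a center
            far = max(points, key=lambda p: closest[p])
            centers.append(far)
            for p in points:
                closest[p] = min(closest[p], dist(far, p))
        return centers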
Question?
>>: 7 others.
>> Nika Haghtalab: There are many others. This was a breakthrough result.
Good, so in many situations having a constant approximation, let alone a super-constant approximation, is not very desirable. This is really a cost you are paying: if you are paying double the cost you might not be happy with that. But unfortunately the existence of tight approximation guarantees stops us from getting better results. The main problem here is that these tight results are just for the worst case. In many situations in real life we are not given this worst case scenario; a lot of times it would be very pessimistic to assume that every day we are solving a worst case instance.
So we want to go beyond this worst case analysis and focus on instances that can naturally occur
in real life. What does that mean? It means that we make certain natural assumptions, "natural" in quotes, and then we hope that under those assumptions we can get better results: either we can improve these approximation guarantees or we can actually find the exact solution.
>>: If there is a problem for which I would say worst case makes sense, it is precisely k-center, no? But the first example you just showed -.
>> Nika Haghtalab: Yea so maybe when I show you the type of natural assumptions we make
you will see that it does make actually a lot of sense. So maybe I will come back to this question
in two or three slides.
Okay, so one of the notions that we are going to assume here is called perturbation resilience and
what it tries to capture is that in real life we make some mistakes. Things we measure can be a
little bit off or there can be fluctuations in them. For example in this case the traffic on one day
is higher than on another day, so the travel time changes from one day to another. Our hope is that these small changes, small fluctuations in the measurements, are not going to entirely change our solution. So if we decide to put the fire stations in one place, after six months hopefully we are
not going to come back and say, “Oh the traffic has changed by epsilon and now we have to
destroy all the fire stations we had and come back and put k different fire stations somewhere
else.”
So that is what perturbation resilience is trying to capture: small fluctuations don't make a big difference. More formally, actually not more formally yet, for most of this talk I am going to assume that small fluctuations don't change the solution at all. So it's not that they don't change it drastically, they don't change it at all. I will make this more robust at the end. This implies some more structure, and the hope is that this extra structure is going to
help us find the exact solution. Let's look at this example. A perturbation, or fluctuations in the distances, is going to look something like this, and in this specific scenario there is enough structure present that the clustering doesn't change at all. So this instance would be perturbation resilient.
More formally, perturbation resilience was introduced by Bilu and Linial in 2012, and what it says is that an instance with a distance metric is alpha-perturbation resilient if, when I blow up the distances by at most a multiplicative factor of alpha, so this is what that is, then the solution does not change at all. That is being alpha-perturbation resilient. One very natural implication of this definition is that the optimal clustering is already unique, because the original metric is itself a valid perturbation, so if there existed two optimal clusterings the solution could not be preserved. The other thing to note is that it's fine if the centers change. All we care about is that the partition does not change under the new distance metric; it's okay if the centers do change.
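Written out (my transcription of the condition just described), a distance function d' is an alpha-perturbation of d if

    \[ d(p, q) \,\le\, d'(p, q) \,\le\, \alpha\, d(p, q) \quad \text{for all } p, q, \]

and the instance is alpha-perturbation resilient if for every such d' the optimal partition under d' is identical to the optimal partition under d (the centers themselves are allowed to move).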
Is there any question about perturbation resilience?
Okay, the next natural assumption that I want to mention here for this talk is called
approximation stability. What it captures is that if you are approximating the cost of a function,
of a clustering in this situation, you are also approximating the membership or the partition itself.
So any clustering whose cost is close to the optimal cost is also close in membership to the actual optimal solution. This was introduced by Balcan, Blum and Gupta in 2009, and more formally what it says is that any clustering that is an alpha approximation of the cost, that is, has cost at most alpha r*, differs from the optimal clustering on at most an epsilon fraction of the points. So that's (alpha, epsilon) approximation stability. A very simple result is that if you have an (alpha, 0) approximation stable instance you have an alpha-perturbation resilient instance. This is relatively simple. So it means that approximation stability is actually a stronger notion of stability than perturbation resilience.
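In symbols (again my notation), (alpha, epsilon)-approximation stability says that for any clustering C of the instance,

    \[ \mathrm{cost}(\mathcal{C}) \,\le\, \alpha \cdot r^{*} \;\Longrightarrow\; \mathrm{dist}(\mathcal{C}, \mathcal{OPT}) \,\le\, \varepsilon, \]

where dist(C, OPT) is the fraction of points assigned differently from the optimal clustering.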
Okay, so at this point, is there any worry about this? It is something that we decide; it is an assumption that we make about well-behaved instances.
So, prior work on perturbation resilience and approximation stability: BL'09 introduced this for the max cut problem and showed that under square root of n perturbation resilience they can find the exact optimal solution. ABS'10 showed that for center-based clustering, which covers k-center, k-median, k-means and a bunch of other clustering objectives, under 3-perturbation resilience they can find the optimal clustering. BL'11 improved this to 1 + root 2 perturbation resilience. MMV'14 improved the Bilu-Linial max cut result and also showed that for min multiway cut under 4-perturbation resilience you can find the optimal solution. For approximation stability, BBG'09 showed that for k-means, k-medians and also min sum, if you have a 1 + delta approximation stable instance then you can find the optimal result quickly.
In this talk I am going to first improve on these results, the 3-PR and the 1 + root 2 PR, for the symmetric case: I am going to show that for k-center, 2-perturbation resilience is enough. On the other hand, for the asymmetric case I am going to show that if you have 3-perturbation resilience you can find the exact optimal solution, and if you have 2-approximation stability you can also find the exact solution. We also show, though I am not going to go through it in this talk as much, that both of these are tight. So anything under 2-perturbation resilience or 2-approximation stability for k-center, both symmetric and asymmetric, stays hard; finding the optimal solution stays hard.
To put what we are going to show into perspective, remember that without any stability assumptions I said there is a tight 2-approximation for k-center and a tight log* n approximation for the asymmetric version. Under stability the story changes a lot. For symmetric k-center we have 2-PR, so we can find the exact solution under 2-perturbation resilience, but for asymmetric k-center perturbation resilience changes the story even more: all of a sudden, from that log* n we can go down to a constant alpha, and that's pretty cool. For approximation stability it goes even farther, and in fact for both k-center and asymmetric k-center we get exact solutions under 2-approximation stability, which for the symmetric case is fairly easy, for the asymmetric case not so much. So it's really interesting to see that two problems which look very different in terms of difficulty without these stability assumptions can look very similar with them.
So I want to talk about the asymmetric version under perturbation resilience. One of the challenges when you are working with asymmetric clustering in general is that there are points for which the distances in the two different directions are very different. This is especially a problem when a point is much closer to a center of a different cluster, so this. Oh sorry, this slide is just showing that the two directions can be different. A point can be much closer to a center of a different cluster than to its own center. The problem here would be that if I draw a ball of radius r around each point, a point might not capture any of the points in its own cluster. So these points are not really good; we don't want to deal with them, or rather we want to filter them out and not worry about them too much for now.
I am going to introduce a set called the symmetrized set which captures this notion. It is going to include points whose forward and backward distances behave symmetrically at the threshold r*. Formally, a point p is put in the symmetrized set A if for every other point q, whenever q has distance at most r* to p, then p also has distance at most r* to q; or, written as an implication, d(q, p) <= r* implies d(p, q) <= r*. So this means that whenever the incoming distance is at most r*, the distance is at most r* in both directions.
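As a concrete reading of this definition, here is a small Python sketch that builds the symmetrized set from an asymmetric distance function and a guess of r* (the names are mine, and I read the thresholds as non-strict, i.e. at most r*):

    def symmetrized_set(points, dist, r_star):
        """Set A from the talk: keep p if every q with dist(q, p) <= r*
        also satisfies dist(p, q) <= r* (vacuously true if no such q exists)."""
        A = []
        for p in points:
            if all(dist(p, q) <= r_star
                   for q in points if q != p and dist(q, p) <= r_star):
                A.append(p)
        return A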
Let's look at an example of this. I am going to put the definition of A up there, but if there is any question about it please let me know and I will explain it. Look at this example: for the maroon point, the only other point here has distance 2r* to it. So for p equal to the maroon point and q equal to the beige point, the premise of the implication is always false, because there is no point that has distance at most r* to p. Because the premise is false, the implication is vacuously true, and our maroon point is in the set A. For the beige point, however, we have a distance of r* going into it but a distance of 2r* coming out, so that's not good. So that point is not going to be in the set A.
One more example: for this maroon point we have a distance of r* going in and again a distance of 2r* coming out, even though the two points are from different clusters, and again that's not good, so that point is not going to be in set A either. So I just defined the set A to contain the symmetrized points. It would be nice if set A is not empty, because if it is empty what I did was completely useless. So I am going to tell you about three properties of set A, and these three properties hold if we have 3-perturbation resilience.
The first property is that all of the centers are in set A, and this is really nice: at least one point, and the most important point, of each cluster, namely the center, is going to be in this set A. The second property is that if I look only at set A and two points are within distance r* of each other, then they both belong to the same optimal cluster. Note that as long as we are in set A, if two points are within distance r* in one direction they are also within distance r* in the other direction; so when we are considering set A, symmetry tells us not to worry about the direction. For the third property, now looking outside of A: if I take a point outside of A, then that point and the point in A that is closest to it both belong to the same optimal cluster again.
So these three properties show that there is a lot of structure present in this set A. Not only are the centers part of it, and any two points of A within the r* threshold belong to the same cluster, but even when you look outside of A, the set A is going to guide you and tell you which cluster those points should join. So if we have these three properties then the algorithm is very natural. Our algorithm first creates the set A, so the blue points are going to be in the set A, and then we partition A based on r*, which means that any two points that are within r* of each other are put in the same part; or you can think of it as drawing these r* edges, and the connected components of this graph form the partition. After doing that, any point that was not in A is connected to its closest point in A. The claim is that the clustering induced by this partition is the optimal clustering. I am going
to give you a one-minute proof of this by going through the algorithm one more time. We are going to create set A again, and at this point what do we know? It contains all the centers, and that was by fact 1. So if fact 1 is correct then A has all of the centers. Then we partition set A based on r*. At this point we know a couple more things: because all the centers are in A and the distance from a center to each of the points in its cluster is at most r*, each center and the points of its own cluster that lie in A are now put in the same part. That we know just by the definition of the radius. We also know that no part contains points from two different clusters. Why is that? That's by fact 2: fact 2 said that if two points of A are within distance r* then they are part of the same cluster. So if I make this partition by r*, then by fact 2 I am not mixing or merging any two different clusters together. And what do I know at the end? I know that I connected each remaining point to its closest point in A, which also has to belong to the same cluster, by fact 3. So if facts 1, 2 and 3 hold then this algorithm is correct.
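To summarize the algorithm in code, here is a minimal Python sketch of the three steps just walked through (function names are mine; it reuses the symmetrized_set sketch from earlier, assumes r* is known or guessed as discussed below, and attaches outside points using the center-to-point direction, following the convention fixed earlier in the talk):

    def asymmetric_k_center_under_3pr(points, dist, r_star):
        """Sketch: exact clustering of a 3-perturbation resilient asymmetric instance.
        Step 1: build the symmetrized set A.
        Step 2: split A into connected components of the r*-threshold graph.
        Step 3: attach every remaining point to its closest point in A."""
        A = symmetrized_set(points, dist, r_star)

        # union-find over A for the threshold components
        parent = {p: p for p in A}
        def find(p):
            while parent[p] != p:
                parent[p] = parent[parent[p]]
                p = parent[p]
            return p
        for p in A:
            for q in A:
                if p != q and dist(p, q) <= r_star:
                    parent[find(p)] = find(q)

        clusters = {}
        for p in A:
            clusters.setdefault(find(p), set()).add(p)

        # attach the points outside A to the component of their closest A-point
        A_set = set(A)
        for p in points:
            if p in A_set:
                continue
            nearest = min(A, key=lambda a: dist(a, p))   # distance from the A-point to p
            clusters[find(nearest)].add(p)
        return list(clusters.values())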
>>: So even if facts 1 and 3 [indiscernible].
>> Nika Haghtalab: No, because of the point p for the second part -. So I don't have an example here, but let's assume that there was a point here whose closest point was this guy, a point that is not the center itself. Actually this is a pretty good example here: this guy's closest point is this guy, which is not the center. So if I don't know that these two have to belong to the same cluster, that's a problem, especially because fact 3 is defined for points outside of A, not inside of A. So you do need all three facts. So is the algorithm and the overall proof clear? Yeah?
>>: So just one thing, how do you know r*?
>> Nika Haghtalab: You don’t need to know it, you can guess it. You are not in a [indiscernible]
setting, you are in a setting where you have n points and you have n squared distances, so guess r*: it has to be one of the n squared distances, and run the algorithm for each guess. That's one way of doing it. So if you
don’t know the optimal you can always guess it because you are in a very discrete setting.
>>: But if I guess incorrectly then I would get the wrong number of clusters or what?
>> Nika Haghtalab: Yes you would get the wrong number of clusters.
>>: Necessarily.
>> Nika Haghtalab: Yes, because this is monotone in k.
>>: [indiscernible].
>> Nika Haghtalab: So I want to give you a proof of facts 1 and 2; the proof of fact 3 we are going to skip.
>>: So sorry, how can you find the centers?
>> Nika Haghtalab: The centers? I am not finding them. The centers –.
>>: [indiscernible].
>> Nika Haghtalab: Exactly, so once you have the clusters you can find the centers. So I am saying this partition is the optimal clustering [inaudible].
Okay, so I am going to give you a proof of facts 1 and 2, especially because it is going to use a perturbation, so I want you to get a flavor, a feel for how these perturbations are important. Fact 3 we will ignore for now. A useful lemma, and this is by definition a useful lemma, is that for any point, the distance from that point to a center that is not its own is more than 2r*. I want to prove this, and once I prove it, facts 1 and 2 follow very quickly from it.
Let's assume that is not the case. So we have a point p and a center that is not its own such that the distance from p to that center is at most 2r*. The distance from that center to any point q of its cluster is by definition at most r*, therefore the distance from p to q is at most 3r*. So the distance from p to any point of that cluster, which is not its own, is at most 3r*. I am going to make a perturbation: this perturbation increases all of the distances by multiplying them by 3, except exactly the distances between p and this cluster; those are only increased up to 3r*, above that we do not increase them.
So this is going to look kind of like this, but it is very hard to show in the plane. What happens is that in this new distance metric, the distance between p and any point of that cluster is still at most 3r*. So if I were to cluster this way, putting p together with that cluster, the cost of this clustering would be at most 3r*. It is not really an approximation, but the cost of this clustering is 3r* in the new distance. Also, in the new distance what I did was increase all of the distances, so a distance that was r* before is 3r* now; there is no way that my previous clustering now has cost smaller than 3r*. So if I were to cluster the way it was before, the cost of that clustering is at least 3r*. So I have two clusterings that both have cost 3r*, or the other clustering is actually even better. So the optimal clustering is not unique anymore, and that is a contradiction. So I created a perturbation and showed that the optimal clustering is not unique anymore. So this is very simple
and now let’s use this lemma which says that your distance to any center that’s not your own is
more than 2r* to prove facts 1 and 2. The definition of A is up there if you need it. So we want to show that all of the centers are in set A.
Consider one of the clusters and its center. For the points that are part of that cluster, we know that the center by definition has distance at most r* to them, so no matter what the backward direction is, the conclusion of the implication is always satisfied for them. So the center would naturally be in the set, except if there were a point from a different cluster pointing to the center at a distance of at most r*; but our lemma says this just cannot happen, because the distance from such a point to the center is more than 2r*. So any center always satisfies this implication and any center therefore is always in set A.
Fact 2 says that if two points are in set A and within r* of each other, they are both in the same optimal cluster. Assume not: so we have p and q such that they are within distance r* of each other but they are in two different clusters. Then starting from p and going to q, I am paying a distance of at most r*. What about starting from q and going to its center? q is in the set A, and because the distance from its center to q is at most r*, the distance from q back to its center is also at most r*. So the distance that q pays to go to its center is at most r*. Given that, using the triangle inequality, the distance from p to the other center is at most 2r*, which contradicts the lemma. So these two facts follow very naturally from the lemma.
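For reference, the chain of inequalities just described is (writing c_j for the center of q's cluster, which is not p's own center):

    \[ d(p, c_j) \;\le\; d(p, q) + d(q, c_j) \;\le\; r^{*} + r^{*} \;=\; 2r^{*}, \]

where the bound d(q, c_j) <= r* uses that q is in A together with d(c_j, q) <= r*; this contradicts the lemma.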
Are there any questions about this? Good. So this proves that for 3-perturbation resilient instances of asymmetric k-center you can find the exact optimal solution with the very simple algorithm that I showed you.
Let's go to the symmetric case. For the symmetric case I claim that 2-perturbation resilience is enough, and that's because there is extra structure in symmetric k-center, like symmetry for example, that is going to help us. We are going to use an algorithm that was introduced by Balcan and Liang. When it was introduced it was shown to work for 1 + root 2 perturbation resilience. We are going to show that for k-center it does better: only 2-perturbation resilience is enough for it, and I will just give you an overview of why.
There are two key properties that you have under 2-perturbation resilience in symmetric instances. I am not going to prove them to you, but I will tell you what they are. The first one is that if you draw a ball of radius ri around the center ci, then you capture points from that cluster and only from that cluster. This picture is not an instance of the property I am telling you about, because I drew the ball and captured a point from another cluster; but with a little bit more structure, an example satisfies the coverage property: draw a ball of your own radius and you capture your cluster and exactly your cluster.
>>: [inaudible].
>> Nika Haghtalab: The ri is the radius of [indiscernible]. So if this is ci, the center, and this is big Ci, cluster i, then ri is the radius of that cluster.
The second property, which we call weak center proximity, is that each point is closer to its own center than to any point of a different cluster, which means that if I take p and measure the distance to its own center, and measure the distance to another point in a different cluster, the red edge is always smaller than the blue edge. I can show that these two properties hold for any 2-perturbation resilient instance of k-center. They don't necessarily hold for other center-based clustering objectives, actually: if you have 2-perturbation resilience for k-medians, for example, coverage doesn't hold. So this is very much using the structure that exists in k-center.
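As I read them from the talk, the two properties can be written as follows (with C_i the cluster of center c_i, r_i its radius, and B(c_i, r_i) the ball of radius r_i around c_i):

    \[ \text{(coverage)}\quad B(c_i, r_i) \cap S = C_i, \qquad \text{(weak center proximity)}\quad d(p, c_i) < d(p, q) \;\; \text{for all } p \in C_i,\ q \notin C_i. \]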
Based on these two properties there is a notion called the closure distance, which says that for any two sets A and B, the closure (linkage) distance between them is defined as follows: I choose one center point and draw a ball of some radius r around it. Two properties should hold: the first is that this ball should capture both A and B, and the second is that anything the ball captures, including A, B and possibly more, should be closer to this center than to any point outside of the ball. This already sounds very much like the two properties that I explained to you, and there is a reason for that; otherwise they wouldn't have come up with this notion of distance.
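To make the definition concrete, here is a small Python sketch of the closure distance as I understood it from this description (my names; I assume the candidate center ranges over the points of A and B, the metric is symmetric, and the "closer" condition is strict, as in the talk):

    def closure_distance(A, B, points, dist):
        """Smallest radius r such that some c in A or B has a ball of radius r that
        (1) covers A and B and (2) every point inside the ball is closer to c than
        to every point outside the ball."""
        union = set(A) | set(B)
        best = float("inf")
        for c in union:
            # only pairwise distances matter in this discrete setting
            for r in sorted(dist(c, p) for p in points):
                ball = {p for p in points if dist(c, p) <= r}
                if not union <= ball:
                    continue
                outside = set(points) - ball
                if all(dist(p, c) < dist(p, q) for p in ball for q in outside):
                    best = min(best, r)
                    break          # smallest valid radius found for this c
        return best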
Let's look at an example. I have sets A and B and I want to see what their closure distance is. I will pick a center point and draw a ball of some radius r around it, whatever radius this is. So do I have the properties or not? A and B are definitely captured, so that's good, but there is a point for which the red edge is bigger than the blue edge. So this r cannot be the closure distance; I have to go slightly farther. Now I go farther to capture that bad point, and now any point within the ball is closer to the center than to any point outside of it. So this is going to be the closure distance of sets A and B. Let's do one more example. I want to show you that if I take two subsets of the same cluster and I make sure that one of them contains the center of the cluster, then the closure distance of these two is at most ri, the radius of that cluster.
Okay, so we have the sets A and B; if it's not clear, A and B are the bigger sets. I am going to draw a ball of radius ri around the center and show you that it has the desired properties. It has the first property because, by the coverage property, if I draw a ball of radius ri I capture exactly the cluster; so that's good, it has the first property. For the second property, suppose it didn't hold; then some point of the ball would be closer to a point outside than to the center. But by weak center proximity you have to be closer to your own center, which happens to be the center drawn here, than to any point outside of your cluster, and the ball is exactly your cluster. This is exactly what weak center proximity says. So if you have two subsets of the same cluster and you make sure that ci is in one of them, the closure distance of the two is at most ri.
Now I will tell you what the closure distance algorithm actually does. It is a linkage-based algorithm and it's very simple; once you understand the definition of closure distance, the algorithm is very simple. It starts with all the points laid out like that, considers each of them as its own set, and then repeatedly merges the two sets that have the smallest closure linkage distance. So this is going to be an example of it and it will continue. It won't stop at k; it just keeps building this tree until there is one cluster left. So this is the hierarchical clustering of this instance, but we want exactly k clusters. So we are going to find the best pruning, and we do that by dynamic programming. What is a pruning? A pruning means that we cut this tree at some point and these are the clusters left. We are going to find the pruning into k clusters with the best cost. Another pruning could be this one, and there are 3 clusters left at this point.
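A minimal sketch of the merging phase just described, reusing the closure_distance sketch from above (the dynamic-programming step that extracts the best k-cluster pruning of the resulting tree is not shown):

    def closure_linkage_tree(points, dist):
        """Repeatedly merge the two current sets with the smallest closure
        distance until one set remains; return the list of merges (the tree)."""
        sets = [frozenset([p]) for p in points]
        merges = []
        while len(sets) > 1:
            best = None
            for i in range(len(sets)):
                for j in range(i + 1, len(sets)):
                    d = closure_distance(sets[i], sets[j], points, dist)
                    if best is None or d < best[0]:
                        best = (d, i, j)
            _, i, j = best
            merged = sets[i] | sets[j]
            merges.append((sets[i], sets[j], merged))
            sets = [s for k, s in enumerate(sets) if k not in (i, j)] + [merged]
        return merges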
So this algorithm works given one condition, which is that at no point do we merge two incomplete pieces of different clusters together. In other words, when I look at any set built by the algorithm, it is either a subset of one optimal cluster or a union of complete optimal clusters; it never merges half of one cluster with half of another cluster. If such a condition holds, then intuitively the algorithm works. What I want to tell you is why it makes sense that this condition holds. I am not going to give you a real proof, but I will give you an intuitive one. Let's take the simpler case, which is that we take two subsets from two different clusters, and I want to tell you that these are not going to merge; this merge is not going to happen at any point in the algorithm. Why not?
So A might or might not contain the center ci. If it does, good; if it doesn't, let's pick another set A prime that does contain ci. By what I explained to you two slides ago, we know that the closure distance of these two is at most ri. If you want to remember why, it's because by coverage the ball of radius ri captures all of the points of the cluster, and by weak center proximity nothing in it is closer to points outside than to the center. So definitely for A prime and A the closure distance is at most ri. What about for A and B? Okay, let's see what their closure distance is. We pick a candidate center, we draw the ball, and now we look at the cluster center ci that sits inside A. What happens? By coverage, ci is definitely closer to the points of its own cluster than to the points outside of it; we said that if we draw the small ball of radius ri we only capture our own points. So ci is closer to a point of its own cluster that lies outside this ball than to the candidate center, and that cannot satisfy the closure distance conditions. So if I want the closure distance, I have to go even farther and capture the whole cluster containing A, which means that now I have to pay more than ri. This was just an example of the proof; it is not a
total proof because there are cases we didn't cover, cases like: what if one of them is a complete cluster? That is a much more challenging situation, so I am not going to cover it, but intuitively the larger the set is, the larger its closure distance is going to be, and the properties that we have, weak center proximity and coverage, are going to help us with that. So this shows that 2-perturbation resilience is enough for k-center.
We also have lower bounds which say that there is no polynomial time algorithm that finds the optimal clustering for symmetric instances under (2 - epsilon)-approximation stability. Let's remember that (alpha, 0)-approximation stability implies alpha-perturbation resilience, so our lower bound is for a notion of stability stronger than perturbation resilience and it implies a perturbation resilience lower bound as well. It is also for symmetric instances, and any symmetric instance is an asymmetric instance as well. So this result says that a bunch of our results are tight: the 2-perturbation resilience for the symmetric case that I just showed you, and the 2-approximation stability for the asymmetric case that I didn't show you, but it's very similar to the 2-perturbation resilience argument. What is not known to be tight is the 3-perturbation resilience for the asymmetric case. It would be very interesting to figure this out, but we don't know right now whether the algorithm we have is tight or not.
I said that for perturbation resilience we are considering the scenario where fluctuations make no difference at all in the clustering. This is not a robust notion, so we also want to work with a more robust notion of perturbation resilience called (alpha, epsilon)-perturbation resilience, which means that under an alpha-perturbation the clustering is allowed to change on an epsilon fraction of the points: a little bit of change in membership under alpha-perturbations. In the paper we show that (4, epsilon)-perturbation resilience is enough for k-center, and the nice thing is that actually single linkage will give you that. It's an elaborate proof; I think with something like (6, epsilon) it is fairly easy, but (4, epsilon) takes a little bit of time.
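In symbols (my notation, mirroring the earlier definitions), (alpha, epsilon)-perturbation resilience asks that for every alpha-perturbation d' of d,

    \[ \mathrm{dist}\big(\mathcal{OPT}(d'), \mathcal{OPT}(d)\big) \,\le\, \varepsilon, \]

where dist again measures the fraction of points whose cluster membership differs.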
>>: [inaudible].
>> Nika Haghtalab: So you need your clusters to be larger than epsilon n, like 2 epsilon n or
something like that. You need some largeness of clusters assumptions. Actually we have lower
bounds that say if they are not large then it’s difficult.
>>: [inaudible].
>> Nika Haghtalab: Yes, you get the exact [indiscernible]. We have more results for the robust versions of both of these notions that I'm not going to mention. Let's go back and think a little bit about what we showed. We started working with these natural stability notions, approximation stability and perturbation resilience, where the hope when these notions were introduced was that if I have 1 + epsilon approximation stability or 1 + epsilon perturbation resilience then I can still find either an approximate or the exact solution; that was the hope.
So our lower bound was kind of negative in that sense. On the upper bound side we showed that we can take a problem that doesn't even have a constant factor approximation and show that 3-perturbation resilience is enough; that was a very strongly positive result. Our lower bound is somewhat negative: it says that even just below 2-perturbation resilience or 2-approximation stability there is simply not enough structure to find the optimal. So going back, we need to think. Hopefully this won't happen for k-means and k-medians; k-center is much more sensitive to distances. So the hope is that this will not happen for k-means and k-medians, but it would be very interesting to see if you can even get below 1 + root 2, which is the best known result for them right now.
If it does happen for k-means or k-medians, or in any case for k-center, we do need a stronger notion of stability. What would be a reasonable notion that is not assuming way too much?
>>: So something should [inaudible].
>> Nika Haghtalab: Yes.
>>: [inaudible].
>> Nika Haghtalab: So for example in the 1 + epsilon approximation stability result [indiscernible], the run time also depends on epsilon and the restriction they have on the cluster sizes is based on epsilon. So there are definitely things that won't work when epsilon is 0.
>>: [inaudible].
>> Nika Haghtalab: I don't know about their case. I think they have multiple results, but I don't know. But generally the hope is that you want to be able to say, I mean, a 2-perturbation is kind of a large perturbation, but 1 + epsilon you can justify very easily. So that's sort of the hope, and this lower bound is somewhat negative because of that. So if we want to consider k-center, or even k-means and k-median depending on what we get there, we do need to ask what would be a more natural stability notion that is not assuming way too much.
To conclude, it would be interesting to see how asymmetry and stability interact in general. We showed how things work for the asymmetric k-center objective, but there are other asymmetric clustering objectives; what happens with asymmetric k-means or asymmetric k-medians would be interesting. It is also interesting to see whether the hardness we showed is tight for the 3-perturbation resilient asymmetric case or not: there is a gap between 2 and 3 for perturbation resilience in the asymmetric setting, so it would be very interesting to see what the actual value is. And with that I want to thank you. Are there any questions?
[Applause]
>>: [indiscernible].
>> Nika Haghtalab: Which property?
>>: [indiscernible].
>> Nika Haghtalab: I am not sure if I can guarantee that. The algorithm, for example the closure distance algorithm, is, given the properties, sort of a natural notion of distance, but if you don't know those properties it doesn't seem like something very natural. So I'm not sure if it will imply anything nice about the LP solution. But the properties that I proved hold only under 2-perturbation resilience, or 3-perturbation resilience for the asymmetric situation, because if you go below those values the properties break.
And in fact we have a section in our paper about this: for k-center we got down from 1 + root 2 to 2, but what about k-means and k-median? We noticed that these properties break, so we tried to replace them with other properties, assuming some additional things. Maybe that's sort of a first step to moving beyond k-center and moving beyond 1 + root 2 in the future, but it's not very clear. I don't think it would be easy to just show that the LP is all of a sudden behaving nicely; you have to analyze that separately, I think.
>>: I agree, I mean one of the issues is that these notions are not very robust. So for instance you talked a little bit about this (alpha, epsilon) resilience; I mean that seems much more interesting.
>> Nika Haghtalab: It is.
>>: But what you showed us is not alpha epsilon resilience.
>> Nika Haghtalab: That’s true. So the alpha epsilon implication is much more interesting, I
agree. The results are much less crisp. That’s why I didn’t show it to you.
>>: Do we know if there must be a separation between (alpha, epsilon) and (alpha, 0)?
>> Nika Haghtalab: No, we don't know of a separation for it. So the best result we have for k-center is what I showed, which is the (4, epsilon). We think you can do (3, epsilon), but I don't know.
>>: [inaudible].
>> Nika Haghtalab: So the robust algorithm for the (4, epsilon) result is single linkage.
>>: Okay.
>> Nika Haghtalab: So it’s really showing that there are properties that hold.
>>: [inaudible].
>> Nika Haghtalab: No, no, no, single linkage.
>>: Okay.
>>: [indiscernible].
>> Nika Haghtalab: Yes, so that you have to do. If you don’t we have hardness results.
>>: [indiscernible].
>> Nika Haghtalab: Exactly, we have other sorts of linkage-style algorithms, like single linkage plus some pre-processing or post-processing, which is also interesting because it's a real algorithm.
>>: [indiscernible].
>>: But in practice would you solve the LP or would you actually run something like single
linkage?
>>: [indiscernible].
>> Nika Haghtalab: Yea, these are like n squared at most.
>>: [indiscernible].
>>: [indiscernible].
>> Nika Haghtalab: So you can't, well, we have one result in the paper where you get a 2-approximation, assuming, I don't remember exactly which assumption it is, but yes, there is a setting where you don't get the exact solution, you get some approximation for slightly better perturbation resilience. So you can definitely do that. Our hardness results of course don't cover that, so I think it would be doable. I don't know of a nice connection between them though. That would be cool to see.
>> Kostya Makarychev: More questions? Then let’s thank the speaker again.
[Applause]