>> Yuval Peres: Good morning. Kind of a long introduction. Let me just quote from a recent
e-mail I got that described Claire as the queen of the approximation scheme. And I think we'll
see some of that today. Please.
>> Claire Mathieu: Thank you, Yuval. Thank you for inviting me. It's a pleasure to be here.
When I visit here, often I talk about [inaudible] algorithms or probability and algorithms. Today
I want to talk about something -- the algorithmic design side of me, which is really the main
focus of my research, designing approximation schemes for combinatorial optimization
problems.
Here's a laundry list of problems. These are all algorithmic problems. Some of them are packing
and scheduling type problems. Some of them are network design problems in the Euclidean
plane. Some of them are planar graph problems. Some are metric space problems. And then
some miscellaneous optimization problems.
Most of these problems are NP-hard. And the question is what do we do when we have an
optimization problem that we need to solve and yet it's NP-hard.
Our answer is: to deal with such NP-hard problems, we design algorithms that are not exact --
they don't give you the best answer -- but they are pretty fast and they still give you good
approximation guarantees.
So most of this talk will be about approximation schemes. What is an approximation scheme?
It's an algorithm that runs in polynomial time, and then it gives you -- it outputs a solution whose
value is very close to optimum. The relative error between the value of the output and the
optimum value is at most epsilon.
So when you have a maximization problem, the value of the output is at least 1 minus epsilon
times opt. When you have a minimization problem, the value of the output is at most 1 plus
epsilon times opt. The algorithm is parameterized by the epsilon.
We have an approximation scheme when for every epsilon there's an algorithm. So you tell me I
don't really want the exact solution since that's not achievable in polynomial time, but I want a 5
percent guarantee. You tell me that. I answer here's an algorithm. I'll guarantee the answer will
be within 5 percent of opt and the runtime is polynomial.
Now, if you say 1 percent, I'll give you a different algorithm. Slower, but polynomial time in the
size of the input. Of course the runtime depends on epsilon.
Okay. So throughout the talk I will use the following acronym, PTAS, polynomial time
approximation scheme. So you're going to see this word many times in the talk. In a sense, this
kind of result can be seen as a last step. That is, for many of these problems, people had
previously designed 3-approximations -- an algorithm whose output is at most three times the
optimal value -- or 2-approximations, and an approximation scheme is in a way the best we can
hope for if P is different from NP, in a sense.
All right. Let's get started. This problem is what got me interested in approximation algorithms.
You see this man?
>> [inaudible].
>> Claire Mathieu: What?
>> It looked like [inaudible].
>> Claire Mathieu: This is some random company. This is what they put on their Web page.
There's a man. There's some cloth. You order some clothes and they cut the cloth according to
the patterns. So you're cutting this cloth and they want -- they don't want to waste too much
tissue -- too much cloth, sorry.
So they're trying to [inaudible] the patterns, put them on the cloth, to minimize the total length
of cloth that they use. We have this strip. They want to put the patterns on this strip.
When did I get interested in this? It was many, many, many years ago I worked on tiling
problems. And we had algorithms for tiling regions with little tiles. And I was interested in
approximate tiling, what if the region is a little different, and then the exact algorithms break
down completely. So I -- that's what got me into approximation algorithms.
Okay. So here we go. Now this looks more mathematical, doesn't it? The input is a set of
rectangles. Each rectangle has a width and a height; the width is at most 1 and the height is at
most 1. The output is a packing of these rectangles in a strip of width 1, and the goal is to
minimize the height used.
Of course you will say that clothes patterns are not really rectangles. Okay. Well, let's think
about cutting wood. Okay. And the result -- this is joint work with Eric Remila, 1996 -- is a
PTAS, a polynomial time approximation scheme.
And the tool comes from mathematical programming. It's a linear programming relaxation of the
problem. So here's one very simple instance. Every object has width either 4 or 3. The total --
this strip has width 7. Then how do you solve the problem? Let me just go into the technical part
for a little bit, for those of you who do research in theory, so that you see there's some theory
here.
We can define a variable x(4,3), which says: if you intersect the solution with this line, you
intersect one piece of width 4 and one piece of width 3. Now, given a solution, when you slide
this line, how often do you meet one 4 and one 3? That's x(4,3).
So these are your variables. Then you have constraints. For example, the objects of width 3,
sometimes when you draw your line, you need just one of those. Sometimes two of those.
Sometimes one of those and one of width 4. All in all, you have to capture -- get enough of them
to cover all the objects of width 3.
So here you have variables and constraints. This defines a linear program. And then you get to
work.
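To make this concrete, here is a minimal sketch of such a configuration LP for the toy instance above (widths 4 and 3, strip width 7). The total piece heights are invented numbers, and the use of scipy is just one way to solve it; this is an illustration, not the algorithm from the paper.

```python
# Fractional strip packing as a configuration LP (a sketch; the piece
# heights below are invented for illustration).
# A "configuration" is a multiset of widths that fits in the strip of
# width 7: (4,), (3,), (3, 3), (4, 3).
# x[c] = total height over which configuration c is used.
from scipy.optimize import linprog

configs = [(4,), (3,), (3, 3), (4, 3)]
total_height = {3: 8.0, 4: 10.0}   # total height of width-3 / width-4 pieces

widths = sorted(total_height)
# Covering constraint for each width w:
#   sum over configs c of (count of w in c) * x[c] >= total_height[w].
# linprog expects A_ub @ x <= b_ub, so we negate both sides.
A_ub = [[-c.count(w) for c in configs] for w in widths]
b_ub = [-total_height[w] for w in widths]
cost = [1.0] * len(configs)        # minimize the total strip height used

res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * len(configs))
print(res.x, res.fun)              # fractional height per configuration, LP value
```

With a constant number of big widths there is only a constant number of configurations, so rounding each x[c] up to the nearest integer, as she describes next, adds only a constant to the height.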
>> Why is this sufficient?
>> Claire Mathieu: Why is this efficient?
>> Why is this sufficient?
>> Claire Mathieu: Sufficient.
>> [inaudible].
>> Claire Mathieu: Well, you get rid of small objects. You round big objects. You only have a
[inaudible] so let's say you just have 4 and 3. If you only have two different widths, then you
only have a constant number -- two different big widths. You only have a constant number of
variables, a constant number of configurations, and so you can just round each configuration up
to the nearest integer. Oh, I forgot. This is an asymptotic PTAS.
When opt goes to infinity, you're just rounding a few variables. So it doesn't -- it's -- the increase
is negligible. Okay. So that's strip packing.
>> [inaudible].
>> Claire Mathieu: [inaudible] not allowed to rotate. That's right. So I guess the cloth has some
texture. Or the wood. You know, you want to cut across the grain, not the other way.
All right. So once we saw that these mathematical programming techniques could be used for
two-dimensional packing problems, then we started wondering what other problems could be
amenable to this kind of approach. And I visited people at AT&T. And here's one problem that
they suggested to me. Dynamic storage allocation. It's an old problem that comes from dynamic
memory allocation in operating systems. We were interested in this when I visited about ten
years ago because of problems coming from Sonet rings. It's somewhat relevant to routing
weighted calls in Sonet networks. You have to schedule requests of limited durations in an
all-optical network. And to each request you assign a set of wavelengths, adjacent wavelengths.
And you have to do this with a limited number -- total number of wavelengths.
So Sonet rings: the network is a ring. Now, forget about that. Imagine you cut it up and it's
just one-dimensional. Then we get exactly dynamic storage allocation.
So here's the math problem. This is the input. Rectangles. Something happened here. Each
rectangle corresponds to a request. There's a start time and an end time. Say starts at time 3,
ends at time 7.
This is time. This is wavelengths. Each request requests a certain number of wavelengths from
this start time to that end time. And you have to decide from time 3 to time 7 I will assign this
range of wavelengths to that request. And you can choose which range.
In other words, this is like the strip packing problem, except that these rectangles only slide
vertically. You're not allowed to slide them horizontally. Okay.
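To make the geometry concrete, here is a naive first-fit heuristic for this problem -- not the PTAS from the paper, and the requests are made up; it just shows that each rectangle's horizontal position is fixed by its time interval and only the vertical offset is chosen.

```python
# Dynamic storage allocation with a naive first-fit rule (illustration
# only, not the PTAS): each request (start, end, height) keeps its time
# interval and gets the lowest wavelength offset that avoids overlap.
def first_fit(requests):
    placed = []  # tuples (start, end, height, offset)
    for s, e, h in requests:
        # vertical ranges of already-placed requests overlapping in time
        blocks = sorted((o, o + ph) for ps, pe, ph, o in placed
                        if ps < e and s < pe)
        offset = 0
        for lo, hi in blocks:
            if offset + h <= lo:   # the request fits below this block
                break
            offset = max(offset, hi)
        placed.append((s, e, h, offset))
    return placed

print(first_fit([(0, 4, 2), (3, 7, 1), (2, 5, 2)]))
```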
So that's the dynamic storage allocation problem. And what is the result? Well, if the maximum
height is much less than opt -- so if no request requests, you know, 30 percent of the total amount of
wavelengths, then we can do it. We have a PTAS. That's joint work with Adam Buchsbaum,
Howard Karloff, Nick Reingold, and Mikkel Thorup, who were all at AT&T at the time. All
right. Can we do more? Can we solve other packing and scheduling problems using these --
using mathematical programming techniques?
Well, there's one more problem on which I worked around that time with a student of mine --
then a student of mine, Nicolas Schabanel: broadcast disk scheduling. This is a problem that is
related to Video on Demand. It's about asymmetric wireless communication, where there's a
much larger communication capacity from the information source to the recipient than in the
reverse direction.
For example, think about mobile clients who are trying to retrieve information from a server,
server base station, through wireless medium. When a user requests some item, some
information, the request doesn't actually get propagated through the system, but what happens is
that the client waits until the relevant information is broadcast, and that broadcast delivers the
information to the client. So it's pseudo-interactive. The schedule of information is actually
oblivious to the clients.
Okay. So here's the math problem, kind of, the model. I like to think about it in terms of a radio
station. When I'm in my car, I turn on the radio. I can listen to news, sometimes there are sports,
I can listen to the weather, and occasionally I'm interested in traffic information.
Now, when people turn on their radio, usually there's something they have in mind. They want
to hear one of these things. Let's say that the people who turn on their radio because they want
to listen to the news arrive as a Poisson process at a certain known rate.
The people who want to listen to sports, some other Poisson process, and so on. There are four
different kinds of users, each with its own Poisson process.
Now, what is the problem? The radio station has to decide what to broadcast when. For
example, news every ten minutes. Now, what to broadcast when? What for? Well, one goal
could be to minimize the average response time, the listeners' waiting time, how long they have
to wait before they get what they want to hear.
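Under the Poisson model this objective has a clean closed form: arrivals of each type land uniformly over the broadcast cycle, so the expected wait for an item is the sum of the squared gaps between its broadcasts, divided by twice the cycle length. A small sketch, with a made-up schedule and rates:

```python
# Average response time of a cyclic broadcast schedule under Poisson
# arrivals (sketch). With gaps g between consecutive broadcasts of an
# item in a cycle of length T, a random listener waits sum(g^2)/(2T)
# in expectation. Assumes every item appears in the schedule.
def avg_response_time(schedule, rates):
    T = len(schedule)              # one time slot per broadcast
    total_rate = sum(rates.values())
    avg = 0.0
    for item, rate in rates.items():
        slots = [t for t, x in enumerate(schedule) if x == item]
        gaps = [b - a for a, b in zip(slots, slots[1:])]
        gaps.append(slots[0] + T - slots[-1])          # wrap-around gap
        avg += (rate / total_rate) * sum(g * g for g in gaps) / (2 * T)
    return avg

schedule = ["news", "sports", "news", "weather", "news", "traffic"]
rates = {"news": 5.0, "sports": 2.0, "weather": 2.0, "traffic": 1.0}
print(avg_response_time(schedule, rates))
```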
Okay. So this is the problem that we worked on. Of course you can criticize the model in many
ways, but this one has a clean solution: a PTAS designed jointly with Nicolas Schabanel and
Neal Young in 2000, ten years ago already.
So these are just examples of packing scheduling problems for which mathematical
programming and a little bit of probability in this case can help you design a good algorithm.
That was the first part of the approximation schemes I wanted to show, approximation schemes
for packing and scheduling.
Now, actually this is one part of combinatorial optimization, but a big part of combinatorial
optimization has to do with graph problems, graph optimization problems, network design
problems.
And for those problems, a big event happened about 15 years ago when Sanjeev Arora designed
an approximation scheme for geometric problems. Let's say you have an optimization problem
where your data is points in the Euclidean plane, and then you want to find the best, say, TSP
tour. How can you do that? It used to be that you only had constant-factor approximations.
Since Arora and Mitchell's work, there is an approximation scheme for the Traveling Salesman
Problem in the Euclidean plane.
So this came out about 15 years ago, and I was so impressed by that paper, those results. In fact,
they just got the Gödel Prize for this a few months ago. So I'm not the only one who was
impressed.
So we all wanted to read about this. And the nice thing is that Arora's paper contained not just a
result but a technique. And the technique could be used not just for the Traveling Salesman
[inaudible] but for many other problems where the input consists of points in the plane.
>> [inaudible].
>> Claire Mathieu: Mitchell [inaudible]. Joe Mitchell from Stony Brook.
>> [inaudible].
>> Claire Mathieu: Equivalent. Equivalent. So here's the technique. It's very simple. It's a
quadtree decomposition of the plane, a hierarchical decomposition of the plane. You have points
in there. You cut your area into fours repeatedly. Recursive decomposition. And then you use
some dynamic programming and you do some rounding, you add some structure, and you might
be unlucky. You might happen to cut your input just at the wrong place, at the place where you
must not cut, because important things happen just there.
So to avoid that, you do a random shift. So a little bit of randomness plus a hierarchical
decomposition gives you the ideas that yield approximation schemes for problems in the plane.
Plus some work.
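A minimal sketch of the dissection itself -- just the randomly shifted quadtree, without the portals and dynamic programming that the full scheme adds; the point set is made up:

```python
# Randomly shifted quadtree over points in [0,1)^2 (sketch of the
# dissection only; the full PTAS adds portals and dynamic programming).
import random

def build(points, x0, y0, size, depth=0, max_depth=6):
    node = {"cell": (x0, y0, size), "children": []}
    if len(points) <= 1 or depth == max_depth:
        return node
    half = size / 2
    for dx in (0, half):
        for dy in (0, half):
            sub = [(x, y) for x, y in points
                   if x0 + dx <= x < x0 + dx + half
                   and y0 + dy <= y < y0 + dy + half]
            node["children"].append(
                build(sub, x0 + dx, y0 + dy, half, depth + 1, max_depth))
    return node

# The random shift: translate all points by (a, b) modulo 1 before
# dissecting, so that no fixed cut line is unlucky for this input.
a, b = random.random(), random.random()
pts = [(0.1, 0.2), (0.5, 0.7), (0.52, 0.71), (0.9, 0.9)]
tree = build([((x + a) % 1.0, (y + b) % 1.0) for x, y in pts], 0.0, 0.0, 1.0)
```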
Okay. So -- yes.
>> Can you just mention what's special about the plane?
>> Claire Mathieu: What's special about the plane?
>> Yeah. What happens in the plane and not the [inaudible]?
>> Claire Mathieu: The methods extend to three dimensions, to a constant number of dimensions,
for most problems. The problem is that to use dynamic programming, you need to have a small
interface between the subproblem that you're solving and the outside. In the geometry case, the
interface is the boundary of the area where you're solving the subproblem. In the Euclidean
plane, the length of this boundary is small compared to the area inside. That's not true in high
dimension. So that enables you to use all sorts of rounding, approximation tricks along the
boundary, and that's only in small dimension.
Okay. So I -- since I was so interested in that technique, I wanted to see what problems could be
solved using this that had not been solved by Sanjeev Arora. And there's one problem that was
solved a while ago, the Steiner tree problem. The Steiner tree problem is a network connectivity
problem. You're given points. You want to connect them [inaudible] using extra points if you
want, using minimum total length. This is a real design problem, if you will.
Now, if you want everything to be connected, that's the Steiner tree. If you just want to connect
this group together, that group together, that group together, it's a Steiner forest problem. You
could say if I have a Steiner forest problem, say I want to connect the red points to one another,
the blue points to one another, purple to one another, green and orange, each color corresponds
to a group that you want to be connected, you could say let's just take each group and for each
group we do the Steiner tree problem, we solve the Steiner tree problem. That problem has a
solution.
But that's not necessarily what you want to do. Look at this case. The green and the purple
groups, it makes sense to combine them, and you're actually saving length if you use this edge
both for the green and the purple groups, for connecting them.
So that means you have to decide which groups should go together, the purple and green, blue
and red, orange alone, before solving the problem. That makes it much more difficult.
Now, using Arora's hierarchical decomposition approach, in collaboration with Cora Borradaile,
who was a graduate student at Brown at the time, and Philip Klein from Brown, we designed a
PTAS for this problem.
Then I thought, since we understand how this hierarchical decomposition works, maybe -- are we
done exploring its potential? We looked at a survey written by Arora on various problems solved
with this approach. And at the end there were two problems that kind of stood out. He said the
last section was problems that we have not been able to solve with this approach: minimum
weight triangulation and vehicle routing. I told my students -- my student, Aparna Das: let's think
about vehicle routing. Minimum weight triangulation got solved very recently. And for this
problem, we couldn't quite get a PTAS. We got a quasi PTAS.
What's a quasi PTAS? It means the running time is not quite polynomial; it's quasi-polynomial
time. What does quasi-polynomial time mean? It means not N to a constant but N to almost a
constant, some logarithm: N to the polylog of N. Okay.
Now, N -- running time N to the polylog N, that sounds pretty terrible if you want to implement
it. But actually the first draft of Arora's work on TSP, the very first draft of his paper, was not a
PTAS, it was a quasi PTAS. A quasi PTAS can be viewed as a strong indication that there exists
a PTAS. So it's like an intermediate step. My expectation is that in one year or two or three
someone will take this and make it into a PTAS. We were just missing one idea.
Okay. And what is this problem? I haven't defined this problem that we solved. Vehicle
routing. Here's a bus. Here are clients. Okay. They all want to travel to this place, this depot,
but the bus has a capacity of ten -- a minibus. So the bus goes to pick up these ten people, then
goes back here. Those ten go back here. Those ten go back here. Three trips.
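The QPTAS itself doesn't fit in a snippet, but the picture she describes matches the classical tour-partitioning heuristic of Haimovich and Rinnooy Kan: take a traveling-salesman tour, chop it into consecutive groups of at most k clients, and send each group through the depot. A sketch, with made-up coordinates:

```python
# Classical tour-partitioning heuristic for capacitated vehicle routing
# (Haimovich & Rinnooy Kan) -- an illustration, not the QPTAS from the
# talk. Chop a tour into groups of <= k clients, route each via the depot.
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def partition_tour(tour, depot, k):
    trips, total = [], 0.0
    for i in range(0, len(tour), k):
        group = tour[i:i + k]
        total += (dist(depot, group[0])
                  + sum(dist(a, b) for a, b in zip(group, group[1:]))
                  + dist(group[-1], depot))
        trips.append(group)
    return trips, total

depot = (0.0, 0.0)
tour = [(1, 0), (2, 1), (2, 2), (1, 2), (0, 1)]   # clients in tour order
print(partition_tour(tour, depot, k=2))
```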
>> Is this Austin?
>> [inaudible].
>> Claire Mathieu: You know, my student designed this slide. She must have had a reason. I
guess I should have put it there. Yes.
>> [inaudible] there was a student who used the [inaudible] to get quasi PTAS. Do you know
the name of the student?
>> Claire Mathieu: Arora's student?
>> [inaudible].
>> Claire Mathieu: Quasi PTAS for what problem?
>> For Euclidean TSP. For using this decomposition. It was quasi polynomial.
>> Claire Mathieu: You mean before Arora's result?
>> They worked together. I don't know. I mean, Arora mentioned his talk [inaudible].
>> Claire Mathieu: I don't know. Sorry. I'm not sure.
Now, of course in terms of epsilon, the runtime is horrendous. Now, we've worked since then on
extensions of this result. But it's actually -- it's not easy. If you think about it -- how do you --
why do you get -- one second. One [inaudible]. Why do you get a quasi PTAS instead of a
PTAS? It all depends on how much information you want to have at the interface. If you can
store it with N to the log N bits -- so if you have, say, log N locations, and for each of them you
want a number between 1 and N -- that gives you N to the log N. That's typical. Yes.
>> So what was the goal in this? Was it to minimize the number of loops or the whole length?
>> Claire Mathieu: The lengths. The lengths. You want to minimize the total length of your
routes. So if K equals N, then it's TSP.
All right. So there is -- yes.
>> Can we go back for one second? I may be asking a very stupid question.
>> Claire Mathieu: That's okay.
>> So I apologize ahead of time. So you were saying that there's a quasi PTAS and you feel that
there may be a PTAS.
>> Claire Mathieu: Yes.
>> And I was wondering about the lower bounds for these types of things. I'm familiar with
lower bounds for approximation algorithms where you say there's no constant time [inaudible].
>> Claire Mathieu: Yes.
>> Do people prove that quasi PTAS is the best that can be done?
>> Claire Mathieu: It's just a feeling. It's not -- oh. Is there -- no. We don't have a -- we don't
have any -- we don't have any natural problem for which there's a quasi PTAS and there's no
PTAS. There's --
>> And there's provably no [inaudible].
>> Claire Mathieu: And there's provably no PTAS. No. This is really -- everybody believes
that this means a PTAS is waiting to be discovered.
>> That's because this -- so is this an inconceivable thing to prove this? Or is it just that we
haven't found the technique yet?
>> I mean, cannot be shown that it's APX-hard [inaudible].
>> Claire Mathieu: If you believe P is different from NP, then -- then this -- this -- these are
all -- these all can be viewed -- if you don't care about implementation, these all can be viewed as
complexity results. All of these results say these problems are not APX-hard. You cannot prove
that there cannot exist a 1.1 approximation algorithm. Double negatives.
>> [inaudible].
>> Claire Mathieu: Yes. Yes. All right. So these Euclidean problems we worked on, we used
Arora's framework. Now, there is this intriguing phenomenon that has happened recently, which
is that all these optimization problems on graphs can be defined when points are in the
Euclidean plane. You can also define versions where points are in planar graphs. There are some
settings such as road networks where it makes more sense to look at distances as being computed
along a planar graph rather than Euclidean distances.
Well, there's problem after problem for which an approximation scheme was designed first in
one setting and then in the other setting. And Phil Klein and I, we have the
impression that the two frameworks are related; that when a problem can be solved
geometrically, there's a good chance it can be solved in planar graphs. Planar graphs are a little
bit more difficult than Euclidean.
So we've started pursuing this program. We want to show that combinatorial optimization
problems can be solved in planar graphs. Our first result in that direction was for the Steiner tree
problem. Cora Borradaile, Phil Klein and myself designed a PTAS for the Steiner tree problem
when your points that you want to connect are in a planar graph.
And to do that we designed a structure called a brick decomposition. We have the hope that the
structure will be analogous in planar graphs to the hierarchical decomposition in geometric
settings. So we would like to use it again and again and again as a building tool for algorithms in
planar graphs.
So there's a planar graph here. These are the vertices. And you see this is a subgraph of your
planar graph. And it's composed of bricks. And each brick -- this is almost a shortest path; it's
a near-shortest path. And somehow the structure inside makes it easy to solve the problem
inside. And then you can use some standard techniques to solve the problem.
So that's the grand plan. So so far we've done Steiner tree. Then other people did Steiner forest.
And now we're working on multiway cut. Multiway cut. There's a planar graph that I didn't
draw. I only drew the vertices. There's some special vertices, your terminals. You want to
disconnect them from one another. So I drew this in a [inaudible]. You want to draw cycles to
separate these terminals -- terminal faces -- from one another. So you see the multiway cut
solution: it has a bunch of connected components, each of which is a 2-connected subgraph.
We may have -- we are on our way to an approximation scheme for this after three years of
work.
And after that there's some other problems that we want to solve in planar graphs. And if each
problem takes us a year, maybe in four or five years we'll get to Markov random fields.
You have an image, you want to do image segmentation. You want to partition the pixels,
separate them into regions, and minimize the cost. Well, the cost is: for two adjacent pixels,
there's a certain cost, depending on where you are in the image, to color them -- to give them
different labels. And for each pixel, you already have some partial idea, some idea of what its
color should be. So there's a cost to assigning a label to each pixel and a cost to assigning
different labels to adjacent pixels.
Now, then you want to find the best -- the best way to label the image, so the best way to
partition the image into regions, and hopefully, if all goes well in our program, in a few years
we'll have an approximation scheme for that.
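The objective she is describing can be written down directly. A sketch of the energy of a labeling, with hypothetical unary costs and a uniform penalty for disagreeing neighbors:

```python
# Markov-random-field labeling cost (sketch): a per-pixel cost for the
# chosen label, plus a penalty whenever two adjacent pixels disagree.
def energy(labels, unary, penalty):
    h, w = len(labels), len(labels[0])
    cost = sum(unary[i][j][labels[i][j]] for i in range(h) for j in range(w))
    for i in range(h):
        for j in range(w):
            if i + 1 < h and labels[i][j] != labels[i + 1][j]:
                cost += penalty
            if j + 1 < w and labels[i][j] != labels[i][j + 1]:
                cost += penalty
    return cost

# A 2x2 image with labels 0/1 and made-up unary costs unary[i][j][label]:
unary = [[[0.1, 0.9], [0.2, 0.8]],
         [[0.7, 0.3], [0.6, 0.4]]]
print(energy([[0, 0], [1, 1]], unary, penalty=0.5))
```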
>> Sounds like the ground state for an Ising model with external fields or [inaudible].
>> Claire Mathieu: It's not easy because you have many different labels.
>> [inaudible].
>> Claire Mathieu: And each pixel has a different function, cost function. All right. So I've
talked about geometric problems, I've talked about planar graph problems where distances are
distances in the graph. Now, what is more general than this? Whenever you have distances:
general metric spaces. What is beyond our ability? General graph problems. That's pretty hard.
Is there anything we can do when there's a general metric for combinatorial optimization
problems? Yes. Let's take one of the most fundamental problems of combinatorial optimization,
max cut.
Max cut. What is that problem? Everyone who's in theory knows what max cut is. You want to
take a graph --
>> [inaudible].
>> Claire Mathieu: No, that's okay. Thanks. You have a graph. You want to partition its
vertices into two parts so as to maximize the number of edges crossing the cut. That's max cut.
This is one of the original 21 NP-complete problems in Karp's famous paper.
Max cut is such a beautiful problem. It's been the source of so many new ideas for algorithmic
design. It's been the first problem for which people used semidefinite programming to design
good approximation algorithms. It's been the object of interest for lower bounds, for
consequences of the Unique Games Conjecture, for the sampling complexity in dense graphs. And
so we know it's a good problem. It's an inspiring problem.
So let's try this in a metric space. You have points in a metric space. The edges have lengths.
The lengths satisfy the triangle inequality; it's a complete graph. Partition the nodes, the vertices,
into two sets to maximize the sum of the lengths going across.
For that problem, in joint work with [inaudible] Fernandez de la Vega we designed a PTAS. The
solution, it's a randomized algorithm. It's based on sampling. Sampling, that's an old idea. It's
based on importance sampling. The key idea is we don't want to just take a random sample of the
vertices uniformly. We have a metric space. Let's use the metric.
If you think about it, think about the max cut problem. Imagine all these points are together
except one that is really, really far. How do you want to cut to catch as many long edges as
possible? You want to have this point on one side and the other vertices on the other side. This
point is very important. You do not want to miss it. It's more important than the others.
So when you sample your vertices, you have to catch this point. Otherwise, the sample will not
give you the right image of what's happening. Therefore, therefore, let's sample vertices not
uniformly but with probability proportional to the average distance from a vertex to the rest of
the vertices of this space.
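A sketch of that sampling step -- the function names and the tiny distance matrix are made up, but the idea is the one she states: sample with probability proportional to average distance, then reweight by the inverse probability so estimates stay unbiased:

```python
# Importance sampling for metric max cut (sketch): pick vertices with
# probability proportional to their average distance to the others,
# and keep 1/probability weights for unbiased estimates.
import random

def sample_vertices(D, s):
    n = len(D)
    avg = [sum(row) / (n - 1) for row in D]   # average distance per vertex
    total = sum(avg)
    probs = [a / total for a in avg]
    sample = random.choices(range(n), weights=probs, k=s)
    weights = [1.0 / (s * probs[v]) for v in sample]   # importance weights
    return sample, weights

# Four mutually close points and one far-away point: the far point
# dominates the sample, as in her example.
D = [[0, 1, 1, 1, 9],
     [1, 0, 1, 1, 9],
     [1, 1, 0, 1, 9],
     [1, 1, 1, 0, 9],
     [9, 9, 9, 9, 0]]
print(sample_vertices(D, s=3))
```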
>> [inaudible] for max cut the best approximation is [inaudible] and here you're doing a PTAS
for a generalization?
>> Claire Mathieu: Points are in the metric space, so the graph is complete. It's a complete
graph with lengths that are the metric. So that's a -- it's a special --
>> Okay. It's not --
>> Claire Mathieu: It's a special case.
>> It's any metric space or --
>> Claire Mathieu: It's a metric space.
>> It's not on the Euclidean sphere or anything?
>> Claire Mathieu: Any metric. The only property we use is a triangle inequality.
>> But you can't capture --
>> Is it sort of like SDP solution?
[multiple people speaking at once].
>> Claire Mathieu: Here the triangle inequality, it's not in constraints that we put in, it's in the
input.
>> But you have -- but you have all edges, not just --
>> Claire Mathieu: All edges. Yes. Yes. Okay. So when we designed this, we thought this was
a neat idea, to use this biased sampling, but then we learned afterwards that this is a very old
technique from statistics. It's a variance reduction technique. We do this sampling and then we
have to adjust all our numbers when we do our estimates. And it's just -- it's been known in
statistics for 30 years. Or maybe more.
>> [inaudible].
>> Claire Mathieu: Yes. Yes. That's true. Yeah, that's true. It's another example.
>> [inaudible] this compare to algorithms for dense graphs that de la Vega and [inaudible] --
>> Claire Mathieu: Yes. It's -- actually, it's closely related. It can apply to dense graphs, but in
dense graphs every point -- well, almost every point has the same importance, because in a graph
the importance of a point is its degree. If a graph is dense, all the points that matter have degree
of [inaudible] constant times N, so they all have about the same importance. So if you take this
and you try to cast it in the framework of dense graphs, you actually get the dense graph
algorithm of Fernandez de la Vega.
Okay. I think I want to mention that this -- this looks like a very abstract problem, but actually
using the same techniques we can solve some clustering problems. So if you have points, you
have distances, you want to group them into clusters so as to maximize the distance between the
clusters or minimize the sum of distances inside the clusters, the various generalizations for
various generalizations, we also have a PTAS.
Now, I have 12 minutes left?
>> Yuval Peres: 17.
>> Claire Mathieu: What?
>> Yuval Peres: 17.
>> Claire Mathieu: 17 minutes. Okay. Great.
>> [inaudible].
>> Claire Mathieu: No problem. All right. So I talked about packing and scheduling problems.
I talked about graph problems on geometric graphs, on planar graphs in metric spaces. Now I
want to talk about miscellaneous problems for which it's also possible to design approximation
schemes using perhaps slightly different techniques.
I thought Eric Horvitz was going to be here, and I noticed that he's worked on data compression,
so I thought he might enjoy hearing about an approximation scheme for data compression. So,
Eric, if you're here, now is the time to pay attention. Okay.
You know, this is -- this is Morse code. I work on obsolete technology. I work on
approximation schemes with applications to telegraphs. Okay. You all know about Huffman
coding. In Huffman coding, you have words and you want to code them with, say, 0s and 1s,
some alphabet. And you want to minimize the average length of your text when everything that
you want to encode has a certain frequency.
Okay. Morse code is a way to encode the Latin alphabet. But, you see, you have dots, you have
dashes. If you're on a boat, you use your lamp, dots are short, dashes are long. Dashes take
longer than dots. That's why E, which is the most frequent letter in the alphabet, is encoded by a
dot, not a dash.
So this is a twist on Huffman coding: Huffman coding where the letters used to encode have
different lengths. According to [phonetic] Mordecai Golin, this also has applications to run
length limited codes used in magnetic and optical storage. The code is binary; each one must be
separated by some number of zeros, between A and B zeros. And so each atomic piece you
encode with has a different length.
So if you have, say, four words whose frequencies are 1/3, 1/3, 1/6, 1/6 -- that adds to 1 -- with
regular Huffman coding, where you have two letters for encoding, each of cost 1, this is your tree.
With encoding with two letters where every A costs 1 and every B costs 3, kind of like a dot and
a dash, then this is the tree. You want to avoid two dashes. So that's the optimal tree.
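The objective is easy to state in code. The frequencies below are from the slide; the candidate codes are illustrative guesses, not the actual trees shown:

```python
# Expected cost of a prefix-free code with unequal letter costs (the
# dot-and-dash twist on Huffman coding). The candidate codes here are
# illustrative, not the trees from the slide.
def expected_cost(code, freqs, letter_cost):
    return sum(f * sum(letter_cost[c] for c in w) for w, f in zip(code, freqs))

freqs = [1/3, 1/3, 1/6, 1/6]

# Classical Huffman: both letters cost 1.
print(expected_cost(["00", "01", "10", "11"], freqs, {"0": 1, "1": 1}))  # 2.0

# Dot costs 1, dash costs 3: the balanced code pays for "11" (two
# dashes), and a code that avoids "11" does better.
print(expected_cost(["00", "01", "10", "11"], freqs, {"0": 1, "1": 3}))   # ~3.67
print(expected_cost(["1", "000", "01", "001"], freqs, {"0": 1, "1": 3}))  # 3.5
```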
All right. So we made the observation that we could solve a relaxed problem. So in all these
problems we want an encoding that is prefix free. We want every two codewords -- no
codeword to be a prefix of another.
Well, let's relax the problem and say that the beginning of the words should be prefix free, but in
the end we are allowed to have two words encoded by the same -- if they have a long path that is
exactly equal, they're both very long and they only differ at the very end, then it's okay. So we
only put the prefix-free constraint at the beginning.
Then if we can solve that problem, then we can convert the result into a truly prefix-free coding.
So we reduce the problem to an easier problem, and then we can use standard approximation
techniques and design a PTAS. Joint work with Mordecai Golin, Neal Young. All right. That
was the bit about data compression.
And finally there are two more problems I want to talk about that have kind of approximation
schemes, rank aggregation and correlation clustering. Both of these projects are joint with my
ex-graduate student, Warren Schudy, who just graduated.
Let me talk about rank aggregation. I was talking to Susan Holmes just two days ago. She was
visiting here. She does statistics. She said rank aggregation, everybody is interested in rank
aggregation. It's such a hot topic.
Look at these. You want to have an algorithm to detect cancer cells. The algorithm they have
works well if it has a good training set to start with. What's a good training set? It's one that
initially trains the algorithm with hard images, images where it's hard to find features.
Okay. So let's try to look at a bunch of images where we try to find features, you know, ants
here, tails of fish there, and let's see how hard it is to detect the features.
We need to rank these images by difficulty of feature detection before choosing which ones to
feed to the algorithm as a training set.
The way you do that is you get humans, you get them to look at these images and try to detect
the features. You look at pairs of images, you look at how many features they got right here, how
many features they got right there -- that gives you a comparison of the difficulty of these two
images. And once you have all these comparisons, then you try to deduce a global ranking of
your images. So that's just one example of the kind of setting where this problem comes up.
Now, for the mathematical model for the problem that we worked on. There are some variants,
but this is the problem that we solved.
Each committee member is asked to rank four candidates. This person puts Alice first, Bob
second, Charlie third, and Delta fourth. This person uses that ranking. This ranking, that
ranking. Now, the committee gets together and you want to output the best possible aggregate
ranking. How do you do that? Well, you're all working in good faith. You get along with one
another. So you agree on a measure. You just want to produce a ranking that minimizes
inversions. What do I mean? There will be a certain output ranking. This output ranking will be
at a certain distance, in terms of the number of inversions, from here, from there, from there, from
there, and we minimize the total number of inversions overall. That's the committee ranking.
All right. Very simple definition. How do we find that best ranking? It's NP-hard. Therefore,
approximation algorithm to the rescue. So let's do it. Let's start with a reasonable ranking.
There are several constant-factor approximations known. Example: let's look at the average
rank of each person and sort them by average rank. Alice has ranks 1, 3, 2, 1. The average is
7/4. If this is the best average, she'll be number 1, and so on. That's what we start with. It's a
constant-factor approximation. For some instances it doesn't give you a good ranking, but it's
reasonable. And then we do something very simple -- everything we've learned in algorithms as
undergraduates. Divide and conquer, except we don't cut in the middle, we cut in random places,
because we don't want to cut in what might happen to be a critical spot. And then at some point
we say, okay, this is small enough, let me go to the base case. We switch to a different algorithm
that's based on sampling. I take this base case, and with a less efficient algorithm, I find a near
optimal ranking, with a small additive error. Combining all this together, you get a PTAS.
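A sketch of the two elementary ingredients -- the inversion objective and the average-rank starting ranking (the votes are made up; the divide-and-conquer refinement is not shown):

```python
# Rank aggregation (sketch): the inversion objective and the
# sort-by-average-rank starting ranking from the talk.
from itertools import combinations

def inversions(ranking, votes):
    pos = {c: i for i, c in enumerate(ranking)}
    total = 0
    for vote in votes:
        vpos = {c: i for i, c in enumerate(vote)}
        for a, b in combinations(ranking, 2):
            if (pos[a] - pos[b]) * (vpos[a] - vpos[b]) < 0:
                total += 1          # this voter orders the pair the other way
    return total

def average_rank_order(votes):
    cands = votes[0]
    avg = {c: sum(v.index(c) for v in votes) / len(votes) for c in cands}
    return sorted(cands, key=avg.get)

votes = [["Alice", "Bob", "Charlie", "Delta"],
         ["Bob", "Alice", "Delta", "Charlie"],
         ["Alice", "Charlie", "Bob", "Delta"]]
start = average_rank_order(votes)
print(start, inversions(start, votes))
```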
What's funny is that all these are elementary techniques. This is really an algorithm that could
have been designed many years ago. 30 years ago all the techniques were there. It's just a matter
of putting them together just right.
My last example is a case where we fail: correlation clustering. In my first grant proposal in the
U.S., I said I want to design approximation schemes; I want an approximation scheme for
correlation clustering. The answer from the reviewers was that it's provably impossible. And the
grant proposal was rejected. Okay. So I changed my grant proposal, but I did not forget about
the problem. I still wanted to solve it. So let me define it for you. And then let me tell you how
I solved it even though it's provably impossible.
Here's the input. It's a complete graph. Every edge has a label. Similar, dissimilar. When my
student -- when Warren gives a talk, he has pictures of cats and dogs, but I've seen them so many
times, I got tired of cats and dogs. So this is my very own handwriting.
Now, you -- as humans, you know what these are. They are 2s and 3s. Is there a program that
can recognize that these are 2s and those are 3s? Let's see. Let's look at pairwise comparisons.
These kind of look the same. Those kind of look the same. And these, they kind of look the
same too. Now, that's too bad.
Okay. So that's the input, this complete graph. This is the output: a partition of the data into two
classes, or any number of classes or parts, according to similarity. And what is the value, how
good is this partition. Well, let's see how much it agrees with the input. These are in different
parts. The edge says they're different. Very good. Same thing here. Very good. These are in
different parts, but the edge says they're the same. Not good. That's one. These are in the same
part, but the edge says they're different. Not good. That's one. Total, 2. The cost is the number
of edges where there's a discrepancy between the input and the output. And this is what we're
trying to optimize in correlation clustering.
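The cost function is exactly this disagreement count. A sketch, with a made-up three-object instance:

```python
# Correlation clustering objective (sketch): count the pairs whose
# similar ("+") / dissimilar ("-") label disagrees with the partition.
def disagreements(labels, cluster):
    # labels: {(u, v): "+" or "-"} over pairs; cluster: {object: part id}
    cost = 0
    for (u, v), sign in labels.items():
        same = cluster[u] == cluster[v]
        if (sign == "+" and not same) or (sign == "-" and same):
            cost += 1
    return cost

labels = {("a", "b"): "+", ("a", "c"): "-", ("b", "c"): "+"}
print(disagreements(labels, {"a": 0, "b": 0, "c": 1}))  # cost 1: pair (b, c)
```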
All right. If P is different from NP, no approximation scheme is possible; it's APX-hard.
>> [inaudible] no PTAS?
>> Claire Mathieu: No PTAS. There's a 2.25 approximation, I think. But I like to have epsilon
relative error. So I add assumptions -- some probabilistic assumptions. Let's think of these
objects. When I wrote those numbers, these symbols, I actually had numbers in mind. I thought
I was writing a 2 or a 3, that's the ground truth. When you have images of dogs and cats on the
Web that you retrieve, these are pictures that have been taken of an animal that is actually a real
dog or a real cat. That's the ground truth.
Let's assume the input is a noisy version of some unknown ground truth. Let's assume that for
every pair of images, when you compare them, you get the right answer except when you don't.
With some probability P, the answer is flipped.
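That noise model is easy to write down. A sketch of the generator (the names are made up):

```python
# The noisy ground-truth model: start from a true clustering and flip
# each pairwise label independently with probability p < 1/2 (sketch).
import random
from itertools import combinations

def noisy_labels(truth, p, rng=random):
    labels = {}
    for u, v in combinations(sorted(truth), 2):
        sign = "+" if truth[u] == truth[v] else "-"
        if rng.random() < p:
            sign = "-" if sign == "+" else "+"   # flipped answer
        labels[(u, v)] = sign
    return labels

truth = {"a": 0, "b": 0, "c": 1, "d": 1}
print(noisy_labels(truth, p=0.1))
```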
>> [inaudible] means you have one dog-like looking cat and you have [inaudible] edges.
>> Claire Mathieu: [inaudible] this question. I don't know -- I don't know what to do when
there's no independence. I don't know what to do when the answers are dependent. We don't
have a good setup.
>> [inaudible] independent errors on the vertices [inaudible].
>> Claire Mathieu: Okay. Here's a different way to model the problem. Each of your objects can
be seen as some multidimensional vector, and then between two vectors there's a probability -- a
measure of similarity -- that your answer will be similar or dissimilar. That depends on how
close these vectors are.
But then what is the ground truth, what does it mean that a vector is a dog or a cat? What is your
goal partition? It's not obvious if you just have vectors.
>> [inaudible] you're doing here, you're not only changing the model but changing the goal,
right? You want to discover the ground truth ->> Claire Mathieu: I want to discover the ground truth or the max likelihood clustering, the max
likelihood clustering. With these assumptions they actually -- they agree for most values of P.
So what we can prove is -- so first, we have a PTAS if P is not too close to one half. If P is
one half, the input is just random; there's no chance we can get the ground truth. But as long as
it's bounded away from one half, then we have a PTAS.
And more than that, we could actually get the exact ground truth, not approximate but exact, if
all the clusters are large enough. And the answer is based on semidefinite programming. And I
will skip this. This is a quadratic formulation instead of linear programming, and it's great for
optimization problems. So this is one application where it really gives you the right answer.
Now, this is our theory, but my student worked with Micha Elsner, who is a graduate student in
the natural language group at Brown. And they tried to use the same ideas to do correlation
clustering on natural language data. And they solved the SDP. And they got this kind of
matrix for the solution. And you look at this, you see, oh, there's a cluster, there's another
cluster. And these are just small groups. So the clustering comes out naturally and it's
surprisingly efficient.
>> So how do you [inaudible] is it easy?
>> Claire Mathieu: We -- actually, we take the SDP solution and then we use the constant-factor
approximation algorithm for rounding. Because the SDP solution itself can be seen as an
instance of correlation clustering, and then we can use a constant-factor approximation
algorithm as a black box. But there's no time to discuss this.
Now, in my remaining 45 seconds, I'd like to talk about the future. First of all, if you look at the
kind of techniques that I used to solve these optimization problems, you see that aside from basic
algorithms and elementary techniques, they borrow from two adjacent fields. One is probability
theory, and the other one is mathematical programming. So we need experts from both of these
fields to help us design good algorithms. Each is necessary, but neither is sufficient.
Now, in terms of what I would like to work on later, besides these various problems that I
mentioned on the way on the planar graphs, I would like to better understand the power of these
techniques for linear programming and for semidefinite programming, in particular
lift-and-project is a way to enhance the techniques. How powerful is that? That's a very
intriguing, hard question.
And the other direction is I would like to work on probabilistic models of hard-to-approximate --
>> It's so hard --
>> Claire Mathieu: -- optimization problems. Okay. That's it. Thank you.
[applause].
>> Yuval Peres: Any additional questions?
>> What exactly do you mean by probabilistic models of [inaudible]?
>> Claire Mathieu: Kind of like correlation clustering. When you have a problem that is
APX-hard, can you design a model to generate the input, add some assumptions, so that it
becomes -- so that this restricted version is easy to approximate?
But of course you have to have a model that makes sense. In the case of correlation clustering, it
makes sense to assume there's some underlying ground truth. So for those problems it's natural.
For other problems it's not so clear what is the right -- what the right model is.
>> Maybe it's the random ordering version [inaudible].
>> Claire Mathieu: Yes. Yes.
>> Yuval Peres: Okay.
[applause].