>> Yuval Peres: Good morning. Kind of a long introduction. Let me just quote from a recent e-mail I got that described Claire as the queen of the approximation scheme. And I think we'll see some of that today. Please. >> Claire Mathieu: Thank you, Yuval. Thank you for inviting me. It's a pleasure to be here. When I visit here, often I talk about [inaudible] algorithms or probability and algorithms. Today I want to talk about something else -- the algorithm design side of me, which is really the main focus of my research: designing approximation schemes for combinatorial optimization problems. Here's a laundry list of problems. These are all algorithmic problems. Some of them are packing and scheduling type problems. Some of them are network design problems in the Euclidean plane. Some of them are planar graph problems. Some are metric space problems. And then some miscellaneous optimization problems. Most of these problems are NP-hard. And the question is what do we do when we have an optimization problem that we need to solve and yet it's NP-hard. Our answer is that to deal with such NP-hard problems, we design algorithms that are not exact -- they don't give you the best answer -- but they are pretty fast and they still give you good approximation guarantees. So most of this talk will be about approximation schemes. What is an approximation scheme? It's an algorithm that runs in polynomial time and outputs a solution whose value is very close to optimum: the relative error between the value of the output and the optimum value is at most epsilon. So when you have a maximization problem, the value of the output is at least 1 minus epsilon times opt. When you have a minimization problem, the value of the output is at most 1 plus epsilon times opt. The algorithm is parameterized by epsilon. We have an approximation scheme when for every epsilon there's an algorithm. So you tell me: I don't really want the exact solution, since that's not achievable in polynomial time, but I want a 5 percent guarantee. You tell me that. I answer: here's an algorithm; I guarantee the answer will be within 5 percent of opt, and the runtime is polynomial. Now, if you say 1 percent, I'll give you a different algorithm -- slower, but still polynomial time in the size of the input. Of course the runtime depends on epsilon. Okay. So throughout the talk I will use the following acronym: PTAS, polynomial time approximation scheme. You're going to see this word many times in the talk. In a sense, this kind of result can be seen as a last step. That is, for many of these problems, people had first designed 3-approximations -- an algorithm whose output is at most three times the optimal value -- then 2-approximations, and a PTAS is in a way the best we can hope for if P is different from NP. All right. Let's get started. This problem is what got me interested in approximation algorithms. You see this man? >> [inaudible]. >> Claire Mathieu: What? >> It looked like [inaudible]. >> Claire Mathieu: This is some random company. This is what they put on their Web page. There's a man. There's some cloth. You order some clothes and they cut the cloth according to the patterns. So you're cutting this cloth and they don't want to waste too much tissue -- too much cloth, sorry. So they're trying to [inaudible] the patterns, put them on the cloth, to minimize the total length of cloth that they use. We have this strip. They want to put the patterns on this strip. When did I get interested in this?
It was many, many, many years ago. I worked on tiling problems, and we had algorithms for tiling regions with little tiles. And I was interested in approximate tiling: what if the region is a little different? Then the exact algorithms break down completely. So that's what got me into approximation algorithms. Okay. So here we go. Now this looks more mathematical, doesn't it? The input is a set of rectangles. Each rectangle has a width and a height; the width is at most 1, the height is at most 1. The output is a packing of these rectangles in a strip of width 1, and the goal is to minimize the height used. Of course you will say that cloth patterns are not really rectangles. Okay. Well, let's think about cutting wood. And the result, this is joint work with Eric Rémila, 1996, is a PTAS, a polynomial time approximation scheme. And the tool comes from mathematical programming: it's a linear programming relaxation of the problem. So here's one very simple instance. Every object has width either 4 or 3. The total -- this strip has width 7. Then how do you solve the problem? Let me just go into the technical part for a little bit, for those of you who do research in theory, so that you see there's some theory here. We can define a variable x(4,3), which says: if you intersect the solution with a horizontal line, you intersect one piece of width 4 and one piece of width 3. Now, given a solution, when you slide this line, how often do you meet one 4 and one 3? That's x(4,3). So these are your variables. Then you have constraints. For example, for the objects of width 3: sometimes when you draw your line you meet just one of those, sometimes two of those, sometimes one of those and one of width 4. All in all, you have to get enough of them to cover all the objects of width 3. So here you have variables and constraints. This defines a linear program -- there's a small sketch of it at the end of this passage. And then you get to work. >> Why is this sufficient? >> Claire Mathieu: Why is this efficient? >> Why is this sufficient? >> Claire Mathieu: Sufficient. >> [inaudible]. >> Claire Mathieu: Well, you get rid of small objects. You round big objects. You only have a [inaudible] -- so let's say you just have 4 and 3. If you only have two different widths -- a constant number of different big widths -- then you only have a constant number of variables, a constant number of configurations, and so you can just round each configuration up to the nearest integer. Oh, I forgot: this is an asymptotic PTAS. When opt goes to infinity, you're just rounding a few variables, so the increase is negligible. Okay. So that's the first problem, strip packing. >> [inaudible]. >> Claire Mathieu: [inaudible] not allowed to rotate. That's right. So I guess the cloth has some texture. Or the wood. You know, you want to cut across the grain, not the other way. All right. So once we saw that these mathematical programming techniques could be used for two-dimensional packing problems, we started wondering what other problems could be amenable to this kind of approach. And I visited people at AT&T, and here's one problem that they suggested to me: dynamic storage allocation. It's an old problem that comes from dynamic memory allocation in operating systems. We were interested in it when I visited, about ten years ago, because of problems coming from SONET rings; it's relevant to routing weighted calls in SONET networks. You have to schedule requests of limited durations in an all-optical network.
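Here is the configuration-LP sketch mentioned above: a minimal fractional strip packing LP for the toy instance with strip width 7 and piece widths 4 and 3. The height demands and the use of scipy's linprog are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of the configuration LP for (fractional) strip packing.
# Hypothetical toy instance: strip of width 7, rectangles of widths 4 and 3.
# x[c] = total height over which configuration c is used; each width class
# must be "covered" for its total demanded height.  This illustrates the
# LP relaxation idea only, not the full PTAS.
from itertools import combinations_with_replacement
from scipy.optimize import linprog

STRIP_WIDTH = 7
widths = [4, 3]
demand = {4: 10.0, 3: 8.0}   # total height of rectangles of each width (made up)

# Enumerate all configurations: multisets of widths that fit side by side.
configs = []
for k in range(1, STRIP_WIDTH // min(widths) + 1):
    for combo in combinations_with_replacement(widths, k):
        if sum(combo) <= STRIP_WIDTH:
            configs.append(combo)

# Minimize total height sum_c x_c subject to coverage of each width class:
#   sum_c (multiplicity of w in c) * x_c >= demand[w]   for each width w.
c = [1.0] * len(configs)                       # objective: total height used
A_ub = [[-cfg.count(w) for cfg in configs] for w in widths]
b_ub = [-demand[w] for w in widths]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * len(configs))
print(res.fun, dict(zip(map(str, configs), res.x)))
```

An integral solution then rounds each used configuration's height up, which is why the guarantee is asymptotic: with a constant number of configurations, the rounding cost is negligible when opt is large.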
In this SONET setting, to each request you assign a set of adjacent wavelengths, and you have to do this with a limited total number of wavelengths. Now, for SONET rings the network is a ring, but forget about that: cut it open and imagine it's just one dimensional. Then we get exactly dynamic storage allocation. So here's the math problem. This is the input: rectangles. Something happened here. Each rectangle corresponds to a request. There's a start time and an end time -- say it starts at time 3 and ends at time 7. This axis is time; this axis is wavelengths. Each request asks for a certain number of wavelengths from its start time to its end time. And you have to decide: from time 3 to time 7, I will assign this range of wavelengths to that request. And you can choose which range. In other words, this is like the strip packing problem, except that these rectangles only slide vertically; you're not allowed to slide them horizontally. Okay. So that's the dynamic storage allocation problem. And what is the result? Well, if the maximum height is much less than opt -- so if no request asks for, you know, 30 percent of the total number of wavelengths -- then we can do it: we have a PTAS. That's joint work with Adam Buchsbaum, Howard Karloff, Nick Reingold, and Mikkel Thorup, who were all at AT&T at the time. All right. Can we do more? Can we solve other packing and scheduling problems using these -- using mathematical programming techniques? Well, there's one more problem on which I worked around that time with Nicolas Schabanel, then a student of mine: broadcast disk scheduling. This is a problem that is related to Video on Demand. It's about asymmetric wireless communication, where there's a much larger communication capacity from the information source to the recipient than in the reverse direction. For example, think about mobile clients who are trying to retrieve information from a server, a base station, through a wireless medium. When a user requests some item, some information, the request doesn't actually get propagated through the system; instead, the request waits until the relevant information is broadcast, and then the broadcast information goes to the client. So it's pseudo-interactive. The broadcast schedule is actually oblivious to the clients. Okay. So here's the math problem -- the model. I like to think about it in terms of a radio station. When I'm in my car, I turn on the radio. I can listen to news, sometimes there are sports, I can listen to the weather, and occasionally I'm interested in traffic information. Now, when people turn on their radio, usually there's something they have in mind; they want to hear one of these things. Let's say that the people who turn on their radio because they want to listen to the news arrive as a Poisson process at a certain known rate; the people who want to listen to sports, as some other Poisson process; and so on. There are four different kinds of users, each with its own Poisson process. Now, what is the problem? The radio station has to decide what to broadcast when -- for example, news every ten minutes. What for? Well, one goal could be to minimize the average response time, the listeners' waiting time: how long they have to wait before they get what they want to hear. Okay. So this is the problem that we worked on. Of course you can criticize the model in many ways, but this one has a clean solution.
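To make the broadcast scheduling objective concrete, here is a minimal sketch that evaluates the average response time of a fixed cyclic schedule under Poisson arrivals. The item names, rates, and schedule are made-up stand-ins; the PTAS itself is not shown.

```python
# Minimal sketch of the broadcast scheduling objective described above.
# A cyclic schedule broadcasts one item per time slot; a listener who wants
# item i arrives as a Poisson process with rate rates[i] and waits for the
# next broadcast of i.  For Poisson (hence time-uniform) arrivals, the mean
# wait for item i is sum(g^2) / (2*T) over the gaps g between consecutive
# broadcasts of i in a cycle of length T.
def average_response_time(schedule, rates):
    T = len(schedule)
    total_rate = sum(rates.values())
    cost = 0.0
    for item, rate in rates.items():
        slots = [t for t, s in enumerate(schedule) if s == item]
        if not slots:
            return float("inf")  # someone would wait forever
        gaps = [(slots[(j + 1) % len(slots)] - slots[j]) % T or T
                for j in range(len(slots))]
        mean_wait = sum(g * g for g in gaps) / (2.0 * T)
        cost += rate / total_rate * mean_wait
    return cost

rates = {"news": 4.0, "sports": 2.0, "weather": 1.0, "traffic": 1.0}
print(average_response_time(["news", "sports", "news", "weather",
                             "news", "sports", "news", "traffic"], rates))
```

The algorithm from the paper searches over schedules far more cleverly than this, but the quantity it optimizes is essentially the rate-weighted mean wait computed here.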
The clean solution is a PTAS, designed jointly with Nicolas Schabanel and Neal Young in 2000 -- ten years ago already. So these are just examples of packing and scheduling problems for which mathematical programming, and in this case a little bit of probability, can help you design a good algorithm. That was the first part: approximation schemes for packing and scheduling. Now, that is one part of combinatorial optimization, but a big part of combinatorial optimization has to do with graph problems: graph optimization problems, network design problems. And for those problems, a big event happened about 15 years ago, when Sanjeev Arora designed an approximation scheme for geometric problems. Let's say you have an optimization problem where your data is points in the Euclidean plane, and you want to find the best, say, TSP tour. How can you do that? It used to be that you only had constant factor approximations. Since Arora and Mitchell's work, there is an approximation scheme for the Traveling Salesman Problem in the Euclidean plane. So this came out about 15 years ago, and I was so impressed by that paper, those results. In fact, they just got the Gödel Prize for this a few months ago, so I'm not the only one who was impressed. We all wanted to read about this. And the nice thing is that Arora's paper contained not just a result but a technique, and the technique could be used not just for the Traveling Salesman [inaudible] but for many other problems where the input consists of points in the plane. >> [inaudible]. >> Claire Mathieu: Mitchell [inaudible]. Joe Mitchell from Stony Brook. >> [inaudible]. >> Claire Mathieu: Equivalent. Equivalent. So here's the technique. It's very simple. It's a quadtree decomposition of the plane, a hierarchical decomposition of the plane. You have points in there. You cut your area into four quadrants repeatedly -- a recursive decomposition. Then you use some dynamic programming, you do some rounding, you add some structure. And you might be unlucky: you might happen to cut your input just at the wrong place, at a place where you must not cut because important things happen just there. So to avoid that, you do a random shift. A little bit of randomness plus a hierarchical decomposition gives you the ideas that yield approximation schemes for problems in the plane -- plus some work. There's a small sketch of this decomposition at the end of this passage. Okay. So -- yes. >> Can you just mention what's special about the plane? >> Claire Mathieu: What's special about the plane? >> Yeah. What happens in the plane and not the [inaudible]? >> Claire Mathieu: The methods extend to three dimensions, to a constant number of dimensions, for most problems. The point is that to use dynamic programming, you need a small interface between the subproblem that you're solving and the outside. In the geometric case, the interface is the boundary of the area where you're solving the subproblem. In the Euclidean plane, the length of this boundary is small compared to the area inside. That's not true in high dimension. So that enables you to use all sorts of rounding and approximation tricks along the boundary, and that works only in small dimension. Okay. Since I was so interested in that technique, I wanted to see what problems could be solved using it that had not been solved by Sanjeev Arora. And there's one problem that was solved a while ago, the Steiner tree problem. The Steiner tree problem is a network connectivity problem: you're given points.
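Here is the sketch of the randomly shifted quadtree mentioned above: a minimal recursive dissection with a random shift, assuming points in the unit square. The dynamic program that runs over the tree -- where the real work of Arora's scheme happens -- is not shown.

```python
# Minimal sketch of Arora's randomly shifted quadtree: shift the bounding
# box by a random vector (with wraparound; doubling the box is another
# common option), then recursively cut into four quadrants.  The random
# shift makes it unlikely that any fixed important location sits close to
# a high-level cut line.
import random

def build_quadtree(points, size):
    """points: list of (x, y) in [0, size)^2; returns a nested dissection."""
    ax, ay = random.uniform(0, size), random.uniform(0, size)  # random shift

    def recurse(pts, x0, y0, side):
        if len(pts) <= 1 or side <= 1e-9:
            return {"box": (x0, y0, side), "points": pts}
        half = side / 2.0
        children = []
        for dx in (0, half):
            for dy in (0, half):
                sub = [(x, y) for (x, y) in pts
                       if x0 + dx <= x < x0 + dx + half
                       and y0 + dy <= y < y0 + dy + half]
                children.append(recurse(sub, x0 + dx, y0 + dy, half))
        return {"box": (x0, y0, side), "children": children}

    shifted = [((x + ax) % size, (y + ay) % size) for (x, y) in points]
    return recurse(shifted, 0.0, 0.0, size)

tree = build_quadtree([(random.random(), random.random()) for _ in range(32)], 1.0)
```

The random shift is exactly the "little bit of randomness" above: for any fixed short edge of a near-optimal solution, the probability of being cut at a high level of the tree is small.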
You want to connect these points [inaudible], using extra points if you want, with minimum total length. This is a real network design problem, if you will. Now, if you want everything to be connected, that's the Steiner tree problem. If you just want to connect this group together, that group together, that group together, it's the Steiner forest problem. Say I want to connect the red points to one another, the blue points to one another, purple to one another, green, and orange; each color corresponds to a group that you want connected. You could say: let's just take each group, and for each group solve the Steiner tree problem. That problem has a PTAS. But that's not necessarily what you want to do. Look at this case, the green and the purple groups: it makes sense to combine them, and you're actually saving length if you use this edge both for the green and for the purple group, for connecting them. So that means you have to decide which groups should go together -- the purple and green, blue and red, orange alone -- before solving the problem. That makes it much more difficult. Now, using Arora's hierarchical decomposition approach, in collaboration with Cora Borradaile, who was a graduate student at Brown at the time, and Philip Klein from Brown, we designed a PTAS for this problem. Then I thought: since we understand how this hierarchical decomposition works, are we done exploring its potential? We looked at a survey written by Arora on various problems solved with this approach. And at the end, two problems stood out. The last section was problems that had not been solved with this approach: minimum weight triangulation and vehicle routing. I told my student, Aparna Das: let's think about vehicle routing. (Minimum weight triangulation got solved very recently.) And for this problem, we couldn't quite get a PTAS; we got a quasi PTAS. What's a quasi PTAS? It means the running time is not quite polynomial; it's quasi-polynomial. What does quasi-polynomial time mean? It means not N to a constant, but N to almost a constant, some logarithm: N to the polylog N. Now, a running time of N to the polylog N sounds pretty terrible if you want to implement it. But actually, the very first draft of Arora's paper on TSP was not a PTAS either; it was a quasi PTAS. A quasi PTAS can be viewed as a strong indication that there exists a PTAS. It's like an intermediate step. My expectation is that in one year or two or three, someone will take this and make it into a PTAS. We were just missing one idea. Okay. And what is this problem that we solved? Vehicle routing. Here's a bus. Here are clients. They all want to travel to this place, the depot, but the bus has a capacity of ten; it's a minibus. So the bus goes to pick up these ten people, then goes back here. Those ten, goes back here. Those ten, goes back here. Three trips. >> Is this Austin? >> [inaudible]. >> Claire Mathieu: You know, my student designed this slide. She must have had a reason. I guess I should have put it there. Yes. >> [inaudible] there was a student who used the [inaudible] to get a quasi PTAS. Do you know the name of the student? >> Claire Mathieu: Arora's student? >> [inaudible]. >> Claire Mathieu: A quasi PTAS for what problem? >> For Euclidean TSP, using this decomposition. It was quasi-polynomial.
>> Claire Mathieu: You mean before Arora's result? >> They worked together. I don't know. I mean, Arora mentioned it in his talk [inaudible]. >> Claire Mathieu: I don't know. Sorry. I'm not sure. Now, of course, in terms of epsilon the runtime is horrendous. We've worked since then on extensions of this result, but it's not easy. If you think about it -- one second, one [inaudible] -- why do you get a quasi PTAS instead of a PTAS? It all depends on how much information you need to keep at the interface. If it takes, say, log N locations, each holding a number between 1 and N, that gives you N to the log N possibilities. That's typical. Yes. >> So what was the goal in this? Was it to minimize the number of loops or the whole length? >> Claire Mathieu: The length. You want to minimize the total length of your routes. So if K equals N, then it's TSP. All right. So there is -- yes. >> Can we go back for one second? I may be asking a very stupid question. >> Claire Mathieu: That's okay. >> So I apologize ahead of time. You were saying that there's a quasi PTAS and you feel that there may be a PTAS. >> Claire Mathieu: Yes. >> And I was wondering about the lower bounds for these types of things. I'm familiar with lower bounds for approximation algorithms where you say there's no constant factor [inaudible]. >> Claire Mathieu: Yes. >> Do people prove that a quasi PTAS is the best that can be done? >> Claire Mathieu: It's just a feeling. No -- we don't have any natural problem for which there's a quasi PTAS and there's no PTAS. There's -- >> And there's provably no [inaudible]. >> Claire Mathieu: And there's provably no PTAS. No. Everybody believes that this means a PTAS is waiting to be discovered. >> So is this inconceivable to prove? Or is it just that we haven't found the technique yet? >> I mean, can it not be shown that it's APX-hard [inaudible]? >> Claire Mathieu: If you believe P is different from NP, then -- if you don't care about implementation -- these results can all be viewed as complexity results. All of these results say these problems are not APX-hard: you cannot prove that there cannot exist a, say, 1.1 approximation algorithm. Double negatives. >> [inaudible]. >> Claire Mathieu: Yes. Yes. All right. So for these Euclidean problems, we used Arora's framework. Now, there is an intriguing phenomenon that has come up recently, which is that all these graph optimization problems that can be defined when points are in the Euclidean plane also have versions where points are in planar graphs. There are settings, such as road networks, where it makes more sense to compute distances along a planar graph rather than use Euclidean distances. And there is more than one problem for which an approximation scheme was designed first in one setting and then in the other. Phil Klein and I have the impression that the two frameworks are related: when a problem can be solved geometrically, there's a good chance it can be solved in planar graphs. Planar graphs are a little bit more difficult than Euclidean. So we've started pursuing this program.
We want to show that combinatorial optimization problems can be solved in planar graphs. Our first result in that direction was for the Steiner tree problem: Cora Borradaile, Phil Klein, and I designed a PTAS for the Steiner tree problem when the points that you want to connect are in a planar graph. And to do that, we designed a structure called a brick decomposition. Our hope is that this structure will be, in planar graphs, the analog of the hierarchical decomposition in geometric settings, so that we can use it again and again and again as a building tool for algorithms in planar graphs. So there's a planar graph here. These are the vertices. You see, this is a subgraph of your planar graph, and it's composed of bricks. And each side of a brick is almost a shortest path -- a near shortest path. Somehow the structure inside makes it easy to solve the problem inside each brick, and then you can use some standard techniques to solve the whole problem. So that's the grand plan. So far we've done Steiner tree. Then other people did Steiner forest. And now we're working on multiway cut. Multiway cut: there's a planar graph that I didn't draw; I only drew the vertices. There are some special vertices, your terminals, and you want to disconnect them from one another. So I drew this in a [inaudible]. You want to draw cycles to separate these terminals -- terminal faces -- from one another. So you see the multiway cut solution: it has a bunch of connected components, each of which is a 2-connected subgraph. We are on our way to an approximation scheme for this, after three years of work. And after that, there are other problems that we want to solve in planar graphs. If each problem takes us a year, maybe in four or five years we'll get to Markov random fields. You have an image and you want to do image segmentation: you want to partition the pixels into regions and minimize the cost. What is the cost? For two adjacent pixels, there's a certain cost for giving them different labels. And each pixel already comes with some partial idea of what its color should be, so there's also a cost for assigning each label to each pixel. Then you want to find the best way to label the image -- the best way to partition the image into regions -- and hopefully, if all goes well in our program, in a few years we'll have an approximation scheme for that. >> Sounds like the ground state of an Ising model with external fields or [inaudible]. >> Claire Mathieu: It's not easy, because you have many different labels. >> [inaudible]. >> Claire Mathieu: And each pixel has a different cost function. All right. So I've talked about geometric problems and about planar graph problems, where distances are distances in the graph. Now, what is more general than this? Whenever you have distances: general metric spaces. What is beyond our ability? General graph problems; that's pretty hard. Is there anything we can do for combinatorial optimization problems when there's a general metric? Yes. Let's take one of the most fundamental problems of combinatorial optimization: max cut. What is that problem? Everyone who's in theory knows what max cut is. You want to take a graph -- >> [inaudible]. >> Claire Mathieu: No, that's okay. Thanks. You have a graph. You want to partition its vertices into two parts so as to maximize the number of edges crossing the cut.
That's max cut. This is one of the 21 NP-complete problems in Karp's famous paper. Max cut is such a beautiful problem. It's been the source of so many new ideas for algorithm design. It was the first problem for which people used semidefinite programming to design good approximation algorithms. It's been the object of interest for lower bounds, for consequences of the Unique Games Conjecture, for sampling complexity in dense graphs. So we know it's a good problem, an inspiring problem. So let's try it in a metric space. You have points in a metric space. The edges have lengths, the lengths satisfy the triangle inequality, and the graph is complete. Partition the vertices into two sets to maximize the sum of the lengths of the edges going across. For that problem, in joint work with Wenceslas Fernandez de la Vega, we designed a PTAS. The solution is a randomized algorithm. It's based on sampling -- that's an old idea -- but on importance sampling. The key idea is that we don't want to just take a random sample of the vertices uniformly. We have a metric space; let's use the metric. Think about the max cut problem, and imagine all these points are together except one that is really, really far away. How do you cut to catch as many long edges as possible? You want to have this point on one side and all the other vertices on the other side. This point is very important. You do not want to miss it. It's more important than the others. So when you sample your vertices, you have to catch this point; otherwise the sample will not give you the right image of what's happening. Therefore, let's sample vertices not uniformly but with probability proportional to the average distance from the vertex to the rest of the vertices of the space -- there's a small sketch of this sampling step at the end of this passage. >> [inaudible] for max cut the best approximation is [inaudible] and here you're doing a PTAS for a generalization? >> Claire Mathieu: Points are in a metric space, so the graph is complete. It's a complete graph with lengths that form a metric. So it's a special -- >> Okay. It's not -- >> Claire Mathieu: It's a special case. >> It's any metric space or -- >> Claire Mathieu: It's a metric space. >> It's not on the Euclidean sphere or anything? >> Claire Mathieu: Any metric. The only property we use is the triangle inequality. >> But you can't capture -- >> Is it sort of like the SDP solution? [multiple people speaking at once]. >> Claire Mathieu: Here the triangle inequality is not in constraints that we put in; it's in the input. >> But you have -- but you have all edges, not just -- >> Claire Mathieu: All edges. Yes. Yes. Okay. So when we designed this, we thought it was a neat idea to use this biased sampling, but then we learned afterwards that this is a very old technique from statistics. It's a variance reduction technique: we do this sampling, and then we have to adjust all our numbers when we compute our estimates. It's been known in statistics for 30 years. Or maybe more. >> [inaudible]. >> Claire Mathieu: Yes. Yes. That's true. Yeah, that's true. It's another example. >> How does [inaudible] this compare to algorithms for dense graphs that de la Vega and [inaudible] -- >> Claire Mathieu: Yes. It's actually closely related. It can apply to dense graphs, but in dense graphs every point -- well, almost every point -- has the same importance, because in a graph, the importance of a point is its degree.
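Here is the sketch of the biased sampling step mentioned above: vertices are drawn with probability proportional to their average distance to the rest, and each sampled vertex carries the standard importance-sampling reweighting. The point set and metric below are made up; the rest of the PTAS is not shown.

```python
# Minimal sketch of biased (importance) sampling for metric max cut:
# sample each vertex with probability proportional to its average distance
# to the other vertices, so that far-away, influential points are unlikely
# to be missed.  Estimates computed from the sample must be reweighted by
# 1/probability -- the classical variance-reduction correction.
import random

def importance_sample(dist, n, sample_size):
    """dist(i, j): metric on points 0..n-1; returns [(vertex, weight), ...]."""
    avg = [sum(dist(i, j) for j in range(n) if j != i) / (n - 1)
           for i in range(n)]
    total = sum(avg)
    probs = [a / total for a in avg]
    sample = random.choices(range(n), weights=probs, k=sample_size)
    # Reweight: a vertex sampled with probability p counts with weight 1/(p*k).
    return [(v, 1.0 / (probs[v] * sample_size)) for v in sample]

# Synthetic instance: a tight cluster plus one far-away outlier.
pts = [(random.random(), random.random()) for _ in range(50)] + [(100.0, 100.0)]
d = lambda i, j: ((pts[i][0] - pts[j][0]) ** 2 + (pts[i][1] - pts[j][1]) ** 2) ** 0.5
print(importance_sample(d, len(pts), 10))
```

With the far-away point appended at the end, the sample almost surely contains it, which is exactly the behavior the uniform sample would miss.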
If the graph is dense, all the points that matter have degree about a constant times N, so they all have about the same importance. So if you take this technique and cast it in the framework of dense graphs, you actually get the dense graph algorithm of Fernandez de la Vega. Okay. I want to mention that this looks like a very abstract problem, but actually, using the same techniques, we can solve some clustering problems. If you have points and you have distances, and you want to group them into clusters so as to maximize the distance between the clusters, or minimize the sum of distances inside the clusters -- for various generalizations, we also have a PTAS. Now, I have 12 minutes left? >> Yuval Peres: 17. >> Claire Mathieu: What? >> Yuval Peres: 17. >> Claire Mathieu: 17 minutes. Okay. Great. >> [inaudible]. >> Claire Mathieu: No problem. All right. So I talked about packing and scheduling problems. I talked about graph problems on geometric graphs, on planar graphs, in metric spaces. Now I want to talk about miscellaneous problems for which it's also possible to design approximation schemes, using perhaps slightly different techniques. I thought Eric Horvitz was going to be here, and I noticed that he's worked on data compression, so I thought he might enjoy hearing about an approximation scheme for data compression. So, Eric, if you're here, now is the time to pay attention. Okay. You know, this is Morse code. I work on obsolete technology: I work on approximation schemes with applications to telegraphs. You all know about Huffman coding. In Huffman coding, you have words and you want to encode them with, say, 0s and 1s, some alphabet. And you want to minimize the average length of your text, when everything that you want to encode has a certain frequency. Okay. Morse code is a way to encode the Latin alphabet. But, you see, you have dots and you have dashes. If you're on a boat, you use your lamp: dots are short, dashes are long. Dashes take longer than dots. That's why E, which is the most frequent letter in the alphabet, is encoded by a dot, not a dash. So this is a twist on Huffman coding: Huffman coding where the letters used to encode have different costs. According to [phonetic] Mordecai Golin, this also has applications to run-length-limited codes used in magnetic and optical storage. The code is binary, and each 1 must be separated from the next by some number of 0s, between A and B of them; so each atomic piece you encode with has a different length. So say you have four words with frequencies 1/3, 1/3, 1/6, 1/6 -- that adds up to 1. With regular Huffman coding, where you have two letters for encoding, each of cost 1, this is your tree. With two letters where every A costs 1 and every B costs 3, kind of like a dot and a dash, then this is the tree: you want to avoid two dashes. So that's the optimal tree -- there's a small sketch of this cost computation at the end of this passage. All right. So we made the observation that we could solve a relaxed problem. In all these problems, we want an encoding that is prefix free: we want no codeword to be a prefix of another. Well, let's relax the problem and say that the beginnings of the words should be prefix free, but at the end we are allowed to have two codewords that share a long stretch -- if they're both very long and only differ at the very end, that's okay. So we only put the prefix-free constraint at the beginning.
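Here is the sketch of the unequal-letter-cost objective mentioned above, using the frequencies 1/3, 1/3, 1/6, 1/6 from the example and letter costs 1 and 3, like a dot and a dash. The two codes compared are illustrative choices, not the trees from the slides.

```python
# Minimal sketch of the Huffman-with-unequal-letter-costs objective: the
# cost of a codeword is the sum of its letters' costs (think dot = 1,
# dash = 3), and we want a prefix-free code minimizing the
# frequency-weighted average codeword cost.
def average_cost(code, freqs, letter_cost):
    assert all(not a.startswith(b) for a in code for b in code if a != b), \
        "code must be prefix-free"
    return sum(f * sum(letter_cost[ch] for ch in w)
               for w, f in zip(code, freqs))

freqs = [1/3, 1/3, 1/6, 1/6]
letter_cost = {"a": 1, "b": 3}                 # like a dot and a dash
print(average_cost(["aa", "ab", "ba", "bb"], freqs, letter_cost))  # balanced tree
print(average_cost(["a", "ba", "bba", "bbb"], freqs, letter_cost)) # skewed tree
```

Note how the balanced code wins here even though the skewed code gives the cheapest codeword to the most frequent word; the expensive letter b changes the usual Huffman trade-off.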
If we can solve that relaxed problem, then we can convert the result into a truly prefix-free code. So we reduce the problem to an easier problem, and then we can use standard approximation techniques and design a PTAS. Joint work with Mordecai Golin and Neal Young. All right. That was the bit about data compression. And finally, there are two more problems I want to talk about that have approximation schemes of a kind: rank aggregation and correlation clustering. Both of these projects are joint with my ex-graduate student, Warren Schudy, who just graduated. Let me talk about rank aggregation. I was talking to Susan Holmes just two days ago -- she was visiting here; she does statistics -- and she said rank aggregation, everybody is interested in rank aggregation, it's such a hot topic. Look at this. You want to have an algorithm to detect cancer cells. The algorithms they have work well if they have a good training set to start with. What's a good training set? You initially train the algorithm with hard images, images where it's hard to find features. Okay. So let's look at a bunch of images where we try to find features -- you know, ants here, tails of fish there -- and let's see how hard it is to detect the features. We need to rank these images by difficulty of feature detection before choosing which ones to feed to the algorithm as a training set. The way you do that is you get humans to look at these images and try to detect the features. You look at pairs of images: how many features they got right here, how many features they got right there. That gives you a comparison of the difficulty of these two images. And once you have all these comparisons, you try to deduce a global ranking of your images. So that's just one example of the kind of setting where this problem comes up. Now, the mathematical model for the problem that we worked on -- there are some variants, but this is the problem that we solved. Each committee member is asked to rank four candidates. This person puts Alice first, Bob second, Charlie third, and Delta fourth. This person uses that ranking; this ranking; that ranking. Now, the committee gets together, and you want to output the best possible aggregate ranking. How do you do that? Well, you're all working in good faith, you get along with one another, so you agree on a measure: you want to produce a ranking that minimizes inversions. What do I mean? There will be a certain output ranking. This output ranking will be at a certain distance, in terms of the number of inversions, from this ranking, from that one, from that one, from that one; you want to minimize the total number of inversions overall. That's the committee ranking. All right. Very simple definition. How do we find that best ranking? It's NP-hard. Therefore, approximation algorithms to the rescue. So let's do it. Let's start with a reasonable ranking. There are several constant factor approximations known. Example: let's look at the average rank of each person and sort them by average rank. Alice is ranked 1, 3, 2, 1; the average is 7/4. That's the best, so she'll be number 1, and so on. That's what we start with. It's a constant factor approximation -- there's a small sketch of this starting point at the end of this passage. For some instances it doesn't give you a good ranking, but it's reasonable. And then we do something very simple -- everything we've learned in algorithms as undergraduates: divide and conquer, except we don't cut in the middle, we cut in random places, because we don't want to cut in what might be a critical spot.
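Here is the sketch promised above of the inversion-counting objective and the average-rank starting point, on the four-candidate example from the talk; the random divide-and-conquer refinement and the sampling-based base case continue below. The votes are made up to match Alice's ranks 1, 3, 2, 1.

```python
# Minimal sketch of the rank aggregation objective and the average-rank
# starting point.  A ranking is a list of candidates, best first; the cost
# of an aggregate ranking is the total number of pairwise inversions
# against all the votes (the Kemeny objective).  Positions are 0-indexed.
from itertools import combinations

def inversions(ranking, vote):
    pos_r = {c: i for i, c in enumerate(ranking)}
    pos_v = {c: i for i, c in enumerate(vote)}
    return sum(1 for a, b in combinations(ranking, 2)
               if (pos_r[a] - pos_r[b]) * (pos_v[a] - pos_v[b]) < 0)

def total_cost(ranking, votes):
    return sum(inversions(ranking, v) for v in votes)

def by_average_rank(votes):
    cands = votes[0]
    avg = {c: sum(v.index(c) for v in votes) / len(votes) for c in cands}
    return sorted(cands, key=avg.get)

votes = [["Alice", "Bob", "Charlie", "Delta"],   # Alice 1st, 3rd, 2nd, 1st
         ["Bob", "Charlie", "Alice", "Delta"],
         ["Charlie", "Alice", "Delta", "Bob"],
         ["Alice", "Delta", "Bob", "Charlie"]]
start = by_average_rank(votes)          # constant factor starting point
print(start, total_cost(start, votes))
```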
And then at some point we say, okay, this is small enough; let me go to the base case. I switch to a different algorithm that's based on sampling: I take the base case, and with a less efficient algorithm I find a near optimal ranking, with a small additive error. Combining all this together, you get a PTAS. What's funny is that these are all elementary techniques. This is really an algorithm that could have been designed many years ago; 30 years ago all the techniques were there. It's just a matter of putting them together just right. My last example is a case where we failed: correlation clustering. In my first grant proposal in the U.S., I said I want to design approximation schemes; I want an approximation scheme for correlation clustering. The answer from the reviewers was that it's provably impossible. And the grant proposal was rejected. Okay. So I changed my grant proposal, but I did not forget about the problem. I still wanted to solve it. So let me define it for you, and then let me tell you how I solved it even though it's provably impossible. Here's the input. It's a complete graph. Every edge has a label: similar or dissimilar. When my student -- when Warren gives a talk, he has pictures of cats and dogs, but I've seen them so many times, I got tired of cats and dogs. So this is my very own handwriting. Now, you, as humans, know what these are. They are 2s and 3s. Is there a program that can recognize that these are 2s and those are 3s? Let's see. Let's look at pairwise comparisons. These kind of look the same. Those kind of look the same. And these -- they kind of look the same too. Now, that's too bad. Okay. So that's the input, this complete graph. This is the output: a partition of the data into two classes -- or any number of classes -- according to similarity. And what is the value? How good is this partition? Well, let's see how much it agrees with the input. These are in different parts, and the edge says they're different. Very good. Same thing here. Very good. These are in different parts, but the edge says they're the same. Not good; that's one. These are in the same part, but the edge says they're different. Not good; that's one. Total: 2. The cost is the number of edges where there's a discrepancy between the input and the output. And this is what we're trying to optimize in correlation clustering. All right. If P is different from NP, no approximation scheme is possible: the problem is APX-hard. >> [inaudible] no PTAS? >> Claire Mathieu: No PTAS. There's a 2.25 approximation, I think. But I like to have epsilon relative error. So I add assumptions -- some probabilistic assumption. Let's think of these objects. When I wrote those numbers, those symbols, I actually had numbers in mind. I thought I was writing a 2 or a 3; that's the ground truth. When you retrieve images of dogs and cats on the Web, these are pictures that have been taken of an animal that is actually a real dog or a real cat. That's the ground truth. Let's assume the input is a noisy version of some unknown ground truth. Let's assume that for every pair of images, when you compare them, you get the right answer -- except when you don't: with some probability P, the answer is flipped. >> [inaudible] means you have one dog-like looking cat and you have [inaudible] edges. >> Claire Mathieu: [inaudible] this question. I don't know what to do when there's no independence. I don't know what to do when the answers are dependent. We don't have a good setup.
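As an aside, here is a minimal sketch of the noisy ground-truth model just described: start from a true partition, flip each pairwise label independently with probability p, and count disagreements. The instance -- 2s and 3s -- is synthetic; the SDP-based algorithm itself is not shown.

```python
# Minimal sketch of the noisy ground-truth model for correlation clustering:
# the input labels are the ground-truth similar/dissimilar labels, each one
# flipped independently with probability p.  The cost of a clustering is the
# number of pairwise disagreements with the (noisy) labels.
import random
from itertools import combinations

def noisy_labels(truth, p):
    """truth: dict vertex -> cluster id; returns {(u, v): True if 'similar'}."""
    return {(u, v): (truth[u] == truth[v]) != (random.random() < p)
            for u, v in combinations(sorted(truth), 2)}

def disagreements(clustering, labels):
    return sum(1 for (u, v), similar in labels.items()
               if (clustering[u] == clustering[v]) != similar)

truth = {i: ("two" if i < 10 else "three") for i in range(20)}   # 2s and 3s
labels = noisy_labels(truth, p=0.1)
print(disagreements(truth, labels))   # cost of the ground truth itself
```

The printed number counts exactly the flipped pairs, about p times the number of edges, which is why for p bounded away from one half the ground truth is close to the maximum likelihood clustering.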
>> [inaudible] independent errors on the vertices [inaudible]. >> Claire Mathieu: Okay. Here's a different way to model the problem. Each of your objects can be seen as some multidimensional vector, and then between two vectors there's a probability -- a measure of similarity -- that your answer will be similar or dissimilar, depending on how close these vectors are. But then what is the ground truth? What does it mean that a vector is a dog or a cat? What is your goal partition? It's not obvious if you just have vectors. >> [inaudible] you're doing here, you're not only changing the model but changing the goal, right? You want to discover the ground truth -- >> Claire Mathieu: I want to discover the ground truth, or the maximum likelihood clustering. With these assumptions, the two actually agree for most values of P. So what can we prove? First, we have a PTAS if P is not too close to one half. If P is one half, the input is just random; there's no chance we can get the ground truth. But as long as P is bounded away from one half, we have a PTAS. And more than that, we can actually get the exact ground truth -- not approximate but exact -- if all the clusters are large enough. And the answer is based on semidefinite programming. I will skip this. It's a quadratic formulation instead of linear programming, and it's great for optimization problems. So this is one application where it really gives you the right answer. Now, this is our theory, but my student worked with Micha Elsner, who is a graduate student in the natural language group at Brown, and they tried to use these ideas to do correlation clustering on natural language data. And they solved the SDP. And they got this kind of matrix for the solution. And you look at this and you see: oh, there's a cluster, there's another cluster. And these are just small groups. So the clustering comes out naturally, and it's surprisingly efficient. >> So how do you [inaudible] is it easy? >> Claire Mathieu: Actually, we take the SDP solution and then we use a constant factor approximation algorithm for rounding. Because the SDP solution itself can be seen as an instance of correlation clustering, we can use a constant factor approximation algorithm as a black box. But there's no time to discuss this. Now, in my remaining 45 seconds, I'd like to talk about the future. First of all, if you look at the kinds of techniques that I used to solve these optimization problems, you see that aside from basic algorithms and elementary techniques, they borrow from two adjacent fields. One is probability theory, and the other one is mathematical programming. So we need experts from both of these fields to help us design good algorithms. Each is necessary, but neither is sufficient. Now, in terms of what I would like to work on later: besides the various problems that I mentioned along the way on planar graphs, I would like to better understand the power of these techniques for linear programming and for semidefinite programming. In particular, lift-and-project is a way to enhance the techniques. How powerful is that? That's a very intriguing, hard question. And the other direction is that I would like to work on probabilistic models of hard-to-approximate -- >> It's so hard -- >> Claire Mathieu: -- optimization problems. Okay. That's it. Thank you. [applause]. >> Yuval Peres: Any additional questions? >> What exactly do you mean by probabilistic models of [inaudible]?
>> Claire Mathieu: Something like correlation clustering. When you have a problem that is APX-hard, can you design a model to generate the input -- add some assumptions -- so that this restricted version becomes easy to approximate? But of course you have to have a model that makes sense. In the case of correlation clustering, it makes sense to assume there's some underlying ground truth, so for that problem it's natural. For other problems it's not so clear what the right model is. >> Maybe it's the random ordering version [inaudible]. >> Claire Mathieu: Yes. Yes. >> Yuval Peres: Okay. [applause].