>> Yuval Peres: All right. Good morning everyone. We are very happy to have Raghu Meka from the IAS, who will tell us how to beat the union bound. >> Raghu Meka: Today I'll talk about beating the union bound by geometric techniques. So what is the union bound? It's a basic fact in probability. It says that if you have n events in some probability space, then the probability of the union of the events is at most the sum of the individual probabilities. In particular, if the sum of the individual probabilities is less than one, then with nonzero probability none of the events occur. This was popularized by Erdos, and is one of the most basic techniques in the probabilistic method. But even before Erdos there was one literary figure who made prominent use of this. And I'll give you three guesses. >>: [inaudible]? >> Raghu Meka: Literary figure. >>: [inaudible]? >> Raghu Meka: Okay. Close, but- >>: What was the question? >> Raghu Meka: There was one literary figure who made use of the union bound before. >>: [inaudible]. >> Raghu Meka: Exactly. "When you have eliminated the impossible, whatever remains, however improbable, must be the truth." It's kind of a paraphrasing of the union bound. So despite its simplicity, it is amazingly effective in the probabilistic method. Some notable examples are the original application of the method by Erdos, showing the existence of very good Ramsey graphs; Shannon, who used it to show the existence of very good error-correcting codes; Johnson and Lindenstrauss, who used it to show the existence of very good metric embeddings; and so on. There are many other such applications, as can be seen in this beautiful book by Alon and Spencer. On the other hand, sometimes the union bound is indeed too naive, and it's not enough to capture the full picture. One salient example of this is the Lovasz Local Lemma. It says that if you have n events as before, but the events are d-dependent, meaning each event depends on at most d other events, and the probability of each event is not too large, then the probability of the union is less than one. If you used the union bound here, you would get nothing when n is much larger than the degree d. So it beats the union bound. And to keep things simple, that is what we'll mean whenever we talk about beating the union bound. This result itself is amazingly effective; as Spencer says, it helps us find needles in haystacks. On the other hand, and we'll come across this theme again, it took further breakthroughs just to make this result constructive or algorithmic, starting with the work of Beck and more recently the work of Moser and Moser and [inaudible] and so on. >>: [inaudible]? >> Raghu Meka: All right. I guess this is the first Lovasz and Erdos formulation. >>: [inaudible]. >> Raghu Meka: I think in seventy-seven it was 4 and was improved to e later on. The first result had a 4 in it. So here I'll talk about two examples of beating the union bound. The first is constructive discrepancy minimization, and here I'll talk about a new approach for rounding linear programs via Brownian motion. Then I'll talk about a central limit theorem for polytopes, and here I'll just state the result and give some applications to learning theory. In both cases, we will use some nice symmetry and geometric properties of the Gaussian distribution to go past the union bound, and that serves as the underlying or unifying theme between these results.
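In symbols, the two statements just discussed are roughly the following (a sketch; the symmetric form of the Local Lemma is quoted with the constant e mentioned above):

% Union bound: for events A_1, ..., A_n in any probability space,
\Pr\Big[\bigcup_{i=1}^{n} A_i\Big] \le \sum_{i=1}^{n} \Pr[A_i],
\qquad \text{so } \sum_{i}\Pr[A_i] < 1 \;\implies\; \Pr\Big[\bigcap_{i}\overline{A_i}\Big] > 0.
% Symmetric Lovasz Local Lemma: if each A_i depends on at most d of the other events
% and \Pr[A_i] \le p for all i, then
e\, p\,(d+1) \le 1 \;\implies\; \Pr\Big[\bigcap_{i}\overline{A_i}\Big] > 0,
% which can hold even when \sum_i \Pr[A_i] = np is much larger than 1.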
And here's the outline: I'll talk about the two parts, and finally I'll finish by mentioning some of my other results and summarizing the work. So let's start with the first part. The basic setting here, in discrepancy minimization, is that you have a set system, which is a collection of m sets over some universe of size n. For example, here, let's say the universe has size 5, and S1 is the set {1, 3, 4}, S2 is {2, 3, 5}, S3 is everything, and so on. So we have a collection of sets like this, and the goal in discrepancy minimization is to color the elements red or blue so as to minimize the maximum imbalance. For example, let's say we construct this coloring here; then the first set has an imbalance of three, the second set has an imbalance of minus 1, and so on. So the maximum imbalance, or the discrepancy, of the coloring is three. The goal in discrepancy minimization is, given a set system, to find a coloring chi which minimizes the maximum, over all sets in your set system, of the discrepancy. This is a fundamental combinatorial concept with many applications, and let's look at a few examples to familiarize ourselves with it. One of the first and most notable examples of bounds in discrepancy is a result of [inaudible] from '64, which showed that if the sets in your set system correspond to arithmetic progressions over Z_n, then the discrepancy is at least n to the one quarter. >>: Is m equal to n? >> Raghu Meka: Here? >>: Yeah. >> Raghu Meka: Here m can be much larger; it's roughly n squared- >>: Is this uniform over m? How does m- >> Raghu Meka: So m is the size of the set system, and here you are taking all arithmetic progressions, so m will be roughly n squared. >>: Okay. >> Raghu Meka: Like, you have the set one, three, five, seven; you also get two, four, six; and so on. And Matousek and Spencer also gave a matching upper bound. Another example that is well studied is the case of halfspaces. Here you have a bunch of points in, let's say, the plane, and the sets in your set system correspond to indicator functions of halfspaces. So you draw a hyperplane, and all the points on one side will correspond to one set, like S1 here; you draw another hyperplane, and all the points on this side will correspond to S2; and you consider all possible hyperplanes and all possible sets that you get in this form. Once again we have tight upper and lower bounds for the discrepancy of this set system as well. Okay? So discrepancy minimization and discrepancy theory have many applications, and here I'll talk about one cute application that I like, from computer graphics. Ray tracing is a nice concept in graphics where you give some textual description of a scene: there is a sphere at so-and-so coordinates, there's a slab at so-and-so coordinates, there's a triangle somewhere, and so on. The goal of the ray tracer is to take this text description and render an image that corresponds to the description. This image was actually generated by one such ray tracer, which I was glad to use because it's from an undergrad project I did a long time back. So how does a ray tracer work? It works as follows: you specify the position of the camera and the position of the screen, and there are objects that you're trying to render.
So you shoot some rays from the camera, find out where the rays intersect the objects, trace them back onto the screen, and then render them according to the properties of the intersection points. A very simple concept. One important operational question here is to decide which points to shoot rays from. For instance, you can't shoot rays from all the points on the screen, because that would be too computationally intensive. And the choice of the points makes a big difference in the quality of the rendered image. For example, if you choose a naive grid, then you get a lot of [inaudible] errors. What people do in practice in graphics is to choose a set of points whose discrepancy with respect to simple geometric shapes, like halfspaces, triangles, and circles, is small. This is actually what practitioners use; they have some preset configurations which minimize this discrepancy, and they use these in their ray tracing programs. And more generally, discrepancy has many applications. >>: [inaudible]? >> Raghu Meka: I think the intuition is that when you do this, every shape gets roughly the same number of points, but that justification is probably more empirical. I mean, if you run the program, you can clearly see the [inaudible] has been minimized with respect to discrepancy. So it's probably more empirical. >>: Somehow I'm missing something. I mean, wouldn't the grid be [inaudible]? >> Raghu Meka: The grid is not so good- >>: Why? >> Raghu Meka: Why? Let's see. Because it depends on how fine a grid you choose. So if you fix the number of points, then the grid is not as good as some other configurations. >>: Why? >> Raghu Meka: So, for instance, if you take, let's say, a square which is smaller than the resolution of the grid, or a rectangle which sits inside between two grid lines. >>: [inaudible]? >> Raghu Meka: So you can do well with a grid too, but then you would require finer and finer resolution, or more points. >>: I mean, for a small square the grid is more or less as good as that. >> Raghu Meka: Yeah, yeah. The thing is, how small a grid do you need? >>: Yeah. But for a given set of points- >>: For a given number of points [inaudible] much better than- >>: The small square. I mean, it's a question of which collection of sets you are trying to- >> Raghu Meka: Right. Let's say you take a thin rectangle. You don't want to lose those two. Shapes like that might be problematic if you use the grid. And discrepancy theory has many applications: in complexity theory, where small discrepancy corresponds to good average-case lower bounds; in communication complexity, where it is one of the main techniques for proving lower bounds; in computational geometry, where it is useful in building small data structures; in graphics, as we saw before; and also in pseudo-randomness. And there are many more applications, as can be seen in [inaudible] books. Here I'll talk about one of the cornerstone results in this area, which is Spencer's celebrated "six standard deviations suffice" theorem. It says the following: if you have a set system with n sets in it, then the system has discrepancy at most six times square root of n. It's called the "six standard deviations suffice" result because there is a six in the theorem statement, and square root of n corresponds to the standard deviation of a set under a random coloring.
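To see the numbers behind "six standard deviations suffice", here is the standard random-coloring calculation (a sketch; this is the union bound argument that the next part of the talk compares against):

% For a uniformly random coloring \chi \in \{-1,1\}^n and a set S of size at most n,
% Hoeffding's inequality gives
\Pr\Big[\Big|\sum_{i \in S}\chi_i\Big| > t\Big] \le 2e^{-t^{2}/2n}.
% Union bounding over the n sets, the failure probability drops below 1 once
t = c\sqrt{n\log n},
% so a random coloring achieves discrepancy O(\sqrt{n \log n}); Spencer's theorem removes
% the \sqrt{\log n} factor and guarantees discrepancy at most 6\sqrt{n}.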
And the square root of n bound here is actually necessary. We don't know the right constant in front of it, but more or less it's tight. What's most interesting here is that it beats the union bound. If you take a random coloring and use a union bound across the different sets, you can show that the coloring gets discrepancy square root of n log n. And on the other hand, there are set systems where random colorings do get discrepancy of order square root of n log n. So Spencer's result beats the union bound in this sense. >>: [inaudible]? >> Raghu Meka: And another aspect of this result is that Spencer's original proof had an ingenious use of the pigeonhole principle and hence was nonconstructive. So it gave no algorithm for finding such a coloring short of enumerating all possible colorings. And in fact, Alon and Spencer conjectured twenty years ago that there is no efficient algorithm for finding a coloring which gets discrepancy O of square root of n. And like all good conjectures, this was shown to be false. Recently, in a breakthrough work, Bansal showed that there is an efficient randomized algorithm for finding a coloring with discrepancy O of square root of n. Here, I want to talk about a new elementary and geometric proof of Spencer's result which also gives an algorithm to find such a coloring. And I want to say that our result is truly constructive in the sense that, even though Bansal gave an algorithm for the problem, the analysis of his algorithm still appealed to the nonconstructive proof: he had a [inaudible] programming relaxation, and to argue about the value of the program he used Spencer's nonconstructive proof, whereas our algorithm itself gives a proof of the result. And more than the specific result, I think the technique we introduce for finding such a coloring, which we call EDGE-WALK and which involves rounding linear programs via Brownian motion, seems to have much broader potential and could be used elsewhere. Very recently, about two or three weeks back, Rothvoss used some of our techniques to get the first improvement for the bin-packing problem in nearly 30 years. So it seems to have other applications which merit further investigation. >>: Can I ask a question about the [inaudible] algorithm? When you say that, in this context, does that mean just that with a fixed probability your algorithm will give you one? >> Raghu Meka: Right. And you can also make it deterministic, but I think here the punch line is just finding such an algorithm. And here's the outline of the algorithm. I'll describe the algorithm in its full glory and also sketch its analysis. The algorithm has two steps. One is the partial coloring method, which was used by Spencer and which we'll also use. And next I'll describe the EDGE-WALK algorithm. So let me start with the partial coloring method. The partial coloring method was introduced by Beck, and the philosophy is as follows: in the discrepancy minimization problem we said we wanted to find a red/blue, or plus-minus one, coloring. And Beck said, instead of that, let's find a 1, minus 1, 0 coloring, where some of the variables might get a value of zero. But if you don't put any restriction here, the problem becomes trivial: you just output the all-zeros vector. So Beck said, instead, let's find a partial coloring where at least half of the coordinates are nonzero. And the [inaudible] is as follows.
Let's say you find a partial coloring where half of the coordinates are nonzero, like the one shown here. Once you find such a coloring, let's fix the coordinates which are nonzero, forget about them, and recurse on the remaining unfixed variables. So you find another partial coloring on these guys, maybe n over four elements, and then repeat the process until you have covered all the variables. Okay? And if everything works out according to plan, the hope is that the first partial coloring gives you a discrepancy of square root of n, and the second partial coloring gives you a discrepancy of square root of n over two, because now you are working over a universe of size n over 2, and then n over four, and so on. So you get a geometrically decreasing series, and the total discrepancy is O of square root of n, which is what we want. More concretely, the first step is: you have a collection of n sets over some universe of size n, and you want to find a 1, 0, minus 1 coloring such that the coloring has small discrepancy, O of root n, and, most importantly, at least half of the coordinates are nonzero. >>: And here, n is the number of sets and the [inaudible]? >> Raghu Meka: Right. So here I'm simplifying; if you put m there, it will be square root of n times log of m over n. But I think this is the main step. Once you get this, it's not too hard to get the other things. >>: [inaudible]? Number of sets or number of [inaudible]? >> Raghu Meka: Oh, here? It's square root of the number of variables, the number of elements, times log of m over n. That's the precise bound. And our main result is that there is an efficient algorithm to find such a coloring, and that's what I'll talk about. And because there is an efficient algorithm to find such a coloring, as a corollary, we get that there exists such a coloring, which was the main point of the [inaudible] results. So let me now talk about the EDGE-WALK algorithm. Before describing the algorithm, let me rephrase the problem in a geometric language. So here's the discrepancy setup: find a coloring which minimizes the maximum imbalance. But there is a different, perhaps more intuitive, way of measuring the discrepancy, which is to look at the matrix-vector product of the incidence matrix of the set system with the coloring vector. Then the discrepancy of the coloring is just the maximum entry, in absolute value, of the matrix-vector product. So you want to pick a plus-minus one vector so that the infinity norm of the matrix-vector product is not too large. Further, if you let v_1 through v_m be the indicator vectors of the sets in your set system, what we want is a coloring chi such that the inner product of chi with each of the indicator vectors is small. Now these constraints, the inner product of chi with the indicator vectors being small, are all linear constraints, except that we want chi to be a plus or minus one vector. And this naturally suggests the approach of looking at a linear programming relaxation for the problem. If you do that, you end up with the following polytope B, which is the set of all x such that each entry of x is at most one in absolute value and the inner product of x with each of the indicator vectors is small. So this is a linear programming relaxation of the discrepancy minimization problem, and our goal is to find a nonzero lattice point inside this polytope.
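Written out, the bookkeeping from the last two paragraphs looks roughly like this (a sketch; lambda stands for the O(sqrt(n))-type discrepancy bound targeted in one round):

% Recursing on the unfixed half each round, the total discrepancy is a geometric-type sum:
\sum_{k \ge 0} O\big(\sqrt{n/2^{k}}\big) = O(\sqrt{n}).
% The linear programming relaxation of one partial-coloring round is the polytope
\mathcal{B} = \Big\{ x \in \mathbb{R}^{n} : |x_i| \le 1 \ \forall i, \quad |\langle v_j, x\rangle| \le \lambda \ \forall j \Big\},
% and the goal is a point of \mathcal{B} with at least n/2 coordinates equal to \pm 1.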
Not just a nonzero point, but a nontrivial point which has many nonzero coordinates; that would correspond to a good partial coloring. In the rest of the talk I'll refer to the constraints of the form |x_i| at most one as color constraints, because that's where they come from, and to the inner product constraints as discrepancy constraints, because they correspond to the discrepancy of the set system. So here's the goal set up again: you have this polytope and we'd like to find a highly nonzero lattice point inside it. For intuition, let's use the distance from the origin as a proxy for how nonzero a point is; that's just for intuition. So we start at the origin, let's say, and our goal is to find a vertex which is as far away from the origin as possible but is still inside the polytope. And the way we do this is to use Brownian motion. We do a random walk in n dimensions until we hit the boundary of the polytope. Once you hit the boundary of the polytope, you need to decide what to do. We still want to keep moving away from the origin, because we want to increase the distance from the origin; on the other hand, we don't want to cross the polytope. So we take the greedy approach: we continue the Brownian motion but now constrain ourselves to lie within the face that we hit. So you do a random walk, a Brownian motion, until you hit another constraint, and then you repeat the process until, lo and behold, you reach the vertex that you wanted to get to at the beginning. So this is the full algorithm, and the claim is that this algorithm will find a good partial coloring. >>: So this Brownian motion [inaudible] goes through fractional points? >> Raghu Meka: Right. The Brownian motion is through fractional points. >>: And what is your stopping point? >> Raghu Meka: So [inaudible] the stopping criterion is just: go until you reach a vertex. >>: A vertex of the polytope? >> Raghu Meka: A vertex of the polytope. But in reality, we'll just stop after a certain number of iterations, because you want to do a discrete walk. >>: But this final point is also [inaudible] fractional? >> Raghu Meka: So most of the coordinates will be integral except some coordinates: half the coordinates will be integral, half the coordinates will be fractional. And we'll then recurse on the fractional coordinates as before. So half the coordinates will be integral. That's what we want to show. >>: I see. >>: You can hit either the face, the coloring face- >> Raghu Meka: Or the discrepancy one. Exactly. That's the point. So let me describe the algorithm fully, because the continuous Brownian motion is nice for intuition, but when you try to implement it, it's a bit problematic. To make the algorithm formal, let me describe the random walk that we'll use. The random walk is very natural. You have some subspace V, and given your current position, you just choose a random Gaussian vector in your subspace and take a step of size gamma in that direction, where gamma is some step size. You want to take tiny steps, not too large. So the walk looks something like this. Another minor issue is that in the description I said you stop the walk once you hit a face. But if you're doing a discrete walk, you might not hit the face exactly; you might overshoot it. The solution is to just introduce a slack near each boundary: if you get too close to the boundary, then we'll say we hit it. So let me now describe the algorithm.
So the algorithm gets as input some vectors, which you should think of as the indicator vectors of your set system, and it runs for a certain number of steps which depends on the step size gamma; but let's not worry about the parameterization too much. You start from the origin, and at the t-th step you let Color_t denote the set of all color constraints, or color faces, that are nearly hit: this is the set of all coordinates i such that the absolute value of the i-th coordinate is very close to one. And you let Disc_t denote the set of all discrepancy constraints that are nearly hit: this is the set of all vectors v_j such that the inner product of v_j with our current point is very close to the threshold that we don't want to cross. Having said that, you still want to do the random walk, but not change any of these constraints which are very close to being violated. So we just walk in the subspace that is orthogonal to all of these constraints. So if you let V_t be that subspace, you just pick a random Gaussian vector in that subspace, take a tiny step in that direction, and repeat this process. The number of steps for which you have to do this will be roughly one over gamma squared, where gamma is the step size. So that's the algorithm, and the claim is that you will find a good partial coloring with some non-negligible probability. Now let me give a sketch of the analysis; the actual proof is not too much more complicated. You just have to write down some tail bounds, but it's not too hard. >>: [inaudible]? >> Raghu Meka: So you'd have to take very tiny- >>: [inaudible] But then do you stay at that point, actually, nevertheless [inaudible]? >> Raghu Meka: So, I mean, it happens with such tiny probability that you can ignore such events. If your step size is much smaller than the slack you introduce, you're never going to overshoot the [inaudible]. >>: But you stay actually off the face. You don't move around [inaudible]? >> Raghu Meka: No. So you can think of the slack as being polynomially small; if it's one over n, you can just [inaudible] anything you want. It's not going to affect the error in the discrepancy problem. >>: Could you use Brownian motion if you think you're [inaudible]? >> Raghu Meka: I think so. I mean, probably. Yeah. You should be able to. Brownian motion is basically the case where gamma goes to zero. So let me describe the analysis. The setup is the same: you have this polytope and we'd like to find a highly nonzero lattice point. The punch line for the analysis is that the discrepancy faces are much farther from the origin than the color faces. Okay? Let's say we are shooting for a discrepancy of a hundred times root n. Then the discrepancy faces are at a distance of a hundred from the origin, because each of these vectors v_j has norm at most root n, and the color faces are at a distance of one from the origin, because the constraints are |x_i| less than or equal to one. And by the way we designed the random walk, it looks Gaussian in any direction, with some variance. So you do some calculations, some [inaudible] bounds, and you get that the probability that the walk hits a discrepancy face is roughly exponential in minus a hundred squared, like a Gaussian tail, and the probability that the walk hits a color face is roughly exponential in minus one; it's not too small.
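In symbols, the heuristic behind the two probabilities just quoted is roughly the following (a sketch; it ignores the subspace restrictions, which only shrink the relevant variances):

% After T \approx 1/\gamma^2 steps, the projection of the walk X_T onto a fixed unit direction u
% is a sum of roughly T independent N(0, \gamma^2) increments, i.e. approximately N(0,1):
\langle X_T, u\rangle \approx \sum_{t=1}^{T} \gamma \langle g_t, u\rangle \sim N(0, T\gamma^{2}) = N(0,1).
% A discrepancy face lies at distance about 100 in its normal direction (threshold 100\sqrt{n},
% normal v_j of norm at most \sqrt{n}), while a color face lies at distance 1, so
\Pr[\text{hit discrepancy face } j] \lesssim \Pr\big[N(0,1) \ge 100\big] \approx e^{-100^{2}/2},
\qquad
\Pr[\text{hit color face } i] \approx \Pr\big[|N(0,1)| \ge 1\big] \approx e^{-1/2}.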
And as a consequence of these two bounds, you get that the probability that the walk hits a discrepancy face is much smaller than the probability that it hits a color face. >>: [inaudible]? >> Raghu Meka: So you fix any discrepancy face and any color face; it's more likely to hit the color face, and then you take a kind of expectation over all the discrepancy faces. >>: [inaudible] when you say is [inaudible] do you mean the probability that it ever hits [inaudible]? >> Raghu Meka: Okay. Because of this comparison, what's morally true is that you end up hitting the cube faces more often. Let's say you don't worry about those [inaudible] as an issue and you run the walk until you reach a vertex. When you reach a vertex, you must have hit n constraints. And we know that most of them are color constraints. And if you are satisfying n constraints and most of them are color constraints, then at least half of them must be color constraints. That means that you get a good partial coloring. >>: So the estimate you have [inaudible] one round from the origin until you hit something, right? So you hit the discrepancy face with smaller [inaudible]? >> Raghu Meka: So this is over the whole run of the algorithm. Let's fix a discrepancy face. The probability that the walk hits that discrepancy face is much smaller than the probability that it hits an average cube constraint. >>: [inaudible] first time we hit [inaudible]'s start with- >> Raghu Meka: The intuition actually works even afterwards because, and I cheated a little bit and you're catching me on that, it's not any color face but most of the color faces. It might not be true for, let's say, x_1, but it will be true for the majority of the color faces. >>: I don't understand the quantifiers here. You are fixing two faces, one [inaudible] face and one color face? >> Raghu Meka: Right. >>: So what's the probability that it hits- >> Raghu Meka: The discrepancy face throughout the run of the algorithm? It's roughly this guy. >>: If that's true, then with high probability you would never hit any discrepancy face. >> Raghu Meka: No, but there are many of them, right? >>: Yeah. But they're not exponentially [inaudible] trying to- >> Raghu Meka: A hundred is a constant here. You should think of it, maybe if you put a six here, then it might make sense. It's a constant, and the number of discrepancy constraints may be larger than, or comparable to, the number of color constraints. >>: But really when you say a half, you can improve that too. >>: Because once you hit one of those discrepancy faces, your random walk is no longer- >> Raghu Meka: Truly Gaussian. But what happens is that the first inequality still holds, because if you look at the projection, it's a Gaussian with smaller variance, so you're doing better. And for the other ones, it won't be true for every color constraint, but it will be true on average. I mean, that's the intuition, and when you do the calculations it comes out to be true on average. You might decrease the variance for one of the color constraints, but not for all of them simultaneously. Okay? So that's basically the proof, and you just have to write down some [inaudible] bounds and use this averaging argument to formally verify it. The key part here is that we use the symmetry of the Gaussian distribution to argue this fact, and that's where we end up beating the union bound in the analysis. Okay?
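To make the walk concrete, here is a minimal Python sketch of one discretized EDGE-WALK round as described above; the parameter defaults, step count, and stopping rule are illustrative assumptions, not the ones from the actual analysis.

import numpy as np

def edge_walk(V, lam, gamma=0.01, delta=0.1, steps=None, seed=0):
    """Sketch of one EDGE-WALK partial-coloring round, as described in the talk.

    V     : (m, n) array whose rows are the indicator vectors v_j of the sets.
    lam   : discrepancy threshold per set (think of something like 100*sqrt(n)).
    gamma : step size; delta : slack for declaring a face "nearly hit" (gamma << delta).
    Returns x in (roughly) [-1,1]^n; coordinates with |x_i| >= 1 - delta are the
    ones treated as colored, and one would recurse on the remaining fractional ones.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    x = np.zeros(n)
    if steps is None:
        steps = int(4 / gamma**2)                   # roughly 1/gamma^2 steps, as in the talk
    for _ in range(steps):
        tight_colors = np.abs(x) >= 1 - delta       # color faces that are nearly hit
        tight_discs = np.abs(V @ x) >= lam - delta  # discrepancy faces that are nearly hit
        A = np.vstack([np.eye(n)[tight_colors], V[tight_discs]])
        g = rng.standard_normal(n)                  # random Gaussian direction
        if A.shape[0] > 0:                          # walk only orthogonally to tight faces
            Q, _ = np.linalg.qr(A.T)                # orthonormal basis containing those normals
            g = g - Q @ (Q.T @ g)
        if np.linalg.norm(g) < 1e-9:                # no free direction left: near a vertex
            break
        x = x + gamma * g                           # take a tiny Gaussian step
    return x

As in the partial coloring method above, one would then fix the (nearly) integral coordinates and recurse on the fractional ones.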
So let me summarize. We call this the EDGE-WALK algorithm because you're basically trying to walk on the edges of the polytope as much as you can. It gives an algorithmic form of the partial coloring lemma: you find a good partial coloring, and then you recurse on the unfixed variables as in Spencer's original argument. You combine these two to get Spencer's theorem. >>: When you optimize [inaudible]? >> Raghu Meka: So I'm not completely sure; I did some crude estimates and wrote some MATLAB code to figure out the constant, and it came out to be 13. You can probably tweak it to do better, but as a first approximation, say 13. >>: So let me just try to understand. When you choose the amount of this slack [inaudible], you're saying we can do that in a way that justifies [inaudible]? >> Raghu Meka: On n. >>: [inaudible] estimate, I mean, somehow it might depend on [inaudible] get to [inaudible]? >>: [inaudible] Gaussian [inaudible]? >> Raghu Meka: Right. So you choose it, let's say, to be one over n squared or something. Okay? And you run this whole algorithm. And then, whatever slack you have, you just round to the nearest one or minus one. And because the slack is one over n squared and each vector has [inaudible] n, you don't hurt yourself much. Before moving on to the second part, let me just say that the algorithm I described can be run on any polytope. It is just a random walk on polytopes: you start from the origin, if the polytope contains the origin, or from any point inside the polytope, and then repeat the process as before. And here, let's look at the ideal scenario where you do the Brownian motion. It seems to induce some strange distribution on the vertices of this polytope, which kind of favors vertices which are closer to the origin in terms of the bounding hyperplanes. >>: So one thing is, it's like [inaudible]. >> Raghu Meka: It's kind of an iterative rounding scheme. >>: [inaudible]? >> Raghu Meka: Really, you [inaudible] vertex of the remaining bound and you start with the vertex of the points. >>: [multiple speakers][inaudible]. >> Raghu Meka: No. In rounding you optimize a linear function [inaudible]. That's a special case of rounding. >>: [inaudible] polytope. [inaudible]? >>: So you think of this as an integral polytope? >> Raghu Meka: Yeah. It's like you're trying to find an integral solution. So you find this partially integral solution and then you iterate on the remaining things and so on. >>: The problem is, if I have, as they say, an integral [inaudible] and I don't know what the hyperplanes are, so how can you do it? >> Raghu Meka: Say that again? >>: So if I have, let's say I start with the integral polytope which is [inaudible] of the integral points, I [inaudible] optimize [inaudible] and I don't know what I iterate on. How do I do this? [inaudible]? >>: Well, if a polytope is integral anyway- >> Raghu Meka: For any polytope, here's the rounding algorithm if you want: you have a fractional solution, you do this walk and you get some vertex, and you look at the coordinates of the vertex which are now integral. >>: [inaudible]? >> Raghu Meka: If there are none, then the rounding algorithm has failed. But that's the approach. So you find some coordinates which are integral, okay? >>: Maybe not afterward. >> Raghu Meka: Yeah. Then the algorithm fails in those cases; it's not a universal rounding algorithm. So if [inaudible], it will [inaudible].
So this is the approach for finding- >>: So it'll work if this is a polytope, if I start to have many integral coordinates- >> Raghu Meka: [inaudible]. >>: What I'm saying is, if the polytope is guaranteed to have this property to begin with, I should be able to solve this by running [inaudible] and even finding [inaudible]. >> Raghu Meka: No, no, no, no. But think of it like this: there might be a vertex which has some integral coordinates, but not all vertices have this property, and you want to find one such vertex. Just as in the discrepancy minimization problem, it was not even clear that such a vertex existed. And in fact, these vertices are much fewer than the total number of vertices; they are an exponentially small fraction of the total number of vertices, and you want to isolate these vertices. >>: [inaudible]. But on the other hand, some of the vertices [inaudible]. So maybe if you expand your definition of rounding; my sense of rounding is that you take a fraction [inaudible]. >> Raghu Meka: So you get some integral point. And if it fails, let's say it gives all zeros, which is a trivial rounding; it doesn't give anything meaningful there. >>: [inaudible]? >> Raghu Meka: It is a rounding algorithm in principle, and the applications also use it as a rounding algorithm. There is a way of rounding using it while respecting discrepancy constraints and so on. And in fact, Rothvoss uses it in this context: he finds an LP solution for the bin-packing problem and uses our algorithm to round the LP solution. >>: Do you know how exactly [inaudible]? >> Raghu Meka: Not really, because it's two or three weeks old and I haven't had time to look at it. Okay. So that was the first part of the talk; let me now talk about the second half. Here I'll just state our result and mostly talk about some applications. So what are central limit theorems? I'm sure most of us know what limit theorems are, but let me just quickly run through some examples to set up some notation. The central limit theorem says that if you have n independent [inaudible] random variables, then the sum of the variables, after a proper normalization, looks like the standard Gaussian distribution. It is one of the most basic results in probability and has many applications. Pictorially, here I have the probability density function of the sum of one random variable, two random variables, and so on, and you can clearly see the shape of the bell curve emerging. Another example is large deviation bounds in probability, which say that, specifically for sums of independent bounded random variables, the probability that the sum deviates too much from the mean is comparable to what happens in the Gaussian case. Such bounds are crucial for [inaudible] randomized algorithms, and they again fall into the same broad paradigm of limit theorems, because you're comparing the behavior of a sum of independent things to that of the Gaussian distribution. And why should we care about limit theorems in algorithms or in computer science applications? Mainly because they give us a nice way to translate discrete problems to continuous problems, where we often have many more sophisticated tools like calculus or convex geometry and so on. One nice example of this philosophy is the method of convex relaxations for [inaudible] optimization problems.
So there you look at relaxations, which are now continuous problems, and then you can do some rounding and so on. So limit theorems have a lot of applications in computer science: to voting theory or social choice theory, to learning theory, complexity theory, communication complexity, and also to pseudo-randomness. Here I'll talk about one limit theorem and its applications to questions in voting theory and learning theory. My work has led to two generalizations of the central limit theorem, to multidimensional settings and [inaudible], which are motivated by such applications to problems in learning theory. Here I'll talk about the multidimensional version. To motivate the multidimensional central limit theorem, let us look at a special case of the classical central limit theorem. It says that if you have n random plus or minus one signs, then their sum looks like the standard Gaussian distribution, which is a very, very special case of the central limit theorem. But there is a geometric way of looking at this result, which is as follows. Let's say you have the universe, the n-dimensional space, drawn in two dimensions here due to the limitations of PowerPoint; the red squares you should think of as n-bit Boolean vectors, and the blue circles as random Gaussian vectors. The central limit theorem says that if you take a hyperplane in n dimensions, which slices the space, then the fraction of Boolean points that lie on one side of the hyperplane is close to the fraction of Gaussian points that lie on the same side of the hyperplane. Given this geometric interpretation, it's natural to ask what happens if, instead of having one hyperplane, you have two hyperplanes. Is the fraction of Boolean points on the same side still close to the fraction of blue points? What if you have three hyperplanes, or, more generally, many hyperplanes, which now form a polytope? This leads to the question of a central limit theorem for polytopes. So you have a polytope in n dimensions, and we'd like to know if the Boolean volume of the polytope, which is the probability that a random plus or minus one point lies inside the polytope, is close to the Gaussian volume, which is the probability that a random Gaussian vector lies inside the polytope. We'll show a limit theorem in this context, but let me first put some conditions on the polytope before we reach the [inaudible]. We'll assume that our polytope has a polynomial number of facets, because that's the most interesting case for applications. And, most importantly, we also assume that the polytope is regular, in the sense that no variable is too influential for any of the bounding hyperplanes. Let me explain this. Geometrically, it says that none of the bounding hyperplanes of the polytope are aligned with the coordinate axis vectors. For example, a hyperplane of the form x_1 equal to something will not be regular, whereas the hyperplane given by the sum of all the x_i's is a regular hyperplane. Regular polytopes appear reasonably commonly in [inaudible] programs and so on, but most importantly, regularity is needed for such a central limit theorem: without regularity you cannot have a central limit theorem, so it's a reasonable assumption to make. And in joint work with Harsha and Klivans, we showed that if you have such a regular polytope, then the Boolean volume of the polytope is close to the Gaussian volume up to an error which is little o of one.
The little o of one is actually something like log squared of the number of facets times a quantity that goes to zero, but most importantly it is little o of one: the error goes to zero as the dimension goes to infinity. >>: Does the little o depend on how far these planes are from the coordinate axes? >> Raghu Meka: I mean, not in terms of the distance from the origin, but in terms of orientation. >>: Yeah. >> Raghu Meka: Exactly. So here's our limit theorem restated. And once again, the notable thing here is that it beats the union bound. There's a rich body of literature on multidimensional central limit theorems, but unfortunately, if you apply those results in our setting, you end up with a bound which is at least linear in the number of facets of the polytope. At its heart, or implicitly, I think this is because when you specialize those theorems to our setting, you end up bounding the error for a single hyperplane and then taking a union bound across the different hyperplanes. >>: So then why do you need the condition that it has polynomially many facets? [inaudible]? >> Raghu Meka: Say that again? >>: So why doesn't the sum of m- >> Raghu Meka: It applies more generally; the error bound actually holds for a polytope with any number of facets, but this is the easiest to write down. So if the number of facets is K, the bound has something like log squared K, so as long as K is maybe slightly sub-exponential you get something meaningful. >>: If you allow exponential you could shave off all the corners of- >>: Of course. [inaudible] exponential [inaudible]. >> Raghu Meka: Right. We can handle up to something like two to the root n, or two to some power of n. Yeah? >>: But again, it doesn't depend on the distance [inaudible]; how does it capture the large deviations case as well? >> Raghu Meka: No. I think it's too imprecise for large deviation bounds, because- >>: It's too small? >> Raghu Meka: It's not capturing the [inaudible] tails. Even from the usual central limit theorem, you can't get large deviation bounds. And our bounds are actually nearly optimal; you get comparable lower bounds as well. So you get something like log squared K, and the lower bound is square root of log K. Before going on to the applications, I won't have time to delve too much into the proof, but let me say a few words. The first thing I want to say is: what makes us think this is possible, that you can beat the union bound in this context? If you look at the literature in probability, one of the main techniques for proving limit theorems are these radiational [phonetic] methods, and the core of them is that if you want to bound the error, you need to bound the probability of regions which are close to the boundary. So when you do the analysis, if the things which are already close to the boundary on either side have too much volume, then you'll be in trouble. For the case of polytopes, let's say you start with your polytope and look at all points which are at distance at most epsilon from the boundary of the polytope. You get K strips, like the ones shown here, and there's a beautiful result due to Nazarov which shows that the Gaussian volume of these blue regions is actually square root of log K times epsilon. If you used the union bound, you would get a bound of K times epsilon, because each strip has volume about epsilon and there are K of these strips. So this is one of the places where you beat the union bound, and knowing this result prompted us to think that such results are possible and that one could use it.
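Restated in symbols, the theorem and the Nazarov bound read roughly as follows (a sketch of the statements as described in the talk; the exact exponents are not the point here):

% Central limit theorem for polytopes: let P = \{x : \langle a_j, x\rangle \le b_j,\ 1 \le j \le K\}
% be regular, i.e. no coordinate is too influential: |a_{j,i}| \le \delta \|a_j\|_2 for all i, j. Then
\Big| \Pr_{u \in \{-1,1\}^{n}}[u \in P] - \Pr_{g \sim N(0, I_n)}[g \in P] \Big| \le \mathrm{polylog}(K)\cdot o(1),
% where the o(1) term goes to zero with the regularity parameter \delta.
% Nazarov's bound on the Gaussian mass near the boundary of P:
\Pr_{g \sim N(0, I_n)}\big[\mathrm{dist}(g, \partial P) \le \varepsilon\big] = O\big(\sqrt{\log K}\big)\cdot\varepsilon,
% whereas the naive union bound over the K facets only gives O(K)\cdot\varepsilon.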
>>: So what is K? >> Raghu Meka: K is the number of facets. >>: [inaudible]? >> Raghu Meka: This is in n dimensions, but the figure is a PowerPoint picture. And finally, the actual proof of the result uses this; this is one place where we beat the union bound. We also use some other nontrivial results from convex geometry, but what I find surprising is that the proof actually uses techniques that were developed in the context of designing pseudorandom generators. The result has nothing to do with pseudo-randomness or designing PRGs, but the proof was at least morally inspired by those techniques. You can de-randomize it, but we probably would not have found the proof without coming across this, at least [inaudible]. So let me now talk about the applications of the limit theorem, and you'll see how to use the limit theorem to beat the union bound in various cases. The first application I want to talk about is noise sensitivity. Let's say you have some election where people cast their votes and you decide the outcome of the election based on the majority. What we are interested in here is: what if there are errors in registering the votes? So you have these votes again, and there are errors, and what we'd like to know is whether the errors cause the outcome of the election to change. This can be captured by a nice concept from the analysis of Boolean functions called noise sensitivity. You have some Boolean function f and some noise parameter epsilon, and the noise sensitivity of f is the probability that the function evaluated at a random point changes its value when you perturb each coordinate of the random point with probability epsilon. Okay? So coming back to the voting scenario, the election scheme here is described by the function f, and the noise sensitivity is the probability that, if you flip each vote with some error probability epsilon, the outcome of the election changes. That is noise sensitivity. >>: X is also random? >> Raghu Meka: X is also random. So it has many applications in the analysis of Boolean functions, starting with the work of Kahn, Kalai, and Linial, who implicitly used it to show some Fourier concentration properties; Hastad in 1997, who used it to show some optimal inapproximability results; and Benjamini, Kalai, and Schramm, who formally defined it and also used it for some questions in percolation and so on. Here, I want to talk about the noise sensitivity of majorities or, more generally, weighted majorities. Peres showed that the noise sensitivity of any weighted majority with noise rate epsilon is at most two times the square root of epsilon. And as you've been asking before, when you have a question about one majority, it's natural to ask what happens if you have two majorities, or three or four or several majorities. This leads to the problem of computing, or bounding, the noise sensitivity of polytopes. So you have some polytope in n dimensions, and the noise sensitivity of the polytope geometrically means the following: you generate a random point and ask for the probability that, when you perturb this point, it ends up crossing the boundary. Right? You take a random point, perturb it, and we want to know how likely it is to cross the boundary of the polytope. And using our limit theorem, we can show that the noise sensitivity of any regular polytope is at most polylogarithmic in K times a factor which is polynomial in the noise rate epsilon.
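As a quick sanity check of the definition, and of Peres's 2*sqrt(epsilon) bound for a single majority mentioned above, here is a small Monte Carlo sketch in Python; the sample size and parameter values are arbitrary illustrative choices.

import numpy as np

def noise_sensitivity(f, n, eps, trials=100_000, seed=0):
    # Estimate NS_eps(f) = Pr[f(x) != f(y)], where x is uniform on {-1,1}^n and y
    # flips each coordinate of x independently with probability eps.
    rng = np.random.default_rng(seed)
    x = rng.choice([-1, 1], size=(trials, n))
    flips = rng.random((trials, n)) < eps
    y = np.where(flips, -x, x)
    return np.mean(f(x) != f(y))

majority = lambda z: np.sign(z.sum(axis=1))   # n odd, so there are no ties

n, eps = 101, 0.01
print(noise_sensitivity(majority, n, eps))    # compare with Peres's bound 2*sqrt(eps) = 0.2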
>>: So there's [inaudible]? >> Raghu Meka: Yeah, Boolean points. And once again, this beats the union bound. The best previous result basically said that you bound the noise sensitivity of a single hyperplane and [inaudible] things, and the noise sensitivity can increase by at most a factor of K. >>: So K is the number of facets? >> Raghu Meka: K is the number of facets. And how does the proof go? At a high level, the proof basically says the following: you want to compute the Boolean noise sensitivity; we use our limit theorem to translate the Boolean problem to the Gaussian world, where the Gaussian analog of noise sensitivity is the Gaussian surface area of a polytope. And there, there's a beautiful result of Nazarov, which I briefly mentioned before: the Gaussian surface area of a polytope with K facets is square root of log K. So here it's important that our limit theorem itself has a very good dependence on the number of facets, because if you lose too much in the limit theorem, even if you have very good bounds in the Gaussian world, you don't get anything. So we have to beat the union bound in both cases. One application of the noise sensitivity bound is to learning intersections of halfspaces; let me skip this application for now. Let me just summarize by saying that in both of these cases we beat the union bound by using some geometric techniques, especially some symmetry properties of the Gaussian distribution, and another example along the same lines is my work on computing the [inaudible] of Gaussian processes. Let me briefly mention some of my other interests and then I'll conclude. One of my major focuses has been pseudo-randomness and constructing pseudorandom generators, in particular constructing generators for geometric shapes like halfspaces, polynomial threshold functions, and polytopes. Here, the pseudorandom generators are naturally motivated by questions in complexity theory, but they also have applications in algorithms, especially through dimension reduction, counting solutions to [inaudible] programs, and so on; to bounded-space algorithms, where they're helpful for making some streaming algorithms efficient, simulating random walks efficiently, and so on; and to counting algorithms. So that is one [inaudible]. Another is hardness of approximation, where the focus of my work has been on the Unique Games Conjecture, which is probably the most important open question in this area, and which, if true, would solve many other important open problems, like the complexity of Max-Cut, Vertex-Cover, Sparsest-Cut, and so on. Here my work gives the best evidence in support of the conjecture, and also suggests a possible approach to proving the conjecture by using alphabet reduction; this is one of the things that I thought about in the past and will probably think about in the future. In learning theory, my work has focused on learning halfspaces and polynomials, and more recently on adapting the smoothed complexity framework from algorithms to make certain seemingly intractable problems in learning theory tractable. I've also done work in data mining, where the focus has been mainly on matrix rank minimization problems. So you have a bunch of linear constraints on matrices and you want to find the minimal rank matrix which satisfies these constraints.
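In symbols, the rank minimization problem just described, together with its standard nuclear-norm relaxation (the relaxation is quoted here as the common approach in this literature, not necessarily as the speaker's specific algorithm):

\min_{X \in \mathbb{R}^{p \times q}} \ \mathrm{rank}(X) \quad \text{subject to} \quad \langle A_i, X\rangle = b_i, \ \ i = 1, \dots, k,
% which is typically relaxed to the convex problem of minimizing the nuclear norm
% (the sum of the singular values of X):
\min_{X \in \mathbb{R}^{p \times q}} \ \|X\|_{*} \quad \text{subject to} \quad \langle A_i, X\rangle = b_i, \ \ i = 1, \dots, k.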
This has many applications, such as the Netflix challenge and recommendation systems, computing metric embeddings, kernel learning, and so on. And here the work is more [inaudible]: the focus has been on giving simple and practical algorithms which scale well for high-dimensional data and also have some nice rigorous guarantees. So in summary, my work has basically been around using the structure of randomness, in the form of limit theorems, and the geometry of randomness for various questions in algorithms, complexity theory, pseudo-randomness, and learning theory, and vice versa. It is these connections, I think, that form the core of my research, and I'd like to explore them further in the future. Thank you. >>: Further questions? >>: Do you have, what's the lower bound [inaudible] polytopes? >> Raghu Meka: For polytopes you need square root of log K. >>: Times? >> Raghu Meka: Times epsilon. Sorry, square root of epsilon, up to some constants. If you have a polytope with K facets, it seems like the- >>: [inaudible]? >> Raghu Meka: Epsilon to the half, and square root of log K. >>: [inaudible]? >> Raghu Meka: We have log squared, and also the regularity assumption. >>: [inaudible]? >> Raghu Meka: So I think it might be easier to push the K down than to push the epsilon down, because getting square root of epsilon with our techniques might be much harder: we use what's called the Lindeberg method, and if you use that method even for the central limit theorem, you end up getting a bound like epsilon to the one fifth. So that might be harder than the K. >>: For small K there's nothing better than this? [inaudible]? >> Raghu Meka: Right. So you have our result, and it beats the union bound, but it has a regularity assumption. >>: But with the union bound, only the K [inaudible]? >> Raghu Meka: Right. Yeah. Thank you.