
>> David Wilson: We are happy to have the third talk of the day. Ronen Eldan will tell us about
the Gaussian noise stability deficit.
>> Ronen Eldan: Thank you. So far I have really enjoyed the talks in this seminar. We are talking about the Gaussian noise stability deficit, so let's try to understand what Gaussian noise stability means. Our starting point is actually the Gaussian isoperimetric inequality; let's see what that is. The setting in this whole talk is just R^n equipped with the standard Gaussian measure; this is its density. The Gaussian surface area of a subset of R^n is defined as the integral, over the boundary of the set, of the Gaussian density with respect to the (n-1)-dimensional Hausdorff measure. This is roughly the first-order rate at which the Gaussian measure increases when we take an epsilon-extension of the set. Now, the Gaussian isoperimetric inequality, proved initially by Borell and by Sudakov-Tsirelson somewhere in the '70s, says that the isoperimetric minimizers are half-spaces. In other words, out of all sets whose measure is some prescribed number, the set which minimizes the surface area is a half-space.
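In symbols, roughly (using the standard notation φ and Φ for the one-dimensional Gaussian density and distribution function, which is my notation rather than the slides'):

\[
\gamma_n(A)=\int_A \frac{e^{-|x|^2/2}}{(2\pi)^{n/2}}\,dx,\qquad
\gamma_n^+(\partial A)=\int_{\partial A} \frac{e^{-|x|^2/2}}{(2\pi)^{n/2}}\,d\mathcal{H}^{n-1}(x),
\]

and the Gaussian isoperimetric inequality reads

\[
\gamma_n^+(\partial A)\ \ge\ \varphi\bigl(\Phi^{-1}(\gamma_n(A))\bigr),
\]

with equality for half-spaces.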
What we'll consider is an extension of this isoperimetric inequality: an inequality concerning noise stability. So let's first try to understand what we mean by Gaussian noise. We say that x and y are jointly standard Gaussian with some correlation rho, a parameter between 0 and 1. One way to define this is that the coordinates of x and y are all normal variables, with the covariance matrix such that each vector is a standard Gaussian and the corresponding coordinates, say x_1 and y_1, have correlation rho between them, separately for each coordinate. Another, equivalent, way to define it is the following. We take three independent standard Gaussians, and we say that x and y have the common component sqrt(rho) times Z_1; then we add an independent component to each, so to x we add this and to y we add an independent copy of it. We can think of the common part as the actual thing we want to measure and of the rest as the noise: x and y are the same thing, and when rho is close to one there is some small noise which is distinct between x and y. We define the noise stability of a subset A of R^n as just the probability that both x and y are in A. Maybe it would have been natural to divide this by the product of the probabilities that x and y are in A. In some sense it measures how stable the set is to noise: given that x was already in A, and y is some noisy version of x, how likely is it that y is also in A? So that's the Gaussian noise stability.
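In symbols, the definition is (writing S_ρ for the noise stability):

\[
X=\sqrt{\rho}\,Z_1+\sqrt{1-\rho}\,Z_2,\qquad
Y=\sqrt{\rho}\,Z_1+\sqrt{1-\rho}\,Z_3,\qquad
S_\rho(A)=\mathbb{P}\bigl(X\in A,\ Y\in A\bigr),
\]

where Z_1, Z_2, Z_3 are independent standard Gaussians in R^n.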
Now, a theorem of Christer Borell from the mid-'80s says that half-spaces are not only isoperimetric minimizers; they also maximize the noise stability. So among all sets with a given Gaussian measure, if I want to maximize the probability that x and y are both in A, I want to take my set to be a half-space.
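Just to make the statement concrete, here is a minimal Monte Carlo sketch of my own (an illustration only; the function names, the dimension, the competing set and the parameter values are all arbitrary assumptions, not anything from the talk):

import numpy as np

rng = np.random.default_rng(0)

def noise_stability(indicator, rho, n=2, samples=200_000):
    # Monte Carlo estimate of S_rho(A) = P(X in A, Y in A) for
    # rho-correlated standard Gaussian vectors X, Y in R^n.
    z1 = rng.standard_normal((samples, n))
    z2 = rng.standard_normal((samples, n))
    z3 = rng.standard_normal((samples, n))
    x = np.sqrt(rho) * z1 + np.sqrt(1 - rho) * z2
    y = np.sqrt(rho) * z1 + np.sqrt(1 - rho) * z3
    return np.mean(indicator(x) & indicator(y))

# Two sets of Gaussian measure 1/2 in the plane: a coordinate
# half-space, and a centered disc whose squared radius is the
# median of |X|^2 for X ~ N(0, I_2), namely 2*log(2).
half_space = lambda p: p[:, 0] <= 0.0
disc = lambda p: np.sum(p * p, axis=1) <= 2.0 * np.log(2.0)

print(noise_stability(half_space, rho=0.5))   # about 1/4 + arcsin(0.5)/(2*pi) = 0.333
print(noise_stability(disc, rho=0.5))         # strictly smaller, as Borell predicts

For the half-space there is even a closed form, Sheppard's classical formula: S_ρ({x_1 ≤ 0}) = 1/4 + arcsin(ρ)/(2π).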
This is then an extension of the isoperimetric inequality, and it's not hard to see why when rho is very close to one. When the noise is very small, the probability that x will be in A and y will be in the complement of A is more or less proportional to the surface area, because x and y have to be close to each other; it's just a calculation that the first-order change of the stability with respect to rho is proportional to the surface area. So this extends the isoperimetric inequality.
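A sketch of that first-order relation, for nice sets (the rate is the part I'm confident of; I won't vouch for the exact constant c):

\[
\gamma_n(A)-S_\rho(A)\ =\ \mathbb{P}\bigl(X\in A,\ Y\notin A\bigr)\ \sim\ c\,\sqrt{1-\rho}\;\gamma_n^+(\partial A)\qquad\text{as }\rho\to1,
\]

so for rho near one, maximizing the noise stability forces the surface area to be nearly minimal.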
Okay. This result has many applications; it kind of connects many areas of mathematics. It's relevant in approximation theory, in rearrangement inequalities, in concentration in high dimension, and in related inequalities.
I just want to mention one discrete application of this, to the so-called Majority is Stablest theorem. This is due to Mossel, O'Donnell and Oleszkiewicz, and it is a kind of discrete version of the same thing, which states the following. We have a function defined on the discrete cube, and we can think about this function as an election system: it takes the votes of n different people, and the outcome is just 0 or 1, say who won the election. We think of the point of the cube as uniformly random, and then we can consider a noise: we can imagine, for example, that the people counting the votes sometimes make mistakes, so for each vote being counted there is a probability epsilon that they regenerate the vote randomly. Now let's say that we want to maximize the noise stability, so we don't want these errors to affect the final outcome. What the theorem says, roughly, is that the most stable choice, under a condition of low influences, which roughly means that no single voter has a big effect on the outcome (I don't want to define this precisely), is just the majority function: we sum up all the votes and check whether the sum is bigger than some threshold.
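To give a sense of the numbers, an aside of mine: if each vote is independently rerandomized with probability ε, the correlation between a vote and its noisy copy is 1-ε, and by the central limit theorem together with Sheppard's formula the probability that the majority outcome survives the noise tends, as the number of voters grows, to

\[
\mathbb{P}\bigl(\mathrm{Maj}(x)=\mathrm{Maj}(y)\bigr)\ \longrightarrow\ 1-\frac{\arccos(1-\varepsilon)}{\pi},
\]

and the theorem says that no low-influence election system does asymptotically better.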
The best real-life application of Borell's theorem that I could come up with is the following. Say we are collecting street cats: we are wandering the streets and we see street cats which have different properties, like their size and height and how loudly they meow. These are all real-valued properties which, I guess, we could expect to have a Gaussian distribution. Let's say that our goal is not to break up families of cats: if two cats are siblings, we want a high correlation between the events that I collect each of them, so I want to maximize the expected number of collected cats that are family members. Say our space has two parameters: the weight of the cat, is it light or heavy, and the complex argument of the cat, is it an imaginary cat or a real cat. I'm collecting more and more cats, and in the end I want to decide on my criterion for keeping a cat or not. It turns out that I want to choose a criterion given by some half-space like this. If the properties are uncorrelated, it's easy to see that it would be a coordinate half-space; otherwise I'd probably have to do some PCA, and it will be some other half-space.
All right. So we know that half-spaces are the most stable sets; now we can ask ourselves, is this fact robust? Namely, if we know that a set is almost as stable as its corresponding half-space, the half-space which has the same measure, does this set in some sense look like a half-space? We could ask the same thing about the isoperimetric question: if the surface area of a set is almost like that of the corresponding half-space, does the set in some sense look like a half-space? More formally, we would like to say something like this: given that the deficit between the noise stability of the set A and the noise stability of the half-space with the same measure is small, is it true that the distance between the two sets is small with respect to some metric? And what metrics would one consider? A natural one is the total variation distance, so just the measure of the symmetric difference; another one is the Wasserstein distance between the restrictions of the Gaussian measure to the two sets.
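Let me fix a shorthand for the deficit (my notation):

\[
\delta(A)\ :=\ S_\rho(H)-S_\rho(A),\qquad\text{where } H \text{ is a half-space with } \gamma_n(H)=\gamma_n(A).
\]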
The first result we have in this direction is by Mossel and Neeman, and it says the following. We define Delta of A to be the minimum, among all half-spaces whose measure equals the measure of A, of the Gaussian measure of the symmetric difference between A and the half-space; so this is a kind of total variation distance between A and the family of admissible half-spaces. The result says that this quantity can be controlled by the deficit: if the deficit is very, very small then in some sense the set is close to a half-space, and this holds up to a constant which depends only on the measure of my set and on the parameter rho. In particular, it implies that we can only have equality if our set is a half-space up to a measure-zero change.
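Schematically, and hedging on the precise form (I'm reading the shape off the slide rather than quoting the paper), the Mossel-Neeman bound looks like

\[
\Delta(A)\ :=\ \min_{\gamma_n(H)=\gamma_n(A)}\ \gamma_n\bigl(A\,\triangle\,H\bigr),\qquad
\delta(A)\ \ge\ c\bigl(\rho,\gamma_n(A)\bigr)\,\Delta(A)^4 .
\]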
Okay. So this robust inequality admits numerous applications, basically almost wherever Borell's theorem is used. In particular, for the Majority is Stablest theorem we also get a robust version: the majority function is essentially the only function which maximizes the stability. This could also be seen as a quantitative version of Arrow's theorem, for those of you who know what that is: it implies that, under some low-influence assumptions, the only way to minimize the probability of a non-rational outcome in an election is to take the majority function. By taking rho to one, in some cases we also get a robust isoperimetric inequality; I don't want to give details about this. The conjecture is that this exponent 4 over here can be replaced by 2, and I just want to mention that there is a slightly older robust result for the isoperimetric inequality, by Cianchi, Fusco, Maggi and Pratelli, from 2011.
try to understand. Maybe there's a better way to capture the distance between a and h. I want
to try to convince you that at least when we're talking about noise stability this metric might
miss something and to do that I want to construct a very simple example. The example looks
like this. We are going to construct two sets which are slight perturbations of just the measure
one half space on the line, so let's consider the real line. And let's say that this is 0, so the
measure of all of this is one half and I want to take here an interval of measure epsilon and call
it i2 and I'll call this thing an i1 so the half space is just i1 and i2. Now I want to take this interval
and just move it slightly to the right and call it i3. And the set h would just be these two things.
That's the original half space and a perturbation of h which I call a which will be just i1 and i3
instead of i2. But now I want to consider another perturbation. Instead of taking i3 to be here,
I take this epsilon mass and move it a constant distance, so I put it here. So let's say that this
point is the inverse Gaussian cumulative distribution function of 3 over 4, so this is one half and
I put it in 3 over 4 and let's call that i4 and the set b will be just i1 and i4. Now it's pretty clear
that the distance, well the total variation distance between both of these sets and the half
space is just epsilon so Delta over these sets is the same. But on the other hand, let's try to
But on the other hand, let's try to understand what the noise stabilities of A and B are. Say A is the blue set and B is the black set. Okay. To know what the noise stability of A is, I have to consider the probability that both x and y are in A. That's the probability that both x and y are in I_1, plus the probability that x is in I_3 times the probability that y is in I_1 conditioned on x being in I_3; I have a factor of two here, because I can also exchange x and y, and I have an O(epsilon squared) term, which is the probability that both x and y are in the small interval. Now I have exactly the same thing for B, so if I want to compare the deficits of these two sets against the stability of H, this suggests looking at the difference between these two conditional terms. Now it's not so hard to realize that, given that x is in I_3, the probability that y, the noisy version of x, is in I_1 is not so different from the same probability given that x was in I_2; I didn't move I_2 very far to get I_3, and if you calculate this you will see that the difference is actually of order epsilon. On the other hand, if rho is not very close to 0, it's also easy to see that conditioning on x being in I_4, over here, diminishes the probability that y will be in I_1 by a lot, well, at least by some constant factor. If I plug these two facts into the previous formulas, what I get is that the stability of A is the stability of H minus something of order epsilon squared, while the stability of B is much smaller: because I moved the interval over here, the deficit is of order epsilon. Well, this suggests that this metric doesn't capture what's going on so well: I want to capture not only how much mass I moved, but how far I moved it.
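So, schematically, the example shows (with constants depending on rho):

\[
\Delta(A)\approx\Delta(B)\approx\varepsilon,\qquad
S_\rho(H)-S_\rho(A)=O(\varepsilon^2),\qquad
S_\rho(H)-S_\rho(B)=\Theta(\varepsilon),
\]

so the symmetric-difference metric cannot tell apart two sets whose deficits differ by an order of magnitude.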
So this gets us to the main theorem I want to introduce, and it's the following. Let's try to define a different metric. Namely, what we do is this: we take our set A, we look at all possible half-spaces whose measure is the same as that of A, and we measure the distance between the centroid of H and the centroid of A. It's pretty clear, so if this is the origin and A is somewhere here, H would probably look like this; it's not hard to see that this measures how far I moved the mass and not only how much mass I moved, and I guess the previous example could convince you that this metric is somewhat more natural.
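Concretely, a sketch of the definition as I would write it (up to the normalization convention for the centroids, which are taken with respect to the Gaussian measure restricted to the sets):

\[
\varepsilon(A)\ :=\ \min_{\gamma_n(H)=\gamma_n(A)}\ \bigl|\,b(A)-b(H)\,\bigr|,\qquad
b(S)\ :=\ \frac{1}{\gamma_n(S)}\int_S x\,d\gamma_n(x).
\]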
What we get with this metric is the following: again with a constant that depends on the measure of A and on rho, the deficit can actually be bounded from both sides by the same quantity, up to some logarithmic factor. In some sense, if we only care about knowing the deficit up to constants, this is actually enough: we don't have to calculate the noise stability of the set, we just have to calculate this quantity, which, I'm sure you'll agree with me, is simpler to calculate. It's basically a one-dimensional thing; it depends only on the marginal of A in a certain direction, right?
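Schematically, the main theorem then has the shape (I am suppressing the constants, which depend only on γ_n(A) and ρ, and I won't vouch for which side carries the logarithm):

\[
\delta(A)\ \asymp\ \varepsilon(A)\qquad\text{up to a factor of order }\log\bigl(1/\varepsilon(A)\bigr).
\]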
Now, this theorem has a few corollaries. First of all, the conjecture I mentioned is verified, since Delta squared is controlled by this metric epsilon. It also gives an improved robust Gaussian isoperimetric inequality, because by taking rho to one it turns out that you can also get the limiting case. And here is another example of what you can get from this inequality: if you know that a set has a pretty good surface area, then when rho is close to one this deficit will be small, which implies that epsilon is rather small; now use the inequality again with a larger value of rho, plug in your estimate of epsilon, and this gives you an estimate on the noise stability in terms of the surface area. So somehow we know that the noise stability cannot get much worse as we increase rho, by using this two-sided bound. Any questions so far? Because at this point I think I'll move to some ideas from the proof.
>>: I don't understand why the Delta squared is less than the [indiscernible]
>> Ronen Eldan: Okay. Well, I haven't explained why, but basically the extremal example in this case, and it's not so hard to prove it, is the set A defined here: if you take the mass and move it only a very short distance, you can see that for A, Delta squared is of the order of epsilon, and it's not hard to see that this is the worst case. Just project onto one dimension and play with it a bit. It's a very easy fact, but maybe not immediate. Glad to help.
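In symbols, the claim in question is

\[
\Delta(A)^2\ \le\ C\bigl(\rho,\gamma_n(A)\bigr)\,\varepsilon(A):
\]

moving mass Δ(A) by a distance of order Δ(A) shifts the centroid by about Δ(A)², and that is the worst case.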
Let's talk about some ideas from the proofs. What I'll do, mainly, is prove Borell's result: this is a novel proof of Borell's result based on stochastic calculus, and in this proof we will see how the centroid of the set comes up. I'm not going to really prove the robustness statement, but hopefully I'll give an idea of how to do it. All right. We are interested in this quantity, the stability of A, which is just the probability that x and y are in A. If we plug in the definition of x and y, it's the probability that sqrt(rho) Z_1 + sqrt(1-rho) Z_2 is in A and the same for y, with Z_3 in place of Z_2, where Z_1, Z_2 and Z_3, I remind you, are independent standard Gaussians.
we can do is definitely we can take expectation over z1 and inside the expectation we can
condition on z1. We did nothing here. And when we condition on z1 it's clear that this guy and
this guy will be independent. We can instead of just checking that they are both in a, we will
just check that the first one is in a and take the square of the probability. At this point what we
do is the following. Let w be just the standard twin or a process or a Brownian motion. It's
clear that w time rho, the joint distribution of w time rho and time 1 is the joint distribution of
these two guys. What I can do is I can replace all of this expression by w1 and instead of
conditioning on z1 I'll just condition on whatever happens until time rho, so what we get is the
stability is just a probability that a Brownian motion at time 1 is an a conditioned on the
filtration at time rho squared. Until now we didn't really do anything. This encourages me to
take this probability to look at the dual martingale, the probability that w1 is in the a
conditioned on ft, this then I give it a name. Let's call it mt so we are actually interested in the
expectation of m rho squared. Since mt is a martingale by definition. It's a dual martingale, this
Since m_t is a martingale by definition, a Doob martingale, this expectation is, by Ito's formula, just m_0 squared plus the expectation of the quadratic variation of the martingale between time 0 and time rho. So all we are interested in is how much this martingale really varies, and in order to know what the quadratic variation is, we want to calculate the Ito differential.
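In formulas, the reduction so far is (a sketch in my notation):

\[
m_t\ =\ \mathbb{P}\bigl(W_1\in A\ \big|\ \mathcal{F}_t\bigr),\qquad
S_\rho(A)\ =\ \mathbb{E}\bigl[m_\rho^2\bigr]\ =\ \gamma_n(A)^2\ +\ \mathbb{E}\,\langle m\rangle_\rho,
\]

so comparing stabilities of sets of equal measure amounts to comparing expected quadratic variations.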
To do this, note that m_t is just this probability, and this probability is the integral over A of some measure, the law of W_1 conditioned on W_t. Now, W_1 conditioned on W_t is just a Gaussian centered at W_t: we already used up t of our time interval [0,1], which leaves us 1-t seconds to go, so it's a Gaussian whose variance is 1-t. And m_t is just the integral of this density f_t over our set A. So now we have a process of measures f_t which begins at the standard Gaussian; the center of the Gaussian moves according to a Brownian motion while the Gaussian shrinks, the variance shrinks, and at time 1 we end up with some delta measure. We want d of m_t, which encourages us to calculate d of f_t.
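Explicitly:

\[
f_t(x)\ =\ \bigl(2\pi(1-t)\bigr)^{-n/2}\exp\Bigl(-\frac{|x-W_t|^2}{2(1-t)}\Bigr),\qquad
m_t\ =\ \int_A f_t(x)\,dx .
\]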
Well, we have a formula for f_t, so we can just use Ito's formula to calculate the differential, and it turns out we get the following thing. I don't want to bother you with the actual calculation, but I do want to give you some intuition about what we get, which is pretty simple. In each infinitesimal time step, the measure f_t gets multiplied by a linear function which is equal to 0 at the center, at x = W_t, and which has a random gradient.
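Concretely, this is just Ito's formula applied to the Gaussian density above; a sketch:

\[
df_t(x)\ =\ f_t(x)\,\Bigl\langle\frac{x-W_t}{1-t},\ dW_t\Bigr\rangle,
\]

so in each infinitesimal step f_t is multiplied by the random linear factor 1 + ⟨(x-W_t)/(1-t), dW_t⟩, which indeed vanishes at the center x = W_t.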
linear functions with random slopes, I mean randomly distributed directions and this kind of
makes sense because if we think about it in one dimension, we multiplied by many functions
which look like this, one plus epsilon x and many functions which look like one minus epsilon x.
We have many cancellations. Each cancellation looks like one minus epsilon square x square
and if we take this to some high power we get something like e to the minus, some constant x
square which is a Gaussian density. But not all of them cancel. Some of them, in the end we
still are left with some terms which don't cancel out and this gives us an exponential which
actually moves the center of the Gaussian. So this is, I mean this is a very simple fact but it
turns out to be very useful and the reason it's useful is the following. If we want to know what
If we want to know what d m_t is, we just integrate d f_t over A, and this is a linear function; if we integrate a linear function over the set A, all we care about is where the centroid of A is located. If the centroid of A is far from the origin, this will change the mass of A a lot, and if it's at the origin, multiplying by a linear function will do nothing. So the center of mass actually appears here. But the center of mass with respect to what? With respect to some random measure f_t. But it's not so hard to change variables: f_t is some Gaussian, and of course I can make it the standard Gaussian by moving the center and dividing by the standard deviation. If we do this, we get the actual Gaussian center of mass of a set, but the set is not exactly the set A; it's the set A which I moved a bit and dilated a bit. I remind you that we are interested in the quadratic variation of this process, and it will be big if those centroid vectors are big; at any given time I'm taking this vector and pairing it with an infinitesimal Gaussian increment. We finally get that the quadratic variation differential is just the norm squared of the Gaussian center of mass of some translate of my original set A. If we use the same change of variables, we actually find that the measure of the set with respect to which I am integrating is just my martingale m_t: at each point in time I have moved my set A so that its Gaussian measure is exactly m_t, and the quadratic variation is just how far its center of mass is from the origin.
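In symbols, a sketch (before the recentering just described):

\[
dm_t\ =\ \bigl\langle v_t,\ dW_t\bigr\rangle,\qquad
v_t\ =\ \int_A f_t(x)\,\frac{x-W_t}{1-t}\,dx,\qquad
\frac{d\langle m\rangle_t}{dt}\ =\ |v_t|^2,
\]

and after moving the center and rescaling, v_t becomes (up to a time-dependent factor) the Gaussian barycenter of a shifted, dilated copy A_t of A with γ_n(A_t) = m_t.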
I have five more minutes, I think; we started 5 minutes late. All right. So what we want to do now is compare the quadratic variation of this process for A, which was an arbitrary set, with the quadratic variation of the same process for a half-space whose measure is equal to the measure of A. So let's take a half-space H which satisfies this and define exactly the same process; let's call it n_t instead of m_t, and I want to see what the quadratic variation of n_t is. Here we make the simple observation that if we start from a half-space, the set will always remain a half-space: if we translate a half-space and dilate it, it remains a half-space. The expression analogous to the one before would be the same thing, but in the case of a half-space it depends only on the value of the martingale itself: we have the martingale n_t, and the quadratic variation of n_t is just some function of n_t. What is this function? We take a half-space whose measure is n_t, we look at its centroid, and we measure how far it is from the origin.
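And that centroid distance is explicit; for a half-space of Gaussian measure m it is a one-dimensional computation:

\[
\Bigl|\int_{\{x_1\le\Phi^{-1}(m)\}} x\,d\gamma_n(x)\Bigr|\ =\ \varphi\bigl(\Phi^{-1}(m)\bigr),
\]

so the driving function depends on the half-space only through its measure n_t, up to the same time-dependent rescaling as before.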
And now we just observe one very simple fact, and the fact is this: if I have two sets with the same measure, the set A and a half-space H whose measure is the same as that of A, then, say the origin is somewhere here, the centroid of H will always be farther away from the origin than the centroid of A, because to get from A to H I have to take this mass and put it here, and it's just a monotone one-dimensional thing. This is a pretty obvious fact, and it is actually the only point in the proof where we have an inequality. Using it, we see that whenever m_t and n_t are equal, this quantity must be bigger than that one, so the instantaneous quadratic variation of m_t is always smaller than the same thing we get for n_t.
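In symbols, the only inequality in the proof is:

\[
\gamma_n(A)=\gamma_n(H)\quad\Longrightarrow\quad
\Bigl|\int_A x\,d\gamma_n(x)\Bigr|\ \le\ \Bigl|\int_H x\,d\gamma_n(x)\Bigr|,
\]

which one sees by projecting onto the normal direction of H and moving mass monotonically in one dimension.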
So we have two diffusion processes, and we know that whenever they are equal, one of them moves faster than the other. That doesn't immediately tell us that the quadratic variation of n_t will be bigger than that of m_t; we still have some work to do. What we can do is couple m_t and n_t: up to a time change they are both Brownian motions, so let's make them live on the same probability space by declaring them to be the same Brownian motion. The inequality then just means that, at a given time of the underlying Brownian motion, the inner clock of m_t moves slower than the inner clock of n_t, and with this coupling it is easy to see that the quadratic variation of m_t will be dominated by that of n_t, which finishes the proof once we take expectations. In fact this gives us something stronger: we have a stochastic domination between these objects, which gives us information about higher moments, and that in itself has some more applications. I just have to mention that at least the integer-moment inequalities were already known, from a paper by Mossel and O'Donnell, but okay, this is a new proof of them.
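One clean way to phrase the coupling, a sketch via the Dambis-Dubins-Schwarz time change (my phrasing):

\[
m_t\ =\ B_{\langle m\rangle_t},\qquad n_t\ =\ B_{\langle n\rangle_t}
\]

for one and the same Brownian motion B; the speed comparison says that the clock ⟨m⟩ runs more slowly than the clock ⟨n⟩ whenever the two processes read the same value, so under this coupling ⟨m⟩_ρ is dominated by ⟨n⟩_ρ, and taking expectations finishes the proof.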
So this just gives us the inequality; let me now, in one minute, try to give you some brief ideas about how to prove the robustness. To do this we have to ask: okay, we know that the process n_t is ahead of m_t, but by how much? We know that whenever those centroid distances are quite different, n_t accumulates quadratic variation and m_t becomes lagged behind n_t. But this metric epsilon only tells us something about this difference at time 0, and we want to say that, given that it's large at time 0, it kind of remains large for quite some time. To do that we take, roughly, the second derivative of what's going on: we take the Ito differential of this process epsilon_t, which turns out to be dictated by the behavior of some random matrix related to the process, and we can analyze this random matrix. Well, it's a kind of stochastic random matrix; we can analyze it with some spectral tools, and the [indiscernible] transportation-entropy inequality is kind of the central tool in the analysis.
the half a minute I have left I just want to advertise that okay. This kind of stochastic equation
we, it was pretty simple for us to derive. We just took a very natural process and differentiated
it, but we can actually, given some initial measure mu we can actually define a new process
using these stochastic measures, so if the initial measure is not a Gaussian but something else,
we can still somehow follow to the same method and give, and get some kind of a stochastic
evolution on the space of measures and, for example, if we start with the uniform measure on
the discrete cube but embedded in RN this gives new direct proof of the majority stablest
theorem with a slightly stronger version of the conditions we need. This is joint work with E.
Mossel. The conditions we need our slightly weaker and it turns out these equations turn out
to be a pretty useful tool in high dimensional convex geometry. Yeah, I guess I'll finish here.
[applause]
>> David Wilson: Any [laughter] any other questions? Any other questions?