>> Amir Dembo: So it's a pleasure to introduce David Wilson from Microsoft, who will talk about Oded's work on Boolean functions. >> David Wilson: Thank you. Okay. I'm going to be talking about how these Boolean functions tie into random-turn games and percolation. So remember at one point Oded declared that he didn't intend to work on Boolean functions anymore, but then we had Ryan O'Donnell as a post-doc and Itai was visiting and Mike Saks was visiting, and somehow this was too much to resist continuing work on Boolean functions. So there are two conventions; we're going to have the functions map from {-1, 1}^N to {-1, 1}. One particular question that we looked at, and this was with Itai and Oded, was the following. Suppose you have a function and you have a way of evaluating it by a decision tree. What the tree does is it looks at bits one at a time: it makes a decision as to what bit to look at next, reads that bit, and at some point it stops reading bits and outputs the value of the function. So delta_i is the probability that the i-th bit is read. And we say that delta of the tree T is the maximum over i of delta_i(T), and delta of the function is the minimum over decision trees of this maximum probability that a bit is read. Okay. So how small can delta be? All right. So just to give some example Boolean functions. Okay, so if the function is always one then delta is zero. And so to rule out this degenerate case, we require the function to be balanced, and a balanced function is one whose expected value on a random input is zero. So half the time it's one and half the time it's minus one. Okay. So there's the dictator function, which always returns the value of the first bit. And for that, delta is one, because you always have to look at that bit. The majority -- like I said, this is order of one. You have to look at most of the bits in order to figure out what the majority is. And this is in contrast to the related concept of the influence of a bit on the function. The influence is the probability that f(X) is not equal to f(X with the i-th bit flipped). Okay. So this is the influence of the i-th bit. Okay. And so here, for majority, the influence is pretty small; it's order one over root N. Okay. So another classic example of a Boolean function, which is due to Ben-Or and Linial, is the tribes function. For this you have two to the N blocks of N bits. And if any of these blocks is all zero, the function is zero -- or I should say minus one. Okay. So for tribes the influence is log N over N, which is small, but delta is still pretty big, because you have to look at most of the tribes in order to verify that the function isn't minus one. But for any given tribe you only have to look at order one bits typically, and so delta is order one over log N. Okay.
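To make the two definitions concrete, here is a minimal Monte Carlo sketch of my own (not code from the talk): it evaluates the majority of nine bits with the naive left-to-right decision tree that stops as soon as the outcome is forced, estimates delta_i = P[bit i is read] for that tree, and estimates the influence I_i = P[f(x) differs from f(x with bit i flipped)]. For majority, any tree still has max_i delta_i of order one, while the influences are of order one over root N.

```python
import random

def majority(bits):
    """Majority of +/-1 bits (N odd)."""
    return 1 if sum(bits) > 0 else -1

def evaluate_left_to_right(bits):
    """Naive decision tree: read bits left to right, stop once the
    majority is forced.  Returns (value, set of indices read)."""
    n = len(bits)
    total, read = 0, set()
    for i, b in enumerate(bits):
        total += b
        read.add(i)
        remaining = n - (i + 1)
        if abs(total) > remaining:          # outcome already determined
            return (1 if total > 0 else -1), read
    return (1 if total > 0 else -1), read

def estimate(n=9, trials=20000, seed=0):
    rng = random.Random(seed)
    read_count = [0] * n
    flip_count = [0] * n
    for _ in range(trials):
        x = [rng.choice((-1, 1)) for _ in range(n)]
        _, read = evaluate_left_to_right(x)
        for i in read:
            read_count[i] += 1
        fx = majority(x)
        for i in range(n):
            y = x[:]
            y[i] = -y[i]
            if majority(y) != fx:
                flip_count[i] += 1
    delta_i = [c / trials for c in read_count]   # P[bit i is read]
    infl_i = [c / trials for c in flip_count]    # influence of bit i
    print("delta_i:", [round(d, 3) for d in delta_i])
    print("max_i delta_i for this tree:", round(max(delta_i), 3))
    print("influences:", [round(v, 3) for v in infl_i])

if __name__ == "__main__":
    estimate()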
So we constructed an example for which delta was one over root N times root log N, and I don't really want to describe the example in great detail, but there's an easy lower bound of one over root N for any function, and that's just because, if each bit is read with probability one over root N, you're looking at fewer than root N bits, and if you take an independent input and an independent run of the algorithm, there's a good chance that the two runs will look at different collections of bits, and furthermore that they'll have different answers, and then you can just combine the two inputs, and the algorithm can't decide what to output for that case. So, all right, so that's the lower bound here. And for this example, the influence is also one over root N times root log N. Okay. So then we also looked at monotone functions, and for that there's some construction that gives one over the cube root of N, and it turns out the influence is one over N to the two-thirds times log N. This is based on a branching process, but I won't say too much about the actual function. So what I want to talk about is this. There's a matching lower bound, one over N to the one-third, which holds for any Boolean function and -- >>: [inaudible]. >> David Wilson: Any monotone Boolean function, yes. Thanks. That was proved by Oded. And I'll say a few words about how this proof goes. Okay. So this is the Fourier coefficient corresponding to a set S: it's the expected value of f(X) times the product over i in S of X_i. And for monotone functions, the influence of the i-th bit is just the Fourier coefficient on the singleton set containing i. Okay. So there's a bound due to O'Donnell and Servedio which says that if you sum up these singleton Fourier coefficients, the sum is at most the square root of the expected number of bits read by any algorithm that evaluates the function. >>: Is that just [inaudible]. >> David Wilson: This is for any function f. Okay. And so this is O'Donnell and Servedio. Basically the way the proof works is that this sum can be expressed as an inner product, and then you apply Cauchy-Schwarz; let's rewrite this slightly. >>: [inaudible] remove the letter from [inaudible]. >> David Wilson: All right. All right. So this is the square root of the expectation of f(X) squared times the square root of the expectation of the square of the sum over i of X_i times the indicator that the i-th bit is read. Okay. So f is plus-or-minus-one valued, so the first factor evaluates to one, and here, if you expand the second square, you get a term for both bit i and bit j being read, and if i and j are different the expectation is just going to be zero, and so you only get the diagonal terms here, and, okay, these are X_i times X_i, which is always going to be one, and so you end up getting the expected number of bits read. And like I said, this also follows from an inequality due to Schramm and Steif, which is that if you sum the squared Fourier coefficients over sets of size K, this is at most delta times K times the L2 norm squared, which for Boolean functions is always going to be one. So here you take K equal to one, do some manipulations, and get a similar inequality. Okay. So there's another inequality, due to O'Donnell, Saks, Schramm, and Servedio, which we need, and that's that the variance of a Boolean function is at most the sum over the input bits i of delta_i, for any decision tree, times the influence of the i-th bit. Okay. So for balanced functions, the variance is going to be one. This is upper bounded by delta times the sum of the influences.
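For reference, here is my reconstruction of the quantities and inequalities on the slides, in the notation of the talk (the Fourier coefficient, the O'Donnell-Servedio bound with its Cauchy-Schwarz proof, the Schramm-Steif inequality, and the OSSS inequality):

```latex
% Fourier coefficient of f : \{-1,1\}^N \to \{-1,1\} on a set S of bits:
\[ \hat f(S) = \mathbb{E}\Big[f(X)\prod_{i\in S}X_i\Big]. \]

% O'Donnell--Servedio, for any f and any decision tree evaluating it:
\[ \sum_i \hat f(\{i\})
   = \mathbb{E}\Big[f(X)\sum_i X_i\,\mathbf 1_{\{\text{bit } i \text{ is read}\}}\Big]
   \le \|f\|_2\Big(\mathbb{E}\Big[\Big(\sum_i X_i\,\mathbf 1_{\{\text{bit } i \text{ read}\}}\Big)^{2}\Big]\Big)^{1/2}
   = \sqrt{\mathbb{E}[\#\text{bits read}]}, \]
% using Cauchy--Schwarz, \|f\|_2 = 1 for a \pm 1-valued function, and the fact
% that the cross terms i \neq j vanish (the later of two read bits is still
% uniform at the moment it is queried).

% Schramm--Steif, for a randomized algorithm with revealment \delta:
\[ \sum_{|S|=k} \hat f(S)^2 \;\le\; \delta\, k\, \|f\|_2^2 . \]

% O'Donnell--Saks--Schramm--Servedio (OSSS), with \delta_i = \Pr[\text{bit } i \text{ is read}]:
\[ \operatorname{Var}(f) \;\le\; \sum_i \delta_i\, I_i(f). \]
```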
And for monotone functions the influences are just these singleton Fourier coefficients, which by O'Donnell-Servedio are at most the square root of the expected number of bits read. And the expected number of bits read is at most delta times N, and in the end this shows that delta has to be at least N to the minus one-third. Okay. All right. So I'll say a few words about how the O'Donnell-Saks-Schramm-Servedio inequality is derived. So what they do is they say: let X and Y be independent bit patterns, and look at the expected difference between f(X) and f(Y). If the decision tree is going to look at bits i_1, i_2, and so on up to i_s, in that order, we're going to define U_0 -- the notation that they use is X at i_1 up to i_s, times Y -- and what this means is that it's equal to Y except at these bit positions, where it's equal to X. U_1 is the same thing except starting at i_2. And then U_s is equal to Y. Okay. And so by the triangle inequality, the expected difference is at most the sum of the differences between f at U_{t-1} and f at U_t. Okay. And if you take a generic term in there, f at U_{t-1} minus f at U_t, this is the sum over i of the same thing times the indicator that the t-th bit read is equal to i. Okay. So if you condition on the bits that you've read up to time t minus one, then what you have in U_{t-1} -- well, that was just equal to Y, except at the bits that you haven't read yet, where it's equal to X -- so this is conditionally just a random input, and if you compare it with U_t, that's just the same random input except we re-randomize the bit i_t, and okay, so if you subtract these, then conditional on what you've read so far, the expected value of this is just the influence of the i-th bit on f times the probability that you read bit i at time t, okay? And so then if you sum over all i's and sum over all t's, you get exactly the right-hand side over here, and the expected absolute difference here is basically the variance of the function. So that's how they got their bound. Okay. All right. So basically the O'Donnell-Saks-Schramm-Servedio bound says that the product of these two has to be at least one over N, and it's reasonably tight for several of these examples. Okay. So I guess I have a few minutes left, and so I'll tie this into random-turn games, and okay, so how about if I start the projector. All right. All right. I apologize for the technical difficulties here. All right. So this is -- this is one of Oded's favorite Boolean functions here. It's percolation, that is. Okay. So we're going to tie this into random-turn games. Okay. So this is the game of Hex where at each turn a coin is flipped to decide which player moves next. And in this case the computer is playing against itself. And so this is from work with Yuval Peres, Scott Sheffield, and Oded. And so one thing that we show is that the probability that one of the players wins is equal to the probability that you'd have a crossing in random percolation. And you can show this by induction. So -- >>: [inaudible]. >> David Wilson: On any board, yes, that's right. Okay. So it's true if there are no moves left or if there's one move left, and -- okay. So suppose that you've shown it when there are K moves left; then if there are K plus one moves left you can verify that the best move for black is the site that's most likely to be pivotal, and it's also the best move for white, and -- all right. So anyway, the probability that black wins is the probability of a black crossing in percolation. Okay.
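Stated a bit more explicitly (my paraphrase of the claim from the random-turn games work of Peres, Schramm, Sheffield, and Wilson):

```latex
% Random-turn Hex: before each move an independent fair coin decides which
% player places the next stone.  By induction on the number of empty hexagons:
\[ \Pr[\text{Black wins under optimal play}]
   \;=\; \Pr[\text{Black crossing when the empty hexagons are filled i.i.d.\ fair}], \]
% and an optimal move, for either player, is an empty hexagon that is most
% likely to be pivotal for the crossing when the remaining hexagons are filled
% in i.i.d.\ at random.
```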
So all right, so from work of Smirnov and Werner -- >>: [inaudible]. >> David Wilson: Yes. I believe I said that. >>: [inaudible] what do you have to do to win? [laughter]. >> David Wilson: All right. So white wants to connect the white sides, black wants to connect the black sides, and they toss a coin to decide who moves next. And they can't move tiles. Once they're placed, they're placed. >>: What was the last thing you said? >> David Wilson: You can't move tiles. Once you lay them down, there they stay. >>: [inaudible]. >> David Wilson: Okay. So this can be viewed as a decision tree algorithm. The players play optimally; they agree on their common best move, and instead of having the coin toss decide who moves next, the coin toss is the random bit. And from Smirnov and Werner, we know that the influence of the bits near the middle is going to be, for an L by L board, L to the minus 5 over 4. And then from the O'Donnell-Servedio bound -- okay, first of all, this is a monotone function. If we do the summation over all sites, this is going to be L squared times L to the minus 5 over 4. >>: [inaudible]. >> David Wilson: Right. Let's [inaudible] square. And so this is -- all right. So this is at least that, at least L to the three halves plus little o of one. So this says that if you are to evaluate this percolation crossing function, no matter which strategy you use, it's always going to take at least L to the three halves coin flips, uncovering bits, to decide what the function is. And for random-turn Hex experimentally the exponent seems to be between 1.5 and 1.6. And so I think I'm about out of time, so I'll stop here. [applause]. >> Amir Dembo: Questions or comments? >>: So you don't expect this to be optimal [inaudible]. >>: It's an optimal strategy for the players, but it's -- >>: No, no, but it's not optimal [inaudible]. >> David Wilson: We don't know any better algorithm. Maybe it's optimal, maybe it isn't. We do not have an opinion. Yes? >>: I have one comment. One is that this whole project actually also started from a hike with [inaudible], where, you know, on the hike he suggested that there should be a connection between percolation and Hex, because they're happening on the same board, and so you should take seriously these suggestions to, you know, play various games. And the other comment is that there's a lower bound of L to the three halves for the length of the game that we get from the argument [inaudible], but the embarrassing thing is we don't know any upper bounds. So the game lasts at most L squared; it's probably about L to the 1.6, but [inaudible] we can't show less than L to the 1.99, or L to the 2 minus epsilon, for the length of the game. We don't know -- we don't even know [inaudible]. >>: I think we'll have -- >>: [inaudible]. [applause]. >> Amir Dembo: Okay. So it's a pleasure to resume the session with the next talk by Christophe Garban on Oded's work on noise sensitivity. >> Christophe Garban: Okay. So first I would like to thank the organizers for this conference, which enables all of us to remember Oded and his mathematics. I had the chance to work with him for about two years, and let me tell you it was an amazing experience. Maybe I should say quite an amazing experience. [laughter].
So I could experience many things that [inaudible] described this morning, like for example when we worked with Gabor in the [inaudible] and some strategy completely collapsed, which happened very frequently, then the next morning Oded would come with something completely new. So I experienced many of these things. Maybe the only thing that changed from '99 to 2007 was that [inaudible] was into foosball, maybe. So in this talk I will present one of the many fields in which Oded made great contributions, namely his work on noise sensitivity of Boolean functions. But I will restrict things to the case of percolation. So we will encounter Boolean functions, but applied to the case of percolation. What we will see is that when you look at critical percolation in the plane and you're interested in macroscopic properties of this percolation, these macroscopic properties are very sensitive to perturbations. If you change the picture just a little bit, your large-scale connectivity properties will be completely different. So we will see that this corresponds to the following phenomenon: macroscopic events, in some sense, live on high frequencies. We will see what I mean by this. So just to illustrate this sensitivity to small perturbations, here I have a simulation of a critical Z^2 percolation. So I have p_c equal to one half, and I have drawn the three largest clusters on the left, so the red one is the biggest and then the blue and the green, and on the right it's the same configuration except there are some mistakes -- or rather, I noised the left picture a little bit. So if you have good eyes you will see that the two discrete configurations are in a way very close, but nevertheless the macroscopic properties are very different. >>: What's the noise level? >> Christophe Garban: Well, I cheated a little bit because it's not that microscopic. Here the noise is maybe 0.1 or something like this. So the noise here means that for each single edge you keep the edge with probability 1 minus epsilon and with probability epsilon you resample it. So one of the motivations that Oded [inaudible] had when they looked at these things was to start a route towards proving conformal invariance. I won't go into this. But another motivation was to study a model called dynamical percolation, which is the analog of [inaudible] dynamics but in the case of percolation, and here I have a movie done by Oded, but I don't know if -- okay -- it's going to work. So in this movie done by Oded you see a percolation on the triangular lattice -- so imagine this thing in the whole plane, and each [inaudible] is recolored [inaudible] -- so you have your critical percolation which is evolving in time, and what was expected from the very first paper on dynamical percolation was that even though at each fixed time you see a critical percolation, so you see only finite clusters, when you run the dynamics you will have some exceptional times where an infinite cluster should appear. So this took several years to be proven, and it was proved by Oded and Jeff, and I'll mention this later. But in order to prove that you have such exceptional times, you need to show that somehow the system is moving really fast, so that sometimes it can catch infinite paths. So you need to prove these kinds of sensitivity statements, that large-scale properties move really fast.
There is a fast mixing property for these things. Okay. So those are the motivations to study this instability. So in percolation we are mainly interested in clusters and connectivity and things like this, and these large-scale connectivity properties are naturally encoded by Boolean functions. For example, if you take a large rectangle, you may ask if there is a crossing from left to right. This is a Boolean function, and it's the same kind of Boolean function as the one David talked about at the end of his talk. So it's defined very simply like that: it's going to be one if there's a crossing, and zero else. And we are going to consider this Boolean function at larger and larger scales. So remember the movie that I showed at the beginning. Omega_0 would be the initial configuration, and if you look at the configuration at a later time t, a small time t, then again you have a noised configuration of the initial one. So between omega_0 and omega_t you change a small proportion of hexagons. So we will look at functions of this large-scale percolation configuration, and we will wonder how sensitive they are to this type of noise. So how do we quantify that a big connectivity property, like the Boolean function we had before, mixes fast, or that it loses its memory very fast? Very simply, we can quantify this with the covariance. So we can look at the covariance of having a crossing at time zero and having a crossing at time t. And if this covariance converges to zero, it means you lose the information. So the decorrelation of macroscopic properties will correspond to this covariance going to zero. So noise sensitivity corresponds to the vanishing of these covariances here. Maybe the second one you can think of like this: you know what the initial configuration is, and you know that a small proportion, roughly t, of the bits has changed. What can you guess about the outcome f_N of omega_t, that is, about the crossing at time t? If it's noise sensitive, this variance goes to zero, so basically this means that you can't guess anything. So note that, defined in such a way, noise sensitivity is not a quantitative statement, and in order to prove that you have exceptional times we will need more quantitative versions of noise sensitivity. What I mean by this is that we will need to know at which speed the large-scale system decorrelates -- at which speed these covariances converge to zero. So the natural setup to look at these things -- and I think Itai once told me that initially he was not so convinced about this, but his later contributions prove that finally he was convinced and very comfortable with it -- is to work on the Fourier side. So I quickly recall the harmonic analysis of Boolean functions, though David used some of it just before. So we view our Boolean functions in the larger space, the L2 space of the hypercube, and when you do Fourier analysis you just need to find a natural and convenient orthonormal basis. And here there is a very natural and convenient one, which is the characters of this group. So this orthonormal basis is indexed by the subsets of the bits, S a subset of the bits, and for each of these subsets S, the character corresponding to S is just the product of the values of the bits in this subset S.
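In symbols, the basis being described is the following (standard notation, hopefully matching the slides):

```latex
% Characters of \{-1,1\}^n, indexed by subsets S of the bits:
\[ \chi_S(\omega) = \prod_{i\in S}\omega_i, \qquad
   \mathbb{E}[\chi_S\,\chi_{S'}] = \mathbf 1_{\{S=S'\}} , \]
% so the \chi_S form an orthonormal basis of L^2(\{-1,1\}^n), and every f expands as
\[ f = \sum_{S} \hat f(S)\,\chi_S, \qquad
   \hat f(S) = \mathbb{E}[f\,\chi_S], \qquad
   \hat f(\emptyset) = \mathbb{E}[f], \qquad
   \sum_S \hat f(S)^2 = \mathbb{E}[f^2] \ \ (\text{Parseval}). \]
```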
So this gives you 2 to the N such functions, and it's easy to see that they are orthogonal, and so doing Fourier analysis on the hypercube is just projecting your function onto each of these 2 to the N characters. So this gives you the Fourier coefficients. And for example, the Fourier coefficient corresponding to the empty set is just the projection onto the constant, so this is the average. So we will see that when you take any Boolean function and you look at the sequence of its Fourier coefficients, they fully encode the interesting properties for us. So why is it helpful for mixing or sensitivity of Boolean functions? Just because the covariances are easily expressed in terms of the Fourier coefficients. So if you want to compute the correlation between time zero and time t, then you write down your observable and you project it onto your basis, and then you use the fact that two different characters are orthogonal, so only the diagonal terms remain. And you end up with this simple formula here. So try to keep this formula in mind at least until the next slide. The covariance is the sum of the Fourier coefficients squared times this e to the minus t times the size of S. So you notice here that if your Boolean function has its Fourier mass on sets of high cardinality, then the covariance is going to be very small. On the other hand, if your Boolean function is supported on the low frequencies, then it is going to be stable. Okay. So this agrees with the usual intuition. But what will be important for us, when you have a Boolean function, will be what I think physicists call the energy spectrum, which represents the contribution, for each K between 1 and N, of the level-K Fourier coefficients. So if your function has a spectral distribution which is more to the right, then it's going to be more sensitive than if it's stuck on the left. So just using Parseval you can see that the total mass of this spectrum, where we don't include the empty set, corresponds to the variance of your Boolean function. So now, in percolation, what we will need to do: we have these Boolean functions f_N, which correspond to left-right crossings in boxes of scale N, and we'd like to know what is the shape of the energy spectrum of these Boolean functions. Is it located near finite frequencies, or does it spread to infinity, or how does it look? So we'd like to describe the shape of the energy spectrum of these types of Boolean functions. And if we want a quantitative statement on the sensitivity of percolation, we want more: we would like to know at which speed the spectral mass diverges to infinity. So just as a comparison, here is the energy spectrum of the majority function. In this case you can compute things explicitly, and you can see that in some sense most of the spectral mass is localized at finite frequencies. So we expect that in the case of percolation it should look different. So there are basically three very different approaches to try to localize where the spectral mass is. The first one was done by Oded, Itai and Gil in '98, and it used techniques from analysis. I'll mention what their result was later. The second technique was based on randomized algorithms, and this is something which is related to the previous talk a little bit. And the last one was the work with Gabor and Oded; we had many attempts, and finally [inaudible].
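As a quick aside illustrating the energy spectrum defined above, here is a toy brute-force computation of my own (Python; the choice of a five-bit majority as the test function and all names are mine, not from the talk -- any small Boolean function, e.g. a crossing indicator on a tiny grid, could be plugged in):

```python
from itertools import combinations, product
from math import prod

def fourier_coefficients(f, n):
    """Brute-force Fourier-Walsh coefficients of f : {-1,1}^n -> R."""
    points = list(product((-1, 1), repeat=n))
    coeffs = {}
    for k in range(n + 1):
        for S in combinations(range(n), k):
            chi = [prod(x[i] for i in S) for x in points]     # character chi_S
            coeffs[S] = sum(f(x) * c for x, c in zip(points, chi)) / len(points)
    return coeffs

def energy_spectrum(coeffs):
    """Level-k masses: sum over |S| = k of fhat(S)^2."""
    n = max(len(S) for S in coeffs)
    spectrum = [0.0] * (n + 1)
    for S, c in coeffs.items():
        spectrum[len(S)] += c * c
    return spectrum

if __name__ == "__main__":
    n = 5
    majority = lambda x: 1 if sum(x) > 0 else -1
    spec = energy_spectrum(fourier_coefficients(majority, n))
    print("level-k masses:", [round(s, 4) for s in spec])
    print("total mass (Parseval, should be 1):", round(sum(spec), 6))
```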
And so when you look at these approaches, which turn out to be very different, you can see that there is a common denominator to the three. And if you don't see it, then I can help you. So okay. So now I just present these results. So the first one -- from the beginning there were heuristic arguments which say at which speed the spectral mass should diverge to infinity, and it's even more than this: one can prove that a positive fraction of the mass will spread at this speed here, if you know the critical exponents. But what is a little bit embarrassing is that even though you know that a positive fraction diverges really fast, it could still be that you have a positive fraction lying at the bottom part of this distribution. And it turned out to be hard to handle the lower tail of the distribution. So the first result that I mention, which uses hypercontractivity and ideas that came from analysis but also computer science at the time, proves that the spectral mass diverges at least logarithmically fast. So stated in terms of Fourier coefficients, it means that asymptotically you don't have mass at the finite frequencies. So this already proves a noise sensitivity statement. The second result, which Oded did with Jeff, went a little bit further and proved that the spectral mass diverges at least polynomially fast. But they proved more: they gave a bound on the lower tail of this energy spectrum, and this lower tail control enabled them to prove that there are exceptional times in the dynamical percolation model. And finally, with Oded and Gabor, we could go all the way to the expected N to the three-fourths, with a sharp control on the lower tail. So maybe I should insist here that even though only the third goes all the way to N to the three-fourths, these first two approaches have the advantage of being much more general than the one we did. For example, the one that uses hypercontractivity in some sense says that if you take any Boolean function such that each of its variables has very small influence on the outcome, then you have a logarithmic control on its Fourier spectrum. So these are methods you can apply in many cases. And as well, in the case of the work by Oded and Jeff, if you have a randomized algorithm which computes your function while looking at very few bits, then you have very good control on the Fourier spectrum. So maybe this third approach could be applied to other things, but it's not clear yet. So just to say one word on the last approach, and then I will describe in more detail the green one, the one of Oded and Jeff. The last approach: in the real case, if you have a function in L2 and you look at its Fourier transform, then the square of this Fourier transform defines for you a probability measure on the real line, so for each function F you can see this as a random variable on the line. For Boolean functions you can do exactly the same, but this time, instead of having a probability distribution on the real line, you will have a probability distribution on the characters. So you will have a probability distribution on subsets of your bits. And very roughly speaking, the idea of this third approach is to sample a frequency according to this measure. So if your Boolean function is the left-right crossing from before, this will be some random subset of the bits.
And this random subset was believed to be very close to the pivotal points of percolation, but it turns out that it is actually different. And the goal was to try to study the properties of this random set and to try to understand it in a way. It is a little bit like a random Cantor set. But the difficulty was that the dependency structure of this random set is a little bit hard to analyze. So I won't describe the red and the blue approaches further, and I will now try to describe in more detail the algorithmic approach that Oded did with Jeff. So this is based on randomized algorithms. So what is this? If you take a Boolean function F from the cube into {0, 1}, a randomized algorithm is something that computes the function F by examining bits one at a time. So you take a first bit, randomly or according to some procedure, and depending on the value of this first bit you choose a new one, and so on, until you discover the output of the function F. And an algorithm for us will be efficient if you manage to compute F by looking at as few bits as possible. So to quantify this, I use the same thing as what David talked about before, which is the maximum over all the variables of the probability that the variable is used along the algorithm. This is called the revealment. And for an algorithm A, we will call J the random subset of the bits which is actually used along the algorithm. So David already gave the example of majority; for majority there is nothing really smart to do, you just need to ask people one at a time, and you'll have to ask at least half of the people, and it's easy to see that with high probability you need to ask many, many, so the revealment is not going to be small. You could take other examples like recursive majority, and here -- I won't do it, but you can easily find an algorithm which asks very few people. And so, we are interested in percolation. So what would be a natural randomized algorithm? Well, we've seen it already several times today: say you want to have the left-right event here by white hexagons; an idea that is present in all the work of Oded on 2D percolation is to use the exploration path. So here the randomized algorithm can ask what is the value of the first hexagon, and depending on it you continue your exploration path, and so on. So if your exploration path ends up being here, the Boolean function is going to be zero, and [inaudible] it will be one. So this is a nice randomized algorithm, but not so nice, because it makes the revealment equal to one. Why? Because here I always ask the value of the first hexagon. There is an easy fix: you can randomize the starting point of the exploration path -- and maybe you need at most two exploration paths to be sure of the result -- but eventually, by randomizing the beginning, you can have a nice randomized algorithm computing the left-right crossing. So let me just tell you what the revealment will be for such a function. Asymptotically, the set J of bits used by the algorithm will be this SLE(6) curve that [inaudible] mentioned this morning. So asymptotically you see that the amount of bits used by the algorithm is very small, because this curve occupies a very small fraction of the window. And since we know that this is something of dimension seven-fourths, as David mentioned, it even implies that the revealment is of order N to the minus one-fourth.
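To illustrate the revealment on the recursive-majority example mentioned above, here is a sketch of my own (an assumed, standard randomized evaluation rule, not the percolation exploration path and not necessarily the optimal algorithm): at each node, evaluate two randomly chosen children first and query the third only when they disagree. Under that rule the revealment should be about (5/6)^depth; the depth, trial count and this prediction are my additions.

```python
import random

def evaluate(leaves, depth, offset, rng, queried):
    """Randomized evaluation of depth-`depth` recursive 3-majority on the
    block of leaves starting at `offset`.  Records queried leaf indices."""
    if depth == 0:
        queried.add(offset)
        return leaves[offset]
    size = 3 ** (depth - 1)
    order = [0, 1, 2]
    rng.shuffle(order)                        # evaluate children in random order
    a = evaluate(leaves, depth - 1, offset + order[0] * size, rng, queried)
    b = evaluate(leaves, depth - 1, offset + order[1] * size, rng, queried)
    if a == b:                                # third child cannot change the majority
        return a
    c = evaluate(leaves, depth - 1, offset + order[2] * size, rng, queried)
    return a if a == c else b

def estimate_revealment(depth=4, trials=5000, seed=0):
    rng = random.Random(seed)
    n = 3 ** depth
    counts = [0] * n
    for _ in range(trials):
        leaves = [rng.choice((-1, 1)) for _ in range(n)]
        queried = set()
        evaluate(leaves, depth, 0, rng, queried)
        for i in queried:
            counts[i] += 1
    delta = max(counts) / trials              # revealment = max_i P[bit i queried]
    print(f"n = {n}, estimated revealment = {delta:.3f},"
          f" prediction (5/6)^depth = {(5/6)**depth:.3f}")

if __name__ == "__main__":
    estimate_revealment()
```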
So there are some issues to play with, but this is standard in percolation, so for the left-right crossing on the triangular grid you can have algorithms with very small revealment. So now, what does this say about the spectrum itself? How does it help us localize where the spectral mass is? Well, there is a, I think, very nice theorem, with a very short proof, which was done by Oded and Jeff and was mentioned by David before. If you take a Boolean function, or even a real-valued function, and you have a randomized algorithm which computes F with revealment delta, then this information alone gives you deep information on the spectrum. For any level K, it tells you that the level-K mass of the Fourier distribution is dominated by K times the revealment times the L2 norm squared of F. So in particular, in the case of the percolation crossing, since we have a very small revealment, we directly have information on the lower tail of the Fourier distribution. So in the time remaining I'd like to explain how to get this result here. So we start with some function like this, and we have this algorithm with revealment delta, and we would like to have a bound on the level-K Fourier coefficients. To do this, we can look at the projection of the function F onto the level-K characters. So we look at the function F, but only on the level-K frequencies. And the only goal is to estimate the L2 norm squared of this function. So what we want to prove is that this is less than K times delta times this. So how to do this? Well, first notice that the L2 norm squared, this is the same as this. And now we want to use the knowledge that we have an algorithm, and the fact that it has a small revealment, and so on. So, using the information coming from the algorithm: the function F is computed by the algorithm, so it is measurable with respect to this algorithm A, and this is going to be this. And by [inaudible] this is less than the norm of F. So now the goal is to try to bound this expectation here. So let me just take a simple example to see what's going on. Assume we have six bits, X1 up to X6. And say we have some Boolean function, but we only look at the projection onto the level-2 coefficients, and say that once projected we have some coefficient alpha times X1 X2, plus some coefficient beta times X1 X4, plus some coefficient gamma times X5 X6. So basically we have this frequency here, this frequency here, and this one. And now say that you compute your function F, which sits above this function G, right, and say that the randomized algorithm is going to look at these three variables, say X1, X2 and X3. So F is computed, but G is not computed completely. So what is this conditional expectation given the algorithm? In that case, if you know what the algorithm saw, it is going to be: alpha times X1 X2 -- and from X3 you didn't learn anything -- then you will have beta X1 times X4, and then you have the last term, which stays gamma X5 X6. So let me write it like this. So this term is constant, because you know what it is; the algorithm told you what it is. So this is a constant term, this is a level-one term, and this is a level-two term. So all of this is to say that once you apply the algorithm, you have a kind of collapsing of the frequencies from size K to smaller sizes. And the whole game is to get a bound on the collapse onto the empty coefficient. >>: When you say [inaudible]. >> Christophe Garban: Yes. So -- right.
So here it was not the expected value, it was -- I don't know how to write it, but -- oh, actually, I want to write it like this. So this is the random function which depends on the algorithm and on the set J that you visited, and this is the expectation of this thing, which is the conditional expectation of G knowing A. And here you only keep this term, which is constant, and the other ones vanish. So this term is basically the average of this random function when you average over all the other bits. So we can write it like that. Okay. So here, when you run the algorithm, you discover some sites and you have all your frequencies floating around. And when you discover a whole frequency, it goes to the empty coefficient, and this accumulates, and you want to say that if the revealment is small, this is not going to accumulate too much. So now, just to bound this thing: this is the expectation of the square of the empty coefficient of this random function here, and we can write it like this. So by Parseval, this is going to be exactly this. So in some sense it's not so easy to study this collapsing process, so Oded and Jeff had this trick to reverse the study -- and now what do we do with it? Well, the expectation of this is just the expectation of G squared. And okay, so we get this. And now we don't really know what to do with the frequencies which are of size other than K, so we just bound. So far we didn't lose anything in the control of the Fourier mass, but here we bound by saying that this is less than the whole sum minus only the size-K coefficients. And this gives us: the expectation of this squared is this sum, and we subtract the sum of this thing -- these are again the Fourier coefficients of the collapsed random function; this is the same. So now, what do we do with it? Well, when you have a frequency somewhere, a set of bits of size K, you send the algorithm in to reveal sites one at a time. If for this frequency here you reveal one of the sites, then the frequency will collapse to a smaller size. So for this to be nonzero, you need the algorithm to avoid the frequency here. But now, individually, each site of this size-K set has probability at most delta to be visited, so the probability that the set is visited at all will be less than K times delta. So this is nonzero only with probability at most K times delta, so this is going to be less than K times delta, summed over the sets S of size K, like this. And this is the L2 norm of G squared. And if you plug this in, it gives you the result. Okay. So this estimate gave a lower bound on the energy spectrum of percolation, and also for the radial event, and that's what gives the existence of exceptional times. Okay. [applause]. >> Amir Dembo: Questions or comments? I think we'll thank Christophe again. And we will start without a break. So it's a pleasure to introduce Gabor Pete, who will present a talk on how to prove tightness for the size of strange random sets. >> Gabor Pete: Thanks. Okay. So this is Oded fetching water for our dinner with Christophe at Mt. Rainier. So the water is coming from inside the glacier. So officially the [inaudible] was unsuccessful in the sense that we didn't get to the top; we had to turn back from about 4,000 meters. But I really enjoyed it, and it was a great experience for me. So still I'm very grateful to Oded for the hike. >>: I have a question. >> Gabor Pete: Yes? >>: [inaudible]. [laughter]. >>: How did you get him to turn back?
[inaudible]. [laughter]. >> Gabor Pete: Okay. Ask Christophe. But he was okay. >>: Probably worried about [inaudible]. >> Gabor Pete: Okay. So the photo is not related to the talk, except that the people involved are the same. So this will be a continuation of Christophe's talk in the sense that the strange random set I'm talking about is the Fourier spectral sample of critical percolation. But some of the story, I think, is interesting also more generally, and I'll try to explain that. And also, I think whenever we gave a talk on this paper we never managed to get to this last point, so I will try to remedy this. Okay. Okay. So Christophe gave the introduction to what the Fourier coefficients have to do with noise sensitivity, but I'll recall it very briefly. So, okay, this is difficult. Okay. Anyway, so we have a finite set V, and we have all the possible plus-minus configurations, say for simplicity with uniform measure -- so say black and white colorings of the hexagonal board -- and you have functions on that, for example the crossing function, the indicator function of having a left-right crossing. And that L2 space has a very nice orthonormal basis with respect to this measure -- the inner product is just the expectation of the product -- given by the products of the bits over subsets. So this is an orthonormal basis. And you can take the Fourier expansion in this basis. This is called the Fourier-Walsh expansion. And you can do a little bit of a strange thing -- this is a bit like what quantum mechanics does -- you can encode all this information about the Fourier coefficients into a random set. So by Parseval the sum of the Fourier coefficients squared is just the L2 norm of the function squared. Okay. So if I normalize by this, I can define a random set, a random subset of the bits corresponding to F: the probability that this set, the spectral sample, equals a given S is just the normalized Fourier coefficient squared. So this way I get a random subset of the vertices. So I get it in one piece; it tells me the distribution. And so how is this useful for noise sensitivity? So Christophe showed you this formula: if you have a configuration omega and you resample each bit with probability epsilon independently -- so this is the noised version of the configuration -- and you look at the correlation between before and after the noise, then you can write this very nicely in terms of the Fourier coefficients. So you have the sum over non-empty sets, because I subtracted this thing here, which was exactly corresponding to the empty set. Okay. So if you do this little arithmetic, you get this formula. So the correlation is basically measured by the size -- somehow the typical size -- of the spectral sample. So typically in what sense? You can see from here that, if you are taking a noise epsilon, and there is no weight between zero and some large constant over epsilon -- so the probability of spectral samples of that size is very small -- then that part of this expectation will be small. And for larger sizes, it will be small because it's an exponential thing. So this whole thing will be small if you don't have mass between zero and K over epsilon.
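Writing out the formulas being referred to (my reconstruction in standard notation):

```latex
% The spectral sample of f: a random subset \mathscr S_f of the bits with
\[ \Pr[\mathscr S_f = S] = \frac{\hat f(S)^2}{\|f\|_2^2}
   \qquad(\text{Parseval makes this a probability distribution}). \]
% If \omega_\epsilon resamples each bit independently with probability \epsilon, then
\[ \mathbb{E}[f(\omega)f(\omega_\epsilon)] - \mathbb{E}[f]^2
   = \sum_{S\neq\emptyset} \hat f(S)^2 (1-\epsilon)^{|S|}
   = \|f\|_2^2 \;\mathbb{E}\big[(1-\epsilon)^{|\mathscr S_f|}\,
      \mathbf 1_{\{\mathscr S_f\neq\emptyset\}}\big]. \]
```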
So, which means that if you want to show that epsilon noise makes you decorrelate, then you want to show that it's unlikely that this strange random set is non-empty but only has this small size. So we are after such lower tail bounds for noise sensitivity. And so, at least to us, it was Gil Kalai who suggested that although we are really interested only in the size -- I mean, here there is only the size of the set -- to understand that size maybe you should look at the whole distribution as a random set: for critical percolation, say for the crossing in an N by N square, it's a random subset of the N by N square. So we should look at it as a random set. And it's a strange set, as I said. We don't know a way to sample from this set effectively, for example. There is some relation to the quantum world, which is that there's a theorem due to Bernstein and Vazirani that if the Boolean function itself is polynomial-time computable, then there's a quantum algorithm to sample from this spectral sample. Okay. I don't know, maybe there are some conjectures that say that it shouldn't be possible classically; I have no idea whether it is possible to sample or not. Don't expect anything; I don't know anything about quantum algorithms and stuff. Okay. One sign that looking at the spectral sample for critical percolation could be interesting is that Boris Tsirelson has a theory of noises, and there is the Schramm-Smirnov theorem that the scaling limit of critical percolation is a noise -- it doesn't tell you what that noise is, but Tsirelson's theory does apply -- and there is Smirnov's theorem that critical percolation has a conformally invariant scaling limit. The combination of these says the following: take the left-right crossing in a conformal rectangle, just a domain with four marked points, with mesh 1 over N; then the sequence of random sets, as N goes to infinity, has a scaling limit, and the scaling limit is conformally invariant. It doesn't really say anything more about the scaling limit, but certainly it should be some sort of interesting object. Now, also, maybe this came up in the previous talk: the spectral sample has something to do with pivotals. If F is plus-minus-one valued, then you can talk about pivotal bits; those are the bits such that if you flip the bit, then the outcome changes from plus one to minus one or vice versa. And it's not hard to see that the probability that a given site is pivotal is exactly the probability that the site is contained in the spectral sample. And this is also true for pairs of points, okay? But it's not true for more than pairs; for three points it's already not true. So this is also Gil Kalai's observation. Which sort of says that this set does have a lot to do with the set of pivotals, but it is different. So both random subsets measure the influence or relevance of bits in some sense, but in slightly different senses. Okay. At least for the left-right crossing in a quad, you can show that the probability that the spectral sample intersects some simply connected subdomain B is comparable -- up to constants it is the same as -- the probability that that subdomain B is pivotal for the left-right crossing.
So a subdomain being pivotal means that if you change all the bits inside the domain to all black, or to all white, this makes a difference. And the probability of that is basically the probability that you have the four-arm event from the boundary of B to the boundary of the quad Q. So what is this four-arm event? A bit is pivotal if and only if -- it's very easy to see -- so if you are interested in a white left-right crossing, either you have a white left-right crossing or you have a black top-down crossing, exactly one of these two, and a bit is pivotal if you have a black arm connecting to the bottom and a black arm connecting to the top, a white arm connecting to the right, and a white arm connecting to the left. And if the bit is white, then you do have this, and if it's black you do have that. So this is the four-arm event. And this four-arm event is something that is understood well for critical percolation [inaudible]. Okay. There's another small fact: for example, the probability that the spectral sample is contained in some box B but is not empty is the square of the four-arm probability. So this is very special to percolation; for other Boolean functions you won't get something like that. And also -- and this is one sign that something is different for pivotals than for the spectral sample -- the probability that all the pivotals in the configuration are inside this box B, and you do have pivotals, is not the square of the four-arm probability but is the six-arm probability, which is different from this. Okay. So what do I want to say? Okay. So we have this spectral sample, and you have sort of seen these examples in the previous two talks. For example, for the dictator, which is the first bit: it's very noise stable, because you have to flip the dictator in order to make a change, and there the spectral sample is concentrated on the dictator -- with probability 1 it is the dictator bit. Majority is still noise stable. Most of the mass of the spectral sample is concentrated on singletons, of course distributed evenly over all the singletons. However, there is an interesting thing: since the probability of being pivotal and of being in the spectral sample is the same, the expected sizes of the two sets are the same, and for the probability that a bit is pivotal for majority -- with probability about one over square root of N you will have the same number of plus ones and minus ones, and once you are in this situation, every single bit is pivotal. So with this small probability, one over square root of N, you will have N pivotals, which means that the expected number of pivotals will be huge, of order square root of N. So the expected size of the spectral sample is square root of N. On the other hand, most of the mass is on singletons. So this shows that it is not at all the case that the spectral sample is always of the typical size given by its expectation; for this example that is not at all true. Okay. The last simple example is parity: you just multiply the bits, and that's the most noise sensitive function that you can imagine. Its spectral sample is concentrated on the entire set; whatever you change, you change everything. Okay. Now, the spectral sample of the left-right crossing in an N by N square has -- following from the conformal invariance, or maybe just the existence, of the scaling limit of critical percolation -- some very nice self-similarity properties.
So just on the level of expectations, to start with: the expected size is N squared -- the number of bits in an N by N box -- times the four-arm probability from distance one to distance N, which is known to be N to the minus 5/4 by Smirnov and Werner, so you get N to the three-quarters as the expected size of the spectral sample. Now, if you look at this on a coarser scale -- so you take this super-lattice of medium-size boxes, R by R boxes, and you look at which R by R boxes are intersected by the spectral sample -- now, by this previous result that I was showing you, the number of boxes is N squared over R squared, and the probability that the spectral sample intersects an R by R box is the four-arm probability from distance R to N. It turns out that this expected number is exactly as if you were looking at the spectral sample in an N over R by N over R box. So somehow, when you look at it on the large scale, it looks like the same function on that super-grid. Okay. So this is one form of self-similarity. And another form of self-similarity is that if you condition on the spectral sample intersecting some R by R box, then inside the R by R box it will look somehow just as if you had taken the crossing in an R by R box. So at least for the expectations, the expected size there is the expected size of the spectral sample of the left-right crossing in an R by R square. Of course these two results are compatible with each other, and also in the proofs of these you use the quasi-multiplicativity of the four-arm probabilities. What does that mean? So if you multiply the expected number of R boxes intersected times the expected size of the intersection once you do intersect, this product of course should give the expected size of the entire set. And it does. So this is the compatibility I told you about. And the point is that the probability that you have the four-arm event from distance one to N is, up to constants, the same as having it from one to R and then having it from R to N. Of course one is bigger than the other, because if you have it from one to N, then you also have it from one to R and from R to N. But also the other way: once you have it from one to R and from R to N, with positive probability you get the connection -- you get the four arms from one to N. So this is a nice property of critical percolation. So all these formulas would be true for the pivotals as well; there is nothing special to the spectral sample here, they are also like that. A more classical example: if you take the zero set of a one-dimensional simple random walk, or a one-dimensional Brownian motion, of length N, then the size of the zero set is around square root of N. Now if you take R boxes on the line, so R-intervals, then -- so for example, the second thing is that if you condition on having zeros in a specific R-interval, then of course the expected size there is around square root of R, which is about the number of zeros of a length-R simple random walk. And maybe from this, or from other considerations, you also get that the number of R boxes that you intersect is typically the number of zeros of a simple random walk of length N over R. So you do get this self-similarity in many instances in probability.
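The first-moment computations being described, written out (my paraphrase; alpha_4(r, N) denotes the four-arm probability between radii r and N):

```latex
% Expected size of the spectral sample of the left-right crossing in an N x N box:
\[ \mathbb{E}\,|\mathscr S_N| \;=\; N^2\,\alpha_4(1,N)
   \;\approx\; N^2\cdot N^{-5/4} \;=\; N^{3/4+o(1)} . \]
% Coarse-grained picture on the grid of R x R boxes B:
\[ \mathbb{E}\,\#\{R\text{-boxes hit by }\mathscr S_N\}
   \;\asymp\; \frac{N^2}{R^2}\,\alpha_4(R,N) \;\asymp\; \mathbb{E}\,|\mathscr S_{N/R}|,
   \qquad
   \mathbb{E}\big[\,|\mathscr S_N\cap B|\;\big|\;\mathscr S_N\cap B\neq\emptyset\big]
   \;\asymp\; R^2\,\alpha_4(1,R) \;\asymp\; \mathbb{E}\,|\mathscr S_R| , \]
% and the two fit together by quasi-multiplicativity:
\[ \alpha_4(1,N) \;\asymp\; \alpha_4(1,R)\,\alpha_4(R,N). \]
% (Compare the zero set of a length-N walk: about \sqrt N zeros, about
% \sqrt{N/R} of the R-intervals hit, about \sqrt R zeros per hit interval.)
```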
Maybe this self-similarity can be looked at as somehow a consequence of having a scaling limit for these sets. Okay. Now, so you have this self-similarity: what type of concentration do you expect? So Christophe explained that the intuition, or the hope for a long while, was that the spectral sample should be concentrated near its expectation. So why do you expect that, and what kind of concentration do we expect? Okay. So these are just silly examples; if you know probability it's obvious. Say you take a uniform set, just i.i.d. with the same density. Well, surprise, surprise, it will be more uniform than the spectral sample: it will intersect more boxes, but the intersection within a given box will be smaller. So it doesn't have the clustering effect that the spectral sample does. And then you get, just from the CLT, the central limit theorem, that the size is concentrated: really most of the mass is within square root of the expectation around the mean. Okay. A bit more similar to the spectral sample is the following: I fix some R, a medium box size, N to the gamma with gamma between zero and one, and I take the super-grid, and I say, well, let's take an i.i.d. sequence again, but now with some small probability -- which is the probability that the spectral sample intersects that box -- I take a large value X, which is exactly the expected size of the spectral sample given that it intersects the box, and otherwise it's zero. So it is more similar. And then I take the sum of these independent things. Now, some generalization of the central limit theorem still tells you what concentration you get. It is a bit more spread out than the completely uniform thing, but it is still concentrated around its expectation. Now, if you have this self-similarity stuff going on on every level that you can imagine, then you don't expect this [inaudible] anymore. So somehow you get the same [inaudible] on every level, and you get that you expect only tightness around the mean. So you expect that the probability that the spectral sample is non-empty but has size smaller than lambda times its expectation should go to zero as lambda goes to zero. And we would like to get the exact rate depending on lambda, and we did get the exact rate depending on lambda. So how can you get such a thing? Okay. So if you had a lot of independence, then how would the proof of tightness work? For example, for the zero set of a one-dimensional simple random walk you can run the following proof. So, condition on the event that the zero set intersects an R box, and condition on everything else -- the set of zeros in all other R boxes on the interval. With all this conditioning, still, for zeros it's obvious that the probability that you have at least a constant times the expected number of zeros there is at least some positive constant -- you have some positive probability for that. So this is just the second moment method, [inaudible] or what is it, [inaudible], or something like that. Okay. So this is easy for the zeros. And another one: what is the probability that you intersect only very few R boxes? It should be very close to the probability that you intersect only one R box. So you need this to be sub-exponential in K. This is the statement.
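As I understand the statement on the slide, the two inputs in this zero-set toy case are roughly the following (a hedged reconstruction, with Z the zero set of a length-N simple random walk and [0, N] tiled by R-intervals):

```latex
% (i) Conditional second-moment estimate: there is c > 0 such that, for any
%     R-interval I and any conditioning on the zeros outside I,
\[ \Pr\Big[\,|Z\cap I| \ge c\,\mathbb{E}\big[|Z\cap I|\,\big|\,Z\cap I\neq\emptyset\big]
      \;\Big|\; Z\cap I\neq\emptyset,\ \text{zeros outside } I \Big] \;\ge\; c . \]
% (ii) Sub-exponential clustering: for some g(k) growing sub-exponentially in k,
\[ \Pr\big[\,Z \text{ hits exactly } k \text{ of the } R\text{-intervals}\,\big]
   \;\le\; g(k)\,\Pr\big[\,Z \text{ hits exactly one } R\text{-interval}\,\big]. \]
```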
So this statement is somewhat a result of the clustering effect that I was telling you about. If you have these K R-intervals, and you think of them as pretty far from each other, then you have to pay K times the probability to get to the set and leave the set, and this cost is too big -- it's not balanced. You do have more ways if the K R-boxes are far from each other -- you have a lot of combinatorial ways to play with them -- but this combinatorial entropy does not balance the cost that it takes. So with a little bit of work you can prove this for the zeros. Okay. And once you have these two properties, how does the proof look? So you take some R, your favorite R: what is the probability that the zero set is at most c times this expectation? Well, you break it into the events that the number of R boxes intersected is K. Now, conditioned on having K R-boxes intersected, for each of them, independently of all the others -- this is the conditioning -- you have probability c to have a lot of points in there. So the probability that you fail every time is at most one minus c to the K. And so you have this line, and from this, by property two, this thing grows sub-exponentially in K and this thing decays exponentially in K, so you get that, up to constant factors, this probability is the probability that you have only one intersection. And that, you know, for simple random walk is -- well, the number of these boxes is N over R, and the probability that you get there and leave and don't come back is N over R to the minus 3 over 2. And of course the opposite direction is obvious: once you condition on having exactly one box intersected, then this is the number of intersections that you expect. So then you take lambda to be the ratio between c times the expectation at scale R and the expectation at scale N, and if you take this lambda and plug it in, you get for the simple random walk that the probability that the size of the zero set is less than lambda times the expectation is roughly lambda. Okay. I have three minutes. So the problem is that we have no idea whether this amount of independence is true for the spectral sample or not. We have only very limited independence. So you can condition on the spectral sample intersecting a ball, an R by R box, but you cannot condition on anything else: you must condition on some set W not being intersected. This W could be anything, not very close to the box, but all the conditioning that we can handle is completely negative conditioning -- you must condition that it doesn't intersect. And under this conditioning we can prove this conditional second moment estimate, and hence this positive probability result. So where does this negative conditioning come from? Why is it only negative conditioning? The probability that we can handle is the probability that the spectral sample is contained entirely in some set U, some subset of the bits. So you're just taking all the coefficients corresponding to sets inside that U. And this is like a projection. So it's not very surprising that it's exactly the projection given by taking a conditional expectation. So this is this conditional variance, as we can easily check.
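The identity behind this "only negative information" restriction, spelled out (standard, and hopefully what the slide says):

```latex
% Conditioning on the bits in U is the projection onto characters supported in U:
\[ \mathbb{E}\big[f \,\big|\, (\omega_i)_{i\in U}\big]
   \;=\; \sum_{S\subseteq U} \hat f(S)\,\chi_S ,
   \qquad\text{hence}\qquad
   \Pr\big[\mathscr S_f \subseteq U\big]
   \;=\; \frac{1}{\|f\|_2^2}\sum_{S\subseteq U}\hat f(S)^2
   \;=\; \frac{\mathbb{E}\big[\,\mathbb{E}[f\mid(\omega_i)_{i\in U}]^2\,\big]}{\|f\|_2^2} . \]
% Dropping S = \emptyset turns the numerator into the variance of this
% conditional expectation, which for percolation crossings can be estimated in
% terms of arm events in the physical configuration.
```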
Which means, for example, that for two disjoint sets A and B, the probability that the spectral sample intersects B but doesn't intersect A can be written as the probability of being contained in one set minus the probability of being contained in another. So this is this formula, and then you use the [inaudible] theorem for computing it, or something like that. The point is that the whole strategy for the spectral sample of critical percolation depends on the fact that these sorts of probabilities -- this inclusion formula -- we can understand in physical space, because these conditional probabilities and conditional expectations can be rephrased in terms of four-arm events here and four-arm events there. So whenever you have a Boolean function for which you can understand this sort of probability in physical space, you have a chance to run our strategy. Okay. Now, so we have these two results -- actually I didn't say how to prove them. One is this replacement that works when you only have negative conditioning, and the other is a clustering effect similar to the zeros of Brownian motion. Each of them was a lot of work to prove. But assuming you have these two, what do you do? Well, the trouble is that you cannot repeat the calculation I showed you here, because I cannot just take the K-th power the way I would like: whenever I fail to have a large enough intersection, I cannot just condition on that and go on, because I don't have the independence. I could go on only if failure meant finding no points at all -- if failure meant we found nothing in the box, then we could sort of go on, because that is negative information. So there is a simple remedy for that, a nice idea. You take an independent random dilute sample: a random subset, chosen independently of everything, with the right density, such that if the size of the spectral sample in an R box is small, then it is likely that it doesn't intersect the dilute sample there, and if the size is large, then it is likely that it does. Which means you can tell whether it is small or big just by looking at whether the intersection is empty or not. So failure becomes emptiness, and the only information you gain by looking at the dilute sample is negative information. So now you have a chance. However, there was another problem. When you want to put one and two together, you condition on the number of R boxes that are intersected -- that is how you use that intersecting only few boxes is unlikely -- and this conditioning is not negative conditioning. So we cannot just take the K-th power of this (1 minus c), because there is this positive conditioning. And what do you do with the positive conditioning? It looks like a silly technical problem, and while Christophe and I were digesting how silly this problem is, that's when he first came up with one solution, and then while we were digesting that solution, he came up with another solution. So the first solution: you can try something like scanning the boxes sequentially with this random dilute sample, and if you see that you didn't succeed -- you didn't find anything, it's empty, empty, empty, empty -- you go on.
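[Editor's sketch of the identities behind this, in standard notation for the spectral sample rather than the speaker's own symbols: for $f : \{-1,1\}^n \to \{-1,1\}$ the spectral sample $\mathscr{S}_f$ has law $\hat{\mathbb{P}}[\mathscr{S}_f = S] = \hat{f}(S)^2$, and for any $U \subseteq [n]$,
\[
\hat{\mathbb{P}}\bigl[\,\mathscr{S}_f \subseteq U\,\bigr] \;=\; \sum_{S \subseteq U} \hat{f}(S)^2 \;=\; \mathbb{E}\Bigl[\;\mathbb{E}\bigl[f(X)\,\big|\, X_i,\ i \in U\bigr]^2\;\Bigr],
\]
the squared norm of a projection, which is why only such "contained in" (negative) events are directly accessible. For disjoint $A, B \subseteq [n]$ this gives
\[
\hat{\mathbb{P}}\bigl[\,\mathscr{S}_f \cap B \neq \emptyset,\ \mathscr{S}_f \cap A = \emptyset\,\bigr] \;=\; \hat{\mathbb{P}}\bigl[\,\mathscr{S}_f \subseteq A^{c}\,\bigr] \;-\; \hat{\mathbb{P}}\bigl[\,\mathscr{S}_f \subseteq (A \cup B)^{c}\,\bigr].
\]
The dilute-sample trick rests on the elementary fact that if each bit is kept independently with probability $1/s$, then conditionally on $|\mathscr{S}_f \cap B| = m$ the kept part of $\mathscr{S}_f \cap B$ is empty with probability $(1 - 1/s)^m$, so emptiness distinguishes $m \ll s$ from $m \gg s$ while the information gained is only of the form "$\mathscr{S}_f$ avoids a given set".]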
So if you could say that with good probability you had many, many chances, so you had to find something, then you would be fine. The question is how you put in the information that we actually had a large number of boxes intersected. So the first solution was a filtered Markov inequality, which is this. You have some nonnegative variables -- it's extremely general -- some nonnegative variables X_k and a monotone increasing filtration F_k; this is what you have learned so far during the scanning process. And the Y_k are the conditional expectations. Then the probability that the conditional expectations add up to something large while the actual variables all stay small is small. It has a very simple proof -- you can try to [inaudible]. So this sort of gave a solution to our troubles. The trouble is that this is just a Markov-type inequality; it's too weak, and we didn't get the full result that we wanted. So then he came up with a better solution, which is where I will end. Almost. Okay. So this is a strange large deviation lemma for very dependent things. And this is exactly the type of hypothesis we have: instead of scanning sequentially, we know that given the information that we found nothing in any particular set of boxes, and given that the spectral sample does intersect a box, we have a positive probability that there are actually a lot of points there, so the random dilute sample will find them. So we have this type of thing. And in order to get an exponential large deviation bound, instead of doing the sequential scanning you average these inequalities all at once: you take a suitably chosen random set J, take the expectation of this inequality over J, and you get something. That's where we asked him how he came up with this proof and this result, and he said that he tried not to think probabilistically: we had a bunch of inequalities, he wanted to get another inequality, so he tried to be an analyst and do something. Okay. So this is what [inaudible]. So with this, the final result we get is that the probability that the spectral sample is nonempty but smaller than the expected size within an R box is basically equivalent to the probability that it is contained in a single R by R sub-square -- and that probability we know; I showed it to you on the second slide. So we do get the result, and we get that the scaling limit of the spectral sample is a conformally invariant Cantor set with Hausdorff dimension three-quarters. You could run the same machinery for the pivotals, but you get a different exponent there. It's overkill to do it that way, though, because the pivotals have much more independence, so you don't really want to do that. And I stop here. Thanks. [applause]. >> Amir Dembo: Comments or questions? >>: So if I understand, one of the worries is that this thing is not an [inaudible]. So can you prove [inaudible] -- so do you expect clustering, I guess? >> Gabor Pete: Yes. >>: So can you prove that it's [inaudible]? >> Gabor Pete: Well, there is something: the pivotals have a lot of independence. There you can say that however the set of pivotals looks out here, if you condition to see pivotals in here, then the configuration inside here will be basically independent of all the other conditioning that you've made. So for pivotals we know that. The pivotals still have the clustering effect; they have the same type of clustering as the spectral sample. But they have this independence, this approximate independence thing.
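[Editor's paraphrase of the shape of this lemma; the notation and the exact form of the conclusion are the editor's reconstruction, not a quotation of the precise statement. Let $Y_i$ be the indicator that the spectral sample meets box $i$, and let $X_i \leq Y_i$ be the indicator that the dilute sample finds a point of it there. The only hypothesis available is negatively conditioned: there is a $c > 0$ such that for every set $J$ of boxes and every $i \notin J$,
\[
\mathbb{P}\bigl[\,X_i = 1 \;\big|\; Y_i = 1,\ X_j = 0 \ \text{for all } j \in J\,\bigr] \;\geq\; c .
\]
Averaging such inequalities over a suitably chosen random $J$, instead of scanning the boxes one at a time, gives a bound of roughly the form
\[
\mathbb{P}\Bigl[\,X_1 = \cdots = X_n = 0 \ \text{and}\ \textstyle\sum_i Y_i \geq k\,\Bigr] \;\leq\; e^{-\Omega(c\,k)},
\]
which decays exponentially in the number of boxes met and so beats the sub-exponential clustering entropy in the earlier calculation.]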
And we don't have a clue about the spectral sample. Actually, there is one thing I wanted to mention: Gil Kalai had a conjecture that the entropy of such sets, similar sets, should be bounded just by the expectation of the size of the set, so without the logarithmic factor that you would get from a uniform set. That's something I think I can prove for the pivotals but not for the spectral sample. [applause]
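[Editor's note: the conjecture referred to appears to be the Friedgut-Kalai entropy/influence conjecture; in spectral-sample language it says that there is a universal constant $C$ with
\[
\sum_{S} \hat{f}(S)^2 \log \frac{1}{\hat{f}(S)^2} \;\leq\; C \sum_{S} |S|\,\hat{f}(S)^2 \;=\; C\,\mathbb{E}\,|\mathscr{S}_f| ,
\]
that is, the Shannon entropy of the spectral sample's law is at most a constant times its expected size, whereas a uniformly random set of the same density would carry an extra logarithmic factor.]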