>> Yuval Peres: [inaudible] very happy to have Ryan O'Donnell, who will tell us about a striking generalization of the famous KKL paper: KKL, Kruskal-Katona and monotone nets.

>> Ryan O'Donnell: Thanks, Yuval. It's great to be back here in Seattle at Microsoft. So I'm going to be talking about, I guess, two actually joint works with Karl Wimmer, formerly of Carnegie Mellon and now at Duquesne University.

Okay. So let's start with what monotone nets are. By this I mean a net for the set of all monotone Boolean functions. So let's say a set of functions H is a gamma-net for the set of monotone functions if every monotone Boolean function is kind of close to one of the functions in H. And specifically it won't be that close, so we'll instead look at the correlation of the functions. So we want that every function little h in H should have at least gamma correlation with every -- well, every monotone function should have at least gamma correlation with something in H.

So just to remind you of definitions, a Boolean function is monotone if changing 0s to 1s in the input can only make the output go from 0 to 1. And by correlation of two functions H and F I mean the probability, over a random X uniform on 0,1 to the N, that H equals F, minus the probability that they differ. So this is a number between minus 1 and 1. It's 1 if they're equal, minus 1 if they're opposite, and it's related to the distance between the two functions in this way. But there are lots of monotone functions, so it will be hard to come up with a net that's covering them. So we'll actually be interested in distances that are pretty close to a half. We just want functions that are slightly correlated with all the monotone functions. Is that clear? Okay.

So actually this problem has been studied surprisingly a lot. So I'll tell you about the previously known results. This is Kearns and Valiant from '89. And they show that this -- well, I don't have a pointer. They showed that this collection of N plus 2 Boolean functions -- the all-zeros function, the all-ones function, and the N coordinate projection functions -- is like a sort of good net for the collection of all monotone functions. So every monotone function has at least 1 over N correlation with the all-zeros, the all-ones, or one of the N coordinate projection functions. I should say, I guess, that if you just take the all-zeros function and the all-ones function, then every function has correlation at least 0 with one of those, right, because it's either more likely to be zero or more likely to be one.

>> [inaudible]

>> Ryan O'Donnell: Yeah. So the idea is to keep H small. Let's say polynomial size. Okay. So this was improved, I guess, six years later. Just the analysis was improved. So [inaudible] showed that the same net is actually better than this: it achieves correlation at least log N over N. And this is [inaudible], some Carnegie Mellon guys, in 1998. They took an even simpler-looking net that only contains three functions: 0, 1, and the majority function. And they showed that every monotone function is either a little bit correlated with 0, with 1, or with the majority function. And the amount was 1 over root N. And they also had another interesting aspect in their paper. They showed that any collection of polynomially many functions can achieve correlation at most log N over root N. So there's a little gap left in what they did. Well, one thing we do in this paper is close this gap.
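As a concrete companion to the definitions above, here is a minimal brute-force sketch (not from the talk; the function names are mine) that computes the correlation of two Boolean functions and checks the Kearns-Valiant net claim exhaustively for a tiny n:

```python
from itertools import product

n = 4
points = list(product([0, 1], repeat=n))  # all of {0,1}^n

def correlation(f, g):
    # Pr[f = g] - Pr[f != g] under the uniform distribution on {0,1}^n
    agree = sum(1 for x in points if f(x) == g(x))
    return (2 * agree - len(points)) / len(points)

def is_monotone(table):
    # monotone: raising any coordinate from 0 to 1 never lowers the value
    for x in points:
        for i in range(n):
            if x[i] == 0 and table[x] > table[x[:i] + (1,) + x[i + 1:]]:
                return False
    return True

# The Kearns-Valiant net: the constants 0 and 1 plus the n coordinate projections.
net = [lambda x: 0, lambda x: 1] + [(lambda i: (lambda x: x[i]))(i) for i in range(n)]

worst = 1.0
for bits in product([0, 1], repeat=len(points)):  # every Boolean function on n bits
    table = dict(zip(points, bits))
    if not is_monotone(table):
        continue
    f = lambda x: table[x]
    worst = min(worst, max(correlation(f, h) for h in net))

print(f"n={n}: worst best-correlation with the net = {worst:.3f} "
      f"(Kearns-Valiant analysis: about {1/n})")
```

This is only feasible for very small n, of course; it is just meant to make the gamma-net definition concrete.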
So the natural thing is just to combine these two nets. And indeed we show that if you take these N plus 3 functions -- 0, 1, majority, and the N coordinate projection functions -- then every monotone function has at least log N over root N correlation with one of these functions. Okay. And we know it can't be any better, so this is actually tight.

Okay. So that's nice, I guess, but, I don't know, what is the interest in this problem? It seems more like a curiosity, really. Well, actually, the reason that all of these papers, including us, actually, were looking at this problem was because of a notion in learning theory, computational learning theory. So actually these are two heroes of learning theory, [inaudible], and in this paper they address -- or they introduce -- the notion of weak learning, which is the learning theory model where instead of striving to get a really excellent hypothesis you only strive to get a hypothesis that's a little bit better than trivial. And you might do this if the set of functions you're trying to learn is quite complicated. And they were specifically looking at trying to learn the class of all monotone functions.

So I'll give you a quick explanation of the connection to learning theory. If you're not that familiar with it, then you can just tune out for a bit, because I won't explain exactly everything I say here. So a corollary, for example, of our net result is that the set of all monotone functions is weakly learnable with advantage this same quantity, log N over root N, under the uniform distribution. And the proof of this fact, the algorithm, is extremely simple. You just draw a bunch of examples. So the learning model is: there's some unknown, in this case monotone, function F, and you can get examples from it, which are pairs -- X, where X is chosen uniformly at random, and the label F of X. And you're trying to come up with a hypothesis that is somehow close to the unknown function F. So this result immediately gives you this simple algorithm: just draw a bunch of examples and check which of the functions in the net -- either 0, 1, majority or the N projection functions -- empirically seems most correlated with the labels F of X that you're getting, and whichever one seems best, just pick that one and let that be your hypothesis. Okay. And it's easy to see, if you just do some sampling, some [inaudible] bounds or whatever, that the empirical correlation will be close to the true correlation. So this theorem tells you that one of these N plus 3 functions has at least this much correlation, and that's the learning-theoretic advantage that your hypothesis achieves.

>> [inaudible]

>> Ryan O'Donnell: Yeah. Yeah, probably you invert this and square it, but then you need to multiply by log N because you're taking a union bound over N plus 3 functions. Actually, a little quirk -- I don't mention this, but a little quirk of how we prove this is that you can actually achieve the same thing with only N to the epsilon many samples. But that's taking this too far into the guts of learning theory, so I'll skip over that. And you can see actually that any time you have a net result like this, where you say every function in this class, e.g. monotone functions, is, you know, somewhat close to one of a small set of functions, then you can do the same algorithm. Just try everything in the net, pick which one looks best. So all of the results here have the same learning theory corollary.
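The "try everything in the net" learner described above is simple enough to write down directly. Here is a minimal sketch under my own assumptions -- the hidden target function and all names are mine, chosen only for the demo; the net is the 0, 1, majority, coordinate-projections net from the talk:

```python
import random

n = 21  # odd, so majority has no ties

def target(x):
    # stand-in for the unknown monotone function F; here x_0 OR (x_1 AND x_2)
    return x[0] | (x[1] & x[2])

def majority(x):
    return int(sum(x) > n // 2)

# The net from the talk: 0, 1, majority, and the n coordinate projections.
net = [("zero", lambda x: 0), ("one", lambda x: 1), ("majority", majority)]
net += [(f"x_{i}", (lambda i: (lambda x: x[i]))(i)) for i in range(n)]

def weak_learn(num_examples=20000, rng=random.Random(0)):
    # draw uniform labeled examples (x, F(x)) ...
    xs = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(num_examples)]
    ys = [target(x) for x in xs]

    def empirical_correlation(h):
        agree = sum(h(x) == y for x, y in zip(xs, ys))
        return 2 * agree / num_examples - 1

    # ... and output the net element whose empirical correlation is largest
    return max(net, key=lambda named: empirical_correlation(named[1]))

name, hypothesis = weak_learn()
print("hypothesis chosen from the net:", name)  # expect x_0 for this target
```

For this particular target, the coordinate x_0 has the highest correlation, so that is what the learner should output; the net theorem is what guarantees that some net element always has nontrivial correlation.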
In fact, I should mention that this result, which is the upper bound, was not actually an upper bound about nets; it was an intractability result for learning theory. So this result actually showed that any learning algorithm for monotone functions which only sees polynomially many examples information-theoretically cannot achieve advantage better than this amount. So, in particular, the learning-by-net strategy can't do better, and that implies this statement. Okay. So that's the connection to learning theory, which motivated the study of monotone nets. And, you know, it's kind of a somewhat obscure problem in learning theory maybe, and only if you're a true Boolean function nerd like me might you be really interested in this problem. But as is sometimes the case, I think it turned out that the how, or the tools used to prove these results, proved to be more interesting than the actual result itself.

So let me tell you how each of these papers achieved their result. This one basically follows from the expansion of the hypercube, the expansion of it as a graph. This result follows immediately from the KKL theorem; I'll say what it is soon. This result follows quite quickly from the Kruskal-Katona theorem. And this result follows first by proving a generalized KKL theorem and using that to prove a robust Kruskal-Katona theorem, and then that implies the result. So these are the two things that I mainly want to tell you about in this talk, and you may remember them from the title.

Okay. So let me first remind you about the KKL theorem. We have, as was mentioned, the first K, Jeff Kahn, sitting here in the audience. I guess I haven't even said who they are: it's Jeff Kahn, Gil Kalai, and Nati Linial, from '88. Okay. So this is a very, very famous theorem about Boolean functions, one of the most amazing theorems in the area. And it says this. Here's the theorem. If you have any Boolean function F, there always must exist some coordinate between 1 and N which has a somewhat high influence on the function F. So what does that mean? So first you kind of have to scale this statement by some factor. This is the fraction of points where F is 0, and the fraction of points where F is 1. We mainly are just concerned about the case where F has roughly as many 0s as 1s, so this is about a half or at least a constant, and this is about a half or at least a constant. So for the rest of the talk just think of this factor as an absolute constant. So then it says that a function F which is sort of roughly balanced always has a coordinate with at least log N over N influence.

Well, what is this influence? The influence of coordinate I is just the probability that flipping the Ith coordinate makes a difference to the function F. So, more precisely, the experiment is: you pick a random string X, a random point in the hypercube, then you flip the Ith coordinate and you see whether F takes different values on the two resulting points. So it's the probability that this coordinate is relevant; it's somewhere between 0 and 1, and the theorem of KKL is that there's always a coordinate that has influence at least log N over N. I mean, so far it's not clear, is that good or bad or what. But it's good, and it's tight. We'll understand a little bit more in the next few slides. Any questions so far, by the way? Right.
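For reference, here is the influence definition and the KKL statement roughly as I read it off the slide, with the balance factor written as the product of the two fractions just mentioned and c an absolute constant:

$$\mathrm{Inf}_i(f) \;=\; \Pr_{x \sim \{0,1\}^n}\bigl[f(x) \neq f(x^{\oplus i})\bigr], \qquad x^{\oplus i} := x \text{ with the } i\text{th coordinate flipped};$$

$$\text{KKL: for every } f:\{0,1\}^n \to \{0,1\}, \quad \max_{i \in [n]} \mathrm{Inf}_i(f) \;\ge\; c \cdot \Pr[f=0]\,\Pr[f=1] \cdot \frac{\log n}{n}.$$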
So before I talk a little bit more about the KKL theorem, let me just remind you that I told you this net result with log N over N -- so somewhat far away from what we end up shooting for -- but this log N over N thing follows truly immediately from the KKL theorem. So let me see why. So this is KKL. And let's imagine now that in particular F is a monotone function. If F is a monotone function, then you can interpret the Ith influence in another way. You can see that as follows. In this experiment you pick some X, right, which is some long string, and then maybe this is the Ith coordinate, and you consider what happens when it's 0 and what happens when it's 1, and you look at F's value. So it has two values. So it could be 0 on both of them, in which case you don't score any points in this probability. It could be 1 on both of them, in which case again this event doesn't happen. It could be 0 and 1, in which case you get a point here -- they're different -- or it could be 1, 0. But actually it can't be 1, 0, because it's a monotone function. If you change a 0 to a 1, that can't make a monotone function go from 1 to 0. So since these are the only three possibilities, you see it's also basically the probability that F of X equals XI, or, more precisely, it's the probability that F of X equals XI minus the probability that F of X differs from XI. Well, if you don't quite follow that calculation, it's very trivial, so take my word for it. This is precisely the correlation of XI with F.

So just to state it again, if F is a monotone function, then you can also interpret the influence as just the correlation of the Ith coordinate with the function F. And this is KKL. It says that there's always a coordinate with influence at least this much. So basically one of three things can happen. F can be mostly 0, in which case 0 is quite close to the function. F can be mostly 1, in which case 1 is quite close to the function. Or if neither happens, then both of these numbers are kind of absolute constants, KKL tells you that there's a coordinate with influence at least log N over N, and therefore that coordinate also has correlation at least log N over N. So that proves that 0, 1 and X1 through XN is a log N over N net.

>> [inaudible]

>> Ryan O'Donnell: Yeah. There were some more learning theory results in there. They also gave a 2 to the root N time algorithm for strongly learning every monotone function. That's a quite nice result. Great. So, yeah, I just told you about that. Let me actually just briefly go back and tell you about this, because it will also put the KKL theorem in a nice context for you. So this is an even earlier result. Yeah. So this is sometimes known as the expansion theorem or the [inaudible] inequality or the edge isoperimetric inequality of the hypercube. This E of F -- there's a lot of symbols, sorry about that, but this E of F represents the average of the influences, okay, the average of the influences. And, again, if you think about this as a constant, this is saying that the average of the influences is always at least constant over N. So you can see already that KKL is saying something interesting. It's saying the average may be at least constant over N, but the max is always at least log N over N. And to see the connection to expansion on the hypercube or edge isoperimetry, what is the average of the influences? Well, in the influence experiment, right, you pick a random X and then you flip the Ith coordinate.
So if we're going to also average over I, it's like saying you pick a random X and then flip a random coordinate.

>> [inaudible] do you get the bound on the min also of the [inaudible]?

>> Ryan O'Donnell: You know, the thing is, the min of the influences can be quite small. It can be sort of proportional to the min of these two things. In fact, yeah, it can be 0, right, if [inaudible] -- thank you. That's a much --

>> [inaudible] the min is 0 over it.

>> Ryan O'Donnell: Yeah. Right. So the average of the influences can also be interpreted as this: you pick a random edge and look at the probability that F labels its endpoints differently. So, again, if you think of it geometrically -- if you think of blue as 1 and this brown as 0 -- F kind of makes, let's say, a subset of the cube, maybe the blue points. And then you count the fraction of edges that are on the boundary, that go from the inside to the outside. Okay. And so this is like an isoperimetric statement. It says that the fraction of edges on the boundary is always at least constant over N times, sort of, the volume of the smaller side. That's a familiar statement. This is a very trivial statement. You can prove it by induction on N. I think the original proof was Harper in '64. He was a math grad student at Oregon, our neighbor to the south. Okay. And this is also sharp, by the way, for these functions F that are like XI, just a coordinate projection function. Here's like a picture of that. Because if it's this kind of function, the probability that it's 1 is a half, the probability that it's 0 is a half, so those cancel with the four, and it's just saying that a 1 over N fraction of the edges go between the two sides, which is true, right: a 1 over N fraction of the edges in the cube go in the Ith direction. Okay. Any questions? Great.

So, yeah, this statement about the expansion of the hypercube, or edge isoperimetry, says that the average of the influences is at least, let's say, constant over N, and KKL says that actually the maximum is at least log N over N. I'm actually going to use a slightly refined version of KKL, which I guess maybe first appeared in a paper of Talagrand. It looks a bit funny, but this is the version I actually want you to remember, more so than to remember this. It says that if F is a function all of whose influences are smaller than, let's say, 1 over N to the .01 -- which is pretty big, right, 1 over N to the .01 -- if they're all smaller than this, then actually their average is a bit large. It's a bit funny: if they're all kind of small, then their average is a bit large. And you can see this is actually stronger than KKL. Why? How would you prove KKL using this fact? Okay. If F has an influence that's bigger than 1 over N to the .01, then we're certainly easily done; it's way bigger than log N over N. Otherwise the average of the influences is at least log N over N, so certainly one of them has influence at least log N over N. So remember this one, this version. Great. I should also mention that KKL and this theorem as well are known to be sharp, even for a monotone function. This function introduced by Ben-Or and Linial in '86 called Tribes has the property that its correlation with all of these N plus 2 functions is log N over N. So all of the influences are log N over N. Okay. Any questions about KKL? So a few more facts about it. It's kind of funny. You know, these influences are combinatorial notions, but there's no known combinatorial proof of KKL. Only analytic proofs.
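Since these are the statements the talk keeps returning to, here they are in symbols as I've transcribed them, with the same balance factor as before; the third line is the monotone-case identity from the previous slide:

$$\mathcal{E}(f) \;:=\; \frac{1}{n}\sum_{i=1}^{n}\mathrm{Inf}_i(f) \;\ge\; \frac{4}{n}\,\Pr[f=0]\,\Pr[f=1] \qquad \text{(edge isoperimetry; tight for } f(x)=x_i\text{)};$$

$$\text{if } \mathrm{Inf}_i(f) \le n^{-0.01} \text{ for every } i, \text{ then } \mathcal{E}(f) \;\ge\; c\,\Pr[f=0]\,\Pr[f=1]\,\frac{\log n}{n} \qquad \text{(the version of KKL to remember)};$$

$$f \text{ monotone} \;\Longrightarrow\; \mathrm{Inf}_i(f) \;=\; \Pr_x[f(x)=x_i] - \Pr_x[f(x)\neq x_i], \ \text{ the correlation of } f \text{ with } x_i.$$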
You might say that this log N kind of looks small. I mean, you know, you're beating the trivial bound here by log N, it doesn't look that big, but actually the fact that it goes to infinity is crucial. I mean, that's the awesome aspect of it. And it's exactly why it has all these nice applications. So, for example, in the original paper they used it to show that in an idealized model of a two-party election you can always bribe a little-o-of-one fraction of the voters and force the outcome of the function -- or the election -- with high probability. The fact that this goes to infinity is the reason why monotone graph properties have sharp thresholds. And it's also the reason why the sparsest-cut SDP has a superconstant integrality gap, if you know what that means. If you don't, you can ask [inaudible], who is right there. Okay. So that's the beauty of KKL.

And I'll just say -- I mean, I won't give the proof. It's a little complicated. But I'll kind of give some kind of sketch of how I think of the proof. A very sketchy sketch. Somehow the idea, I feel, is that this expansion theorem we mentioned is not tight for tiny sets. So sets of size about a half may have only this sort of expansion by a factor of 1 over N, but a really tiny set in the hypercube actually has a lot of edges coming out of it. Somehow the idea is to apply this to each set, which I'll call maybe delta I, which is the points that are on the boundary of F in the Ith direction. Remember, the hypothesis of this refined KKL was that if all of the influences are small, then somehow their average is a bit large. So, I mean, if all the influences are small, it's like saying all these sets have small volume. So then you can maybe hope to apply this idea to that fact and then somehow average it all together. It doesn't work out so nicely as that; that's why you have to bring in these analytic ideas. But somehow I think of that as the idea. I'm not really sure what that picture illustrates.

So for small sets of the Hamming cube, this is also an easy fact. I mean, Harper also proved this in '64. I mean, he showed a sharp edge isoperimetric inequality: if you give me the size of a set in the hypercube, I'll tell you the best one for the purposes of making the edge boundary small. And he actually showed this. If G is, let's say, the indicator of a subset in the cube, then its boundary is at least 2 over N times -- let's say -- the volume of it, times this extra log factor. And the sharp subsets are like subcubes. And that's somehow why the log comes in.

>> [inaudible]

>> Ryan O'Donnell: Pardon me?

>> [inaudible] boundary?

>> Ryan O'Donnell: Yeah. So I guess I wrote G instead of F here because I'm thinking of -- imagine G is the indicator of a smallish set. Yeah. So somehow you see there's this extra log that comes in that is kind of what you would like to exploit. Unfortunately, as I said, the proof doesn't quite look like that. You have to get some analytic notions in. And in particular, instead of applying this to a set, you want to apply it to a function: imagine G of X is the probability that a short random walk from X would lie in the boundary, the Ith boundary of this set F. Well, this is a real-valued function. So you can't -- I mean, this is about sets; you can't apply Harper's theorem. But there's a generalization of this theorem to real-valued functions, proved by I guess Gross in 1975, called the Log-Sobolev Inequality.
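Here is roughly the shape of that inequality for the uniform measure on the hypercube. I'm quoting it with an unspecified absolute constant C, since the exact constant depends on how the Dirichlet form is normalized:

$$\mathrm{Ent}\bigl[g^2\bigr] \;\le\; C \sum_{i=1}^{n} \mathop{\mathbb{E}}_{x}\Bigl[\bigl(g(x) - g(x^{\oplus i})\bigr)^2\Bigr], \qquad \mathrm{Ent}[h] := \mathbb{E}[h \log h] - \mathbb{E}[h]\log \mathbb{E}[h],$$

valid for all g mapping the hypercube to the reals. Applied to an indicator g = 1_A it recovers, up to constants, the edge-isoperimetric bound with the log factor just mentioned, but it also makes sense for the real-valued G just described, which is the point.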
And it's also equivalent to these things like hypercontractive inequalities or [inaudible] inequalities, which is how it was proved in the original KKL. So somehow you use this more powerful fact instead, and somehow this is the intuition. But it's tricky to actually do the proof. Okay. That's all I will say about the proof. But we can also stop for questions here, because I think this is the last thing I'll say about KKL for a while. Okay. Great.

So that's KKL. And we're going to put it aside for a little while and come back to it at the end. So we're going to now go to the second thing in the title, which is Kruskal-Katona, which uses some of the same concepts but is a bit different. So Kruskal-Katona, as you probably know, is a famous theorem in combinatorics from '63 or '68, depending if you're Kruskal or Katona. And it's usually stated in terms of set systems. But I'm going to stick with this notion of Boolean functions. And if you know the set systems, then you're sharp enough to do the translation on the fly to Boolean functions. Okay.

So what does the Kruskal-Katona theorem say? This is my picture, and will be for the rest of the talk, of the Boolean hypercube, 0,1 to the N. I draw it in this funny way where I kind of picture the Hamming weights going upward. So this is the set of all strings with Hamming weight 0, this is the set of all strings with Hamming weight N, and this is the set of all strings somewhere around here with Hamming weight N over 2. And there are many more of those, which is why I draw the picture in this funny way. And the Kruskal-Katona theorem is concerned with just a particular slice, just the set of all Boolean strings that have a fixed Hamming weight K. And I'll denote that slice by, like, N choose K -- the set of all Boolean strings that have Hamming weight K. So there's this slice. And you can imagine you have some function F, a Boolean function F on the entire Boolean cube, but you'll just focus in on what's going on on this slice. And I'll introduce this notation, mu sub-K, for just the fraction of points in the slice where F is 1. It's like the density of F in this slice, if you think of F as the indicator of a subset.

So the Kruskal-Katona theorem is all about comparing what's going on at one slice to what's going on at the next slice up, or perhaps the next slice down. But we'll say next slice up. And in fact it's all about trying to make a statement like this. Imagine you have a monotone Boolean function on the whole cube and its density at level K is something. What can you say about its density at level K plus 1? By virtue of the fact that it's monotone, right, you would know that there are some points where F is 1 up here, because any time you have a point on the K slice where F is 1, and then you change a 0 to a 1, by virtue of monotonicity the resulting string has to be one where F is 1 up here, right? And in fact it's a very trivial exercise to prove that the density must go up as you go up the slices. So it's very trivial to prove that the density of a monotone function at level K plus 1 is at least what it is at level K. The point of Kruskal-Katona is to give a better statement than that, to give a sharp statement. And that's exactly what Kruskal-Katona does. It actually tells you, if you fix K and you tell me exactly the density of a monotone function at level K, exactly what F should be so as to try to minimize the density at the next level.
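Before going on, here is a tiny computational illustration of the objects in play -- the slice densities of a monotone function and the trivial fact that they increase with the level. The sample function and the names are mine, not from the talk:

```python
from itertools import combinations
from math import comb

n = 10

def f(x):
    # a sample monotone function for the demo: x_0 OR x_1
    return x[0] | x[1]

def slice_density(f, k):
    # mu_k: the fraction of Hamming-weight-k strings on which f equals 1
    ones = sum(f(tuple(1 if i in support else 0 for i in range(n)))
               for support in combinations(range(n), k))
    return ones / comb(n, k)

densities = [slice_density(f, k) for k in range(n + 1)]
print([round(mu, 3) for mu in densities])
# monotone => the densities never decrease as the level goes up
assert all(densities[k + 1] >= densities[k] for k in range(n))
```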
The extremal F is like the first strings in colexicographic order or something, but the point is it just tells you the exact best inequality that you can put here. Now, it's actually a little complicated to state. And if I were to state it, it would look like this: mu sub K plus 1 is at least some complicated function of mu K and K. So people often quote, instead of the actual theorem, a corollary that's a bit easier to deal with. And [inaudible] has one; [inaudible] and Thomason have another. I'll actually even further simplify them. So this is like a corollary of a corollary of Kruskal-Katona. It says that the density of a monotone function at level K plus 1 is bigger than the density at level K plus this amount. So how should you think about this amount? Basically I want to tell you it's something like 1 over N. So what I'm saying is I want to focus on the setting of parameters where, first of all, the slices we're talking about are somewhere in the middle -- so K over N is bounded away from 0 and 1, we're not talking about up here or down here -- and also I want to talk about the case where this density is also somewhere bounded away from 0 and 1, so we're not talking about almost completely full or empty slices. So I will always just care about these two settings of parameters. In that case, if mu K is a constant, then this is all a constant, and if K is a constant fraction of N, then this is like a constant over N. So it's just this -- I mean, this is a corollary of a corollary of a corollary: the density goes up by at least some constant over N. Okay. Does that make sense?

And this corollary of a corollary of a corollary is tight. I mean, KK is exactly tight -- it is the exact best answer -- but we haven't lost anything yet here. And it's very easy to see. F of X could be this very simple monotone function XI. And so what's the density of the XI function at level K? It's like asking: if I pick a random string of N bits with exactly K 1s, what's the probability that the Ith coordinate is 1? Well, it's K over N. And so the density at level K plus 1 is K plus 1 over N, and the difference is 1 over N. So Kruskal-Katona tells me that the least amount by which the density can go up for a monotone function is like 1 over N, and this is tight. So you play around with it, and you might wonder: are examples like this the only examples where the density increase is so small? Oh -- this is the lead-in to some slide that's like three slides from now.

Actually, maybe in the interest of time -- I was going to show how -- well, okay, I'll show this. So this Kruskal-Katona corollary actually easily implies this old net result, if you remember it, that either 0, 1 or majority has at least this much correlation with -- for any given monotone function, one of these three functions has at least 1 over root N correlation with it. Let me quickly sketch that. So imagine F is a monotone function. And we'll sort of divide up the slices around the middle. Let me just assume that its density at the middle slice is about a half, okay? If it's way bigger than a half, then probably 1 is quite correlated with it. If it's way smaller than a half, then probably 0 is quite correlated with it. So let's assume that it's a half, and then we'll show that majority is quite correlated with the function. We have some monotone function F, and its density at this slice is a half. Okay.
So then the Kruskal-Katona theorem tells us that the density at the next slice goes up by like some C over N. Okay. Then you apply it again: it's still monotone, so Kruskal-Katona tells you the density at this slice is at least 2C over N more. Okay. You keep doing this for a while. Let's say you do it for like root N slices, and therefore at this point, at this N over 2 plus root N slice, you know the density of F is at least a half plus constant over root N. You can do a symmetric thing going down and get that the density down here is smaller than a half minus C over root N. And now you're in good shape to conclude that majority has this much correlation with F. Because, you know, majority is 1 everywhere above the middle and 0 everywhere below the middle -- that's the definition of majority. So it's all 1s up here and all 0s down here. So on this piece you're sort of catching correlation C over root N with majority, because it's 1 up here. And on this piece you're kind of catching correlation C over root N with majority. And these two pieces occupy a constant fraction of the hypercube -- it's well known that a constant fraction of the hypercube is between these two levels, and therefore also a constant fraction is outside these two levels. So that's a pretty sketchy sketch, but that's how this net theorem quickly follows from Kruskal-Katona. Okay.

So you remember our monotone net theorem, which was like the final corollary of all this work: for every monotone function, one of these guys has at least log N over root N correlation with it. And you can imagine trying to give a similar proof. And in fact you'll get the exact same proof if you could just conclude that instead of the density going up by constant over N at each step, it went up by log N over N at each step. And you'd just gain a factor of log N. And so we exactly execute that idea by proving this robust version of the Kruskal-Katona theorem. So this is one of our main theorems. Let F be a monotone function, and let's say you're somewhere in the middle of the Hamming cube and the densities are also bounded away from 0 and 1. Basically it says that the density, when you go from level K to level K plus 1, always goes up by actually log N over N -- unless F is somehow strongly correlated with one of the N coordinate functions. So to state it in the contrapositive: if the correlation of F, sort of within slice K, with every XI is smaller than 1 over N to the epsilon, then the density jumps up by log N over N.

And I think you can probably imagine that once you have this theorem, concluding this net result is not too bad. I mean, it takes a couple of pages, because -- well, you know, basically the density is going up and up and up, and you're happy, unless at some level you have very large correlation with some function XI. And then you need to eventually deduce that indeed this guy has good correlation with F sort of everywhere in the cube. But you can see that they're kind of similar, and a little bit of work will take you from this to this. Okay. So this is the main theorem that we prove. And I think you can also see that it kind of should, if you remember, remind you of KKL, right? The KKL theorem said that if F is any function -- it doesn't have to be monotone -- and all of its N influences are smaller than 1 over N to the .01, then the average of the influences, which is like the edge boundary, is at least log N over N.
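To put the two statements side by side, here they are roughly as I've transcribed them from the slides (with epsilon fixed, and k/n and mu_k bounded away from 0 and 1 in the second line):

$$\text{KKL, refined: if } \mathrm{Inf}_i(f) \le n^{-0.01} \text{ for all } i, \text{ then } \frac{1}{n}\sum_{i} \mathrm{Inf}_i(f) \;\ge\; \Omega\!\Bigl(\frac{\log n}{n}\Bigr) \text{ for roughly balanced } f;$$

$$\text{robust Kruskal-Katona: if } \bigl|\operatorname{corr}(f, x_i)\bigr| \le n^{-\varepsilon} \text{ on slice } k \text{ for all } i, \text{ then } \mu_{k+1}(f) \;\ge\; \mu_k(f) + \Omega_\varepsilon\!\Bigl(\frac{\log n}{n}\Bigr).$$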
And this is also some kind of statement that the edge boundary -- or somehow the boundary between the K and K-plus-first level -- is at least log N over N. So what I'm saying is, you can tell that this also kind of looks like the KKL theorem. So we prove this robust KK, Kruskal-Katona, theorem by proving some new kind of KKL theorem. So what's the difference? The difference is that the KKL theorem is about functions on the whole cube, and this thing is kind of about functions restricted to a single slice, which is actually negligible -- each individual slice is a negligible fraction of the hypercube. So we kind of need a KKL theorem that's localized to a single slice. Any questions? Feel free to ask them if you have them. Okay. Right.

So, okay, let's imagine we wanted to now prove KKL but somehow localized to a slice. We want to prove something like this. Well, you can again say: okay, let F be a function mapping this slice, N choose K, into 0,1. Now let's try to show that one of its influences is large. But there's an immediate problem, though, which is: what is influence? Because, you see, the normal definition of influence doesn't make sense anymore. Imagine you have a function defined just on the set of all weight-K strings. The definition of influence is: pick a random string -- okay, you can pick a random string here -- but then you have to flip its Ith coordinate. But if you change a 0 to a 1 or a 1 to a 0, you will no longer have Hamming weight K. So you'll get a string that's not even in the domain of F. So it doesn't make sense.

Okay. So you invent -- this is sort of already given away here, but you invent a different notion of influence where instead of picking a random string and flipping the Ith coordinate, you pick a random string and you swap the Ith and Jth coordinates. And that's cool because if we swap the Ith and Jth coordinates, you won't change the Hamming weight of the string. So you at least get something that's still in the domain. And, of course, this doesn't do anything if they have the same value, XI equals XJ, but they may differ. Great. So we can invent this new notion of influence of a pair of coordinates on a function whose domain is the Kth slice: just the probability that, if you pick X and then you swap the Ith and Jth coordinates, that changes the value of the function. Okay. So that's some new definition. You can again define this E to be the average of all the influences -- now you're averaging over N choose 2 things. And, yeah, this is the theorem we actually prove, which is like a generalization of KKL, even the refined version: if all of the influences are smaller than 1 over N to the .01, then their average is at least log N over N. Okay. So, yeah, it looks kind of identical to the original KKL theorem. But, just to remind you, it's in a different setting: functions just on the Kth slice. And, you know, when we tried to prove this, it does seem harder than proving KKL, because the proof of KKL uses Fourier analysis in a deep way. And Fourier analysis is very nicely attuned to the situation where the domain carries a product distribution. You usually associate the product distribution with 0,1 to the N, but the uniform distribution on the set of strings of Hamming weight K is not a product distribution. So it makes it seem hard to envision using Fourier analysis. So that's why maybe it's a bit harder than the original KKL.
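To make the new definition concrete, here is a minimal sketch (my own toy example, not from the paper) that computes the swap influences of a function on a single slice by brute force:

```python
from itertools import combinations

n, k = 10, 5
slice_points = [tuple(1 if i in s else 0 for i in range(n))
                for s in combinations(range(n), k)]  # the Hamming-weight-k slice

def f(x):
    # a sample Boolean function on the slice for the demo: the "dictator" x_0
    return x[0]

def swap_influence(f, i, j):
    # Pr over a uniform weight-k string x that swapping coordinates i and j changes f(x)
    changed = 0
    for x in slice_points:
        y = list(x)
        y[i], y[j] = y[j], y[i]
        changed += f(x) != f(tuple(y))
    return changed / len(slice_points)

pairs = list(combinations(range(n), 2))
influences = {p: swap_influence(f, *p) for p in pairs}
print("max swap influence:", max(influences.values()))
print("average over the", len(pairs), "pairs:", sum(influences.values()) / len(pairs))
```

For the dictator x_0 on this slice, only the pairs involving coordinate 0 have nonzero swap influence, which is the kind of "one very influential direction" situation the slice-KKL theorem is about.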
>> [inaudible]

>> Ryan O'Donnell: Oh, you're like three or four slides ahead of me. I wish I had talked to you back in the day. Great. Yeah. We'll see it in a second. Okay. In summary, I kind of told you about all of these things, and the only thing I haven't really told you about yet is this KKL on a single Hamming slice. Okay. So the last part of the talk will be about this. And in fact we prove a KKL theorem in a more generalized setting -- although I will now admit to you honestly that we set up a nice generalized setting and then we have like two special cases in mind: A, the Hamming cube, and B, the slice of the Hamming cube. But, okay, we did it in a general setting anyway.

Okay. So what is this general setting where we're going to try to prove a generalized KKL? The setting is what's called a Schreier graph. I had never heard of what a Schreier graph is until I started this project. But it's a simple generalization of a Cayley graph, which I think more people have heard of. So it's a graph. The vertex set is some set X. And you imagine you have a group G which is acting on X in the group-theoretic sense. And you also imagine, like in a Cayley graph, that you have a subset of the group which is like a generating set for the group, U. And we'll also make the standard assumption that it's closed under inverses. Okay. It's a graph, so what edges do you put in? You look at each little X in big X, and you put in an edge from X to X acted upon by U, for each little U in the generating set. So here's X; you act upon it by all of the guys in the generating set, that gives you some other guys in capital X, and you just put an edge to all of them. Is there anything else on this slide? So just a couple of quick comments. First of all, this is an undirected graph because of this assumption that it's closed under inverses. So if you go here, if you acted upon this guy by U1 inverse, which is also in capital U, you come back to X. It's undirected. And it's also regular, right, because at each vertex you put one edge for each guy in capital U. It should also be connected, so maybe G should act transitively on X, but never mind. Okay. So whenever you have a graph, you also have like a natural -- oh, I should also say that when X equals G and the group action is just group multiplication, then that's the Cayley graph.

Okay. So you have a nice graph, and whenever you have a nice graph you can consider the natural random walk where you just start at a point and go to a random neighbor and keep walking. And luckily this is an undirected graph and it's regular, so the stationary distribution for this random walk will be the uniform distribution on capital X, which is pleasant. You could also interpret it as: you know, you start at a random little X in capital X, and then you just pick a random generator and act on your current location, you pick another random generator and act on your location. Yep?

>> Let's assume the action is transitive. It's just a quotient of the Cayley graph, right, and then you're essentially doing a random walk on the [inaudible] and looking at it projected down onto X.

>> Ryan O'Donnell: Everybody else is nodding, so yeah.

[laughter]

>> Ryan O'Donnell: Sorry. I'm not a mathematician really, so I'll -- yeah, I think that sounds right. I'm just a lowly computer scientist. But, yeah, sorry, yeah. Yeah. Okay. Yeah. So we have two examples mainly, where X is the set of all strings of Hamming weight K.
The group acting on it is the symmetric group, you know, because if you permute the bits of a string of Hamming weight K, it remains a string of Hamming weight K. And the generating set is the set of all transpositions; these are the swaps. The other case is the very simple Cayley graph of Z2 to the N -- that's the Hamming cube. The generators are the elementary vectors, you know, all zeros except for a 1, and the action of U, when you add it, is like flipping the Ith coordinate. Great. Okay.

So you can more generally define influences here, right? I mean, if you have a Boolean-valued function on the set capital X, you define the influence of the Uth generator to be the probability that F of X differs from F of X hit by the generator U. And it's natural, because of the random walk, to take X to have the uniform distribution on capital X. And again E would be the average of all the influences over the whole generating set. And, yeah, so once you set this up, then you just try to carry out the KKL proof at this higher level of generality, and you do it carefully. And whenever you seem to be doing something sort of specific to the Boolean cube, you try to stay at a higher level. And eventually you just give the proof. So the proof is very similar to the original proof. There's one thing that you kind of need that's trivial in the hypercube case, but somehow in the proof it seems like you really need this extra fact: that the generating set U is closed under conjugation, so u v u inverse is in capital U. If I'm saying words like this, I should also know things like quotients and [inaudible]. So, yeah, I apologize. Anyway, yeah, so you can basically just carry out the KKL proof in this setting with this extra assumption. And this extra assumption holds in the two main cases that we care about. In the hypercube case it's very easy because the group is abelian -- so, I mean, conjugation just doesn't do anything, it leaves it the same. And it also works out fine in this case of the Kth slice with the transpositions, because the transpositions form a conjugacy class of SN: conjugating a transposition, you get another transposition. So that's great. So you can prove it in both of these settings.

Now, what actual numbers do you get, or what quantitative statement do you get out? I'll come to that later. Let me just say a little bit about the proof, I guess, again at a very, very high level. As before, in some set X, maybe F is the indicator of these points, and again you define little G of X to be the real-valued function on capital X which is -- well, one for each U -- the probability that a short random walk starting from X lands in the Uth boundary set of F. So you have a random walk, and then maybe it lands at a point where, if you did a U step, that goes from inside F to outside F. And hopefully in this graph small sets have large expansion, like in the hypercube graph. And that's exactly quantified by the Log-Sobolev inequality for this Markov chain. Well, the Log-Sobolev inequality for this Markov chain would give you a statement like this. So hopefully not only should U be closed under conjugation, but this whole setup had better be in a configuration where, you know, there's a large Log-Sobolev constant.
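For concreteness, here is a small sketch of the slice example in this Schreier-graph language -- vertices, generators, edges, generator influences, and the natural random walk. It is a toy illustration under my own naming, not code from the paper:

```python
from itertools import combinations
import random

# The slice example: X = Hamming-weight-k strings, G = S_n permuting coordinates,
# U = all transpositions (closed under inverses, and under conjugation, since the
# transpositions form a conjugacy class of S_n).
n, k = 6, 3
X = [tuple(1 if i in s else 0 for i in range(n)) for s in combinations(range(n), k)]
U = list(combinations(range(n), 2))  # generator (i, j) = "swap coordinates i and j"

def act(u, x):
    i, j = u
    y = list(x)
    y[i], y[j] = y[j], y[i]
    return tuple(y)

# Schreier graph: an edge {x, u.x} for every vertex x and generator u.
edges = {frozenset({x, act(u, x)}) for x in X for u in U if act(u, x) != x}

def influence(f, u):
    # influence of generator u: Pr over uniform x in X that acting by u changes f(x)
    return sum(f(x) != f(act(u, x)) for x in X) / len(X)

def random_walk(x0, steps, rng=random.Random(0)):
    # the natural walk: repeatedly act by a uniformly random generator
    x = x0
    for _ in range(steps):
        x = act(rng.choice(U), x)
    return x

f = lambda x: x[0]  # a sample Boolean-valued function on X
print(len(X), "vertices,", len(edges), "edges")
print("average generator influence:", sum(influence(f, u) for u in U) / len(U))
```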
And just one word about this condition, why you need that U is closed under conjugation: somehow, when you're doing the proof, you need that if you do a U step followed by a random step, or a random bunch of steps, it's the same thing as doing some random steps and then a U step. I mean, somehow you use that in the proof. And that's basically equivalent to saying that U is closed under conjugation. Okay. Great.

So here's the theorem. In this exact Schreier graph setup where U is closed under conjugation -- this extra condition -- we get this. This is like combining the two parts of the Talagrand thing into one. It says that the average of the influences is at least rho, which is the Log-Sobolev constant for the random walk, times log of 1 over the maximum of the influences. Again, if the maximum influence is small, then this is large, and so this is like log of that. So if the maximum influence is small, then you sort of gain a log factor in the inequality. In particular, the very first paper that introduced Log-Sobolev inequalities is this paper by Gross in 1975. I mean, his first and main example was the simplest case, the hypercube with this set of generators, just the standard random walk on the hypercube, and he computes by induction that the Log-Sobolev constant is 2 over N. So you can take this general theorem, just plug in 2 over N here, and you get constant over N times log of 1 over the maximum influence. So that's exactly the KKL -- the Talagrand refinement of KKL. And then, very luckily for us, somebody else figured out the Log-Sobolev constant for this other random walk. This is Lee and Yau in 1998, quite a bit later. And they showed that for this random walk on the Kth slice with the transpositions as the generating set, the Log-Sobolev constant is again 1 over N, assuming that K over N is bounded away from 0 and 1. If this does not hold, it actually becomes like 1 over N log N, and therefore you exactly lose the log factor that you gained. Yeah. So that was good for us. But then if you plug 1 over N in, you get the exact same theorem that we needed for the robust Kruskal-Katona: that the average of the influences is at least 1 over N times -- well, if you do the simplified version, you get that if F is a balanced function, there exists some generator U with influence at least rho log 1 over rho. This is a simple implication of this statement. So that's how we get the two KKL applications that we needed. Any questions?

Okay. So I have a couple more slides, I guess. That's the summary. I guess we kind of finished all this. So this is a paper that Karl and I wrote a little while ago, and just now we're in the process of writing up a short paper where we show that some of these results are tight. So I'll mention a couple of updates from what we're working on now. We really thought that this conjugation condition was extraneous. We really, really worked to try to get rid of it. And then one day we were like, hey, let's see if maybe it's necessary after all, and we just found an example: a whole setup where U is not closed under conjugation, and then the theorem fails. So it's a pretty simple case. It's a Cayley graph. This is the group. It's nonabelian, as you see. It's like Z2 to the N, semidirect product with ZN, under the natural action where ZN just cyclically permutes the coordinates.
Take this generator set -- it's not closed under conjugation -- and then this simple function F, which takes a string plus an index, just ignores the index and outputs the first coordinate. It's balanced, but all of the influences are constant over N, 1 over N maybe. So that's a shame. So this conjugation condition is actually necessary. And we also -- there's a further twist that Talagrand put on the original KKL which is actually strictly better. He showed that the average of the influences divided by log of 1 over the influences is at least 1 over N. And we also managed to generalize this slightly better version and stick the Log-Sobolev inequality in there. And the proof is a bit different. You have to give this proof that uses [inaudible] and some generalized Hölder inequality. But again it's kind of -- yeah, you just kind of take the proof of KKL, or Talagrand, and you carefully stay at a high level and generalize it to this setting.

Okay. So the last thing I'll mention here is one fun, maybe open, problem. This is more of a quirky problem that came up for us. Go all the way back to this net thing. Remember we gave this net that was 0, 1, majority, and the N coordinate functions, and we showed that every monotone function has correlation at least log N over root N with one of these guys. And there are N plus 3 guys in the net. And for a long time we thought that, you know, you've got to have the N coordinate functions in there, and so you'd think that probably any net that achieves this much has to have cardinality at least like N, right? So we were like, maybe we can get it down to N plus 2, N plus 1. Well, actually -- I won't show it, but you can actually get one of cardinality N over log N even. And actually, now having seen this slide, I think you can probably get a net that's this good and has cardinality N to the epsilon. So one could try to prove or disprove that. Okay. Thanks for your attention.

[applause]

>> Ryan O'Donnell: Questions? Claire?

>> Yeah. It almost seems very simple, and I know it's not. So one thing I'd try to remember, one piece of information from your talk [inaudible].

[laughter]

>> Ryan O'Donnell: Yeah.

>> [inaudible]

>> Ryan O'Donnell: No, I think you should remember the statement of the original KKL. It's a cool thing. Well, I mean, if you want something that has to do with something we did, I guess I like this corollary of the Kruskal-Katona -- you know, Kruskal-Katona is an old result and it's rigid, in the sense that it gives you the optimizer, the best function for minimizing the density of the upper shadow, and it is what it is. But we show that actually we can prove a theorem that says: if you're not like the optimizers, if you're kind of not similar to one of the optimizers, then actually the density jump is a lot bigger. You can look for other cases where you know there's an example that achieves like this much, but if you somehow rule out some other things, then maybe you can prove a much better bound for whatever your problem is. Any other questions?

>> Can you hint at what's the smaller [inaudible]?

>> Ryan O'Donnell: Oh. Yeah. We didn't try very hard on this, but the thing that Karl tells me works -- he wrote it down -- is just: divide the N inputs into blocks of size log N and take majority on each of the blocks. And maybe also throw in 0 and 1.
So, yeah, somehow if you take majority on log N bits, it's kind of similar enough to each of the coordinate functions there that it kind of covers each of them. You can -- you can stick log N coordinate functions together and replace them with majority on that.

>> [inaudible]

>> Ryan O'Donnell: Yeah. We -- it's quite possible. We didn't really try very hard.

>> Yuval Peres: Okay?

>> Ryan O'Donnell: Okay. Thanks.

>> Yuval Peres: Let's thank Ryan.

[applause]
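For what it's worth, here is a rough sketch of the block-majority net described in that last answer. It is only my reading of the verbal description -- the block size and the inclusion of the constants are as stated in the talk, the rest of the details are mine:

```python
import math

def block_majority_net(n):
    # Split the n coordinates into blocks of about log n bits, take majority on
    # each block, and also throw in the constants 0 and 1 -- roughly n / log n
    # functions in total, as opposed to the n + 3 functions in the original net.
    block_size = max(1, round(math.log2(n)))
    blocks = [range(j, min(j + block_size, n)) for j in range(0, n, block_size)]
    def majority_on(coords):
        return lambda x: int(2 * sum(x[i] for i in coords) > len(coords))
    return [lambda x: 0, lambda x: 1] + [majority_on(b) for b in blocks]

net = block_majority_net(64)
print(len(net), "functions in the net for n = 64")
```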