>> Yuval Peres: We're delighted to have Nati Linial from the Hebrew University.
>> Nati Linial: Thank you. Okay. There are enough of you who have heard me preach in the past. So, as always, feel free if you have any questions, comments, anything. Don't wait 'til the end. Okay. So it's really a subject that I'm very excited about, and a subject that I've touched upon sometimes in the past and I'm coming back to, because there are lots of new discoveries that are relevant, and there is quite a bit that tells me that the subject is, in fact, even more significant than I had realized before. So I think it's worthwhile coming back. I want to start from a question that, at Microsoft, I can say is of fair practical importance, great practical importance, and it's a good spot to start with: how to read large graphs. On this issue I don't have that much to offer except to explain to you what the problem is and why it's so important and so on and so forth, but it will serve, I think, as a major sort of motivation for the things that I will tell you later. The area of science in which I see this on a regular basis is bioinformatics, which is something that I do, but it's an extremely general question, of relevance to anything in machine learning. In fact, I would even dare to say that this should be a next step in areas such as statistics and machine learning and so on and so forth. So let me start from the concrete. I will stick with bioinformatics just because it's a subject I know something about, but such examples are all over the place. Big data is, of course, the popular buzzword these days. But to stick with bioinformatics, for example: when you collect data in biological experiments, one of the things that you will find is something which can be described in terms of what's called a [indiscernible] interaction. There is some organism with which you're dealing.
You don't need to know anything significant about that to know that proteins are little machines that do everything in any living organism, and these molecules can be in interaction. This is really how things take place, but not -- it's limited. So you can create a graph in which the [indiscernible] are the proteins of the organism in question. Just to give you a scale: how many proteins are there, let's say, in our body? There are different ways to count it which make sense, but any number between, I don't know, 20,000 and a quarter million would make sense. So that's the scale of the graph. It has edges. And there it is. So, of course, this is a very meaningful source of information, this big graph. But how do you read it? I mean, what is it that you want to look at in a graph like this? Or, for example, I give you this graph of a human or this graph of a rat or of the [indiscernible], whatever, any of the model organisms. How do you compare between them? How do you attempt anything? So this is really a major question. And I must say, to the best of my knowledge, there is really no good answer for this at the moment. So it's a major question, and you'll see it motivates much of what I'll tell you today. So what do people do some of the time? They, perhaps, feed the organism something different or cause a mutation or whatever, and redraw the graph and understand what the differences are. That's very meaningful for their purposes. So that's one thing. But on a general level people usually count degrees on the variance, while the degrees are being counted, as far as I can tell, for only one good reason: that it's easy. But it really provides very little information. So this is a major problem, and what I want to tell you is some background for the following approach, which says: count small subgraphs.
So I'm not claiming that this is even close to the ultimate answer to this question, but at least it's a meaningful and possibly useful answer. But before I go to this and explain where this is coming from, first of all I have to say, again, in defense of biologists, systems biologists, this is something that biologists do carry out to some extent. You'll see this idea in bioinformatics papers, but even what to do with these numbers, even that is not so clear. Okay. That we already know, all of us know what to do with this, right? I mean, perhaps you'll see the numbers come in clusters. Perhaps you recognize that it looks like a distribution that you know and, okay, perhaps you're now able to give good estimates for the defining parameters and so on and so forth. And I can imagine something like this in this context as well. So a possible general approach would be, as an analogy, okay, come up with a hopefully not too big family of generative models -- can people see this or should I -- you're still okay -- for graphs, and recover the parameters. So somehow it's strange for us -- I mean, if you really understand this graph of the human, you've really made huge progress in biology. We just don't understand exactly what this means, but, for example, one thing you could say is: okay, let me come up with some simple models. These are like, in the analogy, the standard distributions that we consider in probability and statistics. And then let me find out what the defining parameters are for this graph. So that's an approach. Now I want to mostly concentrate on this. So a good reason to look into these problems, and one of my main motivations for coming back to this question, is this theory, to a large extent developed originally here, mostly by [indiscernible]. Any coworkers, anyone here? I don't think so. But he has -- I should have brought it down.
He has a beautiful book on large networks and graph limits which to a large extent tells you -- to some extent tells you that if you understand the count of small subgraphs in a big graph, it really tells you in some way -- everything that we do here today is asymptotic in nature. We only speak about very large graphs. You have a sequence of graphs and you'd like to have a notion of what it means for the sequence of graphs to tend to a limit. Now, as you all know, if there is a metric on whatever objects, then it's possible to speak about limits, because then at least you can speak about [indiscernible] and perhaps understand what is going on. So you need a metric among graphs. How do you compare graphs? Well, it's easiest to think of a graph as just a symmetric zero-one matrix. So if you want, a zero-one matrix, just think of this as black and white dots, a map from the unit square into {0,1}. So instead let's explain what is the metric between two symmetric functions from the unit square into [0,1]. Well, I allow you to rearrange in a symmetric fashion, by a measure-preserving map of the interval, and then I take these two functions and I take a norm of the difference, something like this. So this is a good notion of distance among graphs, and this is called the cut metric. So the theorem says that a sequence of graphs G_n has a limit in this sense if and only if, for every fixed graph H, the frequency with which H is found in G_n tends to a limit. So you have two notions of limit here, and the theorem tells you that they coincide. One notion comes from this way of comparing or measuring distance among graphs, and the other comes from this concept of counting small subgraphs. So you fix a graph H and you ask how often do I find this graph H in G_n. Okay. I'm not being very accurate with details.
And if this tends to a limit for every graph H, then it makes sense to say the sequence converges, and the two concepts coincide. Okay. These two notions of limit. So in particular, what I'm trying to read out of this, in the context of this very general question, is that if you understand the frequency at which you see small subgraphs in a big graph, then this tells you what the limit object is. There's a lot of additional stuff, graphons and whatnot. It's, as I said, by now really a big theory, but that's a good motivation. Okay. Let me say two more things about this very general context. First of all, this is very closely related, but I will not develop this, this local approach. So the notion of local which appears in my title is exactly this, and I won't say much more about this. So property testing, for those who know what it is, is very closely related. That's another way to understand or to view the local structure of big graphs. We're also going to talk about other [indiscernible], not just graphs. I will mostly focus on graphs, but we will also see, just briefly at least, two other types, and one is tournaments. A tournament is just an orientation of the complete graph, and again you can look at small subsets and ask how often do I see everything that I see. And I will also speak about permutations. And in general, with any large object that you can think of, you can also associate a local theory. So I have to explain what is the local structure of a permutation. A permutation, I think of it as the numbers from one to n written in some order. And then let's say I want to understand the five-profile of this; that's an expression I will introduce in a minute. So I'm looking at some five locations here. I see five numbers. There's an order: this is bigger than that and smaller than that and so on and so forth. It gives you a permutation of five. So permutations also have a local profile. So this is one comment.
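The local structure of a permutation described here can be made concrete with a few lines of Python. This is only a sketch: the function names `pattern` and `permutation_profile` are hypothetical, chosen for illustration, and the code enumerates all k-element sets of positions, so it is meant for small examples rather than large permutations.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def pattern(perm, positions):
    """Relative order of perm at the given positions, as a permutation of 1..k."""
    values = [perm[i] for i in sorted(positions)]
    # Rank each selected value among the selected values (1 = smallest).
    rank = {v: r for r, v in enumerate(sorted(values), start=1)}
    return tuple(rank[v] for v in values)

def permutation_profile(perm, k):
    """Distribution of patterns over all k-element sets of positions."""
    counts = Counter(
        pattern(perm, pos) for pos in combinations(range(len(perm)), k)
    )
    total = sum(counts.values())
    return {p: Fraction(c, total) for p, c in counts.items()}

# Five locations in a permutation of 1..7 give a permutation of five.
perm = [4, 1, 5, 2, 7, 3, 6]
print(pattern(perm, (0, 2, 3, 4, 6)))  # (2, 3, 1, 5, 4)
```

Here the values seen at the five chosen locations are 4, 5, 2, 7, 6, and only their relative order is recorded.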
The other thing is, there is another notion of local, in the sense of looking at neighborhoods. That's also very interesting, but I'm not going to get into this at all. You could also ask, instead of saying I'm looking at, let's say, five-element sets or something like this, I'm looking at a vertex and I'm looking at what the five-neighborhood of the vertex looks like. That's also very interesting. But I'm not getting into that at all. Those of you who heard me ask a question about girth at lunch, that question falls into this. Okay. So what are we doing? We are going to have a large graph. Most of the time, as I said, I will speak about graphs. We're going to look at k-element subsets. So this is called the k-profile of a graph. So G is a large graph, k is an integer. The k-profile of G is this: you're looking at all of the k-element subsets of vertices. Each k-element subset spans a k-vertex graph. So this induces a distribution on k-vertex graphs. This distribution is the k-profile of G. Okay. Good. What do we want to look at? There is a whole array -- not too many, but there are several possible key properties of the profile that we -- so in fact, let me already start with one very concrete and very difficult question about which we know still very, very little, and I will try to say a few of the things that we know. So here's what I view as a very basic question: what are the possible k-profiles? Just a very concrete question. So let me be even more concrete, just to bring home to you the notion that these are very basic questions and that our understanding of them ends very, very quickly. Let's start with k equals three. Just k equals three. So what are the possible graphs on three vertices? There is this, there is this, there is this, and there is this. And let's say I call them P zero, P one, P two and P three. And so if this is G, then this whole vector is P of G, and I'm not interested in what happens in small graphs.
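For k equals three, the profile just defined is easy to compute by brute force, because a graph on three vertices is determined up to isomorphism by its number of edges. The following is a minimal sketch under that observation; the function name `three_profile` and the edge-list input format are assumptions made for illustration, and the loop over all triples is only practical for small graphs.

```python
from fractions import Fraction
from itertools import combinations

def three_profile(n, edges):
    """Return [P0, P1, P2, P3]: the fraction of vertex triples of an
    n-vertex graph that span exactly 0, 1, 2 or 3 edges."""
    edge_set = {frozenset(e) for e in edges}
    counts = [0, 0, 0, 0]
    for triple in combinations(range(n), 3):
        spanned = sum(frozenset(pair) in edge_set
                      for pair in combinations(triple, 2))
        counts[spanned] += 1
    total = sum(counts)
    return [Fraction(c, total) for c in counts]

# The 5-cycle: five triples span a single edge, five span a path of two edges.
c5 = [(i, (i + 1) % 5) for i in range(5)]
print(three_profile(5, c5))
```

For larger k the same idea applies, but the induced subgraphs have to be grouped by isomorphism class rather than by edge count.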
I'm only interested in the asymptotics. So in particular I'd like to understand this set, this four-dimensional set, which consists of all the x's in R^4 such that for every positive epsilon and every large n there exists an n-vertex G such that the norm of P of G minus x is less than epsilon. So don't let this confuse you. Basically, which four-tuples am I interested in? Those which are realizable, or almost realizable, allowing [indiscernible], by arbitrarily large graphs. So think about it. This is a four-dimensional set, really only three-dimensional because, of course, the coordinates are nonnegative and sum to one. What is this set? Are we able to describe it? The answer is no, we are still unable to describe even this very simple, this very first case of the general problem. I'll tell you a few things that we know about this. And as I've tried to impress on you, while this is still -- I mean, it's conceivable that, you know, if we really wanted -- anyone who was interested in this would now be working on this problem and would like to understand this set. The case of k equals three perhaps is manageable, but when you go up to k equals four the situation at the moment is really bad. We really don't understand the thing at all. So let me try and... You're not awake enough. I'll try to make you awake. Okay. Let's see. Can I do this? Okay. Okay. So let me try and give you an impression. So here is a very well-known and old fact. Goodman's inequality from '64 says that P zero plus P three is at least a quarter. So let me just be accurate about this. I actually should be writing minus little-o of one, but because I'm dealing with asymptotics, I'm suppressing such terms. Let me prove this to you so at least we would prove something. It's really very easy.
And instead of speaking about P zero and P three -- instead of speaking about graphs, what I do is I take the complete graph and I color the edges blue and red, instead of edges and nonedges. And first of all, before we prove this, let's see that this makes sense. If I do the coloring at random, if I take a G(n, 1/2) graph, the chance for this is an eighth and the chance for this is an eighth. So at least you have equality here. And how do you prove this? It's really very easy. So let's bound this expression. On the one hand, the sum of these two numbers is, of course, n minus one. So this is at most n times (n minus one) squared over four, which would be the case if each one of those is (n minus one) over two. On the other hand, I can --
>>: [indiscernible].
>> Nati Linial: Yeah. Sorry. Correct. But there's another way to interpret this. You see, let's instead now draw things like this to indicate color. You see, every triangle like this contributes zero to this. Every triangle like this contributes two to this. So this is two times n choose three times P one plus P two. And now it's just high school algebra and you have this conclusion. Okay. Very good. So this nice observation, this fact, has been known, you know, for almost 50 years. So what if we go up to four? Okay. Let's ask the same question for four vertices, and let's hear what we know; that will give you an indication of what the situation is. Okay. So for want of a better name, let me call them Q zero and Q six. So this is the probability that I get four vertices with no edges, and this is the probability that I get all six. Of course, there are more than seven -- in fact, 11 different isomorphism classes of graphs [indiscernible]. So in view of what we saw here and the fact that equality held exactly in G(n, 1/2), Erdős made the conjecture that Q zero plus Q six is at least one in 32. Let's just go over the calculation.
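For reference, the double-counting argument for Goodman's inequality sketched above can be written out as follows. The notation is an assumption made for this write-up, since the transcript does not record what was on the board: r_v and b_v denote the numbers of red and blue edges at vertex v, and P_i is the fraction of triples with exactly i red edges, so P_0 and P_3 are the monochromatic ones.

```latex
\begin{align*}
\sum_{v} r_v\, b_v &\le n \cdot \frac{(n-1)^2}{4}
  && \text{since } r_v + b_v = n-1 \text{ for every } v, \\
\sum_{v} r_v\, b_v &= 2\binom{n}{3}\,(P_1 + P_2)
  && \text{a monochromatic triangle contributes } 0,\ \text{any other triangle } 2, \\
P_1 + P_2 &\le \frac{n(n-1)^2}{8\binom{n}{3}}
  = \frac{3(n-1)}{4(n-2)} = \frac{3}{4} + o(1), \\
P_0 + P_3 &= 1 - (P_1 + P_2) \;\ge\; \frac{1}{4} - o(1).
\end{align*}
```

The second line counts pairs of differently colored edges that share a vertex: a triangle with two edges of one color and one of the other has exactly two such corners.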
In G(n, 1/2), the probability is one in 64, one in 64, so one in 32. This is not true. It was refuted by Andrew Thomason. He wrote about this; there's more than one paper, and there is also [indiscernible], who also found counterexamples. The point is -- one of the main observations that really motivate my work on this is the fact that in any mathematical field your understanding depends very much on the examples that we know, and there are really very, very few examples that we know, concrete classes of graphs that we understand. So the answer is smaller than this, not much smaller, one in 33 or so. But no one has any guess on where the minimum is. So what is this hiding? I will tell you later more about what's going on here. One reason why this is more than, you know, an aside note -- okay, there was this question, the natural conjecture is not true -- is that, think of very fundamental questions like Ramsey theory, the [indiscernible] Theorem. So you know, okay, R(k, l) is the least integer such that if I take the complete graph on this number of vertices and I color every edge blue or red, either I get a blue k-clique or I get a red l-clique. So what do we know about these numbers? So let's speak mostly about R(n, n). The asymptotics here is something like four to the n, and here it's root two to the n. So it's a fourth power. And the best lower bound on Ramsey numbers comes from G(n, 1/2). To me this suggests that perhaps this is not the right place to look for the solution, that there are graphs which have better behavior in this respect than random graphs. So this suggests to me there is really a new continent to discover here. You know, there is a whole class of -- and the fact is, at the moment no one even has a guess of where this is. I mean, granted, the difference here is not so big. It's really okay. So we know now lower bounds, I forget, perhaps one in 34 or something. The gap is not so big.
But the point is, no one even has a concrete guess of what the optimum construction is. So it's like, you know, the famous story about the elephant. There is an elephant in the room. We have touched it, but we don't understand, you know, what shape it is, what size it is, where it goes and so on. And there is an indication that there is an elephant to discover. By the way, what is the situation about timing? When should I -- Okay. Good. So let me then try and give you one more indication of how poorly we understand these things and what other continents there are to discover. So here is a very interesting and basic notion that's related to all of this, and this is inducibility. So we're fixing a graph H, and this is the largest frequency with which we can find an induced H -- I'm a little sloppy here with my definition, but I think you understand. So you fix an H. You take a large graph and you want the probability -- let's say that H is a k-vertex graph; I want to find a big graph in which I have the highest probability that if I pick k vertices, what I see is an H. Okay. And there are certain graphs for which we know this, but mostly we don't. So here is a nice example. I don't know how much I will be able to cover the details of this, but let's say H is a C5. Okay. I'll get you doing something. Which graph has the most induced C5s among all big graphs? What would you do? What would you guess? Blow up the C5. Excellent first guess. Still not final. So that's a blow-up of C5, just for those who know the terminology. So you replace each vertex of the C5 by an independent set of size n and you connect with a complete [indiscernible]. How likely are you to find the C5 here? Well, you'll find it precisely if each of the five vertices falls in a different block, which means the probability is five factorial over five to the five, which is 24 over 625. Well, anyone willing to raise the bets?
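The 24 over 625 figure for the balanced blow-up of C5 can be checked by brute force on small instances. The sketch below is illustrative only: the function names and the vertex-numbering convention (vertex v lies in block v // m) are assumptions, and the enumeration over all five-element subsets is feasible only for small block sizes m.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, factorial

def c5_blowup_edges(m):
    """Edges of the balanced blow-up of C5 with blocks of size m.
    Vertex v lies in block v // m; two vertices are adjacent exactly
    when their blocks are neighbors on the 5-cycle."""
    n = 5 * m
    return {frozenset((u, v)) for u, v in combinations(range(n), 2)
            if (u // m - v // m) % 5 in (1, 4)}

def induces_c5(vertices, edges):
    """Five vertices induce a C5 exactly when every one of them has
    degree 2 inside the set (the only 2-regular graph on 5 vertices)."""
    degree = {v: 0 for v in vertices}
    for u, v in combinations(vertices, 2):
        if frozenset((u, v)) in edges:
            degree[u] += 1
            degree[v] += 1
    return all(d == 2 for d in degree.values())

def induced_c5_density(m):
    """Fraction of 5-element vertex sets of the blow-up that induce a C5."""
    edges = c5_blowup_edges(m)
    hits = sum(induces_c5(s, edges) for s in combinations(range(5 * m), 5))
    return Fraction(hits, comb(5 * m, 5))

# Limiting density as the blocks grow: 5! / 5^5.
print(Fraction(factorial(5), 5 ** 5))  # 24/625
print(induced_c5_density(2))
```

For finite m the density is m^5 divided by the number of five-element subsets of the 5m vertices, since a set induces a C5 exactly when it takes one vertex from each block; this ratio tends to 24/625 as m grows.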
Anything better than this? It's a good start. I mean, the solution's not so far off. What can you do to improve it, even locally? What if they all fall in the same block? Now we're getting nothing for this. So just recurse.
>>: [inaudible].
>> Nati Linial: So just do the same thing. And if you do this, you'll get one over 26. So the inducibility of C5 is at least one over 26 -- okay. What do we know? I forget the exact number, but this is what we know at the moment. So I don't know how much time I will have to tell you about this. Let me only say two words to explain this. This comes from flag algebras. So if you know what these are, then you know; if not, I hope to be able to say something. This is a beautiful method that was invented a few years ago. It has a theory, and finally you go to the computer and compute. You have to solve some growing and growing semidefinite problems. Of course, at some point the machine gives up, and this is -- well, the solution. That's the best we're able to do. So the value here is just the computational power of our machine, and sometimes in situations like this people have managed to be clever enough and throw away that part and actually -- in short, you know, as mathematicians we don't know the answer. As human beings, of course we know the answer. That's the answer. Okay. So, perhaps, it's not perfectly satisfying, but let me show you the following embarrassing thing. What is the inducibility of this one? No one has a guess. In terms of numbers, the upper and lower -- I don't remember what they are, so it's not important for the moment. The upper and lower bounds are not so different from each other, but no one has a guess, again. So for me, that's a close cousin of these things. So it's again an indication that there's a whole family of graphs that should be very interesting. This sounds like a thoroughly basic property of the graph. You want -- that's by the way self-complementary.
The complement of this is the same graph. How can you get many induced copies of this? No one has a clue. Okay. So somehow I thought I had more time. I prepared more of this material, a whole list. Of course, I knew I was not going to cover everything. So let me see what I can tell you and what I will skip. Okay. There are a few important ideas that I still want to get through. So in terms of this local structure of graphs, first of all, this whole thing is very closely related to a very highly developed part of combinatorics, extremal graph theory. In extremal graph theory you mostly ask questions such as: the graph has such and such density, so many edges; does it have to contain this and this graph? So I think you presumably all know Turán's theorem, which says that if the density of G is bigger than (r minus two) over (r minus one), then G contains a K_r. Okay. The first example is this: the complete bipartite graph with two equal sides doesn't contain a triangle, and that's the graph with the highest density which doesn't contain a triangle. You want one that doesn't contain K_4, you take three equal parts, and that has the highest density. So this local theory of graphs contains -- I mean, this is a huge extension of this basic question, because here we're only counting density and here we want to understand the full view. So there is, of course, a lot of material in graph theory that's highly relevant, but this really takes a much broader view of graph theory. And so let me say a word about flag algebras. These are, of course, very basic theorems of graph theory. And this has led to various questions. One, for example, if you never saw this, is one of the most famous open problems in [indiscernible], in problems for three-graphs. So, question.
What is the highest -- so the analog of this theorem: what is the highest density of a three-uniform hypergraph, so a collection of triples, that does not contain K_4^(3)? So it doesn't contain four vertices with all the four triples. That has been open for many years. There is a concrete conjecture, which I will not state, but that has been open for many, many years, one of the most famous problems. And so in order to attack this question and attack another question in this area -- so here, in particular, that was already known to Mantel. So Mantel's theorem, a predecessor of [indiscernible] theorem, says that -- I'm just repeating what I wrote here -- a graph with density bigger than one half contains a triangle, and the bound is tight. So the question that remained here is: if the density of G is bigger than one half, how small can P three be? So what's the smallest density of triangles? In order to attack these two questions, Razborov invented his theory of flag algebras, which I will not have time to explain to you. Conceptually it's quite simple, but very powerful. And I already explained a little bit about how you work with it. Eventually you have to solve some growing and growing semidefinite problems, and with this -- this was concretely resolved by [indiscernible] using flag algebras. And here he made the most progress, but this is still open. So that's a very powerful tool that we're able to use in this whole area. But as I said, still many, many things remain open. So what I'd like to do in the last part of my -- I haven't gotten even to tell you about [indiscernible]. But that's not so important. Let me say a little bit about local to global. So I think I have given you the impression that the local theory of combinatorics is deep and interesting. What's perhaps even more exciting is the local-to-global theory. So my main question so far has been: what does the k-profile of a large graph look like, what can we say about this?
And as I said, with k equals three there is, perhaps, still hope. I didn't tell you some of the theorems that we're able to prove, but we are making progress there. For four, getting a full description of the four-profiles seems at the moment completely out of reach. But then there's another kind of question that you could ask yourself. So let's suppose I'm telling you the k-profile of the graph. What can you conclude globally? So here there is a beautiful conjecture by [indiscernible]. Okay. So let me start with the definition. There are several general properties that you can consider that are interesting about profiles. There's one class which is very interesting and, again, worthy of a separate lecture, which is quasirandomness: an object can share properties with a random graph, and there is a beautiful theory about this in graphs and so on, in permutations. I guess I should mention the names of Chung, Graham and Wilson, who developed such a theory for graphs, and then the recent work on permutations. So that is something that we have a fairly decent understanding of. There's another property of genericity, which is much weaker, which is universality, k-universality. You say that G is k-universal if it has a full k-profile. So everything is there. So that's another very interesting property to consider. And this is what the Erdős-Hajnal conjecture speaks about. It's really an amazing conjecture. So here it is: for every fixed H, there exists a positive epsilon such that if G is an n-vertex graph -- so alpha is its independence number, the size of the largest independent set, and omega is its clique number -- less than n to the epsilon -- sorry -- that is H-free, then alpha of G or omega of G is bigger than n to the epsilon. So let's look at this and understand. So this is a graph which is not universal. If you're looking at the local profile there is something missing. There's no induced H, right? H-free means it doesn't contain an induced copy of H.
Now, in general, I mentioned to you before the Ramsey Theorem. If you reverse the way that I told it to you before, then every graph on n vertices has either a clique of size log n or an anticlique of size log n, and the bound is tight. It's attained in a random graph. If you only omit this one object, there is no H in it, then the numbers jump from log to [indiscernible], to a fractional power. In fact, in their paper, they prove exponential of root log n. So that is known. But n to a power is open. The first case which is unknown is H equals C5. We don't know how to prove this or refute this for C5. And just to give you a sense -- I said that I would mention tournaments; I mentioned them once. So [indiscernible] proved that [indiscernible] for graphs is equivalent to [indiscernible] for tournaments. This requires some explanation. So what it means to miss something you understand: let's say we're dealing with the seven-profiles. So there's a tournament on seven vertices which you cannot find anywhere. What replaces this -- the analog of this is the largest transitive subtournament. So that's very easy. Every n-vertex tournament -- if you never saw this then that's a good exercise to show -- every n-vertex tournament has a transitive subtournament on at least log base two of n vertices. The bound is tight up to a constant here. Okay. So these two statements are equivalent, and they're both open. And let me just, perhaps, finish with that. At least I will mention one small thing that we did. So a different way of stating the Erdős-Hajnal conjecture, a restatement of the conjecture which is nothing, just manipulating simple things, says: for every k there exists an epsilon positive such that if alpha of G and omega of G are smaller than n to the epsilon, then G is k-universal.
So we were wondering -- this is something of the sort that, you know, is a weak version of what you expect to see in a random graph. Okay. And so we were wondering if there is anything like this where, instead of saying that there are no large cliques and no large anticliques, perhaps the counts are small. So here is a very simple thing -- I mean, it's simple in the sense that it's just work. There are no real ideas necessary. So a proposition that I proved with [indiscernible]. Coming back to the triple statistics: if P zero and P three are less than .159, then G is three-universal. The bound is tight. For example, we know that this does not imply five-universality. We don't know whether it implies four-universality. And this is a new type of question. That's very recent in this area. So let me just say one more thing about why I find this conjecture so exciting. There are really lots of things. In particular, think of -- perhaps this is not the right thing to do in this last minute of the talk -- think about what we know in physics. In physics, all we have to start with is local information, right, how particles interact, and we'd like to conclude from it a big picture, the big view of the -- I think this is why I find this conjecture so exciting, that all we are saying is that if you look at the repertoire of the small subgraphs of this big graph, this has this huge effect on the overall structure of the graph. And with this I will stop. Thanks. [clapping].
>>: What is the recent fate [inaudible]?
>> Nati Linial: There was one here a few months ago which, my understanding is, has been found to be -- I don't think it was so deep. So perhaps, how old is this? Do you remember the name of -- Okay.
So there was -- I'm aware of a paper from a few months ago which claimed to have proved this, and I didn't read it, but I checked with, you know, one person that -- the author thanked this guy, so I checked with [indiscernible]. So my understanding is that the question is still quite solidly open. [Clapping].