>> Yuval Peres: We're delighted to have Nati Linial from the Hebrew University.
>> Nati Linial: Thank you.
Okay. Enough of you have heard me preach in the past, so, as always, feel free to ask questions or make comments at any time; don't wait until the end. This is really a subject that I'm very excited about, and one that I've touched upon several times in the past. I'm coming back to it because there are lots of new discoveries that are relevant, and quite a bit tells me that the subject is, in fact, even more significant than I had realized before. So I think it's worthwhile coming back.
I want to start from a question that, I can say at Microsoft, is of great practical importance, and it's a good spot to start with.
How do you read large graphs? On this issue I don't have that much to offer, except to explain what the problem is and why it's so important, but I think it will serve as a major motivation for the things that I will tell you later. The area of science where I see this on a regular basis is bioinformatics, which is something that I do, but it's an extremely general question, relevant to anything in machine learning. In fact, I would even dare to say that this should be a next step in areas such as statistics and machine learning. So let me start from the concrete. I will stick with bioinformatics just because it's a subject I know something about, but such examples are all over the place; big data is, of course, the popular buzzword these days. In bioinformatics, for example, when you collect data in biological experiments, one of the things you will find is something that can be described in terms of what's called a protein-protein interaction network.
There is some organism you're dealing with. You don't need to know anything significant about it to know that proteins are little machines that do almost everything in any living organism, and these molecules can interact. This is really how things take place, but the interactions are limited: not every pair of proteins interacts. So you can create a graph in which the vertices are the proteins of the organism in question. Just to give you a scale: how many proteins are there, let's say, in our body? There are different ways to count them that make sense, but any number between, I don't know, 20,000 and a quarter million would be reasonable. So that's the scale of the graph. It has edges, and there it is.
Of course, this big graph is a very meaningful source of information. But how do you read it? What is it that you want to look at in a graph like this? For example, I give you this graph for a human, or for a rat, or for [indiscernible], whatever, any of the model organisms. How do you compare them? How do you attempt anything? So this is really a major question, and to the best of my knowledge there is no good answer to it at the moment. It's a major question, and you'll see that it motivates much of what I'll tell you today.
So what do people do some of the time? Perhaps they feed the organism something different, or cause a mutation, or whatever, then redraw the graph and try to understand what the differences are. That's very meaningful for their purposes, so that's one thing. But on a general level, people usually count degrees and look at their variance, and as far as I can tell degrees are counted for only one good reason: it's easy. It really provides very little information. So this is a major problem, and what I want to tell you is some background for the following approach, which says: count small subgraphs. I'm not claiming that this is even close to the ultimate answer to this question, but at least it's a meaningful and possibly useful answer. And first of all, I have to say in defense of biologists, systems biologists, that this is something they do carry out to some extent; you'll see this idea in bioinformatics papers. But what to do with these numbers, even that is not so clear.
Okay. With ordinary numerical data, all of us know what to do, right? Perhaps you'll see the numbers come in clusters. Perhaps you recognize that it looks like a distribution that you know, and then you're able to give good estimates for the parameters, and so on. I can imagine something like this in this context as well. So, by analogy, a possible general approach would be: come up with a family of generative models for graphs, hopefully not too big, and recover the parameters. (Can people see this, or should I...? You're still okay.) Somehow it's strange for us: if you really understand the interactome of the human, you've made huge progress in biology, but we just don't understand exactly what that means. For example, one thing you could say is: okay, let me come up with some simple models, the analogue of the standard distributions that we consider in probability and statistics, and then let me find out what the defining parameters are for this graph. So that's an approach.
Now I want to mostly concentrate on this. A good reason to look into these problems, and one of my main motivations for coming back to this question, is a theory that was developed to a large extent originally here, mostly by Lovász. Are any of his coworkers here? I don't think so. I should have brought it: he has a beautiful book, Large Networks and Graph Limits, which to some extent tells you that if you understand the counts of small subgraphs in a big graph, you know a great deal about the graph. Everything that we do here today is asymptotic in nature; we only speak about very large graphs. You have a sequence of graphs, and you'd like to have a notion of what it means for the sequence to tend to a limit. Now, as you all know, if there is a metric on whatever objects you have, then it's possible to speak about limits, because then at least you can speak about [indiscernible] and perhaps understand what is going on. So you need a metric among graphs. How do you compare graphs? It's easier to think of a graph as just a symmetric zero-one matrix, and to picture that matrix as black and white dots on the unit square, that is, as a symmetric function from [0,1]^2 to {0,1}. So let me explain the distance between two symmetric functions from [0,1]^2 into [0,1]. I allow you to rearrange one of them, in a symmetric fashion, by a measure-preserving map of the interval, and then I take the two functions and measure their difference. This is a good notion of distance among graphs, and it is called the cut metric, the one written with a little square. The theorem says that a sequence of graphs G_n has a limit in this sense if and only if, for every fixed graph H, the frequency with which H is found in G_n tends to a limit.
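For reference, here is the standard textbook form of the cut norm and cut distance on graphons, that is, symmetric measurable functions W : [0,1]^2 → [0,1]. This is background we are adding, not a transcription of what was on the board. A graph G on n vertices corresponds to the step function W_G that equals the (i, j) entry of its adjacency matrix on the square [(i−1)/n, i/n) × [(j−1)/n, j/n).

```latex
\|W\|_{\square} = \sup_{S,T \subseteq [0,1]} \left| \int_{S \times T} W(x,y)\,dx\,dy \right|,
\qquad
\delta_{\square}(U,W) = \inf_{\varphi} \left\| U - W^{\varphi} \right\|_{\square},
\qquad
W^{\varphi}(x,y) = W(\varphi(x),\varphi(y)),
```

where S and T range over measurable subsets of [0,1] and φ ranges over measure-preserving bijections of [0,1]. Applying the same φ to both coordinates is the "symmetric rearrangement" described above.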
So you have two notions of limit, and the theorem tells you that they coincide. One notion comes from this way of measuring distance among graphs, and the other comes from counting small subgraphs: you fix a graph H and ask how often you find H in G_n. (I'm not being very accurate with the details.) If this tends to a limit for every graph H, then it makes sense to say that the sequence converges, and the two concepts coincide. So in particular, what I'm trying to read out of this, in the context of that very general question, is that if you understand the frequency at which you see small subgraphs in a big graph, then this tells you what the limit object is. There's a lot of additional material, graphons and whatnot; as I said, it is by now really a big theory, but that's a good motivation. Let me say two more things about this very general context. First of all, this is the local approach: the notion of "local" that appears in my title is exactly this, and I won't say much more about it. Property testing, for those who know what it is, is very closely related; it's another way to understand or view the local structure of big graphs. We're also going to talk about other [indiscernible], not just graphs: I will mostly focus on graphs, but we will also see, at least briefly, two other types of objects. One is tournaments. A tournament is just an orientation of the complete graph, and again you can look at small subsets and ask how often you see each small tournament. And I will also speak about permutations.
And in general, with any large object you can think of, you can also associate a local theory. So I have to explain what the local structure of a permutation is. Think of a permutation as the numbers from one to n written in some order. Let's say I want to understand its profile, an expression I will introduce in a minute. I look at some five locations here, and I see five numbers. They have an order: this one is bigger than that one and smaller than this one, and so on. So they give you a permutation of five elements, and permutations, too, have a local profile. That's one comment. The other thing is that there is also "local" in the sense of looking at neighborhoods. That's also very interesting, but I'm not going to get into it at all: instead of looking at, say, five-element sets, you could look at a vertex and ask what its neighborhood of radius five looks like. Those of you who heard me ask a question about girth at lunch: that question flows into this.
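Returning to the local profile of a permutation described above, here is a small sketch, assuming a permutation is given as a Python sequence of distinct numbers. The function names `pattern` and `permutation_profile` are illustrative, not from the talk. It reads off the relative order of the values at every set of k positions and records how often each pattern of length k occurs.

```python
from collections import Counter
from itertools import combinations


def pattern(values):
    """Standardize distinct numbers to the permutation of 1..k with the same
    relative order, e.g. (3, 4, 2) -> (2, 3, 1)."""
    rank = {v: r for r, v in enumerate(sorted(values), start=1)}
    return tuple(rank[v] for v in values)


def permutation_profile(perm, k):
    """k-profile of a permutation: for every set of k positions, read off the
    pattern of the values there; return the frequency of each pattern."""
    counts = Counter(pattern([perm[i] for i in positions])
                     for positions in combinations(range(len(perm)), k))
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}


print(permutation_profile((3, 1, 4, 5, 2), 2))  # {(2, 1): 0.4, (1, 2): 0.6}
```

For k = 2 the profile is just the fraction of non-inversions and inversions; larger k records finer local structure.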
Okay. So what are we doing? We are going to have a large graph, and, as I said, most of the time I will speak about graphs. We look at k-element subsets, and this gives what is called the k-profile of the graph. So G is a large graph and k is an integer. The k-profile of G is this: you look at all the k-element subsets of vertices. Each k-element subset spans a k-vertex graph, so this induces a distribution on k-vertex graphs. This distribution is the k-profile of G. Okay.
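The definition above can be turned directly into a brute-force sketch, assuming a graph is given as a dict mapping each vertex to its set of neighbours. It enumerates every k-subset, so it is only meant for small graphs and small k; for graphs of the size discussed in the talk one would presumably sample random k-subsets instead. The names `canonical_form` and `k_profile` are illustrative.

```python
from collections import Counter
from itertools import combinations, permutations


def canonical_form(vertices, adj):
    """Isomorphism-invariant key for the subgraph induced on `vertices`: the
    lexicographically smallest upper-triangle adjacency bit string over all
    orderings of the vertices. Brute force over k! orderings, fine for small k."""
    k = len(vertices)
    return min(
        tuple(int(order[j] in adj[order[i]]) for i in range(k) for j in range(i + 1, k))
        for order in permutations(vertices)
    )


def k_profile(adj, k):
    """k-profile of G: the fraction of k-element vertex subsets that induce each
    isomorphism class. `adj` maps each vertex to the set of its neighbours."""
    counts = Counter(canonical_form(s, adj) for s in combinations(sorted(adj), k))
    total = sum(counts.values())
    return {key: c / total for key, c in counts.items()}


# The 5-cycle: half of its triples span one edge, half span a path with two edges.
c5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(k_profile(c5, 3))  # {(0, 1, 1): 0.5, (0, 0, 1): 0.5}
```

For k = 3 the isomorphism class is determined by the number of edges, so the key with i ones corresponds to p_i.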
Good. What do we want to look at? There are several possible behaviors, key properties of the profile, that we can consider. In fact, let me already start with one very concrete and very difficult question, about which we still know very, very little; I will try to say a few things that we do know. Here is what I view as a very basic question: what are the possible k-profiles? Let me be even more concrete, just to bring home to you that these are very basic questions and that we run into trouble very, very quickly. Let's start with k = 3. Just k = 3. What are the possible graphs on three vertices? There is this one, this one, this one, and this one: no edges, one edge, two edges, and the triangle. Let me call their frequencies p0, p1, p2 and p3, so p_i is the probability that three random vertices of G span exactly i edges, and this whole vector is p(G). I'm not interested in what happens in small graphs; I'm only interested in the asymptotics. So in particular I'd like to understand the following set: all x in R^4 such that for every ε > 0 and every large n there exists an n-vertex graph G with ‖p(G) − x‖ < ε. Don't let this confuse you. Basically, I am asking which four-tuples are realizable, or almost realizable with arbitrarily small error, by arbitrarily large graphs. Think about it: this set lives in four dimensions, but it is really only three-dimensional, because, of course, the coordinates are nonnegative and sum to one.
What is this set? Are we able to describe it? The answer is no: we are still unable to describe even this very simple, very first case of the general problem. I'll tell you a few things that we know about it. And as I've tried to impress on you, this is a considerable problem: if you really wanted to understand these sets, the case k = 3 is perhaps manageable, but when you go up to k = 4 the situation at the moment is really bad. We really don't understand it at all. So let me try... You're not awake enough; I'll try to wake you up. Okay. Let's see. Can I do this?
Okay. So let me try to give you an impression. Here is a very well-known and old fact, Goodman's inequality from 1959: p0 + p3 is at least a quarter. To be accurate, I should be writing 1/4 − o(1), but since I'm dealing with asymptotics, I'm suppressing such terms.
Let me prove this to you, so that at least we prove something. It's really very easy. Instead of speaking about graphs, I color the edges of the complete graph blue and red, instead of edges and non-edges, so p0 + p3 is the frequency of monochromatic triangles. First of all, before we prove this, let's check that it makes sense: if I take G(n, 1/2), the chance of a blue triangle is an eighth and the chance of a red triangle is an eighth, so there you have equality. And how do you prove it? It's really very easy. Let's sum, over all vertices v, the expression d(v)(n − 1 − d(v)), where d(v) is the blue degree of v. On the one hand, the two factors sum to n − 1, so each term is at most (n − 1)^2/4, which would be the case if each factor were (n − 1)/2. So the sum is at most n(n − 1)^2/4. On the other hand, I can...
>>: [indiscernible].
>> Nati Linial: Yeah, sorry, correct. But there's another way to interpret this sum: each term counts the pairs of edges at v with one blue and one red edge. Let's draw things like this to indicate color. Every monochromatic triangle contributes zero to the sum, and every other triangle contributes exactly two. So the sum equals 2 · C(n, 3) · (p1 + p2). And now it's just high school algebra: p1 + p2 ≤ 3/4 + o(1), hence p0 + p3 ≥ 1/4 − o(1). Okay. Very good.
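The double-counting step in this proof can be checked mechanically. Below is a sketch, assuming graphs are given as symmetric 0/1 adjacency matrices. It verifies the identity Σ_v d(v)(n − 1 − d(v)) = 2(N1 + N2), where N_i is the number of triples spanning i edges, and the finite-n form of the bound that the proof yields, p0 + p3 ≥ 1 − 3(n − 1)/(4(n − 2)), which tends to 1/4. All helper names are ours.

```python
import random
from fractions import Fraction
from itertools import combinations
from math import comb


def random_graph(n, p, seed=0):
    """Adjacency matrix (list of 0/1 lists) of a sample of G(n, p)."""
    rng = random.Random(seed)
    adj = [[0] * n for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u][v] = adj[v][u] = 1
    return adj


def complete_bipartite(m):
    """K_{m,m}: vertices 0..m-1 on one side, m..2m-1 on the other."""
    n = 2 * m
    return [[int((u < m) != (v < m)) for v in range(n)] for u in range(n)]


def triple_counts(adj):
    """counts[i] = number of vertex triples spanning exactly i edges."""
    counts = [0, 0, 0, 0]
    for a, b, c in combinations(range(len(adj)), 3):
        counts[adj[a][b] + adj[a][c] + adj[b][c]] += 1
    return counts


def check_goodman(adj):
    """Check the double-counting identity and return (p0 + p3, finite-n bound)."""
    n = len(adj)
    counts = triple_counts(adj)
    degrees = [sum(row) for row in adj]
    # Each vertex v sees d(v) * (n - 1 - d(v)) pairs made of one edge and one
    # non-edge; a triple with 1 or 2 edges contains exactly two such pairs.
    assert sum(d * (n - 1 - d) for d in degrees) == 2 * (counts[1] + counts[2])
    mono = Fraction(counts[0] + counts[3], comb(n, 3))
    # Since d(n - 1 - d) <= (n - 1)^2 / 4, we get p1 + p2 <= 3(n - 1) / (4(n - 2)).
    bound = 1 - Fraction(3 * (n - 1), 4 * (n - 2))
    assert mono >= bound
    return mono, bound


print(check_goodman(random_graph(40, 0.5)))
print(check_goodman(complete_bipartite(20)))  # (Fraction(57, 247), Fraction(35, 152))
```

The complete bipartite example shows the finite-n bound is already nearly attained at n = 40 by a graph with no triangles at all.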
So this nice fact has been known for more than 50 years. So what if we go up to four? Let's ask the same question for four vertices and hear what we know; that will give you an indication of what the situation is. For want of a better name, let me call them Q0 and Q6: Q0 is the probability that four random vertices span no edges, and Q6 the probability that they span all six. Of course, there are more classes than these; in fact, there are 11 different graphs on four vertices up to isomorphism. In view of what we saw here, and the fact that equality holds exactly in G(n, 1/2), Erdős made the conjecture that Q0 + Q6 is at least 1/32. Let's just go over the calculation: in G(n, 1/2) each of these two probabilities is 1/64, so the sum is 1/32. But this is not true. It was refuted by Andrew Thomason.
He wrote more than one paper about this, and [indiscernible] also found counterexamples. The point is this. One of the main observations that really motivates my work on this is that in any mathematical field your understanding depends very much on the examples that you know, and there are really very, very few concrete classes of graphs that we understand. So the true minimum is smaller than 1/32, though not much smaller, one in 33 or so. But no one has any guess where the minimum is. So what is this hiding? I will tell you more about this later. Here is one reason why this is more than an aside, the fact that the natural conjecture is not true. Think of very fundamental questions like Ramsey's theorem. You know: R(k, l) is the least integer n such that if I take the complete graph on n vertices and color every edge blue or red, either I get a blue k-clique or I get a red l-clique. So what do we know about these numbers?
Let's speak mostly about R(n, n). The upper bound is something like 4^n, and the lower bound is like (√2)^n, so the gap is a fourth power. And the best lower bound for Ramsey numbers comes from G(n, 1/2). To me this suggests that perhaps this is not the right place to look for the solution: there may be graphs which behave better in this respect than random graphs. So this suggests to me that there is really a new continent to discover here, a whole class of graphs, and at the moment no one even has a guess of where it is. Granted, the difference here is not so big: we now have lower bounds of, I forget, perhaps one in 34 or something. The gap is not so big, but the point is that no one even has a concrete guess of what the optimal construction is. It's like the famous story about the elephant: there is an elephant in the room, we have touched it, but we don't understand what shape it is, what size it is, where it goes, and so on. And there is an indication that there is an elephant to discover.
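The gap between the Ramsey bounds mentioned above can be seen numerically. This is a sketch based on two textbook bounds that we are supplying as standard facts, not anything stated in the talk: Erdős's random-coloring argument (if C(n, k) · 2^(1 − C(k, 2)) < 1 then R(k, k) > n) and the Erdős–Szekeres bound R(k, k) ≤ C(2k − 2, k − 1). The function names are illustrative.

```python
from math import comb, log2


def random_lower_bound(k):
    """Largest n with C(n, k) * 2**(1 - C(k, 2)) < 1, so that R(k, k) > n
    (first-moment argument on a uniformly random 2-coloring of K_n)."""
    assert k >= 3
    n = k
    # Integer form of the condition for n + 1: C(n + 1, k) * 2 < 2**C(k, 2).
    while comb(n + 1, k) * 2 < 2 ** comb(k, 2):
        n += 1
    return n


def binomial_upper_bound(k):
    """Erdős–Szekeres: R(k, k) <= C(2k - 2, k - 1), which grows roughly like 4**k."""
    return comb(2 * k - 2, k - 1)


for k in (5, 10, 15, 20):
    lo, hi = random_lower_bound(k), binomial_upper_bound(k)
    # log2(bound) / k slowly approaches 1/2 (for sqrt(2)**k) and 2 (for 4**k).
    print(k, lo, hi, round(log2(lo) / k, 2), round(log2(hi) / k, 2))
```

The lower bound comes entirely from G(n, 1/2); the printed growth rates show why the gap between the two exponents is a fourth power.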
By the way, what is the situation with time? When should I...? Okay. Good.
So let me try to give you one more indication of how poorly we understand these things, and what other continents there are to discover. Here is a very interesting and basic notion that's related to all of this: inducibility. We fix a graph H, and the inducibility of H is the largest frequency with which we can find an induced copy of H. I'm a little sloppy here with my definition, but I think you understand. You fix H, say a k-vertex graph, and you want a big graph in which the probability that k random vertices induce H is as high as possible. There are certain graphs for which we know this, but mostly we don't. So here is a nice example; I don't know how much of the details I'll be able to cover, but let's say H is C5. Okay, let me get you doing something.
Which graph has the most induced C5s among all big graphs? What would you do? What would you guess?
>>: Blow up the C5.
>> Nati Linial: Excellent first guess. Still not final. So that's a blow-up of C5, for those who know the terminology: you replace each vertex of the C5 by an independent set of size n, and you connect consecutive sets by complete bipartite graphs. How likely are you to find a C5 here? You find it precisely if the five vertices each fall in a different block, so the probability is 5!/5^5, which is 24/625. Anyone willing to raise the bet? Anything better than this?
It's a good start; the solution's not so far off. What can you do to improve it, even locally? What if all five vertices fall in the same block? Right now we're getting nothing for that. So just recurse.
>>: [inaudible].
>> Nati Linial: So just do the same thing inside each block. If you do this, you get 1/26. So the inducibility of C5 is at least 1/26. Okay. What do we know?
I forget the exact number, but this is what we know at the moment. I don't know how much time I will have to tell you about this; let me only say two words to explain it. The upper bound comes from flag algebras. If you know what these are, then you know; if not, I hope to be able to say something. This is a beautiful method that was invented a few years ago. It has a theory, and finally you go to the computer and compute: you have to solve larger and larger semidefinite programs. Of course, at some point the machine gives up, and that gives the bound; that's the best we're able to do. So the gap here just reflects the computing power of our machines, and sometimes, in situations like this, people have managed to be clever enough to get rid of that part and actually prove the exact value. In short, as mathematicians we don't know the answer. As human beings, of course, we know the answer. That's the answer.
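The 1/26 lower bound discussed above is a fixed-point computation. Recall that the inducibility of H is the limit, as n grows, of the largest probability over n-vertex graphs that k random vertices induce H. Here is a sketch with exact fractions, assuming block sizes tend to infinity so that vertices fall into blocks independently and uniformly. Five random vertices of the iterated blow-up induce a C5 if they land in five different blocks, or, recursively, if all five land in the same block. Any other placement gives nothing, because vertices sharing a block have identical neighbourhoods outside it and C5 has no set of 2 to 4 vertices that all behave alike towards the rest. The names below are ours.

```python
from fractions import Fraction
from math import factorial

# Five random vertices of a blow-up of C5 (five equal blocks, consecutive blocks
# completely joined) induce a C5 exactly when they land in five different blocks.
DISTINCT = Fraction(factorial(5), 5 ** 5)   # 5!/5^5 = 24/625
# In the iterated blow-up we also win, recursively, when all five land in one block.
SAME = Fraction(5, 5 ** 5)                  # 1/625


def iterated_blowup_density(depth):
    """Limiting induced-C5 density of the blow-up iterated `depth` times
    (depth 1 is the plain blow-up), as block sizes tend to infinity."""
    x = Fraction(0)
    for _ in range(depth):
        x = DISTINCT + SAME * x
    return x


# Fixed point of x = DISTINCT + SAME * x, i.e. the infinitely iterated blow-up.
limit = DISTINCT / (1 - SAME)
print(iterated_blowup_density(1), iterated_blowup_density(3), limit)
```

The fixed point is (24/625) / (624/625) = 1/26, matching the value quoted in the talk.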
Okay. So perhaps it's not perfectly satisfying, but let me show you the following embarrassing thing. What is the inducibility of this one? No one has a guess. In terms of numbers, I don't remember what the upper and lower bounds are, and it's not important for the moment; they are not so different from each other, but again no one has a guess. For me, this is a close cousin of the previous examples, so it's again an indication that there's a whole family of graphs that should be very interesting. This sounds like a thoroughly basic property of the graph. By the way, this graph is self-complementary: its complement is the same graph. How can you get many induced copies of it? No one has a clue.
Okay. Somehow I thought I had more time; I prepared more material for the whole lecture. Of course, I knew I was not going to cover everything.
So let me see what I can tell you and what I will skip.
Let me say a few words... Okay, there are a few important ideas that I still want to get through.
So, in terms of this local structure of graphs: first of all, this whole thing is very closely related to a very highly developed part of combinatorics, extremal graph theory. In extremal graph theory you mostly ask questions such as: the graph has such and such density, so many edges; does it have to contain this or that graph? I think you presumably all know Turán's theorem, which says that if the edge density of G is bigger than (r − 2)/(r − 1), then G contains a K_r. The first example: the complete bipartite graph with two equal sides doesn't contain a triangle, and it is the triangle-free graph with the highest density. If you want one that doesn't contain K4, you take the complete tripartite graph with three equal parts, and that has the highest density. So the local theory of graphs is a huge extension of this basic question, because there we're only counting edge density, and here we want to understand the full view. There is, of course, a lot of material in graph theory that's highly relevant, but this takes a much broader view of graph theory.
Let me say a word about flag algebras. These are, of course, very basic theorems of graph theory, and they have led to various questions. One, if you never saw it, is still one of the most famous open problems in extremal problems for 3-graphs: the analogue of Turán's theorem. What is the highest density of a 3-uniform hypergraph, that is, a collection of triples, that does not contain K_4^(3), four vertices with all four triples present? There is a concrete conjecture, which I will not state, but it has been open for many, many years; it's one of the most famous problems. Another question in this area concerns triangles. This was already known to Mantel, whose theorem is a predecessor of Turán's theorem: a graph with edge density bigger than one half contains a triangle, and the bound is tight. The question that remained is: if the edge density of G is bigger than one half, how small can p3 be? What is the smallest possible density of triangles? In order to attack these two questions, Razborov invented his theory of flag algebras, which I will not have time to explain to you. Conceptually it's quite simple, but very powerful, and I already explained a little about how you work with it: eventually you have to solve larger and larger semidefinite programs. The triangle question was resolved by Razborov using flag algebras, and on the hypergraph question he made the most progress, but it is still open. So that's a very powerful tool that we are able to use in this whole area. But, as I said, many, many things still remain open.
So what I'd like to do in the last part of my talk... I haven't even gotten to tell you about [indiscernible], but that's not so important. Let me say a little bit about local to global. Having given you the impression that the local theory of combinatorics is deep and interesting, what's perhaps even more exciting is the local-to-global theory. My main question so far has been: what does the k-profile of a large graph look like, and what can we say about it?
As I said, with k = 3 there is perhaps still hope; I didn't tell you some of the theorems that we were able to prove, but we are making progress there. For k = 4, getting a full description of the 4-profiles seems at the moment completely out of reach. But there's another kind of question you could ask yourself: suppose I tell you the k-profile of a graph; what can you conclude globally? Here there is a beautiful conjecture by Erdős and Hajnal. But let me start with a definition. There are several general properties of profiles that are interesting to consider.
One class which is very interesting, and again worthy of a separate lecture, is quasi-randomness: an object can share properties with a random one. There is a beautiful theory of this for graphs and, more recently, for permutations. I should mention the names of Chung, Graham and Wilson, who developed such a theory for graphs, and then there is the recent work on permutations. That is something we have a fairly decent understanding of. There's another property of this kind which is much weaker: universality, or k-universality.
You say that G is k-universal if it has a full k-profile, so every k-vertex graph is there. That's another very interesting property to consider.
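The definition of k-universality above can be tested by brute force on small graphs. This is a sketch, assuming the same dict-of-neighbour-sets representation as before; `canonical`, `all_classes` and `is_k_universal` are illustrative names. It compares the isomorphism classes seen among the k-subsets of G with the list of all graphs on k vertices, which it generates by enumerating every edge set.

```python
from itertools import combinations, permutations


def canonical(vertices, adj):
    """Isomorphism-invariant key of the subgraph induced on `vertices`: the
    smallest upper-triangle adjacency bit string over all vertex orderings."""
    k = len(vertices)
    return min(
        tuple(int(order[j] in adj[order[i]]) for i in range(k) for j in range(i + 1, k))
        for order in permutations(vertices)
    )


def all_classes(k):
    """Keys of every isomorphism class of graphs on k vertices (brute force)."""
    pairs = list(combinations(range(k), 2))
    classes = set()
    for mask in range(2 ** len(pairs)):
        adj = {v: set() for v in range(k)}
        for bit, (u, v) in enumerate(pairs):
            if (mask >> bit) & 1:
                adj[u].add(v)
                adj[v].add(u)
        classes.add(canonical(tuple(range(k)), adj))
    return classes


def is_k_universal(adj, k):
    """True iff every k-vertex graph occurs as an induced subgraph of G."""
    seen = {canonical(s, adj) for s in combinations(sorted(adj), k)}
    return seen == all_classes(k)


c5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
# A triangle 0-1-2, a pendant vertex 3 attached to 0, and an isolated vertex 4.
small = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: set()}
print(len(all_classes(4)), is_k_universal(c5, 3), is_k_universal(small, 3))  # 11 False True
```

C5 fails 3-universality because it has neither a triangle nor an independent triple, while the five-vertex example contains all four graphs on three vertices.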
And this is what the Erdős–Hajnal conjecture speaks about. It's really an amazing conjecture. Here it is: for every fixed graph H there exists ε > 0 such that if G is an n-vertex graph that is H-free, then its independence number α(G), the size of its largest independent set, or its clique number ω(G) is bigger than n^ε. Let's look at this and understand it. An H-free graph is not universal: if you look at its local profile, something is missing. H-free means that G doesn't contain an induced copy of H. Now, I mentioned the Ramsey theorem to you before. If you reverse what I told you then, every graph on n vertices has either a clique or an independent set of size about log n, and that bound is tight: it's attained by a random graph. If you omit just this one object, so that there is no induced H, then the answer is conjectured to jump from logarithmic to a fractional power. In fact, in their paper they prove a bound of exp(c√(log n)), but the fractional power is open. The first case which is unknown is H = C5: we don't know how to prove or refute this for C5.
And just to give you a sense: I said that I would mention tournaments, and I mentioned them once. [indiscernible] proved that the Erdős–Hajnal conjecture for graphs is equivalent to the Erdős–Hajnal conjecture for tournaments. This requires some explanation. You understand what it means for a tournament to miss something: let's say we're dealing with 7-profiles, so there is a tournament on seven vertices which you cannot find anywhere. What replaces cliques and independent sets is the largest transitive subtournament. That's very easy: if you never saw this, it's a good exercise to show that every n-vertex tournament has a transitive subtournament on at least log2 n vertices, and the bound is tight up to a constant. So these two statements are equivalent, and they're both open. Let me finish with that.
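The exercise above has a short greedy proof: repeatedly pick a vertex that beats at least half of the vertices still in play, then keep only the vertices it beats. The sketch below assumes a tournament is given as a dict `beats` mapping each vertex to the set of vertices it beats; that representation and the function names are ours, not from the talk.

```python
import random
from itertools import combinations


def random_tournament(n, seed=0):
    """Orient each edge of K_n uniformly at random; beats[u] = set of v with u -> v."""
    rng = random.Random(seed)
    beats = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < 0.5:
            beats[u].add(v)
        else:
            beats[v].add(u)
    return beats


def greedy_transitive(beats):
    """Return vertices v1, v2, ... such that vi beats vj whenever i < j.
    Within a set S some vertex beats at least (|S| - 1) / 2 others, so the
    chain has at least floor(log2 n) + 1 = n.bit_length() vertices."""
    remaining = set(beats)
    chain = []
    while remaining:
        v = max(remaining, key=lambda u: len(beats[u] & remaining))
        chain.append(v)
        remaining &= beats[v]  # keep only the vertices that v beats
    return chain


chain = greedy_transitive(random_tournament(100))
print(len(chain), (100).bit_length())
```

Every later vertex in the chain lies in the out-neighbourhood of every earlier one, which is exactly what makes the chosen subtournament transitive.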
Let me at least mention one small thing that we did. There is a different way of stating the Erdős–Hajnal conjecture, a restatement obtained by nothing more than manipulating simple things: for every k there exists ε > 0 such that if α(G) and ω(G) are both smaller than n^ε, then G is k-universal. So this is, in a sense, a weak version of what you expect to see in a random graph. And so we were wondering whether there is anything like this where, instead of saying that there are no large cliques and no large independent sets, we only assume that certain counts are small. Here is a very simple thing; it's simple in the sense that it's just work, and no real ideas are necessary. A proposition that I proved with [indiscernible]:
Coming back to the triple statistics: if p0 and p3 are both less than 0.159, then G is 3-universal. The bound is tight. We know that this does not imply 5-universality, and we don't know whether it implies 4-universality. This is a new type of question, very recent in this area. Let me say one more thing about why I find this conjecture so exciting; there are really lots of reasons. Perhaps this is not the right thing to do in the last minute of a talk, but think about what we know in physics. In physics, all we really have to start with is local information, how particles interact, and we'd like to conclude from it the big picture. I think this is why I find this conjecture so exciting: all it says is that the repertoire of small subgraphs of a big graph has a huge effect on the overall structure of the graph. And with this I will stop. Thanks.
[clapping].
>>: What is the recent fate of [inaudible]?
>> Nati Linial: There was one here a few months ago which, my understanding is, has been found to be... I don't think it was so deep. How old is it? Do you remember the name of...? Okay. So I'm aware of a paper from a few months ago which claimed to have proved this. I didn't read it, but I checked with one person, [indiscernible]. So my understanding is that the question is still quite solidly open.
[Clapping].