>> Chris: It’s a great pleasure to introduce Jeff Steif. I would like to start by saying that he’s the Birnbaum lecturer this year. Every year we have one distinguished mathematician or probabilist give the Birnbaum lecture, so let me say a few words about William Birnbaum. Some of you know who he was, but some of you might be attending this conference for the first time. William Birnbaum lived almost the whole 20th century, from 1903 to 2000.
It gives me a special pleasure to say a few words about him because he was born in Poland. He got his PhD in Lwów, which was in Poland before the war, and that’s the place where much of pre-war Polish mathematics was done. Some of the best known names associated with Lwów were Banach, Ulam, Kac, and Birnbaum.
I will not go into detailed history but in 1939 he came to the University of Washington and he stayed at
the University of Washington until the end of his life. Of course at the end he was retired. He’s
probably best known as a statistician. He was one of the editors of the Annals of Mathematical
Statistics. This was the journal that preceded separate journals of Annals of Statistics and Annals of
Probability. He was also the President of the IMS.
He also made a very important contribution to mathematics. Nowadays many people know about Orlicz spaces, but some people have started calling them Birnbaum-Orlicz spaces because Birnbaum and Orlicz co-authored the first original paper. Orlicz kept working on these spaces while Birnbaum moved on to different problems; he made most of his contributions to statistics.
Anyhow, it’s a great pleasure to have a distinguished speaker today, who will talk about Boolean Functions, Noise Sensitivity, Influences and Percolation.
>> Jeff Steif: Thanks a lot Chris. So first I want to thank the organizers for inviting me to give this talk. I’m visiting here at Microsoft for a year and having a great time talking with the theory group and all the visitors, and it’s a great opportunity. I appreciate being here.
So, I’m going to talk about the following topics. It’s going to be very much of an overview lecture so feel
free to interrupt me anytime you want with comments or questions.
So these are the four concepts I’ll be discussing: Boolean Functions, Noise Sensitivity, Influences and how they arise in a particular model called Percolation. I think [indiscernible] said all his pictures were due to someone else; these pictures are also borrowed from elsewhere.
Okay, there’s some lecture notes that Christophe Garban and I wrote called Noise Sensitivity of Boolean
Functions and Percolation and they’re already published but we’re in the process of putting, extending
them and writing a book. So, if anyone were to look at the lecture notes and have any comments we’d
welcome any types of comments for the coming book. Today’s lecture will be a brief survey of some of
the topics covered in these lecture notes.
Okay, so the first things we’re going to talk about are Boolean functions and noise sensitivity, and the basic set up for noise sensitivity is quite elementary. We have n random variables x1 through xn that are i.i.d., + or – 1 each with probability ½, so we just have n coin flips, and we denote the vector by x. Then we have a function f from {-1, 1} to the n, that is, sequences of length n of + and – 1’s, into + or – 1, and this is called a Boolean function; it just has 2 possible outputs. So in some sense this function will be a function of these n inputs x1 through xn.
So that’s what a Boolean function is, and the next notion we need in order to introduce noise sensitivity is a small perturbation of x. So x is denoted there and x epsilon is here; this again denotes a vector of length n, x1 epsilon through xn epsilon, which we think of as a small perturbation of x. The way it’s perturbed is very simple: you go to each of these n bits and, independently with probability epsilon, I take that bit, throw it away and replace it by a new + or – 1, each with probability ½, completely independently. So each bit is erased with a very small probability epsilon and replaced by a new coin flip. So that’s the perturbation.
Note of course that x epsilon, this perturbed thing, is again a sequence of n fair coin flips, because everything is independent and symmetric. The basic question of noise sensitivity is: if we look at f of x and f of x epsilon, f of the small perturbation, are these things going to be close to independent or are they going to be highly correlated?
Now obviously if epsilon is extremely small then x epsilon and x will be the same with very high probability — you won’t have perturbed anything — and f of x and f of x epsilon will be the same value with probability close to 1, so they are highly correlated. So what we should think of instead is that epsilon is very small but fixed, and this function f is a function of very many variables, some complicated function of these n variables.
Then it’s not at all clear whether things will be independent or not, and this is the first definition, due to Benjamini, Kalai, and Schramm, who introduced this concept. We say a sequence of Boolean functions fn, mapping as above sequences of length n of + and – 1’s to + or – 1, is noise sensitive if for any fixed epsilon bigger than zero the covariance between fn of x and fn of x epsilon goes to zero.
Now, what does this mean? fn takes just 2 values, so being uncorrelated is the same as being independent. So this simply says that fn of x and fn of x epsilon are asymptotically independent: they become independent as n goes to infinity, for fixed epsilon.
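To make the set-up concrete, here is a minimal Monte Carlo sketch of the definition — an illustration added for these notes, not part of the lecture; the names perturb and noise_covariance are ours. It resamples each bit independently with probability epsilon and estimates the covariance between f(x) and f(x^epsilon).

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb(x, eps):
    """Independently resample each +/-1 bit with probability eps."""
    resample = rng.random(x.shape) < eps
    fresh = rng.choice([-1, 1], size=x.shape)
    return np.where(resample, fresh, x)

def noise_covariance(f, n, eps, trials=20000):
    """Monte Carlo estimate of Cov(f(x), f(x^eps)) for a +/-1 valued f."""
    vals = np.empty(trials)
    vals_eps = np.empty(trials)
    for t in range(trials):
        x = rng.choice([-1, 1], size=n)
        vals[t] = f(x)
        vals_eps[t] = f(perturb(x, eps))
    return np.mean(vals * vals_eps) - np.mean(vals) * np.mean(vals_eps)
```

A sequence fn is noise sensitive exactly when, for every fixed eps, this covariance tends to zero as n grows.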
Okay, so I said interrupt me if there are any comments or questions.
Okay, so let’s take 3 quick examples that are easy. The first one is the simplest Boolean function in the world; it’s called the Dictator Function: fn of x1 through xn equals x1, it’s just the first bit. This of course doesn’t really depend upon n, and it’s not hard to see that if you do a small perturbation, most likely things are not going to change, so this is not noise sensitive. The function doesn’t really even depend upon n.
An example of a noise sensitive function is the Parity Function. These bits are + or – 1 so I can multiply them out, and if I take the product of them we call this the Parity function; this is another Boolean function and it turns out it is noise sensitive. Basically, if epsilon is fixed and n is enormously big, then with very high probability you’ll be re-sampling at least one of the bits, and once a bit is resampled it’s just as likely to be + or – 1, so the two values become completely uncorrelated and independent.
An example where things are maybe not so clear at first is something called the Majority Function. For this function n, the number of variables, has to be odd, and what the function does is simply look at the n bits and see whether there is a majority of 1’s or a majority of – 1’s. If there’s a majority of 1’s the function outputs 1; if there’s a majority of – 1’s it outputs -1. One way to write that is to sum up the + or – 1 bits and take the sign of the sum. So this gives you the Majority Function, and the question is whether it is noise sensitive or not. It’s not as obvious as the other ones, but it turns out this one is not noise sensitive.

So fix your epsilon very small, and imagine an election with Democrats and Republicans, with everything being i.i.d., ½, ½. Say the Democrats won. If you then change a very small percentage of the people’s votes, the Democrats will still be the winners. So this is not noise sensitive.
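In code the three examples look as follows — again a sketch added for these notes, reusing the noise_covariance estimator above. For epsilon fixed and n large, the estimate stays near 1 for the dictator, drops to essentially 0 for parity, and stays bounded away from 0 for majority.

```python
def dictator(x):
    return x[0]                        # just the first bit

def parity(x):
    return int(np.prod(x))             # product of the +/-1 bits

def majority(x):
    return 1 if np.sum(x) > 0 else -1  # n odd, so there are no ties

# Example usage:
# for f in (dictator, parity, majority):
#     print(f.__name__, noise_covariance(f, n=101, eps=0.1))
```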
So our main example of interest, which will be the focus of the talk, is percolation theory. In percolation theory what we do is take an R by R square piece of hexagonal lattice, exactly like this, and each of those hexagons is painted black or white, each with probability ½, independently. I think of those as my input bits. Since it’s R by R, I have about R squared of these + or – 1’s; think of black as 1 and white as – 1.
Now I can define a Boolean function which describes the percolation picture. What I’m going to do is ask whether there’s a crossing from the left side to the right side consisting only of black hexagons. In this particular realization there is a black path from the left to the right side, going as I just showed you; the probability of this is about a half. So we define a Boolean function to be 1 if there’s a left to right black crossing and -1 if there’s not. This function simply tells you whether there’s such a crossing, and you can ask if it is noise sensitive — are percolation crossings noise sensitive?
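Here is a rough sketch of the crossing function as code — an illustration added for these notes. For simplicity it uses an R by R square grid of sites with 4-neighbor connectivity instead of the hexagonal lattice of the talk, but the Boolean function is the same in spirit: color each site black with probability ½ and ask for a left to right black crossing.

```python
from collections import deque
import numpy as np

def crossing(grid):
    """Return 1 if a left-to-right path of True sites exists, else -1."""
    R = grid.shape[0]
    seen = np.zeros_like(grid, dtype=bool)
    queue = deque()
    for i in range(R):                      # seed the search with the open left column
        if grid[i, 0]:
            seen[i, 0] = True
            queue.append((i, 0))
    while queue:
        i, j = queue.popleft()
        if j == R - 1:                      # reached the right side
            return 1
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if 0 <= a < R and 0 <= b < R and grid[a, b] and not seen[a, b]:
                seen[a, b] = True
                queue.append((a, b))
    return -1

# grid = np.random.default_rng(1).random((50, 50)) < 0.5
# print(crossing(grid))
```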
So the way you think about this is: on the left side we have the original percolation configuration omega, and let’s assume there is a left to right crossing — the crossing I’ve found I’ve turned from black into red. Now I’m going to apply this so-called epsilon noise, so only an epsilon proportion of these hexagons are going to be re-flipped to determine their value. The question is: is there still a crossing from left to right? Given the fact that there was a crossing before I applied this noise, how much information does that give you about whether there will be a crossing afterwards?
Noise sensitivity means basically that no information is being transferred. So the question is whether noise sensitivity holds here, and the theorem concerning noise sensitivity was proven by Benjamini, Kalai, and Schramm: percolation crossings are noise sensitive. That is, if you have a crossing initially and you do this very small perturbation, you get no information whatsoever about whether there will be a crossing after the perturbation. So we call this noise sensitive; it’s sensitive to a small amount of noise.
Now, if epsilon is fixed, the statement is that for every fixed epsilon these become asymptotically uncorrelated. It’s clear that as epsilon gets smaller the perturbation is closer to the original configuration, and so it’s going to be harder for things to become independent. Still we can ask: what would happen if we let epsilon not be fixed but go to zero with R at some rate? Of course if epsilon R decreases to zero too quickly you’ll never get independence of the crossing before and after; if epsilon is too small the perturbation won’t change anything. But one can still ask whether it can go to zero and at what kind of rate.
In the initial paper where they proved noise sensitivity they proved something stronger. They showed you can even let epsilon R go to zero, as long as it doesn’t go too quickly: it has to be bigger than or equal to some specified sufficiently large constant divided by log R. So as long as it doesn’t go to zero quicker than 1 over log R, essentially, you still get this asymptotic independence of the percolation picture from before and after.
The next question one can ask is what happens to the amount of noise: this says you can go to zero logarithmically, but can you send epsilon R to zero even quicker, like an inverse power? So what happens if the amount of noise epsilon R decreases to zero as an inverse power of R, like 1 over R to the ½ — could you still have these things being asymptotically uncorrelated? Okay, so this will be the motivating question, which we’ll come back to in a few minutes.
So now I’m going to introduce a couple of other concepts that arise for Boolean functions and in this area, and that turn out to be very key to the study of all these things. The key players are two things called pivotality and influences.
So now we go back to the general context of Boolean functions and I want to talk about the idea of a particular bit, the ith bit — i is one of the bits, say the 5th bit or the 10th bit — being pivotal. For a Boolean function f, the event that i is pivotal is defined as follows; it turns out to be an event that is measurable with respect to the other variables. It’s the event that if you were to change the ith bit, the outcome of the function would change. In other words, take a realization and look at f, which is 1 or -1; now go to the ith bit and change it. Maybe f changed, maybe f didn’t change. If f changed then I say i was pivotal; basically it’s pivotal for whether f is going to be + or – 1.
So that’s the notion of pivotal, and then there’s the notion of influence. The influence of the ith bit, which we denote I sub i of f — the big I is for influence, the little i says we’re talking about the ith bit, and f is our Boolean function — is exactly the probability that i is pivotal: the probability that in the realization this particular ith bit turned out to be very important, in the sense that by changing it you change the outcome of f.
So let me go back to our 3 Boolean functions, the Dictator function, the Parity function and the Majority function, and just check what the influences are so that the concept is clear. For the simplest one, the Dictator function, fn of x1 through xn is just the first bit. Obviously the first bit has influence 1 because it’s always pivotal: if you change the first bit the outcome always changes. The other bits of course have influence zero because those variables don’t even come into f. For the Parity function it’s also very simple: every variable has influence 1 because any bit is always pivotal. If I change any bit the function changes sign, because you’re taking a product of + and – 1’s, so trivially all the influences are 1.
For the Majority function it’s slightly more interesting, yet not so hard to see, what the influences are. Of course the influences of all the variables are the same because all the variables play the same role, and the influences are approximately 1 over n to the ½. This is basically because, if you look at the first bit, what does it mean for the first bit to be pivotal? When can the first bit change the outcome? The first bit can only change the outcome if there’s a tie among all of the other bits. If you have n – 1 other bits and you want a tie between the + 1’s and – 1’s, this is basically like a random walk being back at the origin at time n – 1, and this decays like a constant over n to the ½ power. So here all the influences are about 1 over n to the ½.
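Written out (a rendering of the argument just given, using Stirling’s formula):

$$I_1(\mathrm{Maj}_n) \;=\; \mathbb{P}\big[\text{the other } n-1 \text{ bits tie}\big] \;=\; \binom{n-1}{(n-1)/2}\,2^{-(n-1)} \;\sim\; \sqrt{\frac{2}{\pi n}} \;\asymp\; \frac{1}{\sqrt{n}}.$$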
Let me mention a theorem that, although it’s not so related to noise sensitivity, is an interesting theorem in the area if you find the notion of influence interesting. The theorem is an answer to the following question: how small can all the influences be?
Now if you take f to be a constant function all the influences are zero, so this is uninteresting. We have to avoid the degenerate cases, so let’s stick to functions f where the probability that f is 1 is, say, somewhere between ¼ and ¾ — non-degenerate. If we stick to such f, you can ask how small you can make the largest influence. Is there any guarantee that there will be some variable of reasonably large influence, and if so, how large?
The majority functions show that the maximum influence can be as small as 1 over n to the ½. The question is: could it get even smaller than 1 over n to the ½? The answer is that it turns out it can be a lot smaller than 1 over n to the ½ power.
So here is a particular Boolean function. Take the n variables x1 through xn and partition them into disjoint blocks — first block, second block, third block, etcetera — where the length of each block is about log base 2 of n minus log base 2 of log base 2 of n. Having partitioned it that way, we say that the function f is 1 if at least 1 of these blocks is all 1’s. So if you can find one of these blocks that is all 1’s, the function is 1; if there’s no such block the function is – 1.
Okay, this is a Boolean function and it turns out it’s non-degenerate: the probability that f is 1 is somewhere between ¼ and ¾. So it’s easy to check that these functions are non-degenerate, and it’s also easy to check that the influences are of order log n over n. So these influences, log n over n, are much smaller than the 1 over n to the ½ of the majority function.
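Here is a sketch of this block-style function — code added for these notes, with the block length taken to be roughly log base 2 of n minus log base 2 of log base 2 of n as described above.

```python
import math

def tribes(x):
    """+1 if some block of consecutive bits is all 1's, else -1."""
    n = len(x)
    k = max(1, int(math.log2(n) - math.log2(max(1.0, math.log2(n)))))  # block length
    for start in range(0, n - k + 1, k):
        if all(b == 1 for b in x[start:start + k]):
            return 1
    return -1
```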
It turns out there’s a theorem by Kahn, Kalai, and Linial that says this is best possible. If you take any non-degenerate Boolean function, you can always find at least 1 of the variables whose influence is at least a constant times log n over n. So the answer to the question of how small the influences can be is: they can get as small as log n over n, but they can’t get any smaller.
>>: [inaudible] types of [indiscernible] constant?
>> Jeff Steif: What’s that?
>>: Does types of [indiscernible] constant?
>>: No.
>> Jeff Steif: I don’t know. No, Jeffrey says no.
>>: So what does?
>>: We don’t know –
>> Jeff Steif: What’s that?
>>: There’s a gap of I think terminal 2.
>> Jeff Steif: [inaudible] the optimal constant –
>>: [inaudible] best [indiscernible].
>> Jeff Steif: Okay, okay.
>>: But the best example is something like that?
>>: The best example is [indiscernible].
>> Jeff Steif: Okay, so we originally talked about noise sensitivity and then we moved into influences. I should say this argument uses Fourier analysis, hypercontractivity of certain operators and the so-called Bonami-Beckner inequality, but I’m not going to go into that — this is more of an overview — these are just the types of things that come in.
So why are the influences relevant to noise sensitivity? If you’re only interested in noise sensitivity, why do you care about the influences? One of the fundamental theorems in the area, originally in the Benjamini, Kalai, and Schramm paper, says that if you take any sequence of Boolean functions and you look at the sum of the squared influences — you square the influences and sum them up — and this goes to zero, then the sequence is noise sensitive. So in this sense the influences give you information on whether you’re noise sensitive.
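Written out, the Benjamini-Kalai-Schramm criterion just described is:

$$\sum_{i} I_i(f_n)^2 \;\xrightarrow[n \to \infty]{}\; 0 \quad\Longrightarrow\quad \operatorname{Cov}\!\big(f_n(x),\, f_n(x^{\varepsilon})\big) \to 0 \ \text{ for every fixed } \varepsilon > 0,$$

i.e. the sequence $(f_n)$ is noise sensitive.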
This condition is certainly not necessary, because if you take the parity function, the parity function is trivially very noise sensitive, but it certainly doesn’t satisfy the condition: all of the influences are 1, so that sum is in fact n, which goes to infinity. So it’s not necessary, but it turns out to be necessary for a very large class of functions. The condition is necessary for monotone functions, which means increasing: a Boolean function is monotone if whenever you change some of the bits from -1 to 1 the output of the function can only increase.
It turns out that for these the condition is necessary, and so this gives you a necessary and sufficient condition for noise sensitivity. Take the majority functions; I told you they were not noise sensitive. What happens when you plug them into the formula? The influences were 1 over the square root of n; you square that and get 1 over n; you sum over the n bits and you get 1. So the sum is of order 1 for the majority, and it just barely misses satisfying this condition. It turns out that the majority function is an extremal sequence in many respects for Boolean functions.
This theorem is also proved using Fourier analysis and hypercontractivity, together with an inequality, or a generalization of an inequality, due to Talagrand.
Okay, so now we want to get back to percolation crossings using this general theorem. I told you that percolation crossings are noise sensitive, so how do we get the noise sensitivity of percolation crossings from this? Although it was not done in this way in the original paper, I want to explain to you basically how it would follow from this theorem. Assuming this theorem, how do we get the noise sensitivity? We have to somehow compute the influences in the particular case of percolation.
So at this point we have the event that in an R by R square there’s a left to right black crossing, and I want some way of computing what the influences are. The answer is going to bring us into what are called critical exponents in percolation, so let me describe what these are.
First of all, I only described percolation on an R by R box, but normally we actually do percolation on an infinite lattice. So imagine you do percolation on an infinite hexagonal lattice. We’ve always been taking P equal to ½, the probability of a black, but you could take the probability of a black to be anything. So imagine you take the probability of a black to be P, and white 1 – P, independently, and I want to ask: is there an infinite black component? I do this on the entire infinite hexagonal lattice — is there an infinite black component? The answer is that it depends upon P and there’s a critical value. It was essentially shown in 1960 by Harris that when P is ½ there’s no infinite black component, and it was proven 20 years later by Kesten that if P is bigger than ½ then in fact you suddenly get this infinite black component.
So we say the critical value for percolation is ½, and at this critical value there’s no infinite component. Okay, now we look at this picture, we look at our infinite lattice, P is ½, and I want to look at the event that there is an open path, a black path, from the origin to distance R away.
Since the probability of having an infinite black path is zero, we know that this probability goes to zero. The question is how fast it goes. How fast it goes was answered by Lawler, Schramm and Werner in 2002: the probability of this event — alpha 1 of R is just the name for this probability — decays like R to the – 5/48, essentially, plus a little o of 1 in the exponent, don’t worry about that. So it decays like 1 over R to the 5/48, and we call 5/48 a critical exponent.
So let me just mention that there’s a critical value, P equal to ½, the so-called critical value that determines whether there’s an infinite cluster, and there’s this critical exponent 5/48, and there’s a big difference between them. People like to say that the 5/48 is universal and that the P equal to ½ is not universal. What this means is that this ½ happened to be the right thing for this model, but if you changed the model a little bit and did something else you could get a different critical value for P. However, if you took a different model, looked at the percolation picture at the new critical value, and looked at this event again, it’s believed that it should still decay like 1 over R to the 5/48. So 5/48 should be the number that comes up for all of these models, while the ½ was very special to this one, and that’s why they call the 5/48 universal.
Okay, now there’s another critical exponent that’s going to be relevant for the influences. This is called the four-arm exponent. Now I want to look at the following event: from the origin there are 2 black paths going out and 2 white paths going out, in this clockwise order, black-white-black-white. Again this is even more unlikely than the other event, so the probability of this will also go to zero. The question is how fast. Smirnov and Werner showed that the probability of this event decays like 1 over R to a different power, and this other power is 5/4. So we say that 5/4 is the critical exponent for the four-arm event, because there are 4 arms in this picture.
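In symbols, the two critical exponents just described are (writing $\alpha_1$ and $\alpha_4$ for the one-arm and four-arm probabilities):

$$\alpha_1(R) = \mathbb{P}\big[\text{black path from } 0 \text{ to distance } R\big] = R^{-5/48 + o(1)}, \qquad \alpha_4(R) = \mathbb{P}\big[\text{four alternating arms to distance } R\big] = R^{-5/4 + o(1)}.$$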
Okay, now this is exactly the right picture to capture the notion of influence. What is the probability of a hexagon being pivotal for percolation? So now we’re back to the crossing picture. We’re looking now for a blue path from left to right and we want to know when a particular hexagon x is pivotal (forget the D here). So what has to happen if x is pivotal? It means that if x is on, if it’s blue, there’s a left to right crossing, and if it’s red there’s no crossing. Well, if x is on that means there has to be some blue crossing from left to right, and this blue crossing has to go through x. If the blue crossing didn’t go through x, then by turning x off you wouldn’t have gotten rid of the blue crossing. So for x to be pivotal the blue crossing has to go through x.
In addition there has to be no blue path from this part of the path to this part of the path. Because if there were a path going like this, you could get from here to there avoiding x, and x would not be pivotal. So, in order for there to be no blue path from here to here, there has to be a red path going up there and a red path going down there — red is now the other color, so blue and red rather than black and white.
So that’s the picture for being pivotal, and of course as long as you’re not near the boundary this is exactly what we had on the previous slide, the four-arm event with these four arms going out. So the probability of being pivotal, as long as you’re away from the boundary, is at most, by the critical exponent on the previous slide, 1 over R to the 5/4. What does this do to the sum of the squared influences? Well, there are R squared many hexagons, each with influence about 1 over R to the 5/4. I square the 1 over R to the 5/4 and get 1 over R to the 5/2; I multiply that by R squared, and that goes to zero.
Other than that, you have to deal with the boundary. It’s not very difficult, but you have to deal with boundary issues because this is really only the appropriate picture when you’re away from the boundary. But this basically tells you that the sum of the squared influences goes to zero, and you get noise sensitivity.
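The computation just described, in one line (ignoring boundary effects):

$$\sum_{x} I_x(f_R)^2 \;\lesssim\; R^2 \cdot \big(R^{-5/4}\big)^2 \;=\; R^2 \cdot R^{-5/2} \;=\; R^{-1/2} \;\longrightarrow\; 0.$$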
Okay, so now we come to quantitative noise sensitivity; this is the question of whether you can let epsilon R go to zero. Recall that Benjamini, Kalai, and Schramm, as I told you, showed that percolation crossings de-correlate under noise even when epsilon R goes to zero, as long as it doesn’t go too quickly — as long as it’s bigger than a constant over log R for a sufficiently large constant C.
Might we believe that percolation crossings can de-correlate — and de-correlate can mean lots of things; it can mean partially independent or completely independent. For me de-correlate means essentially completely independent, so that they become asymptotically independent — not just that they’re not completely correlated, but that they’re asymptotically independent.
So, might we believe that percolation crossings can de-correlate even if epsilon R is 1 over R to the alpha? If so, what would be the largest alpha we could use? If we take alpha too big, the amount of noise becomes so small that things won’t change and you can’t possibly de-correlate, so this is a well defined question: how big can we take the alpha? Basically a lot of the talk will be trying to convince you, or telling you, what’s known and what the alpha is.
Okay, so now you want to know the alpha. The heuristic answer is yes — I don’t know if this is really the answer; we’ll see the heuristic on the next slide. So yes, we might believe that crossings de-correlate, and the guess for the largest alpha is ¾. What I want to do on the next slide is explain why you might believe the best exponent is ¾. So we repeat it here, the heuristic for the noise sensitivity exponent for percolation: might we believe that we de-correlate even if epsilon goes like 1 over a power? Yes, and the largest alpha should be ¾. By the bottom of the slide I hope to convince you heuristically that ¾ is the answer.
So recall — we already said this before — the four-arm exponent says that the probability of a hexagon being pivotal is about 1 over R to the 5/4. That’s the probability of one hexagon being pivotal. If we now look at all the pivotals and ask for the expected number of pivotals: we have R squared many hexagons, each pivotal with about that probability, so the expected number of pivotal hexagons is about R to the ¾, which is the ¾ above.
Therefore, imagine now that the noise epsilon R, the probability with which you’re re-sampling, is 1 over R to the alpha. What would be the expected number of pivotal hexagons that we resample? We have this random set of pivotals; what’s the expected number that we resample? Well, it would be exactly the expected number of pivotal hexagons times epsilon R. So this would be just R to the ¾ times epsilon R, which is R to the ¾ minus alpha.
So if alpha is bigger than ¾ then the expected number of pivotals that we resample goes to zero. If alpha is bigger than ¾, that means we don’t actually resample a pivotal, and things don’t change.
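The heuristic count, written out:

$$\mathbb{E}\big[\#\{\text{pivotal hexagons}\}\big] \approx R^2 \cdot R^{-5/4} = R^{3/4}, \qquad \mathbb{E}\big[\#\{\text{pivotals resampled}\}\big] \approx \varepsilon_R \cdot R^{3/4} = R^{3/4 - \alpha} \ \text{ for } \varepsilon_R = R^{-\alpha},$$

so the transition is at $\alpha = 3/4$: the expectation tends to zero for $\alpha > 3/4$ and blows up for $\alpha < 3/4$.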
Okay, you might argue right away: hold it, I don’t have to touch a pivotal to change things. Maybe I change this bit, which wasn’t pivotal, and this bit, which is pivotal, and if I change them both of course I could change things. So it’s certainly not the case that not hitting a pivotal means I don’t change things — but it’s a heuristic. In fact making this half of the heuristic rigorous is actually quite easy; it can be made rigorous in a five minute argument.
If alpha is less than ¾ then the expected number of pivotal hexagons that we resample is R to the ¾ minus alpha, which is now very big, so we’re very likely to hit a pivotal. If we hit a pivotal, things change and everything should get mixed up — you’ve lost all your information. So that’s the heuristic for the ¾.
Now this part is much, much harder to make rigorous; it’s a heuristic for where the ¾ comes from. If you hit a pivotal things could change, and then maybe everything got lost in the shuffle, so what you knew originally about the picture — that you had a crossing — gives you no information afterwards. Okay, so that explains the heuristic for ¾.
So now let me tell you about 2 different approaches that were used to get partial results. The first one didn’t give the ¾, but it was the first argument that allowed epsilon R to go like 1 over R to some power. Now this approach and the approach I’m going to describe afterwards are very different, and they’re also very different from the original Benjamini, Kalai, and Schramm argument which said that if the noise is bigger than a constant over log R then things go to zero. Although I should say — and I’ll explain very simply some of the Fourier analysis later — all 3 approaches use Fourier analysis, but beyond that common component the arguments are quite different.
On this slide I’m going to describe the second approach, which is related to computer science — theoretical computer science, you could say. So now you have to sort of think like a theoretical computer scientist, if you know how they think.
So we have a Boolean function, and I want to consider randomized algorithms. Before reading the slide, let me tell you in words what you should imagine a randomized algorithm is. Let’s say you have a Boolean function of n variables. You know what the Boolean function is — maybe it’s majority, maybe it’s something else. I ask you to compute f of x1 through xn. Well, you could do it, you know the function f, but unfortunately you can’t see the bits; x1 through xn are covered, you don’t know what they are.
What you can do is ask me for some of the values of x1 through xn. You might say, Jeff, tell me what the 3rd bit is please, and I say oh, that was a 1, and you look at that and say ah, that’s a 1. Then: please tell me what the 10th bit is. Maybe I’ll say that’s – 1. Then, as you get more and more information, you keep asking about new bits, but each new bit you ask about — which bit you choose — will depend on what you’ve seen up until that point in time. At some point, presumably before you’ve found out about every bit, you might say: don’t tell me anything else, I already know what the outcome is.
For example in the percolation picture, if you’ve already found a left to right black crossing, then even though you haven’t seen some of the other hexagons you wouldn’t need any more information. What you want to do is this type of thing — ask me these questions — while trying to ask as few questions as possible.
Okay, so given that, it should be easier to understand this. A randomized algorithm A for f examines the bits one by one, with the choice of the next bit examined allowed to depend on the values of the bits examined so far. It’s also allowed to be random: which bit you choose next may depend on what you’ve seen and on some exterior randomness. Even the first bit is random — the algorithm might pick a bit uniformly at random or according to some other distribution; that’s allowed. That’s what we call a randomized algorithm. The algorithm stops as soon as you know what the output of f is.
Now I said you want to ask as few questions as possible, and one way of quantifying this is the following. When the game is over and you know what the output of f is, there are certain bits which you’ve asked me about. We let J be this random set of bits examined by the algorithm. That’s a random set, and I define the revealment of A, which I call delta sub A, to be the following; it represents the degree to which my algorithm A reveals the bits.
For every one of these n bits I can ask: what’s the probability that this bit was ever looked at by the randomized algorithm? That’s the probability that i is in J. So J is random, but if I fix a bit i, I can talk about the probability that i is in J — in other words, the probability that you asked what bit i is. I take the maximum of this over all the different bits; that’s called the revealment of the algorithm. With Oded, one version of one of the theorems we have is the following: let fn be a sequence of Boolean functions with An a randomized algorithm for fn.
The first theorem says that if these algorithms are such that the revealments go to zero — so for very large n, for any fixed bit it’s very unlikely you’ll ask what its value is — then the sequence is noise sensitive. I’ll explain concretely how you do this in percolation. There also turns out to be a quantitative version, which says the following: if the revealments go down according to some inverse power of n, say at most C over n to the alpha for some alpha, then for any beta less than alpha over 2 you end up getting the noise sensitivity you want as long as the noise is 1 over n to the beta. So what we’re asking is that fn of x and fn of the perturbed x, with this very small amount of noise 1 over n to the beta, become asymptotically independent — and this holds whenever beta is smaller than alpha over 2, if you have this bound on the revealment.
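In symbols, the statement just described reads roughly as follows, with $\delta(A) = \max_i \mathbb{P}[i \in J_A]$ the revealment:

$$\delta(A_n) \to 0 \ \Rightarrow\ (f_n) \text{ is noise sensitive}; \qquad \delta(A_n) \le C\, n^{-\alpha} \text{ and } \beta < \tfrac{\alpha}{2} \ \Rightarrow\ \operatorname{Cov}\!\big(f_n(x),\, f_n(x^{\varepsilon_n})\big) \to 0 \ \text{ for } \varepsilon_n = n^{-\beta}.$$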
Okay, so what does this give you for percolation? To explain that, I have to tell you what this interface is. I also unfortunately have to change white to black and black to white now — we’re going to look at left to right crossings of white. Unfortunately I got these pictures from someone else and I have no idea how to change the pictures.
[laughter]
Rotating doesn’t help either. Anyway, now we wonder: is there a left to right crossing or not? This red curve is called the interface between the 2 colors. When you look at SLE-6 for critical percolation there’s a famous picture of Oded’s, and this is what’s going to be described here.
So here’s how you determine if there’s a left to right crossing of whites. I start the path — this red path here — and I continue the path always keeping white on the right and black on the left. So here it comes this way to keep the white on the right, and then I want a black on the left, so it comes down here. This defines the path precisely. It comes around and goes around, always keeping the whites on the right; it comes here, bounces and stays here, and then it comes back here and goes up there. Okay, it keeps whites on the right, blacks on the left.
Now this red path tells you if there’s a left to right crossing. Start the path; it’s going to bounce off this side for a while and that side for a while. Eventually it’s going to hit the top or the left side. If it hits the top first, that says there’s no left to right crossing, because on the left side of this red path there is a black vertical path that will block any white path. Conversely, if this red path, the interface, hits the left side before the top, you can look at exactly what’s above the interface and that would be your white crossing.
So looking at this red path tells you whether there’s a crossing. Okay, now the algorithm is the following. The algorithm starts here — now, you may not like this because it means I always examine this bit with probability 1, so the revealment won’t go to zero, but let’s not worry about that.
The algorithm simply asks about the bits it needs in order for the path to be defined. So it says, what are you? It’s white, okay. Now the path goes this way, so it asks about this bit because that’s the next one it needs to know; that’s black, the path goes up here, and then it needs to ask about this one. So basically, in the end, this path will have asked about the bits on the 2 sides of the path but no others.
So that’s how the algorithm works. Now let’s ignore the boundary — you have to modify this and I’m not going to explain how to modify it; I’ll tell you why it works when you’re away from the boundary.
If you’re in the middle somewhere, and I ask what’s the probability that this particular hexagon is looked at by the algorithm — that’s what we need to know. Well, to be looked at you have to be adjacent to the interface, and if you take a point on the interface you’ll see that it has a black path to the boundary and a white path to the boundary. So a hexagon near the center of the picture is examined only if there are both a white and a black path coming out of it.
That’s a critical exponent that I have not described for you. It’s called the two-arm exponent; it’s also a critical exponent which is known, and it decays like 1 over R to the ¼. So this says that for points not near the boundary the probability of being revealed is at most about 1 over R to the ¼. You can do some extra randomization to get rid of the problems at the boundary.
Hence, by the previous theorem, you end up getting de-correlation as long as epsilon R is larger than 1 over R to the 1/8. So it gives a proof that you can let epsilon R decay as a power — this power — and still get noise sensitivity, the de-correlation.
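One way to match up the exponents, under the rough assumption that there are about $n \approx R^2$ input bits:

$$\delta \;\lesssim\; R^{-1/4} \;=\; n^{-1/8}, \qquad \text{so } \alpha = \tfrac{1}{8},\ \beta < \tfrac{1}{16} \text{ in terms of } n, \quad\text{i.e. } \varepsilon_R = n^{-\beta} = R^{-2\beta} \ \text{ with } 2\beta < \tfrac{1}{8}.$$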
Of course this 1/8 is a factor of 6 off from the ¾ conjecture. Yeah.
>>: Do you have an example of a nice monotone function that is noise sensitive but has no algorithm –
>> Jeff Steif: Yeah –
>>: [inaudible]?
>> Jeff Steif: Yeah, take G(n, p) with p equal to ½ and do clique containment at the right clique size. You ask whether you have a clique of size about 2 log n. It turns out it’s noise sensitive and there’s no algorithm.
>>: [inaudible] noise it’s whatever it should be.
>> Jeff Steif: Yeah, yeah, yeah — you want to choose the clique size to get it non-degenerate, it’s about 2 log n, and it only becomes a non-degenerate function at certain values of n. That’s an example, and it’s known that there’s no algorithm.
>>: [inaudible]
>> Jeff Steif: Yeah, it’s known. So that’s the best example I know as an answer to your question. I can also say — you might say, okay, you’re off by a factor of 6, but can you find a better algorithm? This is one algorithm, and the answer is maybe you can find better algorithms; we don’t know if you can, and it’s an interesting question. Nonetheless there are theorems that say that even if there are better algorithms, they can’t be that much better. There are bounds on how good an algorithm can be, and it’s known that you can’t possibly get up to the ¾ via this method.
Okay, so the, do you know what time I started?
>> Chris: 10 more minutes.
>> Jeff Steif: I have 10 more minutes, okay. So, okay, the way this is done is the Fourier set-up. You have the following thing. Take the set of functions from — the slide says 0, 1 to the n, but it should be +, – 1 to the n — into R. If we look at all functions, not just Boolean ones, this is of course a 2 to the n dimensional vector space, because the domain has 2 to the n elements. There’s a very, very nice orthogonal basis for this vector space, given by the functions chi sub S where S is a subset of 1 through n. If S is the empty set, chi of the empty set is taken to be the constant function 1. Otherwise chi sub S is the following Boolean function: chi sub S of x1 through xn is obtained by simply taking the bits sitting inside of S and multiplying them; that’s chi sub S. If you know about group theory these are the characters of the group Z 2 to the n, but you don’t need that in this context — everything can stay combinatorial and you don’t have to deal with it.
Okay, so those are the characters. They’re our basis elements, they’re orthogonal, and therefore if I give you any function f you can simply write it out in this orthogonal basis: f is the sum, over all S contained in 1 through n, of f-hat of S times chi sub S. f-hat of S is called the Fourier coefficient of f at S.
What’s elementary to check — very elementary — is that the correlation we’re interested in can be expressed in the following very simple way in terms of these Fourier coefficients. Since my f maps into + or – 1 it has L2 norm 1, which means that if I sum up the squares of these Fourier coefficients, just by the Pythagorean theorem, they add up to 1. What you end up getting is that this covariance is given by the sum over k from 1 to n of 1 – epsilon to the k times the sum of f-hat of S squared over the S’s of size k; we call this the Fourier weight at size k. Now epsilon is small, but if I take 1 – epsilon to a big power it becomes small. So noise sensitivity corresponds to the Fourier weights being concentrated on sets S of large size — most of the weight should be on large S. That’s what this formula tells us.
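Writing out the formulas just described:

$$f = \sum_{S \subseteq \{1,\dots,n\}} \hat f(S)\, \chi_S, \qquad \chi_S(x) = \prod_{i \in S} x_i, \qquad \sum_{S} \hat f(S)^2 = 1,$$
$$\operatorname{Cov}\!\big(f(x),\, f(x^{\varepsilon})\big) = \sum_{k=1}^{n} (1-\varepsilon)^k \sum_{|S| = k} \hat f(S)^2.$$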
So let me give you an interesting relationship between the spectral sample and something else. It’s not necessary, but it’s nice to have the following picture. There’s something called the spectral sample. Given a Boolean function f, its spectral sample, which we call script S sub f, is a random set. It’s a random subset of 1 through n and it’s defined distributionally as follows: the probability that this random set equals S is simply the Fourier coefficient of S squared. These all add up to 1, so this gives us a probability distribution on the subsets. Then, in terms of this random set, you can re-write this correlation which you want to go to zero very nicely: it’s basically the expected value of 1 – epsilon raised to the size of this random set.
So noise sensitivity is basically equivalent to the cardinality of this set going off to infinity. To understand noise sensitivity, what one has to understand is the distribution of the cardinality of this set. It’s a very complicated random set, but this tells us that’s what we want to understand, and quantitative noise sensitivity can be obtained by understanding the typical behavior of this random set.
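In symbols, with the spectral sample defined by the squared Fourier coefficients:

$$\mathbb{P}\big[\mathcal{S}_f = S\big] = \hat f(S)^2, \qquad \mathbb{E}\big[f(x)\, f(x^{\varepsilon})\big] = \mathbb{E}\big[(1-\varepsilon)^{|\mathcal{S}_f|}\big],$$

so noise sensitivity amounts to $|\mathcal{S}_{f_n}|$ tending to infinity in probability (the covariance is the same expectation restricted to nonempty sets).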
Now, what’s not so hard to get a hold of is the expected value of that cardinality. The hard thing is to show that the expected value of this random variable actually tells you about the typical situation.
So there turns out to be an interesting relation between 2 different random sets associated with a Boolean function. We have the spectral sample — that’s one random set — and we have the pivotal set, the set of bits which are pivotal for the Boolean function; that’s another random set. It turns out that there’s a very interesting relationship between them. They are not defined on the same space, so you can’t ask if they’re independent or not, but you can ask how they’re distributionally related, and there are some amazing things.
The probability that a given bit belongs to these 2 random sets is the same; therefore the expected sizes of these 2 sets are the same. It also turns out that these 2 random sets have the same 2-dimensional marginals.
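The first two moment relations just mentioned, in symbols (the two-point identity holds at least for monotone functions, which covers the crossing events here):

$$\mathbb{P}\big[i \in \mathcal{S}_f\big] = \mathbb{P}\big[i \in \mathcal{P}_f\big] = I_i(f), \qquad \mathbb{P}\big[i, j \in \mathcal{S}_f\big] = \mathbb{P}\big[i, j \in \mathcal{P}_f\big], \qquad \text{so } \mathbb{E}\big[|\mathcal{S}_f|\big] = \mathbb{E}\big[|\mathcal{P}_f|\big] = \sum_i I_i(f).$$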
Now this turns out to be very useful, because you can sometimes obtain weak results for the spectrum from it. It tells you that second moment methods are very useful: if you want to apply a second moment method to the spectral sample, you can instead transfer it over to the set of pivotal points, which turns out to be a little bit easier to handle. So this allows you to transfer second moment arguments, but unfortunately the two sets don’t have the same distribution — in general just the same 2-dimensional marginals — so this can’t carry you all the way you want.
So what has happened with percolation? For the expected size of these 2 sets — basically, if you take the slides I showed you before and put them together — the total influence is R to the ¾, but what you need to show is that this is the typical behavior of the spectral sample.
In a very long paper in Acta by Garban, Pete and Schramm — this is just a corollary of what they did; they did a lot, but this one thing is what’s relevant to the talk, the main point of the talk — it is shown that the typical behavior of the spectral sample is R to the ¾. So it’s not only the mean, it’s the typical behavior. This was a very difficult project, and the fact that this is the typical behavior in the end yields the conjectured noise sensitivity.
So that’s what you need to do. The point is that even though one is really interested in the cardinality of this set, their method didn’t just look at the cardinality; it looked at this object as a random subset and analyzed it as such. It turns out that it has some relationship to fractal percolation, and they were able to get enough information about it to prove this result.
Okay, skip that part then. Okay.
[applause]