>> David Wilson: So today we're pleased to have Anup Rao speaking. Anup graduated from
UT Austin and did a post-doc at the Institute for Advanced Study and the Center for Computational
Intractability. And now he's a professor at UW.
>> Anup Rao: Thanks. Okay. So I'm going to talk today about pseudorandom generators. And
this is about conversations that I had with these people: Mark Braverman, who is at MSR on
the other side, and Ran Raz and Amir Yehudayoff.
So this talk is about derandomization. Broadly, the goal in derandomization is to answer this
question: does access to randomness really help in computation?
And there are two kinds of arenas where you can ask this question concretely. The first is time-bounded
computation, and here the central question is whether you can replace every time-bounded
randomized algorithm with a deterministic one.
So that's a question you can ask. And the second arena where you can ask this question is
space-bounded computation: computation that involves a bounded amount of memory. And here
the central question is whether you can replace every randomized log-space algorithm with
a deterministic log-space algorithm that does the same task. And here's the status of what we
know so far. This is cutting off 10 percent of my slide; the slide is about how to derandomize
time- and space-bounded computations. In the time-bounded world, we do have some answers, but
they are conditional. We know that BPP is equal to P, that is, you can derandomize every
randomized polynomial-time computation, under a plausible assumption. So it's still not a
definite yes, but there's a statement, generally believed to be true, that would imply that this
can happen. And in the space-bounded world, the best result to date is a result of Saks and
Zhou, building on others, that shows that any computation that uses logarithmic space and
random bits can be simulated by a deterministic computation that does the same thing but uses
space log to the three halves N.
>>: What is E on the left-hand side?
>> Anup Rao: E is exponential time, as in 2 to the O(N).
>>: So that's actually a conjecture rather than a class of [inaudible].
>>: Yes, it's under a certain conjecture.
>> Anup Rao: Yes, this is a conjecture.
>>: A certain conjecture.
>> Anup Rao: The conjecture is a believable statement that says that this class, exponential-time
computation, does not have small circuits. So there's some problem in this class that cannot be
computed by a small circuit. That's what you need. Once you have that, then you can get rid of
randomness.
But in this talk we're focusing on the second part. This is the world we're going to concentrate on.
And really we'd like to show, and at least I believe it should be true, that BPL is equal to L.
Okay. So for this talk we're going to focus on a model of computation that's really simple and
easy to describe and here's what it is. I'll explain why it's relevant to the big problems that I just
talked about.
The model of computation is called branching programs. So this is a branching program. It's
a layered graph where all the edges go from one layer to the next layer.
And every vertex in the graph has two edges that come out of it, one labeled 0 and one
labeled 1. And here's how you compute with this program: given an N-bit input
string, you just take a walk in this graph. You start
at the start vertex, which is colored green at the top. Then you read the first bit, which in this
case is 1, and that tells you to go to this vertex.
And in this way the input string just corresponds to a path in this program.
And you follow this path, which eventually leads you to the output at the end of the
program. So the output of this program is just the vertex that you reach at the end of this walk.
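Here is a minimal sketch in Python of what this walk looks like. The edges[i][v][b] representation is an illustrative assumption, not notation from the talk: it names the vertex in layer i+1 reached from vertex v in layer i when the next input bit is b.

```python
# A minimal sketch of evaluating a layered branching program, under the assumed
# representation edges[i][v][b] described above.

def evaluate(edges, x, start=0):
    """Walk the program on the bit sequence x and return the final vertex."""
    v = start
    for i, bit in enumerate(x):
        v = edges[i][v][bit]
    return v
```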
>>: So do the red and the blue edges have to be permutations, or could they be arbitrary functions?
>> Anup Rao: That's an important point that we will get to. So this particular
program that I drew actually computes this function: it computes the sum of the bits mod 3. And more
generally, a natural class of functions that these programs can compute is group products. For any
given group, you can compute this kind of group product, right? Given elements G1 through GN that
specify the program, on an N-bit input string the program computes the product of the subword that
the bit string specifies.
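For instance, the sum-mod-3 program from the slide is easy to write down in the same assumed representation; a sketch:

```python
# A sketch of the sum-mod-3 program: the vertices in each layer are the residues
# 0, 1, 2, and reading bit b moves residue v to (v + b) % 3.

def mod3_program(n):
    return [[[(v + b) % 3 for b in (0, 1)] for v in range(3)] for _ in range(n)]

# evaluate(mod3_program(5), [1, 0, 1, 1, 0]) returns (1+0+1+1+0) % 3 == 0
```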
Okay. But the reason they're really interesting from a complexity theory perspective is that you
can reformulate every randomized space-bounded computation so that it looks like this. Here's how to
do every randomized space-bounded computation.
You start by using the input to the computation to deterministically compute a width-N, length-N
program. I didn't say it before, but the width of the program is the maximum number of vertices in a
layer, and the length of the program is the number of layers. The length is always going to be N in this
talk.
Okay. But every BPL computation can be run like this. You can first use the input to compute a
width N length N program. And then the only step that uses randomness is this step. You then
perform a random walk on this program.
So you start at the start vertex and pick an edge completely at random and see where you end
up. All right. And if you end up at a designated output vertex, an accept vertex in the final layer,
then you accept; otherwise you reject.
>>: Repeat that -- the random walk is over a random string, random --
>> Anup Rao: Yes. So this machine uses the input first to make the program, to
decide what the program is.
>>: What's the input?
>> Anup Rao: Okay. So we're talking about computation here. So there's -- this machine is
trying to compute some language, decide some language. Okay? So that means that for every
input string it has to decide whether the input string is in the language or not.
And how it's going to do it is like this. It will first use the input string to generate such a program.
>>: What is the property of the program that it generates?
>> Anup Rao: Just that it looks like this.
>>: It could be any program.
>> Anup Rao: Any program. It has width N and length N. You don't know what program is going
to -- okay? And once it has this program, it still doesn't know whether it's going to accept its input
or not. But what it's going to do, it's going to perform -- so that part was deterministic. There was
no randomness involved. And the second part is the only part where it uses the randomness and
it's going to use the randomness to do a random walk on the program that it's generating.
>>: The magic is in the first step, you're just not telling us --
>> Anup Rao: What do you mean by the magic?
>>: To finish it.
>> Anup Rao: This is what the log space computation is. Okay? So if you have a machine that
uses small memory, then I can always take that machine and turn it into this. The machine does
this. Okay. So the only place that the machine uses randomness without loss of generality is in
the second step. That's what I'm trying to say. So in general it's not so clear, when you look at
a randomized machine, what it's doing with the randomness. Here I'm just telling you that you can
always force the randomness into this specific form.
>>: [inaudible].
>> Anup Rao: I'm sorry?
>>: So the machine is input dependent?
>> Anup Rao: No, the machine is not input dependent. The program is input dependent. The
machine has a [inaudible].
>>: But the program is [inaudible].
>> Anup Rao: Right. So the Turing machine generates this program, the program depends
on the input, and then it runs the random walk.
So what this means is that if you want to take the randomness out of this process, you don't
actually need to perform this random walk. All you need to do is estimate the probability that a
given branching program outputs one -- that is, the probability of reaching the accept vertex.
That's all you need. One way to do it is to perform the random walk. But if you could do it
deterministically, in log space, some other way, then you would get rid of the randomness in this
computation.
And just to put it in a different way, if you find this more appealing: you can write
out the transition matrix for this random walk, and then all you want to do is compute
one of the entries of the N-th power of this transition matrix.
So to derandomize all space-bounded computation, all you need to be able to do is
compute powers of matrices in small space.
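In code, the matrix view is a few lines; a sketch under the same assumed representation:

```python
# A sketch of the matrix view: layer i has a transition matrix M_i with
# M_i[u][v] = (number of bit labels taking u to v) / 2, and the acceptance
# probability of the uniform random walk is an entry of the product of the M_i.

import numpy as np

def accept_probability(edges, start=0, accept=0):
    width = len(edges[0])
    dist = np.zeros(width)
    dist[start] = 1.0                  # the walk begins at the start vertex
    for layer in edges:
        M = np.zeros((width, width))
        for u in range(width):
            for b in (0, 1):
                M[u][layer[u][b]] += 0.5
        dist = dist @ M                # one layer of the random walk
    return dist[accept]
```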
>>: That's assuming that all these layers are the same.
>> Anup Rao: No, it's not assuming that all the layers are the same. You can write the transition
matrix for the entire graph.
>>: I see.
>> Anup Rao: Okay. So what I'm going to describe to you now is another way, a path towards
trying to get rid of the randomness. A specific way to estimate this probability of outputting one.
And that's via what's called a pseudorandom generator. So what's a pseudorandom generator?
It's a function that takes a short string, a string of just length T bits, and stretches it out to a long
string. A string of length N bits, with the property that if you pick a completely random input and
then evaluate this function, then the output of the function is indistinguishable from a uniform
string to any program of this type. Any branching program of this type.
So what you want is that you feed in uniform bits here, stretch them out using the
generator, and give those to the program. Then you want the distribution of the program's
output in that case to be close to the distribution it would have had if you fed in truly random
bits.
So the distribution G of U_T is pseudorandom for branching programs. Okay. And the goal
is to find such a generator where, first of all, the generator is efficiently computable,
T is as small as possible, and epsilon is as small as possible. So that's what a
pseudorandom generator is.
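Spelled out as a brute-force check, which is feasible only for tiny T and N, the definition reads as follows; the function names are illustrative:

```python
# A brute-force statement of the pseudorandomness condition, a sketch:
# |Pr[P accepts G(U_T)] - Pr[P accepts U_N]| <= epsilon, where accepts(x) is
# the program's accept/reject decision on the N-bit string x.

from itertools import product

def prg_error(G, t, n, accepts):
    p_pseudo = sum(accepts(G(s)) for s in product((0, 1), repeat=t)) / 2 ** t
    p_true = sum(accepts(x) for x in product((0, 1), repeat=n)) / 2 ** n
    return abs(p_pseudo - p_true)
```

The first sum is exactly the enumeration over all 2^T seeds that the derandomization described next performs.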
Now let me show you how you can use it to derandomize programs, to derandomize small space
computation. Remember, this is what every computation does. And all we have to do is at the
end estimate the probability of outputting a 1 in this program.
And if you had a really nice generator like what we saw before, then the way you could estimate
the probability of outputting a one here is to just run over all possible inputs to this
generator and see what fraction of those inputs lead the program to the accept state.
And because this generator G is supposed to be a pseudorandom generator, the fraction of seeds
that lead to the accept state is going to be close to the correct fraction: the fraction of
uniformly random strings that would lead to the accept state.
Okay. And this would cost, assuming that the generator is efficiently computable, only an
additional space of T. So the smaller T is, the cheaper this step is.
Okay. So that's kind of the connection to derandomizing space bounded computation.
>>: But it takes time 2 to the T?
>> Anup Rao: It would take time 2 to the T. But we're interested in only counting the memory.
Okay. So the reason I'm telling you about this strategy is because it's a successful one. In an
important result sometime in the early '90s, Nisan showed exactly how to do this: he gave a
pseudorandom generator where the seed, the number of bits you need to feed into the
generator, is order log squared N. And that immediately gives that BPL is
contained in L squared.
>>: What do you mean by the statement that the distribution of the output on G(U_T) is about the
same?
>> Anup Rao: It's P of G(U_T). So you run the program.
>>: On the distributions.
>>: In what distance?
>> Anup Rao: Distance -- everywhere I'm talking about L1 distance, statistical distance.
>>: So you don't care about the amount of space that G uses?
>> Anup Rao: I do. I do, and it's small. But for the most part I'll ignore this. It turns out
that every G that I'm going to discuss can also be computed in small space.
>>: So small is T or 2 to the T?
>> Anup Rao: Small is --
>>: Small for you would be, like, one-tenth would be enough, for the distance?
>> Anup Rao: One-tenth of the distance would be enough. But small space would in this case mean T.
Okay. So what Nisan showed is that you can do exactly this with log squared N bits of seed. So you
can take log squared N bits and stretch them out to N bits so that no branching program of width N
and length N can distinguish that string from a uniformly random string. And following this, Saks
and Zhou gave what is now the best result for this problem, that BPL is contained in L to the three
halves. But this is not via a new pseudorandom generator, although they did use Nisan's
pseudorandom generator as a black box in order to get this result.
And another somewhat interesting result is a result of Saks and Zuckerman for width-2 programs.
Remember, right now log squared N is the best seed length for general programs. If you could make
this order log N, then we'd be done.
>>: In BPP, where you have, like, a constant fraction of the inputs giving you a false positive --
with probability one-third you're hitting the ones, and with probability, let's say, one-third
you're hitting the zeros when it should be one where the zero is, right? Can't you just use random
walks to get the seed down, the usual derandomization trick?
>> Anup Rao: We're trying to get rid of the randomness completely. So I can't do random -- I mean,
I could do a random walk in the branching program and estimate the probability.
>>: Then you would need N.
>> Anup Rao: Then I would need N random bits. You can't derandomize that, because then you
would have to run over all 2 to the N possibilities. The point is, if you use a pseudorandom
generator, you can derandomize it by doing the naive thing, and it's not so expensive because T is
small.
Okay. So one way to prove that BPL is equal to L is to give such a generator with T that's order
log N. And the best generator we know so far for general programs takes log squared N.
>>: So just one more question.
>> Anup Rao: You can ask many more.
>>: Again with this expander, you said that I would need log N [inaudible] but I can just
enumerate --
>> Anup Rao: Log N.
>>: If you have an expander, you need log N in order to -- I want to think about why I can't do a
random walk on an expander, where these random bits are just -- I just take logarithmically many
and then I can just brute-force enumerate over them.
>> Anup Rao: I didn't understand that. So you pick the first vertex of the expander walk
completely at random.
>>: Oh, the starting point is what's happening. I see. Okay. Yeah. That answers my question.
>> Anup Rao: Okay. So that's kind of the state of where we are. And now let me tell you about
our results. First of all, our results are going to be for a somewhat more restricted type of
program, but exactly the kind of program that I've been showing you so far: our results apply only
to regular programs. Okay? And what's a regular program? A program is regular if the in-degree
of every one of the vertices in the middle is exactly two.
Okay? And just to point out, so the types of computation that I talked about with groups, for
example, will always produce a regular program like this.
>>: You're still not making the stronger assumption that the red and blue edges are permutations?
>> Anup Rao: Right. It is weaker than permutation, because I allow two red edges
to come into the same vertex.
Okay. And just to point out another side fact: there's a famous result of Barrington that shows
that any circuit in NC1 can be simulated by branching programs. Those branching programs are
not quite like these -- they allow you to read variables more than once -- but they are regular.
So regularity, at least in some contexts, is actually not such a big handicap. Okay. And what we
show is that when programs are regular, we can get a really nice generator with a good
dependence on the width, one that interpolates between the result of Saks and Zuckerman and the
result of Nisan.
So let me say exactly what the theorem states. It says that you can get a generator that
stretches T bits to N bits, where T is order log N times (log log N plus log D), and the
output of this generator is pseudorandom for programs where the width of the program is D.
So if you plug D equals N back into this result, then you get something like Nisan's
generator, but it shows how to take advantage of the narrowness of the program.
And I should have said this before: before our work it wasn't clear how to get, for example, a
log N length seed even for width-3 programs. So somehow it didn't seem like you could take
advantage of small width, and we show how to take advantage of it.
>>: So you do slightly worse than the width-2 result to achieve log N for constant width?
>> Anup Rao: Yes. For width 2 the result was exactly order log N, and we have an
extra log log N. Okay, but what I'm hoping is that hiding in here is a technique that will be
really useful. And that's what I want to get at, more than the result itself.
And okay, so let me start talking about how we prove this. It turns out that we're not really
going to design a new pseudorandom generator. We're going to take the old pseudorandom
generator, in fact something similar to Nisan's pseudorandom generator, and just give a
better analysis of it. So let me start by describing the intuition for Nisan's pseudorandom
generator. This is not exactly how Nisan presented it; it's a way of presenting it that's been
developed over several works. Here's the basic idea.
So let's forget about trying to construct a generator from scratch. Let's assume that somebody
shows up and tells you: here's a generator that generates N over 2 bits that are pseudorandom.
Suppose there's a function G that can take T bits and stretch those T bits to N over 2 bits that
are delta-pseudorandom.
So now the question is: suppose you actually want a distribution on N-bit strings that's
pseudorandom, how can you take this generator that generates only N over 2 bits and use it to
generate an N-bit string? That's a question for you. What's the first thing you would try?
>>: [inaudible].
>> Anup Rao: Right. Exactly. So just use the generator twice. Right? You can use two
independent seeds: stretch the first seed to N over 2 bits, stretch the second seed to another
N over 2 bits, and just put them together.
So what happens when you do this? You get a new generator. Its seed length has doubled; it
becomes 2T. Its output length has doubled. And its error has also doubled, right? What you
get is that after you do the walk up to this point, the distribution on the vertices in this middle
layer is going to be delta-close to the correct distribution. And then you do another
pseudorandom walk, which will leave you with a distribution on the final output that's now 2-delta
close to the correct distribution. So that's what happens.
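As a sketch, the naive doubling is just concatenation of two independent runs:

```python
# The naive doubling, a sketch: a 2T-bit seed is split into two independent
# T-bit halves, each stretched to N/2 bits. Seed length, output length, and
# error all double.

def double_naive(G, t):
    def G2(seed):                      # seed is a tuple of 2*t bits
        return G(seed[:t]) + G(seed[t:])
    return G2
```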
So the main thing we're going to try to do is save on this doubling step. This way is too
expensive: we can't afford to spend two seeds to double the output length of the generator. So
we're going to do something more clever.
>>: Can you avoid the doubling of the error?
>> Anup Rao: The doubling of the error -- sorry, these people were not
able to avoid it. But later I will show you how we save on it.
Now I'm going to show you how to save on the doubling of the seed, which is what Nisan and this
line of work managed to do. But they didn't manage to stop the doubling of the error, which is
what leads them to get worse results in this context.
So the main technical tool that's useful to kind of avoid doubling the seed length is what's called a
randomness extractor. Okay. So I'm going to be a bit informal here in describing what it is.
Because it's not so important for this talk.
But what it is is a way to start with a distribution that is not close to uniform and make it
uniform. So it's a function that takes two inputs: a T-bit string and a string of K plus log 1 over
epsilon bits.
And what it outputs is again a T-bit string. And the property it has is that if the input
distribution has entropy at least T minus K -- so the entropy is at most K bits less than full --
then the output of the extractor is epsilon-close to uniform.
So what the extractor does is take these additional K plus log 1 over epsilon bits in Y
and reload that randomness back into the X that you started out with.
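For tiny parameters one can check this property by brute force; a sketch, where ext is a hypothetical extractor:

```python
# A brute-force check of the extractor property, a sketch. X is a distribution
# on t-bit tuples given as a dict mapping tuple -> probability, Y is uniform on
# d-bit tuples; we measure the statistical distance of ext(X, Y) from uniform.

from itertools import product

def extractor_error(ext, X, d, t):
    out = {}
    for x, px in X.items():
        for y in product((0, 1), repeat=d):
            z = ext(x, y)
            out[z] = out.get(z, 0.0) + px / 2 ** d
    uniform = 1.0 / 2 ** t
    return 0.5 * sum(abs(out.get(z, 0.0) - uniform)
                     for z in product((0, 1), repeat=t))
```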
>>: Why is [inaudible].
>> Anup Rao: Y is truly random. So X is an arbitrary distribution that has entropy T minus K.
And Y is an independent string of random bits. And what the extractor can do is take this
independent randomness and kind of refill the randomness into X.
>>: Make some kind of assumption like no particular X is [inaudible].
>> Anup Rao: Right. So it turns out that -- I wrote Shannon entropy here, but in reality, to
make all of this work, you need to measure entropy in terms of min-entropy. The min-entropy
of a distribution is K if the weight of the heaviest string in that distribution is at most 2 to the
minus K.
That's an even stronger measure of entropy. But in this talk I don't want to make distinctions
between these kinds of things, okay? So let's just believe for a second
that as long as the Shannon entropy is T minus K, the extractor works. There are technical
issues that come up from the fact that you actually have to deal with the right measure of entropy,
but let's not deal with them here.
>>: I'm sorry. I'm a little confused whether we should think of big X as a fixed string or a
distribution over strings.
>> Anup Rao: Big X is a distribution on strings.
>>: It is. Okay.
>> Anup Rao: So the distribution on strings that has entropy T minus K. So it's not quite uniform.
But you would like to get from it a uniform string of length T. And there's no way to do that
deterministically; you have to throw in more randomness, because the entropy is only T minus K.
And what this says is that if you add about K bits of fresh randomness, you can do it. Okay. So
that's what an extractor is. And so now here's a more clever way to double the output length of a
pseudorandom generator.
Before, we just ran the old generator twice on two independent inputs. Now
what we're going to do is run the generator to produce the first N over 2 bits. But for
the second N over 2 bits we're not going to spend another T bits of seed. Okay? What we're
going to do instead is reload the entropy that X lost: we'll use a small string
Y and the extractor to reload the entropy back into X, and use the result as the seed for the
second part.
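As a sketch, with the same hypothetical ext(x, y) as before:

```python
# The extractor-based doubling, a sketch: the first half of the output is G(x)
# as before, but the second half reuses x, refreshed with a short fresh string
# y via the extractor, instead of spending a whole new T-bit seed.

def double_with_extractor(G, t, ext):
    def G2(seed):                      # seed = x (t bits) followed by a short y
        x, y = seed[:t], seed[t:]
        return G(x) + G(ext(x, y))
    return G2
```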
>>: So you're saying reload, even though knowing G(X) reveals information about X?
>> Anup Rao: Right. At this point the program has remembered some information about X.
Right? The intuition is that at this point the program has some information, but hopefully not a
lot: it only has so many states with which to remember information about X. So even if you
condition on the vertex that you're at in the middle of the program, you expect X to still have
a lot of entropy. That's what we want to take advantage of.
Okay. So just to -- maybe I shouldn't try to do this, but here's a statement about extractors
that's not quite true, but is almost true, and that I want you to believe for the rest of this talk.
The statement is another way to view what an extractor is doing. Suppose you have correlated
random variables X and A, with the property that X is uniformly distributed and A is distributed on
K-bit strings. Then the extractor has the property that if you look
at the joint distribution of A and the output of the extractor, this is epsilon-close in statistical
distance to the joint distribution of A and an independent uniformly random string. Okay. And again,
the intuition is that typically fixing this K-bit string will reduce the entropy of X by roughly K bits,
and then the output of the extractor should be uniform for every fixing --
>>: So how far from being true is this?
>> Anup Rao: Well, to make it real, the problem is that there could be some fixings of the K-bit
string that completely annihilate the entropy in X. But these fixings are unlikely, so you
have to do that kind of argument. Yeah. So it's not very far from being true.
But I don't know how to measure the distance.
Okay. So once you have this view, here's how you analyze this construction. Here's the intuition
for it. Look at the path that's defined by this supposedly pseudorandom sequence of bits,
and look at the vertex that's crossed by this path in this layer; call it A. Okay. Then, from
what I said before, the vertex A has only about log D bits of information about X.
So if you set the length of Y to be something like log D over epsilon, then the joint distribution
of A and the seed for the second part is epsilon-close to the distribution of A and completely
independent bits.
And so what this gives you is a new generator where the seed length, instead of
being 2T, is T plus something logarithmic in the width and in 1 over epsilon. Okay. And we double
the output length just like before. And in order to make everything work we had to pay a small
price: we had to add an error of epsilon to the error of the pseudorandom generator, because
the output of the extractor is not exactly uniform, just epsilon-close to uniform.
Okay. So that's one of the central ideas in the work of Nisan and these later works: you can
do this doubling more efficiently than the naive way.
So once you have this, you can start building a pseudorandom generator from scratch. How
would you do it? Well, let's start with a trivial generator: the generator that takes
log D over epsilon bits of seed and just outputs one bit, the first bit of the seed. This is
a lousy generator, but it works, right? The output is completely uniform,
so no program can tell it apart from random.
And we'll just keep doubling the output length of this generator using the ideas that we saw.
So in the i-th step we'll add another log D over epsilon bits of seed and double the
output length of the previous generator.
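Putting the levels together, the whole construction has the following recursive shape; a sketch, again with a hypothetical ext:

```python
# The full recursion, a sketch: the seed is x (t bits) plus one fresh d-bit
# block per level. Level i outputs 2^i bits by running level i-1 twice, the
# second time on the refreshed seed ext(x, ys[i - 1]).

def build_generator(ext, t, d, levels):
    def G(i, x, ys):
        if i == 0:
            return x[:1]               # the trivial generator: one uniform bit
        left = G(i - 1, x, ys[:i - 1])
        right = G(i - 1, ext(x, ys[i - 1]), ys[:i - 1])
        return left + right

    def generator(seed):               # seed length: t + levels * d bits
        x, rest = seed[:t], seed[t:]
        ys = [rest[j * d:(j + 1) * d] for j in range(levels)]
        return G(levels, x, ys)

    return generator
```

With d on the order of log D over epsilon and levels = log N, the seed length t + levels * d matches the log N times log D over epsilon count below.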
>>: Do you collect log N epsilon error?
>> Anup Rao: Right. We'll see. We will collect N epsilon error; we'll see what
happens. So what happens to the error? The error of the first step is 0, and at each step the
error you get is epsilon plus double the error of the previous step.
If you do this, then after log N steps of doubling you're generating N bits that are
pseudorandom. Your seed length is now log N times log D over epsilon. And the error: in every
step the error doubles, and there are log N levels of doubling, so the error becomes something
like N times epsilon.
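The error recursion is easy to unroll; a small sketch:

```python
# The error recursion from the slide: err_0 = 0 and err_i = eps + 2 * err_{i-1},
# which solves to err_i = (2**i - 1) * eps; with log2(N) levels that is roughly
# N * eps.

def error_after(levels, eps):
    err = 0.0
    for _ in range(levels):
        err = eps + 2 * err
    return err
```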
Okay. So that's the construction. So there are two reasons why the seed length for Nisan's
generator is log squared N. The first is that D is N, right? So this term is log N times log
N if you want to handle width N. But even if you only wanted to handle width 3, to get a meaningful
result here you need epsilon to be smaller than 1 over N. Right? So this term is always going to
be log N either way.
Okay. So that's why the seed length is log squared N. What we're going to do is show how to
analyze the same construction without paying so much in the error, without adding the
error up like this. So let me give you some intuition for why that should be possible.
Let's look at a couple of special cases. Okay. So intuitively -- I'm just going to make some
vague statements -- the program is only remembering about log D bits about this huge input.
Right? So it shouldn't be able to pay attention to all
parts of the input. Any fixed program should, in some sense, be paying attention to only some
parts of the input; most of the input it should just be ignoring. And that's what we want to take
advantage of.
So let me give you an example. Let's say, for example, that the output of the
program is completely independent of the first half of the input bits. Okay? So let's
look at the middle layer and label every vertex there by the probability of reaching the accept
state with the random walk starting at that vertex.
So let's say, for example, that you're in the situation that if you start from here and you take a
random walk, you reach the accept state with probability one-third and it's the same for every
vertex in the middle layer.
So what does this mean? It means that it doesn't really matter what happened here, right? Once
you get to here in the random walk, it doesn't matter which vertex you're at: you're still going to
hit accept with the same probability. The settings of the bits here didn't matter.
Now, in this case, if you wanted to analyze this construction, you see that the error from this side
of the program is irrelevant. Right? It doesn't matter how far you are from the correct distribution
at this layer, because every one of those vertices goes on to accept the rest of the input with
probability one-third.
So, in fact, if you knew the program looked like this, you could prove that the new generator that
you get by combining these two parts is not 2-delta-plus-epsilon pseudorandom but just delta-plus
epsilon pseudorandom. You would only pay the error from the second part of the construction.
>>: Are you able to take epsilon 1 over log N?
>> Anup Rao: Eventually, yes. Eventually we're going to take epsilon to be something like 1 over
log N. This program completely ignores the first part of the input, so we shouldn't count that part
when we add up the error.
Another example, here's a program that completely ignores the second part of the input. So the
way I've captured that is by labeling the acceptance probability in the middle and you see they're
all either 1 or 0. Which means that once you hit this vertex, then no matter what happens, you're
going to accept. And once you hit this vertex, then no matter what happens you're going to
reject.
Okay. So again it's irrelevant what the bits on this side of the program are. And this time, if you
do the analysis in a smart way, it doesn't matter what happens here. Right? The
only thing that's relevant to fooling this program is whether you could fool it up to this point.
Okay. So if you did this doubling, you should only count delta: the output of the combined
generator is delta-pseudorandom instead of 2-delta-pseudorandom. The reason I'm showing you these
cases is because what I'm going to argue is that every program is in fact some kind of
combination of these kinds of things.
Okay? So here's what I want to do. I'm first going to describe a way to measure the
information in different parts of the program: to measure the amount of information that the
different parts of the program are gathering.
So let's say you have a particular layer of the program like this. Let's label every vertex in the
program just like on the previous slides with the probability that you will accept a random string
starting at that vertex. So the probability that a random walk starting at this vertex hits the accept
state.
So that's what those labels are.
Now let's weight every edge with the difference of the labels of the two vertices
that it connects. Okay. So this gives some weight on the edges. And now let's look at what
happens in our examples.
So in this example, what can I say about the weight of all the edges on the left? 0, right? All of
these edges have 0 weight, because all of these probabilities are one-third. And the same thing
happens in the other example: in that example, all the edge weights on this side are
0. And in general, what we're going to do is, for any program or any segment of a program,
define this measure W of the program to be just the total weight of all the edges that are in the
program. Okay. So in general I'm given a segment of the program, with some numbers A,
B and C, which are supposed to be acceptance probabilities at the end of the program.
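Under the assumed representation from before, the measure W can be computed with one backward pass; a sketch:

```python
# A sketch of the weight measure W: label every vertex with its probability of
# reaching the accept state under the uniform random walk (computed backwards),
# then sum |label(u) - label(v)| over all edges (u, v) of the program.

def weight(edges, accept=0):
    width, n = len(edges[0]), len(edges)
    labels = [[0.0] * width for _ in range(n + 1)]
    labels[n][accept] = 1.0
    for i in range(n - 1, -1, -1):
        for v in range(width):
            labels[i][v] = 0.5 * (labels[i + 1][edges[i][v][0]]
                                  + labels[i + 1][edges[i][v][1]])
    return sum(abs(labels[i][v] - labels[i + 1][edges[i][v][b]])
               for i in range(n) for v in range(width) for b in (0, 1))
```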
>>: These weights can also be negative.
>> Anup Rao: No, I'm taking the absolute value of the differences of the acceptance
probabilities, so the weights are always positive. And I claim this is a good measure of how much
information the program is storing about its input. So here's how our proof is going to go.
Remember, I'm going to analyze the same generator as before: the generator starts with a single
level, and each time you double the output length.
We're going to prove two statements. First, we'll prove that if you look at the error of the i-th
generator -- what I wrote here is the difference between the expected value of the label you reach
when you feed in pseudorandom bits and when you feed in truly uniform bits -- then this can be
bounded by the weight of the program, times i, the level of the generator, times epsilon.
Okay. So if the weight of the program is small, this error is actually really small. This
statement has nothing to do with regularity; it applies to all branching programs. The only place
where we use regularity is in the second statement, where we show that if the program is regular,
then the total weight of all the edges in it is completely independent of its length.
It depends only on the width: the weight of all the edges is at most 2D if the width of
the program is D. Those two statements turn out to be enough.
If you plug everything back in, you get that the final error of the generator is
only epsilon times 2D times log N, instead of the epsilon times N that we had before.
And this allows you to get stronger results. Okay. So let me next talk about maybe the most
interesting part of our work: analyzing this measure of information. So that's what we want to
prove: if a program is regular, then the total weight of all the edges is at
most 2D, independent of the length of the program. Okay. And to understand this, I'm going
to talk about a pebble game, a certain kind of game.
So let me show you what it is. Let's imagine you start with K pebbles placed on the real interval
of 0 to 1. So here are the pebbles. What I allow you to do in this game is that at every step I'll
allow you to pick any two of the pebbles, say those two, and move them towards each other.
So there are K pebbles on the line; you can pick any two pebbles and move them towards each
other. And if you move them towards each other by a distance of alpha each, then I'll say that so far
you've made a gain of two times alpha: you've translated the pebbles by a total of 2 alpha.
And then you can continue in this way. So you can pick another two pebbles, move them
towards each other by a different distance, and I'll measure how much you move them together
and add that to your total gain.
So this is a particular kind of process involving pebbles. It took me a long time to make these
animations, so I have to dwell on the slide just to make myself feel like I didn't waste my time.
Okay, we can do it again. All right. That's the pebble game.
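A sketch of the game in code:

```python
# The pebble game, a sketch: each move picks two pebbles and moves each of them
# a distance alpha towards the other, for a gain of 2 * alpha; alpha is capped
# so the pebbles cannot cross.

def play(pebbles, moves):
    """pebbles: list of positions in [0, 1]; moves: list of (i, j, alpha)."""
    gain = 0.0
    for i, j, alpha in moves:
        alpha = min(alpha, abs(pebbles[i] - pebbles[j]) / 2)
        step = alpha if pebbles[i] < pebbles[j] else -alpha
        pebbles[i] += step
        pebbles[j] -= step
        gain += 2 * alpha
    return gain
```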
Now you might be wondering what this has to do with pseudorandom generators, branching programs, or
computer science. But here's the connection. Suppose you told me that you had a regular program.
I claim that the program specifies a way to move pebbles around. Okay. So say you gave me this program,
and it has this layer of vertices with those numbers on it. Then what I can do is put two
pebbles for every vertex at the end here: two pebbles at the position given by the number A, and two
pebbles for each of the other numbers, as you can see.
And now this program specifies a way to move these pebbles in exactly the way that I talked
about. So let's first move the two pebbles that correspond to the two edges connecting
to the top vertex there: you can pick the two pebbles from these two vertices and move
them to the next layer in the program.
And what you're doing then is exactly picking two pebbles, in this case at B and C, and moving them
towards the midpoint of B and C.
Okay. And you can do this for all the pebbles. Right. So for every edge you can move a pebble
along it, and you end up with the same kind of configuration in the next layer.
So what's the point? What's the total gain in this pebble game, and what does it
correspond to in terms of the program? It corresponds exactly to the weight of the edges. Right?
The weight of the edges in this layer is exactly the total gain that we experience in this movement
of pebbles.
Okay. So all we need to do now to prove the theorem is -- and this is the only
place where we use regularity. For all of that to work, I put two pebbles on each vertex, and it was
really important that the number of edges coming in is equal to the number of edges going out.
>>: Is it false if you just assume bounded degree?
>> Anup Rao: Bounded degree -- yes, it's false. Well, you can always assume bounded degree,
because the width is bounded: if the width is 3, the degree is at most 6. But, yeah, it's
actually false; the weight can depend on N even with width 3, if you allow non-regularity.
Okay. So now we've reduced to the following situation: we just have a bunch of pebbles
on the line, and we want to know the maximum possible gain you can get using the pebble
game on these pebbles. Any guesses?
>>: 2 D?
>> Anup Rao: No. I think 2D is actually wrong for these general positions. Anyway, it turns out
that the bound we can prove is that the total gain is always exactly the sum of the pairwise
distances between the pebbles.
>>: So this is the sort of slowest possible pebble game, you're talking about.
>> Anup Rao: This is the one that causes the pebbles to move the most. Eventually the game
ends when all the pebbles are in the same place.
>>: But this is relaxed from what we actually look at, right? Because in the branching program
we always move the two pebbles to the midpoint?
>> Anup Rao: That's true. That's true. But it's even tight in that sense: you can make a
branching program that realizes this bound. Okay. So I claimed that the total gain is just the sum of
the pairwise distances between the pebbles. And seeing this is not hard. What I'll actually
argue is that in each step, when you move A_i and A_j towards each other for a gain of alpha,
you have to reduce this quantity by at least alpha. And that would prove it. Right? Because this
quantity is always non-negative, and if every time you have a gain of alpha you reduce it by
alpha, then you can't have a gain of more than this quantity. It's actually really easy to see. First of
all, when you move A_i and A_j towards each other by alpha, the term |A_i - A_j| reduces
by alpha.
And what happens to the other terms? If you have a pebble that's not in between A_i and A_j,
then when you contract A_i and A_j, one of the two pebbles is moving away from that outside
pebble and one is moving towards it, so those terms stay the same in total. On the other hand, if
you have a third pebble that is in between A_i and A_j, then when you push A_i and A_j together,
that only shrinks the quantity even more.
Okay. So overall, that's an upper bound on the total gain that you can get in this pebble game.
Which means that if you start with the branching program, with two pebbles
at the one accept vertex and all the other pebbles at 0, then you get a bound of 2D.
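The bound itself is one line; a sketch:

```python
# The upper bound on the total gain, a sketch: the sum of pairwise distances
# between the starting positions of the pebbles.

def pairwise_sum(pebbles):
    return sum(abs(a - b) for k, a in enumerate(pebbles) for b in pebbles[k + 1:])

# Example: pairwise_sum([0.0, 0.0, 1.0, 1.0]) == 4.0, since exactly the four
# mixed pairs are at distance 1.
```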
>>: So is it maybe true that the gain is always the same no matter what? [inaudible].
>> Anup Rao: No, that's not true, because of what I described: if there are pebbles in
between the two pebbles you move, then you can reduce this quantity much faster.
>>: I'm not saying it's equal to that. But I'm asking if you're given the AI --
>> Anup Rao: So let me finish what I was saying. On the other hand, if you always
choose to move pebbles which don't have other pebbles between them, then it's
tight. So you can always achieve this bound by picking pebbles that don't have anything in
between them. So the gain can be different.
Okay. So that proves the bound on the total weight of all the edges in the program. And the second
part is not very complicated; it's just a calculation. But the basic thing to note is the point of
measuring the weight: if you know that a program has small weight, and you feed in two
distributions A and B that are epsilon-close to each other, then the expected values of the label
the program reaches at the final vertex on the two inputs can disagree by at most the weight of the
program times epsilon. If you think about it, it's not hard at all. Think about a single step: if the
weight of a single step is only .05, then even in the worst case, with completely different inputs, you
end up at values that are apart by at most the sum of the edge weights. Okay. So if the inputs are
epsilon-close, you'll be off by just epsilon times the total weight of all the edges. Okay. And this is
enough to give the results. I wrote the analysis here, but it's not really so hard. You again argue
about doubling the output length of the generator, and you can bound the
error, by induction, in terms of the weight of the left part of the program and
the weight of the right part of the program. These two weights add up to give you the weight
of the entire program. So it turns out you can bound this -- I'll skip the technicalities -- to give
exactly the bound I stated.
So that's it. That's our result: a generator for regular programs that uses seed length log N
times (log log N plus log D) if the width is D. And there are interesting open problems. Obviously
the most interesting one is to get rid of the caveats, to get rid of the fact that the program has to
be regular. And there you can't use this measure of information that I talked about. You have to use
something else if you want to really get a win, and we don't know exactly what that something else
is. There's also the log log N, which I think is coming from this construction; I think it's
inherent to the construction, but I'm not sure. And I think the most interesting thing about our
work is this measure of information on the program, and I wonder if there are other measures of
information like that, if that makes sense. Thanks. [applause].
>> David Wilson: Do we have any other questions?
>>: What's that picture?
>> Anup Rao: It's supposed to be coffee, but it's very black coffee, apparently.
Thank you.
>> David Wilson: Thanks again.
[applause]