
>> Yuval Peres: Alright, good afternoon. We’re delighted to have Li-Yang Tan tell us about his depth
hierarchy theorem for Boolean circuits.
>> Li-Yang Tan: Thanks Yuval. Yeah, thanks for the opportunity to be here. Gonna speak about joint
work with Ben Rossman and Rocco Servedio; Ben’s at Simons, and Rocco’s at Columbia. So this is a talk
about circuit complexity, so very briefly, let’s recall the broad goal is to derive strong lower bounds on
the size of circuits computing an explicit function. So let me elaborate on two important words here.
The first is explicit; for this talk, it’s not very important; just think of an informal definition as
concrete, simple-to-describe functions. So a non-example will be—you know—a random function. And
I guess, formally, in complexity theory, we think of—you know—an explicit function as one being in NP.
And as for circuits, the holy grail is really to understand the power of polynomial-size {AND, OR, NOT}
circuits—the standard basis—or it’s known as—you know—P/poly, which is not important, but … so
we’d ideally like to exhibit a function—a concrete, simple-to-describe function—that cannot be
computed by such circuits, which will separate P from NP, but we’re still very far from it. So the focus of
research so far—and the focus of this talk—is on restricted subclasses of P/poly, and in this talk, we will
focus on one specific restricted subclass.
So what’s this class? It’s the class of small-depth Boolean circuits; so this is a Boolean circuit over the
standard basis of {AND, OR, NOT} gates. We can assume—it’s not hard to see—that all of the NOT gates
are pushed onto the bottom, although that’s, again, not important for this talk. So that’s the circuit, and
we are interested in functions that require complex circuits. And so what’s complex? There are two measures that I’m interested in; one is depth, which is the number of layers you have; and the other is size, which is the number of gates you have—just these two. And we are gonna make it even
simpler and fix one of them. For this talk, we’ll think of depth as a constant—say, a hundred—and the
only parameter we care about is the number of gates, which is size. And we are interested in very
strong lower bounds on size—exponential, okay—under this assumption that the depth is constant.
Okay, so that’s our model.
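A minimal sketch of the model just described, in Python (my own illustration, not something from the talk; the constructor names here are made up): unbounded fan-in AND/OR gates above inputs and their negations, with size counted as the number of gates and depth as the number of gate layers.

```python
def VAR(i): return ('VAR', i)
def NOT(i): return ('NOT', i)          # NOT gates pushed to the bottom, on inputs only
def AND(*kids): return ('AND', kids)   # unbounded fan-in
def OR(*kids): return ('OR', kids)

def evaluate(gate, x):
    kind, payload = gate
    if kind == 'VAR': return x[payload]
    if kind == 'NOT': return 1 - x[payload]
    vals = [evaluate(g, x) for g in payload]
    return int(all(vals)) if kind == 'AND' else int(any(vals))

def size(gate):   # number of AND/OR gates
    kind, payload = gate
    return 0 if kind in ('VAR', 'NOT') else 1 + sum(size(g) for g in payload)

def depth(gate):  # number of AND/OR layers
    kind, payload = gate
    return 0 if kind in ('VAR', 'NOT') else 1 + max(depth(g) for g in payload)

# Example: a depth-2, size-3 circuit (a DNF) computing "x0 equals x1".
eq = OR(AND(VAR(0), VAR(1)), AND(NOT(0), NOT(1)))
assert evaluate(eq, (1, 1)) == 1 and evaluate(eq, (0, 1)) == 0
assert size(eq) == 3 and depth(eq) == 2
```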
Small-depth circuits have been studied since the eighties, and it’s really a story of success. You know, ever since the eighties, we have had exponential lower bounds—against constant-depth circuits computing an explicit function. So the landmark result here is that of Håstad, which we will talk a lot
about in this talk, and it builds on the work of Ajtai, Furst-Saxe-Sipser, and Yao—all in the eighties. And
just as an aside, I should mention that it’s quite rare in—you know—circuit complexity or complexity
theory that we have such strong lower bounds. It’s really among our strongest unconditional lower
bounds in all of complexity theory. Okay, so that’s … and the techniques that were developed to prove
these lower bounds, they are important beyond circuit complexity; they’re found in applications in
pseudo-randomness, learning theory, proof complexity, and so on. And at a very, very high level, in this
talk, I’m gonna speak about an extension of Håstad’s theorem, and we do so via a generalization of his techniques. So we’ll hear a lot about Håstad’s theorem, his techniques, and how we extend both of them.
Okay, so let me go into detail about the outline of the talk. I’m gonna tell you about Håstad’s theorem
and two of the extensions, neither of which are due to us—both due to him back in the eighties—one is
average-case hardness, and the other is a depth hierarchy theorem; I’ll explain what each of these
means in a second; you can probably guess what the first means already. And our main result is that we
achieve both extensions of Håstad’s theorem simultaneously. So as the title suggests, we prove—you
know—an average-case depth hierarchy theorem, which again, I’ll try to explain what that means. After
telling you about our main result, I will give two applications in two fairly different areas: one in structural complexity, showing—you know—that the polynomial hierarchy is infinite relative to a random oracle—I’ll explain what that means—and the second—in a completely
different area—in the Fourier analysis of Boolean functions, we answer a question that’s been floating
around in the past few years about a converse to the famous Linial-Mansour-Nisan theorem, which I’ll
tell you about also. And if I have time—I hope we have time—I’ll tell you about a technique, which is
that of random projections, which is a generalization of Håstad’s technique, which is random
restrictions. So I’ll tell you about random restrictions and what random projections are.
Okay, so let’s get started with Håstad’s theorem. So Håstad’s theorem: he proved exponential lower bounds against constant-depth circuits computing an explicit function, and the function is very explicit—the parity of x1 to xn. There’s no arguing that this is explicit; it’s concrete; it’s simple to describe. Okay, so what does his theorem say? In his PhD thesis, he showed that for every depth d greater than two—think of d as a hundred—any depth-d circuit computing the n-variable parity function requires size two to the n to the one over d minus one. So in particular, if
you ask me to build a depth-100 circuit and make me compute parity, I need lots and lots of gates: two
to the n to the zero point zero one. So these are results we like in circuit complexity; it’s a very strong
lower bound against a very explicit function.
>>: Is it this tight? Like, from …
>> Li-Yang Tan: Yeah, it’s tight for parity. In particular, beating this bound for a function other than
parity is a big open problem, yeah. So these are our best lower bounds against depth-d circuits. Okay, so that’s Håstad’s theorem: parity requires huge circuits.
>>: The fan-in and … for now, can be un …
>> Li-Yang Tan: Unbounded, right? Sorry, I should have said that: unbounded fan-in, but constant
depth. Yeah, if it’s bounded, then you cannot do much in constant depth. Yeah, sorry. Thanks for that.
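To make the depth-2 case of this concrete (my own illustration, not from the talk): the obvious depth-2 circuit for parity, a DNF with one full AND term per odd-weight input, already has two to the n minus one terms, and it is a classical fact that no DNF for parity can do better, since any term that leaves a variable free would be fooled by flipping that variable.

```python
# A tiny check (not from the talk) that the canonical DNF for parity has
# exactly 2^(n-1) terms and computes parity correctly.
from itertools import product

def parity(x):
    return sum(x) % 2

def canonical_dnf(n):
    """One term per satisfying assignment; each term fixes every coordinate."""
    return [x for x in product([0, 1], repeat=n) if parity(x) == 1]

def eval_dnf(terms, x):
    return int(any(all(xi == ti for xi, ti in zip(x, t)) for t in terms))

n = 4
terms = canonical_dnf(n)
assert len(terms) == 2 ** (n - 1)
assert all(eval_dnf(terms, x) == parity(x) for x in product([0, 1], repeat=n))
```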
Okay, so this is Håstad’s theorem—simple. Let’s talk about two extensions, both due to him. Okay, the
first is average-case hardness. So in fact, he proved something stronger; you know, two slides ago, I
showed that—you know—constant-depth circuits of—you know—what seems like large size cannot
compute parity. In fact, he shows in his thesis, that—you know—depth-d circuits of the same size—two
to the n to the one over d—agree with parity on … only on a half plus little o of one fraction of inputs. In
fact, the little o of one is very strong—an exponentially small fraction of inputs. So this is, hopefully, clearly an extension; two slides ago, I said that such circuits cannot compute parity; this gives me more
information; it says that such circuits cannot even correlate with parity. Okay?
So just a few notes: the … just the constant one function—a trivial, you know, depth-0 circuit—has
agreement one half with parity. So this sort of says that if you allow me to build a depth-100 circuit, you
allow me size two to the n to the zero point zero one—which seems like a lot—I cannot do much better
than a constant. If I’m lazy, I may as well just output the constant; I cannot even do an exponentially
small fraction better than constant. So it’s a very strong statement. And this is not too relevant for the
talk, but as an interesting aside, this was implicit in his thesis, but the exact relationship between d, and
this size, and the correlation bound was only recently pinned down by Impagliazzo et al and Håstad
himself thirty years later. Okay, so this is the first extension: correlation bounds against parity.
Let me talk about the second extension, which is that of a depth hierarchy theorem, and this will take
slightly more time. In particular, I need some notation; AC0 sub d is the set of all depth-d poly(n)-size circuits, or the functions computable by them. So depth-2 is not very interesting; it’s the set of all polynomial-size ORs of ANDs or ANDs of ORs—also known as DNFs or CNFs—then depth-3 circuits, depth-4, and you take the union of it all, you get AC0, which is the class of all functions computable by these constant-depth poly(n)-size circuits—in particular, parity is not one such function. Okay, so this is a
cartoon picture of Håstad’s theorem; it says that you have AC0, which is all constant-depth circuits, and parity doesn’t live in this green circle. In fact, if you want to compute parity with a poly-size
circuit, you need depth roughly log n. So in a cartoon, it says that parity lives outside this circle at depth
log n. So here’s a challenge: I want the same lower bound against depth-d circuits, so Håstad’s theorem
tells us that—you know—if you make me compute parity with depth d, I need two to the n to the one
over d. So here’s the challenge: I want the same lower bound against depth-d circuits, but for a function that is in depth-d-plus-one AC0. ‘Kay, so intuitively, this feels like a more challenging task; you want the
same lower bound against the same class of circuits, but for a much simpler target function. Right, in particular, you want it to be so simple that, even just allowing me one more layer of depth, I can compute it in polynomial size. So this is—hopefully also obviously—an extension of Håstad’s theorem. So … and
then, Håstad was able to do it; it’s the so-called depth hierarchy theorem; and also in his PhD thesis, he
showed that for every depth, d, greater than two, there’s a function, fd—the so-called Sipser function,
which I’ll tell you about—such that fd is actually quite simple, as in linear size, depth d plus one—but if you force me to use a depth-d circuit, I will require an exponential blow-up—two to the n to
the one over d—and this is, again—just to recall—the same lower bound we have against parity.
Okay, so a few notes: it builds on the work of Sipser—hence, the Sipser function—which gave a super-polynomial separation, and Yao, a few years later, gave an exponential separation, which was sharpened by Håstad. Okay, and before Yao’s work, the monotone case was solved by Klawe et al; it’s the same theorem, just put monotone everywhere. There’s a monotone function fd—Sipser is monotone—fd is in depth-d-plus-one AC0, but the lower bound only holds against monotone circuits, so
it’s a … yeah. And the hard function for all of the above is the Sipser function, which I’ll tell you about.
But before telling you about that, just very briefly, the conceptual message here—why it’s called a depth hierarchy theorem—is that it says that depth-d-plus-one circuits are much, much more powerful than depth-d circuits. You know, you give me a nice, linear-size depth-d-plus-one circuit, you make me decrease the depth by one, I may have to blow up exponentially.
Okay, so what’s the depth-d Sipser function? The formal definition is a depth-d, read-once, regular,
alternating, monotone formula, but it’s really … the picture just says it all. You have alternating layers of
AND, OR, AND, OR, AND, OR—depth d—the fan-in is regular—it’s n to the one over d—and it’s read-once; and it’s alternating AND/OR. It’s read-once in that—you know—the bottom-layer AND gates or OR gates touch distinct sets of variables, and it’s monotone—there are no NOT gates. Right? It’s sort of the obvious depth-d formula; it’s sort of—you know—if you made me write down a depth-d formula, this is sort of, maybe, one of the first I’d write down. And in particular, it’s linear-size, right? Any read-once formula is linear-size—a very nice, regular structure. That’s the Sipser function. So let’s get some
intuition as to why the Sipser function is so important for depth hierarchy theorems. So here’s a
challenge: I have a depth-3 Sipser—n to the one third fan-in, AND, OR, AND, OR—and now, it’s nice and
linear size; now, you point the gun to my head, and you say, “I want you to compute this in depth two.”
Okay? So the … here’s one thing I can do: I focus in on this AND subgate; I rewrite an AND of ORs as an
OR of ANDs—right, this is De Morgan; we can do that, but not very efficiently, right? How do you
rewrite an AND of ORs as an OR of ANDs? You take one from every bucket—so it’s n to the one third to the n to the one third—so it’s roughly two to the n to the one third, right? And you write this gate as an OR of ANDs—do this for every gate—and now, you look … you see that—you know—you have an OR of ORs of ANDs, and you can collapse the two ORs, because an OR of ORs is just an OR. It’s kind of—I don’t know—this is one way to do it, but not very smart, and you can ask—you know—whether there’s a better way to do it, right? And what Håstad shows is that—you know—essentially, this is the best
thing you can do. If you want to compute depth-3 Sipser with depth-2, just apply De Morgan, blow up
the size—it’s the best thing you can do. And the same thing with depth hundred, computing it with
depth ninety-nine—right—you ju …
>>: So what you’re saying is … essentially, we don’t know anything better than this anyway, and the lower bounds nearly match, so …
>> Li-Yang Tan: Nearly match. You don’t always get two to the n to the one third, right? You get two to
the n to the omega one over d.
>>: Yeah, yeah, yeah.
>> Li-Yang Tan: Right? So it’s like you have a lower bound of two to the n to the one over one hundred and eighty.
>>: Okay, but let’s say, from a construction point of view, you don’t know anything better than this.
>> Li-Yang Tan: Oh, strictly better? Like, even just to save one gate?
>>: Yeah, or not one gate, but let’s say something in …
>> Li-Yang Tan: Yeah, I’m not sure. That’s a good question. For depth-3, I do not know if I can prove a
precise two to the n to the one third lower bound.
>>: No, but the question is different.
>>: No, I’m just saying upper bound.
>> Li-Yang Tan: Sorry?
>>: Is the question different: can you save one gate?
>> Li-Yang Tan: I don’t know, yeah. I … the lower bound is definitely not delicate enough to capture
that you cannot, but maybe the upper bound is … yeah, I wouldn’t say that this is exactly optimal. Yeah,
yeah, okay. So yeah, that’s a depth hierarchy theorem and some intuition as to why Sipser is the
function to look at.
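For reference, here is a sketch of the Sipser function as just described (my own construction, not code from the talk; the choice of which gate type sits at the top and bottom is simplified), along with a comment on the naive depth-reduction blow-up discussed above.

```python
# A sketch (mine) of the depth-d Sipser function: a read-once, regular,
# alternating, monotone formula with fan-in w at every layer, on n = w**d variables.
from itertools import product

def sipser(x, d, w, gate='AND'):
    """Evaluate the depth-d, fan-in-w Sipser formula on x (a tuple of length w**d)."""
    if d == 0:
        return x[0]
    block = w ** (d - 1)
    child = 'OR' if gate == 'AND' else 'AND'
    kids = [sipser(x[i * block:(i + 1) * block], d - 1, w, child) for i in range(w)]
    return int(all(kids)) if gate == 'AND' else int(any(kids))

# Sanity check at depth 2, fan-in 2: an AND of ORs on disjoint pairs.
for x in product([0, 1], repeat=4):
    assert sipser(x, 2, 2) == int((x[0] or x[1]) and (x[2] or x[3]))

# The naive depth reduction from the discussion above: rewriting an AND of w
# ORs (each of fan-in w) as an OR of ANDs means picking one input from each
# OR, giving w**w terms; for w = n**(1/3) that is about 2**(n**(1/3)) terms
# (ignoring log factors), which is the blow-up being described.
```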
Okay, so what have we shown? We have Håstad’s theorem, which is nice; it says that parity is not in AC0;
in fact—you know—depth-d—what seems like large size—two-to-the-n-to-the-one-over-d-size circuits
cannot compute parity. We have two extensions that feel somewhat different. One is that depth-d
circuits of that size cannot even approximate parity to fifty-one percent. The second one is that—you
know—we don’t consider parity—we consider a much simpler function—and we get—you know—the
same kind of lower bound against the same kind of circuits. So you can ask—and Håstad asked this—
you know, can I get the best of both worlds? Can I show that there’s a function at depth d plus one such
that, if you force me to use depth-d circuits, and you allow me huge size, I cannot even approximate it?
Right, that seems like a natural best-of-both-worlds kind of extension, and our main result in this work
is: we confirm this conjecture of Håstad. Okay? So I’d like to tell you about this picture for the rest of
the talk. Okay. So more precisely, it’s no surprise we have a Sipser function; we show that for every
depth, d, greater than two, there’s a function fd—the Sipser function—which is in depth-d-plus-one
AC0, but depth-d circuits of size two to the n to the one over d, they have agreement only a half plus little o of one. Okay, you cannot do fifty-one percent. And previous work: O’Donnell-Wimmer, in
2007, they proved the d equals two case; they proved—you know—a depth-3 Sipser you cannot
compute it in depth-2 … you cannot even approximate it in depth-2. And that was, well, the starting point for us; we build on their techniques, and in particular, this is sort of the base case for us. So what we
did is basically a reduction—a depth reduction—to the O’Donnell-Wimmer proof of the depth d equals
two case.
Okay, to answer a question that is probably on your minds, this correlation bound—right—a few slides
ago, I said—you know—parity is very, very hard for AC0, and—you know—your correlation is at most exponentially small, and here, it seems like we’re not doing as well; we only get—you know—one over poly(n) for d being constant. So you
can ask: why not exponentially small? I have two reasons: one, it is simply not possible for the Sipser function; the Sipser function is a monotone function, and it’s a standard result that every monotone function has one over poly(n) correlation with a very simple circuit—you know, either a dictator, an xi, or a constant. So you cannot hope to do this for the Sipser function. So this is not a
very good excuse; you say, “Find another function—a non-monotone function—at depth d plus one for
which you can prove exponential correlation bounds.” But in fact, it’s not possible for any function to
begin with; if you force the function to be in depth d plus one—which it has to, because of the name of
the game—it’s a standard result that any depth-d-plus-one circuit, it has one-over-poly(n)-correlation
with one of the depth-d circuits that feed into the top gate. So that seems to be some sort of
fundamental difference between correlation bounds for depth hierarchy theorems and correlation bounds against parity. In particular—and that’s the second reason—you know—you cannot hope for a half plus something exponentially small in n to the one over d, so what we get is essentially optimal for constant d—yeah.
Okay, so that’s our theorem; let me touch on two applications of our result in two fairly different areas.
One’s in structural complexity—so forget everything about circuits—let’s talk about oracles and relativization. So part of our job in complexity theory is that we want to separate complexity classes, and—
you know—we haven’t been able to do so for many of them, and the famous example is whether P is
equal to NP. Since we haven’t been able to do it, we can consider a twist of the question: imagine a world where algorithms have free access to some magical function; call it A, okay? You give A an input, and for free, in unit time, you get the answer—so think of A as 3-SAT; you have this magical oracle: you give it a 3-SAT formula, you snap your fingers, and it tells you whether it’s satisfiable or not. So in this
world, you can ask whether P is equal to NP; it’s a slightly different question, but it seems like a natural
question; and the notation is thus: does P to the A equal NP to the A—does P, given an A oracle, equal NP, given an A oracle? So a priori, it’s not clear that this is any easier than P versus NP, but actually, for this, we have made a lot of progress, ever since the seventies. The paper introducing this notion of oracles—Baker, Gill, and Solovay—noted that there exists some oracle for which you can separate P and NP, and this was improved a few years later, qualitatively, by Bennett and Gill, who showed that—you know—in fact, for almost every magical oracle A, P to the A is not equal to NP to the A. So we are still far from separating P versus NP, but at least in this—you know—oracle sense of the word, this is a pretty satisfactory solution. Right, not only does there exist one oracle; for almost every oracle, they are distinct. Okay, that’s great.
>>: What is the distribution? I mean, when you say almost all …
>> Li-Yang Tan: The uniform distribution—say, ninety-nine percent, or one minus little o of one. For almost every function A that you can give, P to the A is not equal to NP to the A.
>>: True.
>> Li-Yang Tan: And yet, we cannot … that doesn’t imply that P does not equal NP in our world. So that’s
a little off-topic, but as a caveat, this shouldn’t be taken as evidence that P is not equal to NP for various
reasons that we discovered later—not we, but you know, in the eighties, yeah. Yeah, so … but that’s a
caveat, but, just independently, it is sort of an interesting question. Okay, so we have resolved P versus NP in these worlds; let’s sort of move on to other questions. So here are two statements that we also believe are true; they both concern the so-called polynomial hierarchy—I’ll explain what that means in a second. One is that PH—the polynomial hierarchy—is not equal to PSPACE; the second is that PH is
infinite. For this talk, it’s actually not really important to know what either of these statements mean or
really what the PH is; these are two statements—like P versus NP—that we’d like to prove, but we
cannot prove; and two is stronger than one; and two implies P is not equal to NP, so two is very, very
strong. And so we are stuck on them, but we can ask—you know—do these separations hold relative to
one oracle? And if we can do that, we can ask: do they hold relative to almost all oracles?
>>: If one is false, two’s also false, right? They’re … oh, okay.
>> Li-Yang Tan: Right, two implies one. Exactly, so two is a very strong statement. So let’s see; we had
success on the P versus NP question with respect to oracles, and we have had much success here, too. Yao
and Håstad showed that—you know—the weaker statement—that PH is not equal to PSPACE—holds for
some oracle, A; there exists a magical function such that PH is not equal to PSPACE. This was improved
by Cai and Babai, who showed that PH is not equal to PSPACE for almost all oracles, A. So the weaker
question—you know—we are very satisfied, again, and Yao and Håstad also proved that, in fact, PH is
infinite relative to some oracle, A. And again, you can ask for—you know—the strongest of all these statements, and they conjectured that, in fact, the stronger statement is true for almost all oracles—that PH is infinite for almost all oracles, A. So this would imply these results, because—you know—PH being infinite for almost all oracles, A, implies that PH is not equal to PSPACE for almost all oracles, A, and—you know—this version says it just for some oracle, and here, we are saying it for almost all oracles. And in this work, we confirm this conjecture, and in fact, I’d like to touch on why
this is a direct consequence of our circuit lower bounds, and as you may have guessed
from the names of people who proved this, these—you know—relativization results are established
using circuit lower bounds. So let me touch on this connection between circuits and relativization.
Okay, so there is actually a tight connection between—you know—the class of circuits we are interested
in, which is—you know—bounded-depth circuits, which we think of as very limited models of
computation—you cannot even compute parity—and—you know—the polynomial hierarchy, which is a
very, very expressive—and you know—tower of models of computation. And there’s a … this
correspondence was noted by Furst, Saxe, and Sipser, and roughly speaking, they differ by an
exponential. And the connection is really, really very close—you know, depth-3 AC0 with, you know, an
AND gate on top corresponds to Pi3; depth-10 AC0 with an OR gate on top corresponds to Sigma10—okay, there’s really a one-to-one correspondence. In particular, I believe that this was the
original motivation for proving circuit lower bounds; we wanted to use circuit lower bounds to prove
lower bounds against the polynomial hierarchy. Okay, so let’s take a look at the paper; it says—the title
of the paper is “Parity, Circuits, and the Polynomial-Time Hierarchy”—it says that a super-polynomial
lower bound is given for the size of circuits of fixed depth computing the parity function. Okay, and they say that connections are given to the relativization of the polynomial-time hierarchy. So this is the paper, and let’s translate it; the first sentence basically says that parity’s not in AC0; and the second says
that—you know—so they prove a super-polynomial lower bound on the size of AC0 circuits computing
parity, and they say that if you can improve it to super-quasi-polynomial, then you have that PH is not
equal to PSPACE for some oracle, A. Okay, and it’s quite easy, and so they didn’t quite do this, but they
noted that if you improve this separation to super-quasi-polynomial, then you get this separation. You
get … and this was done by Yao and Håstad in ‘86, and not surprisingly, they proved this by proving the
circuit result; they proved sufficiently strong lower bounds on the size of circuits computing parity.
Okay, so in fact—you know—we saw two slides ago—y’know—you have the circuit results and the two
extensions; they correspond perfectly to a picture in the relativized world. Yao and Håstad, they proved
that—you know—depth-d circuits of—you know—large size cannot compute parity, and that corresponds exactly—and this was a motivation for them—to the fact that PH is not equal to PSPACE for
some oracle, A. You have two extensions, one of which says that you cannot even approximate it—that
corresponds exactly to the strengthening of this theorem to say that the separation is true for almost all
oracles, A. You have another extension that says that—you know—depth-d circuits cannot compute,
not just parity, but a super simple function at depth d plus one; that corresponds exactly to the fact that
PH is infinite relative to some oracle. And just like how you can ask for the best of both
worlds in the circuit world, you can ask for the best of both worlds in the oracle world, and by
confirming Håstad’s conjecture, we confirm the conjecture that PH is infinite relative to almost all
oracles. Okay, I’m not gonna touch on this connection more, except to say that it’s really a very, very
close translation—it’s really a mirror image. We didn’t even prove that; it just follows by standard
techniques.
So okay, that’s application one; it’s sort of retro and old-school. Let me switch gears to a different
application now in analysis of Boolean functions—completely different, so forget all about oracles. So
let me start with a very basic fact about circuits. Fix a function f; consider the following very simple experiment: draw a uniform random x, flip a uniformly random coordinate to get y, alright? It’s so simple, and I want to know what’s the probability that f of x is not equal to f of y, and I’m gonna multiply by n—this is a matter of convention; don’t worry too much about it, but if you allow me to multiply by n, it brings it to a number between zero and n—and it’s known as the total influence of f, also called the average sensitivity. And the name average sensitivity makes sense, right? It’s the average number of coordinates you’re sensitive on. So given any function, f, I can ask: what’s its influence? Is it
low? Is it close to zero? Is it high? Is it close to n? It’s just a measure of Boolean functions, and a
famous theorem of Linial, Mansour, and Nisan, sharpened by Boppana, says that if you give me some
more information about f—you promise me that f is computable by a size-s, depth-d circuit—then its
influence is bounded by log s to the d minus one. And again, the regime of parameters we should think
of is s being poly(n) and d being constant, in which case, LMN says that—you know—AC0 circuits have
polylog(n) influence, which, on a spectrum of zero to n, we should think of as low. So LMN
says that—you know—small circuits of small depth have low influence. And this is a very important
result.
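A brute-force rendering of the definition just given (my own sketch, not from the talk): total influence equals n times the probability that flipping one uniformly random coordinate of a uniform input changes the value, which is the same as the expected number of sensitive coordinates.

```python
from itertools import product

def total_influence(f, n):
    """Exact total influence (average sensitivity) of f on n bits, by brute force."""
    total = 0
    for x in product([0, 1], repeat=n):
        for i in range(n):
            y = list(x); y[i] ^= 1
            total += (f(x) != f(tuple(y)))
    return total / 2 ** n   # average number of sensitive coordinates of a uniform x

# Parity is sensitive to every coordinate at every input, so its influence is n.
parity = lambda x: sum(x) % 2
assert total_influence(parity, 5) == 5
```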
So here’s … let’s look at—you know—the usual suspects. So this is the line of all possible influences. On
the left, you have low-influence functions; on the right, you have high-influence functions. So let’s start
with high influence; parity is the world’s most influential function—it has influence n—a random
function has influence n over two—so very influential—and majority’s not hard to work out—its influence is roughly root n. For this talk, let’s think of root n as high. Okay, so that’s high-influence
functions. Let’s look at low-influence functions; you have the constant function, which is very boring—
its influence is zero—x1 which is also boring—its influence is one. You know, these are not very
interesting functions. And you have the Tribes function, which is a DNF; its influence is log n. Okay, so
you have this spectrum with all the usual suspects among Boolean functions lying on it. And as you can see, on the right are canonical examples of functions that do not lie in AC0; they are—you know—complex functions, and LMN says that this is not a coincidence. You know, again, for the range of parameters we should think of—you know—size poly(n) or even quasipoly(n)—if the depth is constant, LMN says that—you know—you give me such a function, it lies to the left of polylog(n). So in particular, LMN shows that majority, and a random function, and parity are not in AC0.
So it’s a strong theorem about—you know—the stability of circuits. Okay, so LMN is great.
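A quick numeric check of the "roughly root n" claim for majority (my own aside, not from the talk): each coordinate of majority on an odd number n of bits has influence C(n-1, (n-1)/2) / 2^(n-1), so the total is about sqrt(2n/pi).

```python
from math import comb, pi, sqrt

for n in (5, 21, 101, 1001):
    inf_maj = n * comb(n - 1, (n - 1) // 2) / 2 ** (n - 1)   # exact total influence of MAJ_n
    print(n, round(inf_maj, 2), round(sqrt(2 * n / pi), 2))  # tracks sqrt(2n/pi)
```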
A question that was asked by Benjamini, Kalai, and Schramm in a very famous paper about noise
sensitivity and about influence—and it was repeated in a different form in O’Donnell, and Kalai, and
Hatami in the past few years—is whether the converse to LMN is true. Okay? What do I mean by that?
LMN, vaguely speaking, says that small-depth, small-size circuits have low total influence; is it true that
low-influence functions are basically circuits—basically small-depth circuits? It’s not hard to show
that—you know—low-influence functions are not exactly small-depth circuits, but it was still possible that low-influence functions are essentially small-depth circuits—that you can well approximate them with a small-depth circuit. So just to be slightly more precise, LMN says that if you give
me—you know—this structure that you have—you know—poly-size or even quasi-polynomial-size and
constant-depth circuits, you lie to the left of polylog(n). Okay? Now, you can ask: if you lie to the left of
polylog(n), do you have this structure? Are all polylog(n)-influence functions well-approximated by the
same class of functions? If it’s true, this’ll be a very, very nice characterization of low-influence
functions. But to spoil the suspense—you know—we disprove it; we show that our main result gives a
strong counterexample to this, which is unfortunate. And in particular, a question that came up during
the times I gave this talk is: it would be nice to try to save this conjecture somehow; I’m very interested in the structure of polylog(n)-influence functions. Yeah, and roughly speaking—as an aside for the experts—like, log n is where Friedgut’s theorem breaks down. Like, if your influence is below log n, Friedgut’s theorem gives you a very nice structure, but log n is where it doesn’t give you any information. So I’m very, very interested in the structure of log(n)-influence functions, but
… and I was trying to prove this all summer, but ended up disproving it instead. So yeah, okay. So that’s
… and again, it’s a simple consequence of our main theorem; I’m not gonna go into it; it’s … there’s
nothing there; it’s … it follows quite easily.
>>: So the example again? It would be the …
>> Li-Yang Tan: The Sipser function, scaled up. Yeah, it’s …
>>: Sipser function?
>> Li-Yang Tan: The Sipser function of depth, say, square root of log n. Okay, it has some influence, and what does our main result show? It says that any circuit of depth, not just constant, but square root of log n minus one, cannot even approximate it.
>>: Yeah.
>> Li-Yang Tan: So by adjusting parameters, choosing—you know, instead of square root of log n—maybe something else, you get a log(n)-influence function that cannot be approximated even by super-constant-depth circuits. Okay, so yeah, again, an open problem is to rescue this somehow.
Okay, so two applications in two fairly different areas; one is in structural complexity, which shows that—you know—PH is infinite relative to a random oracle, and the second is to answer this BKS conjecture—repeated by O’Donnell, Kalai, and Hatami—which is that, unfortunately, there’s no approximate converse to LMN. Okay, so let me now talk about how we actually prove this. So for the rest of this talk, let me tell
you about Håstad’s techniques, the difficulties in applying them—you know, and by that, I mean
applying them to get our result, which is an average-case depth hierarchy theorem—and how our
techniques overcome these difficulties. Okay, in particular, let me give a very, very high-level, rough structure of Håstad’s proof, just in one slide and pictures, and I will try to say why extension one—the, you
know, approximation version—follows easily; in fact, it’s implicit in the proof. I’ll tell you why extension
two does not follow easily and why Håstad had to do extra work to prove extension two. And the way
he proved extension two, I’ll explain why it breaks extension one—why he loses average-case hardness—
and—you know—I hope to convey this tension between extension one and extension two—the way you get extension one, you cannot get extension two, and the way you get extension two, it was hard to get average-case hardness—and how our techniques somehow are able to get both.
Okay, but let’s start with Håstad’s basic theorem; in a picture, let’s, like, recap his proof. So, one slide: first of all, his main technique—which is that of random restrictions—is a very simple concept:
you take a Boolean function, f; you apply a random restriction to it; you get a simpler Boolean function, f
sub rho, where rho is your random restriction. I’ll make this more precise in the next slide, but it was a
very important concept introduced by Subbotovskaya back in the sixties, and even today, it is a really indispensable tool in circuit complexity. Håstad’s theorem uses it, and our theorem builds on it. So
let me tell you about what a random restriction is. So a random restriction, you have some parameter
p—think of it as small, so zero point one or one over log n—and you generate—you know—a string, rho,
in {0, 1, *} to the n. And how do you generate it? Independently, with probability p, you put down a star; otherwise, you flip a fair coin and put down a zero or a one. So the string you get out has roughly a p-fraction of stars, and the one-minus-p-fraction is split half-half between ones and zeroes. Okay, this is what a restriction is; what does it mean to hit a function with a
restriction? So the restriction of a Boolean function, f, by this string, rho, is the function where—you
know—you take in values for the stars, and for the non-stars, you fill in according to the template.
Okay, so this is what it means to transform f into f sub rho. So as you can see, intuitively, you do make the function simpler, right? It was an n-variable function; now, it’s—you know—roughly a pn-variable function. So let’s see what this does to functions; so this is what a random restriction is.
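A literal rendering of this definition (my own sketch, not from the talk): sample rho with stars appearing with probability p and fair bits elsewhere, then form the restricted function on the surviving coordinates.

```python
import random

def random_restriction(n, p):
    """A p-random restriction: '*' with probability p, else a fair 0/1 bit."""
    return ['*' if random.random() < p else random.randint(0, 1) for _ in range(n)]

def restrict(f, rho):
    """Return f restricted by rho, as a function of the surviving (starred) coordinates."""
    free = [i for i, v in enumerate(rho) if v == '*']
    def f_rho(z):                        # z assigns bits to the starred positions
        x = list(rho)
        for i, b in zip(free, z):
            x[i] = b
        return f(tuple(x))
    return f_rho, free

# Example: parity restricted by rho is still parity (up to a fixed constant) on the survivors.
parity = lambda x: sum(x) % 2
rho = random_restriction(10, 0.3)
g, free = restrict(parity, rho)
assert g((0,) * len(free)) == sum(v for v in rho if v != '*') % 2
```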
So here’s Håstad’s theorem again as a cartoon. It says that parity—the red dot—doesn’t live in the
green circle; it lives at depth log n. So how do you prove such a statement? You hit the red dot with a
random restriction, and you hit the green circle with a random restriction, and if you can argue that two
different things happen to them, then clearly, the red dot cannot lie in the green circle, right? So in
more detail, you argue that parity, when you hit it with a random restriction, it remains complex. It
basically becomes parity on fewer variables, but still very complex. You take AC0, and you hit it with a
random restriction, you are gonna collapse this to a really, really simple function—a small-depth
decision tree—which roughly speaking, lies within here; it lies within, like, the bottom-level circle. And
then, if you can argue these two, then to finish it off, you just argue that—you know—simple functions
cannot compute complex functions, and you’re done, right? Okay, so of these three steps, one is
essentially by definition of random restrictions and parity—it’s not hard at all—three is a simple
exercise—that small-depth decision trees cannot compute parity—the main technical work that Håstad
had to do was the second step—to show that if you hit the green circle with a random restriction, it
really … like, the whole tower just collapses. Okay, so let me … one slide about this main technical
ingredient. It says—you know, the famous switching lemma—that you take any function in AC0, you hit it with a random restriction—with carefully chosen p, depending on the structure and size of the circuit—and the depth of it decreases by at least one. So to prove the theorem, you hit it—you know—d times, and you can argue that, under the overall restriction, it collapses to a very simple function. Okay, so that’s one slide; we’ll come back to this later, but that’s really the main technical ingredient, and this is the technique that’s seen lots of applications. Okay.
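For reference (this exact statement is not read out in the talk, but it is the standard textbook form of the switching lemma being described): if $F$ is a DNF (or CNF) of width at most $w$ and $\rho$ is a $p$-random restriction, then

$$\Pr_{\rho}\bigl[\, F|_{\rho} \text{ requires decision-tree depth } \ge t \,\bigr] \;\le\; (5pw)^{t},$$

so taking $p$ on the order of $1/w$ makes deep decision trees exponentially unlikely, and applying this once per layer is what collapses a depth-$d$ circuit to a shallow decision tree.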
So we just sort of sketched the proof of the parity-not-in-AC0 result; in particular, for every depth, d, greater
than two—you know, depth-100 circuits of large size cannot compute parity. Okay, and actually, I claim
that we have implicitly also established extension one. You know, the proof implicitly gives average-case hardness: not only can you not compute parity, your advantage over random guessing is a tiny, tiny fraction. So let me tell you in one slide why this follows. It’s a consequence of the random restriction; the key fact here is that our random restrictions hide a uniform random string. What do I mean by that? The phrase we have been using is that they complete to the uniform distribution, which I guess is not very formal, but here’s the formal meaning: you generate this random restriction, rho, alright? It’s zeroes, ones, and stars, and now you fill in the stars with zeroes and ones uniformly at random. Consider this experiment: at the end of it, you get a fully zero-one-valued string. The obvious-but-crucial fact behind why this gives average-case hardness is that—you know—the resulting string is a uniform random
string, and this is sort of why—you know—you have proved an average-case hardness result. By hitting
a function with—you know—this random restriction, you’re implicitly feeding it a uniform random input.
So this is why you get … and this is not hard to see at all; it’s because—you know—when you’re not a
star, you’re split between zeroes and ones. I mean, so I won’t go into detail, but this is, roughly
speaking, the crucial fact behind why you get average-case hardness.
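This "completes to the uniform distribution" fact can be checked exactly for tiny n (my own sketch, not from the talk): enumerate every restriction and every way of filling in its stars, weight them accordingly, and the composite distribution over {0,1}^n comes out exactly uniform.

```python
from itertools import product
from collections import Counter
from fractions import Fraction

def composite_pmf(n, p):
    """Distribution of: draw a p-random restriction, then fill its stars with fair bits."""
    pmf = Counter()
    for rho in product(['*', 0, 1], repeat=n):
        pr_rho = Fraction(1)
        for v in rho:
            pr_rho *= p if v == '*' else (1 - p) / 2
        stars = [i for i, v in enumerate(rho) if v == '*']
        for fill in product([0, 1], repeat=len(stars)):
            x = list(rho)
            for i, b in zip(stars, fill):
                x[i] = b
            pmf[tuple(x)] += pr_rho * Fraction(1, 2 ** len(stars))
    return pmf

p = Fraction(1, 3)
assert all(prob == Fraction(1, 8) for prob in composite_pmf(3, p).values())  # uniform on {0,1}^3
```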
Okay, what have we done? We have sketched Håstad’s proof of his basic theorem; we have shown
why—you know—extension one is essentially—you know—implicit in the proof. Let me tell you the more interesting thing, which is why extension two does not follow easily from Håstad’s theorem and why, by proving extension two, he broke extension one. So here, again, is the statement of Håstad’s
theorem and its proof—it says that parity is not in AC0. And how you prove it: you show that—you
know—when you hit parity with a random restriction, it remains complex, whereas, if you hit AC0
with a random restriction, you collapse to a simple function, and you just note that—you know—simple
functions cannot compute parity. Okay? Why can’t this prove a depth hierarchy theorem? It’s because, for a depth hierarchy theorem, it’s not delicate enough to separate
depth d plus one and depth d. You know, the lightning is too powerful; it destroys all of AC0. In
particular—you know—I … my hard function—say the Sipser function—lies in here, and you destroy it,
right? You try to do the proof, and you say, “I hit my hard function with a random restriction; I hit
depth-d AC0 with a random restriction.” It shows that both of them collapse to small-depth decision trees, and it doesn’t give you the contradiction you want.
But this was not trouble for Håstad; he was able to do it by designing new random restrictions—not
yellow in color, but blue in color—designed specifically for the Sipser function to keep it complex. Okay,
so in a picture, what he does is he comes up with a new one, specifically with Sipser in mind, so that he
very carefully keeps Sipser complex—he doesn’t want to destroy it—and yet, he still has to prove that
anything of depth one less than Sipser still collapses to a decision tree. So he had to prove a new
switching lemma for the blue random restrictions, and this blue random restriction was tailored very
specifically for the Sipser function. So this is very nice. In particular, here you see you are doing
something much more delicate, right? Your contradiction comes from the fact that a decision tree cannot compute a depth-2 circuit—you know, it really has to be very careful. Okay, so intuitively—just
to say again—it’s a much more delicate task, right? For parity and AC0, it’s a nice result, but—you
know—in part—you know—your hard function was really hard to begin with. So you just have to—you
know—destroy AC0 and show that—you know—by destroying AC0, you do not destroy your hard
function by too much. Whereas here, you’re really trying to get very, very fine-grained information
about the structure of circuits, right? You have to come up with something that destroys depth-d
circuits, but preserves—you know—your special function at depth d plus one. So this was [indiscernible]
but Håstad did it; so this is parity not in AC0; this is depth hierarchy theorem. But he paid a price; the
price comes in the fact that he only gets a worst-case depth hierarchy theorem. So recall the key fact about the yellow—the usual—random restrictions: they were independent across coordinates, and in particular, they complete to the uniform distribution; you’re hiding—implicitly hiding—a uniform random string in the random restriction. Håstad’s new restrictions are instead carefully tailored for the Sipser function—you know, the coordinates are not independent; they are carefully correlated to keep Sipser complex. Okay, and the distribution is only supported on an exponentially small set of inputs, and hence, you only prove worst-case and not average-case hardness.
So just to summarize the difficulty that we faced when we tried to do this project: at a high level, there
are three requirements for an average-case depth hierarchy theorem; Håstad has two theorems, each of which achieves two of the three, but not all three. One, you have to keep the target function—the hard function—
complex; two, you have to … your approximator, you have to destroy it; and three, you have to do so in
such a way that however you’re hitting them, it completes to the uniform distribution, okay? So
Håstad’s parity-not-in-AC0 proof does two and three—you know, his famous switching lemma destroys AC0 circuits; it completes to the uniform distribution, just because it’s so simple; you know, you flip a coin
independently for every coordinate—but his—you know—his yellow lightning was too powerful; it was
designed to destroy all of AC0; and in particular, it destroys the hard function that you’re supposed not
to destroy. Okay, so he—when faced with this—he said, “No problem. I’m gonna define a new random
restriction that keeps my target function complex at depth d plus one. I still prove that my switching lemma holds—that the approximator collapses.” But the price he paid was that he did it so carefully and correlated the coordinates so carefully that it doesn’t complete to the
uniform distribution. And in this work, we design a random projection that achieves all three, and it’s
not hard to see that with random restrictions, you cannot achieve all three, and a key idea here was—
you know—random projections.
So let me, in my remaining time, tell you a bit about random projections and how they relate to random restrictions. And if I have time—I’m not sure I do—I’ll sketch how projections achieve all three. Okay, so again, our technique is random projections, which generalize this—
you know—notion of random restrictions. So a restriction—just to recall—you take a Boolean function,
f, over x1 to xn; you hit it with a random restriction; you get a simple Boolean function over x1 to xn. A
random projection, on the other hand, you take a Boolean function, f, over x1 to xn; you randomly
project it; you get a new Boolean function over new formal variables, y1 to ym. So this feels like a
generalization, because—you know—a restriction is just where your new formal variables are your old
formal variables. Okay, let me be more precise; in a random restriction, every xi is either set to a
constant—zero or one—or it survives—you know, I have been denoting it star, but you can think of it as,
you know, xi maps to xi, right? That’s what it means to survive. In a random projection, every xi is
either set to a constant like before, or you can map it to a brand new formal variable, yj, where j doesn’t have to have anything to do with i. So you’re basically changing the space of variables. And how do we exploit this? Very roughly speaking, in our proof, there are many fewer y variables than x variables, so we have a lot of collisions, right? This is something that you cannot do in the restriction world; we map many different xi’s to either zero, one, or the same yj, and we map in a way that depends on the structure of the Sipser formula. Okay. So again, it’s easy to see that projections are a generalization of restrictions, because you can enforce that every xi, instead of mapping to some yj, just maps to xi itself.
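A bare-bones rendering of this definition (my own sketch; the projections actually used in the paper are far more carefully designed): each x_i is mapped to 0, to 1, or to one of a smaller set of fresh variables y_j, so distinct x_i's can collide onto the same y_j.

```python
def project(f, proj):
    """proj[i] is 0, 1, or ('y', j); returns f composed with the projection, as a function of y."""
    def f_proj(y):                       # y assigns values to the new variables
        x = tuple(v if v in (0, 1) else y[v[1]] for v in proj)
        return f(x)
    return f_proj

# Example: x0, x1 -> y0; x2 -> 1; x3 -> y1. The AND of x0..x3 collapses to y0 AND y1.
and4 = lambda x: int(all(x))
g = project(and4, [('y', 0), ('y', 0), 1, ('y', 1)])
assert [g((a, b)) for a in (0, 1) for b in (0, 1)] == [0, 0, 0, 1]
```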
Okay, so hopefully, I can give you a sense of why this helps us—I’ll be fast. And I’ll do so by proving a weaker statement; I will show a separation between depth 3d and depth d. And you can see that here you encounter the same kinds of problems, because—you know—your hard
function is in AC0.
>>: Sorry, did you tell us the map or anything? No.
>> Li-Yang Tan: No. I’m gonna come to that now.
>>: Okay.
>> Li-Yang Tan: Yeah. Hopefully, I didn’t skip … okay, good. Yeah, exactly, I went back to the same
place. Okay, so let me tell you about this projection and why … okay, so it’s designed specifically with
Sipser in mind, right? So you have … your depth-3d Sipser formula is—you know—AND gates at the
bottom; let’s look at the jth AND and some variables—here’s the projection; it’s gonna look a little
weird. Every xi in the jth tribe, I set to either one or yj—so again, this is something you cannot do with restrictions, right? If a variable is not set to one, you set it to the same variable, yj, and again, recall that j is the name of the tribe. So in the (j plus 1)st AND, you either set it to one or to yj plus one, okay? And what’s the distribution? Well, independently, each coordinate is set to one with probability one half—but we condition on not getting the all-ones input. Roughly speaking, why do we want to do that? We do not want the AND gate to be satisfied—right, if you put down the all-ones input, the AND gate gets satisfied—and we want to keep Sipser complex. And here, we see that we have not put down any zeroes. So for an AND gate, if you do not put down any zeroes, and you do not put down all ones, you keep it alive. In particular, it’s always the AND of a nonempty subset of inputs, which is nice. But one thing you should be skeptical about is the claim that this completes to the uniform distribution, because I only put down ones and no zeroes; roughly speaking, what’s gonna save me is the fact that I’ve grouped all of this together. So by hitting yj with something very likely to be zero, it’s gonna be uniform. But anyway, this is just one projection. As is standard in these depth hierarchy theorems—you know—the overall random projection is just doing this over and over again. And if it’s an OR gate, do it with the dual distribution … with zeroes and yj’s.
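A sketch of the distribution just described for a single bottom AND gate of fan-in w (my own rendering of the description above; the projection in the paper has more moving parts): each x_i in the block independently becomes 1 with probability one half and otherwise becomes the block's shared variable y_j, conditioned on not making the whole block 1, so the gate is never satisfied outright and always collapses to y_j.

```python
import random

def project_one_block(w):
    """One bottom AND gate of width w: each input -> 1 or the shared 'y', never all ones."""
    while True:                                    # rejection sampling does the conditioning
        block = [1 if random.random() < 0.5 else 'y' for _ in range(w)]
        if 'y' in block:
            return block                           # AND(block) is an AND of copies of y, i.e. just y

print(project_one_block(4))   # e.g. [1, 'y', 1, 'y']
```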
Okay, let’s see why this helps us. We have three things; we need to show that this preserves Sipser, but that’s essentially by definition, I hope. The second is that—you know—we still have to prove that AC0 circuits collapse to a simple function; and the third is that—you know—the projections complete to the uniform distribution. Okay, the first, I claim, is by design, and it’s the third you should be skeptical about. So
one, why does it remain complex? Well, it’s sort of designed with Sipser in mind; in the jth AND, you either map to one or yj, and you’re never all ones—so it’s always the AND of a nonempty subset of copies of yj, and the AND of yj, and yj, and yj is just yj. So this is designed specifically so that—you know—
the AND gate is never killed, and in particular, every y … every AND gate just becomes a new formal
variable, yj. So what happens is you go from depth-3d Sipser over x variables to depth-3d-minus-one
Sipser over y variables with probability one, right? So this is easy. The second one is the completion
to uniform, right? Every coordinate is correlated in a way that—you know—you condition on not
getting the all ones input. Okay, and the reaction should be that it does not look uniform at all—that
you only have ones and yj’s—but as you’ll see, like, the key here is that we are grouping the yj’s
together. So here is the fact—I mean, it’s an easy fact, but we were super happy when we found out about it: if you put down a bunch of ones, you group the stars into yj’s, and then you hit each yj with something very biased toward zero, then the resulting string is uniform. This is just a calculation of the pmf, but—you know—what I really like about it is that it allows me to generate a uniform random string in this two-stage process, where I first put down a bunch of ones and group the stars, and then I fill in the grouped stars with a heavily biased bit. So what’s really nice is that—you know—the usual way you generate a random string is to just go coordinate by coordinate and flip a coin; this allows me to put down a lot of ones first—you know, for my application—and then—you know—hit the remaining things as a whole. Okay, so informally, rho—which is very non-uniform—composed with this two-to-the-minus-w-biased product distribution, pops back to the uniform distribution, okay? So that’s nice.
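The "pops back to uniform" calculation can be verified exactly for a small block (my own sketch, using the parameters implied above): condition the ones/stars pattern on not being all ones, then set the shared variable to 1 with the heavily-zero-biased probability 2^(-w); every w-bit string then has probability exactly 2^(-w).

```python
from itertools import product
from collections import Counter
from fractions import Fraction

def block_pmf(w):
    q = Fraction(1, 2 ** w)                          # bias of the shared variable y_j
    patterns = [p for p in product([0, 1], repeat=w) if p != (1,) * w]
    pr_pattern = Fraction(1, len(patterns))          # the 2^w - 1 non-all-ones patterns are equally likely
    pmf = Counter()
    for pat in patterns:                             # pat[i] == 1 means "fixed to 1", 0 means "star"
        for y_val, pr_y in ((1, q), (0, 1 - q)):
            x = tuple(1 if b == 1 else y_val for b in pat)
            pmf[x] += pr_pattern * pr_y
    return pmf

w = 3
assert all(prob == Fraction(1, 2 ** w) for prob in block_pmf(w).values())   # exactly uniform
```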
So I’m still missing one part, but I’ve hopefully given you quite a good picture of what happens to the Sipser formula. You have depth-3d Sipser, with its AND gates at the bottom accessing x variables, and your goal is to prove hardness of approximation with respect to the uniform distribution. You hit it with a random projection; you go from depth 3d to 3d minus one very nicely; and your job—instead of proving hardness according to the uniform distribution—becomes proving hardness according to the two-to-the-minus-w-biased product distribution over your new variables, yj. And again, what’s each yj? Each of these bottom gates collapses to a new variable—this collapses to a new variable; this collapses to a new variable. Okay, so what’s nice is—you know—your target remains very structured, and your goal distribution remains very structured; it goes from—you know—product distribution to product distribution to product distribution. Okay, so the last step—that AC0 collapses to a simple function—I won’t go into detail,
but as a last slide, as you would expect, we had to prove what Håstad proved, but for our random
projections.
So the key in Håstad’s proof was a random restriction argument, showing that—you know—any function in AC0 collapses to a simple function under random restrictions. You would hope that—you know—under one stage of what I described—just putting down ones and grouping things into yj’s—the depth collapses by at least one, in which case we can apply it many, many times. We couldn’t do that, at least for this random projection, but we could prove it if you allow me to hit it three times. So roughly speaking, this three is why I’ve only sketched a 3d versus d separation—you know, I used three
layers on my target to trade me one layer in my approximator. So—you know—if given—you know—
5d, surely, I can get the contradiction, but with three, I can prove that—you know—I get the collapse.
So with that, I achieve all three. So improving to d versus d minus one is, as you would guess—you
know—we change it from red to orange—you know, a significantly more delicate random projection—
to ensure that we get a collapse in depth, even under just one random projection. Okay? And we … of
course, once you change what your random projection is, you have to ensure that the two other properties still hold. One of them—you know, the fact that your target remains complex—was somewhat simple in my sketch; it was really very neat in this example, but it gets more complicated—and the completion to uniform also. So a big part of the project was trying to juggle these three balls; we could only get two in the air for like five months; then
somehow, one day, we got all three in the air and were super happy.
Yeah, okay, yeah, so just a summary: we prove an average-case depth hierarchy theorem, which is—you
know—for every d—you know, say d equals a hundred—there’s a function such that—you know—if you allow me depth a hundred and one, I can compute it in linear size, but if you force me to use depth a hundred, even if
you allow me—you know—what seems like huge size, my agreement is—you know—less than fifty-one
percent. And I gave two applications; one is that the PH is infinite relative to a random oracle; and a
different application, showing that—you know—there’s no approximate converse to this—you know—
famous and useful LMN theorem. And our main technique, which we’re quite excited about, is this
notion of random projections, which extends—you know—this notion of random restrictions, and it’ll be
very nice to find further applications. So thank you. [applause]
>> Yuval Peres: Any additional questions?
>>: So …
>> Li-Yang Tan: Yeah?
>>: How do you want to save the Linial-Mansour-Nisan thing?
>> Li-Yang Tan: Ah, that’s a great question. I don’t know.
>>: So you want to do something like … so what you show is essentially if it’s polylog(n) in terms of
sensitivity …
>> Li-Yang Tan: Yeah.
>>: … it could still have high complexity—I mean, high …
>> Li-Yang Tan: Right.
>>: … large size.
>> Li-Yang Tan: Exactly.
>>: So do you want to say, maybe, if it’s not …
>> Li-Yang Tan: Yeah.
>>: … it’s smaller than log n, then it cannot have non [indiscernible]
>> Li-Yang Tan: Right, so I … yeah, at a very high level, I like any structural information I can prove about
log(n)-influence functions. And as I sort of touched on, log n is a special number, because for anything lower than log n, we actually have quite strong structure—by a theorem that is not easy at all, a famous theorem of Friedgut, which says that if your influence is k, you essentially depend only on two to the k variables. So if you told me a function has influence a hundred, I can tell you, “Oh, it’s not a very interesting function. You essentially lie in dimension two to the one hundred.” Right? [laughter] Right, if you told me the influence is square root of log n—you know—you lie in dimension two to the square root of log n. Where does it break down? If you tell Friedgut—you know—your influence is log n, he tells you you are—you know—close to [indiscernible], which doesn’t say much. So log n, I think, is a
special number for me, because I really like to understand the structure of log(n)-influence functions.
And this was very nice, right? BKS, and O’Donnell, and Kalai, and Hatami, they said, “Oh, maybe log(n)-influence functions can depend on all coordinates, but maybe you are—you know—approximated by a simple circuit,” but it’s not true. One way to rescue it is …
>>: It’s nontrivial, or it’s not true? ‘Cause polylog—I mean—could be log to the ten.
>> Li-Yang Tan: Well, that’s … and even more complex functions, right?
>>: Yeah, so again, what is your result? After what—when he said polylog n—what …?
>> Li-Yang Tan: I see. I have a log(n)-influence function—yeah—I have a log(n)-influence function such
that even if you allow me depth—not just constant, but square root of log n—and even if you allow me size, not just poly(n), but two to the n to the one over square root of log n, I cannot approximate it.
>>: Okay, so even at log … it’s not polylog(n); it’s at log n.
>> Li-Yang Tan: Right. Exactly—again—my counterexample is at log n.
>>: Right.
>> Li-Yang Tan: So a way to rescue it is to broaden the class of circuits beyond just small-depth circuits. So a conjecture would be that log(n)-influence functions are well-approximated by poly-size circuits, period. It will be very hard to disprove, because if you disproved it,
you have—you know—separated P from P/poly, NP … yeah, but right? Because in particular, our
function is clearly a poly-size circuit, but …
>>: Wouldn’t it be very surprising if this notion of sensitivity was a tight characterization for some
notion of computational complexity? I mean …
>> Li-Yang Tan: Yeah, yeah, it’d be very nice, but it was … could also be very surprising, yeah. And in
some sense, our results show that it was too much to hope for, right? This was … it would have been
really nice if—you know—every log(n)-influence function is basically just a circuit. If you are a circuit—
low … small-depth circuit—you’re log(n)-influence; if you are log(n)-influence, you’re basically a … but yeah, exactly as you said, maybe in hindsight it was too bold, but yeah, it’s not true. But still, one can hope for some sort of structure for log(n)-influence functions. I’m not sure, yeah. For constant influence or—you know—square root of log n influence, we have very good structural information, right?
>> Yuval Peres: Thank you.
>> Li-Yang Tan: Thanks. [applause]