CSE 291: Analysis of Polynomial Threshold Functions
Spring 2015
Lecture 8: Pseudorandom Generators for PTFs; Noise Sensitivity
Date: Apr. 28
Instructor: Daniel Kane

1 Recap
Last time we covered the regularity lemma.
Lemma 1 (Regularity Lemma). Any degree-d polynomial p that takes input from {−1, +1}^n can be written as a decision tree of depth O((1/τ)(ln(1/τ))^{O(d)}), with leaves as polynomials p_ρ, such that the following holds. With probability 1 − ε over a random leaf, the associated polynomial p_ρ is either (1) τ-regular, or (2) var(p_ρ) ≤ ε‖p_ρ‖_2^2.
Remark. In the latter case, p_ρ can be thought of as a large constant plus small variations, where its sign stays constant (−1 or +1).
We have also defined pseudorandom generators (PRGs) for PTFs, which will be the main focus of this lecture. Formally, our goal is to explicitly construct a simple (low-entropy) random variable Y, so that for any degree-d PTF f over n variables,

    |E f(Y) − E_{B ∈_U {±1}^n} f(B)| ≤ ε    (1)

As we have seen, there is an implicit construction with seedlength as small as O(d ln n + ln(1/ε)), but this requires computationally inefficient random sampling.
2 The Construction of the Meka–Zuckerman PRG
The main tool we will be using is k-wise independence.

Definition 1. A sequence of random variables (W_1, ..., W_n) is k-wise independent if any k of them are independent.

Note that d-wise independent random variables fool in expectation all degree-d polynomials. But fooling degree-d PTFs appears to be a harder task. We begin with a standard fact.
Fact 1. We can generate a set of k-wise independent random variables (W_1, ..., W_n), with W_i ∈_U {1, 2, ..., m}, from a seed of length O(k ln(nm)).

The construction is as follows. We first use Fact 1 to build F : [n] → [t] where (F(1), ..., F(n)) are 2-wise independent, requiring a seedlength of O(2 ln(nt)). Next, we repeatedly use Fact 1 t times to build independent random vectors Z_1, ..., Z_t ∈ {±1}^n, where each Z_i is a random vector whose n coordinates are k-wise independent. This step requires a seedlength of t · O(k ln(2n)). We put each Z_i as a row vector and concatenate them vertically into an array:

    [ Z_{1,1}  Z_{1,2}  ...  Z_{1,n} ]
    [ Z_{2,1}  Z_{2,2}  ...  Z_{2,n} ]
    [   ...      ...    ...    ...   ]
    [ Z_{t,1}  Z_{t,2}  ...  Z_{t,n} ]
Our pseudorandom generator is defined as:

    (Y_1, ..., Y_n) = (Z_{F(1),1}, Z_{F(2),2}, ..., Z_{F(n),n})

In the next section, we show that with suitable choices of k and t, Equation (1) can be established.
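To make the construction concrete, here is a minimal Python sketch (the function names and parameter choices are ours, not from the lecture). Fact 1 is instantiated with the standard construction: evaluating a random polynomial of degree less than k over a prime field gives k-wise independent, uniform field elements. Reducing a field element mod t (for F) or to a parity bit (for the entries of Z_i) is only approximately uniform, which we gloss over here.

    import random

    P = 2**31 - 1  # a prime larger than n; all arithmetic is over GF(P)

    def kwise_seed(k):
        # Seed for Fact 1: the k coefficients of a random polynomial of degree < k.
        return [random.randrange(P) for _ in range(k)]

    def kwise_eval(seed, x):
        # Horner evaluation; the values at distinct points are k-wise
        # independent and uniform over GF(P).
        v = 0
        for c in seed:
            v = (v * x + c) % P
        return v

    def meka_zuckerman(n, t, k):
        # Bucket function F : [n] -> [t], 2-wise independent.
        f_seed = kwise_seed(2)
        F = [kwise_eval(f_seed, j + 1) % t for j in range(n)]
        # t rows Z_1, ..., Z_t, each with n k-wise independent +/-1 entries.
        z_seeds = [kwise_seed(k) for _ in range(t)]
        # Splice: output coordinate j is taken from row F(j).
        return [1 - 2 * (kwise_eval(z_seeds[F[j]], j + 1) & 1) for j in range(n)]

    print(meka_zuckerman(n=20, t=4, k=8))

The seed here consists of 2 + tk field elements, mirroring the O(2 ln(nt)) + t · O(k ln(2n)) seedlength count above.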
3 Idea of Analysis
Step 1: Replacement Method Conditioned on F. First we fix F. For any degree-d polynomial p, without loss of generality assume ‖p‖_2 = 1, since scaling does not change the sign. Then p(Y_1, ..., Y_n) can be written as a degree-d polynomial p_F of Z_1, ..., Z_t (therefore there are nt − n dummy variables). Consider f(·) = sign(p_F(·)). The high-level idea is to use the replacement method, as we have seen in the invariance principle. In particular, we are going to show

    E f(Z_1, ..., Z_t) ≈ E f(G_1, ..., G_t)    (2)

    E f(B_1, ..., B_t) ≈ E f(G_1, ..., G_t)    (3)

where (G_1, ..., G_t) are independent standard Gaussians, and (B_1, ..., B_t) are independent uniform Bernoullis.
We start with Equation (2). For notational simplicity, let Z = (Z_1, ..., Z_{i−1}), B = (B_1, ..., B_{i−1}) and G = (G_{i+1}, ..., G_t). With foresight, we find smooth functions ρ_+, ρ_− : R → [−1, +1] such that ρ_+(x) = ρ_−(x) = sign(x) except on a small interval of length O((ε/d)^d), and ρ_−(x) ≤ sign(x) ≤ ρ_+(x), with ‖ρ^{(4)}‖_∞ ≤ O((ε/d)^{−4d}). Consider ρ ∈ {ρ_+, ρ_−}. Our goal now comes down to bounding

    E[ρ(p(Z, G, Z_i)) − ρ(p(Z, G, G_i))]
Now consider W = Z_i or W = G_i. Let p_0(Z, G, W) = E_W p(Z, G, W). Since p is multilinear, the conditional expectation is the same in either case, and W simply becomes a dummy variable in p_0. A Taylor expansion of ρ yields

    E ρ(p(Z, G, W)) = E[ Σ_{j=0}^{3} (1/j!) ρ^{(j)}(p_0) (p(Z, G, W) − p_0(Z, G, W))^j ] + O(‖ρ^{(4)}‖_∞ · E(p(Z, G, W) − p_0(Z, G, W))^4)

If we set k = 4d, then the variables among the set (Z, G, W) are 4d-wise independent. Consequently the first terms are the same in both cases: for j ≤ 3, (p − p_0)^j is a polynomial of degree 3d in (Z, G, W) in which each coordinate of W appears with power at most 3, and the first three moments of a uniform ±1 bit and of a standard Gaussian agree.

Now consider the second term. Since (p(Z, G, W) − p_0(Z, G, W))^4 is a polynomial of degree 4d, the Z_i's can be safely replaced with B_i's when computing the expectation. In either case, by hypercontractivity (over a hybrid of Gaussians and Bernoullis),

    E(p(B, G, W) − p_0(B, G, W))^4 ≤ 2^{O(d)} (E(p(B, G, W) − p_0(B, G, W))^2)^2
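The moment matching used for the first terms is easy to see numerically: a uniform ±1 bit and a standard Gaussian share their first three moments (0, 1, 0) and first differ at the fourth (1 versus 3), which is exactly why the error term carries ‖ρ^{(4)}‖_∞. A quick Monte Carlo illustration (ours, not part of the notes):

    import random

    def empirical_moments(sample, n=200_000, kmax=4):
        # Empirical E[W^k] for k = 1, ..., kmax.
        xs = [sample() for _ in range(n)]
        return [sum(x**k for x in xs) / n for k in range(1, kmax + 1)]

    random.seed(0)
    bern = empirical_moments(lambda: random.choice((-1, 1)))
    gauss = empirical_moments(lambda: random.gauss(0, 1))
    for k, (b, g) in enumerate(zip(bern, gauss), start=1):
        # moments 1-3 agree (about 0, 1, 0); the 4th differs (1 vs. 3)
        print(f"E[W^{k}]: Bernoulli {b:+.3f}, Gaussian {g:+.3f}")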
We expand p in the Fourier domain:

    p(x) = Σ_{S⊆[n]} p̂(S) x^S

Note that F determines a partition of [n]: if we replace Z_i with G_i, only a subset of the arguments of p is affected. We call this set of variables the i-th bucket with respect to F, abbreviated as B(i). Using this notation, it can be seen that

    p_0(x) = Σ_{S⊆[n]: S∩B(i)=∅} p̂(S) x^S

Therefore,

    E|p(X) − p_0(X)|^2 = Σ_{S⊆[n]: S∩B(i)≠∅} p̂(S)^2
                       ≤ Σ_{S⊆[n]} Σ_{j∈S∩B(i)} p̂(S)^2
                       = Σ_{j∈B(i)} Σ_{S⊆[n]: j∈S} p̂(S)^2
                       = Σ_{j∈B(i)} Inf_j(p)
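This chain is easy to check numerically. The following small sketch (our own) draws a random multilinear polynomial on four variables via its Fourier coefficients, takes a toy bucket B, and compares Σ_{S∩B≠∅} p̂(S)² with Σ_{j∈B} Inf_j(p) using Parseval:

    import itertools, random

    n = 4
    B = {0, 2}  # a toy bucket B(i)
    random.seed(1)
    subsets = [frozenset(S) for r in range(n + 1)
               for S in itertools.combinations(range(n), r)]
    # A random multilinear p, specified by one Fourier coefficient per S.
    phat = {S: random.uniform(-1, 1) for S in subsets}

    def influence(j):
        # Inf_j(p) = sum of phat(S)^2 over all S containing j.
        return sum(phat[S] ** 2 for S in subsets if j in S)

    # E|p - p_0|^2 = sum of phat(S)^2 over all S meeting the bucket (Parseval).
    lhs = sum(phat[S] ** 2 for S in subsets if S & B)
    rhs = sum(influence(j) for j in B)
    print(lhs, "<=", rhs)  # sets meeting B more than once are counted twice on the right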
As a result, summing the individual replacement errors over i = 1, ..., t, Equation (2) can be bounded as

    |E[f(Z_1, ..., Z_t) − f(G_1, ..., G_t)]| ≤ O(ε) + O((ε/d)^{−4d}) Σ_{i=1}^t 2^{O(d)} (Σ_{j∈B(i)} Inf_j(p))^2

where the first term comes from replacing ρ_+(·) (resp. ρ_−(·)) with sign(·) using Carbery–Wright, and the second term comes from the bound on E[ρ(p(Z, G, Z_i)) − ρ(p(Z, G, G_i))], as we have just shown above. Similarly,

    |E[f(B_1, ..., B_t) − f(G_1, ..., G_t)]| ≤ O(ε) + O((ε/d)^{−4d}) Σ_{i=1}^t 2^{O(d)} (Σ_{j∈B(i)} Inf_j(p))^2

Therefore:

    |E[f(B_1, ..., B_t) − f(Z_1, ..., Z_t)]| ≤ O(ε) + O((ε/d)^{−4d}) Σ_{i=1}^t 2^{O(d)} (Σ_{j∈B(i)} Inf_j(p))^2    (4)
Step 2: Averaging Over F. Now, taking the expectation over the random choice of F in Equation (4), the second term can be bounded as follows:

    O((ε/d)^{−4d}) · E_F[ Σ_{i=1}^t 2^{O(d)} (Σ_{j∈B(i)} Inf_j(p))^2 ]
      ≤ O((ε/d)^{−4d}) 2^{O(d)} · E_F[ Σ_{j,k∈[n]: F(j)=F(k)} Inf_j(p) Inf_k(p) ]
      ≤ O((ε/d)^{−4d}) 2^{O(d)} · ( Σ_{j=1}^n Inf_j(p)^2 + (1/t) Σ_{j,k=1}^n Inf_j(p) Inf_k(p) )

where the first inequality follows from the definition of B(·), and the second inequality is by the 2-wise independence of F's coordinates: for j ≠ k we have Pr[F(j) = F(k)] = 1/t, while the diagonal terms j = k contribute Σ_j Inf_j(p)^2. To summarize, the error is bounded by

    O(ε) + O((ε/d)^{−4d}) 2^{O(d)} · ( Σ_{j=1}^n Inf_j(p)^2 + (1/t) Σ_{j,k=1}^n Inf_j(p) Inf_k(p) )    (5)
Let τ = (max_j Inf_j(p)) / (Σ_j Inf_j(p)) be the regularity parameter of p. Then, using Σ_j Inf_j(p) ≤ d·var(p) and var(p) ≤ ‖p‖_2^2 = 1,

    Σ_{j=1}^n Inf_j(p)^2 ≤ τ (Σ_{j=1}^n Inf_j(p))^2 ≤ τ (d·var(p))^2 ≤ τ d^2

In the meantime,

    (1/t) Σ_{j,k=1}^n Inf_j(p) Inf_k(p) = (1/t) (Σ_{j=1}^n Inf_j(p))^2 ≤ (d·var(p))^2 / t ≤ d^2 / t
At this point, it may be tempting to set t = O((1/ε)(ε/d)^{−4d}) and τ = τ_0 = O((ε/d)^{4d+1}), which lets us conclude that the total expected error (5) is bounded by O(ε). But in general this bound on τ may not be true.

Step 3: Applying the Regularity Lemma. Fortunately there is a quick fix: apply the regularity lemma to p. Essentially, p can be written as a decision tree of depth D = Õ(τ_0^{−1}) such that with probability 1 − ε, a random leaf is either (1) τ_0-regular, or (2) a constant plus a small variation term. Now we modify our setting of k from 4d to D + 4d, ensuring that even after D levels of variable conditioning along the path of the tree, the remaining variables are still 4d-wise independent. In case (1), the previous result can now be applied to ensure that the expected error on each leaf is at most ε. In case (2), p over this leaf is a large constant plus a small variation, thus has constant sign (−1 or +1). Applying the previous result to every leaf and averaging lets us conclude the result.
Seedlength. In summary, t = O((1/ε)(ε/d)^{−4d}) and k = Õ((ε/d)^{−(4d+1)}). Therefore the total seedlength is

    O(tk ln(2n)) + O(2 ln(nt)) = O((1/ε)(d/ε)^{8d} (ln(d/ε))^{O(d)} ln n) = Õ((d/ε)^{O(d)} ln n)
We emphasize that this is still a nontrivial PRG, since its seedlength is O(ln n).

Remark. The state of the art right now is O_d(ε^{−12} ln n) for PTFs, although for LTFs, a construction with seedlength O(ln(n/ε) · ln ln(n/ε)) has been shown. As for PRGs in the Gaussian setting, we can do a lot better: the best results so far are O_{d,c}(ε^{−c} ln n) for arbitrary c > 0 and O(2^{O(d)} ε^{−5} ln n). A recent result with seedlength polylog(n/ε) for d = 2 has been shown.

4 Noise Sensitivity
Bernoulli and Gaussian Noise Sensitivity. Consider a Boolean function f : R^n → {±1}. The noise sensitivity of f measures how likely small changes to the input of f lead to changes of the output. This is opposite to the notion of stability we have seen.

Definition 2. NS_ε(f), the noise sensitivity of f, is defined as

    NS_ε(f) = Pr_{(X,Y)}(f(X) ≠ f(Y))

where (X, Y) are (1 − ε)-correlated Bernoullis. GNS_ε(f), the Gaussian noise sensitivity of f, is defined as

    GNS_ε(f) = Pr_{(X,Y)}(f(X) ≠ f(Y))

where (X, Y) are (1 − ε)-correlated Gaussians.

It is instructive to look at NS_ε(f) in the Fourier domain. In particular,

    NS_ε(f) = (1 − E f(X) f(Y)) / 2
            = (1 − Stab_{1−ε}(f)) / 2
            = (1 − E f(X) (T_{1−ε} f)(X)) / 2
            = (1 − Σ_{S⊆[n]} f̂(S)^2 (1 − ε)^{|S|}) / 2
            = Σ_{S⊆[n]} f̂(S)^2 (1 − (1 − ε)^{|S|}) / 2
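As a concrete check of this chain (our example, not from the notes), take f to be the majority of three bits. We can evaluate NS_ε(f) exactly in two ways: directly, by flipping each bit independently with probability ε/2 (which makes (X, Y) (1 − ε)-correlated), and through the Fourier formula, with the coefficients computed by brute force:

    import itertools, math

    n, eps = 3, 0.1
    cube = list(itertools.product((-1, 1), repeat=n))
    subsets = [S for r in range(n + 1)
               for S in itertools.combinations(range(n), r)]

    def maj(x):
        return 1 if sum(x) > 0 else -1

    # Fourier coefficients by brute force: fhat(S) = E[f(X) prod_{i in S} X_i].
    fhat = {S: sum(maj(x) * math.prod(x[i] for i in S) for x in cube) / len(cube)
            for S in subsets}
    ns_fourier = sum(c ** 2 * (1 - (1 - eps) ** len(S)) / 2
                     for S, c in fhat.items())

    # Direct computation: flip each coordinate with probability eps/2, so that
    # E[X_i Y_i] = 1 - eps, and compute Pr[f(X) != f(Y)] exactly.
    ns_direct = 0.0
    for x in cube:
        for flips in itertools.product((0, 1), repeat=n):
            pr = math.prod(eps / 2 if b else 1 - eps / 2 for b in flips)
            y = tuple(-v if b else v for v, b in zip(x, flips))
            ns_direct += (maj(x) != maj(y)) * pr / len(cube)

    print(ns_fourier, ns_direct)  # the two values agree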
Roughly, if f has large high-degree Fourier coefficients, NS_ε(f) is likely to be high. Similar to the operator T_ρ we have seen in the Bernoulli case, we can define an operator U_ρ in the Gaussian case. Formally,

    (U_ρ f)(x) = E[f(Y) | X = x]

where Y is ρ-correlated with X, that is, Y = ρX + √(1 − ρ²) Z where Z is a standard Gaussian independent of X; U_ρ f is again a function from R^n to R.
What does this operator do in the Fourier domain?

Lemma 2. For 0 ≤ ρ ≤ 1 and any function f with Hermite expansion f = Σ_a c_a h_a, we have

    U_ρ f = Σ_a ρ^{‖a‖_1} c_a h_a

Specifically, U_ρ h_a = ρ^{‖a‖_1} h_a.

Proof. First we show that ({U_{e^{−s}} : s ≥ 0}, ∘) is a semigroup. For the composition law, it suffices to show

    U_{e^{−s}} U_{e^{−t}} = U_{e^{−(s+t)}}
This follows from a straightforward calculation:

    U_{e^{−s}}[(U_{e^{−t}} f)(x)] = U_{e^{−s}}[E_{A∼N(0,I)} f(e^{−t} x + √(1 − e^{−2t}) A)]
        = E_{A∼N(0,I), B∼N(0,I)} f(e^{−(s+t)} x + e^{−s}√(1 − e^{−2t}) A + √(1 − e^{−2s}) B)
        = E_{N∼N(0,I)} f(e^{−(s+t)} x + √(1 − e^{−2(s+t)}) N)
        = (U_{e^{−(s+t)}} f)(x)

where in the first equality we introduce a standard Gaussian A, and in the second equality we introduce a standard Gaussian B independent of A.

Now consider f = Σ_a c_a h_a. We would like to find the representation of g_t = U_{e^{−t}} f = Σ_a c_a(t) h_a. Note that U_{e^{−0}} f = f; thus, in this notation, c_a(0) = c_a. We take the derivative of g_t with respect to t. First note that

    (d/dt) g_t = (d/dt) U_{e^{−t}} f = (d/ds) U_{e^{−(s+t)}} f |_{s=0} = (d/ds) U_{e^{−s}} (U_{e^{−t}} f) |_{s=0} = (d/ds) U_{e^{−s}} g_t |_{s=0}
Then

    (U_{e^{−s}} g_t)(x) = E_Y g_t(e^{−s} x + √(1 − e^{−2s}) Y)
        = E_Y g_t((1 − s + O(s²)) x + √(2s + O(s²)) Y)
        = E_Y g_t(x − sx + √(2s) Y + O(s^{3/2}))
        = E_Y [ g_t(x) + ∇g_t(x) · (−sx + √(2s) Y) + (1/2) Σ_{i,j} (∂²g_t/∂x_i∂x_j) · 2s Y_i Y_j + O(s^{3/2}) ]
        = g_t(x) + ∇g_t(x) · (−sx) + s Δg_t(x) + O(s^{3/2})

where the last step uses E[Y] = 0 and E[Y_i Y_j] = δ_{ij}.
Therefore,

    (d/ds) U_{e^{−s}} g_t |_{s=0} = −x · ∇g_t + Δg_t = L g_t

where L is the differential operator we have seen in the alternative definition of Hermite polynomials. Recall that L h_a = −‖a‖_1 h_a. Hence

    (d/dt) g_t = L g_t = −Σ_a c_a(t) ‖a‖_1 h_a
On the other hand,

    (d/dt) g_t = Σ_a c_a'(t) h_a

By uniqueness of the Fourier expansion, c_a'(t) = −‖a‖_1 c_a(t). In conjunction with the initial condition c_a(0) = c_a, we get c_a(t) = c_a e^{−‖a‖_1 t}. Thus for all t ≥ 0,

    U_{e^{−t}} f = Σ_a c_a e^{−t‖a‖_1} h_a

That is,

    U_ρ f = Σ_a c_a ρ^{‖a‖_1} h_a
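A one-dimensional Monte Carlo check of the lemma (our sketch): for the probabilists' Hermite polynomials h_2(x) = x² − 1 and h_3(x) = x³ − 3x, the lemma predicts E[h_k(ρx + √(1 − ρ²)Z)] = ρ^k h_k(x) for a standard Gaussian Z.

    import math, random

    random.seed(0)
    hermite = {2: lambda x: x * x - 1, 3: lambda x: x ** 3 - 3 * x}
    rho, x0, N = 0.6, 1.3, 400_000
    sigma = math.sqrt(1 - rho ** 2)

    for k, h_k in hermite.items():
        # (U_rho h_k)(x0) = E[h_k(rho * x0 + sigma * Z)] with Z ~ N(0, 1)
        mc = sum(h_k(rho * x0 + sigma * random.gauss(0, 1))
                 for _ in range(N)) / N
        print(k, mc, rho ** k * h_k(x0))  # Monte Carlo vs. rho^k h_k(x0)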
Using the above lemma, we see that, exactly analogous to the Bernoulli case,

    GNS_ε(f) = (1 − E f(X) (U_{1−ε} f)(X)) / 2 = Σ_a f̂(a)^2 (1 − (1 − ε)^{‖a‖_1}) / 2
Average Sensitivity. The notion of average sensitivity measures the total influence of the coordinates. In particular,

    AS(f) = Σ_{i=1}^n Inf_i(f) = n · Pr(f(X) ≠ f(X'))

where X is a uniform Bernoulli random variable and X' differs from X in one single randomly chosen coordinate.
Gaussian Surface Area. Consider f : R^n → {±1} and let S = {x ∈ R^n : f(x) = +1}. The Gaussian surface area Γ(f) is defined as:

    Γ(f) = lim_{ε→0} Pr(X : X is within Euclidean distance ε of ∂S) / (2ε)

We expect ∂S to be a nice surface (e.g., when f is a PTF). In this case, an equivalent definition is through an integral over the surface:

    Γ(f) = ∫_{∂S} φ(x) dσ

where φ(x) is the Gaussian pdf.
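For instance (our example, not from the notes), for the halfspace f(x) = sign(x_1 − θ) the boundary ∂S is the hyperplane x_1 = θ, the distance from x to ∂S is |x_1 − θ|, and both definitions give Γ(f) = φ(θ), the one-dimensional Gaussian density at θ. The limit definition can be checked numerically:

    import math, random

    random.seed(0)
    theta, N = 0.5, 1_000_000
    exact = math.exp(-theta ** 2 / 2) / math.sqrt(2 * math.pi)  # phi(theta)

    for eps in (0.2, 0.1, 0.05):
        # Pr(|X_1 - theta| <= eps) / (2 eps) for X_1 ~ N(0, 1); the other
        # coordinates are irrelevant since dist(x, boundary) = |x_1 - theta|.
        hits = sum(abs(random.gauss(0, 1) - theta) <= eps for _ in range(N))
        print(eps, hits / N / (2 * eps), "->", exact)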