CSE 291: Analysis of Polynomial Threshold Functions                                    Spring 2015

Lecture 8: Pseudorandom Generators for PTFs; Noise Sensitivity

Instructor: Daniel Kane                                                                Date: Apr. 28

1  Recap

Last time we covered the regularity lemma.

Lemma 1 (Regularity Lemma). Any degree-d polynomial p that takes input from {−1, +1}^n can be written as a decision tree of depth O((1/τ)(ln(1/τ))^{O(d)}), with leaves labeled by polynomials p_ρ, such that the following holds. With probability 1 − ε over a random leaf, the associated polynomial p_ρ is either (1) τ-regular, or (2) satisfies var(p_ρ) ≤ ε ‖p_ρ‖_2^2.

Remark. In the latter case, p_ρ can be thought of as a large constant plus small variations, so its sign stays constant (−1 or +1) with high probability.

We also defined pseudorandom generators (PRGs) for PTFs, which will be the main focus of this lecture. Formally, our goal is to explicitly construct a simple (low entropy) random variable Y so that for any degree-d PTF f over n variables,

    |E f(Y) − E_{B ∈_U {±1}^n} f(B)| ≤ ε                                            (1)

As we have seen, there is an implicit construction with seed length as small as O(d ln n + ln(1/ε)), but it requires a computationally inefficient random sampling.

2  The Construction of the Meka-Zuckerman PRG

The main tool we will be using is k-wise independence.

Definition 1. A sequence of random variables (W_1, ..., W_n) is k-wise independent if any k of them are independent.

Note that d-wise independent random variables fool in expectation all degree-d polynomials. But fooling degree-d PTFs appears to be a harder task. We begin with a standard fact.

Fact 1. We can generate a set of k-wise independent random variables (W_1, ..., W_n), with W_i ∈_U {1, 2, ..., m}, from a seed of length O(k ln(nm)).

The construction is as follows. We first use Fact 1 to build a hash function F : [n] → [t] where (F(1), ..., F(n)) are 2-wise independent, requiring a seed length of O(2 ln(nt)). Next, we use Fact 1 repeatedly, t times, to build t independent random vectors Z_1, ..., Z_t ∈ {±1}^n, where each Z_i is a random vector whose n coordinates are k-wise independent. This step requires a seed length of t · O(k ln(2n)). We put each Z_i as a row vector and concatenate them vertically into an array:

    Z_{1,1}  Z_{1,2}  ...  Z_{1,n}
    Z_{2,1}  Z_{2,2}  ...  Z_{2,n}
      ...      ...    ...    ...
    Z_{t,1}  Z_{t,2}  ...  Z_{t,n}

Our pseudorandom output is obtained by selecting, in each column j, the entry in the row chosen by the hash function:

    (Y_1, ..., Y_n) = (Z_{F(1),1}, Z_{F(2),2}, ..., Z_{F(n),n})

In the next section, we show that with suitable choices of k and t, Equation (1) can be established.
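To make the construction concrete, here is a minimal Python sketch. It assumes the standard realization of Fact 1 via a random degree-(k−1) polynomial over a prime field; reducing field elements mod t for the hash, and taking parity for the ±1 bits, is only approximately uniform, so this illustrates the structure of the generator rather than its exact guarantees.

```python
import random

def kwise_values(n, k, p=2_147_483_647):
    """Evaluate a random degree-(k-1) polynomial over the prime field F_p
    at the points 1..n: the standard construction of k-wise independent
    uniform values (the seed is k field elements, i.e. O(k log p) bits)."""
    coeffs = [random.randrange(p) for _ in range(k)]
    def evaluate(x):
        acc = 0
        for c in coeffs:          # Horner's rule
            acc = (acc * x + c) % p
        return acc
    return [evaluate(i) for i in range(1, n + 1)]

def meka_zuckerman(n, t, k):
    """Structure of the Meka-Zuckerman generator: a 2-wise independent hash
    F : [n] -> [t] plus t independent rows of k-wise independent +-1 bits;
    the output is Y_j = Z_{F(j), j}.  NOTE: mod-t reduction and parity are
    only approximately unbiased here (bias O(1/p))."""
    F = [v % t for v in kwise_values(n, 2)]     # 2-wise independent buckets
    Z = [[1 if v % 2 == 0 else -1 for v in kwise_values(n, k)]
         for _ in range(t)]                     # t rows, each k-wise independent
    return [Z[F[j]][j] for j in range(n)]       # select row F(j) in column j

# Example: 64 output bits with k = 8 = 4d for degree d = 2 (before the
# regularity-lemma adjustment to k below) and an arbitrary choice of t.
print(meka_zuckerman(n=64, t=16, k=8))
```

The seed consists of the 2 field elements defining F plus t·k field elements defining the rows, matching the O(2 ln(nt)) + t · O(k ln(2n)) accounting above.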
3  Idea of Analysis

Step 1: Replacement Method Conditioned on F. First we fix F. For any degree-d polynomial p, assume without loss of generality that ‖p‖_2 = 1, since scaling does not change the sign. Conditioned on F, p(Y_1, ..., Y_n) can be written as a degree-d polynomial p_F of Z_1, ..., Z_t (therefore there are nt − n dummy variables), and f(·) = sign(p_F(·)).

The high-level idea is to use the replacement method, as we have seen in the invariance principle. In particular, we are going to show

    E f(Z_1, ..., Z_t) ≈ E f(G_1, ..., G_t)                                         (2)
    E f(B_1, ..., B_t) ≈ E f(G_1, ..., G_t)                                         (3)

where (G_1, ..., G_t) are independent standard Gaussians and (B_1, ..., B_t) are independent uniform Bernoullis.

We start with Equation (2). For notational simplicity, let Z = (Z_1, ..., Z_{i−1}), B = (B_1, ..., B_{i−1}) and G = (G_{i+1}, ..., G_t). With foresight, we find smooth functions ρ_+, ρ_− : R → [−1, +1] such that ρ_+(x) = ρ_−(x) = sign(x) except on a small interval of length O((ε/d)^d), ρ_−(x) ≤ sign(x) ≤ ρ_+(x), and ‖ρ^{(4)}‖_∞ ≤ O((ε/d)^{−4d}). Consider ρ ∈ {ρ_+, ρ_−}. Our goal now comes down to bounding

    E[ρ(p(Z, G, Z_i)) − ρ(p(Z, G, G_i))]

Now consider W = Z_i or W = G_i. Let p_0(Z, G, W) = E_W p(Z, G, W). Since p is multilinear, the conditional expectation is the same in either case, and W simply becomes a dummy variable in p_0. A Taylor expansion of ρ yields

    E ρ(p(Z, G, W)) = E[ polynomial of degree 3 in p(Z, G, W) ] + O(‖ρ^{(4)}‖_∞ · E(p(Z, G, W) − p_0(Z, G, W))^4)

If we set k = 4d, the variables among the set (Z, G, W) are 4d-wise independent. Consequently the first terms are the same in both cases, since a degree-3 polynomial in p has degree 3d and the variables among the set (Z, G, W) are in particular 3d-wise independent. Now consider the second term. Since (p(Z, G, W) − p_0(Z, G, W))^4 is a polynomial of degree 4d, the Z_i's can then be safely replaced with B_i's when computing the expectation. In either case, by hypercontractivity (over a hybrid of Gaussians and Bernoullis),

    E(p(B, G, W) − p_0(B, G, W))^4 ≤ 2^{O(d)} (E(p(B, G, W) − p_0(B, G, W))^2)^2

We expand p in the Fourier domain:

    p(x) = Σ_{S ⊆ [n]} p̂(S) x^S

Note that F determines a partition of [n]; if we replace Z_i with G_i, only a subset of the arguments of p_F is affected. We call this set of variables the i-th bucket with respect to F, abbreviated B(i). Using this notation, it can be seen that

    p_0(x) = Σ_{S ⊆ [n] : S ∩ B(i) = ∅} p̂(S) x^S

Therefore,

    E|p(X) − p_0(X)|^2 = Σ_{S ⊆ [n] : S ∩ B(i) ≠ ∅} p̂(S)^2
                       ≤ Σ_{S ⊆ [n]} Σ_{j ∈ S ∩ B(i)} p̂(S)^2
                       = Σ_{j ∈ B(i)} Σ_{S ⊆ [n] : j ∈ S} p̂(S)^2
                       = Σ_{j ∈ B(i)} Inf_j(p)

As a result, summing the individual replacement errors in (2) gives

    |E[f(Z_1, ..., Z_t) − f(G_1, ..., G_t)]| ≤ O(ε) + O((ε/d)^{−4d}) Σ_{i=1}^t 2^{O(d)} (Σ_{j ∈ B(i)} Inf_j(p))^2

where the first term comes from replacing ρ_+(·) (resp. ρ_−(·)) with sign(·) using Carbery-Wright, and the second term comes from the bound on E[ρ(p(Z, G, Z_i)) − ρ(p(Z, G, G_i))], as we have just shown above. Similarly,

    |E[f(B_1, ..., B_t) − f(G_1, ..., G_t)]| ≤ O(ε) + O((ε/d)^{−4d}) Σ_{i=1}^t 2^{O(d)} (Σ_{j ∈ B(i)} Inf_j(p))^2

Therefore:

    |E[f(B_1, ..., B_t) − f(Z_1, ..., Z_t)]| ≤ O(ε) + O((ε/d)^{−4d}) Σ_{i=1}^t 2^{O(d)} (Σ_{j ∈ B(i)} Inf_j(p))^2    (4)

Step 2: Averaging Over F. Now, taking the expectation over the random choice of F in Equation (4), the second term can be bounded as follows:

    O((ε/d)^{−4d}) · E_F[Σ_{i=1}^t 2^{O(d)} (Σ_{j ∈ B(i)} Inf_j(p))^2]
        ≤ O((ε/d)^{−4d}) 2^{O(d)} · E_F[Σ_{j,k ∈ [n] : F(j)=F(k)} Inf_j(p) Inf_k(p)]
        ≤ O((ε/d)^{−4d}) 2^{O(d)} · [Σ_{j=1}^n Inf_j(p)^2 + (1/t) Σ_{j,k=1}^n Inf_j(p) Inf_k(p)]

where the first inequality follows from the definition of B(·), and the second inequality is by the 2-wise independence of F's n coordinates. To summarize, the error is bounded by

    O(ε) + O((ε/d)^{−4d}) 2^{O(d)} · [Σ_{j=1}^n Inf_j(p)^2 + (1/t) Σ_{j,k=1}^n Inf_j(p) Inf_k(p)]    (5)

Let τ = (max_j Inf_j(p)) / (Σ_j Inf_j(p)) be the regularity parameter of p. Then

    Σ_{j=1}^n Inf_j(p)^2 ≤ τ (Σ_{j=1}^n Inf_j(p))^2 ≤ τ (d · var(p))^2 ≤ τ d^2

In the meantime,

    (1/t) Σ_{j,k=1}^n Inf_j(p) Inf_k(p) = (1/t) (Σ_{j=1}^n Inf_j(p))^2 ≤ (d · var(p))^2 / t ≤ d^2 / t

At this point, it may be tempting to set t = O((1/ε)(ε/d)^{−4d}) and τ = τ_0 = O((ε/d)^{4d+1}), which lets us conclude that the total expected error (5) is bounded by O(ε). But in general this bound on τ may not hold. Fortunately there is a quick fix: apply the regularity lemma to p, as done in Step 3 below.
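Before moving to Step 3, here is a small empirical check of the Step-2 computation. The sketch below uses made-up Fourier coefficients and draws F fully uniformly (which is in particular 2-wise independent); in that case the bound Σ_j Inf_j(p)^2 + (1/t) Σ_{j≠k} Inf_j(p) Inf_k(p) on the bucket-collision sum holds with equality in expectation.

```python
import random

def influences(fourier):
    """Inf_j(p) = sum of p_hat(S)^2 over sets S containing j.
    `fourier` maps frozenset(S) -> coefficient p_hat(S)."""
    inf = {}
    for S, c in fourier.items():
        for j in S:
            inf[j] = inf.get(j, 0.0) + c * c
    return inf

def bucket_sum(inf, F, t):
    """sum_i (sum_{j in B(i)} Inf_j)^2 for a fixed hash F (dict j -> bucket)."""
    buckets = [0.0] * t
    for j, b in F.items():
        buckets[b] += inf[j]
    return sum(s * s for s in buckets)

# Made-up Fourier coefficients of a degree-2 polynomial on 3 variables.
fourier = {frozenset({0}): 0.5, frozenset({1, 2}): 0.5, frozenset({0, 2}): 0.7}
inf = influences(fourier)
t, trials = 4, 20_000

# Empirical E_F[bucket-collision sum] with a uniform F, vs. the bound
# sum_j Inf_j^2 + (1/t) sum_{j != k} Inf_j Inf_k from 2-wise independence.
avg = sum(bucket_sum(inf, {j: random.randrange(t) for j in inf}, t)
          for _ in range(trials)) / trials
total = sum(inf.values())
diag = sum(v * v for v in inf.values())
print(avg, "vs", diag + (total * total - diag) / t)
```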
Step 3: Applying the Regularity Lemma. Essentially, p can be written as a decision tree of depth D = Õ(τ_0^{−1}) such that with probability 1 − ε, a random leaf is either (1) τ_0-regular, or (2) a constant plus a small variation term.

Now we modify our setting of k from 4d to D + 4d, ensuring that even after D levels of variable conditioning along the path of the tree, the remaining variables are still 4d-wise independent. In case (1), the previous result can now be applied to ensure that the expected error on each leaf is at most ε. In case (2), p over the leaf is a large constant plus small variations, and thus has constant sign (−1 or +1). Applying the previous result to every leaf and averaging lets us conclude the result.

Seedlength. In summary, t = O((1/ε)(ε/d)^{−4d}) and k = Õ((ε/d)^{−(4d+1)}). Therefore the total seed length is

    O(t k ln(2n)) + O(2 ln(nt)) = O((1/ε)(d/ε)^{8d} (ln(d/ε))^{O(d)} ln n) = Õ((d/ε)^{O(d)} ln n)

We emphasize that this is still a nontrivial PRG, since its seed length is O_{d,ε}(ln n).

Remark. The state of the art right now is O_d(ε^{−12} ln n) for PTFs, although for LTFs a construction with seed length O(ln n ln ln n) has been shown. As for PRGs in the Gaussian setting, we can do a lot better: the best results so far are O_{d,c}(ε^{−c} ln n) for arbitrary c > 0 and O(2^{O(d)} ε^{−5} ln n). A recent result achieves seed length polylog(n/ε) for d = 2.

4  Noise Sensitivity

Bernoulli and Gaussian Noise Sensitivity. Consider a boolean function f : R^n → {±1}. The noise sensitivity of f measures how likely small changes to the input of f are to change the output. This is opposite to the notion of stability we have seen.

Definition 2. NS_ε(f), the noise sensitivity of f, is defined as

    NS_ε(f) = Pr_{(X,Y)}(f(X) ≠ f(Y))

where (X, Y) are (1 − ε)-correlated Bernoullis. GNS_ε(f), the Gaussian noise sensitivity of f, is defined as

    GNS_ε(f) = Pr_{(X,Y)}(f(X) ≠ f(Y))

where (X, Y) are (1 − ε)-correlated Gaussians.

It is instructive to look at NS_ε(f) in the Fourier domain. In particular,

    NS_ε(f) = (1 − E f(X) f(Y)) / 2
            = (1 − Stab_{1−ε}(f)) / 2
            = (1 − E f(X)(T_{1−ε} f)(X)) / 2
            = (1 − Σ_{S ⊆ [n]} f̂(S)^2 (1 − ε)^{|S|}) / 2
            = Σ_{S ⊆ [n]} [(1 − (1 − ε)^{|S|}) / 2] f̂(S)^2

Roughly, if f has large high-degree Fourier coefficients, NS_ε(f) is likely to be high.

Similar to the operator T_ρ we have seen in the Bernoulli case, we can define an operator U_ρ in the Gaussian case. Formally, for f from R^n to R,

    (U_ρ f)(x) = E[f(Y) | X = x]

where Y is ρ-correlated with X, that is, Y = ρX + √(1 − ρ^2) Z where Z is a standard Gaussian independent of X. What does this operator do in the Fourier domain?

Lemma 2. For 0 < ρ ≤ 1 and a function f that has Hermite expansion f = Σ_a c_a h_a, we have

    U_ρ f = Σ_a ρ^{‖a‖_1} c_a h_a

Specifically, U_ρ h_a = ρ^{‖a‖_1} h_a.

Proof. First we show that ({U_{e^{−s}} : s ≥ 0}, ∘) is a semigroup. It suffices to show closure under composition:

    U_{e^{−s}} U_{e^{−t}} = U_{e^{−(s+t)}}

This follows from a straightforward calculation:

    U_{e^{−s}}[(U_{e^{−t}} f)(x)] = U_{e^{−s}}[E_{A∼N(0,I)} f(e^{−t} x + √(1 − e^{−2t}) A)]
        = E_{A∼N(0,I), B∼N(0,I)} f(e^{−s−t} x + e^{−s}√(1 − e^{−2t}) A + √(1 − e^{−2s}) B)
        = E_{N∼N(0,I)} f(e^{−(s+t)} x + √(1 − e^{−2(s+t)}) N)
        = (U_{e^{−(s+t)}} f)(x)

where in the first equality we introduce a standard Gaussian A, and in the second equality we introduce a standard Gaussian B independent of A. The third equality holds because a sum of independent Gaussians is Gaussian, with variance e^{−2s}(1 − e^{−2t}) + (1 − e^{−2s}) = 1 − e^{−2(s+t)}.

Now consider f = Σ_a c_a h_a. We would like to find the representation of g_t = U_{e^{−t}} f = Σ_a c_a(t) h_a. Note that U_{e^{−0}} f = f; thus, in this notation, c_a(0) = c_a. We take the derivative of g_t with respect to t.
First note that

    (d/dt) g_t = (d/dt) U_{e^{−t}} f = (d/ds)|_{s=0} U_{e^{−(s+t)}} f = (d/ds)|_{s=0} U_{e^{−s}}(U_{e^{−t}} f) = (d/ds)|_{s=0} U_{e^{−s}} g_t

Then

    (U_{e^{−s}} g_t)(x) = E_Y g_t(e^{−s} x + √(1 − e^{−2s}) Y)
        = E_Y g_t((1 − s + O(s^2)) x + √(2s + O(s^2)) Y)
        = E_Y g_t(x − sx + √(2s) Y + O(s^{3/2}))
        = E_Y [g_t(x) + ∇g_t(x) · (−sx + √(2s) Y) + (1/2) Σ_{i,j} (∂^2 g_t / ∂x_i ∂x_j) 2s Y_i Y_j] + O(s^{3/2})
        = g_t(x) + ∇g_t(x) · (−sx) + s ∇^2 g_t(x) + O(s^{3/2})

where in the last step we used E[Y] = 0 and E[Y_i Y_j] = δ_{ij}, and ∇^2 denotes the Laplacian. Therefore,

    (d/ds)|_{s=0} U_{e^{−s}} g_t = −x · ∇g_t + ∇^2 g_t = L g_t

where L is the differential operator we have seen in the alternative definition of Hermite polynomials. Recall that L h_a = −‖a‖_1 h_a. Hence

    (d/dt) g_t = L g_t = −Σ_a c_a(t) ‖a‖_1 h_a

On the other hand,

    (d/dt) g_t = Σ_a c_a′(t) h_a

By uniqueness of the Hermite expansion, c_a′(t) = −‖a‖_1 c_a(t). In conjunction with the initial condition c_a(0) = c_a, we get c_a(t) = c_a e^{−‖a‖_1 t}. Thus for all t ≥ 0,

    U_{e^{−t}} f = Σ_a c_a e^{−t‖a‖_1} h_a

That is,

    U_ρ f = Σ_a c_a ρ^{‖a‖_1} h_a

Using the above lemma, we see that, exactly analogously to the Bernoulli case,

    GNS_ε(f) = (1 − E f(X) (U_{1−ε} f)(X)) / 2 = Σ_a [(1 − (1 − ε)^{‖a‖_1}) / 2] f̂(a)^2

Average Sensitivity. The notion of average sensitivity measures the total influence of the coordinates. In particular,

    AS(f) = Σ_{i=1}^n Inf_i(f) = n Pr(f(X) ≠ f(X′))

where X is a uniform Bernoulli random variable, and X′ differs from X on one single randomly chosen coordinate.

Gaussian Surface Area. Consider f : R^n → {±1} and let S = {x ∈ R^n : f(x) = +1}. The Gaussian surface area of f, Γ(f), is defined as:

    Γ(f) = lim_{ε→0} Pr(X : X is within Euclidean distance ε of ∂S) / (2ε)

We expect ∂S to be a nice surface (e.g. when f is a PTF). In this case, an equivalent definition is through an integral over the surface:

    Γ(f) = ∫_{∂S} φ(x) dσ(x)

where φ(x) is the Gaussian pdf.
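To make these definitions concrete, here is a small Monte Carlo sketch with arbitrarily chosen test functions and parameters. It estimates NS_ε for the n-bit parity, where the Fourier formula above gives exactly (1 − (1 − ε)^n)/2, and estimates Γ for the halfspace sign(x_1 − θ), whose boundary is the hyperplane x_1 = θ, so the surface integral evaluates to the one-dimensional Gaussian density φ(θ).

```python
import math
import random

def noise_sensitivity(f, n, eps, trials=200_000):
    """Monte Carlo estimate of NS_eps(f) = Pr[f(X) != f(Y)], where (X, Y)
    are (1 - eps)-correlated Bernoullis: each Y_i equals X_i, flipped
    independently with probability eps/2 (so E[X_i Y_i] = 1 - eps)."""
    mismatches = 0
    for _ in range(trials):
        X = [random.choice((-1, 1)) for _ in range(n)]
        Y = [-x if random.random() < eps / 2 else x for x in X]
        mismatches += f(X) != f(Y)
    return mismatches / trials

def surface_area_halfspace(theta, eps=1e-3, trials=400_000):
    """Monte Carlo estimate of Gamma(f) for f(x) = sign(x_1 - theta):
    Pr[X within eps of the boundary] / (2 eps).  The distance from x to
    the hyperplane x_1 = theta is |x_1 - theta|, so only X_1 matters."""
    hits = sum(abs(random.gauss(0, 1) - theta) < eps for _ in range(trials))
    return hits / (trials * 2 * eps)

# Parity has a single nonzero Fourier coefficient, at S = [n], so the
# formula gives NS_eps(parity) = (1 - (1 - eps)^n) / 2.
n, eps = 5, 0.2
parity = lambda x: 1 if x.count(-1) % 2 == 0 else -1
print(noise_sensitivity(parity, n, eps), (1 - (1 - eps) ** n) / 2)

# For the halfspace, Gamma(f) = phi(theta), the standard Gaussian density.
theta = 0.5
print(surface_area_halfspace(theta), math.exp(-theta**2 / 2) / math.sqrt(2 * math.pi))
```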