Smooth Boolean Functions are Easy: Efficient Algorithms for Low-Sensitivity Functions

Rocco Servedio
Joint work with Parikshit Gopalan (MSR), Noam Nisan (MSR / Hebrew University), Kunal Talwar (Google), and Avi Wigderson (IAS)

ITCS 2016

The star of our show: f : {0,1}^n → {0,1}

"Complexity" and "Boolean functions"

Complexity measures: combinatorial/analytic ways to "get a handle on how complicated f is":
• Certificate complexity
• Decision tree depth (deterministic, randomized, quantum)
• Sensitivity
• Block sensitivity
• PRAM complexity
• Real polynomial degree (exact, approximate)
• …
All lie in {0,1,…,n} for n-variable Boolean functions.

"Complexity" and "Boolean functions" revisited

Complexity classes: computational ways to "get a handle on how complicated f is":
• Unrestricted circuit size
• Unrestricted formula size
• AC0 circuit size
• DNF size
• …
All lie in {0,1,…,2^n} for n-variable Boolean functions.

High-level summary of this work: a computational perspective on a classic open question about complexity measures of Boolean functions… namely, the sensitivity conjecture of [NisanSzegedy92].

Background: Complexity measures

Certificate complexity; decision tree depth (deterministic, randomized, quantum); block sensitivity; PRAM complexity; real polynomial degree (exact, approximate).

Fundamental result(s) in Boolean function complexity: For any Boolean function f, the above complexity measures are all polynomially related to each other.

Examples: DT-depth and real degree

• DT-depth(f) = minimum depth of any decision tree computing f.
[figure: a decision tree of depth 4, with 0/1 labels at the leaves; its DT-depth is 4]
• deg_R(f) = degree of the unique real multilinear polynomial computing f : {0,1}^n → {0,1}.

DT-depth and real degree are polynomially related

Theorem [NisanSzegedy92, NisanSmolensky, Midrijanis04]: For any Boolean function f,
  deg_R(f) ≤ DT-depth(f) ≤ 2·deg_R(f)^3.
(The lower bound is trivial: for each 1-leaf at depth d, there is a degree-d polynomial outputting 1 iff its input reaches that leaf, and 0 otherwise; sum these over all 1-leaves.)
[figure: the same depth-4 tree, with the polynomial x1·x2·(1-x4)·x3 for one highlighted leaf]

An outlier among complexity measures: Sensitivity

• s(f,x) = sensitivity of f at x = number of neighbors y of x (inputs at Hamming distance 1 from x) such that f(x) ≠ f(y).
• s(f) = (max) sensitivity of f = max of s(f,x) over all x in {0,1}^n.

Folklore: s(f) ≤ DT-depth(f).

Question [Nisan91, NisanSzegedy92]: Is DT-depth(f) ≤ poly(s(f))?

The sensitivity conjecture

Conjecture: DT-depth(f) ≤ poly(s(f)).

Equivalently, the same statement with block sensitivity, certificate complexity, real degree, approximate degree, or randomized/quantum DT-depth in place of DT-depth.

Despite much effort, the best known upper bounds are exponential in sensitivity:
• [Simon82]: # relevant variables ≤ s(f)·4^{s(f)}
• [KenyonKutin04]: bs(f) ≤ e^{s(f)}
• [AmbainisBavarianGaoMaoSunZuo14]: bs(f), C(f) ≤ s(f)·2^{s(f)-1} and deg(f) ≤ 2^{s(f)(1+o(1))}
• [AmbainisPrusisVihrovs15]: bs(f) ≤ (s(f) - 1/3)·2^{s(f)-1}

This work: a computational view on the sensitivity conjecture

Conjecture: DT-depth(f) ≤ poly(s(f)).

Previous approaches were combinatorial/analytic. But the conjecture is also a strong computational statement: low-sensitivity functions are very easy to compute! In particular, it implies that every sensitivity-s function has a small (size n^{poly(s)}) circuit. Prior to this work, even this weaker consequence seems not to have been known. In fact, even an upper bound on the # of sensitivity-s functions seems not to have been known.

Results

Theorem: Every n-variable sensitivity-s function is computed by a Boolean circuit of size n^{O(s)}. In fact, every such function is computed by a Boolean formula of depth O(s·log n).

So now the picture is: [figure: the known relations among complexity measures, updated with these bounds]
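Before continuing with the results, it may help to make the central definition concrete. Here is a minimal brute-force sketch in Python (the function name is mine, for illustration only) that computes s(f) for small n by checking every input and each of its n neighbors:

```python
from itertools import product

def max_sensitivity(f, n):
    """Brute-force s(f) for f: {0,1}^n -> {0,1}, given as a Python
    function on bit-tuples.  Checks all 2^n inputs and all n neighbors."""
    best = 0
    for x in product((0, 1), repeat=n):
        fx = f(x)
        # s(f,x): number of Hamming neighbors y of x with f(y) != f(x)
        sfx = sum(f(x[:i] + (1 - x[i],) + x[i + 1:]) != fx for i in range(n))
        best = max(best, sfx)
    return best

# AND on 3 bits: only the all-ones input has all 3 neighbors sensitive
print(max_sensitivity(lambda x: int(all(x)), 3))  # prints 3
```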
Results (continued)

The circuit/formula size bounds above are consequences of the conjecture. Another consequence of the conjecture:

Theorem: Any n-variable sensitivity-s function can be self-corrected from a 2^{-cs} fraction of worst-case errors using n^{O(s)} queries and runtime.
(Conjecture ⇒ low-sensitivity f has low deg_R ⇒ f has low deg_2 ⇒ f has a self-corrector.)

All results are fairly easy. (Lots of directions for future work!)

Simple but crucial insight

Fact: If f has sensitivity s, then f(x) is completely determined once you know f's value on any 2s+1 neighbors of x.

Among any 2s+1 neighbors of x, either at least s+1 are neighbors where f = 0 or at least s+1 are neighbors where f = 1. The value of f(x) must equal this majority value! (If it disagreed, we would have s(f) ≥ s(f,x) ≥ s+1.)
[figure: x and 2s+1 of its neighbors, split into neighbors where f = 0 and neighbors where f = 1]

Theorem: Every sensitivity-s function on n variables is uniquely specified by its values on any Hamming ball of radius 2s.

[figure, animated: the values on weight levels 0,…,2s of the cube determine the values on weight level 2s+1, since each point there has 2s+1 down-neighbors; the known region grows to weight levels 0,…,2s+1, and so on up to 1^n.] Fill in all of {0,1}^n this way, level by level.

Corollary: There are at most 2^{(n choose ≤2s)} sensitivity-s functions over {0,1}^n.

Can we use this insight to compute sensitivity-s functions efficiently?

Small circuits for sensitivity-s functions

Theorem: Every n-variable sensitivity-s function has a circuit of size O(s·n^{2s+1}).

The algorithm has the values of f on the bottom 2s+1 layers, i.e. the Hamming ball of radius 2s centered at 0^n, "hard-coded" in.

Algorithm: For |x| stages,
• Shift the center of the Hamming ball one step along a shortest path from 0^n to x.
• Use the values of f on the previous Hamming ball to compute the values on the new ball (at most n^{2s} new values to compute; each one is easy using a majority vote over 2s+1 already-known neighbors).
[figure, animated: the ball's center walks from 0^n to x, computing at most n^{2s} new values at each step]
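The level-by-level argument translates directly into (exponential-time) code. Below is a sketch, assuming f's restriction to the radius-2s ball around 0^n is given as a dict from bit-tuples to values; all names are mine:

```python
from itertools import combinations

def fill_from_ball(ball, n, s):
    """Extend f from the Hamming ball of radius 2s around 0^n to all of
    {0,1}^n, level by level.  `ball` maps each bit-tuple of weight <= 2s
    to f's value there.  Each point x of weight w >= 2s+1 has w down-
    neighbors (flip a 1 to 0) whose values are already known; since f has
    sensitivity <= s, at most s of them disagree with f(x), so the
    majority of these w >= 2s+1 known values pins down f(x)."""
    f = dict(ball)
    for w in range(2 * s + 1, n + 1):
        for ones in combinations(range(n), w):
            x = tuple(int(i in ones) for i in range(n))
            down = [f[x[:i] + (0,) + x[i + 1:]] for i in ones]
            f[x] = int(2 * sum(down) > len(down))  # strict majority vote
    return f
```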
Shallow circuits for sensitivity-s functions?

The algorithm we just saw seems inherently sequential: it takes n stages. Can we parallelize? Yes, by being bolder: go roughly n/s levels at each stage rather than one.

Extension of the earlier key insight

Sensitivity-s functions are noise-stable at every input x:
• Pick any vertex x.
• Flip n/(11s) random coordinates to get y.
• View the t-th coordinate flipped as chosen from the 'untouched' n-t+1 coordinates; at each stage, at most s coordinates are sensitive.
We get
  Pr[f(x) ≠ f(y)] ≤ Pr[stage 1 flips f] + Pr[stage 2 flips f] + …
          ≤ s/n + s/(n-1) + … + s/(n - n/(11s) + 1) ≤ 1/10.

Downward walks

A similar statement holds for "random downward walks":
• Pick any vertex x with |x| many ones.
• Flip |x|/(11s) randomly chosen 1's to 0's to get y.
• View the t-th coordinate flipped as chosen from the 'untouched' |x|-t+1 coordinates.
We get Pr[f(x) ≠ f(y)] ≤ s/|x| + s/(|x|-1) + … + s/(|x| - |x|/(11s) + 1) ≤ 1/10.

Shallow circuits for sensitivity-s functions

Theorem: Every n-variable sensitivity-s function has a formula of depth O(s·log n).

The algorithm has the values of f on the bottom 11s layers (weight levels 0,…,11s) "hard-coded" in.

Parallel-Alg: Given x,
• If |x| ≤ 11s, return the hard-coded value.
• Sample C = O(1) points x^1, x^2, …, x^C from a "downward random walk" of length |x|/(11s). Call Parallel-Alg on each one (see the sketch after the analysis below).
• Return the majority vote of the C results.

Analysis:
• Parallel-Alg(x) = f(x) with probability 19/20, for all x (proof: induction on |x|).
• After O(s·log n) stages the recursion bottoms out in the hard-coded "red zone", so the parallel runtime is O(s·log n).
• C = O(1), so the total work is C^{O(s·log n)} = n^{O(s)}.
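A direct randomized rendering of Parallel-Alg, as a sketch only: the concrete C = 9 and the dict f_low holding the hard-coded bottom layers are my choices for illustration, not fixed by the slides.

```python
import random

def parallel_alg(x, f_low, s, C=9):
    """Sketch of Parallel-Alg.  f_low maps each bit-tuple of Hamming
    weight <= 11s to f's hard-coded value; C independent recursive
    samples play the role of the C = O(1) in the analysis."""
    w = sum(x)
    if w <= 11 * s:
        return f_low[x]
    step = max(1, w // (11 * s))            # length of the downward walk
    ones = [i for i, b in enumerate(x) if b]
    votes = 0
    for _ in range(C):
        y = list(x)
        for i in random.sample(ones, step):  # flip `step` random 1's to 0
            y[i] = 0
        votes += parallel_alg(tuple(y), f_low, s, C)
    return int(2 * votes > C)               # majority of the C recursive calls
```

As in the analysis, each call shrinks |x| by a (1 - 1/(11s)) factor, so the recursion depth is O(s·log n) and the total work is C^{O(s·log n)} = n^{O(s)}.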
Conclusion / Questions

Many questions remain about the computational properties of low-sensitivity functions.
• We saw there are at most 2^{(n choose ≤2s)} sensitivity-s functions. Can this bound be sharpened?
• We saw every sensitivity-s function has a formula of depth O(s·log n). Does every such function have a TC0 circuit / AC0 circuit / DNF / decision tree of size n^{poly(s)}? A PTF of degree poly(s)? A DNF of width poly(s)? A GF(2) polynomial of degree poly(s)?

A closing puzzle/request

We saw that sensitivity-s functions obey a "majority rule": f(x) equals the MAJ of f over any 2s+1 neighbors of x. It is well known that degree-d functions obey a "parity rule": the parity (XOR) of f over any (d+1)-dimensional subcube must equal 0 (see the brute-force check below). If the conjecture is true, then low-sensitivity functions have low degree… and these two very different-looking rules must coincide! Explain this!

Thank you for your attention
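As a concrete rendering of the parity rule, here is a small brute-force check, a sketch assuming the GF(2)-degree reading of the rule (names are mine): it verifies that the XOR of f over every (d+1)-dimensional subcube of {0,1}^n is 0.

```python
from itertools import combinations, product

def parity_rule_holds(f, n, d):
    """Check that the XOR of f over every (d+1)-dimensional subcube of
    {0,1}^n is 0: the 'parity rule' characterizing GF(2) degree <= d.
    Brute force: choose d+1 free coordinates, fix the rest every way."""
    for free in combinations(range(n), d + 1):
        rest = [i for i in range(n) if i not in free]
        for fixing in product((0, 1), repeat=len(rest)):
            xor = 0
            for corner in product((0, 1), repeat=d + 1):
                x = [0] * n
                for i, b in zip(rest, fixing):
                    x[i] = b
                for i, b in zip(free, corner):
                    x[i] = b
                xor ^= f(tuple(x))
            if xor:
                return False
    return True

# x1 XOR x2 XOR x3 has GF(2) degree 1, so the rule holds with d = 1:
print(parity_rule_holds(lambda x: x[0] ^ x[1] ^ x[2], 3, 1))  # True
```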