Smooth Boolean Functions are Easy:
Efficient Algorithms for Low-Sensitivity Functions
Rocco Servedio
Joint work with
Parikshit Gopalan (MSR)
Noam Nisan (MSR / Hebrew University)
Kunal Talwar (Google)
Avi Wigderson (IAS)
ITCS 2016
The star of our show:
f: {0,1}^n → {0,1}
“Complexity” and “Boolean functions”
Complexity measures: combinatorial/analytic ways to
“get a handle on how complicated f is”
 Certificate complexity
 Decision tree depth (deterministic, randomized, quantum)
 Sensitivity
 Block sensitivity
 PRAM complexity
 Real Polynomial degree (exact, approximate)
 …
All lie in {0,1,…,n} for n-variable Boolean functions.
“Complexity” and “Boolean functions” revisited
Complexity classes: computational ways to “get a handle
on how complicated f is”
 Unrestricted circuit size
 Unrestricted formula size
 AC0 circuit size
 DNF size
 …
All lie in {0,1,…,2^n} for n-variable Boolean functions.
High-level summary of this work:
A computational perspective on a classic open question
about complexity measures of Boolean functions….
...namely, the sensitivity conjecture of [NisanSzegedy92].
Background: Complexity measures
 Certificate complexity
 Decision tree depth (deterministic, randomized, quantum)
 Block sensitivity
 PRAM complexity
 Real Polynomial degree (exact, approximate)
Fundamental result(s) in Boolean function complexity:
For any Boolean function f, the above complexity measures are
all polynomially related to each other.
Examples: DT-depth and real degree
• DT-depth(f) = minimum depth of any decision tree
computing f
[Figure: example decision tree; its DT-depth is 4.]
• deg_R(f) = degree of the unique real multilinear polynomial
computing f: {0,1}^n → {0,1}
DT-depth and real degree are polynomially related
Theorem: [NisanSzegedy92,NisanSmolensky,Midrijanis04]
For any Boolean function f,
deg_R(f) ≤ DT-depth(f) ≤ 2·deg_R(f)^3.
(Lower bound is trivial: for each 1-leaf at depth d, have degree-d
polynomial outputting 1 iff its input reaches that leaf, else 0. Sum these.)
Polynomial for this leaf: x_1 x_2 (1 - x_4) x_3
[Figure: decision tree with one 1-leaf highlighted.]
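The "sum over 1-leaves" argument can be sketched in Python. This is only an illustration under assumptions: the tree encoding, the helper names, and `EXAMPLE_TREE` are hypothetical (it is not the tree from the slide), and variables are 0-indexed, so the slide's leaf x_1 x_2 (1 - x_4) x_3 corresponds to the path x[0]=1, x[1]=1, x[3]=0, x[2]=1.

```python
from itertools import product

# A tree is a leaf (the int 0 or 1) or a tuple (i, low, high):
# query x[i], then continue in `low` if x[i] == 0, else in `high`.
EXAMPLE_TREE = (0, 0, (1, 0, (3, (2, 0, 1), 1)))  # hypothetical example


def evaluate(tree, x):
    """Follow the queries of the tree on input x and return the leaf value."""
    while not isinstance(tree, int):
        i, low, high = tree
        tree = high if x[i] else low
    return tree


def depth(tree):
    if isinstance(tree, int):
        return 0
    _, low, high = tree
    return 1 + max(depth(low), depth(high))


def one_leaf_paths(tree, path=()):
    """Yield the (variable, required bit) pairs on the path to each 1-leaf."""
    if isinstance(tree, int):
        if tree == 1:
            yield path
        return
    i, low, high = tree
    yield from one_leaf_paths(low, path + ((i, 0),))
    yield from one_leaf_paths(high, path + ((i, 1),))


def path_polynomial(path):
    """Expand prod x_i (bit 1) * prod (1 - x_i) (bit 0) as {monomial: coeff}."""
    poly = {frozenset(): 1}
    for i, bit in path:
        nxt = {}
        for mono, c in poly.items():
            with_i = mono | {i}
            if bit == 1:
                nxt[with_i] = nxt.get(with_i, 0) + c
            else:  # multiply by (1 - x_i)
                nxt[mono] = nxt.get(mono, 0) + c
                nxt[with_i] = nxt.get(with_i, 0) - c
        poly = nxt
    return poly


def tree_polynomial(tree):
    """Sum the indicator polynomials of all 1-leaves."""
    total = {}
    for path in one_leaf_paths(tree):
        for mono, c in path_polynomial(path).items():
            total[mono] = total.get(mono, 0) + c
    return {m: c for m, c in total.items() if c != 0}


def poly_eval(poly, x):
    return sum(c for mono, c in poly.items() if all(x[i] for i in mono))


def degree(poly):
    return max((len(m) for m in poly), default=0)
```

Each path polynomial has degree at most the depth of its leaf, so the sum has degree at most the depth of the tree.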
An outlier among complexity measures: Sensitivity
• s(f,x) = sensitivity of f at x
= number of neighbors y of x such that f(x) ≠ f(y)
• s(f) = (max) sensitivity of f
= max of s(f,x) over all x in {0,1}^n
Folklore: s(f) ≤ DT-depth(f).
Question: [Nisan91,NisanSzegedy92]
Is DT-depth(f) ≤ poly(s(f))?
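The definitions of s(f,x) and s(f) can be checked by brute force on small examples. A minimal sketch, assuming inputs are tuples of 0/1 bits; the helper names and the four example functions are illustrative choices, not taken from the talk.

```python
from itertools import product


def sensitivity_at(f, x):
    """s(f, x): number of neighbors y of x with f(y) != f(x)."""
    count = 0
    for i in range(len(x)):
        y = x[:i] + (1 - x[i],) + x[i + 1:]  # flip coordinate i
        if f(y) != f(x):
            count += 1
    return count


def max_sensitivity(f, n):
    """s(f): maximum of s(f, x) over all x in {0,1}^n (takes 2^n * n evaluations)."""
    return max(sensitivity_at(f, x) for x in product((0, 1), repeat=n))


def or_n(x):
    return int(any(x))


def parity(x):
    return sum(x) % 2


def dictator(x):
    return x[0]


def and2(x):
    return x[0] & x[1]
```

For instance, OR and parity on n variables both have sensitivity n, while a dictator has sensitivity 1 and the AND of two variables has sensitivity 2, regardless of n.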
The sensitivity conjecture
Conjecture:
DT-depth(f) ≤ poly(s(f)).
Equivalently, block sensitivity, certificate complexity, real degree,
approximate degree, randomized/quantum DT-depth…
Despite much effort, best known upper bounds are exponential
in sensitivity:
• [Simon82]: # relevant variables ≤ s(f)·4^{s(f)}
• [KenyonKutin04]: bs(f) ≤ e^{s(f)}
• [AmbainisBavarianGaoMaoSunZuo14]: bs(f), C(f) ≤ s(f)·2^{s(f)-1}, deg(f) ≤ 2^{s(f)(1+o(1))}
• [AmbainisPrusisVihrovs15]: bs(f) ≤ (s(f) - 1/3)·2^{s(f)-1}
This work:
Computational view on the sensitivity conjecture
Conjecture:
DT-depth(f) ≤ poly(s(f)).
Previous approaches: combinatorial/analytic
But conjecture also is a strong computational statement:
low-sensitivity functions are very easy to compute!
Conjecture:
DT-depth(f) ≤ poly(s(f)).
Conjecture implies [figure not recovered]
Prior to this work, even [figure not recovered]
seems not to have been known.
In fact, even an upper bound on the # of sensitivity-s functions
seems not to have been known.
Results
Theorem: Every n-variable sensitivity-s function is computed
by a Boolean circuit of size n^{O(s)}.
In fact, every such function is computed by a Boolean formula
of depth O(s log(n)).
So now the picture is
[Figure: diagram with four entries marked “?”.]
Results (continued)
Circuit/formula size bounds are consequences of the conjecture.
Another consequence of the conjecture:
Theorem: Any n-variable sensitivity-s function can be self-corrected from a 2^{-cs}-fraction of worst-case errors using n^{O(s)}
queries and runtime.
(Conjecture ⇒ low-sensitivity f has low deg_R
⇒ has low deg_2
⇒ has self-corrector)
All results are fairly easy.
(Lots of directions for future work!)
Simple but crucial insight
Fact: If f has sensitivity s, then f(x) is completely determined
once you know f’s value on 2s+1 neighbors of x.
[Figure: x with f(x)=? and 2s+1 of its neighbors, split into neighbors where f=0 and neighbors where f=1; here the 1-neighbors are the majority, so f(x)=1.]
Either have at least s+1 many 0-neighbors or at least s+1 many 1-neighbors.
The value of f(x) must equal this majority value!
(If it disagreed, would have s(f) ≥ s(f,x) ≥ s+1.)
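The majority rule can be verified exhaustively on a small example. A minimal sketch, assuming inputs are tuples of bits; `and2` (the AND of the first two variables, which has sensitivity 2) and the helper names are illustrative choices, not taken from the talk.

```python
from itertools import combinations, product


def flip(x, i):
    """Return x with coordinate i flipped."""
    return x[:i] + (1 - x[i],) + x[i + 1:]


def majority_rule(f, x, coords):
    """Predict f(x) as the majority of f over the neighbors of x along `coords`."""
    votes = [f(flip(x, i)) for i in coords]
    return int(2 * sum(votes) > len(votes))


def check_majority_rule(f, n, s):
    """True iff every choice of 2s+1 neighbors predicts f(x) at every x."""
    for x in product((0, 1), repeat=n):
        for coords in combinations(range(n), 2 * s + 1):
            if majority_rule(f, x, coords) != f(x):
                return False
    return True


def and2(x):
    return x[0] & x[1]  # sensitivity 2
```

With s=2 the check passes for `and2` on 6 variables. With s=1 it fails, as it should: at x=110000 the three neighbors along coordinates 0, 1, 2 vote 0, 0, 1, which disagrees with f(x)=1.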
Theorem: Every sensitivity-s function on n variables is
uniquely specified by its values on any Hamming ball of
radius 2s.
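[Figure: the Boolean cube between 0^n and 1^n. Weight levels 0,…,2s form the known Hamming ball; each point at weight level 2s+1 has 2s+1 down-neighbors, so its value (“?”) is fixed by their majority. Weight levels 0,…,2s then become weight levels 0,…,2s+1, etc.]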
Fill in all of {0,1}^n this way, level by level.
Corollary: There are at most 2^{(n choose ≤2s)} sensitivity-s functions over {0,1}^n, where (n choose ≤2s) is the number of points in a Hamming ball of radius 2s.
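The level-by-level fill can be sketched as follows, for the Hamming ball centered at 0^n. This is an illustration under assumptions: points are represented as frozensets of their 1-coordinates, and `reconstruct`, `points_of_weight`, and `ball_size` are hypothetical helper names.

```python
from itertools import combinations
from math import comb


def points_of_weight(n, w):
    """All points of {0,1}^n with exactly w ones, as frozensets of 1-coordinates."""
    for ones in combinations(range(n), w):
        yield frozenset(ones)


def ball_size(n, s):
    """Number of points in a Hamming ball of radius 2s: (n choose <= 2s)."""
    return sum(comb(n, k) for k in range(2 * s + 1))


def reconstruct(ball_values, n, s):
    """Extend a sensitivity-s function from weight levels 0..2s to all of {0,1}^n."""
    values = dict(ball_values)
    for w in range(2 * s + 1, n + 1):
        for x in points_of_weight(n, w):
            down = sorted(x)[: 2 * s + 1]  # any 2s+1 down-neighbors would do
            votes = sum(values[x - {i}] for i in down)
            values[x] = int(votes >= s + 1)  # majority of 2s+1 votes
    return values
```

Since the whole function is determined by the `ball_size(n, s)` bits on the ball, the count in the corollary follows: for n=7 and s=2 the ball has 99 points, against 128 points in the full cube.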
Can we use this insight to compute sensitivity-s functions efficiently?
Small circuits for sensitivity-s functions
Theorem: Every n-variable sensitivity-s function has a
circuit of size O(s·n^{2s+1}).
Algorithm has value of f on bottom 2s+1 layers “hard-coded” in.
Bottom 2s+1 layers: Hamming ball centered at 0^n.
Algorithm: For |x| stages,
• Shift center of Hamming ball along shortest path to x
• Use values of f on previous Hamming ball to compute values on new ball
(at most n^{2s} new values to compute; each one easy using majority vote)
[Figure: Hamming ball moving from the first center 0^n to the next center on the way to x; at most n^{2s} new values are computed per shift.]
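The shifting-ball algorithm can be sketched in Python as a sequential procedure rather than as a circuit. This is a minimal sketch under assumptions: points are n-bit integers, the path to x turns on the 1-coordinates of x in increasing order, and all function names are hypothetical.

```python
from itertools import combinations


def ball(center, radius, n):
    """All n-bit points within Hamming distance `radius` of `center`."""
    pts = []
    for r in range(radius + 1):
        for idxs in combinations(range(n), r):
            mask = 0
            for i in idxs:
                mask |= 1 << i
            pts.append(center ^ mask)
    return pts


def shift_ball(values, center, new_center, s, n):
    """Given f on B(center, 2s), compute f on B(new_center, 2s).

    Assumes new_center is a neighbor of center and f has sensitivity <= s.
    """
    new_values = {}
    for y in ball(new_center, 2 * s, n):
        if y in values:
            new_values[y] = values[y]
            continue
        # y is at distance exactly 2s+1 from the old center, so flipping any
        # one of the 2s+1 differing coordinates lands inside the old ball.
        diff = [i for i in range(n) if (y ^ center) >> i & 1]
        votes = [values[y ^ (1 << i)] for i in diff]
        new_values[y] = int(2 * sum(votes) > len(votes))
    return new_values


def evaluate_by_shifting(hard_coded, x, s, n):
    """Compute f(x) from the hard-coded values of f on B(0^n, 2s)."""
    center, values = 0, dict(hard_coded)
    for i in range(n):
        if x >> i & 1:  # one stage per 1-coordinate of x, |x| stages in total
            new_center = center | (1 << i)
            values = shift_ball(values, center, new_center, s, n)
            center = new_center
    return values[x]


def and2(x):
    """AND of the two lowest bits; sensitivity 2."""
    return (x & 1) & (x >> 1 & 1)
```

Usage: with `hard_coded = {p: and2(p) for p in ball(0, 4, 8)}`, calling `evaluate_by_shifting(hard_coded, x, 2, 8)` agrees with `and2(x)` on every 8-bit x. Only one ball of values is kept in memory at a time.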
Shallow circuits for sensitivity-s functions?
The algorithm we just saw seems inherently sequential – takes n
stages.
Can we parallelize?
Yes, by being bolder: go n/s levels at each stage rather than one.
Extension of earlier key insight
Sensitivity-s functions are noise-stable at every input x.
• Pick any vertex x.
• Flip n/(11s) random coordinates to get y.
• View t-th coordinate flipped as chosen from ‘untouched’ n-t+1 coordinates.
At each stage, at most s coordinates are sensitive.
Get Pr[f(x) ≠ f(y)] ≤ Pr[stage 1 flips f] + Pr[stage 2 flips f] + …
≤ s/n + s/(n-1) + … + s/(n - n/(11s) + 1)
≤ 1/10.
Downward walks
Similar statement holds for “random downward walks.”
• Pick any vertex x with |x| many ones.
• Flip |x|/(11s) randomly chosen 1’s to 0’s to get y.
• View t-th coordinate flipped as chosen from ‘untouched’ |x|-t+1 coords.
Get Pr[f(x) ≠ f(y)] ≤ s/|x| + s/(|x|-1) + … + s/(|x| - |x|/(11s) + 1)
≤ 1/10.
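The 1/10 bound in both walks is the same sum, with m = n for the general walk and m = |x| for the downward walk. It can be checked with exact arithmetic. A small sketch, assuming the number of flips is rounded down to m // (11s); the function name is hypothetical.

```python
from fractions import Fraction


def walk_failure_bound(m, s):
    """Union bound s/m + s/(m-1) + ... for m // (11*s) flips among m coordinates."""
    steps = m // (11 * s)
    return sum(Fraction(s, m - t + 1) for t in range(1, steps + 1))
```

Each of the at most m/(11s) terms is at most s/(m - m/(11s)) ≤ 11s/(10m), which is where the 1/10 comes from.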
Shallow circuits for sensitivity-s functions
Theorem: Every n-variable sensitivity-s function has a
formula of depth O(s log n).
Algorithm has value of f on bottom 11s layers “hard-coded” in.
Parallel-Alg: Given x
• If |x| ≤ 11s, return hard-coded value.
• Sample C=O(1) points x^1, x^2, …, x^C from “downward random walk” of length |x|/(11s). Call Parallel-Alg on each one.
• Return majority vote of the C results.
[Figure: x at the top, sampled points x^1, x^2, …, x^C below it, and weight levels 0,…,11s just above 0^n.]
• Have Parallel-Alg(x) = f(x) with probability 19/20 for all x
(proof: induction on |x|)
• After O(s log n) stages, bottoms out in “red zone” (the hard-coded bottom layers),
so parallel runtime is O(s log n)
• C=O(1), so total work is C^{O(s log n)} = n^{O(s)}
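Parallel-Alg can be sketched as a recursive randomized procedure. This is an illustration under assumptions: the slide only says C=O(1), so C is passed as a parameter here; the hard-coded bottom layers are modeled by a hypothetical callable `f_low` that refuses inputs of weight above 11s; walk lengths are rounded down.

```python
import random


def make_f_low(f, s):
    """Model the hard-coded table: f restricted to inputs of weight <= 11s."""
    def f_low(x):
        assert sum(x) <= 11 * s, "only the bottom layers are hard-coded"
        return f(x)
    return f_low


def parallel_alg(x, f_low, s, C, rng):
    """Randomized recursive evaluation of a sensitivity-s function at x."""
    ones = [i for i, bit in enumerate(x) if bit]
    if len(ones) <= 11 * s:
        return f_low(x)
    steps = len(ones) // (11 * s)  # at least 1, since |x| > 11s here
    votes = 0
    for _ in range(C):
        y = list(x)
        for i in rng.sample(ones, steps):  # downward walk: flip 1's to 0's
            y[i] = 0
        votes += parallel_alg(tuple(y), f_low, s, C, rng)
    return int(2 * votes > C)  # majority vote of the C results


def dictator(x):
    return x[0]  # sensitivity 1
```

Usage: `parallel_alg((1,) * 16, make_f_low(dictator, 1), 1, 5, random.Random(0))` evaluates the dictator function on the all-ones input of length 16 using only its values on inputs of weight at most 11. The C recursive calls at each level are independent, which is what allows them to run in parallel.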
Conclusion / Questions
Many questions remain about computational properties of low-sensitivity
functions.
We saw there are at most 2^{(n choose ≤2s)} many sensitivity-s functions.
Can this bound be sharpened?
We saw every sensitivity-s function has a formula of depth O(s log n).
Does every such function have a
• TC0 circuit / AC0 circuit / DNF / decision tree of size n^{poly(s)}?
• PTF of degree poly(s)? DNF of width poly(s)? GF(2) polynomial of degree poly(s)?
A closing puzzle/request
We saw sensitivity-s functions obey a “majority rule” (MAJ of any 2s+1
neighbors).
Well-known that degree-d functions obey a “parity rule” (PAR over any (d+1)-dim subcube must = 0).
If the conjecture is true, then low-sensitivity functions have low degree…
…and these two very different-looking rules must coincide!
Explain this!
Thank you for your attention