Time-space tradeoff lower bounds for non-uniform computation
Paul Beame, University of Washington
4 July 2000

Slide 2: Why study time-space tradeoffs?
- To understand the relationship between the two most critical measures of computation.
- They give a unified comparison of algorithms with varying time and space requirements.
- Non-trivial tradeoffs arise frequently in practice: one can avoid storing intermediate results by recomputing them.

Slide 3: Example: sorting n integers from [1, n²]
- Merge sort: S = O(n log n), T = O(n log n).
- Radix sort: S = O(n log n), T = O(n).
- Selection sort: only needs the smallest value output so far and the index of the current element, so S = O(log n), T = O(n²).

Slide 4: Complexity theory
- Hard problems: prove L ≠ P; prove non-trivial time lower bounds for natural decision problems in P.
- First step: prove a space lower bound, e.g. S = ω(log n), given an upper bound on time, e.g. T = O(n), for a natural problem in P.

Slide 5: An annoyance
- Time hierarchy theorems imply that there are (unnatural) problems in P not solvable in time O(n) at all.
- This makes the "first step" vacuous for unnatural problems.

Slide 6: Non-uniform computation
- Non-trivial time lower bounds are still open for problems in P.
- The first step is still very interesting, even without the restriction to natural problems.
- It can yield bounds with precise constants.
- But proving lower bounds may be harder.

Slide 7: Talk outline
- The right non-uniform model (for now): branching programs
- Early success: multi-output functions, e.g. sorting
- Progress on problems in P:
  - Crawling: restricted branching programs
  - That breakthrough first step (and more): true time-space tradeoffs
- The path ahead

Slide 8: Branching programs
[Figure: a branching program, drawn as a DAG whose internal nodes are labeled with variables x1, ..., x8; each node has a 0-edge and a 1-edge, and the sinks are labeled 0 and 1.]

Slide 9: Branching programs
[Figure: the same branching program, with the path followed on input x = (0,0,1,0,...) highlighted.]
- To compute f: {0,1}^n → {0,1} on input (x1, ..., xn), follow the path from the source to a sink, at each node reading the labeled variable and taking the corresponding out-edge.

Slide 10: Branching program properties
- Length = length of the longest path; Size = number of nodes.
- Branching programs simulate TMs: a node corresponds to a configuration with the input bits erased.
- Time T = Length; space S = log₂(Size) = TM space + log₂ n (for the head position) = space on an index TM.
- Polynomial size = non-uniform L.

Slide 11: TM space complexity
[Figure: a Turing machine with a read-only input tape x1 x2 x3 x4 ... xn, working storage, and an output tape.]
- Space = number of bits of working storage.

Slide 12: Branching program properties
- Branching programs simulate random-access machines (RAMs), not just sequential access.
- Generalizations:
  - A multi-way version, with each xi ranging over an arbitrary domain D: good for modeling RAM input registers.
  - Outputs on the edges: good for modeling the output tape for multi-output functions such as sorting.
- BPs can be leveled w.l.o.g., like adding a clock to a TM.
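To make the model concrete, here is a minimal Python sketch (my own illustration, not from the talk) of evaluating a small two-way branching program: each node queries one variable and has a 0-edge and a 1-edge, and an input is evaluated by following the path from the source to a Boolean sink. The path length corresponds to time and log₂ of the number of nodes to space, as on the properties slide above. The node names and the example function are arbitrary.

```python
# A minimal sketch (my own encoding, not from the talk) of a two-way branching
# program: each internal node is (variable index, child on 0, child on 1), and
# the sinks are the booleans False/True.
import math

def evaluate(bp, source, x):
    """Follow the path from `source` to a sink on the input bit-vector x."""
    node, steps = source, 0
    while not isinstance(node, bool):
        var, child0, child1 = bp[node]
        node = child1 if x[var] else child0
        steps += 1
    return node, steps          # output and path length (<= Length of the program)

# Example: a leveled program of width 2 computing x0 XOR x1 XOR x2.
bp = {
    "s":  (0, "a0", "a1"),      # level 1 reads x0; the state tracks the parity so far
    "a0": (1, "b0", "b1"),      # level 2 reads x1
    "a1": (1, "b1", "b0"),
    "b0": (2, False, True),     # level 3 reads x2
    "b1": (2, True, False),
}
size = len(bp) + 2              # internal nodes plus the two sinks
space = math.log2(size)         # S = log2(Size), as on the properties slide
for x in [(0, 0, 0), (1, 0, 1), (1, 1, 1)]:
    out, length = evaluate(bp, "s", x)
    assert out == bool(x[0] ^ x[1] ^ x[2])
    print(x, "->", out, "| time (path length):", length, "| space ~", round(space, 2))
```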
Slide 13: Talk outline (repeated; next: early success for multi-output functions, e.g. sorting).

Slide 14: Success for multi-output problems
- Sorting: T·S = Ω(n²/log n) [Borodin-Cook 82]; T·S = Ω(n²) [Beame 89].
- Matrix-vector product: T·S = Ω(n³) [Abrahamson 89].
- Many others, including matrix multiplication and pattern matching.

Slide 15: Proof ideas: layers and trees
[Figure: the leveled program cut into r layers of height T/r, with start node v0, layer-boundary nodes v1, ..., v_{r-1}, and final node v_r.]
- Each layer started at a boundary node v is a decision tree T_v of height T/r, and there are only 2^S trees T_v.
- If there are m outputs on input x, then at least m/r of them are produced within some single tree T_v.
- Typical claim: if T/r = εn, then each tree T_v outputs p correct answers on only a c^{-p} fraction of inputs.
- Correctness on all x therefore forces 2^S · c^{-m/r} ≥ 1, so S ≥ (m/r)·log₂ c = Ω(m/r) = Ω(mn/T).

Slide 16: Limitation of the technique
- It never gives more than T·S = Ω(n·m), where m is the number of outputs.
- "It is unfortunately crucial to our proof that sorting requires many output bits, and it remains an interesting open question whether a similar lower bound can be made to apply to a set recognition problem, such as recognizing whether all n input numbers are distinct." [Cook, Turing Award Lecture, 1983]

Slide 17: Talk outline (repeated; next: problems in P, starting with restricted branching programs).

Slide 18: Restricted branching programs
- Constant-width: only a constant number of nodes per level [Chandra-Furst-Lipton 83].
- Read-once: every variable is read at most once per path [Wegener 84], [Simon-Szegedy 89], etc.
- Oblivious: the same variable is queried by every node of a level [Babai-Pudlak-Rodl-Szemeredi 87], [Alon-Maass 87], [Babai-Nisan-Szegedy 89].
- BDD = oblivious read-once.

Slide 19: BDDs and best-partition communication complexity
[Figure: a BDD on 8 variables whose nodes are split between the two players.]
- Given f: {0,1}^8 → {0,1}, play a two-player game: Player A holds {x1, x3, x6, x7}, Player B holds {x2, x4, x5, x8}; the goal is to communicate as few bits as possible to compute f.
- Possible protocol: whenever the computation path crosses from one player's nodes to the other's, the current player sends the name of the node reached.
- BDD space ≥ the number of bits sent, for the best partition of the variables into A and B.

Slide 20: Communication complexity ideas
- Each conversation for f: {0,1}^A × {0,1}^B → {0,1} corresponds to a rectangle Y_A × Y_B of inputs, with Y_A ⊆ {0,1}^A and Y_B ⊆ {0,1}^B.
- BDD lower bounds: size ≥ min over partitions (A,B) of the number of rectangles in a tiling of the inputs by f-constant rectangles with partition (A,B).
- Read-once lower bounds: the same tiling bound as for BDDs, except that each rectangle in the tiling may have a different partition.
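As a toy illustration of this BDD-to-protocol connection (my own example, not from the talk), the following Python sketch builds a small oblivious read-once branching program that reads the four input bits in order and simulates the induced two-player protocol: Player A, holding the first two variables, runs the program over its levels and sends the name of the node reached at the cut; Player B finishes the computation. The number of bits needed to name that node is log₂ of the width at the cut, which is at most the BDD space. For simplicity the partition here is a prefix/suffix split rather than an interleaved one.

```python
# A toy BDD (oblivious read-once BP) for f(x) = (x[0] or x[1]) and (x[2] or x[3]),
# together with the two-party protocol it induces.  The node names and the
# function are arbitrary choices of mine.
from itertools import product
from math import ceil, log2

# Level i reads input bit x[i]; each entry maps a node to (child on 0, child on 1).
# "0" and "1" are the sinks.
levels = [
    {"s":  ("u0", "u1")},                                     # reads x[0]
    {"u0": ("z", "p"),  "u1": ("p", "p")},                    # reads x[1]; "z": first OR is 0, "p": it is 1
    {"z":  ("z2", "z2"), "p": ("v0", "v1")},                  # reads x[2]
    {"z2": ("0", "0"), "v0": ("0", "1"), "v1": ("1", "1")},   # reads x[3]
]

def run(start_level, node, bits):
    """Follow the BDD from `node`, using `bits` for the remaining levels."""
    for level, b in zip(levels[start_level:], bits):
        node = level[node][b]
    return node

def f(x):
    return int((x[0] or x[1]) and (x[2] or x[3]))

# Induced protocol for the partition A = {x[0], x[1]}, B = {x[2], x[3]}:
# A runs the BDD over its own levels and sends the node reached at the cut;
# B continues from that node and announces the answer.
cut_width = len(levels[2])            # number of nodes at the boundary level
bits_sent = ceil(log2(cut_width))     # <= log2(size) = BDD space
for x in product((0, 1), repeat=4):
    message = run(0, "s", x[:2])      # Player A's part
    answer = run(2, message, x[2:])   # Player B's part
    assert int(answer) == f(x)
print("protocol agrees with f on all 16 inputs; bits sent:", bits_sent)
```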
Slide 21: Restricted branching programs
- Read-k: no variable is queried more than k times
  - on any path in the program: syntactic read-k [Borodin-Razborov-Smolensky 89], [Okol'nishnikova 89], etc.;
  - on any consistent path: semantic read-k.
- Many years with no results for semantic read-k, and nothing for general branching programs either.

Slide 22: Uniform tradeoffs
- SAT is not solvable using O(n^{1-ε}) space if the time is n^{1+o(1)} [Fortnow 97]; the proof uses diagonalization and works for co-nondeterministic TMs.
- Extensions for SAT:
  - S = log^{O(1)} n implies T = Ω(n^{1.4142...-ε}), deterministically [Lipton-Viglas 99], and with up to n^{o(1)} advice [Tourlakis 00].
  - S = O(n^{1-ε}) implies T = Ω(n^{1.618...-ε}) [Fortnow-van Melkebeek 00].

Slide 23: Non-uniform computation [Beame-Saks-Thathachar FOCS 98]
- Syntactic read-k branching programs are exponentially weaker than semantic read-twice ones.
- f(x) = "xᵀMx = 0 (mod q)" for x ∈ GF(q)^n: εn·loglog n time implies Ω(n·log^{1-ε} n) space, for q ~ n.
- f(x) = "xᵀMx = 0 (mod 3)" for x ∈ {0,1}^n: 1.017n time implies Ω(n) space; this is the first Boolean result above time n for general branching programs.

Slide 24: Non-uniform computation [Ajtai STOC 99]
- The 0.5·log n Hamming-distance problem for x ∈ [1,n²]^n: kn time implies Ω(n log n) space; this follows from [Beame-Saks-Thathachar 98] and was improved to an Ω(n log n) time bound by [Pagter 00].
- Element distinctness for x ∈ [1,n²]^n: kn time implies Ω(n) space; this requires a significant extension of the techniques.

Slide 25: That breakthrough first step! [Ajtai FOCS 99]
- f(x,y) = "xᵀM_y x (mod 2)", with x ∈ {0,1}^n and y ∈ {0,1}^{2n-1}: kn time implies Ω(n) space.
- This is the first result for non-uniform Boolean computation showing that time O(n) forces space ω(log n).

Slide 26: Ajtai's Boolean function
[Figure: the matrix M_y, whose entries along the anti-diagonals are given by y1, y2, ..., y_{2n-1}.]
- f(x,y) = xᵀM_y x (mod 2), where M_y is a modified Hankel matrix.

Slide 27: Superlinear lower bounds [Beame-Saks-Sun-Vee FOCS 00]
- Extension to ε-error randomized non-uniform algorithms.
- Better time-space tradeoffs: T = Ω(n·√(log(n/S)/loglog(n/S))).
- The bounds apply both to element distinctness and to f(x,y) = "xᵀM_y x (mod 2)".

Slide 28: (m,α)-rectangles
- An (m,α)-rectangle R ⊆ D^X is a subset defined by disjoint sets A, B ⊆ X, an assignment s to the coordinates outside A∪B, and sets S_A ⊆ D^A, S_B ⊆ D^B, such that
  R = { z | z agrees with s outside A∪B, z_A ∈ S_A, z_B ∈ S_B },
  where |A|, |B| ≥ m and the densities |S_A|/|D^A|, |S_B|/|D^B| ≥ α.

Slide 29: An (m,α)-rectangle
[Figure: the coordinates in A range over S_A ⊆ D^A, those in B range over S_B ⊆ D^B, and the remaining coordinates are fixed according to s.]
- S_A and S_B each have density at least α.
- In general A and B may be interleaved in [1,n].

Slide 30: Key lemma [BST 98]
- Let program P use time T = kn and space S, and accept a fraction δ of its inputs in D^n.
- Then there is an (m,α)-rectangle, all of whose inputs P accepts, with m = βn and α ≥ δ·2^{-4(k+1)m-(S+1)r}, where β^{-1} ≈ 2^k and r ≈ k²·2^k.

Slide 31: Improved key lemma [Ajtai 99]
- Let program P use time T = kn and space S, and accept a fraction δ of its inputs in D^n.
- Then there is an (m,α)-rectangle, all of whose inputs P accepts, with m = βn and α ≥ δ·2^{-β^{1/50k}m-Sr}, where β^{-1} and r are constants depending on k.

Slide 32: Proving lower bounds using the key lemmas
- Show that the desired function f
  - evaluates to 1 on a large fraction of inputs (i.e., δ is large), and
  - evaluates to 0 on some input in every large (m,α)-rectangle, where "large" is given by the lemma bounds;
- or do the same for the complement ¬f.

Slide 33: Our new key lemma
- Let program P use time T = kn and space S, and accept a fraction δ of its inputs in D^n.
- Then almost all of the inputs that P accepts lie in (m,α)-rectangles that P accepts entirely, with m = βn and α ≥ (δ/2)·2^{-β^{1/8k}m-Sr}, where β^{-1} and r are k^{O(k²)}.
- No input lies in more than O(k) of the rectangles.
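To make the (m,α)-rectangle definition behind these lemmas concrete, here is a small Python sketch (my own illustration; the particular sets below are arbitrary toy choices): it enumerates the embedded rectangle determined by (A, B, s, S_A, S_B) over a domain D and reports its dimension and density.

```python
from itertools import product

# A sketch of the (m, alpha)-rectangle definition: disjoint index sets A and B,
# a fixed assignment s to the remaining coordinates, and sets S_A, S_B of
# partial assignments.  All concrete values below are arbitrary toy choices.
D = (0, 1, 2)           # the domain; |D| = 3
n = 6                   # inputs are elements of D^n
A, B = (0, 3), (2, 5)   # disjoint, possibly interleaved in [0, n)
s = {1: 0, 4: 2}        # assignment to the coordinates outside A and B
S_A = {(0, 1), (1, 1), (2, 0)}      # subset of D^A
S_B = {(0, 0), (1, 2)}              # subset of D^B

def rectangle(A, B, s, S_A, S_B):
    """Enumerate { z in D^n : z_A in S_A, z_B in S_B, z = s elsewhere }."""
    R = []
    for za, zb in product(S_A, S_B):
        z = [None] * n
        for i, v in zip(A, za): z[i] = v
        for i, v in zip(B, zb): z[i] = v
        for i, v in s.items():  z[i] = v
        R.append(tuple(z))
    return R

R = rectangle(A, B, s, S_A, S_B)
m = min(len(A), len(B))                         # the dimension of the rectangle
alpha = min(len(S_A) / len(D) ** len(A),        # the densities of its two sides
            len(S_B) / len(D) ** len(B))
print(f"|R| = {len(R)}, dimension m = {m}, density alpha = {alpha:.3f}")
```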
Slide 34: Proving randomized lower bounds from our key lemma
- Show that the desired function f
  - evaluates to 1 on a large fraction of inputs (i.e., δ is large), and
  - evaluates to 0 on a γ fraction of the inputs in every large-enough (m,α)-rectangle;
- or do the same for the complement ¬f.
- This gives a space lower bound for O(γδ/k)-error randomized algorithms running in time kn.

Slide 35: Proof ideas: layers and trees
[Figure: the program cut into r layers of height kn/r, with boundary nodes v0, v1, v2, ..., v_r.]
- f = ⋁_{(v1,...,v_{r-1})} f_{(v1,...,v_{r-1})}, and the number of tuples (v1,...,v_{r-1}) is at most 2^{S(r-1)}.
- f_{(v1,...,v_{r-1})} = ⋀_{i=1}^{r} f_{v_{i-1}v_i}, where each f_{v_{i-1}v_i} can be computed in height kn/r.

Slide 36: (r,ε)-decision forests
- An (r,ε)-decision forest is the conjunction of r decision trees (branching programs that are trees), each of height εn.
- Each f_{(v1,...,v_{r-1})} is computed by an (r, k/r)-decision forest, and there are only 2^{S(r-1)} of them.
- The various f_{(v1,...,v_{r-1})} accept disjoint sets of inputs.

Slide 37: Decision forests
[Figure: trees T1, T2, ..., T_r, each of height kn/r.]
- Assume w.l.o.g. that all variables are read on every input.
- Fix an input x accepted by the forest; each tree reads only a small fraction of the variables on input x.
- Fix two disjoint subsets F and G of the trees.

Slide 38: Core variables
- Split the variables into core(x,F) = the variables read only in F (i.e., not read outside F), core(x,G) = the variables read only in G, and the remaining variables; stem(x,F,G) = the assignment to the remaining variables.
- General idea: use core(x,F), core(x,G), and stem(x,F,G) to define (m,α)-rectangles.

Slide 39: A partition of the accepted inputs
- Fix F, G, and an input x accepted by P, and let
  R_{x,F,G} = { y | core(y,F) = core(x,F), core(y,G) = core(x,G), stem(y,F,G) = stem(x,F,G), and P accepts y }.
- For each F, G the sets R_{x,F,G} partition the accepted inputs into equivalence classes.
- Claim: the R_{x,F,G} are (m,α)-rectangles.

Slide 40: The classes are rectangles
- Let A = core(x,F), B = core(x,G), s = stem(x,F,G), S_A = { y_A | y ∈ R_{x,F,G} }, and S_B = { z_B | z ∈ R_{x,F,G} }.
- Take any y, z ∈ R_{x,F,G} and let w = (s, y_A, z_B). Then:
  - w agrees with y on all trees outside G, so core(w,G) = core(y,G) = core(x,G);
  - w agrees with z on all trees outside F, so core(w,F) = core(z,F) = core(x,F);
  - stem(w,F,G) = s = stem(x,F,G);
  - P accepts w, since it accepts y and z.
- So w is in R_{x,F,G}.

Slide 41: Few partitions suffice
- Only 4k pairs (F,G) suffice to cover almost all inputs accepted by P by large (m,α)-rectangles R_{x,F,G}.
- Choose F and G uniformly at random, of a suitable size depending on the access pattern of the input: the probability that (F,G) isn't good is tiny, so one such pair works for almost all inputs with the given access pattern.
- Only 4k sizes are needed.

Slide 42: Special case: oblivious BPs
- Here core(x,F) and core(x,G) don't depend on x.
- Choose each tree T_i to be in F with probability q, in G with probability q, and in neither with probability 1-2q.

Slide 43: xᵀM_y x on an (m,α)-rectangle
[Figure: the rows indexed by A and the columns indexed by B pick out a minor M_{AB} of M_y.]
- For every assignment s to the coordinates outside A∪B,
  f(x_{A∪B}, s, y) = x_Aᵀ M_{AB} x_B + g(x_A, y) + h(x_B, y).

Slide 44: Rectangles, rank, and rigidity
- The largest rectangle on which x_AᵀM x_B is constant has density α ≤ 2^{-rank(M)} [Borodin-Razborov-Smolensky 89].
- Lemma [Ajtai 99]: one can fix y so that every βn×βn minor M_{AB} of M_y has rank(M_{AB}) ≥ cβn/log₂(1/β); this improves the bounds of [Beame-Saks-Thathachar 98] and [Borodin-Razborov-Smolensky 89] for Sylvester matrices.

Slide 45: High rank implies balance
- For any rectangle S_A × S_B ⊆ {0,1}^A × {0,1}^B with |S_A × S_B| ≥ 2^{|A|+|B|+3-rank(M)}, i.e. density μ(S_A × S_B) ≥ 2^{3-rank(M)}:
  Pr[x_AᵀM x_B = 1 | x_A ∈ S_A, x_B ∈ S_B] ≥ 1/32 and Pr[x_AᵀM x_B = 0 | x_A ∈ S_A, x_B ∈ S_B] ≥ 1/32.
- This is derived from a result for inner product in r dimensions.
- So rigidity also implies balance for all large rectangles, and hence T = Ω(n·√(log(n/S)/loglog(n/S))).
- The same also follows for element distinctness [Babai-Frankl-Simon 86].
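The rank-implies-balance step can be checked directly on small examples. The Python sketch below (my own illustration, not the slide's stronger rectangle statement) computes the rank of a 0/1 matrix over GF(2) and then brute-forces the distribution of the bilinear form x_AᵀMx_B over the whole cube: over uniform x_A, x_B the probability of the value 1 works out to (1 - 2^{-rank(M)})/2, so higher rank means better balance. The slide's claim is the much stronger fact that such balance persists on every sufficiently dense rectangle S_A × S_B.

```python
from itertools import product

def gf2_rank(rows):
    """Rank over GF(2) of a matrix given as a list of 0/1 row tuples."""
    masks = [sum(bit << j for j, bit in enumerate(r)) for r in rows]
    rank = 0
    for col in range(max(len(r) for r in rows)):
        pivot = next((i for i, m in enumerate(masks) if (m >> col) & 1), None)
        if pivot is None:
            continue
        rank += 1
        piv = masks.pop(pivot)
        masks = [m ^ piv if (m >> col) & 1 else m for m in masks]
    return rank

def bilinear(M, xa, xb):
    """x_A^T M x_B over GF(2)."""
    return sum(xa[i] & M[i][j] & xb[j]
               for i in range(len(xa)) for j in range(len(xb))) & 1

# A toy 4x4 matrix (arbitrary choice of mine).
M = [(1, 0, 1, 1),
     (0, 1, 1, 0),
     (1, 1, 0, 1),
     (0, 0, 1, 1)]
r = gf2_rank(M)
ones = sum(bilinear(M, xa, xb)
           for xa in product((0, 1), repeat=4)
           for xb in product((0, 1), repeat=4))
print("rank over GF(2):", r)
print("Pr[x_A^T M x_B = 1] =", ones / 2 ** 8,
      "vs (1 - 2^-rank)/2 =", (1 - 2 ** -r) / 2)
```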
Slide 46: Talk outline (repeated; next: the path ahead).

Slide 47: Improving the bounds
- What is the limit? T = Ω(n·log(n/S))? T = Ω(n²/S)?
- The current bounds for general BPs are almost equal to the best current bounds for oblivious BPs:
  - T = Ω(n·log(n/S)) using 2-party communication complexity [AM];
  - T = Ω(n·log²(n/S)) using multi-party communication complexity [BNS].

Slide 48: Improving the bounds
- (m,α)-rectangles are a 2-party communication complexity idea; the insight is that generalizing to non-oblivious BPs yields the same bound as [AM] gives for oblivious BPs.
- Generalize multi-party communication complexity ideas to get better bounds for general BPs? A similar framework yields the same bound as [BNS] for oblivious BPs.
- Improve the oblivious BP lower bounds themselves?
- Ideas other than communication complexity?

Slide 49: Extension to other problems
- The problem should be hard for (best-partition) 2-party communication complexity, even after most variables are fixed; try oblivious BPs first.
- Prime candidate: (directed) st-connectivity.
  - Many non-uniform lower bounds are known in structured JAG models [Cook-Rackoff], [BBRRT], [Edmonds], [Barnes-Edmonds], [Achlioptas-Edmonds-Poon].
  - Best-partition communication complexity bounds are known.

Slide 50: Limitations of the current method
- We need n > T/r = the decision-tree height, or else all functions become trivial; so r > T/n.
- Some decision forest works on a 2^{-Sr} fraction of the accepted inputs (the only place the space bound is used).
- So we need Sr < n; otherwise the decision forest need only work on one input.
- Together these imply S·T/n < n, i.e. T < n²/S, so the method can say nothing once T ≥ n²/S.
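Spelling out the arithmetic behind this last slide (the same content, written as a short derivation):

```latex
% Each decision tree in the forest has height T/r; for the restriction of f
% to a tree to be non-trivial we need
\frac{T}{r} < n \quad\Longrightarrow\quad r > \frac{T}{n}.
% The space bound enters only through the fact that some single decision
% forest handles a 2^{-Sr} fraction of the accepted inputs, so for this
% fraction to correspond to more than one input we also need
S r < n.
% Combining the two conditions gives the barrier:
S \cdot \frac{T}{n} \;<\; S r \;<\; n
\quad\Longrightarrow\quad S T < n^{2}
\quad\Longrightarrow\quad T < \frac{n^{2}}{S}.
```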