Turnstile Streaming Algorithms Might as Well Be Linear Sketches
Yi Li, Huy L. Nguyen, David Woodruff

Turnstile Streaming Model
• Underlying n-dimensional vector x initialized to 0^n
• Long stream of updates x ← x + e_i or x ← x - e_i for standard unit vector e_i
• At end of the stream, x ∈ {-m, -m+1, …, m-1, m}^n for some bound m ≤ poly(n)
• Output an approximation to f(x) whp
• Goal: use as little space (in bits) as possible

Example: Norms
• Suppose you want |x|_p^p = Σ_{i=1}^n |x_i|^p
• Want Z for which (1-ε) |x|_p^p ≤ Z ≤ (1+ε) |x|_p^p
• Many applications
• p=2 – Geometry, linear algebra
• p=1 – Distances between distributions, network monitoring

Algorithm for 2-Norm
• Let r = 1/ε^2
• Choose an r x n matrix A of i.i.d. N(0, 1/r) normal random variables (with precision 1/poly(n))
• Maintain Ax in the stream
• Output |Ax|_2^2
• Proof: Johnson-Lindenstrauss Lemma

Algorithm for 1-Norm [Indyk]
• Let r = 1/ε^2
• Choose an r x n matrix A of i.i.d. Cauchy random variables (with precision 1/poly(n))
• Maintain Ax in the stream
• Output median(|(Ax)_1|, …, |(Ax)_r|)
• Proof: 1-stability of Cauchy distribution
– If C_1, C_2 are independent Cauchy r.v.s, then a·C_1 + b·C_2 ∼ (|a| + |b|)·C_3 for Cauchy r.v. C_3

Common Features
Algorithms for 2-norm and 1-norm have the following form:
1. Choose a random matrix A independent of x
2. Maintain Ax in the stream
3. Output a function of Ax
Side note: some functions f(x) may be weird. [Example formula not recoverable from the extracted slide]
Question (?!): does the optimal algorithm for approximating any function in the turnstile model have this form?

Our Results
• Yes, up to a factor of log n
• Theorem: for computing a relation f for x in {-m, -m+1, …, m}^n in the turnstile model, there is a correct (whp) algorithm which:
1. samples an integer matrix A uniformly from O(n log m) hardwired matrices, independent of x
2. outputs a function of Ax
• Logarithm of the number of states of Ax, for x in {-m, -m+1, …, m}^n, plus amount of randomness, is optimal up to a log n factor

Consequences
• Alice: a ∈ {0,1}^n, create stream s(a)
• Bob: b ∈ {0,1}^n, create stream s(b)
Lower Bound Technique
1. Run Alg on s(a), transmit state of Alg(s(a)) to Bob
2. Bob computes Alg(s(a), s(b))
3. If Bob solves g(a,b), space complexity of Alg is at least the 1-way communication complexity of g

Consequences
• Alice: a ∈ {0,1}^n, create stream s(a)
• Bob: b ∈ {0,1}^n, create stream s(b)
• Our main theorem implies: if Bob can solve g(a,b), then space of Alg is at least the simultaneous communication complexity of g
• Weaker public-coin model in which Alice and Bob simultaneously send a message to a referee

The log n Factor Loss
• Main Theorem: The logarithm of the number of states of Ax, as x ranges over {-m, -m+1, …, m}^n, plus the amount of randomness to store A, is optimal up to a log n factor
• The log n loss is necessary
• Consider f(x) = x_1 mod 2

Non-Uniformity Restriction
• Careful wording: “samples an integer matrix A uniformly from O(n log m) hardwired matrices, independent of x”
• Algorithm is non-uniform
– Output of each state for each A also hardwired
• Alternatively, allow algorithm to use more space to process a stream update, provided it only retains Ax and its randomness
– Regenerate A during each stream update

Comment on the Model
• For each random seed, algorithm is a deterministic automaton with a finite number of states
• Main theorem only requires correctness for x ∈ {-m, -m+1, …, m}^n
– It counts the number of states as x varies in this range
• While processing the stream, may have |x|_1 > m
• The algorithm can’t abort if this happens. It must still be correct at the end of the stream for x in {-m, -m+1, …, m}^n

Related Work
• Ganguly
– Deterministic algorithms
– Specific to heavy hitters problem
– Shows algorithm might as well be a linear sketch over the reals
– Dimension lower bound over the reals

Talk Outline
• Proof of Main Theorem
1. Reduction to path-independent automata
2. From path-independent automata to linear sketches
• Applications and Open Questions

Stream Automaton for Fixed Randomness
[Figure: automaton with a Start state and transitions labeled ±e_1, +e_2, +e_5, …, ±e_n; the vector 0^n appears in two different states]
• Want each state of the automaton to only depend on x, not how it got there

Path-Independent Automaton
• Undirected connected graph
• Each x ∈ Z^n in a unique state
• For each randomness, can we modify the automaton to make it path-independent?
• Rule out algorithms that remember how they arrived at a state, e.g., an algorithm that stores the last 5 stream updates

Path-Reversible Automaton
• Path-reversible: ∀ states s, if σ is a stream (+e_{i_1}, -e_{i_2}, -e_{i_3}, …, +e_{i_r}) of updates, resulting in a state t, then from t the stream σ^{-1} = (-e_{i_r}, …, +e_{i_3}, +e_{i_2}, -e_{i_1}) returns us to s
[Figure: states s_1, s_2, s_3, s_4 with transitions labeled ±e_1, ±e_2, ±e_5]
• Path-reversible does not imply path-independent

Strategy
Arbitrary Automaton → Path-Reversible Automaton → Path-Independent Automaton
For stream σ, freq(σ) ∈ Z^n is the “net update” to each coordinate
Idea:
1. if in a state s, and update by a stream σ with freq(σ) = 0, answers ought to be similar
2. collapse all states s, s’ for which s+σ = s’ and freq(σ) = 0 for some stream σ
Issue: how to define new output and transition function?
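
The definitions above can be made concrete with a minimal sketch. It assumes a toy 2 x 4 integer matrix `A` and hypothetical helper names (`freq`, `inverse`, `apply_updates`), none of which come from the talk. It treats the linear sketch Ax as an automaton and checks the two properties the slides name: the state depends only on freq(σ) (path-independence), and σ followed by σ^{-1} returns to the starting state (path-reversibility).

```python
from typing import List, Tuple

# An update (+1, i) means x <- x + e_i and (-1, i) means x <- x - e_i.
Update = Tuple[int, int]


def freq(stream: List[Update], n: int) -> List[int]:
    """Net update freq(sigma) in Z^n made by a stream to each coordinate."""
    x = [0] * n
    for sign, i in stream:
        x[i] += sign
    return x


def inverse(stream: List[Update]) -> List[Update]:
    """sigma^{-1}: the updates in reverse order with their signs flipped."""
    return [(-sign, i) for sign, i in reversed(stream)]


def apply_updates(A: List[List[int]], state: Tuple[int, ...],
                  stream: List[Update]) -> Tuple[int, ...]:
    """Maintain the sketch A*x under a stream, starting from `state`."""
    s = list(state)
    for sign, i in stream:
        for row in range(len(A)):
            s[row] += sign * A[row][i]  # add or subtract column i of A
    return tuple(s)


# Toy matrix (an assumption for illustration, not a matrix from the talk).
A = [[1, 2, 0, -1],
     [0, 1, 3, 1]]
start = (0, 0)

sigma = [(+1, 0), (-1, 2), (+1, 3), (+1, 0)]
tau = [(+1, 3), (+1, 0), (+1, 0), (-1, 2)]  # same net update, other order

state_sigma = apply_updates(A, start, sigma)
state_tau = apply_updates(A, start, tau)
back_to_start = apply_updates(A, state_sigma, inverse(sigma))
```

Here `state_sigma` and `state_tau` coincide because both streams have the same net update, which is exactly what an automaton that stores the last few updates would fail to guarantee.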

Zero-Frequency Graph
• Directed graph G = (V,E)
• V = states of old automaton A_old (for fixed randomness)
• (s,t) ∈ E if there is a stream σ with s+σ = t and freq(σ) = 0
– Finite number of streams to consider
• Terminal equivalence class: strongly connected component with no outgoing edge
– Path in G lands in a terminal equivalence class
– States of new automaton A_new = terminal equivalence classes

New Transition Function
• Suppose in terminal equivalence class C
• Given an update e_i
• Let v ∈ C be an arbitrary node
• Compute v+e_i using transition function of A_old
• Walk from v+e_i until reach terminal equivalence class C’
– C’ is unique
• Does not depend on choice of v
• Only one terminal equivalence class reachable
– A_new is path-reversible
[Figure: nodes u and v in one terminal equivalence class; after the update +e_i, zero-frequency streams σ and σ’ (freq(σ) = 0, freq(σ’) = 0) lead both into the same terminal equivalence class]

Output Function of A_new
• In each terminal equivalence class C, sample node u from the stationary distribution of a random walk in C (add self-loops)
– Output of A_new on C = Output of A_old on u
• If v is starting vertex of A_old,
– take a random walk in G from v
– let starting vertex of A_new be terminal equivalence class C reached
• Why is it correct?

Correctness
• Let Π be an arbitrary distribution on streams σ
• Choose fixed randomness so A_old is correct on Π’:
– Long sequence of zero-frequency streams,
– Followed by σ sampled from Π,
– Followed by long sequence of zero-frequency streams
• Output of A_new on Π is statistically close to output of A_old on Π’
• => for every Π there is a path-reversible A_new correct on Π

Arbitrary Automaton → Path-Reversible Automaton → Path-Independent Automaton
• Undirected zero-frequency graph G
• New automaton states = connected components of G
• ∀ x ∈ Z^n, only one connected component of G contains states containing x
• Uses path-reversibility
• => Well-defined transition function
• Random walk in components to choose outputs

Talk Outline
• Proof of Main Theorem
1. Reduction to path-independent automata
2. From path-independent automata to linear sketches
• Applications and Open Questions

Path Independent Automata and Submodules
• Let o be the initial state
• M = {x ∈ Z^n such that x in o}
• 0^n ∈ M
• If x ∈ M, then -x ∈ M
• If x, y ∈ M, then x+y ∈ M
• M is a free submodule of Z^n (a lattice)
• M has a basis
• Any two bases have the same cardinality

Path Independent Automata and Sketches
• States of automaton are elements (cosets) of the quotient module Z^n/M
• Space of automaton is log of the number of cosets containing an x ∈ {-m, …, m}^n
• Goal: build a sketching algorithm A·x
– A is fixed for this automaton
– Space of A·x ≈ space of automaton
– Injection from states of automaton to states of A·x
– Will replace {-m, …, m}^n with {-m/n, …, m/n}^n

Smith Normal Form
• Z^n/M examples:
– Z^n/(e_1) is free. It remembers all but first coordinate
– Z^n/(2e_1, 2e_2, …, 2e_n) not free. It remembers coordinate parities
• Smith Normal Form: ∃ a basis y_1, …, y_n of Z^n for which the generators of M are q_i·y_i for i = 1, …, r, where q_i | q_{i+1} are positive integers, and r = rank(M)
• If q_s = 1 but q_{s+1} > 1, the generators of Z^n/M are y_{s+1} + M, …, y_n + M, and Z^n/M is isomorphic to Z/q_{s+1} ⊕ … ⊕ Z/q_r ⊕ Z^{n-r}

Counting States
• Define n x n matrix B, where i-th column B_i is coefficients of e_i in basis y_1, …, y_n
• State = Bx mod q, after removing first s rows
• For each i, there are x ≠ x’ ∈ {-m/n, …, m/n}^n with
1. (Bx)_i ≠ (Bx’)_i mod q_i
2. (Bx)_j = (Bx’)_j mod q_j ∀ j ≠ i
• Proof: otherwise delete row i
• Corollary: # states ≥ 2^{n-s}

Removing Torsion
• Let sketch be Bx without mod q
– After reducing entries in B mod q
• For each old state Bx mod q, at most (mn)^{n-s} new states
• # new states ≤ #(old states)·(mn)^{n-s}
• log(# new states) ≤ log(# old states) + (n-s)·log(mn) = O(log(# old states)·log(mn)), using n-s ≤ log(# old states) from the corollary

Handling Large Entries in B
• Want B in Z^{(n-s) x n} to have integer entries of value at most poly(n)
• Removing states from M outside of {-m, …, m}^n, can assume q_i ≤ exp(poly(n))
• Take random linear combinations of rows of B, reduce each row mod a random prime
• Whp if Bx ≠ By, after this transformation to B, Bx ≠ By

Talk Outline
• Proof of Main Theorem
1. Reduction to path-independent automata
2. From path-independent automata to linear sketches
• Applications and Open Questions

Applications and Open Questions
• Simpler proof of Ω̃(n^{1-2/p}) bit lower bound for estimating F_p, p > 2
– No communication complexity
• Many dimension lower bounds known for sketching norms over the reals
– F_p, matrix norms, adaptive sketching
– Do these give turnstile streaming lower bounds with finite precision?
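
Returning to the Counting States and Removing Torsion slides, the following toy enumeration may help make the state counts tangible. Everything specific in it is an assumption chosen for illustration: the dimension n = 3, the bound m = 2, the unimodular matrix `B`, the invariants `q = [1, 2, 4]` (so s = 1 and there is no free part), and the helper names. It enumerates x over the full box {-m, …, m}^n rather than the shrunken box used in the proof, and it skips the step of reducing entries of B mod q because the entries are already small.

```python
from itertools import product

n, m = 3, 2

# Hypothetical change-of-basis matrix (lower triangular with unit diagonal,
# hence unimodular) and invariants q_1 | q_2 | q_3.
B = [[1, 0, 0],
     [1, 1, 0],
     [0, 1, 1]]
q = [1, 2, 4]
s = sum(1 for qi in q if qi == 1)  # rows with q_i = 1 carry no information


def b_times(x):
    """Compute the integer vector Bx."""
    return [sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]


def old_state(x):
    """Automaton state: Bx mod q after removing the first s rows."""
    y = b_times(x)
    return tuple(y[i] % q[i] for i in range(s, n))


def new_state(x):
    """Torsion-free sketch: the same rows of Bx, without the mod."""
    y = b_times(x)
    return tuple(y[i] for i in range(s, n))


box = list(product(range(-m, m + 1), repeat=n))
old_states = {old_state(x) for x in box}
new_states = {new_state(x) for x in box}

# The sketch must determine the automaton state: every new state should
# correspond to exactly one old state.
new_to_old = {}
sketch_determines_state = True
for x in box:
    seen = new_to_old.setdefault(new_state(x), old_state(x))
    sketch_determines_state = sketch_determines_state and seen == old_state(x)

print(len(old_states), len(new_states))
```

In this toy instance the sketch has more states than the automaton, but the blow-up stays within the (mn)^{n-s} factor from the Removing Torsion slide, and the automaton's state can always be recovered from the sketch.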