Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Sub-Constant Error

T.S. Jayram, David Woodruff
IBM Almaden

Data Stream Model

• Have a stream of m updates to an n-dimensional vector v
  – "add x to coordinate i"
  – Insertion model: all updates x are positive
  – Turnstile model: x can be positive or negative
  – Stream length and update magnitudes are < poly(n)
• Estimate statistics of v:
  – # of distinct elements F0
  – L_p-norm |v|_p = (Σ_i |v_i|^p)^{1/p}
  – entropy, and so on
• Goal: output a (1+ε)-approximation with limited memory

Lots of "Optimal" Papers

• Lots of "optimal" results:
  – "An optimal algorithm for the distinct elements problem" [KNW]
  – "Fast moment estimation in optimal space" [KNPW]
  – "A near-optimal algorithm for estimating entropy of a stream" [CCM]
  – "Optimal approximations of the frequency moments of data streams" [IW]
  – "A near-optimal algorithm for L1-difference" [NW]
  – "Optimal space lower bounds for all frequency moments" [W]
• This paper: "Optimal Bounds for Johnson-Lindenstrauss Transforms and Streaming Problems with Sub-Constant Error"

What Is Optimal?

• F0 = # of non-zero entries of v
• "For a stream of indices in {1, …, n}, our algorithm computes a (1+ε)-approximation using an optimal O(ε^{-2} + log n) bits of space with 2/3 success probability… This probability can be amplified by independent repetition."
• If we want high probability, say 1 - 1/n, independent repetition increases the space by a multiplicative log n factor
• So "optimal" algorithms are only optimal among algorithms with constant success probability

Can We Improve the Lower Bounds?

• Gap-Hamming: Alice holds x ∈ {0,1}^{ε^{-2}}, Bob holds y ∈ {0,1}^{ε^{-2}}, promised that either Δ(x,y) > (1/2 + ε)·ε^{-2} or Δ(x,y) < (1/2 - ε)·ε^{-2}
• Lower bound of Ω(ε^{-2}) with 1/3 error probability
• But upper bound of ε^{-2} with 0 error probability (Alice can send all of x), so Gap-Hamming alone cannot give a lower bound that grows with log(1/δ)

Our Results

Streaming Results

• Independent repetition is optimal!
• Estimating the L_p-norm in the turnstile model up to 1+ε w.p. 1-δ:
  – Ω(ε^{-2} log n log(1/δ)) bits for any p
  – [KNW] get O(ε^{-2} log n log(1/δ)) for 0 ≤ p ≤ 2
• Estimating F0 in the insertion model up to 1+ε w.p. 1-δ:
  – Ω(ε^{-2} log(1/δ) + log n) bits
  – [KNW] get O(ε^{-2} log(1/δ)) for ε^{-2} > log n
• Estimating entropy in the turnstile model up to 1+ε w.p. 1-δ:
  – Ω(ε^{-2} log n log(1/δ)) bits
  – Improves the Ω(ε^{-2} log n) bound of [KNW]

Johnson-Lindenstrauss Transforms

• Let A be a random matrix such that with probability 1-δ, for any fixed q ∈ R^d: |Aq|_2 = (1 ± ε)|q|_2
• [JL]: A can be an (ε^{-2} log(1/δ)) × d matrix
  – Gaussians or sign variables work
• [Alon]: A needs Ω((ε^{-2} log(1/δ)) / log(1/ε)) rows
• Our result: A needs Ω(ε^{-2} log(1/δ)) rows (see the sketch below)
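To make the guarantee above concrete, here is a minimal numpy sketch of the classical construction the slide mentions ("Gaussians or sign variables work"): a k × d matrix of scaled i.i.d. Gaussians with k = ε^{-2} log(1/δ) rows, the row count the lower bound above shows is necessary. The parameter values eps, delta, d below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions, not values from the paper).
eps, delta, d = 0.1, 0.01, 10_000
k = int(np.ceil(np.log(1 / delta) / eps**2))  # k = eps^-2 * log(1/delta) rows

# Classical JL construction: i.i.d. Gaussian entries scaled by 1/sqrt(k),
# so that E[|Aq|_2^2] = |q|_2^2 for any fixed vector q.
A = rng.normal(size=(k, d)) / np.sqrt(k)

q = rng.normal(size=d)  # an arbitrary fixed vector
ratio = np.linalg.norm(A @ q) / np.linalg.norm(q)
print(f"rows k = {k}, |Aq|_2 / |q|_2 = {ratio:.4f}")  # close to 1 with high probability
```

With these numbers k ≈ 461, and the ratio |Aq|_2 / |q|_2 concentrates within roughly 1 ± 1/√(2k) ≈ 1 ± 0.03, comfortably inside the 1 ± ε window.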
Communication Complexity Separation

• One-way model: Alice holds x, Bob holds y, and Bob outputs f(x,y) ∈ {0,1}
• D_{ρ,1/3}(f) = communication of the best one-way deterministic protocol that errs w.p. 1/3 on distribution ρ
• [KNR]: R||_{1/3}(f) = max over product distributions μ × λ of D_{μ×λ,1/3}(f)

Communication Complexity Separation (cont.)

• VC-dimension of f: the maximum number r of columns for which all 2^r rows occur in the communication matrix restricted to these columns
• [KNR]: R||_{1/3}(f) = Θ(VC-dimension(f))
• Our result: there exist f and g, each with VC-dimension k, but R||_δ(f) = Θ(k log(1/δ)) while R||_δ(g) = Θ(k)

Our Techniques

Lopsided Set Intersection (LSI)

• Universe {1, 2, …, U} with U = ε^{-2} · 1/δ; question: is S ∩ T = ∅?
• Alice holds S ⊂ {1, 2, …, U} with |S| = ε^{-2}; Bob holds T ⊂ {1, 2, …, U} with |T| = 1/δ
• Alice cannot describe S with o(ε^{-2} log U) bits
• If S and T are uniform, then with constant probability S ∩ T = ∅
• R||_{1/3}(LSI) ≥ D_{uniform,1/3}(LSI) = Ω(ε^{-2} log(1/δ))

Lopsided Set Intersection (LSI2)

• Same universe, U = ε^{-2} · 1/δ; is S ∩ T = ∅?
• Alice holds S ⊂ {1, 2, …, U} with |S| = ε^{-2}; Bob holds T ⊂ {1, 2, …, U} with |T| = 1
• R||_{δ/3}(LSI2) ≥ R||_{1/3}(LSI) = Ω(ε^{-2} log(1/δ))
• Proof: union bound over the 1/δ set elements in an LSI instance

Low Error Inner Product

• Universe size U = ε^{-2} · 1/δ; does ⟨x,y⟩ = 0?
• x ∈ {0, ε}^U with |x|_2 = 1; y ∈ {0,1}^U with |y|_2 = 1
• Estimating ⟨x,y⟩ up to ±ε w.p. 1-δ solves LSI2 w.p. 1-δ
• So R||_δ(inner-product_ε) = Ω(ε^{-2} log(1/δ))

L2-estimation_ε

• Universe size U = ε^{-2} · 1/δ; what is |x-y|_2?
• x ∈ {0, ε}^U with |x|_2 = 1; y ∈ {0,1}^U with |y|_2 = 1
• |x-y|_2^2 = |x|_2^2 + |y|_2^2 - 2⟨x,y⟩ = 2 - 2⟨x,y⟩
• Estimating |x-y|_2 up to a (1+Θ(ε))-factor solves inner-product_ε
• So R||_δ(L2-estimation_ε) = Ω(ε^{-2} log(1/δ))
• The log(1/δ) factor is new, but we want an Ω(ε^{-2} log n log(1/δ)) lower bound
• Can use a known trick to get an extra log n factor

Augmented Lopsided Set Intersection (ALSI2)

• Universe [U] = [ε^{-2} · 1/δ]
• Alice holds S_1, …, S_r ⊂ [U] with |S_i| = ε^{-2} for all i
• Bob holds j ∈ [U], i* ∈ {1, 2, …, r}, and S_{i*+1}, …, S_r
• Question: is j ∈ S_{i*}?
• R||_{1/3}(ALSI2) = Ω(r ε^{-2} log(1/δ))

Reduction of ALSI2 to L2-estimation_ε

• Set r = Θ(log n)
• Encode the sets as vectors as in the inner-product reduction: S_i becomes x_i, and Bob's (j, i*) becomes y_{i*}
• Alice places the x_i at different scales, streaming x = Σ_{i=1}^{r} 10^i · x_i; Bob, knowing S_{i*+1}, …, S_r, subtracts those scales and adds his own block, leaving 10^{i*} · y_{i*} - Σ_{i=1}^{i*} 10^i · x_i
• |y - x|_2 is dominated by the top-scale term 10^{i*} |y_{i*} - x_{i*}|_2
• Streaming space ≥ R||_δ(L2-estimation_ε)
• So R||_δ(L2-estimation_ε) = Ω(ε^{-2} log n log(1/δ))

Lower Bounds for Johnson-Lindenstrauss

• Alice and Bob hold x, y ∈ {-n^{O(1)}, …, n^{O(1)}}^t
• Use public randomness to agree on a JL matrix A; Alice sends Ax, and Bob computes |Ax - Ay|_2 = |A(x-y)|_2
• This estimates |x-y|_2 up to 1±ε w.p. 1-δ, using #rows(A) · O(log n) bits of communication
• So #rows(A) = Ω(ε^{-2} log n log(1/δ)) / O(log n) = Ω(ε^{-2} log(1/δ))

Low-Error Hamming Distance

• Universe = [n]; Δ(x,y) = Hamming distance between x ∈ {0,1}^n and y ∈ {0,1}^n
• R||_δ(Δ(x,y)_ε) = Ω(ε^{-2} log(1/δ) log n)
• Via reduction from ALSI2 (see "ALSI2 to Hamming Distance" below)
• Uses Gap-Hamming to LSI2 reductions with low error
• Implies our lower bounds for estimating:
  – any L_p-norm
  – distinct elements
  – entropy

Conclusions

• We prove the first streaming space lower bounds that depend on the error probability δ
  – Optimal for L_p-norms and distinct elements
  – Improves the lower bound for entropy
  – Optimal dimensionality bound for JL transforms
• Adds several twists to augmented indexing proofs:
  – Augmented indexing with a small set in a large domain
  – Proof builds upon lopsided set disjointness lower bounds
  – Uses multiple Gap-Hamming to Indexing reductions that handle low error

ALSI2 to Hamming Distance

• Alice holds S_1, …, S_r ⊂ [ε^{-2} · 1/δ] with |S_i| = ε^{-2} for all i; Bob holds j ∈ [U], i* ∈ {1, 2, …, r}, and S_{i*+1}, …, S_r
• Embed the multiple copies by duplicating coordinates at different scales; for a single copy with Alice's set S and Bob's index j:
  – Let t = ε^{-2} log(1/δ)
  – Use the public coin to generate t random strings b_1, …, b_t ∈ {0,1}^U (each string indexed by the universe)
  – Alice sets x_i = majority_{k ∈ S} b_{i,k}
  – Bob sets y_i = b_{i,j}
  – If j ∈ S, each x_i agrees with y_i w.p. 1/2 + Θ(ε); if j ∉ S, they agree w.p. exactly 1/2, so Δ(x,y) over the t coordinates separates the two cases w.p. 1-δ (a sketch of this is below)
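As a sanity check on the majority encoding above, here is a small simulation of a single copy of the reduction. It reflects one reading of the slide, in particular that each public string b_i is indexed by the universe [U] rather than by [t], and the parameter values are illustrative assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions, not values from the paper).
eps, delta = 0.1, 0.01
s_size = round(1 / eps**2)                    # |S| = eps^-2
U = round(s_size / delta)                     # universe size eps^-2 * 1/delta
t = int(np.ceil(np.log(1 / delta) / eps**2))  # t = eps^-2 * log(1/delta) coordinates

S = rng.choice(U, size=s_size, replace=False)  # Alice's set
outside = np.setdiff1d(np.arange(U), S)        # indices j with j not in S

def hamming(j: int) -> int:
    """One run of the reduction: Hamming distance between Alice's and Bob's strings."""
    # Public coin: t random strings, each indexed by the universe [U].
    b = rng.integers(0, 2, size=(t, U), dtype=np.int8)
    x = (2 * b[:, S].sum(axis=1) > s_size).astype(np.int8)  # Alice: majority over S (ties -> 0)
    y = b[:, j]                                             # Bob: coordinate j of each string
    return int(np.count_nonzero(x != y))

trials = 40
d_in = np.mean([hamming(rng.choice(S)) for _ in range(trials)])
d_out = np.mean([hamming(rng.choice(outside)) for _ in range(trials)])
# When j is in S, x_i and y_i agree w.p. 1/2 + Theta(eps), pulling the distance below t/2.
print(f"t/2 = {t/2:.1f}, mean distance: j in S -> {d_in:.1f}, j not in S -> {d_out:.1f}")
```

The point of the construction shows up in the output: the majority over |S| = ε^{-2} random bits agrees with each individual member bit with probability about 1/2 + Θ(ε), so the mean distance lands noticeably below t/2 exactly when j ∈ S, and the t = ε^{-2} log(1/δ) coordinates amplify this Θ(ε) bias to error probability δ.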