Sketching and Embedding are Equivalent for Norms Alexandr Andoni (Simons Institute) Robert Krauthgamer (Weizmann Institute) Ilya Razenshteyn (CSAIL MIT) 1 Sketching • Compress a massive object to a small sketch • Rich theories: high-dimensional vectors, matrices, graphs • Similarity search, compressed sensing, numerical linear algebra d • Dimension reduction (Johnson, Lindenstrauss 1984): random projection on a low-dimensional subspace preserves distances n When is sketching possible? 2 Similarity search • Motivation: similarity search • Model similarity as a metric • Sketching may speed-up computation and allow indexing • Interesting metrics: • • • • Euclidean ℓ2: d(x, y) = (∑i|xi – yi|2)1/2 Manhattan, Hamming ℓ1: d(x, y) = ∑i|xi – yi| ℓp distances d(x, y) = (∑i|xi – yi|p)1/p for p ≥ 1 Edit distance, Earth Mover’s Distance etc. 3 Sketching metrics 0 1 1 0 … 1 • Alice and Bob each hold a point from a Alice x metric space (say x and y) • Both send s-bit sketches to Charlie • For r > 0 and D > 1 distinguish sketch(x) • d(x, y) ≤ r • d(x, y) ≥ Dr • Shared randomness, allow 1% probability of error • Trade-off between s and D y Bob sketch(y) Charlie d(x, y) ≤ r or d(x, y) ≥ Dr? 4 Near Neighbor Search via sketches • Near Neighbor Search (NNS): • Given n-point dataset P • A query q within r from some data point • Return any data point within Dr from q • Sketches of size s imply NNS with space nO(s) and a 1-probe query • Proof idea: amplify probability of error to 1/n by increasing the size to O(s log n); sketch of q determines the answer 5 Sketching real line • Distinguish |x – y| ≤ 1 vs. |x – y| ≥ 1 + ε • Randomly shifted pieces of size w = 1 + ε/2 • Repeat O(1 / ε2) times • Overall: 0 x 1 0 y • D=1+ε • s = O(1 / ε2) 6 Sketching ℓp for 0 < p ≤ 2 • (Indyk 2000): can reduce sketching of ℓp with 0 < p ≤ 2 to sketching reals via random projections • If (G1, G2, …, Gd) are i.i.d. N(0, 1)’s, then ∑i xiGi – ∑i yiGi is distributed as ‖x - y‖2 • N(0, 1) • For 0 < p < 2 use p-stable distributions instead • Again, get D = 1 + ε with s = O(1 / ε2) • For p > 2 sketching ℓp is hard: to achieve D = O(1) one needs sketch size to be s = Θ~(d1-2/p) (Bar-Yossef, Jayram, Kumar, Sivakumar 2002), (Indyk, Woodruff 2005) 7 Anything else? • A map f: X → Y is an embedding with distortion D’, if for a, b from X: dX(a, b) / D’ ≤ dY(f(a), f(b)) ≤ dX(a, b) • Reductions for geometric problems • If Y has s-bit sketches for approximation D, then for X one gets s bits and approximation DD’ 8 Metrics with good sketches • A metric X admits sketches with s, D = O(1), if: • X = ℓp for p ≤ 2 • X embeds into ℓp for p ≤ 2 with distortion O(1) • Are there any other metrics with efficient sketches (D and s are O(1))? • We don’t know! • Some new techniques waiting to be discovered? • No new techniques?! 9 The main result If a normed space X admits sketches of size s and approximation D, then for every ε > 0 the space X embeds (linearly) into ℓ1 – ε with distortion O(sD / ε) into ≤ 2 space, if • A vector spaceEmbedding X with ‖.‖: X → R≥0ℓisp,apnormed • (Kushilevitz, ‖x‖ = 0Ostrovsky, iff x = 0 For norms 1998) • Rabani αx‖ = |α|‖x‖ (Indyk 2000) • ‖x + y‖ ≤ ‖x‖ + ‖y‖ Efficient sketches • Every norm gives rise to a metric: define d(x, y) = ‖x - y‖ 10 Sanity check • ℓp spaces: p > 2 is hard, 1 ≤ p ≤ 2 is easy, p < 1 is not a norm • Can classify mixed norms ℓp(ℓq): in particular, ℓ1(ℓ2) is easy, while ℓ2(ℓ1) is hard! (Jayram, Woodruff 2009), (Kalton 1985) • A non-example: edit distance is not a norm, sketchability is largely open (Ostrovsky, Rabani 2005), (Andoni, Jayram, Pătraşcu 2010) ℓq ℓp 11 No embeddings → no sketches • In the contrapositive: if a normed space is non-embeddable into ℓ1 – ε, then it does not have good sketches • Can convert sophisticated non-embeddability results into lower bounds for sketches 12 Example 1: the Earth Mover’s Distance • For x: R[Δ]×[Δ] → R with ∑i,j xi,j = 0, define the Earth Mover’s Distance x‖EMD as the cost of the best transportation of the positive part of x to the negative part (Monge-Kantorovich norm) • Best upper bounds: • D = O(1 / ε) and s = Δε (Andoni, Do Ba, Indyk, Woodruff 2009) • D = O(log Δ) and s = O(1) (Charikar 2002), (Indyk, Thaper 2003), (Naor, Schechtman 2005) No embedding into ℓ1 – ε with distortion O(1) (Naor, Schechtman 2005) No sketches with D = O(1) and s = O(1) 13 Example 2: the Trace Norm • For an n × n matrix A define the Trace Norm (the Nuclear Norm) ‖A‖ to be the sum of the singular values • Previously: lower bounds only for certain restricted classes of sketches (Li, Nguyen, Woodruff 2014) Any embedding into ℓ1 requires distortion Ω(n1/2) (Pisier 1978) Any sketch must satisfy sD = Ω(n1/2 / log n) 14 The plan of the proof If a normed space X admits sketches of size s and approximation D, then for every ε > 0 the space X embeds (linearly) into ℓ1 – ε with distortion O(sD / ε) Sketches Weak embedding into ℓ2 Information theory Linear embedding into ℓ1 – ε Nonlinear functional analysis A map f: X → Y is (s1, s2, τ1, τ2)-threshold, if (1, O(sD), 1, 10)-threshold • dX(x1, x2) ≤ s1 implies dY(f(x1), f(x2)) ≤ τ1 map from X to ℓ2 • dX(x1, x2) ≥ s2 implies dY(f(x1), f(x2)) ≥ τ2 15 Sketch → Threshold map X has a sketch of size s and approximation D X has no sketches of size s and approximation D ℓk ∞(X) has no sketches of size Ω(k) and approximation Θ(sD) ? There is a (1, O(sD), 1, 10)threshold map from X to ℓ2 No (1, O(sD), 1, 10)threshold map from X to ℓ2 (Andoni, Jayram, Pătraşcu 2010) (direct sum theorem for information complexity) Convex duality Poincaré-type inequalities on X (x1, …, xk) = maxi ‖xi 16 Sketching direct sums ℓk∞(X) has sketches of size O(s) and approximation Dk X has sketches of size s and approximation D (σ1, σ2, …, σk) — random ±1’s Alice Bob (a1, a2, …, ak) ∑i σi ai (b1, b2, …, bk) ∑i σi bi sketch(∑i σi ai) maxi ai - bi ≤ ‖∑i σi(ai – bi)‖ ≤ ∑i ai - bi ≤ k maxi ai - bi with probability 1/2 sketch(∑i σi bi) Crucially use the linear structure of X (not enough to be just a metric!) 17 Threshold map → linear embedding (1, O(sD), 1, 10)-threshold map from X to ℓ2 ? Linear embedding into ℓ1 – ε with distortion O(sD / ε) Uniform embedding into ℓ2 (Aharoni, Maurey, Mityagin 1985) (Nikishin 1973) g: X → ℓ2 s.t. L(‖x1 – x2‖) ≤ ‖g(x1) – g(x2)‖ ≤ U(‖x1 – x2‖) where • L and R are non-decreasing, • L(t) > 0 for t > 0 • U(t) → 0 as t → 0 18 Threshold map → uniform embedding • A map f: X → ℓ2 such that • ‖x1 - x2 ≤ 1 implies ‖f(x1) - f(x2)‖ ≤ 1 • ‖x1 - x2 ≥ Θ(sD) implies ‖f(x1) - f(x2)‖ ≥ 10 • Building on (Johnson, Randrianarivony 2006) • 1-net N of X; f Lipschitz on N • Extend f from N to a Lipschitz function on the whole X 19 Open problems • Extend to as general class of metrics as possible • Connection to linear sketches? • Sketches of the form x → Ax • Conjecture: sketches of size s and approximation D can be converted to linear sketches with f(s) measurements and approximation g(D) • Spaces that admit no non-trivial sketches (s = Ω(d) for D = O(1)): is there anything besides ℓ∞? • Can one strengthen our theorem to “sketchability implies embeddability into ℓ1”? • Equivalent to an old open problem from Functional Analysis. • Sketches imply NNS, is there a reverse implication? 20