UCLA IPAM 07
Advances in Metric Embedding Theory
Yair Bartal, Hebrew University & Caltech

Metric Spaces
A metric space (X,d), d: X×X → R⁺, satisfies:
• d(u,v) = d(v,u)
• d(v,w) ≤ d(v,u) + d(u,w)
• d(u,u) = 0
Data representation: pictures (e.g. faces), web pages, DNA sequences, ...
Networks: communication distance.

Metric Embedding
Simple representation: translate metric data into an easy-to-analyze form, gaining geometric structure, e.g. by embedding into low-dimensional Euclidean space.
Algorithmic application: apply algorithms designed for a "nice" space to solve problems on "problematic" metric spaces.

Embedding Metric Spaces
Given metric spaces (X,d_X) and (Y,d_Y), an embedding is a function f: X → Y.
For an embedding f and u,v ∈ X, let dist_f(u,v) = d_Y(f(u),f(v)) / d_X(u,v).
Distortion: c = max_{u,v∈X} dist_f(u,v) / min_{u,v∈X} dist_f(u,v).

Special Metric Spaces
• Euclidean space
• l_p metric on R^n: ||x-y||_p = (Σ_{i≤n} |x_i - y_i|^p)^{1/p}
• Planar metrics
• Tree metrics
• Ultrametrics
• Doubling metrics

Embedding in Normed Spaces
[Fréchet Embedding]: Any n-point metric space embeds isometrically in L∞.
Proof: map each x to the vector (d(x,w))_{w∈X}; the triangle inequality gives |d(x,w) - d(y,w)| ≤ d(x,y) for every w, with equality at w = y.
[Bourgain 85]: Any n-point metric space embeds in L_p with distortion Θ(log n).
[Johnson-Lindenstrauss 85]: Any n-point subset of Euclidean space embeds with distortion (1+ε) in dimension Θ(ε⁻² log n).
[ABN 06, B 06]: Dimension Θ(log n); in fact Θ*(log n / loglog n).

Embedding Metrics in their Intrinsic Dimension
Definition: A metric space X has doubling constant λ if any ball of radius r > 0 can be covered by λ balls of half the radius.
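The Fréchet embedding above can be made concrete in a few lines: map each point to its vector of distances to all points, and L∞ distances then equal the original distances exactly. A minimal Python sketch; the 4-point path metric is an illustrative example, not from the talk.

```python
# A minimal sketch of the Frechet embedding: f(x) = (d(x, w))_{w in X}.
# By the triangle inequality |d(x,w) - d(y,w)| <= d(x,y), with equality
# at w = y, so L_infinity distances equal the original distances.
# The 4-point path metric below is illustrative, not from the talk.

def frechet_embed(D):
    """Given an n x n distance matrix D, return the n embedding vectors:
    row i of D is exactly the image of point i."""
    return [row[:] for row in D]

def linf(u, v):
    """L_infinity distance between two vectors."""
    return max(abs(a - b) for a, b in zip(u, v))

# Shortest-path metric of the path graph 0 - 1 - 2 - 3.
D = [[0, 1, 2, 3],
     [1, 0, 1, 2],
     [2, 1, 0, 1],
     [3, 2, 1, 0]]

F = frechet_embed(D)
# Isometry check: distortion 1 on every pair.
assert all(linf(F[i], F[j]) == D[i][j] for i in range(4) for j in range(4))
```

Note the dimension of this embedding is n, which is what the dimension-reduction results quoted above improve on.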
Doubling dimension: dim(X) = log λ.
[ABN 07b]: Any n-point metric space X embeds into L_p with distortion O(log^{1+θ} n) and dimension O(dim(X)).
Same embedding, using: nets; the Lovász Local Lemma; a distortion-dimension tradeoff.

Average Distortion
A practical measure of the quality of an embedding: network embedding, multi-dimensional scaling, biology, vision, ...
Given a non-contracting embedding f: (X,d_X) → (Y,d_Y), with dist_f(u,v) = d_Y(f(u),f(v)) / d_X(u,v):
  avgdist(f) = (1 / (n choose 2)) · Σ_{u≠v∈X} dist_f(u,v)
  distavg(f) = Σ_{u≠v∈X} d_Y(f(u),f(v)) / Σ_{u≠v∈X} d_X(u,v)
[ABN 06]: Every n-point metric space embeds into L_p with average distortion O(1), worst-case distortion Θ(log n), and dimension Θ(log n).

The l_q-Distortion
l_q-distortion: dist_q(f) = ( (1 / (n choose 2)) · Σ_{u≠v∈X} dist_f(u,v)^q )^{1/q}
In particular, dist_1(f) = avgdist(f), and dist_∞(f) = max_{u≠v∈X} dist_f(u,v) is the worst-case distortion.
[ABN 06]: The l_q-distortion is bounded by Θ(q).

Dimension Reduction into Constant Dimension
[B 07]: Any finite subset of Euclidean space embeds in dimension h with l_q-distortion e^{O(q/h)} ≈ 1 + O(q/h).
Corollary: Every finite metric space embeds into L_p in dimension h with l_q-distortion q·e^{O(q/h)}.

Local Embeddings
Def: A k-local embedding has distortion D(k) if for every pair of k-nearest neighbors x,y: dist_f(x,y) ≤ D(k).
[ABN 07c]: For fixed k, a k-local embedding into L_p with distortion Θ(log k) and dimension Θ(log k) (under a very weak growth bound condition).
[ABN 07c]: A k-local embedding into L_p with distortion Õ(log k) on k-nearest neighbors, for all k simultaneously, with dimension Θ(log n).
Same embedding method; Lovász Local Lemma.

Local Dimension Reduction
[BRS 07]: For fixed k, any finite set of points in Euclidean space has a k-local embedding with distortion (1+ε) in dimension Θ(ε⁻² log k) (under a very weak growth bound condition).
New embedding ideas; Lovász Local Lemma.

Time for a...

Metric Ramsey Problem
Given a metric space, what is the largest subspace which has some special structure, e.g.
close to Euclidean.
Graph theory analogue: every graph on n vertices contains either a clique or an independent set of size Θ(log n).
Dvoretzky's theorem...
[BFM 86]: Every n-point metric space contains a subspace of size Ω(c(ε) log n) which embeds in Euclidean space with distortion (1+ε).

Basic Structures: Ultrametric, k-HST [B 96]
A k-HST (hierarchically well-separated tree) is a rooted tree whose vertices carry labels Δ(v) ≥ 0, with Δ(z) = 0 exactly at the leaves and Δ(v) ≤ Δ(u)/k whenever v is a child of u.
It induces an ultrametric on the leaves: d(x,z) = Δ(lca(x,z)).
• An ultrametric k-embeds in a k-HST (moreover, this can be done so that the labels are powers of k).

Properties of Ultrametrics
• An ultrametric is a tree metric.
• Ultrametrics embed isometrically in l₂.
• [BM 04]: Any n-point ultrametric (1+ε)-embeds in l_p^d, where d = O(ε⁻² log n).

A Metric Ramsey Phenomenon
Consider n equally spaced points on the line. Choose a "Cantor-like" set of points and construct a binary tree over them. The resulting tree is a 3-HST, and the original subspace embeds in this tree with distortion 3.
Size of the subspace: 2^{log_3 n} = n^{log_3 2}.

Metric Ramsey Phenomena
[BLMN 03, MN 06, B 06]: Any n-point metric space contains a subspace of size n^{1-ε} which embeds in an ultrametric with distortion Θ(1/ε).
[B 06]: Any n-point metric space contains a subspace of linear size which embeds in an ultrametric with l_q-distortion bounded by Õ(q).
Metric Ramsey theorems. Key ingredient: partitions.

Complete Representation via Ultrametrics?
Goal: Given an n-point metric space, embed it into an ultrametric with low distortion.
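The defining property of an ultrametric, the strong triangle inequality d(x,z) ≤ max(d(x,y), d(y,z)), is easy to verify for the leaf metric of an HST. A minimal sketch on a small hand-built tree; the tree and its labels are illustrative, not from the talk.

```python
from itertools import permutations

# Sketch: the leaf metric of an HST, d(x,z) = label of lca(x,z), is an
# ultrametric, i.e. it satisfies the strong triangle inequality
#   d(x,z) <= max(d(x,y), d(y,z)).
# The 4-leaf tree below (root label 4, two children with label 1,
# leaf pairs {a,b} and {c,d}) is an illustrative 4-HST, not from the talk.

lca_label = {
    frozenset('ab'): 1, frozenset('cd'): 1,
    frozenset('ac'): 4, frozenset('ad'): 4,
    frozenset('bc'): 4, frozenset('bd'): 4,
}

def d(x, y):
    """Ultrametric induced on the leaves by the lca labels."""
    return 0 if x == y else lca_label[frozenset((x, y))]

# Strong triangle inequality over all ordered triples of leaves.
assert all(d(x, z) <= max(d(x, y), d(y, z))
           for x, y, z in permutations('abcd', 3))
```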
Lower bound: Ω(n); in fact this holds even for embedding the n-cycle into arbitrary tree metrics [RR 95].

Probabilistic Embedding
[Karp 89]: The n-cycle C probabilistically embeds in the n line spaces (delete one cycle edge uniformly at random) with distortion 2.
If u,v are adjacent in the cycle C, then E[d_L(u,v)] = (n-1)/n + (n-1)/n < 2 = 2·d_C(u,v).

Probabilistic Embedding
[B 96,98,04, FRT 03]: Any n-point metric space probabilistically embeds into an ultrametric with distortion Θ(log n).
[ABN 05,06, CDGKS 05]: The l_q-distortion is Θ(q).
Key ingredient: probabilistic partitions.

Probabilistic Partitions
P = {S₁,S₂,...,S_t} is a partition of X if S_i ∩ S_j = ∅ for i ≠ j and ∪_i S_i = X.
P(x) is the cluster containing x. P is Δ-bounded if diam(S_i) ≤ Δ for all i.
A probabilistic partition P is a distribution over a set of partitions.
P is (η,δ)-padded if Pr[B(x,ηΔ) ⊆ P(x)] ≥ 1-δ; call P η-padded if δ = ½.
• [B 96]: η = Θ(1/log n).
• [CKR 01 + FRT 03, ABN 06]: η(x) = Ω(1/log ρ(x,Δ)).

Partitions and Embedding [B 96, Rao 99, ...]
Let Δ_i = 4^i be the scales, up to the diameter Δ of X.
For each scale i, create a probabilistic Δ_i-bounded partition P_i that is η-padded.
For each cluster choose σ_i(S) ~ Ber(½) i.i.d., and set
  f_i(x) = σ_i(P_i(x)) · d(x, X \ P_i(x)),   f = Σ_{i≥0} f_i.
Repeat O(log n) times.
Distortion: O(η⁻¹ · log^{1/p} Δ). Dimension: O(log n · log Δ).

Time to...

Uniform Probabilistic Partitions
In a uniform probabilistic partition, η: X → [0,1] gives all points in a cluster the same padding parameter.
[ABN 06] Uniform partition lemma: There exists a uniform probabilistic Δ-bounded partition such that for any cluster C and x ∈ C, η(x) = 1/log ρ(v,Δ), where v minimizes the local growth rate over C.
The local growth rate of x at radius r is ρ(x,r) = |B(x,4r)| / |B(x,r/4)|.

Embedding into a Single Dimension
Let Δ_i = 4^i. For each scale i, create uniformly padded probabilistic Δ_i-bounded partitions P_i.
For each cluster choose σ_i(S) ~ Ber(½) i.i.d., and set
  f_i(x) = σ_i(P_i(x)) · η_i⁻¹(x) · d(x, X \ P_i(x)),   f = Σ_{i≥0} f_i.
1. Upper bound: |f(x)-f(y)| ≤ O(log n)·d(x,y).
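Karp's probabilistic embedding of the cycle can be checked by direct computation: deleting a uniformly random edge turns the cycle into a line, and for an adjacent pair the expected line distance stays below twice the cycle distance. A minimal sketch in exact arithmetic; the helper name is ours, not from the talk.

```python
from fractions import Fraction

# Sketch of [Karp 89]: embed the n-cycle into a random line by deleting
# one cycle edge uniformly at random. For an adjacent pair (u,v) the
# line distance is n-1 when the deleted edge is exactly (u,v) (prob 1/n)
# and 1 otherwise, so
#   E[d_L(u,v)] = (n-1)/n + (n-1)/n < 2 = 2 * d_C(u,v).

def expected_line_distance(n):
    """Expected line distance of an adjacent cycle pair after deleting
    a uniformly random edge of the n-cycle."""
    hit = Fraction(1, n) * (n - 1)   # deleted edge is (u,v) itself
    miss = Fraction(n - 1, n) * 1    # any of the other n-1 edges
    return hit + miss

for n in (3, 10, 100):
    assert expected_line_distance(n) < 2   # distortion < 2 in expectation
print(expected_line_distance(10))  # prints 9/5
```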
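A toy instance of an η-padded, Δ-bounded probabilistic partition: cut the real line into length-Δ intervals at a uniformly random offset. A point's ball B(x, ηΔ) escapes its cluster only if a cut falls within ηΔ of x, which happens with probability 2η, so for η = 0.2 the padding probability is 0.6 ≥ ½. The sketch below estimates this empirically; the point set and parameters are illustrative, not from the talk.

```python
import random

# A toy eta-padded, delta-bounded probabilistic partition of a line
# metric: cut the line into intervals of length delta at a uniformly
# random offset. B(x, eta*delta) stays inside x's cluster unless a cut
# lands within eta*delta of x, which happens with probability 2*eta;
# for eta = 0.2 the padding probability is 0.6 >= 1/2.
# Points and parameters below are illustrative, not from the talk.

def padded_fraction(points, delta, eta, trials=2000, seed=0):
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        offset = rng.uniform(0, delta)

        def cid(x):  # cluster id: which length-delta interval x falls into
            return (x - offset) // delta

        for x in points:
            # the ball fits iff both ball endpoints share x's cluster
            if cid(x - eta * delta) == cid(x) == cid(x + eta * delta):
                ok += 1
    return ok / (trials * len(points))

pts = [i * 0.7 for i in range(20)]
frac = padded_fraction(pts, delta=4.0, eta=0.2)
assert frac > 0.5  # empirically eta-padded with probability >= 1/2
```

General metrics need more care (the CKR-style partitions cited above); the line case only illustrates the padding definition.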
2. Lower bound: E[|f(x)-f(y)|] ≥ Ω(d(x,y)).
3. Replicate D = Θ(log n) times to get high probability.

Upper Bound: |f(x)-f(y)| ≤ O(log n)·d(x,y)
Recall f_i(x) = σ_i(P_i(x)) · η_i⁻¹(x) · d(x, X \ P_i(x)). For all x,y ∈ X:
• P_i(x) ≠ P_i(y) implies f_i(x) ≤ η_i⁻¹(x)·d(x,y);
• P_i(x) = P_i(y) implies f_i(x) - f_i(y) ≤ η_i⁻¹(x)·d(x,y) (using the uniform padding within the cluster).
Hence
  Σ_{i≥0} |f_i(x) - f_i(y)| ≤ d(x,y) · Σ_{i≥0} η_i⁻¹(x) ≤ d(x,y) · Σ_{i≥0} log( |B(x,4Δ_i)| / |B(x,Δ_i/4)| ) = O(log n) · d(x,y),
since the last sum telescopes.

Lower Bound: E[|f(x)-f(y)|] ≥ Ω(d(x,y))
Take a scale i such that Δ_i ≈ d(x,y)/4; it must be that P_i(x) ≠ P_i(y).
With probability ½: η_i⁻¹(x) · d(x, X \ P_i(x)) ≥ Δ_i.
Let R = |Σ_{j<i} (f_j(x) - f_j(y))|. Two cases:
1. R < Δ_i/2: with probability ⅛, σ_i(P_i(x)) = 1 and σ_i(P_i(y)) = 0; then f_i(x) ≥ Δ_i and f_i(y) = 0, so |f(x)-f(y)| ≥ Δ_i/2 = Ω(d(x,y)).
2. R ≥ Δ_i/2: with probability ¼, σ_i(P_i(x)) = 0 and σ_i(P_i(y)) = 0; then f_i(x) = f_i(y) = 0, so |f(x)-f(y)| ≥ Δ_i/2 = Ω(d(x,y)).

Partial Embedding & Scaling Distortion
Definition: A (1-ε)-partial embedding has distortion D(ε) if at least a (1-ε) fraction of the pairs satisfy dist_f(u,v) ≤ D(ε).
Definition: An embedding has scaling distortion D(·) if it is a (1-ε)-partial embedding with distortion D(ε), for all ε > 0 simultaneously. [KSW 04]
[ABN 05, CDGKS 05]: Partial distortion and dimension Θ(log(1/ε)).
[ABN 06]: Scaling distortion Θ(log(1/ε)) for all metrics.

l_q-Distortion vs. Scaling Distortion
Upper bound D(ε) = c·log(1/ε) on the scaling distortion:
• ½ of the pairs have distortion ≤ c·log 2 = c
• ¼ of the pairs have distortion ≤ c·log 4 = 2c
• ⅛ of the pairs have distortion ≤ c·log 8 = 3c
• ...
  avgdist(f) ≤ Σ_{i≥1} 2⁻ⁱ · ic = O(1).
Average distortion = O(1). Worst-case distortion = O(log n). l_q-distortion = O(min{q, log n}).

Coarse Scaling Embedding into L_p
Definition: For u ∈ X, r_ε(u) is the minimal radius such that |B(u, r_ε(u))| ≥ εn.
Coarse scaling embedding: for each u ∈ X, preserve distances to all v such that d(u,v) ≥ r_ε(u).

Scaling Distortion
Claim: If d(x,y) ≥ r_ε(x) then 1 ≤ dist_f(x,y) ≤ O(log(1/ε)).
Let l be the scale such that d(x,y) ≤ Δ_l < 4·d(x,y).
1. Lower bound: E[|f(x)-f(y)|] ≥ Ω(d(x,y)).
2. Upper bound for high-diameter terms: Σ_{i≥l} |f_i(x) - f_i(y)| ≤ O(log(1/ε))·d(x,y).
3. Upper bound for low-diameter terms: Σ_{i<l} |f_i(x) - f_i(y)| ≤ O(1)·d(x,y).
4. Replicate D = Θ(log n) times to get high probability.

Upper Bound for High-Diameter Terms: |f(x)-f(y)| ≤ O(log(1/ε))·d(x,y)
Recall f_i(x) = σ_i(P_i(x)) · η_i⁻¹(x) · d(x, X \ P_i(x)).
Take the scale l such that r_ε(x) ≤ d(x,y) ≤ Δ_l < 4·d(x,y); then |B(x, r_ε(x))| ≥ εn, and
  Σ_{i≥l} |f_i(x) - f_i(y)| ≤ d(x,y) · Σ_{i≥l} η_i⁻¹(x) ≤ d(x,y) · Σ_{i≥l} log( |B(x,4Δ_i)| / |B(x,Δ_i/4)| ) = O(log(1/ε)) · d(x,y).

Upper Bound for Low-Diameter Terms: |f(x)-f(y)| ≤ O(1)·d(x,y)
Here f_i(x) = σ_i(P_i(x)) · min{ η_i⁻¹(x) · d(x, X \ P_i(x)), Δ_i }.
Take the scale l such that d(x,y) ≤ Δ_l < 4·d(x,y). All lower scales i < l are bounded by Δ_i, so
  Σ_{i<l} |f_i(x) - f_i(y)| ≤ 2 · Σ_{i<l} Δ_i < Δ_l = O(d(x,y)).
Altogether: |f(x)-f(y)| ≤ (O(log(1/ε)) + O(1)) · d(x,y).

Embedding into Trees with Constant Average Distortion
[ABN 07a]: An embedding of any n-point metric into a single ultrametric, and an embedding of any graph on n vertices into a spanning tree of the graph, with:
• Average distortion = O(1)
• l₂-distortion = Θ(√(log n))
• l_q-distortion = Θ(n^{1-2/q}), for 2 < q ≤ ∞

Conclusion
Developing a mathematical theory of embeddings of finite metric spaces.
Fruitful interaction between computer science and pure/applied mathematics.
New concepts of embedding yield surprisingly strong properties.

Summary
• Unified framework for embedding finite metrics.
• Probabilistic embedding into ultrametrics.
• Metric Ramsey theorems.
• New measures of distortion.
• Embeddings with strong properties:
  - Optimal scaling distortion.
  - Constant average distortion.
  - Tight distortion-dimension tradeoff.
  - Embedding metrics in their intrinsic dimension.
  - Embeddings that strongly preserve locality.