UCLA IPAM 07
Advances in Metric
Embedding Theory
Yair Bartal
Hebrew University
&
Caltech
Metric Spaces




Metric space: (X,d), d: X×X → R+
d(u,v)=d(v,u)
d(v,w) ≤ d(v,u) + d(u,w)
d(u,u)=0
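The three axioms above can be checked mechanically for a finite distance matrix; a minimal sketch (the helper name and the example matrix are ours, for illustration only):

```python
# Check the metric axioms for a finite distance matrix d[i][j].
# Illustrative helper; not part of the talk.
def is_metric(d):
    n = len(d)
    for u in range(n):
        if d[u][u] != 0:                         # d(u,u) = 0
            return False
        for v in range(n):
            if d[u][v] != d[v][u]:               # symmetry: d(u,v) = d(v,u)
                return False
            for w in range(n):
                if d[v][w] > d[v][u] + d[u][w]:  # triangle inequality
                    return False
    return True

# Three points on a line at positions 0, 1, 3.
d = [[0, 1, 3],
     [1, 0, 2],
     [3, 2, 0]]
print(is_metric(d))  # True
```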
 Data Representation: Pictures (e.g.
faces), web pages, DNA sequences, …
 Network: communication distance
Metric Embedding
• Simple Representation: Translate metric
data into an easy-to-analyze form, gaining
geometric structure: e.g. embed in low-dimensional Euclidean space
 Algorithmic Application: Apply
algorithms for a “nice” space to solve
problem on “problematic” metric spaces
Embedding Metric Spaces
• Metric spaces (X,dX), (Y,dY)
• Embedding is a function f:X→Y
• For an embedding f, given u,v in X, let
    dist_f(u,v) = dY(f(u),f(v)) / dX(u,v)
• Distortion:
    dist(f) = max_{u,v∈X} dist_f(u,v) / min_{u,v∈X} dist_f(u,v)
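As a sketch of the definition (function names and the toy metric are ours), the distortion is the worst expansion ratio over the worst contraction ratio:

```python
from itertools import combinations

def distortion(points, dX, dY, f):
    """dist(f) = max dist_f(u,v) / min dist_f(u,v) over all pairs."""
    ratios = [dY(f(u), f(v)) / dX(u, v) for u, v in combinations(points, 2)]
    return max(ratios) / min(ratios)

# Example: the uniform metric on 3 points (all pairs at distance 1),
# embedded into the real line at 0, 1, 2.
pts = [0, 1, 2]
dX = lambda u, v: 1.0
dY = lambda a, b: abs(a - b)
f = {0: 0.0, 1: 1.0, 2: 2.0}.get
print(distortion(pts, dX, dY, f))  # 2.0 (pair ratios are 1, 2, 1)
```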
Special Metric Spaces
• Euclidean space
• lp metric in Rn: ||x − y||_p = ( Σ_{i≤n} |x_i − y_i|^p )^{1/p}
• Planar metrics
• Tree metrics
• Ultrametrics
• Doubling
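The lp metric can be illustrated directly (the helper name is ours; p = ∞ gives the max-coordinate metric used in the next slide):

```python
def lp_dist(x, y, p):
    """lp metric on R^n: ||x - y||_p = (sum |x_i - y_i|^p)^(1/p)."""
    if p == float("inf"):
        return max(abs(a - b) for a, b in zip(x, y))
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

x, y = (0.0, 0.0), (3.0, 4.0)
print(lp_dist(x, y, 2))             # 5.0 (Euclidean)
print(lp_dist(x, y, 1))             # 7.0
print(lp_dist(x, y, float("inf")))  # 4.0
```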
Embedding in Normed
Spaces
• [Fréchet Embedding]: Any n-point
metric space embeds isometrically in L∞
• Proof: map each point x to the vector (d(x,w))_{w∈X};
by the triangle inequality |d(x,w) − d(y,w)| ≤ d(x,y),
with equality at w = y, so the L∞ distance equals d(x,y).
[figure: points w, x, y]
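The Fréchet embedding is short enough to verify directly; a minimal sketch (helper names and the example metric are ours):

```python
from itertools import combinations

def frechet_embed(points, d):
    """Frechet embedding: x -> (d(x,w) for all w in X), isometric into L_inf."""
    return {x: [d[x][w] for w in points] for x in points}

def linf(a, b):
    return max(abs(ai - bi) for ai, bi in zip(a, b))

points = [0, 1, 2]
d = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]  # three points on a line at 0, 1, 3
f = frechet_embed(points, d)
# |d(x,w) - d(y,w)| <= d(x,y) for all w, with equality at w = y,
# so the L_inf distance exactly equals the original distance.
assert all(linf(f[x], f[y]) == d[x][y] for x, y in combinations(points, 2))
```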
Embedding in Normed
Spaces
• [Bourgain 85]: Any n-point metric space
embeds in Lp with distortion Θ(log n)
• [Johnson-Lindenstrauss 85]: Any n-point
subset of Euclidean space embeds with
distortion (1+ε) in dimension Θ(ε⁻² log n)
• [ABN 06, B 06]: Dimension Θ(log n)
In fact: Θ*(log n / loglog n)
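The Johnson-Lindenstrauss construction amounts to a scaled random Gaussian projection. A minimal sketch (not the dimension-optimal constructions cited above; names and the sample points are ours):

```python
import math
import random

def jl_project(points, k, seed=0):
    """Sketch of a Johnson-Lindenstrauss map: multiply by a random
    Gaussian k x dim matrix scaled by 1/sqrt(k). With k = Theta(eps^-2 log n)
    pairwise Euclidean distances are preserved up to (1+eps) w.h.p."""
    rng = random.Random(seed)
    dim = len(points[0])
    R = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
    s = 1.0 / math.sqrt(k)
    return [[s * sum(row[j] * p[j] for j in range(dim)) for row in R]
            for p in points]

pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
proj = jl_project(pts, k=4)  # 3 points mapped from R^3 into R^4
print(len(proj), len(proj[0]))  # 3 4
```

The map is linear, so the origin maps to the zero vector; the distance guarantee is only probabilistic over the choice of R.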
Embedding Metrics in their
Intrinsic Dimension
• Definition: A metric space X has doubling
constant λ if any ball with radius r>0 can be
covered by λ balls of half the radius.
• Doubling dimension: dim(X) = log λ
• [ABN 07b]: Any n-point metric space X can be
embedded into Lp with distortion O(log^{1+θ} n)
and dimension O(dim(X))
• Same embedding, using:
  - nets
  - Lovász Local Lemma
• Distortion-dimension tradeoff
Average Distortion
• Practical measure of the quality of an embedding
• Network embedding, multi-dimensional scaling, biology,
vision, …
• Given a non-contracting embedding
f:(X,dX)→(Y,dY):
    dist_f(u,v) = dY(f(u),f(v)) / dX(u,v)
    avgdist(f) = (1/(n choose 2)) · Σ_{u,v∈X} dist_f(u,v)
    distavg(f) = Σ_{u,v∈X} dY(f(u),f(v)) / Σ_{u,v∈X} dX(u,v)
• [ABN 06]: Every n-point metric space embeds into Lp
with average distortion O(1), worst-case distortion Θ(log n)
and dimension Θ(log n).
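The two averages above differ: avgdist averages the per-pair ratios, distavg takes the ratio of summed distances. A small sketch (helper names and the toy metric are ours):

```python
from itertools import combinations

def distortions(points, dX, dY, f):
    """Return (avgdist(f), distavg(f)) for a non-contracting embedding f."""
    pairs = list(combinations(points, 2))
    # avgdist: average of the per-pair distortion ratios
    ratios = [dY(f(u), f(v)) / dX(u, v) for u, v in pairs]
    avgdist = sum(ratios) / len(pairs)
    # distavg: total embedded distance over total original distance
    distavg = (sum(dY(f(u), f(v)) for u, v in pairs)
               / sum(dX(u, v) for u, v in pairs))
    return avgdist, distavg

# Uniform metric on 3 points, embedded into the line at 0, 1, 2.
pts = [0, 1, 2]
dX = lambda u, v: 1.0
dY = lambda a, b: abs(a - b)
f = {0: 0.0, 1: 1.0, 2: 2.0}.get
print(distortions(pts, dX, dY, f))  # both equal 4/3 here
```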
The lq-Distortion
• lq-distortion:
    dist_q(f) = ( (1/(n choose 2)) · Σ_{u≠v∈X} dist_f(u,v)^q )^{1/q}
    dist_∞(f) = max_{u≠v∈X} dist_f(u,v)
• Special cases:
    dist_1(f) = (1/(n choose 2)) · Σ_{u≠v∈X} dist_f(u,v)
    dist_2(f) = ( (1/(n choose 2)) · Σ_{u≠v∈X} dist_f(u,v)² )^{1/2}
• [ABN 06]: lq-distortion is bounded by Θ(q)
Dimension Reduction into
Constant Dimension
• [B 07]: Any finite subset of Euclidean
space embeds in dimension h with
lq-distortion e^{O(q/h)} ≈ 1 + O(q/h)
• Corollary: Every finite metric space
embeds into Lp in dimension h with
lq-distortion q · e^{O(q/h)} · h^{1/2−1/p}
Local Embeddings
• Def: A k-local embedding has distortion D(k) if
distf(x,y) ≤ D(k) for every pair x,y of k-nearest neighbors
• [ABN 07c]: For fixed k, k-local embedding into Lp with
distortion Θ(log k) and dimension Θ(log k) (under a
very weak growth-bound condition)
• [ABN 07c]: k-local embedding into Lp with
distortion Õ(log k) on neighbors, for all k
simultaneously, and dimension Θ(log n)
• Same embedding method
• Lovász Local Lemma
Local Dimension Reduction
• [BRS 07]: For fixed k, any finite set of
points in Euclidean space has a k-local
embedding with distortion (1+ε) in
dimension Θ(ε⁻² log k) (under a very weak
growth-bound condition)
• New embedding ideas
• Lovász Local Lemma
Time for a…
Metric Ramsey Problem
• Given a metric space, what is the largest
subspace with some special
structure, e.g. close to being Euclidean?
• Graph theory: Every graph of size n
contains either a clique or an independent
set of size Θ(log n)
• Dvoretzky's theorem…
• [BFM 86]: Every n-point metric space
contains a subspace of size Ω(c_ε log n)
which embeds in Euclidean space with
distortion (1+ε)
Basic Structures:
Ultrametric, k-HST [B 96]
[figure: a labeled tree with internal labels Δ(u), Δ(v), Δ(w) decreasing
by a factor of at least k from parent to child, leaf label Δ(z) = 0;
the distance is d(x,z) = Δ(lca(x,z)) = Δ(v)]
• An ultrametric k-embeds in a k-HST (moreover this
can be done so that labels are powers of k).
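An ultrametric is exactly a metric satisfying the strong triangle inequality, which can be checked directly; a minimal sketch (helper name and the example tree are ours):

```python
from itertools import permutations

def is_ultrametric(d):
    """Strong triangle inequality: d(x,z) <= max(d(x,y), d(y,z)) for all triples."""
    n = len(d)
    return all(d[x][z] <= max(d[x][y], d[y][z])
               for x, y, z in permutations(range(n), 3))

# Distances from lca labels in a 2-level tree: leaves {0,1} share a
# subtree labeled 1; the root (their lca with leaf 2) is labeled 4.
d = [[0, 1, 4],
     [1, 0, 4],
     [4, 4, 0]]
print(is_ultrametric(d))  # True
```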
Hierarchically Well-Separated Trees
[figure: example trees with node labels 1, 2, 3 illustrating the
factor-k separation between consecutive levels]
Properties of Ultrametrics
• An ultrametric is a tree metric.
• Ultrametrics embed isometrically in l2.
• [BM 04]: Any n-point ultrametric (1+ε)-embeds
in lp^d, where d = O(ε⁻² log n).
A Metric Ramsey
Phenomenon
• Consider n equally spaced points on the line.
• Choose a "Cantor-like" set of points, and
construct a binary tree over them.
• The resulting tree is a 3-HST, and the original
subspace embeds in this tree with distortion 3.
• Size of subspace: 2^{log_3 n} = n^{log_3 2}.
Metric Ramsey
Phenomena
• [BLMN 03, MN 06, B 06]: Any n-point
metric space contains a subspace of size
n^{1−ε} which embeds in an ultrametric with
distortion Θ(1/ε)
• [B 06]: Any n-point metric space contains
a subspace of linear size which embeds in
an ultrametric with lq-distortion bounded
by Õ(q)
Metric Ramsey Theorems
 Key Ingredient: Partitions
Complete Representation
via Ultrametrics?
• Goal: Given an n-point metric space, we
would like to embed it into an ultrametric
with low distortion.
• Lower bound: Ω(n); in fact this holds even
for embedding the n-cycle into arbitrary tree
metrics [RR 95]
Probabilistic Embedding
• [Karp 89]: The n-cycle probabilistically
embeds into its n spanning lines (paths) with distortion 2
• If u,v are adjacent in the cycle C then
E(dL(u,v)) = ((n−1)/n)·1 + (1/n)·(n−1) = 2(n−1)/n < 2 = 2·dC(u,v)
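Karp's calculation (delete a uniformly random edge of the cycle and measure along the remaining path) can be checked exactly; a minimal sketch (helper name is ours):

```python
from fractions import Fraction

def expected_path_distance(n, u, v):
    """E[d_L(u,v)] when one uniformly random edge of the n-cycle is
    deleted and distance is measured along the remaining path.
    Illustrates [Karp 89]; the implementation is ours."""
    cw = (v - u) % n  # clockwise distance from u to v
    total = Fraction(0)
    for e in range(n):  # delete the edge between e and e+1 (mod n)
        # If the deleted edge lies on the clockwise route, go the long way.
        on_cw_route = (e - u) % n < cw
        dist = n - cw if on_cw_route else cw
        total += Fraction(dist, n)
    return total

# Adjacent vertices on the 5-cycle: E = (n-1)/n + (n-1)/n = 8/5 < 2 = 2*d_C(u,v)
print(expected_path_distance(5, 0, 1))  # 8/5
```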
Probabilistic Embedding
 [B 96,98,04, FRT 03]: Any n-point metric
space probabilistically embeds into an
ultrametric with distortion Θ(log n)
[ABN 05,06, CDGKS 05]:
lq-distortion is Θ(q)
Probabilistic Embedding
 Key Ingredient: Probabilistic Partitions
Probabilistic Partitions
• P = {S1,S2,…,St} is a partition of X if
Si ∩ Sj = ∅ for i ≠ j, and ∪i Si = X
• P(x) is the cluster containing x.
• P is Δ-bounded if diam(Si) ≤ Δ for all i.
• A probabilistic partition P̂ is a distribution over a set
of partitions.
• P̂ is (η,δ)-padded if for every x:
    Pr[ B(x, ηΔ) ⊆ P(x) ] ≥ δ
• Call P̂ η-padded if δ = 1/2.
• [B 96]: η = Θ(1/log n)
• [CKR01+FRT03, ABN06]: η(x) = Ω(1/log ρ(x,Δ))
[figure: points x1, x2 with padded balls inside their clusters]
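A Δ-bounded random partition in the CKR style can be sketched in a few lines (names and the example points are ours; this illustrates only the Δ-boundedness, not the padding analysis):

```python
import random

def ckr_partition(points, d, Delta, rng):
    """Sketch of a CKR-style Delta-bounded random partition: draw a
    radius uniformly from [Delta/4, Delta/2] and a random order on the
    points; each point joins the first point in the order within the
    radius. Clusters have diameter <= 2r <= Delta."""
    r = rng.uniform(Delta / 4, Delta / 2)
    order = list(points)
    rng.shuffle(order)
    # Every point is within distance r of itself, so assignment succeeds.
    return {x: next(c for c in order if d[x][c] <= r) for x in points}

# 8 points on a line, Delta = 4: every cluster has diameter <= 4.
pts = list(range(8))
d = [[abs(i - j) for j in pts] for i in pts]
P = ckr_partition(pts, d, Delta=4, rng=random.Random(1))
assert all(d[x][y] <= 4 for x in pts for y in pts if P[x] == P[y])
```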
Partitions and Embedding
• [B 96, Rao 99, …]
• Let Δi = 4^i be the scales (Δ = diam(X)).
• For each scale i, create a probabilistic Δi-bounded
partition Pi that is η-padded.
• For each cluster choose σi(S) ~ Ber(½) i.i.d.
    fi(x) = σi(Pi(x)) · d(x, X\Pi(x))
    f(x) = Σ_{i≥0} fi(x)
• Repeat O(log n) times.
• Distortion: O(η⁻¹ · log^{1/p} Δ).
• Dimension: O(log n · log Δ).
[figure: nested partitions at scales 16 and 4; padded distance d(x, X\P(x))]
Time to…
Uniform Probabilistic Partitions
• In a uniform probabilistic partition, η:X→[0,1] assigns all points
in a cluster the same padding parameter.
• [ABN 06] Uniform partition lemma: There exists a
uniform probabilistic Δ-bounded partition such that for
any cluster C and x ∈ C, η(x) = 1/log ρ(v,Δ),
where v = argmin_{u∈C} ρ(u,Δ)
• The local growth rate of x at radius r is:
    ρ(x,r) = |B(x,4r)| / |B(x,r/4)|
[figure: clusters C1, C2 with minimizing points v1, v2, v3 and
padding parameters η(C1), η(C2)]
Embedding
into a single dimension
• Let Δi = 4^i.
• For each scale i, create uniformly padded
probabilistic Δi-bounded partitions Pi.
• For each cluster choose σi(S) ~ Ber(½) i.i.d.
    fi(x) = σi(Pi(x)) · ηi⁻¹(x) · d(x, X\Pi(x))
    f(x) = Σ_{i≥0} fi(x)
1. Upper bound: |f(x)−f(y)| ≤ O(log n)·d(x,y).
2. Lower bound: E[|f(x)−f(y)|] ≥ Ω(d(x,y)).
3. Replicate D = Θ(log n) times to get high probability.
Upper Bound:
|f(x)−f(y)| ≤ O(log n) d(x,y)
    fi(x) = σi(Pi(x)) · ηi⁻¹(x) · d(x, X\Pi(x))
• For all x,y ∈ X:
  - Pi(x) ≠ Pi(y) implies fi(x) ≤ ηi⁻¹(x)·d(x,y)
  - Pi(x) = Pi(y) implies fi(x) − fi(y) ≤ ηi⁻¹(x)·d(x,y)
    Σ_{i≥0} |fi(x) − fi(y)| ≤ d(x,y) · Σ_{i≥0} ηi⁻¹(x)
      = d(x,y) · Σ_{i≥0} log( |B(x,4Δi)| / |B(x,Δi/4)| )
      ≤ O(log n) · d(x,y)
(uses the uniform padding within a cluster, so the sum telescopes)
Lower Bound:
E[|f(x)−f(y)|] ≥ Ω(d(x,y))
• Take a scale i such that Δi ≈ d(x,y)/4.
• It must be that Pi(x) ≠ Pi(y).
• With probability ½: ηi⁻¹(x)·d(x, X\Pi(x)) ≥ Δi
• Two cases, for R = | Σ_{j≠i} ( fj(x) − fj(y) ) |:
1. R < Δi/2: with prob. ⅛, σi(Pi(x)) = 1 and σi(Pi(y)) = 0;
then fi(x) ≥ Δi and fi(y) = 0, so
|f(x)−f(y)| ≥ Δi/2 = Ω(d(x,y)).
2. R ≥ Δi/2: with prob. ¼, σi(Pi(x)) = 0 and σi(Pi(y)) = 0;
then fi(x) = fi(y) = 0, so
|f(x)−f(y)| ≥ Δi/2 = Ω(d(x,y)).
Partial Embedding &
Scaling Distortion
• Definition: A (1−ε)-partial embedding has distortion D(ε) if
at least a (1−ε) fraction of the pairs satisfy distf(u,v) ≤ D(ε)
• Definition: An embedding has scaling distortion D(·) if it is
a (1−ε)-partial embedding with distortion D(ε), for all ε>0
• [KSW 04]
• [ABN 05, CDGKS 05]:
  Partial distortion and dimension Θ(log(1/ε))
• [ABN 06]: Scaling distortion Θ(log(1/ε)) for all
metrics
lq-Distortion vs.
Scaling Distortion
• Upper bound D(ε) = c·log(1/ε) on scaling distortion:
  ½ of pairs have distortion ≤ c·log 2 = c
  + ¼ of pairs have distortion ≤ c·log 4 = 2c
  + ⅛ of pairs have distortion ≤ c·log 8 = 3c
  + …
    avgdist ≤ Σ_{i≥1} 2⁻i · ic = 2c
• Average distortion = O(1)
• Worst-case distortion = O(log n)
• lq-distortion = O(min{q, log n})
Coarse Scaling Embedding
into Lp
• Definition: For u∈X,
rε(u) is the minimal
radius such that
|B(u,rε(u))| ≥ εn.
• Coarse scaling
embedding: for each
u∈X, preserves
distances to every v s.t.
d(u,v) ≥ rε(u).
[figure: balls of radius rε(u), rε(v), rε(w) around points u, v, w]
Scaling Distortion
• Claim: If d(x,y) ≥ rε(x) then 1 ≤ distf(x,y) ≤ O(log 1/ε)
• Let l be the scale with d(x,y) ≤ Δl < 4d(x,y)
1. Lower bound: E[|f(x)−f(y)|] ≥ Ω(d(x,y))
2. Upper bound for high-diameter terms:
    Σ_{i≥l} |fi(x) − fi(y)| ≤ O(log 1/ε) · d(x,y)
3. Upper bound for low-diameter terms:
    Σ_{i<l} |fi(x) − fi(y)| ≤ O(1) · d(x,y)
4. Replicate D = Θ(log n) times to get high probability.
Upper Bound for high-diameter terms:
|f(x)−f(y)| ≤ O(log 1/ε) d(x,y)
    fi(x) = σi(Pi(x)) · ηi⁻¹(x) · d(x, X\Pi(x))
• Scale l such that rε(x) ≤ d(x,y) ≤ Δl < 4d(x,y),
so |B(x, rε(x))| ≥ εn.
    Σ_{i≥l} |fi(x) − fi(y)| ≤ d(x,y) · Σ_{i≥l} ηi⁻¹(x)
      = d(x,y) · Σ_{i≥l} log( |B(x,4Δi)| / |B(x,Δi/4)| )
      ≤ O(log 1/ε) · d(x,y)
Upper Bound for low-diameter terms:
|f(x)−f(y)| ≤ O(1) d(x,y)
    fi(x) = σi(Pi(x)) · min{ ηi⁻¹(x) · d(x, X\Pi(x)), Δi }
• Scale l such that d(x,y) ≤ Δl < 4d(x,y).
• All lower levels i < l are bounded by Δi:
    Σ_{i<l} |fi(x) − fi(y)| ≤ 2·Σ_{i<l} Δi ≤ Δ_{l+1} = O(d(x,y))
⇒ |f(x)−f(y)| ≤ ( O(log 1/ε) + O(1) ) · d(x,y)
Embedding into trees with
Constant Average Distortion
• [ABN 07a]: An embedding of any n-point
metric into a single ultrametric.
• An embedding of any graph on n
vertices into a spanning tree of the
graph.
• Average distortion = O(1).
• l2-distortion = Θ(√(log n))
• lq-distortion = Θ(n^{1−2/q}), for 2<q≤∞
Conclusion
 Developing mathematical theory of
embedding of finite metric spaces
 Fruitful interaction between
computer science and pure/applied
mathematics
 New concepts of embedding yield
surprisingly strong properties
Summary
• Unified framework for embedding finite metrics.
• Probabilistic embedding into ultrametrics.
• Metric Ramsey theorems.
• New measures of distortion.
• Embeddings with strong properties:
  - Optimal scaling distortion.
  - Constant average distortion.
  - Tight distortion-dimension tradeoff.
  - Embedding metrics in their intrinsic dimension.
  - Embeddings that strongly preserve locality.