TREE-LIKE STRUCTURE IN GRAPHS AND EMBEDABILITY TO TREES
A dissertation submitted to
Kent State University in partial
fulfillment of the requirements for the
degree of Doctor of Philosophy
by
Muad Mustafa Abu-Ata
May 2014
Dissertation written by
Muad Mustafa Abu-Ata
B.S., Yarmouk University, 2000
M.Sc., Yarmouk University, 2003
Ph.D., Kent State University, 2014
Approved by
Dr. Feodor F. Dragan, Chair, Doctoral Dissertation Committee
Dr. Ruoming Jin, Dr. Ye Zhao, Dr. Artem Zvavitch, Members, Doctoral Dissertation Committee
Accepted by
Dr. Javed Khan, Chair, Department of Computer Science
Dr. James L. Blank, Dean, College of Arts and Sciences
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
Acknowledgements
Dedication
1 Introduction
  1.1 Research contribution
  1.2 Publication notes
  1.3 Preliminaries and Notations
    1.3.1 Tree-decomposition
  1.4 Related work
    1.4.1 Low distortion embedding
    1.4.2 Embedding into a metric of a (weighted) tree
    1.4.3 Tree spanners
    1.4.4 Sparse spanners
    1.4.5 Collective tree spanners
    1.4.6 Spanners with bounded tree-width
2 Metric tree-like structures in real-life networks: an empirical study
  2.1 Introduction
  2.2 Datasets
  2.3 Layering Partition, its Cluster-Diameter and Cluster-Radius
  2.4 Hyperbolicity
  2.5 Tree-Distortion
  2.6 Tree-Breadth, Tree-Length and Tree-Stretch
  2.7 Use of Metric Tree-Likeness
    2.7.1 Approximate distance queries
    2.7.2 Approximating optimal routes
    2.7.3 Approximating diameter and radius
  2.8 Conclusion
3 Collective Additive Tree Spanners and the Tree-Breadth of a Graph with Consequences
  3.1 Introduction
  3.2 Collective Additive Tree Spanners and the Tree-Breadth of a Graph
  3.3 Hierarchical decomposition of a graph with bounded tree-breadth
  3.4 Construction of collective additive tree spanners
  3.5 Additive spanners for graphs admitting (multiplicative) tree t-spanners
4 Collective Additive Tree Spanners of Graphs with Bounded k-Tree-Breadth, k ≥ 2
  4.1 Introduction
  4.2 Balanced separators for graphs with bounded k-tree-breadth
  4.3 Decomposition of a graph with bounded k-tree-breadth
  4.4 Construction of a hierarchical tree
  4.5 Construction of collective additive tree spanners
  4.6 Additive Spanners for Graphs Admitting (Multiplicative) t-Spanners of Bounded Tree-width
    4.6.1 k-Tree-breadth of a graph admitting a t-spanner of bounded tree-width
    4.6.2 Consequences
5 Embedding of Weighted Graphs into Trees: Theoretical Grounds and Empirical Analysis on Real Datasets
  5.1 Layering partition for weighted graphs
  5.2 Properties of layering partition for weighted graphs
  5.3 Construction of tree embedding
  5.4 Experiment
    5.4.1 Datasets
    5.4.2 Layering partition results
    5.4.3 Non-contractive embedding results
    5.4.4 Edge subdivision (h ≤ w)
    5.4.5 Contractive embedding: weighting clusters with their own diameters
    5.4.6 Embedding with recursive partitioning of clusters
6 Conclusion and Future Work
BIBLIOGRAPHY
LIST OF FIGURES
1 A graph and its tree-decomposition of width 3, of length 3, and of breadth 2.
2 Layering partition and associated constructs.
3 Illustration to the proof of Proposition 3.
4 Embedding into trees H, Hℓ and Hℓ′.
5 Illustration to the proof of Proposition 9.
6 Distortion distribution for embedding of a graph dataset into its canonic tree H.
7 Four tree-likeness measurements scaled.
8 Tree-likeness measurements: pairwise comparison.
9 A graph G with a disk-separator D_r(v, G) and the corresponding graphs G_1^+, ..., G_4^+
  obtained from G. c_1, ..., c_4 are meta vertices representing the disk D_r(v, G) in the
  corresponding graphs.
10 a) A graph G and its balanced disk-separator D_1(13, G). b) A hierarchical tree H(G) of G.
  We have G = G(↓Y^0), Y^0 = D_1(13, G). Meta vertices are shown circled, disk centers are
  shown in bold. c) The graph G(↓Y^1) with its balanced disk-separator
  D_1(23, G(↓Y^1)) = Y^1. G(↓Y^1) is a minor of G(↓Y^0). d) The graph G(↓Y^2), a minor of
  G(↓Y^1) and of G(↓Y^0). Y^2 = V(G(↓Y^2)) is a leaf of H(G).
11 Illustration to the proof of Lemma 4: “unfolding” meta vertices.
12 Illustration to the proof of Lemma 7.
13 A graph G with a balanced D_r^3-separator and the corresponding graphs G_1^+, ..., G_4^+
  obtained from G. Each G_i^+ has three meta vertices representing the three disks.
14 Illustration to the proof of Lemma 14. A tree-decomposition for G is obtained from a
  tree-decomposition of H.
15 A layering partition of a weighted graph G.
16 Illustration of proof of Lemma 17.
17 Cluster-width versus average distortion, maximum distortion and number of dummy vertices
  for the Celegans dataset.
18 Cluster-width versus average distortion, maximum distortion and number of dummy vertices
  for the CornellKing dataset.
LIST OF TABLES
1 Known results on approximate embedding problems for multiplicative distortion; λ is used
  to denote the optimal distortion and n to denote the number of points in the input metric.
  The table contains only the results that hold for the multiplicative definition of the
  distortion; there is a rich body of work that applies to other definitions of distortion,
  notably the additive or average distortion, see [17] for an overview.
2 Graph datasets and their parameters: number of vertices, number of edges, diameter, radius.
3 Layering partitions of the datasets and their parameters. Δ_s(G) is the largest diameter of
  a cluster in LP(G, s), where s is a randomly selected start vertex. For all datasets, the
  average diameter of a cluster is between 0 and 1. For most datasets, more than 95% of
  clusters are cliques.
4 Frequency of diameters of clusters in layering partition LP(G, s) (three datasets).
5 δ-hyperbolicity of the graph datasets.
6 Relative frequency of δ-hyperbolicity of quadruplets in our graph datasets that have less
  than 10K vertices.
7 Distortion results of embedding datasets into a canonic tree H.
8 Distortion results of non-contractive embedding of datasets into trees Hℓ and Hℓ′.
9 Lower and upper bounds on the tree-breadth of our graph datasets.
10 Estimation of diameters and radii.
11 Summary of tree-likeness measurements.
12 Real datasets parameters: n: the number of vertices, m: the number of edges, the largest
  edge weight, the smallest edge weight and the diameter of the graph.
13 Layering partitions of the datasets and their parameters. h is the cluster-width of
  LP(s, h) and set equal to the longest edge weight. s is a randomly selected start vertex.
14 Distortion results for non-contractive embedding of the datasets into tree H.
  Cluster-width is equal to the largest edge weight (h = w).
15 Distortion results for non-contractive embedding of the datasets into tree H.
  Cluster-width is less than or equal the largest edge weight (h ≤ w).
16 Distortion results for embedding of the datasets into tree H′. Edges inside each cluster C
  are weighted equal to diam(C)/2.
17 Percentage of vertex pairs with distortion up to a given value by embedding datasets into
  tree H′ with own diameter weighting.
18 Distortion results for embedding with P-centers partitioning for datasets into tree H′.
  P-centers has negligible improvement of distortion for other datasets of table 12.
Acknowledgements
I would like to express my deepest gratitude to my research advisor, Dr.
Feodor F. Dragan, for mentoring me during my PhD study and research. I have learned a
lot from him. Without his persistent help, patience and guidance, this dissertation would
not have materialized. I cannot thank him enough for his sincere and overwhelming help
and support. Also, I would like to thank my dissertation committee, Dr. Ruoming Jin,
Dr. Ye Zhao and Dr. Artem Zvavitch, for their participation, comments and feedback.
Finally, I would like to thank the faculty and staff of the Department of Computer Science
at Kent State University for their help and support.
This dissertation is dedicated to the memory of my mother, Hajar Ibdah. Her endless
love, care and support have sustained me throughout my life. Her passion, strength and
faith are the greatest lessons in my life.
CHAPTER 1
Introduction
The problem of embedding a graph metric into a “nice” and “simpler” metric space
with low distortion has been a subject of extensive research, motivated both by
applications in various domains and by its intrinsic mathematical interest. “Nice” metric
spaces are those with well-studied structural properties that allow the design of efficient
approximation algorithms, such as Euclidean or ℓ_1 space, lines, weighted trees and
distributions over them. A very incomplete list of applications includes approximation
algorithms for graph and network problems, such as sparsest cut [14, 126], minimum
bandwidth [34, 89], low-diameter decompositions [126], buy-at-bulk network design [16],
distance and routing labeling schemes [77, 79, 102, 164], and optimal group Steiner trees
[48, 93], as well as online algorithms for metrical task systems and file migration problems
[24, 26]. These applications, together with the intrinsic mathematical interest of the
problem, have made the study of low-distortion embeddings of graphs a significant field in
its own right. However, obtaining approximation algorithms for minimum distortion
embeddings into certain host spaces (e.g., R^d, d ≥ 1) has been a notoriously hard problem
(see [17, 128] and papers cited therein). Therefore, a host metric of choice, also favored
from the algorithmic point of view, is a simple graph metric.
As mentioned earlier, tree metrics are a very natural class of simple graph
metrics since many algorithmic problems become tractable on them. Ideally, we would like
distances in the tree metric to be no smaller than those in the original metric, and we
would like to bound the distortion, i.e., the maximum increase. Formally, a multiplicative
embedding of a graph G = (V, E) into a weighted tree (possibly with Steiner vertices)
T = (V ∪ S, F) is an embedding such that d_G(u, v) ≤ d_T(u, v) ≤ λ · d_G(u, v) for all
u, v ∈ V. The parameter λ is called the tree-distortion. Analogously, an additive embedding
of a graph G = (V, E) into a weighted tree (possibly with Steiner vertices) T = (V ∪ S, F)
is an embedding such that d_G(u, v) ≤ d_T(u, v) ≤ d_G(u, v) + r for all u, v ∈ V.
The study of tree metrics can be traced back to the beginning of the 20th century,
when it was first realized that weighted trees can in some cases serve as an (approximate)
model for the description of evolving systems. More recently, as indicated in [153], it
was observed that certain Internet-originated metrics display tree-like properties. It is
well known [151] that tree metrics have a simple structure: d is a tree metric if and
only if all submetrics of d of size 4 are tree metrics. Moreover, the underlying tree is
unique, easily reconstructible, and has a rigid local structure corresponding to the local
structure of d. But what about the structure of metrics that are only approximately tree
metrics? We have only partial answers to this question, and yet what we already know seems
to indicate that a rich theory might well be hiding there.
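The size-4 characterization is commonly stated as the four-point condition: for every four points x, y, z, w, the two largest of the sums d(x, y) + d(z, w), d(x, z) + d(y, w) and d(x, w) + d(y, z) are equal. The following is a minimal Python sketch of that test on a finite metric. The function name `is_tree_metric`, the nested-dictionary distance table and the tolerance `tol` are illustrative assumptions, not notation from this dissertation.

```python
from itertools import combinations


def is_tree_metric(points, d, tol=1e-9):
    """Four-point test on a finite metric given as d[x][y] (assumed symmetric)."""
    for x, y, z, w in combinations(points, 4):
        # The three ways to split the quadruple into two pairs.
        sums = sorted([d[x][y] + d[z][w], d[x][z] + d[y][w], d[x][w] + d[y][z]])
        # In a tree metric the two largest sums coincide.
        if sums[2] - sums[1] > tol:
            return False
    return True


# Shortest-path metrics of a 4-vertex path (a tree) and of a 4-cycle.
path = {i: {j: abs(i - j) for j in range(4)} for i in range(4)}
cycle = {i: {j: min((i - j) % 4, (j - i) % 4) for j in range(4)} for i in range(4)}
```

On these two inputs the path metric passes the test and the 4-cycle metric fails it, which matches the remark that cycles are the standard obstruction to tree-likeness.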
In distributed systems and communication networks, an important requirement is
that a host network (graph) S be a subgraph of the original network G (each link
present in S must be present in G as well). This leads to the notion of spanners.
If we require the host graph to be not an arbitrary tree but a spanning tree of
the original graph, we obtain the well-known notion of a tree t-spanner. For t ≥ 1, a
(multiplicative) tree t-spanner of a graph G = (V, E) is a spanning tree T = (V, E′ ⊆ E)
such that the distance between every pair of vertices in T is at most t times their distance
in G, i.e., d_T(u, v) ≤ t · d_G(u, v) for all u, v ∈ V [44]. The parameter t is called the
stretch (or stretch factor) of T. For r ≥ 0, an additive tree r-spanner of G is
a spanning tree T = (V, E′ ⊆ E) such that d_T(u, v) ≤ d_G(u, v) + r for all u, v ∈ V
[146]. The parameter r is called the surplus of T. If we approximate the graph by a tree
spanner, we can solve a given problem on the tree and interpret the solution on the
original graph. The tree t-spanner problem asks, given a graph G and a positive
number t, whether G admits a tree t-spanner. Note that the problem of finding a tree
t-spanner of G minimizing t is also known in the literature as the Minimum Max-Stretch
spanning Tree problem (see, e.g., [86] and literature cited therein).
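To make the two spanner definitions concrete, the sketch below measures, for a given spanning tree of an unweighted graph, the smallest t and the smallest r for which the tree is a multiplicative tree t-spanner and an additive tree r-spanner. It is a brute-force check from the definitions using breadth-first search. The names `bfs_distances` and `tree_stretch_and_surplus` and the adjacency-list dictionaries are assumptions made for illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source in an unweighted graph given as adjacency lists."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def tree_stretch_and_surplus(graph_adj, tree_adj):
    """Return (max of d_T/d_G, max of d_T - d_G) over all vertex pairs."""
    stretch, surplus = 1.0, 0
    for u in graph_adj:
        dg = bfs_distances(graph_adj, u)
        dt = bfs_distances(tree_adj, u)
        for v in graph_adj:
            if v != u:
                stretch = max(stretch, dt[v] / dg[v])
                surplus = max(surplus, dt[v] - dg[v])
    return stretch, surplus


# A 5-cycle and the spanning path obtained by dropping the edge {4, 0}.
cycle = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
```

For this example the endpoints of the dropped edge are at distance 1 in the cycle and 4 in the path, so the path is a tree 4-spanner and an additive tree 3-spanner of the cycle, and no better.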
There are many applications of tree spanners in various areas. Tree spanners are useful
in designing approximation algorithms for combinatorial and algorithmic problems that
are concerned with distances in a finite metric space induced by a graph.
Tree spanners find applications also in network design and, in particular, in the context of distributed systems. One such application is the arrow distributed directory protocol introduced in [64]. This protocol supports the location of mobile objects in a
distributed network. It is implemented over a spanning tree T that spans the network,
and, as shown in [142], the worst case overhead ratio of the protocol is proportional to
the stretch of T . Therefore, a good candidate for the backbone of the arrow protocol is
a spanning tree with low stretch (see also [105]). Another application of tree spanners
is in message routing in communication networks. In order to maintain succinct routing
tables, efficient routing schemes can use only the edges of a tree spanner. A very efficient
routing scheme is available for trees [157]. We refer to the survey paper of Peleg [141]
for an overview of spanners and their applications.
Unfortunately, not many graph families admit good tree spanners. This motivates the
study of sparse spanners, i.e., spanners with a small number of edges. Spanners have many
applications in various areas, especially in distributed systems and communication
networks. In [144], close relationships were established between the quality of
spanners (in terms of stretch factor and the number of spanner edges) and the time and
communication complexities of any synchronizer for the network based on this spanner.
Sparse spanners are very useful in message routing in communication networks; in order
to maintain succinct routing tables, efficient routing schemes can use only the edges of a
sparse spanner [145]. The Sparsest t-Spanner problem asks, for a given graph G and
a number t, to find a t-spanner of G with the smallest number of edges. We refer to the
survey paper of Peleg [141] for an overview of spanners.
It is not difficult to show that there are metrics (e.g., cycles [101, 147]) which cannot
be embedded into tree metrics with o(n) distortion. Inspired by ideas from the works of
Alon et al. [11], Bartal [24, 25], and Fakcharoenphol et al. [87], and in order to extend
those ideas to the design of compact and efficient routing and distance labeling schemes in
networks, a new notion of collective tree spanners was introduced in [79] (independently,
Gupta et al. [102] introduced a similar concept, called tree covers there). This notion is
slightly weaker than the one of a tree spanner and slightly stronger than the notion of a
sparse spanner. We say that a graph G = (V, E) admits a system of µ collective additive tree
r-spanners if there is a system T(G) of at most µ spanning trees of G such that for any
two vertices x, y of G a spanning tree T ∈ T(G) exists such that d_T(x, y) ≤ d_G(x, y) + r
(a multiplicative variant of this notion can be defined analogously). Clearly, if G admits
a system of µ collective additive tree r-spanners, then G admits an additive r-spanner
with at most µ × (n − 1) edges (take the union of all those trees), and if µ = 1, then G
admits an additive tree r-spanner.
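The definition of a system of collective additive tree r-spanners can be checked directly: for each pair of vertices take the best tree of the system, and report the worst such pair. The sketch below does this for unweighted graphs. The function names and the adjacency-list representation are illustrative assumptions, and the code is a verifier for a given system of trees, not one of the constructions of Chapters 3 and 4.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source in an unweighted graph given as adjacency lists."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def collective_surplus(graph_adj, trees):
    """Smallest r such that every pair x, y has a tree T with d_T <= d_G + r."""
    worst = 0
    for u in graph_adj:
        dg = bfs_distances(graph_adj, u)
        dts = [bfs_distances(t, u) for t in trees]
        for v in graph_adj:
            best = min(dt[v] for dt in dts) - dg[v]  # best tree for the pair (u, v)
            worst = max(worst, best)
    return worst


# A 4-cycle and two of its spanning paths.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
tree_a = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # drops the edge {3, 0}
tree_b = {0: [3], 1: [2], 2: [1, 3], 3: [2, 0]}  # drops the edge {0, 1}
```

In this small example a single spanning path has surplus 2, while the two paths together form a system of 2 collective additive tree 0-spanners of the 4-cycle, which illustrates how a few trees can do what one tree cannot.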
Recently, in [75], spanners of bounded tree-width were introduced, motivated by the
fact that many algorithmic problems are tractable on graphs of bounded tree-width, and
a spanner H of G with small tree-width can be used to obtain an approximate solution
to a problem on G. In particular, efficient and compact distance and routing labeling
schemes are available for bounded tree-width graphs (see, e.g., [77, 102] and papers cited
therein), and they can be used to compute approximate distances and route along paths
that are close to shortest in G. The k-Tree-width t-spanner problem asks, for a given
graph G, an integer k and a positive number t ≥ 1, whether G admits a t-spanner of
tree-width at most k. Every connected graph with n vertices and at most n − 1 + m edges
is of tree-width at most m + 1 and hence this problem is a generalization of the Tree t-Spanner and the Sparsest t-Spanner problems. Furthermore, t-spanners of bounded
tree-width have much more structure to exploit algorithmically than sparse t-spanners
(which have a small number of edges but may lack other nice structural properties).
1.1 Research contribution
In this dissertation we study “tree-likeness” and the problems described earlier:
embedding graph metrics into tree metrics, tree spanners, collective tree spanners
and sparse spanners. In Chapter 2, we study tree-like structure in real-world graph
datasets from a metric point of view. We empirically investigate the problem of embedding
(unweighted) graphs into trees using recent state-of-the-art graph embedding
techniques. Furthermore, we present strong evidence, based on solid theoretical
foundations, that a number of real-life networks, taken from different domains like Internet
measurements, biological datasets, web graphs, social and collaboration networks,
exhibit tree-like structures from a metric point of view. Specifically, we investigate a few
graph parameters, namely, the tree-distortion and the tree-stretch, the tree-length and
the tree-breadth, Gromov’s hyperbolicity, and the cluster-diameter and the cluster-radius
in a layering partition of a graph, which capture and quantify this phenomenon of being
metrically close to a tree. By bringing all those parameters together, we not only provide
efficient means for detecting such metric tree-like structures in large-scale networks but
also show how such structures can be used, for example, to efficiently and compactly
encode approximate distance and almost shortest path information and to quickly and
accurately estimate diameters and radii of those networks. Estimating the diameter and the
radius of a graph or distances between its arbitrary vertices are fundamental primitives
in many data and graph mining algorithms.
Chapters 3 and 4 concern the problem of collective tree spanners and sparse spanners.
Specifically, we study collective additive tree spanners for families of graphs enjoying
special Robertson-Seymour’s tree-decompositions, and demonstrate interesting consequences
of the obtained results. We demonstrate in Chapter 3 that there is a polynomial time
algorithm that, given an n-vertex graph G admitting a multiplicative tree t-spanner,
constructs a system of at most log_2 n collective additive tree O(t log n)-spanners of G. That
is, with a slight increase in the number of trees and in the stretch, one can “turn” a
multiplicative tree spanner into a small set of collective additive tree spanners.
In Chapter 4, we extend the result from Chapter 3 by showing that if a graph G
admits a multiplicative t-spanner with tree-width k − 1, then G admits a
Robertson-Seymour’s tree-decomposition each bag of which can be covered with at most k disks of
G of radius at most ⌈t/2⌉ each. This is used to demonstrate that, for every fixed k, there
is a polynomial time algorithm that, given an n-vertex graph G admitting a multiplicative
t-spanner with tree-width k − 1, constructs a system of at most k(1 + log_2 n) collective
additive tree O(t log n)-spanners of G.
In Chapter 5, we investigate the problem of embedding a weighted graph metric into
a tree metric. We develop an approach with proven theoretical bounds for this problem.
Furthermore, we apply and empirically test our approach on real-world graph datasets.
1.2 Publication notes
The results of Chapter 2 are to be submitted for publication to a relevant conference.
The results of Chapters 3 and 4 have been accepted for publication and will appear in the
Journal of Theoretical Computer Science (TCS); they have already been partially published in
[73] at the 39th International Conference on Current Trends in Theory and Practice
of Computer Science (SOFSEM 2013). The results of Chapter 5 are in preparation for
submission for publication.
1.3 Preliminaries and Notations
A metric space is an ordered pair (M, d) where M is a set and d is a measure of
distance between elements of M, i.e., d is a function d : M × M → R such that for any
x, y, z ∈ M, the following three conditions hold:
1. d(x, y) = 0 if and only if x = y.
2. d(x, y) = d(y, x) (symmetry).
3. d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality).
For simplicity, we may refer to a metric space (M, d) by only M.
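For a finite point set, the three conditions can be verified exhaustively. The sketch below is one way to do so. The name `is_metric`, the choice to pass d as a Python function and the floating-point tolerance `tol` are assumptions made for illustration.

```python
from itertools import product


def is_metric(points, d, tol=1e-12):
    """Check the three metric conditions for a distance function d on a finite set."""
    points = list(points)
    for x, y in product(points, repeat=2):
        # Condition 1: d(x, y) = 0 if and only if x = y.
        if (abs(d(x, y)) <= tol) != (x == y):
            return False
        # Condition 2: symmetry.
        if abs(d(x, y) - d(y, x)) > tol:
            return False
    for x, y, z in product(points, repeat=3):
        # Condition 3: triangle inequality.
        if d(x, y) > d(x, z) + d(z, y) + tol:
            return False
    return True


def line_distance(x, y):
    return abs(x - y)


def squared_distance(x, y):
    return (x - y) ** 2
```

The absolute difference on integers passes all three checks, whereas the squared difference fails the triangle inequality already on the points 0, 1, 2.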
A metric space (M, d) is isometrically embeddable into a host metric space (M ′ , d′ )
if there exists a map φ : M −→ M ′ such that d′ (φ(p), φ(q)) = d(p, q) for all p, q ∈ M .
In this case, we say M is a subspace of M ′ . A low-distortion embedding between two
metric spaces (M, d) and (M ′ , d′ ) is a (non-contractive) mapping φ such that for any pair
of points p, q in the original metric space, their distance d(p, q) before the mapping is
the same as the distance d′ (φ(p), φ(q)) after the mapping, up to a (small) multiplicative
factor λ. Low-distortion embeddings have been a subject of extensive mathematical
studies, and found numerous applications in computer science (see [106, 107, 125]).
Formally, a low-distortion embedding of a metric space (M, d) into another metric
space (M ′ , d′ ) with distance functions d and d′ , is a mapping φ : M → M ′ such that for
any pair of points p, q in the original metric space M , their distance d(p, q) before the
mapping is the same as the distance d′ (φ(p), φ(q)) after the mapping, up to a (small)
multiplicative factor λ. The mapping φ has contraction c_φ and expansion e_φ if for every
pair of points p, q in M,
d(p, q) ≤ c_φ · d′(φ(p), φ(q))
and
e_φ · d(p, q) ≥ d′(φ(p), φ(q)),
respectively. We say that φ is non-contracting if c_φ is at most 1. A non-contracting
mapping φ has distortion λ if e_φ is at most λ. Also, we say that φ : M → M′ is an
embedding with (multiplicative) distortion λ ≥ 1 if d(x, y) ≤ d′(φ(x), φ(y)) ≤ λ · d(x, y)
for all x, y in M.
Analogously we can define embedding with additive distortion: φ : M → M ′ is an
embedding with additive distortion λ ≥ 0 if d(x, y) ≤ d′ (φ(x), φ(y)) ≤ d(x, y) + λ for all
x, y in M.
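As a small worked instance of these definitions, the sketch below computes the smallest valid contraction and expansion of a map between two finite metric spaces, together with its additive distortion. All names (`contraction_and_expansion`, `additive_distortion`, `cycle4`, `star`) are hypothetical, and the code assumes that φ is injective so that no host distance between distinct points is zero.

```python
from itertools import combinations


def contraction_and_expansion(points, d, d_host, phi):
    """Smallest c and e with d <= c * d' and e * d >= d' over all pairs."""
    c = e = 0.0
    for p, q in combinations(points, 2):
        before, after = d(p, q), d_host(phi(p), phi(q))
        c = max(c, before / after)
        e = max(e, after / before)
    return c, e


def additive_distortion(points, d, d_host, phi):
    """Largest increase d' - d over all pairs (meaningful for non-contracting maps)."""
    return max(d_host(phi(p), phi(q)) - d(p, q) for p, q in combinations(points, 2))


def cycle4(p, q):
    """Shortest-path metric of the 4-cycle on vertices 0..3."""
    return min((p - q) % 4, (q - p) % 4)


def star(a, b):
    """Leaf-to-leaf metric of a star whose center is a Steiner vertex, unit edges."""
    return 0 if a == b else 2


def identity(p):
    return p
```

Mapping the 4-cycle onto the leaves of this star gives contraction 1 and expansion 2, so the map is a non-contracting embedding with multiplicative distortion 2, and its additive distortion is 1.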
Throughout the dissertation, we will often omit the word multiplicative when we refer
to embedding with multiplicative distortion.
Given an undirected graph G with the vertex set V(G) and the edge set E(G), the
graph metric of G, denoted by M(G), is the metric induced by the shortest path distances of
G, i.e., M(G) = (V(G), d_G), where d_G(u, v) is the shortest path distance
between u and v for every pair of vertices u, v ∈ V(G).
All graphs occurring in this dissertation are connected, finite, undirected, loopless and
without multiple edges. Also, the graphs in all chapters are unweighted except for those
in Chapter 5. For a graph G = (V, E), we use n and |V | interchangeably to denote the
number of vertices in G. Also, we use m and |E| to denote the number of edges. A clique
is a set of pairwise adjacent vertices of G. By G[S] we denote a subgraph of G induced
by vertices of S ⊆ V . Let also G \ S be the graph G[V \ S] (which is not necessarily
connected). A set S ⊆ V is called a separator of a connected graph G if the graph
G[V \ S] has more than one connected component, and S is called a balanced separator
of G if each connected component of G[V \ S] has at most |V |/2 vertices. A set C ⊆ V
is called a balanced clique-separator of G if C is both a clique and a balanced separator
of G. For a vertex v of G, the sets N_G(v) = {w ∈ V | vw ∈ E} and N_G[v] = N_G(v) ∪ {v}
are called the open neighborhood and the closed neighborhood of v, respectively.
In a graph G, the length of a path from a vertex v to a vertex u is the number of edges
in the path. The distance d_G(u, v) between vertices u and v is the length of a shortest path
connecting u and v in G. The disk (or ball) of G of radius r centered at vertex v is the set
of all vertices at distance at most r from v: D_r(v, G) = B_r(v, G) = {w ∈ V | d_G(v, w) ≤ r}.
We omit the graph name G, as in D_r(v) or B_r(v), when only one graph is under discussion.
A disk D_r(v, G) is called a balanced disk-separator of G if the set D_r(v, G) is a balanced
separator of G.
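Disks and balanced separators are easy to compute on small unweighted graphs. The sketch below builds D_r(v, G) by a breadth-first search truncated at depth r and then tests the balance condition on the components of G \ S. The function names `disk` and `is_balanced_separator` and the adjacency-list input are assumptions made for illustration.

```python
from collections import deque


def disk(adj, v, r):
    """Vertices at distance at most r from v, found by a BFS cut off at depth r."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue  # do not expand beyond radius r
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def is_balanced_separator(adj, S):
    """True if every connected component of G \\ S has at most |V|/2 vertices."""
    remaining = set(adj) - set(S)
    seen = set()
    for start in remaining:
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w in remaining and w not in component:
                    component.add(w)
                    stack.append(w)
        seen |= component
        if 2 * len(component) > len(adj):
            return False
    return True


# A path on 7 vertices: 0 - 1 - ... - 6.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 7] for i in range(7)}
```

On this path the disk of radius 0 around the middle vertex 3 is a balanced disk-separator, while the disk of radius 0 around the end vertex 0 is not, because it leaves a component with 6 of the 7 vertices.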
The diameter diam(G) of a graph G = (V, E) is the largest distance between a
pair of vertices in G, i.e., diam(G) = max_{u,v∈V} d_G(u, v). The eccentricity of a vertex v,
denoted by ecc(v), is the largest distance from v to any other vertex, i.e.,
ecc(v) = max_{u∈V} d_G(v, u). The radius rad(G) of a graph G = (V, E) is the minimum
eccentricity of a vertex in G, i.e., rad(G) = min_{v∈V} max_{u∈V} d_G(v, u). The center
C(G) = {c ∈ V : ecc(c) = rad(G)} of a graph G = (V, E) is the set of vertices with minimum
eccentricity. The diameter in G of a set S ⊆ V is max_{x,y∈S} d_G(x, y) and its radius in G is
min_{x∈V} max_{y∈S} d_G(x, y) (in some papers they are called the weak diameter and the weak
radius to indicate that the distances are measured in G, not in G[S]).
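These quantities follow directly from all-pairs breadth-first search. The sketch below is a brute-force computation from the definitions for a connected unweighted graph, not a fast estimation method. The function names and the adjacency-list input are illustrative assumptions.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source in an unweighted graph given as adjacency lists."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def diameter_radius_center(adj):
    """Return (diam(G), rad(G), C(G)) for a connected graph."""
    ecc = {v: max(bfs_distances(adj, v).values()) for v in adj}
    diam, rad = max(ecc.values()), min(ecc.values())
    center = {v for v in adj if ecc[v] == rad}
    return diam, rad, center


# A path on 5 vertices: 0 - 1 - 2 - 3 - 4.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 5] for i in range(5)}
```

For the 5-vertex path the eccentricities are 4, 3, 2, 3, 4, so the diameter is 4, the radius is 2, and the center is the single middle vertex.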
An approximation algorithm is an algorithm that runs in polynomial time and produces a solution that is within a guaranteed factor of the optimum solution for some
optimization problem. A constant approximation algorithm produces a solution within
a guaranteed constant factor c of the optimum solution (called a c-approximation). A
Polynomial Time Approximation Scheme (PTAS) is an approximation algorithm that
produces a solution that is within a factor of 1 + ϵ of the optimum solution and runs in
polynomial time for every fixed ϵ > 0.
1.3.1 Tree-decomposition
There are a few graph parameters in the literature that measure the metric tree-likeness
of a graph and are related to the tree t-spanner problem. They are all based on the notion of
tree-decomposition introduced by Robertson and Seymour in their work on graph minors
[150].
A tree-decomposition of a graph G = (V, E) is a pair ({X_i | i ∈ I}, T = (I, F)) where
{X_i | i ∈ I} is a collection of subsets of V, called bags, and T is a tree. The nodes of T
are the bags {X_i | i ∈ I}, satisfying the following three conditions (see Figure 1):
1. ∪_{i∈I} X_i = V;
2. for each edge uv ∈ E, there is a bag X_i such that u, v ∈ X_i;
3. for all i, j, k ∈ I, if j is on the path from i to k in T, then X_i ∩ X_k ⊆ X_j.
   Equivalently, this condition could be stated as follows: for all vertices v ∈ V, the set of
   bags {i ∈ I | v ∈ X_i} induces a connected subtree T_v of T.
For simplicity we denote a tree-decomposition ({X_i | i ∈ I}, T = (I, F)) of a graph G by
T(G).
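The three conditions can be checked mechanically for a candidate decomposition. The sketch below uses the equivalent form of condition 3 (the bags containing each vertex induce a connected subtree). The function name `is_tree_decomposition`, the dictionary of bags keyed by node identifiers and the edge-list encoding of T are assumptions made for illustration.

```python
def is_tree_decomposition(adj, bags, tree_edges):
    """adj: vertex -> neighbors; bags: node id -> set of vertices;
    tree_edges: list of (node id, node id) pairs forming T."""
    tadj = {i: set() for i in bags}
    for i, j in tree_edges:
        tadj[i].add(j)
        tadj[j].add(i)

    def connected(nodes):
        # Depth-first search in T restricted to the given node set.
        nodes = set(nodes)
        if not nodes:
            return False
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            i = stack.pop()
            for j in tadj[i] - seen:
                if j in nodes:
                    seen.add(j)
                    stack.append(j)
        return seen == nodes

    # T must be a tree: connected with |I| - 1 edges.
    if len(tree_edges) != len(bags) - 1 or not connected(bags):
        return False
    # Condition 1: the bags cover V.
    if set().union(*bags.values()) != set(adj):
        return False
    # Condition 2: every edge lies inside some bag.
    for u in adj:
        for v in adj[u]:
            if not any(u in X and v in X for X in bags.values()):
                return False
    # Condition 3: the bags containing v induce a connected subtree of T.
    return all(connected(i for i in bags if v in bags[i]) for v in adj)


cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
good_bags = {"a": {0, 1, 2}, "b": {0, 2, 3}}
good_tree = [("a", "b")]
bad_bags = {"a": {0, 1}, "b": {1, 2}, "c": {2, 3}, "d": {3, 0}}
bad_tree = [("a", "b"), ("b", "c"), ("c", "d")]
```

For the 4-cycle, the two triangle-shaped bags form a valid tree-decomposition. Using the four edges as bags along a path satisfies conditions 1 and 2 but violates condition 3, since vertex 0 appears only in the two end bags.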
Tree-decompositions were used to define several graph parameters that measure how
close a given graph is to some known graph class (e.g., to trees or to chordal graphs) where
many algorithmic problems can be solved efficiently. The width of a tree-decomposition
T(G) = ({X_i | i ∈ I}, T = (I, F)) is max_{i∈I} |X_i| − 1. The tree-width of a graph G, denoted
by tw(G), is the minimum width over all tree-decompositions T(G) of G [150]. The trees
are exactly the graphs with tree-width 1. The problem of determining whether a given graph
has tree-width at most k, where k is part of the input, is NP-complete [13]. However, when k
is a fixed constant, the problem has a linear time solution that also finds a
tree-decomposition of width k for the given graph [35]. It is worth noting that the running
time of the algorithm of [35] is exponential in k.
Figure 1: A graph and its tree-decomposition of width 3, of length 3, and of breadth 2.
(a) A graph G. (b) A tree-decomposition of G.
The length of a tree-decomposition T (G) of a graph G is λ := max_{i∈I} max_{u,v∈Xi} dG (u, v)
(i.e., each bag Xi has diameter at most λ in G). The tree-length of G, denoted by tl(G),
is the minimum of the length, over all tree-decompositions of G [71]. The chordal graphs
are exactly the graphs with tree-length 1. Note that these two graph parameters are not
related to each other. For instance, a clique on n vertices has tree-length 1 and tree-width
n − 1, whereas a cycle on 3n vertices has tree-width 2 and tree-length n. The breadth of
a tree-decomposition T (G) of a graph G is the minimum integer r such that for every
i ∈ I there is a vertex vi ∈ V with Xi ⊆ Dr (vi , G) (i.e., each bag Xi can be covered by
a disk Dr (vi , G) := {u ∈ V (G) : dG (u, vi ) ≤ r} of radius at most r in G). Note that
vertex vi does not need to belong to Xi . The tree-breadth of G, denoted by tb(G), is
the minimum of the breadth over all tree-decompositions of G [76]. It turns out that
tree-breadth is related to the tree t-spanner problem [76]. Unfortunately,
while graphs with tree-length 1 (as they are exactly the chordal graphs) can be recognized in linear time, the problem of determining whether a given graph has tree-length
at most λ is NP-complete for every fixed λ > 1 (see [127]). Judging from this result, it is
conceivable that the problem of determining whether a given graph has tree-breadth at
most ρ is NP-complete, too. We say that a family of graphs G is of bounded tree-breadth
(of bounded tree-width, of bounded tree-length) if there is a constant c such that for each
graph G from G, tb(G) ≤ c (resp., tw(G) ≤ c, tl(G) ≤ c).
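To make the three parameters concrete, the sketch below computes the width, length and breadth of one given tree-decomposition. It is a brute-force illustration, not an algorithm from the cited works: it evaluates a decomposition that is supplied by hand and does not search for an optimal one. The names `bfs_distances` and `decomposition_parameters` are assumptions of this example.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source in an unweighted graph (dict: vertex -> neighbours)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def decomposition_parameters(adj, bags):
    """Return (width, length, breadth) of the given list of bags."""
    dist = {v: bfs_distances(adj, v) for v in adj}
    # width: largest bag size minus one
    width = max(len(X) for X in bags) - 1
    # length: largest distance in G between two vertices of a common bag
    length = max(dist[u][v] for X in bags for u in X for v in X)
    # breadth: smallest r such that every bag fits in some disk D_r(c, G);
    # the centre c ranges over all vertices, not only those in the bag
    breadth = max(min(max(dist[c][u] for u in X) for c in adj) for X in bags)
    return width, length, breadth

# Cycle on 6 vertices with a star-shaped decomposition:
# centre bag {0, 2, 4} and leaf bags {0, 1, 2}, {2, 3, 4}, {4, 5, 0}.
adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
bags = [{0, 2, 4}, {0, 1, 2}, {2, 3, 4}, {4, 5, 0}]
print(decomposition_parameters(adj, bags))  # (2, 2, 2)
```

For this decomposition the width and the length agree with the values stated above for a cycle on 3n = 6 vertices (tree-width 2 and tree-length n = 2).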
1.4
Related work
1.4.1
Low distortion embedding
The work of Bourgain [37] presents the first embeddings with guarantees. It was shown
that any finite metric on n nodes can be embedded into ℓ2 with logarithmic distortion
with the number of dimensions exponential in n. Linial et al. [126] modified Bourgain’s
result to apply to ℓ1 metrics and to use O(log^2 n) dimensions. In [124], Linial et al.
used Bourgain’s result to discover properties of the distance metric between protein
sequences. They observed that many interesting biological properties of proteins can be
(re-)discovered by analyzing the embedding of the metric into ℓ2 . Aumann and Rabani
[14] and Linial et al. [126] also gave several other applications, including a proof of a
logarithmic bound on the max-flow min-cut gap for multicommodity flow problems. They
also gave a lower bound on the distortion of any embedding of general graphs into ℓ1 .
For more details, we point the reader to the recent survey by Indyk and Matousek [107].
Obtaining approximation algorithms for minimum distortion embeddings into certain
host spaces has been a notoriously hard problem. In many cases of interest, such as
embedding into R^d (d ≥ 1), the problem is known to be hard to approximate within
polynomial factors (see [17, 128] and papers cited therein).
Table 1 shows known results on approximate embedding problems for multiplicative
distortion.
1.4.2
Embedding into a metric of a (weighted) tree.
The strongest results were obtained, so far, for the additive distortion. Research on
the algorithmic aspects of finding a tree metric of least additive distortion has culminated
in the paper [9] (see also [56]), where a 6-approximation algorithm was established (in
the notation of [9], it is a 3-approximation algorithm, however, in our more restrictive
definition, requiring that the metric is dominated by the approximating one, it is a
6-approximation), together with a (rather close) hardness result. Relaxing the local
condition on d by allowing its size-4 submetrics to be δ-close to a tree metric, one gets
precisely Gromov’s δ-hyperbolic geometry. For study of algorithmic and other aspects of
such geometries, see, e.g., [52, 53, 119].
The situation with the multiplicative distortion is less satisfactory. The best result
for embedding general metrics into tree metrics is obtained in [21]: the approximation
factor is exponential in √log ∆/ log log n, where ∆ is the spread of the metric. Judging
from the parallel results of [17] for embedding into line metrics, it is conceivable that
any constant factor approximation for optimal embedding of general metrics into tree
metrics is NP-hard. For some small constant γ, the hardness result of [9] implies that it
is NP-hard to approximate the multiplicative distortion better than γ even for metrics
that come from unit-weighted graphs. For a special interesting case of shortest path
metrics of unit-weighted graphs, [21] gets a large (around 100) constant approximation
factor, which was improved in [19] to a factor of 27 and later to a factor of 6
in [54] by using a decomposition method (layering partition) of the graph. Also, in
[54], Chepoi, Dragan et al. present the first algorithm for embedding into anything
more complicated than trees: they achieve a constant approximation for embedding
into outerplanar graphs (K2,3-minor-free graphs).
From | Into | Distortion | Source | Comments
general metrics | L2 | λ | [126] | uses SDP
general metrics | ultrametrics | λ | [10] |
general metrics | line | O(∆^{3/4} λ^{11/4}) | [17] | ∆ is the spread of the metric
general metrics | trees, line | (λ log n)^{O(log^{1/2} ∆)} | [21] |
general metrics | R^d | Ω(n^{1/(22d−10)} λ) | [17, 128] | hard to n^{1/(22d−10)}-approximate, for every d ≥ 1
R^3 | R^3 | > (3 − ϵ)λ | [137] | hard to 3-approximate, embedding is a bijection
line | line | λ | [111] | λ is constant, embedding is a bijection
line | line | > n^{Ω(1)} | [103] | λ = n^{Ω(1)}, embedding is a bijection
ultrametrics | R^d | λ^{O(d)} | [18] |
weighted trees | line | λ^{O(1)} | [17] |
weighted trees | line | Ω(n^{1/12} λ) | – | hard to O(n^{1/12})-approximate even for ∆ = n^{O(1)}
weighted trees | Lp | O(λ) | [120] |
unweighted graphs | trees | 6λ | [19, 21, 54] | improved from 100λ [21] to 27λ [19] to 6λ [54]
unweighted graphs | bounded-degree trees | λ | [111] | λ is constant, embedding is a bijection
unweighted graphs | spanning trees | O(λ log n) | [21, 76, 86] |
unweighted graphs | spanning trees | NP-complete | [44] |
planar graphs | spanning trees | NP-complete | [90] |
apex-minor-free graphs | spanning trees | λ | [75] | λ is constant; planar and bounded genus graphs are there
outerplanar graphs | spanning trees | λ | [139] |
unweighted graphs | line | O(λ^2) | [20] | implies √n-approximation
unweighted graphs | line | > ac | [20] | hard to a-approximate for some a > 1
unweighted graphs | line | λ | [20] | λ is constant
unweighted trees | line | O(λ^{3/2} √log λ) | [20] |
Table 1: Known results on approximate embedding problems for multiplicative distortion;
λ is used to denote the optimal distortion and n to denote the number of points in
the input metric. The table contains only the results that hold for the multiplicative
definition of the distortion; there is a rich body of work that applies to other definitions
of distortion, notably the additive or average distortion, see [17] for an overview.
1.4.3
Tree spanners
Substantial work has been done on the tree t-spanner problem on unweighted
graphs. Cai and Corneil [44] have shown that, for a given graph G, the problem of
deciding whether G has a tree t-spanner is NP-complete for any fixed t ≥ 4 and is linear
time solvable for t = 1, 2 (the status of the case t = 3 is open for general graphs). Note that,
when G is an unweighted graph, t can be assumed to be an integer. The
NP-completeness result was further strengthened in [40] and [41], where Brandstädt et al.
showed that the problem remains NP-complete even for the class of chordal graphs (i.e.,
for graphs where each induced cycle has length 3) and every fixed t ≥ 4, and for the class
of chordal bipartite graphs (i.e., for bipartite graphs where each induced cycle has length
4) and every fixed t ≥ 5.
The tree t-spanner problem on planar graphs was studied in [75,90]. In [90], Fekete
and Kremer proved that the tree t-spanner problem on planar graphs is NP-complete
(when t is part of the input) and polynomial time solvable for t = 3. For fixed t ≥ 4,
the complexity of the tree t-spanner problem on arbitrary planar graphs was left as
an open problem in [90]. This open problem was recently resolved in [75] by Dragan et
al., where it was shown that the tree t-spanner problem is linear time solvable for
every fixed constant t on the class of apex-minor-free graphs which includes all planar
graphs and all graphs of bounded genus. Note also that a number of particular graph
classes (like interval graphs, permutation graphs, asteroidal-triple-free graphs, strongly
chordal graphs, dually chordal graphs, and others) admit additive tree r-spanners for
small values of r (we refer the reader to [39–41,44,90,118,122,141,142,146] and papers cited
therein).
The first O(log n)-approximation algorithm for the minimum value of t for the tree
t-spanner problem was developed by Emek and Peleg in [86] (where n is the number
of vertices in a graph). Recently, another logarithmic approximation algorithm for the
problem was proposed in [76] (we elaborate more on this in Chapter 3). Emek and
Peleg also established in [86] that unless P = NP, the problem cannot be approximated
additively by any o(n) term. Hardness of approximation is established also in [122],
where it was shown that approximating the minimum value of t for the tree t-spanner
problem within factor better than 2 is NP-hard (see also [142] for an earlier result).
1.4.4
Sparse spanners
Sparse t-spanners were introduced by Peleg, Schäffer and Ullman in [143, 144] and
since then have been studied extensively. It was shown by Peleg and Schäffer in [143] that
the problem of deciding whether a graph G has a t-spanner with at most m edges is NP-complete.
Later, Kortsarz [116] showed that for every t ≥ 2 there is a constant c < 1 such
that it is NP-hard to approximate the sparsest t-spanner within the ratio c·log n, where n
is the number of vertices in the graph. On the other hand, the problem admits an O(log n)-ratio
approximation for t = 2 [116, 117] and an O(n^{2/(t+1)})-ratio approximation for t > 2
[84]. For some other inapproximability and approximability results for the Sparsest
t-Spanner problem on general graphs we refer the reader to [32, 33, 66, 67, 82, 84, 85, 158]
and papers cited therein. It is interesting to note also that any (even weighted) n-vertex
graph admits an O(2k − 1)-spanner with at most O(n^{1+1/k}) edges for any k ≥ 1, and
such a spanner can be constructed in polynomial time [12, 28, 158].
On planar graphs the Sparsest t-Spanner problem was studied as well. Brandes
and Handke have shown that the decision version of the problem remains NP-complete
on planar graphs for every fixed t ≥ 5 (the case 2 ≤ t ≤ 4 is open) [38]. Duckworth,
Wormald, and Zito [80] have shown that the problem of finding a sparsest 2-spanner
of a 4-connected planar triangulation admits a polynomial time approximation scheme
(PTAS). Dragan et al. [74] proved that the Sparsest t-Spanner problem admits a PTAS
for graph classes of bounded local tree-width (and therefore for planar and bounded genus
graphs).
Sparse additive spanners were considered in [27, 68, 83, 123, 162]. It is known that
every n-vertex graph admits an additive 2-spanner with at most Θ(n^{3/2}) edges [68,83], an
additive 6-spanner with at most O(n^{4/3}) edges [27], and an additive O(n^{(1−1/k)/2})-spanner
with at most O(n^{1+1/k}) edges for any k ≥ 1 [27]. All those spanners can be constructed
in polynomial time. We refer the reader to the paper [162] for a good summary of the
state of the art of results on the sparsest additive spanner problem in general graphs.
1.4.5
Collective tree spanners
The problem of finding “small” systems of collective additive tree r-spanners for small
values of r was examined on special classes of graphs in [60, 77–79, 164]. For example, in
[60, 79], sharp results were obtained for unweighted chordal graphs and c-chordal graphs
(i.e., the graphs where each induced cycle has length at most c): every c-chordal graph
admits a system of at most log_2 n collective additive tree (2⌊c/2⌋)-spanners, constructible
in polynomial time; no system of constant number of collective additive tree r-spanners
can exist for chordal graphs (i.e., when c = 3) and r ≤ 3, and no system of constant
number of collective additive tree r-spanners can exist for outerplanar graphs for any
constant r.
Only papers [77,102,164] have investigated collective (multiplicative or additive) tree
spanners in weighted graphs. It was shown that any weighted n-vertex planar graph
admits a system of O(√n) collective multiplicative tree 1-spanners (equivalently, additive
tree 0-spanners) [77,102] and a system of at most 2 log_{3/2} n collective multiplicative tree
3-spanners [102]. Furthermore, any weighted graph with genus at most g admits a system of
O(√(gn)) collective additive tree 0-spanners [77, 102], any weighted graph with tree-width
at most k − 1 admits a system of at most k log_2 n collective additive tree 0-spanners
[77, 102], any weighted graph G with clique-width at most k admits a system of at
most k log_{3/2} n collective additive tree (2w)-spanners [77], and any weighted c-chordal graph
G admits a system of log_2 n collective additive tree (2⌊c/2⌋w)-spanners [77] (where w
denotes the maximum edge weight in G).
Collective tree spanners of Unit Disk Graphs (UDGs) (which often model wireless
ad hoc networks) were investigated in [164]. It was shown that every n-vertex UDG G
admits a system T (G) of at most 2 log_{3/2} n + 2 spanning trees of G such that, for any two
vertices x and y of G, there exists a tree T in T (G) with dT (x, y) ≤ 3 · dG (x, y) + 12.
That is, the distances in any UDG can be approximately represented by the distances in
at most 2 log_{3/2} n + 2 of its spanning trees. Based on this result a new compact and low
delay routing labeling scheme was proposed for Unit Disk Graphs.
1.4.6
Spanners with bounded tree-width.
The k-Tree-width t-spanner problem was considered in [75] and [91]. It was
shown that the problem is linear time solvable for every fixed constants t and k on the
class of apex-minor-free graphs [75], which includes all planar graphs and all graphs of
bounded genus, and on the graphs with bounded degree [91].
CHAPTER 2
Metric tree-like structures in real-life networks:
an empirical study
2.1
Introduction
Large networks are everywhere. Can we understand their structure and exploit it?
For example, understanding key structural properties of large-scale data networks is crucial for analyzing and optimizing their performance, as well as improving their reliability
and security [129]. In prior empirical and theoretical studies researchers have mainly
focused on features like small world phenomenon, power law degree distribution, navigability, high clustering coefficients, etc. (see [22,23,36,57,88,113,114,121,160]). Those nice
features were observed in many real-life complex networks and graphs arising in Internet
applications, in biological and social sciences, in chemistry and physics. Although those
features are interesting and important, as it is noted in [129], the impact of intrinsic geometrical and topological features of large-scale data networks on performance, reliability
and security is of much greater importance.
Recently, a few papers explored a previously little-studied geometric characteristic of real-life
networks, namely the hyperbolicity (sometimes also called the global curvature) of the
network (see, e.g., [50, 62, 110, 129, 154]). It was shown that a number of data networks,
including Internet application networks, web networks, collaboration networks, social
networks, and others, have small hyperbolicity. It was suggested in [129] that the property,
observed in real-life networks, that traffic between nodes tends to go through a relatively
small core of the network, as if the shortest path between them is curved inwards, may
be due to global curvature of the network. Furthermore, the paper [110] proposes that
“hyperbolicity in conjunction with other local characteristics of networks, such as the
degree distribution and clustering coefficients, provide a more complete unifying picture
of networks, and helps classify in a parsimonious way what is otherwise a bewildering
and complex array of features and characteristics specific to each natural and man-made
network.”
The hyperbolicity of a graph/network can be viewed as a measure of how close a
graph is to a tree metrically; the smaller the hyperbolicity of a graph is the closer it is
metrically to a tree. Recent empirical results of [50, 62, 110, 129, 154] on hyperbolicity
suggest that many real-life complex networks and graphs may possess tree-like structures
from a metric point of view.
In this chapter, we substantiate this claim through analysis of a collection of real
data networks. We investigate a few recently introduced graph parameters, namely, the
tree-distortion and the tree-stretch of a graph, the tree-length and the tree-breadth of a
graph, Gromov’s hyperbolicity of a graph, and the cluster-diameter and the cluster-radius
in a layering partition of a graph. All these parameters are trying to capture and quantify
this phenomenon of being metrically close to a tree and can be used to measure metric
tree-likeness of a real-life network. Recent advances in theory (see appropriate sections
for details) allow us to calculate or accurately estimate those parameters for sufficiently
large networks. By examining topologies of numerous publicly available networks, we
demonstrate the existence of metric tree-like structures in a wide range of large-scale networks,
from communication networks to various forms of social and biological networks.
Throughout this chapter we discuss these parameters and recently established relationships between them for unweighted and undirected graphs. It turns out that all these
parameters are at most constant or logarithmic factors apart from each other. Hence,
a constant bound on one of them translates into a constant or almost constant bound
on another. We say that a graph has a tree-like structure from a metric point of view
(equivalently, is metrically tree-like) if any one of those parameters is a small constant.
Recently, paper [8] pointed out that “although large informatics graphs such as social
and information networks are often thought of as having hierarchical or tree-like structure,
this assumption is rarely tested, and it has proven difficult to exploit this idea in practice;
... it is not clear whether such structure can be exploited for improved graph mining and
machine learning ....”
In this chapter, by bringing all those parameters together, we not only provide efficient means for detecting such metric tree-like structures in large-scale networks but also
show how such structures can be used, for example, to efficiently and compactly encode
approximate distance and almost shortest path information and to quickly and accurately
estimate diameters and radii of those networks. Estimating distances between arbitrary vertices of a graph accurately and quickly is a fundamental primitive in many data
and graph mining algorithms.
Graphs that are metrically tree-like have many algorithmic advantages. They allow
efficient approximate solutions for a number of optimization problems. For example, they
admit a PTAS for the Traveling Salesman Problem [119], have an efficient approximate
solution for the problem of covering and packing by balls [55], admit additive sparse
spanners [53, 70] and collective additive tree-spanners [73], enjoy efficient and compact
approximate distance [53, 94] and routing [53, 69] labeling schemes, have efficient algorithms for fast and accurate estimations of diameters and radii [52], etc. We elaborate
more on these results in appropriate sections.
This chapter is structured as follows. In Section 2.2, we describe our graph datasets.
The next four sections are devoted to analysis of corresponding parameters measuring
metric tree-likeness of our graph datasets: layering partition and its cluster-diameter and
cluster-radius in Section 2.3; hyperbolicity in Section 2.4; tree-distortion in Section 2.5;
tree-breadth, tree-length and tree-stretch in Section 2.6. In each section we first give
theoretical background on the parameter(s) and then present our experimental results.
Additionally, an overview of implications of those results is provided. In Section 2.7, we
further discuss algorithmic advantages for a graph to be metrically tree-like. Finally, in
Section 2.8, we give some concluding remarks.
2.2
Datasets
Our datasets come from different domains like Internet measurements, biological
datasets, web graphs, social and collaboration networks. Table 2 shows basic statistics of our graph datasets. Each graph represents the largest connected component of
the original graph as some datasets consist of one large connected component and many
very small ones.
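The preprocessing mentioned here and in the dataset descriptions below (ignoring edge directions, removing self-loops and parallel edges, keeping the largest connected component) can be sketched as follows. This is a minimal illustration under the assumption that a dataset is available as a list of vertex pairs; the function name `largest_component` is hypothetical and the code is not the tool used to prepare the datasets.

```python
from collections import deque

def largest_component(edge_list):
    """Return the largest connected component as dict: vertex -> set of neighbours."""
    adj = {}
    for u, v in edge_list:
        if u == v:
            continue  # drop self-loops
        # storing both directions ignores edge orientation;
        # neighbour sets merge parallel edges
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen = set()
    best = set()
    for s in adj:
        if s in seen:
            continue
        component = {s}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in component:
                    component.add(y)
                    queue.append(y)
        seen |= component
        if len(component) > len(best):
            best = component
    return {v: adj[v] for v in best}

# Toy input: a duplicated edge, a self-loop and a small second component.
edges = [(1, 2), (2, 1), (2, 3), (3, 3), (4, 5)]
G = largest_component(edges)
n = len(G)
m = sum(len(nb) for nb in G.values()) // 2
print(n, m)  # 3 2
```

Vertices that occur only in self-loops are dropped by this sketch, since they are isolated once the loops are removed.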
Graph G = (V, E) | n (number of vertices) | m (number of edges) | diameter diam(G) | radius rad(G)
PPI [108] | 1458 | 1948 | 19 | 11
Yeast [43] | 2224 | 6609 | 11 | 6
DutchElite [63] | 3621 | 4311 | 22 | 12
EPA [1] | 4253 | 8953 | 10 | 6
EVA [133] | 4475 | 4664 | 18 | 10
California [112] | 5925 | 15770 | 13 | 7
Erdös [29] | 6927 | 11850 | 4 | 2
Routeview [4] | 10515 | 21455 | 10 | 5
Homo release 3.2.99 [155] | 16711 | 115406 | 10 | 5
AS Caida 20071105 [47] | 26475 | 53381 | 17 | 9
Dimes 3/2010 [152] | 26424 | 90267 | 8 | 4
Aqualab 12/2007- 09/2008 [49] | 31845 | 143383 | 9 | 5
AS Caida 20120601 [45] | 41203 | 121309 | 10 | 5
itdk0304 [46] | 190914 | 607610 | 26 | 14
DBLB-coauth [165] | 317080 | 1049866 | 23 | 12
Amazon [165] | 334863 | 925872 | 47 | 24
Table 2: Graph datasets and their parameters: number of vertices, number of edges,
diameter, radius.
Biological Networks
PPI [108]: It is a protein-protein interaction network in the yeast Saccharomyces cerevisiae. Each node represents a protein with an edge representing an interaction between
two proteins. Self loops have been removed from the original dataset. The dataset has
been analyzed and described in [108].
Yeast [43]: It is a protein-protein interaction network in budding yeast. Each node
represents a protein with an edge representing an interaction between two proteins. Self
loops have been removed from the original dataset. The dataset has been analyzed and
described in [43].
Homo [155]: It is a dataset of protein and genetic interactions in Homo sapiens (Human).
Each node represents a protein or a gene. An edge represents an interaction
between two proteins/genes. Parallel edges, representing different resources for
an interaction, have been removed. The dataset is obtained from BioGRID, a freely
accessible database/repository of physical and genetic interactions available at
http://www.thebiogrid.org. The dataset has been analyzed and described in [155].
Social and Collaboration Networks
DutchElite [63]: This is data on the administrative elite in the Netherlands, April 2006.
The data were collected and analyzed by De Volkskrant and Wouter de Nooy. It is a 2-mode network
representing persons' memberships in the administrative and organizational bodies in
the Netherlands in 2006. A node represents either a person or an organizational body. An edge
exists between two nodes if the person node belongs to the organization node.
EVA [133]: It is a network of interconnection between corporations where an edge exists
between two companies (vertices) if one of them is the owner of the other company.
Erdös [29]: It is a collaboration network with mathematician Paul Erdös. Each vertex represents an author with an edge representing a paper co-authorship between two
authors.
DBLB-coauth [165]: It is a co-authorship network of the DBLP computer science bibliography. Vertices of the network represent authors with edges connecting two authors if
they published at least one paper together.
Web Graphs
EPA [1]: It is a dataset representing pages linking to www.epa.gov obtained from Jon
Kleinberg’s web page, http://www.cs.cornell.edu/courses/cs685/2002fa/. The pages were constructed by expanding a 200-page response set to a search engine query, as in
the hub/authority algorithm. This data was collected some time back, so a number of
the links may not exist anymore. The vertices of this graph dataset represent web pages
with edges representing links. The graph was originally directed. We ignored direction
of edges to obtain an undirected graph version of the dataset.
California [112]: This graph dataset was also constructed by expanding a 200-page response set to a search engine query ‘California’, as in the hub/authority algorithm.
The dataset was obtained from Jon Kleinberg’s page, http://www.cs.cornell.edu/courses/cs685/2002fa/.
The vertices of this graph dataset represent web pages with
edges representing links between them. The graph was originally directed. We ignored
direction of edges to obtain an undirected graph version of the dataset.
Internet Measurements Networks
Routeview [4]: It is an Autonomous System (AS) graph obtained by University of Oregon
Route-views project using looking glass data and routing registry. A vertex in the dataset
represents an AS with an edge linking two vertices if there is at least one physical link
between them.
AS Caida [45,47]: These are datasets of the Internet Autonomous Systems (AS) relationships derived from BGP table snapshots taken at 24-hour intervals over a 5-day period by
CAIDA. The AS relationships available are customer-provider (and provider-customer,
in the opposite direction), peer-to-peer, and sibling-to-sibling.
Dimes 3/2010 [152]: It is an AS relationship graph of the Internet obtained from Dimes.
The Dimes project performs traceroutes and pings from volunteer agents (of about 1000
agent computers) to infer AS relationships. A weekly AS snapshot is available. The
dataset Dimes 3/2010 represents a snapshot aggregated over the month of March, 2010.
It provides the set of AS level nodes and edges that were found in that month and were
seen at least twice.
Aqualab [49]: Peer-to-peer clients are used to collect traceroute paths which are used to
infer AS interconnections. Probes were made between December 2007 and September
2008 from approximately 992,000 P2P users in 3,700 ASes.
Itdk [46]: This is a dataset of the Internet router-level graph where each vertex represents a router with an edge between two vertices if there is a link between the corresponding routers. The dataset snapshot is computed from ITDK0304 skitter and
iffinder measurements. The dataset is provided by CAIDA for April 2003 (see
http://www.caida.org/data/active/internet-topology-data-kit).
Information network
Amazon [165]: It is an Amazon product co-purchasing network. The vertices of the network represent products purchased from the Amazon website and the edges link “commonly/frequently” co-purchased products.
2.3
Layering Partition, its Cluster-Diameter and Cluster-Radius
Layering partition is a graph decomposition procedure that has been introduced in [39,
51] and has been used in [39, 51, 54] and [21] for embedding graph metrics into trees. It
provides a central tool in our investigation.
A layering of a graph G = (V, E) with respect to a start vertex s is the decomposition
of V into the layers (spheres) L^i = {u ∈ V : dG (s, u) = i}, i = 0, 1, . . . , r. A layering
partition LP(G, s) = {L^i_1 , . . . , L^i_{p_i} : i = 0, 1, . . . , r} of G is a partition of each layer L^i
into clusters L^i_1 , . . . , L^i_{p_i} such that two vertices u, v ∈ L^i belong to the same cluster L^i_j
if and only if they can be connected by a path outside the ball B_{i−1}(s) of radius i − 1
centered at s. See Figure 2 for an illustration. A layering partition of a graph can be
constructed in O(n + m) time (see [51]).
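The definition can be turned into a short program. The sketch below is an assumption-laden illustration and not the linear-time procedure of [51]: it computes the layers by breadth-first search and then processes them from the deepest layer upwards with a union-find structure, so that two vertices of layer i end up in the same set exactly when they are connected by a path that uses only vertices at distance at least i from s. The function name `layering_partition` is hypothetical, and the input graph is assumed to be connected.

```python
from collections import deque

def layering_partition(adj, s):
    """Return clusters, where clusters[i] is a list of sets partitioning layer i."""
    # Breadth-first search from s gives the layers.
    dist = {s: 0}
    queue = deque([s])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    r = max(dist.values())
    layers = [[] for _ in range(r + 1)]
    for v, d in dist.items():
        layers[d].append(v)

    # Union-find over the vertices.
    parent = {v: v for v in dist}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    clusters = [None] * (r + 1)
    for i in range(r, -1, -1):
        # Activate layer i: merge each vertex with its neighbours that lie
        # outside the ball of radius i - 1, i.e. at distance >= i from s.
        for u in layers[i]:
            for w in adj[u]:
                if dist[w] >= i:
                    parent[find(u)] = find(w)
        groups = {}
        for u in layers[i]:
            groups.setdefault(find(u), set()).add(u)
        clusters[i] = list(groups.values())
    return clusters

# Cycle on 6 vertices, start vertex 0.
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(layering_partition(cycle, 0))
```

On the 6-cycle every layer forms a single cluster, because the two vertices of layers 1 and 2 are joined by a path through the deeper layers; on a tree every cluster is a single vertex.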
Figure 2: Layering partition and associated constructs. Panels: (a) layering of graph G with
respect to s; (b) clusters of the layering partition LP(G, s); (c) layering tree Γ(G, s);
(d) canonic tree H obtained from the layering partition.
A layering tree Γ(G, s) of a graph G with respect to a layering partition LP(G, s) is
the graph whose nodes are the clusters of LP(G, s) and two nodes C = L^i_j and C′ = L^{i′}_{j′}
are adjacent in Γ(G, s) if and only if there exist a vertex u ∈ C and a vertex v ∈ C′ such
that uv ∈ E. It was shown in [39] that the graph Γ(G, s) is always a tree and, given a
start vertex s, can be constructed in O(n + m) time [51]. Note that, for a fixed start
vertex s ∈ V , the layering partition LP(G, s) of G and its tree Γ(G, s) are unique.
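Given the clusters, the layering tree follows directly from the adjacency rule above. The sketch below assumes that the clusters are supplied as a list indexed by layer (each entry a list of vertex sets) and labels the tree node of the j-th cluster of layer i by the pair (i, j); this encoding and the function name `layering_tree` are assumptions of the example.

```python
def layering_tree(adj, clusters):
    """Return the sorted edge list of the layering tree; nodes are pairs (i, j)."""
    # Map every vertex to the tree node of the cluster containing it.
    cluster_of = {}
    for i, layer in enumerate(clusters):
        for j, C in enumerate(layer):
            for v in C:
                cluster_of[v] = (i, j)
    tree_edges = set()
    for u in adj:
        for v in adj[u]:
            a, b = cluster_of[u], cluster_of[v]
            if a != b:
                # two clusters are adjacent if some edge of G joins them
                tree_edges.add((min(a, b), max(a, b)))
    return sorted(tree_edges)

# Cycle on 6 vertices with its layering partition for start vertex 0.
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
clusters = [[{0}], [{1, 5}], [{2, 4}], [{3}]]
edges = layering_tree(cycle, clusters)
print(edges)
# a tree on k nodes has k - 1 edges
print(len(edges) == sum(len(layer) for layer in clusters) - 1)
```

For the 6-cycle the result is a path on four nodes, one per layer.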
The cluster-diameter ∆s (G) of layering partition LP(G, s) with respect to vertex s is
the largest diameter of a cluster in LP(G, s), i.e., ∆s (G) = max_{C∈LP(G,s)} max_{u,v∈C} dG (u, v).
The cluster-diameter ∆(G) of a graph G is the minimum cluster-diameter over all layering
partitions of G, i.e., ∆(G) = min_{s∈V} ∆s (G).
The cluster-radius Rs (G) of layering partition LP(G, s) with respect to a vertex s is
the smallest number r such that for any cluster C ∈ LP(G, s), there is a vertex v ∈ V
with C ⊆ Br (v). The cluster-radius R(G) of a graph G is the minimum cluster-radius
over all layering partitions of G, i.e., R(G) = min_{s∈V} Rs (G).
Clearly, in view of tree Γ(G, s) of G, the smaller the parameters ∆s (G) and Rs (G) of
G are, the closer the graph G is to a tree metrically.
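For a given layering partition, both parameters can be computed by brute force from all-pairs distances. The sketch below is a plain illustration and not the code used for the experiments; it takes the clusters as a flat list of vertex sets, and the names `bfs_distances` and `cluster_statistics` are hypothetical. It also reports the frequency of each cluster diameter and the share of clusters with diameter 0 or 1.

```python
from collections import Counter, deque

def bfs_distances(adj, source):
    """Hop distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def cluster_statistics(adj, clusters):
    """Return (cluster-diameter, cluster-radius, diameter histogram, clique share)."""
    # One BFS per vertex: O(nm) time overall.
    dist = {v: bfs_distances(adj, v) for v in adj}
    # Diameters and radii are measured with distances in G, not inside the cluster.
    diameters = [max(dist[u][v] for u in C for v in C) for C in clusters]
    # The centre of the covering ball may be any vertex of G.
    radii = [min(max(dist[c][u] for u in C) for c in adj) for C in clusters]
    histogram = Counter(diameters)
    clique_share = sum(1 for d in diameters if d <= 1) / len(clusters)
    return max(diameters), max(radii), histogram, clique_share

# Cycle on 6 vertices with its layering partition for start vertex 0.
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
clusters = [{0}, {1, 5}, {2, 4}, {3}]
delta_s, r_s, histogram, clique_share = cluster_statistics(cycle, clusters)
print(delta_s, r_s, dict(histogram), clique_share)
```

On this toy input the cluster-diameter is 2 and the cluster-radius is 1, so the radius is at most the diameter and the diameter is at most twice the radius.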
Finding the cluster-diameter ∆s (G) and the cluster-radius Rs (G) for a given layering partition
LP(G, s) of a graph G requires O(nm) time, although the construction of the layering
partition LP(G, s) itself, for a given vertex s, takes only O(n + m) time. (The parameters
∆(G) and R(G) can also be computed in total O(nm) time for any graph G.) Since the
diameter of any set is at least its radius and at most twice its radius, we have the
following inequality:
Rs (G) ≤ ∆s (G) ≤ 2Rs (G).
In Table 3, we show empirical results on layering partitions obtained for the datasets
described in Section 2.2. For each graph dataset G = (V, E), we randomly selected a
start vertex s and built the layering partition LP(G, s) of G with respect to s. For each
dataset, Table 3 shows the cluster-diameter ∆s (G), the number of clusters in layering
partition LP(G, s) and the average diameter of clusters in LP(G, s). It turns out that
all graph datasets have small average diameter of clusters. Most clusters have diameter
0 or 1, i.e., they are essentially cliques (=complete subgraphs) of G. For most datasets,
more than 95% of clusters are cliques.
Graph G = (V, E) | n (number of vertices) | diameter diam(G) | # of clusters in LP(G, s) | cluster-diameter ∆s (G) | average diameter of clusters in LP(G, s) | % of clusters having diameter 0 or 1 (i.e., cliques)
PPI | 1458 | 19 | 1017 | 8 | 0.118977384 | 97.05014749%
Yeast | 2224 | 11 | 1838 | 6 | 0.119575699 | 96.33558341%
DutchElite | 3621 | 22 | 2934 | 10 | 0.070211316 | 98.02317655%
EPA | 4253 | 10 | 2523 | 6 | 0.06698375 | 98.5731272%
EVA | 4475 | 18 | 4266 | 9 | 0.031879981 | 99.2030005%
California | 5925 | 13 | 2939 | 8 | 0.092208234 | 97.141885%
Erdös | 6927 | 4 | 6288 | 4 | 0.001113232 | 99.9681934%
Routeview | 10515 | 10 | 6702 | 6 | 0.063264697 | 98.4482244%
Homo release 3.2.99 | 16711 | 10 | 6817 | 5 | 0.03432595 | 99.2518703%
AS Caida 20071105 | 26475 | 17 | 17067 | 6 | 0.056424679 | 98.5527626%
Dimes 3/2010 | 26424 | 8 | 16065 | 4 | 0.056582633 | 98.5434174%
Aqualab 12/2007- 09/2008 | 31845 | 9 | 16287 | 6 | 0.05826733 | 98.5816909%
AS Caida 20120601 | 41203 | 10 | 26562 | 6 | 0.055568105 | 98.5731496%
itdk0304 | 190914 | 26 | 89856 | 11 | 0.270377048 | 91.3851051%
DBLB-coauth | 317080 | 23 | 99828 | 11 | 0.45350002 | 92.97091%
Amazon | 334863 | 47 | 72278 | 21 | 0.489056144 | 86.049697%
Table 3: Layering partitions of the datasets and their parameters. ∆s (G) is the largest
diameter of a cluster in LP(G, s), where s is a randomly selected start vertex. For all
datasets, the average diameter of a cluster is between 0 and 1. For most datasets, more
than 95% of clusters are cliques.
To have a better picture on the overall distribution of diameters of clusters, in Table 4,
we show the frequencies of diameters of clusters for three sample datasets: PPI, Yeast,
and AS Caida 20071105. It is interesting to note that, in all datasets, the clusters with
large diameters induce a connected subtree in the tree Γ(G, s). For example, in PPI, the
cluster with diameter 8 is adjacent in Γ(G, s) to all clusters with diameters 6 and 5. This
may indicate that all those clusters are part of the well connected network core.
(a) PPI
diameter of a cluster   frequency   relative frequency
0                       966         0.9499
1                       21          0.0206
2                       14          0.0138
3                       5           0.0049
4                       5           0.0049
5                       1           0.0010
6                       4           0.0039
7                       0           0
8                       1           0.0010

(b) Yeast
diameter of a cluster   frequency   relative frequency
0                       981         0.946
1                       18          0.0174
2                       23          0.0223
3                       6           0.0058
4                       5           0.0048
5                       2           0.0019
6                       2           0.0019

(c) AS Caida 20071105
diameter of a cluster   frequency   relative frequency
0                       16459       0.9644
1                       361         0.0216
2                       174         0.0102
3                       46          0.0027
4                       21          0.0012
5                       4           0.0002
6                       2           0.0001
Table 4: Frequency of diameters of clusters in layering partition LP(G, s) (three
datasets).
Most of the graph parameters discussed in this paper could be related to a special
tree H introduced in [54] and produced from a layering partition of a graph G.
Canonic tree H: A tree H = (V, F) of a graph G = (V, E), called a canonic tree of
G, is constructed from a layering partition LP(G, s) of G by identifying for each cluster
C = L^i_j ∈ LP(G, s) an arbitrary vertex x_C ∈ L^{i−1} which has a neighbor in C = L^i_j and
by making x_C adjacent in H with all vertices v ∈ C (see Figure 2d for an illustration).
Vertex x_C is called the support vertex for cluster C = L^i_j. It was shown in [54] that tree
H for a graph G can be constructed in O(n + m) total time.
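The construction above can be sketched in Python. This is a minimal illustration, not the linear-time procedure of [54]: it assumes a connected graph given as an adjacency dictionary, takes two vertices of layer L^i to be in the same cluster when they are joined by a path using only vertices at distance at least i from s, and uses a union-find structure processed from the deepest layer upwards. The function names layering_partition and canonic_tree are hypothetical.

```python
from collections import defaultdict, deque


def layering_partition(adj, s):
    """Return (dist, clusters) for a connected graph {vertex: [neighbors]}."""
    # BFS distances from the start vertex s define the layers L^0, L^1, ...
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    layers = defaultdict(list)
    for v, d in dist.items():
        layers[d].append(v)

    parent = {}  # union-find over the vertices of layers >= i

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    clusters = []
    for i in sorted(layers, reverse=True):  # deepest layer first
        for v in layers[i]:
            parent[v] = v
        for v in layers[i]:
            for w in adj[v]:
                if w in parent:  # w lies in layer i or deeper
                    parent[find(v)] = find(w)
        groups = defaultdict(list)
        for v in layers[i]:
            groups[find(v)].append(v)
        clusters.extend(sorted(g) for g in groups.values())
    return dist, clusters


def canonic_tree(adj, s):
    """Return the edge list of a canonic tree H (one support vertex per cluster)."""
    dist, clusters = layering_partition(adj, s)
    edges = []
    for cluster in clusters:
        i = dist[cluster[0]]
        if i == 0:
            continue  # the root cluster {s} has no support vertex
        # Any vertex of layer i-1 with a neighbor in the cluster may serve as support.
        support = next(w for v in cluster for w in adj[v] if dist[w] == i - 1)
        edges.extend((support, v) for v in cluster)
    return edges
```

For the 4-cycle 0-1-2-3-0 with s = 0, the clusters are {0}, {1, 3} and {2}, and the sketch returns the tree edges (0, 1), (0, 3) and (1, 2).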
The following statement from [54] relates the cluster-diameter of a layering partition
of G with the embeddability of graph G into the tree H.
Proposition 1 ([54]). For every graph G = (V, E) and any vertex s of G,
∀x, y ∈ V, dH (x, y) − 2 ≤ dG (x, y) ≤ dH (x, y) + ∆s (G).
The above proposition shows that the distortion of embedding of a graph G into
tree H is additively bounded by ∆s(G), the largest diameter of a cluster in a layering
partition of G. This result confirms that the smaller the cluster-diameter ∆s(G) (cluster-radius Rs(G)) of G is, the closer the graph G is to a tree metric. Note that trees have
cluster-diameter and cluster-radius equal to 0. Results similar to Proposition 1 were used
in [39] to embed a chordal graph into a tree with an additive distortion of at most 2, in [51]
to embed a k-chordal graph into a tree with an additive distortion of at most k/2 + 2, and
in [54] to obtain a 6-approximation algorithm for the problem of optimal non-contractive
embedding of an unweighted graph metric into a weighted tree metric. For every chordal
graph G (a graph whose largest induced cycles have length 3), ∆s(G) ≤ 3 and Rs(G) ≤ 2
hold [39]. For every k-chordal graph G (a graph whose largest induced cycles have length
k), ∆s(G) ≤ k/2 + 2 holds [51]. For every graph G embeddable non-contractively into a
(weighted) tree with multiplicative distortion α, ∆s(G) ≤ 3α holds [54]. See Section 2.5
for more on this topic.
2.4 Hyperbolicity
δ-Hyperbolic metric spaces were defined by M. Gromov [99] in 1987 via a simple
4-point condition: for any four points u, v, w, x, the two larger of the distance sums
d(u, v) + d(w, x), d(u, w) + d(v, x), d(u, x) + d(v, w) differ by at most 2δ. They play an
important role in geometric group theory and in the geometry of negatively curved spaces, and have
recently become of interest in several domains of computer science, including algorithms
and networking. For example, (a) it has been shown empirically in [154] (see also [6])
that the Internet topology embeds with better accuracy into a hyperbolic space than
into a Euclidean space of comparable dimension, and (b) every connected finite graph has
an embedding in the hyperbolic plane so that the greedy routing based on the virtual
coordinates obtained from this embedding is guaranteed to work (see [115]). A connected
graph G = (V, E) equipped with the standard graph metric dG is δ-hyperbolic if the metric
space (V, dG) is δ-hyperbolic.
More formally, let G be a graph and let u, v, w and x be four arbitrary vertices of G.
Denote by S_1, S_2, S_3 the three distance sums dG(u, v) + dG(w, x), dG(u, w) + dG(v, x) and
dG(u, x) + dG(v, w), sorted in non-decreasing order S_1 ≤ S_2 ≤ S_3. Define the hyperbolicity
of a quadruplet u, v, w, x as
δ(u, v, w, x) = (S_3 − S_2) / 2.
Then the hyperbolicity δ(G) of a graph G is the maximum hyperbolicity over all possible
quadruplets of G, i.e.,
δ(G) = max_{u,v,w,x ∈ V} δ(u, v, w, x).
δ-Hyperbolicity measures the local deviation of a metric from a tree metric; a metric is
a tree metric if and only if it has hyperbolicity 0. Note that chordal graphs, mentioned in
Section 2.3, have hyperbolicity at most 1 [42], while k-chordal graphs have hyperbolicity
at most k/4 [163].
In Table 5, we show the hyperbolicities of most of our graph datasets. Computing
hyperbolicity is costly, so we omitted the three largest datasets, for which the
computation would take a very long time. The best known algorithm for computing
hyperbolicity has time complexity O(n^{3.69}), where n is the number
of vertices in the graph; it was proposed in [92] and involves matrix multiplications. This
algorithm is still slow on large graphs and is hard to implement. The authors of [92]
also propose a 2-approximation algorithm for calculating hyperbolicity that
runs in O(n^{2.69}) time and a 2 log_2 n-approximation algorithm that runs in O(n^2) time. In
our computations, we used the naive algorithm, which calculates the exact hyperbolicity
of a given graph in O(n^4) time by evaluating the hyperbolicities of its quadruplets. It is
easy to show that the hyperbolicity of a graph is realized on one of its biconnected components.
Thus, for very large graphs, we needed to check hyperbolicities only for quadruplets
coming from the same biconnected component. Additionally, we used an algorithm by
Cohen et al. [58], which also has O(n^4) time complexity but performs well in practice because
it prunes the search space of quadruplets.
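The naive O(n^4) computation mentioned above can be sketched as follows. This is a minimal illustration only, assuming a connected unweighted graph given as an adjacency dictionary; it includes neither the restriction to biconnected components nor the pruning of [58], and the function names are hypothetical. It also tallies how many quadruplets attain each δ value, which is the kind of distribution reported for quadruplets.

```python
from collections import Counter, deque
from itertools import combinations


def bfs_distances(adj, source):
    """Single-source shortest-path distances in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def quadruplet_deltas(adj):
    """Count how many quadruplets attain each value (S_3 - S_2) / 2."""
    d = {v: bfs_distances(adj, v) for v in adj}
    counts = Counter()
    for u, v, w, x in combinations(list(adj), 4):
        sums = sorted((d[u][v] + d[w][x], d[u][w] + d[v][x], d[u][x] + d[v][w]))
        counts[(sums[2] - sums[1]) / 2] += 1
    return counts


def hyperbolicity(adj):
    """Maximum four-point value over all quadruplets (0.0 if fewer than 4 vertices)."""
    counts = quadruplet_deltas(adj)
    return max(counts) if counts else 0.0
```

On the 4-cycle the three distance sums are 2, 2 and 4, so the sketch returns 1.0; on a path with four vertices it returns 0.0, as expected for a tree.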
Graph G = (V, E)          n = |V|   m = |E|   δ(G)
PPI                       1458      1948      3.5
Yeast                     2224      6609      2.5
DutchElite                3621      4311      4
EPA                       4253      8953      2.5
EVA                       4475      4664      1
California                5925      15770     3
Erdös                     6927      11850     2
Routeview                 10515     21455     2.5
Homo release 3.2.99       16711     115406    2
AS Caida 20071105         26475     53381     2.5
Dimes 3/2010              26424     90267     2
Aqualab 12/2007-09/2008   31845     143383    2
AS Caida 20120601         41203     121309    2
Table 5: δ-hyperbolicity of the graph datasets.
It turns out that most of the quadruplets in our datasets have small δ values (see
Table 6). For example, more than 96% of vertex quadruplets in EVA and Erdös datasets
have δ values equal to 0. For the remaining graph datasets in Table 6, more than 96% of
the quadruplets have δ ≤ 1, indicating that all of those graphs are metrically very close
to trees.
δ       PPI        Yeast        DutchElite   EPA        EVA      California   Erdös
0       0.4831     0.487015     0.54122195   0.5778     0.9973   0.49057007   0.96694
0.5     0.3634     0.450362     0            0.3655     0.0007   0.41052969   0.03278
1       0.1336     0.060844     0.42201697   0.0552     0.0020   0.09527387   0.00028
1.5     0.0179     0.001762     0            0.0015     –        0.00344690   6.80E-08
2       0.0019     0.000017     0.03642388   2.09E-05   –        0.00017945   3.64E-11
2.5     3.55E-05   2.4641E-09   0            1.37E-10   –        0.00000001   –
3       1.65E-06   –            0.00033717   –          –        1.88E-11     –
3.5     3.79E-09   –            0            –          –        –            –
4       –          –            0.00000004   –          –        –            –
% ≤ 1   98.01      99.8221      96.323891    99.84      100      99.637364    99.99999
Table 6: Relative frequency of δ-hyperbolicity of quadruplets in our graph datasets that
have fewer than 10K vertices.
In the remaining part of this section, we discuss the theoretical relations between
parameters δ(G) and ∆s (G) of a graph. In [52], the following inequality was proven.
Proposition 2 ([52]). For every n-vertex graph G and any vertex s of G,
∆s(G) ≤ 4 + 12δ(G) + 8δ(G) log_2 n.
Here we complement that inequality by showing that the hyperbolicity of a graph is
at most ∆s (G).
Proposition 3. For every n-vertex graph G and any vertex s of G,
δ(G) ≤ ∆s (G).
Proof. Let LP(G, s) be a layering partition of G and Γ(G, s) be the corresponding layering tree (consult Figure 2). From construction of LP(G, s) and Γ(G, s), every cluster
C of LP(G, s) separates in G any two vertices belonging to nodes (clusters) of different
subtrees of the forest obtained from Γ(G, s) by removing node C. Note that every vertex
of G belongs to exactly one node (cluster) of the layering tree Γ(G, s).
Consider an arbitrary quadruplet x, y, z, w of vertices of G. Let X, Y, Z, W be the four
nodes in Γ(G, s) (i.e., four clusters in LP(G, s)) containing vertices x, y, z, w, respectively.
In the tree Γ(G, s), consider a median node M of nodes X, Y, Z, W, i.e., a node M
whose removal from Γ(G, s) leaves no connected subtree with more than two nodes
from {X, Y, Z, W}. As a consequence, any connected component of graph G[V \ M] (the
graph obtained from G by removing vertices of M) cannot have more than 2 vertices out
of {x, y, z, w}. Thus, M separates at least 4 pairs out of the 6 possible pairs formed by
vertices x, y, z, w. Assume, without loss of generality, that M separates in G vertices x
and y from vertices z and w. See Figure 3 for an illustration.
Let µa be the distance from a ∈ {x, y, z, w} to its closest vertex in M . Let a, b be
a pair of vertices from {x, y, z, w}. If the vertices a, b belong to different components
of G[V \ M ], then M separates a from b and therefore µa + µb ≤ dG (a, b). Since M
separates in G vertices x and y from vertices z and w, we get dG (x, z) + dG (y, w) ≥
µx +µy +µz +µw and dG (x, w)+dG (y, z) ≥ µx +µy +µz +µw . On the other hand, all three
sums dG(x, z) + dG(y, w), dG(x, w) + dG(y, z) and dG(x, y) + dG(z, w) are less than or equal
to µx + µy + µz + µw + 2∆s(G), since, by the triangle inequality, dG(a, b) ≤ µa + µb + ∆s(G)
for every a, b ∈ {x, y, z, w}. Now, since the two larger distance sums are between µ and
µ + 2∆s(G), where µ := µx + µy + µz + µw, we conclude that the difference between the
two larger distance sums is at most 2∆s(G). Thus, necessarily δ(G) ≤ ∆s(G).
Figure 3: Illustration to the proof of Proposition 3. (a) M is a median node for X, Y, Z, W in
Γ(G, s). (b) M separates in G vertices x and y from vertices z and w.
Combining Proposition 2 with Proposition 1, one obtains also the following interesting
result relating the hyperbolicity of a graph G with additive distortion of embedding of
G to its canonic tree H.
Proposition 4 ([52]). For any graph G = (V, E) and its canonic tree H = (V, F ) the
following is true:
∀u, v ∈ V, dH (u, v) − 2 ≤ dG (u, v) ≤ dH (u, v) + O(δ(G) log n).
Since a canonic tree H is constructible in linear time for a graph G, by Proposition 4,
the distances in n-vertex δ-hyperbolic graphs can efficiently be approximated within an
additive error of O(δ log n) by a tree metric and this approximation is sharp (see [96, 99]
and [52, 94]).
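The way Proposition 4 follows from Propositions 1 and 2 can be written out as a short chain of inequalities. The sketch below is our reading of that combination; the final step assumes δ(G) ≥ 1/2, which holds for every graph with nonzero hyperbolicity because graph distances are integers, so that the additive constant 4 is absorbed by the O-term.

```latex
\begin{align*}
d_G(u,v) &\le d_H(u,v) + \Delta_s(G)
          && \text{(Proposition 1)}\\
         &\le d_H(u,v) + 4 + 12\,\delta(G) + 8\,\delta(G)\log_2 n
          && \text{(Proposition 2)}\\
         &= d_H(u,v) + O(\delta(G)\log n).
\end{align*}
```

The lower bound d_H(u, v) − 2 ≤ d_G(u, v) is taken over unchanged from Proposition 1.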
Graphs and general geodesic spaces with small hyperbolicities have many other algorithmic advantages. They allow efficient approximate solutions for a number of optimization problems. For example, Krauthgamer and Lee [119] presented a PTAS for
the Traveling Salesman Problem when the set of cities lies in a hyperbolic metric space.
Chepoi and Estellon [55] established a relationship between the minimum number of
balls of radius r + 2δ covering a finite subset S of a δ-hyperbolic geodesic space and the
size of the maximum r-packing of S and showed how to compute such coverings and
packings in polynomial time. Chepoi et al. gave in [52] efficient algorithms for fast and
accurate estimations of diameters and radii of δ-hyperbolic geodesic spaces and graphs.
Additionally, Chepoi et al. showed in [53] that every n-vertex δ-hyperbolic graph has an
additive O(δ log n)-spanner with at most O(δn) edges and enjoys an O(δ log n)-additive
routing labeling scheme with O(δ log^2 n)-bit labels and an O(log δ)-time routing protocol.
We elaborate more on these results in Section 2.7.
2.5 Tree-Distortion
The problem of approximating a given graph metric by a “simpler” metric is well
motivated from several different perspectives. A particularly simple metric of choice, also
favored from the algorithmic point of view, is a tree metric, i.e., a metric arising from
shortest path distance on a tree containing the given points. In recent years, a number
of authors considered problems of minimum distortion embeddings of graphs into trees
(see [9, 19, 21, 54]), most popular among them being a non-contractive embedding with
minimum multiplicative distortion.
Let G = (V, E) be a graph. The (multiplicative) tree-distortion td(G) of G is the
smallest integer α such that G admits a tree T (possibly weighted and with Steiner points)
with
∀u, v ∈ V, dG (u, v) ≤ dT (u, v) ≤ α dG (u, v).
The problem of finding, for a given graph G, a tree T = (V ∪ S, F ) satisfying dG (u, v) ≤
dT (u, v) ≤ td(G)dG (u, v), for all u, v ∈ V , is known as the problem of minimum distortion
non-contractive embedding of graphs into trees. In a non-contractive embedding, the
distance in the tree must always be larger than or equal to the distance in the graph,
i.e., the tree distances “dominate” the graph distances.
It is known that this problem is NP-hard; moreover, the hardness result of [9]
implies that it is NP-hard to approximate td(G) better than γ, for some small constant
γ. The best known approximation algorithm, a 6-approximation based on the layering
partition technique, was recently given in [54]. It improves the previously known
100-approximation algorithm from [21] and 27-approximation algorithm from [19]. Below
we provide a short description of the method of [54].
The following proposition establishes a relationship between the tree-distortion and
the cluster-diameter of a graph.
Proposition 5 ([54]). For every graph G and any vertex s, ∆s(G)/3 ≤ td(G) ≤ 2∆s(G) + 2.
Proposition 5 shows that the cluster-diameter ∆s (G) of a layering partition of a graph
G linearly bounds the tree-distortion td(G) of G.
Combining Proposition 5 and Proposition 1, the following result is obtained.
Proposition 6 ([54]). For any graph G = (V, E) and its canonic tree H = (V, F ) the
following is true:
∀u, v ∈ V, dH (u, v) − 2 ≤ dG (u, v) ≤ dH (u, v) + 3 td(G).
Surprisingly, a multiplicative distortion turned into an additive distortion. Furthermore, while a tree T = (V ∪ S, F ) satisfying dG (u, v) ≤ dT (u, v) ≤ td(G)dG (u, v), for all
u, v ∈ V , is NP-hard to find, a canonic tree H of G can be constructed in O(m) time
(where m = |E|).
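The step from Propositions 5 and 1 to Proposition 6 is a one-line substitution, sketched here for convenience: the left inequality of Proposition 5 gives ∆s(G) ≤ 3 td(G), which is then inserted into the upper bound of Proposition 1.

```latex
\begin{align*}
\Delta_s(G)/3 \le td(G) \;&\Longrightarrow\; \Delta_s(G) \le 3\,td(G),\\
d_G(u,v) \le d_H(u,v) + \Delta_s(G) \;&\le\; d_H(u,v) + 3\,td(G).
\end{align*}
```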
By assigning proper weights to edges of a canonic tree H or adding at most n = |V|
new Steiner points to H, the authors of [54] achieve a good non-contractive embedding of
a graph G into a tree. Recall that a canonic tree H = (V, F) of G = (V, E) is constructed
in the following way: identify for each cluster C = L^i_j ∈ LP(G, s) of a layering partition
LP(G, s) of G an arbitrary vertex x_C ∈ L^{i−1} which has a neighbor in C = L^i_j and make
x_C adjacent in H with all vertices v ∈ C (see Figure 4a). Note that H is an unweighted
tree, without any Steiner points, and resembles a BFS-tree of G. Two other trees for G
are constructed as follows.
Tree Hℓ : Tree Hℓ = (V, F, ℓ) is obtained from H by assigning uniformly the weight
ℓ = max{dG (u, v) : uv is an edge of H} to all edges of H. So, Hℓ is a uniformly weighted
tree without Steiner points. It turns out that G embeds in tree Hℓ non-contractively.
Note that, although the topology of the tree Hℓ can be determined in O(m) time (Hℓ is
isomorphic to H), computation of the weight ℓ requires O(nm) time. Thus, the tree Hℓ
is constructible in O(nm) total time. See Figure 4a for an illustration.
Tree Hℓ′: Tree Hℓ′ = (V ∪ S, F′, ℓ) is obtained from H by first introducing one Steiner
point p_C for each cluster C := L^i_j and adding an edge between each vertex of C and p_C and
an edge between p_C and the support vertex x_C for C, and then by assigning uniformly
the weight ℓ = (1/2) max{∆s(G), max{dG(u, v) : uv is an edge of H}} to all edges of the
obtained tree. So, Hℓ′ is a uniformly weighted tree with at most O(n) Steiner points.
Again, G embeds into tree Hℓ′ non-contractively and Hℓ′ can be obtained in O(nm) total
time. See Figure 4b for an illustration.
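The two weighted trees can be sketched in Python once a canonic tree is available. The sketch below is illustrative only: it assumes the edges of H, the clusters with their support vertices, and the value ∆s(G) are already known, and the names uniform_weight and steiner_tree are hypothetical. Steiner points are represented as tuples ("p", index).

```python
from collections import deque


def bfs_distances(adj, source):
    """Single-source shortest-path distances in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def uniform_weight(adj, tree_edges):
    """Weight of H_l: the largest graph distance between endpoints of an edge of H."""
    cache = {}
    best = 0
    for u, v in tree_edges:
        if u not in cache:
            cache[u] = bfs_distances(adj, u)  # one BFS per distinct endpoint
        best = max(best, cache[u][v])
    return best


def steiner_tree(clusters_with_support, weight):
    """Edges (a, b, weight) of H_l': one Steiner point per non-root cluster."""
    edges = []
    for index, (support, cluster) in enumerate(clusters_with_support):
        steiner = ("p", index)
        edges.append((support, steiner, weight))           # p_C to support x_C
        edges.extend((steiner, v, weight) for v in cluster)  # p_C to each v in C
    return edges


# Example: the 4-cycle with s = 0, canonic tree edges (0,1), (0,3), (1,2).
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
h_edges = [(0, 1), (0, 3), (1, 2)]
ell = uniform_weight(adj, h_edges)        # weight used for H_l
delta_s = 2                               # cluster {1, 3} has diameter 2 in G
ell_prime = 0.5 * max(delta_s, ell)       # weight used for H_l'
h_prime = steiner_tree([(0, [1, 3]), (1, [2])], ell_prime)
```

In this example both weights equal 1, and the tree distance between vertices 1 and 3 in the Steiner tree is 2, matching their distance in the graph.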
Figure 4: Embedding into trees H, Hℓ and Hℓ′. (a) Topology of trees H and Hℓ.
(b) Topology of tree Hℓ′; squares denote Steiner points.
Constructed trees have the following distance properties (for comparison reasons, we
include also the results for H mentioned earlier).
Proposition 7 ([54]). Let G = (V, E) be a graph, s be an arbitrary vertex, α = td(G),
∆s = ∆s (G), and H, Hℓ , Hℓ′ be trees as described above. Then, for any two vertices x
and y of G, the following are true:
dH (x, y) − 2 ≤ dG (x, y) ≤ dH (x, y) + ∆s ,
dH (x, y) − 2 ≤ dG (x, y) ≤ dH (x, y) + 3α,
dG (x, y) ≤ dHℓ (x, y) ≤ (∆s + 1)(dG (x, y) + 2),
dG (x, y) ≤ dHℓ (x, y) ≤ max{3α − 1, 2α + 1} (dG (x, y) + 2) ,
dG (x, y) ≤ dHℓ′ (x, y) ≤ (∆s + 1)(dG (x, y) + 1),
dG (x, y) ≤ dHℓ′ (x, y) ≤ 3α(dG (x, y) + 1).
As pointed out in [54], tree Hℓ′ provides a 6-approximate solution to the problem of
minimum distortion non-contractive embedding of a graph into a tree.
In our empirical study, we analyze embeddings of our graph datasets into each of
these three trees and measure how closely these graph datasets resemble a tree from this
perspective. We compute the following measures:
- maximum distortion right := max{ dT(u, v)/dG(u, v) : u, v ∈ V, dT(u, v) > dG(u, v) > 0 };
- maximum distortion left := max{ dG(u, v)/dT(u, v) : u, v ∈ V, dG(u, v) > dT(u, v) > 0 };
- average distortion right := avg{ dT(u, v)/dG(u, v) : u, v ∈ V, dT(u, v) > dG(u, v) > 0 };
- average distortion left := avg{ dG(u, v)/dT(u, v) : u, v ∈ V, dG(u, v) > dT(u, v) > 0 };
- average relative distortion := avg{ |dT(u, v) − dG(u, v)| / dG(u, v) : u, v ∈ V };
- distance-weighted average distortion :=
  (1 / Σ_{u,v∈V} dG(u, v)) · Σ_{u,v∈V} (dG(u, v) · dT(u, v)/dG(u, v)) = Σ_{u,v∈V} dT(u, v) / Σ_{u,v∈V} dG(u, v).
We call a pair of distinct vertices u, v of G = (V, E) a right pair with respect to tree
H = (V, F) if dG(u, v) < dH(u, v). If dH(u, v) < dG(u, v), then they are called a left pair.
Note that G has no left pairs with respect to trees Hℓ and Hℓ′. Hence, in the case of trees
Hℓ and Hℓ′, we talk only about maximum distortion, average distortion, average relative
distortion and distance-weighted average distortion. Distance-weighted average distortion is used
in the literature when distortion of distant pairs of vertices is more important than that
of close pairs, as it gives larger weight to the distortion of distant pairs (see [109]).
Clearly, any tree graph would have maximum distortion, average relative distortion and
distance-weighted average distortion equal to 1, 0 and 1, respectively.
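The measures listed above can be computed directly from two distance tables. The following sketch is a minimal illustration under stated assumptions: the graph is connected, d_g[u][v] and d_t[u][v] are available for every unordered pair of distinct vertices, and the function name distortion_measures and the dictionary keys are hypothetical. Measures over an empty set of pairs are reported as None.

```python
from itertools import combinations
from statistics import mean


def distortion_measures(vertices, d_g, d_t):
    """Distortion measures of a tree metric d_t with respect to a graph metric d_g."""
    pairs = list(combinations(vertices, 2))
    # Right pairs: the tree stretches the distance; left pairs: the tree shrinks it.
    right = [d_t[u][v] / d_g[u][v] for u, v in pairs if d_t[u][v] > d_g[u][v]]
    left = [d_g[u][v] / d_t[u][v] for u, v in pairs if d_g[u][v] > d_t[u][v]]
    return {
        "max_right": max(right, default=None),
        "max_left": max(left, default=None),
        "avg_right": mean(right) if right else None,
        "avg_left": mean(left) if left else None,
        "avg_relative": mean(
            abs(d_t[u][v] - d_g[u][v]) / d_g[u][v] for u, v in pairs
        ),
        "distance_weighted": sum(d_t[u][v] for u, v in pairs)
        / sum(d_g[u][v] for u, v in pairs),
    }


# Example: the 4-cycle 0-1-2-3-0 embedded into the path 3-0-1-2.
d_g = {0: {1: 1, 2: 2, 3: 1}, 1: {2: 1, 3: 2}, 2: {3: 1}}
d_t = {0: {1: 1, 2: 2, 3: 1}, 1: {2: 1, 3: 2}, 2: {3: 3}}
measures = distortion_measures([0, 1, 2, 3], d_g, d_t)
```

In the example only the pair (2, 3) is distorted (tree distance 3 against graph distance 1), so it is the single right pair and there are no left pairs.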
Graph                 average      max          % of left   average      max          % of right   % of pairs   average      distance-weighted
                      distortion   distortion   pairs       distortion   distortion   pairs        dT = dG      relative     average
                      left         left         (round.)    right        right        (round.)     (round.)     distortion   distortion
PPI                   1.50159      7            70.5        1.34140      3            9.1          20.4         0.24669      0.790311
Yeast                 1.48714      5            56.3        1.38989      3            12.2         31.5         0.219268     0.850311
DutchElite            1.54045      7            73.0        1.41254      3            3.9          23.1         0.252341     0.760714
EPA                   1.50416      5            44.66       1.38107      3            10.47        44.87        0.178557     0.878082
EVA                   1.29905      6            32.31       1.27780      3            14.77        52.92        0.110271     0.951626
California            1.52477      5            61.82       1.37071      3            7.92         30.25        0.227176     0.810647
Erdös                 1.35242      3            2.75        1.41097      3            8.91         88.34        0.0437277    1.02241
Routeview             1.40636      4            24.39       1.41413      3            33.34        42.28        0.205375     1.03343
Homo release 3.2.99   1.533        4            2.83        1.67827      3            25.16        72.01        0.180092     1.13402
AS Caida 20071105     1.48085      4            21.43       1.35730      3            35.42        43.15        0.192302     1.02943
Dimes 3/2010          1.53666      3            5.74        1.37247      3            44.42        49.84        0.184767     1.12555
Aqualab               1.42269      4            31.71       1.41923      3            35.75        32.54        0.241815     1.03194
AS Caida 20120601     1.34538      4            22.42       1.40429      3            20.43        57.15        0.138869     1.0068
itdk0304              1.60077      8            94.85       1.26367      3            0.55         4.60         0.331656     0.673012
DBLB-coauth           1.77416      9            95.82       1.24977      3            0.59         3.59         0.383101     0.615328
Amazon                2.48301      19           99.17       1.20027      3            0.20         0.63         0.536656     0.536656
Table 7: Distortion results of embedding datasets into a canonic tree H.
Tables 7 and 8 show the results of embedding our graph datasets into tree H and into trees
Hℓ and Hℓ′, respectively. It turns out that most of the datasets embed into tree H with average
distortion (right or left, right being usually better) between 1 and 1.5. Also, many pairs
of vertices enjoy exact embedding into tree H, i.e., they preserve their original graph distances
(for example, around 88% of the pairs in the Erdös dataset, 72% of the pairs in Homo release
3.2.99 and 57% in AS Caida 20120601). Comparing
the results of non-contractive embeddings into trees Hℓ and Hℓ′, we observe that, for most
datasets, maximum distortions are slightly improved in Hℓ′ over distortions in Hℓ, but average
distortions are very much comparable. Furthermore, for about half of the datasets (e.g.,
DutchElite, EVA and California), distance-weighted average distortions are better in
Hℓ than in Hℓ′. This is consistent with Gupta's claim in [101] that the Steiner points do not really
help.

                                        tree Hℓ                                               tree Hℓ′
Graph                     average      max          average      distance-    average      max          average      distance-
                          distortion   distortion   relative     weighted     distortion   distortion   relative     weighted
                                                    distortion   average                                distortion   average
                                                                 distortion                                          distortion
PPI                       5.70566      21           4.70566      5.53218      5.29652      16           4.29652      5.2027
Yeast                     4.37781      15           3.37781      4.25155      3.79318      12           2.79318      3.74159
DutchElite                5.45299      21           4.45299      5.325        6.53269      20           5.53269      6.4574
EPA                       4.50619      15           3.50619      4.39041      4.06901      12           3.06901      3.99447
EVA                       5.83084      18           4.83084      5.70976      7.77752      18           6.77752      7.65544
California                4.15785      15           3.15785      4.05324      4.98668      16           3.98668      4.92935
Erdös                     3.08843      9            2.08843      3.06724      3.06705      8            2.06705      3.05622
Routeview                 4.28302      12           3.28302      4.13371      4.80363      12           3.80363      4.66503
Homo release 3.2.99       4.64504      12           3.64504      4.53609      3.96703      10           2.96703      3.94713
AS Caida 20071105         4.24314      12           3.24314      4.11772      4.76795      12           3.76795      4.65617
Dimes 3/2010              3.43833      9            2.43833      3.37664      3.35917      8            2.35917      3.32159
Aqualab 12/2007-09/2008   4.23183      12           3.23183      4.12775      4.54116      12           3.54116      4.4587
AS Caida 20120601         4.10547      12           3.10547      4.0272       4.53051      12           3.53051      4.4896
itdk0304                  5.370078     24           4.37008      5.3841       5.710122     22           4.71012      5.82908
DBLB-coauth               5.57869      27           4.57869      5.53795      5.12724      22           4.12724      5.14932
Amazon                    8.81911      57           7.81911      8.78382      7.87004      42           6.87004      7.95201
Table 8: Distortion results of non-contractive embedding of datasets into trees Hℓ and
Hℓ′.
As tree Hℓ′ provides a 6-approximate solution to the problem of minimum distortion
non-contractive embedding of a graph into a tree, dividing by 6 the maximum distortion
values in Table 8 for tree Hℓ′, we obtain a lower bound on td(G) for each graph dataset
G. For example, td(G) is at least 4/3 for Erdös and Dimes 3/2010, at least 5/3 for
Homo release 3.2.99, at least 2 for Yeast, EPA, Routeview, AS Caida 20071105, Aqualab
12/2007-09/2008 and AS Caida 20120601, at least 8/3 for PPI and California, at least
3 for EVA, at least 10/3 for DutchElite, at least 11/3 for itdk0304 and DBLB-coauth,
and at least 7 for Amazon.
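The arithmetic behind these lower bounds is a single division, shown here with exact fractions. The snippet is only a worked check; the maximum distortion values are those reported for tree Hℓ′ in Table 8 for a few of the datasets, and the helper name td_lower_bound is hypothetical.

```python
from fractions import Fraction


def td_lower_bound(max_distortion, approximation_ratio=6):
    """Lower bound on td(G) implied by a 6-approximate embedding."""
    return Fraction(max_distortion, approximation_ratio)


# Maximum distortion of the embedding into the Steiner tree (values from Table 8).
max_distortion = {
    "Erdös": 8,
    "Homo release 3.2.99": 10,
    "Yeast": 12,
    "PPI": 16,
    "EVA": 18,
    "DutchElite": 20,
    "itdk0304": 22,
    "Amazon": 42,
}
bounds = {name: td_lower_bound(value) for name, value in max_distortion.items()}
for name, bound in bounds.items():
    print(f"{name}: td(G) >= {bound}")
```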
2.6 Tree-Breadth, Tree-Length and Tree-Stretch
There are two other graph parameters measuring metric tree-likeness of a graph that
are based on the notion of tree-decomposition introduced by Robertson and Seymour
in their work on graph minors [150]: the tree-breadth tb(G) and the tree-length tl(G).
Analysis of a few real-life networks (like Aqualab,
AS Caida, Dimes) performed in [62] shows that although those networks have small
hyperbolicities, they all have sufficiently large tree-width due to well-connected cores.
As we demonstrate below, the tree-length of those graph datasets is relatively small.
Evidently, for any graph G, 1 ≤ tb(G) ≤ tl(G) ≤ 2tb(G) holds. Hence, if one parameter
is bounded by a constant for a graph G then the other parameter is bounded for G
as well. Clearly, in view of a tree-decomposition T(G) of a graph G, the smaller the
parameters tl(G) and tb(G) of G are, the closer the graph G is to a tree metrically.
Unfortunately, while graphs with tree-length 1 (they are exactly the chordal graphs)
can be recognized in linear time, the problem of determining whether a given graph has
tree-length at most λ is NP-complete for every fixed λ > 1 (see [127]). Judging from
this result, it is conceivable that the problem of determining whether a given graph has
tree-breadth at most ρ is NP-complete, too.
The following proposition from [71] establishes a relationship between the tree-length
and the cluster-diameter of a layering partition of a graph.
Proposition 8 ([71]). For every graph G and any vertex s, ∆s(G)/3 ≤ tl(G) ≤ ∆s(G) + 1.
Thus, the cluster-diameter ∆s (G) of a layering partition provides easily computable
bounds for the hard to compute parameter tl(G).
One can prove similar inequalities relating the tree-breadth and the cluster-radius of
a layering partition of a graph.
Proposition 9. For every graph G and any vertex s,
∆s (G)/6 ≤ Rs (G)/3 ≤ tb(G) ≤ Rs (G) + 1 ≤ ∆s (G) + 1.
Furthermore, a tree-decomposition of G with breadth at most 3tb(G) can be constructed
in O(n + m) time.
Proof. The proof is similar to the proof of Proposition 8 from [71]. First we show
Rs(G)/3 ≤ tb(G). Let T(G) be a tree-decomposition of G with minimum breadth tb(G).
Let X1X2 be an edge of T(G) and T1, T2 be the subtrees of T(G) obtained by removing the edge
X1X2. It is known [65] that the set I = X1 ∩ X2 separates in G vertices belonging to bags
of T1 but not to I from vertices belonging to bags of T2 but not to I. Assume that T(G)
is rooted at a bag containing vertex s, the source of the layering partition LP(G, s). Let C
be a cluster from layer L^i (i.e., C = L^i_j for some j = 1, ..., p_i). Let Z be the nearest
common ancestor of all bags of T(G) containing vertices of C. Let z be a vertex such
that Z ⊆ B_{tb(G)}(z, G).
Figure 5: Illustration to the proof of Proposition 9.
Consider an arbitrary vertex x ∈ C. Necessarily, there is a vertex y ∈ C and two bags X
and Y of T(G) containing vertices x and y, respectively, such that Z = NCA_{T(G)}(X, Y)
(i.e., Z is the nearest common ancestor of X and Y in T(G)). Let P be a shortest path
of G from s to x. By the separator property above, P intersects Z. See Figure 5 for an
illustration. Let a be a vertex of P ∩ Z closest to s in G. Since both x and y belong
to C, there exists a path Q from x to y in G using only intermediate vertices w with
dG(s, w) ≥ i. Let b ∈ Q ∩ Z (i.e., Q intersects Z at vertex b). We have dG(s, x) = i =
dG(s, a) + dG(a, x) and i ≤ dG(s, b) ≤ dG(s, a) + dG(a, z) + dG(z, b) ≤ dG(s, a) + 2tb(G).
Hence, dG(a, x) = i − dG(s, a) ≤ 2tb(G) and therefore
dG(x, z) ≤ dG(x, a) + dG(a, z) ≤ 2tb(G) + tb(G) = 3tb(G).
Thus, any vertex x of C is at distance at most 3tb(G) from z in G, implying Rs(G)/3 ≤
tb(G).
Note that, for the neighbor x′ of x on P, dG(x′, z) ≤ 3tb(G) − 1 must hold, i.e.,
B_{3tb(G)}(z, G) contains not only all vertices of C = L^i_j but also all neighbors of vertices of
C lying in layer L^{i−1}. This fact will be useful in the second part of this proof.
Now we show that tb(G) ≤ Rs(G) + 1. Consider the tree Γ(G, s) of a layering partition
LP(G, s) and assume Γ(G, s) is rooted at node {s}. Let p(C) be the parent of node C in
Γ(G, s). Clearly, Γ(G, s) already satisfies conditions 1 and 3 of tree-decompositions and
only violates condition 2, as the edges joining vertices in different (neighboring) layers
are not yet covered by bags (which are the clusters in this case). We can obtain a
tree-decomposition Γ′ from Γ(G, s) as follows. Γ′ will have the same structure as Γ(G, s);
only the nodes of Γ(G, s) will slightly expand to cover additional edges of G and form
the bags of Γ′. To each node C of Γ(G, s) (assume C ⊆ L^i) we add all vertices from its
parent p(C) (p(C) ⊆ L^{i−1}) which are adjacent to vertices of C in G. This expansion of
C results in a bag C^+ of Γ′ which, by construction, now also contains each edge uv of
G with u ∈ C ⊆ L^i and v ∈ p(C) ⊆ L^{i−1}. Thus, Γ′ satisfies conditions 1 and 2 of
tree-decompositions. Also, if C ⊆ B_r(z) for some vertex z and integer r, then C^+ ⊆ B_{r+1}(z)
must hold. Furthermore, each vertex v of G that was in a node C now belongs to bag
C^+ and possibly to bags formed from children of C in Γ(G, s) (and only to them). Hence, all
bags containing v form a star in Γ′. All of this indicates that Γ′ is a tree-decomposition of
G with breadth at most Rs(G) + 1, i.e., tb(G) ≤ Rs(G) + 1.
Furthermore, as we indicated in the first part of this proof, for any cluster C there is
a vertex z in G such that C^+ ⊆ B_{3tb(G)}(z, G). The latter implies that the tree-decomposition Γ′ obtained
from Γ(G, s) has breadth at most 3tb(G). Finally, since Γ′ is constructible in linear time
and Rs(G) ≤ ∆s(G) ≤ 2Rs(G) holds for every graph G, the proposition follows.
Hence, the cluster-radius Rs (G) of a layering partition provides easily computable
bounds for the tree-breadth tb(G) of a graph. In Table 9, we show the corresponding
lower and upper bounds on the tree-breadth for some of our datasets. The lower bound is
obtained by dividing Rs (G) by 3, the upper bound is obtained by calculating the breadth
of the tree-decomposition Γ′ .
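The upper bound described above can be sketched in Python: expand every cluster by its neighbors in the previous layer, then measure the breadth of the resulting bags as the smallest radius of a ball of G containing each bag. This is a brute-force illustration under stated assumptions, not the linear-time construction: the BFS layers (dist) and the clusters of the layering partition are taken as given inputs, and the function names expanded_bags and breadth are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Single-source shortest-path distances in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def expanded_bags(adj, dist, clusters):
    """Bags C^+: each cluster plus its neighbors lying in the previous layer."""
    bags = []
    for cluster in clusters:
        i = dist[cluster[0]]
        bag = set(cluster)
        for v in cluster:
            bag.update(w for w in adj[v] if dist[w] == i - 1)
        bags.append(bag)
    return bags


def breadth(adj, bags):
    """Smallest r such that every bag lies in a ball of radius r around some vertex of G."""
    d = {v: bfs_distances(adj, v) for v in adj}
    return max(min(max(d[z][v] for v in bag) for z in adj) for bag in bags)


# Example: the 4-cycle with s = 0; clusters of the layering partition are {0}, {1, 3}, {2}.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
dist = {0: 0, 1: 1, 3: 1, 2: 2}
bags = expanded_bags(adj, dist, [[0], [1, 3], [2]])
upper_bound = breadth(adj, bags)
```

In the example the bags are {0}, {0, 1, 3} and {1, 2, 3}, each contained in a ball of radius 1, so the sketch reports an upper bound of 1 on the tree-breadth of the 4-cycle.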
Graph G = (V, E)          Rs(G)   lower bound   upper bound
                                  on tb(G)      on tb(G)
PPI                       4       2             5
Yeast                     4       2             4
DutchElite                6       2             6
EPA                       4       2             4
EVA                       5       2             5
California                4       2             4
Erdös                     2       1             2
Routeview                 3       1             4
Homo release 3.2.99       3       1             3
AS Caida 20071105         3       1             3
Dimes 3/2010              2       1             2
Aqualab 12/2007-09/2008   3       1             3
AS Caida 20120601         3       1             3
itdk0304                  6       2             6
DBLB-coauth               7       3             7
Amazon                    12      4             12
Table 9: Lower and upper bounds on the tree-breadth of our graph datasets.
Reformulating Proposition 1, we obtain the following result.
Proposition 10. For any graph G = (V, E) and its canonic tree H = (V, F ), the following is true:
∀u, v ∈ V, dH (u, v) − 2 ≤ dG (u, v) ≤ dH (u, v) + 3 tl(G) ≤ dH (u, v) + 6 tb(G).
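The reformulation can be spelled out as follows; this is a sketch of the substitution, using the left inequality of Proposition 8 (∆s(G) ≤ 3 tl(G)) and the relation tl(G) ≤ 2tb(G) stated at the beginning of this section.

```latex
\begin{align*}
d_G(u,v) &\le d_H(u,v) + \Delta_s(G)   && \text{(Proposition 1)}\\
         &\le d_H(u,v) + 3\,tl(G)      && \text{(Proposition 8)}\\
         &\le d_H(u,v) + 6\,tb(G)      && (tl(G) \le 2\,tb(G)).
\end{align*}
```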
Graphs with small tree-length or small tree-breadth have many other nice properties. Every n-vertex graph with tree-length tl(G) = λ has an additive 2λ-spanner with
O(λn + n log n) edges and an additive 4λ-spanner with O(λn) edges, both constructible
in polynomial time [70]. Every n-vertex graph G with tb(G) = ρ has a system of at most
log_2 n collective additive tree (2ρ log_2 n)-spanners constructible in polynomial time [73].
Those graphs also enjoy a 6λ-additive routing labeling scheme with O(λ log^2 n)-bit labels and an O(log λ)-time routing protocol [69], and a (2ρ log_2 n)-additive routing labeling
scheme with O(log^3 n)-bit labels and an O(1)-time routing protocol with O(log n) message
initiation time (by combining results of [73] and [78]). See Section 2.7 for some details.
Here we elaborate a little bit more on a connection established in [76] between the tree-breadth and the tree-stretch of a graph (and the corresponding tree t-spanner problem).
The tree-stretch ts(G) of a graph G = (V, E) is the smallest number t such that G
admits a spanning tree T = (V, E′) with dT(u, v) ≤ t · dG(u, v) for every u, v ∈ V. The
tree T is called a tree t-spanner of G and the problem of finding such a tree T for G is
known as the tree t-spanner problem. Note that as T is a spanning tree of G, necessarily
dG(u, v) ≤ dT(u, v) and E′ ⊆ E. The latter makes the tree-stretch parameter different
from the tree-distortion, where new edges (not from the graph) can be used to build a tree. It
is known that the tree t-spanner problem is NP-hard [44]. The best known approximation
algorithms have approximation ratio of O(log n) [76, 86].
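To make the definition concrete, the stretch of a given spanning tree can be computed as sketched below. The sketch relies on the standard observation that, in an unweighted graph, the worst ratio dT(u, v)/dG(u, v) is attained on an edge of G, so it suffices to look at tree distances between adjacent vertices. It evaluates one candidate tree only and does not search for an optimal tree t-spanner; the function name tree_stretch is hypothetical.

```python
from collections import deque


def tree_stretch(adj, tree_edges):
    """Stretch of a spanning tree: max tree distance between endpoints of a graph edge."""
    tree_adj = {v: [] for v in adj}
    for u, v in tree_edges:
        tree_adj[u].append(v)
        tree_adj[v].append(u)
    stretch = 1
    for u in adj:
        # BFS in the tree from u gives d_T(u, .) for all vertices.
        d_t = {u: 0}
        queue = deque([u])
        while queue:
            a = queue.popleft()
            for b in tree_adj[a]:
                if b not in d_t:
                    d_t[b] = d_t[a] + 1
                    queue.append(b)
        for v in adj[u]:
            stretch = max(stretch, d_t[v])
    return stretch


# Example: the 4-cycle 0-1-2-3-0 and its spanning path 3-0-1-2.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
path_edges = [(3, 0), (0, 1), (1, 2)]
stretch = tree_stretch(cycle, path_edges)
```

In the example the graph edge between 2 and 3 is replaced by a tree path of length 3, so the spanning path is a tree 3-spanner of the 4-cycle.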
The following two results were obtained in [76].
Proposition 11 ([76]). For every graph G, tb(G) ≤ ⌈ts(G)/2⌉ and tl(G) ≤ ts(G).
Proposition 12 ([76]). For every n-vertex graph G, ts(G) ≤ 2tb(G) log_2 n. Furthermore,
a spanning tree T of G with dT(u, v) ≤ 2tb(G) log_2 n · dG(u, v), for every u, v ∈ V, can be
constructed in polynomial time.
Proposition 12 is obtained by showing that every n-vertex graph G with tb(G) =
ρ admits a tree (2ρ log_2 n)-spanner constructible in polynomial time. Together with
Proposition 11, this provides a log_2 n-approximate solution for the tree t-spanner problem
in general unweighted graphs.
We conclude this section with two other inequalities establishing relations between
the tree-stretch and the tree-distortion and hyperbolicity of a graph.
Proposition 13 ([72]). For every graph G, tl(G) ≤ td(G) ≤ ts(G) ≤ 2td(G) log_2 n.
Proposition 14 ([72]). For every δ-hyperbolic graph G, ts(G) ≤ O(δ log^2 n).
Proposition 13 says that if a graph G is non-contractively embeddable into a tree
with distortion td(G) then it is embeddable into a spanning tree with stretch at most
2td(G) log_2 n. Furthermore, a spanning tree with stretch at most 2td(G) log_2 n can be
constructed in polynomial time. Proposition 14 says that every δ-hyperbolic graph G
admits a tree O(δ log^2 n)-spanner. Furthermore, such a spanning tree for a δ-hyperbolic
graph can be constructed in polynomial time.
2.7 Use of Metric Tree-Likeness
As we have mentioned earlier, metric tree-likeness of a graph is useful in a number
of ways. Among other advantages, it allows one to design compact and efficient approximate
distance labeling and routing labeling schemes, and to estimate the diameter and the radius of a graph quickly and accurately. In this section, we elaborate more on these applications.
In general, low distortion embedability of a graph G into a tree T allows one to solve approximately many distance related problems on G by first solving them on the tree T and
then interpreting that solution on G.
2.7.1 Approximate distance queries
Commonly, when one makes a query concerning a pair of vertices in a graph (adjacency, distance, shortest route, etc.), one needs to make a global access to the structure
storing that information. A compromise to this approach is to store enough information
locally in a label associated with a vertex such that the query can be answered using only
the information in the labels of the two vertices in question and nothing else. Motivation
of localized data structure in distributed computing is surveyed and widely discussed
in [95, 138].
Here, we are mainly interested in the distance and routing labeling schemes introduced
by Peleg (see, e.g., [138]). Distance labeling schemes are schemes that label the vertices
of a graph with short labels in such a way that the distance between any two vertices
u and v can be determined or estimated efficiently by merely inspecting the labels of u
and v, without using any other information. Routing labeling schemes are schemes that
label the vertices of a graph with short labels in such a way that given the label of a
source vertex and the label of a destination, it is possible to compute efficiently the port
number of the edge from the source that heads in the direction of the destination.
It is known that n-vertex trees enjoy a distance labeling scheme where each vertex is
assigned an O(log² n)-bit label such that given labels of two vertices the distance between
them can be inferred in constant time [140]. We can use for our datasets their canonic
trees to compactly and distributively encode their approximate distance information.
Given a graph dataset G, we first compute in linear time its canonic tree H. Then, we
preprocess H in O(n log n) time (see [140]) to assign each vertex v ∈ V an O(log² n)-bit
distance label. Given two vertices u, v ∈ V , we can compute in O(1) time the distance
dH (u, v) from their labels and output this distance as a good estimate for the distance
between u and v in G.
In Figure 6, we demonstrate how accurately canonic trees represent pairwise distances
in our datasets. For a given number ϵ ≥ 1, we show how many vertex pairs had a
distortion less than ϵ, i.e., pairs u, v ∈ V with $\max\{ d_H(u,v)/d_G(u,v),\ d_G(u,v)/d_H(u,v) \} < \epsilon$. We can see that
H approximates distances for most vertex pairs with a high level of accuracy. Exact
graph distances were preserved in H for at least 40% of pairs in 8 datasets (EPA, EVA,
Erdös, Routeview, Homo, AS Caida 20071105, Dimes 3/2010 and AS Caida 20120601).
At least 50% of pairs of 6 datasets have distance distortion in H less than 1.2. At least
60% of pairs for 6 datasets have distance distortion less than 1.3. At least 70% of pairs
of 10 datasets have distance distortion less than 1.5. At least 80% of pairs of 14 datasets
have distance distortion less than 2. At least 90% of pairs of 14 datasets have distance
distortion less than 2.2. For the DBLB-coauth dataset, 80% (90%) of pairs embed into
H with distortion no more than 2.2 (2.4, respectively; not shown in the table). For the
Amazon dataset, 80% (90%) of pairs embed into H with distortion no more than 3.2
(3.8, respectively; not shown in the table).

Graph G = (V, E)            distortion
                            = 1     < 1.2   < 1.3   < 1.5   < 2     < 2.2
PPI                         20.41   37.68   47.90   65.93   90.68   96.37
Yeast                       31.51   38.45   53.22   72.30   91.03   98.55
DutchElite                  23.13   27.99   42.97   64.60   88.71   95.44
EPA                         44.87   50.83   65.50   76.52   91.82   98.68
EVA                         52.92   73.37   82.68   92.83   99.12   99.88
California                  30.25   40.21   51.89   64.53   88.97   98.06
Erdös                       88.34   88.34   89.84   96.99   99.55   99.98
Routeview                   42.28   44.75   58.17   81.94   96.40   99.85
Homo release 3.2.99         72.01   72.13   73.48   79.08   90.79   99.97
AS Caida 20071105           43.15   46.60   62.39   84.54   95.68   99.90
Dimes 3/2010                49.84   50.06   56.77   89.30   97.05   99.99
Aqualab 12/2007-09/2008     32.54   33.23   44.61   76.46   95.93   99.98
AS Caida 20120601           57.15   59.57   71.82   89.58   98.65   99.98
itdk0304                     4.60   15.18   23.67   42.54   81.98   93.55
DBLB-coauth                  3.59   12.08   17.60   30.64   67.92   83.10
Amazon                       0.63    2.67    4.57   10.16   33.10   46.53
(a) Percentage of vertex pairs whose distance was distorted only up to a given value.

(b) Accumulative frequency chart (x-axis: distortion, from 1 to 2.2; y-axis: accumulative frequency, from 0 to 1; one curve per dataset).

Figure 6: Distortion distribution for embedding of a graph dataset into its canonic tree H.
Hence, using embeddings of our datasets into their canonic trees, we obtain a compact
and efficient approximate distance labeling scheme for them. Each vertex of a graph
dataset G gets an O(log² n)-bit label from the canonic tree and the distance between any
two vertices of G can be computed with a good level of accuracy in constant time from
their labels only.
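The distortion statistics reported in Figure 6 can be illustrated with a small sketch. It assumes that the graph G and the tree H are given as adjacency dictionaries over the same vertex set and that both are connected; the function names and the toy input are hypothetical, and the all-pairs BFS used here is only practical for small graphs (it is not the labeling scheme of [140]).

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from `source` in an unweighted graph {vertex: neighbors}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def distortion_fractions(graph_adj, tree_adj, thresholds):
    """For each threshold eps, the fraction of vertex pairs whose distortion
    max(dH/dG, dG/dH) is below eps; eps == 1 counts exactly preserved pairs."""
    vertices = list(graph_adj)
    d_g = {v: bfs_distances(graph_adj, v) for v in vertices}
    d_h = {v: bfs_distances(tree_adj, v) for v in vertices}
    ratios = [
        max(d_h[u][v] / d_g[u][v], d_g[u][v] / d_h[u][v])
        for u, v in combinations(vertices, 2)
    ]
    return {
        eps: sum(1 for r in ratios if (r == 1.0 if eps == 1 else r < eps)) / len(ratios)
        for eps in thresholds
    }


# Toy input (hypothetical): a 4-cycle and a star "tree" that uses the non-graph edge 0-2.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
fractions = distortion_fractions(cycle, star, [1, 1.5, 2.2])
```

On this toy input, half of the pairs keep their exact distance and every pair has distortion below 2.2.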
2.7.2 Approximating optimal routes
First we formally define approximate routing labeling schemes. A family ℜ of graphs is
said to have an l(n) bit (s, r)-approximate routing labeling scheme if there exists a function
L, labeling the vertices of each n-vertex graph in ℜ with distinct labels of up to l(n) bits,
and an efficient algorithm/function f, called the routing decision or routing protocol, that
given the label of a current vertex v and the label of the destination vertex (the header of
the packet), decides in time polynomial in the length of the given labels and using only
those two labels, whether this packet has already reached its destination, and if not, to
which neighbor of v to forward the packet. Furthermore, the routing path from any source
s to any destination t produced by this scheme in a graph G from ℜ must have the length
at most s · dG (s, t) + r. For simplicity, (1, r)-approximate labeling schemes (distance or
routing) are called r-additive labeling schemes, and (s, 0)-approximate labeling schemes
are called s-multiplicative labeling schemes.
A very good routing labeling scheme exists for trees [157]. An n-vertex tree can be
preprocessed in O(n log n) time so that each vertex is assigned an O(log n)-bit routing
label. Given the label of a source vertex and the label of a destination, it is possible to
compute in constant time the port number of the edge from the source that lies on the
(shortest) path to the destination.
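To make the idea of label-based routing in a tree concrete, here is a deliberately simplified interval-routing sketch. It is not the scheme of [157]: each label stores the DFS intervals of all children, so a label may need far more than O(log n) bits, and neighbor names stand in for port numbers. All names are hypothetical.

```python
def interval_labels(tree_adj, root):
    """Label every vertex with its DFS number, its parent and its children's DFS intervals."""
    enter, leave, parent = {root: 0}, {}, {root: None}
    counter = 0
    stack = [(root, iter(tree_adj[root]))]
    while stack:
        v, neighbors = stack[-1]
        for w in neighbors:
            if w != parent[v]:
                parent[w] = v
                counter += 1
                enter[w] = counter
                stack.append((w, iter(tree_adj[w])))
                break
        else:
            leave[v] = counter  # largest DFS number inside the subtree of v
            stack.pop()
    return {
        v: {
            "id": enter[v],
            "parent": parent[v],
            "children": [(enter[w], leave[w], w) for w in tree_adj[v] if parent[w] == v],
        }
        for v in tree_adj
    }


def next_hop(label_current, dest_id):
    """Routing decision from the current vertex's label and the destination's DFS number."""
    if dest_id == label_current["id"]:
        return None  # the packet has arrived
    for low, high, child in label_current["children"]:
        if low <= dest_id <= high:
            return child
    return label_current["parent"]


def route(labels, source, dest):
    path = [source]
    while (hop := next_hop(labels[path[-1]], labels[dest]["id"])) is not None:
        path.append(hop)
    return path


tree = {0: [1, 2], 1: [0, 3, 4], 2: [0], 3: [1], 4: [1]}
labels = interval_labels(tree, 0)
path_3_to_2 = route(labels, 3, 2)
```

Each decision uses only the label of the current vertex and the header of the packet, as the definition above requires.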
Unfortunately, a canonic tree H of a graph G is not suitable for approximately routing
in G; H may have artificial edges (not coming from G) and therefore a path of H from
a source to a destination may not be available for routing in G. To reduce the problem
of routing in G to routing in a tree T , the tree T needs to be a spanning tree of G.
Hence, a spanning tree T of G with minimum stretch (i.e., a tree t-spanner of G with
t = ts(G)) would be a perfect choice. Unfortunately, finding a tree t-spanner of a graph
with minimum t is an NP-hard problem.
For our graph datasets, one can exploit the facts that they have small tree-breadth/tree-length and/or small hyperbolicity.
If the tree-breadth of an n-vertex graph G is ρ, then, by a result from [76], G admits a
tree (2ρ log₂ n)-spanner constructible in polynomial time. Hence, G enjoys a (2ρ log₂ n)-multiplicative routing labeling scheme with O(log n) bit labels and O(1) time routing
protocol (routing is essentially done in that tree spanner). Another result for graphs with
tb(G) = ρ, useful for designing routing labeling schemes, is presented in [73]. It states that
every n-vertex graph G with tb(G) = ρ has a system of at most log₂ n collective additive
tree (2ρ log₂ n)-spanners, i.e., a system T of at most log₂ n spanning trees of G such that
for any two vertices u, v of G there is a tree T in T with dT(u, v) ≤ dG(u, v) + 2ρ log₂ n.
Furthermore, such a system T for G can be constructed in polynomial time [73]. By
combining this with a result from [78], we obtain that every n-vertex graph G with
tb(G) = ρ enjoys a (2ρ log₂ n)-additive routing labeling scheme with O(log³ n) bit labels
and O(1) time routing protocol with O(log n) message initiation time. The approach
of [78] is to assign to each vertex of G a label with O(log³ n) bits (distance and routing
labels coming from log₂ n spanning trees) and then, using the label of source vertex v
and the label of destination vertex u, identify in O(log n) time the best spanning tree in
T to route from v to u.
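The role of a system of collective tree spanners can be illustrated as follows: for each pair of vertices one picks the tree of the system in which the pair is closest. The sketch below assumes the trees are given as adjacency dictionaries and recomputes distances by BFS; the scheme of [78] obtains the same choice from vertex labels alone. The names and the toy system are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` in an unweighted graph {vertex: neighbors}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def best_tree(trees, u, v):
    """Index of the spanning tree with the smallest u-v distance, and that distance."""
    distances = [bfs_distances(tree, u)[v] for tree in trees]
    index = min(range(len(trees)), key=distances.__getitem__)
    return index, distances[index]


# Two spanning trees of the 4-cycle 0-1-2-3-0: one drops edge 3-0, the other drops edge 0-1.
t0 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
t1 = {0: [3], 1: [2], 2: [1, 3], 3: [2, 0]}
system = [t0, t1]
```

Neither tree alone preserves both adjacent pairs (0, 1) and (0, 3), but the system as a whole does.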
If the tree-length of an n-vertex graph G is λ, then, by a result from [69], G enjoys a
6λ-additive routing labeling scheme with O(λ log² n) bit labels and O(log λ) time routing
protocol.
If the hyperbolicity of an n-vertex graph G is δ, then, by a result from [53], G enjoys
an O(δ log n)-additive routing labeling scheme with O(δ log² n) bit labels and O(log δ)
time routing protocol. Note that for any graph G, the hyperbolicity of G is at most its
tree-length [52].
Thus, for our graph datasets, there exists a very compact labeling scheme (at most
O(log² n) or O(log³ n) bits per vertex) that encodes logarithmic length routes between any
pair of vertices, i.e., routes of length at most dG(u, v) + min{O(δ log n), 6λ, 2ρ log₂ n} ≤
diam(G) + O(log n) ≤ O(log n) for each vertex pair u, v of G. The latter implies very
good navigability of our graph datasets. Recall that, for our graph datasets, diam(G) ≤
O(log n) holds.
2.7.3 Approximating diameter and radius
Recall that the eccentricity of a vertex v of a graph G, denoted by ecc(v), is the
maximum distance from v to any other vertex of G, i.e., ecc(v) := max_{u∈V} dG(v, u). The
diameter diam(G) of G is the largest eccentricity of a vertex in G, i.e., diam(G) :=
max_{v∈V} ecc(v) = max_{v,u∈V} dG(u, v). The radius rad(G) of G is the smallest eccentricity
of a vertex in G, i.e., rad(G) := min_{v∈V} ecc(v). A vertex c of G with ecc(c) = rad(G)
(i.e., a smallest eccentricity vertex) is called a central vertex of G. The center C(G) of
G is the set of all central vertices of G. Let also F(v) := {u ∈ V : dG(v, u) = ecc(v)} be
the set of vertices of G furthest from v.

Graph G = (V, E)            diameter   radius   # of BFS scans needed   estimated radius or ecc(·)
                            diam(G)    rad(G)   to get diam(G)          of a middle vertex
PPI                         19         11       3                       12
Yeast                       11         6        3                       6
DutchElite                  22         12       4                       13
EPA                         10         6        2                       7
EVA                         18         10       2                       10
California                  13         7        2                       8
Erdös                       4          2        2                       3
Routeview                   10         5        2                       5
Homo release 3.2.99         10         5        2                       6
AS Caida 20071105           17         9        2                       9
Dimes 3/2010                8          4        2                       5
Aqualab 12/2007-09/2008     9          5        2                       5
AS Caida 20120601           10         5        2                       5
itdk0304                    26         14       2                       15
DBLB-coauth                 23         12       2                       14
Amazon                      47         24       2                       26
Table 10: Estimation of diameters and radii.
In general (even unweighted) graphs, it is still an open problem whether the diameter
and/or the radius of a graph G can be computed faster than the time needed to compute
the entire distance matrix of G (which requires O(nm) time for a general unweighted
graph). On the other hand, it is known that both the diameter and the radius of a
tree T can be calculated in linear time. That can be done by using two Breadth-First-Search (BFS) scans as follows. Pick an arbitrary vertex u of T. Run a BFS starting
from u to find v ∈ F(u). Run a second BFS starting from v to find w ∈ F(v). Then
dT (v, w) = diam(T ), i.e., v, w is a diametral pair of T , and rad(T ) = ⌊(dT (v, w) + 1)/2⌋.
To find the center of T it suffices to take one or two adjacent middle vertices of the
(v, w)-path of T .
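The two-scan procedure for trees described above can be sketched as follows, assuming the tree is an adjacency dictionary of an unweighted tree; the function names are illustrative only.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` in an unweighted graph {vertex: neighbors}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def furthest_vertex(adj, source):
    """A vertex of F(source) together with its distance from `source`."""
    dist = bfs_distances(adj, source)
    far = max(dist, key=dist.get)
    return far, dist[far]


def tree_diameter_radius(tree_adj):
    """Two BFS scans: u -> v in F(u), then v -> w in F(v); d(v, w) is the diameter."""
    u = next(iter(tree_adj))          # arbitrary start vertex
    v, _ = furthest_vertex(tree_adj, u)
    w, diameter = furthest_vertex(tree_adj, v)
    radius = (diameter + 1) // 2      # floor((d_T(v, w) + 1) / 2)
    return diameter, radius, (v, w)


path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
```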
Interestingly, in [52], Chepoi et al. established that this approach of 2 BFS-scans can
be adapted to provide fast (in linear time) and accurate approximations of the diameter,
radius, and center of any finite set S of δ-hyperbolic geodesic spaces and graphs. In
particular, for a δ-hyperbolic graph G, it was shown that if v ∈ F (u) and w ∈ F (v), then
dG (v, w) ≥ diam(G) − 2δ and rad(G) ≤ ⌊(dG (v, w) + 1)/2⌋ + 3δ. Furthermore, the center
C(G) of G is contained in the ball of radius 5δ + 1 centered at a middle vertex c of any
shortest path connecting v and w in G.
Since our graph datasets have small hyperbolicities, according to [52], a few (2, 3, 4,
...) BFS-scans, each next starting at a vertex last visited by the previous scan, should
provide a pair of vertices x and y such that dG(x, y) is close to the diameter diam(G) of
G. Surprisingly (see Table 10), a few BFS-scans were sufficient to get exact diameters of all
of our datasets: for thirteen datasets, two BFS-scans (just like for trees) were sufficient
to find the exact diameter of a graph. Two datasets needed three BFS-scans to find the
diameter, and only one dataset required four BFS-scans to get the diameter. We also
computed the eccentricity of a middle vertex of a longest shortest path produced by these
few BFS-scans and reported this eccentricity as an estimation for the graph radius. It
turned out that the eccentricity of that middle vertex was equal to the exact radius for
six datasets, was only one apart from the exact radius for eight datasets, and only for
two datasets was two units apart from the exact radius.
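The experiment described above can be sketched for a general unweighted graph as follows. The code runs a fixed number of BFS scans, each starting at the vertex last visited by the previous one, keeps the longest distance found as a diameter estimate, and reports the eccentricity of a middle vertex of a corresponding shortest path as a radius estimate. The function names are hypothetical, and the graph is assumed to be connected.

```python
from collections import deque


def bfs(adj, source):
    """BFS from `source`; returns distances, BFS parents and the visiting order."""
    dist, parent, order = {source: 0}, {source: None}, [source]
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                parent[y] = x
                order.append(y)
                queue.append(y)
    return dist, parent, order


def sweep_estimates(adj, start, scans):
    """Return (diameter estimate, eccentricity of a middle vertex) after `scans` BFS scans."""
    best_len, best_x, best_y = 0, start, start
    current = start
    for _ in range(scans):
        dist, _, order = bfs(adj, current)
        last = order[-1]  # last visited vertex, hence one furthest from `current`
        if dist[last] > best_len:
            best_len, best_x, best_y = dist[last], current, last
        current = last
    # Rebuild a shortest path between the best pair and take a middle vertex.
    _, parent, _ = bfs(adj, best_x)
    path = [best_y]
    while path[-1] != best_x:
        path.append(parent[path[-1]])
    middle = path[len(path) // 2]
    ecc_middle = max(bfs(adj, middle)[0].values())
    return best_len, ecc_middle


cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
```

The final eccentricity computation costs one more BFS, so the whole estimate stays linear in the size of the graph for a constant number of scans.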
2.8 Conclusion
Based on solid theoretical foundations, we presented strong evidence that a number
of real-life networks, taken from different domains like Internet measurements, biological datasets, web graphs, social and collaboration networks, exhibit metric tree-like
structures. We investigated a few graph parameters, namely, the tree-distortion and
the tree-stretch, the tree-length and the tree-breadth, Gromov’s hyperbolicity, the
cluster-diameter and the cluster-radius in a layering partition of a graph, which capture
and quantify this phenomenon of being metrically close to a tree. Recent advances in
theory allowed us to calculate or accurately estimate these parameters for sufficiently
large networks. All these parameters are at most constant or (poly)logarithmic factors
apart from each other. Specifically, graph parameters td(G), tl(G), tb(G), ∆s (G), Rs (G)
are within small constant factors from each other. Parameters ts(G) and δ(G) are within
factor of at most O(log n) from td(G), tl(G), tb(G), ∆s (G), Rs (G). Tree-stretch ts(G)
is within factor of at most O(log² n) from hyperbolicity δ(G). One can summarize those
relationships with the following chains of inequalities:
δ(G) ≤ ∆s(G) ≤ O(δ(G) log n); Rs(G) ≤ ∆s(G) ≤ 2Rs(G); tb(G) ≤ tl(G) ≤ 2tb(G);
δ(G) ≤ tl(G) ≤ td(G) ≤ ts(G) ≤ 2tb(G) log₂ n ≤ O(δ(G) log² n);
tl(G) − 1 ≤ ∆s(G) ≤ 3tl(G) ≤ 3td(G) ≤ 3(2∆s(G) + 2);
tb(G) − 1 ≤ Rs(G) ≤ 3tb(G) ≤ 3⌈ts(G)/2⌉.
If one of these parameters or its average version has small value for a large scale network,
we say that that network has a metric tree-like structure. Among these parameters,
theoretically smallest ones are δ(G), Rs (G) and tb(G) (tb(G) being at most Rs (G) + 1).
Our experiments showed that average versions of ∆s (G) and of td(G) have also very small
values for the investigated graph datasets.
In Table 11, we provide a summary of metric tree-likeness measurements calculated for
our datasets. Figure 7 shows four important metric tree-likeness measurements (scaled)
in comparison. Figure 8 gives pairwise dependencies between those measurements (one
as a function of another).
Graph G = (V, E)            diameter  radius  cluster-   average diameter   δ(G)  Tree H        Hℓ          Hℓ′         cluster-
                            diam(G)   rad(G)  diameter   of clusters in           average       average     average     radius
                                              ∆s(G)      LP(G, s)                 distortion*   distortion  distortion  Rs(G)
                                                                                  (round.)
PPI                         19        11      8          0.118977384        3.5   1.38471       5.70566     5.29652     4
Yeast                       11        6       6          0.119575699        2.5   1.32182       4.37781     3.79318     4
DutchElite                  22        12      10         0.070211316        4     1.41056       5.45299     6.53269     6
EPA                         10        6       6          0.06698375         2.5   1.26507       4.50619     4.06901     4
EVA                         18        10      9          0.031879981        1     1.13766       5.83084     7.77752     5
California                  13        7       8          0.092208234        3     1.35380       4.15785     4.98668     4
Erdös                       4         2       4          0.001113232        2     1.04630       3.08843     3.06705     2
Routeview                   10        5       6          0.063264697        2.5   1.23716       4.28302     4.80363     3
Homo release 3.2.99         10        5       5          0.03432595         2     1.18574       4.64504     3.96703     3
AS Caida 20071105           17        9       6          0.056424679        2.5   1.22959       4.24314     4.76795     3
Dimes 3/2010                8         4       4          0.056582633        2     1.19626       3.43833     3.35917     2
Aqualab 12/2007-09/2008     9         5       6          0.05826733         2     1.28390       4.23183     4.54116     3
AS Caida 20120601           10        5       6          0.055568105        2     1.16005       4.10547     4.53051     3
itdk0304                    26        14      11         0.270377048        –     1.57126       5.370078    5.710122    6
DBLB-coauth                 23        12      11         0.45350002         –     1.74327       5.57869     5.12724     7
Amazon                      47        24      21         0.489056144        –     2.47109       8.81911     7.87004     12
* avg. distortion = $\dfrac{\text{avg. distortion right} \times \#\text{right pairs} + \text{avg. distortion left} \times \#\text{left pairs} + \#\text{undistorted pairs}}{\binom{n}{2}}$
Table 11: Summary of tree-likeness measurements.
From the experiment results we observe that in almost all cases the measurements
seem to be monotonic with respect to each other. The smaller one measurement is
for a given dataset, the smaller the other measurements are. There are also a few exceptions. For example, EVA dataset has relatively large cluster-diameter, ∆s (G) = 9,
but small hyperbolicity, δ(G) = 1. On the other hand, Erdös dataset has ∆s (G) = 4
while its hyperbolicity δ(G) is equal to 2 (see Figure 8a). Yet Erdös dataset has better embedability (smaller average distortions) to trees H, Hℓ and Hℓ′ than that of EVA,
suggesting that the (average) cluster-diameter may have greater impact on the embedability into trees H, Hℓ and Hℓ′ . Comparing the measurements of Erdös vs. Homo release
3.2.99, we observe that both have the same hyperbolicity 2, but Erdös has better embedability (average distortion) to trees H, Hℓ , Hℓ′ . This could be explained by smaller
∆s (G) and average diameter of clusters in Erdös dataset. Comparing measurements of
PPI vs. California (the same holds for AS Caida 20071105 vs. AS Caida 20120601),
both have the same ∆s (G) and Rs (G) values but California (AS Caida 20120601) has
smaller hyperbolicity and average diameter of clusters. We also observe that the datasets
Routeview and AS Caida 20071105 have the same values of ∆s (G), Rs (G) and δ(G) but
AS Caida 20071105 has a relatively smaller average diameter of clusters. This could explain why AS Caida 20071105 has relatively better embedability to H, Hℓ and Hℓ′ than
Routeview. We can see that the difference in average diameters of clusters was relatively
small, resulting in small difference in embeddability.
From these observations, one can suggest that for classification of our datasets all
these tree-likeness measurements are important; they collectively capture and explain
metric tree-likeness of the datasets. We suggest that metric tree-likeness measurements
in conjunction with other local characteristics of networks, such as the degree distribution
and clustering coefficients, provide a more complete unifying picture of networks.
Figure 7: Four tree-likeness measurements scaled (plotted series include ∆s(G), tree H average distortion ×10 and average diameter of clusters ×10).

Figure 8: Tree-likeness measurements: pairwise comparison. (a) hyperbolicity δ(G) vs. cluster-diameter ∆s(G); (b) hyperbolicity δ(G) vs. avg. distortion of H; (c) cluster-diameter ∆s(G) vs. avg. distortion of H; (d) avg. diameter of clusters vs. avg. distortion of H.
CHAPTER 3
Collective Additive Tree Spanners and the Tree-Breadth of a Graph with Consequences
3.1 Introduction
The work in this chapter was inspired by a few recent results from [70,76,84,86]. Elkin
and Peleg in [84], among other results, described a polynomial time algorithm that,
given an n-vertex graph G admitting a tree t-spanner, constructs a t-spanner of G with
O(n log n) edges. Emek and Peleg in [86] presented the first O(log n)-approximation
algorithm for the minimum value of t for the tree t-spanner problem. They described a polynomial time algorithm that, given an n-vertex graph G admitting a tree
t-spanner, constructs a tree O(t log n)-spanner of G. Later, a simpler and faster O(log n)-approximation algorithm for the problem was given by Dragan and Köhler [76]. Their
result uses a new necessary condition for a graph to have a tree t-spanner: if a graph G
has a tree t-spanner, then G admits a Robertson-Seymour tree-decomposition with bags
of radius at most ⌈t/2⌉ in G. In other words, if a graph G admits a tree t-spanner, then its
tree-breadth is at most ⌈t/2⌉ and its tree-length is at most t. Furthermore, any graph G
with tree-breadth tb(G) ≤ ρ admits a tree (2ρ⌊log₂ n⌋)-spanner that can be constructed
in polynomial time. Thus, these two results gave a new log₂ n-approximation algorithm
for the tree t-spanner problem on general (unweighted) graphs (see [76] for details).
The algorithm of [76] is conceptually simpler than the previous O(log n)-approximation
algorithm proposed for the problem by Emek and Peleg [86].
Dourisboure et al. in [70] considered the construction of additive spanners with few
edges for n-vertex graphs having a tree-decomposition into bags of diameter at most
λ, i.e., the tree-length λ graphs. For such graphs, they construct additive 2λ-spanners
with O(λn + n log n) edges, and additive 4λ-spanners with O(λn) edges. Combining
these results with the results of [76], we obtain the following interesting fact (in a sense,
turning a multiplicative stretch into an additive surplus without much increase in the
number of edges).
Theorem 1. (combining [70] and [76]) If a graph G admits a (multiplicative) tree t-spanner, then it has an additive 2t-spanner with O(tn + n log n) edges and an additive
4t-spanner with O(tn) edges, both constructible in polynomial time.
This fact raises a few intriguing questions. Does a polynomial time algorithm exist
that, given an n-vertex graph G admitting a (multiplicative) tree t-spanner, constructs
an additive O(t)-spanner of G with O(n) or O(n log n) edges (where the number of
edges in the spanner is independent of t)? Is a result similar to the one presented by
Elkin and Peleg in [84] possible? Namely, does a polynomial time algorithm exist that,
given an n-vertex graph G admitting a (multiplicative) tree t-spanner, constructs an
additive (t − 1)-spanner of G with O(n log n) edges? (Note that any additive (t − 1)-spanner is a multiplicative t-spanner; see Proposition 16.) If we allow the use of more trees (like
in collective tree spanners), does a polynomial time algorithm exist that, given an n-vertex graph G admitting a (multiplicative) tree t-spanner, constructs a system of Õ(1)
collective additive tree Õ(t)-spanners of G (where Õ is similar to Big-O notation up to
a poly-logarithmic factor)? Note that an interesting question whether a multiplicative
tree spanner can be turned into an additive tree spanner with a slight increase in the
stretch is (negatively) settled already in [86]: if there exist some δ = o(n) and ϵ > 0 and
a polynomial time algorithm that for any graph admitting a tree t-spanner constructs a
tree ((6/5 − ϵ)t + δ)-spanner, then P=NP.
We give some partial answers to these questions. Moreover, we investigate a more
general question whether a graph with bounded tree-breadth admits a small system of
collective additive tree spanners. We show that any n-vertex graph G has a system of
at most log₂ n collective additive tree (2ρ log₂ n)-spanners, where ρ ≤ tb(G). This settles
also an open question from [70] whether a graph with tree-length λ admits a small system
of collective additive tree Õ(λ)-spanners.
As a consequence, we obtain that there is a polynomial time algorithm that, given an
n-vertex graph G admitting a (multiplicative) tree t-spanner, constructs:
- a system of at most log₂ n collective additive tree O(t log n)-spanners of G (compare
with [76, 86] where a multiplicative tree O(t log n)-spanner was constructed for G
in polynomial time; thus, we “have turned” a multiplicative tree O(t log n)-spanner
into at most log₂ n collective additive tree O(t log n)-spanners);
- an additive O(t log n)-spanner of G with at most n log₂ n edges (compare with
Theorem 1).
It is well known that the t-spanners can equivalently be defined as follows.
Proposition 15 ([44]). Let G be a connected graph and t be a positive number. A
spanning subgraph H of G is a t-spanner of G if and only if for every edge xy of G,
dH (x, y) ≤ t holds.
This proposition implies that the stretch of a spanning subgraph of a graph G is
always obtained on a pair of vertices that form an edge in G. Consequently, throughout
this dissertation, t can be considered as an integer which is greater than 1 (the case t = 1
is trivial since H must be G itself).
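Proposition 15 suggests a simple way to compute the stretch of a spanning subgraph: it is enough to inspect the edges of G. The sketch below, with hypothetical function names and graphs given as adjacency dictionaries, computes the stretch both over edges only and over all vertex pairs, so the two values can be compared on a small example.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` in an unweighted graph {vertex: neighbors}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def stretch_over_edges(graph_adj, sub_adj):
    """max d_H(x, y) over edges xy of G (the criterion of Proposition 15)."""
    worst = 1
    for x in graph_adj:
        d_h = bfs_distances(sub_adj, x)
        for y in graph_adj[x]:
            worst = max(worst, d_h[y])
    return worst


def stretch_over_all_pairs(graph_adj, sub_adj):
    """max d_H(x, y) / d_G(x, y) over all pairs of distinct vertices."""
    worst = 1.0
    for x in graph_adj:
        d_g = bfs_distances(graph_adj, x)
        d_h = bfs_distances(sub_adj, x)
        for y in graph_adj:
            if y != x:
                worst = max(worst, d_h[y] / d_g[y])
    return worst


cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
path_tree = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # spanning tree missing edge 3-0
```

For the 4-cycle and its spanning path, both functions report stretch 3, attained on the removed edge.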
It is also known that additive and multiplicative spanners are related as follows.
Proposition 16 ([146]). Every additive r-spanner of G is a (multiplicative) (r + 1)-spanner of G. The converse is generally not true.
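The first claim of Proposition 16 follows from a one-line estimate, written out here for an unweighted graph G, an additive r-spanner H of G with r ≥ 0, and distinct vertices x, y (so that d_G(x, y) ≥ 1):

```latex
d_H(x,y) \;\le\; d_G(x,y) + r \;\le\; d_G(x,y) + r\, d_G(x,y) \;=\; (r+1)\, d_G(x,y).
```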
3.2 Collective Additive Tree Spanners and the Tree-Breadth of a Graph
In this section, we show that every n-vertex graph G has a system of at most log₂ n
collective additive tree (2ρ log₂ n)-spanners, where ρ ≤ tb(G). We also discuss consequences of this result. Our method is a generalization of techniques used in [79] and [76].
We will assume that n ≥ 4 since any connected graph with at most 3 vertices has an
additive tree 1-spanner.
Note that we do not assume here that a tree-decomposition T (G) of breadth ρ is given
for G as part of the input. Our method does not need to know T (G), our algorithm works
directly on G. For a given graph G and an integer ρ, even checking whether G has a
tree-decomposition of breadth ρ could be a hard problem. For example, while graphs
with tree-length 1 (as they are exactly the chordal graphs) can be recognized in linear
time, the problem of determining whether a given graph has tree-length at most λ is
NP-complete for every fixed λ > 1 (see [127]).
We will need the following results proven in [76].
Lemma 1 ([76]). Every graph G has a balanced disk-separator Dr (v, G) centered at some
vertex v, where r ≤ tb(G).
Lemma 2 ([76]). For an arbitrary graph G with n vertices and m edges, a balanced
disk-separator Dr (v, G) with minimum r can be found in O(nm) time.
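As an illustration of what Lemma 2 computes, the following naive sketch searches all centers v and radii r for a balanced disk-separator of minimum radius. It assumes that "balanced" means that every connected component of G[V \ Dr(v, G)] has at most n/2 vertices, that G is connected, and that G is given as an adjacency dictionary. It does not attempt to match the O(nm) running time of [76]; the function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` in an unweighted graph {vertex: neighbors}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def component_sizes(adj, removed):
    """Sizes of the connected components left after deleting the vertices in `removed`."""
    seen, sizes = set(removed), []
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        queue, size = deque([s]), 0
        while queue:
            x = queue.popleft()
            size += 1
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        sizes.append(size)
    return sizes


def min_balanced_disk_separator(adj):
    """Return (r, v, disk) with minimum r such that D_r(v, G) is balanced."""
    n, best = len(adj), None
    for v in adj:
        dist = bfs_distances(adj, v)
        for r in range(max(dist.values()) + 1):
            if best is not None and r >= best[0]:
                break  # this center cannot improve on the best radius found so far
            disk = {x for x, d in dist.items() if d <= r}
            if all(size <= n / 2 for size in component_sizes(adj, disk)):
                best = (r, v, disk)
                break  # enlarging the disk only shrinks components further
    return best


path7 = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
```

On the 7-vertex path the middle vertex alone is a balanced separator (r = 0), while the 6-cycle needs a disk of radius 1.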
3.3 Hierarchical decomposition of a graph with bounded tree-breadth
In this section, following [76], we show how to decompose a graph with bounded
tree-breadth and build a hierarchical decomposition tree for it. This hierarchical decomposition tree is used later for construction of collective additive tree spanners for such a
graph.
Let G = (V, E) be an arbitrary connected n-vertex m-edge graph with a disk-separator
Dr(v, G). Also, let G1, . . . , Gq be the connected components of G[V \ Dr(v, G)]. Denote
by Si := {x ∈ V(Gi) | dG(x, Dr(v, G)) = 1} the neighborhood of Dr(v, G) with respect
to Gi. Let also $G_i^+$ be the graph obtained from component Gi by adding a vertex ci
(representative of Dr(v, G)) and making it adjacent to all vertices of Si, i.e., for a vertex
x ∈ V(Gi), ci x ∈ E($G_i^+$) if and only if there is a vertex xD ∈ Dr(v, G) with x xD ∈ E(G).
See Figure 9 for an illustration. In what follows, we will call vertex ci a meta vertex
representing disk Dr(v, G) in graph $G_i^+$. Given a graph G and its disk-separator Dr(v, G),
the graphs $G_1^+, \dots, G_q^+$ can be constructed in total time O(m). Furthermore, the total
number of edges in the graphs $G_1^+, \dots, G_q^+$ does not exceed the number of edges in G,
and the total number of vertices (including q meta vertices) in those graphs does not
exceed the number of vertices in G[V \ Dr(v, G)] plus q.
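The construction of the component graphs with their meta vertices can be sketched as follows. The sketch assumes an adjacency-dictionary representation and names the meta vertex of the i-th component ("meta", i); both the function name and this naming are illustrative assumptions.

```python
from collections import deque


def split_by_disk(adj, disk):
    """Return the graphs G_i^+ : each component of G minus `disk`, plus one meta vertex
    adjacent to every component vertex that has a neighbor in the disk."""
    disk = set(disk)
    seen, parts = set(disk), []
    for s in adj:
        if s in seen:
            continue
        # Collect the connected component of s in G[V \ disk].
        component = {s}
        seen.add(s)
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    component.add(y)
                    queue.append(y)
        meta = ("meta", len(parts))
        part = {x: {y for y in adj[x] if y in component} for x in component}
        part[meta] = {x for x in component if any(y in disk for y in adj[x])}
        for x in part[meta]:
            part[x].add(meta)
        parts.append(part)
    return parts


path7 = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
parts = split_by_disk(path7, {3})
```

For the 7-vertex path split at its middle vertex, the two resulting graphs have 6 edges and 8 vertices in total, in line with the bounds stated above.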
Figure 9: A graph G with a disk-separator Dr(v, G) and the corresponding graphs
$G_1^+, \dots, G_4^+$ obtained from G. c1, . . . , c4 are meta vertices representing the disk
Dr(v, G) in the corresponding graphs.
Denote by G/e the graph obtained from G by contracting its edge e. Recall that
edge e contraction is an operation which removes e from G while simultaneously merging
together the two vertices e previously connected. If a contraction results in multiple
edges, we delete duplicates of an edge to stay within the class of simple graphs. The
operation may be performed on a set of edges by contracting each edge (in any order).
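A minimal sketch of the contraction operation on a simple graph, assuming an adjacency-dictionary representation in which the merged vertex keeps the name of the first endpoint (the function name is hypothetical):

```python
def contract_edge(adj, u, v):
    """Return G/e for e = uv: v is merged into u, the loop and duplicate edges are dropped."""
    if v not in adj[u]:
        raise ValueError("uv is not an edge of the graph")
    new_adj = {x: set(neighbors) for x, neighbors in adj.items() if x != v}
    for x in adj[v]:
        if x != u:
            new_adj[x].discard(v)  # x loses its edge to v ...
            new_adj[x].add(u)      # ... and is joined to the merged vertex instead
            new_adj[u].add(x)      # sets merge parallel edges automatically
    new_adj[u].discard(v)
    return new_adj


# A triangle 0-1-2 with a pendant vertex 3 attached to 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
contracted = contract_edge(graph, 0, 1)
```

Contracting edge 01 of the triangle would create two parallel edges to vertex 2; the set-based representation keeps only one of them.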
The following lemma guarantees that the tree-breadths of the graphs $G_i^+$, i = 1, . . . , q,
are no larger than the tree-breadth of G.
Lemma 3 ([76]). For any graph G and its edge e, tb(G) ≤ ρ implies tb(G/e) ≤ ρ.
Consequently, for any graph G with tb(G) ≤ ρ, tb($G_i^+$) ≤ ρ holds for each i = 1, . . . , q.
Clearly, one can get $G_i^+$ from G by repeatedly contracting (in any order) edges of G
that are not incident to vertices of Gi. In other words, $G_i^+$ is a minor of G. Recall that
a graph G′ is a minor of G if G′ can be obtained from G by contracting some edges,
deleting some edges, and deleting some isolated vertices. The order in which a sequence
of such contractions and deletions is performed on G does not affect the resulting graph
G′ .
Let G = (V, E) be a connected n-vertex, m-edge graph and assume that tb(G) ≤ ρ.
Lemma 1 and Lemma 2 guarantee that G has a balanced disk-separator Dr (v, G) with
r ≤ ρ, which can be found in O(nm) time by an algorithm that works directly on graph
G and does not require construction of a tree-decomposition of G of breadth ≤ ρ. Using
these and Lemma 3, we can build a (rooted) hierarchical tree H(G) for G as follows. If G
is a connected graph with at most 5 vertices, then H(G) is a one-node tree with root node
(V(G), G). Otherwise, find a balanced disk-separator Dr(v, G) in G with minimum r (see
Lemma 2) and construct the corresponding graphs $G_1^+, G_2^+, \dots, G_q^+$. For each graph $G_i^+$
(i = 1, . . . , q) (by Lemma 3, tb($G_i^+$) ≤ ρ), construct a hierarchical tree H($G_i^+$) recursively
and build H(G) by taking the pair (Dr(v, G), G) to be the root and connecting the root
of each tree H($G_i^+$) as a child of (Dr(v, G), G).
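The recursive construction of H(G) can be sketched as follows. This is an illustration under several assumptions, not the implementation of [76]: the separator is found by a naive search rather than in O(nm) time, "balanced" is taken to mean that every remaining component has at most half of the vertices, the input graph is connected, and meta vertices are named ("meta", depth, i) and must not clash with original vertex names. Each node of the returned tree is a dictionary with hypothetical keys.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` in an unweighted graph {vertex: neighbors}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def components(adj, removed):
    """Vertex sets of the connected components left after deleting `removed`."""
    seen, result = set(removed), []
    for s in adj:
        if s in seen:
            continue
        comp, queue = {s}, deque([s])
        seen.add(s)
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    comp.add(y)
                    queue.append(y)
        result.append(comp)
    return result


def balanced_disk(adj):
    """Naive search for a balanced disk-separator (r, v, disk) with minimum r."""
    n, best = len(adj), None
    for v in adj:
        dist = bfs_distances(adj, v)
        for r in range(max(dist.values()) + 1):
            if best is not None and r >= best[0]:
                break
            disk = {x for x, d in dist.items() if d <= r}
            if all(len(c) <= n / 2 for c in components(adj, disk)):
                best = (r, v, disk)
                break
    return best


def hierarchical_tree(adj, depth=0):
    """Build H(G): leaves hold graphs with at most 5 vertices, internal nodes hold disks."""
    if len(adj) <= 5:
        return {"node": set(adj), "center": None, "radius": None, "children": []}
    r, v, disk = balanced_disk(adj)
    children = []
    for i, comp in enumerate(components(adj, disk)):
        meta = ("meta", depth, i)  # meta vertex representing the disk in this component
        sub = {x: {y for y in adj[x] if y in comp} for x in comp}
        sub[meta] = {x for x in comp if any(y in disk for y in adj[x])}
        for x in sub[meta]:
            sub[x].add(meta)
        children.append(hierarchical_tree(sub, depth + 1))
    return {"node": disk, "center": v, "radius": r, "children": children}


def tree_depth(node):
    return 0 if not node["children"] else 1 + max(tree_depth(c) for c in node["children"])


def all_nodes(node):
    yield node["node"]
    for child in node["children"]:
        yield from all_nodes(child)


path15 = {i: {j for j in (i - 1, i + 1) if 0 <= j < 15} for i in range(15)}
H = hierarchical_tree(path15)
```

On the 15-vertex path the root is the middle vertex, the tree has depth 2, and every original vertex appears in exactly one node.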
The depth of this tree H(G) (that is, the length of a longest path from the root to
any node) is the smallest integer k such that
$$\frac{n}{2^{k}} + \frac{1}{2^{k-1}} + \cdots + \frac{1}{2} + 1 \le 5,$$
that is, the depth is at most log₂ n − 1.
It is also easy to see that, given a graph G with n vertices and m edges, a hierarchical
tree H(G) can be constructed in O(nm log² n) total time. There are at most O(log n)
levels in H(G), and one needs to do at most O(nm log n) operations per level since the
total number of edges in the graphs of each level is at most m and the total number of
vertices in those graphs cannot exceed O(n log n).
For an internal (i.e., non-leaf) node $Y$ of $H(G)$, since it is associated with a pair
$(D_{r'}(v', G'), G')$, where $r' \le \rho$, $G'$ is a minor of $G$ and $v'$ is the center of disk $D_{r'}(v', G')$
of $G'$, it will be convenient in what follows to denote $G'$ by $G(\downarrow Y)$, $v'$ by $c(Y)$, $r'$ by
$r(Y)$, and $D_{r'}(v', G')$ by $Y$ itself. Thus, $(D_{r'}(v', G'), G') = (D_{r(Y)}(c(Y), G(\downarrow Y)), G(\downarrow Y))
= (Y, G(\downarrow Y))$ in these notations, and we identify node $Y$ of $H(G)$ with the set
$Y = D_{r(Y)}(c(Y), G(\downarrow Y))$ and associate with this node also the graph $G(\downarrow Y)$. See Figure
10 for an illustration. Each leaf $Y$ of $H(G)$, since it corresponds to a pair $(V(G'), G')$, we
identify with the set $Y = V(G')$ and use, for convenience, the notation $G(\downarrow Y)$ for $G'$.
If now $(Y^0, Y^1, \ldots, Y^h)$ is the path of $H(G)$ connecting the root $Y^0$ of $H(G)$ with a
node $Y^h$, then the vertex set of the graph $G(\downarrow Y^h)$ consists of some (original) vertices
of $G$ plus at most $h$ meta vertices representing the disks $D_{r(Y^i)}(c(Y^i), G(\downarrow Y^i)) = Y^i$,
$i = 0, 1, \ldots, h - 1$. Note also that each (original) vertex of $G$ belongs to exactly one node
of $H(G)$.
Figure 10: a) A graph $G$ and its balanced disk-separator $D_1(13, G)$. b) A hierarchical
tree $H(G)$ of $G$. We have $G = G(\downarrow Y^0)$, $Y^0 = D_1(13, G)$. Meta vertices are shown circled,
disk centers are shown in bold. c) The graph $G(\downarrow Y^1)$ with its balanced disk-separator
$D_1(23, G(\downarrow Y^1)) = Y^1$. $G(\downarrow Y^1)$ is a minor of $G(\downarrow Y^0)$. d) The graph $G(\downarrow Y^2)$, a minor
of $G(\downarrow Y^1)$ and of $G(\downarrow Y^0)$. $Y^2 = V(G(\downarrow Y^2))$ is a leaf of $H(G)$.
3.4 Construction of collective additive tree spanners
Unfortunately, the class of graphs of bounded tree-breadth is not hereditary, i.e.,
induced subgraphs of a graph with tree-breadth ρ are not necessarily of tree-breadth at
most ρ (for example, a cycle of length ℓ with one extra vertex adjacent to each vertex of
the cycle has tree-breadth 1, but the cycle itself has tree-breadth ℓ/3). Thus, the method
presented in [79], for constructing collective additive tree spanners for hereditary classes
of graphs admitting balanced disk-separators, cannot be applied directly to the graphs
of bounded tree-breadth. Nevertheless, we will show that, with the help of Lemma 3,
the notion of a hierarchical tree from the previous section and a careful analysis of distance
changes (see Lemma 4), it is possible to generalize the method of [79] and construct in
polynomial time for every n-vertex graph G a system of at most log2 n collective additive
tree (2ρ log2 n)-spanners, where ρ ≤ tb(G). The unavoidable presence of meta vertices in the
graphs resulting from a hierarchical decomposition of the original graph G complicates
the construction and the analysis. Recall that, in [79], it was shown that if every induced
subgraph of a graph G enjoys a balanced disk-separator with radius at most r, then G
admits a system of at most log2 n collective additive tree 2r-spanners.
Let G = (V, E) be a connected n-vertex, m-edge graph and assume that tb(G) ≤ ρ.
Let H(G) be a hierarchical tree of G. Consider an arbitrary internal node Y h of H(G),
and let (Y 0 , Y 1 , . . . , Y h ) be the path of H(G) connecting the root Y 0 of H(G) with Y h .
Let $\widehat{G}(\downarrow Y^j)$ be the graph obtained from $G(\downarrow Y^j)$ by removing all its meta vertices (note
that $\widehat{G}(\downarrow Y^j)$ may be disconnected).
Lemma 4. For any vertex $z$ from $Y^h \cap V(G)$, there exists an index $i \in \{0, 1, \ldots, h\}$
such that $c(Y^i)$ is not a meta vertex and vertices $z$ and $c(Y^i)$ are connected in the graph
$\widehat{G}(\downarrow Y^i)$ by a path of length at most $\rho(h + 1)$. In particular, $d_G(z, c(Y^i)) \le \rho(h + 1)$ holds.
Proof. Set $G_h := G(\downarrow Y^h)$, $c := c(Y^h)$, and let $SP^{G_h}_{c,z}$ be a shortest path of $G_h$ connecting
vertices $c$ and $z$. We know that this path has at most $r(Y^h) \le \rho$ edges. If $SP^{G_h}_{c,z}$ does not
contain any meta vertices, then this path is a path of $\widehat{G}(\downarrow Y^h)$ and of $G$ and therefore
$d_G(c, z) \le \rho$ holds.
Assume now that $SP^{G_h}_{c,z}$ does contain meta vertices and let $\mu'$ be the closest to $z$ meta
vertex in $SP^{G_h}_{c,z}$. See Figure 11 for an illustration. Let $SP^{G_h}_{c,z} = (c, \ldots, a', \mu', b', \ldots, z)$.
By construction of $H(G)$, meta vertex $\mu'$ was created at some earlier recursive step to
represent disk $Y^{i'}$ of graph $G_{i'} := G(\downarrow Y^{i'})$ for some $i' \in \{0, \ldots, h - 1\}$. Hence, there
is a path $P^{G_{i'}}_{c',z} = (c', \ldots, b', \ldots, z)$ of length at most $2\rho$ in $G_{i'}$ with $c' := c(Y^{i'})$. Again,
if $P^{G_{i'}}_{c',z}$ does not contain any meta vertices, then this path is a path of $\widehat{G}(\downarrow Y^{i'})$ and of
$G$ and therefore $d_G(c', z) \le 2\rho$ holds. If $P^{G_{i'}}_{c',z}$ does contain meta vertices, then again,
"unfolding" a meta vertex $\mu''$ of $P^{G_{i'}}_{c',z}$ closest to $z$, we obtain a path $P^{G_{i''}}_{c'',z}$ of length at
most $3\rho$ in $G_{i''} := G(\downarrow Y^{i''})$ with $c'' := c(Y^{i''})$ for some $i'' \in \{0, \ldots, i' - 1\}$.
By continuing "unfolding" this way meta vertices closest to $z$, after at most $h$ steps,
we will arrive at the situation when, for some index $i^* \in \{0, 1, \ldots, h\}$, a path of length
at most $\rho(h + 1)$ will connect vertices $z$ and $c(Y^{i^*})$ in the graph $\widehat{G}(\downarrow Y^{i^*})$.
Figure 11: Illustration to the proof of Lemma 4: “unfolding” meta vertices.
Consider two arbitrary vertices $x$ and $y$ of $G$, and let $S(x)$ and $S(y)$ be the nodes
of $H(G)$ containing $x$ and $y$, respectively. Let also $NCA_{H(G)}(S(x), S(y))$ be the nearest
common ancestor of nodes $S(x)$ and $S(y)$ in $H(G)$ and $(Y^0, Y^1, \ldots, Y^h)$ be the path of
$H(G)$ connecting the root $Y^0$ of $H(G)$ with $NCA_{H(G)}(S(x), S(y)) = Y^h$ (in other words,
$Y^0, Y^1, \ldots, Y^h$ are the common ancestors of $S(x)$ and $S(y)$). Clearly, $Y^0 \cup Y^1 \cup \cdots \cup Y^h$
separates vertices $x$ and $y$ in $G$.
Lemma 5. Any path $P^G_{x,y}$ connecting vertices $x$ and $y$ in $G$ contains a vertex from
$Y^0 \cup Y^1 \cup \cdots \cup Y^h$.
Let $SP^G_{x,y}$ be a shortest path of $G$ connecting vertices $x$ and $y$, and let $Y^i$ be the node
of the path $(Y^0, Y^1, \ldots, Y^h)$ with the smallest index such that $SP^G_{x,y} \cap Y^i \ne \emptyset$ in $G$. The
following lemma holds.
Lemma 6. For each $j = 0, \ldots, i$, we have $d_G(x, y) = d_{G'}(x, y)$, where $G' := \widehat{G}(\downarrow Y^j)$.
Proof. It is enough to show that the path $SP^G_{x,y}$ consists of only vertices of $G'$. Assume,
by way of contradiction, that there is a vertex $z$ of $SP^G_{x,y}$ that does not belong to $G'$. Let
$SP^G_{x,z}$ be a subpath of $SP^G_{x,y}$ between $x$ and $z$. Clearly, the node $S(z)$ of $H(G)$, containing
vertex $z$, is not a descendant of $Y^i$. Therefore, the nearest common ancestor of $S(x)$ and
$S(z)$ in $H(G)$ is a node $Y^j$ from $\{Y^0, Y^1, \ldots, Y^h\}$ with $j < i$. But then, by Lemma 5,
the path $SP^G_{x,z}$ (and hence the path $SP^G_{x,y}$) must have a vertex in $Y^0 \cup Y^1 \cup \cdots \cup Y^j$,
contradicting the choice of $Y^i$, $i > j$.
Let now $B^i_1, \ldots, B^i_{p_i}$ be the nodes at depth $i$ of the tree $H(G)$. For each node $B^i_j$ that
is not a leaf of $H(G)$, consider its (central) vertex $c^i_j := c(B^i_j)$. If $c^i_j$ is an original vertex of
$G$ (not a meta vertex created during the construction of $H(G)$), then define a connected
graph $G^i_j$ obtained from $G(\downarrow B^i_j)$ by removing all its meta vertices. If removal of those
meta vertices produces several connected components, choose as $G^i_j$ that component which
contains the vertex $c^i_j$. Denote by $T^i_j$ a BFS-tree of graph $G^i_j$ rooted at vertex $c^i_j$ of $B^i_j$.
If $B^i_j$ is a leaf of $H(G)$, then $B^i_j$ has at most 5 vertices. In this case, remove all meta
vertices from $G(\downarrow B^i_j)$ and for each connected component of the resulting graph construct
an additive tree spanner with optimal surplus $\le 3$. Note that the diameter of a tree with
5 vertices is at most 4. Denote the resulting subtree (forest) by $T^i_j$.
The trees $T^i_j$ ($i = 0, 1, \ldots, depth(H(G))$, $j = 1, 2, \ldots, p_i$) obtained this way are called
local subtrees of $G$. Clearly, the construction of these local subtrees can be incorporated
into the procedure of constructing the hierarchical tree $H(G)$ of $G$ and will not increase
the overall $O(nm \log^2 n)$ run-time (see Section 3.3).
Figure 12: Illustration to the proof of Lemma 7.
Lemma 7. For any two vertices x, y ∈ V (G), there exists a local subtree T such that
dT (x, y) ≤ dG (x, y) + 2ρ log2 n − 1.
Proof. We know, by Lemma 6, that a shortest path $SP^G_{x,y}$, intersecting $Y^i$ and not
intersecting any $Y^l$ ($l < i$), lies entirely in $G' := \widehat{G}(\downarrow Y^i)$. Thus, $d_G(x, y) = d_{G'}(x, y)$. If $Y^i$ is
a leaf of $H(G)$, then for a local subtree $T'$ (it could be a forest) of $G$ constructed for $G'$,
the following holds:
$$d_{T'}(x, y) \le d_{G'}(x, y) + 3 = d_G(x, y) + 3 \le d_G(x, y) + 2\rho \log_2 n - 1$$
(since $n \ge 4$ and $\rho \ge 1$). Assume now that $Y^i$ is an internal node of $H(G)$. We have
$i \le \log_2 n - 2$, since the depth of $H(G)$ is at most $\log_2 n - 1$. Let $z \in Y^i$ be a vertex on
the shortest path $SP^G_{x,y}$. By Lemma 4, there exists an index $j \in \{0, 1, \ldots, i\}$ such that
the vertices $z$ and $c(Y^j)$ can be connected in the graph $\widehat{G}(\downarrow Y^j)$ by a path of length at
most $\rho(i + 1)$. See Figure 12 for an illustration. Set $G'' := \widehat{G}(\downarrow Y^j)$ and $c := c(Y^j)$. By
Lemma 6, $d_G(x, y) = d_{G'}(x, y) = d_{G''}(x, y)$. Let $T''$ be the local tree constructed for graph
$G'' = \widehat{G}(\downarrow Y^j)$, i.e., a BFS-tree of a connected component of the graph $G'' = \widehat{G}(\downarrow Y^j)$
and rooted at vertex $c = c(Y^j)$.
We have $d_{T''}(x, c) = d_{G''}(x, c)$ and $d_{T''}(y, c) = d_{G''}(y, c)$. By the triangle inequality,
$$d_{T''}(x, c) = d_{G''}(x, c) \le d_{G''}(x, z) + d_{G''}(z, c)$$
and
$$d_{T''}(y, c) = d_{G''}(y, c) \le d_{G''}(y, z) + d_{G''}(z, c).$$
That is,
$$d_{T''}(x, y) \le d_{T''}(x, c) + d_{T''}(y, c) \le d_{G''}(x, z) + d_{G''}(y, z) + 2d_{G''}(z, c) = d_{G''}(x, y) + 2d_{G''}(z, c).$$
Now, using Lemma 6 and inequality $d_{G''}(z, c) \le \rho(i + 1) \le \rho(\log_2 n - 1)$, we get
$$d_{T''}(x, y) \le d_{G''}(x, y) + 2d_{G''}(z, c) \le d_G(x, y) + 2\rho(\log_2 n - 1).$$
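The triangle-inequality step of this proof can be checked numerically on small graphs. The following is a minimal sketch, assuming a connected graph given as an adjacency dictionary; the function names are hypothetical and introduced only for this illustration. It builds a BFS-tree T rooted at c and verifies that, for every vertex z lying on some shortest path between x and y, the tree distance satisfies d_T(x, y) ≤ d_G(x, y) + 2 d_G(z, c).

```python
from collections import deque

def bfs_tree(adj, root):
    """Return (parent, depth) maps of a BFS-tree of the connected graph adj."""
    parent, depth = {root: None}, {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in depth:
                parent[w], depth[w] = u, depth[u] + 1
                queue.append(w)
    return parent, depth

def tree_dist(parent, depth, x, y):
    """Distance between x and y in the tree: walk the deeper endpoint upwards."""
    d = 0
    while x != y:
        if depth[x] < depth[y]:
            x, y = y, x
        x = parent[x]
        d += 1
    return d

def check_bound(adj, c):
    """Check d_T(x,y) <= d_G(x,y) + 2*d_G(z,c) for all z on a shortest x-y path."""
    dist = {v: bfs_tree(adj, v)[1] for v in adj}  # all-pairs distances in G
    parent, depth = bfs_tree(adj, c)
    for x in adj:
        for y in adj:
            for z in adj:
                if dist[x][z] + dist[z][y] == dist[x][y]:
                    if tree_dist(parent, depth, x, y) > dist[x][y] + 2 * dist[z][c]:
                        return False
    return True
```

For a cycle on 6 vertices with root 0, vertices 2 and 4 are at distance 2 in the cycle but at distance 4 in the BFS-tree, which is within the bound 2 + 2·3 obtained with z = 3.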
This lemma implies two important results. Let G be a graph with n vertices and
m edges having tb(G) ≤ ρ. Also, let H(G) be its hierarchical tree and LT (G) be the
family of all its local subtrees (defined above). Consider a graph H obtained by taking
the union of all local subtrees of G (by putting all of them together), i.e.,
$$H := \bigcup \{T^i_j \mid T^i_j \in LT(G)\} = \Big(V, \bigcup \{E(T^i_j) \mid T^i_j \in LT(G)\}\Big).$$
Clearly, $H$ is a spanning subgraph of $G$, constructible in $O(nm \log^2 n)$ total time, and,
for any two vertices $x$ and $y$ of $G$, $d_H(x, y) \le d_G(x, y) + 2\rho \log_2 n - 1$ holds. Also, since
for every level $i$ ($i = 0, 1, \ldots, depth(H(G))$) of hierarchical tree $H(G)$, the corresponding
local subtrees $T^i_1, \ldots, T^i_{p_i}$ are pairwise vertex-disjoint, their union has at most $n - 1$ edges.
Therefore, $H$ cannot have more than $(n - 1) \log_2 n$ edges in total. Thus, we have proven
the following result.
Theorem 2. Every graph G with n vertices and tb(G) ≤ ρ admits an additive (2ρ log2 n)-spanner
with at most n log2 n edges. Furthermore, such a sparse additive spanner of G
can be constructed in polynomial time.
Instead of taking the union of all local subtrees of $G$, one can fix $i$ ($i \in \{0, 1, \ldots,
depth(H(G))\}$) and consider separately the union of only local subtrees $T^i_1, \ldots, T^i_{p_i}$,
corresponding to the level $i$ of the hierarchical tree $H(G)$, and then extend in linear $O(m)$
time that forest to a spanning tree T i of G (using, for example, a variant of Kruskal’s
Spanning Tree algorithm for the unweighted graphs). We call this tree T i the spanning
tree of G corresponding to the level i of the hierarchical tree H(G). In this way we can
obtain at most log2 n spanning trees for G, one for each level i of H(G). Denote the
collection of those spanning trees by T (G). Thus, we obtain the following theorem.
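Extending the forest of one level to a spanning tree can be sketched as follows. This is a minimal illustration of the Kruskal-style idea mentioned above, using a union-find structure (which gives near-linear rather than strictly linear time); the function name and the edge-list representation with vertices numbered 0, ..., n − 1 are assumptions of this sketch. The forest edges are processed first, so all of them are kept, and the remaining edges of G are added only when they join two different components.

```python
def extend_to_spanning_tree(n, graph_edges, forest_edges):
    """Extend an acyclic edge set (forest) to a spanning tree of a connected graph."""
    parent = list(range(n))

    def find(u):
        # Find the component representative with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    tree = []
    # Forest edges come first, so each of them is accepted.
    for u, v in list(forest_edges) + list(graph_edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v))
    return tree
```

Applying this once per level i yields one spanning tree per level, i.e., the collection T(G) described above.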
Theorem 3. Every graph G with n vertices and tb(G) ≤ ρ admits a system T (G) of at
most log2 n collective additive tree (2ρ log2 n)-spanners. Furthermore, such a system of
collective additive tree spanners of G can be constructed in polynomial time.
3.5 Additive spanners for graphs admitting (multiplicative) tree t-spanners
Now we give two implications of the above results for the class of tree t-spanner
admissible graphs. In [76], the following important (“bridging”) lemma was proven.
Lemma 8 ([76]). If a graph G admits a tree t-spanner, then its tree-breadth is at most
⌈t/2⌉.
Note that the tree-breadth bounded by ⌈t/2⌉ provides only a necessary condition
for a graph to have a multiplicative tree t-spanner. There are (chordal) graphs which
have tree-breadth 1 but any multiplicative tree t-spanner of them has t = Ω(log n) [76].
Furthermore, a cycle on 3n vertices has tree-breadth n but admits a system of 2 collective
additive tree 0-spanners.
Combining Lemma 8 with Theorem 2 and Theorem 3, we deduce the following
results.
Theorem 4. Let G be a graph with n vertices and m edges having a (multiplicative) tree
t-spanner. Then G admits an additive $(2\lceil t/2\rceil \log_2 n)$-spanner with at most $n \log_2 n$ edges
constructible in $O(nm \log^2 n)$ time.
Theorem 5. Let G be a graph with n vertices and m edges having a (multiplicative)
tree t-spanner. Then G admits a system T (G) of at most log2 n collective additive tree
$(2\lceil t/2\rceil \log_2 n)$-spanners constructible in $O(nm \log^2 n)$ time.
CHAPTER 4
Collective Additive Tree Spanners of Graphs with
Bounded k-Tree-Breadth, k ≥ 2
4.1 Introduction
In this chapter we generalize the method of Chapter 3. We define a new parameter, related
to the k-Tree-width t-spanner problem, that combines both the tree-width and the
tree-breadth of a graph. The k-breadth of a tree-decomposition $T(G) = (\{X_i \mid i \in I\}, T = (I, F))$ of a
graph $G$ is the minimum integer $r$ such that for each bag $X_i$, $i \in I$, there is a set of
at most $k$ vertices $C_i = \{v^i_j \mid v^i_j \in V(G), j = 1, \ldots, k\}$ such that for each $u \in X_i$, we
have $d_G(u, C_i) \le r$ (i.e., each bag $X_i$ can be covered with at most $k$ disks of $G$ of radius
at most $r$ each; $X_i \subseteq D_r(v^i_1, G) \cup \ldots \cup D_r(v^i_k, G)$). The k-tree-breadth of a graph $G$,
denoted by $tb_k(G)$, is the minimum of the k-breadth, over all tree-decompositions of $G$.
We say that a family of graphs $\mathcal{G}$ is of bounded k-tree-breadth, if there is a constant $c$ such
that for each graph $G$ from $\mathcal{G}$, $tb_k(G) \le c$. Clearly, for every graph $G$, $tb(G) = tb_1(G)$,
and $tw(G) \le k - 1$ if and only if $tb_k(G) = 0$ (consider each vertex in the bags of a
tree-decomposition of width at most $k - 1$ as a disk center of radius 0). Thus, the notions of
tree-width and tree-breadth are particular cases of the k-tree-breadth.
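To make the definition concrete, the k-breadth of a given tree-decomposition can be computed by brute force on small graphs. The sketch below assumes a connected graph given as an adjacency dictionary and a list of bags; the function names are hypothetical. Since adding centers never increases the covering radius, it is enough to try center sets of size exactly min(k, n). The running time is exponential in k, so this is only an illustration of the definition, not an algorithm proposed in the text.

```python
from collections import deque
from itertools import combinations

def bfs_dist(adj, src):
    """Distances from src to all vertices of a connected graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def bag_k_breadth(adj, bag, k):
    """Smallest r such that `bag` is covered by at most k disks of radius r in G."""
    dist = {v: bfs_dist(adj, v) for v in adj}
    best = None
    for centers in combinations(adj, min(k, len(adj))):
        # Radius needed by this choice: farthest bag vertex from its nearest center.
        r = max(min(dist[c][u] for c in centers) for u in bag)
        best = r if best is None else min(best, r)
    return best

def k_breadth(adj, bags, k):
    """k-breadth of a tree-decomposition given by its list of bags."""
    return max(bag_k_breadth(adj, bag, k) for bag in bags)
```

For a path on 5 vertices with the bags {0,1}, {1,2}, {2,3}, {3,4} (width 1), the 1-breadth is 1 and the 2-breadth is 0, in line with the remark that tw(G) ≤ k − 1 if and only if tb_k(G) = 0.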
In this chapter, we show that any n-vertex graph G with tbk (G) ≤ ρ has a system
of at most k(1 + log2 n) collective additive tree (2ρ(1 + log2 n))-spanners constructible in
polynomial time for every fixed k. We will assume that n > k, since any graph with n
vertices has a system of n − 1 collective additive tree 0-spanners (consider n − 1 BFS-trees
rooted at different vertices). Also, in Section 4.6, we extend a result from [76] and
show that if a graph G admits a (multiplicative) t-spanner H with tw(H) = k − 1 then
its k-tree-breadth is at most ⌈t/2⌉. As a consequence, we obtain that, for every fixed
k, there is a polynomial time algorithm that, given an n-vertex graph G admitting a
(multiplicative) t-spanner with tree-width at most k − 1, constructs:
- a system of at most k(1 + log2 n) collective additive tree O(t log n)-spanners of G;
- an additive O(t log n)-spanner of G with at most O(kn log n) edges.
4.2 Balanced separators for graphs with bounded k-tree-breadth
We will need the following balanced clique-separator result for chordal graphs. Recall
that a graph is chordal if each of its induced cycles has length three.
Theorem 6 ([97]). Every chordal graph G with n vertices and m edges contains a maximal clique C such that if the vertices in C are deleted from G, every connected component
in the graph induced by any remaining vertices is of size at most n/2. Such a balanced
clique-separator C of a connected chordal graph can be found in O(m) time.
We say that a graph $G = (V, E)$ with $|V| \ge k$ has a balanced $D^k_r$-separator if there
exists a collection of $k$ disks $D_r(v_1, G), D_r(v_2, G), \ldots, D_r(v_k, G)$ in $G$, centered at
(different) vertices $v_1, v_2, \ldots, v_k$ and each of radius $r$, such that the union of those disks
$D^k_r := \bigcup_{i=1}^{k} D_r(v_i, G)$ forms a balanced separator of $G$, i.e., each connected component
of $G[V \setminus D^k_r]$ has at most $|V|/2$ vertices. The following result generalizes Lemma 1.
Lemma 9. Every graph $G$ with at least $k$ vertices and $tb_k(G) \le \rho$ has a balanced
$D^k_\rho$-separator.
Proof. The proof of this lemma follows from acyclic hypergraph theory. First we review
some necessary definitions and an important result characterizing acyclic hypergraphs.
Recall that a hypergraph H is a pair H = (V, E) where V is a set of vertices and E is a
set of non-empty subsets of V called hyperedges. For these and other hypergraph notions
see [31].
Let H = (V, E) be a hypergraph with the vertex set V and the hyperedge set E. For
every vertex v ∈ V , let E(v) = {e ∈ E |v ∈ e}. The 2–section graph 2SEC(H) of a
hypergraph H has V as its vertex set and two distinct vertices are adjacent in 2SEC(H)
if and only if they are contained in a common hyperedge of H. A hypergraph H is
called conformal if every clique of 2SEC(H) is contained in a hyperedge e ∈ E, and a
hypergraph H is called acyclic if there is a tree T with node set E such that for all vertices
v ∈ V , E(v) induces a subtree Tv of T . It is a well-known fact (see, e.g., [15, 30, 31]) that
a hypergraph H is acyclic if and only if H is conformal and 2SEC(H) of H is a chordal
graph.
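The 2-section graph used in this characterization is easy to construct explicitly. The following minimal sketch, with a hypothetical function name and an adjacency-dictionary output format chosen for this illustration, makes two vertices adjacent whenever they share a hyperedge.

```python
from itertools import combinations

def two_section(vertices, hyperedges):
    """Adjacency sets of 2SEC(H): u ~ w iff some hyperedge contains both u and w."""
    adj = {v: set() for v in vertices}
    for e in hyperedges:
        for u, w in combinations(e, 2):
            adj[u].add(w)
            adj[w].add(u)
    return adj
```

When the hyperedges are the bags of a tree-decomposition of G, every edge of G lies in some bag, so the resulting graph contains G as a spanning subgraph.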
Let now $G = (V, E)$ be a graph with $tb_k(G) = \rho$ and $T(G) = (\{X_i \mid i \in I\}, T = (I, F))$
be its tree-decomposition of k-breadth $\rho$. Evidently, the third condition of
tree-decompositions can be restated as follows: the hypergraph $H = (V(G), \{X_i \mid i \in I\})$ is
an acyclic hypergraph. Since each edge of $G$ is contained in at least one bag of $T(G)$,
the 2-section graph $G^* := 2SEC(H)$ of $H$ is a chordal supergraph of the graph $G$ (each
edge of $G$ is an edge of $G^*$, but $G^*$ may have some extra edges between non-adjacent
vertices of $G$ contained in a common bag of $T(G)$). By Theorem 6, the chordal graph
$G^*$ contains a balanced clique-separator $C \subseteq V(G)$. By conformality of $H$, $C$ must be
contained in a bag of $T(G)$. From the definition of k-breadth, there must exist $k$ vertices
$v_1, v_2, \ldots, v_k$ such that $C \subseteq D^k_\rho$, where $D^k_\rho = D_\rho(v_1, G) \cup \cdots \cup D_\rho(v_k, G)$. As the removal
of the vertices of $C$ from $G^*$ leaves no connected component in $G^*[V \setminus C]$ with more than
$|V|/2$ vertices and since $G^*$ is a supergraph of $G$, clearly, the removal of the vertices of $D^k_\rho$
from $G$ leaves no connected component in $G[V \setminus D^k_\rho]$ with more than $|V|/2$ vertices.
Again, as in Chapter 3, we do not assume that a tree-decomposition T (G) of k-breadth
ρ is given for G as part of the input. Our method does not need to know T (G). For a given
graph G, integers k ≥ 1 and ρ ≥ 0, even checking whether G has a tree-decomposition
of k-breadth ρ is a hard problem (as tbk (G) = 0 if and only if tw(G) ≤ k − 1) (see
Subsection 1.3.1).
Let $G$ be an arbitrary connected n-vertex m-edge graph. In [76], an algorithm was
described which, given $G$ and its arbitrary fixed vertex $v$, finds in $O(m)$ time a balanced
disk separator $D_r(v, G)$ of $G$ centered at $v$ and with minimum $r$. We can use this
algorithm as a subroutine to find for $G$ in $O(n^k m)$ time a balanced $D^k_r$-separator with
minimum $r$. Given arbitrary $k$ vertices $v_1, v_2, \ldots, v_k$ of $G$, we can add a new dummy
vertex $x$ to $G$ and make it adjacent to only $v_1, v_2, \ldots, v_k$ in $G$. Denote the resulting
graph by $G + x$. Then, a balanced disk separator $D_{r+1}(x, G + x)$ of $G + x$ with minimum
$r + 1$ gives a balanced separator of $G$ of the form $D_r(v_1, G) \cup \cdots \cup D_r(v_k, G)$ (for particular
disk centers $v_1, v_2, \ldots, v_k$) with minimum $r$. Iterating over all choices of $k$ vertices of $G$
(there are $O(n^k)$ of them), we can find a balanced $D^k_r$-separator of $G$ with the smallest
(absolute minimum) radius $r$. Thus, we have the following result.
Proposition 17. Let $k$ be a positive integer (assumed to be small). For an arbitrary
graph $G$ with $n \ge k$ vertices and $m$ edges, a balanced $D^k_r$-separator with the smallest
radius $r$ can be found in $O(n^k m)$ time.
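The dummy-vertex reduction can be sketched in a few lines. The code below is a naive illustration, not the O(m) subroutine of [76]: for every choice of k centers it builds G + x, runs a BFS from x, and uses the fact that the vertices at distance at most r + 1 from x (other than x itself) are exactly the union of the k disks of radius r in G. The function names and the adjacency-dictionary representation are assumptions of this sketch, and G is assumed to be connected.

```python
from collections import deque
from itertools import combinations

def bfs_dist(adj, src, removed=frozenset()):
    """Distances from src in the graph with the vertices in `removed` deleted."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in removed and w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def is_balanced(adj, separator):
    """True if every connected component of G - separator has at most n/2 vertices."""
    seen = set(separator)
    for v in adj:
        if v not in seen:
            comp = bfs_dist(adj, v, removed=separator)
            if 2 * len(comp) > len(adj):
                return False
            seen.update(comp)
    return True

def min_balanced_k_disks(adj, k):
    """Return (r, centers, union of disks) with the smallest radius r (naive search)."""
    best = None
    for centers in combinations(adj, k):
        x = object()  # fresh dummy vertex, adjacent only to the chosen centers
        aug = {v: list(nbrs) for v, nbrs in adj.items()}
        aug[x] = list(centers)
        for c in centers:
            aug[c].append(x)
        dist = bfs_dist(aug, x)
        for r in range(max(dist.values())):
            if best is not None and r >= best[0]:
                break  # this choice of centers cannot improve the current best
            # D_{r+1}(x, G + x) without x equals the union of the disks D_r(v_j, G).
            union = {u for u, d in dist.items() if u is not x and d <= r + 1}
            if is_balanced(adj, union):
                best = (r, centers, union)
                break
    return best
```

On a cycle with 8 vertices a single disk needs radius 2, whereas two disks of radius 0 already suffice.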
4.3 Decomposition of a graph with bounded k-tree-breadth
Let $G = (V, E)$ be an arbitrary connected graph with $n$ vertices and $m$ edges and
with a balanced $D^k_r$-separator, where $D^k_r = \bigcup_{j=1}^{k} D_r(v_j, G)$. Note that some disks
in $\{D_r(v_1, G), \ldots, D_r(v_k, G)\}$ may overlap. In what follows, we will partition
$D^k_r = \bigcup_{j=1}^{k} D_r(v_j, G)$ into $k$ sets $D_1, \ldots, D_k$ such that no two of them intersect and each
$D_j$, $j = 1, \ldots, k$, contains at least one vertex $v_j$ and induces a connected subgraph
of $G[D_r(v_j, G)]$. Create a graph $G + s$ by adding a new dummy vertex $s$ to $G$ and making
it adjacent to only $v_1, v_2, \ldots, v_k$ in $G$. Let $T$ be a BFS-tree of $G + s$ started at vertex $s$
and $T'$ be a subtree of $T$ formed by vertices $\{v \in V(G + s) \mid d_T(s, v) \le r + 1\}$ and rooted at
$s$. Let also $T(v_1), \ldots, T(v_k)$ be the subtrees of $T' \setminus \{s\}$ rooted at $v_1, \ldots, v_k$, respectively.
Clearly, each $T(v_j)$, $j = 1, \ldots, k$, is a subtree (not necessarily spanning) of $G[D_r(v_j, G)]$
and $D^k_r = \bigcup_{j=1}^{k} V(T(v_j))$. Set now $D_j := V(T(v_j))$, $j = 1, \ldots, k$.
Let $G_1, G_2, \ldots, G_q$ be the connected components of $G[V \setminus D^k_r]$. Denote by
$S^j_i = \{v \in V(G_i) \mid d_G(v, D_j) = 1\}$, $i = 1, \ldots, q$, $j = 1, \ldots, k$, the neighborhood of $D_j$ in $G_i$. Also,
let $G_i^+$ be the graph obtained from component $G_i$ by adding one meta vertex $c^j_i$ for each
disk $D_r(v_j, G)$ (a representative of $D_r(v_j, G)$), $j = 1, \ldots, k$, and making it adjacent to all
vertices of $S^j_i$, i.e., for a vertex $x \in V(G_i)$, $c^j_i x \in E(G_i^+)$ if and only if there is a vertex
$x_D \in D_j \subseteq D_r(v_j, G)$ with $x x_D \in E(G)$. If $S^j_i$ is empty for some $j$, then vertex $c^j_i$ is not
added to $G_i^+$. Also, add an edge between any two representatives $c^j_i$ and $c^l_i$ if vertices $v_j$
and $v_l$ are connected by a path in $G[V \setminus V(G_i)]$. See Figure 13 for an illustration.
Given an n-vertex m-edge graph $G$ and its balanced $D^k_r$-separator, the graphs $G_1^+, \ldots, G_q^+$
can be constructed in total time $O(kqm)$. Furthermore, the total number of edges in
graphs $G_1^+, \ldots, G_q^+$ does not exceed $m + qk^2$, and the total number of vertices in those
graphs does not exceed the number of vertices in $G[V \setminus D^k_r]$ plus $qk$.
Figure 13: A graph $G$ with a balanced $D^3_r$-separator and the corresponding graphs
$G_1^+, \ldots, G_4^+$ obtained from $G$. Each $G_i^+$ has three meta vertices representing the
three disks.
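The construction of the graphs $G_i^+$ can be sketched as follows. This is a minimal illustration under several assumptions: the graph is connected and given as an adjacency dictionary, the centers are distinct, meta vertices are encoded as tuples ("meta", j) that do not clash with original vertex names, and all function names are hypothetical. A multi-source BFS from the centers plays the role of the BFS from the dummy vertex s, so that each separator vertex is assigned to exactly one set $D_j$.

```python
from collections import deque

def partition_disks(adj, centers, r):
    """owner[u] = j if u lies in the BFS subtree T(v_j), i.e. u belongs to D_j."""
    owner = {c: j for j, c in enumerate(centers)}
    depth = {c: 0 for c in centers}
    queue = deque(centers)
    while queue:
        u = queue.popleft()
        if depth[u] == r:
            continue  # do not grow beyond radius r
        for w in adj[u]:
            if w not in owner:
                owner[w], depth[w] = owner[u], depth[u] + 1
                queue.append(w)
    return owner

def components(adj, allowed):
    """Connected components of the subgraph induced by the vertex set `allowed`."""
    seen, comps = set(), []
    for v in allowed:
        if v in seen:
            continue
        comp, queue = {v}, deque([v])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w in allowed and w not in comp:
                    comp.add(w)
                    queue.append(w)
        seen |= comp
        comps.append(comp)
    return comps

def build_plus_graphs(adj, centers, r):
    """Return the list of graphs G_i^+ as adjacency dictionaries of sets."""
    owner = partition_disks(adj, centers, r)
    result = []
    for comp in components(adj, set(adj) - set(owner)):
        plus = {v: {w for w in adj[v] if w in comp} for v in comp}
        for v in comp:
            for w in adj[v]:
                if w in owner:  # w is a vertex x_D of D_j adjacent to v
                    meta = ("meta", owner[w])
                    plus.setdefault(meta, set()).add(v)
                    plus[v].add(meta)
        # Join two representatives if their centers are connected in G[V \ V(G_i)].
        for part in components(adj, set(adj) - comp):
            metas = [("meta", owner[c]) for c in centers
                     if c in part and ("meta", owner[c]) in plus]
            for a in metas:
                for b in metas:
                    if a != b:
                        plus[a].add(b)
        result.append(plus)
    return result
```

For a cycle on 8 vertices with centers 0 and 4 and r = 0, each of the two components receives two meta vertices, and these are joined by an edge because the centers are connected outside the component.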
Note that $G_i^+$ is a minor of $G$ and can be obtained from $G$ by a sequence of edge
contractions in the following way. First contract all edges (in any order) that are incident
to $V(G_{i'})$, for all $i' = 1, \ldots, q$, $i' \ne i$. Then, for each $j = 1, \ldots, k$, contract (all edges of)
connected subgraph $G[D_j]$ of $G$ to get meta vertex $c^j_i$ representing the disk $D_r(v_j, G)$ in
$G_i^+$.
Let again $G_{/e}$ be the graph obtained from $G$ by contracting edge $e$. We have the
following analog of Lemma 3.
Lemma 10. For any graph $G$ and its edge $e$, $tb_k(G) \le \rho$ implies $tb_k(G_{/e}) \le \rho$.
Consequently, for any graph $G$ with $tb_k(G) \le \rho$, $tb_k(G_i^+) \le \rho$ holds for $i = 1, \ldots, q$.
Proof. Our proof is similar to the proof from [76] of Lemma 3. We provide it here for
the sake of completeness. Let $T(G) = (\{X_i \mid i \in I\}, T = (I, F))$ be a tree-decomposition
of $G$ with k-breadth $\rho$. Let $e = xy$ be an arbitrary edge of $G$. We can obtain a
tree-decomposition $T(G_{/e})$ of the graph $G_{/e}$ by replacing in each bag $X_i$, $i \in I$, vertices $x$
and $y$ with a new vertex $x'$ representing them (if some bag $A$ contained both $x$ and
$y$, only one copy of $x'$ is kept). Evidently, the first and the second conditions of
tree-decompositions are fulfilled for $T(G_{/e})$. Furthermore, the topology (the tree $T = (I, F)$)
of the tree-decomposition did not change. Still, for any vertex $v \ne x'$ of $G_{/e}$, the bags
of $T(G_{/e})$ containing $v$ form a subtree in $T(G_{/e})$. Since vertices $x$ and $y$ were adjacent
in $G$, there was a bag $A$ of $T(G)$ containing both those vertices. Hence, a subtree of
$T(G_{/e})$ formed by bags of $T(G_{/e})$ containing vertex $x'$ is nothing else but the union of
two subtrees (one for $x$ and one for $y$) of $T(G)$ sharing at least one common bag $A$.
Also, contracting an edge can only reduce the distances in a graph. Hence, still, for
each bag $B$ of $T(G_{/e})$, there must exist corresponding vertices $v_1, \ldots, v_k$ in $G_{/e}$ with
$B \subseteq D_\rho(v_1, G_{/e}) \cup \cdots \cup D_\rho(v_k, G_{/e})$. Thus, $tb_k(G_{/e}) \le \rho$. Since $G_i^+$ can be obtained from
$G$ by a sequence of edge contractions, we also have $tb_k(G_i^+) \le \rho$.
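The bag-rewriting step of this proof can be illustrated by a short sketch. The function names, the list-of-sets representation of bags (indexed by the nodes of the unchanged tree T), and the label `new` for the merged vertex x' are assumptions made for this illustration only.

```python
def contract_in_decomposition(bags, x, y, new):
    """Replace x and y by `new` in every bag; the tree on bag indices is unchanged."""
    return [({new} | (bag - {x, y})) if (x in bag or y in bag) else set(bag)
            for bag in bags]

def contract_edge(adj, x, y, new):
    """Adjacency sets of G/e for the edge e = xy, with merged vertex `new`."""
    merged = {}
    for v, nbrs in adj.items():
        if v not in (x, y):
            merged[v] = {new if w in (x, y) else w for w in nbrs}
    merged[new] = {w for w in set(adj[x]) | set(adj[y]) if w not in (x, y)}
    return merged

def covers_edges(adj, bags):
    """Check that every edge of the graph lies in some bag."""
    return all(any(u in b and w in b for b in bags) for u in adj for w in adj[u])
```

For the path 0-1-2-3 with bags {0,1}, {1,2}, {2,3}, contracting the edge 12 gives bags that still cover every edge of the contracted graph and are no larger than before.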
4.4 Construction of a hierarchical tree
Here we show how a hierarchical tree for a graph with bounded k-tree-breadth is
built.
Let $G = (V, E)$ be a connected n-vertex, m-edge graph with $tb_k(G) \le \rho$ and $n \ge k$.
Lemma 9 guarantees that $G$ has a balanced $D^k_r$-separator with $r \le \rho$. Proposition 17 says
that such a balanced $D^k_r$-separator of $G$ can be found in $O(n^k m)$ time by an algorithm that
works directly on the graph $G$ and does not require construction of a tree-decomposition
of $G$ with k-breadth $\le \rho$. Using these and Lemma 10, we can build a rooted hierarchical
tree $H(G)$ for $G$, which is constructed as follows. If $G$ is a connected graph with at most
$2k + 1$ vertices, then $H(G)$ is a one node tree with root node $(V(G), G)$. It is known
[104] that any connected graph with $p \ge 2$ vertices has a dominating set of size $\lfloor p/2 \rfloor$,
i.e., all vertices of it can be covered by $\lfloor p/2 \rfloor$ disks of radius one. Hence, in our case,
$G$ with at most $2k + 1$ vertices can be covered by $k$ disks of radius one each, i.e., there
are $k$ vertices $v_1, \ldots, v_k$ such that $V(G) = D_r(v_1, G) \cup \cdots \cup D_r(v_k, G)$ for $r = 1 \le \rho$. If
$G$ is a connected graph with more than $2k + 1$ vertices, find a balanced $D^k_r$-separator of
minimum radius $r$ in $O(n^k m)$ time and construct the corresponding graphs $G_1^+, \ldots, G_q^+$.
For each graph $G_i^+$, $i \in \{1, \ldots, q\}$, (by Lemma 10, $tb_k(G_i^+) \le \rho$) construct a hierarchical
tree $H(G_i^+)$ recursively and build $H(G)$ by taking the pair $(D^k_r, G)$ to be the root and
connecting the root of each tree $H(G_i^+)$ as a child of $(D^k_r, G)$.
The depth of this tree $H(G)$ is the smallest integer $p$ such that
$$\frac{n}{2^p} + k\Big(\frac{1}{2^{p-1}} + \cdots + \frac{1}{2} + 1\Big) \le 2k + 1,$$
that is, the depth is at most $\log_2 n$. It is also not hard to see that, given a graph $G$ with $n$
vertices and $m$ edges, a hierarchical tree $H(G)$ can be constructed in $O((kn)^{k+2} \log^{k+1} n)$
total time. There are at most $O(\log n)$ levels in $H(G)$, and one needs to do at most
$O((n + kn \log n)^k (m + k^2 n \log n)) \le O((kn)^{k+2} \log^k n)$ operations per level since the total
number of edges in the graphs of each level is at most $O(m + k^2 n \log n)$ and the total
number of vertices in those graphs cannot exceed $O(n + kn \log n)$.
For nodes of $H(G)$, we use the same notation as in Chapter 3. For a node $Y$ of
$H(G)$, since it is associated with a pair $(D^k_{r'}, G')$, where $r' \le \rho$, $G'$ is a minor of $G$ and
$D^k_{r'} = D_{r'}(v'_1, G') \cup \cdots \cup D_{r'}(v'_k, G')$, it is convenient to denote $G'$ by $G(\downarrow Y)$, $\{v'_1, \ldots, v'_k\}$
by $c(Y) = \{c_1(Y), \ldots, c_k(Y)\}$, $r'$ by $r(Y)$, and $D^k_{r'}$ by $Y$ itself. Thus,
$(D^k_{r'}, G') = (\bigcup_{l=1}^{k} D_{r(Y)}(c_l(Y), G(\downarrow Y)), G(\downarrow Y)) = (Y, G(\downarrow Y))$ in these notations, and we identify
node $Y$ of $H(G)$ with the set $\bigcup_{l=1}^{k} D_{r(Y)}(c_l(Y), G(\downarrow Y))$ and associate with this node
also the graph $G(\downarrow Y)$. If now $(Y^0, Y^1, \ldots, Y^h)$ is the path of $H(G)$ connecting the
root $Y^0$ of $H(G)$ with a node $Y^h$, then the vertex set of the graph $G(\downarrow Y^h)$ consists
of some (original) vertices of $G$ plus at most $kh$ meta vertices representing the disks
$D_{r(Y^i)}(c_1(Y^i), G(\downarrow Y^i)), \ldots, D_{r(Y^i)}(c_k(Y^i), G(\downarrow Y^i))$ of $Y^i$, $i = 0, 1, \ldots, h - 1$. Note also
that each (original) vertex of $G$ belongs to exactly one node of $H(G)$.
4.5 Construction of collective additive tree spanners
Let $G = (V, E)$ be a connected n-vertex, m-edge graph and assume that $tb_k(G) \le \rho$
and $n \ge k$. Let $H(G)$ be a hierarchical tree of $G$. Consider an arbitrary node $Y^h$ of
$H(G)$, and let $(Y^0, Y^1, \ldots, Y^h)$ be the path of $H(G)$ connecting the root $Y^0$ of $H(G)$ with
$Y^h$. Let $\widehat{G}(\downarrow Y^j)$ be the graph obtained from $G(\downarrow Y^j)$ by removing all its meta vertices
(note that $\widehat{G}(\downarrow Y^j)$ may be disconnected and that all meta vertices of $G(\downarrow Y^j)$ come from
previous levels of $H(G)$). We have the following analog of Lemma 4.
Lemma 11. For any vertex $z$ from $Y^h \cap V(G)$, there exists an index $i \in \{0, 1, \ldots, h\}$
such that the vertices $z$ and $c_l(Y^i)$, for some $l \in \{1, \ldots, k\}$, can be connected in the graph
$\widehat{G}(\downarrow Y^i)$ by a path of length at most $\rho(h + 1)$. In particular, $d_G(z, c_l(Y^i)) \le \rho(h + 1)$
holds.
Proof. The proof is similar to the proof of Lemma 4 of Chapter 3. Set $G_h := G(\downarrow Y^h)$
and $c := c_l(Y^h)$, where $z \in D_l \subseteq D_{r(Y^h)}(c_l(Y^h), G_h)$ (for the definition of set $D_l$ see the
first paragraph of Section 4.3). Let $SP^{G_h}_{c,z}$ be a shortest path of $G_h$ connecting vertices $c$
and $z$. We know that this path has at most $r(Y^h) \le \rho$ edges. If $SP^{G_h}_{c,z}$ does not contain
any meta vertices, then this path is a path of $\widehat{G}(\downarrow Y^h)$ and of $G$ and therefore $d_G(c, z) \le \rho$
holds.
Assume now that $SP^{G_h}_{c,z}$ does contain meta vertices and let $\mu'$ be the closest to $z$ meta
vertex in $SP^{G_h}_{c,z}$ (consult with Figure 11 of Chapter 3). Let $SP^{G_h}_{c,z} = (c, \ldots, a', \mu', b', \ldots, z)$.
By construction of $H(G)$, meta vertex $\mu'$ was created at some earlier recursive step to
represent one disk of $Y^{i'}$ of graph $G_{i'} := G(\downarrow Y^{i'})$ for some $i' \in \{0, \ldots, h - 1\}$. Hence,
there is a path $P^{G_{i'}}_{c',z} = (c', \ldots, b', \ldots, z)$ of length at most $2\rho$ in $G_{i'}$ with $c' := c_{l'}(Y^{i'})$
for some $l' \in \{1, \ldots, k\}$. Again, if $P^{G_{i'}}_{c',z}$ does not contain any meta vertices, then this
path is a path of $\widehat{G}(\downarrow Y^{i'})$ and of $G$ and therefore $d_G(c', z) \le 2\rho$ holds. If $P^{G_{i'}}_{c',z}$ does
contain meta vertices then again, "unfolding" a meta vertex $\mu''$ of $P^{G_{i'}}_{c',z}$ closest to $z$, we
obtain a path $P^{G_{i''}}_{c'',z}$ of length at most $3\rho$ in $G_{i''} := G(\downarrow Y^{i''})$ with $c'' := c_{l''}(Y^{i''})$ for some
$i'' \in \{0, \ldots, i' - 1\}$ and $l'' \in \{1, \ldots, k\}$.
We continue "unfolding" this way meta vertices closest to $z$. Eventually, after at most
$h$ steps, we will arrive at the situation when, for some index $i^* \in \{0, 1, \ldots, h\}$, a path of
length at most $\rho(h + 1)$ will connect vertices $z$ and $c_{l^*}(Y^{i^*})$, for some $l^* \in \{1, \ldots, k\}$, in
the graph $\widehat{G}(\downarrow Y^{i^*})$.
Let $B^i_1, \ldots, B^i_{p_i}$ be the nodes at depth $i$ of the tree $H(G)$. Assume
$B^i_j = \bigcup_{l=1}^{k} D_r(c^i_j(l), G(\downarrow B^i_j))$, where $r := r(B^i_j)$. Denote the $k$ central vertices of $B^i_j$ by
$C^i_j = \{c^i_j(1), c^i_j(2), \ldots, c^i_j(k)\}$.
For each node $B^i_j$, consider its (central) vertex $c^i_j(l)$ ($l \in \{1, \ldots, k\}$). If $c^i_j(l)$ is an original
vertex of $G$ (not a meta vertex created during the construction of $H(G)$), then define
a connected graph $G^i_j(l)$ obtained from $G(\downarrow B^i_j)$ by removing all its meta vertices. If
removal of those meta vertices produces several connected components, choose as $G^i_j(l)$ that
component which contains the vertex $c^i_j(l)$. Denote by $T^i_j(l)$ a BFS-tree of graph $G^i_j(l)$
rooted at vertex $c^i_j(l)$ of $B^i_j$.
The trees $T^i_j(l)$ ($i = 0, 1, \ldots, depth(H(G))$, $j = 1, 2, \ldots, p_i$, $l = 1, 2, \ldots, k$), obtained
this way, are called local subtrees of $G$. Clearly, the construction of these local subtrees
can be incorporated into the procedure of constructing a hierarchical tree $H(G)$ of $G$ and
will not increase the overall $O((kn)^{k+2} \log^{k+1} n)$ run-time (see Section 4.4).
Since Lemma 5 and Lemma 6 hold for G, similarly to the proof of Lemma 7, one can
prove its analog for graphs with bounded k-tree-breadth.
Lemma 12. For any two vertices x, y ∈ V (G), there exists a local subtree T such that
dT (x, y) ≤ dG (x, y) + 2ρ(1 + log2 n).
This lemma implies the following two results. Let G be a graph with n vertices and
m edges having tbk (G) ≤ ρ. Let also H(G) be its hierarchical tree and LT (G) be the
family of all its local subtrees (defined above). Consider a graph H obtained by taking
the union of all local subtrees of G (by putting all of them together). Clearly, H is a
spanning subgraph of G, constructible in polynomial time for every fixed k. We have
$d_H(x, y) \le d_G(x, y) + 2\rho(1 + \log_2 n)$ for any two vertices $x$ and $y$ of $G$. Also, since for
every level $i$ ($i = 0, 1, \ldots, depth(H(G))$) of hierarchical tree $H(G)$, the corresponding
local subtrees $T^i_1(l), \ldots, T^i_{p_i}(l)$ for each fixed index $l \in \{1, \ldots, k\}$ are pairwise
vertex-disjoint, their union has at most $n - 1$ edges. Therefore, $H$ cannot have more than
$k(n - 1)(1 + \log_2 n)$ edges in total. Thus, we have the following result.
Theorem 7. Every graph G with n vertices and tbk (G) ≤ ρ admits an additive (2ρ(1 +
log2 n))-spanner with at most O(kn log n) edges constructible in polynomial time for every
fixed k.
For a node $B^i_j$ of $H(G)$, let $\mathcal{T}^i_j = \{T^i_j(1), \ldots, T^i_j(k)\}$ be the set of its local subtrees.
Instead of taking the union of all local subtrees of $G$, one can fix $i$ ($i \in \{0, 1, \ldots, depth(H(G))\}$)
and fix $l \in \{1, \ldots, k\}$ and consider separately the union of only local subtrees $T^i_1(l), \ldots, T^i_{p_i}(l)$,
corresponding to the $l$th subtrees of level $i$ of the hierarchical tree $H(G)$, and then extend
in linear $O(m)$ time that forest to a spanning tree $T^i(l)$ of $G$ (using, for example,
a variant of Kruskal’s Spanning Tree algorithm for the unweighted graphs). We call this
tree T i (l) the lth spanning tree of G corresponding to the level i of the hierarchical tree
H(G). In this way we can obtain at most k(1 + log2 n) spanning trees for G, k trees for
each level i of H(G). Denote the collection of those spanning trees by T (G). Thus, we
deduce the following theorem.
Theorem 8. Every graph G with n vertices and tbk (G) ≤ ρ admits a system T (G) of
at most k(1 + log2 n) collective additive tree (2ρ(1 + log2 n))-spanners constructible in
polynomial time for every fixed k.
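The step of extending a forest of local subtrees to a spanning tree of G can be sketched as follows. This is only a minimal illustration, assuming vertices are numbered 0, . . . , n − 1 and edges are given as pairs; the function name `extend_forest_to_spanning_tree` and the union-find helper are our own hypothetical choices, and union-find gives near-linear rather than strictly linear running time.

```python
def extend_forest_to_spanning_tree(n, graph_edges, forest_edges):
    """Return the edges of a spanning tree of a connected graph G
    that contains every edge of the given forest (Kruskal-style scan)."""
    parent = list(range(n))

    def find(v):
        # Find the representative of v, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    # Scan the forest edges first so that all of them are kept,
    # then the remaining edges of G to connect the forest components.
    for u, v in list(forest_edges) + list(graph_edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v))
    return tree
```

Since the graphs here are unweighted, no sorting of edges is needed; any scan order that visits the forest edges first works.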
4.6 Additive Spanners for Graphs Admitting (Multiplicative) t-Spanners of Bounded Tree-width
In this section, we show that if a graph G admits a (multiplicative) t-spanner H with
tw(H) = k −1 then its k-tree-breadth is at most ⌈t/2⌉. As a consequence, we obtain that,
for every fixed k, there is a polynomial time algorithm that, given an n-vertex graph G
admitting a (multiplicative) t-spanner with tree-width at most k − 1, constructs a system
of at most k(1 + log2 n) collective additive tree O(t log n)-spanners of G.
4.6.1 k-Tree-breadth of a graph admitting a t-spanner of bounded tree-width
Let H be a graph with tree-width k − 1, and let $T(H) = (\{X_i \mid i \in I\}, T = (I, F))$
be its tree-decomposition of width k − 1. For an integer r ≥ 0, denote by $X_i^{(r)}$, i ∈ I,
the set $D_r(X_i, H) := \bigcup_{x \in X_i} D_r(x, H)$. Clearly, $X_i^{(0)} = X_i$ for every i ∈ I. The following
important lemma holds.
Lemma 13. For every integer r ≥ 0, $T^{(r)}(H) := (\{X_i^{(r)} \mid i \in I\}, T = (I, F))$ is a tree-decomposition of H with k-breadth ≤ r.
Proof. It is enough to show that the third condition of tree-decompositions (see Subsection 1.3.1) is fulfilled for $T^{(r)}(H)$. That is, for all i, j, k ∈ I, if j is on the path from i to
k in T, then $X_i^{(r)} \cap X_k^{(r)} \subseteq X_j^{(r)}$. We know that $X_i \cap X_k \subseteq X_j$ holds and need to show
that for every vertex v of H, $d_H(v, X_i) \le r$ and $d_H(v, X_k) \le r$ imply $d_H(v, X_j) \le r$.
Assume, by way of contradiction, that for some integer r > 0 and for some vertex v of
H, $d_H(v, X_j) > r$ while $d_H(v, X_i) \le r$ and $d_H(v, X_k) \le r$.
Consider the original tree-decomposition T (H). It is known [65] that if ab (a, b ∈ I)
is an edge of the tree T = (I, F ) of tree-decomposition T (H), and Ta , Tb are the subtrees
of T obtained after removing edge ab from T , then S = Xa ∩ Xb separates in H vertices
belonging to bags of Ta but not to S from vertices belonging to bags of Tb but not to S.
We will use this nice separation property.
Let T \ {j} be the forest obtained from T by removing node j, and let T (i) and T (k)
be the trees from this forest containing nodes i and k, respectively. Clearly, T (i) and T (k)
are disjoint. The above separation property and inequalities dH (v, Xi ) ≤ r < dH (v, Xj )
ensure that the vertex v belongs to a node (a bag) of T (i) (Xj cannot separate in H vertex
v from a vertex xi of Xi with dH (v, Xi ) = dH (v, xi ) since otherwise dH (v, Xi ) > dH (v, Xj )
will hold). Similarly, inequalities dH (v, Xk ) ≤ r < dH (v, Xj ) and the above separation
property guarantee that the vertex v belongs to a node of T (k). But then, the third
condition of tree-decompositions says that v must also belong to the bag Xj of T (H).
The latter, however, is in a contradiction to the assumption that dH (v, Xj ) > r ≥ 0.
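The expanded bags $X_i^{(r)} = D_r(X_i, H)$ used in Lemma 13 can be computed by a multi-source breadth-first search from each bag. The sketch below is illustrative only: it assumes H is unweighted and given as an adjacency dictionary, and the names `expand_bag` and `expand_decomposition` are hypothetical.

```python
from collections import deque

def expand_bag(adj, bag, r):
    """Return D_r(bag, H): all vertices within hop distance r
    of at least one vertex of the bag (multi-source BFS)."""
    dist = {x: 0 for x in bag}
    queue = deque(bag)
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue  # do not grow the disk beyond radius r
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

def expand_decomposition(adj, bags, r):
    """Replace every bag X_i by D_r(X_i, H); the tree of the
    decomposition is left unchanged, as in Lemma 13."""
    return {i: expand_bag(adj, bag, r) for i, bag in bags.items()}
```

With r = 0 every bag is returned unchanged, matching $X_i^{(0)} = X_i$.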
Now we can prove the main lemma of this section.
Lemma 14. If a graph G admits a t-spanner with tree-width k − 1, then tbk (G) ≤ ⌈t/2⌉.
Proof. Let H be a t-spanner of G with tw(H) = k − 1 and $T(H) = (\{X_i \mid i \in I\}, T = (I, F))$
be a tree-decomposition of H of width k − 1. We claim that $T(G) := T^{(\lceil t/2 \rceil)}(H) := (\{X_i^{(\lceil t/2 \rceil)} \mid i \in I\}, T = (I, F))$ is a tree-decomposition of G with k-breadth ≤ ⌈t/2⌉. See
Figure 14 for an illustration.
By Lemma 13, $T^{(\lceil t/2 \rceil)}(H)$ is a tree-decomposition of H with k-breadth ≤ ⌈t/2⌉.
Hence, the first and the third conditions of tree-decompositions hold for T(G). For every
pair u, v of vertices of G, $d_G(u, v) \le d_H(u, v)$. Therefore, every disk $D_{\lceil t/2 \rceil}(x, H)$ of H is
contained in the disk $D_{\lceil t/2 \rceil}(x, G)$ of G. This implies that every bag of T(G) is covered by
at most k disks of G of radius at most ⌈t/2⌉ each, i.e.,
$$X_i^{(\lceil t/2 \rceil)} = D_{\lceil t/2 \rceil}(X_i, H) = \bigcup_{x \in X_i} D_{\lceil t/2 \rceil}(x, H) \subseteq \bigcup_{x \in X_i} D_{\lceil t/2 \rceil}(x, G).$$
We need only to show additionally that each edge uv of G belongs to some bag of
T(G). Since H is a t-spanner of G, $d_H(u, v) \le t$ holds. Let x be a middle vertex
of a shortest path connecting u and v in H. Then, both u and v belong to the disk
$D_{\lceil t/2 \rceil}(x, H)$. Let $X_i$ be a bag of T(H) containing vertex x. Then, both u and v are
contained in $X_i^{(\lceil t/2 \rceil)}$, a bag of T(G).
4.6.2 Consequences
Now we give two implications of the above results for the class of graphs admitting
(multiplicative) t-spanners with tree-width k−1. They are direct consequences of Lemma
14, Theorem 7 and Theorem 8.
Theorem 9. Let G be a graph with n vertices and m edges having a (multiplicative)
t-spanner with tree-width k − 1. Then G admits an additive (2⌈t/2⌉(1 + log2 n))-spanner
with at most O(kn log n) edges constructible in polynomial time for every fixed k.
Theorem 10. Let G be a graph with n vertices and m edges having a (multiplicative)
t-spanner with tree-width k − 1. Then G admits a system T (G) of at most k(1 + log2 n)
collective additive tree (2⌈t/2⌉(1 + log2 n))-spanners constructible in polynomial time for
every fixed k.
Figure 14: Illustration to the proof of Lemma 14. A tree-decomposition for G is obtained
from a tree-decomposition of H. Panels: (a) A graph G. (b) A 2-spanner H of G with tree-width 2.
(c) Tree-decomposition T(H) of width 2. (d) Tree-decomposition $T(G) = T^{(1)}(H)$ of
3-tree-breadth equal to 1.
CHAPTER 5
Embedding of Weighted Graphs into Trees: Theoretical Grounds and Empirical Analysis on Real Datasets
In this chapter, we present our work on the problem of embedding weighted graphs
into (weighted) trees. One of the applications of this problem is the reconstruction of the
evolutionary tree from evolutionary distances between species [81, section 4.3] and [5].
We say that a weighted graph G = (V, E) has a non-contractive embedding into a tree
T = (V ∪ S, E′) (a weighted tree, possibly with Steiner vertices) with distortion λ if T
satisfies the following two conditions:
(1) $\forall x, y \in V$, $d_G(x, y) \le d_T(x, y)$ (non-contractibility);
(2) $\forall x, y \in V$, $d_T(x, y) \le \lambda d_G(x, y)$ (bounded expansion).
The problem of the minimum distortion non-contractive embedding of a weighted graph
is to find a tree embedding with the minimum distortion λ∗ .
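The two conditions above can be checked directly once all pairwise distances are known. The following sketch is illustrative: it assumes the distances $d_G$ and $d_T$ between the original vertices are given as nested dictionaries with positive values for distinct vertices, and the function name `embedding_quality` is hypothetical.

```python
from itertools import combinations

def embedding_quality(vertices, d_graph, d_tree):
    """Return (non_contractive, distortion) for a tree embedding.

    non_contractive is True when d_G(x, y) <= d_T(x, y) for all pairs;
    distortion is the smallest lambda >= 1 with
    d_T(x, y) <= lambda * d_G(x, y) for all pairs."""
    non_contractive = True
    distortion = 1.0
    for x, y in combinations(vertices, 2):
        dg = d_graph[x][y]
        dt = d_tree[x][y]
        if dt < dg:
            non_contractive = False
        distortion = max(distortion, dt / dg)
    return non_contractive, distortion
```

For example, embedding a unit-weight path on three vertices into a star whose three leaf edges have weight 1 is non-contractive and has distortion 2.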
The approach we use is an extension of the approach of [54] of embedding unweighted
graphs into trees. First we present a graph decomposition procedure (layering partition)
used for our embedding.
5.1 Layering partition for weighted graphs
Layering partition was introduced in [39] and has been used in [21, 54] for embedding
graph metrics into trees. We extend the layering partition procedure from unweighted
graphs to weighted graphs.
Let h be a positive real number, let G = (V, E) be a weighted connected graph with a
distinguished vertex s, and let $r = \lceil \max_{x \in V} d_G(s, x)/h \rceil$. A layering of a weighted graph G
with respect to the special vertex s is the partition of V into the layers (spheres or rings)
$L^i = \{v \in V : ih \le d_G(s, v) < (i + 1)h\}$, i = 0, 1, . . . , r, of width h. A layering partition
$LP(s, h) = \{L^i_1, \ldots, L^i_{p_i}\}$ of G is a partition of each layer $L^i$ into clusters $L^i_1, \ldots, L^i_{p_i}$
such that two vertices $u, v \in L^i$ belong to the same cluster $L^i_j$ if and only if they can
be connected via a path outside the ball $B_{(i-1)h}(s)$ of radius (i − 1)h centered at s. In
other words, clusters can be defined as follows: if $X^i_j$ is a connected component of
$G \setminus \{L^0, \ldots, L^{i-1}\}$, then the cluster $L^i_j$ is equal to $X^i_j \cap L^i$. For an illustration, see Figure 15.
It was proved in [51] that such a layering partition can be found in linear time for
unweighted graphs. We extend the approach of [51] to work for weighted graphs. This is
done in two phases. The first phase finds the layers $\{L^0, \ldots, L^r\}$ using Dijkstra’s single-source
shortest path algorithm, starting from the special vertex s. The second phase
finds the clusters $L^i_1, \ldots, L^i_{p_i}$ for each layer. This is done as follows. Start from the
layer $L^r$ farthest from s and find the connected components of the graph induced by $L^r$.
These connected components are the clusters of the layer $L^r$. Then, contract each of
these connected components into a single node. Then find the connected components in
the graph induced by $L^{r-1}$ and the set of contracted nodes. We proceed in the same way
down the layers until layer 1. Our layering partition procedure
for weighted graphs runs in O(|E| log |V|) time, where |E| and |V| are the numbers of
edges and vertices of a graph G = (V, E), respectively.
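The two phases of the layering partition procedure can be sketched as follows. This is a minimal sketch under stated assumptions: the graph is connected and given as an adjacency dictionary `adj[u] = [(v, weight), ...]` with comparable vertex labels, and the contraction of components is simulated with a union-find structure, which is our own substitution for explicit contraction. The names `dijkstra` and `layering_partition` are hypothetical.

```python
import heapq

def dijkstra(adj, s):
    """Single-source shortest path distances from s."""
    dist = {s: 0.0}
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def layering_partition(adj, s, h):
    """Return (dist, layer, clusters), where clusters[i] lists the
    clusters (vertex sets) of layer i."""
    # Phase 1: layers of width h around s.
    dist = dijkstra(adj, s)
    layer = {v: int(dist[v] // h) for v in dist}
    r = max(layer.values())
    layers = [[] for _ in range(r + 1)]
    for v, i in layer.items():
        layers[i].append(v)

    # Phase 2: process layers from the farthest one down to layer 0,
    # merging components with union-find instead of contracting them.
    parent = {}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    clusters = [None] * (r + 1)
    for i in range(r, -1, -1):
        for v in layers[i]:
            parent[v] = v
        for v in layers[i]:
            for u, _ in adj[v]:
                if u in parent:  # u lies in layer i or farther
                    ru, rv = find(u), find(v)
                    if ru != rv:
                        parent[ru] = rv
        groups = {}
        for v in layers[i]:
            groups.setdefault(find(v), set()).add(v)
        clusters[i] = list(groups.values())
    return dist, layer, clusters
```

The clusters of layer i are read off immediately after layer i has been merged, so they correspond to the connected components of the graph with all closer layers removed.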
Let Γ(s, h) be the graph whose vertex set is the set of all clusters $L^i_j$ of a layering
partition LP(s, h) of a given graph G. Two nodes $C = L^i_j$ and $C' = L^{i'}_{j'}$ are adjacent
in Γ(s, h) if and only if there exist $u \in L^i_j$ and $v \in L^{i'}_{j'}$ such that $d_G(u, v) \le h$. See
Figure 15c for an illustration. It was proved in [51] that Γ has a tree structure; it is
called the layering tree. For a weighted graph with non-negative weights, Γ is found in
O(|E| log |V|) time using the above layering partition procedure with Dijkstra’s algorithm.
In the following discussion, we assume that the layering tree Γ(s, h) is rooted at the cluster
containing the special vertex s. Also, to guarantee that no edge crosses non-consecutive
layers in LP(s, h), we assume that the cluster-width h is greater than or equal to the weight
w of the longest edge in the graph (i.e., h ≥ w).
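A rooted layering tree can be represented by a parent pointer for each cluster. The sketch below is an illustration under assumptions of ours: it takes the adjacency dictionary, the layer index of each vertex and the clusters of each layer as given inputs, and it realizes adjacency in Γ(s, h) through graph edges between consecutive layers, which relies on h ≥ w. The name `layering_tree` and the cluster identifiers (i, j) are hypothetical.

```python
def layering_tree(adj, layer, clusters):
    """Return (cluster_of, parent): the cluster id (i, j) of each vertex
    and the parent cluster id of each non-root cluster."""
    cluster_of = {}
    for i, layer_clusters in enumerate(clusters):
        for j, cluster in enumerate(layer_clusters):
            for v in cluster:
                cluster_of[v] = (i, j)

    parent = {}
    for v, c in cluster_of.items():
        for u, _ in adj[v]:
            # An edge into the previous layer identifies the parent
            # cluster; in a tree, all such edges agree on it.
            if layer[u] == layer[v] - 1:
                parent[c] = cluster_of[u]
    return cluster_of, parent
```

The cluster containing s has no entry in `parent`, which marks it as the root.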
5.2 Properties of layering partition for weighted graphs
In the following we prove some properties of layering partition related to our problem
of embedding weighted graphs into trees. First we prove a bound on the diameter of
clusters in a layering partition for such graphs.
We use proofs similar to [54] to prove the following two lemmas.
Lemma 15. If a graph G embeds into a tree T with multiplicative distortion λ, then
for any x, y ∈ V, any path $P_G(x, y)$ between x and y in G and any vertex $c \in P_T(x, y)$,
$d_T(c, P_G(x, y)) \le \lambda w / 2$, where w is the largest edge weight of the graph G.
Proof. Removing c from T, we consider the subtree $T_y$ of $T \setminus \{c\}$ containing vertex y.
Since $x \notin T_y$, we can find an edge ab of $P_G(x, y)$ with $a \in T_y$ and $b \notin T_y$. Therefore,
the path $P_T(a, b)$ must go via c. If $d_T(c, a) > \lambda w / 2$ and $d_T(c, b) > \lambda w / 2$, then $d_T(a, b) =
d_T(a, c) + d_T(b, c) > \lambda w$. This would lead to a contradiction with the assumption that
the embedding of G has a distortion of at most λ, as condition 2 implies that $d_T(a, b) \le
\lambda d_G(a, b) \le \lambda w$. By the fact $d_T(c, P_G(x, y)) \le \min\{d_T(c, a), d_T(c, b)\} \le \lambda w / 2$, we conclude
our proof.
Figure 15: A layering partition of a weighted graph G. Panels: (a) Layering of G with respect to s.
(b) Clusters of the layering partition LP(s, h) of G. (c) The layering tree Γ(s, h).
(d) The tree H associated with LP(s, h).
Lemma 16. For a given graph G that is embeddable into a tree T with distortion λ, the
diameter of any cluster C of a layering partition with width h of G is at most 3λw + 2h.
In other words, ∀x, y ∈ C, dG (x, y) ≤ 3λw + 2h.
Proof. Let $P_G(x, y)$ be a path connecting x and y in $X^i_j$. Let $P_G(s, x)$ and $P_G(s, y)$ be
two shortest paths of G connecting s to x and s to y, respectively. Let c be the least common
ancestor of x and y in T (i.e., $c = P_T(x, y) \cap P_T(s, x) \cap P_T(s, y)$). Let a, b and z be the
vertices of $P_G(s, x)$, $P_G(s, y)$ and $P_G(x, y)$, respectively, that are closest to c in the tree T, i.e.,
$d_T(c, a) = d_T(c, P_G(s, x))$, $d_T(c, b) = d_T(c, P_G(s, y))$ and $d_T(c, z) = d_T(c, P_G(x, y))$. By
applying Lemma 15 three times, we have $d_T(c, a) \le \lambda w / 2$, $d_T(c, b) \le \lambda w / 2$ and $d_T(c, z) \le \lambda w / 2$.
From the triangle inequality, condition 1 and the previous inequalities, we conclude that
$$d_G(a, z) \le d_T(a, z) \le d_T(a, c) + d_T(c, z) \le \frac{\lambda w}{2} + \frac{\lambda w}{2} = \lambda w.$$
Also, we claim that $d_G(a, x) \le \lambda w + h$. Since a lies on the shortest path $P_G(s, x)$, we have
$d_G(s, a) = d_G(s, x) - d_G(a, x)$, and by the triangle inequality,
$$d_G(s, z) \le d_G(s, a) + d_G(a, z) = d_G(s, x) - d_G(a, x) + d_G(a, z).$$
From the definition of clusters, we have $d_G(s, x) < (i + 1)h$ and $d_G(s, z) \ge ih$. Thus, we
have $d_G(a, x) \le (i + 1)h - ih + \lambda w = \lambda w + h$. In an analogous way, we can prove that
$d_G(b, y) \le \lambda w + h$. Now, by condition 1 of non-contractibility and the triangle inequality
we have
$$d_G(a, b) \le d_T(a, b) \le d_T(a, c) + d_T(c, b) \le \frac{\lambda w}{2} + \frac{\lambda w}{2} = \lambda w.$$
Now, summing these inequalities, we conclude our proof:
$$d_G(x, y) \le d_G(x, a) + d_G(a, b) + d_G(b, y) \le 3\lambda w + 2h.$$
Corollary 1. Given the tree embedding of a graph G with the minimum distortion of λ∗,
$\lambda^* \ge \frac{\Delta_s(h) - 2h}{3w}$, where $\Delta_s(h)$ is the maximal diameter of a cluster in the layering partition
LP(s, h) of G.
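The bound of Corollary 1 is a one-line computation once the largest cluster diameter is known. The sketch below is illustrative; the function name `distortion_lower_bound` is hypothetical, and clamping the result at 1 is our own addition, justified by the fact that conditions (1) and (2) together force λ ≥ 1.

```python
def distortion_lower_bound(delta, h, w):
    """Lower bound on the minimum distortion from Corollary 1:
    lambda* >= (delta - 2h) / (3w), where delta is the largest
    cluster diameter, h the cluster-width and w the largest edge weight."""
    return max(1.0, (delta - 2.0 * h) / (3.0 * w))
```

For instance, with Δs(h) = 20 and h = w = 2 the bound is 16/6, while for Δs(h) ≤ 2h + 3w the bound is trivial.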
5.3 Construction of tree embedding
Given a weighted graph G = (V, E) and a layering partition LP(s, h) of G with
cluster-width h, our embedding constructs a tree H = (V ∪ S, E′), where S is a set of
Steiner points, such that H closely reproduces the global structure of the layering tree
Γ(s, h). Let $C = L^i_j \in LP(s, h)$ be a node (cluster) in Γ(s, h) and $P(C) = L^{i-1}_k \in
LP(s, h)$ be its parent in Γ(s, h). The construction of H creates for each cluster C a
new vertex (Steiner point) $s_C$ and makes it adjacent in H to all vertices v ∈ C. Also,
it connects each Steiner point $s_C$ to the Steiner point $s_{P(C)}$ of its parent (for an illustration,
see Figure 15d).
The weighting of edges of the constructed tree H = (V ∪ S, E′) is done as follows.
Edges between Steiner points are weighted uniformly with the cluster-width h. Edges
between the vertices of a given cluster C and their Steiner point are weighted with
$\Delta_s(h)/2 + h$, where $\Delta_s(h)$ is the largest cluster diameter of the layering partition.
Now, we will show that such weighting of H will produce a non-contractive embedding
with bounded distortion.
Lemma 17. Given a weighted graph G = (V, E) and a weighted tree H = (V ∪ S, E ′ )
constructed as described above, H provides a non-contractive embedding of G (i.e., ∀x, y ∈
V dG (x, y) ≤ dH (x, y)). Also, ∀x, y ∈ V, dH (x, y) ≤ dG (x, y) + 3λw + 6h.
Proof. First, we prove the non-contractiveness of the tree H. Let Cx and Cy be the
two clusters in Γ(s, h) containing vertices x and y, respectively. Let C be the nearest
common ancestor of Cx and Cy in Γ(s, h). Assume the depths of Cx , Cy and C in Γ are
i, j and k, respectively. Let x′ be the closest vertex in C to x (i.e., x′ ∈ C such that ∀z ∈
C, dG (x, x′ ) ≤ dG (x, z)). Let y ′ be the closest vertex in C to y (i.e., y ′ ∈ C such that ∀z ∈
C, dG (y, y ′ ) ≤ dG (y, z)). For illustration see Figure 16. By our construction, we have
the following inequalities:
kh ≤ dG (s, x′ ) < (k + 1)h,
kh ≤ dG (s, y ′ ) < (k + 1)h,
ih ≤ dG (s, x) < (i + 1)h,
jh ≤ dG (s, y) < (j + 1)h.
Now, let x′′ ∈ C be a vertex on the shortest path from s to x. Also, let y ′′ ∈ C be a
vertex on the shortest path from s to y. By our assumption that h ≥ w, we can guarantee
that no edge of the shortest path tree SP T (s) rooted at s crosses non-consecutive layers.
Thus, such vertices x′′ and y ′′ must exist. Since x′ is the closest vertex in C to x, we have
dG (x, x′ ) ≤ dG (x, x′′ ) = dG (s, x) − dG (s, x′′ ) < (i − k)h + h. In the same way, we have
dG (y, y ′ ) < (j − k)h + h. By our construction of H and the way weights are assigned to
its edges, we have dH (x, y) = (i − k)h + (j − k)h + ∆s (h) + 2h. By the triangle inequality,
we have:
dG (x, y) ≤ dG (x, x′ ) + dG (y, y ′ ) + dG (x′ , y ′ )
< (i − k)h + h + (j − k)h + h + ∆s (h)
= (i − k)h + (j − k)h + ∆s (h) + 2h
= dH (x, y),
thus proving the non-contractiveness of our embedding into H.
Second, we prove the upper bound result on the distances in the tree H. By the
triangle inequality, we have dG (x, x′ ) ≥ dG (s, x) − dG (s, x′ ). Since ih ≤ dG (s, x) and
dG (s, x′ ) < (k + 1)h, we have dG (x, x′ ) > ih − (k + 1)h = (i − k)h − h. In the same way,
we have dG (y, y ′ ) > (j − k)h − h. Since dH (x, y) = (i − k)h + (j − k)h + ∆s (h) + 2h and by
applying the last two inequalities, we have dH (x, y) < dG (x, x′ ) + dG (y, y ′ ) + ∆s (h) + 4h.
Furthermore, we have dG (x, y) ≥ dG (x, x′ ) + dG (y, y ′ ). Applying this, we have dH (x, y) <
dG (x, y) + ∆s (h) + 4h. By Lemma 16, we have ∆s (h) ≤ 3λw + 2h, thus we can conclude
that dH (x, y) < dG (x, y) + 3λw + 6h.
An outline of our algorithm for embedding weighted graphs into trees is described in
Algorithm 1.
Figure 16: Illustration of proof of Lemma 17.
Given a weighted graph G = (V, E) with n vertices and m edges, the construction
of the layering partition of G builds a shortest path tree SPT(s) rooted at the
vertex s using Dijkstra’s algorithm in O(m log n) time. The weighting of the edges of
H requires finding the largest cluster diameter and thus finding distances in G. We
can calculate all pairwise distances in the graph by applying Dijkstra’s algorithm n
times, yielding O(nm log n) total time. Thus, our algorithm requires O(nm log n) time to
construct the tree embedding H of the graph G.
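The largest cluster diameter can be obtained with one run of Dijkstra’s algorithm per vertex, as described above. The following sketch is illustrative: it assumes an adjacency dictionary `adj[u] = [(v, weight), ...]` with comparable vertex labels and the clusters given as a flat list of vertex sets; the names `shortest_distances` and `max_cluster_diameter` are hypothetical.

```python
import heapq

def shortest_distances(adj, s):
    """Dijkstra's algorithm from s on a non-negatively weighted graph."""
    dist = {s: 0.0}
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def max_cluster_diameter(adj, clusters):
    """Largest distance in G between two vertices of the same cluster."""
    best = 0.0
    for cluster in clusters:
        for x in cluster:
            dist = shortest_distances(adj, x)  # one run per vertex
            best = max(best, max(dist[y] for y in cluster))
    return best
```

Note that distances are measured in G itself, not inside the subgraph induced by the cluster.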
Now we conclude our work with the following theorem.
Theorem 11. If a weighted graph G = (V, E) with n vertices and m edges admits
a non-contractive embedding into a tree with distortion λ, then we can construct a non-contractive tree embedding H of G in O(nm log n) time such that $\forall x, y \in V$, $d_H(x, y) \le
d_G(x, y) + 3\lambda w + 6h$.
It is worth noting that our algorithm for the problem of embedding weighted graphs
into trees with multiplicative distortion produces an additive distortion error of 3λw +
6h. To compare with other results, we recall the best results achieved in [21] for embedding a general metric into a tree metric. In [21], they produce a multiplicative error of
$(\lambda \log n)^{\log^{1/2} \Delta}$, where ∆ is the spread of the metric (i.e., the ratio of the diameter over
the minimum distance in the metric). Comparing with our result, our distortion has w
and h as additive terms, while ∆ appears in the exponent of the distortion error of [21]. Also,
comparing with the results of [7], their algorithm requires $O(n^4)$ running time, while ours
requires O(nm log n) time. The approach of [7] embeds a general metric into a tree with
distortion $(1 + \epsilon)^{O(\log n)}$, where ϵ is a measure, with values in the range [0, 1], quantifying
how close a given metric is to a tree. We found that ϵ values for our datasets are equal
to 1.
Algorithm 1 Approximation Algorithm for Embedding into Tree Metric
Input: A weighted graph G = (V, E), a root vertex s and the cluster-width h
Output: Tree embedding H for G
  Find the layering partition $LP(s, h) = \{L^i_1, \ldots, L^i_{p_i} : i = 0, 1, \ldots, r\}$ of G
  Set initially H := (V, ∅)
  for i = r down to 0 do
    for each cluster C from $\{L^i_1, \ldots, L^i_{p_i}\}$ do
      Add to H a Steiner point $s_C$
      Add to H edges $\{v s_C : v \in C\}$ with weights $\Delta_s(h)/2 + h$
      for each child cluster Z of C in $L^{i+1}$ do
        Add to H the edge $s_C s_Z$ between Steiner points with weight h
      end for
    end for
  end for
  Return tree H
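The construction of Algorithm 1 can be sketched in a few lines once the layering tree is available. This is a minimal sketch, not our experimental implementation: it assumes the clusters are given as a dictionary from (hypothetical) cluster identifiers to vertex lists, the layering tree as a parent map on those identifiers, and Δs(h) as a precomputed number. The names `build_tree_embedding` and `tree_distance` are illustrative.

```python
def build_tree_embedding(clusters, parent, delta, h):
    """Return the weighted edge list of the tree H.

    clusters: dict cluster_id -> iterable of vertices of that cluster
    parent:   dict cluster_id -> parent cluster_id (root cluster absent)
    delta:    largest cluster diameter; h: cluster-width."""
    edges = []
    for cid, vertices in clusters.items():
        steiner = ("steiner", cid)  # Steiner point s_C of cluster C
        for v in vertices:
            edges.append((v, steiner, delta / 2 + h))
        if cid in parent:
            # Connect s_C to the Steiner point of the parent cluster.
            edges.append((steiner, ("steiner", parent[cid]), h))
    return edges

def tree_distance(edges, x, y):
    """Distance between x and y in the tree given by a weighted edge list."""
    adj = {}
    for a, b, w in edges:
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))
    stack = [(x, None, 0.0)]
    while stack:
        node, prev, d = stack.pop()
        if node == y:
            return d
        for nxt, w in adj[node]:
            if nxt != prev:
                stack.append((nxt, node, d + w))
    raise ValueError("y is not reachable from x")
```

Two vertices of the same cluster end up at distance Δs(h) + 2h in H, in agreement with the formula for $d_H(x, y)$ used in the proof of Lemma 17.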
5.4 Experiment
In this section, we evaluate our algorithm experimentally on real datasets. We test on a variety
of real graph datasets including datasets of Internet measurements (MIT-PlanetLab,
Cornell-King, HP-PlanetLab and routeview). We used these datasets since empirical
studies of the Internet measurements [148] indicated that the Internet has a tree-like
structure to a certain degree (i.e., a good embeddability to a tree). Therefore, these
datasets are useful for verifying that our algorithm produces good tree
embeddings in practice. Also, we run our algorithm on other types of datasets (social, biological
and information networks) to measure tree-likeness in different domains. Furthermore,
three datasets (routeview, yeast and DutchElite) are uniformly weighted (unweighted)
graphs. They can be used to obtain a view of how edge weights affect the results of our
algorithm.
5.4.1 Datasets
The datasets are obtained from different domains (Internet measurements, social and
collaboration networks, biological and information networks). Some parameters of these
datasets are shown in Table 12. Original datasets have been preprocessed to remove
violations of the triangle inequality in order to make each dataset a metric space. Also,
some of the graph datasets were not connected, therefore we run our algorithm on the
largest connected component of such graphs.
graph                                     n       m  largest edge  smallest edge  diameter
MIT-PlanetLab [156]                     416   10277        6708.4         0.1215   8623.81
Cornell-King [161]                     2500   60758        146777           1001    284367
HP-PlanetLab [3]                        410   76943       1352720        1215.21   3893440
NetScience [132]                        379     898       4.67763      0.0526316   69.5212
Geom [61]                              3621    9438            77              1      1069
Facebook-like Social Network [136]     1893   13830           184              1      1445
FFN-msg-sum [135]                       897   70904          1568              1      6159
FFN-char-sum [135]                      897   68772        127792              1    494136
FFN-msg-newman [135]                    897   70845       52.8877          0.008   201.271
FFN-char-newman [135]                   897   70904       6726.23          0.016     26171
Celegans [159]                          297    2087            72              1       344
cond-mat-99-joint [131]               13861   44619            37              1       650
cond-mat-99-newman [131]              13861   44616       22.3333      0.0588235   382.949
US Top 500-Airport Network [59]         500    2872       2253990              9  14714300
US Airport Network [134]               1572   16786       2974630              1  23763600
OpenFlights [134]                      2905   15601            11              1       151
cond-mat-2003 [131]                   27519  116173          35.2      0.0416667   551.315
cond-mat-2005 [131]                   36458  171731            46        7.00118   806.662
hep-th [131]                           5835   13811        33.999      0.0434783   614.537
astro-ph [131]                        14845  119648          16.5       0.178571   225.317
routeview [4]                         10515   21455             1              1        10
yeast [43]                             2224    6609             1              1        11
DutchElite [63]                        3621    4311             1              1        16
Table 12: Real datasets parameters: n: the number of vertices, m: the number of edges,
the largest edge weight, the smallest edge weight and the diameter of the graph.
Internet measurement datasets:
MIT-PlanetLab [156]: A dataset of round-trip latency times between 497 PlanetLab [2]
nodes/hosts measured using the Ping utility. The dataset was collected on 12/01/2005
at MIT. The latency times have been averaged over 10 pings.
Cornell-King [161]: A dataset of round-trip latency times between 2500 DNS servers
measured using the KING technique [100]. The data was collected between 5/5/2004
and 5/13/2004 by Jeremy Stribling at Cornell University. The latency times are the
medians of 10 measurements.
HP-PlanetLab [3]: A dataset of the available bandwidth measurement between PlanetLab
nodes/servers using the pathChirp tool [149]. The dataset was collected at HP labs.
The above three Internet measurement datasets are originally directed (i.e., two measurements could exist between two nodes, one in each direction). In such a case, we take the
average of the two measurements and replace both edges by one undirected edge. If only
one measurement exists between two nodes, we regard that measurement as an undirected edge between the two nodes. To make each dataset a metric space, we remove
those edges causing violations of the triangle inequality.
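This preprocessing can be sketched as follows. The sketch rests on assumptions of ours: vertices are numbered 0, . . . , n − 1, directed measurements are given as a dictionary keyed by ordered pairs, and an edge is taken to violate the triangle inequality when its weight exceeds the shortest-path distance between its endpoints, which is one possible reading of the removal rule. The function names are hypothetical, and the cubic Floyd–Warshall step is chosen for brevity only.

```python
def symmetrize(directed):
    """Average the measurements of both directions into one
    undirected edge keyed by (min(u, v), max(u, v))."""
    grouped = {}
    for (u, v), w in directed.items():
        grouped.setdefault((min(u, v), max(u, v)), []).append(w)
    return {key: sum(ws) / len(ws) for key, ws in grouped.items()}

def drop_triangle_violations(n, weights):
    """Keep only the edges whose weight equals the shortest-path
    distance between their endpoints (assumed interpretation)."""
    inf = float("inf")
    dist = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for (u, v), w in weights.items():
        dist[u][v] = dist[v][u] = min(dist[u][v], w)
    # Floyd-Warshall all-pairs shortest paths.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return {e: w for e, w in weights.items() if w <= dist[e[0]][e[1]]}
```

Removing such edges does not change any shortest-path distance, so the remaining graph induces the same metric.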
Collaboration networks:
NetScience [132]: A co-authorship network of authors in the area of network theory and
experiment. The data was compiled by M. E. J. Newman in May 2006 from the bibliographies
of two review papers on networks. The weight of each edge is the weight assigned by Newman
[130], such that the weight between authors i and j is defined as $w(i, j) = \sum_p \frac{1}{A_p - 1}$,
where p ranges over the joint papers of i and j and $A_p$ is the number of authors of p.
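Newman’s weighting can be computed in one pass over the list of papers. The sketch below is illustrative: it assumes each paper is given as a list of author identifiers, and the function name `newman_weights` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def newman_weights(papers):
    """Return w(i, j) = sum over joint papers p of 1 / (A_p - 1),
    where A_p is the number of authors of p."""
    weights = defaultdict(float)
    for authors in papers:
        authors = sorted(set(authors))
        if len(authors) < 2:
            continue  # single-author papers create no edge
        share = 1.0 / (len(authors) - 1)
        for i, j in combinations(authors, 2):
            weights[(i, j)] += share
    return dict(weights)
```

For example, two authors who wrote one paper together and one paper with a third co-author receive the weight 1 + 1/2.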
Geom [61]: A co-authorship network of authors in the area of computational geometry.
An edge exists between two authors if they coauthored at least one joint work. The weight
of an edge between two authors is the number of joint collaborations. The dataset was
compiled in February 2002.
Condensed Matter collaborations 1999 (cond-mat-99-joint) [131]: A co-authorship network between authors posting preprints on Condensed Matter in the arXiv E-Print
Archive between January 1, 1995 and December 31, 1999. An edge between two authors is weighted by the number of joint papers on the subject of Condensed Matter.
Condensed Matter collaborations 1999 (cond-mat-99-newman) [131]: The same network
as the one above but with different edge weights. Edges are weighted by Newman’s weighting method such that the edge weight between authors i and j is defined as
$w(i, j) = \sum_p \frac{1}{A_p - 1}$, where p ranges over the joint papers of i and j and $A_p$ is the number of authors
of p.
Condensed Matter collaborations 2003 (cond-mat-2003) [131]: An updated co-authorship
network between authors posting preprints on Condensed Matter in the arXiv E-Print
Archive between January 1, 1995 and June 30, 2003. The network is weighted using
Newman’s weighting method described above.
Condensed Matter collaborations 2005 (cond-mat-2005) [131]: An updated co-authorship
network between authors posting preprints on Condensed Matter in the arXiv E-Print
Archive between January 1, 1995 and March 31, 2005. The network is weighted using
Newman’s weighting method described above.
High-energy theory collaborations (hep-th) [131]: A co-authorship network between scientists posting preprints on the High-Energy Theory E-Print Archive between January
1, 1995 and December 31, 1999. The network is weighted using Newman’s weighting
method described above.
Astrophysics collaborations (astro-ph) [131]: A co-authorship network between scientists
posting preprints on the Astrophysics E-Print Archive between January 1, 1995 and December 31, 1999. The network is weighted using Newman’s weighting method described
above.
Social networks:
Facebook-like Social Network [136]: A dataset of messages between an online community
of students at the University of California, Irvine. The dataset includes all students who
sent or received at least one message. The dataset is originally directed (sent/received
messages). We drop the edge direction and weight each edge by the total number of
messages exchanged between two students.
Facebook-like Forum Networks [135]: Datasets of online forum activity between an online
community of students at the University of California, Irvine. The datasets include
all forum users who posted messages on different topics of the forum. An edge exists
between two users if they both posted messages on at least one common topic. Four
different networks are obtained from the datasets, depending on the method of weighting
edges between users.
FFN-msg-sum: Edges between two nodes (users) are weighted by the total number of
messages posted by both users on the same topics (i.e., the weight between users i and
j is defined as $w(i, j) = \sum_t (m_{it} + m_{jt})$, where t ranges over the topics that received posts from both users
i and j, and $m_{it}$ and $m_{jt}$ are the total numbers of messages posted by i and j,
respectively, on t).
FFN-char-sum: Edges between two nodes (users) are weighted by the total number of
characters of all messages posted by both users on the same topics (i.e., $w(i, j) =
\sum_t (c_{it} + c_{jt})$, where t ranges over the topics that received posts from both users i and j, and $c_{it}$ and $c_{jt}$ are
the numbers of characters posted by i and j, respectively, on t).
FFN-msg-newman: Edges between two nodes (users) are weighted by Newman’s weighting method proportional to the total number of messages posted by both users on the
same topics (i.e., $w(i, j) = \sum_t \frac{m_{it} + m_{jt}}{M_t}$, where t ranges over the topics that received posts from both users
i and j, $m_{it}$ and $m_{jt}$ are the numbers of messages posted by i and j, respectively, on
t, and $M_t$ is the total number of messages posted on t by all users).
FFN-char-newman: Edges between two nodes (users) are weighted by Newman’s weighting method proportional to the total number of characters posted by both users on the
same topics (i.e., $w(i, j) = \sum_t \frac{c_{it} + c_{jt}}{C_t}$, where t ranges over the topics that received posts from both users
i and j, $c_{it}$ and $c_{jt}$ are the numbers of characters posted by i and j, respectively, on
topic t, and $C_t$ is the total number of characters of all messages posted on t by all users).
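The four forum weightings differ only in what is counted (messages or characters) and in whether each topic is normalized by its total. The sketch below is illustrative and assumes a dictionary `counts[topic][user]` holding the positive number of messages (or characters) a user posted on a topic; the function name `ffn_weights` and the `normalize` flag are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def ffn_weights(counts, normalize=False):
    """Return w(i, j) summed over the topics t on which both i and j posted.

    normalize=False gives the "sum" variants: sum_t (x_it + x_jt).
    normalize=True gives the "newman" variants: sum_t (x_it + x_jt) / X_t,
    where X_t is the total posted on t by all users."""
    weights = defaultdict(float)
    for per_user in counts.values():
        total = sum(per_user.values())
        for i, j in combinations(sorted(per_user), 2):
            contribution = per_user[i] + per_user[j]
            if normalize:
                contribution /= total
            weights[(i, j)] += contribution
    return dict(weights)
```

Passing message counts yields FFN-msg-sum or FFN-msg-newman, and passing character counts yields the two FFN-char variants.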
Biological Datasets:
Celegans [159]: A dataset of the neural network of the Caenorhabditis elegans worm (C.
elegans). Each node represents a neuron. An edge exists between two neurons if there is
at least one synapse or gap junction between them. The weight of an edge is the number
of synapses and gap junctions between two neurons. The dataset is originally directed.
Information Networks:
US Top 500-Airport Network [59]: A network of the 500 busiest commercial airports in
the US. An edge exists between two airports if a flight was scheduled between them in
the year 2002 with weight equal to the total number of seats available on the scheduled
flights in 2002. The data was obtained from Tore Opsahl’s website [134].
US Airport Network [134]: A network of the commercial airports in the US. An edge
exists between two airports if a flight was scheduled between them in the year 2010 with
weight equal to the total number of seats available on the scheduled flights in 2010. The
data was downloaded and compiled from the Bureau of Transportation Statistics (BTS)
Transtats site by Tore Opsahl [134].
OpenFlights [134]: A network of commercial airports in the US and two other non-US
based airports. The weight of an edge is the number of routes between two airports. The
data was downloaded and compiled from Openflights.org by Tore Opsahl [134].
For all of the datasets except MIT-PlanetLab and Cornell-King, an
original edge weight represents the similarity between the two vertices connected
by that edge. In such a case, we change the edge weights as follows: $w' := \max w -
w + \min w$, where w′ is the new weight, w is the original edge weight, and max w and min w
are the largest and the smallest original edge weights. This
guarantees that the more similar two vertices are, the smaller the distance between
them.
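The transformation from similarity weights to distance weights can be sketched as follows, assuming the edge weights are stored in a dictionary keyed by edges; the function name `similarity_to_distance` is hypothetical.

```python
def similarity_to_distance(weights):
    """Map each similarity weight w to w' = max w - w + min w, so that
    the most similar pair gets the smallest distance weight."""
    w_max = max(weights.values())
    w_min = min(weights.values())
    return {edge: w_max - w + w_min for edge, w in weights.items()}
```

The transformation reverses the order of the weights while keeping them in the original range [min w, max w]; uniformly weighted graphs are left unchanged.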
5.4.2 Layering partition results
Recall that the embeddability of weighted graphs into tree metrics is related to the
largest cluster diameter ∆s (h) of the layering partition LP(s, h) as shown by Lemma 16
in Section 5.2. Also, the construction of our tree embedding uses the layering partition.
Table 13 shows the results of the layering partition obtained for the datasets described in
Subsection 5.4.1. For each graph dataset, we randomly select a start vertex s and build
the layering partition LP(s, h) with respect to s. Table 13 shows the cluster-diameter
∆s (h), the number of clusters in the layering partition LP(s, h) and the average diameter
of clusters in LP(s, h). We find that all graph datasets have a relatively small average
cluster diameter compared to their diameters. In every dataset, at least 40% of the clusters have
diameter 0 (i.e., they are singleton clusters).
5.4.3 Non-contractive embedding results
Graph                         n = |V|   diam(G)         h  # clusters     ∆s(h)  avg. diam.  % diam. 0
MIT-PlanetLab                     416   8623.81   6708.94           2   2482.69    1241.345        50%
Cornell-King                     2500    284367    146777           2    211549    105774.5        50%
HP-PlanetLab                      410   3893440   1352720          10   2694960      269496        90%
NetScience                        379   69.5212   4.67763         120   20.1798       3.984        40%
Geom                             3621      1069        77        1540       536     31.4727    69.026%
Facebook-like Social Network     1893      1445       184         701       897     16.6377     94.15%
FFN-msg-sum                       897      6159      1568          73      4692      834.82     61.64%
FFN-char-sum                      897    494136    127792          59    382456    56899.03     72.88%
FFN-msg-newman                    897   201.271   52.8877          70   155.801       23.82     68.57%
FFN-char-newman                   897     26171   6726.23          59   20107.1     2832.39     72.88%
Celegans                          297       344        72          36       282       26.61     83.33%
cond-mat-99-joint               13861       650        37         391      3681       20.88     58.63%
cond-mat-99-newman              13861   382.949   22.3333        3676   230.314        12.4     58.71%
US Top 500-Airport Network        500  14714300   2253990         189   8981270   922660.21     74.07%
US Airport Network               1572  23763600   2974630         632  11865000   197311.91     96.20%
OpenFlights                      2905       151        11        1253        54        3.06     83.08%
cond-mat-2003                   27519   551.315      35.2        6202   339.633       18.15     61.54%
cond-mat-2005                   36458   806.662        46        7880   409.525       22.78     62.07%
hep-th                           5835   614.537    33.999        2143   329.046       15.83     68.92%
astro-ph                        14845   225.317      16.5        2750   138.284       7.294     65.71%
routeview                       10515        10         1        6702         6      0.0632     96.08%
yeast                            2224        11         1        1037         6     0.11956     94.56%
DutchElite                       3621        16         1        2934        10        0.07     98.02%
Table 13: Layering partitions of the datasets and their parameters. h is the cluster-width
of LP(s, h) and set equal to the longest edge weight. s is a randomly selected start vertex.
Columns: number of vertices n = |V|, diameter diam(G), cluster-width h, number of clusters in LP(s, h),
cluster-diameter ∆s(h), average diameter of clusters in LP(s, h), and percentage of clusters having diameter 0.
We embed our datasets into the tree H. Our embedding depends on the cluster-width h of the layering partition as shown in Lemma 17. Lemma 17 shows that the
smaller the value of h, the smaller the distortion of our embedding into H. Also, since we
have the requirement that h ≥ w, we set h to the longest edge weight w. Table 14
shows the results of embedding by our algorithm running on our datasets. We report the
average distortion ratio, the maximum distortion ratio, the average relative distortion
and the distance-weighted average distortion. These results show that some datasets
have a “good” small average distortion but a very large maximum distortion. This could
be explained by a few “anomalous” vertices in the graphs which do not fit well
into the tree metric. Table 14 shows that thirteen datasets have average distortion of less
than 3. It is worth noting that these datasets of small average distortion have relatively
small values of the longest edge weight and thus of the cluster-width h.
Graph                         avg.        max         avg.        distance-weighted  ∆s(h)     h
                              distortion  distortion  relative    average
                              ratio       ratio       distortion  distortion
MIT-PlanetLab                 486.222     130869      485.222     86.2065            2482.69   6708.4
Cornell-King                  56.0313     504.598     55.0313     37.6178            211549    146777
HP-PlanetLab                  4.228       4444.01     3.228       4.05282            2694960   1352720
NetScience                    2.33795     561.166     1.33795     2.04786            20.1798   4.67763
Geom                          2.46234     690         1.46234     2.29134            536       77
Facebook-like Social Network  2.94284     1265        1.94284     2.79683            897       184
FFN-msg-sum                   3.21513     7828        2.21513     2.99343            4692      1568
FFN-char-sum                  4.88572     638040      3.88572     3.06892            382456    127792
FFN-msg-newman                3.22905     32697       2.22905     2.94213            155.801   52.8877
FFN-char-newman               8.38588     2097470     7.38588     2.95605            20107.1   6726.23
Celegans                      3.29534     426         2.29534     3.00878            282       72
cond-mat-99-joint             2.43412     461         1.43412     2.30791            391       37
cond-mat-99-newman            2.44168     4674.67     1.44168     2.31843            230.314   22.3333
US Top 500-Airport Network    15.2011     1498810     14.2011     2.94326            8981270   2253990
US Airport Network            17.2603     17814200    16.2603     2.641              11865000  2974630
OpenFlights                   2.55111     76          1.55111     2.40262            54        11
cond-mat-2003                 2.51556     9840.79     1.51556     2.40361            339.633   35.2
cond-mat-2005                 2.46088     78.2047     1.46088     2.35567            409.525   46
hep-th                        2.30369     9132        1.30369     2.15664            329.046   33.999
astro-ph                      2.74645     9590.9      1.74645     2.59063            138.284   16.5
routeview                     2.87213     9           1.87213     2.70627            6         1
yeast                         2.35655     9           1.35655     2.22493            6         1
DutchElite                    2.07811     13          1.07811     1.93266            10        1

Table 14: Distortion results for non-contractive embedding of the datasets into tree H.
Cluster-width is equal to the largest edge weight (h = w).
5.4.4 Edge subdivision (h ≤ w)
From our analysis of the constructed tree H (see Lemma 17), the distances in H depend
on the value of the cluster-width h: the smaller h is, the smaller the bound on the
distances in H. Recall that the construction of H requires that no edge of the shortest
path tree SPT(s) rooted at s crosses non-consecutive layers in the layering partition
LP(s, h) (see Lemma 17). Setting h smaller than w therefore requires subdividing those
SPT(s) edges that are longer than h, which introduces new Steiner vertices into the graph.
We call these Steiner vertices “dummy” vertices to distinguish them from the other Steiner
vertices used in the construction of H. Table 15 shows the results of embedding our
datasets with values of cluster-width smaller than or equal to the longest edge weight of
each dataset. Table 15 shows that we obtained better embedding results compared to the
results of Table 14, which were obtained with the cluster-width set equal to the longest
edge weight. Generally, smaller values of h yield smaller distortion (better embedding),
but at the expense of increasing the graph size by adding dummy vertices. This is shown
in Figure 17 and Figure 18. These figures show that smaller values of h result in smaller
distortion up to some point; decreasing h beyond this point does not improve the
distortion. This could be because adding more dummy vertices increases the diameter of
the clusters that contain them.
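The subdivision step described above can be sketched as follows. This is only an illustration under stated assumptions: the edge-list representation, the function name `subdivide_long_edges`, the naming of dummy vertices, and the choice to cut each long edge into equal pieces are all hypothetical and are not taken from the dissertation, which only requires that no resulting piece be longer than h.

```python
import math

def subdivide_long_edges(edges, h):
    """Split every weighted edge (u, v, w) with w > h into pieces of length <= h.

    Returns (new_edges, dummy_count). Dummy vertices are named ("dummy", i)
    (a hypothetical naming) so that they cannot collide with original vertices.
    """
    new_edges = []
    dummy_count = 0
    for u, v, w in edges:
        if w <= h:
            # Edge already fits within one layer of width h; keep it as is.
            new_edges.append((u, v, w))
            continue
        pieces = math.ceil(w / h)   # number of sub-edges needed
        piece_len = w / pieces      # equal pieces (an assumption), each <= h
        prev = u
        for _ in range(pieces - 1):
            dummy = ("dummy", dummy_count)
            dummy_count += 1
            new_edges.append((prev, dummy, piece_len))
            prev = dummy
        new_edges.append((prev, v, piece_len))
    return new_edges, dummy_count
```

The total length of each subdivided edge is preserved, so graph distances between original vertices do not change; only the vertex count grows, which matches the trade-off between distortion and graph size discussed above.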
[Figure 17 (plots not reproduced): two panels with cluster-width on the horizontal axis,
decreasing from 72 to 1, and a vertical axis from 0 to 20000. One panel plots the # of
dummy nodes and avg error ratio*1000; the other plots the # of dummy nodes and max error
ratio*10.]
Figure 17: Cluster-width versus average distortion, maximum distortion and number of
dummy vertices for the Celegans dataset.
Graph                         avg.        max         avg.        distance-weighted  ∆s(h)     h          # of
                              distortion  distortion  relative    average                                 dummy
                              ratio       ratio       distortion  distortion                              vertices
MIT-PlanetLab                 7.58017     1767.77     6.58017     2.14556            212.784   1          26752
Cornell-King                  7.09732     60.6909     6.09732     5.05324            57870     1001       9959
HP-PlanetLab                  2.17509     2254.58     1.17509     2.09186            2679790   30000      17323
NetScience                    1.76056     163.652     0.76056     1.63624            8.45536   0.0526316  29322
Geom                          2.1185      553         1.1185      1.9818             533       10         23961
Facebook-like Social Network  2.24927     905.6       1.24927     2.14829            825.6     40         6763
FFN-msg-sum                   1.12194     4861        1.15249     2.01465            4661      100        12474
FFN-char-sum                  3.20813     386634      2.20813     2.09797            376634    5000       19776
FFN-msg-newman                2.19989     20725.1     1.19989     2.02006            155.801   5          8153
FFN-char-newman               5.26393     1282490     4.26393     1.94908            19919.8   300        18313
Celegans                      2.46878     301         1.46878     2.26691            279       11         1485
cond-mat-99-joint             2.11556     387.5       1.11556     2.01216            357.5     15         19698
cond-mat-99-newman            2.23679     4246.83     1.23679     2.1277             229.814   10         15179
US Top 500-Airport Network    8.1244      749096      7.1244      2.0063             6541860   100000     10123
US Airport Network            12.1137     12185500    11.1137     2.11725            11785500  200000     21371
OpenFlights                   1.90162     53.6667     0.901621    1.80349            47.6667   2          12403
cond-mat-2003                 2.51556     9840.79     1.51556     2.40361            339.633   35.2       0
cond-mat-2005                 2.46088     78.2047     1.46088     2.35567            409.525   46         0
hep-th                        2.16475     8488.05     1.16475     2.03154            329.046   20         3750
astro-ph                      2.42821     8298.2      1.42821     2.29359            128.182   10         8973

Table 15: Distortion results for non-contractive embedding of the datasets into tree H.
Cluster-width is less than or equal to the largest edge weight (h ≤ w).
[Figure 18 (plots not reproduced): two panels with cluster-width on the horizontal axis
and a vertical axis from 0 to 10000. One panel plots the # of dummy nodes and avg error
ratio*100; the other plots the # of dummy nodes and max error ratio*10.]
Figure 18: Cluster-width versus average distortion, maximum distortion and number of
dummy vertices for the Cornell-King dataset.
5.4.5 Contractive embedding: weighting clusters with their own diameters
In the non-contractive embedding into the tree H, the weight of the edges inside each
cluster is proportional to the largest cluster diameter ∆s(h). This could result in large
distortion for pairs of vertices with small graph distances. If an embedding with a smaller
average distortion is more desirable, we can drop the requirement of non-contraction and
weight the edges inside each cluster using that cluster's own diameter (i.e., edges inside
cluster C are weighted by diam(C)/2). This weighting may contract the tree distances with
respect to the original graph distances. Table 16 shows the results of embedding into the
tree H′ with weighting by the clusters’ own diameters. In Table 16, we compute the average
and the maximum distortions as follows:
- average distortion :=
\[ \frac{\sum_{u,v:\, d_T(u,v) < d_G(u,v)} \frac{d_G(u,v)}{d_T(u,v)} \;+\; \sum_{u,v:\, d_T(u,v) \ge d_G(u,v)} \frac{d_T(u,v)}{d_G(u,v)}}{\binom{n}{2}}; \]
- maximum distortion :=
\[ \max_{u,v \in V} \, \max\left\{ \frac{d_H(u,v)}{d_G(u,v)},\ \frac{d_G(u,v)}{d_H(u,v)} \right\}. \]
Here $d_T$ and $d_H$ both denote distances in the embedding tree.
It turns out that seven datasets have average distortion between 1 and 1.5 while nine of
them have average distortion between 1.5 and 2. Furthermore, in Table 17 we show the
percentage of vertex pairs having a distortion less than a specific value, i.e., pairs
u, v ∈ V with $\max\{ d_H(u,v)/d_G(u,v),\ d_G(u,v)/d_H(u,v) \} < \epsilon$. We can see that,
for thirteen datasets, at least 40% of pairs have distance distortion in H′ less than 1.3.
For sixteen datasets, at least 50% of pairs have distance distortion less than 1.5.
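As a rough illustration of how the quantities reported in Tables 16 and 17 could be computed from two distance tables, consider the following sketch. The function name `distortion_stats` and the nested-dictionary input format are assumptions made for this example only; they do not come from the dissertation's implementation. The sketch uses the fact that both sums in the average-distortion formula add, for each pair, the larger of the two ratios.

```python
from itertools import combinations

def distortion_stats(vertices, d_g, d_t, thresholds=(1.2, 1.3, 1.5, 2.0, 2.5)):
    """Average and maximum two-sided distortion, plus percentage of pairs below thresholds.

    d_g[u][v] and d_t[u][v] are positive graph and tree distances for every
    pair (u, v) with u before v in `vertices` (hypothetical input format).
    """
    ratios = []
    for u, v in combinations(vertices, 2):
        g, t = d_g[u][v], d_t[u][v]
        # Contracted pairs contribute g/t, expanded pairs contribute t/g.
        ratios.append(max(g / t, t / g))
    average = sum(ratios) / len(ratios)   # len(ratios) equals n choose 2
    maximum = max(ratios)
    percent_below = {
        eps: 100.0 * sum(r < eps for r in ratios) / len(ratios)
        for eps in thresholds
    }
    return average, maximum, percent_below
```

For three vertices with graph distances 2, 4, 2 and tree distances 2, 2, 3 the per-pair ratios are 1, 2 and 1.5, so the sketch reports an average of 1.5 and a maximum of 2.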
Graph                         avg.        max         avg.        distance-weighted  h        # of
                              distortion  distortion  relative    average                     dummy
                                                      distortion  distortion                  vertices
MIT-PlanetLab                 2.1476      778.017     1.14171     1.16353            1        26752
Cornell-King                  2.15488     25.9803     1.15016     1.67531            1001     9959
HP-PlanetLab                  2.07588     2168.83     1.07585     1.98151            30000    17323
NetScience                    1.24350     305.541     0.232137    1.08414            2        377
Geom                          1.38654     353         0.300326    0.943303           30       5527
Facebook-like Social Network  1.46909     731         0.445232    1.24037            130      725
FFN-msg-sum                   1.76963     1569        0.767333    1.61176            800      872
FFN-char-sum                  2.22802     127782      1.22761     1.76627            127792   0
FFN-msg-newman                1.77188     6596.1      0.76893     1.59801            30       801
FFN-char-newman               3.66260     755492      2.66155     1.63005            6053.6   24
Celegans                      1.61404     141.457     0.569447    1.38213            3        6359
cond-mat-99-joint             1.48059     218         0.398237    1.08351            30       4091
cond-mat-99-newman            1.49963     2939.93     0.397248    1.05721            10       15179
US Top 500-Airport Network    5.33292     478798      4.32615     1.34539            2185380  3
US Airport Network            6.30806     5949250     5.29057     1.30221            2974630  0
OpenFlights                   1.36410     46.25       0.294977    0.992853           7        1246
hep-th                        1.51081     4565.36     0.404596    1.034              33.999   0
astro-ph                      1.50542     4605.94     0.475554    1.27308            16.5     0
cond-mat-2003                 1.6371      6633.59     0.580958    1.33231            35.2     0
cond-mat-2005                 1.52376     52.2427     0.473886    1.25279            46       0
routeview                     1.47421     6.5         0.368122    1.01176            1        0
yeast                         1.52098     7           0.414786    1.06257            1        0
DutchElite                    1.71461     11          0.372014    0.71202            1        0

Table 16: Distortion results for embedding of the datasets into tree H′. Edges inside
each cluster C are weighted equal to diam(C)/2.

Graph                         Percentage of vertex pairs with distortion  h
                              < 1.2   < 1.3   < 1.5   < 2     < 2.5
MIT-PlanetLab                 51.98   62.57   75.12   85.62   89.48   1
Cornell-King                  16.14   22.23   29.99   47.04   69.31   1001
HP-PlanetLab                  4.24    4.83    5.15    25.31   97.50   30000
NetScience                    60.99   76.81   89.55   97.31   98.99   2
Geom                          38.54   52.87   72.76   93.20   98.20   30
Facebook-like Social Network  37.29   43.773  67.42   87.52   97.37   130
FFN-msg-sum                   8.27    15.36   23.99   81.78   82.38   800
FFN-char-sum                  2.69    8.70    21.27   68.35   80.91   127792
FFN-msg-newman                8.62    12.61   32.43   81.24   82.13   30
FFN-char-newman               5.74    9.81    27.92   81.43   82.24   6053.6
Celegans                      27.07   36.76   56.62   81.79   93.70   3
cond-mat-99-joint             32.84   44.77   62.30   88.45   97.17   30
cond-mat-99-newman            33.96   44.78   59.72   87.54   96.34   10
US Top 500-Airport Network    28.22   40.53   63.29   89.22   97.35   2185380
US Airport Network            26.73   35.29   61.71   89.16   98.50   2974630
OpenFlights                   43.15   56.20   75.58   93.24   98.32   7
hep-th                        33.29   44.60   60.76   87.14   95.47   33.999
astro-ph                      29.75   41.66   59.27   89.89   97.84   16.5
cond-mat-2003                 25.17   34.39   50.01   78.17   92.44   35.2
cond-mat-2005                 31.29   40.33   52.68   86.38   96.76   46
routeview                     35.21   43.58   54.04   82.28   95.04   1
yeast                         27.10   45.52   54.72   78.52   92.67   1
DutchElite                    21.97   30.06   43.17   69.60   86.96   1

Table 17: Percentage of vertex pairs with distortion up to a given value by embedding
datasets into tree H′ with own diameter weighting.

5.4.6 Embedding with recursive partitioning of clusters
The distortion error could be large for vertices with small graph distances. Such
vertices tend to be within the same cluster of the layering partition LP(s, h) used for the
construction of the tree embedding H. But, since edges inside each cluster are weighted
uniformly using the diameter of the cluster (or the largest cluster diameter), the distortion
for these vertices could be large. In order to reduce the distortion error between vertices
within the same cluster, we recursively partition each cluster into groups, or partitions.
Then, we add a Steiner point to each partition p and connect this new Steiner point to
each vertex of the partition with weight equal to Dp/2, where Dp is the diameter of p.
Also, we connect the partition Steiner point to the Steiner point of the partition from the
previous iteration, with edge weight ∆s(h)/2 + h − Dp/2. A description of this procedure is
given in Algorithm 2. We treat the partitioning of a cluster as a P-centers problem,
where P is the number of partitions, and apply the farthest point heuristic to solve it.
The algorithm runs in P iterations. The first iteration randomly chooses a vertex and adds
it to the set of centers S. Each subsequent iteration chooses a vertex v with maximum
dG(S, v) and adds v to S. This algorithm achieves a factor 2 approximation for the
P-centers problem in O(nP) time [98].
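The farthest point heuristic described above can be sketched as follows. The function name `farthest_point_centers`, the `dist` callback and the final assignment of every vertex to its nearest center are assumptions made for illustration; the dissertation does not specify how the partitions are formed from the chosen centers.

```python
import random

def farthest_point_centers(points, dist, p, seed=None):
    """Farthest point heuristic for the P-centers problem (illustrative sketch).

    points: list of vertices; dist(u, v): distance between two vertices;
    p: number of centers. Returns (centers, partitions).
    """
    rng = random.Random(seed)
    centers = [rng.choice(points)]   # first center is chosen at random
    # nearest[v] is d(S, v): distance from v to the closest chosen center.
    nearest = {v: dist(centers[0], v) for v in points}
    while len(centers) < min(p, len(points)):
        v = max(points, key=lambda x: nearest[x])
        if nearest[v] == 0:
            break                    # every point already coincides with a center
        centers.append(v)
        for x in points:
            nearest[x] = min(nearest[x], dist(v, x))
    # Assumption: each vertex joins the partition of its nearest center.
    partitions = {c: [] for c in centers}
    for x in points:
        closest = min(centers, key=lambda c: dist(c, x))
        partitions[closest].append(x)
    return centers, list(partitions.values())
```

Maintaining the `nearest` table keeps each of the P iterations linear in the number of vertices, which is where the O(nP) bound on distance evaluations comes from.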
Algorithm 2 Tree Embedding with Cluster Partitioning
Input: A cluster (or partition) C with Steiner point s_c, the number of partitions P, and
the tree H.
  Partition C into P partitions using the farthest point heuristic
  for each partition p do
    Add to p a Steiner point s_p
    Add to H the edges {v s_p : v ∈ p} with weights diam(p)/2
    Add to H the edge s_c s_p with weight diam(C) − diam(p)/2
  end for
  Return tree H
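One level of Algorithm 2 might look like the sketch below. It assumes the tree H is stored as a plain list of weighted edges and that the partitions have already been computed; the function name `attach_partitions`, the edge-list representation and the Steiner point naming are hypothetical. The edge weights follow the two lines of the pseudocode as written.

```python
def attach_partitions(tree_edges, s_c, cluster, partitions, dist):
    """Add one Steiner point per partition of `cluster` to the tree edge list.

    tree_edges: list of (u, v, weight) triples representing H (assumed format);
    s_c: Steiner point of the cluster; partitions: list of vertex lists;
    dist(u, v): graph distance between two vertices.
    """
    def diam(vertices):
        # Largest pairwise distance; 0 for a single vertex or an empty set.
        return max((dist(u, v) for u in vertices for v in vertices), default=0)

    diam_c = diam(cluster)
    for i, part in enumerate(partitions):
        s_p = ("steiner", s_c, i)   # hypothetical name for the new Steiner point
        half = diam(part) / 2
        for v in part:
            tree_edges.append((v, s_p, half))          # weight diam(p)/2
        tree_edges.append((s_c, s_p, diam_c - half))   # weight diam(C) - diam(p)/2
    return tree_edges
```

Applying the function again to each partition, with s_p playing the role of s_c, would give the recursive variant described in the text.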
We tested our embedding algorithm with the above partition procedure on our graph
datasets. It achieved better average distortions for some of our datasets as shown in Table
18. For example, embedding with the use of the P -centers partitioning almost halved the
average distortions for US Top 500-Airport Network and US Airport Network datasets
compared to the average distortions of embedding without partitioning. The partitioning
technique gave negligible improvement for the other datasets, which already have small
embedding distortions without partitioning.
Graph                         avg.        max         h        # of      # of
                              distortion  distortion           dummy     clusters and
                                                               vertices  partitions
FFN-char-sum *                2.2027      127530      127792   248       307
FFN-char-sum                  2.22802     127782      127792   0         59
FFN-char-newman *             3.0713      522556      6053.6   197       219
FFN-char-newman               3.66260     755492      6053.6   24        46
US Top 500-Airport Network *  3.4379      243443      2185380  58        241
US Top 500-Airport Network    5.33292     478798      2185380  3         186
US Airport Network *          3.1059      2005240     2974630  253       885
US Airport Network            6.30806     5949250     2974630  0         632
* Embedding with cluster partitioning using P-center

Table 18: Distortion results for embedding with P-centers partitioning for datasets into
tree H′. P-centers has negligible improvement of distortion for other datasets of Table 12.
CHAPTER 6
Conclusion and Future Work
In Chapter 2, we discussed geometric properties characterizing “tree-likeness” of a
graph from a metric point of view. Specifically, we investigated a few graph parameters,
namely, the tree-distortion and the tree-stretch when embedding a graph into a tree
(tree spanner), the tree-length and the tree-breadth, Gromov’s hyperbolicity, the cluster-diameter and the cluster-radius in a layering partition of a graph, which capture and
quantify this phenomenon of being metrically close to a tree. We provided a detailed
and comprehensive survey on the theory related to the graph parameters used and, in
particular, on the bounds relating these parameters. Furthermore, we calculated or
accurately estimated those parameters on a wide range of real-life networks, taken from
different domains like Internet measurements, biological datasets, web graphs, social and
collaboration networks. Measuring these parameters allowed us to demonstrate existence
of metric tree-like structures in these networks.
Finally in Chapter 2, we discussed algorithmic advantages for a graph to be metrically
tree-like and a few applications of graph approximation with a tree or a tree spanner using
the existing embedding techniques. Such applications include solving some problems
related to routing and distance approximation in a network, as well as graph diameter
and radius estimation.
From the observations in Chapter 2, we suggest that all these tree-likeness measurements are important because they collectively capture and explain metric tree-likeness of
a given graph. Also, we suggest that metric tree-likeness measurements in conjunction
with other local characteristics of networks, such as the degree distribution and clustering
coefficients, provide a more complete unifying picture of networks.
One challenge intended for future investigation would be how to efficiently calculate
Gromov’s hyperbolicity for very large graphs. The best known algorithm to calculate
hyperbolicity has time complexity of O(n^3.69) [92]. One algorithm that performs well in
practice is that of Cohen et al. [58], but it still has O(n^4) time complexity. Propositions
2 and 3 of Chapter 2 established lower and upper bounds on the value of hyperbolicity
using cluster-diameter of a layering partition.
• Can we utilize layering partition of a graph to efficiently calculate hyperbolicity?
• Can we obtain an algorithm that works well in practice for very large graphs, even
better than the algorithm of [58]?
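To make the O(n^4) cost concrete, the following sketch computes hyperbolicity by brute force over all quadruples. It assumes the four-point definition, in which δ is half the largest difference between the two largest of the three pairwise distance sums of a quadruple. This is the naive enumeration, not the algorithm of [58] or [92], and the function name `hyperbolicity_four_point` is hypothetical.

```python
from itertools import combinations

def hyperbolicity_four_point(vertices, d):
    """Brute-force hyperbolicity via the four-point condition (assumed definition).

    vertices: list of vertices; d(u, v): shortest-path distance in the graph.
    Examines all n-choose-4 quadruples, hence O(n^4) distance lookups.
    """
    delta = 0.0
    for x, y, z, w in combinations(vertices, 4):
        sums = sorted((d(x, y) + d(z, w),
                       d(x, z) + d(y, w),
                       d(x, w) + d(y, z)))
        # Half the gap between the two largest sums.
        delta = max(delta, (sums[2] - sums[1]) / 2)
    return delta
```

On a path, which is a tree, the two largest sums always coincide and the value is 0; on a cycle with four vertices the value is 1.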
In Chapters 3 and 4, by using Robertson-Seymour’s tree-decomposition of graphs, we
described a necessary condition for a graph to have a multiplicative t-spanner of tree-width k (in particular, to have a multiplicative tree t-spanner, when k = 1). As we have
mentioned earlier, this necessary condition is far from being sufficient. The following
interesting problem remains open.
• Does there exist a clean “if and only if” condition under which a graph admits
a multiplicative (or, additive) t-spanner of tree-width k (in particular, admits a
multiplicative (or, additive) tree t-spanner (k = 1 case))?
That necessary condition was very useful in demonstrating that, for every fixed k, there is
a polynomial time algorithm that, given an n-vertex graph G admitting a multiplicative
t-spanner with tree-width k, constructs a system of at most (k + 1)(1 + log₂ n) collective
additive tree O(t log n)-spanners of G. In particular, we showed that when k = 1, there is
a polynomial time algorithm that, given an n-vertex graph G admitting a multiplicative
tree t-spanner, constructs a system of at most log₂ n collective additive tree O(t log n)-spanners of G. Can these results be improved?
• Does a polynomial time algorithm exist that, given an n-vertex graph G admitting
a multiplicative tree t-spanner, constructs a system of O(1) collective additive tree
O(t)-spanners of G?
• Does a polynomial time algorithm exist that, given an n-vertex graph G admitting
a multiplicative t-spanner with tree-width k, constructs a system of O(k) collective
additive tree O(t)-spanners of G?
As we have mentioned earlier, the interesting particular question of whether a multiplicative
tree spanner can be turned in polynomial time into a single additive tree spanner with a
slight increase in the stretch has already been settled (negatively) in [86]. Yet, it is
interesting to know whether an exponential time procedure that performs such a
transformation exists. We leave two more challenging questions for future investigation.
• Is there any polynomial time algorithm which, given a graph admitting a system of
at most µ collective tree t-spanners, constructs a system of at most α(µ, n) collective
tree β(t, n)-spanners, where α(µ, n) is O(µ) (or O(µ log n)) and β(t, n) is O(t) (or
O(t log n))?
In this approximation question, we assume that one knows that a graph G admits a
system of at most µ collective tree t-spanners, but (s)he does not know how to find it
in polynomial time and wonders if something weaker can be constructed efficiently. The
following question is about approximating the k-tree-width t-spanner problem.
• Is there a polynomial time algorithm that, for every unweighted graph G admitting
a t-spanner of tree-width k, constructs a (O(k log n)t)-spanner with tree-width at
most k?
In Chapter 5, we investigated the problem of embedding a weighted graph metric
into a tree metric. We developed an approach with proven theoretical bounds for this
problem. Furthermore, we applied and empirically tested our approach on real-world
graph datasets. Generally, we obtained good embedding results, with low distortion
error on average for the tested graphs.
BIBLIOGRAPHY
[1] Pages linking to www.epa.gov. Obtained from Jon Kleinberg’s web page. Available at: http://www.cs.cornell.edu/courses/cs685/2002fa/.
[2] Planetlab: An open platform for developing, deploying, and accessing planetary-scale services. https://www.planet-lab.org.
[3] S³: Scalable sensing service. http://networking.hpl.hp.com/s-cube.
[4] University of oregon route-views project. http://www.routeviews.org/.
[5] Will there ever be a tree of life that systematists can agree on? Science, 125th
anniversary issue, 2005. http://www.sciencemag.org/sciext/125th/.
[6] Ittai Abraham, Mahesh Balakrishnan, Fabian Kuhn, Dahlia Malkhi, Venugopalan
Ramasubramanian, and Kunal Talwar. Reconstructing approximate tree metrics.
In PODC, pages 43–52, 2007.
[7] Ittai Abraham, Mahesh Balakrishnan, Fabian Kuhn, Dahlia Malkhi, Venugopalan
Ramasubramanian, and Kunal Talwar. Reconstructing approximate tree metrics.
In PODC, pages 43–52, 2007.
[8] Aaron B. Adcock, Blair D. Sullivan, and Michael W. Mahoney. Tree-like structure
in large social and information networks. In ICDM, pages 1–10, 2013.
[9] Richa Agarwala, Vineet Bafna, Martin Farach, Mike Paterson, and Mikkel Thorup.
On the approximability of numerical taxonomy (fitting distances by tree metrics).
SIAM J. Comput., 28(3):1073–1085, 1999.
[10] Noga Alon, Mihai Badoiu, Erik D. Demaine, Martin Farach-Colton, Mohammad Taghi Hajiaghayi, and Anastasios Sidiropoulos. Ordinal embeddings of minimum relaxation: general properties, trees, and ultrametrics. In SODA, pages
650–659. SIAM, 2005.
[11] Noga Alon, Richard M. Karp, David Peleg, and Douglas West. A graph-theoretic
game and its application to the k-server problem. SIAM J. Comput., 24:78–100,
1995.
[12] Ingo Althöfer, Gautam Das, David P. Dobkin, Deborah Joseph, and José Soares.
On sparse spanners of weighted graphs. Discrete & Computational Geometry, 9:81–
100, 1993.
[13] Stefan Arnborg, Derek G. Corneil, and Andrzej Proskurowski. Complexity of finding embeddings in a k-tree. SIAM J. Algebraic Discrete Methods, 8(2):277–284,
April 1987.
[14] Yonatan Aumann and Yuval Rabani. An O(log k) approximate min-cut max-flow
theorem and approximation algorithm. SIAM J. Comput., 27(1):291–301, February
1998.
[15] Giorgio Ausiello, Alessandro D’Atri, and Marina Moscarini. Chordality properties on graphs and minimal conceptual connections in semantic data models. J.
Comput. Syst. Sci., 33(2):179–202, 1986.
[16] Baruch Awerbuch and Yossi Azar. Buy-at-bulk network design. In FOCS, pages
542–547, 1997.
[17] Mihai Badoiu, Julia Chuzhoy, Piotr Indyk, and Anastasios Sidiropoulos. Low-distortion embeddings of general metrics into the line. In STOC, pages 225–233,
2005.
[18] Mihai Badoiu, Julia Chuzhoy, Piotr Indyk, and Anastasios Sidiropoulos. Embedding ultrametrics into low-dimensional spaces. In Symposium on Computational
Geometry, pages 187–196, 2006.
[19] Mihai Badoiu, Erik D. Demaine, MohammadTaghi Hajiaghayi, Anastasios
Sidiropoulos, and Morteza Zadimoghaddam. Ordinal embedding: Approximation
algorithms and dimensionality reduction. In APPROX-RANDOM, pages 21–34,
2008.
[20] Mihai Badoiu, Kedar Dhamdhere, Anupam Gupta, Yuri Rabinovich, Harald Räcke,
R. Ravi, and Anastasios Sidiropoulos. Approximation algorithms for low-distortion
embeddings into low-dimensional spaces. In SODA, pages 119–128, 2005.
[21] Mihai Badoiu, Piotr Indyk, and Anastasios Sidiropoulos. Approximation algorithms for embedding general metrics into trees. In SODA, pages 512–521, 2007.
[22] A. L. Barabasi and R. Albert. Emergence of scaling in random networks. Science,
286:509–512, 1999.
[23] Albert-László Barabási, Réka Albert, and Hawoong Jeong. Scale-free characteristics of random networks: the topology of the world-wide web. Physica A: Statistical
Mechanics and its Applications, 281(1-4):69–77, June 2000.
[24] Yair Bartal. Probabilistic approximation of metric spaces and its algorithmic applications. In 37th Annual Symposium on Foundations of Computer Science,
pages 184–193, 1996.
[25] Yair Bartal. On approximating arbitrary metrics by tree metrics. In Proceedings of
the 30th Annual ACM Symposium on Theory of Computing, pages 161–168, 1998.
[26] Yair Bartal, Avrim Blum, Carl Burch, and Andrew Tomkins. A polylog(n)-competitive algorithm for metrical task systems. In STOC, pages 711–719, 1997.
[27] Surender Baswana, Telikepalli Kavitha, Kurt Mehlhorn, and Seth Pettie. New
constructions of (alpha, beta)-spanners and purely additive spanners. In SODA,
pages 672–681, 2005.
[28] Surender Baswana and Sandeep Sen. A simple linear time algorithm for computing
a (2k-1)-spanner of O(n^{1+1/k}) size in weighted graphs. In ICALP, pages 384–396,
2003.
[29] Vladimir Batagelj and Andrej Mrvar. Some analyses of Erdos collaboration graph.
Social Networks, 22(2):173–186, May 2000. http://vlado.fmf.uni-lj.si/pub/networks/data/Erdos/Erdos02.net.
[30] Catriel Beeri, Ronald Fagin, David Maier, and Mihalis Yannakakis. On the Desirability of Acyclic Database Schemes. Journal of the ACM, 30(3):479–513, 1983.
[31] C. Berge. Hypergraphs: Combinatorics of Finite Sets. North-Holland, 1989.
[32] Piotr Berman, Arnab Bhattacharyya, Konstantin Makarychev, Sofya Raskhodnikova, and Grigory Yaroslavtsev. Improved approximation for the directed spanner problem. In ICALP (1), pages 1–12, 2011.
[33] Arnab Bhattacharyya, Elena Grigorescu, Kyomin Jung, Sofya Raskhodnikova, and
David P. Woodruff. Transitive-closure spanners. SIAM J. Comput., 41(6):1380–
1425, 2012.
[34] Avrim Blum, Goran Konjevod, R. Ravi, and Santosh Vempala. Semi-definite relaxations for minimum bandwidth and other vertex-ordering problems. In STOC,
pages 100–105, 1998.
[35] Hans L. Bodlaender. A linear-time algorithm for finding tree-decompositions of
small treewidth. SIAM J. Comput., 25(6):1305–1317, December 1996.
[36] M. Boguñá, D. Krioukov, and K. C. Claffy. Navigability of complex networks.
Nature Physics, 5(1):74–80, 2009.
[37] J. Bourgain. On Lipschitz embedding of finite metric spaces in Hilbert space. Isr.
J. of Math., 52(1):46–52, March 1985.
[38] Ulrik Brandes and Dagmar Handke. NP-completeness results for minimum planar
spanners. Discrete Mathematics & Theoretical Computer Science, 3(1):1–10, 1998.
[39] Andreas Brandstädt, Victor Chepoi, and Feodor F. Dragan. Distance approximating trees for chordal and dually chordal graphs. J. Algorithms, 30(1):166–184,
1999.
[40] Andreas Brandstädt, Feodor F. Dragan, Hoàng-Oanh Le, and Van Bang Le. Tree
spanners on chordal graphs: complexity and algorithms. Theor. Comput. Sci.,
310(1-3):329–354, 2004.
[41] Andreas Brandstädt, Feodor F. Dragan, Hoàng-Oanh Le, Van Bang Le, and Ryuhei
Uehara. Tree spanners for bipartite graphs and probe interval graphs. Algorithmica, 47(1):27–51, 2007.
[42] G. Brinkmann, J. Koolen, and V. Moulton. On the hyperbolicity of chordal graphs.
Annals of Combinatorics, 5(1):61–69, 2001.
[43] Dongbo Bu, Yi Zhao, Lun Cai, Hong Xue, Xiaopeng Zhu, Hongchao Lu, Jingfen
Zhang, Shiwei Sun, Lunjiang Ling, Nan Zhang, Guojie Li, and Runsheng Chen.
Topological structure analysis of the protein-protein interaction network in budding
yeast. Nucleic Acids Research, 31(9):2443–2450, May 2003. Dataset available at:
http://vlado.fmf.uni-lj.si/pub/networks/data/bio/Yeast/Yeast.htm.
[44] Leizhen Cai and Derek G. Corneil. Tree spanners. SIAM J. Discrete Math.,
8(3):359–387, 1995.
[45] CAIDA. The CAIDA AS relationships dataset, 1 June 2012 - 5 June 2012. http://www.caida.org/data/active/as-relationships.
[46] CAIDA. The internet topology data kit #0304, April 2003. http://www.caida.org/data/active/internet-topology-data-kit.
[47] CAIDA. The CAIDA AS relationships dataset, 5 November 2007. http://www.caida.org/data/active/as-relationships.
[48] Moses Charikar, Chandra Chekuri, Ashish Goel, and Sudipto Guha. Rounding via
trees: Deterministic approximation algorithms for group steiner trees and k-median.
In STOC, pages 114–123, 1998.
[49] Kai Chen, David R. Choffnes, Rahul Potharaju, Yan Chen, Fabian E. Bustamante,
Dan Pei, and Yao Zhao. Where the sidewalk ends: extending the internet as graph
using traceroutes from p2p users. In Proceedings of the 5th international conference
on Emerging networking experiments and technologies, CoNEXT ’09, pages 217–
228, New York, NY, USA, 2009. ACM. http://www.aqualab.cs.northwestern.edu/projects.
[50] Wei Chen, Wenjie Fang, Guangda Hu, and Michael W. Mahoney. On the hyperbolicity of small-world and tree-like random graphs. In ISAAC, volume 7676 of
Lecture Notes in Computer Science, pages 278–288. Springer, 2012.
[51] Victor Chepoi and Feodor F. Dragan. A note on distance approximating trees in
graphs. Eur. J. Comb., 21(6):761–766, 2000.
[52] Victor Chepoi, Feodor F. Dragan, Bertrand Estellon, Michel Habib, and Yann
Vaxès. Diameters, centers, and approximating trees of delta-hyperbolic geodesic
spaces and graphs. In Symposium on Computational Geometry, pages 59–68, 2008.
[53] Victor Chepoi, Feodor F. Dragan, Bertrand Estellon, Michel Habib, Yann Vaxès,
and Yang Xiang. Additive spanners and distance and routing labeling schemes for
hyperbolic graphs. Algorithmica, 62(3-4):713–732, 2012.
[54] Victor Chepoi, Feodor F. Dragan, Ilan Newman, Yuri Rabinovich, and Yann Vaxès.
Constant approximation algorithms for embedding graph metrics into trees and
outerplanar graphs. Discrete & Computational Geometry, 47(1):187–214, 2012.
[55] Victor Chepoi and Bertrand Estellon. Packing and covering delta-hyperbolic
spaces by balls. In APPROX-RANDOM, pages 59–73, 2007.
[56] Victor Chepoi and Bernard Fichet. l∞-approximation via subdominants. J. Math.
Psychol., 44(4):600–616, 2000.
[57] Fan R. K. Chung and Linyuan Lu. The average distance in a random graph with
given expected degrees. Internet Mathematics, 1(1):91–113, 2003.
[58] Nathann Cohen, David Coudert, and Aurélien Lancin. Exact and approximate
algorithms for computing the hyperbolicity of large-scale graphs. Rapport de
recherche RR-8074, INRIA, September 2012.
[59] V Colizza, R Pastor-Satorras, and A Vespignani. Reaction–diffusion processes and
metapopulation models in heterogeneous networks. Nature Physics, 3:276–282,
January 2007.
[60] Derek G. Corneil, Feodor F. Dragan, Ekkehard Köhler, and Chenyu Yan. Collective
tree 1-spanners for interval graphs. In WG, pages 151–162, 2005.
[61] Pajek datasets. Geom: Collaboration network in computational geometry. http://vlado.fmf.uni-lj.si/pub/networks/data/collab/geom.htm.
[62] Fabien de Montgolfier, Mauricio Soto, and Laurent Viennot. Treewidth and hyperbolicity of the internet. In NCA, pages 25–32. IEEE Computer Society, 2011.
[63] W. de Nooy. The network data on the administrative elite in the Netherlands
in April-June 2006. http://vlado.fmf.uni-lj.si/pub/networks/data/2mode/DutchElite.htm.
[64] Michael J. Demmer and Maurice Herlihy. The arrow distributed directory protocol.
In Shay Kutten, editor, DISC, volume 1499 of Lecture Notes in Computer Science,
pages 119–133. Springer, 1998.
[65] Reinhard Diestel. Graph Theory, 4th Edition, volume 173 of Graduate texts in
mathematics. Springer, 2012.
[66] Michael Dinitz, Guy Kortsarz, and Ran Raz. Label cover instances with large girth
and the hardness of approximating basic k-spanner. CoRR, abs/1203.0224, 2012.
[67] Michael Dinitz and Robert Krauthgamer. Directed spanners via flow-based linear
programs. In STOC, pages 323–332, 2011.
[68] Dorit Dor, Shay Halperin, and Uri Zwick. All-pairs almost shortest paths. SIAM
J. Comput., 29(5):1740–1759, 2000.
[69] Yon Dourisboure. Compact routing schemes for generalised chordal graphs. J.
Graph Algorithms Appl., 9(2):277–297, 2005.
[70] Yon Dourisboure, Feodor F. Dragan, Cyril Gavoille, and Chenyu Yan. Spanners
for bounded tree-length graphs. Theor. Comput. Sci., 383(1):34–44, 2007.
[71] Yon Dourisboure and Cyril Gavoille. Tree-decompositions with bags of small diameter. Discrete Mathematics, 307(16):2008–2029, 2007.
[72] Feodor F. Dragan. Tree-like structures in graphs: a metric point of view. In WG,
2013.
[73] Feodor F. Dragan and Muad Abu-Ata. Collective additive tree spanners of bounded
tree-breadth graphs with generalizations and consequences. In SOFSEM, pages
194–206, 2013.
[74] Feodor F. Dragan, Fedor V. Fomin, and Petr A. Golovach. Approximation of
minimum weight spanners for sparse graphs. Theor. Comput. Sci., 412(8-10):846–
852, 2011.
[75] Feodor F. Dragan, Fedor V. Fomin, and Petr A. Golovach. Spanners in sparse
graphs. J. Comput. Syst. Sci., 77(6):1108–1119, 2011.
[76] Feodor F. Dragan and Ekkehard Köhler. An approximation algorithm for the
tree t-spanner problem on unweighted graphs via generalized chordal graphs. In
APPROX-RANDOM, pages 171–183, 2011.
[77] Feodor F. Dragan and Chenyu Yan. Collective tree spanners in graphs with
bounded parameters. Algorithmica, 57(1):22–43, 2010.
[78] Feodor F. Dragan, Chenyu Yan, and Derek G. Corneil. Collective tree spanners
and routing in AT-free related graphs. J. Graph Algorithms Appl., 10(2):97–122,
2006.
[79] Feodor F. Dragan, Chenyu Yan, and Irina Lomonosov. Collective tree spanners of
graphs. SIAM J. Discrete Math., 20(1):241–260, 2006.
[80] William Duckworth, Nicholas C. Wormald, and Michele Zito. A PTAS for the sparsest 2-spanner of 4-connected planar triangulations. J. Discrete Algorithms, 1(1):67–76, 2003.
[81] Richard Durbin, Sean R. Eddy, Anders Krogh, and Graeme Mitchison. Biological
Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge
University Press, 1998.
[82] Michael Elkin and David Peleg. Strong inapproximability of the basic k-spanner
problem. In ICALP, pages 636–647, 2000.
[83] Michael Elkin and David Peleg. (1+epsilon, beta)-spanner constructions for general
graphs. In Proceedings of the thirty-third annual ACM symposium on Theory of
computing, STOC ’01, pages 173–182, New York, NY, USA, 2001. ACM.
[84] Michael Elkin and David Peleg. Approximating k-spanner problems for k ≥ 2.
Theor. Comput. Sci., 337(1-3):249–277, 2005.
[85] Michael Elkin and David Peleg. The hardness of approximating spanner problems.
Theory Comput. Syst., 41(4):691–729, 2007.
[86] Yuval Emek and David Peleg. Approximating minimum max-stretch spanning trees
on unweighted graphs. SIAM J. Comput., 38(5):1761–1781, 2008.
[87] Jittat Fakcharoenphol, Satish Rao, and Kunal Talwar. A tight bound on approximating arbitrary metrics by tree metrics. J. Comput. Syst. Sci., 69(3):485–497,
2004.
[88] Michalis Faloutsos, Petros Faloutsos, and Christos Faloutsos. On power-law relationships of the internet topology. In SIGCOMM, pages 251–262, 1999.
[89] Uriel Feige. Approximating the bandwidth via volume respecting embeddings. J.
Comput. Syst. Sci., 60(3):510–539, 2000.
[90] Sándor P. Fekete and Jana Kremer. Tree spanners in planar graphs. Discrete
Applied Mathematics, 108(1-2):85–103, 2001.
[91] Fedor V. Fomin, Petr A. Golovach, and Erik Jan van Leeuwen. Spanners of bounded
degree graphs. Inf. Process. Lett., 111(3):142–144, 2011.
[92] Hervé Fournier, Anas Ismail, and Antoine Vigneron. Computing the Gromov hyperbolicity of a discrete metric space. CoRR, abs/1210.3323, 2012.
[93] Naveen Garg, Goran Konjevod, and R. Ravi. A polylogarithmic approximation
algorithm for the group steiner tree problem. In Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’98, pages 253–259,
Philadelphia, PA, USA, 1998. Society for Industrial and Applied Mathematics.
[94] Cyril Gavoille and Olivier Ly. Distance labeling in hyperbolic graphs. In ISAAC,
pages 1071–1079, 2005.
[95] Cyril Gavoille and David Peleg. Compact and localized distributed data structures.
Distributed Computing, 16(2-3):111–120, 2003.
[96] E. Ghys and P. de la Harpe, eds. Les groupes hyperboliques d’après M. Gromov.
Progress in Mathematics, 83, 1990.
[97] J. R. Gilbert, D. J. Rose, and A. Edenbrandt. A separator theorem for chordal
graphs. SIAM Journal on Algebraic and Discrete Methods, 5(3):306–313, 1984.
[98] T. Gonzalez. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293–306, 1985.
[99] M. Gromov. Hyperbolic groups: Essays in group theory. MSRI Publ., 8:75–263,
1987.
[100] P. Krishna Gummadi, Stefan Saroiu, and Steven D. Gribble. King: estimating
latency between arbitrary internet end hosts. Computer Communication Review,
32(3):11, 2002.
[101] Anupam Gupta. Steiner points in tree metrics don’t (really) help. In SODA, pages
220–227, 2001.
[102] Anupam Gupta, Amit Kumar, and Rajeev Rastogi. Traveling with a pez dispenser
(or, routing issues in mpls). SIAM J. Comput., 34(2):453–474, 2004.
[103] Alexander Hall and Christos H. Papadimitriou. Approximating the distortion. In
APPROX-RANDOM, pages 111–122, 2005.
[104] Teresa W. Haynes, Stephen Hedetniemi, and Peter Slater. Fundamentals of Domination in Graphs (Pure and Applied Mathematics (Marcel Dekker)). CRC, 1998.
[105] Maurice Herlihy, Fabian Kuhn, Srikanta Tirthapura, and Roger Wattenhofer. Dynamic analysis of the arrow distributed protocol. Theory Comput. Syst., 39(6):875–
901, 2006.
[106] Piotr Indyk. Algorithmic applications of low-distortion geometric embeddings. In
FOCS, pages 10–33, 2001.
[107] Piotr Indyk and Jiri Matousek. Low-distortion embeddings of finite metric spaces.
In Handbook of Discrete and Computational Geometry, pages 177–196. CRC
Press, 2004.
[108] H. Jeong, S. P. Mason, A.-L. Barabási, and Z. N. Oltvai. Lethality and centrality
in protein networks. Nature, 411(6833):41–42, 2001. Available at:
http://www3.nd.edu/~networks/resources.htm.
[109] Mong-Jen Kao, Der-Tsai Lee, and Dorothea Wagner. Approximating metrics by
tree metrics of small distance-weighted average stretch. CoRR, abs/1301.3252,
2013.
[110] W. S. Kennedy, O. Narayan, and I. Saniee. On the Hyperbolicity of Large-Scale
Networks. ArXiv e-prints, June 2013.
[111] Claire Kenyon, Yuval Rabani, and Alistair Sinclair. Low distortion maps between
point sets. In STOC, pages 272–280, 2004.
[112] Jon M. Kleinberg. Authoritative sources in a hyperlinked environment. J. ACM,
46(5):604–632, September 1999.
http://www.cs.cornell.edu/courses/cs685/2002fa/.
[113] Jon M. Kleinberg. The small-world phenomenon: an algorithm perspective. In
STOC, pages 163–170, 2000.
[114] Jon M. Kleinberg. Small-world phenomena and the dynamics of information. In
NIPS, pages 431–438, 2001.
[115] Robert Kleinberg. Geographic routing using hyperbolic space. In INFOCOM,
pages 1902–1909, 2007.
[116] Guy Kortsarz. On the hardness of approximating spanners. Algorithmica,
30(3):432–450, 2001.
[117] Guy Kortsarz and David Peleg. Generating sparse 2-spanners. J. Algorithms,
17(2):222–236, 1994.
[118] D. Kratsch, H. Le, H. Müller, E. Prisner, and D. Wagner. Additive tree spanners.
SIAM Journal on Discrete Mathematics, 17(2):332–340, 2003.
[119] Robert Krauthgamer and James R. Lee. Algorithms on negatively curved spaces.
In FOCS, pages 119–132, 2006.
[120] James R. Lee, Assaf Naor, and Yuval Peres. Trees and Markov convexity. In SODA,
pages 1028–1037, 2006.
[121] Jure Leskovec, Kevin J. Lang, Anirban Dasgupta, and Michael W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large
well-defined clusters. Internet Mathematics, 6(1):29–123, 2009.
[122] Christian Liebchen and Gregor Wünsch. The zoo of tree spanner problems. Discrete Applied Mathematics, 156(5):569–587, 2008.
[123] A.L. Liestman and T. Shermer. Additive graph spanners. Networks, 23(4):343–364,
1993.
[124] Michal Linial, Nathan Linial, Naftali Tishby, and Golan Yona. Global self organization of all known protein sequences reveals inherent biological signatures, 1997.
[125] Nathan Linial. Finite metric spaces – combinatorics, geometry and algorithms. In
Proceedings of the International Congress of Mathematicians III, pages 573–586,
2002.
[126] Nathan Linial, Eran London, and Yuri Rabinovich. The geometry of graphs and
some of its algorithmic applications. Combinatorica, 15(2):215–245, 1995.
[127] Daniel Lokshtanov. On the complexity of computing treelength. Discrete Applied
Mathematics, 158(7):820–827, 2010.
[128] Jirí Matousek and Anastasios Sidiropoulos. Inapproximability for metric embeddings into R^d. In FOCS, pages 405–413, 2008.
[129] Onuttom Narayan and Iraj Saniee. Large-scale curvature of networks. Physical
Review E, 84(6):066108, 2011.
[130] M. E. J. Newman. Scientific collaboration networks. II. Shortest paths, weighted
networks, and centrality. Physical Review E, 64(1):016132+, June 2001.
[131] M. E. J. Newman. The structure of scientific collaboration networks. Proceedings
of the National Academy of Sciences, 98(2):404–409, January 2001.
[132] M. E. J. Newman. Finding community structure in networks using the eigenvectors
of matrices. Physical Review E, 74(3):036104+, September 2006. Dataset available
at: http://www-personal.umich.edu/~mejn/netdata/.
[133] K. Norlen, G. Lucas, M. Gebbie, and J. Chuang. EVA: Extraction, Visualization
and Analysis of the Telecommunications and Media Ownership Network. Proceedings of International Telecommunications Society 14th Biennial Conference
(ITS2002), Seoul, Korea, August 2002. Dataset available at:
http://vlado.fmf.uni-lj.si/pub/networks/data/econ/Eva/Eva.htm.
[134] Tore Opsahl. Why anchorage is not (that) important: Binary ties and sample
selection. Available at: http://wp.me/poFcY-Vw.
[135] Tore Opsahl. Triadic closure in two-mode networks: Redefining the global and local
clustering coefficients. Social Networks, 35(2):159–167, 2013. Dataset available
at: http://toreopsahl.com/datasets/#online_forum_network.
[136] Tore Opsahl and Pietro Panzarasa. Clustering in weighted networks. Social
Networks, 31(2):155–163, 2009. Dataset available at:
http://toreopsahl.com/datasets/#online_social_network.
[137] Christos H. Papadimitriou and Shmuel Safra. The complexity of low-distortion
embeddings between point sets. In SODA, pages 112–118, 2005.
[138] D. Peleg. Distributed Computing: A Locality-Sensitive Approach. SIAM Monographs on Discrete Math. Appl. SIAM, Philadelphia, 2000.
[139] D. Peleg and D. Tendler. Low stretch spanning trees for planar graphs. Technical
report, Weizmann Science Press of Israel, 2001.
[140] David Peleg. Proximity-preserving labeling schemes and their applications. In
WG, pages 30–41, 1999.
[141] David Peleg. Low stretch spanning trees. In MFCS, pages 68–80, 2002.
[142] David Peleg and Eilon Reshef. Low complexity variants of the arrow distributed
directory. J. Comput. Syst. Sci., 63(3):474–485, 2001.
[143] David Peleg and Alejandro A. Schäffer. Graph spanners. Journal of Graph Theory,
13(1):99–116, 1989.
[144] David Peleg and Jeffrey D. Ullman. An optimal synchronizer for the hypercube.
SIAM J. Comput., 18(4):740–747, 1989.
[145] David Peleg and Eli Upfal. A tradeoff between space and efficiency for routing
tables (extended abstract). In STOC, pages 43–52, 1988.
[146] Erich Prisner. Distance approximating spanning trees. In STACS, pages 499–510,
1997.
[147] Yuri Rabinovich and Ran Raz. Lower bounds on the distortion of embedding finite
metric spaces in graphs. Discrete & Computational Geometry, 19(1):79–94, 1998.
[148] Venugopalan Ramasubramanian, Dahlia Malkhi, Fabian Kuhn, Mahesh Balakrishnan, Archit Gupta, and Aditya Akella. On the treeness of internet latency and
bandwidth. In SIGMETRICS/Performance, pages 61–72, 2009.
[149] Vinay J. Ribeiro, Rudolf H. Riedi, Richard G. Baraniuk, Jiri Navratil, and Les
Cottrell. pathChirp: Efficient available bandwidth estimation for network paths.
In Ronn Ritke, Tony McGregor, and Jörg Micheel, editors, PAM 2003, 4th Passive
and Active Measurement Workshop. NLANR/MNA, UCSD, April 2002.
[150] N. Robertson and P. D. Seymour. Graph minors II: algorithmic aspects of treewidth. Journal of Algorithms, 7:309–322, 1986.
[151] Charles Semple and Mike Steel. Phylogenetics, volume 24 of Oxford Lecture Series
in Mathematics and its Applications. Oxford University Press, 2003.
[152] Yuval Shavitt and Eran Shir. Dimes: Let the internet measure itself. CoRR,
abs/cs/0506099, 2005. Available at: http://www.netdimes.org.
[153] Yuval Shavitt and Tomer Tankel. On the curvature of the internet and its usage
for overlay construction and distance estimation. In INFOCOM, 2004.
[154] Yuval Shavitt and Tomer Tankel. Hyperbolic embedding of internet graph for
distance estimation and overlay construction. IEEE/ACM Trans. Netw., 16(1):25–
36, 2008.
[155] Chris Stark, Bobby-Joe Breitkreutz, Teresa Reguly, Lorrie Boucher, Ashton Breitkreutz, and Mike Tyers. Biogrid: a general repository for interaction datasets.
Nucleic Acids Research, 34(Database-Issue):535–539, 2006. Dataset available at:
http://thebiogrid.org/, release 3.2.99.
[156] Jeremy Stribling. Planetlab all-pairs-pings.
http://pdos.csail.mit.edu/~strib/pl_app.
[157] Mikkel Thorup and Uri Zwick. Compact routing schemes. In SPAA, pages 1–10,
2001.
[158] Mikkel Thorup and Uri Zwick. Approximate distance oracles. J. ACM, 52(1):1–24,
2005.
[159] D. J. Watts and S. H. Strogatz. Collective dynamics of ’small-world’ networks.
Nature, 393(6684):409–10, 1998. Dataset available at:
http://toreopsahl.com/datasets/#celegans.
[160] D.J. Watts and S.H. Strogatz. Collective dynamics of ’small-world’ networks. Nature, (393):440–442, 1998.
[161] Bernard Wong, Aleksandrs Slivkins, and Emin Gün Sirer. Meridian: a lightweight
network location service without virtual coordinates. In SIGCOMM, pages 85–96,
2005. www.cs.cornell.edu/people/egs/meridian.
[162] David P. Woodruff. Additive spanners in nearly quadratic time. In ICALP (1),
pages 463–474, 2010.
[163] Yaokun Wu and Chengpeng Zhang. Hyperbolicity and chordality of a graph.
Electr. J. Comb., 18(1), 2011.
[164] Chenyu Yan, Yang Xiang, and Feodor F. Dragan. Compact and low delay routing
labeling scheme for unit disk graphs. Comput. Geom., 45(7):305–325, 2012.
[165] Jaewon Yang and Jure Leskovec. Defining and evaluating network communities based on ground-truth. In ICDM, pages 745–754, 2012. Available at:
http://snap.stanford.edu/data/com-Amazon.html,
http://snap.stanford.edu/data/com-DBLP.html.