Stochastic Analysis and Improvement of the Reliability of DHT-based Multicast

Guang Tan and Stephen A. Jarvis
Department of Computer Science, University of Warwick
Coventry, CV4 7AL, United Kingdom
Email: {gtan, saj}@dcs.warwick.ac.uk
Abstract— This paper investigates the reliability of application-level multicast based on a distributed hash table (DHT) in a
highly dynamic network. Using a node residual lifetime model, we
derive the stationary end-to-end delivery ratio of data streaming
between a pair of nodes in the worst case, and show through
numerical examples that in a practical DHT network, this
ratio can be very low (e.g., less than 50%). Leveraging the
property of heavy-tailed lifetime distribution, we then consider
three optimizing techniques, namely Senior Member Overlay
(SMO), Longer-Lived Neighbor Selection (LNS), and Reliable
Route Selection (RRS), and present quantitative analysis of data
delivery reliability under these schemes. In particular, we discuss
the tradeoff between delivery ratio and the load imbalance among
nodes. Simulation experiments are also used to evaluate the
multicast performance under practical settings. Our model and
analytic results provide useful tools for reliability analysis for
other overlay-based applications (e.g., those involving persistent
data transfers).
I. INTRODUCTION
Overlay multicast [12] is an effective paradigm to provide
large-scale data dissemination over the Internet. There are two
basic approaches to organizing multicast groups. The first is to
make all multicast members self-organize into a group according to some kind of topology (e.g., tree or mesh); the multicast
members need to locate upstream nodes and take charge of link
maintenance [11] [20]. In the second approach [10] [31] [21],
the multicast protocol is layered based on a distributed hash
table (DHT) protocol that supports application-layer routing
between overlay nodes. The different routes provided by the
DHT from receiving nodes to the source node automatically
form a tree topology. Two main advantages of the DHT-based
approach are (1) multicast applications can easily exploit the
DHT’s routing and failure recovery functions to organize the
multicast group, obviating the need to handle network dynamics and maintain neighbor sets themselves, and (2) the same
DHT-based overlay (e.g., openDHT [23]) can be shared by
many overlay applications and multicast trees simultaneously.
As DHTs have been increasingly deployed and used as a
building block of distributed systems, developing multicast
based on DHTs can considerably simplify software development
and thus becomes an appealing scheme for multicast. A
number of projects have been or are being undertaken using
this technique (e.g., Splitstream [9], RSSDHT [1], MOOD [2],
and QStream [23]).
In this paper we study the reliability of DHT-based
multicast in the context of low-bit-rate streaming applications,
such as text/voice streaming and distributed whiteboards.
Reliability is one of the major concerns for any overlay-based
multicast protocol. In an overlay network, nodes are highly
transient and the data streaming between two end-points can
suffer frequent interruptions, which may last on the order of
tens of seconds [11]. As a result, the nodes may either receive
only a small proportion of the data or have to heavily rely
on some kind of error recovery mechanism. This problem
is particularly critical for a DHT-based multicast protocol, as
the DHT routes often pass through some non-multicast-group
nodes, which leads to longer data delivery paths and hence
poorer transfer reliability.
Our analysis begins with a stochastic model for node
lifetime and data delivery over a set of nodes. Specifically, we
consider the distribution of node residual lifetime, which plays
a fundamental role in the reliability analysis throughout the
paper. With this result, we then obtain the worst-case stationary
data delivery ratio between the source node and a receiving
node. The numerical examples show that this ratio can be very
low (e.g., less than 50%).
Using the fact that a node’s lifetime generally follows a
heavy-tailed distribution [8] [28] [25] [26], which itself implies
that the longer-lived nodes are likely to be more stable than
short-lived ones, we then consider three optimization schemes.
In the first scheme, called Senior Member Overlay (SMO), the
nodes above a certain threshold age are organized into a special
overlay, which takes the responsibility of all the forwarding
tasks in data delivery. The second scheme, called Longer-Lived Neighbor Selection (LNS), leverages the flexibility of
neighbor selection provided by some DHT algorithms to make
every node choose relatively stable nodes as its neighbors,
thus improving the stability of the data delivery paths. In the
third scheme, called Reliable Route Selection (RRS), the DHT
algorithm no longer progresses through the ID space in a
greedy manner; instead it will choose stable next hop nodes
from the available options under the constraint that the path
length remains unchanged. We examine the data delivery ratio
under these schemes and discuss their implications on the load
imbalance among nodes. To obtain insight into DHT-based
multicast under more realistic settings, we also conduct simulations to evaluate their performance using practical metrics.
For illustration purposes, we use Chord [27] and Scribe [10]
as examples of the DHT and multicast group management
protocols, respectively. We will also discuss the applicability
of certain techniques to a number of representative DHT
algorithms, including Pastry [24], CAN [22], Kademlia [16]
and De Bruijn [18]. Due to space limitations, we do not
discuss other multicast group management protocols, such as
Bayeux [31] and the one proposed in [21], as well as their
combinations with the different underlying DHT algorithms,
but the analysis could easily be applied to these variations.
We confine our study to applications for which the bandwidth constraints of nodes are not a major issue. Handling bandwidth constraints [5] poses major challenges to the formal
analysis and complicates the design of optimization schemes.
We leave this as a subject of future research.
The paper proceeds as follows: Section II establishes the
stochastic model; based on the model, Section III analyzes the
data delivery ratio of an overlay path under the plain DHT;
Section IV introduces and analyzes the three optimization
techniques; Section V presents the simulation results; Section VI documents some related work and finally Section VII
concludes the paper.
II. MODEL AND PERFORMANCE METRICS
We assume a simple multicast application that transfers data
in the following manner: in a multicast group, the data is
sent from the source indefinitely at a constant data rate, without
buffering data at any node, and no retransmission or recovery
mechanism is present in the system.
It has been widely observed that application endpoints’
lifetimes (in terms of node uptime, user session time or
file transfer time) often follow a heavy-tailed distribution.
Typical applications exhibiting such characteristics include file
sharing [8] [28], multimedia streaming [6] [25] [26] [30], etc.
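We therefore use a shifted Pareto distribution [14]
$$F(x) = 1 - \left(1 + \frac{x}{\beta}\right)^{-\alpha}, \qquad x \ge 0,\ \beta > 0,$$
to model nodes' lifetimes $L$. The shape parameter $\alpha$ is assumed to be greater than 2 so that the finite mean $\mu$ and variance $\sigma^2$ exist. We also use the density function $f(x) = \frac{\alpha}{\beta}\left(1 + \frac{x}{\beta}\right)^{-(\alpha+1)}$. It is easy to verify that $\mu = \beta/(\alpha - 1)$ and $\sigma^2 = \alpha\beta^2/[(\alpha - 1)^2(\alpha - 2)]$.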
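A minimal Python sketch of the lifetime model, assuming the shifted Pareto form above; the helper names `pareto_moments` and `sample_lifetime` are illustrative and not taken from the paper. Sampling relies on the fact that $1 + L/\beta$ follows a classical Pareto law with minimum 1, which is what Python's `random.paretovariate` generates.

```python
import random
import statistics

def pareto_moments(alpha, beta):
    """Mean and variance of the shifted Pareto F(x) = 1 - (1 + x/beta)^(-alpha), alpha > 2."""
    if alpha <= 2:
        raise ValueError("alpha must exceed 2 for a finite variance")
    mu = beta / (alpha - 1)
    var = alpha * beta ** 2 / ((alpha - 1) ** 2 * (alpha - 2))
    return mu, var

def sample_lifetime(rng, alpha, beta):
    """Draw L: if P ~ Pareto(alpha) on [1, inf), then L = beta * (P - 1) has CDF F above."""
    return beta * (rng.paretovariate(alpha) - 1.0)

rng = random.Random(7)
mu, var = pareto_moments(4.0, 2.0)      # illustrative parameters only
samples = [sample_lifetime(rng, 4.0, 2.0) for _ in range(200_000)]
print(mu, statistics.fmean(samples))    # the two values should be close
```

The values α = 4 and β = 2 are arbitrary; they are chosen so that the lifetime has a finite variance and the sample mean converges quickly to μ = 2/3.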
Other assumptions are as follows. Nodes enter the system in
a Poisson process (PP) [28] at a constant rate $\lambda$, with node $i$'s
arrival time at $t_i$; during the evolution of the network, nodes'
lifetimes follow the same distribution $F(x)$. The overlay
network has a very large number of nodes (e.g., one million)
and has entered a steady state. We further assume a $b$-bit
DHT identifier and that $2^b$ is much larger than the
number of nodes in the system.
Throughout the paper, we use delivery ratio as a primary
metric for the reliability of the multicast. It is defined as the
proportion of data units successfully received by a node from
the source [4]. Under the assumed transfer mode, the delivery
ratio is approximately the fraction of time during which a
receiving node finds that all the forwarding nodes between the
source and itself function normally. We focus on the worst-case stationary delivery ratio between two nodes, which is
achieved when the two nodes have a maximum path length $h$
(e.g., $h = b$ for two nodes $2^b - 1$ distance apart). The quantity $h$ is
determined by the two nodes' IDs and the routing algorithm
of the DHT. Assuming a fixed maximum path length allows
us to obtain an exact lower bound for the stationary delivery
ratio between two general nodes.
A second metric in our model is the node stress, defined as
the number of children supported by a node. In our model, a
key tradeoff is between delivery ratio and the load balancing
of nodes. Since the minimum node stress is zero, the maximum
node stress (MNS) of all nodes, which reflects the variation in
the range of individual node stresses, is used as a metric for
load balancing. When there are $m$ multicast groups,
the MNS of a DHT is the per-group MNS times $m$. Usually
$m$ is independent of the DHT and multicast protocols, so we
often omit this factor (i.e., assume $m = 1$) when comparing
the MNSs of various schemes.
Note that in a practical network, the dynamics of nodes
may introduce a substantial number of undesirable non-DHT
links among the nodes [5], which complicates the analysis of
MNS. We therefore assume that the multicast protocol can
automatically correct these non-DHT links (by, for example,
periodically re-establishing the application-layer connections
according to the DHT routing table), so that a node’s stress
can be approximated by the number of DHT nodes pointing
to it (i.e., its in-degree in the DHT graph).
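To make the in-degree approximation concrete, the following hypothetical sketch builds a Chord-style finger graph for randomly placed node IDs, assuming finger $i$ of node $v$ is successor$(v + 2^{i-1})$ on a $2^b$ ring, and counts distinct incoming finger links per node. Under the assumption just stated, the maximum of these counts is the per-group MNS; the ring size and node count in the usage line are arbitrary.

```python
import bisect
import random

def chord_in_degrees(n, b, seed=0):
    """Return {node_id: in-degree} for a Chord finger graph with n random IDs on a 2^b ring.

    Finger i of node v (1 <= i <= b) is successor(v + 2^(i-1)). Several fingers that
    point at the same node count as one link, and self-links are dropped.
    """
    rng = random.Random(seed)
    space = 2 ** b
    ids = sorted(rng.sample(range(space), n))

    def successor(key):
        j = bisect.bisect_left(ids, key % space)
        return ids[j % n]                      # wrap around past the largest ID

    indeg = dict.fromkeys(ids, 0)
    for v in ids:
        fingers = {successor(v + 2 ** (i - 1)) for i in range(1, b + 1)}
        fingers.discard(v)
        for f in fingers:
            indeg[f] += 1
    return indeg

indeg = chord_in_degrees(n=1000, b=20)
print("max node stress (per group):", max(indeg.values()))
```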
III. DATA DELIVERY RATIO IN A PLAIN DHT
In this section we derive the worst-case delivery ratio
between two nodes in a DHT network. We first examine the
mean residual lifetime of a randomly chosen node, and then
obtain the stationary delivery ratio using renewal theory.
A. Node Residual Lifetime
Given the current time $t$, we let random variables $X$ and $Y$
denote a (randomly chosen) node's age and residual lifetime,
respectively. This is illustrated in Figure 1(a), where the sum
of $X$ and $Y$ is equal to $L$. We first examine the property of
$X$, and then consider the joint distribution of $(X, Y)$, whose
marginal distribution will characterize our variable of interest
$Y$.
As assumed in the previous section, the nodes enter the overlay network as a homogeneous Poisson process (PP) with rate $\lambda$. Let $\{N(z), z \ge 0\}$ be the corresponding counting process [19] formed by the arrival of all the nodes that have ever entered the system, and $\{N'(z), 0 \le z \le t\}$ the counting process formed by the arrival of all nodes that are present in the network at time $t$. In Figure 1(b), the former process is depicted by a sequence of solid points along the top time axis, while the latter process is formed by the sequence of circles on the bottom time axis. Indeed, $\{N'(z), 0 \le z \le t\}$ is a non-homogeneous Poisson process (NPP), as stated by the following lemma. (The proofs of some lemmas and theorems are provided in our technical report [29].)

Fig. 1. The point processes formed by the arrival of nodes. (a) Age $X$, residual lifetime $Y$ and lifetime $L$ of a node observed at time $t$. (b) Arrivals of all nodes, $N(t) = \mathrm{PP}(\lambda)$ (top axis), and of the nodes present at time $t$, $N'(t) = \mathrm{NPP}(\lambda(\cdot))$ (bottom axis).

Lemma 1: The counting process $\{N'(z), 0 \le z \le t\}$ is a non-homogeneous Poisson process (NPP) with rate function
$$\lambda(z) = \lambda\,[1 - F(t - z)], \qquad 0 \le z \le t.$$

Next, we apply the generalized Campbell theorem [19] to $\{N'(z)\}$ to capture the characteristics of the arrival times (thus the ages) of present nodes. This theorem is restated in Lemma 2.

Lemma 2 [19, page 227]: Let $\tau_1, \tau_2, \ldots, \tau_n$ be the event times in an NPP $\{N'(z), 0 \le z \le t\}$ with rate function $\lambda(z)$; let $U_1, U_2, \ldots, U_n$ be i.i.d. random variables with distribution $\Pr\{U_i \le x\} = m(x)/m(t)$, $0 \le x \le t$, where $m(x) = \int_0^x \lambda(z)\,dz$, and let $U_{(1)}, U_{(2)}, \ldots, U_{(n)}$ be the order statistics of $U_1, U_2, \ldots, U_n$. Then, conditioned on $N'(t) = n$, $(\tau_1, \tau_2, \ldots, \tau_n)$ has the same joint distribution as $(U_{(1)}, U_{(2)}, \ldots, U_{(n)})$.

Lemma 2 states that the joint distribution of the arrival times of the present nodes is equivalent in distribution to that of $U_{(1)}, U_{(2)}, \ldots, U_{(n)}$, whose marginal distributions are easy to manipulate. With this lemma, we can proceed to obtain the density function of $X$.

Lemma 3: As $t \to \infty$, the density function of the age of a randomly chosen node is given by
$$f_X(x) = \frac{1 - F(x)}{\mu}. \tag{1}$$

Proof: Assume that at time $t$ there are $N'(t) = n$ active nodes in the system, whose arrival times are $\tau_1, \tau_2, \ldots, \tau_n$, respectively. Define i.i.d. random variables $U_i$, $i = 1, 2, \ldots, n$, and their order statistics as in Lemma 2. Let the indicator $\mathbb{1}\{A\}$ represent one when event $A$ occurs and zero otherwise. In an ordinary DHT, the selection of a node for general purposes such as finding its neighbors is independent of the node's properties such as arrival time and age, so the arrival time of a selected node can be seen as equiprobable for all $\tau_i$, $i = 1, 2, \ldots, n$. Thus,
$$\Pr\{X \le x \mid N'(t) = n\} = E\!\left[\frac{1}{n}\sum_{i=1}^{n} \mathbb{1}\{\tau_i \ge t - x\} \,\middle|\, N'(t) = n\right].$$
Applying Lemma 2 to the NPP $\{N'(z)\}$ and observing that all $U_i$'s are symmetric, we have
$$\Pr\{X \le x \mid N'(t) = n\} = \frac{1}{n}\sum_{i=1}^{n}\Pr\{U_i \ge t - x\} = \frac{\int_{t-x}^{t}\lambda(z)\,dz}{\int_0^t \lambda(z)\,dz}.$$
Since the right-hand side does not depend on $n$, it also equals $\Pr\{X \le x\}$. Hence, by Lemma 1,
$$\Pr\{X \le x\} = \frac{\int_{t-x}^{t}\lambda\,[1 - F(t-z)]\,dz}{\int_0^t \lambda\,[1 - F(t-z)]\,dz} = \frac{\int_0^x [1 - F(z)]\,dz}{\int_0^t [1 - F(z)]\,dz}, \qquad 0 \le x \le t,$$
which gives
$$f_X(x) = \frac{1 - F(x)}{\int_0^t [1 - F(z)]\,dz}.$$
Letting $t \to \infty$ and noting that $\int_0^\infty [1 - F(z)]\,dz = \mu$, we have $f_X(x) = [1 - F(x)]/\mu$, which establishes the lemma.

With the above conclusion, we can solve the distribution of $Y$, the node's residual lifetime.

Lemma 4: As $t \to \infty$, the density function of the residual lifetime of a randomly chosen node is given by
$$f_Y(y) = \frac{1 - F(y)}{\mu}. \tag{2}$$
Moreover,
$$E[Y] = \frac{\mu^2 + \sigma^2}{2\mu}. \tag{3}$$
Notice that $f_Y(\cdot) = f_X(\cdot)$, which means that a node's
residual lifetime is in distribution equivalent to a node's
age. Also from Eq. (3), it can be seen that $E[Y] - E[L] = (\sigma^2 - \mu^2)/(2\mu)$; for the Pareto model with $\alpha > 2$, $\sigma^2/\mu^2 = \alpha/(\alpha - 2) > 1$, so $E[Y] > E[L]$, indicating that, under this network model, the mean residual lifetime of participating
nodes is even greater than the nodes' mean lifetime. This
somewhat counter-intuitive result explains the measurement observation in [28] that in a steady-state network, a majority of
participating nodes in the system are long-lived nodes, while
the remaining short-lived nodes turn over at a high rate. An
important implication of this fact is that, while a reliability-ignorant multicast protocol may have a poor delivery ratio
due to a few highly transient nodes on the path, the existence
of the many long-lived nodes provides the opportunity to
achieve a high delivery ratio without incurring significant load
imbalance, if the underlying DHT were able to provide a
reliability-aware routing service that avoids passing through
those unstable nodes.
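Lemmas 3 and 4, and the observation that $E[Y]$ can exceed $\mu$, can be checked with a small Monte Carlo experiment. The sketch below is an illustration rather than the paper's simulator: it generates Poisson arrivals with shifted-Pareto lifetimes, takes a snapshot at time $t$, and averages the ages and residual lifetimes of the nodes still present. It assumes $\alpha = 4$ and $\beta = 2$ hours (so $\mu = 2/3$ hour and $E[Y] = \beta/(\alpha - 2) = 1$ hour), chosen because the residual lifetime then has a finite variance; a large but finite $t$ stands in for $t \to \infty$.

```python
import random
import statistics

def snapshot(lam, alpha, beta, t, seed=1):
    """Poisson(lam) arrivals on [0, t] with shifted-Pareto lifetimes.

    Returns (ages, residual lifetimes) of the nodes that are still present at time t.
    """
    rng = random.Random(seed)
    ages, residuals = [], []
    arrival = rng.expovariate(lam)
    while arrival < t:
        lifetime = beta * (rng.paretovariate(alpha) - 1.0)
        if arrival + lifetime > t:            # node survives past the snapshot time
            ages.append(t - arrival)
            residuals.append(arrival + lifetime - t)
        arrival += rng.expovariate(lam)
    return ages, residuals

# Assumed parameters: 5000 arrivals per hour, alpha = 4, beta = 2 hours, snapshot at t = 100 hours.
ages, residuals = snapshot(lam=5000.0, alpha=4.0, beta=2.0, t=100.0)
print(len(residuals), statistics.fmean(ages), statistics.fmean(residuals))
```

Both printed means should be near 1 hour, noticeably above the mean lifetime of 2/3 hour, and the number of present nodes should be near $\lambda\mu \approx 3333$.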
The above relationship between node lifetime and residual
lifetime is also highlighted in [7], where it is found
that randomly selecting a replacement from all existing nodes
after some node fails produces surprisingly lower churn than
choosing a replacement from a fixed set of nodes. It should also
be noted that Lemma 4 reproduces a result used
by Leonard et al. [14], although the two results are obtained through
distinct approaches and under different modeling assumptions.
In [14], the node arrival/departure is modeled as a renewal
process and the residual lifetime distribution is immediately
obtained from existing results of renewal theory. Their model
relies on an important assumption: that the probability that a
newly arriving node finds a neighbor at any point within that
neighbor’s lifespan is equally likely. Unfortunately, it is not
clear under what circumstances this assumption holds, or how
it can be interpreted in a more plausible way. Also in their
simulations, they assume that a leaving node is immediately
replaced by a fresh node with the same lifetime distribution,
an arrival pattern yet to be justified. In contrast, we assume
a Poisson arrival pattern which has been verified by previous
empirical studies [28] [3]. More importantly, our modeling
reveals more details of the stochastic properties of a node’s
age and residual lifetime, which facilitate the analysis of data
delivery reliability in more complicated contexts, as will be
shown later in the paper.
B. Delivery Ratio
Consider a data delivery path from the source node $v_0$ to
some receiving node $v_h$ ($h \ge 1$). We assume that the two
nodes have a maximum path length of $h$. Between the two
nodes is a sequence of forwarding nodes $v_1, v_2, \ldots, v_{h-1}$.
When a forwarding node $v_i$ on the path fails (departs), its
child node $v_{i+1}$ will try to find a substitute $v'_i$ for $v_i$ and re-establish a new path $v_0, v'_1, \ldots, v'_i, v_{i+1}, \ldots, v_h$, where, in
most instances, $v'_i$ is a node succeeding $v_i$ on the Chord ring. (It is possible that the length of the
new path becomes less than $h$. We again assume the worst
case where the path length remains unchanged.) We make a
minor modification to the Scribe protocol so that the nodes
$v_1, v_2, \ldots, v_{i-1}$ need not be changed: when $v_i$ fails, its
child $v_{i+1}$ finds a substitute $v'_i$ and requests that $v'_i$ directly
connect to its original grandparent $v_{i-1}$, the original path
from $v_{i-1}$ to $v_0$ being re-used. Now, the path $v_0, v_1, \ldots, v_h$
will have only one forwarding node replaced when $v_i$ fails.
An important consequence of this is that the replacements of
forwarding nodes on the path become independent of one
another, and so the modeling analysis can be greatly simplified.
From the viewpoint of the system, this change obviates the
need to destroy the previously established path and thus
reduces the communication cost. In addition, this modification
is easy to implement – recall that a Chord node already
maintains a successor list for each of its neighbors for the
purpose of fault tolerance [27].
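The modified repair rule can be sketched as a pure function on the path $v_0, \ldots, v_h$. This is a hypothetical illustration: node identifiers are plain integers, and `successor_of` stands in for whatever successor lookup the Chord layer provides. It shows only that a failure of $v_i$ changes exactly one position of the path while the upstream segment is reused.

```python
from typing import Callable, List

def repair_path(path: List[int], failed: int,
                successor_of: Callable[[int], int]) -> List[int]:
    """Modified-Scribe repair sketch; path = [v_0 (source), v_1, ..., v_h (receiver)]."""
    if not 0 < failed < len(path) - 1:
        raise ValueError("only forwarding nodes v_1 .. v_{h-1} are repaired this way")
    substitute = successor_of(path[failed])   # v_i': usually v_i's successor on the ring
    # v_{i+1} asks v_i' to attach directly to its grandparent v_{i-1};
    # the upstream segment v_0 .. v_{i-1} is reused and nothing else changes.
    return path[:failed] + [substitute] + path[failed + 1:]

print(repair_path([0, 40, 70, 90, 100], 2, lambda v: v + 1))
```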
Now, each forwarding node on the path has two states:
normal state and failure state. The normal period is equivalent
to the residual lifetime $Y$ of a randomly chosen node among
the present nodes in the network. The failure period includes
failure detection and finding a substitute node, which we
assume takes a random time $R$, called the fixing time. We can
therefore treat a forwarding node's evolution as an alternating
renewal process [19], and using Smith's theorem, we obtain
the stationary probability of a forwarding node being found in
a normal state, $p = E[Y]/(E[Y] + E[R])$. Since all forwarding
nodes on the path are independent of each other in their own
renewal processes, the joint probability of their status being
normal simultaneously is simply $p^{h-1}$, which corresponds to
the probability of the receiving node not being in starvation.
Thus the following result concerning the delivery ratio seen
by the receiving node holds.
Theorem 1: In a DHT network, the worst-case stationary
delivery ratio between two nodes that are at most $h$ hops apart
is given by
$$\mathrm{DR} = \left(\frac{E[Y]}{E[Y] + E[R]}\right)^{h-1} = \left(\frac{\mu^2 + \sigma^2}{\mu^2 + \sigma^2 + 2\mu E[R]}\right)^{h-1}. \tag{4}$$
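Smith's theorem gives the long-run fraction of time a forwarder slot is in the normal state as $E[Y]/(E[Y] + E[R])$, whatever the distribution of $R$. The following simulation sketch alternates up periods drawn from the residual-lifetime distribution of Lemma 4 (for the shifted Pareto, $1 - F_Y(y) = (1 + y/\beta)^{-(\alpha-1)}$) with fixing times $R$. The exponential fixing time with $E[R] = 0.1$ hour and the values $\alpha = 4$, $\beta = 2$ hours are assumptions made only for this illustration.

```python
import random

def forwarder_availability(mean_fix, alpha, beta, cycles=200_000, seed=11):
    """Long-run fraction of time in the normal state for an alternating renewal process.

    Up periods: residual lifetime Y, i.e. beta * (Pareto(alpha - 1) - 1).
    Down periods: fixing time R, assumed exponential with mean mean_fix.
    """
    rng = random.Random(seed)
    up = down = 0.0
    for _ in range(cycles):
        up += beta * (rng.paretovariate(alpha - 1.0) - 1.0)
        down += rng.expovariate(1.0 / mean_fix)
    return up / (up + down)

p_sim = forwarder_availability(mean_fix=0.1, alpha=4.0, beta=2.0)
p_smith = 1.0 / (1.0 + 0.1)       # E[Y] / (E[Y] + E[R]) with E[Y] = beta/(alpha-2) = 1
print(p_sim, p_smith, p_sim ** 9)  # the last value is p^(h-1) for h = 10
```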
Theorem 1 provides an estimate for the worst-case delivery
ratio regardless of the actual node lifetime distribution (such
as Pareto, lognormal, Weibull, etc.). As an example, Table I
shows the worst-case delivery ratios between two nodes with
Pareto node lifetime distribution for varying maximum path
lengths $h$ and mean fixing times $E[R]$. As we often do throughout the paper, the parameters $\alpha$ and $\beta$ are set such that
the mean residual lifetime is $E[Y] = 1$ hour (for the shifted Pareto distribution, Eq. (3) simplifies to $E[Y] = \beta/(\alpha - 2)$). In the table, the
worst-case delivery ratio drops noticeably as the maximum
path length or the mean fixing time increases. For example, for
$h = 30$ and $E[R] = 1$ minute, the worst-case delivery ratio is
only slightly higher than 60%, a level far from satisfactory for
many applications.
TABLE I
WORST-CASE DELIVERY RATIO FOR VARYING h AND E[R]
Mean fixing time E[R]   h = 10   h = 20   h = 30
30 seconds              92.8%    85.4%    78.6%
1 minute                86.2%    73.0%    61.9%
2 minutes               74.4%    53.6%    38.6%
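Eq. (4) is straightforward to evaluate. The sketch below (function names are illustrative) computes the worst-case ratio from $E[Y]$ and $E[R]$ and, with $E[Y] = 1$ hour, regenerates the entries of Table I; the second function is the equivalent moment form of Eq. (4).

```python
def worst_case_delivery_ratio(mean_residual, mean_fix, h):
    """Eq. (4), first form: (E[Y] / (E[Y] + E[R]))^(h - 1)."""
    p = mean_residual / (mean_residual + mean_fix)   # availability of one forwarder
    return p ** (h - 1)                              # h - 1 independent forwarders

def worst_case_delivery_ratio_moments(mu, var, mean_fix, h):
    """Eq. (4), second form, using E[Y] = (mu^2 + sigma^2) / (2 mu)."""
    s = mu ** 2 + var
    return (s / (s + 2.0 * mu * mean_fix)) ** (h - 1)

MEAN_RESIDUAL = 3600.0   # E[Y] = 1 hour, in seconds
for label, fix in (("30 seconds", 30.0), ("1 minute", 60.0), ("2 minutes", 120.0)):
    row = [worst_case_delivery_ratio(MEAN_RESIDUAL, fix, h) for h in (10, 20, 30)]
    print(f"{label:<12}" + "".join(f"{100 * r:8.1f}%" for r in row))
```

The printed percentages agree with Table I. The two forms coincide for any Pareto parameters with the same $E[Y]$; for example, $\alpha = 3$ and $\beta = 3600$ s (an illustrative choice giving $\mu = 1800$ s) yields $E[Y] = 3600$ s.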
C. Maximum Node Stress
In a DHT network, a node can be responsible for at most
$O\!\left(\frac{2^b}{n}\cdot\frac{\log n}{\log\log n}\right)$ IDs with high probability (whp)
(balls-in-bins model [17]); each of these IDs corresponds to
a virtual node which has an in-degree of at most $b$, so the
following theorem holds.
Theorem 2: With high probability, the maximum node
stress of an $n$-node DHT network is $O\!\left(\frac{b\,2^b}{n}\cdot\frac{\log n}{\log\log n}\right)$.
IV. OPTIMIZING SCHEMES AND ANALYSIS
Given a fixed path length, Theorem 1 suggests two ways to
improve the delivery ratio: increase $E[Y]$ and decrease $E[R]$.
In this work, we focus on the first approach. The general idea
of increasing $E[Y]$ is to give preference to nodes that have
stayed alive for a relatively long period of time when selecting
forwarders in the delivery path. In this section, we introduce
three schemes that can assist with this. Note that we do not
elaborate on low-level protocol details here; rather, we focus
on the main ideas and modifications on the original multicast
and DHT protocols.
A. The Senior Member Overlay (SMO) Scheme
The SMO scheme organizes nodes above a certain threshold
age into a special overlay, called a senior member overlay
(SMO), which will take the responsibility of all the forwarding
tasks in data delivery. Now that most young (and thus unstable)
nodes are pushed to the leaf level of the tree, they will not
affect other nodes and thus the data delivery ratio can be
improved. The idea of biased task allocation among heterogeneous nodes (in terms of processing capacity, bandwidth, uptime, etc.) is not new. Our contribution here is the application
of this idea to the new context of DHT-based multicast and a
formalized analysis with respect to data delivery reliability.
The formation of the SMO is simple. The only parameter
involved is the threshold age $x_0$; when a node has stayed in
the base overlay for $x_0$, it joins the SMO with the help of
some bootstrap node, which can be easily obtained through
the propagation and exchange of node information in the base
overlay.
In the SMO scheme, every publisher is required to join the
SMO. For a non-publisher node, if it is not in the SMO, it
needs to identify on the Chord ring the nearest successor node
that belongs to the SMO, called the SMO successor. When a
node joins a multicast group, if it is already a member of
the SMO, it simply performs the joining routine of the multicast
protocol within the SMO; otherwise it asks its SMO successor
to perform the joining routine within the SMO, and then asks
the parent of the SMO successor to add itself as a child. After
this, the SMO successor is dropped from the parent’s child
list.
To help understand the tradeoff between delivery ratio and
load balancing, we define another metric that is related to
$x_0$. The SMO fraction, denoted $\delta$, is the proportion of all
nodes that are selected to join the SMO. Since a node belongs to the SMO exactly when its age is at least $x_0$, we have $\delta = \Pr\{X \ge x_0\}$; hence $x_0$ is
equal to the $(1 - \delta)$-quantile, denoted $x_{1-\delta}$, of a node's age distribution $F_X$. To calculate this quantile, a node first
estimates the nodes' lifetime distribution $F$ by monitoring
the arrival and departure times of its neighbors and exchanging
this information with others; it then obtains $F_X$ according to
Lemma 3 and finally calculates $x_{1-\delta}$.
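One way a node might compute the threshold $x_0 = x_{1-\delta}$ from observed lifetimes is sketched below. It assumes the observed lifetimes are i.i.d. samples of $F$ (ignoring any sampling bias from monitoring only neighbors) and uses $\int_0^x [1 - \hat F(z)]\,dz = \frac{1}{n}\sum_j \min(L_j, x)$, so that the empirical version of Lemma 3 is $\hat F_X(x) = \sum_j \min(L_j, x) / \sum_j L_j$; the quantile is then found by bisection. For shifted-Pareto lifetimes the result can be compared with the closed form $x_0 = \beta(\delta^{-1/(\alpha-1)} - 1)$, which follows from $1 - F_X(x) = (1 + x/\beta)^{-(\alpha-1)}$. All function names are hypothetical.

```python
import random

def empirical_age_cdf(x, lifetimes):
    """Estimate F_X(x) = (1/mu) * int_0^x [1 - F(z)] dz (Lemma 3) from lifetime samples."""
    return sum(min(l, x) for l in lifetimes) / sum(lifetimes)

def smo_threshold_age(delta, lifetimes, iters=50):
    """Bisection for x0 with F_X(x0) = 1 - delta, i.e. the (1 - delta)-quantile of age."""
    lo, hi = 0.0, max(lifetimes)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if empirical_age_cdf(mid, lifetimes) < 1.0 - delta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def pareto_threshold_age(delta, alpha, beta):
    """Closed form for shifted-Pareto lifetimes."""
    return beta * (delta ** (-1.0 / (alpha - 1.0)) - 1.0)

rng = random.Random(5)
observed = [2.0 * (rng.paretovariate(4.0) - 1.0) for _ in range(20_000)]  # alpha=4, beta=2 (assumed)
x0_est = smo_threshold_age(0.3, observed)
x0_exact = pareto_threshold_age(0.3, 4.0, 2.0)
print(x0_est, x0_exact)
```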
Let random variable $Y'$ denote the residual lifetime of a
randomly chosen SMO node. (We are interested only in SMO
nodes because they are the forwarding nodes for data delivery.)
The following lemma concerning the density function of $Y'$
holds.
Lemma 5: In a DHT network with the SMO scheme, as
$t \to \infty$,
$$f_{Y'}(y) = \frac{1 - F(x_0 + y)}{\mu'}, \tag{5}$$
where $\mu' = \int_{x_0}^{\infty} [1 - F(z)]\,dz = \delta\mu$; moreover,
$$E[Y'] = \frac{1}{\mu'}\int_0^\infty y\,[1 - F(x_0 + y)]\,dy. \tag{6}$$
Now, the worst-case data delivery ratio can be obtained
using the asymptotic results of renewal theory, as done in
Theorem 1.
Theorem 3: In a DHT network with the SMO scheme, the
worst-case stationary delivery ratio of two nodes that are at
most $h$ hops apart is given by
$$\mathrm{DR}_{\mathrm{SMO}} = \left(\frac{E[Y']}{E[Y'] + E[R]}\right)^{h-1}, \tag{7}$$
where $E[Y']$ is given by Eq. (6).
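When node lifetime follows a Pareto distribution, the worst-case delivery ratio has an elegant expression. The following
corollary can be obtained after simple integration: for Pareto lifetimes, $x_0 = \beta(\delta^{-1/(\alpha-1)} - 1)$ and Eq. (6) gives $E[Y'] = (\beta + x_0)/(\alpha - 2) = \delta^{-1/(\alpha-1)}E[Y]$.
Corollary 1: For Pareto node lifetime distribution, the
worst-case stationary delivery ratio of two nodes that are at
most $h$ hops apart in a DHT network with the SMO scheme is
$$\mathrm{DR}_{\mathrm{SMO}} = \left(\frac{\mu^2 + \sigma^2}{\mu^2 + \sigma^2 + 2\mu\,\delta^{1/(\alpha-1)}E[R]}\right)^{h-1}. \tag{8}$$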
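The sketch below evaluates Theorem 3 through the Pareto closed form of $E[Y']$ and, separately, Eq. (8), so the two expressions can be checked against each other. The parameters $\alpha = 3$, $\beta = 3600$ s (giving $E[Y] = 1$ hour), $E[R] = 60$ s and $h = 30$ in the usage lines are illustrative choices, not values from the paper.

```python
def smo_mean_residual(delta, alpha, beta):
    """E[Y'] for shifted-Pareto lifetimes: (beta + x0) / (alpha - 2)."""
    x0 = beta * (delta ** (-1.0 / (alpha - 1.0)) - 1.0)   # (1 - delta)-quantile of age
    return (beta + x0) / (alpha - 2.0)

def smo_delivery_ratio(delta, alpha, beta, mean_fix, h):
    """Theorem 3: (E[Y'] / (E[Y'] + E[R]))^(h - 1)."""
    y = smo_mean_residual(delta, alpha, beta)
    return (y / (y + mean_fix)) ** (h - 1)

def corollary1_ratio(delta, alpha, beta, mean_fix, h):
    """Eq. (8): Eq. (4) with E[R] scaled by delta^(1/(alpha - 1))."""
    mu = beta / (alpha - 1.0)
    var = alpha * beta ** 2 / ((alpha - 1.0) ** 2 * (alpha - 2.0))
    s = mu ** 2 + var
    return (s / (s + 2.0 * mu * delta ** (1.0 / (alpha - 1.0)) * mean_fix)) ** (h - 1)

for delta in (0.1, 0.3, 0.5, 1.0):
    print(delta, round(smo_delivery_ratio(delta, 3.0, 3600.0, 60.0, 30), 3))
```

With $\delta = 1$ every node is in the SMO and the result falls back to the plain-DHT value of Table I.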
Eq. (8) differs from Eq. (4) only in the factor $\delta^{1/(\alpha-1)}$ in the
denominator, which clearly shows the impact of $\delta$ on delivery
ratio. This is further demonstrated in Figure 2, which shows
the worst-case delivery ratios under the SMO scheme for
varying values of $\delta$ and $E[R]$. The node lifetime model is
the same as that used in Table I. It can be seen that the use
of SMO can effectively improve the worst-case delivery ratio;
moreover, the smaller the SMO, the higher this ratio. However,
a small SMO may result in more load imbalance between the
SMO nodes and non-SMO nodes. This tradeoff is quantified
by the following theorem.
Theorem 4: With high probability, the maximum node
stress of an $n$-node network with the SMO scheme is
$O\!\left(\frac{b\,2^b}{\delta n}\cdot\frac{\log n}{\log\log n}\right)$.
B. The Longer-Lived Neighbor Selection (LNS) Scheme
LNS makes use of the flexibility of neighbor selection
provided by some DHT algorithms such as Chord, Pastry, and
Kademlia. In Chord, for example, it is possible for a node $u$
to choose its $i$-th neighbor from a subset of nodes, called a
candidate subset, in the range $[u + 2^{i-1}, u + 2^i)$ [13]. This
enables a node to choose reliable neighbors by selecting the
oldest node from each candidate set, so that the data delivery
path can be more reliable. Considering that the candidate set
may grow too large as $i$ increases, we let LNS sample at most
$K$ consecutive nodes starting from $u + 2^{i-1}$ for the $i$-th neighbor
of node $u$.
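A possible realization of the LNS neighbor choice on a Chord ring is sketched below. The ring representation (a sorted list of IDs), the `age` dictionary, and the fallback to the plain Chord finger when the candidate range holds no node are assumptions made for this illustration, not details given in the paper.

```python
import bisect
import random

def lns_finger(u, i, ids, age, K, b):
    """Hypothetical LNS rule for the i-th neighbor of node u on a 2^b ring.

    Examines at most K consecutive nodes starting at successor(u + 2^(i-1)) that lie in
    [u + 2^(i-1), u + 2^i) and returns the oldest; if that range holds no node, returns
    the plain Chord finger successor(u + 2^(i-1)).
    """
    space = 2 ** b
    start = (u + 2 ** (i - 1)) % space
    width = 2 ** (i - 1)                          # number of IDs in the candidate range
    j = bisect.bisect_left(ids, start) % len(ids)
    plain_finger = ids[j]                         # successor(u + 2^(i-1))
    candidates = []
    while len(candidates) < min(K, len(ids)):
        v = ids[j]
        if (v - start) % space >= width:          # walked past the candidate range
            break
        candidates.append(v)
        j = (j + 1) % len(ids)
    return max(candidates, key=age.__getitem__) if candidates else plain_finger

rng = random.Random(2)
b = 16
ids = sorted(rng.sample(range(2 ** b), 500))
age = {v: rng.expovariate(1.0) for v in ids}      # arbitrary ages for the illustration
print([lns_finger(ids[0], i, ids, age, 8, b) for i in range(1, b + 1)])
```

With $K = 1$ the rule reduces to the ordinary Chord finger table.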
We now consider the residual lifetime of a forwarding node
on a delivery path. The following lemma characterizes such a
random variable and further provides its mean for the special
case of Pareto lifetime distribution.
Lemma 6: Let $V_i$, $1 \le i \le h$, be the residual lifetime of
a node's $i$-th neighbor in a DHT network with the LNS
scheme; then the density of $V_i$ is given by
$$f_{V_i}(y) = \frac{K_i}{\mu}\int_0^\infty \left[\frac{1}{\mu}\int_0^z [1 - F(s)]\,ds\right]^{K_i - 1} f(z + y)\,dz, \tag{9}$$
where $K_i = \min\{K, 2^{i-1}\}$. Specially, if node lifetime follows
a Pareto distribution,
$$E[V_i] = \frac{\beta}{\alpha - 1}\cdot\frac{\Gamma(K_i + 1)\,\Gamma\!\left(\frac{\alpha-2}{\alpha-1}\right)}{\Gamma\!\left(K_i + \frac{\alpha-2}{\alpha-1}\right)}. \tag{10}$$
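Eq. (10) can be evaluated with log-gamma functions, and the "oldest of $K_i$" construction behind Lemma 6 can be checked by Monte Carlo: draw $K_i$ ages from Lemma 3's distribution, keep the largest age $a$, then draw a residual lifetime from the lifetime distribution conditioned on exceeding $a$, which for the shifted Pareto is again shifted Pareto with scale $\beta + a$. The product in `lns_delivery_ratio` combines per-forwarder availabilities as in the alternating-renewal argument of Theorem 1, assuming the forwarders on the worst-case path are reached through neighbor indices $2, \ldots, h$ with $K_i = \min\{K, 2^{i-1}\}$. The parameter values in the usage lines are illustrative ($\alpha = 3$, $\beta = 1$ hour gives $E[Y] = 1$ hour).

```python
import math
import random

def oldest_of_k_mean_residual(k, alpha, beta):
    """Eq. (10): mean residual lifetime of the oldest of k nodes with i.i.d. ages."""
    c = (alpha - 2.0) / (alpha - 1.0)
    log_ratio = math.lgamma(k + 1) + math.lgamma(c) - math.lgamma(k + c)
    return beta / (alpha - 1.0) * math.exp(log_ratio)

def oldest_of_k_monte_carlo(k, alpha, beta, trials=20_000, seed=3):
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Ages from Lemma 3: 1 - F_X(x) = (1 + x/beta)^(-(alpha - 1)).
        oldest = max(beta * (rng.paretovariate(alpha - 1.0) - 1.0) for _ in range(k))
        # Given age a, the remaining lifetime has survival (1 + y/(beta + a))^(-alpha).
        total += (beta + oldest) * (rng.paretovariate(alpha) - 1.0)
    return total / trials

def lns_delivery_ratio(K, alpha, beta, mean_fix, h):
    """Product of per-forwarder availabilities over neighbor indices i = 2..h,
    with K_i = min(K, 2^(i-1)) candidates each (assumed worst-case path)."""
    ratio = 1.0
    for i in range(2, h + 1):
        y = oldest_of_k_mean_residual(min(K, 2 ** (i - 1)), alpha, beta)
        ratio *= y / (y + mean_fix)
    return ratio

print(oldest_of_k_mean_residual(5, 4.0, 2.0), oldest_of_k_monte_carlo(5, 4.0, 2.0))
print(lns_delivery_ratio(1, 3.0, 1.0, 60 / 3600, 20), lns_delivery_ratio(10, 3.0, 1.0, 60 / 3600, 20))
```

For $K = 1$ the LNS ratio coincides with the plain-DHT ratio of Eq. (4), matching the special case noted in the text.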
To see the trend of $E[V_i]$ as $K_i$ increases, note that $\Gamma(K_i + 1)/\Gamma\!\left(K_i + \frac{\alpha-2}{\alpha-1}\right) \sim K_i^{1/(\alpha-1)}$ for large $K_i$, so Eq. (10) can be expanded as
$$E[V_i] \approx \frac{\beta}{\alpha - 1}\,\Gamma\!\left(\frac{\alpha-2}{\alpha-1}\right) K_i^{1/(\alpha-1)}, \qquad K_i \to \infty, \tag{11}$$
which indicates that $E[V_i]$ grows with $K_i$ (thus $K$) and tends to infinity. Also notice that in the special case of $K_i = 1$, Eq. (10) reduces to $E[V_i] = \beta/(\alpha - 2) = E[Y]$, which corresponds to the node residual lifetime in a plain DHT.

The following theorem gives the delivery ratio result for a DHT network under the LNS scheme.

Theorem 5: In a DHT network where nodes' lifetimes follow a Pareto distribution and the LNS scheme is applied, the worst-case stationary delivery ratio of two nodes that are at most $h$ hops apart is given by
$$\mathrm{DR}_{\mathrm{LNS}} = \prod_{i=2}^{h} \frac{E[V_i]}{E[V_i] + E[R]}, \tag{12}$$
where $E[V_i]$ is given by Eq. (10).

Fig. 2. Impact of SMO fraction $\delta$ on worst-case delivery ratio. (Worst-case delivery ratio versus SMO fraction $\delta$, for $E[R]$ = 30, 60, 90 and 120 s.)

Fig. 3. Impact of maximum candidate set size $K$ on worst-case delivery ratio. (Curves for $E[R]$ = 30, 60, 90 and 120 s.)

Fig. 4. Worst-case delivery ratio as a function of maximum path length $h$. (Plain DHT and RRS DHT, each for $E[R]$ = 60 and 90 s.)

Figure 3 shows the worst-case delivery ratios under the LNS scheme for varying candidate set sizes and mean fixing times. The node lifetime model is the same as that used in Table I. As expected, the worst-case delivery ratio improves substantially as $K$ varies from 1 to 100. A large $K$, however, implies more significant load imbalance. The following theorem shows a linear relationship between $K$ and the maximum node stress.

Theorem 6: With high probability, the maximum node stress for an $n$-node DHT network with the LNS scheme is $O\!\left(\frac{K\,b\,2^b}{n}\cdot\frac{\log n}{\log\log n}\right)$.

C. The Reliable Route Selection (RRS) Scheme
Although in its original proposal Chord greedily routes a message towards a destination in decreasing ID distances, the order of the distances is in no way essential to the correctness and efficiency of routing. In other words, in terms of total hop count, a route that spans a sequence of distances is equivalent to any route that spans a permutation of that sequence, if that permuted sequence can be realized. For example, a route $u, u_1, u_2, w$ with distance sequence $(d_1, d_2, d_3)$ is equivalent to a route $u, u'_1, u'_2, w$ with distance sequence $(d_2, d_3, d_1)$. This flexibility provides some room for choosing stable nodes for a data delivery path.
Specifically, at each node the RRS scheme chooses the oldest node from a set of neighbors, called the candidate set, as the next hop. The candidate set is selected in such a way that the total hop count will not be increased. Here we analyze one approach to achieving this: if the distance of a node to some destination is expressed as a binary number, then neighbor $i$ is included in the set if that binary number has a 1 in the $i$-th position. This can be thought of as a binary string having its 1 bits cleared one at a time as a message routes from the source to the destination, and hence we call it bit-clearing. This heuristic was originally proposed in [13] for achieving network proximity; however, it can guarantee a fixed hop count only in a fully populated DHT. Some other heuristics are possible and we will discuss these in Section V.
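The bit-clearing rule can be illustrated in a fully populated ring, the only setting in which it guarantees a fixed hop count. In this hypothetical sketch every ID $0, \ldots, 2^b - 1$ is a node, neighbor $i$ of node $v$ is $v + 2^{i-1}$, and `age` is any function returning a node's age. At each hop the candidates are the neighbors whose jump corresponds to a 1 bit of the remaining distance, and RRS takes the oldest of them.

```python
import random

def bit_clearing_candidates(distance):
    """Neighbor indices i (1-based) whose jump 2^(i-1) is a 1 bit of the distance."""
    return [i + 1 for i in range(distance.bit_length()) if (distance >> i) & 1]

def rrs_route(start, dest, b, age):
    """Route in a fully populated 2^b ring. Each hop clears one bit of the remaining
    distance, choosing the oldest candidate (RRS) instead of the highest bit (greedy)."""
    space = 2 ** b
    path, node = [start], start
    while node != dest:
        distance = (dest - node) % space
        candidates = [(node + 2 ** (i - 1)) % space
                      for i in bit_clearing_candidates(distance)]
        node = max(candidates, key=age)
        path.append(node)
    return path

rng = random.Random(9)
b = 10
ages = [rng.random() for _ in range(2 ** b)]     # arbitrary ages for the illustration
path = rrs_route(5, 4, b, ages.__getitem__)      # distance 2^10 - 1: every bit set
print(len(path) - 1, path)
```

Whatever the ages, the route from node 5 to node 4 takes exactly 10 hops, one per 1 bit of the distance $2^{10} - 1$.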
In the following, we assume two nodes that are initially $2^h - 1$
distance apart on the ring; then, starting from the receiving
node, the first node can choose any of its $h$ neighbors as its
next hop, the next node has $h - 1$ possible next hops, and so
on to generate a route. Following the same line of reasoning as Lemma 6
and Theorem 5, we can obtain the following results.
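Lemma 7: Let $V_i$, $2 \le i \le h$, be the residual lifetime of the forwarding node $v_{i-1}$ on the path $v_0, v_1, \ldots, v_h$, where $v_0$ is the source node and $v_h$ the receiving node, in a DHT network with the RRS scheme; $v_{i-1}$ is selected as the oldest of a candidate set of $i$ neighbors. Then the density of $V_i$ is given by
$$f_{V_i}(y) = \frac{i}{\mu}\int_0^\infty \left[\frac{1}{\mu}\int_0^z [1 - F(s)]\,ds\right]^{i-1} f(z + y)\,dz. \tag{13}$$
Specially, if node lifetime follows a Pareto distribution,
$$E[V_i] = \frac{\beta}{\alpha - 1}\cdot\frac{\Gamma(i + 1)\,\Gamma\!\left(\frac{\alpha-2}{\alpha-1}\right)}{\Gamma\!\left(i + \frac{\alpha-2}{\alpha-1}\right)}. \tag{14}$$
Theorem 7: In a DHT network where nodes' lifetimes follow a Pareto distribution and the RRS scheme is applied, the worst-case stationary delivery ratio of two nodes that are at most $h$ hops apart is given by
$$\mathrm{DR}_{\mathrm{RRS}} = \prod_{i=2}^{h} \frac{E[V_i]}{E[V_i] + E[R]}, \tag{15}$$
where $E[V_i]$ is given by Eq. (14).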
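Eqs. (14) and (15) can be compared numerically with the plain-DHT ratio of Eq. (4). The sketch repeats the log-gamma form of the oldest-of-$k$ mean so that it is self-contained. It assumes, as in Lemma 7, that the forwarder $v_{i-1}$ is the oldest of $i$ candidates, and uses $\alpha = 3$, $\beta = 1$ hour and $E[R] = 60$ s only as example values with $E[Y] = 1$ hour.

```python
import math

def oldest_of_k_mean_residual(k, alpha, beta):
    """Mean residual lifetime of the oldest of k nodes (shifted-Pareto lifetimes)."""
    c = (alpha - 2.0) / (alpha - 1.0)
    return beta / (alpha - 1.0) * math.exp(
        math.lgamma(k + 1) + math.lgamma(c) - math.lgamma(k + c))

def rrs_delivery_ratio(alpha, beta, mean_fix, h):
    """Eq. (15): forwarder v_{i-1} is the oldest of i candidates, i = 2..h."""
    ratio = 1.0
    for i in range(2, h + 1):
        y = oldest_of_k_mean_residual(i, alpha, beta)
        ratio *= y / (y + mean_fix)
    return ratio

def plain_delivery_ratio(alpha, beta, mean_fix, h):
    """Eq. (4) with E[Y] = beta / (alpha - 2)."""
    y = beta / (alpha - 2.0)
    return (y / (y + mean_fix)) ** (h - 1)

for h in (10, 20, 30):
    print(h, round(plain_delivery_ratio(3.0, 1.0, 60 / 3600, h), 3),
          round(rrs_delivery_ratio(3.0, 1.0, 60 / 3600, h), 3))
```

Because each extra hop adds a forwarder chosen from a larger candidate set, the RRS ratio falls more slowly with $h$ than the plain-DHT ratio.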
Theorem 8: With high probability, the maximum node
stress of an $n$-node DHT network with the RRS scheme is
$O\!\left(\frac{b\,2^b}{n}\cdot\frac{\log n}{\log\log n}\right)$.
Figure 4 shows the delivery ratios under the RRS scheme for
varying path lengths and mean fixing times. The node lifetime
model is the same as that used in Table I. As can be seen, the
RRS scheme leads to a considerable improvement in the worst-case delivery ratio. Moreover, the curves for the RRS scheme
have smaller slopes than those for a plain DHT, indicating
that as the path length increases, the worst-case delivery ratio
under the RRS scheme drops more slowly than in a plain
DHT. This is because a longer path provides larger candidate
sets, which in turn increase $E[V_i]$. This partly compensates
for the loss of delivery ratio due to the increase in path length.
D. Comparison of the Schemes
Besides the differences in delivery ratio and maximum node
stress, the three schemes differ in a number of other respects,
including applicability to different DHT algorithms, control
flexibility, implementation cost, and communication overhead.
The SMO scheme does not rely on any particular underlying DHT geometry, so it can be applied to all types of DHT networks. It provides a parameter, the SMO fraction, to trade off data delivery ratio against load balance, and therefore offers good control flexibility. To implement this scheme, the original multicast protocol must be extended to maintain the age information of nodes and to be aware of the two overlays, whereas the underlying DHT algorithm need not be changed. The major drawback of this scheme is that it requires a fraction of nodes to participate in two DHT overlays simultaneously, which imposes higher processing overhead and message traffic on those nodes.
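As a rough quantitative picture of what the SMO fraction buys, the sketch below makes two hypothetical assumptions: the SMO admits exactly the live nodes whose age exceeds a threshold $t$, chosen so that a fraction $\rho$ of live nodes qualify, and lifetimes are shifted Pareto (Lomax) with $F(x) = 1 - (1 + x/\beta)^{-\alpha}$. Stationary ages then have survival function $(1 + x/\beta)^{-(\alpha-1)}$, which gives $t = \beta(\rho^{-1/(\alpha-1)} - 1)$ and a mean residual lifetime of $(\beta + t)/(\alpha - 2)$ for SMO members. The values $\alpha = 3$ and $\beta = 60$ minutes are an assumption chosen to match a 30-minute mean lifetime and a 1-hour mean residual lifetime. The paper's own SMO analysis (Lemma 5, Theorem 3) may use a different admission rule.

```python
def smo_age_threshold(rho: float, alpha: float, beta: float) -> float:
    """Age t such that a fraction rho of live nodes are older than t (Lomax lifetimes)."""
    if not 0 < rho <= 1:
        raise ValueError("SMO fraction must be in (0, 1]")
    return beta * (rho ** (-1.0 / (alpha - 1.0)) - 1.0)


def smo_mean_residual(rho: float, alpha: float, beta: float) -> float:
    """Mean residual lifetime of a live node conditioned on age > t (requires alpha > 2).

    Given age a, the residual lifetime is Lomax with shape alpha and scale beta + a,
    i.e. mean (beta + a) / (alpha - 1); averaging over ages a > t gives
    (beta + t) / (alpha - 2).
    """
    if alpha <= 2:
        raise ValueError("alpha must exceed 2 for a finite mean residual lifetime")
    t = smo_age_threshold(rho, alpha, beta)
    return (beta + t) / (alpha - 2.0)


if __name__ == "__main__":
    for rho in (1.0, 0.5, 0.1):   # SMO fractions used in the simulations
        t = smo_age_threshold(rho, alpha=3.0, beta=60.0)
        print(rho, round(t, 1), round(smo_mean_residual(rho, 3.0, 60.0), 1))
```

Under these assumptions, an SMO fraction of 1 recovers the plain-DHT value of 60 minutes, and shrinking the fraction raises the members' mean residual lifetime as $\rho^{-1/(\alpha-1)}$.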
The LNS scheme exploits the flexibility of neighbor selection, which is unavailable in some DHT algorithms such as CAN and de Bruijn-based DHTs. Like SMO, the scheme provides a parameter, the candidate set size, to trade off reliability against load balance. It requires only minor modifications to existing DHT algorithms, and these modifications are transparent to upper-layer multicast protocols. The extra overhead imposed on nodes is small, because each node only needs to sample a limited number of nodes and choose the oldest ones as its neighbors.
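The benefit of choosing the oldest of $K$ sampled nodes can be quantified under hypothetical shifted-Pareto (Lomax) lifetimes with shape $\alpha$ and scale $\beta$, taking $\alpha = 3$ and $\beta = 60$ minutes as assumed values. If the $K$ candidates are independent draws from the live population, their ages are i.i.d. with survival function $(1 + x/\beta)^{-(\alpha-1)}$. Expressing the oldest age through the maximum of $K$ uniform variables yields $E[R] = \beta K\,B(K, 1 - 1/(\alpha-1))/(\alpha-1)$, where $B$ is the Beta function. This is a derivation sketch under those assumptions, not the paper's LNS analysis, and `mean_residual_oldest_of_k` is a hypothetical name.

```python
import math


def beta_function(x: float, y: float) -> float:
    """Euler Beta function B(x, y), computed via log-gamma for numerical stability."""
    return math.exp(math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y))


def mean_residual_oldest_of_k(k: int, alpha: float, beta: float) -> float:
    """E[residual lifetime] of the oldest of k live candidates, Lomax(alpha, beta) lifetimes.

    Assumes candidate ages are i.i.d. draws from the stationary age distribution
    and that alpha > 2, so the mean residual lifetime is finite.
    """
    if k < 1:
        raise ValueError("candidate set size must be at least 1")
    if alpha <= 2:
        raise ValueError("alpha must exceed 2")
    c = 1.0 - 1.0 / (alpha - 1.0)
    return beta * k * beta_function(k, c) / (alpha - 1.0)


if __name__ == "__main__":
    for k in (1, 4, 16):   # candidate set sizes used in the simulations
        print(k, round(mean_residual_oldest_of_k(k, alpha=3.0, beta=60.0), 1))
```

With $K = 1$ the formula reduces to the plain mean residual lifetime $\beta/(\alpha - 2)$, and it grows with $K$. This mirrors the heavy-tail intuition behind LNS: older nodes tend to remain longer.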
The RRS scheme relies on the flexibility of route selection, which, among the DHT geometries considered here, only Chord and CAN provide. This scheme therefore has the narrowest applicability in terms of DHT choice. Like LNS, it has minimal implementation cost and running overhead.
V. SIMULATION STUDY
A. Methodology
An event-driven simulator is developed with the Chord and Scribe protocols implemented. Nodes join the DHT network according to a homogeneous Poisson process such that the average number of DHT nodes remains at approximately 800,000. Upon joining the DHT, some nodes also join one of 30 multicast groups, chosen with equal probability; the total number of multicast members remains at approximately 300,000. Since the reliability of data delivery is our focus, the simulator does not model network latency. Two lifetime distributions are tested: the Pareto distribution and the Lognormal distribution, as observed in [26] and [30]. The Pareto distribution has the same parameters as those used in Table I, and the Lognormal distribution has its scale and shape parameters set to 5.5 and 2, respectively, so that both distributions have a mean of around 30 minutes. We wish to see how variations in the lifetime distribution affect multicast reliability and how the optimizing schemes adapt to them.
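As a quick check of the stated parameters, the Lognormal mean is $e^{\mu + \sigma^2/2}$. With $\mu = 5.5$ and $\sigma = 2$ this is $e^{7.5} \approx 1808$, i.e. about 30 minutes if the parameters are in seconds (an assumption). The Pareto parameters of Table I are not reproduced here; the shifted-Pareto values $\alpha = 3$, $\beta = 60$ minutes used below are a hypothetical choice with the same 30-minute mean.

```python
import math


def lognormal_mean(mu: float, sigma: float) -> float:
    """Mean of a Lognormal(mu, sigma) random variable."""
    return math.exp(mu + sigma ** 2 / 2.0)


def lomax_mean(alpha: float, beta: float) -> float:
    """Mean of a shifted Pareto (Lomax) lifetime, F(x) = 1 - (1 + x/beta)^-alpha, alpha > 1."""
    return beta / (alpha - 1.0)


if __name__ == "__main__":
    print(round(lognormal_mean(5.5, 2.0) / 60.0, 2))   # seconds -> minutes: 30.13
    print(lomax_mean(3.0, 60.0))                       # minutes: 30.0
```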
In the simulator, data loss is caused only by node departures (without notification). The total failure detection and recovery time is assumed to be uniformly distributed over […] seconds. We call the cumulative time a node spends in failure detection and recovery its failure time, and define the ratio of a node's failure time to its session time (current time minus arrival time) as the data loss ratio (or loss ratio), which equals one minus the delivery ratio.
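The loss-ratio bookkeeping can be sketched as follows. A node accumulates the durations of its failure-detection-and-recovery periods, clipped to its session, divides the total by its session time, and reports one minus that as its delivery ratio. The interval representation and function names are hypothetical, and failure periods are assumed not to overlap.

```python
def loss_ratio(arrival: float, now: float, failure_periods: list[tuple[float, float]]) -> float:
    """Cumulative failure time divided by session time (now - arrival).

    failure_periods holds (start, duration) pairs, assumed non-overlapping;
    each period is clipped to the session window [arrival, now].
    """
    session = now - arrival
    if session <= 0:
        raise ValueError("session time must be positive")
    failed = 0.0
    for start, duration in failure_periods:
        lo = max(start, arrival)
        hi = min(start + duration, now)
        if hi > lo:
            failed += hi - lo
    return failed / session


def delivery_ratio(arrival: float, now: float, failure_periods: list[tuple[float, float]]) -> float:
    """Delivery ratio is the complement of the loss ratio."""
    return 1.0 - loss_ratio(arrival, now, failure_periods)


if __name__ == "__main__":
    periods = [(100.0, 20.0), (990.0, 30.0)]   # second period is still in progress at `now`
    print(loss_ratio(0.0, 1000.0, periods))
```

Clipping matters for snapshot measurements like the 3.6-hour snapshot used here, where a recovery may still be in progress when the statistics are taken.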
Other performance metrics include node stress and path length. For loss ratio and path length, we report statistics only for leaf nodes, as they reflect the worst-case performance of the multicast. All of the following results are taken from a typical snapshot of the network after it has evolved for 3.6 hours.
B. Simulation Results
1) Data loss ratio: Figures 5(a) and 5(b) show the cumulative distribution functions (CDFs) of loss ratio under the SMO
scheme. It can be seen that the SMO scheme considerably
reduces the loss ratios. In the Pareto case (Figure 5(a)), for
example, an SMO fraction of 0.1 reduces the mean loss ratio
by nearly a half (from 18.0% to 9.3%); and the fraction
of nodes whose loss ratios are below 10% improves from
22.6% to 68.9%. Similar observations can be made in the
Lognormal case (Figure 5(b)). Compared with the Pareto case,
the Lognormal case has consistently lower mean loss ratios.
Lemma 4 helps explain the case of SMO fraction = 1 (plain DHT): the mean residual lifetime $E[R]$ is 1 hour for the Pareto lifetime distribution, whereas with the Lognormal lifetime distribution $E[R] \approx$ […] hours, a much larger value that implies a higher delivery ratio according to Theorem 1. (The path lengths in the two cases are in fact very close to each other; the results are reported in [29].) Lemma 5 and Theorem 3 can explain the difference between the curves for the other SMO fractions.
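The gap between the two distributions can be reproduced with the standard renewal-theory identity $E[R] = E[L^2]/(2E[L])$ for the residual lifetime of a node observed at a random time, assuming Lemma 4 takes this standard form. The Lognormal parameters from the methodology are taken to be in seconds, and the shifted-Pareto parameters $\alpha = 3$, $\beta = 60$ minutes are hypothetical. With these inputs the identity gives 1 hour for Pareto and roughly 13.7 hours for Lognormal.

```python
import math


def lognormal_mean_residual(mu: float, sigma: float) -> float:
    """E[R] = E[L^2] / (2 E[L]) for a Lognormal(mu, sigma) lifetime L."""
    m1 = math.exp(mu + sigma ** 2 / 2.0)          # E[L]
    m2 = math.exp(2.0 * mu + 2.0 * sigma ** 2)    # E[L^2]
    return m2 / (2.0 * m1)


def lomax_mean_residual(alpha: float, beta: float) -> float:
    """E[R] for a shifted Pareto (Lomax) lifetime; finite only when alpha > 2."""
    if alpha <= 2:
        raise ValueError("alpha must exceed 2")
    m1 = beta / (alpha - 1.0)                                  # E[L]
    m2 = 2.0 * beta ** 2 / ((alpha - 1.0) * (alpha - 2.0))     # E[L^2]
    return m2 / (2.0 * m1)


if __name__ == "__main__":
    print(lomax_mean_residual(3.0, 60.0) / 60.0)                 # hours: 1.0
    print(round(lognormal_mean_residual(5.5, 2.0) / 3600.0, 1))  # hours: 13.7
```

Although both distributions have a mean of about 30 minutes, the Lognormal's large second moment ($\sigma = 2$) dominates $E[R]$. This is why the plain-DHT loss ratio is lower in the Lognormal case.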
Figures 6(a) and 6(b) show the CDFs of loss ratios under the LNS scheme, which yields a noticeable improvement. The Pareto and Lognormal cases differ in the sensitivity of the loss ratio to the candidate set size $K$: in the Lognormal case, the loss ratio improves more significantly for small values of $K$. For example, the loss ratio drops by 49% as $K$ grows from 1 to 4, whereas a reduction of only 16% is observed as $K$ grows from 4 to 16.
The loss ratios under the RRS scheme with the Pareto lifetime distribution are shown in Figure 7(a). The bit-clearing heuristic results in a substantially higher mean loss ratio than the plain DHT. This is because in a DHT that is not fully populated, the distance spanned by each hop is not necessarily a power of 2. As a result, new 1 bits may appear in the binary representation of the remaining distance as a message is routed from the source to the destination. This increases the hop count, and hence the loss ratio, relative to the plain DHT.
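The effect can be reproduced on a toy ring. The sketch routes with a hypothetical bit-clearing rule that always follows the finger for the lowest 1 bit of the remaining distance, where finger $i$ of a node $n$ is the successor of $n + 2^i$. On a fully populated 16-identifier ring the hop count equals the number of 1 bits in the distance. On a sparse ring a finger can land beyond its power of two, so the remaining distance acquires new 1 bits. The ring size, node IDs and lowest-bit policy are illustrative assumptions.

```python
from bisect import bisect_left


def popcount(x: int) -> int:
    """Number of 1 bits in x."""
    return bin(x).count("1")


def successor(nodes: list[int], ident: int, space: int) -> int:
    """First node clockwise from ident on the ring; nodes must be sorted."""
    i = bisect_left(nodes, ident % space)
    return nodes[i] if i < len(nodes) else nodes[0]


def bit_clearing_route(nodes: list[int], src: int, dst: int, m: int) -> list[int]:
    """Route src -> dst on a 2^m ring, always following the finger for the lowest 1 bit.

    The finger for bit i is successor(cur + 2^i); because dst is itself a node at
    or beyond cur + 2^i, the hop never passes the destination.
    """
    space = 1 << m
    path, cur = [src], src
    while cur != dst:
        remaining = (dst - cur) % space
        low_bit = remaining & -remaining      # lowest set bit of the remaining distance
        cur = successor(nodes, cur + low_bit, space)
        path.append(cur)
    return path


if __name__ == "__main__":
    full = list(range(16))
    print(bit_clearing_route(full, 0, 12, 4), popcount(12))     # 2 hops, 2 one-bits
    sparse = [0, 5, 6, 8, 12]
    print(bit_clearing_route(sparse, 0, 12, 4))                  # 4 hops
    print(popcount(12), "->", popcount(12 - 5))                  # 2 -> 3 after the first hop
```

On the sparse ring, the finger for bit 2 of node 0 lands on node 5 instead of 4. The remaining distance becomes 7 (binary 111) instead of 8 (binary 1000), doubling the route length from 2 to 4 hops.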
Fig. 6. CDF of data loss ratio. (a) LNS + Pareto lifetime distribution; (b) LNS + Lognormal lifetime distribution. [Plot: cumulative % of nodes vs. data loss ratio (%) for candidate set sizes K = 1 (plain DHT), 4 and 16.]
In view of this, we consider another heuristic for selecting the next hop. Let $d$ be the distance of a neighbor and $a$ its age; the neighbor with the maximum value of $d \cdot a^{\gamma}$ is selected as the next hop. The heuristic trades off total hop count against the choice of stable nodes, and the power of age $\gamma$ determines the weight given to age. When $\gamma = 0$, the routing scheme reduces to that of a plain DHT. Figure 7 shows the CDFs of loss ratio for three values of $\gamma$. For $\gamma = 1$, the loss ratio is still higher than that of the plain DHT, which implies that the negative effect of the increased hop count outweighs the positive effect of stable node selection. Lowering $\gamma$ from 1 to 0.5 remedies this situation slightly, making the loss ratio very close to that of the plain DHT. Further reductions of $\gamma$, however, have very little effect on the loss ratio under the Pareto distribution. (These results are omitted to keep the figures legible.)
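The distance-age heuristic can be sketched as a scoring rule over the current node's neighbors. Here the distance of a neighbor is taken to be the clockwise identifier distance it covers from the current node, and neighbors beyond the destination are skipped. The power of age is written `gamma`. These details and the `Neighbor` type are assumptions for illustration. With `gamma = 0` every score reduces to the distance, so the rule picks the farthest non-overshooting neighbor, as in plain greedy Chord routing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Neighbor:
    node_id: int
    age: float   # time since the neighbor joined, e.g. in minutes


def pick_next_hop(cur: int, dst: int, neighbors: list[Neighbor],
                  gamma: float, space: int) -> Optional[Neighbor]:
    """Return the neighbor maximizing distance * age**gamma without passing dst."""
    remaining = (dst - cur) % space
    best, best_score = None, float("-inf")
    for nb in neighbors:
        d = (nb.node_id - cur) % space       # clockwise distance covered by this hop
        if d == 0 or d > remaining:
            continue                         # skip self and neighbors past the destination
        score = d * nb.age ** gamma
        if score > best_score:
            best, best_score = nb, score
    return best


if __name__ == "__main__":
    nbs = [Neighbor(1, 100.0), Neighbor(2, 60.0), Neighbor(4, 5.0),
           Neighbor(8, 1.0), Neighbor(13, 1000.0)]
    for g in (0.0, 0.5, 1.0):
        print(g, pick_next_hop(0, 12, nbs, g, 16).node_id)
```

In this toy neighbor set, raising `gamma` from 0 to 0.5 already switches the choice from the far but young node 8 to the nearer, older node 2. That choice adds hops, and the trade-off is what Figure 7 measures.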
Stable node selection appears to be more effective for a Lognormal lifetime distribution (Figure 7(b)), although the improvement is only moderate: for $\gamma = 1$, the loss ratio is reduced by 31%, while the bit-clearing heuristic still performs worst.
Fig. 5. CDF of data loss ratio. (a) SMO + Pareto lifetime distribution; (b) SMO + Lognormal lifetime distribution. [Plot: cumulative % of nodes vs. data loss ratio (%) for SMO fraction = 1 (plain DHT), 0.5 and 0.1.]

Fig. 7. CDF of data loss ratio. (a) RRS + Pareto lifetime distribution; (b) RRS + Lognormal lifetime distribution. [Plot: cumulative % of nodes vs. data loss ratio (%) for power of age = 0 (plain DHT), 0.5 and 1, and the bit-clearing heuristic.]

2) Node stress: Figure 8 shows the distribution of node stress under the SMO scheme. When the SMO fraction is below 1, some nodes are outside the SMO and have a node stress of zero, so the total number of zero-stress nodes is larger than in a plain DHT. For example, the number of zero-stress nodes is 105,695 for a plain DHT, whereas it grows to 150,010 for the SMO scheme with SMO fraction […]. On the other hand, since all forwarding tasks are assigned to only a fraction of the nodes, there are likely to be more nodes with high stress values than in a plain DHT. This is shown by the bars in Figure 8 for stress values ranging from 12 to 36. Generally, the smaller the SMO fraction, the more nodes are concentrated near both ends of the range of stress values. Similar observations can be made from Figures 9 and 10, which depict the distributions of node stress under the LNS and RRS schemes, respectively, although the latter shows a smaller deviation from the stress distribution of the plain DHT.
3) Path length: Besides being a factor in data delivery reliability, path length matters in its own right for many applications. Figure 11 shows the mean path lengths of the various schemes. The SMO and LNS schemes slightly shorten the multicast paths, especially with a smaller SMO fraction or a larger candidate set size. This is because a smaller set of forwarding nodes reduces the number of intermediate nodes needed on a path. Consider the extreme case in which only one forwarder is present (corresponding to an SMO with a single node, or an extreme value of $K$): the overlay graph takes on a star-like structure, and the path length between any pair of nodes would be only 2.
The RRS scheme with the bit-clearing heuristic yields very large path lengths, which confirms the qualitative discussion in Section V-B.1. For the heuristic using the product of distance and age, the path length is still longer than that of the plain DHT, a consequence of sacrificing small hop counts for more reliable routes.

Fig. 8. Node stress under the SMO scheme. [Histogram: number of nodes (log scale) vs. node stress value (0 to 36) for SMO fraction = 1 (plain DHT), 0.5 and 0.1.]

Fig. 9. Node stress under the LNS scheme. [Histogram: number of nodes (log scale) vs. node stress value (0 to 36) for candidate set size = 1 (plain DHT), 4 and 8.]

Fig. 10. Node stress under the RRS scheme. [Histogram: number of nodes (log scale) vs. node stress value (0 to 36) for power of age = 0 (plain DHT), 0.5 and 1.]

Fig. 11. Path lengths under different schemes. [Path length (hop count), with 5th and 95th percentiles, for plain DHT, SMO,0.5, SMO,0.1, LNS,4, LNS,16, RRS,1 and bit-clearing.]

VI. RELATED WORK
From the perspective of stochastic modeling, perhaps the work closest to ours is that of Leonard et al. [14]. Using a node lifetime model similar to ours (see the discussion of differences in Section III-A), they analyze the resilience of general peer-to-peer networks and derive the expected delay before a user is isolated from the network and the probability of this occurring within the user's lifetime. They model the evolution of a node's neighbors as a superposition of renewal processes, and then use renewal theory to obtain the limiting probability that at least one neighbor is alive. We also use renewal theory to analyze the probability that an intermediate node on a delivery path is in the normal state, but our focus is on the probability that all nodes on the path are in the normal state simultaneously.
For the resilience of DHT networks, Gummadi et al. [13] identify several representative routing geometries and analyze their degrees of flexibility, which benefits static resilience and proximity. Loguinov et al. [15] examine graph-theoretic properties of several DHTs and analyze their routing performance and fault resilience. Stutzbach et al. [28] characterize the churn of P2P networks through empirical measurements.
In the field of application-layer multicast, reliability has
been a topic of enduring interest. The early work of Chu et
al. [12] and Padmanabhan et al. [20] proposes some simple
and effective techniques for improving the multicast reliability.
Castro et al. [9] propose to use multiple trees to improve the
resilience of streaming to interruptions. In [25], a number of
single-tree construction algorithms are proposed and evaluated
using traces from large-scale commercial systems. Due to the
lack of a generic supporting overlay topology, these optimizing
schemes are somewhat ad hoc, and are therefore difficult to
model and evaluate using an integrated framework.
VII. CONCLUSIONS
This paper investigates the reliability of DHT-based multicast. The contributions include: (1) a node residual lifetime model, which is fundamental to our analysis and which we believe will be a useful tool in other overlay-based applications; (2) a renewal-theory-based model for the stationary delivery ratio, which provides a worst-case estimate of the reliability of data delivery between two nodes in a DHT network; (3) three optimization schemes and an analysis of their reliability; and (4) simulation experiments that provide insight into the performance of DHT-based multicast in several major respects. In the future, we will consider how the model and the optimization schemes can be applied to bandwidth-demanding applications such as video streaming.
REFERENCES
[1] RSSDHT. http://sourceforge.net/projects/rssdht/
[2] Bamboo DHT project. http://bamboo-dht.org/
[3] K. C. Almeroth and M. H. Ammar. Collecting and Modeling the
Join/Leave Behavior of Multicast Group Members in the MBone. Proc.
of High Performance Distributed Computing (HPDC), 1996.
[4] S. Banerjee, S. Lee, B. Bhattacharjee, and A. Srinivasan. Resilient
multicast using overlays. ACM SIGMETRICS 2003.
[5] A. R. Bharambe, S. G. Rao, V. N. Padmanabhan, S. Seshan and H.
Zhang. The Impact of Heterogeneous Bandwidth Constraints on DHT-Based Multicast Protocols. IPTPS, 2005.
[6] M. Bishop, S. Rao, and K. Sripanidkulchai. Considering Priority in
Overlay Multicast Protocols under Heterogeneous Environments. In
Proc. of INFOCOM 2006.
[7] P. B. Godfrey, S. Shenker, and I. Stoica. Minimizing Churn in Distributed
Systems. Proc. of SIGCOMM 2006.
[8] F. E. Bustamante and Y. Qiao. Friendships that last: peer lifespan and
its role in P2P protocols. WCW workshop 2003.
[9] M. Castro, P. Druschel, A-M. Kermarrec, A. Nandi, A. Rowstron
and A. Singh. SplitStream: High-bandwidth multicast in a cooperative
environment. Proc. of SOSP 2003.
[10] M. Castro, P. Druschel, A-M. Kermarrec and A. Rowstron. Scribe: A
large-scale and decentralised application-level multicast infrastructure.
IEEE Journal on Selected Areas in Communications (JSAC). Oct., 2002.
[11] Y. Chu, A. Ganjam, T. S. E. Ng, S. G. Rao, K. Sripanidkulchai,
J. Zhan and H. Zhang. Early Experience with an Internet Broadcast
System Based on Overlay Multicast. USENIX 2004 Annual Technical
Conference.
[12] Y. Chu, S. Rao, and H. Zhang. A Case for End System Multicast. Proc.
of ACM SIGMETRICS, June 2000.
[13] P. K. Gummadi, R. Gummadi, S. D. Gribble, S. Ratnasamy, S. Shenker
and I. Stoica. The impact of DHT routing geometry on resilience and
proximity. Proc. of ACM SIGCOMM 2003.
[14] D. Leonard, V. Rai, and D. Loguinov. On lifetime-based node failure
and stochastic resilience of decentralized peer-to-peer networks. Proc.
of ACM SIGMETRICS 2005.
[15] D. Loguinov, A. Kumar, V. Rai, and S. Ganesh. Graph-theoretic analysis
of structured peer-to-peer systems: routing distances and fault resilience.
Proc. of ACM SIGCOMM 2003.
[16] P. Maymounkov and D. Mazieres. Kademlia: A Peer-to-peer Information
System Based on the XOR Metric. IPTPS, 2002.
[17] M. Mitzenmacher and E. Upfal. Probability and Computing. Cambridge
University Press, 2005.
[18] M. F. Kaashoek and D. R. Karger. Koorde: A simple degree-optimal
distributed hash table. IPTPS 2003.
[19] V. G. Kulkarni. Modeling and Analysis of Stochastic Systems. Chapman
& Hall Ltd. ISBN: 0-41204-991-0, 1996.
[20] V. N. Padmanabhan, H. J. Wang and P. A. Chou. Resilient Peer-to-Peer
Streaming. Proc. of ICNP 2003.
[21] S. Ratnasamy, M. Handley, R. Karp, S. Shenker. Application-level
Multicast using Content-Addressable Networks. Proc. of International
Workshop on Networked Group Communication (NGC) 2001.
[22] S. Ratnasamy, P. Francis, M. Handley, R. Karp and S. Shenker. A
Scalable Content-Addressable Network. Proc. of SIGCOMM 2001.
[23] S. Rhea, B. Godfrey, B. Karp, et al. OpenDHT: A Public DHT Service
and Its Uses. Proc. of SIGCOMM 2005.
[24] A. Rowstron and P. Druschel. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. IFIP/ACM Intl.
Conference on Distributed Systems Platforms 2001.
[25] K. Sripanidkulchai, A. Ganjam, B. Maggs and H. Zhang. The feasibility
of supporting large-scale live streaming applications with dynamic
application end-points. Proc. of ACM SIGCOMM, 2004.
[26] K. Sripanidkulchai, B. Maggs and H. Zhang. An analysis of live
streaming workloads on the Internet. SIGCOMM IMC 2004.
[27] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, H. Balakrishnan. Chord:
A Scalable Peer-to-Peer Lookup Service for Internet Applications. Proc.
of ACM SIGCOMM 2001.
[28] D. Stutzbach, and R. Rejaie. Understanding Churn in Peer-to-Peer
Networks. SIGCOMM IMC 2006.
[29] G. Tan and S. A. Jarvis. On the reliability of DHT-based multicast.
Technical Report CS-TR-06, University of Warwick, 2006.
[30] E. Veloso, V. Almeida, W. Meira, A. Bestavros, and S. Jin. A Hierarchical Characterization of A Live Streaming Media Workload. IEEE/ACM
Trans. on Networking, 14(1), 2006.
[31] S. Q. Zhuang, B. Y. Zhao, A. D. Joseph, R. H. Katz, J. D. Kubiatowicz.
Bayeux: An Architecture for Scalable and Fault-tolerant Wide-area Data
Dissemination. Proc. of NOSSDAV 2001.