Scalable Content Distribution with Reliable
Multicast from Multiple Mirror Sites using
Tornado Codes
EE228A Project Report
Wei Wang and Abhishek Ghose.
Abstract— Mirror sites enable client requests to be serviced by any of a
number of servers, reducing load at individual servers and dispersing
network load. Typically, a client requests service from a single mirror
site. We consider enabling a client to access a file from multiple mirror
sites in parallel to speed up the download. To eliminate complex client-server negotiations that a straightforward implementation of this
approach would require, we develop a feedback-free protocol based on
erasure codes. We demonstrate that a protocol using fast Tornado codes
can deliver dramatic speedups at the expense of transmitting a moderate
number of additional packets into the network. Our scalable solution
extends naturally to allow multiple clients to access data from multiple
mirror sites simultaneously. Our approach applies naturally to wireless
networks and satellite networks as well.
I. INTRODUCTION
Downloading a large file from a heavily loaded server or through a highly
congested link can be a painfully slow experience. Similarly, clients who
connect to the Internet at modem speeds often suffer through interminable
waiting periods to download web pages with graphical content. Hence
efficient techniques are needed for scalable distribution of popular web
content to a large number of receivers. Examples of such web content
include software upgrades and video clips from news sites. Any such
technique must satisfy the following criteria:
• Reliability – The protocol must be able to recover from network errors and must guarantee transmission of correct data to all the receivers.
• Speedup – The protocol must use efficient techniques to reduce download times.
• Asynchronous Information Retrieval – The protocol must support asynchronous retrieval of a single unit of data by a large number of users, i.e. it must allow a number of users with overlapping access times to download the same file in parallel.
We propose a novel technique for scalable distribution of popular web
content which satisfies the aforementioned requirements. Our approach
consists of the following main points:
• Multiple mirror sites – To enable fast download times and to distribute the requests amongst multiple servers so as to perform server load balancing.
• Feedback-free multicast protocol – To enable the protocol to scale to an essentially unlimited number of users.
• Tornado codes – To efficiently implement a multicast protocol in practice using erasure codes with fast encoding/decoding times.
• Digital fountain – To provide a generic framework for asynchronous information retrieval by a number of users.
The rest of the paper is organized as follows. In Section II, we provide a
theoretical background for the various issues involved in designing a
feedback-free multicast protocol. In Section III, we outline our proposal and
discuss related work. In Section IV, we describe and analyze our scheme for
optimal server selection in detail. In Section V, we describe our technique of
source blocking to eliminate distinctness inefficiency. Finally we state our
conclusions and give some suggestions for future work in Section VI.
II. THEORETICAL BACKGROUND
II.A Mirror Sites
The most relevant web technology to our proposal is the use of mirror sites.
The use of mirror sites involves the deployment of a number of servers
replicating the same data at geographically distributed locations, in an effort
to both distribute the load of requests across servers and to make network
connections shorter in length, thereby reducing network traffic.
A limitation of current mirroring technology is that the user must choose a
single mirror site from which to access data. Optimal Server Selection is a
widely studied problem in literature. While the choice of server may appear
obvious when the number of mirror sites is small, the work of [1],[5]
indicates that the obvious choice is not always the best choice and dramatic
performance improvements can result from more careful selection. This
selection process can be automated by methods such as statistical record
keeping [2], or by dynamic probing. In statistical record-keeping, a database
is maintained at each client which has a record of the download time at
previous attempts from each mirror server. This record is used to predict
future performances of each mirror server and to choose the optimal server.
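As an illustration, statistical record-keeping can be sketched with an exponentially weighted moving average of past download times per mirror. This is a plausible predictor, not the exact scheme of [2]; the class name, weighting rule and numbers below are our own assumptions.

```python
# Sketch of statistical record-keeping for server selection: each client
# keeps an exponentially weighted moving average (EWMA) of past download
# times per mirror and picks the server with the lowest predicted time.
# (Illustrative assumption; not the exact mechanism of [2].)

class ServerRecord:
    def __init__(self, alpha=0.3):
        self.alpha = alpha      # weight given to the newest sample
        self.predicted = {}     # server -> predicted download time (seconds)

    def observe(self, server, download_time):
        """Fold a new measurement into the running prediction."""
        old = self.predicted.get(server)
        if old is None:
            self.predicted[server] = download_time
        else:
            self.predicted[server] = (
                self.alpha * download_time + (1 - self.alpha) * old)

    def best_server(self):
        """Return the server with the lowest predicted download time."""
        return min(self.predicted, key=self.predicted.get)

records = ServerRecord()
records.observe("mirror-a", 12.0)
records.observe("mirror-b", 8.0)
records.observe("mirror-a", 6.0)   # mirror-a improved, but history still counts
print(records.best_server())       # prints "mirror-b"
```

The EWMA smooths out transient fluctuations, which matches the observation that relative server ranks are fairly stable over time.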
II.B The Need for a Feedback-free Multicast Protocol
Our first objective is to enable users to download data from multiple mirror
sites in parallel in order to reduce downloading time. This technique not only
has the potential to improve performance substantially over a single server
approach, but can eliminate the need for a complex selection process. For
some applications, such as software distribution, mirrored data may be
requested by a vast number of autonomous clients whose access intervals
may overlap with one another. In such a situation, rather than having each
client establish a set of point-to-point connections to multiple mirror sites in
parallel, we propose a more general system by which mirror sites establish
multicast groups to transmit data. Provided that the transmissions of these
multicast groups are synchronized so as to keep the transmission of
duplicates to a minimum, a client could subscribe to several multicast groups
in parallel to retrieve the data more quickly. The use of multicast provides
scalability to an unlimited number of clients, each of which may be
subscribing to different subsets of the multicast groups. As an example, such
a system could be used by employees of an international conglomerate to
download daily financial statements via their corporate intranet. Each of a
number of geographically distributed mirror sites would establish a multicast
session to distribute the information. Transparent access from thousands of
clients to multiple mirror sites would be an effective mechanism to speed up
download times and to balance load, especially away from heavily accessed
servers at peak usage times. Though this approach is conceptually simple, it
is not clear how one can derive a simple and workable implementation
around it that delivers superior performance.
While unicast protocols successfully use receiver initiated requests for
retransmission of lost data to provide reliability, it is widely known that the
multicast analog of this solution is unscalable. For example, consider a
server distributing a new software release to thousands of clients. As clients
lose packets their requests for retransmission can quickly overwhelm the
server in a process known as feedback implosion. Even in the event that the
server can handle the requests, the retransmitted packets are often of use
only to a small subset of the clients. Moreover, whereas adaptive
retransmission solutions are at best unscalable and inefficient on terrestrial
networks, they are unworkable on wireless and satellite networks, where the
back channel typically has high latency and limited capacity, if it is available at all.
The problems with solutions based on adaptive retransmission have led
many researchers to consider applying Forward Error Correction based on
erasure codes to reliable multicast [2],[4],[7]. The basics of erasure codes are
outlined in the next section.
II.C Erasure Codes
The basic principle behind the use of erasure codes [3] is that the original
source data, in the form of a sequence of k packets, along with l additional
redundant packets are transmitted by the sender. This n (= k + l) packet
encoding has the property that the initial file can be restituted from any k
packet subset of the encoding. For the application of reliable multicast, the
source transmits packets from this encoding, and the encoding property
ensures that different receivers can recover from different sets of lost
packets, provided they receive a sufficiently large subset of the transmission.
In principle, this idea can greatly reduce the number of retransmissions, as a
single retransmission of redundant data can potentially benefit many
receivers simultaneously. In applications such as real-time video,
retransmission may also be undesirable due to timing constraints.
To enable parallel access to multiple mirror sites, the sources each transmit
packets from the same encoding, and the encoding property ensures that a
receiver can recover the data once they receive a sufficiently large subset of
the transmitted packets, regardless of which server the packets came from.
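The encoding property can be made concrete with the smallest non-trivial example: a k = 2, l = 1 XOR parity code. This is a toy sketch for intuition only; practical erasure codes generalize the idea to arbitrary k and l.

```python
# Toy erasure code illustrating the "any k of n" property for the smallest
# interesting case: k = 2 source packets plus l = 1 XOR parity packet
# (n = 3). Any 2 of the 3 encoding packets suffice to recover the source,
# so different receivers can recover from different single-packet losses.

def xor(p, q):
    return bytes(a ^ b for a, b in zip(p, q))

def encode(p0, p1):
    """Return the n = 3 packet encoding of two source packets."""
    return {0: p0, 1: p1, 2: xor(p0, p1)}   # packet 2 is the parity

def decode(received):
    """Recover (p0, p1) from any 2 of the 3 encoding packets."""
    if 0 in received and 1 in received:
        return received[0], received[1]
    if 0 in received:                        # lost p1: p1 = p0 xor parity
        return received[0], xor(received[0], received[2])
    return xor(received[1], received[2]), received[1]   # lost p0

enc = encode(b"AAAA", b"BBBB")
for lost in (0, 1, 2):       # each receiver may lose a different packet
    got = {i: enc[i] for i in enc if i != lost}
    assert decode(got) == (b"AAAA", b"BBBB")
print("recovered from every single-packet loss")
```

The same principle, scaled up, is what lets packets from different mirror sites substitute freely for one another at the receiver.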
Figure 1: How Erasure Codes Work.
Tornado Codes
Ideally, using erasure codes, a receiver could gather an encoded file in
parallel from multiple sources, and as soon as any k packets arrive from any
combination of the sources, the original file could be restituted efficiently. In
practice, however, designing a system with this ideal property and with very
fast encoding and decoding times appears difficult. Hence, although other
erasure codes could be used in this setting, we suggest that a newly
developed class of erasure codes called Tornado codes are best suited to this
application, as they have extremely fast encoding and decoding algorithms
[3]. Indeed, previously these codes have been shown to be more effective
than standard erasure codes in the setting of reliable multicast transmission
of large files. The price for the fast encoding and decoding is that slightly
more than k distinct packets are required to reconstitute the message.
Tornado codes relax the decoding guarantee as follows: to reconstruct the source data, it is necessary to recover βk of the n encoding packets, where β is greater than 1. We then say that β is the decoding inefficiency. The advantage of Tornado codes over standard codes (which do not relax the decoding guarantee and have β equal to 1) is that Tornado codes trade off a small increase in decoding inefficiency for a substantial decrease in encoding and decoding times, thus making them far more attractive for large values of k and l. As with standard codes, the Tornado decoding algorithm
can detect when it has received enough encoding packets to reconstruct the
original file. Thus, a client can perform decoding in real-time as the
encoding packets arrive, and reconstruct the original file as soon as it
determines that sufficiently many packets have arrived.
While fast implementations of standard codes have quadratic encoding and decoding times for this application, Tornado codes have encoding and decoding times proportional to (k + l) ln(1/ε), where ε = β − 1. Moreover, Tornado codes
are simple to implement and use only exclusive-or operations. A comparison
between the properties of Tornado codes and standard codes is given in
Table I.
In our project, we will use Tornado codes as the basis for our proposed
approach.
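The XOR-only decoding that makes Tornado codes fast can be sketched as a "peeling" process. The sparse graph below is hand-built for illustration and is not a real Tornado degree distribution; a real code uses randomized sparse graphs and therefore needs slightly more than k packets (βk) to peel completely with high probability.

```python
# Sketch of the XOR-only "peeling" decoder used by Tornado-style codes.
# Each received encoding packet is the XOR of a known set of source
# packets. Whenever some equation's unresolved set shrinks to a single
# source packet, that packet is recovered and XORed out of every other
# equation that includes it. Packets are modelled as small integers.

def peel_decode(k, received):
    """received: list of (source_index_set, xor_value) pairs.
    Returns {index: value} for all k source packets, or None on failure."""
    recovered = {}
    eqs = [[set(nbrs), val] for nbrs, val in received]
    progress = True
    while progress and len(recovered) < k:
        progress = False
        for eq in eqs:
            if len(eq[0]) != 1:
                continue
            (i,) = eq[0]                 # degree-1: directly reveals packet i
            recovered[i] = eq[1]
            eq[0].clear()
            progress = True
            for other in eqs:            # peel packet i out of all equations
                if i in other[0]:
                    other[0].discard(i)
                    other[1] ^= recovered[i]
    return recovered if len(recovered) == k else None

src = [0xA, 0xB, 0xC, 0xD]               # k = 4 source packets
received = [                             # a chain of sparse XOR equations
    ({0, 1}, src[0] ^ src[1]),
    ({1, 2}, src[1] ^ src[2]),
    ({2, 3}, src[2] ^ src[3]),
    ({3},    src[3]),
]
print(sorted(peel_decode(4, received).items()))   # -> [(0, 10), (1, 11), (2, 12), (3, 13)]
```

Each edge of the graph is touched a constant number of times, which is the intuition behind the near-linear decoding time quoted above.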
Table I: Comparison of Tornado Codes and Standard Erasure Codes.
II.D Digital Fountain
In this section, we outline an idealized solution to the downloading from
multiple mirrors problem, based on a similar approach to reliable multicast
presented in [2]. A client wishes to obtain a file consisting of k packets from
a collection of mirrored servers. In an idealized solution, each server sends
out a stream of n distinct packets, called encoding packets, which constitute
an encoding of the file. A client accepts encoding packets from all servers in
the collection until it obtains exactly k packets. In this idealized solution,
the file can then be reconstructed regardless of which k encoding packets the
client obtains. Therefore, once the client receives any k encoding packets, it
can then disconnect from the servers. We assume that this solution requires
negligible processing time by the servers to produce the encoding packets and by the client while merging data received from the various servers and recovering the source data from the k encoding packets it obtains. A
consequence of this approach is that there is no need for feedback from the
receiver to the senders aside from initiating and terminating the connections;
in particular, no renegotiation is ever necessary.
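This idealized download can be simulated in a few lines. The model below is our own simplification: each server carousels an independent random permutation of the same n-packet encoding, delivering one packet per server per round, and the client stops at k distinct packets.

```python
import random

# Sketch of the idealized digital-fountain download: servers stream the
# same n-packet encoding in independent random orders; the client keeps
# every packet index it has not yet seen and stops as soon as it holds
# k distinct packets. No feedback is needed beyond join/leave.

def fountain_download(k, n, num_servers, rng):
    streams = []
    for _ in range(num_servers):
        order = list(range(n))
        rng.shuffle(order)          # each server carousels its own permutation
        streams.append(order)
    have, taken, pos = set(), 0, 0
    while len(have) < k:
        for s in streams:           # one packet from each server per "round"
            have.add(s[pos % n])
            taken += 1
            if len(have) >= k:
                break
        pos += 1
    return taken, len(have)

rng = random.Random(0)
taken, distinct = fountain_download(k=1000, n=2000, num_servers=2, rng=rng)
print(f"received {taken} packets, {distinct} distinct "
      f"(distinctness inefficiency {taken / distinct:.3f})")
```

Because the permutations are uncoordinated, some duplicates arrive before the k-th distinct packet; this is exactly the distinctness inefficiency quantified in Section II.E.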
The stream of encoding packets produced by a server in this idealized
solution is metaphorically described as a digital fountain, as was done in a
similar approach to reliable bulk transfers presented in [2]. The digital
fountain has properties similar to a fountain of water for quenching thirst:
drinking a glass of water, irrespective of the particular drops that fill the
glass, quenches one’s thirst.
The digital fountain concept is quite general and can be applied in diverse
network environments. For example, it can be applied to the Internet,
satellite networks and to wireless networks with mobile agents. These
environments are quite different in terms of packet loss characteristics,
congestion control mechanisms, and end-to-end latency.
II.E Performance Metrics
The main benefit of our approach is the reduction in download times due to
the parallel access of data from multiple mirror sites. The main cost of our
approach is bandwidth, in the form of a moderate number of additional
packets injected into the network at each of the mirror sites. Accordingly,
our performance metrics reflect these factors:
• Speedup – This is defined as the factor by which the download time is decreased compared to the single mirror site case.
• Reception Inefficiency – This is defined to be η if the receiver receives a total of ηk packets from the servers before it can reconstruct the file. The reception inefficiency comprises two separate sources of inefficiency: the decoding inefficiency β of Tornado codes outlined in a previous section, and the distinctness inefficiency due to the arrival of duplicate packets. We define the distinctness inefficiency δ to be the average number of copies of each packet received by the receiver prior to reconstruction. The reception inefficiency is then just the product of the decoding inefficiency of the Tornado code and the distinctness inefficiency, that is, η = βδ.
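The product relationship between the metrics can be checked numerically. The figures below are made up for illustration, not measurements.

```python
# Illustrative computation of the reception inefficiency: a Tornado code
# with decoding inefficiency beta needs beta*k packets, and duplicates
# across mirrors inflate that by the distinctness inefficiency delta.
# (The numbers are example values, not experimental results.)

k = 10_000           # source packets in the file
beta = 1.055         # decoding inefficiency (5.5% Tornado overhead)
delta = 1.20         # avg. copies of each packet received (20% duplicates)

eta = beta * delta   # reception inefficiency
total_received = eta * k
print(f"reception inefficiency = {eta:.3f}; "
      f"packets received before reconstruction ~ {total_received:.0f}")
```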
III. SOLUTION TO THE FAST ASYNCHRONOUS DOWNLOAD PROBLEM
The general framework we use to solve the fast and asynchronous
information retrieval problem comprises the following components: multiple
mirror sites transmitting to each receiver in parallel for significant speedup
of download times; each mirror site encodes the source data using Tornado
codes to ensure reliability against erasures with no feedback and fast
encoding/decoding; each mirror site carousels and multicasts the encoded
data to allow scalable asynchronous retrieval by many receivers.
III.A Proposal
Here we outline the two main components of our proposal that improve
performance over past schemes. In the next two sections we will detail each
component and demonstrate the performance gains we claim here.
Past work by Byers et al [8] proposed that each receiver should download
data from all mirror sites in parallel for speedup gains over downloading
from a single site. However the argument was made assuming equal
download rates from all servers, which is not applicable in realistic
situations. In addition, such a scheme violates the original benefit of mirror
sites for load balancing, and distributing and minimizing network traffic
congestion. Previous proposals for mirror site selection [6] required
burdensome network performance monitoring and feedback by the receiver
to control the data sent by different servers. Clearly the feedback and
receiver control of mirror sites does not scale to multiple users. We propose
that optimal speedup of download times can be obtained by a source-negotiated mirror site selection scheme, such that each receiver joins the
multicast groups of only a subset of all mirror sites.
Byers et al proposed the use of multiple mirror sites that carousel the same
encoded data such that a receiver need only receive k distinct packets from
any of the sources to decode the source file. However, receiving duplicate
packets from different mirror sites does not further the receiver’s ability to
decode. Duplicate packets waste bandwidth and increase network traffic
unnecessarily. To combat this problem, Byers et al permute the order of the
encoded packets independently at each mirror site for a random offset effect.
This method still leads to some distinctness inefficiency. They discuss
receiver controlled offsets for different mirror sites, which requires feedback
from the receivers and does not scale to multiple receivers (who may want
different offsets). We propose a source-negotiated data blocking scheme
that eliminates the distinctness inefficiency.
IV. SOURCE NEGOTIATED SERVER SELECTION
Simulations from the Byers study demonstrate that if packets arrive at equal
rates from all senders, then with a moderate stretch factor (for example 3 or
4), adding additional sources dramatically speeds up downloads with
minimal additional reception inefficiency. However their own simulations
show that in the more likely case that packets are arriving at different rates
from different senders, having more senders is not necessarily better.
Their results clearly show that having two senders with equal rates gives
higher speedup than having three senders with disparate rates. This suggests
that higher speedup can be obtained by using a few mirror sites with similar
rates, rather than using all the mirror sites with disparate rates. It is worth
noting that the results of an independent study by Rodriguez et al [6] with a
different experimental setup also illustrate the highly disparate rates from
different senders and the gain of using a subset of the senders.
Figure 2. This figure is taken from Byers et al [8]. It is replicated here solely for the
convenience of the reader. The graph on the left shows that with a high stretch factor
(more redundant encoded source packets), the number of duplicate packets received from
different mirror sites (the distinctness inefficiency) is lower (note that the average
decoding inefficiency was held constant at 5.5%). Since our second proposal eliminates
distinctness inefficiency, we will not further analyze this graph here. The graph on the
right shows that if packets arrive from each sender at equal rate, then additional sources
offer higher speedups compared to a single sender. However, consider the case when
packets arrive from different senders at disparate rates. The graph clearly shows that two
senders with equal rates give higher speedup than three senders with different rates.
In addition, server selection recaptures the benefit of mirror sites to
minimize the length of network connections and therefore geographically
distribute and minimize network traffic.
The client-negotiated server selection schemes in the literature do not scale to multiple users and require high overhead at the receiver to monitor server
rates. We propose a source negotiated server selection scheme that allows
each receiver to join the multicast groups of a subset of all mirror sites to
derive optimal speedup.
Myers et al found that, “Though mirror server performances (i.e. fetch times) vary widely, there are only small time-independent variations in the relative performance (ranks) of the mirror servers.” [5] This justifies periodically probing server performances to derive rankings.
Our source-negotiated server selection scheme is as follows:
• Define geographical cells grouping users that are close to each other.
• Each server probes a sample of clients in each cell to obtain an estimate of the download time from that server to members of the cell.
• Servers then share this information and derive a ranked list of server rates to each cell.
• A server prunes itself from the list of servers for a cell if its performance is worse than a threshold. The threshold is chosen such that the selected servers for each cell have similar rates (i.e. lie within a bound).
• Thus members of a cell join only the multicast groups of the selected servers for that cell.
• The probing process is repeated periodically.
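The pruning step above can be sketched as follows. The rate numbers and the specific threshold rule (keep servers within a multiplicative bound of the best rate) are assumptions for illustration.

```python
# Sketch of the per-cell pruning step: keep only the servers whose probed
# rate to this cell is within a multiplicative bound of the best rate, so
# that the selected servers have similar rates. The bound value and the
# example rates are illustrative assumptions.

def select_servers(cell_rates, bound=0.5):
    """cell_rates: dict server -> probed rate (e.g. KB/s) to this cell.
    Keep servers with rate >= bound * best_rate; return them sorted."""
    best = max(cell_rates.values())
    return sorted(s for s, r in cell_rates.items() if r >= bound * best)

rates = {"M1": 400, "M2": 380, "M3": 90, "M4": 350, "M5": 60, "M6": 120}
print(select_servers(rates))     # -> ['M1', 'M2', 'M4']
```

Clients in the cell would then join only the multicast groups of the selected servers, with no per-client feedback required.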
Figure 3. How the server selection scheme works (mirror sites M1–M6).
We make several reasonable assumptions in our scheme. We assume that
clients grouped in a cell are geographically close to each other and thus
experience similar performance characteristics from each mirror server.
Thus we can probe a sample of clients from each cell to estimate the average
performance of servers to all members of that cell. We further extend the bottleneck-disjoint assumption adopted by many works on mirror sites, namely that downloads from multiple mirror sites should not be bottlenecked by low bandwidth at the receiver end. We assume that packets from the selected
mirror sites for a cell to each client in the cell do not interfere with each
other (the original bottleneck disjoint assumption), and also that the paths
from mirror sites to different cells are approximately disjoint (the latter
assumption will be useful for our next proposal).
Our main contribution here is the observation that using a few mirror sites
with similar rates gives higher speedup than using all mirror sites with
disparate rates, and the proposal of a source negotiated server selection
scheme. Past approaches use client negotiated server selection based on
statistical record-keeping or dynamic probing, which is not scalable to a
large number of clients.
V. SOURCE-NEGOTIATED DATA BLOCKING
We propose a source negotiated data blocking scheme that eliminates
distinctness inefficiency and is scalable to multiple users. First consider one
cell with a set of selected servers. At the sender, a source file is encoded
using Tornado codes. The encoded file is divided into blocks whose sizes
are proportional to the relative rates of the servers for a cell, which was
obtained during the probing described in the previous section. Each server
then carousels only that block of data. This scheme eliminates the
distinctness inefficiency of receiving duplicate packets from different
servers.
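The block division can be sketched as follows. The function name and rounding rule are our own illustration of the idea; the scheme only requires that block sizes be proportional to server rates and that the blocks exactly cover the encoded file.

```python
# Sketch of the data-blocking step: divide the n encoded packets into
# contiguous blocks whose sizes are proportional to the selected servers'
# probed rates, rounding so the blocks exactly cover the file. Each server
# then carousels only its own block, so no two servers send the same packet.

def block_boundaries(n, rates):
    """Return (start, end) packet ranges, one per server, covering [0, n)."""
    total = sum(rates)
    bounds, start, acc = [], 0, 0
    for i, r in enumerate(rates):
        acc += r
        # last block absorbs any rounding slack so coverage is exact
        end = n if i == len(rates) - 1 else round(n * acc / total)
        bounds.append((start, end))
        start = end
    return bounds

print(block_boundaries(1200, [300, 200, 100]))
# -> [(0, 600), (600, 1000), (1000, 1200)]
```

Because every receiver in the cell needs βk of the n packets, each block drains at roughly the same time when block sizes track the servers' rates.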
Figure 4. The source-negotiated data blocking scheme: the encoded file is divided into blocks of relative sizes R1/R, R2/R and R3/R, carouselled by Servers 1, 2 and 3 respectively.
A server defines a separate blocking for each cell it multicasts to. Thus there
may be multiple carousels running on a server. The bottleneck disjoint
assumption detailed in the previous section ensures that the traffic streams from the multiple carousels do not interfere with each other.
Our main contribution here again is the use of a source-negotiated block size
division instead of a receiver controlled scheme, which would not scale to
more than one user.
VI. CONCLUSION
We have considered parallel access to multiple mirror sites and related
many-to-many distribution problems. Our approach introduced protocols for
these problems using erasure codes, based on the idea of a digital fountain.
We suggest that erasure codes are essential for building efficient, scalable
protocols for these problems and that Tornado codes in particular provide
the necessary encoding and decoding performance to make these protocols
viable in practice. We describe a server selection scheme wherein clients are
grouped into cells depending upon their geographical locations, and then
different subsets of all the mirror servers are chosen for each cell, where the
selection process ensures that all the servers chosen for a cell have
approximately similar performances. We demonstrate that this server
selection scheme delivers optimal speedup of download time. Further we
propose a scheme for division of the source blocks amongst the mirror
servers for each cell, which eliminates the distinctness inefficiency. We
therefore believe that our approach for scalable content distribution is highly
efficient and can deliver dramatic performance improvements over existing
techniques for many-to-many distribution of bulk data.
VII. BIBLIOGRAPHY
[1] S. Seshan, M. Stemm, and R. Katz, “SPAND: Shared Passive Network Performance
Discovery,” In Proc. of Usenix Symposium on Internet Technologies and Systems ’97,
Monterey, CA, December 1997.
[2] J. Byers, M. Luby, M. Mitzenmacher, and A. Rege, “A Digital Fountain Approach to
Reliable Distribution of Bulk Data,” In Proc. of ACM SIGCOMM ’98, Vancouver, 1998.
[3] M. Luby, M. Mitzenmacher, A. Shokrollahi, D. Spielman, and V. Stemann, “Practical
Loss-Resilient Codes.” In Proceedings of the ACM Symposium on Theory of Computing,
1997.
[4] L. Rizzo and L. Vicisano, “A Reliable Multicast Data Distribution Protocol Based on
Software FEC Techniques.” In Proc. of HPCS ’97, Greece, June 1997.
[5] A. Myers, P. Dinda and H. Zhang, “Performance Characteristics of Mirror Servers on
the Internet.” In Proc. of IEEE INFOCOM ’99, Anchorage, AK, 1999.
[6] P. Rodriguez, A. Kirpal, and E. W. Biersack, “Parallel-access for mirror sites in the
internet.” In Proc. of IEEE INFOCOM 2000, volume 2, pages 864–873, Tel Aviv, Israel,
Mar. 2000.
[7] M. Luby, J. Gemmell, L. Vicisano, L. Rizzo, J. Crowcroft, and B. Lueckenhoff.
Reliable multicast transport building block: Forward Error Correction codes, March
2000. Work in Progress: draft-ietf-rmt-bb-fec-00.txt.
[8] John Byers, Michael Luby, and Michael Mitzenmacher, "Accessing multiple mirror
sites in parallel: Using tornado codes to speed up downloads," in INFOCOM 99, Apr.
1999.