Scalable Content Distribution with Reliable Multicast from Multiple Mirror Sites using Tornado Codes

EE228A Project Report
Wei Wang and Abhishek Ghose

Abstract— Mirror sites enable client requests to be serviced by any of a number of servers, reducing load at individual servers and dispersing network load. Typically, a client requests service from a single mirror site. We consider enabling a client to access a file from multiple mirror sites in parallel to speed up the download. To eliminate the complex client-server negotiations that a straightforward implementation of this approach would require, we develop a feedback-free protocol based on erasure codes. We demonstrate that a protocol using fast Tornado codes can deliver dramatic speedups at the expense of transmitting a moderate number of additional packets into the network. Our scalable solution extends naturally to allow multiple clients to access data from multiple mirror sites simultaneously, and applies equally well to wireless and satellite networks.

I. INTRODUCTION

Downloading a large file from a heavily loaded server or through a highly congested link can be a painfully slow experience. Similarly, clients who connect to the Internet at modem speeds often suffer through interminable waiting periods to download web pages with graphical content. Efficient techniques are therefore needed for the scalable distribution of popular web content, such as software upgrades and video clips from news sites, to a large number of receivers. Any such technique must satisfy the following criteria:

Reliability – The protocol must recover from network errors and must guarantee delivery of correct data to all receivers.

Speedup – The protocol must use efficient techniques to reduce download times.

Asynchronous Information Retrieval – The protocol must support asynchronous retrieval of a single unit of data by a large number of users, i.e.
it must allow a number of users with overlapping access times to download the same file in parallel.

We propose a novel technique for scalable distribution of popular web content that satisfies these requirements. Our approach consists of the following main components:

Multiple mirror sites – to enable fast download times and to distribute requests amongst multiple servers for load balancing.

Feedback-free multicast protocol – to let the protocol scale to an essentially unlimited number of users.

Tornado codes – to implement the multicast protocol efficiently in practice, using erasure codes with fast encoding/decoding times.

Digital fountain – to provide a generic framework for asynchronous information retrieval by a large number of users.

The rest of the paper is organized as follows. In Section II, we provide theoretical background on the issues involved in designing a feedback-free multicast protocol. In Section III, we outline our proposal and discuss related work. In Section IV, we describe and analyze our scheme for optimal server selection in detail. In Section V, we describe our technique of source blocking, which eliminates distinctness inefficiency. Finally, we state our conclusions and give some suggestions for future work in Section VI.

II. THEORETICAL BACKGROUND

II.A Mirror Sites

The web technology most relevant to our proposal is the use of mirror sites: a number of servers replicating the same data at geographically distributed locations, in an effort both to distribute the load of requests across servers and to make network connections shorter, thereby reducing network traffic. A limitation of current mirroring technology is that the user must choose a single mirror site from which to access the data. Optimal server selection is a widely studied problem in the literature.
While the choice of server may appear obvious when the number of mirror sites is small, the work of [1], [5] indicates that the obvious choice is not always the best one, and that dramatic performance improvements can result from more careful selection. The selection process can be automated by methods such as statistical record-keeping [2] or dynamic probing. In statistical record-keeping, each client maintains a database recording the download times of previous attempts from each mirror server; this record is used to predict the future performance of each server and to choose the optimal one.

II.B The Need for a Feedback-free Multicast Protocol

Our first objective is to enable users to download data from multiple mirror sites in parallel in order to reduce download time. This technique not only has the potential to improve performance substantially over a single-server approach, but can also eliminate the need for a complex selection process. For some applications, such as software distribution, mirrored data may be requested by a vast number of autonomous clients whose access intervals may overlap. In such a situation, rather than having each client establish a set of point-to-point connections to multiple mirror sites in parallel, we propose a more general system in which mirror sites establish multicast groups to transmit the data. Provided that the transmissions of these multicast groups are synchronized so as to keep duplicate transmissions to a minimum, a client can subscribe to several multicast groups in parallel to retrieve the data more quickly. The use of multicast provides scalability to an unlimited number of clients, each of which may subscribe to a different subset of the multicast groups. As an example, such a system could be used by employees of an international conglomerate to download daily financial statements via their corporate intranet.
Each of a number of geographically distributed mirror sites would establish a multicast session to distribute the information. Transparent access from thousands of clients to multiple mirror sites would be an effective mechanism to speed up download times and to balance load, especially away from heavily accessed servers at peak usage times. Though this approach is conceptually simple, it is not obvious how to derive a simple, workable implementation around it that delivers superior performance. While unicast protocols successfully use receiver-initiated requests for retransmission of lost data to provide reliability, it is widely known that the multicast analog of this solution does not scale. For example, consider a server distributing a new software release to thousands of clients. As clients lose packets, their requests for retransmission can quickly overwhelm the server, a process known as feedback implosion. Even if the server can handle the requests, the retransmitted packets are often of use to only a small subset of the clients. Moreover, whereas adaptive retransmission solutions are at best unscalable and inefficient on terrestrial networks, they are unworkable on wireless and satellite networks, where the back channel typically has high latency and limited capacity, if it is available at all. The problems with solutions based on adaptive retransmission have led many researchers to consider applying Forward Error Correction based on erasure codes to reliable multicast [2], [4], [7]. The basics of erasure codes are outlined in the next section.

II.C Erasure Codes

The basic principle behind the use of erasure codes [3] is that the original source data, in the form of a sequence of k packets, is transmitted by the sender along with l additional redundant packets. This n (= k + l) packet encoding has the property that the initial file can be reconstituted from any k-packet subset of the encoding.
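To make the "any k of n" property concrete, the toy sketch below (ours, not part of the proposal) implements a standard Reed-Solomon-style erasure code by polynomial evaluation over the prime field GF(65537); for simplicity, each "packet" carries a single field symbol rather than a real payload.

```python
# Toy "any k of n" erasure code over GF(P): Reed-Solomon style, NOT a Tornado
# code.  The k data symbols are the coefficients of a degree-(k-1) polynomial;
# packet i carries its evaluation at the point x = i + 1.
P = 65537  # prime modulus; each symbol is one value below P

def encode(data, l):
    """Encode k data symbols into n = k + l packets of the form (index, value)."""
    k = len(data)
    return [(i, sum(c * pow(i + 1, j, P) for j, c in enumerate(data)) % P)
            for i in range(k + l)]

def decode(packets, k):
    """Recover the k data symbols from ANY k received packets (any order)."""
    pts = [(i + 1, v) for i, v in packets[:k]]
    coeffs = [0] * k
    for x_j, y_j in pts:
        # Expand the Lagrange basis polynomial for x_j into coefficients.
        basis, denom = [1], 1
        for x_m, _ in pts:
            if x_m == x_j:
                continue
            denom = denom * (x_j - x_m) % P
            basis = [(a - x_m * b) % P             # multiply basis by (x - x_m)
                     for a, b in zip([0] + basis, basis + [0])]
        scale = y_j * pow(denom, P - 2, P) % P     # divide via Fermat inverse
        for d in range(k):
            coeffs[d] = (coeffs[d] + scale * basis[d]) % P
    return coeffs
```

Any k of the n index-value pairs reconstruct the data exactly, but the interpolation-based decoding shown here has quadratic cost; that cost is precisely what motivates the Tornado codes discussed below.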
For the application of reliable multicast, the source transmits packets from this encoding, and the encoding property ensures that different receivers can recover from different sets of lost packets, provided each receives a sufficiently large subset of the transmission. In principle, this idea can greatly reduce the number of retransmissions, as a single retransmission of redundant data can potentially benefit many receivers simultaneously. In applications such as real-time video, retransmission may also be undesirable due to timing constraints. To enable parallel access to multiple mirror sites, the sources each transmit packets from the same encoding, and the encoding property ensures that a receiver can recover the data once it receives a sufficiently large subset of the transmitted packets, regardless of which server the packets came from.

Figure 1: How erasure codes work.

Tornado Codes

Ideally, using erasure codes, a receiver could gather an encoded file in parallel from multiple sources, and as soon as any k packets arrive from any combination of the sources, the original file could be reconstituted efficiently. In practice, however, designing a system with this ideal property and with very fast encoding and decoding times appears difficult. Hence, although other erasure codes could be used in this setting, we suggest that a newly developed class of erasure codes called Tornado codes is best suited to this application, as Tornado codes have extremely fast encoding and decoding algorithms [3]. Indeed, these codes have previously been shown to be more effective than standard erasure codes for reliable multicast transmission of large files. The price for the fast encoding and decoding is that slightly more than k distinct packets are required to reconstitute the message. Tornado codes relax the decoding guarantee as follows: to reconstruct the source data, it is necessary to recover εk of the n encoding packets, where ε is slightly greater than 1.
We then say that ε is the decoding inefficiency. The advantage of Tornado codes over standard codes (which do not relax the decoding guarantee and have ε equal to 1) is that Tornado codes trade a small increase in decoding inefficiency for a substantial decrease in encoding and decoding times, making them far more attractive for large values of k and l. As with standard codes, the Tornado decoding algorithm can detect when it has received enough encoding packets to reconstruct the original file. Thus, a client can perform decoding in real time as the encoding packets arrive, and reconstruct the original file as soon as it determines that sufficiently many packets have arrived. While fast implementations of standard codes have quadratic encoding and decoding times for this application, Tornado codes have encoding and decoding times proportional to (k + l) ln(1/(ε − 1)). Moreover, Tornado codes are simple to implement and use only exclusive-or operations. A comparison between the properties of Tornado codes and standard codes is given in Table I. In our project, we use Tornado codes as the basis for our proposed approach.

Table I: Comparison of Tornado codes and standard erasure codes.

II.D Digital Fountain

In this section, we outline an idealized solution to the problem of downloading from multiple mirrors, based on a similar approach to reliable multicast presented in [2]. A client wishes to obtain a file consisting of k packets from a collection of mirrored servers. In the idealized solution, each server sends out a stream of n distinct packets, called encoding packets, which constitute an encoding of the file. A client accepts encoding packets from all servers in the collection until it obtains exactly k packets. In this idealized solution, the file can then be reconstructed regardless of which k encoding packets the client obtains. Therefore, once the client receives any k encoding packets, it can disconnect from the servers.
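In practice, the stream behind such a fountain would be produced Tornado-style using only exclusive-or operations: each encoding symbol is the XOR of a subset of source symbols, and the decoder repeatedly "peels" any received symbol whose equation has exactly one unknown left. The sketch below is our own toy illustration of that peeling step; the three-symbol graph in the test is hand-picked, whereas a real Tornado code uses a carefully designed sparse random graph.

```python
def peel_decode(k, received):
    """Recover k source symbols by iterative peeling, using only XORs.

    `received` is a list of (neighbors, value) pairs: each encoding symbol's
    value is the XOR of the source symbols indexed by its neighbor set.
    Returns the recovered source symbols, or None if peeling gets stuck
    (i.e. too few useful encoding symbols have arrived).
    """
    source = [None] * k
    eqs = [[set(nbrs), val] for nbrs, val in received]
    changed = True
    while changed:
        changed = False
        for nbrs, val in eqs:
            if len(nbrs) == 1:                  # degree-one equation: solve it
                (i,) = nbrs
                if source[i] is None:
                    source[i] = val
                    changed = True
        if changed:                             # substitute recovered symbols
            for eq in eqs:
                known = {i for i in eq[0] if source[i] is not None}
                for i in known:
                    eq[1] ^= source[i]
                eq[0] -= known
    return source if all(s is not None for s in source) else None
```

The decoder detects on its own whether enough symbols have arrived (returning None otherwise), which is what allows a client to disconnect as soon as reconstruction succeeds.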
We assume that this solution requires negligible processing time by the servers to produce the encoding packets, and by the client to merge the data received from the various servers and recover the source data from the received encoding packets. A consequence of this approach is that no feedback from the receiver to the senders is needed aside from initiating and terminating the connections; in particular, no renegotiation is ever necessary. The stream of encoding packets produced by a server in this idealized solution is metaphorically described as a digital fountain, as was done in a similar approach to reliable bulk transfer presented in [2]. The digital fountain has properties similar to a fountain of water for quenching thirst: drinking a glass of water, irrespective of the particular drops that fill the glass, quenches one's thirst. The digital fountain concept is quite general and can be applied in diverse network environments, for example the Internet, satellite networks, and wireless networks with mobile agents. These environments differ greatly in their packet loss characteristics, congestion control mechanisms, and end-to-end latency.

II.E Performance Metrics

The main benefit of our approach is the reduction in download time due to the parallel access of data from multiple mirror sites. The main cost is bandwidth, in the form of a moderate number of additional packets injected into the network at each of the mirror sites. Accordingly, our performance metrics reflect these factors:

Speedup – the factor by which the download time is decreased compared to the single mirror site case.

Reception inefficiency – defined to be β if the receiver receives a total of βk packets from the servers before it can reconstruct the file.
The reception inefficiency comprises two separate sources of inefficiency: the decoding inefficiency ε of Tornado codes outlined in a previous section, and the distinctness inefficiency due to the arrival of duplicate packets. We define the distinctness inefficiency to be δ, where δ is the average number of copies of each packet received by the receiver prior to reconstruction. The reception inefficiency is then just the product of the decoding inefficiency of the Tornado code and the distinctness inefficiency, that is, β = εδ. For example, a decoding inefficiency of ε = 1.05 combined with a distinctness inefficiency of δ = 1.10 yields a reception inefficiency of β = 1.155.

III. SOLUTION TO THE FAST ASYNCHRONOUS DOWNLOAD PROBLEM

The general framework we use to solve the fast, asynchronous information retrieval problem comprises the following components: multiple mirror sites transmit to each receiver in parallel for a significant speedup of download times; each mirror site encodes the source data using Tornado codes to ensure reliability against erasures with no feedback and fast encoding/decoding; and each mirror site carousels and multicasts the encoded data to allow scalable asynchronous retrieval by many receivers.

III.A Proposal

Here we outline the two main components of our proposal that improve performance over past schemes. In the next two sections we detail each component and demonstrate the performance gains we claim here. Past work by Byers et al. [8] proposed that each receiver download data from all mirror sites in parallel for speedup gains over downloading from a single site. However, that argument assumed equal download rates from all servers, which is not applicable in realistic situations. In addition, such a scheme forfeits the original benefits of mirror sites: load balancing, and the distribution and minimization of network traffic. Previous proposals for mirror site selection [6] required burdensome network performance monitoring and feedback by the receiver to control the data sent by different servers.
Clearly, receiver feedback and receiver control of the mirror sites do not scale to multiple users. We propose that optimal speedup of download times can be obtained by a source-negotiated mirror site selection scheme, in which each receiver joins the multicast groups of only a subset of all mirror sites. Byers et al. proposed the use of multiple mirror sites that carousel the same encoded data, such that a receiver need only receive k distinct packets from any of the sources to decode the source file. However, receiving duplicate packets from different mirror sites does not further the receiver's ability to decode; duplicate packets waste bandwidth and increase network traffic unnecessarily. To combat this problem, Byers et al. permute the order of the encoded packets independently at each mirror site for a random offset effect. This method still leads to some distinctness inefficiency. They also discuss receiver-controlled offsets for different mirror sites, which require feedback from the receivers and do not scale to multiple receivers (who may want different offsets). We propose a source-negotiated data blocking scheme that eliminates the distinctness inefficiency.

IV. SOURCE-NEGOTIATED SERVER SELECTION

Simulations from the Byers study demonstrate that if packets arrive at equal rates from all senders, then with a moderate stretch factor (for example, 3 or 4), adding additional sources dramatically speeds up downloads with minimal additional reception inefficiency. However, their own simulations show that in the more likely case that packets arrive at different rates from different senders, having more senders is not necessarily better. Their results clearly show that two senders with equal rates give higher speedup than three senders with disparate rates. This suggests that higher speedup can be obtained by using a few mirror sites with similar rates, rather than using all the mirror sites with disparate rates.
It is worth noting that the results of an independent study by Rodriguez et al. [6], with a different experimental setup, also illustrate the highly disparate rates from different senders and the gain from using a subset of the senders.

Figure 2. This figure is taken from Byers et al. [8] and is replicated here solely for the convenience of the reader. The graph on the left shows that with a high stretch factor (more redundant encoded source packets), the number of duplicate packets received from different mirror sites (the distinctness inefficiency) is lower (note that the average decoding inefficiency was held constant at 5.5%). Since our second proposal eliminates distinctness inefficiency, we do not analyze this graph further. The graph on the right shows that if packets arrive from each sender at an equal rate, then additional sources offer higher speedups compared to a single sender. However, consider the case when packets arrive from different senders at disparate rates: the graph clearly shows that two senders with equal rates give higher speedup than three senders with different rates.

In addition, server selection recaptures the benefit of mirror sites in minimizing the length of network connections, thereby geographically distributing and minimizing network traffic. The client-negotiated server selection schemes in the literature do not scale to multiple users and require high overhead at the receiver to monitor server rates. We propose a source-negotiated server selection scheme that allows each receiver to join the multicast groups of a subset of all mirror sites to derive optimal speedup. Myers et al. found that, "Though mirror server performances (i.e. fetch times) vary widely, there are only small time-independent variations in the relative performance (ranks) of the mirror servers." [5] This justifies periodically probing server performances to derive rankings.

Our source-negotiated server selection scheme works as follows. Define geographical cells grouping users that are close to each other. Each server probes a sample of clients in each cell to obtain an estimate of the download time from that server to members of the cell. The servers then share this information and derive a ranked list of server rates to each cell. A server prunes itself from the list of servers for a cell if its performance is worse than a threshold; the threshold is chosen such that the selected servers for each cell have similar rates (i.e. lie within a bound). Members of a cell then join only the multicast groups of the servers selected for that cell. The probing process is repeated periodically.

Figure 3. How the server selection scheme works.

We make several reasonable assumptions in our scheme. We assume that clients grouped in a cell are geographically close to each other and thus experience similar performance characteristics from each mirror server; we can therefore probe a sample of clients from each cell to estimate the average performance of the servers to all members of that cell. We further develop the bottleneck disjoint assumption adopted by many works on mirror sites, namely that downloads from multiple mirror sites should not be bottlenecked by low bandwidth at the receiver end. We assume that packets from the selected mirror sites for a cell to each client in the cell do not interfere with each other (the original bottleneck disjoint assumption), and also that the paths from the mirror sites to different cells are approximately disjoint (the latter assumption will be useful for our next proposal). Our main contribution here is the observation that using a few mirror sites with similar rates gives higher speedup than using all mirror sites with disparate rates, together with the proposal of a source-negotiated server selection scheme.
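The steps above, together with the proportional block division of Section V, can be sketched as follows. This is our own illustration: the per-cell probed rates and the threshold, expressed here as a fraction of the best server's rate, are assumed parameterizations rather than choices made in the paper.

```python
def select_servers(probed_rates, bound=0.5):
    """Prune servers whose probed rate to this cell falls below a threshold,
    here a fraction `bound` of the best server's rate, so that the servers
    kept for the cell have similar rates."""
    best = max(probed_rates.values())
    return {s: r for s, r in probed_rates.items() if r >= bound * best}

def assign_blocks(n, selected):
    """Split n encoding packets into disjoint blocks proportional to the
    selected servers' rates (Section V); each server carousels only its own
    block, so a receiver never sees the same packet from two servers."""
    servers = sorted(selected)
    total = sum(selected.values())
    cuts, acc = [0], 0.0
    for s in servers:
        acc += selected[s] / total
        cuts.append(round(acc * n))   # cumulative rounding keeps a partition
    return {s: range(cuts[i], cuts[i + 1]) for i, s in enumerate(servers)}
```

Because every packet index is carouselled by exactly one selected server per cell, the distinctness inefficiency is 1 by construction.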
Past approaches use client-negotiated server selection based on statistical record-keeping or dynamic probing, which does not scale to a large number of clients.

V. SOURCE-NEGOTIATED DATA BLOCKING

We propose a source-negotiated data blocking scheme that eliminates distinctness inefficiency and is scalable to multiple users. First, consider one cell with a set of selected servers. At the sender, the source file is encoded using Tornado codes. The encoded file is divided into blocks whose sizes are proportional to the relative rates of the servers for the cell, as obtained during the probing described in the previous section: a server with rate Ri carousels a block containing a fraction Ri/R of the encoded packets, where R is the total rate of the selected servers. Each server then carousels only that block of data. Since the blocks are disjoint, this scheme eliminates the distinctness inefficiency caused by receiving duplicate packets from different servers.

Figure 4. The source-negotiated data blocking scheme.

A server defines a separate blocking for each cell it multicasts to; thus there may be multiple carousels running on a server. The bottleneck disjoint assumption detailed in the previous section ensures that traffic from the multiple carousels does not interfere. Our main contribution here, again, is the use of a server-negotiated block size division instead of a receiver-controlled scheme, which would not scale to more than one user.

VI. CONCLUSION

We have considered parallel access to multiple mirror sites and related many-to-many distribution problems. Our approach introduced protocols for these problems using erasure codes, based on the idea of a digital fountain. We suggest that erasure codes are essential for building efficient, scalable protocols for these problems, and that Tornado codes in particular provide the encoding and decoding performance necessary to make these protocols viable in practice.
We describe a server selection scheme wherein clients are grouped into cells according to their geographical locations, and a different subset of the mirror servers is chosen for each cell, where the selection process ensures that all the servers chosen for a cell have approximately similar performance. We demonstrate that this server selection scheme delivers optimal speedup of download time. Further, we propose a scheme for dividing the encoded source blocks amongst the mirror servers for each cell, which eliminates the distinctness inefficiency. We therefore believe that our approach to scalable content distribution is highly efficient and can deliver dramatic performance improvements over existing techniques for many-to-many distribution of bulk data.

VII. BIBLIOGRAPHY

[1] S. Seshan, M. Stemm, and R. Katz, "SPAND: Shared Passive Network Performance Discovery," In Proc. of the Usenix Symposium on Internet Technologies and Systems '97, Monterey, CA, December 1997.
[2] J. Byers, M. Luby, M. Mitzenmacher, and A. Rege, "A Digital Fountain Approach to Reliable Distribution of Bulk Data," In Proc. of ACM SIGCOMM '98, Vancouver, 1998.
[3] M. Luby, M. Mitzenmacher, A. Shokrollahi, D. Spielman, and V. Stemann, "Practical Loss-Resilient Codes," In Proc. of the ACM Symposium on Theory of Computing, 1997.
[4] L. Rizzo and L. Vicisano, "A Reliable Multicast Data Distribution Protocol Based on Software FEC Techniques," In Proc. of HPCS '97, Greece, June 1997.
[5] A. Myers, P. Dinda, and H. Zhang, "Performance Characteristics of Mirror Servers on the Internet," In Proc. of IEEE INFOCOM '99, Anchorage, AK, 1999.
[6] P. Rodriguez, A. Kirpal, and E. W. Biersack, "Parallel-Access for Mirror Sites in the Internet," In Proc. of IEEE INFOCOM 2000, volume 2, pages 864-873, Tel Aviv, Israel, March 2000.
[7] M. Luby, J. Gemmell, L. Vicisano, L. Rizzo, J. Crowcroft, and B. Lueckenhoff, "Reliable Multicast Transport Building Block: Forward Error Correction Codes," March 2000.
Work in progress: draft-ietf-rmt-bb-fec-00.txt.
[8] J. Byers, M. Luby, and M. Mitzenmacher, "Accessing Multiple Mirror Sites in Parallel: Using Tornado Codes to Speed Up Downloads," In Proc. of IEEE INFOCOM '99, April 1999.