Ridin' The Storm Out: Two Articles on Electronic Wind and Digital Water
Presented by: Chris ("CJ") Janneck, 11/12/01

Articles Discussed
- Byers et al., 1999: "Accessing Multiple Mirror Sites in Parallel: Using Tornado Codes to Speed Up Downloads"
- Byers et al., 1998: "A Digital Fountain Approach to Reliable Distribution of Bulk Data"
- We'll talk about the Tornado Codes paper first.

Obstacles to Overcome
- Presently, (good) mirroring works by selecting a lightly-loaded, uncongested server, and all data is received from there.
- We want to be able to download from multiple sites in parallel.
- Multicast is a technology for broadcasting data to many users simultaneously.
- Often, in these situations, feedback from the receiver is not available, or is highly unreliable and takes a fair amount of time.

How We Can Accomplish This
- Use a modification of "erasure codes" (a.k.a. Forward Error Correction codes).
- An erasure code takes a file of k packets and encodes it into n packets such that the original file can be reconstituted from any k-packet subset.
- Tornado Codes are very similar to standard erasure codes, but they require slightly more than k packets to reconstitute the data. (A toy XOR sketch appears at the end of this article's outline.)

Inadequacy of a Basic Scheme (why this is no walk in the park)
- Ideally, a server broadcasts packets until a receiver gets k of them; if there are problems, renegotiate.
- Realistically, renegotiation is difficult, if it is even possible.
- Several renegotiations may be necessary.
- Different transmission rates and sources complicate matters further.

Enter: the "Digital Fountain" Metaphor
- A constant flow of data is available, and regardless of which k packets are received, any k packets can be used to recreate the data.
- Much like a fountain: it does not matter which drops of water you drink to quench your thirst.

Tornado Vocab
- "Stretch Factor" (the ratio n/k of encoded packets to original packets)
- "Decoding Inefficiency" (distinct packets needed to decode, divided by k)
- "Distinctness Inefficiency" (packets received, counting duplicates, divided by distinct packets received)
- "Reception Inefficiency" (the product of the previous two: total packets received divided by k)

Results of Tests
- Figure 1 – Number of Senders: as the number of senders increases, Reception Inefficiency and Speedup/Stretch increase.
- Figure 2 – Transmission Rate of Senders: as the difference in transmission rates increases, Reception Inefficiency and Speedup decrease.
- Figure 3 – Transmission Rate and Number of Senders: as transmission rates equalize and the number of senders increases, Reception Inefficiency and Speedup increase.
- Figure 4 – Decoding Inefficiency: the "average" decoding inefficiency is a good approximation of the true behavior.

That's nice, but does it scale?
- Figure 5 – Number of Senders: as the number of receivers grows exponentially from 100 to 100,000, the Reception Inefficiency changes very little, whether there are 2, 3, or 4 senders.

Possible Enhancements: Synchronization of Senders
- Integrate some level of renegotiation: the receiver requests a start and end offset from each sender.
- This guarantees that, for some period of time, no duplicates are sent.
- Reliability is not a major concern, because only about k*epsilon extra packets are needed.
- Drawback: it doesn't scale well, so it is not a good fit for many-to-many distribution.

Conclusions
- "The variety of protocols and tunable parameters we describe demonstrates that one can develop a spectrum of solutions that trade off the necessary feedback, the bandwidth requirements, the download time, the memory requirements, and the complexity of the solution in various ways."

Questions
- If multiple senders sending at the same time is the "worst-case" scenario, then why does it attain the best speedup?
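Aside (added here for illustration, not from the slides): the "any k of n packets" idea behind erasure codes is easiest to see with the XOR operations that Tornado codes are built from. The toy Python sketch below adds a single XOR parity packet to k data packets, which lets a receiver repair any one lost packet; Tornado codes layer many such sparse XOR constraints so that roughly (1 + epsilon) * k received packets suffice. The helper names and packet contents are invented for the example and are not from the papers.

    # Toy illustration of XOR-based erasure coding (not the Tornado
    # construction itself): k data packets plus one parity packet,
    # which can repair any single loss.

    def xor_bytes(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    def encode(data_packets):
        """Return the k data packets followed by one parity packet (XOR of all)."""
        parity = bytes(len(data_packets[0]))
        for p in data_packets:
            parity = xor_bytes(parity, p)
        return data_packets + [parity]

    def recover(received, n):
        """received maps packet index -> payload for whichever of the n encoded
        packets arrived; XOR-ing everything that arrived rebuilds one missing packet."""
        missing = [i for i in range(n) if i not in received]
        if not missing:
            return received
        if len(missing) > 1:
            raise ValueError("one parity packet can only repair a single loss")
        patch = bytes(len(next(iter(received.values()))))
        for p in received.values():
            patch = xor_bytes(patch, p)
        repaired = dict(received)
        repaired[missing[0]] = patch
        return repaired

    # k = 3 data packets, n = 4 after encoding; suppose packet 1 is lost in transit.
    packets = [b"AAAA", b"BBBB", b"CCCC"]
    encoded = encode(packets)
    arrived = {i: p for i, p in enumerate(encoded) if i != 1}
    assert recover(arrived, len(encoded))[1] == b"BBBB"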
The Other Article
- "A Digital Fountain Approach to Reliable Distribution of Bulk Data"

Digital Fountain
- Definition (from the abstract): "A digital fountain allows any number of heterogeneous clients to acquire bulk data with optimal efficiency at times of their choosing."
- (A sketch of the client's reception loop appears at the end of this outline.)

Problem Domain – What are we trying to fix? (And, why, if it ain't broken?)
- The task: distributing data to many recipients.
- Unicast protocols are not scalable.
- The multicast analog of these unicast protocols suffers from "feedback implosion".

Remnants from the Other Side
- Use of a "data carousel" to ensure full reliability; this is limited on lossy networks, since a missed packet means waiting for the next cycle.
- Use of "erasure codes"; standard Reed-Solomon codes have some deficits.
- Therefore, use a digital fountain instead.

Wanted: One Single, Reliable Protocol
Must be:
- Reliable: delivers the data in its entirety.
- Efficient: should not take much longer, nor require more effort, than other approaches.
- On Demand: able to handle recipients at any time, whenever they want the data.
- Tolerant: can take on some slower and some faster clients, and handle "swinging".

Tornado Review
- Definitions used again: "Stretch Factor", encoding/decoding times/overhead, redundancy/overlap.

Encoding/Decoding Times
- Tornado Codes use only exclusive-or (XOR) operations.
- Tables 2 and 3 pretty much speak for themselves.

Simulation/Comparison Tests (Tornado vs. interleaved codes)
- What happens when reception efficiency is set comparable to Tornado codes? Table 4 shows the speedup of Tornado over interleaved codes.
- What happens when decoding time is set comparable to Tornado codes? Figure 4 shows the reception efficiency of Tornado codes and interleaved codes.

Implementation – Part I
- Used "layered" multicast techniques.
- Encoded data is sent in "rounds"; packets are sent in correspondence with the "One Level Property".
- The server generates periodic bursts.
- Congestion is simulated/controlled by "synchronization points" (SPs).

Implementation – Part II
- Servers and clients were set up at 3 locations: Berkeley, Carnegie Mellon University, and Cornell University.
- The server runs 2 threads: a multicast transmission thread, and a UDP unicast thread for control information.
- The client implements a statistical approach: it does not perform preliminary decoding before all packets arrive.

Results
- Quite pleased with the experimental results.
- Efficiency is under 100% at low loss levels (~13%), due to switching between subscription levels.
- Distinctness efficiency could be raised by using larger stretch factors.

Conclusions
- Could bring about new opportunities for reliable multicast protocols.
- Other applications: enhanced mirrored data, and dispersity routing of end-to-end data in a packet-routing network.

Questions – Round 2
- This team seems heavily concerned with "efficient" data transmission, but how does this compare (practically, time-wise) to traditional methods?
- Is the much faster decoding enough to overlook a little data duplication?
- What are some inherent concerns in widespread multicasting?
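Aside (added here for illustration, not from the slides): a minimal sketch of the digital-fountain idea from the receiver's point of view. It assumes a receive_packet() helper that yields (sequence number, payload) pairs from the multicast stream and a tornado_decode() helper that rebuilds the file once roughly (1 + epsilon) * k distinct encoded packets are on hand; both are placeholders, not APIs from the paper. The point is simply that the client sends no per-packet feedback and can join or leave at any time.

    # Sketch of a digital-fountain reception loop: listen until roughly
    # (1 + epsilon) * k distinct encoded packets have arrived, then decode.

    def fetch_file(k, epsilon, receive_packet, tornado_decode):
        needed = int((1 + epsilon) * k)         # allow for decoding inefficiency
        collected = {}                          # sequence number -> payload
        while len(collected) < needed:
            seq, payload = receive_packet()     # next packet from the multicast stream
            collected.setdefault(seq, payload)  # duplicates cost time, not correctness
        return tornado_decode(collected)        # any large-enough distinct subset works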