Bullet: High Bandwidth Data Dissemination Using an Overlay Mesh

Introduction
- Given a sender and a large set of receivers spread across the Internet, how can we maximize bandwidth?
- Problem domains:
  - Software or video distribution
  - Real-time multimedia streaming

Existing Solutions
- IP multicast does not consider bandwidth when constructing its distribution tree
- A promising alternative: overlays
  - Attempt to mimic the multicast routing trees
  - Instead of high-speed routers, use programmable end hosts as interior nodes in the overlay tree

Existing Solutions
- A tree structure is problematic
  - Bandwidth decreases as you move down the tree
  - Any loss high up the tree reduces the bandwidth lower down the tree
  - A node's bandwidth is limited by its single parent

A New Approach
- Transmit disjoint data sets to various points in the network
- A node downloads from multiple sources rather than a single parent
- Higher reliability

Conceptual Model
[Figure: animation frames contrasting the conventional tree model with Bullet; a root sends to nodes A and B over 1 Mbps links, and in Bullet the nodes also exchange data with each other]

Properties
- TCP friendly: must respond to congestion signals
- Low control overhead: probing for resources, locating multiple downloading sources
- Decentralized and scalable: no global knowledge
- Robust to failures, even high up in the tree

Bullet Overview
- Uses meshes as opposed to trees
- Bandwidth independent of the underlying overlay tree
- Evaluated with a 1,000-node overlay on 20,000-node INET-generated topologies
- Up to 2x bandwidth improvement over a bandwidth-optimized tree
- Overhead of 30 Kbps

System Components
- Split the data into packet-sized objects
- Disseminate disjoint objects to clients at a rate determined by the bandwidth to each client
- Nodes need to locate and retrieve disjoint data from their peers
  - Periodically exchange summary tickets
  - Minimize overlapping objects from each peer

Illustration
[Figure: the source S holds objects 1-7; receivers A-E each hold different subsets of those objects and retrieve the ones they are missing from their peers]

Data Encoding
- Multimedia: MDC encoding
- Large files: erasure encoding (Tornado codes)
  - Only need to locate about 5% extra packets to reconstruct the original message
  - Faster encoding and decoding times

RanSub
- Distributes random subsets of participating nodes
- During the collect phase, each node sends a random subset of its descendant nodes up the tree
- During the distribute phase, each node sends a random subset of the collected nodes down the tree

Informed Content Delivery Techniques
- Use summary tickets
  - A summary ticket is an array with array[i] = hash_i(working set)
- Check ticket elements against a Bloom filter
  - False positives are possible
  - B may not send a packet to A even though A is missing it

TCP Friendly Rate Control (TFRC)
- TCP halves the sending rate as soon as one packet loss is detected: too severe a reaction
- TFRC is based on loss events (packet drops within one round-trip time count as a single event)

TCP Friendly Rate Control (TFRC)
- Bullet eliminated retransmission from TFRC
  - Easier to recover from other sources than from the initial sender
- TFRC does not aggressively seek newly available bandwidth like TCP does

Bullet
- Layers a mesh on top of an overlay tree to increase overall bandwidth

Finding Overlay Peers
- RanSub periodically delivers subsets of uniformly randomly selected nodes
  - Via summary tickets (120 bytes per node)
- Each node's working set is associated with a Bloom filter
- Nodes peer with those that have the lowest similarity ratio
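The summary-ticket comparison used for peer selection can be illustrated in a few lines. The sketch below is a minimal illustration under assumptions: a min-hash style ticket of 30 four-byte entries (roughly matching the 120 bytes quoted above), SHA-1 as the hash, and illustrative function names (summary_ticket, similarity_ratio, choose_peers) that are not taken from the Bullet implementation.

```python
import hashlib

def summary_ticket(working_set, entries=30):
    """Build a fixed-size ticket: entry i is the minimum of hash_i over the working set."""
    ticket = []
    for i in range(entries):
        ticket.append(min(
            int(hashlib.sha1(f"{i}:{obj}".encode()).hexdigest()[:8], 16)  # 4-byte value
            for obj in working_set
        ))
    return ticket

def similarity_ratio(ticket_a, ticket_b):
    """Fraction of positions where two tickets agree; estimates working-set overlap."""
    matches = sum(a == b for a, b in zip(ticket_a, ticket_b))
    return matches / len(ticket_a)

def choose_peers(my_ticket, candidate_tickets, count=10):
    """candidate_tickets: list of (node, ticket). Prefer the least similar candidates."""
    ranked = sorted(candidate_tickets, key=lambda nt: similarity_ratio(my_ticket, nt[1]))
    return [node for node, _ in ranked[:count]]
```

Two nodes with a low similarity ratio probably hold mostly different objects, so pairing with them maximizes the amount of useful data each peer can supply.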
Recovering Data from Peers
- A receiver assigns a portion of the sequence space to each of its senders, to avoid duplication among senders
- A receiver periodically updates each sender with its current Bloom filter and the range of sequences covered by that Bloom filter
- Less than 10% of all received packets are duplicates

Making Data Disjoint
- Given a randomly chosen subset of peer nodes, each node is about equally likely to have a particular data packet
- A parent decides what portion of its data is sent to each child
  - A function of limiting and sending factors

Making Data Disjoint
- The portion of data a child should own is proportional to
  - The number of its descendants
  - Its bandwidth
- If there is not enough bandwidth, each child receives a completely disjoint data stream

Making Data Disjoint
- If there is ample bandwidth, each child receives the entire parent stream
- (See the sketch at the end of this outline)

Improving the Bullet Mesh
- What can go wrong
  - Not enough peers
  - Constantly changing network
- Use trial senders and receivers
- Bullet periodically evaluates the performance of its peers
- Replaces the worst performing sender/receiver

Evaluation
- Used Internet environments and ModelNet IP emulation
- Deployed on PlanetLab
- Built on MACEDON
  - Specifies the overlay algorithms
  - Core logic under 1,000 lines of code

Evaluation
- ModelNet experiments
  - 50 2-GHz Pentium 4 machines running Linux 2.4.20
  - 100 Mbps and 1 Gbps Ethernet switches
  - 1,000 emulated overlay instances
  - 20,000-node INET-generated topologies

Offline Bottleneck Bandwidth Tree
- Given global knowledge, what is the overlay tree that delivers the highest bandwidth to a set of overlay nodes?
- Finding a tree with a maximum bottleneck is NP-hard in general

Offline Bottleneck Bandwidth Tree
- Assumptions
  - The path between two overlay nodes is fixed
  - The overlay tree uses TCP-friendly unicast connections to transfer data point-to-point
  - In the absence of other flows, the throughput of a TCP-friendly flow can be estimated with a steady-state formula
  - When n flows share a link, each flow achieves at most 1/n of the total throughput

Bullet vs. Streaming
- The maximum bottleneck tree achieves 5x the bandwidth of random trees
- Bullet outperforms the bottleneck tree by up to 100%

Creating Disjoint Data
- Without disjoint transmission of data, Bullet's performance degrades by 25%

Epidemic Approaches
- Bullet is 60% better than anti-entropy (gossiping) approaches
- Epidemic algorithms produce an excessive number of duplicate packets

Bullet on a Lossy Network
- Bullet achieves 2x the bandwidth of maximum bottleneck trees

Performance Under Failure
- With failure detection/recovery disabled
  - 30% performance degradation when a child directly under the root fails
- With failure detection/recovery enabled
  - Negligible disruption in performance

PlanetLab
- 47 nodes used for deployment
- Similar results

Related Work
- Kazaa
- BitTorrent
  - Perpendicular downloads
  - Does not use erasure codes
  - Bandwidth consuming
  - Centralized tracker
- FastReplica
  - Not bandwidth aware

Related Work
- Scalable Reliable Multicast
- Epidemic approaches
  - Difficult to configure
  - Do not avoid duplicates
- Narada
  - Uses overlay meshes
  - Bandwidth still limited by the parent

Related Work
- Overcast
- SplitStream
  - More heavyweight when nodes leave a tree
  - Not bandwidth aware
- CoopNet
  - Centralized
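The "Making Data Disjoint" slides describe the proportional split in prose. The sketch below is a rough illustration of that idea only, not the exact sending/limiting-factor computation in Bullet; the function name assign_stream and the fields descendants and bandwidth are assumptions made for the example.

```python
def assign_stream(children, parent_rate):
    """Sketch: decide what fraction of the parent's stream each child owns.

    children    -- list of dicts with 'name', 'descendants', 'bandwidth' (illustrative fields)
    parent_rate -- rate at which the parent itself receives data
    """
    total_desc = sum(c["descendants"] for c in children) or 1
    shares = {}
    for c in children:
        if c["bandwidth"] >= parent_rate:
            # Ample bandwidth: the child can receive the entire parent stream.
            shares[c["name"]] = 1.0
        else:
            # Limited bandwidth: give the child a disjoint slice, sized by how many
            # descendants it serves, but never more than its link can carry.
            proportional = c["descendants"] / total_desc
            link_limit = c["bandwidth"] / parent_rate
            shares[c["name"]] = min(proportional, link_limit)
    return shares


# Example: a parent receiving at 1 Mbps with two children.
print(assign_stream(
    [{"name": "A", "descendants": 3, "bandwidth": 0.5},
     {"name": "B", "descendants": 1, "bandwidth": 2.0}],
    parent_rate=1.0,
))
# A owns at most half of the sequence space; B can receive the full stream.
```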