Chunkyspread: Multi-tree Unstructured Peer to Peer Multicast

Vidhyashankar Venkataraman (Vidhya), Paul Francis (Cornell University)
John Calandrino (University of North Carolina)

Introduction
- Increased interest in P2P live streaming in the recent past
- Existing multicast approaches
  - Swarming style (tree-less)
  - Tree-based

Swarming-based protocols
- Data-driven tree-less multicast (swarming) is getting popular
  - Neighbors send notifications about data arrivals
  - Nodes pull data from neighbors
  - E.g., Coolstreaming, Chainsaw
- Simple, unstructured
- Latency-overhead tradeoff
- Not yet known whether these protocols can have good control over heterogeneity (upload volume)

Tree-based solutions
- Low latency and low overhead
- Tree construction/repair considered complex
  - E.g., Splitstream (DHT, Pushdown, Anycast) [SRao]
- Tree repair takes time
  - Requires buffering, resulting in delays
- Contribution: a multi-tree protocol that
  - Is simple, unstructured
  - Gives fine-grained control over load
  - Has low latencies and low overhead, and is robust to failures

[SRao] Sanjay Rao et al. The Impact of Heterogeneous Bandwidth Constraints on DHT-Based Multicast Protocols. IPTPS, February 2005.

Chunkyspread – Basic Idea
- Build heterogeneity-aware unstructured neighbor graph
- Tree building:
  - Sliced data stream: one tree per slice (Splitstream)
  - Simple and fast loop avoidance and detection
  - Parent/child relationships locally negotiated to optimize criteria of interest
    - Load, latency, tit-for-tat, node-disjointness, etc.

Heterogeneity-aware neighbor graph
- Neighbor graph built with simple random walks
  - Using "Swaplinks", developed at Cornell [SWAP]
- Degree of node in graph proportional to its desired transmit load
  - Notion of heterogeneity-awareness
  - So that higher-capacity nodes have more children in multicast trees

[SWAP] V. Vishnumurthy and P. Francis. On Heterogeneous Overlay Construction and Random Node Selection in Unstructured P2P Networks. To appear in INFOCOM, Barcelona, 2006.
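
The statement that a node's degree is proportional to its desired transmit load can be illustrated with a minimal Python sketch. This is not Swaplinks itself: the function name `target_degrees` and the parameters `links_per_slice` and `min_degree` are hypothetical and chosen only for illustration.

```python
def target_degrees(desired_load, links_per_slice=2, min_degree=3):
    """Map each node's desired transmit load (in slices) to a graph degree.

    links_per_slice and min_degree are illustrative assumptions, not
    values taken from the slides or from Swaplinks.
    """
    degrees = {}
    for node, load in desired_load.items():
        # Degree grows linearly with the load the node is willing to carry.
        degrees[node] = max(min_degree, links_per_slice * load)
    return degrees


# Three nodes with desired transmit loads of 4, 16, and 28 slices.
loads = {"low": 4, "mid": 16, "high": 28}
degrees = target_degrees(loads)
```

Under these assumptions, the node willing to transmit 28 slices ends up with seven times as many neighbors as the node willing to transmit 4, so it is a candidate parent for proportionally more children.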

Sliced Data Stream
- Source selects random neighbors using Swaplinks
- Source sends one slice to each node, which acts as slice source to a tree
- Each slice source multicasts its slice

[Figure: the Source sends Slice 1, Slice 2, and Slice 3 to Slice Source 1, Slice Source 2, and Slice Source 3]

Building trees
- Initialized by flooding control message
- Pick parents subject to capacity (load) constraints
- Produces loop-free but bad trees
- Subsequently fine-tune trees according to criteria of interest

Simple and fast loop avoidance/detection
- Proposed by Whitaker and Wetherall [ICARUS]
- All data packets carry a Bloom filter
- Each node adds its mask to the filter
- Small probability of false positives
- Avoidance: advertise per-slice Bloom filters to neighbors
- Detection: by first packet that traverses the loop

[ICARUS] A. Whitaker and D. Wetherall. Forwarding without loops in Icarus. In OPENARCH, 2002.

Parent/child selection based on load & latency

[Figure: load scale running from lower load (fewer children) at 0 to higher load (more children), with these marks and labels:
- TL-δ: adds children
- TL = Target Load: children improve latency
- TL+δ: sheds children
- ML = Maximum Load: never enter this load region]

Parent/Child Switch
- Parent for slice k: load > satisfactory threshold
- Potential parents A and B: load < satisfactory threshold
1) Child sends info about A and B
2) Parent gets info from all children
3) Parent chooses A and asks child to switch
4) Child requests A
5) A says yes if still underloaded

Chunkyspread Evaluation
- Discrete event-based simulator implemented in C++
- Run over transit-stub topologies having 5K routers
- Heterogeneity:
  - Stream split into 16 slices
  - TL uniformly distributed between 4 and 28 slices
  - ML = 1.5·TL: enough capacity in network
- Two cases
  - No latency improvement (δ = 0)
  - With latency improvement: δ = (2/16)·TL (or 12.5% of TL)

[Figure: load scale marked 0, TL-δ, TL = Target Load, TL+δ, and ML = 1.5·TL]

Control over load
- Flash crowd scenario: 2.5K nodes join a 7.5K-node network at 100 joins/sec
- Nodes within 20% of TL even with latency reduction (δ = 12.5%)
- Peak of 40 control messages per node per second with latency reduction
- Median of 10 messages per node per second during period of join
- Trees optimized ~95s after all nodes join with latency reduction

[Figure: snapshot of system after nodes finished fine-tuning trees, with latency reduction; axis: [(Load-TL)/TL]%]

Latency Improvement
- Flash crowd scenario
- Maximum latency ~ buffer capacity without node failures
- 90th percentile network stretch of 9 ~ small buffer

[Figure: curves labeled "With Latency" and "No Latency"]

Burst Failure Recovery Time
- Failure burst: 1K nodes fail in a 10K-node network at the same time instant
- Neighbor failure timeout set at 4 seconds
- Recovery time within a few seconds
- FEC codes improve recovery time
- Buffering: dominant over effects of latency

[Figure: CDF of disconnect duration with latency reduction, shown with various redundancy levels; legend: 0 redundant slices, 1 redundant slice, redundancy 3]

Conclusion
- Chunkyspread is a simple multi-tree multicast protocol
- A design alternative to swarming-style protocols
- Achieves fine-grained control over load with good latencies
- Suited to non-interactive live streaming applications
- Need to do apples-to-apples comparisons with swarming protocols and Splitstream
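
Backup: worked example of the load regions

The load thresholds used in the evaluation (ML = 1.5·TL, δ = (2/16)·TL) can be worked through in a small Python sketch. It assumes one reading of the load-scale figure: below TL-δ a node adds children, between TL-δ and TL+δ children may switch parents to improve latency, above TL+δ the node sheds children, and ML is never exceeded. The function names and the handling of the exact boundary values are hypothetical, not taken from the Chunkyspread simulator.

```python
def load_thresholds(tl, delta_fraction=2 / 16, ml_factor=1.5):
    """Return the three thresholds for a node with target load tl (in slices)."""
    delta = delta_fraction * tl
    return {"low": tl - delta, "high": tl + delta, "ml": ml_factor * tl}


def load_action(load, tl, delta_fraction=2 / 16, ml_factor=1.5):
    """Classify a node's current load into one of the load regions.

    Boundary values are treated as part of the middle band; that choice
    is an assumption made for this sketch.
    """
    t = load_thresholds(tl, delta_fraction, ml_factor)
    if load > t["ml"]:
        raise ValueError("load exceeds ML; this region is never entered")
    if load > t["high"]:
        return "shed children"
    if load < t["low"]:
        return "add children"
    return "children may switch to improve latency"


# A node with TL = 16 slices: delta = 2, so the band is 14..18 and ML = 24.
thresholds = load_thresholds(16)
```

For TL = 16 this gives TL-δ = 14, TL+δ = 18, and ML = 24. Setting `delta_fraction=0` collapses the middle band to the single value TL, which corresponds to the "no latency improvement" case.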