(Re)Design Considerations for Scalable Large-File Content Distribution
Brian Biskeborn, Michael Golightly, KyoungSoo Park, and Vivek Pai
Systems Lunch

Design meets realities
• Challenges in deploying distributed systems
  - Real issues that feed back into a better design
  - Not about a novel idea
• Performance debugging with CoBlitz
  - Peering strategy
  - Reducing load on the origin
  - Latency bottlenecks

CoBlitz background
• Scalable large-file service
  - HTTP on top of a conventional CDN
  - Cache by chunk rather than by whole file
  - Transparent split/merge of chunks
  - http://coblitz.codeen.org:3125/your_url
• Deployed on PlanetLab
  - 10 months of North American deployment
  - 10 months of world-wide deployment

How it works
• Only the reverse proxies (CDN nodes) cache the chunks
• CDN node = redirector + reverse proxy
[Diagram: each client's agent requests chunks (file 0-1, 1-2, 2-3, 3-4, 4-5) from different CDN nodes, and each CDN node caches only the chunks it serves]

Smart agent
• Preserves HTTP semantics
  - Split a large request into chunk requests
  - Merge chunk responses into one on the fly
  - In-order delivery
• Parallel chunk requests
  - Keep a sliding window of chunk requests
  - Retry slow chunks
(See the sliding-window sketch after the Bandwidth slide.)

Highest Random Weight (HRW)
• Each proxy runs HRW to pick a reverse proxy
• Consistent hashing
  - Input: peer nodes + URL
  - Output: list of nodes in a deterministic order
  - Action: pick the one with the highest ranking
(See the HRW sketch after the Bandwidth slide.)

Peering
• Each node independently chooses its peers
• Before:
  - UDP ping, averaged over the last four RTTs
  - Hysteresis
• Problem:
  - Overlap of peer lists < 50%
  - Non-network delays introduced
• After:
  - Use MinRTT, increase the number of RTT samples
  - Overlap of peer lists > 90%
(See the MinRTT sketch after the Bandwidth slide.)

Reducing origin load
• Load on the origin server
  - Caused by peer-set differences
• Solution
  - Allow more peers
  - Multi-hop routing
[Diagram: origin server with proxies whose peer sets only partly overlap; some nodes peer with both proxies, others with only one]

Latency bottlenecks
• Slow nodes are bad for a synchronized workload
  - The agent's window progress gets stuck
  - Temporary congestion
• Original design
  - Retry timeout
• Redesign
  - Have multiple connections compete
  - Avoid nodes entirely if they are too slow

Fractional HRW?
• Introduce a weight in [0, 1] for each node in HRW
  - Slower nodes get lower weights
  - Choose a node only if Last_10_bits(HRW hash)/1024 < weight
  - Gives slower nodes less chance of being chosen
• Experiment results
  - Overall, it works as we expected
  - Not great for a synchronized workload
(See the HRW sketch after the Bandwidth slide.)

Bandwidth
[Chart: bandwidth (Mbit/s) of each node, nodes sorted by bandwidth, ranging up to about 120 Mbit/s; the slowest nodes are marked as potential bottlenecks]
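To make the Smart agent slide concrete, here is a minimal Python sketch of the split/merge behavior: a large download is split into fixed-size chunk requests, a sliding window of them is kept in flight, responses are merged back in order on the fly, and slow or failed chunks are retried. The chunk size, window size, retry timeout, and the use of plain HTTP Range requests against a single URL are illustrative assumptions; the deployed agent sends chunk requests toward CDN reverse proxies, not directly to one server.

    # Sketch of the agent's split/merge logic (chunk size, window size, and
    # retry timeout are assumed values, not the deployed configuration).
    import concurrent.futures
    import urllib.request

    CHUNK_SIZE = 1 << 20      # 1 MB chunks (assumed)
    WINDOW = 8                # parallel chunk requests in flight (assumed)
    RETRY_TIMEOUT = 15        # seconds before a slow chunk is retried (assumed)

    def fetch_chunk(url, index):
        """Fetch one chunk with an HTTP Range request."""
        start = index * CHUNK_SIZE
        end = start + CHUNK_SIZE - 1
        req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
        with urllib.request.urlopen(req, timeout=RETRY_TIMEOUT) as resp:
            return resp.read()

    def download(url, total_size, out_file):
        """Split a large download into chunks, fetch them with a sliding
        window of parallel requests, and merge them in order on the fly.
        total_size would normally come from a HEAD request's Content-Length."""
        num_chunks = (total_size + CHUNK_SIZE - 1) // CHUNK_SIZE
        next_to_issue = 0
        pending = {}              # chunk index -> Future
        buffered = {}             # chunks received out of order
        next_to_write = 0
        with concurrent.futures.ThreadPoolExecutor(max_workers=WINDOW) as pool, \
             open(out_file, "wb") as out:
            while next_to_write < num_chunks:
                # Keep the window full.
                while next_to_issue < num_chunks and len(pending) < WINDOW:
                    pending[next_to_issue] = pool.submit(fetch_chunk, url, next_to_issue)
                    next_to_issue += 1
                # Wait for any chunk; reissue the ones that timed out or failed.
                done, _ = concurrent.futures.wait(
                    pending.values(), return_when=concurrent.futures.FIRST_COMPLETED)
                for idx in [i for i, f in pending.items() if f in done]:
                    fut = pending.pop(idx)
                    if fut.exception() is not None:
                        pending[idx] = pool.submit(fetch_chunk, url, idx)  # retry slow chunk
                    else:
                        buffered[idx] = fut.result()
                # Deliver in order as soon as the head of the window has arrived.
                while next_to_write in buffered:
                    out.write(buffered.pop(next_to_write))
                    next_to_write += 1

Because delivery is strictly in order, one slow chunk at the head of the window stalls progress, which is exactly the problem the Latency bottlenecks slide addresses.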
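The HRW and Fractional HRW slides give enough of the selection rule to sketch it. Below, a SHA-1 based hash of the (node, URL) pair is an assumption (the slides do not name the hash); each proxy ranks its peers deterministically per URL, and the fractional variant skips a node when the last 10 bits of its hash, scaled to [0, 1), are not below the node's weight, so slower (lower-weight) nodes serve a smaller share of URLs.

    # Sketch of Highest Random Weight (HRW) selection plus the fractional
    # variant; the hash function and the node names/weights are assumptions,
    # only the "last 10 bits / 1024 < weight" test comes from the slide.
    import hashlib

    def hrw_ranking(nodes, url):
        """Order peer nodes by the HRW hash of (node, URL), highest first.
        Every proxy computing this over the same peer list and URL gets the
        same deterministic order (consistent hashing)."""
        def score(node):
            digest = hashlib.sha1(f"{node}|{url}".encode()).digest()
            return int.from_bytes(digest[:8], "big")
        return sorted(nodes, key=score, reverse=True)

    def pick_node(nodes, url):
        """Plain HRW: pick the highest-ranked node."""
        return hrw_ranking(nodes, url)[0]

    def pick_node_fractional(nodes, url, weights):
        """Fractional HRW: each node has a weight in [0, 1]; a node is
        eligible for a URL only if the last 10 bits of its hash, scaled to
        [0, 1), fall below its weight, so slower (lower-weight) nodes claim
        a smaller fraction of the URL space."""
        for node in hrw_ranking(nodes, url):
            digest = hashlib.sha1(f"{node}|{url}".encode()).digest()
            h = int.from_bytes(digest[:8], "big")
            if (h & 0x3FF) / 1024.0 < weights.get(node, 1.0):
                return node
        # Fall back to plain HRW if every node was filtered out.
        return pick_node(nodes, url)

    # Example: a slow node is eligible for only ~25% of the URLs it would
    # otherwise win.
    nodes = ["nodeA", "nodeB", "nodeC"]
    weights = {"nodeA": 1.0, "nodeB": 1.0, "nodeC": 0.25}
    print(pick_node_fractional(nodes, "http://example.org/file#chunk17", weights))

Because every proxy evaluates the same deterministic function over the same inputs, no coordination is needed for the proxies to agree on which node should cache a given chunk.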
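The Peering slide's fix, ranking candidates by their minimum observed RTT instead of an average of recent RTTs, fits in a few lines. The sample RTTs and peer-set size below are made up; the point is that a single load-induced spike no longer changes a candidate's ranking, which keeps different nodes' peer lists consistent.

    # Sketch of MinRTT peer selection: pick the candidates with the smallest
    # minimum RTT over the collected samples (sample data is illustrative).
    def pick_peers(rtt_samples, num_peers):
        """rtt_samples: {node: [rtt_ms, rtt_ms, ...]}.
        Returns the num_peers candidates with the smallest minimum RTT."""
        by_min_rtt = sorted(rtt_samples, key=lambda node: min(rtt_samples[node]))
        return by_min_rtt[:num_peers]

    # A busy PlanetLab node shows occasional large RTTs from local scheduling
    # delay, not the network; averaging penalizes it, MinRTT does not.
    samples = {
        "peerA": [12.1, 11.9, 230.0, 12.3],   # spike from CPU scheduling delay
        "peerB": [18.0, 17.5, 17.8, 18.2],
        "peerC": [45.0, 44.1, 46.3, 44.8],
    }
    print(pick_peers(samples, num_peers=2))   # ['peerA', 'peerB']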
Worst vs. best sites

Worst five sites (bandwidth in Mbit/s)
Site (# nodes)      Node Avg       Site Avg   Fastest
uoregon.edu (3)     2.46 – 2.66    2.59       4.63
cmu.edu (3)         3.50 – 3.95    3.67       5.74
csusb.edu (2)       3.93 – 4.21    4.07       6.76
rice.edu (3)        4.27 – 4.98    4.66       7.88
uconn.edu (2)       4.24 – 6.11    5.15       42.08

Best five sites (bandwidth in Mbit/s)
Site (# nodes)      Node Avg       Site Avg   Fastest
neu.edu (2)         94.5 – 97.4    95.9       60.1
pitt.edu (1)        88.7           88.7       57.3
unc.edu (2)         84.6 – 87.1    85.9       66.1
rutgers.edu (2)     83.3 – 86.1    84.7       60.1
duke.edu (3)        80.5 – 89.9    84.2       59.6

Downloading experiment
• Fetch a 50 MB file from a Princeton server
• Use 115 PlanetLab nodes at the same time
• Uncached workload
• Evaluate our redesign step by step:
  - Original
  - NoSlow
  - MinRTT
  - 120Peers
  - RepFactor
  - MultiHop
  - NewAgent

Step-by-step improvement
[CDF: fraction of nodes vs. download bandwidth (0–8000 Kbps) for Original, NoSlow, MinRTT, 120Peers, RepFactor, MultiHop, NewAgent, and BitTorrent]

Reduction of load at origin
[Bar chart: requests reaching the origin (out of 115 clients): Original 19, 120Peers 11.5, MultiHop 3.8, i.e. 3.8/19 ≈ 1/5 of the original load]

Conclusion
• The initial design may not reflect deployment realities
• Redesign dramatically improves the system
  - MinRTT
  - MultiHop
  - Aggressive retries
• Result
  - 300% faster for a synchronized workload
  - 80% reduction in origin load

Who's using CoBlitz
• CiteSeer (http://citeseer.ist.psu.edu/)
  - PS/PDF links point to CoBlitz
• PlanetLab projects
  - Arizona Stork
  - Harvard SBON
• Fedora Core mirror
  - http://coblitz.planet-lab.org/pub/fedora/linux/core/

Thanks!
• http://codeen.cs.princeton.edu/coblitz/
• Demo?

Comparisons with other systems
• BitTorrent, Shark, BulletPrime

System              # nodes   Median (Mbps)   Mean (Mbps)
CoBlitz cached      115       6.5             6.7
CoBlitz uncached    115       6.1             6.1
BitTorrent          115       2.0             2.9
Shark               185       1.0             –
CoBlitz cached      41        7.3             8.1
CoBlitz uncached    41        7.1             7.4
BulletPrime         41        –               7.0

Measuring bandwidths
• Have the nearest 10 nodes issue TCP connections
• Average the aggregate bandwidth over 30 seconds
(See the measurement sketch below.)
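A hedged sketch of the measurement on the Measuring bandwidths slide: the 10 nearest nodes each hold a bulk TCP transfer with the node under test for 30 seconds, and the aggregate rate is reported. The port number, the transfer direction, and the assumption that peers stream bulk data on that port are all illustrative; only "nearest 10 nodes, TCP connections, aggregate bandwidth averaged over 30 seconds" comes from the slide.

    # Sketch of the aggregate-bandwidth measurement (port and transfer
    # direction are assumptions).
    import socket
    import threading
    import time

    MEASURE_SECONDS = 30
    BULK_PORT = 8000          # assumed port where peers stream bulk data

    def pull_from_peer(host, totals, stop_time):
        """Read as much as possible from one peer until the deadline."""
        received = 0
        try:
            with socket.create_connection((host, BULK_PORT), timeout=5) as sock:
                sock.settimeout(1.0)
                while time.time() < stop_time:
                    try:
                        data = sock.recv(64 * 1024)
                    except socket.timeout:
                        continue
                    if not data:
                        break
                    received += len(data)
        except OSError:
            pass                       # unreachable peer contributes 0 bytes
        totals.append(received)

    def measure_aggregate_bandwidth(nearest_peers):
        """Return aggregate download bandwidth (Mbit/s) from up to 10 peers,
        averaged over the 30-second measurement window."""
        stop_time = time.time() + MEASURE_SECONDS
        totals, threads = [], []
        for host in nearest_peers[:10]:
            t = threading.Thread(target=pull_from_peer, args=(host, totals, stop_time))
            t.start()
            threads.append(t)
        for t in threads:
            t.join()
        return sum(totals) * 8 / (MEASURE_SECONDS * 1e6)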