Scale and Performance in the CoBlitz Large-File Distribution Service
KyoungSoo Park and Vivek S. Pai, Princeton University (NSDI 2006)

Large-file Distribution
• Increasing demand for large files
  - Movie and software releases: on-line movie downloads, Linux distributions
  - Files are 100MB to a couple of GB
• One-to-many downloads
• Nice to use a CDN, but…

Why Not Web CDNs?
• Whole-file caching
  - Optimized for ~10KB objects; 2GB = 200,000 x 10KB
• Memory pressure
  - Working sets do not fit in memory, and disk access is 1000x slower
• Waste of resources
  - More servers needed; provisioning is a must

Peer-to-Peer?
• BitTorrent takes up ~30% of Internet bandwidth
• Custom software
  - Deployment is a must, and configuration is needed
  - Companies may want a managed service
• Handles flash crowds
• Handles long-lived objects

What We'd Like: a Large-file Service with
• No custom client
• No custom server
• No prepositioning
• No rehosting
• No manual provisioning

CoBlitz: a Scalable Large-file CDN
• Reduce the problem to a small-file CDN
  - Split large files into chunks
  - Distribute the chunks across proxies
  - Aggregate their memory/cache
  - Plain HTTP, so nothing new to deploy
• Benefits
  - Faster than BitTorrent by 55-86% (up to ~500%)
  - One copy from the origin serves 43-55 nodes
  - Builds incrementally on existing CDNs

How It Works
• CDN = redirector + DNS + reverse proxy; only the reverse proxies (CDN nodes) cache the chunks
• [Diagram: a client asks its local agent for coblitz.codeen.org/URL; the agent splits the request into chunk requests (e.g., chunks 1-5), and each CDN node fetches its chunk from the origin server with an HTTP Range query and caches it.]

Smart Agent
• Preserves HTTP semantics
• Issues parallel chunk requests
• [Diagram: the agent keeps a sliding window of chunks per HTTP client; "done" chunks are delivered in order while "waiting" chunks are outstanding at multiple CDN nodes. See the sketch below.]
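To make the chunking idea concrete, here is a minimal sketch of a CoBlitz-style agent loop: it splits one HTTP download into fixed-size Range requests and keeps a bounded window of them in flight, reassembling the body in order. The URL, chunk size, and window size are illustrative assumptions, not the service's actual parameters, and a real agent fans requests out to CDN nodes rather than to a single server.

```python
# A minimal, illustrative sketch of the agent's chunking idea: one HTTP
# download is split into fixed-size Range requests, a bounded window of
# which are in flight at once. CHUNK_SIZE, WINDOW, and the URL are
# assumptions for the example, not CoBlitz's actual parameters.
import urllib.request
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 20   # 1MB chunks (illustrative)
WINDOW = 8             # max chunk requests in flight (illustrative)

def content_length(url):
    # HEAD request to learn the file size before chunking.
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return int(resp.headers["Content-Length"])

def fetch_chunk(url, index):
    # Each chunk is an ordinary HTTP Range request (expects a 206
    # Partial Content reply), which is why no custom server is needed.
    start = index * CHUNK_SIZE
    req = urllib.request.Request(url)
    req.add_header("Range", f"bytes={start}-{start + CHUNK_SIZE - 1}")
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def download(url, out_path):
    total = content_length(url)
    nchunks = (total + CHUNK_SIZE - 1) // CHUNK_SIZE
    with open(out_path, "wb") as out, ThreadPoolExecutor(WINDOW) as pool:
        # The bounded pool acts as the sliding window, and map() yields
        # results in submission order, so the client receives the body
        # in order: ordinary HTTP semantics are preserved.
        for data in pool.map(lambda i: fetch_chunk(url, i), range(nchunks)):
            out.write(data)

if __name__ == "__main__":
    download("http://example.com/big.iso", "big.iso")  # hypothetical URL
```

Delivering chunks strictly in order is what lets the agent look like a normal HTTP proxy to an unmodified client.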
Operation & Challenges
• Public service for over 2 years: http://coblitz.codeen.org:3125/URL
• Challenges
  - Scalability & robustness
  - Peering set difference
  - Load on the origin server

Unilateral Peering
• Independent peering decisions: no synchronized-maintenance problem
• Motivation
  - Partial network connectivity (Internet2, CANARIE nodes)
  - Routing disruption
  - Isolated nodes
• Improves both scalability & robustness

Peering Set Difference
• No perfect clustering, by design
• Assumption: close nodes share common peers
• [Diagram: two nearby nodes with overlapping peer sets; some peers both can reach, others only one of them can reach.]

Peering Set Difference: the Problem
• Highly variable application-level RTTs: roughly 10x the variance of ICMP
• High rate of change in the peer set: close nodes share less than 50% of their peers
• Consequences: low cache hit rate, low memory utility, excessive load on the origin

Peering Set Difference: How to Fix It
• Use the average RTT instead of the minimum RTT
• Increase the number of samples
• Increase the number of peers
• Add hysteresis
• Result: close nodes share more than 90% of their peers

Reducing Origin Load
• Peering set difference remains, and it is critical for traffic to the origin
• Proximity-based routing (cf. key-based routing in P2P systems)
  - Converges exponentially fast; 3-15% of requests take one more hop
  - Rerunning the hash at each hop builds an implicit overlay tree
• Result: origin load reduced by 5x

Scale Experiments
• All live PlanetLab nodes as clients: 380-400 live nodes at any time
• Simultaneous fetch of a 50MB file
• Test scenarios
  - Direct
  - BitTorrent total/core
  - CoBlitz uncached/cached/staggered
  - Out-of-order results are in the paper

Throughput Distribution
• [Graph: CDF of per-node throughput (0-10000 Kbps) for Direct, BT-total, BT-core, and in-order CoBlitz uncached/staggered/cached; out-of-order staggered CoBlitz beats BT-core by 55-86%.]

Downloading Times
• At the 95th percentile, CoBlitz is 1000+ seconds faster
• [Graph: CDF of download time (0-2000 sec) for in-order cached/staggered/uncached CoBlitz, BT-core, BT-total, and Direct.]

Synchronized Workload Congestion
• [Diagram: with a synchronized workload, many nodes request the same chunks at once and congest the origin server.]

Addressing Congestion
• Proximity-based multi-hop routing: an overlay tree for each chunk (see the hashing sketch below)
• Dynamic chunk-window resizing (see the second sketch below)
  - Increase by 1/log(x), where x is the window size, if a chunk finishes faster than average
  - Decrease by 1 if a retry kills the first chunk
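The slides name the mechanisms without showing code, so here is one way the hash-based chunk-to-peer assignment could look, sketched in the HRW (highest-random-weight, or "rendezvous") style used by CoDeeN-like CDNs. The peer names, hash choice, and usability predicate are all assumptions for the example, not the deployed implementation.

```python
# Illustrative sketch of HRW-style chunk-to-peer assignment. Each node
# ranks its own peer set per chunk and forwards to its top choice;
# because nearby nodes have similar but not identical peer sets, a
# request may be re-hashed and take one more hop at the next node,
# which implicitly builds a per-chunk overlay tree.
import hashlib

def hrw_rank(chunk_id, peers):
    """Order peers by per-chunk hash weight, highest first."""
    def weight(peer):
        digest = hashlib.sha1(f"{chunk_id}:{peer}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return sorted(peers, key=weight, reverse=True)

def next_hop(chunk_id, my_peers, is_usable):
    """Pick the highest-ranked peer this node considers usable (close,
    alive, not overloaded). Rerunning the same ranking on the next
    node's peer set is the "rerun hashing" step from the slides."""
    for peer in hrw_rank(chunk_id, my_peers):
        if is_usable(peer):
            return peer
    return None  # no usable peer: fall back to the origin server

# Hypothetical peer set; chunk IDs combine the URL and chunk index.
peers = ["node-a.example.org", "node-b.example.org", "node-c.example.org"]
print(next_hop("http://example.com/big.iso#3", peers, lambda p: True))
```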
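The window-resizing rule itself is simple enough to state in code. Below is a minimal sketch of the controller described on the slide; the initial size, bounds, and moving-average weight are illustrative assumptions.

```python
# Sketch of the dynamic chunk-window rule from the slide: grow the
# window by 1/log(x) when a chunk finishes faster than the running
# average, shrink it by 1 when a retry kills the first outstanding
# chunk. Bounds and the moving-average weight are assumptions.
import math

class ChunkWindow:
    def __init__(self, size=4.0, min_size=1.0, max_size=64.0):
        self.size = size
        self.min_size, self.max_size = min_size, max_size
        self.avg_time = None   # exponentially weighted average chunk time

    def on_chunk_done(self, elapsed):
        if self.avg_time is None:
            self.avg_time = elapsed
        elif elapsed < self.avg_time:
            # Sub-linear growth: an additive increase of 1/log(x) slows
            # down large windows instead of letting them balloon.
            self.size = min(self.size + 1.0 / math.log(max(self.size, 2.0)),
                            self.max_size)
        self.avg_time = 0.9 * self.avg_time + 0.1 * elapsed

    def on_first_chunk_retry_killed(self):
        self.size = max(self.size - 1.0, self.min_size)

    def in_flight_limit(self):
        return int(self.size)
```

The sub-linear increase is the point: it damps exactly the synchronized-workload congestion at the origin shown two slides earlier.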
Number of Failures
• [Bar chart: failure percentage by system: Direct 5.7%, BitTorrent 4.3%, CoBlitz 2.1%.]

Performance after Flash Crowds
• CoBlitz: 70+% of nodes exceed 5Mbps
• BitTorrent: 20% of nodes exceed 5Mbps
• [Graph: CCDF of per-node throughput (0-35000 Kbps) for in-order CoBlitz vs. BitTorrent.]

Data Reuse
• 7 fetches from the origin for 400 nodes: a 98% cache hit rate
• [Bar chart: utility (nodes served per origin copy): Shark 7.7, BitTorrent 35, CoBlitz 55.]

Comparison with Other Systems
• Shark [NSDI '05]: median throughput 0.96Mbps with 185 clients; CoBlitz reaches 3.15Mbps with 380-400 clients
• Bullet and Bullet' [SOSP '03, USENIX '05]: 7Mbps average with 41 nodes using UDP; CoBlitz does slightly better (7.4Mbps) with only TCP connections

Real-world Usage
• Official Fedora Core mirror: http://coblitz.planet-lab.org/
  - US East/West, England, Germany, Korea, Japan
• CiteSeer repository (50,000+ links)
• PlanetLab researchers: Stork (U of Arizona) + ~10 others

Usage in Feb 2006
• [Histogram: number of requests per file-size bucket (0-2MB up through 2-4GB), log scale from 1 to 10^7.]
• [Histogram: total bytes served (GB) per file-size bucket; the CD ISO and DVD ISO buckets dominate.]

Fedora Core 5 Release
• March 20th, 2006; release point 10am
• Peaks over 700Mbps
• [Graph: bandwidth served over time on release day.]

Conclusion
• A scalable large-file transfer service
  - Evolved under real traffic; up and running 24/7 for over 2 years
  - Unilateral peering, multi-hop routing, window-size adjustment
• Better performance than P2P
  - Better throughput and download times
  - Far less origin traffic

Thank You!
• More information: http://codeen.cs.princeton.edu/coblitz/
• How to use: http://coblitz.codeen.org:3125/URL*
  *Some content restrictions apply; see the Web site for details. Contact me if you want full access!