CoBlitz: A Scalable Large-file Transfer Service (COS 461)
KyoungSoo Park, Princeton University

Large-file Distribution
• Increasing demand for large files
  • Movie or software releases
  • On-line movie downloads
  • Linux distributions
• Files are 100MB ~ tens of GB
• One-to-many downloads
How to serve large files to many clients?
• Content Distribution Network (CDN)?
• Peer-to-peer system?

What CDNs Are Optimized For
• Most Web files are small (1KB ~ 100KB)

Why Not Web CDNs?
• Whole-file caching in participating proxies
• Optimized for ~10KB objects
  • 2GB = 200,000 x 10KB
• Memory pressure
  • Working sets do not fit in memory
  • Disk access is 1000 times slower
• Waste of resources
  • More servers needed
  • Provisioning is a must

Peer-to-Peer?
• BitTorrent takes up ~30% of Internet bandwidth
[Figure: a BitTorrent swarm with peers exchanging chunks (upload/download), the torrent file, and the tracker]
1. Download a “torrent” file
2. Contact the tracker
3. Enter the “swarm” network
4. Chunk exchange policy
   - Rarest chunk first, or random
   - Tit-for-tat: incentive to upload
   - Optimistic unchoking
5. Validate the checksums
Benefit: extremely good use of resources!

Peer-to-Peer?
• Custom software
  • Deployment is a must
  • Configuration needed
• Companies may want a managed service
  • Handles flash crowds
  • Handles long-lived objects
• Performance problems
  • Hard to guarantee service quality
  • Others are discussed later

What We’d Like Is
Large-file service with
• No custom client
• No custom server
• No prepositioning
• No rehosting
• No manual provisioning

CoBlitz: Scalable Large-file CDN
• Reduces the problem to a small-file CDN
  • Split large files into chunks
  • Distribute chunks across proxies
  • Aggregate memory/cache
  • HTTP needs no deployment
• Benefits
  • Faster than BitTorrent by 55-86% (up to ~500%)
  • One copy from the origin serves 43-55 nodes
  • Incremental build on existing CDNs

How It Works
• CDN = Redirector + DNS + Reverse Proxy
• Only the reverse proxy (CDN) caches the chunks!
[Figure: clients request coblitz.codeen.org through a local agent; chunks 1-5 are fetched in parallel from CDN nodes via HTTP Range queries, and misses go to the origin server]

Smart Agent
• Preserves HTTP semantics
• Parallel chunk requests
[Figure: the agent keeps a sliding window of “chunks” between the HTTP client and several CDN nodes; completed chunks are delivered in order while waiting ones remain outstanding]

Chunk Indexing: Consistent Hashing
Problem: how to find the node responsible for a specific chunk?
• Static hashing: f(x) = some_f(x) % n
  • But n is dynamic for servers: a node can go down, a new node can join
• Consistent hashing: F(x) = some_F(x) % N, where N is a large but fixed number
  • Find a live node k where |F(k) - F(URL)| is minimum (see the sketch below)
[Figure: hash ring 0 … N-1 with CDN nodes (proxies) and chunk requests X1, X2, X3 mapped onto it]
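Below is a minimal sketch, in Python, of the consistent-hashing lookup on the slide above. The |F(k) - F(URL)| rule and the fixed ring size N come from the slide; SHA-1 as the hash function, the 2^32 ring size, and the ChunkIndex/owner names are assumptions for illustration, not the actual CoBlitz implementation.

    import hashlib

    RING_SIZE = 2**32   # "N is a large but fixed number" (assumed value)

    def F(key):
        """Map a node name or chunk URL onto the fixed-size ring (assumed SHA-1 hash)."""
        return int.from_bytes(hashlib.sha1(key.encode()).digest()[:4], "big") % RING_SIZE

    class ChunkIndex:
        def __init__(self, nodes):
            self.nodes = set(nodes)        # live CDN proxies; membership may change

        def add(self, node):
            self.nodes.add(node)           # a new node joins

        def remove(self, node):
            self.nodes.discard(node)       # a node goes down

        def owner(self, chunk_url):
            """Find the live node k where |F(k) - F(URL)| is minimum."""
            target = F(chunk_url)
            return min(self.nodes, key=lambda n: abs(F(n) - target))

    # Hypothetical usage: removing a proxy only moves the chunks that mapped to it.
    index = ChunkIndex(["proxyA", "proxyB", "proxyC"])
    print(index.owner("http://example.org/fc6.iso?chunk=17"))
    index.remove("proxyB")
    print(index.owner("http://example.org/fc6.iso?chunk=17"))

Because node names hash to fixed ring positions, a membership change only moves the chunks closest to the affected node, which is why the scheme tolerates a dynamic number of proxies.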
Operation & Challenges
• Public service running for over 2.5 years
  • http://coblitz.codeen.org:3125/URL
• Challenges
  • Scalability & robustness
  • Peering set difference
  • Load on the origin server

Unilateral Peering
• Independent, proximity-aware peering
  • Each node picks the “n” closest nodes around itself
  • Cf. BitTorrent picks “n” nodes randomly
• Motivation
  • Partial network connectivity
    • Internet2, CANARIE nodes
  • Routing disruption
  • Isolated nodes
• Benefits
  • No synchronized-maintenance problem
  • Improves both scalability & robustness

Peering Set Difference
• No perfect clustering, by design
• Assumption: close nodes share common peers
[Figure: two nearby nodes with overlapping peer sets; some peers both can reach, others only one of them can reach]

Peering Set Difference
• Highly variable application-level RTTs
  • ~10x more variance than ICMP
• High rate of change in the peer set
• Close nodes share less than 50% of their peers
• Low cache hit rate
  • Low memory utility
  • Excessive load on the origin

Peering Set Difference
• How to fix?
  • Avg RTT → min RTT
  • Increase the number of samples
  • Increase the number of peers
  • Hysteresis
• Result: close nodes share more than 90% of their peers

Reducing Origin Load
• Peering set difference still remains
  • Critical for traffic to the origin server
• Proximity-based routing: rerun hashing at each hop
  • Converges exponentially fast
  • Only 3-15% of requests take one more hop
  • Forms an implicit overlay tree (see the sketch below)
• Result: origin load reduced by 5x
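A minimal sketch of the rerun-hashing step described above: each proxy reruns the consistent-hashing lookup over its own peer set and forwards a misdirected request one more hop before touching the origin. The ProxyNode class, the hop budget of 3, the network map, and the origin_fetch callable are assumptions for illustration, not the CoBlitz code.

    import hashlib

    def ring_pos(key):
        """Fixed-ring hash, as in the consistent-hashing sketch earlier (assumed SHA-1)."""
        return int.from_bytes(hashlib.sha1(key.encode()).digest()[:4], "big")

    class ProxyNode:
        def __init__(self, name, peers, network, origin_fetch):
            self.name = name
            self.peers = set(peers) | {name}   # unilateral, proximity-aware peer set
            self.network = network             # name -> ProxyNode map, stands in for forwarding
            self.origin_fetch = origin_fetch   # callable issuing the HTTP Range request (assumed)
            self.cache = {}

        def owner(self, chunk_url):
            """Node in this proxy's own peer set whose ring position is closest to the chunk's."""
            target = ring_pos(chunk_url)
            return min(self.peers, key=lambda n: abs(ring_pos(n) - target))

        def get_chunk(self, chunk_url, hops_left=3):
            if chunk_url in self.cache:
                return self.cache[chunk_url]               # local cache hit
            owner = self.owner(chunk_url)
            if owner != self.name and hops_left > 0:
                # Peer sets differ across nodes, so the previous hop may have picked
                # this node by mistake; rerun hashing here and forward one more hop.
                data = self.network[owner].get_chunk(chunk_url, hops_left - 1)
            else:
                # This node believes it is responsible (or the hop budget is spent):
                # fetch the chunk once from the origin, then serve later requests from cache.
                data = self.origin_fetch(chunk_url)
            self.cache[chunk_url] = data
            return data

The extra forwarding hop is what builds the implicit per-chunk overlay tree, so a single copy fetched from the origin can fan out to many requesting nodes.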
Scale Experiments
• Use all live PlanetLab nodes as clients
  • 380~400 live nodes at any time
  • Simultaneous fetch of a 50MB file
• Test scenarios
  • Direct
  • BitTorrent total / core
  • CoBlitz uncached / cached / staggered
  • Out-of-order numbers are in the paper

Throughput Distribution
[Figure: CDF of per-node throughput (Kbps, 0-10000) with curves for Direct, BT-total, BT-core, and CoBlitz in-order uncached/staggered/cached and out-of-order staggered; CoBlitz is 55-86% faster than BT-core]

Downloading Times
• 95th percentile: 1000+ seconds faster
[Figure: CDF of download time (sec, 0-2000) with curves for In-order cached, In-order staggered, In-order uncached, BT-core, BT-total, and Direct]

Why Is BitTorrent Slow?
• In the experiments
  • No locality – peers are chosen randomly
  • Chunk indexing – extra communication
  • Trackerless BitTorrent – Kademlia DHT
• In practice
  • Upload capacity of typical peers is low
    • 10 to a few hundred Kbps for cable/DSL users
  • Tit-for-tat may not be fair
    • A few high-capacity uploaders help the most
    • BitTyrant [NSDI’07]

Synchronized Workload Congestion
[Figure: simultaneous chunk requests converge on and congest the paths toward the origin server]

Addressing Congestion
• Proximity-based multi-hop routing
  • Overlay tree for each chunk
• Dynamic chunk-window resizing (see the backup sketch after the last slide)
  • Increase by 1/log(x), where x is the window size, if a chunk finishes faster than average
  • Decrease by 1 if a retry kills the first chunk

Number of Failures
[Figure: failure percentage by system: Direct 5.7%, BitTorrent 4.3%, CoBlitz 2.1%]

Performance After Flash Crowds
[Figure: complementary CDF of throughput (Kbps, up to 35000): CoBlitz (in-order) has 70+% of nodes above 5Mbps; BitTorrent has 20% above 5Mbps]

Data Reuse
• 7 origin fetches for 400 nodes, 98% cache hit
[Figure: utility (# of nodes served per copy): Shark 7.7, BitTorrent 35, CoBlitz 55]

Real-world Usage
• 1-2 Terabytes/day
• Fedora Core official mirror
  • US East/West, England, Germany, Korea, Japan
• CiteSeer repository (50,000+ links)
• University Channel (podcast/video)
• Public lecture distribution by Princeton University OIT
• Popular game patch distribution
• PlanetLab researchers
  • Stork (U of Arizona) + ~10 others

Fedora Core 6 Release
• October 24th, 2006
• Peak throughput: 1.44Gbps
[Figure: aggregate throughput over time: release point at 10am, peak above the 1Gbps line, while traffic to the origin server stays at 30-40Mbps]

On the Fedora Core Mirror List
• Many people complained about I/O
  • Peak of 500Mbps out of 2Gbps
  • 2 Sun x4200s w/ dual Opterons, 2GB memory
  • 2.5TB SATA-based SAN
  • All ISOs in the disk cache or an in-memory FS
• CoBlitz uses 100MB of memory per node
  • Many PlanetLab node disks are IDE
  • Most nodes are bandwidth-capped at 10Mbps

Conclusion
• Scalable large-file transfer service
  • Evolved under real traffic
  • Up and running 24/7 for over 2.5 years
  • Unilateral peering, multi-hop routing, window-size adjustment
• Better performance than P2P
  • Better throughput and download time
  • Far less origin traffic

Thank you!
More information: http://codeen.cs.princeton.edu/coblitz/
How to use: http://coblitz.codeen.org:3125/URL*
*Some content restrictions apply; see the Web site for details
Contact me if you want full access!
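Backup: a minimal sketch of the dynamic chunk-window resizing from the Addressing Congestion slide. The increase-by-1/log(x) and decrease-by-1 rules come from that slide; the starting size, the minimum size, and the moving-average definition of "average" are assumptions for illustration.

    import math

    class ChunkWindow:
        """Sliding window of outstanding chunk requests kept by the client-side agent."""

        def __init__(self, initial=4, minimum=1):
            self.size = float(initial)      # assumed starting window size
            self.minimum = minimum
            self.avg_time = None            # running average of chunk completion times

        def on_chunk_done(self, elapsed):
            """Grow by 1/log(x), x being the window size, when a chunk finishes faster than average."""
            if self.avg_time is None:
                self.avg_time = elapsed
            if elapsed < self.avg_time and self.size > 1:
                self.size += 1.0 / math.log(self.size)
            # exponential moving average stands in for "average" (smoothing factor assumed)
            self.avg_time = 0.9 * self.avg_time + 0.1 * elapsed

        def on_retry_killed_first_chunk(self):
            """Shrink by 1 when a retry kills the first outstanding chunk."""
            self.size = max(self.minimum, self.size - 1)

        def limit(self):
            return int(self.size)           # number of chunk requests allowed in flight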