Deploying Large File Transfer on an HTTP Content Distribution Network
KyoungSoo Park and Vivek Pai, Princeton University
USENIX WORLDS '04

Large File Transfers
- Generally in the range of 10+ MB to a few GB
- Software distribution, patches, movies, etc.
- One-to-many downloads
- Not friendly to HTTP CDNs
  - High replication consumes too much space
  - Whole-file caching evicts 1000s of small files
- Current approach: custom protocols

Why Not HTTP?
- Software widely available
  - Both for publishers and clients
- No firewall or NAT problems
  - Well-known port 80
- CDNs already exist
- No need for content in many formats
- Easier resource provisioning

Large Files Over HTTP CDNs
- Break the file into chunks
  - Use byte-range support in HTTP
  - But proxies and CDNs hate disjoint regions
- Treat chunks as files
  - Hashes easily, caches easily
  - Fetch and evict chunks, not large files
- Use unmodified clients and servers
  - All support is on the CDN itself

Our Approach
- CDN = redirector + reverse proxy
- The CDN reverse-caches the chunks
[Diagram: each client's agent fetches chunks (file 0-1, 1-2, 2-3, 3-4, 4-5) through different CDN nodes, which cache the chunks individually]

The Role of the Agent
- Separate process on a CDN node
  - Viewed by the client as a simple HTTP server
- Splits a large request into many small requests
  - GET url becomes GET url/range
- Merges the replies into one large response
- Issues parallel requests for chunks
- Massages replies from servers
- Retries slow chunks

HTTP Header Modifications
- Egress (CDN proxy to origin server): GET url/range becomes GET url with "Range: bytes start-end"
- Ingress (origin server to CDN proxy): "HTTP/1.0 206 Partial Content" with "Content-Range: start-end/length" becomes "HTTP/1.0 200 OK" with Content-Length set to the piece length and a new header carrying the full object length
- (an illustrative sketch appears in the backup note at the end)

Deployment Status
- CoDeploy running since March 2004
- Available on ~120 PlanetLab nodes
- Used in a file synchronization service
- Low incremental overhead
  - Agent is about 500 semicolons
  - CDN mods are about 20 semicolons
- Techniques portable to other CDNs

Parallelism vs. Chunk Size
- More parallel requests involve more CDN nodes
- Bigger chunk size reduces per-chunk overheads
- Total buffer size = (# parallel) x (chunk size) x (# clients)
- (a minimal code sketch of this chunk-fetch loop follows the next slide)

Total Buffer Size vs. Bandwidth
[Plot: bandwidth (Kbps, 0-12000) vs. total buffer size (10KB to 5.12MB), one curve per chunk size: 10KB, 20KB, 40KB, and 80KB chunks]
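As a concrete illustration of the chunk-fetch loop described on the agent and parallelism slides, here is a minimal sketch in Python. It is not the CoDeploy agent (which runs on the CDN node): it assumes a server that honors byte-range requests directly, and the names CHUNK_SIZE, PARALLEL, fetch_chunk, and download are invented for the example; the 60KB/10-way values are the ones used in the download tests on the next slide.

    # Minimal sketch of the chunked-download idea (not the CoDeploy agent):
    # split one large GET into fixed-size byte-range requests, fetch a bounded
    # number in parallel, and write each chunk back at its offset.
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    CHUNK_SIZE = 60 * 1024   # illustrative: 60KB chunks, as in the tests below
    PARALLEL = 10            # illustrative: 10 outstanding chunk requests

    def fetch_chunk(url, start, end):
        req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
        with urllib.request.urlopen(req) as resp:
            return start, resp.read()

    def download(url, out_path):
        # learn the object length first (assumes the server answers HEAD)
        head = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(head) as resp:
            length = int(resp.headers["Content-Length"])
        ranges = [(s, min(s + CHUNK_SIZE, length) - 1)
                  for s in range(0, length, CHUNK_SIZE)]
        with open(out_path, "wb") as out, ThreadPoolExecutor(PARALLEL) as pool:
            for start, data in pool.map(lambda r: fetch_chunk(url, *r), ranges):
                out.seek(start)
                out.write(data)

Per-client memory use in a loop like this grows roughly as (# parallel) x (chunk size), which is the buffer quantity traded against bandwidth in the plot above.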
Downloading Tests
- Server at a lightly loaded Princeton node
- Downloading a 50MB file
- 10 parallel chunks of 60KB each
- Point-to-point vs. one-to-many
- Test policies:
  - Direct (with larger socket buffers)
  - Aggressive: point-to-point CoDeploy, no CDN
  - CoDeploy First
  - CoDeploy Cached

One Server, One Client
(Ping in ms; bandwidth in Kbps)

  Node          Ping   Direct  Aggressive  CoDeploy First  CoDeploy Cached
  Rutgers        5.6    18617        3624            4501            14123
  U Maryland     5.5    12046        4095            5535            14123
  U Notre Dame  35.1     8359        4179            7728            25599
  U Michigan    46.9     5688        4708            6205            10239
  U Nebraska    51.4     2786        4551            5119             4266
  U Utah        66.8     2904        2512            3900             7585
  UBC           74.6     3276        2100            4137             7728
  U Washington  79.2     4551        1765            4357            17808
  UC Berkeley   81.4     4501        6501            9308            20479
  UCLA          84.6     2677        2178            4055             7314

One Server, Many (120) Clients
[Bar chart: bandwidth (Kbps) at the 25th percentile, median, mean, and 75th percentile for Direct, Aggressive, CoDeploy First, and CoDeploy Cached]

Chunk Download Times
- Short time-scale effects dominate
- Low download time, high standard deviation
  - Implies fast == unpredictable
- We see this in practice
  - Some nodes' hit times vary by 20x
- Makes managing timeouts harder

Lessons Learned
- Large file support over HTTP is possible
  - Basic implementation is easy
  - No client/server changes
- Tradeoffs are not where you expect
  - Flexibility on buffer size and parallelism
- The hard part is managing performance
  - Short time-scale effects dominate

Future Work
- Better proximity info
  - Tradeoff with load balance
- Better timing management
  - Prefetching at reverse proxies
  - More aggressive retrying
- HTTP streaming
  - More chunks = less jitter?

More Info
http://codeen.cs.princeton.edu/codeploy/
KyoungSoo Park  kyoungso@cs.princeton.edu
Vivek Pai       vivek@cs.princeton.edu
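Backup: an illustrative sketch of the header modifications slide, in Python. This is not the CoDeeN/CoDeploy code; the chunk-URL format (range appended after a slash) and the X-Object-Length header name are assumptions made for the example, since the slide only says that a new header carries the object length.

    # Egress: CDN proxy -> origin server. The proxy sees a chunk request of the
    # form "url/start-end" and turns it into a normal GET with a Range header.
    def parse_chunk_url(chunk_url):
        # assumed format, e.g. "http://origin/big.iso/0-61439"
        base, _, rng = chunk_url.rpartition("/")
        start, _, end = rng.partition("-")
        return base, int(start), int(end)

    def rewrite_egress(chunk_url):
        base_url, start, end = parse_chunk_url(chunk_url)
        return "GET", base_url, {"Range": f"bytes={start}-{end}"}

    # Ingress: origin server -> CDN proxy. A 206 Partial Content reply for one
    # chunk is rewritten to look like a complete 200 OK object, with the chunk's
    # own length as Content-Length and the full object length carried in a new
    # header (the header name here is illustrative).
    def rewrite_ingress(status, headers, body):
        if status == 206:
            total = int(headers["Content-Range"].split("/")[-1])  # "bytes s-e/len"
            return 200, {"Content-Length": str(len(body)),
                         "X-Object-Length": str(total)}, body
        return status, headers, body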