Deploying Large File Transfer on an HTTP Content Distribution Network

KyoungSoo Park and Vivek Pai
Princeton University
Large File Transfers
- Generally in the range of 10+ MB to a few GB
  - Software distribution, patches, movies, etc.
  - One-to-many downloads
- Not friendly to HTTP CDNs
  - High replication consumes too much space
  - Whole-file caching evicts thousands of small files
- Current approach: custom protocols
Why Not HTTP?
- Software widely available
  - Both for publishers and clients
- No firewall or NAT problems
  - Well-known port 80
- CDNs already exist
  - No need for content in many formats
  - Easier resource provisioning
Large Files Over HTTP CDNs
- Break the file into chunks
  - Use byte-range support in HTTP
  - But proxies and CDNs handle disjoint byte regions poorly
- Treat chunks as files (see the sketch after this list)
  - Hashes easily, caches easily
  - Fetch and evict chunks, not large files
- Use unmodified clients and servers
  - All support lives on the CDN itself
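A minimal sketch of the chunks-as-files idea, in Python. The 60 KB chunk size matches the test setup later in the talk; the url/start-end naming scheme and helper names are illustrative assumptions, not the CoDeploy implementation.

CHUNK = 60 * 1024  # illustrative chunk size (60 KB, as in the later tests)

def chunk_url(url, index):
    # Chunk i of an object, addressed as its own cacheable "file" URL
    # so an ordinary cache can store and evict it individually.
    start = index * CHUNK
    end = start + CHUNK - 1
    return "%s/%d-%d" % (url, start, end)

def chunk_range_header(index, length):
    # The standard HTTP Range header the CDN ultimately sends for the
    # same chunk when it has to fetch it from the origin.
    start = index * CHUNK
    end = min(start + CHUNK, length) - 1
    return "Range: bytes=%d-%d" % (start, end)

# Example: chunk_url("http://example.com/big.iso", 2)
#   -> "http://example.com/big.iso/122880-184319"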
Our Approach
- CDN = redirector + reverse proxy
- The CDN reverse-caches the chunks!

[Diagram: each client's agent requests chunks (file 0-1, file 1-2, file 2-3, file 3-4, file 4-5); the redirector spreads the requests across CDN nodes, each of which reverse-caches a few of the chunks.]
The Role of Agent
- Separate process on a CDN node
  - Viewed as a simple HTTP server
- Splits a large request into many small requests (see the sketch after this list)
  - GET url becomes GET url/range
  - Merges the replies into one large response
- Issues parallel requests for chunks
  - Massages replies from origin servers
  - Retries slow chunks
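A minimal sketch of the agent's split/merge behavior, assuming the agent already knows the object length and that the nodes it contacts honor HTTP Range requests. The function names are hypothetical, the 10-way parallelism and 60 KB chunk size are taken from the later test setup, and retrying slow chunks is omitted here.

from concurrent.futures import ThreadPoolExecutor
import urllib.request

PARALLEL = 10       # parallel outstanding chunks, as in the later tests
CHUNK = 60 * 1024   # 60 KB per chunk, as in the later tests

def get_range(url, start, end):
    # One small sub-request for bytes [start, end] of the object.
    req = urllib.request.Request(url)
    req.add_header("Range", "bytes=%d-%d" % (start, end))
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def agent_fetch(url, length):
    # Split one large GET into many ranged GETs, fetch them in
    # parallel, and merge the replies into one response body.
    ranges = [(s, min(s + CHUNK, length) - 1) for s in range(0, length, CHUNK)]
    with ThreadPoolExecutor(max_workers=PARALLEL) as pool:
        parts = list(pool.map(lambda r: get_range(url, r[0], r[1]), ranges))
    return b"".join(parts)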
HTTP Header Modifications
Egress (requests toward the origin server):
  The CDN proxy's request "GET url/ranges" plus its headers is rewritten into
  "GET url" with "Range: bytes <ranges>" plus the same headers for the origin server.

Ingress (responses from the origin server):
  The origin's "HTTP/1.0 206 Partial" reply with "Range: start-end/length" plus its
  headers is rewritten into "HTTP/1.0 200 OK" with "Content-length: <piece length>"
  and a new header carrying the full object length for the CDN proxy.
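A minimal sketch of the response-side rewrite, using the standard Content-Range header name for the origin's 206 reply; the X-Object-Length name stands in for the unspecified "new header" above and is an assumption.

def rewrite_partial_response(headers):
    # Turn the origin's "206 Partial Content" reply headers into the
    # headers of a plain "200 OK" so the CDN proxy caches the chunk as
    # a complete small file.
    new_headers = dict(headers)
    content_range = new_headers.pop("Content-Range")      # e.g. "bytes 0-61439/52428800"
    span, total = content_range.split(" ")[1].split("/")
    start, end = (int(x) for x in span.split("-"))
    new_headers["Content-Length"] = str(end - start + 1)  # the piece length
    new_headers["X-Object-Length"] = total                # full object length (assumed name)
    return "HTTP/1.0 200 OK", new_headers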
Deployment Status
- CoDeploy running since March 2004
  - Available on ~120 PlanetLab nodes
  - Used in a file synchronization service
- Low incremental overhead
  - Agent is about 500 semicolons of code
  - CDN modifications are about 20 semicolons
  - Techniques are portable to other CDNs
Parallelism vs. Chunk size
- More parallel requests
  - Involve more CDN nodes
- Bigger chunk size
  - Reduces per-chunk overheads
- Total buffer size (see the worked example after this list)
  - (# parallel) * (chunk size) * (# clients)
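For example, with the test setup used below (10 parallel chunks of 60 KB each), one client ties up roughly 10 * 60 KB = 600 KB of buffer space; at 120 clients that grows to about 10 * 60 KB * 120, roughly 70 MB.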
Total Buffer Size vs. Bandwidth
[Chart: bandwidth (Kbps) vs. total buffer size for 10 KB, 20 KB, 40 KB, and 80 KB chunks; buffer sizes range from 10 KB to 5.12 MB and bandwidth from 0 to about 12,000 Kbps. Bandwidth grows with total buffer size.]
Downloading Tests
- Server at a lightly-loaded Princeton node
- Downloading a 50 MB file
  - 10 parallel chunks of 60 KB each
- Point-to-point vs. one-to-many
- Test policies
  - Direct (with larger socket buffers)
  - Aggressive: point-to-point CoDeploy, no CDN
  - CoDeploy First
  - CoDeploy Cached
One Server One Client
(units: ping in ms, bandwidth in Kbps; Aggressive, First, and Cached are CoDeploy policies)

Node            Ping    Direct   Aggressive   First   Cached
Rutgers          5.6     18617      3624       4501    14123
U Maryland       5.5     12046      4095       5535    14123
U Notre Dame    35.1      8359      4179       7728    25599
U Michigan      46.9      5688      4708       6205    10239
U Nebraska      51.4      2786      4551       5119     4266
U Utah          66.8      2904      2512       3900     7585
UBC             74.6      3276      2100       4137     7728
U Washington    79.2      4551      1765       4357    17808
UC Berkeley     81.4      4501      6501       9308    20479
UCLA            84.6      2677      2178       4055     7314
One Server, Many (120) Clients

[Bar chart: per-client bandwidth (Kbps) at the 25th percentile, median, mean, and 75th percentile for Direct, Aggressive, CoDeploy First, and CoDeploy Cached; values range from several hundred Kbps up to about 6,000 Kbps.]
Chunk Download Times
- Short time-scale effects dominate
  - Low download time, high standard deviation
  - Implies fast == unpredictable
- We see this in practice
  - Some nodes' hit times vary by 20x
  - Makes managing timeouts harder (see the sketch after this list)
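One way to cope with such unpredictable chunk times is a hedged duplicate request; the sketch below illustrates the idea under assumed names and an assumed 3x-average threshold, and is not the CoDeploy policy.

from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def fetch_with_hedge(fetch, chunk, recent_avg, pool, factor=3.0):
    # fetch(chunk) downloads one chunk; recent_avg is the average
    # download time (seconds) of recently finished chunks; pool is a
    # ThreadPoolExecutor shared by the agent.
    first = pool.submit(fetch, chunk)
    done, _ = wait([first], timeout=factor * recent_avg)
    if done:
        return first.result()
    # The chunk is a straggler: race a duplicate request against it
    # and keep whichever copy finishes first.
    second = pool.submit(fetch, chunk)
    done, _ = wait([first, second], return_when=FIRST_COMPLETED)
    return done.pop().result()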
Lessons Learned
- Large file support over HTTP is possible
  - Basic implementation is easy
  - No client/server changes
- Tradeoffs are not where you expect
  - Flexibility in buffer size and parallelism
  - The hard part is managing performance
  - Short time-scale effects dominate
Future Work
- Better proximity information
  - Tradeoff with load balance
- Better timing management
  - Prefetching at reverse proxies
  - More aggressive retrying
- HTTP streaming
  - More chunks = less jitter?
More Info
http://codeen.cs.princeton.edu/codeploy/
KyoungSoo Park
kyoungso@cs.princeton.edu
Vivek Pai
vivek@cs.princeton.edu