(Re)Design Considerations for
Scalable Large-File Content Distribution
Brian Biskeborn, Michael Golightly,
KyoungSoo Park, and Vivek Pai
7/2/2016
Systems Lunch
1
Design meets realities
• Challenges in deploying distributed systems
• Real issues that are feedback for better design
• Not about a novel idea
• Performance debugging with CoBlitz
  • Peering strategy
  • Reducing load to the origin
  • Latency bottlenecks
CoBlitz background
• Scalable large-file service
  • HTTP on top of a conventional CDN
  • Cache by chunk rather than whole file
  • Transparent split/merge of chunks
  • http://coblitz.codeen.org:3125/your_url
• Deployed on PlanetLab
  • 10 months of North American deployment
  • 10 months of world-wide deployment
How it works
Only the reverse proxy (CDN = redirector + reverse proxy) caches the chunks!
[Diagram: clients' agents request chunks (file 0-1, 1-2, 2-3, 3-4, 4-5) from a mesh of CDN nodes; each chunk is cached at the CDN node responsible for it]
Smart agent
• Preserves HTTP semantics
  • Split large request into chunk requests
  • Merge chunk responses into one on the fly
  • In-order delivery
• Parallel chunk requests
  • Keep a sliding window of chunk requests
  • Retry slow chunks
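As a rough illustration, the agent's sliding-window behavior can be sketched as below. This is a minimal Python sketch, not the deployed agent: `fetch_chunk` and the window size are placeholders for the real HTTP chunk requests, and the retry-slow-chunks logic is omitted.

```python
from concurrent.futures import ThreadPoolExecutor

WINDOW = 4  # hypothetical window size; the real agent tunes this

def fetch_chunk(url, index):
    # Stand-in for an HTTP range request to a reverse proxy;
    # here we just fabricate the chunk's bytes for illustration.
    return f"chunk{index}".encode()

def download(url, num_chunks):
    """Keep a sliding window of parallel chunk requests and merge
    the responses in order, as the smart agent does."""
    result = []
    with ThreadPoolExecutor(max_workers=WINDOW) as pool:
        futures = {i: pool.submit(fetch_chunk, url, i)
                   for i in range(min(WINDOW, num_chunks))}
        next_to_submit = len(futures)
        for i in range(num_chunks):
            # Block on the next chunk in sequence: in-order delivery.
            result.append(futures.pop(i).result())
            # Slide the window forward by one request.
            if next_to_submit < num_chunks:
                futures[next_to_submit] = pool.submit(fetch_chunk, url, next_to_submit)
                next_to_submit += 1
    return b"".join(result)
```

Because delivery is in order, a single slow chunk at the front of the window stalls everything behind it, which is exactly the latency problem the later slides address.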
Highest Random Weight (HRW)
• Each proxy runs HRW to pick a reverse proxy
• A form of consistent hashing
  • Input: peer nodes + URL
  • Output: list of nodes in deterministic order
  • Action: pick the one with the highest ranking
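A minimal sketch of HRW ranking in Python. SHA-1 is an assumption for illustration; the slides do not name the hash function CoBlitz actually uses.

```python
import hashlib

def hrw_rank(nodes, url):
    """Rank peer nodes by Highest Random Weight: hash each
    (node, URL) pair and sort by the digest. Every proxy that runs
    this over the same peer list computes the same order, so
    requests for one URL converge on the same reverse proxy."""
    def weight(node):
        return hashlib.sha1(f"{node}|{url}".encode()).digest()
    return sorted(nodes, key=weight, reverse=True)
```

The proxy then picks `hrw_rank(peers, url)[0]`; if that node is down, the next entry in the list serves as a deterministic fallback.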
Peering
• Each node independently chooses its peers
• Before:
  • UDP ping, averaged over the last four RTTs
  • Hysteresis
• Problem:
  • Overlap of peer lists < 50%
  • Non-network delays introduced noise
• After:
  • Use MinRTT, increase # of RTT samples
  • Overlap of peer lists > 90%
Reducing origin load
• Load to the origin
  • Caused by differences between nodes' peer sets
• Solution
  • Allow more peers
  • Multi-hop routing
[Diagram: origin server with reverse proxies; some nodes peer with both proxies, others with only one, so each proxy set fetches from the origin separately]
Latency bottlenecks
• Slow nodes are bad for a synchronized workload
  • Agent's window progress gets stuck
  • Temporary congestion
• Original design
  • Retry timeout
• Redesign
  • Have multiple connections compete for the same chunk
  • Avoid nodes entirely if they are too slow
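The competing-connections idea can be sketched as a hedged request: ask several nodes for the same chunk in parallel and take whichever answers first. This is an illustrative sketch, assuming a hypothetical `fetch_from` in place of the real proxy connection.

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def fetch_from(node, chunk_id):
    # Stand-in for a chunk request to one reverse proxy.
    return (node, f"chunk{chunk_id}")

def hedged_fetch(nodes, chunk_id):
    """Let connections to several candidate nodes compete for the
    same chunk and return the first response, so one slow or
    congested node cannot stall the agent's window."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        futures = [pool.submit(fetch_from, n, chunk_id) for n in nodes]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()
```

The trade-off is extra load: every hedged request costs duplicate work at the losing nodes, which is why the redesign also avoids known-slow nodes outright rather than always racing them.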
Fractional HRW?
• Introduce a weight in [0..1] to HRW
  • Lower weight for slower nodes
  • Choose a node only if last_10_bits(HRW hash)/1024 < weight
• Gives slower nodes less chance of selection
• Experiment results
  • Overall, it works as we expected
  • Not great for a synchronized workload
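A sketch of the fractional-HRW filter described above. Only the last-10-bits test comes from the slide; the hash function (SHA-1 here) and helper names are assumptions for illustration.

```python
import hashlib

def hrw_hash(node, url):
    # Assumed hash; the slides don't name the actual function.
    return int.from_bytes(hashlib.sha1(f"{node}|{url}".encode()).digest(), "big")

def eligible(node, url, weight):
    """Fractional HRW filter: a node stays a candidate only if the
    last 10 bits of its HRW hash, scaled to [0, 1), fall below its
    weight. Weight 1.0 is always eligible; weight 0.5 survives the
    filter for roughly half of all URLs."""
    return (hrw_hash(node, url) & 0x3FF) / 1024 < weight

def pick(nodes_with_weights, url):
    """Filter by weight, then pick the surviving node with the
    highest HRW rank, as in plain HRW."""
    candidates = [n for n, w in nodes_with_weights if eligible(n, url, w)]
    return max(candidates, key=lambda n: hrw_hash(n, url)) if candidates else None
```

Because the filter is a deterministic function of (node, URL), all proxies still agree on the chosen node; the weight only thins out how often a slow node appears in the candidate set.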
Bandwidth
[Chart: 144 nodes sorted by bandwidth (Mbit/s, 0-120); the slowest nodes at the low end are marked as potential bottlenecks]
Worst vs. best sites
Worst five sites:

Site (# nodes)    Node Avg      Site Avg   Fastest
uoregon.edu (3)   2.46 - 2.66   2.59       4.63
cmu.edu (3)       3.50 - 3.95   3.67       5.74
csusb.edu (2)     3.93 - 4.21   4.07       6.76
rice.edu (3)      4.27 - 4.98   4.66       7.88
uconn.edu (2)     4.24 - 6.11   5.15       42.08

Best five sites:

Site (# nodes)    Node Avg      Site Avg   Fastest
neu.edu (2)       94.5 - 97.4   95.9       60.1
pitt.edu (1)      88.7          88.7       57.3
unc.edu (2)       84.6 - 87.1   85.9       66.1
rutgers.edu (2)   83.3 - 86.1   84.7       60.1
duke.edu (3)      80.5 - 89.9   84.2       59.6
Downloading experiment
• Fetch a 50MB file from a Princeton server
  • Use 115 PlanetLab nodes at the same time
  • Uncached workload
• Evaluate our redesign step-by-step
  • Original
  • NoSlow
  • MinRTT
  • 120Peers
  • RepFactor
  • MultiHop
  • NewAgent
Step-by-step improvement
[CDF: fraction of nodes vs. bandwidth (Kbps, 0-8000) for Original, No Slow, Min RTT, 120 Peers, Rep Factor, MultiHop, NewAgent, and BitTorrent]
Reduction of load at origin
[Bar chart: requests to origin out of 115 total. Original: 19, 120Peers: 11.5, MultiHop: 3.8 (3.8/19 = 1/5)]
Conclusion
• Initial design may not reflect the realities of deployment
• Redesign dramatically improves the system
  • MinRTT
  • MultiHop
  • Aggressive retries
• Result
  • 300% faster for a synchronized workload
  • 80% reduction in origin load
Who’s using CoBlitz
• CiteSeer (http://citeseer.ist.psu.edu/)
  • PS/PDF links point to CoBlitz
• PlanetLab projects
  • Arizona Stork
  • Harvard SBON
• Fedora Core mirror
  • http://coblitz.planet-lab.org/pub/fedora/linux/core/
Thanks!
• http://codeen.cs.princeton.edu/coblitz/
• Demo?
Comparisons with other systems
• BitTorrent, Shark, BulletPrime

System             # nodes   Median   Mean
CoBlitz cached     115       6.5      6.7
CoBlitz uncached   115       6.1      6.1
BitTorrent         115       2.0      2.9
Shark              185       1.0
CoBlitz cached     41        7.3      8.1
CoBlitz uncached   41        7.1      7.4
BulletPrime        41        7.0
Measuring bandwidths
• Have the nearest 10 nodes issue TCP connections
• Average the aggregate bandwidth over 30 seconds