CoBlitz: A Scalable Large-file Transfer Service
(COS 461)
KyoungSoo Park
Princeton University
Large-file Distribution
• Increasing demand for large files
• Movie and software releases
• On-line movie downloads
• Linux distributions
• Files are 100MB ~ tens of GB
• One-to-many downloads
How to serve large files to many clients?
• Content Distribution Network (CDN)?
• Peer-to-peer system?
What CDNs Are Optimized For
Most Web files are small (1KB ~ 100KB)
Why Not Web CDNs?
• Whole-file caching in participating proxies
• Optimized for 10KB objects
• 2GB = 200,000 x 10KB
• Memory pressure
• Working sets do not fit in memory
• Disk access is 1000 times slower
• Waste of resources
• More servers needed
• Provisioning is a must
Peer-to-Peer?
• BitTorrent takes up ~30% of Internet BW
[Figure: a swarm of peers exchanging chunks (upload/download links), plus a torrent file and a tracker]
1. Download a “torrent” file
2. Contact the tracker
3. Enter the “swarm” network
4. Chunk exchange policy
- Rarest chunk first or random
- Tit-for-tat: incentive to upload
- Optimistic unchoking
5. Validate the checksums
Benefit: extremely good use of resources!
Peer-to-Peer?
• Custom software
• Deployment is a must
• Configuration needed
• Companies may want a managed service
• Handles flash crowds
• Handles long-lived objects
• Performance problems
• Hard to guarantee service quality
• Others discussed later
What We’d Like Is
Large-file service with
No custom client
No custom server
No prepositioning
No rehosting
No manual provisioning
CoBlitz: Scalable Large-file CDN
• Reducing the problem to small-file CDN
• Split large files into chunks
• Distribute chunks at proxies
• Aggregate memory/cache
• HTTP needs no deployment
• Benefits
• Faster than BitTorrent by 55-86% (~500%)
• One copy from origin serves 43-55 nodes
• Incremental build on existing CDNs
How It Works
CDN = Redirector/DNS + Reverse Proxy
Only the reverse proxy (CDN) caches the chunks!
[Figure: each client's agent sends requests for coblitz.codeen.org; chunk requests (chunk 1, chunk 3, chunk 5, ...) are spread across CDN nodes, which fetch missing chunks from the Origin Server via HTTP range queries and cache them]
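To make the chunk fetch concrete, here is a minimal sketch of fetching one chunk with an HTTP range request, the way the CDN nodes pull byte ranges from the origin. The chunk size, URL shape, and helper name are illustrative assumptions, not CoBlitz's actual values.

import urllib.request

CHUNK_SIZE = 1 * 1024 * 1024  # assumed 1 MB chunks; the real chunk size may differ

def fetch_chunk(url, index):
    """Fetch chunk `index` of `url` using an HTTP Range request."""
    start = index * CHUNK_SIZE
    end = start + CHUNK_SIZE - 1  # Range header bounds are inclusive
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        # A 206 Partial Content status means the server honored the range request
        return resp.read()

# Hypothetical usage, following the URL shape on the slides:
# data = fetch_chunk("http://coblitz.codeen.org:3125/http://example.org/big.iso", 0)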
Smart Agent
• Preserves HTTP semantics
• Parallel chunk requests
[Figure: the client speaks plain HTTP to the Agent; the Agent keeps a sliding window of chunk requests outstanding at several CDN nodes, with each chunk marked done, waiting, or not yet requested]
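As a rough sketch of the sliding-window idea, assuming the fetch_chunk helper from the earlier sketch; the window size and function names are illustrative:

from concurrent.futures import ThreadPoolExecutor

def download(url, num_chunks, window=4):
    """Keep up to `window` chunk requests in flight at once and
    reassemble the file in order."""
    results = [None] * num_chunks
    with ThreadPoolExecutor(max_workers=window) as pool:
        futures = {pool.submit(fetch_chunk, url, i): i for i in range(num_chunks)}
        for fut, i in futures.items():
            results[i] = fut.result()  # waits for each chunk in submission order
    return b"".join(results)

The real agent streams each chunk back to the client as soon as the head of the window completes, rather than buffering the whole file as this sketch does.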
Chunk Indexing: Consistent Hashing
Problem: How to find the node responsible for a specific chunk?
Static hashing: f(x) = some_f(x) % n, where n is the number of servers
[Figure: chunk requests X1, X2, X3, ... hashed onto CDN nodes (proxies) placed around the space 0 ... N-1]
But n is dynamic for servers:
- a node can go down
- a new node can join
Consistent hashing: F(x) = some_F(x) % N, where N is a large but fixed number
Assign each chunk to the live node k for which |F(k) - F(URL)| is minimum
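A minimal sketch of the lookup the slide describes, using F(x) = hash(x) % N with a large fixed N; the hash function and node names are assumptions for illustration.

import hashlib

N = 2**31  # large but fixed hash space, as on the slide

def F(key):
    """Hash a node name or chunk URL into the fixed space [0, N)."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % N

def responsible_node(chunk_url, live_nodes):
    """Pick the live node k minimizing |F(k) - F(chunk_url)|."""
    target = F(chunk_url)
    return min(live_nodes, key=lambda k: abs(F(k) - target))

# nodes = ["proxy1.example.org", "proxy2.example.org", "proxy3.example.org"]
# responsible_node("http://example.org/big.iso?chunk=17", nodes)

Because N stays fixed, a node joining or leaving only moves the chunks whose closest live node changed, instead of reshuffling everything as f(x) % n does.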
Operation & Challenges
• Public service in operation for over 2.5 years
• http://coblitz.codeen.org:3125/URL
• Challenges
• Scalability & robustness
• Peering set difference
• Load to the origin server
Unilateral Peering
• Independent proximity-aware peering
• Pick “n” close nodes around me
• Cf. BitTorrent picks “n” nodes randomly
• Motivation
• Partial network connectivity
• Internet2, CANARIE nodes
• Routing disruption
• Isolated nodes
• Benefits
• No synchronized maintenance problem
• Improves both scalability & robustness
Peering Set Difference
• No perfect clustering by design
• Assumption
• Close nodes share common peers
[Figure: two nearby nodes with overlapping peer sets; some peers are reachable by both, others by only one of them]
Peering Set Difference
• Highly variable application-level RTTs
• ~10x the variance of ICMP RTTs
• High rate of change in the peer set
• Close nodes share less than 50% of their peers
• Low cache hit rate
• Low memory utility
• Excessive load on the origin
Peering Set Difference
• How to fix?
• Avg RTT → min RTT
• Increase # of samples
• Increase # of peers
• Hysteresis
• Close nodes now share more than 90% of their peers
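One way to picture these fixes, as a sketch under assumed data structures (not the CoBlitz implementation): keep the minimum observed application-level RTT per candidate, and only swap a current peer out when a candidate is clearly better, which is the hysteresis.

def update_peers(current, min_rtt, n, margin=0.8):
    """Choose n peers by minimum observed RTT, with hysteresis:
    a newcomer evicts the current worst peer only if its RTT is
    below `margin` times the worst peer's RTT."""
    peers = set(current)
    ranked = sorted(min_rtt, key=min_rtt.get)  # fastest candidates first
    for node in ranked:                        # fill up to n peers
        if len(peers) >= n:
            break
        peers.add(node)
    for node in ranked:                        # consider replacements
        if node in peers:
            continue
        worst = max(peers, key=lambda p: min_rtt.get(p, float("inf")))
        if min_rtt[node] < margin * min_rtt.get(worst, float("inf")):
            peers.discard(worst)
            peers.add(node)
    return peers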
Reducing Origin Load
• Still have peering set difference
• Critical for traffic to the origin
• Proximity-based routing
• Rerun hashing at each extra hop
• Converges exponentially fast
• 3-15% of requests take one more hop
• Implicit overlay tree per chunk
[Figure: requests for the same chunk converge toward one node, which alone contacts the origin server]
• Result
• Origin load reduced by 5x
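A sketch of the "rerun hashing" step, reusing the hypothetical responsible_node helper from the consistent-hashing sketch: when a node does not have a chunk cached, it reruns the lookup over the nodes not yet tried, so requests for the same chunk converge along an implicit tree whose root alone contacts the origin.

def next_hop(chunk_url, live_nodes, already_tried):
    """Rerun the consistent-hash lookup, skipping nodes already tried,
    to pick the next node to forward this chunk request to."""
    remaining = [node for node in live_nodes if node not in already_tried]
    if not remaining:
        return None  # no candidates left: fall back to the origin server
    return responsible_node(chunk_url, remaining)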
Scale Experiments
• Use all live PlanetLab nodes as clients
• 380~400 live nodes at any time
• Simultaneous fetch of 50MB file
• Test scenarios
• Direct
• BitTorrent Total/Core
• CoBlitz uncached/cached/staggered
• Out-of-order numbers in the paper
Throughput Distribution
[Figure: CDF of per-node throughput (Kbps) for Direct, BT-total, BT-core, and in-order CoBlitz uncached/staggered/cached; out-of-order staggered CoBlitz outperforms BT-core by 55-86%]
Downloading Times
95th percentile: 1000+ seconds faster
[Figure: CDF of download time (0-2000 sec) for in-order cached, in-order staggered, and in-order uncached CoBlitz, BT-core, BT-total, and Direct]
Why Is BitTorrent Slow?
• In the experiments
• No locality – randomly choose peers
• Chunk indexing – extra communication
• Trackerless BitTorrent – Kademlia DHT
• In practice
• Upload capacity of typical peers is low
• 10 Kbps to a few hundred Kbps for cable/DSL users
• Tit-for-tat may not be fair
• A few high-capacity uploaders help the most
• BitTyrant [NSDI '07]
Synchronized Workload
[Figure: many clients fetch at the same time (synchronized workload), causing congestion on the links near the origin server]
Addressing Congestion
• Proximity-based multi-hop routing
• Overlay tree for each chunk
• Dynamic chunk-window resizing
• Increase by 1/log(x), where x is the window size, if a chunk finishes faster than average
• Decrease by 1 if a retry kills the first chunk
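A sketch of the window-adjustment rule as stated on the slide; the surrounding bookkeeping (chunk timing, retry detection) is assumed.

import math

def adjust_window(win, chunk_time, avg_time, first_chunk_retried):
    """Grow the chunk window slowly when chunks finish faster than
    average; shrink it when the oldest outstanding chunk was retried."""
    if first_chunk_retried:
        win = max(1.0, win - 1.0)            # decrease by 1
    elif chunk_time < avg_time:
        win += 1.0 / math.log(max(win, 2))   # increase by 1/log(x)
    return win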
Number of Failures
[Figure: failure percentage: Direct 5.7%, BitTorrent 4.3%, CoBlitz 2.1%]
Performance After Flash Crowds
[Figure: fraction of nodes exceeding a given throughput after the flash crowd; in-order CoBlitz: 70+% of nodes above 5 Mbps; BitTorrent: 20% above 5 Mbps]
Data Reuse
7 fetches for 400 nodes, 98% cache hit
[Figure: utility (# of nodes served per origin copy) for Shark, BitTorrent, and CoBlitz; CoBlitz serves roughly 55 nodes per copy]
Real-world Usage
• 1-2 Terabytes/day
• Fedora Core official mirror
• US-East/West, England, Germany, Korea, Japan
• CiteSeer repository (50,000+ links)
• University Channel (podcast/video)
• Public lecture distribution by PU OIT
• Popular game patch distribution
• PlanetLab researchers
• Stork (U of Arizona) + ~10 others
Fedora Core 6 Release
• October 24th, 2006
• Peak Throughput 1.44Gbps
[Figure: aggregate throughput on release day: release point at 10am, traffic peaks above 1 Gbps, while the origin server sees only 30-40 Mbps]
On Fedora Core Mirror List
• Many people complained about I/O
• Peak of 500 Mbps out of 2 Gbps
• 2 Sun x4200s w/ dual Opterons, 2 GB memory
• 2.5 TB SATA-based SAN
• All ISOs in disk cache or in-memory FS
• CoBlitz uses 100 MB of memory per node
• Many PlanetLab node disks are IDE
• Most nodes are BW-capped at 10 Mbps
Conclusion
• Scalable large-file transfer service
• Evolution under real traffic
• Up and running 24/7 for over 2.5 years
• Unilateral peering, multi-hop routing, window-size adjustment
• Better performance than P2P
• Better throughput, download time
• Far less origin traffic
Thank you!
More information:
http://codeen.cs.princeton.edu/coblitz/
How to use:
http://coblitz.codeen.org:3125/URL*
*Some content restrictions apply
See Web site for details
Contact me if you want full access!