Peer-to-Peer Networks
Outline
Overview
Pastry
OpenDHT
Contributions from Peter Druschel & Sean Rhea
What is Peer-to-Peer About?
• Distribution
• Decentralized control
• Self-organization
• Symmetric communication
• P2P networks do two things:
– Map objects onto nodes
– Route requests to the node responsible for a given object
Object Distribution
Consistent hashing [Karger et al. ‘97]
• 128-bit circular id space: 0 to 2^128 - 1
• nodeIds and objIds are uniform random
• Invariant: the node with the numerically closest nodeId maintains the object
Figure: an objId mapped onto the ring of nodeIds.
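To make the invariant concrete, here is a minimal Python sketch of consistent hashing on the 128-bit ring (not from the slides; MD5 is used only as a convenient 128-bit hash, and the helper names are illustrative):

import hashlib
import random

RING_BITS = 128
RING_SIZE = 1 << RING_BITS          # ids live in [0, 2^128 - 1]

def obj_id(name: bytes) -> int:
    # Hash an object name into the 128-bit id space.
    return int.from_bytes(hashlib.md5(name).digest(), "big")

def ring_distance(a: int, b: int) -> int:
    # Numerical distance on the circular id space.
    d = abs(a - b)
    return min(d, RING_SIZE - d)

def responsible_node(oid: int, node_ids: list) -> int:
    # Invariant: the node with the numerically closest nodeId maintains the object.
    return min(node_ids, key=lambda nid: ring_distance(nid, oid))

# Example: 100 nodes with uniform random nodeIds.
nodes = [random.getrandbits(RING_BITS) for _ in range(100)]
print(hex(responsible_node(obj_id(b"some-object"), nodes)))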
Object Insertion/Lookup
• A message with key X is routed to the live node with nodeId closest to X
• Problem: a complete routing table is not feasible
Figure: Route(X) on the 2^128-id ring.
Routing: Distributed Hash Table
Properties
• log16 N routing steps
• O(log N) state per node
Figure: example route for key d46a1c starting at node 65a1fc and passing d13da3, d4213f, and d462ba toward d467c4 and d471f1; each hop matches at least one more digit of the key.
Leaf Sets
Each node maintains the IP addresses of the nodes with the L/2 numerically
closest larger and the L/2 numerically closest smaller nodeIds.
• routing efficiency/robustness
• fault detection (keep-alive)
• application-specific local coordination
Routing Procedure
if (D is within range of our leaf set)
    forward to numerically closest leaf set member
else
    let l = length of prefix shared with D
    let d = value of the l-th digit in D’s address
    if (RouteTab[l,d] exists)
        forward to RouteTab[l,d]
    else
        forward to a known node that
            (a) shares at least as long a prefix with D, and
            (b) is numerically closer to D than this node
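As a rough Python rendering of this procedure (a sketch, not Pastry's code): it assumes base-16 digits (b = 4), a leaf set of nodeIds, and a routing table indexed as route_tab[l][d]; all names here are illustrative.

DIGIT_BITS = 4                      # b = 4, so ids are read as base-16 digits
ID_DIGITS = 128 // DIGIT_BITS       # 32 hex digits in a 128-bit id

def digits(node_id):
    return format(node_id, "0{}x".format(ID_DIGITS))

def shared_prefix_len(a, b):
    da, db = digits(a), digits(b)
    n = 0
    while n < ID_DIGITS and da[n] == db[n]:
        n += 1
    return n

def next_hop(self_id, leaf_set, route_tab, D):
    # Case 1: D falls within the leaf set range -> numerically closest member.
    if leaf_set and min(leaf_set) <= D <= max(leaf_set):
        return min(leaf_set + [self_id], key=lambda n: abs(n - D))

    l = shared_prefix_len(self_id, D)
    if l == ID_DIGITS:              # D is our own id
        return self_id

    # Case 2: routing table entry that matches one more digit of D.
    d = int(digits(D)[l], 16)
    if route_tab[l][d] is not None:
        return route_tab[l][d]

    # Case 3 (rare): any known node that shares at least as long a prefix
    # with D and is numerically closer to D than this node.
    known = [n for row in route_tab for n in row if n is not None] + leaf_set
    closer = [n for n in known
              if shared_prefix_len(n, D) >= l and abs(n - D) < abs(self_id - D)]
    return min(closer, key=lambda n: abs(n - D)) if closer else self_id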
Routing
Integrity of overlay:
• guaranteed unless L/2 simultaneous failures of
nodes with adjacent nodeIds
Number of routing hops:
• No failures: < log16 N expected, 128/b + 1 max
• During failure recovery:
– O(N) worst case, average case much better
Node Addition
Figure: a new node with nodeId d46a1c joins by having a nearby node (65a1fc) route a message with key d46a1c; the message passes d13da3, d4213f, and d462ba toward d467c4 and d471f1, the existing nodes numerically closest to the new nodeId.
Node Departure (Failure)
Leaf set members exchange keep-alive messages
• Leaf set repair (eager): request set from farthest
live node in set
• Routing table repair (lazy): get table from peers
in the same row, then higher rows
PAST: Cooperative, archival file storage and distribution
• Layered on top of Pastry
• Strong persistence
• High availability
• Scalability
• Reduced cost (no backup)
• Efficient use of pooled resources
PAST API
• Insert - store replicas of a file at k diverse storage
nodes
• Lookup - retrieve file from a nearby live storage
node that holds a copy
• Reclaim - free storage associated with a file
Files are immutable
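As a sketch of how the three calls sit on top of Pastry routing (illustrative class and method names, not PAST's actual code; fileId derivation is simplified):

class Past:
    """Sketch of the PAST interface layered on a Pastry routing object."""

    def __init__(self, pastry):
        self.pastry = pastry

    def insert(self, file_bytes, k):
        # Store replicas of the immutable file on the k nodes whose nodeIds
        # are numerically closest to the fileId (fileId derivation simplified).
        file_id = self.pastry.hash_id(file_bytes)
        self.pastry.route(file_id, ("INSERT", file_bytes, k))
        return file_id

    def lookup(self, file_id):
        # Retrieve the file from a nearby live node that holds a replica.
        return self.pastry.route(file_id, ("LOOKUP",))

    def reclaim(self, file_id):
        # Free the storage associated with the file; not a delete operation.
        self.pastry.route(file_id, ("RECLAIM",))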
PAST: File storage
Figure: Insert(fileId) routed toward the node with nodeId numerically closest to fileId.
PAST: File storage
• Storage invariant: file “replicas” are stored on the k nodes with nodeIds closest to fileId (k is bounded by the leaf set size)
Figure: Insert(fileId) with k = 4 replicas placed on the four closest nodes.
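The storage invariant itself is a one-liner: given the live nodeIds, the replica set for a fileId is simply the k numerically closest nodes (a sketch; wraparound at 2^128 is ignored for brevity, and k is bounded by the leaf set size).

def replica_set(file_id, node_ids, k):
    # The k nodes with nodeIds numerically closest to fileId hold the replicas.
    return sorted(node_ids, key=lambda nid: abs(nid - file_id))[:k]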
PAST: File Retrieval
• Lookup(fileId): file located in log16 N steps (expected)
• Usually locates the replica nearest to the client C
Figure: client C retrieves one of the k replicas.
DHT Deployment Today
Figure: each application today runs on its own DHT instance, all layered over IP connectivity.
• Applications: PAST (MSR/Rice), CFS (MIT), OStore (UCB), pSearch (HP), Coral (NYU), i3 (UCB), PIER (UCB), Overnet (open)
• DHTs underneath: Pastry, Chord, Tapestry, CAN, Kademlia, Bamboo
Every application deploys its own DHT (DHT as a library).
DHT Deployment Tomorrow?
Figure: the same applications (PAST, CFS, OStore, pSearch, Coral, i3, PIER, Overnet) sit above an indirection layer and share a single DHT, which runs over IP connectivity.
OpenDHT: one DHT, shared across applications (DHT as a service).
Two Ways To Use a DHT
1. The Library Model
   – DHT code is linked into the application binary
   – Pros: flexibility, high performance
2. The Service Model
   – DHT accessed as a service over RPC
   – Pros: easier deployment, less maintenance
The OpenDHT Service
• 200-300 Bamboo [USENIX’04] nodes on PlanetLab
– All in one slice, all managed by us
• Clients can be arbitrary Internet hosts
– Access DHT using RPC over TCP
• Interface is simple put/get:
– put(key, value) — stores value under key
– get(key) — returns all the values stored under key
• Running on PlanetLab since April 2004
– Building a community of users
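A small client sketch of the put/get interface over RPC. The gateway address, method names, and argument list below are assumptions for illustration, not OpenDHT's documented wire format:

import xmlrpc.client

GATEWAY = "http://opendht-gateway.example.org:5851/"   # placeholder address

class DhtClient:
    def __init__(self, gateway_url=GATEWAY, app="demo-app"):
        self.proxy = xmlrpc.client.ServerProxy(gateway_url)
        self.app = app

    def put(self, key, value, ttl_secs):
        # Store value under key for ttl_secs seconds.
        self.proxy.put(xmlrpc.client.Binary(key),
                       xmlrpc.client.Binary(value),
                       ttl_secs, self.app)

    def get(self, key):
        # Return all values currently stored under key.
        values = self.proxy.get(xmlrpc.client.Binary(key), self.app)
        return [v.data for v in values]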
OpenDHT Applications
Application                               Uses OpenDHT for
Croquet Media Manager                     replica location
DOA                                       indexing
HIP                                       name resolution
DTN Tetherless Computing Architecture     host mobility
Place Lab                                 range queries
QStream                                   multicast tree construction
VPN Index                                 indexing
DHT-Augmented Gnutella Client             rare object search
FreeDB                                    storage
Instant Messaging                         rendezvous
CFS                                       storage
i3                                        redirection
An Example Application:
The CD Database
Figure: the client computes the disc fingerprint, asks the database to recognize the fingerprint, and receives the album and track titles.
An Example Application:
The CD Database
Figure: if the database replies “no such fingerprint”, the user types in the album and track titles, which are then submitted back to the database.
A DHT-Based FreeDB Cache
• FreeDB is a volunteer service
– Has suffered outages as long as 48 hours
– Service costs borne largely by volunteer mirrors
• Idea: Build a cache of FreeDB with a DHT
– Add to availability of main service
– Goal: explore how easy this is to do
Cache Illustration
Figure: CD-lookup clients consult a DHT cache in front of the FreeDB service.
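A sketch of the read-through cache logic, reusing the DhtClient sketch from earlier; query_freedb stands in for the real FreeDB protocol, and the week-long TTL is an arbitrary choice:

import hashlib

CACHE_TTL = 7 * 24 * 3600             # assumed cache lifetime: one week

def lookup_disc(dht, fingerprint, query_freedb):
    # Key the cache by a hash of the disc fingerprint.
    key = hashlib.sha1(fingerprint.encode()).digest()
    cached = dht.get(key)
    if cached:
        return cached[0]                # cache hit: album and track titles
    titles = query_freedb(fingerprint)  # cache miss: ask the FreeDB service
    dht.put(key, titles, CACHE_TTL)     # write back so other clients benefit
    return titles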
Is Providing DHT Service Hard?
• Is it any different than just running Bamboo?
– Yes, sharing makes the problem harder
• OpenDHT is shared in two senses
– Across applications → need a flexible interface
– Across clients → need resource allocation
Sharing Between Applications
• Must balance generality and ease-of-use
– Many apps want only simple put/get
– Others want lookup, anycast, multicast, etc.
• OpenDHT allows only put/get
  – But a client-side library, ReDiR, is used to build the others
  – Supports lookup, anycast, multicast, range search
  – Only a constant latency increase on average
  – (Different approach used by DimChord [KR04])
Sharing Between Clients
• Must authenticate puts/gets/removes
– If two clients put with same key, who wins?
– Who can remove an existing put?
• Must protect system’s resources
– Or malicious clients can deny service to others
– The remainder of this talk
Fair Storage Allocation
• Our solution: give each client a fair share
– Will define “fairness” in a few slides
• Limits strength of malicious clients
– Only as powerful as they are numerous
• Protect storage on each DHT node separately
– Global fairness is hard
– Key choice imbalance is a burden on DHT
– Reward clients that balance their key choices
Two Main Challenges
1. Making sure disk is available for new puts
   – As load changes over time, need to adapt
   – Without some free disk, our hands are tied
2. Allocating free disk fairly across clients
   – Adapt techniques from fair queuing
Making Sure Disk is Available
• Can’t store values indefinitely
– Otherwise all storage will eventually fill
• Add time-to-live (TTL) to puts
– put(key, value) → put(key, value, ttl)
– (Different approach used by Palimpsest [RH03])
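A toy illustration (not OpenDHT code) of how TTLs bound storage: every stored value carries an expiry time, and expired values are dropped lazily when the key is read.

import time
from collections import defaultdict

class TtlStore:
    """Toy per-node store in which every put expires after its TTL."""

    def __init__(self):
        self.values = defaultdict(list)   # key -> list of (expiry_time, value)

    def put(self, key, value, ttl_secs):
        self.values[key].append((time.time() + ttl_secs, value))

    def get(self, key):
        now = time.time()
        live = [(exp, v) for (exp, v) in self.values[key] if exp > now]
        self.values[key] = live          # drop expired values lazily
        return [v for _, v in live]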
Making Sure Disk is Available
• TTLs prevent long-term starvation
– Eventually all puts will expire
• Can still get short term starvation:
Timeline: Client A arrives and fills the entire disk; Client B arrives and asks for space; B starves until Client A’s values start expiring.
Making Sure Disk is Available
• Stronger condition: be able to accept r_min bytes/sec of new data at all times
Figure: committed storage plotted as size vs. time (each put lasting until its TTL, up to the maximum TTL), a region of slope r_min reserved for future puts, and a candidate put; the sum must stay below the maximum capacity.
Making Sure Disk is Available
• Stronger condition: be able to accept r_min bytes/sec of new data at all times
Figure: the same test shown on two example storage timelines (size vs. time, bounded by the maximum TTL and the maximum capacity).
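The figures suggest the following admission test (a hedged reconstruction, not OpenDHT's exact algorithm): track the bytes still committed at each future instant, reserve capacity that grows at r_min for future puts, and accept a candidate put only if the total never exceeds the disk. The constants and the one-second checking granularity below are assumptions.

MAX_TTL = 3600            # T: maximum allowed TTL in seconds (assumed)
CAPACITY = 10 * 2**30     # maximum disk capacity in bytes (assumed)
R_MIN = 1024              # r_min: rate we must always be able to accept (bytes/sec)

def committed_at(puts, tau):
    # Bytes of already-accepted puts still stored tau seconds from now.
    # puts: list of (size_bytes, remaining_ttl_secs) pairs.
    return sum(size for size, ttl in puts if ttl > tau)

def accept_put(puts, size, ttl):
    # Accept only if, at every future instant up to T, existing commitments
    # plus this put plus the r_min reservation still fit under the capacity.
    for tau in range(0, MAX_TTL + 1):
        candidate = size if ttl > tau else 0
        if committed_at(puts, tau) + candidate + R_MIN * tau > CAPACITY:
            return False
    return True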
Fair Storage Allocation
Flowchart: an arriving put is placed in a per-client put queue (or rejected if that queue is full); the node waits until it can accept a put without violating r_min, selects the most under-represented client, stores its put, and sends an accept message back to the client.
The Big Decision: the definition of “most under-represented”.
Defining “Most Under-Represented”
• Not just sharing disk, but disk over time
– A 1-byte put for 100 s is the same as a 100-byte put for 1 s
– So the units are bytes × seconds; call them commitments
• Equalize total commitments granted?
– No: leads to starvation
– A fills the disk, B starts putting, A starves for up to the max TTL
Timeline: Client A arrives and fills the entire disk; Client B arrives and asks for space; B catches up with A; now A starves.
Defining “Most Under-Represented”
• Instead, equalize rate of commitments granted
– Service granted to one client depends only on others putting “at the same time”
Timeline: Client A arrives and fills the entire disk; Client B arrives and asks for space; B catches up with A; afterwards A and B share the available rate.
Defining “Most Under-Represented”
• Instead, equalize rate of commitments granted
– Service granted to one client depends only on others putting “at the same time”
• Mechanism inspired by Start-time Fair Queuing
– Have virtual time, v(t)
– Each put p_c^i (the i-th put by client c) gets a start time S(p_c^i) and a finish time F(p_c^i):
  F(p_c^i) = S(p_c^i) + size(p_c^i) × ttl(p_c^i)
  S(p_c^i) = max(v(A(p_c^i)) - δ, F(p_c^{i-1}))
  v(t) = maximum start time of all accepted puts
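A sketch of the bookkeeping behind these formulas: accepting a put advances the client's finish time by its commitment (size × ttl), a new put's start time is clipped to the current virtual time (the delta burst allowance is an assumed reading of the slide's offset), and the most under-represented client is the one whose next queued put has the smallest start time.

class FairQueue:
    """Sketch of commitment-rate fairness via start-time fair queuing."""

    def __init__(self, delta=0):
        self.delta = delta            # burst allowance; assumed interpretation
        self.virtual_time = 0         # v(t): max start time of accepted puts
        self.last_finish = {}         # F of each client's previous put

    def start_time(self, client):
        # S(p_c^i) = max(v(arrival) - delta, F(p_c^{i-1}))
        return max(self.virtual_time - self.delta,
                   self.last_finish.get(client, 0))

    def accept(self, client, size, ttl):
        # F(p_c^i) = S(p_c^i) + size * ttl, and v(t) advances to the max start.
        s = self.start_time(client)
        self.last_finish[client] = s + size * ttl
        self.virtual_time = max(self.virtual_time, s)

    def most_under_represented(self, queued_clients):
        # The client whose pending put would get the smallest start time goes next.
        return min(queued_clients, key=self.start_time)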
Fairness with Different Arrival Times
Fairness With Different Sizes and TTLs
Performance
• Only 28 of 7 million values lost in 3 months
– Where “lost” means unavailable for a full hour
• On Feb. 7, 2005, lost 60 of 190 nodes in 15 minutes to a PlanetLab kernel bug, yet lost only one value
Performance
• Median get latency ~250 ms
– Median RTT between hosts ~ 140 ms
• But 95th percentile get latency is atrocious
– And even median spikes up from time to time
The Problem: Slow Nodes
• Some PlanetLab nodes are just really slow
– But the set of slow nodes changes over time
– Can’t “cherry pick” a set of fast nodes
– Seems to be the case on RON as well
– May even be true for managed clusters (MapReduce)
• Modified OpenDHT to be robust to such slowness
– Combination of delay-aware routing and redundancy
– Median now 66 ms, 99th percentile is 320 ms
• using 2X redundancy
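One way to picture the redundancy half of that fix (a sketch, not OpenDHT's implementation): issue the same get to the two lowest-delay replicas in parallel and return whichever answer arrives first.

from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def redundant_get(replicas, key, fetch):
    # replicas: candidate nodes, assumed ordered by measured delay (delay-aware);
    # fetch(node, key): blocking call returning the values stored at that node.
    pool = ThreadPoolExecutor(max_workers=2)
    futures = [pool.submit(fetch, node, key) for node in replicas[:2]]
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)         # do not block on the slower replica
    return next(iter(done)).result()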