Pastry

PASTRY
Sources

Pastry paper

“Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems” by Antony Rowstron (Microsoft Research) and Peter Druschel (Rice University), IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), Heidelberg, Germany, pages 329-350, November 2001

Pastry homepage

http://research.microsoft.com/en-us/um/people/antr/Pastry/default.htm
Related work









Chord [Sigcomm’01]
CAN [Sigcomm’01]
Tapestry [TR UCB/CSD-01-1141]
PNRP [unpub.]
Viceroy [PODC ’02]
Kademlia [IPTPS ’02]
Small World [Kleinberg ’99, ’00]
Plaxton Trees [Plaxton et al. ’97]
Generalized Hypercube [Bhuyan et al. ’84]
Pastry
Generic p2p location and routing substrate (DHT)





Self-organizing overlay network (joins, departures, locality repair)
Consistent hashing
Lookup/insert object in < log_{2^b} N routing steps (expected)
O(log N) per-node state
Network locality heuristics
Scalable, fault resilient, self-organizing, locality aware, secure
Pastry: Object distribution
Consistent hashing
128-bit circular id space (0 to 2^128 - 1)
nodeIds (uniform random)
objIds/keys (uniform random)
Invariant: the node with the numerically closest nodeId maintains the object
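The invariant above can be illustrated with a minimal sketch. The names `ring_distance` and `responsible_node`, and the sample ids, are assumptions made for this example and do not come from the Pastry paper.

```python
ID_BITS = 128
SPACE = 2 ** ID_BITS  # ids live in [0, 2^128 - 1]

def ring_distance(a: int, b: int) -> int:
    """Shortest distance between two ids on the circular id space."""
    d = abs(a - b) % SPACE
    return min(d, SPACE - d)

def responsible_node(key: int, node_ids: list[int]) -> int:
    """Return the nodeId numerically closest to key (ties go to the earlier list entry)."""
    return min(node_ids, key=lambda n: ring_distance(n, key))

# Hypothetical node ids; the last one sits just below the wrap-around point.
nodes = [10, 2 ** 127, SPACE - 5]
owner = responsible_node(2, nodes)  # SPACE - 5 is 7 away from key 2, node 10 is 8 away
```

Because the space is circular, the key 2 is assigned to the node just below 2^128 rather than to node 10.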
Pastry: Object insertion/lookup
Figure: Route(X) on the circular id space (0 to 2^128 - 1)
A message with key X is routed to the live node with the nodeId closest to X
Problem: a complete routing table is not feasible
Pastry Node




Represented by a 128-bit randomly chosen nodeId (hash of IP address or public key)
NodeId is written in base 2^b (b is a configuration parameter, typically 2 or 4)
NodeIds are evenly distributed along the circular namespace (0 to 2^128 - 1)
Routes a message to its destination in O(log N) steps, where N is the size of the network
Node state contains:
Leaf set (L)
Routing table (R)
Neighborhood set (M)
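As a rough illustration of how a nodeId could be derived and read as base-2^b digits, here is a minimal sketch. Truncating a SHA-1 digest to 128 bits is an assumption of this example, as are the helper names and the sample IP address. It also assumes b divides 128, which holds for b = 2 and b = 4.

```python
import hashlib

ID_BITS = 128

def node_id_from(identifier: str) -> int:
    """Derive a 128-bit id from an identifier (first 16 bytes of its SHA-1 digest)."""
    digest = hashlib.sha1(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest[:16], "big")

def to_digits(node_id: int, b: int) -> list[int]:
    """Split a 128-bit id into 128/b digits in base 2^b, most significant first."""
    count = ID_BITS // b
    mask = (1 << b) - 1
    return [(node_id >> (b * (count - 1 - i))) & mask for i in range(count)]

nid = node_id_from("10.0.0.1")      # hypothetical IP address
hex_digits = to_digits(nid, b=4)    # 32 digits in base 16
quad_digits = to_digits(nid, b=2)   # 64 digits in base 4
```

With b = 4 the digits are simply the hexadecimal digits of the id, which is why ids such as d46a1c are written in hex later in the slides.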
CMPT 880: P2P Systems - SFU
Pastry node state

Leaf set: the L/2 numerically closest nodes on each side of the local nodeId (L is a configuration parameter, typically 16 or 32)
Routing table (prefix-based)
Neighborhood set: the M physically closest nodes
Pastry node state (Leaf Set)



Serves as a fallback for the routing table and contains:
• the L/2 numerically closest larger nodeIds
• the L/2 numerically closest smaller nodeIds
Size of L is typically 2^b or 2 x 2^b
Nodes in L are numerically close (but may be geographically diverse)
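The leaf set definition above can be sketched as follows. Computing it from a global list of ids is only for illustration, since a real node learns its leaf set from other nodes; the function name and sample ids are assumptions.

```python
def leaf_set(me: int, all_ids: list[int], L: int) -> tuple[list[int], list[int]]:
    """Return (smaller, larger): the L/2 closest ids on each side of `me` on the ring."""
    ring = sorted(set(all_ids) | {me})
    i = ring.index(me)
    # Never take more than half of the other nodes per side, so the sides cannot overlap.
    half = min(L // 2, (len(ring) - 1) // 2)
    smaller = [ring[(i - k) % len(ring)] for k in range(1, half + 1)]
    larger = [ring[(i + k) % len(ring)] for k in range(1, half + 1)]
    return smaller, larger

# Hypothetical small network; node 5 is the lowest id, so its "smaller" side wraps around.
smaller, larger = leaf_set(5, [5, 10, 20, 30, 40, 50], L=4)
```

Here the smaller side of node 5 wraps around the ring to the highest ids, which reflects the circular id space.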
Pastry node state: Neighborhood set (M)



Contains the IP addresses and nodeIds of the closest nodes according to the proximity metric
|M| is typically 2^b or 2 x 2^b
Not used in routing; used instead for maintaining locality properties
Node state: Routing Table




Matrix of log_{2^b} N rows and 2^b - 1 columns (N is the number of nodes in the network)
Entries in row n share the first n digits with the current nodeId, and their next digit is the column number
Entry format: matched digits - column digit - rest of ID
On average, about log_{2^b} N rows are populated
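The row and column rule above can be sketched in a few lines. Ids are written as equal-length digit strings, as in the slides' example node 10233102; the helper names and the choice to keep the first candidate per slot are assumptions of this sketch (Pastry itself prefers a nearby node by the proximity metric).

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that a and b have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

def table_slot(me: str, other: str):
    """Return (row, column) of `other` in `me`'s routing table, or None if the ids are equal."""
    row = shared_prefix_len(me, other)
    if row == len(me):
        return None
    return row, int(other[row], 16)  # one character per digit, valid up to base 16

def build_table(me: str, known_ids: list[str]) -> dict:
    """Place each known id in its slot, keeping the first candidate seen for a slot."""
    table = {}
    for other in known_ids:
        slot = table_slot(me, other)
        if slot is not None:
            table.setdefault(slot, other)
    return table

table = build_table("10233102", ["02212102", "11301233", "10031203", "10211302", "10233120"])
```

For instance, 10211302 shares the prefix 102 with 10233102 and continues with digit 1, so it lands in row 3, column 1.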
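Routing table of node 10233102 (b = 2, l = 8)
Row n holds nodes sharing the first n digits with 10233102; the column is the next digit. "own" marks the node's own digit, "-" an empty entry.

Row  Col 0     Col 1     Col 2     Col 3
0    02212102  own       22301203  31203203
1    own       11301233  12230203  13021022
2    10031203  10132102  own       10323302
3    10200230  10211302  1022302   own
4    10230322  10231000  10232121  own
5    10233001  own       10233232  -
6    own       -         10233120  -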
Pastry: Routing
Tradeoff
O(log N) routing table size: 2^b * log_{2^b} N + 2l entries
O(log N) message forwarding steps
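To get a feel for the size formula, here is a small worked calculation. The network size of one million nodes and the value l = 8 are assumptions chosen for illustration; `l` is passed through exactly as it appears in the slide's formula. The logarithm is computed with integers to avoid floating-point rounding at exact powers.

```python
def rows_needed(n_nodes: int, b: int) -> int:
    """Smallest r with (2^b)^r >= n_nodes, i.e. ceil(log_{2^b} N)."""
    base, rows, capacity = 2 ** b, 0, 1
    while capacity < n_nodes:
        capacity *= base
        rows += 1
    return rows

def state_entries(n_nodes: int, b: int, l: int) -> int:
    """Evaluate the slide's estimate 2^b * log_{2^b} N + 2l."""
    return (2 ** b) * rows_needed(n_nodes, b) + 2 * l

# Hypothetical network of one million nodes.
entries_b4 = state_entries(n_nodes=1_000_000, b=4, l=8)  # 16 * 5 + 16
entries_b2 = state_entries(n_nodes=1_000_000, b=2, l=8)  # 4 * 10 + 16
```

Under these assumptions, b = 4 gives 5 routing rows and 96 entries, while b = 2 gives 10 rows and 56 entries: a larger b buys fewer hops at the cost of more state.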
Prefix Routing

Node IDs and keys are drawn from a randomized namespace (SHA-1)
Incremental routing towards the destination ID
Each node has a small set of outgoing routes
log(n) neighbors per node, log(n) hops between any pair of nodes
Figure: a message addressed to ID ABCE is forwarded through A930, AB5F and ABC0, matching one more prefix digit at each hop

Pastry: Routing table (# 10233102)
L nodes in the leaf set
log_{2^b} N rows (actually log_{2^b} 2^128 = 128/b)
2^b columns
L neighbors
Pastry: Routing procedure
(1) Key is within range of the leaf set
(2) Forward message to a closer node (better prefix match)
(3) Forward towards a numerically closer node (not a better prefix match)
D: message key
L_i: i-th closest nodeId in the leaf set
shl(A, B): length of the prefix shared by nodes A and B
R_j^i: (j, i) entry of the routing table
Pastry: Routing procedure
if (destination D is within range of our leaf set)
    forward to the numerically closest member
else
    let l = length of the prefix shared by D and this node
    let d = value of the l-th digit in D's address
    if (R_l^d exists)
        forward to R_l^d
    else
        forward to a known node (from L ∪ R ∪ M) that
            (a) shares at least as long a prefix with D, and
            (b) is numerically closer to D than this node
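The procedure above can be turned into a small runnable sketch. It uses 8-digit base-4 ids like the example node 10233102. The function names, the dictionary layout of the routing table, the leaf set values, and the simplification that the leaf set does not wrap around zero are all assumptions of this example, not part of Pastry's specification.

```python
from typing import Optional

BASE = 4      # 2^b with b = 2, as for the example node 10233102
ID_LEN = 8    # digits per id in this toy example
SPACE = BASE ** ID_LEN

def value(node_id: str) -> int:
    return int(node_id, BASE)

def dist(a: str, b: str) -> int:
    """Circular numeric distance between two ids."""
    d = abs(value(a) - value(b))
    return min(d, SPACE - d)

def shl(a: str, b: str) -> int:
    """Length of the prefix shared by ids a and b."""
    n = 0
    while n < ID_LEN and a[n] == b[n]:
        n += 1
    return n

def next_hop(me, key, leaf_set, table, neighbors) -> Optional[str]:
    """Return the id to forward to, or None if this node keeps the message."""
    # Case 1: key within the range covered by the leaf set (no wrap-around handling).
    if leaf_set and min(map(value, leaf_set)) <= value(key) <= max(map(value, leaf_set)):
        best = min(list(leaf_set) + [me], key=lambda n: dist(n, key))
        return None if best == me else best
    # Case 2: routing table entry that matches one more digit of the key.
    l = shl(key, me)
    if l == ID_LEN:
        return None  # key equals our own id
    entry = table.get((l, int(key[l], BASE)))
    if entry is not None:
        return entry
    # Case 3: any known node with an equally long prefix that is numerically closer.
    known = set(leaf_set) | set(table.values()) | set(neighbors)
    better = [n for n in known if shl(n, key) >= l and dist(n, key) < dist(me, key)]
    return min(better, key=lambda n: dist(n, key)) if better else None

me = "10233102"
leaves = ["10233033", "10233021", "10233120", "10233122"]  # illustrative leaf set
table = {(0, 0): "02212102", (1, 1): "11301233", (2, 0): "10031203",
         (3, 1): "10211302", (4, 1): "10231000", (5, 2): "10233232"}

hop_table = next_hop(me, "10211111", leaves, table, [])     # case 2
hop_leaf = next_hop(me, "10233113", leaves, table, [])      # case 1
hop_fallback = next_hop(me, "10233300", leaves, table, [])  # case 3
```

In the three calls, the first is resolved by the routing table (row 3, column 1), the second by the leaf set, and the third falls back to a known node that shares the same prefix length but is numerically closer to the key.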
Pastry: Routing procedure



If the key D of a message is within range of the leaf set, forward to the numerically closest leaf
Else forward to a node whose nodeId shares at least one more prefix digit with D than the current nodeId
If no such node exists, forward to a node whose nodeId shares at least as many prefix digits with D as the current nodeId but is numerically nearer to D
Pastry: Routing
Figure: Route(d46a1c) starting at node 65a1fc, forwarded through d13da3, d4213f and d462ba to d467c4 (node d471f1 is also shown)
Properties
• log_{2^b} N steps
• O(log N) state
Pastry: Locality properties
Assumption: scalar proximity metric
• e.g. ping/RTT delay, number of IP hops
• traceroute, subnet masks
• a node can probe its distance to any other node
Proximity invariant: each routing table entry refers to a node close to the local node (in the proximity space), among all nodes with the appropriate nodeId prefix.
Pastry: Geometric Routing in proximity space
Figure: Route(d46a1c) shown in both the nodeId space and the proximity space (nodes 65a1fc, d13da3, d4213f, d462ba, d467c4, d471f1)
The proximity distance traveled by the message in each routing step increases exponentially (the entry in row l is chosen from a set of nodes of size N/2^{bl})
The distance traveled by the message from its source increases monotonically at each step (the message takes larger and larger strides)
Pastry: Locality properties


Each routing step is local, but there is no guarantee of a globally shortest path
Nevertheless, simulations show:
• The expected distance traveled by a message in the proximity space is within a small constant of the minimum
• Among the k nodes with nodeIds closest to the key, a message is likely to first reach the node closest to the source node
Pastry: Self-organization
Initializing and maintaining routing tables and leaf sets
• Node addition
• Node departure (failure)
The goal is to keep every routing table entry referring to a nearby node, among all live nodes with the appropriate prefix
Pastry: Node addition




New node X contacts a nearby node A
A routes a “join” message with key X, which arrives at Z, the node whose nodeId is closest to X
X obtains its leaf set from Z, and the i-th row of its routing table from the i-th node on the route from A to Z
X informs any nodes that need to be aware of its arrival
X also improves its table locality by requesting neighborhood sets from all nodes X knows
In practice: optimistic approach
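The state transfer during a join can be sketched roughly as follows. Routing tables are modeled as plain lists of rows and all ids are placeholder strings; both are assumptions of this example. The sketch covers only the initial copy of rows and leaf set, not the later notification and locality-improvement steps.

```python
def initial_state(path_tables, z_leaf_set):
    """Seed X's state from the nodes on the join route A -> ... -> Z.

    path_tables[i] is the routing table (a list of rows) of the i-th node on the route.
    """
    table = []
    for i, node_table in enumerate(path_tables):
        row = node_table[i] if i < len(node_table) else []
        table.append(list(row))  # copy, so later edits do not alias the donor's row
    return table, list(z_leaf_set)

# Hypothetical 3-node route A -> B -> Z with placeholder entries.
a_table = [["a0", "a1"], ["a2"], ["a3"]]
b_table = [["b0"], ["b1", "b2"], ["b3"]]
z_table = [["z0"], ["z1"], ["z2"]]
table, leaves = initial_state([a_table, b_table, z_table], ["z-left", "z-right"])
```

Row i comes from the i-th node on the route because each hop shares a longer prefix with X, so that node's row i already lists nodes sharing the first i digits with X.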
Pastry: Node addition
Figure: new node X = d46a1c contacts A = 65a1fc (A is X's neighbor); Route(d46a1c) from A passes through d13da3, d4213f and d462ba and arrives at Z = d467c4 (node d471f1 is also shown)
Pastry: Node addition
Figure: Route(d46a1c) for new node X = d46a1c, shown in the nodeId space and the proximity space (nodes 65a1fc, d13da3, d4213f, d462ba, d467c4, d471f1)
B1 is the first row of B
X is close to A, and B is close to B1. Why is X close to B1?
The expected distance from B to its row-one entries (B1) is much larger than the expected distance from A to B (entries are chosen from sets of exponentially decreasing size)
Node departure (failure)

Leaf set repair (eager, all the time):
• Leaf set members exchange keep-alive messages
• On failure, request the leaf set from the furthest live node in the set
Routing table repair (lazy, upon failure):
• Get a replacement entry from peers in the same row; if none is found, from higher rows
Neighborhood set repair (eager)
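The leaf set repair step can be sketched as below. Ids are plain integers without wrap-around, and `fetch_leaf_set` stands in for a network request to another node; both are assumptions made to keep the example small.

```python
def repair_leaf_set(me, leaf_set, failed, fetch_leaf_set, half):
    """Replace a failed leaf using the leaf set of the furthest live node on the same side."""
    live = [n for n in leaf_set if n != failed]
    side = [n for n in live if (n > me) == (failed > me)]
    if not side:
        return sorted(live)  # nobody left to ask on that side
    furthest = max(side, key=lambda n: abs(n - me))
    candidates = set(live) | set(fetch_leaf_set(furthest))
    candidates.discard(me)
    candidates.discard(failed)
    # Keep the `half` closest ids on each side of the local node.
    smaller = sorted((n for n in candidates if n < me), reverse=True)[:half]
    larger = sorted(n for n in candidates if n > me)[:half]
    return sorted(smaller + larger)

# Hypothetical scenario: node 50 detects that leaf 55 has failed and asks node 60.
remote_leaf_sets = {60: [50, 55, 65, 70]}
repaired = repair_leaf_set(50, [40, 45, 55, 60], failed=55,
                           fetch_leaf_set=lambda n: remote_leaf_sets[n], half=2)
```

Asking the furthest live node on the failed side works because its leaf set overlaps the local one and extends further out, so it is likely to contain a suitable replacement.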
Pastry: Summary





Generic p2p overlay network
Scalable, fault resilient, self-organizing, secure
O(log N) routing steps (expected)
O(log N) routing table size
Network locality properties