Peer-to-Peer Networks Christian Scheideler Institut für Informatik Technische Universität München

Motivation
• Every distributed system must be based
on a network interconnecting its sites
• Network: of physical or logical nature
Physical Network
Supercomputers, multicore systems, …
Logical Network
Overlay network on top of the Internet
Basic question: how to organize sites in a
scalable and robust overlay network?
Overview
• Graph Theory
• Supervised and Peer-to-Peer Overlay
Networks
• Continuous-Discrete Approach
• Maintaining a robust Cycle
• Skip Graphs
• Locality-aware Overlay Networks
• Networks for non-uniform Peers
Graph theory
Graph G=(V,E):
• V: set of nodes / vertices
• E ⊆ { (v,w) | v,w ∈ V }: set of edges / arcs
• Edge (v,w): v knows w, i.e., v can send info to w
[Figure: example graph on nodes A, B, C, D with a valid path]
Graph theory
• δ(v,w): distance (length of shortest path)
of w to v in G
• D = max_{v,w} δ(v,w): diameter of G
[Figure: example graph on nodes A, B, C, D with D=4]
Graph theory
• Γ(U): set of neighbors of node set U
• α(U) = |Γ(U)| / |U|
• α(G) = min_{U, |U| ≤ |V|/2} α(U): expansion of G
[Figure: example graph on nodes A, B, C, D with a set U of |U|=2 nodes and |Γ(U)|=1]
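The two definitions can be checked by brute force on small graphs. The following is a minimal sketch, assuming that Γ(U) counts only neighbors outside U, that U ranges over nonempty sets of at most half the nodes, and that the graph is connected and undirected. The helper names (`diameter`, `expansion`, `line`) are illustrative only.

```python
from collections import deque
from itertools import combinations

def diameter(adj):
    """Largest shortest-path distance, found by a BFS from every node.
    Assumes adj (node -> list of neighbors) describes a connected graph."""
    best = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best

def expansion(adj):
    """Minimum of |Gamma(U)| / |U| over all nonempty U with |U| <= |V|/2.
    Exponential time, so only usable for very small graphs."""
    nodes = list(adj)
    best = float("inf")
    for size in range(1, len(nodes) // 2 + 1):
        for subset in combinations(nodes, size):
            u = set(subset)
            gamma = {w for v in u for w in adj[v]} - u  # neighbors outside U
            best = min(best, len(gamma) / size)
    return best

def line(n):
    """Line network on nodes 0..n-1 (hypothetical helper for the example)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
```

For a line of 8 nodes this gives diameter 7 and expansion 2/8, since the worst set U is one half of the line, which has a single outside neighbor.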
Graph theory
Network G=(V,E,c):
• V: set of nodes, E: set of edges
• c: E → ℝ⁺: edge capacities
[Figure: example network on nodes A, B, C, D with an edge labeled 2]
Graph Theory
Unless mentioned otherwise:
• All edges have capacity 1
• {v,w} represents {(v,w), (w,v)}
Network topologies
Ideally, complete network:
Problem: does not scale well! (~n² edges)
Line Network
• degree 2 (optimal), BUT
• diameter bad (n-1 for n nodes)
• expansion bad ( α(line) = 2/n )
How to get a low diameter?
Binary Tree
• depth k: n = 2^{k+1} − 1 nodes, degree 3
• diameter is 2k ≈ 2 log₂ n, BUT
• expansion is still bad ( α(tree) = 2/n )
2-dimensional Grid
• side length k
• n = k² nodes, maximum degree 4
• diameter is 2(k−1) < 2√n
• expansion is ~2/√n
Not too bad, but can we get better values?
Hypercube
• Nodes: (x_1,…,x_d) ∈ {0,1}^d
• Edges: ∀ i: (x_1,…,x_d) → (x_1,..,1−x_i,..,x_d)
[Figure: hypercubes for d=1, d=2, d=3]
Degree d, diameter d, expansion 1/√d
Routing: (x_1,x_2,…,x_d) → (y_1,x_2,…,x_d)
→ (y_1,y_2,x_3,…,x_d) → … → (y_1,y_2,…,y_d)
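The routing rule above fixes the coordinates one at a time from left to right. A minimal sketch, assuming node labels are given as tuples of bits (the function name is illustrative):

```python
def hypercube_route(x, y):
    """Path from x to y in the hypercube, fixing differing bits left to right."""
    assert len(x) == len(y)
    cur = list(x)
    path = [tuple(cur)]
    for i in range(len(cur)):
        if cur[i] != y[i]:
            cur[i] = y[i]            # traverse the edge of dimension i
            path.append(tuple(cur))
    return path
```

The number of hops equals the number of positions in which x and y differ, so it never exceeds d.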
Butterfly
• Nodes: (k,(x_d,…,x_1)) ∈ {0,..,d} × {0,1}^d
• Edges: (k−1,(x_d,…,x_1)) → (k,(x_d,..,x_k,..,x_1)),
(k,(x_d,..,1−x_k,..,x_1))
[Figure: butterfly for d=2 with levels 0, 1, 2 and rows 00, 01, 10, 11]
Degree 4, diameter 2d, expansion ~1/d
Routing: (0,(x_1,x_2,…,x_d)) → (1,(y_1,x_2,…,x_d))
→ (2,(y_1,y_2,x_3,…,x_d)) → … → (d,(y_1,y_2,…,y_d))
Cube-Connected-Cycles
• Nodes: (k,(x_1,…,x_d)) ∈ {0,..,d−1} × {0,1}^d
• Edges: (k,(x_1,…,x_d)) → (k−1,(x_1,…,x_d)),
(k+1,(x_1,…,x_d)),
(k,(x_1,..,1−x_{k+1},..,x_d))
De Bruijn Graph
• Nodes: (x_1,…,x_d) ∈ {0,1}^d
• Edges: (x_1,…,x_d) → (0,x_1,…,x_{d−1}),
(1,x_1,…,x_{d−1})
[Figure: de Bruijn graphs for d=2 (nodes 00, 01, 10, 11) and d=3 (nodes 000, …, 111)]
Routing: (x_1,…,x_d) → (y_d,x_1,…,x_{d−1}) → (y_{d−1},y_d,x_1,…,x_{d−2}) → …
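De Bruijn routing shifts the bits of the destination into the label from the left, starting with y_d and ending with y_1. A minimal sketch, assuming labels are tuples of bits (the function name is illustrative):

```python
def de_bruijn_route(x, y):
    """Path from x to y in the d-dimensional de Bruijn graph.
    Each hop uses the edge (x_1..x_d) -> (b, x_1..x_{d-1}) for one bit b."""
    assert len(x) == len(y)
    cur = tuple(x)
    path = [cur]
    for bit in reversed(y):          # inject y_d first and y_1 last
        cur = (bit,) + cur[:-1]
        path.append(cur)
    return path
```

After exactly d hops the label equals (y_1,…,y_d), which is why the diameter is at most d even though the degree is constant.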
The Diameter
Theorem: Every graph of maximum degree
d>2 and size n must have a diameter of at
least (log n)/(log(d-1))-1.
Theorem: For every even d>2 there is a
family of graphs of maximum degree d and
size n with diameter at most (log n)/(log d − 1).
[Figure: tree of all reachable nodes at dist. k]
The Expansion
Theorem: For every graph G the expansion α(G) is
at most 1.
Theorem: There are families of constant degree
graphs with constant expansion.
Example: Gabber-Galil Graph
• Node set: (x,y) ∈ {0,…,n−1}²
• Edges: (x,y) → (x,x+y), (x,x+y+1), (x+y,y), (x+y+1,y)
(mod n)
Overlay Network
Basic question: how to organize sites in a
scalable and robust overlay network?
Scalability: works efficiently for large
number of sites
Robustness: can handle faults and
malicious behavior
Server-based approach
[Figure: sites connected via the Internet to a single server]
Does not scale well!
Alternatives
Supervised overlay network
Supervisor assists in
maintaining network
Peer-to-peer overlay network
Peers maintain
network themselves
Overlay Network
Problem: How to maintain an overlay
network as peers join and leave?
Supervised Overlay Network
• Supervisor assigns peers to points in [0,1)
so that peers are evenly distributed
• Neighboring peers connect to form a cycle
[Figure: cycle on [0,1) with peers at 0, 1/8, 1/4, 3/8, 1/2, 5/8, 3/4, 7/8]
Supervised Overlay Network
• Node v wants to join (n nodes in system):
give it the (n+1)th position
• Node w wants to leave:
move last node v to w's position
[Figure: cycle on [0,1) with nodes v and w]
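The join and leave rules can be sketched as follows. The slides do not say which point the i-th position corresponds to, so this sketch assumes the order 0, 1/2, 1/4, 3/4, 1/8, 5/8, … (the binary digits of i reversed), which keeps the gaps between occupied points within a factor of 2 of each other. The class and function names are hypothetical.

```python
from fractions import Fraction

def position(i):
    """Point in [0,1) of the i-th position (i = 0, 1, 2, ...), obtained by
    reversing the binary digits of i. This order is an assumption."""
    num, den = 0, 1
    while i:
        num = 2 * num + (i & 1)
        den *= 2
        i >>= 1
    return Fraction(num, den)

class Supervisor:
    def __init__(self):
        self.peers = []                  # peers[i] occupies position(i)

    def join(self, peer):
        self.peers.append(peer)          # new peer gets the (n+1)th position
        return position(len(self.peers) - 1)

    def leave(self, peer):
        i = self.peers.index(peer)       # linear scan, fine for a sketch
        last = self.peers.pop()          # the peer at the last position ...
        if last != peer:
            self.peers[i] = last         # ... moves into the vacated position

    def cycle(self):
        """Peers in the order of their points on [0,1)."""
        order = sorted(range(len(self.peers)), key=position)
        return [self.peers[i] for i in order]
```

Because only the last position is ever added or removed, the occupied positions are always 0..n−1 and each operation touches a constant number of peers.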
Supervised Overlay Network
• v: node at nth position
• supervisor: stores pred(v), v, succ(v),
succ(succ(v))
• join and graceful leave operation:
[Figure: cycle on [0,1) with node v]
Pure Peer-to-Peer Network
We also focus on [0,1).
Every peer is mapped to a random point in [0,1).
Peers form a cycle based on their points.
• Chord: cryptographic hash function
• CAN: random number
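A hash-based mapping in the spirit of the Chord bullet can be sketched as follows. SHA-1 from Python's `hashlib` is just one possible choice of hash function here, and the peer names and function names are hypothetical.

```python
import hashlib
from fractions import Fraction

def chord_point(peer_id: str) -> Fraction:
    """Map a peer name to a point in [0,1) via a cryptographic hash."""
    digest = hashlib.sha1(peer_id.encode("utf-8")).digest()   # 160 bits
    return Fraction(int.from_bytes(digest, "big"), 2 ** 160)

def build_cycle(peer_ids):
    """Successor map of the cycle formed by sorting peers by their points."""
    ring = sorted(peer_ids, key=chord_point)
    return {p: ring[(i + 1) % len(ring)] for i, p in enumerate(ring)}
```

With a hash function the point of a peer is determined by its name, whereas with the random-number variant the peer (or the system) draws the point directly.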
Continuous-Discrete Approach
Problem: cycle not a good routing topology!
Long paths!
Continuous-discrete Approach
• V: set of peers, U: virtual space
• Each v ∈ V mapped to region R(v) ⊆ U
• Family F of functions f: U → U
• {v,w} edge ⇔ [F(R(v)) ∩ R(w)] ∪ [F(R(w)) ∩ R(v)] ≠ ∅
Continuous-discrete Approach
Basic questions:
• How to map peers to regions?
• What family F to choose?
Continuous-discrete Approach
• Take a classical family of networks
(Hypercube, de Bruijn graph,…)
• Convert it into continuous form by
interpreting node labels as points in U,
edges as a family of functions F
• Mapping peers to regions will then convert
continuous form back into discrete graph.
Hypercube
Classical hypercube:
• V: nodes with labels (x_1,…,x_d) ∈ {0,1}^d
• For all i: (x_1,…,x_d) → (x_1,..,1−x_i,..,x_d)
Continuous version of hypercube:
• Interpret (x_1,…,x_d) as z = Σ_i x_i/2^i
• d → ∞: U=[0,1)
• F: f_i^+(x) = x+1/2^i, f_i^−(x) = x−1/2^i ∀ i>0
De Bruijn Graph
Classical de Bruijn graph:
• V: nodes with labels (x_1,…,x_d) ∈ {0,1}^d
• E: (x_1,…,x_d) → (0,x_1,…,x_{d−1}), (1,x_1,…,x_{d−1})
Continuous de Bruijn graph:
• Interpret (x_1,…,x_d) as z = Σ_i x_i/2^i
• d → ∞: U=[0,1)
• F: f_0(x) = x/2, f_1(x) = (1+x)/2
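To see how the continuous de Bruijn graph turns back into a discrete graph, the following sketch applies the edge rule of the continuous-discrete approach to peers on [0,1). It assumes that each peer owns the half-open interval from its own point to the next peer's point, that one peer sits at 0 (so no region wraps around), and that self-loops are dropped. All names are illustrative.

```python
from fractions import Fraction

# The two functions of the continuous de Bruijn graph.
F = (lambda x: x / 2, lambda x: (1 + x) / 2)

def regions(points):
    """Region [p, next point) for every peer; assumes one peer sits at 0."""
    pts = sorted(Fraction(p) for p in points)
    assert pts[0] == 0
    return {p: (p, nxt) for p, nxt in zip(pts, pts[1:] + [Fraction(1)])}

def overlaps(a, b):
    """True if the half-open intervals a and b intersect."""
    return max(a[0], b[0]) < min(a[1], b[1])

def de_bruijn_edges(points):
    """{v,w} is an edge if some f in F maps part of R(v) into R(w) or
    part of R(w) into R(v)."""
    reg = regions(points)
    edges = set()
    for v, (lo, hi) in reg.items():
        for f in F:
            image = (f(lo), f(hi))       # f is increasing, so this is f(R(v))
            for w, region_w in reg.items():
                if w != v and overlaps(image, region_w):
                    edges.add(frozenset((v, w)))
    return edges
```

With peers at 0, 1/4, 1/2, 3/4 the result is exactly the (undirected, loop-free) de Bruijn graph of dimension 2 on the labels 00, 01, 10, 11.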
Gabber-Galil Graph
Classical Gabber-Galil graph:
• Node set: (x,y) ∈ {0,…,n−1}²
• Edges: (x,y) → (x,x+y), (x,x+y+1), (x+y,y),
(x+y+1,y) (mod n)
Continuous Gabber-Galil graph:
• n → ∞: U=[0,1)²
• F: f_1(x,y)=(x,x+y), f_2(x,y)=(x+y,y)
Supervised Overlay Network
• How to map peers to regions?
• Consider any space U=[0,1)^d
• Hierarchical decomposition tree:
[Figure: decomposition tree with inner nodes 0, 1 and leaves 000, 001, 01, 10, 11]
Supervised Overlay Network
Fact:
• Volumes of subcubes assigned to nodes
differ by factor of at most 2.
• Subcubes pairwise disjoint.
• Union of subcubes gives U.
Combine this with family F of functions.
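The facts above can be checked on a one-dimensional instance (U=[0,1)). The sketch below assumes that a join always splits the first leaf of minimum depth; the slides only show the resulting trees, so this split order is an assumption, and the function names are hypothetical.

```python
from fractions import Fraction

def join_split(leaves):
    """Split the first leaf of minimum depth into two children.
    leaves is the left-to-right list of leaf labels (bit strings)."""
    depth = min(len(label) for label in leaves)
    i = next(k for k, label in enumerate(leaves) if len(label) == depth)
    return leaves[:i] + [leaves[i] + "0", leaves[i] + "1"] + leaves[i + 1:]

def interval(label):
    """Subcube [lo, hi) of U=[0,1) that belongs to a leaf label."""
    lo = sum(Fraction(int(b), 2 ** (k + 1)) for k, b in enumerate(label))
    return lo, lo + Fraction(1, 2 ** len(label))
```

Starting from a single leaf with the empty label, four joins give the leaves 000, 001, 01, 10, 11: the intervals are disjoint, cover [0,1), and their lengths are 1/8 or 1/4.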
Join Operation
[Figure: decomposition tree with leaves 000, 001, 010, 011, 10, 11 and peers v, w]
[Figure: functions f, f' between regions; R(v) is split into R(v) and R(w)]
{u,v} edge ⇔ [F(R(u)) ∩ R(v)] ∪ [F(R(v)) ∩ R(u)] ≠ ∅
w inherits connections from v
Leave Operation
v inherits connections from w
[Figure: decomposition tree with leaves 000, 001, 01, 10, 11 and peers v, w]
Supervised Overlay Network
For any supervised network based on the
continuous-discrete approach with [0,1)^d:
• Sufficient if supervisor introduces new
peer to cycle neighbors. From these, new
peer can get all F-connections
• Join/leave can be performed with constant
time and work for supervisor.
High robustness:
• Sufficient to secure base cycle!
Peer-to-Peer Overlay Network
We focus on U=[0,1).
Every peer is mapped to a random point in [0,1).
Peer v owns region [v,succ(v)).
Join Operation
• New peer chooses random position x.
• Route to peer v owning position x.
• Inherit all relevant edges w.r.t. F from v.
Leave Operation
• Node that wants to leave transfers its
connections to its predecessor.
Peer-to-Peer Overlay Network
Scalability: with hypercube / de Bruijn
• network has logarithmic diameter
• peers have (poly-)logarithmic degree
• join/leave need (poly-)logarithmic
time/work (w.h.p.)
Robustness:
• Make sure base ring is robust!
Maintaining a robust cycle
Problem: cycle is a very fragile structure!
Maintaining a robust cycle
Solution: connect to Θ(log n) nearest
neighbors
• Chernoff bounds: nodes still connected under
a constant fraction of random failures
(with high probability)
• Nodes randomly distributed on cycle:
a constant fraction of correlated failures
reduces to the random failure case
Maintaining a robust cycle
Problem: what if adversarial peers are part
of the system?
System cannot distinguish between peers!
[Figure: cycle with honest peers and adversarial peers]
Supervised cycle
[Figure: cycle on [0,1) with nodes v and w]
Nodes connect to Θ(log n) nearest neighbors:
Hard for adversarial peers to isolate honest peers
Peer-to-peer cycle
Chord: uses cryptographic hash function to map peers to
points in [0,1)
• randomly distributes honest peers
• does not randomly distribute adversarial peers
Peer-to-peer cycle
CAN: map peers to random points in [0,1)
Peer-to-peer cycle
Group spreading:
• Map peers to random points in [0,1)
• Limit lifetime of peers
Too expensive!
Peer-to-peer cycle
How can the system enforce an even
distribution of honest and adversarial peers
in the [0,1) space?
Peer-to-peer cycle
• n honest peers, εn adversarial peers
• partition [0,1) space into regions of size
(c log n)/n for some constant c
For any region I ⊆ [0,1) of size (c log n)/n:
• Balancing condition: Θ(log n) peers in I (scalability)
• Majority condition: honest peers in majority (robustness)
How to satisfy conditions?
• Rule that works: k-cuckoo rule
[Figure: n honest and εn adversarial peers on the cycle; a join evicts a k/n-region]
Condition: ε < 1−1/k
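A join under a cuckoo-style rule can be sketched as follows. The slide only shows that a k/n-region is evicted, so the details here are assumptions: [0,1) is partitioned into fixed, aligned regions of size k/n, the joining peer is placed at a random point x, and every peer in the region containing x is moved to a fresh random point. The function names and the dictionary layout are hypothetical.

```python
import random

def k_region(x, n, k):
    """Half-open region [lo, hi) of size k/n from the fixed partition of
    [0,1) that contains x (assumption: regions are aligned to multiples
    of k/n)."""
    size = k / n
    index = int(x / size)
    return index * size, (index + 1) * size

def cuckoo_join(positions, new_peer, n, k, rng):
    """Place new_peer at a random point x and evict every peer in the
    k/n-region around x to a fresh random point.
    positions: dict peer -> point in [0,1), modified in place."""
    x = rng.random()
    lo, hi = k_region(x, n, k)
    evicted = [p for p, pos in positions.items() if lo <= pos < hi]
    for p in evicted:
        positions[p] = rng.random()
    positions[new_peer] = x
    return x, evicted
```

The point of the eviction is that a peer cannot choose which peers end up next to it: every join scatters the old occupants of the target region over the whole space.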
Limitation of k-cuckoo rule
• Only works for any sequence of join and
leave requests of adversarial peers.
• Does not work for any sequence of join
and leave requests.
Example: adversary orders all peers in a
region of size O(log n / n) to leave
Solution: also rearrangements for leave Op.
k-Flip&Cuckoo Rule
• Join: as before (k-cuckoo rule)
• Leave: choose random k/n-region among neighboring
(c log n) k/n-regions, empty & flip it with random k/n-region
[Figure: n honest and εn adversarial peers on the cycle; flip on leave, eviction on join]
Random Number Generation
Critical component:
robust distributed random number generator
Solution:
• very simple (no error-correcting codes)
• works for public channels
• even if constant fraction is adversarial
Trick: generate groups of random numbers
Maintaining a robust cycle
• So far, only proactive techniques (i.e.,
techniques that protect cycle)
• Proactive techniques expensive and have
their limits (minority of adv. peers)
• Also reactive techniques needed (i.e.,
techniques that can recover cycle)
Recovering the cycle
First approach: recover sorted list
[Figure: nodes 20, 5, 8, 12, 2 are rearranged into the sorted list 2, 5, 8, 12, 20]
Recovering a sorted list
Naïve approach:
• Continuously collect info about neighbors
of neighbors until all nodes known
(Not easy to check!)
• Transform neighborhood into sorted list
Not scalable!
[Figure: initial graph]
Recovering a sorted list
Better approach: linearization
Every node does the following locally:
[Figure: local linearization step on nodes 3, 5, 8, 12, 14, 16, and the coordination problem it raises]
Recovering a sorted list
Naïve solution of coordination problems:
• Suppose that time is synchronized
• In each round (2 time steps) each node v performs:
– right linearization
– left linearization
[Figure: edges of v before and after right and left linearization]
Recovering a sorted list
Correctness of right/left linearization:
• Consider arbitrary consecutive pair v,w
[Figure: consecutive pair v, w and the range of the path from v to w]
• Range reduces by 1 in each round
• degree increases by +2 in each round
Recovering a sorted list
More realistic approach: take asynchronous
behavior into account
• Peers operate in actions:
<label>: <guard> → <commands>
• v.NB: neighbor list of v
• we assume: w ∈ v.NB ⇔ v ∈ w.NB
(edges {v,w} behave like shared 0/1 variables)
• no edges {v,v}
Recovering a sorted list
u.L, u.R: left / right neighborhood of u
Actions for node u:
• grow right: (v ∈ u.R) ∧ (w ∈ v.L) ∧ (w ∉ u.NB) →
u.NB := u.NB ∪ {w}
(wait until w ∈ u.NB and u ∈ w.NB)
• trim right: (v,w ∈ u.R) ∧ (w ∈ v.L) →
u.NB := u.NB \ {v}
(preferred op to keep degree low)
• grow left and trim left similar
Safe if executed sequentially in each node.
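The effect of linearization can be illustrated with a sequential simulation. This is not the asynchronous grow/trim protocol itself but a simplified variant, stated here as an assumption: whenever a node u has two neighbors on the same side, it hands the farther one over to the nearer one. The graph must be connected and undirected, and node names must be comparable. The function name is hypothetical.

```python
def linearize(adj):
    """Turn a connected undirected graph into a sorted list, in place.
    adj: dict node -> set of neighbors."""
    changed = True
    while changed:
        changed = False
        for u in sorted(adj):
            right = sorted(w for w in adj[u] if w > u)
            left = sorted((w for w in adj[u] if w < u), reverse=True)
            for side in (right, left):
                # side[0] is the closest neighbor on this side; every farther
                # neighbor is delegated to the next-closer one
                for near, far in zip(side, side[1:]):
                    adj[near].add(far)
                    adj[far].add(near)
                    adj[u].discard(far)
                    adj[far].discard(u)
                    changed = True
    return adj
```

Each delegation replaces an edge by a strictly shorter one (measured in sorted order) and keeps the graph connected, so the loop ends with every node linked only to its predecessor and successor.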
Recovering a sorted cycle
Establish wrap-around edge:
• v.wa: wrap-around edge of v
• we assume: v.wa = w ⇔ w.wa = v
• v sets v.wa to w: v.NB := v.NB ∪ {v.wa}, v.wa := w
Problem: more cases for initial state!
Recovering a sorted cycle
Additional actions for node u:
• wrap: (u.L=∅) ∧ (u.wa=⊥) ∧ (w ∈ u.R) →
u.wa := w
• extend: (u.L=∅) ∧ (u.wa≠⊥) ∧ (w ∈ u.wa.R) →
u.wa := w
• unwrap: (u.L≠∅) ∧ (u.wa≠⊥) ∧ (u.wa>u) →
u.wa := ⊥
Skip Graphs
Problem: messages between local peers
may be sent across world
Skip Graphs
Better:
• Give nodes hierarchically specified names
europe.germany.bavaria.munich.tum
• Sort nodes according to names
name space
Problem: high imbalance,
so cont-disc approach does not work!
Skip Graphs
• Each node v has arbitrary unique name
ID(v) and random bit string s(v)
• prefix_i(s(v)): first i bits of s(v)
Skip graph rule:
For every node v and i ∈ ℕ₀:
• v connects to closest successor and predecessor w (w.r.t. ID) with prefix_i(s(w))
= prefix_i(s(v))
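The skip graph rule can be sketched for a finite number of levels. The sketch assumes that the nodes at each level form sorted lists (no wrap-around between the largest and smallest ID) and that every bit string has at least `levels` bits. The function and parameter names are illustrative.

```python
def skip_graph(ids, bits, levels):
    """Neighbor sets under the skip graph rule for i = 0..levels.
    ids: unique comparable names; bits: dict id -> bit string s(v)."""
    order = sorted(ids)
    nb = {v: set() for v in ids}
    for i in range(levels + 1):
        last = {}                    # prefix -> latest node seen with that prefix
        for v in order:
            prefix = bits[v][:i]
            if prefix in last:       # closest predecessor with the same prefix
                nb[v].add(last[prefix])
                nb[last[prefix]].add(v)
            last[prefix] = v
    return nb
```

Level 0 uses the empty prefix and therefore yields the sorted list of all nodes; each further level splits every list into two according to the next bit.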
Skip Graphs
Nodes v with s(v)=0…
Nodes v with s(v)=1…
Skip Graphs
Hierarchical view:
[Figure: lists of nodes grouped by prefix of s(v): 0, 1, then 00, 01, 10, 11, then 000, 001, …]
Θ(log n) degree, Θ(log n) diameter, Θ(1) expansion w.h.p.
Routing in Skip Graphs
O(log n) hops w.h.p.
[Figure: route across name-space regions Africa, America, Asia, Australia, Europe]
The Hyperring
Is randomization in skip graphs necessary?
Hyperring: deterministic form of skip graph
Approach similar to skip graphs: organize
nodes in cycle according to real names.
[Figure: cycle of nodes named Apple, Banana, Cherry]
Shortcuts: Intertwined Rings
bridge
Join and Leave
• Inserting a node: bottom up
Join and Leave
• Deleting a node: bottom up
k-separated Hyperring
In every level, bridges are k nodes apart.
How large does k have to be to guarantee
polylogarithmic expansion α?
Theorem:  =
2
W(1/
k
)
(1/n)
So k has to be non-constant ( Ω(log n) ).
Do areas with old insertions/deletions have to be
revisited?
k-separated Hyperring
Rule: Choose k=6(d+3)
d: current degree of node initiating op.
Theorem:
• degree: O(log n)
• expansion: Ω(1/log n)
• congestion for permutations: O(log n)
w.h.p.
• work for Join/Leave: O(log³ n)
Locality-aware Overlay Networks
Problem: in general, a distance metric cannot be embedded well into 1-dimensional
space
So applicability of skip graphs limited
Use different construction based on Plaxton,
Rajaraman and Richa
Locality-aware Overlay Networks
For a node v let
• s(v) be its random bit string and
• B_i(v) be the ball around v of minimum radius
so that B_i(v) contains c 2^i log n peers
[Figure: nested balls B_1(v), B_2(v), B_3(v)]
Locality-aware Overlay Networks
Assumption: growth-bounded metric
• N(v,r): set of nodes w with d(v,w) < r
• There is a constant ε>0 so that
|N(v,(1+ε)r)| < 2|N(v,r)| for all v, r
[Figure: nested balls B_1(v), B_2(v), B_3(v)]
Locality-aware Overlay Networks
Topology: for every node v and i ∈ ℕ:
• v connects to all nodes w ∈ B_i(v) with
prefix_{i−1}(s(v)) = prefix_{i−1}(s(w))
[Figure: nested balls B_1(v), B_2(v), B_3(v); c 2^i log n peers in B_i(v)]
Locality-aware Overlay Networks
Topology rule implies:
• degree of each node Θ(log² n) w.h.p.
• v has nodes w in B_i(v) with prefix_i(s(w)) =
prefix_{i−1}(s(v)) ∘ x for all x ∈ {0,1} w.h.p.
[Figure: nested balls B_1(v), B_2(v), B_3(v); c 2^i log n peers in B_i(v)]
Locality-aware Routing
Routing from v to w:
• s(v)=(x_1,x_2,x_3,…), s(w)=(y_1,y_2,y_3,…)
• v → closest u_1 in B_1(v) with prefix_1(s(u_1)) = y_1
• u_1 → closest u_2 in B_2(u_1) with prefix_2(s(u_2)) = y_1 y_2
• …
• until we reach u_{k−1} with w in B_k(u_{k−1})
Locality-aware Routing
=1
B1(v)
B2(v)
u2 u1v
B2(u1)
B3(u2)
w
B3(u1)
Locality-aware Routing
Let r(B) be the radius of ball B.
• d(u_1,v) < r(B_1(v))/γ w.h.p. ( γ = Θ(log_{1+ε} c) )
• r(B_2(u_1)) > (1+ε−1/γ) r(B_1(v))
• d(u_2,u_1) < r(B_2(u_1))/γ w.h.p.
• r(B_3(u_2)) > (1+ε−1/γ) r(B_2(u_1))
• …
After k hops ( r=r(B_1(v)) ):
• d(u_k,w) < d(v,w) + Σ_{i=0}^{k−1} (1+ε−1/γ)^i r/γ
< d(v,w) + (γε−1)^{−1} r (1+ε−1/γ)^k
• r(B_{k+1}(u_k)) > (1+ε−1/γ)^k r
Locality-aware Routing
After k hops ( r=r(B_1(v)) ):
• d(u_k,v) < Σ_{i=0}^{k−1} (1+ε−1/γ)^i r/γ
< (γε−1)^{−1} r (1+ε−1/γ)^k
• r(B_{k+1}(u_k)) > (1+ε−1/γ)^k r
Finally, w ∈ B_{k+1}(u_k):
• d(v,w) > r(B_k(u_{k−1})) − d(u_{k−1},v)
> (1−1/(γε−1)) (1+ε−1/γ)^{k−1} r
• d(u_k,v) < d* = (γε−1)^{−1} r (1+ε−1/γ)^k and
total path length < 2d*+d(v,w)
• d* < (β/2) d(v,w) if γε > 2(1+ε)/β+2
[Figure: positions of v, u_k and w]
Networks for non-uniform peers
Problem: peers have non-uniform bandwidth
Cont-disc and skip graphs do not work!
Networks for non-uniform peers
Ad-hoc solutions:
• cut large peers into many small peers
• multi-tier network
Better approach:
• organize peers
in a heap
How to design scalable distributed heap?
Networks for non-uniform peers
PAGODA heap network
• dB(d): leveled de Bruijn graph of dimension d
[Figure: stacked graphs dB(1), dB(2), dB(3), dB(4), … with 3, 4, 5, … levels; nodes v and w]
Routing between v and w via nodes of two dB-levels up
Join
PAGODA heap network (~log² n levels)
Move upwards until all parents have
larger bandwidth
Leave
PAGODA heap network (~log² n levels)
Set bandwidth to 0, send downwards until
no further children, remove node
Networks for non-uniform peers
PAGODA heap network (~log² n levels)
Problem: updating PAGODA may need O(log² n) time
Networks for non-uniform peers
SHELL network: oblivious heap
Join operation:
O(log n) time
Leave operation:
O(1) time
Conclusions
Many interesting fronts to work on in context
of scalable distributed systems:
• self-optimizing networks
• social networks
• proactive approaches
• reactive approaches
(repairs under adversarial presence)
• new paradigms
Questions?