Peer-to-Peer Networks
Christian Scheideler
Institut für Informatik, Technische Universität München

Motivation
• Every distributed system must be based on a network interconnecting its sites.
• The network can be of a physical or a logical nature.
• Physical network: supercomputers, multicore systems, …
• Logical network: Internet, overlay networks
Basic question: how to organize sites in a scalable and robust overlay network?

Overview
• Graph Theory
• Supervised and Peer-to-Peer Overlay Networks
• Continuous-Discrete Approach
• Maintaining a robust Cycle
• Skip Graphs
• Locality-aware Overlay Networks
• Networks for non-uniform Peers

Graph theory
Graph $G=(V,E)$:
• $V$: set of nodes / vertices
• $E \subseteq \{ (v,w) \mid v,w \in V \}$: set of edges / arcs
• An edge $(v,w)$ means that $v$ knows $w$, i.e., $v$ can send info to $w$.
• $\delta(v,w)$: distance (length of a shortest path) of $w$ to $v$ in $G$
• $D = \max_{v,w} \delta(v,w)$: diameter of $G$
• $\Gamma(U)$: set of neighbors of node set $U$
• $\alpha(U) = |\Gamma(U)| / |U|$
• $\alpha(G) = \min_{U,\, |U|<|V|/2} \alpha(U)$: expansion of $G$
(Figure: example graph on nodes A, B, C, D showing a valid path and a set $U$ with $|U|=2$, $|\Gamma(U)|=1$.)

Network $G=(V,E,c)$:
• $V$: set of nodes, $E$: set of edges
• $c: E \to \mathbb{R}^+$: edge capacities
Unless mentioned otherwise:
• All edges have capacity 1.
• $\{v,w\}$ represents $\{(v,w), (w,v)\}$.

Network topologies
Ideally: complete network.
Problem: does not scale well! ($\sim n^2$ edges)

Line Network
• degree 2 (optimal), BUT
• diameter is bad ($n-1$ for $n$ nodes)
• expansion is bad ($\alpha(\text{line}) = 2/n$)
How to get a low diameter?

Binary Tree (depth $k$)
• $n = 2^{k+1}-1$ nodes, degree 3
• diameter is $2k \approx 2\log_2 n$, BUT
• expansion is still bad ($\alpha(\text{tree}) = 2/n$)

2-dimensional Grid (side length $k$)
• $n = k^2$ nodes, maximum degree 4
• diameter is $2(k-1) < 2\sqrt{n}$
• expansion is $\sim 2/\sqrt{n}$
Not too bad, but can we get better values?

Hypercube
• Nodes: $(x_1,\dots,x_d) \in \{0,1\}^d$
• Edges: $\forall i$: $(x_1,\dots,x_d) \to (x_1,\dots,1-x_i,\dots,x_d)$
• Routing: $(x_1,x_2,\dots,x_d) \to (y_1,x_2,\dots,x_d) \to (y_1,y_2,x_3,\dots,x_d) \to \dots \to (y_1,y_2,\dots,y_d)$
• Degree $d$, diameter $d$, expansion $\sim 1/\sqrt{d}$
(Figure: hypercubes for $d=1$, $d=2$, $d=3$.)

Butterfly
• Nodes: $(k,(x_d,\dots,x_1)) \in \{0,\dots,d\} \times \{0,1\}^d$
• Edges: $(k-1,(x_d,\dots,x_1)) \to (k,(x_d,\dots,x_k,\dots,x_1))$ and $(k,(x_d,\dots,1-x_k,\dots,x_1))$
• Routing: $(0,(x_1,x_2,\dots,x_d)) \to (1,(y_1,x_2,\dots,x_d)) \to (2,(y_1,y_2,x_3,\dots,x_d)) \to \dots \to (d,(y_1,y_2,\dots,y_d))$
• Degree 4, diameter $2d$, expansion $\sim 1/d$

Cube-Connected-Cycles
• Nodes: $(k,(x_1,\dots,x_d)) \in \{0,\dots,d-1\} \times \{0,1\}^d$
• Edges: $(k,(x_1,\dots,x_d)) \to (k-1,(x_1,\dots,x_d))$, $(k+1,(x_1,\dots,x_d))$, $(k,(x_1,\dots,1-x_{k+1},\dots,x_d))$

De Bruijn Graph
• Nodes: $(x_1,\dots,x_d) \in \{0,1\}^d$
• Edges: $(x_1,\dots,x_d) \to (0,x_1,\dots,x_{d-1})$ and $(1,x_1,\dots,x_{d-1})$
• Routing: $(x_1,\dots,x_d) \to (y_d,x_1,\dots,x_{d-1}) \to (y_{d-1},y_d,x_1,\dots,x_{d-2}) \to \dots$
(Figure: de Bruijn graphs for $d=2$ and $d=3$.)

The Diameter
Theorem: Every graph of maximum degree $d>2$ and size $n$ must have a diameter of at least $(\log n)/(\log(d-1)) - 1$.
(Figure: tree of all nodes reachable at distance $k$.)
Theorem: For every even $d>2$ there is a family of graphs of maximum degree $d$ and size $n$ with diameter $(\log n)/(\log d - 1)$.

The Expansion
Theorem: For every graph $G$ the expansion $\alpha(G)$ is at most 1.
Theorem: There are families of constant degree graphs with constant expansion.
Example: Gabber-Galil Graph
• Node set: $(x,y) \in \{0,\dots,n-1\}^2$
• $(x,y) \to (x,x+y),\ (x,x+y+1),\ (x+y,y),\ (x+y+1,y) \pmod n$

Overlay Network
Basic question: how to organize sites in a scalable and robust overlay network?
• Scalability: works efficiently for a large number of sites
• Robustness: can handle faults and malicious behavior

Server-based approach
• All sites connect to a server over the Internet.
• Does not scale well!

Alternatives
• Supervised overlay network: a supervisor assists in maintaining the network.
• Peer-to-peer overlay network: the peers maintain the network themselves.

Overlay Network
Problem: How to maintain an overlay network as peers join and leave?
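As a concrete illustration of the routing rules stated for the hypercube and the de Bruijn graph in the graph theory part, the following minimal Python sketch computes the node sequence of a route. The function names `hypercube_route` and `de_bruijn_route` are hypothetical and chosen only for this example; labels are tuples of bits $(x_1,\dots,x_d)$.

```python
from typing import List, Tuple

Label = Tuple[int, ...]  # node label (x_1, ..., x_d) with bits in {0, 1}


def hypercube_route(src: Label, dst: Label) -> List[Label]:
    """Bit-fixing route: correct coordinates 1..d from left to right.

    Each hop flips exactly one differing bit, so it uses a hypercube edge.
    """
    assert len(src) == len(dst)
    path = [src]
    cur = list(src)
    for i in range(len(src)):
        if cur[i] != dst[i]:
            cur[i] = dst[i]
            path.append(tuple(cur))
    return path


def de_bruijn_route(src: Label, dst: Label) -> List[Label]:
    """De Bruijn route: shift right and prepend y_d, y_{d-1}, ..., y_1.

    Each hop follows an edge (x_1..x_d) -> (b, x_1..x_{d-1}) and the
    destination label is reached after exactly d hops.
    """
    assert len(src) == len(dst)
    path = [src]
    cur = src
    for bit in reversed(dst):
        cur = (bit,) + cur[:-1]
        path.append(cur)
    return path


if __name__ == "__main__":
    print(hypercube_route((0, 0, 0), (1, 0, 1)))
    print(de_bruijn_route((0, 1, 1), (1, 0, 0)))
```

The hypercube route needs at most $d$ hops (one per differing bit), while the de Bruijn route always takes $d$ hops even though every node has only two outgoing edges.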
Supervised Overlay Network
• Supervisor assigns peers to points in $[0,1)$ so that peers are evenly distributed.
• Neighboring peers connect to form a cycle.
(Figure: cycle on $[0,1)$ with peers at 0, 1/8, 1/4, 3/8, 1/2, 5/8, 3/4, 7/8.)
• Node $v$ wants to join ($n$ nodes in system): give it the $(n+1)$th position.
• Node $w$ wants to leave: move the last node $v$ to $w$'s position.
• $v$: node at $n$th position
• The supervisor stores $\mathrm{pred}(v)$, $v$, $\mathrm{succ}(v)$, $\mathrm{succ}(\mathrm{succ}(v))$.
• Supports join and graceful leave operations.

Pure Peer-to-Peer Network
We also focus on $[0,1)$. Every peer is mapped to a random point in $[0,1)$.
Peers form a cycle based on their points.
• Chord: cryptographic hash function
• CAN: random number

Continuous-Discrete Approach
Problem: the cycle is not a good routing topology (long paths)!

Continuous-discrete Approach
• $V$: set of peers, $U$: virtual space
• Each $v \in V$ is mapped to a region $R(v) \subseteq U$.
• Family $F$ of functions $f: U \to U$
• $\{v,w\}$ edge $\Leftrightarrow$ $[F(R(v)) \cap R(w)] \cup [F(R(w)) \cap R(v)] \ne \emptyset$
Basic questions:
• How to map peers to regions?
• What family $F$ to choose?

Continuous-discrete Approach
• Take a classical family of networks (hypercube, de Bruijn graph, …).
• Convert it into continuous form by interpreting node labels as points in $U$ and edges as a family of functions $F$.
• Mapping peers to regions will then convert the continuous form back into a discrete graph.

Hypercube
Classical hypercube:
• $V$: nodes with labels $(x_1,\dots,x_d) \in \{0,1\}^d$
• For all $i$: $(x_1,\dots,x_d) \to (x_1,\dots,1-x_i,\dots,x_d)$
Continuous version of hypercube:
• Interpret $(x_1,\dots,x_d)$ as $z = \sum_i x_i/2^i$.
• $d \to \infty$: $U=[0,1)$
• $F$: $f_i^+(x) = x + 1/2^i$, $f_i^-(x) = x - 1/2^i$ $\forall i>0$

De Bruijn Graph
Classical de Bruijn graph:
• $V$: nodes with labels $(x_1,\dots,x_d) \in \{0,1\}^d$
• $E$: $(x_1,\dots,x_d) \to (0,x_1,\dots,x_{d-1}),\ (1,x_1,\dots,x_{d-1})$
Continuous de Bruijn graph:
• Interpret $(x_1,\dots,x_d)$ as $z = \sum_i x_i/2^i$.
• $d \to \infty$: $U=[0,1)$
• $F$: $f_0(x) = x/2$, $f_1(x) = (1+x)/2$

Gabber-Galil Graph
Classical Gabber-Galil graph:
• Node set: $(x,y) \in \{0,\dots,n-1\}^2$
• $(x,y) \to (x,x+y),\ (x,x+y+1),\ (x+y,y),\ (x+y+1,y) \pmod n$
Continuous Gabber-Galil graph:
• $n \to \infty$: $U=[0,1)^2$
• $F$: $f_1(x,y)=(x,x+y)$, $f_2(x,y)=(x+y,y)$

Supervised Overlay Network
• How to map peers to regions?
• Consider any space $U=[0,1)^d$.
• Use a hierarchical decomposition tree.
(Figure: decomposition tree with node labels 0, 1, 000, 001, 01, 10, 11.)
Fact:
• Volumes of subcubes assigned to nodes differ by a factor of at most 2.
• Subcubes are pairwise disjoint.
• The union of the subcubes gives $U$.
Combine this with the family $F$ of functions.

Join Operation
(Figure: leaf 01 of the decomposition tree is split into 010 and 011 for $v$ and $w$.)
• $\{u,v\}$ edge $\Leftrightarrow$ $[F(R(u)) \cap R(v)] \cup [F(R(v)) \cap R(u)] \ne \emptyset$
• $w$ inherits connections from $v$.

Leave Operation
(Figure: leaves 000 and 001 are merged into 00.)
• $v$ inherits connections from $w$.

Supervised Overlay Network
For any supervised network based on the continuous-discrete approach with $[0,1)^d$:
• It is sufficient if the supervisor introduces a new peer to its cycle neighbors. From these, the new peer can get all $F$-connections.
• Join/leave can be performed with constant time and work for the supervisor.
High robustness:
• It is sufficient to secure the base cycle!

Peer-to-Peer Overlay Network
We focus on $U=[0,1)$. Every peer is mapped to a random point in $[0,1)$.
• $v$ owns region $[v,\mathrm{succ}(v))$.

Join Operation
• New peer chooses a random position $x$.
• Route to the peer $v$ owning position $x$.
• Inherit all relevant edges w.r.t. $F$ from $v$.

Leave Operation
• A node that wants to leave transfers its connections to its predecessor.

Peer-to-Peer Overlay Network
Scalability: with hypercube / de Bruijn
• network has logarithmic diameter
• peers have (poly-)logarithmic degree
• join/leave need (poly-)logarithmic time/work (w.h.p.)
Robustness:
• Make sure the base ring is robust!

Maintaining a robust cycle
Problem: the cycle is a very fragile structure!
Solution: connect to the $\Theta(\log n)$ nearest neighbors.
• Chernoff bounds: nodes are still connected under a constant fraction of random failures (with high probability).
• Nodes randomly distributed on the cycle: a constant fraction of correlated failures reduces to the random failure case.

Maintaining a robust cycle
Problem: what if adversarial peers are part of the system?
• The system cannot distinguish between honest peers and adversarial peers!

Supervised cycle
• Nodes connect to the $\Theta(\log n)$ nearest neighbors: hard for adversarial peers to isolate honest peers.

Peer-to-peer cycle
Chord: uses a cryptographic hash function to map peers to points in $[0,1)$
• randomly distributes honest peers
• does not randomly distribute adversarial peers
CAN: map peers to random points in $[0,1)$
Group spreading:
• Map peers to random points in $[0,1)$.
• Limit the lifetime of peers.
Too expensive!

Peer-to-peer cycle
How can the system enforce an even distribution of honest and adversarial peers in the $[0,1)$ space?
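The earlier claim in this section, that connecting every node to its $\Theta(\log n)$ nearest neighbors keeps the cycle connected under random failures, can be checked experimentally with a minimal sketch such as the one below. It assumes nodes sit at integer positions $0,\dots,n-1$ on the cycle and that each node links to its $k$ nearest neighbors on both sides. The function names and the choice $k = \lceil 2\log_2 n\rceil$ are hypothetical and only serve the illustration.

```python
import math
import random
from typing import Set


def survivors_connected(n: int, k: int, failed: Set[int]) -> bool:
    """True if the surviving nodes still form a connected graph.

    Model (an assumption of this sketch): nodes 0..n-1 on a cycle, each
    linked to its k nearest neighbors on both sides. Two cyclically
    consecutive survivors are directly linked iff their gap is <= k, and
    no edge can jump over a larger gap. The survivors are therefore
    connected iff at most one such gap is "broken".
    """
    alive = [v for v in range(n) if v not in failed]
    if len(alive) <= 1:
        return True
    breaks = 0
    for i, a in enumerate(alive):
        b = alive[(i + 1) % len(alive)]
        if (b - a) % n > k:
            breaks += 1
    return breaks <= 1


def random_failure_trial(n: int, fraction: float, seed: int) -> bool:
    """Fail a random fraction of the nodes and test connectivity."""
    rng = random.Random(seed)
    k = max(1, math.ceil(2 * math.log2(n)))  # hypothetical constant 2
    failed = set(rng.sample(range(n), int(fraction * n)))
    return survivors_connected(n, k, failed)


if __name__ == "__main__":
    ok = sum(random_failure_trial(1024, 0.3, seed) for seed in range(100))
    print(f"{ok} of 100 trials stayed connected")
```

With $k$ logarithmic in $n$, a break requires more than $k-1$ consecutive failed nodes, which is exactly the event that the Chernoff-style argument shows to be unlikely for random failures.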
Peer-to-peer cycle
• $n$ honest peers, $\epsilon n$ adversarial peers
• Partition the $[0,1)$ space into regions of size $(c \log n)/n$ for some constant $c$.
For any region $I \subseteq [0,1)$ of size $(c \log n)/n$:
• Balancing condition: $\Theta(\log n)$ peers in $I$ (scalability)
• Majority condition: honest peers are in the majority (robustness)

How to satisfy the conditions?
• Rule that works: the $k$-cuckoo rule
• On a join, evict the $k/n$-region.
• Setting: $n$ honest and $\epsilon n$ adversarial peers with $\epsilon < 1-1/k$

Limitation of k-cuckoo rule
• It only works for any sequence of join and leave requests of adversarial peers.
• It does not work for arbitrary sequences of join and leave requests.
Example: the adversary orders all peers in a region of size $O(\log n / n)$ to leave.
Solution: also perform rearrangements for the leave operation.

k-Flip&Cuckoo Rule
• Join: as before ($k$-cuckoo rule)
• Leave: choose a random $k/n$-region among the neighboring $(c \log n)$ $k/n$-regions, empty it and flip it with a random $k/n$-region.

Random Number Generation
Critical component: robust distributed random number generator
Solution:
• very simple (no error-correcting codes)
• works for public channels
• works even if a constant fraction is adversarial
Trick: generate groups of random numbers

Maintaining a robust cycle
• So far, only proactive techniques (i.e., techniques that protect the cycle)
• Proactive techniques are expensive and have their limits (minority of adversarial peers).
• Reactive techniques are also needed (i.e., techniques that can recover the cycle).

Recovering the cycle
First approach: recover a sorted list.
(Figure: unsorted nodes 20, 5, 8, 12, 2 are turned into the sorted list 2, 5, 8, 12, 20.)

Recovering a sorted list
Naïve approach:
• Continuously collect info about neighbors of neighbors until all nodes are known (not easy to check!).
• Transform the neighborhood into a sorted list.
Not scalable!
Recovering a sorted list
Better approach: linearization
• Every node performs linearization steps locally.
• This leads to a coordination problem.
(Figure: initial graph and linearization steps on nodes 3, 5, 8, 12, 14, 16.)

Recovering a sorted list
Naïve solution of the coordination problem:
• Suppose that time is synchronized.
• In each round (2 time steps) each node $v$ performs:
  - right linearization
  - left linearization

Recovering a sorted list
Correctness of right/left linearization:
• Consider an arbitrary consecutive pair $v,w$ and the range of the path from $v$ to $w$.
• The range reduces by 1 in each round.
• The degree increases by 2 in each round.

Recovering a sorted list
More realistic approach: take asynchronous behavior into account.
• Peers operate in actions: <label>: <guard> $\to$ <commands>
• $v.NB$: neighbor list of $v$
• We assume: $w \in v.NB \Leftrightarrow v \in w.NB$
• $\{v,w\}$: 0/1 edges, like shared variables
• no edges $\{v,v\}$

Recovering a sorted list
$u.L$, $u.R$: left / right neighborhood of $u$
Actions for node $u$ (safe if executed sequentially in each node):
• grow right: $(v \in u.R) \wedge (w \in v.L) \wedge (w \notin u.NB) \to u.NB := u.NB \cup \{w\}$; wait until $w \in u.NB$ and $u \in w.NB$
• trim right: $(v,w \in u.R) \wedge (w \in v.L) \to u.NB := u.NB \setminus \{v\}$ (preferred operation to keep the degree low)
• grow left and trim left: similar

Recovering a sorted cycle
Establish a wrap-around edge:
• $v.wa$: wrap-around edge of $v$
• We assume: $v.wa = w \Leftrightarrow w.wa = v$
• $v$ sets $v.wa$ to $w$: $v.NB := v.NB \cup \{v.wa\}$, $v.wa := w$
Problem: more cases for the initial state!

Recovering a sorted cycle
Additional actions for node $u$:
• wrap: $(u.L=\emptyset) \wedge (u.wa=\bot) \wedge (w \in u.R) \to u.wa := w$
• extend: $(u.L=\emptyset) \wedge (u.wa \ne \bot) \wedge (w \in u.wa.R) \to u.wa := w$
• unwrap: $(u.L \ne \emptyset) \wedge (u.wa \ne \bot) \wedge (u.wa>u) \to u.wa := \bot$
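The linearization idea from this section can be sketched in a few lines of Python. The slides do not spell out the local rule, so the sketch assumes a common variant: a node keeps only its closest neighbor on each side and hands every farther neighbor over to the next closer one. Nodes are processed one after another rather than in synchronized rounds, and the function name `linearize` is hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]


def linearize(nodes: Iterable[int], edges: Iterable[Edge]) -> Set[Edge]:
    """Turn a connected undirected graph on sortable ids into a sorted list.

    Assumed local rule: if v has neighbors w_1 < w_2 < ... on its right,
    add the edges {w_i, w_{i+1}} and drop {v, w_{i+1}}; symmetric on the
    left. Each dropped edge is replaced by a path, so connectivity is kept,
    and the total edge length shrinks, so the loop terminates.
    """
    nb: Dict[int, Set[int]] = {v: set() for v in nodes}
    for a, b in edges:
        if a != b:  # no self-loops {v, v}
            nb[a].add(b)
            nb[b].add(a)

    changed = True
    while changed:
        changed = False
        for v in sorted(nb):
            right = sorted(w for w in nb[v] if w > v)
            left = sorted((w for w in nb[v] if w < v), reverse=True)
            for side in (right, left):
                # side[0] is the closest neighbor and stays connected to v
                for near, far in zip(side, side[1:]):
                    nb[near].add(far)
                    nb[far].add(near)
                    nb[v].discard(far)
                    nb[far].discard(v)
                    changed = True
    return {(a, b) for a in nb for b in nb[a] if a < b}


if __name__ == "__main__":
    ids = [20, 5, 8, 12, 2]
    start = [(2, 20), (2, 12), (5, 20), (8, 12), (5, 8)]
    print(sorted(linearize(ids, start)))
```

For the example graph the result is the sorted list 2, 5, 8, 12, 20. A real peer-to-peer implementation would additionally have to handle concurrent actions, which is what the guarded actions for the asynchronous case address.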
Skip Graphs
Problem: messages between local peers may be sent across the world.
Better:
• Give nodes hierarchically specified names, e.g. europe.germany.bavaria.munich.tum
• Sort nodes according to their names (name space).
Problem: high imbalance, so the cont-disc approach does not work!

Skip Graphs
• Each node $v$ has an arbitrary unique name $ID(v)$ and a random bit string $s(v)$.
• $\mathrm{prefix}_i(s(v))$: first $i$ bits of $s(v)$
Skip graph rule: For every node $v$ and $i \in \mathbb{N}_0$:
• $v$ connects to the closest successor and predecessor $w$ (w.r.t. $ID(v)$) with $\mathrm{prefix}_i(s(w)) = \mathrm{prefix}_i(s(v))$.
(Figure: nodes $v$ with $s(v)=0\dots$ and nodes $v$ with $s(v)=1\dots$ form separate lists.)

Skip Graphs
Hierarchical view: lists for the prefixes 0, 1, 00, 01, 10, 11, 000, 001, …
$\Theta(\log n)$ degree, $\Theta(\log n)$ diameter, $\Theta(1)$ expansion w.h.p.

Routing in Skip Graphs
$O(\log n)$ hops w.h.p.
(Figure: routing across regions Asia, Europe, Australia, America, Africa.)

The Hyperring
Is randomization in skip graphs necessary?
Hyperring: deterministic form of skip graph
Approach similar to skip graphs: organize nodes in a cycle according to their real names (e.g. Apple, Banana, Cherry).
Shortcuts: intertwined rings connected by bridges

Join and Leave
• Inserting a node: bottom up
• Deleting a node: bottom up

k-separated Hyperring
In every level, bridges are $k$ nodes apart.
How large does $k$ have to be to guarantee polylogarithmic expansion?
Theorem: = 2 W(1/ k ) (1/n)
So $k$ has to be non-constant ( W( log n ) ).
Do areas with old insertions/deletions have to be revisited?

k-separated Hyperring
Rule: Choose $k=6(d+3)$, where $d$ is the current degree of the node initiating the operation.
Theorem:
• degree: $O(\log n)$
• expansion: $\Omega(1/\log n)$
• congestion for permutations: $O(\log n)$ w.h.p.
• work for Join/Leave: $O(\log^3 n)$

Locality-aware Overlay Networks
Problem: in general, a distance metric cannot be embedded well into 1-dimensional space.
So the applicability of skip graphs is limited.
Use a different construction based on Plaxton, Rajaraman and Richa.

Locality-aware Overlay Networks
For a node $v$ let
• $s(v)$ be its random bit string and
• $B_i(v)$ be the ball around $v$ of minimum radius so that $B_i(v)$ contains $c\,2^i \log n$ peers.
(Figure: nested balls $B_1(v)$, $B_2(v)$, $B_3(v)$.)

Locality-aware Overlay Networks
Assumption: growth-bounded metric
• $N(v,r)$: set of nodes $w$ with $d(v,w) < r$
• There is a constant $\delta>0$ so that $|N(v,(1+\delta)r)| < 2|N(v,r)|$ for all $v$, $r$.

Locality-aware Overlay Networks
Topology: for every node $v$ and $i \in \mathbb{N}$:
• $v$ connects to all nodes $w \in B_i(v)$ with $\mathrm{prefix}_{i-1}(s(v)) = \mathrm{prefix}_{i-1}(s(w))$.
The topology rule implies:
• The degree of each node is $\Theta(\log^2 n)$ w.h.p.
• $v$ has nodes $w$ in $B_i(v)$ with $\mathrm{prefix}_i(s(w)) = \mathrm{prefix}_{i-1}(s(v)) \circ x$ for all $x \in \{0,1\}$ w.h.p.

Locality-aware Routing
Routing from $v$ to $w$:
• $s(v)=(x_1,x_2,x_3,\dots)$, $s(w)=(y_1,y_2,y_3,\dots)$
• $v \to$ closest $u_1$ in $B_1(v)$ with $\mathrm{prefix}_1(s(u_1)) = y_1$
• $u_1 \to$ closest $u_2$ in $B_2(u_1)$ with $\mathrm{prefix}_2(s(u_2)) = y_1 y_2$
• …
• until we reach $u_{k-1}$ with $w$ in $B_k(u_{k-1})$
(Figure: route $v, u_1, u_2, \dots, w$ through the balls $B_1(v)$, $B_2(u_1)$, $B_3(u_2)$.)

Locality-aware Routing
Let $r(B)$ be the radius of ball $B$.
• $d(u_1,v) < r(B_1(v))/\gamma$ w.h.p. ($\gamma = \Theta(\log_{1+\delta} c)$)
• $r(B_2(u_1)) > (1+\delta-1/\gamma)\, r(B_1(v))$
• $d(u_2,u_1) < r(B_2(u_1))/\gamma$ w.h.p.
• $r(B_3(u_2)) > (1+\delta-1/\gamma)\, r(B_2(u_1))$
• …
After $k$ hops (with $r=r(B_1(v))$):
• $d(u_k, w) < d(v,w) + \sum_{i=0}^{k-1} (1+\delta-\gamma^{-1})^i\, r/\gamma < d(v,w) + (\delta\gamma-1)^{-1}\, r\, (1+\delta-1/\gamma)^k$
• $r(B_{k+1}(u_k)) > (1+\delta-1/\gamma)^k\, r$

Locality-aware Routing
After $k$ hops (with $r=r(B_1(v))$):
• $d(u_k, v) < \sum_{i=0}^{k-1} (1+\delta-\gamma^{-1})^i\, r/\gamma < (\delta\gamma-1)^{-1}\, r\, (1+\delta-1/\gamma)^k$
• $r(B_{k+1}(u_k)) > (1+\delta-1/\gamma)^k\, r$
Finally, $w \in B_{k+1}(u_k)$:
• $d(v,w) > r(B_k(u_{k-1})) - d(u_{k-1},v) > (1-1/(\delta\gamma-1))\,(1+\delta-1/\gamma)^{k-1}\, r$
• $d(u_k,v) < d^* = (\delta\gamma-1)^{-1}\, r\, (1+\delta-1/\gamma)^k$ and the total path length is $< 2d^*+d(v,w)$
• $d^* < (\epsilon/2)\,d(v,w)$ if $\delta\gamma > 2(1+\delta)/\epsilon+2$

Networks for non-uniform peers
Problem: peers have non-uniform bandwidth.
Cont-disc and skip graphs do not work!
Ad-hoc solutions:
• cut large peers into many small peers
• multi-tier network
Better approach:
• organize peers in a heap
How to design a scalable distributed heap?

Networks for non-uniform peers
PAGODA heap network
• dB(d): leveled de Bruijn graph of dimension d
• Routing between $v$ and $w$ goes via nodes of two dB-levels up.
• Join: move upwards until all parents have larger bandwidth.
• Leave: set the bandwidth to 0, send the node downwards until it has no further children, then remove the node.
(Figure: PAGODA with dB(1), dB(2), dB(3), dB(4), … stacked on top of each other with 3, 4, 5, … levels; ~log2 n levels.)
Problem: updating PAGODA may need $O(\log^2 n)$ time.

Networks for non-uniform peers
SHELL network: oblivious heap
• Join operation: $O(\log n)$ time
• Leave operation: $O(1)$ time

Conclusions
Many interesting fronts to work on in the context of scalable distributed systems:
• self-optimizing networks
• social networks
• proactive approaches
• reactive approaches (repairs under adversarial presence)
• new paradigms

Questions?