SHELL: A Distributed and Oblivious Heap with Applications for Robust Information Systems and Heterogeneous Peer-to-Peer Networks
Christian Scheideler, Stefan Schmid
Network Algorithms, Summer 2008
Stefan Schmid @ TU München, 2008

Before We Look at SHELL...
• Prof. Scheideler is at a conference
• Hence: a special program
• Shell
  - builds on what we have learned!
  - ongoing work...
  - no lecture notes
  - still has gaps, possibly also errors
  - slides in English so that they are usable elsewhere, too!
  - open for inputs / ideas!

Motivation
• Today, there are still many challenges in distributed systems (e.g., the Internet)
• E.g., viruses, spam, DoS attacks, selfish users, etc.
• Very active research
• For example, peer-to-peer computing
  - Dynamics / churn: peers join and leave frequently
  - In a network of 1,000,000 peers whose sessions last around 60 minutes, there are hundreds of membership changes every second (about 10^6 / 3600 s ≈ 280 departures per second, plus roughly as many joins)!
  - Peer-to-peer is based on contributions of participants: problematic if users are selfish!
  - E.g., BitThief free-rides in BitTorrent
  - Heterogeneity: peers have different Internet connections, different CPUs, run different operating systems, etc.

SHELL Overview
• SHELL = our overlay architecture
• Basically, a distributed heap
• Refresher: min heap
  - children have a larger key than their parent
  - e.g., useful for priority queues (fast removeMin())
[Figure: heap slide from the GAD lecture 2008]

Heap Refresher
• Heap in GAD...

A Distributed Heap?
• What is a distributed heap?
• We assume that peers have a key / order / rank / id
  - for example: the time when the peer joined
• (Min-) heap property: peers only connect to peers of lower order
  - for example: peers only connect to older peers
  - Shell constructs a directed overlay (however, there are also backward edges, see later)
[Figure: example overlay with peers labeled by their order]

An Oblivious Distributed Heap? (1)
• What is an oblivious distributed heap?
• Oblivious = the overlay topology only depends on the set of currently active peers (and their IDs / orders) in the network
  - but not on history, e.g., on the time when these peers joined!
  - example: if at join time, a new peer is inserted at the end of a list of peers, the resulting topology is not oblivious
  - example: if a new peer is inserted into a list of peers according to the peer's order, the topology is oblivious

An Oblivious Distributed Heap? (2)
• Why is oblivious good?
  - the oblivious property is useful when it comes to fault-tolerance
  - e.g., desktops may crash temporarily, and will then rejoin
  - if the topology is oblivious, peers can "remember" their old contacts, and when an old contact reappears, it can be integrated immediately (instantaneous rejoin)
• Many systems today are oblivious
  - e.g., Pastry, Chord, etc.
  - but not: e.g., Pagoda
  - many systems in practice are not: Gnutella, BitTorrent, etc.

Objectives of Shell
• Primary goal: dynamic and robust overlay
• In particular:
  - maintaining the heap property
  - low peer degree, low network diameter, low congestion
  - fast join / rejoin / leave
  - peers can simply crash
• Applications
  - i-SHELL: a distributed information system robust to Sybil attacks
  - h-SHELL: a peer-to-peer system for heterogeneous environments

Overlay Graph (1)
• How to achieve these goals?
• Overlay based on the continuous-discrete approach
  - basically a de Bruijn graph
• Refresher: continuous-discrete approach
  - peers in the cyclic [0,1)-interval
  - a peer at position x is connected to the peers responsible for the continuous positions x/2 and (x+1)/2

Overlay Graph (2)
• Our distributed heap has a larger peer degree
• The space is divided into different partitions
  - partition i = 2^i intervals of size 1/2^i
  - a global partition renders the analysis simpler ("same views")

Overlay Graph (3)
• A peer connects to all peers of lower order in
  - its level-i home interval (the interval which includes the position x of the peer)
  - the level-i intervals adjacent to the home interval
  - its de Bruijn intervals: the intervals which include the positions x/2 and (x+1)/2
• What is level i?
  - Level i is chosen such that there are c log n_p peers in the interval
  - n_p = total number of peers in the system with lower order
  - n_p can be estimated; in the following we assume it is given

Overlay Graph (4)
• In order to ensure connectivity when many peers leave, the interval size must be increased over time (the peer upgrades to a larger partition)
• Similarly, if many peers of lower order join in the interval, the peer needs to downgrade
• In addition to these forward edges, peers store incoming edges
  - called backward edges

Overlay Graph (5)
• These edges are already sufficient for Shell
• However, in order to speed up changes between levels, a peer additionally stores pointers to the peers it would connect to if it upgraded
  - i.e., to the "funnel" of peers to which it would connect
  - of course, the peer only connects to these lower order peers once they are on the corresponding level
  - requires a notification mechanism
[Figure: funnel edges across levels]
• In the following, we will not consider funnel edges in further detail!
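The forward-edge rule from the Overlay Graph slides can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the constant c = 2, the base-2 logarithm, the cap max_level, and the rule "take the finest level whose home interval still holds at least c log n_p lower-order peers" are choices made here. n_p is simply counted from a global peer list, whereas the slides assume that an estimate is given.

```python
import math

def interval_index(x, level):
    """Index of the level-`level` interval (size 1/2**level) that contains x."""
    return int(x * 2**level) % 2**level

def neighbor_intervals(x, level):
    """Home interval, its two cyclic neighbors, and the two de Bruijn intervals."""
    m = 2**level
    home = interval_index(x, level)
    return {
        home,
        (home - 1) % m,
        (home + 1) % m,
        interval_index(x / 2, level),        # de Bruijn position x/2
        interval_index((x + 1) / 2, level),  # de Bruijn position (x+1)/2
    }

def choose_level(x, order, peers, c=2.0, max_level=32):
    """Assumed rule: finest level whose home interval still contains at
    least c*log2(n_p) peers of lower order (falls back to level 0)."""
    lower = [pos for pos, o in peers if o < order]
    n_p = len(lower)
    if n_p < 2:
        return 0
    threshold = c * math.log2(n_p)
    level = 0
    while level < max_level:
        home = interval_index(x, level + 1)
        count = sum(1 for pos in lower if interval_index(pos, level + 1) == home)
        if count < threshold:
            break  # the next finer level would be too sparse
        level += 1
    return level

def forward_neighbors(x, order, peers, c=2.0):
    """Orders of all lower-order peers in the home, adjacent and de Bruijn intervals."""
    level = choose_level(x, order, peers, c)
    wanted = neighbor_intervals(x, level)
    return sorted(o for pos, o in peers
                  if o < order and interval_index(pos, level) in wanted)
```

Here `peers` is a list of hypothetical `(position, order)` pairs with positions in [0,1). Because only lower-order peers are ever returned, the heap property holds by construction.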
Implication: Monotonicity
• From this construction, we can already derive some properties
• For instance, Shell features a monotonicity property: if two peers p and p' are connected to the same interval I and if p is of larger order than p', then p knows strictly more peers in I
  - because peers only connect to lower order peers in an interval

Distributed Order...: A Simplification
• In the following, we will assume that peers have distinct IDs
• E.g., assigned at join time by a network entry point
• Otherwise: in case of multiple joins close in time, peers may not be able to decide which one is older => need to introduce blackout zones, etc.
• In the following, we will not consider this issue in more detail

Analysis of Degree (1)
• The topological description allows us to analyze the peer degree
• Peers employ the following strategy: if the number of neighbors falls below c log n_p in at least one interval, all intervals are doubled in size
• According to Chernoff bounds, if one interval contains c log n peers, then, with high probability, no interval contains more than (1+d) c log n peers, for any constant d > 0
• Therefore, the degree is in O(log n) w.h.p.
  - with funnel edges, the degree is in O(log^2 n)

Analysis of Degree (2)
• What about incoming / backward edges?
Routing (1)
• The Shell overlay allows peers to route messages
• Similar to continuous-discrete routing (adjusting one bit after another)
• The routing operation route(x) consists of two phases
  - Phase 1: route along forward edges to the peer of lower order which is closest to x (or: to a lower order peer whose home region contains position x)
  - Phase 2: descend along backward edges to the peer which is closest to x
• Implication: if a peer wants to send a message to a peer of lower order, only Phase 1 is necessary, and the message will not traverse any higher order peers!

Routing (2)
• Observe that in our overlay, peers have multiple neighbors which could be used for the next de Bruijn routing hop (log n neighbors per interval)
• This can be exploited in order to minimize congestion
• Routing policy: peer p always forwards packets to the neighbor which is of largest order among the eligible peers (those of lower order than p)
• This alleviates the load on very low order peers

Routing (3)
• Visualization of routing towards higher order peers
• Messages travel towards lower order peers
• But on each hop, a peer of as high an order as possible is taken

Routing (4) towards higher order peers
• Analysis of Phase 1
  - according to continuous-discrete routing, at most log n hops are needed to reach the destination
  - we make the following observation: [formula missing in source; its two factors are annotated as]
    - the probability that this peer is located in the corresponding interval
    - the probability that all peers of order lower than p but higher than n_p - l_1 are in other intervals

Routing (5) towards higher order peers
• Generally for the i-th hop: [formula missing in source]
• Summing up, after some lines of calculation, the probability that the final peer reached is of order n_p/2 or smaller is at most O(n_p^(-c)) for some constant c
• With high probability, in the first phase of routing, a request travels to a peer of order at least n_p/2.
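As a rough illustration of Phase 1, the sketch below shows the two ingredients described on the routing slides: the continuous de Bruijn path that shifts in one bit of the target per hop, and the policy of taking the highest-order eligible neighbor. It is a minimal Python sketch under assumptions, not the Shell protocol itself: the names `de_bruijn_path` and `pick_next_hop` are hypothetical, and the mapping from continuous positions to actual peers and intervals is left out.

```python
def de_bruijn_path(start, target, hops):
    """Continuous positions visited when the first `hops` bits of `target`
    are shifted in, one per hop, via x -> (x + b) / 2."""
    # bits[k] is the (k+1)-th binary digit of target after the point
    bits = [int(target * 2**(k + 1)) % 2 for k in range(hops)]
    path = [start]
    x = start
    # Shift in the least significant bit of the prefix first, so that the
    # most significant bit of the target ends up in front.
    for b in reversed(bits):
        x = (x + b) / 2
        path.append(x)
    return path

def pick_next_hop(own_order, candidate_orders):
    """Routing policy from the slides: among the neighbors usable for the
    next hop, take the one of largest order that is still lower than ours."""
    eligible = [o for o in candidate_orders if o < own_order]
    return max(eligible) if eligible else None

if __name__ == "__main__":
    for pos in de_bruijn_path(0.9, 0.625, 3):
        print(pos)
    print(pick_next_hop(20, [3, 17, 19, 25]))
```

After h hops the continuous position agrees with the target in its first h bits, so it is within 2^(-h) of the target; this is why a logarithmic number of hops suffices once intervals have size about (c log n)/n.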
Routing (6) towards higher order peers
• Definition of congestion: [formula missing in source]
• So what is the congestion in the first routing phase?

Routing (7) towards higher order peers
• So what is the congestion in the first routing phase?
• See our argument before...
• At most k peers can send via p, the routing path is of length log 2k, and the probability that it enters the interval on one of these hops is c log k / k

Routing (8)
• Theorem: the first phase of routing terminates in logarithmic time and yields a congestion of O(log^2 n_p).

Routing (9)
• Routing phase 2: descent along backward edges to higher order peers
  - idea: binary search which exploits the monotonicity property
  - higher order peers know more about the interval
  - on each level i, go to the highest order peer which is located in the interval that includes the final position x
  - terminates in logarithmic time
  - logarithmic congestion: in each hop, a peer forwards at most one request

Join and Leave
• Join: similar to a lookup; find the highest order peer in the final interval and get integrated
• Leave: peers can even crash, no particular operation is needed
• Change of level in time O(1), update cost induced at other peers in O(log^2 n)

Application 1: i-Shell
• i-Shell is a distributed information system
• Idea: data management through a consistent hashing approach
• Generalized to multiple levels: on each level, data is stored on the peer closest to x
  - on each hop during insertion, a replica is placed
• Order of peers: time-stamps (assigned by a network entry point)
• Thus: peers only connect to older peers

i-Shell
• Therefore:
  - we immediately get that two peers p and p' can communicate on paths which include only peers that are at least their age
  - this renders the communication independent of younger peers
• Side benefit: measurement studies have shown that older peers typically have a longer remaining session time
  - renders the topology more stable
• Shell implies robustness to various attacks
• E.g., the Sybil attack

Sybil Attack (1)
• Sybil attack
  - big problem in the Internet
  - e.g., spam
  - Sybil: book by Flora Rheta Schreiber about a person with 16 identities
• The attacker seeks to acquire many identities
  - e.g., to control a large fraction of the network
• Countermeasures
  - virtual identities: captchas etc.
  - real identities? botnet?
  - Douceur has shown that the issue is difficult to deal with in distributed environments...

Sybil Attack (2)
• Shell is resilient to Sybil attacks of any scale!
• Model: the Sybil attack starts at some time t0
• Theorem: the traffic of old peers is independent of the Sybil attack
• Techniques
  - Admission control
  - Rate control
[Figure: example overlay with peers labeled by order; annotations: "traffic between older peers unaffected", "higher peers can perform a rate control algorithm", "attack originates from lower peers"]

Application 2: h-Shell
• Alternatively, IDs could represent the inverse of the peers' capabilities
• Therefore: peers only connect to peers with stronger capabilities
• Interesting architecture for heterogeneous systems
• Corollary: paths between strong peers only include strong peers
• Interesting, e.g., for multi-quality live-streaming

Conclusion
• Distributed heap based on the continuous-discrete approach
• Oblivious, for highly transient environments
• Robustness to Sybil attacks of arbitrary scale
• Alternatively, useful for heterogeneous environments
• Work in progress...