Scalable membership management and failure detection
INF5360 topic
Presented by Jan Erik Haavet, janehaa@ifi.uio.no

Papers presented
- Correctness of a Gossip Based Membership Protocol
  By Andre Allavena, Alan Demers and John Hopcroft
- SCAMP: Peer-to-Peer Membership Management for Gossip-Based Protocols
  By Ayalvadi J. Ganesh, Anne-Marie Kermarrec and Laurent Massoulié
- Newscast Computing
  By Márk Jelasity, Wojtek Kowalczyk and Maarten van Steen
- CYCLON: Inexpensive Membership Management for Unstructured P2P Overlays
  By Spyros Voulgaris, Daniela Gavidia and Maarten van Steen

Outline
- Introduction
  - Group membership
  - Gossip based membership
- Main paper
  - Correctness of a Gossip Based Membership Protocol
- Related work papers
  - SCAMP
  - Newscast
  - CYCLON

Group Membership
- Main motivation of group membership protocols: multicast in distributed systems
- Problems in large-scale networks:
  - Scalability: each node needs full membership knowledge to guarantee delivery to all
  - Reliability: need a high probability of delivery, even when nodes join, leave or fail
- One protocol often used to achieve the above for group membership: gossiping

Gossip based membership
- Each node forwards a message to a set of gossip targets
- Probabilistic guarantees of delivery:
  - Reliable, by setting the number of gossip targets large enough
  - Fault tolerant, by randomly selecting the gossip targets
- Generally requires full group knowledge

Outline (recap) – main paper

Correctness of a Gossip Based Membership Protocol
Paper motivation
- The importance of scalability and fault tolerance in distributed systems has led to considerable research on multicast protocols using gossip
- This paper introduces a scalable gossip-based algorithm for local view maintenance
  - Can be combined with any application-level gossip protocol that relies on randomly selected gossip partners
  - Scalable: it does not require full group membership knowledge at each node, only a local view
  - Preserves connectivity and load balancing between nodes

Correctness of a Gossip Based Membership Protocol - Properties
Desirable properties
- Even load distribution / load balancing
  - Probabilistic bounds on the degree of each node (low degree -> low load)
  - Even distribution of pointers to other nodes
- Connectivity
  - Avoid partitions
- Reinforcement
- Mixing
  - Local views that are uniform samples of the membership set
  - Over time, the local view changes and emulates complete membership
  - Not fully achieved in this work; listed as future work
  - Could be done as in CYCLON, with timestamps

How it works
- The protocol is based on each node having a local view: a fixed-size random subset of the group membership
- Parameters:
  - N: number of nodes
  - K: the size of a local view
  - F: fanout parameter
  - W: reinforcement weight
- Each node periodically updates its local view in rounds
- Join by copying the local view of a random node
- Leave by simply stopping to participate; there is no difference between a stopping and a failing node

Maintaining the local view
For each round, a node S will (a code sketch follows at the end of this section):
- Mixing: construct a list L1 comprising the local views of F nodes chosen at random from S's local view
- Reinforcement: construct a list L2 of the other nodes that requested S's local view during that round
- Create a new local view by choosing K distinct nodes from L1 and L2
  - W determines the selection distribution between L1 and L2:
    - W = 0: no nodes from L2
    - W = 1: equal distribution
    - W > 1: an increasing share of nodes chosen from L2
- The protocol can be synchronous, loosely synchronized or asynchronous; simulations show no significant difference

A closer look
- Claim: the protocol automatically adapts and re-equilibrates the network, regardless of what caused the imbalance
- Two forces are responsible for this:
  - Mixing: the local views requested by node S
    - Connectivity property: ensures that the graph does not partition
    - This pulling of local views ensures that the number of edges between partitions stays balanced
  - Reinforcement (covered on the next slide)

Mixing
[Diagram: set A with 70% of the nodes, set B with 30% of the nodes, and the edges between them]
- Merging local views results in a list of distinct peers
- With each iteration, this merging converges towards an even distribution between the number of edges from A to B and from A to A

A closer look – Continued
- Reinforcement: nodes that requested S's local view
  - Node S positively reinforces nodes that pulled its local view by adding them to its new local view
  - If the reinforcement weight W is > 1:
    - Older/dead edges are removed in approximately K/F rounds
    - Fresh edges are added
  - Without reinforcement, the network would collapse into a star-like graph:
    - Isolated nodes would not be able to re-enter mixing
    - A few nodes would have many in-edges, many nodes would have few in-edges

Simulation results
- Simulations show that the protocol performs well in large-scale tests with up to 100 000 nodes
  - A high number of rounds pass before any partitioning happens; almost as good as a random graph
  - The maximum in-degree always stays below 4.5 times that of a random graph
  - Scales well with an increasing number of nodes
- Conclusion: a satisfactory protocol for local view maintenance
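To make the round structure above concrete, here is a minimal Python sketch of one view-maintenance round. The class and method names (Node, request_view, do_round), the network dictionary standing in for real messaging, and the weighted-sampling rule used to bias selection towards L2 are illustrative assumptions, not the paper's exact algorithm.

```python
import random

class Node:
    """Minimal sketch of one peer in the local-view maintenance protocol (assumed structure)."""

    def __init__(self, node_id, initial_view, K=20, F=5, W=1.0):
        self.node_id = node_id
        self.view = list(initial_view)   # local view: roughly K node ids
        self.K, self.F, self.W = K, F, W
        self.requesters = []             # L2: nodes that pulled our view this round

    def request_view(self, requester_id):
        """Another node pulls our local view; remember it for reinforcement."""
        self.requesters.append(requester_id)
        return list(self.view)

    def do_round(self, network):
        """One round: mixing (pull F random views) plus reinforcement, then resample K entries."""
        # Mixing: merge the local views of F nodes chosen at random from our own view.
        targets = random.sample(self.view, min(self.F, len(self.view)))
        l1 = [p for t in targets for p in network[t].request_view(self.node_id)]
        # Reinforcement: nodes that requested our view during this round.
        l2, self.requesters = self.requesters, []
        # Pick K distinct nodes; entries from L2 carry weight W, entries from L1 weight 1.
        # (Weighted sampling without replacement via random keys u**(1/w) -- an
        #  illustrative choice, not necessarily the paper's exact selection rule.)
        weighted = [(p, 1.0) for p in l1] + [(p, self.W) for p in l2]
        keyed = [((random.random() ** (1.0 / w)) if w > 0 else 0.0, p) for p, w in weighted]
        new_view, seen = [], set()
        for _, peer in sorted(keyed, reverse=True):
            if peer != self.node_id and peer not in seen:
                new_view.append(peer)
                seen.add(peer)
            if len(new_view) == self.K:
                break
        if new_view:
            self.view = new_view
```

Here network is a dict from node id to Node, standing in for real message exchange; in the actual protocol the view pulls are gossip messages and rounds need only be loosely synchronized.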
Outline (recap) – related work papers

Introductory overview of related work
- SCAMP: Peer-to-Peer Membership Management for Gossip-Based Protocols
  By Ayalvadi J. Ganesh, Anne-Marie Kermarrec and Laurent Massoulié
- Newscast Computing
  By Márk Jelasity, Wojtek Kowalczyk and Maarten van Steen
- CYCLON: Inexpensive Membership Management for Unstructured P2P Overlays
  By Spyros Voulgaris, Daniela Gavidia and Maarten van Steen

SCAMP
Motivation
- Expansion of internet-wide distributed applications; need for scalable, reliable group communication
- Previous work, as of 2003, assumes that each member has full group membership knowledge
  - Not feasible for very large-scale groups (scalability)
  - Each member should instead hold only a partial membership set (decentralization)
- Goal: avoid the need to know the full view size
  - The required partial view size depends on the full view size
  - The partial view size should automatically adapt to the full view size as it grows

SCAMP – Desired properties
- Scalability
  - The size of the partial view grows with the full view size
- Reliability
  - Partial views large enough to support gossip protocols with high reliability
- Decentralization
  - Partial views should be updated as members leave/join, while maintaining scalability and reliability
- Isolation recovery
  - Recover isolated nodes
- Partial views should change for each message sent using that partial view

SCAMP – How it works (a code sketch follows after these slides)
Subscription
- Contact: a new node P sends a subscription request to an arbitrary member Q
- New subscription: on contact, Q forwards P's id to all members in its partial view
- Forwarded subscription: a node in Q's view receives the subscription and inserts it into its own view with a probability p
  - If it does not keep it, P's id is forwarded to a random node in that node's partial view
- Keeping a subscription: each node maintains two lists, a PartialView of nodes it sends gossip messages to, and an InView of nodes that hold its id in their partial views
  - When a node keeps a subscription, it stores the new id in its PartialView (and the new subscriber correspondingly records the keeper in its InView)

SCAMP – How it works
[Diagram, shown in three builds: nodes A-F illustrating a subscription: 1. Contact, 2. New subscription, 3. Forwarded subscription, 4. Keeping a subscription]

SCAMP – How it works
Unsubscription
- P tells all nodes in its InView that it is leaving
  - The message includes a node Q from P's partial view list
  - The nodes receiving this message replace P in their partial views with Q
- To avoid isolated nodes in case of node failures:
  - Nodes send periodic heartbeat messages
  - A node considers itself isolated if it does not receive heartbeat messages for a while
  - To remove itself from isolation, it resubscribes
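Before the rebalancing mechanisms, here is a minimal sketch of the subscription walk described above. ScampNode, contact, forwarded_subscription, the network dictionary, the fixed keep probability p_keep and the ttl bound are illustrative assumptions; in particular, a fixed p_keep stands in for SCAMP's own rule for choosing p.

```python
import random

class ScampNode:
    """Minimal sketch of SCAMP subscription handling (assumed simplified structure)."""

    def __init__(self, node_id, network, p_keep=0.3):
        self.node_id = node_id
        self.network = network     # id -> ScampNode, standing in for real messaging
        self.partial_view = []     # nodes we send gossip messages to
        self.in_view = []          # nodes that have us in their partial views
        self.p_keep = p_keep       # illustrative fixed keep probability p

    def contact(self, new_id):
        """Act as contact node Q: forward the new subscriber's id to all members in our view."""
        for member in self.partial_view:
            self.network[member].forwarded_subscription(new_id)

    def forwarded_subscription(self, new_id, ttl=64):
        """Keep the subscription with probability p, otherwise forward it to a random neighbor."""
        if ttl == 0 or new_id == self.node_id:
            return
        if new_id not in self.partial_view and random.random() < self.p_keep:
            # Keeping a subscription: the newcomer enters our PartialView,
            # and we appear in the newcomer's InView.
            self.partial_view.append(new_id)
            self.network[new_id].in_view.append(self.node_id)
        elif self.partial_view:
            target = random.choice(self.partial_view)
            self.network[target].forwarded_subscription(new_id, ttl - 1)
```

A new node P would join by having an arbitrary member Q run contact(P); unsubscription, heartbeats and the indirection/lease rebalancing described on the next slide would be further methods on the same class.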
SCAMP – Rebalancing the graph
- New subscribers cannot be expected to select their contact uniformly from the whole membership set; this may lead to an unbalanced graph
- Two mechanisms are proposed to achieve balance:
  - Indirection: the contact node Q for a new subscription does not handle the request itself; instead it forwards the request for contact
    - Q uses a forwarding rule to choose which node from its partial view to forward to
    - A stopping rule decides whether a node handles the request or forwards it again using the forwarding rule
  - Lease: nodes only hold a subscription for a leased time
    - After the lease time is over, a node has to resubscribe
    - The resubscription contact is chosen randomly from its partial view

SCAMP – Results
- Scalability: the partial view size is shown to grow with the full view size
- Reliability: results confirm that gossip protocols are good for reliability
- Decentralization: the graph rebalances itself with the lease and indirection mechanisms
- Isolation recovery: heartbeats allow isolated nodes to resubscribe

Newscast
Motivation
- Monitoring of large computer networks; failure detection
- A peer-to-peer protocol that maintains and disseminates up-to-date information and membership data
- Aimed at large and dynamic distributed environments
- Provides an information dissemination service to applications
- Resilient to peer failures

Newscast – How it works
- Each peer keeps a small fixed-size cache of C news items
- A cache entry contains a news item, a timestamp and a peer address
  - Each news item contains an application id and some news data
- Cache entry layout: Address | Timestamp | AppId | News data

Newscast – How it works 2
At intervals T, each peer (sketched in code after the diagram below):
- Gets news from the application, timestamps it and adds the local peer address to the cache entry
- Finds a random peer among the cached addresses
- Sends all cache entries to this peer and receives all cache entries from that peer
- Passes the received cache entries (containing news items) on to the application
- Merges its old cache with the received cache, keeping at most C entries and throwing away the oldest ones
- Does not require synchronization: it only needs to normalize the timestamps of incoming cache entries
  - This can result in small errors, but is sufficient for this work
- The passive peer does the same as the initiating peer, except for the peer selection

Newscast – How it works
[Diagram, shown in five builds: two peers, each with an application on top, illustrating one exchange: 1. News item from the application, 2. Add news item to cache, 3. Find a random node and exchange caches, 4. Deliver news to the applications, 5. Merge old cache with incoming cache]
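The following is a minimal sketch of one Newscast exchange as described above. NewscastPeer, NewsItem, cycle, receive and the peers dictionary are illustrative assumptions rather than the paper's interface, and timestamp normalization between peers is omitted.

```python
import time
import random
from dataclasses import dataclass

@dataclass
class NewsItem:
    address: str       # peer that produced this cache entry
    timestamp: float
    app_id: str
    data: object

class NewscastPeer:
    """Minimal sketch of a Newscast peer with a fixed-size cache of C entries (assumed structure)."""

    def __init__(self, address, peers, C=20):
        self.address = address
        self.peers = peers       # address -> NewscastPeer, standing in for the network
        self.C = C
        self.cache = []          # list of NewsItem

    def cycle(self, app_id, data):
        """One exchange: add fresh news, pick a random cached peer, swap caches, merge."""
        # Steps 1-2: timestamp the application's news and add it under our own address.
        self.cache.append(NewsItem(self.address, time.time(), app_id, data))
        # Step 3: find a random peer among the cached addresses and exchange caches.
        candidates = [e.address for e in self.cache if e.address != self.address]
        if not candidates:
            return []
        other = self.peers[random.choice(candidates)]
        received = other.receive(list(self.cache))
        # Steps 4-5: the received entries are handed to the application (returned here),
        # and the caches are merged, keeping only the C freshest entries.
        self._merge(received)
        return received

    def receive(self, incoming):
        """Passive peer: hand back our current cache, then merge the incoming one."""
        mine = list(self.cache)
        self._merge(incoming)
        return mine

    def _merge(self, received):
        """Keep at most C entries, throwing away the oldest ones (timestamp normalization omitted)."""
        merged = sorted(self.cache + received, key=lambda e: e.timestamp, reverse=True)
        self.cache = merged[:self.C]
```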
Newscast – Membership management
- Membership management data is disseminated together with the news items
- Join: a peer joins by initializing its cache with at least one known peer
- Leave: failing and leaving nodes are treated the same way
  - As there is a timestamp on each cache entry, failed nodes quickly disappear from the caches

Newscast – Wanted Properties
- Self-organizing
  - No matter what the join/leave patterns are, it should organize itself
- Effective
  - Information dissemination should be fast and predictable
- Scalable
  - Quality of service should not decrease when scaling up
- Robust
  - Should handle massive node failures
- The paper shows empirical evidence of the listed properties

Newscast – Empirical Evidence
- Results show that the average path length converges to a low value
  - For non-random join sequences and after radical fluctuations of membership
  - About the same as a random graph
  - Robust, effective and self-organizing
- The clustering coefficient is not as good as that of a random graph
  - Information dissemination is still effective due to the good average path length
- Communication cost: about 2 * cache size * entry size every cycle, plus some application-specific size

CYCLON
- Motivation: content-based searching in peer-to-peer overlays
- Contribution: a framework for inexpensive membership management
  - While retaining random-graph properties
- A gossip-based membership management protocol
  - Resilient to massive node failures
  - Handles high churn rates
  - Low membership management cost
- Shown to construct membership graphs with:
  - Low diameter
  - Low clustering factor
  - Highly symmetric node degrees

CYCLON – Basic Shuffling
- A simple peer-to-peer communication protocol
- Each peer has a small fixed-size cache of C peers; each entry contains the address of another peer
- At intervals T, each peer P:
  - Selects a random subset L of its cache and a random peer Q within this subset
  - Replaces Q's address in the subset with P's own address
  - Sends the subset to Q
  - Receives a subset of Q's cache and discards redundant entries
  - Fills its cache with the new entries, replacing those entries that were sent to Q
- Q also merges its cache with the subset received from P

CYCLON – Basic Shuffling
[Figure from Voulgaris et al. Note: connectivity is directional; in (a), 9 is 2's neighbor but not vice versa, while in (b) it is the opposite.]

CYCLON – Enhanced Shuffling
- CYCLON's contribution; almost the same as basic shuffling
- Key difference: peers do not choose whom to shuffle with at random
  - Instead, a peer shuffles with the oldest entry in its cache
  - This prevents dead peers from lingering
- Another difference: the lifetime of each cache entry is limited
  - This allows control over the number of existing cache entries pointing to one peer
- (A code sketch of enhanced shuffling follows after the join/leave slide)

CYCLON – Join/leave
- Join (without disrupting randomness):
  - The joining node does a random walk of length equal to the average path length
  - The node where the walk stops exchanges one cache entry with the joining node
  - The walk continues and this is repeated until the joining node's cache is filled
- Leave:
  - Failed and leaving nodes are treated the same; there are no heartbeat/keep-alive messages
  - Due to the age of cache entries, dead nodes are removed automatically
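The following is a minimal sketch of CYCLON's enhanced shuffling from the perspective of the initiating peer. CyclonPeer, CacheEntry, shuffle, handle_shuffle, the peers dictionary and the eviction policy in _merge are illustrative assumptions under the slide's description, not the paper's pseudocode.

```python
import random
from dataclasses import dataclass

@dataclass
class CacheEntry:
    address: str
    age: int = 0

class CyclonPeer:
    """Minimal sketch of a CYCLON peer using enhanced shuffling (assumed structure)."""

    def __init__(self, address, peers, cache_size=20, shuffle_length=5):
        self.address = address
        self.peers = peers              # address -> CyclonPeer, standing in for messaging
        self.cache = []                 # up to cache_size CacheEntry objects
        self.cache_size = cache_size
        self.shuffle_length = shuffle_length

    def shuffle(self):
        """One cycle: age all entries, contact the oldest neighbor, exchange cache subsets."""
        if not self.cache:
            return
        for entry in self.cache:
            entry.age += 1
        # Enhanced shuffling: pick the oldest entry instead of a random one.
        oldest = max(self.cache, key=lambda e: e.age)
        self.cache.remove(oldest)
        subset = random.sample(self.cache, min(self.shuffle_length - 1, len(self.cache)))
        # Replace the target's entry with a fresh entry pointing back to ourselves.
        sent = subset + [CacheEntry(self.address, 0)]
        received = self.peers[oldest.address].handle_shuffle(sent)
        self._merge(received, replaceable=subset)

    def handle_shuffle(self, incoming):
        """Passive side: reply with a random subset of our cache, then merge the incoming one."""
        reply = random.sample(self.cache, min(self.shuffle_length, len(self.cache)))
        self._merge(incoming, replaceable=reply)
        return reply

    def _merge(self, received, replaceable):
        """Add received entries, discarding duplicates; when full, evict entries we just sent away."""
        known = {self.address} | {e.address for e in self.cache}
        for entry in received:
            if entry.address in known:
                continue
            if len(self.cache) >= self.cache_size:
                victim = next((e for e in replaceable if e in self.cache), None)
                if victim is None:
                    break
                self.cache.remove(victim)
            self.cache.append(CacheEntry(entry.address, entry.age))
            known.add(entry.address)
```

Basic shuffling would differ only in picking a random cache entry instead of the oldest one; the bounded entry lifetime mentioned above could be added by dropping entries whose age exceeds a threshold.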
CYCLON – Basic properties
- Connectivity
  - No node becomes disconnected as a result of shuffling
- Convergence
  - A small average path length is good for information dissemination (lower communication cost and delay)
  - Results show that the average path length converges to a small value over time, comparable to that of a random graph
  - The clustering coefficient should be low
    - A high coefficient increases the chance of partitioning and is not optimal for information dissemination (many redundant messages)
    - It also converges to a value comparable to the clustering coefficient of a random graph
  - Both the clustering coefficient and the average path length converge exponentially

CYCLON – Basic properties 2
- Degree distribution
  - Load balancing; avoid poorly or highly connected nodes
  - Results show that enhanced shuffling outperforms basic shuffling in degree distribution; it even outperforms a random graph
- Robustness
  - Robustness in the presence of failures
  - Results show that CYCLON is able to heal itself after massive node failures
- Bandwidth
  - Estimations show that each node needs a total of about 40*L bytes per shuffle (L being the exchanged subset), which is very low

Questions – Discussion