Scalable membership management and failure detection
INF5360 topic
Presented by Jan Erik Haavet
janehaa@ifi.uio.no
Papers presented

Correctness of a Gossip Based Membership Protocol
  By André Allavena, Alan Demers and John Hopcroft

SCAMP: Peer-to-Peer Membership Management for Gossip-Based Protocols
  By Ayalvadi J. Ganesh, Anne-Marie Kermarrec, and Laurent Massoulié

Newscast Computing
  By Márk Jelasity, Wojtek Kowalczyk and Maarten van Steen

CYCLON: Inexpensive Membership Management for Unstructured P2P Overlays
  By Spyros Voulgaris, Daniela Gavidia and Maarten van Steen
Outline

Introduction
  Group membership
  Gossip based membership
Main paper
  Correctness of a Gossip Based Membership Protocol
Related work papers
  SCAMP
  Newscast
  CYCLON
Group Membership

Main motivation of group membership protocols
  Multicast in distributed systems
Scalability
  Problem in large-scale networks
  Each node needs full membership knowledge to guarantee delivery to all
Reliability
  Need high probability of delivery
  Even when nodes join/leave/fail
One often used protocol to achieve the above for group membership: Gossiping
Gossip based membership

Each node forwards a message to a set of gossip targets (see the sketch below)
Probabilistic guarantees of delivery
Reliable
  By setting the number of gossip targets large
Fault tolerant
  By randomly selecting gossip targets
Generally requires full group knowledge
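To make the forwarding step concrete, here is a minimal sketch of gossip dissemination in Python. The fanout constant, the recursion that stands in for message delivery, and all names are illustrative assumptions, not taken from any of the papers.

```python
import random

FANOUT = 3  # illustrative fanout; a larger value raises the delivery probability

def gossip(node, message, views, delivered):
    """Forward `message` to FANOUT targets drawn at random from `node`'s view.

    views     -- maps a node id to its membership view (list of node ids)
    delivered -- set of nodes that have already seen the message
    """
    if node in delivered:
        return
    delivered.add(node)
    targets = random.sample(views[node], min(FANOUT, len(views[node])))
    for target in targets:
        gossip(target, message, views, delivered)  # recursion models message delivery
```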
Outline

Introduction
  Group membership
  Gossip based membership
Main paper
  Correctness of a Gossip Based Membership Protocol
Related work papers
  SCAMP
  Newscast
  CYCLON
Correctness of a Gossip Based Membership Protocol

Paper motivation
  The importance of scalability and fault-tolerance in distributed systems has led to considerable research in multicast protocols using gossip
This paper introduces a scalable gossip-based algorithm for local view maintenance
  Can be combined with any application-level gossip protocol that relies on randomly selected gossip partners
  Scalable
    Does not require full group membership knowledge at each node, only a local view
  Preserves connectivity and load balancing between nodes
Correctness of a Gossip Based Membership Protocol - Properties

Desirable properties

Even load distribution / Load balancing
  Probabilistic bounds on the degree of each node
  Low degree -> low load
  Reinforcement
Connectivity
  Even distribution of pointers to other nodes
  Avoid partitions
Mixing
  Local views that are uniform samples of the membership set
  Over time, the local view changes and emulates complete membership
  Not fully achieved in this work; listed as future work
  Could be done as in CYCLON, with timestamps
How it works

The protocol is based on each node having a local view
  A fixed-size random subset of the group membership
  N: the number of nodes
  K: the size of a local view
  F: the fanout parameter
  W: the reinforcement weight
Each node periodically updates its local view in rounds
Join by copying the local view of a random node
Leave by simply ceasing to participate
  No difference between a stopping and a failing node
Maintaining the local view

In each round, a node S will:
  Mixing: Construct a list L1 comprising the local views of F nodes chosen at random from S's local view
  Reinforcement: Construct a list L2 of the other nodes that requested S's local view during that round
  Create a new local view by choosing K distinct nodes from L1 and L2 (see the sketch after this slide)
W determines the selection distribution between L1 and L2
  W = 0: no nodes from L2
  W = 1: equal distribution
  W > 1: an increasing number of nodes chosen from L2
The protocol can be
  Synchronous, loosely synchronized or asynchronous
  Simulations show no significant difference
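A minimal sketch of one maintenance round in Python. The realization of the W weighting (L1 entries get sampling weight 1, L2 entries get weight W) and all function and variable names are assumptions for illustration, not the authors' pseudocode.

```python
import random

def update_local_view(s, views, pullers, K, F, W):
    """One maintenance round for node s (illustrative sketch).

    views   -- dict mapping node id -> current local view (list of node ids)
    pullers -- nodes that pulled s's local view this round (list L2)
    K, F, W -- view size, fanout, reinforcement weight
    """
    # Mixing: pull the local views of F nodes chosen at random from s's view.
    targets = random.sample(views[s], min(F, len(views[s])))
    L1 = [n for t in targets for n in views[t]]
    L2 = list(pullers)  # reinforcement candidates

    # Assumed weighting: L1 entries count 1, L2 entries count W.
    weights = {}
    for n in L1:
        if n != s:
            weights[n] = weights.get(n, 0.0) + 1.0
    for n in L2:
        if n != s:
            weights[n] = weights.get(n, 0.0) + float(W)

    # Draw K distinct nodes according to the accumulated weights.
    new_view = []
    while weights and len(new_view) < K:
        nodes, w = zip(*weights.items())
        if sum(w) <= 0:
            break
        pick = random.choices(nodes, weights=w)[0]
        new_view.append(pick)
        del weights[pick]
    return new_view
```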
A closer look

Claim: The protocol automatically adapts and re-equilibrates the network
  Regardless of what caused the imbalance
  Two forces are responsible for this
Mixing: local views requested by node S
  Connectivity property
  Ensures that the graph does not partition
  This pulling of local views ensures that the number of edges between partitions is balanced

Mixing
[Figure: two sets of nodes connected by edges, Set A with 70% of the nodes and Set B with 30% of the nodes]
Merging of local views results in a list of distinct peers
  With each iteration, this merging converges towards an even distribution between the number of edges from A to B and from A to A
A closer look – Continued

Reinforcement: Nodes that requested S's local view
  Node S positively reinforces nodes that pulled its local view by adding them to its new local view
  If the reinforcement weight W is > 1
    Removes older/dead edges in approximately K/F rounds
    Adds fresh edges
  Without reinforcement, the network would collapse into a star-like graph
    Isolated nodes would not be able to re-enter mixing
    Some nodes with many in-edges
    Many nodes with few in-edges
Simulation results

Simulations show that the performance of the protocol is good
  For large-scale tests with up to 100 000 nodes
High number of rounds before any partitioning happens
  Almost as good as a random graph
Maximum in-degree always below 4.5 times that of a random graph
Scales well with an increasing number of nodes
Conclusion:
  A satisfactory protocol for local view maintenance
Outline

Introduction
  Group membership
  Gossip based membership
Main paper
  Correctness of a Gossip Based Membership Protocol
Related work papers
  SCAMP
  Newscast
  CYCLON
Introductory overview of related work

SCAMP: Peer-to-Peer Membership Management for Gossip-Based Protocols
  By Ayalvadi J. Ganesh, Anne-Marie Kermarrec, and Laurent Massoulié

Newscast Computing
  By Márk Jelasity, Wojtek Kowalczyk and Maarten van Steen

CYCLON: Inexpensive Membership Management for Unstructured P2P Overlays
  By Spyros Voulgaris, Daniela Gavidia and Maarten van Steen
SCAMP

Motivation
  Expansion of internet-wide distributed applications
  Need for scalable, reliable group communication
Previous work, as of 2003
  Assumes that each member has full group membership knowledge
  Not feasible for very large-scale groups (Scalability)
  Each member should only need a partial membership set (Decentralization)
Goal
  Avoid the need to know the full view size
  The partial view size depends on the full view size
  The partial view size automatically adapts to the full view size as it grows
SCAMP – Desired properties

Scalability
  The size of the partial view grows with the full view size
Reliability
  Partial views large enough to support gossip protocols with high reliability
Decentralization
  Partial views should be updated as members leave/join, while maintaining scalability and reliability
Isolation recovery
  Recover isolated nodes
  Partial views should change for each message sent using that partial view
SCAMP – How it works

Subscription (see the sketch below)
  Contact: A new node P sends a subscription request to an arbitrary member Q
  New subscription: On contact, Q forwards P's id to all members in its partial view
  Forwarded subscription: A node in Q's view receives the subscription and inserts it into its own view with a probability p. If it does not, P's id is forwarded to a random node in its partial view
  Keeping a subscription: Each node maintains two lists, a PartialView of nodes it sends gossip messages to and an InView of nodes that contain its node id in their partial view. When a node keeps a subscription, it adds the new id to its PartialView, and the new node records the keeper in its InView
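A toy Python rendering of the subscription path described above. The class name, the value of the keep probability p, and the recursion that stands in for message passing are illustrative assumptions, not SCAMP's actual pseudocode.

```python
import random

KEEP_PROBABILITY = 0.5  # the paper's probability p; 0.5 is an assumed placeholder

class ScampNode:
    """Toy model of SCAMP subscription handling (names and details are illustrative)."""

    def __init__(self, node_id):
        self.id = node_id
        self.partial_view = []  # nodes this node gossips to
        self.in_view = []       # nodes that list this node in their partial view

    def contact(self, new_id, nodes):
        # New subscription: the contact forwards the newcomer's id
        # to every member of its partial view.
        for member in self.partial_view:
            nodes[member].forwarded_subscription(new_id, nodes)

    def forwarded_subscription(self, new_id, nodes):
        # Keep the subscription with probability p; otherwise forward it
        # to a random member of the partial view.
        if (new_id != self.id and new_id not in self.partial_view
                and random.random() < KEEP_PROBABILITY):
            self.partial_view.append(new_id)       # keeper will now gossip to the newcomer
            nodes[new_id].in_view.append(self.id)  # newcomer records who points at it
        elif self.partial_view:
            nodes[random.choice(self.partial_view)].forwarded_subscription(new_id, nodes)
```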
SCAMP – How it works
[Figure, built up over three slides: nodes A–F illustrating the steps 1. Contact, 2. New subscription, 3. Forwarded subscription, 4. Keeping a subscription]
SCAMP – How it works

Unsubscription (see the sketch below)
  P tells all nodes in its InView that it is leaving
  The message includes a node Q from P's partial view
  A node receiving this message replaces P in its partial view with Q
To avoid isolated nodes in case of node failures
  Nodes send periodic heartbeat messages
  A node considers itself isolated if it does not receive heartbeat messages for a while
  To recover from isolation, it resubscribes
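Continuing the toy ScampNode model above, a sketch of unsubscription; choosing the replacement at random from P's partial view is an assumption made here for brevity.

```python
import random

def unsubscribe(p, nodes):
    """Sketch of SCAMP unsubscription, reusing the ScampNode model above.

    Every node that has P in its partial view replaces P with a node taken
    from P's own partial view, so no edges are simply lost.
    """
    for watcher_id in list(p.in_view):
        watcher = nodes[watcher_id]
        if p.id in watcher.partial_view and p.partial_view:
            replacement = random.choice(p.partial_view)
            watcher.partial_view[watcher.partial_view.index(p.id)] = replacement
            nodes[replacement].in_view.append(watcher_id)
    p.partial_view.clear()
    p.in_view.clear()
```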
SCAMP – Rebalancing the graph

We cannot expect new subscribers to select their contact uniformly from the whole membership set
  This may lead to an unbalanced graph
Two mechanisms are proposed to achieve balance
  Indirection: The contact node Q for a new subscription does not handle the request itself. Instead it forwards the request for contact.
    Q uses a forwarding rule to choose which node from its partial view to forward to
    A stopping rule decides whether a node should handle the request or forward it using the forwarding rule
  Lease: Nodes only hold a subscription for a leased time
    After the lease time is over, a node has to resubscribe
    The resubscription contact is chosen randomly from its partial view
SCAMP – Results

Scalability
  The partial view size is shown to grow with the full view size
Reliability
  Results confirm that gossip protocols are good for reliability
Decentralization
  The graph rebalances itself with the lease and indirection mechanisms
Isolation recovery
  Heartbeats allow isolated nodes to resubscribe
Newscast

Motivation
  Monitoring of large computer networks
  Failure detection
Peer-to-peer protocol
  Maintains and disseminates up-to-date information and membership data
  Aimed at large and dynamic distributed environments
  Provides an information dissemination service to applications
  Resilient to peer failures
Newscast - How it works

Each peer keeps a small fixed-size cache of C news items
Cache entry (see the sketch below):
  Contains a news item, a timestamp and a peer address
  Each news item contains an application id and some news
[Figure: cache entry layout – Address, Timestamp, News item (Appid, News data)]
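One possible rendering of the cache entry as a data structure; the field names and types are illustrative, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class NewsItem:
    # Application-supplied content: an application identifier plus arbitrary news data.
    app_id: str
    data: bytes

@dataclass
class CacheEntry:
    # One entry of the fixed-size Newscast cache.
    address: str      # address of the peer that created the entry
    timestamp: float  # creation time, used to keep only the freshest C entries
    item: NewsItem
```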
Newscast - How it works 2

At intervals T, each peer (see the sketch below):
  Gets news from the application, timestamps it and adds the local peer address to the cache entry
  Finds a random peer among the cache addresses
    Sends all cache entries to this peer
    Receives all cache entries from that peer
    Passes on the received cache entries (containing news items) to the application
  Merges the old cache with the received cache
    Keeps at most C cache entries
    Throws away the oldest entries
Does not require synchronization
  Only needs to normalize incoming cache entries
  This can result in errors, but is sufficient for this work
The passive peer does the same as the initiating peer, except for peer selection
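A sketch of one active Newscast exchange, reusing the CacheEntry sketch above. The peer-state layout, the deduplication per peer address, and the simplified passive side are assumptions made for illustration.

```python
import random, time

CACHE_SIZE = 20  # C, an illustrative value

def newscast_round(peer, peers, fresh_news):
    """One active Newscast exchange for `peer` (illustrative sketch).

    peer       -- dict with 'address' and 'cache' (a list of CacheEntry)
    peers      -- mapping from address to peer state
    fresh_news -- NewsItem supplied by the local application this round
    """
    # 1. Timestamp the application's news and add it to the local cache.
    peer['cache'].append(CacheEntry(peer['address'], time.time(), fresh_news))

    # 2. Pick a random peer known from the cache and exchange full caches.
    candidates = [e.address for e in peer['cache'] if e.address != peer['address']]
    if not candidates:
        return
    other = peers[random.choice(candidates)]
    sent, received = list(peer['cache']), list(other['cache'])

    # 3. Both sides merge, keeping the freshest entry per peer address and at
    #    most CACHE_SIZE entries overall (the oldest entries are discarded).
    for node in (peer, other):
        freshest = {}
        for entry in sent + received:
            kept = freshest.get(entry.address)
            if kept is None or entry.timestamp > kept.timestamp:
                freshest[entry.address] = entry
        node['cache'] = sorted(freshest.values(),
                               key=lambda e: e.timestamp, reverse=True)[:CACHE_SIZE]
```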
Newscast – How it works
[Figure, built up over five slides: two peers and their applications illustrating 1. News item from the application, 2. Add news item to cache, 3. Find a random node and exchange caches, 4. Deliver news to the applications, 5. Merge old cache with incoming cache]
Newscast – Membership management

Membership management data is disseminated together with the news items
Join
  A peer joins by initializing its cache with at least one known peer
Leave
  Failing and leaving nodes are treated the same way
  As there is a timestamp on each cache entry, failed nodes will quickly disappear from the caches
Newscast – Wanted Properties

Self-organizing
  No matter what the join/leave patterns are, it should organize itself
Effective
  Information dissemination should be fast and predictable
Robust
  Should handle massive node failures
Scalable
  Quality of service should not decrease when scaling up
The paper shows empirical evidence for the listed properties
Newscast – Empirical Evidence

Results show that the average path length converges to a low value
  About the same as for a random graph
  Even for non-random join sequences
  Even after radical fluctuations of membership
  Robust, effective and self-organizing
Clustering coefficient not as good as for a random graph
  Information dissemination still effective due to the good average path length
Communication cost
  About 2 * cache entry size every cycle (the entry size includes some application-specific data)
CYCLON

Motivation: Content-based searching in peer-to-peer overlays
Contribution: A framework for inexpensive membership management
  While retaining random-graph properties
  Gossip-based membership management protocol
  Resilient to massive node failures
  Handles high churn rates
  Low membership management cost
  Shown to construct membership graphs with:
    Low diameter
    Low clustering factor
    Highly symmetric node degrees
CYCLON – Basic Shuffling

A simple peer-to-peer communication protocol
Each peer has a small fixed-size cache of C peers
  Each entry contains the address of another peer
At intervals T, each peer P (see the sketch below):
  Selects a random subset L of the cache and a random peer Q within this subset
  Replaces Q's address with P's
  Sends the subset to Q
  Receives a subset of Q's cache
  Discards redundant entries
  Fills the cache with the new set, replacing the entries sent to Q
Q also merges its cache with the subset from P
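A sketch of one basic shuffle in Python. The cache and subset sizes, the flat list-of-ids representation, and the exact replacement policy inside merge() are assumptions; the paper's pseudocode differs in detail.

```python
import random

CACHE_SIZE = 20      # C, illustrative
SHUFFLE_LENGTH = 5   # size of the exchanged subset L, illustrative

def basic_shuffle(p_id, q_id, caches):
    """One basic CYCLON shuffle between initiator P and peer Q (illustrative sketch).

    caches maps a peer id to its cache, here simply a list of neighbor ids.
    """
    p_cache, q_cache = caches[p_id], caches[q_id]

    # P selects a random subset of its cache and replaces Q's address with its own.
    subset_p = random.sample(p_cache, min(SHUFFLE_LENGTH, len(p_cache)))
    sent_p = [p_id if n == q_id else n for n in subset_p]

    # Q answers with a random subset of its own cache.
    sent_q = random.sample(q_cache, min(SHUFFLE_LENGTH, len(q_cache)))

    def merge(cache, incoming, sent, self_id):
        # Discard entries pointing to the node itself or already present,
        # then fill the cache, preferring to overwrite the entries just sent.
        new = [n for n in incoming if n != self_id and n not in cache]
        merged = list(cache)
        for n in new:
            if len(merged) < CACHE_SIZE:
                merged.append(n)
            else:
                victims = [v for v in sent if v in merged]
                if not victims:
                    break
                merged[merged.index(victims[0])] = n
        return merged

    caches[p_id] = merge(p_cache, sent_q, subset_p, p_id)
    caches[q_id] = merge(q_cache, sent_p, sent_q, q_id)
```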
CYCLON – Basic Shuffling
[Figure from Voulgaris et al.]
Note: Connectivity is directional. In (a), 9 is 2's neighbor, but not vice versa. In (b) it's the opposite.
CYCLON – Enhanced Shuffling

Contribution from CYCLON
Almost the same as basic shuffling
Key difference:
  Peers do not choose whom to shuffle with at random
  Instead, a peer shuffles with the oldest entry in its cache (see the sketch below)
  This prevents dead peers from lingering
Another difference:
  The lifetime of each cache entry is limited
  This allows control over the number of existing cache entries pointing to one peer
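A small sketch of the enhanced target selection, assuming each cache entry carries an explicit age field that is incremented every round; the data structure and aging policy shown here are illustrative.

```python
from dataclasses import dataclass

@dataclass
class CyclonEntry:
    address: str
    age: int = 0  # incremented at the start of every shuffle round

def pick_shuffle_target(cache):
    """Enhanced shuffling target selection (sketch): age all entries,
    then shuffle with the peer behind the oldest entry."""
    if not cache:
        return None
    for entry in cache:
        entry.age += 1
    oldest = max(cache, key=lambda e: e.age)
    cache.remove(oldest)  # its slot is refilled during the shuffle
    return oldest.address
```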
CYCLON – Join/leave

Join (see the sketch below)
  Without disrupting randomness
  Starting from a known introducer, random walks of length equal to the average path length are performed
  The node where a walk stops exchanges one cache entry with the joiner
  This is repeated until the joiner's cache is filled
Leave
  Failed and leaving nodes are treated the same
  No heartbeat/keep-alive messages
  Due to the age of cache entries, dead nodes will be removed
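A sketch of this join procedure, built on the same flat cache representation as the basic-shuffle sketch. The hand-off of exactly one entry per walk follows the slide; the introducer bookkeeping and all names are assumptions.

```python
import random

def cyclon_join(new_id, introducer, caches, walk_length, cache_size):
    """Sketch of CYCLON join via random walks (illustrative, not the paper's pseudocode)."""
    caches[new_id] = []
    while len(caches[new_id]) < cache_size:
        # Random walk of `walk_length` steps starting at the introducer.
        node = introducer
        for _ in range(walk_length):
            if not caches[node]:
                break
            node = random.choice(caches[node])
        if not caches[node]:
            break
        # The end node gives up one entry to the newcomer and replaces it
        # with a pointer to the newcomer.
        idx = random.randrange(len(caches[node]))
        caches[new_id].append(caches[node][idx])
        caches[node][idx] = new_id
```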
CYCLON – Basic properties

Connectivity
  No node becomes disconnected as a result of shuffling
Convergence
  A small average path length is good for information dissemination
    Lower communication cost and delay
  Results show that the average path length converges to a small value over time
    This value is comparable to the average path length of a random graph
  The clustering coefficient should be low
    A high coefficient increases the chances of partitioning
    It is also not optimal for information dissemination
      Many redundant messages
    Also comparable to the clustering coefficient of a random graph
  Both clustering and average path length converge exponentially
CYCLON – Basic properties 2

Degree distribution
  Load balancing
  Robustness in the presence of failures
  Avoids poorly/highly connected nodes
  Results show that enhanced shuffling outperforms basic shuffling in degree distribution; it even outperforms a random graph
Robustness
  Results show that CYCLON is able to heal itself after massive node failures
Bandwidth
  Estimates show that each node needs a total of about 40 * |L| bytes per shuffle (L being the exchanged subset), which is very low
Questions – Discussion