
Failure Detectors
CS 717
Ashish Motivala
Dec 6th 2001
Some Relevant Papers
• Unreliable Failure Detectors for Reliable Distributed Systems. Tushar Deepak Chandra and Sam Toueg. Journal of the ACM, 1996.
• A Gossip-Style Failure Detection Service. R. van Renesse, Y. Minsky, and M. Hayden. Middleware '98.
• Scalable Weakly-consistent Infection-style Process Group Membership Protocol. Ashish Motivala, Abhinandan Das, and Indranil Gupta. To be submitted to DSN 2002 tomorrow. http://www.cs.cornell.edu/gupta/swim
• On the Quality of Service of Failure Detectors. Wei Chen, Sam Toueg, and Marcos Aguilera. DSN 2000.
• Fail-Aware Failure Detectors. C. Fetzer and F. Cristian. In Proceedings of the 15th Symposium on Reliable Distributed Systems (SRDS '96).
Asynchronous vs Synchronous Model
• Asynchronous model:
– No value to assumptions about process speed
– Network can arbitrarily delay a message
– But we assume that messages are sequenced and retransmitted (arbitrary numbers of times), so they eventually get through
• Synchronous model:
– Assume that every process runs within bounded delay
– Assume that every link has bounded delay
– Usually described as “synchronous rounds”
Failures in Asynchronous and Synchronous Systems
• Asynchronous model:
– Usually, limited to process “crash” faults
– If detectable, we call this “fail-stop” – but how to detect?
– Can talk about message “omission” failures: failure to send is the usual approach
– But the network is assumed reliable (loss “charged” to the sender)
• Synchronous model:
– Process crash failures, as in the asynchronous setting
– “Byzantine” failures: arbitrary misbehavior by processes
Realistic???
• The asynchronous model is too weak: it has no clocks (real systems have clocks, and “most” timing meets expectations… but with heavy tails)
• The synchronous model is too strong (real systems lack a way to implement synchronized rounds)
• Partially Synchronous Model: an asynchronous network with a reliable channel
• Timed Asynchronous Model: time bounds on clock drift rates and message delays [Fetzer]
Impossibility Results
• Consensus: All processes need to agree on a value
• FLP Impossibility of Consensus
– A single faulty process can prevent consensus
– Realistic because a slow process is indistinguishable from a
crashed one.
• Chandra/Toueg showed that the FLP impossibility applies to
many problems, not just consensus
– In particular, they show that FLP applies to group
membership, reliable multicast
– So these practical problems are impossible in
asynchronous systems
• They also look at the weakest condition under which
consensus can be solved
Byzantine Consensus
• Example: 3 processes, 1 is faulty (A, B, C)
• Non-faulty processes A and B start with input 0 and 1,
respectively
• They exchange messages: each now has a set of inputs
{0, 1, x}, where x comes from C
• C sends 0 to A and 1 to B
• A has {0, 1, 0} and wants to pick 0. B has {0, 1, 1}
and wants to pick 1.
• By definition, impossibility in this model means the task
“can’t always be done”: some runs will fail, not necessarily every run
Chandra/Toueg Idea
• Theoretical Idea
• Separate problem into
– The consensus algorithm itself
– A “failure detector”: a form of oracle that announces
suspected failures
– But the detector can change its decision: a suspicion may
later be withdrawn (a minimal sketch follows below)
• Question: what is the weakest oracle for which
consensus is always solvable?
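A minimal sketch (Python; the class and method names are my own, not from Chandra/Toueg) of what such an oracle looks like from the consensus algorithm's point of view: the algorithm only queries the current suspect set, and that set is allowed to change its mind later.

```python
# Hypothetical sketch of the failure-detector "oracle" interface.
# The consensus algorithm never looks at timeouts or the network directly;
# it only asks the oracle who is currently suspected, and the answer may
# change over time (suspicions can be added and later retracted).

class FailureDetectorOracle:
    def __init__(self):
        self._suspects = set()

    def suspect(self, process_id):
        """Called by the detector implementation when it suspects a crash."""
        self._suspects.add(process_id)

    def unsuspect(self, process_id):
        """Suspicions are revocable: the oracle may change its decision."""
        self._suspects.discard(process_id)

    def suspects(self):
        """What the consensus algorithm queries whenever it needs to know."""
        return frozenset(self._suspects)
```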
Sample properties
• Completeness: detection of every crash
– Strong completeness: Eventually, every process that
crashes is permanently suspected by every correct
process
– Weak completeness: Eventually, every process that
crashes is permanently suspected by some correct
process
Sample properties
• Accuracy: does it make mistakes?
– Strong accuracy: No process is suspected before it
crashes.
– Weak accuracy: Some correct process is never
suspected
– Eventual {strong/ weak} accuracy: there is a time
after which {strong/weak} accuracy is satisfied.
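One way to make these definitions concrete is to check them over a finite recorded run. The sketch below (Python, my own trace format) is only an approximation, since the real properties quantify over infinite executions; here "eventually/permanently" means "from some step to the end of the recorded trace".

```python
# Toy checkers for the strong variants of the two property families.
# trace: list of dicts, one per time step, mapping each correct process
#        to the set of processes it currently suspects.
# crashed_at: dict mapping each crashed process to the step at which it crashed.

def strong_accuracy(trace, crashed_at):
    """No process is suspected before it crashes."""
    for t, step in enumerate(trace):
        for p, suspects in step.items():
            for q in suspects:
                if q not in crashed_at or t < crashed_at[q]:
                    return False
    return True

def strong_completeness(trace, crashed_at):
    """Eventually every crashed process is permanently suspected by every
    correct process (finite-trace reading: suspected at every step of some
    suffix of the trace)."""
    correct = trace[-1].keys()
    for q in crashed_at:
        for p in correct:
            if not any(all(q in step[p] for step in trace[t:])
                       for t in range(len(trace))):
                return False
    return True
```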
A sampling of failure detectors
Completeness \ Accuracy | Strong | Weak | Eventually Strong | Eventually Weak
Strong | Perfect (P) | Strong (S) | Eventually Perfect (◇P) | Eventually Strong (◇S)
Weak | Q | Weak (W) | ◇Q | Eventually Weak (◇W)
(the ◇ prefix reads “eventually”)
Perfect Detector?
• Named Perfect, written P
• Strong completeness and strong accuracy
• Immediately detects all failures
• Never makes mistakes
Example of a failure detector
• The detector they call ◇W: “eventually weak”
• More commonly pronounced “diamond-W”
• Defined by two properties:
– There is a time after which every process that crashes is
suspected by some correct process {weak completeness}
– There is a time after which some correct process is never
suspected by any correct process {weak accuracy}
• E.g., we can eventually agree upon a leader; if it
crashes, we eventually and accurately detect the
crash
◇W: Weakest failure detector
• They show that ◇W is the weakest failure
detector for which consensus is guaranteed to
be achievable
• Algorithm is pretty simple
– Rotate a token around a ring of processes
– Decision can occur once token makes it around once
without a change in failure-suspicion status for any
process
– Subsequently, as token is passed, each recipient
learns the decision outcome
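A toy, single-threaded simulation of the rotating-token sketch above. This is my own simplification for illustration, not the published Chandra-Toueg algorithm; it assumes weak accuracy holds (at least one process is never suspected, so the token can always move on), and the min() merge rule is an arbitrary choice of mine.

```python
# Toy simulation of the rotating-token idea: the token circulates around the
# ring, and a decision is taken once it completes a full lap during which
# nobody's suspicion status changed.

def run_token_ring(values, suspicions_at):
    """values[i]: the proposal of process i.
    suspicions_at(step): list of per-process suspect sets at that step,
    i.e. the (possibly changing) output of each local failure detector."""
    n = len(values)
    estimate = values[0]                 # token starts at p0 with its value
    holder, step, stable_hops = 0, 0, 0
    prev = suspicions_at(0)
    while stable_hops < n:               # decide only after one "quiet" lap
        step += 1
        cur = suspicions_at(step)
        stable_hops = stable_hops + 1 if cur == prev else 0
        prev = cur
        holder = (holder + 1) % n        # pass the token along the ring,
        while any(holder in s for s in cur):
            holder = (holder + 1) % n    # skipping currently-suspected processes
        estimate = min(estimate, values[holder])   # arbitrary deterministic merge
    return estimate  # afterwards, each process receiving the token learns the outcome
```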
Building systems with ◇W
• Unfortunately, this failure detector is not
implementable in a truly asynchronous system
• Yet it is the weakest failure detector that solves
consensus
• Using timeouts we can approximate it, but we may
make mistakes at arbitrary times (see the sketch below)
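This is what deployed systems actually do: heartbeats plus a timeout, accepting that the detector will sometimes be wrong. A minimal sketch (Python; the names and the double-the-timeout rule are my own illustration, not a prescribed algorithm):

```python
import time

# Minimal heartbeat/timeout failure detector sketch.  It can suspect a
# slow-but-alive process (a mistake), and it retracts the suspicion if a
# heartbeat later arrives -- matching the "oracle that can change its
# decision" model -- while growing the timeout after every false suspicion
# to approximate eventual accuracy.

class TimeoutDetector:
    def __init__(self, initial_timeout=1.0):
        self.timeout = initial_timeout
        self.last_heard = {}          # process -> time of last heartbeat
        self.suspected = set()

    def on_heartbeat(self, process_id):
        self.last_heard[process_id] = time.monotonic()
        if process_id in self.suspected:      # we were wrong: retract,
            self.suspected.discard(process_id)
            self.timeout *= 2                 # and be more patient next time

    def current_suspects(self):
        now = time.monotonic()
        for p, t in self.last_heard.items():
            if now - t > self.timeout:
                self.suspected.add(p)
        return set(self.suspected)
```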
Group Membership Service
[Figure: a process group over an asynchronous lossy network; join, leave, and failure events involving a process pi are reflected in pj’s membership list]
Data Dissemination using
Epidemic Protocols
• Want efficiency, robustness, speed and scale
• Tree distribution is efficient, but fragile and
hard to configure
• Gossip is efficient and robust but has high
latency: network load is almost linear, and detection
time scales as O(n log n) with the number of processes
State Monotonic Property
• A gossip message contains the state of the
sender of the gossip.
• The receiver uses a merge function to combine
the received state with its own state.
• Need some kind of monotonicity in the state and
in the gossip (merge) operation
Simple Epidemic
• Assume a fixed population of size n
• For simplicity, assume homogeneous
spreading
– Simple epidemic: any one can infect any one with
equal probability
• Assume that k members are already
infected
• And that the infection occurs in rounds
Probability of Infection
• What is the probability Pinfect(k, n) that a particular
uninfected member is infected in a round, if k members
are already infected?
• Pinfect(k, n) = 1 − P(nobody infects that member)
= 1 − (1 − 1/n)^k
• E[# newly infected members] = (n − k) · Pinfect(k, n)
• Basically it’s a binomial distribution
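A short sketch (Python, my own) that simply iterates the expected-infection recurrence above, round by round, so the two-phase behaviour on the next slides can be read off directly. It is a deterministic approximation of the random process, not a full simulation.

```python
# Round-by-round evaluation of the formulas on this slide:
#   Pinfect(k, n)     = 1 - (1 - 1/n)**k
#   E[newly infected] = (n - k) * Pinfect(k, n)

def p_infect(k, n):
    return 1.0 - (1.0 - 1.0 / n) ** k

def rounds_to_infect(n, k=1.0, threshold=0.999):
    """Iterate the expected-infection recurrence, starting from k infected
    members, until (almost) the whole population of size n is infected."""
    rounds = 0
    while k < threshold * n:
        k += (n - k) * p_infect(k, n)
        rounds += 1
    return rounds

if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, rounds_to_infect(n))    # grows roughly like log n
```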
2 Phases
• Intuition: 2 Phases
• First Half: 1 -> n/2
• Second Half: n/2 -> n
• For large n, Pinfect(n/2, n) ≈ 1 − (1/e)^0.5 ≈ 0.4
Infection and Uninfection
• Infection
– Initial growth factor is very high, about 2
– At the halfway mark it is about 1.4
– Exponential growth
• Uninfection
– The uninfected population shrinks slowly to start
– At the halfway mark the factor is about 0.4
– Exponential decline
Rounds
• Number of rounds necessary to infect the
entire population is O(log n)
• Robbert (van Renesse) uses a base of 1.585 for
his experiments
How the Protocol Works
• Each member maintains a list of (address,
heartbeat) pairs.
• Periodically, each member gossips:
– Increments its own heartbeat
– Sends (part of) its list to a randomly chosen member
• On receipt of a gossip message, the receiver merges the lists
• Each member keeps track of the last heartbeat seen for
each list member
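A compact sketch (Python; the field names and the send() callback are my own assumptions) of one gossip step as described: bump your own heartbeat, ship your table to a random peer, and have the receiver merge by element-wise maximum, which is exactly the monotonic merge from the earlier slide.

```python
import random

# Sketch of one gossip exchange in the heartbeat-table style described above.
# Each member keeps {address: heartbeat}; merging takes the element-wise max.

def gossip_once(my_addr, my_table, members, send):
    """my_table: dict {address: heartbeat}.  send(peer, table) is whatever
    transport the system provides (assumed, not part of the slides)."""
    my_table[my_addr] += 1                              # increment own heartbeat
    peer = random.choice([m for m in members if m != my_addr])
    send(peer, dict(my_table))                          # gossip (a copy of) the list

def on_gossip_received(my_table, received_table):
    for addr, beat in received_table.items():
        my_table[addr] = max(my_table.get(addr, 0), beat)   # monotonic merge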
SWIM
Group Membership Service
[Figure: the same process-group / membership-list diagram as before – pi and pj over an asynchronous lossy network, with join, leave, and failure events]
System Design
• Join, Leave, Failure: broadcast to all
processes
• Need to detect a process failure at some
process quickly (to be able to broadcast it)
• Failure Detector Protocol specifications:
– Detection time
– Accuracy
– Load
• Detection time and accuracy are specified by the
application designer to SWIM; load is optimized by SWIM
SWIM Failure Detector Protocol
[Figure: one protocol period of T time units – pi pings a member pj directly and, if no ack arrives, probes it indirectly through K randomly chosen processes; the X’s mark lost messages]
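A sketch (Python; the function names and the ping()/send callbacks are my own assumptions, the SWIM paper cited earlier gives the real message formats and timing rules) of one protocol period from pi's point of view: direct ping, then indirect probes through K randomly chosen members before reporting a failure.

```python
import random

# One SWIM-style protocol period from pi's point of view (sketch only).
# ping(target) is assumed to return True iff an ack came back within the
# per-ping timeout.

def ping_via(helper, target, ping):
    """Stand-in for the indirect ping-request: modelled here as the helper
    simply performing its own ping of the target and relaying the result."""
    return ping(target)

def protocol_period(members, ping, k):
    target = random.choice(members)            # probe one random member
    if ping(target):
        return None                            # direct ack: target looks alive
    helpers = random.sample([m for m in members if m != target],
                            min(k, len(members) - 1))
    for h in helpers:                          # ask K others to probe on our behalf
        if ping_via(h, target, ping):
            return None                        # indirect ack: still alive
    return target                              # no ack at all: report the failure
```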
Properties
• Expected detection time: e/(e−1) protocol periods
• Load: O(K) per process
– Inaccuracy probability falls exponentially in K
• Process failures detected
– in O(log N) protocol periods w.h.p.
– in O(N) protocol periods deterministically
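A quick sanity check of the e/(e−1) figure under a simple model of my own (N members, each pinging one uniformly random other member per period, one failed member): the chance the failed member is probed in a period is 1 − (1 − 1/(N−1))^(N−1) ≈ 1 − 1/e, so the expected number of periods to first detection is about 1/(1 − 1/e) = e/(e−1) ≈ 1.58.

```python
import math
import random

# Monte-Carlo check of the expected-detection-time claim under the simple
# model above: each of the N-1 live members pings one uniformly random other
# member per period; count periods until the single failed member is pinged.

def periods_until_first_probe(n, trials=20000):
    total = 0
    for _ in range(trials):
        periods = 0
        while True:
            periods += 1
            # each live member picks the failed member with probability 1/(n-1)
            if any(random.randrange(n - 1) == 0 for _ in range(n - 1)):
                break
        total += periods
    return total / trials

if __name__ == "__main__":
    print("e/(e-1)   =", math.e / (math.e - 1))      # ~1.582
    print("simulated =", periods_until_first_probe(50))
```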
Why not Heartbeating ?
• Centralized: single point of failure
• All-to-all: O(N) load per process
• Logical ring: unpredictable behavior under multiple failures
LAN Scalability
[Plot: mean time to failure detection (in units of RTT, y-axis, 1–7) vs. number of processes (x-axis, 2–22), experimental vs. expected curves. Win2000, 100 Base-T Ethernet LAN; protocol period = 3·RTT, RTT = 10 ms, K = 1]
Deployment
• Broadcast ‘suspicion’ before ‘declaring’ process failure
• Piggyback broadcasts through ping messages
– Epidemic-style broadcast
• WAN
– Load on core routers
– No representatives per subnet/domain