Distributed algorithms for fault-tolerance
Simin Nadjm-Tehrani
Dist. Algorithms for FT
© Simin Nadjm-Tehrani, 2003

So far...
• Asynchronous models
• Crash or partition failures
• This time: Synchronous algorithms, Byzantine agreement
– What is meant by synchrony in algorithms?
– How to deal with Byzantine failures?

Synchronous algorithms
• Proceed in rounds initiated by pulses
• Pulses can be implemented using local physical clocks, based on assumed bounded message delays
Can this help to solve difficult problems?

Byzantine generals
• A difficult agreement problem
• Solved in 1980 by Pease, Shostak and Lamport
• There is an upper bound t on the number of Byzantine failures relative to the size of the network: N ≥ 3t+1

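The bound N ≥ 3t+1 can be turned into a small check. This is only an illustrative sketch: the function names `max_byzantine_faults` and `tolerates` are made up here and do not come from the slides.

```python
def max_byzantine_faults(n: int) -> int:
    """Largest t that satisfies n >= 3*t + 1 for a network of n processes."""
    if n < 1:
        raise ValueError("the network needs at least one process")
    return (n - 1) // 3


def tolerates(n: int, t: int) -> bool:
    """True if n processes satisfy the bound N >= 3t+1 for t Byzantine failures."""
    return n >= 3 * t + 1


if __name__ == "__main__":
    for n in (3, 4, 7, 10):
        print(n, "processes tolerate up to", max_byzantine_faults(n), "Byzantine failures")
```

With three processes (one general and two lieutenants) the bound allows no Byzantine failure at all, which is the case examined in the scenarios.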
Scenario 1
• G and L1 are correct, L2 is faulty
[Figure: messages exchanged between G, L1 and L2]

Scenario 2
• G and L2 are correct, L1 is faulty
[Figure: messages exchanged between G, L1 and L2]

Scenario 3
• The general is faulty!
[Figure: messages exchanged between G, L1 and L2]

2-round algorithm does not work with t=1!
• Seen from L1, scenarios 1 and 3 are identical, so if L1 decides 1 in scenario 1 it will decide 1 in scenario 3
• Similarly for L2, if it decides 0 in scenario 2 it decides 0 in scenario 3
• L1 and L2 do not agree in scenario 3!

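The indistinguishability argument can be replayed in a few lines. This is a sketch under stated assumptions: the concrete 0/1 values in each scenario are inferred from the decisions named in the bullets, not read off the original figures, and the names `SCENARIOS`, `forced_decisions` and `decisions_in_scenario_3` are hypothetical. A view is what a correct lieutenant holds after two rounds: the value received from G and the value the other lieutenant claims G sent.

```python
from typing import Dict, Tuple

# (value received from G, value the other lieutenant claims G sent)
View = Tuple[int, int]

# Views of the correct lieutenants only. The 0/1 values are assumptions
# chosen to be consistent with the decisions stated in the bullets.
SCENARIOS: Dict[int, Dict[str, View]] = {
    1: {"L1": (1, 0)},                # G correct, sends 1; faulty L2 claims "G said 0"
    2: {"L2": (0, 1)},                # G correct, sends 0; faulty L1 claims "G said 1"
    3: {"L1": (1, 0), "L2": (0, 1)},  # faulty G sends 1 to L1 and 0 to L2
}


def forced_decisions() -> Dict[View, int]:
    """Decisions forced on a view when the general is correct."""
    return {
        SCENARIOS[1]["L1"]: 1,  # L1 must follow the correct general in scenario 1
        SCENARIOS[2]["L2"]: 0,  # L2 must follow the correct general in scenario 2
    }


def decisions_in_scenario_3() -> Dict[str, int]:
    """A deterministic lieutenant decides the same on identical views."""
    forced = forced_decisions()
    return {name: forced[view] for name, view in SCENARIOS[3].items()}


if __name__ == "__main__":
    print(decisions_in_scenario_3())
```

Because each lieutenant's view in scenario 3 equals a view in which its decision is already fixed, the two correct lieutenants end up with different decisions.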
Idea of [PSL80] algorithm
• Algorithm proceeds in rounds
– At round 1 each process sends its value to all others
– At the next round the received messages are relayed and the algorithm is recursively applied with (N-1, t-1)
• Each process maintains a t+2 level tree, in which the nodes at each level k are decorated with values received in round k-1

Illustration
• V[xy] = v after round 2 means: y said that x has value v
[Figure: tree with first-level nodes 1, 2, 3 and second-level nodes 12, 13, 21, 23, 31, 32]
Untrusted values are denoted by ⊥

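The node labels in the illustration (1, 2, 3 and 12, 13, 21, 23, 31, 32) can be enumerated mechanically. This sketch assumes that processes are numbered 1..n and that a label never repeats a process id, which matches the labels shown (11, 22 and 33 do not appear). The function name `level_labels` is hypothetical.

```python
from itertools import permutations
from typing import List, Tuple


def level_labels(n: int, depth: int) -> List[Tuple[int, ...]]:
    """Labels of the tree nodes `depth` steps below the root.

    Each label is a sequence of distinct process ids from 1..n;
    depth 0 yields the single empty label of the root.
    """
    return list(permutations(range(1, n + 1), depth))


if __name__ == "__main__":
    for depth in range(3):
        print(depth, level_labels(3, depth))
```

Labels are returned as tuples rather than strings such as "12" so that process ids above 9 stay unambiguous.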
Decision procedure
• After the t+1 rounds, the tree for each process is evaluated bottom-up
• At each level 1 ≤ k ≤ t+1 the value of each node is computed as the majority of the values of its children. If a majority does not exist, the value is ⊥

Correctness
• Agreement: if all nodes have the same initial value, the computed value for each non-faulty node is the same
• Termination: based on the decreasing chain of recursive calls

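The bottom-up evaluation can be sketched as a short recursive function. Several things here are assumptions made for illustration and are not taken from the slides: the tree is a nested Python list whose leaves hold the received values, ⊥ is represented by `None`, "majority" is read as a strict majority of the children, and the names `majority` and `resolve` are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Union

Value = Optional[int]              # None stands for the untrusted value ⊥
Tree = Union[Value, List["Tree"]]  # a leaf value, or a list of child subtrees


def majority(values: List[Value]) -> Value:
    """Return the value held by a strict majority of `values`, else None (⊥)."""
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else None


def resolve(node: Tree) -> Value:
    """Evaluate a tree bottom-up: leaves keep their value, inner nodes take
    the majority of the resolved values of their children."""
    if isinstance(node, list):
        return majority([resolve(child) for child in node])
    return node


if __name__ == "__main__":
    tree = [[0, 0, 1], [1, 1, 0], [0, 0, 0]]
    print(resolve(tree))
```

In the example tree the three subtrees resolve to 0, 1 and 0, so the root resolves to 0. A node whose children are split evenly resolves to ⊥.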
Effects of faults
• Transfer of incorrect own state
• Incorrect relay of another process' message
• Authentication:
– avoids the latter
– With t+1 rounds can tolerate t < N failures

Reading material
• Lynch, Chapters 6.3 and 6.4
• Tel, Chapters 12.1 and 15