COMP 413
Fall 2002
• Models
– Synchronous vs. asynchronous systems
– Byzantine failure model
• Secure storage with self-certifying data
• Byzantine quorums
• Byzantine state machines
Synchronous system: bounded message delays (implies reliable network!)
Asynchronous system: message delays are unbounded
In practice (Internet): reasonable to assume that network failures are eventually fixed
(weak synchrony assumption).
• Data and services (state machines) can be replicated on a set of nodes R .
• Each node in R has an independent, identically distributed probability of failing
• Can specify a bound f on the number of nodes that can fail simultaneously
Byzantine failures
• no assumption about nature of fault
• failed nodes can behave in arbitrary ways
• may act as an intelligent adversary (compromised node), with full knowledge of the protocols
• failed nodes may conspire (act as one)
• Data is not self-certifying (multiple writers without shared keys)
• Idea: replicate data on a sufficient number of replicas (relative to f) to be able to rely on a majority vote
Representative problem: implement a read/write variable
Assuming no concurrent reads or writes, for now
Assuming trusted clients, for now
How many replicas do we need?
• clearly, need at least 2f+1, so we have a majority of good nodes
• write(x): send x to all replicas, wait for acknowledgments (must get at least f+1)
• read(x): request x from all replicas, wait for responses, take majority vote (if no concurrent writes, must get f+1 identical votes!)
Does this work? Yes, but only if
• system is synchronous (bounded msg delay)
• faulty nodes cannot forge messages (messages are authenticated!)
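As a rough sketch of this 2f+1 scheme (in Python, with a hypothetical in-process Replica class standing in for the networked replicas, and every replica responding, as the synchronous model lets us assume):

```python
from collections import Counter

F = 1                      # assumed bound on simultaneous failures
N = 2 * F + 1              # replica count in the synchronous case

class Replica:
    """Hypothetical in-memory stand-in for a remote replica holding x."""
    def __init__(self):
        self.x = None
    def write(self, value):
        self.x = value
        return "ack"
    def read(self):
        return self.x

replicas = [Replica() for _ in range(N)]

def write(value):
    # Send to all replicas; synchrony lets us wait for every acknowledgment,
    # at least F+1 of which come from good replicas.
    acks = [r.write(value) for r in replicas]
    assert acks.count("ack") >= F + 1

def read():
    # Request from all replicas and take a majority vote: with no concurrent
    # writes, the F+1 good replicas return identical, correct values.
    votes = Counter(r.read() for r in replicas)
    value, count = votes.most_common(1)[0]
    assert count >= F + 1, "need F+1 identical votes"
    return value

write(42)
assert read() == 42
```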
Now, assume
• Weak synchrony (network failures are fixed eventually)
• messages are authenticated (e.g., signed with sender’s private key)
Let’s try 3f+1 replicas (the known lower bound)
• write(x): send x to all replicas, wait for 2f+1 responses (must have at least f+1 good replicas with the correct value)
• read(x): request x from all replicas, wait for 2f+1 responses, take majority vote (if no concurrent writes, must get f+1 identical votes!? – no, it is possible that the f nodes that did not respond were good nodes!)
Let’s try 4f+1 replicas
• write(x): send x to all replicas, wait for 3f+1 responses (must have at least 2f+1 good replicas with the correct value)
• read(x): request x from all replicas, wait for 3f+1 responses, take majority vote (if no concurrent writes, must get f+1 identical votes!? – no, it is possible that the f faulty nodes vote with the good nodes that have an old value of x!)
Let’s try 5f+1 replicas
• write(x): send x to all replicas, wait for 4f+1 responses (must have at least 3f+1 good replicas with the correct value)
• read(x): request x from all replicas, wait for 4f+1 responses, take majority vote (if no concurrent writes, the correct value gets at least 2f+1 identical votes, more than the at most 2f votes any stale value can collect!)
• Actually, we can use only 5f replicas if data is written with monotonically increasing timestamps
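The quorum arithmetic behind the 3f+1, 4f+1, and 5f+1 attempts can be checked mechanically. The sketch below is illustrative only: it counts the worst-case vote split a reader sees after waiting for n - f responses, assuming the f faulty replicas vote together with the (up to f) good replicas that still hold the old value.

```python
def worst_case_read(n, f):
    """Worst-case vote split when the reader waits for n - f responses.

    The write also waited for only n - f acknowledgments, so up to f good
    replicas may still hold the old value; the f faulty replicas vote for
    that old value too, and both groups happen to be among the responders.
    """
    responses = n - f
    stale_votes = 2 * f                 # f faulty + f good-but-stale replicas
    current_votes = responses - stale_votes
    return current_votes, stale_votes

f = 2
for n in (3 * f + 1, 4 * f + 1, 5 * f + 1):
    current, stale = worst_case_read(n, f)
    verdict = "correct value wins" if current > stale else "old value can win"
    print(f"n = {n}: {current} current vs {stale} stale votes -> {verdict}")

# For f = 2 this prints:
# n = 7: 1 current vs 4 stale votes -> old value can win
# n = 9: 3 current vs 4 stale votes -> old value can win
# n = 11: 5 current vs 4 stale votes -> correct value wins
```

Only with n = 5f+1 do the current value's 2f+1 votes beat the at most 2f votes any stale value can collect.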
Still rely on trusted clients
• A malicious client could send different values to different replicas, or send a value to less than a full quorum
• To fix this, we need a Byzantine agreement protocol among the replicas
Still don’t handle concurrent accesses
Still don’t handle group changes
BFT (Castro, 2000)
• Can implement any service that behaves like a deterministic state machine
• Can tolerate malicious clients
• Safe with concurrent requests
• Requires 3f+1 replicas
• 5 rounds of messages
• Clients send requests to one replica
• Correct replicas execute all requests in same order
• Atomic multicast protocol among replicas ensures that all replicas receive and execute all requests in the same order
• Since all replicas start in same state, correct replicas produce identical result
• Client waits for f+1 identical results from different replicas
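A sketch of this client-side acceptance rule (a hypothetical helper; replies are assumed to have already passed signature checks):

```python
from collections import defaultdict

def accept_result(replies, f):
    """Return a result once f+1 distinct replicas report it, else None.

    With at most f faulty replicas, f+1 matching replies mean at least one
    correct replica vouches for the result, and since correct replicas
    execute all requests in the same order, their results agree.
    """
    voters = defaultdict(set)            # result -> set of replica ids
    for replica_id, result in replies:
        voters[result].add(replica_id)
        if len(voters[result]) >= f + 1:
            return result
    return None

# Example with f = 1 (so 3f+1 = 4 replicas); replica 3 is faulty and lies.
replies = [(0, "ok"), (3, "bogus"), (1, "ok")]
assert accept_result(replies, f=1) == "ok"
```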
• Client c sends m = <REQUEST, o, t, c>_σc to the primary. (o = operation, t = monotonic timestamp)
• Primary p assigns sequence number n to m and sends <PRE-PREPARE, v, n, m>_σp to the other replicas. (v = current view, i.e., replica set)
• If replica i accepts the message, it sends <PREPARE, v, n, d, i>_σi to the other replicas. (d is a hash of the request.) This signals that i agrees to assign n to m in v.
• Once replica i has a pre-prepare and 2f+1 matching prepare messages, it sends <COMMIT, v, n, d, i>_σi to the other replicas. At this point, correct replicas agree on an order of requests within a view.
• Once replica i has 2f+1 matching prepare and commit messages, it executes m, then sends <REPLY, v, t, c, i, r>_σi to the client. (The need for this last step has to do with view changes.)
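A minimal sketch of this normal-case message flow and its two quorum checks, with plain Python dictionaries standing in for signed messages (signatures, view changes, and the network are omitted; names follow the slides):

```python
# Illustrative only: quorum thresholds as stated above; no signatures or networking.
f = 1
QUORUM = 2 * f + 1                 # 2f+1 matching messages

def request(o, t, c):
    return {"type": "REQUEST", "op": o, "ts": t, "client": c}

def pre_prepare(v, n, m):          # sent by the primary of view v
    return {"type": "PRE-PREPARE", "view": v, "seq": n, "request": m}

def prepare(v, n, d, i):           # replica i agrees to assign seq n to digest d in view v
    return {"type": "PREPARE", "view": v, "seq": n, "digest": d, "replica": i}

def commit(v, n, d, i):
    return {"type": "COMMIT", "view": v, "seq": n, "digest": d, "replica": i}

def matching(msgs, v, n, d):
    """Count distinct replicas whose messages agree on (view, seq#, digest)."""
    return len({m["replica"] for m in msgs
                if (m["view"], m["seq"], m["digest"]) == (v, n, d)})

def prepared(pre, prepares, v, n, d):
    # A pre-prepare plus 2f+1 matching PREPAREs: safe to send COMMIT.
    return pre is not None and matching(prepares, v, n, d) >= QUORUM

def committed(prepares, commits, v, n, d):
    # 2f+1 matching PREPAREs and COMMITs: safe to execute and send REPLY.
    return (matching(prepares, v, n, d) >= QUORUM and
            matching(commits, v, n, d) >= QUORUM)

# Example: replicas 0, 1, 2 (out of 3f+1 = 4) prepare request m in view 0, seq 1.
m = request("op", t=1, c="client")
pre = pre_prepare(0, 1, m)
prepares = [prepare(0, 1, "digest(m)", i) for i in (0, 1, 2)]
assert prepared(pre, prepares, 0, 1, "digest(m)")
```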
• More complexity related to view changes and garbage collection of message logs
• Public-key crypto signatures are the bottleneck: a variation of the protocol uses symmetric crypto (MACs) to provide authenticated channels. (Not easy: MACs are less powerful, since they can't prove authenticity to a third party!)
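As an illustration of the building block in that MAC-based variant (a sketch using Python's standard hmac module; the pairwise keys and message encoding here are assumptions): the sender attaches one MAC per receiver, each under a key shared only with that receiver, which is exactly why a receiver cannot later prove the message's authenticity to anyone else.

```python
import hashlib
import hmac

# Assumed pairwise secret keys: keys[(i, j)] is known only to replicas i and j.
keys = {(i, j): bytes([i, j]) * 16 for i in range(4) for j in range(4) if i != j}

def authenticator(sender, receivers, payload):
    """One MAC per receiver (an 'authenticator') in place of one signature."""
    return {j: hmac.new(keys[(sender, j)], payload, hashlib.sha256).hexdigest()
            for j in receivers}

def verify(sender, receiver, payload, auth):
    expected = hmac.new(keys[(sender, receiver)], payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, auth[receiver])

auth = authenticator(0, receivers=[1, 2, 3], payload=b"PREPARE,v,n,d,0")
assert verify(0, 1, b"PREPARE,v,n,d,0", auth)
# Replica 1 cannot use this to convince replica 2: the MAC keyed under (0, 1)
# is something replica 1 could have computed itself, unlike a signature.
```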