High Throughput Byzantine Fault Tolerance
Ramakrishna Kotla, Mike Dahlin
Laboratory for Advanced Systems Research,
The University of Texas at Austin
Summary of the talk
• High throughput is achievable along with Byzantine fault tolerance
• Contributions:
  – High Throughput BFT architecture
  – CBASE: a generic prototype
  – CBASE-FS: a high-throughput replicated NFS
Outline
• Overview
• Architecture
• Implementation
• Evaluation
• Conclusion
Motivation
• Large-scale Internet services need:
  – High availability: 24x7 service
  – High reliability: correctness
  – High security: data integrity and confidentiality
  – High throughput: handling system load
• Challenge: Byzantine failures
  – Malicious attacks (see http://www.cert.org)
  – Software and operator errors (ROC@USITS03)
  – Network and hardware failures
BFT state machine replication
• Byzantine fault tolerance protocol
  – Tolerates f Byzantine server failures using 3f+1 replicas (e.g., f = 1 requires 4 replicas)
  – Agreement stage: orders requests from clients
  – Execution stage: executes the ordered requests
• Provides high availability, reliability, and security
• Examples: PBFT, Farsite, OceanStore [OSDI99, OSDI01, SOSP01, SOSP03]
[Figure: clients send requests to four server replicas; each replica runs an agreement stage that feeds its execution stage.]
BFT: Trade off throughput for fault tolerance?
Traditional BFT: Limitations
• Fails to provide high throughput
  – Does not scale with hardware resources or application parallelism
• Reason: uses generalized state machine replication
  – Correctness conditions:
    • Agreement: every non-faulty state machine replica receives every request
    • Order: every non-faulty state machine replica processes the requests in the same relative order
  – To guarantee Order, BFT state machine replication executes requests sequentially
High Throughput BFT: Idea
• Relax the Order condition without compromising consistency/safety
  – Relaxed order: every non-faulty replica executes dependent requests in the same relative order
  – Dependent requests: two requests are dependent if the read set or write set of one intersects the write set of the other
  – Requests that are not dependent can be executed concurrently (see the sketch below)
• Exploit application parallelism to provide high throughput
  – Commercial applications such as web servers, file systems, and databases have inherent data parallelism
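
A minimal sketch of the dependence test under these definitions (the Request type and its read/write-set fields are illustrative, not CBASE's API):

    # Two requests are dependent if the read or write set of one
    # intersects the write set of the other.
    from dataclasses import dataclass

    @dataclass
    class Request:
        read_set: frozenset = frozenset()
        write_set: frozenset = frozenset()

    def dependent(r1: Request, r2: Request) -> bool:
        return bool((r1.read_set | r1.write_set) & r2.write_set
                    or (r2.read_set | r2.write_set) & r1.write_set)

    # Writes to different files are independent and may run concurrently:
    assert not dependent(Request(write_set=frozenset({"fileA"})),
                         Request(write_set=frozenset({"fileB"})))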
Outline
• Overview
• Architecture
• Implementation
• Evaluation
• Conclusion
HT BFT: Architecture
• Goals:
  – Generic: a generic interface that exposes application parallelism
  – Extensible: easily extended to support any application
  – Modular: supports different fault models easily
  – Reuse: reuses existing agreement protocols
[Figure: each of the four server replicas interposes a Parallelizer between its agreement stage and its execution stage.]
Parallelizer
• Application-independent module
  – Receives ordered requests from the agreement stage
  – Maintains and updates a dependency graph over pending requests
  – Two-level dependence analysis, driven by a concurrency matrix
• Schedules a request once it does not depend on any outstanding request (its node has no outgoing edges in the graph)
• Requests that are not dependent are executed concurrently (see the sketch below)
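
A minimal sketch of this dependency-graph scheduling, reusing the dependent() test above (method names other than insert() are illustrative, not CBASE's internal API):

    class Parallelizer:
        def __init__(self, dependent):
            self.dependent = dependent   # pairwise conflict test
            self.pending = []            # outstanding requests, in agreement order

        def insert(self, req):
            # Add an edge from req to every outstanding request it conflicts with.
            req.waits_for = {id(r) for r in self.pending if self.dependent(req, r)}
            self.pending.append(req)

        def schedulable(self):
            # Requests with no outgoing edges may execute concurrently.
            return [r for r in self.pending if not r.waits_for]

        def done(self, req):
            # On completion, remove the node and release its dependents.
            self.pending = [r for r in self.pending if r is not req]
            for r in self.pending:
                r.waits_for.discard(id(req))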
Parallelizer: Concurrency Matrix
• A square matrix whose rows and columns are the application's operation types
  – Entry 1 means the two operations are independent; 0 means dependent
• Exposes application-level parallelism
• Statically defined
• Two matrices, because dependence also depends on the objects the operations touch:
  – One for operations on related objects
  – One for operations on unrelated objects
• Checked by table lookup, so the overhead is low (see the sketch below)
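
A sketch of the two-matrix lookup (the operation set, matrix entries, and relatedness test below are illustrative, not the rules CBASE actually uses):

    OPS = {"read": 0, "write": 1, "getattr": 2}

    # 1 = independent (may run concurrently), 0 = dependent.
    RELATED = [        # both operations touch related objects
        [1, 0, 1],     # read    vs read, write, getattr
        [0, 0, 0],     # write
        [1, 0, 1],     # getattr
    ]
    UNRELATED = [[1] * 3 for _ in range(3)]   # unrelated objects never conflict

    def independent(op1, obj1, op2, obj2):
        matrix = RELATED if obj1 == obj2 else UNRELATED   # simplistic relatedness
        return matrix[OPS[op1]][OPS[op2]] == 1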
Parallelizer: Dependence Analysis
[Figure: the agreement stage feeds an input queue; the parallelizer builds a dependency graph over queued requests and dispatches independent ones to a multithreaded execution stage.]
Advantages/Limitations
• Advantages:
  – Supports high-throughput applications
  – Simple: minimal or no changes to the client, agreement protocol, or application
  – Flexible: supports different fault models easily
• Limitations:
  – The concurrency matrix requires knowledge of the application's inner workings
  – Conservative rules ensure correctness at the expense of performance
    • The rules can be incrementally refined to regain performance
Outline
• Overview
• Architecture
• Implementation
• Evaluation
• Conclusion
System Model
• Asynchronous system
  – Nodes operate at arbitrarily different speeds
  – The network may delay, drop, or deliver messages out of order
  – Assumption: bounded fair links
• Fault model: Byzantine faults
  – Faulty nodes may behave arbitrarily: crash, lose or alter data, send incorrect messages
• Adversary: a strong adversary
  – Can coordinate faulty nodes in arbitrarily bad ways
  – Assumption: computationally limited (cannot subvert the cryptographic primitives)
CBASE: Concurrent BASE
• Uses the unmodified PBFT agreement protocol [OSDI 1999]
• Built on the BASE library [SOSP 2001]
• Agreement stage: single-threaded
• Execution stage: multithreaded
• Parallelizer: a producer/consumer queue between the two stages
Parallelizer: Interface
• Parallelizer.insert(): the agreement thread hands over requests in agreement order
• Parallelizer.next_request(): an execution thread blocks until some request is free of dependences, then takes it
• Parallelizer.sync(): blocks until all previously inserted requests have executed (see the usage sketch below)
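
A sketch of how the multithreaded execution stage might drive this interface (the worker harness and the done() completion call are hypothetical, not part of CBASE):

    import threading

    def worker(parallelizer, execute):
        while True:
            req = parallelizer.next_request()   # blocks until a request is independent
            execute(req)                        # safe to run concurrently with other workers
            parallelizer.done(req)              # hypothetical: release req's dependents

    # One agreement thread calls insert(); N worker threads execute, e.g.:
    # for _ in range(N):
    #     threading.Thread(target=worker, args=(p, app_execute), daemon=True).start()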
CBASE-FS: BFT NFS
[Figure: CBASE-FS architecture]
• NFS concurrency matrix rules, in brief:
  – Related objects: operations that name the same NFS file handle
  – The rules are conservative
  – See the paper for the full rules (a small sketch of the relatedness test follows)
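
An illustrative version of that relatedness test (the request shape is assumed, not CBASE-FS's actual data structures):

    # Two NFS requests are treated as related if they share a file handle;
    # conservatively, ambiguous pairs would also be treated as related.
    def related(req1, req2):
        return bool(req1.handles & req2.handles)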
Outline
• Overview
• Architecture
• Implementation
• Evaluation
• Conclusion
Evaluation
• 4 server replicas, tolerating 1 Byzantine failure (f = 1)
• Each replica runs on a separate uniprocessor machine
  – 933 MHz Pentium III, 256 MB RAM
• 5 client machines
• Dedicated network with a 100 Mbps Ethernet hub
• OS: Red Hat Linux 7.2 with NFS 2.0
• Assumption: no correlated failures due to the OS
Microbenchmark: Overhead
• BASE versus CBASE
Microbenchmark: Scalability
• Scalability with hardware resources
• Scalability with application-level parallelism
Microbenchmark: CBASE-FS / BASE-FS / NFS
• Latency versus throughput with no sleep
• Latency versus throughput with a 20 ms sleep
• IOzone results summary
Macrobenchmarks
• PostMark
• Andrew benchmark
Conclusions
• Commercial applications have inherent parallelism
• High-throughput BFT provides a simple, flexible way to exploit that parallelism and achieve high throughput
Questions?
• Why not put the parallelizer in the agreement stage to reduce agreement cost?