High Throughput Byzantine Fault Tolerance Ramakrishna Kotla, Mike Dahlin

Laboratory for Advanced Systems Research, The University of Texas at Austin
July 12, 2016, Department of Computer Sciences, UT Austin

Summary of the talk
- High throughput is achievable along with Byzantine fault tolerance
- Contributions
  • High Throughput BFT Architecture
  • CBASE: Generic prototype
  • CBASE-FS: High throughput replicated NFS

Outline
- Overview
- Architecture
- Implementation
- Evaluation
- Conclusion

Motivation
- Large scale Internet services
  • High availability: 24x7 service
  • High reliability: correctness
  • High security: data integrity/confidentiality
  • High throughput: system load
- Challenges: Byzantine failures
  • Malicious attacks (http://www.cert.org)
  • Software and operator errors (ROC@USITS03)
  • Network and hardware failures

BFT State Machine Replication

BFT state machine replication
- Byzantine Fault Tolerance protocol
  • Tolerates f Byzantine server failures using 3f+1 replicas
  • Agreement stage: Order requests from clients
  • Execution stage: Execute requests
- Provides high availability, reliability and security
- PBFT, Farsite, Oceanstore [OSDI99, OSDI01, SOSP01, SOSP03]
- Figure: clients and four server replicas, each with an Agreement stage and an Execution stage

BFT: Trade off throughput for fault tolerance?
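The replica bound on the BFT state machine replication slide (f Byzantine failures tolerated with 3f+1 replicas) can be checked with a little arithmetic. The sketch below is only illustrative; the function names `replicas_needed` and `faults_tolerated` are hypothetical and do not come from the talk.

```python
def replicas_needed(f: int) -> int:
    """Replicas required to tolerate f Byzantine failures (3f+1 bound)."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def faults_tolerated(n: int) -> int:
    """Largest f such that 3f+1 <= n, for a group of n replicas."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // 3
```

With these definitions, a group of 4 replicas tolerates 1 Byzantine failure, which matches the configuration used in the evaluation.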

Traditional BFT: Limitations
- Fails to provide high throughput
  • Does not scale with hardware resources and application parallelism
- Reason
  • Uses generalized state machine replication
  • Correctness conditions:
    • Agreement: Every non-faulty state machine replica receives every request
    • Order: Every non-faulty state machine replica processes the requests in the same relative order
  • BFT state machine replication: requests are executed sequentially to ensure order

High Throughput BFT: Idea
- Modify Order without compromising consistency/safety
  • Relaxed order: Every non-faulty replica executes dependent requests in the same relative order
  • Dependent requests: Two requests are dependent if the read set or write set of one intersects the write set of the other
  • Requests that are not dependent can be executed concurrently
- Exploit application parallelism to provide high throughput
  • Commercial applications such as web servers, file systems and databases have inherent data parallelism

Outline
- Overview
- Architecture
- Implementation
- Evaluation
- Conclusion

HT BFT: Architecture
- Goals:
  • Generic: Generic interface that exposes application parallelism
  • Extensible: Easily extensible to support any application
  • Modular: Support different fault models easily
  • Reuse: Reuse existing agreement protocols
- Figure: server replicas, each with an Agreement stage, a Parallelizer and an Execution stage

Parallelizer
- Application independent module
- Receives ordered requests from agreement
- Maintains/updates dependency graph of requests
  • 2 level dependency analysis
  • Concurrency matrix
- Schedules a request if it is not dependent on any outstanding requests (no outgoing edges at a request node)
- Requests that are not dependent are executed concurrently

Parallelizer: Concurrency Matrix
- Definition/Figure: Square matrix whose rows/columns represent operations
  • 1 represents independent operations, 0 represents dependent operations
- Exports application level parallelism
- Statically defined
- Two matrices: Dependency also depends on objects
  • Related objects
  • Unrelated objects
- Table lookup
  • Low overhead

Parallelizer: Dependence Analysis
- Figure: agreement stage, input queue, dependency graph, multithreaded execution stage

Advantages/Limitations
- Advantages:
  • Supports high throughput applications
  • Simple: Minimal/no changes to client, agreement protocol and application
  • Flexible: Supports different fault models easily
- Limitation:
  • Defining the concurrency matrix requires knowledge of the inner workings of the application
  • Conservative rules ensure correctness at the expense of performance
  • Incrementally refine the rules to gain performance

Outline
- Overview
- Architecture
- Implementation
- Evaluation
- Conclusion

System Model
- Asynchronous system
  • Nodes operate at arbitrarily different speeds
  • Network may delay, drop or deliver messages out of order
  • Assumption: Bounded fair links
- Fault model: Byzantine faults
  • Faulty nodes may behave arbitrarily: crash, lose/alter data, send incorrect messages
- Adversary: Strong adversary
  • Can coordinate faulty nodes in arbitrarily bad ways
  • Assumption: Computationally limited

CBASE: Concurrent BASE
- Uses unmodified PBFT agreement protocol [OSDI 1999]
- Built upon BASE library [SOSP 2001]
- Agreement stage: Single thread
- Execution stage: Multithreaded
- Parallelizer: Producer/consumer queue
- Figure ??

Parallelizer: Interface
- Parallelizer.insert()
- Parallelizer.next_request()
- Parallelizer.sync()

CBASE-FS: BFT NFS
- Figure
- Brief description of NFS concurrency matrix rules
  • Related objects: Same NFS handle
  • Rules are conservative
  • Refer to the paper for more details

Outline
- Overview
- Architecture
- Implementation
- Evaluation
- Conclusion

Evaluation
- 4 server replicas that tolerate 1 Byzantine failure
- Each replica runs on a separate uniprocessor machine
  • 933 MHz Pentium III, 256 MB RAM
- 5 client machines
- Dedicated network with a 100 Mbit/s Ethernet hub
- OS: Red Hat Linux 7.2 with NFS 2.0
- Assumption: No correlated failures due to OS

Microbenchmark: Overhead
- BASE versus CBASE

Microbenchmark: Scalability
- Scalability with hardware resources
- Scalability with application level parallelism

Microbenchmark: CBASE-FS/BASE-FS/NFS
- Latency versus throughput with no sleep
- Latency versus throughput with 20 ms sleep
- Iozone results summary

Macrobenchmarks
- Postmark:
- Andrew:

Conclusions
- Commercial applications have parallelism
- High throughput BFT provides a simple/flexible solution to achieve high throughput

Questions?
- Why don't you have a parallelizer in the agreement stage to reduce agreement cost?
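
The definition of dependent requests on the "High Throughput BFT: Idea" slide (the read set or write set of one request intersects the write set of the other) translates directly into a set test. This is a minimal sketch under assumptions: the `Request` record and its field names are hypothetical, chosen only to make the definition executable, and are not taken from CBASE.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Request:
    """A client request plus the objects it touches (hypothetical shape)."""
    name: str
    read_set: frozenset = field(default_factory=frozenset)
    write_set: frozenset = field(default_factory=frozenset)


def dependent(a: Request, b: Request) -> bool:
    """True if either request writes an object the other reads or writes."""
    return bool(
        (a.read_set | a.write_set) & b.write_set
        or (b.read_set | b.write_set) & a.write_set
    )
```

Under this test two reads of the same object are independent and may run concurrently, while a read and a write of the same object must keep their agreed relative order.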
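
The Parallelizer slides describe a module that receives ordered requests, keeps a dependency graph, consults a statically defined concurrency matrix (1 = independent, 0 = dependent, with separate matrices for related and unrelated objects), and releases a request once it depends on no outstanding request. The sketch below illustrates that idea only. It is single-threaded and non-blocking for clarity, the two matrices are made-up read/write rules rather than the NFS rules from the paper, `complete()` is a hypothetical helper, and `sync()` is omitted because the slides do not describe its behaviour.

```python
from collections import OrderedDict

# Illustrative concurrency matrices keyed by (earlier op, later op).
# 1 = independent, 0 = dependent. These rules are assumptions.
RELATED = {("read", "read"): 1, ("read", "write"): 0,
           ("write", "read"): 0, ("write", "write"): 0}
UNRELATED = {("read", "read"): 1, ("read", "write"): 1,
             ("write", "read"): 1, ("write", "write"): 1}


def matrix_dependent(earlier, later):
    """Table lookup: pick the matrix by object relation, then by operations."""
    op_a, obj_a = earlier
    op_b, obj_b = later
    matrix = RELATED if obj_a == obj_b else UNRELATED
    return matrix[(op_a, op_b)] == 0


class Parallelizer:
    """Dependency graph over outstanding requests, in agreement order."""

    def __init__(self, dependent):
        self._dependent = dependent     # predicate (earlier, later) -> bool
        self._requests = {}             # id -> request, for all outstanding requests
        self._blockers = OrderedDict()  # pending id -> ids it still waits on
        self._next_id = 0

    def insert(self, request):
        """Add the next ordered request; record edges to earlier dependents."""
        rid = self._next_id
        self._next_id += 1
        waits_on = {other for other, earlier in self._requests.items()
                    if self._dependent(earlier, request)}
        self._requests[rid] = request
        self._blockers[rid] = waits_on
        return rid

    def next_request(self):
        """Return (id, request) for the oldest unblocked request, else None."""
        for rid, waits_on in self._blockers.items():
            if not waits_on:
                del self._blockers[rid]
                return rid, self._requests[rid]
        return None

    def complete(self, rid):
        """Hypothetical helper: drop a finished request and its edges."""
        del self._requests[rid]
        for waits_on in self._blockers.values():
            waits_on.discard(rid)
```

In a multithreaded execution stage, worker threads would presumably block in `next_request()` instead of receiving `None`; that producer/consumer synchronization is left out here to keep the dependency logic visible.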
