Practical accountability for distributed systems
Petr Kuznetsov, Peter Druschel, Andreas Haeberlen

Andreas Haeberlen (MPI-SWS / Rice University), Petr Kuznetsov (MPI-SWS), Peter Druschel (MPI-SWS)
SOSP 2007
© 2007 Andreas Haeberlen, MPI-SWS

Motivation
  [Diagram: an admin overseeing a distributed system]
  - Distributed state, incomplete information
  - General case: Multiple admins with different interests

General faults occur in practice
  - Many faults are not 'fail-stop'
    - Node is still running, but its behavior changes
  - Examples:
    - Hardware malfunctions
    - Misconfigurations
    - Software modifications by users
    - Hacker attacks
    - ...

Dealing with general faults is difficult
  [Diagram: a node sends an incorrect message; a responsible admin]
  - How to detect faults?
  - How to identify the faulty nodes?
  - How to convince others that a node is (not) faulty?

Learning from the 'offline' world
  - Relies on accountability
  - Example: Banks

      Requirement             Solution
      Commitment              Signed receipts
      Tamper-evident record   Double-entry bookkeeping
      Inspections             Audits

  - Can be used to detect, identify and convince
  - But: Existing fault-tolerance work mostly focused on prevention
  - Goal: A general+practical system for accountability

Outline
  - Introduction
  - What is accountability?
  - How can we implement it?
  - How well does it work?

Ideal accountability
  - Fault := Node deviates from expected behavior
  - Recall that our goal is to
    - detect faults
    - identify the faulty nodes
    - convince others that a node is (or is not) faulty
  - Can we build a system that provides the following guarantee?
      Whenever a node is faulty in any way, the system generates a proof of misbehavior against that node

Can we detect all faults?
  [Diagram: node A with internal state shown as bit strings; correct node C]
  - Problem: Faults that affect only a node's internal state
    - Requires online trusted probes at each node
  - Focus on observable faults:
    - Faults that causally affect a correct node
  - This allows us to detect faults without introducing any trusted components

Can we always get a proof?
  [Diagram: A says "I sent X!", B says "I never received X!", C is left with "?!"]
  - Problem: He-said-she-said situation
  - Three possible causes:
    - A never sent X
    - B refuses to accept X
    - X was lost by the network
  - Cannot get proof of misbehavior!
  - Generalize to verifiable evidence:
    - a proof of misbehavior, or
    - a challenge that the node cannot answer
  - What if, after a long time, no response has arrived?
    - Does not prove the fault, but we can suspect the node

Practical accountability
  - We propose the following definition of a distributed system with accountability:
      Whenever a fault is observed by a correct node, the system eventually generates verifiable evidence against a faulty node
  - This is useful
    - Any (!) fault that affects a correct node is eventually detected and linked to a faulty node
  - It can be implemented in practice

Outline
  - Introduction
  - What is accountability?
  - How can we implement it?
  - How well does it work?

Implementation: PeerReview
  - Adds accountability to a given system
    - Implemented as a library
    - Provides secure record, commitment, auditing, etc.
  - Assumptions:
    1. System can be modeled as collection of deterministic state machines
    2. Nodes have reference implementations of the state machines
    3. Correct nodes can eventually communicate
    4. Nodes can sign messages

PeerReview from 10,000 feet
  [Diagram: nodes A and B with A's log and B's log; A's witnesses C, D, E]
  - All nodes keep a log of their inputs & outputs
    - Including all messages
  - Each node has a set of witnesses, who audit its log periodically
  - If the witnesses detect misbehavior, they
    - generate evidence
    - make the evidence available to other nodes
  - Other nodes check evidence, report fault

PeerReview detects tampering
  [Diagram: A sends Message + Hash(log) to B; B replies with ACK + Hash(log). B's log holds the entries Send(X), Recv(Y), Send(Z), Recv(M), linked by the hashes H0 to H4]
  - What if a node modifies its log entries?
  - Log entries form a hash chain
    - Inspired by secure histories [Maniatis02]
  - Signed hash is included with every message
    - Node commits to its current state
    - Changes are evident

PeerReview detects inconsistencies
  [Diagram: a log that shares H0 to H2 and then forks into "View #1" (H3, H4) and "View #2" (H3', H4'); operations Create X, Read X, Read Z with results OK and Not found]
  - What if a node
    - keeps multiple logs?
    - forks its log?
  - Check whether the signed hashes form a single hash chain

PeerReview detects faults
  [Diagram: a node's state machine (Module A, Module B) connected to the network and the log; during an audit, logged inputs are fed to a second copy of the state machine and its outputs are compared (=?) with the logged outputs; a mismatch (≠) indicates a fault]
  - How to recognize faults in a log?
  - Assumption:
    - Node can be modeled as a deterministic state machine
  - To audit a node:
    - Replay inputs to a trusted copy of the state machine
    - Check outputs against the log

PeerReview offers provable guarantees
  - PeerReview guarantees that:
    1) Faults will be detected
       If node commits a fault + has a correct witness, then witness obtains
       - a proof of misbehavior (PoM), or
       - a challenge that the faulty node cannot answer
    2) Good nodes cannot be accused
       If node is correct
       - there can never be a PoM, and
       - it can answer any challenge
  - Formal definitions and proof in a TR

Outline
  - Introduction
  - What is accountability?
  - How can we implement it?
  - How well does it work?
    - Is it widely applicable?
    - How much does it cost?
    - Does it scale?

PeerReview is widely applicable
  - App #1: NFS server in the Linux kernel
    - Many small, latency-sensitive requests
    - Faults: Tampering with files, Lost updates, Metadata corruption, Incorrect access control
  - App #2: Overlay multicast
    - Transfers large volume of data
    - Faults: Freeloading, Tampering with content
  - App #3: P2P email
    - Complex, large, decentralized
    - Faults: Denial of service, Attacks on DHT routing, Censorship
  - More information in the paper

How much does PeerReview cost?
  [Figure: Avg traffic (Kbps/node), 0 to 100, against the number of witnesses (Baseline, 1 to 5); stacked components: Baseline traffic, Signatures and ACKs, Checking logs]
  - W dedicated witnesses
  - Dominant cost depends on number of witnesses W
    - O(W^2) component

Mutual auditing
  [Diagram: a node and a small random sample of peers chosen as witnesses]
  - Small probability of error is inevitable
    - Example: Replication
  - Can use this to optimize PeerReview
    - Accept that an instance of a fault is found only with high probability
    - Asymptotic complexity: O(N^2) → O(log N)

PeerReview is scalable
  [Figure: Avg traffic (Kbps/node) against System size (nodes); curves: Email system + PeerReview (P=1.0), Email system + PeerReview (P=0.999999), Email system w/o accountability; curve annotations O((log N)^2) and O(log N); reference level: DSL/cable upstream]
  - Assumption: Up to 10% of nodes can be faulty
  - Probabilistic guarantees enable scalability
    - Example: Email system scales to over 10,000 nodes with P=0.999999

Summary
  - Accountability is a new approach to handling faults in distributed systems
    - detects faults
    - identifies the faulty nodes
    - produces evidence
  - Our practical definition of accountability:
      Whenever a fault is observed by a correct node, the system eventually generates verifiable evidence against a faulty node
  - PeerReview: A system that enforces accountability
    - Offers provable guarantees and is widely applicable

Thank you!
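
The two mechanisms the slides rely on, a hash-chained log that makes tampering evident and an audit that replays logged inputs through a trusted copy of a deterministic state machine, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PeerReview's actual code: the names `Entry`, `Log`, `reference_machine`, `run_node` and `audit` are hypothetical, SHA-256 is an assumed choice of hash function, the state machine is a toy, and signatures, witnesses and the network are left out entirely.

```python
import hashlib
from dataclasses import dataclass, replace

GENESIS = b"\x00" * 32  # assumed starting value of the hash chain


@dataclass(frozen=True)
class Entry:
    kind: str        # "recv" for an input, "send" for an output
    payload: str
    top_hash: bytes  # hash over the previous top hash and this entry


def chain_hash(prev: bytes, kind: str, payload: str) -> bytes:
    """Hash that links one log entry to everything before it."""
    h = hashlib.sha256()
    h.update(prev)
    h.update(kind.encode())
    h.update(b"\x00")  # separator between kind and payload
    h.update(payload.encode())
    return h.digest()


class Log:
    """Append-only log whose entries form a hash chain."""

    def __init__(self) -> None:
        self.entries: list[Entry] = []

    def append(self, kind: str, payload: str) -> bytes:
        prev = self.entries[-1].top_hash if self.entries else GENESIS
        entry = Entry(kind, payload, chain_hash(prev, kind, payload))
        self.entries.append(entry)
        # In the scheme on the slides, this is the value a node would sign
        # and attach to each message; signing is omitted in this sketch.
        return entry.top_hash


def chain_is_intact(entries: list[Entry]) -> bool:
    """Recompute the chain; any modified entry breaks a link."""
    prev = GENESIS
    for e in entries:
        if chain_hash(prev, e.kind, e.payload) != e.top_hash:
            return False
        prev = e.top_hash
    return True


def reference_machine(state: int, msg: str) -> tuple[int, str]:
    """Toy deterministic state machine: one output per input."""
    state += len(msg)
    return state, f"ack:{state}"


def run_node(inputs: list[str], faulty: bool = False) -> Log:
    """Simulate a node that logs every input and output."""
    log, state = Log(), 0
    for msg in inputs:
        log.append("recv", msg)
        state, out = reference_machine(state, msg)
        if faulty:
            out = "ack:0"  # deviates from the expected behavior
        log.append("send", out)
    return log


def audit(entries: list[Entry]) -> bool:
    """Check the chain, then replay inputs and compare outputs."""
    if not chain_is_intact(entries):
        return False
    state, pending = 0, None
    for e in entries:
        if e.kind == "recv":
            if pending is not None:
                return False  # an expected output was never sent
            state, pending = reference_machine(state, e.payload)
        elif e.kind == "send":
            if e.payload != pending:
                return False  # logged output differs from the replay
            pending = None
        else:
            return False      # unknown entry type
    return True               # a trailing unanswered input is tolerated here


if __name__ == "__main__":
    honest = run_node(["hello", "world"])
    print(audit(honest.entries))                                     # True
    print(audit(run_node(["hello", "world"], faulty=True).entries))  # False
    tampered = list(honest.entries)
    tampered[0] = replace(tampered[0], payload="HELLO")
    print(audit(tampered))                                           # False
```

The audit rejects a log for two distinct reasons: a broken hash link (the log was modified after the fact) or an output that the reference state machine would not have produced (the node misbehaved). Detecting a forked log would additionally require comparing the signed hashes that different nodes received, which this sketch does not attempt.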
