Practical accountability for distributed systems
Andreas Haeberlen (MPI-SWS / Rice University), Petr Kuznetsov (MPI-SWS), Peter Druschel (MPI-SWS)
SOSP 2007
© 2007 Andreas Haeberlen, MPI-SWS

Motivation
- Distributed state, incomplete information
- General case: Multiple admins with different interests
[Figure: Admin]

General faults occur in practice
- Many faults are not 'fail-stop'
  - Node is still running, but its behavior changes
- Examples:
  - Hardware malfunctions
  - Misconfigurations
  - Software modifications by users
  - Hacker attacks
  - ...

Dealing with general faults is difficult
- How to detect faults?
- How to identify the faulty nodes?
- How to convince others that a node is (not) faulty?
[Figure labels: Responsible admin, Incorrect message]

Learning from the 'offline' world
- Relies on accountability
- Example: Banks

  Requirement             Solution
  Commitment              Signed receipts
  Tamper-evident record   Double-entry bookkeeping
  Inspections             Audits

- Can be used to detect, identify and convince
- But: Existing fault-tolerance work mostly focused on prevention
- Goal: A general + practical system for accountability

Outline
- Introduction
- What is accountability?
- How can we implement it?
- How well does it work?

Ideal accountability
- Fault := Node deviates from expected behavior
- Recall that our goal is to
  - detect faults
  - identify the faulty nodes
  - convince others that a node is (or is not) faulty
- Can we build a system that provides the following guarantee?
  Whenever a node is faulty in any way, the system generates a proof of misbehavior against that node

Can we detect all faults?
- Problem: Faults that affect only a node's internal state
  - Detecting them requires online trusted probes at each node
- Focus on observable faults:
  - Faults that causally affect a correct node
- This allows us to detect faults without introducing any trusted components
[Figure: nodes A and C, with bit strings]

Can we always get a proof?
- Problem: He-said-she-said situation
- Three possible causes:
  - A never sent X
  - B refuses to accept X
  - X was lost by the network
- Cannot get proof of misbehavior!
- Generalize to verifiable evidence:
  - a proof of misbehavior, or
  - a challenge that the node cannot answer
- What if, after a long time, no response has arrived?
  - Does not prove the fault, but we can suspect the node
[Figure: A says "I sent X!", B says "I never received X!", C is left wondering ("?!")]

Practical accountability
- We propose the following definition of a distributed system with accountability:
  Whenever a fault is observed by a correct node, the system eventually generates verifiable evidence against a faulty node
- This is useful
  - Any (!) fault that affects a correct node is eventually detected and linked to a faulty node
- It can be implemented in practice

Outline
- Introduction
- What is accountability?
- How can we implement it?
- How well does it work?

Implementation: PeerReview
- Adds accountability to a given system
  - Implemented as a library
  - Provides secure record, commitment, auditing, etc.
- Assumptions:
  1. System can be modeled as a collection of deterministic state machines
  2. Nodes have reference implementations of the state machines
  3. Correct nodes can eventually communicate
  4. Nodes can sign messages

PeerReview from 10,000 feet
- All nodes keep a log of their inputs & outputs
  - Including all messages
- Each node has a set of witnesses, who audit its log periodically
- If the witnesses detect misbehavior, they
  - generate evidence
  - make the evidence available to other nodes
- Other nodes check evidence, report fault
[Figure: nodes A and B with A's log and B's log; C, D, and E are A's witnesses]

PeerReview detects tampering
- What if a node modifies its log entries?
- Log entries form a hash chain
  - Inspired by secure histories [Maniatis02]
- Signed hash is included with every message
  - Node commits to its current state
  - Changes are evident
[Figure: A and B exchange a message and an ACK, each carrying Hash(log); B's log is shown as a hash chain H0 to H4 over entries such as Send(X), Recv(Y), Recv(M), Send(Z)]

PeerReview detects inconsistencies
- What if a node
  - keeps multiple logs?
  - forks its log?
- Check whether the signed hashes form a single hash chain
[Figure: one log presented as "View #1" (H3, H4) and "View #2" (H3', H4') on top of H0 to H2; entries Create X, Read X, Read Z with results OK, OK, Not found]

PeerReview detects faults
- How to recognize faults in a log?
[Figure: a node's state machine (Module A, Module B) with its log and the network; during an audit, logged inputs are fed to a second copy of Module A and Module B and the resulting outputs are compared (=?) with the log]
- Assumption: Node can be modeled as a deterministic state machine
- To audit a node:
  - Replay inputs to a trusted copy of the state machine
  - Check outputs against the log; a mismatch (≠) exposes the fault

PeerReview offers provable guarantees
PeerReview guarantees that:
1) Faults will be detected
   - If a node commits a fault and has a correct witness, then the witness obtains
     - a proof of misbehavior (PoM), or
     - a challenge that the faulty node cannot answer
2) Good nodes cannot be accused
   - If a node is correct,
     - there can never be a PoM, and
     - it can answer any challenge
- Formal definitions and proof in a TR

Outline
- Introduction
- What is accountability?
- How can we implement it?
- How well does it work?
  - Is it widely applicable?
  - How much does it cost?
  - Does it scale?

PeerReview is widely applicable
- App #1: NFS server in the Linux kernel
  - Many small, latency-sensitive requests
  - Faults: Tampering with files, Lost updates, Metadata corruption, Incorrect access control
- App #2: Overlay multicast
  - Transfers large volume of data
  - Faults: Freeloading, Tampering with content
- App #3: P2P email
  - Complex, large, decentralized
  - Faults: Denial of service, Attacks on DHT routing, Censorship
- More information in the paper

How much does PeerReview cost?
[Figure: bar chart of average traffic (Kbps/node, 0 to 100) versus number of witnesses (Baseline, 1 to 5), broken down into baseline traffic, signatures and ACKs, and checking logs; setup: W dedicated witnesses]
- Dominant cost depends on number of witnesses W
  - O(W^2) component

Mutual auditing
- Small probability of error is inevitable
  - Example: Replication
- Can use this to optimize PeerReview
  - Accept that an instance of a fault is found only with high probability
  - Asymptotic complexity: O(N^2) → O(log N)
[Figure: a node with a small random sample of peers chosen as witnesses]

PeerReview is scalable
[Figure: average traffic (Kbps/node) versus system size (nodes) for the email system w/o accountability, email system + PeerReview (P=1.0), and email system + PeerReview (P=0.999999), against the DSL/cable upstream level; curve annotations: O((log N)^2), O(log N)]
- Assumption: Up to 10% of nodes can be faulty
- Probabilistic guarantees enable scalability
  - Example: Email system scales to over 10,000 nodes with P=0.999999

Summary
- Accountability is a new approach to handling faults in distributed systems
  - detects faults
  - identifies the faulty nodes
  - produces evidence
- Our practical definition of accountability:
  Whenever a fault is observed by a correct node, the system eventually generates verifiable evidence against a faulty node
- PeerReview: A system that enforces accountability
  - Offers provable guarantees and is widely applicable

Thank you!
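
Backup sketch: tamper-evident log
The "PeerReview detects tampering" and "PeerReview detects inconsistencies" slides describe a hash-chained log whose current hash is committed with every message. The following is a minimal sketch of that idea, not PeerReview's actual code. The names `Log`, `chain_hash`, `consistent`, and `GENESIS`, the entry layout, and the use of SHA-256 are all assumptions made for illustration, and the signature on each commitment is omitted.

```python
import hashlib
from dataclasses import dataclass, field

GENESIS = b"\x00" * 32  # assumed fixed starting value of the chain (H0)


def chain_hash(prev: bytes, seq: int, kind: str, content: bytes) -> bytes:
    """Hash of entry `seq`, bound to the hash of the entry before it."""
    h = hashlib.sha256()
    h.update(prev)
    h.update(seq.to_bytes(8, "big"))
    h.update(kind.encode())
    h.update(hashlib.sha256(content).digest())
    return h.digest()


@dataclass
class Log:
    entries: list = field(default_factory=list)              # (kind, content) pairs
    hashes: list = field(default_factory=lambda: [GENESIS])  # hashes[i] covers entries[:i]

    def append(self, kind: str, content: bytes) -> tuple:
        """Append an entry and return the commitment (seq, hash) for it.
        A real node would sign this pair and attach it to the message."""
        seq = len(self.entries) + 1
        self.entries.append((kind, content))
        self.hashes.append(chain_hash(self.hashes[-1], seq, kind, content))
        return seq, self.hashes[-1]


def consistent(entries: list, commitments: list) -> bool:
    """Recompute the chain from `entries` and check that every commitment
    the node has handed out lies on that single chain."""
    hashes = [GENESIS]
    for seq, (kind, content) in enumerate(entries, start=1):
        hashes.append(chain_hash(hashes[-1], seq, kind, content))
    return all(0 < seq < len(hashes) and hashes[seq] == digest
               for seq, digest in commitments)


# An honest log: every commitment matches the recomputed chain.
log = Log()
a1 = log.append("SEND", b"X")
a2 = log.append("RECV", b"Y")
a3 = log.append("SEND", b"Z")
print(consistent(log.entries, [a1, a2, a3]))  # True

# Tampering: rewriting entry 2 changes every later hash, so a3 no longer fits.
tampered = list(log.entries)
tampered[1] = ("RECV", b"Y'")
print(consistent(tampered, [a3]))             # False

# Forking: a second view with the same first entry but a different second one.
fork = Log()
fork.append("SEND", b"X")
b2 = fork.append("RECV", b"W")
print(consistent(log.entries, [a2, b2]))      # False: a2 and b2 cannot share a chain
```

Because each hash covers its predecessor, a single commitment pins down the entire prefix of the log, which is why one signed hash per message is enough to make later changes evident.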
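
Backup sketch: auditing by replay
The "PeerReview detects faults" slide audits a node by replaying its logged inputs through a trusted copy of the state machine and comparing the outputs with the log. A toy version of that check might look like the following. The function `audit`, the `("IN", msg)` / `("OUT", msg)` log format, and the `counter` state machine are hypothetical, and the sketch assumes that a node logs all outputs caused by an input before it logs the next input.

```python
from collections import deque


def audit(step, initial_state, log):
    """Replay logged inputs through the reference implementation `step`
    and compare its outputs with the logged ones.

    `step(state, msg)` must be deterministic and return (new_state, outputs).
    Returns None if the log is consistent with `step`, otherwise the index
    at which the log first diverges.
    """
    state = initial_state
    pending = deque()  # outputs the reference produced that the log must show next
    for i, (kind, msg) in enumerate(log):
        if kind == "IN":
            if pending:                # an expected output is missing from the log
                return i
            state, outputs = step(state, msg)
            pending.extend(outputs)
        else:                          # "OUT": must equal the next expected output
            if not pending or pending.popleft() != msg:
                return i
    return len(log) if pending else None


def counter(state, msg):
    """Toy reference state machine: replies with the running total."""
    total = state + msg
    return total, [total]


good = [("IN", 2), ("OUT", 2), ("IN", 3), ("OUT", 5)]
bad = [("IN", 2), ("OUT", 2), ("IN", 3), ("OUT", 6)]  # node reported a wrong total

print(audit(counter, 0, good))  # None
print(audit(counter, 0, bad))   # 3
```

The check relies on determinism: if the same inputs could legitimately yield different outputs, a mismatch would not show that the node is faulty.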