UNIVERSITY of WISCONSIN-MADISON
Computer Sciences Department
CS 739 Distributed Systems
Andrea C. Arpaci-Dusseau

Fail-Stop Processors
Byzantine Generals in Action: Implementing Fail-Stop Processors, Fred Schneider, TOCS, May 1984
• Example usage of byzantine agreement
• Why fail-stop processors can simplify replicated services
• Why fail-stop processors are expensive (impractical?) to build
Remaining time: Byzantine Werewolves (improved?)

Motivation
Goal: Build systems that continue to work in the presence of component failure
Difficulty/cost of building those systems depends upon how components can fail
Fail-stop components make building reliable systems easier than components with byzantine failures

Fail-Stop Processors
What is a failure?
• Output (or behavior) that is inconsistent with the specification
What is a byzantine failure?
• Arbitrary, even malicious, behavior
• Components may collude with each other
• Cannot necessarily detect that output is faulty
What is a fail-stop processor?
• Halts instead of performing erroneous transformations
• Others can detect the halted state
• Others can access uncorrupted stable storage even after failure

Questions to Answer
1) What are the advantages of fail-stop processors?
2) Real processors are not fail-stop
• Can we build one?
• How can we build an approximation of one?
3) Approximations of fail-stop processors are expensive to build
• Under what circumstances is a replicated service with fail-stop processors "better"?

1) Distributed State Machine
Common approach for building a reliable system
Idea: Replicate faulty servers, coordinate client interactions with replicas
[Figure: a client's input sequence is delivered via byzantine agreement to R state-machine replicas; the replica outputs are combined into a single output]
t-fault tolerant: satisfies its specification as long as no more than t components fail
Failure model of the components determines how many replicas, R, are needed and how they interact

How to build a t-fault tolerant state machine?
Inputs
• Key: All replicas receive and process the same sequence of inputs
• 1) Agreement: Every nonfaulty replica receives the same request (interactive consistency or byzantine agreement)
• 2) Ordering: Every nonfaulty replica processes requests in the same order (logical clocks)
Outputs
                        Byzantine    Fail-Stop
    Combine output?     majority     any
    Number of replicas? 2t+1         t+1

2) Building a Fail-Stop Processor
Must provide stable storage
• Volatile: lost on failure
• Stable
  – Not affected (lost or corrupted) by failure
  – Can be read by any processor
  – Benefit: recover the work of a failed process
  – Drawback: minimize interactions, since it is slow
Can only build an approximation of a fail-stop processor
• Finite hardware -> finitely many failures could disable all error-detection hardware
k-fail-stop processor: behaves fail-stop unless k+1 or more failures occur

Implementation of k-FSP: Overview
Two components
• k+1 p-processes (program)
• 2k+1 s-processes (storage)
• Each process runs on its own processor, all connected by a network
P-Processes (k+1)
• Each runs the program for the state machine
• Interacts with s-processes to read and write data
• If any fail (if there is any disagreement), then all STOP
• Cannot necessarily detect k+1 failures
S-Processes (2k+1)
• Each replicates the contents of stable storage for this FSP
• Provides reliable data with up to k failures (cannot just stop)
• Detects disagreements/failures across p-processes
  – How???

Interactive Consistency Requirements
IC1. If a p-process is nonfaulty, then every nonfaulty s-process receives its request within δ seconds (as measured on the s-process clock)
IC2. Non-faulty s-processes in the same k-FSP agree on every request from p-process j
• S-processes must agree even when p-process j is faulty
To provide IC1 and IC2:
• Assuming the sender of a message can be authenticated, use the signed message (SM) protocol for byzantine agreement
  – Need just k+1 processes for agreement
IC3. For each k-FSP, the clocks of all p-processes are synchronized
• All non-faulty p-processes must send their requests at the same time to the s-processes

FSP Algorithm Details: Writes
Each p-process, on a write:
• Broadcast the write to all s-processes
• Byzantine agreement across all s-processes (all s-processes must agree on the same input value from a particular p-process)
Each s-process, on a write (Fig 1):
• Ensure each p-process writes the same value and is received within the time bound
  – Initial code: handle messages only after at least δ time has transpired since receipt (every s-process should have received by then)
  – If a write request is received from all k+1 p-processes (|M| = k+1), then update the value in stable storage
  – If not, then halt all p-processes
    • Set the failed variable to true
    • Do not allow future writes

FSP Algorithm Details: Reads
Each p-process, on a read:
• Broadcast the request to all s-processes
• Use the result returned by a majority (k+1 out of 2k+1)
• Can read from other FSPs as well
  – Useful if an FSP has failed and its work is being re-balanced
Each p-process, to determine if halted/failed:
• Read the failed variable from the s-processes (use majority)
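A minimal Python sketch (not from the paper) of the write and read rules just described, for a single k-FSP. It assumes the signed-message agreement has already run, so every non-faulty s-process is handed the same set M of write requests; the names SProcess and majority_read are invented for illustration.

```python
from collections import Counter

K = 2                      # tolerate up to k faulty processes per FSP
NUM_P = K + 1              # p-processes running the state-machine program
NUM_S = 2 * K + 1          # s-processes replicating stable storage

class SProcess:
    """One replica of the FSP's stable storage (illustrative sketch)."""
    def __init__(self):
        self.store = {"failed": False}

    def on_write(self, requests):
        """Handle a write after delta time has elapsed.

        `requests` is the set M of (p_id, var, value) write requests that
        SM agreement delivered for this write; every non-faulty s-process
        sees the same M.  Apply the write only if all k+1 p-processes sent
        the identical write; otherwise mark the whole FSP as failed.
        """
        if self.store["failed"]:
            return                               # a halted FSP accepts no writes
        values = {(var, value) for (_pid, var, value) in requests}
        if len(requests) == NUM_P and len(values) == 1:   # |M| = k+1, all agree
            var, value = values.pop()
            self.store[var] = value
        else:
            self.store["failed"] = True          # halt: some p-process misbehaved

    def on_read(self, var):
        return self.store.get(var)

def majority_read(s_processes, var):
    """A p-process reads by taking the value returned by a majority
    (k+1 of the 2k+1 s-processes), correct despite up to k faulty replies."""
    replies = [s.on_read(var) for s in s_processes]
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= K + 1 else None

# Example: all k+1 p-processes write a=6, then any p-process reads it back.
storage = [SProcess() for _ in range(NUM_S)]
write_set = [(pid, "a", 6) for pid in range(NUM_P)]
for s in storage:
    s.on_write(write_set)
print(majority_read(storage, "a"))        # -> 6
print(majority_read(storage, "failed"))   # -> False
```

The |M| = k+1 check is what converts arbitrary p-process misbehavior into a clean halt: any missing, late, or disagreeing write sets failed, and every p-process can later detect the halt by majority-reading that variable.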
FSP Example
k=2, state machine code: "b = a+1"
How many p- and s-processes?  p: 3 (k+1), s: 5 (2k+1)
[Figure: 3 p-processes (0-2) and 5 s-processes (0-4); stable storage holds a=6, b, failed=0]
How do p-processes read a?
1) Broadcast the request to each s-process
2) Each s-process responds to the read request
3) Each p-process uses the majority of the responses from the s-processes

FSP Example
k=2, state machine code: "b = a+1"
How do p-processes read a?
• What if 2 s-processes fail?
  – E.g., they think a=5?
• What if 3 s-processes fail?

FSP Example
k=2, state machine code: "b = a+1"
How do p-processes write b?
• Each p-process j performs byzantine agreement using the signed message protocol SM(2) across the s-processes
  – Each s-process must agree on what p-process j is doing, even if j is faulty
• Each s-process looks at the requests after time δ has elapsed
• If it sees the same write from all k+1 p-processes, it performs the write
  – Otherwise, halt all p-processes; forbid future writes

FSP Example
k=2, state machine code: "b = a+1"
How do p-processes write b?
• What if 1 p-process (or the network) is very slow?
• What if 1 p-process gives an incorrect request to all s-processes?
• What if 1 p-process gives an incorrect request to only some s-processes?
  – Byzantine agreement catches this: all s-processes agree that the p-process is faulty (it is giving different requests) and agree to treat it the same way
  – When they see it does not agree with the other p-processes, they will halt
• What if 3 p-processes give a bad result?

3) Higher-Level Example
Goal: Service handling k faults; N nodes for performance
Solution: Use N+k k-fail-stop processors
Example: N=2, k=3
[Figure: five k-fail-stop processors FSP0-FSP4, each with its own stable storage SS0-SS4]
What happens if:
• 3 p-processes in FSP0 fail?
• 4 p-processes in FSP0 fail?
• 1 p-process in FSP0, FSP1, and FSP2 fail? also in FSP3?
• 2 p-processes in FSP0, FSP1, and FSP2 fail?
• 1 s-process in SS0 fails? also in SS1, SS2, and SS3?
• 4 s-processes in SS0 fail?

Should we use Fail-Stop Processors?
Metric: Hardware cost for state machines
• Fail-stop components:
  – Worst case (assuming 1 process per processor):
    • (N+k) * [(2k+1) + (k+1)] = (N+k) * (3k+2) processors
  – Best case (assuming s-processes from different FSPs share the same processors):
    • (N+k)(k+1) + (2k+1) processors
• Byzantine components:
  – N * (2k+1) processors
• Fail-stop can be better if s-processes share and N > k…
Metric: Frequency of the byzantine agreement protocol
• Fail-stop: on every access to stable storage
• Byzantine: on every input read
• There are probably fewer input reads than stable-storage accesses
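To make the comparison concrete, the short script below evaluates the cost formulas from the hardware-cost slide for the running example (N=2, k=3); the formulas are the slide's, the script is only illustrative arithmetic.

```python
# Processor counts for a k-fault-tolerant service, per the cost slide above.
def fsp_worst(N, k):
    # one processor per process: each of the N+k FSPs needs (k+1) p + (2k+1) s
    return (N + k) * (3 * k + 2)

def fsp_best(N, k):
    # s-processes from different FSPs share processors, so one pool of 2k+1
    # storage processors serves all N+k FSPs
    return (N + k) * (k + 1) + (2 * k + 1)

def byzantine(N, k):
    # no FSPs: each of the N nodes is itself 2k+1 byzantine replicas
    return N * (2 * k + 1)

N, k = 2, 3
print(fsp_worst(N, k))   # (2+3)*(3*3+2)        = 55 processors
print(fsp_best(N, k))    # (2+3)*(3+1) + (2*3+1) = 27 processors
print(byzantine(N, k))   # 2*(2*3+1)            = 14 processors
```

Even in the best case the fail-stop construction costs more here because N < k; as the slide notes, it only wins when s-processes are shared and N > k, since for large N the byzantine term N(2k+1) grows faster than (N+k)(k+1) + (2k+1).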
Summary
Why build fail-stop components?
• Easier for higher layers to model and deal with
• Matches the assumptions of many distributed protocols
Why not?
• Usually more hardware
• Usually more agreements needed
• Higher levels may be able to cope with "slightly faulty" components
  – Violates the end-to-end argument
Conclusion: Probably shouldn't assume fail-stop components

Byzantine Werewolves
Previous version: Too easy for the villagers to identify werewolves
• Villager A had reliable information that Z was a werewolf
• Villager B could validate that A was a villager
• Hard for Z to lie that C was a werewolf, because D could have checked C too
Signed protocol: Many could hear what one said
• Difficult for werewolves to tell different lies to different players
• Have to tell everyone the same thing
New changes to give more advantage to the werewolves:
• Unknown number of werewolves (1 <= w < N/2)
• Night: Werewolves convert multiple villagers to wolves (1 <= v <= w)
  – Key: Info told by the moderator will then be stale and wrong!
• Day: Villagers can vote to lynch multiple victims

Byzantine-Werewolf Game Rules
Everyone is secretly assigned as a werewolf or a villager
• W werewolves, rest are "seeing" villagers
• I am the moderator
Night round (changed order):
• "Close your eyes"; make noises with one hand to hide activity
• For all: "NAME, open your eyes"  "Pick someone to ask about"
  – Useless for werewolves, but hides their identity…
  – Point to another player
  – Moderator signals thumbs up for werewolf, down for villager
  – "NAME, close your eyes"
• "Werewolves, open your eyes": werewolves can see who is who
  – "Werewolves, pick villagers to convert"
  – Moderator picks a secret number between 1 and W
  – Silently agree on villagers by pointing
  – Moderator taps the converts on the shoulder; they should open their eyes to see the other werewolves
  – "Werewolves, close your eyes"

Rules: Day Time
Day time: "Everyone open your eyes; it's daytime"
• Agreement time: Everyone talks and votes on who should be "decommissioned"
  – Villagers try to decommission werewolves
  – Werewolves try to trick villagers with bad info
  – Someone must propose who should be killed
  – Vote until a villager is killed, there are no more proposals, or there is no majority
• Werewolves really spread at night, so there is a large incentive to kill as many as possible now
  – Moderator: uses majority voting to determine who is decommissioned: "Okay, NAME is dead"
  – That person is out of the game (can't talk anymore) and shows their card
Repeat the cycle until all werewolves are dead OR werewolves >= villagers