Paxos Student Presentation by Jeremy Trimble CS 5204 – Operating Systems 1 Paxos Overview Distributed State Machine Distributed Consensus Fault-tolerance Requirements/Assumptions Paxos Algorithm Concepts The Algorithm Itself Examples Multi-Paxos Variations of Paxos CS 5204 – Operating Systems 2 Paxos Distributed State Machine Fault-tolerance through replication. Need to ensure that replicas remain consistent. Replicas must process requests in the same order. CS 5204 – Operating Systems 3 Paxos The Distributed Consensus Problem In a distributed system, how can we: Select a single action among many options? Simple solution: A single node acts as the “decider.” How can this be done in a fault-tolerant way? But this is not fault tolerant. (What if the decider fails?) A better solution: Paxos CS 5204 – Operating Systems 4 Paxos Fault-Tolerant Consensus Requirements Safety Only a value that has been proposed may be chosen. Only a single value is chosen. A process never learns that a value has been chosen until it actually has been. Goals Liveness FLP Impossibility Proof (1985) CS 5204 – Operating Systems 5 Paxos Assumptions Failures “Fail Stop” assumption Messages When a node fails, it ceases to function entirely. May resume normal operation when restarted. May be lost. May be duplicated. May be delayed (and thus reordered). May not be corrupt. Stable Storage CS 5204 – Operating Systems 6 Paxos Paxos Terms Proposer Proposal Advocates for a client. A1 Renders an accept/reject decision. An alternative proposed by a proposer. Consists of a unique number and a proposed value. ( 42, B ) Considers the values proposed by proposers. Learner Suggests values for consideration by Acceptors. Acceptor P1 We say a value is chosen when consensus is reached on that value. Learns the chosen value. In practice, each node will usually play all three roles. CS 5204 – Operating Systems 7 Paxos Strong Majority “Strong Majority” / “Quorum” Any two quorums have a nonempty intersection. A1 Acceptors decisions are not in agreement. Common node acts as “tie-breaker.” In a system with 2F+1 acceptors, F acceptors can fail and we'll be OK. CS 5204 – Operating Systems A5 A6 Helps avoid “split-brain” problem. A2 A set of acceptors consisting of more than half of all acceptors. A7 A4 A3 Quorums in a system with seven acceptors. 8 Paxos Consensus (N1, V1) A1 (N5, V3) (N2, V2) A2 time (N7, V3) (N4, V1) A3 (N6, V3) (N3, V3) A4 (N2, V2) A5 (N7, V3) consensus reached, V3 chosen Values proposed by proposers are constrained so that once consensus has been reached, all future proposals will carry the chosen value. P2c . For any v and n, if a proposal with value v and number n is issued, then there is a set S consisting of a majority of acceptors such that either: (a) no acceptor in S has accepted any proposal numbered less than n, or (b) v is the value of the highest-numbered proposal among all proposals numbered less than n accepted by the acceptors in S. CS 5204 – Operating Systems 9 Paxos Basic Paxos Algorithm Phase 1a: “Prepare” Select proposal number* N and send a prepare(N) request to a quorum of acceptors. Proposer Phase 1b: “Promise” If N > number of any previous promises or acceptances, * promise to never accept any future proposal less than N, - send a promise(N, U) response (where U is the highest-numbered proposal accepted so far (if any)) Phase 2a: “Accept!” If proposer received promise responses from a quorum, - send an accept(N, W) request to those acceptors Acceptor (where W is the value of the highest-numbered proposal among the promise responses, or any value if no promise contained a proposal) Phase 2b: “Accepted” If N >= number of any previous promise, * accept the proposal - send an accepted notification to the learner CS 5204 – Operating Systems * = record to stable storage 10 Paxos P1 A1 A2 A3 start prepare(1) prepare(1) promise(1, -) promise(1) accept(1, A) prepare(1) accepted(1, A) prepare(1) time CS 5204 – Operating Systems 11 Paxos P1 P2 A1 A2 A3 accepted(1, A) prepare(1) continued... prepare(2) prepare(1) promise(2, -) promise(2, (1,A)) accept(2, A) prepare(1) accepted(2, A) accepted(2, I) time CS 5204 – Operating Systems 12 Paxos P1 P2 A1 A2 A3 start prepare(1) prepare(1) promise(1) promise(1) prepare(2) prepare(1) promise(2, -) prepare(1) accept(1, C) prepare(1) accepted(1, C) accept(2, D) prepare(1) accepted(2, D) accepted(1, I) CS 5204 – Operating Systems time 13 Paxos P1 P2 A1 A2 A3 start prepare(1) prepare(1) promise(1) promise(1) prepare(2) prepare(1) promise(2, -) prepare(1) accept(1, F) prepare(1) accepted(1, F) prepare(3) prepare(1) ... prepare(4) prepare(1) time ... CS 5204 – Operating Systems 14 Paxos Other Considerations Liveness Can't be guaranteed in general. Learning the Chosen Value Distinguished Proposer All proposals are funneled through one node. Can re-elect on failure. Acceptors notify some set of learners upon acceptance. Distinguished Learner A node may play the role of both distinguished proposer and distinguished learner – we call such a node the master. CS 5204 – Operating Systems 15 Paxos Multi-Paxos chosen value S P ??? O instance 39 40 41 42 A single instance of Paxos yields a single chosen value. To form a sequence of chosen values, simply apply Paxos iteratively. “time” To distinguish, include an instance number in messages. Facilitates replication of a state machine. CS 5204 – Operating Systems 16 Paxos Paxos Variations Cheap Paxos Reconfiguration Fast Paxos Eject failed acceptors. Fault-tolerant with only F+1 nodes (vs 2F+1). Failures must not happen too quickly. Clients send accept messages to acceptors. Master is responsible for breaking ties. Reduces message traffic. Byzantine Paxos Arbitrary failures – lying, collusion, fabricated messages, selective non-participation. Adds an extra “verify” phase to the algorithm. CS 5204 – Operating Systems 17 Paxos Conclusion State-Machine Replication Distributed Consensus Basic Paxos Examples Optimizations Variations CS 5204 – Operating Systems 18 Paxos Questions? CS 5204 – Operating Systems 19 Paxos References Paxos Made Simple Paxos Made Live http://en.wikipedia.org/wiki/Paxos_algorithm The Byzantine Generals Problem http://courses.cs.vt.edu/cs5204/fall10-kafura-NVC/Papers/FaultTolerance/Paxos-Chubby.pdf Wikipedia – Paxos Algorithm http://courses.cs.vt.edu/cs5204/fall10-kafura-NVC/Papers/FaultTolerance/Paxos-Simple-Lamport.pdf http://research.microsoft.com/en-us/um/people/lamport/pubs/byz.pdf Impossibility of distributed consensus with one faulty process http://portal.acm.org/citation.cfm?doid=3149.214121 CS 5204 – Operating Systems 20