Logic and Lattices for Distributed Programming Neil Conway UC Berkeley Joint work with: Peter Alvaro, Peter Bailis, David Maier, Bill Marczak, Joe Hellerstein, Sriram Srinivasan Basho Chats #004 June 27, 2012 Programming Input x Output f(x) Distributed Programming Output f(x) Input x Network Behavior Output f(x) Output f(x) Dealing with Disorder Introduce order – Paxos, Zookeeper, Two-Phase Commit, … – “Strong Consistency” Tolerate disorder – Correct behavior in the face of many possible network orders – Typical goal: replicas converge to same final state • “Eventual Consistency” Eventual Consistency Popular Hard to program Help developers build reliable programs on top of eventual consistency This Talk 1. Theory – CRDTs, Lattices, and CALM 2. Practice – Programming with Lattices – Case Study: KVS Write: Read: {Alice, {Alice, Bob,Bob} Carol} Client0 Students {Alice, Bob, Carol} {Alice, Bob} How to resolve? Write: Read: {Alice, {Alice, Bob,Bob} Dave} Client1 Students {Alice, {Alice, Bob, Bob} Dave} Proble m Replicas perceive different event orders Goal Same final state at all replicas Solutio Commutative operations n (“merge functions”) Client0 Students {Alice, Bob, Carol, Dave} Merge = Set Union Client1 Students {Alice, Bob, Carol, Dave} Commutative Operations • Used by Dynamo, Riak, Bayou, etc. • Formalized as CRDTs: Convergent and Commutative Replicated Data Types – Shapiro et al., INRIA (2009-2012) – Based on join semilattices – Commutative, associative, idempotent • Practical libraries: Statebox, Knockbox Time “Growth”: Larger Sets Set (Union) “Growth”: Larger Numbers Integer (Max) “Growth”: false true Boolean (Or) Read: {Alice, Bob, Carol, Dave} Students {Alice, {Alice, Bob, Bob, Carol, Carol} Dave} Client0 Read: {<Alice,Bob>} Write: {<Alice,Bob>, <Carol,Dave>} Teams Teams {<Alice, Bob>, {<Alice, Bob>} <Carol, Dave>} Replica Synchronization Remove: {Dave} Client1 Students {Alice, Bob, Carol} {Alice, Bob, Carol, Dave} Teams Teams {<Alice, Bob>, {<Alice, Bob>} <Carol, Dave>} Read: {Alice, Bob, Carol} Students {Alice, Bob, Carol, Dave} {Alice, Bob, Carol} Client0 Read: {<Alice,Bob>} Teams {<Alice, Bob>} {<Alice, Bob>} Replica Nondeterministic Synchronization Outcome! Remove: {Dave} Students {Alice, {Alice, Bob, Bob, Carol, Carol} Dave} Client1 Teams {<Alice, Bob>} {<Alice, Bob>} Possible Solution: Wrap both replicated values in a single complex CRDT Goal: Compose larger application using “safe” mappings between simple lattices Time Monotone function from set max size() Set (merge = Union) Monotone function from max boolean >= 5 Integer (merge = Max) Boolean (merge = Or) Monotonicity in Practice “The more you know, the more you know” Never retract previous outputs (“mistake-free”) Typical patterns: • immutable data • accumulate knowledge over time • threshold tests (“if” w/o “else”) Monotonicity and Determinism Agents strictly learn more knowledge over time Monotone: different learning order, same final outcome Result: Program is deterministic! A program is confluent if it produces the same results regardless of network nondeterminism Output f(x) Input x Network Behavior Output f(x) Output f(x) 20 A program is confluent if it produces the same results regardless of network nondeterminism Input x Network Behavior Output f(x) 21 Consistency As Logical Monotonicity CALM Analysis 1. All monotone programs are confluent 2. Simple syntactic test for monotonicity Result: Simple static analysis for eventual consistency Handling Non-Monotonicity … is not the focus of this talk Basic choices: 1. Nodes agree on an event order using a coordination protocol (e.g., Paxos) 2. Allow non-deterministic outcomes • If needed, compensate and apologize Putting It Into Practice What we’d like: • Collection of agents • No shared state ( message passing) • Computation over arbitrary lattices Bloom Organization Collection of agents Communication Message passing State Relations (sets) Computation Relational rules over sets (Datalog, SQL) Bloom Organization BloomL Collection of agents Collection of agents Communication Message passing Message passing State Relations (sets) Lattices Computation Relational rules over sets (Datalog, SQL) Functions over lattices Quorum Vote in BloomL QUORUM_SIZE = 5 RESULT_ADDR = "example.org" class QuorumVote include Bud Annotated Ruby class Communication interfaces state do channel :vote_chn, [:@addr, :voter_id] Program state channel :result_chn, [:@addr] Lattice state declarations lset :votes lmax :vote_cnt lbool :got_quorum AccumulateMap votes set ! max end into set Map max ! bool bloom do votes <= vote_chn {|v| v.voter_id} Program vote_cnt <= votes.size got_quorum <= vote_cnt.gt_eq(QUORUM_SIZE) result_chn <~ got_quorum.when_true { [RESULT_ADDR] } Merge function for set lattice end Threshold test on bool end logic 27 Builtin Lattices Name Description ? atb lbool Threshold test lmax Sample Monotone Functions false a∨ b Increasing number 1 max(a,b ) gt(n) ! lbool +(n) ! lmax -(n) ! lmax lmin Decreasing number −1 min(a,b) lt(n) ! lbool lset Set of values ; a[b intersect(lset) ! lset product(lset) ! lset contains?(v) ! lbool size() ! lmax lpset Non-negative set ; a[b sum() ! lmax lbag Multiset of values ; a[b mult(v) ! lmax +(lbag) ! lbag lmap Map from keys to lattice values empty map when_true() ! v at(v) ! any-lat intersect(lmap) ! lmap 28 Case Study Goal: Provably eventually consistent key-value store (KVS) Assumption: Map keys to lattice values (i.e., values do not decrease) Solution: Use a map lattice Time Nested lattice value Replica 1 Replica 2 Time Add new K/V pair Replica 1 Replica 2 Time “Grow” value in extant K/V pair Replica 1 Replica 2 Time Replica Synchronization Replica 1 Replica 2 Goal: Provably eventually consistent KVS that stores arbitrary values Solution: Assign a version to each key-value pair Each replica stores increasing versions, not increasing values Object Versions in Dynamo/Riak 1. Each KV pair has a vector clock version 2. Given two versions of a KV pair, prefer the one with the strictly greater version 3. If versions are incomparable, invoke userdefined merge function Vector Clock: Map from node IDs logical clocks Solution: Use a map lattice Logical Clock: Increasing counter Solution: Use an increasing-int lattice Version-Value Pairs Pair = <fst, snd> Pair merge(Pair o) { if self.fst > o.fst: self elsif self.fst < o.fst: o else new Pair(self.fst.merge(o.fst), self.snd.merge(o.snd)) } Time Replica 1 Replica 2 Time Version increase; NOT value increase Replica 1 Replica 2 Time R1’s version replaces R2’s version Replica 1 Replica 2 Time New version @ R2 Replica 1 Replica 2 Time Concurrent writes! Replica 1 Replica 2 Merge VC (automatically), value merge via user’s lattice (as in Dynamo) Time Replica 1 Replica 2 Lattice Composition in KVS Key-Value Store lmap <Version, Value> Pair key lpair User-Defined Merge Function Vector Clock lmap node ID user-lattice lmax Conclusion Dealing with EC Many event orders orderindependent (disorderly) programs Lattices Disorderly state Monotone Disorderly computation Functions Monotone Lattices + monotone functions for Bloom safe distributed programming Questions Welcome Please try Bloom! http://www.bloom-lang.org Or: gem install bud Backup Slides Lattices hS,t,?i is a bounded join semi-lattice iff: – S is a partially ordered set – t is a binary operator (“least upper bound”) • For all x,y 2 S, x t y = z where x ·S z, y ·S z, and there is no z’ z 2 S such that z’ ·S z. • Associative, commutative, and idempotent – ? is the “least” element in S (8x 2 S: ? t x = x) Example: increasing integers – S = Z, t = max, ? = -∞ 49 Monotone Functions f : ST is a monotone function iff 8a,b 2 S : a ·S b ) f(a) ·T f(b) Example: size(Set) ! Increasing-Int size({A, B}) = 2 size({A, B, C}) = 3 50 From Datalog ! Lattices Datalog (Bloom) BloomL State Relations Lattices Example Values [[“red”, 1], [“green”, 2]] set: [“red”, “green”] map: {“red” => 1, “green” => 2} counter: 5 condition: false Computation Rules over relations Functions over lattices Monotone Computation Monotone rules Monotone functions Program Semantics Fixpoint of rules (stratified semantics) Fixpoint of functions (stratified semantics) 51 Bloom Operational Model Fixpoint State Update Local Updates Bloom Rules System Events Inbound Network atomic, local, deterministic Outbound Network 52 Quorum Vote in Bloom QUORUM_SIZE = 5 RESULT_ADDR = "example.org" class QuorumVote include Bud Communication state do channel :vote_chn, [:@addr, :voter_id] channel :result_chn, [:@addr] table :votes, [:voter_id] scratch :cnt, [] => [:cnt] end Persistent Storage Transient Storage Not (set) monotonic! Accumulate votes bloom do votes <= vote_chn {|v| [v.voter_id]} cnt <= votes.group(nil, count(:voter_id)) result_chn <~ cnt {|c| [RESULT_ADDR] if c >= QUORUM_SIZE} end end Send message when quorum reached 53 Current Status Writeups BloomL: UCB Tech Report Bloom/CALM: CIDR’11, website Lattice Runtime Available as a git branch • To be merged soon-ish Examples, • KVS Case • Shopping carts Studies • Causal delivery Under development: • MDCC, concurrent editing