Transaction chains: achieving serializability with low-latency in geo-distributed storage systems Yang Zhang Russell Power Siyuan Zhou Yair Sovran *Marcos K. Aguilera Jinyang Li New York University *Microsoft Research Silicon Valley Why geo-distributed storage? Large-scale Web applications Geo-distributed storage Replication Geo-distribution is hard Low latency: O(Intra-datacenter RTT) Strong semantics: relational tables w/ transactions Prior work Strict serializable High latency Spanner [OSDI’12] Serializable Provably high latency according to CAP Various non-serializable Our work Walter [SOSP’11] COPS [SOSP’11] ? Eiger [NSDI’13] Low latency Dynamo [SOSP’07] Eventual Key/value only Limited forms of transaction General transaction Our contributions 1. A new primitive: transaction chain – Allow for low latency, serializable transactions 1. Lynx geo-storage system: built with chains – Relational tables – Secondary indices, materialized join views Talk Outline • • • • Motivation Transaction chains Lynx Evaluation Why transaction chains? Auction service Items Bids Bidder Alice Item Book Price $100 Seller Item Highest bid Alice iPhone $20 Bob Book $20 Bob Camera $100 Bob Alice Datacenter-1 Datacenter-2 Why transaction chains? Operation: Alice bids on Bob’s camera 1. Insert bid to Alice’s Bids 2. Update highest bid on Bob’s Items Alice’s Bids Alice Book $100 Bob Bob’s Items Alice Bob Datacenter-1 Camera Datacenter-2 $100 Why transaction chains? Operation: Alice bids on Bob’s camera 1. Insert bid to Alice’s Bids 2. Update highest bid on Bob’s Items Alice’s Bids Alice Book $100 Bob Bob’s Items Alice Bob Datacenter-1 Camera Datacenter-2 $100 Low latency with first-hop return Alice bid on Bob’s camera Alice’s Bids Alice Book $100 Alice Camera $500 Bob Bob’s Items Bob Datacenter-1 Camera Datacenter-2 $100 $500 Problem: what if chains fail? 1. What if servers fail after executing first-hop? 2. What if a chain is aborted in the middle? Solution: provide all-or-nothing atomicity 1. Chains are durably logged at first-hop – Logs are replicated to another closest data center – Chains are re-executed upon recovery 2. Chains allow user-aborts only at first hop • Guarantee: First hop commits all hops eventually commit Problem: non-serializable interleaving • Concurrent chains ordered inconsistently at different hops Not serializable! T1 Server-X: T1 < T2 X=1 T2 Y=1 T2 X=2 Server-Y: T2 < T1 T1 Y=2 Time • Traditional 2PL+2PC prevents non-serializable interleaving at the cost of high latency Solution: detect non-serializable interleaving via static analysis • Statically analyze all chains to be executed – Web applications invoke fixed set of operations T1 X=1 Y=1 Conflict? T2 X=2 Y=2 A SC-cycle has both red and blue edges Serializable if no SC-cycle [Shasha et. al TODS’95] Outline • • • • Motivation Transaction chains Lynx’s design Evaluation How Lynx uses chains • User chains: used by programmers to implement application logic • System chains: used internally to maintain – Secondary indexes – Materialized join views – Geo-replicas Example: secondary index Bids (secondary index) Bids (base table) Bidder Item Price Bidder Item Price Alice Camera $100 Alice Camera $100 Bob iPhone Bob Car $20 Alice Book Alice iPhone $20 $20 $100 Bob Car Bob $20 Camera $100 Example user and system chain Alice bid on Bob’s camera Alice Book $100 Alice Camera $100 Bob Bob Datacenter-1 Camera Datacenter-2 $100 Lynx statically analyzes all chains beforehand Read-bids Read Bids table Put-bid Insert to Bids table Update Items table Put-bid Insert to Bids table Update Items table SC-cycle Read-bids Read Bids table One solution: execute chain as a distributed transaction SC-cycle source #1: false conflicts in user chains Put-bid Insert to Bids table Update Items table False conflict because max(bid, current_price) commutes Put-bid Insert to Bids table Update Items table Solution: users annotate commutativity Update Items table commutes Put-bid Insert to Bids table Put-bid Insert to Bids table Update Items table SC-cycle source #2: system chains Put-bid Put-bid Insert to Insert to Bids table Bids-secondary Insert to Insert to Bids table Bids-secondary … … SC-cycle Solution: chains provide origin-ordering • Observation: conflicting system chains originate at the same first hop server. T1 T2 Insert to Bids table Insert to Bids table Insert to Bids-secondary Insert to Bids-secondary Both write the same row of Bids table • Origin-ordering: if chains T1 < T2 at same first hop, then T1 < T2 at all subsequent overlapping hops. – Can be implemented cheaply sequence number vectors Limitations of Lynx/chains 1. Chains are not strictly serializable, only serializable. 2. Programmers can abort only at first hop • Our application experience: limitations are managable Outline • • • • Motivation Transaction chains Lynx’s design Evaluation Simple Twitter Clone on Lynx Tweets Geo-replicated Author Tweet Alice New York rocks Bob Time to sleep Eve Hi there Follow-Graph Geo-replicated From To Tweets JOIN Follow-Graph (Timeline) Author (=to) Follow-Graph (secondary) To From Alice Bob Bob Alice Alice Eve Bob Clark From Tweet Bob Alice Time to sleep Eve Alice Hi there Experimental setup europe Lynx protoype: • In-memory database • Local disk logging only. us-west us-east Returning on first-hop allows low latency Chain completion 300 252 Latency (ms) 250 200 174 150 100 First hop return 50 3.2 0 Follow-user Post-tweet Follow-user 3.1 3.1 Post-tweet Read-timeline Applications achieve good throughput 1.6 1.35 Million ops/sec 1.4 1.2 1 0.8 0.6 0.4 0.2 0.184 0.173 Follow-User Post-Tweet 0 Read-Timeline Related work • Transaction decomposition – SAGAS [SIGMOD’96], step-decomposed transactions • Incremental view maintenance – Views for PNUTS [SIGMOD’09] • Various geo-distributed/replicated storage – Spanner[OSDI’12], MDCC[Eurosys’13], Megastore[CIDR’11], COPS [SOSP’11], Eiger[NSDI’13], RedBlue[OSDI’12]. Conclusion • Chains support serializability at low latency – With static analysis of SC-cycles • Key techniques to reduce SC-cycles – Origin ordering – Commutative annotation • Chains are useful – Performing application logic – Maintaining indices/join views/geo-replicas Limitations of Lynx/chains 1. Chains are not strict serializable Time Serializable Strict serializable Remedies: – Programmers can wait for chain completion – Lynx provides read-your-own-writes 2. Programmers can only abort at first hop • Our application experience shows the limitations are managable 2PC and chains The easy way T1 T2 T1 R(A) W(A) R(A) W(B) T2 T2 T1 W(A) R(A) W(B) T1 R(A) 2PC-W(AB) 2PC and chains The hard way T1 T2 R(A) R(B) W(A) W(B) R(A) T1 T2 T2 W(A) R(B) 2PC-W(AB) W(B) T1 T1 R(A) R(B) R(A) R(B) 2PC and chains The hard way Chain A B C D DC1 DC2 DC3 DC4 Parallel unlock 2PC retry Lynx is scalable 3000 2770 2500 QPS (K/s) 2000 1500 Follow 1350 Tweet 1000 Timeline 586 500 374 356 265 48 42 93 86 184 173 0 1 2 4 #Servers per DC 8 Challenge of static analysis: false conflict T1 T2 1. Insert bid into bid history 2. Update max price on item Conflict on bid history Conflict on item 1. Insert bid into bid history 2. Update max price on item SC-cycle Not serializable Solution: communitivity annotations T1 1. Insert bid into bid history No real conflict because bid ids are unique T2 Conflict on Commutative bid history operation 1. Insert bid into bid history 2. Update max price on item Updating max commutes Conflict on Commutative operation item 2. Update max price on item No SC-cycle Serializable ACID: all-or-nothing atomicity • Chain’s failure guarantee: – If the first hop of a chain commits, then all hops eventually commit • Users are only allowed to abort a chain in the first hop • Achievable with low latency: – Log chains durably at the first hop • Logs replicated to a nearby datacenter – Re-execute stalled chains upon failure recovery ACID: serializability • Serializability – Execution result appears as if obey a serial order for all transactions – No restrictions on the serial order Transactions Ordering 1 Ordering 2 Problem #2: unsafe interleaving • Serializability – Execution result appears as if obey a serial order for all transactions – No restrictions on the serial order Transactions Ordering 1 Ordering 2 Chains are not linearizable • Serializability • Linearability a total ordering of chains a total ordering of chains & total order obeys the issue order Transactions Time Ordering 1 Linearizable Ordering 2 Transaction chains: recap • Chains provide all-or-nothing atomicity • Chains ensure serializability via static analysis • Practical challenges: – How to use chains? – How to avoid SC-cycles? Example user chain Items Bids Bidder Alice Item Seller Price Camera 100 Item Bob Camera Highest 100 Alice 1. Insert bid into Alice’s bid history Bob 2. Update max price on Bob’s camera Lynx implementation • 5000 lines C++ and 3500 lines RPC library • Uses an in-memory key/value store • Support user chains in Javascript (via V8) Geo-distributed storage is hard • Applications demand simplicity & performance – Friendly programming model • Relational tables • Transactions – Fast response • Ideally, operation latency = O(intra-datacenter RTT) • Geo-distribution leads to high latency – Coordinate data access across datacenters • Operation latency = O(inter-datacenter RTT) = O(100ms)