Life Beyond Distributed Transactions: An Apostate’s Opinion
Pat Helland
Partner Architect
Microsoft Corporation

Apostate: noun “One who renounces a previously held belief.”

Outline
  Introduction
  Data, Transactions, and Scalability
  Messaging across Items
  Partner-State and State-Machines
  Accountants Don’t Use Erasers
  Accurate Representations of Historical Facts
  Managing Uncertainty across Items
  Bounds of Uncertainty in Loosely-Coupled Systems
  Conclusion

Slide 2

Outline (repeated)

Slide 3

Session Objectives And Takeaways
• Distributed transactions aren’t used much in practice
  • They are fragile and impede availability
  • Local transactions are wonderful!
• Designing for scalability requires planning
  • Need to think about separate items as the scope of transactions
  • Even when separate items are on the same machine (today), you must plan for them to be repartitioned later
• Interacting across items requires messaging
  • Managing the messaging is complex
  • Each partner must track the state of its interaction with partner items
• Scalable applications become state-driven workflow
  • Surprise: the fine granularity of the participants in the workflow

Today’s Goal: Offer hopefully insightful opinions about scalable apps.
[Annotation on “insightful”: incite-ful]

Slide 4

Pointer to Paper
• Paper delivered at CIDR-2007
  • Conference on Innovative Data Systems Research
  • http://wwwdb.cs.wisc.edu/cidr/cidr2007/papers/cidr07p15.pdf
• Terminology changes from Paper:
  • Entity → Item
  • Activity → Partner-State-Machine

Slide 5

Assumptions (Don’t Have to Prove These… Just Plain Believe Them)

Grown-Ups Don’t Use Distributed Transactions
• The apps using distributed transactions become too fragile…
• Let’s just consider local transactions.
  • Multiple disjoint scopes of serializability

Want Almost-Infinite Scaling
• More of everything… Year by year, bigger and bigger
• If it fits on your machines, multiply by 10; if that fits, multiply by 1000…
• Strive to scale almost linearly (N log N for some big log).

Want Scale-Agnostic Apps
• Two layers to the application: scale-agnostic and scale-aware
• Consider scale-agnostic API

[Diagram: the application in two layers. Upper layer: scale-agnostic code. Scale-agnostic API between the layers. Lower layer: scale-aware code.]

Slide 6

Disjoint Scopes of Serializability
• Assume transactions only within a single machine
  • OK, I’ll give you a small cluster… but not all the machines!
• Repartitioning moves data
  • To expand the app, some data moves to a new data-store
• Which data can you count on for a transaction?
  • Remember, it might get moved…
  • What’s on one machine today may get moved to another tomorrow!
  • Recall, no transactions may cross machines
  • What CAN you tie into a single transaction?

[Diagram: transactions, each drawn around its own data]

Slide 7

Outline (repeated)

Slide 8

Uniquely Keyed Items
• Not all data may be in a single transaction
  • We must collect the data into pieces
  • We must annotate the boundaries of the data guaranteed to be transactional
  • Must remain transactional even if we repartition!
• An item:
  • A collection of data that fits on a single machine
  • Identified by a unique key
• Assume the scale-aware-code never partitions an item
  • The unique key defines the data that can’t be partitioned

[Diagram: four items with keys “ABC”, “WPB”, “QLA”, and “UNB”]

The application’s data is factored into items, each of which has a unique key.
Each item will reside on a single machine (ignoring replication & H/A).

Slide 9

Transactions and Items
• A transaction may update a single item
  • The scale-aware-code (and API) guarantee it
  • The item is never partitioned
• A transaction must not ever update two items
  • Even if the two live on one machine today
  • Tomorrow, they may repartition to different machines…

[Diagram: a transaction drawn with Item “ABC” and Item “DEF”]

Slide 10

Repartitioning and Items
• Items allow scaling
  • Items remain intact even when repartitioning
• The application can count on the integral nature of each item
  • It is OK to know that the entire item is local
  • It is OK to work on anything in the item at once

[Diagram: items (“ABC”, “DEF”, “GHI”, “JKL”, “LMN”, “RST”, “XYZ”, and others) placed on machines, before and after repartitioning]

Frequently the work won’t fit on one computer!
No Promise that Two Different Items Stay on the Same Machine!!

Slide 11

Thinking about Queries
• Queries just got HARD!
  • Certainly can’t do cross-item transactional queries
  • The items aren’t in the same scope of serializability
• Perhaps can query on stale versions of the data
  • Very useful… just different than classic DB
• Can do distributed queries
  • Send partial queries around the network
  • Hard as the dataset explodes in size
• Can filter copies of old versions
  • Keep a subset on a machine for ad-hoc queries
  • Subset becomes a smaller percentage as we scale…

Many Traditional Queries Are Used Today to Implement Items
Ad-Hoc Queries Get Harder…
Scaling Means It Won’t All Fit
Gotta Join to Overcome the Normalization of Rows!

Slide 12

Thinking about Alternate Indices
• Items must have a unique key
  • Unless you begin with the same key, you aren’t the same
• CANNOT guarantee the alternate index will co-locate with the item’s primary key
  • By definition, alternate indices don’t have the same key!
  • We must index them with a different key…
• Alternate indices CANNOT be updated in the same transaction as the primary data
  • There is no way to guarantee they are on the same machine
  • They must be updated in different transactions…

[Diagram: three key lists. Item keys indexed by 2nd alternate key: A2:abc, A2:def, A2:fgh, A2:ghu, A2:klw. Item keys indexed by 1st alternate key: A1:ABC, A1:DEF, A1:GHI, A1:JKL, A1:MNO. Items by primary key: PK:123, PK:217, PK:332, PK:589, PK:719.]

Slide 13

Outline (repeated)

Slide 14

Items Are Connected by Messaging
• Items are key-named boundaries for transactional work
  • Transactions never span items
  • The scale-aware-code may move them to repartition
• The only way to communicate across items is with messaging!
  • The scale-aware-code is responsible for finding the correct item (by key-name) and for routing the message to it

[Diagram: Item-X with a message addressed “Send To: Item-Y”; Item-X and Item-Y are each labeled “Boundary of Transactions”]

“Messaging” Is in Quotes…
Work Is Invoked -- Potentially across Machines -- Definitely across Transactions!

Slide 15

Keeping Notes Before You Speak
• Transactions update the data within an item
  • They also update the intent to send a message
• Must not send a message unless the intent commits
  • Otherwise, the message could arrive even though the intent to send it aborted with the sending transaction
• Output queues are frequently transactional
  • Otherwise even more confusing things can happen

[Diagram: an item containing a transaction, the item’s private data, and app logic]

Slide 16

At-Least-Once Delivery Semantics
• Each message is sent at-least-once
  • Given infinite time…
  • The sender tries and tries and tries until acked
  • Eventually, the message is delivered

Dialogs and Exactly-Once Delivery
• It is Possible to Implement Exactly-Once Delivery Within a Relationship
• Dialog:
  • Similar to TCP/IP but Long-Running
  • Can Guarantee Exactly-Once Delivery OR Failure-Notification
• Requires Interesting Platform Support
  • Not the Topic of this Talk
  • See Microsoft SQL Server 2005 – SQL Service Broker

Slide 17

Idempotence: It’s Not a Medical Condition
• Requests get lost…
  • Gotta retry them to handle lost requests
• Requests arrive more than once…
  • Those pesky retries may actually arrive
• Idempotent means it’s OK to arrive multiple times
  • As long as the request is processed at least once, the correct stuff occurs
• In today’s world, you must design your requests to be idempotent

Not idempotent: Withdrawing $1 billion
Not idempotent: Baking a cake starting from ingredients
Naturally idempotent: Sweeping the floor
Idempotent: If haven’t yet done Withdrawal #XYZ for $1 billion, then withdraw $1 billion and label as #XYZ
Idempotent: Baking a cake starting from the shopping list (if money doesn’t matter)
Naturally idempotent:
  Read record “X”

Slide 18

Out of Order Arrival
• Any message may arrive multiple times
  • Even after a long while
• This can be very confusing…
  • Lots of possible message deliveries

[Diagram: messages A, B, and C between two items, shown more than once; caption: “Arrg!”]

Applications find it difficult to ensure there are no latent bugs
Esoteric late retries of messages may find untested windows…

Slide 19

Outline (repeated)

Slide 20

Messages Connect Items
• Messages are the only way into and out of items
  • They are produced by transactions
  • They are consumed by transactions
  • Transactions are local to the item

[Diagram: Item-X with incoming messages “From: Item-A”, “From: Item-B”, “From: Item-C”, “From: Item-D” and outgoing messages “Send To: Item-A”, “Send To: Item-B”, “Send To: Item-C”]

Slide 21

Items Connected by Partnerships
• Mostly, messaging occurs between two partner items
  • Usually, a two-way exchange moving both items’ state
  • Each keeps data about how far its state has advanced…

[Diagram: Item-W, Item-X, Item-Y, and Item-Z connected in partnerships]

Slide 22

Tracking with Partner-State-Machines
• Partner-state-machine refers to the knowledge about a partner item
  • Descriptions of what messages have been received
  • Descriptions of what obligations exist to the partner
  • The foundation for workflow to replace distributed transactions

[Diagram: Item-W, Item-X, Item-Y, and Item-Z with partner-state-machines PSM-W, PSM-X, and PSM-Z tracking their partnerships]

• Two basic observations wrapped up in the “partner-state-machine”
  • Work across items is workflow based on two-party relationships
  • The granularity of the workflow participant is an item (fine-grained)

Slide 23

Idempotence, Partners, and Partner-State-Machines
• Partner-state-machines manage idempotence
  • They keep track of what’s been seen
  • If it’s a repeat, ignore it
• Repeated messages eliminated via partnership

[Diagram: Item-X and Item-Y exchanging messages 1, 2, 3; PSM-Y: “Seen Msg-1, 2, 3…”]
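The rule in the bullets above (keep track of what has been seen, ignore repeats) can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not say how messages are identified, so the per-partner message number `msg_no`, the class names `PartnerState` and `Item`, and the `applied` list standing in for the item's private data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PartnerState:
    """Hypothetical partner-state record: what one item knows about one partner."""
    seen: set[int] = field(default_factory=set)  # message numbers already processed


@dataclass
class Item:
    """Hypothetical item: a unit of data identified by a unique key."""
    key: str
    partners: dict[str, PartnerState] = field(default_factory=dict)
    applied: list[str] = field(default_factory=list)  # stand-in for private data

    def receive(self, sender_key: str, msg_no: int, payload: str) -> bool:
        """Apply a message at most once; return False when it is a repeat."""
        psm = self.partners.setdefault(sender_key, PartnerState())
        if msg_no in psm.seen:
            return False  # a retry arrived again: ignore it
        # Assumption: a real system would commit both updates below
        # in one local transaction inside this item.
        self.applied.append(payload)
        psm.seen.add(msg_no)
        return True
```

For example, after `item_y = Item("Item-Y")`, calling `item_y.receive("Item-X", 1, "ship order")` twice applies the payload only the first time. Because the record is a set rather than a counter, a late or out-of-order retry is handled the same way as an immediate one.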
[Diagram, continued: messages A, B, C; PSM-X: “Seen Msg-A, B, C…”]

Slide 24

Retirement of Items
• It is normal for items to retire
  • The shipment is shipped
  • The order completes
• Activities advance to completion
  • Incoming messages are accepted
  • No new messages are needed
• Typical for the work of an item to complete…
  • Retirement usually means “become read-only”
  • Sometimes old items are deleted

Sometimes Items Exist for Long-Lived Purposes:
-- Inventory, Bank-Balance, Customer
-- Called “Resource-Items”
Not the topic for this talk… another talk is needed!

Slide 25

Outline (repeated)

Slide 26

“Append-Only” Data
• Many Kinds of Computing are “Append-Only”
  • Lots of observations are made about the world
    • Debits, credits, Purchase-Orders, Customer-Change-Requests, etc.
  • As time moves on, more observations are added
    • You can’t change the history but you can add new observations
• Derived Results May Be Calculated
  • Estimate of the “current” inventory
  • Frequently inaccurate
• Historic Rollups Are Calculated
  • Monthly bank statements

Slide 27

Databases and Transaction Logs
• Transaction Logs Are the Truth
  • High-performance & write-only
  • Describe ALL the changes to the data
• Data-Base Is the Current Opinion
  • Describes the latest value of the data as perceived by the application

The Database Is a Caching of the Transaction Log!
It is the subset of the latest committed values represented in the transaction log…

[Diagram: Log and DB]

Slide 28

Accountants, Erasers, and Jail
• Accountants Go to Jail if They Use Erasers!!!
  • The normal accounting practices allow for corrections but not updates
  • Corrections are added to the information
  • The derived values are recalculated
• It is a common application paradigm to keep almost all data as append-only
  • The transactions themselves are append-only
    • Sometimes they are eventually retired.
  • The rollup (derived) summary may be recalculated
  • Periodic snapshots of the rollup (derived) data are appended to the record
    • E.g. a monthly bank statement

Slide 29

Outline (repeated)

Slide 30

Versions and Distributed Systems
• Can’t have “the same” data at many locations
  • Unless it is a snapshot
• Changing distributed data needs versions
  • Creates a snapshot…

[Diagram: a data-owning service holds the Price-List and its versions (Monday’s, Tuesday’s, Wednesday’s Price-List); Listening Partner Service-1, Service-5, Service-7, and Service-8 each hold some of those versions]

Slide 31

DAGs of History

[Diagram: a DAG of data versions across Service-1, Service-2, Service-3, and Service-4: “A1”, “A1.1”, “A2”; “B1”, “B2”, “B3”; “C1”, “C2”, “C2.1”, “C3”; “D1”, “D1.1”, “D1.2”, “D2”, “D2.1”, “D3”]

Outline (repeated)

Slide 33

Tentative Operations
• Items don’t share transactions
  • Now what can we do?
• Items may accept tentative operations
  • Like a reservation; may be cancelled later
• If cancelled, the receiving item must cope
  • Special logic to deal with cancellations

[Diagram: a tentative op and a cancellation between Item-A and Item-B]

Slide 34

Semantics of Tentative Operations
• Tentative operations must be reorderable
  • When cancelled, a compensation must occur
  • Other operations may have occurred since
  • Operations and cancellations must be reorderable!

[Diagram: three numbered messages among Item-A, Item-B, and Item-C: tentative op (1), tentative op (2), cancellation (3)]

Slide 35

Semantics of Cancellation and Confirmation
• Cancellation
  • Cope with not doing tentative operation
  • Not undo
    • New operation to “make things right”
  • Accepting tentative means it’s OK to cancel
• Confirmation
  • Relinquish the right to cancel tentative op
  • Sometimes time driven
    • Hotel rooms confirm in the morning
• Every tentative op confirms or cancels

Slide 36

Outline (repeated)

Slide 37

Increasing & Decreasing Uncertainty
• Each tentative operation increases your uncertainty
  • You get more and more confused each time you accept a tentative operation
• Each confirmation or cancellation decreases your uncertainty
  • It resolves the confusion imparted by the tentative operation it is confirming or canceling

[Diagram: an uncertainty scale from “Less Uncertain” to “More Uncertain”; a tentative operation moves toward more uncertain, a cancellation or confirmation toward less uncertain]

Slide 38

Bounded Uncertainty
• You can track the worst case situations for data values you are managing
  • If you keep inventory, you can know the lowest possible and highest possible values
• Tentative operations move lowest and highest values apart
  • This increases uncertainty
• Confirmations and cancellations move lowest and highest values together
  • This decreases uncertainty

[Figure label: Minimum Widgets Possible]
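The bullets above say that tentative operations push the lowest and highest possible values apart, while confirmations and cancellations pull them back together. A minimal sketch of that bookkeeping follows. The names (`WidgetInventory`, `op_id`, the signed `delta`) are assumptions made for illustration, and cancellation is simplified to dropping the pending change rather than running a compensating operation.

```python
from dataclasses import dataclass, field


@dataclass
class WidgetInventory:
    """Hypothetical inventory item that tracks bounds on its own uncertainty."""
    confirmed: int  # widgets accounted for by confirmed operations
    pending: dict[str, int] = field(default_factory=dict)  # op id -> signed change

    def tentative(self, op_id: str, delta: int) -> None:
        """Accept a tentative operation; the bounds move apart."""
        self.pending[op_id] = delta

    def confirm(self, op_id: str) -> None:
        """The tentative change becomes real; uncertainty shrinks."""
        self.confirmed += self.pending.pop(op_id)

    def cancel(self, op_id: str) -> None:
        """The tentative change never happens; uncertainty shrinks."""
        self.pending.pop(op_id)

    def bounds(self) -> tuple[int, int]:
        """Return (lowest possible, highest possible) widget counts."""
        changes = self.pending.values()
        lowest = self.confirmed + sum(d for d in changes if d < 0)
        highest = self.confirmed + sum(d for d in changes if d > 0)
        return lowest, highest
```

Starting from `WidgetInventory(confirmed=100)`, a tentative order for 30 widgets (`delta=-30`) and a tentative delivery of 50 (`delta=50`) give bounds of 70 to 150. Confirming the order and cancelling the delivery brings both bounds to 70. A business rule such as refusing an order that could overflow the warehouse would compare the upper bound against capacity.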
[Figure labels: Maximum Widgets Possible; Probability]
• Knowing the bounds, you have Bounded Uncertainty
[Figure axis: Widget Inventory]

Slide 39

Acting on Bounded Uncertainty
• Knowing bounds on uncertainty allows many different business rules:
  • Refuse an order which may (in the worst case) result in widgets overflowing the warehouse
  • Calculate probability of worst case overflowing the warehouse
    • Cost of temporary storage vs. value of accepting order…
  • Order food for hotel restaurant based on reservations and probabilities
• May result in interesting work by applying risk management algorithms…

Slide 40

Outline (repeated)

Slide 41

Vocabulary and Assertions

New vocabulary for discussing scale
  Almost-infinite scaling: An environment demanding rapidly increasing data and computation over time
  Scale-agnostic app: An application that does not need to change to support almost-infinite scaling
  Item: A collection of data referenced by a single key; transactional scope of the scale-agnostic app
  Partner-State-Machine: Data used inside one item to describe its workflow state with a single partner item

Assertions about large scale apps
  Alternate indices aren’t transactionally consistent: As scale increases, the primary and alternate indices cannot be guaranteed to live together
  Items cooperate using fine-grained two-party workflow: No dist-txs workflow; workflow participants are items; work coordinated across pairs

Slide 42

Takeaways
• Scale agnostic application design
  • Designing for scale leads you away from distributed transactions
  • Local transactions are great; distributed transactions suck
• Programming for scale leads to separate pieces of data called items
  • Items must live in separate transactions
  • Items are only connected with messaging
  • “Classic” workflow but fine-grained
• Separate items messaging… but messaging is hard!
  • Messages get lost and need retries
  • Retries give at-least-once delivery
  • Must have idempotent processing of messages
• Coping with idempotent messaging requires “partner-state-machines”
  • One PSM per-partner per-side holds the state of the relationship
  • The scale-agnostic app uses partner-state-machines (“activities” in the paper) to cope with retries
  • PSMs can compose to mask complexity

Slide 43

© 2007 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.