Life Beyond
Distributed Transactions:
An Apostate’s Opinion
Pat Helland
Partner Architect
Microsoft Corporation
Apostate:
noun
“One who renounces a previously held belief.”
Slide 2
Outline
Introduction
Data, Transactions, and Scalability
Messaging across Items
Partner-State and State-Machines
Accountants Don’t Use Erasers
Accurate Representations of Historical Facts
Managing Uncertainty across Items
Bounds of Uncertainty in Loosely-Coupled Systems
Conclusion
Slide 3
Session Objectives And Takeaways
• Distributed transactions aren’t used much in practice
• They are fragile and impede availability
• Local transactions are wonderful!
• Designing for scalability requires planning
• Need to think about separate items as the scope of transactions
• Even when separate items are on the same machine (today), you must plan
for them to be repartitioned later
• Interacting across items requires messaging
• Managing the messaging is complex
• Each partner must track the state of its interaction with partner items
• Scalable applications become state-driven workflow
• Surprise: the fine granularity of the participants in the workflow
Today’s Goal:
Offer hopefully insightful (incite-ful) opinions about scalable apps.
Slide 4
Pointer to Paper
• Paper delivered at CIDR-2007
• Conference on Innovative Data Systems Research
• http://wwwdb.cs.wisc.edu/cidr/cidr2007/papers/cidr07p15.pdf
• Terminology changes from Paper:
• Entity → Item
• Activity → Partner-State-Machine
Slide 5
Assumptions
(Don’t Have to Prove These… Just Plain Believe Them)
Grown-Ups Don’t Use Distributed Transactions
• The apps using distributed transactions become too fragile…
• Let’s just consider local transactions.
→ Multiple disjoint scopes of serializability
Want Almost-Infinite Scaling
• More of everything… Year by year, bigger and bigger
• If it fits on your machines, multiply by 10; if that fits, multiply by 1000…
• Strive to scale almost linearly (N log N for some big log).
Want Scale-Agnostic Apps
• Two layers to the application:
scale-agnostic and scale-aware
• Consider scale-agnostic API
[Diagram: the Application has an Upper Layer of Scale-Agnostic Code written against a Scale-Agnostic API, and a Lower Layer of Scale-Aware Code]
Slide 6
Disjoint Scopes of Serializability
• Assume transactions only within a single machine
• OK, I’ll give you a small cluster…but not all the machines!
• Repartitioning moves data
• To expand the app, some data moves to a new data-store
• Which data can you count on for a transaction?
• Remember, it might get moved…
• What’s on one machine today may get moved to another tomorrow!
• Recall, no transactions may cross machines
• What CAN you tie into a single transaction?
[Diagram: Transaction scopes over pieces of Data]
Slide 7
Outline
Introduction
Data, Transactions, and Scalability
Messaging across Items
Partner-State and State-Machines
Accountants Don’t Use Erasers
Accurate Representations of Historical Facts
Managing Uncertainty across Items
Bounds of Uncertainty in Loosely-Coupled Systems
Conclusion
Slide 8
Uniquely Keyed Items
• Not all data may be in a single transaction
• We must collect the data into pieces
• We must annotate the boundaries of the data guaranteed to be
transactional
• Must remain transactional even if we repartition!
• An item:
• A collection of data that fits on a single machine
• Identified by a unique key
• Assume the scale-aware-code never partitions an item
• The unique key defines the data that can’t be partitioned
[Diagram: four items with keys “ABC”, “WPB”, “QLA”, and “UNB”]
The application’s data is factored into items, each of which has a unique key
Each item will reside on a single machine (ignoring replication & H/A)
Slide 9
Transactions and Items
• A transaction may update a single item
• The scale-aware-code (and API) guarantee it
• The item is never partitioned
• A transaction must not ever update two items
• Even if the two live on one machine today
• Tomorrow, they may repartition to different machines…
[Diagram: Item “ABC”, a Transaction, and Item “DEF”]
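The one-item rule above can be sketched in a few lines. This is only an illustration: the `Transaction` class, its methods, and the in-memory `store` are hypothetical names, not an API from the talk. The point is that the scale-agnostic API refuses a second item even when both items happen to be local today.

```python
class CrossItemTransactionError(Exception):
    """Raised when one transaction tries to touch a second item."""


class Transaction:
    """A local transaction bound to exactly one item key (hypothetical API)."""

    def __init__(self, store, key):
        self._store = store
        self._key = key
        # Work on a private copy so an abandoned transaction changes nothing.
        self._working = dict(store.get(key, {}))

    def update(self, key, field, value):
        if key != self._key:
            raise CrossItemTransactionError(
                f"transaction on {self._key!r} may not update {key!r}")
        self._working[field] = value

    def commit(self):
        self._store[self._key] = self._working


store = {"ABC": {"qty": 1}, "DEF": {"qty": 5}}
txn = Transaction(store, "ABC")
txn.update("ABC", "qty", 2)
try:
    txn.update("DEF", "qty", 6)   # refused, even if "DEF" is local today
    crossed = True
except CrossItemTransactionError:
    crossed = False
txn.commit()
```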
Slide 10
Repartitioning and Items
• Items allow scaling
• Items remain intact even when repartitioning
• The application can count on the integral nature of each item
• It is OK to know that the entire item is local
• It is OK to work on anything in the item at once
[Diagram: many keyed items (“ABC”, “ABZ”, “DEF”, “FAW”, “FXQ”, “GHI”, “JAA”, “JKL”, “KZU”, “LMN”, “MOE”, “NAO”, “RAA”, “RST”, “XYZ”) spread across machines]
Frequently the work won’t fit on one computer!
No Promise that Two Different Items Stay on the Same Machine!!
Slide 11
Thinking about Queries
• Queries just got HARD!
• Certainly can’t do cross-item transactional queries
• The items aren’t in the same scope of serializability
• Perhaps can query on stale versions of the data
• Very useful… just different than classic DB
• Can do distributed queries
• Send partial queries around the network
• Hard as the dataset explodes in size
• Can filter copies of old versions
• Keep a subset on a machine for ad-hoc queries
• Subset becomes a smaller percentage as we scale…
Many Traditional Queries Are Used Today to Implement Items
Ad-Hoc Queries Get Harder… Scaling Means It Won’t All Fit
Gotta Join to Overcome the Normalization of Rows!
Slide 12
Thinking about Alternate Indices
• Items must have a unique key
• Unless you begin with the same key, you aren’t the same
• CANNOT guarantee the alternate index will co-locate with the item’s primary key
• By definition, alternate indices don’t have the same key!
• We must index them with a different key…
• Alternate indices CANNOT be updated in the same transaction as the primary data
• There is no way to guarantee they are on the same machine
• They must be updated in different transactions…
[Diagram: three key lists over the same items]
Primary keys: PK:123, PK:217, PK:332, PK:589, PK:719
Item Keys Indexed by 1st Alternate Key: A1:ABC, A1:DEF, A1:GHI, A1:JKL, A1:MNO
Item Keys Indexed by 2nd Alternate Key: A2:abc, A2:def, A2:fgh, A2:ghu, A2:klw
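A minimal sketch of what "updated in different transactions" can look like, assuming the primary item queues a message that a later, separate transaction applies to the index. The dictionaries, the `by_email` index, and both function names are made up for illustration. Between the two steps the index is stale, which is the price of not sharing a transaction.

```python
from collections import deque

primary = {}                 # item key -> item data (one transaction scope)
by_email = {}                # alternate index: email -> item key (another scope)
queue = deque()              # messages from the primary item to the index


def update_primary(key, email):
    """Transaction 1: change the item and note the index update to send."""
    old = primary.get(key, {}).get("email")
    primary[key] = {"email": email}
    queue.append(("reindex", key, old, email))


def apply_index_update():
    """Transaction 2: runs later, possibly on another machine."""
    _, key, old, new = queue.popleft()
    if old is not None and by_email.get(old) == key:
        del by_email[old]
    by_email[new] = key


update_primary("PK:123", "a@example.com")
stale = by_email.get("a@example.com")      # index has not caught up yet
apply_index_update()
fresh = by_email.get("a@example.com")
```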
Slide 13
Outline
Introduction
Data, Transactions, and Scalability
Messaging across Items
Partner-State and State-Machines
Accountants Don’t Use Erasers
Accurate Representations of Historical Facts
Managing Uncertainty across Items
Bounds of Uncertainty in Loosely-Coupled Systems
Conclusion
Slide 14
Items Are Connected by Messaging
• Items are key-named boundaries for transactional work
• Transactions never span items
• The scale-aware-code may move them to repartition
• The only way to communicate across items is with messaging!
• The scale-aware-code is responsible for finding the correct item
(by key-name) and for routing the message to it
[Diagram: Item-X sends a message to Item-Y; each item is its own Boundary of Transactions]
“Messaging” Is in Quotes… Work Is Invoked
-- Potentially across Machines
-- Definitely across Transactions!
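One way the scale-aware code might find the correct item by key-name is plain hash partitioning, sketched below. The function names and machine labels are assumptions, and real systems often prefer schemes that move fewer keys when machines are added. The sender names only the key and never a machine.

```python
import hashlib


def machine_for(key, machines):
    """Pick the machine that owns an item key (simple hash partitioning)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return machines[int.from_bytes(digest[:8], "big") % len(machines)]


def send(key, message, machines, inboxes):
    """Scale-aware routing: the caller names only the item key."""
    target = machine_for(key, machines)
    inboxes.setdefault(target, []).append((key, message))
    return target


inboxes = {}
small = ["m1", "m2"]
large = ["m1", "m2", "m3", "m4"]
first = send("Item-Y", "hello", small, inboxes)
second = send("Item-Y", "hello again", large, inboxes)  # after repartitioning
```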
Slide 15
Keeping Notes Before You Speak
• Transactions update the data within an item
• They also update the intent to send a message
• Must not send a message unless the intent commits
• Otherwise, the message could arrive and the intent to send the
message aborts with the sending transaction
• Output queues are frequently transactional
• Otherwise even more confusing things can happen
[Diagram: a Transaction inside an Item, covering the item’s Private Data and App Logic]
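The "keep notes before you speak" rule can be sketched with a local SQLite transaction that writes the item's data and the intent to send together. The table names and the `update_and_note` helper are hypothetical. If the transaction aborts, the note disappears with it, so no message is ever sent for work that did not commit.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE item_data (field TEXT PRIMARY KEY, value TEXT)")
db.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, dest TEXT, body TEXT)")
db.commit()


def update_and_note(field, value, dest, body, fail=False):
    """Update the item and record the intent to send in ONE local transaction."""
    try:
        with db:  # commits on success, rolls back on exception
            db.execute("INSERT OR REPLACE INTO item_data VALUES (?, ?)",
                       (field, value))
            db.execute("INSERT INTO outbox (dest, body) VALUES (?, ?)",
                       (dest, body))
            if fail:
                raise RuntimeError("simulated abort")
    except RuntimeError:
        pass


def committed_messages():
    """Only committed intents are visible to whatever sends the messages."""
    return db.execute("SELECT dest, body FROM outbox ORDER BY id").fetchall()


update_and_note("status", "shipped", "Item-Y", "order shipped")
update_and_note("status", "lost", "Item-Y", "oops", fail=True)
```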
Slide 16
At-Least-Once Delivery Semantics
• Each message is sent at-least-once
• Given infinite time…
• The sender tries and tries and tries until acked
• Eventually, the message is delivered
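A small sketch of "tries and tries until acked", assuming a made-up lossy channel that can drop either the request or the ack. Note the consequence: when an ack is lost, the receiver has already taken delivery, and the retry delivers the same message again.

```python
import random


def send_until_acked(message, channel, max_attempts=1000):
    """Retry until the receiver acknowledges; returns the attempt count."""
    for attempt in range(1, max_attempts + 1):
        if channel(message):        # True means an ack came back
            return attempt
    raise TimeoutError("gave up; a real sender would keep the intent queued")


class LossyChannel:
    """Hypothetical channel that drops requests and acks at random."""

    def __init__(self, loss_rate, seed):
        self._rng = random.Random(seed)
        self._loss_rate = loss_rate
        self.delivered = []

    def __call__(self, message):
        if self._rng.random() < self._loss_rate:
            return False            # request lost
        self.delivered.append(message)
        if self._rng.random() < self._loss_rate:
            return False            # ack lost: the sender will retry
        return True


channel = LossyChannel(loss_rate=0.5, seed=7)
attempts = send_until_acked("msg-1", channel)
```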
Dialogs and Exactly-Once Delivery
• It is Possible to Implement Exactly-Once Delivery Within a Relationship
• Dialog:
• Similar to TCP/IP but Long-Running
• Can Guarantee Exactly-Once Delivery OR Failure-Notification
• Requires Interesting Platform Support
• Not the Topic of this Talk
• See Microsoft SQL Server 2005 – SQL Service Broker
Slide 17
Idempotence:
It’s Not a Medical Condition
• Requests get lost…
• Gotta retry them to handle lost requests
• Requests arrive more than once…
• Those pesky retries may actually arrive
• Idempotent means it’s OK to arrive multiple times
• As long as the request is processed at least once, the correct stuff
occurs
• In today’s world, you must design your requests to be
idempotent
Not idempotent: Withdrawing $1 billion
Idempotent: If haven’t yet done Withdrawal #XYZ for $1 billion, then withdraw $1 billion and label as #XYZ
Not idempotent: Baking a cake starting from ingredients
Idempotent: Baking a cake starting from the shopping list (if money doesn’t matter)
Naturally idempotent: Sweeping the floor
Naturally idempotent: Read record “X”
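The withdrawal example translates almost directly into code. This is a sketch with a hypothetical `Account` class: the plain `withdraw` repeats its effect on every retry, while `withdraw_once` labels the work with an ID and ignores a request it has already done.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance
        self.done = set()          # IDs of withdrawals already applied

    def withdraw(self, amount):
        """Not idempotent: every retry takes more money."""
        self.balance -= amount

    def withdraw_once(self, withdrawal_id, amount):
        """Idempotent: a repeated request with the same ID is a no-op."""
        if withdrawal_id in self.done:
            return False
        self.balance -= amount
        self.done.add(withdrawal_id)
        return True


plain = Account(100)
plain.withdraw(10)
plain.withdraw(10)                  # the retry arrived too

labeled = Account(100)
labeled.withdraw_once("XYZ", 10)
labeled.withdraw_once("XYZ", 10)    # the retry is recognized and ignored
```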
Slide 18
Out of Order Arrival
• Any message may arrive multiple times
• Even after a long while
• This can be very confusing…
• Lots of possible message deliveries
[Diagram: messages A, B, C delivered to an item, then delivered again. “Arrg!”]
Applications find it difficult to ensure there are no latent bugs
Esoteric late retries of messages may find untested windows…
Slide 19
Outline
Introduction
Data, Transactions, and Scalability
Messaging across Items
Partner-State and State-Machines
Accountants Don’t Use Erasers
Accurate Representations of Historical Facts
Managing Uncertainty across Items
Bounds of Uncertainty in Loosely-Coupled Systems
Conclusion
Slide 20
Messages Connect Items
• Messages are the only way into and out of items
• They are produced by transactions
• They are consumed by transactions
• Transactions are local to the item
[Diagram: Item-X receives messages from Item-A, Item-B, Item-C, and Item-D, and sends messages to Item-A, Item-B, and Item-C]
Slide 21
Items Connected by Partnerships
• Mostly, messaging occurs between two partner items
• Usually, a two-way exchange moving both items’ state
• Each keeps data about how far its state has advanced…
[Diagram: Item-W, Item-X, Item-Y, and Item-Z connected by partnerships]
Slide 22
Tracking with Partner-State-Machines
• Partner-state-machine refers to the knowledge about a partner item
• Descriptions of what messages have been received
• Descriptions of what obligations exist to the partner
• The foundation for workflow to replace distributed transactions
[Diagram: Item-W, Item-X, Item-Y, and Item-Z, each holding a partner-state-machine (PSM-W, PSM-X, PSM-Z) for each of its partners]
• Two basic observations wrapped up in the “partner-state-machine”
• Work across items is workflow based on two-party relationships
• The granularity of the workflow participant is an item (fine-grained)
Slide 23
Idempotence, Partners, and
Partner-State-Machines
• Partner-state-machines manage idempotence
• They keep track of what’s been seen
• If it’s a repeat, ignore it
• Repeated messages eliminated via partnership
[Diagram: Item-X and Item-Y exchange messages 1, 2, 3 and A, B, C]
PSM-Y: Seen Msg-1, 2, 3…
PSM-X: Seen Msg-A, B, C…
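A rough sketch of a partner-state-machine doing duplicate elimination, assuming each message carries an ID that is unique within one partnership. The class and method names are illustrative only. Each item keeps one PSM per partner, so the same ID from a different partner is not treated as a repeat.

```python
class PartnerStateMachine:
    """What one item remembers about one partner (hypothetical structure)."""

    def __init__(self):
        self.seen = set()           # message IDs already processed

    def accept(self, msg_id):
        """Return True for a new message, False for a repeat."""
        if msg_id in self.seen:
            return False
        self.seen.add(msg_id)
        return True


class Item:
    def __init__(self, key):
        self.key = key
        self.partners = {}          # partner key -> PartnerStateMachine
        self.processed = []

    def receive(self, sender, msg_id, body):
        psm = self.partners.setdefault(sender, PartnerStateMachine())
        if psm.accept(msg_id):
            self.processed.append((sender, body))


item_y = Item("Item-Y")
for msg_id, body in [(1, "A"), (2, "B"), (1, "A"), (3, "C"), (2, "B")]:
    item_y.receive("Item-X", msg_id, body)
item_y.receive("Item-W", 1, "hello")   # same ID, different partner: not a repeat
```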
Slide 24
Retirement of Items
• It is normal for items to retire
• The shipment is shipped
• The order completes
• Activities advance to completion
• Incoming messages are accepted
• No new messages are needed
• Typical for the work of an item to complete…
• Retirement usually means “become read-only”
• Sometimes old items are deleted
Sometimes Items Exist for Long-Lived Purposes:
-- Inventory, Bank-Balance, Customer
-- Called “Resource-Items”
Not the topic for this talk… another talk is needed!
Slide 25
Outline
Introduction
Data, Transactions, and Scalability
Messaging across Items
Partner-State and State-Machines
Accountants Don’t Use Erasers
Accurate Representations of Historical Facts
Managing Uncertainty across Items
Bounds of Uncertainty in Loosely-Coupled Systems
Conclusion
Slide 26
“Append-Only” Data
• Many Kinds of Computing are “Append-Only”
• Lots of observations are made about the world
• Debits, credits, Purchase-Orders, Customer-Change-Requests, etc.
• As time moves on, more observations are added
• You can’t change the history but you can add new
observations
• Derived Results May Be Calculated
• Estimate of the “current” inventory
• Frequently inaccurate
• Historic Rollups Are Calculated
• Monthly bank statements
Slide 27
Databases and Transaction Logs
• Transaction Logs Are the Truth
• High-performance & write-only
• Describe ALL the changes to the data
• Database: the Current Opinion
• Describes the latest value of the data as perceived by the application
The Database Is a Cache of the Transaction Log!
It is the subset of the latest committed values represented in the transaction log…
[Diagram: Log and DB]
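To make "the database is a cache of the log" concrete, here is a toy replay function. The record format (`write`, `commit`, `abort` tuples) is invented for this sketch and is far simpler than a real write-ahead log. Only values from committed transactions reach the rebuilt database.

```python
def replay(log):
    """Rebuild the 'current opinion' from an append-only transaction log."""
    db = {}
    pending = {}                     # txn id -> writes not yet committed
    for record in log:
        kind, txn = record[0], record[1]
        if kind == "write":
            _, _, key, value = record
            pending.setdefault(txn, {})[key] = value
        elif kind == "commit":
            db.update(pending.pop(txn, {}))
        elif kind == "abort":
            pending.pop(txn, None)
    return db


log = [
    ("write", 1, "price", 10),
    ("commit", 1),
    ("write", 2, "price", 12),
    ("write", 3, "stock", 5),
    ("abort", 2),
    ("commit", 3),
    ("write", 4, "price", 99),       # never committed
]
db = replay(log)
```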
Slide 28
Accountants, Erasers, and Jail
• Accountants Go to Jail if They Use Erasers !!!
• The normal accounting practices allow for corrections
but not updates
• Corrections are added to the information
• The derived values are recalculated
• It is a common application paradigm to keep almost all data
as append-only
• The transactions themselves are append-only
• Sometimes they are eventually retired.
• The rollup (derived) summary may be recalculated
• Periodic snapshots of the rollup (derived) data are appended to
the record
• E.g. a monthly bank statement
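The no-eraser practice can be sketched as an append-only ledger. The `Ledger` and `Entry` names and the sample amounts are made up. A mistake is fixed by appending an adjusting entry, the balance is always derived from the full history, and statements are snapshots of that derived value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    amount: int
    memo: str


class Ledger:
    def __init__(self):
        self._entries = []          # append-only: no update, no delete
        self.statements = []        # periodic snapshots of the rollup

    def append(self, amount, memo):
        self._entries.append(Entry(amount, memo))

    def correct(self, index, right_amount):
        """Fix a mistake by appending an adjusting entry, not by erasing."""
        wrong = self._entries[index]
        self.append(right_amount - wrong.amount, f"correction of #{index}")

    def balance(self):
        """Derived value, recalculated from the full history."""
        return sum(e.amount for e in self._entries)

    def snapshot(self, label):
        self.statements.append((label, self.balance()))


ledger = Ledger()
ledger.append(100, "deposit")
ledger.append(-30, "debit, entered wrongly; should be -20")
ledger.snapshot("January")
ledger.correct(1, -20)
ledger.snapshot("February")
```

The January statement stays as it was issued; the correction shows up only in later rollups.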
Slide 29
Outline
Introduction
Data, Transactions, and Scalability
Messaging across Items
Partner-State and State-Machines
Accountants Don’t Use Erasers
Accurate Representations of Historical Facts
Managing Uncertainty across Items
Bounds of Uncertainty in Loosely-Coupled Systems
Conclusion
Slide 30
Versions and Distributed Systems
• Can’t have “the same” data at many locations
• Unless it is a snapshot
• Changing distributed data needs versions
• Creates a snapshot…
[Diagram: a Data Owning Service holds Monday’s, Tuesday’s, and Wednesday’s Price-Lists; Listening Partner Service-1, Service-5, Service-7, and Service-8 each hold copies of some of those versions]
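A small sketch of versioned snapshots, assuming the owning service labels each published price-list and never changes a published version. The `PriceListOwner` class and the prices are invented for illustration. Partners may hold different versions, but any two copies of the same version are identical.

```python
class PriceListOwner:
    """The owning service publishes immutable, labeled versions."""

    def __init__(self):
        self._versions = {}          # version label -> frozen price-list

    def publish(self, version, prices):
        if version in self._versions:
            raise ValueError(f"{version!r} is already published and immutable")
        self._versions[version] = tuple(sorted(prices.items()))

    def snapshot(self, version):
        return version, dict(self._versions[version])


owner = PriceListOwner()
owner.publish("Monday", {"widget": 10})
owner.publish("Tuesday", {"widget": 11})
owner.publish("Wednesday", {"widget": 12})

# Listening partners hold whichever versions have reached them so far.
service_1 = dict([owner.snapshot("Wednesday")])
service_7 = dict([owner.snapshot("Monday"), owner.snapshot("Tuesday")])
```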
Slide 31
DAGs of History
[Diagram: Service-1 through Service-4 hold versioned data (“A1”, “A1.1”, “A2”; “B1”, “B2”, “B3”; “C1”, “C2”, “C2.1”, “C3”; “D1”, “D1.1”, “D1.2”, “D2”, “D2.1”, “D3”), with the versions linked as a directed acyclic graph of history]
Outline
Introduction
Data, Transactions, and Scalability
Messaging across Items
Partner-State and State-Machines
Accountants Don’t Use Erasers
Accurate Representations of Historical Facts
Managing Uncertainty across Items
Bounds of Uncertainty in Loosely-Coupled Systems
Conclusion
Slide 33
Tentative Operations
• Items don’t share transactions
• Now what can we do?
• Items may accept tentative operations
• Like a reservation; may be cancelled later
• If cancelled, the receiving item must cope
• Special logic to deal with cancellations
[Diagram: a Tentative Op and a Cancellation exchanged between Item-A and Item-B]
Slide 34
Semantics of Tentative Operations
• Tentative operations must be reorderable
• When cancelled, a compensation must occur
• Other operations may have occurred since
• Operations and cancellations must be
reorderable!
[Diagram: Item-A, Item-B, and Item-C exchanging two Tentative Ops and a Cancellation, in numbered steps 1 to 3]
Slide 35
Semantics of Cancellation and
Confirmation
• Cancellation
• Cope with not doing tentative operation
• Not undo
• New operation to “make things right”
• Accepting tentative means it’s OK to cancel
• Confirmation
• Relinquish the right to cancel tentative op
• Sometimes time driven
• Hotel rooms confirm in the morning
• Every tentative op confirms or cancels
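The hotel example can be sketched as an item that accepts reservations as tentative operations. The `Hotel` class and its fields are hypothetical. Cancelling is handled as a new operation that releases the hold and records what was done, and every reservation ends up either confirmed or cancelled.

```python
class Hotel:
    """An item that accepts tentative operations (reservations)."""

    def __init__(self, rooms):
        self.rooms = rooms
        self.tentative = {}          # reservation id -> rooms held
        self.confirmed = {}
        self.notes = []              # new operations that "make things right"

    def reserve(self, res_id, count):
        held = sum(self.tentative.values()) + sum(self.confirmed.values())
        if held + count > self.rooms:
            return False
        self.tentative[res_id] = count
        return True

    def confirm(self, res_id):
        """The partner gives up the right to cancel."""
        self.confirmed[res_id] = self.tentative.pop(res_id)

    def cancel(self, res_id):
        """Not an undo: release the hold and record a compensating step."""
        count = self.tentative.pop(res_id)
        self.notes.append(f"released {count} room(s) held for {res_id}")


hotel = Hotel(rooms=2)
ok_a = hotel.reserve("A", 1)
ok_b = hotel.reserve("B", 1)
ok_c = hotel.reserve("C", 1)         # refused: both rooms are held
hotel.cancel("A")
hotel.confirm("B")
ok_c_retry = hotel.reserve("C", 1)   # accepted now that A's hold is released
```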
Slide 36
Outline
Introduction
Data, Transactions, and Scalability
Messaging across Items
Partner-State and State-Machines
Accountants Don’t Use Erasers
Accurate Representations of Historical Facts
Managing Uncertainty across Items
Bounds of Uncertainty in Loosely-Coupled Systems
Conclusion
Slide 37
Increasing & Decreasing
Uncertainty
• Each tentative operation increases your uncertainty
• You get more and more confused each time you accept a tentative
operation
• Each confirmation or cancellation decreases your
uncertainty
• It resolves the confusion imparted by the tentative operation it is
confirming or canceling
[Diagram: an Uncertainty scale; a Tentative Operation moves toward More Uncertain, a Cancellation or Confirmation moves toward Less Uncertain]
Slide 38
Bounded Uncertainty
• You can track the worst case situations for data values you
are managing
• If you keep inventory, you can know the lowest possible and highest
possible values
• Tentative operations move lowest and highest values apart
• This increases uncertainty
• Confirmations and cancellations move lowest and highest values
together
• This decreases uncertainty
• Knowing the bounds, you have Bounded Uncertainty
[Diagram: Probability plotted against Widget Inventory, bounded by Minimum Widgets Possible and Maximum Widgets Possible]
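The widget bounds can be tracked with a few lines of bookkeeping. In this sketch (class name and quantities are made up) a tentative operation is a signed quantity: negative for widgets that may leave, positive for widgets that may arrive. The lowest case assumes every pending removal happens; the highest assumes every pending arrival happens.

```python
class WidgetInventory:
    def __init__(self, on_hand):
        self.on_hand = on_hand
        self.pending = {}            # op id -> signed quantity (+in, -out)

    def tentative(self, op_id, delta):
        self.pending[op_id] = delta          # uncertainty grows

    def confirm(self, op_id):
        self.on_hand += self.pending.pop(op_id)   # uncertainty shrinks

    def cancel(self, op_id):
        self.pending.pop(op_id)                   # uncertainty shrinks

    def bounds(self):
        """(lowest, highest) possible count given all unresolved ops."""
        low = self.on_hand + sum(d for d in self.pending.values() if d < 0)
        high = self.on_hand + sum(d for d in self.pending.values() if d > 0)
        return low, high


inv = WidgetInventory(on_hand=100)
start = inv.bounds()
inv.tentative("order-1", -30)        # a customer may take 30
inv.tentative("shipment-1", +50)     # a supplier may deliver 50
wide = inv.bounds()
inv.confirm("order-1")
inv.cancel("shipment-1")
settled = inv.bounds()
```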
Slide 39
Acting on Bounded Uncertainty
• Knowing bounds on uncertainty allows many different
business rules:
• Refuse an order which may (in the worst case) result in widgets
overflowing the warehouse
• Calculate probability of worst case overflowing the warehouse
• Cost of temporary storage vs.
value of accepting order…
• Order food for hotel restaurant based on reservations and
probabilities
• May result in interesting work by applying risk
management algorithms…
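Two of the rules above, sketched under loud assumptions: the new order is treated as certain to arrive, each pending delivery confirms independently with the same probability `p_confirm`, and all numbers are invented. The first function is the strict worst-case rule; the second gives the probability that would feed a cost-versus-value decision.

```python
def accept_incoming(on_hand, pending_in, new_order, capacity):
    """Worst-case rule: refuse if every pending delivery could arrive."""
    return on_hand + sum(pending_in) + new_order <= capacity


def overflow_probability(on_hand, pending_in, new_order, capacity, p_confirm):
    """Chance of overflow if each pending delivery confirms independently."""
    outcomes = {on_hand + new_order: 1.0}    # stock level -> probability
    for qty in pending_in:
        nxt = {}
        for level, prob in outcomes.items():
            nxt[level + qty] = nxt.get(level + qty, 0.0) + prob * p_confirm
            nxt[level] = nxt.get(level, 0.0) + prob * (1 - p_confirm)
        outcomes = nxt
    return sum(p for level, p in outcomes.items() if level > capacity)


strict = accept_incoming(100, [50, 50], 40, capacity=200)
risk = overflow_probability(100, [50, 50], 40, capacity=200, p_confirm=0.5)
```

Here the strict rule refuses the order, while the probabilistic view says overflow happens only if both pending deliveries confirm.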
Slide 40
Outline
Introduction
Data, Transactions, and Scalability
Messaging across Items
Partner-State and State-Machines
Accountants Don’t Use Erasers
Accurate Representations of Historical Facts
Managing Uncertainty across Items
Bounds of Uncertainty in Loosely-Coupled Systems
Conclusion
Slide 41
Vocabulary and Assertions
New vocabulary for discussing scale
• Almost-infinite scaling: An environment demanding rapidly increasing data and computation over time
• Scale-agnostic app: An application that does not need to change to support almost-infinite scaling
• Item: A collection of data referenced by a single key; transactional scope of the scale-agnostic app
• Partner-State-Machine: Data used inside one item to describe its workflow state with a single partner item
Assertions about large scale apps
• Alternate indices aren’t transactionally consistent: As scale increases, the primary and alternate indices cannot be guaranteed to live together
• Items cooperate using fine-grained two-party workflow: No dist-txs → workflow; workflow participants are items; work coordinated across pairs
Slide 42
Takeaways
• Scale-agnostic application design
• Designing for scale leads you away from distributed transactions
• Local transactions are great; distributed transactions suck
• Programming for scale leads to separate pieces of data called items
• Items must live in separate transactions
• Items are only connected with messaging
• “Classic” workflow but fine-grained
• Separate items → messaging… but messaging is hard!
• Messages get lost and need retries
• Retries give at-least-once delivery
• Must have idempotent processing of messages
• Coping with idempotent messaging requires “partner-state-machines”
• One PSM per-partner per-side → holds the state of the relationship
• The scale-agnostic app uses partner-state-machines (“activities” in the paper) to cope with retries
• PSMs can compose to mask complexity
Slide 43
© 2007 Microsoft Corporation. All rights reserved.
This presentation is for informational purposes only.
MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.