Supporting Multi-row Distributed Transactions with Global Snapshot Isolation Using Bare-bones HBase
Chen Zhang
Hans De Sterck
University of Waterloo
Outline
Introduction
General Background
Snapshot Isolation (SI)
HBase
System Design
Transactional SI Protocol
System Performance
Future Work
General Background (1)
Database transactions have been widely used by websites,
analytical programs, etc.
Snapshot isolation (SI) has been adopted by major DBMS
for high throughput
No solution exists for traditional DBMS to be easily
replicated and scaled on clouds
Column-oriented data stores are proven to be scalable on
clouds (BigTable, HBase). However, multi-row distributed
transactions are not supported out-of-the-box
General Background (2)
Google recently published a paper at OSDI '10 (Oct. 4;
submission deadline May 7) about their “Percolator”
system, built on top of BigTable, for multi-row distributed
transactions with SI
Our paper describes an approach for multi-row
distributed transactions with SI on top of HBase, and it
turns out that Google’s system has many design elements
that are similar to ours
Snapshot Isolation (1)
Snapshot Isolation (SI)
For transaction T1 that starts at timestamp ts1
T1 is given the database snapshot up to ts1, and T1 can do
reads/writes on its own snapshot independently
When T1 commits, T1 checks to see if any other transactions have
committed conflicting data updates. If not, T1 commits
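The rule above can be sketched in a few lines. This is a minimal single-process illustration, not the system described in these slides: the class `SIStore` and its method names are hypothetical, and a plain counter stands in for the global timestamp source. A transaction reads from the snapshot at its start timestamp and commits only if no other transaction has committed a write to the same item since then.

```python
class SIStore:
    """Toy multi-version store illustrating snapshot isolation (hypothetical)."""

    def __init__(self):
        self.versions = {}    # key -> list of (commit_ts, value), ascending
        self.last_commit = 0  # timestamp of the most recent commit

    def begin(self):
        # A transaction is its start timestamp plus a private write buffer.
        return {"start": self.last_commit, "writes": {}}

    def read(self, txn, key):
        # Own writes first, then the newest version at or before the start.
        if key in txn["writes"]:
            return txn["writes"][key]
        visible = [v for ts, v in self.versions.get(key, []) if ts <= txn["start"]]
        return visible[-1] if visible else None

    def write(self, txn, key, value):
        txn["writes"][key] = value

    def commit(self, txn):
        # Abort if any written key was committed by someone else after our start.
        for key in txn["writes"]:
            if any(ts > txn["start"] for ts, _ in self.versions.get(key, [])):
                return False
        self.last_commit += 1
        for key, value in txn["writes"].items():
            self.versions.setdefault(key, []).append((self.last_commit, value))
        return True


store = SIStore()
t1, t2 = store.begin(), store.begin()
store.write(t1, "L1", "a")
store.write(t2, "L1", "b")
first = store.commit(t1)   # True: no conflicting commit since t1 started
second = store.commit(t2)  # False: t1 committed a write to L1 after t2 started
```

Both transactions start from the same snapshot, so the second committer detects the overlap on `L1` and aborts.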
Snapshot Isolation (2)
Strong SI vs SI
Strong SI requires every transaction T to see the most up-to-date snapshot of data
SI requires every transaction T to see a consistent snapshot
which can be any snapshot taken earlier than T’s start timestamp
Timestamp Ordering
[Figure: timeline of transactions T1, T2, T3 with start timestamps S1, S2, S3 and commit timestamps C1, C2, C3]
HBase
HBase is a column-oriented data store
A single global database-like table view
Multi-version data distinguished by timestamp
A data table is horizontally split into row regions and each region is
hosted by a region server
HBase guarantees atomic single-row reads/writes
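To make "multi-version data distinguished by timestamp" concrete, here is a small in-memory model of a single cell. It is only a sketch: `VersionedCell` and its methods are hypothetical names and do not mirror the HBase client API.

```python
import bisect


class VersionedCell:
    """One table cell holding several timestamped versions (hypothetical)."""

    def __init__(self):
        self._timestamps = []  # kept sorted ascending
        self._values = []      # _values[i] belongs to _timestamps[i]

    def put(self, timestamp, value):
        i = bisect.bisect_left(self._timestamps, timestamp)
        if i < len(self._timestamps) and self._timestamps[i] == timestamp:
            self._values[i] = value  # same timestamp: overwrite that version
        else:
            self._timestamps.insert(i, timestamp)
            self._values.insert(i, value)

    def get(self, timestamp):
        """Return the value written with exactly this timestamp, or None."""
        i = bisect.bisect_left(self._timestamps, timestamp)
        if i < len(self._timestamps) and self._timestamps[i] == timestamp:
            return self._values[i]
        return None

    def get_latest(self, max_timestamp=None):
        """Return the newest value at or before max_timestamp, or None."""
        if max_timestamp is None:
            i = len(self._timestamps)
        else:
            i = bisect.bisect_right(self._timestamps, max_timestamp)
        return self._values[i - 1] if i else None


cell = VersionedCell()
cell.put(5, "x")
cell.put(2, "y")
```

Reading an exact version with `get` corresponds to reading data by a known write timestamp, while `get_latest` corresponds to an ordinary "most recent version" read.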
Design-Overview (1)
General Design Objective
Deploys no extra programs and inherits HBase
properties
Scalability, fault tolerance, high throughput, access transparency,
etc.
Non-intrusive to user data and easy to adopt
No modification to existing user data
Implement a client library to manage transactions on the
client side autonomously; no server-side changes
Transactions put their own information into global tables
and query those tables for information about other
transactions to determine whether to commit or abort
Design-Overview (2)
General SI Protocol
Timestamp Ordering
[Figure: timeline of transactions T1, T2, T3 with start timestamps S1, S2, S3 and commit timestamps C1, C2, C3]
Every transaction, when it commits, obtains a unique,
strictly increasing commit timestamp, which determines
the order between transactions and is used to enforce SI
Every transaction commits successfully by inserting a
row in the Committed table
Every transaction, when it starts, reads the commit
timestamp of the most recently committed transaction
and uses that as its start timestamp
Design-Overview (3)
Simplified Protocol Walkthrough
Get start timestamp S, a snapshot of Committed table
T reads/writes versions of data identified by S
When T tries to commit
Checks for conflicting updates committed by other transactions by
scanning the Committed table
T writes a row into the Precommit table to indicate its attempt to
commit
Checks for conflicting commit attempts by scanning the Precommit
table
If both checks return no conflict, T proceeds to commit by
atomically inserting one row into the Committed table
Design-SI Protocol
For Read-only transactions:
Only need to obtain start timestamp and read the correct version of
data from the snapshot
No need to do Precommit/Commit
For Update transactions:
Get start timestamp ts
Read/write
Precommit
Commit
Design-SI Protocol
For Read-only Transaction Ti
Get start timestamp Si and maintain DataSet DS in
memory
Data read {(L1, data1),…}
To read data item at L1
If L1 is in DS, read from DS. Otherwise
Query Version table and get C1
Scan Committed table and get the most recent transaction Ci
that updates data to version V; update Version table with Ci
Use V to read data and add (L1, data) to DS if necessary
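The read path above might look as follows. This is a sketch under several assumptions that the slides do not state: plain dictionaries stand in for the HBase tables, each Committed row maps a written location to the write timestamp (version) of that write, and a cached Version-table entry newer than Si is ignored so that the scan falls back to the full range. The function name `si_read` and all parameter names are hypothetical.

```python
def si_read(location, start_ts, data_set, version_table, committed, data_table):
    """Read `location` as of snapshot `start_ts` (illustrative only).

    committed:     commit_ts -> {location: write_ts}   (assumed row layout)
    version_table: location -> commit_ts of a recent commit that updated it
    data_table:    (location, write_ts) -> value
    data_set:      per-transaction in-memory cache (DS)
    """
    if location in data_set:
        return data_set[location]

    # Use the cached commit timestamp as a lower bound for the scan, but only
    # if it is visible in this snapshot (assumption; see lead-in).
    cached = version_table.get(location, 0)
    lower = cached if cached <= start_ts else 0

    best = None
    for commit_ts in sorted(committed):
        if lower <= commit_ts <= start_ts and location in committed[commit_ts]:
            best = commit_ts  # keep the most recent visible commit
    if best is None:
        return None  # no committed version is visible in this snapshot

    # Only move the cached entry forward, never backward.
    if best > version_table.get(location, 0):
        version_table[location] = best

    version = committed[best][location]
    value = data_table[(location, version)]
    data_set[location] = value
    return value


committed = {1: {"L1": 101}, 2: {"L2": 102}, 3: {"L1": 103}}
data_table = {("L1", 101): "a", ("L2", 102): "b", ("L1", 103): "c"}
version_table = {}
old_value = si_read("L1", 2, {}, version_table, committed, data_table)
new_value = si_read("L1", 3, {}, version_table, committed, data_table)
```

With the Version table populated, later readers only scan Committed rows from the cached commit timestamp up to their own start timestamp instead of the whole table.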
Design-SI Protocol
For Update Transaction Ti (1)
Get start timestamp Si and maintain DataSet DS
Data read/written {(L1, data1),…}
Read data item at L1 (same as Read-only case)
Write
Directly write to data tables with unique timestamp Wi
Design-SI Protocol
For Update Transaction Ti (2)
Precommit
Get precommit label Pi
Scan the Committed table over the range [Si+1, ∞) for conflicting
commits with an overlapping write set. If no conflicts, proceed
Add a row Pi to the Precommit table. Scan the full range of the
Precommit table for other rows whose write set overlaps and that
have either nothing under the “Committed” column or a value under
the “Committed” column larger than Si. If no conflicts, proceed
Design-SI Protocol
For Update Transaction Ti (3)
Commit
Get Commit timestamp Ci
Add a row Ci to Committed table with data items in writeset as
columns (HBase atomic row write)
Add “Ci” to row Pi in Precommit table
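The precommit and commit steps for an update transaction can be put together in a short in-memory sketch. Everything here is illustrative: dictionaries stand in for the Committed and Precommit tables, `itertools.count` stands in for the global label and timestamp source, and the names `Tables`, `start`, and `try_commit` are hypothetical. Removing the Precommit row on abort is an assumption; the slides do not say what happens to it.

```python
import itertools


class Tables:
    """In-memory stand-ins for the Committed and Precommit tables."""

    def __init__(self):
        self.committed = {}  # Ci -> set of written locations
        self.precommit = {}  # Pi -> {"writes": set, "committed": Ci or None}
        self.labels = itertools.count(1)     # source of precommit labels Pi
        self.commit_ts = itertools.count(1)  # source of commit timestamps Ci


def start(tables):
    # Start timestamp: commit timestamp of the most recent committed row.
    return max(tables.committed, default=0)


def try_commit(tables, start_ts, write_set):
    """Return the commit timestamp Ci, or None if the transaction must abort."""
    write_set = set(write_set)

    # Precommit, check 1: commits in [Si+1, inf) with an overlapping write set.
    if any(c > start_ts and ws & write_set for c, ws in tables.committed.items()):
        return None

    # Precommit, check 2: announce the attempt, then look for other rows that
    # overlap and are either still pending or committed after Si.
    label = next(tables.labels)
    tables.precommit[label] = {"writes": write_set, "committed": None}
    conflict = any(
        other != label
        and (row["committed"] is None or row["committed"] > start_ts)
        and row["writes"] & write_set
        for other, row in tables.precommit.items()
    )
    if conflict:
        del tables.precommit[label]  # assumption: aborted attempts are removed
        return None

    # Commit: one row in Committed, then record Ci in the Precommit row.
    ci = next(tables.commit_ts)
    tables.committed[ci] = write_set
    tables.precommit[label]["committed"] = ci
    return ci


tables = Tables()
s1, s2 = start(tables), start(tables)
c1 = try_commit(tables, s1, {"L1"})        # commits
c2 = try_commit(tables, s2, {"L1"})        # aborts: overlaps a commit after s2
c3 = try_commit(tables, s2, {"L2", "L3"})  # commits: disjoint write set
```

In the real system the Committed insert relies on HBase's atomic row write; in this single-threaded sketch the dictionary assignment plays that role.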
Design-Timestamp Mechanism
For each transaction Ti, four labels/timestamps are used:
start timestamp Si, write timestamp Wi, precommit label Pi,
and commit timestamp Ci
Design-Timestamp Mechanism
Issue globally unique, incrementing timestamps/labels
using the HBase atomic incrementColumnValue
method on a single HBase table
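The idea of a single atomically incremented counter cell can be illustrated without HBase. In the sketch below a lock stands in for the atomicity that HBase's incrementColumnValue provides; the class `CounterTable`, its method `increment_column_value`, and the row and column names are hypothetical stand-ins, not the HBase client API.

```python
import threading


class CounterTable:
    """In-memory stand-in for a table with atomically incremented cells."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cells = {}

    def increment_column_value(self, row, column, amount=1):
        # Read-modify-write under a lock, so every caller sees a distinct value.
        with self._lock:
            new_value = self._cells.get((row, column), 0) + amount
            self._cells[(row, column)] = new_value
            return new_value


def issue_labels(table, count, out):
    for _ in range(count):
        out.append(table.increment_column_value("counter", "commit_ts"))


table = CounterTable()
issued = []
workers = [
    threading.Thread(target=issue_labels, args=(table, 1000, issued))
    for _ in range(4)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Four concurrent clients each request 1000 labels; because the increment is atomic, the 4000 values are distinct and form the contiguous range 1 to 4000.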
Design- Obtain Start Timestamp
For example, at the time T1 starts, there is a gap for C2 in
Committed table
The snapshot for T1 is C3, which includes L1 with version
W1, and L2 with version W3
Before T1 commits, C2 appears. The snapshot of T1 should
have included L1 with version W2
Use CommittedIndex Table to store recent snapshot
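The gap problem in this example can be reproduced with a small sketch. The row contents follow the example above (C1 writes L1 as W1, C2 writes L1 as W2, C3 writes L2 as W3). The helper `gap_free_start_timestamp` shows one possible way to choose a safe start timestamp; it is an assumption made for illustration and is not necessarily how the CommittedIndex table works, which the slides do not detail. All function names are hypothetical.

```python
def snapshot_versions(committed, start_ts):
    """Map each location to the version from its newest commit <= start_ts."""
    versions = {}
    for commit_ts in sorted(committed):
        if commit_ts <= start_ts:
            versions.update(committed[commit_ts])
    return versions


def gap_free_start_timestamp(committed, last_issued):
    """Largest timestamp up to which every issued commit timestamp has a row.

    Assumed remedy for illustration only (see lead-in).
    """
    ts = 0
    while ts + 1 <= last_issued and (ts + 1) in committed:
        ts += 1
    return ts


# When T1 starts, C1 and C3 are present but C2 has not been written yet.
committed = {1: {"L1": "W1"}, 3: {"L2": "W3"}}
before = snapshot_versions(committed, 3)
safe_before = gap_free_start_timestamp(committed, last_issued=3)

# C2 appears before T1 commits and changes what "snapshot C3" means.
committed[2] = {"L1": "W2"}
after = snapshot_versions(committed, 3)
safe_after = gap_free_start_timestamp(committed, last_issued=3)
```

Using C3 as the start timestamp gives T1 a snapshot that changes underneath it once C2 arrives, whereas a start timestamp below the gap stays stable.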
Performance (1)
Test the basic timestamp/label issuing mechanism using
a single HBase table
Performance (2)
Test the necessity of using the Version table to minimize the
range of the Committed table scan needed to find the most
recent data version
Performance (3)
Compare SI read performance with bare-bones
HBase read performance
Performance (4)
Compare SI write performance with bare-bones
HBase write performance
Future Work
Support strong SI with non-blocking reads while providing high
throughput
Add a mechanism for handling straggling/failed transactions
Explore and experiment with application scenarios
General Background (3)
Similarities-Compared with Percolator
Support ACID transactions and guarantee snapshot
isolation utilizing the multi-version data support from
the underlying column store
Implemented as a client library rather than as server-side
middleware
Dispense globally unique and well-ordered timestamps
from a central location
Share some similar protocols for the commit process
General Background (4)
Differences-Compared with Percolator
Percolator focuses on analytical workloads that tolerate
large latency; our system focuses on random data access
with high throughput and low latency for web
applications
Percolator achieves Strong SI but reads may block,
sacrificing throughput for data freshness; our system
achieves SI and does not block reads, sacrificing data
freshness for high throughput
Percolator requires modification to existing user data;
our system uses a separate set of tables, which is non-intrusive to user data
Percolator relies on BigTable single-row atomic
transactions, which are not supported by HBase
Questions
Thank you!