Supporting Multi-row Distributed Transactions with Global
Snapshot Isolation
Using Bare-bones HBase
Chen Zhang
Hans De Sterck
University of Waterloo
Outline
 Introduction
 General Background
 Snapshot Isolation (SI)
 HBase
 System Design
 Transactional SI Protocol
 System Performance
 Future Work
General Background (1)
 Database transactions have been widely used by websites,
analytical programs, etc.
 Snapshot isolation (SI) has been adopted by major DBMS
for high throughput
 No existing solution lets a traditional DBMS be easily
replicated and scaled on clouds
 Column-oriented data stores have proven scalable on
clouds (BigTable, HBase). However, they do not support
multi-row distributed transactions out of the box
General Background (2)
 Google recently published a paper at OSDI '10 (Oct. 4;
submission deadline May 7) about its “Percolator”
system, built on top of BigTable, for multi-row distributed
transactions with SI
 Our paper describes an approach for multi-row
distributed transactions with SI on top of HBase, and it
turns out that Google’s system has many design elements
that are similar to ours
Snapshot Isolation (1)
 Snapshot Isolation (SI)
 For transaction T1 that starts at timestamp ts1
 T1 is given the database snapshot up to ts1, and T1 can do
reads/writes on its own snapshot independently
 When T1 commits, T1 checks to see if any other transactions have
committed conflicting data updates. If not, T1 commits
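The commit-time check above can be illustrated with a toy, single-process model. This is only a sketch under stated assumptions: conflicts are detected by comparing write sets (a first-committer-wins rule), and the names `SnapshotStore`, `begin`, `read`, `write`, and `try_commit` are hypothetical, not part of the system presented here.

```python
class SnapshotStore:
    """Toy in-memory model of snapshot isolation (illustrative only)."""

    def __init__(self):
        self.commit_log = []     # list of (commit_ts, set of keys written)
        self.versions = {}       # key -> list of (commit_ts, value), ascending
        self.last_commit_ts = 0

    def begin(self):
        # The snapshot is everything committed up to the start timestamp.
        return {"start_ts": self.last_commit_ts, "writes": {}}

    def read(self, txn, key):
        if key in txn["writes"]:                 # a transaction sees its own writes
            return txn["writes"][key]
        visible = [value for ts, value in self.versions.get(key, [])
                   if ts <= txn["start_ts"]]
        return visible[-1] if visible else None

    def write(self, txn, key, value):
        txn["writes"][key] = value               # buffered in the private snapshot

    def try_commit(self, txn):
        # Conflict: someone committed an overlapping write set after our snapshot.
        mine = set(txn["writes"])
        for commit_ts, keys in self.commit_log:
            if commit_ts > txn["start_ts"] and keys & mine:
                return False
        self.last_commit_ts += 1
        for key, value in txn["writes"].items():
            self.versions.setdefault(key, []).append((self.last_commit_ts, value))
        self.commit_log.append((self.last_commit_ts, mine))
        return True
```

With two transactions that start from the same snapshot and both write the same key, the first `try_commit` succeeds and the second is rejected.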
Snapshot Isolation (2)
 Strong SI vs SI
 Strong SI requires every transaction T to see the most up-to-date
snapshot of data
 SI requires every transaction T to see a consistent snapshot,
which can be any snapshot taken earlier than T’s start timestamp
Timestamp Ordering
[Figure: timeline of transactions T1, T2, T3 with start timestamps S1, S2, S3 and commit timestamps C1, C2, C3]
HBase
 HBase is a column-oriented data store
 A single global database-like table view
 Multi-version data distinguished by timestamp
 A data table is horizontally split into row regions and each region is
hosted by a region server
 HBase guarantees atomic reads/writes on a single row
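To make the multi-version point concrete, the following sketch models one cell that keeps several timestamped values and returns the newest one at or below a requested timestamp. It is a toy model, not the HBase client API; `VersionedCell`, `put`, and `get` are hypothetical names.

```python
import bisect


class VersionedCell:
    """Toy model of a multi-version cell keyed by timestamp."""

    def __init__(self):
        self._timestamps = []    # kept sorted in ascending order
        self._values = []        # values aligned with _timestamps

    def put(self, ts, value):
        i = bisect.bisect_left(self._timestamps, ts)
        if i < len(self._timestamps) and self._timestamps[i] == ts:
            self._values[i] = value              # same timestamp: overwrite
        else:
            self._timestamps.insert(i, ts)
            self._values.insert(i, value)

    def get(self, max_ts):
        # Newest version whose timestamp is <= max_ts, or None if there is none.
        i = bisect.bisect_right(self._timestamps, max_ts)
        return self._values[i - 1] if i else None
```

A reader holding a snapshot timestamp only ever asks for versions at or below that timestamp, which is what lets older snapshots stay readable while newer versions are written.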
Outline
 Introduction
 General Background
 Snapshot Isolation (SI)
 HBase
 System Design
 Transactional SI Protocol
 System Performance
 Future Work
Design-Overview (1)
General Design Objective
 Requires no deployment of extra programs and inherits HBase
properties
 Scalability, fault tolerance, high throughput, access transparency,
etc.
 Non-intrusive to user data and easy to adopt
 No modification to existing user data
 Implement a client library to manage transactions at the
client side autonomously; no server-side changes
 Transactions put their own information into the global tables
 Meanwhile query those tables for information about other
transactions to determine whether to commit/abort
Design-Overview (2)
General SI Protocol
Timestamp Ordering
[Figure: timeline of transactions T1, T2, T3 with start timestamps S1, S2, S3 and commit timestamps C1, C2, C3]
 Every transaction, when it commits, obtains a unique,
strictly increasing commit timestamp, which determines
the order between transactions and is used to enforce SI
 Every transaction commits successfully by inserting a
row in the Committed table
 Every transaction, when it starts, reads the commit
timestamp of the most recently committed transaction
and uses that as its start timestamp
Design-Overview (3)
Simplified Protocol Walkthrough
 Get start timestamp S, a snapshot of Committed table
 T reads/writes versions of data identified by S
 When T tries to commit
 Checks for conflicting updates committed by other transactions by
scanning Committed table
 T writes a row into Precommit table to indicate its attempt to
commit
 Checks for conflicting commit attempts by scanning Precommit
table
 If both checks return no conflict, T proceeds to commit by
atomically inserting one row into Committed table
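The walkthrough above can be sketched end to end with two dictionaries standing in for the Committed and Precommit tables. This is a simplified single-process illustration, not the actual client library: the function names, the in-memory counters, and the choice to withdraw a Precommit row on conflict are all assumptions.

```python
import itertools

committed = {}    # commit_ts -> set of written locations (Committed table stand-in)
precommit = {}    # label -> {"writes": set, "committed": commit_ts or None}
_labels = itertools.count(1)       # hypothetical precommit-label source
_commit_ts = itertools.count(1)    # hypothetical commit-timestamp source


def start():
    """Start timestamp S: the newest commit timestamp currently visible."""
    return max(committed, default=0)


def try_commit(start_ts, write_set):
    """Return the commit timestamp, or None if the transaction must abort."""
    # 1. Conflicting updates committed after our snapshot?
    if any(ts > start_ts and ws & write_set for ts, ws in committed.items()):
        return None
    # 2. Announce the commit attempt.
    label = next(_labels)
    precommit[label] = {"writes": set(write_set), "committed": None}
    # 3. Conflicting attempts by others (still pending, or committed after S)?
    for other, row in list(precommit.items()):
        if other == label or not (row["writes"] & write_set):
            continue
        if row["committed"] is None or row["committed"] > start_ts:
            del precommit[label]     # withdraw our attempt (assumed cleanup policy)
            return None
    # 4. Commit with a single row insert, then mark the Precommit row.
    ts = next(_commit_ts)
    committed[ts] = set(write_set)
    precommit[label]["committed"] = ts
    return ts
```

Two transactions that start from the same snapshot and write the same location cannot both commit: the second fails the Committed scan in step 1.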
Design-SI Protocol
 For Read-only transactions:
 Only need to obtain start timestamp and read the correct version of
data from the snapshot
 No need to do Precommit/Commit
 For Update transactions:
 Get start timestamp ts
 Read/write
 Precommit
 Commit
Design-SI Protocol
For Read-only Transaction Ti
 Get start timestamp Si and maintain DataSet DS in
memory
 Data read {(L1, data1),…}
 To read data item at L1
 If L1 is in DS, read from DS. Otherwise
 Query Version table and get C1
 Scan Committed table and get the most recent transaction Ci
that updates data to version V; update Version table with Ci
 Use V to read data and add (L1, data) to DS if necessary
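One possible reading of this read path is sketched below. It assumes that the Version table caches, per location, a commit timestamp known to have updated that location, so that the Committed scan can start from there instead of from the beginning. The table layouts, the function name `si_read`, and the fallback when the cached hint is newer than the reader's snapshot are assumptions, not details given on the slide.

```python
def si_read(location, start_ts, ds, version_table, committed, data):
    """Read `location` as of snapshot `start_ts`.

    ds            -- the transaction's in-memory DataSet: location -> value
    version_table -- location -> cached commit timestamp (scan lower bound)
    committed     -- commit_ts -> {location: write_ts} (Committed table stand-in)
    data          -- (location, write_ts) -> value (user data table stand-in)
    """
    if location in ds:
        return ds[location]
    hint = version_table.get(location, 0)
    if hint > start_ts:
        hint = 0     # hint is newer than our snapshot: scan from the start (assumption)
    best = None
    for commit_ts in sorted(committed):
        if hint <= commit_ts <= start_ts and location in committed[commit_ts]:
            best = commit_ts         # keep the most recent qualifying commit
    if best is None:
        return None
    if best > version_table.get(location, 0):
        version_table[location] = best   # remember it to shorten later scans
    value = data[(location, committed[best][location])]
    ds[location] = value
    return value
```

The cached entry only narrows the scan range; correctness still comes from the Committed table, which is why a stale or too-new hint can be ignored safely in this sketch.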
Design-SI Protocol
For Update Transaction Ti (1)
 Get start timestamp Si and maintain DataSet DS
 Data read/written {(L1, data1),…}
 Read data item at L1 (same as Read-only case)
 Write
 Directly write to data tables with unique timestamp Wi
Design-SI Protocol
For Update Transaction Ti (2)
 Precommit
 Get precommit label Pi
 Scan Committed table at range [Si+1, ∞) for conflicting
commits with overlapping write set. If no conflicts, proceed
 Add a row Pi to Precommit table. Scan the full Precommit table
for other rows whose write set overlaps and that have either
nothing under the “Committed” column or a value under the
“Committed” column larger than Si. If no conflicts, proceed
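The Precommit scan condition can be written as a small predicate. This is a sketch only: the row layout (a `writeset` set plus an optional `committed` value) and the function name `precommit_conflict` are hypothetical stand-ins for the Precommit table.

```python
def precommit_conflict(precommit_rows, my_label, start_ts, write_set):
    """Return True if another Precommit row conflicts with this transaction.

    A row conflicts when its write set overlaps ours and it is either still
    pending (no commit timestamp recorded) or committed after our snapshot.
    """
    for label, row in precommit_rows.items():
        if label == my_label:
            continue                              # skip our own row Pi
        if not (row["writeset"] & write_set):
            continue                              # disjoint write sets never conflict
        committed_ts = row.get("committed")
        if committed_ts is None or committed_ts > start_ts:
            return True
    return False
```

Rows that committed at or before Si are already part of the transaction's snapshot, which is why they are not treated as conflicts.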
Design-SI Protocol
For Update Transaction Ti (3)
 Commit
 Get Commit timestamp Ci
 Add a row Ci to Committed table with data items in writeset as
columns (HBase atomic row write)
 Add “Ci” to row Pi in Precommit table
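The two writes of the commit step might look as follows, with dictionaries standing in for the two tables. The layout is an assumption: the Committed row is keyed by Ci and maps each written location to the write timestamp used for it. The function name `commit` is hypothetical.

```python
def commit(committed_table, precommit_table, precommit_label, commit_ts, writes):
    """Apply the commit step; `writes` maps each location to its write timestamp."""
    # Step 1: one row keyed by the commit timestamp, write-set items as columns.
    # In HBase this would be a single-row write, which is atomic.
    committed_table[commit_ts] = dict(writes)
    # Step 2: record the commit timestamp on the transaction's Precommit row.
    precommit_table[precommit_label]["Committed"] = commit_ts
```

Under the Precommit scan rule, a concurrent transaction that scans between the two steps still sees row Pi with an empty “Committed” column and treats it as a pending conflict, so in this sketch the ordering errs on the side of aborting rather than admitting a conflict.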
Design-Timestamp Mechanism
 For each transaction Ti, four labels/timestamps are used:
start timestamp Si, write timestamp Wi, precommit label Pi,
and commit timestamp Ci
Design-Timestamp Mechanism
 Issue globally unique and incremental timestamp/label
by using the HBase atomic incrementColumnValue
method on a single HBase Table
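The idea of issuing labels from one atomically incremented cell can be imitated in memory. The sketch below does not call HBase; `CounterTable` and its `increment_column_value` method are hypothetical stand-ins in which a lock plays the role of the atomic increment that HBase provides on a single row.

```python
import threading


class CounterTable:
    """In-memory stand-in for a table that supports atomic cell increments."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cells = {}

    def increment_column_value(self, row, column, amount=1):
        # The lock makes read-modify-write atomic, as the store would.
        with self._lock:
            value = self._cells.get((row, column), 0) + amount
            self._cells[(row, column)] = value
            return value


def issue_labels(table, n, out):
    """Draw n labels from the shared counter and record them in `out`."""
    for _ in range(n):
        out.append(table.increment_column_value("counter", "commit"))


if __name__ == "__main__":
    table, labels = CounterTable(), []
    workers = [threading.Thread(target=issue_labels, args=(table, 500, labels))
               for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # Every label is unique and the set has no holes.
    print(sorted(labels) == list(range(1, 2001)))
```

Because every increment returns a distinct value, concurrent clients never receive the same label, and the values are totally ordered.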
Design-Obtain Start Timestamp
 For example, at the time T1 starts, there is a gap for C2 in
Committed table
 The snapshot for T1 is C3, which includes L1 with version
W1, and L2 with version W3
 Before T1 commits, C2 appears. The snapshot of T1 should
have included L1 with version W2
 Use CommittedIndex Table to store recent snapshot
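The gap problem can be illustrated with a rule that only trusts the gap-free prefix of commit timestamps. The slide states only that a CommittedIndex table stores a recent snapshot; the rule below is an assumed illustration of why gaps matter, not the system's actual mechanism, and `gap_free_snapshot` is a hypothetical name. It also ignores the case of a transaction that obtains a commit timestamp and never writes its row.

```python
def gap_free_snapshot(committed_timestamps, known_good=0):
    """Largest timestamp ts such that every commit in (known_good, ts] is present.

    `known_good` is a previously recorded gap-free snapshot; scanning resumes there.
    """
    present = set(committed_timestamps)
    ts = known_good
    while ts + 1 in present:
        ts += 1
    return ts
```

In the example above, with C1 and C3 present and C2 missing, this rule yields 1 rather than 3, so a later-arriving C2 cannot invalidate the snapshot; once C2 appears, the same call yields 3.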
Outline
 Introduction
 General Background
 Snapshot Isolation (SI)
 HBase
 System Design
 Transactional SI Protocol
 System Performance
 Future Work
Performance (1)
 Test the basic timestamp/label issuing mechanism using
a single HBase table
Performance (2)
 Test whether the Version table is needed to minimize the
range of the Committed table scan that finds the most
recent data version
Performance (3)
 Compare SI read performance with bare-bones
HBase read performance
Performance (4)
 Compare SI write performance with bare-bones
HBase write performance
Future Work
 Support strong SI with non-blocking reads while providing high
throughput
 Add a mechanism for handling straggling/failed transactions
 Explore and experiment with application scenarios
General Background (3)
Similarities-Compared with Percolator
 Support ACID transactions and guarantee snapshot
isolation utilizing the multi-version data support from
the underlying column store
 Implemented as client library rather than as server side
middleware
 Dispense globally unique and well-ordered timestamps
from a central location
 Share some similar protocols for the commit process
General Background (4)
Differences-Compared with Percolator
 Percolator focuses on analytical workloads that tolerate
large latency; our system focuses on random data access
with high throughput and low latency for web
applications
 Percolator achieves Strong SI but reads may block,
sacrificing throughput for data freshness; our system
achieves SI and does not block reads, sacrificing data
freshness for high throughput
 Percolator requires modification to existing user data;
our system uses a separate set of tables, which is non-intrusive to user data
 Percolator relies on BigTable single-row atomic
transactions, which are not supported by HBase
Questions
 Thank you!