Supporting Multi-row Distributed Transactions with Global Snapshot Isolation Using Bare-bones HBase
Chen Zhang, Hans De Sterck
University of Waterloo

Outline
- Introduction
- General Background
- Snapshot Isolation (SI)
- HBase
- System Design
- Transactional SI Protocol
- System Performance
- Future Work

General Background (1)
- Database transactions have been widely used by websites, analytical programs, etc.
- Snapshot isolation (SI) has been adopted by major DBMS for high throughput
- No solution exists for traditional DBMS to be easily replicated and scaled on clouds
- Column-oriented data stores are proven to be scalable on clouds (BigTable, HBase). However, multi-row distributed transactions are not supported out-of-the-box

General Background (2)
- Google recently published a paper at OSDI '10, Oct. 4 (submission deadline May 7), about their “Percolator” system on top of BigTable for multi-row distributed transactions with SI
- Our paper describes an approach for multi-row distributed transactions with SI on top of HBase, and it turns out that Google’s system has many design elements that are similar to ours

Snapshot Isolation (1)
- Snapshot Isolation (SI)
- For transaction T1 that starts at timestamp ts1:
  - T1 is given the database snapshot up to ts1, and T1 can do reads/writes on its own snapshot independently
  - When T1 commits, T1 checks to see if any other transactions have committed conflicting data updates. If not, T1 commits

Snapshot Isolation (2)
- Strong SI vs SI
  - Strong SI requires every transaction T to see the most up-to-date snapshot of data
  - SI requires every transaction T to see a consistent snapshot, which can be any snapshot taken earlier than T’s start timestamp
[Figure: Timestamp Ordering, showing start timestamps S1-S3 and commit timestamps C1-C3 for transactions T1, T2, T3]

HBase
- HBase is a column-oriented data store
- A single global database-like table view
- Multi-version data distinguished by timestamp
- A data table is horizontally split into row regions and each region is hosted by a region server
- HBase guarantees single atomic row read/write

Outline
- Introduction
- General Background
- Snapshot Isolation (SI)
- HBase
- System Design
- Transactional SI Protocol
- System Performance
- Future Work

Design-Overview (1): General Design Objective
- No deployment of extra programs; inherits HBase properties
  - Scalability, fault tolerance, high throughput, access transparency, etc.
- Non-intrusive to user data and easy to adopt
  - No modification to existing user data
- Implement a client library to manage transactions at the client side autonomously; no server-side changes
  - Transactions put their own information into the global tables
  - Meanwhile, they query those tables for information about other transactions to determine whether to commit/abort

Design-Overview (2): General SI Protocol
[Figure: Timestamp Ordering, showing start timestamps S1-S3 and commit timestamps C1-C3 for transactions T1, T2, T3]
- Every transaction, when it commits, obtains a unique, strictly incremental commit timestamp, which determines the order between transactions and is used to enforce SI
- Every transaction commits successfully by inserting a row in the Committed table
- Every transaction, when it starts, reads the commit timestamp of the most recently committed transaction and uses that as its start timestamp

Design-Overview (3): Simplified Protocol Walkthrough
- Get start timestamp S, a snapshot of Committed table
- T reads/writes versions of data identified by S
- When T tries to commit:
  - Checks conflicting updates committed by other transactions by scanning Committed table
  - T writes a row into Precommit table to indicate its attempt to commit
  - Checks conflicting commit attempts by scanning Precommit table
  - If both checks return no conflict, T proceeds to commit by atomically inserting one row to Committed table

Design-SI Protocol
- For Read-only transactions:
  - Only need to obtain start timestamp and read the correct version of data from the snapshot
  - No need to do Precommit/Commit
- For Update transactions:
  - Get start timestamp ts
  - Read/write
  - Precommit
  - Commit

Design-SI Protocol: For Read-only Transaction Ti
- Get start timestamp Si and maintain DataSet DS in memory
  - Data read {(L1, data1),…}
- To read data item at L1:
  - If L1 is in DS, read from DS. Otherwise:
  - Query Version table and get C1
  - Scan Committed table and get the most recent transaction Ci that updates data to version V; update Version table with Ci
  - Use V to read data and add (L1, data) to DS if necessary

Design-SI Protocol: For Update Transaction Ti (1)
- Get start timestamp Si and maintain DataSet DS
  - Data read/written {(L1, data1),…}
- Read data item at L1 (same as Read-only case)
- Write
  - Directly write to data tables with unique timestamp Wi

Design-SI Protocol: For Update Transaction Ti (2)
- Precommit
  - Get precommit label Pi
  - Scan Committed table at range [Si+1, ∞) for conflicting commits with overlapping writeset. If no conflicts, proceed
  - Add a row Pi to Precommit table. Scan Precommit table at full range for other rows with overlapping writeset that have either nothing under column “Committed” or a value under column “Committed” larger than Si. If no conflicts, proceed

Design-SI Protocol: For Update Transaction Ti (3)
- Commit
  - Get commit timestamp Ci
  - Add a row Ci to Committed table with data items in writeset as columns (HBase atomic row write)
  - Add “Ci” to row Pi in Precommit table

Design-Timestamp Mechanism
- For each transaction Ti, four labels/timestamps are used

Design-Timestamp Mechanism
- Issue globally unique and incremental timestamps/labels by using the HBase atomic incrementColumnValue method on a single HBase table

Design-Obtain Start Timestamp
- For example, at the time T1 starts, there is a gap for C2 in Committed table
- The snapshot for T1 is C3, which includes L1 with version W1, and L2 with version W3
- Before T1 commits, C2 appears. The snapshot of T1 should have included L1 with version W2
- Use CommittedIndex table to store recent snapshot

Outline
- Introduction
- General Background
- Snapshot Isolation (SI)
- HBase
- System Design
- Transactional SI Protocol
- System Performance
- Future Work

Performance (1)
- Test the basic timestamp/label issuing mechanism using a single HBase table

Performance (2)
- Test the necessity of using Version table to minimize the range of scanning Committed table to find the most recent data version

Performance (3)
- Compare SI read performance with bare-bones HBase read

Performance (4)
- Compare SI write performance with bare-bones HBase write

Future Work
- Support strong SI with non-blocking reads that provide high throughput
- Add a mechanism for handling straggling/failed transactions
- Explore and experiment with application usage scenarios

General Background (3): Similarities Compared with Percolator
- Support ACID transactions and guarantee snapshot isolation utilizing the multi-version data support from the underlying column store
- Implemented as a client library rather than as server-side middleware
- Dispense globally unique and well-ordered timestamps from a central location
- Share some similar protocols for the commit process

General Background (4): Differences Compared with Percolator
- Percolator focuses on analytical workloads that tolerate large latency; our system focuses on random data access with high throughput and low latency for web applications
- Percolator achieves Strong SI but reads may be blocking, sacrificing throughput for data freshness; our system achieves SI and does not block reads, sacrificing data freshness for high throughput
- Percolator requires modification to existing user data; our system uses a separate set of tables, which is non-intrusive to user data
- Percolator relies on BigTable single-row atomic transactions, which are not supported by HBase

Questions
- Thank you!
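
Sketch: precommit/commit checks of the SI protocol

The precommit and commit steps from the Design-SI Protocol slides can be illustrated with a small single-process Python sketch. It is not the authors' client library and does not talk to HBase: the class names GlobalTables and Transaction, the dictionary layout of the two tables, the use of two separate counters, and the removal of an aborted transaction's Precommit row are all assumptions made for illustration. The counters stand in for labels issued through incrementColumnValue, and the dictionaries stand in for the Committed and Precommit tables.

```python
import itertools


class GlobalTables:
    """Hypothetical in-memory stand-in for the Committed and Precommit tables."""

    def __init__(self):
        # Assumption: two counters replace labels issued via incrementColumnValue.
        self._precommit_labels = itertools.count(1)
        self._commit_timestamps = itertools.count(1)
        self.committed = {}  # Ci -> frozenset of written locations
        self.precommit = {}  # Pi -> {"writeset": frozenset, "committed": Ci or None}

    def next_precommit_label(self):
        return next(self._precommit_labels)

    def next_commit_timestamp(self):
        return next(self._commit_timestamps)

    def latest_commit_timestamp(self):
        # Start timestamp = commit timestamp of the most recent commit (0 if none).
        return max(self.committed, default=0)


class Transaction:
    """Hypothetical update transaction that tracks only its writeset."""

    def __init__(self, tables):
        self.tables = tables
        self.start_ts = tables.latest_commit_timestamp()  # Si
        self.writeset = set()

    def write(self, location):
        self.writeset.add(location)

    def commit(self):
        tables = self.tables
        if not self.writeset:
            return True  # read-only: no Precommit/Commit needed
        ws = frozenset(self.writeset)

        # Precommit, check 1: scan Committed in [Si+1, inf) for overlapping writesets.
        if any(ci > self.start_ts and ws & other
               for ci, other in tables.committed.items()):
            return False

        # Precommit, check 2: add row Pi, then scan Precommit for other rows that
        # overlap and are either still pending or committed after Si.
        pi = tables.next_precommit_label()
        tables.precommit[pi] = {"writeset": ws, "committed": None}
        conflict = any(
            label != pi
            and (row["committed"] is None or row["committed"] > self.start_ts)
            and ws & row["writeset"]
            for label, row in tables.precommit.items()
        )
        if conflict:
            del tables.precommit[pi]  # assumption: an aborted attempt cleans up its row
            return False

        # Commit: one row insert into Committed, then record Ci in row Pi.
        ci = tables.next_commit_timestamp()
        tables.committed[ci] = ws
        tables.precommit[pi]["committed"] = ci
        return True


tables = GlobalTables()
t1, t2 = Transaction(tables), Transaction(tables)  # both start from the same snapshot
t1.write("L1")
t2.write("L1")
print(t1.commit())  # True: no conflicting commit or commit attempt
print(t2.commit())  # False: t1 committed an overlapping write after t2 started
```

Because everything runs in one process, the sketch shows only the conflict rules, not real concurrency; in the described system the atomicity of each step would come from HBase's single-row atomic write.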
