Exploiting Distributed Version Concurrency in a Transactional Memory Cluster
Kaloian Manassiev, Madalin Mihailescu
and Cristiana Amza
University of Toronto, Canada
Transactional Memory Programming Paradigm
Each thread executing a parallel region:
Announces start of a transaction
Executes operations on shared objects
Attempts to commit the transaction
If there is no data race, the commit succeeds and the operations take effect
Otherwise the commit fails, the operations are discarded, and the transaction is restarted
Simpler than locking!
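The pattern above can be sketched as a retry loop. This is an illustrative sketch, not the DTM API: the shared counter and the always-succeeding try_commit stand in for real transactional machinery.

```cpp
#include <cstdio>

// Sketch of the transactional pattern: read shared data, operate on a
// private copy, then attempt to commit; restart on failure.
// try_commit is a hypothetical validation hook (a real TM would fail it
// on a data race; here it always succeeds).
static int shared_counter = 0;   // a shared object

bool try_commit(int& target, int staged) {
    target = staged;             // publish the staged update
    return true;                 // no conflict detection in this sketch
}

void increment_transactionally() {
    for (;;) {                        // restart loop: retried on abort
        int staged = shared_counter;  // read the shared object
        staged += 1;                  // operate on the private copy
        if (try_commit(shared_counter, staged))
            break;                    // commit succeeded: effects take place
        // otherwise the staged update is discarded and the loop retries
    }
}
```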
Transactional Memory
Used in multiprocessor platforms
Our work: the first TM implementation on a cluster
Supports both SQL and parallel scientific applications (C++)
TM in a Multiprocessor Node
[Figure: T1 and T2 both active; T1 does Read(A) on A while T2 does Write(A) on a private copy of A]
Multiple physical copies of data
High memory overhead
TM on a Cluster
Key Idea 1. Distributed Versions
Different versions of data arise naturally in a cluster
Create a new version on a different node; others read their own versions
[Figure: one node writes the new version while the other nodes read their own versions]
Exploiting Distributed Page Versions
Distributed Transactional Memory (DTM)
[Figure: nodes mem0 .. memN, connected by a network, each running a transaction txn0 .. txnN against its own version (v0, v1, v2, v3, ...)]
Key Idea 2: Concurrent “Snapshots” Inside Each Node
[Figure, 3 steps: Txn0 (v1) and Txn1 (v2) run concurrently inside one node; each read returns the page version matching the transaction's snapshot (v1 for Txn0, v2 for Txn1)]
Distributed Transactional Memory
A novel fine-grained distributed concurrency control algorithm
Low memory overhead
Exploits distributed versions
Supports multithreading within the node
Provides 1-copy serializability
Outline
Programming Interface
Design
Data access tracking
Data replication
Conflict resolution
Experiments
Related work and Conclusions
Programming Interface
init_transactions()
begin_transaction()
allocate_dtmemory()
commit_transaction()
Need to declare TM variables explicitly
Data Access Tracking
DTM traps reads and writes to shared memory by either one of:
Virtual memory protection
Classic page-level memory protection technique
Operator overloading in C++
Trapping reads: conversion operator
Trapping writes: assignment ops (=, +=, ...) & increment/decrement (++/--)
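A minimal sketch of the operator-overloading approach, assuming hypothetical record_read/record_write hooks into the transaction's read and write sets (the hooks and counters are illustrative, not the paper's code):

```cpp
// Sketch of trapping reads via a conversion operator and writes via
// assignment/increment operators on a TM-managed integer type.
// record_read/record_write are assumed hooks; here they just count traps.
static int reads_trapped = 0, writes_trapped = 0;
inline void record_read(const void* addr)  { (void)addr; ++reads_trapped; }
inline void record_write(const void* addr) { (void)addr; ++writes_trapped; }

class dtm_int {
    int value;
public:
    dtm_int(int v = 0) : value(v) {}
    // Trapping reads: conversion operator
    operator int() const { record_read(&value); return value; }
    // Trapping writes: assignment ops and increment/decrement
    dtm_int& operator=(int v)  { record_write(&value); value = v;  return *this; }
    dtm_int& operator+=(int v) { record_write(&value); value += v; return *this; }
    dtm_int& operator++()      { record_write(&value); ++value;    return *this; }
};
```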
Data Replication
[Figure: update transaction T1 against a replicated memory image, Page 1 .. Page n on each node]
Twin Creation
[Figure, 2 steps: on T1's first write to a page (Wr p1, then Wr p2), a twin of that page (P1 Twin, P2 Twin) is created before the page is modified]
Diff Creation
[Figure: at commit, T1's modified pages are compared against their twins to produce diffs of the changes]
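Twin and diff creation can be sketched as follows, with a page simplified to an array of words (the Page/Diff types and function names are illustrative assumptions, not the paper's implementation):

```cpp
#include <vector>
#include <utility>
#include <cstdint>
#include <cstddef>

// A "page" simplified to a vector of words.
using Page = std::vector<uint32_t>;

// A diff records only the words that changed: (word index, new value).
struct Diff {
    std::vector<std::pair<size_t, uint32_t>> changes;
};

// On the first write to a page, the runtime keeps a twin: a pristine copy.
Page make_twin(const Page& page) { return page; }

// At commit, compare the page against its twin word by word.
Diff make_diff(const Page& twin, const Page& page) {
    Diff d;
    for (size_t i = 0; i < page.size(); ++i)
        if (page[i] != twin[i])
            d.changes.push_back({i, page[i]});
    return d;
}
```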
Broadcast of the Modifications at Commit
[Figure: all nodes at Latest Version = 7; updater T1 broadcasts its diff as version 8]
Other Nodes Enqueue Diffs
[Figure: receiving nodes enqueue the version-8 diff on the modified pages without applying it]
Update Latest Version
[Figure: receiving nodes advance Latest Version from 7 to 8]
Other Nodes Acknowledge Receipt
[Figure: receiving nodes send Ack (vers 8) back to T1's node]
T1 Commits
[Figure: all nodes now at Latest Version = 8; T1's transaction commits]
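The commit sequence above (broadcast, enqueue, version update, ack, commit) can be sketched as follows; a synchronous in-process loop stands in for the network, and the Node/VersionedDiff types are illustrative assumptions:

```cpp
#include <vector>
#include <deque>
#include <string>

// A versioned diff as broadcast at commit.
struct VersionedDiff { int version; std::string payload; };

struct Node {
    int latest_version = 7;                 // as in the example above
    std::deque<VersionedDiff> diff_queue;   // enqueued diffs, applied lazily
};

// Broadcast the committer's diff to all other nodes and collect acks;
// the committer commits once every node has acknowledged the new version.
bool commit_broadcast(Node& committer, std::vector<Node*>& others,
                      const std::string& diff) {
    VersionedDiff vd{committer.latest_version + 1, diff};
    int acks = 0;
    for (Node* n : others) {
        n->diff_queue.push_back(vd);    // other nodes enqueue the diff
        n->latest_version = vd.version; // ...update Latest Version
        ++acks;                         // ...and acknowledge receipt
    }
    if (acks == (int)others.size()) {
        committer.latest_version = vd.version;  // T1 commits
        return true;
    }
    return false;
}
```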
Lazy Diff Application
Queued diffs are applied to a page only when a transaction actually reads it
[Figure, 5 steps: at Latest Version = 8, T2 (v2) reads P1 and P2, applying queued diffs only up to version 2; T3 (v8) reads PN, applying its queued diffs up to version 8; newer diffs stay queued]
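Lazy diff application can be sketched as follows, with a page simplified to a single word and the data layout assumed for illustration (not the paper's structures):

```cpp
#include <vector>
#include <utility>

// A page with a queue of pending (version, value) diffs, in version order.
struct Page {
    int applied_version = 0;                  // version currently materialized
    std::vector<std::pair<int, int>> queued;  // pending diffs, oldest first
    int value = 0;                            // page contents, one word here
};

// A transaction at snapshot `txn_version` reads the page: apply all queued
// diffs up to that version, leave newer diffs queued for later readers.
int lazy_read(Page& p, int txn_version) {
    while (!p.queued.empty() && p.queued.front().first <= txn_version) {
        p.applied_version = p.queued.front().first;
        p.value = p.queued.front().second;
        p.queued.erase(p.queued.begin());
    }
    return p.value;
}
```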
Waiting Due to Conflict
[Figure: T3 (v8) reads PN and P2 while T2 (v2) is still reading P2; T3 waits until T2 commits]
Transaction Abort Due to Conflict
[Figure, 2 steps: T2 (v2) is reading P1 and P2 when T3 (v8) reads P2; applying the version-8 diff to P2 conflicts with T2's version-2 snapshot: CONFLICT!, and a transaction aborts]
Write-Write Conflict Resolution
Can be done in two ways:
Executing all updates on a master node, which enforces a serialization order
OR
Aborting the local update transaction upon receiving a conflicting diff flush
More on this in the paper
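The second option can be sketched as follows; the LocalTxn structure and the write set of page ids are illustrative assumptions:

```cpp
#include <set>

// An in-flight local update transaction with the pages it has written.
struct LocalTxn {
    std::set<int> write_set;   // page ids written by the transaction
    bool aborted = false;
};

// Called when a remote diff flush arrives for `page_id`: abort the local
// update transaction if the incoming diff conflicts with its writes.
// Returns whether the transaction is (now) aborted.
bool on_diff_flush(LocalTxn& txn, int page_id) {
    if (!txn.aborted && txn.write_set.count(page_id)) {
        txn.aborted = true;    // local updates discarded; transaction restarts
    }
    return txn.aborted;
}
```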
Experimental Platform
Cluster of Dual AMD Athlon Computers
512 MB RAM
1.5GHz CPUs
RedHat Fedora Linux OS
Benchmarks for Experiments
TPC-W e-commerce benchmark
Models an on-line book store
Industry-standard workload mixes
Browsing (5% updates)
Shopping (20% updates)
Ordering (50% updates)
Database size of ~600MB
Hash-table micro-benchmark (in paper)
Application of DTM for E-Commerce
[Figure: customers send HTTP requests over the Internet to replicated Web server / App server pairs, which issue SQL over RPC to the DATABASE tier]
Application of DTM for E-Commerce
We use a Transactional Memory Cluster as the DB tier
[Figure: cluster architecture]
Implementation Details
We use MySQL's in-memory HEAP tables
RB-Tree main-memory index
No transactional properties; provided by inserting TM calls
Multiple threads running on each node
Baseline for Comparison
State-of-the-art conflict-aware protocol for scaling e-commerce on clusters
Coarse-grained (per-table) concurrency control
(USITS'03, Middleware'03)
Throughput Scaling
[Chart: Throughput (WIPS, 0-350) vs. # of Slave Replicas (0-8) for the Ordering, Shopping, and Browsing mixes]
Fraction of Aborted Transactions
# of slaves   Ordering   Shopping   Browsing
1             1.15%      1.44%      0.63%
2             0.35%      2.27%      1.34%
4             0.07%      1.70%      2.37%
6             0.02%      0.41%      2.07%
8             0.00%      0.22%      1.59%
Comparison (browsing)
[Chart: Throughput (WIPS, 0-350) vs. Number of Replicas (0-8), Conflict-Aware vs. DTM]
Comparison (shopping)
[Chart: Throughput (WIPS, 0-350) vs. Number of Replicas (0-8), Conflict-Aware vs. DTM]
Comparison (ordering)
[Chart: Throughput (WIPS, 0-200) vs. Number of Replicas (0-8), Conflict-Aware vs. DTM]
Related Work
Distributed concurrency control for database applications
Postgres-R(SI), Wu and Kemme (ICDE'05)
Ganymed, Plattner and Alonso (Middleware'04)
Distributed object stores
Argus ('83), QuickStore ('94), OOPSLA'03
Distributed Shared Memory
TreadMarks, Keleher et al. (USENIX’94)
Tang et al. (IPDPS’04)
Conclusions
New software-only transactional memory scheme on a cluster
Fine-grained distributed concurrency control
Both strong consistency and scaling
Exploits distributed versions, low memory overheads
Improved throughput scaling for e-commerce web sites
Questions?
Backup slides
Example Program
#include <dtm_types.h>

typedef struct Point {
    dtm_int x;
    dtm_int y;
} Point;

init_transactions();
for (int i = 0; i < 10; i++) {
    begin_transaction();
    Point *p = allocate_dtmemory();
    p->x = rand();
    p->y = rand();
    commit_transaction();
}
Query weights
[Chart: relative weights of reads vs. writes (0.0-1.0) per workload: Ord Idx(0.35), Shp Idx(0.1), Brw Idx(0.03), Ord,No Idx(0.26), Shp,No Idx(0.07), Brw,No Idx(0.02)]
Decreasing the fraction of aborts
[Chart: fraction of aborts for M + 2S through M + 8S, each with and without the conflict-reduction optimization; values range from 1.34% to 2.83%, and conflict reduction lowers the abort fraction]
Micro benchmark experiments
[Chart: Throughput (x 1000, 0-1200) vs. number of machines (1-10) for 1%, 5%, 10%, 15%, and 20% update mixes]
Micro benchmark experiments (with read-only optimization)
[Chart: Throughput (x 1000, 0-500) vs. number of machines (1-10), R/O Opt vs. Base]
Fraction of aborts
# of machines   1     2      4      6      8      10
% aborts        0     0.57   1.69   2.94   4.05   5.08