Presentation slides - University of Toronto

Exploiting Distributed Version Concurrency in a Transactional Memory Cluster

Kaloian Manassiev, Madalin Mihailescu and Cristiana Amza
University of Toronto, Canada
Transactional Memory Programming Paradigm

Each thread executing a parallel region:
- Announces the start of a transaction
- Executes operations on shared objects
- Attempts to commit the transaction

If there is no data race, the commit succeeds and the operations take effect. Otherwise the commit fails, the operations are discarded, and the transaction is restarted.

Simpler than locking!
Transactional Memory

- Used in multiprocessor platforms
- Our work: the first TM implementation on a cluster
- Supports both SQL and parallel scientific applications (C++)
TM in a Multiprocessor Node

[Diagram: transactions T1 and T2 are active; T2: Write(A) creates a private copy of A, while T1: Read(A) reads the original]

- Multiple physical copies of data
- High memory overhead
TM on a Cluster

Key Idea 1: Distributed Versions
- Different versions of data arise naturally in a cluster
- Create the new version on a different node; other nodes read their own versions

[Diagram: one node writes a new version while the other nodes read their local versions]
Exploiting Distributed Page Versions

Distributed Transactional Memory (DTM)

[Diagram: nodes mem0 … memN connected by a network, each running a transaction (txn0 … txnN) against its own page version (v0, v1, v2, v3, …)]
Key Idea 2: Concurrent “Snapshots” Inside Each Node

[Diagram, animated: within one node, Txn0 runs at snapshot v1 and Txn1 at snapshot v2; each transaction reads the page versions belonging to its own snapshot]
Distributed Transactional Memory

A novel fine-grained distributed concurrency control algorithm:
- Low memory overhead
- Exploits distributed versions
- Supports multithreading within the node
- Provides 1-copy serializability
Outline

- Programming Interface
- Design
  - Data access tracking
  - Data replication
  - Conflict resolution
- Experiments
- Related work and Conclusions
Programming Interface

- init_transactions()
- begin_transaction()
- allocate_dtmemory()
- commit_transaction()

TM variables need to be declared explicitly.
Data Access Tracking

DTM traps reads and writes to shared memory by either one of:
- Virtual memory protection
  - Classic page-level memory protection technique
- Operator overloading in C++
  - Trapping reads: conversion operator
  - Trapping writes: assignment operators (=, +=, …) and increment/decrement (++/--)
Data Replication

[Diagram: update transaction T1 runs against pages Page 1 … Page n, replicated on another node]

Twin Creation

[Diagram, animated: on T1's first write to a page (Wr p1, then Wr p2), DTM creates a twin, a pristine copy of that page, before the write proceeds]

Diff Creation

[Diagram: at commit time, each dirty page is compared against its twin to produce a diff of its modifications]
Broadcast of the Modifications at Commit

[Diagram: all nodes start at Latest Version = 7; updating transaction T1 broadcasts its diffs tagged with version 8]

Other Nodes Enqueue Diffs

[Diagram: receiving nodes enqueue the version-8 diffs on the affected pages without applying them]

Update Latest Version

[Diagram: each receiving node advances its Latest Version from 7 to 8]

Other Nodes Acknowledge Receipt

[Diagram: receiving nodes send Ack (vers 8) back to T1's node]

T1 Commits

[Diagram: once the acknowledgements arrive, T1 commits; Latest Version = 8 on all nodes]
Lazy Diff Application

[Diagram, animated: transaction T2 runs at snapshot version 2 and reads pages P1 and P2 while the node's Latest Version is 8; each page holds a queue of pending diffs (v0 … v8). On each access, DTM applies only the queued diffs up to the reader's snapshot version. Later, transaction T3 at snapshot version 8 reads page N, which triggers application of that page's diffs up to v8.]
Waiting Due to Conflict

[Diagram: T3 at snapshot version 8 issues Rd(PN, P2) while T2 at snapshot version 2 is still reading P2; T3 waits until T2 commits before the v8 diff can be applied to P2]
Transaction Abort Due to Conflict

[Diagram, animated: T2 at snapshot version 2 is reading P1 and P2 when T3 at snapshot version 8 issues Rd(P2); applying the v8 diff to P2 conflicts with T2's version-2 snapshot (CONFLICT!), so the transaction is aborted]
Write-Write Conflict Resolution

Can be done in two ways:
- Executing all updates on a master node, which enforces the serialization order
- OR aborting the local update transaction upon receiving a conflicting diff flush

More on this in the paper.
Experimental Platform

- Cluster of dual AMD Athlon computers
  - 512 MB RAM
  - 1.5 GHz CPUs
  - RedHat Fedora Linux OS
Benchmarks for Experiments

- TPC-W e-commerce benchmark
  - Models an on-line book store
  - Industry-standard workload mixes:
    - Browsing (5% updates)
    - Shopping (20% updates)
    - Ordering (50% updates)
  - Database size of ~600 MB
- Hash-table micro-benchmark (in paper)
Application of DTM for E-Commerce

[Diagram: customers connect over the Internet via HTTP to web servers; web servers invoke app servers via RPC; app servers issue SQL to the database tier]

We use a Transactional Memory Cluster as the DB tier.

[Diagram: cluster architecture]
Implementation Details
 We
use MySQL’s in-memory HEAP
tables
RB-Tree main-memory index
 No transactional properties

 Provided
 Multiple
node
by inserting TM calls
threads running on each
Baseline for Comparison
 State-of-the-art
Conflict-aware
protocol for scaling e-commerce on
clusters

Coarse grained (per-table) concurrency
control
(USITS’03, Middleware’03)
Throughput Scaling

[Chart: throughput (WIPS, 0-350) vs. number of slave replicas (0-8) for the Ordering, Shopping, and Browsing mixes]
Fraction of Aborted Transactions

# of slaves   Ordering   Shopping   Browsing
1             1.15%      1.44%      0.63%
2             0.35%      2.27%      1.34%
4             0.07%      1.70%      2.37%
6             0.02%      0.41%      2.07%
8             0.00%      0.22%      1.59%
Comparison (browsing)

[Chart: throughput (WIPS, 0-350) vs. number of replicas (0-8), Conflict-Aware vs. DTM]
Comparison (shopping)

[Chart: throughput (WIPS, 0-350) vs. number of replicas (0-8), Conflict-Aware vs. DTM]
Comparison (ordering)

[Chart: throughput (WIPS, 0-200) vs. number of replicas (0-8), Conflict-Aware vs. DTM]
Related Work

- Distributed concurrency control for database applications
  - Postgres-R(SI), Wu and Kemme (ICDE’05)
  - Ganymed, Plattner and Alonso (Middleware’04)
- Distributed object stores
  - Argus (’83), QuickStore (’94), OOPSLA’03
- Distributed Shared Memory
  - TreadMarks, Keleher et al. (USENIX’94)
  - Tang et al. (IPDPS’04)
Conclusions

- New software-only transactional memory scheme on a cluster
  - Both strong consistency and scaling
- Fine-grained distributed concurrency control
  - Exploits distributed versions, low memory overheads
- Improved throughput scaling for e-commerce web sites

Questions?
Backup slides
Example Program

#include <dtm_types.h>

typedef struct Point {
    dtm_int x;
    dtm_int y;
} Point;

init_transactions();
for (int i = 0; i < 10; i++) {
    begin_transaction();
    Point *p = allocate_dtmemory();
    p->x = rand();
    p->y = rand();
    commit_transaction();
}
Query weights

[Chart: per-query read and write weights (0.0-1.0) for each workload mix, with and without index: Ord Idx (0.35), Shp Idx (0.1), Brw Idx (0.03); Ord No Idx (0.26), Shp No Idx (0.07), Brw No Idx (0.02)]
Decreasing the fraction of aborts

[Chart: fraction of aborts (ranging from 1.34% to 2.83%) for master plus 2, 4, 6, or 8 slaves (M + 2S … M + 8S), each with and without the conflict-reduction optimization ("Confl. Reduce")]
Micro benchmark experiments

[Chart: throughput (× 1000, 0-1200) vs. number of machines (1-10) for the 1%, 5%, 10%, 15%, and 20% workloads]
Micro benchmark experiments (with read-only optimization)

[Chart: throughput (× 1000, 0-500) vs. number of machines (1-10), R/O Opt vs. Base]
Fraction of aborts

# of machines   1      2      4      6      8      10
% aborts        0      0.57   1.69   2.94   4.05   5.08