Optimistic Intra-Transaction Parallelism on Chip-Multiprocessors

advertisement
Optimistic Intra-Transaction
Parallelism on
Chip-Multiprocessors
Chris Colohan1, Anastassia Ailamaki1,
J. Gregory Steffan2 and Todd C. Mowry1,3
1Carnegie
Mellon University
2University of Toronto
3Intel Research Pittsburgh
Chip Multiprocessors are Here!
AMD Opteron
IBM Power 5
Intel Yonah



2 cores now, soon will have 4, 8, 16, or 32
Multiple threads per core
How do we best use them?
Copyright 2005 Chris Colohan
2
Multi-Core Enhances Throughput
Users
Database Server
Transactions
DBMS
Database
Cores can run concurrent transactions
and improve throughput
Copyright 2005 Chris Colohan
3
Multi-Core Enhances Throughput
Users
Database Server
Transactions
DBMS
Database
Can multiple cores improve
transaction latency?
Copyright 2005 Chris Colohan
4
Parallelizing transactions
DBMS
SELECT cust_info FROM customer;
UPDATE district WITH order_id;
INSERT order_id INTO new_order;
foreach(item) {
GET quantity FROM stock;
quantity--;
UPDATE stock WITH quantity;
INSERT item INTO order_line;
}

Intra-query parallelism



Used for long-running queries (decision support)
Does not work for short queries
Short queries dominate in commercial workloads
Copyright 2005 Chris Colohan
5
Parallelizing transactions
DBMS
SELECT cust_info FROM customer;
UPDATE district WITH order_id;
INSERT order_id INTO new_order;
foreach(item) {
GET quantity FROM stock;
quantity--;
UPDATE stock WITH quantity;
INSERT item INTO order_line;
}

Intra-transaction parallelism


Each thread spans multiple queries
Hard to add to existing systems!

Need to change interface, add latches and locks, worry
about correctness of parallel execution…
Copyright 2005 Chris Colohan
6
Parallelizing transactions
DBMS
SELECT cust_info FROM customer;
UPDATE district WITH order_id;
INSERT order_id INTO new_order;
foreach(item) {
GET quantity FROM stock;
quantity--;
UPDATE stock WITH quantity;
INSERT item INTO order_line;
}

Intra-transaction parallelism

Thread Level Speculation (TLS)
Hard to add to existing systems!
makes
parallelization
Need
to change
interface, add latcheseasier.
and locks, worry

Breaks transaction into threads

about correctness of parallel execution…
Copyright 2005 Chris Colohan
7
Thread Level Speculation (TLS)
Epoch 1
Epoch 2
=*p
*p=
*p=
Time
=*q
*q=
*q=
=*p
=*p
=*q
=*q
Sequential
Copyright 2005 Chris Colohan
Parallel
8
Thread Level Speculation (TLS)
Epoch 1
*p=
Epoch 2
Violation!
=*p 
*p=

Time
R2
=*p
*q=
*q=


=*q 
=*p
Use epochs
Detect violations
Restart to recover
Buffer state
Worst case:

=*q

Best case:

Sequential
Sequential
Fully parallel
Parallel
Data dependences limit performance.
Copyright 2005 Chris Colohan
9
A Coordinated Effort
TPC-C
Transactions
DBMS
BerkeleyDB
Hardware
Simulated machine
Copyright 2005 Chris Colohan
10
A Coordinated Effort
Transaction
Programmer
DBMS Programmer
Choose epoch
boundaries
Remove performance
bottlenecks
Hardware Developer
Add TLS support to
architecture
Copyright 2005 Chris Colohan
11
So what’s new?

Intra-transaction parallelism






Without changing the transactions
With minor changes to the DBMS
Without having to worry about locking
Without introducing concurrency bugs
With good performance
Halve transaction latency on four cores
Copyright 2005 Chris Colohan
12
Related Work

Optimistic Concurrency Control (Kung82)

Sagas (Molina&Salem87)

Transaction chopping (Shasha95)
Copyright 2005 Chris Colohan
13
Outline






Introduction
Related work
Dividing transactions into epochs
Removing bottlenecks in the DBMS
Results
Conclusions
Copyright 2005 Chris Colohan
14
Case Study: New Order (TPC-C)
GET cust_info FROM customer;
UPDATE district WITH order_id;
INSERT order_id INTO new_order;
foreach(item) {
GET quantity FROM stock
WHERE i_id=item;
UPDATE stock WITH quantity-1
WHERE i_id=item;
INSERT item INTO order_line;
}

78% of transaction
execution time
Only dependence is the quantity field

Very unlikely to occur (1/100,000)
Copyright 2005 Chris Colohan
15
Case Study: New Order (TPC-C)
GET cust_info FROM customer;
UPDATE district WITH order_id;
INSERT order_id INTO new_order;
foreach(item) {
GET quantity FROM stock
WHERE i_id=item;
UPDATE stock WITH quantity-1
WHERE i_id=item;
GET cust_info FROM customer;
INSERT item INTO order_line;
UPDATE district WITH order_id;
}
INSERT order_id INTO new_order;
TLS_foreach(item) {
GET quantity FROM stock
WHERE i_id=item;
UPDATE stock WITH quantity-1
WHERE i_id=item;
INSERT item INTO order_line;
}
Copyright 2005 Chris Colohan
16
Outline






Introduction
Related work
Dividing transactions into epochs
Removing bottlenecks in the DBMS
Results
Conclusions
Copyright 2005 Chris Colohan
17
Time
Dependences in DBMS
Copyright 2005 Chris Colohan
18
Dependences in DBMS
Time
Dependences serialize execution!
Copyright 2005 Chris Colohan
Performance tuning:
 Profile execution
 Remove bottleneck dependence
 Repeat
19
Buffer Pool Management
CPU
get_page(5)
put_page(5)
Buffer Pool
ref: 0
1
Copyright 2005 Chris Colohan
20
Buffer Pool Management
get_page(5)
put_page(5)
get_page(5)
Time
CPU
put_page(5)
get_page(5)
put_page(5)
Buffer Pool
ref: 0
Copyright 2005 Chris Colohan
get_page(5)
put_page(5)
TLS ensures first
epoch gets page first.
Who cares?
21
Buffer Pool Management
Escape speculation
get_page(5)
CPU
put_page(5)
Invoke operation
Store undo function
Resume speculation
get_page(5)
Time
•
•
•
•
put_page(5)
get_page(5)
put_page(5)
Buffer Pool
ref: 0
Copyright 2005 Chris Colohan
get_page(5)
put_page(5)
= Escape Speculation
22
Buffer Pool Management
get_page(5)
put_page(5)
get_page(5)
Time
CPU
put_page(5)
get_page(5)
put_page(5)
Buffer Pool
ref: 0
Copyright 2005 Chris Colohan
get_page(5)
put_page(5)
Not undoable!
= Escape Speculation
23
Buffer Pool Management
get_page(5)
put_page(5)
get_page(5)
Time
CPU
put_page(5)
Buffer Pool
ref: 0
get_page(5)
put_page(5)
= Escape Speculation

Delay put_page until end of
epoch

Copyright 2005 Chris Colohan
Avoid dependence
24
Removing Bottleneck Dependences
We introduce three techniques:
 Delay operations until non-speculative




Escape speculation


Mutex and lock acquire and release
Buffer pool, memory, and cursor release
Log sequence number assignment
Buffer pool, memory, and cursor allocation
Traditional parallelization

Memory allocation, cursor pool, error checks,
false sharing
Copyright 2005 Chris Colohan
25
Outline






Introduction
Related work
Dividing transactions into epochs
Removing bottlenecks in the DBMS
Results
Conclusions
Copyright 2005 Chris Colohan
26
Experimental Setup

Detailed simulation



Superscalar, out-of-order,
128 entry reorder buffer
Memory hierarchy
modeled in detail
CPU
CPU
CPU
CPU
32KB
4-way
L1 $
32KB
4-way
L1 $
32KB
4-way
L1 $
32KB
4-way
L1 $
TPC-C transactions on
BerkeleyDB





In-core database
Single user
Single warehouse
Measure interval of 100
transactions
Measuring latency not
throughput
Copyright 2005 Chris Colohan
2MB 4-way L2 $
Rest of memory system
27
Optimizing the DBMS: New Order
Time (normalized)
1.25
26%
improvement
1
0.75
Other CPUs
not helping
0.5
0.25
Cache misses
increase
Idle CPU
Violated
Cache Miss
Busy
Can’t optimize
much more
0
Copyright 2005 Chris Colohan
28
Optimizing the DBMS: New Order
Time (normalized)
1.25
1
Idle CPU
Violated
Cache Miss
Busy
0.75
0.5
0.25
0
This process took me 30 days
and <1200 lines of code.
Copyright 2005 Chris Colohan
29
Other TPC-C Transactions
Time (normalized)
1
3/5 Transactions
speed up by 46-66%
0.75
Idle CPU
Failed
Cache Miss
Busy
0.5
0.25
0
New Order
Copyright 2005 Chris Colohan
Delivery
Stock Level
Payment
Order Status
30
Conclusions

A new form of parallelism for databases


Intra-transaction parallelism



Tool for attacking transaction latency
Without major changes to DBMS
TLS can be applied to more than
transactions
Halve transaction latency by using 4 CPUs
Copyright 2005 Chris Colohan
31
Any questions?
For more information, see:
www.colohan.com
Backup Slides Follow…
TPC-C Transactions on 2 CPUs
Time (normalized)
1
3/5 Transactions
speed up by 30-44%
0.75
Idle CPU
Failed
Cache Miss
Busy
0.5
0.25
0
New Order
Copyright 2005 Chris Colohan
Delivery
Stock Level
Payment
Order Status
34
LATCHES
Latches


Mutual exclusion between transactions
Cause violations between epochs


Read-test-write cycle  RAW
Not needed between epochs

TLS already provides mutual exclusion!
Copyright 2005 Chris Colohan
36
Large critical section
Latches: Aggressive Acquire
Acquire
latch_cnt++
…work…
latch_cnt--
latch_cnt++
…work…
(enqueue release)
latch_cnt++
…work…
(enqueue release)
Homefree
Commit work
latch_cnt--
Homefree
Commit work
latch_cnt-Release
Copyright 2005 Chris Colohan
37
Latches: Lazy Acquire
Small critical sections
Acquire
…work…
Release
(enqueue acquire)
…work…
(enqueue release)
(enqueue acquire)
…work…
(enqueue release)
Homefree
Copyright 2005 Chris Colohan
Acquire
Commit work
Release
Homefree
Acquire
Commit work
Release
38
HARDWARE
TLS in Database Systems
Time
Large epochs:
• More dependences
• Must tolerate
• More state
• Bigger buffers
Non-Database
TLS
Copyright 2005 Chris Colohan
TLS in Database
Systems
Concurrent
transactions
40
Feedback Loop
for() {
do_work();
}
think
I know this
is parallel!
par_for() {
do_work();
}
Must…Make
…Faster
feed back
back feed
feed back
back feed
feed back
back feed
feed back
Copyright 2005 Chris Colohan
feed
back
feed
back
feed
back
feed
41
Violations == Feedback
*p=
Violation!
=*p
*p=
Time
R2
=*p
*q=
*q=
=*q
=*p
Must…Make
=*q …Faster
Sequential
Copyright 2005 Chris Colohan
Parallel
0x0FD8
0xFD20
0x0FC0
0xFC18
42
Eliminating Violations
0x0FD8
0xFD20
0x0FC0
0xFC18
Violation!
=*p
*p=
Time
R2
=*p
*q=
Violation!
=*q
*q=
=*q
=*q
Parallel
Copyright 2005 Chris Colohan
Optimization may
make slower?
Eliminate *p Dep.
43
Time
Tolerating Violations: Sub-epochs
Violation!
=*q
*q=
Violation!
=*q
*q=
=*q
=*q
Eliminate *p Dep.
Copyright 2005 Chris Colohan
Sub-epochs
44
Sub-epochs

Started periodically by
hardware



How many?
When to start?
Violation!
*q=
Hardware implementation

=*q
Just like epochs


=*q
Use more epoch contexts
No need to check violations
between sub-epochs within
an epoch
Copyright 2005 Chris Colohan
Sub-epochs
45
Download