HTM_Eyal_and_Nimord

advertisement
Hardware Transactional Memory
Eyal Widder
Nimrod Reiss
Instructor: Yehuda Afek
Tel Aviv University
10/06/2007
References

Thread-Level Transactional Memory
–

Kevin E. Moore, Mark D. Hill & David A. Wood
[2005]
LogTM : Log-based Transactional Memory
–
Kevin E. Moore, Jayaram Bobba, Michelle J.
Moravam, Mark D. Hill & David A. Wood [2006]
Outline





Locks Vs. Transactional Memory
Introduction to LogTM
LogTM Version Management
LogTM Conflict Detection
Conclusions
The Challenge of Multithreaded SW
Goal: Parallelization
Problem: Unrestricted concurrency  bugs
Solution: Synchronization
New problem: Synchronization
•
Tension between performance and correctness
Current Mechanism: Locks
•
Locks: objects only one thread can hold at a time
•
•
•
Correctness issues
–
–
•
Organization: lock for each shared structure
Usage: (block)  acquire  access  release
Under-locking  data races
Acquires in different orders  deadlock
Performance issues
–
–
–
Conservative serialization
Overhead of acquiring
Difficult to find right granularity
Transactions vs. Locks
Lock issues:
–
Under-locking
–
Acquires in different orders
–
Blocking
–
Conservative serialization
How transactions help:
–
Simpler interface
–
No ordering
–
Can cancel transactions
–
Serialization only on conflicts
Locks  simplicity/performance tension
Transactions  (potentially) simple and efficient
Transaction Semantics ACI Properties
•
Atomicity – All or Nothing
•
Consistency – Correct at beginning and end
•
Isolation – Partially done work not visible to
other threads
Thread-Level Transactional Memory
•
Separate semantics from implementation
•
Adapt DBMS(database management systems) concepts:
•
•
•
Concurrency control algorithms
Conflict detection
Taking the appropriate action
(commit\abort\delay)
Main challenge: Reduce the overhead of
enforcing the ACI properties!
Basic Idea
Module TM like virtual memory:
•
•
•
•
A thread level abstraction
Use 3 types of interfaces – User,
System\Library, Low-level
An interface independent of implementation
Combine HW and SW in implementation
How Do Transactional Memory
Systems Differ?


(Data) Version Management
–
Keep old values for abort AND new values for commit
–
Eager: record old values “elsewhere”; update “in place” 
–
Lazy: update “elsewhere”; keep old values “in place”
Fast
commit
(Data) Conflict Detection
–
Find read-write, write-read or write-write conflicts
among concurrent transactions
–
Eager: detect conflict on every read/write
–
Lazy: detect conflict at end (commit/abort)
 Less
wasted work
Outline





Locks Vs. TM
Introduction to LogTM
LogTM Version Management
LogTM Conflict Detection
Conclusions
Log Based Transactional Memory –
LogTM
•
(Hardware) Transactional Memory promising
–
Most use lazy version management
•
•
–
•
Old values “in place”
New values “elsewhere”
Commits slower than aborts
New LogTM: Log-based Transactional Memory
–
Uses eager version management (like most databases)
•
•
–
–
–
Old values to log in thread-private virtual memory
New values “in place”
Makes common commits fast!
Hardware traps to Software handler
Aborts handled in software
Outline





Locks Vs. TM
Introduction to LogTM
LogTM Version Management
LogTM Conflict Detection
Conclusions
LogTM’s Eager Version Management

Old values stored in the transaction log
–
–
–

A per-thread linear (virtual) address space (like
the stack)
Filled by hardware (during transactions)
Read by software (on abort)
New values stored “in place”
Transaction Log Example
VA



Initial State
LogBase = LogPointer
TM count > 0
Log Base
1000
Log Ptr
1000
TM count
0
1
Data Block
R W
00
12--------------
0
0
40
--------------23
0
0
C0
34--------------
0
0
1000
1040
1080
Transaction Log Example
•
Store r2, (c0) /* r2 = 56 */
–
–
–
–
Set W bit for block (c0)
Store address (c0) and old
data on the log
Increment Log Ptr to 1048
Update memory
Log Base
1000
Log Ptr
1000
1048
TM count
1
VA
Data Block
R W
00
12--------------
0
0
40
--------------23
0
0
C0
34-------------56--------------
0
0
1
1000
1040
1080
c0 34-------------
Transaction Log Example
VA
Data Block
R W
Commit transaction
–
–
–
Clear R & W for all
blocks
Reset Log Ptr to Log
Base (1000)
Clear TM count
Log Base
1000
Log Ptr
1048
1000
TM count
1
0
00
12--------------
0
0
40
--------------23
0
0
C0
56--------------
0
0
1
1000
c0 34------------
1040
1080
--
Transaction Log Example
Abort transaction
–
–
–
–
VA
Replay log entries to
“undo” the transaction
Reset Log Ptr to Log
Base (1000)
Clear R & W bits for all
blocks
Clear TM count
Log Base
1000
Log Ptr
1000
1048
1048
TM count
1
0
Data Block
R W
00
12--------------
0
0
40
--------------23
0
0
C0
56-------------34--------------
0
0
1
1000
c0 34------------
1040
1080
--
Eager Version Management
Discussion

Advantages:
–
Fast Commits



No copying
Common case
Disadvantages:
–
Slow/Complex Aborts

–
Undo aborting transaction
Relies on Eager Conflict Detection/Prevention
Outline





Locks Vs. TM
Introduction to LogTM
LogTM Version Management
LogTM Conflict Detection
Conclusions
LogTM’s Eager Conflict Detection
1)
2)
3)
4)
5)
Requesting processor sends a coherence request
to the directory.
The directory responds and possibly forwards the
request to one or more processors.
Each responding processor examines some local
state to detect a conflict.
The responding processors each ack or nack the
request.
The requesting processor resolves any conflict.
Conflict Detection

Validation is retained by using the R,W bits
and the directory MOESI states.

A “Sticky State” is used to detect possible
conflicts from overflows
Conflict Detection (example)
P0 store
–
–
–
P0 sends get exclusive
(GETX) request
Directory responds with
data (old)
GETX
P0 executes store
P0
1
TM mode 0
Overflow 0
IM (--)
(-W)
(--) [none]
[new]
[old]
Directory
I
[old]
M@P0
[old]
DATA
P1
TM mode 0
Overflow 0
I (--) [none]
Conflict Detection (example)

In-cache transaction
conflict
–
–
–
Directory
P1 sends get shared
(GETS) request
Directory forwards to P0
P1 detects conflict and
sends NACK
P0
Fwd_GETS
M@P0 [old]
GETS
P1
1
TM mode 0
Overflow 0
M (-W) [new]
TM mode 0
Overflow 0
I (--) [none]
Conflict!
NACK
Conflict Detection (example)
Cache overflow
–
–
–
–
P0 sends put exclusive
(PUTX) request
Directory acknowledges
P0 sets overflow bit
P0 writes data back to
memory
P0
Directory
PUTX
M@P0
[old]
Msticky
@P0
[new]
ACK
DATA
1
TM mode 0
Overflow 1
0
(-W) [none]
[new]
IM (--)
P1
TM mode 0
Overflow 0
I (--) [none]
Conflict Detection (example)

Out-of-cache conflict
–
–
–
–
P1 sends GETS request
Directory forwards to P0
P0 detects a (possible)
conflict
P0 sends NACK
P0
Directory
M@P0@P0
[old]
Msticky
[new]
GETS
Fwd_GETS
P1
1
TM mode 0
Overflow 1
0
(--)
(-W) [none]
[old]
[new]
IM (--)
TM mode 0
Overflow 0
I (--) [none]
NACK
Conflict!
Conflict Detection (example)

Commit
–
P0 clears TM mode and
Overflow bits
Directory
M@P0@P0
[old]
Msticky
[new]
P0
0
TM mode 1
Overflow 0
1
(--)
(-W) [none]
[old]
[new]
IM (--)
P1
TM mode 0
Overflow 0
I (--) [none]
Conflict Detection (example)

Lazy cleanup
–
–
–
–
P1 sends GETS request
Directory forwards
request to P0
P0 detects no conflict,
sends CLEAN
Directory sends Data to
P1
P0
Fwd_GETS
TM mode 0
Overflow 0
(--)
(-W) [none]
[old]
[new]
IM (--)
Directory
S(P1)
Msticky
@P0[new]
[new]
GETS
CLEAN
DATA
P1
TM mode 0
Overflow 0
(--) [none]
[new]
IS (--)
LogTM’s Conflict Detection w/ Cache
Overflow

At overflow at processor P
–
–

At transaction end (commit or abort) at processor P
–

Set P’s overflow bit (1 bit per processor)
Allow writeback, but set directory state to Sticky@P
Reset P’s overflow bit
At (potential) conflicting request by processor R
–
–
–
Directory forwards R’s request to P.
P tells R “no conflict” if overflow is reset
But asserts conflict if set (w/ small chance of false positive)
Conflict Resolution

Conflict Resolution
–
–
–

LogTM resolves conflicts at requesting processor
–
–

Can wait risking deadlock
Can abort risking livelock
Wait/abort transaction at requesting or responding proc?
Requesting processor waits (using coherence nacks/retries)
But aborts if other processor is waiting (deadlock possible)
& it is logically younger (using timestamps)
Future: Requesting processor traps to software
contention manager that decides who waits/aborts
Outline





Locks Vs. TM
Introduction to LogTM
LogTM Version Management
LogTM Conflict Detection
Conclusions
Conclusion

Commits are far more common than aborts
–
–
–


Conflicts are rare
Most conflicts can be resolved w/o aborts
Software aborts do not impact performance
Overflows are rare (in current benchmarks)
LogTM
–
–
Eager Version Management makes the common case
(commit) fast
Sticky States/Lazy Cleanup detects conflicts outside the
cache (if overflows are infrequent)
QUESTIONS?
Break Time!
References

LogTM : Log-based Transactional Memory
–

Kevin E. Moore, Jayaram Bobba, Michelle J. Moravam, Mark D. Hill &
David A. Wood [2006]
Supporting Nested Transactional Memory
in LogTM
–
Michelle J. Moravam, Jayaram Bobba, Kevin E. Moore, Luke Yen, Mark D.
Hill, Ben Liblit, Michael M. Swift & David A. Wood [2006]
Motivation
Till now: Transactional Memory promises lock-free
atomic, consistent and isolated execution.
 But what should occur when a transaction
executes another transaction within ?
LogTM enables flattening



In the last lecture we’ve introduced LogTM
which enables subsuming inner transactions
into the top-level transaction.
A counter is used to count the nesting level,
Transaction_begin() increments and
Transaction_end() decrements.
A conflict on an inner transaction may cause
a complete abort to the beginning of the toplevel one.
Challenges in nesting transactions



Facilitating Software Composition.
Enhancing Concurrency.
Escaping to non-transactional
systems.
Facilitating Software Composition


Calling modules that use locks within
requires caller knowledge of internal module
implementation details.
In order to aid modular programming,
transactional memory should support
nesting.
Challenges in nesting transactions



Facilitating Software Composition.
Enhancing Concurrency.
Escaping to non-transactional
systems.
Enhancing Concurrency

Closed nesting does not eliminate all
problems posed by modular software.
–
Concurrency is limited by maintaining isolation
until the top-level transaction commits.
Example
P2
P1
Transaction L
conflict
Transaction T
conflict
Transaction S
pNextFree
Transaction S
M@P1
 How would you do it differently ?
Ideally S should release pNextFree so that other transactions can access the
allocator without conflicting with transaction L.
Challenges in nesting transactions



Facilitating Software Composition.
Enhancing Concurrency.
Escaping to non-transactional
systems.
Escaping to non-transactional
systems.

Many TM systems will run on top of nontransactional base systems that may include:
–
–
–



Runtime libraries
Operation systems
Language virtual machines (e.g. JVM)
STMs handle such escapes easily.
An escape to non-transactional system must disable
HTM mechanisms to allow correct operation.
Allow Inter-Transaction / Device communication.
Outline



Motivation and challenges.
Closed vs. Open Nesting.
Nested LogTM.
–
Supporting Closed Nesting.

–
Supporting Open Nesting.


–

Partial aborts.
Abort actions / Commit actions.
Condition O1.
Escape actions.
Conclusions.
Closed vs. Open nesting


Closed Nested Transactions
extends isolation of an inner transaction until
the top-level transaction commits.
Open Nested Transactions
allow committing inner transaction to
immediately release isolation.
Closed Nested Transactions


May flatten transactions into the top-level
one (as we’ve already seen) .
May allow partial roll-back.
Open Nested Transactions

Increase concurrency and expressiveness.
May increase both SW & HW complexity.

Higher-level atomicity

–
–
Child’s memory updates not undone if parent aborts
Use abort action to undo the child’s forward action at a
higher-level of abstraction


E.g., malloc() compensated by free()
Higher-level isolation
–
–
–
Release memory-level isolation
Programmer enforce isolation at higher level (e.g., locks)
Use commit action to release isolation at parent commit
Outline



Motivation and challenges.
Closed vs. Open Nesting.
Nested LogTM.
–
Supporting Closed Nesting.

–
Supporting Open Nesting.


–

Partial aborts.
Compensating actions / commit actions.
Condition O1.
Escape actions.
Conclusions.
Nested LogTM 


Nested LogTM extends Flat LogTM (last
lecture).
Splits the log into “frames”.
–
–
Header contains Frame Pointer to the parent’s
Header.
Header
Header contains register
Undo record
checkpoint.
Undo record
Header
Log Frame
Log Ptr
Undo record
Undo record
Level 1
Nested LogTM 

Replicates R/W bits.
–
–
Maintains a separate Read set, Write set for each
nesting level.
Use constant (k) number of R/W sets, and flatten
transactions whose nesting level is bigger than k.
Closed Nested LogTM
On Commit :

Top Level Transactions commit normally.
t
If ( 1 < curr_level ≤ k) :



Merge the current log frame with parent’s.
“Flash – OR” R/W bits of curr_level – 1 with curr_level ‘s.
Decrement curr_level .
Otherwise:


Merge the current log frame with parent’s.
Decrement curr_level .
Closed Nested LogTM
Conflict detection :


An incoming read from memory location m
conflicts with another thread’s level j
transaction if j is the minimal level where
block(m)’s Write bit is set.
An incoming write to memory location m
conflicts with another thread’s level j
transaction if j is the minimal level where
block(m)’s Write or Read bit is set.
Closed Nested LogTM
On Abort :




An abort of the current transaction at curr_level
traps to a software handler.
Suppose the transaction aborts for a conflict in
abort_level transaction.
The software handler walks the log frame backwards
and undoes curr_level – abort_level + 1 log
frames.
Finally it restores the register state save in header.
Log
frame pointer
end pointer
2, a
garbage header
6, c
// thread i at level 0 (Non-transactional)
4, b
a = 2; b = 4; c = 6;
// Initialize
transaction_begin()
// top-level (level 1)
a = b + 1;
// a gets 5.
transaction_begin();
// level 2
c = b – 3; // c gets 1.
Var
b = a + 2; // b gets 7.
a
b
a = c + 7; // a gets 8.
c
transaction_commit();
// level 2.
transaction_commit(); // level 1.
5, a
Cache
R1 R2 W1 W2 Val
1 01
0
0 0
1
1
0 01
1
Level
0 0
1
1
0 0
1
1
0 0
1
1
1
2
0
8
2
5
4
7
6
1
Supporting Open Transactions
When an open nested transaction Topen at level
j commits:



Its frame is discarded from the log.
R/W bits for level j are cleared.
(Optionally) Append commit and abort action
records, Copen and Aopen to the newly exposed
end of Topen’s parent’s frame.
Commit and Abort Actions


To ensure consistency, open nested
transactions must raise the abstraction level
of both isolation and rollback.
Commit actions are executed in FIFO order
while Abort actions are executed in LIFO
order.
Log
frame pointer
end pointer
2, a
Aopen
6, c
4, b
// thread i at level 0 (Non-transactional)
5, a
a = 2; b = 4; c = 6;
// Initialize
transaction_begin()
// top-level (level 1)
a = b + 1;
// a gets 5.
transaction_begin();
// level 2
Cache
c = b – 3; // c gets 1.
Var R R W W
b = a + 2; // b gets 7.
a
0 0
1 1 0
1
b
1 0
1 0 0
1
a = c + 7; // a gets 8.
c
0 0
1 0 0
1
transaction_commit();
// level 2.
transaction_commit(); // level 1.
Level
2
1
1
2
1
2
Val
8
7
1
Condition O1
No Writes to Data Written by Ancestors
Neither an open transaction Topen nor its commit and abort actions, Copen
and Aopen writes any data written by Topen’s ancestors.
counter = 0;
// initialize
transaction_begin ( ) ;
counter++;
Example
open_begin ( ) ;
// top-level
// counter gets 1.
// level 2
counter++;
// counter gets 2.
// commit with an abort action.
open_commit (
abort_action( decr(counter) ) );
…..
// Abort and run abort action
// Expect counter to be 0.
….
transaction_commit(); // not executed.
O1 Violation
Escape Actions


“Real world” is not transactional
Current OS’s are not transactional

Systems should allow non-transactional
escapes from a transaction

Interact with OS, VM, devices, etc.
Escape Actions – First Class




Keep a per-thread “Escape” bit.
Escape Actions read most recent values from
memory (Even uncommitted).
Escape Actions never aborts or stalls.
Similar to Open Transaction, an escape
action may register Commit/Abort actions.
Conclusions



Closed Nesting is easy to implement, and
may allow partial rollback to improve
efficiency.
Open Nesting improves concurrency in cost
for higher level atomicity and isolation and
the complexity of software implementation.
Using open nesting it is possible to provide
non-transactional operations inside
transactions.
QUESTIONS?
The End
10/06/2007
Download