Hardware Transactional Memory Eyal Widder Nimrod Reiss Instructor: Yehuda Afek Tel Aviv University 10/06/2007 References Thread-Level Transactional Memory – Kevin E. Moore, Mark D. Hill & David A. Wood [2005] LogTM : Log-based Transactional Memory – Kevin E. Moore, Jayaram Bobba, Michelle J. Moravam, Mark D. Hill & David A. Wood [2006] Outline Locks Vs. Transactional Memory Introduction to LogTM LogTM Version Management LogTM Conflict Detection Conclusions The Challenge of Multithreaded SW Goal: Parallelization Problem: Unrestricted concurrency bugs Solution: Synchronization New problem: Synchronization • Tension between performance and correctness Current Mechanism: Locks • Locks: objects only one thread can hold at a time • • • Correctness issues – – • Organization: lock for each shared structure Usage: (block) acquire access release Under-locking data races Acquires in different orders deadlock Performance issues – – – Conservative serialization Overhead of acquiring Difficult to find right granularity Transactions vs. Locks Lock issues: – Under-locking – Acquires in different orders – Blocking – Conservative serialization How transactions help: – Simpler interface – No ordering – Can cancel transactions – Serialization only on conflicts Locks simplicity/performance tension Transactions (potentially) simple and efficient Transaction Semantics ACI Properties • Atomicity – All or Nothing • Consistency – Correct at beginning and end • Isolation – Partially done work not visible to other threads Thread-Level Transactional Memory • Separate semantics from implementation • Adapt DBMS(database management systems) concepts: • • • Concurrency control algorithms Conflict detection Taking the appropriate action (commit\abort\delay) Main challenge: Reduce the overhead of enforcing the ACI properties! Basic Idea Module TM like virtual memory: • • • • A thread level abstraction Use 3 types of interfaces – User, System\Library, Low-level An interface independent of implementation Combine HW and SW in implementation How Do Transactional Memory Systems Differ? (Data) Version Management – Keep old values for abort AND new values for commit – Eager: record old values “elsewhere”; update “in place” – Lazy: update “elsewhere”; keep old values “in place” Fast commit (Data) Conflict Detection – Find read-write, write-read or write-write conflicts among concurrent transactions – Eager: detect conflict on every read/write – Lazy: detect conflict at end (commit/abort) Less wasted work Outline Locks Vs. TM Introduction to LogTM LogTM Version Management LogTM Conflict Detection Conclusions Log Based Transactional Memory – LogTM • (Hardware) Transactional Memory promising – Most use lazy version management • • – • Old values “in place” New values “elsewhere” Commits slower than aborts New LogTM: Log-based Transactional Memory – Uses eager version management (like most databases) • • – – – Old values to log in thread-private virtual memory New values “in place” Makes common commits fast! Hardware traps to Software handler Aborts handled in software Outline Locks Vs. TM Introduction to LogTM LogTM Version Management LogTM Conflict Detection Conclusions LogTM’s Eager Version Management Old values stored in the transaction log – – – A per-thread linear (virtual) address space (like the stack) Filled by hardware (during transactions) Read by software (on abort) New values stored “in place” Transaction Log Example VA Initial State LogBase = LogPointer TM count > 0 Log Base 1000 Log Ptr 1000 TM count 0 1 Data Block R W 00 12-------------- 0 0 40 --------------23 0 0 C0 34-------------- 0 0 1000 1040 1080 Transaction Log Example • Store r2, (c0) /* r2 = 56 */ – – – – Set W bit for block (c0) Store address (c0) and old data on the log Increment Log Ptr to 1048 Update memory Log Base 1000 Log Ptr 1000 1048 TM count 1 VA Data Block R W 00 12-------------- 0 0 40 --------------23 0 0 C0 34-------------56-------------- 0 0 1 1000 1040 1080 c0 34------------- Transaction Log Example VA Data Block R W Commit transaction – – – Clear R & W for all blocks Reset Log Ptr to Log Base (1000) Clear TM count Log Base 1000 Log Ptr 1048 1000 TM count 1 0 00 12-------------- 0 0 40 --------------23 0 0 C0 56-------------- 0 0 1 1000 c0 34------------ 1040 1080 -- Transaction Log Example Abort transaction – – – – VA Replay log entries to “undo” the transaction Reset Log Ptr to Log Base (1000) Clear R & W bits for all blocks Clear TM count Log Base 1000 Log Ptr 1000 1048 1048 TM count 1 0 Data Block R W 00 12-------------- 0 0 40 --------------23 0 0 C0 56-------------34-------------- 0 0 1 1000 c0 34------------ 1040 1080 -- Eager Version Management Discussion Advantages: – Fast Commits No copying Common case Disadvantages: – Slow/Complex Aborts – Undo aborting transaction Relies on Eager Conflict Detection/Prevention Outline Locks Vs. TM Introduction to LogTM LogTM Version Management LogTM Conflict Detection Conclusions LogTM’s Eager Conflict Detection 1) 2) 3) 4) 5) Requesting processor sends a coherence request to the directory. The directory responds and possibly forwards the request to one or more processors. Each responding processor examines some local state to detect a conflict. The responding processors each ack or nack the request. The requesting processor resolves any conflict. Conflict Detection Validation is retained by using the R,W bits and the directory MOESI states. A “Sticky State” is used to detect possible conflicts from overflows Conflict Detection (example) P0 store – – – P0 sends get exclusive (GETX) request Directory responds with data (old) GETX P0 executes store P0 1 TM mode 0 Overflow 0 IM (--) (-W) (--) [none] [new] [old] Directory I [old] M@P0 [old] DATA P1 TM mode 0 Overflow 0 I (--) [none] Conflict Detection (example) In-cache transaction conflict – – – Directory P1 sends get shared (GETS) request Directory forwards to P0 P1 detects conflict and sends NACK P0 Fwd_GETS M@P0 [old] GETS P1 1 TM mode 0 Overflow 0 M (-W) [new] TM mode 0 Overflow 0 I (--) [none] Conflict! NACK Conflict Detection (example) Cache overflow – – – – P0 sends put exclusive (PUTX) request Directory acknowledges P0 sets overflow bit P0 writes data back to memory P0 Directory PUTX M@P0 [old] Msticky @P0 [new] ACK DATA 1 TM mode 0 Overflow 1 0 (-W) [none] [new] IM (--) P1 TM mode 0 Overflow 0 I (--) [none] Conflict Detection (example) Out-of-cache conflict – – – – P1 sends GETS request Directory forwards to P0 P0 detects a (possible) conflict P0 sends NACK P0 Directory M@P0@P0 [old] Msticky [new] GETS Fwd_GETS P1 1 TM mode 0 Overflow 1 0 (--) (-W) [none] [old] [new] IM (--) TM mode 0 Overflow 0 I (--) [none] NACK Conflict! Conflict Detection (example) Commit – P0 clears TM mode and Overflow bits Directory M@P0@P0 [old] Msticky [new] P0 0 TM mode 1 Overflow 0 1 (--) (-W) [none] [old] [new] IM (--) P1 TM mode 0 Overflow 0 I (--) [none] Conflict Detection (example) Lazy cleanup – – – – P1 sends GETS request Directory forwards request to P0 P0 detects no conflict, sends CLEAN Directory sends Data to P1 P0 Fwd_GETS TM mode 0 Overflow 0 (--) (-W) [none] [old] [new] IM (--) Directory S(P1) Msticky @P0[new] [new] GETS CLEAN DATA P1 TM mode 0 Overflow 0 (--) [none] [new] IS (--) LogTM’s Conflict Detection w/ Cache Overflow At overflow at processor P – – At transaction end (commit or abort) at processor P – Set P’s overflow bit (1 bit per processor) Allow writeback, but set directory state to Sticky@P Reset P’s overflow bit At (potential) conflicting request by processor R – – – Directory forwards R’s request to P. P tells R “no conflict” if overflow is reset But asserts conflict if set (w/ small chance of false positive) Conflict Resolution Conflict Resolution – – – LogTM resolves conflicts at requesting processor – – Can wait risking deadlock Can abort risking livelock Wait/abort transaction at requesting or responding proc? Requesting processor waits (using coherence nacks/retries) But aborts if other processor is waiting (deadlock possible) & it is logically younger (using timestamps) Future: Requesting processor traps to software contention manager that decides who waits/aborts Outline Locks Vs. TM Introduction to LogTM LogTM Version Management LogTM Conflict Detection Conclusions Conclusion Commits are far more common than aborts – – – Conflicts are rare Most conflicts can be resolved w/o aborts Software aborts do not impact performance Overflows are rare (in current benchmarks) LogTM – – Eager Version Management makes the common case (commit) fast Sticky States/Lazy Cleanup detects conflicts outside the cache (if overflows are infrequent) QUESTIONS? Break Time! References LogTM : Log-based Transactional Memory – Kevin E. Moore, Jayaram Bobba, Michelle J. Moravam, Mark D. Hill & David A. Wood [2006] Supporting Nested Transactional Memory in LogTM – Michelle J. Moravam, Jayaram Bobba, Kevin E. Moore, Luke Yen, Mark D. Hill, Ben Liblit, Michael M. Swift & David A. Wood [2006] Motivation Till now: Transactional Memory promises lock-free atomic, consistent and isolated execution. But what should occur when a transaction executes another transaction within ? LogTM enables flattening In the last lecture we’ve introduced LogTM which enables subsuming inner transactions into the top-level transaction. A counter is used to count the nesting level, Transaction_begin() increments and Transaction_end() decrements. A conflict on an inner transaction may cause a complete abort to the beginning of the toplevel one. Challenges in nesting transactions Facilitating Software Composition. Enhancing Concurrency. Escaping to non-transactional systems. Facilitating Software Composition Calling modules that use locks within requires caller knowledge of internal module implementation details. In order to aid modular programming, transactional memory should support nesting. Challenges in nesting transactions Facilitating Software Composition. Enhancing Concurrency. Escaping to non-transactional systems. Enhancing Concurrency Closed nesting does not eliminate all problems posed by modular software. – Concurrency is limited by maintaining isolation until the top-level transaction commits. Example P2 P1 Transaction L conflict Transaction T conflict Transaction S pNextFree Transaction S M@P1 How would you do it differently ? Ideally S should release pNextFree so that other transactions can access the allocator without conflicting with transaction L. Challenges in nesting transactions Facilitating Software Composition. Enhancing Concurrency. Escaping to non-transactional systems. Escaping to non-transactional systems. Many TM systems will run on top of nontransactional base systems that may include: – – – Runtime libraries Operation systems Language virtual machines (e.g. JVM) STMs handle such escapes easily. An escape to non-transactional system must disable HTM mechanisms to allow correct operation. Allow Inter-Transaction / Device communication. Outline Motivation and challenges. Closed vs. Open Nesting. Nested LogTM. – Supporting Closed Nesting. – Supporting Open Nesting. – Partial aborts. Abort actions / Commit actions. Condition O1. Escape actions. Conclusions. Closed vs. Open nesting Closed Nested Transactions extends isolation of an inner transaction until the top-level transaction commits. Open Nested Transactions allow committing inner transaction to immediately release isolation. Closed Nested Transactions May flatten transactions into the top-level one (as we’ve already seen) . May allow partial roll-back. Open Nested Transactions Increase concurrency and expressiveness. May increase both SW & HW complexity. Higher-level atomicity – – Child’s memory updates not undone if parent aborts Use abort action to undo the child’s forward action at a higher-level of abstraction E.g., malloc() compensated by free() Higher-level isolation – – – Release memory-level isolation Programmer enforce isolation at higher level (e.g., locks) Use commit action to release isolation at parent commit Outline Motivation and challenges. Closed vs. Open Nesting. Nested LogTM. – Supporting Closed Nesting. – Supporting Open Nesting. – Partial aborts. Compensating actions / commit actions. Condition O1. Escape actions. Conclusions. Nested LogTM Nested LogTM extends Flat LogTM (last lecture). Splits the log into “frames”. – – Header contains Frame Pointer to the parent’s Header. Header Header contains register Undo record checkpoint. Undo record Header Log Frame Log Ptr Undo record Undo record Level 1 Nested LogTM Replicates R/W bits. – – Maintains a separate Read set, Write set for each nesting level. Use constant (k) number of R/W sets, and flatten transactions whose nesting level is bigger than k. Closed Nested LogTM On Commit : Top Level Transactions commit normally. t If ( 1 < curr_level ≤ k) : Merge the current log frame with parent’s. “Flash – OR” R/W bits of curr_level – 1 with curr_level ‘s. Decrement curr_level . Otherwise: Merge the current log frame with parent’s. Decrement curr_level . Closed Nested LogTM Conflict detection : An incoming read from memory location m conflicts with another thread’s level j transaction if j is the minimal level where block(m)’s Write bit is set. An incoming write to memory location m conflicts with another thread’s level j transaction if j is the minimal level where block(m)’s Write or Read bit is set. Closed Nested LogTM On Abort : An abort of the current transaction at curr_level traps to a software handler. Suppose the transaction aborts for a conflict in abort_level transaction. The software handler walks the log frame backwards and undoes curr_level – abort_level + 1 log frames. Finally it restores the register state save in header. Log frame pointer end pointer 2, a garbage header 6, c // thread i at level 0 (Non-transactional) 4, b a = 2; b = 4; c = 6; // Initialize transaction_begin() // top-level (level 1) a = b + 1; // a gets 5. transaction_begin(); // level 2 c = b – 3; // c gets 1. Var b = a + 2; // b gets 7. a b a = c + 7; // a gets 8. c transaction_commit(); // level 2. transaction_commit(); // level 1. 5, a Cache R1 R2 W1 W2 Val 1 01 0 0 0 1 1 0 01 1 Level 0 0 1 1 0 0 1 1 0 0 1 1 1 2 0 8 2 5 4 7 6 1 Supporting Open Transactions When an open nested transaction Topen at level j commits: Its frame is discarded from the log. R/W bits for level j are cleared. (Optionally) Append commit and abort action records, Copen and Aopen to the newly exposed end of Topen’s parent’s frame. Commit and Abort Actions To ensure consistency, open nested transactions must raise the abstraction level of both isolation and rollback. Commit actions are executed in FIFO order while Abort actions are executed in LIFO order. Log frame pointer end pointer 2, a Aopen 6, c 4, b // thread i at level 0 (Non-transactional) 5, a a = 2; b = 4; c = 6; // Initialize transaction_begin() // top-level (level 1) a = b + 1; // a gets 5. transaction_begin(); // level 2 Cache c = b – 3; // c gets 1. Var R R W W b = a + 2; // b gets 7. a 0 0 1 1 0 1 b 1 0 1 0 0 1 a = c + 7; // a gets 8. c 0 0 1 0 0 1 transaction_commit(); // level 2. transaction_commit(); // level 1. Level 2 1 1 2 1 2 Val 8 7 1 Condition O1 No Writes to Data Written by Ancestors Neither an open transaction Topen nor its commit and abort actions, Copen and Aopen writes any data written by Topen’s ancestors. counter = 0; // initialize transaction_begin ( ) ; counter++; Example open_begin ( ) ; // top-level // counter gets 1. // level 2 counter++; // counter gets 2. // commit with an abort action. open_commit ( abort_action( decr(counter) ) ); ….. // Abort and run abort action // Expect counter to be 0. …. transaction_commit(); // not executed. O1 Violation Escape Actions “Real world” is not transactional Current OS’s are not transactional Systems should allow non-transactional escapes from a transaction Interact with OS, VM, devices, etc. Escape Actions – First Class Keep a per-thread “Escape” bit. Escape Actions read most recent values from memory (Even uncommitted). Escape Actions never aborts or stalls. Similar to Open Transaction, an escape action may register Commit/Abort actions. Conclusions Closed Nesting is easy to implement, and may allow partial rollback to improve efficiency. Open Nesting improves concurrency in cost for higher level atomicity and isolation and the complexity of software implementation. Using open nesting it is possible to provide non-transactional operations inside transactions. QUESTIONS? The End 10/06/2007