Transactional Memory
Yujia Jin

Locks and Their Problems
• Locks are commonly used to protect shared data
• Priority inversion
  – A lower-priority process holds a lock needed by a higher-priority process
• Convoy effect
  – When a lock holder is interrupted, other processes are forced to wait
• Deadlock
  – Circular dependences among processes acquiring locks, so every process waits for a lock that another one holds

Lock-free
• A shared data structure is lock-free if its operations do not require mutual exclusion
  - Does not prevent multiple processes from operating on the same object concurrently
  + Avoids the problems of locks
  - Existing lock-free techniques are implemented in software and do not perform as well as their lock-based counterparts

Transactional Memory
• Uses transaction-style operations to operate on lock-free data
• Lets the user define customized read-modify-write operations on multiple, independent words
• Easy to support in hardware: a straightforward extension of conventional multiprocessor cache-coherence protocols

Transaction Style
• A transaction is a finite sequence of machine instructions consisting of
  – A sequence of reads,
  – Computation,
  – A sequence of writes, and
  – Commit
• Formal properties
  – Atomicity and serializability (~ACID)

Access Instructions
• Load-transactional (LT)
  – Reads a shared memory location into a private register
  – The location is added to the transaction's read set
• Load-transactional-exclusive (LTX)
  – Like LT, plus a hint that a write to the location is coming
  – The location is added to the transaction's write set
• Store-transactional (ST)
  – Tentatively writes a private register to a shared memory location; the new value is not visible to other processors until commit
  – The location is added to the transaction's write set

State Instructions
• Commit
  – Tries to make the tentative writes permanent
  – Succeeds only if no other transaction has written any location in its read or write set, and no other transaction has read any location in its write set
  – On failure, discards all updates to the write set
  – Returns whether it succeeded
• Abort
  – Discards all updates to the write set
• Validate
  – Returns the current transaction status
  – If the status is false (aborted), discards all updates to the write set

Typical Transaction
  /* keep trying */
  while (true) {
      /* read variables */
      v1 = LT(V1); …; vn = LT(Vn);
      /* check consistency */
      if (!VALIDATE()) continue;
      /* compute new values */
      compute(v1, …, vn);
      /* write tentative values */
      ST(v1, V1); …; ST(vn, Vn);
      /* try to commit */
      if (COMMIT()) return result;
      else backoff();
  }

Warning
• Not intended for database use
• Transactions are assumed to be short in duration
• Transactions are assumed to have small data sets

Idea Behind the Implementation
• Existing cache-coherence protocols already detect access conflicts
• Access conflicts ~ transaction conflicts
• The approach extends to cache-coherence protocols in general
  – Including snoopy bus and directory protocols

Snoopy Bus Example
[Figure: the processor is connected to a regular cache (2048 8-byte lines, direct mapped) and a transactional cache (64 8-byte lines, fully associative), both attached to the bus]
• The two caches are exclusive: a line is held in at most one of them
• The transactional cache holds tentative writes without propagating them to other processors

Transaction Cache
• Each cache line carries a separate transactional tag in addition to its coherence-protocol tag
  – Transactional tag states: EMPTY, NORMAL, XCOMMIT, XABORT
• Each transactionally accessed line occupies two entries
  – XABORT holds the tentative (modified) value; it is set to EMPTY on abort
  – XCOMMIT holds the original value; it is set to EMPTY on commit
• Allocation policy, in decreasing order of preference
  – EMPTY entries, NORMAL entries, XCOMMIT entries
• The hardware must guarantee a minimum transaction size, since the transactional cache bounds a transaction's data set

Bus Actions
• T_READ and T_RFO (read for ownership) are added for transactional requests
• A transactional request can be refused with a BUSY response
• When a BUSY response is received, the requesting transaction aborts
  – This prevents deadlock and continual mutual aborts
  – It can lead to starvation

Processor Actions
• The transaction active (TACTIVE) flag indicates whether a transaction is in progress; it is set by the first transactional operation
• The transaction status (TSTATUS) flag indicates whether the transaction is still active (true) or has aborted (false)

LT Actions
• Check for an XABORT entry; if found, return its value
• If none, check for a NORMAL entry
  – If found, change NORMAL to XABORT and allocate a second entry tagged XCOMMIT with the same data
• If none, issue T_READ on the bus, then allocate an XABORT and an XCOMMIT entry
• If T_READ receives BUSY, abort:
  – Set TSTATUS to false
  – Drop all XABORT entries
  – Set all XCOMMIT entries to NORMAL
  – Return arbitrary data

LTX and ST Actions
• Same as LT, except
  – On a miss, use T_RFO rather than T_READ
  – For ST, the XABORT entry is updated with the new value

More Exciting Actions
• VALIDATE
  – Returns the TSTATUS flag
  – If false, sets TSTATUS to true and TACTIVE to false
• ABORT
  – Discards cache entries as in an LT abort (drops XABORT entries, sets XCOMMIT entries to NORMAL); sets TSTATUS to true and TACTIVE to false
• COMMIT
  – Returns TSTATUS; sets TSTATUS to true and TACTIVE to false
  – Drops all XCOMMIT entries and changes all XABORT entries to NORMAL

Snoopy Cache Actions
• The regular cache behaves like a MESI-style invalidation protocol, treating T_READ like READ and T_RFO like RFO
• Transactional cache
  – Non-transactional cycle: acts like the regular cache, considering NORMAL entries only
  – T_READ: if the entry is valid (shared), returns the value
  – All other transactional cycles: responds BUSY

Simulation
• Proteus simulator
• 32 processors
• Regular cache
  – Direct mapped, 2048 8-byte lines
• Transactional cache
  – Fully associative, 64 8-byte lines
• Single-cycle cache access
• 4-cycle memory access
• Both snoopy bus and directory protocols are simulated
• 2-stage network with a 1-cycle delay per switch

Benchmarks
• Counter
  – n processors each increment a shared counter 2^16/n times
• Producer/Consumer buffer
  – n/2 processors produce and n/2 processors consume through a shared FIFO
  – Ends when 2^16 items have been consumed
• Doubly-linked list
  – n processors try to rotate the contents from tail to head
  – Ends when 2^16 items have been moved
  – Sharing is conditional: which variables are shared depends on the list's state
  – Traditional locking can introduce deadlock

Comparisons
• Competitors
  – Transactional memory
  – Load-locked/store-conditional (Alpha)
  – Spin lock with backoff
  – Software queue
  – Hardware queue

[Figure: Counter benchmark results]
[Figure: Producer/Consumer benchmark results]
[Figure: Doubly-linked list benchmark results]

Conclusion
• Avoids extra lock variables and the problems of locks
• Trades deadlock for possible livelock/starvation
• Performance comparable to locking techniques when the shared data structure is small
• Relatively easy to implement
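The Typical Transaction loop and the Commit rule from the State Instructions slide can be exercised in software. What follows is a minimal sketch under stated assumptions, not the hardware design. `TMemory`, `Txn`, `begin` and `increment` are hypothetical names. The method names mirror the instructions, which is why they are upper case rather than idiomatic Python. Conflicts are detected at access time, and the requesting transaction aborts, much as a BUSY response aborts the requester on the bus. Under that policy, any transaction that has not already aborted commits successfully.

```python
class TMemory:
    """Hypothetical software model of LT/LTX/ST/VALIDATE/ABORT/COMMIT:
    committed shared memory plus the transactions now in progress."""

    def __init__(self, size):
        self.words = [0] * size
        self.active = []

    def begin(self):
        txn = Txn(self)
        self.active.append(txn)
        return txn


class Txn:
    """Conflicts are checked when a word is accessed; the requesting
    transaction aborts, standing in for a BUSY bus response."""

    def __init__(self, mem):
        self.mem = mem
        self.status = True     # TSTATUS: False once this transaction aborts
        self.read_set = set()  # words read with LT
        self.writes = {}       # write set: word -> tentative value

    def _conflict(self, addr, exclusive):
        # Another live transaction may write addr, or we would write a word it read.
        return any(addr in o.writes or (exclusive and addr in o.read_set)
                   for o in self.mem.active if o is not self)

    def _abort(self):
        self.status = False
        self.read_set.clear()
        self.writes.clear()    # discard every tentative write

    def _finish(self):
        if self in self.mem.active:
            self.mem.active.remove(self)

    def _load(self, addr, exclusive):
        if not self.status:
            return 0           # aborted: the value is arbitrary
        if addr in self.writes:
            return self.writes[addr]
        if self._conflict(addr, exclusive):
            self._abort()
            return 0
        if exclusive:
            self.writes[addr] = self.mem.words[addr]  # LTX joins the write set
        else:
            self.read_set.add(addr)
        return self.mem.words[addr]

    def LT(self, addr):
        return self._load(addr, exclusive=False)

    def LTX(self, addr):
        return self._load(addr, exclusive=True)

    def ST(self, addr, value):
        if not self.status:
            return
        if addr not in self.writes and self._conflict(addr, exclusive=True):
            self._abort()
            return
        self.writes[addr] = value  # invisible to others until COMMIT

    def VALIDATE(self):
        if not self.status:
            self._finish()         # an aborted transaction is over
        return self.status

    def ABORT(self):
        self._abort()
        self._finish()

    def COMMIT(self):
        ok = self.status
        if ok:
            for addr, value in self.writes.items():
                self.mem.words[addr] = value  # tentative writes become permanent
        self._finish()
        return ok


def increment(mem, addr):
    # The Typical Transaction loop, specialised to the Counter benchmark.
    while True:                  # keep trying
        t = mem.begin()
        v = t.LT(addr)           # read variables
        if not t.VALIDATE():     # check consistency
            continue
        t.ST(addr, v + 1)        # write tentative value
        if t.COMMIT():           # try to commit
            return v + 1
```

You can replay the Counter benchmark by hand. Two transactions each LT the counter and then ST an incremented value. The first writer is refused, because the other transaction holds the word in its read set. The second writer commits, so the counter ends up incremented exactly once, as serializability requires.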
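The Transaction Cache, LT Actions, LTX and ST Actions and More Exciting Actions slides together describe a small state machine over the transactional tags. Below is a hypothetical sketch of just that tag handling. `TransactionalCache`, `Entry` and the `fetch` callback are illustrative names. Coherence states and write-back of dirty victims are not modelled. Treating further accesses after an abort as no-ops is an assumption of this sketch.

```python
from enum import Enum

# Hypothetical model of the transactional cache's tag handling only.
Tag = Enum("Tag", "EMPTY NORMAL XCOMMIT XABORT")


class Entry:
    def __init__(self):
        self.tag, self.addr, self.data = Tag.EMPTY, None, None


class TransactionalCache:
    """Coherence states, write-back of dirty victims and real bus traffic are
    not modelled. fetch(addr, exclusive) stands in for a T_READ (exclusive=False)
    or T_RFO (exclusive=True) bus cycle and returns None for a BUSY response."""

    def __init__(self, n_lines, fetch):
        self.lines = [Entry() for _ in range(n_lines)]
        self.fetch = fetch
        self.tactive = False   # TACTIVE: a transaction is in progress
        self.tstatus = True    # TSTATUS: False once the transaction aborts

    def _find(self, addr, tag):
        return next((e for e in self.lines if e.tag is tag and e.addr == addr), None)

    def _install(self, addr, data, tag):
        # Allocation preference: EMPTY, then NORMAL, then XCOMMIT entries.
        for wanted in (Tag.EMPTY, Tag.NORMAL, Tag.XCOMMIT):
            victim = next((e for e in self.lines if e.tag is wanted), None)
            if victim is not None:
                victim.tag, victim.addr, victim.data = tag, addr, data
                return
        raise RuntimeError("data set does not fit in the transactional cache")

    def _discard(self):
        for e in self.lines:
            if e.tag is Tag.XABORT:
                e.tag = Tag.EMPTY    # drop tentative values
            elif e.tag is Tag.XCOMMIT:
                e.tag = Tag.NORMAL   # original values become ordinary lines

    def _access(self, addr, exclusive):
        self.tactive = True
        if not self.tstatus:
            return None              # assumption: no-op once aborted
        e = self._find(addr, Tag.XABORT)
        if e is not None:
            return e                 # already in this transaction's data set
        e = self._find(addr, Tag.NORMAL)
        if e is not None:
            e.tag = Tag.XABORT       # this copy will receive tentative writes
            self._install(addr, e.data, Tag.XCOMMIT)
            return e
        data = self.fetch(addr, exclusive)
        if data is None:             # BUSY: abort the transaction
            self.tstatus = False
            self._discard()
            return None
        self._install(addr, data, Tag.XABORT)
        self._install(addr, data, Tag.XCOMMIT)
        return self._find(addr, Tag.XABORT)

    def lt(self, addr, exclusive=False):  # exclusive=True models LTX
        e = self._access(addr, exclusive)
        return 0 if e is None else e.data  # arbitrary data after an abort

    def st(self, addr, value):
        e = self._access(addr, exclusive=True)
        if e is not None:
            e.data = value           # only the XABORT copy is updated

    def validate(self):
        ok = self.tstatus
        if not ok:
            self.tstatus, self.tactive = True, False
        return ok

    def abort(self):
        self._discard()
        self.tstatus, self.tactive = True, False

    def commit(self):
        ok = self.tstatus
        for e in self.lines:
            if e.tag is Tag.XCOMMIT:
                e.tag = Tag.EMPTY    # drop the pre-transaction copies
            elif e.tag is Tag.XABORT:
                e.tag = Tag.NORMAL   # tentative values become the real ones
        self.tstatus, self.tactive = True, False
        return ok
```

Suppose `fetch` returns `None` (BUSY) for one address. An `lt` on that address then leaves the lines read earlier as NORMAL with their original data and leaves no XABORT entries behind. The next `validate` returns False.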
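The Snoopy Cache Actions slide amounts to a response table for each cache. The functions below are a hypothetical sketch of that table. Responses are reduced to IGNORE, DATA and BUSY, and coherence states are written as MESI letters. Having a cache always supply data on a hit is a simplification assumed here.

```python
# Hypothetical model of the snoop responses; coherence states are written
# as MESI letters (I, S, E, M), which is an assumption of this sketch.
CYCLES = ("READ", "RFO", "T_READ", "T_RFO")


def snoop_regular(cycle, state):
    """Regular cache: T_READ is treated like READ, T_RFO like RFO.
    Returns (response, new_state)."""
    if cycle not in CYCLES:
        raise ValueError(f"unknown bus cycle {cycle!r}")
    if state == "I":
        return ("IGNORE", "I")          # line not cached here
    if cycle in ("READ", "T_READ"):
        return ("DATA", "S")            # supply the line, keep a shared copy
    return ("DATA", "I")                # RFO / T_RFO: supply it and invalidate


def snoop_transactional(cycle, tag, state):
    """Transactional cache; tag is EMPTY, NORMAL, XCOMMIT or XABORT."""
    if cycle not in CYCLES:
        raise ValueError(f"unknown bus cycle {cycle!r}")
    if tag == "EMPTY" or state == "I":
        return ("IGNORE", state)
    if cycle in ("READ", "RFO"):        # non-transactional cycle
        if tag != "NORMAL":
            return ("IGNORE", state)    # only NORMAL entries respond
        return snoop_regular(cycle, state)
    if cycle == "T_READ" and state == "S":
        return ("DATA", "S")            # sharing a read-only line is allowed
    return ("BUSY", state)              # refuse; the requester will abort
```

The asymmetry is the point of the design. Concurrent transactional reads of a shared line proceed. Any transactional request that would let two transactions hold conflicting copies is refused, and because the refusal aborts the requester rather than the holder, no transaction ever waits for another.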
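The Typical Transaction loop ends in `else backoff`, but the slides do not specify the backoff policy. This sketch assumes randomized exponential backoff, which is a common choice. `run_with_backoff`, `base` and `cap` are hypothetical names. `attempt` stands for one pass of the loop body (reads, validate, compute, tentative writes, commit) and returns the commit outcome. Unlike the slide's loop, this sketch also backs off after a failed VALIDATE.

```python
import random
import time


def run_with_backoff(attempt, base=1e-6, cap=1e-3, rng=random.random, sleep=time.sleep):
    """Hypothetical helper: call attempt() until it returns True, waiting a
    random time in a window that doubles after every failure (capped at cap
    seconds). Returns how many attempts failed before the success."""
    failures = 0
    while not attempt():
        window = min(cap, base * 2 ** failures)   # exponential growth, capped
        sleep(rng() * window)                     # random point in the window
        failures += 1
    return failures
```

Passing `rng` and `sleep` as parameters keeps the helper testable. Doubling the window spreads retries out, so transactions that just collided are less likely to collide again, and the cap bounds the worst-case wait.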