Transactional Memory
Yujia Jin
Locks and Their Problems
• Locks are commonly used to protect shared data
• Priority Inversion
– A lower-priority process holds a lock needed by a higher-priority process
• Convoy Effect
– When a lock holder is interrupted (e.g., descheduled), other processes are forced to wait
• Deadlock
– Circular dependence among processes acquiring locks, so every process waits for a lock held by another
Lock-free
• A shared data structure is lock-free if its operations do not require mutual exclusion
- Does not prevent multiple processes from operating on the same object
+ Avoids the problems of locks
- Existing lock-free techniques are implemented in software and do not perform as well as their lock-based counterparts
Transactional Memory
• Uses transaction-style operations to manipulate lock-free data structures
• Allows users to define customized read-modify-write operations on multiple, independent words
• Easy to support in hardware: straightforward extensions to conventional multiprocessor cache-coherence protocols
Transaction Style
• A finite sequence of machine instructions consisting of
– a sequence of reads,
– computation,
– a sequence of writes, and
– a commit
• Formal properties
– Atomicity, Serializability (~ACID)
Access Instructions
• Load-transactional (LT)
– Reads a value from shared memory into a private register
• Load-transactional-exclusive (LTX)
– Like LT, plus a hint that the location is likely to be written
• Store-transactional (ST)
– Tentatively writes a value from a private register to shared memory; the new value is not visible to other processors until the transaction commits
State Instructions
• Commit
– Tries to make the tentative writes permanent
– Succeeds only if no other transaction has updated any location in its read or write set, and no other transaction has read any location in its write set
– On failure, discards all updates to the write set
– Returns whether it succeeded
• Abort
– Discards all updates to the write set
• Validate
– Returns the current transaction status
– If the status is false, discards all updates to the write set
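To make the instruction semantics concrete, here is a small single-threaded Python model. It is a hypothetical sketch, not the hardware design: `TMemory`, `Transaction`, and their methods are names invented here, shared memory is a dict, and conflicts are detected lazily when a writer commits (every still-active transaction whose read or write set contains a just-committed address is marked as failed). That check is a simplification of the commit rule, chosen because, in this model, it is enough to keep committed transactions serializable.

```python
class TMemory:
    """Hypothetical software model of the TM instructions; shared words live in a dict."""

    def __init__(self):
        self.words = {}      # committed, globally visible values
        self.active = []     # transactions that have begun but not finished

    def begin(self):
        tx = Transaction(self)
        self.active.append(tx)
        return tx


class Transaction:
    def __init__(self, mem):
        self.mem = mem
        self.read_set = set()
        self.write_set = {}  # addr -> tentative value, invisible to other transactions
        self.status = True   # becomes False once a conflicting commit dooms us

    def lt(self, addr):
        self.read_set.add(addr)
        return self.write_set.get(addr, self.mem.words.get(addr, 0))

    def ltx(self, addr):
        # Same value as LT; in hardware LTX only hints that a store will follow.
        return self.lt(addr)

    def st(self, addr, value):
        self.write_set[addr] = value

    def _finish(self):
        self.read_set.clear()
        self.write_set.clear()   # discard any tentative writes
        if self in self.mem.active:
            self.mem.active.remove(self)

    def validate(self):
        if not self.status:
            self._finish()
        return self.status

    def abort(self):
        self._finish()

    def commit(self):
        ok = self.status
        if ok:
            written = set(self.write_set)
            self.mem.words.update(self.write_set)
            for other in self.mem.active:
                touched = other.read_set | set(other.write_set)
                if other is not self and written & touched:
                    other.status = False   # other's data set was just updated
        self._finish()
        return ok


mem = TMemory()
mem.words["x"] = 1
t1, t2 = mem.begin(), mem.begin()
t1.st("x", t1.ltx("x") + 1)    # t1 tentatively writes 2
t2.st("x", t2.ltx("x") + 10)   # t2 still sees 1 and tentatively writes 11
print(t1.commit())             # True: x becomes 2 and t2, which touched x, is doomed
print(t2.validate())           # False: t2 must retry
print(mem.words["x"])          # 2
```

Checking conflicts at the writer's commit keeps the model short; the hardware instead detects them eagerly through the cache protocol.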
Typical Transaction
/* keep trying */
while (true) {
  /* read variables */
  v1 = LT(V1); …; vn = LT(Vn);
  /* check consistency */
  if (!VALIDATE()) continue;
  /* compute new values */
  compute(v1, …, vn);
  /* write tentative values */
  ST(V1, v1); …; ST(Vn, vn);
  /* try to commit */
  if (COMMIT()) return result;
  else backoff();
}
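The typical transaction loop can be tried out in ordinary Python. Because there is no transactional hardware to call, this sketch swaps in a software stand-in: each word carries a version number, LT records the value and version it saw, and COMMIT, under one global lock, succeeds only if no version in the read set has changed. `VersionedMemory`, `Tx`, and `atomic_increment` are hypothetical names; the randomized exponential backoff mirrors the `backoff` step in the loop.

```python
import random
import threading
import time


class VersionedMemory:
    """Software stand-in for TM (an assumption, not the hardware scheme):
    every word has a version number and commits are serialized by one lock."""

    def __init__(self):
        self.values, self.versions = {}, {}
        self.lock = threading.Lock()


class Tx:
    def __init__(self, mem):
        self.mem = mem
        self.reads = {}    # read set: addr -> (value, version) seen by LT
        self.writes = {}   # write set: tentative values, invisible until commit

    def lt(self, addr):
        if addr in self.writes:
            return self.writes[addr]
        if addr not in self.reads:
            with self.mem.lock:        # read value and version together
                self.reads[addr] = (self.mem.values.get(addr, 0), self.mem.versions.get(addr, 0))
        return self.reads[addr][0]

    def st(self, addr, value):
        self.writes[addr] = value

    def _reads_current(self):
        return all(self.mem.versions.get(a, 0) == ver for a, (_, ver) in self.reads.items())

    def validate(self):
        with self.mem.lock:
            return self._reads_current()

    def commit(self):
        with self.mem.lock:
            if not self._reads_current():
                return False           # another commit updated our read set
            for addr, value in self.writes.items():
                self.mem.values[addr] = value
                self.mem.versions[addr] = self.mem.versions.get(addr, 0) + 1
            return True


def atomic_increment(mem, addr, rng, max_backoff=1024):
    backoff = 1
    while True:                                    # keep trying
        tx = Tx(mem)
        v = tx.lt(addr)                            # read variables
        if not tx.validate():                      # check consistency
            continue
        tx.st(addr, v + 1)                         # compute and write tentative value
        if tx.commit():                            # try to commit
            return
        time.sleep(rng.random() * backoff * 1e-6)  # randomized exponential backoff
        backoff = min(backoff * 2, max_backoff)


mem = VersionedMemory()


def worker(seed, n=1000):
    rng = random.Random(seed)
    for _ in range(n):
        atomic_increment(mem, "counter", rng)


threads = [threading.Thread(target=worker, args=(seed,)) for seed in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(mem.values["counter"])   # 4000
```

Each successful commit re-checks, under the lock, that the counter still has the version it read, so no increment is lost and the final count is exactly 4 × 1000.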
Warning…
• Not intended for database use
• Transactions are expected to be short-lived
• Transactions are expected to touch small data sets
Idea Behind Implementation
• The existing cache protocol already detects access conflicts
• Access conflicts ~ transaction conflicts
• Can be extended to any cache-coherence protocol
– Including bus snooping and directory protocols
Bus Snoopy Example
• Processor connects to the bus through two caches
– Regular cache: 2048 8-byte lines, direct mapped
– Transaction cache: 64 8-byte lines, fully associative
• The two caches are exclusive (a line resides in at most one of them)
• The transaction cache holds tentative writes without propagating them to other processors
Transaction Cache
• Each cache line has a separate transactional tag in addition to the coherence-protocol tag
– Transactional tag states: EMPTY, NORMAL, XCOMMIT, XABORT
• Two entries per transactional line
– XABORT holds the modified value; it is set to EMPTY on abort
– XCOMMIT holds the original value; it is set to EMPTY on commit
• Allocation policy, in decreasing order of preference
– EMPTY entries, NORMAL entries, XCOMMIT entries
• Must guarantee a minimum transaction size
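A hypothetical Python model of this tag bookkeeping, assuming a small fully associative array of entries: `begin_line` installs the XCOMMIT/XABORT pair for a line, allocation follows the EMPTY, NORMAL, XCOMMIT preference order, and commit and abort retag entries as the slide describes. Write-back of dirty victims is left out, and the class and method names are invented for illustration.

```python
from enum import Enum


class Tag(Enum):
    EMPTY = "empty"
    NORMAL = "normal"
    XCOMMIT = "xcommit"   # original value; discarded on commit
    XABORT = "xabort"     # tentative value; discarded on abort


class Entry:
    def __init__(self):
        self.tag, self.addr, self.value = Tag.EMPTY, None, None


class TransactionCache:
    """Hypothetical model of the fully associative transaction cache's tags."""

    def __init__(self, lines=64):
        self.entries = [Entry() for _ in range(lines)]

    def _allocate(self, exclude=None):
        # Preference: EMPTY, then NORMAL, then XCOMMIT; XABORT entries are never
        # replaced. Writing back a dirty victim is omitted from this sketch.
        for wanted in (Tag.EMPTY, Tag.NORMAL, Tag.XCOMMIT):
            for e in self.entries:
                if e.tag is wanted and e is not exclude:
                    return e
        raise RuntimeError("transaction too large for the transaction cache")

    def begin_line(self, addr, value):
        """Install the XCOMMIT/XABORT pair for one line; return the XABORT entry."""
        orig = self._allocate()
        work = self._allocate(exclude=orig)   # pick both victims before changing anything
        orig.tag, orig.addr, orig.value = Tag.XCOMMIT, addr, value
        work.tag, work.addr, work.value = Tag.XABORT, addr, value
        return work

    def _retag(self, to_normal, to_empty):
        for e in self.entries:
            if e.tag is to_normal:
                e.tag = Tag.NORMAL
            elif e.tag is to_empty:
                e.tag, e.addr, e.value = Tag.EMPTY, None, None

    def commit(self):
        self._retag(to_normal=Tag.XABORT, to_empty=Tag.XCOMMIT)

    def abort(self):
        self._retag(to_normal=Tag.XCOMMIT, to_empty=Tag.XABORT)

    def lookup(self, addr):
        return [(e.tag.value, e.value) for e in self.entries if e.addr == addr]


cache = TransactionCache(lines=4)
line = cache.begin_line(0x40, 5)
line.value = 6                # a tentative store updates the XABORT entry
cache.abort()
print(cache.lookup(0x40))     # [('normal', 5)]
```

Because XABORT entries are never chosen as victims, a transaction whose lines do not fit raises an error instead of silently losing tentative data, which is one way to read the "minimum transaction size" requirement.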
Bus Actions
• T_READ and T_RFO (read for ownership) bus cycles are added for transactional requests
• A transactional request can be refused with a BUSY response
• When a BUSY response is received, the requesting transaction is aborted
– This prevents deadlock and continual mutual aborts
– It can lead to starvation
Processor Actions
• Transaction active (TACTIVE) flag indicates whether a transaction is in progress; it is set by the first transactional operation
• Transaction status (TSTATUS) flag indicates whether the transaction is still live; it is set to false when the transaction is aborted
LT Actions
• Check for an XABORT entry; if found, return its value
• If none, check for a NORMAL entry
– Switch NORMAL to XABORT and allocate an XCOMMIT entry
• If none, issue T_READ on the bus, then allocate XABORT and XCOMMIT entries
• If T_READ receives BUSY, abort:
– Set TSTATUS to false
– Drop all XABORT entries
– Set all XCOMMIT entries to NORMAL
– Return arbitrary data
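The LT steps translate into a short hypothetical Python sketch. To keep it small, the transaction cache is a dict keyed by `(addr, tag)` with no capacity limit, and the bus is an assumed callable `t_read(addr)` that returns the value or `None` to stand for a BUSY response; `Processor` and its fields are illustrative names.

```python
from enum import Enum


class Tag(Enum):
    NORMAL = "normal"
    XCOMMIT = "xcommit"
    XABORT = "xabort"


class Processor:
    """Hypothetical model of the LT path; the cache is a dict (addr, tag) -> value."""

    def __init__(self, t_read):
        self.t_read = t_read    # assumed bus hook: addr -> value, or None for BUSY
        self.cache = {}
        self.tactive = False
        self.tstatus = True

    def lt(self, addr):
        self.tactive = True
        if (addr, Tag.XABORT) in self.cache:            # hit on an XABORT entry
            return self.cache[(addr, Tag.XABORT)]
        if (addr, Tag.NORMAL) in self.cache:            # hit on a NORMAL entry:
            value = self.cache.pop((addr, Tag.NORMAL))  #   NORMAL becomes XABORT,
            self.cache[(addr, Tag.XABORT)] = value      #   plus a new XCOMMIT copy
            self.cache[(addr, Tag.XCOMMIT)] = value
            return value
        value = self.t_read(addr)                       # miss: issue T_READ
        if value is None:                               # BUSY: abort the transaction
            self.tstatus = False
            for a, tag in list(self.cache):
                if tag is Tag.XABORT:
                    del self.cache[(a, tag)]            # drop tentative values
                elif tag is Tag.XCOMMIT:
                    self.cache[(a, Tag.NORMAL)] = self.cache.pop((a, tag))
            return 0    # arbitrary data; the abort is reported by VALIDATE or COMMIT
        self.cache[(addr, Tag.XABORT)] = value
        self.cache[(addr, Tag.XCOMMIT)] = value
        return value


memory = {0x10: 7}
p = Processor(t_read=memory.get)   # addresses missing from `memory` answer BUSY
print(p.lt(0x10), p.tstatus)       # 7 True
p.lt(0x20)                         # BUSY: the transaction is aborted
print(p.tstatus)                   # False
```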
LTX and ST Actions
• Same as LT, except:
– LTX and ST use T_RFO on a miss rather than T_READ
– ST also updates the XABORT entry
More Exciting Actions
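• VALIDATE
– Returns the TSTATUS flag
– If it is false, sets TSTATUS to true and TACTIVE to false
• ABORT
– Updates the cache (drops XABORT entries, sets XCOMMIT entries to NORMAL), then sets TSTATUS to true and TACTIVE to false
• COMMIT
– Returns TSTATUS, then sets TSTATUS to true and TACTIVE to false
– Drops all XCOMMIT entries and changes all XABORT entries to NORMAL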
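The flag handling for VALIDATE, ABORT, and COMMIT can be sketched in the same hypothetical style: a dict keyed by `(addr, tag)` stands in for the transaction cache, `TxState` is an invented name, and `st` assumes an earlier LT or LTX already installed the XCOMMIT/XABORT pair (the T_RFO miss path is left out).

```python
from enum import Enum


class Tag(Enum):
    NORMAL = "normal"
    XCOMMIT = "xcommit"   # original value
    XABORT = "xabort"     # tentative value


class TxState:
    """Hypothetical model of TACTIVE/TSTATUS and the end-of-transaction sweeps."""

    def __init__(self):
        self.cache = {}        # (addr, tag) -> value; stands in for the transaction cache
        self.tactive = False
        self.tstatus = True

    def st(self, addr, value):
        # Assumes an earlier LT/LTX already installed the XCOMMIT/XABORT pair.
        self.tactive = True
        self.cache[(addr, Tag.XABORT)] = value

    def _sweep(self, promote, drop):
        for addr, tag in list(self.cache):
            if tag is drop:
                del self.cache[(addr, tag)]
            elif tag is promote:
                self.cache[(addr, Tag.NORMAL)] = self.cache.pop((addr, tag))

    def validate(self):
        ok = self.tstatus
        if not ok:                       # aborted: reset for the next transaction
            self.tstatus, self.tactive = True, False
        return ok

    def abort(self):
        self._sweep(promote=Tag.XCOMMIT, drop=Tag.XABORT)   # restore originals
        self.tstatus, self.tactive = True, False

    def commit(self):
        ok = self.tstatus
        if ok:
            self._sweep(promote=Tag.XABORT, drop=Tag.XCOMMIT)  # publish tentative values
        self.tstatus, self.tactive = True, False
        return ok


s = TxState()
s.cache = {(0, Tag.XCOMMIT): 1, (0, Tag.XABORT): 1}   # as if LTX had loaded address 0
s.st(0, 2)
print(s.commit(), s.cache[(0, Tag.NORMAL)])   # True 2

s.cache = {(0, Tag.XCOMMIT): 2, (0, Tag.XABORT): 2}
s.st(0, 3)
s.abort()
print(s.cache[(0, Tag.NORMAL)], s.tactive)    # 2 False
```

COMMIT only sweeps when TSTATUS is still true; an aborted transaction's entries were already restored when the abort happened.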
Snoopy Cache Actions
• The regular cache behaves like a MESI invalidation cache, treating T_READ the same as READ and T_RFO the same as RFO
• Transactional cache
– Non-transactional cycle: behaves like the regular cache, considering NORMAL entries only
– T_READ: if the entry is valid (shared), returns its value
– All other transactional cycles: responds BUSY
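Read literally, these snoop rules form a small decision function. This hypothetical Python sketch assumes bus cycles are named by strings, that `shared` summarizes the line's coherence state, and that "ignore" means the transaction cache leaves the cycle to the other caches and memory; `tcache_snoop` is an invented name.

```python
from enum import Enum


class Tag(Enum):
    NORMAL = "normal"
    XCOMMIT = "xcommit"
    XABORT = "xabort"


def tcache_snoop(tag, shared, cycle):
    """How the transaction cache answers a bus cycle that hits one of its entries.

    `shared` is True when the entry is valid in the shared state.
    """
    if cycle in ("READ", "RFO"):                      # non-transactional cycles
        # Behave like the regular cache, but only for NORMAL entries.
        return "regular" if tag is Tag.NORMAL else "ignore"
    if cycle == "T_READ" and shared:
        return "value"                                # concurrent readers are fine
    return "BUSY"                                     # the requester must abort


for cycle in ("READ", "T_READ", "T_RFO"):
    print(cycle, tcache_snoop(Tag.XABORT, shared=True, cycle=cycle))
```

Refusing with BUSY rather than waiting is what lets the requester abort immediately, which is why the protocol cannot deadlock but can starve a transaction.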
Simulation
• Proteus Simulator
• 32 processors
• Regular cache
– Direct mapped, 2048 8-byte lines
• Transactional cache
– Fully associative, 64 8-byte lines
• Single-cycle cache access
• 4-cycle memory access
• Both snoopy-bus and directory protocols are simulated
• 2-stage network with a switch delay of 1 cycle per stage
Benchmarks
• Counter
– n processors, each incrementing a shared counter 2^16/n times
• Producer/Consumer buffer
– n/2 processors produce and n/2 processors consume through a shared FIFO
– Ends when 2^16 items have been consumed
• Doubly-linked list
– N processors try to rotate the contents from tail to head
– Ends when 2^16 items have been moved
– Which variables are shared is conditional (it depends on the list's state)
– Traditional locking can introduce deadlock
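The doubly-linked-list benchmark is where conditional sharing shows up. The sketch below assumes that rotating means dequeuing an item from the tail and enqueuing it at the head, each as its own transaction, and it uses a software stand-in for TM (per-word versions checked at commit under one lock) rather than the hardware. `atomically`, `dequeue_tail`, `enqueue_head`, and `rotate` are hypothetical names. The set of words written depends on the list's state: a dequeue also writes `head` only when it removes the last item, and an enqueue also writes `tail` only when the list was empty. With one lock per end, an operation could discover mid-way that it needs the other end's lock too, which is one way deadlock arises; here a conflicting transaction simply fails to commit and retries.

```python
import threading


class VersionedMemory:
    """Software stand-in for TM (an assumption, not the hardware scheme)."""

    def __init__(self):
        self.values, self.versions = {}, {}
        self.lock = threading.Lock()   # serializes (value, version) reads and commits


class Tx:
    def __init__(self, mem):
        self.mem, self.reads, self.writes = mem, {}, {}

    def lt(self, addr):
        if addr in self.writes:        # read our own tentative write
            return self.writes[addr]
        if addr not in self.reads:     # first read: remember value and version
            with self.mem.lock:
                self.reads[addr] = (self.mem.values.get(addr), self.mem.versions.get(addr, 0))
        return self.reads[addr][0]

    def st(self, addr, value):
        self.writes[addr] = value

    def commit(self):
        with self.mem.lock:
            if any(self.mem.versions.get(a, 0) != ver for a, (_, ver) in self.reads.items()):
                return False           # something we read was updated: fail
            for addr, value in self.writes.items():
                self.mem.values[addr] = value
                self.mem.versions[addr] = self.mem.versions.get(addr, 0) + 1
            return True


def atomically(mem, op, *args):
    while True:                        # keep trying until the commit succeeds
        tx = Tx(mem)
        result = op(tx, *args)
        if tx.commit():
            return result


def dequeue_tail(tx):
    tail = tx.lt("tail")
    if tail is None:
        return None                    # empty list
    prev = tx.lt(("prev", tail))
    tx.st("tail", prev)
    if prev is None:
        tx.st("head", None)            # removing the last item also writes head
    else:
        tx.st(("next", prev), None)
    return tail


def enqueue_head(tx, node):
    head = tx.lt("head")
    tx.st(("prev", node), None)
    tx.st(("next", node), head)
    if head is None:
        tx.st("tail", node)            # inserting into an empty list also writes tail
    else:
        tx.st(("prev", head), node)
    tx.st("head", node)


def rotate(mem, moves):
    done = 0
    while done < moves:
        node = atomically(mem, dequeue_tail)
        if node is not None:           # None means the list was momentarily empty
            atomically(mem, enqueue_head, node)
            done += 1


def contents(mem):
    """Walk head to tail, checking that prev pointers and tail agree."""
    out, prev, node = [], None, mem.values.get("head")
    while node is not None:
        assert mem.values.get(("prev", node)) == prev
        out.append(node)
        prev, node = node, mem.values.get(("next", node))
    assert mem.values.get("tail") == prev
    return out


mem = VersionedMemory()
for item in range(8):
    atomically(mem, enqueue_head, item)
workers = [threading.Thread(target=rotate, args=(mem, 500)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(sorted(contents(mem)))   # [0, 1, 2, 3, 4, 5, 6, 7]
```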
Comparisons
• Competitors
– Transactional memory
– Load-locked/store-conditional (Alpha)
– Spin lock with backoff
– Software queue
– Hardware queue
Counter Result
Producer/Consumer Result
Doubly Linked List Result
Conclusion
• Avoids extra lock variables and the problems of locks
• Trades deadlock for possible livelock/starvation
• Performance comparable to locking techniques when the shared data structure is small
• Relatively easy to implement