An Introduction to Software Transactional Memory
Alessia Milani
LaBRI, Bordeaux

Popularizing Concurrent Programming
• A multi-core revolution is underway
• Exploit the power of concurrent computing by restructuring applications
• Devising scalable concurrent programs is hard… unless good abstractions are available: ☞ Transaction

Transaction
A transaction is a sequence of operations by a single process on a set of shared data items (transactional objects) that ends
• either by committing: all of its updates take effect atomically
• or by aborting: it has no effect (and is typically restarted)

Transactional Memory (TM)
• To simplify: just wrap (sequential) code in begin / end transaction
• TM synchronizes memory accesses so that each transaction seems to execute sequentially and in isolation
begin-transaction
--------------------------------------------------------
end-transaction

Implementing Transactional Memory
• TM was originally suggested as a hardware platform [Herlihy and Moss 1993]
  – HTM is in today's hardware platforms, e.g. Intel, IBM, Sun
• Purely in software:
  – First Software Transactional Memory (STM), only for static transactions [Shavit & Touitou 1995]
  – First dynamic STM [Herlihy, Luchangco, Moir and Scherer 2003]
• Hybrid schemes (HyTM) that combine hardware and software [Moir et al.
2006]

Implementing Transactional Memory in Software (STM)
• Data representation for transactions and data items using base objects
• Algorithms for operations on data items, applying primitives to base objects
  – registers, CAS, DCAS
[Figure: a transaction (begin-transaction, read, read, …, write, TryCommit, end-transaction) mapped by algorithms onto base objects]
Asynchronous processes execute these algorithms to carry out the operations of the transactions

3 levels of abstraction
• Transaction
• Operations
  – on data items: e.g., read and write
  – tryCommit / tryAbort
• Primitives on base objects (registers, CAS…)

STM algorithms
Main Techniques

Back to TM Consistency
• Serializability: committed transactions appear to execute sequentially
• Strict serializability: also preserves the order of non-overlapping transactions [Papadimitriou 1979]
• Opacity: even transactions that later abort are (strictly) serializable [Guerraoui, Kapalka PPoPP 2008]
• Much more …
[Figure: timelines of overlapping transactions (begin-Tx, read, write, TryC, Commit); consistency conditions shown: serializability, strict serializability, opacity]

Conflicts
[Figure: two example executions
  Example 1: p1: begin-Tx, Read(x)0, Write(x)1, Commit; p2: begin-Tx, Read(x)0, Write(x)2, Commit
  Example 2: p1: begin-Tx, Read(x)0, Write(y)1, Commit; p2: begin-Tx, Read(y)0, Write(x)1, Commit]
• Two concurrent transactions have a conflict if they access the same data item and at least one of these accesses is a Write operation.
• Two transactions that cannot be serialized have a conflict.
(The converse is not true)

Design approaches
• Deferred vs. direct updates: operate on local copies of the data items and install changes at commit, or modify in place and roll back on abort
• Detect conflicts
  – at commit time
  – at encountering time
• Resolve a conflict either by aborting one of the conflicting transactions or by waiting for it / helping it to complete
  – This depends on the progress you want
  – Contention manager

Contention Manager Strategies
• Priority to
  – Oldest?
  – Most work?
  – …
• None dominates
• Lots of empirical work, but formal work is in its infancy

Progress for TM
• Lock-free TM
  – Wait-freedom: each non-faulty process completes (successfully) its transaction within a finite number of steps
  – Obstruction-freedom: a process running solo eventually commits its transaction
• Lock-based TM
  – Weakly progressive: a transaction aborts only if it has conflicts [Guerraoui, Kapalka POPL 2009]
  – Strongly progressive: at least one of the transactions involved in the conflict commits
  – Multi-version permissive: only a writing transaction that conflicts with another writing transaction aborts [Perelman, Fan, Keidar PODC 2010]; read-only transactions always commit

STM algorithms
Two Case Studies

A lock-based STM : TL2 [Dice et al.
DISC 2006]
• Each data item is associated with a version number
  – TL2 relies on a global versioning clock
• Transaction keeps
  – Read set: data items & values read
  – Write set: data items & values to be written
• Deferred update
  – Changes installed at commit
• Lazy conflict detection
  – Conflicts detected at commit

Read-Only Transactions
• Copy version clock to local read version clock RV
• Each read operation is postvalidated by checking the lock and version # of the corresponding memory location: read lock, version #, and memory; check unlocked and version # ≤ RV
• COMMIT if no post-validation fails; otherwise ABORT as soon as one read fails
• We have taken a snapshot without keeping an explicit read set!
[Figure: memory locations, each with a versioned lock; Shared Version Clock = 100; Private Read Version (RV) = 100]

Example Execution: Read-Only Transaction
1. RV ← Shared Version Clock
2. On Read: read lock, read mem, read lock: check unlocked, unchanged, and v# <= RV
3. Commit. Reads form a snapshot of memory. No read set!
[Figure: memory words and their versioned locks (V#); Shared Version Clock = 100; RV = 100]
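The three steps of the read-only example can be sketched in Python. This is a minimal single-threaded illustration under stated assumptions, not TL2's actual code: the names `Location`, `VersionClock`, `ReadOnlyTx`, `Abort` and `install` are hypothetical, the versioned lock is modelled as two plain fields, and a real implementation would sample the lock word with atomic loads while other threads run.

```python
from dataclasses import dataclass


class Abort(Exception):
    """Raised when a read fails post-validation; the caller restarts."""


@dataclass
class Location:
    """A memory word plus its versioned lock (hypothetical layout)."""
    value: int
    version: int = 0
    locked: bool = False


class VersionClock:
    """Shared version clock; a real STM would advance it atomically."""

    def __init__(self, start: int = 0) -> None:
        self.time = start


class ReadOnlyTx:
    def __init__(self, clock: VersionClock) -> None:
        # Step 1: copy the shared version clock into the private RV.
        self.rv = clock.time

    def read(self, loc: Location) -> int:
        # Step 2: read lock, read memory, read lock again.
        # (In this single-threaded sketch the two samples always agree.)
        before = (loc.locked, loc.version)
        value = loc.value
        after = (loc.locked, loc.version)
        # Post-validate: unlocked, unchanged, and version <= RV.
        if after[0] or before != after or after[1] > self.rv:
            raise Abort()
        return value

    def commit(self) -> None:
        # Step 3: every read was already validated against RV,
        # so there is no read set to re-check.
        return None


def install(clock: VersionClock, loc: Location, value: int) -> None:
    """Stand-in for a committed writer: advance the clock, store, stamp."""
    clock.time += 1
    loc.value = value
    loc.version = clock.time
```

For instance, a transaction that starts with RV = 100 can read any location whose version is at most 100; if `install` stamps a location with version 101 afterwards, a later `read` of that location by the same transaction raises `Abort`, and a restarted transaction (RV = 101) reads the new value.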
Writing Transactions
• Copy version clock to local read version clock RV
• On read/write, check: unlocked & version # ≤ RV; add to R/W set

On Commit
• Acquire write locks
• Increment version clock
• Check version numbers ≤ RV
• Update memory
• Update write version #s
[Figure: memory locations with versioned locks; Shared Version Clock advances from 100 to 101; Private Read Version (RV) = 100; the written locations x and y receive version 101]

Example: Writing Transaction
1. RV ← Shared Version Clock
2. On Read/Write: check unlocked and v# <= RV, then add to Read/Write-Set
3. Acquire Locks
4. WV = F&I(VClock)
5. Validate each v# <= RV
6. Release locks, setting v# to WV
[Figure: memory words X and Y and their versioned locks (V#) before and after commit; Shared Version Clock goes from 120 to 121; RV = 100]

A lock-free STM : Dynamic Software Transactional Memory
• Proposed in [Herlihy et al.
PODC 2003]
  – Opacity & obstruction freedom
• Transaction keeps
  – Read set: data items & values read
• Direct update
  – Changes installed when the corresponding Write is executed
• Eager conflict detection
  – Conflicts detected at encountering time

DSTM : transaction and transactional object representation
• A transaction has
  – a status field that is initialized to ACTIVE, and is later set to COMMITTED or ABORTED using a CAS primitive
  – a readlist to store the data items read together with the values read
• Each transactional object has the following structure
[Figure: TMObject with a start pointer to a Locator; the Locator has three fields: transaction (pointing to a status), new object (pointing to Data), old object (pointing to Data). The transaction field gives the status of the transaction that most recently accessed the object to write it.]

Current object version
• The current object version is determined by the status of the transaction that most recently accessed the object to WRITE:
  – committed: the new object is the current
  – aborted: the old object is the current
  – active: the old object is the current, and the new is tentative
• The actual version only changes when a commit is successful

Write operation : example
• Transaction A tries to write object o. Let B be the transaction that most recently accessed o to WRITE it
[Figure: B is committed; o's start pointer refers to B's Locator; A's Locator (status active) is being installed]
  1. A creates a new Locator
  2. A copies the previous new object, and sets new to the copy
  3. A sets old object to the previous new
  4. Use CAS in order to replace the locator; if CAS fails, A restarts from the beginning

Which is the current version of the object if B is active?
• A and B are conflicting transactions that run at the same time
• Use Contention Manager to decide which should continue and which should abort
• If B needs to abort, try to change its status to aborted (using CAS)

Read operation
• To read object o by a transaction A
  – Fetch the current version v just as before
  – Add the pair (o, v) to the read set of A

Validating a transaction
• Before returning the value either read or written, check consistency
• For each pair (o, v) in the read set, verify that v is still the most recently committed version of the transactional object o
• Check that the status of the transaction is still ACTIVE

Committing a transaction
• The commit needs to do the following:
  1. Validate the transaction
  2. Change the transaction's status from active to committed (using CAS)

That's it?
[Figure: map of further topics with a "You are here" marker: Elastic Txs, Irrevocable transactions, Multiversioning, Privatization, Distributed STM, Nested transactions, Lower bounds]

More references and credits
Many of these slides are from, or largely inspired by:
  – The slides of "The Art of Multiprocessor Programming" by Maurice Herlihy and Nir Shavit
  – A PODC 2010 talk by Hagit Attiya
  – Teaching slides by Danny Hendler
Other reference:
  – Transactional Memory. Foundations, Algorithms, Tools, and Applications. COST Action Euro-TM IC1001. Lecture Notes in Computer Science, Springer 2014.