The Complexity of Transactional Memory & What to Do About It
Hagit Attiya
Technion & EPFL

The Challenge of Concurrent Programming
• A multi-core revolution is underway
• Exploit the power of concurrent computing by restructuring applications
• Writing concurrent applications is harder than sequential programming

Transactional Memory (TM)
• A way to deal with the difficulty of writing concurrent applications
• In its simplest form, just wrap code in begin / end transaction
• TM synchronizes memory accesses so that each transaction seems to execute sequentially and in isolation
[Figure: a code region enclosed between begin-transaction and end-transaction]

A Brief History of TM
• TM originally suggested as a hardware platform [Herlihy and Moss 1993]
• Software transactional memory (STM), essentially optimized multi-word synchronization (static) [Shavit & Touitou 1995]
• Popularization in the programming languages & architecture communities [Rajwar 2002]
• First made dynamic only with a weaker liveness condition (obstruction-freedom) [Herlihy, Luchangco, Moir and Scherer 2003]

The Promise
• TM will track memory accesses and allow transactions to proceed concurrently if they do not conflict
• Optimism (begin-transaction ... end-transaction) vs. pessimism (lock at entry ... unlock at exit)

2-3 Levels of Abstraction
• Transactions: each is a sequence of operations accessing data items, by a single thread
• Operations
  – on data items, e.g., Read and Write
  – TryCommit / TryAbort
• Data set = Read set ∪ Write set
• Primitives on base objects (load, store, CAS)
[Figure: operations read, write, tryC]

More Modeling
• Data representation for transactions and data items using base objects
• Algorithms for operations, applying primitives to base objects
  – load, store, CAS, DCAS
• Asynchronous processes invoke these procedures
• This leads to interleaved executions, in the standard sense
[Figure: STM]

Safety
• Serializability: transactions appear to execute sequentially
• Strict serializability: preserves the order of non-overlapping transactions [Papadimitriou 1979]
• Opacity: even transactions that later abort are (strictly) serializable [Guerraoui, Kapalka POPL 2008]
  – Also supports operations other than read and write
• Snapshot isolation
[Figure: snapshot isolation, serializability, strict serializability, opacity]

The Many Faces of Progress
• TM may abort transactions in case of conflicts
• Could admit trivial implementations
• Several progress properties
• When locking is not allowed:
  – Wait-freedom
  – Obstruction-freedom

Progress for Lock-Based TM
• Better performance with locks [Dice, Shalev, Shavit DISC 2006]
• Weakly progressive: a transaction aborts only if it has conflicts [Guerraoui, Kapalka POPL 2009]
• Strongly progressive: at least one of the transactions involved in a conflict commits
• Minimally progressive: a transaction commits if it runs alone, with no pending transactions
• Multi-version permissive: only an update transaction that conflicts with another update transaction aborts [Perelman, Fan, Keidar PODC 2010]
  – Read-only transactions always commit
[Figure: multi-version permissive, wait-free, strongly progressive, obstruction-free, weakly progressive, minimally progressive]

The Consensus Number of TM
• Minimally progressive TMs solve consensus for at most two processes [Guerraoui, Kapalka SPAA 2008]
  – Their consensus number is 2
  – Holds for obstruction-free and weakly progressive TMs
• Key step: equivalence with a consensus object that fails in a very clean manner [A, Guerraoui, Hendler, Kuznetsov PODC 2006]
[Figure: propose; decide(v) / fail]

Invisible Reads
• Optimize read-only transactions, which in principle need not modify the shared memory
• Invisible reads:
  – Read operations do not store
  – Read-only transactions do not store at all
• Semi-visible reads: read operations store some information, but not very detailed
  – E.g., [Dice, Matveev, Shavit Transact 2010]
  – Oblivious STM [A & Hillel DISC 2010]

Step Complexity Lower Bound
• A read operation has Ω(|read set|) step complexity in an STM that is
  – single version
  – with invisible reads
  – weakly progressive
[Guerraoui, Kapalka PPoPP 2008]

Predicting TM Scalability
• Unrelated transactions progress independently even if they are concurrent
• Represent relations between transactions by a conflict graph:
  – Vertices represent transactions
  – Edges connect transactions that share a data item
• Example: T1{A,B,C}, T2{A,D}, T3{D,E}, T4{F,L}, T5{L}, T6{J}
[Figure: conflict graph with vertices T1, T2, T3, T4, T5, T6]
• Disjoint-access transactions are not connected in the graph
• Strictly disjoint-access transactions are not adjacent

Disjoint Access Parallelism
• TM is DAP: two transactions concurrently contend on the same base object only if they are not disjoint-access
  – Contend: both access the same base object, and at least one access is a store
  – ~ [Israeli and Rappoport PODC 1995]
• Similar definition for strict DAP
[Figure: conflict graph with vertices T1, T2, T3, T4, T5, T6]

Achieving Disjoint-Access Parallelism
• No obstruction-free and strict DAP STM [Guerraoui, Kapalka 2008]
• But there is an obstruction-free and DAP STM [Herlihy, Luchangco, Moir and Scherer 2003]
• Not if read-only transactions are invisible and always succeed in committing [A, Hillel, Milani SPAA 2009]

Achieving DAP
• A read-only transaction has Ω(|read set|) stores when the STM is
  – MV-permissive (read-only transactions commit)
  – DAP
[A, Hillel, Milani SPAA 2009]
• Holds for strict serializability and opacity
• Also for serializability and snapshot isolation (under a slightly stronger notion of DAP)

Privatization
• Apply loads and stores to the underlying data (uninstrumented access)
• Avoids transactional overhead
[Spear, Marathe, Dalessandro, Scott 2007] [Shpeisman, Menon, Adl-Tabatabai, Balensiefer, Grossman, Hudson, Moore, Saha 2007]

Cost of Privatization
• Uninstrumented access cannot be achieved without prior privatization [Guerraoui, Henzinger, Kapalka, Singh SPAA 2010] [A, Hillel DISC 2010]
• Must invoke a privatizing transaction or a privatizing barrier [Dice, Matveev, Shavit Transact 2010]
• Unless parallelism is reduced or detailed information is kept, privatization cost is linear in the number of privatized items [A, Hillel DISC 2010]

And a few more results…

So, In Theory
• TM cannot efficiently provide clean semantics: either weaken the consistency semantics or compromise the progress guarantees
• Limited scalability & significant cost
• TM is not an expressive programming idiom

But In Practice, We are Fine, No?
Not really…
• Worst-case lower bounds are not for corner cases
  – likely to happen in practice
  – hard to program around them
• Implementation-focused research seems to be hitting the same wall [Cascaval, Blundell, Michael, Cain, Wu, Chiras, Chatterjee 2008]
• Design choices compromise either simplicity
  – Elastic STM [Felber, Gramoli, Guerraoui DISC 2009]
  or scalability
  – Single-lock STMs [Olszewski, Cutler, Steffan] [Dalessandro, Spear, Scott]

A Post-TM Era
• TM cannot make programs run correctly and efficiently without the programmer’s awareness
• Stop hiding the realities of concurrency
  – Expose a cleaner model of a multi-core that does not hide tradeoffs
  – Provide additional methodologies and tools
• Multitude of approaches – I will discuss two

Approach I: Optimizing Coarse-Grain Programming
• For applications with a moderate amount of contention (say, <32 threads), the overhead of managing the memory can outweigh the synchronization cost
• Access the data mostly “in exclusion”
• Combining: the thread winning the lock carries out many of the pending operations [Hendler, Incze, Shavit, Tzafrir SPAA 2010]
• Without locking: optimize the memory utilization of Herlihy's universal construction [Chuong, Ellen, Ramachandran SPAA 2010]

Approach II: Programming with Mini-Transactions
• Extension of DCAS or kCAS (for small k’s) or a multi-location variant of LL/SC [PowerPC, DEC Alpha]
  – Short
  – Works on a small, static data set
  – Simple functionality
  – No I/O, out-of-core memory accesses, etc.
• May fail spuriously

Mini-Transactions
  Lower bounds use                          Mini-transactions have
  • large, dynamic data sets                • small, static data sets
  • long transactions                       • short transactions
  • data accessed w/ arbitrary operations   • simple functionality, e.g., arithmetic,
    and unrestricted calculations             comparison, and memory access

Mini-Transactions & HTM
• Mini-transactions are almost provided by recent hardware TM proposals
  – AMD Advanced Synchronization Facility [2009]
  – Sun [Chaudhry, Cypher, Ekman, Karlsson, Landin, Yip, Zeffer, and Tremblay Micro 2009]
• Best-effort: transactions can be aborted for reasons other than conflicts
  – TLB misses, interrupts, certain function-call sequences, division instructions

Algorithmic Challenges
• Mini-transactions provide a significant handle on the difficult task of writing concurrent applications
  – DCAS is already a big help [A, Hillel, 2006, 2009]
  – Experience with hardware TM support [Dice, Lev, Marathe, Moir, Olszewski, Nussbaum SPAA 2010] [Carouge, Spear, DISC 2010]
• Design algorithms accommodating the best-effort nature of mini-transactions
• Avoid sure killers
• Work around the small data sets
  – amorphous data parallelism [Pingali, Kulkarni, Nguyen, Burtscher, Mendez-Lojo, Prountzos, Sui, Zhong 2009]

Programming Support
• Creating patterns for employing mini-transactions, hopefully encapsulated within programming language support
• Cleanly combine with native (uninstrumented) access to the locations accessed by mini-transactions
  – Beware of privatization scenarios

Summary
• Facilitate the design of efficient and correct concurrent applications in the post-TM era
  – Capitalize on lessons learned and wide interest in TM
  – Multitude of approaches
• Specifically, develop a model, algorithms and programming patterns for best-effort mini-transactions

Thank you!
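
Sketch: a programming pattern for best-effort mini-transactions

One pattern of the kind the Summary asks for might look like this: attempt a DCAS-style mini-transaction a bounded number of times, and fall back to a lock if it keeps failing. This is a toy model under stated assumptions, not any hardware interface. The names `Memory`, `mini_dcas`, `locked_update`, `transfer`, and the `spurious_abort` hook are all hypothetical, and a single Python lock stands in for hardware atomicity.

```python
import threading


class Memory:
    """Toy shared memory. One lock models the atomicity that hardware
    would provide; this is an assumption of the sketch, not a real API."""

    def __init__(self, values, spurious_abort=lambda: False):
        self._cells = list(values)
        self._atomic = threading.Lock()
        # Hypothetical hook: returns True when an attempt should abort for a
        # reason unrelated to conflicts (e.g., an interrupt or a TLB miss).
        self._spurious_abort = spurious_abort

    def load(self, addr):
        with self._atomic:
            return self._cells[addr]

    def mini_dcas(self, a1, a2, old1, old2, new1, new2):
        """Best-effort double compare-and-swap on a static two-cell data set.
        Returns 'ok', 'mismatch' (a real conflict), or 'abort' (spurious)."""
        if self._spurious_abort():
            return "abort"
        with self._atomic:
            if self._cells[a1] != old1 or self._cells[a2] != old2:
                return "mismatch"
            self._cells[a1], self._cells[a2] = new1, new2
            return "ok"

    def locked_update(self, a1, a2, fn):
        """Fallback path: apply fn to both cells while holding the lock."""
        with self._atomic:
            self._cells[a1], self._cells[a2] = fn(self._cells[a1],
                                                  self._cells[a2])


def transfer(mem, src, dst, amount, max_attempts=4):
    """Move `amount` from cell src to cell dst.
    Returns the name of the path that committed the update."""
    for _ in range(max_attempts):
        s, d = mem.load(src), mem.load(dst)
        status = mem.mini_dcas(src, dst, s, d, s - amount, d + amount)
        if status == "ok":
            return "mini-transaction"
        # Both 'mismatch' and 'abort' simply retry with fresh values.
    # The mini-transaction never succeeded: take the pessimistic path.
    mem.locked_update(src, dst, lambda s, d: (s - amount, d + amount))
    return "fallback"


if __name__ == "__main__":
    aborts = iter([True, True, False])  # two spurious aborts, then success
    mem = Memory([10, 0], spurious_abort=lambda: next(aborts))
    print(transfer(mem, 0, 1, 3), mem.load(0), mem.load(1))
```

The bounded retry is what accommodates the best-effort nature of the primitive: without the fallback, an operation that hits a "sure killer" on every attempt would never complete. In this toy the same lock serializes both paths; on real hardware the mini-transaction would also have to observe the fallback lock, which the sketch does not model.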