The Complexity of Transactional Memory & What to Do About It
Hagit Attiya
Technion & EPFL
The Challenge of Concurrent Programming
A multi-core revolution is underway
Exploit the power of concurrent computing by restructuring applications
Writing concurrent applications is harder than sequential programming
Transactional Memory (TM)
A way to deal with the difficulty of writing concurrent applications
In its simplest form, just wrap code with begin-transaction / end-transaction
TM synchronizes memory accesses so that each transaction seems to execute sequentially and in isolation
A Brief History of TM
TM was originally suggested as a hardware platform
[Herlihy and Moss 1993]
Software transactional memory (STM), essentially
optimized multi-word synchronization (static)
[Shavit & Touitou 1995]
Popularization in the programming languages &
architecture communities [Rajwar 2002]
First made dynamic, but only with a weaker liveness condition (obstruction-freedom)
[Herlihy, Luchangco, Moir and Scherer 2003]
The Promise
TM will track memory accesses and will allow transactions to proceed concurrently if they do not conflict
Optimism: begin-transaction … end-transaction
vs.
pessimism: lock (entry) … unlock (exit)
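To make the contrast concrete, here is a minimal Python sketch on a shared counter. The pessimistic version holds a lock for the whole critical section; the optimistic version works on a snapshot and commits only if nobody committed in between, retrying otherwise. The class names are hypothetical, and the `threading.Lock` inside `_cas` merely stands in for a hardware compare-and-swap.

```python
import threading


class PessimisticCounter:
    """Pessimism: lock on entry, unlock on exit."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        with self._lock:  # lock (entry) ... unlock (exit)
            self.value += 1


class OptimisticCounter:
    """Optimism: compute on a snapshot, commit only if it is still current."""

    def __init__(self):
        self._state = (0, 0)               # (version, value), replaced as a whole
        self._cas_lock = threading.Lock()  # stand-in for an atomic CAS instruction

    @property
    def value(self):
        return self._state[1]

    def _cas(self, expected_version, new_state):
        # Simulated compare-and-swap keyed on the version number.
        with self._cas_lock:
            if self._state[0] != expected_version:
                return False
            self._state = new_state
            return True

    def increment(self):
        while True:
            version, value = self._state  # "begin-transaction": take a snapshot
            if self._cas(version, (version + 1, value + 1)):
                return                    # "end-transaction": commit succeeded
            # Another increment committed first: abort and retry.


def run(counter, threads=4, per_thread=1000):
    def worker():
        for _ in range(per_thread):
            counter.increment()

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value


print(run(PessimisticCounter()), run(OptimisticCounter()))  # 4000 4000
```

Both give the same result; the difference is that the optimistic version never blocks a thread while it computes, and pays with retries only when increments actually collide.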
2-3 Levels of Abstraction
Transactions, each a sequence of operations accessing data items, executed by a single thread
Operations
– on data items: e.g., Read and Write
– TryCommit / TryAbort
Data set = Read set ∪ Write set
Primitives on base objects (load, store, CAS)
More Modeling
Data representation for transactions and data items using base objects
Algorithms for operations, applying primitives to base objects
– load, store, CAS, DCAS
Asynchronous processes invoke these procedures, leading to interleaved executions in the standard sense
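The model above can be sketched as a tiny single-version STM in Python. Each data item is one base object holding `(version, value)`; `read`, `write` and `try_commit` are the operations, and the data set is the union of the read and write sets. This is an illustrative sketch, not a design from the talk: all names are hypothetical, and one global `threading.Lock` stands in for whatever CAS-based commit protocol a real STM would use. Committed transactions are strictly serializable, but a live transaction may observe an inconsistent state before its commit fails, so the sketch is not opaque.

```python
import threading

_commit_lock = threading.Lock()  # hypothetical stand-in for a CAS-based commit


class TVar:
    """A data item, represented by one base object holding (version, value)."""

    def __init__(self, value):
        self.state = (0, value)


class Transaction:
    def __init__(self):
        self.read_set = {}   # TVar -> version observed at the first read
        self.write_set = {}  # TVar -> buffered value, installed only at commit

    def data_set(self):
        return set(self.read_set) | set(self.write_set)

    def read(self, tv):
        if tv in self.write_set:          # read your own buffered write
            return self.write_set[tv]
        version, value = tv.state         # a single load; nothing is stored
        self.read_set.setdefault(tv, version)
        return value

    def write(self, tv, value):
        self.write_set[tv] = value        # deferred until try_commit

    def try_commit(self):
        with _commit_lock:
            for tv, seen in self.read_set.items():
                if tv.state[0] != seen:   # a conflicting update committed
                    return False          # abort
            for tv, value in self.write_set.items():
                tv.state = (tv.state[0] + 1, value)
            return True


def atomically(body):
    """Re-run body in a fresh transaction until it commits."""
    while True:
        tx = Transaction()
        result = body(tx)
        if tx.try_commit():
            return result


x, y = TVar(0), TVar(0)
atomically(lambda tx: tx.write(y, tx.read(x) + 1))
print(y.state)  # (1, 1)
```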
Safety
Serializability: transactions appear to execute sequentially
Strict serializability: also preserves the order of non-overlapping transactions
[Papadimitriou 1979]
Opacity: even transactions that later abort are (strictly) serializable
[Guerraoui, Kapalka POPL 2008]
– Also supports operations other than read and write
Snapshot isolation
(Figure: snapshot isolation, serializability, strict serializability, and opacity compared)
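The three conditions differ in which transactions must fit into one legal serial order, and whether that order must respect real time. A brute-force Python checker makes the difference explicit. It is a sketch for small, complete histories only: it tries every permutation, assumes every item starts at 0, assumes a transaction reads an item before writing it, and its `opaque` is a simplified form (aborted transactions must read consistently, and their writes stay invisible). All names are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import permutations


@dataclass
class Txn:
    name: str
    start: int                                  # real-time interval [start, end]
    end: int
    reads: dict = field(default_factory=dict)   # item -> value returned
    writes: dict = field(default_factory=dict)  # item -> value written
    committed: bool = True


def _legal(order):
    """Replay a serial order: each read must return the latest committed write."""
    state = {}
    for t in order:
        if any(state.get(item, 0) != v for item, v in t.reads.items()):
            return False
        if t.committed:
            state.update(t.writes)  # writes of aborted transactions stay invisible
    return True


def _respects_real_time(order):
    pos = {t.name: i for i, t in enumerate(order)}
    return all(pos[a.name] < pos[b.name]
               for a in order for b in order if a.end < b.start)


def _exists_serial_order(txns, real_time):
    return any(_legal(o) and (not real_time or _respects_real_time(o))
               for o in permutations(txns))


def serializable(history):
    return _exists_serial_order([t for t in history if t.committed], real_time=False)


def strictly_serializable(history):
    return _exists_serial_order([t for t in history if t.committed], real_time=True)


def opaque(history):
    # Simplified opacity: aborted transactions must read consistently too.
    return _exists_serial_order(list(history), real_time=True)


t1 = Txn("T1", 0, 1, writes={"x": 1})
t2 = Txn("T2", 2, 3, reads={"x": 0})  # starts after T1 ends, yet misses its write
print(serializable([t1, t2]), strictly_serializable([t1, t2]))  # True False
```

Serializability accepts the history by placing T2 before T1; strict serializability rejects it because T1 finished before T2 began.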
The Many Faces of Progress
TM may abort transactions in case of conflicts
Without progress requirements, this admits trivial implementations (e.g., one that aborts every transaction)
Several progress properties
When locking is not allowed:
• Wait-freedom
• Obstruction-freedom
Progress for Lock-Based TM
Better performance with locks [Dice, Shalev, Shavit DISC 2006]
Weakly progressive: a transaction aborts only if it has conflicts [Guerraoui, Kapalka POPL 2009]
Strongly progressive: at least one of the transactions involved in a conflict commits
Minimally progressive: a transaction commits if it runs alone, with no pending transactions
Multi-version permissive: only an update transaction that conflicts with another update transaction aborts
[Perelman, Fan, Keidar PODC 2010]
⇒ Read-only transactions always commit
(Figure: hierarchy of progress properties: MV-permissive, wait-free, strongly progressive, obstruction-free, weakly progressive, minimally progressive)
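Three of these definitions can be phrased as checks on when an abort is allowed. The Python sketch below does so for a history of transactions with real-time intervals. It is a simplification under stated assumptions: a conflict is taken to mean that two concurrent transactions share an item written by at least one of them, and wait-freedom, obstruction-freedom and strong progressiveness are not encoded. All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    name: str
    start: int
    end: int
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)
    committed: bool = True


def concurrent(a, b):
    return a is not b and a.start < b.end and b.start < a.end


def conflict(a, b):
    # Assumed definition: concurrent, and an item written by one is accessed by the other.
    shared = (a.write_set & (b.read_set | b.write_set)) | (b.write_set & a.read_set)
    return concurrent(a, b) and bool(shared)


def minimally_progressive(history):
    # Abort allowed only if the transaction did not run alone.
    return all(t.committed or any(concurrent(t, u) for u in history) for t in history)


def weakly_progressive(history):
    # Abort allowed only if the transaction has a conflict.
    return all(t.committed or any(conflict(t, u) for u in history) for t in history)


def mv_permissive(history):
    # Abort allowed only for an update transaction conflicting with another update transaction.
    return all(t.committed or (bool(t.write_set) and
                               any(u.write_set and conflict(t, u) for u in history))
               for t in history)


w = Txn("W", 0, 4, write_set={"x"})
r = Txn("R", 1, 3, read_set={"x"}, committed=False)  # read-only, aborted
print(weakly_progressive([w, r]), mv_permissive([w, r]))  # True False
```

Aborting the read-only R is acceptable for a weakly progressive TM, since R conflicts with W, but not for an MV-permissive one, where read-only transactions always commit.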
The Consensus Number of TM
Minimally progressive TMs solve consensus for at most two processes [Guerraoui, Kapalka SPAA 2008]
⇒ Their consensus number is 2
Also holds for obstruction-free and weakly progressive TMs
Key step: equivalence with a consensus object that fails in a very clean manner
[A, Guerraoui, Hendler, Kuznetsov PODC 2006]
(Figure: propose(v) returns either decide(v) or fail)
Invisible Reads
Optimize read-only transactions, which, in principle, need not modify the shared memory
Invisible reads: read operations do not store
⇒ Read-only transactions do not store at all
Semi-visible reads: read operations store some information, but not in full detail
E.g., [Dice, Matveev, Shavit Transact 2010]
Oblivious STM [A & Hillel DISC 2010]
Step Complexity Lower Bound
A read operation has Ω(|read set|) step complexity in an STM that is
– single version
– with invisible reads
– weakly progressive
[Guerraoui, Kapalka PPoPP 2008]
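The following Python sketch shows how the bound shows up in the natural design, incremental validation: because an invisible read leaves no trace for writers, each new read must re-check every item read so far, so the k-th read performs k base-object loads. It illustrates the cost and is not a proof of the lower bound. The names are hypothetical, and a writer is assumed to commit by replacing `item.state` with a higher version.

```python
class TxAbort(Exception):
    pass


class Item:
    """A data item kept in one base object as (version, value)."""

    def __init__(self, value=0):
        self.state = (0, value)


class InvisibleReadTxn:
    def __init__(self):
        self.read_set = {}        # Item -> version seen; kept privately
        self.steps_per_read = []  # base-object loads performed by each read

    def read(self, item):
        version, value = item.state
        steps = 1
        # The read stores nothing, so writers cannot see it. To return a
        # consistent view, re-check every item read so far (versions only grow).
        for other, seen in self.read_set.items():
            steps += 1
            if other.state[0] != seen:
                raise TxAbort("an earlier read is no longer valid")
        self.read_set[item] = version
        self.steps_per_read.append(steps)
        return value


items = [Item(i) for i in range(5)]
tx = InvisibleReadTxn()
print([tx.read(it) for it in items], tx.steps_per_read)  # [0, 1, 2, 3, 4] [1, 2, 3, 4, 5]
```

A read-only transaction with k reads thus performs on the order of k² loads in total in this design.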
Predicting TM Scalability
Unrelated transactions progress independently even if they are concurrent
Represent relations between transactions by a conflict graph:
– Vertices represent transactions
– Edges connect transactions that share a data item
Example: T1{A,B,C}, T2{A,D}, T3{D,E}, T4{F,L}, T5{L}, T6{J}
(Figure: the conflict graph, with edges T1-T2, T2-T3, and T4-T5; T6 is isolated)
Disjoint access transactions are not connected in the graph
Strictly disjoint access transactions are not adjacent
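For the example above, a short Python sketch builds the conflict graph and tests both notions. It ignores timing and treats all six transactions as concurrent, which is an assumption; the function names are hypothetical.

```python
from itertools import combinations


def conflict_graph(data_sets):
    """Vertices are transactions; an edge joins two transactions sharing a data item."""
    adj = {t: set() for t in data_sets}
    for a, b in combinations(data_sets, 2):
        if data_sets[a] & data_sets[b]:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def connected(adj, a, b):
    """Depth-first search from a; True if b is reachable."""
    seen, stack = {a}, [a]
    while stack:
        t = stack.pop()
        if t == b:
            return True
        for u in adj[t] - seen:
            seen.add(u)
            stack.append(u)
    return False


def disjoint_access(adj, a, b):
    return not connected(adj, a, b)


def strictly_disjoint_access(adj, a, b):
    return b not in adj[a]


data_sets = {
    "T1": {"A", "B", "C"}, "T2": {"A", "D"}, "T3": {"D", "E"},
    "T4": {"F", "L"}, "T5": {"L"}, "T6": {"J"},
}
g = conflict_graph(data_sets)
print(strictly_disjoint_access(g, "T1", "T3"), disjoint_access(g, "T1", "T3"))  # True False
```

T1 and T3 share no item, so they are strictly disjoint-access, but they are linked through T2 and therefore not disjoint-access.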
Disjoint Access Parallelism
TM is DAP: two transactions concurrently contend on the same base object only if they are not disjoint-access
(contend = access the same base object, with at least one of the accesses a store)
Adapted from [Israeli and Rappoport PODC 1995]
Similar definition for strict DAP
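The DAP condition can be checked against a trace of base-object accesses. The Python sketch below reports every pair of transactions that contend on a base object while being disjoint-access. Assumptions, all hypothetical: every transaction in the trace runs concurrently, a CAS counts as a store, and the trace format `(transaction, base_object, primitive)` is made up for illustration.

```python
from itertools import combinations

WRITES = {"store", "cas"}  # assumption: a CAS counts as a store for contention


def components(data_sets):
    """Union-find: two transactions share a root iff they are connected
    in the conflict graph."""
    parent = {t: t for t in data_sets}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in combinations(data_sets, 2):
        if data_sets[a] & data_sets[b]:
            parent[find(a)] = find(b)
    return {t: find(t) for t in data_sets}


def dap_violations(data_sets, trace):
    """Return the disjoint-access pairs that contend on some base object."""
    comp = components(data_sets)
    by_object = {}
    for txn, obj, prim in trace:
        by_object.setdefault(obj, []).append((txn, prim))
    bad = set()
    for accesses in by_object.values():
        for (a, pa), (b, pb) in combinations(accesses, 2):
            contend = a != b and (pa in WRITES or pb in WRITES)
            if contend and comp[a] != comp[b]:
                bad.add(tuple(sorted((a, b))))
    return bad


data_sets = {
    "T1": {"A", "B", "C"}, "T2": {"A", "D"}, "T3": {"D", "E"},
    "T4": {"F", "L"}, "T5": {"L"}, "T6": {"J"},
}
trace = [("T1", "orec_A", "cas"), ("T2", "orec_A", "load"),
         ("T1", "clock", "cas"), ("T4", "clock", "cas"),
         ("T5", "meta", "load"), ("T6", "meta", "load")]
print(dap_violations(data_sets, trace))  # {('T1', 'T4')}
```

The hypothetical `clock` object mimics a shared counter that every committing transaction updates, such as the global version clock used by some STMs; in this toy trace it is exactly what makes the disjoint-access pair T1 and T4 contend.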
Achieving Disjoint-Access Parallelism
There is no obstruction-free, strict DAP STM
[Guerraoui, Kapalka 2008]
But there is an obstruction-free DAP STM
[Herlihy, Luchangco, Moir and Scherer 2003]
Not if, in addition, read-only transactions are invisible and always commit
[A, Hillel, Milani SPAA 2009]
Achieving DAP
A read-only transaction must perform Ω(|read set|) stores
when the STM is
– MV-permissive (read-only transactions always commit)
– DAP
[A, Hillel, Milani SPAA 2009]
Holds for strict serializability and opacity
Also for serializability and snapshot isolation
(under a slightly stronger notion of DAP)
Privatization
Once data is made private to a thread, apply loads and stores directly to the underlying data (uninstrumented access)
Avoids transactional overhead
[Spear, Marathe, Dalessandro, Scott 2007]
[Shpeisman, Menon, Adl-Tabatabai, Balensiefer,
Grossman, Hudson, Moore, Saha 2007]
Cost of Privatization
Safe uninstrumented access cannot be achieved without prior privatization
[Guerraoui, Henzinger, Kapalka, Singh SPAA 2010]
[A, Hillel DISC 2010]
Must invoke a privatizing transaction or a privatizing barrier
[Dice, Matveev, Shavit Transact 2010]
Unless parallelism is reduced or detailed information is kept, privatization cost is linear in the number of privatized items
[A, Hillel DISC 2010]
And a few more results…
So, In Theory
TM cannot efficiently provide clean semantics: it must either weaken the consistency semantics or compromise the progress guarantees
Limited scalability & significant cost
TM is not an expressive programming idiom
But In Practice, We are Fine, No?
Not really…
Worst-case lower bounds are not about corner cases
– they are likely to happen in practice
– they are hard to program around
Implementation-focused research seems to be hitting the same wall
[Cascaval, Blundell, Michael, Cain, Wu, Chiras, Chatterjee 2008]
Design choices compromise either simplicity
– Elastic STM [Felber, Gramoli, Guerraoui, DISC 2009]
Or scalability
– Single-lock STMs
[Olszewski, Cutler, Steffan] [Dalessandro, Spear, Scott]
A Post-TM Era
TM cannot make programs run correctly and efficiently without the programmer’s awareness
Stop hiding the realities of concurrency
• Expose a cleaner model of a multi-core that does not hide tradeoffs
• Provide additional methodologies and tools
Multitude of approaches
– I will discuss two
Approach I: Optimizing Coarse-Grain Programming
For applications with a moderate amount of contention (say, fewer than 32 threads), the overhead of managing the memory can outweigh the synchronization cost
Access the data mostly “in exclusion”
Combining: the thread winning the lock carries out many of the pending operations
[Hendler, Incze, Shavit, Tzafrir SPAA 2010]
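The combining idea can be sketched in Python on a shared counter: each thread publishes its request in its own slot, and whichever thread wins a try-lock applies all pending requests on everyone's behalf. This is a simplified illustration of the idea, not the cited authors' implementation; the names are hypothetical, and `threading.Event` replaces spinning on a publication record. Under CPython's GIL it shows the logic, not a speedup.

```python
import threading


class CombiningCounter:
    """Each thread publishes a request in its own slot; the lock winner
    applies every pending request to the sequential object."""

    def __init__(self, nthreads):
        self._lock = threading.Lock()
        self._slots = [None] * nthreads  # one publication record per thread
        self._value = 0                  # sequential object, touched only under the lock

    @property
    def value(self):
        return self._value

    def add(self, tid, delta):
        record = {"delta": delta, "result": None, "done": threading.Event()}
        self._slots[tid] = record                # publish the request
        while not record["done"].is_set():
            if self._lock.acquire(blocking=False):
                try:                             # we won the lock: act as combiner
                    self._combine()
                finally:
                    self._lock.release()
            else:
                record["done"].wait(0.0001)      # let the current combiner serve us
        return record["result"]

    def _combine(self):
        for i, rec in enumerate(self._slots):
            if rec is not None:
                self._value += rec["delta"]
                rec["result"] = self._value
                self._slots[i] = None            # clear the slot before signalling
                rec["done"].set()


counter = CombiningCounter(nthreads=4)
results = [[] for _ in range(4)]


def worker(tid):
    for _ in range(500):
        results[tid].append(counter.add(tid, 1))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 2000
```

Clearing a slot before setting its `done` event ensures a thread cannot publish its next request into a slot the combiner is about to overwrite.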
Without locking: optimize the memory utilization of
Herlihy's universal construction
[Chuong, Ellen, Ramachandran SPAA 2010]
Approach II: Programming with Mini-Transactions
Extension of DCAS or kCAS (for small k’s) or a multi-location variant of LL/SC
[PowerPC, DEC Alpha]
– Short
– Works on a small, static data set
– Simple functionality
– No I/O, out-of-core memory accesses, etc.
– May fail spuriously
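The semantics listed above can be modeled in Python as a best-effort k-word CAS over a small, static set of words. The `Memory` class and its `kcas` method are hypothetical; an internal lock only models atomicity, and a configurable rate models spurious failures. The usage builds a DCAS (k = 2) transfer that simply retries on any failure.

```python
import random
import threading


class Memory:
    """Hypothetical word memory with a best-effort k-word CAS (a mini-transaction).
    The internal lock only models atomicity; hardware would not work this way."""

    def __init__(self, size, spurious_failure_rate=0.0, seed=0):
        self.words = [0] * size
        self._atomic = threading.Lock()
        self._rng = random.Random(seed)
        self._fail_rate = spurious_failure_rate

    def load(self, addr):
        return self.words[addr]

    def kcas(self, entries):
        """entries: a small, static list of (addr, expected, new).
        Installs all new values iff every word holds its expected value.
        May also fail spuriously, with no conflict at all."""
        with self._atomic:
            if self._rng.random() < self._fail_rate:
                return False                           # best-effort failure
            if any(self.words[a] != exp for a, exp, _ in entries):
                return False                           # genuine conflict
            for a, _, new in entries:
                self.words[a] = new
            return True


def transfer(mem, src, dst):
    """Move one unit from src to dst with a DCAS (k = 2), retrying on failure."""
    while True:
        s, d = mem.load(src), mem.load(dst)
        if mem.kcas([(src, s, s - 1), (dst, d, d + 1)]):
            return


mem = Memory(2, spurious_failure_rate=0.5)
mem.words[0] = 10
for _ in range(10):
    transfer(mem, 0, 1)
print(mem.words)  # [0, 10]
```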
Mini-Transactions
Lower bounds use:
• large, dynamic data sets
• long transactions
• data accessed with arbitrary operations and unrestricted calculations
Mini-transactions use:
• small, static data sets
• short transactions
• simple functionality, e.g., arithmetic, comparison, and memory access
Mini-Transactions & HTM
Mini-transactions are almost provided by recent hardware TM proposals
– AMD Advanced Synchronization Facility [2009]
– Sun [Chaudhry, Cypher, Ekman, Karlsson, Landin, Yip,
Zeffer, and Tremblay Micro 2009]
Best-effort: transactions can be aborted for
reasons other than conflicts
– TLB misses, interrupts, certain function-call
sequences, division instructions
Algorithmic Challenges
• Mini-transactions provide a significant handle on the
difficult task of writing concurrent applications
– DCAS is already a big help [A, Hillel, 2006, 2009]
– Experience with hardware TM support
[Dice, Lev, Marathe, Moir, Olszewski, Nussbaum SPAA 2010]
[Carouge, Spear, DISC 2010]
• Design algorithms accommodating the best-effort nature of mini-transactions
• Avoid sure killers, i.e., operations that always abort a mini-transaction
• Work around the small data sets
– amorphous data parallelism [Pingali, Kulkarni, Nguyen,
Burtscher, Mendez-Lojo, Prountzos, Sui, Zhong 2009]
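One common way to accommodate best-effort failures, sketched here in Python, is to retry the mini-transaction a bounded number of times and then take a lock-based fallback path. The fast path includes the fallback-lock word in its (static) data set, so it fails while the fallback path runs and the two paths never interleave. Everything here is hypothetical: the simulated `Memory`, the choice of word 0 as the lock flag, and the retry budget `MAX_TRIES`.

```python
import random
import threading

LOCK = 0        # hypothetical: word 0 doubles as the fallback-lock flag
MAX_TRIES = 8   # hypothetical retry budget for the fast path


class Memory:
    """Simulated memory: kcas is best-effort; store is atomic with respect to kcas."""

    def __init__(self, size, spurious_failure_rate, seed=0):
        self.words = [0] * size
        self._atomic = threading.Lock()  # models atomicity only
        self._rng = random.Random(seed)
        self._fail_rate = spurious_failure_rate

    def load(self, addr):
        return self.words[addr]

    def store(self, addr, value):
        with self._atomic:
            self.words[addr] = value

    def kcas(self, entries):
        with self._atomic:
            if self._rng.random() < self._fail_rate:
                return False                    # spurious, best-effort failure
            if any(self.words[a] != e for a, e, _ in entries):
                return False                    # genuine conflict
            for a, _, n in entries:
                self.words[a] = n
            return True


_fallback_lock = threading.Lock()


def increment_pair(mem, x, y):
    """Atomically add 1 to words x and y: mini-transaction first, lock as fallback."""
    for _ in range(MAX_TRIES):
        a, b = mem.load(x), mem.load(y)
        # Requiring LOCK == 0 makes the fast path fail while a fallback runs.
        if mem.kcas([(LOCK, 0, 0), (x, a, a + 1), (y, b, b + 1)]):
            return "fast"
    with _fallback_lock:
        mem.store(LOCK, 1)                      # from now on every fast path fails
        mem.store(x, mem.load(x) + 1)
        mem.store(y, mem.load(y) + 1)
        mem.store(LOCK, 0)
    return "fallback"


mem = Memory(3, spurious_failure_rate=0.3)


def worker():
    for _ in range(200):
        increment_pair(mem, 1, 2)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(mem.words)  # [0, 800, 800]
```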
Programming Support
Create patterns for employing mini-transactions, ideally encapsulated within programming-language support
Cleanly combine with native (uninstrumented) access to the locations accessed by mini-transactions
– Beware of privatization scenarios
Summary
• Facilitate the design of efficient and correct concurrent applications in the post-TM era
– Capitalize on lessons learned and the wide interest in TM
– Multitude of approaches
• Specifically, develop a model, algorithms, and programming patterns for best-effort mini-transactions
Thank you!