Lowering the Overhead of Software Transactional Memory

Virendra J. Marathe, Michael F. Spear, Christopher Heriot, Athul Acharya, David Eisenstat,
William N. Scherer III, Michael L. Scott
Featuring:
RSTM – low overhead STM library for C++
Presenting: Yosef Etigin
What is this paper about?
• Design and implementation of RSTM.
• RSTM is meant to be a fast STM library for multi-threaded C++ programs.
• RSTM main features:
  - Cache-optimized metadata organization.
  - No memory allocation at runtime, except for cloning objects.
  - A contention manager is used to tune performance.
  - Different strategies are allowed: eager/lazy acquire, visible/invisible readers.
Where does RSTM fit in?
• Requires atomic load/store and CAS in hardware.
• Provides a C++ “smart pointer” API that can be used to safely access shared data within transactions.
• Layers, top to bottom:
  - User application
  - beginTx { openRO, openRW } endTx
  - RSTM Library
  - HW: atomic Load & Store, CAS
Overview
• RSTM Theory
  - Transaction Semantics
  - Readers
  - Writers
• RSTM Design
  - Descriptor
  - Data Object
  - Shared Object Handle
• RSTM Implementation
  - Resolving the data object
  - Open for read-only
  - Acquire
  - Open for read-write
  - Commit
  - Abort
• Performance results
• Conclusion
Transaction Semantics
• Data is considered at object granularity.
• Objects are shadowed, rather than changed “in place”.
• Inside a transaction, objects may be opened for read-only or for read-write.
• Objects that are opened for read-write are cloned; those opened for read-only are not.
• “Commit” tries to set the clone as the current object.
• “Abort” tries to set the original as the current object.
• Transactions may abort each other, but they consult the Contention Manager (CM) before doing so.
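The shadowing rule above can be illustrated with a deliberately single-threaded sketch. The names `Counter` and `ShadowedCounter` are hypothetical and not part of RSTM; the sketch only shows that a read-write open works on a clone, and that commit or abort selects which version is current.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical shared object; the type and its field are illustrative only.
struct Counter {
    int value = 0;
};

// Single-threaded sketch of shadowing (no atomics, no contention manager).
class ShadowedCounter {
public:
    // Read-only open: no clone is made, the current version is returned.
    const Counter& openRO() const { return *current_; }

    // Read-write open: the caller works on a private clone.
    Counter& openRW() {
        if (!clone_) clone_ = std::make_unique<Counter>(*current_);
        return *clone_;
    }

    // Commit: the clone (if any) becomes the current object.
    void commit() {
        if (clone_) current_ = std::move(clone_);
        assert(current_ != nullptr);
    }

    // Abort: the clone is discarded and the original stays current.
    void abort() { clone_.reset(); }

private:
    std::unique_ptr<Counter> current_ = std::make_unique<Counter>();
    std::unique_ptr<Counter> clone_;
};
```

Until `commit()` is called, readers going through `openRO()` keep seeing the original value, which is the point of not updating objects in place.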
Readers
• A thread that opens an object for reading may become a “visible” or “invisible” reader.
  - “visible” = visible to writers.
• A reader must have a consistent view of its opened objects.
  - “consistent” = no writer has made a change that the reader sees in only some of its opened objects.
• Inconsistency might cause hardware exceptions and infinite loops, thus:
  - An invisible reader, on every “open”, must validate all previously opened objects (O(n²) total cost for n opened objects).
  - A visible reader must be explicitly aborted by a writer that acquires the object.
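A minimal sketch of why invisible reads cost O(n²), assuming a hypothetical `Handle` whose `header` word changes whenever a writer installs a new version. The class `InvisibleReader` and its `checks()` counter are illustrative and are not RSTM's API.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical handle: `header` stands in for the word that identifies the
// current version of an object.
struct Handle {
    std::atomic<long> header{0};
};

class InvisibleReader {
public:
    // Open a handle for reading. Returns false if a previously opened
    // object has changed, in which case the transaction should abort.
    bool openRO(Handle& h) {
        long seen = h.header.load();
        if (!validate()) return false;
        reads_.emplace_back(&h, seen);
        return true;
    }

    // Re-check every object opened so far against the header seen then.
    bool validate() {
        for (const auto& read : reads_) {
            ++checks_;
            if (read.first->header.load() != read.second) return false;
        }
        return true;
    }

    // Number of header comparisons performed so far.
    std::size_t checks() const { return checks_; }

private:
    std::vector<std::pair<Handle*, long>> reads_;
    std::size_t checks_ = 0;
};
```

Opening n objects performs 0 + 1 + ... + (n - 1) comparisons, which is n(n - 1)/2 in total.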
Writers
• Opening an object for writing involves “acquiring” it.
• Acquiring means getting exclusive access to the object.
  - Writers conflict with other writers and with visible readers.
• Visible readers can coexist with each other.
• Acquiring can be done in eager or lazy fashion:
  - Eager: acquire an object as soon as it is opened.
  - Lazy: acquire it just before committing the transaction.
• Eager acquire aborts doomed transactions immediately, but causes more conflicts.
• Lazy acquire enables readers to run together with a writer that is not committing yet.
  - It has the same consistency issue as invisible reads.

Contention Management
• The CM is a thread-local object.
• It is notified of transaction events.
• It decides what to do on a conflict:
  - Abort a transaction or spin-wait.
  - Which transaction to abort, if any.
• For instance, the “Polka” CM:
  - Prefers writers over readers.
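To make the CM's role concrete, here is a small sketch of a contention manager that waits a bounded number of times and then asks to abort the conflicting transaction. This is not Polka; `BoundedWaitManager`, `Decision`, and the method names are hypothetical and chosen only for illustration.

```cpp
#include <cassert>

enum class Decision { Wait, AbortEnemy };

// Hypothetical thread-local contention manager. It tolerates a fixed number
// of consecutive conflicts by waiting, then asks to abort the enemy.
class BoundedWaitManager {
public:
    explicit BoundedWaitManager(int maxWaits) : maxWaits_(maxWaits) {
        assert(maxWaits >= 0);
    }

    // Transaction-event notification: a new transaction resets patience.
    void onBegin() { waits_ = 0; }

    // Called when this transaction conflicts with another one.
    Decision onConflict() {
        if (waits_ < maxWaits_) {
            ++waits_;
            return Decision::Wait;
        }
        waits_ = 0;
        return Decision::AbortEnemy;
    }

private:
    int maxWaits_;
    int waits_ = 0;
};
```

Because each thread owns its own manager, no synchronization is needed inside it; only the decision it returns affects other threads.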
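RSTM Design
[Figure: RSTM metadata layout. A Shared Object Handle holds a header word and a visible-readers word. The header refers to the new Data Object, whose owner field refers to the writer's Descriptor and whose next field refers to the old Data Object. Threads 1, 2 and 3 each have a Descriptor (one writer, two readers).]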
Descriptor
• Each thread has a static descriptor that is used for all transactions of this thread.
  - Nested transactions are not supported.
• The descriptor has:
  - Status: ACTIVE / COMMITTED / ABORTED
  - Lists of opened objects:
    - Visible, invisible reads.
    - Eager, lazy writes.
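The status word is what other threads race on, so a sketch of it with `std::atomic` may help. The names `Status`, `tryCommit`, and `tryAbort` are assumptions made for this example; the lists of opened objects are omitted.

```cpp
#include <atomic>
#include <cassert>

enum class Status { Active, Committed, Aborted };

// Sketch of the per-thread descriptor's status word only.
struct Descriptor {
    std::atomic<Status> status{Status::Active};

    // The same descriptor is reused for the thread's next transaction.
    void begin() { status.store(Status::Active); }

    // Both transitions start from Active, so at most one of them can
    // succeed for a given transaction, whichever thread attempts it.
    bool tryCommit() {
        Status expected = Status::Active;
        return status.compare_exchange_strong(expected, Status::Committed);
    }

    bool tryAbort() {
        Status expected = Status::Active;
        return status.compare_exchange_strong(expected, Status::Aborted);
    }
};
```

A competing transaction that wins the `tryAbort()` race makes the owner's later `tryCommit()` fail, and the reverse also holds.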
Data Object
• Shared objects hold, in addition to data fields, “owner” and “next” fields.
• Owner is the descriptor of the current writer thread, if any.
• Next is the original object, if this is a writer-made clone.
Shared Object Handle (1)
• Encapsulates a reference to a shared object.
  - Global variables are handles rather than pointers.
  - Direct pointers are obtained within a transaction, via “open”.
• Holds:
  - “header” word: identifies the current version of the object.
  - “visible readers” word: bitmap of the visible readers.
Shared Object Handle (2)
• The header is a single word that holds a pointer and a dirty bit.
  - This takes advantage of address alignment: the low bit of an aligned pointer is always zero, so it is free to hold the flag.
• The pointer refers to some data object “pObj”.
• The dirty bit tells whether “pObj” is a clean object or a writer-made clone.
• Saves one dereference in the common case of non-conflicting access.
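The pointer-plus-dirty-bit encoding can be sketched as follows. The helper names `makeHeader`, `headerObject`, and `headerDirty` are hypothetical, and the `Object` type here is a stand-in with an explicit alignment so that bit 0 of its address is guaranteed to be zero.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in data object; alignas makes the alignment assumption explicit.
struct alignas(8) Object {
    int payload = 0;
};

// Pack an aligned object pointer and a dirty flag into one word.
inline std::uintptr_t makeHeader(Object* obj, bool dirty) {
    auto bits = reinterpret_cast<std::uintptr_t>(obj);
    assert((bits & 1u) == 0 && "object address must have a zero low bit");
    return bits | (dirty ? 1u : 0u);
}

// Recover the pointer by masking out the low bit.
inline Object* headerObject(std::uintptr_t header) {
    return reinterpret_cast<Object*>(header & ~std::uintptr_t{1});
}

// The low bit says whether the pointer refers to a writer-made clone.
inline bool headerDirty(std::uintptr_t header) {
    return (header & 1u) != 0;
}
```

Because both pieces live in one word, a single CAS can swap the object and its dirty state together.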
Shared Object Handle (3)
• “Visible readers” is a bitmap of the visible readers.
• Bit i of the bitmap is set if thread i is a visible reader of the object.
• Allows getting all readers or adding a reader in a single atomic operation.
• Limits the number of visible readers to the number of bits in the word.
  - All other readers will be invisible.
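A sketch of the visible-readers bitmap, assuming a 64-bit word and thread ids in the range 0 to 63. It uses `fetch_or` and `fetch_and` from `std::atomic` instead of the CAS loop used in the pseudo-code; the effect on the bitmap is the same. `ReaderBitmap` and its method names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr unsigned kMaxVisibleReaders = 64;  // assumed: one bit per thread id

struct ReaderBitmap {
    std::atomic<std::uint64_t> bits{0};

    // Register thread `id` as a visible reader in one atomic operation.
    void add(unsigned id) {
        assert(id < kMaxVisibleReaders);
        bits.fetch_or(std::uint64_t{1} << id);
    }

    // Remove thread `id` from the visible readers.
    void remove(unsigned id) {
        assert(id < kMaxVisibleReaders);
        bits.fetch_and(~(std::uint64_t{1} << id));
    }

    // All current visible readers, obtained with a single atomic load.
    std::uint64_t snapshot() const { return bits.load(); }

    bool contains(unsigned id) const {
        assert(id < kMaxVisibleReaders);
        return ((snapshot() >> id) & 1u) != 0;
    }
};
```

A thread whose id does not fit in the word cannot be recorded here and would have to read invisibly.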
RSTM Implementation
• This section provides pseudo-code for the most important STM operations:
  - Open object for read-only
  - Open object for read-write
  - Commit
  - Abort
• The pseudo-code covers methods of the Descriptor class, which is the object that implements RSTM functionality.
Resolving the Data Object
// Returns the up-to-date data object associated with a handle.
// If the object has an active owner, calls the CM and returns NULL
// so that the caller retries.
Object *Descriptor::resolve(Handle *shared)
{
    long snapshot = shared->header;
    Object *ptr = (Object *)(snapshot & ~1L);   // mask out the LSB (dirty bit)
    if (!(snapshot & 1))
        return ptr;                             // clean: ptr is the current version
    // dirty: ptr is a writer-made clone, so the owner's status decides
    switch (ptr->owner->m_status) {
    case COMMITTED:
        return ptr;                             // the clone is current
    case ABORTED:
        return ptr->next;                       // the original is current
    case ACTIVE:
    default:
        m_cm.handleConflict(this, ptr->owner);
        return NULL;
    }
}
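For readers who want to run the three dirty-header cases, here is a compilable variant of the resolve logic. It is a sketch under simplifying assumptions: the contention manager call is left out, `resolve` is a free function, and the minimal `Descriptor`, `Object`, and `Handle` types are stand-ins defined only for this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

enum class Status { Active, Committed, Aborted };

struct Descriptor {
    std::atomic<Status> status{Status::Active};
};

struct Object {
    Descriptor* owner = nullptr;  // writer that made this clone, if any
    Object* next = nullptr;       // original object, if this is a clone
    int value = 0;
};

struct Handle {
    std::atomic<std::uintptr_t> header{0};  // object pointer | dirty bit
};

// Returns the current version, or nullptr if the owner is still active
// (where the pseudo-code would consult the contention manager).
inline Object* resolve(const Handle& h) {
    std::uintptr_t snapshot = h.header.load();
    auto* obj = reinterpret_cast<Object*>(snapshot & ~std::uintptr_t{1});
    if ((snapshot & 1u) == 0) return obj;  // clean header
    assert(obj->owner != nullptr);         // a dirty header implies an owner
    switch (obj->owner->status.load()) {
        case Status::Committed: return obj;        // clone is current
        case Status::Aborted:   return obj->next;  // original is current
        case Status::Active:    break;
    }
    return nullptr;
}
```

Note that the owner's status alone flips which object is current; the header itself does not need to change at commit or abort time.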
Open for Read-Only
// Open an object for read-only
Object *Descriptor::openRO(Handle *shared)
{
    long headerSnapshot = shared->header;
    // find the data object
    Object *ptr;
    do {
        ptr = resolve(shared);
    } while (!ptr);
    if (m_isVisible) {
        m_visibleReads.add(shared);
        // install this tx as a visible reader of the object;
        // read the bitmap once per attempt so the CAS compares against that value
        long readers;
        do {
            readers = shared->readers;
        } while (!CAS(&shared->readers, readers, readers | (1L << m_id)));
        // make sure no writer acquired this object before it could see the CAS above
        if (headerSnapshot != shared->header)
            abort();
    } else {
        m_invisibleReads.add(shared);
    }
    validate();
    return ptr;
}
Open for Read-Write
// Open an object for read-write
Object *Descriptor::openRW(Handle *shared)
{
    // find the data object
    Object *ptr;
    do {
        ptr = resolve(shared);
    } while (!ptr);
    // make a writeable clone
    Object *clone = ptr->clone();
    clone->owner = this;
    clone->next = ptr;
    // eager acquires now, lazy acquires later
    if (m_isEager) {
        acquire(shared, clone);
        m_eagerWrites.add(shared, clone);
    } else {
        m_lazyWrites.add(shared, clone);
    }
    validate();
    return clone;
}
Acquire
// acquire the object
void Descriptor::acquire(Handle *shared, Object *clone)
{
    // replace the header with a dirty reference to the clone
    long header = shared->header;
    if (!CAS(&shared->header, header, (long)clone | 1))
        abort();
    // abort all visible readers
    long readers = shared->readers;
    for (size_t i = 0; i < sizeof(readers) * 8; ++i) {
        if (readers & (1L << i))
            allDescriptors[i]->abort();
    }
    // record this object for cleanup
    m_acquiredObjects.add(<shared, clone>);
}
Commit
// commit a transaction
void Descriptor::onCommit()
{
    validate();
    // acquire now the objects that were lazily opened for read-write
    acquireLazyWrites();
    // if this CAS succeeds, our clones (if any) become the active objects
    CAS(&m_status, ACTIVE, COMMITTED);
    if (m_status == COMMITTED) {
        // replace each dirty reference to our clone
        // with a clean reference to our clone
        for (<shared, clone> in m_acquiredObjects) {
            CAS(&shared->header, (long)clone | 1, (long)clone);
        }
        // remove the thread from the readers bitmap of all
        // visibly opened objects
        for (Handle *shared in m_visibleReads) {
            long readers;
            do {
                readers = shared->readers;
            } while (!CAS(&shared->readers, readers, readers & ~(1L << m_id)));
        }
    } else {
        abort();
    }
}
Linearization point: the CAS that changes m_status from ACTIVE to COMMITTED.
Abort
// called when an “Aborted” exception is caught
void Descriptor::onAbort()
{
    // after this CAS, our clones (if any) are discarded
    CAS(&m_status, ACTIVE, ABORTED);
    // clean up the written objects:
    // replace each dirty reference to our clone
    // with a clean reference to the original object
    for (<shared, clone> in m_acquiredObjects) {
        CAS(&shared->header, (long)clone | 1, (long)clone->next);
    }
    // remove the thread from the readers bitmap of all
    // visibly opened objects
    for (Handle *shared in m_visibleReads) {
        long readers;
        do {
            readers = shared->readers;
        } while (!CAS(&shared->readers, readers, readers & ~(1L << m_id)));
    }
}
Performance Results (1)
• Compare ASTM and RSTM (previous work showed that ASTM outperforms DSTM and OSTM).
• Platform: 16-processor SunFire 6800 at 1.2GHz.
• Several benchmarks were used with different configurations: visible/invisible readers, eager/lazy writers.
• Each benchmark was run for 10 seconds with 1 to 28 threads.
• Contention manager: “Polka”.
• Metric: the number of successful transactions.
Performance Results (2)
• RSTM with invisible readers is ~2 times better than ASTM.
• Visible readers are expensive because each access reads the root
node and causes cache invalidation.
• The only difference between C++ ASTM and RSTM is metadata
organization.
Performance Results (3)
• In LinkedList, FGL (fine-grained locking) performs badly if #threads > #CPUs, due to preemption.
• In LinkedList, ASTM outperforms RSTM since each writer invalidates
objects for many readers.
• HashTable allows great concurrency, so RSTM works well (~3 times
faster than ASTM).
Performance Results (4)
• In RandomGraph and LFUCache, all STMs perform worse than CGL (coarse-grained locking),
because these data structures do not allow much concurrency.
• Nevertheless, RSTM beats ASTM.
Conclusion
• RSTM has a novel metadata organization which reduces overhead, due to:
  - One level of indirection instead of the common two.
  - Static instead of dynamic data structures.
• RSTM provides a variety of policies for conflict detection, so it can be customized for a given workload.
• Compared to ASTM, RSTM gives better performance due to better metadata organization.