SOFTWARE TRANSACTIONAL MEMORY (STM) ALGORITHM

By: Rania Briq
Supervisor: Dima Perelman
Contents
Introduction
Software Transactional Memory
Example program
Project goal
Algorithm Overview
Algorithm Description
Implementation
Algorithm Design
Evaluation and results
Analysis
Project benefits and knowledge acquired
Appendix
References
Introduction:
Multi-core architecture has become the norm in current-generation computers. It
has become present not only in personal computers, but also in most cutting
edge new smart phones and tablets. These computers however will not run
faster if we don't write parallel programs to exploit them. In order to take
advantage of multi core architecture, programs need to be made multi-threaded.
Multi-threaded programming allows a program to execute concurrently while
preserving its correctness as if it were executing sequentially. In essence,
concurrency is about allowing operations that do not conflict with each other in
their memory access to execute concurrently.
In multi-threaded programs conflicts occur when multiple threads access a
shared object simultaneously and at least one of them is a write operation. In
order to handle conflicting operations we need synchronization facilities that
enable threads to execute concurrently without obstructing the program's flow or
invalidating its correctness.
The classic approach to achieving synchronization in concurrent programs has
been locks. Generally, locks are not fault tolerant: the failure of a single
thread that holds a lock can cause the whole program to fail.
There are two types of lock-based solutions: coarse-grained locking and
fine-grained locking. Coarse-grained locking is the simpler of the two
approaches, but it does not scale well because a single lock protects multiple
objects or large pieces of code, which limits parallelism. Fine-grained locking
scales well when the program is written by expert programmers, but it imposes
more complexity on the less experienced programmer, making programs more
error prone and exposing them to drawbacks such as deadlocks, priority
inversion and starvation.
It is also noteworthy that managing locks becomes prohibitive in a program with a
large number of threads due to the overhead that they incur. Using locks in such
applications inhibits the program's progress, rendering locks an impractical
approach.
As can be inferred, writing correct parallel programs is notoriously difficult, yet
increasingly important. A need therefore arose for a more abstract and
fault-tolerant parallel programming paradigm.
Software Transactional Memory:
Software transactional memory is a generic synchronization construct for
controlling access to shared memory in concurrent programming. Its model is
analogous to database transactions in terms of notion and correctness.
STM strives to alleviate the drawbacks associated with lock based
synchronization, as well as provide a higher level of abstraction with modularity.
This paradigm is about saving the programmer the effort of figuring out the
interaction and conflicts among the executing threads. It also encapsulates the
design details from the programmer such that the programmer doesn't have to
worry about acquiring or releasing locks. With STM the programmer need only
specify the block of code that should execute atomically, without taking care of
how atomicity is attained. Abstraction is achieved by writing programs that
appear to be sequential, when in fact a program is split into transactions that
execute in parallel in their own context. Those transactions that do not conflict
pass the commit phase and those that do collide with each other are aborted and
restarted.
STM based algorithms allow automatic conversion of a serial program to a
concurrent program. Its non-blocking feature allows a fault tolerant thread
execution wherein a failure of a single thread does not hinder the execution of
other threads.
In STM, a transaction is a sequence of commands that are executed
contiguously by a single thread on a dataset (read/write sets) that might be
shared with other transactions. Typically, a transaction has to maintain certain
properties in order to guarantee a program's correctness.
Two main properties must be satisfied in order to ensure that a transaction is
being processed in a reliable manner:
1. Atomicity: all or nothing, this is expressed in the requirement that either all
of a transaction's commands are executed and its changes are made
permanent, or none of its commands is executed and any changes made
are discarded. This property makes error recovery possible and allows a
transaction rollback.
2. Isolation: an executing transaction takes a snapshot of memory in its
initialization stage and does not see the changes made by other transactions
that are still executing.
A transaction is marked by four phases/operations from its start of
execution until its commit time:
1. Initialization: this phase comprises acquiring a snapshot of the memory
contents or other transactions' datasets that finished committing, as well
as performing all kinds of data initialization that are used in the
consecutive transaction phases.
2. Read: Reading a variable either from the snapshot acquired in the
initialization stage or the memory.
3. Write: writing a value to a variable or updating its value. Typically the
values are written to a thread-private log and not to the memory directly
in order to allow rollback in case of a conflict.
4. Commit: a validation stage that verifies that no conflicts occurred with
other transactions. If the validation succeeds the transaction's changes
are marked as permanent, otherwise the transaction is rolled-back,
aborted and then restarted.
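The four phases can be illustrated with a minimal single-threaded sketch. It is not the algorithm developed in this report: the class names SharedMemory and SketchTransaction are hypothetical, variables are identified by name, and validation simply compares the values that were read against the current memory contents.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for shared memory: variable name -> value.
final class SharedMemory {
    final Map<String, Integer> cells = new HashMap<>();
}

// Single-threaded illustration of the four phases; not the report's implementation.
final class SketchTransaction {
    private final SharedMemory memory;
    private final Map<String, Integer> snapshot;                    // taken at initialization
    private final Map<String, Integer> writeLog = new HashMap<>();  // thread-private log
    private final Set<String> readSet = new HashSet<>();

    // 1. Initialization: copy the current memory contents.
    SketchTransaction(SharedMemory memory) {
        this.memory = memory;
        this.snapshot = new HashMap<>(memory.cells);
    }

    // 2. Read: own writes first, then the snapshot (missing variables read as 0).
    int read(String name) {
        Integer own = writeLog.get(name);
        if (own != null) {
            return own;
        }
        readSet.add(name);
        return snapshot.getOrDefault(name, 0);
    }

    // 3. Write: buffer the value in the private log, leaving memory untouched.
    void write(String name, int value) {
        writeLog.put(name, value);
    }

    // 4. Commit: validate the read set, then publish the log or roll back.
    boolean commit() {
        for (String name : readSet) {
            if (!Objects.equals(memory.cells.get(name), snapshot.get(name))) {
                writeLog.clear();   // rollback: discard all buffered changes
                return false;       // caller restarts the transaction
            }
        }
        memory.cells.putAll(writeLog);
        return true;
    }
}
```

If two such transactions start from the same snapshot and both update the same variable, the first commit succeeds and the second fails validation, so its buffered write never reaches memory.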
A program that relies on STM to achieve concurrency is expected to satisfy
transactional sequential consistency. This property is defined by the
requirement that "there exists a global total order on operations, consistent with
the program order in each thread, and each read returns the most recent write
value to the same memory location".
Unlike lock based techniques, most STM algorithms rely on an optimistic
approach in their memory writes. With this approach, a variable modification is
performed without considering what other threads are doing. This however
should include logging of each read and write operation for the validation
process, allowing a transaction to roll back in case of a conflict.
Obviously, the approach imposes an extra overhead in validation and its
complexity depends on the underlying STM algorithm.
Example program:
In the Java programming language every object has an implicit lock that is used
by synchronized code. Synchronized methods and statements acquire the implicit
lock of an object, which only one thread can own at a time. This type of lock is
easy to use but has many limitations; the most noticeable is the inability to
withdraw a request to acquire a lock if the lock is not available.
In this example we will demonstrate the need for a good concurrency control
model in a parallel program and the drawbacks associated with locks.
Let's take the classic example of bank accounts, where we have an account
object as follows:
class Account {
    int balance;
    // Only one thread at a time may run a synchronized method of this object.
    synchronized void withdraw(int amount) {
        balance = balance - amount;
    }
    synchronized void deposit(int amount) {
        balance = balance + amount;
    }
}
We use synchronized methods for withdraw and deposit so that no lost updates
or dirty reads occur when two threads withdraw or deposit concurrently.
Now let's consider the scenario where money is transferred from account A to
account B. Naturally we would write the following transfer method:
void transfer(Account A, Account B, int amount) {
    A.withdraw(amount);
    B.deposit(amount);
}
The fact that the two methods used in transfer are synchronized each separately
does not guarantee correct execution. Another thread could be reading the
intermediate state where the money has left account A and has not arrived in
account B.
Holding an explicit lock on both accounts for the duration of the transfer solves
this problem, but it is deadlock prone when another thread is transferring money
in the opposite direction, from account B to account A.
A common remedy is two-phase locking combined with a fixed global lock order. In
two-phase locking a thread acquires locks only in the growing phase and releases
them only in the shrinking phase; acquiring the locks in the same global order in
every thread rules out the circular wait that causes deadlock. Such a technique
is not always practical due to its blocking, which imposes a long waiting time on
other threads when there is a lot of interaction among the threads and the
transactions are long.
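A minimal sketch of a lock-based transfer that avoids the opposite-direction deadlock by always acquiring the two locks in one global order. The classes OrderedAccount and OrderedTransfer and the unique id field that defines the order are assumptions made for illustration; they are not part of the example above.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical account with an explicit lock and a unique id that defines the lock order.
final class OrderedAccount {
    final int id;
    final ReentrantLock lock = new ReentrantLock();
    int balance;

    OrderedAccount(int id, int balance) {
        this.id = id;
        this.balance = balance;
    }
}

final class OrderedTransfer {
    static void transfer(OrderedAccount from, OrderedAccount to, int amount) {
        // Always lock the account with the smaller id first, whatever the direction.
        OrderedAccount first = from.id < to.id ? from : to;
        OrderedAccount second = (first == from) ? to : from;
        first.lock.lock();              // growing phase: acquire both locks
        try {
            second.lock.lock();
            try {
                from.balance -= amount; // both updates happen while both locks are held
                to.balance += amount;
            } finally {
                second.lock.unlock();   // shrinking phase: release only
            }
        } finally {
            first.lock.unlock();
        }
    }
}
```

Both updates are performed while both locks are held, so no thread that also takes the locks can observe the intermediate state. The cost is the blocking described above: a thread that needs either account waits for the whole transfer.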
From the description above we conclude that locks alone are not sufficient and
that we could do with a new paradigm of concurrency control.
Project goal:
As mentioned earlier, we aim to introduce detail-encapsulated parallelism into
our software applications in order to exploit multiprocessor architecture. In this
project we devise a new algorithm that relies on software transactional memory
concepts, implement it and evaluate its performance in comparison to other STM
algorithms.
In its notion, our algorithm is similar to optimistic concurrency control. This
method assumes that no conflicts are occurring such that transactions can
continue working in their context of execution without locking the data that they
are using. This approach may cause some inconsistencies in the executing
transactions' read sets. In order to detect inconsistencies, in the commit phase,
each transaction has to verify that no other transaction has modified its read set
data. If an intersection is discovered, the committing transaction must roll back
and abort.
In what follows we give a description of our algorithm design and demonstrate its
properties.
Algorithm Overview:
Most existing STM algorithms make use of metadata such as version clocks or
timestamps attached to variable locations in the shared memory. The TL2
algorithm, for instance, uses a global version clock that is incremented by each
update transaction (a transaction that writes to memory), and every memory
location has an associated lock that contains a version number.
In our algorithm, a transaction essentially consists of a read set, a write set and a
lookup table during its execution. Each update transaction that is successfully
validated after completing its execution is appended to a global linked list of
committed transactions. A transaction in the list may be written to the main
memory once every transaction that started executing before it was appended
has validated against it. This linked list, along with the algorithm flow, lets us
realize a logical ordering of events without using metadata during validation in a
transaction's commit phase.
The reads in our algorithm are invisible, which means that they are not written to
memory. Therefore there is no point in appending read-only transactions to the
list of successful transactions.
This structure helps to reduce the validation time to be linear in the number of
concurrent transactions that commit rather than in the size of the committing
transaction itself.
Algorithm Description:
Every node in the global list represents a successful writer transaction. A
transaction in the list consists of the following three parameters:
1. Write set: the objects that the transaction modified, in their finalized
state.
2. A barrier: its value indicates the number of transactions that started
execution after this transaction was successfully enqueued onto the list.
Since our algorithm overlooks read inconsistencies until commit time, the
transactions that incremented the barrier have yet to validate their read
sets against this transaction's write set following their execution phase.
A barrier value of 0 marks the transaction as ready to be written
to main memory. Once the transaction has been written to the main
memory, it is no longer needed.
3. Next: a pointer to the next transaction in the list of
successful write transactions.
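A list node with these three parameters could be sketched as follows. The class name DescriptorSketch, the use of string keys for modified fields and the helper readyForWriteBack are assumptions made for illustration, not the report's actual classes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical node of the global list of committed writer transactions.
final class DescriptorSketch {
    // 1. Write set: finalized values, keyed by a hypothetical field identifier.
    final Map<String, Object> writeSet;
    // 2. Barrier: number of transactions that still have to validate against this node.
    final AtomicInteger barrier = new AtomicInteger(0);
    // 3. Next: successor in the list; volatile so that other threads see the link.
    volatile DescriptorSketch next;

    DescriptorSketch(Map<String, Object> writes) {
        // Defensive copy: the write set must not change after commit.
        this.writeSet = Collections.unmodifiableMap(new HashMap<>(writes));
    }

    // A barrier of 0 marks the node as ready to be written to main memory.
    boolean readyForWriteBack() {
        return barrier.get() == 0;
    }
}
```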
Our algorithm uses a Bloom Filter to represent the read set of an executing
transaction. This data structure helps meet the read consistency condition at a
lower complexity as we will shortly describe.
In our implementation, we distinguish between a transaction (called a transaction
descriptor) and a thread context in which a transaction is executing. A context is
thread specific and naturally includes the transaction descriptor as will be shown
in the implementation section.
Below we will give a detailed description of the algorithm design and
implementation details, including the functionality in each of the transaction's
phases.
On a global level, we have the linked list of successful update transactions. We
maintain a pointer to the head of the list, marking the first transaction that will be
written to the main memory; we denote it by first_committed. We also maintain
a pointer to the tail of the list, marking the last transaction appended to the list;
we denote it by last_committed.
We have a Terminator thread. This thread is responsible for writing the write-sets
of the write transactions in the linked list to the main memory when the barrier
value allows it. Eventually, we need these write-sets committed to the main
memory so that the list does not grow indefinitely, resulting in increased overhead,
and by the end of the program we need to have these values committed to the
main memory to mark the program as successfully completed. This thread is
needed only as long as the program is executing, and is therefore set to be a
daemon thread.
The transaction context comprises the following parameters:
Write-set: the set of variables that this transaction modified. In its read phase,
the transaction checks this set as a first reference for data fetching.
Read-set: the set of variables that this transaction read.
Lookup-table: the set of variables compiled from the committed transactions in
the linked list, from first_committed to last_committed. As mentioned
earlier, these variables' values are held by successful write transactions and so
are more up to date than the main memory. In its read phase, a transaction
checks this set as a second reference for data fetching, after its own write-set.
Barrier: an atomic counter that maintains the number of transactions that need
to perform validation in their commit phase against this write-set and the
write-sets that follow it in the linked list before it may be written to main memory.
This is necessary to satisfy read consistency in the validation process, when we
check that a transaction has not read values that were modified by other
transactions after it started its execution.
We note here that the barrier value must be incremented and decremented
carefully (not only atomically), as we explain in the description of the
Terminator thread.
Last_validated_against: this variable points to the last successful transaction in
the linked list (last_committed at that moment) when this transaction started its
execution (the same one whose barrier it incremented). We will explain shortly why
we need a pointer to this transaction.
Below is the algorithm written in pseudo code. The flow is divided into the different
phases of a transaction:
For each transaction txi executing in a thread context:
{
    At begin:
    {
        For each committed transaction t from first_committed to last_committed
            txi.lookupTable.add(t.writeSet)
        txi.last_validated_against ← last_committed
        // the barrier of that transaction is incremented (see Last_validated_against)
        txi.last_validated_against.increaseBarrier()
        txi.barrier ← 0
    }
    At read:
    {
        value ← this.writeSet.getValue(x)
        if (value = null)
            value ← this.lookupTable.getValue(x)
        if (value = null)
            value ← memory.getValue(x)
        // record x so that it can be validated at commit
        this.readSet.add(x)
    }
    At write:
    {
        x.value ← newValue
        this.writeSet.add(x)
    }
    At commit:
    {
        validate:
        // only transactions appended after last_validated_against need checking
        currentTransaction ← last_validated_against.next
        While (currentTransaction ≠ null)
        {
            For each x in this.readSet
                If (currentTransaction.writeSet.contains(x))
                    this.abort()    // abort decrements the barrier incremented at begin
            last_validated_against ← currentTransaction
            currentTransaction ← currentTransaction.next
        }
        If (CAS(last_committed, last_validated_against, this tx) = false)
            goto validate
    }
}
We will recap the algorithm written in pseudo code above at each of its main
phases:
init: For a transaction that was just created by a thread, we create a lookup-table
set and fill it with all the write-sets in the list of successful write transactions. We
don't perform validation against these write-sets since we now saved their values
in the transaction dataset. Therefore at this stage last_validated_against is
initialized to last_committed, indicating that any other transaction appended to
the list after that must be validated against in order to fulfill the read consistency
property.
Read: if a transaction needs to read a value, it first checks its own write-set. If it's
not found there, it checks in its look-up table. If it's still not found in its context
dataset, the transaction looks it up in the main memory.
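The three-level lookup order of the read phase can be sketched as below. The class ReadPathSketch and the use of plain maps keyed by field name are assumptions made for illustration; main memory is modelled as a map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the read phase's lookup order.
final class ReadPathSketch {
    final Map<String, Integer> writeSet = new HashMap<>();     // this transaction's own writes
    final Map<String, Integer> lookupTable = new HashMap<>();  // values from committed write-sets
    final Map<String, Integer> mainMemory;                     // stand-in for shared memory

    ReadPathSketch(Map<String, Integer> mainMemory) {
        this.mainMemory = mainMemory;
    }

    // Returns the freshest value visible to this transaction, or null if none exists.
    Integer read(String field) {
        Integer value = writeSet.get(field);    // 1. own write-set
        if (value == null) {
            value = lookupTable.get(field);     // 2. lookup table
        }
        if (value == null) {
            value = mainMemory.get(field);      // 3. main memory
        }
        return value;
    }
}
```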
Write: the transaction simply updates the value of the desired variable and then
adds it to its write-set. This flow allows concurrent write-backs without
invalidating a program's correctness.
Commit: this is the most complex phase in the transaction's lifecycle while
executing in its thread context. This phase comprises two interconnected stages,
the first being validation against the write-sets that have been appended
following the transaction's initialization stage. In essence, it means that for
correctness' sake the transaction must verify that in its read phase the latest
updated values were fetched. This is because the values read into the
transaction's read-set might have been updated by transactions that successfully
committed after this transaction's init phase, rendering the values in the
executing transaction's read-set obsolete. To this end, if an intersection is found
between this transaction's read-set and the write-set of any scanned accumulated
successful transaction (i.e. the following condition is not met:
$\text{this tx.read-set} \cap \text{current scanned tx.write-set} = \emptyset$
), then this transaction rolls back and aborts.
The second stage in this phase is atomically appending the transaction to the
linked list, since another thread may simultaneously be in the process of appending
its own transaction. To this end, we perform a CAS (compare and swap) operation
that affirms that no additional transaction accumulated after
last_validated_against:
CAS(last_committed, last_validated_against, this tx)
If the CAS succeeds then the transaction is successfully appended to the list.
Otherwise, it goes back to the first stage, the validation process.
As can be seen, several CAS attempts might be needed to append a validated
transaction to the list. Because appending must be atomic, new transactions are
appended to the list sequentially, one at a time.
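The two commit stages can be sketched with java.util.concurrent.atomic.AtomicReference. This is a simplified illustration under stated assumptions, not the report's implementation: the names CommittedTx, CommittedList and tryCommit are hypothetical, the read set is a plain Set rather than a Bloom filter, the list starts with an empty sentinel node, and barrier handling is left out.

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical list node: the write set of a committed transaction and its successor.
final class CommittedTx {
    final Set<String> writeSet;
    volatile CommittedTx next;

    CommittedTx(Set<String> writeSet) {
        this.writeSet = writeSet;
    }
}

// Hypothetical global list; last_committed is modelled by an AtomicReference.
final class CommittedList {
    private final AtomicReference<CommittedTx> lastCommitted;

    CommittedList(CommittedTx sentinel) {
        this.lastCommitted = new AtomicReference<>(sentinel);
    }

    CommittedTx tail() {
        return lastCommitted.get();
    }

    // Returns true if tx was appended, false if the read set conflicts and the caller must abort.
    boolean tryCommit(Set<String> readSet, CommittedTx lastValidatedAgainst, CommittedTx tx) {
        while (true) {
            // Stage 1: validate against every transaction appended after lastValidatedAgainst.
            CommittedTx current = lastValidatedAgainst.next;
            while (current != null) {
                for (String field : current.writeSet) {
                    if (readSet.contains(field)) {
                        return false;   // intersection found: a value we read is stale
                    }
                }
                lastValidatedAgainst = current;
                current = current.next;
            }
            // Stage 2: append only if nothing was appended since validation finished.
            if (lastCommitted.compareAndSet(lastValidatedAgainst, tx)) {
                lastValidatedAgainst.next = tx;   // publish the link for later scanners
                return true;
            }
            // CAS failed: another transaction was appended meanwhile, so validate again.
        }
    }
}
```

In this sketch the tail pointer moves before the link from the previous node is published, so a concurrent committer may briefly see a null next, fail its CAS and retry until the link becomes visible.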
Abort: in this phase the aborting transaction merely decrements the value of its
last_validated_against transaction's barrier as it will no longer be needed for this
specific thread's validation.
Below is an illustration of different stages in the algorithm where txi is the
transaction executing in a thread context:
figure 1
The lookup table comprises the write-sets up until WSn. The corresponding
thread scans the list of committed transactions and adds their contents to the
lookup-table up until WSn.
figure 2
After completing the scan that constructs the lookup table a barrier is added to
indicate to the Terminator thread that it should not commit write-sets to the main
memory past WSn. The read set is initially empty.
figure 3
The accumulated transactions are transactions that committed after the
initialization of txi. By the time txi finished execution, last_committed was already
pointing to the last of the accumulated transactions. In this example,
we can see that the value of the variable z in the read-set is stale,
as there is a more recent value in one of the accumulated transactions. In this
case txi is aborted in the validation process and restarted.
As mentioned earlier, a transaction context is characterized by more parameters
than the transaction descriptor. This is because during execution we need more
data for the transaction to execute, whereas once the transaction reaches the
end of its commit phase, whether successfully or unsuccessfully, it can discard
this data. Examples of data that is needed only until the transaction
completes its commit phase are the read-set and the lookup table. The write-set,
however, must be preserved in order to commit its elements to the main memory
permanently.
Implementation
The algorithm was implemented in the Deuce framework. Deuce is a three-layer
framework written in Java. Algorithm developers can plug their
algorithm into the framework, allowing them to focus fully on the algorithm
implementation rather than delve into details of how the interaction with
the JVM is done or how the classes are manipulated to execute as transactions. Deuce
provides a high level of abstraction by allowing a transaction to announce each of
its phases. Basically, there are five primary methods that the developer needs to
implement: initialization, read, write, commit and rollback.
Deuce doesn't require compiler support: a program parallelized using Deuce is
compiled like any other program using a regular Java compiler, making Deuce a
non-intrusive framework. To enable manipulation of the user classes, Deuce relies
on instrumenting the code at load time. Deuce
is passed as an agent parameter to the JVM when running a Deuce program;
at load time Deuce intercepts the program's classes and performs Java
bytecode manipulation on methods that were marked as atomic by the high-level
programmer.
In essence, Deuce is composed of three layers that make up its architecture as
shown in figure 4:
figure 4
The application layer is the highest level in this architecture and it comprises the
high-level programmer's classes. A method that needs to execute atomically must
be marked by the high-level programmer with the Java annotation "@Atomic",
which provides thread-safe execution of the method.
Example:
import org.deuce.Atomic;

public class Counter {
    int counter = 0;

    // Deuce instruments this method to run as a transaction.
    @Atomic
    public void increment() {
        counter++;
    }
}
Deuce runtime is the layer responsible for orchestrating the interaction between
the transactions executed by the higher level application and the underlying STM
algorithm, for example, managing callbacks of init, read, write or commit events.
The third level comprises the classes that implement the STM algorithm.
Algorithm Design:
The algorithm was implemented using these classes:
WriteSet: the class that contains the write-set data structure and the method that
writes its contents to the main memory. Basically it's a set of fields (variables),
where a field contains an identifier to the memory location and its value.
BloomFilter: A generic class that implements a bloom filter structure and that we
use for our ReadSet implementation to optimize the algorithm runtime.
A bloom filter is a probabilistic data structure that is used to query whether a given
element is a member of a set. Initially, it is an m-bit array with all bits set to 0. k hash
functions are defined, each producing an output that is greater than or
equal to 0 and smaller than m. When we wish to insert a value into the set, we run
all k functions on this value and assign 1 to the positions corresponding to the
hash values. When querying for an element, we feed its value to each
of the k hash functions and check whether all positions obtained are set to 1. If at least
one position is zero then we conclude that the element is not in the set, making
false negatives impossible in this structure. False
positives, however, are possible: the queried value may hash to positions that
were all set to 1 by other elements.
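A minimal sketch of such a filter, built on java.util.BitSet. The class name BloomFilterSketch and the way the k positions are derived from hashCode (a double-hashing scheme) are assumptions made for illustration; the report does not specify which hash functions its BloomFilter class uses.

```java
import java.util.BitSet;

// Hypothetical Bloom filter with an m-bit array and k derived hash positions.
final class BloomFilterSketch {
    private final BitSet bits;
    private final int m;
    private final int k;

    BloomFilterSketch(int m, int k) {
        this.bits = new BitSet(m);   // all bits start at 0
        this.m = m;
        this.k = k;
    }

    // i-th position for an element, always in [0, m); assumed double-hashing scheme.
    private int position(Object element, int i) {
        int h1 = element.hashCode();
        int h2 = (h1 >>> 16) | 1;    // odd step so the k positions tend to differ
        return Math.floorMod(h1 + i * h2, m);
    }

    // Insert: set the k positions to 1.
    void add(Object element) {
        for (int i = 0; i < k; i++) {
            bits.set(position(element, i));
        }
    }

    // Query: false means definitely absent, true means possibly present.
    boolean mightContain(Object element) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(position(element, i))) {
                return false;
            }
        }
        return true;
    }
}
```

For validation, every element of a committed write-set would be passed to mightContain on the read-set filter. A false positive can only cause an unnecessary abort, never a missed conflict.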
In this data structure that we use to represent a transaction's read-set, the values
are lost. But as indicated before, these values are not needed and we can use a
lossy data structure since the intent is only to discover an intersection. We
discuss briefly how this structure reduces the complexity of the intersection check,
hence optimizing the runtime of our algorithm:
Querying whether an element E of a write-set is in the read-set takes O(k),
comprising the time to evaluate k hash functions and the time to check k
positions in the array.
If we denote the read-set of the committing transaction by R, the write-sets of the
committed transactions by $W_i$, and the size of the linked list by n, then
representing the read set of an executing transaction in a bloom filter reduces the
complexity of the intersection check to:
$k \cdot \sum_{i=0}^{n} |W_i|$
rather than:
$|R| \cdot \sum_{i=0}^{n} |W_i|$.
Hence, the complexity becomes independent of the size of the committing
transaction's read-set.
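As a worked illustration of the two expressions above, with values chosen purely as an example (k = 3 hash functions, a read-set of 1000 elements, and three accumulated write-sets of sizes 10, 20 and 30):

```latex
% Assumed example values: k = 3, |R| = 1000, write-set sizes 10, 20 and 30.
\begin{align*}
\sum_i |W_i| &= 10 + 20 + 30 = 60 \\
k \cdot \sum_i |W_i| &= 3 \cdot 60 = 180 \\
|R| \cdot \sum_i |W_i| &= 1000 \cdot 60 = 60000
\end{align*}
```

With the filter, growing the read-set does not change the first count, which is the independence from the read-set size stated above.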
TransactionDescriptor: the class that contains all the specific instance
transaction details, these include: the next transaction descriptor in the linked list,
barrier and the write-set. Maintaining the read-set or look-up table in a
transaction descriptor is redundant since past the validation stage there's no use
for it anymore and it may be discarded.
Context: this is the main class that the framework supplies to the algorithm
developer to plug their algorithm into the framework. Using this class, we
implemented all phases of a transaction as methods. The framework is
responsible for making callbacks to these functions by the executing transaction.
This class represents a thread context in which a transaction may be executing.
Once a thread is done executing an instance of a transaction, it creates a new
transaction and executes it.
Terminator: a daemon thread that iterates over the linked list of transactions in
order, never bypassing a transaction with a barrier value larger than zero. If it
finds a transaction with a barrier value of 0 then it atomically sets the barrier to -1
using a CAS operation to indicate that the transaction is being written. If the CAS
fails then the thread waits until the value becomes zero again.
We use the value -1 rather than 0 because the zero value is ambiguous. Zero
could mean either that no other transaction is holding the associated
transaction but the barrier may still be incremented, or that no other transaction is
holding the associated transaction and it is already being written by Terminator. A
possible scenario is that Terminator has started writing the write-set
to the main memory while a different transaction is trying to increment
the barrier. At that point it is too late for the scanning transaction to assign this
transaction to its last_validated_against variable. If the barrier
value is -1, on the other hand, it indicates that Terminator has marked the
transaction for writing, and we can stop the scanning thread from incrementing
the value.
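The role of the -1 value can be sketched with java.util.concurrent.atomic.AtomicInteger. The class BarrierSketch and its method names are hypothetical and only illustrate the protocol described above.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical barrier: >= 0 counts holders, -1 means claimed by the Terminator.
final class BarrierSketch {
    static final int WRITTEN = -1;
    private final AtomicInteger value = new AtomicInteger(0);

    // Called by a starting transaction; fails once the Terminator has claimed the node.
    boolean tryIncrement() {
        while (true) {
            int current = value.get();
            if (current == WRITTEN) {
                return false;   // too late: the write-set is being written to memory
            }
            if (value.compareAndSet(current, current + 1)) {
                return true;
            }
            // Lost a race with another thread: re-read the value and retry.
        }
    }

    // Called by a transaction that has finished validating or has aborted.
    void decrement() {
        value.decrementAndGet();
    }

    // Called by the Terminator: succeeds only when no transaction holds the node.
    boolean tryClaimForWriteBack() {
        return value.compareAndSet(0, WRITTEN);
    }
}
```

Because the increment and the claim are both CAS operations on the same counter, exactly one of them wins when they race: either the transaction holds the node, or the Terminator writes it back.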
The implementation ensures that transactions that are already committed to the
main memory can be reclaimed by the garbage collector by
eliminating any reference to them.
We will now discuss the algorithm properties
Write values are marked as final only after the transaction has committed.
The read-set in an executing transaction is only composed of final values
that either exist in the transaction's own write, or the main memory or the
lookup-table. An executing transaction dataset is not seen by other
transactions executing in parallel as can be deduced from the algorithm
description. With this, we have satisfied the isolation property necessary
for a transaction's correct execution.
The algorithm also satisfies the atomicity property, since in the rollback
stage the dataset associated with the transaction's context is discarded
and any changes that are bound to affect other transactions are annulled
as well.
Safe execution and read consistency are achieved in the validation process,
in keeping with the optimistic approach of STM. Validating the read-set in
the commit phase guarantees that the committing transaction has read the
most recent updates. The safety overhead of our algorithm pays off in
concurrency when a transaction's read-set does not intersect the write-sets
of accumulated transactions. If they do intersect, however, validation can
become a source of substantial overhead that degrades the overall
performance of a program.
Using the global linked list of write-sets allows these write-sets to be
written to main memory without disrupting other transactions, as other
transactions may continue with their commit phase.
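To make the validation property more concrete, the following sketch shows one way a read-set could be checked against the write-sets accumulated in a global linked list. It is an illustration under stated assumptions, not the project's implementation: the class ReadSetValidationSketch, the node layout, the use of string keys for memory locations, and the methods append and validate are all hypothetical.

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch of read-set validation over a global list of write-sets. */
final class ReadSetValidationSketch {

    /** One committed write-set in the global list (assumed layout). */
    static final class WriteSetNode {
        final Set<String> writtenKeys;
        final AtomicReference<WriteSetNode> next = new AtomicReference<>();

        WriteSetNode(Set<String> writtenKeys) {
            this.writtenKeys = writtenKeys;
        }
    }

    /**
     * Links a new write-set after the given tail.
     * Returns the new node, or null if the tail already has a successor.
     */
    static WriteSetNode append(WriteSetNode tail, Set<String> writtenKeys) {
        WriteSetNode node = new WriteSetNode(writtenKeys);
        return tail.next.compareAndSet(null, node) ? node : null;
    }

    /**
     * Walks the nodes appended after lastValidatedAgainst. Returns the last
     * node reached if no write-set intersects the read-set, or null if a
     * conflict is found and the transaction would have to roll back.
     */
    static WriteSetNode validate(Set<String> readSet, WriteSetNode lastValidatedAgainst) {
        WriteSetNode current = lastValidatedAgainst;
        WriteSetNode next;
        while ((next = current.next.get()) != null) {
            for (String key : next.writtenKeys) {
                if (readSet.contains(key)) {
                    return null; // a value that was read has since been overwritten
                }
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        WriteSetNode head = new WriteSetNode(Set.of()); // empty sentinel node
        WriteSetNode first = append(head, Set.of("x"));
        WriteSetNode second = append(first, Set.of("y"));

        // Disjoint read-set: validation reaches the end of the list.
        System.out.println(validate(Set.of("a", "b"), head) == second);
        // Read-set containing "y": validation reports a conflict.
        System.out.println(validate(Set.of("a", "y"), head) == null);
    }
}
```

In this sketch the cost of validation grows with the number of write-sets appended since the last validation, which is consistent with the overhead discussed above when many writer transactions accumulate.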
Evaluation and results
We evaluated our algorithm on a 24-core machine and compared it to the other
existing algorithms in the framework: TL2 and LSA.
For the evaluation we used benchmarks from the STAMP package, a suite
designed for transactional memory research.
We evaluated our algorithm against TL2 and LSA on two benchmarks:
Vacation, a client/server travel reservation system, and Labyrinth, a maze
routing program.
Results:
Figure 5: Vacation benchmark. Throughput (0 to 300000) versus number of
threads (0 to 40) for llstm, LSA and TL2.
Figure 6: Labyrinth benchmark. Throughput (0 to 300000) versus number of
threads (0 to 40) for llstm, LSA and TL2.
Analysis:
As can be seen, our approach did not achieve better performance than the other
two algorithms in the framework. An algorithm's performance depends greatly on
the executing program: the program's workload, the degree of interaction
between the threads and the number of writer transactions determine how well
the algorithm performs. For programs with many writer transactions, an
optimistic approach does not scale as well as algorithms that use a different
approach. This is due to validation overhead and a high number of rollbacks,
which result in lost work and reduced throughput.
Our optimistic approach is suitable when updates are rare, conflicts rarely occur
and transactions access a small number of objects, leading to fewer collisions
among the threads.
Generally, in all three algorithms the throughput increases with the number of
threads up to 16 threads. Beyond that, the number of threads nears the number
of cores and performance degrades, because the CPUs spend more time on lost
work caused by collisions among the transactions.
Conclusion
The essence of the STM notion is to let non-conflicting threads execute in
parallel without obstructing one another. As usual, there is no silver bullet
that solves all the problems raised when writing parallel programs.
Multi-threaded programs are very difficult to debug due to the lack of determinism
in execution, as well as the concurrent execution of multiple threads.
STM can also have an enormous overhead that degrades its performance. It
can be observed, however, that there are several advantages credited to STM,
which I hope I have conveyed in this book.
Project benefits and knowledge acquired
Although I have only had the opportunity to scratch the surface of STM and there
is still much more to learn about its topics and concepts, one of the major
benefits of this project was the exposure to the software transactional memory
paradigm. This subject is of utmost importance and a lot of research in academia
revolves around it.
It was interesting to learn the analogy between distributed systems algorithms
and database transactions on the one hand and software transactional memory on
the other, and to observe similar concepts being applied and implemented in STM.
Throughout the project, it can be observed that STM, although still mostly in
its experimental stages, has the potential to elegantly solve concurrency
issues, therefore fulfilling the urgent need for parallel software applications on the
ubiquitous multi-core machines.
The other benefit gained in this project was the experimentation with the
built-in multi-threaded programming in Java and learning the different
concurrency control mechanisms provided by the language.
Needless to say, this project has inspired me to pursue further courses in the
subject in the near future.
Appendix
Deuce source code and documentation can be downloaded from:
https://sites.google.com/site/deucestm/.
If you want to write and plug in your own algorithm using the Deuce framework,
you create a package under org.deuce.transaction, then you create the class
Context.java that implements the interface org.deuce.transaction.Context.
To run the Deuce agent jar from Eclipse, type the following in the VM arguments
box:
-javaagent:bin/deuceAgent.jar -Dorg.deuce.transaction.contextClass=org.deuce.transaction.algorithmName.Context
In Unix, if for example we want to run the class jstamp.vacation.Vacation, we type
the following:
java -server -javaagent:./bin/deuceAgent.jar -cp ./bin/classes/:. -XX:+AggressiveHeap -verbose:gc -Dorg.deuce.exclude="java.*,sun.*,org.eclipse.*" -Dorg.deuce.transaction.contextClass=org.deuce.transaction.algorithmName.Context jstamp.vacation.Vacation -n 4 -q 60 -u 90 -r 16000 -t 419430 -c 24
References:
1. Noninvasive concurrency with Java STM
Guy Korland, Nir Shavit and Pascal Felber
2. Transactional Locking II
Dave Dice, Ori Shalev and Nir Shavit
3. Beautiful concurrency
Simon Peyton Jones
4. A Lazy Snapshot Algorithm with Eager Validation
Torvald Riegel, Pascal Felber and Christof Fetzer
5. Design Tradeoffs in Modern Software Transactional Memory Systems
Virendra J. Marathe, William N. Scherer III and Michael L. Scott