SOFTWARE TRANSACTIONAL MEMORY (STM) ALGORITHM

By: Rania Briq
Supervisor: Dima Perelman

Contents

Introduction
Software Transactional Memory
Example program
Project goal
Algorithm Overview
Algorithm Description
Implementation
Algorithm Design
Evaluation and results
Analysis
Project benefits and knowledge acquired
Appendix
References

Introduction:

Multi-core architecture has become the norm in current-generation computers. It is present not only in personal computers but also in most new smartphones and tablets. These computers, however, will not run programs faster unless we write parallel programs that exploit them. To take advantage of a multi-core architecture, programs need to be made multi-threaded. Multi-threaded programming allows a program to execute concurrently while preserving its correctness as if it were executing sequentially.

In essence, concurrency is about allowing operations that do not conflict with each other in their memory accesses to execute at the same time. In multi-threaded programs, a conflict occurs when multiple threads access a shared object simultaneously and at least one of the accesses is a write.
In order to handle conflicting operations we need synchronization facilities that let threads execute concurrently without obstructing the program's flow or invalidating its correctness. The classic approach to synchronization in concurrent programs has been locks. Generally, locks are not fault tolerant: the failure of a single thread can cause the whole program to fail.

There are basically two types of lock-based solutions: coarse-grained locking and fine-grained locking. Coarse-grained locking is the simpler of the two, but it does not scale well because a single lock protects multiple objects or large pieces of code, which limits parallelism. Fine-grained locking scales well when written by expert programmers, but it imposes more complexity on the less experienced programmer, making programs more error prone and exposing them to drawbacks such as deadlocks, priority inversion and starvation. It is also noteworthy that managing locks becomes prohibitive in a program with a large number of threads because of the overhead that they incur. Using locks in such applications inhibits the program's progress, rendering locks an impractical approach.

As can be inferred, writing correct parallel programs is notoriously difficult, yet it is increasingly important. A need therefore arose for a more abstract and fault-tolerant parallel programming paradigm.

Software Transactional Memory:

Software transactional memory is a generic synchronization construct for controlling access to shared memory in concurrent programming. Its model is analogous to database transactions in terms of notion and correctness. STM strives to alleviate the drawbacks associated with lock-based synchronization and to provide a higher level of abstraction and modularity.
This paradigm saves the programmer the effort of figuring out the interactions and conflicts among the executing threads. It also hides the design details, so the programmer does not have to worry about acquiring or releasing locks. With STM the programmer need only specify the block of code that should execute atomically, without taking care of how atomicity is attained. Abstraction is achieved by writing programs that appear to be sequential, when in fact a program is split into transactions that execute in parallel, each in its own context. Transactions that do not conflict pass the commit phase, and those that collide with each other are aborted and restarted. STM-based algorithms allow automatic conversion of a serial program into a concurrent one. The non-blocking nature of STM allows fault-tolerant execution, in which the failure of a single thread does not hinder the execution of other threads.

In STM, a transaction is a sequence of commands that are executed contiguously by a single thread on a dataset (read and write sets) that might be shared with other transactions. Typically, a transaction has to maintain certain properties in order to guarantee a program's correctness. Two main properties must be satisfied to ensure that a transaction is processed reliably:

1. Atomicity: all or nothing. Either all of a transaction's commands are executed and its changes are made permanent, or none of its commands takes effect and any changes made are discarded. This property makes error recovery possible and allows a transaction to roll back.
2. Isolation: an executing transaction takes a snapshot of memory in its initialization stage and from then on does not see the changes made by other executing transactions.

A transaction goes through four phases from the start of its execution until its commit time:

1. Initialization: this phase comprises acquiring a snapshot of the memory contents or of the datasets of transactions that have finished committing, as well as initializing any data used in the subsequent phases.
2. Read: reading a variable either from the snapshot acquired in the initialization stage or from memory.
3. Write: writing a value to a variable or updating its value. Typically the values are written to a thread-private log and not to memory directly, in order to allow rollback in case of a conflict.
4. Commit: a validation stage that verifies that no conflicts occurred with other transactions. If the validation succeeds the transaction's changes are made permanent; otherwise the transaction is rolled back, aborted and then restarted.

A program that relies on STM to achieve concurrency is expected to satisfy transactional sequential consistency. This property is defined by the requirement that "there exists a global total order on operations, consistent with the program order in each thread, and each read returns the most recent write value to the same memory location".

Unlike lock-based techniques, most STM algorithms take an optimistic approach to memory writes. With this approach, a variable is modified without considering what other threads are doing. Each read and write operation must, however, be logged for the validation process, so that a transaction can roll back in case of a conflict. This imposes an extra validation overhead, whose complexity depends on the underlying STM algorithm.

Example program:

In the Java programming language every object has an implicit lock that is used by synchronized code. Synchronized methods and statements acquire the implicit lock of an object, and only one thread can own the lock of an object at a time.
This type of lock is easy to use but has many limitations, the most noticeable of which is the inability to withdraw a request to acquire an object's lock if the lock is not available. In this example we demonstrate the need for a good concurrency control model in a parallel program and the drawbacks associated with locks. Let's take the classic example of bank accounts, where we have an account object as follows:

    class Account {
        int balance;

        synchronized void withdraw(int amount) {
            balance = balance - amount;
        }

        synchronized void deposit(int amount) {
            balance = balance + amount;
        }
    }

We make withdraw and deposit synchronized so that no lost updates or dirty reads occur when two threads withdraw or deposit concurrently. Now let's consider the scenario where money is transferred from account A to account B. Naturally we would write the following transfer method:

    void transfer(Account A, Account B, int amount) {
        A.withdraw(amount);
        B.deposit(amount);
    }

The fact that the two methods used in transfer are each synchronized separately does not guarantee correct execution. Another thread could read the intermediate state in which the money has left account A and has not yet arrived in account B. Using explicit locks on both accounts might solve this problem, but it is still deadlock prone when another thread is trying to transfer money in the opposite direction, from account B to account A. One remedy is a two-phase locking discipline: a thread acquires locks only in the growing phase, always in the same agreed order, and releases them only in the shrinking phase. Such a technique is not always practical because it is blocking, which imposes long waiting times on other threads when there is a lot of interaction among the threads and the transactions are long.
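To make the locking discipline above concrete, here is a minimal sketch of a transfer that takes both account locks before touching either balance and releases them only afterwards. The class name, the explicit `ReentrantLock` per account and the `id` field used to fix the lock order are assumptions made for illustration; they are not part of the original example.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: two-phase locking with a fixed lock order.
final class OrderedTransferSketch {

    static final class Account {
        final int id;                                   // assumed unique; used only to order locks
        final ReentrantLock lock = new ReentrantLock();
        int balance;

        Account(int id, int balance) {
            this.id = id;
            this.balance = balance;
        }
    }

    // Moves amount from one account to the other while holding both locks.
    static void transfer(Account from, Account to, int amount) {
        // Growing phase: always lock the account with the smaller id first,
        // so opposite-direction transfers cannot wait on each other in a cycle.
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;

        first.lock.lock();
        try {
            second.lock.lock();
            try {
                // No other thread can observe the state between these two updates.
                from.balance -= amount;
                to.balance += amount;
            } finally {
                second.lock.unlock();                   // shrinking phase
            }
        } finally {
            first.lock.unlock();
        }
    }
}
```

The sketch removes the intermediate state and the opposite-direction deadlock, but every thread that needs either account still blocks for the duration of the transfer, which is the cost the text refers to.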
From the description above we conclude that locks alone are not sufficient and that a new paradigm of concurrency control is needed.

Project goal:

As mentioned earlier, we aim to introduce parallelism into our software applications, with its details encapsulated, in order to exploit multiprocessor architectures. In this project we devise a new algorithm that relies on software transactional memory concepts, implement it, and evaluate its performance in comparison to other STM algorithms. In its notion, our algorithm is similar to optimistic concurrency control. This method assumes that no conflicts occur, so transactions can continue working in their own context of execution without locking the data that they use. This approach may cause inconsistencies in the read sets of the executing transactions. In order to detect them, each transaction has to verify in its commit phase that no other transaction has modified the data in its read set. If an intersection is discovered, the committing transaction must roll back and abort. In what follows we describe our algorithm's design and demonstrate its properties.

Algorithm Overview:

Most existing STM algorithms make use of metadata, such as version clocks or timestamps, attached to variable locations in shared memory. The TL2 algorithm, for instance, uses a global version clock that is incremented by each update transaction (a transaction that writes to memory), and every memory location has an associated lock that contains a version number. In our algorithm, a transaction essentially consists of a read set, a write set and a lookup table during its execution. Each transaction that is successfully validated after completing its execution is appended to a global linked list of committed transactions.
A transaction in the list may be written to main memory once all transactions that started executing before it was appended have validated against it. This linked list, together with the algorithm flow, lets us realize a logical ordering of events instead of relying on metadata during validation in a transaction's commit phase. The reads in our algorithm are invisible, which means that they are not written to memory. Therefore there is no point in appending read-only transactions to the list of successful transactions. The structure that we present helps to make validation time linear in the number of concurrent transactions that commit, rather than in the size of the transaction itself.

Algorithm Description:

Every node in the global list represents a successful writer transaction. A transaction in the list consists of the following three parameters:

1. Write set: the objects that the transaction modified, in their finalized state.
2. A barrier: its value indicates the number of transactions that started execution after this transaction was successfully enqueued onto the list. Since our algorithm overlooks read inconsistencies until commit time, the transactions that incremented the barrier have yet to validate their read sets against this transaction's write set following their execution phase. A barrier value of 0 marks the transaction as ready to be written to main memory. Once the transaction has been written to main memory, it is no longer needed.
3. Next: a pointer to the next transaction in the list of successful write transactions.

Our algorithm uses a Bloom filter to represent the read set of an executing transaction. This data structure helps meet the read consistency condition at a lower complexity, as we will shortly describe.
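The three-field list node described above could be laid out roughly as follows. This is only a sketch: the class name, the use of a `Map` from variable names to integer values for the write set, and the `readyForWriteBack` helper are assumptions for illustration, not the report's actual classes.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical layout of one node in the global list of committed writer transactions.
final class TransactionNodeSketch {

    // Finalized values written by the transaction, keyed by an assumed variable name.
    final Map<String, Integer> writeSet;

    // Number of transactions that still have to validate against this node.
    final AtomicInteger barrier = new AtomicInteger(0);

    // Next committed writer transaction; volatile so that readers see appended nodes.
    volatile TransactionNodeSketch next;

    TransactionNodeSketch(Map<String, Integer> writeSet) {
        this.writeSet = writeSet;
    }

    // A barrier of 0 means no running transaction still depends on this node.
    boolean readyForWriteBack() {
        return barrier.get() == 0;
    }
}
```

An atomic counter is used for the barrier because it is incremented and decremented by many transaction threads while another thread inspects it.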
In our implementation, we distinguish between a transaction (called a transaction descriptor) and the thread context in which a transaction is executing. A context is thread specific and includes the transaction descriptor, as will be shown in the implementation section. Below we give a detailed description of the algorithm design and implementation details, including the functionality in each of the transaction's phases.

On a global level, we have the linked list of successful update transactions. We maintain a pointer to the head of the list, marking the first transaction that will be written to main memory. We denote it by first_committed. We also maintain a pointer to the tail of the list, marking the last transaction appended to the list. We denote it by last_committed.

We have a Terminator thread. This thread is responsible for writing the write-sets of the write transactions in the linked list to main memory when the barrier value allows it. Eventually we need these write-sets committed to main memory so that the list does not grow indefinitely, resulting in increased overhead, and by the end of the program we need these values committed to main memory for the program to count as successfully completed. This thread is needed only as long as the program is executing, and it is therefore set to be a daemon thread.

The transaction context comprises the following parameters:

Write-set: the set of variables that this transaction modified. In its read phase, the transaction checks this set as a first reference for data fetching.

Read-set: the set of variables that this transaction read.

Lookup-table: the set of variables compiled from the committed transactions in the linked list, starting from first_committed until last_committed. As mentioned earlier, these variables' values are held by successful write transactions and so are more up to date than main memory.
In its read phase, a transaction checks this set as a second reference for data fetching, after its own write-set.

Barrier: an atomic counter that maintains the number of transactions that need to validate, in their commit phase, against this write-set and the write-sets that follow it in the linked list before it may be written to main memory. This is necessary to satisfy read consistency in the validation process, when we check that a transaction has not read values that were modified by other transactions after it started its execution. We note here that the barrier value must be incremented and decremented carefully (not only atomically), as we explain in the Algorithm Design section.

Last_validated_against: this variable points to the last successful transaction in the linked list (last_committed at that moment) when this transaction started its execution (the same one whose barrier it incremented). We will explain shortly why we need a pointer to this transaction.

Below is the algorithm written in pseudo code. The flow is divided into the different phases of a committing transaction:

    For each transaction txi executing in a thread context:

    At begin:
    {
        for each tx in txnList, from first_committed to last_committed
            txi.lookupTable.add(tx.writeSet)
        txi.last_validated_against ← last_committed
        last_committed.increaseBarrier()
        txi.barrier ← 0
    }

    At read:
    {
        value ← this.writeSet.getValue(x)
        if (value = null)
            value ← this.lookupTable.getValue(x)
        if (value = null)
            value ← memory.getValue(x)
    }

    At write:
    {
        x.value ← newValue
        this.writeSet.add(x)
    }

    At commit:
    {
        while (true)
        {
            currentTransaction ← last_validated_against.next
            // validate
            while (currentTransaction ≠ null)
            {
                for each x in this.readSet
                    if (currentTransaction.writeSet.contains(x))
                    {
                        last_validated_against.decreaseBarrier()
                        this.abort()
                    }
                last_validated_against ← currentTransaction
                currentTransaction ← currentTransaction.next
            }
            if (CAS(last_committed, last_validated_against, this) = true)
                break
        }
    }

We now recap the pseudo code above at each of its main phases:

Init: for a transaction that was just created by a thread, we create a lookup table and fill it with all the write-sets in the list of successful write transactions. We do not perform validation against these write-sets, since we have now saved their values in the transaction's dataset. Therefore at this stage last_validated_against is initialized to last_committed, indicating that any other transaction appended to the list after that point must be validated against in order to fulfill the read consistency property.

Read: if a transaction needs to read a value, it first checks its own write-set. If the value is not found there, it checks its lookup table. If it is still not found in the context's dataset, the transaction looks it up in main memory.

Write: the transaction simply updates the value of the desired variable and then adds it to its write-set. This flow allows concurrent write-backs without invalidating a program's correctness.

Commit: this is the most complex phase in the transaction's lifecycle while executing in its thread context. This phase comprises two interconnected stages, the first being validation against the write-sets that have been appended since the transaction's initialization stage.
In essence, this means that for correctness' sake the transaction must verify that its read phase fetched the latest values. The values in the transaction's read-set might have been updated by transactions that successfully committed after this transaction's init phase, rendering the values in its read-set obsolete. To this end, if an intersection is found between this transaction's read-set and the write-set of any of the scanned accumulated transactions (i.e. the condition $\text{this\_tx.read-set} \cap \text{scanned\_tx.write-set} = \emptyset$ is not met), then this transaction rolls back and aborts.

The second stage in this phase is atomically appending the transaction to the linked list, in case another thread is simultaneously in the process of appending its own transaction. To this end, we perform a CAS (compare-and-swap) operation that confirms that no additional transaction accumulated after last_validated_against:

    CAS(last_committed, last_validated_against, this tx)

If the condition holds then the transaction is successfully appended to the list. Otherwise, it goes back to the first stage of the validation process. As can be seen, several CAS operations might be needed to append a validated transaction to the list atomically. The algorithm also requires that appending a new transaction to the list be done atomically, which means that transactions are appended one at a time.

Abort: in this phase the aborting transaction merely decrements the barrier of its last_validated_against transaction, as that transaction is no longer needed for this specific thread's validation.

Below is an illustration of different stages in the algorithm, where txi is the transaction executing in a thread context:

Figure 1: The lookup table comprises the write-sets up until WSn.
The corresponding thread scans the list of committed transactions and adds their contents to the lookup table up until WSn.

Figure 2: After completing the scan that constructs the lookup table, a barrier is added to indicate to the Terminator thread that it should not commit write-sets to main memory past WSn. The read set is initially empty.

Figure 3: The accumulated transactions are transactions that committed after the initialization of txi. By the time txi finished execution, last_committed was already pointing to the last of the accumulated transactions. In this example, we can see that the variable z in the read-set does not hold the most recent value, as there is a more recent value in one of the accumulated transactions. In this case txi is aborted in the validation process and restarted.

As mentioned earlier, a transaction context is characterized by more parameters than the transaction descriptor. This is because we need more data while the transaction is executing, whereas once the transaction reaches the end of its commit phase, whether successfully or unsuccessfully, it can discard this data. Examples of data that is needed only until the transaction completes its commit phase are the read-set and the lookup table. The write-set, however, must be preserved in order to commit its elements to main memory permanently.

Implementation

The algorithm was implemented in the Deuce framework. Deuce is a three-layer framework written in Java. Algorithm developers can plug their algorithm into the framework, which allows them to focus fully on the algorithm implementation rather than delve into the details of how the interaction with the JVM is done or how the classes are manipulated to execute as transactions.
Deuce provides a high level of abstraction by allowing a transaction to announce each of its phases. Basically, there are five primary methods that the developer needs to fill in: initialization, read, write, commit and rollback. Deuce does not require compiler support: a program parallelized using Deuce is compiled like any other program with a regular Java compiler, making Deuce a non-intrusive framework. To manipulate the user classes when enabling parallelism, Deuce relies on instrumenting the code at load time. Deuce is passed as an agent parameter to the JVM when running a Deuce program; at load time Deuce then intercepts the program's classes and performs Java bytecode manipulation on the methods that were marked as atomic by the high-level programmer.

In essence, Deuce's architecture is composed of three layers, as shown in figure 4:

Figure 4

The application layer is the highest level in this architecture and comprises the high-level programmer's classes. A method that needs to execute atomically must be marked by the high-level programmer using the Java annotation "@Atomic", which provides thread-safe execution for the method. Example:

    public class Counter {
        int counter = 0;

        @Atomic
        public void increment() {
            counter++;
        }
    }

The Deuce runtime is the layer responsible for orchestrating the interaction between the transactions executed by the higher-level application and the underlying STM algorithm, for example by managing callbacks for init, read, write or commit events. The third level comprises the classes that implement the STM algorithm.

Algorithm Design:

The algorithm was implemented using these classes:

WriteSet: the class that contains the write-set data structure and the method that writes its contents to main memory. Basically it is a set of fields (variables), where a field contains an identifier of the memory location and its value.
BloomFilter: a generic class that implements a Bloom filter and that we use for our ReadSet implementation to optimize the algorithm's runtime. A Bloom filter is a probabilistic data structure that is used to query whether a given element is a member of a set. Initially, it is an m-bit array with all bits set to 0. k hash functions are defined for insertion, where the output of each hash function is larger than or equal to 0 and smaller than m. When we wish to insert a value into the set, we run all k functions on this value and assign 1 to the positions given by the hash values. When querying for an element, we feed its value to each of the k hash functions and check whether all the positions obtained are set to 1. If at least one position is zero then we conclude that the element is not in the set, which makes false negatives impossible. False positives, however, are possible: the queried value may hash to positions that were all set to 1 by other elements.

In this data structure, which we use to represent a transaction's read-set, the values themselves are lost. But as indicated before, these values are not needed, and we can use a lossy data structure when the only intent is to discover an intersection. We briefly discuss how this structure reduces the complexity of the intersection check and hence optimizes the runtime of our algorithm. Querying whether an element E of a write-set is in the read-set takes O(k), comprising the time to evaluate k hash functions and the time to check k positions in the array. If we denote the read-set of the committing transaction by R, the write-sets of the committed transactions by $W_i$, and the size of the linked list by n, then representing the read-set of an executing transaction as a Bloom filter reduces the complexity of the intersection check to

$k \cdot \sum_{i=0}^{n} |W_i|$

rather than

$|R| \cdot \sum_{i=0}^{n} |W_i|$.
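The insertion, query and intersection steps just described can be sketched as below. This is not the report's BloomFilter class: the class and method names are hypothetical, and deriving the k positions from `hashCode()` by double hashing is an assumption chosen to keep the example short.

```java
import java.util.BitSet;

// Hypothetical Bloom filter sketch with m bits and k hash positions per element.
final class BloomFilterSketch<T> {

    private final BitSet bits;
    private final int m;   // number of bits in the array
    private final int k;   // number of hash functions

    BloomFilterSketch(int m, int k) {
        this.bits = new BitSet(m);
        this.m = m;
        this.k = k;
    }

    // The i-th position in [0, m); an assumed double-hashing scheme over hashCode().
    private int position(T element, int i) {
        int h1 = element.hashCode();
        int h2 = (h1 >>> 16) | 1;           // odd step so successive positions differ
        return Math.floorMod(h1 + i * h2, m);
    }

    // Insertion: set the k positions of the element to 1.
    void add(T element) {
        for (int i = 0; i < k; i++) {
            bits.set(position(element, i));
        }
    }

    // Query: false means definitely absent, true means possibly present.
    boolean mightContain(T element) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(position(element, i))) {
                return false;
            }
        }
        return true;
    }

    // Intersection check against one write-set: O(k) per written element,
    // regardless of how many elements were added to this read-set.
    boolean mightIntersect(Iterable<T> writeSet) {
        for (T written : writeSet) {
            if (mightContain(written)) {
                return true;
            }
        }
        return false;
    }
}
```

A false positive in `mightIntersect` would only cause an unnecessary abort, never a missed conflict, which is why a lossy read-set is acceptable for validation.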
The cost of the intersection check is therefore independent of the size of the committing transaction's read-set.

TransactionDescriptor: the class that contains the details specific to a transaction instance. These include the next transaction descriptor in the linked list, the barrier and the write-set. Maintaining the read-set or lookup table in a transaction descriptor is redundant, since past the validation stage they are of no further use and may be discarded.

Context: the main class that the framework supplies to algorithm developers for plugging their algorithm into the framework. Using this class, we implemented all phases of a transaction as methods. The framework is responsible for calling back these methods on behalf of the executing transaction. This class represents a thread context in which a transaction may be executing. Once a thread is done executing an instance of a transaction, it creates a new transaction and executes it.

Terminator: a daemon thread that iterates over the linked list of transactions in order, never bypassing a transaction with a barrier value larger than zero. If it finds a transaction with a barrier value of 0, it atomically marks it as -1 using a CAS operation to indicate that it is being written. If the CAS fails, the thread waits until the value becomes zero again. We use the value -1 rather than 0 because the zero value is ambiguous. Zero could mean either that no other transaction is holding the associated transaction and its barrier may still be incremented, or that no other transaction is holding it and it is already being written by Terminator. A possible scenario is that Terminator has started writing the write-sets to main memory while a different transaction is trying to increment the barrier. Evidently, it would be too late for the scanning transaction to assign this transaction to its last_validated_against variable.
On the other hand, if the barrier value is -1, this indicates that Terminator has marked the transaction for writing, and we can stop the scanning thread from incrementing the value. The implementation ensures that transactions that have already been committed to main memory can be reclaimed by the garbage collector, by eliminating every reference to them.

We will now discuss the algorithm's properties:

Write values are marked as final only after the transaction has committed. The read-set of an executing transaction is composed only of final values that exist in the transaction's own write-set, in main memory or in the lookup table.

The dataset of an executing transaction is not seen by other transactions executing in parallel, as can be deduced from the algorithm description. With this, we have satisfied the isolation property necessary for a transaction's correct execution. The algorithm also satisfies the atomicity property, since in the rollback stage the dataset associated with the transaction's context is discarded, and any changes that would affect other transactions are annulled as well.

Safe execution and read consistency are achieved in the validation process, in keeping with the optimistic approach in STM. Validating the read-set in the commit phase guarantees that the committing transaction has read the most recent updates.

The overhead that our algorithm pays for safety yields concurrency when a transaction's read-set does not intersect with the write-sets of accumulated transactions. If they do intersect, however, this can be a source of considerable overhead that degrades the overall performance of a program.

Using the global linked list of write-sets allows these write-sets to be written to main memory without disrupting other transactions, as other transactions may continue with their commit phase.
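The barrier handling described for Terminator, where 0 means "free" and -1 means "claimed for write-back", can be sketched as a small state machine around an atomic integer. The class and method names are hypothetical and the sketch omits the waiting loop and the list traversal; it only shows why a plain atomic increment is not enough.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the barrier protocol between transactions and Terminator.
final class BarrierSketch {

    static final int CLAIMED = -1;          // Terminator has taken the node for write-back

    private final AtomicInteger value = new AtomicInteger(0);

    // Called by a starting transaction. Fails once Terminator has claimed the node,
    // so the caller must not use this node as its last_validated_against.
    boolean tryIncrement() {
        while (true) {
            int current = value.get();
            if (current == CLAIMED) {
                return false;
            }
            if (value.compareAndSet(current, current + 1)) {
                return true;
            }
            // Another thread changed the value in between; re-read and retry.
        }
    }

    // Called by a transaction that committed or aborted and no longer needs the node.
    void decrement() {
        value.decrementAndGet();
    }

    // Called by Terminator. Succeeds only if no transaction currently holds the node.
    boolean tryClaimForWriteBack() {
        return value.compareAndSet(0, CLAIMED);
    }
}
```

Because both the claim and the increment go through compare-and-set on the same counter, a transaction can never take hold of a node after Terminator has started writing it.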
Evaluation and results

We evaluated our algorithm on a 24-core machine and compared it to the other algorithms available in the framework: TL2 and LSA. For the evaluation we used benchmarks from the STAMP package, a suite designed for transactional memory research. We evaluated our algorithm against TL2 and LSA on two benchmarks: Vacation, which is a client/server travel reservation system, and Labyrinth, a maze routing program.

Results:

[figure 5: Vacation benchmark, throughput versus number of threads, for llstm, LSA and TL2]

[figure 6: Labyrinth benchmark, throughput versus number of threads, for llstm, LSA and TL2]

Analysis:

As can be seen, our approach did not achieve better performance than the other two algorithms in the framework. An algorithm's performance depends greatly on the executing program. The program's workload, the degree of interaction between the threads and the number of writer transactions determine how much an algorithm can improve performance. For programs with many writer transactions, an optimistic approach does not scale as well as algorithms that use a different approach. This is due to the validation overhead and the high number of rollbacks, which result in lost work and reduced throughput. Our optimistic approach is suitable when updates are rare, conflicts seldom occur and transactions access a small number of objects, leading to fewer collisions among the threads. Generally, in all three algorithms the throughput increases with the number of threads up to 16 threads. Beyond that, the number of threads nears the number of cores and performance degrades, because the CPUs spend more time on lost work caused by collisions among the transactions.
Conclusion

The essence of STM is to let non-conflicting threads execute in parallel without obstructing one another. As usual, there is no silver bullet that solves all the problems raised when writing parallel programs. Multi-threaded programs are very difficult to debug due to the lack of determinism in execution, as well as the concurrent execution of multiple threads. STM can also carry an enormous overhead that degrades its performance. It can be observed, however, that there are several advantages credited to STM that I hope I could convey in this book.

Project benefits and knowledge acquired

Although I have only had the opportunity to scratch the surface of STM, and there is still much more to learn about its topics and concepts, one of the great benefits of this project was the exposure to the software transactional memory paradigm. This subject is of utmost importance and a lot of research in academia revolves around it. It was interesting to learn how concepts from distributed-systems algorithms and database transactions carry over to software transactional memory, and to see similar ideas applied and implemented in STM. Throughout the project, it could be observed that STM, even though it is still mostly in its experimental stages, has the potential to solve concurrency issues elegantly, thereby answering the urgent need for parallel software applications on the ubiquitous multi-core machines. The other benefit of this project was experimenting with the built-in multi-threading support in Java and learning the different concurrency control mechanisms that the language provides. Needless to say, this project has inspired me to pursue further courses in the subject in the near future.

Appendix

Deuce source code and documentation can be downloaded from: https://sites.google.com/site/deucestm/.
If you want to write and plug in your own algorithm using the Deuce framework, create a package under org.deuce.transaction, then create the class Context.java in it, which implements the interface org.deuce.transaction.Context.

To run the Deuce agent jar from Eclipse, type the following in the VM arguments box:

    -javaagent:bin/deuceAgent.jar -Dorg.deuce.transaction.contextClass=org.deuce.transaction.algorithmName.Context

In Unix, if for example we want to run the class jstamp.vacation.Vacation, we type the following:

    java -server -javaagent:./bin/deuceAgent.jar -cp ./bin/classes/:. -XX:+AggressiveHeap -verbose:gc -Dorg.deuce.exclude="java.*,sun.*,org.eclipse.*" -Dorg.deuce.transaction.contextClass=org.deuce.transaction.algorithmName.Context jstamp.vacation.Vacation -n 4 -q 60 -u 90 -r 16000 -t 419430 -c 24

References:

1. Noninvasive Concurrency with Java STM. Guy Korland, Nir Shavit and Pascal Felber.
2. Transactional Locking II. Dave Dice, Ori Shalev and Nir Shavit.
3. Beautiful Concurrency. Simon Peyton Jones.
4. A Lazy Snapshot Algorithm with Eager Validation. Torvald Riegel, Pascal Felber and Christof Fetzer.
5. Design Tradeoffs in Modern Software Transactional Memory Systems. Virendra J. Marathe, William N. Scherer III and Michael L. Scott.