Software Transactional Memory Abstract

Software Transactional Memory
Nir Shavit*
MIT and Tel-Aviv University
Dan Touitou
Tel-Aviv University
Abstract
by
constructing
classes
blocking [7, 15, 14].
As we learn
from
chronization
signing
highly
isting
of
a
literature,
is
flexibility
greatly
concurrent
hardware
Building
inflexible
on
chronization
for
supporting
synchronization
can
of
be
Herlihy
provide
a general
Empirical
all
a k-word
evidence
chitectures
the
concurrent
shows
that
lock-free
translation
and
outperforms
Herlihy’s
numbers
of processors.
software-transactional
always
in
translation
based
the
and
to
a Load.
unlike
most
by
employing
operating
proposed
Barnes
style
protocols,
pol-
language
they
1
tency
Introduction
adding
are
able
using
protocol.
can
, and
or soon
on the
for
to
level
a single
algorithms
the
of
word,
imprac-
problem
on
machines
Herlihy
and
hardware
to
can
trans-
associative
to the
support
cache
a flexible
cache
consistency
transactional
operations.
be written
Moss
solution:
a specialized
changes
of provid-
existing
support.
synchronization
operation
executed
which
current
concurrent
primitives
minor
writing
chronization
[18]
operations
operation
system
By
several
for
of the
an ingenious
memory.
making
helping”
icy.
most
to overcome
programming
and
Heap
Compare&Swap
support
[7] suggested
efficient
actional
of our
lists
net-
Fetch&Complement
of
Rappoport’s
highly
on
counting
future.
ing
have
the
of
on two
compression
a three-word
of these
near
simpro-
a Compare&Swap
path
Linked/Store-Conditional
greatly
data-structures
combination
and
architectures
in the
[16]
for sufficiently
use
non-
flexibil-
concurrent
operation,
Unfortunately,
Bershad
of Barnes,
parallel
using
more.
operations
use
S pi ice
which
be developed
making
ar-
[2]
are
literature,
non-blocking
which
a special
that
the
non-blocking
the
[22]
implemented
tical
based
are
Pu
[5]
from
designing
Fetch&Inc, Israeli
translating
efficiency
“recursive
of
implementations
synchronization
of
Anderson’s
many
a
STM
and
uses
and
be
the
task
Examples
ones
style
to the
is that
on a costly
works
outperforms
method
key
approach
it is not
offer
multiprocessor
method
The
which
STM-transaction.
methods
large
methods,
our
for
the
words,
syn-
only
choosing
Massalin
software
use
lock-free
on simulated
the
we
using
method
compare&swap
collected
single
non-blocking,
We
to
grams.
programming
machines
implementations
on implementing
is
ex-
a
novel
in
plifies
level
Moss,
ity
de-
the
on
a
operation.
highly
object
of
transactional
and
STM
existing
Load.Linked/Store.Conditional
sequential
on
transactional
operations.
on
best
(STM),
flexible
implemented
at
based
memory
syn-
task
operation
hardware
transactional
method
is
itional
the
choosing
the
Unfortunately,
and
methodology
software
in
simplifies
programs.
Load_Linked/Store_Cond
word.
of
the
operations
of
As we learn
Any
syn-
as a transaction
an optimistic
algorithm
built
into
Unfortunately
though,
this
the
transactional
and
the
consisis block-
solution
ing.
A major
chines
obstacle
widely
designing
Given
an
on the
highly
the
increasingly
serious
tention
for
to
to
number
gram
timing
highly
and
uses
they
and
by
limit
anomalies
of
(possibly
modern
means
and
critical
eliminating
make
critical
is
the
a multiprocessor
sections
its
transactional
supports
flexible
face
clear
focus
memory
pro-
altogether)
tions
which
This
class
Though
on
access
we cannot
and
processor
failures.
most
of
in the
aim
known
the
memto todays
in the
transactional
that
sequence
the
for
resiliency
a software
transactions,
a pre-determined
primitives
that
transactional
of applicability
of
static
includes
chronization
and
introduce
design
of synchroniza-
machines,
implementations
support
a novel
software
in terms
among
anomalies
that
(STM),
our
advantages
We
programming
software.
portability
approach,
implementation.
memory
performance,
of timing
We
the
in
overall
has
to adopt
based
transactional
operations
machines,
The
proposes
hardware
software
ory
system
decrease
paper
not
same
con-
failures.
to
im-
sections
increase
but
tion
for
of critical
processor
sections
is
multiprocessor
and
programming
delay
techniques
parallelism,
interconnect,
concurrent
size
in
in
structures.
unpredictable
conventional
objects
since
memory
vulnerable
key
problem
data
This
ma-
of programmers
and
that
that
concurrent
unsuitable,
multiprocessor
difficulty
programs
realization
we argue
plementing
to making
is the
concurrent
growing
architectures,
are
way
acceptable
and
is,
transac-
of locations.
proposed
syn-
literature.
Permission to make digital/hard copies of all or part of this material for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copyright is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires specific permission and/or fee.
PODC 95 Ottawa CA © 1995 ACM 0-89791-710-3/95/08...$3.50
1.1
of the
Author: E-mail: shanir@theory.lcs.mit.edu
operation
a nutshell
environment,
operations
clusively
lContact
in
In a non-faulty
fee.
Ontario
STM
is usually
ownerships
Op.
on the
If a transaction
the
way
based
to ensure
on locking
memory
cannot
locations
capture
the
atomicity
or acquiring
accessed
exby an
an ownerships
it fails,
and releases
erwise,
acquired.
To
deadlocks,
which
the
guarantee
for
ownerships
continue
certain
process
which
This
other
same
location
plete
its
tional
is
to
only
help
eral
has
single
transactions
key
order
to
of
attempting
is the
cooperative
this
location
in
free
if
ing
the
gives,
to
com-
one
release
we
using
cific
can
by
The
raon-
cooperative
other
One
can
use
method
into
to
STM
for
to
[6, 26].
The
memory
to
tions.
non-blocking
Herlihy,
in
sequential
objects
cording
done
ory,
to
his
by first
switching
the
the
help
into
it
to the
solution
Herlihy
for
large
current
updating.
Alemany
suggested
the
new
data
structure,
to
whole
improve
price
of losing
support
making
not
object,
and
the
and
all that
portability,
standard
does
not
support
[4] and
with
as our
and
drawbacks
em-
its
spe-
which
are
method:
a
recursive
causes
access
structure
processes
a disjoint
part
using
of
to
help
of the
data
ever,
in many
cases
since
have
to
will
have
to first
P‘s
operation
help
to
On
[20]
b, help
sys-
Q,
will
then
Q,
likely
other
hand,
and
fail.
for
again
P,
Moreover,
it,
when
also
the
P
and
location
after
b will
only
HOW-
change
read
waiting
requesting
to
to
likely
help
P.
executes
the
P
restart.
only
the
P
transaction,
a and
help
find
if
STM
release
have
most
P,
processes
as an
not have
on system
help
and
not
already
processes
some
coopera-
a
in
any
P has
ac-
redundantly
P.
the
a will
the
the
pare&Swap
method
All
the
operation.
will
P
a 2-word
that
operation
own
P’s Compare&Swap
retry.
b, all
its
its
but
helped
Assume
to
co-
fail
executes
b.
Q complete
11 after
of
nevertheless
which
on
level
percentage
b. According
continues
Q changed
will
P
are
a and
owns
helps
b and
transaction
operations
they
locations
P first
acquires
quired
since
Q already
method,
Compare&Swap
the
a high
a process
on
process
case
con-
operating
assumptions
example
on
Compare&Swap
contention
for
k-word
fail
“helped,”
k-word
then
ap-
LaMarca
general
tive
not
processes.
memory
a suitable
the
by
a set of strong
is
operations.
provide
of this
Ac-
of mem-
like
Felten
efficiency
’s
tentatively
atomic
and
the
at
version
does
of locking
tem
new
thus
by other
Take
of
structure
block
are
Compare&Swap
Herlihy
ones.
allocated
structures
proach
as
transformation
a data
’s method
data
loca-
guarantees
the
has
mostly
generate
other
of Load_Linked/Store_Conditional
Unfortunately,
a
Com-
desired
concurrent
a new
on
to
k-word
sequel
updating
into
changes
pointer
the
non-blocking
methodology,
changes
the
a general
method
transactional
which
operative
succeed.
in
offer
which
STM’S
and
of
transactional
of
implementation
to
to
copying
making
the
(referred
first
approach
as an atomic
always
general
frequently
processes
operations
implementations
caching
2) on
[19]
streamlined
Compare&Swap
However,
translation
method
which
Unlike
concurrent
use
Figure
STM
will
[15]
was the
the
collection
them
(see
transaction
method),
object
on
any
performing
transaction
The
highly
is straightforward:
implement
object,
some
based
approach
to
a general
and
implemen-
and
.
major
is done
Barnes
structure.
Translation
sequential
ones
pare&Swap
that
provide
translating
non-blocking
shared
Lock-free
this
Rappoport
a clean
k-word
two
based
and
needs
the lock-
by
call
“helping”
Sequential
STM
and
by Israeli
the
have
our
it helps
on specific
method,
both
approach
Though
are vague
achieving
a process
operation,
chain.
paper
to
caching
process
cooperative
suggest,
the
by another
own
executing
key
whenever
Prakash
implementation
overcome
redundant-helping.
1.2
in
of a non-blocking
results
update
method
its
a recent
the
the
The
Load-Linked/Store_Conditional
pirical
sev-
a location
which
details,
using
in
locks.
dependency
and
implementation
need
among
complete
the
the
behavior
locked
along
Shasha,
tation
transacone
to
involved
releasing
already
process
capture
the
and
recursively
the
Turek,
Moreover,
help
a location
to
out,
a location
policy
of [6, 26]
methodology,
coordination
to
helping
resilient
we must
feature
transaction.
overhead
a “reactive”
of
The
owner
the
non-blocking
even
to
locations
the
swapped
trying
the
eliminate
In order
a “helping”
owner
in
order.
key,
operation
by acquiring
delayed,
are
of their
the
environment,
been
Oth-
ownerships
first
completes
by
the
the
is done
a faulty
which
help
avoid
must
increasing
achieved
is that
its
employing
it
transaction.
approach
one
transaction
transactions
own
effectively
in
every
acquired.
frees
transactions
in some
executes
crashed.
forcing
static
liveness
that
already
Op and
liveness,
needed
ensuring
make
or
the ownerships
it succeeds in executing
Finally,
value
The
The
P.
will
if
to
processes
processes
Q hasn’t
of b in its
2-word
fail
own
Comacquire
waiting
for
waiting
for b will
changed
b, P will
cache.
behavior.
To overcome
in
[6],
the
A
whole
limitations
hk
object
similar
and
approach
Shasha,
and
“simulates”
vate
memory,
the
in the
stores
restarts
the
the
cache
new
from
k-word
a location
but
writing
process
uses
atomic
operation
memory
update.
values
the
in the
beginning.
Read-Modify-write
disjoint
to
the
for
Barnes,
is done
an
this
memory.
Barnes
To make
ceptable,
Turek,
its
which
the
checks
is the
case,
Otherwise,
suggested
by locking
has
the
the
Results
techniques
Parallel
We
val-
when
(see
Section
found
blocking
process
method
periments
order
the
cited
translation
to reduce
the
system
5) the
stable
above.
We
on
use
a simulated
accessing
translation
and
the
show
that
acone
(non-faulty).
We
comparison
of
well
Alewife
machine,
the
method
cooperative
the
methods
overhead
the
of
translation
accepted
Proteus
[8, 9].
shared-memory
in
stable
experimental
conditions
Simulator
that
performance
is
first
under
Hardware
concurrency
to implement
needs
pay
distributed
value
operation
in ascending
sequential-to-non-blocking
performance
k-word
the
the
to
private
if the
to the
Empirical
one
present
pri-
is done
non-blocking
are equivalent
If
updating.
time
into
Our
by
in
first
1.3
copying
a process
updating
the
Barnes,
avoids
proposed
According
of
’s method,
that
independently
execution
the
Write
the
concurrent
memory
Then,
Read-Modifyin
allows
i.e reading
ues contained
read
method,
[26].
the
shared
memory.
of Herlihy
caching
was
Prakash
first
from
the
introduced
object
[1] cache-coherent
as
the
grows,
outperforms
method.
in general
STM
potential
the
both
Unfortunately,
and
other
for
STM
non-
Herlihy’s
our
non-blocking
ex-
Dequeue()
  BeginTransaction
    DeletedItem = Read_transactional(Head)
    if DeletedItem = Null
      ReturnedValue = Empty
    else
      Write_transactional(Head, DeletedItem->Next)
      if DeletedItem->Next = Null
        Write_transactional(Tail, Null)
      ReturnedValue = DeletedItem->Value
  EndTransaction
end Dequeue
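As a rough illustration of the Dequeue procedure above, the following Python sketch performs the same transactional reads and writes against a toy transactional memory. The `ToyTM` class, its global lock, the `enqueue` helper and the cell names `"Head"` and `"Tail"` are assumptions made for this example; the lock only emulates atomicity and is not the non-blocking STM this paper describes.

```python
import threading


class ToyTM:
    """Toy transactional memory: a single global lock emulates atomicity.

    Assumption for illustration only; the paper's STM is non-blocking.
    """

    def __init__(self):
        self._cells = {}
        self._lock = threading.Lock()

    def atomically(self, body):
        # Run body(tm) as one indivisible step.
        with self._lock:
            return body(self)

    def read(self, addr):
        # Plays the role of Read_transactional; unset cells read as None (Null).
        return self._cells.get(addr)

    def write(self, addr, value):
        # Plays the role of Write_transactional.
        self._cells[addr] = value


class Item:
    def __init__(self, value):
        self.value = value
        self.next = None
        self.prev = None


EMPTY = object()  # stands in for the Empty return value


def enqueue(tm, value):
    """Hypothetical helper that appends an item so dequeue has data to remove."""
    def body(t):
        item = Item(value)
        tail = t.read("Tail")
        if tail is None:
            t.write("Head", item)
        else:
            # Link fields are updated inside the same atomic step.
            tail.next = item
            item.prev = tail
        t.write("Tail", item)
    tm.atomically(body)


def dequeue(tm):
    """Mirror of the Dequeue pseudocode: remove and return the head item's value."""
    def body(t):
        deleted = t.read("Head")
        if deleted is None:
            return EMPTY
        t.write("Head", deleted.next)
        if deleted.next is None:
            t.write("Tail", None)
        return deleted.value
    return tm.atomically(body)
```

Note that the set of cells written depends on the value read from `Head`, which is why this transaction is not static.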
k_word_C&S(Size, DataSet[], Old[],
for i=1 to Size do
if Read_transactional
techniques
are
methods
In
ible
A Non
inferior
such
architecture
I:
to
summary,
similar
STM
shared
and
improved
section
objects,
for
i=l
to
Size
mentation
and
Finally,
in
and
ReturnedValue
for
design
which
ensures
the
lock-based
a shared
package
software
transactional
standard
properties
in non-faulty
in
ones.
of the
our
concur-
faulty
The
our
A
begin
variant
is
by
of the
a finite
transactional
sequence
the
transactional
memory
of
local
and
memory,
of [16].
shared
A
a
- reads
a local
the
value
memory
of
machine
a shared
location
The
data
set of
a transaction
accessed
by the
structions.
Any
cessfully,
other
– stores
location.
in
of the
which
case
For
of a doubly
returns
A
dequeued
k-word
proposed
The
k-word
two
cessful
values
that
case,
memory
turns
A
item
and
New
the
1 may
transaction
as in
the
stores
a C&S-Success
are
data
set’s
transaction
Figure
set,
the
checks
New
value,
its
size.
equivalent
size
A suc-
a finite
implementation
the
old.
into
otherwise
In
try
the
the
set,
set
will
and
focus
on
supports
of the
in
the
in
Figure
known
literature.
Dequeue
that
could
2 is an
procedure
in
(but
not
in
one
whole
cannot
and
with
system.
An
be swapped
implemented
fail
forever,
if
out
transac-
order
However,
if pro(as when
used
only
can be made
since
the
the
is non-blocking
in different
list).
if
implies
successfully
if it
implementation
during
terminates
the
repeatedly
non-blocking,
swapped
same
terminate
hardware
linked
which
successfully
non-blocking
the
tolerant,
locations
their
is
will
theory
process
by a process
a process
The
two
if any
It
necessarily
times.
a doubly
Our
(1)
data
that
terminates
of attempts
transactions,
3
be
data
most
transaction
is swap
to write
updating
never
while
transaction)
assumption
of [16]
will
the
we
transaction
transaction
(not
number
many
tolerant
includes
of attempts.
process
after
static
paper
in
thus
a deterministic
the
operations
of some
dtierent
is repeatedly
whether
to
values
2 is
number
STM
tions
and
is wait-free
the
execution
some
cesses
(3)
memory
that
can
be stored
This
implementation
a possibly
under
transaction
and
as parameters
inputs
transaction,
executes
infinitely
successfully
value.
data
the
be performed
or an Empty
of the
to
~from
terminates
memory
transaction
returns
a value
as parameters
and
atomically
gets
a transactional
a class
of
a single
execution
for
swap-
process
which
of a transaction
successfully).
the
it
re-
Implementation
of
Static-STM
C&S-Failure.
software
which
changes
action
the
in
visible
as in Figure
insuc-
form
1 is not.
a finite
that
locations
or complete
dequeuing
Compare&Swap
stored
are
transaction
gets
Old
k-word
the
ject
which
vectors
of shared
fail,
trans-
order.
advance,
should
Compare&Swap
STM
after
the
which
synchronization
of a static
in Figure
register
Write-transactional
either
changes
Compare&Swap
a transaction
and
its
list
the
of a local
set
and
may
example,
linked
If
the
is the
among
transaction
transaction.
of
transactions,
and
repeated
transaction
as a transaction.
it
contents
Read-transactional
processes.
head
the
on
values
in
which
based
new
output
An
a shared
following
sequen-
real-time
a special
of the
which
repeatedly
into
the
execute
order
their
is known
inputs
implementations
register.
Write.transactional
with
is
set
the
the
example
into
satisfy
to
sequential
of as a procedure
static.
transaction
instructions:
Read-transactional
appear
The
data
set (2)
returns
software
should
interleaving.
transaction
the
thought
performance
Memory
presenting
without
is consistent
static
which
proof.
function
We
transactions
i.e.,
actions
imple-
correctness
empirical
memory
[13]:
Serializability:
runs
data
Transactional
Transaction
following
3 we describe
a sketch
tially,
evaluation.
2
2: A Static
bus
of flex-
of highly
resiliency
In Section
present
C&S-Success
EndTransaction
Atomicity:
software
5 we
(DataSet~],New~])
=
k_word_C&S
end
for
a novel
provide
Section
Results
offers
STM,
Old~]
do
Write_transactional
non-resilient
[23].
performance
introduces
#
C&S-Failure
ExitTransaction
Next )
in flavor.
coordination-operation
=
Transaction
standard
as queue-locks
were
rent
Static
(DataSet~]])
ReturnedValue
Figure
Figure
New[])
BeginTransaction
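The k-word Compare&Swap of Figure 2 can be sketched in Python as follows. This is a minimal illustration, assuming a plain list as shared memory and a module-level lock that stands in for transactional atomicity; the names `k_word_cas` and `_atomic` are hypothetical and the lock is not part of the paper's non-blocking implementation.

```python
import threading

# Assumption: one lock emulates the atomicity that the transaction provides.
_atomic = threading.Lock()


def k_word_cas(memory, data_set, old, new):
    """Atomically replace memory[data_set[i]] by new[i] if every cell equals old[i].

    Returns True (C&S-Success) if all comparisons match, else False (C&S-Failure).
    The data set is known before the transaction starts, so it is static.
    """
    with _atomic:
        # First pass: compare every location with its expected old value.
        for addr, expected in zip(data_set, old):
            if memory[addr] != expected:
                return False
        # Second pass: all comparisons succeeded, so store the new values.
        for addr, value in zip(data_set, new):
            memory[addr] = value
        return True
```

A failed call leaves memory untouched, which matches the all-or-nothing behavior expected of a transaction.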
transactional
behaves
to its
addresses
is a thread
of primitive
operations
memory
like
a memory
by means
of control
to
that
memory.
(STM),
that
is a shared
supports
of transactions.
applies
Any
a finite
ob-
multiple
A
implement
Memory[M],
trans-
of
a non-blocking
a vector
transactional
sequence
implementation
We
memory,
termines
for
it.
process
Each
which
any
cell
static
contains
TM
the
Ownerships[M],
in Memorg[lll],
keeps
in
the
shared
of
size
data
a vector
which
memory
M
stored
using
in the
which
transaction
a record
deowns
with
StartTransaction(
input,DataSet
Initialize(Tranj
,input
Tranj
=
+ Stable
)
,DataSet
+
-.
Stable
=
return
tran,version)
tran+stat.s)
version,’lkue)
False
Tranj
+. Version++
if TTanj
+ Status
=
rrm,versiOn,IsInitiatOr)
AcquireOwnerships(
f;~ayi~~fl/l;h~j(
TransactiOn(Tranj,Tranj
Tranj
‘hansaction(t
)
True
if
(version
#
tran-+version)
SC(tran+status,
Success
(Success,
then
CalcOutput(Tranj
-+
OldValues,input))
if status
else
=
=
I?dure
Success
NewValues
return
O))
LL(tran+status)
then
AgreeOldValues(
return
then
(Success,
(status,failadd)
tran,version)
=
CalcNewValues(tran+
UpdateMemory(tran,
version)
ReleaseOwnerships(
tran,version)
ReleaseOwnerships(
tran,version)
OldValues,tran+
NewValues)
else
Figure
if IsInitiator
3: StartTransaction
then
failtran=
Ownerships[failadd]
if failtran
=
Nobody
then
return
the
following
set.
Addo
in
fields:
increasing
order.
Oldvatues~
Null
the
successful
tion
are
of the
its
– the
vector
involved
in
record
termines
contains
this
vector
to
every
process
and
time
which
the
PJ,
the
former
may
a
Figure
transac-
The
eventually
help
O, which
transaction.
process
terminates
determines
AcquireOwnerships(tran,
transize
de-
This
for
address
P]
the
the
stable,
that
and
the
if
the
by
3. Transaction
the
processors
checks
record’s
will
After
the
called
by the
The
parameter
output
of the
Mmtiator,
from
read
never
version
by the
or by
the
parameter
during
ownership
the
If the
sets
the
new
it
old
the
a helping
that
the
owns
record
1 The
Validate
the
status
0).
values
into
the
of
operation
this
field
In
ReleaseO
case
for
record,
the
status
the
failure.
The
it
already
owns
it helps
Helping
the
i =
1 to
if LL(
size
(Null
, O) ) then
while
ion],
tran)
then
loop
(Failure,i)
) then
version)
do
tran+Add~]
O wnerships[locat
if tran+.
version
ion])
AgreeOldValues(
size
=
i =
=
version
#
SC(Ownerships[location],
tran
then
then
return
Nobody)
tran,version)
tran+si5e
1 to
field
location=
tries
if LL(tran+
size
do
tran+.Add~]
OldValues[locat
if tran+version
#
ion])
version
#
then
Null
then
return
SC(tran+OldValues[location],Memory[location])
UpdateMemory(tran,
si5e
the
for
process
field
and,
the
in
transaction
size
do
LL(Memory[location])
AllWritten
#
if oldvalue#
then
if
case
(not
then
newvalues~]
LL(tran+
if version
return
tran+version
SC(Memory[location],
re-
newvalues)
tran+Add~]
if version
newvalues~])
AllWritten))
#
return
then
tran+version
then
then
return
SC(tran+AllWritten,True)
which
only
1 to
if tran+
contains
first
i =
version,
tran+size
oldvalue=
and
process
=
location=
calculates
memory
is performed
loop
tran+si5e
location=
(Fail-
yet,
the
to the
while
returning
a value
of success
exit
SC( O wnerships[locat
wnerships(tran,
=
calling
be set to
have
Otherwise,
process,
upon
will
doesn’t
them
that
location.
a stable
so then
writes
caused
a helping
do
return
Add~]])
return
rou-
first
by
then
return
then
if SC(tran+status,
of the
version
locations
transaction’s
to be stored,
failing
use
the
(Success,
ownerships
is in
to
the
then
SC(tran+status,
if
and
the
Transaction,
set’s
fails
field
that
it is not
data
Nobody
process.
when
Null
version
else
and
number
used
since
call.
the
status
ownerships.
location
leases
to
values
releases
the
it
AquireOwnership,
zme, fadadd).
the
on
If
process
tran
Transaction
instance
is not
=
suc-
executed,
#
=
the
input
whether
process
initiating
AquireOwnership.
writes
#
(Ownerships[tran+
exit
as parameters
transaction
contains
This
change
acquire
process
LL
if owner
if
for
from
tran+.add~]
if owner
a con-
has
the
4), gets
indicating
initiating
executedl
is called
to
=
as
executing
transaction
(Figure
address
was
will
=
first
record
size
Transaction
value
tine
location
owner
vector.
a boolean
record
do
do
if LL(tran+status)
of
a transaction
declares
transaction.
process
procedure
tran,
of
then
helping
the
if so calculates
Oid Vaiues
The
record
any
of
the
execution
ion rout ine of Figure
description
ceeded,
the
process’s
ensuring
transaction
the
initiates
Transact
initializes
sistent
size
true
if tran-+version
process
calling
version)
tran+size
1 to
while
record.
A
=
i =
field
a transac-
the
4: Transaction
other
owner
of the
Tran3
of
the
initially
failtran+version
values
of the
input.
=
if failtran+stable
to
case
between
an integer,
failversion
Transaction(failtran,failversion,False)
initialized
In
the
data
addresses
transaction.
are
output
synchronize
number
tion
every
The
processes
instance
is incremented
For
vector
set
the
cells
this
locations.
the
of
transaction.
Version–
the
input
which
size of the
data
every
order
and
the
the
of
from
used
transactions:
its
Input
transaction
is calculated
contains
contains
beginning
in the
fields
which
which
a consensus
at
stored
Size
– a vector
else
if the
state.
unbounded
is available
field
[18,
can
be
avoided
if
an
additional
Figure
19].
5: Ownerships
and
Memory
access
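The ownership-based flow of Figures 3 to 5 can be approximated by the simplified Python sketch below. It keeps only the idea that a transaction claims the locations of its data set in increasing order, releases them on failure and reports the failing address, and otherwise reads the old values, writes the new ones and releases its ownerships. Version numbers, helping of the owning transaction, and Load_Linked/Store_Conditional are deliberately left out; the `Ownerships` class, its internal lock, and the function names are assumptions for illustration.

```python
import threading


class Ownerships:
    """Ownership table; a lock emulates each atomic ownership update (assumption)."""

    def __init__(self, size):
        self._owner = [None] * size  # None plays the role of Nobody
        self._guard = threading.Lock()

    def try_acquire(self, addr, tran):
        with self._guard:
            if self._owner[addr] is None or self._owner[addr] is tran:
                self._owner[addr] = tran
                return True
            return False

    def release(self, addr, tran):
        with self._guard:
            if self._owner[addr] is tran:
                self._owner[addr] = None

    def owner(self, addr):
        return self._owner[addr]


def acquire_ownerships(table, tran, addresses):
    """Claim addresses in increasing order.

    Returns (True, None) on success, or (False, failing_address) after
    releasing everything claimed so far.
    """
    acquired = []
    for addr in sorted(addresses):
        if not table.try_acquire(addr, tran):
            for a in acquired:
                table.release(a, tran)
            return False, addr
        acquired.append(addr)
    return True, None


def run_static_transaction(memory, table, tran, addresses, calc_new_values):
    """Run one static transaction; returns ("Success", old_values) or ("Failure", addr)."""
    ok, failed_at = acquire_ownerships(table, tran, addresses)
    if not ok:
        return ("Failure", failed_at)
    ordered = sorted(addresses)
    old_values = [memory[a] for a in ordered]
    new_values = calc_new_values(old_values)
    for a, v in zip(ordered, new_values):
        memory[a] = v
    for a in ordered:
        table.release(a, tran)
    return ("Success", old_values)
```

Acquiring in a fixed increasing order is what rules out a cycle of transactions each waiting on a location held by the next.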
input)
Since
by
AcquireOwnerships
the
that
initiator
(1)
same
all
processes
locations
from
the
moment
tion.
The
but
which
reads
ership
on
a free
it,
undecided.
(Nu1l,O)
read
in the
the
action,
All
have
the
before
past
to
not
Any
only
prevents
acquiring
by
the
To
of T is the
ownership
Claim
4.2
the
process
Proof:
I )
which
ecuting
trans-
location
UpdateMemory
in
order
to
after
the
so every
process
updating
the
wrote
location
A failing
Formallyj
following
actional
[21],
memory
static
for
transactions
k types
TranJ
as
(Sketch
than
Return,
..n.
In
set to Failure.
owner-
failing
location
the
acquired
and
cess should
sets
memory
failing
of a static
that
supports
described
as an
(DataSet)
and
(FinalStatus,
k types
trans-
where
implementation,
transaction
record
of the
version
started
the
any
tran,
field).
the
initiator
Transaction
of
to
be
helping
of T
T.
All
with
the
an instance
Therefore,
execution
tine
and
transaction
(which
the
owns
k and
T is related
to
processes
parameters
helping
are the
The
implementation
record
which
of T.
executing
which
tran)
execute
The
as
rouare
initiator
processes
a
content
(tran,version,False),
processes
processes
(the
process
the
4.1
is
atomic
and
of T.
owned
P has
of thk
and
lemma
instruction
should
All
the
4.3
same
data
Any
set vector
executing
set will
processes
serializ-
not
process
be
able
T
of
to
was
stored
which
update
any
T read
by T“s
read
.
of
the
the
the
saw
the
if P has
failing
pro-
belongs
itself.
before
failing
transac-
the
to T.
But,
in
failing
process
pro-
saw
the
executed
the
Store-Conditions
therefore
the
Store-Conditions
loI
I
■
has failed.
the
initiator.
different
data
shared
data
) Assume
nates
successfully.
failures
is finite.
in the
computation,
for
the
same
on
tries
such
By
dresses
higher
have
Claim
4.2
are
the
an
those
there
of
failing
the
of
has failed
there
are infinitely
on
A
many
but
failed
to the
con-
initiator
which
have
one
and
ownership
A – a contradiction
than
lo-
every
is at least
often,
transactions
the
is completed,
number
Since
that
if
Since
transaction
transaction
acquired
pro-
location
only
often.
infinitely
the
on,
Ac-
same
happen
location.
point
in the
are several
of the
that
fail
of transaction
“stuck”
infinite
that
termi-
some
there
may
it follows
which
number
case
implies
to help
retrying,
transaction
infinitely
be
processes
highest
failed.
this
contradiction
no
ownership
when
must
in turn
the
transaction
the
only
transaction
before
In
of
if from
processes
released
there
which
the
only
Thk
and
This
A,
the
transaction.
that
way
which
happens
acquire
is released
sider
that
to
is squired
follows
Assume
all
try
by
in
routine.
which
is non-blocking.
schedule
This
quireOwnerships
cesses
implementation
(Sketch
is an infinite
it
is based on the
of a transaction
which
since
P has
the
location
free
in
was
Now,
then
ex-
status
process
location
failing
an
a higher
invariant
the
that
location
the
first
failing
failing
ownership,
The
there
the
executing
the
be
on
transaction.
location
and
before
on the
him
1.
the
the
acquired
instruction
location
The proof
that
P
before
confirmed
on a higher
P saw the
occupied
transactions.
of proofi
invariants:
to
location.
Let
By
by another
seen
since
case,
cation
able.
Sketch
following
failed
owns its fail-
ownership
ownership
before
that
contrary.
location.
that
P has
Therefore
location
Lemma
never
its failing
an
Therefore
have
Lemma
J C 1...
the
process
exe-
failing
with
actions:
number
we define
The
k different
automaton
of output
Output)
status.
T will
the
is undefined
ownership
Proof:
our
defined
be
define
T, is the
failing
that
acquired
P acquired
status
actions:
Request,
Zel.
n processes
the
location
which
the
4.1,
cation
specification
that
) Assume
process
tion’s
Outline
the
can
of input
TTanJ
the
Proof
to T’s
transaction,
or a higheT
prevent
ownerships.
Correctness
Failure
we first
on it.
cess saw it occupied
4
property
transaction
is still
a different
memory
do
after
location
acquire
Lemma
to
which
non-blocking
of a failing
own-
status
any
the
process
process
ing location
process
Store.-Conditiona
synchronize
updating
the
(with
(2)
to prove
failing
transac-
proving
before
as owned
to be True,
releasing
cuting
the
the
becomes
that
transaction
values
released.
field
on
Failure.
new
from
the
This
processes
been
Written
the
status
5, the
process
ships
that
for
for
have
by writing
that
between
transaction
allowed
property.
will
field.
in
writing
a slow
location
status
version
In order
ensure
instructions)
of the
are
non-blocking
is done
location
Figure
status
either
ownership
the
is essential
confirm
to set the
When
in
to
This
acquire
by checking
the
the
be called
we must
Store.Conditional
property
also
5 may
processes
to
ownerships
second
atomicity
the
that
no additional
try
is done
and
fixed,
helping
will
(this
Load.Linked
the
of Figure
or by the
fact
have
on
ad-
A is
that
■
highest.
structures.
To
2.
All
the
executing
processes
acquire
ownership
All
the
ownerships
the
version
field
after
of
of a transaction
the
the
owned
by
T’s
record
status
T
will
T will
be released
is incremented
gorithm
never
of T has been
set.
T’s
will
All
the
T will
executing
update
the
processes
memory
of a successful
before
T’s
transaction
AllWritten
field
is
set to True.
only
helping
increase
of the
or
decreases
“redundant
the
helping
and
the
the
avoid
al-
as much
when
a failing
process.
Such
help-
consequently,
will
cause
In
interval
it
any
implementa-
ownerships
helped.
helps”
occur,
STM
occurs
non-faulty
release
if not
must
In
contention
to
released
no failures
paradigm
helping.”
another
process
have
when
redundant
“helps”
cess increases
function
on the
above,
helped
would
overheads
“redundant
given
the
3.
based
transaction
ing
initiator.
major
as possible
tion
before
by
avoid
our
later
then
algorithm,
between
discovered.
it
a prohelps
as a
5
An
Empirical
tion
5.1
Evaluation
of
Transla-
no
Doubly
Methodology
We
compared
methods
ing
Colbrook
and
without
and
[8].
Our
2048
contention
of switching
software
at
Dellarocas,
architecture
was
MIT
[1].
of 6 bytes
and
4 cycles
or wiring
in
in the
us-
Brewer,
distributed-memory
lines
cost
other
architectures
by
network
development
with
and
network
developed
cache-coherent
under
a cache
cost
Weihl
of STM
bus
simulator
Alewife
currently
performance
64 processor
Proteus
of the
had
the
on
the
both
the
array
cent ain
item
in
version
of
used
a
slightly
respectively.
an item
processor
Compare&Swap
stamp.
2
version
operation
where
On
pare&Swap
may
existing
ples
of enqueue/dequeue
1
tial
be
lock-free
We
ous
used
methods
This
when
the
the
serve
as
64
bits
the
by
using
the
Alpha
a
shared
64
size
of
the
for
evaluating
structures.
data
structure
We
bits
variThe
and
the
in
Each of n processes increments a shared counter 10000/n times. In this benchmark updates change the whole object state, and have no built in parallelism.
A resource allocation scenario [10]: a few processes share a set of resources and from time to time a process tries to atomically acquire a subset of size s of those resources. This is the typical behavior of a well designed distributed data structure.
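The two synthetic workloads described here, counting and resource allocation, can be sketched in Python as below. This is only an illustration of the access pattern, assuming Python threads as processes and a lock standing in for each atomic multi-word update; the function names, the `seed` parameter and the helper `_run_all` are hypothetical, and the sketch says nothing about the performance measured on Proteus.

```python
import random
import threading


def _run_all(targets):
    """Start one thread per target function and wait for all of them."""
    threads = [threading.Thread(target=t) for t in targets]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def counting(n_procs, total_ops=10000):
    """Each of n_procs threads increments a shared counter total_ops // n_procs times."""
    counter = [0]
    lock = threading.Lock()  # assumption: emulates one atomic update

    def worker():
        for _ in range(total_ops // n_procs):
            with lock:
                counter[0] += 1

    _run_all([worker] * n_procs)
    return counter[0]


def resource_allocation(n_procs, s, length=60, total_ops=5000, seed=0):
    """Each thread atomically increments s randomly chosen cells of a shared vector."""
    vector = [0] * length
    lock = threading.Lock()  # assumption: emulates one atomic s-word update

    def make_worker(rank):
        rng = random.Random(seed + rank)  # per-thread generator

        def worker():
            for _ in range(total_ops // n_procs):
                chosen = rng.sample(range(length), s)  # s distinct locations
                with lock:
                    for loc in chosen:
                        vector[loc] += 1

        return worker

    _run_all([make_worker(r) for r in range(n_procs)])
    return vector
```

In the counting workload every update touches the same cell, so there is nothing to parallelize, whereas in resource allocation two updates conflict only when their chosen subsets overlap.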
of space we show only
cesses atomically
locations
increment
chosen
length
60.
highly
the benchmark
have
5000/n
uniformly
The
at random
benchmark
concurrent
times
queue
captures
and counter
the
transaction
n.
We used
t ation
[11].
a variant
In
consequently
dequeues
heap
this
of
used
the
lier
and
with
the
is probably
greatest
the
this
built
directly
cost
a memory
we
the
believe
the
theoretical
tations
of
[17]
do not
and
the
3 The
spurious
value
empty
and
most
trying
operation
Load-Linked/Store-Conditional
have
a random
s =
2
a vector
of
the
behavior
of
Proteus),
while
access
to
non-blocking
failures
the
it
efficient
15,
shared
the
18]
non-blocking
raeli
and
four
1.
(which
wont.
Alpha
between
[12]
2.
there
the
will
be achieved
only
if the
size
of
ia rela-
size.
we
to
queue-lock
in
the
STM
processes
data
do
set before
value
which
compare
says
STM
methods
[23]
include
solution
(the
method.
backoff
manner).
Method
Compare&Swap
cooperative
and
based
All
the
ear-
exclusive
Herlihy’s
to
described
based
a mutually
k-word
the
Is-
imple-
non-blocking
[3] to reduce
contention.
leads us to conclude
differentiating
among
parallelism:
do
not
process
at
The
the
that
there
performance
joint
parts
The
price
the
a time
is
allow
oj
are
of the
data
update
it
to the
is a least
the
private
pointer).
when
the
only
the
data
the
to
coopera-
access
Herlihy’s
dis-
object
is such
the
process
number
copy,
a failing
and
the
that
the
reading
nature
almost
updates
methods,
locations
of
(reading
Fortunately,
caching
lock-free
accesses
of the
to
In both
In
size
protocols
performed
are local.
and
processes
memory
the
coherence
cached
to
update:
of
copying
Herlihy’s
and
structure.
number
in is at least
writing
and
parallelism
allowed
concurrent
a jailing
the
cache
locking
potential
of the
and
failure
Both
exploit
software-transactional
methods
copy
However,
in
a boolean
MCS
methods
Potential
cesses
will
general
is that
stored
on
transaction
the
methods:
and
could
of
benchmarks
to be presented
factors
the
its
the
private
price
accessed
of
all ac-
of a
during
execution.
PowerPC
Load_Linked
3.
operations.
property
the
Results
object
world
then
implemenor
object
translation
of the
update
is
value
use exponential
method,
theoretical
the real
existing
interfering
oMl_J or not.
Rappoport’s
tive
This
Compare&Swap
to
since
memory
The
than
Store_Conditional
as on
times.
since
a failing
is closer
and
size is n.
benchmark
[6,
a heap
5000/n
without
simplification
is accessed
method
n processes
in
maximal
in
a failing
Load-Linked/Store-Conditional
allow
its
since
64-bit
Compare&Swap
Load_Linked/Store_Conditional
Store-Conditional
from
less
proposed
into
access
is
value
queue
software
a blocking
The data
from
of the
parallelism
Compare&Swap
only
to
above
structure
The
three
has n pro-
The
started,
nonblocking
5.2
k-word
on the
We
heap implemen-
each
couof ini-
enqueues/dequeues
as specialization
above.
two
implementations
of a sequential
up-
of the
a queue
to the
ia equal
queue on a heap of size
benchmark
enqueues
is initially
2 Naturally
priority
index
limited
empty,
by
dequeues
5000/n
compared
2)
structure.
A shared
the
on
supports
of the
a high
item
executes
operations
which
element
and
enqueue/dequeue
value
one
Queue
process
of
Every
array,
index
head to contain
is not
list.
a new
item’s
as
cells
next
in each
if the
as in [24, 25].
Priority
the
and
locations
to agree
methods
behavior
For lack
which
of the
in
enqueues
benchmark
the
given
mentation
Allocation
tail
two
of processes,
Figure
not
data
are short,
allelism.
Resource
of a queue
number
implemented
(given
Ber-
of parallelism,
Counting
data
Com -
or using
data
small
of the
first
previous
Each
queue
For
updated
tively
a time
list.
tail/head
other.
the
64-bit
scheme
benchmarks
implementing
in
bits
a
the
each
we
[7].
synthetic
for
vary
amount
size n.
update
support
supports
as on the
methodology
four
methods
32
implemented
Load-Linked/Store_Conditional
shad’s
not
in the
the
new
the
The
Instead
machines
by updating
size
The
of cells
the
the
architectures.
was
and
process
contain
item
does
that
of
next
architecture
array.
is a couple
Each
tadto
the
implementation
an
head
index
access
instructions.
modified
list
the
in
the
and
n.
An
list
a memory
Alewife
Proteus
Load_Linked/Store_Conditional
the
dating
with
Queue
represent
that
concurrency
linked
machine
Each
Linked
since
current
for
increases
a doubly
cycle/packet.
The
potential
structure
Methods
number
of
is finite.
The
amount
of helping
ists
only
the
erative
in
methods.
by other
processes:
software-transactional
In
the
cooperative
Helping
and
the
implementation,
excoop-
k-word
ations
only
that
by all
the
and
so on...
the
locations.
mance
factor,
terminate
the
The
results
6.
and
the
vertical
there
architecture,
higher
the
number
concurrently,
sors,
the
the
number
of
priority
queue
and
accessed
most
linearly
word
updates
k-word
on
the
given
in Fig-
of processors
achieved.
to
the
give
This
since
size
.
method
the
of the
On the
bus
significantly
the
update
with
In
the
queue
the
STM
Compare&Swap
5.3
local
work
can
performance
7,
the
STM
number
declines,
as the
be performed
of
a certain
of
causing
that
Every
theoretical
implemented
parison
them
increases
smaller
too.
im-
and
than
better
the
size
than
not
the
methods)
(in
STM).
of
the
allow
chose
doubly
linked
it
is
limited:
the
paid
usually
method,
priority
object.
transactions
most
for
low
two
in
it
should
ran
mark
are
results
given
inherent
As
in
processes
method
a
the
failed
number
granularity
implies
may
of
of
that
of
performs
update
the
up
to
the
the
price
Table
~tkc~~~~thod.
in all
remote
Israeli
of
disjoint
and
10.
In
since
of a 2-word
the
this
operations
times
of the
advantage
should
the
the
high
bench-
throughqueue
number
and
that
to Israeli
priority
in Israeli
number
give
highest
sequential
is the
priority
queue
provides
and
regular
priority
algorithm
STM
for
spite
the
the
highlights
1, where
entries
in all benchmarks
for
the
counter
of
are
of failing
Rappoport
of successful
STM
and
the
other
the
k-word
pure
bench-
throughput
ratio
outperforms
the
coop-
outperforms
Herlihy’s
benchmark.
protwo-
4 In
of
tion
fact,
since
using
it
avoids
3-word
freezing
Compare&Swap
[18]
nodes
simplifies
the
a
pro-
Rappoport
different
instead
as for
As can be seen,
method
except
2.5
helping.
execution
slightly
concurrent
of the
reason
We
a concurrent
4.
in Figure
counter
Com-
.
summarize
in
is
all
which
recursive
the
operation
of the
method,
the
k-word
operation.
on
to
ation
Compare&Swap
erative
grows
the
We
marks
advantage
benchmark
structure
Rappoport
put.
based
another
operation
same
(in
policy
for
during
Compare&Swap
the
The
is
use
perfor-
backoff
supported
helps
an
the
implementation
algorithm
it
implement
without
cooperative
a process
give
Our
Store-Conditional
is
since
one should
we compare
a specific
a software
[18],
ways
com-
non-redundant-helping
Rappoport’s
whenever
method:
counter
such
and
in many
to get a fair
methods
the
for
Compare&Swap
k-word
There
the
the
compare
STM
queue
method.
In order
methods
without
also
needs
Israeli
benchmarks,
method
results.
than
Herlihy’s
twice
the
queue
object
at
queue.
penalty
size:
the
though
methods
be improved
Therefore,
non-blocking
with
Therefore,
and
the
non-blocking
can
form.
and
Compare&Swap
accessing
levels,
Test-and-Test-and-Set.
the non-blocking
We
explicitly
queue.
number
Herlihy’s
concurrency
in practice.
purest
of all the
the
We
the
increases,
Still,
all
the
of
method
between
in their
uses a 3-word
a grow-
does
of processors
in
than
comparison
when
proces-
conflicts,
structure
number
A
cess,
Figure
constant
higher
only
k-word
methods.
Compare&Swap
is
in
because
1
20
is still
pare&Swap
level.
though
8
10
remains
work
the
perfor-
methods,
benchmark
performs
9 contains
concurrently
cesses.
are
parallelism
parallelism
accessed
concurrency
benchmark,
poorly
need
fails
number
based
caching
is a data
STM
concurrency
a failure
local
the
locations
the
Figure
more
the
for
the
as the
of locations
Therefore,
in
Com-
as failing
is equivalent
beyond
k-word
k-word
degrades.
concurrency,
number
thus
them
unsuccessful
throughput
increases,
for
of
throughput
the
the
bus,
potential
Benchmark
mance
caching
allocation
and
On
a...
but
helping
that
not
Herlihy’s
than
of processors
proves.
A
and
oper-
is a crucial
benchmark
memory
resource
0
are
ones,
a transaction
shows
is no potential
throughput
In
---”.
e
0
helped.
shows
to the
locking
this
of the
when
axis
of updated
and
operations
and
counting
axis
is crucial
amount
object
ing
the
an
transactions,
it is not
horizontal
benchmark
by
most
, and
for
The
L
concurrently,
are in turn
method,
only
as failing
failing
locations
STM
in STM
location,
J
‘-%n-a-
1000
6: Counting
Compare&Swap
that
Moreover,
Compare&Swap
ure
k-word
same
is helped
same
first
the
operations
In
pare&Swap
, including
by
access
also
Processors
Compare&Swap
not
Figure
helped
method
[Plot residue removed. Legend: STM, Cooperative method, Herlihy's method, QUEUE spin lock. Panels: Alewife, BUS.]
implementa-
it
is
[Figure 7: Resource Allocation (plot residue removed)]
to
[Figure 8: Priority Queue (plot residue removed)]
Acknowledgments
Herlihy
Scale
ings
The
MIT
Alewife
Distributed-Memory
of
Workshop
processors,
tended
Kluwer
version
publication,
Scalable
Academic
of
and
Machine:
Multiprocessor.
on
this
Shared
has
In
ProceedMulti-
1991.
been
as MIT/LCS
E. W.
Blocking
Synchronization
cessors.
In
pression.
Parallel
Primitives
Proceeding
for
of
Algorithms
and
An
submitted
Memo
the
Asynchronous
Jth
ACM
works.
for
[6] G.
T.E.
M.P.
pp.
Anderson.
for
The
shared
performance
memory
List
pages
of
Performance
of
Issues
11th
ACM
in
Non-
MultiproSymposium
Computation,
spin
multiprocessors.
Pages
on
125-134
and
ACM,
N.
Shavit.
Vol.
Counting
41, No.
Net-
5 (September
Method
Structures
on
for
In
Parallel
Implementing
Proceedings
Lock-Free
of the
Algorithms
and
5th
ACM
Architectures
1993.
Comon
199-208,
lock
Herlihy,
of the
A
Data
[7] B.N.
Bershad.
current
Carnegie
ternatives
Systems,
1020-1048.
Barnes
Shared
1992.
[3]
Distributed
on Shared-Memory
Proceedings
Journal
1994),
ex-
TM-454,
Symposium
Architectures,
and
1992.
Symposium
R. J. Anderson.
Felten
of Distributed
[5] J. Aspnes,
1991.
[2]
60
A Large-
Memory
Publishers,
paper
appears
50
1990.
[4] J. Alemany,
August
et al.
on Parallel
January
Principles
A. Agarwal
40
for their
comments.
References
[1]
1
30
Transaction
1(1):6-16,
and Maurice
Benchmark
IEEE
Greg Barnes
x
Processors
Figure
helpful
I
.,
1500
Processors
We wish to thank
many
[Figure residue (Benchmark plots) removed. Legend: STM, Cooperative method, Herlihy's method, QUEUE spin lock. Panels: Alewife, BUS. Horizontal axis: Processors.]
alIn
[8] E.A.
E.
consideration
Technical
Mellon
University.
Brewer
Weihl.
Practical
objects.
C.N.
Proteus:
A.
lock-free
CMU-CS-91-
September
Dellarocas,
A
for
Report,
183,
1991.
Colbrook,
High-Performance
con-
and
W.
Parallel-
[Figure residue removed. Panels: BUS, Alewife. Horizontal axis: Processors. Surviving caption words: Linked Queue Benchmark; Figure 10: Simulator, Non-blocking.]
C.N.
Dellarocas.
Proteus.
*------
o
ations
of Israeli
Documen-
& Rappoport
[15] M.
September
User
----
,0 ~
60
implement
MIT/LCS/TR-516.
+..y
100
1989.
Brewer
30
40
Processors
250
‘.
100 -
E.A.
20
300
*
150 -
[9]
[Figure 9: Doubly Linked Queue (plot residue removed; panel: Alewife)]
Architecture
--------
‘a Priority
Herlihy.
A
Queue
methodology
concurrent
data
gramming
Languages
November
1993.
for
implementing
ACM
objects.
and
highly
Transactions
on
15(9):
Systems,
Pro-
745-770,
t ation.
[16] M.
[10]
K.
Chandy
Problem.
guages
[11]
T.H.
and
InA
and
to
CM
The
Drinking
Transaction
on
6(4):632-646,
Systems,
Cormen,
duction
J. Misra.
C .E.
Leiserson
algorithms.
MIT
Programming
October
and
R.L.
In
Lan-
20th
pages
1984.
Rivest.
Herlihy
and
Architectural
Philosopher
[17] IBM.
Intro-
Press.
Annual
289-300,
Power
[18] A. Israeli
DEC.
[13]
M.
Alpha
Herlihy
ness
system
and
condition
action
pages
J.M.
for
reference
manual.
Wing.
Linearizability:
concurrent
on Programming
463-492,
July
objects.
Languages
and
M.
Herlihy.
action
pages
on Programming
124-149,
January
Languages
and
on
Notes
Verlag,
1-17.
pages
Memory:
Data
Structures.
Computer
Architecture,
1993.
PC. Reference
1993. Lecture
In
ACM
Trans-
Systems,
12(3),
[19]
A.
Israeli
and
In ACM
Trans-
Systems,
13(1),
L.
Implementations
the
Synchronization.
Symposium
May
Transactional
Lock-Free
manual.
in
Efficient
Priority
Wait
Free Imple-
Queue.
Computer
Science
In
725,
WDAG
Springer
A correct-
1990.
Wait-Free
Moss.
for
of a Concurrent
13th
[20]
A.
LaMarca.
Synchronization
1991.
Rappoport.
of
ACM
Computing
[14]
B.
and L. Rappoport.
mentation
[12]
J.E
Support
Strong
Symposium
pages
A
Disjoint-Access-Parallel
Shared
on
Memory
Principles
Proc.
of
of Distributed
151-160.
Performance
Protocols.
Evaluation
Proc.
of the 13th
of Lock-Free
ACM
Sym-
Throughput
ratio of
STMf
Counter
Doubly
linked
queue
queue
Table
posiwn
on
Principles
0.34
0.30
6.07
2.44
22.5
24.14
0.42
0.41
Bus
Alewife
Bus
Alewife
Bus
Alewife
BUS
Alewife
Resource Allocation
Priority
10 processors
Herlihy’s
Cooperative
method
method
other
1: Pure
implementation
of Distributed
throughput
Computing,
pages
130-140.
[21]
N. Lynch
and M. Tuttle.
for Distributed
Symposium
Pages
[22]
on
kernel.
versit y. Mars
[23]
J.M.
[24]
L. Rudolph,
chines.
[25]
of
and
Support
Systems,
Allocation
the
3rd
Interna-
for
Program-
April
1991.
A Simple
Load
in Parallel
Ma-
Symposium
on
ACM
pages
Architectures,
and A. Zemach.
the
Annual
Architectures
D. Shasha
Making
1992
Principle
Touitou.
posal.
and
Lock
237-245,
Non-blocking.
Lock-Free
University
Trees. In ProceedParallel
Locking
Concurrent
In
Systems
without
Data
Struc-
Proceedings
pages
Programming:
April
Algorithms
1994.
S. Prakash.
Based
of Database
Tel Aviv
on
June
(SPAA),
J. Turek
Algorithms
Diffracting
Symposium
blocking:
D.
Synchronization
of the 4th
and E. Upfal.
for Task
Algorithms
of
ture
[27]
Uni-
1991.
ings
[26]
OS
Columbia
Scott
Operating
M. Slivkin,
Scheme
N. Shavit
and
and
In Proceedings
Parallel
July
and M.L.
In Proceedings
on Architecture
Languages
Balancing
multiprocessor
CUCS-005-91.
1991.
Conference
ming
A lock-free
Report
Contention.
tional
ACM
Computation,
1987.
Mellor-Crummey
without
Proofs
of 6th
of Distributed
and C. Pu.
Technical
Correctness
In Proceedings
Principles
August
137-151
H. Massalin
Hierarchical
Algorithm.
of
the
212-222.
A Thesis
Pro-
1993.
0.74
0.45
58.9
12.9
85.61
59.8
2.8
1.1
1.98
1.92
1.44
1.75
1.09
1.12
1.26
1.27
ratio:
60 processors
Herlihy’s
Cooperative
method
method
STM
/ other
8.44
7.6
3.36
7.28
1.69
2.35
2.16
2.24
methods