Spin Locks and Contention
Companion slides for
The Art of Multiprocessor Programming
by Maurice Herlihy & Nir Shavit
Modified by Rajeev Alur
for CIS 640, University of Pennsylvania
Muddy children’s puzzle
(Common Knowledge)
• A group of kids are playing. A stranger walks by
and announces “Some of you have mud on your
forehead”
• Each kid can see everyone else’s forehead, but
not his/her own (and they don’t talk to one
another)
• Stranger says “Raise your hand if you conclude
that you have mud on your forehead”. Nobody
does.
• Stranger keeps on repeating the statement.
• If k kids have muddy foreheads, then exactly
these k kids raise their hands after the
stranger repeats the statement exactly k times
Art of Multiprocessor
Programming
2
Muddy children’s puzzle
Why does this happen?
• For every k:
– If >=k kids have muddy foreheads, then in
the first k-1 rounds nobody raises hands
– If k kids have muddy foreheads, then in the
k-th round, exactly the muddy kids raise their
hands
• This claim can be proved by induction on k
– Base case k=1
– Inductive case (assume for k, and prove for
k+1)
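The claim can also be checked by simulation. The sketch below is not from the slides: the class name MuddyChildren and the method firstRaiseRound are hypothetical, and it assumes at least one kid is muddy and that every kid reasons perfectly. A kid who sees m muddy foreheads concludes "I am muddy" in round m+1 if nobody has raised a hand before then.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the original slides.
class MuddyChildren {
    /**
     * muddy[i] is true if kid i has mud on the forehead.
     * Returns the 1-based round in which hands first go up and adds the
     * indices of the kids who raise their hands in that round to 'raised'.
     * Assumes at least one kid is muddy (the stranger's statement is true).
     */
    static int firstRaiseRound(boolean[] muddy, List<Integer> raised) {
        int n = muddy.length;
        for (int round = 1; round <= n; round++) {
            for (int i = 0; i < n; i++) {
                int seen = 0; // muddy foreheads that kid i can see
                for (int j = 0; j < n; j++) {
                    if (j != i && muddy[j]) seen++;
                }
                // Kid i reasons: "If I were clean, the 'seen' muddy kids
                // would have raised their hands in round 'seen'. Nobody
                // did, so I must be muddy."
                if (round == seen + 1) raised.add(i);
            }
            if (!raised.isEmpty()) return round;
        }
        return -1; // not reached when at least one kid is muddy
    }

    public static void main(String[] args) {
        List<Integer> raised = new ArrayList<>();
        int round = firstRaiseRound(new boolean[] {true, false, true, true, false}, raised);
        System.out.println("round " + round + ", kids " + raised);
    }
}
```

With k muddy kids, each muddy kid sees k-1 muddy foreheads and each clean kid sees k, so only the muddy kids reach their trigger in round k.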
What is the role of stranger’s statement?
• Let p stand for “> 0 kids have muddy foreheads”
• Assuming >1 kids are muddy, stranger
announcing p does not add to anyone’s
information
• However, without stranger’s announcement,
nobody will ever raise their hands
• So what’s going on?
• Well, the base case for our proof fails, but
exactly what information do kids acquire from
the stranger’s announcement?
Common Knowledge
• E p : Everybody knows p
• E E p : Everybody knows that everybody knows p
• E^k p : defined similarly (k repetitions of E)
• C p : p is “common knowledge”: the limit of “everybody knows that everybody knows …”
• For k = 2, each kid knows p, but not E p; after the stranger’s announcement, each kid knows E p
• If k kids are muddy, then before the announcement E^(k-1) p holds, but E^k p does not
• The stranger’s announcement makes p common knowledge
Mutual Exclusion
Focus so far: Correctness
• Models
– Accurate
– But idealized
• Protocols
– Elegant
– Important
– But not used in practice
New Focus: Performance
• Models
– More complicated
– Still focus on principles
• Protocols
– Elegant
– Important
– And realistic
Kinds of Architectures
• SISD (Uniprocessor)
– Single instruction stream
– Single data stream
• SIMD (Vector)
– Single instruction
– Multiple data
• MIMD (Multiprocessors)
– Multiple instruction
– Multiple data.
• Our space: MIMD (multiprocessors)
MIMD Architectures
memory
Shared Bus
Distributed
• Memory Contention
• Communication Contention
• Communication Latency
Today: Revisit Mutual Exclusion
• Think of performance, not just
correctness and progress
• Begin to understand how performance
depends on our software properly
utilizing the multiprocessor machine’s
hardware
• And get to know a collection of
locking algorithms…
What Should you do if you can’t
get a lock?
• Keep trying
– “spin” or “busy-wait”
– Good if delays are short
• Give up the processor
– Good if delays are long
– Always good on uniprocessor
• Our focus: keep trying (spinning)
Basic Spin-Lock
[Figure: threads spin on the lock; one at a time enters the critical section (CS); the lock is reset upon exit]
• …lock introduces sequential bottleneck
• …lock suffers from contention
• Notice: these are distinct phenomena
Test-and-Set Primitive
• Boolean value
• Test-and-set (TAS)
– Swap true with current value
– Return value tells if prior value was true
or false
• Can reset just by writing false
• TAS aka “getAndSet”
Test-and-Set
public class AtomicBoolean {
  boolean value;

  public synchronized boolean getAndSet(boolean newValue) {
    boolean prior = value;
    value = newValue;
    return prior;
  }
}
Review: Test-and-Set
• AtomicBoolean is in package java.util.concurrent.atomic
• getAndSet swaps the old and new values
Test-and-Set
AtomicBoolean lock = new AtomicBoolean(false);
…
boolean prior = lock.getAndSet(true);
Swapping in true is called “test-and-set” or TAS
Test-and-Set Locks
• Locking
– Lock is free: value is false
– Lock is taken: value is true
• Acquire lock by calling TAS
– If result is false, you win
– If result is true, you lose
• Release lock by writing false
Test-and-set Lock
class TASlock {
  // Lock state is an AtomicBoolean
  AtomicBoolean state = new AtomicBoolean(false);

  void lock() {
    // Keep trying until lock acquired
    while (state.getAndSet(true)) {}
  }

  void unlock() {
    // Release lock by resetting state to false
    state.set(false);
  }
}
Space Complexity
• TAS spin-lock has small “footprint”
• n-thread spin-lock uses O(1) space
• As opposed to O(n) for Peterson/Bakery
• How did we overcome the Ω(n) lower bound?
• We used a combined read-write operation…
Performance
• Experiment
– n threads
– Increment shared counter 1 million times
• How long should it take?
• How long does it take?
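A minimal sketch of such an experiment, assuming the TAS spin lock from the earlier slide built on java.util.concurrent.atomic.AtomicBoolean. The class name SpinCounterExperiment and the method run are hypothetical, and the timing it reports depends entirely on the machine, so no expected numbers are given.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical harness for illustration; not part of the original slides.
class SpinCounterExperiment {
    // Test-and-set spin lock: false = free, true = taken.
    static final AtomicBoolean state = new AtomicBoolean(false);
    static long counter = 0; // only touched while holding the lock

    static void lock() {
        while (state.getAndSet(true)) {
            Thread.onSpinWait(); // hint to the CPU that we are busy-waiting
        }
    }

    static void unlock() {
        state.set(false);
    }

    /** Splits 'total' increments across nThreads threads; returns elapsed nanoseconds. */
    static long run(int nThreads, int total) {
        counter = 0;
        int perThread = total / nThreads;
        Thread[] threads = new Thread[nThreads];
        long start = System.nanoTime();
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    lock();
                    try {
                        counter++;
                    } finally {
                        unlock();
                    }
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 8; n *= 2) {
            System.out.println(n + " threads: " + run(n, 1_000_000) / 1_000_000 + " ms");
        }
    }
}
```

The total work is fixed, so in the ideal case the elapsed time would stay flat as threads are added.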
Mystery #1
[Figure: time versus number of threads, with curves for the TAS lock and the ideal]
What is going on?
Test-and-Test-and-Set Locks
• Lurking stage
– Wait until lock “looks” free
– Spin while read returns true (lock taken)
• Pouncing stage
– As soon as lock “looks” available
– Read returns false (lock free)
– Call TAS to acquire lock
– If TAS loses, back to lurking
Test-and-test-and-set Lock
class TTASlock {
  AtomicBoolean state = new AtomicBoolean(false);

  void lock() {
    while (true) {
      // Wait until lock looks free
      while (state.get()) {}
      // Then try to acquire it
      if (!state.getAndSet(true))
        return;
    }
  }
}
Mystery #2
[Figure: time versus number of threads, with curves for the TAS lock, the TTAS lock, and the ideal]
Mystery
• Both
– TAS and TTAS
– Do the same thing (in our model)
• Except that
– TTAS performs much better than TAS
– Neither approaches ideal
Opinion
• Our memory abstraction is broken
• TAS & TTAS methods
– Are provably the same (in our model)
– Except they aren’t (in field tests)
• Need a more detailed model …
Bus-Based Architectures
[Figure: processors, each with its own cache, connected by a shared bus to memory]
• Random access memory
– 10s of cycles
• Shared bus
– Broadcast medium
– One broadcaster at a time
– Processors and memory all “snoop”
• Per-processor caches
– Small
– Fast: 1 or 2 cycles
– Address & state information
Jargon Watch
• Cache hit
– “I found what I wanted in my cache”
– Good Thing
• Cache miss
– “I had to go all the way to memory for
that data”
– Bad Thing
Caveat
• This model is still a simplification
– But not in any essential way
– Illustrates basic principles
• Will discuss complexities later
Processor Issues Load Request
[Figure: a processor broadcasts a load request on the bus (“Gimme data”)]
Memory Responds
[Figure: memory answers over the bus (“Got your data right here”)]
Processor Issues Load Request
[Figure: a second processor broadcasts a load request for the same data (“Gimme data”)]
Other Processor Responds
[Figure: the cache that already holds the data answers over the bus (“I got data”)]
Modify Cached Data
[Figure: one processor modifies the copy of the data in its own cache]
What’s up with the other copies?
Cache Coherence
• We have lots of copies of data
– Original copy in memory
– Cached copies at processors
• Some processor modifies its own copy
– What do we do with the others?
– How to avoid confusion?
Write-Back Caches
• Accumulate changes in cache
• Write back when needed
– Need the cache for something else
– Another processor wants it
• On first modification
– Invalidate other entries
– Requires non-trivial protocol …
Write-Back Caches
• Cache entry has three states
– Invalid: contains raw seething bits
– Valid: I can read but I can’t write
– Dirty: Data has been modified
• Intercept other load requests
• Write back to memory before using cache
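The three states can be traced with a toy model. This is a simplified sketch under stated assumptions, not a real coherence protocol: the class CacheLineModel and its methods are hypothetical, it tracks a single line across all caches, and it treats every hand-off of a dirty line as one write-back.

```java
import java.util.Arrays;

// Hypothetical toy model of one cache line; not part of the original slides.
class CacheLineModel {
    enum State { INVALID, VALID, DIRTY }

    final State[] lines;  // state of the same line in each processor's cache
    int writeBacks = 0;   // how many times a dirty copy was written back

    CacheLineModel(int processors) {
        lines = new State[processors];
        Arrays.fill(lines, State.INVALID);
    }

    /** Processor p loads the line. */
    void read(int p) {
        if (lines[p] != State.INVALID) return; // cache hit: no bus traffic
        for (int q = 0; q < lines.length; q++) {
            if (lines[q] == State.DIRTY) {
                // The owner intercepts the load request and writes back.
                writeBacks++;
                lines[q] = State.VALID;
            }
        }
        lines[p] = State.VALID; // readable, but not writable
    }

    /** Processor p modifies the line. */
    void write(int p) {
        if (lines[p] == State.DIRTY) return; // already the exclusive owner
        for (int q = 0; q < lines.length; q++) {
            if (q == p) continue;
            if (lines[q] == State.DIRTY) writeBacks++; // previous owner gives up its copy
            lines[q] = State.INVALID;                  // other copies are invalidated
        }
        lines[p] = State.DIRTY;
    }

    public static void main(String[] args) {
        CacheLineModel m = new CacheLineModel(3);
        m.read(0);
        m.read(1);
        m.write(0); // invalidates processor 1's copy
        m.read(2);  // owner 0 writes back and shares
        System.out.println(Arrays.toString(m.lines) + " writeBacks=" + m.writeBacks);
    }
}
```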
Invalidate
[Figure: the writing processor claims the data on the bus (“Mine, all mine!”); the other cache reacts (“Uh, oh”)]
• Other caches lose read permission
• This cache acquires write permission
• Memory provides data only if not present in any cache, so no need to change it now (expensive)
Another Processor Asks for Data
[Figure: another processor broadcasts a load request on the bus]
Owner Responds
[Figure: the owning cache answers over the bus (“Here it is!”)]
End of the Day …
[Figure: the caches and memory hold the data]
• Reading OK, no writing
Mutual Exclusion
• What do we want to optimize?
– Bus bandwidth used by spinning threads
– Release/Acquire latency
– Acquire latency for idle lock
Simple TASLock
• TAS invalidates cache lines
• Spinners
– Miss in cache
– Go to bus
• Thread wants to release lock
– delayed behind spinners
Test-and-test-and-set
• Wait until lock “looks” free
– Spin on local cache
– No bus use while lock busy
• Problem: when lock is released
– Invalidation storm …
Local Spinning while Lock is Busy
[Figure: every cache and memory show the lock as busy]
On Release
[Figure: the lock shows as free in one cache and in memory; the spinners’ cached copies are invalid]
• Everyone misses, rereads
• Everyone tries TAS
Problems
• Everyone misses
– Reads satisfied sequentially
• Everyone does TAS
– Invalidates others’ caches
• Eventually quiesces after lock
acquired
– How long does this take?
Mystery Explained
[Figure: time versus number of threads, with curves for the TAS lock, the TTAS lock, and the ideal]
• TTAS: better than TAS but still not as good as ideal
Solution: Introduce Delay
• If the lock looks free
• But I fail to get it
• There must be contention
• Better to back off than to collide again
[Figure: timeline of a thread spinning on the lock, with delays labeled d, r1d, and r2d]
Dynamic Example: Exponential Backoff
[Figure: timeline of a thread spinning on the lock, with delays labeled d, 2d, and 4d]
If I fail to get lock
– wait random duration before retry
– Each subsequent failure doubles expected wait
Exponential Backoff Lock
public class Backoff implements Lock {
  public void lock() {
    // Fix minimum delay
    int delay = MIN_DELAY;
    while (true) {
      // Wait until lock looks free
      while (state.get()) {}
      // If we win, return
      if (!state.getAndSet(true))
        return;
      // Back off for random duration
      sleep(random() % delay);
      // Double max delay, within reason
      if (delay < MAX_DELAY)
        delay = 2 * delay;
    }
  }
}
Spin-Waiting Overhead
[Figure: time versus number of threads, with curves for the TTAS lock and the backoff lock]
Backoff: Other Issues
• Good
– Easy to implement
– Beats TTAS lock
• Bad
– Must choose parameters carefully
– Not portable across platforms
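To make the parameter choice concrete, here is a runnable sketch of a backoff lock in which the delay bounds are constructor arguments. It is an illustration under assumptions, not the book's code: the class name BackoffSpinLock, the nanosecond units, the use of LockSupport.parkNanos for the pause, and the values in demo are all choices made here.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Hypothetical sketch for illustration; not part of the original slides.
class BackoffSpinLock {
    private final AtomicBoolean state = new AtomicBoolean(false);
    private final long minDelayNanos; // assumed tuning parameter
    private final long maxDelayNanos; // assumed tuning parameter

    BackoffSpinLock(long minDelayNanos, long maxDelayNanos) {
        if (minDelayNanos <= 0 || maxDelayNanos < minDelayNanos) {
            throw new IllegalArgumentException("need 0 < min <= max");
        }
        this.minDelayNanos = minDelayNanos;
        this.maxDelayNanos = maxDelayNanos;
    }

    void lock() {
        long limit = minDelayNanos;
        while (true) {
            while (state.get()) {            // wait until lock looks free
                Thread.onSpinWait();
            }
            if (!state.getAndSet(true)) {    // if we win, return
                return;
            }
            // Lost the race: back off for a random duration up to 'limit'.
            LockSupport.parkNanos(ThreadLocalRandom.current().nextLong(limit) + 1);
            // Double the limit, capped at the maximum.
            limit = (limit > maxDelayNanos / 2) ? maxDelayNanos : 2 * limit;
        }
    }

    void unlock() {
        state.set(false);
    }

    /** Demo helper: nThreads threads each add perThread to a shared counter. */
    static int demo(int nThreads, int perThread) {
        BackoffSpinLock lock = new BackoffSpinLock(1_000, 1_000_000);
        int[] counter = {0};
        Thread[] threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter[0];
    }
}
```

The bounds that work well on one machine may perform poorly on another, which is the portability problem named above.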
Idea
• Avoid useless invalidations
– By keeping a queue of threads
• Each thread
– Notifies next in line
– Without bothering the others
Anderson Queue Lock
[Figure sequence: a counter next and an array of flags, initially T F F F F F F F]
• idle: only the first flag is true
• acquiring: a thread calls getAndIncrement on next to take a slot
• acquired: the flag in its slot is true (“Mine!”)
• acquiring: a second thread calls getAndIncrement on next and waits on the following slot
• released: the first thread releases and the second thread has acquired the lock (“Yow!”)
Anderson Queue Lock
class ALock implements Lock {
  // One flag per thread
  boolean[] flags = {true, false, …, false};
  // Next flag to use
  AtomicInteger next = new AtomicInteger(0);
  // Thread-local variable
  ThreadLocal<Integer> mySlot;
Anderson Queue Lock
  public void lock() {
    // Take next slot
    int slot = next.getAndIncrement();
    mySlot.set(slot);
    // Spin until told to go
    while (!flags[slot % n]) {}
    // Prepare slot for re-use
    flags[slot % n] = false;
  }

  public void unlock() {
    // Tell next thread to go
    flags[(mySlot.get() + 1) % n] = true;
  }
}
Performance
[Figure: time versus number of threads, with curves for the TTAS lock and the queue lock]
• Shorter handover than backoff
• Curve is practically flat
• Scalable performance
• FIFO fairness
Anderson Queue Lock
• Good
– First truly scalable lock
– Simple, easy to implement
• Bad
– Space hog
– One bit per thread
• Unknown number of threads?
• Small number of actual contenders?
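A runnable sketch of the array-based queue lock, to make the space cost visible: the array must be sized for the largest number of threads that may ever contend, whatever the actual contention. This is an illustration under assumptions, not the slides' code: the class name ArrayQueueLock is hypothetical, AtomicIntegerArray stands in for the boolean array so that flag writes are visible across threads, and the spin loop calls Thread.yield() so the demo stays responsive when threads outnumber cores.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical sketch for illustration; not part of the original slides.
class ArrayQueueLock {
    private final int capacity;               // maximum number of contending threads
    private final AtomicIntegerArray flags;   // 1 = "go", 0 = "wait"
    private final AtomicInteger next = new AtomicInteger(0);
    private final ThreadLocal<Integer> mySlot = new ThreadLocal<>();

    ArrayQueueLock(int capacity) {
        this.capacity = capacity;
        this.flags = new AtomicIntegerArray(capacity); // all zero initially
        this.flags.set(0, 1);                          // first slot may go
    }

    void lock() {
        // Take next slot (counter overflow is ignored in this sketch).
        int slot = Math.floorMod(next.getAndIncrement(), capacity);
        mySlot.set(slot);
        while (flags.get(slot) == 0) {   // spin until told to go
            Thread.yield();
        }
        flags.set(slot, 0);              // prepare slot for re-use
    }

    void unlock() {
        flags.set((mySlot.get() + 1) % capacity, 1); // tell next thread to go
    }

    /** Demo helper: nThreads threads each add perThread to a shared counter. */
    static int demo(int nThreads, int perThread) {
        ArrayQueueLock lock = new ArrayQueueLock(nThreads);
        int[] counter = {0};
        Thread[] threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter[0];
    }
}
```

The lock is only correct while at most capacity threads use it at once; with more, two threads would share a slot.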
CLH Lock
• FIFO order
• Small, constant-size overhead per
thread
[Figure sequence: a tail pointer and queue nodes, each holding a boolean]
• Initially: idle; tail (the queue tail) refers to a node holding false, so the lock is free
• Purple Wants the Lock: Purple has a node holding true and swaps it into tail
• Purple Has the Lock
• Red Wants the Lock: Red has a node holding true and swaps it into tail; the nodes form an implicit linked list; Red spins on its predecessor’s node (actually, it spins on a cached copy)
• Purple Releases: Purple’s node becomes false (“Bingo!”) and Red has acquired the lock
Space Usage
• Let
– L = number of locks
– N = number of threads
• ALock
– O(LN)
• CLH lock
– O(L+N)
CLH Queue Lock
class Qnode {
AtomicBoolean locked =
new AtomicBoolean(true);
}
CLH Queue Lock
class CLHLock implements Lock {
  // Queue tail
  AtomicReference<Qnode> tail;
  // Thread-local Qnode
  ThreadLocal<Qnode> myNode = ThreadLocal.withInitial(Qnode::new);

  public void lock() {
    Qnode node = myNode.get();
    node.locked.set(true);
    // Swap in my node
    Qnode pred = tail.getAndSet(node);
    // Spin until predecessor releases lock
    while (pred.locked.get()) {}
  }
}
CLH Queue Lock
class CLHLock implements Lock {
  …
  public void unlock() {
    // Notify successor
    myNode.locked.set(false);
    // Recycle predecessor’s node
    myNode = pred;
  }
}
(Code in book shows how it’s done using myPred reference.)
CLH Lock
• Good
– Lock release affects successor only
– Small, constant-sized space
• Bad
– Doesn’t work for uncached NUMA
architectures
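For reference, a runnable sketch of the CLH lock with an explicit thread-local myPred reference, so that unlock can recycle the predecessor's node. It is written here as an illustration and may differ in detail from the book's listing: the class name CLHSpinLock is hypothetical, a volatile boolean replaces the AtomicBoolean field, and the spin loop calls Thread.yield() so the demo stays responsive when threads outnumber cores.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch for illustration; not part of the original slides.
class CLHSpinLock {
    static final class QNode {
        volatile boolean locked; // false by default
    }

    // The queue starts with one unlocked dummy node, so the lock is free.
    private final AtomicReference<QNode> tail = new AtomicReference<>(new QNode());
    private final ThreadLocal<QNode> myNode = ThreadLocal.withInitial(QNode::new);
    private final ThreadLocal<QNode> myPred = new ThreadLocal<>();

    void lock() {
        QNode node = myNode.get();
        node.locked = true;
        QNode pred = tail.getAndSet(node); // swap in my node
        myPred.set(pred);
        while (pred.locked) {              // spin until predecessor releases
            Thread.yield();
        }
    }

    void unlock() {
        QNode node = myNode.get();
        node.locked = false;               // notify successor
        myNode.set(myPred.get());          // recycle predecessor's node
    }

    /** Demo helper: nThreads threads each add perThread to a shared counter. */
    static int demo(int nThreads, int perThread) {
        CLHSpinLock lock = new CLHSpinLock();
        int[] counter = {0};
        Thread[] threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter[0];
    }
}
```

Each thread spins on a node that its predecessor owns, which is why the location of that node in memory matters on NUMA machines.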
NUMA Architectures
• Acronym:
– Non-Uniform Memory Architecture
• Illusion:
– Flat shared memory
• Truth:
– No caches (sometimes)
– Some memory regions faster than others
NUMA Machines
Spinning on local
memory is fast
NUMA Machines
Spinning on remote
memory is slow
CLH Lock
• Each thread spins on predecessor’s
memory
• Could be far away …
MCS Lock
• FIFO order
• Spin on local memory only
• Small, Constant-size overhead
[Figure sequence: a tail pointer and queue nodes, each holding a boolean]
• Initially: idle
• Acquiring: a thread allocates a Qnode and swaps it into tail
• Acquired: the first thread holds the lock
• Acquiring: a second thread swaps its Qnode into tail behind the first thread’s node (“Yes!”)
MCS Queue Lock
class Qnode {
  boolean locked = false;
  Qnode next = null;
}
MCS Queue Lock
class MCSLock implements Lock {
  AtomicReference<Qnode> tail;

  public void lock() {
    // Make a QNode
    Qnode qnode = new Qnode();
    // Add my node to the tail of queue
    Qnode pred = tail.getAndSet(qnode);
    // Fix if queue was non-empty
    if (pred != null) {
      qnode.locked = true;
      pred.next = qnode;
      // Wait until unlocked
      while (qnode.locked) {}
    }
  }
}
MCS Queue Unlock
class MCSLock implements Lock {
  AtomicReference<Qnode> tail;

  public void unlock() {
    // qnode is this thread's node from lock()
    // Missing successor?
    if (qnode.next == null) {
      // If really no successor, return
      if (tail.compareAndSet(qnode, null))
        return;
      // Otherwise wait for successor to catch up
      while (qnode.next == null) {}
    }
    // Pass lock to successor
    qnode.next.locked = false;
  }
}
Purple Release
[Figure sequence: Purple is releasing while another thread’s swap into the queue is in progress]
• “By looking at the queue, I see another thread is active”
• “I have to wait for that thread to finish”
• The other thread goes from “prepare to spin” to “spinning” to “Acquired lock”
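Putting lock and unlock together, a runnable sketch of the MCS lock might look as follows. It is an illustration under assumptions: the class name MCSSpinLock is hypothetical, each thread keeps its node in a ThreadLocal and reuses it instead of allocating one per acquisition, the node fields are volatile so writes are visible across threads, and the spin loops call Thread.yield() so the demo stays responsive when threads outnumber cores.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch for illustration; not part of the original slides.
class MCSSpinLock {
    static final class QNode {
        volatile boolean locked;
        volatile QNode next;
    }

    private final AtomicReference<QNode> tail = new AtomicReference<>(null);
    private final ThreadLocal<QNode> myNode = ThreadLocal.withInitial(QNode::new);

    void lock() {
        QNode node = myNode.get();
        QNode pred = tail.getAndSet(node); // add my node to the tail of the queue
        if (pred != null) {                // queue was non-empty
            node.locked = true;
            pred.next = node;              // link behind predecessor
            while (node.locked) {          // spin on my own node only
                Thread.yield();
            }
        }
    }

    void unlock() {
        QNode node = myNode.get();
        if (node.next == null) {                 // missing successor?
            if (tail.compareAndSet(node, null)) {
                return;                          // really no successor
            }
            while (node.next == null) {          // successor is mid-enqueue: wait
                Thread.yield();
            }
        }
        node.next.locked = false;                // pass lock to successor
        node.next = null;                        // reset node for re-use
    }

    /** Demo helper: nThreads threads each add perThread to a shared counter. */
    static int demo(int nThreads, int perThread) {
        MCSSpinLock lock = new MCSSpinLock();
        int[] counter = {0};
        Thread[] threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter[0];
    }
}
```

Setting locked before linking the node ensures the predecessor cannot clear the flag before the waiter has set it.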
Abortable Locks
• What if you want to give up waiting
for a lock?
• For example
– Timeout
– Database transaction aborted by user
Back-off Lock
• Aborting is trivial
– Just return from lock() call
• Extra benefit:
– No cleaning up
– Wait-free
– Immediate return
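A minimal sketch of an abortable back-off lock with a timeout, to show why giving up needs no clean-up: a waiting thread has written nothing that other threads depend on. The class name AbortableBackoffLock, the tryLock signature, and the delay constants are assumptions made for this example.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Hypothetical sketch for illustration; not part of the original slides.
class AbortableBackoffLock {
    private static final long MIN_DELAY_NANOS = 1_000;     // assumed value
    private static final long MAX_DELAY_NANOS = 1_000_000; // assumed value
    private final AtomicBoolean state = new AtomicBoolean(false);

    /** Returns true if the lock was acquired, false if the timeout expired first. */
    boolean tryLock(long time, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(time);
        long limit = MIN_DELAY_NANOS;
        while (true) {
            while (state.get()) {
                // Aborting is just returning: there is nothing to clean up.
                if (System.nanoTime() - deadline >= 0) return false;
                Thread.onSpinWait();
            }
            if (!state.getAndSet(true)) return true;
            if (System.nanoTime() - deadline >= 0) return false;
            LockSupport.parkNanos(ThreadLocalRandom.current().nextLong(limit) + 1);
            limit = Math.min(MAX_DELAY_NANOS, 2 * limit);
        }
    }

    void unlock() {
        state.set(false);
    }

    public static void main(String[] args) {
        AbortableBackoffLock lock = new AbortableBackoffLock();
        System.out.println(lock.tryLock(10, TimeUnit.MILLISECONDS)); // lock is free
        System.out.println(lock.tryLock(10, TimeUnit.MILLISECONDS)); // already held, times out
        lock.unlock();
    }
}
```

The lock is not reentrant, so the second call in main times out even though the same thread holds the lock.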
Queue Locks
• Can’t just quit
– Thread in line behind will starve
• Need a graceful way out
• Timeout Queue Lock
One Lock To Rule Them All?
• TTAS+Backoff, CLH, MCS, ToLock…
• Each better than others in some way
• There is no one solution
• Lock we pick really depends on:
– the application
– the hardware
– which properties are important