Spin Locks and Contention

Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
Modified by Rajeev Alur for CIS 640, University of Pennsylvania

Muddy children’s puzzle (Common Knowledge)
• A group of kids are playing. A stranger walks by and announces “Some of you have mud on your forehead.”
• Each kid can see everyone else’s forehead, but not his/her own (and they don’t talk to one another)
• Stranger says “Raise your hand if you conclude that you have mud on your forehead”. Nobody does.
• Stranger keeps on repeating the statement.
• If k kids have muddy foreheads, then exactly these k kids raise their hands after the stranger repeats the statement exactly k times

Muddy children’s puzzle: why does this happen?
• For every k:
  – If >= k kids have muddy foreheads, then in the first k-1 rounds nobody raises hands
  – If exactly k kids have muddy foreheads, then in the k-th round, exactly the muddy kids raise their hands
• This claim can be proved by induction on k
  – Base case k = 1
  – Inductive case (assume for k, and prove for k+1)

What is the role of the stranger’s statement?
• Let p stand for “> 0 kids have muddy foreheads”
• Assuming > 1 kids are muddy, the stranger announcing p does not add to anyone’s information
• However, without the stranger’s announcement, nobody will ever raise their hands
• So what’s going on?
• Well, the base case for our proof fails, but exactly what information do kids acquire from the stranger’s announcement?

Common Knowledge
• E p : Everybody knows p
• E E p : Everybody knows that everybody knows p
• E^k p : defined similarly (k repetitions)
• C p : p is “common knowledge”: limit of Everybody knows that everybody knows …
• For k = 2, each kid knows p, but not E p; after the stranger’s announcement, each kid knows E p
• If k kids are muddy, then before the announcement each kid knows E^(k-1) p, but not E^k p
• The stranger’s announcement makes p common knowledge

Mutual Exclusion
Focus so far: Correctness
• Models
  – Accurate
  – But idealized
• Protocols
  – Elegant
  – Important
  – But not used in practice

New Focus: Performance
• Models
  – More complicated
  – Still focus on principles
• Protocols
  – Elegant
  – Important
  – And realistic

Kinds of Architectures
• SISD (Uniprocessor)
  – Single instruction stream
  – Single data stream
• SIMD (Vector)
  – Single instruction
  – Multiple data
• MIMD (Multiprocessors): our space
  – Multiple instruction
  – Multiple data

MIMD Architectures
[Diagram: two organizations, Shared Bus (with memory on the bus) and Distributed]
• Memory Contention
• Communication Contention
• Communication Latency

Today: Revisit Mutual Exclusion
• Think of performance, not just correctness and progress
• Begin to understand how performance depends on our software properly utilizing the multiprocessor machine’s hardware
• And get to know a collection of locking algorithms…

What Should You Do If You Can’t Get a Lock?
• Keep trying (our focus)
  – “spin” or “busy-wait”
  – Good if delays are short
• Give up the processor
  – Good if delays are long
  – Always good on uniprocessor

Basic Spin-Lock
[Diagram: threads wait at the spin lock, one at a time enters the critical section (CS), and the lock is reset upon exit]
• …lock introduces sequential bottleneck
• …lock suffers from contention
• Notice: these are distinct phenomena

Test-and-Set Primitive
• Boolean value
• Test-and-set (TAS)
  – Swap true with current value
  – Return value tells if prior value was true or false
• Can reset just by writing false
• TAS aka “getAndSet”

Review: Test-and-Set
  public class AtomicBoolean {          // package java.util.concurrent.atomic
    boolean value;

    public synchronized boolean getAndSet(boolean newValue) {
      boolean prior = value;            // swap old and new values
      value = newValue;
      return prior;
    }
  }

Test-and-Set
  AtomicBoolean lock = new AtomicBoolean(false);
  …
  boolean prior = lock.getAndSet(true);   // swapping in true is called “test-and-set” or TAS
Test-and-Set Locks
• Locking
  – Lock is free: value is false
  – Lock is taken: value is true
• Acquire lock by calling TAS
  – If result is false, you win
  – If result is true, you lose
• Release lock by writing false

Test-and-set Lock
  class TASlock {
    // Lock state is an AtomicBoolean
    AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
      // Keep trying until lock acquired
      while (state.getAndSet(true)) {}
    }

    void unlock() {
      // Release lock by resetting state to false
      state.set(false);
    }
  }

Space Complexity
• TAS spin-lock has small “footprint”
• n-thread spin-lock uses O(1) space
• As opposed to O(n) for Peterson/Bakery
• How did we overcome the Ω(n) lower bound?
• We used a combined read-write operation…

Performance
• Experiment
  – n threads
  – Increment shared counter 1 million times
• How long should it take?
• How long does it take?

Mystery #1
[Graph: time vs. number of threads, with curves for the TAS lock and the ideal]
• What is going on?
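The experiment above can be tried directly. The following is a minimal sketch, not the harness used to produce the slides' graph: the class name TASCounterExperiment, the run method, and the way the million increments are split evenly among threads are all assumptions made for illustration. Timings include thread start-up and no JIT warm-up, so treat the numbers as rough.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Test-and-set spin lock, following the slides.
class TASLock {
    private final AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
        // getAndSet(true) returns the prior value: false means we took the lock.
        while (state.getAndSet(true)) { }
    }

    void unlock() {
        state.set(false);
    }
}

// Hypothetical harness: n threads share a fixed total number of increments.
class TASCounterExperiment {
    static long counter; // only touched while holding the lock

    // Runs the experiment and returns elapsed nanoseconds.
    // Assumes totalIncrements is divisible by threads.
    static long run(int threads, int totalIncrements) {
        TASLock lock = new TASLock();
        counter = 0;
        int perThread = totalIncrements / threads;
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try {
                        counter++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 8; n *= 2) {
            long nanos = run(n, 1_000_000);
            System.out.println(n + " threads: " + nanos / 1_000_000 + " ms, count = " + counter);
        }
    }
}
```

Since the total work is fixed, the ideal curve would stay flat as n grows; any growth in the measured time is overhead from the lock itself.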
Test-and-Test-and-Set Locks
• Lurking stage
  – Wait until lock “looks” free
  – Spin while read returns true (lock taken)
• Pouncing stage
  – As soon as lock “looks” available
  – Read returns false (lock free)
  – Call TAS to acquire lock
  – If TAS loses, back to lurking

Test-and-test-and-set Lock
  class TTASlock {
    AtomicBoolean state = new AtomicBoolean(false);

    void lock() {
      while (true) {
        while (state.get()) {}          // wait until lock looks free
        if (!state.getAndSet(true))     // then try to acquire it
          return;
      }
    }
    // unlock() is the same as in TASlock
  }

Mystery #2
[Graph: time vs. number of threads, with curves for the TAS lock, the TTAS lock, and the ideal]

Mystery
• Both
  – TAS and TTAS
  – Do the same thing (in our model)
• Except that
  – TTAS performs much better than TAS
  – Neither approaches ideal

Opinion
• Our memory abstraction is broken
• TAS & TTAS methods
  – Are provably the same (in our model)
  – Except they aren’t (in field tests)
• Need a more detailed model …

Bus-Based Architectures
[Diagram: three processor caches connected by a shared bus to memory]
• Random access memory (10s of cycles)
• Shared Bus
  – Broadcast medium
  – One broadcaster at a time
  – Processors and memory all “snoop”
• Per-Processor Caches
  – Small
  – Fast: 1 or 2 cycles
  – Address & state information
Jargon Watch
• Cache hit
  – “I found what I wanted in my cache”
  – Good Thing
• Cache miss
  – “I had to go all the way to memory for that data”
  – Bad Thing

Caveat
• This model is still a simplification
  – But not in any essential way
  – Illustrates basic principles
• Will discuss complexities later

Processor Issues Load Request
[Diagram sequence: caches on a bus; memory holds the data]
• Processor broadcasts on the bus: “Gimme data”
• Memory Responds: “Got your data right here”
• The data now sits in that processor’s cache

Processor Issues Load Request
[Diagram sequence: one cache already holds the data]
• A second processor broadcasts: “Gimme data”
• The cache holding the data answers: “I got data”
• Other Processor Responds: the data travels over the bus, and both caches now hold it

Modify Cached Data
[Diagram sequence: one processor modifies its cached copy while another cache and memory still hold copies]
• What’s up with the other copies?

Cache Coherence
• We have lots of copies of data
  – Original copy in memory
  – Cached copies at processors
• Some processor modifies its own copy
  – What do we do with the others?
  – How to avoid confusion?
Write-Back Caches
• Accumulate changes in cache
• Write back when needed
  – Need the cache for something else
  – Another processor wants it
• On first modification
  – Invalidate other entries
  – Requires non-trivial protocol …

Write-Back Caches
• Cache entry has three states
  – Invalid: contains raw seething bits
  – Valid: I can read but I can’t write
  – Dirty: Data has been modified
• Intercept other load requests
• Write back to memory before using cache

Invalidate
[Diagram sequence: two caches hold the data; one processor is about to write]
• Writer: “Mine, all mine!”
• Other cache: “Uh, oh”
• Other caches lose read permission
• This cache acquires write permission
• Memory provides data only if not present in any cache, so no need to change it now (expensive)

Another Processor Asks for Data
[Diagram sequence]
• Owner Responds: “Here it is!”
• End of the Day …: both caches and memory hold the data; reading OK, no writing

Mutual Exclusion
• What do we want to optimize?
  – Bus bandwidth used by spinning threads
  – Release/Acquire latency
  – Acquire latency for idle lock

Simple TASLock
• TAS invalidates cache lines
• Spinners
  – Miss in cache
  – Go to bus
• Thread wants to release lock
  – Delayed behind spinners

Test-and-test-and-set
• Wait until lock “looks” free
  – Spin on local cache
  – No bus use while lock busy
• Problem: when lock is released
  – Invalidation storm …

Local Spinning while Lock is Busy
[Diagram: every cache, and memory, holds “busy”]

On Release
[Diagram sequence: the releaser’s cache and memory hold “free”; the spinners’ cached copies are invalid]
• Everyone misses, rereads
• Everyone tries TAS

Problems
• Everyone misses
  – Reads satisfied sequentially
• Everyone does TAS
  – Invalidates others’ caches
• Eventually quiesces after lock acquired
  – How long does this take?
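To see the difference between the two spinning styles on a real machine, both locks can be run through the same harness. This is a rough sketch under stated assumptions: the SpinLock interface and the names TasSpin, TtasSpin, and SpinLockComparison are made up for illustration, and the timing is naive (it includes thread start-up and no warm-up), so results will vary with hardware and JVM.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical common interface so both locks can share one harness.
interface SpinLock {
    void lock();
    void unlock();
}

// TAS: every probe is a getAndSet, which needs exclusive access to the cache line.
class TasSpin implements SpinLock {
    private final AtomicBoolean state = new AtomicBoolean(false);
    public void lock() { while (state.getAndSet(true)) { } }
    public void unlock() { state.set(false); }
}

// TTAS: spin with plain reads, and attempt getAndSet only when the lock looks free.
class TtasSpin implements SpinLock {
    private final AtomicBoolean state = new AtomicBoolean(false);
    public void lock() {
        while (true) {
            while (state.get()) { }              // lurk: read-only spinning
            if (!state.getAndSet(true)) return;  // pounce
        }
    }
    public void unlock() { state.set(false); }
}

class SpinLockComparison {
    // Runs `threads` workers, each doing perThread locked increments.
    // Returns { final count, elapsed nanoseconds }.
    static long[] run(SpinLock lock, int threads, int perThread) {
        long[] counter = new long[1];            // only touched while holding the lock
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return new long[] { counter[0], System.nanoTime() - start };
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 8; n *= 2) {
            long tas = run(new TasSpin(), n, 100_000)[1];
            long ttas = run(new TtasSpin(), n, 100_000)[1];
            System.out.println(n + " threads: TAS " + tas / 1_000 + " us, TTAS " + ttas / 1_000 + " us");
        }
    }
}
```

Both locks give the same final count; only the elapsed time is expected to differ, and by how much depends on the machine's cache-coherence behavior.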
Mystery Explained
[Graph: time vs. number of threads, with curves for the TAS lock, the TTAS lock, and the ideal]
• TTAS is better than TAS but still not as good as ideal

Solution: Introduce Delay
• If the lock looks free
• But I fail to get it
• There must be contention
• Better to back off than to collide again
[Timeline: spin lock, then delays d, r1d, r2d]

Dynamic Example: Exponential Backoff
[Timeline: spin lock, then delays d, 2d, 4d]
• If I fail to get lock
  – Wait random duration before retry
  – Each subsequent failure doubles expected wait

Exponential Backoff Lock
  public class Backoff implements Lock {
    public void lock() {
      int delay = MIN_DELAY;              // fix minimum delay
      while (true) {
        while (state.get()) {}            // wait until lock looks free
        if (!state.getAndSet(true))       // if we win, return
          return;
        sleep(random() % delay);          // back off for random duration
        if (delay < MAX_DELAY)
          delay = 2 * delay;              // double max delay, within reason
      }
    }
  }

Spin-Waiting Overhead
[Graph: time vs. number of threads, with curves for the TTAS lock and the Backoff lock]

Backoff: Other Issues
• Good
  – Easy to implement
  – Beats TTAS lock
• Bad
  – Must choose parameters carefully
  – Not portable across platforms

Idea
• Avoid useless invalidations
  – By keeping a queue of threads
• Each thread
  – Notifies next in line
  – Without bothering the others

Anderson Queue Lock
[Diagram sequence: a shared counter next and an array of flags]
• Idle: flags = T F F F F F F F
• Acquiring: a thread calls getAndIncrement on next to take a slot
• Acquired: the flag in its slot is T (“Mine!”)
• A second thread, acquiring, calls getAndIncrement and gets the following slot, whose flag is F
• Released: the first thread sets the following flag; flags = T T F F F F F F
• The second thread has acquired the lock (“Yow!”)
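The slot hand-off walked through above can be sketched as a self-contained class. This is an illustration under assumptions, not the book's code: the class names AndersonLock and AndersonLockDemo are hypothetical, the number of slots is assumed to be at least the number of threads that ever contend at once, flags are stored as 0/1 in an AtomicIntegerArray so that spinning threads reliably see updates, and overflow of the next counter is ignored.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

class AndersonLock {
    private final int size;                       // number of slots (assumed >= max contending threads)
    private final AtomicIntegerArray flags;       // 1 = "go", 0 = "wait"
    private final AtomicInteger next = new AtomicInteger(0);
    private final ThreadLocal<Integer> mySlot = new ThreadLocal<>();

    AndersonLock(int maxThreads) {
        size = maxThreads;
        flags = new AtomicIntegerArray(size);     // all slots start at 0
        flags.set(0, 1);                          // slot 0 may go first
    }

    void lock() {
        int slot = next.getAndIncrement() % size; // take next slot (overflow ignored in this sketch)
        mySlot.set(slot);
        while (flags.get(slot) == 0) { }          // spin until told to go
        flags.set(slot, 0);                       // prepare slot for re-use
    }

    void unlock() {
        flags.set((mySlot.get() + 1) % size, 1);  // tell next thread to go
    }
}

class AndersonLockDemo {
    // Returns the final value of a counter incremented under the lock.
    static long count(int threads, int perThread) {
        AndersonLock lock = new AndersonLock(threads);
        long[] counter = new long[1];             // only touched while holding the lock
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println("count = " + count(4, 10_000));
    }
}
```

Each waiting thread spins on its own slot, so a release changes only the one flag the next thread in line is watching.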
Anderson Queue Lock
  class ALock implements Lock {
    boolean[] flags = {true, false, …, false};     // one flag per thread
    AtomicInteger next = new AtomicInteger(0);      // next flag to use
    ThreadLocal<Integer> mySlot;                    // thread-local variable

    public void lock() {
      mySlot.set(next.getAndIncrement());           // take next slot
      while (!flags[mySlot.get() % n]) {}           // spin until told to go
      flags[mySlot.get() % n] = false;              // prepare slot for re-use
    }

    public void unlock() {
      flags[(mySlot.get() + 1) % n] = true;         // tell next thread to go
    }
  }

Performance
[Graph: curves for the TTAS lock and the queue lock]
• Shorter handover than backoff
• Curve is practically flat
• Scalable performance
• FIFO fairness

Anderson Queue Lock
• Good
  – First truly scalable lock
  – Simple, easy to implement
• Bad
  – Space hog
  – One bit per thread
    • Unknown number of threads?
    • Small number of actual contenders?

CLH Lock
• FIFO order
• Small, constant-size overhead per thread

Initially
[Diagram: idle; the queue tail points to a node holding false, so the lock is free]

Purple Wants the Lock
[Diagram sequence]
• Purple (acquiring) brings a node holding true
• Purple swaps its node into tail
• Purple Has the Lock (acquired): the node that was at the tail holds false

Red Wants the Lock
[Diagram sequence]
• Red (acquiring) brings a node holding true and swaps it into tail
• The nodes form an implicit linked list
• Red spins on Purple’s node; actually, it spins on a cached copy

Purple Releases
[Diagram sequence]
• Purple sets its node to false, and Red sees it (“Bingo!”)
• Purple has released the lock; Red has acquired it

Space Usage
• Let
  – L = number of locks
  – N = number of threads
• ALock
  – O(LN)
• CLH lock
  – O(L+N)

CLH Queue Lock
  class Qnode {
    AtomicBoolean locked = new AtomicBoolean(true);
  }

CLH Queue Lock
  class CLHLock implements Lock {
    AtomicReference<Qnode> tail;                                        // queue tail
    ThreadLocal<Qnode> myNode = ThreadLocal.withInitial(Qnode::new);    // thread-local Qnode

    public void lock() {
      myNode.get().locked.set(true);
      Qnode pred = tail.getAndSet(myNode.get());    // swap in my node
      while (pred.locked.get()) {}                  // spin until predecessor releases lock
    }

    public void unlock() {
      myNode.get().locked.set(false);               // notify successor
      myNode.set(pred);                             // recycle predecessor’s node (pred as obtained in lock())
    }
  }
  (Code in book shows how it’s done using myPred reference.)

CLH Lock
• Good
  – Lock release affects predecessor only
  – Small, constant-sized space
• Bad
  – Doesn’t work for uncached NUMA architectures

NUMA Architectures
• Acronym:
  – Non-Uniform Memory Architecture
• Illusion:
  – Flat shared memory
• Truth:
  – No caches (sometimes)
  – Some memory regions faster than others

NUMA Machines
• Spinning on local memory is fast
• Spinning on remote memory is slow

CLH Lock
• Each thread spins on predecessor’s memory
• Could be far away …

MCS Lock
• FIFO order
• Spin on local memory only
• Small, constant-size overhead

Initially
[Diagram: idle lock and its tail]

Acquiring
[Diagram sequence]
• Acquiring: a thread allocates a Qnode
• It swaps the node into tail
• Acquired: the queue was empty, so the thread holds the lock
• A second thread, acquiring, swaps its own node into tail
• The second thread links its node behind its predecessor’s and spins on its own node until told to go (“Yes!”)

MCS Queue Lock
  class Qnode {
    volatile boolean locked = false;    // volatile so a spinning thread sees the update
    volatile Qnode next = null;
  }

MCS Queue Lock
  class MCSLock implements Lock {
    AtomicReference<Qnode> tail;

    public void lock() {
      Qnode qnode = new Qnode();                  // make a Qnode
      Qnode pred = tail.getAndSet(qnode);         // add my node to the tail of queue
      if (pred != null) {                         // fix if queue was non-empty
        qnode.locked = true;
        pred.next = qnode;
        while (qnode.locked) {}                   // wait until unlocked
      }
    }

    // qnode below is the node this thread enqueued in lock()
    public void unlock() {
      if (qnode.next == null) {                   // missing successor?
        if (tail.compareAndSet(qnode, null))      // if really no successor, return
          return;
        while (qnode.next == null) {}             // otherwise wait for successor to catch up
      }
      qnode.next.locked = false;                  // pass lock to successor
    }
  }

Purple Release
[Diagram sequence]
• Purple (releasing) tries to swap tail back to empty
• By looking at the queue, I see another thread is active
• I have to wait for that thread to finish
• The other thread prepares to spin (its flag is true), links itself in, and spins
• Purple sets the successor’s flag to false: the successor has acquired the lock

Abortable Locks
• What if you want to give up waiting for a lock?
• For example
  – Timeout
  – Database transaction aborted by user

Back-off Lock
• Aborting is trivial
  – Just return from lock() call
• Extra benefit:
  – No cleaning up
  – Wait-free
  – Immediate return

Queue Locks
• Can’t just quit
  – Thread in line behind will starve
• Need a graceful way out
• Timeout Queue Lock

One Lock To Rule Them All?
• TTAS+Backoff, CLH, MCS, ToLock…
• Each better than others in some way
• There is no one solution
• Lock we pick really depends on:
  – the application
  – the hardware
  – which properties are important
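The claim that aborting a back-off lock is trivial can be made concrete. Below is a minimal sketch of an exponential backoff lock with a timeout, assuming names and values that are not in the slides: the class AbortableBackoffLock, the method tryLock, the two delay constants, and the use of LockSupport.parkNanos in place of the slides' sleep are all illustrative choices.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

class AbortableBackoffLock {
    // Assumed tuning values; the slides note these must be chosen carefully per platform.
    private static final long MIN_DELAY_NANOS = 1_000;
    private static final long MAX_DELAY_NANOS = 1_000_000;

    private final AtomicBoolean state = new AtomicBoolean(false);

    // Returns true if the lock was acquired, false if the timeout expired first.
    boolean tryLock(long timeout, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        long limit = MIN_DELAY_NANOS;
        while (true) {
            // Lurk: wait until the lock looks free, giving up once the deadline passes.
            while (state.get()) {
                if (System.nanoTime() - deadline >= 0) return false; // abort: just return
            }
            // Pounce: if we win, return.
            if (!state.getAndSet(true)) return true;
            if (System.nanoTime() - deadline >= 0) return false;
            // Lost the race, so there must be contention: back off for a random duration.
            long delay = ThreadLocalRandom.current().nextLong(limit) + 1;
            LockSupport.parkNanos(delay);
            // Double the maximum delay, within reason.
            if (limit < MAX_DELAY_NANOS) limit *= 2;
        }
    }

    void unlock() {
        state.set(false);
    }

    public static void main(String[] args) {
        AbortableBackoffLock lock = new AbortableBackoffLock();
        System.out.println("first attempt:  " + lock.tryLock(1, TimeUnit.SECONDS));
        System.out.println("while held:     " + lock.tryLock(10, TimeUnit.MILLISECONDS));
        lock.unlock();
        System.out.println("after release:  " + lock.tryLock(1, TimeUnit.SECONDS));
    }
}
```

A thread that gives up leaves no trace in the lock's state, which is why no cleanup is needed; a queue lock, by contrast, would leave a node that its successor is still waiting on.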