Concurrent Queues and Stacks Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit The Five-Fold Path • • • • • Coarse-grained locking Fine-grained locking Optimistic synchronization Lazy synchronization Lock-free synchronization Art of Multiprocessor Programming 2 Another Fundamental Problem • We told you about – Sets implemented by linked lists • Next: queues • Next: stacks Art of Multiprocessor Programming 3 Queues & Stacks • pool of items Art of Multiprocessor Programming 4 Queues deq()/ enq( ) Total order First in First out Art of Multiprocessor Programming 5 Stacks pop()/ push( Total order Last in First out ) Art of Multiprocessor Programming 6 Bounded • Fixed capacity • Good when resources an issue Art of Multiprocessor Programming 7 Unbounded • Unlimited capacity • Often more convenient Art of Multiprocessor Programming 8 Blocking zzz … Block on attempt to remove from empty stack or queue Art of Multiprocessor Programming 9 Blocking zzz … Block on attempt to add to full bounded stack or queue Art of Multiprocessor Programming 10 Non-Blocking Ouch! Throw exception on attempt to remove from empty stack or queue Art of Multiprocessor Programming 11 This Lecture • Queue – Bounded, blocking, lock-based – Unbounded, non-blocking, lock-free • Stack – Unbounded, non-blocking lock-free – Elimination-backoff algorithm Art of Multiprocessor Programming 12 Queue: Concurrency enq(x) tail head y=deq() enq() and deq() work at different ends of the object Art of Multiprocessor Programming 13 Concurrency y=deq() enq(x) Challenge: what if the queue is empty or full? Art of Multiprocessor Programming 14 Bounded Queue head tail Sentinel Art of Multiprocessor Programming 15 Bounded Queue head tail First actual item Art of Multiprocessor Programming 16 Bounded Queue head tail deqLock Lock out other deq() calls Art of Multiprocessor Programming 17 Bounded Queue head tail deqLock enqLock Lock out other enq() calls Art of Multiprocessor Programming 18 Not Done Yet head tail deqLock enqLock Need to tell whether queue is full or empty Art of Multiprocessor Programming 19 Not Done Yet head tail deqLock enqLock size 1 Max size is 8 items Art of Multiprocessor Programming 20 Not Done Yet head tail deqLock enqLock size 1 Incremented by enq() Decremented by deq() Art of Multiprocessor Programming 21 Enqueuer head tail deqLock enqLock Lock enqLock size 1 Art of Multiprocessor Programming 22 Enqueuer head tail deqLock Read size enqLock size 1 OK Art of Multiprocessor Programming 23 Enqueuer head tail deqLock No need to lock tail enqLock size 1 Art of Multiprocessor Programming 24 Enqueuer head tail deqLock enqLock Enqueue Node size 1 Art of Multiprocessor Programming 25 Enqueuer head tail deqLock enqLock size 12 getAndincrement() Art of Multiprocessor Programming 26 Enqueuer head tail deqLock enqLock size Release lock 8 2 Art of Multiprocessor Programming 27 Enqueuer head tail deqLock enqLock If queue was empty, notify waiting dequeuers size 2 Art of Multiprocessor Programming 28 Unsuccesful Enqueuer … head tail deqLock Read size enqLock size 8 Uh-oh Art of Multiprocessor Programming 29 Dequeuer head tail deqLock enqLock Lock deqLock size 2 Art of Multiprocessor Programming 30 Dequeuer head tail deqLock enqLock size Read sentinel’s next field 2 OK Art of Multiprocessor Programming 31 Dequeuer head tail deqLock enqLock Read value size 2 Art of Multiprocessor Programming 32 Dequeuer Make first Node new sentinel head tail deqLock enqLock size 2 Art of Multiprocessor Programming 33 Dequeuer head tail deqLock enqLock size Decrement size 1 Art of Multiprocessor Programming 34 Dequeuer head tail deqLock enqLock size 1 Release deqLock Art of Multiprocessor Programming 35 Unsuccesful Dequeuer head tail deqLock enqLock size Read sentinel’s next field 0 uh-oh Art of Multiprocessor Programming 36 Bounded Queue public class BoundedQueue<T> { ReentrantLock enqLock, deqLock; Condition notEmptyCondition, notFullCondition; AtomicInteger size; Node head; Node tail; int capacity; enqLock = new ReentrantLock(); notFullCondition = enqLock.newCondition(); deqLock = new ReentrantLock(); notEmptyCondition = deqLock.newCondition(); } Art of Multiprocessor Programming 37 Bounded Queue public class BoundedQueue<T> { ReentrantLock enqLock, deqLock; Condition notEmptyCondition, notFullCondition; AtomicInteger size; Node head; Node tail; int capacity; enq & deq locks enqLock = new ReentrantLock(); notFullCondition = enqLock.newCondition(); deqLock = new ReentrantLock(); notEmptyCondition = deqLock.newCondition(); } Art of Multiprocessor Programming 38 Digression: Monitor Locks • Java synchronized objects and ReentrantLocks are monitors • Allow blocking on a condition rather than spinning • Threads: – acquire and release lock – wait on a condition Art of Multiprocessor Programming 39 The Java Lock Interface public interface Lock { void lock(); void lockInterruptibly() throws InterruptedException; boolean tryLock(); boolean tryLock(long time, TimeUnit unit); Condition newCondition(); void unlock; } Acquire lock Art of Multiprocessor Programming 40 The Java Lock Interface public interface Lock { void lock(); void lockInterruptibly() throws InterruptedException; boolean tryLock(); boolean tryLock(long time, TimeUnit unit); Condition newCondition(); void unlock; Release lock } Art of Multiprocessor Programming 41 The Java Lock Interface public interface Lock { void lock(); void lockInterruptibly() throws InterruptedException; boolean tryLock(); boolean tryLock(long time, TimeUnit unit); Condition newCondition(); void unlock; } Try for lock, but not too hard Art of Multiprocessor Programming 42 The Java Lock Interface public interface Lock { void lock(); void lockInterruptibly() throws InterruptedException; boolean tryLock(); boolean tryLock(long time, TimeUnit unit); Condition newCondition(); void unlock; } Create condition to wait on Art of Multiprocessor Programming 43 The Java Lock Interface public interface Lock { void lock(); void lockInterruptibly() throws InterruptedException; boolean tryLock(); boolean tryLock(long time, TimeUnit unit); Condition newCondition(); void unlock; } Never mind what this method does Art of Multiprocessor Programming 44 Lock Conditions public interface Condition { void await(); boolean await(long time, TimeUnit unit); … void signal(); void signalAll(); } Art of Multiprocessor Programming 45 Lock Conditions public interface Condition { void await(); boolean await(long time, TimeUnit unit); … void signal(); void signalAll(); } Release lock and wait on condition Art of Multiprocessor Programming 46 Lock Conditions public interface Condition { void await(); boolean await(long time, TimeUnit unit); … void signal(); void signalAll(); } Wake up one waiting thread Art of Multiprocessor Programming 47 Lock Conditions public interface Condition { void await(); boolean await(long time, TimeUnit unit); … void signal(); void signalAll(); } Wake up all waiting threads Art of Multiprocessor Programming 48 Await q.await() • • • • Releases lock associated with q Sleeps (gives up processor) Awakens (resumes running) Reacquires lock & returns Art of Multiprocessor Programming 49 Signal q.signal(); • Awakens one waiting thread – Which will reacquire lock Art of Multiprocessor Programming 50 Signal All q.signalAll(); • Awakens all waiting threads – Which will each reacquire lock Art of Multiprocessor Programming 51 A Monitor Lock lock() Critical Section waiting room unlock() Art of Multiprocessor Programming 52 Unsuccessful Deq lock() deq() Critical Section waiting room await() Oh no, empty! Art of Multiprocessor Programming 53 Another One lock() deq() Critical Section waiting room await() Oh no, empty! Art of Multiprocessor Programming 54 Enqueuer to the Rescue Yawn! Yawn! lock() enq( ) Critical Section waiting room signalAll() unlock() Art of Multiprocessor Programming 55 Monitor Signalling Yawn! Yawn! Critical Section waiting room Awakened thread might still lose lock to outside contender… Art of Multiprocessor Programming 56 Dequeuers Signalled Yawn! Critical Section waiting room Found it Art of Multiprocessor Programming 57 Dequeuers Signaled Yawn! Critical Section waiting room Still empty! Art of Multiprocessor Programming 58 Dollar Short + Day Late Critical Section waiting room Art of Multiprocessor Programming 59 Java Synchronized Methods public class Queue<T> { int head = 0, tail = 0; T[QSIZE] items; public synchronized T deq() { while (tail – head == 0) wait(); T result = items[head % QSIZE]; head++; notifyAll(); return result; } … }} Art of Multiprocessor Programming 60 Java Synchronized Methods public class Queue<T> { int head = 0, tail = 0; T[QSIZE] items; public synchronized T deq() { while (tail – head == 0) wait(); T result = items[head % QSIZE]; head++; notifyAll(); return result; Each object has an implicit } … lock with an implicit condition }} Art of Multiprocessor Programming 61 Java Synchronized Methods public class Queue<T> { int head = 0, tail = 0; T[QSIZE] items; Lock on entry, unlock on return public synchronized T deq() { while (tail – head == 0) wait(); T result = items[head % QSIZE]; head++; notifyAll(); return result; } … }} Art of Multiprocessor Programming 62 Java Synchronized Methods public class Queue<T> { int head = 0, tail = 0; T[QSIZE] items; Wait on implicit condition public synchronized T deq() { while (tail – head == 0) wait(); T result = items[head % QSIZE]; head++; this.notifyAll(); return result; } … }} Art of Multiprocessor Programming 63 Java Synchronized Methods public class Queue<T> { Signal all threads waiting on condition tail = 0; int head = 0, T[QSIZE] items; public synchronized T deq() { while (tail – head == 0) this.wait(); T result = items[head % QSIZE]; head++; notifyAll(); return result; } … }} Art of Multiprocessor Programming 64 (Pop!) The Bounded Queue public class BoundedQueue<T> { ReentrantLock enqLock, deqLock; Condition notEmptyCondition, notFullCondition; AtomicInteger size; Node head; Node tail; int capacity; enqLock = new ReentrantLock(); notFullCondition = enqLock.newCondition(); deqLock = new ReentrantLock(); notEmptyCondition = deqLock.newCondition(); } Art of Multiprocessor Programming 65 Bounded Queue Fields public class BoundedQueue<T> { ReentrantLock enqLock, deqLock; Condition notEmptyCondition, notFullCondition; AtomicInteger size; Node head; Node tail; int capacity; Enq & deq locks enqLock = new ReentrantLock(); notFullCondition = enqLock.newCondition(); deqLock = new ReentrantLock(); notEmptyCondition = deqLock.newCondition(); } Art of Multiprocessor Programming 66 Bounded Queue Fields public class BoundedQueue<T> { ReentrantLock enqLock, deqLock; Condition notEmptyCondition, notFullCondition; AtomicInteger size; Node head; Enq lock’s associated Node tail; condition int capacity; enqLock = new ReentrantLock(); notFullCondition = enqLock.newCondition(); deqLock = new ReentrantLock(); notEmptyCondition = deqLock.newCondition(); } Art of Multiprocessor Programming 67 Bounded Queue Fields public class BoundedQueue<T> { ReentrantLock enqLock, deqLock; Condition notEmptyCondition, notFullCondition; AtomicInteger size; Node head; Node tail; size: 0 to capacity int capacity; enqLock = new ReentrantLock(); notFullCondition = enqLock.newCondition(); deqLock = new ReentrantLock(); notEmptyCondition = deqLock.newCondition(); } Art of Multiprocessor Programming 68 Bounded Queue Fields public class BoundedQueue<T> { ReentrantLock enqLock, deqLock; Condition notEmptyCondition, notFullCondition; AtomicInteger size; Head and Tail Node head; Node tail; int capacity; enqLock = new ReentrantLock(); notFullCondition = enqLock.newCondition(); deqLock = new ReentrantLock(); notEmptyCondition = deqLock.newCondition(); } Art of Multiprocessor Programming 69 Enq Method Part One public void enq(T x) { boolean mustWakeDequeuers = false; enqLock.lock(); try { while (size.get() == Capacity) notFullCondition.await(); Node e = new Node(x); tail.next = e; tail = tail.next; if (size.getAndIncrement() == 0) mustWakeDequeuers = true; } finally { enqLock.unlock(); } … } Art of Multiprocessor Programming 70 Enq Method Part One public void enq(T x) { boolean mustWakeDequeuers = false; enqLock.lock(); try { Lock and unlock while (size.get() == capacity) enq lock notFullCondition.await(); Node e = new Node(x); tail.next = e; tail = tail.next; if (size.getAndIncrement() == 0) mustWakeDequeuers = true; } finally { enqLock.unlock(); } … } Art of Multiprocessor Programming 71 Enq Method Part One public void enq(T x) { boolean mustWakeDequeuers = false; enqLock.lock(); try { while (size.get() == capacity) notFullCondition.await(); Node e = new Node(x); tail.next = e; tail = tail.next; if (size.getAndIncrement() == 0) mustWakeDequeuers = true; } finally { enqLock.unlock(); } Wait while queue is … } Art of Multiprocessor Programming full … 72 Enq Method Part One public void enq(T x) { boolean mustWakeDequeuers = false; enqLock.lock(); try { while (size.get() == capacity) notFullCondition.await(); Node e = new Node(x); tail.next = e; tail = tail.next; if (size.getAndIncrement() == 0) mustWakeDequeuers = true; } finally { enqLock.unlock(); } when await() returns, you … might still fail the test ! } Art of Multiprocessor Programming 73 Be Afraid public void enq(T x) { boolean mustWakeDequeuers = false; enqLock.lock(); try { while (size.get() == capacity) notFullCondition.await(); Node e = new Node(x); tail.next = e; tail = tail.next; if (size.getAndIncrement() == 0) mustWakeDequeuers = true; } finally { enqLock.unlock(); } After the loop: how do we know the … queue won’t become full again? } Art of Multiprocessor Programming 74 Enq Method Part One public void enq(T x) { boolean mustWakeDequeuers = false; enqLock.lock(); try { while (size.get() == capacity) notFullCondition.await(); Node e = new Node(x); tail.next = e; tail = tail.next; if (size.getAndIncrement() == 0) mustWakeDequeuers = true; } finally { enqLock.unlock(); } … Add new } Art of Multiprocessor Programming node 75 Enq Method Part One public void enq(T x) { boolean mustWakeDequeuers = false; enqLock.lock(); try { while (size.get() == capacity) notFullCondition.await(); Node e = new Node(x); tail.next = e; tail = tail.next; if (size.getAndIncrement() == 0) mustWakeDequeuers = true; } finally { enqLock.unlock(); } If queue was empty, wake … frustrated dequeuers } Art of Multiprocessor Programming 76 Beware Lost Wake-Ups Yawn! lock() enq( ) Critical Section waiting room Queue empty so signal () unlock() Art of Multiprocessor Programming 77 Lost Wake-Up Yawn! lock() enq( ) Critical Section waiting room Queue not empty so no need to signal unlock() Art of Multiprocessor Programming 78 Lost Wake-Up Yawn! Critical Section waiting room Art of Multiprocessor Programming 79 Lost Wake-Up Critical Section waiting room Found it Art of Multiprocessor Programming 80 What’s Wrong Here? Still waiting ….! Critical Section waiting room Art of Multiprocessor Programming 81 Solution to Lost Wakeup • Always use – signalAll() and notifyAll() • Not – signal() and notify() Art of Multiprocessor Programming 82 Enq Method Part Deux public void enq(T x) { … if (mustWakeDequeuers) { deqLock.lock(); try { notEmptyCondition.signalAll(); } finally { deqLock.unlock(); } } } Art of Multiprocessor Programming 83 Enq Method Part Deux public void enq(T x) { … if (mustWakeDequeuers) { deqLock.lock(); try { notEmptyCondition.signalAll(); } finally { deqLock.unlock(); } } }Are there dequeuers to be signaled? Art of Multiprocessor Programming 84 Enq Method Part Deux public void enq(T x) { Lock and … if (mustWakeDequeuers) { unlock deq lock deqLock.lock(); try { notEmptyCondition.signalAll(); } finally { deqLock.unlock(); } } } Art of Multiprocessor Programming 85 Enq Method Part Deux public void enq(T x) { Signal dequeuers that … queue is no longer empty if (mustWakeDequeuers) { deqLock.lock(); try { notEmptyCondition.signalAll(); } finally { deqLock.unlock(); } } } Art of Multiprocessor Programming 86 The Enq() & Deq() Methods • Share no locks – That’s good • But do share an atomic counter – Accessed on every method call – That’s not so good • Can we alleviate this bottleneck? Art of Multiprocessor Programming 87 Split the Counter • The enq() method – Increments only – Cares only if value is capacity • The deq() method – Decrements only – Cares only if value is zero Art of Multiprocessor Programming 88 Split Counter • Enqueuer increments enqSize • Dequeuer decrements deqSize • When enqueuer runs out – Locks deqLock – computes size = enqSize - DeqSize • Intermittent synchronization – Not with each method call – Need both locks! (careful …) Art of Multiprocessor Programming 89 A Lock-Free Queue head tail Sentinel Art of Multiprocessor Programming 90 Compare and Set CAS Art of Multiprocessor Programming 91 Enqueue head tail enq( ) Art of Multiprocessor Programming 92 Enqueue head tail Art of Multiprocessor Programming 93 Logical Enqueue CAS head tail Art of Multiprocessor Programming 94 Physical Enqueue head tail CAS Art of Multiprocessor Programming 95 Enqueue • These two steps are not atomic • The tail field refers to either – Actual last Node (good) – Penultimate Node (not so good) • Be prepared! Art of Multiprocessor Programming 96 Enqueue • What do you do if you find – A trailing tail? • Stop and help fix it – If tail node has non-null next field – CAS the queue’s tail field to tail.next • As in the universal construction Art of Multiprocessor Programming 97 When CASs Fail • During logical enqueue – Abandon hope, restart – Still lock-free (why?) • During physical enqueue – Ignore it (why?) Art of Multiprocessor Programming 98 Dequeuer head tail Read value Art of Multiprocessor Programming 99 Dequeuer Make first Node new sentinel CAS head tail Art of Multiprocessor Programming 100 Memory Reuse? • What do we do with nodes after we dequeue them? • Java: let garbage collector deal? • Suppose there is no GC, or we prefer not to use it? Art of Multiprocessor Programming 101 Dequeuer CAS head tail Can recycle Art of Multiprocessor Programming 102 Simple Solution • Each thread has a free list of unused queue nodes • Allocate node: pop from list • Free node: push onto list • Deal with underflow somehow … Art of Multiprocessor Programming 103 Why Recycling is Hard head tail Want to redirect head from gray to red zzz… Free pool Art of Multiprocessor Programming 104 Both Nodes Reclaimed head tail zzz Free pool Art of Multiprocessor Programming 105 One Node Recycled head tail Yawn! Free pool Art of Multiprocessor Programming 106 Why Recycling is Hard head tail CAS OK, here I go! Free pool Art of Multiprocessor Programming 107 Recycle FAIL head tail zOMG what went wrong? Free pool Art of Multiprocessor Programming 108 The Dreaded ABA Problem head tail Head reference has value A Thread reads value A Art of Multiprocessor Programming 109 Dreaded ABA continued head zzz tail Head reference has value B Node A freed Art of Multiprocessor Programming 110 Dreaded ABA continued head Yawn! tail Head reference has value A again Node A recycled and reinitialized Art of Multiprocessor Programming 111 Dreaded ABA continued head tail CAS CAS succeeds because references match, even though reference’s meaning has changed Art of Multiprocessor Programming 112 The Dreaded ABA FAIL • Is a result of CAS() semantics – I blame Sun, Intel, AMD, … • Not with Load-Locked/Store-Conditional – Good for IBM? Art of Multiprocessor Programming 113 Dreaded ABA – A Solution • • • • Tag each pointer with a counter Unique over lifetime of node Pointer size vs word size issues Overflow? – Don’t worry be happy? – Bounded tags? • AtomicStampedReference class Art of Multiprocessor Programming 114 Atomic Stamped Reference • AtomicStampedReference class – Java.util.concurrent.atomic package Can get reference & stamp atomically Reference address S Stamp Art of Multiprocessor Programming 115 Concurrent Stack • Methods – push(x) – pop() • Last-in, First-out (LIFO) order • Lock-Free! Art of Multiprocessor Programming 116 Empty Stack Top Art of Multiprocessor Programming 117 Push Top Art of Multiprocessor Programming 118 Push Top CAS Art of Multiprocessor Programming 119 Push Top Art of Multiprocessor Programming 120 Push Top Art of Multiprocessor Programming 121 Push Top Art of Multiprocessor Programming 122 Push Top CAS Art of Multiprocessor Programming 123 Push Top Art of Multiprocessor Programming 124 Pop Top Art of Multiprocessor Programming 125 Pop Top CAS Art of Multiprocessor Programming 126 Pop Top CAS mine! Art of Multiprocessor Programming 127 Pop Top CAS Art of Multiprocessor Programming 128 Pop Top Art of Multiprocessor Programming 129 Lock-free Stack public class LockFreeStack { private AtomicReference top = new AtomicReference(null); public boolean tryPush(Node node){ Node oldTop = top.get(); node.next = oldTop; return(top.compareAndSet(oldTop, node)) } public void push(T value) { Node node = new Node(value); while (true) { if (tryPush(node)) { return; } else backoff.backoff(); }} Art of Multiprocessor Programming 130 Lock-free Stack public class LockFreeStack { private AtomicReference top = new AtomicReference(null); public Boolean tryPush(Node node){ Node oldTop = top.get(); node.next = oldTop; return(top.compareAndSet(oldTop, node)) } public void push(T value) { Node node = new Node(value); while (true) { if (tryPush(node)) { return; } else backoff.backoff() tryPush attempts to push a node }} Art of Multiprocessor Programming 131 Lock-free Stack public class LockFreeStack { private AtomicReference top = new AtomicReference(null); public boolean tryPush(Node node){ Node oldTop = top.get(); node.next = oldTop; return(top.compareAndSet(oldTop, node)) } public void push(T value) { Node node = new Node(value); while (true) { if (tryPush(node)) { return; Read top value } else backoff.backoff() }} Art of Multiprocessor Programming 132 Lock-free Stack public class LockFreeStack { private AtomicReference top = new AtomicReference(null); public boolean tryPush(Node node){ Node oldTop = top.get(); node.next = oldTop; return(top.compareAndSet(oldTop, node)) } public void push(T value) { Node node = new Node(value); while (true) { if (tryPush(node)) { return; } else top backoff.backoff() current will be new node’s successor }} Art of Multiprocessor Programming 133 Lock-free Stack public class LockFreeStack { private AtomicReference top = new AtomicReference(null); public boolean tryPush(Node node){ Node oldTop = top.get(); node.next = oldTop; return(top.compareAndSet(oldTop, node)) } public void push(T value) { Node node = new Node(value); while (true) { if (tryPush(node)) { return; else backoff.backoff() Try to} swing top, return success or failure }} Art of Multiprocessor Programming 134 Lock-free Stack public class LockFreeStack { private AtomicReference top = new AtomicReference(null); public boolean tryPush(Node node){ Node oldTop = top.get(); node.next = oldTop; return(top.compareAndSet(oldTop, node)) } public void push(T value) { Node node = new Node(value); while (true) { if (tryPush(node)) { return; } else backoff.backoff() Push calls tryPush }} Art of Multiprocessor Programming 135 Lock-free Stack public class LockFreeStack { private AtomicReference top = new AtomicReference(null); public boolean tryPush(Node node){ Node oldTop = top.get(); node.next = oldTop; return(top.compareAndSet(oldTop, node)) } public void push(T value) { Node node = new Node(value); while (true) { if (tryPush(node)) { return; } else backoff.backoff() Create new node }} Art of Multiprocessor Programming 136 Lock-free Stack public class LockFreeStack { private AtomicReference top = new AtomicReference(null); public boolean tryPush(Node node){ Node oldTop = top.get(); If tryPush() fails, node.next = oldTop; back off before retrying return(top.compareAndSet(oldTop, node)) } public void push(T value) { Node node = new Node(value); while (true) { if (tryPush(node)) { return; } else backoff.backoff() }} Art of Multiprocessor Programming 137 Lock-free Stack • Good – No locking • Bad – Without GC, fear ABA – Without backoff, huge contention at top – In any case, no parallelism Art of Multiprocessor Programming 138 Big Question • Are stacks inherently sequential? • Reasons why – Every pop() call fights for top item • Reasons why not – Stay tuned … Art of Multiprocessor Programming 139 Elimination-Backoff Stack • How to – “turn contention into parallelism” • Replace familiar – exponential backoff • With alternative – elimination-backoff Art of Multiprocessor Programming 140 Observation linearizable stack Push( ) Pop() Yes! After an equal number of pushes and pops, stack stays the same Art of Multiprocessor Programming 141 Idea: Elimination Array Pick at random stack Push( ) Pop() Pick at random Elimination Array Art of Multiprocessor Programming 142 Push Collides With Pop continue stack Push( ) Pop() continue No need to access stack Yes! Art of Multiprocessor Programming 143 No Collision Push( ) stack Pop() If pushes collide or If nocollide collision, pops access stack access stack Art of Multiprocessor Programming 144 Elimination-Backoff Stack • Lock-free stack + elimination array • Access Lock-free stack, – If uncontended, apply operation – if contended, back off to elimination array and attempt elimination Art of Multiprocessor Programming 145 Elimination-Backoff Stack If CAS fails, back off Push( ) CAS Top Pop() Art of Multiprocessor Programming 146 Dynamic Range and Delay Pick random range and max waiting time based on level of contention encountered Push( ) Art of Multiprocessor Programming 147 Linearizability • Un-eliminated calls – linearized as before • Eliminated calls: – linearize pop() immediately after matching push() • Combination is a linearizable stack Art of Multiprocessor Programming 148 Un-Eliminated Linearizability pop(v1) push(v1) time time Art of Multiprocessor Programming 149 Eliminated Linearizability Collision Point pop(v1) push(v1) pop(v2) push(v2) Red calls are eliminated time time Art of Multiprocessor Programming 150 Backoff Has Dual Effect • Elimination introduces parallelism • Backoff to array cuts contention on lockfree stack • Elimination in array cuts down number of threads accessing lock-free stack Art of Multiprocessor Programming 151 Elimination Array public class EliminationArray { private static final int duration = ...; private static final int timeUnit = ...; Exchanger<T>[] exchanger; public EliminationArray(int capacity) { exchanger = new Exchanger[capacity]; for (int i = 0; i < capacity; i++) exchanger[i] = new Exchanger<T>(); … } … } Art of Multiprocessor Programming 152 Elimination Array public class EliminationArray { private static final int duration = ...; private static final int timeUnit = ...; Exchanger<T>[] exchanger; public EliminationArray(int capacity) { exchanger = new Exchanger[capacity]; for (int i = 0; i < capacity; i++) exchanger[i] = new Exchanger<T>(); … } An array of Exchangers … } Art of Multiprocessor Programming 153 Digression: A Lock-Free Exchanger public class Exchanger<T> { AtomicStampedReference<T> slot = new AtomicStampedReference<T>(null, 0); Art of Multiprocessor Programming 154 A Lock-Free Exchanger public class Exchanger<T> { AtomicStampedReference<T> slot = new AtomicStampedReference<T>(null, 0); Atomically modifiable reference + status Art of Multiprocessor Programming 155 Atomic Stamped Reference • AtomicStampedReference class – Java.util.concurrent.atomic package • In C or C++: reference address S stamp Art of Multiprocessor Programming 156 Extracting Reference & Stamp public T get(int[] stampHolder); Art of Multiprocessor Programming 157 Extracting Reference & Stamp public T get(int[] stampHolder); Returns reference to object of type T Returns stamp at array index 0! Art of Multiprocessor Programming 158 Exchanger Status enum Status {EMPTY, WAITING, BUSY}; Art of Multiprocessor Programming 159 Exchanger Status enum Status {EMPTY, WAITING, BUSY}; Nothing yet Art of Multiprocessor Programming 160 Exchange Status enum Status {EMPTY, WAITING, BUSY}; Nothing yet One thread is waiting for rendez-vous Art of Multiprocessor Programming 161 Exchange Status enum Status {EMPTY, WAITING, BUSY}; Nothing yet One thread is waiting for rendez-vous Other threads busy with rendez-vous Art of Multiprocessor Programming 162 The Exchange public T Exchange(T myItem, long nanos) throws TimeoutException { long timeBound = System.nanoTime() + nanos; int[] stampHolder = {EMPTY}; while (true) { if (System.nanoTime() > timeBound) throw new TimeoutException(); T herItem = slot.get(stampHolder); int stamp = stampHolder[0]; switch(stamp) { case EMPTY: … // slot is free case WAITING: … // someone waiting for me case BUSY: … // others exchanging } } Art of Multiprocessor Programming 163 The Exchange public T Exchange(T myItem, long nanos) throws TimeoutException { long timeBound = System.nanoTime() + nanos; int[] stampHolder = {EMPTY}; while (true) { if (System.nanoTime() > timeBound) Item and timeout throw new TimeoutException(); T herItem = slot.get(stampHolder); int stamp = stampHolder[0]; switch(stamp) { case EMPTY: … // slot is free case WAITING: … // someone waiting for me case BUSY: … // others exchanging } } Art of Multiprocessor Programming 164 The Exchange public T Exchange(T myItem, long nanos) throws TimeoutException { long timeBound = System.nanoTime() + nanos; int[] stampHolder = {EMPTY}; while (true) { if (System.nanoTime() > timeBound) throw new TimeoutException(); T herItem = slot.get(stampHolder); int stamp = stampHolder[0]; Array holds status switch(stamp) { case EMPTY: … // slot is free case WAITING: … // someone waiting for me case BUSY: … // others exchanging } } Art of Multiprocessor Programming 165 The Exchange public T Exchange(T myItem, long nanos) throws TimeoutException { long timeBound = System.nanoTime() + nanos; int[] stampHolder = {0}; while (true) { if (System.nanoTime() > timeBound) throw new TimeoutException(); T herItem = slot.get(stampHolder); int stamp = stampHolder[0]; switch(stamp) { case EMPTY: // slot is free case WAITING: // someone waiting for me case BUSY: // others exchanging Loop until timeout } }} Art of Multiprocessor Programming 166 The Exchange public T Exchange(T myItem, long nanos) throws TimeoutException { long timeBound = System.nanoTime() + nanos; int[] stampHolder = {0}; while (true) { if (System.nanoTime() > timeBound) throw new TimeoutException(); T herItem = slot.get(stampHolder); int stamp = stampHolder[0]; switch(stamp) { case EMPTY: // slot is free case WAITING: // someone waiting for me case BUSY: // others exchanging Get other’s item and status } }} Art of Multiprocessor Programming 167 The Exchange public T Exchange(T myItem, long nanos) throws TimeoutException { long timeBound = System.nanoTime() + nanos; int[] stampHolder = three {0}; possible states An Exchanger has while (true) { if (System.nanoTime() > timeBound) throw new TimeoutException(); T herItem = slot.get(stampHolder); int stamp = stampHolder[0]; switch(stamp) { case EMPTY: … // slot is free case WAITING: … // someone waiting for me case BUSY: … // others exchanging } }} Art of Multiprocessor Programming 168 Lock-free Exchanger EMPTY Art of Multiprocessor Programming 169 Lock-free Exchanger CAS EMPTY Art of Multiprocessor Programming 170 Lock-free Exchanger WAITING Art of Multiprocessor Programming 171 Lock-free Exchanger In search of partner … WAITING Art of Multiprocessor Programming 172 Lock-free Exchanger Try to exchange item and set status to BUSY Still waiting … Slot CAS WAITING Art of Multiprocessor Programming 173 Lock-free Exchanger Partner showed up, take item and reset to EMPTY Slot BUSY item status Art of Multiprocessor Programming 174 Lock-free Exchanger Partner showed up, take item and reset to EMPTY Slot BUSY EMPTY item status Art of Multiprocessor Programming 175 Exchanger State EMPTY case EMPTY: // slot is free if (slot.CAS(herItem, myItem, EMPTY, WAITING)) { while (System.nanoTime() < timeBound){ herItem = slot.get(stampHolder); if (stampHolder[0] == BUSY) { slot.set(null, EMPTY); return herItem; }} if (slot.CAS(myItem, null, WAITING, EMPTY)){ throw new TimeoutException(); } else { herItem = slot.get(stampHolder); slot.set(null, EMPTY); return herItem; } } break; Art of Multiprocessor Programming 176 Exchanger State EMPTY case EMPTY: // slot is free if (slot.CAS(herItem, myItem, EMPTY, WAITING)) { while (System.nanoTime() < timeBound){ herItem = slot.get(stampHolder); if (stampHolder[0] == BUSY) { slot.set(null, EMPTY); Try to insert myItem and return herItem; }} change state to WAITING if (slot.CAS(myItem, null, WAITING, EMPTY)){ throw new TimeoutException(); } else { herItem = slot.get(stampHolder); slot.set(null, EMPTY); return herItem; } } break; Art of Multiprocessor Programming 177 Exchanger State EMPTY case EMPTY: // slot is free if (slot.CAS(herItem, myItem, EMPTY, WAITING)) { while (System.nanoTime() < timeBound){ herItem = slot.get(stampHolder); if (stampHolder[0] == BUSY) { slot.set(null, EMPTY); return herItem; }} if (slot.CAS(myItem, null, WAITING, EMPTY)){ throw new TimeoutException(); } else { Spin until either herItem = slot.get(stampHolder); myItem is taken or timeout slot.set(null, EMPTY); return herItem; } } break; Art of Multiprocessor Programming 178 Exchanger State EMPTY case EMPTY: // slot is free if (slot.CAS(herItem, myItem, EMPTY, WAITING)) { while (System.nanoTime() < timeBound){ herItem = slot.get(stampHolder); if (stampHolder[0] == BUSY) { slot.set(null, EMPTY); return herItem; }} if (slot.CAS(myItem, null, WAITING, EMPTY)){ throw new TimeoutException(); } else { was taken, herItemmyItem = slot.get(stampHolder); slot.set(null, EMPTY); so return herItem return herItem; that was put in its place } } break; Art of Multiprocessor Programming 179 Exchanger State EMPTY case EMPTY: // slot is free ifOtherwise (slot.CAS(herItem, myItem, EMPTY, WAITING)) { we ran out of time, while (System.nanoTime() < timeBound){ try to reset status to EMPTY herItem = slot.get(stampHolder); if (stampHolder[0] and time out== BUSY) { slot.set(null, EMPTY); return herItem; }} if (slot.CAS(myItem, null, WAITING, EMPTY)){ throw new TimeoutException(); } else { herItem = slot.get(stampHolder); slot.set(null, EMPTY); return herItem; } } break; Art of Multiprocessor Programming 180 Exchanger State EMPTY case EMPTY: // slot is free if (slot.compareAndSet(herItem, myItem, WAITING, BUSY)) { while (System.nanoTime() < timeBound){ If reset failed, herItem = slot.get(stampHolder); someone showed up after if (stampHolder[0] == BUSY) { all, slot.set(null, so takeEMPTY); that item return herItem; }} if (slot.compareAndSet(myItem, null, WAITING, EMPTY)){throw new TimeoutException(); } else { herItem = slot.get(stampHolder); slot.set(null, EMPTY); return herItem; } Art ofArt Multiprocessor of Multiprocessor Programming 181 } break; Programming© Herlihy-Shavit Exchanger State EMPTY case EMPTY: // slot is free if (slot.CAS(herItem, myItem, EMPTY, WAITING)) { while (System.nanoTime() < timeBound){ herItem = slot.get(stampHolder); if (stampHolder[0] == BUSY) { slot.set(null, EMPTY); Clear slot and take that item return herItem; }} if (slot.CAS(myItem, null, WAITING, EMPTY)){ throw new TimeoutException(); } else { herItem = slot.get(stampHolder); slot.set(null, EMPTY); return herItem; } } break; Art of Multiprocessor Programming 182 Exchanger State EMPTY case EMPTY: // slot is free if (slot.CAS(herItem, myItem, EMPTY, WAITING)) { while (System.nanoTime() < timeBound){ herItem = slot.get(stampHolder); if (stampHolder[0] == BUSY) { slot.set(null, EMPTY); If initial CAS failed, return herItem; }} then someone else changed status if (slot.CAS(myItem, null, WAITING, EMPTY)){ EMPTY to WAITING, throw from new TimeoutException(); } else { so retry from start herItem = slot.get(stampHolder); slot.set(null, EMPTY); return herItem; } } break; Art of Multiprocessor Programming 183 States WAITING and BUSY case WAITING: // someone waiting for me if (slot.CAS(herItem, myItem, WAITING, BUSY)) return herItem; break; case BUSY: // others in middle of exchanging break; default: // impossible break; } } } } Art of Multiprocessor Programming 184 States WAITING and BUSY case WAITING: // someone waiting for me if (slot.CAS(herItem, myItem, WAITING, BUSY)) return herItem; break; case BUSY: // others in middle of exchanging break; default: // impossible someone is waiting to exchange, break; } so try to CAS my item in } and change state to BUSY } } Art of Multiprocessor Programming 185 States WAITING and BUSY case WAITING: // someone waiting for me if (slot.CAS(herItem, myItem, WAITING, BUSY)) return herItem; break; case BUSY: // others in middle of exchanging break; default: // impossible break; } If successful, return other’s item, } } otherwise someone else took it, } so try again from start Art of Multiprocessor Programming 186 States WAITING and BUSY case WAITING: // someone waiting for me if (slot.CAS(herItem, myItem, WAITING, BUSY)) return herItem; break; case BUSY: // others in middle of exchanging break; default: // impossible break; } If BUSY, } } other threads exchanging, } so start again Art of Multiprocessor Programming 187 The Exchanger Slot • Exchanger is lock-free • Because the only way an exchange can fail is if others repeatedly succeeded or no-one showed up • The slot we need does not require symmetric exchange Art of Multiprocessor Programming 188 Back to the Stack: the Elimination Array public class EliminationArray { … public T visit(T value, int range) throws TimeoutException { int slot = random.nextInt(range); int nanodur = convertToNanos(duration, timeUnit)); return (exchanger[slot].exchange(value, nanodur) }} Art of Multiprocessor Programming 189 Elimination Array public class EliminationArray { … public T visit(T value, int range) throws TimeoutException { int slot = random.nextInt(range); int nanodur = convertToNanos(duration, timeUnit)); return (exchanger[slot].exchange(value, nanodur) visit the elimination array }} with fixed value and range Art of Multiprocessor Programming 190 Elimination Array public class EliminationArray { … public T visit(T value, int range) throws TimeoutException { int slot = random.nextInt(range); int nanodur = convertToNanos(duration, timeUnit)); return (exchanger[slot].exchange(value, nanodur) }} Pick a random array entry Art of Multiprocessor Programming 191 Elimination Array public class EliminationArray { … Exchange value or time out public T visit(T value, int range) throws TimeoutException { int slot = random.nextInt(range); int nanodur = convertToNanos(duration, timeUnit)); return (exchanger[slot].exchange(value, nanodur) }} Art of Multiprocessor Programming 192 Elimination Stack Push public void push(T value) { ... while (true) { if (tryPush(node)) { return; } else try { T otherValue = eliminationArray.visit(value,policy.range); if (otherValue == null) { return; } } Art of Multiprocessor Programming 193 Elimination Stack Push public void push(T value) { ... while (true) { if (tryPush(node)) { return; } else try { T otherValue = eliminationArray.visit(value,policy.range); if (otherValue == null) { return; } First, try to push } Art of Multiprocessor Programming 194 Elimination Stack Push public void push(T value) { ... If I failed, backoff & try to eliminate while (true) { if (tryPush(node)) { return; } else try { T otherValue = eliminationArray.visit(value,policy.range); if (otherValue == null) { return; } } Art of Multiprocessor Programming 195 Elimination Stack Push public void push(T value) { ... while (true) { Value pushed and range to try if (tryPush(node)) { return; } else try { T otherValue = eliminationArray.visit(value,policy.range); if (otherValue == null) { return; } } Art of Multiprocessor Programming 196 Elimination Stack Push public void push(T value) { ... Only pop() leaves null, while (true) { so elimination was successful if (tryPush(node)) { return; } else try { T otherValue = eliminationArray.visit(value,policy.range); if (otherValue == null) { return; } } Art of Multiprocessor Programming 197 Elimination Stack Push public void push(T value) { ... Otherwise, retry push() on lock-free stack while (true) { if (tryPush(node)) { return; } else try { T otherValue = eliminationArray.visit(value,policy.range); if (otherValue == null) { return; } } Art of Multiprocessor Programming 198 Elimination Stack Pop public T pop() { ... while (true) { if (tryPop()) { return returnNode.value; } else try { T otherValue = eliminationArray.visit(null,policy.range; if (otherValue != null) { return otherValue; } } }} Art of Multiprocessor Programming 199 Elimination Stack Pop public T pop() { ... while (true) { if (tryPop()) { other thread is a push(), If value not null, return returnNode.value; } else so elimination succeeded try { T otherValue = eliminationArray.visit(null,policy.range; if ( otherValue != null) { return otherValue; } } }} Art of Multiprocessor Programming 200 Summary • We saw both lock-based and lock-free implementations of • queues and stacks • Don’t be quick to declare a data structure inherently sequential – Linearizable stack is not inherently sequential (though it is in worst case) • ABA is a real problem, pay attention Art of Multiprocessor Programming 201 This work is licensed under a Creative Commons AttributionShareAlike 2.5 License. • You are free: – to Share — to copy, distribute and transmit the work – to Remix — to adapt the work • Under the following conditions: – Attribution. You must attribute the work to “The Art of Multiprocessor Programming” (but not in any way that suggests that the authors endorse you or your use of the work). – Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license. • For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to – http://creativecommons.org/licenses/by-sa/3.0/. • Any of the above conditions can be waived if you get permission from the copyright holder. • Nothing in this license impairs or restricts the author's moral rights. Art of Multiprocessor Programming 202 Art of Multiprocessor Programming 203