Linked Lists: Locking, Lock-Free, and Beyond …
Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
Art of Multiprocessor Programming© Herlihy-Shavit

Last Lecture: Spin-Locks
[Diagram: spin lock guarding the critical section (CS); resets lock upon exit]

This Lecture
• Introduce four “patterns”
  – Bag of tricks …
  – Methods that work more than once …
• For highly-concurrent objects
• Goal:
  – Concurrent access
  – More threads, more throughput

Linked List
• Illustrate these patterns …
• Using a list-based Set
  – Common application
  – Building block for other apps

Set Interface
• Unordered collection of items
• No duplicates
• Methods
  – add(x) put x in set
  – remove(x) take x out of set
  – contains(x) tests if x in set

List Node
  public class Node {
    public T item;
    public int key;
    public Node next;
  }

The List-Based Set
[Diagram: -∞ → a → b → c → +∞]
• Sorted with sentinel nodes (min & max possible keys)

Reasoning about Concurrent Objects
• Invariant
  – Property that always holds
• Established by
  – True when object is created
  – Truth preserved by each method
    • Each step of each method

Specifically …
• Invariants preserved by
  – add()
  – remove()
  – contains()
• Most steps are trivial
  – Usually one step tricky
  – Often linearization point

Interference
• Invariants make sense only if
  – methods considered
  – are the only modifiers
• Language encapsulation helps
  – List nodes not visible outside class

Interference
• Freedom from interference needed even for removed nodes
  – Some algorithms traverse removed nodes
  – Careful with malloc() & free()!
• Garbage-collection helps here

Rep Invariant
• Which concrete values meaningful?
  – Sorted?
  – Duplicates?
• Rep invariant
  – Characterizes legal concrete reps
  – Preserved by methods
  – Relied on by methods

Blame Game
• Rep invariant is a contract
• Suppose
  – add() leaves behind 2 copies of x
  – remove() removes only 1
• Which one is incorrect?
  – If rep invariant says no duplicates
    • add() is incorrect
  – Otherwise
    • remove() is incorrect

Rep Invariant (partly)
• Sentinel nodes
  – tail reachable from head
• Sorted
• No duplicates

Abstraction Map
• S(head) =
  – { x | there exists a such that
    • a reachable from head and
    • a.item = x
  – }

Sequential List Based Set
[Diagram: Add() on list a, c, d inserting b; Remove() on list a, b, c removing b]

Coarse Grained Locking
[Diagram: one lock guards the whole list a, b, c, d; waiting threads pile up behind it (“honk! honk!”)]
• Simple but hotspot + bottleneck

Coarse-Grained Locking
• Easy, same as synchronized methods
  – “One lock to rule them all …”
• Simple, clearly correct
  – Deserves respect!
• Works poorly with contention
  – Queue locks help
  – But bottleneck still an issue

Fine-grained Locking
• Requires careful thought
  – “Do not meddle in the affairs of wizards, for they are subtle and quick to anger”
• Split object into pieces
  – Each piece has own lock
  – Methods that work on disjoint pieces need not exclude each other

Hand-over-Hand locking
[Diagram: a traversal of list a → b → c that locks each node before releasing its predecessor]

Removing a Node
[Diagram: remove(b) on list a → b → c → d; b is unlinked, leaving a → c → d]

Removing a Node
[Diagram: concurrent remove(b) and remove(c) on list a → b → c → d]
Uh, Oh
[Diagram: after remove(c) and remove(b) both finish, the list reads a → c → d. Bad news: c is still in the list]

Problem
• To delete node b
  – Swing node a’s next field to c
• Problem is,
  – Someone could delete c concurrently

Insight
• If a node is locked
  – No one can delete node’s successor
• If a thread locks
  – Node to be deleted
  – And its predecessor
  – Then it works

Hand-Over-Hand Again
[Diagram: remove(b) on list a → b → c → d, locking hand-over-hand until b is reached (“Found it!”)]
Hand-Over-Hand Again
[Diagram: remove(b) completes, leaving a → c → d]

Removing a Node
[Diagram: concurrent remove(b) and remove(c) on list a → b → c → d with hand-over-hand locking; remove(c) completes first, leaving a → b → d, then remove(b) completes, leaving a → d]

Adding Nodes
• To add node e
  – Must lock predecessor
  – Must lock successor
• Neither can be deleted
  – (Is successor lock actually required?)
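The hand-over-hand protocol from the preceding slides can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the book's code: the class name FineGrainedList, the helper methods lockWindow() and unlockWindow(), and the use of int keys with Integer.MIN_VALUE and Integer.MAX_VALUE as sentinels are all invented for the example. It locks both the predecessor and the current node for add() as well as remove(), the conservative choice the slides describe.

```java
import java.util.concurrent.locks.ReentrantLock;

// Sketch of a sorted, list-based set of ints with one lock per node.
// All names are hypothetical; keys are assumed to lie strictly between
// Integer.MIN_VALUE and Integer.MAX_VALUE, which serve as sentinels.
final class FineGrainedList {
    private static final class Node {
        final int key;
        Node next;                                  // guarded by this node's lock
        final ReentrantLock lock = new ReentrantLock();
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    private final Node head =
        new Node(Integer.MIN_VALUE, new Node(Integer.MAX_VALUE, null));

    // Hand-over-hand traversal: returns {pred, curr}, both locked,
    // with pred.key < key <= curr.key.
    private Node[] lockWindow(int key) {
        Node pred = head;
        pred.lock.lock();
        Node curr = pred.next;
        curr.lock.lock();
        while (curr.key < key) {
            pred.lock.unlock();     // release the trailing lock ...
            pred = curr;            // ... only while still holding curr
            curr = curr.next;
            curr.lock.lock();
        }
        return new Node[] { pred, curr };
    }

    private static void unlockWindow(Node[] w) {
        w[1].lock.unlock();
        w[0].lock.unlock();
    }

    boolean add(int key) {
        Node[] w = lockWindow(key);
        try {
            if (w[1].key == key) return false;      // already present
            w[0].next = new Node(key, w[1]);        // link between pred and curr
            return true;
        } finally {
            unlockWindow(w);
        }
    }

    boolean remove(int key) {
        Node[] w = lockWindow(key);
        try {
            if (w[1].key != key) return false;      // absent
            w[0].next = w[1].next;                  // swing pred.next past curr
            return true;
        } finally {
            unlockWindow(w);
        }
    }

    boolean contains(int key) {
        Node[] w = lockWindow(key);
        try {
            return w[1].key == key;
        } finally {
            unlockWindow(w);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        FineGrainedList set = new FineGrainedList();
        for (int i = 1; i <= 4; i++) set.add(i);
        Thread t1 = new Thread(() -> set.remove(2));
        Thread t2 = new Thread(() -> set.remove(3));
        t1.start(); t2.start();
        t1.join(); t2.join();
        // With both pred and curr locked, neither removal can be lost.
        System.out.println(set.contains(2) + " " + set.contains(3));
    }
}
```

Because a thread holds the lock on the node it is about to unlink and on that node's predecessor, the remove(b)/remove(c) interleaving from the “Uh, Oh” slides cannot occur: whichever removal reaches its window second has to wait until the first has finished.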
Same Abstraction Map
• S(head) =
  – { x | there exists a such that
    • a reachable from head and
    • a.item = x
  – }

Rep Invariant
• Easy to check that
  – tail always reachable from head
  – Nodes sorted, no duplicates

Drawbacks
• Better than coarse-grained lock
  – Threads can traverse in parallel
• Still not ideal
  – Long chain of acquire/release
  – Inefficient

Optimistic Synchronization
• Find nodes without locking
• Lock nodes
• Check that everything is OK

Optimistic: Traverse without Locking
[Diagram: add(c) scans list a → b → d → e without locks and finds its position between b and d (“Aha!”)]

Optimistic: Lock and Load
[Diagram: add(c) locks b and d]

What Can Possibly Go Wrong?
[Diagram: a concurrent remove(b) unlinks b while add(c) is about to use it]

Validate (1)
[Diagram: add(c) checks the list from head. Yes, b still reachable from head]

What Else Can Go Wrong?
[Diagram: a concurrent add(b’) inserts b’ between b and d while add(c) is in progress]
Optimistic: Validate (2)
[Diagram: add(c) checks the link. Yes, b still points to d]

Optimistic: Linearization Point
[Diagram: add(c) links c between b and d]

Same Abstraction Map
• S(head) =
  – { x | there exists a such that
    • a reachable from head and
    • a.item = x
  – }

Invariants
• Careful: we may traverse deleted nodes
• But we establish properties by
  – Validation
  – After we lock target nodes

Removing an Absent Node
[Diagram: remove(c) scans list a → b → d → e and stops between b and d (“Aha!”)]

Validate (1)
[Diagram: Yes, b still reachable from head]

Validate (2)
[Diagram: Yes, b still points to d]

OK Computer
[Diagram: c is not between b and d, so remove(c) returns false]

Optimistic List
• Limited hot-spots
  – Targets of add(), remove(), contains()
  – No contention on traversals
• Moreover
  – Traversals are wait-free
  – Food for thought …

So Far, So Good
• Much less lock acquisition/release
  – Performance
  – Concurrency
• Problems
  – Need to traverse list twice
  – contains() method acquires locks
    • Most common method call

Evaluation
• Optimistic is effective if
  – cost of scanning twice without locks is less than
  – cost of scanning once with locks
• Drawback
  – contains() acquires locks
  – 90% of calls in many apps

Lazy List
• Like optimistic, except
  – Scan once
  – contains(x) never locks …
• Key insight
  – Removing nodes causes trouble
  – Do it “lazily”

Lazy List
• remove()
  – Scans list
    (as before)
  – Locks predecessor & current (as before)
• Logical delete
  – Marks current node as removed (new!)
• Physical delete
  – Redirects predecessor’s next (as before)

Lazy Removal
[Diagram: list a → b → c → d; c goes from present in list, to logically deleted, to physically deleted, leaving a → b → d]

Lazy List
• All Methods
  – Scan through locked and marked nodes
  – Removing a node doesn’t slow down other method calls …
• Must still lock pred and curr nodes.

Validation
• No need to rescan list!
• Check that pred is not marked
• Check that curr is not marked
• Check that pred points to curr

Business as Usual
[Diagram: remove(b) on list a → b → c: check a not marked, check a still points to b, logical delete, physical delete]

New Abstraction Map
• S(head) =
  – { x | there exists node a such that
    • a reachable from head and
    • a.item = x and
    • a is unmarked
  – }

Invariant
• If not marked then item in the set
• and reachable from head
• and if not yet traversed it is reachable from pred

Summary: Wait-free Contains
[Diagram: list nodes a, b, c, d, e, each with a mark bit; one node is marked (1), the others are unmarked (0)]
• Use mark bit + fact that list is ordered
  1. Not marked: in the set
  2. Marked or missing: not in the set

Lazy List
[Diagram: same list with mark bits]
• Lazy add() and remove() + wait-free contains()

Evaluation
• Good:
  – contains() doesn’t lock
  – In fact, it’s wait-free!
  – Good because typically high % contains()
  – Uncontended calls don’t re-traverse
• Bad
  – Contended calls do re-traverse
  – Traffic jam if one thread delays

Traffic Jam
• Any concurrent data structure based on mutual exclusion has a weakness
• If one thread
  – Enters critical section
  – And “eats the big muffin”
    • Cache miss, page fault, descheduled …
    • Software error, …
  – Everyone else using that lock is stuck!

Reminder: Lock-Free Data Structures
• No matter what …
  – Some thread will complete method call
  – Even if others halt at malicious times
  – Weaker than wait-free, yet
• Implies that
  – You can’t use locks (why?)
  – Um, that’s why they call it lock-free

Lock-free Lists
• Next logical step
• Eliminate locking entirely
• contains() wait-free and add() and remove() lock-free
• Use only compareAndSet()
• What could go wrong?
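One answer to “What could go wrong?” can be replayed in a single thread. The sketch below is an illustration under stated assumptions, not the book's algorithm: the class CasRemovalReplay, its node types, and its method names are hypothetical, and the “concurrent” steps are simply executed in one fixed unlucky order. It first shows remove(b) and remove(c) on the list a → b → c → d with plain compareAndSet() on next fields, then the same steps with java.util.concurrent.atomic.AtomicMarkableReference, where the mark bit on a node's next field means the node is logically deleted.

```java
import java.util.concurrent.atomic.AtomicMarkableReference;
import java.util.concurrent.atomic.AtomicReference;

// Single-threaded replay of one unlucky interleaving of remove(b) and
// remove(c) on the list a -> b -> c -> d. All names are hypothetical.
final class CasRemovalReplay {
    static final class PlainNode {
        final String item;
        final AtomicReference<PlainNode> next;
        PlainNode(String item, PlainNode next) {
            this.item = item;
            this.next = new AtomicReference<>(next);
        }
    }

    static final class MarkedNode {
        final String item;
        // Mark bit on this field = "this node is logically deleted".
        final AtomicMarkableReference<MarkedNode> next;
        MarkedNode(String item, MarkedNode next) {
            this.item = item;
            this.next = new AtomicMarkableReference<>(next, false);
        }
    }

    // Plain CAS: both removals report success, yet c stays linked.
    static boolean plainCasLeavesC() {
        PlainNode d = new PlainNode("d", null);
        PlainNode c = new PlainNode("c", d);
        PlainNode b = new PlainNode("b", c);
        PlainNode a = new PlainNode("a", b);
        boolean removeB = a.next.compareAndSet(b, c); // remove(b): a.next b -> c
        boolean removeC = b.next.compareAndSet(c, d); // remove(c): updates b, which is already unlinked
        return removeB && removeC && a.next.get() == c;
    }

    // Mark bit and reference CASed together: the late update of b fails,
    // and c ends up logically, then physically, deleted.
    static boolean markedCasRemovesC() {
        MarkedNode d = new MarkedNode("d", null);
        MarkedNode c = new MarkedNode("c", d);
        MarkedNode b = new MarkedNode("b", c);
        MarkedNode a = new MarkedNode("a", b);
        boolean markC = c.next.compareAndSet(d, d, false, true);    // remove(c): logical delete
        boolean markB = b.next.compareAndSet(c, c, false, true);    // remove(b): logical delete
        boolean unlinkC = b.next.compareAndSet(c, d, false, false); // fails: b.next is marked
        boolean unlinkB = a.next.compareAndSet(b, c, false, false); // remove(b): physical delete
        boolean cMarked = c.next.isMarked();                        // c reachable but marked
        boolean cleanup = a.next.compareAndSet(c, d, false, false); // a later traversal finishes the job
        return markC && markB && !unlinkC && unlinkB && cMarked
            && cleanup && a.next.getReference() == d;
    }

    public static void main(String[] args) {
        System.out.println("plain CAS leaves c linked: " + plainCasLeavesC());
        System.out.println("marked CAS removes c: " + markedCasRemovesC());
    }
}
```

In the marked version the failed CAS is harmless: c already carries its mark, so it counts as removed under the lazy list's abstraction map, and any later traversal can unlink it.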
Adding a Node
[Diagram: a new node is linked into the list with a single CAS on its predecessor’s next field]

Removing a Node
[Diagram: list a → b → c → d; remove(b) issues a CAS on a’s next field while remove(c) issues a CAS on b’s next field]

Look Familiar?
[Diagram: list a, b, c, d after remove(c) and remove(b): bad news]

Problem
• Method updates node’s next field
• After node has been removed

Solution
• Use AtomicMarkableReference
• Atomically
  – Swing reference and
  – Update flag
• Remove in two steps
  – Set mark bit in next field
  – Redirect predecessor’s pointer

Removing a Node
[Diagram: concurrent remove(b) and remove(c) on list a → b → c → d using marked next fields; one CAS fails, and the list ends up as a → d]

Traversing the List
• Q: what do you do when you find a “logically” deleted node in your path?
• A: finish the job.
  – CAS the predecessor’s next field
  – Proceed (repeat as needed)

Lock-Free Traversal
[Diagram: a traversal of list a, b, c, d issues a CAS on a predecessor’s next field (“Uh-oh”)]

Summary: Lock-free Removal
• Logical removal = set mark bit
• Use CAS to verify pointer is correct
• Not enough!
• Physical removal: CAS pointer

Lock-free Removal
[Diagram: list a, b, c, e with mark bits; c is marked; new node d]
• Logical removal = set mark bit
• Problem: d not added to list …
  – Node added before physical removal CAS
• Must prevent manipulation of removed node’s pointer

Our Solution: Combine Bit and Pointer
[Diagram: same list; mark bit and pointer stored together in each next field]
• Logical removal = set mark bit
• Mark bit and pointer are CASed together
• Physical removal CAS
• Fail CAS: node not added after logical removal

A Lock-free Algorithm
1. add() and remove() physically remove marked nodes
2. Wait-free find() traverses both marked and removed nodes

Performance
• On a 16-node shared-memory machine
• Benchmark throughput of Java list-based Set algorithms
• Vary % of contains() method calls

High Contains Ratio
[Plot: lock-free, lazy list, coarse-grained, and fine-grained (lock-coupling) algorithms]

Low Contains Ratio
[Plot: lock-free, lazy list, coarse-grained, and fine-grained (lock-coupling) algorithms]

As Contains Ratio Increases
[Plot: lock-free, lazy list, coarse-grained, and fine-grained (lock-coupling) algorithms; x-axis: % contains()]

Summary
• Coarse-grained locking
• Fine-grained locking
• Optimistic synchronization
• Lock-free synchronization

“To Lock or Not to Lock”
• Locking vs. non-blocking: extremist views on both sides
• The answer: nobler to compromise, combine locking and non-blocking
  – Example: Lazy list combines blocking add() and remove() and a wait-free contains()
  – Blocking/non-blocking is a property of a method

This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.
• You are free:
  – to Share: to copy, distribute and transmit the work
  – to Remix: to adapt the work
• Under the following conditions:
  – Attribution. You must attribute the work to “The Art of Multiprocessor Programming” (but not in any way that suggests that the authors endorse you or your use of the work).
  – Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.
• For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to
  – http://creativecommons.org/licenses/by-sa/3.0/.
• Any of the above conditions can be waived if you get permission from the copyright holder.
• Nothing in this license impairs or restricts the author's moral rights.