CONCURRENT HASHING AND NATURAL PARALLELISM
Lecture by Sagi Marcovich
Based on Maurice Herlihy & Nir Shavit, The Art of Multiprocessor Programming, Chapter 13.

Hashing - Reminder
Hashing maps data of arbitrary size to data of fixed size. The mapping is computed by a hash function, and its outputs are called hash values (hashes). (© Wikipedia: The Free Encyclopedia, "Hash Function".)

The Set Interface
The Set interface provides the following methods, which return Boolean values:
- add(x) adds x to the set. Returns true if x was absent, and false otherwise.
- remove(x) removes x from the set. Returns true if x was present, and false otherwise.
- contains(x) returns true if x is present, and false otherwise.
(A small Java sketch of this interface appears below, just before the Base Hash Set.)

Designing Set Implementations
"When designing set implementations, we need to keep the following principle in mind: we can buy more memory, but we cannot buy more time." (Herlihy & Shavit)

Hash Sets
A hash set is an efficient way to implement a Set, using hashing:
- contains(), add() and remove() calls take O(1) average time.
- It uses O(n) memory, where n is the number of items stored.
A hash set is typically implemented using an array, called the table, and a hash function h. Two main issues arise:
- How do we choose the hash function? (Not discussed here; we will use a simple modulo hash function.)
- What do we do about collisions, i.e., two values x ≠ y such that h(x) = h(y)?

Hash Sets - Collisions
- Open addressing: each table entry refers to a single item. Collisions are resolved by applying alternative hash functions to probe alternative table entries (e.g., double hashing, known from Data Structures 1).
- Closed addressing: each table entry refers to a set of items, traditionally called a bucket. Colliding items are placed in the same bucket. This is the approach we discuss in this lecture.

Hash Sets - Resizing
In both kinds of algorithms, it is sometimes necessary to resize the table:
- In open-addressing algorithms, the table may become too full to find alternative table entries.
- In closed-addressing algorithms, buckets may become too large to search efficiently.

Hash Sets - Extensible Hashing
Anecdotal evidence suggests that in most applications, sets see roughly the following distribution of method calls: 90% contains(), 9% add(), 1% remove(). Sets are therefore more likely to grow than to shrink, so we focus here on extensible hashing, in which hash sets can only grow.

Natural Parallelism
So far we discussed how to extract parallelism from data structures that seemed sequential and provided few opportunities for parallelism: lists, queues, stacks, and so on. Concurrent hashing takes the opposite approach: it seems to be "naturally parallelizable", i.e., disjoint-access-parallel: concurrent method calls are likely to access disjoint locations.
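For reference, here is the Set interface above rendered as a minimal Java interface. The generic shape Set<T> is an assumption; the lecture only lists the methods.

// A minimal Set interface matching the method descriptions above.
public interface Set<T> {
    boolean add(T x);      // true iff x was absent
    boolean remove(T x);   // true iff x was present
    boolean contains(T x); // true iff x is present
}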
Base Hash Set
We start by defining a base hash set implementation common to all the concurrent closed-addressing hash sets.

public abstract class BaseHashSet<T> {
    protected List<T>[] table;
    protected int setSize;

    public BaseHashSet(int capacity) {
        setSize = 0;
        table = (List<T>[]) new List[capacity];
        for (int i = 0; i < capacity; i++) {
            table[i] = new ArrayList<T>();
        }
    }
    ...
}

BaseHashSet<T> implements add(x), contains(x) and remove(x), but does not implement the abstract methods acquire(x), release(x), policy() and resize(). (remove(x) is analogous to the two methods below; a sketch of it appears after the Coarse-Grained Hash Set code.)

public boolean contains(T x) {
    acquire(x);
    try {
        int myBucket = x.hashCode() % table.length;
        return table[myBucket].contains(x);
    } finally {
        release(x);
    }
}

public boolean add(T x) {
    boolean result = false;
    acquire(x);
    try {
        int myBucket = x.hashCode() % table.length;
        if (!table[myBucket].contains(x)) { // add(x) must return false if x is already present
            table[myBucket].add(x);
            result = true;
        }
        setSize = result ? setSize + 1 : setSize;
    } finally {
        release(x);
    }
    if (policy())
        resize();
    return result;
}

Base Hash Set - Abstract Methods
acquire(x) acquires the locks necessary to manipulate item x, and release(x) releases them. policy() decides whether to resize the set, and resize() doubles the capacity of the table[] array.

Hash Sets - policy()
The policy() and resize() methods make sure that the hash table's method calls take constant expected time. This holds only if traversing a bucket (a linked list) takes constant time, i.e., if buckets have constant expected length. So, when should we resize? There are many strategies, for example:
- When the average bucket size exceeds a fixed threshold.
- When more than 1/4 of the buckets exceed a per-bucket threshold, or a single bucket exceeds a global threshold.

Concurrent Closed-Addressing Hash Sets
We are going to learn four concurrent closed-addressing hash set data structures:
- Coarse-Grained Hash Set
- Striped Hash Set
- Refinable Hash Set
- Lock-Free Hash Set

Coarse-Grained Hash Set
Synchronization is provided by a single reentrant lock, so parallelism is limited: even disjoint-access operations cannot run in parallel.

public class CoarseHashSet<T> extends BaseHashSet<T> {
    final Lock lock;

    CoarseHashSet(int capacity) {
        super(capacity);
        lock = new ReentrantLock();
    }

    public final void acquire(T x) {
        lock.lock();
    }

    public void release(T x) {
        lock.unlock();
    }
    ...
}
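CoarseHashSet inherits contains(), add() and remove() from BaseHashSet. The slides show contains() and add() but not remove(x); the following is a minimal sketch in the same acquire/try/finally style, an assumption rather than the book's verbatim code.

public boolean remove(T x) {
    acquire(x);
    try {
        int myBucket = x.hashCode() % table.length;
        boolean result = table[myBucket].remove(x); // List.remove(Object) returns true iff x was present
        setSize = result ? setSize - 1 : setSize;
        return result;
    } finally {
        release(x);
    }
}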
For the Coarse-Grained hash set we use an "average bucket size" policy. resize() creates a new table and moves all the items from the previous table into it, rehashing them with the new table length.

public boolean policy() {
    return setSize / table.length > 4;
}

public void resize() {
    int oldCapacity = table.length;
    lock.lock();
    try {
        if (oldCapacity != table.length) {
            return; // someone beat us to it
        }
        int newCapacity = 2 * oldCapacity;
        List<T>[] oldTable = table;
        table = (List<T>[]) new List[newCapacity];
        for (int i = 0; i < newCapacity; i++)
            table[i] = new ArrayList<T>();
        for (List<T> bucket : oldTable) {
            for (T x : bucket) {
                table[x.hashCode() % table.length].add(x);
            }
        }
    } finally {
        lock.unlock();
    }
}

Striped Hash Set
Instead of using a single lock to synchronize the entire set, we split the set into independently synchronized pieces. This technique is called lock striping. The set is initialized with an array of locks and an array of buckets, both of size L (= the initial capacity). When resizing, the table grows but the array of locks does not. Denote the table size by N. Lock i eventually protects every table entry j with j = i (mod L).
(Figure: a striped hash set with N = 16 and L = 8; lock i = 5 covers both locations equal to 5 modulo L, i.e., entries 5 and 13.)

The lock array is final:

public class StripedHashSet<T> extends BaseHashSet<T> {
    final ReentrantLock[] locks;

    public StripedHashSet(int capacity) {
        super(capacity);
        locks = new ReentrantLock[capacity]; // one lock per bucket of the initial table
        for (int j = 0; j < locks.length; j++) {
            locks[j] = new ReentrantLock();
        }
    }

    public final void acquire(T x) {
        locks[x.hashCode() % locks.length].lock();
    }

    public void release(T x) {
        locks[x.hashCode() % locks.length].unlock();
    }

Resizing is a "stop-the-world" operation: resize() acquires all the locks, in ascending order, which ensures that resize() cannot deadlock with any other operation. It releases them in ascending order as well. Is that necessary? (It is not: once all the locks are held, the release order cannot cause a deadlock; only the acquisition order matters.)

public void resize() {
    int oldCapacity = table.length;
    for (Lock lock : locks) {
        lock.lock();
    }
    try {
        if (oldCapacity != table.length) {
            return; // someone beat us to it
        }
        int newCapacity = 2 * oldCapacity;
        List<T>[] oldTable = table;
        table = (List<T>[]) new List[newCapacity];
        for (int i = 0; i < newCapacity; i++)
            table[i] = new ArrayList<T>();
        for (List<T> bucket : oldTable) {
            for (T x : bucket) {
                table[x.hashCode() % table.length].add(x);
            }
        }
    } finally {
        for (Lock lock : locks) {
            lock.unlock();
        }
    }
}

There are two reasons not to grow the lock array every time we grow the table:
- Associating a lock with every table entry could consume too much space, especially when tables are large and contention is low.
- While resizing the table is straightforward, resizing the lock array while it is in use is more complex (as we will see in a few minutes).
The Striped hash set therefore provides a medium level of granularity, between the Coarse-Grained hash set and the Refinable hash set.
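To see the striping in action, here is a minimal usage sketch. It is illustrative only: it assumes Integer items, and it assumes StripedHashSet also defines a policy() method like the coarse-grained one, which the slides do not show.

public class StripedDemo {
    public static void main(String[] args) throws InterruptedException {
        // 8 buckets and 8 locks; the lock array stays at length 8 forever
        StripedHashSet<Integer> set = new StripedHashSet<>(8);

        // 3 % 8 = 3 and 12 % 8 = 4, so these adds acquire different stripe
        // locks and may run in parallel
        Thread t1 = new Thread(() -> set.add(3));
        Thread t2 = new Thread(() -> set.add(12));
        t1.start(); t2.start();
        t1.join(); t2.join();

        System.out.println(set.contains(3) && set.contains(12)); // true
    }
}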
Refinable Hash Set
What if we want to refine the granularity of locking as the table size grows, so that the number of locations covered by a single stripe does not keep growing? Clearly, if we want to resize the lock array, we need to rely on another form of synchronization. Resizing is rare, so our principal goal is to devise a way to permit the lock array to be resized without substantially increasing the cost of normal method calls. The solution: an AtomicMarkableReference<Thread>.

Atomic Markable Reference - Reminder
An AtomicMarkableReference<T> is an object that encapsulates both a reference to an object of type T and a Boolean mark. The fields can be updated atomically, either together or individually:

// tests and updates both the mark and reference fields
public boolean compareAndSet(T expectedReference, T newReference,
                             boolean expectedMark, boolean newMark);
// updates the mark if the reference field has the expected value
public boolean attemptMark(T expectedReference, boolean newMark);
// returns the reference and stores the mark at position 0 of the argument array
public T get(boolean[] marked);

Refinable Hash Set
We introduce a globally shared owner field that combines a Boolean value with a reference to a thread. We use the owner as a mutual exclusion flag between the resize() method and any of the add() methods. Normally, the Boolean value is false, meaning the set is not in the middle of resizing.

public class RefinableHashSet<T> extends BaseHashSet<T> {
    AtomicMarkableReference<Thread> owner;
    volatile ReentrantLock[] locks;

    public RefinableHashSet(int capacity) {
        super(capacity);
        locks = new ReentrantLock[capacity];
        for (int i = 0; i < capacity; i++) {
            locks[i] = new ReentrantLock();
        }
        owner = new AtomicMarkableReference<Thread>(null, false);
    }
    ...
}

acquire(x) spins until no one else is resizing the set, locks x's stripe, and then checks that no resize happened in the meantime; if one did, it releases the lock and tries again.

public void acquire(T x) {
    boolean[] mark = {true};
    Thread me = Thread.currentThread();
    Thread who;
    while (true) {
        do { // spin until no one else is resizing the set
            who = owner.get(mark);
        } while (mark[0] && who != me);
        ReentrantLock[] oldLocks = locks;
        ReentrantLock oldLock = oldLocks[x.hashCode() % oldLocks.length];
        oldLock.lock();
        who = owner.get(mark);
        if ((!mark[0] || who == me) && locks == oldLocks) {
            return; // success: no resize happened since we took the lock
        } else {
            oldLock.unlock(); // failed, try again
        }
    }
}

public void release(T x) {
    locks[x.hashCode() % locks.length].unlock();
}
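To make the owner protocol concrete, here is a small stand-alone illustration of the AtomicMarkableReference calls that acquire() and resize() build on. It is a sketch only; the class name and printed values are made up.

import java.util.concurrent.atomic.AtomicMarkableReference;

public class OwnerDemo {
    public static void main(String[] args) {
        AtomicMarkableReference<Thread> owner =
                new AtomicMarkableReference<>(null, false);
        Thread me = Thread.currentThread();

        // claim the resizer role: expect (null, false), install (me, true)
        boolean claimed = owner.compareAndSet(null, me, false, true);

        boolean[] mark = {false};
        Thread who = owner.get(mark);   // now who == me and mark[0] == true
        System.out.println(claimed + " " + (who == me) + " " + mark[0]);

        owner.set(null, false);         // hand the role back: not resizing anymore
    }
}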
resize() tries to set itself as the current resizer with a compareAndSet on owner. Failure means another thread is already resizing, so we can simply return. On success, it waits until every bucket is unlocked (quiesce()); this is enough, because no other operation can lock a bucket while the mark is true.

protected void quiesce() {
    for (ReentrantLock lock : locks) {
        while (lock.isLocked()) {} // spin until this stripe's lock is free
    }
}

public void resize() {
    int oldCapacity = table.length;
    boolean[] mark = {false};
    int newCapacity = 2 * oldCapacity;
    Thread me = Thread.currentThread();
    if (owner.compareAndSet(null, me, false, true)) { // set myself as the current resizer
        try {
            if (table.length != oldCapacity) {
                return; // someone else resized first
            }
            quiesce();
            List<T>[] oldTable = table;
            table = (List<T>[]) new List[newCapacity];
            for (int i = 0; i < newCapacity; i++)
                table[i] = new ArrayList<T>();
            locks = new ReentrantLock[newCapacity];
            for (int j = 0; j < locks.length; j++) {
                locks[j] = new ReentrantLock();
            }
            initializeFrom(oldTable); // rehash the old items into the new table (not shown)
        } finally {
            owner.set(null, false); // return the mark to its normal state
        }
    }
}

Lock-Free Hash Set
It is time to take the next step and make the hash table lock-free. The problem: the resize operation is a "stop-the-world" operation, and it is very difficult to implement without locks. Atomic operations work on only a single memory location, which makes it difficult to move a node atomically from one linked list to another.

The solution: flip the conventional hashing structure on its head. Instead of moving the items among the buckets, move the buckets among the items!

More specifically, keep all items sorted in a single lock-free linked list (we use LockFreeList as a black box). The buckets are simply references into the list. As the list grows, we add bucket references so that no item is ever too far from the start of its bucket.

Recursive Split-Ordering
Assume the number of buckets (= capacity) is N = 2^i. Bucket b "contains" the items whose hash code k satisfies k = b (mod N). Let's try it on the items 0, 1, 4, 5, 6, 7:
(Figure: with N = 2, bucket 0 holds the even items 0, 4, 6 and bucket 1 holds the odd items 1, 5, 7.)
(Figure: with N = 4, bucket 0 holds 0 and 4, bucket 1 holds 1 and 5, bucket 2 holds 6, and bucket 3 holds 7.)

We have just defined a total order on the items, called recursive split ordering: a key's position is determined by its bit-reversed value. Now for some magic: here are the same items, recursive split-ordered.
(Figure: the list 0, 4, 6, 1, 5, 7, whose bit-reversed 3-bit codes are 000, 001, 011, 100, 101, 111; bucket 0 points at 0, bucket 2 at 6, bucket 1 at 1, and bucket 3 at 7.)
Notice that going from N = 2 to N = 4 only adds bucket references: the items do not move!
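A quick way to convince yourself of the split order above is to sort the items by their bit-reversed values. The sketch below does this with 3-bit keys; the helper reverse3 and the item values are illustrative only.

import java.util.Arrays;
import java.util.Comparator;

public class SplitOrderDemo {
    // reverse the 3 low-order bits of k (e.g., 4 = 100b becomes 001b = 1)
    static int reverse3(int k) {
        return ((k & 1) << 2) | (k & 2) | ((k >> 2) & 1);
    }

    public static void main(String[] args) {
        Integer[] items = {0, 1, 4, 5, 6, 7};
        Arrays.sort(items, Comparator.comparingInt(SplitOrderDemo::reverse3));
        System.out.println(Arrays.toString(items)); // prints [0, 4, 6, 1, 5, 7]
    }
}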
Split-Ordered Hash Set
Formally, when the capacity is 2^i, an item with hash code k belongs to bucket k mod 2^i, so bucket membership is determined by the i low-order binary digits of k. When the capacity grows to 2^(i+1), bucket b splits between bucket b and bucket b + 2^i:
- Items for which k = b (mod 2^(i+1)) remain in bucket b.
- Items for which k = b + 2^i (mod 2^(i+1)) migrate to bucket b + 2^i.
The recursive split ordering ensures that these two groups are positioned one after the other in the list, so splitting is achieved simply by setting bucket b + 2^i to reference the first item of the second group.

So, a split-ordered hash set is an array of buckets, where each bucket is a reference into a lock-free list. The nodes of the list are sorted by their bit-reversed hash codes.

Split-Ordered Hash Set - Corner Cases
A bucket is initialized when it is first used. To avoid deleting a node that a bucket references, we add to the start of each bucket a sentinel node, which is never deleted. The sentinel node inserted for bucket b has hash key b. For example, with N = 4, the sentinel node of bucket 3 has hash key 3, so (using 8-bit codes in this example) its bit-reversed hash code is 11000000. Every ordinary node in bucket 3 has a bit-reversed hash code that starts with 11, which ensures that the sentinel node always stays first in its bucket. To keep sentinel nodes distinct from ordinary nodes with the same hash code, we set the most significant bit of every ordinary node's code to 1 before reversing it. (A sketch of this key construction appears after the LockFreeHashSet fields below.)

Split-Ordered Hash Set - add() Simulation
Let's add an item with hash code k = 10. Since 10 = 2 (mod 4), we must initialize bucket 2:
(a) Initially, bucket 2 is uninitialized.
(b) Bucket 2 is being initialized: first, an appropriate sentinel node is created and spliced into the list.
(c) Bucket 2 is set to reference the sentinel node.
(d) Finally, the ordinary node with k = 10 is inserted into bucket 2.

Split-Ordered Hash Set
BucketList<T> implements the lock-free list used by the split-ordered hash set, as explained earlier. An AtomicInteger is an integer that supports the atomic operations get(), getAndIncrement() and compareAndSet().

public class LockFreeHashSet<T> {
    protected BucketList<T>[] bucket;
    protected AtomicInteger bucketSize;
    protected AtomicInteger setSize;

    public LockFreeHashSet(int capacity) {
        bucket = (BucketList<T>[]) new BucketList[capacity];
        bucket[0] = new BucketList<T>();
        bucketSize = new AtomicInteger(2);
        setSize = new AtomicInteger(0);
    }
    ...
}
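The sentinel/ordinary distinction above comes down to how a node's key is computed before bit reversal. The sketch below shows one way to build such keys; the method names and mask values are modeled on the book's BucketList helper, but treat them here as assumptions rather than lecture code.

public class SplitOrderKeys {
    static final int MASK    = 0x00FFFFFF; // keep only the low-order bits of the hash code
    static final int HI_MASK = 0x80000000; // the bit that marks an ordinary (non-sentinel) node

    // Ordinary item: set the most significant bit, then bit-reverse,
    // so the reversed key always ends in 1.
    static int makeOrdinaryKey(Object x) {
        int code = x.hashCode() & MASK;
        return Integer.reverse(code | HI_MASK);
    }

    // Sentinel of bucket b: just the bit-reversed bucket index, so the
    // reversed key ends in 0 and orders before the bucket's ordinary keys.
    static int makeSentinelKey(int b) {
        return Integer.reverse(b & MASK);
    }
}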
In add(), the threshold check plays the role of the earlier policy(), and the compareAndSet both doubles bucketSize and ensures that no other resize happened in between. Doubling the bucket count is the entire new "resize()": the items themselves do not move.

public boolean add(T x) {
    int myBucket = BucketList.hashCode(x) % bucketSize.get();
    BucketList<T> b = getBucketList(myBucket);
    if (!b.add(x))
        return false;
    int setSizeNow = setSize.getAndIncrement();
    int bucketSizeNow = bucketSize.get();
    if (setSizeNow / bucketSizeNow > THRESHOLD)
        bucketSize.compareAndSet(bucketSizeNow, 2 * bucketSizeNow);
    return true;
}

private BucketList<T> getBucketList(int myBucket) {
    if (bucket[myBucket] == null)
        initializeBucket(myBucket);
    return bucket[myBucket];
}

Split-Ordered Hash Set - Array of Buckets
To avoid technical distractions, we kept the buckets in a large, fixed-size array. This design is obviously far from ideal. In practice, there are more efficient ways to represent the buckets, for example a multilevel tree.

Summary
We learned four concurrent closed-addressing hash set data structures:
- Coarse-Grained Hash Set: not practical; it allows no parallelism at all.
- Striped Hash Set: medium granularity; it spares us the heavy burden of growing the lock array with every resize, and saves space.
- Refinable Hash Set: fine granularity leads to great disjoint-access performance, but resizes are heavy "stop-the-world" operations.
- Lock-Free Hash Set: lock-free; it offers the best opportunity for disjoint-access parallel performance, but it is complicated and hard to implement.

Simulation
Simulation with a fixed 10,000,000 operations and capacity = 128, varying the contains() percentage and the number of threads, comparing all four concurrent closed-addressing hash sets. Except for the LockFreeHashSet, results are taken from: "A Lock-Free Wait-Free Hash Table", Dr. Cliff Click, Azul Systems.
(Charts: throughput at 75% contains() and at 95% contains(), for varying numbers of threads.)

QUESTIONS?

THE END