Hashing and Natural Parallelism
Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Sequential Closed Hash Map
[Figure: table of 4 buckets with h(k) = k mod 4; bucket 0 holds 16, bucket 1 holds 9; 2 items]

Add an Item
[Figure: 7 is added to bucket 3; 3 items]

Add Another: Collision
[Figure: 4 is added to bucket 0, which already holds 16; 4 items]

More Collisions
[Figure: 15 is added to bucket 3, which already holds 7; 5 items]
Problem: buckets becoming too long

Resizing
• Grow the array from 4 to 8 buckets
• Adjust the hash function: h(k) = k mod 8
• h(4) = 4 mod 8, so 4 moves from bucket 0 to bucket 4
• h(7) = h(15) = 7 mod 8, so 7 and 15 move from bucket 3 to bucket 7
[Figure: after resizing, bucket 0 holds 16, bucket 1 holds 9, bucket 4 holds 4, bucket 7 holds 7 and 15]

Fields and Constructor
    public class SimpleHashSet {
      protected LockFreeList[] table;          // array of lock-free lists

      public SimpleHashSet(int capacity) {     // capacity: initial size
        table = new LockFreeList[capacity];    // allocate memory
        for (int i = 0; i < capacity; i++)
          table[i] = new LockFreeList();       // initialization
      }
      …

Add Method
    public boolean add(Object key) {
      // use the object's hash code to pick a bucket
      // (as written, % yields a negative index if hashCode() is negative)
      int hash = key.hashCode() % table.length;
      // call the bucket's add() method
      return table[hash].add(key);
    }

No Brainer?
• We just saw a
  – Simple
  – Lock-free
  – Concurrent
  hash-based set implementation
• What's not to like?
• We don't know how to resize …

Is Resizing Necessary?
• Constant-time method calls require
  – Constant-length buckets
  – Table size proportional to set size
  – As set grows, must be able to resize

Set Method Mix
• Typical load
  – 90% contains()
  – 9% add()
  – 1% remove()
• Growing is important
• Shrinking not so much

When to Resize?
• Many reasonable policies. Here's one.
• Pick a threshold on the number of items in a bucket
• Global threshold
  – When ≥ ¼ of the buckets exceed this value
• Bucket threshold
  – When any bucket exceeds this value

Coarse-Grained Locking
• Good parts
  – Simple
  – Hard to mess up
• Bad parts
  – Sequential bottleneck

Fine-Grained Locking
[Figure: root points to a 4-bucket table; bucket 0 holds 4, 8; bucket 1 holds 9, 17; bucket 3 holds 7, 11]
Each lock is associated with one bucket

Resize This
• Acquire locks in ascending order
• Make sure the root reference didn't change between the resize decision and lock acquisition
• Allocate a new super-sized table
[Figure: the items are rehashed into an 8-bucket table; bucket 0 holds 8; bucket 1 holds 9, 17; bucket 3 holds 11; bucket 4 holds 4; bucket 7 holds 7]
Striped locks: each lock is now associated with two buckets

Observations
• We grow the table, but not the locks
  – Resizing the lock array is tricky …
• We use sequential lists
  – Not LockFreeList lists
  – If we're locking anyway, why pay?
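The resize steps above (take every lock in ascending order, re-check the table reference, allocate a table twice as large, rehash) can be sketched as follows. This is a minimal illustration, not the slides' code: the class `StripedSet`, its method names, and the use of `ReentrantLock` and `Math.floorMod` (chosen here so negative hash codes still give a valid index) are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Sketch only: StripedSet and its members are hypothetical names.
class StripedSet {
    private final ReentrantLock[] locks;     // lock array: never resized
    private volatile List<Object>[] table;   // bucket array: replaced by resize()

    @SuppressWarnings("unchecked")
    StripedSet(int capacity) {
        locks = new ReentrantLock[capacity];
        table = (List<Object>[]) new List[capacity];
        for (int i = 0; i < capacity; i++) {
            locks[i] = new ReentrantLock();
            table[i] = new ArrayList<>();    // sequential list: the stripe lock protects it
        }
    }

    boolean add(Object key) {
        ReentrantLock lock = locks[Math.floorMod(key.hashCode(), locks.length)];
        lock.lock();
        try {
            // table cannot change here: resize() needs every lock, including ours
            List<Object> bucket = table[Math.floorMod(key.hashCode(), table.length)];
            if (bucket.contains(key)) return false;
            return bucket.add(key);
        } finally {
            lock.unlock();
        }
    }

    boolean contains(Object key) {
        ReentrantLock lock = locks[Math.floorMod(key.hashCode(), locks.length)];
        lock.lock();
        try {
            return table[Math.floorMod(key.hashCode(), table.length)].contains(key);
        } finally {
            lock.unlock();
        }
    }

    @SuppressWarnings("unchecked")
    void resize() {
        List<Object>[] old = table;                // table seen at decision time
        for (ReentrantLock l : locks) l.lock();    // ascending order avoids deadlock
        try {
            if (old != table) return;              // someone else already resized
            List<Object>[] bigger = (List<Object>[]) new List[old.length * 2];
            for (int i = 0; i < bigger.length; i++) bigger[i] = new ArrayList<>();
            for (List<Object> bucket : old)
                for (Object key : bucket)
                    bigger[Math.floorMod(key.hashCode(), bigger.length)].add(key);
            table = bigger;
        } finally {
            for (ReentrantLock l : locks) l.unlock();
        }
    }

    int capacity() { return table.length; }
}
```

Because the table length stays a multiple of the lock-array length, two keys that land in the same bucket always map to the same lock, which is why the lock array can stay fixed while the table grows.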

Fine-Grained Hash Set
    public class FGHashSet {
      protected RangeLock[] lock;   // array of locks
      protected List[] table;       // array of buckets

      public FGHashSet(int capacity) {
        // initially the same number of locks and buckets
        table = new List[capacity];
        lock = new RangeLock[capacity];
        for (int i = 0; i < capacity; i++) {
          lock[i] = new RangeLock();
          table[i] = new LinkedList();
        }
      }
      …

The add() Method
    public boolean add(Object key) {
      int keyHash = key.hashCode() % lock.length;      // which lock?
      synchronized (lock[keyHash]) {                   // acquire the lock
        int tabHash = key.hashCode() % table.length;   // which bucket?
        return table[tabHash].add(key);                // call that bucket's add() method
      }
    }

Another Locking Structure
• add, remove, contains
  – Lock table in shared mode
• resize
  – Locks table in exclusive mode

Read/Write Locks
    public interface ReadWriteLock {
      Lock readLock();    // returns the associated read lock
      Lock writeLock();   // returns the associated write lock
    }

Lock Safety Properties
• Read lock:
  – Locks out writers
  – Allows concurrent readers
• Write lock:
  – Locks out writers
  – Locks out readers

Let's Try to Design a Read/Write Lock
• Safety
  – If readers > 0 then writer == false
  – If writer == true then readers == 0
• Liveness?
  – Will a continual stream of readers lock out writers?
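The two safety conditions can be sketched as a small monitor-based lock. This is only an illustration under stated assumptions: the class `SimpleRWLock`, its method names, and the `tryLockWrite()` helper are hypothetical and are not the `ReadWriteLock` interface shown earlier. Note that this version admits a reader whenever no writer holds the lock, so a steady stream of readers can keep a writer waiting; it enforces safety only.

```java
// Sketch only: SimpleRWLock and its methods are hypothetical names.
class SimpleRWLock {
    private int readers = 0;          // threads currently holding the read lock
    private boolean writer = false;   // true while a thread holds the write lock

    // Wait on this monitor, turning interruption into an unchecked exception.
    private void await() {
        try {
            wait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    synchronized void lockRead() {
        while (writer) await();       // readers are locked out only by a writer
        readers++;
    }

    synchronized void unlockRead() {
        readers--;
        if (readers == 0) notifyAll();   // last reader out: wake waiting writers
    }

    synchronized void lockWrite() {
        while (writer || readers > 0) await();   // writer needs exclusive access
        writer = true;
    }

    // Non-blocking variant, handy for checking the invariants from one thread.
    synchronized boolean tryLockWrite() {
        if (writer || readers > 0) return false;
        writer = true;
        return true;
    }

    synchronized void unlockWrite() {
        writer = false;
        notifyAll();                  // wake both readers and writers
    }

    synchronized int readers() { return readers; }
    synchronized boolean writing() { return writer; }
}
```

Every state change happens inside the object's monitor, so `readers > 0` and `writer == true` can never hold at the same time.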

FIFO R/W Lock
• As soon as a writer requests a lock
• No more readers accepted
• Current readers "drain" from lock
• Writer gets in

The Story So Far
• Resizing is the hard part
• Fine-grained locks
  – Striped locks cover a range (not resized)
• Read/Write locks
  – FIFO property tricky

Stop-the-World Resizing
• Resizing stops all concurrent operations
• What about an incremental resize?
• Must avoid locking the table
• A lock-free table + incremental resizing?

Lock-Free Resizing Problem
[Figure: 4-bucket table holding 4, 8, 9, 7, 15; 12 is added to bucket 0 and the table is extended to 8 buckets, where 4 and 12 belong in bucket 4]
• Need to extend table
• To remove and then add even a single item, a single-location CAS is not enough
• We need a new idea…

Don't Move the Items
• Move the buckets instead!
• Keep all items in a single, lock-free list
• Buckets are short-cuts into the list
[Figure: list 16, 4, 9, 7, 15 with buckets 0 to 3 pointing into it]

Recursive Split Ordering
[Figure: list 0 4 2 6 1 5 3 7. Bucket 0 points to the head; bucket 1 is added at the 1/2 mark; buckets 2 and 3 are added at the 1/4 and 3/4 marks]
List entries are sorted in an order that allows recursive splitting. How?
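One ordering with this property comes from sorting keys by their bit-reversed value. The sketch below reproduces the list 0 4 2 6 1 5 3 7 for 3-bit keys; the class `SplitOrderDemo` and its method names are hypothetical, while `Integer.reverse` is the standard library call.

```java
import java.util.Arrays;
import java.util.Comparator;

// Sketch only: SplitOrderDemo and its methods are hypothetical names.
class SplitOrderDemo {
    // Reverse the low `bits` bits of key, e.g. with 3 bits: 001 -> 100.
    static int reverseLowBits(int key, int bits) {
        return Integer.reverse(key) >>> (32 - bits);
    }

    // All keys of the given width (1 to 31 bits), sorted by bit-reversed value.
    static Integer[] splitOrder(int bits) {
        Integer[] keys = new Integer[1 << bits];
        for (int k = 0; k < keys.length; k++) keys[k] = k;
        Arrays.sort(keys, Comparator.comparingInt(k -> reverseLowBits(k, bits)));
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOrder(3)));  // [0, 4, 2, 6, 1, 5, 3, 7]
    }
}
```

In this order, keys sharing their lowest bit are contiguous, and within each half so are keys sharing their two lowest bits, which is what lets a bucket be split by inserting one new entry point into the middle of its range.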

Recursive Split Ordering
[Figure: list 0 4 2 6 1 5 3 7. With two buckets, bucket 0 covers the keys with LSB 0 (0 4 2 6) and bucket 1 the keys with LSB 1 (1 5 3 7). With four buckets, the ranges are LSBs 00 (0 4), 10 (2 6), 01 (1 5), 11 (3 7)]
LSB = least significant bit

Split-Order
• If the table size is $2^i$,
  – Bucket b contains keys k
    • k = b (mod $2^i$)
  – bucket index consists of key's i LSBs

When Table Splits
• Some keys stay
  – b = k mod $2^{i+1}$
• Some move
  – b + $2^i$ = k mod $2^{i+1}$
• Determined by (i+1)st bit
  – Counting backwards
• Key must be accessible from both
  – Keys that will move must come later

A Bit of Magic
Real keys:    0    4    2    6    1    5    3    7
              000  100  010  110  001  101  011  111
Split-order:  0    1    2    3    4    5    6    7
              000  001  010  011  100  101  110  111
Real key 1 is in the 4th location
Just reverse the order of the key bits

Split Ordered Hashing
Order according to reversed bits
Reversed bits:  000  001  010  011  100  101  110  111
Key:            0    4    2    6    1    5    3    7
[Figure: buckets 0 to 3 point into this list]

Bucket Relations
[Figure: list 0 4 2 6 1 5 3 7; bucket 0 is the parent, bucket 1 the child]

Parent Always Provides a Short Cut
[Figure: a search starts from the parent bucket's position in the list 0 4 2 6 1 5 3 7; buckets 0 to 3]

Sentinel Nodes
[Figure: list 16, 4, 9, 7, 15 with buckets 0 to 3]
Problem: how to remove a node pointed to by 2 sources using CAS
[Figure: list 0, 16, 4, 1, 9, 3, 7, 15, where 0, 1 and 3 are sentinel nodes]
Solution: use a sentinel node for each bucket

Sentinel vs Regular Keys
• Want sentinel key for i ordered
  – before all keys that hash to bucket i
  – after all keys that hash to bucket (i-1)

Splitting a Bucket
• We can now split a bucket
• In a lock-free manner
• Using two CAS() calls ...
  – One to add the sentinel to the list
  – The other to point from the bucket to the sentinel

Initialization of Buckets
[Figure: list 0, 16, 4, 1, 9, 7, 15 with buckets 0 and 1; sentinel 3 is then inserted before 7 and bucket 3 points to it]
Need to initialize bucket 3 to split bucket 1

Adding 10
[Figure: list with sentinels 0, 1 and 3]
10 hashes to 2
Must initialize bucket 2 before adding 10

Recursive Initialization
To add 7 to the list:
• 7 = 3 mod 4: must initialize bucket 3
• 3 = 1 mod 2: must initialize bucket 1
• Could be log n depth
• But expected depth is constant

Lock-Free List
    int makeRegularKey(int key) {
      // regular key: set high-order bit to 1 and reverse
      return reverse(key | 0x80000000);
    }
    int makeSentinelKey(int key) {
      // sentinel key: simply reverse (high-order bit is 0)
      return reverse(key);
    }

Main List
• Lock-Free List from earlier class
• With some minor variations

Lock-Free List
    public class LockFreeList {
      // change: add takes a key argument
      public boolean add(Object object, int key) {...}
      public boolean remove(int k) {...}
      public boolean contains(int k) {...}
      // inserts sentinel with key if not already present …
      // … returns new list starting with sentinel (shares with parent)
      public LockFreeList(LockFreeList parent, int key) {...}
    }

Split-Ordered Set: Fields
    public class SOSet {
      // for simplicity treat table as a big array;
      // in practice, want something that grows dynamically
      protected LockFreeList[] table;
      // how much of the table array are we actually using?
      protected AtomicInteger tableSize;
      // track set size so we know when to resize
      protected AtomicInteger setSize;

      public SOSet(int capacity) {
        // initially use a single bucket, and size is zero
        table = new LockFreeList[capacity];
        table[0] = new LockFreeList();
        tableSize = new AtomicInteger(2);   // one build of this slide shows 1 here
        setSize = new AtomicInteger(0);
      }

add()
    public boolean add(Object object) {
      int hash = object.hashCode();
      int bucket = hash % tableSize.get();         // pick a bucket
      int key = makeRegularKey(hash);              // non-sentinel split-ordered key
      // get reference to bucket's sentinel, initializing if necessary
      LockFreeList list = getBucketList(bucket);
      // call bucket's add() method with reversed key
      if (!list.add(object, key))
        return false;                              // no change? We're done.
      resizeCheck();                               // time to resize?
      return true;
    }

Resize
• Divide set size by total number of buckets
• If quotient exceeds threshold
  – Double tableSize field
  – Up to fixed limit

Initialize Buckets
• Buckets originally null
• If you find one, initialize it
• Go to bucket's parent
  – Earlier nearby bucket
  – Recursively initialize if necessary
• Constant expected work

Recall: Recursive Initialization
To add 7 to the list:
• 7 = 3 mod 4: must initialize bucket 3
• 3 = 1 mod 2: must initialize bucket 1
• Could be log n depth
• Expected depth is constant

Initialize Bucket
    void initializeBucket(int bucket) {
      // find parent, recursively initialize if needed
      int parent = getParent(bucket);
      if (table[parent] == null)
        initializeBucket(parent);
      // prepare key for new sentinel
      int key = makeSentinelKey(bucket);
      // insert sentinel if not present, and get back reference to rest of list
      LockFreeList list = new LockFreeList(table[parent], key);
      // point the bucket at its sentinel (this assignment is not on the slide;
      // without it the new list is never reachable from the table)
      table[bucket] = list;
    }

Correctness
• Linearizable concurrent set
• Theorem: O(1) expected time
  – No more than O(1) items expected between two sentinels on average
  – Lazy initialization causes at most O(1) expected recursion depth in initializeBucket()
• Can eliminate use of sentinels

Closed (Chained) Hashing
• Advantages:
  – with N buckets, M items, uniform h
  – retains good performance as table density (M/N) increases → less resizing
• Disadvantages:
  – dynamic memory allocation
  – bad cache behavior (no locality)
Oh, did we mention that cache behavior matters on a multicore?

Open Addressed Hashing
– Keep all items in an array
– One per bucket
– If you have collisions, find an empty bucket and use it
– Must know how to find items if they are outside their bucket

Linear Probing*
[Figure: array of buckets 1 to 20 with h(x) marked and x stored further along; H = 7]
contains(x) – search linearly from h(x) to h(x) + H, where H is recorded in the bucket.
*Attributed to Amdahl…

Linear Probing
[Figure: array of buckets 1 to 20; the recorded H changes from 3 to 6]
add(x) – put in first empty bucket, and update H.

Linear Probing
• Open address means M ≪ N
• Expected items in bucket same as chaining
• Expected distance till open slot: $\frac{1}{2}\left(1 + \frac{1}{(1 - M/N)^2}\right)$
  – M/N = 0.5: search 2.5 buckets
  – M/N = 0.9: search 50 buckets

Linear Probing
• Advantages:
  – Good locality → fewer cache misses
• Disadvantages:
  – As M/N increases → more cache misses
    • searching 10s of unrelated buckets
    • "Clustering" of keys into neighboring buckets
  – As computation proceeds, "contamination" by deleted items → more cache misses

Cuckoo Hashing
[Figure: two arrays of buckets 1 to 20, with h1(x), h2(x) and h2(y) marked]
add(x) – if h1(x) and h2(x) are full, evict y and move it to h2(y) ≠ h2(x). Then place x in its place.
But cycles can form

Cuckoo Hashing
• Advantages:
  – contains(x): deterministic 2 buckets
  – No clustering or contamination
• Disadvantages:
  – 2 tables
  – h_i(x) are complex
  – As M/N increases → relocation cycles
  – Above M/N = 0.5 add() does not work!

Concurrent Cuckoo Hashing
• Need to either lock whole chain of displacements (see book)
• or have extra space to keep items as they are displaced step by step.

Hopscotch Hashing
• Single array, simple hash function
• Idea: define neighborhood of original bucket
• In neighborhood items found quickly
• Use sequences of displacements to move items into their neighborhood

Hopscotch Hashing
[Figure: array of buckets 1 to 20 with h(x) marked, x inside the hop-range, and hop-info bitmap 1 0 1 0; H = 4]
contains(x) – search in at most H buckets (the hop-range) based on hop-info bitmap. In practice pick H to be 32.

Hopscotch Hashing
[Figure: array of buckets 1 to 20 holding x, u, w, v, z, r, s, with hop-info bitmaps]
add(x) – probe linearly to find open slot. Move the empty slot via sequence of displacements into the hop-range of h(x).
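The contains() and add() steps described above can be sketched for the sequential case as follows. This is a minimal illustration under several assumptions: the class `HopscotchSet` and its members are hypothetical names, keys are plain ints hashed by `floorMod`, the hop-range is fixed at H = 4 to keep examples small, the capacity should be at least 2·H, and there is no remove() or resize (add() throws when no displacement is possible).

```java
// Sketch only: HopscotchSet and its members are hypothetical names.
class HopscotchSet {
    static final int H = 4;          // hop-range (the slides suggest 32 in practice)
    private final int[] keys;
    private final boolean[] used;
    private final int[] hopInfo;     // bit j of hopInfo[b]: slot b+j holds a key whose home is b

    HopscotchSet(int capacity) {
        keys = new int[capacity];
        used = new boolean[capacity];
        hopInfo = new int[capacity];
    }

    private int slot(int base, int offset) {
        return Math.floorMod(base + offset, keys.length);   // wrap around the array
    }

    boolean contains(int x) {
        int h = slot(x, 0);                                  // home bucket of x
        for (int j = 0; j < H; j++)
            if (((hopInfo[h] >> j) & 1) != 0 && keys[slot(h, j)] == x) return true;
        return false;
    }

    boolean add(int x) {
        if (contains(x)) return false;
        int h = slot(x, 0);
        int d = 0;                                           // distance from h to a free slot
        while (d < keys.length && used[slot(h, d)]) d++;     // probe linearly
        if (d == keys.length) throw new IllegalStateException("table full");
        while (d >= H) {                                     // hole is outside the hop-range
            int moved = hopCloser(slot(h, d));
            if (moved == 0) throw new IllegalStateException("no displacement possible; resize needed");
            d -= moved;
        }
        int free = slot(h, d);
        keys[free] = x;
        used[free] = true;
        hopInfo[h] |= 1 << d;
        return true;
    }

    // Fill the hole at `free` with an earlier key that may legally live there.
    // Returns how many slots the hole moved toward the front (0 if stuck).
    private int hopCloser(int free) {
        for (int back = H - 1; back >= 1; back--) {
            int c = slot(free, -back);                       // candidate home bucket
            for (int j = 0; j < back; j++) {
                if (((hopInfo[c] >> j) & 1) != 0) {
                    int from = slot(c, j);
                    keys[free] = keys[from];                 // displace the key into the hole
                    used[free] = true;
                    used[from] = false;                      // its old slot is the new hole
                    hopInfo[c] = (hopInfo[c] & ~(1 << j)) | (1 << back);
                    return back - j;
                }
            }
        }
        return 0;
    }
}
```

For example, with capacity 16, adding 1, 2, 3, 4 and then 17 (home bucket 1) finds the first open slot at index 5, outside bucket 1's hop-range; key 2 is displaced from slot 2 to slot 5, still within its own range, and 17 takes slot 2.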

Hopscotch Hashing
• contains
  – wait-free, just look in neighborhood
• add
  – expected distance same as in linear probing
• resize
  – neighborhood full less likely as H → log n
  – one word hop-info bitmap, or use smaller H and default to linear probing of bucket

Advantages
• Good locality and cache behavior
• As table density (M/N) increases → less resizing
• Move cost to add() from contains(x)
• Easy to parallelize

Recall: Concurrent Chained Hashing
[Figure: array of buckets 1 to 20 protected by striped locks]
Lock for add() and unsuccessful contains()

Concurrent Simple Hopscotch
[Figure: array of buckets 1 to 20 with h(x) marked]
contains() is wait-free

Concurrent Simple Hopscotch
[Figure: array of buckets 1 to 20 holding u, z, v, x, r, with hop-info bitmap and timestamp ts]
add(x) – lock bucket, mark empty slot using CAS, add x erasing mark

Concurrent Simple Hopscotch
[Figure: an item is displaced and the timestamp of its bucket goes from ts to ts+1]
add(x) – lock bucket, mark empty slot using CAS, lock bucket and update timestamp of bucket being displaced before erasing old value

Concurrent Simple Hopscotch
[Figure: x not found]
contains(x) – traverse using bitmap; if ts has not changed after the traversal, the item is not found. If ts changed, after a few tries traverse through all items.

Is performance dominated by cache behavior?
• Test on multicores and uniprocessors:
  – Sun 64 way Niagara II, and
  – Intel 3GHz Xeon
• Benchmarks pre-allocated memory to eliminate effects of memory management

[Figure: Sequential SPARC throughput; 90% contain, 5% insert, 5% remove; with memory pre-allocated. y-axis: ops/ms; x-axis: table density, 0.1 to 0.9. Series: Hopscotch_D, Hopscotch_ND, LinearProbing, Chained, Cuckoo]

[Figure: Sequential SPARC high-density throughput; 90% contain, 5% insert, 5% remove. y-axis: ops/ms; x-axis: table density, 0.9 to 0.99. Series: Hopscotch_D, Hopscotch_ND, LinearProbing, Chained]

[Figure: Sequential CoreDuo throughput; 90% contain, 5% insert, 5% remove. y-axis: ops/ms; x-axis: table density, 0.1 to 0.9. Series: Hopscotch_D, Hopscotch_ND, LinearProbing, Chained, Cuckoo. Annotation: "Cuckoo stops here"]

[Figure: Concurrent SPARC throughput at 90% density; 70% contain, 15% insert, 15% remove. y-axis: ops/ms; x-axis: CPUs, 1 to 64. Series: Hopscotch_D, Chained_PRE, Chained_MTM. Annotations: "with memory pre-allocated", "with allocation"]

[Figure: Concurrent SPARC at 90% density; cache misses per unsuccessful lookup. y-axis: miss/ops; x-axis: CPUs, 1 to 64. Series: Hopscotch_D, Chained_PRE, Chained_MTM]

Summary
• Chained hash with striped locking is simple and effective in many cases
• Hopscotch with striped locking has great cache behavior
• If incremental resizing is needed, go for split-ordered

This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.
• You are free:
  – to Share: to copy, distribute and transmit the work
  – to Remix: to adapt the work
• Under the following conditions:
  – Attribution. You must attribute the work to "The Art of Multiprocessor Programming" (but not in any way that suggests that the authors endorse you or your use of the work).
  – Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.
• For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to
  – http://creativecommons.org/licenses/by-sa/3.0/.
• Any of the above conditions can be waived if you get permission from the copyright holder.
• Nothing in this license impairs or restricts the author's moral rights.