Hashing and Natural Parallelism
Companion slides for
The Art of Multiprocessor Programming
by Maurice Herlihy & Nir Shavit
Sequential Closed Hash Map
[Figure: table of 4 buckets with h(k) = k mod 4; bucket 0 holds 16 and bucket 1 holds 9 (2 items).]
Add an Item
[Figure: 7 is added to bucket 3 (7 mod 4 = 3); 3 items.]
Add Another: Collision
[Figure: 4 is added to bucket 0, which already holds 16; 4 items.]
More Collisions
[Figure: 15 is added to bucket 3, which already holds 7; 5 items.]
Problem: buckets becoming too long
Resizing
[Figure sequence: grow the array from 4 to 8 buckets and adjust the hash function from h(k) = k mod 4 to h(k) = k mod 8. The 5 items are then redistributed: h(4) = 4 mod 8, so 4 moves from bucket 0 to bucket 4; h(7) = h(15) = 7 mod 8, so 7 and 15 move from bucket 3 to bucket 7; 16 stays in bucket 0 and 9 stays in bucket 1.]
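The resize steps pictured above can be sketched sequentially as follows. This is a minimal illustration under stated assumptions, not code from the slides: the class name SeqHashSet, the restriction to int keys, and the bucketOf helper are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sequential chained hash set of int keys (illustration only).
class SeqHashSet {
    private List<List<Integer>> table = new ArrayList<>();

    SeqHashSet(int capacity) {
        for (int i = 0; i < capacity; i++) table.add(new ArrayList<>());
    }

    // h(k) = k mod table size; floorMod keeps the index non-negative.
    int bucketOf(int key) {
        return Math.floorMod(key, table.size());
    }

    boolean add(int key) {
        List<Integer> bucket = table.get(bucketOf(key));
        if (bucket.contains(key)) return false;
        bucket.add(key);
        return true;
    }

    boolean contains(int key) {
        return table.get(bucketOf(key)).contains(key);
    }

    // Grow the array to twice its size (which changes h), then rehash every item.
    void resize() {
        List<List<Integer>> old = table;
        table = new ArrayList<>();
        for (int i = 0; i < old.size() * 2; i++) table.add(new ArrayList<>());
        for (List<Integer> bucket : old)
            for (int key : bucket)
                table.get(bucketOf(key)).add(key);
    }
}
```

With the five items from the figures (16, 9, 7, 4, 15) in a table of 4 buckets, one call to resize() leaves 4 in bucket 4 and both 7 and 15 in bucket 7.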
Fields and Constructor
public class SimpleHashSet {
  // Array of lock-free lists
  protected LockFreeList[] table;
  // capacity is the initial size
  public SimpleHashSet(int capacity) {
    // Allocate memory
    table = new LockFreeList[capacity];
    // Initialization
    for (int i = 0; i < capacity; i++)
      table[i] = new LockFreeList();
  }
  …
Add Method
public boolean add(Object key) {
  // Use object hash code to pick a bucket;
  // floorMod keeps the index non-negative when hashCode() is negative
  int hash = Math.floorMod(key.hashCode(), table.length);
  // Call bucket’s add() method
  return table[hash].add(key);
}
No Brainer?
• We just saw a
– Simple
– Lock-free
– Concurrent hash-based set implementation
• What’s not to like?
• We don’t know how to resize …
Is Resizing Necessary?
• Constant-time method calls require
– Constant-length buckets
– Table size proportional to set size
– As set grows, must be able to resize
Set Method Mix
• Typical load
– 90% contains()
– 9% add()
– 1% remove()
• Growing is important
• Shrinking not so much
When to Resize?
• Many reasonable policies. Here’s one.
• Pick a threshold on the number of items in a bucket
• Global threshold
– Resize when ≥ ¼ of the buckets exceed this value
• Bucket threshold
– Resize when any bucket exceeds this value
Coarse-Grained Locking
• Good parts
– Simple
– Hard to mess up
• Bad parts
– Sequential bottleneck
Fine-grained Locking
[Figure: root references a table of 4 buckets (bucket 0: 4, 8; bucket 1: 9, 17; bucket 3: 7, 11).]
Each lock associated with one bucket
Resize This
• Acquire locks in ascending order
• Make sure root reference didn’t change between resize decision and lock acquisition
• Allocate new super-sized table
[Figure sequence: the 4 locks stay in place while the table grows from 4 to 8 buckets. After the items are moved, 8 is in bucket 0, 9 and 17 are in bucket 1, 11 is in bucket 3, 4 is in bucket 4, and 7 is in bucket 7.]
Striped Locks: each lock now associated with two buckets
Observations
• We grow the table, but not locks
– Resizing lock array is tricky …
• We use sequential lists
– Not LockFreeList lists
– If we’re locking anyway, why pay?
Fine-Grained Hash Set
public class FGHashSet {
  // Array of locks
  protected RangeLock[] lock;
  // Array of buckets
  protected List[] table;
  public FGHashSet(int capacity) {
    // Initially same number of locks and buckets
    table = new List[capacity];
    lock = new RangeLock[capacity];
    for (int i = 0; i < capacity; i++) {
      lock[i] = new RangeLock();
      table[i] = new LinkedList();
    }
  }
  …
The add() method
public boolean add(Object key) {
  // Which lock? (floorMod keeps the index non-negative)
  int keyHash = Math.floorMod(key.hashCode(), lock.length);
  // Acquire the lock
  synchronized (lock[keyHash]) {
    // Which bucket?
    int tabHash = Math.floorMod(key.hashCode(), table.length);
    // Call that bucket’s add() method
    return table[tabHash].add(key);
  }
}
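Putting the striped-lock pieces together, a complete set with resizing might look like the sketch below. It is an illustration under assumptions rather than the slides' FGHashSet: the class name StripedSet, the use of ReentrantLock in place of RangeLock, the AtomicInteger size counter, and the THRESHOLD of 4 items per bucket on average are all hypothetical choices. The lock array is never resized, and the table length is always a multiple of the number of locks, so each bucket stays covered by exactly one lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical striped-lock hash set (illustration only).
class StripedSet<T> {
    private static final int THRESHOLD = 4;        // assumed average bucket length limit
    private final ReentrantLock[] locks;           // never resized
    private volatile List<T>[] table;              // replaced on resize
    private final AtomicInteger size = new AtomicInteger(0);

    @SuppressWarnings("unchecked")
    StripedSet(int capacity) {
        locks = new ReentrantLock[capacity];
        table = (List<T>[]) new List[capacity];
        for (int i = 0; i < capacity; i++) {
            locks[i] = new ReentrantLock();
            table[i] = new ArrayList<>();
        }
    }

    public boolean add(T x) {
        int h = x.hashCode();
        ReentrantLock lock = locks[Math.floorMod(h, locks.length)];
        List<T>[] seen;
        lock.lock();
        try {
            seen = table;                          // cannot change while we hold a stripe
            List<T> bucket = seen[Math.floorMod(h, seen.length)];
            if (bucket.contains(x)) return false;
            bucket.add(x);
        } finally {
            lock.unlock();
        }
        // Decide to resize only after releasing our stripe, to avoid deadlock.
        if (size.incrementAndGet() / seen.length > THRESHOLD) resize(seen);
        return true;
    }

    public boolean contains(T x) {
        int h = x.hashCode();
        ReentrantLock lock = locks[Math.floorMod(h, locks.length)];
        lock.lock();
        try {
            List<T>[] t = table;
            return t[Math.floorMod(h, t.length)].contains(x);
        } finally {
            lock.unlock();
        }
    }

    @SuppressWarnings("unchecked")
    private void resize(List<T>[] old) {
        for (ReentrantLock l : locks) l.lock();    // acquire locks in ascending order
        try {
            if (table != old) return;              // table changed since the resize decision
            List<T>[] bigger = (List<T>[]) new List[old.length * 2];
            for (int i = 0; i < bigger.length; i++) bigger[i] = new ArrayList<>();
            for (List<T> bucket : old)
                for (T x : bucket)
                    bigger[Math.floorMod(x.hashCode(), bigger.length)].add(x);
            table = bigger;
        } finally {
            for (ReentrantLock l : locks) l.unlock();
        }
    }

    int tableLength() { return table.length; }
}
```

The check `table != old` plays the role of the "root reference didn’t change" test: if another thread resized between the decision and the lock acquisition, this thread simply backs off.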
Another Locking Structure
• add, remove, contains
– Lock table in shared mode
• resize
– Lock table in exclusive mode
Read/Write Locks
public interface ReadWriteLock {
  // Returns associated read lock
  Lock readLock();
  // Returns associated write lock
  Lock writeLock();
}
Lock Safety Properties
• Read lock:
– Locks out writers
– Allows concurrent readers
• Write lock:
– Locks out writers
– Locks out readers
Let’s Try to Design a Read/Write Lock
Read/Write Lock
• Safety
– If readers > 0 then writer == false
– If writer == true then readers == 0
• Liveness?
– Will a continual stream of readers …
– Lock out writers?
FIFO R/W Lock
• As soon as a writer requests a lock
• No more readers accepted
• Current readers “drain” from lock
• Writer gets in
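The four steps above can be sketched with a single monitor. This is a minimal illustration, assuming a hypothetical class SimpleFifoRWLock with its own method names; it is not the lock from the book, and it does not implement the java.util.concurrent ReadWriteLock interface. The writer flag is raised as soon as a writer requests the lock, which is what stops new readers from entering.

```java
// Hypothetical monitor-based read/write lock (illustration only).
class SimpleFifoRWLock {
    private int readers = 0;          // readers currently holding the lock
    private boolean writer = false;   // true from a writer's request until its release

    // Wait on this monitor; callers must hold the monitor (all callers are synchronized).
    private void await() {
        try {
            wait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public synchronized void lockRead() {
        while (writer) await();       // no more readers accepted once a writer has asked
        readers++;
    }

    // Non-blocking variant: returns false instead of waiting.
    public synchronized boolean tryLockRead() {
        if (writer) return false;
        readers++;
        return true;
    }

    public synchronized void unlockRead() {
        readers--;
        if (readers == 0) notifyAll(); // last reader has drained
    }

    public synchronized void lockWrite() {
        while (writer) await();       // one writer at a time
        writer = true;                // request: shuts out new readers from now on
        while (readers > 0) await();  // current readers drain from the lock
    }

    public synchronized void unlockWrite() {
        writer = false;
        notifyAll();
    }

    synchronized int readers() { return readers; }
}
```

Because a continual stream of readers cannot get past a raised writer flag, the writer is not locked out, which addresses the liveness question on the previous slide.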
The Story So Far
• Resizing is the hard part
• Fine-grained locks
– Striped locks cover a range (not resized)
• Read/Write locks
– FIFO property tricky
Stop The World Resizing
• Resizing stops all concurrent operations
• What about an incremental resize?
• Must avoid locking the table
• A lock-free table + incremental resizing?
Lock-Free Resizing Problem
[Figure sequence: a table of 4 buckets (bucket 0: 4, 8; bucket 1: 9; bucket 3: 7, 15) receives 12 in bucket 0 and needs to be extended to 8 buckets. Items 4 and 12 must then move from bucket 0 to bucket 4.]
To remove and then add even a single item, a single-location CAS is not enough.
We need a new idea…
Don’t move the items
• Move the buckets instead!
• Keep all items in a single, lock-free list
• Buckets are short-cuts into the list
[Figure: a single list holding 16, 4, 9, 7, 15, with buckets 0 to 3 as references into the list.]
Recursive Split Ordering
[Figure sequence: the list holds the keys in the order 0, 4, 2, 6, 1, 5, 3, 7. With one bucket, bucket 0 points to the head. With two buckets, bucket 1 points to the 1/2 mark (key 1). With four buckets, bucket 2 points to the 1/4 mark (key 2) and bucket 3 points to the 3/4 mark (key 3).]
List entries sorted in an order that allows recursive splitting. How?
[Figure sequence: with two buckets, the first half of the list (0, 4, 2, 6) holds the keys with LSB 0 and the second half (1, 5, 3, 7) the keys with LSB 1. With four buckets, the quarters hold the keys whose two LSBs are 00 (0, 4), 10 (2, 6), 01 (1, 5), and 11 (3, 7).]
LSB = Least Significant Bit
Split-Order
• If the table size is $2^i$,
– Bucket b contains keys k
• $k = b \pmod{2^i}$
– bucket index consists of key's i LSBs
When Table Splits
• Some keys stay
– $b = k \bmod 2^{i+1}$
• Some move
– $b + 2^i = k \bmod 2^{i+1}$
• Determined by (i+1)st bit
– Counting backwards
• Key must be accessible from both
– Keys that will move must come later
A Bit of Magic
Real keys (list order):   0    4    2    6    1    5    3    7
Real keys in binary:      000  100  010  110  001  101  011  111
Split-order position:     0    1    2    3    4    5    6    7
Split-order in binary:    000  001  010  011  100  101  110  111
Real key 1 is in the 4th location
Just reverse the order of the key bits
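The bit-reversal trick is easy to check by machine. The sketch below is an illustration only: the class SplitOrderDemo and its helpers reverseBits and splitOrder are hypothetical names, and the demo works on small fixed-width keys (between 1 and 31 bits) rather than on full hash codes.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical demo of split-ordering for small keys (illustration only).
class SplitOrderDemo {
    // Reverse the low `bits` bits of key, e.g. bits = 3: 001 -> 100.
    static int reverseBits(int key, int bits) {
        return Integer.reverse(key) >>> (32 - bits);
    }

    // Keys 0 .. 2^bits - 1 sorted by their reversed bit patterns.
    static int[] splitOrder(int bits) {
        Integer[] keys = new Integer[1 << bits];
        for (int k = 0; k < keys.length; k++) keys[k] = k;
        Arrays.sort(keys, Comparator.comparingInt((Integer k) -> reverseBits(k, bits)));
        int[] out = new int[keys.length];
        for (int i = 0; i < out.length; i++) out[i] = keys[i];
        return out;
    }

    public static void main(String[] args) {
        // prints [0, 4, 2, 6, 1, 5, 3, 7]
        System.out.println(Arrays.toString(splitOrder(3)));
    }
}
```

Sorting by reversed bits puts keys that share their low-order bits next to each other, which is why each bucket's keys form one contiguous run of the list.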
Split Ordered Hashing
Order according to reversed bits
[Figure: list positions 000 to 111 hold the keys 0, 4, 2, 6, 1, 5, 3, 7; buckets 0 to 3 point into the list.]
Bucket Relations
[Figure: in the list 0, 4, 2, 6, 1, 5, 3, 7, bucket 0 is marked as the parent and bucket 1 as the child.]
Parent Always Provides a Short Cut
[Figure: a search enters the list 0, 4, 2, 6, 1, 5, 3, 7 through one of buckets 0 to 3.]
Sentinel Nodes
[Figure: list 16, 4, 9, 7, 15 with buckets 0 to 3 pointing directly at items.]
Problem: how to remove a node pointed to by 2 sources using CAS
[Figure: the same list with sentinel nodes added: 0, 16, 4, 1, 9, 3, 7, 15; each bucket points to its sentinel.]
Solution: use a sentinel node for each bucket
Sentinel vs Regular Keys
• Want sentinel key for i ordered
– before all keys that hash to bucket i
– after all keys that hash to bucket (i-1)
Splitting a Bucket
• We can now split a bucket
• In a lock-free manner
• Using two CAS() calls ...
– One to add the sentinel to the list
– The other to point from the bucket to the sentinel
Initialization of Buckets
[Figure: list 0, 16, 4, 1, 9, 7, 15 with buckets 0 and 1 initialized. After the table grows to 4 buckets, sentinel 3 is inserted between 9 and 7.]
Need to initialize bucket 3 to split bucket 1
Adding 10
[Figure: 10 hashes to bucket 2, so sentinel 2 is inserted into the list and bucket 2 points to it.]
Must initialize bucket 2 before adding 10
Recursive Initialization
[Figure: list 0, 8, 12 with only bucket 0 initialized in a table of 4 buckets.]
To add 7 to the list:
• 7 = 3 mod 4: must initialize bucket 3
• 3 = 1 mod 2: must initialize bucket 1
Could be log n depth
But expected depth is constant
Lock-Free List
int makeRegularKey(int key) {
  // Regular key: set high-order bit to 1 and reverse
  return reverse(key | 0x80000000);
}
int makeSentinelKey(int key) {
  // Sentinel key: simply reverse (high-order bit is 0)
  return reverse(key);
}
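To see why these two key functions place each sentinel just before its bucket's items, here is a small check. It rests on two assumptions that the slides do not state: that reverse is the 32-bit reversal provided by Integer.reverse, and that the list compares keys as unsigned values. The class SplitOrderKeys and the helper precedes are hypothetical names.

```java
// Hypothetical ordering check for split-ordered keys (illustration only).
class SplitOrderKeys {
    // Regular key: set the high-order bit, then reverse, so the lowest bit ends up 1.
    static int makeRegularKey(int key) {
        return Integer.reverse(key | 0x80000000);
    }

    // Sentinel key: simply reverse the bucket index, so the lowest bit stays 0.
    static int makeSentinelKey(int bucket) {
        return Integer.reverse(bucket);
    }

    // Assumed list order: keys compared as unsigned 32-bit integers.
    static boolean precedes(int a, int b) {
        return Integer.compareUnsigned(a, b) < 0;
    }
}
```

For a table of size 4, the sentinel for bucket 1 precedes the regular keys for 9 and 5 (both are 1 mod 4), and those in turn precede the sentinel for bucket 3, matching the list layout in the earlier figures.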
Main List
• Lock-Free List from earlier class
• With some minor variations
Lock-Free List
public class LockFreeList {
  // Change: add takes key argument
  public boolean add(Object object, int key) {...}
  public boolean remove(int k) {...}
  public boolean contains(int k) {...}
  // Inserts sentinel with key if not already present …
  // … returns new list starting with sentinel (shares with parent)
  public LockFreeList(LockFreeList parent, int key) {...};
}
Split-Ordered Set: Fields
public class SOSet {
  // For simplicity treat table as big array …
  // In practice, want something that grows dynamically
  protected LockFreeList[] table;
  // How much of table array are we actually using?
  protected AtomicInteger tableSize;
  // Track set size so we know when to resize
  protected AtomicInteger setSize;
  public SOSet(int capacity) {
    table = new LockFreeList[capacity];
    table[0] = new LockFreeList();
    // Initially use single bucket, and size is zero
    tableSize = new AtomicInteger(1);
    setSize = new AtomicInteger(0);
  }
  …
add()
public boolean add(Object object) {
  int hash = object.hashCode();
  // Pick a bucket (floorMod keeps the index non-negative)
  int bucket = Math.floorMod(hash, tableSize.get());
  // Non-sentinel split-ordered key
  int key = makeRegularKey(hash);
  // Get reference to bucket’s sentinel, initializing if necessary
  LockFreeList list = getBucketList(bucket);
  // Call bucket’s add() method with reversed key
  if (!list.add(object, key))
    return false;   // No change? We’re done.
  resizeCheck();    // Time to resize?
  return true;
}
Resize
• Divide set size by total number of
buckets
• If quotient exceeds threshold
– Double tableSize field
– Up to fixed limit
Initialize Buckets
• Buckets originally null
• If you find one, initialize it
• Go to bucket’s parent
– Earlier nearby bucket
– Recursively initialize if necessary
• Constant expected work
Recall: Recursive Initialization
[Figure: list 0, 8, 12 with only bucket 0 initialized in a table of 4 buckets.]
To add 7 to the list:
• 7 = 3 mod 4: must initialize bucket 3
• 3 = 1 mod 2: must initialize bucket 1
Could be log n depth
Expected depth is constant
Initialize Bucket
void initializeBucket(int bucket) {
  // Find parent, recursively initialize if needed
  int parent = getParent(bucket);
  if (table[parent] == null)
    initializeBucket(parent);
  // Prepare key for new sentinel
  int key = makeSentinelKey(bucket);
  // Insert sentinel if not present, and get back reference to rest of list
  LockFreeList list = new LockFreeList(table[parent], key);
  // Store the reference so the bucket is no longer null
  table[bucket] = list;
}
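The slides do not show getParent(). One way to compute an "earlier nearby bucket" that agrees with the examples given (bucket 3 has parent 1, bucket 1 has parent 0) is to clear the most significant 1 bit of the bucket index. The sketch below assumes that rule; the class name BucketParent is hypothetical.

```java
// Hypothetical parent computation for split-ordered buckets (illustration only).
class BucketParent {
    // Parent of a bucket: the index with its most significant 1 bit cleared.
    // Bucket 0 is the root and has no parent.
    static int getParent(int bucket) {
        if (bucket <= 0)
            throw new IllegalArgumentException("bucket 0 has no parent");
        return bucket - Integer.highestOneBit(bucket);
    }

    // Number of initializeBucket() calls needed if no ancestor except bucket 0 exists yet.
    static int chainLength(int bucket) {
        int steps = 0;
        while (bucket != 0) {
            bucket = getParent(bucket);
            steps++;
        }
        return steps;
    }
}
```

Each step removes one set bit, so the chain from any bucket back to bucket 0 has at most as many steps as the index has bits, which is the "could be log n depth" bound.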
Correctness
• Linearizable concurrent set
• Theorem: O(1) expected time
– No more than O(1) items expected between two sentinels on average
– Lazy initialization causes at most O(1) expected recursion depth in initializeBucket()
• Can eliminate use of sentinels
Closed (Chained) Hashing
• Advantages:
– with N buckets, M items, Uniform h
– retains good performance as table density (M/N) increases → less resizing
• Disadvantages:
– dynamic memory allocation
– bad cache behavior (no locality)
Oh, did we mention that cache
behavior matters on a multicore?
Open Addressed Hashing
– Keep all items in an array
– One per bucket
– If you have collisions, find an empty bucket
and use it
– Must know how to find items if they are
outside their bucket
Linear Probing*
[Figure: array of 20 buckets; x is stored some slots after h(x), and H = 7.]
contains(x) – search linearly from h(x) to h(x) + H, where H is recorded in the bucket.
*Attributed to Amdahl…
Linear Probing
[Figure: array of 20 buckets filled with items from h(x) onward; H changes from 3 to 6.]
add(x) – put in first empty bucket, and update H.
Linear Probing
• Open address means M ≪ N
• Expected items in bucket same as Chaining
• Expected distance till open slot:
$\frac{1}{2}\left(1 + \frac{1}{(1 - M/N)^2}\right)$
M/N = 0.5 → search 2.5 buckets
M/N = 0.9 → search 50 buckets
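The two figures quoted for M/N = 0.5 and M/N = 0.9 can be reproduced from the standard linear-probing estimate, one half of (1 + 1/(1 − M/N)²). The helper below is only a calculator for that expression; the class name ProbeEstimate and the method name expectedDistance are hypothetical.

```java
// Hypothetical calculator for the linear-probing estimate (illustration only).
class ProbeEstimate {
    // Expected distance to an open slot at table density d = M/N, for 0 <= d < 1.
    static double expectedDistance(double density) {
        if (density < 0.0 || density >= 1.0)
            throw new IllegalArgumentException("density must be in [0, 1)");
        double gap = 1.0 - density;
        return 0.5 * (1.0 + 1.0 / (gap * gap));
    }
}
```

At density 0.5 this gives 2.5 buckets, and at density 0.9 it gives about 50.5, which the slide rounds to 50. The cost grows very quickly as the density approaches 1.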
Linear Probing
• Advantages:
– Good locality → fewer cache misses
• Disadvantages:
– As M/N increases, more cache misses
• searching 10s of unrelated buckets
• “Clustering” of keys into neighboring buckets
– As computation proceeds, “Contamination” by deleted items → more cache misses
Cuckoo Hashing
[Figure: two arrays of 20 buckets, one indexed by h1 and one by h2; x hashes to h1(x) and h2(x), and the evicted item y is moved to h2(y).]
add(x) – if h1(x) and h2(x) are full, evict y and move it to h2(y) ≠ h2(x). Then place x in its place.
But cycles can form
Cuckoo Hashing
• Advantages:
– contains(x): deterministic 2 buckets
– No clustering or contamination
• Disadvantages:
– 2 tables
– h_i(x) are complex
– As M/N increases → relocation cycles
– Above M/N = 0.5 add() does not work!
Concurrent Cuckoo Hashing
• Need to either lock whole chain of
displacements (see book)
• or have extra space to keep items as they
are displaced step by step.
Hopscotch Hashing
• Single Array, Simple hash function
• Idea: define neighborhood of original
bucket
• In neighborhood items found quickly
• Use sequences of displacements to
move items into their neighborhood
Hopscotch Hashing
[Figure: array of 20 buckets; x lies within the hop-range of h(x), whose hop-info bitmap is 1 0 1 0 with H = 4.]
contains(x) – search in at most H buckets (the hop-range) based on hop-info bitmap.
In practice pick H to be 32.
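A sequential sketch of the bitmap-guided lookup is shown below. It is an illustration under assumptions, not the authors' implementation: the class HopscotchLookup, the field names, the wrap-around indexing, and the simplified addNearby (which never displaces items) are all hypothetical. Bit i of hopInfo[b] is taken to mean that slot b + i holds an item whose home bucket is b.

```java
// Hypothetical sequential hopscotch lookup (illustration only, no displacement or resizing).
class HopscotchLookup {
    static final int H = 4;     // hop-range for the demo; the slides suggest 32 in practice
    final Object[] slots;
    final int[] hopInfo;        // bit i of hopInfo[b] set: slot b + i holds an item with home b

    HopscotchLookup(int capacity) {
        slots = new Object[capacity];
        hopInfo = new int[capacity];
    }

    int home(Object x) {
        return Math.floorMod(x.hashCode(), slots.length);
    }

    // Visit only the slots flagged in the home bucket's bitmap: at most H probes.
    boolean contains(Object x) {
        int b = home(x);
        int bits = hopInfo[b];
        while (bits != 0) {
            int i = Integer.numberOfTrailingZeros(bits);
            if (x.equals(slots[(b + i) % slots.length])) return true;
            bits &= bits - 1;   // clear the lowest set bit
        }
        return false;
    }

    // Simplified insert: returns false if x is present or no free slot lies in the hop-range.
    // A full implementation would displace items to bring an empty slot into range.
    boolean addNearby(Object x) {
        if (contains(x)) return false;
        int b = home(x);
        for (int i = 0; i < H; i++) {
            int s = (b + i) % slots.length;
            if (slots[s] == null) {
                slots[s] = x;
                hopInfo[b] |= 1 << i;
                return true;
            }
        }
        return false;
    }
}
```

The lookup never scans slots that belong to other home buckets, which is where the bitmap saves work compared with plain linear probing.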
Hopscotch Hashing
[Figure: array of 20 buckets holding items u, w, v, z, r, and s with their hop-info bitmaps; x is being added at h(x).]
add(x) – probe linearly to find open slot. Move the empty slot via sequence of displacements into the hop-range of h(x).
Hopscotch Hashing
• contains
– wait-free, just look in neighborhood
• add
– expected distance same as in linear probing
• resize
– neighborhood full less likely as H → log n
– one word hop-info bitmap, or use smaller H and default to linear probing of bucket
Advantages
• Good locality and cache behavior
• As table density (M/N) increases → less resizing
• Move cost to add() from contains(x)
• Easy to parallelize
Recall: Concurrent Chained Hashing
[Figure: array of 20 buckets covered by striped locks.]
Striped Locks
Lock for add() and unsuccessful contains()
Concurrent Simple Hopscotch
[Figure: array of 20 buckets with h(x) marked.]
contains() is wait-free
[Figure: neighborhood holding u, z, v, x, r; the bucket has hop-info bitmap 1 0 0 1 and timestamp ts.]
add(x) – lock bucket, mark empty slot using CAS, add x erasing mark
[Figure: item s is displaced; the timestamp of the displaced item's bucket goes from ts to ts+1.]
add(x) – lock bucket, mark empty slot using CAS, lock bucket and update timestamp of bucket being displaced before erasing old value
[Figure: a lookup over u, z, v, s, r ends with x not found.]
contains(x) – traverse using bitmap, and if ts has not changed after traversal, item not found. If ts changed, after a few tries traverse through all items.
Is performance dominated by
cache behavior?
• Test on multicores and uniprocessors:
– Sun 64 way Niagara II, and
– Intel 3GHz Xeon
• Benchmarks pre-allocated memory to
eliminate effects of memory
management
[Figure: Sequential SPARC Throughput, with memory pre-allocated. 90% contain, 5% insert, 5% remove. x-axis: table density (0.1 to 0.9); y-axis: ops/ms (0 to 5000). Series: Hopscotch_D, Hopscotch_ND, LinearProbing, Chained, Cuckoo.]
[Figure: Sequential SPARC High-Density Throughput. 90% contain, 5% insert, 5% remove. x-axis: table density (0.9 to 0.99); y-axis: ops/ms (0 to 4000). Series: Hopscotch_D, Hopscotch_ND, LinearProbing, Chained.]
[Figure: Sequential CoreDuo Throughput. 90% contain, 5% insert, 5% remove. x-axis: table density (0.1 to 0.9); y-axis: ops/ms (0 to 14000). Series: Hopscotch_D, Hopscotch_ND, LinearProbing, Chained, Cuckoo. Annotation: Cuckoo stops here.]
[Figure: Concurrent SPARC Throughput. 90% density; 70% contain, 15% insert, 15% remove. x-axis: CPUs (1 to 64); y-axis: ops/ms (0 to 160000). Series: Hopscotch_D, Chained_PRE, Chained_MTM. Annotations: with memory pre-allocated; with allocation.]
[Figure: Concurrent SPARC, 90% density: cache misses per unsuccessful lookup. x-axis: CPUs (1 to 64); y-axis: miss/ops (0 to 3). Series: Hopscotch_D, Chained_PRE, Chained_MTM.]
Summary
• Chained hash with striped locking is
simple and effective in many cases
• Hopscotch with striped locking has great cache behavior
• If incremental resizing is needed, go for split-ordered
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.
• You are free:
– to Share: to copy, distribute and transmit the work
– to Remix: to adapt the work
• Under the following conditions:
– Attribution. You must attribute the work to “The Art of Multiprocessor Programming” (but not in any way that suggests that the authors endorse you or your use of the work).
– Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.
• For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to
– http://creativecommons.org/licenses/by-sa/3.0/.
• Any of the above conditions can be waived if you get permission from the copyright holder.
• Nothing in this license impairs or restricts the author's moral rights.