CONCURRENT HASHING AND
NATURAL PARALLELISM
Lecture by Sagi Marcovich
Based on Maurice Herlihy & Nir Shavit, The Art of
Multiprocessor Programming, chapter 13.
Hashing - Reminder
 Hashing is the mapping of data of arbitrary size to data of fixed size.
 The mapping is computed by a hash function.
 Its outputs are called hash values (hashes).
© Wikipedia: The Free Encyclopedia, “Hash Function”.
The Set Interface
 The Set interface provides the following methods, which return Boolean values (a minimal Java sketch follows the list):
 add(x) adds x to the set. Returns true if x was absent, and false otherwise.
 remove(x) removes x from the set. Returns true if x was present, and false otherwise.
 contains(x) returns true if x is present, and false otherwise.
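A minimal sketch of such an interface in Java (our own rendering of the methods listed above, not code from the slides):

public interface Set<T> {
    boolean add(T x);      // true iff x was absent
    boolean remove(T x);   // true iff x was present
    boolean contains(T x); // true iff x is present
}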
Designing Set Implementations
“When designing set implementations, we need
to keep the following principle in mind: we can
buy more memory, but we cannot buy more
time.”
Hash Sets
 An efficient way to implement a Set.
 Uses hashing.
 Ensures that contains(), add() and remove() calls take O(1) average time.
 Uses O(n) memory, where n is the number of items in the set.
Hash Sets
 Typically implemented using an array, called ‘table’, and a hash function ‘h’.
 Two main issues:
 How do we choose the hash function? (won’t be discussed)
 We’ll use a simple modulo hash function (sketched below).
 What do we do in the case of collisions?
 Two values x ≠ y such that h(x) = h(y).
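A one-method sketch of the modulo hash used throughout these slides (the helper name is ours; the slides’ code simply computes x.hashCode() % table.length inline):

static <T> int bucketIndex(T x, int capacity) {
    // Math.floorMod keeps the index non-negative even when hashCode() is
    // negative; the slides' code uses the plain % operator instead.
    return Math.floorMod(x.hashCode(), capacity);
}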
Hash Sets - Collisions
 Open addressing:
 Each table entry refers to a single item.
 Collisions resolved by applying alternative hash functions to test alternative table entries.
 We know double hashing (data structures 1).
 Closed addressing:
 Each table entry refers to a set of items, traditionally called a bucket.
 Colliding items are placed in the same bucket.
 We will discuss them in this lecture.
Hash Sets - Resizing
 In both kinds of algorithms, it is sometimes necessary to resize the table.
 In open-addressing algorithms, the table may become too full to find alternative table entries.
 In closed-addressing algorithms, buckets may become too large to search efficiently.
Hash Sets – Extensible Hashing
 Anecdotal evidence suggests that in most applications, sets are subject to
the following distribution of method calls:
 90% contains()
 9% add()
 1% remove()
 Sets are more likely to grow than to shrink, so we will focus here on
extensible hashing, in which hash sets can only grow.
Natural Parallelism
 So far we discussed how to extract parallelism from data structures that seemed sequential and provided few opportunities for parallelism:
 Lists, queues, stacks, etc.
 Concurrent hashing takes the opposite approach:
 Seems to be “naturally parallelizable”.
 Disjoint-access-parallel – concurrent method calls are likely to access disjoint locations.
Base Hash Set
 We start by defining a base hash set implementation common to all the concurrent closed-addressing hash sets.
public abstract class BaseHashSet<T> {
    protected List<T>[] table;
    protected int setSize;
    public BaseHashSet(int capacity) {
        setSize = 0;
        table = (List<T>[]) new List[capacity];
        for (int i = 0; i < capacity; i++) {
            table[i] = new ArrayList<T>();
        }
    }
    ...
}
Base Hash Set
 The BaseHashSet<T> implements add(x), contains(x) and remove(x).
 The BaseHashSet<T> does not implement the abstract methods acquire(x), release(x), policy() and resize().
public boolean contains(T x) {
    acquire(x);
    try {
        int myBucket = x.hashCode() % table.length;
        return table[myBucket].contains(x);
    } finally {
        release(x);
    }
}
public boolean add(T x) {
    boolean result = false;
    acquire(x);
    try {
        int myBucket = x.hashCode() % table.length;
        result = table[myBucket].add(x);
        setSize = result ? setSize + 1 : setSize;
    } finally {
        release(x);
    }
    if (policy())
        resize();
    return result;
}
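The slides omit remove(); a sketch that follows the same acquire/try/finally pattern as contains() and add() above (our addition, not taken from the slides):

public boolean remove(T x) {
    acquire(x);
    try {
        int myBucket = x.hashCode() % table.length;
        boolean result = table[myBucket].remove(x);
        setSize = result ? setSize - 1 : setSize;
        return result;
    } finally {
        release(x);
    }
}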
Base Hash Set – abstract functions
 acquire(x) acquires the locks necessary to manipulate item x.
 release(x) releases them.
 policy() decides whether to resize the set, and
 resize() doubles the capacity of the table[] array.
Hash Sets – policy()
 The policy() and resize() methods make sure that the hash table’s method calls take constant expected time.
 This only happens if traversing a bucket (a linked list) takes constant time, i.e., if buckets have constant expected length.
 So, when do we resize? There are many strategies, for example:
 When the average bucket size exceeds a fixed threshold.
 When more than ¼ of the buckets exceed the bucket threshold, or a single bucket exceeds a global threshold (sketched below).
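A sketch of the second strategy as a policy() implementation (BUCKET_THRESHOLD and GLOBAL_THRESHOLD are constants we assume here; they do not appear in the slides):

static final int BUCKET_THRESHOLD = 8;   // assumed per-bucket threshold
static final int GLOBAL_THRESHOLD = 128; // assumed single-bucket hard limit

public boolean policy() {
    int oversized = 0;
    for (List<T> bucket : table) {
        if (bucket.size() > GLOBAL_THRESHOLD)
            return true;                 // a single bucket exceeds the global threshold
        if (bucket.size() > BUCKET_THRESHOLD)
            oversized++;
    }
    return oversized > table.length / 4; // more than a quarter of the buckets are oversized
}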
Concurrent Closed-Addressing Hash Sets
 We are going to learn four concurrent closed-addressing hash set data structures:
 Coarse-Grained Hash Set
 Striped Hash Set
 Refinable Hash Set
 Lock-Free Hash Set
Coarse-Grained Hash Set
 Synchronization is provided by a single reentrant lock.
 Limited parallelism: even disjoint-access operations cannot run in parallel.
public class CoarseHashSet<T> extends BaseHashSet<T> {
    final Lock lock;
    CoarseHashSet(int capacity) {
        super(capacity);
        lock = new ReentrantLock();
    }
    public final void acquire(T x) {
        lock.lock();
    }
    public void release(T x) {
        lock.unlock();
    }
    ...
}
Coarse-Grained Hash Set
 policy() implements an “average bucket size” policy.
 resize() creates a new table and moves all the items from the previous table into it, using the new hash function.
public boolean policy() {
    return setSize / table.length > 4;
}
public void resize() {
    int oldCapacity = table.length;
    lock.lock();
    try {
        if (oldCapacity != table.length) {
            return; // someone beat us to it
        }
        int newCapacity = 2 * oldCapacity;
        List<T>[] oldTable = table;
        table = (List<T>[]) new List[newCapacity];
        for (int i = 0; i < newCapacity; i++)
            table[i] = new ArrayList<T>();
        for (List<T> bucket : oldTable) {
            for (T x : bucket) {
                table[x.hashCode() % table.length].add(x);
            }
        }
    } finally {
        lock.unlock();
    }
}
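A minimal usage sketch (a test harness of our own, not from the slides) showing several threads adding items to the same CoarseHashSet; every call funnels through the single global lock:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CoarseHashSetDemo {
    public static void main(String[] args) throws InterruptedException {
        CoarseHashSet<Integer> set = new CoarseHashSet<>(16);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int t = 0; t < 4; t++) {
            final int base = t * 1000;
            pool.execute(() -> {
                for (int i = 0; i < 1000; i++)
                    set.add(base + i); // each call acquires and releases the global lock
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(set.contains(42)); // expected output: true
    }
}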
Concurrent Closed-Addressing Hash Sets
 We are going to learn four concurrent closed-addressing hash set data structures:
 Coarse-Grained Hash Set
 Striped Hash Set
 Refinable Hash Set
 Lock-Free Hash Set
Striped Hash Set
 Instead of using a single lock to synchronize the entire set, we split the set into independently synchronized pieces.
 This technique is called lock striping.
 The set is initialized with an array of locks and an array of buckets of the same size L (= the initial capacity).
 When resizing, the table grows but the array of locks does not.
 We denote the current table size by N.
 Lock i protects every table entry j such that j = i (mod L).
Striped Hash Set
[Figure: a striped hash set with N = 16 and L = 8. Lock i = 5 covers both table entries that are equal to 5 modulo L, namely entries 5 and 13.]
Striped Hash Set
public class StripedHashSet<T> extends BaseHashSet<T> {
    final ReentrantLock[] locks; // a final array of locks
    public StripedHashSet(int capacity) {
        super(capacity);
        locks = new ReentrantLock[capacity];
        for (int j = 0; j < locks.length; j++) {
            locks[j] = new ReentrantLock();
        }
    }
    public final void acquire(T x) {
        locks[x.hashCode() % locks.length].lock();
    }
    public void release(T x) {
        locks[x.hashCode() % locks.length].unlock();
    }
Striped Hash Set
 Resizing is a “stop-the-world” operation.
 resize() acquires the locks in ascending order.
 This ensures that resize() cannot deadlock with any other operation.
 The locks are also released in ascending order. Is that necessary?
public void resize() {
    int oldCapacity = table.length;
    for (Lock lock : locks) {
        lock.lock();
    }
    try {
        if (oldCapacity != table.length) {
            return; // someone beat us to it
        }
        int newCapacity = 2 * oldCapacity;
        List<T>[] oldTable = table;
        table = (List<T>[]) new List[newCapacity];
        for (int i = 0; i < newCapacity; i++)
            table[i] = new ArrayList<T>();
        for (List<T> bucket : oldTable) {
            for (T x : bucket) {
                table[x.hashCode() % table.length].add(x);
            }
        }
    } finally {
        for (Lock lock : locks) {
            lock.unlock();
        }
    }
}
Striped Hash Set
 There are two reasons not to grow the lock array every time we grow the table:
 Associating a lock with every table entry could consume too much space, especially when tables are large and contention is low.
 While resizing the table is straightforward, resizing the lock array while it is in use is more complex (as we will see shortly).
 The Striped hash set provides medium-grained locking:
 Between the Coarse-Grained hash set and the Refinable hash set.
Concurrent Closed-Addressing Hash Sets
 We are going to learn four concurrent closed-addressing hash set data structures:
 Coarse-Grained Hash Set
 Striped Hash Set
 Refinable Hash Set
 Lock-Free Hash Set
Refinable Hash Set
 What if we want to refine the granularity of locking as the table size grows, so that
the number of locations in a stripe does not continuously grow?
 Clearly, if we want to resize the lock array, then we need to rely on another form of
synchronization.
 Resizing is rare, so our principal goal is to devise a way to permit the lock array to
be resized without substantially increasing the cost of normal method calls.
 The solution: AtomicMarkableReference<Thread>!
Atomic Markable Reference - Reminder
 An AtomicMarkableReference<T> is an object that encapsulates both a reference
to an object of type T and a Boolean mark.
 The fields can be updated atomically, either together or individually.
// tests and updates both the mark and reference fields
public boolean compareAndSet(T expectedReference, T newReference,
                             boolean expectedMark, boolean newMark);

// updates the mark if the reference field has the expected value
public boolean attemptMark(T expectedReference, boolean newMark);

// returns the reference and stores the mark at position 0 in the argument array
public T get(boolean[] marked);
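A small usage sketch of java.util.concurrent.atomic.AtomicMarkableReference (the variable names and values are our own):

import java.util.concurrent.atomic.AtomicMarkableReference;

public class AmrDemo {
    public static void main(String[] args) {
        AtomicMarkableReference<Thread> owner =
            new AtomicMarkableReference<>(null, false);

        Thread me = Thread.currentThread();
        // atomically move from (null, false) to (me, true)
        boolean claimed = owner.compareAndSet(null, me, false, true);

        boolean[] mark = {false};
        Thread who = owner.get(mark);  // returns the reference, stores the mark in mark[0]
        System.out.println(claimed + " " + (who == me) + " " + mark[0]); // true true true

        owner.set(null, false);        // reset both fields together
    }
}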
Refinable Hash Set
 We introduce a globally shared owner field that combines a Boolean value with a reference to a thread.
 We use the owner field as a mutual exclusion flag between the resize() method and any of the add() methods.
 Normally the Boolean value is false, meaning the set is not in the middle of resizing.
public class RefinableHashSet<T> extends BaseHashSet<T> {
    AtomicMarkableReference<Thread> owner;
    volatile ReentrantLock[] locks;
    public RefinableHashSet(int capacity) {
        super(capacity);
        locks = new ReentrantLock[capacity];
        for (int i = 0; i < capacity; i++) {
            locks[i] = new ReentrantLock();
        }
        owner = new AtomicMarkableReference<Thread>(null, false);
    }
    ...
}
Refinable Hash Set
public void acquire(T x) {
    boolean[] mark = {true};
    Thread me = Thread.currentThread();
    Thread who;
    while (true) {
        do { // spin until no one else is resizing the set
            who = owner.get(mark);
        } while (mark[0] && who != me);
        ReentrantLock[] oldLocks = locks;
        ReentrantLock oldLock = oldLocks[x.hashCode() % oldLocks.length];
        oldLock.lock();
        who = owner.get(mark); // check that no other resize happened in the meantime
        if ((!mark[0] || who == me) && locks == oldLocks) {
            return;           // success
        } else {
            oldLock.unlock(); // failed, try again
        }
    }
}
public void release(T x) {
    locks[x.hashCode() % locks.length].unlock();
}
Refinable Hash Set
 resize() first tries to set the calling thread as the current resizer by marking the owner field; if the compareAndSet() fails, another thread is already resizing and we can simply return.
 quiesce() merely waits until every lock is free; this is enough because no other operation can acquire a lock while the mark is true.

public void resize() {
    int oldCapacity = table.length;
    int newCapacity = 2 * oldCapacity;
    Thread me = Thread.currentThread();
    if (owner.compareAndSet(null, me, false, true)) { // set myself as the current resizer
        try {
            if (table.length != oldCapacity) {
                return; // someone else resized first
            }
            quiesce();
            List<T>[] oldTable = table;
            table = (List<T>[]) new List[newCapacity];
            for (int i = 0; i < newCapacity; i++)
                table[i] = new ArrayList<T>();
            locks = new ReentrantLock[newCapacity];
            for (int j = 0; j < locks.length; j++) {
                locks[j] = new ReentrantLock();
            }
            initializeFrom(oldTable);
        } finally {
            owner.set(null, false); // return the mark to its normal state
        }
    }
}
protected void quiesce() {
    for (ReentrantLock lock : locks) {
        while (lock.isLocked()) {}
    }
}
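The slides call initializeFrom(oldTable) without showing it; a sketch of what it could look like, rehashing every old item into the new table just as the coarse-grained and striped resize() methods do (our reconstruction, not code from the slides):

private void initializeFrom(List<T>[] oldTable) {
    for (List<T> bucket : oldTable) {
        for (T x : bucket) {
            int myBucket = x.hashCode() % table.length;
            table[myBucket].add(x);
        }
    }
}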
Concurrent Closed-Addressing Hash Sets
 We are going to learn four concurrent closed-addressing hash set data structures:
 Coarse-Grained Hash Set
 Striped Hash Set
 Refinable Hash Set
 Lock-Free Hash Set
Lock Free Hash Set
 It’s time to take the next step and make the hash table lock-free.
 Problem: the resize operation is a “stop-the-world” operation and is very difficult to implement without locks.
 Atomic operations work on only a single memory location.
 This makes it difficult to move a node atomically from one linked list to another.
Lock Free Hash Set
 Solution: let’s flip the conventional hashing structure on its head!
 Instead of moving the items among the buckets, we move the buckets among the items.
Lock Free Hash Set
 More specifically, keep all items sorted in a single lock-free linked list.
 Use LockFreeList as a black box.
 The buckets are simply references into the list.
 As the list grows, we add bucket references so that no object is ever too far from the
start of a bucket.
Recursive Split-Ordering
 Let’s assume the number of buckets (= capacity) is N = 2^i.
 A bucket b “contains” the items whose hash code k satisfies k = b (mod N).
 Let’s try:
[Figure: N = 2 — bucket 0 holds the even keys {0, 4, 6}, bucket 1 holds the odd keys {1, 5, 7}.]
Recursive Split-Ordering
 Let’s assume the number of buckets (= capacity) is N = 2^i.
 A bucket b “contains” the items whose hash code k satisfies k = b (mod N).
 Let’s try:
[Figure: N = 4 — bucket 0 holds {0, 4}, bucket 1 holds {1, 5}, bucket 2 holds {6}, bucket 3 holds {7}.]
Recursive Split-Ordering
 We just defined a total order on the items; it is called recursive split-ordering.
 Given a key, its order is defined by its bit-reversed value.
 Let’s see some magic: these are our number friends, recursive split-ordered (3-bit keys):

key:          0    4    6    1    5    7
bit-reversed: 000  001  011  100  101  111

 Each bucket reference points at the first item of its group: bucket 0 → 0, bucket 2 → 6, bucket 1 → 1, bucket 3 → 7.
 The items do not move!
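A small sketch that reproduces this ordering with Java’s Integer.reverse (the demo class is our own; the keys are padded to 32 bits, which preserves their relative order):

import java.util.Arrays;
import java.util.Comparator;

public class SplitOrderDemo {
    public static void main(String[] args) {
        Integer[] keys = {0, 1, 4, 5, 6, 7};
        // Sort by bit-reversed value; convert to an unsigned long because the
        // reversed value may have its sign bit set.
        Arrays.sort(keys, Comparator.comparingLong(
            k -> Integer.toUnsignedLong(Integer.reverse(k))));
        System.out.println(Arrays.toString(keys)); // prints [0, 4, 6, 1, 5, 7]
    }
}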
Split-Ordered Hash set
 Formally, when the capacity is 2^i, the items of a bucket are distinguished by their i-th binary digit (counting from the least significant bit, starting at 0).
 When the capacity grows to 2^(i+1), bucket b splits between bucket b and bucket b + 2^i:
 Items whose key k satisfies k = b (mod 2^(i+1)) remain in bucket b.
 Items whose key k satisfies k = b + 2^i (mod 2^(i+1)) migrate to bucket b + 2^i.
 For example, when N grows from 2 to 4, bucket 0 = {0, 4, 6} splits: 0 and 4 stay in bucket 0, while 6 migrates to bucket 2 = 0 + 2^1.
 The recursive split-ordering ensures that these two groups are positioned one after the other in the list.
 So splitting is achieved simply by setting bucket b + 2^i to reference the first item of the second group.
Split-Ordered Hash set
 So, a Split-Ordered Hash set is an array of buckets, where each bucket is a
reference into a lock-free list.
 The nodes in the list are sorted by their bit-reversed hash codes.
Split-Ordered Hash set – Corner Cases
 A bucket is initialized when it is first used.
 To avoid deletion of a node referenced by a bucket, we add a sentinel node, which is never deleted, to the start of each bucket.
 The sentinel node inserted into bucket b gets the key b itself.
 For example, with N = 4, the sentinel node of bucket 3 has hash key 3, so its bit-reversed code is 11000000 (using 8-bit keys).
 Every ordinary node in bucket 3 has a bit-reversed code that starts with 11, which ensures that the sentinel node always stays first in its bucket.
 To avoid confusion between a sentinel node and an ordinary node with the same code, we set the MSB of an ordinary node’s hash code to 1 (so after bit reversal its lowest bit is 1, while a sentinel’s is 0).
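A sketch of key computation along these lines (the class and constant names are ours; the book’s BucketList uses the same MSB-then-reverse idea, but this is not a verbatim copy):

public class SplitOrderKeys {
    static final int MASK    = 0x00FFFFFF; // use only the low-order bits of the hash code
    static final int HI_MASK = 0x80000000; // the MSB marks ordinary (non-sentinel) keys

    // Split-order key of an ordinary item: set the MSB of the hash code,
    // then bit-reverse, so the reversed key has its lowest bit set to 1.
    static int makeOrdinaryKey(Object x) {
        int code = x.hashCode() & MASK;
        return Integer.reverse(code | HI_MASK);
    }

    // Split-order key of the sentinel of bucket b: just bit-reverse b,
    // so the reversed key has its lowest bit set to 0.
    static int makeSentinelKey(int bucket) {
        return Integer.reverse(bucket & MASK);
    }
}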
Split-Ordered Hash set – add() simulation
 Let’s add an item with hash code k = 10.
 Since 10 = 2 (mod 4), the item belongs to bucket 2, which must first be initialized.
[Figure (a): the state of the list before the insertion; bucket 2 is not yet initialized.]
Split-Ordered Hash set – add() simulation
 Bucket 2 is being initialized: first, an appropriate sentinel node is created.
[Figure (b): the sentinel node for bucket 2 has been added to the list.]
Split-Ordered Hash set – add() simulation
 Bucket 2 is set to reference the sentinel node.
[Figure (c): bucket 2 now references its sentinel node.]
Split-Ordered Hash set – add() simulation
 Finally, the ordinary node with k = 10 is inserted into bucket 2.
[Figure (d): the new node has been inserted after bucket 2’s sentinel.]
Split-Ordered Hash set
 The BucketList<T> class implements the lock-free list used by the split-ordered hash set, as explained earlier.
 An AtomicInteger is an integer that supports the atomic operations get(), getAndIncrement() and compareAndSet().
public class LockFreeHashSet<T> {
    protected BucketList<T>[] bucket;
    protected AtomicInteger bucketSize;
    protected AtomicInteger setSize;
    public LockFreeHashSet(int capacity) {
        bucket = (BucketList<T>[]) new BucketList[capacity];
        bucket[0] = new BucketList<T>();
        bucketSize = new AtomicInteger(2);
        setSize = new AtomicInteger(0);
    }
    ...
}
Split-Ordered Hash set
 The threshold check plays the role of the earlier policy().
 The compareAndSet() on bucketSize is the new resize() method: it just doubles the bucket count, and it ensures that no other resize has happened in between.
public boolean add(T x) {
    int myBucket = BucketList.hashCode(x) % bucketSize.get();
    BucketList<T> b = getBucketList(myBucket);
    if (!b.add(x))
        return false;
    int setSizeNow = setSize.getAndIncrement();
    int bucketSizeNow = bucketSize.get();
    if (setSizeNow / bucketSizeNow > THRESHOLD)
        bucketSize.compareAndSet(bucketSizeNow, 2 * bucketSizeNow);
    return true;
}
private BucketList<T> getBucketList(int myBucket) {
    if (bucket[myBucket] == null)
        initializeBucket(myBucket);
    return bucket[myBucket];
}
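The slides do not show initializeBucket(); below is a sketch in the spirit of the add() simulation above. We assume getSentinel() is provided by the BucketList black box (inserting the bucket’s sentinel if absent and returning a reference into the list at that point); these names follow the book’s description but are not copied from the slides:

private void initializeBucket(int myBucket) {
    // the parent bucket is myBucket with its highest set bit cleared:
    // it is the bucket that myBucket split off from
    int parent = getParent(myBucket);
    if (bucket[parent] == null)
        initializeBucket(parent);  // initialize the parent recursively if needed
    BucketList<T> b = bucket[parent].getSentinel(myBucket);
    if (b != null)
        bucket[myBucket] = b;
}

private int getParent(int myBucket) {
    int parent = bucketSize.get();
    do {
        parent = parent >> 1;
    } while (parent > myBucket);
    parent = myBucket - parent;    // clear the highest set bit of myBucket
    return parent;
}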
Split-Ordered Hash Set – Array of buckets
 To avoid technical distractions, we kept the array of buckets in a large, fixed-size array.
 This design is obviously far from ideal.
 In practice, there are more efficient ways to represent the buckets, e.g., a multilevel tree.
Concurrent Closed-Addressing Hash Sets
 We are going to learn four concurrent closed-addressing hash set data structures:
 Coarse-Grained Hash Set
 Striped Hash Set
 Refinable Hash Set
 Lock-Free Hash Set
Summary
 We learned four concurrent closed-addressing hash set data structures:
 Coarse-Grained Hash Set – not practical; allows no parallelism at all.
 Striped Hash Set – medium granularity; spares us the heavy burden of growing the lock array with every resize, and saves space.
 Refinable Hash Set – fine granularity gives great disjoint-access performance, but resizes are heavy “stop-the-world” operations.
 Lock-Free Hash Set – lock-free; offers the best opportunity for disjoint-access parallelism, but is complicated and hard to implement.
Simulation
 Simulation with a fixed total of 10,000,000 operations, capacity = 128.
 Varying the contains() percentage and the number of threads.
 Comparing all four concurrent closed-addressing hash sets.
 Except for the LockFreeHashSet, results are taken from: A Lock-Free Wait-Free Hash Table, Dr. Cliff Click, Azul Systems.
Simulation – 75% Contains()
[Throughput charts comparing the four hash sets at 75% contains().]
Simulation – 95% Contains()
[Throughput charts comparing the four hash sets at 95% contains().]
QUESTIONS?
THE END