
Project Report on

Concurrent Lock-free Skiplist

By

Fang Huang

Hao Li

Rohit Nandwate


Index

1. Introduction

2. Data structure properties

3. Implementation details

4. Testing

5. Performance

6. Conclusion



Introduction

The skiplist data structure has been around since the early 1990s. It is a logarithmic-time search structure that, unlike other popular search structures, does not require any rebalancing. In concurrent applications rebalancing can become a bottleneck and lead to high contention between threads. Because skiplists do not need rebalancing, they are considered a great alternative to balanced search trees when implementing concurrent applications. Our implementation is a lock-free version of the concurrent skiplist; we use atomic instructions such as compare-and-swap (CAS) to achieve lock-freedom.

The skiplist we have implemented is based on an abstract set, which simplifies the implementation to some extent. This means the list contains only unique keys. The skiplist maintains this set in sorted order, and that sorted order is one of the important reasons we can achieve logarithmic-time search.


Data structure properties

A simple way to describe a skiplist is as a group of simple linked lists. Each linked list in this group is assigned a level, and every list at a higher level contains fewer elements than the list at the level below it. Thus the level-0 list is simply a linked list containing all the elements present in the skiplist.

Each node is part of at least the lowest-level list, and at most it can be part of all the lists. Another way to look at a skiplist is as a simple linked list in which each node contains more than one next pointer; depending on what we are looking for, we can choose which next pointer to follow while traversing the data structure. This is also why maintaining sorted order in the list is important: it guides the search. Because the higher-level lists contain fewer elements, i.e. are sparser, we can use them to reach the node we are looking for in fewer steps. A node layout along these lines is sketched below.
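As an illustration, a minimal node layout of this kind might look like the following C++ sketch. The field names and the fixed level cap are illustrative, not necessarily those of our actual code:

#include <atomic>

// A minimal sketch: each node stores its key and one next pointer per level.
// kMaxLevel is an illustrative cap; choosing it well is discussed later.
constexpr int kMaxLevel = 16;

template <typename T>
struct Node {
    T key;
    int topLevel;                        // highest level this node appears on
    std::atomic<Node*> next[kMaxLevel];  // next[0] links the full level-0 list

    Node(const T& k, int top) : key(k), topLevel(top) {
        for (int i = 0; i < kMaxLevel; ++i) next[i].store(nullptr);
    }
};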

We provide the following interface for our skiplist implementation:


bool add(T item);
bool remove(T item);
bool contains(T item);

All the methods are lock-free, and the contains method is wait-free as well. What we mean by wait-free is described in the next section.


Implementation details

The lock-free implementation of the skiplist uses atomic operations such as compare-and-swap (CAS). A CAS relies on a snapshot of the state of the data structure and succeeds only if, at the time of the swap, the snapshot is still valid. In other words, a CAS might fail, and every time it fails we have to restart the whole operation by taking a new snapshot. Many details are being glossed over here, but this is the basic principle on which CAS works. Any operation that uses CAS is therefore not bounded in time: it might take a while before the operation actually goes through successfully. Hence the add() and remove() methods might need to retry. The contains() method, however, does not modify the data structure in any way; it does not use CAS and is therefore wait-free in addition to being lock-free. The retry pattern looks roughly like the sketch below.
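To make the retry principle concrete, here is a minimal sketch of the snapshot / CAS / retry pattern on a simple shared counter. This is illustrative only; add() and remove() apply the same pattern to the skiplist's pointers rather than to a counter:

#include <atomic>

void addToCounter(std::atomic<int>& counter, int delta) {
    int snapshot = counter.load();  // take a snapshot of the current state
    // Retry until no other thread changed the value between the snapshot
    // and the swap; compare_exchange_weak reloads `snapshot` on failure.
    while (!counter.compare_exchange_weak(snapshot, snapshot + delta)) {
        // loop and try again with the refreshed snapshot
    }
}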

The add() and remove() methods use a helper method called find(), which is not exposed in the interface. This method helps locate the element you wish to insert into or delete from the skiplist. The find() method is also the one that cleans up the skiplist, i.e. physically removes nodes from the list. Whenever we remove a node, it is first logically removed by marking it.

Someone then needs to actually change the pointers and remove the node physically from the list. This is done by the find() method every time it comes across a marked node, and it uses CAS to change the pointers.

The contains() method, which does pretty much the same traversal as find(), does not physically remove the marked nodes; it simply skips them. Hence it does not use CAS and is wait-free.

We also use a special trick to do the marking of the nodes. Instead of using a separate boolean variable to store the mark, we use an unused bit of the pointer to indicate whether a node is marked or not. Since a node has multiple next pointers, each of them has to be marked; a node is considered removed from the skiplist only when its lowest-level pointer is marked.

We implemented the storing of the boolean mark inside a pointer by creating a new class called AtomicMarkableReference. As the name suggests, this class atomically marks a reference. It is essentially a union of a pointer and a one-bit boolean: a union allocates memory equal to its largest member, and all members share the same memory location. The class wraps all the mechanics of using a pointer bit as a mark and provides a simple interface:

T* getReference();
T* get(bool &mark);
bool compareAndSwap(T* expected_ref, T* new_ref, bool expected_mark, bool new_mark);

The way this class works is that it sets the last bit of the pointer to true when we want to mark it. Since this bit is unused, it is false for every valid pointer. We therefore only let the user access the actual pointer value through the interface's getReference() method, which masks off the last bit irrespective of whether it is set. This way, whenever the user wants to use the pointer, getReference() always returns a valid pointer. One possible way to realize this idea is sketched below.
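A minimal sketch of such a class is shown below. It packs the mark into the lowest bit of the pointer word via std::atomic<uintptr_t>; the member names follow the interface above, but the rest is illustrative rather than our exact implementation:

#include <atomic>
#include <cstdint>

template <typename T>
class AtomicMarkableReference {
    static constexpr std::uintptr_t kMarkBit = 1;
    std::atomic<std::uintptr_t> word;

    static std::uintptr_t pack(T* ref, bool mark) {
        return reinterpret_cast<std::uintptr_t>(ref) | (mark ? kMarkBit : 0);
    }

public:
    explicit AtomicMarkableReference(T* ref = nullptr, bool mark = false)
        : word(pack(ref, mark)) {}

    // Always returns a valid pointer: the mark bit is masked off.
    T* getReference() const {
        return reinterpret_cast<T*>(word.load() & ~kMarkBit);
    }

    // Reference and mark come from a single atomic load, so they are
    // guaranteed to belong to the same version.
    T* get(bool& mark) const {
        std::uintptr_t w = word.load();
        mark = (w & kMarkBit) != 0;
        return reinterpret_cast<T*>(w & ~kMarkBit);
    }

    // Succeeds only if both the reference and the mark match the snapshot.
    bool compareAndSwap(T* expected_ref, T* new_ref,
                        bool expected_mark, bool new_mark) {
        std::uintptr_t expected = pack(expected_ref, expected_mark);
        return word.compare_exchange_strong(expected, pack(new_ref, new_mark));
    }
};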

This is a crucial trick and one of the most important parts of the lock-free implementation. It is quite valuable and can definitely be used with other data structures to avoid locks wherever more than one variable needs to be changed atomically.


Testing

We adopted the Google C++ testing library (Google Test) for unit testing. We have written extensive test cases for the AtomicMarkableReference and LockFreeSkipList classes.

For both classes we have sequential test cases and concurrent test cases. We did find several bugs in the implementation and fixed them. For example, in the heavy-load concurrent AtomicMarkableReference test cases, after we argued theoretically that it was possible to observe different versions of the mark and the reference, we managed to reproduce this during testing; after fixing it, we made sure that the reference and the mark always come from the same version, either both new or both old.
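As an illustration of the kind of sequential check involved, here is a Google Test sketch written against the AtomicMarkableReference sketch from the previous section; it is not our actual test file:

#include <gtest/gtest.h>
// Assumes the AtomicMarkableReference sketch shown earlier is available.

TEST(AtomicMarkableReferenceTest, MarkAndReferenceChangeTogether) {
    int a = 1, b = 2;
    AtomicMarkableReference<int> ref(&a, false);

    bool mark = true;
    EXPECT_EQ(&a, ref.get(mark));
    EXPECT_FALSE(mark);

    // A CAS with the correct expected reference and mark should succeed...
    EXPECT_TRUE(ref.compareAndSwap(&a, &b, false, true));

    // ...and afterwards both fields reflect the new version together.
    EXPECT_EQ(&b, ref.get(mark));
    EXPECT_TRUE(mark);

    // A CAS against a stale snapshot must fail.
    EXPECT_FALSE(ref.compareAndSwap(&a, &b, false, true));
}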

Since the LockFreeSkipList tests are closer to integration tests, we did not go further on that front.


Performance

Comparing Skiplist with Map

To test the performance of the lock-free skiplist, we set up two kinds of application environment. In both cases we compare the lock-free skiplist with std::map from the C++ standard library.

First, we set up a skiplist with 100,000 initial elements and a map with exactly the same elements. We then generate 400,000 search requests from 10 threads on an eight-core machine and measure how long the skiplist takes to finish these requests. We vary the number of requests and observe how the skiplist performs. We did the same with a locked map. The time increases linearly with the number of requests for both data structures, but the skiplist performs more than 4 times better than the map. See Figure 1 below. A rough sketch of the driver for this workload follows.
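The pure-searching benchmark driver has roughly the shape sketched below. For brevity only the locked std::map baseline is shown; the lock-free skiplist is driven the same way, just without the mutex. The constants mirror the numbers above; everything else (names, key range) is illustrative:

#include <chrono>
#include <iostream>
#include <map>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

int main() {
    std::map<int, int> table;
    std::mutex tableLock;

    for (int i = 0; i < 100000; ++i) table[i] = i;     // 100,000 initial elements

    const int kThreads = 10;
    const int kRequestsPerThread = 400000 / kThreads;  // 400,000 requests in total

    auto worker = [&]() {
        std::mt19937 rng(std::random_device{}());
        std::uniform_int_distribution<int> dist(0, 199999);
        for (int i = 0; i < kRequestsPerThread; ++i) {
            std::lock_guard<std::mutex> guard(tableLock);
            (void)table.count(dist(rng));              // one search request
        }
    };

    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    for (int t = 0; t < kThreads; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();
    auto elapsed = std::chrono::duration<double>(
        std::chrono::steady_clock::now() - start);

    std::cout << "Elapsed: " << elapsed.count() << " s\n";
}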


[Figure 1: Time (seconds) vs. number of operations.]

Second, in addition to the huge number of search requests, we start six other threads that keep adding and removing elements from the skiplist and the map, and we measure the performance of both while the data structure is changing. See Figure 2.


[Figure 2: Time (seconds) vs. number of operations, with concurrent adds and removes.]

In the second case the skiplist still performs better than the map, but it takes twice as long as in the first case. The map did not get worse in this case; on the contrary, it got a little better in the second one.


Choose the Best Maximum Level

In addition to the comparison, we also explored how to choose the best maximum level for the skiplist. We tested the skiplist with a varying max level, using 100,000 initial nodes and 400,000 search requests.

[Figure 3: Time (seconds) vs. MaxLevel (13, 15, 16, 18, 23).]

From Figure 3 we see that with pure search requests, the skiplist takes slightly more time as the max level increases. With many add and remove requests, the skiplist is much more sensitive to the value of the max level. Thus, choosing a good max level is important.


We have 100,000 initial nodes. As add and remove requests are served, the number of nodes varies from 100,000 to 1,700,000, roughly 900,000 on average. Thus the calculated best level is log(900000)/log(2) ≈ 20.

However, we got an optimal value of 16. This is because with more levels, taking a snapshot takes longer, and it becomes more likely that some other thread changes the predecessor or successor of the current node in the meantime. Thus the best max level is a little less than the logarithm of the number of nodes.
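A small helper capturing this rule of thumb might look like the following sketch; the function name is ours, for illustration only:

#include <cmath>
#include <cstddef>

// First guess for the skiplist max level: log2 of the expected number of
// elements, which the measurements above suggest trimming down slightly.
int suggestedMaxLevel(std::size_t expectedElements) {
    return static_cast<int>(
        std::ceil(std::log2(static_cast<double>(expectedElements))));
}

For the workload above, suggestedMaxLevel(900000) returns 20, while the measured optimum was 16.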


Scalability

To see whether the skiplist is scalable, we set up 100,000 initial nodes. Each thread performs 400,000 search operations. We vary the number of threads and obtain the figure below.

[Figure 4: Time (seconds) vs. number of threads, for pure searching and with many modifications.]

The red line is the pure-searching case. Here the skiplist is scalable: the time goes up linearly as the total number of requests (number of threads * 400,000) goes up.

The green line is searching with many add and remove requests. Except for the spike at 90, the graph is roughly linear. We have to note that the green line overestimates the time spent, because we did not free memory after deletion. As add and remove requests keep coming, the available memory shrinks, and at some point this greatly affects performance. If memory were freed at every physical deletion, the skiplist would perform better than shown in the graph.

Taking the memory problem into account, the skiplist is also scalable in a changing environment.


Conclusion

A lock-free skiplist is much more efficient than traditional data structures in a concurrent environment. Its key advantage is that node levels are generated randomly, which avoids the rebalancing problem of tree structures.

However, choosing the best max level is crucial to getting the expected performance. We suggest using the logarithm of the estimated number of elements as a first guess, or using 15, which performed well in our fairly large tests.

There is still a lot to do with the skiplist, such as addressing the ABA problem.

Migrating Java programs to C++ is not as straightforward as it seems. The garbage collection mechanism and the memory management system deeply affect the way one programs.
