Parallelization of Binary Search Trees
1 Problem Description
In this assignment, you will parallelize primitives that occur while inserting and deleting
numbers in a binary search tree (called BST in the rest of this document). You will also
learn ways to avoid deadlocks. Please refer to Chapter 12, Binary Search
Trees, in the book “Introduction to Algorithms” by Cormen, Leiserson, Rivest, and Stein.
The basic code is given in the file btree.c. You will need one of the two header files,
x86_funcs.h or aix_funcs.h, depending on which machine you use.
Fig.1 : Example of a binary search tree.
In a BST, each node has at most two children, one left and one right. If y and z are the
left and right children of a node x, then the following property always holds:
Value (y) < Value (x) < Value (z)
More generally, every value in the left subtree of x is smaller than Value (x), and every
value in the right subtree of x is larger.
This property is fundamental to a BST. In our case, each node holds an integer value.
Function node_insert() takes a node whose value is already set and inserts it into the tree.
If the value is already present in the tree, the tree is left unchanged. Function
node_delete() takes an integer as its argument, searches for it in the tree, and, if found, deletes
the node containing the value. Initially, the main() function allocates an array of nodes using
the create_node_array() function. We also create two other arrays. We pre-generate the
numbers to be inserted into the tree or deleted from the tree by calling the rand()
function for each element of these arrays. These operations are done outside the
parallel loop because neither malloc() nor rand() is parallel. Each iteration of the
insertion loop takes a number from data_array_1, stores it in the next available node from
array_1, and inserts the node into the tree by calling node_insert(). Similarly, each
iteration of the deletion loop takes a number from data_array_2, searches the tree for
a node containing the number, and, if found, deletes the node by calling node_delete().
Note that the first for loop does only insertions and the second for loop does only
deletions. The two for loops run one after the other; however, each for loop is parallelized
using OpenMP directives.
Further note that sequential execution and parallel execution may lead to different trees at
the end of the execution. This is not a correctness problem as long as serializability is
maintained: “A parallel execution of a group of operations or primitives is serializable
if there is some sequential execution of the operations or primitives that produces
an identical result.” Because the tasks modify the tree, the code shows a correct (but
inefficient) solution that uses an OpenMP coarse-grain lock encapsulating the entire body
of the insertion and deletion functions. This is similar to the coarse-grain locking approach
discussed in the textbook in Chapter 5, Section 5.2.1.
2 Assignment
2.1. Global Lock Approach. You will be provided with a sequential implementation of a
binary search tree called bst.c. Your first assignment is to implement a global lock
approach (example in Section 5.2.2), in which an insertion/deletion operation
is broken down into two logical steps: (1) the traversal to find where
to insert a node or where the node to be deleted is located, and (2) the modification
of the binary search tree. The first (traversal) step can be performed in parallel
in this approach. Only when a thread needs to modify the tree (after completing
its traversal) does it enter a critical section, modify the tree, and exit
the critical section. Of course, prior to modifying the tree, the thread must
check that the parts of the tree structure that it relies on for correctness,
and that it read before entering the critical section, have not been changed by
other threads between the time it read them and the time it successfully
entered the critical section. The correctness of your implementation greatly
depends on how you perform the check.
2.2. Fine-Grain Lock Approach. Your second assignment is to implement the
Fine-Grain Lock approach (example in Section 5.2.3). For this, you need
to associate a lock variable with each node. To simplify the problem, rather than
using both read and write locks, use write locks for everything,
including cases where read locks could be used. With this fine-grain
locking, different threads can execute completely in parallel as long as they
modify different parts of the tree. Only when two threads conflict on a
node do they execute sequentially. The correctness of your implementation
greatly depends on which nodes you lock for a given operation, the order of
their locking, and how you check for the possibility of a race situation in
which the tree has changed between the time you read it and the time you entered
the critical section. See the end of this
document for some implementation suggestions.
Perform and report the following for each of the two assignments:
• Describe how you parallelize the insertion and deletion of nodes, and argue why you
think your implementation is correct.
• State how close to optimal you expect the performance to be, in terms of overheads and
concurrency.
• Run the program and report the performance results (execution time) that you obtain
by running the program with one thread, two threads, and four threads. For each
number of threads, try the following values of NUM_ELEMENTS: 1000, 10000, 100000,
and 1000000.
• To minimize the effects of other programs running on the computer at the same
time as yours, run each experiment 3-4 times and report the best
time you get.
• Your report should contain the following:
o Two figures, one for the global lock approach and the second for the fine
lock approach. NUM_THREADS should vary along the x-axis and the
execution time should vary along the y-axis.
o Each figure consists of four different lines; each line corresponds to a
different number of elements (as mentioned above). Each line consists of
four points that represent one thread, two threads, four threads, and eight
threads.
o A detailed discussion following the figures that explains the differences
among the four lines and any other observations you can point out, e.g.,
diminishing returns.
o A comparison between the fine lock approach and the global lock
approach in terms of overhead and execution time.
Implementation suggestions
When implementing the fine-grain lock approach, you must be careful about the order in which
you lock the various nodes so as to prevent deadlocks. For example, think of what happens when
two threads try to delete a node and one of its children. It is quite possible (especially with
node_delete()) that thread 1 will want to lock nodes A,B,C,D (in that order) and thread 2
will want to lock nodes B,A,C,D (in that order). If you naively try to lock the nodes in the
order in which your code encounters them, you will run into a deadlock. Here are two
suggestions on how to overcome this problem.
1- Sort the nodes on the basis of the values stored in the nodes.
2- Sort the nodes on the basis of their addresses.
In either case, you will need to create an array of pointers to the nodes that need to be locked
and sort it. You may find the function qsort(), which is available by
including stdlib.h, to be useful. The qsort() function takes a comparison function as its fourth
argument. You may find the following function useful for the purpose.
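/* Comparison function for qsort() over an array of node pointers, ordering
   them by address. Assumes node_t is the node struct type, so that each
   array element is a node_t*; requires <stdint.h> for uintptr_t. */
int compare (const void* x, const void* y)
{
    /* qsort() passes pointers to the array elements, i.e. node_t** here. */
    uintptr_t xx = (uintptr_t)*(node_t* const*)x;
    uintptr_t yy = (uintptr_t)*(node_t* const*)y;
    int ret_val = 0;
    if (xx > yy)
        ret_val = 1;
    else if (xx < yy)
        ret_val = -1;
    return ret_val;
}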
What to hand in:
1- An output file that contains the binary tree printed out using the function
“do_inorder_print()” provided in bst.c.
2- Source code of the global lock approach and the fine lock approach. (Don’t print
source code on paper.)
3- A report file.
4- All files should be put in one zipped folder, named groupID_BST.zip.
5- You should submit it electronically (emails are not accepted).