Parallelization of Merge Sort

1. Problem Description

In this assignment, you will write a parallel version of the popular sorting algorithm Merge Sort. The basic code is given in the file merge_sort.c. You will also need one of the two header files, x86_funcs.h or aix_funcs.h, depending on which machine you use.

Merge Sort is described in detail in Section 2.3.1 of the book “Introduction to Algorithms” by Cormen, Leiserson, Rivest, and Stein. From the book:

“The merge sort algorithm closely follows the divide-and-conquer paradigm. Intuitively, it operates as follows.
Divide: Divide the n-element sequence to be sorted into two subsequences of n/2 elements each.
Conquer: Sort the two subsequences recursively using merge sort.
Combine: Merge the two sorted subsequences to produce the sorted answer.
The recursion “bottoms out” when the sequence to be sorted has length 1, in which case there is no work to be done, since every sequence of length 1 is already in sorted order.”

2. Assignment

Your assignment is to parallelize the basic merge sort algorithm. In the classical merge sort, however, each array is divided into two subarrays, so the top-level split yields only two independent sorting tasks and you are likely to see a speedup only up to two threads. Explore and experiment with dividing the list into four subarrays, with each subarray sorted by one thread. You will need to change the final combine step accordingly to handle four subarrays. Problem 6.5-8 in the above-mentioned book (2nd Edition) will be useful for merging.

Perform and report the following:
o Describe the procedure you followed to parallelize the algorithm.
o Run the program and print out the sorted list, and report the performance results (execution time) that you obtain by running the program with one thread (sequential code), two threads (the list is divided into two subarrays), and four threads (the list is divided into four subarrays).
o For each number of threads, try the following numbers of elements: NUM_ELEMENTS equal to 1000, 10000, 100000, and 1000000. To minimize the effects of other programs running on the computer at the same time as yours, run each experiment 3-4 times and report the best time you get.

Your report should include the following:
o One figure with NUM_THREADS on the x-axis and execution time on the y-axis.
o The figure should contain four lines, one for each number of elements (1000, 10000, 100000, 1000000). Each line consists of three points that represent the execution time with one thread, two threads, and four threads.
o An insightful discussion following the figure that explains the differences among the four lines and any other observations you can point out, e.g., diminishing returns.

Implementation Hints:
You will be using OpenMP directives to parallelize the code. In this part of the assignment, you need to pay attention to functional parallelism and how OpenMP achieves it, i.e., with parallel sections.

What to hand in:
1- An output file that contains the sorted list.
2- Source code for the different cases: two threads and four threads. (Do not print source code on paper.)
3- A report file.
4- All files should be put in one zipped folder named groupID_mergeSort.zip.
5- Submit the zipped folder electronically. (Email submissions are not accepted.)
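
The implementation hint about parallel sections can be illustrated with a minimal sketch in C. It is not the contents of merge_sort.c, which are not reproduced in this handout; the names merge_runs, merge_sort_seq, and merge_sort_4way are hypothetical, and the array is assumed to hold int values. The sketch splits the array into four subarrays, sorts each in its own OpenMP section, and then combines the four sorted runs with two rounds of ordinary two-way merges. That combine step is a simplification and not necessarily the merging approach the referenced textbook problem has in mind.

```c
#include <stdlib.h>
#include <string.h>

/* Merge the sorted runs a[lo..mid) and a[mid..hi), using tmp[lo..hi) as scratch. */
static void merge_runs(int *a, int *tmp, size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (a[j] < a[i]) ? a[j++] : a[i++];  /* ties take the left run */
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof *a);
}

/* Classical sequential merge sort on a[lo..hi). */
static void merge_sort_seq(int *a, int *tmp, size_t lo, size_t hi)
{
    if (hi - lo < 2) return;              /* length 0 or 1: already sorted */
    size_t mid = lo + (hi - lo) / 2;
    merge_sort_seq(a, tmp, lo, mid);
    merge_sort_seq(a, tmp, mid, hi);
    merge_runs(a, tmp, lo, mid, hi);
}

/* Hypothetical entry point. Returns 0 on success, -1 if the scratch
 * buffer cannot be allocated. */
int merge_sort_4way(int *a, size_t n)
{
    if (n < 2) return 0;
    int *tmp = malloc(n * sizeof *tmp);
    if (!tmp) return -1;

    /* Boundaries of the four subarrays: [0,q1) [q1,q2) [q2,q3) [q3,n). */
    size_t q1 = n / 4, q2 = n / 2, q3 = q2 + (n - q2) / 2;

    /* Each section touches a disjoint range of a and tmp, so the
     * four sorts can run concurrently without synchronization. */
    #pragma omp parallel sections num_threads(4)
    {
        #pragma omp section
        merge_sort_seq(a, tmp, 0, q1);
        #pragma omp section
        merge_sort_seq(a, tmp, q1, q2);
        #pragma omp section
        merge_sort_seq(a, tmp, q2, q3);
        #pragma omp section
        merge_sort_seq(a, tmp, q3, n);
    }

    /* Combine, round 1: the two pairwise merges are independent. */
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        merge_runs(a, tmp, 0, q1, q2);
        #pragma omp section
        merge_runs(a, tmp, q2, q3, n);
    }

    /* Combine, round 2: the final merge of the two halves is sequential. */
    merge_runs(a, tmp, 0, q2, n);

    free(tmp);
    return 0;
}
```

With GCC, OpenMP support is typically enabled with the -fopenmp flag; without it the pragmas are ignored and the same code runs sequentially, which can serve as the one-thread baseline. A two-thread variant would follow the same pattern with a single split at n / 2 and one final merge.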