CEng 315 Homework 2 – External Merge Sort

Overview of External Merge Sort

External merge sort is useful when the data to be sorted is too big to fit into memory. The basic idea is to sort the data in chunks and then combine these chunks using a K-way merge. Assume that we have N unsorted elements in a data file and that main memory can hold only M data items at any given time.

In the first stage of the algorithm, M items are read from the file, sorted, and written to a temporary output file. Such a sorted set of M items is called a run. This is repeated until all N items have been consumed; the last run may contain fewer than M items. In your implementation, you must use the “sort” function from the <algorithm> library to sort the M items. You must also use a single temporary file named “extmerge.out”, in which you concatenate all of the runs.

In the second stage of the algorithm, K sorted runs are merged at a time until a single run is left, which is the final output. To merge K runs, a single item is read from each of the K runs and inserted into a priority queue (you must use the “priority_queue” container from the <queue> library for this purpose). The smallest item is then removed from the priority queue and written to a temporary output file. A new data item is read from the run that the smallest item came from and is inserted into the queue. This process is repeated until all the items from the K runs have been read and written to the temporary file in sorted order. The merge stage is repeated hierarchically in multiple passes. The example below shows a two-pass external merge sort for K=3.

Implementation Instructions:

Implement your external merge sort program in C++ and submit your code as a single “extmerge.cpp” file. You must use the STL data structures and algorithms as described above. I recommend using the stdio.h library for file input/output instead of C++ I/O streams. You must use only two files: the original “extmerge.inp” input file and the “extmerge.out” output file.
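
The merge step from the overview can be sketched as follows. This is a minimal in-memory illustration, not the required file-based implementation: the function name “merge_runs” is hypothetical, the runs are held in vectors rather than read from disk, and appending to “merged” stands in for writing to the temporary output file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Merge K sorted runs into one sorted sequence using a min-heap.
// Each heap entry is (value, index of the run the value came from).
// Assumption: runs live in memory here; a real solution reads them from a file.
std::vector<double> merge_runs(const std::vector<std::vector<double>>& runs) {
    using Entry = std::pair<double, std::size_t>;
    // std::greater turns the default max-heap into a min-heap.
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;

    // next[i] is the position of the next unread item in run i.
    std::vector<std::size_t> next(runs.size(), 0);

    // Seed the heap with the first item of every non-empty run.
    for (std::size_t i = 0; i < runs.size(); ++i) {
        assert(std::is_sorted(runs[i].begin(), runs[i].end()));
        if (!runs[i].empty()) {
            heap.emplace(runs[i][0], i);
            next[i] = 1;
        }
    }

    std::vector<double> merged;
    while (!heap.empty()) {
        Entry smallest = heap.top();
        heap.pop();
        merged.push_back(smallest.first);  // stands in for the file write

        // Refill from the run that supplied the item just removed.
        std::size_t r = smallest.second;
        if (next[r] < runs[r].size()) {
            heap.emplace(runs[r][next[r]], r);
            ++next[r];
        }
    }
    return merged;
}
```

The heap never holds more than one item per run, which is why the merge needs memory for only K items at a time regardless of how long the runs are.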
You can alternate between “extmerge.inp” and “extmerge.out” during each pass, using one of them as input and the other as the temporary output file. You may use the “fseek” function to move to the location of a run in the temporary input file. If the final sorted items do not end up in “extmerge.out”, copy them over to that file.

Input File Format:

The “extmerge.inp” file is given in the following binary format. The data to be sorted are double-precision floating-point numbers. The number of data items N is not given in the input file; you need to deduce it from the file size. You can ignore the value of the number of passes P given in the input file.

long M
long K
long P
double x1
double x2
...
double xN

Output File Format:

The “extmerge.out” output file needs to be in the same binary format as the input file. The value P must be the actual number of passes your program used in external merge sort. You can use the sample input and output binary files given on COW to test your program.

Grading Notes:

Your program will be allowed to use only a limited amount of memory, to enforce external merge sort. The grade you receive from the autograder on COW is not final; your implementation will be tested further after the deadline and confirmed manually. Any program that does not implement external merge sort as described above will receive a grade of zero. All the code must be written by you. Do not copy-paste any portion of your code from other sources. Your source code will be checked automatically for plagiarism (against other students’ homework submissions and against the web).
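
For the input file format described above, reading the header and deducing N from the file size might look like the sketch below. The names “Header”, “read_header”, and “initial_run_count” are hypothetical. The sketch assumes the three header fields are stored back to back as native “long” values on the same platform that wrote the file, and that the file is small enough for its size to fit in a “long”.

```cpp
#include <cassert>
#include <cstdio>

// Header fields stored at the start of the file, in this order.
struct Header {
    long M;  // number of items that fit in memory
    long K;  // number of runs merged at a time
    long P;  // number of passes (ignored on input)
};

// Reads the header and deduces N from the file size.
// Returns true on success; n_items receives the item count N.
bool read_header(const char* path, Header& h, long& n_items) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;

    // Read the fields one by one so struct layout does not matter.
    bool ok = std::fread(&h.M, sizeof(long), 1, f) == 1 &&
              std::fread(&h.K, sizeof(long), 1, f) == 1 &&
              std::fread(&h.P, sizeof(long), 1, f) == 1;

    // Jump to the end of the file to learn its total size in bytes.
    long file_size = -1;
    if (ok && std::fseek(f, 0, SEEK_END) == 0) file_size = std::ftell(f);
    std::fclose(f);

    const long header_bytes = 3 * static_cast<long>(sizeof(long));
    if (!ok || file_size < header_bytes) return false;

    // Everything after the header is a sequence of doubles.
    n_items = (file_size - header_bytes) / static_cast<long>(sizeof(double));
    return true;
}

// Number of runs produced by the first stage: ceil(N / M).
long initial_run_count(long n_items, long m) {
    assert(m > 0);
    return (n_items + m - 1) / m;
}
```

With N and M known, the first stage produces ceil(N / M) runs, and each run except possibly the last starts at a fixed offset from the header, which is what makes seeking to a given run straightforward.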