20081.hw2.externalmerge

CEng 315 Homework 2 – External Merge Sort
Overview of External Merge Sort
External merge sort is useful when the data to be sorted is too big to fit into memory. The basic idea is to sort
the data in chunks that do fit and then combine the sorted chunks using a K-way merge.
Assume that we have N unsorted elements in a data file and that main memory can hold only M data items
at any given time. In the first stage of the algorithm, the file is read M items at a time. Each group of M items is
sorted and written to a temporary output file; the last group may hold fewer than M items if N is not a multiple
of M. Such a sorted group is called a run. In your implementation, you must use the “sort” function available in the
<algorithm> library to sort each group. You must also use a single temporary file named “extmerge.out”, in which
you concatenate all of the runs.
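The first stage described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the required solution: the helper name make_runs is hypothetical, and it assumes that both files are already open in binary mode and positioned just past their headers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical helper (not part of the assignment): reads doubles from `in`,
// sorts them M at a time, and appends each sorted run to `out`.
// Assumes `in` is positioned at the first data item. Returns the number of runs.
long make_runs(std::FILE* in, std::FILE* out, long M) {
    assert(M > 0);
    std::vector<double> buf(static_cast<std::size_t>(M));  // holds at most M items
    long runs = 0;
    std::size_t got;
    while ((got = std::fread(buf.data(), sizeof(double), buf.size(), in)) > 0) {
        // The last chunk may be shorter than M, so sort only what was read.
        std::sort(buf.begin(), buf.begin() + got);
        std::fwrite(buf.data(), sizeof(double), got, out);
        ++runs;
    }
    return runs;
}
```

With M = 2 and the input 5, 1, 4, 3, 2, this writes the three runs (1, 5), (3, 4) and (2) back to back.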
In the second stage of the algorithm, K sorted runs are merged at a time until there is a single run left (which
comprises the final output). To merge K runs, a single item is read from each of the K runs and inserted into a priority
queue (you must use the “priority_queue” container from the <queue> library for this purpose). The smallest item is
then removed from the priority queue and saved to a temporary output file. A new data item is read from the run
that the smallest item came from and is inserted into the queue. The process is repeated until all the
items from the K runs have been read and saved into the temporary file in sorted order. The merge stage is repeated
hierarchically in multiple passes. The example below shows a 2-pass external merge sort for K=3.
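As a rough illustration of a single merge step as described above, the sketch below merges already-sorted runs with a priority_queue. To stay short it keeps the runs in memory as vectors, which stand in for the runs on disk; a real solution would read one item at a time from the file instead. The names Item and merge_runs are hypothetical. Note that priority_queue is a max-heap by default, so std::greater is used to make the smallest item come out first.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// (value, index of the run the value came from)
using Item = std::pair<double, std::size_t>;

// Hypothetical helper: merges sorted runs into one sorted sequence.
// The vectors are an in-memory stand-in for runs stored in a file.
std::vector<double> merge_runs(const std::vector<std::vector<double>>& runs) {
    // std::greater turns the default max-heap into a min-heap.
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    std::vector<std::size_t> next(runs.size(), 0);  // next unread position per run

    // Seed the queue with the first item of every non-empty run.
    for (std::size_t r = 0; r < runs.size(); ++r)
        if (!runs[r].empty()) pq.push({runs[r][next[r]++], r});

    std::vector<double> out;
    while (!pq.empty()) {
        Item top = pq.top();
        pq.pop();
        // Holds as long as every input run was sorted.
        assert(out.empty() || out.back() <= top.first);
        out.push_back(top.first);
        // Refill from the run that supplied the item just removed.
        std::size_t r = top.second;
        if (next[r] < runs[r].size()) pq.push({runs[r][next[r]++], r});
    }
    return out;
}
```

The queue never holds more than one item per run, which is what keeps the memory use of a K-way merge at about K items.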
Implementation Instructions: Implement your external merge sort program in C++ and submit your code as a
single “extmerge.cpp” file. You must use the data structures and algorithms available in the STL as described above. I
recommend that you use the stdio.h library for file input/output instead of C++ I/O streams. You must use only two
files: the original “extmerge.inp” input file and the “extmerge.out” output file. You can alternate between these two
files during each pass, using one of them as input and the other as the temporary output file. You may use the “fseek”
function to read from the location of a given run in the temporary input file. If the final sorted items do not
already end up in the “extmerge.out” file, copy them over to it.
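One possible way to position reads at a given run is sketched below. It assumes that the file starts with the three long values of the header and that every run before the requested one holds exactly run_len items; the helper name read_item and the constant HEADER_BYTES are hypothetical.

```cpp
#include <cassert>
#include <cstdio>

// Three long values (M, K, P) precede the data items.
const long HEADER_BYTES = static_cast<long>(3 * sizeof(long));

// Hypothetical helper: reads item `i` of run `r` from `f`, assuming all runs
// before `r` contain exactly `run_len` items. Returns false if the item does
// not exist or the seek fails.
bool read_item(std::FILE* f, long run_len, long r, long i, double* value) {
    assert(run_len > 0 && r >= 0 && i >= 0);
    long offset = HEADER_BYTES + (r * run_len + i) * static_cast<long>(sizeof(double));
    if (std::fseek(f, offset, SEEK_SET) != 0) return false;
    return std::fread(value, sizeof(double), 1, f) == 1;
}
```

Seeking before every single read is simple but slow; keeping one stored file position per run and seeking only when switching runs is a common refinement.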
Input File Format: The “extmerge.inp” file is given in the following binary format. The data to be sorted are
double-precision floating-point numbers. The number of data items N is not given in the input file; you need to deduce it
from the file size. You can ignore the value of the number of passes P given in the input file.
long M | long K | long P | double x1 | double x2 | ... | double xN
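Deducing N from the file size, as required above, could look roughly like the following sketch. The struct Header and the function read_header are hypothetical names, and the code assumes the file was written on a machine with the same sizes for long and double as the one reading it.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical holder for the three header fields.
struct Header {
    long M, K, P;
};

// Hypothetical helper: fills `h` from the start of `f` and returns the number
// of data items N deduced from the file size, or -1 on error. On success the
// file is left positioned at the first data item.
long read_header(std::FILE* f, Header* h) {
    assert(f != nullptr && h != nullptr);
    const long header_bytes = static_cast<long>(3 * sizeof(long));
    if (std::fseek(f, 0, SEEK_END) != 0) return -1;
    long size = std::ftell(f);          // total file size in bytes
    if (size < header_bytes) return -1;
    std::rewind(f);
    long fields[3];
    if (std::fread(fields, sizeof(long), 3, f) != 3) return -1;
    h->M = fields[0];
    h->K = fields[1];
    h->P = fields[2];
    return (size - header_bytes) / static_cast<long>(sizeof(double));
}
```

For very large inputs, ftell may not be able to report the size where long is 32 bits wide, so this sketch only suits files below that limit.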
Output File Format: The “extmerge.out” output file needs to be in the same binary format as the input file. The
value P needs to be the correct number of passes your program used in external merge sort. You can use the sample
input and output binary files given on COW to test your program.
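For checking the value of P, the number of merge passes can be estimated from N, M and K as in the sketch below. It rests on an assumption that should be verified against the sample files: only merge passes are counted, and the initial run-creation stage is not counted as a pass. The function name count_merge_passes is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: number of K-way merge passes needed to reduce the
// initial runs to a single run. Assumes the run-creation stage itself is
// NOT counted as a pass; check this against the sample files.
long count_merge_passes(long N, long M, long K) {
    assert(N >= 0 && M > 0 && K > 1);
    long runs = (N + M - 1) / M;      // ceil(N / M) initial runs
    long passes = 0;
    while (runs > 1) {
        runs = (runs + K - 1) / K;    // each pass merges up to K runs into one
        ++passes;
    }
    return passes;
}
```

For instance, nine initial runs with K = 3 shrink to three runs and then to one, which gives two passes.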
Grading Notes: Your program will be allowed only a limited amount of memory, so that it has to perform an external
merge sort. The grade you receive from the autograder on COW is not final; your implementation will be tested further
after the deadline and confirmed manually. Any program that does not implement external merge sort as described above
will receive a grade of zero.
All the code must be written by you. Do not copy and paste any portion of your code from other sources. Your
source code will be checked automatically for plagiarism (against other students’ homework, and against the web).