Sorting Really Big Files
Sorting Part 3


Using K Temporary Files

  - Given N records in file F
  - M records will fit into internal memory
  - Use K temp files, where K = N / M
  - Create K sorted files from F, then merge them

Problems

  - Computers compare 2 values at once, not K values
  - Merging only 2 of K runs at once creates LOTS of temp files
  - In the illustration on the next page, notice that we soon begin merging small runs with big temp files
  - Too many comparisons

What would these trees look like with 8 runs?


Alternative Merging Strategy

  [Figure: merge trees with nodes labeled F, R1 to R5, T1 to T3, S1, S2, and "empty". R1 = Run 1, R2 = Run 2, etc.]


N-Way Merge

  - We can create that tree using just 4 temp files
  - 2 are input and 2 are output; the pairs alternate being input and output files

Algorithm

  Write Run 1 into T1
  Write Run 2 into T2
  Write Run 3 into T1
  Write Run 4 into T2
  ...
  Merge first runs in T1 and T2 into T3
  Merge second runs in T1 and T2 into T4
  Merge third runs in T1 and T2 into T3
  ...
  Merge first runs in T3 and T4 into T1
  Merge second runs in T3 and T4 into T2
  ...


N-Way Merge

  F -> T1, T2 -> T3, T4 -> T1, T2 -> T3, T4

  Step Number   Files Contain Runs

  1             T1 - R1 R3 R5 R7 R9
                T2 - R2 R4 R6 R8 R10
                T3 -
                T4 -

  2             T1 -
                T2 -
                T3 - R1-R2 R5-R6 R9-R10
                T4 - R3-R4 R7-R8

  3             T1 - R1-R4 R9-R10
                T2 - R5-R8
                T3 -
                T4 -

  4             T1 -
                T2 -
                T3 - R1-R8
                T4 - R9-R10

  5             T1 - R1-R10
                T2 -
                T3 -
                T4 -


Analysis

  Number of Comparisons:
    N-Way Merge  -- O(n log2 n)
    K Temp Files -- O(n^2)

  Disk Space

  Could the run size be one record? In other words, is the internal sort necessary?
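
The four-file merge described above can be sketched in Python. This is only an illustration under stated assumptions, not the slides' own code: the temp files are simulated as in-memory lists of runs rather than real disk files, and the names `make_runs` and `balanced_merge_sort` are made up for this example. Each pass merges the i-th run of one input file with the i-th run of the other and writes the results alternately to the two output files, after which the pairs swap roles.

```python
import heapq


def make_runs(records, m):
    """Split records into sorted runs of at most m records (the internal sort)."""
    return [sorted(records[i:i + m]) for i in range(0, len(records), m)]


def balanced_merge_sort(records, m):
    """Sort with four simulated temp files; each 'file' is a list of runs.

    Returns the sorted records and the number of merge passes performed.
    """
    runs = make_runs(records, m)
    # Distribute runs alternately: odd-numbered runs to T1, even-numbered to T2.
    inputs = [runs[0::2], runs[1::2]]
    passes = 0
    while len(inputs[0]) + len(inputs[1]) > 1:
        outputs = [[], []]
        first, second = inputs
        for i in range(len(first)):
            # The second file may hold one run fewer than the first;
            # in that case the leftover run is copied through unchanged.
            pair = [first[i]] + ([second[i]] if i < len(second) else [])
            merged = list(heapq.merge(*pair))
            # Write merged runs alternately to the two output files.
            outputs[i % 2].append(merged)
        inputs = outputs  # the output pair becomes the next input pair
        passes += 1
    result = inputs[0][0] if inputs[0] else []
    return result, passes


if __name__ == "__main__":
    data = list(range(20, 0, -1))      # 20 records
    result, passes = balanced_merge_sort(data, 2)  # M = 2 gives 10 runs
    print(result == sorted(data), passes)
```

With 20 records and M = 2 there are 10 initial runs, and the sketch finishes in 4 merge passes, matching steps 2 through 5 of the table. Calling it with M = 1 is one way to explore the closing question about whether the internal sort is necessary: the result is still sorted, but more passes are needed.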