Sorting Really Big Files

Sorting Part 3
Using K Temporary Files

Given
 N records in file F
 M records will fit into internal memory
 Use K temp files, where K = N / M (rounded up)

Create K sorted files from F, then merge them
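
A minimal sketch of the run-creation step, assuming the records are plain comparable values and using in-memory lists as stand-ins for the K temp files. The name make_runs and the parameter m are illustrative only, not part of the original slides.

```python
from itertools import islice

def make_runs(records, m):
    """Split records into sorted runs of at most m records each."""
    it = iter(records)
    runs = []
    while True:
        chunk = list(islice(it, m))  # read the next m records into "memory"
        if not chunk:
            break
        chunk.sort()                 # internal sort of one memory load
        runs.append(chunk)           # stands in for writing one temp file
    return runs

runs = make_runs([9, 4, 7, 1, 8, 2, 6, 3, 5, 0], m=3)
print(runs)  # [[4, 7, 9], [1, 2, 8], [3, 5, 6], [0]]
```

Here N = 10 and M = 3, so K = 4 runs are produced; the last run is short because N is not a multiple of M.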

Problems
 computers compare 2 values at once, not K values
 merging only 2 of K runs at once creates LOTS of temp files
 in the illustration below, notice that we soon begin merging small runs with big temp files
  too many comparisons
What would these trees look like with 8 runs?
Alternative Merging Strategy
[Figure: merge trees built from file F, runs R1 to R5, and temp files T1 to T3. R1 = Run 1, R2 = Run 2, etc.]
N-Way Merge

We can create that tree using just 4 temp files

 2 are input and 2 are output; the pairs alternate between being the input files and the output files
Algorithm
 Write Run 1 into T1
 Write Run 2 into T2
 Write Run 3 into T1
 Write Run 4 into T2
 ...
 Merge first runs in T1 and T2 into T3
 Merge second runs in T1 and T2 into T4
 Merge third runs in T1 and T2 into T3
 ...
 Merge first runs in T3 and T4 into T1
 Merge second runs in T3 and T4 into T2
 ...
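
The algorithm above can be sketched as follows. This is only an illustration under simplifying assumptions: each "file" is a Python list of runs, each run is a sorted list, and the names merge_two and balanced_merge are hypothetical. A real implementation would stream records to and from disk instead.

```python
def merge_two(a, b):
    """Standard two-way merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])  # at most one of these two tails is non-empty
    out.extend(b[j:])
    return out

def balanced_merge(runs):
    """Merge sorted runs using two input 'files' and two output 'files'."""
    if not runs:
        return []
    # Distribute: runs 1, 3, 5, ... go to T1; runs 2, 4, 6, ... go to T2.
    in1, in2 = runs[0::2], runs[1::2]
    while len(in1) + len(in2) > 1:
        out1, out2 = [], []
        for k in range(len(in1)):
            # Merge the k-th run of each input; the second input may be one run short.
            other = in2[k] if k < len(in2) else []
            merged = merge_two(in1[k], other)
            # Merged runs alternate between the two output files.
            (out1 if k % 2 == 0 else out2).append(merged)
        in1, in2 = out1, out2  # the output pair becomes the next input pair
    return in1[0]

print(balanced_merge([[4, 7, 9], [1, 2, 8], [3, 5, 6], [0]]))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Swapping the roles of the two pairs after every pass is what keeps the file count at four no matter how many runs there are.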
N-Way Merge
File flow: F -> (T1, T2) -> (T3, T4) -> (T1, T2) -> (T3, T4)

Step Number   Files Contain Runs
1             T1 - R1  R3  R5  R7  R9
              T2 - R2  R4  R6  R8  R10
              T3, T4 - empty
2             T1, T2 - empty
              T3 - R1-R2  R5-R6  R9-R10
              T4 - R3-R4  R7-R8
3             T1 - R1-R4  R9-R10
              T2 - R5-R8
              T3, T4 - empty
4             T1, T2 - empty
              T3 - R1-R8
              T4 - R9-R10
5             T1 - R1-R10
              T2, T3, T4 - empty
Analysis

Number of Comparisons:
 N-Way Merge -- O(n log2 n)
 K Temp Files -- O(n^2)
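
To make the gap between the two bounds concrete, here is a rough counting sketch. It assumes a simple cost model in which merging runs of lengths a and b takes at most a + b - 1 comparisons, and it ignores the cost of the internal sort. The function names are illustrative only.

```python
def sequential_cost(k, m):
    """Worst-case comparisons: merge run 1 with run 2, the result with run 3, and so on."""
    total, size = 0, m
    for _ in range(k - 1):
        total += size + m - 1  # merge the growing temp file with one run of m records
        size += m
    return total

def balanced_cost(k, m):
    """Worst-case comparisons when runs are merged pairwise, pass by pass."""
    sizes, total = [m] * k, 0
    while len(sizes) > 1:
        nxt = []
        for i in range(0, len(sizes) - 1, 2):
            total += sizes[i] + sizes[i + 1] - 1
            nxt.append(sizes[i] + sizes[i + 1])
        if len(sizes) % 2:
            nxt.append(sizes[-1])  # an unpaired run is carried into the next pass
        sizes = nxt
    return total

print(sequential_cost(8, 1), balanced_cost(8, 1))  # 28 17
```

With 8 one-record runs the difference is small, but the sequential total grows roughly with the square of the number of runs, while the balanced total grows with the number of runs times the number of passes.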

Disk Space

Could the run size be one record?

In other words, is the internal sort necessary?