Sorting Large Files

Part One:

Why even bother?
 And a simple solution.
Starter Questions

Why sort a large data file?


speed of searching
Why not sort a large data file?

difficult to add and delete data
Searching Unsorted Files

Algorithm - Sequential Search
 start at top of the file and inspect each record until found
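
The sequential search above can be sketched in Python. This is a minimal illustration, not the lecture's own code: it assumes the records are already available as an in-memory sequence, and the function name sequential_search and the returned compare count are hypothetical additions for demonstration.

```python
def sequential_search(records, target):
    """Scan records from the top until target is found.

    Returns (index, compares); index is -1 when target is absent.
    The compare count is included only to illustrate the cost.
    """
    compares = 0
    for index, record in enumerate(records):
        compares += 1
        if record == target:
            return index, compares
    return -1, compares
```

Finding the first record costs 1 compare, while finding the last record (or a missing one) costs one compare per record in the file.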

Efficiency

best case: 1 compare
worst case: N compares
average case: N/2 compares
average search for 1,000,000 records is 500,000 compares
Big O: O(N)
Searching Sorted Files

Example 1: Sequential Search

Example 2: Binary Search
 Basic Algorithm
look at middle record
if (target == current record)
    found
else if (target < current record)
    look at front half
else
    look at end half
repeat on the chosen half until found or nothing is left
 Big O = log2(N)
a search of 1,000,000 records takes about 20 compares
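
A minimal Python sketch of the binary search outlined above. It assumes the sorted records support random access by index (a list here, standing in for a file of fixed-length records); the name binary_search and the probe counter are hypothetical and exist only to show the cost.

```python
def binary_search(records, target):
    """Search a sorted, indexable sequence by repeated halving.

    Returns (index, probes); index is -1 when target is absent.
    A probe is one look at a middle record.
    """
    low, high = 0, len(records) - 1
    probes = 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1
        if records[mid] == target:
            return mid, probes
        if target < records[mid]:
            high = mid - 1   # look at front half
        else:
            low = mid + 1    # look at end half
    return -1, probes
```

With 1,000,000 sorted records the loop can halve the range at most 20 times, which is where the figure of about 20 comes from.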
Editing Unsorted Files

How do you add data?


append new data to end of file
How do you delete data?


mark over records with Xs and 0s
periodically clean the file
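
One way these three operations might look in Python, as a rough sketch under stated assumptions: one record per line, a record made up entirely of X characters counts as deleted, and the helper names append_record, mark_deleted, and clean are hypothetical.

```python
import os

TOMBSTONE = b"X"  # assumption: a record consisting only of Xs is "deleted"

def append_record(path, record):
    """Add data by appending it to the end of the file."""
    with open(path, "ab") as f:
        f.write(record.encode("utf-8") + b"\n")

def mark_deleted(path, target):
    """Overwrite every record equal to target with Xs, in place."""
    key = target.encode("utf-8")
    marked = 0
    with open(path, "r+b") as f:
        while True:
            start = f.tell()
            line = f.readline()
            if not line:
                break
            if line.rstrip(b"\r\n") == key:
                resume = f.tell()
                f.seek(start)
                f.write(TOMBSTONE * len(key))  # same length, so nothing shifts
                f.seek(resume)
                marked += 1
    return marked

def clean(path):
    """Periodic cleanup: copy the live records to a new file, then swap it in."""
    tmp_path = path + ".tmp"
    with open(path, "rb") as src, open(tmp_path, "wb") as dst:
        for line in src:
            body = line.rstrip(b"\r\n")
            if body and body.strip(TOMBSTONE) == b"":
                continue  # skip records that were marked over
            dst.write(line)
    os.replace(tmp_path, path)
```

Marking over a record is cheap because the file never changes size; the cost of actually reclaiming the space is deferred to the occasional clean pass.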
Editing Sorted Files

To delete records, we cannot put Xs over the key field of records, because the overwritten keys would no longer be in sorted order

Maintain 3 sorted files
 working data
 data to delete
 data to add

To Update --> Merge the three all at once
Example Update of Sorted File
Working Data:
aardvark
bat
cat
dog
giraffe
hippopotamus
Data to Delete:
cat
Data to Add:
elephant
ferret
New Working Data:
aardvark
bat
dog
elephant
ferret
giraffe
hippopotamus
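
The update above can be sketched as a single pass over the three sorted inputs. This is an illustrative Python version, not the lecture's code: it works on in-memory sequences rather than files, the name update_sorted is hypothetical, and it assumes a key in the delete list also cancels a matching key in the add list.

```python
import heapq

def update_sorted(working, to_delete, to_add):
    """Merge three sorted sequences in one pass, yielding the new working data."""
    deletions = iter(to_delete)
    pending = next(deletions, None)  # next key waiting to be deleted
    # heapq.merge interleaves two sorted inputs lazily, in sorted order
    for record in heapq.merge(working, to_add):
        # Advance past delete keys that sort before the current record
        while pending is not None and pending < record:
            pending = next(deletions, None)
        if pending is not None and pending == record:
            continue  # this record is on the delete list, so drop it
        yield record

working = ["aardvark", "bat", "cat", "dog", "giraffe", "hippopotamus"]
new_working = list(update_sorted(working, ["cat"], ["elephant", "ferret"]))
```

Because all three inputs are sorted, each is read exactly once from front to back, which is what makes the approach practical for files too large to hold in memory.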
Question

Why would we ever need to sort a file?
Wouldn't we build it sorted to begin with
and just keep it sorted?
 sort a big block of new data
   e.g., list of transactions from today
 sort a huge file by a different key
File Sorting Algorithms

Internal Sorts
 when the whole file will fit in main memory
 algorithm:
1. read the unsorted file into memory
2. sort all at once
3. write to new file
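
The three steps above might look like this in Python. It is a minimal sketch assuming a text file with one record per line, sorted by the whole line; the name internal_sort and the path parameters are hypothetical.

```python
def internal_sort(src_path, dst_path):
    """Sort a file that fits in main memory."""
    # 1. read the unsorted file into memory
    with open(src_path, encoding="utf-8") as src:
        records = src.read().splitlines()
    # 2. sort all at once
    records.sort()
    # 3. write to new file
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(record + "\n" for record in records)
```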
File Sorting Algorithms

External Sorts
 when the file is too big to fit in memory
 oversimplified algorithm:
while not eof
    read a big block of the data into memory
    sort that portion
    write into a temp file
merge all those temp files
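
A rough Python sketch of the oversimplified algorithm above. It assumes one record per line, measures a "big block" as a number of records (the hypothetical chunk_size parameter), and uses the standard library's heapq.merge for the final merge; the names external_sort and read_records are made up for this illustration.

```python
import heapq
import itertools
import os
import tempfile

def read_records(f):
    """Yield records from an open text file, without line endings."""
    for line in f:
        yield line.rstrip("\n")

def external_sort(src_path, dst_path, chunk_size=100_000):
    """Sort a file too big for memory: sorted temp files first, then one merge."""
    run_paths = []
    with open(src_path, encoding="utf-8") as src:
        while True:
            # read a big block of the data into memory
            chunk = list(itertools.islice(read_records(src), chunk_size))
            if not chunk:
                break  # eof
            chunk.sort()  # sort that portion
            fd, run_path = tempfile.mkstemp(suffix=".run")
            with os.fdopen(fd, "w", encoding="utf-8") as run:
                run.writelines(r + "\n" for r in chunk)  # write into a temp file
            run_paths.append(run_path)

    # merge all those temp files, holding only one record per file in memory
    runs = [open(p, encoding="utf-8") for p in run_paths]
    try:
        merged = heapq.merge(*(read_records(run) for run in runs))
        with open(dst_path, "w", encoding="utf-8") as dst:
            dst.writelines(r + "\n" for r in merged)
    finally:
        for run in runs:
            run.close()
        for p in run_paths:
            os.remove(p)
```

A real implementation would also have to cap how many temp files are open at once; this sketch opens all of them together.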
2-Way Merge Sort
Create 2 sorted files
Read 1st half of file W into memory
sort it, then write to file X
Read 2nd half of W into memory
sort it, then write to file Y
Merge the 2 files
Read record x from X
Read record y from Y
While both X and Y contain records
    if x < y
        write x to Z
        read x from X
    else
        write y to Z
        read y from Y
If X is empty
    write remainder of Y to Z
else
    write remainder of X to Z
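
The merge step above, sketched in Python. To keep the example short it takes two sorted in-memory sequences in place of files X and Y and returns a list in place of file Z; the name merge_two and the sentinel object are hypothetical details of this illustration.

```python
def merge_two(x_records, y_records):
    """Merge two sorted sequences into one sorted list."""
    xs, ys = iter(x_records), iter(y_records)
    done = object()  # sentinel meaning "this input has no more records"
    x, y = next(xs, done), next(ys, done)
    z = []
    # While both X and Y contain records, emit the smaller one
    while x is not done and y is not done:
        if x < y:
            z.append(x)
            x = next(xs, done)
        else:
            z.append(y)
            y = next(ys, done)
    # Write the remainder of whichever input is not empty,
    # including the record that was already read from it
    while x is not done:
        z.append(x)
        x = next(xs, done)
    while y is not done:
        z.append(y)
        y = next(ys, done)
    return z
```

Only one record from each input is held at a time, so the same loop works on files of any size when the reads and writes go to disk.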
Next Time

Good internal sorts

Merging a small amount of unsorted new
data into a Big Sorted File

N-Way Merge Sort