The Multi-Disciplinary Nature of Technology

When is O(n lg n) Really O(n lg n)?
A Comparison of the Quicksort and Heapsort Algorithms
Gerald Kruse
Juniata College
kruse@juniata.edu
Huntingdon, PA
Outline
• Analyzing Sorting Algorithms
• Quicksort
• Heapsort
• Experimental Results
• Observations
(this is a fun, open-ended student project)
How Fast is my Sorting Algorithm?
“A nice blend of Math and CS”
The Sorting Problem, from Cormen et al.
Input: A sequence of n numbers, $\langle a_1, a_2, \ldots, a_n \rangle$

Output: A permutation (reordering) $\langle a'_1, a'_2, \ldots, a'_n \rangle$ of the input sequence such that $a'_1 \le a'_2 \le \cdots \le a'_n$
Note: This definition can be expanded to include sorting other data such as characters, strings, alphanumeric data, and data records with key values.
• Sorting algorithms are analyzed using many different metrics: expected running time, memory usage, communication bandwidth, implementation complexity, …
• Expected running time is given using “Big-O” notation:
$O(g(n)) = \{ f(n) : \text{there exist positive constants } c \text{ and } n_0 \text{ such that } 0 \le f(n) \le c\,g(n) \text{ for all } n \ge n_0 \}$
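As a worked instance of the definition (the function f is chosen only for illustration), a cost of 3n lg n + 5n lies in O(n lg n), with witnesses c = 8 and n0 = 2:

```latex
0 \le f(n) = 3n\lg n + 5n \le 3n\lg n + 5n\lg n = 8\,n\lg n \qquad \text{for all } n \ge 2
```

The middle step uses only the fact that lg n ≥ 1 once n ≥ 2.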
While O-notation describes an asymptotic upper bound on a function, it is
frequently used to describe asymptotically tight bounds.
Algorithm analysis also requires a model of the implementation technology to be used
The most commonly used model is RAM, the Random-Access Machine. This should NOT be confused with Random-Access Memory.
Each instruction takes the same constant amount of time
Memory hierarchy (cache, virtual memory) is NOT modeled
The RAM model is relatively straightforward and “usually an excellent predictor of performance on actual machines.”
Quicksort
“Good” partitioning means the two partitions are roughly equal in size
After a partition, the element partitioned around (the pivot) is in its correct, final position
There are about n compares per level and lg n levels, so the algorithm should run in time proportional to n lg n, under the assumptions of the RAM model
Quicksort, continued
Pathological data (for example, already-sorted input when the last element is the pivot) leads to “bad,” unbalanced partitions and the worst case for Quicksort
The element partitioned around is already in sorted position, so one partition is empty
This data will be sorted in $O(n^2)$ time, since there are still about n compares per level, but now there are n − 1 levels.
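Adding up the compares level by level makes the quadratic bound explicit. This sketch assumes, as in the Lomuto scheme of Cormen et al., that partitioning a k-element subarray makes k − 1 compares, and writes C(n) for the total:

```latex
C(n) = \sum_{k=2}^{n} (k-1) = \sum_{j=1}^{n-1} j = \frac{n(n-1)}{2} \in \Theta(n^2)
```

For n = 10 this is 45 compares.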
Heaps
A heap can be seen as a complete binary tree:
[Figure: the heap drawn as a binary tree; root 16; next level 14, 10; then 8, 7, 9, 3; bottom level 2, 4, 1]
In practice, heaps are usually implemented as arrays.
A = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
Heaps, continued
Heaps satisfy the heap property: $A[\mathrm{Parent}(i)] \ge A[i]$ for all nodes $i > 1$
In other words, the value of a node is at most the value of its parent.
By the way, eBay uses a “heap-like” data structure to track bids.
Heapsort
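Heapsort(A)              // A[1..n], 1-based indices as in Cormen et al.
{
BuildHeap(A);            // turn A into a heap: O(n)
for (i = length(A) downto 2)
{
Swap(A[1], A[i]);        // move the current maximum to its final position
heap_size(A) -= 1;       // the heap no longer includes A[i]
Heapify(A, 1);           // restore the heap property at the root: O(lg n)
}
}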
When the heap property is violated at just one node i, whose subtrees are valid heaps, Heapify “floats down” A[i], swapping it with its larger child until the heap property holds again. Since the heap is a complete binary tree of height ⌊lg n⌋, each Heapify call takes O(lg n) time.
Since BuildHeap takes O(n) time and there are n − 1 further calls to Heapify, Heapsort runs in O(n lg n) time, even in the worst case, matching Quicksort’s expected running time.
Counting Comparisons
[Figure: Ratio Test for Heapsort w/ Number of Key Comparisons. y-axis: # comparisons / (n lg n), plotted range 1.80 to 2.00; x-axis: Input Size, n (up to 4,000,000); series: Win/Rec/C++, Win/Seq/C]
Timing Results
[Figure: two panels, “Ratio Test for Quicksort” and “Ratio Test for Heapsort”. y-axis: T(n) / (n lg n); x-axis: Input Size, n (up to 2,000,000); series: Win/Rec/C++/Ox, Win/Rec/C++/O2]
Observations
Implementation
Run on Windows- and Unix-based machines, implemented in C, C++, and Java, and based on pseudocode from Cormen et al., Sedgewick, and Joyce et al.
Heapsort’s measured running time does not behave as O(n lg n), even for the relatively small values of n tested, although its number of key comparisons does
Quicksort does exhibit O(n lg n) behavior
Consider the memory access patterns
For very large n, we would expect a slowdown for ANY algorithm as the data no
longer fits in memory
For the sizes of n run here, each Quicksort partition consists of elements that are contiguous in memory, while “floating down” a heap accesses elements that are far apart in memory; the RAM model, which ignores the memory hierarchy, cannot capture this difference
This is a fun exploration for students, appealing to those with an interest in mathematics or computer science
Bibliography
T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, “Introduction to Algorithms, Second Edition,” Cambridge, MA/London, England: The MIT Press/McGraw-Hill, 2003.
N. Dale, C. Weems, and D. T. Joyce, “Object-Oriented Data Structures Using Java,” Boston, MA: Jones and Bartlett, 2002.
M. T. Goodrich and R. Tamassia, “Algorithm Design: Foundations, Analysis, and Internet Examples,” New York: Wiley, 2001.
D. E. Knuth, “The Art of Computer Programming, Volume 3: Sorting and Searching, Second Edition,” Redwood City, CA: Addison-Wesley Longman, 1998.
C. C. McGeoch, “Analyzing algorithms by simulation: Variance reduction techniques and simulation speedups,” ACM Computing Surveys, vol. 24, no. 2, pp. 195–212, 1992.
C. C. McGeoch, D. Precup, and P. R. Cohen, “How to find the Big-Oh of your data set (and how not to),” Advances in Intelligent Data Analysis, vol. 1280 of Lecture Notes in Computer Science, pp. 41–52, Springer-Verlag, 1997.
R. Sedgewick, “Algorithms in C, Parts 1-4: Fundamentals, Data Structures, Sorting, Searching, Third Edition,” Boston, MA: Addison-Wesley, 1997.