New discoveries during the
exploration of sorting:
How I got my thesis topic
By
Spencer Morgan
Reference: Algorithms, Sequential, Parallel,
and Distributed by K. Berman & J. Paul,
Thomson Course Technology, 2005
Quicksort

• Partitions around an element and recursively sorts the partitioned sets
• Efficient in the average case on large data sets
• Thought to be not as efficient as insertion sort on small data sets
• Some implementations of Quicksort will switch to insertion sort when the partition size is small
Trying to improve the
secondary sort
Use a binary search to identify where
the element will end up, reducing
the number of comparisons.
BPInsert
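A minimal sketch of the binary-search idea, assuming Python and the standard bisect module. The name binary_insertion_sort is illustrative only and is not the BPInsert program from the talk. The search costs about log2(i) comparisons per insertion, but the shifting loop still moves elements one slot at a time.

```python
from bisect import bisect_right

def binary_insertion_sort(items):
    """Sort a list in place with insertion sort, finding each slot by binary search."""
    for i in range(1, len(items)):
        value = items[i]
        # Binary search inside the sorted prefix items[0:i].
        # bisect_right places equal elements after existing ones, keeping the sort stable.
        pos = bisect_right(items, value, 0, i)
        # Shift the larger elements one slot to the right to open the gap.
        for j in range(i, pos, -1):
            items[j] = items[j - 1]
        items[pos] = value
    return items
```

The comparisons drop, but the number of assignments is unchanged, which is the cost the next slides try to reduce.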
Move the search to the center

• On average, an element's location in the list is in the middle.
• Keeping the sorted area in the center will lessen the amount of movement (or assignments).
• But elements near the center move back and forth, often temporarily occupying their final location
SMInsert
Treesort
Keep track of where elements need to go to
avoid unnecessary moves back and
forth.
What resulted is a version of treesort where:
1. elements are added to a tree,
2. a map of where each element will end up
is created, and
3. the map is processed, putting the
elements where they will end up
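The three steps above can be sketched as follows. This is an illustration under assumptions, not the author's implementation: the tree is a plain unbalanced binary search tree stored in index arrays, the map is a list named source, and all names are hypothetical.

```python
def treesort_with_map(items):
    """Sort items in place; return the number of element assignments performed."""
    n = len(items)
    if n == 0:
        return 0
    left = [-1] * n    # left[i]: index of the left child of items[i], or -1
    right = [-1] * n   # right[i]: index of the right child of items[i], or -1

    # Step 1: add every element to the tree. Only comparisons happen here.
    for i in range(1, n):
        node = 0
        while True:
            if items[i] < items[node]:
                if left[node] == -1:
                    left[node] = i
                    break
                node = left[node]
            else:
                if right[node] == -1:
                    right[node] = i
                    break
                node = right[node]

    # Step 2: an in-order walk builds the map.
    # source[k] is the index of the element that belongs at position k.
    source, stack, node = [], [], 0
    while stack or node != -1:
        while node != -1:
            stack.append(node)
            node = left[node]
        node = stack.pop()
        source.append(node)
        node = right[node]

    # Step 3: process the map, one cycle of positions at a time.
    assignments = 0
    done = [False] * n
    for start in range(n):
        if done[start] or source[start] == start:
            continue
        temp = items[start]                  # one assignment to temp
        assignments += 1
        k = start
        while source[k] != start:
            items[k] = items[source[k]]      # direct assignment
            assignments += 1
            done[k] = True
            k = source[k]
        items[k] = temp                      # one assignment from temp
        assignments += 1
        done[k] = True
    return assignments
```

Because step 3 never compares elements and step 1 never moves them, the two costs can be counted separately.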
A major improvement
Separating the assignments from the
comparisons allows the two
processes to be analyzed and
improved independently.
Lower bound for worst-case
complexity of comparison
The lower bound for the worst-case number
of comparisons of any comparison-based sorting algorithm is log2(n!), which is
Ω(n log n).
Proposition 3.5.4 page 99 of Algorithms by Berman & Paul.
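To get a feel for the size of this bound, the sketch below evaluates log2(n!) rounded up to a whole number of comparisons. Rounding up is an assumption made here because a comparison count must be an integer; the function name is illustrative.

```python
import math

def min_worst_case_comparisons(n):
    """Return log2(n!) rounded up, using exact integer arithmetic."""
    # For an integer x >= 1, (x - 1).bit_length() equals the ceiling of log2(x).
    return (math.factorial(n) - 1).bit_length()

for n in (2, 3, 4, 5, 10):
    print(n, min_worst_case_comparisons(n))
```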
Lower bound for worst-case
complexity of assignments
• The lower bound for the worst-case number of assignments of any in-place sorting algorithm is Integer(3n/2), that is, 3n/2 rounded down.
• The worst case will have all elements out of place
• To help conceptualize this, let’s consider different cases

2 Elements
If there are two elements, three assignments are required:
one to temp, one direct, and one from temp.
[Figure: two elements exchanged through a temporary location, with steps labeled "1 to temp", "1 direct", and "1 from temp"]
3 Elements
With three elements, four assignments are required:
one to temp, two direct, and one from temp.
[Figure: three elements rotated through a temporary location, with steps labeled "1 to temp", two direct assignments, and "1 from temp"]
Define Circuit
• I define a circuit as two or more out-of-place elements that can be put into place with only one assignment to and one assignment from a temporary location
• If there are n elements in a circuit, the optimal number of assignments to put them in place is n+1
• The 2- and 3-element cases are each 1 circuit
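A small sketch of placing one circuit, assuming the circuit is given as a list of positions in which the element belonging at positions[i] currently sits at positions[i+1]. The function name and this representation are illustrative choices, not taken from the talk.

```python
def place_circuit(items, positions):
    """Put one circuit in place; return the number of assignments used."""
    temp = items[positions[0]]              # one assignment to temp
    assignments = 1
    for here, there in zip(positions, positions[1:]):
        items[here] = items[there]          # one direct assignment
        assignments += 1
    items[positions[-1]] = temp             # one assignment from temp
    assignments += 1
    return assignments

pair = [2, 1]
print(place_circuit(pair, [0, 1]), pair)        # 2 elements, 3 assignments
triple = [3, 1, 2]
print(place_circuit(triple, [0, 1, 2]), triple) # 3 elements, 4 assignments
```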

Lower bound for assignments

The lower bound for assignments is
the number of elements out of place
+ the number of circuits.
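The count can be read straight off a map of final positions. The sketch below assumes distinct elements and a list source in which source[k] is the index of the element that belongs at position k; both the name and the representation are hypothetical.

```python
def assignment_lower_bound(source):
    """Return (elements out of place) + (number of circuits) for a map of final positions."""
    seen = [False] * len(source)
    out_of_place = 0
    circuits = 0
    for start in range(len(source)):
        if seen[start] or source[start] == start:
            continue                 # already counted, or already in place
        circuits += 1
        k = start
        while not seen[k]:           # walk once around the circuit
            seen[k] = True
            out_of_place += 1
            k = source[k]
    return out_of_place + circuits

print(assignment_lower_bound([1, 0]))        # one circuit of 2
print(assignment_lower_bound([1, 2, 0]))     # one circuit of 3
print(assignment_lower_bound([1, 0, 3, 2]))  # two circuits of 2
```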
More Elements
With 4 elements out of place, the worst case
is when there are 2 circuits of 2 elements
(requiring 6 assignments).
The worst case for assignments is when:
1. all elements are out of place, and
2. the number of circuits is maximized
(because each circuit requires an extra
assignment).
Lower bound for worst-case
complexity of assignments
• The maximum number of circuits for any data set is Integer(n/2).
• The lower bound for assignments in the worst case is n + Integer(n/2), or Integer(3n/2).
• This is significantly less than the lower bound for comparisons (n log n).
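For small n the claim can be checked by brute force. The sketch below assumes distinct elements, tries every permutation, and takes the largest value of (elements out of place + circuits); the function names are illustrative.

```python
from itertools import permutations

def min_assignments(source):
    """Elements out of place plus number of circuits for one permutation."""
    seen = [False] * len(source)
    total = 0
    for start in range(len(source)):
        if seen[start] or source[start] == start:
            continue
        total += 1                   # extra assignment for this circuit
        k = start
        while not seen[k]:
            seen[k] = True
            total += 1               # one assignment per element in the circuit
            k = source[k]
    return total

def worst_case(n):
    """Largest assignment count over all permutations of n distinct elements."""
    return max(min_assignments(p) for p in permutations(range(n)))

for n in range(2, 8):
    print(n, worst_case(n), 3 * n // 2)   # the last two columns should agree
```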

Equal elements
Equal elements can allow more efficient
assignments if the sort is not stable. This means
equal elements do not have to keep the same
position relative to each other.
Before: 2₁ 1 3₁ 3₂ 2₂
After: 1 2₁ 2₂ 3₁ 3₂
The stable ordering has 2
circuits, (2₁, 1) and
(3₁, 3₂, 2₂), which require 7
assignments to put in place
Spencer-Stable w/5
Improvements
If some elements are already in a valid location
(3₂ in our example), there is no need to move
them, so we can leave them where they
are.
Before: 2₁ 1 3₁ 3₂ 2₂
After: 1 2₁ 2₂ 3₂ 3₁
This unstable ordering has 2
circuits, (2₁, 1) and (3₁, 2₂),
which require 6
assignments to put in place
Spencer-Unstable w/5
Optimal Assignments
Connecting circuits among equal elements
will reduce assignments
Before: 2₁ 1 3₁ 3₂ 2₂
After: 1 2₂ 2₁ 3₂ 3₁
This unstable ordering has 1
circuit, (2₁, 1, 3₁, 2₂), which
requires 5 assignments
Spencer-Optimal w/5
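The three orderings of the five-element example can be compared with a short count. The sketch assumes each element carries a unique label such as "2_1" and "2_2" for the two equal 2s; the labels and the function name are illustrative.

```python
def assignments_needed(start, target):
    """Assignments to rearrange start into target: elements moved plus one per circuit."""
    where = {label: i for i, label in enumerate(start)}
    # source[k] is the current index of the element that target places at position k.
    source = [where[label] for label in target]
    seen = [False] * len(source)
    total = 0
    for first in range(len(source)):
        if seen[first] or source[first] == first:
            continue                 # already handled, or already in a valid location
        total += 1                   # extra assignment for this circuit's temporary
        k = first
        while not seen[k]:
            seen[k] = True
            total += 1
            k = source[k]
    return total

start    = ["2_1", "1", "3_1", "3_2", "2_2"]
stable   = ["1", "2_1", "2_2", "3_1", "3_2"]
unstable = ["1", "2_1", "2_2", "3_2", "3_1"]
optimal  = ["1", "2_2", "2_1", "3_2", "3_1"]
for name, target in (("stable", stable), ("unstable", unstable), ("optimal", optimal)):
    print(name, assignments_needed(start, target))
```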
Possible Problems
• Optimal assignments have been attained
• But treesort has worst-case comparisons of order n²
Alter the order of entry
Add the center element first, then recursively
add the left and right elements (the
tree will be balanced with ordered
data sets).
Worst-case complexity is still n²
comparisons, but the chances of
encountering such a data set in practice
are reduced.
Spencer not sequential
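One way to read the center-first order is sketched below, assuming the tree is an ordinary unbalanced binary search tree. The names center_first_order and bst_height are hypothetical helpers for illustration, not part of the talk's programs.

```python
def center_first_order(lo, hi):
    """Yield the indices lo..hi-1: the middle one first, then each half recursively."""
    if lo >= hi:
        return
    mid = (lo + hi) // 2
    yield mid
    yield from center_first_order(lo, mid)
    yield from center_first_order(mid + 1, hi)

def bst_height(keys):
    """Height, in nodes, of an unbalanced BST built by inserting keys in the given order."""
    root = None                      # each node is a list: [key, left, right]
    height = 0
    for key in keys:
        depth = 1
        if root is None:
            root = [key, None, None]
        else:
            node = root
            while True:
                side = 1 if key < node[0] else 2
                depth += 1
                if node[side] is None:
                    node[side] = [key, None, None]
                    break
                node = node[side]
        height = max(height, depth)
    return height

data = list(range(7))                                    # already ordered input
order = list(center_first_order(0, len(data)))
print(order)
print(bst_height(data))                                  # sequential entry
print(bst_height([data[i] for i in order]))              # center-first entry
```

With ordered input, sequential entry degenerates into a chain while center-first entry gives a balanced tree.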
Splay Tree
A splay tree can make comparisons more
efficient (even with partially ordered data)
by doing rotations to move the most
recently accessed element to the root.
But this can result in n² comparisons and
rotations.
Splay Killer
Splay 1
If only the first element is rotated, some
patterns can still be identified, but with
half the comparisons in our extreme
case.
But this is still poor performance and
only has benefits over a regular tree
in isolated cases.
Splay1
Splay/2
• Only rotates every other node
• Allows greater restructuring of the splay tree than Splay1, but only half as much as Splay
Splay/2

Is the tree necessary?
• Since using the tree was what originally allowed me to dissociate the assignments from the comparisons, I have focused on using it to this point
• But since the comparisons are now isolated from the assignments, that constraint is not necessary

Mergesort
• Use a non-in-place (linked list) version of mergesort as the comparison step
• But this version of mergesort is stable and does not provide the necessary information about equal elements to achieve optimal assignments (or comparisons)
BPMerge & Spencer-BPMerge

Improved Mergesort
• Use a version of mergesort that keeps track of equal elements
• If all elements are unique, it will have the same number of comparisons
• If there are equal elements, the number of comparisons can drop to as few as n-1.
SMerge
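One possible way to keep track of equal elements is sketched below. It is an assumption for illustration and not the SMerge program: each run is a list of groups of equal elements, and a merge step is counted as one three-way comparison (less, equal, greater) between group representatives. All names are hypothetical.

```python
def merge_sort_groups(items):
    """Return (sorted list, number of three-way comparisons), grouping equal elements."""
    comparisons = 0

    def merge(a, b):
        nonlocal comparisons
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            comparisons += 1                 # one three-way comparison of representatives
            x, y = a[i][0], b[j][0]
            if x < y:
                out.append(a[i])
                i += 1
            elif y < x:
                out.append(b[j])
                j += 1
            else:
                out.append(a[i] + b[j])      # equal: fuse the two groups
                i += 1
                j += 1
        out.extend(a[i:])
        out.extend(b[j:])
        return out

    # Start with one run per element; each run holds a single one-element group.
    runs = [[[x]] for x in items]
    while len(runs) > 1:
        merged = []
        for k in range(0, len(runs), 2):
            if k + 1 < len(runs):
                merged.append(merge(runs[k], runs[k + 1]))
            else:
                merged.append(runs[k])       # odd run out is carried to the next pass
        runs = merged
    groups = runs[0] if runs else []
    return [x for group in groups for x in group], comparisons

print(merge_sort_groups([7] * 8))            # all equal: n-1 comparisons
print(merge_sort_groups([3, 1, 2, 3, 1]))
```

When every element is equal, each merge fuses two single groups with one comparison, so the n-1 merges cost n-1 comparisons in total.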
