TDDB56 DALGOPT-D – Lecture 8 – Sorting (part I)
Content:
Lecture 8: Sorting part I:
– Intro: aspects of sorting, different strategies
– Insertion Sort, Selection Sort, Quick Sort
Lecture 9: Sorting part II:
– Heap Sort, Merge Sort (Vilhelm Dahllöf)
– A movie ”Sort out Sorting”: survey of 9 comparison-based sorting algorithms (Bengt Werstén)
Lecture 10: Sorting part III and Selection
– Theoretical lower bound for comparison-based sorting
– BucketSort, RadixSort
– Selection, median finding, quick select

The Sorting Problem
Input:
• A list L of data items with keys (the part of each data item we base our sorting on)
Output:
• A list L’ of the same data items placed in order, i.e.:
∀i, j ∈ {0, ..., |L|−1}: i < j → L’[i] ≤ L’[j]
Caution!
• Don’t overuse sorting!
• Do you really need to have it sorted, or will a dictionary do fine instead of a sorted array?
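The output condition of the sorting problem can be checked mechanically. The following is a minimal Python sketch, not part of the original slides; the names `is_sorted`, `records`, `by_age` and `age_of` are made up for illustration, and the order is taken on a key extracted from each item.

```python
from operator import itemgetter

def is_sorted(items, key=lambda item: item):
    """Return True if key(items[i]) <= key(items[j]) holds for all i < j."""
    # Checking adjacent pairs is enough because <= is transitive.
    return all(key(items[k]) <= key(items[k + 1]) for k in range(len(items) - 1))

# Hypothetical data items: (name, age), with age as the sorting key.
records = [("carol", 31), ("alice", 25), ("bob", 31)]
by_age = sorted(records, key=itemgetter(1))

# The caution above: if we only need lookups, a dictionary may do fine.
age_of = dict(records)
```

Here `by_age` satisfies the condition while `records` does not, and `age_of["bob"]` answers a lookup without any sorting at all.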
Jan Maluszynski - HT 2005
Aspects of Sorting:
• Internal vs. external sorting:
Can data be kept in fast, random-access internal memory – or...
• Sorting in-place vs. sorting with auxiliary data structures
Does the sorting algorithm need extra data structures?
Often the stack is used as a ”hidden” extra structure!
• Worst-case vs. expected-case performance
How does the algorithm behave in different situations?

Aspects of Sorting (cont.)
• What is the ”expected case”?
In some applications we may never have a really bad worst case – pick algorithm accordingly!
• Sorting by comparison vs. sorting digitally
Compare keys, or use e.g. the binary representation of data as sorting criteria?
• Stable vs. unstable sorting
What happens with multiple occurrences of the same key?
• ”Quick’n’Dirty” vs. efficient but hard-to-remember...
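To make the stable vs. unstable distinction concrete, here is a small Python sketch (an illustration added here, with made-up records). It relies on Python's built-in `sorted`, which is documented to be stable: items with equal keys keep their original relative order.

```python
from operator import itemgetter

# Hypothetical (name, grade) records; two pairs share the same key (grade).
records = [("dana", 2), ("erik", 1), ("fia", 2), ("gus", 1)]

# sorted() is stable: records with equal grade keep their input order.
by_grade = sorted(records, key=itemgetter(1))
print(by_grade)
# [('erik', 1), ('gus', 1), ('dana', 2), ('fia', 2)]
```

An unstable algorithm would be free to output `gus` before `erik`, or `fia` before `dana`.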
Different strategies used when sorting...
• Insertion sorts:
For each new element to add to the sorted set, look for the right place in that set to put the element...
Linear insertion, Binary insertion, Shell sort, ...
• Selection sorts:
In each iteration, search the unsorted set for the smallest (largest) remaining item to add to the end of the sorted set
Straight selection, Tree selection 1), Heap sort, ...
• Exchange sorts:
Browse back and forth in some pattern, and whenever we are looking at a pair with wrong relative order, swap them...
Bubble sort, Shaker sort, Quick sort, Merge sort 1), ...
1) Requires extra data structures apart from the stack

(Linear) insertion sort
”In each iteration, insert the first item from the unsorted part into its proper place in the sorted part”
An in-place sorting algorithm!
Data stored in A[0..n-1]
Iterate i from 1 to n-1:
• The table consists of:
– Sorted data in A[0..i-1]
– Unsorted data in A[i..n-1]
• Scan sorted part for index s for insertion of the selected item
• Increase i
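The linear insertion steps above can be sketched in Python as follows. This is an illustrative translation, not the course's reference code; the function name `insertion_sort` is an assumption.

```python
def insertion_sort(a):
    """Sort list a in place by linear insertion and return it."""
    for i in range(1, len(a)):
        x = a[i]      # first item of the unsorted part a[i..n-1]
        j = i
        # Shift larger items of the sorted part a[0..i-1] one step right.
        while j >= 1 and a[j - 1] > x:
            a[j] = a[j - 1]
            j -= 1
        a[j] = x      # j is now the insertion index s
    return a
```

No auxiliary table is needed: apart from `x`, `i` and `j`, all work happens inside `a`, which is what makes the algorithm in-place.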
Analysis of Insertion Sort
Procedure InsertionSort (table A[0..n-1]):
1  for i from 1 to n-1 do
2    j ← i; x ← A[i]
3    while j ≥ 1 and A[j-1] > x do
4      A[j] ← A[j-1]; j ← j-1
5    A[j] ← x
• t1: n-1 passes over this ”constant speed” code
• t2: n-1 passes...
• t3: I = worst case no. of iterations in inner loop:
I = 1+2+…+(n-1) = n(n-1)/2
• t4: I passes
• t5: n-1 passes
• T: t1+t2+t3+t4+t5 = 3(n-1) + 2·n(n-1)/2 = 3n-3 + n²-n = n²+2n-3
...thus we have an algorithm in O(n²) in worst case, but... good if file almost sorted

(Straight) selection sort
”In each iteration, search the unsorted set for the smallest remaining item to add to the end of the sorted set”
An in-place sorting algorithm!
Data stored in A[0..n-1]
Iterate i from 0 to n-2:
• The table consists of:
– Sorted data in A[0..i-1]
– Unsorted data in A[i..n-1]
• Scan unsorted part for index s for smallest remaining item
• Swap places for A[i] and A[s]
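Straight selection as described above might look like this in Python. The sketch is an illustration only; the `comparisons` counter is an addition that is not in the slides, included to show that the number of key comparisons is n(n-1)/2 regardless of the input order.

```python
def selection_sort(a):
    """Sort list a in place by straight selection; return the number of key comparisons."""
    comparisons = 0
    n = len(a)
    for i in range(n - 1):            # i = 0 .. n-2
        s = i                         # index of the smallest item seen so far
        for j in range(i + 1, n):     # scan the rest of the unsorted part
            comparisons += 1
            if a[j] < a[s]:
                s = j
        a[i], a[s] = a[s], a[i]       # move smallest remaining item to position i
    return comparisons
```

For a table of 5 items the function always reports 5·4/2 = 10 comparisons, whether the input is sorted, reversed or random.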
Analysis of Straight selection
Procedure StraightSelection (table A[0..n-1]):
1  for i from 0 to n-2 do
2    s ← i
3    for j from i+1 to n-1 do
4      if A[j] < A[s] then s ← j
5    A[i] ↔ A[s]
• t1: n-1 passes over this ”constant speed” code
• t2: n-1 passes...
• t3: I = no. of iterations in inner loop:
I = (n-1) + (n-2) + ... + 1 = n(n-1)/2
• t4: I passes
• t5: n-1 passes
• T: t1+t2+t3+t4+t5 = 3(n-1) + 2·n(n-1)/2 = 3n-3 + n²-n = n²+2n-3
• Worst case?
• Best case?
• Expected case?
...thus we have an algorithm in O(n²) ...rather bad!

Analysis of Straight selection – details?
Is this analysis good enough? Can we compare algs. of similar order?
How expensive is...
• Index comparison
• Data comparison
• Data copying
...are they comparable or quite different?
Callout on one operation of the StraightSelection procedure: Is this an expensive op? It’s always called if data is in reverse order! ”worst case”!
We may redo the analysis and differentiate between
• Cheap operations such as assignment and comparison of indices or pointers
• Different levels of ”expensive” operations such as
– Procedure calls
– Comparison of data
– Copying of data
...and we may then find two O(x) algorithms to be quite different...

Quick Sort - overview
1. divide-and-conquer principle
2. example, basic ideas
3. quick sort algorithm, top–down
4. examples: worst and best case
5. randomization principle
6. randomized quick sort
7. fine tuning – make it faster!

Divide–and–conquer principle
1. divide a problem into smaller, independent subproblems
2. conquer: solve the sub-problems recursively (or directly if trivial)
3. combine the solutions of the sub-problems
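The three steps can be seen in a deliberately simple problem. The sketch below finds the largest item of a table by divide-and-conquer; it is a toy illustration chosen here (the function name `dc_max` is made up), not an algorithm from the lecture.

```python
def dc_max(a, lo, hi):
    """Return the largest item of the non-empty range a[lo..hi] (inclusive)."""
    if lo == hi:                       # trivial sub-problem: solve directly
        return a[lo]
    mid = (lo + hi) // 2               # 1. divide into two independent halves
    left = dc_max(a, lo, mid)          # 2. conquer each half recursively
    right = dc_max(a, mid + 1, hi)
    return left if left >= right else right   # 3. combine the two solutions
```

Quick sort follows the same pattern, except that almost all work is done in the divide step (partitioning) and the combine step is empty.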
Quick Sort Example – basic idea
Procedure QuickSort (table A[l : r]):
1. If l ≥ r return
2. select some element of A, e.g. A[l], as the so–called pivot element: p ← A[l];
3. partition A in–place into two disjoint sub-arrays AL, AR:
m ← partition( A[l : r], p );
{ determines m, l ≤ m < r, and reorders A[l : r], such that all elements in A[l : m] are now ≤ p and all in A[m+1 : r] are now ≥ p. }
4. apply the algorithm recursively to AL and AR:
quicksort ( A[l : m] ); { sorts AL }
quicksort ( A[m+1 : r] ); { sorts AR }

Quick sort: Partitioning an array in–place
int partition ( array <key> A[l : r], key p )
{ the pivot element p is A[l] }
i ← l-1;
j ← r+1;
while ( true ) do
  do i ← i+1 while A[i] < p
  do j ← j-1 while A[j] > p
  if (i < j)
    A[i] ↔ A[j]
  else
    return j;
• This code scans through the entire set once, and moves each element at most once!
...thus:
• Running time of partition: Θ(r – l + 1)
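A direct Python translation of the two procedures above, as a sketch for experimentation rather than a tuned implementation. The pivot is read inside `partition` instead of being passed as a parameter, and the default arguments of `quicksort` are a convenience added here.

```python
def partition(a, l, r):
    """Partition a[l..r] (inclusive) in place around the pivot p = a[l].

    Returns m with l <= m < r such that every item in a[l..m] is <= p
    and every item in a[m+1..r] is >= p.
    """
    p = a[l]
    i, j = l - 1, r + 1
    while True:
        i += 1                 # do i <- i+1 while a[i] < p
        while a[i] < p:
            i += 1
        j -= 1                 # do j <- j-1 while a[j] > p
        while a[j] > p:
            j -= 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j

def quicksort(a, l=0, r=None):
    """Sort a[l..r] in place; by default the whole list."""
    if r is None:
        r = len(a) - 1
    if l >= r:
        return
    m = partition(a, l, r)
    quicksort(a, l, m)         # sorts AL, which still includes position m
    quicksort(a, m + 1, r)     # sorts AR
```

Note that the first recursive call covers `a[l..m]`, not `a[l..m-1]`: in this variant the returned position is not guaranteed to hold the pivot in its final place.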
Warning – details matter!
• Book: rightmost element as pivot, swaps it in at end, recurses at either side excluding the old pivot
• Film: leftmost element as pivot, swaps it in at end, recurses at either side excluding the old pivot
• Slides: leftmost as pivot, includes it in area to partition, returns one position containing an element of size equal to the pivot – recurse on both halves including the pivot
...and the way i’s and j’s are compared (< or ≤), if they are incremented (decremented) before or after comparison, etc...

Quick Sort - Analysis
Run time as a recursive expression.
We implicitly build a search tree!
What is the worst case?
Expression reformulated to select a ”worst” partition!
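The formulas of the analysis slide did not survive in this text. One standard way to write the run time as a recursive expression is sketched below; it is a reconstruction under the assumption that m denotes the number of elements in the left part after partitioning and that partitioning n elements costs Θ(n), and it is not necessarily the exact form shown in the lecture. It needs the `amsmath` package.

```latex
\begin{align*}
T(n) &= T(m) + T(n-m) + \Theta(n), \qquad 1 \le m \le n-1, \qquad T(1) = \Theta(1) \\
T_{\text{worst}}(n) &= \max_{1 \le m \le n-1}
      \bigl( T_{\text{worst}}(m) + T_{\text{worst}}(n-m) \bigr) + \Theta(n) \\
\text{most unbalanced split } (m = 1):\quad
T(n) &= T(1) + T(n-1) + \Theta(n) = \Theta(n^2) \\
\text{balanced split } (m = n/2):\quad
T(n) &= 2\,T(n/2) + \Theta(n) = \Theta(n \log n)
\end{align*}
```

The second line is the reformulation that always selects a ”worst” partition; the last two lines give the solutions for the two extreme ways of splitting.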
Quick Sort – Analysis – worst case...
If the pivot element happens to be the min or max element of A in each call to quicksort... (e.g., pre-sorted data)
– Unbalanced recursion tree
– Recursion depth becomes n
Sorting nearly–sorted arrays occurs frequently in practice

Quick Sort – Analysis – best case...
• Best – balanced search tree! How...?
• If pivot is median of data set!
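The difference between the two cases can be observed by measuring the depth of the recursion tree. The sketch below is an experiment added for illustration: `quicksort_depth` is a made-up name, the partition is the leftmost-pivot variant from the slides, and the shuffled input with a fixed seed merely stands in for "typical" data.

```python
import random

def partition(a, l, r):
    """Partition a[l..r] around p = a[l]; return m with l <= m < r."""
    p = a[l]
    i, j = l - 1, r + 1
    while True:
        i += 1
        while a[i] < p:
            i += 1
        j -= 1
        while a[j] > p:
            j -= 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j

def quicksort_depth(a, l, r):
    """Sort a[l..r] in place and return the depth of the recursion tree."""
    if l >= r:
        return 1
    m = partition(a, l, r)
    return 1 + max(quicksort_depth(a, l, m), quicksort_depth(a, m + 1, r))

n = 64
presorted = list(range(n))             # pivot is the minimum in every call
shuffled = list(range(n))
random.Random(0).shuffle(shuffled)     # fixed seed, only for repeatability

depth_presorted = quicksort_depth(presorted, 0, n - 1)
depth_shuffled = quicksort_depth(shuffled, 0, n - 1)
```

On the pre-sorted table the depth equals n = 64, since each call splits off a single element. A perfectly balanced tree for 64 items would have depth 7; the shuffled input lands somewhere in between.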
Quick sort: apply randomization
Randomization as an algorithmic design principle
• applicable when choosing among several alternative directions
• to avoid long sequences of bad decisions with high probability, independently of the input
• simplifies the average case analysis
Select pivot randomly (not first, not last...)
p ← A[random(l,r)];
⇒ running time no longer depends on getting good input data
⇒ cannot construct bad input data...

Quick sort – fine tuning....
Median-of-three and sentinels...
• Inner loop of partition fn should be:
i ← i+1; while i ≤ r and A[i] < p do i ← i+1;
j ← j-1; while j ≥ l and A[j] > p do j ← j-1;
• Improve:
– Sort the first, middle and last elements of A
– Use the content of the middle element as pivot value
– Data set now has sentinels at the ends (values selected to stop the iteration) and we may remove the extra tests i ≤ r and j ≥ l.
– Probability of the middle value being a bad pivot is low!
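Both pivot strategies can be tried with a small Python sketch. It is an illustration under stated assumptions: the chosen pivot is first swapped to position l so that the leftmost-pivot partition from the earlier slides can be reused unchanged (without that swap, a pivot equal to the maximum could make partition return r and the recursion would not terminate). The `choose` parameter and the name `median_of_three` are made up here, and the sentinel shortcut is not used.

```python
import random

def partition(a, l, r):
    """Partition a[l..r] around p = a[l]; return m with l <= m < r."""
    p = a[l]
    i, j = l - 1, r + 1
    while True:
        i += 1
        while a[i] < p:
            i += 1
        j -= 1
        while a[j] > p:
            j -= 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j

def median_of_three(a, l, r):
    """Order a[l], a[mid], a[r] in place; return mid, the index of their median."""
    mid = (l + r) // 2
    if a[mid] < a[l]:
        a[l], a[mid] = a[mid], a[l]
    if a[r] < a[l]:
        a[l], a[r] = a[r], a[l]
    if a[r] < a[mid]:
        a[mid], a[r] = a[r], a[mid]
    return mid

def quicksort(a, l=0, r=None, choose=None):
    """Sort a[l..r] in place; choose(a, l, r) returns the pivot index (random if None)."""
    if r is None:
        r = len(a) - 1
    if l >= r:
        return
    k = random.randint(l, r) if choose is None else choose(a, l, r)
    a[l], a[k] = a[k], a[l]        # move the chosen pivot to the front
    m = partition(a, l, r)
    quicksort(a, l, m, choose)
    quicksort(a, m + 1, r, choose)
```

Calling `quicksort(data)` uses a random pivot, while `quicksort(data, choose=median_of_three)` uses the median of the first, middle and last element.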
Quick sort – fine tuning....
Reduce need for extra space – upper bound on stack!
• Observations:
– Large partitions lead to a large stack depth
– It does not matter in which order we perform the recursive calls (left part of A before right part or vice versa)
• Enhancements:
– Replace the last (tail-) recursive call with iteration (reusing the same procedure call), leave the first recursive call as is
– Select the larger part of A for the repeated iteration, and use recursion for the smaller part of A
...and we have the worst maximum stack depth when we have a balanced search tree, i.e. O(log n).

Quick sort – fine tuning....
When only a few elements remain (e.g., |A| < 4)...
• Overhead for recursion becomes significant
• Entire A is almost sorted (except for small, locally unsorted sections)
Stop sorting by QuickSort, perform one global sort using Linear InsertionSort – although O(n²) worst case, much better on almost sorted data, which is the case now!
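The two enhancements can be combined as in the following Python sketch. It is one possible arrangement, not the lecture's code: the names `hybrid_sort`, `_quick` and `CUTOFF` are assumptions, and the cutoff value 4 is simply taken from the example |A| < 4 above.

```python
CUTOFF = 4   # parts with fewer items are left for the final insertion sort

def partition(a, l, r):
    """Partition a[l..r] around p = a[l]; return m with l <= m < r."""
    p = a[l]
    i, j = l - 1, r + 1
    while True:
        i += 1
        while a[i] < p:
            i += 1
        j -= 1
        while a[j] > p:
            j -= 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j

def insertion_sort(a):
    """Linear insertion sort, fast when a is already almost sorted."""
    for i in range(1, len(a)):
        x, j = a[i], i
        while j >= 1 and a[j - 1] > x:
            a[j] = a[j - 1]
            j -= 1
        a[j] = x

def _quick(a, l, r):
    """Partition a[l..r] repeatedly, leaving parts smaller than CUTOFF unsorted."""
    while r - l + 1 >= CUTOFF:
        m = partition(a, l, r)
        if m - l + 1 <= r - m:       # left part a[l..m] is the smaller one
            _quick(a, l, m)          # recurse on the smaller part ...
            l = m + 1                # ... and iterate on the larger part
        else:
            _quick(a, m + 1, r)
            r = m

def hybrid_sort(a):
    """Quick sort down to small parts, then one global insertion sort."""
    _quick(a, 0, len(a) - 1)
    insertion_sort(a)
```

Because every recursive call handles at most half of the current range, the stack depth stays logarithmic even on pre-sorted input, where the plain recursive version reaches depth n.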
Straight Insertion – the good case?
• If table is almost sorted? E.g., max 3 items unsorted, then remainder are bigger?
Procedure InsertionSort(table A[0..n-1]):
1  for i from 1 to n-1 do
2    j ← i; tmp ← A[i]
3    while j > 0 and tmp < A[j-1] do
4      j ← j-1
5      A[j+1] ← A[j]
6    A[j] ← tmp
• t1: n-1 passes over this ”constant speed” code
• t2: n-1 passes...
• t3+4+5: I = no. of iterations in inner loop (max 3 elements ”totally unsorted”):
I = (n-1)*3 worst case, all three always in reverse order
• t6: n-1 passes
• T: t1+t2+t3+4+5+t6 = 3(n-1) + 3(n-1) = 6n-6
...thus we have an algorithm in O(n) ...rather good!
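The good case can be checked by counting how often the inner loop runs. The sketch below follows the six-line procedure above; the `shifts` counter and the two test tables are additions made here for illustration.

```python
def insertion_sort_shifts(a):
    """Sort a in place by straight insertion; return the number of inner-loop iterations."""
    shifts = 0
    for i in range(1, len(a)):
        j, tmp = i, a[i]
        while j > 0 and tmp < a[j - 1]:
            j -= 1
            a[j + 1] = a[j]       # move the larger item one step right
            shifts += 1
        a[j] = tmp
    return shifts

almost_sorted = [1, 0, 3, 2, 5, 4, 7, 6, 9, 8]   # only neighbours are swapped
reverse_order = list(range(10, 0, -1))           # worst case for insertion sort

shifts_almost = insertion_sort_shifts(almost_sorted)
shifts_reverse = insertion_sort_shifts(reverse_order)
```

For these ten-item tables the almost sorted input needs 5 inner-loop iterations, while the reversed input needs 1+2+...+9 = 45.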