TDDB56 DALGOPT-D – Lecture 8 – Sorting (part I)
Jan Maluszynski - HT 2005

Content:
Lecture 8: Sorting part I:
– Intro: aspects of sorting, different strategies
– Insertion Sort, Selection Sort, Quick Sort
Lecture 9: Sorting part II:
– Heap Sort, Merge Sort (Vilhelm Dahllöf)
– A movie ”Sort out Sorting”: survey of 9 comparison-based sorting algorithms (Bengt Werstén)
Lecture 10: Sorting part III and Selection
– Theoretical lower bound for comparison-based sorting
– BucketSort, RadixSort
– Selection, median finding, quick select

The Sorting Problem
Input:
• A list L of data items with keys (the part of each data item we base our sorting on)
Output:
• A list L’ of the same data items placed in order, i.e.:
  ∀ i, j ∈ {0, ..., |L|−1}: i < j → L'[i] ≤ L'[j]
Caution!
• Don’t overuse sorting!
• Do you really need to have it sorted, or will a dictionary do fine instead of a sorted array?

Aspects of Sorting:
• Internal vs. external sorting:
  Can data be kept in fast, random-access internal memory – or...
• Sorting in-place vs. sorting with auxiliary data structures
  Does the sorting algorithm need extra data structures? Often the stack is used as a ”hidden” extra structure!
• Worst-case vs. expected-case performance
  How does the algorithm behave in different situations?

Aspects of Sorting (cont.)
• What is the ”expected case”?
  In some applications we may never have a really bad worst case – pick algorithm accordingly!
• Sorting by comparison vs. sorting digitally
  Compare keys, or use e.g. the binary representation of data as sorting criterion?
• Stable vs. unstable sorting
  What happens with multiple occurrences of the same key?
• ”Quick’n’Dirty” vs. efficient but hard-to-remember...
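A minimal Python sketch of two of the notions above: the output condition of the sorting problem, and stable sorting. The name is_sorted and the sample records are made up for illustration. Python's built-in sorted is documented to be stable, so it stands in for "some stable sorting algorithm" here.

```python
from operator import itemgetter


def is_sorted(items, key=lambda x: x):
    """Check the output condition: i < j implies key(items[i]) <= key(items[j]).

    Comparing adjacent pairs is enough, because <= is transitive.
    """
    return all(key(a) <= key(b) for a, b in zip(items, items[1:]))


# Hypothetical data items: (name, key). Two items share key 1, two share key 2.
records = [("carol", 2), ("alice", 1), ("bob", 2), ("dave", 1)]

# A stable sort keeps items with equal keys in their original relative order:
# "alice" stays before "dave", and "carol" stays before "bob".
by_key = sorted(records, key=itemgetter(1))
```

An unstable algorithm would also satisfy is_sorted, but might return "bob" before "carol"; the sorting problem as stated does not forbid that.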
Different strategies used when sorting...
• Insertion sorts:
  For each new element to add to the sorted set, look for the right place in that set to put the element...
  Linear insertion, Binary insertion, Shell sort, ...
• Selection sorts:
  In each iteration, search the unsorted set for the smallest (largest) remaining item to add to the end of the sorted set
  Straight selection, Tree selection 1), Heap sort, ...
• Exchange sorts:
  Browse back and forth in some pattern, and whenever we are looking at a pair with wrong relative order, swap them...
  Bubble sort, Shaker sort, Quick sort, Merge sort 1)...
1) Requires extra data structures apart from the stack

(Linear) insertion sort
”In each iteration, insert the first item from the unsorted part into its proper place in the sorted part”
An in-place sorting algorithm!
Data stored in A[0..n-1]
Iterate i from 1 to n-1:
• The table consists of:
  – Sorted data in A[0..i-1]
  – Unsorted data in A[i..n-1]
• Scan sorted part for index s for insertion of the selected item
• Increase i

Analysis of Insertion Sort
Procedure InsertionSort (table A[0..n-1]):
1  for i from 1 to n-1 do
2    s ← i; x ← A[i]
3    while s ≥ 1 and A[s-1] > x do
4      A[s] ← A[s-1]; s ← s-1
5    A[s] ← x
• t1: n-1 passes over this ”constant speed” code
• t2: n-1 passes...
• t3: I = worst-case no. of iterations in inner loop: I = 1+2+…+(n-1) = n(n-1)/2
• t4: I passes
• t5: n-1 passes
• T: t1+t2+t3+t4+t5 = 3(n-1) + 2·n(n-1)/2 = 3n-3 + n²-n = n² + 2n - 3
...thus we have an algorithm in O(n²) in the worst case, but it is good if the file is almost sorted

(Straight) selection sort
”In each iteration, search the unsorted set for the smallest remaining item to add to the end of the sorted set”
An in-place sorting algorithm!
Data stored in A[0..n-1]
Iterate i from 0 to n-2:
• The table consists of:
  – Sorted data in A[0..i-1]
  – Unsorted data in A[i..n-1]
• Scan unsorted part for index s for smallest remaining item
• Swap places for A[i] and A[s]

Analysis of Straight selection
Procedure StraightSelection (table A[0..n-1]):
1  for i from 0 to n-2 do
2    s ← i
3    for j from i+1 to n-1 do
4      if A[j] < A[s] then s ← j
5    A[i] ↔ A[s]
• t1: n-1 passes over this ”constant speed” code
• t2: n-1 passes...
• t3: I = no. of iterations in inner loop: I = (n-1) + (n-2) + ... + 1 = n(n-1)/2
• t4: I passes
• t5: n-1 passes
• T: t1+t2+t3+t4+t5 = 3(n-1) + 2·n(n-1)/2 = 3n-3 + n²-n = n² + 2n - 3
...thus we have an algorithm in O(n²) ...rather bad!

Analysis of Straight selection – details?
Is this analysis good enough? Can we compare algs. of similar order?
How expensive is...
• Index comparison
• Data comparison
• Data copying
...are they comparable or quite different?
• Worst case?
• Best case?
• Expected case?
Line 4 of StraightSelection, ”s ← j”: is this an expensive op? It’s always called if data is in reverse order! ”Worst case”!
We may redo the analysis and differentiate between
• Cheap operations such as assignment and comparison of indices or pointers
• Different levels of ”expensive” operations such as
  – Procedure calls
  – Comparison of data
  – Copying of data
...and we may then find two O(x) algorithms to be quite different...

Quick Sort - overview
1. divide-and-conquer principle
2. example, basic ideas
3. quick sort algorithm, top–down
4. examples: worst and best case
5. randomization principle
6. randomized quick sort
7. fine tuning – make it faster!

Divide–and–conquer principle
1. divide a problem into smaller, independent subproblems
2. conquer: solve the sub-problems recursively (or directly if trivial)
3. combine the solutions of the sub-problems

Quick Sort Example – basic idea
Procedure QuickSort (table A[l : r]):
1. If l ≥ r return
2. select some element of A, e.g. A[l], as the so-called pivot element:
   p ← A[l];
3. partition A in-place into two disjoint sub-arrays AL, AR:
   m ← partition( A[l : r], p );
   { determines m, l ≤ m < r, and reorders A[l : r] such that all elements in A[l : m] are now ≤ p and all in A[m+1 : r] are now ≥ p. }
4. apply the algorithm recursively to AL and AR:
   quicksort ( A[l : m] );    { sorts AL }
   quicksort ( A[m+1 : r] );  { sorts AR }

Quick sort: Partitioning an array in-place
int partition ( array <key> A[l : r], key p )
{ the pivot element p is A[l] }
  i ← l-1; j ← r+1;
  while ( true ) do
    do i ← i+1 while A[i] < p
    do j ← j-1 while A[j] > p
    if (i < j) A[i] ↔ A[j]
    else return j;
• This code will scan through the entire set once, and will at most move each element once! ...thus:
• Running time of partition: Θ(r – l + 1)

Warning – details matter!
• Book: rightmost element as pivot, swaps it in at end, recurses on either side excluding the old pivot
• Film: leftmost element as pivot, swaps it in at end, recurses on either side excluding the old pivot
• Slides: leftmost element as pivot, includes it in the area to partition, returns one position containing an element of size equal to the pivot – recurse on both halves including the pivot
...and the way i’s and j’s are compared (< or ≤), if they are incremented (decremented) before or after comparison, etc...

Quick Sort - Analysis
Run time as a recursive expression. We implicitly build a search tree!
What is the worst case? Expression reformulated to select a ”worst” partition!

Quick Sort – Analysis – worst case...
If the pivot element happens to be the min or max element of A in each call to quicksort... (e.g., pre-sorted data)
– Unbalanced recursion tree
– Recursion depth becomes n
Sorting nearly-sorted arrays occurs frequently in practice

Quick Sort – Analysis – best case...
• Best – balanced search tree! How...?
• If pivot is median of data set!

Quick sort: apply randomization
Randomization algorithmic design principle
• applicable where choosing among several alternative directions
• to avoid long sequences of bad decisions with high probability, independently of the input
• simplifies the average case analysis
Select pivot randomly (not first, not last...)
  p ← A[random(l,r)];
⇒ running time not only dependent on good input data
⇒ cannot construct bad input data...
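The QuickSort and partition procedures of the slides, combined with a randomly selected pivot, might be sketched in Python as follows. This is an illustrative sketch, not the lecture's reference code. One step is an assumption beyond the slides: the randomly chosen pivot is first swapped into position lo, so that the partition precondition "the pivot element p is A[l]" still holds. Without that swap this partition scheme can return m = r, and the recursion on A[l : m] would not terminate.

```python
import random


def partition(a, lo, hi, p):
    """Partition a[lo..hi] in place around pivot value p, where p == a[lo].

    Returns m with lo <= m < hi such that every element of a[lo..m] is <= p
    and every element of a[m+1..hi] is >= p.
    """
    i, j = lo - 1, hi + 1
    while True:
        i += 1                      # do i <- i+1 while A[i] < p
        while a[i] < p:
            i += 1
        j -= 1                      # do j <- j-1 while A[j] > p
        while a[j] > p:
            j -= 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j


def quicksort(a, lo=0, hi=None):
    """Sort a[lo..hi] in place (whole list by default), random pivot."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    k = random.randint(lo, hi)      # random index, both ends inclusive
    a[lo], a[k] = a[k], a[lo]       # assumption: move the pivot to the front
    m = partition(a, lo, hi, a[lo])
    quicksort(a, lo, m)             # sorts AL
    quicksort(a, m + 1, hi)         # sorts AR


data = [5, 3, 8, 3, 9, 1, 5, 0]
quicksort(data)
```

Note that both recursive calls include the position of the pivot value, matching the "Slides" variant in the warning above rather than the "Book" or "Film" variants.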
Quick sort – fine tuning: median-of-three and sentinels...
• Inner loop of partition fn should be:
    i ← i+1; while i ≤ r and A[i] < p do i ← i+1;
    j ← j-1; while j ≥ l and A[j] > p do j ← j-1;
• Improve:
  – Sort the first, middle and last elements of A
  – Use the content of the middle element as pivot value
  – Data set now has sentinels at the ends (values selected to stop the iteration) and we may remove the extra tests i ≤ r and j ≥ l
  – Probability of the middle value being a bad pivot is low!

Quick sort – fine tuning: reduce need for extra space – upper bound on stack!
• Observations:
  – Large partitions lead to a large stack depth
  – Does not matter in which order we perform recursive calls (left part of A before right part or vice versa)
• Enhancements:
  – Replace last (tail-) recursive call with iteration (reusing the same procedure call), leave first recursive call as is
  – Select the larger part of A for the repeated iteration, and use recursion for the smaller part of A
...and we have the worst maximum stack depth when we have a balanced search tree, i.e. O(log n).

Quick sort – fine tuning: when only a few elements remain (e.g., |A| < 4)...
• Overhead for recursion becomes significant
• Entire A is almost sorted (except for small, locally unsorted sections)
Stop sorting by QuickSort, perform one global sort using Linear InsertionSort – although O(n²) worst case, much better on almost sorted data, which is the case now!

Straight Insertion – the good case?
• If table is almost sorted? E.g., max 3 items unsorted, then remainder are bigger?
Procedure InsertionSort(table A[0..n-1]):
1  for i from 1 to n-1 do
2    j ← i; tmp ← A[i]
3    while j>0 and tmp < A[j-1] do
4      j ← j-1
5      A[j+1] ← A[j]
6    A[j] ← tmp
• t1: n-1 passes over this ”constant speed” code
• t2: n-1 passes...
• t3+4+5: I = no. of iterations in inner loop (max 3 elements ”totally unsorted”): I = (n-1)*3 worst case, all three always in reverse order
• t6: n-1 passes
• T: t1+t2+t3+4+5+t6 = 3(n-1) + 3(n-1) = 6n-6
...thus we have an algorithm in O(n) ...rather good!
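To see the good case in practice, here is a hedged Python sketch that follows the InsertionSort procedure and additionally counts how many elements the inner loop shifts. The shift counter and the test data are additions for illustration, not part of the slides.

```python
def insertion_sort(a):
    """Sort list a in place; return the number of element shifts performed."""
    shifts = 0
    for i in range(1, len(a)):
        tmp = a[i]
        j = i
        # Move larger elements of the sorted part a[0..i-1] one step right.
        while j > 0 and tmp < a[j - 1]:
            a[j] = a[j - 1]
            j -= 1
            shifts += 1
        a[j] = tmp
    return shifts


n = 1000

# Almost sorted: only two adjacent pairs are out of order.
nearly = list(range(n))
nearly[10], nearly[11] = nearly[11], nearly[10]
nearly[500], nearly[501] = nearly[501], nearly[500]
shifts_nearly = insertion_sort(nearly)

# Reverse order: every element has to travel all the way to the front.
reverse = list(range(n, 0, -1))
shifts_reverse = insertion_sort(reverse)
```

On the almost sorted table the inner loop runs only twice in total, while the reversed table needs n(n-1)/2 shifts, which is the gap between the O(n) good case and the O(n²) worst case.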