Sorting

A list stored as an array; sort on a key that uniquely represents a record. Count key comparisons as the measure of work, and determine for each sort whether it is in place or not.

INSERTION SORT (push-down sort)

Worst case: for the i-th insertion the maximum number of comparisons is i - 1, so

    W(n) = sum_{i=2}^{n} (i - 1) = n(n-1)/2, which is O(n^2).

Average case: all keys unique, all permutations of the keys equally likely. When x is inserted into the sorted prefix of i - 1 keys there are i positions where x might go, with a 1/i chance that x belongs in any given position. Landing j positions from the right costs j comparisons (and the leftmost position costs i - 1, sharing the last comparison), so the average work for one insertion is

    (1/i) [ sum_{j=1}^{i-1} j + (i - 1) ] = (i - 1)/2 + 1 - 1/i

and the total work for all n - 1 insertions is

    A(n) = sum_{i=2}^{n} [ (i - 1)/2 + 1 - 1/i ] ≈ n^2/4.

Insertion sort is an in-place sort.

Lower bound on the behavior of certain sorting algorithms

Let the basic operation be: compare adjacent keys and swap if necessary. We show that all algorithms that do such limited swapping must do a certain amount of work.

There are n! permutations of n items, and exactly one permutation is the sorted list; the original list of keys is some permutation.

Inversions: 2, 4, 1, 5, 3 has inversions (2,1), (4,1), (4,3), (5,3).

Such a sorting algorithm removes at most one inversion per key comparison, so the number of comparisons performed on an input is at least the number of inversions in it. So consider inversions.

Worst case: there is a permutation with n(n-1)/2 inversions, namely (n, n-1, ..., 1), e.g. (5,4,3,2,1). So the worst-case behavior must be Omega(n^2). This is a lower bound: it can't be done in fewer comparisons or less work.

Average case: consider the average number of inversions. Each permutation has a transpose (its reverse). Every pair of keys is an inversion in exactly one of a permutation and its transpose, and there are n(n-1)/2 such pairs of integers; thus on the average there are n(n-1)/4 inversions. So about n^2/4 comparisons is the best such an algorithm can do on average - the lower bound.

Section 4.3 Quicksort

Quicksort and Mergesort: divide and conquer.

Quicksort strategy, worst case: when the pivot is the smallest value, split divides the list into pieces of size 0 and k - 1, and quicksort does O(n^2) work. Quicksort escapes the adjacent-swap lower bound because its swaps are not adjacent: up to 2n - 3 inversions can be removed with one swap across a split.
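As a sanity check on the counts above, here is a minimal Python sketch (names are ours, not from the notes) of insertion sort that counts key comparisons; on a reversed list it realizes the worst case n(n-1)/2.

```python
def insertion_sort(a):
    """Sort a in place; return the number of key comparisons done."""
    comps = 0
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0:
            comps += 1              # compare x with a[j]
            if a[j] > x:
                a[j + 1] = a[j]     # shift the larger key right
                j -= 1
            else:
                break
        a[j + 1] = x
    return comps

n = 6
worst = list(range(n, 0, -1))       # reversed list: every insert compares with the whole prefix
assert insertion_sort(worst) == n * (n - 1) // 2
```

On an already sorted list the same loop does only n - 1 comparisons, which is why insertion sort's average sits between the two extremes.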
5 4 3 2 1 has 4+3+2+1 = 10 inversions; 4 3 2 1 5 has 3+2+1 = 6 inversions.

Average behavior: all keys distinct, all permutations equally likely. For k keys in a section to sort, let A(k) be the average number of comparisons done for lists of size k. Suppose x (the pivot) gets placed in the i-th position after split executes. Split does k - 1 comparisons, the sublists then have i - 1 and k - i keys respectively, and each position i for the split point has probability 1/k. With the initial call k = n this gives

    A(n) = n - 1 + (1/n) sum_{i=0}^{n-1} [ A(i) + A(n-1-i) ],  A(0) = A(1) = 0.

This simplifies (each term appears twice) to

    A(n) = n - 1 + (2/n) sum_{i=1}^{n-1} A(i).

Solve the recurrence relation by:
1) guess, with an induction proof (which also establishes that there are enough good cases, i.e. that the average behavior is realized)
2) direct manipulation.
(Divide and conquer as a general technique is taken up after the quicksort analysis.)

First, prove there are enough permutations of numbers to allow quicksort to realize the computed average performance; that is, show A(n) <= c n ln n for n >= 1.

Proof: induction on n. Base case n = 1: A(1) = 0 <= c * 1 * ln 1 = 0. For n > 1, the recurrence relation and the induction hypothesis give

    A(n) <= n - 1 + (2/n) sum_{i=1}^{n-1} c i ln i.

We can bound the sum by integrating:

    sum_{i=1}^{n-1} c i ln i <= c * integral from 1 to n of x ln x dx

and Equation 1.15 gives

    integral from 1 to n of x ln x dx = (n^2/2) ln n - n^2/4 + 1/4.

So

    A(n) <= n - 1 + (2c/n) [ (n^2/2) ln n - n^2/4 + 1/4 ]
         = c n ln n + n (1 - c/2) - 1 + c/(2n)
         <= c n ln n  for c = 2.

Since ln n = 0.693 lg n, the bound 2 n ln n makes 1.386 n lg n.

Now solve the recurrence relation directly.
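The recurrence and the c = 2 bound can also be checked numerically; a hedged sketch (the function name is our own):

```python
import math

def avg_comparisons(nmax):
    """Tabulate A(n) = (n - 1) + (2/n) * sum_{i=1}^{n-1} A(i), A(0) = A(1) = 0."""
    A = [0.0] * (nmax + 1)
    prefix = 0.0                          # running value of sum_{i=1}^{n-1} A(i)
    for n in range(2, nmax + 1):
        prefix += A[n - 1]
        A[n] = (n - 1) + 2.0 * prefix / n
    return A

A = avg_comparisons(1000)
# the induction bound A(n) <= 2 n ln n holds at every n checked
assert all(A[n] <= 2 * n * math.log(n) for n in range(2, 1001))
```

The first values agree with a hand count: A(2) = 1 and A(3) = 8/3, the average comparisons quicksort does on random lists of 2 and 3 keys.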
We know

    A(n) = n - 1 + (2/n) sum_{i=1}^{n-1} A(i)

and

    A(n-1) = n - 2 + (2/(n-1)) sum_{i=1}^{n-2} A(i).

Subtract n - 1 times the second from n times the first:

    n A(n) - (n-1) A(n-1) = n(n-1) - (n-1)(n-2) + 2 sum_{i=1}^{n-1} A(i) - 2 sum_{i=1}^{n-2} A(i).

The sums differ by A(n-1), so

    n A(n) - (n-1) A(n-1) = (n-1)(n - (n-2)) + 2 A(n-1) = 2(n-1) + 2 A(n-1)

and therefore

    n A(n) = (n+1) A(n-1) + 2(n-1).

Dividing through by n(n+1):

    A(n)/(n+1) = A(n-1)/n + 2(n-1)/(n(n+1)).

If we let B(n) = A(n)/(n+1), the recurrence relation for B is

    B(n)   = B(n-1) + 2(n-1)/(n(n+1))
    B(n-1) = B(n-2) + 2(n-2)/((n-1)n)
    B(n-2) = B(n-3) + 2(n-3)/((n-2)(n-1))
    ...

so, unrolling down to B(1) = 0,

    B(n) = sum_{i=2}^{n} 2(i-1)/(i(i+1)).

By partial fractions, (i-1)/(i(i+1)) = 2/(i+1) - 1/i, so (thanks to Eun Jin)

    B(n) = sum_{i=2}^{n} [ 4/(i+1) - 2/i ] = 2 H_n - 4n/(n+1),

where H_n = sum_{i=1}^{n} 1/i. From summation formula 1.11 on page 22, H_n ≈ ln n + 0.577, so

    B(n) ≈ 2 (ln n + 0.577) - 4n/(n+1).

We have A(n) = (n+1) B(n), and therefore

    A(n) ≈ 2 n ln n - 2.846 n ≈ 1.386 n lg n - 2.846 n

from ln n = 0.693 lg n, which makes 2 n ln n into 1.386 n lg n.

Improvements on quicksort:
- choose a different key as the pivot (e.g. the middle element)
- sort small sublists using a different method
- manipulate the stack yourself: stack the larger interval, process the smaller.

General technique: divide and conquer. Solve several small instances rather than one large one - recursively break the problem down, solve the small cases, combine the small solutions. Let
    S(n) = steps done in a direct solution
    D(n) = steps done in breaking the problem down
    C(n) = steps to combine the solutions.
These give a recurrence relation for the work done. Example for quicksort: k = 2 (divided in two pieces each time), smallSize = 1, S(1) = 0, D(n) = n - 1 (split had to consider n - 1 elements in the interval), and C(n) = 0; the recurrence gives quicksort's O(n lg n) average behavior.

Merge sort of sorted lists

Merge two sorted lists into one; the measure of work is comparisons of keys. Worst case: with n keys in list A and m in list B, merge does n + m - 1 comparisons; the worst case is when the last item of A or B is the last one to go into C.

Show that for n = m, 2n - 1 comparisons is optimal. Simply, the proof states that the last two items of each list must be compared in order to determine the correct ordering; thus 2n - 1 is optimal.

Space usage: O(n) extra space when m = n.

Now merge sort, since we know how to merge two lists: a divide-and-conquer paradigm. The analysis is similar to quicksort's, but quicksort does not always divide the list evenly; merge sort will:

    W(n) = W(ceil(n/2)) + W(floor(n/2)) + n - 1,  W(1) = 0.

This is a similar relation to what we had for quicksort, thus

    W(n) ≈ n lg n - n,

a worst-case performance of W(n) = Theta(n lg n). Merge sort is not an in-place sort.

Section 4.7 Lower bounds for sorting by comparison of keys

Consider a decision tree: n keys x1, x2, ..., xn, and each comparison is a two-way branch. (Example decision tree for n = 3: p. 178, Figure 4.15. The internal nodes are the comparisons 1:2, 2:3, 1:3; the six leaves are the orderings 1,2,3; 1,3,2; 2,1,3; 2,3,1; 3,1,2; 3,2,1.)

Every sort algorithm follows one path through the tree. There must be n! leaves (paths), one for each possible permutation. The number of comparisons is the depth of the leaf reached. For the tree with n = 3, the worst case is 3 comparisons, and the average performance is (2 + 3 + 3 + 2 + 3 + 3)/6 = 16/6 = 2 2/3.

Section 4.7.2 Derive the lower bound in terms of the number of leaves.

L = number of leaves in the tree, h = height of the tree; then L <= 2^h.
Proof: induct on h. When h = 0 the tree is a single leaf, so L = 1 <= 2^0. Assume L <= 2^h; what happens for h + 1? Adding one level of depth at most replaces each leaf with two, so L <= 2^(h+1).
Now h >= lg L, by taking the log of both sides, and h >= ceil(lg L) since h is an integer.

In our decision tree L = n!, so the number of comparisons needed to sort in the worst case is at least ceil(lg n!).

How close is lg n! to n lg n? We know

    n! >= n (n-1) ... ceil(n/2) >= (n/2)^(n/2),

so

    lg n! >= (n/2) lg(n/2), which is Theta(n lg n).
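Both counts can be computed directly; a small sketch (function names are ours) tabulates merge sort's worst-case comparisons from its recurrence next to the decision-tree leaf bound:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def W(n):
    """Worst-case merge sort comparisons: W(n) = W(ceil(n/2)) + W(floor(n/2)) + n - 1."""
    if n <= 1:
        return 0
    return W(n // 2) + W(n - n // 2) + n - 1

def leaf_bound(n):
    """ceil(lg n!): minimum worst-case comparisons for any comparison sort."""
    return math.ceil(math.log2(math.factorial(n)))

# merge sort is one comparison above the lower bound at n = 5
assert W(5) == 8 and leaf_bound(5) == 7
```

For small n the two track each other closely (they coincide at n = 2, 3, 4), which is the sense in which merge sort is nearly optimal.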
To get a closer lower bound, observe

    lg n! = sum_{j=2}^{n} lg j.

Equation 1.18 on page 27 establishes

    lg n! = sum_{j=2}^{n} lg j >= n lg n - n lg e

where e is the base of the natural log and lg e ≈ 1.443. So the height of the decision tree is at least n lg n - 1.443 n, and thus any sorting algorithm that compares keys must do at least that much work in the worst case. Merge sort is close to optimal!

Consider n = 5: insertion sort does 10 comparisons and merge sort does 8, but ceil(lg 5!) = ceil(lg 120) = 7, hence the lower bound is below what merge sort does. Try to find a sort of 5 keys that uses 7 comparisons in the worst case.

Section 4.7.3 Lower bound for average behavior

Consider the external path length (EPL). Our decision trees are 2-trees: each node has 2 or 0 children. The EPL is minimized when all leaves are on at most two adjacent levels. The average path length from the root to a leaf is epl/L for a 2-tree with L leaves; we are looking for a lower bound on epl. Trees that minimize epl are as balanced as possible, so a 2-tree with L leaves has external path length about L lg L. We have thus argued that the average path length to a leaf is at least lg L. Therefore, the average number of comparisons done by any algorithm that sorts n items by comparison of keys is at least lg(n!), which is about n lg n - 1.443 n. No algorithm can do better; this is the best we can do on the average.

Heap sort

Quicksort cannot guarantee its worst-case performance, and merge sort is not in place (it requires sizable extra storage). Heap sort combines the best of each, but with a somewhat larger constant.

Heap sort: build heap, then repeatedly fix heap. It maintains a partial order in a complete tree with some rightmost nodes missing - essentially a priority queue ADT.

Let S be a set of elements with integer keys, T a binary tree, and h the height of the tree.

HEAP: a binary tree T is a heap iff
- T is complete through depth h - 1
- all leaves are at depth h or h - 1
- all paths to a leaf of depth h are to the left of all paths to a leaf of depth h - 1;
that is, T is a left-complete tree.
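Because the tree is left-complete, it can be stored in an array with no gaps: node i sits at index i, with children at 2i and 2i + 1, so the shape conditions hold automatically and only the order condition (each node at least its children) needs checking. A minimal sketch, with our own names:

```python
def is_heap(h):
    """h[1..n] holds the keys of a left-complete tree, h[0] is unused;
    check that every node is >= its children (node i's children are 2i, 2i+1)."""
    n = len(h) - 1
    return all(h[i] <= h[i // 2] for i in range(2, n + 1))

assert is_heap([None, 50, 24, 30, 21, 20, 12, 5, 6, 18, 3])
assert not is_heap([None, 1, 2])
```

Checking each child against its parent covers every edge of the tree once, so the test is O(n).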
Partial order: T is a partial order tree iff each node is >= its children.

heapSort(E, n)
    construct H from E, the set of n elements to be sorted
    for (i = n; i >= 1; i--)
        curMax = getMax(H)
        deleteMax(H)
        E[i] = curMax    // place max back into the array of elements

deleteMax(H)
    copy the rightmost element on the lowest level of H into K
    delete the rightmost element on the lowest level of H
    fixHeap(H, K)

(Figure: fixHeap example starting from the heap 50 24 30 21 20 12 5 6 18 3, showing the heap contents after successive deleteMax / fixHeap steps.)

The fixHeap procedure does 2h comparisons of keys in the worst case on a heap with height h: its loop iterates down to the depth of the tree, doing 2 comparisons each iteration.

Construct heap starts at the bottom and recursively applies fixHeap. Worst case: "construct heap" depends on the cost of fixHeap. Let n be the number of keys and H the heap structure; fixHeap requires about 2 lg n comparisons (lg n is the height of the tree). With r the number of nodes in the right subheap of H, we have

    W(n) = W(n - r - 1) + W(r) + 2 lg n  for n > 1

(fix the left subtree, fix the right subtree, 2 lg n for each pass). First we solve the recurrence relation for N = 2^d - 1, which is for complete binary trees; W(N) is an upper bound on W(n) for almost-complete binary trees. For N = 2^d - 1 the numbers of nodes in the right and left subtrees are equal, and the recurrence relation becomes

    W(N) = 2 W((N - 1)/2) + 2 lg N.

Applying the Master Theorem 3.17 with b = 2, c = 2, E = 1, and f(N) = 2 lg N, it follows that W(N) = Theta(N), i.e. the heap can be built in linear time.

The number of comparisons done by fixHeap on a heap with k nodes is at most 2 lg k, so the total over all n - 1 deletions is at most

    sum_{k=1}^{n-1} 2 lg k <= 2 (lg e) * integral from 1 to n of ln x dx = 2 (lg e)(n ln n - n + 1) ≈ 2 (n lg n - 1.443 n).

So the heap construction phase does at most O(n) comparisons, and the deletions do at most about 2 n lg n in both the average and worst case.
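The heapSort outline can be filled in directly; a hedged Python sketch under our own naming (1-based array storage; construct_heap is written as a bottom-up loop over the internal nodes rather than the recursive form used in the analysis):

```python
def fix_heap(h, n, i, key):
    """Sift key down from vacant position i in h[1..n]; at most 2 comparisons per level."""
    while 2 * i <= n:
        c = 2 * i
        if c + 1 <= n and h[c + 1] > h[c]:    # 1st comparison: pick the larger child
            c += 1
        if h[c] > key:                         # 2nd comparison: child vs. sifted key
            h[i] = h[c]                        # promote the child
            i = c
        else:
            break
    h[i] = key

def construct_heap(h, n):
    for i in range(n // 2, 0, -1):             # bottom-up over internal nodes
        fix_heap(h, n, i, h[i])

def heap_sort(a):
    h = [None] + list(a)                       # h[0] unused; children of i at 2i, 2i+1
    n = len(a)
    construct_heap(h, n)
    for i in range(n, 1, -1):
        cur_max = h[1]                         # getMax
        fix_heap(h, i - 1, 1, h[i])            # deleteMax: sift last key into the root's hole
        h[i] = cur_max                         # place max back at the end
    return h[1:]
```

The sort runs in the array itself (only the key being sifted is held aside), which is the in-place property heap sort has over merge sort.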