Sorting
UC Berkeley Fall 2004, E77
http://jagger.me.berkeley.edu/~pack/e77

Copyright 2005, Andy Packard. This work is licensed under the Creative Commons Attribution-ShareAlike License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/2.0/ or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.

Sorting

Keeping data in “order” allows it to be searched more efficiently.

Example: Phone Book
– Sorted by Last Name (“lots” of work to do this)
  • Easy to look someone up if you know their last name
  • Tedious (but straightforward) to find by First name or Address

Important if data will be searched many times.

Two algorithms for sorting today
– BubbleSort
– MergeSort

Searching: next lecture

Bubble Sort (“Sink” sort here)

Pass 1:
  If A(1)>A(2), switch
  If A(2)>A(3), switch
  If A(3)>A(4), switch
  If A(4)>A(5), switch
  …
  If A(N-3)>A(N-2), switch
  If A(N-2)>A(N-1), switch
  If A(N-1)>A(N), switch
  A(N) is now largest entry

Pass 2: same comparisons, stopping at "If A(N-2)>A(N-1), switch"
  A(N-1) is now 2nd largest entry
  A(N) is still largest entry

Pass 3: same comparisons, stopping at "If A(N-3)>A(N-2), switch"
  A(N-2) is now 3rd largest entry
  A(N-1) is still 2nd largest entry
  A(N) is still largest entry

…

Last pass:
  If A(1)>A(2), switch
  A(1) is now Nth largest entry
  A(2) is still (N-1)th largest entry
  A(3) is still (N-2)th largest entry
  …
  A(N-3) is still 4th largest entry
  A(N-2) is still 3rd largest entry
  A(N-1) is still 2nd largest entry
  A(N) is still largest entry

Bubble Sort (“Sink” sort here)

  Pass 1:    N-1 steps
  Pass 2:    N-2 steps
  Pass 3:    N-3 steps
  …
  Last pass: 1 step

  # of steps = $\sum_{i=1}^{N-1} i = \frac{N(N-1)}{2} \approx \frac{N^2}{2}$

Bubble Sort (“Sink” sort here)

Every pass makes the comparison "If A(i)>A(i+1), switch" starting at i=1; only the last comparison changes:
  Pass 1:    i = 1, 2, 3, 4, …, N-3, N-2, N-1
  Pass 2:    i = 1, 2, 3, 4, …, N-3, N-2
  Pass 3:    i = 1, 2, 3, 4, …, N-3
  …
  Last pass: i = 1

Loop structure:
  for lastcompare=N-1:-1:1
    for i=1:lastcompare
      if A(i)>A(i+1)

Matlab code for Bubble Sort

  function S = bubblesort(A)
  % Assume A row/column; Copy A to S
  S = A;
  N = length(S);
  for lastcompare=N-1:-1:1
    for i=1:lastcompare
      if S(i)>S(i+1)
        tmp = S(i);
        S(i) = S(i+1);
        S(i+1) = tmp;
      end
    end
  end

What about returning an Index vector Idx, with the property that S = A(Idx)?

Matlab code for Bubble Sort

  function [S,Idx] = bubblesort(A)
  % Assume A row/column; Copy A to S
  N = length(A);
  S = A; Idx = 1:N;   % A(Idx) equals S
  for lastcompare=N-1:-1:1
    for i=1:lastcompare
      if S(i)>S(i+1)
        tmp = S(i);     tmpi = Idx(i);
        S(i) = S(i+1);  Idx(i) = Idx(i+1);
        S(i+1) = tmp;   Idx(i+1) = tmpi;
      end
    end
  end

If we switch two entries of S, then exchange the same two entries of Idx. This keeps A(Idx) equaling S.

Merging two already sorted arrays

Suppose A and B are two sorted arrays (different lengths).
How do you “merge” these into a sorted array C?
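One way to make the question concrete is a small worked case. The arrays below are made-up values, not taken from the lecture, and the hand trace in the comments is only a sketch of the idea "always take the smaller of the two smallest unused entries". The built-in sort on the concatenation gives the target C for comparison.

```matlab
% Hypothetical example data; both arrays are already sorted
A = [1 4 9];
B = [2 3 10 12];

% Reference answer: concatenate and sort (ignores that A and B are sorted)
C = sort([A B]);
disp(C)

% Hand merge: each step takes the smaller of the two smallest unused entries
%   compare 1 vs 2   -> take 1  (from A)
%   compare 4 vs 2   -> take 2  (from B)
%   compare 4 vs 3   -> take 3  (from B)
%   compare 4 vs 10  -> take 4  (from A)
%   compare 9 vs 10  -> take 9  (from A)
%   A is used up     -> take 10, then 12 (from B)
```

The hand merge touches each entry once, so it takes length(A)+length(B) steps, whereas sorting the concatenation does not exploit the existing order.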
Chalkboard…

Pseudo-code: Merging two already sorted arrays

  function C = merge(A,B)
  nA = length(A); nB = length(B);
  iA = 1; iB = 1;   % smallest unused element
  C = zeros(1,nA+nB);
  for iC=1:nA+nB
    % compare smallest unused; if B is used up take from A,
    % and if A is used up take from B
    if iB>nB || (iA<=nA && A(iA)<B(iB))
      C(iC) = A(iA); iA = iA+1;   % use A
    else
      C(iC) = B(iB); iB = iB+1;   % use B
    end
  end

  # of "steps" = $n_A + n_B$

MergeSort

  function S = mergeSort(A)
  n = length(A);
  if n<=1                          % Base Case (also covers empty A)
    S = A;
  else
    hn = floor(n/2);               % Split in half
    S1 = mergeSort(A(1:hn));       % Sort 1st half
    S2 = mergeSort(A(hn+1:end));   % Sort 2nd half
    S = merge(S1,S2);              % Merge 2 sorted arrays
  end

Rough Operation Count for MergeSort

Let R(n) denote the number of operations necessary to sort (using mergeSort) an array of length n.

  function S = mergeSort(A)
  n = length(A);
  if n<=1
    S = A;                         % R(1) = 0
  else
    hn = floor(n/2);
    S1 = mergeSort(A(1:hn));       % R(n/2) to sort array of length n/2
    S2 = mergeSort(A(hn+1:end));   % R(n/2) to sort array of length n/2
    S = merge(S1,S2);              % n steps to merge two sorted arrays of total length n
  end

Recursive relation: R(1)=0, R(n) = 2*R(n/2) + n

Rough Operation Count for MergeSort

The recursive relation for R:
  R(1)=0, R(n) = 2*R(n/2) + n

Claim: For $n = 2^m$, it is true that $R(n) \le n \log_2(n)$.

Case (m=0): true, since $\log_2(1) = 0$ and R(1)=0.

Case (m=k+1 from m=k):
  $R(2^{k+1}) = 2R(2^k) + 2^{k+1}$                              (recursive relation)
  $\le 2 \left( 2^k \log_2 2^k \right) + 2^{k+1}$               (induction hypothesis)
  $= 2^{k+1} k + 2^{k+1} = 2^{k+1}(k+1) = 2^{k+1} \log_2 2^{k+1}$

Matlab command: sort

Syntax is
  [S] = sort(A)
If A is a vector, then S is a vector in ascending order.

The indices which rearrange A into S are also available:
  [S,Idx] = sort(A)
S is the sorted values of A, and A(Idx) equals S.
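A short sketch of why the index vector is useful: the same Idx that sorts one array can reorder a second array that goes with it, much like reordering phone numbers when names are sorted. The variable names and values here (ages, ids) are hypothetical, chosen only for illustration.

```matlab
% Hypothetical paired data: ids(k) belongs with ages(k)
ages = [31 25 40];
ids  = [101 102 103];

% Sort by age and keep the index vector
[S, Idx] = sort(ages);     % S = [25 31 40], Idx = [2 1 3]

% Idx rearranges the original array into S ...
sameAsS = ages(Idx);       % equals S

% ... and applies the same rearrangement to the paired array
idsByAge = ids(Idx);       % [102 101 103]

disp(S)
disp(idsByAge)
```

Because A(Idx) equals S, no second sort is needed to keep the paired array aligned with the sorted one.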