Searching and Sorting Gary Wong Prerequisite • Time complexity • Pseudocode • (Recursion) Agenda • Searching – Linear (Sequential) Search – Binary Search • Sorting – – – – Bubble Sort Merge Sort Quick Sort Counting Sort Linear Search One by one... Linear Search • Check every element in the list, until the target is found • For example, our target is 38: i 0 1 2 3 4 5 a[i] 25 14 9 38 77 45 Not found! Found! Linear Search 1) Initilize an index variable i 2) Compare a[i] with target • • If a[i]==target, found If a[i]!=target, • • If all have checked already, not found Otherwise, change i into next index and go to step 2 Linear Search • Time complexity in worst case? – If N is number of elements, – Time complexity = O(N) • Advantage? • Disadvantage? Binary Search Chop by half... Binary Search • Given a SORTED list: • (Again, our target is 38) Smaller! Found! i 0 1 2 3 4 5 a[i] 9 14 25 38 45 77 L Larger! R Binary Search • Why always in the middle, but not other positions, say one-third of list? 1) Initialize boundaries L and R 2) While L is still on the left of R • • • • mid = (L+R)/2 If a[mid]>Target, set R be m-1 and go to step 2 If a[mid]<Target, set L be m+1 and go to step 2 If a[mid]==Target, found Binary Search • Time complexity in the worst case? – If N is the number of elements, – Time complexity = O(lg N) – Why? • Advantage? • Disadvantage? Example: Three Little Pigs • HKOI 2006 Final Senior Q1 • Given three lists, each with M numbers, choose one number from each list such that their sum is maximum, but not greater than N. • Constraints: – M ≤ 3000 – Time limit: 1s Example: Three Little Pigs • • • • How many possible triples are there? Why not check them all? Time complexity? Expected score = 50 Example: Three Little Pigs • A simpler question: How would you search for the maximum number in ONE SORTED list such that it is not greater than N? • Binary search! – With slight modification though – How? Example: Three Little Pigs • Say, N=37 i 0 1 2 3 4 5 a[i] 9 14 25 38 45 77 L R Example: Three Little Pigs • Let’s go back to original problem • If you have chosen two numbers a1[i] and a2[j] already, how would you search for the third number? • Recall: How would you search for the maximum number in ONE SORTED list such that it is not greater than N-a1[i]-a2[j]? Example: Three Little Pigs • Overall time complexity? • Expected score = 90 • Example: Three Little Pigs • Slightly harder: Given TWO lists, each with M numbers, choose one number from each list such that their sum is maximum, but not greater than N. • Linear search? • Sort one list, then binary search! • Time complexity = O(M lg M) – O(M2) if less efficient sorting algorithm is used • But, can it be better? Example: Three Little Pigs • Why is it so slow if we use linear search? • If a1[i] and a2[j] are chosen, and their sum is smaller than N: – Will you consider any number in a1 that is smaller than or equal a1[i]? • If a1[i] and a2[j] are chosen, and their sum is greater than N: – Will you consider any number in a2 that is greater than or equal to a2[j]? Example: Three Little Pigs • Recall: Why is it so slow if we use linear search? – Because you use it for too many times! • At which number in each list should you begin the linear search? • Never look back at those we do not consider! • Time complexity? • Expected score = 100 What can you learn? • Improve one ‘dimension’ using binary search • Linear search for a few times can be more efficient than binary search for many times! – DO NOT underestimate linear search!!! Points to note • To use binary search, the list MUST BE SORTED (either ascending or decending) – NEVER make assumptions yourself – Problem setters usually do not sort for you • Sorting is the bottleneck of efficiency • But... how to sort? How to sort? • For C++: sort() • Time complexity for sort() is O(N lg N) – which is considered as efficient • HOWEVER... – Problem setters SELDOM test contestants on pure usage of efficient sorting algorithms – Special properties of sorting algorithms are essential in problem-solving • So, pay attention! Bubble Sort Smaller? Float! Larger? Sink! Bubble Sort • Suppose we need to sort in ascending order... • Repeatedly check adjacent pairs sequentially, swap if not in correct order • Example: 18 9 20 11 77 45 Incorrect order, swap! • The last number is always the largest Correct order, pass! Bubble Sort • Fix the last number, do the same procedures for first N-1 numbers again... • Fix the last 2 numbers, do the same procedures for first N-2 numbers again... • ... • Fix the last N-2 numbers, do the same procedures for first 2 numbers again... Bubble Sort for i -> 1 to n-1 for j -> 1 to n-i if a[j]>a[j+1], swap them • How to swap? Merge Sort & Quick Sort Many a little makes a mickle... Merge Sort • Now given two SORTED list, how would you ‘merge’ them to form ONE SORTED list? List 1: 8 14 22 List 2: 10 13 29 65 Temporary list: 8 10 13 14 22 29 65 Combined list: Merge Sort Merge 1) While both lists have numbers still not yet considered • Compare the current first number in two lists • Put the smaller number into temporary list, then discard it 2) If list 1 is not empty, add them into temporary list 3) If list 2 is not empty, add them into temporary list 4) Put the numbers in temporary list back to the desired list Merge Sort • Suppose you are given a ‘function’ called ‘mergesort(L,R)’, which can sort the left half and right half of list from L to R: 10 • • • • 13 29 65 8 14 L (L+R)/2 (L+R)/2+1 How to sort the whole list? Merge them! How can we sort the left and right half? Why not making use of ‘mergesort(L,R)’? 22 R Merge Sort mergesort(L,R){ If L is equal to R, done; Otherwise, m=(L+R)/2; mergesort(L,M); mergesort(M+1,R); Merge the lists [L,M] and [M+1,R]; } Merge Sort mergesort(0,6) mergesort(0,1) 65 mergesort(0,0) 10 mergesort(1,1) mergesort(2,3) 29 13 mergesort(2,2) mergesort(3,3) mergesort(4,5) 14 65 13 10 29 13 22 mergesort(4,4) mergesort(5,5) mergesort(0,3) 10 8 8 mergesort(6,6) mergesort(4,6) 29 65 14 22 8 14 29 65 22 Merge Sort • Time complexity? • O(N lg N) • Why? Quick Sort • Choose a number as a ‘pivot’ • Put all numbers smaller than ‘pivot’ on its left side • Put all numbers greater than (or equal to) ‘pivot’ on its right side 10 13 29 65 8 14 22 10 13 8 14 22 65 29 Quick Sort a[y] < pivot! shift x, swap! • How? 10 x 13 29 65 8 14 22 y • y shifts to right by 1 unit in every round • Check if a[y] < pivot – If yes, shift x to right by 1 unit and swap a[x] and a[y] • If y is at 2nd last position, swap a[x+1] and pivot • Time complexity? Quick Sort 10 13 8 14 22 65 29 • Use similar idea as in merge sort • If we have a function called ‘quicksort(L,R)’... – Make use of ‘quicksort(L,R)’ to sort the two halves! Quick Sort quicksort(L,R){ If L is equal to R, done; Otherwise, Choose a number as a pivot, say a[p]; Perform pivoting action; quicksort(L,p-1); quicksort(p+1,R); } Quick Sort • • • • Time complexity in worst case? O(N2) What is the worst case? Preventions: – Choose a random number as pivot – Pick 3 numbers, choose their median • Under randomization, expected time complexity is O(N lg N) Counting Sort No more comparison... Counting Sort • Create a list, then tally! • Count the occurences of elements • Example: – Sort {2,6,1,4,2,1,9,6,4,1} in ascending order – A list from 1 to 9 (or upper bound of input no.) – Three ‘1’s, two ‘2’s, two ‘4’s, two ‘6’s and one ‘9’ – List them all! Counting Sort • Time complexity? • O(range * N) • Good for numbers with small range More... If we have time... More... • • • • • Lower bound of sorting efficiency?! Guess the answer by binary search Count the number of ‘inversions’ Find the kth largest number Other sorting algorithms Time for lunch Yummy!