Searching Arrays

Problem: Given an array A of some type, and a value x of that same type, does the value x occur somewhere within A?

Depending on the information needed, we can answer this in several ways.

• We could just say yes or no (true or false); however, this provides no information about the location of x if it occurs in A.
• We could return the index of the first occurrence of x in A, or -1 if x does not occur in A.
• We could return the index of the last occurrence of x in A, or -1 if x does not occur in A.
• We could return the index of any occurrence of x in A, or -1 if x does not occur in A.

Linear Search

In a linear search, we compare x to the elements of A, beginning at A[0] and continuing through successive elements until either x is found or we have looked at all elements in A. Note that this gives us the first occurrence of x if it is in A.

    /*
     * Searches array A for the value x and returns the
     * index of the first occurrence (-1 if x is not in A).
     */
    public static int linearSearch_1(int[] A, int x) {
        int i;
        for (i = 0; i < A.length; i++) {
            if (A[i] == x) return i;
        }
        return -1; //x was not found
    }

An alternative is to initialize an additional index variable to -1, search all elements of the array from first to last, and update the index to the current location each time x is found. Note that this gives us the last occurrence of x if it is in A.

    /*
     * Search for the last occurrence of x in A.
     */
    public static int linearSearch_2(int[] A, int x) {
        int i, index = -1;
        for (i = 0; i < A.length; i++) {
            if (A[i] == x) index = i;
        }
        return index; //index is -1 unless x was found
    }

Each of the algorithms above could also be implemented by searching from the last to the first element in A instead of from the first to the last. How would this change the results for each of them?

What is the cost of running a linear search? First consider linearSearch_1 above. Since it terminates upon the first occurrence of x, its cost depends on where x is located within the array.
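One way to see this dependence is to count comparisons directly. The following is a minimal sketch and not part of the original notes: the class LinearSearchCost and the method countComparisons are hypothetical names, and the method mirrors the loop of linearSearch_1 but returns the number of element comparisons made instead of an index.

```java
// Hypothetical helper class (an assumption for illustration, not from the notes).
class LinearSearchCost {

    /*
     * Same scan as linearSearch_1, but returns how many elements of A
     * were compared with x before the search stopped.
     */
    static int countComparisons(int[] A, int x) {
        int comparisons = 0;
        for (int i = 0; i < A.length; i++) {
            comparisons++;          // one comparison of A[i] with x
            if (A[i] == x) break;   // stop at the first occurrence
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int[] A = {7, 3, 9, 3, 5};
        System.out.println(countComparisons(A, 7)); // x at the front: prints 1
        System.out.println(countComparisons(A, 5)); // x at the rear: prints 5
        System.out.println(countComparisons(A, 8)); // x not in A: prints 5
    }
}
```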
If x is at the front, then only one comparison is needed. If x is at the rear of the array, or not in the array at all, every element of the array must be compared. If the array has length n, then:

• Best Case: O(1) (constant time)
• Worst Case: O(n) (linear in the length of the array)

What about linearSearch_2? In this case, the loop runs the entire length of A regardless of where x is. So its running time is O(n) in both the best and the worst case: there is no early exit, but the cost is still linear in the length of A.

Binary Search

For binary search we assume the array A is sorted in increasing order. At any given point in this algorithm we are looking at some contiguous portion of the array, starting with the whole array. Further, we know that if x is in A, it must be in the portion currently being examined. At each stage, we look at A[m], where m is the index nearest the middle of the contiguous portion currently being considered. There are three possibilities.

• If x = A[m], it has been found and m is returned.
• If x < A[m], then if x occurs in A, its index must be less than m.
• If x > A[m], then if x occurs in A, its index must be greater than m.

Since m is near the middle, we can eliminate at least half the elements of the current portion. Here is the algorithm written as a method.

    /*
     * Binary Search Algorithm - Array must be sorted
     */
    public static int binarySearch(int[] A, int x) {
        int f = 0;                  //front of segment considered
        int r = A.length - 1;       //rear of segment considered
        int mid;                    //approximate middle of segment
        while (f <= r) {
            mid = f + (r - f) / 2;  //same value as (f + r) / 2, but cannot overflow int
            if (A[mid] == x) {      //x is at A[mid]
                return mid;
            }
            if (A[mid] > x) {       //x left of A[mid]
                r = mid - 1;
            } else {
                f = mid + 1;        //x must be to right
            }
        }
        return -1;                  //x not found
    }

For the following discussion, recall the definition of the floor function. If x is a real number, then $\lfloor x \rfloor$ is the greatest integer less than or equal to x. For example, $\lfloor 2.5 \rfloor = 2$, $\lfloor -3.7 \rfloor = -4$, and $\lfloor 6.0 \rfloor = 6$.

What is the cost of running this algorithm on an array of size n?
Observe that at each stage, if the value x is not found, the portion remaining to be searched is at most about half of the current portion. More precisely, at each comparison, if x is not found and the size of the current portion is n, no more than $\lfloor n/2 \rfloor$ elements of the current contiguous portion of the array will remain to be searched.

Question: Given a set of n elements, how many times can one remove half of the remaining elements?

For example, what if n is 21?

• After the first comparison at most 10 elements remain.
• After the second comparison at most 5 elements remain.
• After the third comparison at most 2 elements remain.
• After the fourth comparison at most 1 element remains.
• The fifth comparison will then resolve the problem.

Observe that $2^4 \le 21 < 2^5$. In general, given a positive integer n, there is a unique integer k such that $2^k \le n < 2^{k+1}$. It can be shown that the maximum number of comparisons on an array of size n is $k + 1$. Since $k = \lfloor \log_2(n) \rfloor$, the maximum number of comparisons required in binary search for an array of length n is:

$$\lfloor \log_2(n) \rfloor + 1 = O(\log_2(n))$$

The logarithm function grows slowly. To illustrate this with a large example, suppose our array had 20,000,000 elements. Then a binary search for an element in the array would require at most 25 comparisons ($\log_2(20{,}000{,}000) = 24.25\ldots$), whereas a linear search could require 20,000,000 comparisons.
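The bound can be checked empirically on small arrays. The sketch below is an illustration under stated assumptions rather than part of the original notes: the class BinarySearchCost and its methods countComparisons, bound, and worstCase are hypothetical names. It counts one comparison per element A[mid] examined, as in the discussion above, and computes $\lfloor \log_2(n) \rfloor + 1$ with the standard library call Integer.numberOfLeadingZeros.

```java
// Hypothetical helper class (an assumption for illustration, not from the notes).
class BinarySearchCost {

    /*
     * Same loop as binarySearch, but returns how many elements A[mid]
     * were examined. A must be sorted in increasing order.
     */
    static int countComparisons(int[] A, int x) {
        int f = 0, r = A.length - 1, comparisons = 0;
        while (f <= r) {
            int mid = f + (r - f) / 2;
            comparisons++;                     // one stage: A[mid] is examined
            if (A[mid] == x) return comparisons;
            if (A[mid] > x) r = mid - 1;       // continue in the left part
            else f = mid + 1;                  // continue in the right part
        }
        return comparisons;                    // x was not found
    }

    /* floor(log2(n)) + 1 for n >= 1: the number of bits needed to write n. */
    static int bound(int n) {
        return 32 - Integer.numberOfLeadingZeros(n);
    }

    /*
     * Largest count over every value present in a sorted array of
     * length n and every absent value between and around them.
     */
    static int worstCase(int n) {
        int[] A = new int[n];
        for (int i = 0; i < n; i++) A[i] = 2 * i;   // sorted: 0, 2, 4, ...
        int worst = 0;
        for (int x = -1; x <= 2 * n; x++) {
            worst = Math.max(worst, countComparisons(A, x));
        }
        return worst;
    }

    public static void main(String[] args) {
        System.out.println(worstCase(21));       // prints 5
        System.out.println(bound(21));           // prints 5
        System.out.println(bound(20_000_000));   // prints 25
    }
}
```

For n = 21 the measured worst case and the formula agree at 5, matching the worked example above.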