Searching Arrays

Problem: Given an array A of some type and a value x of that same type, does the
value x occur somewhere within A? Depending on the information needed, we can
answer this in several ways. We could just say yes or no (true or false);
however, this provides no information about the location of x if it occurs in A.
Alternatively, we could return the index of the first occurrence of x in A, the
index of the last occurrence, or the index of any occurrence. In each of these
cases we return -1 if x does not occur in A.
Linear Search
In a linear search, we start comparing x to elements of A beginning at A[0] and
compare successive elements until either x is found or we have looked at all
elements in A. Note that this will give us the first occurrence of x if it is in A.
/*
 * Searches array A for the value x and returns the
 * index of the first occurrence (-1 if x is not in A).
 */
public static int linearSearch_1(int[] A, int x) {
    int i;
    for (i = 0; i < A.length; i++) {
        if (A[i] == x)
            return i;
    }
    return -1; //x was not found
}
An alternative is to initialize an additional index variable to -1, search all
elements of the array from first to last, and update the index to the current
location each time x is found. Note that this gives us the last occurrence of x
if it is in A.
/*
 * Search for the last occurrence of x in A.
 */
public static int linearSearch_2(int[] A, int x) {
    int i, index = -1;
    for (i = 0; i < A.length; i++) {
        if (A[i] == x)
            index = i;
    }
    return index; //index is -1 unless x was found
}
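To see the difference between the two methods, it helps to run them on an array
that contains x more than once. The following is a minimal sketch: the class name
LinearSearchDemo and the sample array are made up for illustration, and the two
methods are repeated from above only so that the example compiles on its own.

```java
class LinearSearchDemo {
    // First occurrence of x in A, or -1 (same logic as linearSearch_1 above).
    public static int linearSearch_1(int[] A, int x) {
        for (int i = 0; i < A.length; i++) {
            if (A[i] == x) {
                return i;
            }
        }
        return -1;
    }

    // Last occurrence of x in A, or -1 (same logic as linearSearch_2 above).
    public static int linearSearch_2(int[] A, int x) {
        int index = -1;
        for (int i = 0; i < A.length; i++) {
            if (A[i] == x) {
                index = i;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        int[] A = {7, 3, 9, 3, 5};  // hypothetical sample data; 3 occurs twice
        System.out.println(linearSearch_1(A, 3));  // prints 1
        System.out.println(linearSearch_2(A, 3));  // prints 3
        System.out.println(linearSearch_1(A, 4));  // prints -1
        System.out.println(linearSearch_2(A, 4));  // prints -1
    }
}
```

When x occurs exactly once, or not at all, both methods return the same value.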
Each of the algorithms above could also be implemented by searching from the last
element of A to the first instead of from the first to the last. How would this
change the results for each of them?
What is the cost of running a linear search? First consider linearSearch_1
above. Since this terminates upon the first occurrence of x, it depends on where x is
located within the array. If it is at the front, then only one comparison is needed. If
x is at the rear of the array, or not in the array at all, each element of the array must
be compared. If the array has length n, then:
• Best Case: O(1) (constant time)
• Worst Case: O(n) (linear in the length of the array)
What about linearSearch_2? In this case, the loop runs the entire length of A
regardless of where x is. So its running time is always O(n): the best case and
the worst case are both linear in the length of A.
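One way to check these bounds is to count the element comparisons directly. The
sketch below is illustrative only: countFirst and countLast are hypothetical
instrumented variants of the two methods that return the number of comparisons
of A[i] with x instead of an index.

```java
class LinearSearchCost {
    // Comparisons made by a first-occurrence search: stops at the first match.
    static int countFirst(int[] A, int x) {
        int comparisons = 0;
        for (int i = 0; i < A.length; i++) {
            comparisons++;
            if (A[i] == x) {
                break;
            }
        }
        return comparisons;
    }

    // Comparisons made by a last-occurrence search: never stops early.
    static int countLast(int[] A, int x) {
        int comparisons = 0;
        int index = -1;
        for (int i = 0; i < A.length; i++) {
            comparisons++;
            if (A[i] == x) {
                index = i;  // keep scanning; a later match may exist
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int[] A = {10, 20, 30, 40, 50};  // hypothetical sample data, n = 5
        System.out.println(countFirst(A, 10));  // prints 1 (best case)
        System.out.println(countFirst(A, 50));  // prints 5 (x at the rear)
        System.out.println(countFirst(A, 99));  // prints 5 (x not present)
        System.out.println(countLast(A, 10));   // prints 5 (always n)
    }
}
```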
Binary Search
For binary search we assume the array A is sorted in increasing order. At any given
point in this algorithm we are looking at some contiguous portion of the array, the
whole array to start with. Further, we know that if x is in A, it must be in the portion
currently being examined. At each stage, we look at A[m] where m is the index
nearest the middle of the contiguous portion currently being considered. There are
three possibilities. If x = A[m], it has been found and m is returned. If x < A[m], then
if it occurs in A its index must be less than m. If x > A[m], then if it occurs in A, its
index must be greater than m. Since m is near the middle, we can eliminate at least
half the elements of the current portion. Here is the algorithm written as a method.
/*
 * Binary Search Algorithm - Array must be sorted in increasing order
 */
public static int binarySearch(int[] A, int x) {
    int f = 0;                 //front of segment considered
    int r = A.length - 1;      //rear of segment considered
    int mid;                   //approximate middle of segment
    while (f <= r) {
        mid = f + (r - f) / 2; //same value as (f + r) / 2, but cannot overflow
        if (A[mid] == x) {     //x is at A[mid]
            return mid;
        }
        if (A[mid] > x) {      //x left of A[mid]
            r = mid - 1;
        } else {
            f = mid + 1;       //x must be to right
        }
    }
    return -1; //x not found
}
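A short trace makes the halving visible. The sketch below uses a hypothetical
variant, binarySearchTrace, that follows the same steps as binarySearch but
prints the segment examined at each stage. The sample array is made up for
illustration.

```java
class BinarySearchTrace {
    // Same steps as binarySearch, printing f, r and mid at every stage.
    static int binarySearchTrace(int[] A, int x) {
        int f = 0;
        int r = A.length - 1;
        while (f <= r) {
            int mid = f + (r - f) / 2;
            System.out.println("f=" + f + " r=" + r + " mid=" + mid
                    + " A[mid]=" + A[mid]);
            if (A[mid] == x) {
                return mid;
            }
            if (A[mid] > x) {
                r = mid - 1;
            } else {
                f = mid + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, sorted in increasing order.
        int[] A = {2, 5, 8, 12, 16, 23, 38, 56, 72, 91};
        System.out.println("result: " + binarySearchTrace(A, 23));
        System.out.println("result: " + binarySearchTrace(A, 7));
    }
}
```

Searching for 23 examines indices 4, 7, and 5 before returning 5. Searching for 7
examines indices 4, 1, and 2, after which f exceeds r and -1 is returned.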
For the following discussion recall the definition of the floor function. If x is a real
number, then ⌊x⌋ is the greatest integer less than or equal to x. For example,
⌊2.5⌋ = 2, ⌊−3.7⌋ = −4, and ⌊6.0⌋ = 6.
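In Java, the floor of a double is available as Math.floor, and the floor of an
integer quotient as Math.floorDiv (Java 8 and later). Plain integer division with
/ truncates toward zero instead, so it agrees with the floor only when the
quotient is non-negative, which is the case for the index arithmetic in
binarySearch. A small sketch, where FloorDemo and floorOf are hypothetical names:

```java
class FloorDemo {
    // Floor of a real number, returned as a whole number.
    static long floorOf(double v) {
        return (long) Math.floor(v);
    }

    public static void main(String[] args) {
        System.out.println(floorOf(2.5));           // prints 2
        System.out.println(floorOf(-3.7));          // prints -4
        System.out.println(floorOf(6.0));           // prints 6
        System.out.println(21 / 2);                 // prints 10, equal to the floor of 10.5
        System.out.println(-7 / 2);                 // prints -3, truncation toward zero
        System.out.println(Math.floorDiv(-7, 2));   // prints -4, the floor of -3.5
    }
}
```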
What is the cost of running this algorithm on an array of size n? Observe that at
each stage, if the value x is not found, an upper bound on the size of the set
remaining to search is approximately half of the current set. More precisely, at each
comparison, if x is not found and the size of the array portion is n, no more than
⌊n/2⌋ elements in the current contiguous portion of the array will remain to be
searched. Question: Given a set of n elements, how many times can one remove half
of the remaining elements? For example, what if n is 21? After the first comparison
at most 10 elements remain. After the second comparison at most 5 elements
remain. After the third comparison at most 2 elements remain. After the fourth
comparison 1 element remains. The fifth comparison will then resolve the problem.
Observe that 2^4 ≤ 21 < 2^5. In general, given a positive integer n, there is a unique
integer k such that 2^k ≤ n < 2^(k+1). It can be shown that the maximum number of
comparisons on an array of size n is k + 1. Thus the maximum number of
comparisons required in binary search for an array of length n is:
⌊log_2(n)⌋ + 1 = O(log_2(n))
The logarithm function grows slowly. To illustrate this with a large example,
suppose our array had 20,000,000 elements. Then a binary search for an element in
the array would require at most 25 comparisons (log_2(20,000,000) = 24.25…),
whereas a linear search could require 20,000,000 comparisons.
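These figures can be checked numerically. The sketch below is illustrative, with
hypothetical names. worstCaseStages simulates a binary search that always has to
continue into the right-hand part, which leaves ⌊s/2⌋ elements of a segment of s
elements (the most that can remain), and counts the stages. bound computes
⌊log_2(n)⌋ + 1 using Integer.numberOfLeadingZeros. No array is allocated, since
only the indices matter for the count.

```java
class BinarySearchBound {
    // Stages used when the search always continues to the right (worst case).
    static int worstCaseStages(int n) {
        int f = 0;
        int r = n - 1;
        int stages = 0;
        while (f <= r) {
            int mid = f + (r - f) / 2;
            stages++;
            f = mid + 1;  // pretend x is greater than A[mid]
        }
        return stages;
    }

    // floor(log_2(n)) + 1 for n >= 1, i.e. the number of bits needed to write n.
    static int bound(int n) {
        return 32 - Integer.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        int[] sizes = {1, 21, 1000, 20000000};
        for (int n : sizes) {
            System.out.println(n + ": stages=" + worstCaseStages(n)
                    + " bound=" + bound(n));
        }
    }
}
```

For n = 21 both methods give 5, and for n = 20,000,000 both give 25, matching the
figures in the text.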