Lecture Notes Slide 1 Summarize the four method of search that we will discuss. Serial Search Binary Search Open Address Hashing Chaining Slide 4 Airline reservation systems – search for free seats on all flights for a given day of the week. Given N credit card transactions for a given day and M known stolen credit cards. Identify which of these N transactions involve a stolen credit card. N is typically much larger than M. For example, N = 10,000,000 and M = 100,000. Slide 6-7 Serial Search Examines the elements of an array a[0:n-1], from left to right, to see if one of them equals x. If an element equal to x is found, return the position of the first occurrence of x. If the array has no element equal to x, return –1. Slide 8 Best Case If x is stored in a[0], only one array access is necessary. Worst Case If x is in position a[n-1] or x is not in the array at all, n array accesses are necessary. Average Case (successful search) Assume x is randomly placed in the array (i.e., x has an equal probability of being stored in any position of the array). We analyze what happens if x is stored in each position of the array. If x is in a[0] then 1 If x is in a[1] then 2 If x is in a[1] then 3 ... If x is in a[n-1] then comparison is necessary to detect x. comparisons are necessary to detect x. comparisons are necessary to detect x. n comparisons are necessary to detect x. Average number of comparisons = (1 + 2 + … + n)/n = (n+1)/2 Average Case (unsuccessful search) Average number of comparisons = n Slides 14-15 Binary Search Examines the elements of a sorted array a[0:n-1] to see if one of them equals x. The search begins by comparing x to the middle of the array, a[middle]. If x equals this element, the search terminates. If x is smaller than this element, then we need to search the left half of the array for x. If x is greater than this element, then we search the right half of the array for x. The processes continues until we find the element that equals x, in which case, we return its position. If the search fails, we return –1. Slide 17 Search for x = 51 0 1 2 12 17 30 3 45 4 47 5 6 7 8 52 59 77 81 9 82 10 85 start size middle 0 11 5 0 5 2 3 2 4 5 1 5 5 0 (Seems to work but suspicious) Analysis of splitting array Start with an array of odd size = 2m+1, a[0:2m] then first = 0, size = 2m+1, size/2 = m, middle = m. So, search(a,first,size/2,x) = search(a, 0, m, x)=> a[0:m-1] search(a,middle+1,size/2,target) = search(a, m+1, m, x)=> a[m+1,2m] Splitting looks ok. Start with an array of even size = 2m, a[0:2m-1] then first = 0, size = 2m, size/2 = m, middle = m. So, search(a,first,size/2,x) = search(a, 0, m, x)=> a[0:m-1] search(a,middle+1,size/2,target) = search(a, m+1, m, x)=> a[m+1,2m] Splitting looks bogus. Slide 21 Can a link list be used to implement binary search? We would have to build an ordered list. Problem: finding the middle would require O(n) operations vs. O(1) for an array. The maximum number of function calls is (log n). Therefore, with an array the running time is (log n), while with a linked list the running time is (n log n). Slide 22 Suppose 2k-1 N 2k – 1 nodes. N = 2k-1 + t where 0 t 2k-1 – 1 Nodes 1 3 7 … 2k-1 – 1 2k-1 N 2k – 1 Max Levels (comparisons) 1 2 3 … k-1 k k k Example N = 15 = 23 + 7 => k-1 = 3 or k = 4 is the max number of comparisons need with binary search. Example (Search for H) 0 A 1 A Start 0 8 8 8 2 A 3 C 4 E 5 E 6 F size 15 7 3 1 7 G 8 H 9 10 I L middle 7 11 9 8 (Search for P) Start 0 8 12 size 15 7 3 middle 7 11 13 Recurrence relation: T(1) = 1 T(n) = T(n/2) + 1, n > 1 Suppose n = 2k T(n) = T(n/2) + 1 = T(n/4) + 1 + 1 = T(n/8) + 1 + 1 + 1 = … = T(n/2k) + 1 + 1 + … + 1 = k + 1 = lg n + 1 11 M 12 N 13 P 14 R Slide 25 Interpolation search Generalizes the idea of computing the middle = first + size/2 with first + size*(x – a[first])/(a[first+size-1]-a[first]) Assumptions must be made though – the keys are numerical and they are uniformly distributed. Performs poorly when keys are not uniformly distributed. Lg lg n + 1 for large random files For small n, lg n is close to lg lg n so no improvement over binary search. Example n = 1,000,000,000 lg lg 1,000,000,000 < 5 Slide 32 Toy Example: H(key) = key % 11 Hash the integer keys: 50, 31, 78, 20, 22, 65 0 1 2 22 78 65 3 4 5 6 7 8 50 findIndex(31) returns i= 9 findIndex(65) returns i= 10, 0, 1, 2 findIndex(53) returns i = 9, 10, 0, 1, 2, 3, -1 9 31 10 20 Slide 33 put(28,”Happy Birthday”) 0 1 2 22 78 65 3 4 5 6 7 8 50 Clustering 9 31 10 20 Replace key 50 with 28 and its data with “Happy Birthday”, return 50’s data to caller Slide 34 Put(104,”Bon jour”) Insert key 104 at index 5 in keys, insert “Bon jour” in data[5], hasBeenUsed[5] is set to true, numItems is increased by 1, return null to caller. How is an element removed from a hash table? Slide 35-36 Double hashing H1(key) = key % 11 H2(key) = key % 9 Hash the integer keys: 50, 31, 78, 20, 22, 65, 39 0 1 20 78 2 3 4 22 H1(20) = 20 % 11 = 9 collision H2(20) = 20 % 9 = 2 H1(22) = 22 % 11 = 0 collision H2(22) = 22 % 9 = 4 H1(65) = 65 % 11 = 10 5 6 7 50 39 8 9 31 10 65 H1(39) H2(39) Trying How to = 39 % 11 = 6 collision = 39 % 9 = 3 to place 39: Collision at 9, 1, then 4 finally place at 7. choose the second hash function? Make sure the second function can never evaluate to 0 since if it does the probe sequence would lead to an infinite loop. (Knuth) Chose twin primes M and M-2. If h1(key) = key mod M then h2(key) = key mod (M-2) + 1 This discussion applies to an assignment made during the summer of 2002. Discuss HW3 Algorithm Input M (number of soldiers) and N (number used to eliminate soldiers). Input the names of the soldiers in clockwise order. Construct the almost complete binary tree T. (see figure 1) Let p point to the root of T. Let the number of leaves that remain to be counted = N % M. while there is more than one leaf in T { //Locate the next soldier to be eliminated while there is more than one leaf in the subtree pointed to by p (see figure 2, 5, 8) { Let p be set to point to its left child. if number of leaves that remain to be counted > number of leaves in the subtree pointed to by p { Decrement number of leaves that remain to be counted by number of leaves in the subtree pointed to by p Let p point to its right brother } } // p should be pointing to the next soldier to be eliminated from the circle Display the soldier pointed to by p. Let q point to the same node that p points to. while q is not null (see figure 3, 6, 9) { //Reduce the count of each ancestor of q Decrement number of leaves in the subtree pointed to by q by 1. if number of leaves in the subtree pointed to by q equals 1 then { if number of leaves in the subtree pointed to by the left child of q equals 1 then replace the soldier's name in q with the soldier's name in q's left child else replace the soldier's name in q with the soldier's name in q's right child } replace q with q's parent } Let the number of leaves that remain to be counted be set to N. if p points to a left child then let p point to it right brother while number of leaves pointed to by { Decrement number of pointed to by p while p points to a (see figure 4, 7, 10) that remain to be counted > number of leaves in the subtree p and p not equal to T leaves that remain to be counted by the number of leaves in the subtree right child set p to point to its father if p does not point to T then let p point to its right brother } if p points to T then set number of leaves to be counted - 1 to itself modulo the number of leaves in the subtree pointed to by T + 1 } // At this point only, one leaf remains containing the lucky soldier who goes for reinforcements Figure 1 p Remain = 4 T 9 5 3 Ajax 1 2 Cato 1 2 4 Drusus 1 2 Flavius 1 Gaius 1 2 Hector 1 Maximus 1 Otho 1 Brutus 1 Figure 2 T Remain = 4 9 p 5 4 p p 3 2 2 Remain = 1 2 p Cato 1 2 Ajax 1 Brutus 1 Drusu s 1 Flavius 1 Gaius 1 Hector 1 Maximus 1 Otho 1 Figure 3 q T 9 8 q 5 4 4 Flavius 2 q 3 2 2 1 P Cato 1 2 q Drusus 1 0 Flavius 1 Gaius 1 Hector 1 Maximus 1 Otho 1 Remain = 13 Ajax 1 Brutus 1 Figure 4 T P 8 Remain = 8 P P 4 Remain = 8 4 P Flavius 1 3 2 Remain = 12 2 P Cato 1 2 Ajax 1 Brutus 1 Drusus 0 Flavius 1 Gaius 1 Hector 1 Maximus 1 Otho 1 Figure 5 T P Remain = 8 8 P P Remain = 4 4 4 P Flavius 1 3 P 2 2 P Cato 1 2 Ajax 1 Gaius 1 Remain = 2 Hector 1 p Maximus 1 Remain = 1 Otho 1 Brutus 1 Figure 6 T q 8 7 q 4 3 4 q Flavius 1 3 2 2 1 Maximus remain = 13 p q Cato 1 2 Ajax 1 Brutus 1 Gaius 1 Hector 1 Maximus 1 Otho 0 Figure 7 T p 7 p 4 3 p Flavius 1 3 1 Maximus 2 remain = 13 p Cato 1 2 Ajax 1 Gaius 1 Hector 1 Maximus 1 Brutus 1 Figure 8 T p 7 remain = 6 p p Remain = 2 4 3 p Flavius 1 3 2 p Cato 1 2 Ajax 1 Brutus 1 Maximus 1 Gaius 1 p Remain = 1 Hector 1 Otho 0 Figure 9 T q 7 6 q 4 32 q Flavius 1 3 Maximus 1 21 Gaius p q Cato 1 2 Ajax 1 Gaius 1 Remain = 1 Hector 10 Brutus 1 Figure 10 T P Remain=6 6 p 4 2 P p Flavius 1 3 Gaius Cato 1 2 Ajax 1 Brutus 1 Maximus 1 1 p Gaius 1 Remain = 12 Remain = 13 Hector 0 Figure 11 T p 6 Remain=6 p p 4 3 Remain=2 p Gaius 1 Remain=1 Maximus 1 Cato 1 2 Ajax 1 p Flavius 1 2 Brutus 1 Figure 12 T Q Remain=3 5 q 1 4 Gaius Flavius 1 3 Cato 1 2 Ajax 1 Brutus 1 Gaius 1 p q Remain=13 Maximus 0 Figure 13 T p Remain=3 5 p p p Ajax 1 Flavius 1 3 p Remain=1 Cato 1 2 Brutus 1 Gaius 1 4 Trie //*** Leaf //*** //*** //*** //*** //*** -------------|is Leaf| true | |-------|------| |suffix | | -------------- //*** NonLeaf //*** //*** //*** //*** //*** //*** //*** //*** //*** --------------|isLeaf |false| |---------|-----| |endOfWord| | |---------|-----| |letters | o--|----->[ ][ ][ ][ ][ ] (array of letters) |---------|-----| |ptrs[] | o--|----->[o][o][o][o][o](ptrs to leaves/nonleaves) --------------| | | | | v v v v v class TrieNode { public boolean isLeaf; } class TrieNonLeaf extends TrieNode { public boolean endOfWord; public String letters; public TrieNode[] ptrs = new TrieNode[1]; TrieNonLeaf() { isLeaf = false; } TrieNonLeaf(char ch) { letters = new String(); letters += ch; isLeaf = false; } } class TrieLeaf extends TrieNode { public String suffix; TrieLeaf() { isLeaf = true; } TrieLeaf(String s) { suffix = new String(s); isLeaf = true; } } Words: the there their was when wasp # t w # h # a h # # # the # s # a e i r r # # e p # was wasp there their t what