Chap12 -1-

HASHING

The associative containers set, multiset, map, and multimap implement their operations using a binary search tree. Because they access the data in sorted order, these structures are called ordered associative containers.

Another type of associative container is the hash table. A hash table distributes the elements in clusters (buckets) determined by their values. An associated function, called a hash function, maps a data item to a cluster, and the hash table inserts, updates, or erases the value in that cluster. A hash table provides an implementation of sets and maps; however, it is an unordered associative container.

Searching a hash table that resolves collisions by chaining with separate lists takes O(1) time on average. By contrast, the efficiency of accessing data in a binary search tree depends on the shape of the tree: in the worst case the tree degenerates into a list and access is O(n). However, there are a number of tree-balancing algorithms (for example, the AVL tree) that keep a binary search tree balanced.

HASHING FUNCTIONS

TABLE LOOKUP is a method of information retrieval. We can use INDEX FUNCTIONS that provide a one-to-one correspondence from a set of unique keys to locations in a list or array. The time required to retrieve an item therefore does not depend on the number of items in the list but is bounded by a constant; in big-O notation, the time required is O(1). Table lookup is therefore more efficient than any searching method based on key comparisons.

Example - Indexing rectangular arrays: Suppose the items are stored in a rectangular array with n columns, implemented in contiguous storage in row-major order, that each item has only one key, and that only one item has a given key. Then each item can be accessed with the following index function:

   Entry (i,j) goes to position ni + j      (i and j counted from 0)

RELATIONSHIP BETWEEN TABLES AND FUNCTIONS

With a table, we start with an index and look up the corresponding value.
With a function, we start with an argument and calculate the corresponding value. Table access begins with an index (the index set, or domain), and we use the table to look up the corresponding value (in the codomain, also called the base type or value type).

ADT TABLE

Definition and operations that can be performed on tables: A TABLE with domain (index set I) and codomain (base type or value type T) is a function from I into T together with the following operations:

1. Table access: Evaluate the function at any index in I.
2. Table assignment: Modify the function by changing its value at a specified index in I to the new value specified in the assignment (that is, change the value stored at a location in the table).
3. Insertion: Adjoin a new element x to the index set I and define the corresponding value of the function at x.
4. Deletion: Delete an element x from the index set I and restrict the function to the resulting smaller domain.

HASHING

When the key cannot be used directly as an index (as it can in an array), we can still devise an index function (a hash function) that produces an index into an array, which is used to locate entries in a table (a hash table). We use the following steps:

1. Start with an ARRAY that holds the hash table.
2. Initialize all locations in the array to show that they are empty.
3. To insert a record into the hash table, use a HASH FUNCTION to map its key to some index in the array. The function will generally map several different keys to the same index, causing a collision that must be resolved.

   [Figure: a key is passed to the locater (the hash function), which produces an index i into an array with positions 0 to n-1; the key-value pair is stored at position i.]

4. If the corresponding location is empty, the record can be inserted. Otherwise, if the stored key equals the new key, the key cannot be added (it is a duplicate); if the keys are different, the collision must be resolved.
5. Retrieving a record is similar. The hash function is computed, and if the record is in the corresponding location, it can be retrieved.
   If it is not, and the location is nonempty, follow the same steps as for collision resolution. If the key is still not found, the search is unsuccessful.

Thus our goals are (a) to find good hash functions and (b) to determine how to resolve collisions.

METHODS FOR BUILDING HASH FUNCTIONS

TRUNCATION - Ignore part of the key and use the remaining part directly as an index. Example: if a key has 8 digits and the hash table has 1000 locations, a hash function could extract the 1st, 2nd, and 5th digits from the right to produce an index into the hash table:

   62538194 ---> index 394

This method is fast, but it often fails to distribute the keys evenly through the table.

FOLDING - Partition the key into several parts and combine the parts. Example, using the same 8-digit key:

   62538194 ---> 625 + 381 + 94 = 1100

This method achieves a better spread than truncation alone. (Because 1100 lies outside the range 0 to 999, the sum must still be reduced to a valid index, for example by truncating it to 100.)

MODULAR ARITHMETIC - Convert the key to an integer, divide it by the size of the index range, and take the remainder as the result. The spread depends on the modulus; a prime modulus usually gives the best spread. Example: a key consisting of alphanumeric characters is mapped by the hash function into the range of integers 0 to HASHSIZE-1.

String HASH FUNCTION using modulus arithmetic

/* Declarations for a chained hash table */
#include <stdlib.h>                 /* abs() */

#define HASHSIZE 997

typedef char *Key_type;

typedef struct item_tag {
    Key_type key;
} Item_type;

typedef struct node_tag {
    Item_type info;                 /* information to be stored in table */
    struct node_tag *next;          /* next item in the linked list */
} Node_type;

typedef Node_type *List_type;
typedef List_type Hashtable_type[HASHSIZE];

int Hash(Key_type s)
{
    int h = 0;
    while (*s)                      /* loop through all the characters */
        h += *s++;                  /* add the value of each to h */
    return abs(h % HASHSIZE);       /* return index into hash table */
}

COLLISION RESOLUTION WITH OPEN ADDRESSING

LINEAR PROBING - Start at the hash address and search the circular array sequentially for the target key or an empty position.
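A minimal sketch of linear probing in C++, for illustration only: the class name ProbeTable, the integer keys, and the use of -1 to mark an empty slot are assumptions, not part of the chapter's code. The hash address is key % size (modular arithmetic), and the search wraps around the array, stopping at the key, at an empty slot, or after every slot has been examined.

```cpp
#include <vector>

// Marker for an unused slot (assumption: stored keys are non-negative).
constexpr int kEmpty = -1;

// ProbeTable is a hypothetical name: a sketch of open addressing with
// linear probing, not code from the chapter's library.
class ProbeTable {
public:
    explicit ProbeTable(int size) : slot(size, kEmpty) {}

    // Insert key. Returns the slot used, or -1 if key is already present
    // or every slot was probed without finding room (overflow).
    int insert(int key) {
        int n = static_cast<int>(slot.size());
        int h = key % n;                    // hash address
        for (int i = 0; i < n; i++) {
            int pos = (h + i) % n;          // wrap around: circular array
            if (slot[pos] == kEmpty) {
                slot[pos] = key;
                return pos;
            }
            if (slot[pos] == key)
                return -1;                  // duplicate key
        }
        return -1;                          // table is full
    }

    // Returns the slot holding key, or -1 if the search is unsuccessful.
    int find(int key) const {
        int n = static_cast<int>(slot.size());
        int h = key % n;
        for (int i = 0; i < n; i++) {
            int pos = (h + i) % n;
            if (slot[pos] == kEmpty)
                return -1;                  // an empty slot ends the search
            if (slot[pos] == key)
                return pos;
        }
        return -1;
    }

private:
    std::vector<int> slot;
};
```

With 7 slots, the keys 20, 27, and 13 all hash to address 6, so they land in slots 6, 0, and 1: colliding keys pile up in a run of adjacent slots, which is exactly the clustering that linear probing is prone to.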
Major drawback: there is a tendency toward clustering.

QUADRATIC PROBING - If there is a collision at the hash address h, quadratic probing goes to locations h+1, h+4, h+9, ..., that is, to the locations $(h + i^2) \bmod \mathrm{HASHSIZE}$ for $i = 1, 2, \ldots$. Note: if HASHSIZE is prime, the total number of distinct positions that will be probed is exactly (HASHSIZE+1)/2. Once that many probes have been made without success, the table is treated as having overflowed.

KEY-DEPENDENT INCREMENTS - Let the increment depend on the quotient of the same division that calculates the remainder, or let the increment be some other function of the key, such as truncating the key to a single character and using its code as the increment (for example, increment = *key). The increment stays constant throughout a search, so if HASHSIZE is prime (and the increment is not a multiple of HASHSIZE), the probes step through all entries of the hash table, and overflow does not occur until the array is completely full.

RANDOM PROBING - Use a pseudorandom number generator to obtain the probe sequence. The generator must always produce the same sequence when it starts with the same seed, so the seed can be specified as some function of the key. This method avoids clustering but is slower than the others.

With any of the above methods, deletions are difficult: emptying a slot can break the probe sequence of keys that were placed beyond it, so a later search for one of those keys would stop too early.

COLLISION RESOLUTION BY CHAINING with separate lists

[Figure: an array of buckets 0 to n-1, each pointing to a linked list of the entries that hash to that bucket.]

ADVANTAGES OF LINKED STORAGE:
1. Collision resolution becomes easy.
2. Overflow occurs only if the system is out of memory.
3. Deletion is straightforward.
4. Space is saved when the records are large or the table is not nearly full.

DISADVANTAGE OF LINKED STORAGE:
1. Extra space is needed for the links. If the records are small, this extra space could be substantial.

Function Objects

A function object is an object of a class that behaves like a function. These objects can be created, stored, and destroyed like any other object, and they can have associated data members and operations.

template <typename T>
class functionObject
{
public:
    returnType operator() (arguments) const
    {
        // use arguments to create a return value
        ...
        return returnValue;
    }
    ...
};

Example: the function object type greaterThan

template <typename T>
class greaterThan
{
public:
    bool operator() (const T& x, const T& y) const
    {
        // use the arguments to create the return value
        return x > y;
    }
};

The expression greaterThan<T> defines a type whose objects act like a function that compares two values of type T.

greaterThan<int> f;        // object f of type greaterThan<int>
int a, b;

cin >> a >> b;
if (f(a, b))               // evaluates f.operator()(a, b)
    cout << a << " > " << b << endl;
else
    cout << a << " <= " << b << endl;

Program 12-1

// File: prg12_1.cpp
// the program demonstrates the use of function object types.
// it declares the function object types greaterThan and lessThan,
// whose objects evaluate the operators > and < respectively.
// a modified version of the insertion sort takes a second template
// argument that corresponds to a function object type. the function
// object is used to order elements. in this way, the function can
// sort a vector in either ascending or descending order. the program
// declares a vector and calls insertionSort() to order the values
// both ways.
// in each case, writeVector() outputs the sorted values

#include <iostream>
#include <vector>

#include "d_util.h"     // for writeVector()

using namespace std;

// objects of type greaterThan<T> evaluate x > y
template<typename T>
class greaterThan
{
public:
    bool operator() (const T& x, const T& y) const
    {
        return x > y;
    }
};

// objects of type lessThan<T> evaluate x < y
template<typename T>
class lessThan
{
public:
    bool operator() (const T& x, const T& y) const
    {
        return x < y;
    }
};

// use the insertion sort to order v using function object comp
template <typename T, typename Compare>
void insertionSort(vector<T>& v, Compare comp);

int main()
{
    int arr[] = {2, 1, 7, 8, 12, 15, 3, 5};
    int arrSize = sizeof(arr)/sizeof(int);
    vector<int> v(arr, arr+arrSize);

    // put the vector in ascending order
    insertionSort(v, lessThan<int>());
    // output it
    writeVector(v);
    cout << endl;

    // put the vector in descending order
    insertionSort(v, greaterThan<int>());
    writeVector(v);     // output it
    cout << endl;

    return 0;
}

template <typename T, typename Compare>
void insertionSort(vector<T>& v, Compare comp)
{
    int i, j, n = v.size();
    T temp;

    // place v[i] into the sublist v[0] ... v[i-1], 1 <= i <= n-1,
    // so it is in the correct position
    for (i = 1; i < n; i++)
    {
        // index j scans down list from v[i] looking for correct position
        // to locate target.
        // assigns it to v[j]
        j = i;
        temp = v[i];
        // locate insertion point by scanning downward as long
        // as comp(temp, v[j-1]) is true and we have not encountered
        // the beginning of the list
        while (j > 0 && comp(temp, v[j-1]))
        {
            // shift elements up list to make room for insertion
            v[j] = v[j-1];
            j--;
        }
        // the location is found; insert temp
        v[j] = temp;
    }
}

/*
Run:

1 2 3 5 7 8 12 15
15 12 8 7 5 3 2 1
*/

Hash Function Objects

#ifndef HASH_FUNCTIONS
#define HASH_FUNCTIONS

#include <string>
#include <cmath>

using namespace std;

// identity hash: the integer itself
class hFintID
{
public:
    unsigned int operator()(int item) const
    {
        return (unsigned)item;
    }
};

class hFint
{
public:
    unsigned int operator()(int item) const
    {
        unsigned int value = (unsigned int)item;

        value *= value;         // square the value
        value /= 256;           // discard the low order 8 bits
        return value % 65536;   // return result in range 0 to 65535
    }
};

class hFreal
{
public:
    unsigned int operator()(double item) const
    {
        int exp;
        double mant;
        unsigned int hashval;

        if (item == 0)
            hashval = 0;
        else
        {
            // frexp() splits item into mant * 2^exp with 0.5 <= |mant| < 1,
            // so 2*|mant| - 1 lies in [0, 1); scale it to the unsigned range
            mant = frexp(item, &exp);
            hashval = (unsigned int)((2 * fabs(mant) - 1) * (unsigned int)~0);
        }
        return hashval;
    }
};

class hFstring
{
public:
    unsigned int operator()(const string& item) const
    {
        unsigned int prime = 2049982463;
        // n is unsigned so that n*8 + item[i] wraps around on overflow;
        // with a signed int, overflow on long strings is undefined behavior
        unsigned int n = 0;
        string::size_type i;

        for (i = 0; i < item.length(); i++)
            n = n*8 + item[i];

        return n % prime;
    }
};

#endif   // HASH_FUNCTIONS

#ifndef HASH_CLASS
#define HASH_CLASS

#include <iostream>
#include <vector>
#include <list>
#include <utility>

#include "d_except.h"

using namespace std;

template <typename T, typename HashFunc>
class hash
{
public:

#include "d_hiter.h"    // hash table iterator nested classes

    hash(int nbuckets, const HashFunc& hfunc = HashFunc());
        // constructor specifying the number of buckets in
        // the hash table and the hash function

    hash(T *first, T *last, int nbuckets, const HashFunc& hfunc = HashFunc());
        // constructor with arguments including a pointer range
        // [first, last) of values to insert, the number of
        // buckets in the hash table, and the hash function

    bool empty() const;
        // is the hash table empty?
    int size() const;
        // return number of elements in the hash table

    iterator find(const T& item);
    const_iterator find(const T& item) const;
        // return an iterator pointing at item if it is in the
        // table; otherwise, return end()

    pair<iterator,bool> insert(const T& item);
        // if item is not in the table, insert it and
        // return a pair whose iterator component points
        // at item and whose bool component is true. if item
        // is in the table, return a pair whose iterator
        // component points at the existing item and whose
        // bool component is false
        // Postcondition: the table size increases by 1 if item
        // is not in the table

    int erase(const T& item);
        // if item is in the table, erase it and return 1;
        // otherwise, return 0
        // Postcondition: the table size decreases by 1 if
        // item is in the table

    void erase(iterator pos);
        // erase the item pointed to by pos.
        // Precondition: the table is not empty and pos points
        // to an item in the table. if the table is empty, the
        // function throws the underflowError exception. if the
        // iterator is invalid, the function throws the
        // referenceError exception.
        // Postcondition: the table size decreases by 1

    void erase(iterator first, iterator last);
        // erase all items in the range [first, last).
        // Precondition: the table is not empty. if the table
        // is empty, the function throws the underflowError
        // exception.
        // Postcondition: the size of the table decreases by
        // the number of elements in the range [first, last)

    iterator begin();
        // return an iterator positioned at the start of the
        // hash table
    const_iterator begin() const;
        // constant version
    iterator end();
        // return an iterator positioned past the last
        // element of the hash table
    const_iterator end() const;
        // constant version

private:
    int numBuckets;             // number of buckets in the table
    vector<list<T> > bucket;    // the hash table is a vector of lists
    HashFunc hf;                // hash function
    int hashtableSize;          // number of elements in the hash table
};

// constructor. create an empty hash table
template <typename T, typename HashFunc>
hash<T, HashFunc>::hash(int nbuckets, const HashFunc& hfunc):
    numBuckets(nbuckets), bucket(nbuckets), hf(hfunc),
    hashtableSize(0)
{}

// constructor.
// initialize table from pointer range [first, last)
template <typename T, typename HashFunc>
hash<T, HashFunc>::hash(T *first, T *last, int nbuckets, const HashFunc& hfunc):
    numBuckets(nbuckets), bucket(nbuckets), hf(hfunc),
    hashtableSize(0)
{
    T *p = first;

    while (p != last)
    {
        insert(*p);
        p++;
    }
}

template <typename T, typename HashFunc>
bool hash<T, HashFunc>::empty() const
{
    return hashtableSize == 0;
}

template <typename T, typename HashFunc>
int hash<T, HashFunc>::size() const
{
    return hashtableSize;
}

template <typename T, typename HashFunc>
typename hash<T, HashFunc>::iterator hash<T, HashFunc>::find(const T& item)
{
    // hashIndex is the bucket number (index of the linked list)
    int hashIndex = int(hf(item) % numBuckets);
    // use alias for bucket[hashIndex] to avoid indexing
    list<T>& myBucket = bucket[hashIndex];
    // use to traverse the list bucket[hashIndex]
    typename list<T>::iterator bucketIter;   // returned if we find item

    // traverse list and look for a match with item
    bucketIter = myBucket.begin();
    while (bucketIter != myBucket.end())
    {
        // if locate item, return an iterator positioned in
        // bucket hashIndex at location bucketIter
        if (*bucketIter == item)
            return iterator(this, hashIndex, bucketIter);

        bucketIter++;
    }

    // return iterator positioned at the end of the hash table
    return end();
}

template <typename T, typename HashFunc>
typename hash<T, HashFunc>::const_iterator hash<T, HashFunc>::find(const T& item) const
{
    // hashIndex is the bucket number (index of the linked list)
    int hashIndex = int(hf(item) % numBuckets);
    // use alias for bucket[hashIndex] to avoid indexing
    const list<T>& myBucket = bucket[hashIndex];
    // use to traverse the list bucket[hashIndex]
    typename list<T>::const_iterator bucketIter;   // returned if we find item

    // traverse list and look for a match with item
    bucketIter = myBucket.begin();
    while (bucketIter != myBucket.end())
    {
        // if locate item, return an iterator positioned in
        // bucket hashIndex at location bucketIter
        if (*bucketIter == item)
            return
                const_iterator(this, hashIndex, bucketIter);

        bucketIter++;
    }

    // return iterator positioned at the end of the hash table
    return end();
}

template <typename T, typename HashFunc>
pair<typename hash<T, HashFunc>::iterator,bool>
hash<T, HashFunc>::insert(const T& item)
{
    // hashIndex is the bucket number
    int hashIndex = int(hf(item) % numBuckets);
    // for convenience, make myBucket an alias for bucket[hashIndex]
    list<T>& myBucket = bucket[hashIndex];
    // use iterator to traverse the list myBucket
    typename list<T>::iterator bucketIter;
    // specifies whether or not we do an insert
    bool success;

    // traverse list until we arrive at the end of
    // the bucket or find a match with item
    bucketIter = myBucket.begin();
    while (bucketIter != myBucket.end())
        if (*bucketIter == item)
            break;
        else
            bucketIter++;

    if (bucketIter == myBucket.end())
    {
        // at the end of the list, so item is not
        // in the hash table. call list class insert()
        // and assign its return value to bucketIter
        bucketIter = myBucket.insert(bucketIter, item);
        success = true;
        // increment the hash table size
        hashtableSize++;
    }
    else
        // item is in the hash table. duplicates not allowed.
        // no insertion
        success = false;

    // return a pair with iterator pointing at the new or
    // pre-existing item and success reflecting whether
    // an insert took place
    return pair<iterator,bool>
        (iterator(this, hashIndex, bucketIter), success);
}

template <typename T, typename HashFunc>
void hash<T, HashFunc>::erase(iterator pos)
{
    if (hashtableSize == 0)
        throw underflowError("hash erase(pos): hash table empty");

    if (pos.currentBucket == -1)
        throw referenceError("hash erase(pos): invalid iterator");

    // go to the bucket (list object) and erase the list item
    // at pos.currentLoc
    bucket[pos.currentBucket].erase(pos.currentLoc);
    // one fewer element, as the postcondition requires
    hashtableSize--;
}

template <typename T, typename HashFunc>
void hash<T, HashFunc>::erase(hash<T, HashFunc>::iterator first,
                              hash<T, HashFunc>::iterator last)
{
    if (hashtableSize == 0)
        throw underflowError("hash erase(first,last): hash table empty");

    // call erase(pos) for each item in the range
    while (first != last)
        erase(first++);
}

template <typename T, typename HashFunc>
int hash<T, HashFunc>::erase(const T& item)
{
    iterator iter;
    int numberErased = 1;

    iter = find(item);
    if (iter != end())
        erase(iter);
    else
        numberErased = 0;

    return numberErased;
}

template <typename T, typename HashFunc>
typename hash<T, HashFunc>::iterator hash<T, HashFunc>::begin()
{
    hash<T, HashFunc>::iterator tmp;

    tmp.hashTable = this;
    tmp.currentBucket = -1;
    // start at index -1 + 1 = 0 and search for a non-empty
    // list
    tmp.findNext();

    return tmp;
}

template <typename T, typename HashFunc>
typename hash<T, HashFunc>::const_iterator hash<T, HashFunc>::begin() const
{
    hash<T, HashFunc>::const_iterator tmp;

    tmp.hashTable = this;
    tmp.currentBucket = -1;
    // start at index -1 + 1 = 0 and search for a non-empty
    // list
    tmp.findNext();

    return tmp;
}

template <typename T, typename HashFunc>
typename hash<T, HashFunc>::iterator hash<T, HashFunc>::end()
{
    hash<T, HashFunc>::iterator tmp;

    tmp.hashTable = this;
    // currentBucket of -1 means we are at end of the table
    tmp.currentBucket = -1;

    return tmp;
}

template
<typename T, typename HashFunc>
typename hash<T, HashFunc>::const_iterator hash<T, HashFunc>::end() const
{
    hash<T, HashFunc>::const_iterator tmp;

    tmp.hashTable = this;
    // currentBucket of -1 means we are at end of the table
    tmp.currentBucket = -1;

    return tmp;
}

#endif   // HASH_CLASS

// File: prg12_2.cpp
// the program declares a hash table with integer data and the
// identity hash function object. it inserts the elements from
// the array intArr into the hash table, noting which values are
// duplicates that do not go into the table. after displaying the
// size of the hash table, a loop prompts the user for 2 values.
// if a value is in the table, the erase() operation deletes it.
// the program terminates by using an iterator to traverse and
// output the elements of the hash table

#include <iostream>

#include "d_hash.h"
#include "d_hashf.h"

using namespace std;

int main()
{
    // array that holds 10 integers with some duplicates
    int intArr[] = {20, 16, 9, 14, 8, 17, 3, 9, 16, 12};
    int arrSize = sizeof(intArr)/sizeof(int);

    // alias describing integer hash table using identity function
    // object
    typedef hash<int, hFintID> hashTable;

    // hash table with 7 buckets and hash iterator
    hashTable ht(7);
    hashTable::iterator hIter;

    // <iterator,bool> pair for the insert operation
    pair<hashTable::iterator, bool> p;

    int item, i;

    // insert elements from intArr, noting duplicates
    for (i = 0; i < arrSize; i++)
    {
        p = ht.insert(intArr[i]);
        if (p.second == false)
            cout << "Duplicate value " << intArr[i] << endl;
    }

    // output the hash size which reflects duplicates
    cout << "Hash table size " << ht.size() << endl;

    // prompt for item to erase and indicate if not found
    for (i = 1; i <= 2; i++)
    {
        cout << "Enter a number to delete: ";
        cin >> item;
        if ((hIter = ht.find(item)) == ht.end())
            cout << "Item not found" << endl;
        else
            ht.erase(hIter);
    }

    // output the elements using an iterator to scan the table
    for (hIter = ht.begin(); hIter != ht.end(); hIter++)
        cout << *hIter << " ";
    cout << endl;

    return 0;
}

/*
Run:

Duplicate value 9
Duplicate value 16
Hash table size 8
Enter a number to delete: 10
Item not found
Enter a number to delete: 17
14 8 16 9 3 12 20
*/
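The output order in this run follows from the bucket layout. With 7 buckets and the identity hash, each value goes to bucket value % 7: 14 to bucket 0, 8 to bucket 1, 16 and 9 to bucket 2, 17 and 3 to bucket 3, 12 to bucket 5, and 20 to bucket 6 (the second 9 and 16 are rejected as duplicates, leaving 8 elements). Erasing 17 leaves 14 8 16 9 3 12 20, which is the order printed if the iterator visits buckets 0 to 6 and each list front to back. The sketch below reproduces that layout. It assumes this visiting order (d_hiter.h is not shown here), and bucketOrder is a hypothetical helper, not part of the library.

```cpp
#include <list>
#include <vector>

// bucketOrder is a hypothetical helper, not part of d_hash.h. It places each
// value in bucket (unsigned)value % numBuckets, as hash<int, hFintID> does,
// skips duplicates, appends new values at the end of their bucket's list, and
// returns the values bucket by bucket (the assumed iterator order).
std::vector<int> bucketOrder(const std::vector<int>& values, int numBuckets) {
    std::vector<std::list<int>> bucket(numBuckets);

    for (int v : values) {
        std::list<int>& b = bucket[static_cast<unsigned>(v) % numBuckets];
        bool present = false;
        for (int x : b)
            if (x == v)
                present = true;
        if (!present)
            b.push_back(v);        // same position as list insert at end()
    }

    // visit buckets 0 .. numBuckets-1, each list from front to back
    std::vector<int> order;
    for (const std::list<int>& b : bucket)
        for (int x : b)
            order.push_back(x);
    return order;
}
```

Calling bucketOrder on the contents of intArr with 7 buckets gives 14 8 16 9 17 3 12 20; leaving 17 out gives the final line of the run.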