hashing

Chap12
HASHING
The associative containers set, multiset, map, and multimap implement their
operations using a binary search tree.
They access the data in sorted order, so these structures are ordered associative containers.
Another type of associative container is the hash table.
A hash table distributes the elements into clusters determined by their value.
An associated function, called a hash function, maps a data item to a cluster, and the hash
table inserts, updates, or erases the value in that cluster.
A hash table provides an implementation of sets and maps.
However, it is an unordered associative container.
Running-time analysis of the hash table search algorithm called chaining with
separate lists shows an average running time of O(1), provided the hash function
spreads the items evenly over the clusters.
The efficiency of accessing data in a binary search tree depends on the shape of the tree.
The worst case is a degenerate tree, for which a search takes O(n).
However, there are a number of tree-balancing algorithms that keep a binary search
tree balanced (for example, AVL trees).
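The contrast between ordered and unordered containers can be seen with the standard library. The following is a minimal sketch, assuming a C++11 compiler; std::set stands in for the tree-based containers and std::unordered_set for a hash table, and the helper names sortedContents and containsHashed are hypothetical.

```cpp
#include <set>
#include <unordered_set>
#include <vector>

// Copy the values into an ordered container (std::set, normally a balanced
// search tree) and read them back: iteration visits them in sorted order.
std::vector<int> sortedContents(const std::vector<int>& values)
{
    std::set<int> ordered(values.begin(), values.end());
    return std::vector<int>(ordered.begin(), ordered.end());
}

// Look a value up in an unordered container (std::unordered_set, a hash
// table). The lookup takes O(1) time on average, but iterating over the
// container would give the values in no particular order.
bool containsHashed(const std::vector<int>& values, int target)
{
    std::unordered_set<int> hashed(values.begin(), values.end());
    return hashed.count(target) > 0;
}
```

For the values 20, 16, 9, 14, 8, sortedContents returns 8, 9, 14, 16, 20, while containsHashed only answers membership questions.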
HASHING FUNCTIONS:
TABLE LOOKUP is a method to accomplish information retrieval. We can
use INDEX FUNCTIONS that provide a one-to-one correspondence from a
set of unique keys to locations in a list or array.
The time required to retrieve an item from a list is therefore not dependent
on the number of items in the list, but is bounded by a constant - using big O
notation, the time required is O(1). Table lookup is therefore more efficient
than any searching method.
Example - Indexing rectangular arrays:
To retrieve items stored in rectangular arrays implemented in contiguous
storage - row major ordering, and assuming each item has only one key, and
there is only one item with a given key, we can access each item in a
rectangular array via the following formula or index function:
Entry (i,j) goes to position ni + j, where n is the number of columns and the
indices i and j start at 0.
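The index function can be written directly as code. This is a small sketch, assuming zero-based indices and an array stored in a std::vector; the names rowMajorIndex and entryAt are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Index function for an array with `cols` columns stored in row-major
// order: entry (i, j) is stored at position cols * i + j.
std::size_t rowMajorIndex(std::size_t i, std::size_t j, std::size_t cols)
{
    return cols * i + j;
}

// Retrieve entry (i, j) from contiguous storage. The cost is one
// multiplication, one addition, and one array access, whatever the size.
int entryAt(const std::vector<int>& storage, std::size_t cols,
            std::size_t i, std::size_t j)
{
    return storage[rowMajorIndex(i, j, cols)];
}
```

For an array with 2 rows and 3 columns, entry (1,2) is at position 3*1 + 2 = 5, the last position of the storage.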
RELATIONSHIP BETWEEN TABLES AND FUNCTIONS
With a table - we start with an index and calculate the corresponding value.
With a function, we start with an argument and calculate the corresponding
value.
Table access begins with an index ( index set - domain ) and we use a table
to look up a corresponding value ( codomain - base type or value type ).
ADT TABLE
Definition and operations that can be performed on tables:
A TABLE with domain ( index set I ) and codomain ( base type or value
type T ) is a function from I into T together with the following operations:
1. Table access: Evaluate the function at any index in I.
2. Table assignment: Modify the function by changing its value at a
specified index in I to the new value specified in the assignment
(changing the value at a location in the table).
3. Insertion: Adjoin a new element x to the index set I and define the
corresponding value of the function at x.
4. Deletion: Delete an element x from the index set I and restrict the
function to the resulting smaller domain.
HASHING
When the key cannot be used directly as an index ( as in an array ) we can
still come up with an index function ( hash function ) that will produce an
index into an array to locate entries in a table ( hash table ).
We can use the following steps:
1. Start with an ARRAY that holds the hash table.
2. Initialize all the locations in the array to show that they are empty.
3. To insert a record into the hash table, use a HASH FUNCTION to take a
key and map it to some index in the array. This function will generally map
several different keys to the same index and cause a collision which must be
resolved.
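[Figure: the hash function ("Locater") maps a key to an index i in an array
with positions 0, 1, 2, ..., n-1; position i holds the key-value record.]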
4. If the corresponding location is empty, then the record can be inserted,
else if the keys are equal, we cannot add the key, and if the keys are different
we have to resolve the collision.
5. Retrieving the record is similar. The hash function is computed and if the
record is in the corresponding location it can be retrieved. If not, and the
location is nonempty, follow the same steps as for collision resolution. If the
key is still not found, then the search is unsuccessful.
THUS, our goal is:
(a) find good hash functions and
(b) determine how to resolve collisions.
METHODS FOR BUILDING HASH FUNCTIONS
TRUNCATION - Ignore part of the key and use the remaining part directly
as an index.
Example:
If a key has 8 digits and the hash table has 1000 locations, then a hash function
could extract the first, second, and fifth digits from the right to produce an
index into the hash table.
62538194 ---> index 394
This method is fast, but it often fails to distribute the keys evenly through the table.
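The truncation example can be sketched as follows. The function name truncationHash is hypothetical, and the digit positions are the ones used in the example above.

```cpp
// Truncation: build a 3-digit index from the fifth, second, and first
// digits of the key, counted from the right.
unsigned truncationHash(unsigned long key)
{
    unsigned d1 = key % 10;            // first digit from the right
    unsigned d2 = (key / 10) % 10;     // second digit from the right
    unsigned d5 = (key / 10000) % 10;  // fifth digit from the right
    return d5 * 100 + d2 * 10 + d1;    // index in the range 0 to 999
}
```

For the key 62538194 the three digits are 3, 9, and 4, so the index is 394. Any two keys that agree in those three digits collide, whatever their other digits are, which is why the spread can be poor.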
FOLDING - Partition the key into several parts and combine the parts.
Example:
key 8 digits ---> 62538194 ---> 625+381+94 = 1100
This method achieves better spread than truncation alone.
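A sketch of the folding example follows. The split into parts of 3, 3, and 2 digits is taken from the example; the final reduction modulo the table size is an assumption added here, because the sum (1100) can exceed a table of 1000 locations. The names foldSum and foldingHash are hypothetical.

```cpp
// Folding: split an 8-digit key into parts of 3, 3, and 2 digits and
// add the parts together.
unsigned foldSum(unsigned long key)
{
    unsigned long high = key / 100000;        // leading 3 digits
    unsigned long mid  = (key / 100) % 1000;  // middle 3 digits
    unsigned long low  = key % 100;           // trailing 2 digits
    return static_cast<unsigned>(high + mid + low);
}

// Assumption: reduce the sum modulo the table size so that the result
// is always a valid index.
unsigned foldingHash(unsigned long key, unsigned tableSize)
{
    return foldSum(key) % tableSize;
}
```

For the key 62538194, foldSum gives 625 + 381 + 94 = 1100, and with 1000 locations the index becomes 100. Every digit of the key contributes to the result, which is what improves the spread.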
MODULAR ARITHMETIC - Convert the key to an integer, divide by the
size of the index range, and take the remainder as the result.
The spread depends on the modulus; a prime number is usually the best choice.
Example: Key consists of alphanumeric characters which are mapped by the
hash function into a range of integers from 0 to HASHSIZE-1.
String HASH FUNCTION using modulus arithmetic
/* Declarations for a chained hash table */
#include <stdlib.h>   /* abs() */

#define HASHSIZE 997

typedef char *Key_type;

typedef struct item_tag {
    Key_type key;
} Item_type;

typedef struct node_tag {
    Item_type info;            /* information to be stored in the table */
    struct node_tag *next;     /* next item in the linked list */
} Node_type;

typedef Node_type *List_type;
typedef List_type Hashtable_type[HASHSIZE];

int Hash(Key_type s)
{
    int h = 0;
    while (*s)           /* loop through all the characters */
        h += *s++;       /* add the value of each to h */
    return abs(h % HASHSIZE);   /* return index into hash table */
}
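The same character-sum hash can be sketched in C++ to see how it behaves. The function name sumHash is hypothetical, and the constant HASHSIZE is redefined here only so the fragment stands alone.

```cpp
#include <string>

const unsigned HASHSIZE = 997;   // prime table size, as in the declarations above

// Add the character codes of the key and reduce the sum modulo HASHSIZE.
unsigned sumHash(const std::string& key)
{
    unsigned h = 0;
    for (unsigned char c : key)   // unsigned char keeps every term non-negative
        h += c;
    return h % HASHSIZE;
}
```

With ASCII codes, "abc" hashes to (97 + 98 + 99) % 997 = 294. Because addition ignores the order of the characters, "cab" hashes to the same index, so keys that are rearrangements of each other always collide under this function.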
COLLISION RESOLUTION WITH OPEN ADDRESSING
LINEAR PROBING - starts with the hash address and searches a circular
array sequentially for the target key or an empty position. Major drawback:
There is a tendency toward clustering.
[Figure: linear probing example with the keys 77, 89, 14, and 94]
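Linear probing can be sketched as follows. This is a minimal illustration, assuming non-negative integer keys, the hash function key % size, and the value -1 as the marker for an empty position; the names EMPTY, linearInsert, and linearFind are hypothetical.

```cpp
#include <vector>

const int EMPTY = -1;   // marker for an unused position (keys are assumed non-negative)

// Insert key using linear probing. Returns the position used, or -1 if
// the key is already present or the table is full.
int linearInsert(std::vector<int>& table, int key)
{
    int size = static_cast<int>(table.size());
    int h = key % size;                  // hash address
    for (int i = 0; i < size; i++)
    {
        int pos = (h + i) % size;        // wrap around: the array is circular
        if (table[pos] == EMPTY)
        {
            table[pos] = key;
            return pos;
        }
        if (table[pos] == key)
            return -1;                   // duplicate key
    }
    return -1;                           // overflow: every position is occupied
}

// Search follows the same probe sequence until it meets the key or an
// empty position.
int linearFind(const std::vector<int>& table, int key)
{
    int size = static_cast<int>(table.size());
    int h = key % size;
    for (int i = 0; i < size; i++)
    {
        int pos = (h + i) % size;
        if (table[pos] == key)
            return pos;
        if (table[pos] == EMPTY)
            return -1;                   // an empty position ends the search
    }
    return -1;
}
```

With an assumed table of 7 positions, the keys 77, 89, 14, and 94 go to positions 0, 5, 1, and 3. The key 14 hashes to 0, which 77 already occupies, so it moves to the next position; keys that later hash to 0 or 1 must now step over both, which is how clusters grow.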
QUADRATIC PROBING - If there is a collision at the hash address h,
quadratic probing goes to locations h+1, h+4, h+9, ...
that is, to locations (h + i^2) % HASHSIZE for i = 1, 2, ...
Note: If HASHSIZE is prime, then the total number of distinct positions
that will be probed is exactly (HASHSIZE + 1)/2.
If that many probes have been made without finding an empty position, we have overflow.
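The count of distinct positions can be checked by brute force. The sketch below simply enumerates the probe sequence and counts the positions it reaches; the function name distinctQuadraticProbes is hypothetical, and the hash address h itself (i = 0) is counted as the first position.

```cpp
#include <set>

// Count the distinct positions visited by the probe sequence
// (h + i*i) % hashSize for i = 0, 1, ..., hashSize - 1.
int distinctQuadraticProbes(int h, int hashSize)
{
    std::set<int> visited;
    for (long long i = 0; i < hashSize; i++)
        visited.insert(static_cast<int>((h + i * i) % hashSize));
    return static_cast<int>(visited.size());
}
```

For HASHSIZE = 11 the squares modulo 11 are 0, 1, 4, 9, 5, and 3, so 6 = (11 + 1)/2 positions are reached from any starting address; for HASHSIZE = 997 the count is 499.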
KEY-DEPENDENT INCREMENTS - Let the increment depend on the
quotient of the same division that calculates the remainder, or let the
increment be a function of the key, such as truncating the key to a single
character and using its code as the increment (ex: increment = *key). The
increment remains constant for a given key, and if HASHSIZE is a prime and
the increment is not a multiple of HASHSIZE, the probes will step through
all entries in the hash table and overflow will not occur until the array
is completely full.
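A sketch of the quotient-based variant follows, assuming non-negative integer keys. The function name probeSequence is hypothetical, and replacing a zero increment by 1 is an assumption made here so that the probe always moves.

```cpp
#include <vector>

// List the positions probed for a key when the increment is taken from
// the quotient of the division that produces the hash address.
std::vector<int> probeSequence(int key, int hashSize)
{
    int pos = key % hashSize;                     // remainder: the hash address
    int increment = (key / hashSize) % hashSize;  // from the quotient of the same division
    if (increment == 0)
        increment = 1;   // assumption: fall back to a step of 1 so the probe moves
    std::vector<int> positions;
    for (int i = 0; i < hashSize; i++)
    {
        positions.push_back(pos);
        pos = (pos + increment) % hashSize;
    }
    return positions;
}
```

For key 45 and HASHSIZE 7 the hash address is 3 and the increment is 6, giving the sequence 3, 2, 1, 0, 6, 5, 4: every position is visited exactly once before the sequence repeats.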
RANDOM PROBING - Use a pseudorandom number generator to obtain the
increments. The generator should always generate the same sequence provided
it starts with the same seed. The seed can be specified as some function of
the key. This method avoids clustering, but is slower than the others.
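One way to sketch random probing is with a standard library generator seeded from the key. This is only an illustration: the choice of std::minstd_rand, the seed key + 1, and the function name randomProbes are all assumptions, not part of the method as stated above.

```cpp
#include <random>
#include <vector>

// Return the first `count` probe positions for a key. The generator is
// seeded from the key, so the same key always yields the same sequence.
std::vector<int> randomProbes(unsigned key, int hashSize, int count)
{
    std::minstd_rand generator(key + 1);          // seed is a function of the key
    std::vector<int> positions;
    int pos = static_cast<int>(key % hashSize);   // start at the hash address
    for (int i = 0; i < count; i++)
    {
        positions.push_back(pos);
        // step by a pseudorandom increment, wrapping around the table
        pos = static_cast<int>((pos + generator() % hashSize) % hashSize);
    }
    return positions;
}
```

Insertion and retrieval both call the function with the same key and therefore follow the same path through the table, which is what makes the scheme usable despite the "random" steps.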
With any of the above methods, deletions are difficult.
COLLISION RESOLUTION BY CHAINING with separate lists
[Figure: an array of buckets 0, 1, 2, 3, ..., n-1, each the head of a separate linked list]
ADVANTAGES OF LINKED STORAGE:
1. Collision resolution becomes easy.
2. Overflow only occurs if system is out of memory.
3. Deletion is straightforward.
4. Space saving when the records are large or the table is not
nearly full.
DISADVANTAGE OF LINKED STORAGE:
1. Extra space for linking records is needed. If records are small the
extra space could be substantial.
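Chaining can be sketched in a few lines with standard containers. This is a minimal illustration, assuming non-negative integer items and the hash function item % number of buckets; the class name ChainedTable is hypothetical.

```cpp
#include <list>
#include <vector>

// Minimal chained hash table of non-negative ints: one list per bucket.
class ChainedTable
{
public:
    explicit ChainedTable(int nbuckets) : bucket(nbuckets) {}

    // Insert item unless it is already present; report whether it was added.
    bool insert(int item)
    {
        std::list<int>& chain = bucket[item % bucket.size()];
        for (int x : chain)
            if (x == item)
                return false;
        chain.push_back(item);     // a collision just lengthens the list
        return true;
    }

    bool contains(int item) const
    {
        const std::list<int>& chain = bucket[item % bucket.size()];
        for (int x : chain)
            if (x == item)
                return true;
        return false;
    }

    // Deletion unlinks one node; other items in the bucket are unaffected.
    bool erase(int item)
    {
        std::list<int>& chain = bucket[item % bucket.size()];
        for (auto it = chain.begin(); it != chain.end(); ++it)
            if (*it == item)
            {
                chain.erase(it);
                return true;
            }
        return false;
    }

private:
    std::vector<std::list<int>> bucket;
};
```

With 7 buckets, the items 9 and 16 both land in bucket 2 and simply share its list; erasing 9 leaves 16 in place, with none of the difficulties that deletion causes under open addressing.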
Function Objects
A function object is an object of a class that behaves like a function.
These objects can be created, stored and destroyed like any other object and can have
associated data members and operations.
template <typename T>
class functionObject
{
public:
    returnType operator() (arguments) const
    {
        // use arguments to create a return value
        ...
        return returnValue;
    }
    ...
};
Example functionObject greaterThan
template <typename T>
class greaterThan
{
public:
    bool operator() (const T& x, const T& y) const
    {
        // compare the arguments to create the return value
        return x > y;
    }
};
The expression greaterThan<T> defines a type whose objects act like a
function that compares two values of type T.
greaterThan<int> f;    // object f of type greaterThan<int>
int a, b;
cin >> a >> b;
if (f(a, b))           // evaluates to f.operator()(a, b)
    cout << a << " > " << b << endl;
else
    cout << a << " <= " << b << endl;
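Because a function object is an ordinary object, it can carry data members that an ordinary function could not. The sketch below illustrates this with a hypothetical class exceeds, whose threshold is fixed when the object is created, and a hypothetical helper countAbove that passes the object to the standard algorithm std::count_if.

```cpp
#include <algorithm>
#include <vector>

// Function object with a data member: tests whether a value exceeds a
// threshold that is stored when the object is constructed.
template <typename T>
class exceeds
{
public:
    explicit exceeds(const T& limit) : threshold(limit) {}

    bool operator()(const T& x) const
    {
        return x > threshold;
    }

private:
    T threshold;
};

// Count the elements of v that are greater than limit.
int countAbove(const std::vector<int>& v, int limit)
{
    return static_cast<int>(
        std::count_if(v.begin(), v.end(), exceeds<int>(limit)));
}
```

For the values 2, 1, 7, 8, 12, 15, 3, 5 and a limit of 5, countAbove returns 4. The same class serves any limit without writing a new function for each one.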
Program 12-1
// File: prg12_1.cpp
// the program demonstrates the use of function object types.
// it declares the function object types greaterThan and lessThan,
// whose objects evaluate the operators > and < respectively.
// a modified version of the insertion sort takes a second
// template argument that corresponds to a function object
// type. the function object is used to order elements. in
// this way, the function can sort a vector in either ascending
// or descending order. the program declares a vector and calls
// insertionSort() to order the values both ways. in each case,
// writeVector() outputs the sorted values
#include <iostream>
#include <vector>
#include "d_util.h"
// for writeVector()
using namespace std;
// objects of type greaterThan<T> evaluate x > y
template<typename T>
class greaterThan
{
public:
bool operator() (const T& x, const T& y) const
{
return x > y;
}
};
// objects of type lessThan<T> evaluate x < y
template<typename T>
class lessThan
{
public:
bool operator() (const T& x, const T& y) const
{
return x < y;
}
};
// use the insertion sort to order v using function object comp
template <typename T, typename Compare>
void insertionSort(vector<T>& v, Compare comp);
int main()
{
int arr[] = {2, 1, 7, 8, 12, 15, 3, 5};
int arrSize = sizeof(arr)/sizeof(int);
vector<int> v(arr, arr+arrSize);
// put the vector in ascending order
insertionSort(v, lessThan<int>());
// output it
writeVector(v);
cout << endl;
// put the vector in descending order
insertionSort(v, greaterThan<int>());
writeVector(v); // output it
cout << endl;
return 0;
}
template <typename T, typename Compare>
void insertionSort(vector<T>& v, Compare comp)
{
int i, j, n = v.size();
T temp;
// place v[i] into the sublist v[0] ... v[i-1], 1 <= i <= n-1,
// so it is in the correct position
for (i = 1; i < n; i++)
{
// index j scans down list from v[i] looking for correct position
// to locate target. assigns it to v[j]
j = i;
temp = v[i];
// locate insertion point by scanning downward as long
// as comp(temp, v[j-1]) is true and we have not encountered
// the beginning of the list
while (j > 0 && comp(temp, v[j-1]))
{
// shift elements up list to make room for insertion
v[j] = v[j-1];
j--;
}
// the location is found; insert temp
v[j] = temp;
}
}
/*
Run:
1 2 3 5 7 8 12 15
15 12 8 7 5 3 2 1
*/
Hash Function Objects
#ifndef HASH_FUNCTIONS
#define HASH_FUNCTIONS
#include <string>
#include <cmath>
using namespace std;
class hFintID
{
public:
unsigned int operator()(int item) const
{ return (unsigned)item; }
};
class hFint
{
public:
unsigned int operator()(int item) const
{
unsigned int value = (unsigned int)item;
value *= value;        // square the value
value /= 256;          // discard the low order 8 bits
return value % 65536;  // return result in range 0 to 65535
}
};
class hFreal
{
public:
unsigned int operator()(double item) const {
int exp;
double mant;
unsigned int hashval;
if (item == 0)
hashval = 0;
else {
mant = frexp(item,&exp);
hashval = (unsigned int)((2 * fabs(mant) -1) *
(unsigned int)~0);
}
return hashval;
}
};
class hFstring
{
public:
unsigned int operator()(const string& item) const {
unsigned int prime = 2049982463;
// n is unsigned so that overflow wraps around instead of being undefined
unsigned int n = 0;
string::size_type i;
for (i = 0; i < item.length(); i++)
n = n*8 + item[i];
return n % prime;
}
};
#endif   // HASH_FUNCTIONS
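The arithmetic in hFint can be traced with a small stand-alone sketch. The function midSquare mirrors the body of hFint, and bucketIndex shows the reduction to a bucket number (hash value modulo the number of buckets); both names are hypothetical.

```cpp
// Mid-square hash, mirroring hFint: square the key, drop the low-order
// 8 bits, and keep the next 16 bits.
unsigned int midSquare(int item)
{
    unsigned int value = static_cast<unsigned int>(item);
    value *= value;        // square; unsigned overflow wraps around
    value /= 256;          // discard the low order 8 bits
    return value % 65536;  // result in the range 0 to 65535
}

// Reduce a hash value to a bucket number.
int bucketIndex(int item, int numBuckets)
{
    return static_cast<int>(midSquare(item) % numBuckets);
}
```

For item 1000 the square is 1000000, dividing by 256 gives 3906, and 3906 % 65536 is 3906; with 7 buckets this item goes to bucket 3906 % 7 = 0. For item 300 the hash value is 351 and the bucket is 1.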
#ifndef HASH_CLASS
#define HASH_CLASS
#include <iostream>
#include <vector>
#include <list>
#include <utility>
#include "d_except.h"
using namespace std;
template <typename T, typename HashFunc>
class hash
{
public:
#include "d_hiter.h"
// hash table iterator nested classes
hash(int nbuckets, const HashFunc& hfunc = HashFunc());
// constructor specifying the number of buckets in
// the hash table and the hash function
hash(T *first, T *last, int nbuckets,
const HashFunc& hfunc = HashFunc());
// constructor with arguments including a pointer range
// [first, last) of values to insert, the number of
// buckets in the hash table, and the hash function
bool empty() const;
// is the hash table empty?
int size() const;
// return number of elements in the hash table
iterator find(const T& item);
const_iterator find(const T& item) const;
// return an iterator pointing at item if it is in the
// table; otherwise, return end()
pair<iterator,bool> insert(const T& item);
// if item is not in the table, insert it and
// return a pair whose iterator component points
// at item and whose bool component is true. if item
// is in the table, return a pair whose iterator
// component points at the existing item and whose
// bool component is false
// Postcondition: the table size increases by 1 if item
// is not in the table
int erase(const T& item);
// if item is in the table, erase it and return 1;
// otherwise, return 0
// Postcondition: the table size decreases by 1 if
// item is in the table
void erase(iterator pos);
// erase the item pointed to by pos.
// Precondition: the table is not empty and pos points
// to an item in the table. if the table is empty, the
// function throws the underflowError exception. if the
// iterator is invalid, the function throws the
// referenceError exception.
// Postcondition: the table size decreases by 1
void erase(iterator first, iterator last);
// erase all items in the range [first, last).
// Precondition: the table is not empty. if the table
// is empty, the function throws the underflowError
// exception.
// Postcondition: the size of the table decreases by
// the number of elements in the range [first, last)
iterator begin();
// return an iterator positioned at the start of the
// hash table
const_iterator begin() const;
// constant version
iterator end();
// return an iterator positioned past the last
// element of the hash table
const_iterator end() const;
// constant version
private:
int numBuckets;
// number of buckets in the table
vector<list<T> > bucket;
// the hash table is a vector of lists
HashFunc hf;
// hash function
int hashtableSize;
// number of elements in the hash table
};
// constructor. create an empty hash table
template <typename T, typename HashFunc>
hash<T, HashFunc>::hash(int nbuckets, const HashFunc& hfunc):
numBuckets(nbuckets), bucket(nbuckets), hf(hfunc),
hashtableSize(0) { }
// constructor. initialize table from pointer range [first, last)
template <typename T, typename HashFunc>
hash<T, HashFunc>::hash(T *first, T *last, int nbuckets,
const HashFunc& hfunc): numBuckets(nbuckets), bucket(nbuckets),
hf(hfunc), hashtableSize(0)
{
T *p = first;
while (p != last)
{
insert(*p);
p++;
}
}
template <typename T, typename HashFunc>
bool hash<T, HashFunc>::empty() const
{
return hashtableSize == 0;
}
template <typename T, typename HashFunc>
int hash<T, HashFunc>::size() const
{
return hashtableSize;
}
template <typename T, typename HashFunc>
typename hash<T, HashFunc>::iterator hash<T, HashFunc>::find(const T& item)
{
// hashIndex is the bucket number (index of the linked list)
int hashIndex = int(hf(item) % numBuckets);
// use alias for bucket[hashIndex] to avoid indexing
list<T>& myBucket = bucket[hashIndex];
// use to traverse the list bucket[hashIndex]
typename list<T>::iterator bucketIter;
// returned if we find item
// traverse list and look for a match with item
bucketIter = myBucket.begin();
while(bucketIter != myBucket.end())
{
// if locate item, return an iterator positioned in
// bucket hashIndex at location bucketIter
if (*bucketIter == item)
return iterator(this, hashIndex, bucketIter);
bucketIter++;
}
// return iterator positioned at the end of the hash table
return end();
}
template <typename T, typename HashFunc>
typename hash<T, HashFunc>::const_iterator
hash<T, HashFunc>::find(const T& item) const
{
// hashIndex is the bucket number (index of the linked list)
int hashIndex = int(hf(item) % numBuckets);
// use alias for bucket[hashIndex] to avoid indexing
const list<T>& myBucket = bucket[hashIndex];
// use to traverse the list bucket[hashIndex]
typename list<T>::const_iterator bucketIter;
// returned if we find item
// traverse list and look for a match with item
bucketIter = myBucket.begin();
while(bucketIter != myBucket.end())
{
// if locate item, return an iterator positioned in
// bucket hashIndex at location bucketIter
if (*bucketIter == item)
return const_iterator(this, hashIndex, bucketIter);
bucketIter++;
}
// return iterator positioned at the end of the hash table
return end();
}
template <typename T, typename HashFunc>
pair<typename hash<T, HashFunc>::iterator,bool>
hash<T, HashFunc>::insert(const T& item)
{
// hashIndex is the bucket number
int hashIndex = int(hf(item) % numBuckets);
// for convenience, make myBucket an alias for bucket[hashIndex]
list<T>& myBucket = bucket[hashIndex];
// use iterator to traverse the list myBucket
typename list<T>::iterator bucketIter;
// specifies whether or not we do an insert
bool success;
// traverse list until we arrive at the end of
// the bucket or find a match with item
bucketIter = myBucket.begin();
while (bucketIter != myBucket.end())
if (*bucketIter == item)
break;
else
bucketIter++;
if (bucketIter == myBucket.end())
{
// at the end of the list, so item is not
// in the hash table. call list class insert()
// and assign its return value to bucketIter
bucketIter = myBucket.insert(bucketIter, item);
success = true;
// increment the hash table size
hashtableSize++;
}
else
// item is in the hash table. duplicates not allowed.
// no insertion
success = false;
// return a pair with iterator pointing at the new or
// pre-existing item and success reflecting whether
// an insert took place
return pair<iterator,bool>
(iterator(this, hashIndex, bucketIter),
success);
}
template <typename T, typename HashFunc>
void hash<T, HashFunc>::erase(iterator pos)
{
if (hashtableSize == 0)
throw underflowError("hash erase(pos): hash table empty");
if (pos.currentBucket == -1)
throw referenceError("hash erase(pos): invalid iterator");
// go to the bucket (list object) and erase the list item
// at pos.currentLoc
bucket[pos.currentBucket].erase(pos.currentLoc);
// decrement the hash table size, as the postcondition requires
hashtableSize--;
}
template <typename T, typename HashFunc>
void hash<T, HashFunc>::erase(iterator first, iterator last)
{
if (hashtableSize == 0)
throw underflowError("hash erase(first,last): hash table empty");
// call erase(pos) for each item in the range
while (first != last)
erase(first++);
}
template <typename T, typename HashFunc>
int hash<T, HashFunc>::erase(const T& item)
{
iterator iter;
int numberErased = 1;
iter = find(item);
if (iter != end())
erase(iter);
else
numberErased = 0;
return numberErased;
}
template <typename T, typename HashFunc>
typename hash<T, HashFunc>::iterator hash<T, HashFunc>::begin()
{
hash<T, HashFunc>::iterator tmp;
tmp.hashTable = this;
tmp.currentBucket = -1;
// start at index -1 + 1 = 0 and search for a non-empty
// list
tmp.findNext();
return tmp;
}
template <typename T, typename HashFunc>
typename hash<T, HashFunc>::const_iterator hash<T, HashFunc>::begin() const
{
hash<T, HashFunc>::const_iterator tmp;
tmp.hashTable = this;
tmp.currentBucket = -1;
// start at index -1 + 1 = 0 and search for a non-empty
// list
tmp.findNext();
return tmp;
}
template <typename T, typename HashFunc>
typename hash<T, HashFunc>::iterator hash<T, HashFunc>::end()
{
hash<T, HashFunc>::iterator tmp;
tmp.hashTable = this;
// currentBucket of -1 means we are at end of the table
tmp.currentBucket = -1;
return tmp;
}
template <typename T, typename HashFunc>
typename hash<T, HashFunc>::const_iterator hash<T, HashFunc>::end() const
{
hash<T, HashFunc>::const_iterator tmp;
tmp.hashTable = this;
// currentBucket of -1 means we are at end of the table
tmp.currentBucket = -1;
return tmp;
}
#endif
// HASH_CLASS
// File: prg12_2.cpp
// the program declares a hash table with integer data and
// the identity hash function object. it inserts the elements
// from the array intArr into the hash table, noting which
// values are duplicates that do not go into the table. after
// displaying the size of the hash table, a loop prompts the user
// for 2 values. if a value is in the table, the erase() operation
// deletes it. the program terminates by using an iterator to
// traverse and output the elements of the hash table
#include <iostream>
#include "d_hash.h"
#include "d_hashf.h"
using namespace std;
int main()
{
// array that holds 10 integers with some duplicates
int intArr[] = {20, 16, 9, 14, 8, 17, 3, 9, 16, 12};
int arrSize = sizeof(intArr)/sizeof(int);
// alias describing integer hash table using identity function
// object
typedef hash<int, hFintID> hashTable;
// hash table with 7 buckets and hash iterator
hashTable ht(7);
hashTable::iterator hIter;
// <iterator,bool> pair for the insert operation
pair<hashTable::iterator, bool> p;
int item, i;
// insert elements from intArr, noting duplicates
for (i = 0; i < arrSize; i++)
{
p = ht.insert(intArr[i]);
if (p.second == false)
cout << "Duplicate value " << intArr[i] << endl;
}
// output the hash size which reflects duplicates
cout << "Hash table size " << ht.size() << endl;
// prompt for item to erase and indicate if not found
for (i = 1; i <= 2; i++)
{
cout << "Enter a number to delete: ";
cin >> item;
if ((hIter = ht.find(item)) == ht.end())
cout << "Item not found" << endl;
else
ht.erase(hIter);
}
// output the elements using an iterator to scan the table
for (hIter = ht.begin(); hIter != ht.end(); hIter++)
cout << *hIter << " ";
cout << endl;
return 0;
}
/*
Run:
Duplicate value 9
Duplicate value 16
Hash table size 8
Enter a number to delete: 10
Item not found
Enter a number to delete: 17
14 8 16 9 3 12 20
*/
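The order of the values in the last line of the run follows from the bucket layout. The sketch below reproduces it without the hash class, assuming that the table iterator visits the buckets in index order and each list from front to back; the function name traversalOrder is hypothetical.

```cpp
#include <list>
#include <vector>

// Distribute values into numBuckets lists using the identity hash
// (bucket = value % numBuckets), skipping duplicates, then list the
// contents bucket by bucket.
std::vector<int> traversalOrder(const std::vector<int>& values, int numBuckets)
{
    std::vector<std::list<int>> bucket(numBuckets);
    for (int v : values)
    {
        std::list<int>& chain = bucket[v % numBuckets];
        bool found = false;
        for (int x : chain)
            if (x == v)
                found = true;
        if (!found)
            chain.push_back(v);   // new items go at the end of the list
    }
    std::vector<int> order;
    for (const std::list<int>& chain : bucket)
        for (int x : chain)
            order.push_back(x);
    return order;
}
```

With 7 buckets, the array 20, 16, 9, 14, 8, 17, 3, 9, 16, 12 fills the buckets as 0: 14; 1: 8; 2: 16, 9; 3: 17, 3; 4: empty; 5: 12; 6: 20. Reading them in order gives 14 8 16 9 17 3 12 20, and erasing 17 leaves the sequence shown in the run.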