
UNIT 5 - Data Structures

UNIT V
SEARCHING, SORTING AND HASHING TECHNIQUES 9
Searching – Linear Search – Binary Search. Sorting – Bubble sort – Selection sort – Insertion
sort – Shell sort – Merge Sort. Hashing – Hash Functions – Separate Chaining – Open
Addressing – Rehashing – Extendible Hashing.
SEARCHING
Searching is the process of finding some particular element in the list. If the element is
present in the list, then the process is called successful, and the process returns the location of
that element; otherwise, the search is called unsuccessful.
Based on the type of search operation, these algorithms are generally classified into two
categories:
1. Sequential Search: In this, the list or array is traversed sequentially and every element
is checked. For example: Linear Search.
2. Interval Search: These algorithms are specifically designed for searching in sorted
data structures. These types of searching algorithms are much more efficient than Linear
Search as they repeatedly target the center of the search structure and divide the search
space in half. For example: Binary Search.
LINEAR SEARCH
Linear search, also called sequential search, is a very simple method used for searching an
array for a particular value. It works by comparing the value to be searched with every
element of the array one by one in sequence until a match is found. If a match is found,
then the location of the item is returned; otherwise, the algorithm returns NULL. Linear
search is mostly used to search an unordered list of elements (array in which data elements
are not sorted).
The worst-case time complexity of linear search is O(n)
The steps used in the implementation of Linear Search are listed as follows:
o First, we have to traverse the array elements using a for loop.
o In each iteration of the for loop, compare the search element with the current array
  element, and -
  o If the element matches, then return the index of the corresponding array element.
  o If the element does not match, then move to the next element.
o If there is no match or the search element is not present in the given array, return -1.
ALGORITHM
Linear_Search(a, n, val) // 'a' is the given array, 'n' is the size of the array, 'val' is the value to search
Step 1: set pos = -1
Step 2: set i = 1
Step 3: repeat step 4 while i <= n
Step 4: if a[i] == val
set pos = i
print pos
go to step 6
[end of if]
set i = i + 1
[end of loop]
Step 5: if pos = -1
print "value is not present in the array "
[end of if]
Step 6: exit
In Steps 1 and 2 of the algorithm, we initialize the values of POS and I. In Step 3, a while loop
is executed that runs till I is less than or equal to N (the total number of elements in the array).
In Step 4, a check is made to see if a match is found between the current array element and
VAL. If a match is found, then the position of the array element is printed, else the value of I
is incremented to match the next element with VAL. However, if all the array elements have
been compared with VAL and no match is found, then it means that VAL is not present in the
array.
WORKING OF LINEAR SEARCH
Now, let's see the working of the linear search Algorithm.
To understand the working of linear search algorithm, let's take an unsorted array. It will be
easy to understand the working of linear search with an example.
Let the elements of the array be -
Let the element to be searched be K = 41
Now, start from the first element and compare K with each element of the array.
The value of K, i.e., 41, is not matched with the first element of the array. So, move to the
next element. And follow the same process until the respective element is found.
Now, the element to be searched is found. So the algorithm will return the index of the matched
element.
LINEAR SEARCH COMPLEXITY
1. Time Complexity
o Best Case Complexity - In Linear search, the best case occurs when the element we are
  finding is at the first position of the array. The best-case time complexity of linear
  search is O(1).
o Average Case Complexity - The average case time complexity of linear search
  is O(n).
o Worst Case Complexity - In Linear search, the worst case occurs when the element
  we are looking for is present at the end of the array. The worst case also occurs
  when the target element is not present in the given array, and we have to traverse
  the entire array. The worst-case time complexity of linear search is O(n).
The time complexity of linear search is O(n) because every element in the array is compared
only once.
2. Space Complexity
o The space complexity of linear search is O(1).
PROGRAM
#include <stdio.h>

int main()
{
    int list[20], size, i, sElement;
    printf("Enter size of the list: ");
    scanf("%d", &size);
    printf("Enter any %d integer values: ", size);
    for (i = 0; i < size; i++)
        scanf("%d", &list[i]);
    printf("Enter the element to be searched: ");
    scanf("%d", &sElement);
    // Linear Search Logic: compare sElement with every element in sequence
    for (i = 0; i < size; i++)
    {
        if (sElement == list[i])
        {
            printf("Element is found at %d index", i);
            break;
        }
    }
    // If the loop ran to completion, no match was found
    if (i == size)
        printf("Given element is not found in the list!!!");
    return 0;
}
Output
Enter size of the list: 3
Enter any 3 integer values: 35
98
12
Enter the element to be searched: 12
Element is found at 2 index
Linear Search Applications
1. For searching operations in smaller arrays (<100 items).
BINARY SEARCH
The binary search algorithm finds a given element in a list of elements with O(log n) time
complexity, where n is the total number of elements in the list. The binary search algorithm can
be used only with a sorted list of elements. That means binary search is used only with a
list of elements that are already arranged in order; it cannot be used for a
list of elements arranged in random order. The search process starts by comparing the search
element with the middle element in the list. If both match, then the result is "element
found". Otherwise, we check whether the search element is smaller or larger than the middle
element in the list. If the search element is smaller, then we repeat the same process for the
left sublist of the middle element. If the search element is larger, then we repeat the same
process for the right sublist of the middle element. We repeat this process until we find the
search element in the list or until we are left with a sublist of only one element. If that
element also doesn't match the search element, then the result is "Element not found in
the list".
ALGORITHM
Step 1 - Read the search element from the user.
Step 2 - Find the middle element in the sorted list.
Step 3 - Compare the search element with the middle element in the sorted list.
Step 4 - If both are matched, then display "Given element is found!!!" and terminate the
function.
Step 5 - If both are not matched, then check whether the search element is smaller or larger
than the middle element.
Step 6 - If the search element is smaller than the middle element, repeat steps 2, 3, 4 and 5 for
the left sublist of the middle element.
Step 7 - If the search element is larger than the middle element, repeat steps 2, 3, 4 and 5 for
the right sublist of the middle element.
Step 8 - Repeat the same process until we find the search element in the list or until sublist
contains only one element.
Step 9 - If that element also doesn't match with the search element, then display "Element is
not found in the list!!!" and terminate the function.
Working of Binary search
Now, let's see the working of the Binary Search Algorithm.
To understand the working of the Binary search algorithm, let's take a sorted array. It will be
easy to understand the working of Binary search with an example.
There are two methods to implement the binary search algorithm:
o Iterative method
o Recursive method
The recursive method of binary search follows the divide and conquer approach, as the sketch
below shows.
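The following is a minimal sketch of the recursive method (illustrative only; the function
name binary_search and the int-array parameters are assumptions, and the array must already
be sorted in ascending order):

/* Recursive binary search: returns the index of val in a[low..high], or -1 */
int binary_search(int a[], int low, int high, int val)
{
    if (low > high)
        return -1;                     /* base case: the sublist is empty */
    int mid = (low + high) / 2;        /* middle element of the sublist   */
    if (a[mid] == val)
        return mid;                    /* element found                   */
    else if (val < a[mid])
        return binary_search(a, low, mid - 1, val);   /* left sublist  */
    else
        return binary_search(a, mid + 1, high, val);  /* right sublist */
}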
Let the elements of the array be -
Let the element to search be K = 56
We have to use the below formula to calculate the mid of the array:
mid = (beg + end)/2
So, in the given array -
beg = 0
end = 8
mid = (0 + 8)/2 = 4. So, 4 is the mid of the array.
Now, the element to search is found. So the algorithm will return the index of the matched
element.
Binary Search complexity
1. Time Complexity
o Best Case Complexity - In Binary search, the best case occurs when the element to
  search is found in the first comparison, i.e., when the first middle element itself is the
  element to be searched. The best-case time complexity of Binary search is O(1).
o Average Case Complexity - The average case time complexity of Binary search
  is O(log n).
o Worst Case Complexity - In Binary search, the worst case occurs when we have to
  keep reducing the search space till it has only one element. The worst-case time
  complexity of Binary search is O(log n).
2. Space Complexity
o The space complexity of binary search is O(1).
PROGRAM
#include <stdio.h>

int main()
{
    int first, last, middle, size, i, sElement, list[100];
    printf("Enter the size of the list: ");
    scanf("%d", &size);
    printf("Enter %d integer values in Ascending order\n", size);
    for (i = 0; i < size; i++)
        scanf("%d", &list[i]);
    printf("Enter value to be searched: ");
    scanf("%d", &sElement);
    first = 0;
    last = size - 1;
    middle = (first + last) / 2;
    while (first <= last) {
        if (list[middle] < sElement)
            first = middle + 1;              /* search the right half */
        else if (list[middle] == sElement) {
            printf("Element found at index %d.\n", middle);
            break;
        }
        else
            last = middle - 1;               /* search the left half */
        middle = (first + last) / 2;
    }
    if (first > last)
        printf("Element Not found in the list.");
    return 0;
}
OUTPUT
Enter the size of the list: 5
Enter 5 integer values in Ascending order
1 3 5 7 9
Enter value to be searched: 3
Element found at index 1.
Binary Search Applications
1. In the libraries of Java, .Net, and the C++ STL.
2. While debugging, binary search is used to pinpoint the place where the error happens.
SORTING
Sorting Algorithms are methods of reorganizing a large number of items into some specific
order such as highest to lowest, or vice-versa, or even in some alphabetical order.
These algorithms take an input list, process it (i.e., perform some operations on it) and
produce the sorted list.
EXAMPLES OF SORTING IN REAL-LIFE SCENARIOS:
• Telephone Directory
• Dictionary
TYPES:
• Stable sorting
• Not stable sorting
STABLE SORTING:
If a sorting algorithm, after sorting the contents, does not change the sequence of similar
content in which they appear, it is called stable sorting.
NOT STABLE SORTING:
If a sorting algorithm, after sorting the contents, changes the sequence of similar content in
which they appear, it is called unstable sorting.
EXAMPLE FOR STABLE & UNSTABLE:
For instance, when sorting the list (4a, 3, 4b) on key value, a stable sort always produces
(3, 4a, 4b), preserving the original order of the two equal keys, whereas an unstable sort
may produce (3, 4b, 4a).
TYPES OF SORTING:
i. Internal sorting
ii. External sorting
Internal Sorting:
If all the data that is to be sorted can be accommodated at a time in the main memory, the
internal sorting method is performed.
External Sorting:
When the data that is to be sorted cannot be accommodated in the memory at the same time
and some has to be kept in auxiliary memory such as hard disk, floppy disk, magnetic tapes
etc, then external sorting methods are performed.
Complexity of Sorting Algorithms
The complexity of a sorting algorithm measures the running time of a function in which 'n'
number of items are to be sorted. The most noteworthy of these considerations are:
• The length of time spent by the programmer in programming a specific sorting program
• Amount of machine time necessary for running the program
• The amount of memory necessary for running the program
• The efficiency of the sorting technique
The Efficiency of Sorting Techniques
● To get the amount of time required to sort an array of 'n' elements by a particular
  method, the normal approach is to analyze the method to find the number of comparisons
  (or exchanges) required by it.
● Most of the sorting techniques are data sensitive, and so their metrics depend
  on the order in which the elements appear in the input array.
● Various sorting techniques are analysed in three cases:
  Best case
  Worst case
  Average case
TYPES OF SORTING ALGORITHM
1. Quick Sort
2. Bubble Sort
3. Merge Sort
4. Insertion Sort
5. Selection Sort
6. Heap Sort
7. Radix Sort
8. Bucket Sort
Time Complexities of Sorting Algorithms:

Algorithm        Best          Average       Worst
Quick Sort       Ω(n log(n))   Θ(n log(n))   O(n^2)
Bubble Sort      Ω(n)          Θ(n^2)        O(n^2)
Merge Sort       Ω(n log(n))   Θ(n log(n))   O(n log(n))
Insertion Sort   Ω(n)          Θ(n^2)        O(n^2)
Selection Sort   Ω(n^2)        Θ(n^2)        O(n^2)
Heap Sort        Ω(n log(n))   Θ(n log(n))   O(n log(n))
Radix Sort       Ω(nk)         Θ(nk)         O(nk)
Bucket Sort      Ω(n+k)        Θ(n+k)        O(n^2)
BUBBLE SORT
Bubble sort is a simple sorting algorithm. It is a comparison-based
algorithm in which each pair of adjacent elements is compared and the elements are swapped
if they are not in order. This algorithm is not suitable for large data sets as its average and
worst case complexity are of O(n^2), where n is the number of items.
PROGRAM
#include <stdio.h>

void bubble_sort(long [], long);

int main()
{
    long array[100], n, c;
    printf("Enter Elements\n");
    scanf("%ld", &n);
    printf("Enter %ld integers\n", n);
    for (c = 0; c < n; c++)
        scanf("%ld", &array[c]);
    bubble_sort(array, n);
    printf("Sorted list in ascending order:\n");
    for (c = 0; c < n; c++)
        printf("%ld\n", array[c]);
    return 0;
}

void bubble_sort(long list[], long n)
{
    long c, d, t;
    for (c = 0; c < (n - 1); c++)
    {
        for (d = 0; d < n - c - 1; d++)
        {
            if (list[d] > list[d+1])
            {
                /* Swapping */
                t = list[d];
                list[d] = list[d+1];
                list[d+1] = t;
            }
        }
    }
}
OUTPUT
Enter Elements
6
Enter 6 integers
15
95
45
65
72
25
Sorted list in ascending order:
15
25
45
65
72
95
SELECTION SORT
Selection sort is a simple sorting algorithm which finds the smallest element in the array and
exchanges it with the element in the first position. Then it finds the second smallest element
and exchanges it with the element in the second position, and continues until the entire array
is sorted.
Following is a pictorial depiction of the entire sorting process –
Program for Selection Sort
#include <stdio.h>

int main()
{
    int array[100], n, c, d, position, swap;
    printf("Enter number of elements\n");
    scanf("%d", &n);
    printf("Enter %d integers\n", n);
    for (c = 0; c < n; c++)
        scanf("%d", &array[c]);
    for (c = 0; c < (n - 1); c++)
    {
        /* find the smallest element in the unsorted part */
        position = c;
        for (d = c + 1; d < n; d++)
        {
            if (array[position] > array[d])
                position = d;
        }
        /* exchange it with the element in position c */
        if (position != c)
        {
            swap = array[c];
            array[c] = array[position];
            array[position] = swap;
        }
    }
    printf("Sorted list in ascending order:\n");
    for (c = 0; c < n; c++)
        printf("%d\n", array[c]);
    return 0;
}
OUTPUT
Enter number of elements
5
Enter 5 integers
25
62
8
52
11
Sorted list in ascending order:
8
11
25
52
62
INSERTION SORT
What is Insertion Sort Algorithm?
Insertion sort is slightly different from the other sorting algorithms. It is based on the idea
that in each iteration one element of the array is consumed and placed in its right position in
the sorted part, so that the entire array is sorted at the end of all the iterations.
In other words, it compares the current element with the elements on its left-hand side
(the sorted part of the array). If the current element is greater than all the elements on its
left-hand side, then it leaves the element in its place and moves on to the next element.
Otherwise it finds its correct position and moves it there by shifting all the elements in the
sorted part that are larger than the current element one position ahead.

The above diagram represents how insertion sort works. Insertion sort works like the way
we sort playing cards in our hands. It always starts with the second element as the key. The key
is compared with the elements before it and is put in the right place.

In the above figure, 40 has nothing before it. Element 10 is compared to 40 and is inserted
before 40. Element 9 is smaller than 40 and 10, so it is inserted before 10 and this operation
continues until the array is sorted in ascending order.
Program
#include <stdio.h>

int main()
{
    int n, array[1000], c, d, t;
    printf("Enter number of elements\n");
    scanf("%d", &n);
    printf("Enter %d integers\n", n);
    for (c = 0; c < n; c++)
    {
        scanf("%d", &array[c]);
    }
    for (c = 1; c <= n - 1; c++)
    {
        d = c;
        /* shift array[d] left until it reaches its correct position */
        while (d > 0 && array[d] < array[d-1])
        {
            t = array[d];
            array[d] = array[d-1];
            array[d-1] = t;
            d--;
        }
    }
    printf("Sorted list in ascending order:\n");
    for (c = 0; c <= n - 1; c++)
    {
        printf("%d\n", array[c]);
    }
    return 0;
}
SHELL SORT
What is Shell Sort Algorithm?
Shellsort, also known as Shell sort or Shell's method, is an in-place comparison sort. It can
either be seen as a generalization of sorting by exchange (bubble sort) or sorting by insertion
(insertion sort). The worst case time complexity is O(n^2) and the best case complexity is
O(n log(n)).
The Shell sort algorithm is very similar to the Insertion sort algorithm. In the case of Insertion
sort, we move elements one position ahead to insert an element at its correct position.
Here, by contrast, Shell sort starts by sorting pairs of elements far apart from each other, then
progressively reduces the gap between elements to be compared. Starting with far apart
elements, it can move some out-of-place elements into position faster than a simple
nearest-neighbor exchange.
Shelling out the Shell Sort Algorithm with Examples
Example 1
Here is an example to help you understand the working of Shell sort on an array of elements
named A = {17, 3, 9, 1, 8}.
Program
#include <stdio.h>

int main()
{
    int n, i, j, gap;
    scanf("%d", &n);
    int arr[n];
    for (i = 0; i < n; i++)
    {
        scanf("%d", &arr[i]);
    }
    for (gap = n/2; gap > 0; gap = gap / 2)
    {
        // Do a gapped insertion sort.
        // The first gap elements arr[0..gap-1] are already in gapped order;
        // keep adding one more element until the entire array is gap sorted.
        for (i = gap; i < n; i = i + 1)
        {
            // add arr[i] to the elements that have been gap sorted:
            // save arr[i] in temp, making an empty space at index i
            int temp = arr[i];
            // shift earlier gap-sorted elements up until the correct location for arr[i] is found
            for (j = i; j >= gap && arr[j - gap] > temp; j = j - gap)
                arr[j] = arr[j - gap];
            // put temp (the original arr[i]) in its correct position
            arr[j] = temp;
        }
    }
    for (i = 0; i < n; i++)
    {
        printf("%d ", arr[i]);
    }
    return 0;
}
OUTPUT
N=5
1
6
45
12
20
1 6 12 20 45
MERGE SORT
Merge sort is a divide-and-conquer algorithm based on the idea of breaking down a list into
several sublists until each sublist consists of a single element, and then merging those sublists
in a manner that results in a sorted list.
Idea:
● Divide the unsorted list into N sublists, each containing 1 element.
● Take adjacent pairs of two singleton lists and merge them to form a list of 2
  elements. N will now convert into N/2 lists of size 2.
● Repeat the process till a single sorted list is obtained.
While comparing two sublists for merging, the first element of both lists is taken into
consideration. While sorting in ascending order, the element that is of a lesser value becomes
a new element of the sorted list. This procedure is repeated until both the smaller sublists are
empty and the new combined sublist comprises all the elements of both the sublists.
DIVIDE AND CONQUER STRATEGY
Using the Divide and Conquer technique, we divide a problem into subproblems. When the
solution to each subproblem is ready, we 'combine' the results from the subproblems to solve
the main problem.
Suppose we had to sort an array A. A subproblem would be to sort a sub-section of this array
starting at index p and ending at index r, denoted as A[p..r].
Divide
If q is the half-way point between p and r, then we can split the subarray A[p..r] into two
arrays A[p..q] and A[q+1..r].
Conquer
In the conquer step, we try to sort both the subarrays A[p..q] and A[q+1..r]. If we haven't yet
reached the base case, we again divide both these subarrays and try to sort them.
Combine
When the conquer step reaches the base step and we get two sorted
subarrays A[p..q] and A[q+1..r] for array A[p..r], we combine the results by creating a sorted
array A[p..r] from the two sorted subarrays A[p..q] and A[q+1..r].
The merge Step of Merge Sort
Every recursive algorithm is dependent on a base case and the ability to combine the results
from base cases. Merge sort is no different. The most important part of the merge sort
algorithm is, you guessed it, the merge step.
The merge step is the solution to the simple problem of merging two sorted lists(arrays) to
build one large sorted list(array).
The algorithm maintains three pointers, one for each of the two arrays and one for
maintaining the current index of the final sorted array.
Step 1: Create duplicate copies of sub-arrays to be sorted
// Create L ← A[p..q] and M ← A[q+1..r]
int n1 = q - p + 1;    // = 3 - 0 + 1 = 4
int n2 = r - q;        // = 5 - 3 = 2
int L[4], M[2];
for (int i = 0; i < 4; i++)
    L[i] = arr[p + i];
// L[0..3] = A[0..3] = [1,5,10,12]
for (int j = 0; j < 2; j++)
    M[j] = arr[q + 1 + j];
// M[0..1] = A[4..5] = [6,9]
Step 2: Maintain current index of sub-arrays and main array
int i, j, k;
i = 0;
j = 0;
k = p;
Step 3: Until we reach the end of either L or M, pick the smaller among the elements of L
and M and place it in the correct position at A[p..r]
while (i < n1 && j < n2) {
    if (L[i] <= M[j]) {
        arr[k] = L[i];
        i++;
    }
    else {
        arr[k] = M[j];
        j++;
    }
    k++;
}
Comparing individual elements of the sorted subarrays until we reach the end of one.
Step 4: When we run out of elements in either L or M, pick up the remaining elements
and put them in A[p..r]
// We exited the earlier loop because j < n2 doesn't hold
while (i < n1)
{
    arr[k] = L[i];
    i++;
    k++;
}
Copy the remaining elements from the first array to the main subarray.
// We exited the earlier loop because i < n1 doesn't hold
while (j < n2)
{
    arr[k] = M[j];
    j++;
    k++;
}
This second loop would only be needed if the size of M was greater than that of L.
At the end of the merge function, the subarray A[p..r] is sorted.
PROGRAM:
// Merge two sorted subarrays A[p..q] and A[q+1..r] of arr
void merge(int arr[], int p, int q, int r)
{
    // Create temporary arrays L ← A[p..q] and M ← A[q+1..r]
    int n1 = q - p + 1;
    int n2 = r - q;
    int L[n1], M[n2];
    for (int x = 0; x < n1; x++)
        L[x] = arr[p + x];
    for (int y = 0; y < n2; y++)
        M[y] = arr[q + 1 + y];

    // Maintain current index of sub-arrays and main array
    int i, j, k;
    i = 0;
    j = 0;
    k = p;

    // Until we reach the end of either L or M, pick the smaller among
    // the elements of L and M and place it in the correct position at A[p..r]
    while (i < n1 && j < n2) {
        if (L[i] <= M[j]) {
            arr[k] = L[i];
            i++;
        } else {
            arr[k] = M[j];
            j++;
        }
        k++;
    }

    // When we run out of elements in either L or M,
    // pick up the remaining elements and put them in A[p..r]
    while (i < n1) {
        arr[k] = L[i];
        i++;
        k++;
    }
    while (j < n2) {
        arr[k] = M[j];
        j++;
        k++;
    }
}
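The merge() function above only combines two already-sorted subarrays. The recursive driver
that performs the divide and conquer steps described earlier can be sketched as follows (the
name mergeSort is an assumption; merge() is the function given above):

/* Sort A[p..r]: divide at the half-way point q, conquer each half, combine */
void mergeSort(int arr[], int p, int r)
{
    if (p < r)
    {
        int q = (p + r) / 2;        /* divide: find the half-way point */
        mergeSort(arr, p, q);       /* conquer: sort A[p..q]           */
        mergeSort(arr, q + 1, r);   /* conquer: sort A[q+1..r]         */
        merge(arr, p, q, r);        /* combine the two sorted halves   */
    }
}

For example, mergeSort(A, 0, n - 1) sorts an n-element array A.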
Merge Sort Complexity
Time Complexity
Best     - O(n log n)
Worst    - O(n log n)
Average  - O(n log n)
Space Complexity - O(n)
Stability - Yes
Merge Sort Applications
● Inversion count problem
● External sorting
● E-commerce applications
At each step, a list of size M is divided into 2 sublists of
size M/2, until no further division can be done. To understand this better, consider a smaller
array A containing the elements (9,7,8).
At the first step this list of size 3 is divided into 2 sublists the first consisting of
elements (9,7) and the second one being (8). Now, the first list consisting of elements (9,7) is
further divided into 2 sublists consisting of elements (9) and (7) respectively.
As no further breakdown of this list can be done, as each sublist consists of a
maximum of 1 element, we now start to merge these lists. The 2 sub-lists formed in the last
step are then merged together in sorted order using the procedure mentioned above leading to
a new list (7,9). Backtracking further, we then need to merge the list consisting of
element (8) too with this list, leading to the new sorted list (7,8,9).
HASHING
Hashing is a technique that is used to store, retrieve and find data in the
data structure called Hash Table. It is used to overcome the drawback of
Linear Search (Comparison) & Binary Search (Sorted order list). It involves
two important concepts Hash Table
 Hash Function
Hash table
 A hash table is a data structure that is used to store and retrieve data
(keys) very quickly.
 It is an array of some fixed size, containing the keys. Hash table indices run from 0 to
Tablesize – 1.
 Each key is mapped into some number in the range 0 to Tablesize – 1. This mapping is
called the Hash function.
 Insertion of the data in the hash table is based on the key value obtained from the hash
function.
 Using the same hash key value, the data can be retrieved from the hash table with few or
more hash key comparisons.
 The load factor of a hash table is calculated using the formula:
(Number of data elements in the hash table) / (Size of the hash table)
For example, 5 elements stored in a table of size 10 give a load factor of 5/10 = 0.5.
Factors affecting Hash Table Design
 Hash function
 Table size
 Collision handling scheme
[Figure: a simple hash table with table size = 10, slots indexed 0 to 9]
Hash function:
 It is a function which distributes the keys evenly among the cells
in the Hash Table.
 Using the same hash function we can retrieve data from the
hash table. The hash function is used to implement the hash table.
 The integer value returned by the hash function is called the hash key.
 If the input keys are integers, the commonly used hash function is
H(key) = key % Tablesize
A simple hash function
typedef unsigned int index;

index Hash(const char *key, int Tablesize)
{
    unsigned int Hashval = 0;
    /* sum the character values of the key */
    while (*key != '\0')
        Hashval += *key++;
    return Hashval % Tablesize;
}
Types of Hash Functions
1. Division Method
2. Mid Square Method
3. Multiplicative Hash Function
4. Digit Folding
1. Division Method:
It depends on the remainder of division; the divisor is the table size.
The formula is H(key) = key % table size.
E.g. consider the following keys (36, 18, 72, 43, 6) with table size = 8:
H(36) = 36 % 8 = 4, H(18) = 18 % 8 = 2, H(72) = 72 % 8 = 0, H(43) = 43 % 8 = 3, H(6) = 6 % 8 = 6.
2. Mid Square Method:
We first square the item, and then extract some portion of the resulting digits. For example, if
the item were 44, we would first compute 44^2 = 1,936, then extract the middle two digits, 93,
from the answer and store the key 44 at index 93.
3. Multiplicative Hash Function:
The key is multiplied by some constant value.
The hash function is given by:
H(key) = Floor(P * (key * A))
P = integer constant [e.g. P = 50]
A = constant real number [A = 0.61803398987], suggested by Donald Knuth
E.g. Key 107: H(107) = Floor(50 * (107 * 0.61803398987))
            = Floor(3306.481845)
H(107) = 3306
4. Digit Folding Method:
The folding method for constructing hash functions begins by dividing the item into equal-size
pieces (the last piece may not be of equal size). These pieces are then added together to
give the resulting hash key value.
For example, if our item was the phone number 436-555-4601, we would take the
digits and divide them into groups of 2 (43, 65, 55, 46, 01). After the addition,
43+65+55+46+01, we get 210. If we assume our hash table has 11 slots, then we need to
perform the extra step of dividing by 11 and keeping the remainder. In this case 210 % 11 is
1, so the phone number 436-555-4601 hashes to slot 1.
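The four methods can be checked with a short program. The following sketch simply
recomputes the worked examples above (the printed labels and layout are illustrative):

#include <stdio.h>

int main(void)
{
    /* 1. Division method: H(key) = key % table size (table size = 8) */
    printf("Division:       H(36)  = %d\n", 36 % 8);              /* 4    */

    /* 2. Mid-square method: square 44, extract the middle two digits */
    int sq = 44 * 44;                                             /* 1936 */
    printf("Mid-square:     H(44)  = %d\n", (sq / 10) % 100);     /* 93   */

    /* 3. Multiplicative method: H(key) = Floor(P * key * A) */
    double A = 0.61803398987;
    printf("Multiplicative: H(107) = %d\n", (int)(50 * 107 * A)); /* 3306 */

    /* 4. Digit folding: 43 + 65 + 55 + 46 + 01 = 210, then 210 % 11 */
    printf("Folding:        H      = %d\n", (43 + 65 + 55 + 46 + 1) % 11); /* 1 */
    return 0;
}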
Collision
If two or more keys hash to the same index, the corresponding records cannot be
stored in the same location. This condition is known as a collision.
Characteristics of a Good Hashing Function:
● It should be simple to compute.
● The number of collisions should be low while placing records in the hash table.
● A hash function with no collisions is a perfect hash function.
● The hash function should produce keys which are distributed uniformly in the hash table.
● The hash function should depend upon every bit of the key. Thus a hash function
  that simply extracts a portion of the key is not suitable.
Collision Resolution Strategies / Techniques (CRT):
Obviously, two records cannot be stored in the same location. Therefore, a method used to
solve the problem of collision, also called a collision resolution technique, is applied. The
two most popular methods of resolving collisions are:
● Separate chaining (Open Hashing)
● Open addressing (Closed Hashing)
  1. Linear Probing
  2. Quadratic Probing
  3. Double Hashing
SEPARATE CHAINING
In chaining, each location in a hash table stores a pointer to a linked list that contains all the
key values that were hashed to that location. That is, location l in the hash table points to the
head of the linked list of all the key values that hashed to l. However, if no key value hashes
to l, then location l in the hash table contains NULL.
The figure shows how the key values are mapped to a location in the hash table and stored in
a linked list that corresponds to that location.
Searching for a value in a chained hash table is as simple as scanning a linked list for an
entry with the given key. Insertion operation appends the key to the end of the linked list
pointed by the hashed location. Deleting a key requires searching the list and removing the
element.
Chained hash tables with linked lists are widely used due to the simplicity of the algorithms
to insert, delete, and search a key. The code for these algorithms is exactly the same as that
for inserting, deleting, and searching a value in a single linked list.
While the cost of inserting a key in a chained hash table is O(1), the cost of deleting and
searching a value is given as O(m) where m is the number of elements in the list of that
location. Searching and deleting takes more time because these operations scan the entries
of the selected location for the desired key.
In the worst case, searching a value may take a running time of O(n), where n is the number
of key values stored in the chained hash table. This case arises when all the key values are
inserted into the linked list of the same location (of the hash table). In this case, the hash
table is ineffective.
Codes to initialize, insert, delete, and search a value in a chained hash table
Structure of the node
#include <stdio.h>
#include <stdlib.h>

typedef struct node_HT
{
    int value;
    struct node_HT *next;
} node;

Code to initialize a chained hash table
/* Initializes the m locations in the chained hash table.
   The operation takes a running time of O(m). */
void initializeHashTable(node *hash_table[], int m)
{
    int i;
    for (i = 0; i < m; i++)
        hash_table[i] = NULL;
}

Code to search a value
/* The element is searched in the linked list whose head pointer is
   stored in the location given by h(k). If the search is successful,
   the function returns a pointer to the node in the linked list;
   otherwise it returns NULL. The worst case running time of the search
   operation is of the order of the size of the linked list. */
node *search_value(node *hash_table[], int val)
{
    node *ptr;
    ptr = hash_table[h(val)];
    while ((ptr != NULL) && (ptr->value != val))
        ptr = ptr->next;
    if (ptr != NULL)
        return ptr;
    else
        return NULL;
}

Code to insert a value
/* The element is inserted at the beginning of the linked list whose
   head pointer is stored in the location given by h(k). The running
   time of the insert operation is O(1), as the new key value is always
   added as the first element of the list irrespective of the size of
   the linked list as well as that of the chained hash table. */
void insert_value(node *hash_table[], int val)
{
    node *new_node;
    new_node = (node *)malloc(sizeof(node));
    new_node->value = val;
    new_node->next = hash_table[h(val)];
    hash_table[h(val)] = new_node;
}

Code to delete a value
/* To delete a node from the linked list whose head is stored at the
   location given by h(k) in the hash table, we need to know the address
   of the node's predecessor. We do this using a pointer save. The
   running time complexity of the delete operation is the same as that
   of the search operation, because we need to search for the
   predecessor of the node so that the node can be removed without
   affecting other nodes in the list. */
void delete_value(node *hash_table[], int val)
{
    node *save, *ptr;
    save = NULL;
    ptr = hash_table[h(val)];
    while ((ptr != NULL) && (ptr->value != val))
    {
        save = ptr;
        ptr = ptr->next;
    }
    if (ptr != NULL)
    {
        if (save == NULL)                    /* deleting the first node */
            hash_table[h(val)] = ptr->next;
        else
            save->next = ptr->next;
        free(ptr);
    }
    else
        printf("\n VALUE NOT FOUND");
}
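The routines above assume a hash function h(); a minimal definition consistent with the
division method (the constant TABLE_SIZE is assumed here for illustration) could be:

#define TABLE_SIZE 10   /* assumed table size, matching the example below */

/* Division-method hash used by the insert, search and delete routines above */
int h(int key)
{
    return key % TABLE_SIZE;
}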
Example
Insert the four keys 22, 84, 35, 62 into a hash table of size 10 using separate chaining.
The hash function is
H(key) = key % 10
1. H(22) = 22 % 10 = 2
2. H(84) = 84 % 10 = 4
3. H(35) = 35 % 10 = 5
4. H(62) = 62 % 10 = 2
Keys 22 and 62 both hash to index 2, so they are stored in the linked list chained at that
location.
Pros and Cons:
- The main advantage of using a chained hash table is that
it remains effective even when the number of key values to be stored is much higher
than the number of locations in the hash table.
- However, with the increase in the number of keys to be stored, the performance of a chained
hash table does degrade gradually (linearly). For example, a chained hash table with 1000
memory locations and 10,000 stored keys will give 5 to 10 times lower performance as
compared to a chained hash table with 10,000 locations. But a chained hash table is still 1000
times faster than a simple hash table.
-The other advantage of using chaining for collision resolution is that its performance,
unlike quadratic probing, does not degrade when the table is more than half full. This technique is
absolutely free from clustering problems and thus provides an efficient mechanism to handle
collisions.
-However, chained hash tables inherit the disadvantages of linked lists. First, to store a key
value, the space overhead of the next pointer in each entry can be significant. Second,
traversing a linked list has poor cache performance, making the processor cache ineffective.
Advantages
1. More elements can be inserted, since each location holds a linked list.
Disadvantages
1. It requires more pointers, which occupy more memory space.
2. Search takes time, since it takes time to evaluate the hash function and also to traverse the
list.
OPEN ADDRESSING
Also called Closed Hashing, this is a collision resolution technique that
uses Hi(X) = (Hash(X) + F(i)) mod Tablesize.
When a collision occurs, alternative cells are tried until an empty cell is found.
Types:
• Linear Probing
• Quadratic Probing
• Double Hashing
Hash function
 H(key) = key % table size.
Insert Operation
 To insert a key, use the hash function to identify the cell in which the element should be
placed.
 If that cell is already occupied by another key, probe the alternative cells given by F(i)
until an empty cell is found.
 The new element is placed in the first empty cell found.
LINEAR PROBING
This is the easiest method to handle collisions.
Apply the hash function H(key) = key % table size.
Hi(X) = (Hash(X) + F(i)) mod Tablesize, where F(i) = i.
How probing works:
first probe – given a key k, hash to H(key)
second probe – if H(key)+F(1) is occupied, try H(key)+F(2)
And so forth.
Probing Properties:
We force F(0) = 0.
The i-th probe is to (H(key) + F(i)) % table size.
If i reaches size-1, the probe has failed.
Depending on F(i), the probe may fail sooner.
Long sequences of probes are costly.
The probe sequence is:
H(key) % table size
(H(key) + 1) % table size
(H(key) + 2) % table size
...
1. H(Key) = Key mod Tablesize
This is the common formula that you should apply for any hashing.
If a collision occurs, use Formula 2.
2. H(Key) = (H(Key) + i) mod Tablesize
where i = 1, 2, 3, …… etc. A sketch of insertion with this probe sequence follows.
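A minimal sketch of insertion with linear probing is given below (the names insert_linear,
SIZE and EMPTY are assumptions; –1 marks a vacant cell, as in the example that follows):

#define SIZE 10
#define EMPTY -1

/* Probe (H(key) + i) % SIZE, i = 0, 1, 2, ..., until a vacant cell is found.
   Returns the index used, or -1 if the probe fails (table is full). */
int insert_linear(int T[], int key)
{
    int i, pos;
    for (i = 0; i < SIZE; i++)
    {
        pos = (key % SIZE + i) % SIZE;   /* H(key) = key % SIZE, F(i) = i */
        if (T[pos] == EMPTY)
        {
            T[pos] = key;
            return pos;
        }
    }
    return -1;   /* probe failed: no vacant cell was found */
}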
Consider a hash table of size 10. Using linear probing, insert the keys 72, 27, 36, 24, 63, 81,
92, and 101 into the table.
Let h′(k) = k mod m, m = 10, and h(k, i) = (h′(k) + i) mod m.
Initially, the hash table can be given as:

Index: 0   1   2   3   4   5   6   7   8   9
Value: –1  –1  –1  –1  –1  –1  –1  –1  –1  –1

Step 1
Key = 72
h(72, 0) = (72 mod 10 + 0) mod 10
         = (2) mod 10
         = 2
Since T[2] is vacant, insert key 72 at this location.

Index: 0   1   2   3   4   5   6   7   8   9
Value: –1  –1  72  –1  –1  –1  –1  –1  –1  –1

Step 2
Key = 27
h(27, 0) = (27 mod 10 + 0) mod 10
         = (7) mod 10
         = 7
Since T[7] is vacant, insert key 27 at this location.

Index: 0   1   2   3   4   5   6   7   8   9
Value: –1  –1  72  –1  –1  –1  –1  27  –1  –1

Step 3
Key = 36
h(36, 0) = (36 mod 10 + 0) mod 10
         = (6) mod 10
         = 6
Since T[6] is vacant, insert key 36 at this location.

Index: 0   1   2   3   4   5   6   7   8   9
Value: –1  –1  72  –1  –1  –1  36  27  –1  –1

Step 4
Key = 24
h(24, 0) = (24 mod 10 + 0) mod 10
         = (4) mod 10
         = 4
Since T[4] is vacant, insert key 24 at this location.

Index: 0   1   2   3   4   5   6   7   8   9
Value: –1  –1  72  –1  24  –1  36  27  –1  –1

Step 5
Key = 63
h(63, 0) = (63 mod 10 + 0) mod 10
         = (3) mod 10
         = 3
Since T[3] is vacant, insert key 63 at this location.

Index: 0   1   2   3   4   5   6   7   8   9
Value: –1  –1  72  63  24  –1  36  27  –1  –1

Step 6
Key = 81
h(81, 0) = (81 mod 10 + 0) mod 10
         = (1) mod 10
         = 1
Since T[1] is vacant, insert key 81 at this location.

Index: 0   1   2   3   4   5   6   7   8   9
Value: –1  81  72  63  24  –1  36  27  –1  –1

Step 7
Key = 92
h(92, 0) = (92 mod 10 + 0) mod 10
         = (2) mod 10
         = 2
Now T[2] is occupied, so we cannot store the key 92 in T[2]. Therefore, try again for the
next location. Thus probe, i = 1, this time.
h(92, 1) = (92 mod 10 + 1) mod 10
         = (2 + 1) mod 10
         = 3
Now T[3] is occupied, so we cannot store the key 92 in T[3]. Therefore, try again for the
next location. Thus probe, i = 2, this time.
h(92, 2) = (92 mod 10 + 2) mod 10
         = (2 + 2) mod 10
         = 4
Now T[4] is occupied, so we cannot store the key 92 in T[4]. Therefore, try again for the
next location. Thus probe, i = 3, this time.
h(92, 3) = (92 mod 10 + 3) mod 10
         = (2 + 3) mod 10
         = 5
Since T[5] is vacant, insert key 92 at this location.

Index: 0   1   2   3   4   5   6   7   8   9
Value: –1  81  72  63  24  92  36  27  –1  –1

Step 8
Key = 101
h(101, 0) = (101 mod 10 + 0) mod 10
          = (1) mod 10
          = 1
Now T[1] is occupied, so we cannot store the key 101 in T[1]. Therefore, try again for the
next location. Thus probe, i = 1, this time.
h(101, 1) = (101 mod 10 + 1) mod 10
          = (1 + 1) mod 10
          = 2
T[2] is also occupied, so we cannot store the key in this location. The procedure is
repeated until the hash function generates the address of location 8, which is vacant and can
be used to store the key 101.

Index: 0   1   2   3   4   5   6   7   8   9
Value: –1  81  72  63  24  92  36  27  101 –1
Pros and Cons
Linear probing finds an empty location by doing a linear search in the array beginning from
position h(k). Although the algorithm provides good memory caching through good locality
of reference, the drawback of this algorithm is that it results in clustering, and thus there is a
higher risk of more collisions where one collision has already taken place. The performance
of linear probing is sensitive to the distribution of input values.
As the hash table fills, clusters of consecutive cells are formed and the time required for a
search increases with the size of the cluster. In addition to this, when a new value has to be
inserted into the table at a position which is already occupied, that value is inserted at the end
of the cluster, which again increases the length of the cluster. Generally, an insertion is made
between two clusters that are separated by one vacant location. But with linear probing,
there are more chances that subsequent insertions will also end up in one of the clusters,
thereby potentially increasing the cluster length by an amount much greater than one.
The greater the number of collisions, the more probes are required to find a free location,
and the worse the performance. This phenomenon is called primary clustering. To avoid
primary clustering, other techniques such as quadratic probing and double hashing are used.
QUADRATIC PROBING
To resolve the primary clustering problem, quadratic probing can be used.
With quadratic probing, rather than always moving one spot, move i^2 spots from
the point of collision, where i is the number of attempts to resolve the collision.
It is another collision resolution method which distributes items more evenly.
 From the original index H, if the slot is filled, try cells H+1^2, H+2^2, H+3^2, ..., H+i^2
with wrap-around.
 Hi(X) = (Hash(X) + F(i)) mod Tablesize, with F(i) = i^2
 Hi(X) = (Hash(X) + i^2) mod Tablesize
A sketch of the corresponding probe loop is given below.
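Only the increment changes relative to linear probing: F(i) = i^2. A sketch, reusing the
assumed SIZE and EMPTY conventions from the linear probing sketch:

/* Probe (Hash(X) + i*i) % SIZE until a vacant cell is found */
int insert_quadratic(int T[], int key)
{
    int i, pos;
    for (i = 0; i < SIZE; i++)
    {
        pos = (key % SIZE + i * i) % SIZE;   /* Hi(X) = (Hash(X) + i^2) mod SIZE */
        if (T[pos] == EMPTY)
        {
            T[pos] = key;
            return pos;
        }
    }
    return -1;   /* may fail even when the table is not full */
}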
Example:
Consider a hash table of size 10. Using quadratic probing, insert the keys 72, 27, 36, 24, 63,
81, and 101 into the table. Take c1 = 1 and c2 = 3.
Solution
Let h′(k) = k mod m, m = 10
We have,
h(k, i) = [h′(k) + c1·i + c2·i^2] mod m
Initially, the hash table can be given as:

Index: 0   1   2   3   4   5   6   7   8   9
Value: –1  –1  –1  –1  –1  –1  –1  –1  –1  –1

Step 1
Key = 72
h(72, 0) = [72 mod 10 + 1 × 0 + 3 × 0] mod 10
         = [72 mod 10] mod 10
         = 2 mod 10
         = 2
Since T[2] is vacant, insert the key 72 in T[2]. The hash table now becomes:

Index: 0   1   2   3   4   5   6   7   8   9
Value: –1  –1  72  –1  –1  –1  –1  –1  –1  –1

Step 2
Key = 27
h(27, 0) = [27 mod 10 + 1 × 0 + 3 × 0] mod 10
         = [27 mod 10] mod 10
         = 7 mod 10
         = 7
Since T[7] is vacant, insert the key 27 in T[7]. The hash table now becomes:

Index: 0   1   2   3   4   5   6   7   8   9
Value: –1  –1  72  –1  –1  –1  –1  27  –1  –1

Step 3
Key = 36
h(36, 0) = [36 mod 10 + 1 × 0 + 3 × 0] mod 10
         = [36 mod 10] mod 10
         = 6 mod 10
         = 6
Since T[6] is vacant, insert the key 36 in T[6]. The hash table now becomes:

Index: 0   1   2   3   4   5   6   7   8   9
Value: –1  –1  72  –1  –1  –1  36  27  –1  –1

Step 4
Key = 24
h(24, 0) = [24 mod 10 + 1 × 0 + 3 × 0] mod 10
         = [24 mod 10] mod 10
         = 4 mod 10
         = 4
Since T[4] is vacant, insert the key 24 in T[4]. The hash table now becomes:

Index: 0   1   2   3   4   5   6   7   8   9
Value: –1  –1  72  –1  24  –1  36  27  –1  –1

Step 5
Key = 63
h(63, 0) = [63 mod 10 + 1 × 0 + 3 × 0] mod 10
         = [63 mod 10] mod 10
         = 3 mod 10
         = 3
Since T[3] is vacant, insert the key 63 in T[3]. The hash table now becomes:

Index: 0   1   2   3   4   5   6   7   8   9
Value: –1  –1  72  63  24  –1  36  27  –1  –1

Step 6
Key = 81
h(81, 0) = [81 mod 10 + 1 × 0 + 3 × 0] mod 10
         = [81 mod 10] mod 10
         = 81 mod 10
         = 1
Since T[1] is vacant, insert the key 81 in T[1]. The hash table now becomes:

Index: 0   1   2   3   4   5   6   7   8   9
Value: –1  81  72  63  24  –1  36  27  –1  –1

Step 7
Key = 101
h(101, 0) = [101 mod 10 + 1 × 0 + 3 × 0] mod 10
          = [101 mod 10 + 0] mod 10
          = 1 mod 10
          = 1
Since T[1] is already occupied, the key 101 cannot be stored in T[1]. Therefore, try again for
the next location. Thus probe, i = 1, this time.
h(101, 1) = [101 mod 10 + 1 × 1 + 3 × 1] mod 10
          = [101 mod 10 + 1 + 3] mod 10
          = [101 mod 10 + 4] mod 10
          = [1 + 4] mod 10
          = 5 mod 10
          = 5
Since T[5] is vacant, insert the key 101 in T[5]. The hash table now becomes:

Index: 0   1   2   3   4   5   6   7   8   9
Value: –1  81  72  63  24  101 36  27  –1  –1
Pros and Cons
Quadratic probing resolves the primary clustering problem that exists in the linear probing
technique. Quadratic probing provides good memory caching because it preserves some
locality of reference. But linear probing does this task better and gives a better cache
performance.
One of the major drawbacks of quadratic probing is that a sequence of successive probes
may only explore a fraction of the table, and this fraction may be quite small. If this happens,
then we will not be able to find an empty location in the table despite the fact that the table is
by no means full. In the example above, try to insert the key 92 and you will encounter this
problem.
Although quadratic probing is free from primary clustering, it is still liable to what is
known as secondary clustering. It means that if there is a collision between two keys, then
the same probe sequence will be followed for both. With quadratic probing, the probability of
multiple collisions increases as the table becomes full. This situation is usually encountered
when the hash table is more than half full.
Quadratic probing is widely applied in the Berkeley Fast File System to allocate free blocks.
Limitation: at most half of the table can be used as alternative locations to resolve collisions.
This means that once the table is more than half full, it's difficult to find an
empty spot. This new problem is known as secondary clustering because
elements that hash to the same hash key will always probe the same alternative
cells.
DOUBLE HASHING
Double hashing uses the idea of applying a second hash function to the key when a collision
occurs. The result of the second hash function will be the number of positions forms the point
of collision to insert.
There are a couple of requirements for the second function:
It must never evaluate to 0 must make sure that all cells can be probed.
Double hashing can be done using :
(hash1(key) + i * hash2(key)) % TABLE_SIZE
Here hash1() and hash2() are hash functions and TABLE_SIZE is size of hash table.
(We repeat by increasing i when collision occurs)
First hash function is typically hash1(key) = key % TABLE_SIZE
A popular second hash function is : hash2(key) = PRIME – (key % PRIME) where PRIME
is a prime smaller than the TABLE_SIZE.
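A minimal sketch of insertion with double hashing follows (PRIME = 7 is an assumed prime
smaller than the table size, and SIZE and EMPTY are the same assumed conventions as in the
earlier probing sketches):

#define PRIME 7

/* Second hash function: returns a value in 1..PRIME, so it never evaluates to 0 */
int hash2(int key)
{
    return PRIME - (key % PRIME);
}

/* Probe (hash1(key) + i * hash2(key)) % SIZE until a vacant cell is found */
int insert_double(int T[], int key)
{
    int i, pos;
    for (i = 0; i < SIZE; i++)
    {
        pos = (key % SIZE + i * hash2(key)) % SIZE;
        if (T[pos] == EMPTY)
        {
            T[pos] = key;
            return pos;
        }
    }
    return -1;
}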
REHASHING
When the hash table becomes nearly full, the number of collisions increases, thereby
degrading the performance of insertion and search operations. In such cases, a better option
is to create a new hash table with size double of the original hash table.
All the entries in the original hash table will then have to be moved to the new hash table.
This is done by taking each entry, computing its new hash value, and then inserting it in the
new hash table.
Though rehashing seems to be a simple process, it is quite expensive and must therefore
not be done frequently.
Advantages:
A programmer doesn't have to worry about the table size.
Simple to implement.
Can be used in other data structures as well.
The new size of the hash table:
should also be prime
will be used to calculate the new insertion spot (hence the name rehashing)
This is a very expensive operation! It is O(N), since there are N elements to rehash and the
table size is roughly 2N. This is acceptable though, since it doesn't happen that often.
The question becomes: when should rehashing be applied?
Some possible answers:
- once the table becomes half full
- once an insertion fails
- once a specific load factor has been reached, where the load factor is the ratio
of the number of elements in the hash table to the table size
How is Rehashing done?
Rehashing can be done as follows (a sketch is given after this list):
 For each addition of a new entry to the map, check the load factor.
 If it's greater than its pre-defined value (or the default value of 0.75 if not given), then
Rehash.
 For a Rehash, make a new array of double the previous size and make it the new
bucket array.
 Then traverse each element in the old bucket array and call insert() for each, so as
to insert it into the new, larger bucket array.
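The steps above can be sketched as follows for an open-addressed table of integers (the
function name rehash, the use of –1 for an empty cell, and linear probing in the new table are
all assumptions for illustration; ideally the new size would be the next prime after 2N):

#include <stdlib.h>

/* Allocate a table of double the size and re-insert every old entry */
int *rehash(int *old_table, int old_size, int *new_size)
{
    int i, pos;
    *new_size = 2 * old_size;
    int *new_table = malloc(*new_size * sizeof(int));
    for (i = 0; i < *new_size; i++)
        new_table[i] = -1;                 /* mark every new cell empty */
    for (i = 0; i < old_size; i++)         /* O(N): every key is re-inserted */
    {
        if (old_table[i] == -1)
            continue;
        pos = old_table[i] % *new_size;    /* recompute the hash with the new size */
        while (new_table[pos] != -1)
            pos = (pos + 1) % *new_size;   /* linear probing in the new table */
        new_table[pos] = old_table[i];
    }
    free(old_table);
    return new_table;
}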
Consider the hash table of size 5 given below. The hash function used is h(x) = x % 5.
Rehash the entries into a new hash table.

Index: 0   1   2   3   4
Value: –1  26  31  43  17

Note that the new hash table has 10 locations, double the size of the original table.

Index: 0   1   2   3   4   5   6   7   8   9
Value: –1  –1  –1  –1  –1  –1  –1  –1  –1  –1

Now, rehash the key values from the old hash table into the new one using the hash function
h(x) = x % 10.

Index: 0   1   2   3   4   5   6   7   8   9
Value: –1  31  –1  43  –1  –1  26  17  –1  –1
EXTENDIBLE HASHING
 Extendible Hashing is a mechanism for altering the size of the hash table to
accommodate new entries when buckets overflow.
 A common strategy in internal hashing is to double the hash table and rehash each entry.
However, this technique is slow, because writing all pages to disk is too expensive.
 Therefore, instead of doubling the whole hash table, we use a directory of pointers to
buckets, and double the number of buckets by doubling the directory, splitting just the
bucket that overflows.
 Since the directory is much smaller than the file, doubling it is much cheaper. Only
one page of keys and pointers is split.
Extendible Hashing is a dynamic hashing method wherein directories and buckets are
used to hash data. It is an aggressively flexible method in which the hash function also
experiences dynamic changes.
Main features of Extendible Hashing: The main features in this hashing technique are:
 Directories: The directories store addresses of the buckets in pointers. An id is assigned
to each directory, which may change each time Directory Expansion takes place.
 Buckets: The buckets are used to hash the actual data.
Basic Structure of Extendible Hashing:
Frequently used terms in Extendible Hashing:
 Directories: These containers store pointers to buckets. Each directory is given a
unique id which may change each time expansion takes place. The hash function
returns this directory id, which is used to navigate to the appropriate bucket. Number of
Directories = 2^Global Depth.
 Buckets: They store the hashed keys. Directories point to buckets. A bucket may
have more than one pointer to it if its local depth is less than the global depth.
 Global Depth: It is associated with the directories. It denotes the number of bits
used by the hash function to categorize the keys. Global Depth = Number of
bits in the directory id.
 Local Depth: It is the same as the Global Depth except that the Local Depth
is associated with the buckets and not the directories. The local depth, in accordance
with the global depth, is used to decide the action to be performed in case an overflow
occurs. Local Depth is always less than or equal to the Global Depth.
 Bucket Splitting: When the number of elements in a bucket exceeds a particular size,
then the bucket is split into two parts.
 Directory Expansion: Directory Expansion takes place when a bucket overflows.
Directory Expansion is performed when the local depth of the overflowing bucket is
equal to the global depth (see the sketch after this list).
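These structures can be expressed as a short sketch in C (the type and field names are
assumptions; BUCKET_SIZE = 3 matches the example below):

#define BUCKET_SIZE 3

/* A bucket stores the hashed keys and carries its own local depth */
typedef struct bucket
{
    int local_depth;
    int count;                  /* number of keys currently stored */
    int keys[BUCKET_SIZE];
} Bucket;

/* The directory holds 2^global_depth bucket pointers; several entries may
   point to the same bucket when that bucket's local depth < global depth */
typedef struct directory
{
    int global_depth;
    Bucket **buckets;           /* array of 2^global_depth pointers */
} Directory;

/* The hash function returns the global_depth least significant bits of the key */
int hash_dir(int key, int global_depth)
{
    return key & ((1 << global_depth) - 1);
}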
Basic Working of Extendible Hashing:
 Step 1 – Analyze Data Elements: Data elements may exist in various forms, e.g.
Integer, String, Float, etc. Currently, let us consider data elements of type integer,
e.g.: 49.
 Step 2 – Convert into binary format: Convert the data element into binary form. For
string elements, consider the ASCII equivalent integer of the starting character and then
convert the integer into binary form. Since we have 49 as our data element, its binary
form is 110001.
 Step 3 – Check Global Depth of the directory: Suppose the global depth of the hash
directory is 3.
 Step 4 – Identify the Directory: Consider the 'Global-Depth' number of LSBs in the
binary number and match it to the directory id.
E.g. the binary obtained is 110001 and the global depth is 3. So, the hash function will
return the 3 LSBs of 110001, viz. 001.
 Step 5 – Navigation: Now, navigate to the bucket pointed to by the directory with
directory-id 001.
 Step 6 – Insertion and Overflow Check: Insert the element and check if the bucket
overflows. If an overflow is encountered, go to step 7 followed by Step 8; otherwise, go
to step 9.
 Step 7 – Tackling the Overflow Condition during Data Insertion: Many times, while
inserting data in the buckets, it might happen that the bucket overflows. In such cases,
we need to follow an appropriate procedure to avoid mishandling of data.
First, check if the local depth is less than or equal to the global depth. Then choose one
of the cases below.
  Case 1: If the local depth of the overflowing bucket is equal to the global
  depth, then Directory Expansion, as well as a Bucket Split, needs to be
  performed. Then increment the global depth and the local depth value by 1,
  and assign appropriate pointers.
  Directory expansion will double the number of directories present in the hash
  structure.
  Case 2: In case the local depth is less than the global depth, then only a Bucket
  Split takes place. Then increment only the local depth value by 1, and
  assign appropriate pointers.
 Step 8 – Rehashing of Split Bucket Elements: The elements present in the
overflowing bucket that is split are rehashed w.r.t. the new global depth of the directory.
 Step 9 – The element is successfully hashed.
Example based on Extendible Hashing: Now, let us consider a prominent example of
hashing the following elements: 16, 4, 6, 22, 24, 10, 31, 7, 9, 20, 26.
Bucket Size: 3 (Assume)
Hash Function: Suppose the global depth is X. Then the Hash Function returns the X LSBs.
Solution: First, calculate the binary forms of each of the given numbers.
16 - 10000
4  - 00100
6  - 00110
22 - 10110
24 - 11000
10 - 01010
31 - 11111
7  - 00111
9  - 01001
20 - 10100
26 - 11010
● Initially, the global depth and local depth are always 1. Thus, the hashing frame looks
like this:
● Inserting 16:
The binary format of 16 is 10000 and the global depth is 1. The hash function returns 1
LSB of 10000, which is 0. Hence, 16 is mapped to the directory with id = 0.
● Inserting 4 and 6:
Both 4 (100) and 6 (110) have 0 in their LSB. Hence, they are hashed as follows:
● Inserting 22: The binary form of 22 is 10110. Its LSB is 0. The bucket pointed to by
directory 0 is already full. Hence, an overflow occurs.
As directed by Step 7 - Case 1, since Local Depth = Global Depth, the bucket splits and
directory expansion takes place. Also, rehashing of the numbers present in the overflowing
bucket takes place after the split. And, since the global depth is incremented by 1, now the
global depth is 2. Hence, 16, 4, 6, 22 are now rehashed w.r.t. 2 LSBs
[ 16(10000), 4(100), 6(110), 22(10110) ].
*Notice that the bucket which did not overflow has remained untouched. But, since the
number of directories has doubled, we now have 2 directories, 01 and 11, pointing to the
same bucket. This is because the local depth of the bucket has remained 1. And, any bucket
having a local depth less than the global depth is pointed to by more than one directory.
● Inserting 24 and 10: 24 (11000) and 10 (1010) can be hashed based on the directories
with ids 00 and 10. Here, we encounter no overflow condition.
● Inserting 31, 7, 9: All of these elements [ 31(11111), 7(111), 9(1001) ] have either 01 or
11 in their LSBs. Hence, they are mapped onto the buckets pointed to by 01 and 11. We
do not encounter any overflow condition here.
● Inserting 20: Insertion of data element 20 (10100) will again cause the overflow
problem.
20 is inserted in the bucket pointed to by 00. As directed by Step 7 - Case 1, since
the local depth of the bucket = the global depth, directory expansion (doubling) takes place
along with bucket splitting. Elements present in the overflowing bucket are rehashed with
the new global depth. Now, the new hash table looks like this:
● Inserting 26: The global depth is 3. Hence, the 3 LSBs of 26 (11010) are considered.
Therefore, 26 best fits in the bucket pointed to by directory 010.
The bucket overflows, and, as directed by Step 7 - Case 2, since the local depth of the
bucket < the global depth (2 < 3), the directories are not doubled; only the bucket is split
and elements are rehashed.
Finally, the output of hashing the given list of numbers is obtained.
Hashing of 11 numbers is thus completed.
Key Observations:
1. A bucket will have more than one pointer pointing to it if its local depth is less than
the global depth.
2. When an overflow condition occurs in a bucket, all the entries in the bucket are rehashed
with a new local depth.
3. If the local depth of the overflowing bucket is equal to the global depth, both a bucket
split and a directory expansion take place; otherwise, only the bucket is split.
4. The size of a bucket cannot be changed after the data insertion process begins.
Advantages:
1. Data retrieval is less expensive (in terms of computing).
2. No problem of data loss, since the storage capacity increases dynamically.
3. With dynamic changes in the hashing function, associated old values are rehashed w.r.t.
the new hash function.
Limitations of Extendible Hashing:
1. The directory size may increase significantly if several records are hashed to the same
directory while the record distribution remains non-uniform.
2. The size of every bucket is fixed.
3. Memory is wasted in pointers when the difference between the global depth and local
depth becomes drastic.
4. This method is complicated to code.