UNIT III SORTING AND SEARCHING

Bubble sort – selection sort – insertion sort – merge sort – quick sort – analysis of sorting algorithms – linear search – binary search – hashing – hash functions – collision handling – load factors, rehashing, and efficiency

SORTING:
Sorting is arranging the given elements either in ascending order or descending order. A sorting algorithm rearranges a given array or list of elements according to a comparison operator on the elements. Sorted data is easier to search through quickly; the simplest everyday example of sorted data is a dictionary.

CATEGORIES OF SORTING:
The techniques of sorting can be divided into two categories:
1. Internal sorting
2. External sorting

Internal sorting: If the input data can fit in the main memory all at once, the sorting is called internal sorting.

External sorting: If the input data cannot fit in memory entirely at once and must be kept on a hard disk or other storage device while sorting, the sorting is called external sorting.

Different Sorting Algorithms
1. Bubble Sort
2. Selection Sort
3. Insertion Sort
4. Merge Sort
5. Quicksort
6. Counting Sort
7. Radix Sort
8. Bucket Sort
9. Heap Sort
10. Shell Sort

Complexity of Sorting Algorithms
The efficiency of any sorting algorithm is determined by its time complexity and space complexity.

1. Time Complexity: Time complexity refers to the time taken by an algorithm to complete its execution with respect to the size of the input. It can be represented in different forms:
   Big-O notation (O)
   Omega notation (Ω)
   Theta notation (Θ)

2. Space Complexity: Space complexity refers to the total amount of memory used by the algorithm for a complete execution. It includes both the auxiliary memory and the input.

BUBBLE SORT
Bubble sort is a sorting algorithm that compares two adjacent elements and swaps them until they are in the intended order. It works by repeatedly swapping adjacent elements that are out of order: it compares one pair at a time and swaps if the first element is greater than the second; otherwise it moves on to the next pair of elements.

Working of Bubble Sort
Suppose we are trying to sort the elements in ascending order.

First Iteration (Compare and Swap)
1. Starting from the first index, compare the first and the second elements.
2. If the first element is greater than the second element, they are swapped.
3. Now, compare the second and the third elements. Swap them if they are not in order.
4. The above process goes on until the last element.

Remaining Iterations
1. The same process goes on for the remaining iterations.
2. After each iteration, the largest element among the unsorted elements is placed at the end.

Example
We create a list that stores integer numbers:
list1 = [5, 3, 8, 6, 7, 2]

First iteration
[5, 3, 8, 6, 7, 2]  5 > 3, so they are swapped.
[3, 5, 8, 6, 7, 2]  5 < 8, so no swap takes place.
[3, 5, 8, 6, 7, 2]  8 > 6, so they are swapped.
[3, 5, 6, 8, 7, 2]  8 > 7, so they are swapped.
[3, 5, 6, 7, 8, 2]  8 > 2, so they are swapped.
[3, 5, 6, 7, 2, 8]
The first iteration is complete, and the largest element, 8, is placed at the end.
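The comparisons just traced make up one pass of the inner loop. As a minimal sketch (the full bubble sort program appears after the walkthrough below), a single pass can be written as:

# One bubble pass: compare each adjacent pair and swap if out of order.
# After this pass the largest of the first n elements sits at index n-1.
def bubble_pass(items, n):
    for j in range(n - 1):
        if items[j] > items[j + 1]:
            items[j], items[j + 1] = items[j + 1], items[j]

list1 = [5, 3, 8, 6, 7, 2]
bubble_pass(list1, len(list1))
print(list1)   # [3, 5, 6, 7, 2, 8], matching the trace above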
The same process is repeated for len(list1) - 1 iterations in total.

Second Iteration
[3, 5, 6, 7, 2, 8] -> [3, 5, 6, 7, 2, 8]  3 < 5, no swap takes place.
[3, 5, 6, 7, 2, 8] -> [3, 5, 6, 7, 2, 8]  5 < 6, no swap takes place.
[3, 5, 6, 7, 2, 8] -> [3, 5, 6, 7, 2, 8]  6 < 7, no swap takes place.
[3, 5, 6, 7, 2, 8] -> [3, 5, 6, 2, 7, 8]  7 > 2, so they are swapped.
[3, 5, 6, 2, 7, 8] -> [3, 5, 6, 2, 7, 8]  7 < 8, no swap takes place.

Third Iteration
[3, 5, 6, 2, 7, 8] -> [3, 5, 6, 2, 7, 8]  3 < 5, no swap takes place.
[3, 5, 6, 2, 7, 8] -> [3, 5, 6, 2, 7, 8]  5 < 6, no swap takes place.
[3, 5, 6, 2, 7, 8] -> [3, 5, 2, 6, 7, 8]  6 > 2, so they are swapped.
[3, 5, 2, 6, 7, 8] -> [3, 5, 2, 6, 7, 8]  6 < 7, no swap takes place.
[3, 5, 2, 6, 7, 8] -> [3, 5, 2, 6, 7, 8]  7 < 8, no swap takes place.

Fourth Iteration
[3, 5, 2, 6, 7, 8] -> [3, 5, 2, 6, 7, 8]  3 < 5, no swap takes place.
[3, 5, 2, 6, 7, 8] -> [3, 2, 5, 6, 7, 8]  5 > 2, so they are swapped.
[3, 2, 5, 6, 7, 8] -> [3, 2, 5, 6, 7, 8]  5 < 6, no swap takes place.
[3, 2, 5, 6, 7, 8] -> [3, 2, 5, 6, 7, 8]  6 < 7, no swap takes place.
[3, 2, 5, 6, 7, 8] -> [3, 2, 5, 6, 7, 8]  7 < 8, no swap takes place.

Fifth Iteration
[3, 2, 5, 6, 7, 8] -> [2, 3, 5, 6, 7, 8]  3 > 2, so they are swapped.
The list is now sorted.

Bubble Sort Example:

def bsort(array):
    for i in range(len(array) - 1):
        for j in range(0, len(array) - i - 1):
            if array[j] > array[j + 1]:
                array[j], array[j + 1] = array[j + 1], array[j]
    return array

list1 = [5, 3, 8, 6, 7, 2]
print("The unsorted list is", list1)
print("The sorted list is", bsort(list1))

OUTPUT:
The unsorted list is [5, 3, 8, 6, 7, 2]
The sorted list is [2, 3, 5, 6, 7, 8]

Time Complexity
Worst Case Complexity: O(n²). The worst case occurs when we want to sort in ascending order but the array is in descending order.
Best Case Complexity: O(n). If the array is already sorted, one pass with no swaps is enough (this bound assumes the early-exit check sketched below).
Average Case Complexity: O(n²). It occurs when the elements of the array are in jumbled order (neither ascending nor descending).

Space Complexity
Space complexity is O(1) because only one extra variable is used for swapping.
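Note that the bsort function above always performs every pass, so it does O(n²) comparisons even on sorted input. The O(n) best case comes from stopping as soon as a full pass makes no swap. A minimal sketch of that standard optimization (the flag name is our own):

def bsort_early_exit(array):
    for i in range(len(array) - 1):
        swapped = False                      # track whether this pass swapped anything
        for j in range(0, len(array) - i - 1):
            if array[j] > array[j + 1]:
                array[j], array[j + 1] = array[j + 1], array[j]
                swapped = True
        if not swapped:                      # no swaps: the list is already sorted
            break
    return array

print(bsort_early_exit([1, 2, 3, 4, 5]))     # one pass only -> [1, 2, 3, 4, 5]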
SELECTION SORT
Selection sort is a sorting algorithm that selects the smallest element from the unsorted part of the list in each iteration and places it at the beginning of the unsorted part. In other words, it sorts an array by repeatedly finding the minimum element (considering ascending order) from the unsorted part and putting it at the beginning. The algorithm maintains two subarrays in a given array:
1) The subarray which is already sorted.
2) The remaining subarray which is unsorted.
In every iteration of selection sort, the minimum element (considering ascending order) from the unsorted subarray is picked and moved to the sorted subarray.

Working of Selection Sort
1. Set the first element as minimum.
2. Compare minimum with the second element. If the second element is smaller than minimum, assign the second element as minimum.
3. Compare minimum with the third element. Again, if the third element is smaller, then assign minimum to the third element; otherwise do nothing. The process goes on until the last element.
4. After each iteration, minimum is placed at the front of the unsorted list by swapping it with the first unsorted element.
5. For each iteration, indexing starts from the first unsorted element. Steps 1 to 4 are repeated until all the elements are placed at their correct positions.

Selection Sort Example:

def ssort(a):
    for i in range(len(a)):
        min = i                      # index of the smallest element seen so far
        for j in range(i + 1, len(a)):
            if a[j] < a[min]:
                min = j
        a[i], a[min] = a[min], a[i]  # move the minimum to the front of the unsorted part
    return a

list1 = [8, 3, 9, 5, 1]
print("The unsorted list of elements is", list1)
print("The sorted list of elements is", ssort(list1))

OUTPUT:
The unsorted list of elements is [8, 3, 9, 5, 1]
The sorted list of elements is [1, 3, 5, 8, 9]

Time Complexity
Worst Case Complexity: O(n²). If we want to sort in ascending order and the array is in descending order, the worst case occurs.
Best Case Complexity: O(n²). The full scan for the minimum happens even when the array is already sorted.
Average Case Complexity: O(n²). It occurs when the elements of the array are in jumbled order (neither ascending nor descending).
Space Complexity: O(1), because only a constant amount of extra storage is used for swapping.

INSERTION SORT:
Insertion sort is a simple method of sorting an array. In this technique, the array is virtually split into a sorted part and an unsorted part. An element from the unsorted part is picked and placed at its correct position in the sorted part. Insertion sort works the way we sort cards in our hand in a card game: we assume the first card is already sorted, then select an unsorted card. If the unsorted card is greater than the card in hand, it is placed to the right; otherwise, to the left. In the same way, the other unsorted cards are taken and put in their right place.

Working of Insertion Sort
1. The first element in the array is assumed to be sorted.
2. Take the second element and store it separately in key. Compare key with the first element. If the first element is greater than key, key is placed in front of the first element. Now the first two elements are sorted.
3. Take the third element and compare it with the elements on its left. Place it just behind the first element smaller than it. If there is no element smaller than it, place it at the beginning of the array.
4. Similarly, place every unsorted element at its correct position.

Example Program:

def isort(array):
    for i in range(1, len(array)):
        key = array[i]               # element to insert into the sorted part
        j = i - 1
        while j >= 0 and key < array[j]:
            array[j + 1] = array[j]  # shift larger elements one position right
            j -= 1
        array[j + 1] = key
    return array

list1 = [2, 6, 8, 4, 1]
print("The given list is", list1)
print("The sorted list is", isort(list1))

OUTPUT:
The given list is [2, 6, 8, 4, 1]
The sorted list is [1, 2, 4, 6, 8]

Insertion Sort Complexity
Time Complexity: Best O(n), Worst O(n²), Average O(n²)
Space Complexity: O(1)
MERGE SORT
Merge sort is one of the most popular sorting algorithms, based on the principle of the Divide and Conquer approach: a problem is divided into multiple sub-problems, each sub-problem is solved individually, and finally the sub-solutions are combined to form the final solution.

Divide and Conquer Strategy
Using the Divide and Conquer technique, we divide a problem into sub-problems. When the solution to each sub-problem is ready, we 'combine' the results from the sub-problems to solve the main problem. Suppose we had to sort an array A. A sub-problem would be to sort a sub-section of this array starting at index p and ending at index r, denoted as A[p..r].

Divide: If q is the half-way point between p and r, then we can split the subarray A[p..r] into two arrays A[p..q] and A[q+1..r].

Conquer: In the conquer step, we sort both the subarrays A[p..q] and A[q+1..r]. If we haven't yet reached the base case, we again divide both these subarrays and try to sort them.

Combine: When the conquer step reaches the base case and we get two sorted subarrays A[p..q] and A[q+1..r] of array A[p..r], we combine the results by creating a sorted array A[p..r] from the two sorted subarrays.

Merge Sort Algorithm
The merge sort function repeatedly divides the array into two halves until we reach a stage where the subarray has size 1, i.e. p == r. After that, the merge function comes into play and combines the sorted arrays into larger arrays until the whole array is merged. To sort an entire array, we call MergeSort(A, 0, length(A)-1). The algorithm recursively divides the array into halves until it reaches the base case of an array with one element; after that, the merge function picks up the sorted subarrays and merges them to gradually sort the entire array.

The Merge Step of Merge Sort
The merge step solves the simple problem of merging two sorted lists (arrays) to build one large sorted list (array). The algorithm maintains three pointers: one for each of the two input arrays and one for the current index of the final sorted array. The merge function works as follows:
1. Create copies of the subarrays L ← A[p..q] and M ← A[q+1..r].
2. Create three pointers i, j and k:
   a. i maintains the current index of L, starting at the first element.
   b. j maintains the current index of M, starting at the first element.
   c. k maintains the current index of A[p..r], starting at p.
3. Until we reach the end of either L or M, pick the smaller of the current elements of L and M and place it at the next position of A[p..r].
4. When we run out of elements in either L or M, pick up the remaining elements and put them in A[p..r].

Example Program:

def mergesort(array):
    if len(array) > 1:
        q = len(array) // 2
        l = array[:q]              # left half
        r = array[q:]              # right half
        mergesort(l)
        mergesort(r)
        i = j = k = 0
        # merge: repeatedly take the smaller front element of l and r
        while i < len(l) and j < len(r):
            if l[i] < r[j]:
                array[k] = l[i]
                i += 1
            else:
                array[k] = r[j]
                j += 1
            k += 1
        # copy any leftover elements
        while i < len(l):
            array[k] = l[i]
            i += 1
            k += 1
        while j < len(r):
            array[k] = r[j]
            j += 1
            k += 1

def printlist(a):
    for i in range(len(a)):
        print(a[i])

array = [6, 5, 12, 10, 9, 1]
print("The unsorted array is", array)
mergesort(array)
print("Sorted array is:")
printlist(array)

OUTPUT:
The unsorted array is [6, 5, 12, 10, 9, 1]
Sorted array is:
1
5
6
9
10
12
QUICKSORT ALGORITHM
Quicksort is a sorting algorithm based on the divide and conquer approach, in which:
1. An array is divided into subarrays by selecting a pivot element (an element selected from the array). While dividing the array, the pivot element should be positioned so that elements less than the pivot are kept on its left side and elements greater than the pivot are on its right side.
2. The left and right subarrays are divided using the same approach. This process continues until each subarray contains a single element.
3. At this point, the elements are already sorted. Finally, the elements are combined to form a sorted array.

Divide: First pick a pivot element. Then partition (rearrange) the array into two subarrays such that each element in the left subarray is less than or equal to the pivot element and each element in the right subarray is greater than the pivot element.
Conquer: Recursively sort the two subarrays with quicksort.
Combine: Combine the already sorted subarrays.

Quicksort picks an element as pivot and partitions the given array around the picked pivot: one part holds values smaller than the pivot, the other holds values greater than the pivot. The left and right subarrays are then partitioned using the same approach, and this continues until a single element remains in each subarray.

Working of Quick Sort Algorithm
To understand the working of quicksort, let's take an unsorted array. Let the elements of the array be:

[24, 9, 29, 14, 19, 27]

In the given array, we consider the leftmost element as the pivot. So, in this case, a[left] = 24, a[right] = 27 and a[pivot] = 24.

Since the pivot is at the left, the algorithm starts from the right and moves towards the left. As a[pivot] < a[right], the right marker moves one position to the left:
[24, 9, 29, 14, 19, 27], now a[left] = 24, a[right] = 19, a[pivot] = 24.
Because a[pivot] > a[right], the algorithm swaps a[pivot] with a[right], and the pivot moves to the right marker:
[19, 9, 29, 14, 24, 27], now a[left] = 19, a[right] = 24, a[pivot] = 24.
Since the pivot is at the right, the algorithm starts from the left and moves to the right. As a[pivot] > a[left], the left marker moves one position to the right:
[19, 9, 29, 14, 24, 27], now a[left] = 9, a[right] = 24, a[pivot] = 24.
As a[pivot] > a[left], the left marker moves one position to the right:
[19, 9, 29, 14, 24, 27], now a[left] = 29, a[right] = 24, a[pivot] = 24.
As a[pivot] < a[left], swap a[pivot] and a[left]; the pivot is now at the left marker:
[19, 9, 24, 14, 29, 27], now a[left] = 24, a[right] = 29, a[pivot] = 24.
Since the pivot is at the left, the algorithm starts from the right and moves to the left. As a[pivot] < a[right], the right marker moves one position to the left:
[19, 9, 24, 14, 29, 27], now a[pivot] = 24, a[left] = 24, a[right] = 14.
As a[pivot] > a[right], swap a[pivot] and a[right]; the pivot is now at the right marker:
[19, 9, 14, 24, 29, 27], now a[pivot] = 24, a[left] = 14, a[right] = 24.
The pivot is at the right, so the algorithm starts from the left and moves to the right. After one step, a[pivot] = 24, a[left] = 24, and a[right] = 24: pivot, left and right all point to the same element, which marks the termination of the procedure.

Element 24, the pivot element, is now placed at its exact position: the elements on the right side of 24 are greater than it, and the elements on the left side are smaller than it. In a similar manner, the quicksort algorithm is applied separately to the left subarray [19, 9, 14] and the right subarray [29, 27]. After sorting is done, the array will be:

[9, 14, 19, 24, 27, 29]

Quicksort Complexity
1. Time Complexity
   Best Case: O(n log n)
   Average Case: O(n log n)
   Worst Case: O(n²)
2. Space Complexity: O(log n), for the recursion stack.

To summarize: quicksort splits an array into smaller arrays using the divide-and-conquer strategy. When splitting, the pivot element is placed so that items greater than the pivot end up on its right side and items less than the pivot on its left side. The same method is applied to the left and right subarrays, and the procedure repeats until only one element remains in each subarray, at which point the elements are sorted.

Example Program:
def quicksort(number, first, last):
    if first < last:
        pivot = first                      # choose the leftmost element as pivot
        i = first
        j = last
        while i < j:
            # move i right past elements <= pivot
            while number[i] <= number[pivot] and i < last:
                i = i + 1
            # move j left past elements > pivot
            while number[j] > number[pivot]:
                j = j - 1
            if i < j:
                number[i], number[j] = number[j], number[i]
        # place the pivot at its final position
        number[pivot], number[j] = number[j], number[pivot]
        quicksort(number, first, j - 1)
        quicksort(number, j + 1, last)

data = [1, 7, 4, 1, 10, 9, 2]
print("Unsorted Array")
print(data)
size = len(data)
quicksort(data, 0, size - 1)
print('Sorted Array in Ascending Order:')
print(data)

OUTPUT:
Unsorted Array
[1, 7, 4, 1, 10, 9, 2]
Sorted Array in Ascending Order:
[1, 1, 2, 4, 7, 9, 10]

Let's see a complexity analysis of the different sorting algorithms.

Sorting Algorithm    Best         Worst        Average      Space Complexity
Bubble Sort          O(n)         O(n²)        O(n²)        O(1)
Selection Sort       O(n²)        O(n²)        O(n²)        O(1)
Insertion Sort       O(n)         O(n²)        O(n²)        O(1)
Merge Sort           O(n log n)   O(n log n)   O(n log n)   O(n)
Quicksort            O(n log n)   O(n²)        O(n log n)   O(log n)

SEARCHING
Searching is an operation or technique that finds the place of a given element or value in a list. A search is said to be successful or unsuccessful depending on whether the element being searched for is found or not. The standard searching techniques in data structures are:
1. Linear Search (Sequential Search)
2. Binary Search

Linear Search
Linear search is a sequential search algorithm that starts at one end and goes through each element of a list until the desired element is found; otherwise the search continues till the end of the data set. It is the simplest searching algorithm.

Working of Linear Search
To understand the working of the linear search algorithm, let's take an unsorted array, and let the element to be searched be K = 41. Start from the first element and compare K with each element of the array. The value of K, i.e. 41, does not match the first element of the array, so we move to the next element and follow the same process until the matching element is found. When it is found, the algorithm returns the index of the matched element.

Linear Search Complexity
Time Complexity: Best Case O(1), Average Case O(n), Worst Case O(n)
Space Complexity: O(1)

Example Program:

def linear_Search(list1, n, key):
    for i in range(0, n):
        if list1[i] == key:
            return i       # key found at index i
    return -1              # key not present

list1 = [1, 3, 5, 4, 7, 9]
key = 3
n = len(list1)
res = linear_Search(list1, n, key)
if res == -1:
    print("Element not found")
else:
    print("Element found at index: ", res)

OUTPUT:
Element found at index:  2

BINARY SEARCH:
Binary search works on a sorted array; its recursive formulation follows the divide and conquer approach. At each step it compares the key with the middle element and discards the half that cannot contain the key. Consider a sorted array of nine elements (indices 0 to 8), and let the element to search for be K = 56. The mid of the current search range is calculated with the formula:

mid = (beg + end) / 2

So, in the given array, beg = 0, end = 8, and mid = (0 + 8)/2 = 4. K is compared with the element at index mid: if K is greater, the search continues in the right half with beg = mid + 1; if K is smaller, it continues in the left half with end = mid - 1; and mid is recomputed each time. The halving repeats until the element at mid equals K. The element to be searched is then found, and the algorithm returns the index of the matched element.
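The example program below uses an iterative loop; the recursive formulation mentioned above can be written as a minimal sketch, with our own function name:

def binary_search_rec(a, key, low, high):
    if low > high:
        return -1                        # range empty: key not present
    mid = (low + high) // 2
    if a[mid] == key:
        return mid
    elif a[mid] < key:                   # key can only be in the right half
        return binary_search_rec(a, key, mid + 1, high)
    else:                                # key can only be in the left half
        return binary_search_rec(a, key, low, mid - 1)

a = [12, 24, 32, 39, 45, 50, 54]
print(binary_search_rec(a, 45, 0, len(a) - 1))   # 4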
Binary Search Complexity
Time Complexity: Best Case O(1), Average Case O(log n), Worst Case O(log n)
Space Complexity: O(1)

Example Program:

def binary_search(list1, key):
    low = 0
    high = len(list1) - 1
    while low <= high:
        mid = (high + low) // 2
        if list1[mid] < key:
            low = mid + 1      # search the right half
        elif list1[mid] > key:
            high = mid - 1     # search the left half
        else:
            return mid         # key found
    return -1                  # key not present

list1 = [12, 24, 32, 39, 45, 50, 54]
key = 45
result = binary_search(list1, key)
if result != -1:
    print("Element is present")
else:
    print("Element is not present in list1")

OUTPUT:
Element is present

HASHING:
Hashing is a technique or process of mapping keys and values into a hash table by using a hash function. It is done for faster access to elements. The efficiency of the mapping depends on the efficiency of the hash function used.

Hash Table
The hash table data structure stores elements in key-value pairs, where
Key - a unique integer that is used for indexing the values
Value - the data associated with the key.

HASH FUNCTION:
A hash function is a function that converts a given numeric or alphanumeric key to a small practical integer value. The mapped integer value is used as an index in the hash table. In simple terms, a hash function maps a significant number or string to a small integer that can be used as the index in the hash table. The pair is of the form (key, value): for a given key, one can find the value using a "function" that maps keys to values. For example, given an array A, if i is the key, then we can find the value by simply looking up A[i].

Types of Hash Functions
Methods to find a hash function:
1. Folding method
2. Mid-square method
3. Pseudo-random number generator method
4. Digit or character extraction method
5. Modulo method

1. Folding method
This method involves splitting the key into two or more parts, each of which has the same length as the required address, and then adding the parts to form the hash value. There are two types:
1.1 Fold shifting method
Key = 123203241. Partition the key into 3 parts of equal length: 123, 203 and 241. Add these three parts: 123 + 203 + 241 = 567 is the hash value.
1.2 Fold boundary method
Similar to fold shifting, except that the boundary parts are reversed. Key = 123203241; the 3 parts are 123, 203 and 241. Reverse the boundary partitions to get 321, 203 and 142. Then 321 + 203 + 142 = 666 is the hash value.

2. Mid-square method
The key is squared, and the middle part of the result is taken as the hash value, based on the number of digits required for addressing: H(X) = middle digits of X².
For example, map the key 2453 into a hash table of size 1000. Now X = 2453 and X² = 6017209. The extracted middle value 172 is the hash value.

3. Digit or character extraction method
This method extracts selected digits from the key.
Example: map the key 123203241 into a hash table of size 1000. Select the digits at positions 2, 5 and 8; the hash value is 204.

4. Pseudo-random number generator method
This method generates a random number given a seed as a parameter, and the resulting random number is scaled into the possible address range using modulo division. The random number produced can be transformed to produce a hash value.

5. Modulo method
Computes the hash value from the key using the % operator:
H(key) = key % table_size
If the input keys are integers, then key mod table_size is a general strategy. For example, map the key 4 into a hash table of size 8: H(4) = 4 % 8 = 4.
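As an illustration, the folding, mid-square and modulo methods above can be sketched in Python as follows; the function names and the part-length parameter are our own assumptions:

def fold_shift(key, part_len=3):
    s = str(key)
    parts = [int(s[i:i + part_len]) for i in range(0, len(s), part_len)]
    return sum(parts)                 # 123203241 -> 123 + 203 + 241 = 567

def fold_boundary(key, part_len=3):
    s = str(key)
    parts = [s[i:i + part_len] for i in range(0, len(s), part_len)]
    parts[0] = parts[0][::-1]         # reverse the first boundary part
    parts[-1] = parts[-1][::-1]       # reverse the last boundary part
    return sum(int(p) for p in parts) # 321 + 203 + 142 = 666

def mid_square(key, digits=3):
    sq = str(key * key)               # 2453^2 = 6017209
    mid = (len(sq) - digits) // 2
    return int(sq[mid:mid + digits])  # middle digits -> 172

def modulo(key, table_size):
    return key % table_size           # 4 % 8 = 4

print(fold_shift(123203241), fold_boundary(123203241), mid_square(2453), modulo(4, 8))
# 567 666 172 4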
COLLISION HANDLING:
A collision is a situation in which the hash function returns the same hash key for more than one record/key value. The process of finding another position for the colliding record is called a collision resolution strategy. Example: with a table of size n = 5, inserting 37, 24 and 7 causes a collision, because 37 % 5 = 2 and 7 % 5 = 2.

There are various collision resolution techniques:
1. Separate chaining / open hashing
2. Open addressing / closed hashing
3. Rehashing
4. Extendible hashing

1. OPEN HASHING / SEPARATE CHAINING
This is an open hashing technique, commonly known as either open hashing or separate chaining: we keep a list of all elements that hash to the same value. Each bucket in the hash table is the head of a linked list, and all elements that hash to the same value are linked together. For convenience, the lists have headers. A pointer field is added to each record location; when an overflow occurs, this pointer is set to the overflow blocks, making a linked list.

Example: insert 10, 11, 81, 7, 34, 94, 17, 29, 89 into a table of size n = 10. With H(key) = key % 10, index 0 holds 10; index 1 holds the chain 11 -> 81; index 4 holds 34 -> 94; index 7 holds 7 -> 17; and index 9 holds 29 -> 89.

Advantages:
1. More elements can be inserted than the table size.
2. Collision resolution is simple and efficient.

Disadvantages:
1. It requires pointers, which occupy extra space.
2. Searching takes more effort, since it takes time to evaluate the hash function and also to traverse the chain.
3. Elements are not evenly distributed.
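A minimal separate-chaining sketch in Python, reproducing the example above; the class and method names are our own:

class ChainedHashTable:
    def __init__(self, size):
        self.size = size
        self.table = [[] for _ in range(size)]   # one chain per bucket

    def insert(self, key):
        index = key % self.size                  # hash function: modulo method
        self.table[index].append(key)            # collisions just extend the chain

    def search(self, key):
        index = key % self.size
        return key in self.table[index]          # walk the chain at that bucket

ht = ChainedHashTable(10)
for k in [10, 11, 81, 7, 34, 94, 17, 29, 89]:
    ht.insert(k)
print(ht.table)       # bucket 1 holds [11, 81], bucket 4 holds [34, 94], ...
print(ht.search(94))  # True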
2. OPEN ADDRESSING / CLOSED HASHING
Open addressing is a closed hashing technique. In this method, if a collision occurs, alternative cells are tried until an empty cell is found. There are three common methods:
1. Linear probing
2. Quadratic probing
3. Double hashing

2.1 Linear Probing
Probing is the process of getting the next available hash table cell. Whenever a collision occurs, cells are searched sequentially (with wrap-around) for an empty cell: when two key values have the same hash key, the second record is placed linearly down from the point of collision, in the first empty index found. If a collision occurs, a new hash value is computed based on a linear function:
Hi(X) = (Hash(X) + i) mod TableSize

Example: insert 42, 39, 69, 21, 71, 55 into a hash table of size 10 using linear probing.
1. H0(42) = 42 % 10 = 2
2. H0(39) = 39 % 10 = 9
3. H0(69) = 69 % 10 = 9, collides with 39
   H1(69) = (9 + 1) % 10 = 0
4. H0(21) = 21 % 10 = 1
5. H0(71) = 71 % 10 = 1, collides with 21
   H1(71) = (1 + 1) % 10 = 2, collides with 42
   H2(71) = (1 + 2) % 10 = 3
6. H0(55) = 55 % 10 = 5

Advantages:
1. It doesn't require pointers.
Disadvantages:
1. It suffers from primary clustering: the clusters that form degrade the performance of the hash table.
2. Increased search time.

2.2 QUADRATIC PROBING:
Quadratic probing is a collision resolution method that eliminates the primary clustering problem of linear probing. (Primary clustering is one of the two major failure modes of open-addressing hash tables, especially those using linear probing: whenever another record is hashed to anywhere within a cluster, the cluster grows in size by one cell.) If a collision occurs, a new hash value is computed based on a quadratic function:
Hi(X) = (Hash(X) + i²) mod TableSize

Example: insert 89, 18, 49, 58, 69 into a hash table of size 10 using quadratic probing.
1. H0(89) = 89 % 10 = 9
2. H0(18) = 18 % 10 = 8
3. H0(49) = 49 % 10 = 9, collides with 89
   H1(49) = (9 + 1²) % 10 = 10 % 10 = 0
4. H0(58) = 58 % 10 = 8, collides with 18
   H1(58) = (8 + 1²) % 10 = 9, collides with 89
   H2(58) = (8 + 2²) % 10 = 12 % 10 = 2
5. H0(69) = 69 % 10 = 9, collides with 89
   H1(69) = (9 + 1²) % 10 = 10 % 10 = 0, collides with 49
   H2(69) = (9 + 2²) % 10 = 13 % 10 = 3

Disadvantage: it faces secondary clustering, and it can be difficult to find an empty slot once the table is more than half full; keys are not inserted evenly.

2.3 Double Hashing
1. Double hashing applies a second hash function to the key when a collision occurs.
2. The result of the second hash function gives the number of positions from the point of collision at which to try the insertion:
Hi(X) = (Hash(X) + i * Hash2(X)) mod TableSize
A popular second hash function is Hash2(X) = R - (X % R), where R is a prime number.

Example: insert 89, 18, 49, 58, 69 using Hash(X) = X % 10, Hash2(X) = 7 - (X % 7) and R = 7.
1. H0(89) = 89 % 10 = 9
2. H0(18) = 18 % 10 = 8
3. H0(49) = 49 % 10 = 9, collides with 89
   H1(49) = ((49 % 10) + 1 * (7 - (49 % 7))) % 10 = 16 % 10 = 6
4. H0(58) = 58 % 10 = 8, collides with 18
   H1(58) = ((58 % 10) + 1 * (7 - (58 % 7))) % 10 = 13 % 10 = 3
5. H0(69) = 69 % 10 = 9, collides with 89
   H1(69) = ((69 % 10) + 1 * (7 - (69 % 7))) % 10 = 10 % 10 = 0

Separate Chaining vs. Open Addressing
1. Separate chaining is simpler to implement; open addressing requires more computation.
2. In chaining, the hash table never fills up, and we can always add more elements to a chain; in open addressing, the table may become full.
3. Chaining is less sensitive to the hash function or load factor; open addressing requires extra care to avoid clustering and a high load factor.
4. Chaining is mostly used when it is unknown how many and how frequently keys may be inserted or deleted; open addressing is used when the frequency and number of keys are known.
5. Cache performance of chaining is poor because keys are stored in linked lists; open addressing provides better cache performance because everything is stored in the same table.
6. Chaining can waste space (some parts of the hash table are never used); in open addressing, a slot can be used even if no input maps to it.
7. Chaining uses extra space for links; there are no links in open addressing.
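The three probing strategies differ only in how the i-th probe position is computed. A compact sketch with our own helper names, reproducing the double hashing example above:

def probe_index(key, i, size, method):
    h = key % size                                  # primary hash: modulo method
    if method == "linear":
        return (h + i) % size                       # Hi(X) = (Hash(X) + i) mod size
    if method == "quadratic":
        return (h + i * i) % size                   # Hi(X) = (Hash(X) + i^2) mod size
    R = 7                                           # prime for the second hash
    h2 = R - (key % R)                              # Hash2(X) = R - (X % R)
    return (h + i * h2) % size                      # double hashing

def insert(table, key, method):
    for i in range(len(table)):                     # try successive probe positions
        idx = probe_index(key, i, len(table), method)
        if table[idx] is None:                      # empty cell found
            table[idx] = key
            return idx
    raise RuntimeError("table is full")

table = [None] * 10
for k in [89, 18, 49, 58, 69]:
    insert(table, k, "double")
print(table)   # [69, None, None, 58, None, None, 49, None, 18, 89]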
LOAD FACTOR AND REHASHING

Complexity and Load Factor
If there are n entries and b is the size of the bucket array, there would be n/b entries on each index on average. This value n/b is called the load factor, and it represents the load on our map. The load factor needs to be kept low, so that the number of entries at any one index stays small and the complexity stays almost constant, i.e. O(1).

REHASHING:
1. Rehashing is used with closed hashing techniques.
2. If the table gets too full, the rehashing method builds a new table that is about twice as big, scans down the entire original hash table, computes the new hash value for each element, and inserts it into the new table.
3. Rehashing is expensive: the running time is O(N), since there are N elements to rehash and the new table size is roughly 2N.

Rehashing can be triggered in several ways, for example:
1. Rehash as soon as the table is half full.
2. Rehash only when an insertion fails.

Example: suppose the elements 13, 15, 24, 6 are inserted into an open addressing hash table of size 7, using linear probing when a collision occurs (6 % 7 = 6 collides with 13 and wraps around to index 0):

Index:  0    1    2    3    4    5    6
Value:  6    15   -    24   -    -    13

If 23 is inserted (23 % 7 = 2), the resulting table will be over 70 percent full:

Index:  0    1    2    3    4    5    6
Value:  6    15   23   24   -    -    13

Therefore a new table is created. The size of the new table is 17, as this is the first prime number that is larger than twice the old table size, and the old elements are inserted into it using the new hash function.

Advantages:
1. The programmer doesn't have to worry about the table size.
2. It is simple to implement.

Why rehashing?
Rehashing is done because whenever key-value pairs are inserted into the map, the load factor increases, which implies that the time complexity also increases, as explained above. This might not give the required time complexity of O(1). Hence rehashing must be done, increasing the size of the bucket array so as to reduce the load factor and the time complexity.
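A minimal sketch of rehashing built on linear probing; the class name, the 0.7 trigger threshold and the prime helper are our own choices, following the "over 70 percent full" example above:

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

class LinearProbeTable:
    def __init__(self, size=7):
        self.size = size
        self.table = [None] * size
        self.count = 0

    def insert(self, key):
        if (self.count + 1) / self.size > 0.7:   # load factor about to exceed 0.7
            self._rehash()
        i = key % self.size
        while self.table[i] is not None:         # linear probing with wrap-around
            i = (i + 1) % self.size
        self.table[i] = key
        self.count += 1

    def _rehash(self):
        new_size = 2 * self.size + 1
        while not is_prime(new_size):            # first prime past double the size
            new_size += 1
        old = [k for k in self.table if k is not None]
        self.size, self.table, self.count = new_size, [None] * new_size, 0
        for k in old:                            # reinsert with the new hash
            self.insert(k)

t = LinearProbeTable()
for k in [13, 15, 24, 6, 23]:
    t.insert(k)
print(t.size)   # 17: grown from 7 before inserting 23 pushed the load factor past 0.7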