Design and Analysis of Optimized Counting Sort Algorithm (OCSA)

Tanvi Puri#1, Anuj Kumar Jain*2, Anjana Sangwan#3
#1, #3 Department of Computer Science, Swami Keshwanand Institute of Technology, Jaipur, INDIA
1 Tanvipuri45@gmail.com, 3 sangwan.anjana@gmail.com
*2 Department of Computer Science, Institute of Engineering & Technology, Alwar, INDIA
*2 anujjaingit@gmail.com

Abstract- In computer science, sorting techniques are used to place the elements of a list in a certain order. Several sorting algorithms with various time and space complexities exist and are in use. In this paper we propose a new optimized counting sort algorithm, which is based on counting the occurrences of each element in the list. We also compare it with Count Sort, which runs in linear time. We have used MATLAB for the implementation and for the analysis of the CPU time taken by optimized counting sort and counting sort. To check the correctness of the algorithms we generated random input sequences of length 10 to 1,000,000. The results show that the performance of the newly proposed optimized counting sort is better than that of Count Sort, and that it also removes the extra O(n) space that counting sort requires for storing the temporary sorted array.

Keywords: algorithm, optimized counting sort, Count Sort.

I. INTRODUCTION

Sorting is the process of separating or arranging things according to class or kind; computer programmers traditionally use the word in the much more special sense of marshaling things into ascending or descending order. Some of the most important applications of sorting are [3]:

a) Solving the "togetherness" problem, in which all items with the same identification are brought together. Suppose that we have N items in arbitrary order, many of which have equal values, and suppose that we want to rearrange the data so that all items with equal values appear in consecutive positions.
This is essentially the problem of sorting in the older sense of the word, and it can be solved easily by sorting the file in the new sense of the word, so that the values are in ascending order, $v_1 \le v_2 \le \cdots \le v_N$.

b) Matching items in two or more files. If several files have been sorted into the same order, it is possible to find all of the matching entries in one sequential pass through them, without backing up.

c) Searching for information by key values.

Sorting is a task that is regularly encountered in much software. Since it is a basic task that must be performed quite often, researchers have tried to develop algorithms that are efficient in terms of the memory and time required, i.e., their space and time complexities. Sorting algorithms always draw attention because the time complexity of sorting has long been a subject of research. By optimizing sorting, we can improve the whole computation time.

After analyzing different sorting techniques, we can say that the selection of a particular sorting algorithm depends on the nature of the data, the number of inputs in the problem and the time available to complete the sorting process. Therefore we have to find a relationship that shows how much time the algorithm requires for a particular amount of data. Sometimes the time taken by an algorithm increases much more rapidly than the amount of data.

Factors other than the chosen sorting algorithm also affect the running time. The execution time of an algorithm may differ between computers because of variation in clock speeds, which depends on the architecture and organization of the computer system. The efficiency of an algorithm also depends on the nature of the data. For example, if the data set is large but the range of the values is small, comparison-based sorting techniques such as bubble sort, selection sort and quick sort are not appropriate for data of this nature. The efficient technique in that case is a non-comparison sort such as counting sort or radix sort. Consequently, the analysis of a sorting algorithm cannot predict exactly how much time it will take on a given computer system. All these factors must be taken into account when determining the exact complexity of an algorithm [4,5,7,8].

This paper is organized as follows. Section II gives an overview of counting sort and radix sort. Section III states the problem and presents the proposed work, including the flow chart, the pseudo code and the analysis of the algorithm. Section IV provides the experimental results. Section V concludes the paper.

II. BACKGROUND STUDY

According to lower bound theory, any comparison-based sort has a worst-case or expected running time of $\Omega(n \log n)$ on an input sequence of size n. There are sorting algorithms that run faster than $O(n \log n)$ time, but they make assumptions about the input sequence to be sorted. Some sorting algorithms run in linear time, such as counting sort, radix sort and bucket sort. Counting sort and radix sort assume that the input consists of integers in a small range, whereas bucket sort assumes that the input is generated by a process that distributes the elements uniformly over an interval. These algorithms use operations other than comparisons to determine the sorted order.

Counting sort [1,2,3,7,8,10,11]

The principle of counting sort is that the items are counted before each pass to obtain the number of storage locations required for each value in the array. This precount of items enables the computer to make much better use of the available storage. Let N numbers in the range 0 to R be sorted into ascending order. These numbers are stored in locations ARR[1] to ARR[N]. The same amount of storage is available in locations OUT[1] to OUT[N]. These arrays alternate between the roles of input and output on successive passes.
Another array COUNT[] has one entry for each possible value of an item. This array is initially used to count the number of times each item occurs in the array ARR[]. After counting, the counters are added up in a manner that yields the proper address at which to store each item in the output array. After that, each element is stored at its address in the OUT[] array.

Algorithm Counting_Sort(ARR, N, R)
Step 1: repeat for i ← 0 to R
            COUNT[i] ← 0;
Step 2: repeat for j ← 1 to N do
            COUNT[ARR[j]] ← COUNT[ARR[j]] + 1;
        endfor
Step 3: repeat for i ← 1 to R
            COUNT[i] ← COUNT[i − 1] + COUNT[i];
        endfor
Step 4: repeat for j ← N down to 1
            OUT[COUNT[ARR[j]]] ← ARR[j];
            COUNT[ARR[j]] ← COUNT[ARR[j]] − 1;
        endfor
Step 5: stop

In counting sort, step 1 takes time $\Theta(R)$, step 2 takes time $\Theta(N)$, step 3 takes time $\Theta(R)$, and step 4 takes time $\Theta(N)$. Thus the overall running time is $\Theta(N + R)$. In practice, we use counting sort when R = O(N), in which case the running time is $\Theta(N)$.

Radix Sort:

Radix sort is a linear sorting algorithm for integers that uses the same idea as sorting names in alphabetical order. In radix sort we sort on each digit of the numbers, starting with the least significant. If the radix is B, then there are B buckets. We repeat the process, progressing towards the most significant digit. After each distribution, we regroup the items, taking care to preserve their order from the previous distribution. After the last regrouping the items are sorted.

When we have a list of names to sort, the radix is 26 (26 buckets) because there are 26 letters in the alphabet. Observe that the words are first sorted according to the first letter of the name. That is, 26 classes are used to arrange the names, where the first class stores the names that begin with 'A', the second class contains the names that begin with 'B', and so on. During the second pass, names are grouped according to the second letter. After the second pass, the names are sorted on the first two letters. This process is continued until the nth pass, where n is the length of the longest name.
After every pass, all the names are collected in order of buckets. That is, first pick up the names in the first bucket, which contains the names beginning with 'A'. Then collect the names from the second bucket, and so on.

When radix sort is used on integers, sorting is done on each of the digits in the number. The sorting procedure proceeds from the least significant to the most significant digit. When sorting numbers, we have ten buckets, one for each digit (0, 1, 2, …, 9), and the number of passes depends on the length of the number with the most digits.

Radix sorting algorithm (ARR, N, B)
1: Find the largest number in ARR as LARGE
2: [Initialize] SET NOP = Number of digits in LARGE
3: SET PASS = 0
4: Repeat Steps 5 to 10 while PASS <= NOP-1
5:     SET I = 0 and initialize buckets
6:     Repeat Steps 7 to 9 while I <= N-1
7:         SET DIGIT = digit at PASSth place in ARR[I]
8:         Add ARR[I] to the bucket numbered DIGIT
9:         INCREMENT bucket count for bucket numbered DIGIT and SET I = I + 1
       [END OF LOOP]
10:    Collect the numbers in the buckets and SET PASS = PASS + 1
   [END OF LOOP]
11: END

To calculate the complexity of the radix sort algorithm, assume that there are n numbers to be sorted and that k is the number of digits in the largest number. In this case, the outer loop of radix sort is executed a total of k times, and the inner loop is executed n times per pass. Hence the entire radix sort algorithm takes O(kn) time to execute. When radix sort is applied to a data set whose numbers have a small, bounded number of digits, the algorithm runs in O(n) asymptotic time.

III. PROPOSED WORK

Let A[1…N] be an array of N elements within a small range K. The value of each element should be a positive integer. The array C[1…K] counts the number of occurrences of each element. In step 1 we set the value of L to 1. In step 2 we find the minimum and maximum of the array, which are stored in SMALL and LARGE. The value of K is the difference between LARGE and SMALL.
In step 3 we initialize the array C[] with the value 0. In step 4 we count the occurrences of each distinct element by scanning every element of the array A[]. In step 5 we sort the elements: for each i we check C[i], and while C[i] is not zero we insert i into A[L], set L equal to L+1 and decrease C[i] by 1, repeating until C[i] reaches zero.

a) Flow Chart of Algorithm

A flowchart is a graphical representation of an algorithm or process. A flowchart shows the steps as boxes of various kinds, and their order by connecting them with arrows. Process operations are represented in these boxes, and the arrows connecting them represent the flow of control. Data flows are not typically drawn explicitly; rather, they are implied by the sequencing of operations. Flowcharts are used to study, design, document or manage a process or program in various fields.

Figure 1: Flowchart of Optimized Counting Sort

b) Pseudo code for optimized counting sort

Algorithm optimized_counting_sort(A, N)
Step 1: set L ← 1;
Step 2: LARGE ← maximum(A); SMALL ← minimum(A); K ← LARGE − SMALL;
Step 3: repeat for i ← SMALL to LARGE
            C[i] ← 0;
Step 4: repeat for j ← 1 to N do
            C[A[j]] ← C[A[j]] + 1;
        endfor
Step 5: repeat for i ← SMALL to LARGE do
            while C[i] ≠ 0
                A[L] ← i;
                L ← L + 1;
                C[i] ← C[i] − 1;
            endwhile
        endfor
Step 6: stop

c) Execution of Individual Statements

We take K = LARGE − SMALL. In the table below, $c_1, \dots, c_9$ denote the execution times of the individual instructions, and t is the number of times the while loop body is executed for one value of i.

Instruction   Pseudo code                  Execution time   Times executed
3.            for i ← SMALL to LARGE       c_1              K
3.1           C[i] ← 0;                    c_2              K
4.            for j ← 1 to N               c_3              N
4.1           C[A[j]] ← C[A[j]] + 1;       c_4              N
5.            for i ← SMALL to LARGE       c_5              K
5.1           while C[i] ≠ 0               c_6              K.t
5.1.1         A[L] ← i;                    c_7              K.t
5.1.2         L ← L + 1;                   c_8              K.t
5.1.3         C[i] ← C[i] − 1;             c_9              K.t

To evaluate the execution time complexity, N is the number of elements and K is the range of the elements. With $c_0$ denoting the constant cost of the remaining steps, the execution time of the above algorithm is

$T(N) = c_1 K + c_2 K + c_3 N + c_4 N + c_5 K + c_6 K t + c_7 K t + c_8 K t + c_9 K t + c_0$

$T(N) = (c_1 + c_2 + c_5) K + (c_3 + c_4) N + (c_6 + c_7 + c_8 + c_9) K t + c_0$ ……………….(1)

d) Execution for Worst Case Complexity

We take $K t = N$ and $c_0 = c_1 = \cdots = c_9 = 1$. From Equation (1):

$T(N) = 1 + 2N + 3K + 4Kt$
$T(N) = 1 + 2N + 3K + 4N$
$T(N) = 1 + 6N + 3K$
$T(N) = O(N + K)$ ……………(2)

Equation (2) shows that the execution time of the sorting algorithm in the worst case is O(N+K), which means that the running time of optimized counting sort is linear with respect to N.

e) Execution for Average Case Complexity

We take $t = K$ and $c_0 = c_1 = \cdots = c_9 = 1$. From Equation (1):

$T(N) = 1 + 2N + 3K + 4Kt$
$T(N) = 1 + 2N + 3K + 4K^2$

We know that $4K^2 \ll N$, so the last term can be bounded by N:

$T(N) = 1 + 2N + 3K + N$
$T(N) = 1 + 3N + 3K$
$T(N) = O(N + K)$ …………………………(3)

Equation (3) shows that the execution time of the sorting algorithm in the average case is O(N+K), which means that the running time of optimized counting sort is linear with respect to N.

f) Execution for Best Case Complexity

We take $t = N$, which means that the array A[] holds a single value repeated N times, so that $K = 1$, and $c_0 = c_1 = \cdots = c_9 = 1$. From Equation (1):

$T(N) = 1 + 2N + 3 \cdot 1 + 4 \cdot 1 \cdot N$
$T(N) = 4 + 2N + 4N$
$T(N) = 4 + 6N$
$T(N) = O(N)$ ……………(4)

Equation (4) shows that the execution time of the sorting algorithm in the best case is O(N), which means that the running time of optimized counting sort is linear with respect to N.

IV. COMPARISON

We compare the proposed algorithm with counting sort. For the comparison, we have taken several ranges (K) of elements: 1-99, 1-999 and 1-9999. We generated N random inputs within the range K and measured the execution time with the MATLAB functions tic and toc on an Intel Core 2 Duo processor at 2.00 GHz with 2 GB RAM, running 64-bit Windows 8. The value returned by toc is stored in a file, and the graphs are then drawn from these values. The tables and graphs below present the comparison.
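To make the two procedures being timed concrete, the pseudo code of Sections II and III can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the MATLAB implementation used for the measurements: the function names `counting_sort`, `ocsa_sort` and `time_sort` are hypothetical, indices are 0-based, the count array is offset by SMALL so that it needs only K + 1 cells, and `time.perf_counter` stands in for MATLAB's tic and toc.

```python
import random
import time


def counting_sort(arr, r):
    """Classical counting sort for integers in 0..r; returns a new list."""
    count = [0] * (r + 1)              # Step 1: one counter per possible value
    for x in arr:                      # Step 2: count occurrences
        count[x] += 1
    for i in range(1, r + 1):          # Step 3: running totals give final positions
        count[i] += count[i - 1]
    out = [0] * len(arr)               # the O(n) auxiliary output array
    for x in reversed(arr):            # Step 4: place items right to left (stable)
        count[x] -= 1
        out[count[x]] = x
    return out


def ocsa_sort(a):
    """In-place variant following the Section III pseudo code (no OUT array)."""
    if not a:
        return a
    small, large = min(a), max(a)      # Step 2: SMALL and LARGE
    c = [0] * (large - small + 1)      # Step 3: counters, offset by SMALL (assumption)
    for x in a:                        # Step 4: count occurrences
        c[x - small] += 1
    pos = 0                            # Step 1: write position L (0-based here)
    for i in range(len(c)):            # Step 5: write each value back C[i] times
        while c[i] != 0:
            a[pos] = i + small
            pos += 1
            c[i] -= 1
    return a


def time_sort(sort_fn, data, *args):
    """Run sort_fn on a copy of data; return (sorted list, elapsed seconds)."""
    start = time.perf_counter()        # plays the role of tic
    result = sort_fn(list(data), *args)
    elapsed = time.perf_counter() - start  # plays the role of toc
    return result, elapsed


if __name__ == "__main__":
    data = [random.randint(1, 99) for _ in range(10_000)]
    for fn, extra in ((ocsa_sort, ()), (counting_sort, (99,))):
        result, elapsed = time_sort(fn, data, *extra)
        assert result == sorted(data)
        print(f"{fn.__name__}: {elapsed:.6f} s")
```

The while loop in `ocsa_sort` runs exactly N times in total across all values of i, which is where the linear behaviour in N comes from. Absolute timings from such a sketch depend on the language and machine and are not comparable with the MATLAB figures reported here.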
In the tables, some values are negative, which shows that the counting ...

Table 1: Execution time for counting sort, radix sort and the proposed algorithm for K=99 (execution time in seconds)

Number of Inputs N   Optimized Counting Sort   Counting Sort   Radix Sort
1000                 1.726957                  0.003315        0.008751
10000                17.20312                  0.023315        0.009751
20000                34.89535                  0.029192        0.205143
30000                51.41824                  0.042205        0.029548
40000                81.26701                  0.056121        0.038154
50000                101.4711                  0.068372        0.054031
60000                125.9636                  0.081187        0.058889
70000                143.2801                  0.096696        0.06686
80000                164.7435                  0.107045        0.075202
90000                162.5578                  0.119641        0.165222
100000               173.8811                  0.160739        0.10099
500000               342.7598                  0.659961        0.4487
1000000              515.97                    1.254126        0.871253

Table 2: Execution time for counting sort, radix sort and the proposed algorithm for K=999 (execution time in seconds)

Number of Inputs N   Optimized Counting Sort   Counting Sort   Radix Sort
1000                 1.726957                  0.004708        0.001284
10000                16.703121                 0.014708        0.011284
20000                33.695352                 0.028299        0.022493
30000                52.518241                 0.045938        0.031737
40000                80.67013                  0.057897        0.042011
50000                99.471146                 0.067984        0.045774
60000                124.963555                0.177064        0.067438
70000                145.280057                0.097558        0.072141
80000                163.743452                0.109771        0.088679
90000                162.557803                0.13524         0.096584
100000               173.881081                0.136866        0.115031
500000               338.759762                0.633319        0.472872
1000000              517.970002                1.247926        0.986916

Table 3: Execution time for counting sort, radix sort and the proposed algorithm for K=9999 (execution time in seconds)

Number of Inputs N   Optimized Counting Sort   Counting Sort   Radix Sort
1000                 1.726957                  0.009434        0.006886
10000                17.20312                  0.029434        0.016886
20000                34.89535                  0.033333        0.027955
30000                51.41824                  0.048883        0.035984
40000                81.26701                  0.06113         0.058935
50000                101.4711                  0.072815        0.054655
60000                125.9636                  0.092384        0.066174
70000                143.2801                  0.098455        0.074234
80000                164.7435                  0.109163        0.08425
90000                162.5578                  0.12797         0.096894
100000               173.8811                  0.16787         0.110239
500000               345.7598                  0.641422        0.470315
1000000              523.97                    1.327945        0.977271

Figure 2: Comparison chart of optimized counting sort, counting sort and Radix Sort for K=99 (x-axis: Number of Inputs N; y-axis: Execution Time T; series: Radix Sort, Count Sort, OCSA)

Figure 3: Comparison chart of optimized counting sort & counting sort for K=999 (x-axis: Number of Inputs N; y-axis: Execution Time T; series: Radix Sort, Count Sort, OCSA)

Figure 4: Comparison chart of optimized counting sort, counting sort and radix sort for K=9999 (x-axis: Number of Inputs N; y-axis: Execution Time T; series: Radix Sort, Count Sort, OCSA)

V. CONCLUSION

From the results in Table 1, Table 2 and Table 3 we can say that our proposed algorithm, optimized counting sort, improves the running time by 18-26% compared to counting sort. It removes the O(n) auxiliary space that counting sort requires for the output array. It also reduces the number of steps in the sorting, which is the major reason for the reduction in the running time of the algorithm. Equation (2) shows that the execution time of the sorting algorithm in the worst case is O(N+K), which means that the running time of optimized counting sort is linear with respect to N. Equation (3) shows that the execution time in the average case is O(N+K), which is again linear with respect to N. Equation (4) shows that the execution time in the best case is O(N), which is also linear with respect to N.

VI. REFERENCES

[1] Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001), Introduction to Algorithms (2nd ed.), MIT Press and McGraw-Hill, ISBN 0-262-03293-7.
[2] Edmonds, Jeff (2008), "5.2 Counting Sort (a Stable Sort)", How to Think about Algorithms, Cambridge University Press, pp. 72–75, ISBN 978-0-521-84931-9.
[3] Sedgewick, Robert (2003), "6.10 Key-Indexed Counting", Algorithms in Java, Parts 1-4: Fundamentals, Data Structures, Sorting, and Searching (3rd ed.), Addison-Wesley, pp. 312–314.
[4] Knuth, D. E. (1998), The Art of Computer Programming, Volume 3: Sorting and Searching (2nd ed.), Addison-Wesley, ISBN 0-201-89685-0. Section 5.2, Sorting by counting, pp. 75–80.
[5] Aho, Alfred V.; Hopcroft, John E.; Ullman, Jeffrey D. (2002), Data Structures and Algorithms.
[6] Y. Kirani Singh, B. B. Chaudhuri, "Matlab Programming", PHI Learning, 2007.
[7] Levitin, Introduction to the Design & Analysis of Algorithms, Addison-Wesley Longman, 2007.
[8] Parag Bhalchandra, Nilesh Deshmukh, Sakharam Lokhande, Santosh Phulari, "A Comprehensive Note on Complexity Issues in Sorting Algorithms", Advances in Computational Research, ISSN: 0975–3273, Volume 1, Issue 2, 2009.
[9] Counting Sort, Web Link: http://www.cse.iitk.ac.in/users/dsrkg/cs210/.../sortingII/countingSort/count.htm.
[10] Counting Sort, Web Link: http://www.courses.csail.mit.edu/6.006/spring11/rec/rec11.pdf.
[11] Flow chart Design, Web Link: http://en.wikipedia.org/wiki/Flowchart.

About Author

Tanvi Puri is an M.Tech research scholar in the Department of Computer Science & Engineering at SKIT, Jaipur, affiliated to RTU, Kota. She holds a B.E. degree in Computer Science and an MBA in Human Resource. She has been working in the area of design and analysis of algorithms and has presented papers at conferences.

Anuj Kumar Jain is presently Asst. Professor at IET, Alwar and has been teaching engineering students for the past five years, mainly in the areas of computer engineering. He holds a B.E. degree in Information Technology and an M.Tech. in Computer Science. He has been working in the areas of parallel algorithms and design and analysis of algorithms and has presented papers at conferences.

Anjana Sangwan is presently Asst. Professor at SKIT, Jaipur and has been teaching engineering students for the past seven years, mainly in the areas of computer engineering. She holds a B.E. degree in Computer Science and an M.Tech. in Computer Science. She has been working in the areas of data structures and design and analysis of algorithms and has presented papers at conferences.