Design and Analysis of Optimized Counting Sort Algorithm (OCSA)

Tanvi Puri#1, Anuj Kumar Jain*2, Anjana Sangwan#3
#1, #3 Department of Computer Science, Swami Keshwanand Institute of Technology, Jaipur, INDIA
1 Tanvipuri45@gmail.com, 3 sangwan.anjana@gmail.com
*2 Department of Computer Science, Institute of Engineering & Technology, Alwar, INDIA
*2 anujjaingit@gmail.com
Abstract- In computer science, sorting techniques are used
to place a list of elements in a certain order. Several
sorting algorithms of varying time and space complexity
exist and are in use. In this paper we propose a new
optimized counting sort algorithm, which is based on
counting the occurrences of each element in the list. We
also compare it with Count Sort, which runs in linear time.
We used MATLAB to implement the algorithms and to analyze
the CPU time taken by optimized counting sort and counting
sort. To check the correctness of the algorithms, we
generated random input sequences of length 10 to 10,00,000.
The results show that the performance of the newly proposed
optimized counting sort is better than that of Count Sort,
and that it also removes the extra O(n) space that counting
sort requires for storing the temporary sorted array.
Keywords: algorithm, optimized counting sort, Count Sort.
I. INTRODUCTION
Sorting is the process of separating or arranging things
according to class or kind; computer programmers
traditionally use the word in the much more special sense of
marshaling things into ascending or descending order.
Some of the most important applications of sorting are [3]:
a) Solving the “togetherness” problem, in which all items
with the same identification are brought together. Suppose
that we have N items in arbitrary order, many of which have
equal values, and suppose that we want to rearrange the data
so that all items with equal values appear in consecutive
positions. This is essentially the problem of sorting in the
older sense of the word, and it can be solved easily by
sorting the file in the new sense of the word, so that the
values are in ascending order
$v_1 \le v_2 \le \cdots \le v_N$.
b) Matching items in two or more files. If several files
have been sorted into the same order, it is possible to find
all of the matching entries in one sequential pass through
them, without backing up.
c) Searching for information by key values.
Sorting is a task that is regularly encountered in many
software systems. Since it is a basic task that must be
performed quite often, researchers have long attempted to
develop algorithms that are efficient in terms of the memory
and time they require, i.e., their space and time
complexities. Sorting algorithms continue to draw attention
because the time complexity of sorting has always been a
subject of research.
By optimizing sorting, we can improve the overall computation
time. After analyzing different sorting techniques, we can say
that the selection of a particular sorting algorithm depends
on the nature of the data, the number of inputs in the problem
and the time available to complete the sorting process.
Therefore we have to find a relationship that shows how much
time the algorithm requires for a particular amount of data.
Sometimes the time required by the algorithm increases much
more rapidly than the amount of data. Factors other than the
sorting algorithm selected to solve a problem also affect the
running time. The execution time of an algorithm may differ
between computers because of variation in clock speed, which
depends on the architecture and organization of the computer
system. The efficiency of the algorithm also depends on the
nature of the data. For example, if the data set is large but
the range of the data is small, comparison based sorting
techniques such as bubble sort, selection sort and quick sort
are not appropriate for sorting data of this nature. The
efficient technique in this case is a non-comparison sort such
as counting sort or radix sort. Consequently, the analysis of
a sorting algorithm cannot predict exactly how much time it
will take on a given computer system. All these factors must
be considered when determining the exact complexity of the
algorithm [4,5,7,8].
This paper is organized as follows. Section 2 gives an
overview of counting sort and radix sort. Section 3 defines
the problem and presents the proposed work, including the
flowchart, pseudo code and analysis of the algorithm.
Section 4 provides the experimental results. Section 5
concludes the paper.
II. BACKGROUND STUDY
According to lower bound theory, any comparison based sort
has a worst-case or expected running time of
$\Omega(n \log n)$ on an input sequence of size n. There are
sorting algorithms that run faster than $O(n \log n)$ time,
but they make some assumptions about the input sequence to be
sorted. Some sorting algorithms run in linear time, such as
counting sort, radix sort and bucket sort. Counting sort and
radix sort assume that the input consists of integers in a
small range, whereas bucket sort assumes that the input is
generated by a random process that distributes elements
uniformly over the interval. These algorithms use operations
other than comparisons to determine the sorted order.
Counting sort [1,2,3,7,8,10,11]
The principle of counting sort is that items are counted
before each pass to obtain the number of storage locations
required for each value of the array. This pre-count of items
enables the computer to make much better use of the available
storage. Let N numbers, which range from 0 to R, be sorted
into ascending order. These numbers are stored in locations
ARR[1] to ARR[N]. The same amount of storage is available in
locations OUT[1] to OUT[N]. These arrays alternate between
the roles of input and output on successive passes. Another
array COUNT[] has one entry for each possible value of an
item. This array is initially used to count the number of
times each item occurs in the array ARR[]. After counting,
the counters are added in a manner that yields the proper
address at which to store each item in the output array.
After that, each element is stored at its address in the
OUT[] array.
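Algorithm Counting Sort (ARR, N, R)
Step 1: repeat for i ← 0 to R
          COUNT[i] ← 0;
        endfor
Step 2: repeat for j ← 1 to N
          do COUNT[ARR[j]] ← COUNT[ARR[j]] + 1;
        endfor
Step 3: repeat for i ← 1 to R
          COUNT[i] ← COUNT[i − 1] + COUNT[i]
        endfor
Step 4: repeat for j ← N down to 1
          OUT[COUNT[ARR[j]]] ← ARR[j]
          COUNT[ARR[j]] ← COUNT[ARR[j]] − 1;
        end for
Step 5: stop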
In counting sort, step 1 takes time $\Theta(R)$, step 2
takes time $\Theta(N)$, step 3 takes time $\Theta(R)$, and
step 4 takes time $\Theta(N)$. Thus the overall running time
is $\Theta(N + R)$. In practice, we use counting sort when
the range, written K in the rest of this paper, is O(N), in
which case the running time is $\Theta(N)$.
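The steps above can be sketched in Python as follows. This is a minimal illustration rather than the MATLAB implementation used in the experiments; the function name and the 0-based indexing are assumptions made for the example.

```python
def counting_sort(arr, r):
    """Stable counting sort for integers in the range 0..r (inclusive)."""
    n = len(arr)
    count = [0] * (r + 1)          # Step 1: clear the counters
    for x in arr:                  # Step 2: count occurrences of each value
        count[x] += 1
    for i in range(1, r + 1):      # Step 3: prefix sums give the final positions
        count[i] += count[i - 1]
    out = [0] * n                  # auxiliary output array of size N
    for x in reversed(arr):        # Step 4: scan backwards to keep the sort stable
        count[x] -= 1              # 0-based indexing, so decrement before storing
        out[count[x]] = x
    return out


print(counting_sort([4, 1, 3, 4, 0, 2, 1], 4))  # [0, 1, 1, 2, 3, 4, 4]
```

Note that the sketch allocates both the counter array and an output array of size N, which is the auxiliary space the proposed algorithm aims to avoid.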
Radix Sort:
Radix sort is a linear sorting algorithm for integers that
uses the same idea as sorting names into alphabetical order.
When we have a list of names to sort, the radix is 26 (there
are 26 buckets) because there are 26 letters in the alphabet.
Observe that the words are first sorted according to the
first letter of the name. That is, 26 classes are used to
arrange the names, where the first class stores the names
that begin with ‘A’, the second class contains the names that
begin with ‘B’, and so on. In radix sort for numbers, we sort
on each digit, starting with the least significant. If the
radix is B, then there are B buckets. We repeat the process,
progressing towards the most significant digit. After each
distribution, we regroup the items anew, taking care to
preserve their order from the previous distribution. After
the last regrouping the items are sorted.
For names, during the second pass the names are grouped
according to the second letter. After the second pass, the
names are sorted on the first two letters. This process is
continued until the nth pass, where n is the length of the
longest name. After every pass, all the names are collected
in order of buckets. That is, first pick up the names in the
first bucket, which contains the names beginning with ‘A’,
then collect the names from the second bucket, and so on.
When radix sort is used on integers, sorting is done on each
of the digits in the number. The sorting procedure proceeds
from the least significant to the most significant digit.
When sorting decimal numbers, we have ten buckets, one for
each digit (0, 1, 2, …, 9), and the number of passes depends
on the length of the number having the maximum number of
digits.
Radix sorting algorithm (ARR, N, B)
1: Find the largest number in ARR as LARGE
2: [Initialize] SET NOP = Number of digits in LARGE
3: SET PASS = 0
4: Repeat Steps 5 to 10 while PASS <= NOP-1
5:   SET I = 0 AND Initialize buckets
6:   Repeat Step 7 to Step 9 while I <= N-1
7:     SET DIGIT = digit at PASS th place in ARR[I]
8:     Add ARR[I] to the bucket numbered DIGIT
9:     INCREMENT bucket count for bucket numbered DIGIT
       and SET I = I + 1
     [END OF LOOP]
10:  Collect the numbers in the buckets and SET PASS = PASS + 1
   [END OF LOOP]
11: END
To calculate the complexity of the radix sort algorithm,
assume that there are n numbers to be sorted and that k is
the number of digits in the largest number. In this case,
the outer loop of the radix sort algorithm is executed a
total of k times, and the inner loop is executed n times in
each pass. Hence the entire radix sort algorithm takes O(kn)
time to execute. When radix sort is applied to a data set in
which the numbers have a bounded number of digits (k is a
small constant), the algorithm runs in O(n) asymptotic time.
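As an illustration of the least-significant-digit procedure described above, the following Python sketch distributes the numbers into buckets once per digit and collects them in bucket order. The function name and the default base of 10 are assumptions for the example, and only non-negative integers are handled.

```python
def radix_sort(arr, base=10):
    """LSD radix sort for non-negative integers using `base` buckets."""
    if not arr:
        return []
    result = list(arr)
    largest = max(result)
    place = 1                                   # 1, base, base**2, ...
    while largest // place > 0:                 # one pass per digit of the largest number
        buckets = [[] for _ in range(base)]
        for x in result:
            digit = (x // place) % base         # digit at the current place
            buckets[digit].append(x)            # appending preserves the previous order
        # Collect the buckets in order 0, 1, ..., base-1
        result = [x for bucket in buckets for x in bucket]
        place *= base
    return result


print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```

The outer loop runs k times (once per digit of the largest number) and each pass touches all n numbers, which matches the O(kn) bound stated above.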
III. PROPOSED WORK
Let A[1…N] be an array of N elements whose values lie within
a small range K. The value of each element should be a
positive integer. The array C[1…K] counts the number of
occurrences of each element. In step 1 we set the value of L
to 1. In step 2 we find the minimum and maximum of the array,
which are stored in SMALL and LARGE. The value of K is the
difference between LARGE and SMALL. In step 3 we initialize
the array C[] with the value 0. In step 4 we count the
occurrences of each distinct element by scanning each element
of array A[]. In step 5 we sort the elements: for each value
i we check C[i]; while C[i] is not zero, we insert i into
A[L], set L equal to L+1 and decrease the value of C[i] by 1,
and we check again until C[i] reaches zero.
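The five steps described above can be sketched in Python as follows. This is an illustrative reading of the description, not the authors' MATLAB code; the function name, the 0-based position variable and the choice to index the counters by (value − SMALL) are assumptions made for the example.

```python
def optimized_counting_sort(a):
    """Sort a list of integers in place by counting, without an output array."""
    if not a:
        return a
    small, large = min(a), max(a)          # Step 2: SMALL and LARGE
    c = [0] * (large - small + 1)          # Step 3: one counter per value in SMALL..LARGE
    for x in a:                            # Step 4: count each element
        c[x - small] += 1                  # assumption: counters are offset by SMALL
    pos = 0                                # Step 1: L (0-based here)
    for i in range(small, large + 1):      # Step 5: write the values back in order
        while c[i - small] != 0:
            a[pos] = i
            pos += 1
            c[i - small] -= 1
    return a


data = [7, 3, 9, 3, 5, 7, 7]
print(optimized_counting_sort(data))  # [3, 3, 5, 7, 7, 7, 9]
```

Because the sorted values are written straight back into A, the only auxiliary storage in this sketch is the counter array. The values themselves are regenerated from the counts, so the approach applies to plain integer keys and not to records that carry additional data.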
a) Flow Chart of Algorithm
A flowchart is a graphical representation of an algorithm or
process. The flowchart of an algorithm shows the steps as
boxes of various kinds, and their order by connecting them
with arrows. Process operations are represented in these
boxes, and the arrows connecting them represent the flow of
control; data flows are not drawn explicitly but are implied
by the sequencing of operations. Flowcharts are used to
study, design, document or manage a process or program in
various fields.
Figure 1: Flowchart of Optimized Counting Sort
b) Pseudo code for optimized counting sort
Algorithm optimized_counting_sort(A, N)
Step 1: set L ← 1;
Step 2: LARGE ← maximum(A);
        SMALL ← minimum(A);
        K ← LARGE − SMALL
Step 3: repeat for i ← SMALL to LARGE
          C[i] ← 0;
        endfor
Step 4: repeat for j ← 1 to N
          do C[A[j]] ← C[A[j]] + 1;
        endfor
Step 5: repeat for i ← SMALL to LARGE
          Do while C[i] ≠ 0
            A[L] ← i;
            L ← L + 1
            C[i] ← C[i] − 1
          Endwhile
        endfor
Step 6: stop
c) Execution of Individual statements
We take K = LARGE − SMALL. The following table lists, for
each pseudo code instruction, its execution time and how many
times the instruction is executed by the CPU.

Step    Pseudo code instruction     Execution time   Times executed
3.      for i ← SMALL to LARGE      $c_1$            K
3.1     C[i] ← 0                    $c_2$            K
4.      for j ← 1 to N              $c_3$            N
4.1     C[A[j]] ← C[A[j]] + 1       $c_4$            N
5.      for i ← SMALL to LARGE      $c_5$            K
5.1     Do while C[i] ≠ 0           $c_6$            K.t
5.1.1   A[L] ← i                    $c_7$            K.t
5.1.2   L ← L + 1                   $c_8$            K.t
5.1.3   C[i] ← C[i] − 1             $c_9$            K.t

To evaluate the execution time complexity, N is the number of
elements, K is the range of the elements, t is the number of
times the while loop of step 5.1 is executed for each value
i, and $c_0$ is the constant term for the remaining steps. We
simplify the execution time of the above algorithm.
$$T(N) = c_1 K + c_2 K + c_3 N + c_4 N + c_5 K + c_6 K \cdot t + c_7 K \cdot t + c_8 K \cdot t + c_9 K \cdot t + c_0$$
$$T(N) = (c_1 + c_2 + c_5) K + (c_3 + c_4) N + (c_6 + c_7 + c_8 + c_9) K \cdot t + c_0 \qquad (1)$$
d) Execution for Worst Case Complexity
We take $t = N/K$ and
$c_0 = c_1 = \cdots = c_9 = 1$.
From equation (1)
$$T(N) = 1 + 2N + 3K + 4K \cdot t$$
Since $K \cdot t = N$,
$$T(N) = 1 + 2N + 3K + 4N$$
$$T(N) = 1 + 6N + 3K$$
$$T(N) = O(N + K) \qquad (2)$$
Equation (2) shows that the execution time of the sorting
algorithm in the worst case is O(N+K), which means the
running time of optimized counting sort is linear with
respect to N.
e) Execution for Average Case Complexity
We take t = K and
$c_0 = c_1 = \cdots = c_9 = 1$.
From Equation (1)
$$T(N) = 1 + 2N + 3K + 4K \cdot t$$
$$T(N) = 1 + 2N + 3K + 4K^2$$
We know that $4K^2 \ll N$, so the term $4K^2$ is bounded by N.
Now
$$T(N) = 1 + 2N + 3K + N$$
$$T(N) = 1 + 3N + 3K$$
$$T(N) = O(N + K) \qquad (3)$$
Equation (3) shows that the execution time of the sorting
algorithm in the average case is O(N+K), which means the
running time of optimized counting sort is linear with
respect to N.
f) Execution for Best Case Complexity
We take t = N, which means the array A[] holds a single value
repeated N times, so that K = 1, and
$c_0 = c_1 = \cdots = c_9 = 1$.
From Equation (1)
$$T(N) = 1 + 2N + 3 \cdot 1 + 4 \cdot 1 \cdot N$$
$$T(N) = 4 + 2N + 4N$$
$$T(N) = 4 + 6N$$
$$T(N) = O(N) \qquad (4)$$
Equation (4) shows that the execution time of the sorting
algorithm in the best case is O(N), which means the running
time of optimized counting sort is linear with respect to N.
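The product K.t in Equation (1) is the total number of while-loop iterations in step 5. The following Python sketch counts the iterations of the outer for loop and of the inner while loop for a few inputs; the function name and the counter offset by SMALL are assumptions for the example. It suggests that the inner total equals N whatever the distribution of values, since each element is written back exactly once, while the outer loop depends only on the range.

```python
def count_step5_iterations(a):
    """Return (outer, inner): iterations of the Step 5 for loop and of its while loop."""
    small, large = min(a), max(a)
    c = [0] * (large - small + 1)
    for x in a:                            # Step 4: count each element
        c[x - small] += 1
    outer = inner = 0
    for i in range(small, large + 1):      # Step 5: outer loop over the range
        outer += 1
        while c[i - small] != 0:           # inner loop runs once per stored element
            c[i - small] -= 1
            inner += 1
    return outer, inner


print(count_step5_iterations([5] * 8))            # (1, 8): single repeated value
print(count_step5_iterations(list(range(1, 9))))  # (8, 8): all values distinct
print(count_step5_iterations([1, 100, 1, 100]))   # (100, 4): few elements, wide range
```

The last call shows why the range term matters: with only four elements but a range of 100 values, the outer loop dominates the work.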
IV. COMPARISON
In this section we compare the proposed algorithm with
counting sort and radix sort. For the comparison, we took
several ranges (K) of elements: 1-99, 1-999 and 1-9999. We
generated N random inputs within the range K and measured the
execution time using the MATLAB functions tic and toc on an
Intel Core 2 Duo processor at 2.00 GHz with 2 GB RAM running
64-bit Windows 8. The value returned by toc is stored in a
file, and the graphs are drawn from these values. The tables
and graphs used for the comparison are given below. In the
tables, some of the values are negative, which shows that the
counting
Table 1: Execution time (in seconds) of counting sort, radix sort and the proposed algorithm for K=99

Number of Input N   Optimized Counting Sort   Counting Sort   Radix Sort
1000                1.726957                  0.003315        0.008751
10000               17.20312                  0.023315        0.009751
20000               34.89535                  0.029192        0.205143
30000               51.41824                  0.042205        0.029548
40000               81.26701                  0.056121        0.038154
50000               101.4711                  0.068372        0.054031
60000               125.9636                  0.081187        0.058889
70000               143.2801                  0.096696        0.06686
80000               164.7435                  0.107045        0.075202
90000               162.5578                  0.119641        0.165222
100000              173.8811                  0.160739        0.10099
500000              342.7598                  0.659961        0.4487
1000000             515.97                    1.254126        0.871253
Table 2: Execution time (in seconds) of counting sort, radix sort and the proposed algorithm for K=999

Number of Input N   Optimized Counting Sort   Counting Sort   Radix Sort
1000                1.726957                  0.004708        0.001284
10000               16.703121                 0.014708        0.011284
20000               33.695352                 0.028299        0.022493
30000               52.518241                 0.045938        0.031737
40000               80.67013                  0.057897        0.042011
50000               99.471146                 0.067984        0.045774
60000               124.963555                0.177064        0.067438
70000               145.280057                0.097558        0.072141
80000               163.743452                0.109771        0.088679
90000               162.557803                0.13524         0.096584
100000              173.881081                0.136866        0.115031
500000              338.759762                0.633319        0.472872
1000000             517.970002                1.247926        0.986916
Table 3: Execution time (in seconds) of counting sort, radix sort and the proposed algorithm for K=9999

Number of Input N   Optimized Counting Sort   Counting Sort   Radix Sort
1000                1.726957                  0.009434        0.006886
10000               17.20312                  0.029434        0.016886
20000               34.89535                  0.033333        0.027955
30000               51.41824                  0.048883        0.035984
40000               81.26701                  0.06113         0.058935
50000               101.4711                  0.072815        0.054655
60000               125.9636                  0.092384        0.066174
70000               143.2801                  0.098455        0.074234
80000               164.7435                  0.109163        0.08425
90000               162.5578                  0.12797         0.096894
100000              173.8811                  0.16787         0.110239
500000              345.7598                  0.641422        0.470315
1000000             523.97                    1.327945        0.977271
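The measurement procedure described at the start of this section can be sketched in Python as follows. This is only an analogue of the MATLAB tic/toc setup, not the code used to produce the tables: the helper name, the fixed seed and the use of Python's built-in sorted as a stand-in sorting function are all assumptions, and absolute timings will differ from those reported above.

```python
import random
import time


def time_sort(sort_fn, n, k, seed=0):
    """Return the seconds taken by sort_fn on n random integers in the range 1..k."""
    rng = random.Random(seed)
    data = [rng.randint(1, k) for _ in range(n)]
    expected = sorted(data)                    # reference result for the correctness check
    start = time.perf_counter()                # plays the role of MATLAB's tic
    result = sort_fn(data)
    elapsed = time.perf_counter() - start      # plays the role of toc
    assert list(result) == expected            # the timed sort must produce sorted output
    return elapsed


# Stand-in example: time the built-in sort for two input sizes with K=99.
for n in (1000, 10000):
    print(n, time_sort(sorted, n, 99))
```

Any of the sorting functions under comparison can be passed as sort_fn, provided it returns the sorted sequence.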
Figure 2: Comparison chart of optimized counting sort, counting sort
and Radix Sort for K=99 [x-axis: Number of Input (N), 1000 to
1000000; y-axis: Execution Time (T)]
Figure 3: Comparison chart of optimized counting sort & counting
sort for K=999 [chart title: optimized counting sort, counting sort
and Radix Sort for K=9999; x-axis: Number of Inputs, 1000 to
1000000; y-axis: Execution Time (T); series: Radix Sort, Count Sort,
OCSA]
Figure 4: Comparison chart of optimized counting sort, counting sort
and radix sort for K=9999 [chart title: optimized counting sort,
counting sort and Radix Sort for K=999; x-axis: Number of Inputs,
1000 to 1000000; y-axis: Execution Time T; series: Radix Sort, Count
Sort, OCSA]
V. CONCLUSION
From the results in Table 1, Table 2 and Table 3, we can say
that our proposed algorithm, optimized counting sort,
improves the running time by 18-26% compared to counting
sort. It reduces the auxiliary space required by counting
sort by O(n). It also reduces the number of steps in the
sorting, which is the major reason for the reduction in the
running time of the algorithm. Equation (2) shows that the
execution time of the sorting algorithm in the worst case is
O(N+K), Equation (3) shows that the execution time in the
average case is O(N+K), and Equation (4) shows that the
execution time in the best case is O(N). In each case the
running time of optimized counting sort is linear with
respect to N.
VI. REFERENCES
[1] Cormen, Thomas H.; Leiserson, Charles E.; Rivest,
Ronald L.; Stein, Clifford (2001), Introduction to
Algorithms (2nd ed.), MIT Press and McGraw-Hill,
ISBN 0-262-03293-7.
[2] Edmonds, Jeff (2008), "5.2 Counting Sort (a Stable
Sort)", How to Think about Algorithms, Cambridge
University Press, pp. 72–75, ISBN 978-0-521-84931-9.
[3] Sedgewick, Robert (2003), "6.10 Key-Indexed Counting",
Algorithms in Java, Parts 1-4: Fundamentals, Data
Structures, Sorting, and Searching (3rd ed.),
Addison-Wesley, pp. 312–314.
[4] Knuth, D. E. (1998), The Art of Computer Programming,
Volume 3: Sorting and Searching (2nd ed.),
Addison-Wesley, ISBN 0-201-89685-0. Section 5.2, Sorting by
counting, pp. 75–80.
[5] Aho, Alfred V.; Hopcroft, John E.; Ullman, Jeffrey D.
(2002), Data Structures and Algorithms.
[6] Y. Kirani Singh, B. B. Chaudhuri, “Matlab
Programming”, PHI Learning, 2007.
[7] Levitin, Introduction to the Design & Analysis of
Algorithms, Addison-Wesley Longman, 2007.
[8] Parag Bhalchandra, Nilesh Deshmukh, Sakharam
Lokhande, Santosh Phulari, “A Comprehensive Note on
Complexity Issues in Sorting Algorithms”, Advances in
Computational Research, ISSN: 0975–3273, Volume 1,
Issue 2, 2009.
[9] Counting Sort, Web Link:
http://www.cse.iitk.ac.in/users/dsrkg/cs210/.../sortingII/countingSort/count.htm.
[10] Counting Sort, Web Link:
http://www.courses.csail.mit.edu/6.006/spring11/rec/rec11.pdf.
[11] Flow chart Design, Web Link:
http://en.wikipedia.org/wiki/Flowchart.
About Author
Tanvi Puri is an M.Tech research scholar in the Department
of Computer Science & Engineering at SKIT, Jaipur, affiliated
to RTU, Kota. She holds a B.E. degree in Computer Science and
an MBA in Human Resource. She has been working in the area of
design and analysis of algorithms and has presented papers at
conferences.
Anuj Kumar Jain is presently Asst. Professor at IET, Alwar
and has been teaching engineering students for the past five
years, mainly in the areas of computer engineering. He holds
a B.E. degree in Information Technology and an M.Tech. in
Computer Science. He has been working in the areas of
parallel algorithms and design and analysis of algorithms,
and has presented papers at conferences.
Anjana Sangwan is presently Asst. Professor at SKIT, Jaipur
and has been teaching engineering students for the past seven
years, mainly in the areas of computer engineering. She holds
a B.E. degree in Computer Science and an M.Tech. in Computer
Science. She has been working in the areas of data structures
and design & analysis of algorithms, and has presented papers
at conferences.