Modified Quick Sort: Worst Case Made Best Case

International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 5, Issue 8, August 2015)
Omar Khan Durrani1, Dr. Khalid Nazim S. A2
Department of Computer Science & Engineering, Vidya Vikas Institute of Engineering & Technology, Mysore
Abstract— Sufficient work has been carried out on the analysis of quicksort by the pioneers of computer science and applied mathematics, more than can be described and discussed here. Nor can we overlook any contribution made in the field of research and development; still, facts emerge from necessity. With this in mind, some improvements have been made with respect to the worst-case performance of quicksort, which the academic reference material and existing contributions treat as settled. In this paper quicksort is modified to perform best when it is supposed to be worst. The modifications have yielded a substantial improvement over the existing algorithm.
Keywords— sorting, quicksort, randomized, worst case, quicksort_wmb, partition, global variables, ordered input.

I. INTRODUCTION
Mathematicians have contributed to algorithmic analysis from an information-theoretic viewpoint; on the other side, we algorithm engineers contribute from the angle of software and computer architecture. One responsibility that falls on us in coping with upcoming challenges is to improve and provide compatible code designs with time efficiency as the objective. Sorting is one of the most important, well-studied and commonly applied problems in the field of computer technology.

Many sorting algorithms are known, offering various trade-offs in efficiency, simplicity, memory use and other factors. However, these algorithms rarely take into account the features of compilers and computer architectures that significantly influence performance. Hence the quicksort algorithm is analyzed and improved here for its worst case, even though practitioners may not have shown much interest, since they handle such cases with other sorting algorithms such as merge sort, which performs much better on the inputs for which quicksort is proven to be slow. As an academician I feel such outcomes have to be shared with our community. It is also well known that quicksort has proven to be the fastest on average when compared with the other n log n class of algorithms such as merge sort and heap sort. Other sorting algorithms such as bubble sort, selection sort, insertion sort and shell sort fall under the n² class, which shows slow performance. Analysis and performance measurement of both classes has been experimented with in [7, 8].

The remaining sections are organized as follows. The section Analysis of Quicksort reviews the algorithm's complexity in all possible cases; the Literature Survey gives a brief description of the references that lead us to classify the quicksort worst case in the class O(n²); the section Quicksort Worst Case Made Best Case explains the modified code, which sorts ordered input in Θ(n) time. Finally, in the Experimental Results section, the experiment conducted is presented as evidence of correctness, and the times clocked for different set-ups are plotted and discussed from the point of view of efficiency.

II. QUICK SORT
Quicksort is an algorithm based on the divide-and-conquer paradigm. It selects a pivot element and reorders the given list in such a way that all elements smaller than the pivot are on one side and those bigger are on the other. The sub-lists are then recursively sorted until the whole list is completely sorted. The pivot may be chosen as the first element when the input consists of random numbers. For ordered input, choosing the first element as pivot leads to slow performance, with complexity Θ(n²) (the worst case) instead of the O(n log n) achieved in the best and average cases. To make quicksort run in O(n log n) time on ordered input, practitioners have preferred random selection of the pivot, leading to randomized quicksort. A complete theoretical analysis of quicksort is given in the following section. Quicksort is also the default sorting scheme in some operating systems, such as UNIX.
III. ANALYSIS OF QUICKSORT
The rearrangement of the array described in the section above always takes O(n), or αn, time in every partition, where α is some constant. Suppose that the pivot we just chose has divided the array into two parts: one of size k and the other of size n − k. Both parts still need to be sorted. This gives us the following recurrence:

T(n) = T(k) + T(n − k) + αn, ---------------------- (1)

where T(n) is the time taken by the algorithm to sort n elements.

A. Worst case analysis
Consider the case when the pivot is the least element of the array (the input array is in ascending order), so that k = 1 and n − k = n − 1 in equation (1). In such a case we have:

T(n) = T(1) + T(n − 1) + αn

Expanding the recurrence i times gives:

T(n) = T(n − i) + iT(1) + α(n + (n − 1) + … + (n − i + 1)) ---------- (2)

Clearly such a recurrence can only go on until i = n − 1 (otherwise n − i would be less than 1). Substituting i = n − 1 in equation (2) gives:

T(n) = T(1) + (n − 1)T(1) + α(n + (n − 1) + … + 2)
     = nT(1) + α(n(n + 1)/2 − 1),

which is O(n²). This is the worst case of quicksort, which happens when the pivot we pick turns out to be the least element of the array to be sorted in every step (i.e. in every recursive call). A similar situation occurs if the pivot happens to be the largest element in every step.

B. Best case analysis
The best case of quicksort occurs when the pivot we pick happens to divide the array into two exactly equal parts in every step. Thus k = n/2 and n − k = n/2 in equation (1) for the original array of size n. Consider, therefore, the recurrence:

T(n) = 2T(n/2) + αn --------------------------- (3)
     = 2(2T(n/4) + αn/2) + αn  (substituting n/2 for n in equation (3))
     = 2²T(n/4) + 2αn          (simplifying and grouping terms)
     = 2²(2T(n/8) + αn/4) + 2αn
     = 2³T(n/8) + 3αn
     = 2^k T(n/2^k) + kαn      (continuing likewise to the kth step)

This recurrence continues only until n = 2^k (otherwise n/2^k < 1), i.e. until k = log n. Putting k = log n gives:

T(n) = nT(1) + αn log n,

which is O(n log n). This is the best case for quicksort.

It also turns out that in the average case (over all possible pivot configurations) quicksort has a time complexity of O(n log n); the proof is beyond the scope of this paper.

C. Avoiding the worst case
Practical implementations of quicksort often pick a pivot randomly from the list each time [1, 2]. This greatly reduces the chance that the worst case ever occurs. This method is seen to work excellently in practice, but considerable time is still spent in the randomizer [1]. The other technique, which deterministically prevents the worst case from ever occurring, is to find the median of the array to be sorted each time and use it as the pivot. The median can be found in linear time, but that is saddled with a huge constant-factor overhead, rendering it suboptimal for practical implementations [3].

IV. LITERATURE SURVEY
Thomas H. Cormen et al. [3] note that quicksort deteriorates to quadratic time in the worst case and spends a lot of time even on sorted or almost sorted data: it performs many comparisons on sorted data, although the swap count is low for sorted or almost sorted input. Mark Allen Weiss [11] also states that quicksort has O(n²) worst-case performance. Horowitz et al. [1] observe that a possible input on which quicksort displays worst-case behavior is one in which the elements are already in order.

Almost all authors of algorithm books and research papers on quicksort analysis agree that quicksort performs no better than O(n²) on ordered input (treating the randomized and median-based variants as exceptions). On the basis of this survey and many other references, theoreticians and practitioners have placed the quicksort worst case in the asymptotic class O(n²).
V. QUICKSORT WORST CASE MADE BEST CASE
When bubble sort and selection sort can be modified to perform only n − 1 comparisons on sorted input data, as stated in most algorithm books and confirmed by experiments conducted in the laboratory, why can quicksort not do the same? This prompted me to set aside a short time and modify the existing code of quicksort and partition into quicksort_wmb and partition_wmb, as shown in the following C++ code in figure 1.1. The bold statements in the code reflect the changes made over the existing version.

void quick::quicksort_wmb(int low, int high)
{
    int j;
    if (low < high)
    {
        j = partition_wmb(low, high);
        if (no_part == 1)
        {
            if (Aorder) { cout << "Aorder" << endl; return; }
            else if (Dorder) { cout << "Dorder" << endl; return; }
        }
        quicksort_wmb(low, j - 1);
        quicksort_wmb(j + 1, high);
    } // end of if compound statement
} // end of quicksort

int quick::partition_wmb(int low, int high)
{
    int key, i, j, temp;
    no_part++;            // global variable
    key = a[low];
    i = low;
    j = high + 1;
    while (i <= j)
    {
        do { i++; } while (key >= a[i]);
        do { j--; } while (key < a[j]);
        if (i < j) { temp = a[i]; a[i] = a[j]; a[j] = temp; }
    }
    temp = a[low]; a[low] = a[j]; a[j] = temp;
    if (no_part == 1)
    {
        if ((j == n - 1) && (i == n)) Dorder = 1;
        else if ((i == 1) && (j == 0)) Aorder = 1;
    }
    return j;
} // end of partition_wmb

Note: The array element at position high+1 is assigned the value maximum + 1 (1000 in our case) as a sentinel, since the algorithm stops index i at high+1.

Figure 1.1: Quicksort and partition functions modified to perform as the best case for ordered inputs.

In the above code, three global variables, int no_part, Aorder and Dorder, are initialized to zero. The decision to return from quicksort is made when the respective global variable is set to 1, and further recursion is avoided. The global variable no_part counts the number of partitions made in each execution of quicksort; Aorder indicates that the array was found to be in ascending order; similarly, Dorder indicates that the array is in descending order.

Dorder is set to 1 in the partition function when i = n and j = n − 1, that is, when i is past the upper bound and j is at the upper bound of the given array: i gets incremented to n searching for key > a[i], key being the highest element in the array, while j stands still at n − 1 because a[n − 1], i.e. a[j], is less than key, as shown in figure 1.3.

Aorder is set to 1 in the partition function when i = 1 and j = 0: i does not get incremented further because key >= a[i] becomes false in the statement do { i++; } while (key >= a[i]); (the do-while executes only once), whereas key < a[j] remains true until j becomes 0 after n executions of the statement do { j--; } while (key < a[j]); (until key = a[j]). The two cases are shown in figure 1.2 (ascending-order input) and figure 1.3 (descending-order input).

In the case of descending-order input, a small modification can be made to the algorithm to avoid the last (highest) element being swapped with the first (lowest) element, and instead to print the array in reverse order. This takes Θ(2n) time, i.e. Θ(n) for one execution of the partition function and Θ(n) for reversing the input array.

quicksort_wmb also fails to sort random input arrays whose first element is the highest or the lowest among the array elements. In such cases one can use functions int findmin() or int findmax(), which return the minimum and maximum array elements respectively; comparing their results with the first element of the given input array overcomes the problem. This modification takes about O(n) extra time on top of the Θ(n) of quicksort_wmb, giving a time complexity of Θ(n + n) = Θ(n) by Theorem 1 given in chapter 2 of [2].
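The idea of figure 1.1 can be restated as a self-contained sketch: detect fully ascending or descending input during the very first partition and stop early. This is my own free-function adaptation, not the paper's exact class code; a bound check stands in for the sentinel a[high+1] = maximum + 1, elements are assumed distinct, and, as noted above, an input whose first element is the overall maximum or minimum would falsely trigger the flags without the findmin/findmax guard.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Globals mirroring no_part / Aorder / Dorder in figure 1.1.
static int no_part = 0, Aorder = 0, Dorder = 0;

int partition_wmb(std::vector<int>& a, int low, int high) {
    ++no_part;
    int key = a[low], i = low, j = high + 1;
    int n = (int)a.size();
    while (i <= j) {
        do { ++i; } while (i < n && key >= a[i]); // bound check replaces the sentinel
        do { --j; } while (key < a[j]);
        if (i < j) std::swap(a[i], a[j]); else break;
    }
    if (no_part == 1) {
        if (j == n - 1 && i == n) { Dorder = 1; return j; } // pivot was the maximum: skip the swap
        if (i == 1 && j == 0) Aorder = 1;                   // pivot was the minimum: already sorted
    }
    std::swap(a[low], a[j]);
    return j;
}

void quicksort_wmb(std::vector<int>& a, int low, int high) {
    if (low < high) {
        int j = partition_wmb(a, low, high);
        if (no_part == 1 && (Aorder || Dorder)) {
            if (Dorder) std::reverse(a.begin(), a.end()); // fix descending input in Θ(n)
            return;                                       // skip all recursion
        }
        quicksort_wmb(a, low, j - 1);
        quicksort_wmb(a, j + 1, high);
    }
}

void sort_wmb(std::vector<int>& a) {
    no_part = Aorder = Dorder = 0;       // reset the flags for each run
    quicksort_wmb(a, 0, (int)a.size() - 1);
}
```

For ascending input the flags leave the array untouched after one Θ(n) partition; for descending input the swap is skipped and a single Θ(n) reversal finishes the job, matching the Θ(2n) = Θ(n) bound claimed above.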
VI. EXPERIMENTAL RESULTS
The configuration of the test bed used for the experiments is: Intel® Core™ 2 E7200, 2.53 GHz, 0.99 GB of RAM, running Microsoft Windows XP, Service Pack 3. First the code quicksort_wmb was tested for correctness on different samples under the following three categories (two test cases under each category):
(1) Ascending order input (Table 1)
(2) Descending order input (Table 2)
(3) Randomly generated input numbers (Table 3)

The results obtained for these three categories (for simplicity only two test cases per category are shown here) establish the correctness of the algorithm. In the tables, the first line indicates the input size of the array, the second line requests the input list of elements, and "no. of partition" displays the number of partitions performed when quicksort_wmb was executed. Finally the sorted array of elements is displayed. The flag Aorder or Dorder is displayed when quicksort_wmb identifies the input array as being in ascending or descending order respectively, having returned from quicksort_wmb after the first partition.

After confirmation of the test cases for quicksort_wmb, the execution time was measured for various samples of n ranging from 0 to 1000 for the following set of algorithms:
1. Quicksort for randomly generated array elements with the first element as pivot (column 3)
2. Quicksort for randomly generated array elements with randomized pivot (column 4)
3. Quicksort for ordered input array elements with the first element as pivot (column 5)
4. Randomized quicksort for ordered input array elements with randomized pivot (column 6)
5. WMB_Quicksort for any given array of elements (column 7)

A method specified by [1] under Bird's Eye View is used to clock the time in my C++ code for all the algorithms listed above. We can notice from Table 4 that the quicksort worst-case time for n = 1000 (column 5) is 0.467081, which is the slowest among the experimental results (i.e. in the last row). For a randomly generated input array with the first element as pivot, the time clocked is 0.093458, quite fast compared with the worst-case algorithm (column 3). Randomized pivot selection for the same input resulted in a time of 0.09873, slightly higher because of the call to the randomizing function to select the pivot (column 4). Column 6, which shows the results for randomized quicksort on ordered input with a randomly chosen pivot, improves the speed, clocking 0.064995 for n = 1000, the fastest among the experiments considered so far. Finally, the worst-made-best algorithm clocked 0.01319 for ordered input. The graph plotted in figure 1.4 for the data in Table 4 gives a clearer picture of speed and time complexity.

Table 1: Result obtained when ascending order input was considered for quicksort_wmb

enter the size of array : 20
enter the elements
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
no. of partition: 1
Aorder
sorted array is
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
________________________________________
enter the size of array : 50
enter the elements
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
Aorder
no. of partition: 1
sorted array is
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
Table 2: Result obtained when descending order input was considered for quicksort_wmb

enter the size of array : 20
enter the elements
20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
Dorder
no. of partition: 1
sorted array is
1 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 20
___________________________________________________________
enter the size of array : 50
enter the elements
50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27
26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
Dorder
no. of partition: 1
sorted array is
1 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27
26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 50

Table 3: Result obtained when random input numbers were considered for quicksort_wmb

enter the size of array : 20
enter the elements
6 10 2 10 16 17 15 15 8 6 4 18 11 19 12 0 12 1 3 7
no. of partition: 15
sorted array is
0 1 2 3 4 6 6 7 8 10 10 11 12 12 15 15 16 17 18 19
___________________________________________________________
enter the size of array : 50
enter the elements
46 30 32 40 6 17 45 15 48 26 4 8 21 29 42 10 12 21 13 47 19 41 40 35 14
9 2 21 29 16 31 1 45 43 34 10 29 45 11 42 39 38 16 14 42 13 16 14 39 1
no. of partition: 35
sorted array is
1 1 2 4 6 8 9 10 10 11 12 13 13 14 14 14 15 16 16 16 17 19 21 21 21 26
29 29 29 30 31 32 34 35 38 39 39 40 40 41 42 42 42 43 45 45 45 46 47 48

Table 4: Time clocked in milliseconds for various samples of n

n value | Avg case | Randomized | Worst    | Worst case | Worst
        | time     | avg case   | case     | randomized | made best
0       | 0.000389 | 0.000395   | 0.000392 | 0.000395   | 0.000392
10      | 0.001304 | 0.001331   | 0.000879 | 0.000893   | 0.00054
20      | 0.002    | 0.00208    | 0.001659 | 0.001592   | 0.000672
30      | 0.00271  | 0.002853   | 0.002493 | 0.002245   | 0.000792
40      | 0.003451 | 0.003644   | 0.00333  | 0.002886   | 0.000922
50      | 0.004201 | 0.004454   | 0.004367 | 0.003525   | 0.001044
60      | 0.004971 | 0.005282   | 0.005494 | 0.004163   | 0.001188
70      | 0.005766 | 0.006136   | 0.006669 | 0.004812   | 0.001318
80      | 0.006558 | 0.006982   | 0.007945 | 0.005457   | 0.001445
90      | 0.007361 | 0.007834   | 0.009262 | 0.006089   | 0.001573
100     | 0.008174 | 0.0087     | 0.010625 | 0.00671    | 0.001698
200     | 0.016648 | 0.017748   | 0.029192 | 0.013082   | 0.002961
300     | 0.02558  | 0.027214   | 0.055645 | 0.019452   | 0.004227
400     | 0.034796 | 0.03698    | 0.090217 | 0.025911   | 0.005487
500     | 0.04424  | 0.046964   | 0.132669 | 0.032436   | 0.00675
600     | 0.053857 | 0.057149   | 0.183577 | 0.038907   | 0.008009
700     | 0.063597 | 0.067406   | 0.242041 | 0.045375   | 0.009274
800     | 0.073447 | 0.077662   | 0.309246 | 0.051857   | 0.010538
900     | 0.083448 | 0.08816    | 0.38297  | 0.058463   | 0.011894
1000    | 0.093458 | 0.09873    | 0.467081 | 0.064995   | 0.01319

VII. CONCLUSION AND FUTURE WORK
In this paper I have shown how quicksort can perform well from an angle that has not been discussed in the theories and analyses of quicksort. With fine-tuning, quicksort_wmb can be used to sort all possible cases of input data. Considering the work done in this paper, we can claim that this modification classifies quicksort's efficiency as Θ(n log n) for the worst and average cases, when the input is almost ordered or randomly ordered respectively, and Θ(n) for the best case, when the input is strictly ordered. This paper also lays the groundwork for proving the algorithm's correctness formally. It further encourages one to consider program design from various angles, such as compiler design and the system architecture used to execute the programs, and encourages a program designer to review designed algorithms, especially for sorting and searching.

ACKNOWLEDGEMENT
First of all I would like to thank the almighty for making this work possible. Heartfelt gratitude to Sartaj Sahni for support during the course of the analysis. Special thanks to my colleagues at Vidya Vikas Institute of Engineering and Technology, Mysore, especially Assistant Professor Aditya C. R, who reviewed the work and pointed out places where I had to improve, and finally to my Head of Department, Dr. Khalid Nazim S. A, for his encouragement and motivation.

REFERENCES
[1] Sartaj Sahni, "Data Structures and Algorithms in C++", University Press, 2004, chapter 4: Performance Measurement, page 123.
[2] A. Levitin, "Introduction to the Design and Analysis of Algorithms", Addison-Wesley, Boston, MA, 2007.
[3] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein, "Introduction to Algorithms", Second Edition, Prentice-Hall, New Delhi, 2004.
[4] Vandana Sharma, Parvinder S. Sandhu, Satwinder Singh and Baljit Saini, "Analysis of Modified Heap Sort Algorithm on Different Environment", World Academy of Science, Engineering and Technology, 42, 2008.
[5] Yedidyah Langsam, Moshe J. Augenstein and Aaron M. Tenenbaum, "An Introduction to Data Structures with C++", Prentice Hall India Learning Private Limited, 2e, 2008.
[6] Gina Soileau, Muhammad Younus, Suresh Nandlall, Tamiko Jenkins, Thierry Ngouolali and Tom Rivers, "Sorting Algorithm Analysis", Data Structures and Algorithms (SMT-274304-01-08FA1), Professor James Iannibelli, December 21, 2008.
[7] Omar Khan Durrani, Shreelakshmi V and Sushma Shetty, "Performance Measurement and Analysis of Sorting Algorithms", National Conference on Convergent Innovative Technologies and Management (CITAM-11), Dec 2-3, 2011, Cambridge Institute of Technology and Management, Bangalore.
[8] Omar Khan Durrani, Shreelakshmi V, Sushma Shetty and Vinutha D C, "Analysis and Determination of Asymptotic Behavior Range for Popular Sorting Algorithms", Special Issue of International Journal of Computer Science & Informatics (IJCSI), ISSN (Print): 2231-5292, Vol. II, Issue 1, 2.
[9] C. Canaan, M. S. Garai and M. Daya, "Popular Sorting Algorithms", World Applied Programming, Vol. 1, No. 1, April 2011, pages 42-50.
[10] Donald Knuth, "The Art of Computer Programming, Volume 3: Sorting and Searching", Second Edition, Addison-Wesley, 1998, ISBN 0-201-89685-0, pages 106-110 of section 5.2.2: Sorting by Exchanging.
[11] M. A. Weiss, "Data Structures and Algorithm Analysis in C", Second Edition, Addison-Wesley, 1997, ISBN 0-201-49840-5.