Efficient Text Searching Algorithms: A Comparative Analysis

Kartikay Kaul | Professor Saleena B. | School of Computing Science and Engineering

Introduction

First, consider a simple C++ character array: "This is an array." What if you wanted to remove all the whitespace? We can do this by building a new array in which, every time we run into a whitespace character, the contents that follow are shifted to the left. Alternatively, we can modify the same array in place. We then convert the entire array to either lowercase or uppercase so that matching is case-insensitive. Once the text has been parsed, we can apply and analyse the well-known text searching algorithms: naïve search, Rabin-Karp, KMP, and Boyer-Moore.

Results

Table 1: Results in milliseconds (alphabetical pattern)

|pattern|   Matches   Naive   KMP   Boyer-Moore   RK
3           40        225     221   242           194
10          0         224     221   82            194
50          0         224     221   25            194

Table 2: Results in milliseconds (DNA text file)

|pattern|   Matches   Naive   KMP   Boyer-Moore   RK
3           156455    331     308   340           204
10          8         329     306   204           187
50          2         328     305   148           186

Scope

Intrusion detection, plagiarism detection, DNA sequence analysis in bioinformatics, and text-mining research are some of the fields in which these algorithms are applied. It is therefore important to know which algorithm is applicable where, based on the field under investigation.

Methodology

The naïve search method is simple and straightforward: it slides the pattern along the text and compares at each position. The KMP method involves some preprocessing. The Rabin-Karp method uses hashing, which makes it quite useful for checking documents for plagiarism. The Boyer-Moore algorithm also does preprocessing; it behaves the same as the naïve method when the text is small.

Conclusion

Of all the algorithms tested, Boyer-Moore appears to be the best overall, but it is not the best in every case. For a pattern of length 3, for example, it was the slowest of the four in both tables. Applications such as plagiarism detection in text documents find Rabin-Karp the most apt solution. For a text only about 100 characters long, Boyer-Moore has the same running time as naïve search, so in such cases the short and simple naïve method can be used instead.

CONTACT DETAILS

kartikay.kaul2016@vitstudent.ac.in
kartikaykaul13@gmail.com
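
As an illustration of the preprocessing step from the Introduction and the naïve search from the Methodology, the following is a minimal C++ sketch, not the authors' implementation. The function names `normalize` and `naiveSearch` are hypothetical. It assumes a NUL-terminated character array, and it removes whitespace in a single read/write pass over the same array rather than shifting the tail left at every whitespace character; the resulting array is the same.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: remove all whitespace from a NUL-terminated character
// array in place and convert the remaining characters to lowercase.
// Returns the new length of the text.
std::size_t normalize(char* buf) {
    assert(buf != nullptr);
    std::size_t write = 0;
    for (std::size_t read = 0; buf[read] != '\0'; ++read) {
        // <cctype> functions expect a value representable as unsigned char.
        unsigned char c = static_cast<unsigned char>(buf[read]);
        if (!std::isspace(c)) {
            buf[write++] = static_cast<char>(std::tolower(c));
        }
    }
    buf[write] = '\0';
    return write;
}

// Hypothetical helper: naive search. Slide the pattern over the text one
// position at a time and compare character by character.
// Returns the start index of every match (overlapping matches included).
std::vector<std::size_t> naiveSearch(const char* text, const char* pattern) {
    std::vector<std::size_t> matches;
    const std::size_t n = std::strlen(text);
    const std::size_t m = std::strlen(pattern);
    if (m == 0 || m > n) return matches;
    for (std::size_t i = 0; i + m <= n; ++i) {
        std::size_t j = 0;
        while (j < m && text[i + j] == pattern[j]) ++j;
        if (j == m) matches.push_back(i);
    }
    return matches;
}
```

Applied to the array "This is an array.", `normalize` leaves "thisisanarray." in the buffer, and `naiveSearch` on that buffer with the pattern "is" reports matches at indices 2 and 4. The worst-case cost of the search is proportional to the text length times the pattern length, which is why the preprocessing-based methods are worth comparing against it on larger inputs.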
