Uploaded by Kartikay Kaul

Efficient Text Searching Algorithm
Kartikay Kaul | Professor Saleena B. | School of Computing Science and Engineering
Introduction
First, consider a simple C++ character array: “This is an array.” What if you wanted to remove all the whitespace? We can accomplish this by constructing a new array that leaves the whitespace out. Alternatively, we can modify the same array in place, shifting all the remaining contents one position to the left every time we run into a whitespace character. We then convert the entire array to lowercase (or uppercase) so that matching is case-insensitive. Once this text parsing is done, we can apply and analyse the well-known text searching algorithms: naïve search, the Rabin-Karp method, the Knuth-Morris-Pratt (KMP) method and the Boyer-Moore method.
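The parsing step described above can be sketched in C++ as follows. This is a minimal sketch that assumes the text is a NUL-terminated `char` array, and the function name `normalize` is hypothetical. Shifting the whole tail left on every whitespace costs time proportional to the rest of the array each time. This sketch instead keeps separate read and write positions, which compacts the array in a single pass with the same result. It folds to lowercase; uppercase would work equally well.

```cpp
#include <cctype>
#include <cstddef>

// Hypothetical helper: removes whitespace from a NUL-terminated C string
// in place and converts it to lowercase. Returns the new length.
std::size_t normalize(char* text) {
    std::size_t write = 0;  // next position to fill in the compacted string
    for (std::size_t read = 0; text[read] != '\0'; ++read) {
        // <cctype> functions require a value representable as unsigned char.
        unsigned char c = static_cast<unsigned char>(text[read]);
        if (std::isspace(c)) {
            continue;  // skip whitespace instead of shifting the tail left
        }
        text[write++] = static_cast<char>(std::tolower(c));
    }
    text[write] = '\0';  // terminate the compacted string
    return write;
}
```

For example, applying it to `char s[] = "This is an array.";` leaves `s` holding `thisisanarray.` and returns 14.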
The KMP method involves some preprocessing of the pattern. The Rabin-Karp method uses a hashing technique, which makes it quite useful for checking documents for plagiarism.
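The hashing idea can be sketched as a minimal Rabin-Karp search in C++. This is not the poster's implementation. The function name `rabinKarp`, the base 256 and the modulus 1000000007 are illustrative assumptions. The pattern's hash is compared with a rolling hash of each text window. Characters are compared only when the two hashes agree, so a hash collision can never be reported as a match.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Rabin-Karp search: returns every index where `pat` occurs in `text`.
// Base and modulus are illustrative choices, not values from the poster.
std::vector<std::size_t> rabinKarp(const std::string& text, const std::string& pat) {
    std::vector<std::size_t> hits;
    const std::size_t n = text.size(), m = pat.size();
    if (m == 0 || m > n) return hits;

    const std::uint64_t base = 256, mod = 1000000007ULL;
    std::uint64_t high = 1;  // base^(m-1) % mod: weight of a window's first char
    for (std::size_t i = 1; i < m; ++i) high = (high * base) % mod;

    // Hash the pattern and the first window of the text.
    std::uint64_t hp = 0, ht = 0;
    for (std::size_t i = 0; i < m; ++i) {
        hp = (hp * base + static_cast<unsigned char>(pat[i])) % mod;
        ht = (ht * base + static_cast<unsigned char>(text[i])) % mod;
    }

    for (std::size_t s = 0; ; ++s) {
        // Equal hashes may be a collision, so confirm character by character.
        if (hp == ht && text.compare(s, m, pat) == 0) hits.push_back(s);
        if (s + m >= n) break;  // no further window
        // Roll the window: remove text[s], append text[s + m].
        std::uint64_t lead = (static_cast<unsigned char>(text[s]) * high) % mod;
        ht = (ht + mod - lead) % mod;
        ht = (ht * base + static_cast<unsigned char>(text[s + m])) % mod;
    }
    return hits;
}
```

For example, `rabinKarp("abracadabra", "abra")` returns the positions 0 and 7.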
The Boyer-Moore algorithm also preprocesses the pattern. When the text is small, its running time is the same as that of the naïve method.
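The Boyer-Moore preprocessing and search can be sketched in a simplified C++ form. This sketch uses the bad-character rule alone; the full Boyer-Moore algorithm also applies the good-suffix rule, which is omitted here. The name `boyerMooreBadChar` is hypothetical. The preprocessing records the last position of each byte value in the pattern. The search then compares right to left and uses that table to skip ahead. Because no shift can exceed the pattern length plus one, a short pattern leaves little room to skip.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Simplified Boyer-Moore (bad-character rule only): returns match indices.
std::vector<std::size_t> boyerMooreBadChar(const std::string& text, const std::string& pat) {
    std::vector<std::size_t> hits;
    const long n = static_cast<long>(text.size());
    const long m = static_cast<long>(pat.size());
    if (m == 0 || m > n) return hits;

    // Preprocessing: last index of each byte value in the pattern, -1 if absent.
    std::array<long, 256> last;
    last.fill(-1);
    for (long i = 0; i < m; ++i) last[static_cast<unsigned char>(pat[i])] = i;

    long s = 0;  // current alignment of the pattern against the text
    while (s <= n - m) {
        long j = m - 1;
        while (j >= 0 && pat[j] == text[s + j]) --j;  // compare right to left
        if (j < 0) {
            hits.push_back(static_cast<std::size_t>(s));
            // Align the character after this window with its last
            // occurrence in the pattern (or step by one at the end).
            s += (s + m < n) ? m - last[static_cast<unsigned char>(text[s + m])] : 1;
        } else {
            // Bad-character shift: align the mismatched text character with
            // its last occurrence in the pattern, moving at least one step.
            s += std::max(1L, j - last[static_cast<unsigned char>(text[s + j])]);
        }
    }
    return hits;
}
```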
Results
Table 1: Running time in milliseconds – alphabetical pattern
Pattern length   Matches   Naive   KMP   Boyer-Moore   Rabin-Karp
3                40        225     221   242           194
10               0         224     221   82            194
50               0         224     221   25            194
Table 2: Running time in milliseconds – DNA text file
Pattern length   Matches   Naive   KMP   Boyer-Moore   Rabin-Karp
3                156455    331     308   340           204
10               8         329     306   204           187
50               2         328     305   148           186

Scope
Intrusion detection, plagiarism detection, DNA sequence analysis in bioinformatics and text mining research are some of the fields in which these algorithms apply. It is therefore important for us to know which algorithm is applicable to the field we have chosen to investigate.
Methodology
The naïve search method is quite simple and straightforward: it slides the pattern along the text one position at a time, compares the characters at each alignment, and that's it.
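A minimal C++ sketch can contrast the naïve sliding search with KMP's preprocessing. The function names `naiveSearch`, `kmpFailure` and `kmpSearch` are hypothetical, and this is not the code used to produce the tables. The naïve search restarts its comparison at every alignment. KMP first builds a failure table from the pattern, so the text index never moves backwards.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Naive search: try every alignment and compare character by character.
std::vector<std::size_t> naiveSearch(const std::string& text, const std::string& pat) {
    std::vector<std::size_t> hits;
    const std::size_t n = text.size(), m = pat.size();
    if (m == 0 || m > n) return hits;
    for (std::size_t s = 0; s + m <= n; ++s) {
        std::size_t j = 0;
        while (j < m && text[s + j] == pat[j]) ++j;
        if (j == m) hits.push_back(s);
    }
    return hits;
}

// KMP preprocessing: fail[i] is the length of the longest proper prefix
// of pat[0..i] that is also a suffix of pat[0..i].
std::vector<std::size_t> kmpFailure(const std::string& pat) {
    std::vector<std::size_t> fail(pat.size(), 0);
    std::size_t k = 0;
    for (std::size_t i = 1; i < pat.size(); ++i) {
        while (k > 0 && pat[i] != pat[k]) k = fail[k - 1];
        if (pat[i] == pat[k]) ++k;
        fail[i] = k;
    }
    return fail;
}

// KMP search: on a mismatch, the failure table says how much of the
// pattern is still matched, so no text character is read twice.
std::vector<std::size_t> kmpSearch(const std::string& text, const std::string& pat) {
    std::vector<std::size_t> hits;
    const std::size_t m = pat.size();
    if (m == 0 || m > text.size()) return hits;
    const std::vector<std::size_t> fail = kmpFailure(pat);
    std::size_t k = 0;  // number of pattern characters currently matched
    for (std::size_t i = 0; i < text.size(); ++i) {
        while (k > 0 && text[i] != pat[k]) k = fail[k - 1];
        if (text[i] == pat[k]) ++k;
        if (k == m) {
            hits.push_back(i + 1 - m);
            k = fail[k - 1];  // continue, allowing overlapping matches
        }
    }
    return hits;
}
```

On the text `"aaaa"` with the pattern `"aa"`, both searches report the overlapping matches at positions 0, 1 and 2.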
Conclusion
Out of all the tested algorithms, the Boyer-Moore algorithm seems to be the best. It is not the best in all cases, however: with 3-character patterns it was the slowest of the four in both tables. Applications such as checking text documents for plagiarism find Rabin-Karp the most apt solution. For a text that is only about 100 characters long, the Boyer-Moore algorithm has the same running time as the naïve search algorithm, so in such cases we can implement the short and simple naïve method.
CONTACT DETAILS
kartikay.kaul2016@vitstudent.ac.in
kartikaykaul13@gmail.com