Submission Date: August 7, 2012

TRIBHUVAN UNIVERSITY
Institute Of Engineering
Pulchowk Campus
Department of Electronics and Computer Engineering
A Report
on
Artificial Intelligence
Title: Fuzzy Logic String Search
Submitted by
Aayush Shrestha
(065/BCT/502)
Submitted to
Department of Electronics and
Computer Engineering
Pulchowk Campus
Submission Date: August 7, 2012
ACKNOWLEDGEMENT
I would like to express my deep gratitude to Dr. Sashidhar Ram Joshi for including the
Artificial Intelligence project, which is expected to help us gain practical
knowledge of implementing Artificial Intelligence in real-life applications.
I am grateful to him for his continuous suggestions, ideas and encouragement to do a
project that would address a real-world problem.
I would also like to thank all my friends for their ideas and help.
ABSTRACT
Fuzzy Logic String Search is a web plugin written in JavaScript that uses the “Levenshtein
Edit Distance Algorithm” to search for appropriate string suggestions from a dictionary
file that is stored in a “trie structure”.
The plugin searches the dictionary for the list of words with the same prefix as the text the
user types in the input form. The Levenshtein edit distance algorithm is then used to filter
the list to show only the options that are within the given edit distance limit.
INTRODUCTION
In computer science, fuzzy string searching is the technique of finding strings
that match a pattern approximately (rather than exactly). It provides a powerful way of
comparing strings, because it measures how similar two strings are. If two strings are
the same, the similarity between them is 1, i.e. 100%. If the strings are different, the
similarity between them is less than one, i.e. there is a certain distance between them.
For example, if we use the fuzzy string searching algorithm to find the similarity between
the strings ‘John’ and ‘Jon’, the similarity between them is 0.6875.
Fuzzy Logic String Search can be customized to compare not only strings but also other
data, so that it can help in the calculation and comparison of data in various other
fields.
PROJECT OBJECTIVE
This project is related to finding similarities between strings. By using this program we
can find possible alternatives for a string that has been entered incorrectly.
The major objectives of the project are as follows:
1) To find the similarities between two strings.
2) To find the possible alternatives for the entered string if the entered string is
wrong.
3) To use a fuzzy string searching algorithm.
4) To utilize the “Trie Structure” to store the words for a better search mechanism.
METHODOLOGY
FLOWCHART
1. Start.
2. Enter letters of the string.
3. Search the dictionary and list all the words that start with the input prefix.
4. Apply the Levenshtein Algorithm to find the edit distance between the input string and each of the listed words from the dictionary.
5. Is the edit distance less than 4? If yes, display the word.
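The flowchart steps can be sketched in JavaScript roughly as follows. This is a minimal illustration, not the plugin's actual source: the function names `editDistance` and `suggest`, the plain-array dictionary, and the `prefixLength` parameter (how many leading characters count as the prefix) are assumptions made for the example. Only the limit of 4 comes from the flowchart.

```javascript
// Unit-cost Levenshtein distance, computed with two rolling rows.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: list words sharing the input's prefix, then keep
// only those whose edit distance to the input is below the limit.
// prefixLength is an assumed parameter; the report does not state it.
function suggest(input, dictionary, prefixLength = 1, limit = 4) {
  const typed = input.toLowerCase();
  const prefix = typed.slice(0, prefixLength);
  return dictionary
    .filter((word) => word.toLowerCase().startsWith(prefix))
    .map((word) => ({ word, distance: editDistance(typed, word.toLowerCase()) }))
    .filter((entry) => entry.distance < limit)
    .sort((x, y) => x.distance - y.distance)
    .map((entry) => entry.word);
}

const suggestions = suggest("jon", ["john", "jonathan", "jane", "bob", "jon"]);
console.log(suggestions); // closest matches first
```

Sorting by distance before display is a design choice of this sketch; the flowchart itself only specifies the threshold test.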
DESCRIPTION
The program starts reading the text as soon as the user starts typing it. It then reads
a dictionary file containing a large list of valid English words in alphabetical order,
which are compared with the entered string. The program also keeps a count of the total
number of strings read.
The comparison between two strings, one entered by the user and the other obtained from
the dictionary, is done by recording the number of hits, the cost of each hit and the
distance between the two strings. From the hits, their cost and the distance, we find the
similarity value between the two strings. The similarity value ranges from 0 to 1. A
similarity value of 1 means that the two strings are exactly the same. The greater the
similarity value, the more similar the two strings are. Finally, we select at most the
ten strings that have the highest similarity values with the given string.
Trie Structure
In computer science, a trie, or prefix tree, is an ordered tree data structure that is
used to store a dynamic set or associative array where the keys are usually strings.
Unlike a binary search tree, no node in the tree stores the key associated with that
node; instead, its position in the tree defines the key with which it is associated. All
the descendants of a node have a common prefix of the string associated with that
node, and the root is associated with the empty string. Values are normally not
associated with every node, only with leaves and some inner nodes that correspond
to keys of interest.
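As an illustration of the structure described above, here is a minimal trie sketch in JavaScript. The class and method names (`TrieNode`, `Trie`, `insert`, `wordsWithPrefix`) and the sample words are assumptions for this example and are not taken from the plugin.

```javascript
// Each node maps a character to a child node; the key itself is never stored.
class TrieNode {
  constructor() {
    this.children = new Map();
    this.isWord = false; // true when the path from the root spells a stored word
  }
}

class Trie {
  constructor() {
    this.root = new TrieNode(); // the root corresponds to the empty string
  }

  insert(word) {
    let node = this.root;
    for (const ch of word) {
      if (!node.children.has(ch)) node.children.set(ch, new TrieNode());
      node = node.children.get(ch);
    }
    node.isWord = true;
  }

  // Collect every stored word that begins with the given prefix.
  wordsWithPrefix(prefix) {
    let node = this.root;
    for (const ch of prefix) {
      node = node.children.get(ch);
      if (!node) return []; // no stored word has this prefix
    }
    const results = [];
    const walk = (current, path) => {
      if (current.isWord) results.push(path);
      for (const [ch, child] of current.children) walk(child, path + ch);
    };
    walk(node, prefix);
    return results;
  }
}

const trie = new Trie();
["geo", "george", "geordie", "john"].forEach((w) => trie.insert(w));
console.log(trie.wordsWithPrefix("geo"));
```

Because all words with a common prefix live under one node, the prefix lookup only visits the part of the dictionary that can match, which is the reason for choosing a trie over a flat word list.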
Levenshtein Edit Distance
The Levenshtein Edit Distance (LED) is the number of edits (insertions, deletions, and
substitutions) required to transform a string (A) into another string (B). Among other
levels, edit distances can be computed at the level of letters, words, phrases, or even
passages.
Levenshtein distance is obtained by finding the cheapest way to transform one string
into another. Transformations are the one-step operations of (single-character) insertion,
deletion and substitution. In one simple weighting, a substitution costs two units except
when the source and target characters are identical, in which case the cost is zero, and
an insertion or a deletion costs half that of a substitution, i.e. one unit. The matrix
calculation described in this report instead charges one unit for each substitution. The
algorithm in effect considers all of the different sequences of operations that transform
one string into the other and keeps the cheapest.
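To make the effect of the operation costs concrete, the following sketch computes the distance with configurable costs. The function name `weightedDistance` and the shape of the `costs` object are assumptions for illustration only.

```javascript
// Levenshtein distance with configurable operation costs (two rolling rows).
function weightedDistance(a, b, costs = { insert: 1, remove: 1, substitute: 2 }) {
  // Row 0: build b from the empty string using insertions only.
  let prev = [0];
  for (let j = 1; j <= b.length; j++) prev[j] = prev[j - 1] + costs.insert;

  for (let i = 1; i <= a.length; i++) {
    const curr = [prev[0] + costs.remove]; // delete the first i characters of a
    for (let j = 1; j <= b.length; j++) {
      const sub = a[i - 1] === b[j - 1] ? 0 : costs.substitute;
      curr[j] = Math.min(
        prev[j] + costs.remove,     // delete a[i - 1]
        curr[j - 1] + costs.insert, // insert b[j - 1]
        prev[j - 1] + sub           // match or substitute
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

const unit = { insert: 1, remove: 1, substitute: 1 };
console.log(weightedDistance("George", "Geordie"));       // 3 with substitution cost 2
console.log(weightedDistance("George", "Geordie", unit)); // 2 with unit costs
```

With a substitution cost of two, replacing a character is no cheaper than deleting it and inserting another, so the same pair of strings can have a different distance depending on the weighting chosen.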
For example, counting each edit as one unit, the Levenshtein distance between "George" and
"Geordie" is 2, since the following two edits change one into the other, and there is no
way to do it with fewer than two edits:
1. George → Georde (substitution of 'd' for 'g')
2. Georde → Geordie (insertion of 'i' between 'd' and 'e')
The edit distance is calculated by forming a matrix of size (m+1)x(n+1), where m is the
length of the first word and n is the length of the second word (in the "George" vs.
"Geordie" example above, the matrix would measure 7 by 8). Note the "plus one": the extra
row and column hold the "seed" values that kick everything off and allow a very simple
comparison later. The first column and row will not be modified later, as the loops start at
the column and row indexed by one, not zero.
While looping through the strings, the indexes of both strings serve as the coordinates
of the matrix. Note that the string indexes and the matrix coordinates will be off by one,
as the matrix coordinates start at one. The actual calculation looks at three cells: the
cell to the left, the cell above and the cell to the upper left. One is added to the
values of the cell to the left and the cell above (the cost of an insertion or a
deletion), the result of the character comparison is added to the value of the cell to
the upper left, and the minimum of the three candidates is taken.
Once the matrix is initialized, the remainder of the values may be filled in. Once the
entire matrix is filled in, the answer is in the lower right hand cell.
For each comparison, either 0 or 1 is the result. If the two characters are the same,
then the result is 0; if they are different, the result is 1. This is used in generating one
of the inputs to the selection of the minimum value of the three candidates that will
result in the value inserted into the next cell of the matrix. The result of comparing
the 'G' to the 'G' would then be 0. The cell above has a value of 1 and the cell to the left
has a value of 1, so adding one to each gives 2 and 2, and the value of the cell one up and
one to the left plus the result of the comparison is 0. Feeding 2, 2, 0 into the three-way
minimum results in 0, which is then placed into the cell, resulting in a matrix with these
values:
[Figure 2: Matrix after the first iteration]
The result of the first pass would then be:
[Figure 3: Matrix after the first pass]
Letting the loops run to completion results in a matrix that looks like:
[Figure 4: The completed matrix]
And the result is 2, as predicted originally.
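The matrix procedure walked through above can be sketched as follows. This is a minimal unit-cost version that returns the whole matrix so the intermediate cells can be inspected; the function name `levenshteinMatrix` is an assumption for this example and is not the plugin's code.

```javascript
// Build the full (m+1) x (n+1) edit distance matrix for strings a and b.
function levenshteinMatrix(a, b) {
  const m = a.length;
  const n = b.length;
  const d = Array.from({ length: m + 1 }, () => new Array(n + 1).fill(0));

  // Seed values: the first column and first row are never modified afterwards.
  for (let i = 0; i <= m; i++) d[i][0] = i;
  for (let j = 0; j <= n; j++) d[0][j] = j;

  // Loops start at one; string indexes are therefore off by one.
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1; // 0 if characters match, else 1
      d[i][j] = Math.min(
        d[i - 1][j] + 1,       // cell above plus one
        d[i][j - 1] + 1,       // cell to the left plus one
        d[i - 1][j - 1] + cost // upper-left cell plus comparison result
      );
    }
  }
  return d;
}

const matrix = levenshteinMatrix("George", "Geordie");
console.log(matrix.length, matrix[0].length); // 7 rows, 8 columns
console.log(matrix[1][1]);                    // 0, the 'G' versus 'G' cell
console.log(matrix[6][7]);                    // 2, the lower right hand cell
```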
OUTPUT
AREAS OF APPLICATION
This program is a web plugin and can be applied to any webpage to provide an autocomplete
feature in input forms. It is easy to use and user-friendly, and can increase the efficiency
of users. It can be used in search forms, and a smaller, more specialized dictionary can
be used to provide the user with the possible options for the input.
LIMITATIONS
There are a few limitations in the project, which are listed below:
1. This demo plugin works only for a single string, i.e. the input form can handle only
one string at a time and show the possible options for it.
2. The algorithm is not as optimized as it could be, so there are some performance
issues, including slow response and occasional hang-ups.