DFA Minimizing State Machines Using Hash Tables
International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue6- June 2013
Vishal Garg (1), Anu (2)
(1) Department of Computer Science & Engineering, Kurukshetra University, JMIT Radaur, Yamuna Nagar, Haryana, India
(2) Department of Computer Science & Engineering, Kurukshetra University, JMIT Radaur, Yamuna Nagar, Haryana, India
Abstract: DFA minimization is an important problem in algorithm
design. It is based on DFA equivalence: two DFAs are equivalent if
and only if they accept the same set of strings. Various
algorithms are used to implement the minimized DFA. The
performance of these algorithms is verified using a set of
artificially generated data. The results present a number of
cases where the new hash-table-based algorithms perform better
than the traditional table-driven (TD) algorithm, together with a
comparative analysis of performance between a DFA using an array
and a DFA using hash tables.
Keywords: Deterministic Finite Automata, NFA, Regular
Expressions, Array, Acceptable States, Performance.
I. INTRODUCTION
In automata theory, a branch of theoretical computer
science, a deterministic finite automaton [1] (DFA), also
known as a deterministic finite state machine, is a finite state
machine that accepts or rejects finite strings of symbols and
produces a unique computation (or run) of the automaton for
each input string. "Deterministic" refers to the
uniqueness of the computation.
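The unique run of a DFA on an input string can be illustrated with a minimal sketch. It assumes the transition function is stored in a Python dict (itself a hash table) keyed by (state, symbol); the function name accepts and the example automaton EVEN_ONES are hypothetical and do not come from the paper.

```python
def accepts(transitions, start, accepting, text):
    """Run the DFA on text and report whether it ends in an accepting state."""
    state = start
    for symbol in text:
        key = (state, symbol)
        if key not in transitions:
            return False  # no transition defined for this symbol: reject
        state = transitions[key]  # exactly one successor, so the run is unique
    return state in accepting


# Hypothetical example: binary strings containing an even number of 1s.
EVEN_ONES = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}

if __name__ == "__main__":
    for word in ("", "1001", "1011"):
        print(repr(word), accepts(EVEN_ONES, "even", {"even"}, word))
```

Because each (state, symbol) pair maps to at most one successor, there is only one possible run per input string.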
II. DFA MINIMIZATION
A DFA, or deterministic finite automaton, also known as a
deterministic finite state machine [5], accepts or rejects
finite strings over a finite set of symbols and generates a
unique computation of the automaton for every input string.
This paper presents an algorithm that converts a regular
expression (finite set of strings) into a minimal DFA using
hash tables. The algorithm works in the following two phases:
(i) Constructing a DFA by using the methods for automaton
construction as well as the joining operations of DFAs
(ii) Minimizing the obtained DFA
It is based on a set of graph grammar rules that combine
several graphs (DFAs) to obtain another, preferred graph (DFA).
The graph grammar rules take the form of a parsing algorithm
that converts a regular expression R into a minimal
deterministic finite automaton M such that the language
accepted by M is the same as the language described by the
regular expression R.
A DFA representing a regular language [3] can be used
either in an accepting mode, to validate that an input string is
part of the language, or in a generating mode, to generate a list
of all the strings in the language.
In the accepting mode, an input string is provided, which the
automaton reads from left to right, one symbol at a time. The
computation starts at the initial state and proceeds by reading
the first symbol from the input string and following the state
transition corresponding to that symbol. DFAs recognize
precisely the set of regular languages, which are, among other
things, very useful for lexical analysis and pattern
matching [4]. DFAs can be built from nondeterministic finite
automata through the power-set construction.
DFA minimization converts a given deterministic finite
automaton (DFA) into an equivalent DFA [5] that has the minimum
number of states. Two DFAs are called equivalent if and only
if they recognize the same regular language.
For every regular language [3] that can be accepted by a
DFA, there is a minimal automaton, i.e. a DFA with the minimum
number of states, and this DFA is unique up to the naming of
its states. Equivalent states can be combined in building
minimal finite automata [6].
There are two classes of states that can be removed from, or
merged in, the original DFA to minimize it without affecting
the accepted language: unreachable states and
non-distinguishable states.
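The two classes of states can be handled in two passes, sketched below. The paper does not spell out its minimization procedure, so this sketch swaps in the textbook partition-refinement (Moore-style) method and assumes a complete DFA whose transitions are stored in a dict keyed by (state, symbol). The names reachable_states, minimize and EXAMPLE are hypothetical.

```python
def reachable_states(transitions, start, alphabet):
    """Collect every state reachable from the start state."""
    seen = {start}
    stack = [start]
    while stack:
        state = stack.pop()
        for symbol in alphabet:
            nxt = transitions[(state, symbol)]
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def minimize(transitions, start, accepting, alphabet):
    """Drop unreachable states, then merge non-distinguishable ones."""
    states = reachable_states(transitions, start, alphabet)
    # Initial partition: accepting states versus non-accepting states.
    block_of = {s: int(s in accepting) for s in states}
    while True:
        ids = {}
        refined = {}
        for s in states:
            # A state's signature is its block plus the blocks of its successors.
            signature = (block_of[s],) + tuple(
                block_of[transitions[(s, a)]] for a in alphabet
            )
            refined[s] = ids.setdefault(signature, len(ids))
        if len(ids) == len(set(block_of.values())):
            break  # no block was split, so the partition is stable
        block_of = refined
    new_transitions = {
        (block_of[s], a): block_of[transitions[(s, a)]]
        for s in states
        for a in alphabet
    }
    new_accepting = {block_of[s] for s in states if s in accepting}
    return new_transitions, block_of[start], new_accepting


# Hypothetical DFA for "strings over {a, b} ending in b":
# q3 is unreachable, and q0 and q1 are non-distinguishable.
EXAMPLE = {
    ("q0", "a"): "q1", ("q0", "b"): "q2",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q1", ("q2", "b"): "q2",
    ("q3", "a"): "q3", ("q3", "b"): "q0",
}
```

On EXAMPLE, the reachability pass discards q3 and the refinement pass merges q0 with q1, leaving a two-state automaton for the same language. The states of the result are integer block identifiers rather than the original names.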
ISSN: 2231-5381
http://www.ijettjournal.org
III. OBJECTIVES
- Implement the DFA.
- Implement the states of input strings [5] using a generic
  hash table.
- Generate state graphs for each user input.
- Create test cases and test the DFA state machine
  using hash tables [5].
- Assess the performance of the table-driven [1] DFA
  using hash tables with appropriate tools.
- Formulate the results in a suitable form, such as
  graphs.
IV. ALGORITHM
The procedure used to implement this hash table is shown in
Fig. 1 and is also described below as an algorithm. Two sets of
states are used to implement the DFA: the first is the set of
input states and the second is the set of acceptable states. The
final DFA contains only those states that are in the set of
acceptable states. The following steps are followed to
implement a minimized DFA using hash tables:
1. Initialize the input symbols.
2. Initialize the acceptable states. Add the acceptable states to
   the hash table.
3. For each input do
   i) If the state is in the acceptable table, add the state to
      the DFA.
   ii) Else remove the state from the input symbols.
   End
4. Minimize the automaton to a DFA.
5. OUT := Minimized DFA
6. End
FIG 1. (FLOW CHART)
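Steps 1 to 3 and step 5 of the procedure can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: states are represented as strings, a Python set stands in for the hash table, and the function name build_dfa_states is hypothetical. Step 4 is omitted because the text does not detail it.

```python
def build_dfa_states(input_states, acceptable_states):
    """Keep only the input states that appear in the acceptable table."""
    # Step 2: load the acceptable states into a hash table (a Python set).
    acceptable = set(acceptable_states)
    dfa_states = []
    # Step 3: test each input against the table with a hash lookup.
    for state in input_states:
        if state in acceptable:
            dfa_states.append(state)  # 3(i): add the state to the DFA
        # 3(ii): otherwise the state is dropped from the input
    # Step 5: the surviving states form the output.
    return dfa_states


if __name__ == "__main__":
    print(build_dfa_states(["q0", "q5", "q2"], ["q0", "q2", "q3"]))
```

Each membership test is a single hash lookup, so the loop does not rescan the acceptable states for every input.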
V. PROBLEM DESCRIPTION
Finite state automata (FSA) are ubiquitous in
mathematics and computer science. Applications include
pattern matching, signal processing, natural language
processing, speech recognition, token passing networks
[6] (including sorting networks), compilers, and digital logic.
The DFA machine can be used to minimize an NFA; all that
is needed is to query it. Functions such as a generic
match() can be built, but such a function would take longer
with regular-expression operators [7][8] such as "*", "?", and ".".
A hash table [11] is used instead of a linear array
because an array is good for storing a large amount of data but
poor for searching, whereas a hash table is excellent for
searching. The tradeoff is that a hash table is not useful for
pattern matching [6] or "path finding".
As with any DFA, performance is key. The performance of the
implementation is comparable to that of a hash table. Searching
an array is many times slower than searching a hash table,
which is very fast for this purpose.
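The difference between the two storage schemes can be made concrete with a small sketch. It assumes transitions are stored either as a list of (source, symbol, target) triples or as a dict keyed by (source, symbol). The names lookup_array, lookup_hash and TRIPLES are hypothetical, not taken from the paper.

```python
def lookup_array(triples, state, symbol):
    """Linear scan: inspects entries one by one until a match is found."""
    for src, sym, dst in triples:
        if src == state and sym == symbol:
            return dst
    return None


def lookup_hash(table, state, symbol):
    """Hash lookup: a single probe on average, whatever the table size."""
    return table.get((state, symbol))


# Hypothetical transition data, stored both ways.
TRIPLES = [
    ("q0", "a", "q1"),
    ("q0", "b", "q0"),
    ("q1", "a", "q1"),
    ("q1", "b", "q2"),
]
TABLE = {(src, sym): dst for src, sym, dst in TRIPLES}

if __name__ == "__main__":
    print(lookup_array(TRIPLES, "q1", "b"), lookup_hash(TABLE, "q1", "b"))
```

Both functions return the same target state. The array version does work proportional to the number of stored transitions, while the dict version does not grow with the table on average.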
VI. SIMULATION AND RESULTS
Fig. 2. DFA state machine working diagram
We implemented the DFA with two techniques:
a) Array
b) Hash tables
The comparison of the two techniques is shown in Figs. 3 and 4.
Both techniques are implemented using the same number of input
and accepted states. As the figures show, the execution time
increases with the number of states, but for the same number of
states the hash table takes less time than the array. After a
certain point the execution time of the hash-table technique
becomes stable, whereas that of the array increases
exponentially.
The DFA accepts only those states that are in the set of
acceptable states and rejects all states that are not in that
set.
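A comparison of this kind can be set up with a small timing harness such as the one below. It is only a sketch under stated assumptions: it uses Python's standard timeit module, times a worst-case membership test on a list against the same test on a set, and the function name time_membership and the chosen sizes are hypothetical. It does not reproduce the measurements behind the paper's figures, and absolute timings depend on the machine.

```python
import timeit


def time_membership(num_states, repeats=100):
    """Time a worst-case membership test on a list and on a set."""
    states = [f"q{i}" for i in range(num_states)]
    as_list = states
    as_set = set(states)
    probe = states[-1]  # last element: the longest scan for the list
    list_time = timeit.timeit(lambda: probe in as_list, number=repeats)
    set_time = timeit.timeit(lambda: probe in as_set, number=repeats)
    return list_time, set_time


if __name__ == "__main__":
    for n in (1_000, 10_000, 50_000):
        list_time, set_time = time_membership(n)
        print(f"{n:>6} states  list: {list_time:.6f}s  set: {set_time:.6f}s")
```

Running the script prints one line per problem size, which can then be plotted against the number of states.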
Various algorithms and libraries have been developed
over the years for converting an NFA to a DFA and minimizing
it. The KMP (Knuth-Morris-Pratt) algorithm is one related
method, used for finding patterns in a string. Various other
algorithms are available, but almost all use simple arrays for
the storage of patterns. Although arrays greatly simplify
the implementation, linear arrays perform worse than hash
tables. Because of their search performance, hash tables
can be used to greatly improve the running time of a DFA.
The implementation uses two tables: one consists of the
set of end states and the other of the set of strings
provided by the user. Both tables can be changed as needed,
and the system generates DFA state diagrams leading toward the
end state for each user-defined input symbol.
Fig. 3. Performance comparison between DFA using array and hash table
Fig. 4. Performance comparison between DFA using array and hash table
VII. CONCLUSION
An important problem in algorithm design is DFA
minimization, which is based on the notion of DFA
equivalence. A number of counting problems can be solved
easily by encoding the combinatorial objects to be counted as
strings. If the collection of encoded strings is accepted by a
deterministic finite automaton, then a number of algorithms
from computational linear algebra can be used to solve the
counting problem efficiently. Table-driven (TD) DFA-based
string processing algorithms are checked and verified using a
number of methods. Firstly, various strategies for implementing
such algorithms in a cache-efficient manner are identified.
The performance of these algorithms is verified using a set of
artificially generated data. The results indicate a number of
cases where the new algorithms outperform the traditional TD
algorithm.
The DFA machine takes one "path", usually a string, at a
time, so that an added sentence or word is automatically
accepted by the machine.
VIII. FUTURE WORK
Various algorithms have been developed for converting an
NFA to a DFA and minimizing it, and also for implementing
DFAs. In previous years almost all of them used simple arrays
for the storage of patterns. Although arrays greatly simplify
the implementation, linear arrays perform poorly in comparison
with more generic data structures such as hash tables. Hash
tables are much faster to search than arrays, so they can be
used to greatly improve the running time of a DFA.
As is well known, searches in arrays are many times slower
than in a hash table, which is very fast for the same purpose.
In the near future, we would like to modify this algorithm so
that it also accepts words and sentences within a limit, and
to assess its space/time tradeoffs more efficiently.
ACKNOWLEDGEMENT
This research paper was made possible through the help and
support of everyone, including family, teachers, and friends.
We thank our teachers, our friends, and especially our
parents, who provided advice and financial support. This
research paper would not have been possible without all of them.
REFERENCES
[1] Ernest Ketcha Ngassam, Derrick G. Kourie, and Bruce W. Watson, "On Implementation and Performance of Table-Driven DFA-Based String Processors", Int. J. Found. Comput. Sci. 19, 53 (2008).
[2] Bruce William Watson, "Constructing Minimal Acyclic Deterministic Finite Automata", FASTAR Research Group.
[3] Domenico Ficara, Stefano Giordano, Gregorio Procissi, Fabio Vitucci, Gianni Antichi, and Andrea Di Pietro, "An improved DFA for fast regular expression matching", SIGCOMM Comput. Commun. Rev. 38, September 2008.
[4] Aakanksha Pandey, Dr. Nilay Khare, and Akhtar Rasool, "Efficient Design and Implementation of DFA Based Pattern Matching on Hardware", IJCSI, March 2012.
[5] Vishal Garg, Anu, "A review of DFA Minimizing State Machines Using Hash-Tables", Department of Computer Science & Engineering, Kurukshetra University, April 2013.
[6] Vlad Slavici, Daniel Kunkle, Gene Cooperman, and Stephen Linton, "Finding the Minimal DFA of Very Large Finite State Automata with an Application to Token Passing Networks", Northeastern University, 29 March 2011.
[7] Ben-David, S., D. Fisman, and S. Ruah, "Embedding finite automata within regular expressions", Theoretical Computer Science, vol. 404, no. 3, pp. 202-218, 2008.
[8] Berry, G. and R. Sethi, "From regular expressions to deterministic automata", 1986.
[9] Bruggemann-Klein, A., "Regular expressions into finite automata", Theoretical Computer Science, vol. 120, no. 2, pp. 197-213, 1993.
[10] Todd J. Green, Ashish Gupta, Gerome Miklau, Makoto Onizuka, and Dan Suciu, "Processing XML streams with deterministic automata and stream indexes", ACM Trans. Database Syst. 29, 4 (December 2004).
[11] Isaac Keslassy, David Hay, and Yossi Kanizo, "Optimal Fast Hashing", Technical Report Tr08-05, Comnet, Technion, Israel.