International Journal of Engineering Trends and Technology (IJETT) - Volume4 Issue6- June 2013

DFA Minimizing State Machines Using HashTables

Vishal Garg1, Anu2
1 Department of Computer Science & Engineering, Kurukshetra University, JMIT Radaur, Yamuna Nagar, Haryana, India.
2 Department of Computer Science & Engineering, Kurukshetra University, JMIT Radaur, Yamuna Nagar, Haryana, India.

Abstract— DFA minimization is an important problem in algorithm design. It is based on DFA equivalence: two DFAs are equivalent if and only if they accept exactly the same set of strings. Various algorithms are used to construct the minimized DFA. The performance of these algorithms is evaluated on an artificially generated data set. The results present a number of cases where the new hash-table-based algorithms perform better than the traditional table-driven (TD) algorithm, together with a comparative analysis of the performance of a DFA implemented with arrays and a DFA implemented with hash tables.

Keywords— Deterministic Finite Automata, NFA, Regular Expressions, Array, Acceptable States, Performance.

I. INTRODUCTION

In automata theory, a branch of theoretical computer science, a deterministic finite automaton (DFA) [1], also known as a deterministic finite state machine, is a finite state machine that accepts or rejects finite strings of symbols and produces a unique computation (or run) of the automaton for each input string. Here, "deterministic" refers to this uniqueness of the computation.

II. DFA MINIMIZATION

A deterministic finite automaton, also known as a deterministic finite state machine [5], accepts or rejects finite strings over a finite set of symbols and generates a unique computation of the automaton for every input string. This paper presents an algorithm which converts a regular expression (a finite set of strings) into a minimal DFA using hash tables.
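The unique computation described above can be made concrete with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the transition function is stored in a hash table (a Python dict keyed by (state, symbol)), and the names dfa_accepts and ENDS_IN_AB, as well as the example automaton (strings over {a, b} that end in "ab"), are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Tuple

# Transition function stored as a hash table: (state, symbol) -> next state.
Transitions = Dict[Tuple[Hashable, str], Hashable]


def dfa_accepts(transitions: Transitions, start: Hashable,
                accepting: FrozenSet[Hashable], word: Iterable[str]) -> bool:
    """Simulate the DFA on `word`, reading one symbol at a time, left to right."""
    state = start
    for symbol in word:
        key = (state, symbol)
        if key not in transitions:
            # A missing entry is treated as an implicit dead (rejecting) state.
            return False
        state = transitions[key]  # exactly one successor: the run is unique
    return state in accepting  # accept iff the run ends in an accepting state


# Hypothetical example DFA over {a, b}: accepts strings that end in "ab".
ENDS_IN_AB: Transitions = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q1", ("q2", "b"): "q0",
}

print(dfa_accepts(ENDS_IN_AB, "q0", frozenset({"q2"}), "aab"))  # prints True
print(dfa_accepts(ENDS_IN_AB, "q0", frozenset({"q2"}), "aba"))  # prints False
```

Each input symbol costs one dictionary lookup, so the run takes time proportional to the length of the input, independent of how many states the automaton has (on average, given a well-behaved hash function).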
The algorithm works in the following two phases:
(i) constructing a DFA using automaton-construction methods together with joining operations on DFAs;
(ii) minimizing the obtained DFA.
It is based on a set of graph grammar rules which combine several graphs (DFAs) to obtain another desired graph (DFA). The graph grammar rules are given in the form of a parsing algorithm which converts a regular expression R into a minimal deterministic finite automaton M such that the language accepted by M is the same as the language described by R.

DFA minimization converts a given deterministic finite automaton into an equivalent DFA [5] that has the minimum number of states. Two DFAs are called equivalent if and only if they recognize the same regular language. For every regular language [3] that can be accepted by a DFA there is a minimal automaton, i.e. a DFA with the minimum number of states, and this DFA is unique (up to renaming of states). Equivalent states can be combined when building minimal finite automata [6]. There are two classes of states that can be removed from, or merged within, the original DFA to minimize it without affecting the language it accepts: unreachable states and non-distinguishable states.

A DFA representing a regular language [3] can be used either in an accepting mode, to validate that an input string is part of the language, or in a generating mode, to generate a list of all the strings in the language. In the accepting mode an input string is provided, which the automaton reads from left to right, one symbol at a time. The computation starts at the initial state and proceeds by reading the first symbol of the input string and following the state transition corresponding to that symbol. DFAs recognize precisely the set of regular languages, which are useful, among other things, for lexical analysis and pattern matching [4]. DFAs can be built from nondeterministic finite automata through the power-set construction.

ISSN: 2231-5381 http://www.ijettjournal.org Page 2577

III. OBJECTIVES

- Implement a DFA.
- Store the states of input strings [5] in a generic hash table.
- Generate state graphs for each user input.
- Create test cases and test the DFA state machine using hash tables [5].
- Assess the performance of the table-driven [1] DFA using hash tables, with appropriate tools.
- Present the results in a suitable form, such as graphs.

IV. ALGORITHM

The procedure used to implement the hash table is shown in Fig. 1 and described below as an algorithm. Two sets of states are used to implement the DFA: the set of input states and the set of acceptable states. The final DFA contains only those states which are in the set of acceptable states. The following steps are used to implement a minimized DFA using hash tables:
1. Initialize the input symbols.
2. Initialize the acceptable states and add them to the hash table.
3. For each input do
   i) if the state is in the acceptable-state table, add the state to the DFA;
   ii) else remove the state from the input symbols.
   End
4. Minimize the automaton to a DFA.
5. OUT := minimized DFA
6. End

Fig. 1. Flow chart

V. PROBLEM DESCRIPTION

Finite state automata (FSA) are ubiquitous in mathematics and computer science. Applications include pattern matching, signal processing, natural language processing, speech recognition, token passing networks [6] (including sorting networks), compilers, and digital logic.
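Step 4 of the algorithm in Section IV ("Minimize the automaton to a DFA") is not spelled out, so the sketch below shows one standard way to do it; it is an assumption, not necessarily the authors' procedure. It removes the two classes of states named in Section II: unreachable states are dropped with a breadth-first search, and non-distinguishable states are merged with Moore's partition-refinement algorithm, where a Python dict (a hash table) groups states that share the same signature. It assumes a complete DFA stored as a dict keyed by (state, symbol); all names and the example automaton are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List, Set, Tuple

Delta = Dict[Tuple[Hashable, str], Hashable]  # complete transition table


def reachable_states(delta: Delta, start: Hashable, alphabet: List[str]) -> Set[Hashable]:
    """Breadth-first search from the start state; everything else is unreachable."""
    seen = {start}
    queue = deque([start])
    while queue:
        q = queue.popleft()
        for a in alphabet:
            r = delta[(q, a)]
            if r not in seen:
                seen.add(r)
                queue.append(r)
    return seen


def minimize(delta: Delta, start: Hashable, accepting: Set[Hashable], alphabet: List[str]):
    """Drop unreachable states, then merge non-distinguishable ones (Moore refinement)."""
    states = reachable_states(delta, start, alphabet)
    # Initial partition: block 1 = accepting, block 0 = non-accepting.
    block = {q: int(q in accepting) for q in states}
    while True:
        groups: Dict[Tuple[int, ...], int] = {}  # hash table: signature -> new block id
        new_block = {}
        for q in states:
            # Signature: current block of q plus the blocks of its successors.
            sig = (block[q],) + tuple(block[delta[(q, a)]] for a in alphabet)
            new_block[q] = groups.setdefault(sig, len(groups))
        # The new partition refines the old one; equal size means nothing was split.
        if len(groups) == len(set(block.values())):
            break
        block = new_block
    min_delta = {(block[q], a): block[delta[(q, a)]] for q in states for a in alphabet}
    min_accepting = {block[q] for q in states if q in accepting}
    return min_delta, block[start], min_accepting


def accepts(delta: Delta, start: Hashable, accepting: Set[Hashable], word: str) -> bool:
    """Run a complete DFA on `word` and report whether it ends in an accepting state."""
    state = start
    for a in word:
        state = delta[(state, a)]
    return state in accepting


# Hypothetical input: "ends in ab" with a duplicate of q1 (q3) and an unreachable q4.
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q3", ("q1", "b"): "q2",
    ("q3", "a"): "q1", ("q3", "b"): "q2",
    ("q2", "a"): "q3", ("q2", "b"): "q0",
    ("q4", "a"): "q4", ("q4", "b"): "q2",
}
m_delta, m_start, m_accepting = minimize(DELTA, "q0", {"q2"}, ["a", "b"])
print(len({q for (q, _) in m_delta}))  # prints 3
```

Moore's algorithm is chosen here for brevity; Hopcroft's algorithm computes the same minimal DFA with a better worst-case bound.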
The DFA machine can be used to minimize an NFA; what is then needed is to query it. Functions such as a generic match() can be built on top of it, but such a function would take longer with regular-expression [7][8] operators such as "*", "?" and ".". The reason to use a hash table [11] instead of a linear array is that an array is good for storing large amounts of data but poor at searching, whereas a hash table is excellent for searching. The trade-off is that a hash table is not useful for pattern matching [6] or "path finding". As with any DFA, performance is the key, and the performance of this implementation is comparable to that of a hash table: searching an unsorted array takes time proportional to its length, whereas a hash-table lookup takes constant time on average, which makes hash tables very fast for this purpose.

VI. SIMULATION AND RESULTS

Fig. 2. DFA state machine working diagram

We implemented the DFA with two techniques: (a) arrays and (b) hash tables. Both techniques are implemented with the same number of input and accepted states, and their comparison is shown in Figs. 3 and 4. As the number of states increases, the execution time also increases, but for the same number of states the execution time with the hash table is lower than with the array. Beyond a certain point the execution time of the hash-table technique becomes stable, whereas with the array it increases exponentially. The DFA accepts only those inputs whose run ends in a state in the set of acceptable states, and rejects all others.

Various algorithms and libraries have been developed over the years for converting an NFA into a minimal DFA. A related example is the Knuth-Morris-Pratt (KMP) algorithm, a method for finding patterns in a string. Many other algorithms are available, but almost all of them use simple arrays to store patterns. Using arrays greatly simplifies the implementation process; however, linear arrays fall behind hash tables in performance. Because of their search performance, hash tables can be used to greatly improve the running time of a DFA.
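The search-cost argument above can be illustrated with a small sketch. It is not the authors' benchmark and does not reproduce Figs. 3 and 4: Python's list stands in for the array and Python's set for the hash table, and the function names and the workload (states 0 to n-1, with every even-numbered state acceptable) are hypothetical. The check being timed is the final step of a DFA run, deciding whether the state reached is acceptable.

```python
import timeit


def accepts_array(final_state, acceptable_list):
    """Linear search through an array (list): O(n) comparisons in the worst case."""
    for s in acceptable_list:
        if s == final_state:
            return True
    return False


def accepts_hash(final_state, acceptable_set):
    """Hash-table lookup (set): O(1) time on average."""
    return final_state in acceptable_set


def compare(n_states=2_000, n_queries=200, repeat=5):
    """Time both lookups on a hypothetical workload; returns (array_seconds, hash_seconds)."""
    acceptable_list = list(range(0, n_states, 2))  # every even-numbered state is acceptable
    acceptable_set = set(acceptable_list)
    queries = [(i * 7919) % n_states for i in range(n_queries)]
    # Both storage schemes must give identical accept/reject decisions.
    assert [accepts_array(q, acceptable_list) for q in queries] == \
           [accepts_hash(q, acceptable_set) for q in queries]
    t_array = timeit.timeit(lambda: [accepts_array(q, acceptable_list) for q in queries],
                            number=repeat)
    t_hash = timeit.timeit(lambda: [accepts_hash(q, acceptable_set) for q in queries],
                           number=repeat)
    return t_array, t_hash


if __name__ == "__main__":
    t_array, t_hash = compare()
    print(f"array: {t_array:.4f} s, hash table: {t_hash:.4f} s")
```

Absolute timings depend on the machine and interpreter; the structural point is that the list scan grows with the number of acceptable states, while the average cost of the set lookup does not.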
The implementation uses two tables: one holds the set of end states and the other holds the set of strings provided by the user. Both tables can be changed as needed, and the system generates DFA state diagrams toward the end state for each user-defined input symbol.

Fig. 3. Performance comparison between DFA using array and hash table

As is well known, searching an array is many times slower than searching a hash table, whereas a hash table is very fast for the same purpose.

Fig. 4. Performance comparison between DFA using array and hash table

VII. CONCLUSION

An important problem in algorithm design is DFA minimization, which is based on the notion of DFA equivalence. A number of counting problems can be solved easily by encoding the combinatorial objects to be counted as strings. If the set of encoded strings is accepted by a deterministic finite automaton, then algorithms from computational linear algebra can be used to solve the counting problem efficiently. Table-driven (TD) DFA-based string processing algorithms have been examined using a number of methods. First, various strategies for implementing such algorithms in a cache-efficient manner were identified. The performance of these algorithms is verified on an artificially generated data set. The results indicate a number of cases where the new algorithms outperform the traditional TD algorithm.

Various algorithms have been developed for converting an NFA into a minimal DFA and for implementing DFAs. In previous years almost all of them used simple arrays to store patterns. Using arrays greatly simplifies the implementation process; however, linear arrays perform poorly in comparison with more generic data structures such as hash tables. Hash tables are much faster than arrays for searching, so they can be used to greatly improve the running time of a DFA.

VIII. FUTURE WORK

In the near future, we would like to modify this algorithm so that it also accepts words and sentences within a limit, and to assess its space/time trade-offs more efficiently. The DFA machine has to take one "path", usually a string, at a time, so that adding a sentence or a word automatically gets it accepted by the machine.

ACKNOWLEDGEMENT

This research paper was made possible through the help and support of everyone, including family, teachers, and friends. We thank our teachers, our friends, and especially our parents, who provided advice and financial support. This research paper would not have been possible without all of them.

REFERENCES

[1] Ernest Ketcha Ngassam, Derrick G. Kourie, and Bruce W. Watson, "On Implementation and Performance of Table-Driven DFA-Based String Processors", Int. J. Found. Comput. Sci., vol. 19, p. 53, 2008.
[2] Bruce William Watson, "Constructing Minimal Acyclic Deterministic Finite Automata", FASTAR Research Group.
[3] Domenico Ficara, Stefano Giordano, Gregorio Procissi, Fabio Vitucci, Gianni Antichi, and Andrea Di Pietro, "An Improved DFA for Fast Regular Expression Matching", SIGCOMM Comput. Commun. Rev., vol. 38, September 2008.
[4] Aakanksha Pandey, Nilay Khare, and Akhtar Rasool, "Efficient Design and Implementation of DFA Based Pattern Matching on Hardware", IJCSI, March 2012.
[5] Vishal Garg and Anu, "A Review of DFA Minimizing State Machines Using Hash-Tables", Department of Computer Science & Engineering, Kurukshetra University, April 2013.
[6] Vlad Slavici, Daniel Kunkle, Gene Cooperman, and Stephen Linton, "Finding the Minimal DFA of Very Large Finite State Automata with an Application to Token Passing Networks", Northeastern University, 29 March 2011.
[7] S. Ben-David, D. Fisman, and S. Ruah, "Embedding Finite Automata within Regular Expressions", Theoretical Computer Science, vol. 404, no. 3, pp. 202-218, 2008.
[8] G. Berry and R. Sethi, "From Regular Expressions to Deterministic Automata", 1986.
[9] A. Brüggemann-Klein, "Regular Expressions into Finite Automata", Theoretical Computer Science, vol. 120, no. 2, pp. 197-213, 1993.
[10] Todd J. Green, Ashish Gupta, Gerome Miklau, Makoto Onizuka, and Dan Suciu, "Processing XML Streams with Deterministic Automata and Stream Indexes", ACM Trans. Database Syst., vol. 29, no. 4, December 2004.
[11] Isaac Keslassy, David Hay, and Yossi Kanizo, "Optimal Fast Hashing", Technical Report TR08-05, Comnet, Technion, Israel.