CS-5800 Theory of Computation II
PROJECT PRESENTATION
By Quincy Campbell & Sandeep Ravikanti

Text Searching
Conversion of Regular Expression to DFA and Use for Text Searching.

Introduction
• Text searching is an application of finite-state automata concepts.
• A regular expression, defined up front, is parsed into a DFA (deterministic finite automaton).
• The resulting DFA is transformed into a transition table.
• Text searching is then implemented with the help of the transition table.

Regular Expression:
• Regular expressions consist of constants and operator symbols that denote sets of strings and operations over these sets; the set a regular expression denotes is called a regular set or regular language.
• Let Σ be an alphabet. The regular expressions over Σ are defined recursively:
• Basis: Ø, λ and a, for every a ∈ Σ, are regular expressions over Σ.
• Recursive step: Let u and v be regular expressions over Σ. Then (u ∪ v), (uv) and (u*) are regular expressions over Σ.
• Closure: u is a regular expression over Σ only if it can be obtained from the basis elements by a finite number of applications of the recursive step.

Deterministic Finite Automata:
A deterministic finite automaton M is a quintuple (Q, Σ, δ, q0, F), consisting of
• a finite set of states (Q)
• a finite set of input symbols called the alphabet (Σ)
• a transition function (δ : Q × Σ → Q)
• a start state (q0 ∈ Q)
• a set of accept states (F ⊆ Q)

Text searching using “*” (Kleene closure):
Input string: (ab)*
Output: final states are “2”, “0” in the transition table.
Strings that can be obtained from the given expression: {λ, ab, abab, ababab, ...}

Kleene closure:
Input string: a*
Output: final states are “2”, “0” in the transition table.
Strings that can be obtained from the given expression: {λ, a, aa, aaa, ...}

Union “(a)+”:
Input: union closure of “a+”
Output: final state “1”, with transitions on aa.

Regular Expression With Parentheses, Kleene Closure and Union Closure
Input: ((ab)*(cd)+) with the string “ When hug”
Output: the final state would be “4”, with no matches for the given search string.

Issues:
• Limited access when handling transitions.
• The union operation requires the use of the ε-closure operation.
• An example for the expression which doesn’t work…

New Approach
Given the limitations of the previous approach, we redesigned text searching around Thompson’s algorithm. The regular expression is parsed into an NFA (nondeterministic finite automaton) with Thompson’s algorithm. The resulting NFA is converted to a DFA (deterministic finite automaton) with the subset construction algorithm. The generated DFA is turned into a transition table and used for text searching.

THOMPSON’S ALGORITHM
• The simplest method to convert a regular expression to an NFA is Thompson's construction, also known as Thompson's algorithm. Roughly speaking, it works by reducing the regular expression to its smallest constituent regular expressions, converting these to NFAs and then joining these NFAs together.
• It derives a nondeterministic finite automaton (NFA) from any regular expression by splitting it into its constituent subexpressions, from which the NFA is constructed using a set of rules.

Rules of Thompson’s Algorithm
• For a regular expression consisting of a single symbol such as “b”, the resulting NFA is: [Figure: NFA for “b”]
• For the union of regular expressions, “a|b”: [Figure: NFA for “a|b”]
• For the Kleene star, “(a|b)*”: [Figure: NFA for “(a|b)*”]
• Final NFA: the NFA obtained after applying the rules of Thompson’s algorithm. [Figure: final NFA]

SUBSET CONSTRUCTION ALGORITHM:
1. Create the start state of the DFA by taking the ε-closure of the start state of the NFA.
2. Perform the following for the new DFA state. For each possible input symbol:
   1. Apply move to the newly created state and the input symbol; this returns a set of states.
   2. Apply the ε-closure to this set of states, possibly resulting in a new set.
3. This set of NFA states will be a single state in the DFA.
4. Each time we generate a new DFA state, we must apply step 2 to it. The process is complete when applying step 2 does not yield any new states.
5. The finish states of the DFA are those which contain any of the finish states of the NFA.

Example: If A, B and C are states, move({A,B,C}, ‘a’) = move(A, ‘a’) ∪ move(B, ‘a’) ∪ move(C, ‘a’).

An example of applying the subset construction algorithm to generate a DFA from a given NFA:
• Given NFA: [Figure: given NFA]
• Creating the start state of the DFA by taking the ε-closure of the NFA start state.
• The final DFA: [Figure: DFA with states q0 to q3 and transitions on a, b and c]
  q0 = {1,3,5,7,8,9} (start state)
  q1 = {1,2,3,5,6,8,9}
  q2 = {1,3,4,5,6,8,9}
  q3 = {10} (final state)

Sample Output
[Figure: sample output]

Conclusion:
We conclude that text searching is implemented using the transition table of the DFA obtained for the given regular expression.

References:
• Sudkamp, Thomas A., Languages and Machines: An Introduction to the Theory of Computer Science
• Hopcroft, J.E., and Ullman, J.D. [1979], Introduction to Automata Theory, Languages, and Computation, Addison-Wesley, Reading, MA
• https://class.coursera.org/automata/lecture/preview
• http://en.wikipedia.org/wiki/Thompson's_construction_algorithm

Thank you
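
Supplementary sketch: the subset construction steps and the transition-table search described in this presentation can be illustrated with a short Python example. This is a minimal sketch under stated assumptions, not the project's actual code: the NFA is a hand-written Thompson-style automaton for (a|b)*abb (a textbook example, not the NFA from the slides), and the names `EPS`, `NFA`, `eps_closure`, `move`, `subset_construction` and `search` are hypothetical. Empty matches are ignored for simplicity.

```python
from collections import deque

EPS = ""  # label used for epsilon moves (an assumption of this sketch)

# Hand-written Thompson-style NFA for (a|b)*abb; state numbers are illustrative.
NFA = {
    0: {EPS: {1, 7}},
    1: {EPS: {2, 4}},
    2: {"a": {3}},
    3: {EPS: {6}},
    4: {"b": {5}},
    5: {EPS: {6}},
    6: {EPS: {1, 7}},
    7: {"a": {8}},
    8: {"b": {9}},
    9: {"b": {10}},
    10: {},
}
NFA_START, NFA_ACCEPT = 0, {10}


def eps_closure(states):
    """All NFA states reachable from `states` using only epsilon moves."""
    closure, stack = set(states), list(states)
    while stack:
        for nxt in NFA[stack.pop()].get(EPS, ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return frozenset(closure)


def move(states, symbol):
    """Union of the `symbol`-successors of every state in `states`."""
    return {t for s in states for t in NFA[s].get(symbol, ())}


def subset_construction(alphabet):
    """Return (start state, transition table, accepting states) of the DFA."""
    start = eps_closure({NFA_START})           # step 1
    table, queue = {start: {}}, deque([start])
    while queue:                               # step 2, repeated per new state
        current = queue.popleft()
        for symbol in alphabet:
            target = eps_closure(move(current, symbol))
            if not target:                     # no transition: omit the entry
                continue
            table[current][symbol] = target
            if target not in table:            # a new DFA state was generated
                table[target] = {}
                queue.append(target)
    # A DFA state is accepting if it contains any accepting NFA state.
    accepting = {d for d in table if d & NFA_ACCEPT}
    return start, table, accepting


def search(text, start, table, accepting):
    """Return (begin, end) of the longest match at the leftmost start
    position that matches, or None. Empty matches are not reported."""
    for begin in range(len(text)):
        state, last_end = start, None
        for pos in range(begin, len(text)):
            state = table[state].get(text[pos])
            if state is None:                  # missing entry acts as a dead state
                break
            if state in accepting:
                last_end = pos + 1
        if last_end is not None:
            return begin, last_end
    return None


start, table, accepting = subset_construction("ab")
print(len(table))
print(search("xxababbyy", start, table, accepting))
```

With this NFA the construction yields five DFA states, and the search on "xxababbyy" returns (2, 7), the slice "ababb". Restarting the DFA at every text position keeps the sketch simple; it is one possible design, not necessarily the one used in the project.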