Lecture 18 - NFA and RegEx

INTRODUCTION TO DISCRETE STRUCTURES
NFA and RegEx 1.1–1.2

12.2 NFA and Regular Expressions
• Regular Expressions
• Non-deterministic finite automata (NFA)
• Rules for building NFA

Regular Sets
• A language is the set of strings derived from a grammar.
• Type 3 grammars are known as regular grammars.
• A set is generated by a regular grammar if and only if it is a regular set.
• The languages generated by right-linear grammars are regular sets.
• Regular sets are the sets represented by regular expressions.

Regular Expressions
Previously, we described the languages generated by grammars in English, but describing sets this way can become overly complicated. Regular expressions, often called regex, are an algebraic way to describe the sequences of characters that make up the strings in regular sets.

Each regular expression represents a set specified by these rules:
• ∅ represents the empty set, that is, the set with no strings;
• λ represents the set {λ}, which is the set containing the empty string;
• x represents the set {x} containing the string with one symbol x;
• (AB) represents the concatenation of the sets represented by A and by B;
• (A ∪ B) represents the union of the sets represented by A and by B;
• A* represents the Kleene closure of the set represented by A.

Regex Operations
There are a few important operations on RegEx (priority 1 is applied first):
1. Concatenation (Priority 3): combines two RegEx to build a larger RegEx.
2. Or (Priority 4): denoted by | or ∪, allows a choice between one RegEx or another.
3. Closure (Priority 2): denoted by *, allows zero or more repetitions of a RegEx, so more general patterns can be accepted by the machine.
4. Parentheses (Priority 1): the use of parentheses determines the order of operations.

Regular Sets: The regular sets are those that can be formed from ∅, {λ}, and the single-symbol sets {x} using the operations of concatenation, union, and Kleene closure in arbitrary order.
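The three set operations above can be tried out directly on small sets of strings. The following is a minimal sketch, not part of the original slides: the function names concat, union, and kleene are illustrative, and because A* is usually infinite, kleene takes an assumed max_len bound and returns only the strings of A* up to that length.

```python
from itertools import product


def concat(a, b):
    """Concatenation AB: every string of A followed by every string of B."""
    return {x + y for x, y in product(a, b)}


def union(a, b):
    """Union A ∪ B."""
    return a | b


def kleene(a, max_len):
    """Strings of A* of length at most max_len (assumed bound, A* is usually infinite)."""
    result = {""}          # λ, the empty string, is always in A*
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more factor from A, keep only new short ones.
        frontier = {w for w in concat(frontier, a) if len(w) <= max_len} - result
        result |= frontier
    return result


# 10*: a 1 followed by any number of 0s (here at most three 0s)
print(sorted(concat({"1"}, kleene({"0"}, 3))))
# (10)*: any number of copies of 10 (here up to length 4)
print(sorted(kleene({"10"}, 4)))
# 0 ∪ 01
print(sorted(union({"0"}, {"01"})))
```

Bounding the closure is only a convenience for printing; the regular expression itself still denotes the full, unbounded set.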
Regular Expressions Examples
Example: What are the strings in the regular sets specified by the regular expressions 10*, (10)*, 0 ∪ 01, 0(0 ∪ 1)*, and (0*1)*?
Solution:
• 10*: a 1 followed by any number of 0s (including no 0s)
• (10)*: any number of copies of 10 (including the empty string)
• 0 ∪ 01: the string 0 or the string 01
• 0(0 ∪ 1)*: any string beginning with 0
• (0*1)*: any string not ending with 0

Writing Regex
Common Symbol Notations
• For a list of symbols x,
• [x] : Exactly one of the symbols must be substituted here
• [x]+ : One or more symbols must be substituted here
• [x]* : Zero or more symbols can be substituted here
• [x]{n} : Exactly n symbols must be substituted here
• [x]{n,} : n or more symbols must be substituted here
• A backslash before a symbol, such as \), means that it represents the literal character ')' and not formula notation

Social Security Number. [0-9]{3}\-[0-9]{2}\-[0-9]{4}
Email Address. [a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
Java Identifier. [a-zA-Z_$][a-zA-Z0-9_$]*
Phone Number. \([0-9]{3}\) [0-9]{3}\-[0-9]{4}

Duality between RegEx and DFA
RE. (Regular Expression) Concise way to describe a set of strings
DFA. (Deterministic Finite Automaton) Machine to recognize whether a given string is in a given set
Kleene's Theorem: In 1956, Kleene showed that a set is recognized by a Finite State Automaton (FSA) if and only if it is a regular set.
• For any DFA, there exists a RE that describes the same set of strings
• For any RE, there exists a DFA that recognizes the same set of strings
• The Knuth-Morris-Pratt (KMP) algorithm, one of the most widely used string search algorithms, builds a DFA in the preprocessing stage to efficiently search for a pattern within a long string.

RE to DFA
Describe a regex that matches any binary string in which the number of 1's is a multiple of 3 (including zero 1's).
Basic Plan.
1. Build a DFA from the RE
2. Simulate the DFA with the text as input

State Machines
FSM
• A finite state machine is a state machine with a finite number of states and an output. The machine must have a start state.
The machine runs as long as there is input.
FSA
• A finite state automaton is a machine with no output. It must have a start state and at least one accept state. The machine runs until the end of the input, and based on where the machine ends up, the input is accepted or rejected.
DFA
• A deterministic finite automaton is an FSA that is deterministic. That is, for each pair of state and input symbol there is exactly one state that it can transition into.
NFA
• Non-deterministic finite automaton?

NFA
A nondeterministic finite-state automaton M = (S, I, f, s0, F) consists of
• A finite set S of states
• A finite input alphabet I
• A transition function f that assigns a set of states to every pair of state and input (so that f: S × I → P(S))
• An initial or start state s0
• A subset F of S consisting of final (or accepting) states.

We can represent a nondeterministic finite-state automaton using a state table where we give a list of possible next states for each pair of a state and an input value. We construct a state diagram for a nondeterministic automaton by including an edge from each state to all possible next states, labeling edges with the input or inputs that lead to this transition. We use the abbreviation NFA for a nondeterministic finite-state automaton and DFA for a deterministic finite-state automaton when we need to distinguish between them.

Example: Find the state diagram for the NFA with the state table shown in Table 2. The final states are s2 and s3.

Finding a DFA Equivalent to an NFA
For every NFA there is an equivalent DFA. That is, if the language L is recognized by an NFA M0, then L is also recognized by a DFA M1. We construct the DFA M1 so that
• The start state of M1 is {s0}.
• The input set of M1 is the same as the input set of M0.
• Each state in M1 is made up of a set of states in M0. Construct new states in M1 by interpreting each unique output in the M0 transition table as its own single state, e.g. {s0, s1, s2} or ∅.
• Given a state {s_{i_1}, s_{i_2}, …, s_{i_k}} in M1 and an input symbol x, the transition from this state leads to the union f(s_{i_1}, x) ∪ f(s_{i_2}, x) ∪ … ∪ f(s_{i_k}, x) of the M0 transitions for the states that compose the state of M1.
• The final states of M1 are any sets that contain a final state of M0.

To see that M0 and M1 are equivalent, first suppose that an input string is recognized by M0. This means that one of the states that can be reached from s0 is a final state. So, in M1 this input string leads from {s0} to a set of states of M0 that contains the final state. Since this is a final state of M1, this string is also recognized by M1. Conversely, a string that is not recognized by M0 does not lead to any final states in M0. Consequently, this input string does not lead from {s0} to a final state of M1.

Finding an Equivalent DFA (cont.)
Example: Find a DFA that recognizes the same language as the NFA.
Solution: Following the steps of the procedure described on the previous slide, we obtain the DFA shown here.

NFA Workshop 1
Consider the following NFA on the alphabet {0,1} with states {q0, q1, q2} and start state q0.
1. What language is recognized by this machine? L = 0+ | (0*1{2,})
2. Fill in the transition table for the NFA
3. Convert the NFA to a DFA

NFA
Input\State   q0         q1   q2
0             q0, q1     ∅    ∅
1             q2         ∅    q1, q2

DFA
Input\State   q0         q1   q2         [q0, q1]   [q1, q2]   ∅
0             [q0, q1]   ∅    ∅          [q0, q1]   ∅          ∅
1             q2         ∅    [q1, q2]   q2         [q1, q2]   ∅

q1 can be deleted as there is no way to reach it from q0.

NFA Workshop 2
1. What language is recognized by this machine? (0*1+)* | (0*1+01*)*
2. Fill in the transition table for the NFA
3. Convert the NFA to a DFA

NFA
Input\State   s0        s1        s2        s3
0             [s0,s1]   s0        ∅         [s0,s1,s2]
1             s3        [s1,s3]   [s0,s2]   s1

DFA
Input\State   s0        s1        s2        s3           [s0,s1]   [s0,s2]      [s1,s3]      [s0,s1,s2]      [s0,s2,s3]      [s0,s1,s2,s3]   ∅
0             [s0,s1]   s0        ∅         [s0,s1,s2]   [s0,s1]   [s0,s1]      [s0,s1,s2]   [s0,s1]         [s0,s1,s2]      [s0,s1,s2]      ∅
1             s3        [s1,s3]   [s0,s2]   s1           [s1,s3]   [s0,s2,s3]   [s1,s3]      [s0,s1,s2,s3]   [s0,s1,s2,s3]   [s0,s1,s2,s3]   ∅

s2 and [s0,s2] can be deleted as there is no way to reach them from s0; once they are removed, [s0,s2,s3] and ∅ are unreachable as well.

NFA Workshop 3
1. What language is recognized by this machine? 0*(0|1)(0|1)*
2. Fill in the transition table for the NFA
3. Convert the NFA to a DFA

NFA
Input\State   a             b       c   d   e
0             a,b,c,d,e     c       ∅   e   ∅
1             d,e           c,e     b   ∅   ∅

DFA
Input\State   a             b       c   d   e   [a,b,c,d,e]   [d,e]   [c,e]   [b,c,d,e]   [b,c,e]   ∅
0             [a,b,c,d,e]   c       ∅   e   ∅   [a,b,c,d,e]   e       ∅       [c,e]       c         ∅
1             [d,e]         [c,e]   b   ∅   ∅   [b,c,d,e]     ∅       b       [b,c,e]     [b,c,e]   ∅

d can be deleted as there is no way to reach it from a.

DFA vs NFA
Build a DFA to recognize all strings of the form (a|b)^n a (a|b)*
• Pretty easy to do, only takes n+2 states
• Can also easily construct a simple NFA
Now, construct an NFA to recognize the reversed string (a|b)* a (a|b)^(n−1)
Constructing the corresponding DFA would take 2^n states. This exponential growth makes the construction infeasible for large n.

NFAs with ε-transitions
An ε-transition allows the automaton to change its state without consuming any input symbols.
Purpose of ε-transitions
1. Modeling optional elements
• An ε-transition from one state to another can be used to represent an optional character
2. Adding flexibility to automata
• Allow NFAs to represent languages that would be difficult to represent with DFAs
3. Modeling grammars
• Used to describe the syntax of programming languages and natural languages (only the regular parts, since a finite automaton cannot recognize every context-free language)

Despite their flexibility, epsilon transitions do not increase expressive power. In other words, a language that can be recognized by an NFA with epsilon transitions can be recognized without epsilon transitions.
However, epsilon transitions can make NFAs more concise and easier to understand for some grammars.

ε-transition Example
Given the following NFA with ε-transitions and alphabet {a,b}, find the final states given the input string 'aba'.
Given this change to the automaton from a previous slide, how does the language recognized by the automaton change?

Rules for building NFA (optional)
Regular-expression-matching NFAs
• Enclose the RE in parentheses
• One state per character (including parentheses)
• Red ε-transitions change state without scanning the text
• Black match transitions change state and scan to the next character
• Accept if any sequence of transitions ends in the accept state

Steps in building NFA
Step 0. Add ε-transitions from each parenthesis to the next state
Step 1. Add a black transition edge from each match character to the next state
Step 2. Add 3 red ε-transition edges for each closure (*) operator
Step 3. Add 2 red ε-transition edges for each | operator
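The building steps above can be sketched in code. This is a minimal illustration under stated assumptions, not the lecturer's implementation: the names build_epsilon_edges, epsilon_closure, and recognizes are hypothetical, each parenthesised group is assumed to contain at most one | operator, and the characters ( ) | * are assumed never to appear as text symbols. State i stands for character i of the parenthesised RE, and the accept state is one past the last character.

```python
def build_epsilon_edges(regex):
    """Return eps, where eps[v] lists the states reachable from v by one ε-edge."""
    m = len(regex)
    eps = [[] for _ in range(m + 1)]
    ops = []                                  # stack of pending '(' and '|' positions
    for i, ch in enumerate(regex):
        lp = i                                # start of the sub-expression ending at i
        if ch in "(|":
            ops.append(i)
        elif ch == ")":
            op = ops.pop()
            if regex[op] == "|":              # Step 3: two ε-edges per | operator
                lp = ops.pop()
                eps[lp].append(op + 1)
                eps[op].append(i)
            else:
                lp = op
        if i < m - 1 and regex[i + 1] == "*":  # Step 2: loop edges for closure
            eps[lp].append(i + 1)
            eps[i + 1].append(lp)
        if ch in "(*)":                       # Step 0 (and the third closure edge)
            eps[i].append(i + 1)
    return eps


def epsilon_closure(eps, states):
    """All states reachable from the given states using only ε-edges."""
    seen = set(states)
    stack = list(states)
    while stack:
        v = stack.pop()
        for w in eps[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def recognizes(regex, text):
    """Simulate the NFA for regex on text; True if the accept state is reached."""
    regex = "(" + regex + ")"                 # enclose the RE in parentheses
    m = len(regex)
    eps = build_epsilon_edges(regex)
    current = epsilon_closure(eps, {0})
    for symbol in text:
        # Step 1: a match transition goes from a literal character to the next state.
        matched = {v + 1 for v in current
                   if v < m and regex[v] == symbol and regex[v] not in "()|*"}
        current = epsilon_closure(eps, matched)
    return m in current


print(recognizes("(0*1)*", "0011"))
print(recognizes("0(0|1)*", "10"))
```

Tracking the whole set of currently possible states, rather than a single state, is what lets the simulation follow every nondeterministic choice at once.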
