Surinder Kumar Jain, University of Sydney Automaton Expressions DFA NFA Ε-NFA CFG as a DFA Equivalence Minimal DFA Definition Conversion from/to Automaton Regular Langauges Pumping Lemma – proving regularness Closures Equivalence A system with many states Can transition from one state to another Usually caused by external input Set of states is finite System is in one state at any given time Mathematical Definition of a DFA A = (Q, Σ,δ, q0,F) Q : States, DFA is in one of these finite states at any time. Σ : Input symbols, DFA changes its state from one state to another state on consuming an input symbol. δ : Transition function. Given a state and an input symbols, gives the next DFA state Function over QxΣ -> Q. q0 : Initial DFA state F : Accepting states. Once DFA reaches one of these states, it may not accept any more input symbols. Q = { waiting, pending, rejected, approved, paid } Σ = {receive, reject, accept, pay } δ : (waiting -> receive -> pending), (pending -> reject -> rejected), (pending -> accept -> accepted), (accepted -> pay -> paid) q0 : {waiting} F : { rejected, paid } start receive Waiting accept Pending pay Accepted Paid Paid reject Rejected Paid Q = { waiting, pending, rejected, approved, paid } Σ = {receive, reject, accept, pay } δ : (waiting -> receive -> pending), (pending -> reject -> rejected), (pending -> accept -> accepted), (accepted -> pay -> paid) q0 : {waiting} F : { rejected, paid } Set of alphabets Concatenation (joining) Strings A subset of strings is a language A DFA defines a language Alphabet set is the set of input symbols Concatenation - one symbol follows another Acceptance – sequence of symbols takes DFA from start state to one of the accepting states Five-tuple like a DFA, (Q, Σ,δ, q0,F) Transition function returns a set not one state Several outgoing arcs with same symbol In several states at the same time Language of NFA Any NFA language can be described by some DFA Adding non-determinism does not give any thing more Why use NFAs then : Easier to make for some languages May have fewer states and less complex Algorithm to convert NFA to DFA For n state NFA,DFA may have up to 2n states Can throw away inaccessible states Observation : DFA has practically the same number of states as NFA though it often has more transitions For an NFA, N = {Q, Σ, δ, q0, F}, Construct the DFA, D = {Qd, Σ, δd, {q0}, Fd} Qd = Powerset of Q δd(S, a) = Up in S δ(p,a) for every S in Qd. Fd = S : S is subset of Q and S has an accepting state of NFA DFA operates on one state at a time, NFA operates on sets of states. Given a state, NFA gives a set of new states Make all possible sets of DFA states as NFA states Transit from one set of states to a new set of all possible state set Any set with an accepting state is the accepting state in NFA O(2n) (number of subsets of a set) Efficient algorithm Do not construct the entire power set Start with start state Only construct subsets that can reach an accepting state from the start state The number of states in DFA is much less than 2n. DFA has practically the same number of states as NFA though it often has more transitions Includes ε (the empty string, not in alphabet set) as a transition ε is identity in concatenation a.ε = ε.a = a for all a Spontaneous transition without an input An ε-NFA language can be described by some NFA Every NFA can be described by some DFA Adding ε transition does not give any thing more Why use ε-NFAs then : Easier to make for some languages Useful in proving equivalence of languages Conversion aims to remove ε transitions Define a new set of states ε are contained inside the set No ε arc leaves or enters the new set of states Epsilon closure (eclose) For a state, set of all states reachable spontaneously Follow the ε arcs recursively and include reachable states in the epsilon closure For an ε-NFA, N = {Q, Σ, δ, q0, F}, Construct the DFA, D = {Qd, Σ, δd, {eclose(q0)}, Fd} Qd = { eclose(q) | q = eclose(q) and q in Q } δd(S, a) = Up in S δ(p,eclose(a)) for every S in Qd. Fd = S : S is subset of Q and S has an accepting state of NFA DFA operates on one state at a time, ε-NFA operates on sets of states with no ε transition leaving the set Make all eclose sets as DFA states Transit from one set of states to a new set of all eclose state set Any set with an accepting state is the accepting state in NFA An imperative program can be represented as a Control Flow Graph (CFG) with It statements at nodes and predicates at edges can be converted into a CFG with both statements and predicates at edges by pushing node statements up incoming edges Such a CFG is a DFA Program points are States Statements are input symbols that change program state from program point to point Algebraic expression to denote languages Composed of symbols “ε”, “Ø”, “+”, “*”, “.”, “(“, “)” and alphabets The language is generated using rules : L(ε) = empty set L(Ø) = empty set L(a) = a for all alphabets a L(p+q) = L(p) U L(q) L(p.q) = { p’.q’ | p’ in L(p) & q’ in L(q) } L(p*) = { qn | q in L(p) and n >= 0 }, q0= ε, qk=q.qk-1 a+b.c The language generated is : { a, b.c } a.b.c*.d the language generated is : { a.b.d, a.b.c.d, a.b.c.c.d, a.b.c.c.c.d, … } A finite way to express an infinite language DEFINITION Two regular expression (or automaton) are EQUAL if they both generate same languages Thus (a.b)* + (b.a)* + a.(b.a)* + b.(b.a)* = (ε + b).(a.b)*.(ε+a) p+q=q+p (p + q) + r = p + (q + r) (p.q).r = p.(q.r) Ø+p=p+Ø=p ε.p = p.ε = p Ø.p = p.Ø = Ø p.(q=r) = p.q + p.r (p + q).r = p.r + q.r p+p=p (p*)* = p* Ø* = ε ε* = ε p.p* = p*.p (p + q)* = (p*.q*)* Every language defined by a finite automaton is also defined by some regular expression defined by a regular expression is also defined by some DFA Hopcroft’s formula Rij(k) = Rij(k-1)+Rik(k-1).(Rkk(k-1))*.Rkj(k-1) Rij(n) is the regular expression of all paths from i to j. (n is the number of states) States are sorted in some order and numbered 1 to n Rij(k) is regular expression of all paths from i to j passing thru nodes whose sort order is less than k Computed for all i,j for k=0, then k=1,…,k=n Rs,f1(n)+…+Rs,fk(n) is the regular expression of the DFA s is the start state, f1,…,fk are accepting states, n is the number of states. Hopcroft n3 to compute the table and 4n as size of regular expression grows by 4 every time. In A formula is O(n34n), practice it is close to O(n3) By simplifying the regular expression at every step and using judicious algorithm avoiding recomputation of Rkk(k) Most DFAs have almost n and not 2n accessible states faster state elimination method close to O(n2) is also available Regular expression is converted to ε-NFA ε-NFA can the be converted to NFA and to DFA RE to ε-NFA conversion rules : ε Ø a + -> -> -> -> One edge (two state) DFA with ε transition Two state DFA with no edges Two state with “a” transition A new start/accept statejoining two arguments of + in parallel . -> Accept of first is start of second * -> An ε edge joining star/accept of argument and a new start/accept state Convert resulting ε-NFA to a DFA Augment regular expression r to (r).# Position number for each occurrence of alphabet Compute for each node of syntax tree nullable (ε in the language) firstpos (set of possible first alphabets) lastpos (set of possible last alphabets) Compute for each position followpos (set of possible next alphabet after this position) Construct the DFA Unix text search, search matching patterns (grep) Lexical/Parser analysis Parse text against a regular expression find set of first tokens at this expression root find set of last tkens at this expression root can the expression at this root be null set find set of next tokens after an alphabet position in a regular expression Efficient search of patterns in very large repository (web text search) DEFINITION A language (a set of strings) is defined to be a regular language if it can be defined by a finite automaton by a DFA or by an NFA or by an ε-NFA or by a regular expression Four different ways to describe a regular language If L is a regular language then there exists integer n such that for every string w in L we can break w into x, y, z such that w=x.y.z yε |x.y| =< n x.yk.z is in L (for all k >= 0) Proof based on For a DFA of length n any string of length > n must revisit a state Used to prove that a language is not regular Language is a set of string over finite alphabets Language operators : Union of two languages L(A B) = L(A) L(B) - re Intersection Concatenation L(A.B) = { a.b | a in A, b in B} Kleene Closure L(A*) = { an | a in A, n >= 0 } a0 = ε for all a and an = an-1 Compliment L(A’) = { a | a not in A } (with respect to some overall alphabet set) - dfa Difference L(A-B) = L(A) – L(B) - dfa switch q0 F Reversal L (A) = { ak.ak-1…a1 | a1…ak-1.ak in A } Homomorphism – replace an alphabet with another regular expression Inverse homomorphism Is the language described empty? Is a particualr string in the described language? Do two different of languages actually describe the same language? Decision properties may require conversion between various forms. Can the conversion be done in reasonable time? Conversion Complexity Computing ε closures O(n3) Warshall’s O(n) Subset construction O(2n) NFA to DFA O(n32n) (In practice O(n3s) DFA to NFA conversion O(n) NFA/DFA to Regular Expression O(n34n) (worst case) (Actual is much less) Regular Expression to εNFA O(n) Regular Expression to NFA O(n3) Regular Expression to DFA O(n34n^32^n) Equivalence of two states States p and q in an automaton are Defined to be equivalent if For all input strings applied at state p or q p ends up in an accepting state if and only if q also ends up in an accepting state The accepting state reached by p does not have to be same accepting state as that reached by q If two states p and q are equivalent we can combine them together into a single state it wont affect the language accepted by the DFA This process of combining states together is called Minimization Table-filling algorithm can find if two states are equivalent or not. Complexity O(n2) Non-equivalent pairs are distinguishable Minimum DFA is unique Eliminate all states not reachable from start Determine which states are equivalent Partition states into blocks of equivalent states Equivalence is transitive Thus no state is in two blocks Equivalence of two Regular Languages Convert them into their minimum DFAs and check for isomorphism Union method Make a minimum DFA of the union of the two Start state of the two original DFAs must be equivalent if and only if DFAs are equivalent