Finite Automata & Regular Languages Sipser, Chapter 1 Deterministic Finite Automata A DFA or deterministic finite automaton M is a 5-tuple, M = (Q, , , q0, F), where: Q is a finite set of states of M is the finite input alphabet of M : Q Q is the state transition function q0 is the start state of M F Q is the set of accepting states or final states of M DFA Example State diagram Q = { q0, q1 } = { 0, 1 } F = { q1 } 0 M q0 1 0 q1 1 q0 0 q0 1 q1 q1 q1 q0 State Table State table & state transition function State table q0 0 q0 1 q1 q1 q1 q0 State transition function (q0, 0) = q0, (q0, 1) = q1 (q1, 0) = q1, (q1, 1) = q0 State transitions If q, q’ Q, s , and (q, s) = q’, then we say that q’ is an s-successor of q, or there is a transition from q to q’ on input s, and we write q s q’ Example: since (q0, 1) = q1, then there is a transition from q0 to q1 on input 1, and we write q0 1 q1. State sequences If a string of input symbols w = s0s1s2 … sk-1 takes M from initial state q0 to state qk, namely q0 s0 q1 s1 q2 s2 q3 … s[k-1] qk then we say that qk is a w-successor of q0, and write q0 w qk. Also q0q1q2 … qk is called an admissible state sequence for w. Strings accepted by a DFA Let M = (Q, , , q0, F) be a DFA, and w = s0s1s2 … sk-1 * be a string over alphabet . Then M accepts w if there exists an admissible state sequence q0q1q2 … qk for w, starting at initial state q0 and ending with state qk, where qk F. That is, M accepts input string w if M ends up in one of the final states. Language recognized by a DFA The language L(M) that is recognized by a DFA, M = (Q, , , q0, F), is the set of all strings accepted by M. That is, L(M) = { w * | M accepts w } = { w * | q0 w qk, qk F }. Example: For the previous DFA, L(M) is the set of all strings of 0s and 1s with odd parity, that is, odd number of 1s. DFA Example 2 Recognizer for 11*01* 1 B 0 1 1 C A 0 0 Trap D 0,1 DFA Example 2 M = (Q, , , q0, F), L(M) = 11*01* Q = { q0=A, B, C, D } = { 0, 1 } 0 1 F={C} A D B B C B C D C D D D DFA Example 3 0 Modulo 3 counter B 1 0,R 2,R A 1 2 2 1,R C 0 DFA Example 3 M = (Q, , , q0, F) Q = { q0=A, B, C } = { 0, 1, 2, R } F={A} 0 1 2 R A A B C A B B C A A C C A B A Regular Languages A language L * is called regular if there exists a DFA M such that L(M)=L. Earlier, we defined a language L * as regular if there exists a T3 or regular (left-linear or right-linear) grammar G such that L(G)=L. We shall prove that these two definitions are equivalent. Operations on Regular Languages Let A and B be regular languages: Union: A B = { x | x A or x B } Concatenation: AB = { xy | x A and y B }. Kleene Closure (A-star) A* = {x1x2x3 ... xk | k 0 and xi A } Examples of regular operations A = { good, bad }, B = { boy, girl } A B = { good, bad, boy, girl } AB = { goodboy, goodgirl, badboy, badgirl } A* = { , good, bad, goodgood, goodbad, badgood, badbad, … } Closure under Union If A and B are regular languages, then their union, A B, is a regular language Union Machine M(A B) q1F q0 M(A) q2F r0 p1F p0 M(B) p2F Closure under Concatenation If A and B are regular languages, then their concatenation, AB, is a regular language. Concatenation Machine M(AB) Closure under Kleene Star If A is a regular language, then the Kleene closure of A, A*, is also a regular language Kleene Closure Machine M(A*) NFAs: Nondeterministic Finite Automata Presence of lambda transtitions. May have more than one initial state. On input a, state q may have no transition out. On input a, state q may have more than one transition out. NFAs A nondeterministic finite automaton M is a five-tuple M = ( Q, , R, I, F ), where Q is a finite set of states is the (finite) input alphabet R is the transition relation, R QQ I Q is the set of initial states F Q is the set of final states Example NFAs NFA that recognizes the language 0*1 1*0 NFA that recognizes the language (0 1)*11 (0 1)* Converting NFAs to DFAs Given a NFA, M = (Q, , R, I, F), build a DFA, M’ = (Q’, , , S0, F’) as follows. The states S0, S1, S2, … of M’ are sets of states of M. The initial state of M’ is obtained by putting together all the initial states of M and all states reachable from those by transitions, and calling this set S0, the initial state of M’ Converting NFAs to DFAs For each state Sk already in Q’ in M’, and for each input symbol a , put together into a set Sj all states of M reachable from each state in Sk on input a. This set Sj may or may not yet already be in Q’. Also it may be the empty set . Add to the transition from Sk to Sj on input a. Since there can only be a finite number of subsets of states of M, this procedure will stop after a finite number of steps. Example conversions Convert the NFA for the language (0 1)*00 (0 1)*11 to a DFA 0,1 0 A 0 0,1 1 D C B 1 E F State transition table of NFA 0 1 A A,B A - B C - - C - - - D D D,E - E - F - F - - - State table of DFA 0 1 A,D A,B,D A,D,E A,B,D A,B,C,D A,D,E A,D,E A,B,D A,D,E,F A,B,C,D A,B,C,D A,D,E A,D,E,F A,B,D A,D,E,F State diagram of DFA ABD 0 ABCD 0 1 0 1 0 AD 0 1 ADE 1 ADEF 1 Regular Expressions (r.e.) If a , then the set a = {a} is a r.e. The set = {} is a r.e. The set = { } is a r.e. If R and S are r.e., then (R S) is a r.e. If R and S are r.e., then (RS) is a r.e. If R is a r.e., then ( R )* is a r.e. Any r.e. is obtained by a finite application of the above rules. REs and Regular Languages R.E.s are shorthand notation for regular languages. Regex: REs in Unix [a-f], [^a-f] R*, R+, R? {R} RS R|S Minimization of DFAs Subset construction (Myhill-Nerode Theorem) NFAs, DFAs, & Lexical Analyzer Generators Sec 3.6: Finite Automata, Aho, Sethi, Ullman, “Compilers: P.T.T” Sec 3.7: From REs to NFAs (Thompson’s Construction) Sec 3.8: Design of a Lexical Analyzer generator Sec 3.9: Optimization of DFA-based Lexical Analyzers